This part of the tutorial deals with D3PD analysis using ROOT only. It should work on any Linux or Mac OS system that has ROOT installed, but the instructions below are for running on lxplus. You are encouraged to try it on whatever system you like, but if you don't use lxplus, you'll need to set up ROOT yourself and copy the tutorial files from CERN (e.g. via AFS or scp).
This tutorial will introduce you to all the tools the Physics Analysis Tools (PAT) group provides for D3PD analysis in ROOT. While doing this tutorial you should try out all of these tools, though for your actual analysis you are free to replace them with tools you have written yourself or received from other sources. If you choose not to use our tools or have a suggestion on what we could do better, please let us know and we will do our best to accommodate that.
There are two exceptions to this mix-and-match rule: The performance groups distribute various tools for performing good object selection, corrections, etc. In general the validity of your physics result will depend on following these prescriptions precisely (including any last minute changes), so unless you enjoy doing unnecessary checks as part of your blessing procedure you should just use the official tools. The other exception is the package manager RootCore, which the other packages (including the performance tools) depend on. While it is not impossible to skip it, doing so would be a lot of unnecessary effort.
This tutorial is divided into sections. Once you have completed the first section, which gives you the essential setup (A Basic EventLoop Algorithm), the remaining sections may be completed in any order. However, unless otherwise noted, within each section you should work from start to finish. Each section begins with a short introduction, for orientation, of what you should learn in it.
Note: This tutorial is meant as an example of the basic usage of tools on lxplus using standalone root. Occasionally you will see sections like this with remarks on how you would change things to run in other places or how to change things for serious work. If you see a section like this you can typically ignore it safely.
The purpose of this section is to provide you with the basic setup you will need to complete the remaining sections. Along the way you will get a basic introduction to the package manager RootCore, the n-tuple reader D3PDReader, the sample manager SampleHandler and the job manager EventLoop. All of these are discussed in more depth in later sections of the tutorial. At the end of the section you will have a job that runs over the data and produces a single histogram, which is pretty much the minimal amount of work you can do.
As the first step, log in to the machine you want to use. Lxplus is a safe default, but you should be able to do this on any Linux or MacOS machine with root installed:
ssh lxplus.cern.ch |
Now let's set up ROOT (you'll have to do this every time you log in):
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
localSetupROOT |
Note that here we are using ROOT from CVMFS at CERN, which will work on lxplus as well as on many tier 3 sites. More information can be found here. If your shell is csh based (e.g. tcsh) rather than bash based (e.g. zsh), replace "sh" with "csh" above. As an alternative you can also set up an ATLAS release and use its ROOT release. However you should NOT try to set up both ROOT and an ATLAS release in the same session, as it can cause problems. Also, if you log in again, you have to set up ROOT exactly the same way, or you will likely run into trouble.
It is highly recommended not to put the code for setting up ROOT into your .login file, since you will often work on different projects, each of which may require a different way to set up your environment. However, you can put all these setup commands into a file setup.sh (or whatever you'd like to call it) and source that whenever you log in. You would naturally also add the RootCore setup from the next section to this file as well.
Warning: When using ROOT from CVMFS it will occasionally be updated to the latest version. RootCore will detect such a change and give you a warning, in which case you should follow the instructions in the next section for changing ROOT versions. Please note that while updating to the latest ROOT version has its advantages, it can at the same time break your existing code. If you want to stick to a specific version of ROOT (and avoid this trouble), you can list all available versions by calling
showVersions --show=root
and then set up a particular version using
localSetupROOT --rootVersion=<value>
Other Sites: If you don't work on lxplus, make sure that you have a local ROOT version installed. You can then set it up by using its thisroot.sh script:
source /opt/root/root_5.27.06/bin/thisroot.sh
Or you can manually set ROOTSYS, LD_LIBRARY_PATH, and PATH to point to your local ROOT installation, e.g.:
export ROOTSYS=/opt/root/root_5.27.06
export LD_LIBRARY_PATH=${ROOTSYS}/lib:$LD_LIBRARY_PATH
export PATH=$ROOTSYS/bin:$PATH
First make a directory in which you want to put your RootCore installation:
mkdir ROOTTutorial
cd ROOTTutorial |
Now let's set up RootCore. Recall that RootCore is PAT's system for building packages outside of the Athena environment. First we need to check out RootCore:
svn co svn+ssh://svn.cern.ch/reps/atlasoff/PhysicsAnalysis/D3PDTools/RootCore/tags/RootCore-00-01-52 RootCore |
Note that if you don't have ssh keys or a kerberos ticket set up, you'll have to type in your CERN password every time you check something out via ssh. This can be quite tedious, so it is recommended that you set up one or the other whenever possible.
Now you have to let RootCore configure itself. To do this, execute the following:
cd RootCore/
./configure
cd .. |
This will scan your local system, detect the location of your ROOT installation and configure RootCore accordingly. You will have to repeat that step in a fresh shell whenever you change your ROOT or RootCore version. The next step is to set up your local environment for RootCore (you need to repeat this every time you log in):
source RootCore/scripts/setup.sh |
Now you are in the RootCore build environment.
Warning: If you ever change your ROOT or RootCore version you have to repeat these steps in a fresh shell. If you change the ROOT version (on purpose or by accident) you should also remove the old object files to force recompilation with the new ROOT version; in other words, you need to call
$ROOTCOREDIR/scripts/clean.sh
For future reference, the RootCore documentation walks you through how to add packages, build, etc., and is a good resource.
The next step is to check out all the packages used in this tutorial. We could also check them out as we use them, but for simplicity we'll check them out all at once. We could do that ourselves directly, but instead we are going to ask RootCore to do it for us. For that, create a file packages.txt with the list of packages:
atlasoff/DataQuality/GoodRunsLists/tags
atlasoff/PhysicsAnalysis/AnalysisCommon/PATCore/tags
atlasoff/PhysicsAnalysis/D3PDTools/EventLoop/tags
atlasoff/PhysicsAnalysis/D3PDTools/EventLoopGrid/tags
atlasoff/PhysicsAnalysis/D3PDTools/EventLoopAlgs/tags
atlasoff/PhysicsAnalysis/D3PDTools/MultiDraw/tags
atlasoff/PhysicsAnalysis/D3PDTools/RootCore/tags
atlasoff/PhysicsAnalysis/D3PDTools/RootCoreUtils/tags
atlasoff/PhysicsAnalysis/D3PDTools/SampleHandler/tags
atlasoff/PhysicsAnalysis/JetMissingEtID/JetSelectorTools/tags
atlasoff/Reconstruction/egamma/egammaEvent/tags
Please note that in this case we ask for a tagged version, but do not specify a tag. This instructs RootCore to use the latest tagged version, which is usually the latest stable version (as opposed to the trunk version, which may be in a non-working state). In practice this is often a good choice, though there are good reasons to pick a specific tagged version instead, as shown below.
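For example, if you wanted to pin EventLoop to one particular tag, you would list the full tag path in packages.txt instead; the tag number below is just a placeholder, not a real tag:
atlasoff/PhysicsAnalysis/D3PDTools/EventLoop/tags/EventLoop-XX-YY-ZZ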
The actual command for checking out the packages is:
$ROOTCOREDIR/scripts/checkout.sh packages.txt |
Next you have to ask RootCore to locate all the packages and check their dependencies. You'll have to repeat this step whenever you check out or create a new package, or change the dependencies of an existing package. It is very important that you run this inside the ROOTTutorial directory and not in one of the subdirectories, since it picks up all the packages inside the current directory:
$ROOTCOREDIR/scripts/find_packages.sh |
Sometimes when debugging problems you may want to find out which versions of the packages you are actually using. Particularly if you send out email asking for help you should provide that information. RootCore provides a script that lists all the currently configured packages for you:
$ROOTCOREDIR/scripts/version.sh |
Now you have to compile all of the packages. You essentially have to do that whenever you change your code or add packages. The first time you do this it may take a while, but subsequent compilations should be faster. You can also compile packages individually, but that doesn't do quite as much as compiling all packages, so when you add packages or programs you should just recompile everything:
$ROOTCOREDIR/scripts/compile.sh |
For this tutorial we will be using the D3PDReader to read in the data, which is the way recommended by the PAT group, but there are numerous other ways. Perhaps the main advantage of the D3PDReader is that it is highly optimized, meaning that (all else being equal) your code will likely run faster with less effort.
The D3PDReader is actually not a centrally distributed package; instead the package gets generated from the data files you are using. This is necessary so that it can contain all the variables contained in your D3PD. Please note that this only works with D3PDs produced with the D3PDMaker, not with custom n-tuples you may have produced yourself. The documentation for the package that generates these classes can be found under D3PDMakerReader.
The D3PDReader package is actually generated using Athena, so you will have to open up a new shell and set up Athena. Please don't use the same session, as setting up Athena may confuse RootCore. On lxplus you can set up Athena in your test area (you may already have the first three lines in your login script from the first day of the tutorial; in that case you do not need to repeat them):
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
asetup 17.2.7.4.1,here,64 |
Now, generate the D3PDReader files in this session (the .root files are the files we will be using later on). First, go to the main directory and create a new code directory where the code will go:
cd ROOTTutorial
mkdir code |
d3pdReadersFromFile.exe -f /afs/cern.ch/atlas/project/PAT/data/D3PDForRootAnalysis/user.krumnack.pat_tutorial_data.v1/user.lheelan.003342.MyFirstD3PD._00001.root /afs/cern.ch/atlas/project/PAT/data/D3PDForRootAnalysis/user.krumnack.pat_tutorial_ttbar_corrected.v1/user.lheelan.003375.MyFirstD3PD._00001.root /afs/cern.ch/atlas/project/PAT/data/D3PDForRootAnalysis/user.krumnack.pat_tutorial_Zmumu.v1/user.lheelan.003373.MyFirstD3PD._00001.root -n Event -o ./code
Also generate a RootCore package, transferring the generated code from the code directory into it:
d3pdReaderRootCoreMaker.py -p D3PDReader ./code/* |
Check that the new D3PDReader package actually contains the source code (look in the Root sub-directory). If that is the case, delete the temporary code directory again:
rm -rf code |
You can now close this session. You should have a new package called D3PDReader now.
While you need Athena to generate the D3PDReader package, you do not need Athena to use it. In other words, you can generate the D3PDReader package on lxplus and then copy it to your laptop or any other machine you would like to work on. As long as you have ROOT and RootCore working on your machine you can use the D3PDReader.
Warning: The following should be done in your original RootCore session, or in a new session that you set up in the same way. Do not try to do this in the session in which you set up Athena, or you will get all kinds of problems.
In your original RootCore session you can now add and compile this package (this will take a while):
$ROOTCOREDIR/scripts/find_packages.sh
$ROOTCOREDIR/scripts/compile.sh |
Now it is time to create your own package. RootCore provides a script that creates a skeleton package, and we highly recommend you use it. We'll call the package MyAnalysis, but you can name it anything you want. In the ROOTTutorial directory execute:
$ROOTCOREDIR/scripts/make_skeleton.sh MyAnalysis |
Side Note: For actual work (not this tutorial) we highly recommend storing all your packages, even private ones, in SVN. This makes it a lot easier to share your code across machines or with other people, and it gives you the advantages of version management in case something ever goes wrong. To do that (assuming the base package already exists in SVN), just check it out before you call make_skeleton.sh, i.e.:
svn co [svn-url] MyAnalysis
$ROOTCOREDIR/scripts/make_skeleton.sh MyAnalysis
If you list the contents of the package directory, you'll see the skeleton that was created for you, including a LinkDef.h file that ROOT uses to build dictionaries.
Now let's just ask RootCore to pick up this new package:
$ROOTCOREDIR/scripts/find_packages.sh
$ROOTCOREDIR/scripts/compile.sh |
Technically you don't have to rebuild after every step, since you won't be running code until the very end of this tutorial, but it is good practice to check at regular intervals whether your code still compiles.
EventLoop is a package maintained by the PAT group for looping over events. You first start out by making an algorithm class that will hold your analysis code. We'll call it MyFirstD3PDAnalysis, but you can name it anything you want. You could create this class by hand, but the EventLoop package provides a script to create the skeleton of the class for you inside the package you specify (here MyAnalysis):
$ROOTCOREBIN/user_scripts/EventLoop/make_skeleton MyAnalysis MyFirstD3PDAnalysis |
Now we should be able to build our package with our empty algorithm:
$ROOTCOREDIR/scripts/find_packages.sh
$ROOTCOREDIR/scripts/compile.sh |
We'll have to rerun find_packages.sh here, because make_skeleton updated the dependencies for our package. Normally you can recompile using just compile.sh.
To use the D3PDReader classes inside our code, we first need to add the dependency to our package. For that, edit the file MyAnalysis/cmt/Makefile.RootCore and change the line with PACKAGE_DEP to
PACKAGE_DEP = EventLoop D3PDReader
Now we are going to add the hooks needed for the D3PDReader classes to our algorithm. First we'll need to add the D3PDReader header file to the includes at the top of MyAnalysis/MyAnalysis/MyFirstD3PDAnalysis.h:
#include <D3PDReader/Event.h> |
Now let's insert a new data member into the class definition in the same file (please note that the //! is important):
D3PDReader::Event *event; //! |
Forgetting the //! is the most common problem when using EventLoop. It tends to manifest itself as a crash when you try to run the job. In effect, the //! tells EventLoop that the corresponding variable will not be initialized until initialize gets called. So you should add it to all variables in your algorithm, except for configuration parameters.
The exact meaning of //!: What happens internally is that EventLoop will store a copy of your algorithm object on disk, and then for every sample it reads back a copy, initializes it and processes the sample. Saving objects and reading them back happens through a so-called streamer. These get automatically generated for you by ROOT's CINT. Adding //! to a variable tells CINT to ignore that variable. If you were to forget the //! in this case, CINT would try to store a copy of the event structure on disk, but since it hasn't been initialized yet it will (most likely) crash.
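To make the rule concrete, here is a minimal sketch of how the members of a typical algorithm header split up; the histName and hist names are made up for illustration, while the event member is the one we just added:
public:
  std::string histName;     // configuration parameter, set before submission: no //!, so it gets streamed

  TH1 *hist;                //! created in histInitialize(), so tell CINT to ignore it
  D3PDReader::Event *event; //! set in initialize(), so tell CINT to ignore it |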
Now we need to tell EventLoop that we actually want to use the D3PDReader in our job. The easiest way to do that is to add the request in the setupJob function in MyAnalysis/Root/MyFirstD3PDAnalysis.cxx:
EL::StatusCode MyFirstD3PDAnalysis :: setupJob (EL::Job& job)
{
  // Here you put code that sets up the job on the submission object
  // so that it is ready to work with your algorithm, e.g. you can
  // request the D3PDReader service or add output files.  Any code you
  // put here could instead also go into the submission script.  The
  // sole advantage of putting it here is that it gets automatically
  // activated/deactivated when you add/remove the algorithm from your
  // job, which may or may not be of value to you.
  job.useD3PDReader ();

  return EL::StatusCode::SUCCESS;
} |
Now we need to change the initialize function in MyAnalysis/Root/MyFirstD3PDAnalysis.cxx so that we connect this member to our D3PDReader. All this does is create a local reference to the D3PDReader event object, which adds to our convenience and is also slightly faster than accessing it otherwise:
EL::StatusCode MyFirstD3PDAnalysis :: initialize ()
{
  // Here you do everything that you need to do after the first input
  // file has been connected and before the first event is processed,
  // e.g. create additional histograms based on which variables are
  // available in the input files.  You can also create all of your
  // histograms and trees in here, but be aware that this method
  // doesn't get called if no events are processed.  So any objects
  // you create here won't be available in the output if you have no
  // input events.
  event = wk()->d3pdreader();

  return EL::StatusCode::SUCCESS;
} |
Now we should be able to build our package using D3PDReader classes in our algorithm:
$ROOTCOREDIR/scripts/find_packages.sh
$ROOTCOREDIR/scripts/compile.sh |
Again we had to rerun find_packages.sh here, because we updated the dependencies for our package.
Now we can add code to our algorithm to fill a simple histogram. In our case, all we want to do is make a histogram of the electron pt spectrum, which is fairly straightforward.
Again in the header file MyFirstD3PDAnalysis.h, add an include for the TH1 class file (before the class statement):
#include <TH1.h> |
Within our MyFirstD3PDAnalysis algorithm class (in MyFirstD3PDAnalysis.h), add a histogram pointer as a member. You can make this either public or private, it doesn't matter. Please note that the //! is important:
TH1 *AHist; //! |
Now inside the source file MyFirstD3PDAnalysis.cxx, we need to book the histogram and add it to the output. That happens in the histInitialize function:
EL::StatusCode MyFirstD3PDAnalysis :: histInitialize ()
{
  // Here you do everything that needs to be done at the very
  // beginning on each worker node, e.g. create histograms and output
  // trees.  This method gets called before any input files are
  // connected.
  AHist = new TH1F("AHist", "AHist", 100, 0, 1.e5);
  wk()->addOutput (AHist);

  return EL::StatusCode::SUCCESS;
} |
This method is called before processing any events. Note that the wk()->addOutput call is the mechanism EventLoop uses for delivering the results of an algorithm to the outside world. When running in PROOF, ROOT will merge all of the objects in this list. In principle you can place any object that inherits from TObject in there, but if you put anything other than histograms there you need to take special precautions for the merging step.
Now that we have the histogram, we also need to fill it. For that we can use the execute method, which is called for every event:
EL::StatusCode MyFirstD3PDAnalysis :: execute ()
{
  // Here you do everything that needs to be done on every single
  // event, e.g. read input variables, apply cuts, and fill
  // histograms and trees.  This is where most of your actual analysis
  // code will go.

  // Histogramming the pT of electrons
  for( Int_t i = 0; i < event->el.n(); ++i ){
    AHist->Fill(event->el[i].pt());
  }

  return EL::StatusCode::SUCCESS;
} |
Now we should be able to build our package with the full algorithm:
$ROOTCOREDIR/scripts/compile.sh |
This time around you don't need to call find_packages.sh since the dependencies didn't change. This is the normal situation when you are actually developing code. In this tutorial you will call find_packages.sh a lot, but that is because we show you a lot of different packages.
To actually run your algorithm you need some steering code. This can be a ROOT macro in either C++ or Python, or some compiled C++ code. For this tutorial we will create a C++ macro.
First let's create a Run directory for it. Within ROOTTutorial execute:
mkdir Run
cd Run |
And in that directory we will create a new file called ATestRun.cxx. Note that we will use another PAT tool called SampleHandler, which allows for easy sample management. Fill this new file, ATestRun.cxx, with the following:
void ATestRun (const std::string& submitDir)
{
  // Load the libraries for all packages
  gROOT->ProcessLine(".x $ROOTCOREDIR/scripts/load_packages.C");

  // create a new sample handler to describe the data files we use
  SH::SampleHandler sh;

  // scan for datasets in the given directory
  // this works if you are on lxplus, otherwise you'd want to copy over files
  // to your local machine and use a local path.  if you do so, make sure
  // that you copy all subdirectories and point this to the directory
  // containing all the files, not the subdirectories.
  SH::scanDir (sh, "/afs/cern.ch/atlas/project/PAT/data/D3PDForRootAnalysis/");

  // set the name of the tree in our files
  sh.setMetaString ("nc_tree", "MyFirstD3PD");

  // further sample handler configuration may go here

  // print out the samples we found
  sh.print ();

  // this is the basic description of our job
  EL::Job job;
  job.sampleHandler (sh);

  // add our algorithm to the job
  MyFirstD3PDAnalysis *alg = new MyFirstD3PDAnalysis;

  // later on we'll add some configuration options for our algorithm that go here

  job.algsAdd (alg);

  // make the driver we want to use:
  // this one works by running the algorithm directly
  EL::DirectDriver driver;

  // process the job using the driver
  driver.submit (job, submitDir);

  // Fetch and plot our histogram...  Note that "AHist" is the name of
  // the histogram and that "data..." is the name of the sample.
  // this is mostly for show, in a real analysis you would likely do this in a separate macro.
  SH::SampleHandler sh_hist;
  sh_hist.load (submitDir + "/hist");
  TH1 *hist = sh_hist.get ("user.krumnack.pat_tutorial_data.v1")->readHist ("AHist");
  hist->Draw ();
} |
OK, now the big moment has come. Within your Run directory execute your ATestRun.cxx macro with ROOT:
root -l 'ATestRun.cxx ("submitDir")' |
You should have a histogram pop up. You can quit ROOT by typing ".q" at the ROOT prompt once you're happy with the resulting plot.
Note that submitDir is the directory where the output of your job is stored. If you want to run again, you either have to remove that directory or pass a different name into ATestRun.cxx.
Please Note: For actual work you should give more thought to how you want to organize your submission directories. Since they contain the output of your jobs you probably want to keep old versions around, so that you can compare. Also, you may not want to keep them in your source directory tree, but instead put them into a separate data directory tree (see the sketch below for one way to do that). However, try to avoid putting them inside a RootCore managed package. Doing so may result in them getting copied around when submitting to the grid or a batch system.
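One (purely hypothetical) way to do this is to build the submission directory name inside the steering macro from the current date, pointing at a data disk outside your source tree; the paths below are made up for illustration:
// sketch: put each submission into its own dated directory on a data disk
TString submitDir = TString::Format ("/data/%s/submissions/myAnalysis-%d",
                                     gSystem->Getenv ("USER"), TDatime ().GetDate ());
driver.submit (job, submitDir.Data ()); |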
This section introduces you to some of the tools that allow you to select good physics objects and events. Each subsection can be done independently of the others. To be honest this section is overall too short and only covers a subset of tools. We plan to address this shortcoming for the next tutorial and provide a uniform interface for all tools, but for now it is what it is.
Object selectors are tools that allow you to select physics objects that fulfill the requirements of the joint performance groups. In this example we'll use the JetCleaningTool to apply tighter jet cleaning. Note that the procedure for using the other selectors (e.g. Muon or Electron) is essentially the same.
First we need to tell RootCore that we will be depending on the package where the JetCleaningTool lives. Open the Makefile.RootCore in the cmt directory of our MyAnalysis package, and add JetSelectorTools to PACKAGE_DEP. So the PACKAGE_DEP line should look something like this (if you have other packages listed here, don't remove them):
PACKAGE_DEP = EventLoop D3PDReader JetSelectorTools |
Next add the include for the JetCleaningTool to the header file of our analysis algorithm (MyFirstD3PDAnalysis.h):
#include "JetSelectorTools/TJetCleaningTool.h" |
Now add a pointer to a TJetCleaningTool to the analysis class. This is a configurable object we will initialize in our steering code, i.e. a configuration option. As such you should not be using //! and you have to make it public:
public:
  Root::TJetCleaningTool *my_JetCleaningTool; |
For this tutorial we'll demonstrate the result of jet cleaning in a single histogram. Let's add that histogram to the analysis class as well (note that the //! is important):
TH1 *jetHist; //! |
Now in the source file (MyFirstD3PDAnalysis.cxx) add the includes we need:
#include <TH2D.h> |
We are going to instantiate and configure the jet cleaning tool in our steering macro, so the class itself doesn't have to take care of that. For now let's just initialize our jet cleaning tool to a null pointer in the MyFirstD3PDAnalysis::MyFirstD3PDAnalysis constructor (your code will likely crash if you forget to do this):
my_JetCleaningTool = 0; |
Inside initialize, check that the tool is there and initialize it:
if (!my_JetCleaningTool){
  throw std::string ("No JetCleaningTool configured");
}
my_JetCleaningTool->initialize(); |
Since we want to create another histogram, we have to create it in the initialize function:
jetHist = new TH2D ("jet_n", "cut vs. uncut jets", 20, 0, 20, 20, 0, 20); wk()->addOutput (jetHist); |
Now in the execute function in the source file, we add a jet counter and fill our histogram to compare the number of tight jets with the total number of jets:
unsigned njets = 0;
for( Int_t i = 0; i < event->jet.n(); ++i ) {
  if (my_JetCleaningTool->accept(event->jet[i].eta(),
                                 event->jet[i].NegativeE(),
                                 event->jet[i].hecf(),
                                 event->jet[i].HECQuality(),
                                 event->jet[i].emfrac(),
                                 event->jet[i].sumPtTrk(),
                                 event->jet[i].LArQuality(),
                                 event->jet[i].Timing(),
                                 event->jet[i].fracSamplingMax(),
                                 event->jet[i].AverageLArQF() )){
    ++ njets;
  } // end if
} // end for loop over jets
jetHist->Fill (event->jet.n(), njets); |
Now we need to compile our package again. Since we changed our dependencies, you have to run find_packages.sh again. Remember that this has to be done in the ROOTTutorial directory:
$ROOTCOREDIR/scripts/find_packages.sh
$ROOTCOREDIR/scripts/compile.sh |
Now we need to update our steering macro (ATestRun.cxx) to initialize the jet cleaning tool in our algorithm. There is a python macro provided with the jet selector tools for configuring the jet cleaning tool correctly, and all the following code does is call that macro. Just add this code in your steering macro where it says to put the algorithm configuration:
// Configure the Jet selector (tight selections)
TString path = "$ROOTCOREBIN/python/JetSelectorTools/ConfiguredTJetCleaningTools.py";
gSystem->ExpandPathName (path);
TPython::LoadMacro (path.Data());
Root::TJetCleaningTool* myTightJetCleanTool
  = static_cast<Root::TJetCleaningTool*>((void*)TPython::Eval("ConfiguredTJetCleaningTool_Tight()"));

// Tell our analysis to use this selector.
alg->my_JetCleaningTool=myTightJetCleanTool; |
In order to see the effect of the jet cleaning tool, let's change the last two lines which display the final histogram:
TH1 *hist = sh_hist.get ("user.krumnack.pat_tutorial_data.v1")->readHist ("jet_n");
hist->Draw ("BOX"); |
Now let's run again: inside the Run directory call the macro as before. If you already have a submitDir, you either have to rename or remove it first:
root -l 'ATestRun.cxx ("submitDir")' |
You should now see a 2D histogram which shows the total number of jets on the x-axis and the number of tight jets on the y-axis. The latter should never be larger than the former.
The GoodRunsLists (GRL) package allows you to select events that are limited to periods that are considered good for physics. In general you will have a good runs list containing a list of all good data taking periods. The actual file is a .xml file; how you create or obtain it will be covered elsewhere. For the purposes of this tutorial we will provide you with one.
First we need to tell RootCore that we will be depending on the GoodRunsLists package. Open the Makefile.RootCore in the cmt directory of our MyAnalysis package, and add GoodRunsLists to PACKAGE_DEP. So the PACKAGE_DEP line should look something like this (if you have other packages listed here, don't remove them):
PACKAGE_DEP = EventLoop D3PDReader GoodRunsLists |
Next add the include for GoodRunsLists to the header file of our analysis algorithm (MyFirstD3PDAnalysis.h):
#include <GoodRunsLists/TGoodRunsListReader.h> |
And then add a good runs list to our algorithm class. This is a configuration option, so we should not use the //! and have to make it public:
public:
  Root::TGoodRunsList my_grl; |
We will add another histogram so that we can compare the result with and without the GRL, so let's add that to the class (note that the //! is important):
TH1 *GRLHist; //! |
Now in the source file let's add that histogram in the histInitialize function:
GRLHist = new TH1F("GRLHist", "GRLHist", 100, 0, 1.e5); wk()->addOutput (GRLHist); |
Within the execute function we check if the event passes the GRL, and if it does we fill GRLHist just like we did for AHist:
if (event->eventinfo.isSimulation() ||
    my_grl.HasRunLumiBlock(event->eventinfo.RunNumber(), event->eventinfo.lbn()))
{
  for( Int_t i = 0; i < event->el.n(); ++i )
    GRLHist->Fill(event->el[i].pt());
} |
Now we need to compile our package again. Since we changed our dependencies, you have to run find_packages.sh again. Remember that this has to be done in the ROOTTutorial directory:
$ROOTCOREDIR/scripts/find_packages.sh
$ROOTCOREDIR/scripts/compile.sh |
Now we need to update our steering macro (ATestRun.cxx) to set the GRL in our algorithm. Add this code where it says to put the algorithm configuration:
// Read the GRL.  if you don't have AFS on your local machine, copy
// the file over and change the path accordingly.
Root::TGoodRunsListReader* grlR = new Root::TGoodRunsListReader();
grlR->SetXMLFile ("/afs/cern.ch/user/a/atlasdqm/www/grlgen/All_Good/data11_7TeV.periodAllYear_DetStatus-v36-pro10-02_CoolRunQuery-00-04-08_All_Good.xml");
grlR->Interpret ();

// Tell our analysis to use this GRL
alg->my_grl= grlR->GetMergedGoodRunsList(); |
In order to see the effect of the GRL, let's change the lines which display the final histogram to:
sh_hist.load (submitDir + "/hist");
TH1 *hist = sh_hist.get ("user.krumnack.pat_tutorial_data.v1")->readHist ("AHist");
hist->Draw ();
TH1 *hist2 = sh_hist.get ("user.krumnack.pat_tutorial_data.v1")->readHist ("GRLHist");
hist2->SetLineColor (2);
hist2->Draw ("SAME"); |
Now let's run again: inside the Run directory call the macro as before. If you already have a submitDir, you either have to rename or remove it first:
root -l 'ATestRun.cxx ("submitDir")' |
You should now see two histograms of the electron pt spectrum: the black one shows all events and the red one only the events passing the GRL. Hopefully the latter will always be smaller than the former, but not by much.
This section contains examples of how to run EventLoop differently, so that you can take a bigger advantage of it.
SampleHandler has the ability to store additional data with each sample, called meta-data. This data can be accessed from within your algorithm. Among other things this can serve as a way to configure your algorithm differently for each sample. The way this works in practice is quite simple, and a small worked sketch is given at the end of this subsection. If you look at the section on making a stack plot with SampleHandler, it uses this mechanism quite a bit.
You can set the sample meta-data the following way (we use that in the stack plot example):
sh.get ("sampleName")->setMetaDouble ("doubleVarName", 17.3); sh.get ("sampleName")->setMetaString ("stringVarName", "some string"); |
Or you can also set a meta-data field for all the samples in the sample handler. This is useful if you want to provide a default for all samples and then change individual ones. We also use this mechanism in the base example to set the tree name for all samples. The general form is like this:
sh.setMetaDouble ("doubleVarName", 17.3); sh.setMetaString ("stringVarName", "some string"); |
You can then retrieve the meta-data within your algorithm using:
double doubleMeta = wk()->metaData()->getDouble ("doubleVarName"); std::string stringMeta = wk()->metaData()->getString ("stringVarName"); |
If you want to, you can also provide a default value, in case the meta-data wasn't set for a particular sample. This value will then be used if the field wasn't initialized:
double doubleMetaDefault = wk()->metaData()->getDouble ("doubleVarName", 17.3); std::string stringMetaDefault = wk()->metaData()->getString ("stringVarName", "some string"); |
Just for completeness, let's mention that you can then retrieve the meta-data within your steering script as well. We use that in the stack plot example, the general case is:
double doubleMeta = sh.get ("sampleName")->getMetaDouble ("doubleVarName");
std::string stringMeta = sh.get ("sampleName")->getMetaString ("stringVarName");
double doubleMetaDefault = sh.get ("sampleName")->getMetaDouble ("doubleVarName", 17.3);
std::string stringMetaDefault = sh.get ("sampleName")->getMetaString ("stringVarName", "some string"); |
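As a concrete (if made-up) illustration of configuring an algorithm per sample: you could attach a cross-section value to each sample in the steering macro and pick it up in initialize(). The field name xsection, the value 165., and the member m_xsection (which would be declared in the header with a //!, since it is set in initialize) are hypothetical and only for this sketch:
// in the steering macro: a default for all samples, overridden for one of them
sh.setMetaDouble ("xsection", 1);
sh.get ("user.krumnack.pat_tutorial_ttbar_corrected.v1")->setMetaDouble ("xsection", 165.);

// in MyFirstD3PDAnalysis::initialize(): read it back (falls back to 1 if it was never set)
m_xsection = wk()->metaData()->getDouble ("xsection", 1); |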
PROOF lite is a fairly effective and easy way of improving the performance of your analysis code. PROOF lite will execute your code in parallel on every CPU core available on your local machine, so if your machine has 16 cores, your analysis will run up to 16 times as fast. The actual amount of speed-up will depend on a variety of things, the number of cores in your machine, the speed of your hard drive, how you read your data, the amount of processing you do for each event, etc.
Using PROOF lite is very straightforward: you just change the driver statement in ATestRun.cxx to:
EL::ProofDriver driver; |
That's it. There are a couple of practical changes though. The main one is that any messages your job prints while processing events no longer appear on the screen. Instead they go into a log file in the ~/.proof directory hierarchy. In general it is a little hard to debug jobs in PROOF, so if something goes wrong there is an advantage to running with DirectDriver instead.
Warning: This feature is not well tested yet, and I wouldn't be surprised if you run into trouble using it at the current time.
Beyond the lite mode, PROOF also can be used to run on your local batch system. For that the batch system has to be configured properly for PROOF mode. This can be done either centrally by your system administrator, or you can use a mode called PROOF-on-demand that will run on top of your local batch system.
PROOF has been specifically designed for use with ROOT, and it offers a couple of advantages over a traditional batch system.
To make use of your PROOF farm you need to configure the PROOF driver accordingly:
EL::ProofDriver driver;

// set the PROOF master.  this needs to be changed to point to your PROOF master node
driver.proofMaster = "acas1010";

// by default the PROOF driver only creates a .par file for PROOF lite
// resetting this option will give you a full-featured .par file
driver.makeParOptions = "";

// this configures how output n-tuples are returned.
// for full PROOF mode the output n-tuples
// stay on the farm, as configured by this option.
driver.returnFiles = false; |
The way PROOF works is that it will upload your code to your PROOF cluster and then recompile it there. This means you'll encounter a full recompile of your code the first time you upload; after that it will only recompile partially according to the changes that you made. However, sometimes it gets confused, mostly if you actually delete any source files (they don't get deleted on the server). If that happens it is easiest to remove your software package from the PROOF master and recompile it completely. The easiest way to do that is to use the following command:
driver.removePar (); |
If you work on lxplus, you can run your job on the local batch system, lxbatch. This allows you to run on many nodes in parallel, improving the turnaround time of your job drastically. If you work at your institution's tier 3 site you may be able to do something similar there by using the correct driver for your batch system. For lxbatch, just change the driver statement in ATestRun.cxx to:
EL::LSFDriver driver;
driver.shellInit = "export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase && source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh && eval localSetupROOT --skipConfirm"; |
Note the shellInit parameter, which is used to set up ROOT on each of the worker nodes. If you run somewhere other than lxplus you will have to update it. Notice that we call localSetupROOT slightly differently than normal to make it work on the lxbatch worker nodes.
For now there are a number of practical limitations though: right now both the submitDir and your ROOTTutorial directory need to be located on a shared filesystem, and if you create n-tuples they will be copied into submitDir, not to your local file server. All of these limitations can be overcome, probably with modest effort. However, I will not start working on them until I actually get a request from a user to do so.
The speed with which your jobs run will in most cases be dominated by the speed with which you can read your input data, so it is often worthwhile to try to maximize that speed. One important step towards that end is using the D3PDReader, which will only read the data you actually use, thereby avoiding reading unneeded data.
Another way to improve read performance is to use TTreeCache, which will predict the data that you are likely to access and preload it in big chunks. This is mostly important when reading files over the network, and it is virtually mandatory in case you read the files directly from the grid (see the next section on FAX).
Using TTreeCache with EventLoop is very straightforward, just specify the size of the cache you would like for your job before submitting it (in this case 10MB):
job.options()->setDouble (EL::Job::optCacheSize, 10*1024*1024); |
The default way in which TTreeCache predicts what data your job will access is by looking at the first n events and assuming that the pattern will hold for the rest of your job. If you want to, you can change the number of events to be read to suit your needs:
job.options()->setDouble (EL::Job::optCacheLearnEntries, 20); |
You may have to play around a little to determine which number works best for you. There is a definite tradeoff, in that too large a number means that you read too many events without the benefit of the cache, while too small a number means that you will not be able to cache variables that you do not access in every event.
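Putting the two options together, a setupJob along these lines is a reasonable sketch; the numbers are only a starting point you would tune for your own job:
EL::StatusCode MyFirstD3PDAnalysis :: setupJob (EL::Job& job)
{
  job.useD3PDReader ();

  // 10 MB read-ahead cache, trained on the first 20 events
  job.options()->setDouble (EL::Job::optCacheSize, 10*1024*1024);
  job.options()->setDouble (EL::Job::optCacheLearnEntries, 20);

  return EL::StatusCode::SUCCESS;
} |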
Warning: FAX is a very new product and you may very well find it is not yet ready to suit your needs. However, it is at a stage where user feedback is appreciated, and it can be used for productive work by experienced users. This section is for advanced users who are willing to give it a try or just want to see how it works. If you are a beginner, you might just want to skip this section and go on to grid processing instead. If you run into trouble or want to leave feedback, the support mailing lists are atlas-adc-federated-xrootd@cernSPAMNOT.ch and usatlas-federated-xrootd@cernSPAMNOT.ch. There are a number of issues that you may encounter.
FAX is a mechanism that allows you to read data directly from the grid without downloading it first. Depending on your usage pattern this may not only be more convenient, but also faster than downloading the files directly: If you download a file, you have to download it in its entirety, whereas if you read the file directly from the grid you may read only a small fraction of it. So if you are reading the file only once (or only a few times) this may be an option to improve your analysis workflow.
Warning: While not strictly necessary, it is strongly recommended that you use TTreeCache when using FAX, otherwise your performance is likely to be very poor. So if you haven't done so already, you should work through the section on TTreeCache first.
As a first step, set up the DQ2 tools, which will be needed for using FAX. There may be some specific advantages to setting them up before or after setting up ROOT, but I wouldn't know about that. Anyway, when setting up on lxplus we already sourced atlasLocalSetup.sh, so from there the recommended commands are:
localSetupDQ2Client
voms-proxy-init -voms atlas
export STORAGEPREFIX=root://atlas-xrd-eos-n2n.cern.ch/ |
The first line here takes care of setting up the actual DQ2 tools. If you do not work on lxplus you may need to do something different. The second line will initialize the VOMS proxy that holds your grid certificate. If you have problems with that command you should ask your resident grid expert. There are too many things that can go wrong with that to discuss them here. Now the last line is specific to FAX and tells it which machine to contact for samples. This again may vary depending on your site. You can check the official documentation, particularly this list here.
Now it's time to actually use FAX. For that comment out the part where we scan the local directory for samples, and instead scan DQ2:
// SH::scanDir (sh, "/afs/cern.ch/atlas/project/PAT/data/D3PDForRootAnalysis/");
SH::scanDQ2 (sh, "user.krumnack.pat_tutorial_*.v1"); |
That should do it. You can now just run your script the same way you did before and with a little luck you will get the same result as before. The initialization will take a little longer, as this will actually query dq2 to find the datasets matching your request, and then again for each dataset to locate the actual files. However, compared to the overall run-time this overhead should be small, and the power you gain is most likely worth it.
Warning: There are a lot more things that can go wrong with this than with reading files locally. After all, you might be reading files from the other side of the world and not even know it. And this is also a fairly new service, so there might be some growing pains associated with it as well.
If you can not or do not want to use FAX for accessing grid datasets, you either have to run on the grid or download them to your site. One option for the latter is to tell SampleHandler that you have a certain area where your grid samples should go, and it will then automatically download them for you the first time you access them. For the most part this is a convenience feature, but if you activate it at the same time as FAX, it will try accessing the dataset with FAX first and only if that fails will it download the dataset.
To use this feature you have to set up the DQ2 tools as well as a cache area. There may be some specific advantages to setting up the DQ2 tools before or after setting up ROOT, but I wouldn't know about that. Anyway, when setting up on lxplus we already sourced atlasLocalSetup.sh, so from there the recommended commands are:
localSetupDQ2Client
voms-proxy-init -voms atlas
export SAMPLEHANDLER_DQ2_AREA=/tmp/$USER/sh_dq2
mkdir -p $SAMPLEHANDLER_DQ2_AREA |
The first line here takes care of setting up the actual DQ2 tools. If you do not work on lxplus you may need to do something different. The second line will initialize the VOMS proxy that holds your grid certificate. If you have problems with that command you should ask your resident grid expert. There are too many things that can go wrong with that to discuss them here. The third line defines the directory where your cached data samples will go, and the fourth line makes sure that area exists.
Warning: In any real life application you will want to point this somewhere other than the /tmp area. We only point it to the /tmp area because by default you don't have any other place to put large data files when working on lxplus. For actual work please point this to some nice and large data disk. If you want to put your files directly onto a storage element, please talk to me and I'll see if I can make that a reality.
Now let's use some grid samples instead of the AFS ones. For that comment out the part where we scan the local directory for samples, and instead scan DQ2:
// SH::scanDir (sh, "/afs/cern.ch/atlas/project/PAT/data/D3PDForRootAnalysis/");
SH::scanDQ2 (sh, "user.krumnack.pat_tutorial_*.v1"); |
That should do it. You can now just run your script the same way you did before and with a little luck you will get the same result as before. The initialization will take a little longer, as this will actually query dq2 to find the datasets matching your request, and then for the first time it will take quite a while to download all data samples. However, for subsequent runs the overhead for querying dq2 should be small, and the power you gain is most likely worth it.
Warning: The Achilles heel of this mechanism is that if you try to download the same dataset multiple times simultaneously, something bad is likely to happen, and I'm not sure that my code will catch that. The easiest way to avoid it is to run only one job at a time; if you run multiple jobs, run them on different datasets, run them in sequence, or make sure all datasets are already downloaded. If you feel you can't work around this, talk to me.
So far our tutorial has assumed that when you run a job you will just sit there and wait until it finishes. This works well for the jobs we are doing in the tutorial, since they are fairly short, but it works less well for longer jobs; particularly when working on the grid with large volumes of data, a job can take many hours. To better deal with these cases, one may want to split our ATestRun.cxx script into two: one script that submits the job, and one that retrieves and processes its output.
Please note that while the following will work with all drivers, you will only get the full benefits with some of the drivers. At the time of this writing the supported drivers are the grid driver and the batch drivers. The other drivers will still do the full event processing in the submission step. If you feel there is another driver to which this capability should be added, send me an email (nils.erik.krumnack@cernSPAMNOT.ch) and I will try to implement it for you.
To create the first script, copy the content of ATestRun.cxx all the way up to, and including, the driver.submit statement into a new file ATestSubmit.cxx. Don't forget to change the name of the macro at the beginning of the file to ATestSubmit:
Instead of the regular driver.submit statement we will ask EventLoop only to submit the job and immediately return control to us. To do that, at the end of ATestSubmit.cxx replace driver.submit with:
// submit the job, returning control immediately if the driver supports it
driver.submitOnly (job, submitDir);
} |
Next, let's create the script for retrieving and processing the job output; let's call it ATestRetrieve.cxx. Start it out by loading all the packages and then retrieving the output:
void ATestRetrieve (const std::string& submitDir)
{
  // Load the libraries for all packages
  gROOT->ProcessLine(".x $ROOTCOREDIR/scripts/load_packages.C");

  // retrieve the job, waiting for it, if it is not finished
  EL::Driver::wait (submitDir); |
And then to process the output, just copy everything from ATestRun.cxx that comes after driver.submit. Please note that it is very important to call wait, even if you know that your job already finished. wait does more than just wait for the job to finish: it also takes care of merging histograms, moving files, and any other post-processing your job may require.
Now for the actual running.
root -l -b -q 'ATestSubmit.cxx ("submitDir")'
root -l 'ATestRetrieve.cxx ("submitDir")' |
The point here is that you can wait an arbitrary amount of time between the two commands, and you can (if you want to) call the second command as often as you want, e.g. to tweak how the histograms get displayed. Should the job not be finished, wait will perform a partial retrieve, print a message, wait for a minute and then try again. It is perfectly safe to kill the job while it is waiting, but you should not kill it while it does the partial retrieve.
Please note that, at least for the batch drivers, you may get your results somewhat faster if you have wait running all the time, as opposed to only after the job finished. This is because the driver can do some of the post-processing during the partial retrieve while the other jobs are still running. Whether that matters will depend on your job. If you don't want to have your job actively waiting, you can instead trigger a partial retrieve manually (just don't do both at the same time):
root -l -b -q $ROOTCOREDIR/scripts/retrieve.C \(\"submitDir\"\) |
In this section of the tutorial we will teach you how to run on the grid. There are two main advantages to running on the grid: First, you have access to the vast computing power of the grid, which means even very lengthy jobs can finish within hours, instead of days or weeks. Second, you have direct access to all the datasets available on the grid, which means you don't need to download them to your site first, which can save you hours if not days of time. There are also two main disadvantages: First, for all but the simplest jobs your turnaround time will be measured in hours, if not days. And this time can vary depending on the load at the various grid sites. Second, there are more things beyond your control that can go wrong, e.g. the only grid site with your samples may experience problems and go offline for a day or two thereby delaying the execution of your jobs.
One common model is to run on the grid to produce a smaller, more highly processed set of n-tuples and then run on those n-tuples locally with a second job. If done right this will take advantage of the strengths of the grid to do the large scale processing, and then take advantage of your local resources to produce histograms efficiently. This is a common model, but it may not be the best one for you, e.g.: your primary n-tuples may be small enough that you can just download them directly, in which case you can probably avoid grid processing altogether. Or you may be one of the (rare) cases in which one can not (sensibly) produce a processed n-tuple.
Warning: While not strictly necessary, it is strongly recommended that you use TTreeCache when running on the grid; this can significantly improve the run time on some grid sites. So if you haven't done so already, you should work through the section on TTreeCache first.
Warning: While not strictly necessary, it is strongly recommended that you use a separate submit and retrieval script when running on the grid. A grid job can take hours if not days to finish, and you wouldn't want your job to be just sitting there. So if you haven't done so already, you should work through the section on separate submit-retrieve first.
Before running for the first time: The GridDriver uses ganga, which currently needs to create a few configuration defaults before it can be run for the first time. If you have never used ganga before, start it once in interactive mode before using the GridDriver:
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
localSetupGanga
ganga
Answer the questions, then press ctrl+d to exit. You only need to do this once on each computer where you want to use the GridDriver.
As a first step, set up the DQ2 tools, which will be needed for running on the grid. There may be some specific advantages to setting them up before or after setting up ROOT, but I wouldn't know about that. Anyway, when setting up on lxplus we already sourced atlasLocalSetup.sh, so from there the recommended commands are:
localSetupDQ2Client
voms-proxy-init -voms atlas |
The first line here takes care of setting up the actual DQ2 tools. If you do not work on lxplus you may need to do something different. The second line will initialize the VOMS proxy that holds your grid certificate. If you have problems with that command you should ask your resident grid expert. There are too many things that can go wrong with that to discuss them here.
Open ATestSubmit.cxx and make sure that you are using the grid datasets. If you did the section on FAX or DQ2 downloads, this should be set already, otherwise comment out the directory scan now and instead scan DQ2:
// SH::scanDir (sh, "/afs/cern.ch/atlas/project/PAT/data/D3PDForRootAnalysis/");
SH::scanDQ2 (sh, "user.krumnack.pat_tutorial_*.v1"); |
Next, replace the driver with the GridDriver:
//EL::DirectDriver driver;
EL::GridDriver driver; |
Before the jobs can be submitted you need to tell the driver which ATLAS software release should be set up on the worker nodes, otherwise the jobs cannot run. This in turn will decide which versions of ROOT and GCC are used for your analysis:
driver.athenaTag="17.2.7.4.1"; |
The GridDriver supports a number of optional configuration options that you might recognize from the prun program. If you want detailed control over how jobs are submitted, please consult this page for a list of options: https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/EventLoop#Grid_Driver
Finally, submit the jobs as before:
root -l -b -q 'ATestSubmit.cxx ("myGridJob")' |
This command may take a while to complete - do not interrupt it! You can follow the evolution of your jobs by going to http://panda.cern.ch, clicking on users and then finding yourself in the list. Note that jobs for all samples may not appear immediately; they will be submitted over a period of time.
As opposed to the other batch drivers, it is not necessary to call wait() to trigger a partial retrieve. Grid jobs are often very large and numerous. Output of finished jobs is therefore continuously being downloaded in the background and only the final merging step happens when you call retrieve() or wait(). Failed jobs are also automatically retried a maximum of 4 times. This process will continue for 24 hours even if you log out. If your jobs are not completed by then, it will be restarted whenever you call wait() or retrieve(). But if you want the driver to inform you when the jobs are done, you can just keep wait() running:
root -l 'ATestRetrieve.cxx ("myGridJob")' |
Depending on the size of your samples and the load on the grid, it can take hours or even days before results are ready. However, our test jobs in this tutorial are quite small and should hopefully not take that long to finish. Otherwise, just cancel the retrieve script and try again later or the next day. Upon completion, you should see the histogram pop up as before.
What is happening behind the scenes? The grid driver relies on ganga to manage your jobs. Ganga creates a "server" process that will keep running on your machine until the jobs are finished (max 24 hrs), even if you log off. In case of problems, the ganga log file (~/.ganga.log) or the server messages (~/gangadir-EL/server) might provide some insight into what is going on. In addition, you can set GridDriver::gangaLogFile to save driver/server communication to a location of your choice (hint: the tail -f command lets you follow log file updates in real time).
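For example (the path is made up), in your submission macro:
// keep a copy of the driver/server communication for debugging
driver.gangaLogFile = "/tmp/myGridJob-ganga.log"; |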
Specifying paths to local files: When running on the grid, your local file system is of course inaccessible. All auxiliary files needed by your job must be placed in the share directory of one of your RootCore packages and accessed using paths of the form $ROOTCOREDIR/data/PackageName/FileName.root (take a look around inside the RootCore directory after compiling to see how symlinks of this form are automatically created). It is good practice to always use this system, as it works with all drivers.
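A minimal sketch of that pattern in your algorithm code, mirroring what we did for the jet cleaning configuration above; the file name myAuxFile.root is hypothetical:
#include <TSystem.h>
#include <TFile.h>

// resolve a file shipped in MyAnalysis/share at run time; works with all drivers
TString auxFile = "$ROOTCOREDIR/data/MyAnalysis/myAuxFile.root";
gSystem->ExpandPathName (auxFile);
TFile *file = TFile::Open (auxFile.Data ()); |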
In this section we'll build a simple example of writing a new n-tuple. Done right, creating your own n-tuple is a good technique for improving the speed of your analysis.
For our example we'll use the capabilities of the EventLoop NTupleSvc and D3PDReader to create the new n-tuple. You can also create a new n-tuple yourself by using the facilities of TTree directly. As a demonstration, we'll do a bit of skimming/slimming/thinning.
Before we start, make a copy of ATestRun.cxx, which we will need in a second:
cp ATestRun.cxx ATestRun2.cxx |
ATestRun.cxx will be the skim job and ATestRun2.cxx will run over the skimmed n-tuple. As a first step we will simply use the NTupleSvc and a basic selection in our job, so let us add the NTupleSvc to the job. This has to happen before you add any other algorithms that want to use the service:
EL::OutputStream output ("output");
job.outputAdd (output);
EL::NTupleSvc *ntuple = new EL::NTupleSvc ("output");
ntuple->copyBranch ("RunNumber");
ntuple->copyBranch ("EventNumber");
ntuple->copyBranch ("isSimulation");
ntuple->copyBranch ("el_n");
ntuple->copyBranch ("el_pt");
ntuple->copyBranch ("lbn");
ntuple->copyBranch ("jet_n");
ntuple->copyBranch ("jet_hecf");
ntuple->copyBranch ("jet_eta");
ntuple->copyBranch ("jet_NegativeE");
ntuple->copyBranch ("jet_HECQuality");
ntuple->copyBranch ("jet_emfrac");
ntuple->copyBranch ("jet_sumPtTrk");
ntuple->copyBranch ("jet_LArQuality");
ntuple->copyBranch ("jet_Timing");
ntuple->copyBranch ("jet_fracSamplingMax");
ntuple->copyBranch ("jet_AverageLArQF");
job.algsAdd (ntuple); |
The string output is just an arbitrary name that identifies this particular output. If you create multiple outputs in the same job, each of them needs to be given a different label. In the future, some drivers may allow you to specify further options as part of the OutputStream object, e.g. on which disk the output n-tuples should be stored.
Now let's add a simple selection, in this case all events which contain at least two electrons or two muons:
EL::AlgSelect *select = new EL::AlgSelect ("output"); select->addCut ("el_n>=2||mu_muid_n>=2"); select->histName ("cut_flow"); job.algsAdd (select); |
We are also creating a cut flow histogram here, which is actually kind of boring in that we only have a single cut. However, if you add multiple cuts you will get to see how many events survive each cut.
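For example, the selection could be extended with additional cuts; each addCut call adds another step (and another bin) to the cut flow. The extra cut values below are purely illustrative:
// each additional cut appears as one more entry in the cut flow histogram
// (cut values are just examples)
select->addCut ("el_pt[0]>=25e3");
select->addCut ("el_pt[1]>=20e3"); |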
Now we can go on and try to produce our n-tuple:
root -l 'ATestRun.cxx ("submitDir")' |
Assuming that went fine, this is now where things get complicated. Depending on how you ran your code, your n-tuple will show up in different places, and there are two ways to get to it. The more general way is to load the sample handler that was created for your output stream. You can either use that SampleHandler to access the n-tuples yourself, or use them as the input for another run of EventLoop. Assuming the latter, we now modify the ATestRun2.cxx we created earlier. First make sure that the function name matches the name of the macro, so change the first line to
void ATestRun2 (const std::string& submitDir) |
Also change the section for the input sample handler to:
// create a new sample handler to describe the data files we use
SH::SampleHandler sh;
// load an existing sample handler
sh.load ("submitDir/output-output");
// set the name of the tree in our files
sh.setMetaString ("nc_tree", "MyFirstD3PD");
// further sample handler configuration may go here
// print out the samples we found
sh.print (); |
Now you can run ATestRun2.cxx
like you did for ATestRun.cxx
:
root -l 'ATestRun2.cxx ("submitDir2")' |
This will run your analysis job again, but on your skimmed n-tuple instead of the full n-tuple. Note that in a more realistic example we would probably have different algorithms for running on the skimmed n-tuples than on the full ones; e.g. we wouldn't want to write out another skimmed n-tuple every time we run on the skimmed n-tuple.
As I said, there is another way to access your data, and that is to do it directly. Where the data is located will strongly depend on the EventLoop driver you are using. If you are running on a single machine (e.g. direct or PROOF-lite driver) you will find the output in submitDir/data-output, but if you are running on a computing farm the output will normally be stored on the farm storage elements or the farm nodes themselves. That is for a very practical reason: n-tuple files are typically huge. Copying them all to your interactive machine could take a lot of time and may require more disk space than is locally available. Furthermore, if your n-tuples are large you will typically want to run over them using your farm, and your interactive node is typically not set up to deliver n-tuples efficiently to your farm machines.
However assuming you used the direct or PROOF-lite driver you can try to access and inspect the output n-tuple like this:
root -l submitDir/data-output/user.krumnack.pat_tutorial_data.v1.root
root [0] TBrowser x |
So far we have copied branches directly from the input tree to the output tree, and specified the events we want to keep using a simple expression. While that is sufficient in some cases, in many cases we would like to do more complicated things. For that we need to connect our algorithm to the n-tuple service, so that it can provide additional inputs.
As a first step we need to add the n-tuple service to our job. To do that, make sure your package depends on EventLoopAlgs, i.e. modify the PACKAGE_DEP line in your Makefile.RootCore so that it reads:
PACKAGE_DEP = ... EventLoopAlgs |
(where ... stands for the other package dependencies you might have listed in previous parts of this tutorial or in your own analysis)
Now recompile. We need to run find_packages.sh
since we updated our dependencies:
$ROOTCOREDIR/scripts/find_packages.sh
$ROOTCOREDIR/scripts/compile.sh |
Next, add to the includes in MyFirstD3PDAnalysis.h:
#include <EventLoopAlgs/NTupleSvc.h> |
And add a pointer to the n-tuple service in the class (note that the //!
is important):
EL::NTupleSvc *output; //! |
And add a configuration member that contains the name of the output
stream. This is an example of a configuration parameter, basically the
only case where you do not need and should not use the //!
:
public: std::string outputName; |
Now connect to the n-tuple service in the initialize
function in MyFirstD3PDAnalysis.cxx
:
if (!outputName.empty()){ output = EL::getNTupleSvc (wk(), outputName); } else{ output = 0; } |
And in ATestRun.cxx
add the output stream to the configuration of our algorithm:
alg->outputName = "output"; |
One simple thing we can do, is add another variable that is calculated from the existing variables. For demonstration purposes we are using the invariant mass of the first two electrons. In real life you would apply quality cuts to make sure you have two good electrons, but for simplicity we just take the first two.
Since we need the TLorentzVector class, we need to update our Makefile.RootCore so that the PACKAGE_PRELOAD line reads:
PACKAGE_PRELOAD = ... Physics |
(where ... stands for the other preloaded libraries you might have listed in previous parts of this tutorial or in your own analysis)
First add the new variable to the class in MyFirstD3PDAnalysis.h
:
Float_t diel_mass; //! |
Now in MyFirstD3PDAnalysis.cxx
add another include for the mass calculation:
#include <TLorentzVector.h> |
Now register the variable with the tree in initialize
after you connect to the ntuple service:
if (output){ output->tree()->Branch ("diel_mass", &diel_mass, "diel_mass/F"); } |
And in execute
fill the variable with a value:
diel_mass = 0; if (event->el.n() >= 2) { TLorentzVector mom1, mom2; mom1.SetPtEtaPhiM (event->el[0].pt(), event->el[0].eta(), event->el[0].phi(), 0); mom2.SetPtEtaPhiM (event->el[1].pt(), event->el[1].eta(), event->el[1].phi(), 0); diel_mass = (mom1 + mom2).M(); } |
Now we can go on and try to produce our n-tuple:
$ROOTCOREDIR/scripts/compile.sh
root -l 'ATestRun.cxx ("submitDir")' |
And go on to inspect the new variable in our n-tuple:
root -l submitDir/data-output/user.krumnack.pat_tutorial_data.v1.root
root [0] MyFirstD3PD->Draw ("diel_mass") |
In this section we will replace our basic skim selection with a selection based on quantities we calculated in our algorithm, in this case the dielectron mass. Note that this is simplified; for a realistic scenario we would have to put precautions in place for what to do if we have more than two electrons.
Add a cut parameter to the class in MyFirstD3PDAnalysis.h
:
float massCut; |
And in the constructor MyFirstD3PDAnalysis()
in MyFirstD3PDAnalysis.cxx
set a default value for that parameter:
massCut = 30e3; |
Adding the selection in our code is very straightforward. After we fill diel_mass
in execute
add these lines:
if (output && diel_mass > massCut) output->setFilterPassed (); |
Now remove the AlgSelect from ATestRun.cxx, or you will get an OR of both selections (which is not what we want). Then rerun the code and inspect the output n-tuple:
$ROOTCOREDIR/scripts/compile.sh
root -l 'ATestRun.cxx ("submitDir")'
root -l submitDir/data-output/user.krumnack.pat_tutorial_data.v1.root
root [0] MyFirstD3PD->Draw ("diel_mass") |
You should now see that the mass spectrum is cut off at the specified value. As an exercise, modify the cut value from ATestRun.cxx (a sketch follows below) and check that you get what you expect.
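A minimal sketch of such a change, placed in the algorithm configuration section of ATestRun.cxx; the value 50e3 (50 GeV) is an arbitrary example:
// tighten the dielectron mass cut (arbitrary example value, in MeV)
alg->massCut = 50e3; |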
In the next section we will thin out the input electron collection to include only high pt electrons. To that end we are using the facilities of the D3PDReader to create that new collection and then write it out.
As a first step we need to add the new collection to our class in MyFirstD3PDAnalysis.h:
D3PDReader::ElectronD3PDObject out_electrons; //! |
Let's also add a configuration member for the electron pt cut to that class:
float electron_pt_cut; |
And in the constructor MyFirstD3PDAnalysis() in the source file (MyFirstD3PDAnalysis.cxx) give the parameter a default value:
electron_pt_cut=10e3; |
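If you prefer, this parameter can also be overridden from ATestRun.cxx in the algorithm configuration section, just like massCut (assuming you declared it in the public section like the other configuration parameters); the value is only an example:
// override the default electron pt cut from the steering macro (example value, in MeV)
alg->electron_pt_cut = 20e3; |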
Then in your initialize function (just before the return statement) set up the writing out of some selected objects. After the code from the previous step, add a custom prefix for the output collection. Note that we use regular expressions to control what information we would like to write out (i.e. slimming):
// Set a new prefix for the selected objects
out_electrons.SetPrefix ("my_el_");
// We select just basic kinematic info by activating just those branches
out_electrons.SetActive (kTRUE, "^my_el_n$|^my_el_pt$|^my_el_eta$|^my_el_phi$");
// Tell the objects to write into the output TTree
if (output) out_electrons.WriteTo (output->tree()); |
Now back in the execute
function, add just the selected input objects to the output:
// Select objects to put into output
out_electrons.Clear();
for (Int_t i = 0; i < event->el.n(); ++ i)
  if (event->el[i].pt() > electron_pt_cut)
    out_electrons += event->el[i]; |
Now re-run the skimming job and check the effect on the pt spectrum:
$ROOTCOREDIR/scripts/compile.sh
root -l 'ATestRun.cxx ("submitDir")'
root -l submitDir/data-output/user.krumnack.pat_tutorial_data.v1.root
root [0] MyFirstD3PD->Draw ("my_el_pt") |
This should show that the final collection only contains electrons that are above the cut. If you want to play with this code a little more, here are a couple of things you can try:
- configure the electron pt cut from ATestRun.cxx instead of relying on the default value set in the constructor
- change the diel_mass calculation to use only the electrons that pass our selection (a sketch follows after this list)
- make the new collection my_el_ replace the original collection el_ with our new thinned collection. This is often useful, in that it allows code to run on either n-tuple, if it only looks at electrons passing the selection. To do this you need to change the prefix in initialize (within MyFirstD3PDAnalysis.cxx) and remove the electron variables from the list of variables configured in NTupleSvc (within ATestRun.cxx)
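As an illustration of the second item, here is a sketch of how the diel_mass calculation in execute could be restricted to the selected electrons, reusing the out_electrons collection (this assumes the output collection supports the same n() and element access as event->el, and it has to run after the selection loop that fills out_electrons):
// compute the dielectron mass from the selected (thinned) electrons only;
// this block replaces the earlier calculation based on event->el
diel_mass = 0;
if (out_electrons.n() >= 2)
{
  TLorentzVector mom1, mom2;
  mom1.SetPtEtaPhiM (out_electrons[0].pt(), out_electrons[0].eta(), out_electrons[0].phi(), 0);
  mom2.SetPtEtaPhiM (out_electrons[1].pt(), out_electrons[1].eta(), out_electrons[1].phi(), 0);
  diel_mass = (mom1 + mom2).M();
} |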
In this section of the tutorial we will demonstrate how to use the features of SampleHandler to produce a stack plot from the histogram we created. The idea is both to show you how to use SampleHandler more effectively and to give you a generic tool for creating stack plots that you can build on and use in your analysis. It is probably not suited for all cases, but it should form a good basis for making quick and unofficial plots.
In this section we will build on the algorithm we developed earlier, but there is also an older, more self-contained version of this tutorial here.
While this is a very nice example of how to use SampleHandler to manage a large and changing number of samples (as well as their meta-data), it is not necessarily the best way to actually produce your final plots. Once you try to add systematics or other advanced features to your plot it can become quite tricky. To deal with this, I am currently working on a new package that will produce publication quality plots. Among other things, it will allow you to produce a nice looking stack plot directly from the output of your EventLoop job using only a few lines. It hasn't been reviewed or released yet, but if you are interested you can send me an email (Nils.Erik.Krumnack@cernSPAMNOT.ch) and I will inform you once it is. If you want, you can also volunteer as a pre-release tester, which would help me and may make life easier for you as well, but you would have to contend with the fact that this package isn't stable (or official) yet.
So far we haven't used SampleHandler to do anything except provide the file lists for EventLoop. One very useful feature of SampleHandler is that you can attach meta-data to your individual data samples. Let's say we want to attach a label and the luminosity to each sample. Then you would add the following lines to ATestRun.cxx after the SH::scanDir command, where it says to put further meta-data configuration:
// the luminosity we have to set manually and one by one
sh.get ("user.krumnack.pat_tutorial_Zmumu.v1")->setMetaDouble ("lumi", 23.25);
sh.get ("user.krumnack.pat_tutorial_Zmumu.v1")->setMetaString ("label", "Z+jets");
sh.get ("user.krumnack.pat_tutorial_ttbar_corrected.v1")->setMetaDouble ("lumi", 121.21);
sh.get ("user.krumnack.pat_tutorial_ttbar_corrected.v1")->setMetaString ("label", "ttbar");
sh.get ("user.krumnack.pat_tutorial_data.v1")->setMetaDouble ("lumi", 5.891);
sh.get ("user.krumnack.pat_tutorial_data.v1")->setMetaString ("label", "data"); |
Since these meta-data fields are defined for the input sample handler, they get automatically propagated to the sample handlers EventLoop generates for the output data. This will require you to re-run ATestRun.cxx
.
If you want to avoid that, you can instead add these lines to the
macro we are about to create after you load the sample handler (there is
a comment marking the right place).
Start out with a new macro AStackPlot.cxx
which will contain your plotting code. You could make this part of ATestRun.cxx
,
but I don't recommend that, since you would have to rerun your event
loop every time you want to change your plotting style. In the
beginning you first have to load all your packages again
#include <vector>

void AStackPlot (const std::string& submitDir)
{
  // Load the libraries for all packages
  gROOT->ProcessLine(".x $ROOTCOREDIR/scripts/load_packages.C"); |
To make our life simpler, define the plot parameters at the beginning of our code:
// the name of the histogram we are trying to plot
std::string histName = "AHist"; |
The way our code will work is that we first create all the histograms and then plot them, so we need some variables to hold the histograms in between (you'll also need to add an include for <vector> at the beginning):
// these vectors contain the data and mc histograms
std::vector<TH1*> data, mc; |
Now we use the ability of SampleHandler to loop over events to get all the histograms we need:
// create and initialize the sample handler
SH::SampleHandler sh;
sh.load (submitDir + "/hist");
// add any extra meta-data you desire here

for (unsigned samplesIter = 0; samplesIter != sh.size(); ++ samplesIter)
{
  // the sample we are plotting
  SH::Sample *sample = sh.at (samplesIter);
  // determine whether this is data or MC
  bool isMC = sample->name().find ("data") == std::string::npos;
  // retrieve the histogram
  TH1 *hist = sample->readHist (histName);
  // store the histogram for later.
  if (isMC){
    mc.push_back (hist);
  } else{
    data.push_back (hist);
  }
} // end for loop over samples |
Since we want to make nice data-MC comparison stack plots we now have to do a bit of addition to get the correct plots:
// add together the histograms for proper stack-plots
for (unsigned iter = 1; iter < data.size(); ++ iter){
  data[0]->Add (data[iter]);
}
for (unsigned iter = 1; iter < mc.size(); ++ iter){
  mc[mc.size()-1-iter]->Add (mc[mc.size()-iter]);
} |
And now just plot the histograms in the right order in a decent style:
// now plot all mc histograms
for (unsigned iter = 0; iter < mc.size(); ++ iter)
{
  mc[iter]->SetLineColor (2 + iter);
  mc[iter]->SetFillColor (2 + iter);
  mc[iter]->SetFillStyle (3001);
  mc[iter]->SetTitle (0);
  mc[iter]->Draw (iter ? "SAME HIST" : "HIST");
}
// and if we have data, plot that too
if (!data.empty())
{
  data[0]->SetMarkerStyle(20);
  data[0]->SetTitle (0);
  data[0]->Draw (mc.empty() ? "ERR" : "SAME ERR");
}
} |
Now we have a complete plot macro we can try. First re-run your event-loop to make sure that all the meta-data is properly propagated:
root -l -b -q 'ATestRun.cxx ("submitDir")' |
Now run your new macro, this should create a new canvas with your stack plot:
root -l 'AStackPlot.cxx ("submitDir")' |
While the macro we just created does produce a plot, the plot is not very meaningful: the samples are not properly normalized to each other. To take care of that we use the luminosity meta-data we stored in the sample handler. To do that properly we need to keep track of the luminosity in data, so create a variable to store it (before the sample loop):
// the total luminosity of all data histograms
double data_lumi = 0; |
Inside the sample loop, normalize the MC to a fixed luminosity, and keep track of the data luminosity:
// evaluate the luminosity
// note the explicit use of meta-value default values
if (isMC)
  hist->Scale (1./sample->getMetaDouble ("lumi", 1));
else
  data_lumi += sample->getMetaDouble ("lumi", 0); |
Now after the sample loop, but before you add the histograms together, apply the data luminosity as a normalization factor to the MC:
// do something meaningful if we don't have any data luminosity
if (data_lumi <= 0){
  data_lumi = 100;
}
// apply data luminosity to MC (if available)
for (unsigned iter = 0; iter < mc.size(); ++ iter){
  mc[iter]->Scale (data_lumi);
} |
Now your macro should create a new canvas with your stack plot:
root -l 'AStackPlot.cxx ("submitDir")' |
Note that when I last tried this, the luminosities didn't seem to add up; I'm not sure why, as they used to in the past. On the other hand, they are close, and this is more of a proof of principle anyway. For a real analysis we would have to select only good electrons and apply a GRL selection to use only valid events.
While this plot is now kind of correct from a physics side, it is still somewhat meaningless, in that we can't tell which distribution is which. To fix that, we want to add a key to our plot. As a first step, we store the label in the title of the histogram. So in the sample loop add the following code:
// set the title of the histogram to the label we want to use.
// note that we use the sample name if no label was set
hist->SetTitle (sample->getMetaString ("label", sample->name()).c_str()); |
Now add some code after the sample loop:
// turn off the statistics box
gStyle->SetOptStat (0);
gStyle->SetOptFit (0);
// create a key instead of the statistics box
TLegend *legend = new TLegend(0.6666667,0.809322,0.9885057,0.9957627,NULL,"brNDC"); |
Then before we plot the histograms, add them to the key:
// now add all mc histograms to the key
for (unsigned iter = 0; iter < mc.size(); ++ iter)
{
  std::string label = mc[iter]->GetTitle();
  // drawing code
  legend->AddEntry (mc[iter], label.c_str());
}
// and if we have data, add that to the key too
if (!data.empty())
{
  // drawing code
  legend->AddEntry (data[0], "data");
} |
And then in the end, also draw the key:
// don't forget to add the key
legend->Draw (); |
Please note that this example now uses all the meta-data from the original samples, which got passed through to the derived samples.
MultiDraw is a package that was developed to allow using TTree::Draw-like formulas inside EventLoop. There are two ways of doing so: by using MultiDraw algorithms directly, or by incorporating MultiDraw formulas inside your own algorithms. The former is envisioned to be useful for quickly adding a couple more plots to your analysis. The latter is useful if you want to make it easier to configure your algorithms.
The big advantage of using MultiDraw formulas is that they are typically very easy to write in simple cases. The disadvantage is that more complicated cases can result in very long formulas or may be impossible to express at all. One of the underlying assumptions is that what you have in your n-tuple is very close to your final physics quantities, i.e. all the cleaning and correction tools have already been applied. You can still use MultiDraw in other situations, but its value will be reduced.
As the simplest exercise, let us simply add the electron pt spectrum to the list of our final histograms. We are already doing this in our algorithm, but now we will be doing it in MultiDraw, allowing for a direct comparison.
All we need to do is add another algorithm to our steering macro ATestRun.cxx
:
job.algsAdd (new MD::AlgHist (new TH1F("MDHist", "MDHist", 100, 0, 1.e5), "el_pt")); |
Now to see that it is actually doing the right thing, let's overlay our old histogram with the new one:
SH::SampleHandler sh_hist; sh_hist.load (submitDir + "/hist"); TH1 *hist = sh_hist.get ("user.krumnack.pat_tutorial_data.v1")->readHist ("AHist"); hist->Draw (); TH1 *hist2 = sh_hist.get ("user.krumnack.pat_tutorial_data.v1")->readHist ("MDHist"); hist2->SetLineColor (2); hist2->SetLineStyle (2); hist2->Draw ("SAME"); |
Now let's run again: inside the Run directory, call the command below. If you already have a submitDir from an earlier run, you have to either rename or remove it first:
root -l 'ATestRun.cxx ("submitDir")' |
It should now show you our original histogram in black and the new histogram overlaid using a red dotted line. Both histograms should turn out the same.
MultiDraw can also fill multi-dimensional histograms and apply weights to each event; just pass additional formulas into the constructor, as sketched below.
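A sketch of what that could look like. This assumes that AlgHist takes one formula per histogram axis and treats one extra trailing formula as a per-event weight; the histogram names, the binning and the mcevt_weight branch are only examples, so check the MultiDraw documentation for the exact convention:
// 2D histogram: electron pt vs eta (one formula per axis; names and binning are examples)
job.algsAdd (new MD::AlgHist (new TH2F ("MDHist2D", "MDHist2D", 100, 0, 1.e5, 50, -2.5, 2.5),
                              "el_pt", "el_eta"));

// 1D histogram filled with a per-event weight
// (assumed convention: the extra formula is the weight; "mcevt_weight[0][0]" is just an example branch)
job.algsAdd (new MD::AlgHist (new TH1F ("MDHistW", "MDHistW", 100, 0, 1.e5),
                              "el_pt", "mcevt_weight[0][0]")); |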
A slightly more complicated, but quite powerful use of MultiDraw algorithms is to create a cut flow. A cut flow is a histogram that shows the number of events (or objects) remaining after each cut and is commonly used when comparing analyses.
All we need to do is add another algorithm to our steering macro ATestRun.cxx
. For that we first create a histogram describing the cut flow and then use it to create a cut flow algorithm:
TH1 *cflow = new TH1D ("cflow", "cutflow", 10, 0, 10); cflow->GetXaxis()->SetBinLabel (2, "el_n>=2"); cflow->GetXaxis()->SetBinLabel (3, "el_pt[0]>=25e3"); cflow->GetXaxis()->SetBinLabel (4, "el_pt[1]>=20e3"); job.algsAdd (new MD::AlgCFlow (cflow)); |
To check that it is actually doing the right thing, change the histogram drawing section of the steering macro to display the cut flow:
SH::SampleHandler sh_hist; sh_hist.load (submitDir + "/hist"); TH1 *hist = sh_hist.get ("user.krumnack.pat_tutorial_data.v1")->readHist ("cflow"); hist->Draw (); |
Now let's run again: inside the Run directory, call the command below. If you already have a submitDir from an earlier run, you have to either rename or remove it first:
root -l 'ATestRun.cxx ("submitDir")' |
It should now show a cut flow, which is a histogram that gets smaller with each consecutive cut. Note that the first and the second bin are the same, because our sample is filtered to contain only di-electron events.
MultiDraw can also create more complicated cut flows, for details see the MultiDraw documentation.
Another form of using MultiDraw is to use its formulas in your own algorithm. This has the advantage of allowing your algorithms to be configured to use arbitrary input variables and simple combinations of input variables without having to hard-code these variables in your algorithm. On the downside it is slower than the hard coded variables, so you should only use this feature for variables that you actually intend to configure this way.
First we need to tell RootCore that we will be depending on the MultiDraw package. Open the Makefile.RootCore
in the cmt
directory of our MyAnalysis
package, and add MultiDraw
to PACKAGE_DEP
. So the PACKAGE_DEP
line should look something like this (if you have other packages listed here, don't remove them):
PACKAGE_DEP = EventLoop D3PDReader MultiDraw |
Next add the include for MultiDraw to the header file of our analysis algorithm (MyFirstD3PDAnalysis.h
):
#include <MultiDraw/Global.h> |
And then add a cut string to our algorithm class. This is a configuration option, so we should not use the //!
and have to make it public:
public: std::string cutString; |
We will need access to the formula service and a single formula, so let's add that to the class (note that the //!
is important):
MD::FormulaSvc *formSvc; //! const MD::Formula *cutFormula; //! |
We will add another histogram so that we can compare the result with and
without the cut, so let's add that to the class (note that the //!
is important):
TH1 *cutHist; //! |
Now in the source file (MyFirstD3PDAnalysis.cxx
) add some more includes we need:
#include <MultiDraw/Formula.h>
#include <MultiDraw/FormulaSvc.h> |
Now in the setupJob
function we need to specify that we use MultiDraw formulas in our job:
MD::useFormulas (job); |
And in the initialize
function we need to register our formula with the formula service:
formSvc = MD::formulas (wk()); cutFormula = formSvc->addForm (cutString); |
And we also need to add a histogram for the result in initialize
as well:
cutHist = new TH1F("cutHist", "cutHist", 100, 0, 1.e5); wk()->addOutput (cutHist); |
Within the execute
function we check if the event passes our cut, and if it does we fill cutHist
just like we did for AHist
except we allow our cutString
to encode a weight as well:
double weight = cutFormula->value (0); if (weight > 0) { for( Int_t i = 0; i < event->el.n(); ++i ) cutHist->Fill(event->el[i].pt(), weight); }; |
Now we need to compile our package again. Since we changed our dependencies, you have to run find_packages.sh
again. Remember that this has to be done in the ROOTTutorial
directory:
$ROOTCOREDIR/scripts/find_packages.sh
$ROOTCOREDIR/scripts/compile.sh |
Now we need to update our steering macro (ATestRun.cxx
) to set the cut string in our algorithm. Add this code where it says to put the algorithm configuration:
alg->cutString = "jet_n>=3"; |
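Since execute uses the formula's value as a weight, the cut string can also carry a per-event weight. A purely illustrative example (the weight branch name is an assumption, not something defined in this tutorial):
// select 3-jet events and weight each of them with an (example) per-event weight branch
alg->cutString = "(jet_n>=3)*mcevt_weight[0][0]"; |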
In order to see the effect of the cut, let's change the lines which display the final histogram to:
SH::SampleHandler sh_hist; sh_hist.load (submitDir + "/hist"); TH1 *hist = sh_hist.get ("user.krumnack.pat_tutorial_data.v1")->readHist ("AHist"); hist->Draw (); TH1 *hist2 = sh_hist.get ("user.krumnack.pat_tutorial_data.v1")->readHist ("cutHist"); hist2->SetLineColor (2); hist2->Draw ("SAME"); |
Now let's run again: inside the Run directory, call the command below. If you already have a submitDir from an earlier run, you have to either rename or remove it first:
root -l 'ATestRun.cxx ("submitDir")' |
You should now see two histograms of the electron pt spectrum, with the black one showing all events and the red one only the events passing our cut. The latter should always be smaller than the former, though hopefully not by much.
While our general recommendation is that you work through the tutorial code one step at a time, there is also a version of the code in SVN, which you can check out if you want to investigate particular problems.
Simply check out the control package:
svn co svn+ssh://svn.cern.ch/reps/atlasusr/krumnack/tutorial/trunk/control |
Make sure you have ROOT set up, then source the following script to set up a couple more control variables:
source control/setup.sh lxplus BaseVersion |
(The second argument selects which version of the tutorial code to use; see control/versions for a list.)
Now build it:
control/build |
And run it as many times as you want:
control/run |