Tools: MultiSamplePlot
The MultiSamplePlot class may look rather complicated code-wise. Nevertheless, it is flexible enough for most use cases, so it is normally not necessary to understand what is 'under the hood'. The class provides a way to superimpose data on stacked Standard Model (SM) processes and overlaid new physics processes. Data/MC ratio plots and error bands around the nominal SM histogram can be drawn as well. Canvases are written both as regular 'luminosity normalized' plots (SM MC normalized to the data luminosity) and as 'area normalized' plots (SM MC normalized to the total number of events in data), each in linear and log scale. Additional user-specified text can be written on the canvases. Merging of processes and the color and style of the histograms can be configured via the input dataset xml file. In addition to the canvases, plain TH1F histograms for each input sample are saved in the output file.
Create a map of MultiSamplePlot objects and define an MSPlot given a vector of Dataset objects, for example read from an xml file:
map<string,MultiSamplePlot*> MSPlot;
string xmlFileName = "config/Run2SingleLepton_samples.xml";
TTreeLoader treeLoader;
vector < Dataset* > datasets;
treeLoader.LoadDatasets(datasets, xmlFileName.c_str()); // fill the vector of Dataset objects from the xml file
MSPlot["MS_MET_mu"] = new MultiSamplePlot(datasets,"MET", 30, 0, 300, "Missing transverse energy (GeV)");
The MultiSamplePlot constructor syntax is similar to that of a standard TH1F: here "MET" is the name of the MSPlot, and "Missing transverse energy (GeV)" is the title displayed on the x axis. The numbers in between are the number of bins and the range of the x axis. (Note that there is also an MSPlot constructor that takes vectors of histograms as input, which is useful when your input consists only of histograms from an earlier run and you are not looping over events.) Check the MultiSamplePlot.h header to see how to specify the y axis label and additional text to be displayed on the canvas.
The MSPlot constructed above should be filled inside a loop over the datasets and, within it, a loop over the events of the current dataset:
MSPlot["MS_MET_mu"]->Fill(mets[0]->Et(), datasets[d], true, Luminosity*scaleFactor);
For MC datasets, be careful that, for proper scaling of the sample, Luminosity is specified in the same units as the equivalent luminosity 'EqLumi', which is set for each Dataset object via the 'EqLumi' element in the xml file. The equivalent luminosity is defined as the number of events in the toptree without preselection divided by the cross section, and has to be entered by hand by the user. Note that, with this approach, the 'xsection' specified in the xml file is not used directly in the scaling of the samples!
At the end of your macro, you can 'draw' and 'write' your map of MultiSamplePlots as follows:
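As a worked illustration of the units issue, the sketch below computes EqLumi and the resulting per-event weight. It assumes that MultiSamplePlot scales each MC sample by Luminosity / EqLumi. The matching-units requirement suggests this, but it is not spelled out here. The helper names equivalentLumi and mcLumiWeight are hypothetical and not part of the framework.

```cpp
#include <cassert>

// EqLumi of an MC sample: number of events in the toptree (without
// preselection) divided by the cross section. With the cross section
// in pb, the result is in pb^-1.
double equivalentLumi(double nEventsNoPreselection, double xsectionPb)
{
    assert(xsectionPb > 0.0);
    return nEventsNoPreselection / xsectionPb;
}

// Assumed per-event weight that normalizes an MC sample to the data
// luminosity (hypothetical helper). dataLumi and eqLumi must be given
// in the same units, otherwise the weight is off by the unit ratio.
double mcLumiWeight(double dataLumi, double eqLumi)
{
    assert(eqLumi > 0.0);
    return dataLumi / eqLumi;
}
```

For example, a sample with 10^6 events and a cross section of 500 pb has EqLumi = 2000 pb^-1. With Luminosity = 1000 pb^-1, each event then gets a weight of 0.5. Passing Luminosity = 1 (meaning 1 fb^-1) while EqLumi is in pb^-1 would instead give a weight of 0.0005, a factor 1000 too small.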
for(map<string,MultiSamplePlot*>::const_iterator it = MSPlot.begin(); it != MSPlot.end(); it++)
{
MultiSamplePlot *temp = it->second;
string name = it->first;
temp->showNumberEntries(showEntriesLegend);
temp->setPreliminary(setCMSPrelim);
if(!runonData) temp->setDataLumi(Luminosity); //in order to set the data luminosity text even if you don't run on data...
temp->setErrorBandFile(errorbandfile, dosystfile);
temp->Draw(name,RatioType, addRatioErrorBand, addErrorBand, ErrorBandAroundTotalInput, scaleNPSignal);
temp->Write(fout, name, savePNG, outputDirectory, "png");
}
With this approach, the MultiSamplePlot options are configured by a few booleans, integers and strings that you can define at the beginning of your macro:
//MultiSamplePlot options
bool showEntriesLegend = false; //to show number of (weighted) events of the samples in the legend
bool setCMSPrelim = false; //if true, will display "CMS Preliminary", otherwise "CMS"
int RatioType = 0; //0: no ratio plot, 1: ratio = data/MC, 2: ratio = (data-MC)/MC
bool addErrorBand = false; //display an error band around the stacked SM MC on the main canvas
bool addRatioErrorBand = false; //display an error band on the ratio plot below the main canvas
bool ErrorBandAroundTotalInput = false; //see dedicated discussion below.
string errorbandfile = "ErrorBands/ErrorBandFile_15Jul15.root"; //a root file containing systematically shifted distributions to create error bands around the stacked SM MC. See dedicated discussion below.
bool dosystfile = false; //see dedicated discussion below.
int scaleNPSignal = 20; //determines the factor with which the new physics signal samples are scaled, only on the canvas (note that the TH1F histogram in the MSPlot output root file itself is not scaled with this factor!)
bool savePNG = false; //automatically save png files of MSPlots.
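To make the RatioType setting concrete, here is a small sketch of the two per-bin ratio definitions listed above, computed from plain arrays of bin contents. The function ratioPerBin is hypothetical and not taken from the MultiSamplePlot implementation. In particular, the treatment of bins with zero MC content is an assumption here: they are simply set to 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the RatioType option:
//   1: ratio = data/MC, 2: ratio = (data-MC)/MC (0 means no ratio plot).
// Assumption: bins with no MC content are left at 0 to avoid dividing by zero.
std::vector<double> ratioPerBin(const std::vector<double>& dataBins,
                                const std::vector<double>& mcBins,
                                int ratioType)
{
    assert(dataBins.size() == mcBins.size());
    assert(ratioType == 1 || ratioType == 2);
    std::vector<double> ratio(dataBins.size(), 0.0);
    for (std::size_t i = 0; i < dataBins.size(); ++i)
    {
        if (mcBins[i] == 0.0) continue; // empty MC bin: keep 0
        ratio[i] = (ratioType == 1) ? dataBins[i] / mcBins[i]
                                    : (dataBins[i] - mcBins[i]) / mcBins[i];
    }
    return ratio;
}
```

A bin with 12 data events and 10 expected MC events gives 1.2 for RatioType = 1 and 0.2 for RatioType = 2.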
Drawing error bands is rather complicated, so this functionality is discussed separately here. MultiSamplePlot supports two ways of producing the error bands:
- Option 1: The user has produced a root file with systematically shifted histograms that can be used directly as the 'up' and 'down' edges of the error band, lying above and below the total nominal SM distribution bins, respectively. An example macro that produces error bands in this format (adapt it to your needs!) is TopBrussels/FCNCAnalysis/CreateErrorBands.cc, which takes as input the nominal and all specified systematic shapes of all specified processes. The error bands are obtained by adding all systematic shifts in quadrature. The produced file ("errorbandfile") can then be passed to the MultiSamplePlot objects in the actual analysis/plotting macro by setting dosystfile = false in temp->setErrorBandFile(errorbandfile,dosystfile).
- Option 2: The user has produced a root file with systematically shifted histograms where the 'up' and 'down' variations are not necessarily above and below the total nominal SM distribution bins, respectively. This is the case, for instance, when you have only one dominant systematic uncertainty and want to ignore the others. The produced file ("errorbandfile") can then be passed to the MultiSamplePlot objects in the actual analysis/plotting macro by setting dosystfile = true in temp->setErrorBandFile(errorbandfile,dosystfile).
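For the first option, the error band is built by adding all systematic shifts in quadrature. The sketch below shows one way to do that bin by bin. It assumes that each systematic's larger upward deviation feeds the upper edge and its larger downward deviation feeds the lower edge. This convention, and the names ErrorBand and quadratureBand, are assumptions for illustration. CreateErrorBands.cc may treat the shifts differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the two band edges, one value per bin.
struct ErrorBand
{
    std::vector<double> up;   // upper edge of the band
    std::vector<double> down; // lower edge of the band
};

// Per-bin quadrature sum of systematic shifts around the nominal total SM.
// Assumed convention: for each systematic, the largest positive deviation of
// its up/down variation enters the upper edge, the largest negative one the lower edge.
ErrorBand quadratureBand(const std::vector<double>& nominal,
                         const std::vector<std::vector<double>>& sysUp,
                         const std::vector<std::vector<double>>& sysDown)
{
    assert(sysUp.size() == sysDown.size());
    for (std::size_t s = 0; s < sysUp.size(); ++s)
        assert(sysUp[s].size() == nominal.size() && sysDown[s].size() == nominal.size());

    ErrorBand band{nominal, nominal};
    for (std::size_t bin = 0; bin < nominal.size(); ++bin)
    {
        double sumUp2 = 0.0;
        double sumDown2 = 0.0;
        for (std::size_t s = 0; s < sysUp.size(); ++s)
        {
            const double dUp = sysUp[s][bin] - nominal[bin];
            const double dDown = sysDown[s][bin] - nominal[bin];
            const double pos = std::max({dUp, dDown, 0.0});
            const double neg = std::min({dUp, dDown, 0.0});
            sumUp2 += pos * pos;
            sumDown2 += neg * neg;
        }
        band.up[bin] = nominal[bin] + std::sqrt(sumUp2);
        band.down[bin] = nominal[bin] - std::sqrt(sumDown2);
    }
    return band;
}
```

With a nominal bin content of 100 and two systematics shifting it to (110, 95) and (103, 96), this band runs from 100 - √41 ≈ 93.6 to 100 + √109 ≈ 110.4.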
Both options are supported for historical reasons; eventually, only one standardized approach may be implemented in the framework.
The boolean ErrorBandAroundTotalInput only makes sense if dosystfile = false in temp->setErrorBandFile(errorbandfile,dosystfile), i.e. only for option 1 above. It indicates whether the error band should be displayed around the nominal SM distribution that was used to calculate the shifts. Setting the boolean to false is the usual choice. Setting it to true instead means that the error band is placed around the total SM MC of your current run, which can be, e.g., the nominal scaled by an additional factor, so that the error band is shifted along with it.
Here are some features the user should know in order to understand what is plotted:
- 'Real data' should be specified in the xml file with a 'name' starting with "Data", "data" or "DATA". The MultiSamplePlot class recognizes this identifier and displays the data histogram as black dots with error bars, superimposed on the stacked MC.
- Other SM MC samples (or data-driven estimates) can be named as you wish, and are stacked on top of each other by default. The 'title' in the dataset configuration in the xml file is the one shown in the legend. Important: all samples with exactly the same 'title', regardless of their 'names', are merged together and get one legend entry. In this way you can, for example, group the different single-top samples into one single-top entry. Note that this merging is only done on the canvas; the histograms in the MSPlot output root file are unmerged.
- If you want to overlay, for example, new physics processes instead of stacking them on the SM MC, give the sample a 'name' in the xml config starting with "NP_overlay_". If instead you want to run on new physics signal samples but not display them on the canvas, give them a 'name' in the xml config starting with "NP_".
- The histogram color, line style (e.g. solid or dashed) and line width displayed in the MSPlot are configured in the xml file by the 'color', 'ls' and 'lw' elements, respectively. Note that the line style and width are only relevant for overlaid processes (like new physics samples), as regular SM MC is drawn as a filled histogram.
- The style of MultiSamplePlots is improved (larger axis labels, better positioning of legend and text) if you add '#include "Style.C"' to your macro and call 'setTDRStyle();' at its beginning. Style.C can be fetched from TopBrussels/FCNCAnalysis/Style.C.
- Check out some extra functions (to display text on the canvas, ...) in the MultiSamplePlot.h header file.
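The naming and merging conventions above can be summarized in a short sketch. A sample's role follows from the prefix of its 'name', and stacked samples with the same 'title' share one legend entry. The types and functions (SampleRole, Sample, classifySample, mergeStackedByTitle) are hypothetical. They only mirror the rules described here and are not the MultiSamplePlot code. Restricting the title merging to stacked samples is also an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical roles a sample can play on the canvas.
enum class SampleRole { Data, Stacked, Overlay, Hidden };

// Hypothetical mirror of the xml 'name' rules described above.
SampleRole classifySample(const std::string& name)
{
    assert(!name.empty());
    auto startsWith = [&name](const std::string& prefix) {
        return name.compare(0, prefix.size(), prefix) == 0;
    };
    if (startsWith("Data") || startsWith("data") || startsWith("DATA"))
        return SampleRole::Data;    // black dots with error bars
    if (startsWith("NP_overlay_"))  // must be tested before the shorter "NP_"
        return SampleRole::Overlay; // drawn as a line on top of the stack
    if (startsWith("NP_"))
        return SampleRole::Hidden;  // run on, but not drawn
    return SampleRole::Stacked;     // regular SM MC (or data-driven estimate)
}

// Hypothetical sample record; only the fields needed for the example.
struct Sample
{
    std::string name;  // 'name' from the xml file
    std::string title; // 'title' from the xml file (legend text)
    double yield;      // weighted number of events, for illustration
};

// Assumed merging: stacked samples with exactly the same title are summed
// into one legend entry (canvas only; stored histograms stay unmerged).
std::map<std::string, double> mergeStackedByTitle(const std::vector<Sample>& samples)
{
    std::map<std::string, double> legend;
    for (const Sample& s : samples)
        if (classifySample(s.name) == SampleRole::Stacked)
            legend[s.title] += s.yield;
    return legend;
}
```

Note that "NP_overlay_" has to be checked before the shorter "NP_" prefix, since every overlay name also starts with "NP_".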