r5 - 04 Dec 2012 - 13:54:41 - Main.fiorini

Analysis with MVA (2011)

Prerequisites

Setting up ROOT and XROOTD

Edit your .bashrc file and add the following lines, adjusting the ROOT and gcc versions to your needs.

For ROOT:

  • . /afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/setup.sh
  • source /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.02/x86_64-slc5-gcc43-opt/root/bin/thisroot.sh

For XROOTD:

  • BUILD=x86_64-slc5-gcc43-opt
  • export XRDPATH=/afs/cern.ch/sw/lcg/external/xrootd/3.2.4/$BUILD
  • export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$XRDPATH/lib64
  • export PATH=$PATH:$XRDPATH/bin

To get plots with the default settings used in ATLAS, add the file ".rootlogon.C" to your home directory. The file can be found at the end of this twiki.

Check-out code

Do the following:

  • export SVNROOT=svn+ssh://lYOURCERNUSERNAMEHERE@svn.cern.ch/reps/ificuwrepo
  • svn co $SVNROOT/PhysTools/httll/plots_mva2011 plots_mva2011

Compile Libraries

The only library that must be compiled for the macro to work is CalibrationDataInterface (2011 version).

It can be obtained from the HiggsToTauTauToLL Twiki.

Compiling

Place the CalibrationDataInterface folder inside the "lib" folder, enter it, and go to its 'scripts' folder. Then run:

  • $ ./CompileStandAlone.sh

Linking

If you look at "plot_mvavbf.h", the header looks for the libraries in the following way:

#include "CalibrationDataInterface/CalibrationDataInterfaceROOT.h"

So, for each library, you have to make a symbolic link inside the 'plots' folder that points to the corresponding library folder. Example:

  • $ ln -s ../CalibrationDataInterface/CalibrationDataInterface/ .

Files

Three kinds of files have to be in the right location for the plotting to work. These are:

  • Data (background MC, signal MC and real data)
  • Z embedding
  • Fake Leptons

The easiest way to point the code to the files is to place them in a known folder and then set the following variables to the correct directories: "dir_data", "dir_fake", and "dir_embed".

The files have to be renamed as:

"bkg_cutX.root", "sig_cutX.root", "data_cutX.root", "embed_cutX.root"

with X being the number of the cut: "1", "2n", "3", "3_2j".
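As an illustration of the naming scheme, the following sketch builds the full name of an input file from a directory, a sample type and a cut label. The helper `input_file_path` is hypothetical and is not part of plot_mvavbf; it assumes the directory string already ends in "/".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of plot_mvavbf):
// builds "<dir><sample>_cut<cut>.root".
// Assumption: `dir` is non-empty and already ends with '/'.
std::string input_file_path(const std::string& dir,
                            const std::string& sample,  // "bkg", "sig", "data" or "embed"
                            const std::string& cut) {   // "1", "2n", "3" or "3_2j"
    assert(!dir.empty() && dir.back() == '/');
    return dir + sample + "_cut" + cut + ".root";
}
```

For example, `input_file_path("ntuples/", "bkg", "2n")` gives "ntuples/bkg_cut2n.root", where "ntuples/" is a placeholder directory.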

Current path of the input files:

The current directories of the files are:

Data: dir_data = "root://valtical.cern.ch//localdisk/xrootd/users/daalvare/httll/ntuples_hcp_2011/v1/data/"

Fake: dir_fake = "root://valtical.cern.ch//localdisk/xrootd/users/daalvare/httll/ntuples_hcp_2011/v1/fake/"

Embed: dir_embed = "root://valtical.cern.ch//localdisk/xrootd/users/daalvare/httll/ntuples_hcp_2011/v1/embed/"

MVA

Loading

Once the libraries have been compiled, ROOT has to load them in every session before plotting. This is done with 'gSystem->Load'. After the libraries have been loaded, the macro itself has to be loaded as well.

  • root[0] gSystem->Load("../CalibrationDataInterface/scripts/libCalibrationDataInterface.so");
  • root[1] gROOT->LoadMacro("./plot_mvavbf.cxx+");

Plotting

Once the macro has been successfully loaded, you can execute it to plot a particular cut. The scheme to follow is:

root[2] plot_mvavbf *p = new plot_mvavbf("var", "cut", "chan", "dir");

The four arguments are:

  • "var": The variable to plot. It has to be one of the following:
"mtautau" for the mass, or "mvaztt", "mvaother" or "mvacomb" for the MVA score.

  • "cut": The main MVA selection. It has to be one of the following:
"mvavbfztt", "mvavbfother", "readervbf", or one of the final mass cuts: "rvbfcomb", "rvbfcombveto" or "9c".

  • "chan": The channel to analyse. The default is "all". It has to be one of the following:
"ee", "emu", "mumu", "sf", "df", "EMU", "MUE" or "all", where "e" stands for electron and "mu" for muon, "sf" means leptons of the Same Flavour (ee + mumu), "df" means leptons of Different Flavour (emu + mue), "all" means all channels, and "EMU" and "MUE" refer to the leading lepton.

  • "dir": The subdirectory where the output files (png, eps, pdf, macros, rootfiles and log files) will be saved. For example, if you write "test", the plots will be saved in "results/mva/test/".
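The channel list and the output location described above can be summarised in a short sketch. Both helpers, `is_valid_channel` and `output_dir`, are hypothetical and are not part of plot_mvavbf; they only mirror the values listed on this page and may be handy for checking arguments before a long ROOT session.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of plot_mvavbf):
// true if `chan` is one of the channel names listed above.
bool is_valid_channel(const std::string& chan) {
    static const std::vector<std::string> channels = {
        "ee", "emu", "mumu", "sf", "df", "EMU", "MUE", "all"};
    return std::find(channels.begin(), channels.end(), chan) != channels.end();
}

// Hypothetical helper: folder where the plots for a given "dir" argument
// end up, following the "results/mva/<dir>/" pattern described above.
std::string output_dir(const std::string& dir) {
    return "results/mva/" + dir + "/";
}
```

With these, `is_valid_channel("emu")` is true, `is_valid_channel("tautau")` is false, and `output_dir("test")` returns "results/mva/test/".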

MVA Operation

Introduction

MVA is a MultiVAriate framework for the analysis. It computes output variables from weighted linear combinations of several input variables. TMVA methods are classified according to the algorithm used to compute the outputs. In the SetupTMVA function you can select which kinds of algorithms to use in the analysis (Cuts, BDT, Likelihood and NN).

For this analysis the Neural Network (NN) algorithm is mainly used. When an event enters the NN, the NN uses its input variables (weighted in a specific way) to decide whether the event corresponds to signal or to background. It does so by assigning the event a number between 0 and 1 that represents how likely the event is to be signal. This number is called the MVA "score".

Before the NN can be used on data, it has to be trained. For that, we use a set of MC events (for which we know whether they are signal or background) to train the NN to select events correctly. Along with the training, a test is performed to avoid "overtraining", that is, to prevent the NN from becoming too specific to this particular set of events and losing its unbiased discriminating power.

NN Operation

The current configuration of the MVA analysis for LepLep uses two NNs: one against the Z to tautau background and another against the rest of the backgrounds. In both cases the signal is only the VBF signal. The results of both NNs are then merged into a final NN, which is the one used to select the data events. After all events have been assigned an MVA score, a final cut is applied: all events that do not pass a certain threshold (that is, mostly background) are discarded, which reduces the number of background events and increases both the Signal/Background ratio and the Significance.
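How a score threshold changes the Signal/Background ratio can be sketched with a few lines of standalone C++. This is only an illustration: the `Event` and `Yields` types are hypothetical, and the figure of merit S/sqrt(B) is a commonly used approximation that is assumed here, not necessarily the significance definition used in the analysis code.

```cpp
#include <cmath>
#include <vector>

// Hypothetical event record: MVA score, event weight and MC truth label.
struct Event {
    double score;
    double weight;
    bool is_signal;
};

// Weighted signal and background yields.
struct Yields {
    double signal = 0.0;
    double background = 0.0;
};

// Sum the weights of the signal and background events whose score
// is strictly above the threshold.
Yields yields_above(const std::vector<Event>& events, double threshold) {
    Yields y;
    for (const Event& e : events) {
        if (e.score <= threshold) continue;  // event fails the cut
        if (e.is_signal) {
            y.signal += e.weight;
        } else {
            y.background += e.weight;
        }
    }
    return y;
}

// Assumed figure of merit: S / sqrt(B). Returns 0 when there is no background.
double simple_significance(const Yields& y) {
    return y.background > 0.0 ? y.signal / std::sqrt(y.background) : 0.0;
}
```

Comparing `yields_above(events, t)` for a very low threshold and for the working threshold shows how much background is removed relative to the signal that is lost.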

The NN needs three inputs. The first is the set of MC samples selected for training and testing. The signal is currently only the VBF signal. The background MC is either Ztautau or everything but Ztautau, depending on the NN you are working with. The samples are selected automatically by the cut variable. The second is the list of input variables to be used in the analysis. They have to be modified (added or removed) in two places: in the function SetupTMVA, where the NN is configured, and at the beginning of the code, in the variable "tvars". The variables have to be set up in exactly the same order in both places.

The third one is the cut selected. It can be:

  • "mvavbfztt" for the Ztautau NN
  • "mvavbfother" for the NN against the rest of the backgrounds
  • "readervbf" to merge both previous NNs into a single one
  • "rvbfcomb" to use the final NN on the data and get the events which pass the threshold
  • "rvbfcombveto" to use the final NN on the data and get the events which do NOT pass the threshold
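The requirement that the input variables appear in the same order in SetupTMVA and in "tvars" is easy to get wrong when variables are added or removed. The sketch below shows one way to check it, assuming both lists are available as vectors of names. The function `first_mismatch` is hypothetical and is not part of the analysis code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first position where the
// two variable lists differ, or -1 if they hold the same names in the
// same order. A list that is a strict prefix of the other mismatches at
// the index where the shorter list ends.
long first_mismatch(const std::vector<std::string>& a,
                    const std::vector<std::string>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return static_cast<long>(i);
    }
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

A return value of -1 means the two lists agree. Any other value is the position of the first variable to fix.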

All the NNs save their outputs in the folder "results/mva/..." and in the folder "weights". While the results only show the performance and the plots, the weights files (xml) are the files needed to make the combination. If you want to keep them for the final NN, you have to copy them to the folder "reader/weights_ztt" (or "weights_other"). Otherwise the weights will be overwritten. If you select "mvaztt", "mvaother" or "mvacomb" as the "var" to plot, you will get the distribution of events as a function of their MVA score. (Note that this plot can only be made after the NN has been executed at least once.)

As said before, to use the reader you have to place the corresponding weights for Ztautau and "otherbkg" in their respective folders inside "reader/". Note that a file has to exist inside the folder "weights_comb" for the reader to work, even if it is not the correct one (it will be overwritten automatically, but if it does not exist in advance, the reader will fail). After the reader has been executed, the final weights files (xml) will appear in the folder "weights", from where they then have to be copied to the "weights_comb" folder. (?)

Lastly, using rvbfcomb and rvbfcombveto (along with the mtautau var) shows the mass plot of the remaining events after the cut has been applied. The threshold for the cut is currently set to 0.8, that is, all events whose MVA score is smaller than 0.8 are discarded and become the Boosted category, and those whose score is bigger than 0.8 form the VBF category. This threshold may be modified in the "decide_cut" module.
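The categorisation by score can be written down in a few lines. This is a minimal sketch, not the actual "decide_cut" code: the names `Category` and `categorize` are hypothetical, and the treatment of a score exactly equal to the threshold is an assumption, since the text above only covers scores smaller or bigger than 0.8.

```cpp
// Hypothetical names, not taken from the analysis code.
enum class Category { VBF, Boosted };

// Scores above the threshold go to the VBF category, all others to Boosted.
// Assumption: a score exactly equal to the threshold is treated as Boosted.
Category categorize(double mva_score, double threshold = 0.8) {
    return mva_score > threshold ? Category::VBF : Category::Boosted;
}
```

For example, `categorize(0.95)` gives `Category::VBF` and `categorize(0.30)` gives `Category::Boosted`. Passing a second argument shows the effect of a different threshold.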

TMVA Graphic user interface (TMVAGui)

A Graphical User Interface can be opened to show the control plots of each NN. To do that, you have to load the macro TMVAGui.C, which is inside the folder "root/tmva". The easiest way is to copy this macro (and its dependencies) to your own folder, so that it is easy to locate. Then,

  • root[0] .L TMVAGui.C
  • root[1] TMVAGui("")

Inside the "" you have to write the path of the root file of the NN you want to study (mvavbfztt, mvavbfother, readervbf). A small window with several buttons will open. The most interesting plots are:

  • Input variables distribution
  • Linear Correlation matrices
  • Classifier Cut Efficiencies
  • Significance plot (ROC Curve)
  • KS test
  • NN training

 

-- Main.dalpiq - 23 Nov 2012
