r6 - 22 Oct 2009 - 10:23:32 - LuisMarch

Atlas Computing and Data Analysis Tutorial

INTRODUCTION

The main goal of this tutorial is to show how to use your own local resources, in this case the User Interfaces (UIs), to perform analysis at different levels, depending on the analysis and the data. If you look at the ATLAS Analysis Model, you will see the different reconstruction outputs derived from RAW data: ESD, AOD, TAG, DPDs, ...

DATA MANAGEMENT

DATA ANALYSIS: ATHENA

0.- Set up

At IFIC, the ATLAS software is installed in AFS and can be accessed through a User Interface, so log in to one of the User Interfaces (e.g. ui04):

$> ssh -X mivipe@ui04.ific.uv.es

Note: The following example uses release 15.3.0 of the Atlas software.

WHAT TO DO FOR THE FIRST TIME:

Preparing your account to use the ATLAS software:

ATLAS software is divided into packages, and these are managed through the use of a configuration management tool, CMT. This is used to copy ("check out") code from the main ATLAS repository and handle linking and compilation.

Prepare your account as follows. Note that the cmthome directory does not have to be in $HOME; it can be in any sub-directory, but if so you will need to amend all the following examples accordingly.

$> cd $HOME
$> mkdir tutorial
$> mkdir tutorial/15.3.0           # this is the directory in which you will work
$> mkdir tutorial/cmthome

$> source /opt/exp_software/atlas/prod/releases/rel_15-2/CMT/v1r20p20090520/mgr/setup.sh
$> cd tutorial/cmthome

Now using your text editor of choice, create a file called requirements. See AtlasLogin for a full explanation of what this does. A basic requirements file for using the User Analysis package at IFIC would be:

#---------------------------------------------------------------------
set CMTSITE STANDALONE
set SITEROOT /opt/exp_software/atlas/prod/releases/rel_15-3/
macro ATLAS_DIST_AREA ${SITEROOT}
macro ATLAS_TEST_AREA ${HOME}/tutorial
apply_tag projectArea
apply_tag 32Default
macro SITE_PROJECT_AREA ${SITEROOT}
macro EXTERNAL_PROJECT_AREA ${SITEROOT}
apply_tag simpleTest
set SVNROOT svn+ssh://myLxplusUsername@svn.cern.ch/reps/atlasoff
apply_tag noSVNROOT
use AtlasLogin AtlasLogin-* $(ATLAS_DIST_AREA)
set CMTCONFIG i686-slc4-gcc34-opt
#---------------------------------------------------------------------

Note: Make sure you have set ATLAS_TEST_AREA correctly in the requirements file, or you will get errors saying that the include directory is not found in your work area, for example when you try to compile. You will need to modify this requirements file if you change your test area directory. Also make sure you use ${HOME} or an absolute path to refer to your home directory, not ~. Then run:

$> cmt config

You only need to follow this procedure once, unless the CMT version changes. Now close the terminal window and log in again.

Note: Some tips for using CMT can be found at SoftwareDevelopmentWorkbookCmtTips and more advanced setup instructions at WorkBookAdvancedSetup.

1.- Get your data

Before getting the data you need for your analysis, it is useful to check its properties in the AMI database.

For this tutorial you are going to use the following data:

mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635

Go to the AMI web page (here) and click on AMI Dataset Search, then:

  • Have a look at the data properties.
  • Find the meaning of the tags (e378_s462_r635)
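Part of what AMI tells you is already encoded in the dataset name itself. The minimal Python sketch below splits a name into its dot-separated fields. The field labels and the reading of the tag letters (e = event generation, s = simulation, r = reconstruction) follow the usual ATLAS naming convention as we understand it, so treat them as assumptions and cross-check them in AMI; the function name parse_dataset_name is hypothetical.

```python
def parse_dataset_name(name):
    """Split an ATLAS-style dataset name into labelled fields (sketch only)."""
    # Drop a trailing '/' (container) and any '_tidNNNNNN' task-id suffix.
    base = name.rstrip("/")
    if "_tid" in base:
        base = base[: base.index("_tid")]
    project, run, physics, step, dtype, tags = base.split(".")
    # Assumed meaning of the tag prefixes; verify in AMI.
    step_names = {"e": "evgen", "s": "simul", "r": "reco"}
    tag_steps = {step_names.get(t[0], t[0]): t for t in tags.split("_")}
    return {
        "project": project,
        "run": int(run),
        "physics": physics,
        "step": step,
        "type": dtype,
        "tags": tag_steps,
    }

info = parse_dataset_name(
    "mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635")
print(info["run"], info["type"], info["tags"])
# 105208 AOD {'evgen': 'e378', 'simul': 's462', 'reco': 'r635'}
```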

Now that you know more about the data, you can download a small sample for testing.

>voms-proxy-init -voms atlas
>source /afs/ific.uv.es/project/atlas/software/ddm/DQ2Clients/setup.sh

>dq2-ls mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635* 
mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635/
mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635_tid045869
mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635_tid045868

It is generally better to work with the full container (the name ending in '/'), as it holds the whole sample. Here we only need a test sample, so we download a single file.

>mkdir /tmp/chooseafunnyname/
>cd /tmp/chooseafunnyname/
>dq2-get -n 1 mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635_tid045869

The '-n 1' option tells dq2-get to download 1 random file from the dataset. Also note that every UI has its own /tmp directory, so the file is only available on the UI where you downloaded it.
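The container rule used above (a name ending in '/' is the container, the other names are the datasets it groups) can be written as a short Python helper that works on the lines printed by dq2-ls. This is only an illustration of the naming convention; the function name split_dq2_listing is hypothetical.

```python
def split_dq2_listing(lines):
    """Separate container names (ending in '/') from dataset names."""
    names = [line.strip() for line in lines if line.strip()]
    containers = [n for n in names if n.endswith("/")]
    datasets = [n for n in names if not n.endswith("/")]
    return containers, datasets

# Output of the dq2-ls command shown above
listing = """\
mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635/
mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635_tid045869
mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635_tid045868
"""
containers, datasets = split_dq2_listing(listing.splitlines())
print(len(containers), len(datasets))  # 1 2
```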

2.- The UserAnalysis Package

Now let us get the User Analysis package and compile it. To do so, run the following commands:

$> ssh ui04
$> source ~/tutorial/cmthome/setup.sh -tag=15.3.0
$> cd ~/tutorial/15.3.0

Athena packages have different tags for the different Athena releases. How can you find which package tag (e.g. UserAnalysis-00-13-17) goes with which Athena release?

For that we can use the AMI Database. Click here to go to the AMI Portal Home. Then click on Tag Collector. In the top menu, click on Search->Package Versions. There you will find all the packages each Athena release has with their proper tags.

  • Use AMI to find the tag and the full path of the UserAnalysis package.

Now, you can use CMT to check out the package:

$> cmt co -r UserAnalysis-00-13-17 PhysicsAnalysis/AnalysisCommon/UserAnalysis
#CMT---> Info: Working on PhysicsAnalysis/AnalysisCommon/UserAnalysis (UserAnalysis-00-13-17)
mivipe@svn.cern.ch's password: 

To access the CERN SVN repository you need to enter your CERN account password.

You can have a look now at what CMT did for you.

$> cd PhysicsAnalysis/AnalysisCommon/UserAnalysis/
$> ls
ChangeLog  cmt  doc  python  Root  run  share  src  UserAnalysis

You can learn more about the package structure later at Creating_your_own_package. But for now, you can just compile and run the code.

$> cd cmt/
$> cmt config
$> source setup.sh
$> cmt broadcast gmake

To run the compiled software, first get the AnalysisSkeleton_topOptions.py file:

$> cd ../run
$> get_files StructuredAAN_topOptions.py 
$> get_files AnalysisSkeleton_topOptions.py 

Note that, to set the AOD data you want to process, you have to edit the AnalysisSkeleton_topOptions.py file and change the line containing the EventSelector.InputCollections variable as follows:

EventSelector.InputCollections = [ "put_here_the_name_of_your_specified_AOD.pool.root_that_you_want_to_process"]

Finally, run the following command:
$> athena.py AnalysisSkeleton_topOptions.py
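Since jobOptions files are Python, you do not have to type the file name by hand: the list can be built by searching the download directory. The sketch below assumes the file was fetched with dq2-get into /tmp/chooseafunnyname/ as in section 1, that dq2-get typically places it in a subdirectory named after the dataset, and that the file name contains '.pool.root' (possibly with a numeric suffix). The helper name find_pool_files is hypothetical; it uses os.walk and fnmatch so that it also works with the older Python versions shipped with these releases.

```python
import fnmatch
import os

def find_pool_files(top_dir):
    """Return a sorted list of files under top_dir whose names contain '.pool.root'."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top_dir):
        # Walk into subdirectories too, e.g. one named after the dataset
        for fname in fnmatch.filter(filenames, "*.pool.root*"):
            matches.append(os.path.join(dirpath, fname))
    return sorted(matches)

# In AnalysisSkeleton_topOptions.py (download location assumed from section 1):
# EventSelector.InputCollections = find_pool_files("/tmp/chooseafunnyname")
```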

WHAT TO DO EVERY TIME YOU LOGIN :

Once your account is prepared for using the ATLAS software, you have to do the following to run athena:

$> ssh ui04.ific.uv.es
$> cd tutorial/cmthome
$> source setup.sh -tag=15.3.0
$> cd ../15.3.0/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt/
$> source setup.sh
$> cd ../run
$> athena.py AnalysisSkeleton_topOptions.py 

Note: If you make changes to the C++ files in the src/ directory of the package, you have to recompile before running Athena.

Note: To use a patched production cache (e.g. 15.3.0.1) instead of 15.3.0, run:

$>source $HOME/tutorial/cmthome/setup.sh -tag=AtlasProduction,15.3.0.1

2.1.- Basic Athena coding

Here we explain the main features of Athena code by looking at AnalysisSkeleton.cxx, which contains the AnalysisSkeleton algorithm, the main algorithm of the analysis code. This is the algorithm called by AnalysisSkeleton_topOptions.py, which is run by athena; any other methods you might wish to define will be called from within AnalysisSkeleton.

Every Athena algorithm has three primary methods: initialize(), execute() and finalize(). initialize() is run once, to set up the Athena tools and variables used in the actual analysis. execute() is then run once per event (so if you analyze n events, execute() runs n times); this is where the actual analysis is done and where data are manipulated and written to the output ntuple. finalize() is run once at the end. In our example it does little, although you could use it, for instance, to perform calculations with global variables after all events have been processed.
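The call pattern just described can be mimicked in a few lines of plain Python. This is a toy model only, not Athena code: ToyAlgorithm and run_event_loop are hypothetical names, and the real framework adds StoreGate access, StatusCode handling and much more.

```python
class ToyAlgorithm(object):
    """Hypothetical stand-in for an Athena algorithm; records the calls it gets."""
    def __init__(self):
        self.calls = []

    def initialize(self):
        self.calls.append("initialize")
        return True

    def execute(self, event):
        self.calls.append("execute")
        return True

    def finalize(self):
        self.calls.append("finalize")
        return True

def run_event_loop(alg, events):
    """Call initialize() once, execute() once per event, finalize() once."""
    if not alg.initialize():
        raise RuntimeError("initialize failed")
    for event in events:
        if not alg.execute(event):
            raise RuntimeError("execute failed")
    alg.finalize()

alg = ToyAlgorithm()
run_event_loop(alg, range(3))
print(alg.calls)
# ['initialize', 'execute', 'execute', 'execute', 'finalize']
```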

$> cd ../src
$> emacs AnalysisSkeleton.cxx &

The principal parts of our code and their important features are as follows:

  • The constructor: creates an instance of the AnalysisSkeleton algorithm. Multiple instances of an algorithm, each with different job options, can be created from the AnalysisSkeleton_topOptions.py file; we will see later how this is done. The constructor contains statements of the form
       declareProperty("string", variable = VALUE );
       
    These define variables that can be set in the AnalysisSkeleton_topOptions.py file (we will do this later). If no value is specified in the jobOptions file, the variable is set to VALUE as given in the line above.

  • Initialize:

    Initialize is run once, before the analysis of events begins. Here the tools needed to access the data file are set up. First, StoreGate is initialized; it gives access to the contents of the data file. Then the tools are retrieved; among other things, they allow one to open specific containers (e.g. jets or electrons) and access their contents.

    Histograms and branches of the output ntuple are also defined in initialize with statements of the form
       addBranch("string", variable);
       

  • Clear: This resets variables to be used for the next event.

  • Execute: In execute we have some detailed analysis examples.

  • Finalize: Not much is done here.
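The declareProperty mechanism from the constructor bullet can be pictured as a table of defaults that the jobOptions may override. The Python sketch below models only that behaviour; ToyProperties and its methods are hypothetical and do not reproduce the Gaudi property machinery, and the default value "DefaultMC" is made up for illustration (the override "SpclMC" is the one used later in AnalysisSkeleton_topOptions.py).

```python
class ToyProperties(object):
    """Hypothetical model: declared defaults, optionally overridden."""
    def __init__(self):
        self._values = {}

    def declare_property(self, name, default):
        # Like declareProperty("name", m_var = VALUE): VALUE is the default.
        self._values[name] = default

    def set_from_joboptions(self, name, value):
        # Like "AnalysisSkeleton.name = value" in the jobOptions file.
        # This model rejects names that were never declared.
        if name not in self._values:
            raise KeyError("undeclared property: " + name)
        self._values[name] = value

    def get(self, name):
        return self._values[name]

props = ToyProperties()
props.declare_property("McParticleContainer", "DefaultMC")  # hypothetical default
props.declare_property("OutputLevel", "INFO")
props.set_from_joboptions("McParticleContainer", "SpclMC")
print(props.get("McParticleContainer"), props.get("OutputLevel"))  # SpclMC INFO
```

Properties that the jobOptions never touch keep their declared default, which is why OutputLevel stays INFO here.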

Let's focus, for example, on the method electronSkeleton() (go to line 620). The key lines are as follows:

mLog << MSG::DEBUG << "in electronSkeleton()" << endreq;

This is how you print information to the screen. The message level can be ALL, VERBOSE, DEBUG, INFO, WARNING, ERROR or FATAL.
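Whether such a message actually appears depends on the output level threshold: a message is printed when its level is at or above the threshold. The Python sketch below encodes that rule. The numeric values are an assumption (the Gaudi convention as we recall it, VERBOSE=1 up to FATAL=6, with ALL modelled as 0 so that it lets everything through), and should_print is a hypothetical name.

```python
# Assumed numeric ordering of the message levels
LEVELS = {"ALL": 0, "VERBOSE": 1, "DEBUG": 2, "INFO": 3,
          "WARNING": 4, "ERROR": 5, "FATAL": 6}

def should_print(message_level, threshold):
    """True if a message at message_level passes the output threshold."""
    return LEVELS[message_level] >= LEVELS[threshold]

# With OutputLevel = INFO, DEBUG messages are suppressed:
print(should_print("DEBUG", "INFO"))    # False
print(should_print("WARNING", "INFO"))  # True
```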

const ElectronContainer* elecTES = 0;
sc=m_storeGate->retrieve( elecTES, m_electronContainerName);

This is how you access the data. First you declare a pointer to the type of container you want to work with (in this case electrons); then retrieve() sets it to point to the container with that name (m_electronContainerName) in StoreGate.

m_h_elecpt->Fill( (*elecItr)->pt(), 1.);
m_h_eleceta->Fill( (*elecItr)->eta(), 1.);

This is how you fill histograms.

m_aan_eta->push_back((*elecItr)->eta());
m_aan_pt->push_back((*elecItr)->pt());

This is how you fill the ntuple.
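The ntuple lines above, together with the Clear step, follow a per-event pattern: the vectors are emptied at the start of each event, one entry is pushed back per electron, and one ntuple row is written per event. The Python sketch below models that pattern with plain lists. The event contents are made-up numbers, fill_ntuple is a hypothetical name, and in the real code the row is written by the framework rather than appended to a list.

```python
def fill_ntuple(events):
    """Model of per-event vector branches: one row per event."""
    rows = []
    aan_pt, aan_eta = [], []
    for electrons in events:
        # Clear(): reset the vectors for this event
        del aan_pt[:]
        del aan_eta[:]
        for pt, eta in electrons:
            aan_pt.append(pt)    # like m_aan_pt->push_back(...)
            aan_eta.append(eta)  # like m_aan_eta->push_back(...)
        # One row per event, holding copies of the vectors
        rows.append({"pt": list(aan_pt), "eta": list(aan_eta)})
    return rows

# Hypothetical events: lists of (pt in MeV, eta) per electron
events = [[(25000.0, 0.5), (18000.0, -1.2)], [], [(40000.0, 2.1)]]
rows = fill_ntuple(events)
print([len(r["pt"]) for r in rows])  # [2, 0, 1]
```

Forgetting the Clear step in this model would make each row accumulate the electrons of all previous events, which is the bug the Clear method prevents.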

Now that you know how to use the message service:

  • Try adding some messages of your choice in the execute method and in the finalize method and see the difference.

We'll need more information about the electrons in the ROOT lecture.

  • Add the following variables to the ntuple:
       m_aan_e->push_back((*elecItr)->e());
       m_aan_px->push_back((*elecItr)->px());
       m_aan_py->push_back((*elecItr)->py());
       m_aan_pz->push_back((*elecItr)->pz());
       

    Don't forget to add the corresponding branches in the initialize method and to clear them in the Clear method. Also remember that in C++ the new data members must be declared in the class definition, which here lives in the header file, so go to ../UserAnalysis and add the corresponding lines to AnalysisSkeleton.h.
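Once px, py and pz are in the ntuple, a quick sanity check is to recompute pt and eta from them and compare with the pt and eta branches already stored. The Python sketch below uses the standard relations pt = sqrt(px^2 + py^2) and eta = asinh(pz/pt); the function names and the example numbers are hypothetical.

```python
import math

def pt_from(px, py):
    """Transverse momentum pt = sqrt(px^2 + py^2)."""
    return math.hypot(px, py)

def eta_from(px, py, pz):
    """Pseudorapidity eta = asinh(pz / pt), equivalent to -ln tan(theta/2)."""
    pt = pt_from(px, py)
    if pt == 0.0:
        raise ValueError("eta is undefined for pt = 0")
    return math.asinh(pz / pt)

# Hypothetical electron momentum components in MeV
px, py, pz = 30000.0, 40000.0, 0.0
print(pt_from(px, py), eta_from(px, py, pz))  # 50000.0 0.0
```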

Having looked at the main features of AnalysisSkeleton.cxx, we can now compile the code:

$>cd ../cmt
$>cmt make

To run the code, go to ../run and open AnalysisSkeleton_topOptions.py.

At first you may find this file difficult to understand, and that is because it really is complex. You can look at a simpler example later in Creating_your_own_package.

The key lines in the jobOptions file are the following:

  • You already know how to set the dataset you want to work with
       ServiceMgr.EventSelector.InputCollections = ["...."]
       

  • These lines add the algorithm you just modified (AnalysisSkeleton) to the algorithm sequence (here the CBNT_AthenaAware sequence inside topSequence). You can add more algorithms to topSequence, and you can even add the same algorithm more than once, as long as each instance has its own name.

       from UserAnalysis.UserAnalysisConf import AnalysisSkeleton
       topSequence.CBNT_AthenaAware += AnalysisSkeleton() 
       AnalysisSkeleton = AnalysisSkeleton()
       

  • Lines like this one give values to the properties declared in the algorithm constructor.
       AnalysisSkeleton.McParticleContainer = "SpclMC"
       

  • Here you can set the output level of AnalysisSkeleton and the general screen output level.
       AnalysisSkeleton.OutputLevel = INFO
       ...
       ...
       ServiceMgr.MessageSvc.OutputLevel = INFO
       

  • Here is where you set the number of events you want to run over. Default is 10 and -1 runs over all events available.
       theApp.EvtMax = 2
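The EvtMax rule above (a non-negative value caps the number of events, -1 means all of them) can be captured in a small Python helper. It is only a sketch of the semantics described here; the name events_to_process is hypothetical, and it ignores other settings that also affect which events are read.

```python
def events_to_process(n_available, evt_max):
    """Number of events a job runs over for a given EvtMax."""
    if evt_max == -1:
        return n_available  # -1: all available events
    if evt_max < 0:
        raise ValueError("EvtMax must be -1 or non-negative")
    return min(evt_max, n_available)

print(events_to_process(250, 2))   # 2
print(events_to_process(250, -1))  # 250
print(events_to_process(5, 10))    # 5
```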
       

You can now run your algorithm

$>athena.py AnalysisSkeleton_topOptions.py

Then look at the output and check that the changes you made are really there (use the branch name you chose in addBranch in place of ElectronPx):

$>root AnalysisSkeleton.aan.root
root [1] CollectionTree->Draw("ElectronPx")

Good luck.

2.2.- Some tips

2.2.1.-Use non-default DBRelease

If you need a non-default ATLAS DBRelease, one option is to point to the Athena release area where the database release you want is installed:

$> export ATLAS_DB_AREA=/opt/exp_software/atlas/prod/releases/rel_15-2/
$> export DBRELEASE_OVERRIDE=6.9.1
$> source cmthome/setup.sh -tag=15.3.0,setup

Another option is to install the DBRelease in a place of your choice using Pacman. Pacman is available on AFS so you can use it directly.

$> ssh ui04
$> mkdir /tmp/DBRelease/
$> export DBRELEASE_INSTALLDIR=/tmp/DBRelease/
$> cd $DBRELEASE_INSTALLDIR
$> source /afs/cern.ch/atlas/software/pacman/pacman-latest/setup.sh
$> pacman -allow trust-all-caches -get http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/pacman4/DBRelease:DBRelease-6.9.1

Now, go to your work area.

$> export ATLAS_DB_AREA=${DBRELEASE_INSTALLDIR}
$> export DBRELEASE_OVERRIDE=6.9.1
$> source cmthome/setup.sh -tag=15.3.0,setup

Good luck.

2.2.2.-Run Athena in different modes
> athena
By default, it looks for jobOptions.py and runs in batch mode.

> athena myJobOptions.py
Athena uses myJobOptions.py for its configuration.

> athena myJobOptions.py >& athena.log
Redirects all screen output to athena.log.

> athena myJobOptions.py |tee athena.log
Shows the standard output on the screen and also writes it to athena.log.

> athena -s myJobOptions.py
Prints all the jobOptions files included by myJobOptions.py.

> athena -i myJobOptions.py
Runs athena in interactive mode, for example:
   >myAlg.MyInt = 4
   >theApp.EvtMax = 5
   >theApp.run()
   >ctrl+d        (press Ctrl+D to leave the interactive session)

> athena -l ERROR myJobOptions.py
Sets the logging level (ALL, VERBOSE, DEBUG, INFO, WARNING, ERROR or FATAL).

3.- Creating your own package

Based on WorkBookCreatingNewPackage

3.1.- Creating the package

First, log into your account and set up as explained above.

Now, go to your test area and create a new package. Note the 15.3.0 directory.

> cd tutorial/15.3.0
> cmt create MyNewPackage MyNewPackage-00-00-01

If your package is in a subdirectory such as MyPath/MySubPath put this path as an extra argument at the end of the cmt create command.

> cmt create MyNewPackage MyNewPackage-00-00-01 MyPath/MySubPath

Do not put MyPath/MySubPath/MyNewPackage as the last argument.

The package created contains the following directories:

  • cmt - contains files for compiling, cleanup and setup scripts
  • src - contains the source code .cxx files

You also need to create two additional directories and some files.

  • MyNewPackage (or whatever the package name is) - contains the header .h files
  • share - contains the jobOptions file (MyJobOptions.py in this example), which is run by athena and specifies the characteristics of the job, e.g. input data files, variables that are read into the source code, etc.

First, create the requirements file in the cmt directory. This file tells CMT where to find the Athena libraries your code uses, typically those whose headers you #include in your own header and source files.

Go to the cmt directory of your new package:

> cd MyNewPackage/cmt

and create (or replace) the requirements file. The requirements file must contain the following:

  • The package name
  • The author name
  • The package dependencies (i.e. a list of other packages needed by your package)
  • A list of all the source code used by the package

A minimal requirements file would be:

#################################################
package MyNewPackage
author ATLAS Workbook
use AtlasPolicy AtlasPolicy-01-*
use GaudiInterface GaudiInterface-01-* External
use AthenaBaseComps AthenaBaseComps-*   Control
library MyNewPackage *.cxx -s=components *.cxx
apply_pattern component_library
apply_pattern declare_joboptions files="MyJobOptions.py"
#################################################

3.2.- Inserting the algorithm

The algorithm source code goes into the src directory of the package, and the header file into the directory with the name of the package. The example algorithm provided does little other than produce a single message; its purpose is simply to demonstrate how to write new algorithms.

> cd ../src

Create a file MyAlg.cxx:

#include "MyNewPackage/MyAlg.h"
#include "GaudiKernel/MsgStream.h"
/////////////////////////////////////////////////////////////////////////////
MyAlg::MyAlg(const std::string& name, ISvcLocator* pSvcLocator) :
AthAlgorithm(name, pSvcLocator)
{
  // Properties go here
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
StatusCode MyAlg::initialize(){

    ATH_MSG_INFO ("initialize()");

    return StatusCode::SUCCESS;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
StatusCode MyAlg::execute() {

   ATH_MSG_INFO ("Your new package and algorithm are successfully executing");
   ATH_MSG_DEBUG ("This is a DEBUG message");

    return StatusCode::SUCCESS;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
StatusCode MyAlg::finalize() {
    
    ATH_MSG_INFO ("finalize()");
    
    return StatusCode::SUCCESS;
}

You also need to create two files that are used to create entry points into a dynamically loaded library, so you can say at runtime "load this alg" and it will be found in the correct library and be loaded into memory.

> mkdir components
> cd components

Create a file called MyNewPackage_entries.cxx containing this:

#include "MyNewPackage/MyAlg.h"
#include "GaudiKernel/DeclareFactoryEntries.h"
DECLARE_ALGORITHM_FACTORY( MyAlg )
DECLARE_FACTORY_ENTRIES(MyNewPackage) {
DECLARE_ALGORITHM( MyAlg )
}

and a file called MyNewPackage_load.cxx containing this:

#include "GaudiKernel/LoadFactoryEntries.h"
LOAD_FACTORY_ENTRIES(MyNewPackage)
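The idea behind these two files, registering a class under a name so that the framework can create it later from that name alone, is the factory pattern. The Python sketch below illustrates only the pattern; it is not how the Gaudi factory macros are implemented, and ALGORITHM_FACTORY, declare_algorithm and create_algorithm are hypothetical names.

```python
ALGORITHM_FACTORY = {}

def declare_algorithm(cls):
    """Register cls under its class name (compare DECLARE_ALGORITHM)."""
    ALGORITHM_FACTORY[cls.__name__] = cls
    return cls

def create_algorithm(type_name, instance_name):
    """Create an algorithm instance from its type name at run time."""
    try:
        cls = ALGORITHM_FACTORY[type_name]
    except KeyError:
        raise LookupError("no factory entry for " + type_name)
    return cls(instance_name)

@declare_algorithm
class MyAlg(object):
    def __init__(self, name):
        self.name = name

alg = create_algorithm("MyAlg", "FirstInstance")
print(type(alg).__name__, alg.name)  # MyAlg FirstInstance
```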

Next you should create the header files:

> cd ../..
> mkdir MyNewPackage
> cd MyNewPackage

Create a file MyAlg.h:

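// Include guard: prevents the class from being defined twice
#ifndef MYNEWPACKAGE_MYALG_H
#define MYNEWPACKAGE_MYALG_H

#include <string>
#include "AthenaBaseComps/AthAlgorithm.h"
/////////////////////////////////////////////////////////////////////////////
class MyAlg : public AthAlgorithm {
    public:
    MyAlg (const std::string& name, ISvcLocator* pSvcLocator);
    StatusCode initialize();
    StatusCode execute();
    StatusCode finalize();
};

#endif // MYNEWPACKAGE_MYALG_H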

Finally, you should create a jobOptions file.

> cd ..
> mkdir share
> cd share

Create a file MyJobOptions.py:

#--------------------------------------------------------------
# Private Application Configuration options
#--------------------------------------------------------------
# Full job is a list of algorithms
from AthenaCommon.AlgSequence import AlgSequence
job = AlgSequence()
# Add top algorithms to be run
from MyNewPackage.MyNewPackageConf import MyAlg
job += MyAlg( "FirstInstance" )   # 1 alg, named "FirstInstance"
job += MyAlg("SecondInstance") # 2 alg, named "SecondInstance"
#--------------------------------------------------------------
# Set output level threshold (DEBUG, INFO, WARNING, ERROR, FATAL)
#--------------------------------------------------------------
job.FirstInstance.OutputLevel = INFO
job.SecondInstance.OutputLevel = DEBUG

#--------------------------------------------------------------
# Event related parameters
#--------------------------------------------------------------
# Number of events to be processed (default is 10)
theApp.EvtMax = 1
#==============================================================
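The lines job += MyAlg("FirstInstance") and job.FirstInstance.OutputLevel = INFO rely on the sequence keeping its algorithms in order and letting you reach each one by its instance name. The Python toy model below reproduces just that behaviour; ToyAlg and ToySequence are hypothetical stand-ins, not the AthenaCommon classes.

```python
class ToyAlg(object):
    """Hypothetical stand-in for a configurable such as MyAlg("FirstInstance")."""
    def __init__(self, name):
        self.name = name
        self.OutputLevel = None  # None means "not set" in this model

class ToySequence(object):
    """Hypothetical model of AlgSequence: keeps order, finds algs by name."""
    def __init__(self):
        self._algs = []

    def __iadd__(self, alg):
        # job += alg appends and keeps 'job' bound to the same sequence
        self._algs.append(alg)
        return self

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, e.g. job.FirstInstance
        for alg in self._algs:
            if alg.name == name:
                return alg
        raise AttributeError(name)

    def names(self):
        return [alg.name for alg in self._algs]

job = ToySequence()
job += ToyAlg("FirstInstance")
job += ToyAlg("SecondInstance")
job.FirstInstance.OutputLevel = "INFO"
job.SecondInstance.OutputLevel = "DEBUG"
print(job.names())  # ['FirstInstance', 'SecondInstance']
```

Because each instance carries its own properties, the same C++ class can run twice with different settings, which is why the instance names must differ.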

Note that you do not need to create MyNewPackageConf.py yourself: it is generated during the build in the genconf directory of MyNewPackage and then installed in InstallArea/python/MyNewPackage. Do not give the jobOptions file the same name as your package (MyNewPackage.py), since this gives an error message and causes Athena to crash.

3.3.- Building the new package

Now build your package:

> cd ../cmt
> cmt config
> source setup.sh
> gmake

If the build succeeds, create a run directory, copy the jobOptions there and run athena:

> mkdir ../run
> cd ../run
> cp  ../share/MyJobOptions.py .

> athena.py MyJobOptions.py

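If all is well, the output should contain lines like the following (the message source is the instance name set in MyJobOptions.py; the exact order and spacing may differ):

FirstInstance         INFO initialize()
SecondInstance        INFO initialize()
FirstInstance         INFO Your new package and algorithm are successfully executing
SecondInstance        INFO Your new package and algorithm are successfully executing
SecondInstance       DEBUG This is a DEBUG message
FirstInstance         INFO finalize()
SecondInstance        INFO finalize()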

Good luck.

4.- DPD, TAG analysis

4.0- Get an AOD file

Log into a UI and do the following steps (if you already did this in sections 0 and 1, skip directly to 4.1):

$> cd <path>/tutorial/15.3.0
$> voms-proxy-init -voms atlas
$> source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh   # Use the CERN setup, if there is not any local DQ2 setup 
$> dq2-get -n 1 mc08.105208.TTbar_McAtNlo_Jimmy_HighPtTop.recon.AOD.e378_s462_r635_tid045869

4.1- DPD

Let's start with the DPDs. Set up Athena if you have not done so yet, or if you are starting a new terminal session. Then check out the TopPhysDPDMaker package:

$> cd <path>/tutorial/cmthome
$> source setup.sh -tag=15.3.0

$> cd ../15.3.0

$> cmt co -r TopPhysDPDMaker-14-02-11 PhysicsAnalysis/TopPhys/TopPhysDPDMaker
$> cd PhysicsAnalysis/TopPhys/TopPhysDPDMaker/cmt/
$> cmt config
$> source setup.sh
$> cmt broadcast gmake

$> cd ../run

Create the following jobOptions file and call it MyD1PD_topOptions.py:

InputCollections = ["<path>/AOD.XXXX.YYYY.pool.root"]
EvtMax=10
SkipEvents=0
include("TopPhysDPDMaker/ElectroweakD1PD_topOptions.py")

Run the previous jobOptions:

$> athena.py MyD1PD_topOptions.py | tee MyD1PD_topOptions.log

You may get an error related to one of the containers in the AOD you are analyzing. In that case, go to ../share/ElectroweakD1PD_topOptions.py and comment out the AddItem line that fails, for example: AddItem('JetCollection#Kt6H1TowerJets')
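If several items fail, editing by hand gets tedious. The Python sketch below prefixes '#' to every not-yet-commented line of a jobOptions text that mentions a given container key. The function name comment_out_items is hypothetical, the second sample line is made up for illustration, and you should work on a copy of the file. Be aware that commenting out the only statement of an indented Python block leaves the block empty, which is a syntax error.

```python
def comment_out_items(text, key):
    """Return text with every non-comment line mentioning key commented out."""
    out = []
    for line in text.splitlines(True):  # keep the line endings
        stripped = line.lstrip()
        if key in line and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            line = indent + "#" + stripped
        out.append(line)
    return "".join(out)

sample = ("AddItem('JetCollection#Kt6H1TowerJets')\n"
          "AddItem('ElectronContainer#ElectronAODCollection')\n")  # hypothetical
print(comment_out_items(sample, "JetCollection#Kt6H1TowerJets"))
```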

Run the jobOptions again and check that everything is fine now. This time you should get the AOD output file TopPhysD1PDStream.pool.root.

Do the same for the D2PD: copy the previous jobOptions to MyD2PD_topOptions.py and replace D1PD with D2PD:

InputCollections = ["<path>/AOD.XXXX.YYYY.pool.root"]
EvtMax=10
SkipEvents=0
include("TopPhysDPDMaker/ElectroweakD2PD_topOptions.py")
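Because the D1PD and D2PD jobOptions differ only in the included file, they can also be generated instead of copied by hand. A minimal Python sketch, assuming the file layout used in this section and keeping the <path> placeholder for you to fill in; the function name dpd_joboptions is hypothetical. Write the returned text to MyD1PD_topOptions.py or MyD2PD_topOptions.py.

```python
def dpd_joboptions(level, aod_path, evt_max=10, skip_events=0):
    """Return the text of a MyD<level>PD_topOptions.py file (D1 or D2 only)."""
    if level not in (1, 2):
        raise ValueError("only D1PD and D2PD use this template")
    return ('InputCollections = ["%s"]\n'
            "EvtMax=%d\n"
            "SkipEvents=%d\n"
            'include("TopPhysDPDMaker/ElectroweakD%dPD_topOptions.py")\n'
            % (aod_path, evt_max, skip_events, level))

print(dpd_joboptions(2, "<path>/AOD.XXXX.YYYY.pool.root"))
```

The D3PD jobOptions uses different variables (InFileNames, Analysis), so it is deliberately left out of this template.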

Remember the change you made to ../share/ElectroweakD1PD_topOptions.py for the D1PD.

Make the same change in ../share/ElectroweakD2PD_topOptions.py, then run the D2PD job:

$> athena.py MyD2PD_topOptions.py | tee MyD2PD_topOptions.log 

You will get the following AOD output file: Electroweak.D2PD.pool.root

Now, let's go for the D3PD. Create a MyD3PD_topOptions.py file with the following contents:

InFileNames = ["<path>/AOD.XXXX.YYYY.pool.root"]
EvtMax=10
SkipEvents=0
Analysis=["Trigger","TruthAll"]
include("TopPhysDPDMaker/ElectroweakD3PD_topOptions.py")

This time you do not need to modify anything else, unlike in the previous steps. Run it:

$> athena.py MyD3PD_topOptions.py | tee MyD3PD_topOptions.log

You will get the following ntuple output file: Electroweak.D3PD_test.aan.root

4.2- TAG

Get the jobOptions to make the TAG file:

$> get_files -jo aodtotag.py 

Create a new jobOptions file, TAG_jobOptions.py, with the following contents:

PoolAODInput=["<path>/AOD.XXXX.YYYY.pool.root"] 
DetDescrVersion="ATLAS-GEO-02-01-00" 
PoolTAGOutput="TAG.test.pool.root"
include("aodtotag.py")

The DetDescrVersion is required to run the TAG jobOptions. Where can you find this information? In AMI: look up the dataset there and cross-check this value.

Run the TAG_jobOptions.py:

$> athena.py TAG_jobOptions.py | tee TAG_jobOptions.log

You will get the TAG output file TAG.test.pool.root, as set by PoolTAGOutput.

DATA ANALYSIS: USE OF GANGA

ROOT

PROOF

-- MiguelVillaplana - 21 Oct 2009
