r26 - 13 Apr 2011 - 15:11:40 - ElenaOliverGarcia


VI.- Performing Atlas Distributed Data Analysis on the Grid

The official way to perform ATLAS data analysis is to use the Ganga or Pathena tools, although the Grid resources can also be used directly by the user. In the following we discuss how to use Ganga for distributed analysis in Atlas on the Grid. For detailed information on how to use the "Ganga Command Line Interpreter (CLI)" see the Working with Ganga document. See also the Introduction to GangaAtlas slides.

For Atlas distributed data analysis using Ganga, visit this site for more details; there are tutorials for different versions at the bottom.

VI.1.- Setting up and configuring the Ganga Environment at IFIC

When running Ganga for the first time, a .gangarc configuration file is created in our home directory. We then have to change some of its configuration parameters according to our needs. To do this, let us execute the following commands (note that we have to log into a User Interface machine and have a valid proxy certificate if we want to run jobs on the Grid with Ganga):

$> ssh ui00.ific.uv.es
$> source /afs/ific.uv.es/project/atlas/software/ganga/install/etc/setup-atlas.sh
$> export GANGA_CONFIG_PATH=GangaAtlas/Atlas.ini

If you want another version, pass it as an argument at the end:

$> source  /afs/ific.uv.es/project/atlas/software/ganga/install/etc/setup-atlas.sh 5.5.21

Create your configuration file (.gangarc):

$> ganga -g

Let us answer "yes" to the question asked by Ganga to create the .gangarc configuration file. We then leave Ganga (with Ctrl-D), open the .gangarc file with our favourite editor, and make the following changes corresponding to the ATLAS-IFIC environment:

In the section labelled [Athena] add the line:

ATLASOutputDatasetLFC = lfcatlas.pic.es

In the section labelled [Configuration] add the line:

RUNTIME_PATH = GangaAtlas:GangaPanda:GangaJEM

In the section labelled [LCG] add the lines:

DefaultLFC = lfcatlas.pic.es
DefaultSE = srmv2.ific.uv.es
VirtualOrganisation = atlas

In the section labelled [defaults_GridProxy] add the line:

voms = atlas

In the section labelled [defaults_VomsCommand] add the line:

init = voms-proxy-init -voms atlas

The variable ATLASOutputDatasetLFC catalogues your output in the PIC cloud when the ATLASOutputDataset option is used. The variable RUNTIME_PATH selects the ATLAS applications. The variables DefaultLFC and DefaultSE define a catalogue and Storage Element where your input files can be saved if they are bigger than 10 MB (the maximum input size on the Grid), so that Ganga can make a copy at the job site. The variables voms and init allow Ganga to create the correct Grid proxy.
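Put together, the relevant sections of the .gangarc file would read roughly as follows (a sketch containing only the lines discussed above; the generated file contains many more options):

```
[Athena]
ATLASOutputDatasetLFC = lfcatlas.pic.es

[Configuration]
RUNTIME_PATH = GangaAtlas:GangaPanda:GangaJEM

[LCG]
DefaultLFC = lfcatlas.pic.es
DefaultSE = srmv2.ific.uv.es
VirtualOrganisation = atlas

[defaults_GridProxy]
voms = atlas

[defaults_VomsCommand]
init = voms-proxy-init -voms atlas
```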

Once these changes are made in the .gangarc configuration file, we are ready to use Ganga for Atlas distributed data analysis.

VI.2.- Running analysis using Athena with Ganga

In the following example we will see how to run a job using release 16.0.2 of the Atlas software and the User Analysis package. If we are using a different software release, we have to adapt the setup commands accordingly.

First of all, log into a User Interface machine and get a valid proxy certificate:

$> ssh ui00.ific.uv.es
$> voms-proxy-init -voms atlas

As the Athena framework will be used, we have to configure the corresponding environment variables by performing the Athena setup before executing Ganga. Set up the Athena environment as follows (it is assumed that the "PhysicsAnalysis/AnalysisCommon/UserAnalysis" package is installed; see section V for how to install it):

$> source /lustre/ific.uv.es/sw/atlas/local/setup.sh
$> mkdir -p AthenaTestArea/16.0.2
$> export AtlasSetup=${VO_ATLAS_SW_DIR}/prod/releases/rel_16-2/AtlasSetup
$> alias asetup='source $AtlasSetup/scripts/asetup.sh'
$> asetup 16.0.2 --testarea=$HOME/AthenaTestArea --multitest --dbrelease "<latest>"

IMPORTANT: Note that Ganga should be run from the run/ directory of the Physics Analysis package, so that Ganga recognizes your Athena package. So, let us change to the run directory:

$> cd $TestArea/PhysicsAnalysis/AnalysisCommon/UserAnalysis/run

The next commands set up the latest version of Ganga installed in our environment:

$> source /afs/ific.uv.es/project/atlas/software/ganga/install/etc/setup-atlas.sh
$> export GANGA_CONFIG_PATH=GangaAtlas/Atlas.ini

Suppose now that we want to run an Athena job (with a Monte Carlo input dataset) described by the following Ganga Python configuration file, named myGangaJob.py, located in the run/ directory of our Physics Analysis package:


# FileName myGangaJob.py #########################

j = Job()
number = str(j.id)                 # the job id, used below to build a unique dataset name
j.name = 'twiki-Panda-' + number
j.application = Athena()
j.application.option_file = ['AnalysisSkeleton_topOptions_AutoConfig.py']
j.application.max_events = -1      # process all events
j.application.atlas_dbrelease = 'LATEST'
j.application.prepare(athena_compile=False)
j.splitter = DQ2JobSplitter()      # split the job according to the input dataset
j.splitter.numfiles = 1            # one input file per subjob
j.inputdata = DQ2Dataset()
j.inputdata.dataset = ["mc09_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e510_s765_s767_r1302_r1306/"]
j.inputdata.number_of_files = 1
j.outputdata = DQ2OutputDataset()
j.outputdata.datasetname = 'user.elenao.test.ganga.panda.' + number
j.outputdata.location = 'IFIC-LCG2_SCRATCHDISK'
j.backend = Panda()                # run through the Panda backend
j.submit()

# End File #########################

The variable 'number' just saves the id (the Ganga identification) of the job, in order to define a unique output dataset name (a requirement of DQ2). With j.outputdata.location you can choose the Storage Element for your output datasets according to the DDM policy. The output dataset is first stored on the scratch disk of the site where the subjobs ran, and then a DaTRI request is made to transfer it to your location.
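The naming convention can be illustrated in plain Python (a sketch using a hypothetical job id of 42; inside Ganga the number would come from j.id):

```python
# Sketch of the unique output-dataset-name convention used above.
# In Ganga, 'number' is taken from j.id; here we use a hypothetical job id.
job_id = 42
number = str(job_id)
dataset_name = 'user.elenao.test.ganga.panda.' + number
print(dataset_name)  # -> user.elenao.test.ganga.panda.42
```

Because every Ganga job gets a distinct id, appending it to the dataset name guarantees that each submission registers a dataset name DQ2 has not seen before.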

You can see your job status and the stdout file in the Panda Monitor user pages (look for your name as it appears in your Grid certificate).

To have Ganga execute this file, do the following:

$> ganga

Now we are inside the Command Line Interpreter (CLI) of Ganga, so we can use Ganga's own commands. For example, in order to execute our "myGangaJob.py" file we use the execfile() command as follows:

In [1]: execfile('myGangaJob.py')

Other commands for working with our jobs are:

See information on all jobs:

In [1]: jobs

See information for a range of jobs only:

In [1]: jobs.select(1,10)

See the Ganga status of one job and of a subjob:

In [1]: jobs(10).status

In [1]: jobs(10).subjobs(2).status

See only the subjobs with a given status:

In [1]: jobs(10).subjobs.select(status='failed')

Count the subjobs with a given status:

In [1]: len(jobs(10).subjobs.select(status='failed'))

See one subjob in the Panda monitor:

In [1]: ! firefox $jobs(731).subjobs(4826).backend.url  &

Kill the job:

In [1]: jobs(10).kill()

Remove one job:

In [1]: jobs(10).remove()
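Resubmit a job from the CLI (a hedged example: Ganga provides resubmit() on jobs, and in our experience it retries the failed subjobs, though the exact behaviour may depend on the backend and Ganga version):

```
In [1]: jobs(10).resubmit()
```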

-- ElenaOliverGarcia - 13 Apr 2011
