Tutorial
Part 1: the user perspective
Preliminaries
Before you start walking through this tutorial, you should do two things:
- Set up nextoffline; see SoftwareOfflineInstallation.
- Create an area where to write the files created by this tutorial. It does not really matter where it is or what it is called (the tutorial should be path-independent), as long as you have write permission there. Let's assume you are going to write things into a directory called offlinetutorial. Create the following directory structure:
mkdir offlinetutorial
cd offlinetutorial
mkdir data; mkdir data/evt; mkdir data/histo; mkdir data/img
We will assume in the following that the tutorial is run (that is, the ana executable is invoked) from within this directory (offlinetutorial, in the example).
Starting with the helium release, this tutorial is an SRT package that is itself part of the nextoffline release. Have a look at $SRT_PUBLIC_CONTEXT/Tutorial; there you will find the required XML and data files organized in the following directory structure:
- data/nexus: directory containing two sample files produced by the nexus detector simulation, which we will use as input to the reconstruction. More specifically, these are both bb0nu events simulated at the detector center.
- data/raw: directory containing one file of NEXT-1-IFIC raw events taken with the GasIFIC DAQ.
- xml: directory containing nine job configuration files, one for each job in this tutorial.
Example MC1: Understanding XML configuration data
Have a look at the ExampleMC1.xml file in $SRT_PUBLIC_CONTEXT/Tutorial/xml. You will find that:
- Two shared libraries, AnaBase and BHEPReader, are loaded at run-time by the ana executable, as specified within the <link> </link> XML tags.
- One default module configuration file from your base release is loaded, the one corresponding to the BHEPReader module, as specified within the <xmlfile> </xmlfile> tags.
- No modules are executed at run-time, since no modules appear within the <job> </job> XML tags.
To run the first example, type:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleMC1.xml -n 1
After the ROOT initialization banner, you will see that the job does not actually do much, terminating with a line of the type:
+- Example 1 sequence (1/0/1 events 0u/0s seconds)
meaning that one event was processed, and that the job took essentially no time...
Still, the job successfully loaded two shared libraries. If you tried to remove or comment out the AnaBase library loading, for example by commenting it out like this:
<!-- AnaBase -->
(XML comments are just like HTML comments), then you would get an error at run-time, since the BHEPReader library depends upon the AnaBase one. Note that the order also counts: you would get the same error if you tried to load BHEPReader before AnaBase. It is good practice to load the libraries with raw, reconstruction and analysis-level objects (RawBase, RecoBase and AnaBase) before any other library, since many libraries depend upon them. Similarly, we will see later a library called EventFilters, which in fact groups several modules. Since so many modules (with many dependencies) are present there, it is good practice to place that library toward the end of your library link list.
Now, let's look at the other bit of information: module configuration files. Type:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleMC1.xml -n 1 -X DumpExampleMC1.xml
and have a look at the output produced, DumpExampleMC1.xml. This is a copy of all the configuration data gathered by the job before execution. You will find that a series of configurations for the BHEPReader module have been loaded, e.g. the ones within the <config> </config> XML tags.
While <xmlfile> is useful to load default configurations for modules, you can also define your own configurations within the job control XML file. Try adding the following text to ExampleMC1.xml:
<config name="BHEPReader" version="user">
<param name="BranchID">
<int> 1 </int>
ID of branch where BHEP event is stored. Branch identifiers as follows: 0: MC 1: DetSim 2: DAQ 3: Raw 4: RawAux 5: Cal 6: Reco 7: User 8: Summary
</param>
<param name="InputFiles">
<string> "userfile.root" </string>
Input filenames
</param>
</config>
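As a standalone illustration (not how FMWK reads its configuration), the config/param/value structure of such a fragment can be inspected with Python's standard XML parser:

```python
# Hypothetical sanity check, independent of FMWK: parse a <config> fragment
# like the one above and collect its parameters into a dictionary.
import xml.etree.ElementTree as ET

fragment = """
<config name="BHEPReader" version="user">
  <param name="BranchID">
    <int> 1 </int>
  </param>
  <param name="InputFiles">
    <string> "userfile.root" </string>
  </param>
</config>
"""

root = ET.fromstring(fragment)
# Each <param> wraps a single typed value element (<int>, <string>, ...).
params = {p.get("name"): p[0].text.strip() for p in root.findall("param")}
print(params)  # {'BranchID': '1', 'InputFiles': '"userfile.root"'}
```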
If you rerun with the -X option, you will see that your local configuration for BHEPReader has been successfully loaded. In addition to default or entirely local module configurations, a mixture is also possible, for example:
<config name="BHEPReader" version="user2" base="default">
<param name="InputFiles">
<string> "userfile2.root" </string>
Input filenames
</param>
</config>
which tells FMWK that the user2 configuration matches the one of its base configuration default, except for the parameters explicitly listed.
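Conceptually, the base mechanism behaves like a dictionary merge; here is a minimal sketch (an illustration only, not the FMWK implementation):

```python
# Conceptual sketch (not FMWK code): a derived configuration starts from its
# base and overrides only the parameters it lists explicitly.
default = {"BranchID": 1, "InputFiles": "nexus.dst.root"}  # the 'default' config
user2_overrides = {"InputFiles": "userfile2.root"}         # what 'user2' redefines

# 'user2' inherits every parameter of 'default' except those it redefines.
user2 = {**default, **user2_overrides}
print(user2)  # {'BranchID': 1, 'InputFiles': 'userfile2.root'}
```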
Example MC2: Reading BHEP DSTs
Now let's start processing some events and execute some modules. Have a look at ExampleMC2.xml: not much difference from ExampleMC1.xml, except that now we are executing a single "node" within the job, as specified by the line:
<node module="io::BHEPReader" config="detsim" reco="1" ana="0"/>
In this case, the node is simply a FMWK module (a sequence of modules can also be defined as a node).
Now run a job by typing:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleMC2.xml -n 100 -i /dev/null:0
The -n option tells how many events to process, while the -i option specifies the list of FMWK input files to read from. /dev/null:0 indicates an empty input datastream. This is the case here because we want to read BHEP DSTs and create FMWK events via the BHEPReader module (which is an event generator), so we want to start with empty events.
The job above will complain because the $INPUTBHEPFILES environment variable is not defined. Define it to make it point to a nexus file:
export INPUTBHEPFILES=$SRT_PUBLIC_CONTEXT/Tutorial/data/nexus/nexus_example2.file1.dst.root
or
setenv INPUTBHEPFILES $SRT_PUBLIC_CONTEXT/Tutorial/data/nexus/nexus_example2.file1.dst.root
or, alternatively, you can get rid of the default BHEPReader configuration and specify your own configuration for BHEPReader locally within the job XML file.
Note that in addition to environment variables, the InputFiles string parameter also recognizes wildcard patterns and space-separated multiple files, so that the following three statements are equivalent:
- <string> $SRT_PUBLIC_CONTEXT/Tutorial/data/nexus/nexus_example2.file1.dst.root $SRT_PUBLIC_CONTEXT/Tutorial/data/nexus/nexus_example2.file2.dst.root </string>
- <string> $SRT_PUBLIC_CONTEXT/Tutorial/data/nexus/nexus_example2.file*.dst.root </string>
- <string> $SRT_PUBLIC_CONTEXT/Tutorial/data/nexus/nexus_example2.file[1,2].dst.root </string>
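The equivalence of the two wildcard forms can be illustrated with Python's fnmatch (a demonstration of shell-style pattern syntax only, not FMWK code):

```python
# Illustration (not FMWK code): the star and bracket patterns above select
# the same pair of files, shown here with shell-style matching from fnmatch.
from fnmatch import fnmatch

files = ["nexus_example2.file1.dst.root", "nexus_example2.file2.dst.root"]

star = [f for f in files if fnmatch(f, "nexus_example2.file*.dst.root")]
brackets = [f for f in files if fnmatch(f, "nexus_example2.file[1,2].dst.root")]
print(star == brackets == files)  # True
```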
Now the job should succeed, and the Example 2 sequence printed at the end should indicate that 100 events were processed through the "Reco" interface of BHEPReader, and that all of them passed. You will also find a non-zero job time.
Now let's discuss some output other than what is printed on screen. Not having specified the -g 0 option, you may have inadvertently created "histogram files" in the directory you executed ana from, named histos.root, histos-01.root, etc., one for each job executed. You can now safely remove all of them! You can also remove the DumpExampleMC1.xml file.
Let's now run:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleMC2.xml -n 100 -i /dev/null:0 -o data/evt/fmwk_example2.evt.root -g data/histo/fmwk_example2.histo.root
The -o option specifies the output ROOT file with FMWK event information, and the -g option specifies the output ROOT file with FMWK "histogram" information. The latter is meant for all information that is not event-based (e.g., run-based) and can contain any ROOT object: histograms, ntuples, trees, etc.
Example MC3: Reading and dumping FMWK event data
Now let's run ExampleMC3.xml with:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleMC3.xml -n 100 -i data/evt/fmwk_example2.evt.root -g 0
The first thing to note is that we are now reading as input not an empty stream, but the FMWK event information produced by the previous example. If you look at the ExampleMC3.xml file, you will note that we are not executing the BHEPReader module: we are reading FMWK events as opposed to BHEP events. You will also note that, again, a single module is being executed, called EventDump. EventDump is a very simple module that dumps event information on screen for each processed event. You should see text such as:
Dump of run = 0.0 event = 99
evt.Header()
|=======================================================================
evt.MC()
|=======================================================================
evt.DetSim()
|-* anabase::Events[1]
|=======================================================================
evt.DAQ()
|=======================================================================
evt.Raw()
|=======================================================================
evt.RawAux()
|=======================================================================
evt.Cal()
|=======================================================================
evt.Reco()
|=======================================================================
evt.User()
|=======================================================================
evt.Summary()
|=======================================================================
which tells us that the only object present in the event is a single instance (note the [1]) of the anabase::Event object within the DetSim branch. This object is what the BHEPReader module loaded into memory in Example 2, and FMWK wrote it to disk. It is a full copy of the event information stored in the nexus BHEP DST.
Example MC4: Event filters
Now let's start to do some analysis on the nexus DST file. Type:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleMC4.xml -n 100 -i /dev/null:0 -o data/evt/fmwk_example4.evt.root -g data/histo/fmwk_example4.histo.root
If you look at ExampleMC4.xml, you will find that in addition to BHEPReader, two event filters are now also executed: TrueVetoFilter and TrueEnergyWindowPreFilter. These modules implement only the Ana event-level phase and not Reco, meaning they do not modify event information. All they do is select or reject events based on the MC truth information contained in anabase::Event. Events rejected by a module do not go through subsequent modules. From the FMWK job summary output on screen, you can see that TrueVetoFilter rejects 1 out of 100 events, while TrueEnergyWindowPreFilter rejects 8 out of the remaining 99 events.
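The sequential event accounting above can be sketched as follows (an illustration of the bookkeeping, not FMWK code):

```python
# Schematic of sequential filtering (not FMWK code): events rejected by a
# filter never reach the filters that follow it.
def run_filters(n_events, rejected_per_filter):
    """Return the number of events surviving after each filter in the chain."""
    survivors = []
    for n_rejected in rejected_per_filter:
        n_events -= n_rejected
        survivors.append(n_events)
    return survivors

# TrueVetoFilter rejects 1 of 100, TrueEnergyWindowPreFilter rejects 8 of 99.
print(run_filters(100, [1, 8]))  # [99, 91]
```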
You can now check (by adding the EventDump module) that no event information has been written by the two filters: only anabase::Event objects are present, one per event; there are simply fewer events overall in fmwk_example4.evt.root compared to fmwk_example2.evt.root.
If you now look at the histogram file, fmwk_example4.histo.root, you can see that:
- the module that used the most resources was BHEPReader, as shown in the totTime histogram;
- while the two filters have not modified event information, they have created histogram information, as can be seen in their respective subfolders in the file.
Example MC5: Adding reco objects into the FMWK event
Let's now run:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleMC5.xml -n 100 -i data/evt/fmwk_example4.evt.root -o data/evt/fmwk_example5.evt.root -g data/histo/fmwk_example5.histo.root
This job uses as input the output of the Example4 job, and runs the following modules:
- MakeVoxels: creates "voxels" of energy deposition within the chamber by pixelizing the MC-true 3D ionization hit information
- cal::SmearVoxels: performs a simple energy smearing for each voxel
- TrackID: creates tracks out of voxels
- TrackWireCounterFilter: filter to select event topology based on the number of total tracks (of "wire" or "isolated blob" type) and on the number of "wires"
- EnergyROIFilter: selects events based on the total deposited energy per event
- EventDump: dumps event information on screen
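The pixelization idea behind the first step can be sketched as follows (a conceptual illustration, not the actual MakeVoxels code; the grid size and hit values are made up):

```python
# Conceptual sketch of voxelization (not the MakeVoxels implementation):
# pixelize 3D ionization hits onto a regular grid and sum the deposited
# energy falling into each cell. Positions in mm, energies in MeV
# (illustrative values only).
from collections import defaultdict

def make_voxels(hits, voxel_size):
    """hits: list of (x, y, z, energy); returns {voxel_index: total_energy}."""
    voxels = defaultdict(float)
    for x, y, z, e in hits:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[key] += e
    return dict(voxels)

hits = [(1.0, 1.0, 1.0, 0.5), (2.0, 1.5, 1.2, 0.3), (12.0, 1.0, 1.0, 0.2)]
print(make_voxels(hits, voxel_size=10.0))
# {(0, 0, 0): 0.8, (1, 0, 0): 0.2}
```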
By looking at the job summary output, you can see that the two filters reduce the number of events making it to the next module, while MakeVoxels, SmearVoxels and TrackID do not. These three modules populate the event with two new types of objects:
- recobase::Voxels
- recobase::Tracks
recobase::Voxel objects are created both by MakeVoxels (in the DetSim branch) and by SmearVoxels (in the Reco branch). recobase::Track objects are created by TrackID (in the Reco branch). We only see events with one recobase::Track instance because we make this requirement in the TrackWireCounterFilter module, and the event dump is placed after this cut. On the other hand, there are typically tens of recobase::Voxel objects per event.
Example for one event:
Dump of run = 0.0 event = 98
evt.Header()
|=======================================================================
evt.MC()
|=======================================================================
evt.DetSim()
|-* anabase::Events[1]
|-* ionization
|-* recobase::Voxels[32]
|=======================================================================
evt.DAQ()
|=======================================================================
evt.Raw()
|=======================================================================
evt.RawAux()
|=======================================================================
evt.Cal()
|=======================================================================
evt.Reco()
|-* ionization
|-* recobase::Voxels[32]
|-* recobase::Tracks[1]
|=======================================================================
evt.User()
|=======================================================================
evt.Summary()
|=======================================================================
The other thing to note is that these objects live underneath the ionization subfolder of the DetSim() and Reco() event branches. This is because the nexus simulation output used here comes from the "fast" simulation, which simulates the energy deposited by ionization in the chamber by the various tracks, without drift, light production or detector response.
If you look at the produced "histogram" file, you will see that several histograms are produced by the various modules. Have a look at the source code to see what they are. You will also see that MakeVoxels and TrackID are the most CPU-intensive modules in this job.
Example DATA1: reading NEXT-1-IFIC raw data
Let us now move to real data. In the data/raw directory you can find one file containing 32 raw events obtained with the NEXT-1-IFIC chamber and the "GasIFIC" DAQ system. Each event contains 32 channels, 19 connected to anode PMTs and 13 connected to cathode PMTs. Let us run:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleDATA1.xml -n -1 -i $SRT_PUBLIC_CONTEXT/Tutorial/data/raw/alphas_new_trigger_all_channels_with_blanket_002.gasific -g 0
The -n -1 option means all events in the file, 32 in this case. Note that the ExampleDATA1.xml file specifies only one regular FMWK module in the job definition, io::EventDump. How are the events correctly read, then? Raw data are read in this case via a special I/O module called RawDataReader. Unlike general FMWK modules, FMWK I/O modules do not have an associated xml file. Rather, they "know" how to read or write a specific format based on the filename prefix/suffix specifications in the ana command-line options -i and -o. In this case, the fact that the input file ends in .gasific tells nextoffline to use the GasIFICreadModule/GasIFICEventHandle classes in RawDataReader.
The event dump for a single event looks like:
Dump of run = 0.0 event = 32
evt.Header()
|=======================================================================
evt.MC()
|=======================================================================
evt.DetSim()
|=======================================================================
evt.DAQ()
|-* recobase::DAQDigits[32]
|=======================================================================
evt.Raw()
|=======================================================================
evt.RawAux()
|=======================================================================
evt.Cal()
|=======================================================================
evt.Reco()
|=======================================================================
evt.User()
|=======================================================================
evt.Summary()
|=======================================================================
We are loading into memory 32 instances of the recobase::DAQDigit class, one for each PMT waveform stored in the raw data.
Example DATA2: NEXT-1-IFIC Alpha Analysis, Pass 1
Now execute the following:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleDATA2.xml -n -1 -i $SRT_PUBLIC_CONTEXT/Tutorial/data/raw/alphas_new_trigger_all_channels_with_blanket_002.gasific -o data/evt/alphas_new_trigger_all_channels_with_blanket_002.pass1.evt.root -g data/histo/alphas_new_trigger_all_channels_with_blanket_002.pass1.histo.root
This is the first pass of the preliminary alpha analysis in NEXT-1-IFIC, as detailed in the CDR. Examples DATA3 and DATA4 below also refer to this analysis. In addition to reading the raw data as in the previous example, we are now executing the following modules (module names / module configurations):
- PedestalSubtraction/local: creates 32 pedestal-subtracted waveforms (recobase::Digit instances), one for each channel. In addition to subtracting the pedestals, the waveforms are also resampled, from a 10 ns to a 100 ns clock tick width. In order to save space, the original recobase::DAQDigit objects are deleted from the event.
- SumDigits/local: sums all waveforms from the cathode PMTs (there are 13 of them), and stores the result in Raw()/CathodeSum/recobase::Digit
- flt::EnergyROIFilter/local: selects events with a minimum charge of 20.E6 ADC counts for the cathode sum
- PeakFinder/cathodesum: processes the cathode sum waveform to reconstruct the recobase::Peak objects present in it
- flt::PeakTypeCounterFilter/local: selects events with one and only one S1-like recobase::Peak, and one and only one alpha-like recobase::Peak
- flt::DriftTimeFilter/local: selects events with a drift time between 40 and 160 us, in order to select events well within the chamber fiducial volume. The drift time is computed as the time difference between the alpha-like peak timing and the S1-like peak timing
- EventDump/default: dumps event information on screen
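The drift-time selection described above can be sketched as follows (an illustration, not the flt::DriftTimeFilter code; the peak times are made-up values):

```python
# Sketch of the drift-time cut (not FMWK code): the drift time is the
# alpha-like peak time minus the S1-like peak time, and events are kept
# if it falls between 40 and 160 microseconds.
def passes_drift_time_cut(t_s1_us, t_alpha_us, t_min_us=40.0, t_max_us=160.0):
    drift_time = t_alpha_us - t_s1_us
    return t_min_us <= drift_time <= t_max_us

print(passes_drift_time_cut(t_s1_us=5.0, t_alpha_us=105.0))  # True (100 us drift)
print(passes_drift_time_cut(t_s1_us=5.0, t_alpha_us=25.0))   # False (20 us drift)
```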
As a result, a typical event content looks like the following:
Dump of run = 0.0 event = 26
evt.Header()
|=======================================================================
evt.MC()
|=======================================================================
evt.DetSim()
|=======================================================================
evt.DAQ()
|-* recobase::DAQDigits[0]
|=======================================================================
evt.Raw()
|-* recobase::Digits[32]
|-* CathodeSum
|-* recobase::Digits[1]
|=======================================================================
evt.RawAux()
|-* CathodeSum
|-* recobase::Peaks[13]
|=======================================================================
evt.Cal()
|=======================================================================
evt.Reco()
|=======================================================================
evt.User()
|=======================================================================
evt.Summary()
|=======================================================================
Note that the 32 recobase::DAQDigit objects have been deleted (0 such objects are now present). In this particular event, PeakFinder found 13 recobase::Peak objects in the cathode sum waveform.
At the dataset level, 11 out of 32 events pass the flt::EnergyROIFilter, 11 out of 11 events pass the flt::PeakTypeCounterFilter, and 10 out of 11 events pass the flt::DriftTimeFilter.
Some histograms are created by the various modules to illustrate the event selection. These histograms are also used to create graphics images at the end of the job. Those images are meant only for a quick evaluation of the event selection; for better-looking plots, the typical procedure would be to post-process the FMWK histogram file alphas_new_trigger_all_channels_with_blanket_002.pass1.histo.root (where these histograms have been written), for example via a simple ROOT macro.
Example DATA3: NEXT-1-IFIC Alpha Analysis, Pass 2
We can now do further event reconstruction and selection, using as starting point the output of example DATA2:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleDATA3.xml -n -1 -i data/evt/alphas_new_trigger_all_channels_with_blanket_002.pass1.evt.root -o data/evt/alphas_new_trigger_all_channels_with_blanket_002.pass2.evt.root -g data/histo/alphas_new_trigger_all_channels_with_blanket_002.pass2.histo.root
While in example DATA2 we used only information from the cathode PMTs, here we will also use information from the anode PMTs. We are executing four FMWK modules (module names / module configurations):
- PeakFinder/singlechannel: creates recobase::Peak objects out of recobase::Digit ones. Unlike in the previous example, this time we are processing the waveforms from the individual PMTs rather than the cathode sum waveform.
- PointReconstruction/local: creates one recobase::ChamberHit object per event, containing full 3-dimensional hit information for the nearly point-like energy deposition expected for an alpha decay within the gas. We use information from the individual anode PMT waveforms (more precisely, from the recobase::Peak objects found within them) to reconstruct the x and y position via a simple center-of-gravity algorithm. The z position is reconstructed as the drift time (measured via the cathode sum waveform, as in the previous example) times the drift velocity. The latter is specified via a module xml parameter.
- flt::RadiusFilter/local: given the reconstructed x and y, we can compute their quadrature sum to obtain r, the reconstructed distance of the alpha decay from the chamber axis. We require r to be less than 35 mm, in order to select events well within the chamber fiducial volume.
- EventDump/default: dumps event information on screen
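The point-reconstruction steps above can be sketched as follows (a conceptual illustration, not the PointReconstruction code; the PMT positions, charges and drift velocity are made-up values):

```python
# Sketch of point reconstruction (not FMWK code): x and y from a
# charge-weighted center of gravity over the anode PMTs, z from drift
# time times drift velocity, r from the quadrature sum of x and y.
from math import hypot

def reconstruct_point(pmt_xy, pmt_charge, drift_time_us, drift_velocity_mm_per_us):
    qtot = sum(pmt_charge)
    x = sum(q * px for (px, _), q in zip(pmt_xy, pmt_charge)) / qtot
    y = sum(q * py for (_, py), q in zip(pmt_xy, pmt_charge)) / qtot
    z = drift_time_us * drift_velocity_mm_per_us
    return x, y, z, hypot(x, y)

pmt_xy = [(-20.0, 0.0), (20.0, 0.0), (0.0, 20.0)]  # PMT positions in mm
pmt_charge = [1.0, 1.0, 2.0]                       # peak charges per PMT
x, y, z, r = reconstruct_point(pmt_xy, pmt_charge, 100.0, 1.0)
print(r, z)  # 10.0 100.0
```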
A typical event content looks like:
Dump of run = 0.0 event = 21
evt.Header()
|=======================================================================
evt.MC()
|=======================================================================
evt.DetSim()
|=======================================================================
evt.DAQ()
|-* recobase::DAQDigits[0]
|=======================================================================
evt.Raw()
|-* recobase::Digits[32]
|-* CathodeSum
|-* recobase::Digits[1]
|=======================================================================
evt.RawAux()
|-* CathodeSum
|-* recobase::Peaks[41]
|-* SinglePMTs
|-* recobase::Peaks[575]
|=======================================================================
evt.Cal()
|=======================================================================
evt.Reco()
|-* recobase::ChamberHits[1]
|=======================================================================
evt.User()
|=======================================================================
evt.Summary()
|=======================================================================
Compared to the event contents from example DATA2, you can now see an additional recobase::ChamberHit object in the Reco() branch.
The flt::RadiusFilter module selects 9 out of 10 events, and creates a scatter plot illustrating the effect of the radial cut.
Example DATA4: NEXT-1-IFIC Alpha Analysis, Pass 3
In example DATA3 we completed our selection of alpha candidates. We are now interested in reconstructing the energy and doing other analysis on our final sample of 9 events. Let's execute:
ana -x $SRT_PUBLIC_CONTEXT/xml/ExampleDATA4.xml -n -1 -i data/evt/alphas_new_trigger_all_channels_with_blanket_002.pass2.evt.root -o data/evt/alphas_new_trigger_all_channels_with_blanket_002.pass3.evt.root -g data/histo/alphas_new_trigger_all_channels_with_blanket_002.pass3.histo.root
Apart from the usual io::EventDump module, this job runs only one more algorithm: EnergyReconstruction. This module starts from the cathode PMT information to reconstruct the deposited energy per event. This information is stored in recobase::Energy objects:
Dump of run = 0.0 event = 21
evt.Header()
|=======================================================================
evt.MC()
|=======================================================================
evt.DetSim()
|=======================================================================
evt.DAQ()
|-* recobase::DAQDigits[0]
|=======================================================================
evt.Raw()
|-* recobase::Digits[32]
|-* CathodeSum
|-* recobase::Digits[1]
|=======================================================================
evt.RawAux()
|-* CathodeSum
|-* recobase::Peaks[41]
|-* SinglePMTs
|-* recobase::Peaks[575]
|=======================================================================
evt.Cal()
|=======================================================================
evt.Reco()
|-* recobase::ChamberHits[1]
|-* recobase::Energys[4]
|=======================================================================
evt.User()
|=======================================================================
evt.Summary()
|=======================================================================
Why four recobase::Energy objects per event rather than one? This is simply an analysis choice. A few corrections are sequentially applied in the estimation of the energy, and the four values store not only the "final" energy but also the "temporary" ones, before all corrections are applied. Here is the meaning of the four energy measurements:
- First instance of recobase::Energy: a very rough gain intercalibration of the cathode PMTs is performed. This energy estimator is simply the cathode sum after this rough intercalibration.
- Second instance of recobase::Energy: a correction for electron attachment is also applied, as a function of the measured drift time. This is because the longer the drift, the higher the fraction of ionization charge that is lost along the drift.
- Third instance of recobase::Energy: a correction for radial effects is also applied, as a function of the measured radius. This is because the more peripheral (high-radius) the event, the larger the effect of photon absorption on the light tube walls, and correspondingly the lower the reconstructed energy.
- Fourth instance of recobase::Energy: currently unused, yielding a value equal to the third instance. It may be used to correct the energy as a function of the event time extent.
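The idea of sequentially applied corrections can be sketched as follows (an illustrative sketch only; the functional forms and constants below are assumptions, not the actual EnergyReconstruction code):

```python
# Sketch of sequentially applied energy corrections (not FMWK code):
# each recobase::Energy instance holds the energy after one more correction.
# The electron lifetime and radial coefficient are made-up values.
from math import exp

def energy_corrections(raw_sum, drift_time_us, radius_mm,
                       electron_lifetime_us=500.0, radial_coeff=1e-3):
    e1 = raw_sum                                         # after gain intercalibration
    e2 = e1 * exp(drift_time_us / electron_lifetime_us)  # attachment correction
    e3 = e2 * (1.0 + radial_coeff * radius_mm)           # radial correction
    e4 = e3                                              # width correction (unused)
    return [e1, e2, e3, e4]

energies = energy_corrections(raw_sum=1000.0, drift_time_us=100.0, radius_mm=20.0)
print([round(e, 1) for e in energies])
```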
In addition to applying the above-mentioned corrections, the EnergyReconstruction algorithm can also be used to compute them directly from the data, depending on the value of a few flags (ComputeRadialCorrection, ComputeDriftCorrection, ComputeWidthCorrection). So, if nothing is known about such corrections a priori, one may have to run this job three times:
- a first time with all corrections set to zero; the drift time correction is computed at the end of this job
- a second time with the drift time correction evaluated in the previous pass; the radial correction is computed at the end of this job
- a third time with the correct drift time and radial corrections; the "final" reconstructed energy per event is evaluated in this step
Again, a number of histograms, profile histograms and scatter plots (together with the corresponding images) are produced during this job. In this case, some data fits are also performed. See the source code for details.
Part 2: the developer perspective
--
MichelSorel - 05 Oct 2010