DssPrototype
Developers
Jerry, Jack, Marco
Description
A simple end-to-end skimming service prototype.
- Create a TAG database, given an AOD.
- Process to create SkimSelector output.
- Simple Athena job and wrapper as SkimExtractor prototype.
- Publish results into LRC
Dependencies
- AOD sample
- TAG schema
- MySQL database
- Local DQ2 client tools
- Athena releases
- Data services prototype machine
Indentation indicates a dependency.
Initial Testing of Skim Functionality and Pool File Catalog Utilities
On November 27, 2006, Jack, Marco, and Jerry got together to walk through a set of exercises intended to demonstrate that a skim of the TAG database on tier2-05.uchicago.edu could be performed manually and that the resultant output could then be processed correctly by athena. These exercises were also intended to demonstrate and prove the functionality of several PoolFileCatalog utilities.
Initialization
The following steps were executed:
1. Set up the d-cache and atlas environments. The atlas environment chosen was Release 12.0.3:
source /local/workdir/d-cache.setup.sh
source /share/app/atlas_app/atlas_rel/12.0.3/cmtsite/setup.sh -tag=AtlasOffline,12.0.3
source /share/app/atlas_app/atlas_rel/12.0.3/AtlasOffline/12.0.3/AtlasOfflineRunTime/cmt/setup.sh
Note that the ordering of the source scripts is important: d-cache must be sourced before the atlas setup scripts.
Basic component tests
2. We then ran a connection test on the tier2-05 LRC (UC_VOB) using one of the POOL utilities that come with the Atlas release. Execute FClistPFN to list the files referenced by the local replica catalog (LRC) on tier2-05.uchicago.edu:
FClistPFN -u mysqlcatalog_mysql://dq2user:dqpwd@tier2-05.uchicago.edu/localreplicas
The output should be similar to the following:
gsiftp://tier2-d1.uchicago.edu:2811/pnfs/uchicago.edu/data/usatlas/testIdeal_07/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00034.pool.root.1
gsiftp://tier2-d1.uchicago.edu:2811/pnfs/uchicago.edu/data/usatlas/testIdeal_07/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00032.pool.root.2
.
.
gsiftp://tier2-d1.uchicago.edu:2811/pnfs/uchicago.edu/data/usatlas/testIdeal_07/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.ESD.v12000201_tid002968/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.ESD.v12000201_tid002968._00041.pool.root.1
gsiftp://tier2-d1.uchicago.edu:2811/pnfs/uchicago.edu/data/usatlas/testIdeal_07/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.ESD.v12000201_tid002968/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.ESD.v12000201_tid002968._00022.pool.root.2
.
.
There should be 40 AOD files and 40 ESD files.
3. For a variety of reasons we don't want to use the LRC directly for running athena jobs:
- Avoid interference with DDM activities.
- The PFNs used by the LRC are not intelligible to athena.
- The LRC does not set the file type.
So the next step was to create a copy of the LRC that we could edit. The easiest approach was to export it to an XML file using the POOL utilities. We used FCpublish with a query limiting it to files whose PFN contains ".root.":
FCpublish -d file:/tmp/myPoolFileCatalog.xml -u mysqlcatalog_mysql://dq2user:dqpwd@tier2-05.uchicago.edu/localreplicas -q "pfname like '%.root.%'"
The output file /tmp/myPoolFileCatalog.xml should have a size of 60046 bytes and contain 1053 lines.
4. For our test we can do simple hacks to get things going. All of the files are in dcache, so we just replace the gsiftp protocols with dcache protocols. In the real system, one should be able to look up PFNs and either add or replace protocols based on client capabilities. Also, all of the files are POOL files, so we can just do a global replace of the file type. In the future there may be different file types, so this will need a longer-term solution that is not attempted here.
Create a working directory (e.g. dss_prototype_tests) and execute:
cd dss_prototype_tests
mv /tmp/myPoolFileCatalog.xml PoolFileCatalog.xml
The file PoolFileCatalog.xml must be edited so that the protocol of each PFN is "dcache:" instead of "gsiftp:", and so that the "pfn filetype" is set to "ROOT_All" instead of "NULL". You can do this by executing the following vi commands:
vi PoolFileCatalog.xml
:1,$s/gsiftp:\/\/tier2-d1.uchicago.edu:2811/dcache:/g
:1,$s/filetype="NULL"/filetype="ROOT_All"/g
:wq
Steps 3 and 4 can be executed using the script makePoolFileCatalog.
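The makePoolFileCatalog script itself is not reproduced here. Purely as an illustration, a minimal Python sketch of the same two substitutions (equivalent to the vi commands above, and assuming the same host name and file layout) might look like:
# Hypothetical sketch, not the actual makePoolFileCatalog script: apply the
# same two substitutions as the vi commands in step 4 to PoolFileCatalog.xml.
catalog = "PoolFileCatalog.xml"
text = open(catalog).read()
# Point the PFNs at the dcache protocol instead of the gsiftp door ...
text = text.replace("gsiftp://tier2-d1.uchicago.edu:2811", "dcache:")
# ... and mark every PFN as a ROOT file so POOL/athena can open it.
text = text.replace('filetype="NULL"', 'filetype="ROOT_All"')
open(catalog, "w").write(text)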
5. Before we move on to the TAGs, let's see if athena can actually access one of the PFNs listed in the generated PoolFileCatalog.xml with the dcache: protocol. We'll run an athena job that uses this catalog, accesses one of the files by LFN, and executes AthenaPoolUtilities/EventCount.py:
athena -c "In=['LFN:testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00034.pool.root.1']" AthenaPoolUtilities/EventCount.py
The following lines in the output show that this input PFN has 1000 events:
EventCount INFO ---------- INPUT FILE SUMMARY ----------
EventCount INFO Input contained: 1000 events
EventCount INFO -- Event Range ( 50700 .. 33199 )
This shows that we have a catalog which allows us to read files resident in dcache. The
next step is to look up the file using the references in a TAG file rather than an LFN in the catalog.
6. A set of TAG root files is stored at /local/workdir/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.TAG.v12000201_tid002968. Each root file contains the collected TAG data from four different AOD root files, for a total of 4000 events per TAG root file. Each of these TAG root files can be considered a collection. Now we'll have athena read one of these collections, using the previously created PoolFileCatalog.xml, and count the number of events in the input.
athena -c "In=
'/local/workdir/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.TAG.v12000201_tid002968/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.TAG.v12000201_tid002968._01-04'];CollType='ExplicitROOT'" AthenaPoolUtilities/EventCount.py
The output should show the following:
HistorySvc INFO Service finalised successfully
EventCount INFO ---------- INPUT FILE SUMMARY ----------
EventCount INFO Input contained: 4000 events
EventCount INFO -- Event Range ( 4800 .. 22199 )
7. DEFERRED, skip to 8. Create a new mysql database on tier2-06.uchicago.edu to hold the PoolFileCatalog.xml data, so that we have a mysql database to use instead of a static PoolFileCatalog.xml file.
mysql -h tier2-06.uchicago.edu -u root -p tagdbadmin1
NOTE: This failed with "access denied". The problem probably stems from the fact that we used the wrong mysql client: the one we used was from /usr/bin/mysql, and it probably should have been from /opt/mysql-standard-4.1.20-pc-linux-gnu-i686/bin/mysql.
We decided at this point in time to NOT create the mysql database and continue using the static PoolFileCatalog.xml file.
8. Now that we have checked that we can take a TAG file and access data, let's start at the beginning of our Overview diagram and move through a simple Composer->Selector->Extractor.
At this point we don't have much of a SkimComposer, so one can:
- look directly at the 10 TAG root files in /local/workdir/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.TAG.v12000201_tid002968
Execute:
root /local/workdir/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.TAG.v12000201_tid002968/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.TAG.v12000201_tid002968._01-04.root
root> CollectionTree.Draw("name_of_token_to_display");
Here is the resultant plot from executing CollectionTree.Draw("NLooseElectron");
- look at the mysql database directly.
Execute:
mysql -h tier2-06.uchicago.edu -u tagreader tier2tagdb
mysql> select count(*) from testIdeal07_005711_TAG_v12000201 where ...
For example, one could open one of the root files and make plots using the methods described on TagDBView.
This gives the visual analysis needed to decide what sort of cuts make sense. One can then look at the full
sample in the mysql database to find the total.
We chose a simple cut requiring at least one jet (NJet>0) and at least one electron (NLooseElectron>0).
This reduced the event sample from roughly 40k events to 4014 events, i.e. a relatively loose 10% selection.
Note that at these small sizes one could TChain all 10 of the TTrees in the root files and look at the full sample (see the sketch below), but that is not a scalable solution.
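As an illustration (not part of the original exercise), here is a short PyROOT sketch that chains the ten TAG files and applies the same cut; the directory is the one from step 6 and the tree name CollectionTree is the one used in the root session above:
# Hypothetical PyROOT sketch: chain the ten TAG files and count how many
# events pass the cut used above (NJet>0 && NLooseElectron>0).
import glob
import ROOT
tagdir = "/local/workdir/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.TAG.v12000201_tid002968"
chain = ROOT.TChain("CollectionTree")
for fname in glob.glob(tagdir + "/*.root"):
    chain.Add(fname)
total = chain.GetEntries()
# Drawing with the "goff" option returns the number of selected entries
# without producing any graphics.
passed = chain.Draw("NJet", "NJet>0 && NLooseElectron>0", "goff")
print("Total TAG events: %d, passing the cut: %d" % (total, passed))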
9. The selection criteria form the pseudo output from the SkimComposer. Now we want to simulate the SkimSelector by extracting a simplified TAG file which can be used as input to athena.
Run the following script for the collection testIdeal07_005711_TAG_v12000201 from the TAG database (tagreader@tier2-06.uchicago.edu). The output destination is defined to be the root file "test.coll":
CollCopy.exe -src testIdeal07_005711_TAG_v12000201 MySQLltCollection -srcconnect mysql://tagreader@tier2-06.uchicago.edu/tier2tagdb -dst test.coll RootCollection -query "NJet>0&&NLooseElectron>0" -queryopt "SELECT EventNumber,RunNumber"
The query option implements the selection from above. The queryopt option tells it to strip off all of the metadata other than EventNumber and RunNumber. Note that EventNumber and RunNumber are purely for debugging; they are not used by athena in the following steps.
The following output should be seen:
CollCopy: Finished copying input collection(s) `testIdeal07_005711_TAG_v12000201:MySQLltCollection' to output collection(s) `test.coll:RootCollection'
The file test.coll.root should have been created with a size of 48132 bytes.
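As a quick optional check (not part of the original exercise), the stripped-down collection can be opened in PyROOT to confirm its contents; this assumes the RootCollection stores its data in a TTree named CollectionTree, as the source TAG files do:
# Hypothetical PyROOT check of the output collection produced by CollCopy.exe.
import ROOT
f = ROOT.TFile.Open("test.coll.root")
tree = f.Get("CollectionTree")
print("Selected events in test.coll.root: %d" % tree.GetEntries())
for branch in tree.GetListOfBranches():
    # Expect EventNumber, RunNumber and the event reference (token) branches.
    print(branch.GetName())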
10. Copy the local file PoolFileCatalog.xml to the directory /share/data/t2data so that it is visible to all worker nodes. Then use the file "test.coll.root" as input to athena, together with the PoolFileCatalog.xml stored in /share/data/t2data. First we need to get a local copy of the script EventCount.py and edit it so that the PoolSvc knows to read the PoolFileCatalog.xml located in /share/data/t2data.
cp PoolFileCatalog.xml /share/data/t2data
get_files AthenaPoolUtilities/EventCount.py
edit the file EventCount.py and add the following three lines after line 23:
theApp.EvtMax = 20 # Only analyze 20 events in this example
PoolSvc = Service("PoolSvc")
PoolSvc.ReadCatalog = ["file:/share/data/t2data/PoolFileCatalog.xml"]
Jack put a local copy of the file testExtract.py into the directory /share/data/t2data/. We copied the file testExtract.py from /share/data/t2data into our local working directory and executed:
athena -c "In=['test.coll']; CollType='ExplicitROOT'; OutputAODFile='testExtractor.AOD.root'" testExtract.py
The output file testExtractor.AOD.root should have been created with a size of roughly 2283743 bytes.
The command above took 37 sec on tier2-06, so roughly 2 sec/ev.
Validation
11. The simplest way to get plots of the variables that we used to cut on is to recreate a TAG file from the skim file testExtractor.AOD.root:
wget http://twiki.mwt2.org/pub/DataServices/DataServicesMachine/MergeAODwithTags_v12.py.txt
mv MergeAODwithTags_v12.py.txt MergeAODwithTags_v12.py
athena -c "PoolAODInput=['testExtractor.AOD.root']; PoolAODOutput='mytest'; doWriteAOD=False" MergeAODwithTags_v12.py
The command above took 30 sec on tier2-06, so roughly 1.5 sec/ev. For more information on using these job options, look at Building TAGs.
The file mytest.TAG.root contains all the tags and can be inspected by making some plots with root. If you haven't already done so, execute the source commands from Step 1.
root mytest.TAG.root
.ls
CollectionTree.Print()
CollectionTree.Draw("NLooseElectron")
CollectionTree.Draw("NJet")
.q
As visible from the plots below, there is at least one jet and at least one electron in every event (as designed and expected).
12. To validate the process, the same cuts were applied to the source data and the data was plotted before and after the cuts. Keep in mind that this example processed only 20 valid events, while the comparison plots on the source data below use the first 250 events.
The plots show that the distributions are the same. They also show that the important cut (for this sample at least) is on NLooseElectron: if NLooseElectron>0 then NJet>0.
root /local/workdir/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.TAG.v12000201_tid002968/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.TAG.v12000201_tid002968._01-04.root
CollectionTree.Draw("NJet", "NJet>0&&NLooseElectron>0", "", 250, 0)
CollectionTree.Draw("NJet", "NLooseElectron>0", "", 250, 0)
CollectionTree.Draw("NJet", "", "", 250, 0)
CollectionTree.Draw("NLooseElectron", "NJet>0&&NLooseElectron>0", "", 250, 0)
CollectionTree.Draw("NLooseElectron", "NJet>0", "", 250, 0)
CollectionTree.Draw("NLooseElectron", "", "", 250, 0)
Next steps
Comments from Jack on November 28, 2006:
Once we've converged on the recipe from our notes, I would
like to have Ed try to follow it later this week. I think we should also
start timing everything we do for initial performance evaluation.
Other milestones we can then move on to:
- run the single job extraction on a worker node rather than tier2-06.
- look at the TNT scripts and see what is reasonable to borrow.
- put the TAG files that Jerry built into a DDM dataset.
- Try to subscribe to the TAG dataset at BNL.
Running on the Grid
Now we would like to run the same athena job as before, but on the grid site UC_ATLAS_MWT2. We will also change the testExtract.py file so that the maximum number of events to process is 2000 instead of 20. Since the 20-event run took about 37 seconds to complete (roughly 2 sec/event), a run of 2000 events should scale to about 65 minutes.
We downloaded the attachment run-on-the-grid.tar.gz and executed the command tar -xzvf run-on-the-grid.tar.gz. This created the directory runongrid, in which the file gridjob.sub is the submit file for the Condor jobmanager. We first attempted to run this on the OSG server tier2-osg.uchicago.edu, but were unsuccessful in getting the job to start even after 24 hours because of a huge backlog of Panda jobs on that server. Instead, we decided to access the Condor scheduler directly on the UC_ATLAS_MWT2 site by simply re-logging in to tier2-06.uchicago.edu and executing:
export PATH=/opt/condor/bin:/opt/condor/sbin:$PATH
This allowed us to submit directly to the UC_ATLAS_MWT2 Condor scheduler, putting us at a higher priority than the ATLAS jobs submitted from remote sites.
We then changed to the working directory runongrid and submitted the job directly to the Condor scheduler by executing condor_submit gridjob.sub. The progress of the job was monitored by periodically executing condor_q. The job took approximately 1 hour to run on a worker node at the UC_ATLAS_MWT2 site.
The standard output from the job was in the file gridjob.runongrid.out, and the standard error was in the file gridjob.runongrid.err. Examining the contents of gridjob.runongrid.err showed the following timing results:
Time(s) for job on worker node:
real 60m32.698s
user 21m2.660s
sys 0m23.070s
The size of the output file testExtractor.AOD.root was about 240 MB:
-rw-r--r-- 1 gfg gfg 239477995 Dec 6 12:06 testExtractor.AOD.root
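As an optional sanity check (not part of the original notes), the grid job's output AOD can be opened in plain ROOT to count its events; this again assumes the POOL file stores its event data in a TTree named CollectionTree:
# Hypothetical PyROOT sanity check of the grid job's output AOD.
import ROOT
f = ROOT.TFile.Open("testExtractor.AOD.root")
tree = f.Get("CollectionTree")
print("Events in testExtractor.AOD.root: %d" % tree.GetEntries())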
Here is the resultant plot from executing CollectionTree.Draw("NLooseElectron");
Here is the resultant plot from executing CollectionTree.Draw("NJet");
The ATLAS TagNavigatorTool (TNT) is a utility designed to allow ATLAS physicists to use the TAG database for analysis, using the Distributed Data Management (DDM) system in an integrated way. It consists of a number of python scripts which interact with the TAG database, the grid, and DQ2 (the ATLAS DDM implementation). The twiki page TagNavigatorToolOsg describes some preliminary investigations into the use of this tool in an OSG environment. Attempts to duplicate the exercises described previously in this twiki, using TNT in the OSG environment on tier2-06.uchicago.edu, are described in the TagNavigatorToolOsg twiki.
Major Updates:
-- JackCranshaw - 29 Nov 2006 - updated with comments on section purposes and direction
-- JerryGieraltowski - 28 Nov 2006 - update with exercises done on November 27, 2006
-- RobGardner - 06 Nov 2006
Attached plots:
- plot1: N Electron
- plot2: N Jets
- N Electron in source data
- N Electron in source data, cut
- N Electron in source data, double cut
- N Jets in source data
- N Jets in source data, cut
- N Jets in source data, double cut