GetToKnowDataset

How to explore what a dataset is.

The dataset has been downloaded with dq2_get, e.g. streamtest.004891.inclJet.merge.TAG.v12000699

Request:
  • is the practice dataset locally available?
  • can you do your best to answer (get to know this dataset):
    • which physics generation parameters were used (what kind of dataset is it)
    • formats: are these ESD or AOD, or something else?
    • total number of events in the dataset
    • total number of files
    • average number of events/file
    • total number of GB in the dataset

Datasets for the DSS test are named streamtest.XXX.inclJet.merge.AOD.v12000699. For each AOD dataset there is a corresponding TAG dataset. The dataset streamtest.004891.inclJet.merge.AOD.v12000699 has been downloaded with dq2_get and is available locally. dq2_ls -f shows how many files are in the dataset and confirms that they are all in the directory. This dataset is in /ecache/marco/for_dss/streamtest.004891.inclJet.merge.AOD.v12000699

which physics generation parameters were used (what kind of dataset is it)

streamtest.004891.inclJet.merge.AOD.v12000699: Judging from the name, I think it is the result of merging several datasets, selected for Jet events.

Searching for it in AMI (simple search with dataset name) gives no result: http://ami.in2p3.fr/

Using the overview search and looking for streamtest, I found a similar dataset (the TAG one): streamtest.004968.inclJet.merge.TAG.v12000699 (the link below probably does not lead directly to it: https://atlastagcollector.in2p3.fr:8443/AMI/servlet/net.hep.atlas.Database.Bookkeeping.AMI.Servlet.Command )

logicalDatasetName streamtest.004968.inclJet.merge.TAG.v12000699
dataType TAG
physicsCategory run = 8
physicsSubcategory stream = 0
TransformationPackage DEFAULT
physicistResponsible Ayana Holloway,J. Cranshaw
physicsComment https://twiki.cern.ch/twiki/bin/view/Atlas/StreamingTest2006
physicsProcess StreamTest
AtlasRelease 12.0.6
prodsysStatus UNKNOWN
jobConfig DEFAULT
principalPhysicsGroup soft-test
physicsShort inclJet
requestedBy UNKNOWN
totalEvents  
creationComment v3 private production
datasetNumber 4968
version  
lastModified 2008-1-15.3.1. 39. 328123000
created 2007-09-18 12:06:12.0
amiStatus VALID
trashedBy  
trashDate  
trashAnnotation  
trashTrigger  
createdBy mike
modifiedBy mike
productionStep UNKNOWN
relationalLoaded 1

The dataset appears to be part of the StreamTest_2007 project, and its physics content comes from StreamingTest2006: https://twiki.cern.ch/twiki/bin/view/Atlas/StreamingTest2006

The AODs probably come from StreamingTest2006 (see the page above). Jack Cranshaw or Hans Von Der Schmitt may know more.

The TAGs were produced with 12.0.6.5; their content is described here: https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection

formats: are these ESD or AOD, or something else?

The files in the AOD datasets should be AODs. The TAG datasets contain pool collections, i.e. pointers to events, essentially (GUID, event number) tuples.

total number of events in the dataset

Running Athena to count events. The dataset contains 3 files:
  • streamtest.004891.inclJet.merge.AOD.v12000699_tid000000._00001.pool.root.1
  • streamtest.004891.inclJet.merge.AOD.v12000699_tid000000._00002.pool.root.1
  • streamtest.004891.inclJet.merge.AOD.v12000699_tid000000._00003.pool.root.1
> . /osg/app/atlas_app/atlas_rel/12.0.6/cmtsite/setup.sh -tag=AtlasOffline,12.0.6.5
AtlasLogin: Configuration problem - /nfs/osg/app/atlas_app/atlas_rel/12.0.6/AtlasOffline/latest non-existent
> . /osg/app/atlas_app/atlas_rel/12.0.6/cmtsite/setup.sh -tag=AtlasOffline,12.0.6
> . /osg//app/atlas_app/atlas_rel/12.0.6/AtlasOffline/12.0.6/AtlasOfflineRunTime/cmt/setup.sh
> export POOL_CATALOG="xmlcatalog_file:PoolFileCatalog.xml"
> pool_insertFileToCatalog streamtest.004891.inclJet.merge.AOD.v12000699_tid000000._00001.pool.root.1
> pool_insertFileToCatalog streamtest.004891.inclJet.merge.AOD.v12000699_tid000000._00002.pool.root.1
> pool_insertFileToCatalog streamtest.004891.inclJet.merge.AOD.v12000699_tid000000._00003.pool.root.1
> declare -i myctr
> for i in streamtest.004891*; do ((myctr=myctr+1)); athena -c "In=['LFN:$i']" AthenaPoolUtilities/EventCount.py >> output.$myctr; date; done

... counting the events took much longer than I was expecting (the second run was faster: ~60, 40 and 20 min respectively). Here are the results for the 3 files (in order):
EventCount           INFO Input contained: 4275 events
EventCount           INFO  -- Event Range ( 10890 .. 7689 )
EventCount           INFO Input contained: 1 runs
EventCount           INFO  -- 1
EventCount           INFO Input contained the following Event Types
EventCount           INFO  -- Detector
EventCount           INFO  -- Physics
EventCount           INFO  -- Simulation

EventCount           INFO Input contained: 3677 events
EventCount           INFO  -- Event Range ( 22810 .. 20589 )
EventCount           INFO Input contained: 1 runs
EventCount           INFO  -- 1
EventCount           INFO Input contained the following Event Types
EventCount           INFO  -- Detector
EventCount           INFO  -- Physics
EventCount           INFO  -- Simulation

EventCount           INFO Input contained: 2876 events
EventCount           INFO  -- Event Range ( 35080 .. 29899 )
EventCount           INFO Input contained: 1 runs
EventCount           INFO  -- 1
EventCount           INFO Input contained the following Event Types
EventCount           INFO  -- Detector
EventCount           INFO  -- Physics
EventCount           INFO  -- Simulation

There are 10828 events in total (4275 + 3677 + 2876).
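
The per-file counts can also be added up directly from the EventCount logs rather than by hand. The following is a minimal sketch, assuming the logs were written as output.1, output.2, ... by the loop above and that each contains one "Input contained: N events" line; the function name sum_events is illustrative and not part of Athena.

```shell
#!/bin/bash
# Sum the event counts reported by EventCount.py across several log files.
# Assumption: each log contains a line such as
#   EventCount           INFO Input contained: 4275 events
# The "Input contained: 1 runs" line is skipped because it does not end
# in "events".
sum_events() {
    awk '/Input contained: [0-9]+ events/ { total += $(NF - 1) }
         END { print total + 0 }' "$@"
}

# Usage, assuming the logs are in the current directory:
#   sum_events output.*
```

The count is taken from the second-to-last field so that the variable-width padding after "EventCount" does not matter.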

total number of files

There are 3 files:
> ls -lh
total 1.8G
-rw-r--r--  1 marco mwt2  831 Feb 13 16:57 PoolFileCatalog.xml
-rw-r--r--  1 marco mwt2 714M Feb  4 13:54 streamtest.004891.inclJet.merge.AOD.v12000699_tid000000._00001.pool.root.1
-rw-r--r--  1 marco mwt2 618M Feb  4 13:53 streamtest.004891.inclJet.merge.AOD.v12000699_tid000000._00002.pool.root.1
-rw-r--r--  1 marco mwt2 481M Feb  4 13:53 streamtest.004891.inclJet.merge.AOD.v12000699_tid000000._00003.pool.root.1

Considering all the inclusive Jet datasets included in the stream and involved in the skim, there are 27 files.
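
A file count across several dataset directories can be obtained with a short loop. This is only a sketch under assumptions: the datasets are sibling directories under one base directory, their names follow the streamtest.XXX.inclJet.merge.AOD.v... pattern, and data files carry ".pool.root" in their names as in the listing above. The function name count_pool_files is hypothetical.

```shell
#!/bin/bash
# Count the POOL files in every inclusive-jet AOD dataset directory
# found directly under the given base directory.
# Files such as PoolFileCatalog.xml are ignored because they do not
# match the *.pool.root* pattern.
count_pool_files() {
    local base_dir
    local count
    local f
    base_dir=$1
    count=0
    for f in "$base_dir"/streamtest.*.inclJet.merge.AOD.*/*.pool.root*; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Usage, with the base directory quoted earlier on this page:
#   count_pool_files /ecache/marco/for_dss
```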

average number of events/file

With 10828 events in 3 files, the average is about 3609 events per file.
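
A minimal sketch of the average computation, using the event total and file count reported on this page. The function name avg_events is an illustrative assumption, and the result is rounded to the nearest integer.

```shell
#!/bin/bash
# Average number of events per file, rounded to the nearest integer.
# $1 = total number of events, $2 = number of files (must be non-zero).
avg_events() {
    awk -v total="$1" -v nfiles="$2" 'BEGIN { printf "%.0f\n", total / nfiles }'
}

avg_events 10828 3   # prints 3609
```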

total number of GB in the dataset

I evaluated this with du -sh: 1.8G

Considering all the inclusive Jet datasets included in the stream and involved in the skim, there are 14.9G:
> du -sb *AOD*
1899250991      ../streamtest.004891.inclJet.merge.AOD.v12000699
1761686113      ../streamtest.004902.inclJet.merge.AOD.v12000699
1769648534      ../streamtest.004913.inclJet.merge.AOD.v12000699
2074051533      ../streamtest.004924.inclJet.merge.AOD.v12000699
1706809973      ../streamtest.004935.inclJet.merge.AOD.v12000699
1769148313      ../streamtest.004946.inclJet.merge.AOD.v12000699
1494394146      ../streamtest.004957.inclJet.merge.AOD.v12000699
1732153262      ../streamtest.004968.inclJet.merge.AOD.v12000699
1766115652      ../streamtest.004979.inclJet.merge.AOD.v12000699
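
The byte counts above can be added up and converted with awk. This is a sketch that assumes the "G" quoted on this page means GiB (1073741824 bytes), which is the unit du -h uses; the function name total_gib is illustrative.

```shell
#!/bin/bash
# Read "du -sb" output on standard input (byte count in the first column),
# add up the sizes and print the total in GiB with one decimal place.
total_gib() {
    awk '{ bytes += $1 } END { printf "%.1f\n", bytes / 1073741824 }'
}

# Usage:
#   du -sb *AOD* | total_gib
```

For the nine directories listed above the sum is 15973258517 bytes, which this conversion reports as 14.9.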

-- MarcoMambelli - 13 Feb 2008
Topic revision: r3 - 15 Feb 2008, MarcoMambelli