Introduction
Tags are event-level metadata written as part of primary AOD production and stored in a database (file-based or relational). This metadata can be used to quickly filter or skim through the events so that one only processes events of interest.
The metadata can vary for different data and/or different software releases.
It is essentially a POOL collection including information about
- run numbers
- triggers
- physics attributes
and references to the events and the files containing different formats of that event (AOD, but also ESD and RDO).
TAGs can be stored in:
- ordinary ROOT files (ROOT TAGs) that can be opened in ROOT just like NTuples
- relational databases (Relational TAGs). The supported database flavors are MySQL and Oracle. There is a central database at CERN, part of the Tier0.
Both persistency formats can be accessed with POOL utilities.
ROOT-based TAGs can be accessed also directly from Athena.
The Relational TAGs hosted at CERN can also be accessed via the Web using the TAG browser (ELSSI - Event Level Selection Service Interface) or a Web Service interface.
You can check the download page TutorialTag080318addFiles to download the files referenced in the examples on this page.
TAG production
By default, the standard reconstruction, RecExCommon, produces the TAG together with the ESD and AOD in one step, but the TAG is actually produced from the AOD. Since each AOD file contains only a small number of events, the AOD files are merged before the TAG is produced from the merged AOD files. The produced TAG is in ROOT format, i.e., it is a ROOT file just like the AthenaAwareNTuple is a ROOT file. The idea is that, in a subsequent step, the TAG ROOT files are loaded into the master TAG database (manually or by the Tier0 Manager). One can make selections by querying the TAG databases or the TAG ROOT files.
You can also build your own TAGs with Athena, but this will not be covered here. If you are interested in doing this, please look at the TagForEventSelection wiki.
The available tags are documented in
https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection
There is also a boolean for each trigger used.
A list of all the tags in FDR is available in tagContent.txt.
Caveats
Problems with POOL in the current release
- POOL utilities in the current release (13.0.40.x) have some problems with FDR data (the utility hangs at the end, some references are out of order). The bugs have been fixed and the fixes will be in a future ATLAS release, probably in the 14 series.
Problems with FDR TAGs
- TAGs produced in the initial production did not have the correct trigger information. A reprocessed set of TAGs was produced, but needed to be modified to point to the merged AOD from the initial production.
- The reprocessed TAGs, after modification, have been loaded into the CERN TAG database. All TAG files other than temporary files used in the loading are corrupt in one way or another.
- Although luminosity block numbers are available in the TAG, a method to connect them to a luminosity calculation is missing (the information needed is being laboriously parsed from log files and uploaded into the database). So although you can do selections, information needed for cross section calculations will be mostly missing.
Setup
Set up your Grid environment and initialize your proxy
source /share/osg-client/setup.sh
grid-proxy-init
Set up DQ2 clients (both dq2-client and dq2_enduser_tools)
source /share/dq2/dq2.sh
If you are running locally, set up the ATLAS release.
The recommended version to work with FDR-1 data and FDR-1 TAGs is 13.0.40
source setup.sh -tag=13.0.40.2,AtlasProduction,releases,runtime,slc3,gcc323
If you plan to run on the Grid, you also need to set up Pathena.
The examples below refer to some files that are available for download on the download page.
Since TAGs are stored essentially in POOL collections, there are a number of command line utilities which can be useful when operating on TAGs. These are:
- CollListAttrib - Prints out the metadata values for events from a collection or collections.
- CollAppend - Appends events from an existing collection or collections onto the end of an existing collection (the collection is created if it does not exist already).
- CollListFileGUID - Prints out a list of the GUIDs of AOD files which contain the events.
- CollListPFN - Prints out a list of physical filenames of AOD files which contain the events.
- CollListToken - Prints out the stringified POOL Token objects (i.e. stringified persistent references) for the number of events specified.
- CollSplitByGUID - Splits a collection (or collections) into a number of smaller collections, based on file boundaries and number of events.
Full details are available in the POOL reference manual.
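To make the idea behind CollSplitByGUID more concrete, here is a minimal Python sketch of splitting a collection on file boundaries. It does not call any POOL utility: the event references are toy tuples, the second GUID is made up, and the function name split_by_guid is hypothetical.

```python
from collections import defaultdict

# Toy event references: (file GUID, run number, event number).
# The first GUID is the one shown on this page; the second is made up for illustration.
events = [
    ("04A50ABB-D82F-DC11-89AE-00E08127C853", 3070, 11),
    ("FFFFFFFF-0000-0000-0000-000000000001", 3070, 12),
    ("04A50ABB-D82F-DC11-89AE-00E08127C853", 3070, 13),
]

def split_by_guid(event_refs):
    """Group event references by the GUID of the file that holds them."""
    groups = defaultdict(list)
    for guid, run, event in event_refs:
        groups[guid].append((run, event))
    return dict(groups)

sub_collections = split_by_guid(events)
for guid, refs in sub_collections.items():
    print(guid, len(refs))
```

Each resulting group plays the role of one sub-collection, so a job reading it only needs to open a single AOD file.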
$ CollAppend -src streamtest_inclJet_v3_1203442639_user RootCollection \
> -dst myTest RootCollection -queryopt 'SELECT RunNumber, EventNumber'
CollAppend: Finished copying input collection(s) `streamtest_inclJet_v3_1203442639_user:RootCollection' to output collection(s) `myTest:RootCollection'
$ CollSplitByGUID.exe -src myTest RootCollection
$ CollListFileGUID.exe -src sub_collection_1 RootCollection
>> Creating query for the collection
>> Executing the query
04A50ABB-D82F-DC11-89AE-00E08127C853
Using DQ2 it is possible to find the files or datasets included in your selection.
DatasetByFileGUID 04A50ABB-D82F-DC11-89AE-00E08127C853
Then, using dq2_ls or the dataset browser, you can find where your datasets are replicated.
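If a selection spans many files, the step from the CollListFileGUID output to the DatasetByFileGUID lookup can be scripted. The sketch below assumes the output looks like the sample shown above (status lines starting with ">>" followed by one GUID per line); the helper extract_guids is hypothetical, and the script only prints the follow-up commands rather than running them.

```python
import re

# Eight-four-four-four-twelve hexadecimal layout, as in the GUID printed above.
GUID_RE = re.compile(r"^[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$")

def extract_guids(output):
    """Return the unique GUIDs found in the utility's text output, in order."""
    guids = []
    for line in output.splitlines():
        candidate = line.strip()
        if GUID_RE.match(candidate) and candidate not in guids:
            guids.append(candidate)
    return guids

sample_output = """>> Creating query for the collection
>> Executing the query
04A50ABB-D82F-DC11-89AE-00E08127C853
"""

for guid in extract_guids(sample_output):
    # Print the follow-up command instead of running it.
    print("DatasetByFileGUID", guid)
```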
Once you have set up your Athena environment, you can follow the instructions in
https://twiki.cern.ch/twiki/bin/view/Atlas/WorkBookReadingAOD to get ready to read an AOD file and do a simple analysis starting from the provided AnalysisSkeleton.
To read the FDR-1 data you have to remove the references to MC truth quantities inside the AnalysisSkeleton job: modify AnalysisSkeleton_topOptions.py so that it does not refer to MC quantities and reads the FDR-1 data.
You can find an example in /users/ryoshida/testarea/13.0.40/PhysicsAnalysis/AnalysisCommon/UserAnalysis/src/AnalysisSkeleton.cxx.mod2.
At this point, to use TAGs in AnalysisSkeleton_topOptions.py you have to:
- Remove the line ServiceMgr.EventSelector.InputCollections=["your_data_file.AOD.pool.root"]
- Add the following five lines, which
  - supply a tag file as input in place of an AOD file;
  - let the EventSelector know that the input is a tag file;
  - tell the EventSelector which reference to use (tags can also navigate to ESD and RAW);
  - specify a filter predicate;
  - add a "ReadCatalog" in which the AOD files pointed to by the tags have been registered.
ServiceMgr.EventSelector.InputCollections = [ "/users/malon/FDR1Tags/StreamEgamma_r12.root" ]
ServiceMgr.EventSelector.CollectionType = "ExplicitROOT"
ServiceMgr.EventSelector.RefName = "StreamAOD"
ServiceMgr.EventSelector.Query = "RunNumber==003070 && NLooseElectron>1"
ServiceMgr.PoolSvc.ReadCatalog += [ "xmlcatalog_file:/data/nas2/users/ryoshida/fdr08_run1/PoolFileCatalog.xml" ]
You'll obtain something like the job option file AnalysisSkeleton_topOptions.py for event extraction available on the download page.
Then you can prepare your job as described in the instructions of the analysis example referenced above and run locally as before.
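When the same selection has to be repeated for several TAG files or cuts, the five job-option lines can be generated rather than edited by hand. The following is only a sketch in plain Python that runs outside Athena and produces text: the helpers build_query and tag_job_options are hypothetical names, and joining cuts with "&&" simply follows the form of the query shown above.

```python
def build_query(*conditions):
    """Join individual attribute cuts into one filter predicate with '&&'."""
    return " && ".join(conditions)

def tag_job_options(tag_file, ref_name, query, catalog):
    """Return the five job-option lines as text, ready to paste into the job options."""
    return [
        'ServiceMgr.EventSelector.InputCollections = [ "%s" ]' % tag_file,
        'ServiceMgr.EventSelector.CollectionType = "ExplicitROOT"',
        'ServiceMgr.EventSelector.RefName = "%s"' % ref_name,
        'ServiceMgr.EventSelector.Query = "%s"' % query,
        'ServiceMgr.PoolSvc.ReadCatalog += [ "xmlcatalog_file:%s" ]' % catalog,
    ]

query = build_query("RunNumber==003070", "NLooseElectron>1")
lines = tag_job_options(
    "/users/malon/FDR1Tags/StreamEgamma_r12.root",
    "StreamAOD",
    query,
    "/data/nas2/users/ryoshida/fdr08_run1/PoolFileCatalog.xml",
)
print("\n".join(lines))
```

Keeping the cuts as separate strings makes it easy to add or drop one condition without retyping the whole predicate.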
Use Pathena to send event selections on the Grid
To use Pathena you have to set it up as explained yesterday; then you have to prepare the job:
- locally included files have to be in the run directory
- prepare a file myfiles_xfer listing the local files that you need to transfer
Now you can send the job using a command like
pathena --inputFileList=myfiles_xfer --shipInputs --outDS your_output_ntuple_name AnalysisSkeleton_topOptions.py
You don't need to specify the input dataset. Pathena will find it for you using back navigation from the events in the collection.
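If you submit many selections, the command above can be assembled from its variable parts. This sketch uses only the options that appear in the command on this page; the function pathena_command is a hypothetical helper, and the command is printed for review rather than executed.

```python
import shlex

def pathena_command(file_list, out_ds, job_options):
    """Assemble the pathena command line from its three variable parts."""
    return [
        "pathena",
        "--inputFileList=%s" % file_list,
        "--shipInputs",
        "--outDS", out_ds,
        job_options,
    ]

cmd = pathena_command("myfiles_xfer", "your_output_ntuple_name",
                      "AnalysisSkeleton_topOptions.py")
# Print rather than execute, so the command can be reviewed first.
print(" ".join(shlex.quote(part) for part in cmd))
```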
You can query the TAG DB and select events by
ELSSI, the Event Level Selection Service Interface (TAG browser)
Introduction
For the FDR-1 data we shall be using the "Tutorial" version 1.0 of ELSSI, which can be found here.
For future reference, the official production version is here (currently hosting Streamtest data). In order to access ELSSI you need to have a Grid certificate installed in your browser.
You can find instructions to load a Grid certificate in a browser here.
If you have your Grid Certificate loaded into the browser and navigate to the ELSSI front page, you will be identified by an abbreviated version of your Distinguished Name (DN), which usually corresponds to your surname or full name. This will be displayed in the title banner and your full DN subsequently used for internal ELSSI bookkeeping.
The Browser Pane
Most interaction with ELSSI is through the browser pane (see below image), which is organised using a familiar "tabbed" format. There are three parent tabs, two of which contain a set of sub-tabs.
You should use the "Back/Continue" buttons at the top of the browser pane to navigate between tabs. In doing so, you will be taken through the selection criteria in the most logical manner. However, it is possible to select a tab by clicking on its title. At any point you may reset the entire selection criteria to their default (blank) values with either of the Reset buttons in the browser pane.
Creating a Query
The Create Query tab has five sub-tabs which are used to construct the query to be passed to the relational TAG database. These sub-tabs are:
Temporal Cut
From here you can select the Run Range of interest (not available yet for FDR) using the two menus to set the minimum and maximum run numbers. In future versions of ELSSI, you will also be able to select a time range corresponding to a period of data taking.
You can also choose the Time period by selecting, with the radio button, the date when the events were produced.
The Continue button takes you to the next sub-tab within the query creation, and then through all the following tabs in the correct order.
Streams
The Streams sub-tab shows details of the physics streams from which selections can be made. Full Dress Rehearsal (FDR-1) streams are available in the _tutorial_ version of ELSSI. The numbers in parentheses indicate the total number of events available in a given stream; these values are updated on an hourly basis.
More detailed information about a stream can be obtained in a popup window which appears when you click on a stream's name.
This link is currently missing for FDR data but it will be added. For completeness, I'm adding images from the production ELSSI server (Streamtest data), which includes the links.
- In the production server, click on the streamtest_inclEle_v3 link to open up the following window.
This stream has 9 TAG datasets associated with it. In addition to the dataset's name, its "Relational Loaded" status is given. This is an indicator of whether all of the events associated with a particular dataset have been loaded into the TAG database, and thus are available for querying through ELSSI.
- Clicking on one of the dataset's names will open a window similar to that shown below.
This page shows the TAG files which are associated with a given dataset; again indicating whether those events have been loaded into the TAG database.
You can select the desired stream (9 streams are available for FDR-1) by selecting the appropriate check box.
The Continue button takes you to the next tab.
Data Quality
This tab allows you to choose between executing your query on all luminosity blocks or on only complete luminosity blocks.
- You can select the Good data - COMPLETE luminosity blocks only radio button
- or all the available data
The Continue button takes you to the next tab.
Trigger
Here you have the option to select which triggers you want your data to have passed (or not passed). The available triggers depend on the stream that you select. Full Dress Rehearsal data have Level 1 and Level 2 triggers and Event Filters; Streamtest data had only two trigger levels, and real data will have additional triggers.
You can select a specific trigger and decide whether your event should fire or should not fire the selected trigger by selecting the corresponding radio button.
The trigger naming scheme may appear confusing initially, but is well documented on the Trigger User Pages.
Physics Attributes
In the Physics Attributes tab you are presented with a series of menus which allow you to refine the returned events on the basis of their physics properties. You can add multiple constraints.
- Using the Electrons menu and the Attribute Constraint box, place the following constraints on the TAG query
  - NLooseElectron = 1
  - LooseElectronEta1 < 2
  - LooseElectronPt1 > 20000
- If necessary, you can manually edit the criteria in the Review/Edit Query String text box, or remove them from the query with the Clear button
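To see what the three electron constraints mean when they are combined, here is a minimal Python sketch. It assumes the constraints are joined with AND, as in an SQL condition; the exact string ELSSI builds may differ. The events are made up for illustration, and the helper names are hypothetical.

```python
# Each constraint is (attribute, operator, value), as entered in the menus.
constraints = [
    ("NLooseElectron", "=", 1),
    ("LooseElectronEta1", "<", 2),
    ("LooseElectronPt1", ">", 20000),
]

def to_query_string(cuts):
    """Render the constraints as one SQL-like condition joined with AND."""
    return " AND ".join("%s %s %s" % cut for cut in cuts)

OPERATORS = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}

def passes(event, cuts):
    """True if a dict of event attributes satisfies every constraint."""
    return all(OPERATORS[op](event[name], value) for name, op, value in cuts)

# Made-up events for illustration only.
events = [
    {"NLooseElectron": 1, "LooseElectronEta1": 0.4, "LooseElectronPt1": 35000.0},
    {"NLooseElectron": 2, "LooseElectronEta1": 1.1, "LooseElectronPt1": 50000.0},
    {"NLooseElectron": 1, "LooseElectronEta1": 2.3, "LooseElectronPt1": 25000.0},
]

print(to_query_string(constraints))
print([passes(e, constraints) for e in events])
```

Only the first toy event survives: the second has two loose electrons and the third fails the eta cut.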
The Continue button takes you to the next tab.
Reviewing the Query
The Review Query tab allows you to manually edit the query prior to sending it to the TAG database. At the top of the tab, the streams on which the query is to be run are displayed. Below that is a text input field in which you may review and amend your query before submitting it.
The Perform Query tab provides the interface for submitting your query to the TAG database.
Count
Prior to submitting your query, make sure that a sensible number of results are going to be returned. The reason for performing this check is that "counting" the events is much less punishing on the database than "selecting" them.
Click the Count button: whilst the database is being queried, a Loading page will be displayed. Once the query has completed, you will be shown information about the submitted query, detailing:
- The collection which was queried
- The full query which was submitted to the TAG database
- A small table showing the number of events returned by the query for each of the selected streams
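The difference between counting and selecting can be illustrated with a toy table. The sketch below uses SQLite from the Python standard library purely as a stand-in for the Oracle or MySQL TAG database; the table name, its columns and its contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (RunNumber INTEGER, EventNumber INTEGER, NLooseElectron INTEGER)"
)
# One thousand made-up events with 0, 1 or 2 loose electrons.
rows = [(3070, n, n % 3) for n in range(1, 1001)]
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)

condition = "NLooseElectron = 1"

# Counting returns a single row, however many events match.
(n_matching,) = conn.execute(
    "SELECT COUNT(*) FROM tags WHERE " + condition
).fetchone()

# Selecting returns one row per matching event.
selected = conn.execute(
    "SELECT RunNumber, EventNumber FROM tags WHERE " + condition
).fetchall()

print(n_matching, len(selected))
conn.close()
```

The count travels back as one number, while the selection has to ship every matching row, which is why it is worth counting first.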
Displaying Results
By selecting the Perform Query : Display Results sub-tab you have the opportunity to display a subset of your events' attributes, prior to (or rather than) creating a POOL ROOT collection.
Select the LooseElectronEta1, LooseElectronPt1 and NLooseElectron attributes from the left hand menu and click the Confirm button.
The maximum number of results to display can be left at the default value of 100.
Click the Display Selection button: after a short wait, you will be presented with a table showing the requested attributes.
You can use the Back to selection button and navigate to a different sub-tab.
Retrieve Event Collection
In the Perform Query : Retrieve Event Collection sub-tab you can retrieve a POOL ROOT collection of your selected events by clicking on the Retrieve Collection button.
In the resulting Results Page, download the newly created ROOT file by clicking on the appropriate link.
Register your collection in the AMI database by clicking on the button, and then view the resulting entry in AMI.
By clicking on the Details link adjacent to the logicalDatasetName, you will be able to view your collection's metadata. Interesting attributes include
- queryString: The SQL query that was created in ELSSI and subsequently run against the TAG database to produce the collection
- webLocation: A URL which can be used by a web browser or the Unix wget utility to download the collection
- afsLocation: The location of the collection on the CERN AFS filesystem
Registering your collection in AMI allows you to easily share it with colleagues. An AMI tutorial is available here (as described yesterday). You can view metadata describing your collections by visiting AMI and then searching for your dataset (i.e. collection) in the Dataset Search utility (see below)
The Session Pane
The session pane, highlighted in the below image, allows you to save a number of ELSSI sessions (i.e. combinations of selection criteria) for future use and also provides a summary view of the current session.
Saving Sessions
The Save Session functionality uses browser cookies; therefore, if your browser rejects cookies, you will not be able to save sessions. ELSSI cookies have a default lifetime of one year, but may be deleted by your browser upon exit, or if you use the Clear Private Data tool in Firefox.
- Enter a name for the current session and click the Save Session button.
- Reset ELSSI using either of the two Reset buttons.
- Verify that the form contents have been cleared.
- Reload the session into the browser by clicking on the link in the Saved Sessions menu.
- Note that stored sessions can be deleted by clicking on the appropriate red cross in the Saved Sessions menu
Viewing Selection Summaries
The Selection Summary provides a thumbnail view of the selection criteria as the query is constructed. Clicking on the Expand/Collapse All button will show/hide this information.
Use ELSSI's event collections
Event collections produced using ELSSI are similar to the ROOT-based TAG files and to the event collections produced by querying ROOT-based TAG files. It is therefore possible to use these collections to extract files with Athena or Pathena.
Links
FDR description
Here are some pages about TAGs (mostly with ATLAS rel 12)
FDR-1 data and TAG files (ATLAS rel 13.0.40.2)
Here is a tutorial on working with TAGs and streamtest data (ATLAS release 12.0.6)
Links to ELSSI
--
MarcoMambelli - 18 Mar 2008