TagNavigatorToolOsg

Introduction

THIS IS A WORK IN PROGRESS!

This TWiki page documents a set of exercises performed with the TagNavigatorTool (TNT) on the OSG site tier2-06.uchicago.edu.

Workflow

TNT-OSG allows a user to:
  • Select events of interest by running a query on the Tag database
  • Specify an Athena job to be run on those selected events
  • Run the specified Athena job on the selected events on the OSG grid. The job is split into one sub-job for each file containing selected events (optionally with a minimum number of events per sub-job; see below for details), and each sub-job runs on a different worker node.
  • Optionally, register any output files on the OSG grid and in a DQ2 dataset.
  • Receive the output sandboxes from the jobs and register the completed dataset at a site of their choice.

Installation and Running Instructions

Prerequisites

  • a working directory. Create the working directory /local/workdir/tnt on tier2-06.uchicago.edu
  • an installed ATLAS Release. I will be using ATLAS Release 12.0.3 for the exercises. We could, however, use any relatively recent version of the ATLAS release (preferably 12.0.x), including the POOL Collection Utilities. These come with the standard release, such as that described in the ATLAS WorkBook. I chose to use 12.0.3 to match the Release used in the set of exercises described on the DssPrototype twiki. The ATLAS Release 12.0.3 is installed at /share/data/app
  • a standard OSG grid installation. This has already been installed at /local/inst/pjs; on tier2-06.uchicago.edu the variable GLOBUS_LOCATION is set to this path. In general, on any OSG site, running
         source $GLOBUS_LOCATION/setup.sh
    is all you need to set this up. Site-specific variables for all production OSG sites can be obtained from the OSG Site Catalog.
  • the CollSplitByGUID.exe utility is currently part of the nightly builds, but is not yet in a proper release. I downloaded this file from here and placed it in the tnt working directory.
  • a valid grid certificate
  • Python version 2.2 or above; on tier2-06.uchicago.edu, 2.4.1 is the default.

CVS

The TNT code, which is currently designed to work solely on LCG sites, is kept in the ATLAS CVS repository under offline/Database/AthenaPOOL/POOLCollectionTools/tnt (browsable at http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/offline/Database/AthenaPOOL/POOLCollectionTools/tnt/?cvsroot=atlas). This code is the starting point for the TNT-OSG extensions.

Installing TNT

  1. Download the file tnt.tar into the tnt working directory. The working directory now contains the files:
       266240 Dec 13 10:17 tnt.tar
       76615 Dec 13 10:17 CollSplitByGUID.exe

    I then untarred tnt.tar. The files and directories present after the untar were:
       [gfg@tier2-06 tnt]$ pwd;ls -l
       /local/workdir/tnt
       total 536
       authentication.xml
       AUTHORS
       CollSplitByGUID.exe
       example
       GenerateCatalogs.py
       generateLCGJob.py
       GuidExtractor.py
       install_dq2_client.sh
       LICENCE
       monitorLCGJobs.py
       notifyUser.py
       pycurl.so
       README
       TNT.conf
       TNT.py
       tnt.tar
       
    1. Copy the attached files TNT.conf.txt and setup-tnt-env.sh.txt to the working directory and rename them to TNT.conf and setup-tnt-env.sh (this replaces the TNT.conf that came with the tarball).
  2. Install a DQ2 client using the DQ2 install script. I chose to do this in a new sub-directory called dq2.
     
          mkdir dq2
          cd dq2    
          ../install_dq2_client.sh
       
    This downloads and installs the necessary components for the DQ2 client into the dq2 sub-directory of the tnt working directory. The installed version is 0.2.11.
  3. Set the LFC_HOST variable:
           export LFC_HOST=tier2-05.uchicago.edu
       
    This should be the LFC containing the data you require, and where your output data will be registered.
    1. Set your PATH and PYTHONPATH variables to include the sub-directory dq2. THESE LINES HAVE ALREADY BEEN INCLUDED IN THE FILE setup-tnt-env.sh.
             #DQ2_CLIENT_TOOLS------------------------------------------------------
             export DQ2_CLIENT_PATH="/local/workdir/tnt/dq2"
             echo "Setting up paths to dq2_client tools from......${DQ2_CLIENT_PATH}"
             # ADD PATHS for dq2_client
             export PATH=${DQ2_CLIENT_PATH}:$PATH
             export PYTHONPATH=${DQ2_CLIENT_PATH}:$PYTHONPATH
         

  4. The setup file setup-tnt-env.sh should be sourced each time the user logs into the system. It executes the following:
   echo "Setting up d-cache tools"
   export ATLAS_REL="12.0.3"
   echo "Setting up Atlas Release for Release ${ATLAS_REL}"
   export OSG_LOCATION="/local/inst/pjs"
   echo "Setting up OSG grid client tools from..${OSG_LOCATION}"
   export DQ2_CLIENT_PATH="/local/workdir/tnt/dq2"
   echo "Setting up paths to dq2_client tools from......${DQ2_CLIENT_PATH}"
   export LFC_HOST='tier2-05.uchicago.edu'
   echo "Defining the global variable LFC_HOST to be ${LFC_HOST}"
   echo "Checking to see if you have a valid grid certificate"
        If you do not have a valid grid certificate, the script prints:
           ->Use command "grid-proxy-init" to initialize your grid certificate.
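As an optional pre-flight check before starting TNT, the variables exported by setup-tnt-env.sh can be verified from Python. This is a minimal sketch that is not part of the TNT distribution: the helper name check_tnt_env is hypothetical, and it only checks that the variables listed above are set and that DQ2_CLIENT_PATH is on PYTHONPATH. It does not check for a valid grid certificate or proxy.

```python
import os

# Variables exported by setup-tnt-env.sh, as listed above.
REQUIRED_VARS = ("ATLAS_REL", "OSG_LOCATION", "DQ2_CLIENT_PATH", "LFC_HOST")


def check_tnt_env(env=None):
    """Return a list of environment problems (empty if none were found).

    Hypothetical helper: it inspects environment variables only and does
    not test the grid certificate.
    """
    if env is None:
        env = os.environ
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append("%s is not set" % name)
    dq2_path = env.get("DQ2_CLIENT_PATH")
    if dq2_path:
        # setup-tnt-env.sh prepends DQ2_CLIENT_PATH to PYTHONPATH.
        python_path = env.get("PYTHONPATH", "").split(os.pathsep)
        if dq2_path not in python_path:
            problems.append("DQ2_CLIENT_PATH is not on PYTHONPATH")
    return problems


if __name__ == "__main__":
    for problem in check_tnt_env():
        print("WARNING: " + problem)
```

Running it right after sourcing setup-tnt-env.sh should print nothing; any warning points at a step of the setup that did not take effect in the current shell.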

Running TNT

The main executable for TNT is the TNT.py script. It takes the following options:
Usage: TNT.py [-h | --help] [-v | --verbose] [-c | --conf configuration file] [-b | --background] [-a | --archive]
-h outputs the usage message above
-v gives more detailed logging information while TNT is running
-c specifies the configuration file to use. The default is TNT.conf
-b runs TNT as a background process, returning the prompt to the user and writing all output to a log file. An email notification is sent to EMAIL_ADDRESS when the job completes.
-a causes TNT to archive the configuration file used, the event collection that resulted from the query, and the TNT log file. These are put in a directory named archive/TNT-$$, where $$ is the PID associated with the job.
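The options above map naturally onto Python's standard getopt module. The sketch below is hypothetical and is not taken from TNT.py; parse_tnt_args and the keys of its settings dictionary are illustrative names. It shows how the documented short and long options become settings, including the archive/TNT-$$ directory name built from the process ID.

```python
import getopt
import os


def parse_tnt_args(argv, pid=None):
    """Parse TNT.py-style options into a settings dict (hypothetical sketch).

    Raises getopt.GetoptError for unknown options or a missing -c value.
    """
    opts, args = getopt.getopt(
        argv, "hvc:ba", ["help", "verbose", "conf=", "background", "archive"])
    settings = {"help": False, "verbose": False, "conf": "TNT.conf",
                "background": False, "archive_dir": None}
    for opt, value in opts:
        if opt in ("-h", "--help"):
            settings["help"] = True
        elif opt in ("-v", "--verbose"):
            settings["verbose"] = True
        elif opt in ("-c", "--conf"):
            settings["conf"] = value
        elif opt in ("-b", "--background"):
            settings["background"] = True
        elif opt in ("-a", "--archive"):
            # Archived files go to archive/TNT-<pid>, as described above.
            if pid is None:
                pid = os.getpid()
            settings["archive_dir"] = os.path.join("archive", "TNT-%d" % pid)
    return settings, args
```

For example, the arguments of python ./TNT.py -v give verbose=True with the default configuration file TNT.conf.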

To run TNT, a user should modify the configuration file to suit their requirements and then simply run the script. Each of the parameters in the configuration file is described below. A 'blank' configuration file is provided in the working directory and an example in the example/ directory.

The Configuration File

Parameters are specified in the configuration file, TNT.conf, in the format
PARAMETER:= VALUE
It is important to include the ":=", which acts as the separator when parsing the file. For example:
SRC_CONNECTION_STRING:= mysql://tagreader@tier2-06.uchicago.edu/tier2tagdb
MIN_EVENTS:=

QUERY:= NJet>0&&NLooseElectron>0
ATHENA_COMMAND:= athena -c "In=['myEvents']; CollType='ExplicitROOT'" EventCount.py
OUTPUT_FILES:=
GRID_TYPE:= OSG

INPUT_SANDBOX:= EventCount.py
OUTPUT_SANDBOX:=
REGISTER_OUTPUT:= NO
OUTPUT_DATASET_NAME:=
OUTPUT_DATASET_LOCATION:=
EMAIL_ADDRESS:= jerryg@anl.gov
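The ":=" rule can be illustrated with a few lines of Python. This is a minimal sketch, not the parser in TNT.py; parse_tnt_conf is a hypothetical name. It splits each line on the first ":=" only, so values such as the ATHENA_COMMAND above, which contain "=", survive intact, and it stores an empty value as None so a default can be substituted later.

```python
def parse_tnt_conf(text):
    """Parse 'PARAMETER:= VALUE' lines into a dict (hypothetical sketch).

    Lines without ':=' (for example blank lines) are ignored; an empty
    value is stored as None.
    """
    params = {}
    for line in text.splitlines():
        if ":=" not in line:
            continue
        # Split on the first ':=' only, so values may themselves contain '='.
        name, value = line.split(":=", 1)
        value = value.strip()
        params[name.strip()] = value if value else None
    return params
```

Splitting on ":=" rather than on "=" or ":" is what lets connection strings (mysql://...) and Athena options (In=[...]) appear unquoted in the file.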

| Parameter | Description | Default |
| SRC_COLLECTION_NAME | Tag collection you wish to query over. See here for a list of available collections. | testIdeal07_005711_TAG_v12000201 |
| SRC_COLLECTION_TYPE | POOL collection type for the Tag database. For anyone not at CERN, this should always be RelationalCollection. At CERN, one can also use MySQLltCollection when accessing the MySQL database. | MySQLltCollection |
| SRC_CONNECTION_STRING | Connection string for the Tag database. For Rome tags (the only ones usable right now) on the Oracle DB, this should be oracle://atlas_tags/atlas_tags_rome | mysql://tagreader@tier2-06.uchicago.edu/tier2tagdb |
| MIN_EVENTS | Minimum number of events to write into a sub-collection (for analysis by one of the sub-jobs) | null |
| QUERY | Query you wish to run on the Tags. See here for a list of queryable attributes. | NJet>0&&NLooseElectron>0 |
| ATHENA_COMMAND | The Athena command you want to run, exactly as it should appear on the command line. | athena -c "In=['myEvents']; CollType='ExplicitROOT'" EventCount.py |
| OUTPUT_FILES | Names of any output files from your job that you want registered in LCG and/or DQ2. Separate multiple names with spaces only. For information on file naming, see the note below. | null |
| GRID_TYPE | LCG, OSG, or NG | OSG |
| INPUT_SANDBOX | Any extra files you want in the input sandbox, including the jobOptions for your Athena job (the event list, file catalogues, etc. are added automatically). | EventCount.py |
| OUTPUT_SANDBOX | Any extra files to return in the output sandbox. Only the standard output and error are returned by default. | null |
| REGISTER_OUTPUT | Whether to register the output files in a DQ2 dataset: YES or NO | NO |
| OUTPUT_DATASET_NAME | Name of the DQ2 dataset to create. It must not already exist. | null |
| OUTPUT_DATASET_LOCATION | Where to register the DQ2 dataset when it is completed. Must be one of the DQ2-recognised site names; see here for a list. | null |
| EMAIL_ADDRESS | In background mode, the address to which the job-completion notification is sent. | your_mailto_address |

A note on the output file naming convention

NOTE: THIS SECTION IS SPECIFIC TO AN LCG IMPLEMENTATION AND HAS NOT YET BEEN MODIFIED TO WORK FOR OSG.

The name you give in the OUTPUT_FILES parameter is used as the basis for all the output file names of the grid jobs after they have been split. If there are N jobs, the string '._x' is inserted before the first '.' in the given filename, where x is an integer between 0 and (N-1); this gives one output file per sub-job. A new directory is created in the LFC under /grid/atlas/dq2, named after the part of the OUTPUT_FILES name before the first '.', and all the output files are placed in that directory.

This is repeated for every individual filename specified in the OUTPUT_FILES section of the configuration file.

Example: Suppose you set OUTPUT_FILES as joe_bloggs_output.pool.root. Then a new directory /grid/atlas/dq2/joe_bloggs_output will be created in the LFC. If your tag query results in 10 AOD files being run over by Athena, giving 10 grid jobs, then you will end up with files joe_bloggs_output._0.pool.root to joe_bloggs_output._9.pool.root in that directory.

This was implemented in this way to suit the DQ2 file naming conventions, but comments / feedback are welcome.
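The naming rule can be expressed as a short Python sketch. The function names output_names and plan_outputs are hypothetical, and the handling of a name with no '.' at all is an assumption, since the text does not cover that case. The sketch reproduces the joe_bloggs_output example above.

```python
import posixpath

# Base LFC directory named in the text above.
LFC_DQ2_ROOT = "/grid/atlas/dq2"


def output_names(filename, n_jobs):
    """Return (lfc_directory, per-job names) for one OUTPUT_FILES entry."""
    stem, dot, rest = filename.partition(".")
    directory = posixpath.join(LFC_DQ2_ROOT, stem)
    if dot:
        # Insert '._x' before the first '.': a.pool.root -> a._0.pool.root
        names = ["%s._%d.%s" % (stem, x, rest) for x in range(n_jobs)]
    else:
        # Assumption: a name without any '.' simply gets '._x' appended.
        names = ["%s._%d" % (stem, x) for x in range(n_jobs)]
    return directory, names


def plan_outputs(output_files, n_jobs):
    """Apply output_names to every space-separated name in OUTPUT_FILES."""
    return dict((name, output_names(name, n_jobs))
                for name in output_files.split())
```

With OUTPUT_FILES set to joe_bloggs_output.pool.root and 10 jobs, output_names returns the directory /grid/atlas/dq2/joe_bloggs_output and the names joe_bloggs_output._0.pool.root through joe_bloggs_output._9.pool.root.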

What happens at run-time

When TNT starts running, the following steps occur.
  • First, the configuration file is parsed and the input variables stored.
  • The given query is then run on the Tag database, using the connection parameters given in the configuration file. Any events which pass the query are written to a file in the working directory called myEvents.root. Any pre-existing file with that name is deleted.
  • A list of the GUIDs of files which contain the events in myEvents.root is then extracted, and a POOL XML file catalogue generated which contains all the files. The physical filenames for these are extracted from the central LFC which was set with the LFC_HOST variable.
  • The output collection, myEvents.root, is split into a number of sub-collections. There is by default one sub-collection for every file GUID contained in myEvents.root. One can also, however, set a minimum number of events per sub-collection using the MIN_EVENTS parameter in the configuration file. Then, when the events are being gathered into sub-collections, if there are fewer than MIN_EVENTS present for a particular file, these events will be grouped together with those from another file, and so on until the sub-collection contains at least MIN_EVENTS events.
  • If the user has chosen to register output as a DQ2 dataset, a dataset with the selected name is created.
  • The grid job executables and JDL files are generated and stored in the jobs/ directory. For each sub-collection generated in the previous step, there will be one grid job. Names are of the form gridJob_sub_collection_X.jdl, gridJob_sub_collection_X.sh, where X is an index over the sub-collections. Any files with these names already in the jobs/ directory are overwritten.
  • The jobs are submitted to the grid. The job IDs are stored in a file in the working directory called jobIDfile-$$ where $$ is the script process ID, so you can check the status of the jobs any time using edg-job-status -i jobIDfile-$$ (e.g. if you are getting impatient and want to know whether your jobs are running or not).
  • TNT polls the Resource Broker regularly until the jobs have all finished (either completed or failed). If a job fails because of some problem with the worker node (e.g. incorrect version of the ATLAS release, wrong python version etc), the job is resubmitted.
  • As jobs finish, their output sandboxes are delivered to the output/ directory where they can be examined at leisure.
  • When all jobs have finished successfully and, if requested, the output has been registered correctly in LCG and DQ2, the dataset is closed and frozen, and its location is registered.
  • If, however, the files could not all be written to the desired SE but some were instead written to the default SE, the dataset cannot be frozen. The user then needs to manually move the file(s) to the desired SE, close and register the location of the DQ2 dataset.
  • In the event that some output files were not registered at all, the job is considered a failure. The user should examine the output and resubmit the job manually if necessary.
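The MIN_EVENTS grouping described in the list above can be illustrated with a small greedy sketch. It is not the CollSplitByGUID.exe algorithm, whose source is not reproduced here: the order in which files are visited and the treatment of a final group with fewer than MIN_EVENTS events (merged into the previous group here) are assumptions, and the function names are hypothetical. The sketch also shows a simple upper bound: since every group holds at least MIN_EVENTS events, there can be at most total // MIN_EVENTS groups.

```python
def group_by_min_events(events_per_guid, min_events=None):
    """Group per-file (GUID) event counts into sub-collections.

    events_per_guid: list of (guid, n_events) pairs, in processing order.
    Returns a list of sub-collections, each a list of GUIDs. Without
    min_events there is one sub-collection per GUID.
    """
    if not min_events:
        return [[guid] for guid, _ in events_per_guid]
    groups, current, count = [], [], 0
    for guid, n_events in events_per_guid:
        current.append(guid)
        count += n_events
        if count >= min_events:
            groups.append(current)
            current, count = [], 0
    if current:
        if groups:
            # Assumption: merge an undersized remainder into the last group.
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


def max_subcollections(total_events, min_events):
    """Upper bound on the number of sub-collections (at least one)."""
    return max(1, total_events // min_events)
```

For the 4014 events selected in the exercise below, max_subcollections(4014, 1000) gives 4, while TNT actually produced 3; groups built file by file can exceed MIN_EVENTS, so the bound need not be reached.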

Exercises

We will run the same simple exercise that we previously ran manually as described in DssPrototype. The first thing we need to do is set up the correct environment for TNT by executing the script setup-tnt-env.sh.
  • Set up the TNT environment by executing: source ./setup-tnt-env.sh
  • Start TNT running by executing: python ./TNT.py -v
      • The modified file TNT.py (attached) was used in the following steps.
      • Since we want to duplicate the exercises we performed in the DssPrototype tests, we need the files PoolFileCatalog.xml and PoolFileCatalog.xml.BAK in /share/data/t2data linked into our working directory /local/workdir/tnt. I created the following links in /local/workdir/tnt:
              ln -fs /share/data/t2data/PoolFileCatalog.xml PoolFileCatalog.xml
              ln -fs /share/data/t2data/PoolFileCatalog.xml.BAK PoolFileCatalog.xml.BAK
    
             A sample File entry in PoolFileCatalog.xml is:
                     <File ID="68A2CDFA-DA38-DB11-9117-00123F20A945">
                       <physical>
                         <pfn filetype="ROOT_All" name="dcache:/pnfs/uchicago.edu/data/usatlas/testIdeal_07/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00034.pool.root.1"/>
                       </physical>
                       <logical>
                         <lfn name="testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00034.pool.root.1"/>
                       </logical>
                       <metadata att_name="md5sum" att_value="4c9226fea06663cb25e6d2d0b0709b60"/>
                       <metadata att_name="fsize" att_value="96364865"/>
                       <metadata att_name="lastmodified" att_value="1160167893"/>
                       <metadata att_name="archival" att_value="V"/>
                     </File>
       
    Note that the protocol of each pfn is "dcache" and the filetype of each pfn is "ROOT_All". PoolFileCatalog.xml contains 40 "*recon.AOD*" files and 40 "*recon.ESD*" files.
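The checks in the note above (pfn protocol, number of AOD and ESD entries) can be scripted with the standard xml.etree.ElementTree module. This is a sketch under the assumption that all File entries sit under a single root element (POOLFILECATALOG in the usual POOL layout); the helper names are hypothetical and are not part of TNT or POOL.

```python
import xml.etree.ElementTree as ET


def read_pool_catalog(xml_text):
    """Map each File ID (GUID) to its PFNs and LFNs."""
    root = ET.fromstring(xml_text)
    catalog = {}
    for entry in root.iter("File"):
        pfns = [pfn.get("name") for pfn in entry.findall("physical/pfn")]
        lfns = [lfn.get("name") for lfn in entry.findall("logical/lfn")]
        catalog[entry.get("ID")] = {"pfns": pfns, "lfns": lfns}
    return catalog


def count_by_kind(catalog, marker):
    """Count entries with an LFN containing marker, e.g. 'recon.AOD'."""
    return sum(1 for entry in catalog.values()
               if any(marker in lfn for lfn in entry["lfns"]))


def pfn_protocols(catalog):
    """Return the set of protocols (the text before the first ':') used by PFNs."""
    return set(pfn.split(":", 1)[0]
               for entry in catalog.values() for pfn in entry["pfns"])
```

Applied to the linked PoolFileCatalog.xml, count_by_kind(catalog, "recon.AOD") and count_by_kind(catalog, "recon.ESD") should each report 40, and pfn_protocols should return only "dcache".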

When TNT starts running, the following steps occur.
  • First, the configuration file is parsed and the input variables stored.
    • The configuration file used was TNT.conf.
  • The given query is then run on the Tag database, using the connection parameters given in the configuration file. Any events which pass the query are written to a file in the working directory called myEvents.root. Any pre-existing file with that name is deleted.
    • The attached TNT.conf file has the QUERY attribute set to NJet>0&&NLooseElectron>0. This generates a TAG collection of 4014 events, as seen in the exercises in the DssPrototype section. TNT by default generates multiple sub-collections, each containing a minimum of MIN_EVENTS events. If we use the attached TNT.conf file with MIN_EVENTS not set, TNT generates sub-collections each containing a minimum of 100 events, which would result in roughly 4014/100 ≈ 40 jobs. If we set MIN_EVENTS to 4000, TNT would generate 1 sub-collection of 4014 events; if we set it to 1000, TNT would generate 3 sub-collections. Because every sub-collection holds at least MIN_EVENTS events, dividing the total number of events in the TAG collection by MIN_EVENTS and rounding down gives an upper bound on the number of sub-collections (4 for MIN_EVENTS = 1000); the actual number can be smaller, since events are grouped file by file and a sub-collection can exceed MIN_EVENTS. For now, let's set MIN_EVENTS in TNT.conf to 1000, which results in 3 sub-collections.
    • Running TNT and aborting it after 10-20 seconds, we see the following first few lines of output:
            /local/workdir/tnt/dq2/lfc.py:4: RuntimeWarning: Python C API version mismatch for module _lfc: This Python has API version  1012, module _lfc has version 1011.
           import _lfc
           **** Welcome to TNT! ****

           Using TNT.conf as configuration file

           Job name is gfg-18094

           Executing query 'NJet>0&&NLooseElectron>0' on tag database...
           CollCopy.exe -src testIdeal07_005711_TAG_v12000201 MySQLltCollection -srcconnect mysql://tagreader@tier2-06.uchicago.edu/tier2tagdb -dst myEvents RootCollection -queryopt 'SELECT RunNumber, EventNumber' -query "NJet>0&&NLooseElectron>0"
           CollCopy: Finished copying input collection(s) `testIdeal07_005711_TAG_v12000201:MySQLltCollection' to output collection(s) `myEvents:RootCollection'

          ./CollSplitByGUID.exe -src myEvents RootCollection -minevents 1000
          Minimum number of events: 1000

         We also see that the following three sub-collections have been created:
            16610 Dec 18 14:36 sub_collection_1.root
            16876 Dec 18 14:36 sub_collection_2.root
            26009 Dec 18 14:36 sub_collection_3.root
         Using ROOT to count the events in each sub-collection, we see that:
           sub_collection_1.root contains 1030 events
           sub_collection_2.root contains 1059 events
           sub_collection_3.root contains 1925 events

  • A list of the GUIDs of files which contain the events in myEvents.root is then extracted, and a POOL XML file catalogue generated which contains all the files. The physical filenames for these are extracted from the central LFC which was set with the LFC_HOST variable.
    • The dataset that contained all of the AOD and ESD data for this exercise was testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201. Querying the DQ2 dataset browser for this dataset shows:
        Registered locations for the latest dataset version
        Complete replicas: None
        Incomplete replicas: BNLPANDA UC_VOB
        OSG sub-datasets, with modification dates:
            testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968   2006-10-05 08:03:07
            testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002925   2006-10-02 13:35:45
    • The sub-dataset testIdeal*_tid002968 is the one we are interested in since it was registered locally at the UC_VOB server. If, however, we look at the DQ2 dataset browser for this sub-dataset we see that the main catalog at CERN only sees this sub-dataset as INCOMPLETE and existing only at BNLPANDA and CERNCAF.
         Registered locations for the latest dataset version
         Complete replicas: None
         Incomplete replicas: BNLPANDA CERNCAF
This causes a problem for TNT, since TNT expects to find all of the GUIDs associated with the dataset testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968 at some complete replica, and no complete replica exists for this dataset. The following line in the file GuidExtractor.py was therefore modified to look specifically for INCOMPLETE replicas:
       site_map = dq.locationClient.queryDatasetLocations(vuids, dataset_names, LocationState.INCOMPLETE)
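Why the COMPLETE-only lookup fails can be shown with a toy model of the replica information reported by the DQ2 dataset browser above. This is not the DQ2 client API (the real call is the queryDatasetLocations line shown above); the dictionary layout and the function sites_in_state are hypothetical, and only the dataset and site names come from this page.

```python
# Toy model: replica sites per location state for the sub-dataset above,
# as reported by the DQ2 dataset browser on this page.
REPLICAS = {
    "testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968": {
        "COMPLETE": [],
        "INCOMPLETE": ["BNLPANDA", "CERNCAF"],
    },
}


def sites_in_state(dataset, state, replicas=REPLICAS):
    """Return the sites holding a replica of dataset in the given state."""
    return replicas.get(dataset, {}).get(state, [])
```

A lookup restricted to COMPLETE returns no sites, so no GUIDs can be resolved; asking for INCOMPLETE returns BNLPANDA and CERNCAF, which is what the modified line achieves. An incomplete replica may not hold every file, so some GUIDs could still be missing.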


-- JerryGieraltowski - 14 Dec 2006
Topic attachments
  • TNT.conf.txt (1 K, 14 Dec 2006 - 16:00, JerryGieraltowski): TNT.conf
  • TNT.py.txt (21 K, 19 Dec 2006 - 16:14, JerryGieraltowski): TNT.py, modified for OSG exercises
  • setup-tnt-env.sh.txt (1 K, 12 Dec 2006 - 20:56, JerryGieraltowski): Setup TNT environment
  • tnt.tar (260 K, 12 Dec 2006 - 21:02, JerryGieraltowski): TNT tarball, version 0.2
Topic revision: r7 - 19 Dec 2006, JerryGieraltowski