TagNavigatorTool

Introduction

This is the Twiki page for TNT, the ATLAS Tag Navigator Tool. TNT is a utility which allows ATLAS physicists to use the Tag database for analysis in a way that is integrated with the Distributed Data Management (DDM) system. It consists of a number of Python scripts which interact with the Tag database, the grid, and DQ2 (the ATLAS DDM implementation).

Ganga plugin

TNT has now been developed as a plugin for Ganga, which will be available shortly. For information specific to the Ganga plugin, please see TagNavigatorToolGangaPlugin.

Workflow

TNT allows a user to:
  • Select events of interest by running a query on the Tag database
  • Specify an Athena job to be run on those selected events
  • On the grid, run the specified Athena job on the selected events. The job is split up to give one job for each file which contains those events (with an option to specify a minimum number of events to be processed per job - see below for details), with each sub-job then run on a different worker node.
  • Output files from the jobs are then registered on the grid and in a DQ2 dataset
  • The output sandboxes from the jobs are returned to the user, and the completed dataset is registered at a site chosen by the user.

The diagram below (tnt.png) shows the workflow within the system.

Installation and Running Instructions

Prerequisites

You need to have:
  • a relatively recent version of the ATLAS release (preferably 12.0.x, but it will also work with 11.0.x), including the POOL Collection Utilities. These come with the standard release, such as that described in the ATLAS WorkBook.
  • a standard grid User Interface installation: on lxplus, doing
         source /afs/cern.ch/project/gd/LCG-share/sl3/etc/profile.d/grid_env.sh 
    is all you need to set this up
  • the CollSplitByGUID.exe utility: this is currently part of the nightly builds, but is not yet in a proper release. Until then, please download it from here and place it in the TNT running directory.
  • a valid grid certificate (a quick check of your grid setup is sketched after this list)
  • Python version 2.2 or above; on lxplus, 2.4.2 is the default.
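
For example, the following is a quick way to check that the grid UI, your certificate and your Python version are usable before running TNT. This is only a sketch: the exact proxy command (grid-proxy-init or voms-proxy-init) depends on your site setup.
          source /afs/cern.ch/project/gd/LCG-share/sl3/etc/profile.d/grid_env.sh
          grid-proxy-init              # create a grid proxy from your certificate
          grid-proxy-info -timeleft    # confirm the proxy is still valid
          python -V                    # should report version 2.2 or above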

CVS

The TNT code is kept in the ATLAS CVS under offline/Database/AthenaPOOL/POOLCollectionTools/tnt (browsable at http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/offline/Database/AthenaPOOL/POOLCollectionTools/tnt/?cvsroot=atlas).

Installing TNT

  1. Download the tnt.tar into a clean directory (e.g. create a new directory called tnt/) and untar it. You will see several python scripts, a DQ2 install script, and a directory with an example configuration:
          [lxplus010]tnt> tar -xvf tnt.tar
          ./authentication.xml
          ./example/
          ./example/EventCount.py
          ./example/TNT.conf
          ./GenerateCatalogs.py
          ./generateLCGJob.py
          ./GuidExtractor.py
          ./install_dq2_client.sh
          ./LICENCE
          ./notifyUser.py
          ./pycurl.so
          ./README
          ./TNT.conf
          ./TNT.py
       
  2. Install a DQ2 client using the DQ2 install script:
         
          ./install_dq2_client.sh
       
    This downloads and installs the necessary components for the DQ2 client. For more information on DQ2, see the DQ2 Twiki page.
  3. Set the LFC_HOST variable:
           export LFC_HOST=lfc-atlas-test.cern.ch
       
    This should be the LFC containing the data you require, and where your output data will be registered.
    N.B. If you have an LFC client version older than 1.5.8, some LFCs may cause problems.
    For example, with LFC client 1.5.7, trying to access lfc-atlas-test.cern.ch gives a segmentation fault. To fix this temporarily, download these _lfc.so and lfc.py files (from LFC 1.5.10 client) into the TNT working directory and it should be ok.
  4. Set your PYTHONPATH to include the directory which holds lfc.py and _lfc.so, e.g.:
           export PYTHONPATH=/opt/lcg/lib/python:$PYTHONPATH
       
  5. If you are using an ATLAS release older than release 12, you must set POOL_AUTH_USER and POOL_AUTH_PASSWORD, e.g.:
          export POOL_AUTH_USER=atlas_tags_rome_reader
    
          export POOL_AUTH_PASSWORD=tags%reader
       
    These should match the values in the authentication.xml file
  6. If you are running from lxplus, you may need to reset your $X509_CERT_DIR variable to allow access to some UK sites which have certificates signed by the new eScience CA (the environment settings from steps 3-6 are collected into a single sketch after this list):
          export X509_CERT_DIR=/etc/grid-security/certificates
       
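The environment settings from steps 3-6 can be collected into a single snippet that you source before each TNT session. The values below are simply the ones used in the examples above; adjust them for your own LFC, database account and certificate directory.
          # example environment for TNT, using the values from the steps above
          export LFC_HOST=lfc-atlas-test.cern.ch
          export PYTHONPATH=/opt/lcg/lib/python:$PYTHONPATH
          # only needed with releases older than 12:
          export POOL_AUTH_USER=atlas_tags_rome_reader
          export POOL_AUTH_PASSWORD=tags%reader
          # only needed if running from lxplus and accessing sites using the new eScience CA:
          export X509_CERT_DIR=/etc/grid-security/certificates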

Running TNT

The main executable for TNT is the TNT.py script. It takes the following options:
Usage: TNT.py [-h | --help] [-v | --verbose] [-c | --conf configuration file] [-b | --background] [-a | --archive]
-h outputs the Usage message above
-v gives more detailed logging information while TNT is running
-c specifies a configuration file to use. The default is TNT.conf
-b runs TNT as a background process, returning the prompt to the user and writing all output to a log file. An email notification is sent when the job has completed.
-a causes TNT to archive certain data: the configuration file used, the event collection which resulted from the query, and the log file from the TNT run. These are put in a directory of the form archive/TNT-$$, where $$ is the PID associated with the job.

To run TNT, a user should modify the configuration file to suit their requirements and then simply run the script. Each of the parameters in the configuration file is described below. A 'blank' configuration file is provided in the working directory and an example in the example/ directory.
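
For example, a typical invocation might look like the following; myAnalysis.conf is just a hypothetical name for your edited copy of the configuration file.
          # run in the foreground with verbose logging, archiving the query results
          ./TNT.py -v -a -c example/TNT.conf

          # run in the background with your own configuration file (hypothetical name);
          # output goes to a log file and an email notification is sent on completion
          ./TNT.py -b -c myAnalysis.conf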

The Configuration File

Parameters are included in the configuration file with the format
PARAMETER:= VALUE
It is important to include the ":=" - this acts as the separator when parsing the file.

The available parameters are as follows (an example configuration is sketched after this list):
  • SRC_COLLECTION_NAME: Tag collection you wish to query over. See here for a list of available collections.
  • SRC_COLLECTION_TYPE: POOL collection type for the Tag database. For anyone not at CERN, this should always be RelationalCollection. At CERN, one can also use MySQLltCollection if accessing the MySQL database.
  • SRC_CONNECTION_STRING: Connection string for the Tag database. For Rome tags (which is all we can use right now) on the Oracle DB, this should be oracle://atlas_tags/atlas_tags_rome
  • MIN_EVENTS: Minimum number of events to write into a sub-collection (for analysis by one of the sub-jobs)
  • QUERY: Query you wish to run on the Tags. See here for a list of query-able attributes.
  • ATHENA_COMMAND: The Athena command you want to run, exactly as it should appear on the command line.
  • OUTPUT_FILES: The names of any output files from your job which you want registered in LCG and/or DQ2. If there is more than one file, the names should be separated by spaces only. For information on file naming, see the note below.
  • GRID_TYPE: At the moment, this should be LCG
  • INPUT_SANDBOX: Any extra things you want in the input sandbox (the event list, file catalogues etc. are put in automatically). This includes the jobOptions for your Athena job.
  • OUTPUT_SANDBOX: Any extra things to return in your output sandbox. Only the standard output and error are returned by default.
  • REGISTER_OUTPUT: Whether or not to register the output files in a DQ2 dataset. Should be YES or NO
  • OUTPUT_DATASET_NAME: Name of the DQ2 dataset to create. Must not exist already.
  • OUTPUT_DATASET_LOCATION: Where to register the DQ2 dataset when it is completed. Must be one of the DQ2-recognised site names - see here for a list.
  • EMAIL_ADDRESS: If running in background mode, the address at which to mail the user with notification of job completion.
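
As an illustration, a complete configuration might look like the sketch below. All values are purely illustrative (the collection name, query attributes, dataset name and site are not taken from a real setup, and OUTPUT_SANDBOX is left blank on the assumption that an empty value simply returns the defaults); substitute your own, and see example/TNT.conf shipped with TNT for a working example.
          SRC_COLLECTION_NAME:= Rome_Tags_AOD
          SRC_COLLECTION_TYPE:= RelationalCollection
          SRC_CONNECTION_STRING:= oracle://atlas_tags/atlas_tags_rome
          MIN_EVENTS:= 100
          QUERY:= NLooseElectron>0 && MissingET>20000
          ATHENA_COMMAND:= athena.py EventCount.py
          OUTPUT_FILES:= joe_bloggs_output.pool.root
          GRID_TYPE:= LCG
          INPUT_SANDBOX:= EventCount.py
          OUTPUT_SANDBOX:=
          REGISTER_OUTPUT:= YES
          OUTPUT_DATASET_NAME:= users.joebloggs.tnt.test1
          OUTPUT_DATASET_LOCATION:= CERNCAF
          EMAIL_ADDRESS:= joe.bloggs@cern.ch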

A note on the output file naming convention

The name you give in the OUTPUT_FILES parameter is used as the basis for all the output file names of the grid jobs after they have been split. If there are N jobs, a suffix '._x' is inserted before the first '.' in the given filename, where x is an integer between 0 and (N-1); this gives one output file per sub-job. A new directory is created in the LFC under /grid/atlas/dq2 with a name corresponding to the first part of the OUTPUT_FILES name, and all the output files are placed in that directory.

This is repeated for every individual filename specified in the OUTPUT_FILES section of the configuration file.

Example: Suppose you set OUTPUT_FILES as joe_bloggs_output.pool.root. Then a new directory /grid/atlas/dq2/joe_bloggs_output will be created in the LFC. If your tag query results in 10 AOD files being run over by Athena, giving 10 grid jobs, then you will end up with files joe_bloggs_output._0.pool.root to joe_bloggs_output._9.pool.root in that directory.
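
The transformation itself just inserts '._x' before the first '.' of each name; the following shell fragment (not part of TNT, purely to illustrate the rule) reproduces the example above.
          base=joe_bloggs_output.pool.root
          stem=${base%%.*}        # joe_bloggs_output
          rest=${base#*.}         # pool.root
          for x in 0 1 2; do
              echo "${stem}._${x}.${rest}"
          done
          # prints joe_bloggs_output._0.pool.root, joe_bloggs_output._1.pool.root, ...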

The naming was implemented in this way to suit the DQ2 file naming conventions, but comments and feedback are welcome.

What happens at run-time

When TNT starts running, the following steps occur.
  • First, the configuration file is parsed and the input variables stored.
  • The given query is then run on the Tag database, using the connection parameters given in the configuration file. Any events which pass the query are written to a file in the working directory called myEvents.root. Any pre-existing file with that name is deleted.
  • A list of the GUIDs of files which contain the events in myEvents.root is then extracted, and a POOL XML file catalogue generated which contains all the files. The physical filenames for these are extracted from the central LFC which was set with the LFC_HOST variable.
  • The output collection, myEvents.root, is split into a number of sub-collections. There is by default one sub-collection for every file GUID contained in myEvents.root. One can also, however, set a minimum number of events per sub-collection using the MIN_EVENTS parameter in the configuration file. Then, when the events are being gathered into sub-collections, if there are fewer than MIN_EVENTS present for a particular file, these events will be grouped together with those from another file, and so on until the sub-collection contains at least MIN_EVENTS events.
  • If the user has chosen to register output as a DQ2 dataset, a dataset with the selected name is created.
  • The grid job executables and JDL files are generated and stored in the jobs/ directory. For each sub-collection generated in the previous step, there will be one grid job. Names are of the form gridJob_sub_collection_X.jdl, gridJob_sub_collection_X.sh, where X is an index over the sub-collections. Any files with these names already in the jobs/ directory are overwritten.
  • The jobs are submitted to the grid. The job IDs are stored in a file in the working directory called jobIDfile-$$, where $$ is the script process ID, so you can check the status of the jobs at any time using edg-job-status -i jobIDfile-$$ (e.g. if you are getting impatient and want to know whether your jobs are running or not); an example is sketched after this list.
  • TNT polls the Resource Broker regularly until the jobs have all finished (either completed or failed). If a job fails because of some problem with the worker node (e.g. incorrect version of the ATLAS release, wrong python version etc), the job is resubmitted.
  • As jobs finish, their output sandboxes are delivered to the output/ directory where they can be examined at leisure.
  • When all jobs have finished successfully and the required output has been registered correctly in LCG and DQ2, then, if output registration was requested, the dataset is closed, frozen and its location registered.
  • If, however, the files could not all be written to the desired SE and some were instead written to the default SE, the dataset cannot be frozen. The user then needs to move the file(s) manually to the desired SE, close the DQ2 dataset and register its location.
  • In the event that some output files were not registered at all, the job is considered a failure. The user should examine the output and resubmit the job manually if necessary.
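
While waiting for TNT, you can also monitor and collect the jobs yourself with the standard LCG UI commands, for example as below; 12345 stands for the actual process ID that appears in the jobIDfile name.
          # check the status of all jobs submitted by a given TNT run
          edg-job-status -i jobIDfile-12345

          # retrieve the output sandboxes by hand if required
          edg-job-get-output -i jobIDfile-12345 --dir output/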


Major updates:
-- Main.cnichols - 17 Oct 2006

%RESPONSIBLE% Main.cnichols
%REVIEW% Main.cnichols - 14 Nov 2006