Setup of Data Services Machine prototype

Introduction

A Data Services Machine is meant to be a machine that lives at a Tier2 Center as a VO-edge server, dedicated to ATLAS use, and that:

  • Provides users the ability to access the Tier2's DQ2 server.
  • Hosts or provides access to ATLAS-specific database services, such as TAG and possibly conditions (IOV and calibration) databases.
  • Provides a skimming service for Tier2-resident datasets through either command-line or web interfaces.

The host tier2-06.uchicago.edu is the prototype used for the Data Services. To avoid NFS problems (known to be common, also during pacman installations) the software is installed locally. The directory chosen is /local, a 100 GB disk dedicated to this purpose. The components used are:

  • a MySQL DB
  • a DQ2 client (and the Grid clients required to operate it)
  • ATLAS releases, accessed using the cluster-wide installation /share/app (OSG_APP)

Directory structure

In order to distinguish the prototype-installed software from the system software, for the time being the DS prototype software is mostly installed in a separate area on a local disk: /local/inst/. In this directory reside both tests and working clients that can also be used by MW Tier2 users, as described in the quick howto DQ2Subscriptions.

The preferred work area for transfers and tests by the people working on the DS project is /local/workdir. Developers can create subdirectories in this area and use them as they like (download data, write scripts, elaborate data, run Athena, ...). If they have to install packaged software useful for the project, they should use the install area above.

MySQL DB

The database has been installed in /opt/mysql using the tar.gz package from mysql.com (the RPMs are built for i386 while the tar.gz is built for i686): mysql-standard-4.1.20-pc-linux-gnu-i686.tar.gz. It has been installed and configured, and self-tests have been run; the server variables (mysql-variables.txt) and the configuration file my.cnf (attached as mysql-my.cnf) are available below.

A database has been created with a reader user with unprotected read (SELECT) access (tier2reader) and a writer user (tier2writer). A log with notes about the installation is attached (mysql-tag-install.txt).
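As a minimal sketch of how such accounts can be created from the mysql client (the database name tagdb is only a placeholder and the writer password is illustrative; the actual names and settings are in the attached installation log, and the binary path assumes the standard tar.gz layout under /opt/mysql):

/opt/mysql/bin/mysql -u root -p
mysql> CREATE DATABASE tagdb;                                 # placeholder database name
mysql> GRANT SELECT ON tagdb.* TO 'tier2reader'@'%';          # unprotected (password-less) read access
mysql> GRANT ALL PRIVILEGES ON tagdb.* TO 'tier2writer'@'%' IDENTIFIED BY 'changeme';
mysql> FLUSH PRIVILEGES;

A quick read-only check from any client host:

/opt/mysql/bin/mysql -h tier2-06.uchicago.edu -u tier2reader tagdb -e 'SHOW TABLES;'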

Java and Jas3

The required Java 5 (jdk1.5) and JAS3 have been installed on the prototype (/local/inst/java and /local/inst/jas).

To run JAS3, source the Java setup and then start JAS:
. /local/inst/java/setup.sh
/local/inst/jas/jas3

The SQLTuple plugin has been installed and a tuples DB has been created on this server; the reader and writer users are the same as for the TAG DB.

For documentation check:

Globus and Grid clients

Grid software has been installed using the Panda Job Submitter package, adding the SRM clients to it.
wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-latest.tar.gz
... install it
pacman -get GCL:PandaJS
pacman -get http://vdt.cs.wisc.edu/vdt_1311_cache:SRM-V1-Client
pacman -get http://vdt.cs.wisc.edu/vdt_1311_cache:SRM-V2-Client
. ./setup.sh 
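After sourcing the setup as above, a quick sanity check of the Grid client can be done (a sketch, assuming a valid user certificate in ~/.globus; grid-proxy-init and grid-proxy-info come with the Globus clients, globus-url-copy with the transfer tools, and srmcp with the SRM client packages):

grid-proxy-init               # create a proxy from the user certificate
grid-proxy-info               # show the proxy subject and remaining lifetime
which globus-url-copy srmcp   # confirm the transfer and SRM clients are on the PATH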

The OSG WN Client is being considered for future installations. Using a Panda/OSG packaged client, instead of starting from the components in VDT, means we do not have to worry about which components are included: the packager makes sure that all the necessary elements are there.

One more installation has been performed to compare the OSG/ITB WN Client package with the Panda client. The goal is to select the client that includes all the required functionality and is smaller/lighter (or has fewer requirements).

A note about the installations: neither VDT 1.3.10 (PJS) nor VDT 1.3.11 (WNCli) recognizes SLC 3 (SLF 3 is included), so pacman asks for confirmation to proceed anyway.

cd /local/inst/wnc060814
pacman -get ITB:wn-client.pacman  
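Since the goal is to compare size and requirements, a simple footprint check can be used (the Panda client install directory below is a placeholder; use the actual path):

du -sh /local/inst/wnc060814 /local/inst/pandajs   # second path is hypothetical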

DQ2 client and utilities

DQ2 client has been installed following the instructions in: https://twiki.cern.ch/twiki/bin/view/Atlas/DDMClientInstallation
wget http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/DQ2_0_2_11/client/install_dq2_client.sh
...

Once the client was installed, the LCG libraries (which according to the instructions and setup files are available through AFS) had to be installed locally (in lcg300_py) and copied to the dq2 directory. Then the DQ2 end-user tools were installed locally by checking out the CVS content (in dq2util) and copied to the dq2 directory. Here are instructions on how to get them.

The setup script has been adapted from the multiple examples to obtain one that points only to the local installation (Python 2.4 is already installed and in the path).

At this point, to use DQ2 you simply have to source setup.sh (. ./setup.sh) in /local/inst/dq2.
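As a quick sketch of a typical session (dq2_ls and dq2_get are part of the DQ2 end-user tools checked out above; the dataset name and options are only illustrative, and a valid Grid proxy is needed):

cd /local/inst/dq2
. ./setup.sh
grid-proxy-init                                             # a valid Grid proxy is required
dq2_ls 'csc11.005010.J1_pythia_jetjet.merge.AOD*'           # list matching datasets
dq2_get csc11.005010.J1_pythia_jetjet.merge.AOD.v11004104   # fetch the dataset files into the current directory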

Here are instructions on how to use:

Using Athena

Releases are installed by the production release manager (Xin Zhao, BNL) and are assumed to be already available at the CE.

Setup commands for release 11.0.42
source /share/app/atlas_app/atlas_rel/11.0.42/setup.sh
source /share/app/atlas_app/atlas_rel/11.0.42/dist/11.0.42/AtlasOfflineRunTime/AtlasOfflineRunTime-00-00-00/cmt/setup.sh

Setup commands differ for releases greater than 11.3.0, because the change to project builds between releases 11 and 12 resulted in a different directory structure for the release.

These are the setup commands for release 12.0.4
source /share/app/atlas_app/atlas_rel/12.0.4/cmtsite/setup.sh -tag=AtlasOffline,12.0.4
source /share/app/atlas_app/atlas_rel/12.0.4/AtlasOffline/12.0.4/AtlasOfflineRunTime/cmt/setup.sh
Remember to add all "-tag" parameters (including 12.0.4) to the first setup. Otherwise, it will not complain but it will also not work correctly.

If the setup commands for Release 11.0.42 are executed, you should then be able to do a 'which athena' and see something like

alias athena='athena.py'
        /share/app/atlas_app/atlas_rel/11.0.42/dist/11.0.42/InstallArea/share/bin/athena.py

NOTE: If you prefer to place the source commands in an executable script and then try to execute the script, you will not get a correct path definition for "athena". You MUST, in that case, source the executable script you just created.

Example:
Create the script "sourceme.11" containing the two setup commands for Release 11
> chmod +x sourceme.11
> source ./sourceme.11

This correctly defines the alias for "athena". If you had tried to execute sourceme.11 directly (i.e., ./sourceme.11), a subsequent call to "which athena" would not find athena. This is because "athena" is defined through an alias: when the script is executed it runs in a sub-shell, and the alias is not passed back to your login shell.
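Putting it together, a minimal sketch of the whole sequence (the two setup lines are the Release 11.0.42 commands shown above; note that execute permission is not actually needed when sourcing):

cat > sourceme.11 <<'EOF'
source /share/app/atlas_app/atlas_rel/11.0.42/setup.sh
source /share/app/atlas_app/atlas_rel/11.0.42/dist/11.0.42/AtlasOfflineRunTime/AtlasOfflineRunTime-00-00-00/cmt/setup.sh
EOF
source ./sourceme.11   # sourcing, not executing, so the athena alias survives in this shell
which athena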

Make a work area using mkdir. Users may want to check out and compile code and then run athena with that code; to do the checkouts properly, the requirements file must be visible in your work area. Put the content of the attached requirements file into a file named requirements, edited to reflect your work area and the release in use.
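A possible layout, as a sketch (the directory name is only a suggestion; the requirements file is the one attached below, and CMTPATH is discussed again in the build section at the end of this page):

mkdir -p /local/workdir/$USER/testarea/12.0.4   # directory name is only a suggestion
cd /local/workdir/$USER/testarea/12.0.4
# copy the attached requirements file here and edit it for your work area and release
export CMTPATH=`pwd`:${CMTPATH}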

Writing TAGs Example

Custom job options

One can build TAGs by putting the following content in a job options file, which for this example we will call CSCMergeAODwithTags.py. There are differences between release 11 and release 12 in which libraries to load and where they are located. The following is configured for release 12, but comments indicate how to change it to work for release 11.

NOTE: Release 11 and Release 12 AOD are incompatible. If you are reading a csc11 file, you must set the flags to use release 11.

CSCMergeAODWithTags_v12.py

This then gives you several flags that you can set. Defaults are in [ ].
  • EvtMax [1000000]: the maximum number of events to read from the input
  • seed [17]: the random seed used to put the random number in the tags
  • dataset [0]: the dataset identifier such as 4100, 5010, ...
  • PoolAODInput [NULL]: Input file list in python string list format, e.g. ['file.root']. Note that if one uses TAG files, then you must remove the .root from the name, e.g. test.TAG.root would look like ['test.TAG']. Also note that this can be a list of files.
  • CollType [ImplicitROOT]: You need to set this if you are reading a TAG file or database.
  • PoolAODOutput [test]: prefix prepended to AOD.root or TAG.root depending on what is written out.
  • doWriteAOD [True]: borrowed directly from RecExCommon/RecExCommon_flags.py
  • doWriteTAG [True]: borrowed directly from RecExCommon/RecExCommon_flags.py

If you configure the job options above for release 11, the following is an example which writes a TAG file for an existing AOD file. The release 12 files are available via dCache, so you could also copy one of those files and change PoolAODInput accordingly.

pfndir="/pnfs/uchicago.edu/data/usatlas/testIdeal_07/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968"
pfnfile="testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00001.pool.root.1"
rm -f ${pfnfile}
dccp ${pfndir}/${pfnfile} .
ls -l
time athena -c "PoolAODInput=['testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00001.pool.root.1']; PoolAODOutput='mytest'; EvtMax=100; doWriteAOD=False" CSCMergeAODwithTags.py
NOTE: If you put this command by itself or with others into an executable script (say runathena), you must run the script with source ./runathena.

This produces the following outputs for the previous athena run, where EvtMax was set to 100:
real    0m37.345s
user    0m31.110s
sys     0m1.570s
total 89752
-rw-rw-r--    1 gfg      gfg               26 Oct 12 10:05 AtRndmGenSvc.out
-rw-rw-r--    1 gfg      gfg                0 Oct 12 10:04 cdb.log
-rw-rw-r--    1 gfg      gfg         17185 Oct 12 10:05 CLIDDBout.txt
-rw-r--r--     1 gfg      gfg         85879 Oct 12 10:05 mytest.TAG.root
-rw-rw-r--    1 gfg      gfg           106   Oct 12 10:05 PoolFileCatalog.xml
The PoolAODOutput file is named mytest.TAG.root. The above command took 31 sec to build the TAGs for 100 ev on tier2-06.

Doublemint job options

All the parameters in the -c quote of the athena command can instead be put into a Python file, e.g. myTopOptions.py:

PoolAODInput=['testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00001.pool.root.1']; 
PoolAODOutput='mytest'; 
EvtMax=100; 
doWriteAOD=False; 
doWriteTAG=True
readAOD=True; 
doAOD=False;

One can then do the equivalent processing with

athena myTopOptions.py CSCMergeAODwithTags.py

Deleting the EvtMax parameter makes athena fall back to the default of EvtMax=10000. The following output was produced for an athena run using the default, which processed 1000 events (the full content of the input file):
real    1m7.737s
user    0m53.630s
sys     0m7.940s
total 90136
-rw-rw-r--    1 gfg      gfg               28 Oct 12 10:37 AtRndmGenSvc.out
-rw-rw-r--    1 gfg      gfg                 0 Oct 12 10:36 cdb.log
-rw-rw-r--    1 gfg      gfg          17185 Oct 12 10:37 CLIDDBout.txt
-rw-r--r--     1 gfg      gfg        387444 Oct 12 10:37 mytest.TAG.root
-rw-rw-r--    1 gfg      gfg              106 Oct 12 10:36 PoolFileCatalog.xml

The PoolAODOutput file is named mytest.TAG.root. The above command took 54 seconds to build the TAGs for 1000 ev on tier2-06.

If one then wants to check the resulting mytest.TAG.root, execute the following commands. Note that one needs to do a pool_insertFileToCatalog of the input file for it to work properly. In the future, hopefully we can just point jobs at the LRC where the file is already registered.
pool_insertFileToCatalog  /local/workdir/test1/csc11.005010.J1_pythia_jetjet.merge.AOD.v11004104._00001.pool.root
athena -c "In=['mytest.TAG']; CollType='ExplicitROOT'" AthenaPoolUtilities/EventCount.py

The resultant output should contain the following lines with the event count matching that of the athena run of 1000 events.
EventCount           INFO ---------- INPUT FILE SUMMARY ----------
EventCount           INFO Input contained: 1000 events
EventCount           INFO  -- Event Range ( 4800 .. 51599 )

Test Results: Working with Release 12.0.x

As a follow-up to the Data Services meeting of October 10, a set of tests was identified to assess the relative ease or difficulty of working with two Release 12.0 datasets resident in the UC_VOB local replica catalog. The current status of these tests can be found here.

Using RecExCommon job options

For those more familiar with RecExCommon, one can use the job options from the release and make an aodtotag.py like the following. For RecExCommon the default value for EvtMax is 5.

PoolAODInput=['/local/workdir/test1/csc11.005010.J1_pythia_jetjet.merge.AOD.v11004104._00001.pool.root']; 
PoolAODOutput='mytest'; 

EvtMax=100; 
doCBNT=False
readAOD=True; 
doWriteESD=False 
doWriteAOD=False; 
doAOD=False;
doWriteTAG=True; 

DetDescrVersion = 'ATLAS-DC3-02' 

# main jobOption
include ("RecExCommon/RecExCommon_topOptions.py")

RDBAccessSvc = Service("RDBAccessSvc")
RDBAccessSvc.HostName = "/tmp/cranshaw/geomDB_sqlite_11.0.42"

To get this to work, you need to make a personal copy of the geometry database (geomDB) somewhere. Erik suggested just doing it in /tmp, like:
cp /share/app/atlas_app/atlas_rel/11.0.42/atlas/offline/data/geomDB_sqlite   /tmp/cranshaw/geomDB_sqlite_11.0.42

Then one can do
time athena aodtotag.py

For 100 events the RecExCommon job takes 60 seconds whereas CSCMergeAODwithTags takes 33 seconds.

Building AOD and TAG from ESD using RecExCommon

One can use the following job options to build both AOD and TAG if one is beginning with an ESD dataset. Again, one has to apply the geomDB indirection fix described above. Put the following in esdtoaodtag.py.

# steering file for ESD->AOD step
# see myTopOptions.py for more info

PoolESDInput=["/share/data3/agupta/csc11/csc11.005010.J1_pythia_jetjet.recon.ESD.v11004201/csc11.005010.J1_pythia_jetjet.recon.ESD.v11004201._00164.pool.root"]

PoolAODOutput="AOD.pool.root"

readESD=True
doWriteESD=False 
doWriteAOD=True 
doAOD=True 
#doWriteTAG=False 

DetDescrVersion = 'ATLAS-DC3-02' 

# main jobOption
include ("RecExCommon/RecExCommon_topOptions.py")

RDBAccessSvc = Service("RDBAccessSvc")
RDBAccessSvc.HostName = "/tmp/cranshaw/geomDB_sqlite_11.0.42"

Then one can do
time athena esdtoaodtag.py

It took about 1 minute per event for the processing above on tier2-06.

SSH CVS Access

Atlas CVS Access Help

SSH access

export CVSROOT=:ext:atlas-sw.cern.ch:/atlascvs
export CVS_RSH=ssh
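With those variables set, a checkout looks like the following (the package path and tag are purely illustrative):

cvs checkout -r MyPackage-00-01-02 PhysicsAnalysis/MyAnalysis/MyPackage   # hypothetical package and tag; omit -r for the head version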

Building Code that has been checked out and/or modified

You will need to add your work area to the CMTPATH, so go to your work area and

export CMTPATH=`pwd`:${CMTPATH}
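After that, a typical CMT build of a checked-out package follows the usual pattern, sketched here with the hypothetical MyPackage from the CVS example above:

cd PhysicsAnalysis/MyAnalysis/MyPackage/*/cmt   # the extra level is the version directory, if present
cmt config                                      # (re)generate the setup and make fragments
source setup.sh
gmake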

WIP - Failed attempts, etc...

Some of the errors:
[marco@tier2-06 inst]$ source /share/app/atlas_app/atlas_rel/12.0.0/cmtsite/setup.sh 
#CMT> The CMTSITE value STANDALONE is not used in any tag expression. Please check spelling
[marco@tier2-06 inst]$ source /share/app/atlas_app/atlas_rel/12.0.0/cmtsite/setup.sh -tag=AtlasOffline,12.0.0
#CMT> The CMTSITE value STANDALONE is not used in any tag expression. Please check spelling
[marco@tier2-06 inst]$ source /share/app/atlas_app/atlas_rel/12.0.0/AtlasOffline/12.0.0/AtlasOfflineRunTime/cmt/setup.sh 
#CMT> Warning: package AtlasProductionRunTime AtlasProductionRunTime-*  not found (requested by AtlasOfflineRunTime)
#CMT> The CMTSITE value STANDALONE is not used in any tag expression. Please check spelling
[marco@tier2-06 inst]$ 

Sources of information

Code and manuals useful for understanding how to set up the working environment

Panda

From Panda Pilot:
       if job.atlasEnv : # atlas job, then we follow atlas conventions
            # define the job runtime environment

            if not analJob and job.trf.endswith('.py'): # for production python trf
                cmd1="source %s/atlas_app/atlas_rel/%s/cmtsite/setup.sh -tag=AtlasOffline,%s"%(self.site.appdir,job.atlasRelease,job.atlasRelease)
                cmd2="export CMTPATH=%s/atlas_app/atlas_rel/%s/%s; source %s/atlas_app/atlas_rel/%s/%s/AtlasProductionRunTime/cmt/setup.sh"%(self.site.appdir,job.atlasRelease,job.homePackage,self.site.appdir,job.atlasRelease,job.homePackage)

            elif analJob and job.atlasRelease >= "11.3.0": # for anal python trf
                cmd1="source %s/atlas_app/atlas_rel/%s/cmtsite/setup.sh -tag=AtlasOffline,%s"%(self.site.appdir,job.atlasRelease,job.atlasRelease)
                cmd2="source %s/atlas_app/atlas_rel/%s/AtlasOffline/%s/AtlasOfflineRunTime/cmt/setup.sh"%(self.site.appdir,job.atlasRelease,job.atlasRelease)

            else: # old fashion trf
                os.environ["RELEASE"]=job.atlasRelease
                os.environ["SITEROOT"]="%s/atlas_app/atlas_rel/%s"%(self.site.appdir,job.atlasRelease)
                os.environ["T_RELEASE"]=job.atlasRelease
                os.environ["T_DISTREL"]=os.environ["SITEROOT"]+"/dist/"+os.environ["T_RELEASE"]
                os.environ["WORKDIR"]=self.site.workdir

                # construct the command of execution
                cmd1="source %s/setup.sh"%(os.environ["SITEROOT"])
                cmd2="source %s/atlas_app/atlas_rel/%s/dist/%s/AtlasRelease/*/cmt/setup.sh -tag_add=DC2"%(self.site.appdir,job.atlasRelease,job.atlasRelease)

            if analJob:
                trfName = job.trf.split('/')[-1]
                #print commands.getoutput('wget %s' % job.trf)
                print commands.getoutput('%s %s' % (wgetCommand,job.trf))
                os.chmod(trfName,0755)
                import storage_access_info
                cmd3=''
                if storage_access_info.copytools.has_key(self.site.sitename):
                    cmd3='source %s;' % storage_access_info.copytools[self.site.sitename][1]
                cmd3+='./%s %s -u %s' % (trfName,job.jobPars,self.site.dq2url)
            elif job.trf.endswith('.py'): # for production python trf
                cmd3="%s %s"%(job.trf,job.jobPars)
            elif job.homePackage and job.homePackage != 'NULL': #non empty path
                cmd3="%s/atlas_app/atlas_rel/kitval/KitValidation/JobTransforms/%s/%s %s"%(self.site.appdir,job.homePackage,job.trf,job.jobPars)
            else:
                cmd3="%s/atlas_app/atlas_rel/kitval/KitValidation/JobTransforms/%s %s"%(self.site.appdir,job.trf,job.jobPars)
            cmd=cmd1+";"+cmd2+";"+cmd3
        else: # generic job
            if analJob:
                trfName = job.trf.split('/')[-1]
#                print commands.getoutput('wget %s' % job.trf)
                print commands.getoutput('%s %s' % (wgetCommand,job.trf))
                os.chmod(trfName,0755)
                cmd='./%s %s' % (trfName,job.jobPars)
            elif job.homePackage and job.homePackage != 'NULL' and job.homePackage != ' ': #non empty path
                cmd="%s/%s %s"%(job.homePackage,job.trf,job.jobPars)
            else:
                cmd="%s %s"%(job.trf,job.jobPars)
                
        print "\n !!! Command to run the job is : \n%s"%(cmd)
        sys.stdout.flush()
        rc=self.__forkThisJob(job,cmd)

-- MarcoMambelli - 11 Aug 2006 -- JerryGieraltowski - 10 Oct 2006
Attachments:
  • MergeAODwithTags_v12.py.txt (4 K, 12 Oct 2006, JackCranshaw): release 12 merge job options
  • mysql-my.cnf (4 K, 15 Aug 2006, MarcoMambelli): tier2-06 mysql configuration file
  • mysql-tag-install.txt (2 K, 15 Aug 2006, MarcoMambelli): mysql installation log/notes
  • mysql-variables.txt (20 K, 15 Aug 2006, MarcoMambelli): tier2-06 mysql install variables
  • requirements (777 bytes, 12 Oct 2006, JackCranshaw): release 11 requirements