Setup of Data Services Machine prototype
Introduction
A Data Services Machine is a machine that lives at a Tier2 Center as a VO-edge server, dedicated to ATLAS use, that acts to:
- Provide users the ability to access the Tier2's DQ2 server.
- Host or provide access to ATLAS-specific database services, such as TAG and possibly conditions (IOV and calibration) databases.
- Provide a skimming service for Tier2-resident datasets through either command line or web interfaces.
The host tier2-06.uchicago.edu is the prototype used for the Data Services.
To avoid NFS problems (known to be common, including during pacman installations), the software is installed locally. The directory chosen is /local, a 100 GB disk dedicated to it. The components used are:
- a MySQL DB
- a DQ2 client (and the Grid clients required to operate it)
- ATLAS releases, accessed using the cluster-wide installation /share/app (OSG_APP)
Directory structure
In order to distinguish the prototype-installed software from the system software, the DS prototype software is, for the time being, mostly installed in a separate area on a local disk: /local/inst/.
In this directory reside both tests and working clients, which MW Tier2 users can also use as described in the quick howto: DQ2Subscriptions
The preferred work area for transfers and tests by the people working on the DS project is:
/local/workdir
Developers can create subdirectories in this area and use them as they like (download data, write scripts, elaborate data, run Athena, ...).
If they need to install packaged software useful for the project, they should use the install area above.
MySQL DB
The database has been installed in /opt/mysql using the tar.gz package from mysql.com (the RPMs are built for i386 while the tar.gz is optimized for i686):
mysql-standard-4.1.20-pc-linux-gnu-i686.tar.gz
It has been installed and configured, and self-tests have been run. Here you can see the server variables (mysql-variables.txt) and the configuration file my.cnf (mysql-my.cnf).
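For reference, here is a minimal sketch of starting and checking a server installed from the tar.gz, assuming a conventional layout under /opt/mysql (the exact paths and options used on tier2-06 are in the installation log below):
# Assumed layout; adjust basedir/defaults-file to the actual install.
cd /opt/mysql
./bin/mysqld_safe --defaults-file=./my.cnf &     # start the server in the background
./bin/mysqladmin -u root -p status               # reports uptime/threads if the server is up
./bin/mysql -u root -p -e "SHOW DATABASES;"      # quick connectivity check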
A database has been created, with a reader user (tier2reader) that has unprotected read (select) access and a writer user (tier2writer).
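The exact grants are recorded in the installation log below; purely as an illustration, accounts like these can be created in MySQL 4.1 with GRANT statements (the database name tagdb, the host pattern and the password are assumptions, not values from the log):
# Illustrative only: in MySQL 4.1, GRANT creates the account if it does not exist.
mysql -u root -p <<'EOF'
-- reader with unprotected (passwordless) select access
GRANT SELECT ON tagdb.* TO 'tier2reader'@'%';
-- writer with a password
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON tagdb.* TO 'tier2writer'@'%' IDENTIFIED BY 'some_password';
FLUSH PRIVILEGES;
EOF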
Here (mysql-tag-install.txt) is a log with notes about the installation.
Java and Jas3
The required Java5 (jdk1.5) and JAS3 have been installed on the prototype cluster (/local/inst/java and /local/inst/jas).
To run jas3, source the Java setup (. /local/inst/java/setup.sh) and start jas:
/local/inst/jas/jas3
The SQLTuple plugin has been installed and a tuples DB has been created on the current server; the reader and writer users are the same as for the TAG DB.
For documentation check:
Globus and Grid clients
Grid software has been installed using the Panda Job Submitter package and adding SRM clients to it.
wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-latest.tar.gz
... install it
pacman -get GCL:PandaJS
pacman -get http://vdt.cs.wisc.edu/vdt_1311_cache:SRM-V1-Client
pacman -get http://vdt.cs.wisc.edu/vdt_1311_cache:SRM-V2-Client
. ./setup.sh
For future installations we are considering the OSG WN Client.
Using a Panda/OSG packaged client, instead of starting from the components in VDT, means we do not have to worry about which components are included: the packagers make sure that all the necessary elements are present.
One more installation has been performed to compare the OSG/ITB WN Client package with the Panda client. The goal is to select the client that includes all the required functionality and is smaller/lighter (or has fewer requirements).
A note about the installations: neither VDT 1.3.10 (PJS) nor VDT 1.3.11 (WNCli) recognizes SLC 3 (the supported-OS list includes SLF 3), so the installer asks for confirmation to proceed anyway.
cd /local/inst/wnc060814
pacman -get ITB:wn-client.pacman
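As a quick sanity check of either client installation, one can create a proxy and attempt an SRM copy. A hedged sketch follows; the setup path, SRM endpoint and file paths are placeholders, not actual site values:
# Placeholders throughout -- substitute real values for your site.
. /local/inst/pandajs/setup.sh       # hypothetical install directory for the client
grid-proxy-init                      # prompts for your Grid pass phrase
grid-proxy-info                      # verify the proxy and its time left
srmcp srm://<srm-host>:8443/<path>/<file> file:////local/workdir/<user>/<file>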
DQ2 client and utilities
DQ2 client has been installed following the instructions in:
https://twiki.cern.ch/twiki/bin/view/Atlas/DDMClientInstallation
wget http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/DQ2_0_2_11/client/install_dq2_client.sh
...
Once the client was installed, the LCG libs (which according to the instructions and setup files are available through AFS) had to be installed locally (in lcg300_py) and copied to the dq2 directory.
Then the DQ2 end-user tools have been installed locally by checking out the CVS content (in dq2util) and copying it to the dq2 dir.
Here are instructions on how to get them.
The setup script has been adapted from the multiple examples into a single one that points only to the local installation (Python 2.4 is already installed and in the path).
At this point, to use DQ2 you simply have to
. ./setup.sh
in /local/inst/dq2.
Here are instructions on how to use:
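As an illustration, a typical session with the end-user tools looks like the following; the dataset name is inferred from the file names used later and may not match an actual dataset, and the command options should be checked against the linked instructions:
cd /local/inst/dq2
. ./setup.sh
grid-proxy-init                                   # the tools need a valid Grid proxy
dq2_ls 'csc11.005010*'                            # list datasets matching a pattern (placeholder)
dq2_get csc11.005010.J1_pythia_jetjet.merge.AOD.v11004104    # fetch a dataset locally (hypothetical name)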
Using Athena
Releases are installed by the release manager for production (Xin Zhao - BNL) and considered already available at the CE.
Setup commands for release 11.0.42
source /share/app/atlas_app/atlas_rel/11.0.42/setup.sh
source /share/app/atlas_app/atlas_rel/11.0.42/dist/11.0.42/AtlasOfflineRunTime/AtlasOfflineRunTime-00-00-00/cmt/setup.sh
Setup commands differ for releases greater than 11.3.0, because of a change to project builds between releases 11 and 12 that resulted in a different directory structure for the release.
These are the setup commands for release 12.0.4
source /share/app/atlas_app/atlas_rel/12.0.4/cmtsite/setup.sh -tag=AtlasOffline,12.0.4
source /share/app/atlas_app/atlas_rel/12.0.4/AtlasOffline/12.0.4/AtlasOfflineRunTime/cmt/setup.sh
Remember to add all "-tag" parameters (including 12.0.4) to the first setup. Otherwise, it will not complain but it will also not work correctly.
If the setup commands for Release 11.0.42 are executed, you should then be able to do a 'which athena' and see something like
alias athena='athena.py'
/share/app/atlas_app/atlas_rel/11.0.42/dist/11.0.42/InstallArea/share/bin/athena.py
NOTE: If you prefer to place the source commands in an executable script and then try to execute the script, you will not get a correct path definition for "athena". You MUST, in that case, source the executable script you just created.
Example:
Create the script "sourceme.11" containing the two setup commands for Release 11
> chmod +x sourceme.11
> source ./sourceme.11
This would correctly identify the alias for "athena".
If you had tried to execute sourceme.11 directly (i.e., ./sourceme.11), a subsequent call to "which athena" would result in athena not being identified. This is because "athena" is defined through an alias created in your login shell, which is not passed down to sub-shells.
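For concreteness, here is the whole sequence, using the two Release 11.0.42 setup commands given above:
# Create the script with the two setup commands shown earlier.
cat > sourceme.11 <<'EOF'
source /share/app/atlas_app/atlas_rel/11.0.42/setup.sh
source /share/app/atlas_app/atlas_rel/11.0.42/dist/11.0.42/AtlasOfflineRunTime/AtlasOfflineRunTime-00-00-00/cmt/setup.sh
EOF
source ./sourceme.11     # correct: runs in the current shell, so the alias survives
which athena             # now reports the athena.py alias
# ./sourceme.11          # wrong: runs in a sub-shell and the alias is lost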
Make a work area using mkdir and put the following content in a file named requirements, properly edited to reflect your work area and the release in use. Users may want to check out and compile code and then use athena with that code. To do the checkouts properly, the requirements file needs to be visible in your work area.
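The actual requirements content was attached to the original page. As a placeholder, here is a minimal sketch of a typical work-area requirements file of that era; the site root, test area and the exact set of tags are assumptions to be edited for your setup:
# Hypothetical minimal CMT requirements file -- edit paths and release to match your setup.
cat > requirements <<'EOF'
set     CMTSITE         STANDALONE
set     SITEROOT        /share/app/atlas_app/atlas_rel/12.0.4
macro   ATLAS_DIST_AREA ${SITEROOT}
macro   ATLAS_TEST_AREA /local/workdir/<your_area>
apply_tag projectArea
use     AtlasLogin AtlasLogin-* $(ATLAS_DIST_AREA)
EOF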
Custom job options
One can build tags by putting the following content in a job options file, which for this example we will call CSCMergeAODwithTags.py. There are differences between release 11 and 12 in the libraries to load and their locations. The following is configured for release 12, but there are comments which indicate how to change it to work for release 11.
NOTE: Release 11 and Release 12 AOD are incompatible. If you are reading a csc11 file, you must set the flags to use release 11.
CSCMergeAODWithTags_v12.py
This then gives you several flags that you can set. Defaults are in [ ].
- EvtMax [1000000]: the maximum number of events to read from the input
- seed [17]: the random seed used to put the random number in the tags
- dataset [0]: the dataset identifier such as 4100, 5010, ...
- PoolAODInput [NULL]: Input file list in python string list format, e.g. ['file.root']. Note that if one uses TAG files, then you must remove the .root from the name, e.g. test.TAG.root would look like ['test.TAG']. Also note that this can be a list of files.
- CollType [ImplicitROOT]: You need to set this if you are reading a TAG file or database.
- PoolAODOutput [test]: prefix prepended to AOD.root or TAG.root depending on what is written out.
- doWriteAOD [True]: borrowed directly from RecExCommon/RecExCommon_flags.py
- doWriteTAG [True]: borrowed directly from RecExCommon/RecExCommon_flags.py
So if you configure the job options above for release 11, here is an example which would write a TAG file for an existing AOD file. The release 12 files are available via dCache, so you could also copy one of those files and change PoolAODInput.
pfndir="/pnfs/uchicago.edu/data/usatlas/testIdeal_07/testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968"
pfnfile="testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00001.pool.root.1"
rm -f ${pfnfile}
dccp ${pfndir}/${pfnfile} .
ls -l
time athena -c "PoolAODInput=['testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00001.pool.root.1']; PoolAODOutput='mytest'; EvtMax=100; doWriteAOD=False" CSCMergeAODwithTags.py
NOTE: If you put this command by itself or with others into an executable script (say runathena), you must execute the script with the command source ./runathena.
This will produce the following outputs for the previous athena run, where EvtMax was set to 100:
real 0m37.345s
user 0m31.110s
sys 0m1.570s
total 89752
-rw-rw-r-- 1 gfg gfg 26 Oct 12 10:05 AtRndmGenSvc.out
-rw-rw-r-- 1 gfg gfg 0 Oct 12 10:04 cdb.log
-rw-rw-r-- 1 gfg gfg 17185 Oct 12 10:05 CLIDDBout.txt
-rw-r--r-- 1 gfg gfg 85879 Oct 12 10:05 mytest.TAG.root
-rw-rw-r-- 1 gfg gfg 106 Oct 12 10:05 PoolFileCatalog.xml
The PoolAODOutput file is named mytest.TAG.root. The above command took 31 sec to build the TAGs for 100 events on tier2-06.
Doublemint job options
All the parameters in the -c quote of the athena command can be put into a python file like myTopOptions.py:
PoolAODInput=['testIdeal_07.005711.PythiaB_bbmu6mu4X.recon.AOD.v12000201_tid002968._00001.pool.root.1'];
PoolAODOutput='mytest';
EvtMax=100;
doWriteAOD=False;
doWriteTAG=True
readAOD=True;
doAOD=False;
One can then do the equivalent processing with
athena myTopOptions.py CSCMergeAODwithTags.py
Deleting the EvtMax parameter from the athena command would force it to use the default of EvtMax=1000000 (per the flag list above). The following output was produced for an athena run using the default (which resulted in 1000 events, i.e. all events in the input file):
real 1m7.737s
user 0m53.630s
sys 0m7.940s
total 90136
-rw-rw-r-- 1 gfg gfg 28 Oct 12 10:37 AtRndmGenSvc.out
-rw-rw-r-- 1 gfg gfg 0 Oct 12 10:36 cdb.log
-rw-rw-r-- 1 gfg gfg 17185 Oct 12 10:37 CLIDDBout.txt
-rw-r--r-- 1 gfg gfg 387444 Oct 12 10:37 mytest.TAG.root
-rw-rw-r-- 1 gfg gfg 106 Oct 12 10:36 PoolFileCatalog.xml
The PoolAODOutput file is named mytest.TAG.root. The above command took 54 seconds to build the TAGs for 1000 events on tier2-06.
If one then wants to check the resulting mytest.TAG.root, execute the following commands.
Note that one needs to do a pool_insertFileToCatalog of the input file for it to work properly. In the future, hopefully
we can just point jobs at the LRC where the file is already registered.
pool_insertFileToCatalog /local/workdir/test1/csc11.005010.J1_pythia_jetjet.merge.AOD.v11004104._00001.pool.root
athena -c "In=['mytest.TAG']; CollType='ExplicitROOT'" AthenaPoolUtilities/EventCount.py
The resultant output should contain the following lines with the event count matching that of the athena run of 1000 events.
EventCount INFO ---------- INPUT FILE SUMMARY ----------
EventCount INFO Input contained: 1000 events
EventCount INFO -- Event Range ( 4800 .. 51599 )
Test Results: Working with Release 12.0.x
As a followup to the Data Services meeting of October 10, a set of tests was identified to assess the relative ease or difficulty of working with two Release 12.0 datasets resident in the UC_VOB local replica catalog. The current status of these tests can be found here.
Using RecExCommon job options
For those more familiar with RecExCommon, one can ostensibly use the job options out of the release and make an aodtotag.py like the following. For RecExCommon the default value for EvtMax is 5.
PoolAODInput=['/local/workdir/test1/csc11.005010.J1_pythia_jetjet.merge.AOD.v11004104._00001.pool.root'];
PoolAODOutput='mytest';
EvtMax=100;
doCBNT=False
readAOD=True;
doWriteESD=False
doWriteAOD=False;
doAOD=False;
doWriteTAG=True;
DetDescrVersion = 'ATLAS-DC3-02'
# main jobOption
include ("RecExCommon/RecExCommon_topOptions.py")
RDBAccessSvc = Service("RDBAccessSvc")
RDBAccessSvc.HostName = "/tmp/cranshaw/geomDB_sqlite_11.0.42"
To get this to work, you need to make a personal copy of the geomdb somewhere. Erik
suggested just doing it in /tmp like
cp /share/app/atlas_app/atlas_rel/11.0.42/atlas/offline/data/geomDB_sqlite /tmp/cranshaw/geomDB_sqlite_11.0.42
Then one can do
time athena aodtotag.py
For 100 events the RecExCommon job takes 60 seconds whereas CSCMergeAODwithTags takes 33 seconds.
Building AOD and TAG from ESD using RecExCommon
One can use the following job options to build both AOD and TAG if one is beginning with an ESD dataset.
Again, one has to do the geomdb indirection fix. So put the following in esdtoaodtag.py.
# steering file for ESD->AOD step
# see myTopOptions.py for more info
PoolESDInput=["/share/data3/agupta/csc11/csc11.005010.J1_pythia_jetjet.recon.ESD.v11004201/csc11.005010.J1_pythia_jetjet.recon.ESD.v11004201._00164.pool.root"]
PoolAODOutput="AOD.pool.root"
readESD=True
doWriteESD=False
doWriteAOD=True
doAOD=True
#doWriteTAG=False
DetDescrVersion = 'ATLAS-DC3-02'
# main jobOption
include ("RecExCommon/RecExCommon_topOptions.py")
RDBAccessSvc = Service("RDBAccessSvc")
RDBAccessSvc.HostName = "/tmp/cranshaw/geomDB_sqlite_11.0.42"
Then one can do
time athena esdtoaodtag.py
The processing above took about 1 minute per event on tier2-06.
SSH CVS Access
Atlas CVS Access Help
SSH access
export CVSROOT=:ext:atlas-sw.cern.ch:/atlascvs
export CVS_RSH=ssh
Building Code that has been checked out and/or modified
You will need to add your work area to the CMTPATH, so go to your work area and
export CMTPATH=`pwd`:${CMTPATH}
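With CVSROOT and CVS_RSH set as above, the full check-out/build cycle looks roughly like this; the package path is a real offline package, but the tag is a placeholder to be replaced with one matching your release:
# Placeholders: substitute your work area and a tag appropriate to your release.
cd /local/workdir/<your_area>
export CMTPATH=`pwd`:${CMTPATH}
cmt co -r AthenaPoolUtilities-<tag> Database/AthenaPOOL/AthenaPoolUtilities
cd Database/AthenaPOOL/AthenaPoolUtilities/cmt
cmt config            # (re)generate the setup and make fragments
source setup.sh
gmake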
WIP - Failed attempts, etc...
Some of the errors:
[marco@tier2-06 inst]$ source /share/app/atlas_app/atlas_rel/12.0.0/cmtsite/setup.sh
#CMT> The CMTSITE value STANDALONE is not used in any tag expression. Please check spelling
[marco@tier2-06 inst]$ source /share/app/atlas_app/atlas_rel/12.0.0/cmtsite/setup.sh -tag=AtlasOffline,12.0.0
#CMT> The CMTSITE value STANDALONE is not used in any tag expression. Please check spelling
[marco@tier2-06 inst]$ source /share/app/atlas_app/atlas_rel/12.0.0/AtlasOffline/12.0.0/AtlasOfflineRunTime/cmt/setup.sh
#CMT> Warning: package AtlasProductionRunTime AtlasProductionRunTime-* not found (requested by AtlasOfflineRunTime)
#CMT> The CMTSITE value STANDALONE is not used in any tag expression. Please check spelling
[marco@tier2-06 inst]$
Code and manuals useful for understanding how to set up the working environment
Panda
From Panda Pilot:
if job.atlasEnv : # atlas job, then we follow atlas conventions
    # define the job runtime environment
    if not analJob and job.trf.endswith('.py'): # for production python trf
        cmd1="source %s/atlas_app/atlas_rel/%s/cmtsite/setup.sh -tag=AtlasOffline,%s"%(self.site.appdir,job.atlasRelease,job.atlasRelease)
        cmd2="export CMTPATH=%s/atlas_app/atlas_rel/%s/%s; source %s/atlas_app/atlas_rel/%s/%s/AtlasProductionRunTime/cmt/setup.sh"%(self.site.appdir,job.atlasRelease,job.homePackage,self.site.appdir,job.atlasRelease,job.homePackage)
    elif analJob and job.atlasRelease >= "11.3.0": # for anal python trf
        cmd1="source %s/atlas_app/atlas_rel/%s/cmtsite/setup.sh -tag=AtlasOffline,%s"%(self.site.appdir,job.atlasRelease,job.atlasRelease)
        cmd2="source %s/atlas_app/atlas_rel/%s/AtlasOffline/%s/AtlasOfflineRunTime/cmt/setup.sh"%(self.site.appdir,job.atlasRelease,job.atlasRelease)
    else: # old fashion trf
        os.environ["RELEASE"]=job.atlasRelease
        os.environ["SITEROOT"]="%s/atlas_app/atlas_rel/%s"%(self.site.appdir,job.atlasRelease)
        os.environ["T_RELEASE"]=job.atlasRelease
        os.environ["T_DISTREL"]=os.environ["SITEROOT"]+"/dist/"+os.environ["T_RELEASE"]
        os.environ["WORKDIR"]=self.site.workdir
        # construct the command of execution
        cmd1="source %s/setup.sh"%(os.environ["SITEROOT"])
        cmd2="source %s/atlas_app/atlas_rel/%s/dist/%s/AtlasRelease/*/cmt/setup.sh -tag_add=DC2"%(self.site.appdir,job.atlasRelease,job.atlasRelease)
    if analJob:
        trfName = job.trf.split('/')[-1]
        #print commands.getoutput('wget %s' % job.trf)
        print commands.getoutput('%s %s' % (wgetCommand,job.trf))
        os.chmod(trfName,0755)
        import storage_access_info
        cmd3=''
        if storage_access_info.copytools.has_key(self.site.sitename):
            cmd3='source %s;' % storage_access_info.copytools[self.site.sitename][1]
        cmd3+='./%s %s -u %s' % (trfName,job.jobPars,self.site.dq2url)
    elif job.trf.endswith('.py'): # for production python trf
        cmd3="%s %s"%(job.trf,job.jobPars)
    elif job.homePackage and job.homePackage != 'NULL': #non empty path
        cmd3="%s/atlas_app/atlas_rel/kitval/KitValidation/JobTransforms/%s/%s %s"%(self.site.appdir,job.homePackage,job.trf,job.jobPars)
    else:
        cmd3="%s/atlas_app/atlas_rel/kitval/KitValidation/JobTransforms/%s %s"%(self.site.appdir,job.trf,job.jobPars)
    cmd=cmd1+";"+cmd2+";"+cmd3
else: # generic job
    if analJob:
        trfName = job.trf.split('/')[-1]
        # print commands.getoutput('wget %s' % job.trf)
        print commands.getoutput('%s %s' % (wgetCommand,job.trf))
        os.chmod(trfName,0755)
        cmd='./%s %s' % (trfName,job.jobPars)
    elif job.homePackage and job.homePackage != 'NULL' and job.homePackage != ' ': #non empty path
        cmd="%s/%s %s"%(job.homePackage,job.trf,job.jobPars)
    else:
        cmd="%s %s"%(job.trf,job.jobPars)
print "\n !!! Command to run the job is : \n%s"%(cmd)
sys.stdout.flush()
rc=self.__forkThisJob(job,cmd)
--
MarcoMambelli - 11 Aug 2006
--
JerryGieraltowski - 10 Oct 2006