Submitting jobs into ATLAS Connect from a Tier3
Enable Flocking from a Local SCHEDD into an RCC Factory
Before a user can flock jobs into an RCC Factory, the local HTCondor administrator must perform several steps.
These can be found at
How to Setup Flocking into ATLAS Connect
User submit file modifications for various modes
Submit directly to ATLAS Connect
The following needs to be added to your HTCondor submit files to route your jobs into
ATLAS Connect factories. Jobs submitted with this "Requirements" string will run exclusively on ATLAS Connect.
Requirements = ( IS_RCC )
Should_Transfer_Files = IF_Needed
When_To_Transfer_Output = ON_Exit
Transfer_Output = True
Transfer_Input_Files = <comma separated list of input files>
Submit to your local Tier3 only
If you wish to have your jobs run exclusively on your local HTCondor pool, use the following "Requirements" string:
Requirements = ( IS_RCC =?= UNDEFINED )
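For reference, a minimal local-only submit file might look like the following sketch (modeled on the hostname test job used later on this page; the log, output and error file names are placeholders):
Universe = Vanilla
Requirements = ( IS_RCC =?= UNDEFINED )
Executable = /bin/hostname
Log = local.log
Output = local.out
Error = local.err
Notification = Never
Queue 1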
Submit locally, but overflow to additional resources
If you wish to have your jobs run on the local HTCondor pool and then flock to ATLAS Connect as an overflow, remove the "Requirements" line containing ( IS_RCC ).
Your local HTCondor should first try to run the job in the local condor pool. If no slots are available locally which meet the job's requirements, HTCondor will then attempt to flock the jobs to the RCC Factory on ATLAS Connect.
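As a sketch, an overflow submit file is simply the ATLAS Connect example above with the Requirements line removed, for example (file names are placeholders):
Universe = Vanilla
Executable = /bin/hostname
Should_Transfer_Files = IF_Needed
When_To_Transfer_Output = ON_Exit
Transfer_Output = True
Log = overflow.log
Output = overflow.out
Error = overflow.err
Notification = Never
Queue 1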
Submit a simple test job
The attached command file hostname.cmd is the simplest test job you can run.
The job will execute a "hostname" and exit.
The steps for this simple test are
- Download the hostname.cmd command file
- Submit this file with condor_submit
- Check on the job status with condor_q
- When the job has completed, examine the err and out files
The following is a sample session
[ddl@lx0 hostname]$ cat hostname.cmd
Universe = Vanilla
Requirements = ( IS_RCC )
Executable = /bin/hostname
Should_Transfer_Files = IF_Needed
When_To_Transfer_Output = ON_Exit
Transfer_Output = True
Log = hostname.log
Output = hostname.out
Error = hostname.err
Notification = Never
Queue 1
[ddl@lx0 hostname]$ condor_submit hostname.cmd
Submitting job(s).
1 job(s) submitted to cluster 20870.
[ddl@lx0 hostname]$ condor_q
-- Submitter: lx0.hep.uiuc.edu : <128.174.118.140:42710> : lx0.hep.uiuc.edu
 ID      OWNER          SUBMITTED     RUN_TIME ST PRI SIZE CMD
20870.0  ddl           12/4  15:15   0+00:00:00 I  0   0.0  hostname
1 jobs; 1 idle, 0 running, 0 held
[ddl@lx0 hostname]$ condor_q
-- Submitter: lx0.hep.uiuc.edu : <128.174.118.140:42710> : lx0.hep.uiuc.edu
 ID      OWNER          SUBMITTED     RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
[ddl@lx0 hostname]$ ls
hostname.cmd hostname.err hostname.log hostname.out
[ddl@lx0 hostname]$ cat hostname.out
################################################################################
##### #####
##### Job is running within a Remote Cluster Connect Factory #####
##### #####
##### Date: Wed Dec 4 15:16:29 CST 2013 #####
##### User: ruc.uiuc #####
##### Host: uct2-c275.mwt2.org #####
##### #####
################################################################################
uct2-c275.mwt2.org
In the above example, the job ran successfully on node uct2-c275.mwt2.org.
Submit a more complex test job
Attached are three files for a more complex test.
Create a directory and download these three files into that directory. Also create the directory "log" in this same directory which will hold the log files.
rcc.cmd is a sample HTCondor submit file with the needed additions to flock to and run properly on an MWT2 node.
This command file will execute the other attached bash script, rcc.sh, which runs a few simple commands.
One of these commands displays the contents of the transferred input file rcc.dat.
The command file will queue up 10 runs of this script.
Submit this file with "condor_submit". A "condor_q" command should show 10 entries in your condor queue. After some period of time,
when a job slot opens at MWT2, the jobs should flock to and run on an MWT2 node. When the jobs complete, the log files should show
which node each job ran on. The node names will vary but are of the form
- uct2-cNNN.mwt2.org
- iut2-cNNN.iu.edu
- taubNNN
- golubNNN
Example session:
# mkdir -p rcc/log
# cd rcc
Download the three attached files into "rcc", then
# condor_submit rcc.cmd
# cat log/rcc.out*
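The attached rcc.cmd is not reproduced here; the following is only a sketch of a submit file that matches the description above (the file names are taken from the text, while the per-job output naming with $(Process) is an assumption):
Universe = Vanilla
Requirements = ( IS_RCC )
Executable = rcc.sh
Should_Transfer_Files = IF_Needed
When_To_Transfer_Output = ON_Exit
Transfer_Output = True
Transfer_Input_Files = rcc.dat
Log = log/rcc.log
Output = log/rcc.out.$(Process)
Error = log/rcc.err.$(Process)
Notification = Never
Queue 10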
Setting up a working environment
When a job executes on MWT2, the environment is bare bones. The job does not have a grid proxy, ATLAS Local Root Base (ALRB) is not set up, and there are no data files (except what you transfer with your job). You will need to set up the working environment needed by your job. This would include importing a proxy, setting up ALRB, moving files with xrdcp, etc. The following are some examples of how you can set up a working environment for your job.
Importing a grid proxy
To use a grid proxy on MWT2, you will need to first create a valid proxy on your local node and then transfer that proxy to MWT2 as part of your job.
First create a valid proxy
voms-proxy-init -voms atlas -out x509_proxy
In the HTCondor submit command file, transfer this proxy file with your job
Transfer_Input_Files = x509_proxy
When the job runs on MWT2, you must define an environment variable so that applications can find this x509 proxy in a non-standard location
export X509_USER_PROXY=${PWD}/x509_proxy
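Putting this together, the start of a job wrapper script might look like the following sketch (the proxy file name matches the voms-proxy-init example above; the voms-proxy-info check is optional):
#!/bin/bash
# Point grid tools at the proxy transferred with the job
export X509_USER_PROXY=${PWD}/x509_proxy
# Optional sanity check that the proxy is present and still valid
voms-proxy-info -all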
Setting up ATLAS Local Root Base
You set up ATLAS Local Root Base (ALRB) using the same procedure that you would use on your local node. The following are the steps you would use.
Define the location where ALRB can be found
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
If you use DQ2 Clients, specify a local site
export DQ2_LOCAL_SITE_ID=<valid DQ2 site>
You can now set up ALRB and any of the products supported within ALRB
source $ATLAS_LOCAL_ROOT_BASE/user/atlasLocalSetup.sh
localSetupDQ2Client --skipConfirm
localSetupROOT --skipConfirm
asetup <Panda Release>
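Combined into a job script, the ALRB setup might look like the following sketch (the ROOT setup and version check at the end are only illustrations; set up whichever tools your job actually needs):
#!/bin/bash
# Set up ALRB inside the job, the same way as on an interactive node
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source $ATLAS_LOCAL_ROOT_BASE/user/atlasLocalSetup.sh
# Set up a tool the job needs, for example ROOT, and print its version
localSetupROOT --skipConfirm
root-config --version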
Example using ATLAS Local Root Base, FAX and a grid proxy
The attached files are examples of how to set up and use ALRB in a job
- Create a directory called "alrb" and download the above two files into this directory
- Create a proxy that will be imported with the job
- Submit with condor_submit alrb.cmd
- Wait for the jobs to finish with condor_q
- Examine the err and out files. You should also have a copy of the file xrootd.mwt2-1M
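The attached alrb.cmd and its script are not reproduced here; the following is only a sketch of a job script exercising the same pieces. It assumes the ALRB FAX setup command (localSetupFAX) to make xrdcp and the FAX redirectors available, and the copy source is a placeholder, not the actual test file:
#!/bin/bash
# Use the grid proxy transferred with the job
export X509_USER_PROXY=${PWD}/x509_proxy
# Set up ALRB and the FAX tools
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source $ATLAS_LOCAL_ROOT_BASE/user/atlasLocalSetup.sh
localSetupFAX --skipConfirm
# Copy a test file over FAX with xrdcp (placeholder URL)
xrdcp root://<fax redirector>//<path to a test file> xrootd.mwt2-1M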
Email Notification
If you would like to be notified of the status of your job via email, add the following lines to your command file
Notification = Always
Notify_User = myname@email.edu
Replace myname@email.edu with the email address where you would like the notifications to be sent. Please note that the node running the local HTCondor must be properly configured to send email for notifications to be delivered to the given address.
Environment Variables
The following is a list of environment variables which are predefined and available for a job to use
- IS_RCC=True
- IS_RCC_<factory>=True
- _RCC_Factory=<factory>
- _RCC_Port=<port>
- _RCC_MaxRunningJobs=nnn
Examples of how to use these in a bash script are
# [[ -n $IS_RCC ]] && echo "We are running as an RCC job"
We are running as an RCC job
# echo $_RCC_Factory
mwt2
# [[ $_RCC_Factory == "mwt2" ]] && echo "Running on MWT2"
Running on MWT2
# case $_RCC_Factory in
(mwt2) echo "MWT2" ;;
(uiuc) echo "UIUC" ;;
(fresnostate) echo "Fresno State" ;;
(*) echo "Unknown" ;;
esac
MWT2