Submitting jobs into ATLAS Connect from a Tier3

Enable Flocking from a Local SCHEDD into an RCC Factory

Before a user can flock jobs into an RCC Factory, the local HTCondor administrator must perform several steps.

These steps are described in "How to Setup Flocking into ATLAS Connect".

User submit file modifications for various modes

Submit directly to ATLAS Connect

The following needs to be added to your HTCondor submit files to route your jobs into ATLAS Connect factories. Jobs submitted with this "Requirements" string will run exclusively on ATLAS Connect.

Requirements            = ( IS_RCC )
Should_Transfer_Files   = IF_Needed
When_To_Transfer_Output = ON_Exit
Transfer_Output         = True
Transfer_Input_Files    = <comma separated list of input files>
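
The comma-separated value for Transfer_Input_Files can be built from a list of file names in a wrapper script. The following is a minimal sketch, assuming a hypothetical helper called join_input_files (it is not part of HTCondor or ATLAS Connect); the file names are only examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HTCondor or ATLAS Connect):
# join the given file names with commas for Transfer_Input_Files.
join_input_files() {
    local IFS=,          # "$*" joins arguments with the first character of IFS
    printf '%s\n' "$*"
}

# Example: emit the submit-file line for two input files.
echo "Transfer_Input_Files    = $(join_input_files rcc.dat x509_proxy)"
```

The echoed line can be appended to a submit file before running condor_submit.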

Submit to your local Tier3 only

If you wish to have your jobs run exclusively on your local HTCondor pool, use the following "Requirements" string:

Requirements = ( IS_RCC =?= UNDEFINED )

Submit locally, but overflow to additional resources

If you wish to have your jobs run on the local HTCondor pool and then flock to ATLAS Connect as an overflow, remove the "Requirements" string containing ( IS_RCC ). Your local HTCondor should first try to run the job in the local condor pool. If no slots are available locally which meet the job's requirements, HTCondor will then attempt to flock the jobs to the RCC Factory on ATLAS Connect.
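
The three submission modes differ only in the "Requirements" line. The following is a minimal sketch of choosing that line in a script; the function name requirements_for_mode and the mode names (connect, local, overflow) are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Requirements line for a submission mode.
# The mode names are illustrative only; they are not HTCondor keywords.
requirements_for_mode() {
    case "$1" in
        connect)  echo 'Requirements = ( IS_RCC )' ;;                # ATLAS Connect only
        local)    echo 'Requirements = ( IS_RCC =?= UNDEFINED )' ;;  # local Tier3 only
        overflow) ;;   # no IS_RCC clause: run locally first, then flock
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Example: print the line that restricts jobs to the local pool.
requirements_for_mode local
```

For the overflow mode the function prints nothing, which corresponds to leaving the IS_RCC clause out of the submit file.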

Submit a simple test job

The attached command file hostname.cmd is the simplest test job you can run. The job will execute a "hostname" and exit.

The steps to this simple test are

  1. Download the hostname.cmd command file
  2. Submit this file with condor_submit
  3. Check on the job status with condor_q
  4. When the job has completed, examine the err and out files

The following is a sample session

[ddl@lx0 hostname]$ cat hostname.cmd
Universe                = Vanilla

Requirements            = ( IS_RCC )

Executable              = /bin/hostname
Should_Transfer_Files   = IF_Needed
When_To_Transfer_Output = ON_Exit
Transfer_Output         = True

Log                     = hostname.log
Output                  = hostname.out
Error                   = hostname.err

Notification            = Never

Queue 1

[ddl@lx0 hostname]$ condor_submit hostname.cmd
Submitting job(s).
1 job(s) submitted to cluster 20870.


[ddl@lx0 hostname]$ condor_q


-- Submitter: lx0.hep.uiuc.edu : <128.174.118.140:42710> : lx0.hep.uiuc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
20870.0   ddl            12/4  15:15   0+00:00:00 I  0   0.0  hostname          

1 jobs; 1 idle, 0 running, 0 held


[ddl@lx0 hostname]$ condor_q

-- Submitter: lx0.hep.uiuc.edu : <128.174.118.140:42710> : lx0.hep.uiuc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 idle, 0 running, 0 held


[ddl@lx0 hostname]$ ls
hostname.cmd  hostname.err  hostname.log  hostname.out


[ddl@lx0 hostname]$ cat hostname.out

################################################################################
#####                                                                      #####
#####        Job is running within a Remote Cluster Connect Factory        #####
#####                                                                      #####
##### Date: Wed Dec 4 15:16:29 CST 2013                                    #####
##### User: ruc.uiuc                                                       #####
##### Host: uct2-c275.mwt2.org                                             #####
#####                                                                      #####
################################################################################

uct2-c275.mwt2.org

In the above example, the job ran successfully on node uct2-c275.mwt2.org.

Submit a more complex test job

Attached are three files for a more complex test.

Create a directory and download these three files into that directory. Also create the directory "log" in this same directory which will hold the log files.

rcc.cmd is a sample HTCondor submit file with the needed additions to flock to and run properly on an MWT2 node. This command file will execute the other attached bash script, rcc.sh, which executes a few simple commands. One of the commands displays the contents of the transferred input file rcc.dat. The current command file will queue up 10 runs of this script.

Submit this file with "condor_submit". A "condor_q" command should show 10 entries in your condor queue. After some period of time, when a job slot opens at MWT2, the jobs should flock to and run on an MWT2 node. When the jobs complete, the log files should show which node each job ran on. The node names will vary but are of the form

  • uct2-cNNN.mwt2.org
  • iut2-cNNN.iu.edu
  • taubNNN
  • golubNNN
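
A script can check whether a reported node name has one of the forms listed above. The following is a minimal sketch, assuming a hypothetical function is_known_node and assuming that NNN stands for a run of digits; the patterns are deliberately loose and only require the first character after the prefix to be a digit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the host name looks like one of the
# node name forms listed above. NNN is assumed to be numeric.
is_known_node() {
    case "$1" in
        uct2-c[0-9]*.mwt2.org|iut2-c[0-9]*.iu.edu|taub[0-9]*|golub[0-9]*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example: check the node name from the hostname test above.
is_known_node uct2-c275.mwt2.org && echo "matches a listed node form"
```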

Example session:

# mkdir -p rcc/log
# cd rcc

*Download the three attached files into "rcc", then

# condor_submit rcc.cmd
# cat log/rcc.out*

Setting up a working environment

When a job executes on MWT2, the environment is bare bones. The job does not have a grid proxy, Atlas Local Root Base (ALRB) is not set up, there are no data files (except what you transfer with your job), etc. You will need to set up the working environment needed by your job. This would include importing a proxy, setting up ALRB, moving files with xrdcp, etc. The following are some examples of how you can set up a working environment for your job.

Importing a grid proxy

To use a grid proxy on MWT2, you will need to first create a valid proxy on your local node and then transfer that proxy to MWT2 as part of your job.

First create a valid proxy

voms-proxy-init -voms atlas -out x509_proxy

In the HTCondor submit command file, transfer this proxy file with your job

Transfer_Input_Files    = x509_proxy

When the job runs on MWT2, you must define an environment variable so that applications can find this x509 proxy in a non-standard location

export X509_USER_PROXY=${PWD}/x509_proxy
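
In a job script it can help to confirm that the proxy file actually arrived before pointing applications at it. The following is a minimal sketch, assuming a hypothetical function use_transferred_proxy and assuming the proxy was transferred into the job's working directory under the name x509_proxy, as in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export X509_USER_PROXY only if the transferred
# proxy file exists and is non-empty in the current directory.
use_transferred_proxy() {
    local proxy="${PWD}/${1:-x509_proxy}"   # optional argument: proxy file name
    if [ ! -s "$proxy" ]; then
        echo "proxy file not found or empty: $proxy" >&2
        return 1
    fi
    export X509_USER_PROXY="$proxy"
}

# Typical use near the top of a job script:
# use_transferred_proxy || exit 1
```

This only checks that the file is present; it does not verify that the proxy is still valid.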

Setting up Atlas Local Root Base

You set up Atlas Local Root Base (ALRB) using the same procedure that you would use on your local node. The following are the steps that you would use.

Define the location where ALRB can be found

export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase

If you use DQ2 Clients, specify a local site

export DQ2_LOCAL_SITE_ID=<valid DQ2 site>

You can now set up ALRB and any of the support products within ALRB

source $ATLAS_LOCAL_ROOT_BASE/user/atlasLocalSetup.sh
localSetupDQ2Client --skipConfirm
localSetupROOT --skipConfirm
asetup <Panda Release>
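
Because the ALRB setup script lives on CVMFS, a job script may want to fail early if it is not reachable on the worker node. The following is a minimal sketch, assuming a hypothetical wrapper function setup_alrb. Honoring an already defined ATLAS_LOCAL_ROOT_BASE is a choice made for this sketch, and sourcing the setup script from inside a function is assumed to behave the same as sourcing it at the top level.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: source the ALRB setup script only if it is readable.
setup_alrb() {
    # Keep a pre-set location if one is defined, else use the CVMFS default.
    export ATLAS_LOCAL_ROOT_BASE="${ATLAS_LOCAL_ROOT_BASE:-/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase}"
    local setup="${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh"
    if [ ! -r "$setup" ]; then
        echo "ALRB setup script not readable: $setup" >&2
        return 1
    fi
    . "$setup"
}

# Typical use in a job script, before any localSetup commands:
# setup_alrb || exit 1
```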

Example using Atlas Local Root Base, FAX and a grid proxy

The attached files alrb.cmd and alrb.sh are examples of how to set up and use ALRB in a job

  1. Create a directory called "alrb" and download these two attached files into this directory
  2. Create a proxy that will be imported with the job
  3. Submit with condor_submit alrb.cmd
  4. Wait for the jobs to finish with condor_q
  5. Examine the err and out files. You should also have a copy of the file xrootd.mwt2-1M

Email Notification

If you would like to be notified of the status of your job via email, add the following lines to your command file

Notification            = Always
Notify_User             = myname@email.edu

Replace myname@email.edu with the email address where you would like the notifications to be sent. Please note that the node running the local HTCondor must be properly configured to send email for notifications to be sent to the given email address.

Environment Variables

The following is a list of environment variables which are predefined and available for a job to use

  • IS_RCC=True
  • IS_RCC_<factory>=True
  • _RCC_Factory=<factory>
  • _RCC_Port=<port>
  • _RCC_MaxRunningJobs=nnn

Examples on how to use this in a bash script are

# [[ -n $IS_RCC ]] && echo "We are running as an RCC job"
We are running as an RCC job

# echo $_RCC_Factory
mwt2

# [[ $_RCC_Factory == "mwt2" ]] && echo "Running on MWT2"
Running on MWT2

# case $_RCC_Factory in
(mwt2)          echo "MWT2" ;;
(uiuc)          echo "UIUC" ;;
(fresnostate)   echo "Fresno State" ;;
(*)             echo "Unknown" ;;
esac
MWT2


Topic attachments
Attachment     Size        Date                  Who
alrb.cmd       395 bytes   05 Dec 2013 - 00:31   DaveLesny
alrb.sh        873 bytes   05 Dec 2013 - 00:31   DaveLesny
hostname.cmd   377 bytes   05 Dec 2013 - 00:30   DaveLesny
rcc.cmd        483 bytes   05 Dec 2013 - 00:31   DaveLesny
rcc.dat        121 bytes   05 Dec 2013 - 00:31   DaveLesny
rcc.sh         95 bytes    05 Dec 2013 - 00:31   DaveLesny
Topic revision: r28 - 07 Aug 2014, RobGardner