OSG End User Tools

Introduction

These exercises introduce you to some simple Grid activities. They will give you the necessary skills to begin using OSG for your own applications. Most of the material can be applied to other Globus-based grids.

These notes have been written for the hands-on tutorial "OSG End User Tools" at the March 2009 OSG All Hands Meeting. During the hands-on session (Track I: OSG End User Tools, March 2, 2009, 1:30pm-3:00pm, Main Conference Room) the main part of this tutorial will be explained. To follow the session you are expected to have already installed the OSG Client on the machine that you will be using. Please follow along and do not hesitate to talk and ask questions.

You are welcome to use these notes at a later time to work through the presented exercises at your own pace. You will be given commands to type, along with the expected output and notes highlighting the key points of each step.

These notes include transcripts from a machine called uct3-edge5.uchicago.edu, used as the client host. You will most likely be using a different machine, in which case you should be careful to replace uct3-edge5.uchicago.edu with the name of the machine you are logged in to. Most of the examples submit jobs to uct3-edge7.uchicago.edu; this gatekeeper belongs to UC_ITB, an ITB site, and is sponsored by the ATLAS VO. If you are submitting to a different host, please replace uct3-edge7.uchicago.edu with the correct hostname.

Client installation and setup

This part will not be covered in the hands-on session. You are expected to have a working client and to know how to set up the working environment. In any case, here are some pointers and a brief description.

The client installation is described in the OSG wiki: http://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/ClientInstallationGuide (see http://vdt.cs.wisc.edu/releases/1.10.1 for system requirements). If you are not familiar with Pacman, check Pacman's documentation to see how to use it to install the OSG Client.

Create a directory for the client and cd to it, such as:
[~]> mkdir osg-client
[~]> cd osg-client/
And install:
[~/osg-client]> pacman -get OSG:client
...
The instructions above explain how to answer the installation questions. In brief: reply yall and y to the caches and licenses questions, l (local installation) when asked where to install the certificates, and no to all other questions. Remember to perform the post-install operations at the end to install the CA certificates.

To use the client it is sufficient to source the setup file:
> source your-osg-client-dir/setup.(c)sh

If you are behind a firewall, pay attention to the GLOBUS_TCP_PORT_RANGE variable: https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/ClientInstallationGuide#Firewall_Considerations
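For example, if your firewall allows inbound connections on a contiguous port range (the 40000-41000 range below is purely illustrative), you could set the variable after sourcing the setup file:
> export GLOBUS_TCP_PORT_RANGE=40000,41000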

Security

This exercise will provide hands-on experience in using various tools to set up and use the Grid Security Infrastructure (GSI) for working on the grid. The first few sections delve into certificates and proxies and demonstrate how pre-configured credentials can be used to run some grid-enabled programs.

Proxies

In order to do things (like submit jobs or transfer data) on the grid, you need a grid proxy. A grid proxy contains everything necessary to authenticate you to grid resources. The All Hands meeting has a session all about security and recommendations.

You can create a proxy certificate with voms-proxy-init. You can check the result with the voms-proxy-info (or grid-proxy-info) command.

[uct3-edge5] /ecache/marco/test_osg > voms-proxy-init -voms atlas:/atlas/usatlas/Role=software -hours 300
Enter GRID pass phrase:
Your identity: /DC=org/DC=doegrids/OU=People/CN=Marco Mambelli 325802
Creating temporary proxy ......................... Done
Contacting  lcg-voms.cern.ch:15001 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "atlas" Done
Creating proxy ................................................................ Done
Your proxy is valid until Sun Mar 15 05:37:47 2009
When you request a VOMS proxy you need to have your x509 certificate (in the ~/.globus directory), and you can activate any VO, group or role enabled for your certificate. The parameter -voms atlas:/atlas/usatlas/Role=software instructs the client to connect to the ATLAS VOMS server and activates a proxy for the ATLAS VO, in the /atlas/usatlas group and with the software Role.

Here is the command to check a proxy:
> voms-proxy-info -all
WARNING: Unable to verify signature! Server certificate possibly not installed.
Error: Cannot verify AC signature!
subject   : /DC=org/DC=doegrids/OU=People/CN=Marco Mambelli 325802/CN=proxy
issuer    : /DC=org/DC=doegrids/OU=People/CN=Marco Mambelli 325802
identity  : /DC=org/DC=doegrids/OU=People/CN=Marco Mambelli 325802
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u20003
timeleft  : 299:57:53
=== VO atlas extension information ===
VO        : atlas
subject   : /DC=org/DC=doegrids/OU=People/CN=Marco Mambelli 325802
issuer    : /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch
attribute : /atlas/usatlas/Role=software/Capability=NULL
attribute : /atlas/lcg1/Role=NULL/Capability=NULL
attribute : /atlas/usatlas/Role=NULL/Capability=NULL
attribute : /atlas/Role=NULL/Capability=NULL
attribute : nickname =  (atlas)
timeleft  : 11:57:53
uri       : lcg-voms.cern.ch:15001

Look at the timeleft field. This tells you how much time this proxy will be valid for. Check that there is some time left on your proxy. (When this proxy has expired, you will no longer be able to use the grid, and you will have to get a new proxy)

Note that the expiration time of the extended attributes is shorter than that of the general proxy. The VOMS server may limit the longevity of the proxy to a time shorter than the one requested (server configuration).

You can play with different options to voms-proxy-init and check the content of the resulting proxy, especially the extended attributes. grid-proxy-info will give similar information without the extended attributes.

Do not worry about the message "WARNING: Unable to verify signature! Server certificate possibly not installed. Error: Cannot verify AC signature!". It means that the signature of the server adding the extensions could not be verified; this will be done by the servers accepting your requests. Here we are interested in verifying the content of the proxy more than its authenticity.

Grid Proxy Details

  • subject - The distinguished name (DN) from the certificate, appended with a unique string.
  • issuer - The distinguished name of the user certificate itself.
  • path - The file system location where your proxy is stored.
  • timeleft - How much longer the proxy will be valid, in hours, minutes and seconds.

As you can see, the issuer of the proxy certificate is the user certificate. This shows the chain of trust: CA -> user certificate -> proxy certificate. The chain can be arbitrarily long: you can generate a proxy out of a proxy. In fact voms-proxy-init first generates a regular grid proxy, then sends it to the VOMS server to add the extended attributes.

The proxy contains the private key generated for the proxy and the corresponding public key, and it is signed, like a certificate, by the user certificate.

Now list the contents of the proxy using grid-cert-info, specifying the full path to your proxy.
> grid-cert-info -file /tmp/x509up_u20003
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 26685 (0x683d)
        Signature Algorithm: md5WithRSAEncryption
        Issuer: DC=org, DC=doegrids, OU=People, CN=Marco Mambelli 325802
        Validity
            Not Before: Mar  2 13:20:47 2009 GMT
            Not After : Mar 15 01:25:47 2009 GMT
        Subject: DC=org, DC=doegrids, OU=People, CN=Marco Mambelli 325802, CN=proxy
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (1024 bit)
                Modulus (1024 bit):
                    00:e9:ea:5f:28:ad:86:d1:f6:97:3c:41:8f:34:2a:
                    1b:4e:c8:18:33:1e:02:74:cd:44:4b:4d:c7:fb:15:
                    49:b9:73:f4:51:11:9b:cb:43:21:28:56:4e:3b:e6:
                    7e:96:7d:ed:1c:21:1d:04:5a:aa:64:51:49:2c:e2:
                    9e:14:be:d9:c0:e9:e0:8b:74:04:7e:58:9e:e5:ba:
                    da:cc:d7:c5:72:5e:68:7f:c7:4b:4f:2c:17:45:84:
                    63:bf:cd:59:30:48:35:2d:d1:38:11:08:1d:98:22:
                    a8:1d:30:ea:60:1c:a5:28:3f:e6:0e:20:21:fd:75:
                    a5:99:d6:f4:48:be:33:e6:21
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            1.3.6.1.4.1.8005.100.100.5: 
                0..90..50..10......0l.j0d.b0`1.0..
..&...,d....org1.0..
..U....People1.0...U....Marco Mambelli 325802..h=.b0`.^0\1.0..
..&...,d....org1.0..
.......*.:.0"..20090302132546Z..20090303012546Z0..0...acf.bnl.gov0
+.....Edd.1..0......atlas://vo.racf.bnl.gov:150030...,/atlas/usatlas/Role=software/Capability=NULL. /atlas/Role=NULL/Capability=NULL.(/atlas/usatlas/Role=NULL/Capability=NULL.%/atl.................F.@..!.lity=NULL0,0...U.
..+....
            X509v3 Key Usage: critical
            Digital Signature, Key Encipherment, Data Encipherment
            1.3.6.1.4.1.8005.100.100.6: 
                03
    Signature Algorithm: md5WithRSAEncryption
        3d:22:a7:50:55:5f:cb:0f:ed:f8:95:a5:7d:13:be:b4:ab:04:
        13:7a:96:e9:ff:ff:61:22:bb:5a:38:e1:9d:8a:de:46:bb:0e:
        3d:61:4c:08:7d:87:54:cc:f9:04:e3:69:9b:0a:54:40:a5:74:
        bd:c9:9e:2e:04:47:44:90:37:a5:e0:80:70:15:d1:ca:60:30:
        34:e7:b0:d2:7a:7c:ec:b1:6f:dd:90:bc:88:c8:a3:11:83:c9:
        99:5b:9e:31:16:f7:b2:f7:42:e9:4e:19:5f:da:87:fa:00:ae:
        1a:ae:26:fa:6a:f3:d9:62:fb:fb:df:3a:ad:37:14:87:f2:a3:
        e5:dd
Note that the lifetime of the proxy may be shorter than the one requested. The contents are similar to your user certificate (run grid-cert-info against it), but there are some differences; for example, the issuer is the DN of the user certificate, rather than that of the certificate authority.

grid-cert-info is useful to see how long your proxy certificate will last (the Not Before and Not After lines under Validity).
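If you prefer, the same validity dates can be read directly with openssl (a sketch, assuming openssl is in your path; the proxy file starts with the proxy certificate, so that is the one openssl reads):
> openssl x509 -in /tmp/x509up_u20003 -noout -dates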

Contents of the Grid Mapfile

Globus services (for example, GRAM and GridFTP) use a grid mapfile located by default in /etc/grid-security/grid-mapfile on each server.

This file has restricted write access, but the file can be read by anyone.

You can look at the gridmap file on uct3-edge7.uchicago.edu like this:
> cat /etc/grid-security/grid-mapfile
"/DC=org/DC=doegrids/OU=People/CN=Suchandra Thapa 757586" sthapa
"/DC=org/DC=doegrids/OU=People/CN=Martin Feller 807394"  mfeller
"/DC=org/DC=doegrids/OU=People/CN=R Jefferson Porter 227760" ivdgl

Grid mapfiles can be created by system administrators by hand or using a number of tools. Most OSG sites use a tool called GUMS, which can generate a grid-map file or provide callouts that verify a certificate's authorization dynamically.

Job submission with GRAM

Running Grid Jobs with Globus Commands

Now you should be able to run some execution jobs on a Computing Element (CE).

First we'll try a simple 'Hello World' job:
> globus-job-run uct3-edge7.uchicago.edu /bin/echo Hello World
Hello World

You've just submitted a job (the Linux command echo) to run on uct3-edge7.uchicago.edu. This is a simple building block for grid execution.

The globus-job-run utility runs commands on remote sites. You must give this command several pieces of information:

  • The name of the host on which to run the job. In this example, we specified uct3-edge7.uchicago.edu.
  • The name of the command to execute remotely. This must be a fully qualified path name (i.e., it must start with a "/"). In this example, we specified /bin/echo.
  • Parameters to pass to the command. In this example, we specify a message for echo, the text 'Hello World'.

Now we will run the Linux command hostname on the remote site to verify that we're talking to the resource we think we are.

Run it locally to make sure you are invoking it correctly. This check works best if you are running directly on the gatekeeper that you want to verify; it will likely still hold if your local system is similar (in my case uct3-edge5.uchicago.edu and uct3-edge7.uchicago.edu are both SL4).
> hostname
uct3-edge5.uchicago.edu

Use the command which to discover the location of the version of hostname that you are using. It will return a fully-qualified path name.
> which hostname
/bin/hostname

This tells you that to run hostname via globus-job-run, use /bin/hostname.

Use which to discover the location of the following commands on the system:
id
env
ps
uptime
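For example (a sketch; the paths shown are typical of an SL4-like system and may differ on yours):
> which id env ps uptime
/usr/bin/id
/bin/env
/bin/ps
/usr/bin/uptime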

Now run hostname remotely, on uct3-edge7.uchicago.edu, to verify that you really are reaching a remote system:
> globus-job-run uct3-edge7.uchicago.edu /bin/hostname
uct3-edge7.uchicago.edu

Next, see what else you can learn about the remote system with this approach (a sketch of the corresponding commands follows this list):

  • Discover what user ID your job ran under using id.
  • Discover what environment variables are set using env.
  • Discover the load on the remote Grid server using uptime.
  • Discover the default working directory in which your remote job will run using pwd.
  • Do an ls of this working directory.
  • Use df to discover how much storage space exists in this working directory.
  • Use df to discover how much storage space exists in the remote /tmp directory.
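Here is a sketch of those commands; the remote paths are the typical ones and are not guaranteed, so use what which reports on the gatekeeper if they differ:
> globus-job-run uct3-edge7.uchicago.edu /usr/bin/id
> globus-job-run uct3-edge7.uchicago.edu /bin/env
> globus-job-run uct3-edge7.uchicago.edu /usr/bin/uptime
> globus-job-run uct3-edge7.uchicago.edu /bin/pwd
> globus-job-run uct3-edge7.uchicago.edu /bin/ls
> globus-job-run uct3-edge7.uchicago.edu /bin/df -h .
> globus-job-run uct3-edge7.uchicago.edu /bin/df -h /tmp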

Immediate and Batch Job Managers

GRAM, the Globus component for running remote jobs, supports the concept of a job manager as an adapter to Local Resource Managers (LRM). Each site can support one or more such job managers. Our lab systems have two job managers: The fork job manager runs a job immediately, spawning a process on the gatekeeper to execute the job. The Condor job manager submits jobs into the Condor batch scheduling system.

Now we will investigate some of the differences between the fork and Condor jobmanagers. Which do you think will be faster? Use the command time to test which jobmanager is faster.

The "fork" job manager is very fast - it has low scheduling latency. It runs trivial commands very quickly. But it also has very little compute power - its usually just a single CPU on a front-end computer called the head node. A batch job manager, on the other hand, has a higher scheduling overhead, but usually gives you access to all computers in a cluster and access to a lot more compute power.

uct3-edge7 uses the Condor LRM. Other sites sometimes use other LRMs. For example, the Portable Batch System (PBS) is very common. To submit a job to a site using PBS, you must specify jobmanager-pbs.

Now try a job through PBS on a different machine: uct2-grid6.uchicago.edu
> globus-job-run uct2-grid6.uchicago.edu/jobmanager-pbs /bin/hostname
uct2-grid6.uchicago.edu

To time a command, enter time commandname:
> time sleep 3
real    0m3.007s
user    0m0.004s
sys     0m0.000s

Use this to time a few trivial Grid jobs to compare Fork and Condor:
> time globus-job-run uct3-edge7.uchicago.edu/jobmanager-condor /bin/hostname
uct3-edge7.uchicago.edu

real    0m10.678s
user    0m0.090s
sys     0m0.030s

> time globus-job-run uct3-edge7.uchicago.edu/jobmanager-fork /bin/hostname
uct3-edge7.uchicago.edu

real    0m0.488s
user    0m0.090s
sys     0m0.020s

Condor components for job management

Earlier we learned about Condor as an LRM. Condor can also manage remote jobs; in this configuration it is called Condor-G. Now we will submit some simple jobs using it, which will make it easier to keep track of your jobs.

Submission to Condor-G requires a submit file (the request is too complex for a single command line), and the additional files it uses ease the monitoring of the job by providing a log file and a copy of the job's standard output and standard error.

Getting set up

You need to start your local Condor-G:
> condor_master
Make sure that you can write to the Condor log directory (OSG_LOCATION/condor/hostname). Condor will complain if there are problems and will not start condor_master and condor_schedd.
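A quick way to check both conditions (a sketch; OSG_LOCATION is assumed to be set by the client setup sourced earlier, and the exact log directory layout may differ in your installation):
> ls -ld $OSG_LOCATION/condor/`hostname`
> ps -ef | grep condor_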

To kill your local Condor-G use condor_off -all, then make sure that you don't have any condor_* processes left.

Check the Condor queue with condor_q
> condor_q

-- Submitter: uct3-edge5.uchicago.edu : <10.1.3.249:54103> : uct3-edge5.uchicago.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 idle, 0 running, 0 held
This command lists everything that Condor (in our case Condor-G) has been asked to run. Everyone will be using the same Condor installation for these exercises, so you will often see other users' jobs in the queue alongside your own.

Next, create some directories for you to work in. Make them in your home directory:
> cd ~
> mkdir condor-tutorial
> cd condor-tutorial
> mkdir submit

Submit a Simple Grid Job with CondorG

Now we are ready to submit our first job with Condor-G. The basic procedure is to create a Condor job submit description file. This file can tell Condor what executable to run, what resources to use, how to handle failures, where to store the job's output, and many other characteristics of the job submission. Then this file is given to condor_submit.

There are many options that can be specified in a Condor-G submit description file. We will start out with just a few. We'll be sending the job to the computer uct3-edge7.uchicago.edu and running under the "jobmanager-fork" job manager. We're setting notification to never to avoid getting email messages about the completion of our job, and redirecting the stdout/err of the job back to the submission computer.

For more information, see the condor_submit manual.

Create the Submit File

Here is an example for a remote job submission. Note the universe variable in the following snippet. It defines where to send the job. If the universe used were vanilla, the job would be executed on the submitting site itself (Condor would have to be configured to allow local execution). To submit jobs to a remote resource, like in our examples, the universe has to be set to grid (the resource will vary depending on the protocol used). Here is the example:
########################################
#                       
#  A sample Condor-G submission file
#                                        
########################################

executable = APPDIR/YOURUSERNAME/primetest

transfer_executable = false
universe       = grid
grid_resource = gt2 SITE/jobmanager
log            = prime.log
arguments      = 100 2 100
output = prime.out

queue
Note a few elements in the submit file:
  • transfer_executable - false means the executable is already available at the destination. If the executable is small you can ask Condor to transfer it.
  • executable - the path to the executable, remote or local depending on transfer_executable
  • universe (see above)
  • grid_resource (see above)

Move to the scratch submission directory and create the submit file. Verify that it was entered correctly:
> cd ~/condor-tutorial/submit
USE YOUR FAVOURITE TEXT EDITOR TO ENTER THE FILE CONTENT
> cat myjob.submit
executable=/share/osg/app/allhands/primetest
arguments=143
output=results.output
error=results.error
log=results.log
notification=never
universe=grid
grid_resource=gt2 uct3-edge7.uchicago.edu/jobmanager-fork
queue

Submit your test job to Condor-G
> condor_submit myjob.submit
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 1.

Run condor_q to see the progress of your job. You can also run condor_q -globus to see Globus-specific status information. (See the condor_q manual for more information.)
> condor_q

-- Submitter: uct3-edge5.uchicago.edu : <10.1.3.249:54103> : uct3-edge5.uchicago.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   1.0   marco           3/2  17:40   0+00:00:22 R  0   0.0  primetest 143     

1 jobs; 0 idle, 1 running, 0 held
> condor_q -globus

-- Submitter: uct3-edge5.uchicago.edu : <10.1.3.249:54103> : uct3-edge5.uchicago.edu
 ID      OWNER          STATUS  MANAGER  HOST                EXECUTABLE        
   1.0   marco         ACTIVE fork     uct3-edge7.uchicag  /share/osg/app/all

Tip

If condor_q lists too many entries, you can use the job id to refer to your job. In the above example, this is 1 in the line 1 job(s) submitted to cluster 1 returned by condor_submit. Then you can just do condor_q XXX, where XXX is the cluster id. You can also use condor_q marco to list the jobs submitted by user marco. You can try various options like -long and -globus with condor_q to see more details.
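For example, using the cluster id and owner from the submission above:
> condor_q 1
> condor_q marco
> condor_q -long 1.0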

Monitoring Progress with tail

In another window, run tail -f on the log file for your job to monitor progress. Re-run tail when you submit one or more jobs throughout this tutorial. You will see how typical Condor-G jobs progress. Use Ctrl+C to stop watching the file.
> cd ~/condor-tutorial/submit
> tail -f --lines=500 results.log
000 (001.000.000) 03/02 17:40:34 Job submitted from host: <10.1.3.249:54103>
...
017 (001.000.000) 03/02 17:40:48 Job submitted to Globus
    RM-Contact: uct3-edge7.uchicago.edu/jobmanager-fork
    JM-Contact: https://uct3-edge7.uchicago.edu:33276/2757/1236037244/
    Can-Restart-JM: 1
...
027 (001.000.000) 03/02 17:40:48 Job submitted to grid resource
    GridResource: gt2 uct3-edge7.uchicago.edu/jobmanager-fork
    GridJobId: gt2 uct3-edge7.uchicago.edu/jobmanager-fork https://uct3-edge7.uchicago.edu:33276/2757/1236037244/
...
001 (001.000.000) 03/02 17:40:56 Job executing on host: gt2 uct3-edge7.uchicago.edu/jobmanager-fork
...
005 (001.000.000) 03/02 17:44:01 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job
...

Verifying completed jobs

When the job is no longer listed in condor_q, or when the log file reports Job terminated, the results can be viewed using condor_history:
> condor_history 
 ID      OWNER            SUBMITTED     RUN_TIME ST   COMPLETED CMD            
   1.0   marco           3/2  17:40   0+00:03:00 C   3/2  17:44 /share/osg/app/
When the job completes, verify that the output is as expected. The binary name shown may differ from the one you submitted because of how Globus and Condor-G cooperate to stage your file to the execute computer.
> ls
myjob.submit  myscript.sh*  results.error  results.log   results.output
> cat results.error
> cat results.output 
NO - 11 is a factor

If you didn't watch results.log with tail -f, you will want to examine the logged information with cat results.log.

Submitting a job to batch queues

Create a new submit file:
> cat > myjob2.submit
executable=/share/osg/app/allhands/primetest
arguments=143
output=results2.output
error=results2.error
log=results2.log
notification=never
universe=grid
grid_resource=gt2 uct3-edge7.uchicago.edu/jobmanager-condor
queue
Ctrl+D
$ cat myjob2.submit
executable=/share/osg/app/allhands/primetest
arguments=143
output=results2.output
error=results2.error
log=results2.log
notification=never
universe=grid
grid_resource=gt2 uct3-edge7.uchicago.edu/jobmanager-condor
queue
Notice that the setting for the grid_resource now refers to Condor instead of fork. Globus will submit the job to Condor on uct3-edge7.uchicago.edu instead of running the job directly.

Submit the job to Condor-G:
$ condor_submit myjob2.submit
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 2.

You can monitor the job's progress just like the first job. If you log into uct3-edge7.uchicago.edu in another window, you can see your job in the Condor queue there. Be quick, or the job will finish before you look!
> ssh uct3-edge7.uchicago.edu
marco@uct3-edge7.uchicago.edu's password: 
> condor_status 

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@uct3-edge7.u LINUX      INTEL  Unclaimed Idle     0.090  2030  0+00:40:04
slot2@uct3-edge7.u LINUX      INTEL  Unclaimed Idle     0.000  2030  6+16:45:45
slot3@uct3-edge7.u LINUX      INTEL  Unclaimed Idle     0.000  2030  6+16:45:46
slot4@uct3-edge7.u LINUX      INTEL  Unclaimed Idle     0.000  2030  6+16:45:47

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX     4     0       0         4       0          0        0

               Total     4     0       0         4       0          0        0
> condor_q


-- Submitter: uct3-edge7.uchicago.edu : <10.1.3.251:57171> : uct3-edge7.uchicago.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 idle, 0 running, 0 held

Clean up the results after the second job has finished running:
$ rm results.* results2.*

Data Management

Getting set up

Make a working directory for this exercise. For the rest of this exercise, all your work should be done in there.
> mkdir dataex
> cd dataex

Next create some files of different sizes, to use for exercises:
> dd if=/dev/zero of=smallfile-marco bs=1M count=10
> dd if=/dev/zero of=mediumfile-marco bs=1M count=50
> dd if=/dev/zero of=largefile-marco bs=1M count=200
> ls -sh
total 260M
200M largefile-marco   50M mediumfile-marco   10M smallfile-marco

Moving Files with GridFTP

Transfers to a remote site

Now try transferring a file to a remote site.

First you will need some scratch space on the remote system. You can create a working directory in the remote data directory.
> globus-job-run uct3-edge7.uchicago.edu /bin/env | grep OSG_DATA
OSG_DATA=/share/osg/data
> globus-job-run uct3-edge7.uchicago.edu /bin/mkdir /share/osg/data/osgallhand-marco

Now copy the file over to this directory:
> globus-url-copy -vb file:///home/marco/dataex/smallfile-marco gsiftp://uct3-edge7.uchicago.edu/share/osg/data/osgallhand-marco/ex1
Source: file:///home/marco/dataex/
Dest:   gsiftp://uct3-edge7.uchicago.edu/share/osg/data/osgallhand-marco/
  smallfile-marco  ->  ex1
      1048576 bytes         1.84 MB/sec avg         1.84 MB/sec inst

You will probably find that the transfer rate is much lower than when copying to local machines.

You can try copying to other sites in addition to uct3-edge7.uchicago.edu. Remember that you might need to make a scratch directory on each one, and that the place for this will be different for each site.

Measuring transfer speed

See how fast the file transfer is happening by using the -vb flag when copying the large file. Since this is a transfer over a local network that should not be too busy it should be fairly quick:
> globus-url-copy -vb file:///home/marco/dataex/largefile-marco gsiftp://uct3-edge7.uchicago.edu/share/osg/data/osgallhand-marco/ex1
Source: file:///home/marco/dataex/
Dest:   gsiftp://uct3-edge7.uchicago.edu/share/osg/data/osgallhand-marco/
  largefile-marco  ->  ex1
    134217728 bytes        90.85 MB/sec avg       127.00 MB/sec inst

URL formats

A quick reminder on URL formats: We've seen two kinds of URLs so far.

file:///home/marco/dataex/largefile - a file called largefile on the local file system, in the directory /home/marco/dataex/.

gsiftp://uct3-edge7.uchicago.edu/share/osg/data/osgallhand-marco/ - a directory accessible via gsiftp on the host called uct3-edge7.uchicago.edu in directory /share/osg/data/osgallhand-marco/.

Parallel streams

Try using 4 parallel data streams by adding the -p flag with an argument of 4.

Use the following globus-url-copy command to transfer the file from uct3-edge5.uchicago.edu to uct3-edge7.uchicago.edu:
> globus-url-copy  -p 4 -vb file:///home/marco/dataex/smallfile-marco gsiftp://uct3-edge7.uchicago.edu/share/osg/data/osgallhand-marco/ex1
Source: file:///home/marco/dataex/
Dest:   gsiftp://uct3-edge7.uchicago.edu/share/osg/data/osgallhand-marco/
  smallfile-marco  ->  ex1
     10485760 bytes        11.11 MB/sec avg        11.11 MB/sec inst
Experiment with transferring different file sizes and numbers of parallel streams, to both local and remote sites and see how the speed varies.

Third party transfers

Next try a third-party transfer. You do this by specifying two gsiftp URLs, instead of one gsiftp URL and one file URL.

globus-url-copy will control the transfers but data will not pass through the local machine. Instead, it will go directly between the source and destination machines.

Transfer a file between two remote sites, and see if it is faster than transferring it to your local machine (uct3-edge5.uchicago.edu in these examples) and then back out again.

Try to make up a command line for this yourself - you should use two gsiftp URLs, instead of a file URL and a gsiftp URL.
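Here is a sketch of what such a command could look like; siteB.example.edu and its path are hypothetical placeholders, so replace them with a second gatekeeper and a scratch directory you are authorized to use:
> globus-url-copy -vb gsiftp://uct3-edge7.uchicago.edu/share/osg/data/osgallhand-marco/ex1 \
                      gsiftp://siteB.example.edu/path/to/scratch/ex1-copy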

Data Management 2: SRM

Getting set up

Make a working directory for this exercise. For the rest of this exercise, all your work should be done in there.
> mkdir srmex
> cd srmex

There are a few environmental variables already set for you (in OSG-Client setup):
  • SRM_CONFIG : srm client configuration file (includes installation directory and defaults)
You may define some variables for your convenience:
  • SRMEP : SRM service endpoint; srm://gwdca04.fnal.gov:8443/srm/managerv2
  • SRMPATH : Working directory on SRM storage; /pnfs/fnal.gov/data/osgedu
  • MYNAME : your login
> export SRMEP=srm://itb1.uchicago.edu:8443/srm/v2/server
> export SRMPATH=/xrootd/mount
> export MYNAME=marco

Next create a file to use for exercises:
> dd if=/dev/zero of=smallfile-$MYNAME bs=1M count=2
2+0 records in
2+0 records out
> ls -l
total 2048
-rw-r--r--  1 marco mwt2 2097152 Mar  3 17:11 smallfile-marco

Assumptions

You have already used globus-url-copy to move files from your local machine to one of the designated target machines and from a remote gridftp server to your local machine.

Basic operations

Checking the status of SRM

Use srm-ping to find out the status of SRM server on $SRMEP.

> srm-ping $SRMEP
This returns the SRM version number and some extra information, similar to the following:
Ping versionInfo=v2.2
Extra information
        Key=backend_type
        Value=BeStMan
        Key=backend_version
        Value=2.2.1.2.i2
        Key=backend_build_date
        Value=2009-02-09T16:24:42.000Z 
        Key=GatewayMode
        Value=Enabled
        Key=gsiftpTxfServers
        Value=gsiftp://itb1.uchicago.edu
        Key=clientDN
        Value=/DC=org/DC=doegrids/OU=People/CN=Marco Mambelli 325802
        Key=gumsIDMapped
        Value=usatlas2
        Key=staticToken(0)
        Value=ATLASDATADISK desc=ATLASDATADISK size=5368709120
        Key=staticToken(1)
        Value=ATLASPRODDISK desc=ATLASPRODDISK size=3221225472
        Key=staticToken(2)
        Value=ATLASGROUPDISK desc=ATLASGROUPDISK size=3221225472
srm-ping returns a lot of information. For example, it tells us that we are speaking to a BeStMan SRM server installed in GatewayMode (which does not support space reservation, as opposed to FullMode, which does), and that the server uses GUMS (gumsIDMapped) rather than a static gridmap file.

Putting a file into SRM managed storage

File transfer into SRM managed storage goes through several protocols, including a gridftp file transfer. This client operation communicates with the SRM server through several interfaces internally: srmPrepareToPut to initiate your put request, srmStatusOfPutRequest to check its status, a gridftp file transfer, and srmPutDone to finalize the state of your file transfer.
> srm-copy file:////home/marco/srmex/smallfile-$MYNAME \
           $SRMEP\?SFN=$SRMPATH/smallfile-$MYNAME

Upon successful completion, this returns a summary similar to the following:
SRM-CLIENT*REQUESTTYPE=put
SRM-CLIENT*TOTALFILES=1
SRM-CLIENT*TOTAL_SUCCESS=1
SRM-CLIENT*TOTAL_FAILED=0
SRM-CLIENT*REQUEST_TOKEN=put:2
SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS
SRM-CLIENT*SOURCEURL[0]=file:////home/marco/srmex/smallfile-marco
SRM-CLIENT*TARGETURL[0]=srm://itb1.uchicago.edu:8443/srm/v2/server?SFN=/xrootd/mount/smallfile-marco
SRM-CLIENT*TRANSFERURL[0]=gsiftp://itb1.uchicago.edu//xrootd/mount/smallfile-marco
SRM-CLIENT*ACTUALSIZE[0]=2097152
SRM-CLIENT*FILE_STATUS[0]=SRM_SUCCESS
SRM-CLIENT*EXPLANATION[0]=SRM-CLIENT: PutDone is called successfully
ExitCode=0

URL formats

A quick reminder on URL formats:

We've seen two kinds of URLs so far.

file:////home/marco/srmex/smallfile - a file called smallfile on the local file system, in directory /home/marco/srmex/. The appended $MYNAME is only to make the filename unique.

srm://itb1.uchicago.edu:8443/srm/v2/server\?SFN=/xrootd/mount/smallfile-marco - a SiteURL for a file named smallfile-marco on the SRM running on the host itb1.uchicago.edu and port 8443, with the web service handle /srm/v2/server, in directory /xrootd/mount. SFN stands for Site File Name.

Browsing a file in SRM managed storage

Now try to find out the properties of the file that you just put into SRM.
> srm-ls   $SRMEP\?SFN=$SRMPATH/smallfile-$MYNAME -fulldetailed
Upon successful completion, this returns a summary similar to the following:
SRM-DIR: Printing text report now ...
SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS
SRM-CLIENT*SURL=/xrootd/mount/smallfile-marco
SRM-CLIENT*BYTES=2097152
SRM-CLIENT*FILETYPE=FILE
SRM-CLIENT*FILE_STATUS=SRM_SUCCESS
SRM-CLIENT*FILE_EXPLANATION=Read from disk
SRM-CLIENT*OWNERPERMISSION=owner
SRM-CLIENT*LIFETIMELEFT=-1
SRM-CLIENT*FILELOCALITY=ONLINE
SRM-CLIENT*OWNERPERMISSION.USERID=owner
SRM-CLIENT*OWNERPERMISSION.MODE=RWX
SRM-CLIENT*GROUPPERMISSION.GROUPID=defaultGroup
SRM-CLIENT*GROUPPERMISSION.MODE=RX
SRM-CLIENT*OTHERPERMISSION=RX
SRM-CLIENT*LASTACCESSED=2009-3-3-18-5-49
ExitCode=0

Getting a file from SRM managed storage

Now try to get the file that you just browsed and put into SRM from the SRM managed storage to your local machine. This client operation communicates with the SRM server through several interfaces internally: srmPrepareToGet to initiate your get request, srmStatusOfGetRequest to check its status, a gridftp file transfer, and srmReleaseFiles to release the file after your transfer.
> srm-copy $SRMEP\?SFN=$SRMPATH/smallfile-$MYNAME \
           file:////home/marco/srmex/my-smallfile
Upon successful completion, this returns a summary similar to the following:
SRM-CLIENT*REQUESTTYPE=get
SRM-CLIENT*TOTALFILES=1
SRM-CLIENT*TOTAL_SUCCESS=1
SRM-CLIENT*TOTAL_FAILED=0
SRM-CLIENT*REQUEST_TOKEN=get:3
SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS
SRM-CLIENT*SOURCEURL[0]=srm://itb1.uchicago.edu:8443/srm/v2/server?SFN=/xrootd/mount/smallfile-marco
SRM-CLIENT*TARGETURL[0]=file:////home/marco/srmex/my-smallfile
SRM-CLIENT*TRANSFERURL[0]=gsiftp://itb1.uchicago.edu//xrootd/mount/smallfile-marco
SRM-CLIENT*ACTUALSIZE[0]=2097152
SRM-CLIENT*FILE_STATUS[0]=SRM_FILE_PINNED
ExitCode=0
After srm-copy is completed, find out the file size at the target on your local machine:
> ls -l
total 4096
-rw-r--r--  1 marco mwt2 2097152 Mar  4 00:03 my-smallfile
-rw-r--r--  1 marco mwt2 2097152 Mar  3 17:11 smallfile-marco

Removing a file in SRM managed storage

Now try to remove the file that you put from the SRM managed storage.
> srm-rm $SRMEP\?SFN=$SRMPATH/smallfile-$MYNAME
Upon successful completion, this returns a summary similar to the following:
SRM-DIR: Total files to remove: 1
        status=SRM_SUCCESS
        explanation=null
        surl=srm://itb1.uchicago.edu:8443/srm/v2/server?SFN=/xrootd/mount/smallfile-marco
        status=SRM_SUCCESS
        explanation=null

After srm-rm returns successfully, find out the file properties of the same SURL on the SRM with srm-ls. You should see that the SURL is invalid.

Creating and removing a directory in SRM managed storage

Now try to create a directory in SRM managed storage.
> srm-mkdir $SRMEP\?SFN=$SRMPATH/$MYNAME
This will create a directory under the SRM that you can use in your SURLs. Upon successful completion, this returns a summary similar to the following:
SRM-DIR: Wed Mar 04 00:06:17 CST 2009 Calling SrmMkdir
        status=SRM_SUCCESS
        explanation=null
Browse the directory to see what kind of property information that you retrieve from SRM.

Now try to remove the directory from SRM.
> srm-rmdir $SRMEP\?SFN=$SRMPATH/$MYNAME
This will remove a directory under the SRM. Upon successful completion, this returns a summary similar to the following:
SRM-DIR: Wed Mar 04 00:06:56 CST 2009 Calling SrmRmdir
Sending srmRmdir request ...
SRM-DIR: ........................
        status=SRM_SUCCESS
        explanation=null

Summary of basic operations

Experiment with putting and getting files of different sizes and with different numbers of parallel streams to and from the remote SRM site, and see the differences. When you use 4 parallel data streams by adding the -parallelism option with an argument of 4, the client operation goes through the same protocol, and the parallel streams are used in the gridftp file transfer. Larger files will show a more noticeable difference in transfer performance.
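For example, the put operation from before with 4 parallel streams would look like this (a sketch; -parallelism is the option mentioned above and takes the number of streams as its argument):
> srm-copy file:////home/marco/srmex/smallfile-$MYNAME \
           $SRMEP\?SFN=$SRMPATH/smallfile-$MYNAME -parallelism 4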

Experiment with directory structure in your path.

Note: Remember to remove those files and directories that you created afterwards.

For space reservation I will use a different SRM endpoint, still at UC_ITB, since the BeStMan SRM used above does not support space reservation. This endpoint uses dCache and has a different path as well:
> export SRMEP=srm://uct3-edge6.uchicago.edu:8443/srm/managerv2
> export SRMPATH=/pnfs/uchicago.edu/data/usatlas3

Reserving a space in SRM for opportunistic use

Now, let's make a space reservation for 5M bytes of total space, 4M bytes of guaranteed space and lifetime of 900 seconds:
> srm-sp-reserve -serviceurl $SRMEP -size 5000000 -gsize 4000000 -lifetime 900
Upon successful completion, this returns a summary similar to the following:
SRM-CLIENT: Tue Mar 17 15:15:35 CDT 2009 Calling Reserve space request
SRM-CLIENT: Status Code for spaceStatusRequest SRM_SUCCESS
        explanation=
        SpaceToken=10004
        TotalReservedSpaceSize=4000000
        Retention Policy=REPLICA
        Access Latency=ONLINE

Printing text report now...

SRM-CLIENT*REQUEST_STATUS=SRM_REQUEST_INPROGRESS
SRM-CLIENT*REQUEST_EXPLANATION= at Tue Mar 17 15:15:04 CDT 2009 state TQueued : put on the thread queue
SRM-CLIENT*REQUEST-TOKEN=-2147482141
SRM-CLIENT*SPACE-TOKEN=10004
SRM-CLIENT*EXPLANATION=
SRM-CLIENT*TOTALRESERVEDSPACESIZE=4000000
SRM-CLIENT*RETENTIONPOLICY=REPLICA
SRM-CLIENT*ACCESSLATENCY=ONLINE
Upon successful space reservation, this will show you the space token, which will be used in the next exercises (e.g. 10004 above; the token is not necessarily numeric, and different storage systems may return different string formats). Note that your reserved space was returned as 4MB. Let's set the returned space token as an environment variable to re-use later on:
> export SPTOKEN=10004
Finding out space properties from SRM

Now, let's find out the space information with the space token that you just received above:
> srm-sp-info -serviceurl $SRMEP -spacetoken $SPTOKEN
Upon successful completion, this returns a summary similar to the following:
SRM-CLIENT:  ....space token details ....
        status=SRM_SUCCESS
        SpaceToken=10004
        TotalSize=4000000
        Owner=VoGroup=/atlas/usatlas VoRole=software
        LifetimeAssigned=900
        LifetimeLeft=682
        UnusedSize=4000000
        GuaranteedSize=4000000
        RetentionPolicy=REPLICA
        AccessLatency=ONLINE
        status=SRM_SUCCESS
        explanation=ok

Retrieving space tokens from SRM

Suppose you lost your space token; let's find out how to retrieve the space tokens that belong to you:
> srm-sp-tokens -serviceurl $SRMEP 
Upon successful completion, this returns a summary similar to the following:
SRM-CLIENT: ...................................
        Status=SRM_SUCCESS
        Explanation=OK
SRM-CLIENT (0)SpaceToken=10004
This would show all the space tokens that belong to your grid identity and its mapping on the server.

Updating a space in SRM

Some time has passed since the space reservation above, and the lifetime of the reserved space may be near expiration. Now let's update the lifetime of the space as well as its size. We'll use 7MB of total space with 6MB of guaranteed space, and make the lifetime 950 seconds:
> srm-sp-update -serviceurl $SRMEP -spacetoken $SPTOKEN -size 7000000 -gsize 6000000 -lifetime 950
In this case the command returns a summary similar to the following, because the target SRM storage does not support this functionality:
SRM-SPACE: Sat Jan 12 19:09:55 CST 2008 Calling updateSpace request
        status=SRM_NOT_SUPPORTED
        explanation=can not find a handler, not implemented
        Request token=null
However, when the SRM storage supports the functionality and the request is successful, this returns a summary similar to the following.
SRM-SPACE: Sat Jan 12 21:22:50 PST 2008 Calling updateSpace request
        status=SRM_SUCCESS
        Request token=null
        lifetime=950
        Min=7000000
        Max=7000000
Your space token is the same as before, and upon successful completion the lifetime and size of your space should be updated. Verify the updated information by querying the SRM again with srm-sp-info.

Putting a file into the reserved space in SRM

Now let's put a file into your reserved space using the space token. This client operation communicates with the SRM server, same as before. However, because of your space token, your file will be written into the space that you have reserved. (Since the previous token expired, I requested a new one before the copy, 10006 - so you can make sense of the number in the examples).
> srm-copy file:////home/marco/srmex/smallfile-$MYNAME \
           $SRMEP\?SFN=$SRMPATH/smallfile-space-$MYNAME \
	   -spacetoken $SPTOKEN
Upon successful completion, this returns a summary similar to the following:
SRM-CLIENT*REQUESTTYPE=put
SRM-CLIENT*TOTALFILES=1
SRM-CLIENT*TOTAL_SUCCESS=1
SRM-CLIENT*TOTAL_FAILED=0
SRM-CLIENT*REQUEST_TOKEN=-2147482128
SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS
SRM-CLIENT*REQUEST_EXPLANATION= at Tue Mar 17 15:35:18 CDT 2009 state Pending : created
SRM-CLIENT*SOURCEURL[0]=file:////home/marco/srmex/smallfile-marco
SRM-CLIENT*TARGETURL[0]=srm://uct3-edge6.uchicago.edu:8443/srm/managerv2?SFN=/pnfs/uchicago.edu/data/usatlas3/smallfile-space-marco
SRM-CLIENT*TRANSFERURL[0]=gsiftp://uct3-edge6.uchicago.edu:2811///smallfile-space-marco
SRM-CLIENT*ACTUALSIZE[0]=2097152
SRM-CLIENT*FILE_STATUS[0]=SRM_SUCCESS
SRM-CLIENT*EXPLANATION[0]=Done
After successful completion, find out the file properties with srm-ls.
> srm-ls $SRMEP\?SFN=$SRMPATH/smallfile-space-$MYNAME
Upon successful completion, this returns a summary similar to the following:
SRM-DIR: Printing text report now ...
SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS
SRM-CLIENT*REQUEST_EXPLANATION=srm-ls completed normally
SRM-CLIENT*SURL=/pnfs/uchicago.edu/data/usatlas3/smallfile-space-marco
SRM-CLIENT*BYTES=2097152
SRM-CLIENT*FILETYPE=FILE
SRM-CLIENT*STORAGETYPE=PERMANENT
SRM-CLIENT*FILE_STATUS=SRM_SUCCESS
SRM-CLIENT*OWNERPERMISSION=21003
SRM-CLIENT*LIFETIMELEFT=-1
SRM-CLIENT*LIFETIMEASSIGNED=-1
SRM-CLIENT*CHECKSUMTYPE=adler32
SRM-CLIENT*CHECKSUMVALUE=01e00001
SRM-CLIENT*FILELOCALITY=ONLINE
SRM-CLIENT*OWNERPERMISSION.USERID=21003
SRM-CLIENT*OWNERPERMISSION.MODE=RW
SRM-CLIENT*GROUPPERMISSION.GROUPID=21005
SRM-CLIENT*GROUPPERMISSION.MODE=R
SRM-CLIENT*OTHERPERMISSION=R
SRM-CLIENT*SPACETOKENS(0)=10006
SRM-CLIENT*RETENTIONPOLICY=REPLICA
SRM-CLIENT*ACCESSLATENCY=ONLINE
SRM-CLIENT*LASTACCESSED=2009-3-17-15-35-52
SRM-CLIENT*CREATEDATTIME=Tue Mar 17 15:35:52 CDT 2009
Note from the previous srm-ls output that this time it shows the space token you used when putting your file into the SRM managed storage.

Releasing the reserved space from SRM

Now let's release the reserved space using the space token.
> srm-sp-release -serviceurl $SRMEP -spacetoken $SPTOKEN
Upon successful completion, this returns a summary similar to the following:
SRM-CLIENT: Releasing space for token=10006
        status=SRM_SUCCESS
        explanation=Space released
This operation may fail if you have any files in the space associated with the space token. In that case, remove the files with srm-rm and try releasing the space again.
> srm-rm  $SRMEP\?SFN=$SRMPATH/smallfile-space-$MYNAME
After successfully releasing your reserved space, find out the space properties with srm-sp-info.

Summary of space management operations

Experiment with reserving spaces of different sizes and lifetimes, and with putting your files into the reserved spaces using the space token. Experiment with updating the reserved space after you have put your files into it. Experiment with the directory structure in your SURL.

Note: Remember to remove those files and directories that you created afterwards. Also remember to release those spaces that you reserved if still active.

Running applications on other Grid sites

Objectives

So far you've seen how to use the tools in the OSG Client and run jobs on various grid sites, but they have all been sites specifically chosen and prepared. Now we will try running on some other sites on the grid. There are a few issues to cover. Most of the content is generic: you can use the OSG Client to submit to other grids (TeraGrid, WLCG/EGEE, ...). Some of the content is OSG specific.

Finding Sites

There are a number of machines that you can probably submit to: some on Open Science Grid, some on TeraGrid and some tutorial-specific machines. You need to use a different mechanism to discover the machines in each of these grids.

Finding sites on the Open Science Grid

The OSG Monitoring page gives you a good introduction to monitoring and information services.

OIM (OSG Information Management System) is the main source of information about OSG sites.

VORS (Virtual Organization Resource Selector) is another of the monitoring tools available on the Open Science Grid. It can be used to get a good view of the Open Science Grid. For instance, there is a map of sites in the OSGEDU VO. The sites in OSG ITB (Integration TestBed) should support users belonging to any OSG VO.

You can use this to check the status of many OSG sites on this list, to find out which sites are working and which sites support the OSGEDU VO or the VO which you are part of.

In the following examples I'm again using uct3-edge7, but you should use a site selected with the monitoring tools above.

Checking authentication

Use globusrun to verify that you are authorized to use a site and can authenticate to it:
> globusrun -a -r uct3-edge7.uchicago.edu/jobmanager-fork
GRAM Authentication test successful

Testing a Site

Let's test a site, using GRAM2:
> globus-job-run uct3-edge7.uchicago.edu /bin/date
Sun Jul 10 23:25:25 CDT 2005
Try to copy your application to the site.

Caution: Do you know where to copy the files? You cannot use a random directory. Is there any temporary directory available? And if so, how do you find it?
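One quick way to check on an OSG CE is to look at the advertised OSG environment variables remotely, as we did earlier for OSG_DATA (a sketch; OSG_APP and OSG_DATA are the standard OSG variable names):
> globus-job-run uct3-edge7.uchicago.edu /bin/env | grep OSG_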

If you plan to use any of the OSG sites, and you are authorized to do so (GRAM Authentication test successful), go to the VORS page again.

Pick a site (one that supports your VO, e.g an ITB site). Click on that site in VORS.

Under that listing, the entry labelled $APP (OSG_APP) location is the APPDIR you will be using.

The APPDIR is where you should copy your applications. (After making a separate directory, of course. You don't want your application to be messed up by other grid users, do you?)

Running a job

Now you can go ahead and
  • Create your workspace in the APPDIR
  • "Stage-in" your application with globus-url-copy
  • Execute your application

Remember to replace SITE, APPDIR and YOURUSERNAME with values that are appropriate for you.
> export SITE=uct3-edge7.uchicago.edu
> export APPDIR=/share/osg/app
> export YOURUSERNAME=marco
> globus-job-run $SITE /bin/mkdir $APPDIR/$YOURUSERNAME
> globus-url-copy file://`pwd`/primetest gsiftp://$SITE/$APPDIR/$YOURUSERNAME/primetest
> globus-job-run $SITE /bin/chmod +x $APPDIR/$YOURUSERNAME/primetest
> globus-job-run $SITE $APPDIR/$YOURUSERNAME/primetest 200 2 200
NO - 2 is a factor

Using Condor-G to submit jobs to OSG (and other grids)

As seen in the previous section, Condor-G allows one to use Condor tools to submit jobs to Globus resources. It makes it easy to send and manage multiple jobs. Here is an example template for a submit file.

########################################
#                       
#  A sample Condor-G submission file
#                                        
########################################

executable = APPDIR/YOURUSERNAME/primetest
log = prime.log
output = prime.out
error = prime.err

transfer_executable = false
universe = grid
grid_resource = gt2 SITE/jobmanager
arguments  = 100 2 100

queue

Submit your prime application using this submission file to a site (below is the submission to uct3-edge7.uchicago.edu). Then you can monitor your application using condor_q.
> cat example.sub
executable=/share/osg/app/allhands/primetest
arguments  = 100 2 100
log=prime.log
output=prime.output
error=prime.error

notification=never
transfer_executable = false
universe=grid
grid_resource=gt2 uct3-edge7.uchicago.edu/jobmanager-fork

queue
> condor_submit example.sub
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 33.

$ condor_q

-- Submitter: uct3-edge5.uchicago.edu : <206.76.233.104:36236> : uct3-edge5.uchicago.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   9.0   marco      7/10 23:39   0+00:00:00 I  0   0.0  prime             

1 jobs; 1 idle, 0 running, 0 held

Checking output

Check the output file prime.out.

Submitting multiple jobs to multiple sites using Condor-G

There are various ways to use Condor-G to submit multiple jobs to multiple sites:

Write multiple submission files and change attributes manually or using a script. This is clumsy and difficult to manage.

Write a single common submission file and dynamically change only the attributes that need to be changed.

First, we must identify the attributes that need to be changed for different instantiations of the application:

  • range of divisors
  • site names
  • application directories
  • output file names

How do we do this? By passing parameters to condor_submit:
$ condor_submit -a "arguments = $num $start $end" -a "grid_resource = gt2 $site/jobmanager" ...
The strings that you specify with -a option get added to the submission file you specify.

Write a common submission file and submit three instantiations of your prime application to three sites. Note that you have to use different output file names for each instantiation.

Create a submission file named example.sub with following contents.
####################
#
# Submission file for prime number finder
#
####################

transfer_executable = false
universe       = grid
log            = prime.log
queue
Note that we do not specify the site, the arguments, the executable, or the output file; these will be passed on the condor_submit command line.

Create a directory for the output files:
$ mkdir output

Submit a job using the submission file by passing arguments to condor_submit:
$ condor_submit -a "arguments = 1000 2 1000" -a "output = output/1.out" -a "grid_resource = gt2 ufgrid05.phys.ufl.edu/jobmanager" -a "executable = APPDIR/YOURUSERNAME/prime" example.sub

Submit multiple jobs to multiple sites. Note that you have to copy your executable to each site if it is not already there. Use VORS to find the $APP (OSG_APP) locations.

Inspecting the output

A simple grep through all your output files should tell you whether the number is prime or not.
$ grep NO output/*
Clean-up

If you no longer plan to use the executable you (globus-url-)copied to the remote site(s), please go ahead and clean up your workspace(s):
$ globus-job-run SITE /bin/rm -rf APPDIR/YOURUSERNAME 

Optional exercise: Putting it all together

Write a script to submit jobs to multiple sites automatically.

Perhaps your script would contain something like this:
$ ./script.sh
Usage: ./script.sh
    1 - Make dir
    2 - Copy exes
    3 - Run prime jobs
    4 - Grep output
    5 - Remove dir
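Below is a minimal sketch of part of such a script in shell; the site list, APPDIR value and username are illustrative placeholders (APPDIR may differ per site), and example.sub is the common submission file created above:
#!/bin/bash
# submit_all.sh - sketch: submit the prime test to a list of sites via Condor-G
# Adjust SITES, APPDIR and USERNAME for your own setup.
SITES="uct3-edge7.uchicago.edu uct2-grid6.uchicago.edu"
APPDIR=/share/osg/app
USERNAME=marco

mkdir -p output
i=0
for site in $SITES; do
    i=$((i+1))
    # Pass the per-site attributes on the command line, as explained above
    condor_submit -a "arguments = 1000 2 1000" \
                  -a "output = output/$i.out" \
                  -a "executable = $APPDIR/$USERNAME/primetest" \
                  -a "grid_resource = gt2 $site/jobmanager" \
                  example.sub
done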

Acknowledgments and references

Many thanks to Ben Clifford and the Open Science Grid Education, Outreach and Training group that wrote the 2008 Clemson tutorial. This document is heavily based on it.

Other tutorials or educational material:
  • https://twiki.grid.iu.edu/bin/view/Education/MWGS2008Syllabus


-- MarcoMambelli - 02 Mar 2009

Attachments:
  • primetest (7 K, 02 Mar 2009, MarcoMambelli)
  • primetest.cc (623 bytes, 10 Mar 2009, MarcoMambelli) - Source file for the prime test