Execution of Analysis jobs on ANALY_MWT2

Introduction

Jobs were submitted with Pathena (from uct3-edge5). Results were checked in the Panda DB by typing queries into the mysql client; jobsArchived4 is the table containing my analysis jobs. Output files were checked using the DQ2 end-user clients.
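The queries actually used are in the attached sample-queries-jobs.txt. As an illustration only, the kind of aggregate query behind the tables in the Results section can be sketched against a toy in-memory SQLite table. The column names (jobType, jobStatus, cpuTime) and the rows are assumptions made for this sketch, not the real Panda schema or data.

```python
import sqlite3

# Toy stand-in for the Panda table; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobsArchived4 ("
    "PandaID INTEGER PRIMARY KEY, jobType TEXT, jobStatus TEXT, cpuTime INTEGER)"
)
rows = [
    (1, "build", "finished", 400),
    (2, "run", "finished", 1200),
    (3, "run", "finished", 1400),
    (4, "run", "failed", 0),
]
conn.executemany("INSERT INTO jobsArchived4 VALUES (?, ?, ?, ?)", rows)

# One row per (status, type) with count, average, min, max and total CPU use.
query = """
    SELECT jobStatus, jobType, COUNT(*),
           AVG(cpuTime), MIN(cpuTime), MAX(cpuTime), SUM(cpuTime)
    FROM jobsArchived4
    GROUP BY jobStatus, jobType
    ORDER BY jobStatus, jobType
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

Grouping by status and job type in a single query yields the AVG/min/Max/Total columns of one table in one pass.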

Job description

The job executed is an example available in DPD production:
  • Each Pathena job has 1 build job.
  • Each job is split into 1 run job per input file, which makes 21 jobs (the input dataset has 21 files)
  • It copies the input AOD to the run directory (in /scratch) with dccp
  • Runs AnalyzeJpsiphi.py
  • At the end copies the output file back to the USERDISK (using lcg-cp)

Results

Generic statistics:
  • number of submissions: 250 Pathena jobs (repetitions of the same job)
  • resulting Panda jobs: 5500 Panda jobs
    • 250 build jobs
    • 5250 run jobs
  • Build jobs
    • finished 247
    • failed 3
  • Run jobs
    • finished 5138
    • failed 112 (63 of which never started due to build job failure)
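The job counts can be cross-checked from the splitting rule. The sketch below assumes that every Pathena submission yields exactly 1 build job plus 21 run jobs, and that a failed build job prevents all 21 of its run jobs from starting; the variable names are illustrative only.

```python
# Cross-check of the job counts, assuming 1 build job + 21 run jobs per submission.
submissions = 250
files_per_dataset = 21

build_jobs = submissions
run_jobs = submissions * files_per_dataset
panda_jobs = build_jobs + run_jobs

# Each failed build blocks the run jobs of its own submission.
failed_builds = 3
never_started = failed_builds * files_per_dataset

# Reported outcomes of the run jobs.
finished_run, failed_run = 5138, 112

print("build jobs:", build_jobs)
print("run jobs expected:", run_jobs, "reported:", finished_run + failed_run)
print("Panda jobs in total:", panda_jobs)
print("run jobs blocked by failed builds:", never_started)
print("run jobs that started and then failed:", failed_run - never_started)
```

Under these assumptions the finished and failed run jobs add up to the expected number of run jobs, and the 63 jobs that never started correspond to the 3 failed build jobs.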

CPU Use (kSI2k-seconds)
| Job | AVG | min | Max | Total |
| finished BUILD jobs | 453.34 | 388 | 933 | 111974 |
| failed BUILD jobs | 271.00 | 0 | 416 | 816 |
| finished RUN jobs | 1299.36 | 481 | 3803 | 6676092 |
| failed RUN jobs | 322.15 | 0 | 1989 | 36081 |
| Total | 1240.90 | 0 | 3803 | 6824960 |
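Since each average is a total divided by a number of jobs, dividing Total by AVG recovers how many jobs each row of the CPU table covers. This is a minimal sketch using only the figures from the table; the dictionary keys are labels chosen for this example.

```python
# (average, total) CPU use per category, copied from the CPU table.
cpu_rows = {
    "finished build": (453.34, 111974),
    "failed build": (271.00, 816),
    "finished run": (1299.36, 6676092),
    "failed run": (322.15, 36081),
    "total": (1240.90, 6824960),
}

# Number of jobs implied by each row: total / average, rounded to an integer.
implied_jobs = {name: round(total / avg) for name, (avg, total) in cpu_rows.items()}

for name, count in implied_jobs.items():
    print(f"{name:15s} {count}")
```

The implied counts of the four categories sum to the count implied by the Total row, so the CPU table appears to include every build and run job.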

CPU types are: Quad-Core AMD Opteron(tm) Processor 2350 (512 KB cache) and Dual Core AMD Opteron(tm) Processor 285 (1024 KB cache)

Wall Time Use (seconds)
| Job | AVG | min | Max | Total |
| finished BUILD jobs | 4045.72 | 2226 | 6673 | 999293 |
| failed BUILD jobs | 4289.33 | 1 | 6533 | 12868 |
| finished RUN jobs | 10818.82 | 873 | 767584 | 55576278 |
| failed RUN jobs | 16042.62 | 1 | 50218 | 770046 |
| Total | 10867.19 | 1 | 767584 | 56346324 |

The 63 failed run jobs that never started are excluded from the wall-time count because their DB entries are wrong.
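A quick check of the wall-time totals suggests that the Total row sums the run jobs only, without the build jobs. The sketch below uses only the figures from the wall-time table and also converts the largest wall time to days for readability; the variable names are illustrative.

```python
# Wall-time totals in seconds, copied from the wall-time table.
wall_totals = {
    "finished build": 999293,
    "failed build": 12868,
    "finished run": 55576278,
    "failed run": 770046,
}
reported_total = 56346324

# Sum of the run rows only, and of all four rows.
run_only = wall_totals["finished run"] + wall_totals["failed run"]
all_rows = sum(wall_totals.values())

print("run rows match reported total:", run_only == reported_total)
print("sum of all four rows:", all_rows)

# Largest single wall time (finished run jobs), expressed in days.
longest_job_days = 767584 / 86400
print("longest job, days:", round(longest_job_days, 2))
```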

Each Pathena job that completes successfully reads one dataset with 21 files and produces 3 datasets, one output directory, 21 root files and 21 log files:
  • One dataset is used for the input files (DSname_shadow) and has no replicas at the end of the job
  • The other 2 datasets contain the same 42 files (21 root files and 21 log files): DSname and DSname_subXXX
  • most of the root files have a similar size (except the last one of the job)
  • log file sizes vary (and are generally smaller)
  • below are statistics both for 1 successfully completed job (1J) and for the whole sample
  • File sizes are always measured in MB (10^6 bytes) unless otherwise specified
  • Estimated total events written: 66210470 (one per event read, excluding failures)

| File type | AVG | min | Max | Total |
| Root files 1J | 63.0 | 38.4 | 68.8 | 1323.1 |
| LOG files 1J | 0.21 | 0.16 | 0.22 | 4.4 |
| Total 1J | 31.6 | 0.16 | 68.8 | 1327.5 |
| Root files | 63.0 | 0 | 68.9 | 322771.3 |
| LOG files | 0.21 | 0 | 0.57 | 1080.5 |
| Total | 30.7 | 0 | 68.9 | 326171.5 |

The input dataset is fdr08_run2.0052283.physics_Jet.merge.AOD.o3_f47_m26:
  • 21 files
  • 37.5 GB
  • 270615 events
  • it has been read by each job (that passed the build phase)
  • total events read: 66841905

The job is not really a skim: the skim ratio is 100% (all events are written to the output).
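The two event totals quoted in this report (66841905 read, 66210470 written) are consistent with the arithmetic below. It is a sketch under two assumptions: the events read are counted once per Pathena job that passed the build phase (247 jobs), and the events written are estimated as the number of finished run jobs times the average number of events per input file.

```python
# Figures taken from the report.
events_per_dataset = 270615
files_per_dataset = 21
pathena_jobs_past_build = 247   # assumed equal to the finished build jobs
finished_run_jobs = 5138

# Every Pathena job that passed the build phase reads the whole dataset.
events_read = pathena_jobs_past_build * events_per_dataset

# With a 100% skim ratio, each finished run job writes the events of one
# input file; the per-file count is approximated by the dataset average.
events_written = round(finished_run_jobs * events_per_dataset / files_per_dataset)

print("events read:", events_read)
print("events written (estimate):", events_written)
```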

Plot from Charles

Nice plot that shows 5000 jobs completing:

marcos-jobs.png

Conclusion

The jobs caused some trouble on the cluster, especially for the gatekeeper and for the NFS server hosting the home directories.

It is not possible to check now whether pathena is abusing the gass cache, since there is no record of the data flow; that has to be checked while the jobs are running. These analysis jobs are not special in any way that sets them apart from others:
  • pathena is staging the pilot and its auxiliary files using Globus gass-cache
  • the jobs use the movers to copy input and output files

Some additional information

-- MarcoMambelli - 26 Nov 2008
| Attachment | Size | Date | Who | Comment |
| failed-jobs-detail.txt | 1 K | 02 Dec 2008 - 00:31 | MarcoMambelli | Failed jobs detail |
| jobs-db-description.txt | 23 K | 02 Dec 2008 - 00:30 | MarcoMambelli | Jobs DB description |
| marcos-jobs.png | 45 K | 02 Dec 2008 - 00:20 | MarcoMambelli | Charles's plot |
| sample-queries-files.txt | 1 K | 02 Dec 2008 - 00:32 | MarcoMambelli | Queries about files (LFC DB) |
| sample-queries-jobs.txt | 8 K | 02 Dec 2008 - 00:30 | MarcoMambelli | Queries about jobs |
Topic revision: r2 - 02 Dec 2008, MarcoMambelli