Grid Client evaluation test

Introduction

This test covers DQ2Client, specifically its end-user tools, and the programs used to transfer files, such as srmcp (Fermi SRM client), srm-copy (LBNL SRM client), globus-url-copy, lcg-cp and ngcp.

All but the last one are included in the wlcg-client package.

Problem

This test evaluates programs used to transfer files. There are 2 kinds of problems encountered when using these programs:
  1. bugs that cause the copy to fail (sometimes limited to a specific client+server combination)
  2. complications. I use this term to refer either to additional requirements (e.g. a correctly populated BDII) or to specific command line options that vary depending on the source or destination URLs or on the hosts involved in the file transfer (sometimes varying over time)

Here I will refer only to the "complications". These are not an indication that the program is wrong, but they make the task of a program invoking it (e.g. dq2-get), or of the user, more difficult.

Some of these problems became more evident when MWT2_UC updated to SRMv2 without adding SE information to the BDII. File copies that were working before stopped working because the base URL of the SE changed.

Test1: Vanilla usability test

Here I'm running a set of clients, each time with a fixed set of options, against different combinations of source and destination URLs. A negative result in the test does not mean that the program is buggy and needs to be fixed; it simply does not work with those parameters for those files. A program that can always be used with the same set of parameters is easier to use than one that requires parameters tuned to the source and destination URLs.

The tests combine gsiftp URLs and different SRM URLs (v1 and v2, with and without the default port number) as sources, with file URLs (3 or 4 slashes), SRM URLs or gsiftp URLs as destinations, so that both 2-party and 3-party transfers are involved. Each individual test has a timeout of 5 min: a test that does not complete within 5 min is considered failed.
srmV2BNL2UC :  SRMv2 copy from BNL to UC srm
   from: srm://dcsrm.usatlas.bnl.gov:8443/srm/managerv2?SFN=/pnfs/usatlas.bnl.gov/ESD01/valid1/ESD/valid1.005200.T1_McAtNlo_Jimmy.recon.ESD.e322_s454_r489_tid023919_sub02072646/ESD.023919._00001.pool.root.1  to: srm://uct2-dc1.uchicago.edu:8443/srm/managerv2?SFN=/pnfs/uchicago.edu/test/cptestmarco-filesrmV2BNL2UC.pool.root

srmV1 :  SRMv1 from Chicago, modified URL
   from: srm://uct2-dc1.uchicago.edu:8443/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root  to: file:///tmp/marco/cptest/file2b.pool.root

srmV1BNL :  SRMv1 copy from BNL
   from: srm://dcsrm.usatlas.bnl.gov:8443/pnfs/usatlas.bnl.gov/ESD01/valid1/ESD/valid1.005200.T1_McAtNlo_Jimmy.recon.ESD.e322_s454_r489_tid023919_sub02072646/ESD.023919._00001.pool.root.1  to: file:///tmp/marco/cptest/file1.pool.root

srmV1np :  SRMv1 from Chicago, no port number in the SRM URL
   from: srm://uct2-dc1.uchicago.edu/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root  to: file:///tmp/marco/cptest/file2c.pool.root

srmV2 :  SRMv2 from Chicago, URLs as stored in the LRC
   from: srm://uct2-dc1.uchicago.edu:8443/srm/managerv2?SFN=/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root  to: file:///tmp/marco/cptest/file2.pool.root

srmV2f4 :  SRMv2 from Chicago, URLs with file:////... (4 slashes)
   from: srm://uct2-dc1.uchicago.edu:8443/srm/managerv2?SFN=/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root  to: file:////tmp/marco/cptest/file2.pool.root

srm3pV1 :  SRMv1 to GSIftp copy, 3 party transfer
   from: srm://uct2-dc1.uchicago.edu:8443/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root  to: gsiftp://uct3-edge1.uchicago.edu/pnfs/uct3/test/cptestmarco-file3.pool.root

gsifp :  gsiftp copy from Chicago (some URLs in the LRC still use gsiftp)
   from: gsiftp://uct2-dc1.uchicago.edu/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root  to: file:///tmp/marco/cptest/file2d.pool.root

srmV1BNL2UC :  SRMv1 copy from BNL to UC srm
   from: srm://dcsrm.usatlas.bnl.gov:8443/pnfs/usatlas.bnl.gov/ESD01/valid1/ESD/valid1.005200.T1_McAtNlo_Jimmy.recon.ESD.e322_s454_r489_tid023919_sub02072646/ESD.023919._00001.pool.root.1  to: srm://uct2-dc1.uchicago.edu:8443/pnfs/uchicago.edu/test/cptestmarco-filesrmV1BNL2UC.pool.root
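The test URLs above differ along a few systematic axes: scheme, SRM URL form (v1 short form vs. v2 form with `?SFN=`), presence of the port, and the number of slashes in file URLs. The following is a small, hypothetical Python helper (the name `describe_url` is illustrative, not part of the test scripts) that classifies a URL along those axes; treating a query starting with `SFN=` as "SRMv2" is an assumption that holds for the URLs used here, not a general SRM rule.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Classify a transfer URL along the dimensions varied in Test1.

    Heuristic (an assumption of this sketch): an SRM URL whose query
    starts with 'SFN=' is the SRMv2 form used in these tests, any other
    SRM URL is the short SRMv1 form.
    """
    parts = urlsplit(url)
    info = {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}
    if parts.scheme == "srm":
        info["srm_version"] = 2 if parts.query.startswith("SFN=") else 1
    elif parts.scheme == "file":
        # Count the slashes after 'file:' (3 = common form, 4 = form srmcp wants).
        rest = url[len("file:"):]
        info["slashes"] = len(rest) - len(rest.lstrip("/"))
    return info


# The srmV1np source: SRMv1 short form, no port number.
print(describe_url("srm://uct2-dc1.uchicago.edu/pnfs/uchicago.edu/data/ddm1/f.root"))
# {'scheme': 'srm', 'host': 'uct2-dc1.uchicago.edu', 'port': None, 'srm_version': 1}
```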

The clients involved and their corresponding command lines are the following (each command line is used unchanged against all the URL combinations):
copy_commands = {
    'globus': 'globus-url-copy',
    'lcg': 'lcg-cp',
    'lcg2': 'lcg-cp -b',
    'lcg3': 'lcg-cp -b -T srmv1',
    'lcg4': 'lcg-cp -b -T srmv2',
    'lcg5': 'lcg-cp -b --vo atlas',
    'nordugrid': 'ngcp -d 2',
    'srmf': 'srmcp',
    'srmf2': 'srmcp -2',
    'srmf3': 'srmcp -1',
    'srmf4': 'srmcp -streams_num=1',
    'srml': 'srm-copy',
    'srmg': 'glite-srm-get',
}
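The actual test driver is the attached copytest.py; the following is not that script, only a minimal hypothetical sketch of the same idea. Each command line is run against each (source, destination) pair with the 5-minute timeout, and the outcome is recorded as "OK" or left empty. The names `run_copy` and `run_matrix` are illustrative, and appending source and destination as the last two arguments is a simplifying assumption about the clients' command lines.

```python
import shlex
import subprocess

TIMEOUT_S = 5 * 60  # a test is considered failed if not completed in 5 minutes


def run_copy(command: str, src: str, dst: str, timeout: float = TIMEOUT_S) -> bool:
    """Run '<command> <src> <dst>'; success means exit status 0 within the timeout."""
    argv = shlex.split(command) + [src, dst]
    try:
        # On timeout, subprocess.run kills the direct child (not its descendants).
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return False  # timed out, or the client is missing/not executable
    return result.returncode == 0


def run_matrix(commands: dict, tests: dict, runner=run_copy) -> dict:
    """Return {client: {test: 'OK' or ''}} for every client/test pair."""
    return {
        client: {name: ("OK" if runner(cmd, src, dst) else "")
                 for name, (src, dst) in tests.items()}
        for client, cmd in commands.items()
    }
```

For example, `run_matrix({"nordugrid": "ngcp -d 2", "srmf2": "srmcp -2"}, {"srmV2": (src_url, dst_url)})` yields one row per client, in the same shape as the result tables below.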

The setup is:
  • source /share/wlcg-client/setup.sh default, an installation of WLCG-Client v0.11
  • export LCG_GFAL_INFOSYS="lcg-bdii.cern.ch:2170" - additionally needed by lcg-cp when it uses the BDII (lcg)
  • source /local/inst2/ng080721/nordugrid-arc-standalone-0.6.3/setup.sh /local/inst2/ng080721/nordugrid-arc-standalone-0.6.3/ - for the NorduGrid client, where the directory is an installation of the NorduGrid standalone client
  • source /share/glite/glite/etc/profile.d/grid-env.sh - for glite-srm-get, from a standalone installation of the LCG WN (worker node)
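Because each client needs its own environment, a driver written in Python has to source the right setup script in the same shell that runs the client. A minimal sketch, assuming the mapping from client key to setup lines below (the `SETUP` table and `with_setup` name are hypothetical, built from the setup list above):

```python
import shlex

WLCG = "source /share/wlcg-client/setup.sh default"
NG = "/local/inst2/ng080721/nordugrid-arc-standalone-0.6.3/"

# Assumed mapping from client key to the setup lines listed above.
SETUP = {
    "default": [WLCG],
    "lcg": [WLCG, "export LCG_GFAL_INFOSYS=lcg-bdii.cern.ch:2170"],
    "nordugrid": [f"source {NG}setup.sh {NG}"],
    "srmg": ["source /share/glite/glite/etc/profile.d/grid-env.sh"],
}


def with_setup(client: str, argv: list) -> list:
    """Build a bash invocation that sources the client's setup, then runs argv."""
    steps = SETUP.get(client, SETUP["default"])
    # shlex.join quotes each argument; SRM URLs contain '?', a shell glob character.
    return ["bash", "-c", " && ".join(steps + [shlex.join(argv)])]
```

Chaining with `&&` means the client is not started at all if sourcing its setup fails, which keeps a broken environment from being reported as a client failure of some other kind.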

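Test results (OK: the copy succeeded within the timeout; empty cell: the test failed):
| Tests     | srmV2BNL2UC | srmV1 | srmV1BNL | srmV1np | srmV2 | srmV2f4 | srm3pV1 | gsifp | srmV1BNL2UC |
| lcg       |             |       |          |         |       |         |         | OK    |             |
| lcg2      |             |       |          |         |       |         |         | OK    |             |
| lcg3      |             |       |          |         |       |         |         | OK    |             |
| lcg4      |             |       |          |         | OK    | OK      |         | OK    |             |
| lcg5      |             |       |          |         |       |         |         | OK    |             |
| globus    |             |       |          |         |       |         |         | OK    |             |
| srmg      |             |       |          |         |       |         |         |       |             |
| srml      |             |       |          |         | OK    | OK      |         | OK    |             |
| srmf      |             |       | OK       |         |       |         | OK      |       |             |
| srmf2     |             |       | OK       |         |       |         | OK      |       |             |
| srmf3     |             |       | OK       |         |       |         | OK      |       |             |
| srmf4     |             |       | OK       |         |       |         | OK      |       |             |
| nordugrid | OK          | OK    | OK       | OK      | OK    | OK      | OK      | OK    | OK          |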

Some comments:
  • The NorduGrid client outperforms all the other clients: ngcp is the only one that succeeds in every test
  • glite-srm-get fails every test, probably because of a configuration problem
  • globus-url-copy fails all tests involving the SRM protocol, as expected (it does not support SRM)

Some problems found on the different clients:
  • srmcp requires file URLs to be specified as file:////path/name (4 slashes) instead of the more common form file:///path/name (3 slashes)
  • srmcp requires the version of the SRM protocol to be specified by hand (-1 or -2); otherwise it defaults to version 1
  • lcg-cp requires the -b option so that it does not look into the BDII. Most US sites are not entered correctly in the BDII, so the lookup causes an error and the copy fails
  • lcg-cp -b (when not using the BDII) requires the exact protocol of both source and destination, and cannot handle gsiftp entries
  • srmcp as installed in VDT-Client makes several attempts to copy a file before giving up, spaced several seconds apart. This gives the user the impression that dq2-get is hanging. It retries even after unrecoverable errors, such as when it does not understand a parameter
  • as visible in the test log, some of the clients left processes hanging well after the end of the test
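One way a test driver could avoid leaving client processes behind is to start each client in its own process group and kill the whole group when the test ends, whether it timed out or not. The following is a minimal POSIX-only sketch of that technique (the name `run_and_reap` is hypothetical); it cannot help with processes that deliberately leave the group, e.g. by calling setsid themselves.

```python
import os
import signal
import subprocess


def run_and_reap(argv: list, timeout: float):
    """Run argv in its own process group; kill the whole group afterwards.

    Returns the exit status, or None if the command timed out.
    POSIX only (uses setsid via start_new_session and os.killpg).
    """
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL, start_new_session=True)
    try:
        status = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        status = None  # treated as a failed test
    try:
        # The group id equals the child's pid; this also reaches helper
        # processes the client spawned and left running.
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # nothing left in the group
    proc.wait()  # reap the client if it was just killed
    return status
```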

Attached is a full log of the test: testwlcgc011.out

cpwrapper.py

I wrote a wrapper for the most common copy commands (mostly srmcp and lcg-cp) that modifies command line parameters and URLs to try to obtain a successful transfer. The Python code is being contributed to DQ2Client and is available in wlcg-client (to provide a solution until the next dq2-client release). The Python file is attached: cpwrapper.py.txt (if you download the file, rename it to cpwrapper.py).
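The real rewriting rules are in the attached cpwrapper.py. Purely as an illustration of the approach, here is a hypothetical sketch (not the wrapper's code) of two srmcp rewrites that follow directly from the problems listed above: converting 3-slash file URLs to the 4-slash form, and adding -2 when an SRM URL is in the v2 form and no version flag was given.

```python
def fix_srmcp_args(args: list) -> list:
    """Hypothetical cpwrapper-style rewrite of srmcp arguments (not the attached code).

    - file:///path becomes file:////path (srmcp wants 4 slashes)
    - '-2' is added when an SRM URL is in the v2 form (...?SFN=...) and no
      version flag was given, since srmcp otherwise defaults to SRMv1
    """
    fixed = []
    for arg in args:
        if arg.startswith("file:///") and not arg.startswith("file:////"):
            arg = "file:/" + arg[len("file:"):]
        fixed.append(arg)
    has_version = "-1" in fixed or "-2" in fixed
    is_v2 = any(a.startswith("srm://") and "?SFN=" in a for a in fixed)
    if is_v2 and not has_version:
        fixed.insert(0, "-2")
    return ["srmcp"] + fixed
```

An explicit -1 or -2 from the caller is left untouched, so the sketch only fills in what is missing rather than overriding the user.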

Test2: test with WLCG-Client 0.12 (and cpwrapper.py)

wlcg-client v0.12 uses cpwrapper.py to 'straighten' the parameters each time srmcp or lcg-cp is invoked. As the new result table shows, the wrapper greatly improves the chance of a successful transfer. There are still some caveats, though, and I still believe that ngcp performs better.

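| Tests     | srmV2BNL2UC | srmV1 | srmV1BNL | srmV1np | srmV2 | srmV2f4 | srm3pV1 | gsifp | srmV1BNL2UC |
| globus    |             |       |          |         |       |         |         | OK    |             |
| srmg      |             |       |          |         |       |         |         |       |             |
| srml      |             |       |          |         | OK    | OK      |         | OK    |             |
| lcg       | OK          | OK    | OK       | OK      | OK    | OK      | OK?     | OK    | OK          |
| lcg2      | OK          | OK    | OK       | OK      | OK    | OK      | OK?     | OK    | OK          |
| lcg3      | OK          | OK    | OK       | OK      | OK    | OK      | OK?     | OK    | OK          |
| lcg4      | OK          | OK    | OK       | OK      | OK    | OK      | OK?     | OK    | OK          |
| lcg5      | OK          | OK    | OK       | OK      | OK    | OK      | OK?     | OK    | OK          |
| srmf      |             | OK    | OK       | OK      | OK    | OK      | OK      | OK    |             |
| srmf2     |             | OK    | OK       | OK      | OK    | OK      | OK      | OK    |             |
| srmf3     | OK?         | OK    | OK       | OK      | OK    | OK      | OK      | OK    |             |
| srmf4     |             | OK    | OK       | OK      | OK    | OK      | OK      | OK    |             |
| nordugrid | OK          | OK    | OK       | OK      | OK    | OK      | OK      | OK    | OK          |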

Notes on Test2 (wlcg-client v0.12 test)
  • I still see hanging srmcp processes after the test completes
  • when checking the output directories, some files that should have been there were missing (these results are marked with a question mark, "OK?")
    • srmcp seems to fail all the 3-party transfers (BNL to Chicago), including the one marked OK?
    • lcg-cp seems to fail the 3-party transfers from MWT2_UC (SRM) to UCT3 (gsiftp)
  • glite-srm-get still fails every test, probably because of a configuration problem
  • the results of srm-copy (LBNL client) are unchanged because no URL/parameter correction has been added for it. It may be added in the future.
  • globus-url-copy fails all tests involving the SRM protocol, as expected (it does not support SRM)
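Since an exit status of 0 turned out not to guarantee that the file was actually written, a test could also check the destination afterwards. For file:// destinations this can be done locally; a minimal sketch follows (the name `local_output_ok` is hypothetical). Checking SRM or gsiftp destinations, such as the 3-party cases marked OK?, would require listing the remote storage element and is not covered here.

```python
import os
from urllib.parse import urlsplit


def local_output_ok(dst_url: str) -> bool:
    """True if a file:// destination exists and is non-empty."""
    if not dst_url.startswith("file:"):
        raise ValueError("only file:// destinations can be checked locally")
    # Normalize to one leading slash so both file:///p and file:////p work.
    path = "/" + urlsplit(dst_url).path.lstrip("/")
    return os.path.isfile(path) and os.path.getsize(path) > 0
```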

The detailed output of the test, testwlcgc012.out, is attached to this page.

Clients' gotchas:

Problems reported for the individual clients.

lcg-cp

Known problems:
  • the VDT version reports an error when querying the BDII
  • when not using the BDII, it requires the full URL (srm://host:port/server_path?SFN=/filepath_with_name) and the SRM version to be specified explicitly on the command line
  • it accepts host certificates (FQDN) but not service certificates (service/FQDN). See the email below
From: Alan Sill 
To: OSG STORAGE 
Cc: sam-support@cern.ch
Subject: lcg-cp does not properly use service certificates; SAM tests of SE fail as a result

I'd like to request that lcg-cp be updated to recognize certificates of the form "service"/"host" (like http/FQDN) rather than its current
apparent restriction of only being able to use host certificates.

This is leading to incorrect reporting of our BeStMan-based SE as down, even though it is up, stable, and working, currently moving data
steadily at a respectable fraction of the total bandwidth available to the university and running all the time:

http://lxarda16.cern.ch/dashboard/request.py/servicehistory?servicename=sigmorgh.hpcc.ttu.edu

Thanks very much for your attention.   Please pass this on to the proper parties or let me know where I should send it if you do not know
who they are (I don't, and never get any responses from sam-support when I send them, rats).

Thanks,
Alan
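For the lcg-cp problem listed above (full URL plus explicit version when the BDII is not used), a short SRMv1-style URL can be expanded into the full form before calling `lcg-cp -b -T srmv2`, the lcg4 command line from Test1. A minimal sketch; the default port 8443 and endpoint path /srm/managerv2 are assumptions taken from the URLs in these tests, not values valid for every SE, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 8443                  # assumption: port of the SEs in these tests
DEFAULT_ENDPOINT = "/srm/managerv2"  # assumption: SRMv2 endpoint path used here


def full_surl(url: str, port: int = DEFAULT_PORT, endpoint: str = DEFAULT_ENDPOINT) -> str:
    """Turn srm://host[:port]/path into srm://host:port/endpoint?SFN=/path."""
    parts = urlsplit(url)
    if parts.scheme != "srm" or parts.query.startswith("SFN="):
        return url  # not SRM, or already in the full form
    return f"srm://{parts.hostname}:{parts.port or port}{endpoint}?SFN={parts.path}"


def lcg_cp_argv(src: str, dst: str) -> list:
    """Build an lcg-cp call that bypasses the BDII (-b) with an explicit SRMv2 source."""
    return ["lcg-cp", "-b", "-T", "srmv2", full_surl(src), dst]
```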

srmcp

Known problems:
  • non-standard file URL with 4 slashes 'file:////path/name'
    • submitted bug to Globus bugzilla:
    • discussion with srmcp developers
  • requires the port number
  • requires the SRM version on the command line; otherwise it defaults to 1 (and fails if the server or the URL is version 2)
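The three srmcp problems above can be checked mechanically before a transfer is attempted. A hypothetical sketch (the name `srmcp_gotchas` is illustrative) that reports which of them a given srmcp argument list would hit, using the same `?SFN=` heuristic for SRMv2 URLs as in the rest of this page:

```python
from urllib.parse import urlsplit


def srmcp_gotchas(argv: list) -> list:
    """Return the known srmcp problems (from the list above) that argv would hit."""
    issues = []
    urls = [a for a in argv if "://" in a]
    for u in urls:
        if u.startswith("file:///") and not u.startswith("file:////"):
            issues.append(f"3-slash file URL: {u}")
        if u.startswith("srm://") and urlsplit(u).port is None:
            issues.append(f"no port number: {u}")
    if any(u.startswith("srm://") and "?SFN=" in u for u in urls) and "-2" not in argv:
        issues.append("SRMv2 URL without -2 (srmcp would default to version 1)")
    return issues
```

For the srmV1np source copied to a 3-slash file URL, this reports both the missing port and the 3-slash file URL.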

-- MarcoMambelli - 25 Jul 2008
Attachments:
  • copytest.py.txt (9 K, 26 Jul 2008 00:15, MarcoMambelli)
  • cpwrapper.py.txt (7 K, 25 Jul 2008 23:19, MarcoMambelli)
  • testwlcgc011.out (237 K, 25 Jul 2008 22:29, MarcoMambelli)
  • testwlcgc012.out (175 K, 26 Jul 2008 00:28, MarcoMambelli)
Topic revision: r8 - 11 Aug 2008, MarcoMambelli