Grid Client evaluation test
Introduction
This test involves DQ2Client, specifically the end-user tools, and the programs used to transfer files, such as srmcp (Fermi SRM client), srm-copy (LBNL SRM client), globus-url-copy, lcg-cp and ngcp.
All but the last one are included in the wlcg-client package.
Problem
Evaluating programs used to transfer files.
There are 2 kinds of problems encountered using these programs:
- bugs that cause the copy to fail (sometimes limited to a specific client+server combination)
- complications. I use this term to refer either to additional requirements (e.g. a correctly populated BDII) or to specific command line options that vary depending on the source or destination URLs or on the hosts involved in the file transfer (sometimes varying over time)
Here I will discuss only the "complications". They are not an indication that the program is wrong, but they make the task more difficult for the user or for a program invoking them (e.g. dq2-get).
Some of these problems became more evident when MWT2_UC updated to SRMv2 without adding SE information to the BDII.
File copies that were working before stopped working because the base URL of the SE changed.
Test1: Vanilla usability test
Here I'm running a set of clients, each time with a fixed set of options, against different combinations of source and destination URLs.
A negative result in the test does not mean that the program is buggy and needs to be fixed; it simply does not work with those parameters for those files.
A program that can always be used with the same set of parameters is easier to use than one that requires parameters tuned to the source and destination URLs.
Tests involve gsiftp URLs and different SRM URLs (with and without the default port number, v1 and v2), copied to a file URL (with 3 or 4 slashes) or to another storage element, in different combinations, so as to involve both 2-party and 3-party transfers. Each individual test has a timeout of 5 minutes (a test is considered failed if it does not complete within 5 minutes).
srmV2BNL2UC : SRMv2 copy from BNL to UC srm
from: srm://dcsrm.usatlas.bnl.gov:8443/srm/managerv2?SFN=/pnfs/usatlas.bnl.gov/ESD01/valid1/ESD/valid1.005200.T1_McAtNlo_Jimmy.recon.ESD.e322_s454_r489_tid023919_sub02072646/ESD.023919._00001.pool.root.1 to: srm://uct2-dc1.uchicago.edu:8443/srm/managerv2?SFN=/pnfs/uchicago.edu/test/cptestmarco-filesrmV2BNL2UC.pool.root
srmV1 : SRMv1 from Chicago, modified URL
from: srm://uct2-dc1.uchicago.edu:8443/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root to: file:///tmp/marco/cptest/file2b.pool.root
srmV1BNL : SRMv1 copy from BNL
from: srm://dcsrm.usatlas.bnl.gov:8443/pnfs/usatlas.bnl.gov/ESD01/valid1/ESD/valid1.005200.T1_McAtNlo_Jimmy.recon.ESD.e322_s454_r489_tid023919_sub02072646/ESD.023919._00001.pool.root.1 to: file:///tmp/marco/cptest/file1.pool.root
srmV1np : SRMv1 from Chicago, no port # in the SRM URL
from: srm://uct2-dc1.uchicago.edu/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root to: file:///tmp/marco/cptest/file2c.pool.root
srmV2 : SRMv2 from Chicago, URLs as found in the LRC
from: srm://uct2-dc1.uchicago.edu:8443/srm/managerv2?SFN=/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root to: file:///tmp/marco/cptest/file2.pool.root
srmV2f4 : SRMv2 from Chicago, URLs with file:////... (4 slashes)
from: srm://uct2-dc1.uchicago.edu:8443/srm/managerv2?SFN=/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root to: file:////tmp/marco/cptest/file2.pool.root
srm3pV1 : SRMv1 to GSIftp copy, 3 party transfer
from: srm://uct2-dc1.uchicago.edu:8443/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root to: gsiftp://uct3-edge1.uchicago.edu/pnfs/uct3/test/cptestmarco-file3.pool.root
gsifp : gsiftp copy from Chicago (some URLs in the LRC still use gsiftp)
from: gsiftp://uct2-dc1.uchicago.edu/pnfs/uchicago.edu/data/ddm1/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2_sub02082889/user.RikutaroYoshida.test.22July.fdr2.0052304.Egamma.mwt2.AANT._00001.root to: file:///tmp/marco/cptest/file2d.pool.root
srmV1BNL2UC : SRMv1 copy from BNL to UC srm
from: srm://dcsrm.usatlas.bnl.gov:8443/pnfs/usatlas.bnl.gov/ESD01/valid1/ESD/valid1.005200.T1_McAtNlo_Jimmy.recon.ESD.e322_s454_r489_tid023919_sub02072646/ESD.023919._00001.pool.root.1 to: srm://uct2-dc1.uchicago.edu:8443/pnfs/uchicago.edu/test/cptestmarco-filesrmV1BNL2UC.pool.root
The clients involved are the following, with the corresponding command lines (the same command line is used against all the different URLs):
copy_commands:
'globus': 'globus-url-copy'
'lcg': 'lcg-cp'
'lcg2': 'lcg-cp -b'
'lcg3': 'lcg-cp -b -T srmv1'
'lcg4': 'lcg-cp -b -T srmv2'
'lcg5': 'lcg-cp -b --vo atlas'
'nordugrid': 'ngcp -d 2'
'srmf': 'srmcp'
'srmf2': 'srmcp -2'
'srmf3': 'srmcp -1'
'srmf4': 'srmcp -streams_num=1'
'srml': 'srm-copy'
'srmg': 'glite-srm-get'
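The Test1 procedure (each client command line run against each source/destination pair, with a 5 minute timeout per test) can be sketched in Python as follows. This is a minimal sketch, not the script actually used for the test: run_copy and run_matrix are hypothetical names, only three of the command lines above are copied in, and a test is assumed to pass when the client exits with status 0 before the timeout.

```python
import shlex
import subprocess

# A few of the command lines listed above (client label -> command line).
COPY_COMMANDS = {
    "lcg2": "lcg-cp -b",
    "srmf2": "srmcp -2",
    "nordugrid": "ngcp -d 2",
}

TIMEOUT_S = 5 * 60  # a test fails if it does not complete within 5 minutes


def run_copy(command, src, dst, timeout=TIMEOUT_S):
    """Run one copy and return True only if it exits with status 0 in time.

    Assumption: exit status 0 is taken as success (the output is not checked).
    """
    argv = shlex.split(command) + [src, dst]
    try:
        result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # subprocess.run kills the direct child on timeout
    except FileNotFoundError:
        return False  # client not installed or not in PATH
    return result.returncode == 0


def run_matrix(commands, tests, timeout=TIMEOUT_S):
    """Run every client against every (test_name, src, dst) tuple."""
    results = {}
    for client, command in commands.items():
        for test_name, src, dst in tests:
            results[(client, test_name)] = run_copy(command, src, dst, timeout)
    return results
```

Here tests is assumed to be a list of (name, source URL, destination URL) tuples taken from the test list above, e.g. one tuple per entry such as srmV1BNL.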
The setup is:
- source /share/wlcg-client/setup.sh : default, an installation of WLCG-Client v0.11
- export LCG_GFAL_INFOSYS="lcg-bdii.cern.ch:2170" : additional setting for lcg-cp when using the BDII (the 'lcg' command line, i.e. lcg-cp without -b)
- source /local/inst2/ng080721/nordugrid-arc-standalone-0.6.3/setup.sh /local/inst2/ng080721/nordugrid-arc-standalone-0.6.3/ : for the NorduGrid client, where the directory in question is an installation of the NG standalone client
- source /share/glite/glite/etc/profile.d/grid-env.sh : for glite-srm-get, a standalone installation of the LCG WN
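Each client runs in the environment prepared by its setup script. Since a setup script has to be sourced by a shell, a driver written in Python could wrap each command in bash -c, roughly as below. This is a sketch under assumptions: bash is available, build_shell_command and run_with_setup are hypothetical helpers, and extra variables such as LCG_GFAL_INFOSYS are passed through the environment.

```python
import os
import shlex
import subprocess


def build_shell_command(setup_scripts, argv):
    """Return a bash -c argv that sources each setup script, then execs argv.

    setup_scripts is a list of [script, arg, ...] lists, because some setup
    scripts (e.g. the NorduGrid one) take arguments.
    """
    steps = ["source " + " ".join(shlex.quote(word) for word in script)
             for script in setup_scripts]
    steps.append("exec " + " ".join(shlex.quote(word) for word in argv))
    return ["bash", "-c", " && ".join(steps)]


def run_with_setup(setup_scripts, argv, extra_env=None, timeout=300):
    """Run argv inside the sourced environment and return its exit status."""
    env = dict(os.environ)
    env.update(extra_env or {})
    result = subprocess.run(build_shell_command(setup_scripts, argv),
                            env=env, timeout=timeout)
    return result.returncode
```

For the 'lcg' command line this would be something like run_with_setup([["/share/wlcg-client/setup.sh"]], ["lcg-cp", src, dst], extra_env={"LCG_GFAL_INFOSYS": "lcg-bdii.cern.ch:2170"}).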
Test results are:
Some comments:
- The NorduGrid client outperforms all the other clients
- glite-srm-get probably has a configuration problem
- globus-url-copy fails all tests involving the SRM protocol, as expected (SRM is unsupported)
Some problems found on the different clients:
- srmcp requires file URLs to be specified as file:////path/name (4 slashes) instead of the more common form file:///path/name (3 slashes)
- srmcp requires the SRM protocol version to be specified by hand (-1 or -2); otherwise it defaults to version 1
- lcg-cp requires the -b option so that it does not look up the BDII. Most of the US sites are not entered correctly in the BDII, and the lookup causes an error and the failure of the copy
- lcg-cp -b (when not using the BDII) requires the exact protocol of both source and destination and cannot handle gsiftp entries
- srmcp as installed in VDT-Client makes several attempts to copy a file before giving up, spaced several seconds apart. This gives the user the impression that dq2-get is hanging. It also retries on errors that cannot be recovered, e.g. when it does not understand a parameter
- as visible in the test log, some of the clients left processes hanging well after the end of the test
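One way a test driver (or dq2-get) could avoid leaving client processes behind is to start each client in its own process group and kill the whole group when the client exits or times out. A minimal POSIX-only sketch, assuming the leftover processes stay in the client's process group (a process that calls setsid itself would escape it); run_and_reap is a hypothetical name.

```python
import os
import signal
import subprocess


def run_and_reap(argv, timeout):
    """Run argv in a new session and kill its whole process group at the end.

    Returns the exit status, or None if the timeout expired. POSIX only.
    Output is discarded so that a leftover child holding a pipe open cannot
    block the wait.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL, start_new_session=True)
    try:
        status = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        status = None
    # With start_new_session=True the child leads a new process group whose
    # id equals its pid; kill whatever is still alive in that group.
    try:
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # nothing left in the group
    if status is None:
        proc.wait()  # reap the killed client
    return status
```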
Attached is a full log of the test:
testwlcgc011.out
cpwrapper.py
I wrote a wrapper for the most common copy commands (mostly srmcp and lcg-cp) that modifies the command line parameters and the URLs to try to obtain a successful transfer.
The Python code is being contributed to DQ2Client and is available in wlcg-client (to provide a solution until the next dq2-client release).
The Python file is attached: cpwrapper.py.txt (if you download the file, rename it to cpwrapper.py).
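A few of the srmcp corrections described above can be sketched as follows. This is not the code of cpwrapper.py (which is attached); it is an illustration under assumptions: srmcp_argv, to_srmcp_file_url and guess_srm_version are hypothetical names, and the SRM version is guessed from the presence of the /srm/managerv2 endpoint in the URL, as in the SRMv2 URLs used in the tests.

```python
def to_srmcp_file_url(url):
    """Rewrite file:///path (3 slashes) into the file:////path form srmcp expects."""
    if url.startswith("file:///") and not url.startswith("file:////"):
        return "file:////" + url[len("file:///"):]
    return url


def guess_srm_version(url):
    """Guess the SRM version of a URL: 2, 1, or None for non-SRM URLs.

    Assumption: SRMv2 URLs are the ones pointing at the managerv2 endpoint.
    """
    if not url.startswith("srm://"):
        return None
    return 2 if "/srm/managerv2" in url else 1


def srmcp_argv(src, dst):
    """Build an srmcp command line with an explicit -1/-2 version flag."""
    versions = {guess_srm_version(src), guess_srm_version(dst)}
    argv = ["srmcp"]
    if 2 in versions:
        argv.append("-2")  # srmcp would otherwise default to version 1
    elif 1 in versions:
        argv.append("-1")
    # No SRM URL involved: leave the srmcp default alone.
    return argv + [to_srmcp_file_url(src), to_srmcp_file_url(dst)]
```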
Test2: test with WLCG-Client 0.12 (and cpwrapper.py)
wlcg-client v0.12 uses cpwrapper.py to 'straighten' the parameters each time srmcp or lcg-cp is invoked.
As you can see from the new result table, the wrappers greatly improved the likelihood of a successful transfer.
Nevertheless, there are still some remarks, and I still believe that ngcp performs better.
| Tests | srmV2BNL2UC | srmV1 | srmV1BNL | srmV1np | srmV2 | srmV2f4 | srm3pV1 | gsifp | srmV1BNL2UC |
| lcg | OK | OK | OK | OK | OK | OK | OK? | OK | OK |
| lcg2 | OK | OK | OK | OK | OK | OK | OK? | OK | OK |
| lcg3 | OK | OK | OK | OK | OK | OK | OK? | OK | OK |
| lcg4 | OK | OK | OK | OK | OK | OK | OK? | OK | OK |
| lcg5 | OK | OK | OK | OK | OK | OK | OK? | OK | OK |
| globus | | | | | | | | OK | |
| srmg | | | | | | | | | |
| srml | | | | | OK | OK | | OK | |
| srmf | | OK | OK | OK | OK | OK | OK | OK | |
| srmf2 | | OK | OK | OK | OK | OK | OK | OK | |
| srmf3 | OK? | OK | OK | OK | OK | OK | OK | OK | |
| srmf4 | | OK | OK | OK | OK | OK | OK | OK | |
| nordugrid | OK | OK | OK | OK | OK | OK | OK | OK | OK |
Notes on Test2 (wlcg-client v0.12 test)
- I still see hanging srmcp processes after the test completes
- checking the output directories, some files that should have been there are missing (these are marked with a question mark, "OK?", in the results)
- srmcp seems to fail all the 3rd party transfers (BNL to Chicago), including the one marked "OK?"
- lcg-cp seems to fail the 3rd party transfers from MWT2_UC (SRM) to UCT3 (gsiftp)
- glite-srm-get probably has a configuration problem
- the results of srm-copy (LBNL client) are the same as in Test1 because no URL/parameter correction has been added for it. It may be added in the future.
- globus-url-copy fails all tests involving the SRM protocol, as expected (SRM is unsupported)
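Since some transfers reported as successful left no file in the output directory, checking the destination after each copy helps separate "OK" from "OK?". A minimal sketch, assuming the destination is a local file URL (with 3 or 4 slashes); remote destinations are not checked, and classify and local_path are hypothetical names.

```python
import os


def local_path(file_url):
    """Map file:///path or file:////path to the local path /path."""
    if not file_url.startswith("file:"):
        raise ValueError("not a file URL: " + file_url)
    return "/" + file_url[len("file:"):].lstrip("/")


def classify(returncode, dst):
    """Label one transfer: 'FAIL', 'OK', 'OK?' or 'OK (unchecked)'."""
    if returncode != 0:
        return "FAIL"
    if not dst.startswith("file:"):
        # Remote destinations (srm://, gsiftp://) would need a listing done
        # with one of the clients; this sketch does not check them.
        return "OK (unchecked)"
    path = local_path(dst)
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return "OK"
    return "OK?"  # reported success, but the file is missing or empty
```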
A detailed output of the test, testwlcgc012.out, is attached to this page.
Client gotchas:
Problems reported for the clients.
lcg-cp
Known problems:
- the VDT version reports an error when querying the BDII
- when not using the BDII, it requires the full URL (srm://host:port/server_path?SFN=/filepath_with_name) and the SRM version explicitly specified on the command line
- it accepts host certificates (host/FQDN) but not service certificates (service/FQDN). See the email below
From: Alan Sill
To: OSG STORAGE
Cc: sam-support@cern.ch
Subject: lcg-cp does not properly use service certificates; SAM tests of SE fail as a result
I'd like to request that lcg-cp be updated to recognize certificates of the form "service"/"host" (like http/FQDN) rather than its current
apparent restriction of only being able to use host certificates.
This is leading to incorrect reporting of our BeStMan-based SE as down, even though it is up, stable, and working, currently moving data
steadily at a respectable fraction of the total bandwidth available to the university and running al the time:
http://lxarda16.cern.ch/dashboard/request.py/servicehistory?servicename=sigmorgh.hpcc.ttu.edu
Thanks very much for your attention. Please pass this on to the proper parties or let me know where I should send it if you do not know
who they are (I don't, and never get any responses from sam-support when I send them, rats).
Thanks,
Alan
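When the BDII is not used, lcg-cp needs the full SRM URL. A sketch of expanding a short SRM URL into the srm://host:port/server_path?SFN=/filepath_with_name form, assuming (as for the MWT2_UC and BNL URLs in the tests) that the SE serves SRMv2 at /srm/managerv2 on port 8443; full_srm_url and lcg_cp_argv are hypothetical names, and the -b -T srmv2 options are copied from the 'lcg4' command line above. gsiftp URLs are passed through unchanged, so they remain a problem for lcg-cp -b.

```python
from urllib.parse import urlsplit


def full_srm_url(url, default_port=8443, endpoint="/srm/managerv2"):
    """Expand srm://host[:port]/path into srm://host:port<endpoint>?SFN=/path.

    URLs that are not SRM, or already carry ?SFN=, are returned unchanged.
    Assumption: the SE serves SRMv2 at `endpoint` on `default_port`.
    """
    if not url.startswith("srm://") or "?SFN=" in url:
        return url
    parts = urlsplit(url)
    port = parts.port or default_port
    return f"srm://{parts.hostname}:{port}{endpoint}?SFN={parts.path}"


def lcg_cp_argv(src, dst):
    """lcg-cp without the BDII (-b) and with the SRM version given explicitly."""
    return ["lcg-cp", "-b", "-T", "srmv2", full_srm_url(src), full_srm_url(dst)]
```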
srmcp
Known problems:
- non-standard file URL with 4 slashes 'file:////path/name'
- submitted bug to Globus bugzilla:
- discussion with srmcp developers
- requires port number
- requires the SRM version on the command line; otherwise it defaults to 1 (and fails if the server or URL is version 2)
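The missing port number (the srmV1np case) can be filled in before calling srmcp. A sketch assuming 8443 is the default SRM port, as in the test URLs; with_srm_port is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def with_srm_port(url, default_port=8443):
    """Add default_port to an SRM URL that has no port; leave others unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "srm" or parts.port is not None:
        return url
    # Rebuilding netloc from hostname drops any user info (not used for SRM).
    return urlunsplit(parts._replace(netloc=f"{parts.hostname}:{default_port}"))
```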
--
MarcoMambelli - 25 Jul 2008