BestmanDcache

Introduction

We're experiencing poor performance from the dCache implementation of SRM, on the order of 2 transactions/sec. According to Pedro @ BNL, even when fully tuned we can expect about 6 transactions/sec. Looking for better performance, we're setting up a trial of BestmanGateway, load balancing between the existing dCache GridFTP doors iut2-dc1 and iut2-dc3. We will write into /pnfs/iu.edu/test, which is an unreserved area.

Installation

I followed the OSG ReleaseDoc for BestmanGateway. My configure command was:
$VDT_LOCATION/vdt/setup/configure_bestman --server y \
 --user daemon \
 --cert /etc/grid-security/bestman/hostcert.pem  \
 --key  /etc/grid-security/bestman/hostkey.pem  \
 --http-port  8080  \
 --https-port  8443  \
 --gums-host  uct2-grid4.uchicago.edu  \
 --gums-port  8443  \
 --enable-gateway  \
 --with-allowed-paths="/pnfs/iu.edu/test/;/pnfs/iu.edu/dzero/;/tmp/"  \
 --with-tokens-list  "ATLASDATADISK[desc:ATLASDATADISK][40000];ATLASPRODDISK[desc:ATLASPRODDISK][30000];ATLASGROUPDISK[desc:ATLASGROUPDISK][30000]"   \
 --with-transfer-servers "gsiftp://iut2-dc1.iu.edu;gsiftp://iut2-dc3.iu.edu"

I am using the config option to check sizes via gridftp rather than the file system, because PNFS does not report size correctly for large files.

Bestman initially won't start because there's no certificate dir. I used a symlink:
[root@iut2-s3 bestman]# ln -s /etc/grid-security/certificates/ /opt/bestman/globus/TRUSTED_CA
srm-ping now works.

srm-copy fails with 'Dir structure does not exist.srm://iut2-s3.iu.edu:8443/srm/v2/server?SFN=/pnfs/iu.edu/test/'. Nothing useful in the log. I created a user 'bestman', re-ran configure, and changed ownership on /etc/grid-security/bestman/*.pem and the log file. Bestman started and wrote the file successfully.

Java version note (added by cgw): the VDT package comes with its own Java, which we don't need since Java is already installed. I've modified the installation slightly to work with the "system" Java 1.6 installed in /usr/bin. Note that the /opt/bestman/setup.sh script contains
LD_LIBRARY_PATH="/opt/bestman/jdk1.5/jre/lib/i386:/opt/bestman/jdk1.5/jre/lib/i386/server:/opt/bestman/jdk1.5/jre/lib/i386/client:$LD_LIBRARY_PATH"
which refers to directories that don't even exist: this is a 64-bit host and there are no 'i386' directories.

I (cgw) commented out the offending lines in setup.{sh,csh} and now it seems to be working just fine with /usr/bin/java and no superfluous environment variables.

Ooops, JAVA_HOME is also hardcoded into all the files in /opt/bestman/bestman/{bin,sbin}. Backed these up to ORIG and made the following mod:

sed -i s,JAVA_HOME=/opt/bestman/jdk1.6,JAVA_HOME=/usr, *
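
The backup-and-replace step could also be scripted, roughly as below. This is only a sketch and not what was actually run: the function name `repoint_java_home` and the `.ORIG` backup suffix are assumptions made for illustration, and only the old and new JAVA_HOME values come from the sed command above. Unlike that sed command, it replaces every occurrence in a file, not just the first one on each line.

```python
import shutil
import tempfile
from pathlib import Path

OLD = "JAVA_HOME=/opt/bestman/jdk1.6"
NEW = "JAVA_HOME=/usr"


def repoint_java_home(directory, old=OLD, new=NEW, suffix=".ORIG"):
    """Replace `old` with `new` in every regular file in `directory`.

    A copy of each modified file is kept next to it with `suffix` appended
    (hypothetical naming). Returns the sorted names of the changed files.
    Files are assumed to be text scripts, as in bin/ and sbin/.
    """
    changed = []
    for path in sorted(Path(directory).iterdir()):
        # Skip subdirectories and backups left by an earlier run.
        if not path.is_file() or path.name.endswith(suffix):
            continue
        text = path.read_text()
        if old not in text:
            continue
        shutil.copy2(path, path.with_name(path.name + suffix))
        path.write_text(text.replace(old, new))
        changed.append(path.name)
    return changed


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than the real install.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "bestman.server"
        script.write_text("#!/bin/sh\nJAVA_HOME=/opt/bestman/jdk1.6\n")
        print(repoint_java_home(tmp))
```

Running it a second time changes nothing, because files that no longer contain the old value are skipped and the backups are ignored.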

Test 1

Test 1 was done using a home-brew script, which timed lcg-cp into storage with a space token. My method was to use 30 worker nodes, each doing 10 sequential writes of a 1-byte file. I ran the BestMan and dCache tests simultaneously, and both SRM gateways are running on the same node (iut2-dc1). The results were:

  • dCache: MAX: 90.69 MIN: 12.47 AVERAGE: 54.23 seconds
  • Bestman: MAX: 89.47 MIN: 3.38 AVERAGE: 53.51 seconds
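
For reference, the per-node part of this method (sequential runs, each timed, then MAX/MIN/AVERAGE) can be sketched as follows. This is not the original home-brew script. `time_command` and `summarize` are hypothetical names, and the actual lcg-cp command line is left to the caller because its arguments are not recorded here.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=10):
    """Run `argv` `repeats` times in sequence; return wall-clock seconds per run."""
    durations = []
    for _ in range(repeats):
        start = time.monotonic()
        # Exit status is ignored here; a real test would also record failures.
        subprocess.run(argv, check=False, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        durations.append(time.monotonic() - start)
    return durations


def summarize(durations):
    """Format timings in the same MAX/MIN/AVERAGE style as the results above."""
    average = sum(durations) / len(durations)
    return "MAX: %.2f MIN: %.2f AVERAGE: %.2f seconds" % (
        max(durations), min(durations), average)


if __name__ == "__main__":
    # Stand-in command; on a worker node this would be the lcg-cp invocation.
    print(summarize(time_command([sys.executable, "-c", "pass"], repeats=3)))
```

Timing each run separately, rather than the whole batch, is what makes the MIN and MAX figures available alongside the average.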

Tuning

Does Bestman cache DN/username mappings? Brian B says yes.

From Alex Sim via Dan:
ok, Alex was saying that with a dedicated quad node / 4 GB, talking through GUMS, you might expect 20 transactions/sec. He was not sure if you would get a speed up by putting bestman on dCache. GUMS slows this down quite a bit, probably by a factor of 2.

Brian Bockelman says:
Are you trying transfers or SRM operations? I would recommend benchmarking them separately, starting with srmLs (by far, one of the most common SRM operations). I also recommend using lcg-ls and lcg-cp in your tests.

In $VDT_LOCATION/bestman/conf/server-config.wsdd, replace

  <parameter name="containerThreads" value="5"/>

with

  <parameter name="containerThreads" value="1000"/>

or some other large number.

Michael Thomas says:
When you increase the number of containerThreads, you'll also need to increase the java heap size in bestman.server. We run with 512 container threads and 4GB of java heap for our bestman server. With 512 threads, I found that even 2GB was not enough.
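
If the containerThreads change needs to be repeated (for example after re-running configure), it could be scripted along these lines. This is a sketch under assumptions: `set_container_threads` is a hypothetical helper, and it assumes the parameter appears exactly in the form shown above, with `name` before `value`. It works on the text rather than parsing the XML so that the rest of the file is left byte-for-byte unchanged. The Java heap size still has to be raised separately.

```python
import re

# Matches the parameter line quoted above; attribute order is an assumption.
PARAM = re.compile(r'(<parameter\s+name="containerThreads"\s+value=")\d+(")')


def set_container_threads(wsdd_text, threads):
    """Return `wsdd_text` with the containerThreads value replaced by `threads`.

    Raises ValueError if the parameter is not present, so a silent no-op
    cannot be mistaken for a successful edit.
    """
    new_text, count = PARAM.subn(r"\g<1>%d\g<2>" % threads, wsdd_text)
    if count == 0:
        raise ValueError("containerThreads parameter not found")
    return new_text


if __name__ == "__main__":
    sample = '  <parameter name="containerThreads" value="5"/>\n'
    print(set_container_threads(sample, 512), end="")
```

To apply it to the real file, read server-config.wsdd into a string, pass it through the function, and write the result back after keeping a copy of the original.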

Test 2

I tried boosting containerThreads to 256 and the heap to 2G. Then I tested with 60 writers. Results this time are more encouraging:
  • Bestman: MAX: 255.63 MIN: 0.02 AVERAGE: 98.9934918032786
  • dCache: MAX: 405.60 MIN: 0.02 AVERAGE: 146.67931147541

Test 3

Test 2 was done using Brian Bockelman's srm-punch script, found at svn://t2.unl.edu/brian/se_testkit.

At the meeting this week, Alex asked for some tests using Brian's utility to get measurements in hertz. To get this measurement from dCache, I needed to turn up the loglevel in axis-trace.log to ALL.

Using srm_punch.py, I ran 7 threads, 100 iterations each, of srmLs. The results were 3 Hz with the dCache SRM and 28 Hz with the BeStMan SRM.
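
For clarity on what a rate in hertz means here, the sketch below computes it as completed operations divided by wall-clock time across all threads. It is not srm_punch.py itself, and whether that script computes its figure the same way is an assumption. `measure_hz` and the `operation` callable are hypothetical names, and in a real run `operation` would issue one srmLs.

```python
import threading
import time


def measure_hz(operation, threads=7, iterations=100):
    """Call `operation` `iterations` times in each of `threads` threads.

    Returns (completed_calls, calls_per_second), where the rate is the total
    number of completed calls divided by the wall-clock time of the whole run.
    """
    completed = []
    lock = threading.Lock()

    def worker():
        for _ in range(iterations):
            operation()
            with lock:
                completed.append(1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    start = time.monotonic()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    # Guard against a zero reading from a coarse clock on very short runs.
    elapsed = max(time.monotonic() - start, 1e-9)
    return len(completed), len(completed) / elapsed


if __name__ == "__main__":
    # Stand-in for an srmLs call: sleep 10 ms per operation.
    total, hz = measure_hz(lambda: time.sleep(0.01), threads=7, iterations=20)
    print("%d operations at %.1f Hz" % (total, hz))
```

Under that reading, 7 threads of 100 iterations is 700 operations, so 28 Hz would correspond to a run of about 25 seconds and 3 Hz to roughly 233 seconds.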

Further work

Looking at the axis-trace log, I think this was probably not a fair test. The dCache SRM is in production, so it is doing a variety of operations, while BeStMan only runs srmLs from the testing script.

What I'd like to do is use the axis-trace log to create a simulation of our production load. I've put a sample from that log here: http://www.mwt2.org/~sarah/axis-trace.log

I have an idea that some of the operations should be combined - I think lcg-cp does GetSpaceTokens, PrepareToPut, StatusOfPutRequest, and PutDone. srmLs, Rm, Mkdir and ping seem self-explanatory. I'm not sure what produces ReleaseFiles, setFileStatus, or srmGetSpaceMetaData.
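
A first step toward simulating the production load could be to tally how often each operation shows up in the log. The sketch below is only a starting point: the log line format is not reproduced here, so it assumes nothing more than that the operation name appears in the line as a whole word, spelled as in the paragraph above. `operation_mix` and `as_fractions` are hypothetical names.

```python
import re
from collections import Counter

# Operation names as written in the notes above; whether the log uses exactly
# these spellings is an assumption.
OPERATIONS = [
    "GetSpaceTokens", "PrepareToPut", "StatusOfPutRequest", "PutDone",
    "srmLs", "Rm", "Mkdir", "ping",
    "ReleaseFiles", "setFileStatus", "srmGetSpaceMetaData",
]


def operation_mix(lines, operations=OPERATIONS):
    """Count log lines mentioning an operation name as a whole word.

    Only the first matching name on each line is counted.
    """
    pattern = re.compile(r"\b(%s)\b" % "|".join(map(re.escape, operations)))
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def as_fractions(counts):
    """Convert raw counts into each operation's share of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


if __name__ == "__main__":
    sample = ["call srmLs /pnfs/iu.edu/test", "call PutDone", "call srmLs /tmp"]
    print(as_fractions(operation_mix(sample)))
```

Passing an open file object (for example the axis-trace.log sample) as `lines` works, since the function only iterates over lines. The resulting fractions could then be used as weights when choosing which operation each test thread issues next.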

References

Logs & Configuration File Locations

Module: BeStMan
  • Configuration files: $VDT_LOCATION/bestman/conf/bestman.rc, $VDT_LOCATION/bestman/conf/server-config.wsdd
  • Log files: $VDT_LOCATION/vdt-app-data/bestman/logs/event.srm.log

Module: GridFTP
  • Configuration files: $VDT_LOCATION/vdt/services/vdt-run-gsiftp.sh.env
  • Log files: $VDT_LOCATION/globus/var/log/gridftp.log, $VDT_LOCATION/globus/var/log/gridftp-auth.log

-- SarahWilliams - 01 Sep 2009

Topic revision: 03 Sep 2009, SarahWilliams