Renewing grid cert, verify ATLAS membership status
Hello,
here is the page of the VOMRS server at CERN:
https://lcg-voms.cern.ch:8443/vo/atlas/vomrs
You need your certificate loaded into the browser in order to open this
page; you will see your identity in red at the bottom of the page.
To verify all the details of your registration, click on "Member
info" in the menu on the left.
One of the details shown is the expiration date of your VO (ATLAS) membership.
If you click on the [+] next to Member info, you will see another menu item,
"Re-sign Grid and VO AUPs", that lets you sign the ATLAS User Policies to
renew your membership.
Other links about grid certificates that may be useful:
https://www.racf.bnl.gov/docs/howto/grid/voatlas
https://www.racf.bnl.gov/docs/howto/grid/renewcert
https://www.racf.bnl.gov/docs/howto/grid/installcert
Cheers,
Marco
VDT's Grid-monitor
Sarah, all,
I think both samgrid and atlas use the grid-monitor to submit jobs, at least normally.
Jaime can correct me if I'm wrong in the explanation that follows.
The use of the grid-monitor is initiated by the client. Condor-G has it enabled by default in the VDT installation.
When the client uses the grid-monitor, a grid-monitor process is started on the server, and that process then handles GRAM job submission.
The server has to be "grid-monitor enabled" and have a grid-monitor process running in order to use it.
If this process is not there, it is restarted.
If the restart fails, Condor-G falls back to normal GRAM submission.
All this (grid-monitor communication, restart, fallback to normal submission, ...) would be visible in the GridManager log file on the
submit host.
My suspicion is that, because of the load on the gatekeeper (due to a steady increase in activity and the current configuration: timeouts
and queue length) or for some other reason (network load? ...), the grid-monitor failed to restart, or at least the client failed to
acknowledge the presence of the grid-monitor and started sending regular GRAM jobs, increasing the load on the gatekeeper even more.
This continued until a decrease in activity brought the load back to normal and the now responsive gatekeeper was able to start a grid-monitor
(Condor-G periodically retries the grid-monitor restart) and behave normally again.
To find out more, we should match the Condor-G GridManager and schedd log files against the gatekeeper's log files and its load, and
build a common timeline.
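A minimal sketch of how such a common timeline could be assembled, assuming each log line starts with a timestamp in the form "YYYY-MM-DD HH:MM:SS" and that each file is already in chronological order. The timestamp format, the function names and the log names are illustrative assumptions; the real GridManager, schedd and gatekeeper logs each use their own format and would need their own parser.

```python
import heapq
from datetime import datetime

# Assumed timestamp layout at the start of every line; adapt per log file.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LEN = 19  # length of a timestamp written in TS_FORMAT


def tagged_events(source, lines):
    """Yield (timestamp, source, message) for each line with a parsable timestamp."""
    for line in lines:
        line = line.rstrip("\n")
        try:
            stamp = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # skip continuation lines that carry no timestamp
        yield stamp, source, line[TS_LEN:].strip()


def build_timeline(logs):
    """Merge several logs (name -> iterable of lines) into one time-ordered list.

    Each individual log is assumed to be chronologically sorted already,
    which is what heapq.merge requires.
    """
    streams = [tagged_events(name, lines) for name, lines in logs.items()]
    return list(heapq.merge(*streams))
```

Passing open file objects as the values of `logs` works as well as lists of strings. Gatekeeper load samples could be added as one more stream as long as they carry a timestamp in the same format.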
If we want to embark on this we need the collaboration of the people managing the submit hosts (Torre or Dantong at BNL for usatlas?
someone for samgrid?) to get the log files, and maybe to involve someone from VDT (Jaime?).
I'm willing to do the work (comparing log files, building a timeline) or to help whoever does it.
Cheers,
Marco
Authentication failed (No valid credentials)
"No valid credentials" may refer to any of the following:
a) locally skewed clock
b) locally outdated CA keys
c) locally outdated CRLs
d) locally misplaced credentials
e) locally missing environment variables
or anything else - "send2nsd" method comes from some gLite library.
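Items (d) and (e) can be checked quickly on the client side. The sketch below is only an illustration: it assumes that the standard Globus variables X509_USER_PROXY and X509_CERT_DIR are the relevant ones, which may not cover everything the failing client looks at, and the function name is made up for this example.

```python
import os


def credential_report(env, path_exists=os.path.exists):
    """Return a list of problems found in the credential-related environment.

    'env' is a mapping such as os.environ. Only two variables are checked;
    this list is an assumption, not a complete diagnosis.
    """
    problems = []
    for var in ("X509_USER_PROXY", "X509_CERT_DIR"):
        value = env.get(var)
        if not value:
            # An unset variable is not always an error: a default location may be used.
            problems.append(f"{var} is not set (a default location may be used)")
        elif not path_exists(value):
            problems.append(f"{var} points to a missing path: {value}")
    return problems
```

Calling `credential_report(os.environ)` on the submitting machine prints nothing alarming when both variables point to existing paths. Clock skew (a) and outdated CA files (b) are not covered here.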
Error: Could not establish authenticated connection with the server.
GSS Major Status: Authentication Failed
...
globus_gsi_callback.c:globus_i_gsi_callback_check_revoked:1158:
Invalid CRL: The available CRL has expired
The certificate directories contain CRLs (Certificate Revocation Lists) that need to be kept up to date.
The script fetch-crl/sbin/fetch-crl should be run periodically (from cron) to update the CRLs.
Check whether running it fixes the problem.
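To see whether the periodic update is actually happening, one rough check is to look for CRL files that have not been rewritten recently. The sketch below assumes the CRLs are stored as *.r0 files in the certificate directory and uses the file modification time as a heuristic; the 24-hour threshold and the function name are arbitrary choices for illustration. The authoritative expiry is the "next update" field inside each CRL, which `openssl crl -noout -nextupdate -in <file>` prints.

```python
import glob
import os
import time


def stale_crls(cert_dir, max_age_hours=24.0, now=None):
    """Return the names of *.r0 files in cert_dir not modified within max_age_hours.

    Modification time is only a heuristic for "fetch-crl has not run lately";
    it does not read the expiry date stored inside the CRL.
    """
    if now is None:
        now = time.time()
    limit = max_age_hours * 3600
    stale = []
    for path in sorted(glob.glob(os.path.join(cert_dir, "*.r0"))):
        if now - os.path.getmtime(path) > limit:
            stale.append(os.path.basename(path))
    return stale
```

For example, `stale_crls("/etc/grid-security/certificates")` would list old CRL files on a host that keeps its CA files in that common location; the path depends on the installation.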
Dataset replication
If you want data moved, you need to submit a replication request. You can see the available
pathena queues at:
http://panda.cern.ch:25880/server/pandamon/query?dash=analysis
under "Status of pathena analysis queues".
There is a FAQ entry on data replication requests:
https://twiki.cern.ch/twiki/bin/view/Atlas/DAonPanda#How_can_I_ask_for_data_replicati
Pathena
> I know I can submit a job with a command like:" pathena
> singleMuon_200_pileup_digitization_1.py --inDS
> user09.ZenoDDickGreenwood.singleMuon_200_30GeV_slhc_Hits.1 --outDS
> user09.ZenoDDickGreenwood.singleMuon_200_30GeV_slhc_pileup_15.2 --site
> ANALY_BNL_ATLAS_1
> --minDS=user08.ZenoDDickGreenwood.min_bias.300.slhc.Hits.1v5 --tmpDir
> /tmp --nMin=3"
> How do I use multiple minbias DSs?
You need to create a container dataset out of the minbias datasets you made.
You can take a look here:
https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2ClientsHowTo#I_want_to_create_a_container
Then you can supply the name of the container dataset to pathena with
the --minDS option as before.
I am Cc'ing Tadashi in case he has any further comments.
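As an illustration, the command from the question can be assembled with the container name in place of the single minbias dataset. This sketch only uses the options that appear in the question above; the helper name is made up, and the container name in the usage note is a hypothetical placeholder, so use the exact name under which the container was registered.

```python
import shlex


def pathena_command(job_options, in_ds, out_ds, min_ds, site, n_min=3, tmp_dir="/tmp"):
    """Build a pathena command line as a single shell-quoted string.

    min_ds is the name of the container dataset holding the minbias datasets.
    """
    args = [
        "pathena", job_options,
        "--inDS", in_ds,
        "--outDS", out_ds,
        "--site", site,
        f"--minDS={min_ds}",
        "--tmpDir", tmp_dir,
        f"--nMin={n_min}",
    ]
    return shlex.join(args)
```

For example, `pathena_command("singleMuon_200_pileup_digitization_1.py", "user09.ZenoDDickGreenwood.singleMuon_200_30GeV_slhc_Hits.1", "user09.ZenoDDickGreenwood.singleMuon_200_30GeV_slhc_pileup_15.2", "MY_MINBIAS_CONTAINER", "ANALY_BNL_ATLAS_1")` returns a string that can be pasted into the shell after replacing MY_MINBIAS_CONTAINER.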
Custom output file with same name in different jobs
How can custom output files be added in Pathena so that they are copied to storage and available in DQ2?
--extOutFile=fname1,fname2 allows you to specify extra output files.
With the --extOutFile option, even if the jobs are split, Pathena automatically prepends a unique dataset name + job number to each .dat file and registers the results in the DQ2 catalog. Even if the dataset name were not in the file name, that should not be a problem: different datasets may contain files with the same name, and dq2-get puts files in subdirectories named after the dataset.
--
MarcoMambelli - 08 Jul 2008