Development meeting.
Some assumptions (to focus development, may be removed in the future):
files are in LRC
develop a local service (for OSG Tier2 and similar sites, with skims running on the local queue; distributed runs later)
no dependency on external components (more reliable)
maintain visibility as independent software, not included in bigger packages
able to work with both local copy and shared access to the input files
hardcode most of the choices: keep the software flexible but don't expose too many options to the users
Currently scripts can be coded either to access files from a shared file system (NFS and similar, dCache) or to copy files locally to the execution directory.
Option 1 is possible if a shared file system supported by root is provided: NFS (and similar, e.g. GPFS, ...), dCache
Option 2 is possible for all FS currently supported in panda production: DMU is used by both
It would be useful to do some scalability/performance tests
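The two access modes above can be sketched as follows. This is only an illustration, not the actual DSS scripts: the function name `resolve_input`, the `mode` values and the file name in the usage example are assumptions, and a plain file copy stands in for whatever data mover the site uses.

```python
import os
import shutil


def resolve_input(path, workdir, mode="shared"):
    """Return the path a skim job should open for one input file.

    mode="shared": open the file in place on the shared file system (Option 1).
    mode="copy":   copy the file into the execution directory first (Option 2).
    Both names are hypothetical, chosen for this sketch.
    """
    if mode == "shared":
        # Nothing to stage: the job reads the file where it is.
        return path
    if mode == "copy":
        os.makedirs(workdir, exist_ok=True)
        local = os.path.join(workdir, os.path.basename(path))
        shutil.copy(path, local)  # stand-in for the site's real copy tool
        return local
    raise ValueError("unknown access mode: %r" % (mode,))
```

A job wrapper would call, for example, `resolve_input("/shared/AOD.pool.root", "./job", mode="copy")` and pass the returned path to the skim. Keeping the choice in one function makes it easy to hardcode a default while leaving the other mode available.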
Provenance:
TAG (DB and root file) provenance could be stored in the TAG DB in a dedicated table since they are closely related (DS, TAG root files DS, date, trf used)
Skim provenance should probably be in a separate DB: Skim DB (TAG table, date, query, ...). This DB should be available to access the information from the outside (which TAGs are at MWT2?) but should be specific to the Site, not shared (at least for now)
A shared Skim DB would bring the problem of Skim equivalence that has yet to be defined: under which condition are 2 skims equivalent?
produced from the same DS and TAG?
never (only if they are replicas of the same skim)?
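To make the two candidate definitions concrete, here is a minimal sketch of a site-local Skim DB with both equivalence tests. Everything in it is an assumption for illustration: the table and column names, the `replica_of` column, and the use of SQLite in place of a real DB server. Whether the query should also count toward equivalence is left open, as in the discussion above.

```python
import sqlite3


def create_skim_db(conn):
    # Hypothetical schema: one row per skim, with its provenance.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS skim (
            skim_id    INTEGER PRIMARY KEY,
            dataset    TEXT NOT NULL,   -- source DS
            tag_table  TEXT NOT NULL,   -- TAG table the query ran on
            query      TEXT NOT NULL,
            created    TEXT NOT NULL,   -- date
            replica_of INTEGER REFERENCES skim(skim_id)
        )""")


def record_skim(conn, dataset, tag_table, query, created, replica_of=None):
    cur = conn.execute(
        "INSERT INTO skim (dataset, tag_table, query, created, replica_of) "
        "VALUES (?, ?, ?, ?, ?)",
        (dataset, tag_table, query, created, replica_of))
    return cur.lastrowid


def same_source(conn, a, b):
    """Candidate 1: equivalent if produced from the same DS and TAG."""
    sql = "SELECT dataset, tag_table FROM skim WHERE skim_id = ?"
    return conn.execute(sql, (a,)).fetchone() == conn.execute(sql, (b,)).fetchone()


def _origin(conn, skim_id):
    # Follow replica_of links back to the original skim.
    while True:
        parent = conn.execute(
            "SELECT replica_of FROM skim WHERE skim_id = ?", (skim_id,)
        ).fetchone()[0]
        if parent is None:
            return skim_id
        skim_id = parent


def same_replica(conn, a, b):
    """Candidate 2: equivalent only if both are replicas of the same skim."""
    return _origin(conn, a) == _origin(conn, b)
```

Under the first definition two skims with different queries on the same DS and TAG would count as equivalent, which shows why the definition needs to be settled before a shared Skim DB is considered.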
Skim Composers:
users expect a certain structure (prepackaged queries with summary data on the DS)
no ideal reusable composer is known
Julius interface is an option (download JAS and use it)
check with Sasha and Torre for existing possibilities
looking for a flexible MySQL GUI (to execute query, hiding DB structure - no phpMyAdmin)
Xrootd for building TAG. It is at an early stage and not directly connected to DSS. For now we should monitor the development. A lot of people don't want to deal with relational DBs for site services (install them, replicate them, ...). Therefore there is a proposal to use Root TAG files as an alternative to the TAG DB. Tools will be provided to interact with them.
Tadashi interested in integrating DSS with Pathena
Dietricht interested in integrating DSS with Ganga
There is a plan to build TAGs for CSC production. Jack will follow it closely.
We have not done any scalability tests and don't know how the system will behave with big DSs. Christo ran tests on DBs with 10 million rows: effects show up. The system does not break, but performance is affected.
Skimming not only AODs?
E.g. ESD or RAW data could be skimmed
no big requests so far: Tom is skeptical but it is a possibility that may be requested in the future
for now DSS will work with AODs. Go ahead and hardcode that (do not expose alternatives to the user) but be prepared anyway for ESD and RAW (keep the sw flexible)
Current TRFs work for AOD. Jack will check if they are more general than expected (at least the ones used in extraction - TAGs will still be generated from AODs).
Jerry is working on the tools to extract GUIDs/LFNs from the collection with the Skim and on job splitting
Marco is working on merging the pieces together and on interactions with PBS.
We will touch base by the end of the week (status by email).
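As a rough illustration of the extraction and job-splitting step mentioned above (not Jerry's actual tools), the sketch below assumes the selected events are already available as (GUID, LFN) pairs, one per event. The function names and the `files_per_job` parameter are hypothetical; reading the collection itself is not shown.

```python
def unique_files(event_refs):
    """Collapse per-event (guid, lfn) pairs into the list of distinct input
    files, keeping the order of first appearance."""
    seen = set()
    files = []
    for guid, lfn in event_refs:
        if guid not in seen:
            seen.add(guid)
            files.append((guid, lfn))
    return files


def split_into_jobs(files, files_per_job):
    """Split the file list into consecutive chunks, one chunk per job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]
```

Each chunk could then become one submission to the local queue. Splitting by file count is the simplest choice; splitting by number of selected events per file is an alternative if the selection is very uneven.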
Next meeting 4/18 at 10am, direct phone call (Jack and Jerry)
-- MarcoMambelli - 11 Apr 2007