Overview
A number of Tier2-centric data services need to be operational in order to make efficient use of the cluster, network, and storage resources of a Tier2 center, and to provide optimal access to (primarily) AOD, ESD and TAG datasets for ATLAS physicists. The initial focus is on development of a Data Skimming Service, which uses metadata and TAG database queries to run AOD processing tasks efficiently on Tier2-resident data.
Description
Deliver a service to provide simple and efficient access to, and event extraction from, datasets locally resident at Tier2 centers. Considerations:
- A Tier2 center will have a full replica of all ATLAS AOD datasets, according to the Computing TDR.
- Focus on the most common use cases for data skimming
- Do these simple, standard tasks more efficiently and easily than a physicist could do on their own.
- Articulate these use cases. For example (a sketch of such a skim request follows this list):
- Tell me what datasets are available at the Tier2 (e.g. using the DDM browser). Add analysis checks:
- Is the dataset complete?
- Are the datasets accessible (are the fileservers working)?
- Are the catalog contents consistent with what is on disk?
- Tell me the content and format (containers, POOL format, etc.).
- Give me all events subject to (cut set 1, cut set 2), saving only the (jets, electrons) objects in the output file.
- Put those files into the Tier2 output buffer, easily accessible to local users, Tier3 centers, etc.
- Let me know when it's complete and whether there were any errors. Also, tell me how long I have to retrieve the output before the space is reclaimed.
- Provide this initially as a local service
- Focus on simplicity of design and operation.
- Avoid grid services, at least initially, but use various backends if available (e.g. Pathena for distributed skimming).
- Avoid remote, centralized services, systems and catalogs where possible.
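As an illustration of the use cases above, a minimal sketch of a skim request to such a local service is shown below. The dss_client module, its functions, and all dataset, cut, and path names are hypothetical placeholders for whatever interface the service eventually exposes; they are not an existing ATLAS API.
<verbatim>
# Hypothetical sketch only: dss_client and every name used here are
# illustrative placeholders, not an existing ATLAS or DSS interface.
import time
import dss_client   # assumed local Data Skimming Service client library

# 1. Ask the local service which AOD datasets are resident and healthy.
datasets = dss_client.list_datasets(site="SomeTier2",
                                    checks=["complete",
                                            "fileservers",
                                            "catalog_vs_disk"])

# 2. Submit a skim: TAG-level cuts select the events, and only the
#    requested containers are written to the output files.
job = dss_client.submit_skim(
    dataset="some.example.AOD.dataset",
    tag_query="NLooseElectron >= 2 && MissingET > 20000",   # cut sets
    keep_objects=["ElectronCollection", "JetCollection"],   # jets, electrons
    output_buffer="/dss/outbox")                            # Tier2 output buffer

# 3. Poll for completion; the service reports errors and how long the
#    output stays in the buffer before the space is reclaimed.
while not job.finished():
    time.sleep(60)
print "errors:", job.errors(), "| output kept until:", job.expiry()
</verbatim>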
More discussion points are available here: DssDisccusion.
Software Components
Tier2 infrastructure and services
Project plan
The project fits into the US ATLAS project plan in the Data Services section of the Facilities plan (WBS 2.3.4).
Meeting notes and action items
Tier2 Database Infrastructure
What are the Tier2 activities associated with providing the appropriate ATLAS database infrastructure necessary for Monte Carlo production and AOD analysis?
Discussions from the Tier2 Workshop
There was general agreement that Tier2 centers need a recipe for building and providing the required database infrastructure, including MySQL database services, Squid caches, etc.
Tag and Event Store
Conditions and Tag databases
Action items:
- Test database services for the Conditions and Tag databases (see the sketch after this list)
- Start with the current model for deployment and access
- Develop requirements based on use cases
- Provide a deployment recipe (Pacman-based) and documentation.
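A first pass at the database-service test above could be a scripted connectivity check run at each Tier2. The sketch below assumes the MySQL-python (MySQLdb) bindings are installed; the host, account, and schema names are site-specific placeholders.
<verbatim>
# Minimal connectivity test for a Tier2 TAG database service.
# Host, user, password and database names are placeholders; substitute
# the values used at your site.
import sys
import MySQLdb

try:
    conn = MySQLdb.connect(host="tagdb.tier2.example.edu",
                           user="reader", passwd="secret", db="atlas_tags")
    cur = conn.cursor()
    cur.execute("SELECT 1")   # trivial query: proves the server answers
    cur.fetchone()
    conn.close()
    print "TAG database service OK"
except MySQLdb.Error, err:
    print "TAG database service FAILED:", err
    sys.exit(1)
</verbatim>
A similar check against the Conditions database backend can be added once prototyping settles on MySQL or a file-based replica.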
Calibration and Alignment Challenge
Check readiness for the Calibration and Alignment Challenge (the CAC will be this fall, after Release 13, at the end of September). This must be done for all Tier2 centers, and should use the results of prototyping activities.
- Provide a database service (one machine with a DB server hosting the Tag and Conditions DBs; details will be finalized by prototyping):
- Tag: probably a single dedicated MySQL server.
- Conditions: probably a file-based database such as SQLite, but may be MySQL.
- Access to these services will be local, from within the Tier2 site.
- Have at least one static replica of whatever Calibration and Conditions data are needed for the challenge (less than 1 TB).
- Be prepared to have one dedicated server with 1 TB of space (a rough readiness check is sketched below).
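As a rough readiness check for these prerequisites, the sketch below verifies that a file-based Conditions replica can be opened and that roughly 1 TB is free on the dedicated server. The paths are placeholders; it assumes Python 2.5's built-in sqlite3 module (older installations can use pysqlite2) and a Unix os.statvfs().
<verbatim>
# Rough CAC database readiness check for a Tier2 (placeholder paths).
import os
import sqlite3

COND_REPLICA = "/data/conditions/cac_conditions.db"   # static Conditions replica
BUFFER_DIR   = "/data/conditions"                      # dedicated 1 TB area

# 1. Can the file-based Conditions replica be opened and queried at all?
conn = sqlite3.connect(COND_REPLICA)
conn.execute("SELECT name FROM sqlite_master LIMIT 1")  # lists any table
conn.close()
print "Conditions replica readable:", COND_REPLICA

# 2. Is roughly 1 TB of space available on the dedicated server?
st = os.statvfs(BUFFER_DIR)
free_tb = st.f_bavail * st.f_frsize / 1e12
print "Free space: %.2f TB" % free_tb
if free_tb < 1.0:
    print "WARNING: less than 1 TB free for the challenge"
</verbatim>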
Other groups working on similar projects:
- Glasgow: Caitriana Nicholson and Helen McGlone are working on the ATLAS Tag database, in particular looking at how to integrate the Tag database with the Distributed Data Management (DDM) system, DQ2.
DataServices Web Utilities
--
MarcoMambelli - 25 May 2006
--
RobGardner - 07 Aug 2006