Tier2 Data Skimming Service (DSS)

Questions about the Data Skimming Service

  1. Given a selection of events (obtained how? represented how?), skim the selected events from a chosen dataset (chosen how? referenced how?).
  2. There will be a service, such as a persistent web service, which receives requests in some format and on the backend does the necessary job construction, submission, management, monitoring, output data handling, notification, etc.
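As a rough sketch of the request-splitting step the service would perform on the backend — the request format, field names, and chunk size here are all placeholders, since none of these are decided:

```python
def build_jobs(request, events_per_job=1000):
    """Split a high-level skim request into per-job event ranges.

    `request` is a hypothetical dict with the dataset name and the
    number of events to process; the real request format is undecided.
    """
    n = request["n_events"]
    jobs = []
    for start in range(0, n, events_per_job):
        jobs.append({
            "dataset": request["dataset"],
            "first_event": start,
            "last_event": min(start + events_per_job, n) - 1,
        })
    return jobs

# Example: a 2500-event request splits into three jobs of <= 1000 events.
jobs = build_jobs({"dataset": "AOD.v12", "n_events": 2500})
```

Each resulting job description would then be handed to the submission, monitoring, and notification machinery.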
  3. The high-level request will be split into a number of jobs to be submitted to the Tier2 facility (in the case of MWT2, there are at least two clusters to submit to, possibly with a common scheduling system above them such as Condor).
  4. A job will receive the "selection", suitably expressed (perhaps as an XML description), and use the full AOD collection, if necessary, to produce as output one or more files containing the selected events. The result must be visible in the view above, and a notification (email, callback, ...) of job completion must be possible.
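One hypothetical shape for an XML selection description, and a parser for it, might look like the following (the element and attribute names are illustrative only; no schema has been agreed):

```python
import xml.etree.ElementTree as ET

# Hypothetical XML "selection" a job might receive; the real schema is open.
SELECTION_XML = """
<selection dataset="mc12.AOD.signal">
  <cut variable="missingEt" min="25.0"/>
  <trigger name="EF_e25i"/>
</selection>
"""

def parse_selection(xml_text):
    """Turn the XML selection into a plain dict the skim job can act on."""
    root = ET.fromstring(xml_text)
    return {
        "dataset": root.get("dataset"),
        "cuts": [{"variable": c.get("variable"), "min": float(c.get("min"))}
                 for c in root.findall("cut")],
        "triggers": [t.get("name") for t in root.findall("trigger")],
    }
```

The parsed structure would drive the Athena job that reads the full AOD collection and writes out the selected events.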
  5. The processes executing the selection should be at least as optimized as anything a user could run by hand (e.g. exploit data locality; manage and retry failures).
  6. Allow "filters" to write out only pieces of the event, if necessary. How do we describe this in Athena?
  7. Facilitate selection modalities:
    • Select events from queries to the Tag database
    • Perform random or regular sampling of the dataset. For example, select at random 10% of the events over the appropriately weighted luminosity running periods.
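The luminosity-weighted random sampling could work roughly as below — a sketch only, assuming events and per-period integrated luminosities are available as simple mappings:

```python
import random

def luminosity_weighted_sample(events_by_period, lumi_by_period,
                               fraction, seed=1):
    """Sample `fraction` of all events, allotting picks to each running
    period in proportion to its integrated luminosity."""
    rng = random.Random(seed)
    total_events = sum(len(ev) for ev in events_by_period.values())
    total_lumi = sum(lumi_by_period.values())
    target = int(round(fraction * total_events))
    picked = []
    for period in sorted(events_by_period):
        share = lumi_by_period[period] / total_lumi
        k = min(len(events_by_period[period]), int(round(target * share)))
        picked.extend(rng.sample(events_by_period[period], k))
    return picked
```

For example, with two equal-size periods whose luminosities are 3:1, a 10% skim draws three quarters of its events from the first period.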
  8. For MWT2, datasets will be resident in dCache pools at either site. Each site should advertise which datasets are available, and the DSS should match jobs to locations. We are considering a single Condor pool spanning UC and IU to keep this simple. Users should not need to know which site has their data.
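A minimal sketch of the dataset-to-site matching, assuming each site publishes a catalogue of resident datasets (the site names are real, but the catalogue shape and the `Site` ClassAd attribute are hypothetical):

```python
# Hypothetical per-site catalogues of resident datasets.
SITE_DATASETS = {
    "UC": {"AOD.r1", "AOD.r2"},
    "IU": {"AOD.r2", "AOD.r3"},
}

def sites_for(dataset):
    """Return the sites holding a dataset; the DSS steers jobs there."""
    return sorted(s for s, held in SITE_DATASETS.items() if dataset in held)

def condor_requirements(dataset):
    """Translate data locality into a Condor Requirements expression,
    assuming machines advertise a `Site` attribute (an assumption)."""
    sites = sites_for(dataset)
    if not sites:
        raise LookupError("dataset %s not resident at any site" % dataset)
    return " || ".join('(Site == "%s")' % s for s in sites)
```

With a single pool over both sites, the Requirements expression alone would keep users unaware of where their data lives.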
  9. Understand how data is organized into streams. Allow users to design admixtures of streams to provide, for example, composite datasets with signal+backgrounds, and signals of various types for normalization purposes.
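Composing an admixture from named streams might be as simple as the following sketch (stream names and the recipe format are placeholders):

```python
def compose_dataset(streams, recipe):
    """Build a composite sample from named streams.

    `streams` maps a stream name to its event list; `recipe` maps a
    stream name to the number of events to take from it.
    """
    out = []
    for name, n in recipe.items():
        events = streams[name]
        if n > len(events):
            raise ValueError("stream %s has only %d events" % (name, len(events)))
        out.extend(events[:n])
    return out
```

A signal+background admixture is then one recipe, and normalization samples of various signal types are just other recipes over the same streams.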
  10. Understand how luminosity booking is to be handled, for a given dataset extraction.
  11. Allow selection over run periods, and provide appropriate luminosity scaling factors.
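The scaling factor itself is straightforward once per-run luminosities are known; a sketch, with hypothetical run labels:

```python
def lumi_scale_factor(run_lumis, selected_runs, target_lumi):
    """Factor that scales the selected run periods to `target_lumi`.

    `run_lumis` maps a run label to its integrated luminosity
    (same units as `target_lumi`).
    """
    selected = sum(run_lumis[r] for r in selected_runs)
    return target_lumi / selected
```

For example, runs totalling 5 units scaled to a 10-unit target give a factor of 2.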
  12. How to select events based on triggers satisfied?
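However the trigger decision is stored, the selection reduces to something like the sketch below (the event layout and trigger names are illustrative assumptions):

```python
def select_by_trigger(events, required):
    """Keep events that fired at least one of the required trigger names."""
    wanted = set(required)
    return [ev for ev in events if wanted & set(ev["triggers"])]

# Hypothetical events carrying their list of satisfied triggers.
events = [{"id": 1, "triggers": ["EF_e25i"]},
          {"id": 2, "triggers": ["EF_mu20"]}]
```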
  13. User interface to the service
    • Provide a simple web interface that is easy to use and self-explanatory, at least as a starting point.
    • Provide a Python scripting 'language' and a message-passing interface to the service (much later).

-- RobGardner - 06 Nov 2006
Topic revision: r1 - 06 Nov 2006, RobGardner