Datasets

Page 1: Datasets

Datasets

David Adams (BNL, ATLAS)

December 19, 2002

PPDG meeting: Interactive analysis

Page 2: Datasets

Contents

• DIAL
• Dataset properties
• Dataset representations
• Dataset package status
• Future

Page 3: Datasets

DIAL

DIAL is
• Distributed Interactive Analysis of Large datasets

DIAL is described at
• http://www.usatlas.bnl.gov/~dladams/dial/talks/021219_dial.ppt

Use DIAL to deduce dataset properties

Page 4: Datasets

Dataset properties

A dataset is a collection of data objects
• Provides a means to iterate over the objects
• Typically the objects are also indexed with labels
– Labels are unique within the dataset
– For event data: event ID + type + string key
> E.g. run 123, event 456, EM jet, cone_0.5
– Labels allow random access
• Data may be in a persistent store
– Each object has a GUID
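The properties on this slide can be illustrated with a minimal Python sketch. It assumes a label is the combination (run, event, type, key) from the example above and that each object carries a GUID string; the names `Label`, `DataObject` and `Dataset` are hypothetical and are not the API of the ATLAS dataset package.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical label: event ID (run, event) plus type and string key."""
    run: int
    event: int
    type_name: str
    key: str


@dataclass
class DataObject:
    label: Label
    guid: str                       # ID of the object in a persistent store
    payload: Optional[object] = None


class Dataset:
    """Collection of data objects with iteration and label-based random access."""

    def __init__(self) -> None:
        self._objects: Dict[Label, DataObject] = {}

    def add(self, obj: DataObject) -> None:
        # Labels must be unique within the dataset.
        if obj.label in self._objects:
            raise ValueError(f"duplicate label: {obj.label}")
        self._objects[obj.label] = obj

    def __iter__(self) -> Iterator[DataObject]:
        # Iterate over the objects in insertion order.
        return iter(self._objects.values())

    def __len__(self) -> int:
        return len(self._objects)

    def __getitem__(self, label: Label) -> DataObject:
        # Random access by label; raises KeyError if the label is absent.
        return self._objects[label]
```

For example, `Label(123, 456, "EM jet", "cone_0.5")` would index the EM jet object with key cone_0.5 in run 123, event 456.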

Page 5: Datasets

Dataset properties (cont)

A dataset has content
• Content indicates suitability for a particular analysis or other transformation
• It might be expressed in terms of object labels
• For ATLAS event data:
– Event IDs + type-keys for each (ATLAS) event
• (Part of type in GriPhyN VDG)
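One way to express content in terms of object labels is a map from event ID to the set of type-keys present for that event. The sketch below, with hypothetical names `content_of` and `is_suitable`, assumes an analysis is suitable for a dataset when every event provides all the type-keys the analysis reads; that suitability rule is an illustrative assumption, not a definition from the talk.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

EventId = Tuple[int, int]     # (run, event)
TypeKey = Tuple[str, str]     # (type name, string key)

# Hypothetical content description: for each event, the type-keys present.
Content = Dict[EventId, FrozenSet[TypeKey]]


def content_of(labels: Iterable[Tuple[int, int, str, str]]) -> Content:
    """Build content from (run, event, type, key) labels."""
    content: Dict[EventId, Set[TypeKey]] = {}
    for run, event, type_name, key in labels:
        content.setdefault((run, event), set()).add((type_name, key))
    return {eid: frozenset(tks) for eid, tks in content.items()}


def is_suitable(content: Content, required: Set[TypeKey]) -> bool:
    """True if every event provides all type-keys the analysis needs."""
    return all(required <= tks for tks in content.values())
```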

Page 6: Datasets

Dataset properties (cont)

The data in a dataset has a location
• Persistent store where the data may be found
• List of files holding the data
– File IDs or LFNs
> The persistent store locates physical replicas
• Or rows in RDB tables…
• There may be multiple locations for a dataset
– Due to different representations
– More later
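The split between logical file names and physical replicas can be sketched as follows. `ReplicaCatalog` is a hypothetical in-memory stand-in for the persistent store's lookup, and the `Location` record with its `representation` tag is likewise an assumption; a real store would query a replica service instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Location:
    """One location of a dataset's data: a list of logical file names."""
    representation: str      # hypothetical tag, e.g. "single file"
    lfns: List[str]


class ReplicaCatalog:
    """In-memory stand-in for the persistent store's LFN -> replica lookup."""

    def __init__(self, replicas: Dict[str, List[str]]) -> None:
        self._replicas = replicas

    def resolve(self, lfn: str) -> List[str]:
        # Unknown LFNs simply have no known physical replicas.
        return list(self._replicas.get(lfn, []))


def physical_files(loc: Location, catalog: ReplicaCatalog) -> Dict[str, List[str]]:
    """Map each LFN of a location to its known physical replicas."""
    return {lfn: catalog.resolve(lfn) for lfn in loc.lfns}
```

Keeping only LFNs in the dataset description lets replicas be added or moved without changing the dataset itself.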

Page 7: Datasets

Dataset properties (cont)

A dataset has a history
• Transformation used to create the dataset
– Executable, version, input parameters
– (VDG transformation)
• Input datasets
– (VDG derivation)
• Run-time properties (node, time, …)
– Multiple values for distributed processing
– (VDG invocation)
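A minimal Python sketch of such a history record follows, with one field group per bullet above. The class and field names are hypothetical; the comments only note which VDG concept each part loosely corresponds to, following the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transformation:
    """Executable, version and input parameters (cf. VDG transformation)."""
    executable: str
    version: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Invocation:
    """Run-time properties of one job (cf. VDG invocation)."""
    node: str
    start_time: str


@dataclass
class History:
    transformation: Transformation
    input_datasets: List[str]                        # dataset names (cf. VDG derivation)
    invocations: List[Invocation] = field(default_factory=list)

    def record(self, node: str, start_time: str) -> None:
        # Distributed processing yields one invocation per sub-job,
        # hence multiple run-time values for a single dataset.
        self.invocations.append(Invocation(node, start_time))
```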

Page 8: Datasets

Dataset properties (cont)

A dataset has a unique identity (name)
• So it can be referenced

A dataset has a portable representation
• It is possible to carry around a description of the content and location of a dataset without reference to any DBs
• The dataset package uses XML
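The slide states only that the dataset package uses XML; its actual schema is not given here. As an illustration, the sketch below writes and reads a self-contained description (name, content, location) with Python's standard `xml.etree.ElementTree`. The element and attribute names (`dataset`, `content`, `event`, `object`, `location`, `file`) are invented for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

EventId = Tuple[int, int]            # (run, event)
TypeKeys = List[Tuple[str, str]]     # [(type, key), ...]


def to_xml(name: str, content: Dict[EventId, TypeKeys], lfns: List[str]) -> str:
    """Serialize name, content and location without reference to any DB."""
    root = ET.Element("dataset", name=name)
    content_el = ET.SubElement(root, "content")
    for (run, event), type_keys in sorted(content.items()):
        ev = ET.SubElement(content_el, "event", run=str(run), event=str(event))
        for type_name, key in type_keys:
            ET.SubElement(ev, "object", type=type_name, key=key)
    location_el = ET.SubElement(root, "location")
    for lfn in lfns:
        ET.SubElement(location_el, "file", lfn=lfn)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Tuple[str, Dict[EventId, TypeKeys], List[str]]:
    """Rebuild the description written by to_xml."""
    root = ET.fromstring(text)
    content: Dict[EventId, TypeKeys] = {}
    for ev in root.find("content"):
        eid = (int(ev.get("run")), int(ev.get("event")))
        content[eid] = [(obj.get("type"), obj.get("key")) for obj in ev]
    lfns = [f.get("lfn") for f in root.find("location")]
    return root.get("name"), content, lfns
```

Because the description is plain text, it can be passed between processes or sites and reconstructed without access to the database that produced it.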

Page 9: Datasets

Dataset representations

There are different ways to represent the data in a dataset

Simple datasets:
• All data in a single file
• Table in an RDB
• Indexed list of GUIDs for a persistent store
– Commercial ODB such as Objectivity
– HES such as LCG POOL

Page 10: Datasets

Dataset representations (cont)

Compound datasets
• Concatenation of datasets
– Concatenation of content
– Any overlap between the content of constituent datasets must index identical objects
• Subset of a dataset
– Based on content
• Result of an algorithm applied to a dataset
– Virtual data
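The first two compound forms can be sketched over a simple GUID-indexed representation, modelled here as a plain dict from label to GUID (an assumption for illustration). `concatenate` enforces the overlap rule by rejecting any label that maps to different GUIDs in different constituents; `subset` keeps the labels accepted by a caller-supplied content predicate. Both function names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Label = Tuple[int, int, str, str]   # (run, event, type, key)
Simple = Dict[Label, str]           # label -> GUID of the indexed object


def concatenate(*parts: Simple) -> Simple:
    """Concatenate content; overlapping labels must index the identical object."""
    result: Simple = {}
    for part in parts:
        for label, guid in part.items():
            if label in result and result[label] != guid:
                raise ValueError(f"conflicting objects for label {label}")
            result[label] = guid
    return result


def subset(ds: Simple, keep: Callable[[Label], bool]) -> Simple:
    """Subset based on content: keep only the labels accepted by `keep`."""
    return {label: guid for label, guid in ds.items() if keep(label)}
```

A selection on event ID, for instance, is `subset(ds, lambda label: label[1] in wanted_events)` for some set `wanted_events`.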

Page 11: Datasets

Dataset package status

Datasets
• Generic implementation in place
– http://www.usatlas.bnl.gov/~dladams/dataset
• Assumes content is event data
• Supported representations:
– Single file
> AthenaRoot format
> ATLAS Monte Carlo generator output
– Concatenation of events
– Selection based on event ID

Page 12: Datasets

Future

Support other types of ATLAS event data

Add concatenation and selection based on event content

Add representation for POOL EventCollection

Add non-event data
• Relevant conditions data objects
• Derived metadata
• Provenance and production history