Virtual Data In CMS Production
A. Arbree, P. Avery, D. Bourilkov, R. Cavanaugh, S. Katageri, G. Graham, J. Rodriguez, J. Voeckler, M. Wilde
CMS & GriPhyN
Conference in High Energy Physics (CHEP 2003), UC San Diego, 03.25.2003

Transcript of the 20-page slide deck.

Page 1:

A. Arbree, P. Avery, D. Bourilkov, R. Cavanaugh, S. Katageri, G. Graham, J. Rodriguez, J. Voeckler, M. Wilde

CMS & GriPhyN

Conference in High Energy Physics, 2003

UC San Diego

Virtual Data In CMS Production

Page 2:

Virtual Data Motivations in Production

Data track-ability and result audit-ability
– Universally sought by scientists

Facilitates tool and data sharing and collaboration
– Data can be sent along with its recipe
– Recipe is useful in searching for data

Workflow management
– A new, structured paradigm for organizing, locating, and specifying data products

Performance optimizations
– Ability to delay execution planning until as late as possible

Page 3:

Initial CMS Production tests using the Chimera Virtual Data System

Motivation
– Simplify CMS production in a Grid environment
– Evaluate the current state of Virtual Data technology
– Understand issues related to the provenance of CMS data

Use-case
– Implement a simple 5-stage CMS production pipeline on the US CMS Test Grid

Solution
– Wrote an interface between Chimera and the CMS production software
– Wrote a simple grid scheduler
– Ran sample simulations to evaluate the system

Page 4:

What is a DAG? Directed Acyclic Graph

A DAG is the data structure used to represent job dependencies.

Each job is a “node” in the DAG.

Each node can have any number of “parent” or “child” nodes – as long as there are no loops!

We usually talk about workflow in units of "DAGs"

[Figure: a diamond-shaped DAG. Job A is the parent of Job B and Job C; Job B and Job C are the parents of Job D. Picture taken from Peter Couvares.]
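To make this concrete, here is a minimal Python sketch (ours, not from the slides) of the diamond DAG in the figure, with a check that the "no loops" rule holds:

```python
# Minimal sketch of the diamond DAG in the figure above.
# Job names match the slide; the representation itself is illustrative.
from collections import defaultdict

dag = {
    "Job A": ["Job B", "Job C"],   # each job maps to its child jobs
    "Job B": ["Job D"],
    "Job C": ["Job D"],
    "Job D": [],
}

def is_acyclic(graph):
    """Return True if the graph has no loops, i.e. it is a valid DAG."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on the current path / finished
    color = defaultdict(int)       # defaults to WHITE

    def visit(node):
        color[node] = GRAY
        for child in graph.get(node, []):
            if color[child] == GRAY:                     # back edge: a loop
                return False
            if color[child] == WHITE and not visit(child):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in graph if color[n] == WHITE)

print(is_acyclic(dag))   # True: the diamond has no loops
```

Condor DAGMan, used later in this talk, runs each node only after all of its parent nodes have completed.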

Page 5:

Example CMS Data/Workflow

[Figure: an example CMS data/workflow graph centred on an ODBMS, with Generator, Simulator, and Formatter stages, two sets of writeESD, writeAOD, and writeTAG steps, a Digitiser, a Calib. DB, and Analysis Scripts.]

Page 6:

[Figure: the same CMS data/workflow graph as on page 5, now annotated with the teams responsible for each part: Online Teams, (Re)processing Team, MC Production Team, and Physics Groups.]

Data/workflow is a collaborative endeavour!

Page 7:

A Simple CMS Production 5-Stage Workflow Use-case

[Figure: pipeline CMKIN → .ntpl → CMSIM → .fz → OOHITS → Event Database → OODIGI → NTUPLE → .ntpl]

– CMKIN: events are generated (pythia).
– CMSIM: the detector's response is simulated for each event (geant3).
– OOHITS: events are reformatted and written into a database.
– OODIGI: the original events are digitised and reconstructed.
– NTUPLE: the reconstructed data is reduced and written to a flat file.
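As an illustration (ours, not the authors'), the five stages and their intermediate data products can be written down as a simple linear chain; stage names and file formats follow the figure, everything else is invented:

```python
# Sketch of the 5-stage CMS production pipeline as a linear dependency chain.
# Stage names and file formats come from the slide; the structure is illustrative.

PIPELINE = [
    #  stage      produces           note
    ("CMKIN",  "events.ntpl",    "event generation (pythia)"),
    ("CMSIM",  "detector.fz",    "detector simulation (geant3)"),
    ("OOHITS", "event database", "reformat events into the database"),
    ("OODIGI", "event database", "digitisation and reconstruction"),
    ("NTUPLE", "analysis.ntpl",  "reduction to a flat ntuple file"),
]

def edges(pipeline):
    """Yield (parent, child) pairs: each stage depends on the previous one."""
    for (parent, _, _), (child, _, _) in zip(pipeline, pipeline[1:]):
        yield parent, child

for parent, child in edges(PIPELINE):
    print(f"{parent} -> {child}")
```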

Page 8:

2-stage DAG Representation of the 5-stage Use-case

[Figure: the 5-stage pipeline (CMKIN → .ntpl → CMSIM → .fz → OOHITS → Event DB → OODIGI → NTUPLE → .ntpl) grouped into two DAG nodes, "Fortran" and "DB".]

• The Fortran job wraps the CMKIN and CMSIM stages.
• The DB job wraps the OOHITS, OODIGI, and NTUPLE stages.

This structure was used to enforce policy constraints on the workflow (i.e. an Objectivity/DB license is required for the DB stages).

Initially a simple script was used to generate Virtual Data Language (VDL); McRunJob is now used to generate the workflow in VDL (see the talk by G. Graham).

Responsibility of a Workflow Generator: creates the abstract plan (a sketch of the grouping follows below).
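A minimal sketch (ours; the real generator wrote VDL) of how the five stages could be partitioned into the two wrapper jobs, assuming a per-stage flag for the Objectivity/DB requirement:

```python
# Sketch: partition the 5-stage pipeline into the 2-node DAG ("Fortran" and "DB").
# The needs_objectivity flags are assumptions encoding the slide's policy note
# that the database stages require an Objectivity/DB license.

STAGES = [
    ("CMKIN",  False),
    ("CMSIM",  False),
    ("OOHITS", True),
    ("OODIGI", True),
    ("NTUPLE", True),
]

def partition(stages):
    """Group the stages into the two wrapper jobs according to the license policy."""
    jobs = {"Fortran": [], "DB": []}
    for name, needs_objectivity in stages:
        jobs["DB" if needs_objectivity else "Fortran"].append(name)
    return jobs

print(partition(STAGES))
# {'Fortran': ['CMKIN', 'CMSIM'], 'DB': ['OOHITS', 'OODIGI', 'NTUPLE']}
# The resulting DAG has a single edge: Fortran -> DB.
```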

Page 9:

Mapping Abstract Workflows onto Concrete Environments

Abstract DAGs (virtual workflow)
– Resource locations unspecified
– File names are logical
– Data destinations unspecified
– build style

Concrete DAGs (stuff for submission)
– Resource locations determined
– Physical file names specified
– Data delivered to and returned from physical locations
– make style

[Figure: the planning chain. VDL definitions in the Virtual Data Catalog (VDC) → Abstract Plan → DAX (XML) → Concrete Plan, consulting the Replica Catalog (RC) → DAG → DAGMan, moving from logical to physical descriptions.]

In general there is a range of planning steps between abstract workflows and concrete workflows.
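As a toy illustration of the logical-to-physical step (our sketch, with a dict standing in for the real Replica Catalog service and invented URLs):

```python
# Sketch: resolving logical file names to physical replicas, the kind of lookup
# a concrete planner performs against the Replica Catalog (RC).
# Catalog contents, hosts, and paths are invented for illustration.

REPLICA_CATALOG = {
    "events.ntpl": ["gsiftp://host-a.example.edu/store/events.ntpl"],
    "detector.fz": ["gsiftp://host-a.example.edu/store/detector.fz",
                    "gsiftp://host-b.example.edu/store/detector.fz"],
}

def resolve(logical_name):
    """Return the physical replicas of a logical file name ([] if none exist)."""
    return REPLICA_CATALOG.get(logical_name, [])

for lfn in ("detector.fz", "analysis.ntpl"):
    replicas = resolve(lfn)
    if replicas:
        print(f"{lfn}: stage in from {replicas[0]}")
    else:
        print(f"{lfn}: no replica registered; plan a derivation to create it")
```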

Page 10:

Concrete DAG Representation of the CMS Pipeline Use-case

[Figure: the Fortran and DB jobs, each expanded into the node sequence Stage File In → Execute Job → Stage File Out → Register File.]

Responsibility of the Concrete Planner:
• Binds job nodes to physical grid sites.
• Queries the Replica and Transformation Catalogs for existence and location.
• Dresses job nodes with stage-in/out nodes (see the sketch below).
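A minimal sketch (ours; not the planner's actual API) of the "dressing" step, expanding one compute job into the four-node sequence shown in the figure:

```python
# Sketch: dressing a compute job with data-movement nodes, as the concrete
# planner does. Job, file, and site names are illustrative.

def dress(job, inputs, outputs, site):
    """Expand an abstract job into stage-in -> execute -> stage-out -> register."""
    nodes = []
    for f in inputs:
        nodes.append(("stage_in", f, site))    # deliver each input to the site
    nodes.append(("execute", job, site))       # run the job at the chosen site
    for f in outputs:
        nodes.append(("stage_out", f, site))   # return each output
        nodes.append(("register", f, site))    # record the new replica in the RC
    return nodes

for node in dress("DB", inputs=["detector.fz"],
                  outputs=["analysis.ntpl"], site="ufl-uscms"):
    print(node)
```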

Page 11:

[Figure: the default middleware configuration from the Virtual Data Toolkit. Submit host: Chimera → DAGMan → Condor-G → gahp_server; remote host: gatekeeper → Local Scheduler (Condor, PBS, etc.) → compute machines.]

Page 12:

[Figure: modified middleware configuration (to enable massive CMS production workflows). The VDT stack from page 11 is extended with RefDB and McRunJob (Generic Workflow Generator) feeding Chimera, and with WorkRunner driving submission through Condor-G.]

Page 13:

[Figure: the modified middleware configuration again, here highlighting RefDB.]

RefDB, the CMS Metadata Catalog:
- contains parameter/cards files
- contains production requests
- contains production status
- etc.

See Veronique Lefebure's talk on RefDB

Page 14:

[Figure: the modified middleware configuration again, here highlighting McRunJob.]

McRunJob (Generic Workflow Generator) internals: RefDB Module, Linker, VDL Config, VDL Generator.

The CMS Workflow Generator:
- Constructs the production workflow from a request in the RefDB
- Writes the workflow description in VDL (via ScriptGen)

See Greg Graham's talk on McRunJob.

Page 15:

[Figure: the modified middleware configuration again, here highlighting WorkRunner.]

WorkRunner, the workflow grid scheduler:
- internals: Condor-G Monitor, Chimera Interface, Job Tracking Module
- a very simple placeholder (due to the lack of an interface to a resource broker)
- submits Chimera workflows based on simple job-monitoring information from Condor-G (see the sketch below)
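A sketch of the scheduling idea (ours; WorkRunner's real interfaces are not shown in the slides): a scheduling pass tops up each site's queue using only coarse idle-job counts from Condor-G.

```python
# Sketch of a WorkRunner-style scheduling pass. Site names, the target backlog,
# and the fake monitoring data are all invented for illustration.

SITES = ["ufl-uscms", "uwm-cs", "uchicago-cs", "anl-datagrid"]
TARGET_IDLE = 10   # assumed per-site backlog of idle jobs to maintain

# Stand-in for Condor-G job monitoring (in reality: querying the Condor-G queue).
idle_counts = {"ufl-uscms": 3, "uwm-cs": 12, "uchicago-cs": 0, "anl-datagrid": 7}

def schedule_pass(idle):
    """Decide how many new Chimera workflows to submit to each site."""
    return {site: max(0, TARGET_IDLE - idle.get(site, 0)) for site in SITES}

for site, n in schedule_pass(idle_counts).items():
    print(f"{site}: submit {n} workflow(s) via Condor-G")
```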

Page 16:

[Figure: the complete modified middleware configuration (to enable massive CMS production workflows): RefDB → McRunJob (Generic Workflow Generator) → Chimera → DAGMan → Condor-G → gahp_server on the submit host, with WorkRunner driving submission; gatekeeper → Local Scheduler (Condor, PBS, etc.) → compute machines on the remote host.]

Page 17:

Initial Results

Production Test

– Results
> 678 DAGs (250 events each)
> 167,500 test events computed (the 670 successful DAGs × 250 events; not delivered to CMS)
> 350 CPU-days on 25 dual-processor Pentium (1 GHz) machines over 2 weeks of clock time
> 200 GB of simulated data

– Problems
> 8 failed DAGs
> Cause: pre-emption by another user

Page 18:

Initial Results (cont.)

Scheduling Test

– Results
> 5954 DAGs (1 event each, not used by CMS)
> 300 CPU-days on 145 CPUs at 6 sites:
  University of Florida: USCMS Cluster (8), HCS Cluster (64), GriPhyN Cluster (28)
  University of Wisconsin-Milwaukee: CS Dept. Cluster (30)
  University of Chicago: CS Dept. Cluster (5)
  Argonne National Lab: DataGrid Cluster (10)

– Problems
> 395 failed DAGs
> Causes:
  Failure to post final data from the UF GriPhyN Cluster (200-300 failures)
  A Globus bug: 1 DAG in 50 fails when communication is lost
> Primarily limited by the performance of lower-level grid middleware

Page 19:

The Value of Virtual Data

Provides full reproducibility (fault tolerance) of one's results:
– tracks ALL dependencies between transformations and their derived data products
– something like a "Virtual Logbook"
– records the provenance of data products

Provides transparency with respect to location and existence. The user need not know:
– the data location
– how many data files are in a data set
– if the requested derived data exists

Allows for optimal performance in planning. Should the derived data be:
– staged in from a remote site? (send the job to the data, or send the data to the job)
– re-created locally on demand?
A toy decision rule follows below.
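A toy decision rule (entirely ours; the slides do not describe a cost model) for the transfer-versus-recreate choice that virtual data makes possible:

```python
# Toy sketch of the planning choice virtual data enables: is it cheaper to
# transfer an existing replica of a derived dataset, or to re-run its recipe
# locally? All numbers and the cost model are invented placeholders.

def plan(size_gb, bandwidth_gb_per_hour, cpu_hours_to_derive, replica_exists):
    """Pick the cheaper way (in hours) to materialise a derived dataset."""
    if not replica_exists:
        return ("recreate", cpu_hours_to_derive)      # no replica: must re-derive
    transfer_hours = size_gb / bandwidth_gb_per_hour  # time to stage the data in
    if transfer_hours <= cpu_hours_to_derive:
        return ("transfer", transfer_hours)
    return ("recreate", cpu_hours_to_derive)

# e.g. a 200 GB dataset at 1 GB/hour vs. 350 CPU-hours to re-derive it
print(plan(size_gb=200, bandwidth_gb_per_hour=1,
           cpu_hours_to_derive=350, replica_exists=True))
# ('transfer', 200.0)
```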

Page 20:

Summary: Grid Production of CMS Simulated Data

CMS production of simulated data (to date)
– O(10) sites
– O(1000) CPUs
– O(100) TB of data
– O(10) production managers

The goal is to double every year, without increasing the number of production managers!
– More automation will be needed for the upcoming Data Challenges!

Virtual Data provides
– parts of the abstraction required for automation and fault tolerance
– mechanisms for data provenance (important for search engines)

Virtual Data technology is "real" and maturing, but still in its infancy
– much functionality currently exists
– it still requires placeholder components for intelligent planning and optimisation