Page 1:

J.J.Blaising April 02 AMS DataGrid-status 1

DataGrid Status

J.J. Blaising

IN2P3

Grid Status

Demo introduction

Demo

Page 2:

Grid Technology: Introduction & Overview

Ian Foster
Argonne National Laboratory
University of Chicago

Page 3:

• Harvey B. Newman, Caltech
• Data Analysis for Global HEP Collaborations
• LCG Launch Workshop, CERN
• l3www.cern.ch/~newman/LHCCMPerspective_hbn031102.ppt
• LHC Computing Model Perspective

Page 4:

• Query (task completion time) estimation
• Queueing and co-scheduling strategies
• Load balancing (e.g. Self Organizing Neural Network)
• Error Recovery: Fallback and Redirection Strategies
• Strategy for use of tapes
• Extraction, transport and caching of physicists’ object-collections; Grid/Database Integration
• Policy-driven strategies for resource sharing among sites and activities; policy/capability tradeoffs
• Network Performance and Problem Handling
  – Monitoring and Response to Bottlenecks
  – Configuration and Use of New-Technology Networks, e.g. Dynamic Wavelength Scheduling or Switching
• Fault-Tolerance, Performance of the Grid Services Architecture
• Consistent transaction management, …

FROM H.Newman

Page 5:

DataTAG project

Major 2.5 Gbps circuits between Europe & USA

[Figure: network map. Labels: CERN, GEANT, NL SURFnet, UK SuperJANET4, IT GARR-B, New York, Abilene, ESNET, MREN, STAR-TAP, STAR-LIGHT]

Page 6:

DataGrid Goal

Develop middleware to allow WAN distributed computing and data management

• Build a distributed batch system that lets users submit jobs to different sites, with automatic site selection according to resource matching.

Next:

• Interactive use and parallel processing
• Other OS (Solaris)

Requirements from HEP, Earth Observation and Biomedical applications.

Page 7:

[Figure: DataGrid architecture. Labels, grouped as on the slide:]

User ITF Node: Client; Submit job

Grid Services: Workload manager; Information system; File Catalog server

Resources provider, CPU: Computing element gatekeeper; Jobmanager-PBS/LSF/BQS; Publish CPU resources; Worker Nodes

Resources provider, Storage: Storage element gatekeeper; Publish storage resources

Page 8:

Middleware status v1.1.2

• Workload manager (UI+RB+JSS+LB), WP1
  still bug fixing + improvements for year 2
• Data management, file catalog, replica manager, WP2
  good collaboration with Globus
• Information system, WP3
  deployment of uniform FTREE/MDS/R-GMA
• Fabric management, WP4
  LCFG, light LCFG for preinstalled systems
• Mass storage management, WP5, Castor, HPSS, …

Successful EU review on 1 March

Page 9:

VO Services

• Computing and Storage element services deployed at CERN, CC-IN2P3, CNAF, NIKHEF, RAL, more …
  US sites soon to test Grid interoperability

For ALICE, ATLAS, CMS, LHCb, Earthobs, Biomed: deployment of dedicated services

• LDAP server (certificates)
• File catalog (LFN/PFN mapping)
• GDMP server (automatic data replication)
• More to come, Metadata catalog, …
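The LFN/PFN mapping held by the file catalog can be pictured as a one-to-many lookup from a logical file name to the physical copies stored at each site. The sketch below is illustrative only: the class, its methods and the example file names are hypothetical and are not the DataGrid catalog interface.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory catalog: one LFN maps to many PFNs (hypothetical API)."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def add(self, lfn, pfn):
        """Register a physical copy of a logical file; skip exact duplicates."""
        if pfn not in self._replicas[lfn]:
            self._replicas[lfn].append(pfn)

    def lookup(self, lfn):
        """Return every known physical copy of a logical file (may be empty)."""
        return list(self._replicas.get(lfn, []))


catalog = ReplicaCatalog()
# Hypothetical names: the same file replicated on storage at two sites.
catalog.add("lfn:raw_100147.dat", "pfn://se.cern.example/data/raw_100147.dat")
catalog.add("lfn:raw_100147.dat", "pfn://se.lyon.example/data/raw_100147.dat")
print(catalog.lookup("lfn:raw_100147.dat"))
```

A job that only knows the logical name can then pick whichever physical copy is closest, which is what makes automatic replication transparent to the user.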

Page 10:

Application activities (WP8)

• Middleware evaluation using ALICE, ATLAS, CMS, LHCb, Gen-Hep toolkits
• User requirements collection with ALICE, ATLAS, CMS, LHCb
• Common HEP use cases
• Common application use case

Page 11:

[Figure, shown in successive builds: layered architecture. Labels: OS & Net services; Bag of Services (GLOBUS); GLOBUS team; MiddleWare (MW1 MW2 MW3 MW4 MW5); VO use cases & requirements; Specific application layer; ALICE, ATLAS, CMS, LHCb, Other apps]

If we manage to define a common core use case (LHC, Other apps)

Or even better, common use cases (LHC, Other apps)

It will be easier to arrive at

Page 12:

What we want from a GRID

• This is the result of our experience on TB0 & TB1

[Figure: GRID architecture. Labels: OS & Net services; Basic Services; GLOBUS team; High level GRID middleware; Common use cases; LHC VO common application layer; Other apps; Specific application layer; ALICE, ATLAS, CMS, LHCb, Other apps]

Page 13:

Demo introduction

• Sites involved: CERN, CNAF, LYON, NIKHEF, RAL
• User interface in X; dg-job-submit demo.jdl => job sent to the Workload Management System (WMS) at CERN
• The WMS selects a site according to the resource attributes given in the jdl file and to the resources published via the Information System.
• The job is sent to one of the sites and a data file is written; the file is copied to the nearest MS (mass storage) and replicated on all other sites.
• dg-job-get-output is used to retrieve the files
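The site-selection step can be sketched as a filter over the resources each site publishes. This is a simplified illustration, not the Workload Management System's actual algorithm: the FreeCPUs attribute, the ranking by free CPUs and all published values are assumptions made for the example. Only the OpSys requirement comes from the demo job.

```python
# Resources each computing element might publish via the Information System.
# All values are made up for illustration.
published = [
    {"site": "CERN", "OpSys": "RH 6.2", "FreeCPUs": 4},
    {"site": "CNAF", "OpSys": "RH 6.2", "FreeCPUs": 12},
    {"site": "RAL", "OpSys": "RH 7.1", "FreeCPUs": 30},
]


def select_site(resources, requirements):
    """Return the matching site with the most free CPUs, or None if none match."""
    matches = [
        r for r in resources
        if all(r.get(key) == value for key, value in requirements.items())
    ]
    if not matches:
        return None
    # Assumed ranking: prefer the site with the most free CPUs.
    return max(matches, key=lambda r: r["FreeCPUs"])["site"]


# Requirement in the spirit of the demo job: other.OpSys == "RH 6.2"
print(select_site(published, {"OpSys": "RH 6.2"}))
```

Separating the match (does the site satisfy the requirements?) from the rank (which matching site is best?) keeps the user-facing job description independent of how sites are prioritised.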

Page 14:

Generic HEP application flowchart

Job arguments
Data Type: raw/dst
Run Number: xxxxxx
Number of evts: yyyyyy
Number of wds/evt: zzzzzz
Rep Catalog flag: 0/1
Mass Storage flag: 0/1

[Flowchart: first decision "Raw/dst ?"]

raw branch: Generate raw events on local disk; Write logbook (raw_xxxxxx_dat.log); Move to SE, MS ?; Add lfn/pfn to Rep Catalog

dst branch: Get pfn from Rep Catalog; pfn local ? (y/n); if n, Copy raw data from SE to local disk; Read raw events, write dst events; Write logbook (dst_xxxxxx_dat.log); Move to SE, MS ?; Add lfn/pfn to Rep Catalog
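The flowchart logic can be sketched as a function that returns the ordered steps for one job. This is a reading of the diagram, not the demo program itself: the function name, the dictionary standing in for the Rep Catalog, and the rule that the Mass Storage flag chooses between MS and SE are all assumptions.

```python
def run_job(data_type, run_number, rep_catalog_flag, mass_storage_flag,
            catalog, local_pfns):
    """Return the ordered steps for one job, following the flowchart labels.

    catalog: dict mapping lfn -> pfn (stand-in for the Rep Catalog).
    local_pfns: set of pfns already present on local disk.
    """
    steps = []
    if data_type == "raw":
        steps.append("generate raw events on local disk")
    elif data_type == "dst":
        raw_pfn = catalog[f"raw_{run_number}"]   # get pfn from Rep Catalog
        if raw_pfn not in local_pfns:            # "pfn local ?" answered n
            steps.append("copy raw data from SE to local disk")
        steps.append("read raw events, write dst events")
    else:
        raise ValueError(f"unknown data type: {data_type!r}")

    steps.append(f"write logbook {data_type}_{run_number}_dat.log")
    # Assumption: the Mass Storage flag decides between MS and SE.
    steps.append("move to MS" if mass_storage_flag else "move to SE")
    if rep_catalog_flag:
        steps.append("add lfn/pfn to Rep Catalog")
    return steps


# A raw job for run 100147 with both flags off.
print(run_job("raw", 100147, 0, 0, catalog={}, local_pfns=set()))
```

Returning the steps rather than performing them keeps the sketch testable without any grid services.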

Page 15:

demo.jdl

Executable = demo.csh;
Arguments = raw 100147 50 1000 0 0;
StdInput = none;
StdOutput = demo.out;
StdError = demo.err;
InputSandbox = {demo.csh,main.exe};
OutputSandbox = {demo.out,demo.err,demo.log};
Requirements = other.OpSys == "RH 6.2";

dg-job-submit demo.jdl

dg-job-get-output job-id
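To make the structure of the job description concrete, the sketch below splits "Name = value;" statements into a dictionary. It is a toy reader for the fields shown on this slide, not a JDL parser: it does not handle semicolons inside quoted strings, and the pairing of the six Arguments values with the job-argument names from the flowchart slide is an assumption based on their order.

```python
def parse_jdl(text):
    """Split 'Name = value;' statements into a dict; {a,b} values become lists."""
    fields = {}
    for statement in text.split(";"):
        if "=" not in statement:
            continue
        name, value = statement.split("=", 1)  # split on the first '=' only
        value = value.strip()
        if value.startswith("{") and value.endswith("}"):
            value = [item.strip() for item in value[1:-1].split(",")]
        fields[name.strip()] = value
    return fields


demo_jdl = """
Executable = demo.csh;
Arguments = raw 100147 50 1000 0 0;
InputSandbox = {demo.csh,main.exe};
OutputSandbox = {demo.out,demo.err,demo.log};
Requirements = other.OpSys == "RH 6.2";
"""

job = parse_jdl(demo_jdl)

# Assumed order: data type, run number, events, words/event, catalog flag, MS flag.
names = ["data_type", "run", "n_events", "words_per_event", "rep_catalog", "mass_storage"]
arguments = dict(zip(names, job["Arguments"].split()))
print(job["OutputSandbox"], arguments["run"])
```

Under that assumed ordering, the demo job would generate 50 raw events of 1000 words each for run 100147, with both the catalog and mass-storage steps switched off.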

[Figure: job flow for the demo. Labels: User ITF Node; Input sandbox; Output sandbox; Workload manager; Information system; File catalog server; COMPUTING (five sites); STORAGE (five sites); data]