Download - The Grid is a complex, distributed and heterogeneous execution environment. Running applications requires the knowledge of many grid services: users need.

Transcript
Page 1: The Grid is a complex, distributed and heterogeneous execution environment. Running applications requires the knowledge of many grid services: users need.

Grid-BasedData Selector

CompositionalAnalysis Tool

(CAT)

DAXGenerator

Pegasus

CondorDAGMAN

PathwayComposition

Tool

GRID

host1host2

Data

Data

CAT KnowledgeBase

SCEC DatatypeDB

MetadataCatalog Service

ReplicaLocationService

Dax

Dag

Rsl

HAZARD MAP

The Grid is a complex, distributed and heterogeneous execution environment. Running applications requires the knowledge of many grid services: users need to discover the available resources and schedule the jobs onto them, essentially composing detailed application workflow descriptions by hand. This leaves users struggling with the complexity of the Grid and weighing which resources to use, where to run the computations, where to access the data etc. Thus there is a need to automate the workflow generation and execution process as much as possible.

Pegasus: Planning for Execution in Grids http://pegasus.isi.edu

Maps from abstract to concrete workflow.Isolates the user from many Grid details. Automatically locates physical locations for both components

(transformations) and data, via Globus RLS

and the Transformation Catalog.Finds appropriate resources to execute the components

(via Globus MDS).Interfaces with external site selectors.Publishes newly derived data products.Reuses existing data products where applicable.Supports on demand staging of binary executables.

Other Success StoriesLaser Interferometer Gravitational Wave Observatory (LIGO)

http://www.ligo.caltech.eduMontage http://montage.ipac.caltech.eduBLAST Genome Analysis and Database Update

http://www-fp.mcs.anl.gov/pdq/pdq.htmATLAS Monte Carlo data productionSloan Digital Sky Survey galaxy cluster finding http://www.sdss.org

Planning the SCEC Pathways: Planning the SCEC Pathways: Pegasus at work on the GridPegasus at work on the Grid

People Involved:ISI : Ewa Deelman, Sridhar Gullapalli, Carl Kesselman, John McGee

Gaurang Mehta, Gurmeet Singh, Mei-Hui Su, Karan Vahi

SCEC: Vipin Gupta, Phil Maechling

USC : Maureen Dougherty, Brian Mendenhall, Garrick Staples

SCEC COMPOSITION PROCESS

CAT (Compositional Analysis Tool) an ontology based

workflow composition tool or PCT (Pathway Composition Tool)

generate the application workflows template (using ontologies and data types).

The Grid-Based Input Data selection component allows the user to select the input data necessary to populate the workflow template. The result in an abstract workflow that refers only to the logical application components and logical input data required for a pathway.

The DAX generator translates the abstract workflow to a corresponding XML description (DAX).

Pegasus takes in the DAX and generates the concrete

workflow. Concrete Workflow identifies the resources that are used to

run on the grid and refers to the physical locations of input data.

Condor DAGMAN submits the workflow on the grid and tracks

the execution of the workflow. Successful execution generates the final hazard map for the

region.

A View of SCEC Composition Process

PEGASUS ENGINE

CPlanner (gencdag)

Rls-client Tc-clientGenpoolconfig

client

Data Transfer Mechanism

Gridlabtransfer

Transfer2

Multiple Transfer

Globus-url-copy

Stork

Transformation Catalog

Mechanism(TC)

DatabaseFile

Resource Information

Catalog

MDS File

Submit Writer

CondorStork Writer

GridLab GRMS

Pegasus command line clients

RoundRobin

Site SelectorMin-Min

Max-MinProphesy

Random

Grasp

RLS

Replica Query and Registration

Mechanism

Replica Selection

Existing Interfaces

Research Implementations

Production Implementations

Interfaces in development

RLS

Palm Springs

Caltech Teragrid

SCEC

NCSA-Teragrid

SDSC-Teragrid

USC

ANL-Teragrid

PSC-Teragrid

SCEC Resources Teragrid Resources Other Resources

Testbed

SCEC Workflow

Preparation job prepares input for

Pathway 2 simulation. Pathway 2 is Fortran based, wave

propagation MPI code. Pathway2PGV reads in binary

output file generated by Pathway2,

and converts it into hazard map that

can be visualized.