Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang...

18
Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems Laboratory University of Chicago & Argonne National Laboratory Gabrielle Allen, Thomas Dramlitsch, Ed Seidel, Thomas Radke Max-Planck-Institut für Gravitationsphysik UCSD, UIUC, U.Tenn GrADS Groups
  • date post

    22-Dec-2015
  • Category

    Documents

  • view

    217
  • download

    1

Transcript of Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang...

Page 1: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Cactus-G:Experiments with a Grid-Enabled

Computational Framework

Dave Angulo, Ian Foster

Chuang Liu, Matei Ripeanu, Michael RussellDistributed Systems Laboratory

University of Chicago & Argonne National Laboratory

Gabrielle Allen, Thomas Dramlitsch,

Ed Seidel, Thomas RadkeMax-Planck-Institut für Gravitationsphysik

UCSD, UIUC, U.Tenn GrADS Groups

Page 2: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Overview

Research goals: Why Cactus-G Context: Numerical relativity, Cactus, dynamic

Grid computing The Cactus-G Grid-enabled framework Cactus and GrADS

– The Cactus Worm model problem

– Dynamic resource selection & code migration

– Experimental results Future directions & lessons learned

Page 3: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Research Goals

Investigate methods and structures for efficient Grid execution via in-depth study of a demanding application, including– Constructs for adapting to heterogeneity

– Constructs for dynamic resource acquisition Create testbed for GrADSoft components,

as they emerge Investigate utility of computational

frameworks as facilitator of Grid computing

Page 4: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Context (1):Numerical Relativity

Numerical simulation of extreme astrophysical events: colliding black holes, neutron stars, etc.– Understand physics

– Predict gravitational wave forms Relativistic effects => Einstein eqns

– Computationally intensive (can be 1000s flops/grid point)

– 3-D simulations only recently possible: demanding users

LIGO gravitational wave observatory

Colliding black holes

Page 5: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Context (2): Cactus(Allen, Dramlitsch, Seidel, Shalf, Radke)

Modular, portable framework for parallel, multidimensional simulations

Construct codes by linking– Small core (flesh): mgmt services

– Selected modules (thorns): Numerical methods, grids & domain decomps, visualization and steering, etc.

Custom linking/configuration tools Developed for astrophysics, but not

astrophysics-specific

Cactus “flesh”

Thorns

Page 6: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Context (3):Dynamic Grid Computing

Application behaviors in a Grid environment:– Identify fastest/cheapest/biggest resources

– Configure for efficient execution

– Detect need for new resources or behaviors (e.g., due to resource slowdown, new subtasks, new appln regime, user steering, new resource available)

– Adapt, and/or discover new resources; invoke subtasks on new resources and/or migrate

We have users who want these behaviors; we also have the enabling machinery

Page 7: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Cactus-G: An ApplicationFramework for Dynamic Grid Computing

Cactus thorns for active management of application behavior and resource use

Heterogeneous resources, e.g.:– Irregular decompositions

– Variable halo for managing message size

– Msg compression (comp/comm tradeoff)

– Comms scheduling for comp/comm overlap Dynamic resource behaviors/demands, e.g.:

– Perf monitoring, contract violation detection

– Dynamic resource discovery & migration

– User notification and steering

Page 8: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Cactus-G Example:Terascale Computing

Gig-E100MB/sec

SDSC IBM SP1024 procs5x12x17 =1020

NCSA Origin Array256+128+1285x12x(4+2+2) =480

OC-12 lineBut only 2.5MB/sec)

17

12

512

5

4 2 2

Solved EEs for gravitational waves (real code)– Tightly coupled, communications required through derivatives– Must communicate 30MB/step between machines– Time step take 1.6 sec

Used 10 ghost zones along direction of machines: communicate every 10 steps

Compression/decomp. on all data passed in this direction Achieved 70-80% scaling, ~200GF (only 14% scaling without tricks)

Page 9: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Cactus-G Model Problem:The Cactus Worm

Migrate to “faster/ cheaper” system– When better system

discovered

– When requirements change

– When characteristics change (e.g., competition)

Tests most elements of Cactus-G & GrADS

Page 10: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Cactus Worm Architecture

Cactus “flesh”

“Tequila” Thorn

Appln& otherthorns

Globus Toolkit substrate: resourcediscovery, allocation, management

GrADS Mechanisms

Resourceselector

Applicationmanager

Page 11: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Tequila Thorn Functions

Initiate adaptation on any one of– User request (e.g., HTTP thorn)

– Notification of new resources

– Application monitoring: contract violation Request resources (ClassAd protocol)

– E.g., GrADS ResourceSelector Checkpoint application Contact App Manager to request restart

– Security, robustness advantages vs. direct restart

Page 12: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Cactus WormDetailed Architecture & Operation

Cactus “flesh”

“Tequila” Thorn

Computeresource

Computeresource

Coderepository…

Coderepository

Storageresource

Storageresource

GridInformation

Service

GrADSResourceSelector

ApplicationManager

Appln& otherthorns

(1) Adapt.request (2)

Resourcerequest

(3) Writecheckpoint

(4) Migrationrequest

(5) Cactusstartup

(7) Readcheckpoint

(0) Possibleuser input

(6) Loadcode

(1’) Resourcenotification

Storemodels, etc.

Query

Page 13: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Contract Monitor

Driven by three user-controllable parameters– Time quantum for “time per iteration”

– % degradation in time per iteration (relative to prior average) before noting violation

– Number of violations before migration Potential causes of violation

– Competing load on CPU

– Computation requires more processing power: e.g., mesh refinement, new subcomputation

– Hardware problems

Page 14: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Current Status

We have developed– Tequila thorn: monitoring, selection, control

– ResourceSelector (via ClassAd protocol)

– Cactus performance model We have demonstrated on GrADS Macro Grid

– Contract monitoring for multiprocessor runs

– Dynamic resource selection

– Migration

Page 15: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1 3 5 7 9

11

13

15

17

19

21

23

25

27

29

31

33

Clock Time

Ite

rati

on

s/S

ec

on

d

Migration in Action

Loadapplied

3 successivecontract

violations

RunningAt UIUC

(migrationtime not to scale)

Resourcediscovery

& migration

RunningAt UC

Page 16: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Ongoing and Future Work:New and Improved Capabilities

Optimize migration process Use performance models during selection

– And include cost of migration & information about future computation in the model

Matchmaker-based ResourceSelector– Separation of concerns between resource

characterization and selection– Study resource characterization process, use

NWS-based prediction techniques Dynamic notification of availability of “better”

resources

Page 17: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Ongoing and Future Work:Further Integration with GrADSoft

Contract monitoring– Pablo– Issues: determining which thorns are

monitored, or if flesh is monitored Program Preparation System Configurable Object Program and

Application Launcher – Cactus has its own launcher and compiles

its own code

Page 18: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.

Grid Application Development Software Project

Lessons Learned and Outcomes

Lessons learned– A real & demanding application can exploit

adaptive techniques to execute efficiently in Grid environments

– Even a relatively regular application can incorporate a range of useful mechanisms for adaptive behaviors & resource demands

Outcomes– Prototype Cactus-G framework: wonderful

experimental platform