RAMSES: Robust Analytic Models for Science at Extreme Scales

Gagan Agarwal¹*, Prasanna Balaprakash², Ian Foster²*, Raj Kettimuthu², Sven Leyffer², Vitali Morozov², Todd Munson², Nagi Rao³*, Saday Sadayappan¹, Brad Settlemyer³, Brian Tierney⁴*, Don Towsley⁵*, Venkat Vishwanath², Yao Zhang². ¹Ohio State University, ²Argonne National Laboratory, ³Oak Ridge National Laboratory, ⁴ESnet, ⁵UMass Amherst (*Co-PIs). Advanced Scientific Computing Research. Program manager: Rich Carlson

Description

RAMSES: A new project in data-driven analytical modeling of distributed systems RAMSES is a new DOE-funded project on the end-to-end analytical performance modeling of science workflows in extreme-scale science environments. It aims to link multiple threads of inquiry that have not, until now, been adequately connected: namely, first-principles performance modeling within individual sub-disciplines (e.g., networks, storage systems, applications), and data-driven methods for evaluating, calibrating, and synthesizing models of complex phenomena. What makes this fusion necessary is the drive to explain, predict, and optimize not just individual system components but complex end-to-end workflows. In this talk, I will introduce the goals of the project and some aspects of our technical approach.

Transcript of RAMSES: Robust Analytic Models for Science at Extreme Scales

Page 1: RAMSES: Robust Analytic Models for Science at Extreme Scales

Gagan Agarwal1* Prasanna Balaprakash2 Ian Foster2* Raj Kettimuthu2

Sven Leyffer2 Vitali Morozov2 Todd Munson2 Nagi Rao3*

Saday Sadayappan1 Brad Settlemyer3 Brian Tierney4* Don Towsley5*

Venkat Vishwanath2 Yao Zhang2

1 Ohio State University 2 Argonne National Laboratory 3 Oak Ridge National Laboratory 4 ESnet 5 UMass Amherst (* Co-PIs)

Advanced Scientific Computing Research

Program manager: Rich Carlson

Page 2: RAMSES: Robust Analytic Models for Science at Extreme Scales

[Diagram: source data store → Wide Area Network → destination data store]

Prediction, explanation, & optimization are

challenging for even “simple” E2E workflows

For example, file transfer, for which we want to:

• Predict achievable throughput for a specific configuration

• Explain factors influencing performance

• Optimize parameter values to achieve high speeds

Page 3: RAMSES: Robust Analytic Models for Science at Extreme Scales

[Diagram: source data transfer node (application, OS, FS stack, TCP/IP, NIC, HBA/HCA) → LAN switch → router → Wide Area Network → router → LAN switch → destination data transfer node, with a storage array and a Lustre file system (OSS/OST, MDS/MDT) behind the nodes]

Prediction, explanation, & optimization are

challenging for even “simple” E2E workflows

+ diverse environments + diverse workloads + contention

Page 4: RAMSES: Robust Analytic Models for Science at Extreme Scales

85 Gbps sustained disk-to-disk over 100 Gbps network, Ottawa—New Orleans

Raj Kettimuthu and team, Argonne

Page 5: RAMSES: Robust Analytic Models for Science at Extreme Scales

High-speed transfers to/from AWS cloud,

via Globus transfer service

• UChicago → AWS S3 (US region): sustained 2 Gbps

– 2 GridFTP servers, GPFS file system at UChicago

– Multi-part upload via 16 concurrent HTTP connections

• AWS → AWS (same region): sustained 5 Gbps

go#s3

Page 6: RAMSES: Robust Analytic Models for Science at Extreme Scales

Endpoint aps#clutch has transfers to 125 other endpoints

One Advanced Photon Source data node: 125 destinations

Page 7: RAMSES: Robust Analytic Models for Science at Extreme Scales

Same node (1 Gbps link)

Page 8: RAMSES: Robust Analytic Models for Science at Extreme Scales
Page 9: RAMSES: Robust Analytic Models for Science at Extreme Scales


Page 10: RAMSES: Robust Analytic Models for Science at Extreme Scales

How to create more accurate, useful, and

portable models of such systems?

Simple analytical model: T = α + β·l, where α is the startup cost, β the per-byte cost (inverse sustained bandwidth), and l the transfer size.

Experiment + regression to estimate α, β
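The regression step on this slide can be sketched in a few lines. The measurement values below are hypothetical, and the units (bytes, seconds) are chosen only for illustration.

```python
import numpy as np

# Hypothetical measurements: transfer sizes l (bytes) and observed times T (s).
l = np.array([1e6, 1e7, 1e8, 1e9, 4e9])
T = np.array([0.6, 0.68, 1.3, 8.5, 33.0])

# Least-squares fit of the model T = alpha + beta * l.
A = np.vstack([np.ones_like(l), l]).T
(alpha, beta), *_ = np.linalg.lstsq(A, T, rcond=None)

print(f"startup cost alpha ~ {alpha:.2f} s")
print(f"sustained bandwidth ~ {1 / beta / 1e9:.3f} GB/s")
```

With real transfers, one would collect many (size, time) pairs per configuration and refit per path; the point here is only the shape of the estimation step.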

First-principles modeling

to better capture details

of system & application

components

Data-driven modeling to

learn unknown details of

system & application

components

[The two are linked by model composition and model–data comparison]

Page 11: RAMSES: Robust Analytic Models for Science at Extreme Scales

The RAMSES vision

To develop a new science of end-to-end

analytical performance modeling that will

transform understanding of the behavior of

science workflows in extreme-scale science

environments.

Based on integration of first-principles and

data-driven modeling, and structured

approach to model evaluation & composition


Page 12: RAMSES: Robust Analytic Models for Science at Extreme Scales

Modeling

Develop, evaluate,

and refine component

and end-to-end models

Tools

Develop easy-to-use

tools to provide end-

users with actionable

advice

Estimation

Develop and apply data-

driven estimation methods:

differential regression,

surrogate models,

etc.

Experiments

Extensive, automated

experiments to test models

& build database

The RAMSES research agenda & platform

[Platform components: Evaluators, Advisor, Tester, Estimators, Database]

Page 13: RAMSES: Robust Analytic Models for Science at Extreme Scales

We are informed by five challenge workflows


Transfer: High-performance, end-to-end

file transfer

Scattering: Capture and analysis of

diffuse scattering experimental data

MapReduce: Data-intensive, distributed

data analytics

Exascale: Performance of exascale

application kernels on memory hierarchies

In-situ: Configuration and placement of in-

situ analysis computations

Page 14: RAMSES: Robust Analytic Models for Science at Extreme Scales

[Diagram, as on page 3: source data transfer node (application, OS, FS stack, TCP/IP, NIC, HBA/HCA) → LAN switch → router → Wide Area Network → router → LAN switch → destination data transfer node, with a storage array and a Lustre file system (OSS/OST, MDS/MDT)]

Predict: Throughput for configuration

Explain: Factors influencing performance

Optimize: Parameters for high speeds

Transfer: End-to-end file movement

Page 15: RAMSES: Robust Analytic Models for Science at Extreme Scales

Scattering: Linking simulation and

experiment to study disordered structures

Diffuse scattering images from Ray Osborn et al., Argonne

[Workflow diagram: sample → experimental scattering; material composition (La 60%, Sr 40%) → simulated structure → simulated scattering. Loop: detect errors (secs–mins); select experiments (mins–hours); simulations driven by experiments (mins–days); contribute to knowledge base. The knowledge base (past experiments, simulations, literature, expert knowledge) supports knowledge-driven decision making and evolutionary optimization.]

Page 16: RAMSES: Robust Analytic Models for Science at Extreme Scales

Immediate assessment of alignment quality in

near-field high-energy diffraction microscopy

[Workflow diagram: a single workflow consuming up to 2.2 M CPU hours per week, running on Blue Gene/Q and Orthros with all data in NFS.

Detector → dataset (360 files, 4 GB total)

1: Median calc (MedianImage.c), 75 s (90% I/O); uses Swift/K

2: Peak search (ImageProcessing.c), 15 s per file; uses Swift/K → reduced dataset (360 files, 5 MB total)

3: Generate parameters (FOP.c), 50 tasks, 25 s/task, ¼ CPU hours; uses Swift/K

3: Convert bin L to N, 2 min for all files; converts files to network-endian format

4: Analysis pass (FitOrientation.c), 60 s/task (PC), 1667 CPU hours; 60 s/task (BG/Q), 1667 CPU hours; uses Swift/T, GO Transfer; feedback to experiment

Globus Catalog holds scientific metadata and workflow progress; workflow control via script, bash, manual steps, and ssh.]

Hemant Sharma, Justin Wozniak, Mike Wilde, Jon Almer

Page 17: RAMSES: Robust Analytic Models for Science at Extreme Scales

MapReduce: Distributing data and

computation for data analytics

[Diagram: a local cluster and a cloud environment, each with a master handling job assignment and slaves holding data and performing local reduction; an index supports remote data analysis; results are combined by a global reduction.]

Page 18: RAMSES: Robust Analytic Models for Science at Extreme Scales

Exascale simulation

HACC Cosmology

• Compute-intensive phase with regular stride-one access

• Tree-walk phase: irregular memory access with high branching and integer ops

• 3D FFT communication-intensive phase

• I/O phase

Images courtesy: Joseph Insley (Argonne)

Nek5000 CFD

• Matrix–vector product phase

• Conjugate gradient iteration

• Communication phase involving nearest-neighbor exchange and vector reductions

Page 19: RAMSES: Robust Analytic Models for Science at Extreme Scales

In situ analysis on the DOE Leadership Computing Infrastructure

[Diagram: compute resource (multi-petaflop; high-radix interconnect: Dragonfly, 5D torus), I/O nodes, switch complex (IB, 1536 GB/s), file server nodes, storage system, analysis nodes/cluster, and DTN nodes; four numbered locations (1–4) marked along the path]

We need to perform the right computation at the right place and time, taking into account details of the simulation, resources, and analysis

Page 20: RAMSES: Robust Analytic Models for Science at Extreme Scales

A diverse set of components

[Table: challenge workflows (rows: Transfer, Scattering, Exascale, Distributed MapReduce, In-Situ) vs. components (columns: server, parallel computer, router, storage system, LAN, WAN, TCP/UDT, GridFTP, file systems, GridFTP server, Nekbone, HACC, checksum, encryption, MapReduce, other apps). Y entries mark the components each workflow involves: Transfer 11, Scattering 8, Exascale 6, Distributed MapReduce 9, In-Situ 8; the column alignment of the Y marks was lost in extraction.]

Page 21: RAMSES: Robust Analytic Models for Science at Extreme Scales

Develop, evaluate, and refine

component and end-to-end

models

• Models from the literature

• Fluid models for network flows

• SKOPE modeling system

Develop and apply

data-driven

estimation methods

• Differential regression

• Surrogate models

• Other methods from literature

Develop easy-to-use tools to

provide end-users with

actionable advice

• Runtime advisor, integrated

with Globus transfer system

Automated experiments to

test models and build

database

• Experiment design

• Testbeds

Page 22: RAMSES: Robust Analytic Models for Science at Extreme Scales

SKOPE performance modeling framework

[Diagram. Input: source code and workload input → (user effort, semi-automated with a source-to-source translator) → code skeletons in the SKOPE language. Front end: parser → per-function intermediate representation (block skeleton trees) → behavior modeling engine → execution-based intermediate representation (Bayesian execution tree). Back end (automatic): transformation engine → transformed Bayesian execution tree; characterization engine with hardware model / system specifications → performance projection. Output: performance projection, schema for suggested transformations, synthesized characteristics, bottleneck analysis.]

Page 23: RAMSES: Robust Analytic Models for Science at Extreme Scales

Differential regression for combining

data from different sources

Example of use: predict performance on connection length L not realizable on physical infrastructure, e.g., IB-RDMA or HTCP throughput on a 900-mile connection.

1) Make multiple measurements of performance on path lengths d:

– MS(d): OPNET simulation

– ME(d): ANUE-emulated path

– MU(d): real network (USN)

2) Compute measurement regressions on d: ṀA(·), A ∈ {S, E, U}

3) Compute differential regressions: ∆ṀA,B(·) = ṀA(·) − ṀB(·), A, B ∈ {S, E, U}

4) Apply a differential regression to a simulated/emulated measurement point MC(d) to obtain a regression estimate for the real network, C ∈ {S, E}:

𝓜U(d) = MC(d) − ∆ṀC,U(d)
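Steps 1–4 can be sketched as follows. All throughput numbers, path lengths, and the linear trends are hypothetical, chosen only to illustrate how the differential regression corrects an emulated measurement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical throughput (Gbps) vs. path length d (miles) from two sources:
# a densely sampled emulated path (E) and a sparsely sampled real network (U).
d_E = np.linspace(100, 2000, 20)
m_E = 9.5 - 0.002 * d_E + rng.normal(0, 0.05, d_E.size)
d_U = np.array([200.0, 500.0, 1400.0])
m_U = 9.0 - 0.002 * d_U + rng.normal(0, 0.05, d_U.size)

# Step 2: measurement regressions (linear fits here for simplicity).
M_E = np.poly1d(np.polyfit(d_E, m_E, 1))
M_U = np.poly1d(np.polyfit(d_U, m_U, 1))

# Step 3: differential regression, the systematic emulator-vs-real offset.
dM = lambda d: M_E(d) - M_U(d)

# Step 4: take a fresh emulated measurement at a length never realized on
# the physical infrastructure (900 miles) and correct it.
d_star = 900.0
m_E_star = 9.5 - 0.002 * d_star + rng.normal(0, 0.05)
estimate = m_E_star - dM(d_star)
print(f"estimated real-network throughput at {d_star:.0f} miles: {estimate:.2f} Gbps")
```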

Page 24: RAMSES: Robust Analytic Models for Science at Extreme Scales

We will extend the differential regression

method in several areas

• To compare different component models

– E.g., different models of network elements, storage

systems, protocol implementations

• To compare different composite models

– E.g., different methods for combining memory and

CPU models

• To compare model outputs with measurements


Page 25: RAMSES: Robust Analytic Models for Science at Extreme Scales

[Diagram: a component model for component i maps system parameters p_i and task-size parameters s_i to cost terms and a performance-quality model, drawing on analytical and empirical models; experiment design (active learning) selects measurements. Q_i(p_i, s_i) is a regression estimate of component i's performance.]

Page 26: RAMSES: Robust Analytic Models for Science at Extreme Scales

End-to-end profile composition

[Diagram: source LAN profile (configuration for host and edge devices) + WAN profile (configuration for WAN devices) + destination LAN profile (configuration for host and edge devices), combined via composition operations.]

Page 27: RAMSES: Robust Analytic Models for Science at Extreme Scales

End-to-end model composition & analysis

• End-to-end model using composition

– It is an approximation, due to component interactions not modelled by the composition operator

• Actual end-to-end performance model

– Component models are "corrected" to account for un-modelled effects; this form is assumed to exist
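A minimal sketch of composition plus correction, under strong simplifying assumptions: the two component models are invented, min() serves as the composition operator for a serial pipeline, and the un-modelled effects are absorbed into a constant learned from end-to-end residuals. The project's actual operators and corrections are richer than this.

```python
import numpy as np

def q_storage(p):
    # Hypothetical storage-side model: throughput (Gbps) vs. stream count.
    return min(12.0, 1.5 * p["streams"])

def q_network(p):
    # Hypothetical network-side model, capped at the link rate.
    return min(p["link_gbps"], 1.8 * p["streams"])

def compose(p):
    # Composition operator: a serial pipeline runs at its slowest stage.
    return min(q_storage(p), q_network(p))

# End-to-end measurements expose interactions the composition misses;
# fit a correction Delta from the residuals (a crude constant here).
configs = [{"streams": s, "link_gbps": 10.0} for s in (1, 2, 4, 8)]
measured = np.array([1.7, 3.4, 6.1, 8.0])  # hypothetical runs
residual = measured - np.array([compose(p) for p in configs])
delta = residual.mean()

# Corrected end-to-end estimate for a new configuration.
corrected = compose({"streams": 4, "link_gbps": 10.0}) + delta
print(f"corrected estimate: {corrected:.3f} Gbps")
```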

Page 28: RAMSES: Robust Analytic Models for Science at Extreme Scales

Using end-to-end measurements and differential

regression to correct regression estimates

• Regression estimate Q̂(p, s) of the composed model Q(p, s)

– "Estimated", since component models are "incomplete" as derived from first principles and/or measurements

• Error due to the regression estimate: E[(Q(p, s) − Q̂(p, s))²]

• The error can be mitigated using end-to-end measurements Q_{p,s}. Corrected estimate of Q(p, s):

Q̃(p, s) = Q̂(p, s) + ∆̂(p, s)

where Q̂(p, s) is the analytical model and ∆̂(p, s) the correction from differential regression using measurements

Page 29: RAMSES: Robust Analytic Models for Science at Extreme Scales

Performance guarantees

• Vapnik–Chervonenkis theory: under finite VC-dim(F)

– Guarantees that the error of the regression estimate is close to optimal with a certain probability:

P{ I(∆̂, Q, p) − I(∆*, Q, p) > ε } < δ(F, l, ε)

with ∆̂ the estimated correction and ∆* the optimal one

– Distribution-free: does not require detailed knowledge of error distributions; uses end-to-end measurements

• Error of the corrected estimate:

I(∆, Q, p) = ∫ [ Q_{p,s} − Q̂(p, s) − ∆(p, s) ]² dP

Page 30: RAMSES: Robust Analytic Models for Science at Extreme Scales

Surrogate modeling framework

to inform choice of experiments

[Diagram: a machine learning & optimization loop proposes informative configurations, first-principles models are evaluated on them, and the resulting performance metrics feed back into the loop.]
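The loop on this slide can be sketched with a deliberately simple uncertainty proxy (not the project's actual method): fit two cheap polynomial surrogates to the measured (configuration, metric) pairs and run the next experiment where they disagree most. Every function and number below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_experiment(x):
    # Stand-in for an expensive measurement of a performance metric.
    return np.sin(3.0 * x) + 0.05 * rng.normal()

X = list(np.linspace(0.0, 1.0, 5))   # initial experiment design
y = [run_experiment(x) for x in X]
candidates = np.linspace(0.0, 1.0, 101)

for _ in range(5):
    s_lo = np.poly1d(np.polyfit(X, y, 2))  # low-order surrogate
    s_hi = np.poly1d(np.polyfit(X, y, 4))  # higher-order surrogate
    # Disagreement between surrogates as a crude uncertainty measure.
    disagreement = np.abs(s_lo(candidates) - s_hi(candidates))
    x_next = float(candidates[np.argmax(disagreement)])
    X.append(x_next)                       # most informative configuration
    y.append(run_experiment(x_next))

surrogate = np.poly1d(np.polyfit(X, y, 4))  # final cheap model of the metric
```

Real surrogate frameworks would use richer models (e.g., Gaussian processes) with principled uncertainty; the loop structure is the point here.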

Page 31: RAMSES: Robust Analytic Models for Science at Extreme Scales

Fluid models of network flows

GridFTP flow i: parallelism k_i, RTT R_i, throughput T_i. Bottleneck router: capacity C, loss rate p, queue Q.

Queue dynamics: dQ/dt = 1{Q > 0} · ( Σ_j T_j − C )

Throughput dynamics: dT_i/dt = k_i / R_i² − T_i(t) · T_i(t − R_i) · p(t − R_i) / (2 k_i)

Solve for throughputs and transfer delays. Special case, known p: the steady state gives T_i = (k_i / R_i) · √(2/p).

Page 32: RAMSES: Robust Analytic Models for Science at Extreme Scales

Our multi-modal approach

[Diagram: source code + workload parameters → code skeletons (SKOPE language) → SKOPE → application behavior models; experiments, historical logs, benchmarks, simulators, emulators → analytical models, regression models, and model composition → system models (current or future); together yielding performance projections.]

Page 33: RAMSES: Robust Analytic Models for Science at Extreme Scales

Application to file transfer

[Diagram, file transfer instance of the multi-modal approach: source code (GridFTP) + workload parameters → code skeletons (SKOPE language) → SKOPE → application behavior models; experiments (iperf, XDD), historical logs, emulators → analytical models (storage, TCP, WAN), regression models, and model composition → system models; together yielding file transfer performance projections.]

Page 34: RAMSES: Robust Analytic Models for Science at Extreme Scales

[Diagram, exascale instance of the multi-modal approach: source code + workload parameters → code skeletons (SKOPE language) → SKOPE → application behavior models; experiments (MPI benchmarks, IOR, DGEMM, Stream), historical logs → analytical models (compute, memory, interconnect), regression models, and model composition → system models; together yielding exascale simulation performance projections.]

[Embedded excerpt describing GROPHECY code skeletons:]

As a pedagogical example, the code skeleton for dense matrix multiplication (denoted MatMul) is shown in Listing 2. The corresponding CPU code is shown in Listing 1 in C. The syntax of a code skeleton is not the focus of this paper. It is briefly introduced in the comments of the example code skeletons and is not discussed in further detail.

Listing 1: MatMul's CPU code

     1  float A[N][K], B[K][M];
     2  float C[N][M];
     3  int i, j, k;
     4  for (i = 0; i < N; ++i) {
     5    for (j = 0; j < M; ++j) {
     6      float sum = 0;
     7      for (k = 0; k < K; ++k) {
     8        sum += A[i][k] * B[k][j];
     9      }
    10      C[i][j] = sum;
    11    }
    12  }

Listing 2: MatMul's code skeleton

    float A[N][K]
    float B[K][M]
    float C[N][M]
    /* the loop space */
    parallel_for (N, M): i, j
    {
      /* computation w/ instruction count */
      comp 1
      /* streaming loop */
      stream k = 0:K {
        /* load */
        ld A[i][k]
        ld B[k][j]
        comp 3
      }
      comp 5
      /* store */
      st C[i][j]
    }

Listing 3: MatMul's optimized GPU code

    float A[N][K], B[K][M], C[N][M];
    dim3 block(BlkSize, BlkSize);
    dim3 grid(N/BlkSize, M/BlkSize);
    MatrixMul<<<grid, block>>>(A, B, C);

    __global__ MatrixMul(A, B, C)
    {
      __shared__ a[BlkSize][BlkSize];
      __shared__ b[BlkSize][BlkSize];
      int ty = threadIdx.y;
      int tx = threadIdx.x;
      int y = blockIdx.y * blockDim.y + ty;
      int x = blockIdx.x * blockDim.x + tx;
      float sum = 0.f;
      for (int n = 0; n < K; n += BlkSize) {
        a[ty][tx] = A[y][n+tx];
        b[ty][tx] = B[n+ty][x];
        __syncthreads();
        for (int k = 0; k < BlkSize; ++k) {
          sum += a[ty][k] * b[k][tx];
        }
        __syncthreads();
      }
      C[y][x] = sum;
    }

The following information forms a code skeleton that expresses a computational kernel.

Data parallelism is expressed as a set of parallel, homogeneous tasks repeated over different data elements. Users should express data parallelism in its finest granularity (i.e., down to the innermost parallel for loops).

A task corresponds to one iteration of the innermost parallel for loop. It is expressed as a sequence of data accesses and computation.

Data accesses are expressed as a set of load and store operations. The accessed array elements are expressed given loop indices, array sizes, and other constants. Indirect data accesses can be expressed as well; GROPHECY will assume indirect accesses are random unless users provide further hints (see Section 9.4 and Listing 6).

Computation instructions are counted by using methods described in Section 7.3. Together with the number of memory instructions, they indicate the computational intensity of the kernel.

Branch instructions are counted to judge the applicability of loop unrolling.

For loops wrap around blocks of computation and data accesses to mark repetition within a task. They can be nested, and the nesting does not have to be perfect.

Streaming loops are a special type of for loop; they are marked where a sequence of data elements is fetched, processed, and can be discarded immediately. It is a common pattern for reduction. Streaming loops can be temporally decomposed into stages for the purpose of caching. Line 7 in Listing 1 is an example of a streaming loop.

Macros define array sizes and the number of loop iterations. By adjusting the macros, the same code skeleton can be used for workloads at different scales.

Once constructed, the code skeleton can then be transformed to mimic GPU optimizations. Note that the mimicked GPU implementation can differ significantly from the original CPU code. As an example, Listing 3 shows the GPU kernel of MatMul, where for loops are not only spatially decomposed among threads but also temporally decomposed into stages for the purpose of caching. Both transformations are common and critical in manual GPU optimization.

6. Code Transformations

Given the code skeleton, GROPHECY transforms and lays out code for a target GPU (recall Figure 1, Step 2). This section describes how code layouts are represented (Section 6.1), how the space of possible layouts is searched (Section 6.2), and additional representations and metrics needed to carry out this search (Sections 6.3–6.7).

6.1 Code Layout Parameterization

Code transformation involves the following factors, whose values jointly define a particular code layout.

Thread block sizes, represented as B = {b1, ..., bn}, where n is the dimensionality of the loop space and bi is the length of the thread block in the ith dimension; size(B) denotes the number of threads in a thread block. We vary the thread block size given the loop space and the hardware constraint on the number of threads per block.¹

Staging, or temporally decomposing streaming loops into sequential stages of iterations. Within one stage, a thread block only needs to cache the portion of data elements used in this stage. Staging can be expressed as two integer vectors. For a code skeleton with n streaming loops, S = {s1, ..., sn} contains si, which defines the staging size, or the number of iterations in one stage for the ith streaming loop. Moreover, some consecutive streaming loops actually form a multidimensional streaming loop, whose traversal orders are interchangeable with regard to outer loops and inner loops. Different traversal orders may result in different performances as a result of data locality and caching. Therefore, O = {o1, ..., on} defines the traversal order, where oj is the identifier of the jth streaming loop to be traversed.

Folding, or assigning multiple tasks to one thread. It is represented as F = {f1, ..., fn}, where n is the dimensionality of the loop space and fi is the number of indices assigned to a thread along the ith loop. When folding is not applied, GROPHECY assumes each thread computes one task and fi = 1 for all i's. The folding degree, F, is defined as the total number of tasks assigned to a thread, or ∏ᵢ₌₁ⁿ fi. For the purpose of data reuse and coalescing, folding always assigns neighboring tasks to threads with adjacent thread indices [27]. Once applied, additional loop statements will be added so that a thread can iterate through assigned tasks. These additional loop statements are considered as streaming loops, and staging can be applied.

Caching strategy. The caching strategy categorizes data accesses into uncached accesses to global memory and cached accesses to shared memory. For shared memory, the caching strategy also describes which array segments are cached. We use bounded regular sections (BRS) [12], a derived form of regular section descriptors (RSD) [6, 4], to represent data accesses. A data access statement in the code skeleton can be represented as A(D, Θ, I). D is the array to be accessed. Θ = {θ1, ..., θm}, where θj is the index to D's jth dimension. Each θ can be a function involving I = {I1, ..., In}, which are indices of the loops that contain this data access statement. For all data accesses in a code skeleton, a code layout uses {A} to specify the set of uncached memory accesses and {Ā} to specify the set of cached memory accesses. The shape of D's region cached in shared memory during each stage of the kth streaming loop is denoted with ShMem(Di, k); k = 0 corresponds to cached data for memory accesses outside any streaming loops. ShMem(Di, k) is a footprint defined in Section 6.3 and can be obtained by Equation 5.

Loop unrolling. Loop unrolling reduces instructions due to loop overhead and is especially important for computation-bounded workloads. It can be expressed by L = {l1, ..., ln}, where li is the number of iterations to be unrolled for the ith loop. According to our empirical studies of the NVCC compiler [29], GROPHECY applies loop unrolling to any inner-thread, branch-free loops whose number of iterations can be determined.

¹ In a code layout, the dimensionality of a modeled thread block is not restricted, since a high-dimensional loop space can be flattened and reduced to a lower-dimensional space.

Application to exascale simulation

Page 35: RAMSES: Robust Analytic Models for Science at Extreme Scales

A performance database

• We aim to collect instrumentation data in a

central database to simplify model validation

• We plan to use the perfSONAR measurement

archive tool as a starting point

– REST API on top of Cassandra and Postgres

– Optimized for time series data

– Will extend as needed

– http://software.es.net/esmond/


Page 36: RAMSES: Robust Analytic Models for Science at Extreme Scales

Application to transfer optimization

[Diagram: the Globus service exchanges (1) transfer description, (2) prediction, (3) transfer performance, and (4) user feedback with a performance predictor, performance analyst, model refiner, and user feedback agent, all sharing a parameter database; analysis results drive parameter updates.]

Page 37: RAMSES: Robust Analytic Models for Science at Extreme Scales

Summary

• We focus on the science of modeling: integration

of first-principles and data-driven models; model

composition and evaluation

• Our challenge applications span a broad

spectrum of DOE resources and disciplines

• We see big opportunities for cooperation: e.g.,

on development and evaluation of component

models

www.ramsesproject.org [soon!]


Page 38: RAMSES: Robust Analytic Models for Science at Extreme Scales

Thanks, and for more information

• Thanks to our sponsors:

Advanced Scientific Computing Research

Program manager: Rich Carlson

• Thanks to my RAMSES project co-participants

• For more information, please see

https://sites.google.com/site/ramsesdoeproject/

ianfoster.org and @ianfoster