Chalmers e-Science Initiative Seminar

Transcript of Chalmers e-Science Initiative Seminar

Page 1: Chalmers e-Science Initiative Seminar

The Mapper project receives funding from the EC's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° RI-261507.

Multiscale Applications on European e- Infrastructures

Marian Bubak AGH Krakow PL and University of Amsterdam NL

on behalf of the MAPPER Consortium

http://www.mapper-project.eu/

Chalmers e-Science Initiative Seminar2 Dec 2011

Page 2: Chalmers e-Science Initiative Seminar


Academic Computer Centre

CYFRONET AGH (1973)

120 employees

http://www.cyfronet.pl/en/


Department of Computer Science AGH (1980)

800 students, 70 employees
http://www.ki.agh.edu.pl/uk/index.htm

Faculty of Electrical Engineering, Automatics, Computer Science and

Electronics (1946)

4000 students, 400 employees

http://www.eaie.agh.edu.pl/


AGH University of Science and Technology (1919)

15 faculties, 36000 students, 4000 employees
http://www.agh.edu.pl/en

Other 14 faculties

Distributed Computing Environments (DICE) Team

http://dice.cyfronet.pl

About the speaker

University of Amsterdam, Institute for Informatics, Computational Science
http://www.science.uva.nl/~gvlam/wsvlam


Page 3: Chalmers e-Science Initiative Seminar


DICE team (http://dice.cyfronet.pl)

• Main research interests:

• investigation of methods for building complex scientific collaborative applications and large-scale distributed computing infrastructures

• elaboration of environments and tools for e-Science

• development of a knowledge-based approach to services, components, and their semantic composition and integration

CrossGrid 2002-2005

interactive compute- and data-intensive applications

K-Wf Grid 2004-2007

knowledge-based composition of grid workflow applications

CoreGRID 2004-2008

problem solving environments, programming models

GREDIA 2006-2009

grid platform for media and banking applications

ViroLab 2006-2009

GridSpace virtual laboratory

PL-Grid 2009-2011

advanced virtual laboratory

gSLM 2010-2012

service level management for grid and clouds

UrbanFlood

2009-2012

Common Information Space for Early Warning Systems

MAPPER 2010-2013

computational strategies, software and services for distributed multiscale simulations

VPH-Share 2011-2015

federating cloud resources for development and execution of VPH computationally and data intensive applications

Collage 2011-?

Executable Papers; 1st award of Elsevier Competition at ICCS2011

Page 4: Chalmers e-Science Initiative Seminar


Plan

• Multiscale applications
• Multiscale modeling
• Objectives of the MAPPER project
• Programming and Execution Tools
• MAPPER infrastructure
• ISR application scenario
• Summary

Page 5: Chalmers e-Science Initiative Seminar


Vision

• Distributed Multiscale Computing...

– strongly science driven, application pull

• ...on existing and emerging European e-Infrastructures...

• ...exploiting as much as possible the services and software developed in earlier (EU-funded) projects

– strongly technology driven, technology push

Page 6: Chalmers e-Science Initiative Seminar


Nature is multiscale

• Natural processes are multiscale

– 1 H2O molecule

– A large collection of H2O molecules, forming H-bonds

– A fluid called water, and, in solid form, ice.

Page 7: Chalmers e-Science Initiative Seminar


Multiscale modeling

• Scale Separation Map

• Nature acts on all the scales

• We set the scales

• And then decompose the multiscale system in single scale sub-systems

• And their mutual coupling

[Scale Separation Map: spatial scale axis from x to L, temporal scale axis from t to T]
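The decomposition described above can be sketched in code. This is an illustrative sketch only (the `Submodel` class and field names are hypothetical, not MAPPER's API): a multiscale model is represented as single-scale submodels, each with its own characteristic scales, plus the couplings between them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Submodel:
    name: str
    spatial_scale_m: float   # characteristic spatial scale [m]
    temporal_scale_s: float  # characteristic temporal scale [s]

# two toy single-scale processes on the same spatial scale
blood_flow = Submodel("blood_flow", 1e-2, 1e-3)      # fast process
tissue = Submodel("tissue_growth", 1e-2, 3.6e3)      # slow process

model = {
    "submodels": [blood_flow, tissue],
    "couplings": [("blood_flow", "tissue_growth")],  # mutual coupling
}

# the temporal-scale ratio quantifies how separated the scales are
separation = tissue.temporal_scale_s / blood_flow.temporal_scale_s
```

A large `separation` is what justifies solving each subsystem with its own dedicated solver rather than one monolithic model.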

Page 8: Chalmers e-Science Initiative Seminar


From a Multiscale System to many Singlescale Systems

• Identify the relevant scales

• Design specific models which solve each scale

• Couple the subsystems using a coupling method

[Scale Separation Map: spatial scale axis from x to L, temporal scale axis from t to T]

Page 9: Chalmers e-Science Initiative Seminar


Why multiscale models?

• There is simply no hope of computationally tracking complex natural processes at their finest spatio-temporal scales

– even with the ongoing growth in computational power.
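A back-of-envelope calculation makes the point concrete (the numbers are illustrative): resolving even a 1 cm piece of material at a micrometre resolution in 3D already yields an astronomical number of grid points.

```python
# spatial cost of brute force at the finest scale
domain_m = 1e-2                           # 1 cm domain
dx_m = 1e-6                               # 1 micrometre resolution
cells_per_dim = round(domain_m / dx_m)    # 10_000 points per dimension
grid_points = cells_per_dim ** 3          # 10^12 spatial points in 3D
# ...and every point must also be advanced over many steps at the
# fastest temporal scale, multiplying the cost further.
```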

Page 10: Chalmers e-Science Initiative Seminar


Minimal requirement

cost of multiscale solver / cost of fine-scale solver ≪ 1

errors in quantities of interest ≤ tol
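The minimal requirement can be phrased as a simple predicate: the multiscale solver must be much cheaper than a fine-scale solve of the whole problem while keeping the errors of interest within tolerance. The function and numbers below are illustrative.

```python
def worthwhile(cost_multiscale, cost_fine_scale, error, tol):
    # both conditions must hold for multiscale modeling to pay off
    return cost_multiscale / cost_fine_scale < 1 and error <= tol

ok = worthwhile(cost_multiscale=1e3, cost_fine_scale=1e6,
                error=1e-3, tol=1e-2)    # 1000x cheaper, within tolerance
bad = worthwhile(cost_multiscale=1e3, cost_fine_scale=1e6,
                 error=5e-2, tol=1e-2)   # cheap but too inaccurate
```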

Page 11: Chalmers e-Science Initiative Seminar


Multiscale computing

• Inherently hybrid models are best serviced by different types of computing environments

• When simulated in three dimensions, they usually require large scale computing capabilities.

• Such large scale hybrid models require a distributed computing ecosystem, where parts of the multiscale model are executed on the most appropriate computing resource.

• Distributed Multiscale Computing

Page 12: Chalmers e-Science Initiative Seminar


Two paradigms

• Loosely Coupled

– One single scale model provides input to another

– Single scale models are executed once

– workflows

• Tightly Coupled

– Single scale models call each other in an iterative loop

– Single scale models may execute many times

– Dedicated coupling libraries are needed
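The two paradigms can be sketched in a few lines of Python with toy single-scale models (everything here is illustrative): loosely coupled is a one-pass workflow, tightly coupled is an iterative loop in which the models repeatedly call each other.

```python
def loosely_coupled(model_a, model_b, state):
    # each single-scale model executes exactly once; output feeds the next
    return model_b(model_a(state))

def tightly_coupled(model_a, model_b, state, iterations):
    # the models exchange data repeatedly, as a coupling library mediates
    for _ in range(iterations):
        state = model_b(model_a(state))
    return state

double = lambda s: 2 * s
plus_one = lambda s: s + 1
once = loosely_coupled(double, plus_one, 1)        # one pass: 2*1 + 1 = 3
looped = tightly_coupled(double, plus_one, 1, 3)   # three coupled iterations
```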

[Two Scale Separation Maps, one per paradigm: spatial scale axis from x to L, temporal scale axis from t to T]

Page 13: Chalmers e-Science Initiative Seminar


MAPPER

Multiscale APPlications on European e-infRastructures

University of Amsterdam

Max-Planck Gesellschaft zur Foerderung der Wissenschaften E.V.

University of Ulster

Poznan Supercomputing and Networking Centre

Akademia Gorniczo-Hutnicza im. Stanislawa Staszica w Krakowie

Ludwig-Maximilians-Universität München

University of Geneva

Chalmers Tekniska Högskola

University College London

Page 14: Chalmers e-Science Initiative Seminar


Motivation: user needs

Application domains: VPH, Fusion, Computational Biology, Material Science, Engineering

→ Distributed Multiscale Computing Needs

Page 15: Chalmers e-Science Initiative Seminar


Applications

• 7 applications from 5 scientific domains ...

• ... brought under a common generic multiscale computing framework

Domains: virtual physiological human, fusion, hydrology, nano material science, computational biology

Framework pipeline: SSM → coupling topology → (x)MML → task graph → scheduling

Page 16: Chalmers e-Science Initiative Seminar


Ambition

• Develop computational strategies, software and services

for distributed multiscale simulations across disciplines

exploiting existing and evolving European e-infrastructure

• Deploy a computational science infrastructure

• Deliver high quality components

aiming at large-scale, heterogeneous, high performance multi-disciplinary multiscale computing.

• Advance state-of-the-art in high performance computing on e-infrastructures

enable distributed execution of multiscale models across e-Infrastructures.

Page 17: Chalmers e-Science Initiative Seminar


High level tools: objectives

• Design and implement an environment for

composing multiscale simulations from single scale models

– encapsulated as scientific software components

– distributed in various e-infrastructures

– supporting loosely coupled and tightly coupled paradigm

• Support composition of simulation models:

– using scripting approach

– by reusable “in-silico” experiments

• Allow interaction between software components from different e-Infrastructures in a hybrid way.

• Measure efficiency of the tools

Page 18: Chalmers e-Science Initiative Seminar


Requirements analysis

• Focus on multiscale applications that are described as a set of connected,

but independent single scale modules and mappers (converters)

• Support describing such applications in uniform (standardized) way to:

– analyze application behavior

– support switching between different versions of the modules with the same scale and functionality

– support building different multiscale applications from the same modules (reusability)

• Support computationally intensive simulation modules

– requiring HPC or Grid resources

– often implemented as parallel programs

• Support tight (with loop), loose (without loop) and hybrid (both) connection modes

Page 19: Chalmers e-Science Initiative Seminar


Overview of tools

• MAPPER Memory (MaMe) - a semantics-aware persistence store to record metadata about models and scales

• Multiscale Application Designer (MAD) - visual composition tool transforming high level MML description into executable experiment

• GridSpace Experiment Workbench (EW) - execution and result management of experiments on e-infrastructures via interoperability layers (AHE, QCG)

[Architecture diagram: user interfaces and visual tools (Task 8.1) — the Multiscale Application Designer, the GridSpace Experiment Workbench (with result and file browsing) and the MaMe web interface — sit on top of the GridSpace Execution Engine (Task 8.3), Provenance (Task 8.4), Result Management (Task 8.3), the XMML Repository and Mapper Memory (MaMe, Task 8.2), and the GridSpace Registry of Interpreters such as MUSCLE (Task 8.3). They communicate over REST, ssh, the QCG-client API with GridFTP, and a Java API with the interoperability layers QCG-Broker and AHE (WP4), and directly with experiment hosts (UIs). Software packages are created in WP7, adapted by WP4, integrated by WP5 and installed by WP6 on e-infrastructures.]

Page 20: Chalmers e-Science Initiative Seminar


Multiscale modeling language

• Uniformly describes multiscale models and their computational implementation on abstract level

• Two representations: graphical (gMML), textual (xMML)

• Includes description of

– scale submodules

– scaleless submodules (so-called mappers and filters)

– ports and their operators (for indicating type of connections between modules)

– coupling topology

– implementation

Submodel execution loop in pseudocode:

f := finit              /* initialization */
t := 0
while not EC(f, t):
    Oi(f, t)            /* intermediate observation */
    f := S(f, t)        /* solving step */
    t += theta(f)
end
Of(f, t)                /* final observation */
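The execution loop above translates directly into runnable Python. The loop structure follows the pseudocode (S, Oi, Of, EC, theta); the concrete choices below — a toy decay solver and a fixed time step — are purely illustrative.

```python
def run_submodel(f_init, S, theta, EC, Oi, Of):
    """Generic submodel execution loop from the MML pseudocode."""
    f, t = f_init, 0.0
    while not EC(f, t):
        Oi(f, t)              # intermediate observation
        f = S(f, t)           # solving step
        t += theta(f)         # time step may depend on the new state
    Of(f, t)                  # final observation
    return f, t

observed = []
f, t = run_submodel(
    f_init=1.0,
    S=lambda f, t: 0.5 * f,              # toy solver: exponential decay
    theta=lambda f: 1.0,                 # fixed time step
    EC=lambda f, t: t >= 3.0,            # end condition
    Oi=lambda f, t: observed.append(f),  # record intermediate states
    Of=lambda f, t: observed.append(f),  # record final state
)
```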


Corresponding symbols in gMML

Example for Instent Restenosis application

IC – initial conditions, DD – drug diffusion, BF – blood flow, SMC – smooth muscle cells

Page 21: Chalmers e-Science Initiative Seminar


jMML library

Supports XMML analysis:

• Detection of initial models

• Constructing coupling topology (gMML)

• Generating task graph

• Deadlock detection

• Generating Scale Separation Map

• Supports Graphviz or pdf formats
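Deadlock detection of the kind jMML performs can be sketched as cycle detection on the coupling/task graph (this is an illustrative depth-first-search sketch, not the jMML API): a tightly coupled loop shows up as a cycle, a pure workflow does not.

```python
def has_cycle(graph):
    """graph: dict mapping a node to a list of successor nodes."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GRAY
        for m in graph.get(n, []):
            if color.get(m, WHITE) == GRAY:
                return True               # back edge: a cycle exists
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

# BF <-> SMC calling each other is a cycle; IC -> BF -> SMC is a workflow
coupled = has_cycle({"BF": ["SMC"], "SMC": ["BF"]})
workflow = has_cycle({"IC": ["BF"], "BF": ["SMC"], "SMC": []})
```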

Page 22: Chalmers e-Science Initiative Seminar


MaMe - MAPPER memory

• Provides rich, semantics-aware persistence store for other components to record information

• Based on a well-defined domain model containing MAPPER metadata defined in MML

• Other MAPPER tools store, publish and reuse such metadata throughout the entire Project and its Consortium

• Provides dedicated web interface for human users to browse and curate metadata


Page 23: Chalmers e-Science Initiative Seminar


MAD: Application Designer

• User friendly visual tool for composing multiscale applications

• Supports importing application structure from xMML (section A and B)

• Supports composing multiscale applications in gMML (section B) with additional graphical specific information - layout, color etc. (section C)

• Transforms gMML into xMML

• Performs MML analysis to identify its loosely and tightly coupled parts

• Using information from MaMe and GridSpace EW, transforms gMML into executable formats with information needed for actual execution (section D) :

– GridSpace Experiment

– MUSCLE connection file (cxa.rb)

Page 24: Chalmers e-Science Initiative Seminar


GridSpace Experiment Workbench

• Supports execution of experiments on e-infrastructures via interoperability layers

• Result management support

• Uses Interpreter-Executor model of computation:

– Interpreter - a software package available on the infrastructure, usually programmatically accessible via a DSL or script language, e.g. MUSCLE, LAMMPS, CPMD

– Executor - a common entity for hosts, clusters, grid brokers etc. capable of running Interpreters

• Allows easy configuration of available executors and interpreters
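The Interpreter-Executor split can be sketched as two small classes (all names here are hypothetical, not the GridSpace API): an interpreter is a software package installed on the infrastructure, an executor is anything — host, cluster, broker — capable of running it.

```python
class Interpreter:
    """A software package on the infrastructure (e.g. MUSCLE, LAMMPS)."""
    def __init__(self, name, command):
        self.name, self.command = name, command

class Executor:
    """A host, cluster or grid broker capable of running interpreters."""
    def __init__(self, name, available):
        self.name = name
        self.available = set(available)   # interpreters installed here

    def run(self, interpreter, snippet):
        # an executor can only run interpreters it actually provides
        if interpreter.name not in self.available:
            raise RuntimeError(f"{interpreter.name} not on {self.name}")
        return f"{self.name}: {interpreter.command} {snippet}"

muscle = Interpreter("MUSCLE", "muscle")
cluster = Executor("zeus-cluster", {"MUSCLE", "LAMMPS"})
result = cluster.run(muscle, "cxa.rb")
```

The workbench's job is then matchmaking: pairing each experiment snippet's interpreter with an executor that offers it.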

Transforming an example MML into an executable GS Experiment:

[Diagram: the tightly coupled MML part — mappers and submodules at the µm, cm and m scales — maps onto Snippets 1-3 of a GS Experiment, run by Interpreters A, B and MUSCLE on Executors A, B and C via the interoperability layer (QCG, AHE) and SSH-accessible resources of the e-Infrastructures]

Page 25: Chalmers e-Science Initiative Seminar


User environment

Application composition:from MML to executable

experiment

Registration of MML metadata: submodules

and scales

Result Management

Execution of experiment using interoperability layer

on e-infrastructure

Page 26: Chalmers e-Science Initiative Seminar


[Timeline 2011-2013: MoU signed, task force established, 1st evaluation]

• Joined task force between MAPPER, EGI and PRACE

• Collaborate with EGI and PRACE to introduce new capabilities and policies onto e-Infrastructures

• Deliver new application tools, problem solving environments and services to meet end-users needs

• Work closely with various end-users communities (involved directly in MAPPER) to perform distributed multiscale simulations and complex experiments

[The 1st EU review selected two applications to run on the MAPPER e-Infrastructure (EGI and PRACE resources), spanning Tier-0, Tier-1 and Tier-2 systems of the MAPPER Taskforce e-infrastructure]

Page 27: Chalmers e-Science Initiative Seminar


MAPPER e-infrastructure

• MAPPER pre-production infrastructure

– Cyfronet, LMU/LRZ, PSNC, TASK, UCL, WCSS

– Environment for developing, testing and deployment of MAPPER components

• Central services

– GridSpace, MAD, MaMe, monitoring, web-site

• EGI-MAPPER-PRACE task force

– SARA Huygens HPC system

Page 28: Chalmers e-Science Initiative Seminar


2 scenarios in operation

loosely coupled DMC tightly coupled DMC

Page 29: Chalmers e-Science Initiative Seminar


In-stent restenosis

• Coronary heart disease (CHD) remains the most common cause of death in Europe, responsible for approximately 1.92 million deaths each year*

• A stenosis is an abnormal narrowing of a blood vessel

• Solution: a stent placed with a balloon angioplasty

• Possible response, in 10% of the cases: abnormal tissue growth in the form of in-stent restenosis

– Multiscale, multiphysics phenomenon involving physics, biology, chemistry, and medicine

Page 30: Chalmers e-Science Initiative Seminar


In-stent restenosis model

• A 3D model of in-stent restenosis (ISR3D)

– why does it occur, when does it stop?

– Ultimate goal:

• facilitate stent design

• study the effect of drug eluting stents

• Models:

– cells in the vessel wall;

– blood in the lumen;

– drug diffusion; and

– most importantly their interaction

• 3D model is computationally very expensive

• 2D model has published results*

*H. Tahir, A. G. Hoekstra et al. Interface Focus, 1(3), 365–373

Page 31: Chalmers e-Science Initiative Seminar


Scale separation map

• Four main submodels

• Same spatial scale

• Different temporal scale

Page 32: Chalmers e-Science Initiative Seminar


Coupling topology

• Model

– is tightly coupled (excluding initial condition)

– has a fixed number of synchronization points

– has one instance per submodel

Page 33: Chalmers e-Science Initiative Seminar


MML of ISR3D

[gMML diagram legend: start, stop, submodel, mapper, edge heads/tails, initialization, finalization, intermediate observations]

Page 34: Chalmers e-Science Initiative Seminar


ISR3D on MAPPER

Page 35: Chalmers e-Science Initiative Seminar


Demo: Mapper Memory (MaMe)

• Semantics-aware persistence store

• Records MML-based metadata about models and scales

• Supports exchanging and reusing MML metadata for

– other MAPPER tools via REST interface

– human users via dedicated Web interface


Page 36: Chalmers e-Science Initiative Seminar


Demo: Gridspace EW for ISR3D

• Obtains a MAD-generated experiment containing a configuration file for the MUSCLE interpreter

• Provides two executors for MUSCLE interpreter

– SSH on Polish NGI UI – cluster execution

– QCG – multisite execution

• Uses QCG executor for running MUSCLE interpreter on QCG and staging input/output files

[Diagram: a GS Experiment with a single MUSCLE interpreter snippet for the tightly coupled MML part — a mapper with cm- and m-scale submodules — executed by the QCG executor via the interoperability layer (QCG, AHE) and SSH-accessible resources of the e-Infrastructures]

# declare kernels which can be launched in the CxA
cxa.add_kernel('submodel_instance1', 'my.submodelA')
cxa.add_kernel('submodel_instance2', 'my.submodelB')
...
# configure connection scheme of the CxA
cs = cxa.cs
# configure unidirectional connection between kernels
cs.attach 'submodel_instance1' => 'submodel_instance2' do
  tie 'portA', 'portB'
  ...
end

Page 37: Chalmers e-Science Initiative Seminar


Computing

• ISR3D is implemented using the multiscale coupling library and environment (MUSCLE)

• Contains Java, Fortran and C++ submodels

• MUSCLE provides uniform communication between tightly coupled submodels

• MUSCLE can be run on a laptop, a cluster or multiple sites.

Page 38: Chalmers e-Science Initiative Seminar


QCG Role

• Provides an interoperability layer between PRACE and EGI infrastructures

• Co-allocates heterogeneous resources according to the requirements of a MUSCLE application using an advance reservation mechanism

• Synchronizes the execution of application kernels in multi-cluster environment

• Efficiently executes and manages tasks on EGI and UCL resources

• Manages data transfers
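Co-allocation with advance reservation can be sketched as finding the earliest time window that is simultaneously free on all required clusters (an illustrative simplification, not the QCG-Broker API; it only tries windows starting at a slot boundary).

```python
def earliest_common_window(free_windows, duration):
    """free_windows: one list of free (start, end) slots per cluster.
    Returns the earliest start at which every cluster is free for
    `duration`, or None if no common window exists."""
    # candidate starts: the start of any free slot on any cluster
    candidates = sorted(s for slots in free_windows for s, _ in slots)
    for start in candidates:
        if all(any(s <= start and start + duration <= e for s, e in slots)
               for slots in free_windows):
            return start
    return None

cluster_a = [(0, 4), (10, 20)]    # free slots on cluster A
cluster_b = [(2, 8), (12, 18)]    # free slots on cluster B
window = earliest_common_window([cluster_a, cluster_b], duration=3)
```

Reserving the returned window on every cluster in advance is what lets the kernels of a MUSCLE application start in sync across sites.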

Page 39: Chalmers e-Science Initiative Seminar


ISR3D Results

Page 40: Chalmers e-Science Initiative Seminar


ISR3D - conclusion

• Before MAPPER, ISR2D ran fast enough, but ISR3D took too much execution time and substantial effort to couple

• Now, ISR3D runs in a distributed fashion using the MAPPER tools and middleware

• To get scientific results we will have to, and can, run many batch jobs

– Done in the MeDDiCa EU project

– Involves 1000s of runs

• Also, the code can be parallelized to run faster

Page 41: Chalmers e-Science Initiative Seminar


Summary

• Elaboration of a concept of an environment supporting developers and users of multiscale applications for grid and cloud infrastructures

• Design of the formalism for describing connections in multiscale simulations

• Enabling access to e-infrastructures

• Validation of the formalism against real applications structure by using tools

• Proof of concept for transforming high level formal description to actual execution using e-infrastructures

Page 42: Chalmers e-Science Initiative Seminar

More about MAPPER

http://www.mapper-project.eu/

http://dice.cyfronet.pl/