Grid Computing July 2009

Grid computing, by Ian Foster, Computation Institute, Argonne National Lab & University of Chicago

Description

I presented this keynote talk at the WorldComp conference in Las Vegas on July 13, 2009. In it, I summarize what grid computing is about, focusing in particular on the "integration" function rather than the "outsourcing" function (what people call "cloud" today), using biomedical examples.

Transcript of Grid Computing July 2009

Page 1: Grid Computing July 2009

Grid computing
Ian Foster

Computation Institute

Argonne National Lab & University of Chicago

Page 2: Grid Computing July 2009

“When the network is as fast as the computer’s internal links, the machine disintegrates across the net into a set of special purpose appliances”
(George Gilder, 2001)

Page 3: Grid Computing July 2009

“I’ve been doing cloud computing since before it was called grid.”

Page 4: Grid Computing July 2009

“Computation may someday be organized as a public utility … The computing utility could become the basis for a new and important industry.”
John McCarthy (1961)

Page 5: Grid Computing July 2009

Scientific collaboration

Page 6: Grid Computing July 2009

Addressing urban health needs

Page 7: Grid Computing July 2009

Important characteristics

- We must integrate systems that may not have worked together before
- These are human systems, with differing goals, incentives, and capabilities
- All components are dynamic: change is the norm, not the exception
- Processes also evolve rapidly
- We are not building something simple like a bridge or an airline reservation system

Page 8: Grid Computing July 2009

We are dealing with complex adaptive systems

“A complex adaptive system is a collection of individual agents that have the freedom to act in ways that are not always predictable and whose actions are interconnected such that one agent’s actions changes the context for other agents.”
Crossing the Quality Chasm, IOM, 2001; pp. 312-13

- Non-linear and dynamic
- Agents are independent and intelligent
- Goals and behaviors often in conflict
- Self-organization through adaptation and learning
- No single point(s) of control
- Hierarchical decomposition has limited value

Page 9: Grid Computing July 2009

Ralph Stacey, Complexity and Creativity in Organizations, 1996

We need to function in the zone of complexity

[Figure: Stacey matrix. Axes: certainty about outcomes and agreement about outcomes, each running low to high. Regions: "plan and control" (high certainty, high agreement), "chaos" (low certainty, low agreement), and the "zone of complexity" between them.]

Page 10: Grid Computing July 2009

[Animation build: same Stacey diagram as the previous slide.]

Page 11: Grid Computing July 2009

“The Anatomy of the Grid,” 2001

The … problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO).

Page 12: Grid Computing July 2009

Examples (from AotG, 2001)

- “The application service providers, storage service providers, cycle providers, and consultants engaged by a car manufacturer to perform scenario evaluation during planning for a new factory”
- “Members of an industrial consortium bidding on a new aircraft”
- “A crisis management team and the databases and simulation systems that they use to plan a response to an emergency situation”
- “Members of a large, international, multiyear high-energy physics collaboration”

Page 13: Grid Computing July 2009

From the organizational behavior and management community

“[A] group of people who interact through interdependent tasks guided by common purpose [that] works across space, time, and organizational boundaries with links strengthened by webs of communication technologies”
— Lipnack & Stamps, 1997

Yes, but adding cyber-infrastructure:
- People plus computational agents & services
- Communication technologies plus IT infrastructure

Collaboration based on rich data & computing capabilities

Page 14: Grid Computing July 2009

NSF Workshops on Building Effective Virtual Organizations
[Search “BEVO 2008”]

Page 15: Grid Computing July 2009

The Grid paradigm

- Principles and mechanisms for dynamic VOs
- Leverage service-oriented architecture (SOA)
- Loose coupling of data and services
- Open software and architecture

[Timeline, 1995–2010: adoption spreading from computer science to physics, astronomy, engineering, biology, biomedicine, and healthcare]

Page 16: Grid Computing July 2009

We call these groupings virtual organizations (VOs): a set of individuals and/or institutions engaged in the controlled sharing of resources in pursuit of a common goal.

Healthcare = dynamic, overlapping VOs, linking:
- Patient – primary care
- Sub-specialist – hospital
- Pharmacy – laboratory
- Insurer – …

But the U.S. health system is marked by fragmented and inefficient VOs with insufficient mechanisms for controlled sharing.

“I advocate … a model of virtual integration rather than true vertical integration …”
G. Halvorson, CEO, Kaiser Permanente

Page 17: Grid Computing July 2009

The Grid paradigm and information integration

Layer functions, bottom to top:
- Make resources accessible over the network
- Name resources; move data around
- Make resources usable and useful
- Crosscutting: manage who can do what

[Diagram: data sources (radiology, medical records, pathology, genomics, labs; RHIO) exposed through platform services]

Page 18: Grid Computing July 2009

The Grid paradigm and information integration

[Diagram: the same data sources (radiology, medical records, pathology, genomics, labs; RHIO) feed platform services layered as publication, management, and integration, crosscut by security and policy. Above the platform: transform data into knowledge, enhance user cognitive processes, incorporate into business processes.]

Page 19: Grid Computing July 2009

The Grid paradigm and information integration

[Diagram: data sources (radiology, medical records, pathology, genomics, labs; RHIO) feed platform services (publication, management, integration) and value services (analysis, cognitive support, applications), crosscut by security and policy.]

Page 20: Grid Computing July 2009

We partition the multi-faceted interoperability problem

- Process interoperability: integrate work across the healthcare enterprise
- Data interoperability (see the sketch below)
  - Syntactic: move structured data among system elements
  - Semantic: use information across system elements
- Systems interoperability: communicate securely and reliably among system elements

[Diagram: the layered stack again: applications, analysis, integration, management, publication]
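A toy sketch of the data-interoperability split, assuming invented message fields and an assumed local-to-LOINC code map (the code values are stand-ins): parsing the agreed JSON structure is the syntactic step; translating local codes into a shared vocabulary is the semantic step.

```python
import json

# Hypothetical local-to-shared vocabulary map (semantic layer);
# the code values here are illustrative stand-ins.
LOCAL_TO_LOINC = {"GLU": "2345-7"}   # local glucose code -> assumed LOINC code

def receive(message: str) -> dict:
    """Syntactic interoperability: both sides agree on JSON structure."""
    return json.loads(message)

def normalize(record: dict) -> dict:
    """Semantic interoperability: map local codes to a shared vocabulary."""
    record["code"] = LOCAL_TO_LOINC.get(record["code"], record["code"])
    return record

msg = '{"code": "GLU", "value": 5.4, "units": "mmol/L"}'
print(normalize(receive(msg)))
```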

Page 21: Grid Computing July 2009

Security and policy: Managing who can do what

- Familiar division of labor
- Publication level: bridge between local and global
- Integration level: VO-specific policies, based on attributes
- Attribute authorities

Page 22: Grid Computing July 2009

Policy models, in increasing abstraction and expressiveness:

- Identity-based authZ: simplest, but not scalable
- Unix access control lists (discretionary access control, DAC): groups, directories, simple administration
- POSIX ACLs / MS ACLs: finer-grained admin policy
- Role-based access control (RBAC): separation of role/group from rule administration
- Mandatory access control (MAC): clearance, classification, compartmentalization
- Attribute-based access control (ABAC): generalization to arbitrary attributes (sketched below)

>>> Policy language abstraction level and expressiveness >>>
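The endpoint of this spectrum, ABAC, lends itself to a small illustration. A minimal sketch, assuming invented attribute names and an invented VO policy; real grid deployments expressed such policies in languages like XACML, with attributes asserted by attribute authorities.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    subject: dict = field(default_factory=dict)   # attributes from attribute authorities
    resource: dict = field(default_factory=dict)
    action: str = ""

# An ABAC policy is a predicate over attributes, not over identities or roles.
# This particular rule is hypothetical.
def vo_policy(req: Request) -> bool:
    return (req.subject.get("vo") == req.resource.get("vo")       # same virtual organization
            and req.subject.get("role") == "investigator"         # a role is just another attribute
            and (req.action == "read" or req.subject.get("irb_approved", False)))

req = Request(subject={"vo": "cancer-trial-7", "role": "investigator"},
              resource={"vo": "cancer-trial-7"},
              action="read")
print("permit" if vo_policy(req) else "deny")
```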

Page 23: Grid Computing July 2009


Globus / caGrid GAARDS

Page 24: Grid Computing July 2009

Publication: Make information accessible

- Make data available in a remotely accessible, reusable manner
- Leave mediation to the integration layer
- Gateway from local policy/protocol into wide-area mechanisms (transport, security, …); a minimal sketch follows
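A minimal sketch of the publication idea, assuming an arbitrary directory and port: expose local data read-only over a standard network protocol and leave all mediation to higher layers. Production grids used GridFTP or WSRF data services rather than plain HTTP.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Serve an (assumed, pre-existing) published directory read-only over HTTP.
handler = partial(SimpleHTTPRequestHandler, directory="/data/published")
HTTPServer(("0.0.0.0", 8080), handler).serve_forever()
```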

Page 25: Grid Computing July 2009


TeraGrid participants

Page 26: Grid Computing July 2009

Federating computers for physics data analysis

Page 27: Grid Computing July 2009


Page 28: Grid Computing July 2009

Earth System Grid

Main ESG Portal:
- 198 TB of data at four locations; 1,150 datasets; 1,032,000 files
- Includes the past 6 years of joint DOE/NSF climate modeling experiments
- 8,000 registered users
- Downloads to date: 49 TB, 176,000 files

CMIP3 (IPCC AR4) ESG Portal:
- 35 TB of data at one location; 74,700 files
- Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change; data from 13 countries, representing 25 models
- 1,900 registered projects
- Downloads to date: 387 TB, 1,300,000 files, 500 GB/day (average)
- 400 scientific papers published to date based on analysis of CMIP3 (IPCC AR4) data

ESG usage: over 500 sites worldwide
[Chart: ESG monthly download volumes]
Globus

Page 29: Grid Computing July 2009

[Diagram: an enterprise/grid interface service bridges DICOM protocols on the clinical side and Grid protocols (Web services) on the wide-area side. Plug-in adapters handle DICOM, XDS, HL7, and vendor-specific interfaces; a wide-area service actor drives remote operations.]

Children’s Oncology Group

Page 30: Grid Computing July 2009

Automating service creation, deployment

Introduce:
- Define service
- Create skeleton
- Discover types
- Add operations
- Configure security

gRAVI (Grid Remote Application Virtualization Infrastructure): wrap executables as services (see the sketch below)

[Diagram: Introduce creates an application service and deploys it into a container (transferring a GAR); the service advertises itself to an index service and stores artifacts in a repository service; clients discover the service, invoke it, and get results.]

caGrid, Introduce, gRAVI: Ohio State, U. Chicago
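A toy sketch of the gRAVI idea: wrap a command-line executable behind a network endpoint so remote clients can invoke it. The HTTP interface and the wrapped `sort` command are stand-ins; gRAVI itself generated WSRF grid services, with security configured through Introduce.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

COMMAND = ["sort"]   # stand-in for a scientific executable

class WrapHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # Run the wrapped executable on the posted input; capture its output.
        result = subprocess.run(COMMAND, input=body, capture_output=True)
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()
        self.wfile.write(result.stdout)

HTTPServer(("0.0.0.0", 8081), WrapHandler).serve_forever()
```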

Page 31: Grid Computing July 2009

As of Oct 19, 2008: 122 participants, 105 services (70 data, 35 analytical)

Page 32: Grid Computing July 2009

Management: Naming and moving information

- Persistent, uniform global naming of objects, independent of type (see the sketch below)
- Orchestration of data movement among services

[Diagram: a dataset D is passed among services S1, S2, S3 in successive configurations.]
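A small sketch of the naming half of this layer, assuming invented names and endpoints: a replica location index maps a persistent, type-independent global name to the locations currently holding copies, in the spirit of the Globus Replica Location Service.

```python
from collections import defaultdict

class ReplicaLocationIndex:
    """Map global object names to the service URLs holding replicas."""
    def __init__(self):
        self._locations = defaultdict(set)   # global name -> set of URLs

    def register(self, name: str, url: str):
        self._locations[name].add(url)

    def unregister(self, name: str, url: str):
        self._locations[name].discard(url)

    def lookup(self, name: str) -> set:
        return set(self._locations[name])

rli = ReplicaLocationIndex()
rli.register("hdl:888/abc123", "gsiftp://s1.example.org/data/abc123")
rli.register("hdl:888/abc123", "gsiftp://s2.example.org/data/abc123")
print(rli.lookup("hdl:888/abc123"))
```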

Page 33: Grid Computing July 2009

LIGO Data Grid (LIGO Gravitational Wave Observatory)

- Replicating >1 terabyte/day to 8 sites
- 770 TB replicated to date: >120 million replicas
- MTBF = 1 month

[Map: replication sites including Birmingham, Cardiff, and AEI/Golm]

Ann Chervenak et al., ISI; Scott Koranda et al., LIGO
Globus

Page 34: Grid Computing July 2009

Data replication service: pull “missing” files to a storage system

[Diagram: the Data Replication Service takes a list of required files, consults the Replica Location Index and Local Replica Catalogs (data location), and drives the Reliable File Transfer Service and GridFTP (data movement) to copy files into local storage, registering the new replicas.]

“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005

Layers: data replication, built on data location and data movement.
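A sketch of the service's core loop as the slide describes it: compare the required list against the local replica catalog, look up missing files in a replica index (data location), and transfer them (data movement). All interfaces are simplified stand-ins for the RLS/RFT/GridFTP components.

```python
def replicate_missing(required, local_catalog, replica_index, transfer):
    """Pull every required file that the local catalog lacks."""
    for name in required:
        if name in local_catalog:               # already replicated locally
            continue
        sources = replica_index.get(name, ())   # data location step
        if not sources:
            print(f"no known replica for {name}")
            continue
        transfer(sources[0], f"/storage/{name}")    # data movement step
        local_catalog.add(name)                     # register the new replica

local = {"fileA"}
index = {"fileB": ["gsiftp://s1.example.org/data/fileB"]}
replicate_missing(["fileA", "fileB"], local, index,
                  transfer=lambda src, dst: print(f"copy {src} -> {dst}"))
```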

Page 35: Grid Computing July 2009

Naming objects: A prerequisite to management

The naming problem:
- “Health objects” = patient information, images, records, etc.
- “Names” refer to health objects in records, files, databases, papers, reports, research, emails, etc.

Challenges:
- No systematic way of naming health objects
- Many health objects, like DICOM images and reports, include references to other objects through non-unique, ambiguous, PHI-tainted identifiers

A framework for distributed digital object services: Kahn, Wilensky, 1995

Page 36: Grid Computing July 2009

Health Object Identifier (HOI) naming system

uri:hdl://888.us.npi.1234567890.dicom/8A648C33-A5…4939EBE

- uri:hdl: HOI’s URI scheme identifier, based on Handle
- 888: CHI’s top-level naming authority
- us.npi.1234567890: National Provider ID, used in the hierarchical identifier namespace
- dicom: application context’s namespace, governed by the provider naming authority
- 8A648C33-A5…4939EBE: random string for the identifier body, PHI-free and guaranteed unique
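A sketch of minting an identifier with the structure above, reusing the slide's example field values; a random UUID stands in as the PHI-free, effectively unique identifier body.

```python
import uuid

def mint_hoi(naming_authority: str, npi: str, context: str) -> str:
    # A random UUID carries no patient information and collides only with
    # negligible probability, matching the "PHI-free and guaranteed unique"
    # requirement for the identifier body.
    body = uuid.uuid4().hex.upper()
    return f"uri:hdl://{naming_authority}.us.npi.{npi}.{context}/{body}"

print(mint_hoi("888", "1234567890", "dicom"))
# e.g. uri:hdl://888.us.npi.1234567890.dicom/4F2D9C...
```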

Page 37: Grid Computing July 2009


Data movement in clinical trials

Page 38: Grid Computing July 2009

Community public health: Digital retinopathy screening network

Page 39: Grid Computing July 2009

Integration: Making information useful

[Chart: degree of communication (0–100%) plotted against degree of prior syntactic and semantic agreement (0–100%), contrasting a rigid standards-based approach, a loosely coupled approach, and an adaptive approach.]

Page 40: Grid Computing July 2009

Integration via mediation

- Map between models
- Scoped to domain use
- Multiple concurrent uses
- Bottom-up mediation: between standards and versions; between local versions; in the absence of agreement (toy sketch below)

[Diagram (Levy 2000): a query in the source schema is reformulated against a global data model into a query over the union of exported source schemas, then optimized and handed to a query execution engine, which executes it across distributed sources through wrappers.]
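A toy sketch of mediation, assuming two invented source schemas: a query against the shared model is reformulated into each source's local schema, executed through per-source wrapper stubs (OGSA-DAI played this role in practice), and merged.

```python
# Per-source mappings from shared-model field names to local ones (invented).
MAPPINGS = {
    "site_a": {"patient_id": "pid", "specimen": "sample_code"},
    "site_b": {"patient_id": "PatientID", "specimen": "SPECIMEN_TYPE"},
}

def reformulate(query: dict, source: str) -> dict:
    """Rewrite a shared-model query into one source's local schema."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in query.items()}

def mediated_query(query: dict, wrappers: dict) -> list:
    """Reformulate, execute per source, and merge (distributed execution)."""
    results = []
    for source, run in wrappers.items():
        results.extend(run(reformulate(query, source)))
    return results

# Wrappers would issue real queries; stubs here.
wrappers = {
    "site_a": lambda q: [f"site_a rows for {q}"],
    "site_b": lambda q: [f"site_b rows for {q}"],
}
print(mediated_query({"patient_id": "p42", "specimen": "plasma"}, wrappers))
```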

Page 41: Grid Computing July 2009

ECOG 5202 integrated sample management

[Diagram: a web portal issues queries through OGSA-DQP and a mediator to OGSA-DAI data services at ECOG PCO, MD Anderson, and ECOG CC.]

Page 42: Grid Computing July 2009

Analytics: Transform data into knowledge

“The overwhelming success of genetic and genomic research efforts has created an enormous backlog of data with the potential to improve the quality of patient care and cost effectiveness of treatment.”

— US President’s Council of Advisors on Science and Technology, Personalized Medicine Themes, 2008

Page 43: Grid Computing July 2009

Microarray clustering using Taverna

1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
2. Normalize microarray data using a GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
3. Hierarchical clustering using a geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage

[Workflow legend: workflow in/output; caGrid services; “shim” services; others]

Wei Tan
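The same three steps, written as plain sequential calls; `invoke` is a stub standing in for a real client (Taverna drove these caGrid WSRF services; nothing below is the actual caGrid API).

```python
def invoke(endpoint: str, payload):
    """Stub: a real client would call the WSRF service and parse the response."""
    print(f"calling {endpoint}")
    return payload

data = invoke("cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub",
              {"query": "experiment-123"})                        # 1. retrieve
normalized = invoke("node255.broad.mit.edu:6060/wsrf/services/cagrid/"
                    "PreprocessDatasetMAGEService", data)         # 2. normalize
clusters = invoke("cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/"
                  "HierarchicalClusteringMage", normalized)       # 3. cluster
```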

Page 44: Grid Computing July 2009

Many, many tasks: Identifying potential drug targets

2M+ ligands × protein target(s)

(Mike Kubal, Benoit Roux, and others)

Page 45: Grid Computing July 2009

45

start

report

DOCK6Receptor

(1 per protein:defines pocket

to bind to)

ZINC3-D

structures

ligands complexes

NAB scriptparameters

(defines flexibleresidues,

#MDsteps)

Amber Score:1. AmberizeLigand

3. AmberizeComplex5. RunNABScript

end

BuildNABScript

NABScript

NABScript

Template

Amber prep:2. AmberizeReceptor4. perl: gen nabscript

FREDReceptor

(1 per protein:defines pocket

to bind to)

Manually prepDOCK6 rec file

Manually prepFRED rec file

1 protein(1MB)

6 GB2M

structures(6 GB)

DOCK6FRED~4M x 60s x 1 cpu

~60K cpu-hrs

Amber~10K x 20m x 1 cpu

~3K cpu-hrs

Select best ~500

~500 x 10hr x 100 cpu~500K cpu-hrsGCMC

PDBprotein

descriptions

Select best ~5KSelect best ~5K

For 1 target:4 million tasks

500,000 cpu-hrs(50 cpu-years)
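A sketch of the many-task pattern behind this pipeline, with an invented `dock_score` stub in place of a real DOCK6/FRED run: fan millions of independent tasks over a worker pool, then keep the best-scoring ligands for the costlier Amber stage.

```python
from concurrent.futures import ProcessPoolExecutor
import random

def dock_score(ligand_id: int) -> tuple:
    """Stub: pretend to dock one ligand and return (score, id)."""
    random.seed(ligand_id)          # deterministic fake score per ligand
    return (random.random(), ligand_id)

if __name__ == "__main__":
    ligands = range(100_000)        # stand-in for the 2M+ ZINC structures
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(dock_score, ligands, chunksize=1000))
    best = [lig for _, lig in sorted(scores)[:5000]]   # "select best ~5K"
    print(f"{len(best)} ligands advance to Amber scoring")
```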

Page 46: Grid Computing July 2009

DOCK on BG/P: ~1M tasks on 118,000 CPUs

- CPU cores: 118,784
- Tasks: 934,803
- Elapsed time: 7,257 sec
- Compute time: 21.43 CPU-years
- Average task time: 667 sec
- Relative efficiency: 99.7% (from 16 to 32 racks)
- Utilization: 99.6% sustained, 78.3% overall

[Plot: tasks over time (secs)]
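The overall utilization figure can be checked directly from the other numbers on the slide: compute time divided by the core-seconds available during the run.

```python
compute_s = 21.43 * 365 * 24 * 3600     # 21.43 CPU-years, in CPU-seconds
available_s = 118_784 * 7_257           # cores x elapsed wall-clock seconds
print(f"overall utilization ~ {compute_s / available_s:.1%}")
# prints ~78.4%, matching the slide's 78.3% up to rounding
```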

Page 47: Grid Computing July 2009

Scaling POSIX to petascale

[Diagram: a large dataset moves from the global file system through a CN-striped intermediate file system to per-compute-node local file systems (local datasets), over torus and tree interconnects; mechanisms include Chirp (multicast) and MosaStore (striping). Tiers: staging, intermediate, local.]

Page 48: Grid Computing July 2009

Efficiency for 4-second tasks and varying data sizes (1 KB to 1 MB) for CIO and GPFS, up to 32K processors

Page 49: Grid Computing July 2009

“Sine” workload, 2M tasks, 10 MB : 10 ms ratio, 100 nodes, GCC policy, 50 GB caches/node

Ioan Raicu

Page 50: Grid Computing July 2009


Same scenario, but with dynamic resource provisioning

Page 51: Grid Computing July 2009

Data diffusion sine-wave workload: summary

- GPFS: 5.70 hrs, ~8 Gb/s, 1,138 CPU-hrs
- DD+SRP: 1.80 hrs, ~25 Gb/s, 361 CPU-hrs
- DD+DRP: 1.86 hrs, ~24 Gb/s, 253 CPU-hrs

Page 52: Grid Computing July 2009

Recap

- Increased recognition that information systems and data understanding are a limiting factor: “… much of the promise associated with health IT requires high levels of adoption … and high levels of use of interoperable systems (in which information can be exchanged across unrelated systems) ….” (RAND COMPARE)
- The health system is a complex adaptive system: “There is no single point(s) of control. System behaviors are often unpredictable and uncontrollable, and no one is ‘in charge.’” (W. Rouse, NAE Bridge)
- With diverse and evolving requirements and user communities: “I advocate … a model of virtual integration rather than true vertical integration ….” (G. Halvorson, CEO, Kaiser Permanente)

Page 53: Grid Computing July 2009

Ralph Stacey, Complexity and Creativity in Organizations, 1996

Functioning in the zone of complexity

[Figure: the same Stacey matrix as before: certainty about outcomes vs. agreement about outcomes, low to high, with “plan and control,” “chaos,” and the zone of complexity between them.]

Page 54: Grid Computing July 2009

The Grid paradigm and information integration

[Diagram, repeated from earlier: data sources (radiology, medical records, pathology, genomics, labs; RHIO) feed platform services (publication, management, integration) and value services (analysis, cognitive support, applications), crosscut by security and policy.]

Page 55: Grid Computing July 2009


“The computer revolution hasn’t happened yet.”

Alan Kay, 1997

Page 56: Grid Computing July 2009

[Chart: connectivity (on a log scale) vs. time for science, enterprise, and consumer computing, annotated Grid → Cloud → ????]

“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”
(George Gilder, 2001)

Page 57: Grid Computing July 2009

Computation Institute
www.ci.uchicago.edu

Thank you!