Grid Computing July 2009
Transcript of Grid Computing July 2009
Grid computing
Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
2
“When the network is as fast as the
computer’s internal links, the machine
disintegrates across the net into a set of
special purpose appliances”
(George Gilder, 2001)
3
“I’ve been doing cloud computing since before it
was called grid.”
4
“Computation may someday be organized as a public utility …
The computing utility could become the basis for a new and important
industry.”
John McCarthy
(1961)
5
Scientific collaboration
6
Addressing urban health
needs
7
Important characteristics
We must integrate systems that may not have worked together before
These are human systems, with differing goals, incentives, capabilities
All components are dynamic—change is the norm, not the exception
Processes evolve rapidly also
We are not building something simple like a
bridge or an airline reservation system
8
We are dealing with complex adaptive systems
A complex adaptive system is a collection of individual
agents that have the freedom to act in ways that are not
always predictable and whose actions are interconnected
such that one agent’s actions changes the context
for other agents.
Crossing the Quality Chasm, IOM, 2001; pp 312-13
Non-linear and dynamic
Agents are independent and intelligent
Goals and behaviors often in conflict
Self-organization through adaptation and learning
No single point(s) of control
Hierarchical decomposition has limited value
9
We need to function in the zone of complexity
[Figure: agreement about outcomes (low to high) versus certainty about outcomes (low to high), with regions labeled Plan and control, Zone of complexity, and Chaos]
Ralph Stacey, Complexity and Creativity in Organizations, 1996
11
“The Anatomy of the Grid,” 2001
The … problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO).
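The controlled sharing described in this passage (what is shared, who is allowed to share, and under which conditions) can be pictured as a small data structure. The sketch below is only an illustration: the class names `SharingRule` and `VirtualOrganization`, the resource and member names, and the after-hours condition are all hypothetical and are not taken from the paper or from any Grid toolkit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class SharingRule:
    """One provider-defined rule: what is shared, with whom, and when."""
    resource: str                      # what is shared
    members: Set[str]                  # who is allowed to share
    condition: Callable[[Dict], bool]  # conditions under which sharing occurs


@dataclass
class VirtualOrganization:
    """A VO modeled as the participants defined by its sharing rules."""
    name: str
    rules: List[SharingRule] = field(default_factory=list)

    def participants(self) -> Set[str]:
        # Everyone named in at least one sharing rule belongs to the VO.
        result: Set[str] = set()
        for rule in self.rules:
            result |= rule.members
        return result

    def may_use(self, member: str, resource: str, context: Dict) -> bool:
        # Access is granted only if some rule covers resource, member, and context.
        return any(
            rule.resource == resource
            and member in rule.members
            and rule.condition(context)
            for rule in self.rules
        )


# Hypothetical example: a cluster that is shared only from 18:00 onward.
vo = VirtualOrganization(
    name="physics-collab",
    rules=[
        SharingRule(
            resource="cluster-A",
            members={"alice", "bob"},
            condition=lambda ctx: ctx.get("hour", 0) >= 18,
        )
    ],
)
```

With this toy VO, `vo.may_use("alice", "cluster-A", {"hour": 20})` is allowed, while the same request at hour 9, or any request from a non-member, is refused.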
12
Examples (from AotG, 2001)
“The application service providers, storage service providers, cycle providers, and consultants engaged by a car manufacturer to perform scenario evaluation during planning for a new factory”
“Members of an industrial consortium bidding on a new aircraft”
“A crisis management team and the databases and simulation systems that they use to plan a response to an emergency situation”
“Members of a large, international, multiyear high-energy physics collaboration”
13
From the organizational behavior and management community
“[A] group of people who interact through interdependent tasks guided by common purpose [that] works across space, time, and organizational boundaries with links strengthened by webs of communication technologies”
— Lipnack & Stamps, 1997
Yes, but adding cyber-infrastructure:
People → computational agents & services
Communication technologies → IT infrastructure
Collaboration based on rich data & computing capabilities
14
NSF Workshops on
Building Effective Virtual
Organizations
[Search “BEVO 2008”]
15
The Grid paradigm
Principles and mechanisms for dynamic VOs
Leverage service oriented architecture (SOA)
Loose coupling of data and services
Open software, architecture
[Figure: timeline (1995, 2000, 2005, 2010) listing Computer science, Physics, Astronomy, Engineering, Biology, Biomedicine, Healthcare]
16
We call these groupings virtual organizations (VOs)
A set of individuals and/or institutions engaged in the controlled sharing of resources in pursuit of a common goal
Healthcare = dynamic, overlapping VOs, linking
Patient – primary care
Sub-specialist – hospital
Pharmacy – laboratory
Insurer – …
But U.S. health system is marked by fragmented and inefficient VOs with insufficient mechanisms for controlled sharing
“I advocate … a model of virtual integration rather than true vertical integration …”
G. Halvorson, CEO Kaiser
17
The Grid paradigm and information integration
Platform services:
Name resources; move data around
Make resources usable and useful
Make resources accessible over the network
Manage who can do what
Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO
18
The Grid paradigm and information integration
Transform data into knowledge
Enhance user cognitive processes
Incorporate into business processes
Platform services: Management, Integration, Publication, Security and policy
Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO
19
The Grid paradigm and information integration
Value services: Analysis, Cognitive support, Applications
Platform services: Management, Integration, Publication, Security and policy
Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO
20
We partition the multi-faceted interoperability problem
Process interoperability
Integrate work across healthcare enterprise
Data interoperability
Syntactic: move structured data among system elements
Semantic: use information across system elements
Systems interoperability
Communicate securely, reliably among system elements
[Figure: layer stack labeled Analysis, Management, Integration, Publication, Applications]
21
Security and policy: Managing who can do what
Familiar division of labor
Publication level: bridge between local and global
Integration level: VO-specific policies, based on attributes
Attribute authorities
Identity-based authZ
Simplest, not scalable
Unix Access Control Lists (Discretionary Access Control: DAC)
Groups, directories, simple admin
POSIX ACLs/MS-ACLs
Finer-grained admin policy
Role-based Access Control (RBAC)
Separation of role/group from rule admin
Mandatory Access Control (MAC)
Clearance, classification, compartmentalization
Attribute-based Access Control (ABAC)
Generalization of attributes
>>> Policy language abstraction level and expressiveness >>>
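The progression from identity-based to role-based to attribute-based authorization can be illustrated with three tiny checks. This is a minimal sketch under stated assumptions: the user names, roles, attributes, and the single ABAC rule are hypothetical examples, and none of this reflects the Globus or caGrid GAARDS interfaces.

```python
from typing import Dict, Set

# Identity-based: an explicit list of user names per resource.
# Simple, but every new user means editing every relevant list.
ACL: Dict[str, Set[str]] = {"scan-001": {"dr_lee"}}


def acl_allows(user: str, resource: str) -> bool:
    return user in ACL.get(resource, set())


# Role-based: users map to roles, and roles map to permitted actions,
# so role membership is administered separately from the rules.
USER_ROLES: Dict[str, Set[str]] = {"dr_lee": {"radiologist"}, "pat": {"clerk"}}
ROLE_PERMS: Dict[str, Set[str]] = {
    "radiologist": {"read_image"},
    "clerk": {"read_schedule"},
}


def rbac_allows(user: str, action: str) -> bool:
    return any(
        action in ROLE_PERMS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


# Attribute-based: the rule is a predicate over arbitrary attributes of
# the subject and the resource; no user names appear in the policy.
def abac_allows(subject: Dict[str, str], resource: Dict[str, str], action: str) -> bool:
    return (
        action == "read_image"
        and subject.get("role") == "radiologist"
        and subject.get("institution") == resource.get("institution")
    )
```

The ABAC rule here grants image access to any radiologist at the institution that owns the image, which is the kind of attribute-based, VO-specific policy the integration level calls for.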
23
Globus / caGrid GAARDS
24
Publication: Make information accessible
Make data available in a remotely accessible, reusable manner
Leave mediation for integration layer
Gateway from local policy/protocol into wide area mechanisms (transport, security, …)
25
TeraGrid participants
26
Federating computers for physics data analysis
27
28
Earth System Grid
Main ESG Portal
198 TB of data at four locations
1,150 datasets
1,032,000 files
Includes the past 6 years of joint DOE/NSF climate modeling experiments
8,000 registered users
Downloads to date: 49 TB, 176,000 files
CMIP3 (IPCC AR4) ESG Portal
35 TB of data at one location
74,700 files
Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
Data from 13 countries, representing 25 models
1,900 registered projects
Downloads to date: 387 TB, 1,300,000 files, 500 GB/day (average)
400 scientific papers published to date based on analysis of CMIP3 (IPCC AR4) data
ESG usage: over 500 sites worldwide
[Figure: ESG monthly download volumes]
Globus
29
[Figure: Enterprise/Grid interface service. Labels: DICOM protocols; Grid protocols (Web services); plug-in adapters for DICOM, XDS, HL7, and vendor-specific protocols; wide area service actor]
Children’s Oncology Group
30
Automating service creation, deployment
caGrid, Introduce, gRAVI: Ohio State, U.Chicago
Introduce
Define service
Create skeleton
Discover types
Add operations
Configure security
Grid Remote Application Virtualization Infrastructure
Wrap executables
[Figure labels: Appln Service; Create; Index service; Store; Repository Service; Advertize; Discover; Invoke, get results; Introduce; Container; Transfer GAR; Deploy]
31
As of Oct 19, 2008:
122 participants
105 services: 70 data, 35 analytical
32
Management: Naming and moving information
Persistent, uniform global naming of
objects, independent of type
Orchestration of data movement among
services
[Figure: three diagrams, each showing data (D) and services S1, S2, S3]
33
LIGO Data Grid
LIGO Gravitational Wave Observatory
Replicating >1 Terabyte/day to 8 sites
770 TB replicated to date: >120 million replicas
MTBF = 1 month
[Map: sites shown include Birmingham, Cardiff, AEI/Golm]
Ann Chervenak et al., ISI; Scott Koranda et al., LIGO
Globus
34
Data replication service
Pull “missing” files to a storage system
[Figure: a list of required files feeds the Data Replication Service, shown with the Reliable File Transfer Service, Replica Location Index, Local Replica Catalog, and GridFTP. Groupings: data replication, data location, data movement]
“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
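The idea of pulling “missing” files can be sketched as a planning step: compare the list of required files with the local replica catalog, then look up a source for each missing file in a replica location index. The code below is a simplified illustration with in-memory stand-ins; the function name `plan_transfers`, the file names, and the example URL are hypothetical, and this is not the Globus Data Replication Service API. The actual transfers, which the slide assigns to the Reliable File Transfer Service and GridFTP, are left out.

```python
from typing import Dict, List, Set, Tuple


def plan_transfers(
    required: List[str],
    local_catalog: Set[str],
    location_index: Dict[str, List[str]],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (transfers, unresolved).

    transfers  : (file name, source URL) pairs for files not held locally
    unresolved : missing files for which the index knows no replica
    """
    transfers: List[Tuple[str, str]] = []
    unresolved: List[str] = []
    for name in required:
        if name in local_catalog:
            continue  # already present at this site, nothing to pull
        sources = location_index.get(name, [])
        if sources:
            # Assumption: take the first known replica; a real service
            # could choose a source by load, distance, or policy.
            transfers.append((name, sources[0]))
        else:
            unresolved.append(name)
    return transfers, unresolved


# Hypothetical example data.
required_files = ["a.gwf", "b.gwf", "c.gwf"]
local = {"a.gwf"}
index = {"b.gwf": ["gsiftp://site1.example.org/data/b.gwf"]}
transfers, unresolved = plan_transfers(required_files, local, index)
```

In this example only `b.gwf` is scheduled for transfer; `a.gwf` is already local and `c.gwf` has no known replica.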
35
Naming objects: A prerequisite to management
The naming problem:
“Health objects” = patient information, images, records, etc.
“Names” refer to health objects in records, files, databases, papers, reports, research, emails, etc.
Challenges:
No systematic way of naming health objects
Many health objects, like DICOM images and reports, include references to other objects through non-unique, ambiguous, PHI-tainted identifiers
A framework for distributed digital object services: Kahn, Wilensky, 1995
36
Health Object Identifier (HOI) naming system
uri:hdl://888.us.npi.1234567890.dicom/8A648C33-A5…4939EBE
HOI’s URI schema identifier, based on Handle
888: CHI’s top-level naming authority
National Provider Id used in hierarchical Identifier Namespace
Application Context’s Namespace governed by provider Naming Authority
Random String for Identifier-Body: PHI-free and guaranteed unique
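The structure of the example identifier can be made concrete with a small builder and parser. This is a sketch under assumptions: the function names are hypothetical, the field layout is inferred from the single example on the slide, and a random UUID stands in for the identifier body because the slide elides the exact format. A random UUID carries no patient information, but it is unique only with overwhelming probability, not by guarantee.

```python
import uuid
from typing import Dict


def make_hoi(npi: str, context: str, authority: str = "888", country: str = "us") -> str:
    """Build an HOI-style identifier (layout inferred from the slide's example)."""
    if not (npi.isdigit() and len(npi) == 10):
        raise ValueError("NPI is expected to be a 10-digit number")
    # Assumption: a random UUID as the PHI-free identifier body.
    body = str(uuid.uuid4()).upper()
    return f"uri:hdl://{authority}.{country}.npi.{npi}.{context}/{body}"


def parse_hoi(hoi: str) -> Dict[str, str]:
    """Split an HOI-style identifier into its namespace fields and body."""
    prefix = "uri:hdl://"
    if not hoi.startswith(prefix):
        raise ValueError("not an HOI-style identifier")
    namespace, body = hoi[len(prefix):].split("/", 1)
    authority, country, scheme, npi, context = namespace.split(".", 4)
    return {
        "authority": authority,  # top-level naming authority
        "country": country,
        "scheme": scheme,        # "npi" in the slide's example
        "npi": npi,              # National Provider Id
        "context": context,      # application context, e.g. "dicom"
        "body": body,            # random identifier body
    }


example = make_hoi("1234567890", "dicom")
fields = parse_hoi(example)
```

Because the body is random, two calls with the same provider and context yield different identifiers, so nothing about the patient or the object can be read off the name.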
37
Data movement in clinical trials
38
Community public health: Digital retinopathy screening network
39
Integration: Making information useful
[Figure: degree of communication (0% to 100%) versus degree of prior syntactic and semantic agreement (0% to 100%), comparing a rigid standards-based approach, a loosely coupled approach, and an adaptive approach]
40
Integration via mediation
Map between models
Scoped to domain use
Multiple concurrent use
Bottom up mediation
Between standards and versions
Between local versions
In absence of agreement
[Figure (Levy 2000): Global Data Model; query in union of exported source schema; Query Reformulation; Query Optimization; Query Execution Engine (distributed query execution); Wrappers; query in the source schema]
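Mediation can be illustrated with a toy mediator: a query posed against a global data model is reformulated into each source's own schema by a wrapper, executed there, and the results are mapped back. This is a minimal sketch, assuming equality-only queries over in-memory rows; the class and field names are hypothetical, the query optimization stage in the figure is omitted, and this is not the OGSA-DAI or OGSA-DQP interface.

```python
from typing import Dict, List

Row = Dict[str, object]


class Wrapper:
    """Adapts one source: maps global field names to the source's own names."""

    def __init__(self, mapping: Dict[str, str], rows: List[Row]):
        self.mapping = mapping  # global field -> source field
        self.rows = rows        # stand-in for the source database

    def reformulate(self, query: Row) -> Row:
        # Rewrite a global-schema query into the source schema.
        return {self.mapping[key]: value for key, value in query.items()}

    def execute(self, query: Row) -> List[Row]:
        local_query = self.reformulate(query)
        hits = [
            row for row in self.rows
            if all(row.get(key) == value for key, value in local_query.items())
        ]
        # Translate matching rows back into the global schema.
        return [{g: row[s] for g, s in self.mapping.items()} for row in hits]


def mediated_query(query: Row, wrappers: List[Wrapper]) -> List[Row]:
    """Run the same global query against every source and merge the results."""
    results: List[Row] = []
    for wrapper in wrappers:
        results.extend(wrapper.execute(query))
    return results


# Two hypothetical sites that name the same concepts differently.
site_a = Wrapper(
    {"patient_id": "pid", "sample_type": "tissue"},
    [{"pid": 1, "tissue": "lung"}, {"pid": 2, "tissue": "breast"}],
)
site_b = Wrapper(
    {"patient_id": "PatientNo", "sample_type": "Specimen"},
    [{"PatientNo": 7, "Specimen": "lung"}],
)
lung_samples = mediated_query({"sample_type": "lung"}, [site_a, site_b])
```

Neither site has to change its schema: each mapping is scoped to one source, which is what lets mediation work bottom up, in the absence of prior agreement.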
41
ECOG 5202 integrated sample management
[Figure components: Web portal; Mediator; OGSA-DQP; three OGSA-DAI services; sites ECOG PCO, MD Anderson, ECOG CC]
42
Analytics: Transform data into knowledge
“The overwhelming success of genetic and genomic research efforts has created an enormous backlog of data with the potential to improve the quality of patient care and cost effectiveness of treatment.”
— US Presidential Council of Advisors on Science and Technology, Personalized Medicine Themes, 2008
43
Microarray clustering using Taverna
1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
2. Normalize microarray data using GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
3. Hierarchical clustering using geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
[Figure legend: workflow in/output; caGrid services; “shim” services; others]
Wei Tan
44
Many many tasks: Identifying potential drug targets
2M+ ligands x protein target(s)
(Mike Kubal, Benoit Roux, and others)
45
[Figure: docking workflow from start to report and end]
Inputs: PDB protein descriptions, 1 protein (1 MB); ZINC 3-D structures, 2M structures (6 GB)
Manually prep DOCK6 rec file; manually prep FRED rec file
DOCK6 receptor and FRED receptor (1 per protein: defines pocket to bind to)
DOCK6, FRED: ~4M x 60s x 1 cpu, ~60K cpu-hrs
Select best ~5K
Amber prep: 2. AmberizeReceptor; 4. perl: gen nabscript
BuildNABScript: NAB script template plus NAB script parameters (defines flexible residues, #MD steps)
Amber Score: 1. AmberizeLigand; 3. AmberizeComplex; 5. RunNABScript
Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs
Select best ~500
GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs
For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
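The per-stage estimates on this slide follow from multiplying task count, time per task, and CPUs per task. The short calculation below reproduces that arithmetic from the slide's own inputs; the helper name `cpu_hours` is hypothetical, and the exact results differ slightly from the slide's figures, which appear to be rounded.

```python
def cpu_hours(tasks: int, seconds_per_task: float, cpus_per_task: int = 1) -> float:
    """CPU-hours consumed by a batch of identical tasks."""
    return tasks * seconds_per_task * cpus_per_task / 3600.0


docking = cpu_hours(4_000_000, 60)       # DOCK6/FRED: ~4M x 60s x 1 cpu
amber = cpu_hours(10_000, 20 * 60)       # Amber: ~10K x 20m x 1 cpu
gcmc = cpu_hours(500, 10 * 3600, 100)    # GCMC: ~500 x 10hr x 100 cpu

total = docking + amber + gcmc
cpu_years = total / (24 * 365)

print(f"docking {docking:,.0f}  amber {amber:,.0f}  gcmc {gcmc:,.0f} cpu-hrs")
print(f"total   {total:,.0f} cpu-hrs, about {cpu_years:.0f} cpu-years")
```

The GCMC stage dominates: the 500 surviving candidates cost several times more than screening all four million docking tasks, which is why each selection step narrows the set before the next, more expensive stage.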
46
DOCK on BG/P: ~1M tasks on 118,000 CPUs
CPU cores: 118784
Tasks: 934803
Elapsed time: 7257 sec
Compute time: 21.43 CPU years
Average task time: 667 sec
Relative Efficiency: 99.7% (from 16 to 32 racks)
Utilization: Sustained: 99.6%, Overall: 78.3%
[Figure axis: Time (secs)]
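The overall utilization figure can be roughly cross-checked from the other numbers on the slide. The calculation below assumes that overall utilization means compute time divided by the CPU time available (cores times elapsed time); that definition is an assumption, not something the slide states.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

cores = 118_784
elapsed_s = 7_257
compute_cpu_years = 21.43  # "Compute time" as reported on the slide

# CPU time that was available if every core ran for the whole elapsed time.
available_cpu_years = cores * elapsed_s / SECONDS_PER_YEAR

# Assumed definition: fraction of the available CPU time spent computing.
overall_utilization = compute_cpu_years / available_cpu_years

print(f"available: {available_cpu_years:.2f} CPU years")
print(f"overall utilization: {overall_utilization:.1%}")
```

Under that assumption the result lands within a fraction of a percentage point of the 78.3% on the slide; the small gap is presumably due to rounding in the reported inputs.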
47
Scaling POSIX to petascale
[Figure: storage tiers labeled Staging, Intermediate, and Local. Components: a global file system holding the large dataset; a CN-striped intermediate file system over torus and tree interconnects, with Chirp (multicast) and MosaStore (striping); LFS on each compute node (local datasets)]
48
Efficiency for 4 second tasks and varying data size (1KB to 1MB) for CIO and GPFS up to 32K processors
49
“Sine” workload, 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50GB caches/node
Ioan Raicu
50
Same scenario, but with dynamic resource provisioning
51
Data diffusion sine-wave workload: Summary
GPFS: 5.70 hrs, ~8Gb/s, 1138 CPU hrs
DD+SRP: 1.80 hrs, ~25Gb/s, 361 CPU hrs
DD+DRP: 1.86 hrs, ~24Gb/s, 253 CPU hrs
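The summary figures can be turned into ratios relative to the GPFS baseline. The snippet below only restates the three rows from this slide as speedup and CPU-hour savings; the variable names are illustrative.

```python
# (elapsed hours, CPU hours) for each configuration on the slide.
runs = {
    "GPFS": (5.70, 1138),
    "DD+SRP": (1.80, 361),
    "DD+DRP": (1.86, 253),
}

base_hours, base_cpu_hours = runs["GPFS"]

# Ratios relative to the GPFS baseline: higher means better.
speedup = {name: base_hours / hours for name, (hours, _) in runs.items()}
cpu_saving = {name: base_cpu_hours / cpu for name, (_, cpu) in runs.items()}

for name in runs:
    print(f"{name:7s} speedup {speedup[name]:.2f}x, CPU-hour reduction {cpu_saving[name]:.2f}x")
```

Both data diffusion runs finish about three times faster than GPFS; with dynamic resource provisioning the run is marginally slower than with static provisioning but uses roughly 4.5 times fewer CPU hours than the baseline.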
52
Recap
Increased recognition that information systems and data understanding are a limiting factor
“… much of the promise associated with health IT requires high levels of adoption … and high levels of use of interoperable systems (in which information can be exchanged across unrelated systems) ….”
RAND COMPARE
The health system is a complex, adaptive system
“There is no single point(s) of control. System behaviors are often unpredictable and uncontrollable, and no one is ‘in charge.’”
W Rouse, NAE Bridge
With diverse and evolving requirements and user communities
“… I advocate … a model of virtual integration rather than true vertical integration….”
G. Halvorson, CEO Kaiser
53
Functioning in the zone of complexity
[Figure: agreement about outcomes (low to high) versus certainty about outcomes (low to high), with regions labeled Plan and control and Chaos]
Ralph Stacey, Complexity and Creativity in Organizations, 1996
54
The Grid paradigm and information integration
Value services: Analysis, Cognitive support, Applications
Platform services: Management, Integration, Publication, Security and policy
Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO
55
“The computer revolution hasn’t happened yet.”
Alan Kay, 1997
56
[Figure: connectivity (on log scale) versus time, for Science, Enterprise, and Consumer; labels: Grid, Cloud, ????]
“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”
(George Gilder, 2001)
Computation Institute
www.ci.uchicago.edu
Thank you!