
Future Scientific Infrastructure

Ian Foster

Mathematics and Computer Science Division

Argonne National Laboratory

and

Department of Computer Science

The University of Chicago

http://www.mcs.anl.gov/~foster

Keynote Talk, QUESTnet 2002 Conference, Gold Coast, July 4, 2002


Evolution of Infrastructure

1890: Local power generation
– AC transmission => power Grid => economies of scale & revolutionary new devices

2002: Primarily local computing & storage
– Internet & optical technologies => ???


A Computing Grid

On-demand, ubiquitous access to computing, data, and services

“We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country” (Len Kleinrock, 1969)

“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances”

(George Gilder, 2001)

New capabilities constructed dynamically and transparently from distributed services


Distributed Computing + Visualization

Job submission: simulation code submitted to a remote center for execution on 1000s of nodes; the remote center generates TB+ datasets from the simulation (FLASH) code.

LAN/WAN transfer: FLASH data transferred to ANL for visualization; GridFTP parallelism utilizes high bandwidth (capable of utilizing >Gb/s WAN links).

Chiba City: visualization code constructs and stores high-resolution visualization frames for display on many devices.

WAN transfer: a user-friendly striped GridFTP application tiles the frames and stages the tiles onto display nodes.

ActiveMural Display: displays very high resolution, large-screen dataset animations.

FUTURE (1-5 yrs):
• 10s Gb/s LANs, WANs
• End-to-end QoS
• Automated replica management
• Server-side data reduction & analysis
• Interactive portals
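As a rough illustration of the kind of transfer described above, the sketch below drives globus-url-copy (the GridFTP command-line client) with parallel TCP streams. The host names, paths, stream count, and buffer size are placeholders, and exact option behavior may vary across toolkit versions.

```python
import subprocess

# Hypothetical endpoints; a real transfer would use site-specific hosts and paths.
SRC = "gsiftp://remote-center.example.org/scratch/flash/frame_0001.h5"
DST = "gsiftp://viz-cluster.example.org/data/frames/frame_0001.h5"

def transfer(src: str, dst: str, streams: int = 8) -> None:
    """Copy one file over GridFTP using several parallel TCP streams."""
    cmd = [
        "globus-url-copy",
        "-p", str(streams),                 # parallel data streams over the WAN
        "-tcp-bs", str(2 * 1024 * 1024),    # per-stream TCP buffer size (bytes)
        src,
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    transfer(SRC, DST)
```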


eScience Application: Sloan Digital Sky Survey Analysis


Cluster-finding Data Pipeline

[Diagram: the cluster-finding data pipeline, with numbered stages (1-5) running over the SDSS data products tsObj, field, brg, core, cluster, and catalog.]


Chimera Application: Sloan Digital Sky Survey Analysis

Size distribution of galaxy clusters?

[Plot: galaxy cluster size distribution; number of clusters (1 to 100,000) vs. number of galaxies (1 to 100), both on logarithmic axes.]

Chimera Virtual Data System + iVDGL Data Grid (many CPUs)
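Chimera records how each data product is derived so that results can be regenerated on demand across the Data Grid. To make that idea concrete, here is a minimal sketch (in Python, not Chimera's actual VDL) that records each stage of the cluster-finding pipeline as a derivation and walks the dependencies; the stage and file names are hypothetical, taken loosely from the diagram above.

```python
from dataclasses import dataclass, field

@dataclass
class Derivation:
    """One virtual-data record: an output, the transformation, and its inputs."""
    output: str
    transformation: str
    inputs: list = field(default_factory=list)

# Hypothetical cluster-finding pipeline, one record per stage (cf. stages 1-5 above).
records = [
    Derivation("field.par",   "fieldPrep",   ["tsObj.fit"]),
    Derivation("brg.par",     "brgSearch",   ["field.par"]),
    Derivation("core.par",    "coreSearch",  ["brg.par"]),
    Derivation("cluster.par", "clusterFind", ["core.par", "brg.par"]),
    Derivation("catalog.par", "catalogGen",  ["cluster.par"]),
]

def derivation_order(target: str, records: list) -> list:
    """Return the transformations needed to (re)derive target, dependencies first."""
    by_output = {r.output: r for r in records}
    ordered, seen = [], set()

    def visit(name: str) -> None:
        rec = by_output.get(name)
        if rec is None or name in seen:
            return
        seen.add(name)
        for dep in rec.inputs:      # derive inputs before the step that consumes them
            visit(dep)
        ordered.append(rec)

    visit(target)
    return ordered

for step in derivation_order("catalog.par", records):
    print(f"{step.transformation}: {step.inputs} -> {step.output}")
```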


Grids at NASA: Aviation Safety

[Diagram: coupled simulation models of an aircraft and its crew:
– Wing Models: lift capabilities, drag capabilities, responsiveness
– Stabilizer Models: deflection capabilities, responsiveness
– Engine Models: thrust performance, reverse thrust performance, responsiveness, fuel consumption
– Landing Gear Models: braking performance, steering capabilities, traction, dampening capabilities
– Human Models: crew capabilities (accuracy, perception, stamina, reaction times, SOPs)
– Airframe Models]


Life Sciences: Telemicroscopy

[Diagram: imaging instruments, computational resources, and large databases linked by the network, supporting data acquisition, processing, analysis, and advanced visualization.]


Business Opportunities

On-demand computing, storage, services
– Significant savings due to reduced build-out, economies of scale, reduced admin costs
– Greater flexibility => greater productivity

Entirely new applications and services
– Based on high-speed resource integration

Solution to enterprise computing crisis
– Render distributed infrastructures manageable


Grid Evolution


Grids and Industry: Early Examples

Butterfly.net: Grid for multi-player games

Entropia: Distributed computing (BMS, Novartis, …)


Grid Infrastructure

– Resources: computing, storage, data
– Communities: operational procedures, …
– Services: authentication, discovery, …
– Connectivity: reduce tyranny of distance
– Technologies: build applications, services


Example Grid Infrastructure Projects

I-WAY (1995): 17 U.S. sites for one week
GUSTO (1998): 80 sites worldwide, experimental
NASA Information Power Grid (since 1999)
– Production Grid linking NASA laboratories
INFN Grid, EU DataGrid, iVDGL, … (2001+)
– Grids for data-intensive science
TeraGrid, DOE Science Grid (2002+)
– Production Grids linking supercomputer centers
U.S. GRIDS Center
– Software packaging, deployment, support


Topics in Grid Infrastructure

Regional, national, international optical infrastructure
– I-WIRE, StarLight, APAN

TeraGrid: Deep infrastructure
– High-end support for U.S. community

iVDGL: Wide infrastructure
– Building an (international) community

Open Grid Services Architecture
– Future service & technology infrastructure



Targeted StarLight Optical Network Connections

[Map: targeted optical connections converging on the StarLight hub at NW Univ (Chicago), including Vancouver, Seattle, Portland, San Francisco, Los Angeles, San Diego (SDSC), NCSA, PSC, Atlanta, IU, U Wisconsin, NYC, NTON, AMPATH, Asia-Pacific, SURFnet, CA*net4, and CERN; a DTF 40Gb link to NCSA/UIUC; and a Chicago cross-connect reaching UIC, Ill Inst of Tech, Univ of Chicago, ANL, Indianapolis (Abilene NOC), and the St Louis GigaPoP over I-WIRE. See www.startap.net.]


I-WIRE Fiber Topology

[Diagram: fiber topology linking UIUC/NCSA, Starlight (NU-Chicago), Argonne, UChicago, IIT, UIC, the State/City Complex (James R. Thompson Ctr, City Hall, State of IL Bldg), Level(3) at 111 N. Canal, McLeodUSA at 151/155 N. Michigan (Doral Plaza), Qwest at 455 N. Cityfront, and the UC Gleacher Ctr at 450 N. Cityfront; numbers on links indicate fiber counts (strands); FNAL connection estimated 4Q2002.]

• Fiber providers: Qwest, Level(3), McLeodUSA, 360Networks
• 10 segments; 190 route miles; 816 fiber miles
• Longest segment: 140 miles
• 4 strands minimum to each site


I-WIRE Transport

[Diagram: DWDM transport interconnecting UIUC/NCSA, Starlight (NU-Chicago), Argonne, UChicago, IIT, UIC, the State/City Complex (James R. Thompson Ctr, City Hall, State of IL Bldg), the UC Gleacher Ctr (450 N. Cityfront), McLeodUSA (151/155 N. Michigan, Doral Plaza), and Qwest (455 N. Cityfront).]

• TeraGrid Linear: 3x OC192, 1x OC48; first light 6/02
• Metro Ring: 1x OC48 per site; first light 8/02
• Starlight Linear: 4x OC192, 4x OC48 (8x GbE); operational
• Each of these three ONI DWDM systems has a capacity of up to 66 channels, up to 10 Gb/s per channel
• Protection available in the Metro Ring on a per-site basis


Illinois Distributed Optical Testbed

[Map: UI-Chicago, Illinois Inst. Tech, Argonne Nat'l Lab (approx. 25 miles SW), Northwestern Univ-Chicago "Starlight", U of Chicago, and UIUC/NCSA in Urbana (approx. 140 miles south), with I-55, the Dan Ryan Expwy (I-90/94), I-290, and I-294 shown for reference; DAS-2.]


Topics in Grid Infrastructure

Regional, national, international optical infrastructure
– I-WIRE, StarLight

TeraGrid: Deep infrastructure
– High-end support for U.S. community

iVDGL: Wide infrastructure
– Building an (international) community

Open Grid Services Architecture
– Future service & technology infrastructure


TeraGrid: Deep Infrastructure (www.teragrid.org)


TeraGrid Objectives

Create unprecedented capability
– Integrated with extant PACI capabilities
– Supporting a new class of scientific research

Deploy a balanced, distributed system
– Not a "distributed computer" but rather a distributed "system" using Grid technologies
> Computing and data management
> Visualization and scientific application analysis

Define an open and extensible infrastructure
– Enabling infrastructure for scientific research
– Extensible beyond the original four sites
> NCSA, SDSC, ANL, and Caltech


TeraGrid Timelines

[Timeline chart, Jan '01 to Jan '03, with tracks for TeraGrid prototypes, Grid services on current systems, networking, operations, and applications. Milestones include: proposal submitted to NSF; early access to McKinley at Intel; TeraGrid prototype at SC2001 (60 Itanium nodes, 10 Gb/s network); early McKinleys at TeraGrid sites for testing/benchmarking; McKinley systems and TeraGrid clusters deployed; initial apps on McKinley; basic Grid services on Linux clusters, SDSC SP, and NCSA O2K; "TeraGrid Lite" systems and Grids testbed; core Grid services deployment; advanced Grid services testing; 10 Gigabit Ethernet testing; TeraGrid networking deployment; TeraGrid Operations Center prototype, day ops, and production; TeraGrid operational.]


Terascale Cluster Architecture

[Diagram: (a) terascale architecture overview with a Myrinet system interconnect (Clos mesh, each line = 8 x 2 Gb/s links) joining compute, I/O, and visualization partitions, additional clusters, and external networks; (b) example 320-node Clos network: five groups of 64 hosts on 128-port Clos switches tied together by spine switches via 64 inter-switch links; (c) I/O - storage: 64 TB RAID on an FCS storage network, GbE for external traffic; (d) visualization: rendered image files driving local displays and networks for remote display; (e) compute nodes; plus a 100 Mb/s switched Ethernet management network.]


Initial TeraGrid Design

DWDM optical mesh connecting four sites:
– NCSA: 2024 McKinley processors (8 Teraflops, 512 nodes), 250 TB RAID storage
– SDSC: 768 McKinley processors (3 Teraflops, 192 nodes), 250 TB RAID storage
– ANL: 384 McKinley processors (1.5 Teraflops, 96 nodes), 125 TB RAID storage
– Caltech: 384 McKinley processors (1.5 Teraflops, 96 nodes), 125 TB RAID storage


NSF TeraGrid: 14 TFLOPS, 750 TB

[Diagram of existing site resources, interconnected via Myrinet:
– NCSA (compute-intensive): 1024p IA-32, 320p IA-64, 1500p Origin, UniTree
– SDSC (data-intensive): 1176p IBM SP Blue Horizon, Sun E10K, HPSS
– ANL (visualization): 574p IA-32 Chiba City, 128p Origin, HR display & VR facilities, HPSS
– Caltech (data collection analysis): 256p HP X-Class, 128p HP V2500, 92p IA-32, HPSS]

WAN bandwidth options:
• Abilene (2.5 Gb/s, 10 Gb/s late 2002)
• State and regional fiber initiatives plus CANARIE CA*Net
• Leased OC48
• Dark fiber, dim fiber, wavelengths

WAN architecture options:
• Myrinet-to-GbE; Myrinet as a WAN
• Layer 2 design
• Wavelength mesh
• Traditional IP backbone



Defining Standard Services

Grid applications build on a finite set of TeraGrid services, e.g.:
– IA-64 Linux Cluster Runtime
– IA-64 Linux Cluster Interactive Development
– File-based Data Service
– Collection-based Data Service
– Volume-Render Service
– Interactive Collection-Analysis Service

Applications see standard services rather than particular implementations, but sites also provide additional services that can be discovered and exploited.


Standards Cyberinfrastructure

[Diagram: Grid applications over standard services (runtime, interactive development, file-based data service, collection-based data service, relational database data service, visualization services, interactive collection-analysis service) spanning data/information, compute, and analysis resources (IA-64 Linux clusters, Alpha clusters, IA-32 Linux clusters), supported by the TeraGrid Certificate Authority, other certificate authorities, and Grid information services.]

• TeraGrid: focus on a finite set of service specifications applicable to TeraGrid resources.
• If done well, other IA-64 cluster sites would adopt TeraGrid service specifications, increasing users' leverage in writing to the specification, and others would adopt the framework for developing similar services (for Alpha, IA-32, etc.).
• Note the specification should attempt to offer improvement over the general Globus runtime environment without bogging down attempting to do everything (for which a user is better off running interactively!).


Strategy: Define Standard Services

Finite number of TeraGrid Services
– Defined as specifications, protocols, APIs
– Separate from implementation

Example: File-based Data Service
– API/Protocol: supports FTP and GridFTP, GSI authentication
– SLA: all TeraGrid users have access to N TB storage, available 24/7 with M% availability, >= R Gb/s read, >= W Gb/s write, etc.
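One way to read this is that a standard service is published as a specification plus a quantified SLA, separate from any implementation. The sketch below captures that idea as a plain data structure; the field names and numeric values are illustrative placeholders, not actual TeraGrid figures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileDataServiceSpec:
    """Illustrative description of a 'File-based Data Service' offering."""
    protocols: tuple            # e.g. ("ftp", "gsiftp")
    authentication: str         # e.g. "GSI"
    storage_tb: float           # N TB available to every TeraGrid user
    availability_pct: float     # M% availability, 24/7
    min_read_gbps: float        # >= R Gb/s read
    min_write_gbps: float       # >= W Gb/s write

# A site advertises its implementation of the standard service (placeholder numbers).
example_offer = FileDataServiceSpec(
    protocols=("ftp", "gsiftp"),
    authentication="GSI",
    storage_tb=5.0,
    availability_pct=99.0,
    min_read_gbps=1.0,
    min_write_gbps=0.5,
)

def meets(offer: FileDataServiceSpec, need: FileDataServiceSpec) -> bool:
    """Would this site's offer satisfy an application's minimum requirements?"""
    return (set(need.protocols) <= set(offer.protocols)
            and offer.authentication == need.authentication
            and offer.storage_tb >= need.storage_tb
            and offer.availability_pct >= need.availability_pct
            and offer.min_read_gbps >= need.min_read_gbps
            and offer.min_write_gbps >= need.min_write_gbps)
```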


General TeraGrid Services

Authentication
– GSI: requires TeraGrid CA policy and services

Resource discovery and monitoring
– Define TeraGrid services/attributes to be published in Globus MDS-2 directory services
– Require standard account information exchange to map use to allocation/individual
– For many services, publish a query interface (a minimal discovery query is sketched after this list):
> Scheduler: queue status
> Compute, visualization, etc. services: attribute details
> Network Weather Service
> Allocations/accounting database: for allocation status
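Since MDS-2 is LDAP-based, a published query interface can be probed with an ordinary LDAP search. The sketch below uses the ldap3 Python library; the host, port, base DN, object class, and attribute names follow common MDS-2 conventions but should be treated as assumptions rather than TeraGrid-specific values.

```python
from ldap3 import Server, Connection, SUBTREE

# Assumed GRIS/GIIS endpoint and MDS-2 naming conventions; adjust for a real deployment.
MDS_HOST = "giis.example.org"
MDS_PORT = 2135                          # conventional MDS-2 port
BASE_DN = "mds-vo-name=local, o=grid"

def list_hosts() -> list:
    """Query the index service for registered hosts and a few of their attributes."""
    server = Server(MDS_HOST, port=MDS_PORT)
    conn = Connection(server, auto_bind=True)     # anonymous bind, as a GRIS typically allows
    conn.search(
        search_base=BASE_DN,
        search_filter="(objectclass=MdsHost)",    # object class name is illustrative
        search_scope=SUBTREE,
        attributes=["Mds-Host-hn", "Mds-Cpu-Total-Free-1minX100"],
    )
    entries = [str(e) for e in conn.entries]
    conn.unbind()
    return entries

if __name__ == "__main__":
    for entry in list_hosts():
        print(entry)
```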


General TeraGrid Services (continued)

Advanced reservation
– On-demand services
– Staging data: coordination of storage + compute

Communication and data movement
– All services assume any TeraGrid cluster node can talk to any TeraGrid cluster node
– All resources support GridFTP

"Hosting environment"
– Standard software environment
– More sophisticated dynamic provisioning issues not yet addressed


Topics in Grid Infrastructure

Regional, national, international optical infrastructure
– I-WIRE, StarLight

TeraGrid: Deep infrastructure
– High-end support for U.S. community

iVDGL: Wide infrastructure
– Building an (international) community

Open Grid Services Architecture
– Future service & technology infrastructure


iVDGL: A Global Grid Laboratory

International Virtual-Data Grid Laboratory
– A global Grid laboratory (US, Europe, Asia, South America, …)
– A place to conduct Data Grid tests "at scale"
– A mechanism to create common Grid infrastructure
– A laboratory for other disciplines to perform Data Grid tests
– A focus of outreach efforts to small institutions

U.S. part funded by NSF (2001-2006)
– $13.7M (NSF) + $2M (matching)

"We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science." (From NSF proposal, 2001)


Initial US-iVDGL Data Grid

[Map: Tier1 (FNAL), proto-Tier2, and Tier3 university sites, including Fermilab, BNL, UCSD, Florida, Wisconsin, Indiana, BU, Caltech, JHU, PSU, Hampton, SKC, and Brownsville; other sites to be added in 2002.]


iVDGL: International Virtual Data Grid Laboratory

U.S. PIs: Avery, Foster, Gardner, Newman, Szalay (www.ivdgl.org)

[Map legend: Tier0/1 facility, Tier2 facility, Tier3 facility; 10 Gbps, 2.5 Gbps, 622 Mbps, and other links.]


iVDGL Architecture (from proposal)


US iVDGL Interoperability

US-iVDGL-1 milestone (August 2002)

[Diagram: the US-iVDGL-1 testbed (Aug 2002) links ATLAS, CMS, LIGO, and SDSS/NVO sites through the iGOC.]


Transatlantic Interoperability

iVDGL-2 milestone (November 2002)

[Diagram: iVDGL-2 (Nov 2002) linking ATLAS sites (ANL, BNL, BU, IU, UM, OU, UTA, HU, LBL), CMS sites (CIT, UCSD, UF, FNAL, JHU), LIGO sites (CIT, UTB, PSU, UWM), SDSS/NVO sites (FNAL, JHU, UC), and CS research sites (ANL, UCB, UC, IU, NU, UW, ISI), plus the iGOC, outreach, and transatlantic partners via DataTAG: CERN, INFN, UK PPARC, U of A.]


Topics in Grid Infrastructure

Regional, national, international optical infrastructure
– I-WIRE, StarLight

TeraGrid: Deep infrastructure
– High-end support for U.S. community

iVDGL: Wide infrastructure
– Building an (international) community

Open Grid Services Architecture
– Future service & technology infrastructure


“Standard” Software Infrastructure: Globus Toolkit™

Small, standards-based set of protocols for distributed system management
– Authentication, delegation; resource discovery; reliable invocation; etc.

Information-centric design
– Data models; publication, discovery protocols

Open source implementation
– Large international user community

Successful enabler of higher-level services and applications


Example Grid Projects in eScience


The Globus Toolkit in One Slide

Grid protocols (GSI, GRAM, …) enable resource sharing within virtual orgs; the toolkit provides a reference implementation (= Globus Toolkit services).

[Diagram: the user authenticates via GSI (Grid Security Infrastructure) and creates a proxy credential; GRAM (Grid Resource Allocation & Management) provides reliable remote invocation against a gatekeeper (factory), which creates user processes, each with its own proxy; processes register with a reporter (registry + discovery); MDS-2 (Monitoring/Discovery Service) collects soft-state registrations and answers enquiries through a GIIS (Grid Information Index Server) for discovery; other GSI-authenticated remote service requests go to other services, e.g. GridFTP.]

Protocols (and APIs) enable other tools and services for membership, discovery, data mgmt, workflow, …
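A minimal sketch of that flow from the user's side, driving the toolkit's command-line clients: create a proxy credential with grid-proxy-init, then ask a gatekeeper to run a job described in RSL via globusrun. The gatekeeper contact string is a placeholder, and option behavior may differ between toolkit releases.

```python
import subprocess

# Placeholder gatekeeper contact; a real one names a host and its job manager.
CONTACT = "grid.example.org/jobmanager-pbs"

# RSL (Resource Specification Language) describing the job to run.
RSL = "&(executable=/bin/hostname)(count=2)"

def run_remote_job() -> None:
    # 1. Authenticate once: create a short-lived proxy credential from the user's
    #    long-term GSI credential (prompts for the key passphrase).
    subprocess.run(["grid-proxy-init"], check=True)

    # 2. Reliable remote invocation: globusrun contacts the gatekeeper (factory),
    #    which starts the user processes; -o streams their output back to us.
    subprocess.run(["globusrun", "-o", "-r", CONTACT, RSL], check=True)

if __name__ == "__main__":
    run_remote_job()
```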


Globus Toolkit: Evaluation (+)

Good technical solutions for key problems, e.g.
– Authentication and authorization
– Resource discovery and monitoring
– Reliable remote service invocation
– High-performance remote data access

This & good engineering is enabling progress
– Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
– Growing community code base built on tools


Globus Toolkit: Evaluation (-)

Protocol deficiencies, e.g.
– Heterogeneous basis: HTTP, LDAP, FTP
– No standard means of invocation, notification, error propagation, authorization, termination, …

Significant missing functionality, e.g.
– Databases, sensors, instruments, workflow, …
– Virtualization of end systems (hosting envs.)

Little work on total system properties, e.g.
– Dependability, end-to-end QoS, …
– Reasoning about system properties


Globus Toolkit Structure

[Diagram: a compute resource offers GRAM (with a job manager), a data resource offers GridFTP, and other services or applications offer ad hoc interfaces; each is layered over GSI and MDS. Cross-cutting concerns: service naming, reliable invocation, soft-state management, notification.]

Lots of good mechanisms, but (with the exception of GSI) not that easily incorporated into other systems.


Grid Evolution: Open Grid Services Architecture

Refactor Globus protocol suite to enable common base and expose key capabilities

Service orientation to virtualize resources and unify resources/services/information

Embrace key Web services technologies for standard IDL, leverage commercial efforts

Result: standard interfaces & behaviors for distributed system management: the Grid service


Open Grid Services Architecture: Transient Service Instances

"Web services" address discovery & invocation of persistent services
– Interface to persistent state of entire enterprise

In Grids, must also support transient service instances, created/destroyed dynamically
– Interfaces to the states of distributed activities
– E.g. workflow, video conf., dist. data analysis

Significant implications for how services are managed, named, discovered, and used
– In fact, much of OGSA (and the Grid) is concerned with the management of service instances


Open Grid Services Architecture

Defines fundamental (WSDL) interfaces and behaviors that define a Grid service
– Required + optional interfaces = WS "profile"
– A unifying framework for interoperability & establishment of total system properties

Defines WSDL extensibility elements
– E.g., serviceType (a group of portTypes)

Delivery via open source Globus Toolkit 3.0
– Leverage GT experience, code, community
– And commercial implementations


The Grid Service = Interfaces/Behaviors + Service Data

[Diagram: a service implementation hosts service data elements and runs in a hosting environment/runtime ("C", J2EE, .NET, …).

GridService (required) interface: service data access, explicit destruction, soft-state lifetime.

Other (optional) standard interfaces: notification, authorization, service creation, service registry, manageability, concurrency, plus application-specific interfaces.

Binding properties: reliable invocation, authentication.]
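To illustrate the required GridService behaviors in plain code, the following sketch models service data access, explicit destruction, and soft-state lifetime (a termination time that clients must periodically extend). It is a conceptual illustration in Python, not the OGSI or GT3 API; all names are hypothetical.

```python
import time

class GridServiceInstance:
    """Toy model of a Grid service: service data plus lifetime management."""

    def __init__(self, initial_lifetime_s: float):
        self.service_data = {"createdAt": time.time()}          # service data elements
        self.termination_time = time.time() + initial_lifetime_s
        self.destroyed = False

    # --- GridService (required) interface ---------------------------------
    def find_service_data(self, name: str):
        """Service data access (introspection)."""
        return self.service_data.get(name)

    def set_termination_time(self, extra_s: float) -> float:
        """Soft-state lifetime: a keepalive pushes destruction further into the future."""
        self.termination_time = max(self.termination_time, time.time() + extra_s)
        return self.termination_time

    def destroy(self) -> None:
        """Explicit destruction requested by a client."""
        self.destroyed = True

    # --- used by the hosting environment -----------------------------------
    def expired(self) -> bool:
        """Instances whose lifetime has lapsed are reclaimed automatically."""
        return self.destroyed or time.time() >= self.termination_time
```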


Grid Service Example: Database Service

A DBaccess Grid service will support at least two portTypes
– GridService
– DBaccess

Each has service data
– GridService: basic introspection information, lifetime, …
– DBaccess: database type, query languages supported, current load, …

Maybe other portTypes as well
– E.g., NotificationSource (SDE = subscribers)

[Diagram: a service instance exposing the GridService and DBaccess portTypes, with service data elements such as name, lifetime, and DB info.]


Example: Data Mining for Bioinformatics

[Animated sequence: a user application works with a community registry, a compute service provider hosting a mining factory, and a storage service provider hosting a database factory and database services over BioDB 1 … BioDB n.]

1. User: "I want to create a personal database containing data on e.coli metabolism."
2. The user application asks the community registry to "find me a data mining service, and somewhere to store data."
3. The registry returns GSHs (Grid Service Handles) for the mining and database factories.
4. The application asks the factories to "create a data mining service with initial lifetime 10" and to "create a database with initial lifetime 1000."
5. The factories create the Miner and Database service instances.
6. The Miner queries the database services (BioDB 1 … BioDB n); the application sends keepalives to both new instances to extend their lifetimes.
7. Results flow from the Miner to the new Database and to the user application.
8. Once the results are stored, the application stops keeping the Miner alive; its lifetime expires and it is destroyed, while the Database continues to be kept alive for later use.
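The sequence above can be summarized in code: a factory creates transient instances with an initial lifetime, keepalives refresh them, and instances that stop being refreshed expire. This is a conceptual Python sketch using the lifetimes quoted on the slides; all class and function names are hypothetical.

```python
import time

class Instance:
    """Minimal transient service instance: soft-state lifetime only."""
    def __init__(self, lifetime_s: float):
        self.kind = None
        self.termination_time = time.time() + lifetime_s
    def keepalive(self, extra_s: float) -> None:
        self.termination_time = max(self.termination_time, time.time() + extra_s)
    def expired(self) -> bool:
        return time.time() >= self.termination_time

class Factory:
    """Creates transient service instances with an initial lifetime, as the factories above do."""
    def __init__(self, kind: str):
        self.kind = kind
    def create(self, lifetime_s: float) -> Instance:
        inst = Instance(lifetime_s)
        inst.kind = self.kind
        return inst

# 1-3. The registry lookup would return handles (GSHs) for the two factories; modeled directly here.
mining_factory, database_factory = Factory("Miner"), Factory("Database")

# 4-5. Create the transient instances with the initial lifetimes quoted on the slides.
miner = mining_factory.create(lifetime_s=10)
database = database_factory.create(lifetime_s=1000)

# 6-7. While queries run and results flow back, keepalives refresh both lifetimes.
for _ in range(3):
    miner.keepalive(10)
    database.keepalive(1000)

# 8. The client then stops refreshing the Miner: roughly 10 seconds after the last
#    keepalive it expires and is reclaimed, while the still-refreshed Database lives on.
print("miner expires at:", miner.termination_time)
print("database expires at:", database.termination_time)
print("miner expired yet?", miner.expired())
```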


GT3: An Open Source OGSA-Compliant Globus Toolkit

GT3 Core
– Implements Grid service interfaces & behaviors
– Reference implementation of the evolving standard
– Multiple hosting environments: Java/J2EE, C, C#/.NET?

GT3 Base Services
– Evolution of current Globus Toolkit capabilities

Many other Grid services

[Diagram: GT3 Core underpinning GT3 Base Services, GT3 Data Services, and other Grid services.]


OGSA Definition and Delivery (Very Approximate!!)

[Timeline diagram: the Grid Service Specification, developed in the GGF OGSI WG, is prototyped in the Globus OGSI reference implementation, with feedback between the two, and is also implemented by other systems; other GGF WGs develop further core specs (security, resource management, etc.) and other specs (databases, etc.); these feed the Globus Toolkit Version 3 and other OGSA-based software over time.]


Q2: What Higher-Level Services?


Summary: Grid Infrastructure

Grid applications demand new infrastructure beyond traditional computers and networks
– Network-accessible resources of all types
– High-speed networks
– Services and operational procedures
– Software technology for building services (which must also be treated as infrastructure)

TeraGrid, iVDGL, StarLight, DOT
– Connections to international sites?


Summary: Open Grid Services Architecture

Open Grid Services Architecture represents (we hope!) next step in Grid evolution

Service orientation enables unified treatment of resources, data, and services

Standard interfaces and behaviors (the Grid service) for managing distributed state

Deeply integrated information model for representing and disseminating service data

Open source Globus Toolkit implementation (and commercial value adds)


For More Information

Survey + research articles: www.mcs.anl.gov/~foster
I-WIRE: www.iwire.org
TeraGrid: www.teragrid.org
iVDGL: www.ivDGL.org
The Globus Project™: www.globus.org
GriPhyN project: www.griphyn.org
Global Grid Forum: www.gridforum.org
– OGSI-WG: www.gridforum.org/ogsi-wg
– Edinburgh, July 22-24
– Chicago, Oct 15-17