Service Oriented Science Ian Foster Argonne National Laboratory University of Chicago Univa Corporation

Service Oriented Science

Ian Foster

Argonne National Laboratory

University of Chicago

Univa Corporation

Two Exciting Things That I Won’t Talk About

• Globus Toolkit v4 (release: April 30, 2005)
  - Robustness, performance, usability, testing, documentation, standards compliance
  - E.g., GRAM supports 30,000 active jobs
  - 180+ people on alpha tester list
  - New functionality: data management, security, registry, OGSA-DAI, C hosting, etc.
• Our work with DAGMan, Condor-G, Condor
  - > 1 million jobs (we estimate) run over the last year from many application domains
  - Mike Wilde’s talk (yesterday) gave details

Instead: Scaling eScience

• Dimensions of scaling
• Service-oriented science
• Separating concerns: hosting eScience communities

eScience [n]: Large-scale science carried out through distributed collaborations—often leveraging access to large-scale data & computing

Dimensions of Scaling: For Example, U.S. Dept of Energy

[Figure: map of DOE Office of Science (SC) sites. Legend: SC Laboratories; SC User Facilities; Institutions that Use SC Facilities. Facility types: Physics Accelerators, Synchrotron Light Sources, Neutron Sources, Special Purpose Facilities, Large Fusion Experiments.]

• Lawrence Berkeley National Lab
  - Advanced Light Source
  - National Center for Electron Microscopy
  - National Energy Research Scientific Computing Facility
• Los Alamos Neutron Science Center
• Univ. of IL
  - Electron Microscopy Center for Materials Research
  - Center for Microanalysis of Materials
• MIT
  - Bates Accelerator Center
  - Plasma Science & Fusion Center
• Fermi National Accelerator Lab
  - Tevatron
• Stanford Linear Accelerator Center
  - B-Factory
  - Stanford Synchrotron Radiation Laboratory
• Princeton Plasma Physics Lab
• General Atomics
  - DIII-D Tokamak
• Pacific Northwest National Lab
  - Environmental Molecular Sciences Lab
• Argonne National Lab
  - Intense Pulsed Neutron Source
  - Advanced Photon Source
  - Argonne Tandem Linac Accelerator System
• Brookhaven National Lab
  - Relativistic Heavy Ion Collider
  - National Synchrotron Light Source
• Oak Ridge National Lab
  - High-Flux Isotope Reactor
  - Surface Modification & Characterization Center
  - Spallation Neutron Source (under construction)
• Thomas Jefferson National Accelerator Facility
  - Continuous Electron Beam Accelerator Facility
• Sandia Combustion Research Facility
• James R. MacDonald Laboratory

Dimensions of Scaling: E.g., U.S. Dept of Energy

Goal: Any DOE scientist can access any DOE computer, software, data, instrument

• ~25,000 scientists* (vs. ~1000 DOE certs)
• ~1000 instruments** (vs. maybe 10 online?)
• ~1000 scientific applns** (vs. 2 Fusion services)
• ~10 PB of interesting data** (vs. 100 TB on ESG)
• ~100,000 computers* (vs. ~3000 on OSG)
• Not to mention many external partners

I.e., we need to scale by 2-3 orders of magnitude to have DOE-wide impact!

* Rough estimate; ** WAG
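
As a rough check on the "2-3 orders of magnitude" figure, the sketch below computes log10(target / current) for each pair quoted on this slide. The numbers are copied from the slide; the dictionary keys, the function names, and the decimal conversion 1 PB = 1000 TB are assumptions made for illustration only.

```python
import math

# (target, current) pairs taken from the slide; units match within each pair.
# Assumption: 1 PB = 1000 TB (decimal units), so 10 PB is written as 10_000 TB.
GAPS = {
    "scientists": (25_000, 1_000),
    "instruments": (1_000, 10),
    "applications": (1_000, 2),
    "data_TB": (10_000, 100),
    "computers": (100_000, 3_000),
}


def orders_of_magnitude(target, current):
    """Return how many powers of ten separate current from target."""
    return math.log10(target / current)


def scaling_report(gaps):
    """Map each dimension to its gap, rounded to one decimal place."""
    return {
        name: round(orders_of_magnitude(target, current), 1)
        for name, (target, current) in gaps.items()
    }


for name, gap in scaling_report(GAPS).items():
    print(f"{name:>12}: ~{gap} orders of magnitude")
```

Under these assumptions the gaps range from about 1.4 (scientists) to about 2.7 (applications) orders of magnitude, which is broadly in line with the slide's claim given that the inputs are themselves rough estimates.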

Scaling eScience

• Dimensions of scaling
• Service-oriented science
• Separating concerns: hosting eScience communities

eScience [n]: Large-scale science carried out through distributed collaborations—often leveraging access to large-scale data & computing

Scaling eScience: A Services Approach

• Take the “Grid” moniker seriously
  - Not “discover, deploy, debug, monitor, resubmit, …” but “plug in and tune out”
• For example
  - GriPhyN virtual data service dispatches analysis tasks to campus or national Grid
  - Campus CHARMM service dispatches large jobs to national resources
  - Online biology service serves thousands, uses national resources to preprocess data
• I.e., eScience as “service”

For Example: BLASTing for Protein Knowledge

BLASTing the complete NR DB for sequence similarity and function characterization

Knowledge Base

PUMA enables researchers to find information about a specific protein after it has been analyzed against the complete set of sequenced genomes (NR file: ~2 million sequences).

Analysis on the Grid

The analysis of protein sequences occurs in the background in the Grid environment. Millions of processes are started because several tools are run on each sequence, including protein similarity searches (BLAST), protein family domain searches (BLOCKS), and analyses of the protein's structural characteristics.
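
The "millions of processes" figure follows from fanning each sequence out across several tools. The sketch below illustrates that fan-out. It assumes exactly the three analyses named on this slide (the slide says "several", so the real count may differ), and the tool labels, task tuples, and function names are hypothetical.

```python
from itertools import product

# Assumption: three analysis tools, matching the three named on the slide.
# "STRUCTURE" is a placeholder label, not the name of a real tool.
TOOLS = ("BLAST", "BLOCKS", "STRUCTURE")


def fan_out(sequence_ids, tools=TOOLS):
    """Lazily yield one (sequence_id, tool) task per combination."""
    return product(sequence_ids, tools)


def task_count(n_sequences, n_tools=len(TOOLS)):
    """Number of independent tasks produced by the fan-out."""
    return n_sequences * n_tools


# Small demonstration with two made-up sequence identifiers.
for task in fan_out(["seq-0001", "seq-0002"]):
    print(task)

print(task_count(2_000_000))
```

With roughly 2 million sequences and three tools, this gives about 6 million independent tasks, which is the kind of workload that suits background execution on a Grid.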

Service-Oriented Science Requires Grid Technology

• Service-oriented applications
  - Wrap applications as (Web) services
  - Compose applications into workflows
• Service-oriented infrastructure
  - Provision physical resources to support application workloads

[Figure labels: Users, Workflows, Composition, Invocation, Provisioning, Appln Service (×2)]
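
The idea of wrapping applications as services and composing them into workflows can be sketched in a few lines. This is an in-process stand-in, not a real Web service stack: the `ApplicationService` class, the `compose` helper, and the two toy applications are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]


class ApplicationService:
    """Wrap an application behind one uniform interface: dict in, dict out."""

    def __init__(self, name: str, func: Callable[[Message], Message]):
        self.name = name
        self._func = func

    def invoke(self, request: Message) -> Message:
        return self._func(request)


def compose(services: List[ApplicationService]) -> ApplicationService:
    """Build a workflow that feeds each service's reply to the next service."""

    def run(request: Message) -> Message:
        message = request
        for service in services:
            message = service.invoke(message)
        return message

    return ApplicationService("+".join(s.name for s in services), run)


# Two toy "applications" (hypothetical, for demonstration only).
def count_residues(request: Message) -> Message:
    return {**request, "length": len(request["sequence"])}


def classify(request: Message) -> Message:
    label = "long" if request["length"] > 10 else "short"
    return {**request, "size_class": label}


workflow = compose([
    ApplicationService("count", count_residues),
    ApplicationService("classify", classify),
])
print(workflow.name, workflow.invoke({"sequence": "MKTAYIAKQR"}))
```

Because a composed workflow is itself an `ApplicationService`, it can be invoked by users or nested inside a larger workflow, which mirrors the Composition and Invocation labels in the figure.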

Grid Technology as Service-Oriented Infrastructure

• Uniform interfaces, security mechanisms, Web service transport, monitoring

[Figure: architecture diagram. Labels: User Application (×3); GRAM; GridFTP; DAIS; Database; Reliable File Transfer; MyProxy; MDS-Index; Host Env / User Svc (×2); Tool (×2); Computers; Storage; Specialized resource]

Scaling eScience

• Dimensions of scaling
• Service-oriented science
• Separating concerns: hosting eScience communities

eScience [n]: Large-scale science carried out through distributed collaborations—often leveraging access to large-scale data & computing

Scaling eScience: A Range of Approaches

• Cookie cutter
  - Standard h/w + s/w
  - E.g., BIRN, PlanetLab, NEES
  - Simple deployment, limited scalability
• Service ecology
  - Standard interfaces, many service providers
  - E.g., NVO, bioinformatics
  - Powerful model, limited service capacity
• General-purpose infrastructure
  - Standard resource provider interfaces
  - E.g., TeraGrid, OSG
  - Need to work out how to host services

Scaling eScience: Separating Concerns

• Content
  - Stuff that a community cares about: data, metadata, software, analyses, instruments
  - Community responsibility
• Middleware/function
  - Plumbing needed for community to function: membership, data mgmt, registry, workflow
  - Can often be provided by others
• Resources
  - The physical devices required to support community content, function, computation
  - Need not be the concern of individual users!

Scaling eScience: Separating Concerns

[Figure: components arranged by layer (Content, Function, Resources) and by kind (Domain-independent, Domain-dependent). Labels: Experimental apparatus; Servers, storage, networks; Metadata catalog; Data archive; Simulation server; Certificate authority; Simulation code (×2); Expt design; Telepresence monitor; Expt output; Electronic notebook; Portal server]

Virtualizing Resources (K. Keahey et al.)

• “Virtual workspace” as a core abstraction
  - Computer(s), network(s), configuration(s)
• Multiple implementation technologies
  - Dynamic accounts (e.g., gLite deployment)
  - Virtual machines (current prototyping)
• E.g., “OSG virtual cluster”
  - A collection of virtual machines running standard OSG software (Virtual Data Toolkit)
  - Instantiation by a resource provider makes it immediately accessible as an OSG cluster
  - Load (3 nodes): 1.3 sec; start: 0.7 sec
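
One way to picture the "virtual workspace" abstraction is as a plain data structure that bundles computers, networks, and configuration, tagged with the technology used to realize it. The sketch below is illustrative only. Every class, field, hostname, and helper name is hypothetical and does not reflect the actual workspace service interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Implementation(Enum):
    """The two implementation technologies listed on the slide."""
    DYNAMIC_ACCOUNT = "dynamic account"
    VIRTUAL_MACHINE = "virtual machine"


@dataclass(frozen=True)
class Node:
    hostname: str   # placeholder hostname, not a real machine
    software: str   # software stack installed on the node


@dataclass
class VirtualWorkspace:
    """Computer(s), network(s), and configuration(s) bundled as one unit."""
    name: str
    implementation: Implementation
    computers: List[Node] = field(default_factory=list)
    networks: List[str] = field(default_factory=list)
    configuration: Dict[str, str] = field(default_factory=dict)


def osg_virtual_cluster(n_nodes: int) -> VirtualWorkspace:
    """Describe a cluster of VMs that all run the Virtual Data Toolkit."""
    nodes = [
        Node(f"vm{i}.example.org", "Virtual Data Toolkit")
        for i in range(n_nodes)
    ]
    return VirtualWorkspace(
        name="osg-virtual-cluster",
        implementation=Implementation.VIRTUAL_MACHINE,
        computers=nodes,
        networks=["private-net"],
        configuration={"role": "OSG cluster"},
    )


cluster = osg_virtual_cluster(3)
print(cluster.name, [node.hostname for node in cluster.computers])
```

Keeping the description separate from the implementation technology is what would let a resource provider instantiate the same workspace either as dynamic accounts or as virtual machines.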

Summary

• Q: How to scale eScience?
• A1: Virtualization: eScience as service
  - AKA “science gateways”
  - Service-oriented infrastructure for management & provisioning
• A2: Separation of concerns
  - Allow providers to host communities by providing resources & function
  - Virtual workspaces as an enabling technology

For More Information

Globus Alliance www.globus.org

Globus Consortium www.globusconsortium.com

Global Grid Forum www.ggf.org

Open Science Grid www.opensciencegrid.org

Background information www.mcs.anl.gov/~foster

2nd Edition www.mkp.com/grid2