Globus Presented by: Yayati Kasralikar for CPA 5937.


Page 2

Motivational Example

[Figure: cancer image data mining software; a very large database of cancer images; several resources (R) holding cancer images; data pre-processing software; a high-performance machine.]

Page 3

What is Grid?

A Grid is a system that:

1. Coordinates resources that are not subject to centralized control.
2. Uses standard, open, general-purpose protocols and interfaces.
3. Delivers nontrivial qualities of service.

• Let's examine some technologies against this checklist:
– Clusters: centralized control.
– P2P systems (e.g. Gnutella): do not use open and standard protocols.
– Web: no coordinated use of resources.

Page 4

Why use Grid?

• A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour.
• 1,000 physicists worldwide pool resources for peta-op analyses of petabytes of data.
• An insurance company mines data from partner hospitals for fraud detection.
• An application service provider offloads excess load to a compute cycle provider.

Page 5

Virtual Organization (VO)

[Figure: many resources (R) grouped into three virtual organizations, VO A, VO B and VO C.]

A dynamic set of individuals or institutions sharing resources for problem solving.

Page 6

Grid Characteristics

• Scale and resource selection
– Particular applications select resources from a very large collection according to criteria such as connectivity, cost, security and reliability.
• Heterogeneity at multiple levels
– Heterogeneity ranges from physical devices and system software to scheduling and usage policies.
• Dynamic and unpredictable behavior
– Behavior and performance of shared resources vary over time.
• Multiple administrative domains
– A challenging security problem.

Page 7

Globus Initiative

• Provide basic infrastructure, protocols, services, APIs and SDKs for Grid computing.
– Protocols: focus on externals (interactions) rather than internals (resource characteristics) (e.g. GRIP, IP).
– Services: protocol + behavior (e.g. information services).
– APIs and SDKs: help application developers build complex applications (e.g. GSS API, JDBC API, JNDI SDK), improving application robustness and correctness and lowering development and maintenance cost.
• Globus Toolkit: a community-based, open-architecture, open-source set of services and software libraries that supports Grids and Grid applications.

Page 8

Layered Grid Architecture

[Figure: the Grid protocol architecture shown beside the Internet protocol architecture.]

• Grid protocol architecture (top to bottom): Application, Collective, Resource, Connectivity, Fabric
• Internet protocol architecture (top to bottom): Application, Transport, Internet, Link

Page 9

Connectivity Layer

[Figure: Grid protocol architecture stack: Application, Collective, Resource, Connectivity, Fabric.]

Connectivity layer components:
• Nexus interface
• Grid Security Infrastructure (GSI)

Page 10

Resource Layer

[Figure: Grid protocol architecture stack: Application, Collective, Resource, Connectivity, Fabric.]

Resource layer protocols:
• Grid information services: Grid Resource Information Protocol (GRIP) and Grid Resource Registration Protocol (GRRP)
• Data transfer: GridFTP
• Resource management: Grid Resource Access and Management (GRAM)

Page 11

Collective Layer

[Figure: Grid protocol architecture stack: Application, Collective, Resource, Connectivity, Fabric.]

Collective layer services:
• Directory services
• Data replication services
• Monitoring services
• Scheduling and brokering services

Page 12

Application Layer

[Figure: Grid protocol architecture stack: Application, Collective, Resource, Connectivity, Fabric. An application reaches each lower layer through that layer's APIs and protocols.]

• Languages & frameworks
• Collective APIs and SDKs / Collective service protocols
• Resource APIs and SDKs / Resource service protocols
• Connectivity APIs / Connectivity protocols
• Fabric

Page 13

Communication Services

[Figure: Nexus communication mechanism. Startpoints (SP) and endpoints (EP) in address spaces 0, 1 and 2 are joined by communication links.]

• Applications have diverse communication needs.
• IP does not meet these needs; MPI, on the other hand, does not provide a rich range of communication abstractions.
• Nexus is built on two abstractions: the communication link and the remote service request (RSR).
– An RSR is a one-sided, asynchronous RPC: it transfers data from a startpoint (SP) to the linked endpoint(s) (EP) and integrates it into the process containing the EP(s).
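The link and RSR abstractions can be sketched in a few lines. This is a minimal, single-process illustration under stated assumptions: the `Endpoint` and `Startpoint` classes, the `register`, `bind`, `rsr` and `poll` methods, and the handler name `"store"` are all hypothetical and are not the Nexus API. The point is only that `rsr` enqueues work and returns without waiting for a reply, and that the handler later runs on the endpoint's side.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[bytes], None]


class Endpoint:
    """Receiving end of a communication link (illustrative, not the Nexus API)."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}
        self.pending: Deque[Tuple[str, bytes]] = deque()

    def register(self, handler_name: str, handler: Handler) -> None:
        self.handlers[handler_name] = handler

    def poll(self) -> int:
        """Run the handler for each pending RSR; return how many ran."""
        ran = 0
        while self.pending:
            handler_name, payload = self.pending.popleft()
            self.handlers[handler_name](payload)
            ran += 1
        return ran


class Startpoint:
    """Sending end; may be linked to one or more endpoints."""

    def __init__(self) -> None:
        self.links: List[Endpoint] = []

    def bind(self, endpoint: Endpoint) -> None:
        self.links.append(endpoint)

    def rsr(self, handler_name: str, payload: bytes) -> None:
        # One-sided and asynchronous: enqueue and return, no reply is awaited.
        for endpoint in self.links:
            endpoint.pending.append((handler_name, payload))


received_a: List[bytes] = []
received_b: List[bytes] = []
ep_a, ep_b = Endpoint(), Endpoint()
ep_a.register("store", received_a.append)
ep_b.register("store", received_b.append)

sp = Startpoint()
sp.bind(ep_a)
sp.bind(ep_b)
sp.rsr("store", b"block-0")  # returns immediately; no handler has run yet
ep_a.poll()                  # the handler now runs on the endpoint's side
ep_b.poll()
```

Binding one startpoint to two endpoints shows the "EP(s)" case from the slide: a single RSR delivers the same data to every linked endpoint.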

Page 14

Resource Management

Challenging resource management problems:
• Site autonomy
– Resources are typically owned and operated by different organizations, in different administrative domains.
• Heterogeneous substrate
– Different sites may use different local resource management systems.
• Policy extensibility
– A resource management solution must support the frequent development of new domain-specific management structures.
• Co-allocation
– Using resources simultaneously at several sites.
• Online control
– Substantial negotiation can be required to adapt application requirements to resource availability.

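Page 15

Resource Management Architecture

[Figure: resource management architecture.]

• The application expresses its requirements in RSL.
• Brokers specialize the RSL, exchanging queries and information with the Information Service, until it becomes ground RSL.
• A co-allocator splits ground RSL into simple ground RSL requests.
• Each simple ground RSL request goes to a GRAM, which sits in front of a local resource manager (LSF, Condor, NQE).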

Page 16

Resource Specification Language

• Based on the syntax for filter specifications in LDAP.
• An RSL expression is constructed by combining simple parameter specifications and conditions with the following operators:
– & : conjunction
– | : disjunction
– + : combines two or more requests
• Resource brokers, co-allocators and resource managers can each define a set of parameters.
• Example: I want "5 nodes with at least 256MB memory, or 10 nodes with at least 64MB, for myprog"
• RSL: &(executable=myprog)(|(&(count=5)(memory>=256))(&(count=10)(memory>=64)))
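Because RSL nests prefix operators inside parentheses, it is easy to drop or double a parenthesis when writing a request by hand. The sketch below builds the request string from small helpers so the parentheses always balance. The helper names (`relation`, `conjunction`, `disjunction`, `group`) are hypothetical and belong to no Globus library; only the `&`, `|` operators and the attributes `executable`, `count` and `memory` come from the slide.

```python
def relation(attribute: str, operator: str, value: object) -> str:
    """A single parenthesized condition such as (count=5)."""
    return f"({attribute}{operator}{value})"


def _combine(symbol: str, parts: tuple) -> str:
    if not parts:
        raise ValueError("an RSL operator needs at least one operand")
    return symbol + "".join(parts)


def conjunction(*parts: str) -> str:
    """& : every operand must hold."""
    return _combine("&", parts)


def disjunction(*parts: str) -> str:
    """| : any one operand may hold."""
    return _combine("|", parts)


def group(expression: str) -> str:
    """Wrap a compound expression so it can be used as an operand."""
    return f"({expression})"


# "5 nodes with at least 256MB, or 10 nodes with at least 64MB, for myprog"
request = conjunction(
    relation("executable", "=", "myprog"),
    group(
        disjunction(
            group(conjunction(relation("count", "=", 5), relation("memory", ">=", 256))),
            group(conjunction(relation("count", "=", 10), relation("memory", ">=", 64))),
        )
    ),
)
print(request)
```

The printed string is the example request in balanced form: one conjunction whose second operand is a disjunction of two alternative node/memory combinations.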

Page 17

Local Resource Management

• The Globus Resource Allocation Manager (GRAM) provides the local component for resource management.
• GRAM is responsible for:
1. Processing RSL specifications.
2. Enabling remote monitoring and management of jobs.
3. Periodically updating the information service.
• Two major software components of GRAM:
1. Gatekeeper: creates the Grid service.
2. Job Manager Instance (JMI): resource management and job control.

Page 18

The Hour-Glass Principle

• A simple, well-defined interface forms the neck.
• It gives uniform access to diverse local implementations below and to higher-level global services above.

Page 19

Grid Security Characteristics

• Single sign-on
– Users must be able to authenticate just once to access multiple Grid resources.
• Delegation
– Users must be able to endow a program with the ability to run on their behalf.
• Integration with local security solutions
– Interoperate with various local solutions.
• User-based trust relationships
– Resource providers must not be required to interact with each other to configure the security environment.

Page 20

Security Policies

• The Grid environment consists of multiple trust domains.
• Operations confined to a single trust domain are subject to local security policy only.
• Both local and global subjects exist. For each trust domain, there exists a partial mapping from global to local subjects.
• Operations between entities located in different trust domains require mutual authentication.
• An authenticated global subject mapped into a local subject is assumed to be equivalent to being locally authenticated as that local subject.
• All access control decisions are made locally on the basis of the local subject.
• A program or process is allowed to act on behalf of a user and be delegated a subset of the user's rights.
• Processes running on behalf of the same subject within the same trust domain may share a single set of credentials.
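Two of these policies combine into one small flow: a global subject is first mapped to a local subject, and the access decision is then made purely on the local subject. The sketch below is illustrative only. `TrustDomain`, `UnmappedSubjectError`, the subject name `/O=Grid/CN=Alice` and the operation names are assumptions made for the example, not GSI data structures, and authentication itself is taken as already done.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class UnmappedSubjectError(Exception):
    """The global subject has no local identity in this trust domain."""


@dataclass
class TrustDomain:
    name: str
    # Partial mapping: only some global subjects have a local identity here.
    global_to_local: Dict[str, str] = field(default_factory=dict)
    # Local policy, expressed only in terms of local subjects.
    local_rights: Dict[str, Set[str]] = field(default_factory=dict)

    def map_subject(self, global_subject: str) -> str:
        if global_subject not in self.global_to_local:
            raise UnmappedSubjectError(
                f"{global_subject} has no local identity in {self.name}"
            )
        return self.global_to_local[global_subject]

    def authorize(self, global_subject: str, operation: str) -> bool:
        # The decision is made locally, on the basis of the local subject.
        local_subject = self.map_subject(global_subject)
        return operation in self.local_rights.get(local_subject, set())


site_b = TrustDomain(
    name="site-b",
    global_to_local={"/O=Grid/CN=Alice": "alice"},
    local_rights={"alice": {"read", "submit"}},
)
print(site_b.authorize("/O=Grid/CN=Alice", "submit"))
```

Keeping the mapping and the rights in separate tables mirrors the policy: the site never needs to know about global subjects when it writes its access rules.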

Page 21

Globus Security Infrastructure

[Figure: a user holds credentials (Kerberos or a public key certificate); a user proxy holds Globus credentials (a certificate); at each site, GSI and GRAM stand in front of the user processes started on the user's behalf.]

Page 22

Globus Security Scenario

[Figure: a user working across Site A (Kerberos), Site B (Unix) and Site C (Kerberos).]

• Single sign-on via "grid-id" and generation of a proxy credential, or retrieval of a proxy credential from an online repository. The user proxy holds the proxy credential.
• Remote process creation requests go to GSI-enabled GRAM servers, which authorize the request, map it to a local id, create the process and generate credentials. Each process holds a restricted proxy and runs under a local id; one process also holds a Kerberos ticket.
• The processes communicate with each other.
• A remote file access request goes to a GSI-enabled FTP server in front of a storage system, which authorizes the request, maps it to a local id and accesses the file.

Page 23

Information Services

• Initial discovery and ongoing monitoring of resources.
• Existing services such as LDAP and UDDI do not address the dynamic addition and deletion of resources.
• Two fundamental entities in a Grid information service:
– Highly distributed information providers.
– Specialized aggregate directory services.
• Both these entities speak two fundamental protocols.

Page 24

Information Services

[Figure: information provider services (P) and VO-specific aggregate directories (D). Labeled interactions: discovery (GRIP), lookup (GRIP), registration (GRRP).]

Page 25

Information Services - Protocols

Grid Information Protocol (GRIP)
– Used to access information about entities.
– Supports both discovery and enquiry.
– Adopted from the Lightweight Directory Access Protocol (LDAP).
– LDAP defines a data model, a query language and a wire protocol.

Grid Registration Protocol (GRRP)
– Defines a notification mechanism to push simple information from one 'element' to another 'element'.
– It is a soft-state protocol, which makes it resilient to failures.
– A GRRP message contains the name of the service, the type of notification and a timestamp.
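Soft state means a registration is only valid for a limited time and must be refreshed; if the sender fails, its entry simply ages out without any explicit cleanup message. The sketch below illustrates that idea with the three message fields named on the slide. `Registration`, `SoftStateDirectory`, the `ttl` parameter and the service name are hypothetical, and the real GRRP message format is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Registration:
    service: str            # name of the service being described
    notification_type: str  # type of notification
    timestamp: float        # when the message was sent, in seconds


class SoftStateDirectory:
    """Keeps a service listed only while its registrations keep arriving."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl  # assumed validity period of one registration
        self._expires_at: Dict[str, float] = {}

    def receive(self, message: Registration) -> None:
        # Each message (re)starts the validity period for that service.
        self._expires_at[message.service] = message.timestamp + self.ttl

    def live_services(self, now: float) -> List[str]:
        # Drop entries that were not refreshed in time.
        self._expires_at = {
            service: expiry
            for service, expiry in self._expires_at.items()
            if expiry > now
        }
        return sorted(self._expires_at)


directory = SoftStateDirectory(ttl=30.0)
directory.receive(Registration("gram.site-a", "register", timestamp=0.0))
directory.receive(Registration("gram.site-a", "register", timestamp=20.0))
print(directory.live_services(now=45.0))
print(directory.live_services(now=51.0))
```

Time is passed in explicitly rather than read from the system clock so the expiry behavior is easy to follow: the refresh at 20 seconds keeps the service listed at 45 seconds, and with no further refresh it is gone at 51 seconds.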

Page 26

Hierarchical Discovery

[Figure: a network of aggregate directories. Information providers for hosts (R1, R2, R3) in organizations O1 and O2 feed the Center 1 and Center 2 directories, which in turn feed the VO directory.]

• Center 1 directory: Host: hn=R1; Host: hn=R2; Host: hn=R3
• Center 2 directory: Host: hn=R1; Host: hn=R2
• VO directory: Host: hn=R1,O=O1; Host: hn=R2,O=O1; Host: hn=R3,O=O1; Host: hn=R1,O=O2; Host: hn=R2,O=O2; Host: hn=R1
• Each directory uses GRIP and acts as an information provider.
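The statement that each directory acts as an information provider is what makes the hierarchy work: a directory answers the same kind of query as a leaf provider, so directories can be stacked. The sketch below shows that structure with a shared `search` method. `HostProvider`, `AggregateDirectory` and the predicate-based query are assumptions for illustration; real GRIP queries are LDAP searches, which this does not model. The `hn` and `O` attribute names follow the slide.

```python
from typing import Callable, Dict, List, Protocol

Entry = Dict[str, str]
Predicate = Callable[[Entry], bool]


class InformationSource(Protocol):
    def search(self, predicate: Predicate) -> List[Entry]: ...


class HostProvider:
    """Leaf information provider describing a single host."""

    def __init__(self, hn: str, organization: str) -> None:
        self.entry: Entry = {"hn": hn, "O": organization}

    def search(self, predicate: Predicate) -> List[Entry]:
        return [dict(self.entry)] if predicate(self.entry) else []


class AggregateDirectory:
    """Answers queries by asking every registered member."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members: List[InformationSource] = []

    def register(self, member: InformationSource) -> None:
        self.members.append(member)

    def search(self, predicate: Predicate) -> List[Entry]:
        results: List[Entry] = []
        for member in self.members:
            results.extend(member.search(predicate))
        return results


center1 = AggregateDirectory("Center 1")
for hn in ("R1", "R2", "R3"):
    center1.register(HostProvider(hn, "O1"))

center2 = AggregateDirectory("Center 2")
for hn in ("R1", "R2"):
    center2.register(HostProvider(hn, "O2"))

vo = AggregateDirectory("VO")
vo.register(center1)  # a directory registers exactly like a provider
vo.register(center2)

print(vo.search(lambda entry: entry["hn"] == "R1"))
```

A query for `hn=R1` at the VO level returns one entry per organization, which is why the VO directory needs the `O` attribute to tell the two hosts apart.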

Page 27

Data Transfer - GridFTP

• A high-speed transport protocol that extends the popular FTP protocol.
• GridFTP functionality:
– GSI support
– Third-party control of data transfer
– Parallel data transfer
– Striped data transfer
– Partial file transfer
– Support for reliable and restartable data transfer
• The implementation consists of two principal libraries: globus_ftp_control_library and globus_ftp_client_library
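Parallel and partial transfers both rest on the same arithmetic: a byte range of the file is carved into contiguous pieces, and each piece can travel on its own stream. The sketch below shows only that partitioning step. `split_range` is a hypothetical helper, not part of either Globus library, and it says nothing about how GridFTP actually negotiates or schedules streams.

```python
from typing import List, Tuple


def split_range(offset: int, length: int, streams: int) -> List[Tuple[int, int]]:
    """Split [offset, offset + length) into up to `streams` (start, size) chunks."""
    if offset < 0 or length < 0 or streams < 1:
        raise ValueError("offset and length must be >= 0 and streams >= 1")
    base, extra = divmod(length, streams)
    chunks: List[Tuple[int, int]] = []
    start = offset
    for index in range(streams):
        # The first `extra` chunks carry one additional byte each.
        size = base + (1 if index < extra else 0)
        if size == 0:
            break  # fewer bytes than streams: leave the remaining streams idle
        chunks.append((start, size))
        start += size
    return chunks


# Whole-file transfer of 10 bytes over 3 streams.
print(split_range(0, 10, 3))
# Partial transfer: 2 bytes starting at offset 100, with 4 streams available.
print(split_range(100, 2, 4))
```

Restarting a failed transfer fits the same model: the bytes still missing form new ranges that can be split and sent again.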

Page 28

Replica Management Service

[Figure: an application interacting with the Metadata Service, the Replica Management Service, the Replica Selection Service and Information Services. Numbered steps as labeled on the slide:]

(1) Attributes of desired data
(2) Logical file names
(3) (unlabeled)
(4) Location of one or more replicas
(5) (unlabeled)
(6) Sources and destination
(7) Performance measurements and predictions
(8) Location of selected replicas

Page 29

Replica Management Service

• Creating new copies of a complete or partial collection of files.
• Registering them in a Replica Catalog.
• Allowing applications to query the catalog.
• Data are organized into files.
– Logical file name vs. physical file name.
• Key architecture decisions:
– Separation of replication and metadata information
– Does not enforce replication semantics
– Provides rollback to keep the state consistent in case of failures
– No distributed locking mechanism
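The logical-versus-physical distinction and the rollback decision can be shown together. In the sketch below a catalog maps each logical file name to the physical URLs of its replicas, and creating a replica undoes the copy if registration fails, so no unregistered copy is left behind. Everything here is an assumption made for illustration: `ReplicaCatalog`, `create_replica`, the copy-then-register ordering, the `copy` and `delete` callbacks, and the `example.org` URLs are not the Globus replica management API.

```python
from typing import Callable, Dict, List, Set


class ReplicaCatalog:
    """Maps a logical file name to the physical URLs of its replicas."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = {}

    def register(self, logical_name: str, physical_url: str) -> None:
        urls = self._replicas.setdefault(logical_name, [])
        if physical_url not in urls:
            urls.append(physical_url)

    def lookup(self, logical_name: str) -> List[str]:
        return list(self._replicas.get(logical_name, []))


def create_replica(
    catalog: ReplicaCatalog,
    logical_name: str,
    destination_url: str,
    copy: Callable[[str], None],
    delete: Callable[[str], None],
) -> bool:
    """Copy, then register; roll the copy back if registration fails."""
    copy(destination_url)
    try:
        catalog.register(logical_name, destination_url)
    except Exception:
        delete(destination_url)  # rollback keeps storage and catalog consistent
        return False
    return True


class UnavailableCatalog(ReplicaCatalog):
    """Stand-in for a catalog that cannot be reached."""

    def register(self, logical_name: str, physical_url: str) -> None:
        raise RuntimeError("catalog unavailable")


stored: Set[str] = set()  # stands in for the files held at storage sites
catalog = ReplicaCatalog()
url = "gsiftp://host-a.example.org/data/jan-1998"
print(create_replica(catalog, "climate/jan-1998", url, stored.add, stored.discard))
print(catalog.lookup("climate/jan-1998"))
```

The catalog holds only location information; attributes that describe the data's content would live in a separate metadata service, in line with the first architecture decision above.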

Page 30

Relationships to other technologies

• World Wide Web
– Web technologies mainly support a client-server architecture. They lack features (at least for now) for rich interaction and single sign-on security.
• ASP and SSP
– Provide outsourced solutions that depend on the specific customer. They lack dynamic configuration.
• Enterprise computing
– Static arrangements for sharing resources.
• P2P computing
– Getting closer to Grid technology, but provides specific solutions rather than common protocols.

Page 31

Other Grid Perspectives

• Grid as a next-generation Internet

• Grid is a source of free cycles

• Grid requires new programming models

• Grid makes high-performance computers superfluous


Page 35

Closing Remarks

"We will probably see the spread of 'computer utilities', which, like present electric and telephone utilities, will service individual homes and offices across the country." - 1969, Len Kleinrock

We are a little late, but we are ready now!

Page 36

Extra-1: A Model Architecture for Data Grids

[Figure: model architecture for data grids.]

• The application sends an attribute specification to the Metadata Catalog and gets back a logical collection and logical file name.
• The Replica Catalog maps these to multiple locations.
• Replica Selection picks the selected replica using performance information and predictions from MDS and NWS.
• Data is fetched over the GridFTP control channel and GridFTP data channel from Replica Location 1, 2 or 3, which are backed by storage such as a tape library, disk cache or disk array.

Page 37

Extra-2: Replica Catalog Structure

Replica Catalog
  Logical Collection: CO2 measurements 1998
    Filename: Jan 1998
    Filename: Feb 1998
    …
    Location: jupiter.isi.edu
      Filename: Mar 1998
      Filename: Jun 1998
      Filename: Oct 1998
      Protocol: gsiftp
      UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate
    Location: sprite.llnl.gov
      Filename: Jan 1998
      …
      Filename: Dec 1998
      Protocol: ftp
      UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
    Logical File Parent
      Logical File: Jan 1998
      Logical File: Feb 1998
      Size: 1468762 (attribute of a logical file entry)
  Logical Collection: CO2 measurements 1999