Introduction to Grid Computing


Concurrent and Distributed Programming course

Mark Silberstein, CS,Technion


21/5/03 Mark Silberstein, CDP, Technion 2

Electric Power Grid analogy: A little bit of history

• Beginning of the XX century
  – Electric power
    • Know how to generate and how to use
    • Problem for wide adoption: Generators
    • Solution: Electric power grid, an INFRASTRUCTURE for power distribution and interface standardization
    • Integration of resources opens NEW opportunities
• Beginning of the XXI century
  – Computational power
    • Know how to produce and how to use
    • Problem for high performance applications: High-end resources
    • Solution: Computational grid, an INFRASTRUCTURE for pervasive and inexpensive access to high-end resources


Grid Computing Vision

• Typical Grid usage scenario
  1. Plug your PC into the Computation Grid
     – Infinite power (CPU/Storage/etc.)
  2. Start the application
     – You don’t care where it is running
  3. Get results
     – Output is waiting for you locally
• Electric Power Grid usage scenario
  1. Plug in your Teapot (many)
     – Infinite electric power capacity
  2. Turn it on
     – You don’t care WHO supplies the power
  3. Drink your tea
     – Water is inside the teapot


What is Grid Computing?

• Computational Grid is a collection of distributed (geographically/administrative domains), heterogeneous resources which can be used as an ensemble to execute large-scale applications

• Metacomputer – Virtualization of widely distributed resources


PACI Grid


Is it really such a NEW idea?

• People connected computers together and used them long before the Grid was introduced
• BUT! Everything was done manually:
  – I need to run a simulation. Pre-Grid HOWTO Guide:
    • Call admin at the remote site to open an account
    • Stage your application and data to the remote site
      – Meanwhile storage is full, need to ask to remove old stuff
      – Different protocols
    • Reserve CPU (another call to admin)
    • Run job and pray that nothing fails
    • If everything is fine, stage back output
    • Call admin and pay
    • Do it for every site and with different protocols
• Grid should provide AUTOMATION


Scientific Grid Computing

• Collaboration: “Virtual Organizations”
  – “I have CPU, you produce Data, she has Storage”
  – “I have X CPUs (Storage), you have Y CPUs (Storage). Use mine and I’ll use yours”
  – “I have a Supercomputer, but she has a Visualization Cave.”
• On-Demand computing
  – “My experiment requires many CPUs/Disk/anything. Let me use your resources for 2 days.”
• Better resource utilization
  – “My computers are never used at night. You may use them when they are idle”
• Sharing of Experimental Results
  – The CERN collider will produce PBytes of results. Researchers all over the world want to analyze them


Why Grid? Grid Applications

• Distributed Supercomputing
  – Distributed Supercomputing applications couple multiple computational resources (supercomputers/clusters/workstations) over inter/intra net
  – Examples include:
    • SFExpress (large-scale modeling of battle entities with complex interactive behavior for distributed interactive simulation)
    • Climate Modeling (high resolution, long time scales, complex models)


Why Grid? Grid Applications

• High-Throughput Applications
  – Grid used to schedule large numbers of independent or loosely coupled tasks with the goal of putting unused cycles to work
  – High-throughput applications include RSA key cracking, SETI@home (detection of extra-terrestrial intelligence), MCell (Bioinformatics)


Why Grid? Grid Applications

• Data-Intensive Applications
  – Focus is on synthesizing new information from large amounts of physically distributed data (TERA/PETA bytes)
  – Examples include NILE (distributed system for high energy physics experiments using data from CLEO), SAR/SRB applications, digital library applications, CERN


Grid Computing Challenges

• Grid is yet another computing platform: META computer
• Unusable without specialized software, just like any other conventional computer
• What makes our computer usable?
  – Operating System + Drivers
  – Management Software
  – Applications


Layered View of Computer Architecture

Layers, top to bottom:

• Applications
• High-level Services and Tools
  – System utilities
  – User libraries
• Core Services
  – VM
  – Security
  – I/O
  – H/W Abstraction Layer
  – Scheduling
  – OS Internal Object Management
• Hardware
  – CPU
  – Memory
  – Peripherals
  – Buses


Zoom on Core Services

• Core Services
  – VM
  – Security
  – I/O
  – H/W Abstraction Layer
  – Scheduling
  – OS Internal Object Management
• Diagram callouts
  – Authentication, Authorization
  – Resources Access Protocols
  – IPC, Communication, File System
  – Access to shared resources
  – Allocation policy
  – Naming, Global Information


Grids vs. “PC” ;))

• Different administration domains
  – Security
• Geographical distribution
  – Communication, Scheduler, Object Management
• No global knowledge
  – Resource management, Naming
• No centralized control
  – Resource management, Allocation policy
• Heterogeneity
  – Resource access protocols, Resource Management
• Scale
  – And all this for millions of resources!!


Layered View of Grid Architecture

Layers, top to bottom:

• Applications
• High-level Services and Tools
  – High level communication
  – Data Replication
  – Resource Managers and Schedulers
  – Grid Programming Libraries
  – Grid Utilities
  – Grid Compilers
• Core Services
  – Metacomputing Directory
  – Remote process management
  – Security
  – High performance I/O
  – Access to remote storage
  – Reservation
  – Synchronization
  – Accounting
• Local Services
  – LSF, Condor, PBS
  – TCP, UDP
  – AIX, Linux


What is Grid Computing?

• Computational Grid is a collection of distributed (geographically/administrative domains), heterogeneous resources, implementing open Grid protocols to enable their use as part of metacomputer(s)


Agenda

• Core services
  – Globus architecture
• High Level services and tools
  – Condor-G


Globus Toolkit Components

Core Services == Globus

• Core service functions
  – Metacomputing Directory
  – Remote process management
  – Security
  – High performance I/O
  – Access to remote storage
• Globus components
  – Grid Security Infrastructure
  – Grid Resource Allocation Manager
  – Grid Access to Secondary Storage
  – MetaData Service
  – Globus I/O
  – GridFTP


Globus Toolkit: Grid Core Services

• Provides Core Grid Services
  – GSI: security infrastructure
  – GRAM, DUROC: generic interface for resource allocation
  – GASS + GridFTP: data transfer and secondary storage access
  – MDS (GRIS/GIIS): Meta Data service
  – Replica Management: data replication and management
• Provides C/Java/(Python soon) API to use and extend the services
• Provides command-line utilities
• MPICH-G2: Grid enabled MPI
• Supports numerous architectures (no M$ yet)


Security Terminology

• Authentication: Establishing identity
• Authorization: Establishing rights
• Accounting
• Message protection
  – Message integrity
  – Message confidentiality
• Digital signature
• Public/private key
• Certificate
• Certificate Authority (CA)


Public Key Based Authentication

• User sends certificate over the wire
• Other end sends user a challenge string
• User encodes the challenge string with private key
  – Possession of private key means you can authenticate as subject in certificate
• Public key is used to decode the challenge
  – If you can decode it, you know the subject
• Treat your private key carefully!!
  – Private key is stored only in well-guarded places, and only in encrypted form
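
The challenge-response exchange above can be sketched in a few lines of Python. This is an illustration only: it assumes textbook RSA with toy-sized numbers (modulus 3233) and a SHA-256 digest reduced modulo that number. The function names `respond` and `verify` are made up for this sketch; real systems use full-size keys and padded signatures from a vetted cryptographic library.

```python
import hashlib
import secrets

# Toy RSA key pair (classic textbook values; far too small for real use).
N = 3233   # modulus = 61 * 53
E = 17     # public exponent, would travel inside the certificate
D = 2753   # private exponent, never leaves the user

def digest(challenge: bytes) -> int:
    """Reduce the challenge to an integer smaller than the modulus."""
    return int.from_bytes(hashlib.sha256(challenge).digest(), "big") % N

def respond(challenge: bytes, private_exponent: int) -> int:
    """User side: encode the challenge with the private key."""
    return pow(digest(challenge), private_exponent, N)

def verify(challenge: bytes, response: int, public_exponent: int) -> bool:
    """Other end: decode with the public key and compare with the challenge."""
    return pow(response, public_exponent, N) == digest(challenge)

challenge = secrets.token_bytes(16)       # random challenge string
response = respond(challenge, D)          # only the key holder can produce this
authenticated = verify(challenge, response, E)
```

A fresh random challenge per login attempt matters here: replaying an old response does not help an attacker, because the next challenge will be different.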


Grid Security Requirements

• Single sign-on
  – User should authenticate only once
• Delegation of authority
  – Simultaneous access to large pool of resources
• Site autonomy
  – Respect and not override local site security
• Authentication and Authorization
  – One-to-one identification and user specific policy


Globus Security Infrastructure

• Provides public key-based security system that layers on top of local site security
  – User identified to system using X.509 certificate (same as certificates used for Web) containing info about the duration of permissions, public key, signature of certificate authority
  – Each user has a Grid User ID, private key, certificate signed by a Certificate Authority (CA)
• GSI allows for delegation of authority and single sign on: certificate chaining with certificate proxy
  – Proxy is another certificate, signed by user private key
  – Allows remote process to act on behalf of user, without password exposure
• Site autonomy: Grid User ID should have mapping to local user at the resource in order to “log in”
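
The mapping from Grid User ID to a local account can be pictured as a lookup table. Globus keeps such a table in a grid-mapfile, where each line pairs a quoted certificate subject with a local user name. The sketch below parses text in that style; the subjects and account names are invented for illustration, and the helper names are hypothetical.

```python
import shlex

def parse_gridmap(text):
    """Return {certificate subject: local user} from grid-mapfile style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honours the quotes around the subject
        if len(fields) < 2:
            continue  # malformed line: no local user given
        mapping[fields[0]] = fields[1]
    return mapping

def local_user_for(subject, mapping):
    """Site autonomy: a subject without a local mapping cannot log in."""
    return mapping.get(subject)

# Hypothetical entries, not taken from any real site.
EXAMPLE = '''
# subject                               local user
"/O=Grid/OU=Example/CN=Alice Example"   alice
"/O=Grid/OU=Example/CN=Bob Example"     bob
'''

gridmap = parse_gridmap(EXAMPLE)
```

Because the table lives at the resource, each site keeps the final say over who may log in and as which local user.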


Mutual authentication

• Each user and each resource generates a certificate and gets it signed by a trusted CA, one time
  – Certificate contains user’s name and public key
  – Grid coordinating authority operates CA
• User and resources each maintain list of trusted CA certificates
  – This enables mutual authentication (process by which a subject proves its identity to a requestor, typically through the use of a credential)


Globus GSI

• General scenario: User wants to execute on remote resources
• How this happens securely:
  – User is authenticated by a CA, one time only
  – To achieve a single logon effect, user creates a temporary user proxy credential
    • User proxy has limited lifetime which user specifies
  – User proxy credential sent to gatekeeper of each desired resource
  – Gatekeeper sends copy of its certificate to user
  – Mutual Authentication: user checks gatekeeper’s certificate signature against trusted certificates; gatekeeper checks user signature against CA’s trusted certificates
  – Gatekeeper checks to see if user has permission to execute on that machine
  – If user has permission, then job is submitted to local job scheduler and job is started on remote machine
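
The gatekeeper's side of these steps can be sketched as a sequence of checks. This is a simplified model, not Globus code: signature verification is reduced to a lookup of the issuing CA's name, and `ProxyCredential` and `gatekeeper_decision` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class ProxyCredential:
    subject: str        # Grid User ID of the delegating user
    issuer_ca: str      # CA that signed the user's certificate
    expires_at: float   # end of the user-chosen lifetime (seconds since epoch)

def gatekeeper_decision(proxy, trusted_cas, gridmap, now):
    """Return (accepted, detail) for one remote execution request."""
    if now >= proxy.expires_at:
        return False, "proxy expired"       # limited lifetime enforced
    if proxy.issuer_ca not in trusted_cas:
        return False, "untrusted CA"        # stands in for signature checking
    local_user = gridmap.get(proxy.subject)
    if local_user is None:
        return False, "no local mapping"    # no permission on this machine
    return True, local_user                 # job would go to the local scheduler

# Hypothetical values for illustration.
proxy = ProxyCredential("/O=Grid/CN=Alice Example", "Example CA", expires_at=1000.0)
decision = gatekeeper_decision(
    proxy,
    trusted_cas={"Example CA"},
    gridmap={"/O=Grid/CN=Alice Example": "alice"},
    now=500.0,
)
```

The order of the checks mirrors the list: the credential must be valid before the site even looks at its own authorization policy.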


GSI in Action: “Create Processes at A and B that Communicate & Access Files at C”

[Figure: The user signs on once via “grid-id” and generates a proxy credential (or retrieves one from an online repository); the user proxy holds it. The user proxy sends remote process creation requests* to GSI-enabled gatekeepers on computers at Site A (Kerberos) and Site B (Unix). Each gatekeeper authorizes the request, maps it to a local id, creates the process, and generates credentials. Each process runs under a local id with a restricted proxy; at Site A it also holds a Kerberos ticket. The two processes communicate*, and a remote file access request* goes to the GSI-enabled FTP server in front of the storage system at Site C (Kerberos), which authorizes the request, maps it to a local id, and accesses the file.]

* With mutual authentication


Globus Resource Allocation Manager

• Resource Management services provide mechanism for remote job submission and management
• 3 low level services:
  – GRAM (Globus Resource Allocation Manager)
    • Provides remote job submission, monitoring and management
  – DUROC (Dynamically Updated Request Online Co-allocator)
    • Provides simultaneous job submission and barrier
    • Layers on top of GRAM
  – RSL (Resource Specification Language)
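
To give a feel for RSL, a job request is written as a conjunction of `(attribute=value)` relations introduced by `&`. The helper below renders a Python dict in that form. It is a minimal sketch: `to_rsl` is a hypothetical name, quoting of special characters and multi-requests are left out, and the attribute values are placeholders.

```python
def to_rsl(attributes):
    """Render a dict as a conjunction of (name=value) relations."""
    relations = "".join(f"({name}={value})" for name, value in attributes.items())
    return "&" + relations

# Placeholder job description: run /bin/hostname on four processors.
job = {"executable": "/bin/hostname", "count": 4, "stdout": "hostname.out"}
rsl = to_rsl(job)
print(rsl)  # &(executable=/bin/hostname)(count=4)(stdout=hostname.out)
```

A string like this is what a client hands to GRAM, which then translates it for whichever local resource manager sits behind the gatekeeper.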


GRAM Requirements

• Reliable invocation and cancellation
  – Only-once semantics
• Monitoring and event notification
  – Process failure should propagate to the submission site
  – Deferred process invocation: state transitions
• Reliable job manager
  – Job may keep running, but remote monitoring agent may fail
• Heterogeneity of platforms
  – Generic interface to any local resource manager
• Sandboxing
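
The monitoring requirement (state transitions plus event notification) can be modelled as a small state machine with a callback. The state names below are a simplified set chosen for this sketch and should not be read as the exact GRAM job states; the `Job` class and its callback are likewise hypothetical.

```python
from enum import Enum

class JobState(Enum):
    UNSUBMITTED = "unsubmitted"
    PENDING = "pending"      # accepted, waiting in the local queue
    ACTIVE = "active"        # processes are running
    DONE = "done"
    FAILED = "failed"

# Which transitions are legal from each state.
ALLOWED = {
    JobState.UNSUBMITTED: {JobState.PENDING, JobState.FAILED},
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}

class Job:
    def __init__(self, on_change):
        self.state = JobState.UNSUBMITTED
        self._on_change = on_change  # stands in for event notification

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self._on_change(new_state)   # propagate the event to the submission site

events = []
job = Job(events.append)
for step in (JobState.PENDING, JobState.ACTIVE, JobState.FAILED):
    job.move_to(step)
```

Here the failure reaches the submitter through the callback, which is the behaviour the "process failure should propagate" bullet asks for.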


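GRAM Components

[Figure: GRAM components, with steps numbered 1 to 6 in the original diagram. A Client outside the site boundary sends a resource allocation and process creation Request through the Grid Security Infrastructure to the Gatekeeper. The Gatekeeper creates a Job Manager, which parses the request with the RSL Library, asks the Local Resource Manager to allocate and create processes, and monitors and controls them. The Client receives an opaque https contact string, used for control requests and event notification.]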


Grid Information Infrastructure

• Requirements
  – Resource discovery
    • All grid resources are registered
  – Resource selection
    • Should contain specific resource information
• Challenges
  – Any information is always “already old”
  – Scalability
  – Fault-tolerance
  – Unknown information structure
  – Consistency
  – Access control


Globus Information Infrastructure

• MDS (Metacomputing Directory Service)
  – MDS stores information about entries; an entry is some type of object (organization, person, network, computer, etc.)
  – Object class associated with each entry describes a set of entry attributes
  – Every entry is tagged with creation time and TTL
  – LDAP (Lightweight Directory Access Protocol) used to store information about resources
    • LDAP = hierarchical, tree-structured information model defining form and character of information
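
The creation time and TTL tags let a reader decide whether an entry is still worth trusting. A minimal sketch of such an entry follows; the `Entry` class and `is_fresh` method are hypothetical, and the attribute values are borrowed from the example record shown later in the deck (CPU=PIII, FreeRAM=4GB, created 20.2.2003 at 14:00, TTL 10 min).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Entry:
    object_class: str     # names the set of attributes the entry carries
    attributes: dict
    created: datetime     # when the information was produced
    ttl: timedelta        # how long it may be considered valid

    def is_fresh(self, now: datetime) -> bool:
        """An entry is usable only until created + TTL."""
        return now < self.created + self.ttl

host = Entry(
    object_class="computer",
    attributes={"CPU": "PIII", "FreeRAM": "4GB"},
    created=datetime(2003, 2, 20, 14, 0),
    ttl=timedelta(minutes=10),
)

fresh_at_1405 = host.is_fresh(datetime(2003, 2, 20, 14, 5))
fresh_at_1410 = host.is_fresh(datetime(2003, 2, 20, 14, 10))
```

With these values the entry counts as fresh at 14:05 and as stale from 14:10 onward.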


MDS object


Information Infrastructure Components

• Information providers: Grid Resource Information Service (GRIS)
  – Run close to information source
  – Generate data in required format and store it in the Local Information Directory
  – Queries
    • Speak GRid Information Protocol (GRIP)
  – Perform soft-registration into Information Registries
    • Speak GRid Registration Protocol (GRRP)
• Information Registries: Grid Index Information Service (GIIS)
  – Aggregates Info for Virtual Organization
  – Aggregate information about existing GRISes in VO
  – Provide hierarchical naming
  – May itself serve as GRIS for upper hierarchies
  – Forward all search requests to the low level GRISes
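
Soft registration means a provider stays listed only while it keeps re-registering; a provider that goes silent simply ages out. The sketch below models that idea with plain numbers for time. The `Registry` class, its method names, and the host names are assumptions made for illustration and do not reflect the GRRP wire protocol.

```python
class Registry:
    """GIIS-like index that forgets providers which stop re-registering."""

    def __init__(self, registration_ttl):
        self.registration_ttl = registration_ttl
        self._last_seen = {}  # provider name -> time of last registration

    def register(self, provider, now):
        """Called periodically by each information provider."""
        self._last_seen[provider] = now

    def live_providers(self, now):
        """Providers whose latest registration has not yet expired."""
        return sorted(
            name for name, seen in self._last_seen.items()
            if now - seen < self.registration_ttl
        )

giis = Registry(registration_ttl=60)
giis.register("host1", now=0)
giis.register("host2", now=0)
giis.register("host1", now=50)      # host1 renews, host2 goes silent
alive = giis.live_providers(now=70)
```

No explicit "unregister" message is needed, which is what makes the scheme tolerant of crashed or disconnected resources.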


How it all works

[Figure: Each resource (CPU, disk, ...) runs a GRIS that periodically invokes scripts to obtain information, for example: CPU=PIII, FreeRAM=4GB, Created=20.2.2003:14.00, TTL=10min. Each GRIS periodically registers (soft registration) with the GIIS of its VO (VO A, VO B, VO C). A GIIS keeps a list of registered hosts (Host1: Vo-B, Host2: Vo-B, Host3: Vo-B). Queries are sent to a GIIS.]


GASS/GridFTP

• Grid Access to Secondary Storage
  – GASS Cache
  – Provides transparent access to remote files
    • open(“ftp://..)
  – Lazy copy
  – Utilities to enforce consistency
• FTP: open standard
  – Problem: low performance
  – GridFTP: FTP with high performance enhancements
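
The combination of a cache, transparent open, and lazy copy can be illustrated with a small wrapper that fetches a remote file only the first time it is opened. This is a sketch of the idea, not the GASS API: `LazyFileCache` is a hypothetical class, the transport is passed in as a plain callable, and the URL is a placeholder.

```python
import io

class LazyFileCache:
    def __init__(self, fetch):
        self._fetch = fetch    # callable: url -> bytes (hypothetical transport)
        self._cache = {}       # url -> cached contents
        self.fetch_count = 0

    def open(self, url):
        """Return a file-like object; copy the data only on first use."""
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
            self.fetch_count += 1
        return io.BytesIO(self._cache[url])

    def invalidate(self, url):
        """Consistency helper: force the next open to fetch again."""
        self._cache.pop(url, None)

# Stand-in for a remote server, used only for this illustration.
remote = {"ftp://example.org/data.txt": b"results"}
cache = LazyFileCache(remote.__getitem__)

first = cache.open("ftp://example.org/data.txt").read()
second = cache.open("ftp://example.org/data.txt").read()  # served from cache
```

After both reads `fetch_count` is still 1; calling `invalidate` is the crude equivalent of the consistency utilities mentioned above.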


Globus Toolkit Components: just to remind what we learnt

Core Services == Globus

• Core service functions
  – Metacomputing Directory
  – Remote process management
  – Security
  – High performance I/O
  – Access to remote storage
• Globus components
  – Grid Security Infrastructure
  – Grid Resource Allocation Manager
  – Grid Access to Secondary Storage
  – MetaData Service
  – Globus I/O
  – GridFTP


Grid resource management

• Raw grid infrastructure is useless without resource manager
• Resource manager requirements
  – Resource discovery
  – Resource selection
  – Optimal job placement
  – Scheduling
  – ….
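
Discovery and selection can be shown with a toy matcher: filter resources by minimum requirements, then pick one by a simple preference. The function names, attribute keys, and site names are all invented for this sketch, and "most free CPUs" is just one possible placement policy.

```python
def discover(resources, requirements):
    """Resource discovery: keep resources that meet every minimum requirement."""
    return [
        r for r in resources
        if all(r.get(key, 0) >= minimum for key, minimum in requirements.items())
    ]

def select(candidates):
    """Resource selection: prefer the candidate with the most free CPUs."""
    return max(candidates, key=lambda r: r["free_cpus"], default=None)

# Hypothetical snapshot, as an information service might report it.
resources = [
    {"name": "site-a", "free_cpus": 4, "free_ram_gb": 2},
    {"name": "site-b", "free_cpus": 16, "free_ram_gb": 8},
    {"name": "site-c", "free_cpus": 32, "free_ram_gb": 1},
]

chosen = select(discover(resources, {"free_cpus": 8, "free_ram_gb": 4}))
```

Only site-b satisfies both minimums, so it is chosen even though site-c has more free CPUs. A real resource manager would also have to cope with the snapshot being out of date by the time the job arrives.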


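Global view of job invocation

[Figure: The Application passes RSL to the Resource Manager. The Resource Manager exchanges queries and info with the Information Service, stages data and executable, and sends simple ground RSL to GRAM instances that front the local resource managers (Condor, Linux, PBS). Runtime monitoring runs alongside.]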


Condor-G: Condor gateway into grid

• Manual job invocation using Globus services is difficult
  – Manual data staging
  – No job restart after failure
  – Security issues
  – No queuing
  – High load on invocation machine


Globus Universe

• Run a job on a Grid resource
• Features
  – Job management
  – Fault tolerance
  – Credential management
• User specifies grid resources in submission file
• Jobs are queued locally and then are executed on grid resource
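
As an illustration of what such a submission file might contain, the helper below renders a submit description for the Globus universe. The keywords `universe`, `globusscheduler`, `executable`, `output`, `log`, and `queue` follow Condor-G submit files of that period as best recalled, so check them against the Condor manual for your version; the function name, host name, and file names are placeholders.

```python
def render_submit(executable, gatekeeper, jobmanager, count=1):
    """Build the text of a Globus-universe submit description."""
    lines = [
        "universe = globus",
        # Grid resource: gatekeeper host plus the job manager behind it.
        f"globusscheduler = {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
        "output = job.$(Process).out",   # one output file per queued job
        "log = job.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"

# Placeholder values: 600 jobs sent to a PBS-managed resource.
submit_text = render_submit("simulate", "gatekeeper.example.org", "pbs", count=600)
```

The jobs described this way sit in the local Condor queue first and only then go out to the named grid resource.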


How It Works

[Figure: On the Condor-G side, the Schedd holds 600 Globus jobs and starts a GridManager. On the Grid resource side, GRAM hands the User Job to PBS.]


Condor-G: problems

• No resource selection

• Job monitoring is restricted by GRAM

• Cannot use checkpointing and remote system calls


GlideIn

• Run the Condor daemons on Grid resources as user jobs

• Create your own personal Condor pool from temporarily-acquired Grid resources

• Brings the full power of Condor to the Grid


[Figure sequence: Condor-G in front of a Globus Grid whose resources are managed by PBS, LSF, and Condor. 600 Condor jobs are queued at Condor-G; Condor glide-ins are started on the Grid resources, and the jobs then run on them.]


Summary

• We talked about
  – Grid computing in general
  – Globus
  – Condor-G
• We did not talk about
  – Grid brokers and schedulers
  – Data grid
  – OGSI/OGSA

