Resource Management Reading: “A Resource Management Architecture for Metacomputing Systems”

What is Resource Management?

- Mechanisms for locating and allocating computational resources
- Authentication
- Process creation
- Remote job submission
- Scheduling
- Other resources that can be managed:
  - Memory
  - Disk
  - Networks

Resource Management Issues for Grid Computing

- Site autonomy
  - Resources owned by different organizations, in different administrative domains
  - Local policies for use, scheduling, security
- Heterogeneous substrate
  - Different local resource management systems
- Policy extensibility
  - Local sites need ability to customize their resource management policies

More Issues for Grid Computing

- Co-allocation
  - May need resources at several sites
  - Mechanism for allocating multiple resources, initiating computation, monitoring and managing
- On-line control
  - Adapt application requirements to resource availability

Specifying Resource and Job Requirements

- Resource requirements:
  - Machine type
  - Number of nodes
  - Memory
  - Network
- Job or scheduler parameters:
  - Directory
  - Executable
  - Arguments
  - Environment
  - Maximum time required

Resource and Job Specification

- Globus: Resource Specification Language (RSL)

    &(executable=myprog)
     (|(&(count=5)(memory>=64))
       (&(count=10)(memory>=32)))

- Condor: Classified ads (ClassAds)
  - Resource owners advertise abilities and constraints
  - Applications advertise resource requests
  - Matchmaking: match offers & requests
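
The matchmaking idea can be sketched in a few lines of Python. This is a simplified illustration, not the Condor ClassAd language: the Offer and Request classes, their fields, and the match function are hypothetical names chosen for this example, and the greedy first-fit pairing is an assumption rather than Condor's actual policy.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Offer:
    name: str
    attrs: Dict[str, Any]                       # abilities the owner advertises
    accepts: Callable[[Dict[str, Any]], bool]   # owner's constraint on requests


@dataclass
class Request:
    name: str
    attrs: Dict[str, Any]                       # what the job says about itself
    requires: Callable[[Dict[str, Any]], bool]  # job's constraint on offers


def match(offers: List[Offer], requests: List[Request]) -> List[Tuple[str, str]]:
    """Pair each request with the first unused offer that both sides accept."""
    pairs = []
    free = list(offers)
    for req in requests:
        for off in free:
            if req.requires(off.attrs) and off.accepts(req.attrs):
                pairs.append((req.name, off.name))
                free.remove(off)
                break  # stop scanning once this request is matched
    return pairs


offers = [
    Offer("nodeA", {"memory": 32}, lambda job: True),
    Offer("nodeB", {"memory": 128}, lambda job: job["owner"] != "guest"),
]
requests = [
    Request("job1", {"owner": "alice"}, lambda m: m["memory"] >= 64),
    Request("job2", {"owner": "guest"}, lambda m: m["memory"] >= 16),
]
print(match(offers, requests))  # [('job1', 'nodeB'), ('job2', 'nodeA')]
```

A match requires agreement in both directions: the request's constraint must hold for the offer, and the owner's constraint must hold for the request.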

Components of Globus Resource Management Architecture

- Resource specification using RSL
- Resource brokers: translate resource requirements into specifications
- Co-allocators: break down requests for multiple sites
- Local resource managers: apply local, site-specific resource management policies
- Information about available compute resources and their characteristics

Resource Specification Language

Common notation for exchange of information between components

API provided for manipulating RSL

RSL Syntax

- Elementary form: parenthesized clauses

    (attribute op value [ value … ] )

- Operators supported: <, <=, =, >=, >, !=
- Some supported attributes: executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
- Unknown attributes are passed through
  - May be handled by subsequent tools
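
As a rough illustration of the elementary form, the following Python sketch splits a single clause into its attribute, operator, and values. It is not the RSL API provided by Globus: parse_relation and Relation are hypothetical names, and the sketch assumes unquoted, whitespace-separated values.

```python
import re
from typing import List, NamedTuple


class Relation(NamedTuple):
    attribute: str
    op: str
    values: List[str]


# Two-character operators are listed first so "<=" is not read as "<".
_RELATION = re.compile(r"^\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(.*?)\s*\)$")


def parse_relation(text: str) -> Relation:
    """Parse one clause of the form (attribute op value [value ...])."""
    m = _RELATION.match(text.strip())
    if m is None:
        raise ValueError(f"not a simple RSL relation: {text!r}")
    attribute, op, rest = m.groups()
    values = rest.split()
    if not values:
        raise ValueError(f"relation has no value: {text!r}")
    return Relation(attribute, op, values)


print(parse_relation("(memory>=64)"))
print(parse_relation("(arguments = -n 4)"))
```

Because the parser does not check the attribute name against a fixed list, unknown attributes pass through unchanged, in the spirit of the last bullet above.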

Constraints: “&”

For example:

    & (count>=5) (count<=10)
      (max_time=240) (memory>=64)
      (executable=myprog)

“Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
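
A conjunction holds only when every relation in it holds. The Python sketch below checks the numeric relations from the example against a candidate binding. The function name satisfies and the tuple representation are assumptions made for illustration, and treating a missing attribute as unsatisfied is a choice of this sketch, not something the slides specify.

```python
import operator
from typing import Dict, List, Tuple

# The six relational operators listed in the RSL syntax.
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    ">=": operator.ge, ">": operator.gt, "!=": operator.ne,
}

Constraint = Tuple[str, str, int]


def satisfies(constraints: List[Constraint], candidate: Dict[str, int]) -> bool:
    """Return True only if every relation holds, which is what "&" expresses."""
    return all(
        attr in candidate and OPS[op](candidate[attr], value)
        for attr, op, value in constraints
    )


request = [("count", ">=", 5), ("count", "<=", 10), ("memory", ">=", 64)]
print(satisfies(request, {"count": 8, "memory": 128}))   # True
print(satisfies(request, {"count": 12, "memory": 128}))  # False
```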

Multirequest: “+”

- A multirequest allows us to specify multiple resource needs, for example

    + (& (count=5)(memory>=64)(executable=p1))
      (&(network=atm) (executable=p2))

  - Execute 5 instances of p1 on a machine with at least 64 MB of memory
  - Execute p2 on a machine with an ATM connection
- Multirequests are central to co-allocation

Resource Broker

- Takes high-level RSL specification
- Transforms into concrete specifications through “specialization” process
- Locate resources that meet requirements
- Multiple brokers may service single request
- Application-specific brokers translate application requirements
- Output: complete specification of locations of resources; given to co-allocator
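
The specialization step can be sketched as a function that takes a high-level request, consults a directory of known resources, and emits a specification naming a concrete resource manager. Everything here is hypothetical: the specialize function, the dictionary keys, and the example.org contact strings are assumptions for illustration, and a real broker would query the information service rather than a hard-coded list.

```python
from typing import Any, Dict, List, Optional


def specialize(request: Dict[str, Any],
               directory: List[Dict[str, Any]]) -> Optional[str]:
    """Pick the first resource that meets the request and name it in the output."""
    for res in directory:
        enough_memory = res["memory"] >= request["min_memory"]
        enough_nodes = res["free_nodes"] >= request["count"]
        if enough_memory and enough_nodes:
            return (
                f"&(resourceManagerContact={res['contact']})"
                f"(count={request['count']})"
                f"(memory>={request['min_memory']})"
                f"(executable={request['executable']})"
            )
    return None  # no known resource satisfies the request


directory = [
    {"contact": "small.example.org", "memory": 32, "free_nodes": 16},
    {"contact": "big.example.org", "memory": 128, "free_nodes": 8},
]
request = {"executable": "myprog", "count": 5, "min_memory": 64}
print(specialize(request, directory))
```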

Examples of Resource Brokers

- Nimrod-G
  - Automates creation and management of large parametric experiments
  - Run application under wide range of input conditions and aggregate results
  - Queries MDS to find resources
  - Generates number of independent jobs
  - GRAM allocates jobs to computational nodes
  - Higher-level broker: allows user to specify time and cost constraints

Examples of Resource Brokers

- AppLeS
  - Application Level Scheduler
  - Map large number of independent tasks to dynamically varying pool of available computers
  - Use GRAM to locate resources and initiate and manage computation

Resource co-allocators

- May request resources at multiple sites
  - Two or more computers and networks
- Break multi-request into components
- Pass each component to resource manager
- Provide means for monitoring job status or terminating job
- Complex:
  - Two or more resource managers
  - Global state like availability of resources difficult to determine

Different co-allocation services

1. Require all resources to be available before job proceeds; fail globally if failure occurs at any resource

2. Allocate at least N out of M resources and return

3. Return immediately, but gradually return more resources as they become available

Each useful for some class of applications
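
The first two services differ only in how many successful allocations are required, which the following Python sketch makes explicit. The co_allocate function and its callback parameters are hypothetical; setting minimum to the number of sites gives the all-or-nothing behavior of service 1, and a smaller value gives the N-out-of-M behavior of service 2.

```python
from typing import Callable, List, Sequence


def co_allocate(sites: Sequence[str],
                try_allocate: Callable[[str], bool],
                release: Callable[[str], None],
                minimum: int) -> List[str]:
    """Try every site; keep the allocations only if at least `minimum` succeed."""
    granted = [s for s in sites if try_allocate(s)]
    if len(granted) < minimum:
        for s in granted:
            release(s)  # fail globally: give back whatever was obtained
        return []
    return granted


available = {"siteA", "siteC"}   # pretend siteB is down
released: List[str] = []
sites = ["siteA", "siteB", "siteC"]

# Service 1: all three sites required, so the request fails and releases two.
print(co_allocate(sites, lambda s: s in available, released.append, minimum=3))
print(released)

# Service 2: two out of three is enough.
print(co_allocate(sites, lambda s: s in available, released.append, minimum=2))
```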

Concurrent Allocation

- If advance reservations are available:
  - Obtain list of available time slots from each participating resource manager and choose timeslot
- Without reservations:
  - Optimistically allocate resources
  - Hope desired set will be available at future time
  - Use information service (MDS) to determine current availability of resources
  - Construct RSL request that is likely to succeed
  - If allocation fails, all started jobs must be terminated
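
For the reservation case, choosing a timeslot amounts to intersecting the slot lists returned by the participating resource managers. The sketch below assumes slots are identified by integer start hours and that the earliest shared slot is preferred; both are assumptions for illustration, as are the function names.

```python
from typing import Iterable, List, Optional


def common_slots(slots_per_manager: Iterable[Iterable[int]]) -> List[int]:
    """Return the slots that every resource manager reports as available."""
    sets = [set(slots) for slots in slots_per_manager]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


def choose_slot(slots_per_manager: Iterable[Iterable[int]]) -> Optional[int]:
    """Pick the earliest shared slot, or None if the managers share none."""
    shared = common_slots(slots_per_manager)
    return shared[0] if shared else None


available = [[9, 10, 14], [10, 14, 15], [8, 10, 14]]
print(common_slots(available))  # [10, 14]
print(choose_slot(available))   # 10
```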

Disadvantages of Concurrent Allocation Scheme

Computational resources wasted while waiting for all requested resources to become available

Application must be altered to perform barrier to synchronize startup across components

Detecting failure of a resource is difficult, e.g. in queue-based local resource managers

Local Resource Managers

Implemented with Globus Resource Allocation Manager (GRAM)

1. Processing RSL specifications representing resource requests
   - Deny request
   - Create one or more processes (jobs) that satisfy request

2. Enable remote monitoring and management of jobs

3. Periodically update MDS information service with current availability and capabilities of resources

GRAM (cont.)

- Interface between grid environment and entity that can create processes
  - E.g., parallel scheduler or Condor pool
- GRAM may schedule resource itself
- More commonly, maps resource specification into a request to a local resource allocation mechanism
  - E.g., Condor, LoadLeveler, LSF
- Co-exists with local mechanisms

GRAM (cont.)

- GRAM API has functions for:
  - Submitting a job request: produces globally unique job handle
  - Canceling a job request
  - Asking when job request is expected to run
  - Upon submission, can request that progress be signaled asynchronously to callback URL

GRAM Scheduling Model

- Jobs are either:
  - Pending: resources have not yet been allocated to the job
  - Active: resources allocated, job running
  - Done: when all processes have terminated and resources have been deallocated
  - Failed: job terminates due to:
    - explicit termination
    - error in request format
    - failure in resource management system
    - denial of access to resource
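
The four states can be modeled as a small state machine that notifies a callback on each transition. This is a sketch, not the GRAM implementation: the slide names the states but not the permitted edges, so the transition table below is an assumption, and the Job class and its callback signature are hypothetical.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Assumed edges: a job can fail before or after it starts; DONE and FAILED are final.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    def __init__(self, handle: str,
                 callback: Optional[Callable[[str, JobState, JobState], None]] = None):
        self.handle = handle
        self.state = JobState.PENDING
        self.callback = callback

    def transition(self, new_state: JobState) -> None:
        """Move to new_state if the edge is allowed, then notify the callback."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        old, self.state = self.state, new_state
        if self.callback is not None:
            self.callback(self.handle, old, new_state)


events: List[Tuple[str, str, str]] = []
job = Job("job-1", lambda h, old, new: events.append((h, old.value, new.value)))
job.transition(JobState.ACTIVE)
job.transition(JobState.DONE)
print(events)  # [('job-1', 'pending', 'active'), ('job-1', 'active', 'done')]
```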

GRAM Components

Gatekeeper

Responds to a request:

1. Performs mutual authentication of user and resource

2. Determines local user name for remote user

3. Starts a job manager that executes as local user and handles request
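
The three gatekeeper steps can be sketched as one function. This is only an outline under stated assumptions: mutual authentication is reduced to a single caller-supplied check, the user map is a plain dictionary, and the names gatekeeper, authenticate, user_map, and start_job_manager are all hypothetical.

```python
from typing import Callable, Dict


def gatekeeper(subject: str,
               authenticate: Callable[[str], bool],
               user_map: Dict[str, str],
               start_job_manager: Callable[[str], str]) -> str:
    """Authenticate the remote user, map to a local account, start a job manager."""
    # Step 1: authentication (stand-in for real mutual authentication).
    if not authenticate(subject):
        raise PermissionError(f"authentication failed for {subject}")
    # Step 2: determine the local user name for the remote user.
    local_user = user_map.get(subject)
    if local_user is None:
        raise PermissionError(f"no local account mapped for {subject}")
    # Step 3: hand the request to a job manager running as that local user.
    return start_job_manager(local_user)


user_map = {"/O=Grid/CN=Alice": "alice"}  # illustrative subject name
result = gatekeeper(
    "/O=Grid/CN=Alice",
    lambda subject: True,
    user_map,
    lambda user: f"job manager running as {user}",
)
print(result)  # job manager running as alice
```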

GRAM Components (cont.)

- Job manager
  - Creates processes requested by user
  - Submits resource allocation requests to underlying resource management system (or does fork)
  - Monitors state of created processes
  - Notifies callback contact of state transitions
  - Implements control operations like termination

GRAM Components (cont.)

GRAM reporter

- Responsible for storing into MDS (information service) info about:
  - Scheduler structure
    - Support reservations?
    - Number of queues
  - Scheduler state
    - Currently active jobs
    - Expected wait time in queue
    - Total number of nodes and available nodes

Resource Management Architecture

[Figure: Resource Management Architecture. Labeled components: Application, Broker, Co-allocator, Information Service, three GRAMs, and local resource managers (LSF, EASY-LL, NQE). Labeled flows: RSL, RSL specialization, Queries & Info, Ground RSL, Simple ground RSL.]

Job Submission Interfaces

- Globus Toolkit includes several command line programs for job submission
  - globus-job-run: Interactive jobs
  - globus-job-submit: Batch/offline jobs
  - globusrun: Flexible scripting infrastructure