Resource Management

Reading: “A Resource Management Architecture for Metacomputing Systems”

What is Resource Management?

- Mechanisms for locating and allocating computational resources
  - Authentication
  - Process creation
  - Remote job submission
  - Scheduling
- Other resources that can be managed:
  - Memory
  - Disk
  - Networks

Resource Management Issues for Grid Computing

- Site autonomy
  - Resources owned by different organizations, in different administrative domains
  - Local policies for use, scheduling, security
- Heterogeneous substrate
  - Different local resource management systems
- Policy extensibility
  - Local sites need ability to customize their resource management policies

More Issues for Grid Computing

- Co-allocation
  - May need resources at several sites
  - Mechanism for allocating multiple resources, initiating computation, monitoring and managing
- On-line control
  - Adapt application requirements to resource availability

Specifying Resource and Job Requirements

- Resource requirements: machine type, number of nodes, memory, network
- Job or scheduler parameters: directory, executable, arguments, environment, maximum time required

Resource and Job Specification

- Globus: Resource Specification Language (RSL)

  &(executable=myprog)
   (|(&(count=5)(memory>=64))
     (&(count=10)(memory>=32)))

- Condor: Classified ads (ClassAds)
  - Resource owners advertise abilities and constraints
  - Applications advertise resource requests
  - Matchmaking: match offers & requests
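The matchmaking idea can be sketched in a few lines. This is a simplified stand-in, not the Condor ClassAd language or API: the dictionary layout, the `requirements` callables, and the names `matches` and `matchmake` are all assumptions made for illustration.

```python
# Hypothetical, simplified stand-in for ClassAd matchmaking; not the Condor API.

def matches(offer, request):
    """True when each side's requirements accept the other side's attributes."""
    return (offer["requirements"](request["attrs"])
            and request["requirements"](offer["attrs"]))

def matchmake(offers, requests):
    """Greedily pair each request with the first unused offer that matches it."""
    pairs, used = [], set()
    for req in requests:
        for i, offer in enumerate(offers):
            if i not in used and matches(offer, req):
                pairs.append((req["attrs"]["name"], offer["attrs"]["name"]))
                used.add(i)
                break
    return pairs

# Resource owners advertise abilities (attrs) and constraints (requirements).
offers = [
    {"attrs": {"name": "nodeA", "memory": 32},
     "requirements": lambda job: job["owner"] != "guest"},
    {"attrs": {"name": "nodeB", "memory": 128},
     "requirements": lambda job: True},
]
# Applications advertise what they need from a resource.
requests = [
    {"attrs": {"name": "job1", "owner": "alice"},
     "requirements": lambda machine: machine["memory"] >= 64},
]
print(matchmake(offers, requests))
```

Both sides get a veto here: nodeA accepts the job but has too little memory, so job1 is paired with nodeB.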

Components of Globus Resource Management Architecture

- Resource specification using RSL
- Resource brokers: translate resource requirements into specifications
- Co-allocators: break down requests for multiple sites
- Local resource managers: apply local, site-specific resource management policies
- Information about available compute resources and their characteristics

Resource Specification Language

Common notation for exchange of information between components

API provided for manipulating RSL

RSL Syntax

- Elementary form: parenthesis clauses

  (attribute op value [ value … ] )

- Operators supported: <, <=, =, >=, >, !=
- Some supported attributes: executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
- Unknown attributes are passed through
  - May be handled by subsequent tools
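To make the elementary form concrete, here is a minimal sketch of splitting one clause into its attribute, operator, and values. It is not the RSL API mentioned earlier; the function name `parse_relation` is hypothetical, and the sketch assumes unquoted, whitespace-separated values, which is simpler than what a full RSL parser would have to accept.

```python
import re

# Operators listed on the slide. Two-character operators come first so that
# "<=" is not read as "<" followed by "=".
_RELATION = re.compile(r"^\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(.*?)\s*\)$")

def parse_relation(clause):
    """Split one '(attribute op value ...)' clause into (attribute, op, values)."""
    m = _RELATION.match(clause.strip())
    if m is None:
        raise ValueError(f"not an elementary RSL clause: {clause!r}")
    attribute, op, rest = m.groups()
    return attribute, op, rest.split()

print(parse_relation("(memory>=64)"))
print(parse_relation("(arguments=-n 5)"))
```

Because the sketch does not check the attribute name against a fixed list, unknown attributes pass through unchanged, in the spirit of the last bullet above.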

Constraints: “&”

For example:

& (count>=5) (count<=10)
  (max_time=240) (memory>=64)
  (executable=myprog)

“Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
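One way to read the conjunction is as a list of constraints that must all hold. The sketch below encodes the example request as (attribute, operator, value) triples and checks it against a machine description. The triple encoding and the helper names `machine_ok` and `instance_count` are assumptions for illustration, not part of RSL or Globus.

```python
import operator

# The six comparison operators RSL supports, mapped to Python functions.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt, "!=": operator.ne}

# The example request, already split into (attribute, op, value) triples.
request = [("count", ">=", 5), ("count", "<=", 10),
           ("max_time", "=", 240), ("memory", ">=", 64),
           ("executable", "=", "myprog")]

def machine_ok(machine, request):
    """Check only the constraints whose attribute the machine advertises."""
    return all(OPS[op](machine[attr], value)
               for attr, op, value in request if attr in machine)

def instance_count(free_nodes, request):
    """Largest count permitted by the request that fits on free_nodes, or None."""
    for n in range(free_nodes, 0, -1):
        if all(OPS[op](n, value) for attr, op, value in request if attr == "count"):
            return n
    return None

print(machine_ok({"memory": 128}, request))  # memory constraint holds
print(instance_count(8, request))            # 8 nodes free, 5..10 allowed
```

Treating `count` separately reflects that it describes how many instances to start rather than a property of a single machine.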

Multirequest: “+”

A multirequest allows us to specify multiple resource needs, for example

+ (& (count=5)(memory>=64)
     (executable=p1))
  (&(network=atm) (executable=p2))

- Execute 5 instances of p1 on a machine with at least 64M of memory
- Execute p2 on a machine with an ATM connection

Multirequests are central to co-allocation
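Since a multirequest is a "+" followed by parenthesized subrequests, its components can be recovered by tracking parenthesis depth. The following is a minimal sketch under that assumption; `split_multirequest` is a hypothetical helper, and it ignores quoting, which a real RSL parser would need to handle.

```python
def split_multirequest(rsl):
    """Return the top-level parenthesized subrequests of a '+' multirequest."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("not a multirequest")
    parts, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i          # opening of a new top-level subrequest
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth == 0:         # closing of the current subrequest
                parts.append(text[start + 1:i].strip())
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return parts

example = "+ (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2))"
for part in split_multirequest(example):
    print(part)
```

Each returned string is an ordinary "&" request that could be handed to a separate resource manager.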

Resource Broker

- Takes high-level RSL specification
- Transforms into concrete specifications through “specialization” process
- Locates resources that meet requirements
- Multiple brokers may service single request
- Application-specific brokers translate application requirements
- Output: complete specification of locations of resources; given to co-allocator
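A toy version of specialization might look like this: take abstract requirements, consult a list of known resources, and emit an RSL string that names a concrete resource manager. The `directory` list is a hypothetical stand-in for information-service query results, the host names are placeholders, and `specialize` is not a Globus function.

```python
def specialize(executable, count, min_memory, directory):
    """Return a ground RSL string naming the first resource that fits, or None.

    `directory` is a hypothetical stand-in for information-service results:
    a list of dicts with 'contact', 'memory', and 'free_nodes' keys.
    """
    for resource in directory:
        if resource["memory"] >= min_memory and resource["free_nodes"] >= count:
            return (f"&(resourceManagerContact={resource['contact']})"
                    f"(count={count})(executable={executable})")
    return None

directory = [
    {"contact": "hostA.example.org", "memory": 32, "free_nodes": 16},
    {"contact": "hostB.example.org", "memory": 128, "free_nodes": 8},
]
print(specialize("myprog", 5, 64, directory))
```

A real broker would likely apply a selection policy rather than take the first fit; first-fit is used here only to keep the sketch short.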

Examples of Resource Brokers

Nimrod-G
- Automates creation and management of large parametric experiments
- Run application under wide range of input conditions and aggregate results
- Queries MDS to find resources
- Generates number of independent jobs
- GRAM allocates jobs to computational nodes
- Higher-level broker: allows user to specify time and cost constraints

Examples of Resource Brokers

AppLeS
- Application Level Scheduler
- Map large number of independent tasks to dynamically varying pool of available computers
- Use GRAM to locate resources and initiate and manage computation

Resource co-allocators

- May request resources at multiple sites
  - Two or more computers and networks
- Break multi-request into components
- Pass each component to resource manager
- Provide means for monitoring job status or terminating job
- Complex:
  - Two or more resource managers
  - Global state like availability of resources difficult to determine

Different co-allocation services

1. Require all resources to be available before job proceeds; fail globally if failure occurs at any resource

2. Allocate at least N out of M resources and return

3. Return immediately, but gradually return more resources as they become available

Each useful for some class of applications
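The three services differ only in how they turn per-resource outcomes into an overall answer. The sketch below assumes each allocation attempt has already been reduced to a (name, succeeded) pair; the function names and that representation are hypothetical.

```python
def atomic(attempts):
    """Service 1: succeed only if every requested resource was allocated."""
    granted = [name for name, ok in attempts if ok]
    return granted if len(granted) == len(attempts) else None

def at_least(attempts, n):
    """Service 2: succeed if at least n of the requested resources were allocated."""
    granted = [name for name, ok in attempts if ok]
    return granted if len(granted) >= n else None

def incremental(attempts):
    """Service 3: hand resources back one at a time as they become available."""
    for name, ok in attempts:
        if ok:
            yield name

attempts = [("siteA", True), ("siteB", False), ("siteC", True)]
print(atomic(attempts))             # one failure, so the whole request fails
print(at_least(attempts, 2))        # two of three is enough
print(list(incremental(attempts)))  # whatever succeeded, in arrival order
```

A real co-allocator using the first service would also have to release the resources that were granted before the failure was seen; that cleanup is omitted here.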

Concurrent Allocation

- If advance reservations are available:
  - Obtain list of available time slots from each participating resource manager and choose timeslot
- Without reservations:
  - Optimistically allocate resources
  - Hope desired set will be available at future time
  - Use information service (MDS) to determine current availability of resources
  - Construct RSL request that is likely to succeed
  - If allocation fails, all started jobs must be terminated
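With advance reservations, choosing a timeslot amounts to intersecting the free slots reported by each resource manager. This sketch assumes slots are comparable values such as integer hours and that the earliest common slot is preferred; both are assumptions, as are the names `common_slots` and `choose_slot`.

```python
def common_slots(slot_lists):
    """Slots that every participating resource manager reports as free."""
    if not slot_lists:
        return []
    common = set(slot_lists[0])
    for slots in slot_lists[1:]:
        common &= set(slots)
    return sorted(common)

def choose_slot(slot_lists):
    """Earliest slot free everywhere, or None if the managers share no slot."""
    common = common_slots(slot_lists)
    return common[0] if common else None

# Free start hours reported by three hypothetical resource managers.
reported = [[9, 10, 14], [10, 14, 16], [8, 10, 14]]
print(choose_slot(reported))
```

Without reservations no such intersection is available, which is why the optimistic scheme has to start jobs and terminate them all if any allocation fails.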

Disadvantages of Concurrent Allocation Scheme

Computational resources wasted while waiting for all requested resources to become available

Application must be altered to perform barrier to synchronize startup across components

Detecting failure of a resource is difficult, e.g. in queue-based local resource managers

Local Resource Managers

Implemented with Globus Resource Allocation Manager (GRAM)

1. Process RSL specifications representing resource requests
   - Deny request, or
   - Create one or more processes (jobs) that satisfy request

2. Enable remote monitoring and management of jobs

3. Periodically update MDS information service with current availability and capabilities of resources

GRAM (cont.)

- Interface between grid environment and entity that can create processes
  - E.g., parallel scheduler or Condor pool
- GRAM may schedule resource itself
- More commonly, maps resource specification into a request to a local resource allocation mechanism
  - E.g., Condor, LoadLeveler, LSF
- Co-exists with local mechanisms

GRAM (cont.)

GRAM API has functions for:
- Submitting a job request: produces globally unique job handle
- Canceling a job request
- Asking when job request is expected to run
- Upon submission, can request that progress be signaled asynchronously to callback URL

GRAM Scheduling Model

Jobs are either:
- Pending: resources have not yet been allocated to the job
- Active: resources allocated, job running
- Done: when all processes have terminated and resources have been deallocated
- Failed: job terminates due to:
  - explicit termination
  - error in request format
  - failure in resource management system
  - denial of access to resource
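The four states can be modeled as a small state machine that notifies registered callbacks on each transition. The transition table below is an assumption inferred from the state descriptions (pending to active to done, with failure possible from pending or active); the `Job` class is illustrative and is not the GRAM API.

```python
# Assumed legal transitions, inferred from the state descriptions above.
TRANSITIONS = {
    "pending": {"active", "failed"},
    "active": {"done", "failed"},
    "done": set(),
    "failed": set(),
}

class Job:
    def __init__(self, handle, callbacks=()):
        self.handle = handle              # stands in for the unique job handle
        self.state = "pending"
        self.callbacks = list(callbacks)  # stand-ins for callback contacts

    def transition(self, new_state):
        """Move to new_state if legal, then notify every callback."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        for notify in self.callbacks:
            notify(self.handle, new_state)

events = []
job = Job("job-1", [lambda handle, state: events.append((handle, state))])
job.transition("active")
job.transition("done")
print(events)
```

Done and failed are terminal in this sketch, so any further transition raises an error.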

GRAM Components

Gatekeeper

Responds to a request:

1. Performs mutual authentication of user and resource

2. Determines local user name for remote user

3. Starts a job manager that executes as local user and handles request
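Step 2 can be pictured as a table lookup from an authenticated identity to a local account. The sketch below assumes a text table with one quoted subject name and one local user name per line, in the style of a Globus grid-mapfile; the helper names and the sample subject are hypothetical, and authentication itself is not shown.

```python
import shlex

def load_mapping(text):
    """Parse lines of the form '"<subject name>" <local user>' into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        subject, local_user = shlex.split(line)[:2]
        mapping[subject] = local_user
    return mapping

def local_user_for(subject, mapping):
    """Return the local account for an authenticated subject, or refuse."""
    try:
        return mapping[subject]
    except KeyError:
        raise PermissionError(f"no local account for {subject!r}") from None

table = '# subject name, then local account\n"/O=Grid/CN=Alice Smith" alice\n'
mapping = load_mapping(table)
print(local_user_for("/O=Grid/CN=Alice Smith", mapping))
```

The job manager started in step 3 would then run under the account this lookup returns.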

GRAM Components (cont.)

Job manager
- Creates processes requested by user
- Submits resource allocation requests to underlying resource management system (or does fork)
- Monitors state of created processes
- Notifies callback contact of state transitions
- Implements control operations like termination

GRAM Components (cont.)

GRAM reporter

Responsible for storing into MDS (information service) info about:
- Scheduler structure
  - Support reservations?
  - Number of queues
- Scheduler state
  - Currently active jobs
  - Expected wait time in queue
  - Total number of nodes and available nodes

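Resource Management Architecture

[Figure: the application's RSL goes to a broker, which performs RSL specialization using queries to and info from the information service. The resulting ground RSL goes to the co-allocator, which passes simple ground RSL to GRAM instances in front of the local resource managers (LSF, EASY-LL, NQE).]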

Job Submission Interfaces

Globus Toolkit includes several command line programs for job submission
- globus-job-run: Interactive jobs
- globus-job-submit: Batch/offline jobs
- globusrun: Flexible scripting infrastructure
