Apr 20, 2023
Global Community
Slide Courtesy of Ian Foster
Resource Management in Grid Computing
AZIZOL ABDULLAH, PhD
DEPARTMENT OF COMMUNICATION TECHNOLOGY AND NETWORK
Resource Management
• What needs to be managed: Resources
– Physical resources (computers, disks, databases, networks, scientific instruments).
– Logical resources (jobs, executing applications, complex workflows, etc.).
• What is the goal?
– Resources must be available and meet performance criteria.
Resource Management (Cont.)
• What is management:
– The process of locating various types of capability, arranging for their use, utilizing them and monitoring their state.
• Maintenance of resources and environment
• Monitoring their state and performance
• Reacting to internal and external changes in a resource or its environment
• Initiating routine operations: initialization, start/stop and tuning
What is Resource Management?
• Mechanisms for locating and allocating computational resources
– Authentication
– Process creation
– Remote job submission
– Scheduling
• Other resources that can be managed:
– Memory
– Disk
– Networks
Resource Management Issues for Grid Computing
• Site autonomy
– Resources owned by different organizations, in different administrative domains
– Local policies for use, scheduling, security
• Heterogeneous substrate
– Different local resource management systems
• Policy extensibility
– Local sites need the ability to customize their resource management policies
More Issues for Grid Computing
• Co-allocation
– May need resources at several sites
– Mechanism for allocating multiple resources, initiating computation, monitoring and managing
• On-line control
– Adapt application requirements to resource availability
Manageability
• The ability of a resource to be managed
• Manageability interfaces support common operations (control and monitor)
• Manageability standards specify standard interfaces
• Problem:
– Existing interfaces are generally resource-specific
– Almost impossible to add standard interfaces to legacy resources
– New standards may require additional interfaces
• Solution:
– Common standards
– Based on service orientation, integration and virtualization.
Service orientation
• Software services
– A service provides some capability to its clients through message exchanges
– Services represent the physical manageable entities
– Services understand the unique interfaces of the entities they represent
– Services implement applicable standard interfaces
• Integration
– Applications encapsulated in services become integrable building blocks
Service orientation (Cont.)
• The management process
– Manager invokes the operation (service’s standard interface)
– Service performs operation on managed entity (resource’s unique interface)
– Service returns result to manager (through the standard interface)
• Problem
– Need a common way to implement services
• Solution: Web Services
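The three-step management process can be sketched as a small wrapper pattern. This is only an illustration under assumed names: `ManageableService`, `LegacyDisk`, `DiskService` and `manager_poll` are hypothetical and do not come from any Web Services toolkit.

```python
from abc import ABC, abstractmethod


class ManageableService(ABC):
    """Hypothetical standard interface that every management service implements."""

    @abstractmethod
    def status(self) -> str:
        """Return 'up' or 'down' for the managed entity."""


class LegacyDisk:
    """Stand-in for a resource that only offers its own, resource-specific call."""

    def query_smart(self) -> dict:
        return {"health": "OK", "temp_c": 41}


class DiskService(ManageableService):
    """Service that translates the standard call into the disk's own call."""

    def __init__(self, disk: LegacyDisk) -> None:
        self._disk = disk

    def status(self) -> str:
        # Resource-specific interface, hidden from the manager.
        report = self._disk.query_smart()
        return "up" if report["health"] == "OK" else "down"


def manager_poll(services: dict[str, ManageableService]) -> dict[str, str]:
    """The manager only ever uses the standard interface."""
    return {name: svc.status() for name, svc in services.items()}


print(manager_poll({"disk0": DiskService(LegacyDisk())}))  # {'disk0': 'up'}
```

The manager never sees `query_smart`; adding a new resource type would only require another service class that implements `status`.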
Virtualization
[Figure: a manager reaches the physical resources (computers such as clusters, mainframes and blades; disks; telescopes; other service providers) through web services that expose common interfaces on top of the resource-specific interfaces.]
Traditional Resource Management
• Batch schedulers, workflow engines, operating systems
• Designed and operated under the assumption that:
– They have complete control over a resource
– They can implement the mechanisms and policies needed for effective use of that resource in isolation
• This is not the case for Grid resource management:
– Separate administrative domains
– Resource heterogeneity
– Lack of control and differing policies
Grid Resource Management
• What is Grid Resource Management?
– Identifying application requirements, resource specification
– Matching resources to applications
– Allocating/scheduling and monitoring those resources and applications over time in order to run as effectively as possible.
Grid Resource Management (Cont.)
• Challenges in Grid Resource Management
– Resources are heterogeneous in nature
• Processors, disks, data, networks, other services.
– Applications have to compete for resources
– Lack of available data about current systems, needs of users, resource owners and administrators
Grid RM Mechanisms
• Resource Information Dissemination
– Published by the resource (push) or gathered by GIS (pull)
– On-demand dissemination (by agents)
• Resource Discovery
– Centralized or distributed queries, agents, distributed queries + agents
– Resources are described in schema/language or objects
• Resource Scheduling/Job execution
– Assigning resources: centralized, hierarchical, distributed
• Resource Monitoring and Re-Scheduling
– Monitoring can be done by application (polling) or by resource (notification to the app or periodic status updates).
Grid Resource Brokerage
• Discovering suitable resources for a user's job
• Current scenario: manual or semi-manual
– Users manually target their work at machines that are already known to them.
• For larger grids, a manual solution is not feasible
• Solution is a Grid Resource Broker:
– The user describes their needs to a third party (software)
– which searches for suitable resources, and passes the result(s) back to the user.
Grid Resource Brokerage
• Role of the Broker in a Management System
– Resource discovery
• Authorization filtering, application definition, minimum requirement filtering
– System selection
• Dynamic information gathering, system selection
– Allocation and advance reservation
• Grid Information System
– Organizes a set of sensors on resources so that a client or broker can have easy access to data (static or dynamic)
Matchmaking
• Process of selecting resources based on application requirements
• Symmetric matchmaking
– Attribute-based matching
• Resource provider and resource user have to agree on a schema, attribute names and value ranges
• Syntax based, like ClassAds
• Asymmetric matchmaking
– Ontology-based matching
• Ontologies, domain background knowledge, matchmaking rules
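A minimal sketch of attribute-based, two-sided matching, assuming both parties have agreed on the attribute names `arch`, `memory_mb` and `owner`. The dictionaries and the `matches` helper are illustrative only; this is not ClassAd syntax.

```python
def matches(request: dict, offer: dict) -> bool:
    """Two-sided match: each party's requirements are checked against the other's attributes."""
    return (all(pred(offer["attrs"]) for pred in request["requirements"])
            and all(pred(request["attrs"]) for pred in offer["requirements"]))


# Resource offers: attributes plus constraints the owner places on jobs.
offers = [
    {"name": "nodeA",
     "attrs": {"arch": "x86_64", "memory_mb": 128},
     "requirements": [lambda job: job["owner"] != "guest"]},
    {"name": "nodeB",
     "attrs": {"arch": "x86_64", "memory_mb": 32},
     "requirements": []},
]

# Job request: attributes plus constraints the user places on machines.
request = {
    "attrs": {"owner": "alice"},
    "requirements": [lambda m: m["arch"] == "x86_64",
                     lambda m: m["memory_mb"] >= 64],
}

matched = [o["name"] for o in offers if matches(request, o)]
print(matched)  # ['nodeA']
```

The shared schema is what makes this work: if the two sides used different attribute names, the predicates would fail with a `KeyError` rather than match.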
Specifying Resource and Job Requirements
• Resource requirements:
– Machine type
– Number of nodes
– Memory
– Network
• Job or scheduler parameters:
– Directory
– Executable
– Arguments
– Environment
– Maximum time required
Resource and Job Specification
• Globus: Resource Specification Language (RSL)
&(executable=myprog)
(|(&(count=5)(memory>=64))
(&(count=10)(memory>=32)))
• Condor: Classified ads (ClassAds)
– Resource owners advertise abilities and constraints
– Applications advertise resource requests
– Matchmaking: match offers & requests
Components of Globus Resource Management Architecture
• Resource specification using RSL
• Resource brokers: translate resource requirements into specifications
• Co-allocators: break down requests for multiple sites
• Local resource managers: apply local, site-specific resource management policies
• Information about available compute resources and their characteristics
Resource Specification Language
Common notation for exchange of information between components
API provided for manipulating RSL
RSL Syntax
• Elementary form: parenthesis clauses
(attribute op value [ value … ] )
• Operators supported: <, <=, =, >=, >, !=
• Some supported attributes: executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
• Unknown attributes are passed through
– May be handled by subsequent tools
Constraints: “&”
For example:
& (count>=5) (count<=10)
(max_time=240) (memory>=64)
(executable=myprog)
“Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
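To make the meaning of a conjunction concrete, here is a small sketch that checks a candidate allocation against a flat "&" request. It is a toy parser written for this example, not the Globus RSL API; `parse_conjunction`, `satisfies` and the candidate dictionaries are hypothetical, and nested or multi-valued clauses are not handled.

```python
import operator
import re

OPS = {"<=": operator.le, ">=": operator.ge, "!=": operator.ne,
       "<": operator.lt, ">": operator.gt, "=": operator.eq}

# Two-character operators come first so "<=" is not read as "<".
CLAUSE = re.compile(r"\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*([^()\s]+)\s*\)")


def parse_conjunction(rsl: str) -> list[tuple[str, str, str]]:
    """Extract (attribute, op, value) triples from a flat '&' request."""
    if not rsl.lstrip().startswith("&"):
        raise ValueError("only flat '&' conjunctions are handled here")
    return CLAUSE.findall(rsl)


def satisfies(candidate: dict, rsl: str) -> bool:
    """True if every clause the candidate describes holds."""
    for attr, op, raw in parse_conjunction(rsl):
        if attr not in candidate:
            continue  # e.g. executable: a job parameter, not checked here
        have = candidate[attr]
        want = type(have)(raw)  # compare numbers as numbers, strings as strings
        if not OPS[op](have, want):
            return False
    return True


request = "& (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog)"
print(satisfies({"count": 8, "memory": 128, "max_time": 240}, request))  # True
print(satisfies({"count": 8, "memory": 32, "max_time": 240}, request))   # False
```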
Multirequest: “+”
A multirequest allows us to specify multiple resource needs, for example
+ (& (count=5)(memory>=64)
(executable=p1))
(&(network=atm) (executable=p2))
– Execute 5 instances of p1 on a machine with at least 64M of memory
– Execute p2 on a machine with an ATM connection
• Multirequests are central to co-allocation
Resource Broker
• Takes high-level RSL specification
• Transforms it into concrete specifications through a “specialization” process
• Locates resources that meet requirements
• Multiple brokers may service a single request
• Application-specific brokers translate application requirements
• Output: complete specification of locations of resources; given to co-allocator
Examples of Resource Brokers
• Nimrod-G
– Automates creation and management of large parametric experiments
– Runs application under a wide range of input conditions and aggregates results
– Queries MDS to find resources
– Generates a number of independent jobs
– GRAM allocates jobs to computational nodes
– Higher-level broker: allows user to specify time and cost constraints
Examples of Resource Brokers
• AppLeS
– Application Level Scheduler
– Maps a large number of independent tasks to a dynamically varying pool of available computers
– Uses GRAM to locate resources and initiate and manage computation
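Resource Management Architecture
[Figure: the application passes RSL to a broker, which performs RSL specialization using queries and info from the information service; the resulting ground RSL goes to a co-allocator, which sends simple ground RSL to GRAM instances in front of the local resource managers (LSF, EASY-LL, NQE).]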
Resource co-allocators
• May request resources at multiple sites
– Two or more computers and networks
• Break multi-request into components
• Pass each component to resource manager
• Provide means for monitoring job status or terminating job
• Complex:
– Two or more resource managers
– Global state like availability of resources difficult to determine
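The "break multi-request into components" step can be sketched as splitting a "+" request at its top-level parentheses. `split_multirequest` is a hypothetical helper written for this illustration; a real co-allocator would use the RSL manipulation API rather than string scanning.

```python
def split_multirequest(rsl: str) -> list[str]:
    """Return the top-level parenthesized components of a '+' multirequest."""
    body = rsl.strip()
    if not body.startswith("+"):
        raise ValueError("not a multirequest")
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            if depth == 0:
                start = i  # remember where this component opens
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth == 0:
                # Drop the outer parentheses of this component.
                parts.append(body[start + 1:i].strip())
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return parts


multi = "+ (& (count=5)(memory>=64)(executable=p1)) (&(network=atm) (executable=p2))"
for component in split_multirequest(multi):
    print(component)
# & (count=5)(memory>=64)(executable=p1)
# &(network=atm) (executable=p2)
```

Each returned string is an ordinary "&" request that could be handed to one resource manager.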
Different co-allocation services
1. Require all resources to be available before job proceeds; fail globally if failure occurs at any resource
2. Allocate at least N out of M resources and return
3. Return immediately, but gradually return more resources as they become available
Each useful for some class of applications
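The first two strategies can be sketched as follows. The function names and the `try_allocate` / `release` callbacks are assumptions made for this example and stand in for calls to the individual resource managers.

```python
from typing import Callable


def all_or_nothing(sites: list[str],
                   try_allocate: Callable[[str], bool],
                   release: Callable[[str], None]) -> list[str]:
    """Strategy 1: succeed only if every site allocates; otherwise undo and fail."""
    granted: list[str] = []
    for site in sites:
        if not try_allocate(site):
            for held in granted:
                release(held)  # fail globally: give back what was obtained
            return []
        granted.append(site)
    return granted


def at_least_n(sites: list[str],
               try_allocate: Callable[[str], bool],
               release: Callable[[str], None],
               n: int) -> list[str]:
    """Strategy 2: return as soon as n of the sites have allocated."""
    if n <= 0:
        raise ValueError("n must be positive")
    granted: list[str] = []
    for site in sites:
        if try_allocate(site):
            granted.append(site)
        if len(granted) == n:
            return granted
    for held in granted:
        release(held)  # fewer than n succeeded
    return []


available = {"siteA": True, "siteB": False, "siteC": True}
released: list[str] = []
print(all_or_nothing(list(available), available.__getitem__, released.append))  # []
print(at_least_n(list(available), available.__getitem__, released.append, n=2))
# ['siteA', 'siteC']
```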
Concurrent Allocation
• If advance reservations are available:
– Obtain list of available time slots from each participating resource manager and choose timeslot
• Without reservations:
– Optimistically allocate resources
– Hope desired set will be available at future time
– Use information service (MDS) to determine current availability of resources
– Construct RSL request that is likely to succeed
– If allocation fails, all started jobs must be terminated
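With advance reservations, choosing a timeslot amounts to intersecting the free intervals reported by each resource manager. The sketch below assumes slots are half-open `(start, end)` pairs in the same time unit and that at least one site is given; `common_slots` is a hypothetical helper.

```python
def common_slots(per_site_slots: list[list[tuple[int, int]]],
                 duration: int) -> list[tuple[int, int]]:
    """Intersect free intervals [start, end) across all sites; keep those long enough."""
    common = per_site_slots[0]
    for slots in per_site_slots[1:]:
        overlap = []
        for a_start, a_end in common:
            for b_start, b_end in slots:
                start, end = max(a_start, b_start), min(a_end, b_end)
                if start < end:
                    overlap.append((start, end))
        common = overlap
    return sorted(s for s in common if s[1] - s[0] >= duration)


site_a = [(0, 4), (6, 12)]
site_b = [(2, 8), (9, 15)]
print(common_slots([site_a, site_b], duration=3))  # [(9, 12)]
```

The earliest entry of the result is a natural choice; an empty list means no common slot of the requested length exists.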
Disadvantages of Concurrent Allocation Scheme
Computational resources wasted while waiting for all requested resources to become available
Application must be altered to perform barrier to synchronize startup across components
Detecting failure of a resource is difficult, e.g. in queue-based local resource managers
Local Resource Managers
• Implemented with Globus Resource Allocation Manager (GRAM)
1. Processing RSL specifications representing resource requests
– Deny request
– Create one or more processes (jobs) that satisfy request
2. Enable remote monitoring and management of jobs
3. Periodically update MDS information service with current availability and capabilities of resources
GRAM (cont.)
• Interface between grid environment and entity that can create processes
– E.g., parallel scheduler or Condor pool
• GRAM may schedule resource itself
• More commonly, maps resource specification into a request to a local resource allocation mechanism
– E.g., Condor, LoadLeveler, LSF
• Co-exists with local mechanisms
GRAM (cont.)
• GRAM API has functions for:
– Submitting a job request: produces globally unique job handle
– Canceling a job request
– Asking when job request is expected to run
– Upon submission, can request that progress be signaled asynchronously to callback URL
GRAM Scheduling Model
• Jobs are either:
– Pending: resources have not yet been allocated to the job
– Active: resources allocated, job running
– Done: when all processes have terminated and resources have been deallocated
– Failed: job terminates due to:
• explicit termination
• error in request format
• failure in resource management system
• denial of access to resource
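The four states can be modeled as a small state machine. The slide lists the states but not the allowed moves, so the transition table below is an assumption, and `JobState` and `Job` are illustrative names rather than GRAM API types.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Assumed transitions: a job may fail while pending or active; done and failed are final.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    def __init__(self) -> None:
        self.state = JobState.PENDING
        self.history = [self.state]

    def transition(self, new_state: JobState) -> None:
        """Move to new_state, rejecting moves the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)


job = Job()
job.transition(JobState.ACTIVE)
job.transition(JobState.DONE)
print([s.name for s in job.history])  # ['PENDING', 'ACTIVE', 'DONE']
```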
GRAM Components
• Gatekeeper
Responds to a request:
1. Performs mutual authentication of user and resource
2. Determines local user name for remote user
3. Starts a job manager that executes as local user and handles request
GRAM Components (cont.)
• Job manager
– Creates processes requested by user
– Submits resource allocation requests to underlying resource management system (or does fork)
– Monitors state of created processes
– Notifies callback contact of state transitions
– Implements control operations like termination
GRAM Components (cont.)
• GRAM reporter
– Responsible for storing into MDS (information service) info about:
• Scheduler structure: support reservations? number of queues
• Scheduler state: currently active jobs, expected wait time in queue, total number of nodes and available nodes
Job Submission Interfaces
• Globus Toolkit includes several command line programs for job submission
– globus-job-run: Interactive jobs
– globus-job-submit: Batch/offline jobs
– globusrun: Flexible scripting infrastructure
Scheduling in Grid
• Optimize performance: execution time, throughput, fairness, etc. (QoS)
• Load balancing.
• Helps to design an effective program model.
• Ubiquity: process scheduling in operating systems, task scheduling in parallel computing, and scheduling in real life too.
Scheduling in GRID
• Application level.
• Resources, e.g. data, communication bandwidth.
• Models: scheduling policy, program model, performance model, performance measurement.
• Current performance measure: minimize execution time.
Requirements on GRID scheduling model
• Adaptive to the dynamic environment.
• Adaptive to performance metrics that vary over the course of application execution.
• Performance predictions over time.
• Coarse and fine tuning of the component parameters.
Techniques commonly employed
Parameterize the components in an application.
Make use of dynamic information, e.g. percentage of CPU slots available, percentage of network bandwidth available.
Compositional scheduling model: structural character of the application and its dynamic interaction with the grid environment.
Scheduling Policy
Choose a set of resources to achieve the performance goal.
First Come, First Served. Preemptive. Fair Queuing. Etc.
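As an illustration of the simplest policy, First Come, First Served on a single resource can be sketched as below. The job tuples and `fcfs_schedule` are assumptions for the example; real grid schedulers work with queues per resource and incomplete information.

```python
def fcfs_schedule(jobs: list[tuple[str, int, int]]) -> dict[str, tuple[int, int]]:
    """jobs: (name, arrival_time, run_time). Returns name -> (start, finish) on one resource."""
    clock = 0
    plan = {}
    # Serve strictly in arrival order; ties keep their input order.
    for name, arrival, run_time in sorted(jobs, key=lambda j: j[1]):
        start = max(clock, arrival)  # wait for the resource or for the job to arrive
        clock = start + run_time
        plan[name] = (start, clock)
    return plan


plan = fcfs_schedule([("j1", 0, 5), ("j2", 1, 2), ("j3", 3, 1)])
print(plan)  # {'j1': (0, 5), 'j2': (5, 7), 'j3': (7, 8)}
```

Note how the short job `j3` waits behind the long job `j1`; preemptive and fair-queuing policies exist to reduce that effect.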
AppLeS: Application-Level Scheduler
• Everything is evaluated in terms of its impact on the application, so resources are evaluated in terms of their predicted capacities and their potential for meeting requirements.
• No resource manager is assumed.
• User-level: no specific privilege required.
• Heterogeneous and cross-organization.
• Depends on the Network Weather Service for dynamic resource load and availability.
AppLeS (Cont’d)
• Information gathered by the Network Weather Service is used to parameterize performance models and to predict the state of grid resources at the time the application will be scheduled.
• Time balancing: all processors are assigned some possibly nonuniform amount of work, with the goal that they will all finish at roughly the same time.
• Compositional component models are deployed.
• Adaptive scheduling scheme.
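Time balancing can be illustrated with a simple proportional split: if processor i is predicted to process `rate_i` units of work per second, giving it a share proportional to `rate_i` makes every `work_i / rate_i` equal. This is a simplified sketch, not the AppLeS implementation; `time_balance` and the rates are hypothetical, and in practice the rates would come from forecasts such as those of the Network Weather Service.

```python
def time_balance(total_work: float, rates: dict[str, float]) -> dict[str, float]:
    """Split total_work so that work_i / rate_i is the same for every processor."""
    total_rate = sum(rates.values())
    if total_rate <= 0:
        raise ValueError("need at least one processor with a positive rate")
    return {name: total_work * rate / total_rate for name, rate in rates.items()}


rates = {"fast": 3.0, "medium": 2.0, "slow": 1.0}
shares = time_balance(600, rates)
print(shares)  # {'fast': 300.0, 'medium': 200.0, 'slow': 100.0}
print({name: shares[name] / rates[name] for name in rates})
# {'fast': 100.0, 'medium': 100.0, 'slow': 100.0}
```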
Conclusion
• Scheduling is the key to performance in a grid environment.
• Coordinating resources in a grid environment.
• Most advanced grid applications are targeted to specific resources.
• High-performance scheduling evolution.
Open issues
• Multiple layers of schedulers
– The higher-level scheduler has less information about the remote resources; local resource managers actually control the resources
• Lack of control over resources
– Grid scheduler does not have ownership or control over the resources
• Shared resources and variance
– No dedicated access to the resources (resources are shared)
– This results in a high degree of variance and unpredictability
• Conflicting performance goals
– Many participants have different/conflicting preferences
– Many different local policies, cost models, security