A Unified Approach to Scheduling in Grid Environments

Peter Kelly

DHPC Group, School of Computer Science

Supervisors:

Dr. Paul Coddington

Dr. Andrew Wendelborn

What is grid computing?

- Middleware which allows people and organisations to share computing resources in a coordinated manner
- Data and computation can be distributed between machines in other institutions around the country/world
- Remote access to resources that are not available locally

Grid computing aims to be seamless and secure:

- Users interact with “the grid”, instead of individual machines
- Different platforms integrated using standard protocols
- Support for a wide range of applications

Common ways in which a grid is used:

- Execute a program on a remote host
- Read and write data stored on another machine
- Access services provided by a particular server

Types of grids

“Heavyweight” grids:

- Consist of supercomputers and other high-end machines
- Generally used for computation sharing
- High level of security and autonomy
- Can be complex to set up and maintain

“Lightweight” grids:

- Commodity desktop machines in homes and offices
- Generally used for computation sharing
- Usually based on master-worker model
- Software is much simpler and can be installed by end users

Service-based grids:

- Can consist of any type of machine
- Provide specific functionality rather than generic compute cycles
- Platform independent; implementation details hidden behind interface

What is Scheduling?

Need to decide when and where computation and other activities are to occur

Traditional CPU scheduling:

- Familiar case of task scheduling on a single CPU: time-slicing allocates CPU time to processes
- SMP systems (several CPUs): time slices are allocated between processors
- Parallel computers (many processors): tasks usually have exclusive access to a certain number of processors
- Clusters of workstations (similar case): tasks must be assigned to specific processors

A lot of research has already been done for parallel computers and clusters

- Grid computing has some similarities to these
- However, many additional factors have to be considered

Grid Scheduling

Many extra complexities

Scheduling between multiple computers:

- Different CPU speeds, architectures
- Network connectivity can vary widely between machines
- Many different users and concurrent tasks

Data location is important:

- Data should be close to computation for efficient access
- Affected by network bandwidth and available storage resources

Centralised vs. distributed scheduling:

- Centralised scheduling offers more control, but limits autonomy of individual resources
- Centralised algorithms only scale to a few hundred machines
- Distributed scheduling gives more control to machine owners, and is much more scalable

Grid Scheduling

Current scheduling strategies for grids are limited:

- Some only support specific application models, e.g. task farming
- Generic mechanisms do not usually support parallel programs
- Scheduling is generally based on independent jobs, or application-specific parallel scheduling
- Centralised schedulers: limited scalability

Three main types of scheduling:

- Job submission
- Services
- Data placement (replica management)

These types of scheduling are normally independent of each other.

Job submission

- Similar to batch processing model used on mainframes
- Client supplies details of job to run, including program name, command-line arguments, environment variables
- In many cases, client also uploads executable to run
- Server runs job immediately or in the future, and notifies user on completion
- Useful for gaining access to faster computers on the grid to run your own programs
- Platform-dependent; client needs knowledge of server configuration, and if binary executable supplied, server must have specific OS/architecture

Common examples:

- Globus GRAM
- Condor
- LSF
- PBS

Job submission - Scheduling

Metascheduling (done at grid level):

- On which resource should the job run?
- Choice based on job requirements, access permissions, machine load and other parameters
- Parallel jobs can be scheduled to run across multiple resources
- Each resource may physically contain multiple processors, e.g. parallel computer or cluster
- Example: parameter sweep application. Each resource handles a particular range.

Local scheduling (done at individual resource level):

- Once job is assigned to a resource, when should it run?
- Usually determined by local job queuing system
- Job is run if machine is currently unused, or may be delayed until the other jobs have completed
- Alternatively, job may start straight away and run concurrently with any other existing jobs on that machine

[Figure: Client, Resource broker (metascheduler), Computational resource]
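The metascheduling choice described above can be sketched roughly as follows. This is only an illustrative sketch: the `Resource` and `Job` classes, their fields, and the "least-loaded eligible resource" rule are assumptions made for the example, not the behaviour of Globus GRAM, Condor, LSF or PBS.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical description of a grid resource (cluster, parallel computer...)
    name: str
    arch: str
    processors: int
    load: float                              # current load average
    allowed_users: set = field(default_factory=set)


@dataclass
class Job:
    # Hypothetical job description supplied by the client
    user: str
    arch: str
    processors: int


def choose_resource(job, resources):
    """Return the least-loaded resource the job may run on, or None."""
    eligible = [
        r for r in resources
        if job.user in r.allowed_users       # access permissions
        and r.arch == job.arch               # platform requirement
        and r.processors >= job.processors   # enough processors
    ]
    return min(eligible, key=lambda r: r.load, default=None)


resources = [
    Resource("cluster-a", "x86-linux", 64, load=3.5, allowed_users={"alice"}),
    Resource("cluster-b", "x86-linux", 16, load=0.4, allowed_users={"alice", "bob"}),
    Resource("sparc-box", "sparc-solaris", 8, load=0.1, allowed_users={"alice"}),
]
chosen = choose_resource(Job(user="alice", arch="x86-linux", processors=8), resources)
print(chosen.name)
```

Once a resource is chosen, deciding when the job actually runs is left to that resource's local queuing system, which this sketch does not model.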

Services

- Well-defined interface with a set of operations
- Accessed via standard protocol; no knowledge of platform necessary
- Different implementations of a service, all conforming to the same interface, can be accessed transparently by clients
- Set of services provided by a machine is generally fixed: clients can’t supply code to execute as they can in the job submission model
- Additional services can only be installed/configured by machine owner

Uses:

- Client-server app: access a single service
- Workflow application: access many services and coordinate flow of data between them

Common technologies:

- Java RMI
- Web Services
- Sun RPC

Services - Scheduling

- If a service is provided by more than one machine, client can choose which to connect to
- Some machines may be more desirable than others, based on expected time to perform operations or transfer data
- Clients either obtain list of servers from a central registry, or have messages redirected by some intermediate entity

Example: Website mirroring

- Multiple web servers host copies of the same site
- Load balancer intercepts requests from clients and redirects to servers according to some scheduling algorithm
- Alternatively, multiple DNS entries for the same site: client selects a machine
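Two simple policies that a load balancer in the mirroring example might use are sketched below. The class names and mirror host names are hypothetical, and the slides do not say which algorithm is intended; round-robin and least-loaded are shown only as common examples.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastLoadedBalancer:
    """Send each request to the server with the fewest active requests."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        # Ties are broken by the order in which servers were listed.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def done(self, server):
        self.active[server] -= 1


mirrors = ["mirror1.example.org", "mirror2.example.org", "mirror3.example.org"]

rr = RoundRobinBalancer(mirrors)
rr_picks = [rr.pick() for _ in range(4)]

ll = LeastLoadedBalancer(mirrors)
first, second = ll.pick(), ll.pick()
ll.done(first)            # the first request finishes
third = ll.pick()         # so its server is the least loaded again
print(rr_picks, first, second, third)
```

Round-robin needs no state about the servers, while the least-loaded policy needs to be told when requests complete, which is the kind of extra information a scheduler has to collect.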

Data access

Grid applications need to access data residing on other machines

Common approaches:

- Mounting remote file system (e.g. NFS, CIFS) then using OS APIs
- Protocols such as FTP, HTTP, GridFTP. Programs often use shared libraries implementing these (e.g. the GASS library)
- Interposition agents: intercept system calls and redirect file operations to remote host
- Pre-staging: data files copied to remote machine before job runs

Data scheduling and replication

- The faster a program can access its data, the better
- Can move data to the program, or move program to the data
- Multiple copies (replicas) of the data can exist on different machines
- Programs access the “closest” replica (the one they have the fastest access to)
- Data from remote hosts may be cached locally

Knowledge about data access patterns can help with replication

- e.g. if many jobs access the same file, can pre-stage that file to remote machine(s) that will run the jobs
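As a rough sketch of the two ideas above, the code below picks the "closest" replica by estimated transfer time and finds files worth pre-staging because several jobs read them. The function names, host names, file names and the `min_jobs` threshold are all hypothetical, and bandwidth is assumed to be the only factor in access speed.

```python
from collections import Counter


def closest_replica(replica_hosts, bandwidth_bps, file_bytes):
    """Return the replica host with the lowest estimated transfer time."""
    def transfer_seconds(host):
        return file_bytes * 8 / bandwidth_bps[host]
    return min(replica_hosts, key=transfer_seconds)


def files_to_prestage(job_inputs, min_jobs=2):
    """Return files read by at least `min_jobs` different jobs."""
    counts = Counter(f for files in job_inputs.values() for f in set(files))
    return sorted(f for f, n in counts.items() if n >= min_jobs)


# Bandwidth from the client to each host holding a replica (bits per second)
bandwidth_bps = {"hostA": 256_000, "hostB": 100_000_000}
best = closest_replica(["hostA", "hostB"], bandwidth_bps, file_bytes=50_000_000)

job_inputs = {
    "job1": ["genome.dat", "params1.txt"],
    "job2": ["genome.dat", "params2.txt"],
    "job3": ["genome.dat"],
}
shared = files_to_prestage(job_inputs)
print(best, shared)
```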

Data vs. Computation scheduling

Scheduling decisions that choose a machine based only on CPU speed/load could result in very slow data access.

Is placing the job on Host 1 or Host 2 better? This depends on how much data it reads…

[Figure: Host 1 (400 MHz) and Host 2 (2.6 GHz) connected by a 256 kbps DSL link, with an 18 GB data file and the job to be placed]

Data vs. Computation scheduling

- If the job accesses the entire file and does only a small amount of computation, it is better to run it on Host 1
- But if it only reads a few KB from different parts of the file, running the job on Host 2 would be quicker
- File could be pre-staged to Host 2 if it is reused by multiple jobs

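The trade-off can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes that the data file sits on Host 1, that "18Gb" means 18 gigabytes (18 x 10^9 bytes), that the DSL link delivers its full 256,000 bits per second, and that compute time is simply cycles divided by clock speed. The cycle counts and the 100 KB read size are invented for illustration.

```python
FILE_BYTES = 18 * 10**9      # assumed: 18 gigabytes, stored on Host 1
LINK_BPS = 256_000           # 256 kbps DSL between the hosts
HOST1_HZ = 400e6             # 400 MHz
HOST2_HZ = 2.6e9             # 2.6 GHz


def run_time(bytes_read, cpu_cycles, cpu_hz, remote):
    """Estimated seconds: data transfer (if remote) plus computation."""
    transfer = bytes_read * 8 / LINK_BPS if remote else 0.0
    return transfer + cpu_cycles / cpu_hz


# Time just to move the whole file over the DSL link
whole_file_transfer = FILE_BYTES * 8 / LINK_BPS

# Case A: read the entire file, little computation (hypothetical 1e12 cycles)
a_host1 = run_time(FILE_BYTES, 1e12, HOST1_HZ, remote=False)
a_host2 = run_time(FILE_BYTES, 1e12, HOST2_HZ, remote=True)

# Case B: read about 100 KB, heavy computation (hypothetical 1e13 cycles)
b_host1 = run_time(100_000, 1e13, HOST1_HZ, remote=False)
b_host2 = run_time(100_000, 1e13, HOST2_HZ, remote=True)

print(f"whole file over DSL: {whole_file_transfer:.0f} s")
print(f"case A: host1 {a_host1:.0f} s, host2 {a_host2:.0f} s")
print(f"case B: host1 {b_host1:.0f} s, host2 {b_host2:.0f} s")
```

Under these assumptions, moving the whole file takes 562,500 seconds (about 6.5 days), so Host 1 wins easily in case A, while in case B the transfer takes only a few seconds and the faster CPU on Host 2 dominates.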

Service scheduling and communications speed

We need to take into account communication costs not just for data file access, but for interaction between multiple tasks, and when accessing services

- Should the client access Server 1 or Server 2?
- Again, this depends on the amount of data that needs to be transferred and the amount of computation performed by the service.
- Scheduler needs lots of info: network bandwidth, CPU speed, application data access requirements

[Figure: Client connected to Server 1 and Server 2. Labels: 1.2 GHz, loadavg 5.2; 1.2 GHz, loadavg 0.1; 256 kbps DSL; 100 Mbps ethernet]
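A minimal sketch of how a scheduler might combine these inputs is given below. Two things are assumed rather than taken from the slide: the transcript does not preserve which link belongs to which server, so the pairing used here (busy Server 1 on the 100 Mbps ethernet, idle Server 2 behind the 256 kbps DSL) is a guess, and the slowdown factor `1 + loadavg` is a deliberately crude model of CPU contention.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cpu_hz: float
    loadavg: float
    link_bps: float      # bandwidth between client and this server


def estimated_seconds(server, data_bytes, cpu_cycles):
    """Transfer time plus compute time, slowed down by existing load."""
    transfer = data_bytes * 8 / server.link_bps
    compute = cpu_cycles / server.cpu_hz * (1 + server.loadavg)
    return transfer + compute


def best_server(servers, data_bytes, cpu_cycles):
    return min(servers, key=lambda s: estimated_seconds(s, data_bytes, cpu_cycles))


# Link pairing is an assumption, see the note above.
server1 = Server("Server 1", cpu_hz=1.2e9, loadavg=5.2, link_bps=100e6)
server2 = Server("Server 2", cpu_hz=1.2e9, loadavg=0.1, link_bps=256e3)
servers = [server1, server2]

# Data-heavy request: 50 MB transferred, about 1 s of unloaded CPU work
data_heavy = best_server(servers, data_bytes=50_000_000, cpu_cycles=1.2e9)

# Compute-heavy request: 10 KB transferred, about 1000 s of unloaded CPU work
compute_heavy = best_server(servers, data_bytes=10_000, cpu_cycles=1.2e12)

print(data_heavy.name, compute_heavy.name)
```

With this pairing, the data-heavy request goes to the well-connected but busy server and the compute-heavy request goes to the idle one, which is the kind of decision that cannot be made from CPU load or bandwidth alone.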

Problem

- Existing schedulers generally only consider one type of scheduling
- Optimising one aspect of performance, e.g. execution speed, can have negative effects on other aspects, and on overall performance
- Ignoring data transfer costs could result in large transfer times
- Ignoring CPU performance could result in slow programs
- Ignoring common requirements between jobs can result in missing out on chances for optimisation
- Simplifying assumptions made by some schedulers don’t always hold

Proposed Solution

- Build a scheduler which considers all aspects of performance
- This will allow for better overall performance in a wider range of situations

An integrated view

Remove the distinction between different types of things that we need to schedule

- Computation (tasks), data (files) and services can all be considered “entities” that comprise an application
- Each entity is capable of residing on a subset of the machines on the grid
- A scheduler can just consider where to place the entities, and a separate mechanism deals with the low-level details

Examples:

- A task entity corresponding to a Java class can run on any machine with a JVM installed
- A service entity, representing a connection made by the program to a service, can be assigned to any machine which provides that service
- A file entity, corresponding to a particular file accessed by a program, can be moved or copied to any host with sufficient storage space available

Graph representation

- Based on the entities and relationships between them, an application can be represented as a graph
- Relationships include communication between tasks, I/O on files, and operation calls on services

[Figure: application graph with nodes Task 1, Task 2, Task 3, Task 4, Service A, File X and File Y]
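One simple way to hold such a graph in code is a list of labelled edges plus an adjacency map. The node names below come from the figure, but the edges themselves are hypothetical, since the transcript does not preserve how the nodes were connected.

```python
from collections import defaultdict

# (source, destination, relationship). Edges are illustrative only.
edges = [
    ("Task 1", "Task 2", "message"),
    ("Task 1", "Task 3", "message"),
    ("Task 2", "Service A", "operation call"),
    ("Task 3", "File X", "file I/O"),
    ("Task 2", "Task 4", "message"),
    ("Task 3", "Task 4", "message"),
    ("Task 4", "File Y", "file I/O"),
]

# Undirected adjacency map: which entities interact with which
adjacency = defaultdict(set)
for src, dst, _relationship in edges:
    adjacency[src].add(dst)
    adjacency[dst].add(src)

nodes = sorted(adjacency)
print(nodes)
print(sorted(adjacency["Task 4"]))
```

A scheduler working on this structure sees only nodes and edges, regardless of whether a node happens to be a task, a file or a service.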

Integrated scheduling

- Scheduler is given information about each of the different entities
- This is used to make scheduling decisions, which are then carried out by a separate component

Task 1: Type = task, ComputationCost = 0.4, StorageSize = 0 MB, MemoryReq = 18 MB

Service A: Type = service, ComputationCost = 0.8, StorageSize = 0 MB, MemoryReq = 33 MB

File X: Type = file, ComputationCost = 0, StorageSize = 180 MB, MemoryReq = 0 MB
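The entity descriptions above map naturally onto a small record type. In the sketch below the three entities use the values from the slide (sizes read as megabytes), while the `Host` class, the host capacities and the `placement_fits` check are hypothetical additions showing how a scheduler could test a candidate placement without caring what kind of entity it is placing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str                 # "task", "service" or "file"
    computation_cost: float
    storage_mb: int
    memory_mb: int


@dataclass(frozen=True)
class Host:
    # Hypothetical host description, not from the slides
    name: str
    free_storage_mb: int
    free_memory_mb: int


def placement_fits(placement, hosts):
    """placement maps Entity -> host name; check storage and memory limits."""
    for host in hosts:
        placed = [e for e, h in placement.items() if h == host.name]
        if sum(e.storage_mb for e in placed) > host.free_storage_mb:
            return False
        if sum(e.memory_mb for e in placed) > host.free_memory_mb:
            return False
    return True


task1 = Entity("Task 1", "task", 0.4, storage_mb=0, memory_mb=18)
service_a = Entity("Service A", "service", 0.8, storage_mb=0, memory_mb=33)
file_x = Entity("File X", "file", 0.0, storage_mb=180, memory_mb=0)

hosts = [Host("hostA", free_storage_mb=100, free_memory_mb=64),
         Host("hostB", free_storage_mb=500, free_memory_mb=32)]

all_on_a = {task1: "hostA", service_a: "hostA", file_x: "hostA"}
split = {task1: "hostA", service_a: "hostA", file_x: "hostB"}
print(placement_fits(all_on_a, hosts), placement_fits(split, hosts))
```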

Process networks

- A graph-based model of computation
- Network consists of a number of processes, each of which performs computation and inputs/outputs data along channels
- Data flows along channels connecting nodes

This is a useful model on which to base a grid scheduler

Process networks

Any program run on a grid can be conceptually thought of as a process network

- Each task, service and file corresponds to a “process”
- Tasks execute instructions according to code in the program
- Files are like processes which perform no computation but simply input or output a stream of data (corresponding to read or write access)
- Services are like tasks except their operations are pre-defined, and communication occurs through requests and responses

Channels connecting from one node to another can represent:

- Messages being sent from one task to another
- Data read from or written to a file
- The requests/responses sent between a task and a service
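A toy process network along these lines can be written with threads as processes and FIFO queues as channels. This is a generic sketch using Python's standard library, not PAGIS code; the function names and the upper-casing "computation" are invented for the example.

```python
import queue
import threading

DONE = object()   # sentinel marking the end of a stream


def file_reader(lines, out):
    """File node: performs no computation, just outputs a stream of data."""
    for line in lines:
        out.put(line)
    out.put(DONE)


def task(inp, out):
    """Task node: blocks on its input channel and transforms each item."""
    while (item := inp.get()) is not DONE:
        out.put(item.upper())
    out.put(DONE)


def file_writer(inp, sink):
    """File node being written to: consumes a stream of data."""
    while (item := inp.get()) is not DONE:
        sink.append(item)


def run_network(lines):
    a, b = queue.Queue(), queue.Queue()   # channels between nodes
    sink = []
    procs = [
        threading.Thread(target=file_reader, args=(lines, a)),
        threading.Thread(target=task, args=(a, b)),
        threading.Thread(target=file_writer, args=(b, sink)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sink


result = run_network(["alpha", "beta"])
print(result)
```

Because each node only touches its own channels, the nodes could in principle be placed on different machines without changing their code, which is what makes the model attractive for scheduling.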

Initial research

This will utilise PAGIS, a grid programming infrastructure based on process networks, developed within the DHPC group

- Support exists for writing tasks in Java
- Tasks can be configured together in a process network
- Will be extended to support data files and services as nodes
- Metascheduler to be written which runs within PAGIS and allows multiple programs to be scheduled concurrently

Investigation:

- Centralised and distributed scheduling algorithms
- Information about resources and applications needed by the scheduler
- Advantages of integrated view vs. scheduling based on one aspect of performance
- Look at a range of different grid and application scenarios

Further research

Applicability of our research to other systems

- Will look at scheduling in the context of additional systems, e.g. Condor, LSF, NetSolve, Globus, Nimrod, Taverna
- Scheduler developed in previous stages will be enhanced and interfaced with other systems
- An existing simulation environment will be used to experiment with scheduling algorithms on larger grid configurations

Outcomes

- Demonstrate and evaluate the integrated approach to scheduling
- Theoretical level: What are the best scheduling algorithms for grid environments, and how are they affected by the nature of the grid and application?
- Practical level: How can such scheduling be used to enhance the operation of existing grid systems?

Conclusion

- Grid computing is far from a “solved problem”
- Scheduling is an important area of research
- Much work done previously in the area of parallel and cluster computing
- Some scheduling support already available in existing grids, but with various limitations

In this project:

- A range of different scheduling algorithms and techniques will be investigated
- These will include existing algorithms used for parallel/cluster computing, as well as the development of new algorithms
- Integrated approach to scheduling is proposed, which will allow for more effective scheduling over a wider range of cases
- Demonstration of our approach and evaluation of how well it applies in a range of situations

Questions?
