London e-Science Centre

Application Scheduling in a Grid Environment

Nine month progress talk

Laurie Young

Overview

• Introduction to grid computing

• Work so far…

• Imperial College E-Science Networked Infrastructure (ICENI)

• Scheduling within ICENI

• Optimisation criteria/Scheduling policy

• Scheduling/Mapping algorithms

What is a Grid?

[Diagram: a grid linking two CPU nodes, a storage node, a scientific instrument, and visualisation/steering software]

What is a Grid Application?

[Diagram: tiered data grid centred on the CERN Computer Centre]

• Online System: data arrives at ~PBytes/sec
  – There is a “bunch crossing” every 25 nsecs
  – There are 100 “triggers” per second
  – Each triggered event is ~1 MByte in size
• Offline Processor Farm, ~20 TIPS (fed at ~100 MBytes/sec)
• Tier 0: CERN Computer Centre (fed at ~100 MBytes/sec)
• Tier 1: FermiLab (~4 TIPS), France Regional Centre, Italy Regional Centre, Germany Regional Centre (linked at ~622 Mbits/sec or Air Freight (deprecated))
• Tier 2: Caltech (~1 TIPS) and four further Tier2 Centres (~1 TIPS each) (linked at ~622 Mbits/sec)
• Institutes, ~0.25 TIPS each, with a physics data cache (linked at ~622 Mbits/sec)
  – Physicists work on analysis “channels”
  – Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
• Tier 4: Physicist workstations (linked at ~1 MBytes/sec)

1 TIPS is approximately 25,000 SpecInt95 equivalents.

Current Work

• Development of Supporting Technologies
  – Development of EPIC (E-Science Portal @ IC)
    • GridFTP (High throughput FTP)
    • Grid/Globus submission of jobs to resources

• Development of test application
  – Parameter sweep analysis of submarine acoustics
  – Multithreaded and Component versions
  – Integration with EPIC

ICENI

• IC e-Science Networked Infrastructure

• Developed by LeSC Grid Middleware Group

• Collect and provide relevant Grid meta-data

• Use to define and develop higher-level services

The Iceni, under Queen Boudicca, united the tribes of South-East England in a revolt against the occupying Roman forces in AD60.

ICENI Component Applications

• Each ICENI job is composed of multiple components. Each runs on a different resource

• Each component is connected to at least one other component. Data is passed along these connections

The Scheduling Problem

Given a component application and a (large) network of linked computational resources, what is the best mapping of components onto resources?
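To give a feel for why this is hard, the sketch below enumerates every possible mapping for a tiny example. It is only an illustration: the component and resource names are placeholders, and it assumes that several components may share one resource. With R resources and C components there are R^C candidate mappings, so exhaustive search quickly becomes impractical on a large grid.

```python
from itertools import product


def enumerate_mappings(components, resources):
    """Yield every assignment of components to resources as a dict."""
    for choice in product(resources, repeat=len(components)):
        yield dict(zip(components, choice))


# Placeholder names, chosen only for illustration.
components = ["Design", "Scatter", "Gather"]
resources = ["Atlas", "Saturn"]

mappings = list(enumerate_mappings(components, resources))
mapping_count = len(mappings)  # equals len(resources) ** len(components)
print(mapping_count)
```

Adding one more component doubles the count in this example; adding one more resource raises the base of the exponent.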

Scheduler in ICENI

[Diagram: ICENI architecture showing the App Builder (GUI), Component Repository, Performance Models, Scheduler and Broker, together with the underlying Resources]

Multiple Metrics (1)

• “It is the goal of a scheduler to optimise one or more metrics” (Feitelson & Rudolph)

• Generally one metric is most important
  – Application Optimisation
    • Execution time
    • Execution cost
  – Host Optimisation
    • Host utilisation
    • Host throughput
    • Interaction latency

Multiple Metrics (2)

• In a Grid environment there are three important metrics for application optimisation
  – Start time (b)
  – End time (e)
  – Cost

• Relative importance varies on a user-by-user and application-by-application basis

Combining Metrics – Benefit Fn

• A Benefit Function maps the metrics we are interested in to a single Benefit Value metric

• Different benefit functions represent different optimisation preferences

$B = B(b, e, \mathrm{cost})$

Optimisation Preferences

• Cost Optimisation

• Time Optimisation

• Cost/Time Optimisation

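[Formulae: one piecewise definition of B per preference, each conditioned on e and on maximum (“max”) bounds; the full expressions are not legible in this transcript]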
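As a rough illustration of how the three preferences could be written as benefit functions, the sketch below uses assumed forms: each rejects a schedule that breaks a user-supplied bound and otherwise rewards low cost, an early end time, or a weighted mix of both. The names `Prediction`, `e_max`, `cost_max` and `weight` are hypothetical, and these are not the formulae from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    b: float     # predicted start time
    e: float     # predicted end time
    cost: float  # predicted cost


REJECT = float("-inf")  # benefit of a schedule that violates a bound


def cost_optimised(p, e_max):
    """Prefer the cheapest schedule that still ends by e_max (assumed form)."""
    return -p.cost if p.e <= e_max else REJECT


def time_optimised(p, cost_max):
    """Prefer the earliest end time within the budget cost_max (assumed form)."""
    return -p.e if p.cost <= cost_max else REJECT


def cost_time_optimised(p, e_max, cost_max, weight=0.5):
    """Trade end time against cost; bounds must be positive (assumed form)."""
    if p.e > e_max or p.cost > cost_max:
        return REJECT
    return -(weight * p.e / e_max + (1 - weight) * p.cost / cost_max)


# Two made-up predictions for the same job on different resources.
fast = Prediction(b=0.0, e=10.0, cost=50.0)
cheap = Prediction(b=5.0, e=40.0, cost=8.0)

print(cost_optimised(cheap, e_max=60.0) > cost_optimised(fast, e_max=60.0))
print(time_optimised(fast, cost_max=100.0) > time_optimised(cheap, cost_max=100.0))
```

A higher value means a more desirable schedule, so a scheduler built on these functions would pick the candidate with the largest benefit.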

Graph Oriented Scheduling (1)

• Applications are described as a graph
  – Nodes represent application components
  – Edges represent component communication

• Resources are described as a graph
  – Nodes represent resources
  – Edges represent network connections
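The two graphs could be held in plain Python structures, as in the minimal sketch below. The component pipeline, the network links and the bandwidth figures are all assumptions made for illustration, and the feasibility rule (communicating components must sit on the same resource or on directly linked ones) is a simplification that ignores multi-hop routes.

```python
# Application graph: nodes are components, directed edges are data connections.
# The edges are an assumed pipeline, for illustration only.
app_graph = {
    "Design": ["Scatter"],
    "Scatter": ["Mesh"],
    "Mesh": ["DRACS"],
    "DRACS": ["Gather"],
    "Gather": ["Analyse"],
    "Analyse": [],
}

# Resource graph: nodes are resources, undirected edges are network links.
# Bandwidths (MB/s) are made-up placeholder values.
resource_links = {
    frozenset({"Atlas", "Saturn"}): 200.0,
    frozenset({"Saturn", "Viking"}): 100.0,
    frozenset({"Viking", "Condor pool"}): 12.5,
}


def app_edges(graph):
    """Return the application graph as a list of (source, destination) pairs."""
    return [(src, dst) for src, dsts in graph.items() for dst in dsts]


def linked(a, b):
    """True if two resources are the same or share a direct network link."""
    return a == b or frozenset({a, b}) in resource_links


def is_feasible(mapping):
    """True if every communicating pair of components lands on linked resources."""
    return all(linked(mapping[s], mapping[d]) for s, d in app_edges(app_graph))


all_on_atlas = {component: "Atlas" for component in app_graph}
split = dict(all_on_atlas, Analyse="Saturn")   # Gather -> Analyse crosses Atlas-Saturn
broken = dict(all_on_atlas, Analyse="Viking")  # no direct Atlas-Viking link

print(is_feasible(all_on_atlas), is_feasible(split), is_feasible(broken))
```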

Centre Resources

• VOYAGER: Microsoft/Dell Intel Cluster, 32 processor, Giganet
• SATURN: Sun E6800 SMP, 24 processors, Backplane: 9.6GB/s
• PIONEER: Athlon Cluster, 22 processor, 100Mb
• ATLAS: Compaq / Quadrics Cluster, 32 processor, MPI: ~5.7us & >200 MB/s
• CONDOR POOL: ~150 PIII processors
• AP3000: 80 Sparc Ultra II, APNet
• VIKING 1: P4/Linux Cluster, 66 dual node, Myrinet
• VIKING 2: P4/Linux Cluster, 68 dual node, 100Mb
• Storage: 6TB, 1.2TB, 24TB

Graph Oriented Scheduling (2)

[Diagram: the resource graph (Condor pool, Atlas, Saturn, Viking) shown alongside the application graph (Design, Factory, Scatter, three Mesh and three DRACS components, Gather, Analyse)]

Graph Oriented Scheduling (3)

[Diagram: the application components (Design, Factory, Scatter, Gather, Analyse) placed onto the resource graph (Condor pool, Atlas, Saturn, Viking)]

Schedule Benefit

• Each component and communication has a benefit function

• Each resource and network connection has a predicted time & cost for each component or communication that could be deployed

• Fit the task graph onto the resource graph to get the maximum Total Predicted Benefit

$B_t = \sum B(b, e, \mathrm{cost})$
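A brute-force version of this fitting step might look like the sketch below. It is a toy under stated assumptions: two components joined by one connection, two resources, made-up predicted times and costs, and simple benefit functions that penalise time and cost linearly. None of the numbers or function forms come from the talk, and exhaustive search is used only because the example is tiny.

```python
from itertools import product

# Hypothetical predictions: (component, resource) -> (run time, cost).
predicted = {
    ("Design", "Atlas"): (4.0, 8.0),
    ("Design", "Saturn"): (6.0, 3.0),
    ("Analyse", "Atlas"): (10.0, 20.0),
    ("Analyse", "Saturn"): (12.0, 6.0),
}

# Hypothetical transfer time for the Design -> Analyse connection.
transfer_time = {
    ("Atlas", "Atlas"): 0.0,
    ("Saturn", "Saturn"): 0.0,
    ("Atlas", "Saturn"): 2.0,
    ("Saturn", "Atlas"): 2.0,
}


def component_benefit(time, cost, weight=1.0):
    """Assumed benefit of one component: lower time and cost are better."""
    return -(time + weight * cost)


def communication_benefit(time):
    """Assumed benefit of one connection: a shorter transfer is better."""
    return -time


def total_benefit(mapping):
    """Sum the benefit of every component and of the single connection."""
    total = sum(component_benefit(*predicted[(c, r)]) for c, r in mapping.items())
    link = (mapping["Design"], mapping["Analyse"])
    return total + communication_benefit(transfer_time[link])


def best_mapping(components, resources):
    """Try every mapping and keep the one with the highest total benefit."""
    candidates = (
        dict(zip(components, choice))
        for choice in product(resources, repeat=len(components))
    )
    return max(candidates, key=total_benefit)


best = best_mapping(["Design", "Analyse"], ["Atlas", "Saturn"])
print(best, total_benefit(best))
```

Raising `weight` makes cost matter more relative to time, which is one way a per-user preference could shift the chosen mapping.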

Future Work

• Develop benefit maximisation algorithms

• Test schedulers
  – On grid simulators such as SimGrid, GridSim and MicroGrid
  – On grid testbeds, such as the IC Testbed and the EUDG

• Develop brokering methods

• Define Scheduler-Broker communications

Summary

• Concept of grid computing for HPC/HTC

• ICENI Middleware for utilisation of grids

• Importance of scheduling metrics

• Combining metrics

• Mapping application graphs onto resource graphs

• Optimisation of total benefit

• Need good mapping algorithms…