
Sphinx: A Scheduling Middleware for Data Intensive Applications on a Grid

Jang Uk In, Paul Avery, Richard Cavanaugh, Laukik Chitnis, Mandar Kulkarni, Sanjay Ranka
University of Florida
Computing in High Energy Physics 2004, Interlaken, Switzerland (30.09.2004)

Slide 2: The Problem of Grid Scheduling

o Decentralised ownership
  o No one controls the grid
o Heterogeneous composition
  o Difficult to guarantee execution environments
o Dynamic availability of resources
  o Ubiquitous monitoring infrastructure needed
o Complex policies
  o Issues of trust
  o Lack of accounting infrastructure
  o May change with time
o Information gathering and processing is critical!

Slide 3: A Real Life Example

o Merge two grids into a single multi-VO "Inter-Grid"
o How to ensure that
  o neither VO is harmed?
  o both VOs actually benefit?
  o there are answers to questions like:
    o "With what probability will my job be scheduled and complete before my conference deadline?"
o Clear need for a scheduling middleware!

[Figure: map of the merged grid's sites, including FNAL, Rice, UI, MIT, UCSD, UF, UW, Caltech, UM, UTA, ANL, IU, UC, LBL, SMU, OU, BU, and BNL]

Slide 4: Emerging Challenge: Intra-VO Management

o Today's Grid
  o Few Production Managers
o Tomorrow's Grid
  o Few Production Managers
  o Many Analysis Users
o How to ensure that
  o "Handles" exist for the VO to "throttle" different priorities (a toy version is sketched after this list)
    o Production vs. Analysis
    o User-A vs. User-B
  o The VO is able to
    o "Inventory" all resources currently available to it over some time period
    o Strategically plan for its own use of those resources during that time period
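The "throttle" handle is abstract on the slide; the following is a minimal, illustrative sketch of one way a VO could express relative priorities as shares and turn them into per-group slot caps. The group names and share values are assumptions, not Sphinx's actual mechanism.

```python
# Illustrative only: a VO-level "handle" for throttling competing activities.
# Group names and share values below are assumptions made for this sketch.

SHARES = {"production": 0.7, "analysis": 0.3}   # VO-chosen relative priorities

def per_group_caps(total_slots, shares=SHARES):
    """Split the CPU slots currently available to the VO according to its shares."""
    return {group: round(total_slots * weight) for group, weight in shares.items()}

# e.g. the VO's "inventory" says 200 slots are available over the planning window
print(per_group_caps(200))   # {'production': 140, 'analysis': 60}
```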

Slide 5: Some Requirements for Effective Grid Scheduling

o Information requirements
  o Past & future dependencies of the application
  o Persistent storage of workflows
  o Resource usage estimation
  o Policies
    o Expected to vary slowly over time
  o Global views of job descriptions
    o Request Tracking and Usage Statistics
    o State information important
  o Resource Properties and Status
    o Expected to vary slowly with time
    o Grid weather
    o Latency measurement important
  o Replica management
o System requirements
  o Distributed, fault-tolerant scheduling
  o Customisability
  o Interoperability with other scheduling systems
  o Quality of Service

Slide 6: Incorporate Requirements into a Framework

o Assume the GriPhyN Virtual Data Toolkit:
  o Client (request/job submission)
    o Globus clients
    o Condor-G/DAGMan
    o Chimera Virtual Data System
  o Server (resource gatekeeper)
    o MonALISA Monitoring Service
    o Globus services
    o RLS (Replica Location Service)

[Figure: a VDT Client and several VDT Servers, with question marks where the scheduling component will sit]

Slide 7: Incorporate Requirements into a Framework

o Assume the GriPhyN Virtual Data Toolkit:
  o Client (request/job submission)
    o Clarens Web Service
    o Globus clients
    o Condor-G/DAGMan
    o Chimera Virtual Data System
  o Server (resource gatekeeper)
    o MonALISA Monitoring Service
    o Globus services
    o RLS (Replica Location Service)
o Framework design principles:
  o Information driven
  o Flexible client-server model
  o General, but pragmatic and simple
  o Implement now; learn; extend over time
  o Avoid adding middleware requirements on grid resources
    o Take what is offered!

[Figure: the VDT Client and VDT Servers, now connected through a Recommendation Engine]

Slide 8: The Sphinx Framework

[Figure: Sphinx architecture. On the VDT Client, a Sphinx Client and the Chimera Virtual Data System communicate over a Clarens WS Backbone with the Sphinx Server, whose modules include Request Processing, Data Warehouse, Data Management, and Information Gathering; the server draws on the MonALISA Monitoring Service, the Replica Location Service, and Globus Resources, and jobs reach the VDT Server Site via Condor-G/DAGMan]

Slide 9: Sphinx Scheduling Server

o Functions as the Nerve Centre
o Data Warehouse
  o Policies, Account Information, Grid Weather, Resource Properties and Status, Request Tracking, Workflows, etc.
o Control Process
  o Finite State Machine
  o Different modules modify jobs, graphs, workflows, etc. and change their state (see the sketch below)
  o Flexible
  o Extensible

[Figure: Sphinx Server internals: Message Interface, Control Process, Graph Reducer, Graph Tracker, Graph Admission Control, Graph Data Planner, Graph Predictor, Job Admission Control, Job Predictor, Job Execution Planner, Data Warehouse, Data Management, and Information Gatherer]
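The slide describes the control process only at a high level. Here is a minimal sketch, not the actual Sphinx implementation, of a finite state machine in which each registered module handles jobs in one state and advances them to the next; the state names are loosely borrowed from the module names in the figure, and the handler bodies are assumptions.

```python
# Minimal sketch of a finite-state-machine control process: each module
# handles jobs in one state and moves them to the next state.

from collections import deque

# Illustrative job states, loosely following the module names on the slide.
UNREDUCED, UNPREDICTED, UNPLANNED, UNSENT, DONE = range(5)

def reduce_graph(job):      # e.g. drop nodes whose outputs already exist
    job["state"] = UNPREDICTED

def predict(job):           # e.g. estimate resource usage from history
    job["estimate"] = 42.0  # placeholder estimate
    job["state"] = UNPLANNED

def plan_execution(job):    # e.g. choose a site subject to policy
    job["site"] = "some-site"
    job["state"] = UNSENT

def submit(job):            # e.g. hand off to Condor-G/DAGMan
    job["state"] = DONE

HANDLERS = {UNREDUCED: reduce_graph, UNPREDICTED: predict,
            UNPLANNED: plan_execution, UNSENT: submit}

def control_process(jobs):
    """Drive every job through the state machine until all are DONE."""
    queue = deque(jobs)
    while queue:
        job = queue.popleft()
        HANDLERS[job["state"]](job)   # module responsible for the current state
        if job["state"] != DONE:
            queue.append(job)         # re-queue for the next module

control_process([{"state": UNREDUCED, "name": "dag-001"}])
```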

Slide 10: Quality of Service

o For grid computing to become economically viable, a Quality of Service is needed
  o "Can the grid possibly handle my request within my required time window?"
  o If not, why not? When might it be able to accommodate such a request?
  o If yes, with what probability? (a toy estimate is sketched after this list)
o But, grid computing today typically:
  o Relies on "greedy" job placement strategies
  o Works well in a resource-rich (user-poor) environment
  o Assumes no correlation between job placement choices
  o Provides no QoS
o As a grid becomes resource limited,
  o QoS becomes more important!
  o "greedy" strategies are not always a good choice
  o Strong correlation between job placement choices
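As a toy illustration of the "with what probability?" question, the sketch below estimates the chance of meeting a deadline as the empirical fraction of similar past jobs that finished in time. The data and the deliberately naive model are assumptions, not Sphinx's actual estimator.

```python
# Minimal sketch: estimate P(job finishes before deadline) from history.

def completion_probability(past_times, deadline):
    """Empirical P(completion time <= deadline) from past observations."""
    if not past_times:
        return None  # no history, no estimate
    return sum(t <= deadline for t in past_times) / len(past_times)

# e.g. completion times (seconds) observed for similar jobs at one site
history = [3200, 4100, 2900, 5200, 3600, 4800]
print(completion_probability(history, deadline=4000))  # -> 0.5
```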

Slide 11: Policy Constraints

o Defined by Resource Providers
  o Actual grid sites (resource centres)
  o VO management
o Applied to Request Submitters
  o VO, group, user, or even a proxy request (e.g. a workflow)
o Valid over a Period of Time
  o Can be dynamic (e.g. periodic) or constant
o Hierarchical/Recursive (see the lookup sketch below)
  o VOs ↔ Sub-VOs ↔ Users ↔ Proxies
  o Grids ↔ CEs, SEs ↔ Machines
  o Months ↔ Weeks ↔ Days
o VO accounting and book-keeping is necessary

[Figure: policy space spanned by Submitters, Resources, and Time]
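To make the hierarchical/recursive structure concrete, here is a minimal sketch of a quota lookup that falls back from user to group to VO. The submitter names, resources, and numbers are assumptions rather than Sphinx's actual policy schema, and the same fallback idea would apply along the resource and time axes.

```python
# Illustrative hierarchical policy lookup: a quota is keyed by
# (submitter, resource, period); lookups fall back user -> group -> VO.

QUOTAS = {
    # (submitter, resource, period) -> max CPU-hours
    ("cms",            "grid3",  "2004-09"): 50000,
    ("cms/production", "grid3",  "2004-09"): 40000,
    ("cms/user-a",     "ufl-ce", "2004-09"): 500,
}

def quota_for(submitter, resource, period):
    """Walk up the submitter hierarchy until a matching quota is found."""
    parts = submitter.split("/")
    while parts:
        q = QUOTAS.get(("/".join(parts), resource, period))
        if q is not None:
            return q
        parts.pop()           # fall back: user -> group -> VO
    return None               # no policy applies

print(quota_for("cms/user-a", "ufl-ce", "2004-09"))   # 500  (user-level)
print(quota_for("cms/user-b", "grid3",  "2004-09"))   # 50000 (VO-level)
```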

Slide 12: Policy Based Scheduling

o Sphinx provides "soft" QoS through time-dependent, global views of
  o Submissions (workflows, jobs, allocation, etc.)
  o Policies
  o Resources
o Uses Linear Programming Methods (a toy formulation follows below)
  o Satisfy Constraints
    o Policies, User-requirements, etc.
  o Optimise an "objective" function
    o Provides the QoS
o GOAL: Estimate probabilities to meet deadlines within policy constraints

[Figure: the policy space as a cube spanned by Submissions, Resources, and Time]
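The slide names Linear Programming but does not show a formulation. The following is a deliberately tiny sketch of the idea, with illustrative numbers and SciPy's linprog standing in for whatever solver Sphinx actually uses: choose how many of a user's jobs go to each site so that estimated completion time is minimised while per-site policy quotas are respected.

```python
# Minimal sketch of policy-based scheduling as a linear program.

from scipy.optimize import linprog  # assumes SciPy is available

# Illustrative inputs for one user with 100 jobs and three candidate sites.
est_time = [30.0, 45.0, 60.0]   # estimated minutes per job at each site
quota    = [40, 80, 100]        # policy: max jobs this user may run per site
n_jobs   = 100

# Variables x_i = number of jobs sent to site i.
# Minimise  sum_i est_time[i] * x_i
# subject to  sum_i x_i = n_jobs  and  0 <= x_i <= quota[i].
res = linprog(c=est_time,
              A_eq=[[1, 1, 1]], b_eq=[n_jobs],
              bounds=list(zip([0, 0, 0], quota)))

print(res.x)  # e.g. [40, 60, 0]: fill the fastest sites up to their quotas
```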

Slide 13: Policy Based Scheduling Simulations using LP

[Figure: simulation results, with panels for highly biased usage quotas, the initial workload, the resulting resource allocations, and a changed workload]

Slide 14: Effective Use of Monitoring

o Most useful real-time observables
  o Queue-depth on Compute Elements
    o Number of idle jobs
    o Number of running jobs
    o Number of available CPU slots
  o "df" on Storage Elements
  o RTT between data transfer points
o Research:
  o Estimate and manage information latency
  o Forecast grid weather (see the sketch below)
    o Use data-mining techniques
      o Aggregate/Fusion
      o Statistical
      o Mathematical Modelling
    o No better (or worse?) than forecasting hurricanes!

[Figure: number of DAGs versus completion time (seconds) for Sphinx using Grid3 without monitoring and with monitoring]
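Forecasting "grid weather" is listed as research above. As one very simple example, with assumed data and a generic smoother rather than anything MonALISA or Sphinx actually ships, the sketch below damps noisy queue-depth samples with an exponentially weighted moving average and ranks sites by the forecast.

```python
# Minimal sketch of "grid weather" forecasting from queue-depth samples.

def ewma(samples, alpha=0.3):
    """Exponentially weighted moving average of a series of observations."""
    forecast = samples[0]
    for x in samples[1:]:
        forecast = alpha * x + (1 - alpha) * forecast
    return forecast

# Illustrative queue-depth samples (idle jobs) per site over the last hour.
queue_depth = {
    "site-a": [12, 15, 30, 28, 25],
    "site-b": [40, 38, 35, 33, 30],
    "site-c": [5, 7, 50, 6, 4],      # one spike: smoothing damps it
}

ranked = sorted(queue_depth, key=lambda s: ewma(queue_depth[s]))
print(ranked)  # sites ordered by forecast queue depth, shortest first
```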

Slide 15: Sphinx Test Results from Grid3

o Experimental Set-Up: Two competing Sphinx Servers (both placement strategies are sketched after the figure note below)
  o One without monitoring
    o LP-based scheduling
    o Round-robin load-balancer
  o One with monitoring
    o LP-based scheduling
    o Time-dependent objective function
o Submitted the same 30 ten-step workflows to each Sphinx Server simultaneously
  o 300 total jobs per Sphinx Server
o Exploited 13 sites across Grid3
  o Each site had to pass a tight stability "cut"
  o Grid3 was "loaded" with
    o ATLAS DC2 Production
    o Other iVDGL work
o To first order, Sphinx minimised the time spent waiting in the remote CE queue
o Real-time, knowledge-based decisions are important!!

[Figure: number of jobs versus idle time (seconds) for Sphinx using Grid3 without monitoring and with monitoring (queue-depth only)]
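To spell out the difference between the two competing servers, here is a highly simplified sketch with illustrative site names: a round-robin placer that ignores grid state versus a chooser that consults monitored queue depth. The real monitored server used an LP with a time-dependent objective, not this one-line rule.

```python
# Minimal sketch of the two competing placement strategies in the test.

import itertools

SITES = ["site-a", "site-b", "site-c"]

def round_robin():
    """No monitoring: cycle through sites regardless of their load."""
    return itertools.cycle(SITES)

def least_loaded(queue_depth):
    """With monitoring: pick the site with the fewest queued jobs."""
    return min(SITES, key=lambda s: queue_depth[s])

rr = round_robin()
print(next(rr), next(rr))                                       # site-a site-b
print(least_loaded({"site-a": 40, "site-b": 3, "site-c": 17}))  # site-b
```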

Slide 16: Conclusions

o Scheduling on a grid has unique requirements
  o Information
  o System
o Decisions based on global views providing a Quality of Service are important
  o Particularly in a resource-limited environment
o Sphinx is an extensible, flexible grid middleware which
  o Already implements many required features for effective global scheduling
  o Provides an excellent "workbench" for future activities!
o For more information, please visit
  o http://www.griphyn.org/sphinx/