Grid Scheduler: Plan & Schedule


Page 1: Grid Scheduler: Plan & Schedule

Grid Scheduler: Plan & Schedule

Adam Arbree

Jang Uk In

Page 2: Grid Scheduler: Plan & Schedule

Current System

[Diagram: User Request, VDC, Chimera, Abstract Planner, Concrete Planner, RC, TC, DAGMan, Condor-G, Globus (gahp-server), Remote Site]

Page 3: Grid Scheduler: Plan & Schedule

Proposed System

[Diagram: User Request, VDC, Chimera Abstract Planner, Scheduling Client, Scheduling Server, Condor-G, Globus (gahp-server), Data Rep. Service, Grid Monitor Interface, Remote Site, RC, TC, JDB, PRDB, GDB]

Page 4: Grid Scheduler: Plan & Schedule

Scheduling Server

[Diagram: DAG Reducer, Message Interface, Prediction Engine, Tracking System, Planner, Data Replication Server, Scheduling Client, JDB, PRDB, RC & TC, Grid Mon.]

Page 5: Grid Scheduler: Plan & Schedule

Chimera Abstract Planner

• Input

– User virtual data request

• Output

– Abstract production plan

• Queries VDC for full dependency graph
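
As a rough illustration of what querying the VDC for the full dependency graph could involve, here is a minimal Python sketch. The dict-based catalog and the names `VDC` and `abstract_plan` are hypothetical stand-ins chosen for this example; they are not Chimera's actual interface.

```python
# Hypothetical in-memory stand-in for the VDC (an assumption for this sketch):
# each derived file maps to the transformation that produces it and the
# files that transformation reads.
VDC = {
    "hist.root": ("make_hist", ["reco.root"]),
    "reco.root": ("reconstruct", ["raw.dat", "calib.db"]),
}


def abstract_plan(requested_file, vdc):
    """Return the jobs needed for requested_file, producers before consumers."""
    plan, seen = [], set()

    def visit(lfn):
        if lfn in seen or lfn not in vdc:
            return  # already planned, or a primary input with no derivation
        seen.add(lfn)
        transformation, inputs = vdc[lfn]
        for parent in inputs:
            visit(parent)  # plan everything this file depends on first
        plan.append({"job": transformation, "inputs": inputs, "output": lfn})

    visit(requested_file)
    return plan
```

Calling `abstract_plan("hist.root", VDC)` walks the catalog depth-first, so each job appears after the jobs that produce its inputs. The plan is abstract in the sense that it names only logical files and transformations, not sites or physical paths.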

Page 6: Grid Scheduler: Plan & Schedule

Scheduling Client

• Input

– Parse abstract DAG

– Read run messages from server

• Output

– Send DAG to server

– Build and send jobs for Condor-G

• Maintain local image of DAG progress

• Refresh the scheduler data by request

• Choose scheduling server

Page 7: Grid Scheduler: Plan & Schedule

Scheduling Databases

• TC: Transformation Catalog
– (LFN, site) → (PFN, env)
• RC: Replica Catalog
– (LFN, site) → (PFN)
– (LFN, site, copy) → (PFN)
• PRDB: Prediction DB
– (job, params, site) → execution time, CPU use, disk use, bandwidth
• JDB: Job DB
– (job) → job state, site, VO, user, params, predicted use, current use
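
One way to picture these four stores is as keyed lookups. The sketch below models them with Python dicts and dataclasses. It assumes that each tuple on the slide is a key and the fields listed after it are the stored values. The field names, units, and sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Prediction:
    """PRDB value for a (job, params, site) key; units are assumptions."""
    execution_time_s: float
    cpu_use: float
    disk_use_mb: float
    bandwidth_mbps: float


@dataclass
class JobRecord:
    """JDB value for a (job) key."""
    state: str
    site: str
    vo: str
    user: str
    params: tuple
    predicted_use: Optional[Prediction] = None
    current_use: dict = field(default_factory=dict)


# Catalogs modeled as plain dicts keyed by tuples (illustrative data only).
tc = {("reconstruct", "siteA"): ("/opt/bin/reconstruct", {"PATH": "/opt/bin"})}
rc = {("raw.dat", "siteA"): "/data/raw.dat"}
prdb = {("reconstruct", ("run=1",), "siteA"): Prediction(120.0, 0.9, 500.0, 10.0)}
jdb = {"job-1": JobRecord("queued", "siteA", "vo1", "alice", ("run=1",))}
```

A real deployment would presumably back these with a database, but the key and value shapes are the part the slide specifies.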

Page 8: Grid Scheduler: Plan & Schedule

Grid Monitor

• Input
– Monitor data
• Output
– Data to Data Rep. Service
– Data to Server
– Data to grid cache
• Monitors
– Cost function
– VO limits table
– CPU load (by job)
– Disk usage (by job)
– Job list
– Bandwidth (by job)

Page 9: Grid Scheduler: Plan & Schedule

Message Interface

• Input
– A-DAG (from client)
– User status requests (from client)
– Job run requests (from planner)
– Job state request (from rep. server)
– Job state (from tracking)
• Output
– Job run requests (to client)
– Status updates (to client)
– Pruned DAG (to client)
– Job state (to rep. server)
– Job state request (to tracking)
• Manages client connections
• Provides incoming and outgoing message queues
• Checks connectivity of clients
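
The queueing and connectivity duties listed above might be sketched as follows. This is a minimal single-process illustration using Python's standard `queue` module. The `Message` fields, the message kinds, and the timeout-based connectivity check are assumptions, not details given on the slide.

```python
import queue
from dataclasses import dataclass


@dataclass
class Message:
    kind: str        # hypothetical kinds: "a_dag", "status_request", ...
    sender: str
    payload: object


class MessageInterface:
    """Sketch only: holds the two queues and tracks when senders were last heard."""

    def __init__(self):
        self.incoming = queue.Queue()
        self.outgoing = queue.Queue()
        self.last_seen = {}  # sender id -> time of its last message (seconds)

    def receive(self, msg, now):
        self.last_seen[msg.sender] = now
        self.incoming.put(msg)

    def send(self, msg):
        self.outgoing.put(msg)

    def stale_clients(self, now, timeout=60.0):
        """Senders not heard from within `timeout` seconds (assumed policy)."""
        return [c for c, seen in self.last_seen.items() if now - seen > timeout]
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the connectivity check easy to exercise in a simulator.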

Page 10: Grid Scheduler: Plan & Schedule

DAG Reducer

• Input
– Complete abstract DAG (from message int.)
– Replica data (from RC)
• Output
– DAG pruned for file existence (to message int.)
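
Pruning for file existence can be sketched as a backward walk from the requested files: a job is kept only if its output is needed and no replica of that output exists. The function below is a minimal illustration under that reading. The DAG layout and the name `reduce_dag` are hypothetical.

```python
def reduce_dag(jobs, targets, replica_catalog):
    """Drop jobs whose outputs already exist or are no longer needed.

    jobs: {job_name: {"inputs": [lfn, ...], "output": lfn}} (assumed layout)
    targets: files the user asked for
    replica_catalog: set of LFNs that already have a replica somewhere
    """
    producer = {spec["output"]: name for name, spec in jobs.items()}
    needed, stack = set(), list(targets)
    while stack:
        lfn = stack.pop()
        if lfn in replica_catalog or lfn not in producer:
            continue  # a replica exists, or the file is a primary input
        job = producer[lfn]
        if job not in needed:
            needed.add(job)
            stack.extend(jobs[job]["inputs"])  # its inputs must exist too
    return {name: spec for name, spec in jobs.items() if name in needed}
```

Walking backward from the targets also removes ancestors that were only needed to build an intermediate file that already exists, which a simple per-job existence check would miss.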

Page 11: Grid Scheduler: Plan & Schedule

Prediction Engine

• Input
– Job description (from planner)
– Updated history information (from tracking system)
– History data (from PRDB)
• Output
– Job prediction (to planner)
– History information (to tracking sys.)
– History data (to PRDB)
• Predict the time for a job on each site in the grid
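
The slides do not say how the prediction is computed. As a placeholder, the sketch below averages past runtimes per (job, site) pair. The averaging rule and all names are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


class PredictionEngine:
    """Toy predictor: mean of observed runtimes (an assumed method)."""

    def __init__(self):
        self.history = defaultdict(list)  # (job, site) -> observed runtimes in s

    def record(self, job, site, runtime_s):
        """Store a new observation, e.g. one reported by the tracking system."""
        self.history[(job, site)].append(runtime_s)

    def predict(self, job, sites):
        """Return {site: predicted runtime} for every site that has history."""
        return {
            site: mean(self.history[(job, site)])
            for site in sites
            if self.history.get((job, site))
        }
```

Sites with no history are simply left out of the result, so the caller has to decide how to treat an unknown site.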

Page 12: Grid Scheduler: Plan & Schedule

Tracking System

• Input
– Pruned DAG (from DAG reducer)
– Job status (from planner)
– Prediction information (from pred. engine)
– Status req. (from message interface)
– Job data (from JDB)
• Output
– Job status (to planner)
– New history information (to pred. engine)
– Status information (to message interface)
– Job data (to JDB)
• Periodically access grid monitor and update job status

Page 13: Grid Scheduler: Plan & Schedule

Planner

• Input
– Job status (from tracking system)
– Job predictions (from pred. engine)
– PFNs (from TC and RC)
– Grid status (from grid mon.)
• Output
– Job status (to tracking system)
– Job run requests (to message interface)
• Scheduling process
– Check grid status
– Determine next job to run and its execution site
– Transfer input files
– Send message to client to run job
– Update tracking
– Transfer files to storage
– Clean up
– Update RC
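
The first two steps of the scheduling process above (check grid status, then determine the next job and its execution site) could look roughly like this. The selection rule, lowest predicted runtime among available sites, is an assumption. The slides do not state the actual policy, and every name here is hypothetical.

```python
def choose_next(dag, done, running, predictions, site_up):
    """Pick a ready job and the available site with the lowest predicted time.

    dag: {job: [parent jobs]}
    done, running: sets of job names
    predictions: {(job, site): predicted seconds}
    site_up: {site: bool}, as might be reported by the grid monitor
    Returns (job, site), or None if nothing can be started right now.
    """
    for job, parents in dag.items():
        if job in done or job in running:
            continue
        if not all(p in done for p in parents):
            continue  # still waiting on at least one parent
        candidates = [
            (seconds, site)
            for (j, site), seconds in predictions.items()
            if j == job and site_up.get(site, False)
        ]
        if candidates:
            _, site = min(candidates)
            return job, site
    return None
```

The remaining steps (file transfer, the run message to the client, tracking and RC updates) would wrap around this call and are left out here.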

Page 14: Grid Scheduler: Plan & Schedule

Data Replication Service

• Input
– Grid status
– Job queue
• Output
– Entries to RC
• Monitor grid and determine hot spots
• Select sites to replicate data
• Transfer data to replication sites
• Clean up unneeded data
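
The slides leave "hot spots" undefined. One possible reading, sketched below, treats a file as hot when several queued jobs need it, and picks the least-loaded sites that lack a replica as replication targets. Both rules and all names are assumptions made for illustration.

```python
from collections import Counter


def hot_files(job_queue, threshold=2):
    """Files needed by at least `threshold` queued jobs (assumed definition)."""
    demand = Counter(lfn for job in job_queue for lfn in job["inputs"])
    return sorted(lfn for lfn, count in demand.items() if count >= threshold)


def replication_targets(lfn, replica_catalog, site_load, copies=1):
    """Least-loaded sites that do not yet hold a replica of lfn.

    replica_catalog: {(lfn, site): pfn}; site_load: {site: load in [0, 1]}
    """
    holders = {site for (name, site) in replica_catalog if name == lfn}
    free = sorted(
        (load, site) for site, load in site_load.items() if site not in holders
    )
    return [site for _, site in free[:copies]]
```

After a transfer completes, the service would add a new (LFN, site) entry to the RC, which matches the single output listed on the slide.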

Page 15: Grid Scheduler: Plan & Schedule

Grid Simulation

• Only two outside interfaces
– Condor-G
– Remote sites
• The Condor-G emulator takes real Condor-G submit files and sends fake jobs to the remote site emulators
• Each remote site emulator sleeps for a designated period for each job and sends simulated data to the grid monitor
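
A remote site emulator of the kind described above can be very small. The sketch below sleeps for a scaled-down version of each job's runtime and then reports a made-up usage record to a stub monitor. The record fields, the `time_scale` parameter, and the `GridMonitorStub` class are hypothetical.

```python
import time


class GridMonitorStub:
    """Stand-in for the grid monitor; just collects reports (assumption)."""

    def __init__(self):
        self.reports = []

    def report(self, record):
        self.reports.append(record)


def emulate_site(site, jobs, monitor, time_scale=0.001, sleep=time.sleep):
    """Pretend to run each job by sleeping, then report simulated usage.

    jobs: [{"name": str, "runtime_s": float}, ...] (assumed layout)
    time_scale: multiplies each runtime so simulated hours can pass in seconds
    sleep: injectable so tests can skip the actual waiting
    """
    for job in jobs:
        sleep(job["runtime_s"] * time_scale)
        monitor.report({
            "site": site,
            "job": job["name"],
            "state": "done",
            "cpu_s": job["runtime_s"],
        })
```

Because only the sleep and the report are emulated, the rest of the scheduler can run unchanged against many such fake sites.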

Page 16: Grid Scheduler: Plan & Schedule

Development Schedule

1. Research ~ present to Jan 20th
• Survey existing monitoring systems
• Decide what must be monitored
2. Initial framework ~ Jan 20th to end of Feb
• Build grid monitor interface
• Build grid simulator
• Design scheduler and data replication service
3. Build scheduler ~ March
4. Build data replication service ~ April
5. Grid testing ~ May