Jean-Sébastien Gay, LIP ENS Lyon, Université Claude Bernard Lyon 1, INRIA Rhône-Alpes

Post on 19-Feb-2016


Jean-Sébastien Gay
LIP ENS Lyon, Université Claude Bernard Lyon 1

INRIA Rhône-Alpes
GRAAL Research Team

Joint work with the DIET team

Distributed Interactive Engineering Toolbox

DIET Batch and Simbatch: a quick glance

RPC and Grid Computing: Grid RPC

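[Figure: Grid RPC call flow. Labels: Client, AGENT(s), servers S1 to S4. The client sends a Request for Op(C, A, B) to the agent(s); the agent(s) reply "S2 !"; the client sends A, B, C to S2 and receives the Answer (C).]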

Outline

1. Introduction

2. Diet-Batch

3. Simbatch

4. Conclusion and perspectives

DIET Architecture

[Figure: DIET architecture. Labels: Client, Master Agent (MA), Local Agent (LA), Server front end; several MAs linked through JXTA; FAST library (application modeling, system availabilities; LDAP, NWS); SeD_seq, SeD_parallel, SeD_batch; Frontal, NFS; LSF, PBS, Loadleveler; GLUE.]

Parallel and batch submissions - 1/2

• Parallel & sequential jobs → transparent for the user

• Submit a parallel job → system dependent
  - NFS: copy the code?
  - MPI: LAM, MPICH?
  - Batch system dependent
    - Numerous batch systems (homogenization?)
    - Batch schedulers behaviour (queues, scripts, etc.)
    - Information about the internal scheduling process
    - Monitoring & performance prediction

[Figure labels: SGE, OAR, LA]

Parallel and batch submissions - 2/2

• 2 APIs
  - Client side
    - Request sequential or parallel (//) resolution, or let DIET choose the best
  - Server side
    - Script with generic mnemonics: DIET_NAME_FRONTALE, DIET_NB_NODES, DIET_BATCH_NODESFILE
    - A program that must end with a call to diet_submit_call()

• Experiments
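The generic mnemonics in the server-side script can be read as placeholders that get resolved just before the script is handed to the batch system. The sketch below illustrates that idea only. The helper expand_mnemonic, the script line and the substituted values are hypothetical; only the mnemonic names come from the slide, and this is not DIET's actual implementation.

```c
#include <string.h>

/* Hypothetical helper (not part of DIET): copy `src` into `dst`, replacing
 * every occurrence of the mnemonic `name` with `value`.
 * Returns 0 on success, -1 if the result does not fit in `dst_size` bytes. */
int expand_mnemonic(const char *src, const char *name, const char *value,
                    char *dst, size_t dst_size)
{
    size_t name_len = strlen(name);
    size_t value_len = strlen(value);
    size_t out = 0;

    if (name_len == 0 || dst_size == 0)
        return -1;

    while (*src != '\0') {
        if (strncmp(src, name, name_len) == 0) {
            /* Mnemonic found: write the concrete value instead. */
            if (out + value_len >= dst_size)
                return -1;
            memcpy(dst + out, value, value_len);
            out += value_len;
            src += name_len;
        } else {
            /* Ordinary character: copy it through. */
            if (out + 1 >= dst_size)
                return -1;
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return 0;
}
```

Calling the helper once per mnemonic, each time on the previous result, turns a generic line such as "mpirun -np DIET_NB_NODES -machinefile DIET_BATCH_NODESFILE ./app" into a concrete one for the chosen batch system.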

Performance prediction with batch system

• During the submission stage
  - Need to know when the task will begin/end
  - Need to decide how many processors will be used
  - Need performance prediction!

• Three means
  - Use a probabilistic tool
  - Ask the batch system (only available for MAUI and OAR 2.0)
  - Use a simulator

Batch scheduler overview

• Portable Batch System (PBS)
  - First Come First Served (FCFS)

• OAR (v. 1.6)
  - Conservative BackFilling (CBF)

• Torque + Maui
  - Torque alone: FCFS
  - Maui: 3 scheduling policies: BESTFIT, FIRSTFIT (CBF), GREEDY

• Sun Grid Engine (SGE)
  - FCFS

• Loadleveler
  - 3 scheduling policies: FCFS, CBF, GANG
  - Possibility to plug in external schedulers
    - EASY
    - Maui (should soon become the standard scheduler)

Grid simulator overview

• Data replication
  - ChicSim: I. Foster; PARallel Simulation Environment for Complex Systems
  - OptorSim: W. H. Bell, D. G. Cameron, R. Carvajal-Schiaffino; Java

• Grid economy
  - GridSim: R. Buyya (Nimrod/G); Java; quite similar to Simgrid

• Non-specialized toolkit
  - Simgrid: H. Casanova, A. Legrand and M. Quinson; C

… and their drawbacks

• Minimal support for batch schedulers

• Sometimes lack the functionality needed to create them

• Often difficult to reuse
  - Example: OptorSim

• No parallel tasks available
  - Backfilling impossible
  - Lack of realism

Simbatch in a nutshell

• Goals
  - Cluster simulation for enhancing realism
  - Prediction tool for DIET

• API for clients
  - Description of the platform in XML files
  - Use of the API in the deployment.xml file
    - Example 1: Creating a batch process on the host "Frontale"
      <process host="Frontale" function="SB_batch" />
    - Example 2: Creating a resource
      <process host="Node1" function="SB_node" />
  - Each batch must be described in simbatch.xml
  - A specific load can be simulated for each batch

• API for developers
  - Algorithms are plug-ins
  - Reusable functions
    - Find the first matching slot in a Gantt chart
      slot_t * find_first_slot(cluster_t c, int nb_nodes, double start_time, double duration);
    - Empty queues and reschedule
      void generic_reschedule(cluster_t cluster, void (*schedule)(cluster_t cluster, m_task_t task));
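The two developer-side ideas above (searching a Gantt chart for the first matching slot, and rescheduling through a plug-in function) can be sketched on a toy model. Everything below is an assumption made for illustration: gantt_t, job_t, first_slot_start, schedule_first_fit and reschedule_all are hypothetical stand-ins, not Simbatch's slot_t, cluster_t, find_first_slot or generic_reschedule, and the chart only counts busy nodes instead of tracking which nodes are used.

```c
#include <stddef.h>

#define MAX_RESERVATIONS 64

/* Toy Gantt chart: each reservation holds nb_nodes during [start, end). */
typedef struct { double start; double end; int nb_nodes; } reservation_t;

typedef struct {
    int total_nodes;
    size_t count;
    reservation_t res[MAX_RESERVATIONS];
} gantt_t;

typedef struct {
    int nb_nodes;
    double submit_time;
    double duration;
    double start_time;              /* set by the scheduler, -1.0 if unplaced */
} job_t;

/* A scheduling algorithm is a plug-in: any function with this signature. */
typedef void (*schedule_fn)(gantt_t *gantt, job_t *job);

/* Number of nodes reserved at instant t. */
static int busy_at(const gantt_t *g, double t)
{
    int busy = 0;
    for (size_t i = 0; i < g->count; i++)
        if (g->res[i].start <= t && t < g->res[i].end)
            busy += g->res[i].nb_nodes;
    return busy;
}

/* Usage only rises at reservation starts, so checking t and every start
 * that falls inside the window [t, t + duration) is enough. */
static int fits(const gantt_t *g, int nb_nodes, double t, double duration)
{
    if (busy_at(g, t) + nb_nodes > g->total_nodes)
        return 0;
    for (size_t i = 0; i < g->count; i++) {
        double s = g->res[i].start;
        if (s > t && s < t + duration &&
            busy_at(g, s) + nb_nodes > g->total_nodes)
            return 0;
    }
    return 1;
}

/* Earliest start >= start_time where nb_nodes are free for the whole
 * duration, or -1.0 if the request exceeds the cluster size. Nodes are
 * only freed at reservation ends, so those are the only other candidates. */
double first_slot_start(const gantt_t *g, int nb_nodes,
                        double start_time, double duration)
{
    double best = -1.0;
    if (nb_nodes > g->total_nodes)
        return -1.0;
    if (fits(g, nb_nodes, start_time, duration))
        return start_time;
    for (size_t i = 0; i < g->count; i++) {
        double t = g->res[i].end;
        if (t > start_time && (best < 0.0 || t < best) &&
            fits(g, nb_nodes, t, duration))
            best = t;
    }
    return best;
}

/* Example plug-in: reserve the first slot that fits, without moving
 * earlier reservations (in the spirit of conservative backfilling). */
void schedule_first_fit(gantt_t *g, job_t *job)
{
    double t = first_slot_start(g, job->nb_nodes, job->submit_time,
                                job->duration);
    if (t < 0.0 || g->count >= MAX_RESERVATIONS) {
        job->start_time = -1.0;
        return;
    }
    reservation_t r = { t, t + job->duration, job->nb_nodes };
    g->res[g->count++] = r;
    job->start_time = t;
}

/* Empty the chart, then place every job again with the given plug-in. */
void reschedule_all(gantt_t *g, job_t *jobs, size_t n, schedule_fn schedule)
{
    g->count = 0;
    for (size_t i = 0; i < n; i++)
        schedule(g, &jobs[i]);
}
```

On a 5-node chart with three jobs submitted at time 0 (3 nodes for 100 s, 4 nodes for 100 s, 2 nodes for 50 s), the third job is backfilled next to the first one instead of waiting behind the second. Passing a different schedule_fn to reschedule_all changes the policy without touching the chart code.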

Experiment description

• 2 types of experiments
  - Validation by simulation: parameter variation (topology, scheduling algorithm…)
  - Comparison between simulated platform

• Task generation
  - Inter-arrival time: Poisson law, µ = 300 s
  - Number of resources: U(1,5)
  - Run time: U(600,1800)
  - Wall time: run time × U(1.1, 1.3)

• Experiment platform
  - 5-node cluster
  - Star topology
  - OAR v. 1.6

Validation

Simulation precision

• Number of tasks: 100

• Makespan: 23 h

• Error rate on the flow metrics around 1%
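One way to read the error rate on the flow metric is as a relative error between the simulated and the measured mean flow. The sketch below assumes that the flow of a task is its completion time minus its submission time, which is the usual scheduling definition; the slides do not spell it out. The type task_record_t and both functions are hypothetical names, not taken from Simbatch.

```c
#include <stddef.h>

/* Hypothetical record of one task, times in seconds. */
typedef struct {
    double submit_time;
    double completion_time;
} task_record_t;

/* Mean flow over n tasks, with flow = completion time - submission time
 * (assumed definition). Returns 0.0 for an empty trace. */
double mean_flow(const task_record_t *tasks, size_t n)
{
    double sum = 0.0;
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        sum += tasks[i].completion_time - tasks[i].submit_time;
    return sum / (double)n;
}

/* Relative error of the simulated value against the measured one.
 * The caller must make sure `measured` is not zero. */
double relative_error(double simulated, double measured)
{
    double diff = simulated - measured;
    if (diff < 0.0)
        diff = -diff;
    return diff / measured;
}
```

With a measured mean flow of 150 s and a simulated mean flow of 151.5 s, relative_error returns 0.01, that is an error rate of 1%.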

Conclusion and perspectives

• DIET-Batch
  - DIET is now able to handle batch schedulers
  - 3 SeD types: sequential, batch, parallel
  - Good performance improvements

• Simbatch
  - Standalone simulations show good results
  - Configuration file available to simulate Lyon's site
  - Excellent tool to replay load

• Next steps
  - Integrate Simbatch in DIET-Batch

Questions?

http://graal.ens-lyon.fr/DIET/

http://graal.ens-lyon.fr/simbatch/