Slurm++: a Distributed Workload Manager for Extreme-Scale High-Performance Computing Systems

Transcript of lecture slides (CS554-S15, lecture 06), 36 pages.

Page 1

Page 2

•  Introduction & Motivation
•  Problem Statement
•  Proposed Work
•  Evaluation
•  Conclusions
•  Future Work

Page 3

•  Introduction & Motivation
•  Problem Statement
•  Proposed Work
•  Evaluation
•  Conclusions
•  Future Work

Page 4

Page 5


•  Today (2014): peta-scale computing
   −  34 PFlops
   −  3 million cores

•  Near future (~2020): exascale computing
   −  1000 PFlops
   −  1 billion cores

Sources:
http://s.top500.org/static/lists/2014/06/TOP500_201406_Poster.pdf
http://users.ece.gatech.edu/mrichard/ExascaleComputingStudyReports/ECSS%20report%20101909.pdf

Page 6

•  Manages resources
   −  compute
   −  storage
   −  network

•  Job scheduling
   −  resource allocation
   −  job launch

•  Data management
   −  data movement
   −  caching

Page 7

•  PetaScale → Exascale (10^18 Flops, 1B cores)

•  Workload diversity
   −  Traditional large-scale High Performance Computing (HPC) jobs
   −  HPC ensemble runs
   −  Many-Task Computing (MTC) applications

•  State-of-the-art resource managers
   −  Centralized job-scheduling paradigm
   −  HPC resource managers: SLURM, PBS, Moab, Torque, Cobalt
   −  HTC resource managers: Condor, Falkon, Mesos, YARN

•  Next-generation resource managers
   −  Distributed paradigm
   −  Scalable, efficient, reliable


Decade: representative uses and the computers involved

•  1970s: Weather forecasting, aerodynamic research (Cray-1)
•  1980s: Probabilistic analysis, radiation shielding modeling (CDC Cyber)
•  1990s: Brute-force code breaking (EFF DES cracker)
•  2000s: 3D nuclear test simulations as a substitute for live testing, in line with the Nuclear Non-Proliferation Treaty (ASCI Q)
•  2010s: Molecular dynamics simulation (Tianhe-1A)

•  Medical image processing: functional Magnetic Resonance Imaging (fMRI)
•  Chemistry domain: MolDyn
•  Molecular dynamics: DOCK
•  Production runs in drug design
•  Economic modeling: MARS
•  Large-scale astronomy application evaluation
•  Astronomy domain: Montage
•  Data analytics: Sort and WordCount

http://elib.dlr.de/64768/1/EnSIM_-_Ensemble_Simulation_on_HPC_Computes_EN.pdf

Page 8

•  Introduction & Motivation
•  Problem Statement
•  Proposed Work
•  Evaluation
•  Conclusions
•  Future Work

Page 9

•  Scalability
   −  System scale is increasing
   −  Workload size is increasing
   −  Processing capacity needs to increase accordingly

•  Efficiency
   −  Allocating resources quickly
   −  Making scheduling decisions fast enough
   −  Maintaining high system utilization

•  Reliability
   −  Continuing to function well under failures

Page 10

•  Introduction & Motivation
•  Problem Statement
•  Proposed Work
•  Evaluation
•  Conclusions
•  Future Work

Page 11


[Architecture diagrams comparing four scheduling designs:
 - MTC centralized scheduling (Falkon): a single scheduler dispatching to executors on the compute nodes, fed by multiple clients.
 - MTC distributed scheduling (MATRIX): fully connected schedulers, each co-located with a KVS server and an executor on a compute node.
 - HPC centralized scheduling (Slurm): one controller managing all compute daemons (cd), fed by multiple clients.
 - HPC distributed scheduling (Slurm++): fully connected controllers, each co-located with a KVS server and managing its own partition of compute daemons (cd).]

Page 12

Slurm workload manager → Slurm++, an extension of Slurm


•  Distributed
•  Partition-based
•  Resource sharing
•  Distributed metadata management
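To make the partition-based design concrete, here is a minimal, hypothetical sketch of the state one Slurm++ controller might keep for its partition; the struct and field names are illustrative assumptions, not the actual Slurm++ data structures.

/* Hypothetical per-partition controller state (illustrative only; not the
 * actual Slurm++ internals). Each controller manages one partition of compute
 * daemons and publishes its free node list to the co-located ZHT (KVS) server. */
typedef struct partition_controller {
    char    ctl_id[64];        /* controller id, also used as the ZHT key */
    int     num_nodes;         /* total compute nodes in this partition */
    char  **free_node_list;    /* names of currently free nodes */
    int     num_free_nodes;    /* length of free_node_list */
    void   *zht_client;        /* handle to the distributed key-value store */
} partition_controller;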

[Diagram: the Slurm++ architecture (HPC distributed scheduling). Fully connected controllers, each co-located with a KVS server, manage their own partitions of compute daemons (cd); multiple clients submit jobs to the controllers.]

Page 13

Page 14

Key → Value: Description

•  controller id → free node list: the free nodes in a partition
•  job id → original controller id: the original controller that is responsible for a submitted job
•  job id + original controller id → involved controller list: the controllers that participate in launching a job
•  job id + original controller id + involved controller id → participated node list: the nodes in each partition that are involved in launching a job
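As a rough illustration of how a controller might read and write these records, here is a minimal C sketch. The zht_insert/zht_lookup wrappers mirror the calls used in the algorithms below, but their exact signatures and the string encoding of keys and values are assumptions for illustration, not the real ZHT API.

#include <stdio.h>

/* Assumed ZHT client wrappers (signatures are illustrative). */
extern int zht_insert(const char *key, const char *value);
extern int zht_lookup(const char *key, char *value, size_t value_len);

/* Publish the free node list of this controller's partition (key: controller id). */
static void publish_free_nodes(const char *ctl_id, const char *free_node_list) {
    zht_insert(ctl_id, free_node_list);
}

/* Record which controllers participate in launching a job
 * (key: job id + original controller id, value: involved controller list). */
static void publish_involved_controllers(const char *job_id, const char *orig_ctl_id,
                                         const char *involved_ctl_list) {
    char key[256];
    snprintf(key, sizeof(key), "%s%s", job_id, orig_ctl_id);
    zht_insert(key, involved_ctl_list);
}

/* Look up the free node list of a (possibly remote) partition. */
static int lookup_free_nodes(const char *ctl_id, char *free_node_list, size_t len) {
    return zht_lookup(ctl_id, free_node_list, len);
}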

Page 15

•  Balancing resources among all the partitions
•  Trivial for a centralized architecture
•  Challenging for a distributed architecture
   –  No global state information
   –  Needs to be achieved dynamically
   –  Resource conflicts may happen
   –  Approach: distributed resource stealing

Page 16

•  Different controllers may modify the same resource concurrently

•  Resolved through an atomic compare-and-swap operation:

ALGORITHM 1: COMPARE AND SWAP
Input: key (key), the value seen before (seen_value), the new value intended to be inserted (new_value), and the storage hash map (map) of a ZHT server.
Output: a Boolean indicating success (TRUE) or failure (FALSE), and the current actual value (current_value).

current_value = map.get(key);
if (current_value == seen_value) then
    map.put(key, new_value);
    return TRUE;
else
    return FALSE, current_value;
end
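For illustration, the following self-contained C sketch reimplements this compare-and-swap logic over a toy in-memory table instead of a real ZHT server, and shows two controllers racing to claim a node; all names here are illustrative, not the actual ZHT implementation.

#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16
#define MAX_LEN 128

/* Toy in-memory key-value table standing in for a ZHT server's storage. */
static char keys[MAX_ENTRIES][MAX_LEN];
static char values[MAX_ENTRIES][MAX_LEN];
static int  num_entries;

static char *map_get(const char *key) {
    for (int i = 0; i < num_entries; i++)
        if (strcmp(keys[i], key) == 0) return values[i];
    return NULL;
}

static void map_put(const char *key, const char *value) {
    char *cur = map_get(key);
    if (cur) { snprintf(cur, MAX_LEN, "%s", value); return; }
    snprintf(keys[num_entries], MAX_LEN, "%s", key);
    snprintf(values[num_entries], MAX_LEN, "%s", value);
    num_entries++;
}

/* Algorithm 1: returns 1 (TRUE) and applies the swap if the stored value still
 * equals seen_value; otherwise returns 0 (FALSE) and reports the current value. */
static int compare_and_swap(const char *key, const char *seen_value,
                            const char *new_value, char *current_value) {
    const char *cur = map_get(key);
    if (cur != NULL && strcmp(cur, seen_value) == 0) {
        map_put(key, new_value);
        return 1;
    }
    snprintf(current_value, MAX_LEN, "%s", cur ? cur : "");
    return 0;
}

int main(void) {
    char current[MAX_LEN];
    map_put("ctl-1", "node1,node2,node3");      /* free node list of partition ctl-1 */
    /* Controller A claims node1 with an up-to-date view: the swap succeeds. */
    printf("A: %d\n", compare_and_swap("ctl-1", "node1,node2,node3", "node2,node3", current));
    /* Controller B attempts the same swap with a stale view: it fails and learns the new value. */
    printf("B: %d (current = %s)\n",
           compare_and_swap("ctl-1", "node1,node2,node3", "node2,node3", current), current);
    return 0;
}

The same pattern lets a controller retry a resource update with the freshly returned value whenever another controller wins the race, which is exactly how the stealing algorithm below recovers from conflicts.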

Page 17

ALGORITHM 2: RANDOM RESOURCE STEALING
Input: number of nodes required (num_node_req), number of nodes allocated (*num_node_alloc), number of controllers/partitions (num_ctl), controller membership list (ctl_id[num_ctl]), sleep length (sleep_length), number of retries (num_retry).
Output: list of involved controller ids (list_ctl_id_inv), participated nodes of each involved controller (part_node[]).

num_try = 0; num_ctl_inv = 0; default_sleep_length = sleep_length;
while *num_node_alloc < num_node_req do
    sel_ctl_id = ctl_id[Random(num_ctl)];
    sel_ctl_node = zht_lookup(sel_ctl_id);
    if (sel_ctl_node != NULL) then
        num_more_node = num_node_req - *num_node_alloc;
again:  num_free_node = sel_ctl_node.num_free_node;
        if (num_free_node > 0) then
            num_try = 0; sleep_length = default_sleep_length;
            num_node_try = num_free_node > num_more_node ? num_more_node : num_free_node;
            list_node = allocate(sel_ctl_node, num_node_try);
            if (list_node != NULL) then
                *num_node_alloc += num_node_try;
                part_node[num_ctl_inv++] = list_node;
                list_ctl_id_inv.add(sel_ctl_id);
            else
                goto again;
            end
        else if (++num_try > num_retry) then
            release(list_ctl_id_inv, part_node);
            *num_node_alloc = 0; sleep_length = default_sleep_length;
        else
            sleep(sleep_length); sleep_length *= 2;
        end
    end
end
return list_ctl_id_inv, part_node;

Page 18

•  Poor performance (resource deadlock)
   –  for big jobs (e.g., full-scale, half-scale)
   –  under high system utilization


[Plot: number of stealing operations (0–9000) vs. scale (no. of nodes), for (job size, utilization) pairs (j,u) = (1,0), (0.75,0), (0.5,0), (0.25,0), (0.75,0.25), (0.5,0.25), (0.25,0.25), (0.5,0.5), (0.25,0.5).]

•  The number of stealing operations grows linearly with the system scale
•  It grows exponentially with both the job size and the system utilization
•  At 1M-node scale, a full-scale job needs about 8000 stealing operations

Page 19

•  Monitoring-based, weakly consistent scheduling
•  A centralized monitoring service
   –  periodically collects the resources of all partitions (global view)
•  Controllers: two-phase tuning
   –  pull all the resources periodically (macro-tuning)
   –  store the resources in a binary search tree (BST) (local view)
   –  find the most suitable partition according to the BST
   –  look up the resources of that partition
   –  update the BST (micro-tuning, not necessarily consistent)

Page 20


ALGORITHM 3: MONITORING SERVICE LOGIC
Input: number of controllers or partitions (num_ctl), controller membership list (ctl_id[num_ctl]), update frequency (sleep_length).
Output: void.

global_key = "global resource specification"; global_value = "";
while TRUE do
    global_value = "";
    for each i in 0 to num_ctl - 1 do
        cur_ctl_res = zht_lookup(ctl_id[i]);
        if (cur_ctl_res == NULL) then
            exit(1);
        else
            global_value += cur_ctl_res;
        end
    end
    zht_insert(global_key, global_value);
    sleep(sleep_length);
end

Page 21

•  Property
   –  left child <= root <= right child

•  Operations
   –  BST_insert(BST*, char*, int)
   –  BST_delete(BST*, char*, int)
   –  BST_delete_all(BST*)
   –  BST_search_best(BST*, int)
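A minimal C sketch of this tree follows, assuming "best" means the smallest free-node count that still satisfies the request (a best-fit policy); the struct layout and that policy choice are illustrative assumptions, not necessarily the exact Slurm++ implementation.

#include <stdio.h>
#include <stdlib.h>

/* Local resource view: one node per partition, keyed by its free-node count. */
typedef struct BST {
    char ctl_id[64];            /* controller / partition id */
    int  num_free_node;         /* number of free nodes in that partition */
    struct BST *left, *right;
} BST;

/* Insert a partition, keeping the property left child <= root <= right child. */
BST *BST_insert(BST *root, const char *ctl_id, int num_free_node) {
    if (root == NULL) {
        BST *n = calloc(1, sizeof(BST));
        snprintf(n->ctl_id, sizeof(n->ctl_id), "%s", ctl_id);
        n->num_free_node = num_free_node;
        return n;
    }
    if (num_free_node <= root->num_free_node)
        root->left = BST_insert(root->left, ctl_id, num_free_node);
    else
        root->right = BST_insert(root->right, ctl_id, num_free_node);
    return root;
}

/* Best fit: the partition with the fewest free nodes that still covers the demand. */
BST *BST_search_best(BST *root, int demand) {
    BST *best = NULL;
    while (root != NULL) {
        if (root->num_free_node >= demand) { best = root; root = root->left; }
        else root = root->right;
    }
    return best;
}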

Page 22


PHASE 1: PULL-BASED MACRO-TUNING
Input: number of controllers or partitions (num_ctl), controller membership list (ctl_id[num_ctl]), update frequency (sleep_length), binary search tree of the number of free nodes of each partition (*bst).
Output: void.

global_key = "global resource specification"; global_value = "";
while TRUE do
    global_value = zht_lookup(global_key);
    global_res[num_ctl] = split(global_value);
    pthread_mutex_lock(&bst_lock);
    BST_delete_all(bst);
    for each i in 0 to num_ctl - 1 do
        BST_insert(bst, ctl_id[i], global_res[i].num_free_node);
    end
    pthread_mutex_unlock(&bst_lock);
    sleep(sleep_length);
end

Page 23


PHASE 2: MICRO-TUNING RESOURCE ALLOCATION
Input: number of nodes required (num_node_req), number of nodes allocated (*num_node_alloc), self id (self_id), sleep length (sleep_length), number of retries (num_retry), binary search tree of the number of free nodes of each partition (*bst).
Output: list of nodes allocated to the job.

*num_node_alloc = 0; num_try = 0; alloc_node_list = NULL; target_ctl = self_id;
while *num_node_alloc < num_node_req do
    resource = zht_lookup(target_ctl);
    nodelist = allocate_resource(target_ctl, resource, num_node_req - *num_node_alloc);
    if (nodelist != NULL) then
        num_try = 0;
        alloc_node_list += nodelist; *num_node_alloc += nodelist.size();
        new_resource = resource - nodelist;
        pthread_mutex_lock(&bst_lock);
        old_resource = BST_search_exact(bst, target_ctl);
        if (old_resource != NULL) then
            BST_delete(bst, target_ctl, old_resource.size());
        end
        BST_insert(bst, target_ctl, new_resource.size());
        pthread_mutex_unlock(&bst_lock);
    else if (num_try++ > num_retry) then
        release_nodes(alloc_node_list); alloc_node_list = NULL; *num_node_alloc = 0; num_try = 0;
        sleep(sleep_length);
    end
    if (*num_node_alloc < num_node_req) then
        pthread_mutex_lock(&bst_lock);
        if ((data_item = BST_search_best(bst, num_node_req - *num_node_alloc)) != NULL) then
            target_ctl = data_item.ctl_id;
            BST_delete(bst, target_ctl, data_item.size());
        else
            target_ctl = self_id;
        end
        pthread_mutex_unlock(&bst_lock);
    end
end
return alloc_node_list;

Page 24

•  Code available (developed in C)
   –  https://github.com/kwangiit/SLURMPP_V2

•  Code complexity
   –  5K lines of new code
   –  plus 500K lines of SLURM code
   –  8K lines of ZHT code
   –  1K lines of code auto-generated by Google Protocol Buffers

•  Dependencies
   –  Google Protocol Buffers
   –  ZHT key-value store
   –  Slurm resource manager

Page 25

•  Introduction & Motivation
•  Problem Statement
•  Proposed Work
•  Evaluation
•  Conclusions
•  Future Work

Page 26

•  Test-bed
   –  The PRObE 500-node Kodiak cluster at LANL
   –  Each node has two AMD Opteron 252 processors (2.6 GHz) and 8 GB of memory
   –  The network supports both Ethernet and InfiniBand
   –  Our experiments use 10 Gbit/s Ethernet (the default configuration of Kodiak)
   –  Experiments run at up to 500 nodes

•  Workloads
   –  Micro-benchmark of NOOP sleep jobs
   –  Application traces from IBM Blue Gene/P supercomputers
   –  Scientific numeric MPI applications from the PETSc tool package

Page 27


[Plots: throughput (jobs/sec) vs. scale (no. of nodes, 50–500) for Slurm++, Slurm++ v0, and Slurm, shown separately for one-node jobs, half-partition jobs, and full-partition jobs, plus a speedup summary (Slurm++ over Slurm) for the three job types.]

Page 28


[Plot "Workload Specification": cumulative percentage of the number of jobs vs. job size as a percentage of the system scale. Sample points: jobs up to 1% of scale account for 19.52% of all jobs, 2% for 70.44%, 3% for 85.20%, 5% for 90.91%, 10% for 95.50%, 20% for 98.77%, 40% for 99.34%, 63% for 99.35%, 80% for 99.78%, and 100% for 100%.]

[Plot "Slurm++ vs. Slurm": throughput (jobs/sec, 0–40) and average ZHT message count per job (0–64) vs. scale (50–500 nodes), comparing Slurm++ throughput, Slurm throughput, and the average ZHT message count per job.]

Page 29


[Plot: average scheduling latency per job (ms, 0–1000) vs. scale (50–500 nodes) for Slurm++ and Slurm.]

Page 30


[Plot: throughput (jobs/sec, log scale, 10–10000) vs. scale (no. of nodes, up to 500) for partition sizes of 1, 4, 16, and 64.]

Page 31


[Plot "Validation": Slurm++ throughput vs. SimSlurm++ (simulator) throughput (jobs/sec) at scales of 50–500 nodes, with per-point normalized differences of 4.66%, 11.48%, 8.35%, 8.10%, 2.95%, 1.56%, 2.15%, 3.28%, 4.75%, and 1.00%.]

[Plot "Scalability" (simulation): average scheduling latency per job (ms) and average ZHT message count per job vs. scale, at 1024, 4096, 16384, and 65536 nodes; the two series take the values 7.5, 17.5, 42.3, 130.4 and 4.00, 6.56, 28.98, 85.89, respectively.]

Page 32

•  Introduction & Motivation
•  Problem Statement
•  Proposed Work
•  Evaluation
•  Conclusions
•  Future Work

Page 33

•  Workloads are becoming finer-grained
•  A key-value store is a viable building block
•  Distributed resource managers are in demand
•  A fully distributed architecture is scalable
•  Slurm++ is 10X faster than Slurm at making scheduling decisions
•  Simulation results show the technique is scalable

Page 34

•  Introduction & Motivation
•  Problem Statement
•  Proposed Work
•  Evaluation
•  Conclusions
•  Future Work

Page 35

•  Contribute to the Intel ORCM cluster manager
   –  built on top of Open MPI
   –  add the key-value store as a new plugin

•  Distributed data-aware scheduling for HPC applications

•  Elastic resource allocation

Page 36

•  More information: – http://datasys.cs.iit.edu/~kewang/

•  Contact: – [email protected]

•  Questions?
