Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an...
Transcript of Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an...
![Page 1: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/1.jpg)
Themis: Fair and Efficient GPU Cluster Scheduling
Kshiteej Mahajan1, Arjun Balasubramanian1, Arjun Singhvi1, Shivaram Venkataraman1, Aditya Akella1, Amar Phanishayee2, Shuchi Chawla1
1 2
1
![Page 2: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/2.jpg)
Deep Learning at a Large Enterprise
Speech, Image, Ads, NLP, Web Search…
2
![Page 3: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/3.jpg)
Deep Learning at a Large Enterprise
Speech, Image, Ads, NLP, Web Search…
Innovate and Train newer DL models on GPUs
3
![Page 4: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/4.jpg)
Deep Learning at a Large Enterprise
• Hyperparameter Optimization is typical –train same DL model (M)with different hyperparameters (H)
• DL App H1 (lr = 1e-3) H2 (lr = 1e-4)
H3 (lr = 5e-4) H4 (lr = 2e-3)
Job 1 Job 2
Job 3 Job 4
DL App, Model M 4
![Page 5: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/5.jpg)
DL Apps at a Large Enterprise
• Hyperparameter Optimization is typical –train same DL model (M)with different hyperparameters (H)
• DL App H1 (lr = 1e-3) H2 (lr = 1e-4)
H3 (lr = 5e-4) H4 (lr = 2e-3)
Job 1 Job 2
Job 3 Job 4
DL App, Model M 5
![Page 6: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/6.jpg)
Deep Learning at a Large Enterprise
Independent GPU Instances
6
![Page 7: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/7.jpg)
GPU Cluster Scheduler: Goal
GPU Cluster Scheduler
Multiplex access to shared GPU cluster for contending DL apps
Shared GPU cluster
7
![Page 8: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/8.jpg)
GPU Cluster Scheduler: Goal
GPU Cluster Scheduler
Multiplex access to shared GPU cluster for contending DL apps
Shared GPU cluster
Desired Goal –Incentivize sharing of GPU resources
8
![Page 9: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/9.jpg)
GPU Cluster Scheduler: Goal
GPU Cluster Scheduler
Multiplex access to shared GPU cluster for contending DL apps
Shared GPU cluster
Primary Goal – Sharing Incentive (SI)
If N DL apps are sharing a cluster then no DL app should run slower than on a private cluster with 1/N resources.
9
![Page 10: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/10.jpg)
Overview• Existing GPU Cluster Schedulers
• Do not give Sharing Incentive • DL App Properties• Drawbacks• Requirements
• Themis • Design• Implementation• Evaluation
10
![Page 11: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/11.jpg)
Existing GPU Cluster Schedulers
GPU Cluster Scheduler
Shared GPU cluster
• DRF• Tiresias (LAS)• SLAQ• Optimus• Gandiva
11
![Page 12: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/12.jpg)
GPU Cluster Scheduler: Drawback 1
Metric = App Resource
Share
Mechanism =Allocate on Task Completion to
Min Metric Shared GPU cluster
• DRF
Interface = Task Resource Demand
12
![Page 13: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/13.jpg)
GPU Cluster Scheduler: Drawback 1
Metric = Resource Share
Mechanism =on task completion, schedule task from
app with min Resource Share
Shared GPU cluster• DRF
Interface = Task Resource Demand
• Assume Short Tasks for Sharing Incentive
• Short Tasks allow for frequent multiplexing
13
![Page 14: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/14.jpg)
Interface = Task Resource Demand
GPU Cluster Scheduler: Drawback 1
Metric = Resource Share
Mechanism =on task completion, schedule task from
app with min Resource Share
Shared GPU cluster
• Assume Short Tasks for Sharing Incentive
• Short Tasks allow for frequent multiplexing
• DRF
• ML median task duration –3.75 hours
• Lot of apps with 5X shorter and 5X longer tasks
14
![Page 15: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/15.jpg)
Interface = Task Resource Demand
GPU Cluster Scheduler: Drawback 1
Metric = Resource Share
Mechanism =on task completion, schedule task from
app with min Resource Share
Shared GPU cluster
• Assume Short Tasks for Sharing Incentive
• Short Tasks allow for frequent multiplexing
• DRF
• ML median task duration –3.75 hours
• Lot of apps with 5X shorter and 5X longer tasks
• Long waiting time for Short Apps
• No SI for Short Apps
SHORT RESOURCE INTENSIVE LONGTASK
long waiting time
15
![Page 16: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/16.jpg)
Interface = Task Resource Demand
GPU Cluster Scheduler: Requirement 1
Metric = Resource Share
Mechanism =on task completion, schedule task from
app with min Resource Share
Shared GPU cluster
• Assume Short Tasks for Sharing Incentive
• Short Tasks allow for frequent multiplexing
• DRF
• ML median task duration –3.75 hours
• Lot of apps with 5X shorter and 5X longer tasks
• No SI for Short Apps
SHORT BREAK UP LONG TASK
• SI for ML Apps => Preemption is necessary
• Allocate any task for at most lease duration
schedule short task on preemption of
long task
16
![Page 17: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/17.jpg)
GPU Cluster Scheduler: Drawback 2
Metric = App Attained
Service (= #GPUs * time)
Mechanism =Allocate for lease duration to Min
MetricShared GPU cluster
• Tiresias (LAS)
Interface = Task Resource Demand
17
![Page 18: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/18.jpg)
GPU Cluster Scheduler: Drawback 2
Metric = App Attained
Service (= GPU time)
Mechanism =Allocate for lease duration to Min
MetricShared GPU cluster• Tiresias (LAS)
Interface = Task Resource Demand
• DL apps have a placement preference
• E.g.: VGG model family prefers dense placement
18
![Page 19: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/19.jpg)
GPU Cluster Scheduler: Drawback 2
Metric = App Attained
Service (= GPU time)
Mechanism =Allocate for lease duration to Min
MetricShared GPU cluster• Tiresias (LAS)
Interface = Task Resource Demand
• DL apps have a placement preference
• E.g.: VGG model family prefers dense placement
Server 1 Server 2• Attained Service is equal in both placements(= 4 GPUs * time)
• Both placements are equivalent
• Poor placement => slower execution time
• VGG app would rather prefer its own server
• No SI
Server 1 Server 2
VGG App
VGG App
Placement 1
Placement 2
19
![Page 20: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/20.jpg)
GPU Cluster Scheduler: Drawback 2
Metric = App Attained
Service (= GPU time)
Mechanism =Allocate for lease duration to Min
MetricShared GPU cluster• Tiresias (LAS)
Interface = Task Resource Demand
• DL apps have a placement preference
• E.g.: VGG model family prefers dense placement
• Attained Service is equal in both placements(= 4 GPUs * time)
• Both placements are equivalent
• SI violation with placement 1
Server 1 Server 2
Server 1 Server 2
VGG App
VGG App
Placement 1
Placement 2
• Binary Placement Enforcement –Strict Consolidation (wait for Placement 2)
• Partial Progress can be made with Placement 1
• Long wait time without progress
• SI is violated
20
![Page 21: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/21.jpg)
GPU Cluster Scheduler: Requirement 2
Metric = App Attained
Service (= GPU time)
Mechanism =Allocate for lease duration to Min
MetricShared GPU cluster• Tiresias (LAS)
Interface = Task Resource Demand
• DL apps have a placement preference
• E.g.: VGG model family prefers dense placement
• Attained Service is equal in both placements(= 4 GPUs * time)
• Both placements are equivalent
• SI violation with placement 1
Server 1 Server 2
Server 1 Server 2
VGG App
VGG App
Placement 1
Placement 2
• Binary Placement Enforcement –Strict Consolidation (only allow Placement 2)
• High wait time• SI is violated• SI for ML Apps =>
Fine-grained Placement Preference
21
![Page 22: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/22.jpg)
Overview• Existing GPU Cluster Schedulers
• Do not give Sharing Incentive • DL App Properties• Drawbacks• Requirements
• Themis • Design• Implementation• Evaluation
22
![Page 23: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/23.jpg)
Towards a new GPU Cluster Scheduler
Metric = ?
Mechanism =?
Shared GPU clusterThemis
Interface = ?
23
![Page 24: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/24.jpg)
Themis: Metric
Metric = ?
Mechanism =?
Shared GPU clusterThemis
Interface = ?
24
![Page 25: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/25.jpg)
Themis: Metric
GPU Cluster Scheduler
Shared GPU cluster
Green App’sShared Finish
Time Tsh
25
![Page 26: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/26.jpg)
Themis: Metric
GPU Cluster Scheduler
Shared GPU cluster
Independent GPU instances
Green App’sIndependent Finish Time
Tid
26
![Page 27: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/27.jpg)
Themis: Metric
GPU Cluster Scheduler
Shared GPU cluster
Independent GPU instances
Green App’sIndependent Running Time
Tid
Green App’sShared Running
Time Tsh
Primary Goal – Sharing Incentive (SI)
Tsh <= Tid
>=
27
![Page 28: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/28.jpg)
Themis: Finish-Time Fairness Metric
• ⍴ = Tsh / Tid• Tsh: finish-time of app in shared cluster • Tid: finish-time of app in exclusive 1/N share of cluster• N: Average contention during app lifetime
28
![Page 29: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/29.jpg)
Themis: Finish-Time Fairness Metric
• ⍴ = Tsh / Tid• Tsh: finish-time of app in shared cluster • Tid: finish-time of app in exclusive 1/N share of cluster• N: Average contention during app lifetime
• SI: for all apps, ⍴ <= 1
29
![Page 30: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/30.jpg)
Themis: Finish-Time Fairness Metric
• ⍴ = Tsh / Tid• Tsh: finish-time of app in shared cluster • Tid: finish-time of app in exclusive 1/N share of cluster• N: Average contention during app lifetime
• SI: for all apps, ⍴ <= 1• Fine-Grained Placement Preferences –
• Excessive queueing or bad placements worsens Tsh and hence ⍴
30
![Page 31: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/31.jpg)
Themis: Metric
Metric = finish-time fairness
(⍴)
Mechanism =?
Shared GPU clusterThemis
Interface = ?
31
![Page 32: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/32.jpg)
Themis: Interface
Metric = finish-time fairness
(⍴)
Mechanism =?
Shared GPU clusterThemis
Interface = ?
32
![Page 33: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/33.jpg)
Themis: Interface
• Key Purpose: Enable book-keeping of ⍴
• Who calculates ⍴ – the app or the scheduler?
33
![Page 34: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/34.jpg)
Themis: Interface
• DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier
34
![Page 35: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/35.jpg)
Themis: Interface
• DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier
• Launch several DL jobs with different Hyperparameters Hi
H1 H2 HN…
35
![Page 36: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/36.jpg)
Themis: Interface
• DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier
• Launch several DL jobs with different Hyperparameters Hi
• GPU allocation, Gi, within jobs decided by Hyperparam-Opt
H1 H2 HN…G1 G2 GNGapp
Hyperparam-OptAllocation Logic
36
![Page 37: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/37.jpg)
Themis: Interface
• DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier
• Launch several DL jobs with different Hyperparameters Hi
• GPU allocation, Gi, within jobs decided by Hyperparam-Opt
• Track training accuracy of each job and classify jobs as poor, ok and good
• Estimate finish time of each job
H1 H2 HN…G1 G2 GNGvector
Hyperparam-OptAllocation Logic
Hyperparam-OptTermination Logic
terminate at time1
terminate at timeN
trained model at time2
poor okgood
37
![Page 38: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/38.jpg)
Themis: Interface
• DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier
• Launch several DL jobs with different Hyperparameters Hi
• GPU allocation, Gi, within jobs decided by Hyperparam-Opt
• Terminate poor model training instances until one best trained model
• Wleft = ∑ Gi ∗ Hi�( is best calculated by
the Hyperparam-Opt
H1 H2 HN…G1 G2 GNGvector
Hyperparam-OptAllocation Logic
Hyperparam-OptTermination Logic
terminate at time1
terminate at timeN
trained model at time2
• App’s Hyperparam-Opt tracks per-job progress
• App does calculation of ⍴
• Scheduler pulls updated values of ⍴ from the Agent co-located with App’s Hyperparam-Opt
• Details in the paper38
![Page 39: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/39.jpg)
Towards a new GPU Cluster Scheduler
Metric = finish-time fairness
(⍴)
Mechanism =?
Shared GPU clusterThemis
Interface = Get ⍴estimates via Agent
39
![Page 40: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/40.jpg)
Towards a new GPU Cluster Scheduler
Metric = finish-time fairness
(⍴)
Mechanism =?
Shared GPU clusterThemis
Interface = Get ⍴estimates via Agent
40
![Page 41: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/41.jpg)
Themis: Mechanism
• Key Goal: Sharing Incentive
• SI: for all apps, ⍴ <= 1
• Difficult to guarantee with online arrivals
• Our focus: min (max ⍴ ): empirically keeps ⍴’s ≈ 1 without admission control
41
![Page 42: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/42.jpg)
Server 1 Server 2Strawman MechanismSI Objective – min (max ⍴)
Red GPUs become available
42
![Page 43: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/43.jpg)
Strawman MechanismServer 1 Server 2
SI Objective – min (max ⍴)
Interface: Get ⍴ estimates from all apps Red GPUs become available
43
![Page 44: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/44.jpg)
Server 1 Server 2Strawman Mechanism
…
SI Objective – min (max ⍴)
⍴1 ⍴2> > ⍴3 > ⍴N
Interface: Get ⍴ estimates from all apps Red GPUs become available
Sort in decreasing order of ⍴Allocate to app with highest ⍴ (green app) for lease duration
44
![Page 45: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/45.jpg)
Server 1 Server 2Strawman Mechanism: Issues
…
SI Objective – min (max ⍴)
⍴1 ⍴2> > ⍴3 > ⍴N
Interface: Get ⍴ estimates from all apps Red GPUs become available
Sort in decreasing order of ⍴Allocate to app with highest ⍴ (green app) for lease duration
1. Inefficient Allocation – Red GPUs are not co-located with Green Apps GPUs
2. Lying Apps – Apps can lie with high ⍴ values to hoard GPU resources
45
![Page 46: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/46.jpg)
Server 1 Server 2Themis: Mechanism
…
SI Objective – min (max ⍴)
⍴1 ⍴2> > ⍴3 > ⍴N
Interface: Get ⍴ estimates from all apps Red GPUs become available
1. Filter 1 – f apps with max ⍴ values2. Allocate to one or more of 1 – f apps for lease duration
using Partial Allocation Auctions
1 – f f
46
![Page 47: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/47.jpg)
Server 1 Server 2Themis: Mechanism
…
SI Objective – min (max ⍴)
⍴1 ⍴2> > ⍴3 > ⍴N
Interface: Get ⍴ estimates from all apps Red GPUs become available
1. Filter 1 – f apps with max ⍴ values2. Allocate to one or more of 1 – f apps for lease duration
(Red GPUs can potentially go to Blue App) using Auctions
1 – f f
1. Tradeoff SI for Efficiency – f → 0 => More apps to allocate resources => Better opportunity to match placement preference of apps to resources. Our sensitivity analysis suggests f = 0.8 gives a good tradeoff.
2. Partial Allocation Auction within 1– f apps – Incentivizes truth telling of ⍴
47
![Page 48: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/48.jpg)
Themis: Mechanism: Partial Allocation Auction
⍴1 ⍴2>
1 – fOffer Bids
Resource Alloc 𝜌new
+ 0 𝜌old
+1 Red GPU+2 Red GPUs 𝜌old-2𝜹
𝜌old-𝜹
Server 1 Server 2
Red GPUs become available
48
![Page 49: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/49.jpg)
Themis: Mechanism: Partial Allocation Auction
⍴1 ⍴2>
1 – fOffer Bids
Resource Alloc 𝜌new
+ 0 𝜌old
+ 2 Red GPUs+1 Red GPU 𝜌old-𝜹
𝜌old-2𝜹
• Widen Interface to make resource offer and get bids
• Bid from each app is a valuation table
Server 1 Server 2
Red GPUs become available
• Input: Valuation Tables from filtered apps
• Pareto efficiency (PE) – max ∏ 1/⍴𝑖, 𝑛𝑒𝑤 �( – proportional fair allocations
• Strategy Proofness (SP) – Allocate a fraction of this per app for lease duration – rest is “hidden payment”
• More lying => higher hidden payments => incentivizes truth-telling
• Leftover Allocation – Allocate hidden payments to unfiltered apps at random to avoid unallocated resources and enable work conservation
49
![Page 50: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/50.jpg)
Themis: Overall Design
Hyperparam-opt at the top
Agent: Shim layer added to Apps to enable interaction (estimate 𝜌, make bids) with Scheduler
Scheduler: Finish-Time Fair Metric (𝜌);Mechanism: SI + PE + SP THEMIS SCHEDULER
Ask all apps for 𝜌estimates
Offersto subset of apps
Make Bids
ReceiveAllocation
Give 𝜌Estimates
AGENT
App1 App2 AppN. . . . .
Appm
1
2
3
4
5
Res. Alloc. 𝜌new
0 𝜌old
M1:2GM1:1G,M2:1G 𝜌old - 𝜹
𝜌old - 2𝜹
Apps make bids Allocate Winning
Bids
50
![Page 51: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/51.jpg)
Themis: Implementation
Hyperparam-opt at the top
Agent: Shim layer added to Apps to enable interaction with Scheduler
Scheduler: Finish-Time Fair Metric;
Mechanism: SI + PE + SP
THEMIS SCHEDULER
Ask all apps for 𝜌estimates
Offersto subset of apps
Make Bids
ReceiveAllocation
Give 𝜌Estimates
AGENT
App1 App2 AppN. . . . .
Appm
1
2
3
4
5
Res. Alloc. 𝜌new
0 𝜌old
M1:2GM1:1G,M2:1G 𝜌old - 𝜹
𝜌old - 2𝜹
Apps make bids Allocate Winning
BidsSubmarine AM
Hadoop 3.2.0 YARN RM
51
![Page 52: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/52.jpg)
Themis: Evaluation
• 20 machine, 64 GPU cluster• 8 instances each with 2 Tesla K80 GPUs and • 12 instances each with 4 Tesla K80 GPUs
• A publicly available trace of DL apps from Microsoft • Baselines:
• Tiresias – Least Attained Service Job First • Optimus – Best Throughput Scaling First• Gandiva – Best Packing Job First• SLAQ – Best Loss Gradient Job First
52
![Page 53: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/53.jpg)
Macrobenchmark: Sharing Incentive
• CDF of ⍴ for all apps in the workload
• max ⍴ = 1.2 (~1) with Themis
• ⍴distribution has long tail without Themis
No Admission Control max 𝜌 = 1.2
Placement Insensitivity hurts!
53
![Page 54: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/54.jpg)
Macrobenchmark: Efficiency
• GPU Time to execute workload
• Themis better than Gandiva
• Auctions enable globally optimal packing
Greedy-local vs.globally-optimal packing
54
![Page 55: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/55.jpg)
Sensitivity Analysis/Tradeoffs
• max finish-time fair metric (⍴) and GPU time for different values of fairness knob (f)
55
![Page 56: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/56.jpg)
Sensitivity Analysis/Tradeoffs
• max finish-time fair metric (⍴) and GPU time for different values of fairness knob (f)
• f = 0.8 maximizes sharing incentive without degrading efficiency
Sub-optimalplacement
More choice;pack better
56
![Page 57: Themis: Fair and Efficient GPU Cluster Scheduling · Themis: Interface • DL App = Managed by an Hyperparameter Optimizer (Hyperparam-Opt) like Google Vizier • Launch several DL](https://reader034.fdocuments.us/reader034/viewer/2022042122/5e9cb878b4f539033b226de1/html5/thumbnails/57.jpg)
Conclusion
• Consolidation of GPUs => Sharing Incentive is key
• DL App properties => existing schedulers violate SI
• Themis proposes a new metric finish-time fairness that captures SI
• Filtering + Partial Allocation Auctions => Themis performs better than existing schedulers on SI and Efficiency
57