Heracles: Improving Resource Efficiency at Scale · 2019-05-02 · Heracles: Improving Resource...
Heracles: Improving Resource Efficiency at Scale
David Lo†, Liqun Cheng*, Rama Govindaraju*, Parthasarathy Ranganathan*, Christos Kozyrakis†
† Stanford University * Google Inc.
© 2012 Google Inc. All rights reserved. Google and the Google Logo are registered trademarks of Google Inc.
The case for oversubscription
Diurnal load variation Total Cost of Ownership
Heracles: Improving Resource Efficiency at Scale (ISCA-42 June 16, 2015) 2
Total Cost of Ownership breakdown: Servers 61%, Energy 16%, Cooling 14%, Networking 6%, Other 3% [J. Hamilton, http://mvdirona.com]
Idleness in latency critical workload! Bigger opportunity
PEGASUS [ISCA'14]
Oversubscription summary
Motivation: fill in idle cycles with useful work
How: Latency Critical (LC) + Best Effort (BE)
Plenty of analytics jobs, such as deep learning training
Challenges of oversubscription
Allocation of shared resources between LC and BE
Interference on shared resources
DRAM
LLC
Cores
Network
Power
Difficult to guarantee quality of service (QoS)
How bad can interference get?
Quick experiment with a latency critical job and a batch job
The latency critical job: Google websearch
The batch job: deep learning classifier
The setup:
Run batch job at very low priority to fill in idle CPU cycles
Hope that the Linux scheduler is sufficient for QoS
How bad can interference get?
SLO latency
Cannot co-locate workload at any load!
Interference is different based on resource
Impact of interference on websearch's latency (rows: resource; columns: websearch load)
Resource     10%    20%    30%    40%    50%    60%    70%    80%    90%
LLC          >300%  >300%  >300%  >300%  >300%  >300%  >300%  264%   123%
DRAM         >300%  >300%  >300%  >300%  >300%  >300%  >300%  270%   122%
HyperThread  110%   107%   114%   115%   105%   117%   120%   136%   >300%
CPU power    124%   107%   116%   109%   115%   105%   101%   100%   100%
Network      36%    36%    37%    37%    39%    42%    48%    55%    64%
Legend: 0%, 100%, 300% (OK / NOT OK)
No oversubscription possible
Need to manage MULTIPLE resources with DYNAMIC controller
Oversubscription appears to be too hard
Google [Barroso'09], Twitter [Delimitrou'14]
Average utilization shown in the two plots: 20% and 30%
Even with cluster managers and lots of available jobs
Caused by fear of interference
Heracles: low latency and high utilization
Insights:
Use iso-latency to tolerate some interference
Fine-grained isolation on all shared resources to mitigate the rest
Implementation:
Dynamic controller to manage shared resource allocations
Evaluated on Google workloads, high utilization without QoS violations
What is iso-latency?
[Plot: websearch latency vs. cluster load. x-axis: % of maximum cluster load (0% to 100%); y-axis: overall query latency; SLO latency marked]
Can hide interference in this slack!
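The slack idea can be made concrete with a small sketch. It assumes slack is measured as the unused fraction of the SLO; the function name and the sample readings are hypothetical and are not taken from the slides.

```python
def latency_slack(measured_latency_ms: float, slo_latency_ms: float) -> float:
    """Fraction of the SLO still unused; a negative value means the SLO is violated."""
    if slo_latency_ms <= 0:
        raise ValueError("SLO latency must be positive")
    return (slo_latency_ms - measured_latency_ms) / slo_latency_ms


# Hypothetical readings: 12 ms tail latency against a 20 ms SLO.
slack = latency_slack(12.0, 20.0)
print(f"slack = {slack:.0%}")
```

At low load the measured latency sits well below the SLO, so the slack is large and some interference from BE work can be absorbed before the SLO is at risk.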
Fine-grained resource isolation mechanisms
CPU (HyperThread/L1/L2)
Use Linux cpuset cgroups to partition cores between LC and BE jobs
Single core granularity (Haswell has up to 18 cores)
~1ms response time
Example partitioning setup:
Core 1 Core 2 Core 3 Core 4 ... Core N-1 Core N
LC cpuset | BE cpuset
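A minimal sketch of such a core split, assuming two pre-created cgroups named `lc` and `be` (hypothetical names) under a cpuset cgroup directory supplied by the caller. Linux numbers CPUs from 0, so the slide's Core 1 corresponds to CPU 0 here. `cpuset.cpus` is the standard cpuset control file; everything else is illustrative.

```python
from pathlib import Path


def partition_cores(num_cores: int, num_be_cores: int) -> tuple[str, str]:
    """Return (lc_cpus, be_cpus) in cpuset list format, e.g. '0-13' and '14-17'."""
    if not 0 < num_be_cores < num_cores:
        raise ValueError("both LC and BE need at least one core")
    split = num_cores - num_be_cores  # first CPU index that belongs to BE

    def fmt(lo: int, hi: int) -> str:
        return str(lo) if lo == hi else f"{lo}-{hi}"

    return fmt(0, split - 1), fmt(split, num_cores - 1)


def apply_partition(cgroup_root: Path, lc_cpus: str, be_cpus: str) -> None:
    """Write cpuset.cpus for the hypothetical 'lc' and 'be' cgroups."""
    (cgroup_root / "lc" / "cpuset.cpus").write_text(lc_cpus)
    (cgroup_root / "be" / "cpuset.cpus").write_text(be_cpus)


lc_cpus, be_cpus = partition_cores(num_cores=18, num_be_cores=4)
print(lc_cpus, be_cpus)
```

Calling `apply_partition` against a real cgroup mount needs root privileges, and the mount point differs between systems, so the path is left as a parameter.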
Fine-grained resource isolation mechanisms
LLC
Hardware cache partitioning in latest Haswell Xeon
Partitioning by cache way (20 ways in Haswell)
<1ms adjustment latency
Example partitioning setup:
Way 1 Way 2 Way 3 Way 4 ... Way N-1 Way N
Partition for LC | Partition for BE
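Way partitioning is usually expressed as one bitmask per class, with one bit per cache way. The sketch below assumes contiguous, non-overlapping masks and gives the low-order ways to BE; that assignment is an arbitrary choice for illustration, and how the masks are programmed into hardware is outside the sketch.

```python
def way_masks(total_ways: int, be_ways: int) -> tuple[int, int]:
    """Return (lc_mask, be_mask): two contiguous, non-overlapping way bitmasks."""
    if not 0 < be_ways < total_ways:
        raise ValueError("both LC and BE need at least one way")
    be_mask = (1 << be_ways) - 1                  # low-order ways go to BE
    lc_mask = ((1 << total_ways) - 1) & ~be_mask  # remaining ways go to LC
    return lc_mask, be_mask


# 20 ways as stated on the slide; 4 BE ways is a hypothetical setting.
lc_mask, be_mask = way_masks(total_ways=20, be_ways=4)
print(f"LC mask = {lc_mask:#x}, BE mask = {be_mask:#x}")
```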
Fine-grained resource isolation mechanisms
Network
Transmit rate limiting in Linux kernel with hierarchical token bucket
Extremely fine-grained limits, with a granularity of at least 1 Mbps
~1ms response time
[Diagram: BE packets and LC packets wait in a BE queue and an LC queue; a packet scheduler forwards them to the NIC]
Rate limit BE flows
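To show the mechanism behind the rate limit, here is a toy single token bucket. It is not the kernel's hierarchical token bucket, only the basic idea applied to one BE class; the class name, rate, and burst size are hypothetical.

```python
class TokenBucket:
    """Toy token bucket: tokens are bytes, refilled at rate_bps / 8 bytes per second."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate_bytes_per_s = rate_bps / 8
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last_s = 0.0

    def allow(self, now_s: float, pkt_bytes: int) -> bool:
        """Return True if a packet of pkt_bytes may be sent at time now_s."""
        elapsed = now_s - self.last_s
        self.last_s = now_s
        # Refill for the elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_bytes_per_s)
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes
            return True
        return False


# Hypothetical BE limit: 8 Mbps with room for two 1500-byte packets of burst.
be_bucket = TokenBucket(rate_bps=8_000_000, burst_bytes=3000)
print([be_bucket.allow(0.0, 1500) for _ in range(3)])
```

Lowering `rate_bps` for the BE class leaves more of the link for LC packets, which are not rate limited in this sketch.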
Fine-grained resource isolation mechanisms
CPU power
Per-core DVFS to ensure minimum Turbo frequency for LC workload
Can change clock frequency in increments of 100MHz
<1 ms response of hardware
Core 1 Core 2 Core 3 Core 4 ... Core N-1 Core N
LC cores: 3.0 GHz; BE cores: 2.0 GHz
Shift power from BE to LC cores to maintain guaranteed LC freq.
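One way to picture the power shift is a step function that nudges the BE frequency by 100 MHz per control interval. This is a sketch under assumptions: the 90% power threshold, the frequency bounds, and the function name are hypothetical, and reading power and setting frequencies is left to the caller.

```python
STEP_MHZ = 100  # the slide states that frequency changes in 100 MHz increments


def next_be_freq(be_freq_mhz: int, lc_freq_mhz: int, lc_guaranteed_mhz: int,
                 power_w: float, tdp_w: float,
                 min_mhz: int = 1200, max_mhz: int = 3000,
                 power_threshold: float = 0.9) -> int:
    """Return the BE core frequency to use in the next control interval."""
    near_cap = power_w > power_threshold * tdp_w
    if near_cap and lc_freq_mhz < lc_guaranteed_mhz:
        # LC cores are below their guaranteed frequency: free up power.
        return max(min_mhz, be_freq_mhz - STEP_MHZ)
    if not near_cap and lc_freq_mhz >= lc_guaranteed_mhz:
        # Power headroom and LC is satisfied: let BE cores speed up.
        return min(max_mhz, be_freq_mhz + STEP_MHZ)
    return be_freq_mhz


print(next_be_freq(2000, 2800, 3000, power_w=140.0, tdp_w=145.0))
```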
Fine-grained resource isolation mechanisms
DRAM BW
Not available in hardware, have to simulate with other mechanisms
LLC partitioning influences the amount of traffic that is served by DRAM
Use number of cores to control DRAM BW
Intuition: each core can only issue so many requests/sec
[Diagram: LLC partitioning and core partitioning, each split between LC and BE]
DRAM BW = BWPerCore × NumCores
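The per-core intuition can be turned into a rough estimate. The sketch below assumes BE bandwidth scales linearly with the number of BE cores; the function names and the sample figures are hypothetical.

```python
def estimated_be_dram_bw(bw_per_core_gbps: float, num_be_cores: int) -> float:
    """Linear model: BE DRAM bandwidth is per-core bandwidth times core count."""
    return bw_per_core_gbps * num_be_cores


def max_be_cores(bw_per_core_gbps: float, lc_bw_gbps: float,
                 dram_limit_gbps: float, available_cores: int) -> int:
    """Largest BE core count whose estimated bandwidth keeps the total under the limit."""
    if bw_per_core_gbps <= 0:
        raise ValueError("per-core bandwidth must be positive")
    budget = dram_limit_gbps - lc_bw_gbps
    if budget <= 0:
        return 0
    return min(available_cores, int(budget // bw_per_core_gbps))


# Hypothetical numbers: 4 GB/s per BE core, LC uses 20 GB/s, limit of 54 GB/s.
print(max_be_cores(4.0, 20.0, 54.0, available_cores=12))
```

Giving BE a larger LLC partition would lower `bw_per_core_gbps` in this model, which is how the two knobs interact.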
But how should the knobs be set?
This looks like an optimization problem
Objective: maximize resources given to BE job
Constraints: preserve SLO of latency critical application
Challenge: 5-dimensional formulation!
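Written out, the problem might look as follows. This is an illustrative formulation, not one given on the slides: $r_{\mathrm{BE},i}$ is the BE allocation of shared resource $i$ (the five resources listed earlier: DRAM, LLC, cores, network, power), $R_i$ is the capacity of that resource, and the unweighted sum in the objective is an assumption.

```latex
\begin{aligned}
\max_{r_{\mathrm{BE}}} \quad & \sum_{i=1}^{5} r_{\mathrm{BE},i} \\
\text{subject to} \quad & \mathrm{Latency}_{\mathrm{LC}}(r_{\mathrm{BE}}) \le \mathrm{SLO}, \\
& 0 \le r_{\mathrm{BE},i} \le R_i, \quad i = 1, \dots, 5
\end{aligned}
```

The difficulty is that the latency constraint depends on all five allocations at once, so a naive search has to explore a five-dimensional space.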
Control insight #1: independence
Observation: latency violations occur when a shared resource is extremely loaded
High demand for resource causes significant contention
LC workload is unable to obtain its required allocation
Insight: assume independent interference under 2 conditions
LC workload is not starved for any resource
Each resource has enough slack (~10%) to absorb bursts
Control insight #2: convexity
Performance as a function of resources is convex for benchmarked workloads
Use of gradient descent is guaranteed to produce optimality
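A short sketch of why convexity helps: on a convex function, following the local slope leads to the global minimum. The cost function below is a hypothetical stand-in for a measured performance curve, and the gradient is estimated by finite differences because a real controller would only have measurements.

```python
def gradient_descent(f, x0: float, lr: float = 0.1, steps: int = 200,
                     eps: float = 1e-6) -> float:
    """Minimize a 1-D function using a central-difference gradient estimate."""
    x = x0
    for _ in range(steps):
        grad = (f(x + eps) - f(x - eps)) / (2 * eps)
        x -= lr * grad  # step against the slope
    return x


def cost(x: float) -> float:
    """Hypothetical convex cost with its minimum at x = 3."""
    return (x - 3.0) ** 2 + 1.0


print(round(gradient_descent(cost, x0=0.0), 3))
```

On a non-convex curve the same loop could stop in a local minimum, which is why the convexity observation matters for the controller design.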
Heracles: high level controller overview
Goal: meet SLO, keep BE from saturating shared resource
Runs on each machine
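[Diagram: the LC workload sends latency readings to the controller. The controller answers "Can BE grow?" for three subcontrollers, each with an internal feedback loop: CPU + Memory (LLC, CPU, DRAM BW), CPU power (DVFS, CPU Power), and Network (HTB, Net. BW).]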
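The top-level decision can be sketched as a function of the latency readings alone. The 10% slack threshold and the three outcome names are assumptions made for illustration; the slides only state that latency readings determine whether BE can grow.

```python
def be_decision(latency_ms: float, slo_ms: float, min_slack: float = 0.10) -> str:
    """Return 'disable_be', 'hold', or 'allow_growth' from one latency reading."""
    slack = (slo_ms - latency_ms) / slo_ms
    if slack < 0:
        return "disable_be"    # SLO violated: stop BE work
    if slack < min_slack:
        return "hold"          # too close to the SLO: do not grow BE
    return "allow_growth"      # enough slack: subcontrollers may grow BE


for reading_ms in (25.0, 19.0, 12.0):  # hypothetical readings against a 20 ms SLO
    print(reading_ms, be_decision(reading_ms, slo_ms=20.0))
```

Each subcontroller would then combine this signal with its own feedback loop on the resource it manages.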
[Diagram: per-resource split between LC and BE for Cores, LLC, Core freq., and Network BW]
Example subcontroller: Core+Memory
Isolates: Cores, LLC, DRAM
Physical mechanisms: Partitioning of cores, LLC, and DRAM
Goal: maximize cores running BE job by minimizing DRAM BW
Guardband in DRAM BW to ensure LC job is not being starved
Iterative phases:
1. Reduce total DRAM BW through LLC partitioning
2. Grow allocation of BE cores
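The two alternating phases can be sketched as a small state machine. Everything specific here is an assumption: the phase names, the 90% "danger" fraction of the bandwidth limit, the 1% minimum bandwidth drop, and the choice to model phase 1 as giving BE more LLC ways. The top-level "Can BE grow?" signal is omitted to keep the sketch short.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    be_cores: int
    be_ways: int
    phase: str  # "GROW_LLC" or "GROW_CORES"


def step(s: State, total_bw: float, bw_limit: float, bw_drop: float,
         max_cores: int, max_ways: int,
         danger: float = 0.9, min_drop: float = 0.01) -> State:
    """Advance one control interval.

    bw_drop is the measured fractional reduction in total DRAM BW
    since the last LLC change.
    """
    if s.phase == "GROW_LLC":
        if bw_drop > min_drop and s.be_ways < max_ways:
            return replace(s, be_ways=s.be_ways + 1)   # more cache still helps
        return replace(s, phase="GROW_CORES")          # gradient is about zero
    if total_bw > danger * bw_limit:                   # close to the BW cap
        if s.be_ways < max_ways:
            return replace(s, be_ways=s.be_ways + 1, phase="GROW_LLC")
        return s                                       # nothing left to trade
    if s.be_cores < max_cores:
        return replace(s, be_cores=s.be_cores + 1)
    return s


s = State(be_cores=1, be_ways=1, phase="GROW_LLC")
s = step(s, total_bw=30.0, bw_limit=50.0, bw_drop=0.2, max_cores=8, max_ways=10)
print(s)
```

Each call stands for one measurement interval; a real controller would also shrink the BE allocation when the guardband is violated.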
Example subcontroller: Core+Memory
[Plots over time, one frame per slide: LC DRAM BW, BE DRAM BW, Total DRAM BW]
1. Start here
2. Reduce BW
3. Reduce BW (∇ ≈ 0: negligible benefit)
4. + BE cores
5. + BE cores (danger zone)
6. Reduce BW
7. + BE cores (hit BW cap)
Evaluation of Heracles
Evaluation of Google production workloads on real hardware
Latency Critical workloads
websearch
Leaf node, document retrieval/scoring
99%-ile latency SLO of tens of milliseconds
ml_cluster
Machine learning for text clustering
95%-ile latency SLO of tens of milliseconds
memkeyval
In-memory key-value store
99%-ile latency SLO of hundreds of microseconds
(Production workloads)
Best Effort jobs
Synthetic:
stream-LLC: LLC antagonist
stream-DRAM: DRAM BW antagonist
cpu_pwr: CPU power antagonist
Production:
brain: deep learning (LLC, DRAM, CPU, CPU power)
streetview: image stitching (DRAM BW)
Run Heracles on real hardware, measure latency and utilization
Latency validation: do no harm
SLO latency
Iso-latency: recovering slack and turning it into work
Putting it together: resource efficiency
Effective Machine Utilization = (LC load) + (% BE throughput)
Load on LC app
Free batch processing capability
Better than 100% is due to better binpacking
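The metric is a plain sum, shown here with hypothetical numbers. The sketch assumes both terms are fractions of what each job would achieve with the machine to itself.

```python
def effective_machine_utilization(lc_load: float, be_throughput: float) -> float:
    """EMU = LC load + BE throughput, both as fractions of standalone peak."""
    return lc_load + be_throughput


# Hypothetical point: LC at 40% load while BE reaches 55% of its standalone throughput.
emu = effective_machine_utilization(0.40, 0.55)
print(f"EMU = {emu:.0%}")
```

Because the two terms are normalized separately, the sum can exceed 100% when the two jobs stress different resources.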
Bonus: energy efficiency too!
Power increase is far less than resource utilization increase!
300% more work for 60% more power
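As a worked check of what those two figures imply, the sketch below reads "300% more work" literally as 4x the work and "60% more power" as 1.6x the power; that literal reading is an assumption.

```python
def work_per_watt_gain(extra_work: float, extra_power: float) -> float:
    """Ratio of new to old work-per-watt, given fractional increases in work and power."""
    return (1.0 + extra_work) / (1.0 + extra_power)


gain = work_per_watt_gain(extra_work=3.00, extra_power=0.60)
print(f"work per watt improves by a factor of {gain:.2f}")
```

Under this reading, energy efficiency improves by about 2.5x, since 4.0 / 1.6 = 2.5.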
Cluster results
Use load trace for off-peak hours on websearch cluster
Conclusion
Increasing utilization is key to improving datacenter efficiency
Fine-grained knobs to control many sources of interference
Need coordinated policy to find optimal settings
Heracles significantly increases utilization
Achieves average of 90% utilization for Google workloads
Potential increase of >300% in cost efficiency