CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems
Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, John Wilkes
CloudScale: Background
• Background and Motivation
  – Infrastructure as a Service (IaaS) providers like Amazon EC2 use virtualization to provide isolation among users
• Service Level Objective (SLO)
  – An agreement between a service provider and a customer about the performance of the service provider in terms of measurable characteristics (e.g., 90% of requests are fulfilled within 100 ms)
[Figure: a physical host running VM1 and VM2, with a plot of VM resource usage (0% to 100%) over time showing resource demand against two caps, Cap 1 and Cap 2.]
CloudScale
• Automatic resource scaling system
• Goal: meet the Service Level Objective (SLO) requirements of the applications with minimum resource and energy cost
The CloudScale System Architecture
[Figure: a host running the Xen hypervisor with Dom 0 and several virtual machines. CloudScale comprises four modules: resource demand prediction, prediction error correction, scaling conflict handling, and predictive frequency/voltage scaling. Graph adapted from original paper.]
Module 1: Resource Demand Prediction
• Goal: predict future resource demand based on past demand
• Signature-driven resource demand prediction
[Figure: a resource usage time series divided into pattern windows P1 to P5.]
• If all pairs of pattern windows Pi and Pj are similar(determined by Pearson correlation), CloudScale uses the average values over all pattern windows to make its prediction.
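• Pearson correlation: $\mathrm{corr}(X, Y) = \dfrac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}$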
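The signature check described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the similarity threshold of 0.85 are assumptions, since the slides do not state what correlation value counts as "similar".

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat window has no shape to compare
    return cov / sqrt(var_x * var_y)


def find_signature(samples, window_size, threshold=0.85):
    """Return the averaged pattern window if all window pairs are similar, else None.

    `threshold` is a hypothetical similarity cut-off, not a value from the slides.
    """
    windows = [samples[i:i + window_size]
               for i in range(0, len(samples) - window_size + 1, window_size)]
    if len(windows) < 2:
        return None
    for w1, w2 in combinations(windows, 2):
        if pearson(w1, w2) < threshold:
            return None  # no repeating signature; fall back to another predictor
    # Point-wise average over all pattern windows is the prediction.
    return [sum(col) / len(windows) for col in zip(*windows)]
```

Calling `find_signature(usage, window_size)` on a strongly periodic trace returns one averaged window that can be replayed as the forecast; a `None` result signals that no signature was found.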
Module 1: Resource Demand Prediction
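• How to determine the size of the pattern window?
• Apply a Fast Fourier Transform (FFT) to the resource usage time series to obtain its frequency components
• f_d = frequency of the frequency component with the most signal power
• Signal power of f = $|X(f)|^2$, where $X(f)$ is the FFT coefficient at frequency f
• Pattern window size = $(1/f_d) \cdot r$, where r is the sampling rate
[Figure: resource usage over time and its frequency components. Graphs from http://en.wikipedia.org/wiki/Frequency_domain]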
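A small sketch of the window-size calculation, under stated assumptions: a naive discrete Fourier transform stands in for a real FFT library to keep the example self-contained, signal power is taken as the squared magnitude of each coefficient, and the window size is one period of the dominant frequency expressed in samples. The function names are hypothetical.

```python
import cmath
import math


def dft_power(samples):
    """Signal power |X_k|^2 for frequency bins 1..n//2 (naive DFT, DC bin skipped)."""
    n = len(samples)
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        powers[k] = abs(coeff) ** 2
    return powers


def pattern_window_size(samples, sampling_rate):
    """Window size in samples: one period of the dominant frequency f_d."""
    n = len(samples)
    powers = dft_power(samples)
    k_dominant = max(powers, key=powers.get)   # bin with the most signal power
    f_d = k_dominant * sampling_rate / n       # dominant frequency
    return round(sampling_rate / f_d)          # (1 / f_d) * r
```

For a trace sampled once per second that repeats every 8 seconds, the dominant bin corresponds to 0.125 Hz, so the window covers 8 samples.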
Module 1: Resource Demand Prediction
• State-driven resource demand prediction
• Used when no signature is found
• States: State 1 = CPU [0%, 30%), State 2 = CPU [30%, 60%), State 3 = CPU [60%, 100%)

State-transition matrix Pij (row: current state, column: next state)
          State 1   State 2   State 3
State 1   0.5       0.3       0.2
State 2   0.2       0.6       0.2
State 3   0.3       0.4       0.3
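The state-driven predictor can be illustrated with the example matrix from this slide. This is a sketch assuming a discrete-time Markov chain in which row i holds the probabilities of moving from state i to each next state; the helper names are hypothetical, and how a predicted state is mapped back to a concrete demand value is not covered by the slides, so it is left out.

```python
STATES = [(0, 30), (30, 60), (60, 100)]   # CPU % ranges for states 1..3
P = [
    [0.5, 0.3, 0.2],   # transitions out of state 1
    [0.2, 0.6, 0.2],   # transitions out of state 2
    [0.3, 0.4, 0.3],   # transitions out of state 3
]


def state_of(cpu_percent):
    """Map a CPU usage percentage to a zero-based state index."""
    for idx, (low, high) in enumerate(STATES):
        if low <= cpu_percent < high:
            return idx
    return len(STATES) - 1   # treat exactly 100% as the top state


def step(dist, matrix):
    """Advance a probability distribution over states by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def predict_state(cpu_percent, steps=1):
    """Return (most likely state index, distribution) after `steps` transitions."""
    dist = [0.0] * len(STATES)
    dist[state_of(cpu_percent)] = 1.0
    for _ in range(steps):
        dist = step(dist, P)
    best = max(range(len(dist)), key=dist.__getitem__)
    return best, dist
```

Starting from 45% CPU (state 2), one step gives the distribution (0.2, 0.6, 0.2), so the chain most likely stays in state 2.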
The CloudScale System Architecture
[Figure: the CloudScale architecture diagram again (host, Xen hypervisor, Dom 0, virtual machines, and the four CloudScale modules), now annotated with "Predicted resource demands". Graph adapted from original paper.]
Module 2: Prediction Error Correction
• Why? To avoid under-estimation errors
• Proactive approach: burst-based padding
  – Take the top k frequencies in the frequency spectrum and apply a reverse Fast Fourier Transform to obtain the burst pattern
[Figure: frequency-domain plot of the frequency components (amplitude against frequency). Graphs from http://en.wikipedia.org/wiki/Frequency_domain]
Module 2: Prediction Error Correction
[Figure: the extracted burst pattern, plotted on a 0% to 90% axis.]
• Burst density: percentage of positive values in burst pattern
• If burst density > 50%, CloudScale uses maximum of all burst values as padding value
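The burst-density rule can be sketched as below, assuming the burst pattern has already been extracted. The slides only state what happens when the density exceeds 50%; the low-density fallback here (a percentile of the burst values, floored at zero) is purely an assumption for illustration, as are the function and parameter names.

```python
def burst_density(burst_pattern):
    """Fraction of positive values in the burst pattern."""
    positives = sum(1 for v in burst_pattern if v > 0)
    return positives / len(burst_pattern)


def burst_padding(burst_pattern, density_threshold=0.5, fallback_percentile=80):
    """Padding value derived from the burst pattern.

    High density: use the maximum burst value, as stated on the slide.
    Low density: ASSUMPTION, use a percentile of the burst values instead.
    """
    if burst_density(burst_pattern) > density_threshold:
        return max(burst_pattern)
    ordered = sorted(burst_pattern)
    # Integer ceiling of percentile * n / 100, converted to a zero-based rank.
    rank = max(0, (fallback_percentile * len(ordered) + 99) // 100 - 1)
    return max(0.0, ordered[rank])
```

With a pattern such as `[3, -1, 5, 2, -2, 4]`, four of six values are positive, so the padding is the maximum burst value, 5.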
Module 2: Prediction Error Correction
• Proactive approach: remedial padding
[Figure: real resource demand and predicted resource demand (VM resource usage, 0% to 100%) over time.]
• Prediction errors (e1, e2, …) = real resource demand - predicted resource demand
• Remedial padding value = weighted moving average of |e1|, |e2|, …
• Padding value = max(burst-based padding value, remedial padding value)
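A minimal sketch of remedial padding and the final padding choice. The slides do not give the weights or the history length of the weighted moving average, so the linearly increasing weights (newest error weighted most) and the `history` parameter are assumptions.

```python
def weighted_moving_average(values, weights=None):
    """Weighted average; by default the newest value gets the largest weight."""
    if not values:
        return 0.0
    if weights is None:
        weights = range(1, len(values) + 1)   # ASSUMPTION: linear weights
    weights = list(weights)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def remedial_padding(real, predicted, history=5):
    """Weighted moving average of the absolute values of recent prediction errors."""
    errors = [abs(r - p) for r, p in zip(real, predicted)][-history:]
    return weighted_moving_average(errors)


def padding_value(burst_value, remedial_value):
    """Final padding is the larger of the two proactive estimates."""
    return max(burst_value, remedial_value)
```

For real demand `[50, 60, 70]` and predictions `[45, 62, 64]`, the absolute errors are 5, 2, and 6, and the weighted average with weights 1, 2, 3 is 4.5.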
Module 2: Prediction Error Correction
• Reactive approach: fast under-estimation correction
• Challenge: real resource demand is unknown during under-provisioning
[Figure: resource cap (0% to 100%) over time in seconds. Starting from cap x at time t, the cap is raised to x·α, x·α², and x·α³ over t+1, t+2, and t+3, plotted against the real resource demand.]
Module 2: Prediction Error Correction
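• When to trigger reactive error correction?
  – Let P = resource usage / resource cap, i.e., the resource pressure
  – Trigger under-estimation correction when P ≥ P_under (e.g., 90%)
• How to determine α (the resource scale-up ratio)?
  – $\alpha = \dfrac{P - P_{under}}{1 - P_{under}} \cdot (\alpha_{max} - \alpha_{min}) + \alpha_{min}$
  – α_max and α_min are pre-defined parameters for the maximum and minimum scale-up ratio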
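The reactive correction loop might look like the sketch below. It assumes that resource pressure is usage divided by cap and that α is interpolated linearly between α_min and α_max according to how far the pressure exceeds P_under. The parameter values (`alpha_min=1.2`, `alpha_max=2.0`), the `max_cap` limit, and the step guard are hypothetical. Because real demand is not observable while the VM is capped, the simulation exposes only `min(real_demand, cap)` as usage.

```python
def resource_pressure(usage, cap):
    """Ratio of resource usage to the current resource cap."""
    return usage / cap


def scale_up_ratio(pressure, p_under=0.9, alpha_min=1.2, alpha_max=2.0):
    """Interpolate alpha between alpha_min and alpha_max by excess pressure."""
    fraction = (pressure - p_under) / (1.0 - p_under)
    fraction = min(max(fraction, 0.0), 1.0)
    return fraction * (alpha_max - alpha_min) + alpha_min


def correct_under_estimation(cap, real_demand, p_under=0.9,
                             max_cap=100.0, max_steps=20):
    """Raise the cap multiplicatively until pressure drops below p_under."""
    caps = [cap]
    for _ in range(max_steps):
        usage = min(real_demand, cap)   # a capped VM cannot use more than its cap
        pressure = resource_pressure(usage, cap)
        if pressure < p_under or cap >= max_cap:
            break
        cap = min(cap * scale_up_ratio(pressure, p_under), max_cap)
        caps.append(cap)
    return caps
```

Starting from a cap of 20 with a hidden demand of 70, the pressure is saturated at 1.0, so the cap doubles twice (20, 40, 80) before pressure falls to 0.875 and the loop stops.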
The CloudScale System Architecture
[Figure: the CloudScale architecture diagram again, now annotated with "Resource demands" and "Initial resource caps". Graph adapted from original paper.]
Module 3: Scaling conflict handling
• Conflict prediction
[Figure: resource demand of VM 1, VM 2, and the host (VM1 + VM2) from t to t+9 on a 0% to 140% axis, with the conflict degree and conflict duration marked.]
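Conflict prediction can be sketched as a scan over the summed predicted demands. The definitions used here are assumptions inferred from the figure labels: a conflict is a run of time steps in which the host total exceeds its capacity, its duration is the length of that run, and its degree is the largest excess over capacity within it.

```python
def predict_conflicts(vm_demands, host_capacity=100.0):
    """Find conflicts in per-VM predicted demand series.

    vm_demands: list of equal-length demand series, one per VM.
    Returns a list of dicts with start index, duration, and degree.
    """
    totals = [sum(step) for step in zip(*vm_demands)]
    conflicts = []
    start = None
    # A trailing 0.0 sentinel closes any conflict that runs to the end.
    for t, total in enumerate(totals + [0.0]):
        if total > host_capacity and start is None:
            start = t
        elif total <= host_capacity and start is not None:
            excess = [totals[i] - host_capacity for i in range(start, t)]
            conflicts.append({"start": start,
                              "duration": t - start,
                              "degree": max(excess)})
            start = None
    return conflicts
```

For VM demands `[30, 40, 60, 70, 70, 50, 30]` and `[40, 50, 50, 60, 40, 40, 30]`, the host total exceeds 100% for three steps starting at index 2, with a peak excess of 30.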
Module 3: Scaling conflict handling
• Two approaches
  – Local conflict handling
    • Used when the conflict duration is short and the conflict degree is small
  – Migration conflict handling
    • Used when the conflict duration is long and the conflict degree is large
Module 3: Scaling conflict handling
• Local conflict handling
  – Uniform scheme: set the resource cap for each application in proportion to its resource demand
  – Differentiated scheme: satisfy the resource demand of high-priority applications first
• Resource Under-provisioning Penalty (RP)
  – SLO penalty for an application VM caused by one unit of resource under-provisioning
• Total Resource Under-provisioning Penalty (QRP)
  – QRP = RP × total units of resource under-provisioning over the duration of the conflict, summed over all VMs
[Figure: predicted resource demand and allocated resource cap (resource usage, 0 to 1) over time.]
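The two local schemes and the total under-provisioning penalty can be illustrated as follows. This is a sketch under assumptions: each VM has its own RP value and a numeric priority (higher means more important), and under-provisioning at a time step is the positive gap between demand and allocated cap. All names are hypothetical.

```python
def uniform_caps(demands, capacity):
    """Uniform scheme: caps proportional to each VM's demand."""
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)   # no conflict, everyone gets what they ask for
    return {vm: capacity * d / total for vm, d in demands.items()}


def differentiated_caps(demands, priorities, capacity):
    """Differentiated scheme: satisfy higher-priority VMs first."""
    caps, remaining = {}, capacity
    for vm in sorted(demands, key=lambda v: priorities[v], reverse=True):
        caps[vm] = min(demands[vm], remaining)
        remaining -= caps[vm]
    return caps


def total_underprovisioning_penalty(demand_series, cap_series, rp):
    """Q_RP: per-VM RP times units under-provisioned, summed over all VMs."""
    total = 0.0
    for vm, demands in demand_series.items():
        shortfall = sum(max(0.0, d - c)
                        for d, c in zip(demands, cap_series[vm]))
        total += rp[vm] * shortfall
    return total
```

With demands of 60 and 90 on a host of capacity 100, the uniform scheme yields caps of 40 and 60, while the differentiated scheme gives the higher-priority VM its full 60 and leaves 40 for the other.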
Module 3: Scaling conflict handling
• Migration-based conflict handling
  – Key observation: triggering migration after the conflict has already happened is too late, since the host is already overloaded
  – Solution: trigger migration before the conflict
    • Leverage conflict prediction to trigger migration T seconds before the predicted conflict happens
    • To avoid migration for transient conflicts, only migrate if the conflict duration is larger than K seconds
  – Total Migration Penalty (QM)
    • Migration Penalty (MP): SLO penalty during migration per time unit
    • Migration penalty for VMi = MPi × migration time
    • QM aggregates the migration penalties over all VMs
Module 3: Scaling conflict handling
• Total Under-provisioning Penalty (QRP) vs. Total Migration Penalty (QM)
• If QRP > QM, trigger migration conflict handling
• If QM > QRP, trigger local conflict handling
[Figure: decision flowchart. If QM > QRP, use local conflict handling; otherwise, migrate.]
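The penalty comparison can be sketched as a small decision function. It combines the QRP versus QM rule with the transient-conflict filter (migrate only if the conflict lasts longer than K seconds). The tie case QRP = QM is not covered by the bullets; this sketch follows the flowchart, where anything other than QM > QRP leads to migration. Function and parameter names are hypothetical.

```python
def total_migration_penalty(migrations):
    """Q_M: sum of MP_i * migration time over (mp, seconds) pairs."""
    return sum(mp * seconds for mp, seconds in migrations)


def choose_conflict_handling(q_rp, q_m, conflict_duration, min_duration_k):
    """Pick 'local' or 'migration' for a predicted conflict."""
    if conflict_duration <= min_duration_k:
        return "local"   # transient conflict: never worth migrating
    return "local" if q_m > q_rp else "migration"
```

A conflict with QRP = 100 and QM = 10 that lasts 60 seconds (with K = 30) leads to migration; the same penalties for a 10-second conflict are handled locally.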
The CloudScale System Architecture
[Figure: the CloudScale architecture diagram again, now annotated with "Resource demands", "Initial resource caps", and "Adjusted resource caps". Graph adapted from original paper.]
Module 4: Predictive Frequency/Voltage Scaling
• Goal: save energy without affecting application SLOs
[Figure: CPU resource demand mapped onto discrete CPU frequency levels (frequency 1, frequency 2, …, frequency i, frequency j, up to the max CPU frequency), shown against the real demand.]
• Adjust the resource caps for VMs accordingly
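One way to picture predictive frequency scaling is the sketch below. It rests on a simplifying assumption that is not stated on the slides: CPU capacity scales linearly with frequency, so a host whose total predicted demand is a fraction d of full capacity can run at the lowest available frequency f with f / f_max ≥ d, and each VM's cap is scaled by f_max / f to compensate. The frequency list and function names are hypothetical.

```python
def pick_frequency(total_cpu_demand, frequencies):
    """Lowest frequency whose share of the max frequency covers the demand.

    total_cpu_demand: predicted host demand as a fraction (0..1) of full capacity.
    """
    f_max = max(frequencies)
    for f in sorted(frequencies):
        if f / f_max >= total_cpu_demand:
            return f
    return f_max


def rescale_caps(caps, frequency, f_max):
    """Scale VM caps up so they buy the same amount of work at a lower frequency."""
    return {vm: cap * f_max / frequency for vm, cap in caps.items()}
```

With available frequencies of 1.0, 1.5, 2.0, and 3.0 GHz and a predicted total demand of 50%, the host can drop to 1.5 GHz, and caps of 20% and 30% become 40% and 60%.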
Evaluation: CPU prediction error
• World Cup and EPA are two 6-hour workloads. EPA has more fluctuations than World Cup
• For World Cup, CloudScale makes less than 5% significant prediction errors (|e| > 10%)
• For EPA, CloudScale makes less than 10% significant prediction errors (|e| > 10%)
Evaluation: Migration Prediction Accuracy
• Lead time: how early CloudScale triggers the migration (e.g. migrate T seconds before the conflict happens)
Evaluation: Conflict Resolving Schemes vs SLO violation rate
• RUBiS: an online auction benchmark
• Settings:
  – Two RUBiS web server VMs on the same host, maintaining 75% resource pressure
  – Memory size: VM1 = 1 GB, VM2 = 2 GB
• Schemes:
  – Local conflict resolving
    • Uniform scheme
  – Reactive migration
    • Always migrate VM2
  – VM selection
    • Selects the VM with less migration penalty (VM1)
  – CloudScale
    • Predictive migration, 70 s before conflict
Discussion
1. In the related work section, the authors claim that, compared to previous work, CloudScale does not require ANY offline tuning. But the current CloudScale does need a migration lead time, a conflict duration threshold to trigger migration, and a resource pressure threshold.
2. Coordinated multi-metric resource scaling (CPU, memory, network, disk, etc.)
3. Coordinated multi-tier resource scaling (host-level, etc.)
4. Prioritize resources for applications which have not violated their SLO yet, assuming SLO violation is binary.
5. Fairness: how to ensure applications are providing real SLO feedback and migration penalties?
Backup Slides
Average Delay vs Average CPU cap
• Schemes
  – Correction: the scaling system performs resource-pressure-triggered prediction error correction only
  – Dynamic padding: dynamic padding only
  – CloudScale RP: both dynamic padding and scaling error correction; scaling error correction is triggered by resource pressure (90%) only
  – CloudScale RP + SLO: scaling error correction is also triggered by SLO feedback (5%)
Energy Saving
• Total energy saving is 8-10%, and idle energy consumption is dominating.
Evaluation: two workloads
Evaluation: 90% resource pressure
Evaluation: 75% resource pressure
SLO Violation for Different Scaling Schemes
• Schemes:
  – Local conflict resolving
    • Uniform scheme
  – Reactive migration
    • Always migrate VM2
  – VM selection
    • Selects the VM with less migration penalty (VM1)
  – CloudScale
    • Predictive migration, 70 s before conflict
Memory footprint trace for Hadoop MapReduce
Memory Resource Prediction Error
Performance of Memory Scaling
• Schemes
  – Mean: allocating the average value over the memory footprint trace
Overhead