Transcript of: Distributed Autonomous Virtual Resource Management in Datacenters Using Finite-Markov Decision Process
Distributed Autonomous Virtual Resource Management in Datacenters Using Finite-
Markov Decision Process
Liuhua Chen, Haiying Shen and Karan Sapra
Department of Electrical and Computer Engineering, Clemson University
Load Balancing is Critical For IaaS Cloud
[Figure: VMs hosted on PMs connected by a datacenter (DC) network. VM = Virtual Machine; PM = Physical Machine.]

- Increase resource utilization
- Reduce response time / SLA violations
- Increase profit
Existing LB Methods Have Limitations
Reactive methods
- Reactively perform VM migration upon the occurrence of PM overloads
- × Selecting migration VMs and destination PMs generates high delay and high overhead

Proactive methods
- Predict VM resource demand over a short horizon for sufficient resource provisioning or load balancing
- × Cannot guarantee a long-term load balance state
Requirements for High Performance LB
- Proactively handle the potential load imbalance problem
- Generate low overhead and delay for load balancing
- Maintain a long-term load balance state
Solution for High-Performance LB

- × Consider only the current situation
- × Consider only the next step
- ✓ Consider many steps ahead

[Figure: example MDP with states s1 and s2, actions a1 and a2, annotated with transition probabilities and rewards.]
Finite-Markov Decision Process (MDP)

Components:
- State
- Action
- Transition probability
- Reward
- Optimal action policy π

[Figure: the MDP model (PM states, PM actions, transition probabilities, rewards) is periodically updated from a resource utilization monitor and yields the optimal action policy π; each PM periodically checks its state, refers to π, and takes the indicated action.]
MDP-based Load Balancing

1. Determine states s, actions a, and rewards r
2. Monitor resource utilization and calculate transition probabilities
3. Compute the optimal policy π
4. Each PM checks its own state, refers to π, and takes the indicated action
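The per-PM step can be sketched as a simple lookup against the precomputed policy π. This is a minimal sketch: the names `pm_control_step` and `classify_state`, the dictionary representation of π, and the two-threshold classification are illustrative assumptions, not the talk's implementation.

```python
# Minimal sketch of the per-PM control loop: each PM periodically
# classifies its own load state and refers to the precomputed policy pi.
# (Hypothetical names and thresholds, not the authors' code.)

def classify_state(cpu_util, mem_util, threshold=0.8):
    # Toy classification by the dominant resource; the talk actually
    # uses a finer Low/Med/High grid over CPU x MEM utilization.
    load = max(cpu_util, mem_util)
    if load >= threshold:
        return "high"
    if load >= threshold / 2:
        return "med"
    return "low"

def pm_control_step(cpu_util, mem_util, policy):
    """One periodic step: check own state, refer to pi, return the action."""
    state = classify_state(cpu_util, mem_util)
    return policy.get(state, "no-op")   # e.g. 'migrate VM-high' or 'no-op'

# Illustrative policy: only heavily loaded PMs trigger a migration.
policy = {"high": "migrate VM-high", "med": "no-op", "low": "no-op"}
```

Because π is computed once and shared, each PM's periodic step reduces to this constant-time lookup, which is where the low overhead and delay come from.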
Advantages of MDP-based Load Balancing

- Achieves long-term load balance, hence reduces SLA violations
- Builds one MDP that is used by all PMs
- Reduces the overhead and delay of load balancing

[Figure: the SLA between tenant and provider; a single MDP shared across PMs.]
Challenges of the MDP Design

1. MDP components must be well designed for low overhead
   - VM resource utilization changes over time
   - VMs have continuous resource utilization values
   - Naive designs lead to a large action space and frequent updates of transition probabilities
2. Transition probabilities in the MDP must be stable
   - Probabilities change over time
   - Probabilities change with the load thresholds, making it difficult to determine the load threshold for states and forcing frequent updates of probabilities
Solution to Challenge 1

A naive action design, "moving out a specific VM":
- needs to record the state transitions of a PM for moving out each individual VM
- generates a prohibitive cost because there are many VMs
- is not accurate because VM load varies over time
Solution to Challenge 1

VM/PM states: classify CPU and MEM utilization into Low/Med/High, giving nine states:

| CPU \ MEM | Low | Med | High |
|-----------|-----|-----|------|
| Low       | s9  | s6  | s3   |
| Med       | s8  | s5  | s2   |
| High      | s7  | s4  | s1   |

Action design: moving out a VM state (high, med, low)
- uses a PM load state as the MDP state
- records the transitions between PM load states caused by moving out a VM in a given load state
Solution to Challenge 1

Action design: moving out a VM state (high, med, low)
- uses a PM load state as the MDP state
- records the transitions between PM load states caused by moving out a VM in a given load state

Advantages:
- the total number of VM-states in the action set does not change
- each VM-state itself does not change, so the associated transition probability does not change
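The Low/Med/High grid of VM/PM states can be sketched as a simple quantization. The thresholds (0.3, 0.7) and the exact numbering of s1–s9 below are illustrative assumptions; the slide fixes only that there are nine states over a CPU × MEM grid.

```python
# Sketch of the 3x3 VM/PM state grid: CPU and MEM utilization are each
# quantized to Low/Med/High, giving nine discrete states s1..s9.
# Thresholds and state numbering are illustrative, not from the talk.

def level(util, lo=0.3, hi=0.7):
    """Quantize a utilization in [0, 1] to 0 = Low, 1 = Med, 2 = High."""
    return 0 if util < lo else (1 if util < hi else 2)

def grid_state(cpu_util, mem_util):
    """Map (CPU, MEM) utilization to one of nine states 's1'..'s9'."""
    return f"s{3 * level(mem_util) + level(cpu_util) + 1}"
```

Because the action set is defined over these fixed VM-states rather than individual VMs, its size never changes, which is what keeps the associated transition probabilities stable.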
Challenge 2

- The MDP's transition probabilities should be stable; otherwise the MDP cannot accurately provide guidance and must be updated very frequently to keep the probabilities accurate
- We studied the transition probabilities of VM migrations based on traces (Google Cluster trace and PlanetLab VM trace); they are:
  - stable under slightly varying load thresholds
  - stable over time
Trace Study on Transition Probabilities

[Figures: the probabilities that a PM-high state transitions to PM-medium by taking different actions (moving out a VM-high, VM-med, or VM-low VM), measured on the Google Cluster trace. Left: transition probability vs. load threshold (0.7, 0.8, 0.9). Right: transition probability vs. simulation time (25, 50, 75 hr).]

The probabilities are stable under varying thresholds and over time.
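One way to derive such transition probabilities from a trace is to count observed (state, action) → next-state transitions and normalize. This is a sketch, not the authors' code; the helper name and the toy trace below are illustrative.

```python
# Estimate MDP transition probabilities from a trace by counting.
from collections import Counter, defaultdict

def estimate_transition_probs(transitions):
    """transitions: iterable of (state, action, next_state) tuples.
    Returns {(state, action): {next_state: probability}}."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1
    return {
        key: {s_next: n / sum(c.values()) for s_next, n in c.items()}
        for key, c in counts.items()
    }

# Toy trace: moving out a VM-high from a PM-high usually de-loads the PM.
trace = [
    ("PM-high", "move VM-high", "PM-med"),
    ("PM-high", "move VM-high", "PM-med"),
    ("PM-high", "move VM-high", "PM-low"),
    ("PM-high", "move VM-low", "PM-high"),
]
probs = estimate_transition_probs(trace)
```

The trace study's stability result is what makes this viable: because the estimated probabilities barely change over time or across nearby thresholds, they do not have to be re-estimated frequently.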
MDP Components

- State: a classification of the resource utilization of a PM
- Action: a migration of a VM in a certain state (VM-state)
- Transition probability: the probability that state s transitions to state s' after taking action a
- Reward: given after the transition from state s to state s' by taking action a

[Figure: example MDP with states s1 and s2, actions a1 and a2, annotated with transition probabilities and rewards.]
Rewarding Policy

Rewards for state changes (via a migration action):
- PM-high → PM-low: positive reward A
- PM-high → PM-med: positive reward B, with B > A

Rewards for taking no action:
- PM-low: positive reward C, with C > D
- PM-med: positive reward D
- PM-high: negative reward

Goal: avoid the heavily loaded state of PMs while constraining the number of VM migrations.
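The rewarding policy above amounts to a small lookup. The numeric values of A, B, C, and D below are illustrative (the talk only fixes the orderings B > A and C > D), and the penalty assigned to any other migration is an added assumption to discourage unnecessary moves:

```python
# Sketch of the rewarding policy (symbolic constants made concrete;
# values are illustrative, keeping the stated orderings B > A, C > D).
A, B, C, D = 2.0, 3.0, 1.0, 0.5
PENALTY = -1.0

def reward(state, action, next_state):
    """Reward for transitioning state -> next_state under an action."""
    if action != "no-op":                       # a VM migration happened
        if state == "PM-high" and next_state == "PM-low":
            return A
        if state == "PM-high" and next_state == "PM-med":
            return B
        return PENALTY                          # assumed: discourage other migrations
    # No action taken: reward staying lightly loaded, punish staying high.
    return {"PM-low": C, "PM-med": D, "PM-high": PENALTY}[next_state]
```

The positive rewards for idle PM-low/PM-med states and the migration penalty together encode the goal: escape heavy load, but do not migrate gratuitously.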
MDP-based Load Balancing

[Figure (recap): the MDP model (PM states, PM actions, transition probabilities, rewards) yields the optimal action policy π; each PM periodically checks its state, refers to π, and takes the indicated action, while the probabilities are periodically updated from the resource utilization monitor.]
Optimal Action Determination

Value-iteration algorithm: for each state, find an action that quickly leads to the maximum reward. The value V of a state is computed from the transition probabilities and rewards.

Basic idea:
- Set V = 0 for all states
- Iteratively update V with the Bellman update V(s) ← max_a Σ_{s'} P(s' | s, a) [R(s, a, s') + γ V(s')]
- Update the optimal action for each state in every iteration

The number of iterations indicates how many future steps are considered.
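The procedure above can be sketched with standard value iteration. The two-state model, transition probabilities, and rewards below are toy illustrative numbers, not the trace-derived MDP:

```python
# Minimal value-iteration sketch: repeatedly apply
#   V(s) <- max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
# and read off the maximizing action as the policy pi.

def value_iteration(states, actions, P, R, gamma=0.9, iterations=50):
    """P[(s,a)] = {s': prob}, R[(s,a,s')] = reward. Returns (V, pi)."""
    V = {s: 0.0 for s in states}       # start from V = 0
    pi = {}
    for _ in range(iterations):        # each iteration looks one step further ahead
        new_V = {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                q = sum(p * (R[(s, a, s2)] + gamma * V[s2])
                        for s2, p in P[(s, a)].items())
                if q > best_q:
                    best_a, best_q = a, q
            new_V[s], pi[s] = best_q, best_a
        V = new_V
    return V, pi

# Toy two-state PM model: from 'high', migrating likely reaches 'low';
# staying keeps the PM overloaded (negative reward).
states, actions = ["high", "low"], ["migrate", "stay"]
P = {("high", "migrate"): {"low": 0.8, "high": 0.2},
     ("high", "stay"):    {"high": 1.0},
     ("low",  "migrate"): {"low": 1.0},
     ("low",  "stay"):    {"low": 1.0}}
R = {("high", "migrate", "low"): 1.0, ("high", "migrate", "high"): -1.0,
     ("high", "stay", "high"): -1.0,
     ("low", "migrate", "low"): -0.5,   # migrating from a light PM costs
     ("low", "stay", "low"): 0.5}
V, pi = value_iteration(states, actions, P, R)
```

With these numbers, the computed policy migrates a VM only when the PM is heavily loaded, matching the rewarding policy's intent.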
Experimental Setup

- Simulator: CloudSim
- Traces: PlanetLab VM trace and Google Cluster trace
- Scale: 1000 VMs on 100 PMs connected by a DC network; load balancing is conducted every 300 seconds, 30 times
- Two versions implemented:
  - MDP: uses the MDP model for migration VM selection
  - MDP*: uses the model for both migration VM selection and destination PM selection
- Comparison methods: Sandpiper [1] and CloudScale [2]

[1] T. Wood, P. J. Shenoy, A. Venkataramani, and M. S. Yousif. Black-box and gray-box strategies for virtual machine migration. In Proc. of NSDI, 2007.
[2] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In Proc. of SOCC, 2011.
The Number of VM Migrations

[Figure: total number of migrations vs. load (1.5×, 2×, 2.5× the original trace load) for MDP, MDP*, Sandpiper, and CloudScale.]

Result: MDP* < MDP < Sandpiper < CloudScale
Conclusion: MDP* has the fewest migrations, i.e., the lowest load balancing overhead.
The Number of Overloaded PMs

[Figure: total number of overloaded PMs vs. load (1.5×, 2×, 2.5× the original trace load) for MDP, MDP*, Sandpiper, and CloudScale.]

Result: MDP* < MDP < Sandpiper < CloudScale
Conclusion: MDP* has the fewest overloads, i.e., it maintains the load balance state for the longest time.
CPU Time Consumption

[Figure: CPU time (ms) vs. VM/PM ratio (2.5, 3, 3.5) for MDP, MDP*, Sandpiper, and CloudScale.]

Result: MDP* < MDP < CloudScale < Sandpiper
Conclusion: MDP* is the fastest at load balancing.
Total Energy Consumption

[Figure: energy (kWh) vs. VM/PM ratio (2.5, 3, 3.5) for MDP, MDP*, Sandpiper, and CloudScale.]

Result: MDP* < MDP < Sandpiper < CloudScale
Conclusion: MDP* consumes the least energy.
Conclusion from Experimental Results

The MDP model:
- Maintains the load balance state for a longer time
- Reduces SLA violations
- Reduces the load balancing overhead and delay
Summary

- Motivation: provide a load balancing method that reduces SLA violations while also reducing load balancing overhead and delay
- We propose an MDP-based load balancing method as an online decision-making strategy to enhance the performance of cloud datacenters
- We conducted trace-driven experiments showing that our method outperforms previous reactive and proactive methods
- Future work: make the method fully distributed to increase its scalability