Multiple Virtual Machine Live Migration
in Federated Cloud Systems
Walter Cerroni
Dept. of Electrical, Electronic and Information Engineering
University of Bologna, Italy
Motivations
• Success of cloud services and platforms
– significant savings in enterprise’s IT costs
– increasing number of mobile cloud users (e.g., social media)
• Huge growth of cloud computing investments
– public cloud market revenues in 2013: $ 58B
– expected to reach $ 191B by 2020 (source: Forrester, 2014)
• Increasing demand for computing, storage and
communication resources within Data Centers (DCs)
– R&D on DC infrastructure technologies
– advanced intra-DC and inter-DC networking solutions
Federated Cloud Computing
• DC over-provisioning may be too costly
– expensive computing and communication equipment
– energy consumption
• Federated cloud systems
– mutual agreement among different cloud providers
– smart workload sharing across multiple DC resources
– increased flexibility and mobility of cloud services
• How to design the inter-DC interconnection network?
– efficiently planning the underlying communication infrastructure
– providing the required level of QoS
– considering the specific workload of cloud services
Service Virtualization
• Service virtualization is widely used for DC administration
and maintenance
– decoupling service instances from underlying processing and
storage hardware
– key enabler for cloud federations
• Advantages of OS virtualization: Virtual Machines (VMs)
– platform independent
– quick deployment of new service instances
– easy service replication and migration → flexibility and mobility
– effective load balancing and server consolidation
– easy backup and restore procedures
Live Migration of Virtual Machines
• Live migration of VMs
– moving services from one host/DC to another with minimal
disruption to end-user service availability
– current state of VM’s kernel and running processes is maintained
• Generalized to multiple-VM live migration
– moving groups of correlated VMs and virtual networks
– many multi-tier applications run across multiple VMs
– emerging Network Function Virtualization (NFV) solutions to
transform vendor-dependent network equipment into software
apps running on standard HW
– Software Defined Networking (SDN) technologies can also help
maintain the network state
Live Migration of Virtual Machines
• Focus on memory migration
– storage migration through NAS synchronization (background traffic)
– network state migration through SDN
• Two approaches
– pre-copy: push most of the memory pages to destination host
before stopping VM at source host
– post-copy: pull most of the memory pages from source host
after resuming VM at destination host
• We assume the pre-copy approach (adopted by Xen, KVM, VirtualBox, etc.)
– iterative push phase: memory pages modified in a given round are sent
again in the next round, until the total size of dirty
pages falls below a given threshold or a maximum
number of iterations is reached
– stop-and-copy phase: the VM is suspended at the source host and the
remaining dirty pages are copied to the destination
– resume phase: the VM is resumed at the destination with consistent
memory and network state
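The iterative push loop described above can be sketched in a few lines. This is an illustrative model under the slides' assumptions (fixed dirtying rate, constant link rate); the function and parameter names are mine, not those of any hypervisor implementation:

```python
def precopy_migration(mem_bits, dirty_rate, link_rate, threshold, max_rounds):
    """Simulate the pre-copy iterative push + stop-and-copy phases.

    mem_bits:   VM memory size (bits); the first round pushes it all
    dirty_rate: fixed page dirtying rate (bits/s); must be < link_rate
                for the pre-copy algorithm to be sustainable
    link_rate:  migration channel bit rate (bits/s)
    threshold:  dirty-memory size below which the VM is suspended (bits)
    max_rounds: cap on the number of push iterations
    Returns (push_time, downtime, rounds).
    """
    to_send, push_time, rounds = mem_bits, 0.0, 0
    while True:
        duration = to_send / link_rate     # time to push this round's pages
        push_time += duration
        rounds += 1
        dirtied = dirty_rate * duration    # pages dirtied meanwhile
        if dirtied <= threshold or rounds >= max_rounds:
            # stop-and-copy: suspend the VM and copy the last dirty pages
            return push_time, dirtied / link_rate, rounds
        to_send = dirtied                  # next round re-sends dirty pages
```

For example, with 1 GB of memory (8e9 bits), a 1 Gbps channel and a 100 Mbps dirtying rate, the dirty set shrinks by a factor of 10 per round, so a handful of iterations suffices before the stop-and-copy phase.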
Performance Metrics for VM Live Migration
[Figure: timeline of copied and dirtied memory pages over the iterative push, stop-and-copy, and resume phases]
• Downtime: the amount of time the VM is suspended
→ measures the end-user's perceived quality
• Total Migration Time: the amount of time needed to copy the whole memory
→ measures the impact of the migration process on both the communication
infrastructure and computing resource utilization (busy during the whole migration time)
Simplified Model of VM Live Migration
• generic request to migrate a set of VMs
• all VMs in the given set have the same amount of memory
• all VMs show the same fixed page dirtying rate
• all VMs have the same memory page size
• the bit rate used to migrate each VM is constant
• condition for the pre-copy algorithm to be sustainable: pages must be
copied faster than they are dirtied
[Equation: total migration time of a VM as a function of the number of iterations, the dirty-memory size threshold, and the maximum number of iterations]
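A standard pre-copy timing model consistent with these assumptions (the notation is mine, not necessarily the slide's): let V be the VM memory size, R the migration bit rate, D the dirtying rate in bits/s, and λ = D/R. Round 0 copies the whole memory, and each later round re-sends the pages dirtied during the previous one:

```latex
t_i = \frac{V\,\lambda^{\,i}}{R} \quad (i = 0,\dots,n-1),
\qquad
T_{\mathrm{down}} \approx \frac{V\,\lambda^{\,n}}{R},
\qquad
T_{\mathrm{mig}} = \sum_{i=0}^{n-1} t_i + T_{\mathrm{down}}
                 = \frac{V}{R}\cdot\frac{1-\lambda^{\,n+1}}{1-\lambda}
```

Here n is the number of push iterations, i.e. the first round at which the dirty memory V·λⁿ falls below the threshold (or the maximum iteration count is reached). The process is sustainable only if λ < 1, i.e. pages are transferred faster than they are dirtied.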
Sequential vs. Parallel VM Migration
• Sequential: migration of one VM at a time at full network channel capacity
• Parallel: simultaneous migration of all VMs, equally sharing the channel bit rate
• A smaller transfer bit rate with the same dirtying rate leads to more iterations
in parallel migration than in sequential migration
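The effect can be checked numerically under the same simplified model. This is an illustrative sketch with names of my choosing; for simplicity it ignores the pages dirtied by sequential VMs while they wait their turn:

```python
def migration_times(M, V, D, R, thr, kmax):
    """Compare sequential vs. parallel pre-copy migration of M identical VMs.

    V: memory per VM (bits), D: page dirtying rate (bits/s),
    R: total channel bit rate (bits/s), thr: stop-and-copy threshold (bits),
    kmax: maximum number of push iterations.
    Returns (seq_total, seq_downtime, par_total, par_downtime) in seconds.
    """
    def one_vm(rate):
        # Pre-copy migration of a single VM at the given transfer rate.
        to_send, t, rounds = V, 0.0, 0
        while True:
            dt = to_send / rate
            t += dt
            rounds += 1
            dirty = D * dt
            if dirty <= thr or rounds >= kmax:
                return t + dirty / rate, dirty / rate  # add stop-and-copy
            to_send = dirty

    t_one, d_one = one_vm(R)       # sequential: one VM at a time, full rate R
    t_par, d_par = one_vm(R / M)   # parallel: all VMs share the channel equally
    return M * t_one, d_one, t_par, d_par
```

With the lower per-VM rate R/M, the dirty set shrinks more slowly per round, so the parallel case needs more iterations and ends with both a longer total migration time and a larger downtime.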
[Figure: trade-off between sequential and parallel migration in terms of total migration time and downtime]
Federated Cloud Network Scenario
• Federated DCs are interconnected by a full mesh of
guaranteed-bandwidth network pipes
– pre-established MPLS LSPs between edge routers
– pre-established lightpaths on optical inter-DC network
• Workload of multiple VMs migrating from source DC can
be hosted by a subset of remote federated DCs
– not enough computing resources available in some DCs
– service-specific DC location constraints (e.g., due to latency)
– other constraints due to load balancing, energy savings, etc.
• Available remote DC resources are assigned following the
anycast service model
– any DC in the available/suitable subset is equivalent for hosting
the group of VMs to be migrated
[Figure: federated cloud network scenario — DCs interconnected over a MAN/WAN, with two groups of VMs (VM set 1, VM set 2) to be migrated]
Inter-DC Network Model Hypotheses
• H.1: each request z needs to migrate the same number M of VMs
• H.2: each multi-VM migration consumes the same amount of
channel capacity b
• H.3: each network pipe provides the same total amount of
guaranteed capacity B
• H.4: each remote DC has the computing and storage capacity to host
up to k groups of M VMs
• H.5: each migration request is allowed to choose among m instances
of the requested computing/storage resources, which are randomly
distributed over the n remote DCs
– considering the general case when multiple instances of the same
resources can be available in the same DC
• H.6: network state, as seen by a given DC, is the number r of
ongoing multi-VM migrations originated by that DC
– r = 0, 1, 2, … , n*B/b
Inter-DC Network Model
Example with n = 3, k = 4, m = 2, b = B (each pipe can carry one migration at a time)
• r = 0: request z1 arrives; its m = 2 candidate resource instances are Cz11 and Cz12
• r = 1: z1 is accepted and migrates towards the DC hosting Cz11, occupying that DC's pipe
• r = 1: request z2 arrives with candidate instances Cz21 and Cz22
• r = 2: z2 is accepted towards the DC hosting Cz22; two of the three pipes are now busy
• r = 2: request z3 arrives, but both of its candidate instances (Cz31, Cz32) are hosted
by DCs whose pipes are already busy → z3 is blocked
• r = 2: request z4 arrives with candidate instances Cz41 and Cz42
• r = 3: z4 is accepted towards the DC hosting Cz41; all three pipes are now busy,
with ongoing migrations z1, z2, z4
• When a migration completes, its pipe is released and the network state r decreases
Markovian Model of Migration Request Blocking
• Multi-VM migration requests arrive as a Poisson process
– request arrival rate
• Service time (when channel capacity b is busy) is the total migration time
– service rate
– offered load
– loss system: results valid for any service-time distribution with
the same average
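A natural formalization of the bullets above (the notation is mine; the slide's own symbols were elided): with request arrival rate λ and mean total migration time E[T_mig],

```latex
\mu \;=\; \frac{1}{E[T_{\mathrm{mig}}]},
\qquad
A \;=\; \frac{\lambda}{\mu} \;=\; \lambda \, E[T_{\mathrm{mig}}]
```

By the loss-system insensitivity noted on the slide, the blocking probability depends on the service-time distribution only through the average E[T_mig].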
Approximate Sub-state Probabilities
• Given state r, many combinations of connections to DCs are possible
• An exact solution would require computing all sub-state probabilities
• Approximate solution with a reduced state space, considering only
"forward" state evolution
• Recursive expression of the sub-state probabilities
[Figure: state diagram for n = 3, B = 3b, showing the probability that all m suitable resources are hosted by unreachable DCs and the probability that a request is blocked in state 5]
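Under hypothesis H.5 the m instance locations are independent and uniform over the n remote DCs. So, for a sub-state in which j of the n DCs are unreachable (their pipes fully busy), the probability that a request finds all of its suitable resources behind busy pipes can be sketched as follows (my notation, not necessarily the slide's exact expression):

```latex
q_j \;=\; \left(\frac{j}{n}\right)^{\!m},
\qquad
P_B \;=\; \sum_{r} \pi_r \, \beta_r
```

where β_r averages q_j over the sub-states of network state r, and π_r is the steady-state probability of state r.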
Markovian Model of Migration Request Blocking
• Blocking probability: [equation]
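The blocking probability that the model computes analytically can also be estimated by simulation. The following is a minimal Monte Carlo sketch of my own (not the authors' simulator); it ignores the hosting capacity k and any sub-state bookkeeping:

```python
import heapq
import random

def blocking_probability(n, m, cap, lam, mu, num_arrivals, seed=1):
    """Monte Carlo estimate of the multi-VM migration blocking probability.

    n:   number of remote federated DCs
    m:   candidate resource instances per request, placed independently and
         uniformly over the n DCs (instances may share a DC, per H.5)
    cap: concurrent migrations per network pipe (cap = B/b)
    lam: Poisson arrival rate of migration requests
    mu:  service rate, i.e. 1 / mean total migration time (exponential here)
    A request is blocked when every DC hosting one of its m instances sits
    behind a fully busy pipe. Hosting capacity k is ignored for simplicity.
    """
    rng = random.Random(seed)
    busy = [0] * n              # ongoing migrations on each DC's pipe
    departures = []             # min-heap of (finish_time, dc)
    t, blocked = 0.0, 0
    for _ in range(num_arrivals):
        t += rng.expovariate(lam)
        while departures and departures[0][0] <= t:
            _, dc = heapq.heappop(departures)
            busy[dc] -= 1       # a completed migration frees pipe capacity
        candidates = {rng.randrange(n) for _ in range(m)}
        free = [dc for dc in candidates if busy[dc] < cap]
        if not free:
            blocked += 1        # anycast fails: all candidate DCs unreachable
        else:
            dc = rng.choice(free)
            busy[dc] += 1
            heapq.heappush(departures, (t + rng.expovariate(mu), dc))
    return blocked / num_arrivals
```

Running it with the example parameters (n = 3, m = 2, b = B so cap = 1) and increasing offered load reproduces the qualitative behavior: the blocking probability grows with the load.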
Numerical Results
• VM memory size distribution
– bimodal distribution: groups of large or small VMs
– one memory size with probability 75%, the other with probability 25%
• Reference values for model parameters
• Model results plus simulations to validate model accuracy
Sequential Migration Blocking Probability
• Good match with simulations → reasonable accuracy
• The model allows dimensioning the inter-DC network pipe capacity
Parallel Migration Blocking Probability
• Parallel migration shows worse performance than sequential
migration, due to the larger total migration time
Seq. vs. Par. Migration Time and Downtime
• The model allows quantifying the trade-off between sequential and
parallel migration
[Figure: sequential vs. parallel total migration time and downtime, B = 3 Gbps]
Impact of the Cloud Federation Size
• The blocking rate can be reduced by increasing the number of DCs
• Need to assess the resulting network infrastructure cost
Conclusion
• Analytical model for inter-DC network dimensioning in federated
cloud systems
• Network load generated by multiple VM live migration
– performance depends on migration schedule and resources
– sequential vs. parallel migration
– trade off network resource usage with end-user’s perceived quality
• Further studies ongoing
– relax some simplifying assumptions
– different bandwidth allocation strategies
– consider real DC traffic profiles and VM memory profiles
– verify that the trade-off holds in general
– memory transfer synchronization may help limit the downtime