Multiple Virtual Machine Live Migration
in Federated Cloud Systems
Walter Cerroni
Dept. of Electrical, Electronic and Information Engineering
University of Bologna, Italy
Motivations
• Success of cloud services and platforms
– significant savings in enterprise’s IT costs
– increasing number of mobile cloud users (e.g., social media)
• Huge growth of cloud computing investments
– public cloud market revenues in 2013: $ 58B
– expected to reach $ 191B by 2020 (source: Forrester, 2014)
• Increasing demand for computing, storage and
communication resources within Data Centers (DCs)
– R&D on DC infrastructure technologies
– advanced intra-DC and inter-DC networking solutions
Federated Cloud Computing
• DC over-provisioning may be too costly
– expensive computing and communication equipment
– energy consumption
• Federated cloud systems
– mutual agreement among different cloud providers
– smart workload sharing across multiple DC resources
– increased flexibility and mobility of cloud services
• How to design the inter-DC interconnection network?
– efficiently planning the underlying communication infrastructure
– providing the required level of QoS
– considering the specific workload of cloud services
Service Virtualization
• Service virtualization is widely used for DC administration
and maintenance
– decoupling service instances from underlying processing and
storage hardware
– key enabler for cloud federations
• Advantages of OS virtualization: Virtual Machines (VMs)
– platform independent
– quick deployment of new service instances
– easy service replication and migration → flexibility and mobility
– effective load balancing and server consolidation
– easy backup and restore procedures
Live Migration of Virtual Machines
• Live migration of VMs
– moving services from one host/DC to another with minimal
disruption to end-user service availability
– current state of VM’s kernel and running processes is maintained
• Generalized to multiple-VM live migration
– moving groups of correlated VMs and virtual networks
– many multi-tier applications run across multiple VMs
– emerging Network Function Virtualization (NFV) solutions to
transform vendor-dependent network equipment into software
apps running on standard HW
– Software Defined Networking (SDN) technologies can also help
maintain the network state
Live Migration of Virtual Machines
• Focus on memory migration
– storage migration through NAS synchronization (background traffic)
– network state migration through SDN
• Two approaches
– pre-copy: push most of the memory pages to destination host
before stopping VM at source host
– post-copy: pull most of the memory pages from source host
after resuming VM at destination host
• We assume the pre-copy approach (adopted by Xen, KVM, VirtualBox, etc.)
– iterative push phase: memory pages modified in a given round are sent
again in the next round, until the total size of dirty
pages falls below a given threshold or a maximum
number of iterations is reached
– stop-and-copy phase: the VM is suspended at the source host and the
remaining dirty pages are copied to the destination
– resume phase: the VM is resumed at the destination with consistent
memory and network state
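The iterative push loop described above can be sketched in a few lines. This is an illustrative model under the slides' assumptions (fixed dirtying rate, constant link rate); the function and parameter names are mine, not those of any hypervisor implementation:

```python
def precopy_migration(mem_bits, dirty_rate, link_rate, threshold, max_rounds):
    """Simulate the pre-copy iterative push + stop-and-copy phases.

    mem_bits:   VM memory size (bits); the first round pushes it all
    dirty_rate: fixed page dirtying rate (bits/s); must be < link_rate
                for the pre-copy algorithm to be sustainable
    link_rate:  migration channel bit rate (bits/s)
    threshold:  dirty-memory size below which the VM is suspended (bits)
    max_rounds: cap on the number of push iterations
    Returns (push_time, downtime, rounds).
    """
    to_send, push_time, rounds = mem_bits, 0.0, 0
    while True:
        duration = to_send / link_rate     # time to push this round's pages
        push_time += duration
        rounds += 1
        dirtied = dirty_rate * duration    # pages dirtied meanwhile
        if dirtied <= threshold or rounds >= max_rounds:
            # stop-and-copy: suspend the VM and copy the last dirty pages
            return push_time, dirtied / link_rate, rounds
        to_send = dirtied                  # next round re-sends dirty pages
```

For example, with 1 GB of memory (8e9 bits), a 1 Gbps channel and a 100 Mbps dirtying rate, the dirty set shrinks by a factor of 10 per round, so a handful of iterations suffices before the stop-and-copy phase.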
Performance Metrics for VM Live Migration
[Figure: timeline of copied and dirtied memory pages over the iterative push, stop-and-copy, and resume phases]
• Downtime: the amount of time the VM is suspended
→ measures the end-user's perceived quality
• Total Migration Time: the amount of time needed to copy the whole memory
→ measures the impact of the migration process on both the communication
infrastructure and computing resource utilization (busy during the whole migration time)
Simplified Model of VM Live Migration
• generic request to migrate a set of VMs
• all VMs in the given set have the same amount of memory
• all VMs show the same fixed page dirtying rate
• all VMs have the same memory page size
• the bit rate used to migrate each VM is constant
• condition for the pre-copy algorithm to be sustainable: pages must be
copied faster than they are dirtied
[Equation: total migration time of a VM as a function of the number of iterations, the dirty-memory size threshold, and the maximum number of iterations]
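A standard pre-copy timing model consistent with these assumptions (the notation is mine, not necessarily the slide's): let V be the VM memory size, R the migration bit rate, D the dirtying rate in bits/s, and λ = D/R. Round 0 copies the whole memory, and each later round re-sends the pages dirtied during the previous one:

```latex
t_i = \frac{V\,\lambda^{\,i}}{R} \quad (i = 0,\dots,n-1),
\qquad
T_{\mathrm{down}} \approx \frac{V\,\lambda^{\,n}}{R},
\qquad
T_{\mathrm{mig}} = \sum_{i=0}^{n-1} t_i + T_{\mathrm{down}}
                 = \frac{V}{R}\cdot\frac{1-\lambda^{\,n+1}}{1-\lambda}
```

Here n is the number of push iterations, i.e. the first round at which the dirty memory V·λⁿ falls below the threshold (or the maximum iteration count is reached). The process is sustainable only if λ < 1, i.e. pages are transferred faster than they are dirtied.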
Sequential vs. Parallel VM Migration
• Sequential: migration of one VM at a time at full network channel capacity
• Parallel: simultaneous migration of all VMs, equally sharing the channel bit rate
• A smaller transfer bit rate with the same dirtying rate leads to more iterations
in parallel migration than in sequential migration
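The effect can be checked numerically under the same simplified model. This is an illustrative sketch with names of my choosing; for simplicity it ignores the pages dirtied by sequential VMs while they wait their turn:

```python
def migration_times(M, V, D, R, thr, kmax):
    """Compare sequential vs. parallel pre-copy migration of M identical VMs.

    V: memory per VM (bits), D: page dirtying rate (bits/s),
    R: total channel bit rate (bits/s), thr: stop-and-copy threshold (bits),
    kmax: maximum number of push iterations.
    Returns (seq_total, seq_downtime, par_total, par_downtime) in seconds.
    """
    def one_vm(rate):
        # Pre-copy migration of a single VM at the given transfer rate.
        to_send, t, rounds = V, 0.0, 0
        while True:
            dt = to_send / rate
            t += dt
            rounds += 1
            dirty = D * dt
            if dirty <= thr or rounds >= kmax:
                return t + dirty / rate, dirty / rate  # add stop-and-copy
            to_send = dirty

    t_one, d_one = one_vm(R)       # sequential: one VM at a time, full rate R
    t_par, d_par = one_vm(R / M)   # parallel: all VMs share the channel equally
    return M * t_one, d_one, t_par, d_par
```

With the lower per-VM rate R/M, the dirty set shrinks more slowly per round, so the parallel case needs more iterations and ends with both a longer total migration time and a larger downtime.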
[Figure: trade-off between sequential and parallel migration in terms of total migration time and downtime]
Federated Cloud Network Scenario
• Federated DCs are interconnected by a full mesh of
guaranteed-bandwidth network pipes
– pre-established MPLS LSPs between edge routers
– pre-established lightpaths on optical inter-DC network
• Workload of multiple VMs migrating from source DC can
be hosted by a subset of remote federated DCs
– not enough computing resources available in some DCs
– service-specific DC location constraints (e.g., due to latency)
– other constraints due to load balancing, energy savings, etc.
• Available remote DC resources are assigned following the
anycast service model
– any DC in the available/suitable subset is equivalent for hosting
the group of VMs to be migrated
[Figure: federated cloud network scenario — DCs interconnected over a MAN/WAN, with two groups of VMs (VM set 1, VM set 2) to be migrated]
Inter-DC Network Model Hypotheses
• H.1: each request z needs to migrate the same number M of VMs
• H.2: each multi-VM migration consumes the same amount of
channel capacity b
• H.3: each network pipe provides the same total amount of
guaranteed capacity B
• H.4: each remote DC has the computing and storage capacity to host
up to k groups of M VMs
• H.5: each migration request is allowed to choose among m instances
of the requested computing/storage resources, which are randomly
distributed over the n remote DCs
– considering the general case when multiple instances of the same
resources can be available in the same DC
• H.6: network state, as seen by a given DC, is the number r of
ongoing multi-VM migrations originated by that DC
– r = 0, 1, 2, … , n*B/b
Inter-DC Network Model
Example with n = 3, k = 4, m = 2, b = B (each pipe can carry one migration at a time)
• r = 0: request z1 arrives; its m = 2 candidate resource instances are Cz11 and Cz12
• r = 1: z1 is accepted and migrates towards the DC hosting Cz11, occupying that DC's pipe
• r = 1: request z2 arrives with candidate instances Cz21 and Cz22
• r = 2: z2 is accepted towards the DC hosting Cz22; two of the three pipes are now busy
• r = 2: request z3 arrives, but both of its candidate instances (Cz31, Cz32) are hosted
by DCs whose pipes are already busy → z3 is blocked
• r = 2: request z4 arrives with candidate instances Cz41 and Cz42
• r = 3: z4 is accepted towards the DC hosting Cz41; all three pipes are now busy,
with ongoing migrations z1, z2, z4
• When a migration completes, its pipe is released and the network state r decreases
Markovian Model of Migration Request Blocking
• Multi-VM migration requests arrive as a Poisson process
– request arrival rate
• Service time (when channel capacity b is busy) is the total migration time
– service rate
– offered load
– loss system: results valid for any service-time distribution with
the same average
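A natural formalization of the bullets above (the notation is mine; the slide's own symbols were elided): with request arrival rate λ and mean total migration time E[T_mig],

```latex
\mu \;=\; \frac{1}{E[T_{\mathrm{mig}}]},
\qquad
A \;=\; \frac{\lambda}{\mu} \;=\; \lambda \, E[T_{\mathrm{mig}}]
```

By the loss-system insensitivity noted on the slide, the blocking probability depends on the service-time distribution only through the average E[T_mig].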
Approximate Sub-state Probabilities
• Given state r, many combinations of connections to DCs are possible
• An exact solution would require computing all sub-state probabilities
• Approximate solution with a reduced state space, considering only
"forward" state evolution
• Recursive expression of the sub-state probabilities
[Figure: state diagram for n = 3, B = 3b, showing the probability that all m suitable resources are hosted by unreachable DCs and the probability that a request is blocked in state 5]
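Under hypothesis H.5 the m instance locations are independent and uniform over the n remote DCs. So, for a sub-state in which j of the n DCs are unreachable (their pipes fully busy), the probability that a request finds all of its suitable resources behind busy pipes can be sketched as follows (my notation, not necessarily the slide's exact expression):

```latex
q_j \;=\; \left(\frac{j}{n}\right)^{\!m},
\qquad
P_B \;=\; \sum_{r} \pi_r \, \beta_r
```

where β_r averages q_j over the sub-states of network state r, and π_r is the steady-state probability of state r.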
Markovian Model of Migration Request Blocking
• Blocking probability: [equation]
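The blocking probability that the model computes analytically can also be estimated by simulation. The following is a minimal Monte Carlo sketch of my own (not the authors' simulator); it ignores the hosting capacity k and any sub-state bookkeeping:

```python
import heapq
import random

def blocking_probability(n, m, cap, lam, mu, num_arrivals, seed=1):
    """Monte Carlo estimate of the multi-VM migration blocking probability.

    n:   number of remote federated DCs
    m:   candidate resource instances per request, placed independently and
         uniformly over the n DCs (instances may share a DC, per H.5)
    cap: concurrent migrations per network pipe (cap = B/b)
    lam: Poisson arrival rate of migration requests
    mu:  service rate, i.e. 1 / mean total migration time (exponential here)
    A request is blocked when every DC hosting one of its m instances sits
    behind a fully busy pipe. Hosting capacity k is ignored for simplicity.
    """
    rng = random.Random(seed)
    busy = [0] * n              # ongoing migrations on each DC's pipe
    departures = []             # min-heap of (finish_time, dc)
    t, blocked = 0.0, 0
    for _ in range(num_arrivals):
        t += rng.expovariate(lam)
        while departures and departures[0][0] <= t:
            _, dc = heapq.heappop(departures)
            busy[dc] -= 1       # a completed migration frees pipe capacity
        candidates = {rng.randrange(n) for _ in range(m)}
        free = [dc for dc in candidates if busy[dc] < cap]
        if not free:
            blocked += 1        # anycast fails: all candidate DCs unreachable
        else:
            dc = rng.choice(free)
            busy[dc] += 1
            heapq.heappush(departures, (t + rng.expovariate(mu), dc))
    return blocked / num_arrivals
```

Running it with the example parameters (n = 3, m = 2, b = B so cap = 1) and increasing offered load reproduces the qualitative behavior: the blocking probability grows with the load.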
Numerical Results
• VM memory size distribution
– bimodal distribution: groups of large or small VMs
– one memory size with probability 75%, the other with probability 25%
• Reference values for model parameters
• Model results plus simulations to validate model accuracy
Sequential Migration Blocking Probability
• Good match with simulations → reasonable accuracy
• The model allows dimensioning the inter-DC network pipe capacity
Parallel Migration Blocking Probability
• Parallel migration shows worse performance than sequential
migration, due to the larger total migration time
Seq. vs. Par. Migration Time and Downtime
• The model allows quantifying the trade-off between sequential and
parallel migration
[Figure: sequential vs. parallel total migration time and downtime, B = 3 Gbps]
Impact of the Cloud Federation Size
• The blocking rate can be reduced by increasing the number of DCs
• Need to assess the resulting network infrastructure cost
Conclusion
• Analytical model for inter-DC network dimensioning in federated
cloud systems
• Network load generated by multiple VM live migration
– performance depends on migration schedule and resources
– sequential vs. parallel migration
– trade off network resource usage with end-user’s perceived quality
• Further studies ongoing
– relax some simplifying assumptions
– different bandwidth allocation strategies
– consider real DC traffic profiles and VM memory profiles
– verify that the trade-off holds in general
– memory transfer synchronization may help limit the downtime