Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on...

28
Services Computing Technology and System Lab (SCTS) Cluster and Grid Computing Lab (CGCL) Live Migration of Virtual Machine Based on Full System Trace and Replay School of Computer Science and Technology Huazhong University of Science and Technology Haikun Liu, Hai Jin, Xiaofei Liao, Liting Hu, Chen Yu

Transcript of Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on...

Page 1: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

Services Computing Technology and System Lab (SCTS)Cluster and Grid Computing Lab (CGCL)

Live Migration of Virtual Machine Based on Full

System Trace and Replay

School of Computer Science and Technology

Huazhong University of Science and Technology

Haikun Liu, Hai Jin, Xiaofei Liao, Liting Hu, Chen Yu

Page 2: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

2

Outline

IntroductionDeterministic replay with execution traceSystem design and implementationPerformance EvaluationConclusion

Page 3: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

3

Live VM Migration

It migrates whole OS with running applications rather than single processes in a nondisruptive fashion , avoiding residual dependencies.

Applicable scenarios (cluster, data center)

Load Balancing Online Maintenance and Fault Tolerance Power Management for Green Computing

Page 4: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

4

Live VM Migration

VM migration issuesMemoryNetwork connectionDisk storage

Performance metrics Migration Downtime Total Migration Time Total Data Transferred Migration overhead

Page 5: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

5

Previous approaches

Memory pre-copying: XenMotion, VMotion

Pre-Migration phase (select alternate physical host and resource reservation)Memory push phase (pre-copying algorithm)Stop-and-copy phaseResume phase

Source host A

Target host B

Pre-migration

Stop-and-copy

Round 1Round 2Round n……

resume

Page 6: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

6

Some shortcomings of XenMotion

The memory transferring speed should be much faster than the memory page dirtied;Memory-to-memory approach is only efficient in a LAN but bring little benefit for low-bandwidth WAN; Just copy memory, but can’t copy the cache, TLB at the stop-and-copy phase;Some optimization may cause bad experience for services.

Page 7: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

7

CR/TR-Motion: A novel VM migration approach

ReVirt is adopted as our groundwork.Checkpointing/recovery combining with trace/replay technology are used to provide fast and transparent live VM migration.We orchestrate the running source and target VM with execution trace logged on the source host.

Page 8: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

8

ReVirt overview

ReVirt is a full-system trace and replay system, it was ported on UMLinux.

ReVirt only log non-deterministic events (time and external input)

ReVirt adds reasonable time and space overhead, Logging adds only 0-8% performance overhead, and log growth rates range from 40 MB per day to 1.2GB per day for different workloads .

Guest OS

Host OSVMM Event Logger

Guest Apps

Host hardware

Guest Apps

Page 9: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

9

CR/TR-Motion system structure

Page 10: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

10

Migration process

AB

Checkpoint

log2

Replay log1

…… ……

Round 1

log1

Round 2

VM Recovery

Checkpoint

log3

Round n

Stop and copyTransfer log n

Replay log n

Take over A

Waiting and chasing phase

……

Page 11: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

11

Two prerequisite

Is log transfer speed faster than log growth speed?

Is log replay speed on the target host faster than log growth speed?

Page 12: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

12

log and replay rate in our experiments

Workloads Log growth rate (KB/s)

Log replay rate (KB/s)

290.7

1.27

402.6

866.4

Rreplay/Rlog

daily use 7.9 36.8

kernel-build 1.2 1.05

static web app 247 1.63

dynamic web app

722 1.20

unixbench 61 65.88 1.08

Page 13: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

13

Migration challenges Migrate block devices

NAS or SAN can be accessed uniformly from all host machines in the cluster. Solve WAR (write after read) issues when replay on the target host in LAN.

Guarantee Network connections The same solution as XenMotion (ARP reply in LAN)

Source host① read

② write target hostX③ read

Page 14: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

14

Performance evaluation

AMD Athlon 2500+ processor and 1GB DDR RAM, Intel Pro/1000 Gbit/s NIC (limit the network bandwidth to 500Mbit/sec)The virtual machine is configured to use 512MB of RAM.The VM being migrated is the only VM running on the source machine and there are no VMs running on the target machine.

Page 15: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

15

Migration downtime

Our approach reduced migration downtime by 72.4% in average compared to pre-copy approach.

0

50

100

150

200

250

300

Daily use Kernel-build Static webapp

Dynamicweb app

UnixBench

Dow

ntim

e(m

s)

CR/TR-MotionPre-copy

Page 16: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

16

Total migration time

Our approach reduces the total migration time by 31.5% in average compared to Pre-copy.

0

10

20

30

40

50

60

70

80

90

100

Daily use Kernel-build Static webapp

Dynamicweb app

UnixBench

Tota

l mig

ratio

n tim

e (s

)

CR/TR-MotionPre-copy

Page 17: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

17

Total data transferred

CR/TR-Motion reduces synchronization traffic by 95.9% in average. This improvement will bring great benefit when our migration scheme is applied in low-bandwidth WANs. daily use 0.48 (0.04) 38.54 (2.1) 98.8%

kernel-build 0.53 (0.06) 152.44 (8.2) 99.6% static web app 8.34 (0.21) 228.99 (9.4) 96.4% dynamic web app 36.4 (0.96) 288.05 (12.2) 87.4% unixbench 2.59 (0.22) 113.38 (6.4) 97.7%

0

100

200

300

400

500

600

700

800

900

Daily use Kernel-build Static webapp

Dynamicweb app

UnixBenchTo

tal D

ata

Tran

sfer

red

(MB

) CR/TR-MotionPre-copy

Page 18: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

18

Network Throughput of Migration

The above figures demonstrate that the migrations for those workloads cause reasonable network traffic and bandwidth consumption.

0 10 20 30 40 50 600

50

100

150

200

250

300

350

400

450

Stop-and-copyIterative log transferringTh

roug

htpu

t (M

bit/s

ec)

Elapsed time (second)

transferring checkpoint

Waiting-and-Chasing

Dynamic Web Application

0 1 2 3 4 5 6 7 8 9 10 11 120

50

100

150

200

250

300

350

400

450Daily Use

Thro

ught

put (

Mbi

t/sec

)

Elapsed time (second)

Stop-and-copy

Iterative log transferring

transferring checkpoint

Page 19: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

19

Migration overhead

Overheads caused by transparent checkpointing and logging is less than 8.5% on average. It takes about 30% of a CPU to attain the maximum network throughput over the gigabit link.

0 10 20 30 40 500

10

20

30

40

50

60

70

80

90

100

110

Tota

l mia

grat

ion

time

(sec

ond)

% CPU Reversed

Daily use Kernel-build Static web app Dynamic web app UnixBench

0

10

20

3040

50

60

70

Kernel-build Static webapp

Dynamic webapp

UnixBench

Ela

pse

time

(sec

ond)

0.0%

2.0%

4.0%

6.0%

8.0%

10.0%

12.0%

Mig

ratio

n ov

erhe

ad

stand-alone runtime without migrationruntime during migration migration overhead

Page 20: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

20

Conclusion

CR/TR-Motion provide a novel VM migration approach based on full-system trace and replay, that contrasts against memory-to-memory approach.Our scheme minimize the migration downtime and network bandwidth consumption.Our approach may bring a new benefit if the migration is performed in low-bandwidth WAN environments.

Page 21: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

21

Questions and Comments!

Services Computing Technology and System Lab (SCTS)Cluster and Grid Computing Lab (CGCL)

Page 22: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

22

Parameters definition

Total Data Transmitted: TDTTransfer rounds: nTotal Migration Time: TMTTime sequence to transfer instruction logs:(t0,t1,t2,t3,……,tn)Log serial at each transfering rounds:(log1,log2,log3,……,logn)Checkpoint Volume: Vckpt

Log transfer Rate: Rtrans

Log growth Rate: Rlog

Log replay rate: Rreplay

Vthd = 1KB : the threshold value at which the iteration terminated φ: Rlog/Rtrans,

: Rreplay/Rlog

Page 23: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

23

Scenario 1: fast synchronization

Rreplay>>Rlog

No need to execute the waiting-and-chasing phase

The key factor is parameters: Rtrans and Rlog

Migration process of fast synchronization

Page 24: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

24

Iteration rounds (scenario 1)

Supposing iteration terminated when the log size is less that Vthd =1kB

( 1)n log thdt R V− ≤

( 1)nckpt thdV Vϕ − ≤i.e.

i.e.

Rtrans=400Mbit/sec and Vckpt=512MB

1 log thd

ckpt

VnVϕ

⎡ ⎤= + ⎢ ⎥

⎢ ⎥⎢ ⎥

Page 25: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

25

Total Migration Time (Scenario 1)

checkpoint0

trans

Vt

R= log 0

1 0trans

R tt t

Rϕ= =

( 1)log ( 1) log ( 1)

0

nn ckpt n

n ntrans trans

R t V Rt t

R Rϕ

−− −= = =

……

n

i=0

(1 )TMT= =

(1 )

nckpt

itrans

Vt

Rϕϕ

−−∑ Rtrans=400Mbit/sec and Vckpt=512MB

Page 26: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

26

Conclusions ( Scenario 1 )

Total Data Transmitted

Downtime

1

(1 )TDT=V + =TMT *

(1 )i

nnckpt

ckpt log transi

VV R

ϕϕ=

−=

−∑

ndowntime n log replay otherT t V R t= + +

Page 27: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

27

Scenario 2: slow synchronizationRreplay>Rlog, but log file is not replayed fast enough.

A waiting-and-chasing phase is performed to postpone the stop-and-copy phase.

Migration process with a waiting-and-chasing synchronization phase

1 1i i

m mckpti i

log replay trans

log logV V VR R R= =− <=∑ ∑Condition:

Page 28: Live Migration of Virtual Machine Based on Full System ...Live Migration of Virtual Machine Based on Full System Trace and Replay ... Migration Downtime ... zThe virtual machine is

28

Conclusions ( Scenario 2 )

Total Data Transmitted

Total migration time

1= + ( 1)

1i

n

ckpt ckpti

logTDT V V Vϕ=

∂= +∂ −∑

1 1 ( )1

i

nckpti

trans replay trans

ckpt log logV V V VTMT

R R Rϕ=

+ ∂= + = +∂ −