Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a...

79
Instant OS Updates via Userspace Checkpoint-and-Restart Sanidhya Kashyap, Changwoo Min, Byoungyoung Lee, Taesoo Kim, Pavel Emelyanov

Transcript of Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a...

Page 1: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Instant OS Updates via Userspace Checkpoint-and-Restart

Sanidhya Kashyap, Changwoo Min, Byoungyoung Lee,Taesoo Kim, Pavel Emelyanov

Page 2: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

OS updates are prevalent

Page 3: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

And OS updates are unavoidable

● Prevent known, state-of-the-art attacks– Security patches

● Adopt new features – New I/O scheduler features

● Improve performance– Performance patches

Page 4: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network
Page 5: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Unfortunately, system updates come at a cost

● Unavoidable downtime● Potential risk of system failure

Page 6: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Unfortunately, system updates come at a cost

● Unavoidable downtime● Potential risk of system failure

$109k per minuteHidden costs (losing customers)

Page 7: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Example: memcached

● Facebook's memcached servers incur a downtime of 2-3 hours per machine– Warming cache (e.g., 120 GB) over the network

Page 8: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Example: memcached

● Facebook's memcached servers incur a downtime of 2-3 hours per machine– Warming cache (e.g., 120 GB) over the network

Our approach updates OS in 3 secsfor 32GB of data from v3.18 to v3.19

for Ubuntu / Fedora releases

Page 9: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Existing practices for OS updates

● Dynamic Kernel Patching (e.g., kpatch, ksplice)

– Problem: only support minor patches

● Rolling Update (e.g., Google, Facebook, etc)

– Problem: inevitable downtime and requires careful planning

Page 10: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Existing practices for OS updates

● Dynamic Kernel Patching (e.g., kpatch, ksplice)

– Problem: only support minor patches

● Rolling Update (e.g., Google, Facebook, etc)

– Problem: inevitable downtime and requires careful planning

Losing application state is inevitable → Restoring memcached takes 2-3 hours

Page 11: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Existing practices for OS updates

● Dynamic Kernel Patching (e.g., kpatch, ksplice)

– Problem: only support minor patches

● Rolling Update (e.g., Google, Facebook, etc)

– Problem: inevitable downtime and requires careful planning

Losing application state is inevitable → Restoring memcached takes 2-3 hours

Goals of this work:● Support all types of patches ● Least downtime to update new OS● No kernel source modification

Page 12: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Problems of typical OS update

OS

Memcached

OSOS OSStop service

Page 13: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Problems of typical OS update

OS

Memcached

OS

New OS

OS OSStop service

Soft reboot

Page 14: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Problems of typical OS update

OS

Memcached

OS

New OSNew OS

Memcached

OS OSStop service

Soft reboot

Start service

Page 15: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Problems of typical OS update

OS

Memcached

OS

New OSNew OS

Memcached

OS OSStop service

Soft reboot

Start service

2-3 hours of downtime

Page 16: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Problems of typical OS update

OS

Memcached

OS

New OSNew OS

Memcached

OS OSStop service

Soft reboot

Start service

2-3 hours of downtime

2-10 minutes of downtime

Page 17: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Problems of typical OS update

OS

Memcached

OS

New OSNew OS

Memcached

OS OSStop service

Soft reboot

Start service

Is it possible to keep the

application state?

2-3 hours of downtime

2-10 minutes of downtime

Page 18: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

OS updates loose application states

OS

Memcached

OS

New OSNew OS

Memcached

OS OSStop service

Soft reboot

Start service

KUP: Kernel update with applicationcheckpoint-and-restore (C/R)

Page 19: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

OS updates loose application states

OS

Memcached

OS

New OSNew OS

Memcached

OS OSStop service

Soft reboot

Start service

Memcached

Checkpoint

KUP: Kernel update with applicationcheckpoint-and-restore (C/R)

Page 20: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

OS updates loose application states

OS

Memcached

OS

New OSNew OS

Memcached

OS OSStop service

Soft reboot

Start service

Memcached

Memcahed

In-kernelswitch

Checkpoint

KUP: Kernel update with applicationcheckpoint-and-restore (C/R)

Page 21: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

OS updates loose application states

OS

Memcached

OS

New OSNew OS

Memcached

OS OSStop service

Soft reboot

Start service

Memcached

Memcahed

In-kernelswitch

Checkpoint

Restore

KUP: Kernel update with applicationcheckpoint-and-restore (C/R)

Page 22: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

OS updates loose application states

Stop service

Start service

Checkpoint

Restore

KUP's life cycle

KUP: Kernel update with applicationcheckpoint-and-restore (C/R)

In-kernelswitch

Page 23: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

OS updates loose application states

Stop service

Start service

Checkpoint

Restore

KUP's life cycle

KUP: Kernel update with applicationcheckpoint-and-restore (C/R)

In-kernelswitch

1-10 minutes of downtime

Page 24: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

OS updates loose application states

New OSNew OS

Stop service

Start service

Checkpoint

Restore

KUP's life cycle

KUP: Kernel update with applicationcheckpoint-and-restore (C/R)

In-kernelswitch

Challenge: how to further decrease

the potential downtime?

1-10 minutes of downtime

Page 25: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Techniques to decrease the downtime

Checkpoint

Restore

In-kernelswitch

1) Incremental checkpoint

Page 26: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Techniques to decrease the downtime

Checkpoint

Restore

In-kernelswitch

1) Incremental checkpoint

2) On-demand restore

Page 27: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Techniques to decrease the downtime

Checkpoint

Restore

In-kernelswitch

1) Incremental checkpoint

2) On-demand restore

3) FOAM: a snapshot abstraction

Page 28: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Techniques to decrease the downtime

Checkpoint

Restore

In-kernelswitch

1) Incremental checkpoint

2) On-demand restore

3) FOAM: a snapshot abstraction

4) PPP: reuse memorywithout an explicit dump

Page 29: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Techniques to decrease the downtime

Checkpoint

Restore

In-kernelswitch

1) Incremental checkpoint

2) On-demand restore

3) FOAM: a snapshot abstraction

4) PPP: reuse memorywithout an explicit dump

Page 30: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Incremental checkpoint

Timeline

S1

● Reduces downtime (up to 83.5%)● Problem: Multiple snapshots increase the restore time

Naivecheckpoint

downtime

Si Snapshot instance→

Page 31: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Incremental checkpoint

Timeline

S1

S1

● Reduces downtime (up to 83.5%)● Problem: Multiple snapshots increase the restore time

Naivecheckpoint

Incrementalcheckpoint

downtime

Si Snapshot instance→

Page 32: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Incremental checkpoint

Timeline

S1

S1S2

● Reduces downtime (up to 83.5%)● Problem: Multiple snapshots increase the restore time

Naivecheckpoint

Incrementalcheckpoint

downtime

Si Snapshot instance→

Page 33: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Incremental checkpoint

Timeline

S1

S1S2 S3

● Reduces downtime (up to 83.5%)● Problem: Multiple snapshots increase the restore time

Naivecheckpoint

Incrementalcheckpoint

downtime

Si Snapshot instance→

Page 34: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Incremental checkpoint

Timeline

S1

S1S2 S3

● Reduces downtime (up to 83.5%)● Problem: Multiple snapshots increase the restore time

Naivecheckpoint

Incrementalcheckpoint S4

downtime

downtime

Si Snapshot instance→

Page 35: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

On-demand restore● Rebind the memory once the application

accesses it

– Only map the memory region with snapshot and restart the application

● Decreases the downtime (up to 99.6%)● Problem: Incompatible with incremental

checkpoint

Page 36: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Problem: both techniques together result in inefficient application C/R

● During restore, need to map each pages individually

– Individual lookups to find the relevant pages– Individual page mapping to enable on-demand restore

S1S1

2 43

● An application has 4 pages as its working set size

● Incremental checkpoint has 2 iterations

– 1st iteration all 4 pages (1, 2, 3, 4) are dumped→

– 2nd iteration 2 pages (2, 4) are dirtied→

1

● Increases the restoration downtime (42.5%)

Page 37: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Problem: both techniques together result in inefficient application C/R

● During restore, need to map each pages individually

– Individual lookups to find the relevant pages– Individual page mapping to enable on-demand restore

S1S1

S2

3 2 4

● An application has 4 pages as its working set size

● Incremental checkpoint has 2 iterations

– 1st iteration all 4 pages (1, 2, 3, 4) are dumped→

– 2nd iteration 2 pages (2, 4) are dirtied→

1

● Increases the restoration downtime (42.5%)

Page 38: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

New abstraction: file-offset based address mapping (FOAM)

● Flat address space representation for the snapshot– One-to-one mapping between the address space and the

snapshot– No explicit lookups for the pages across the snapshots

– A few map operations to map the entire snapshot with address space

● Use sparse file representation– Rely on the concept of holes supported by modern file systems

● Simplifies incremental checkpoint and on-demand restore

Page 39: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Techniques to decrease the downtime

Checkpoint

Restore

In-kernelswitch

1) Incremental checkpoint

2) On-demand restore

3) FOAM: a snapshot abstraction

4) PPP: reuse memorywithout an explicit dump

Page 40: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Redundant data copy

Running CheckpointIn-kernel

switchRestore Running

OS

Running

● Application C/R copies data back and forth● Not a good fit for applications with huge memory

Memcached

RAM2 431

Page 41: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Redundant data copy

Running CheckpointIn-kernel

switchRestore Running

S1Snapshot2 431

OS

Checkpoint

● Application C/R copies data back and forth● Not a good fit for applications with huge memory

RAM

Memcached

Page 42: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Redundant data copy

Running CheckpointIn-kernel

switchRestore Running

S1Snapshot2 431

OS

In-kernelswitch

New OS

● Application C/R copies data back and forth● Not a good fit for applications with huge memory

Memcached

RAM

Memcached

Page 43: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Redundant data copy

Running CheckpointIn-kernel

switchRestore Running

S1Snapshot2 431

OS

Restore

New OS

● Application C/R copies data back and forth● Not a good fit for applications with huge memory

Memcached

RAM2 431

Memcached

Page 44: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Redundant data copy

Running CheckpointIn-kernel

switchRestore Running

S1Snapshot2 431

OS

Running

New OS

● Application C/R copies data back and forth● Not a good fit for applications with huge memory

MemcachedMemcached

RAM2 431

Memcached

Page 45: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Redundant data copy

Running CheckpointIn-kernel

switchRestore Running

S1Snapshot2 431

OS

Running

New OS

● Application C/R copies data back and forth● Not a good fit for applications with huge memory

MemcachedMemcached

RAM2 431

Memcached

Dump data Read data

Page 46: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Redundant data copy

Running CheckpointIn-kernel

switchRestore Running

S1Snapshot2 431

OS

Running

New OS

● Application C/R copies data back and forth● Not a good fit for applications with huge memory

Memcached MemcachedMemcached

RAM2 431

MemcachedIs it possible to avoid memory copy?

Dump data Read data

Page 47: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Avoid redundant data copy across reboot

Running CheckpointIn-kernel

switchRestore Running

OS

Running

● Reserve the application's memory across reboot● Inherently rebind the memory without any copy

Memcached

RAM2 431Memory actively used

Page 48: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Avoid redundant data copy across reboot

Running CheckpointIn-kernel

switchRestore Running

S1

Snapshot

OS

Checkpoint

● Reserve the application's memory across reboot● Inherently rebind the memory without any copy

RAM2 431

Memcached

Reserve the memoryin the OS

Page 49: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Avoid redundant data copy across reboot

Running CheckpointIn-kernel

switchRestore Running

S1

Snapshot

OS

In-kernelswitch

New OSOld OS

● Reserve the application's memory across reboot● Inherently rebind the memory without any copy

Memcached

RAM2 431

Memcached

Reserve the samememory in the new OS

Page 50: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Avoid redundant data copy across reboot

Running CheckpointIn-kernel

switchRestore Running

S1

Snapshot

OS

Restore

New OSOld OS

● Reserve the application's memory across reboot● Inherently rebind the memory without any copy

Memcached

RAM2 431

Memcached

Implicitly map the memory region

Page 51: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Avoid redundant data copy across reboot

Running CheckpointIn-kernel

switchRestore Running

S1

Snapshot

OS

Running

New OSOld OS

● Reserve the application's memory across reboot● Inherently rebind the memory without any copy

MemcachedMemcached

RAM2 431

Memcached

Memory again in use

Page 52: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Avoid redundant data copy across reboot

Running CheckpointIn-kernel

switchRestore Running

S1

Snapshot

OS

Running

New OSOld OS

● Reserve the application's memory across reboot● Inherently rebind the memory without any copy

Memcached MemcachedMemcached

RAM2 431

MemcachedChallenge: how to notify the newer

OS without modifying its source?Memory again in use

Page 53: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Persist physical pages (PPP) without OS modification

● Reserve virtual-to-physical mapping information

– Static instrumentation of the OS binary– Inject our own memory reservation function, then

further boot the OS● Handle page-faults for the restored application

– Dynamic kernel instrumentation– Inject our own page fault handler function for

memory binding

Page 54: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Persist physical pages (PPP) without OS modification

● Reserve virtual-to-physical mapping information

– Static instrumentation of the OS binary– Inject our own memory reservation function, then

further boot the OS● Handle page-faults for the restored application

– Dynamic kernel instrumentation– Inject our own page fault handler function for

memory binding

● No explicit memory copy● Does not require any kernel source modification

Page 55: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Implementation

● Application C/R → criu– Works at the namespace level

● In-kernel switch → kexec system call– A mini boot loader that bypasses BIOS while booting

Page 56: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Evaluation

● How effective is KUP's approach compared to the in-kernel hot patching?

● What is the effective performance of each technique during the update?

Page 57: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

KUP can support major and minor updates in Ubuntu

● KUP supports 23 minor/4 major updates (v3.17–v4.1) ● However, kpatch can only update 2 versions

– e.g., layout change in data structure

kpatch failure scenarios

Page 58: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

0

50

100

150

190 200 210 220 230 240 250

Bandwidth (MB)

Timeline (sec)

Basic - SSD

Page 59: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

0

50

100

150

190 200 210 220 230 240 250

Bandwidth (MB)

Timeline (sec)

Incremental checkpoint - SSD

Page 60: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

0

50

100

150

190 200 210 220 230 240 250

Bandwidth (MB)

Timeline (sec)

On-demand restore - SSD

Page 61: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

0

50

100

150

190 200 210 220 230 240 250

Bandwidth (MB)

Timeline (sec)

FOAM - SSD

Page 62: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

0

50

100

150

190 200 210 220 230 240 250

Bandwidth (MB)

Timeline (sec)

Basic - RP-RAMFS

Page 63: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

0

50

100

150

190 200 210 220 230 240 250

Bandwidth (MB)

Timeline (sec)

Incremental checkpoint - RP-RAMFS

Page 64: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

0

50

100

150

190 200 210 220 230 240 250

Bandwidth (MB)

Timeline (sec)

On-demand restore - RP-RAMFS

Page 65: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

0

50

100

150

190 200 210 220 230 240 250

Bandwidth (MB)

Timeline (sec)

FOAM - RP-RAMFS

Page 66: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

0

50

100

150

190 200 210 220 230 240 250

Bandwidth (MB)

Timeline (sec)

PPP

Page 67: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Updating OS with memcached● PPP has the least degradation● Storage also affects the performance

Basic - SSD

Incremental checkpoint - SSD

On-demand restore - SSD

FOAM - SSD

Basic - RP-RAMFS

Incremental checkpoint - RP-RAMFS

On-demand restore - RP-RAMFS

FOAM - RP-RAMFS

200 210 220 230 240 250

Timeline (sec)

PPP

Page 68: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Limitations

● KUP does not support checkpoint and restore all socket implementations

– TCP, UDP and netlink are supported● Failure during restoration

– System call is removal or interface modification

Page 69: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Demo

Page 70: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Summary

● KUP: a simple update mechanism with application checkpoint-and-restore (C/R)

● Employs various techniques:– New data abstraction for application C/R– Fast in-kernel switching technique– A simple mechanism to persist the memory

Page 71: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Summary

● KUP: a simple update mechanism with application checkpoint-and-restore (C/R)

● Employs various techniques:– New data abstraction for application C/R– Fast in-kernel switching technique– A simple mechanism to persist the memory

Thank you!

Page 72: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Backup Slides

Page 73: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Handling in-kernel states

● Handles namespace and cgroups● ptrace() syscall to handle the blocking system calls ,

timers, registers etc. ● Parasite code to fetch / put the application's states● /proc file system exposes the required information

for application C/R● A new mode (TCP_REPAIR) allows handling the TCP

connections

Page 74: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

What cannot be checkpointed

● X11 applications● Tasks with debugger attached● Tasks running in compat mode (32 bit)

Page 75: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Possible changes afterapplication C/R

● Per-task statistics● Namespace IDs● Process start time● Mount point IDs● Socket IDs (st_ino)● VDSO

Page 76: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

Suitable applications

● Suitable for all kinds of applications● PPP approach supports all types of applications

– May fail to restore on the previous kernel● FOAM is not a good candidate for write-

intensive applications

– More confidence in safely restoring the application on the previous kernel

Page 77: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

PPP works effectively

0102030405060708090

0 8 16 24 32 40 48 56 64 72

Dow

ntim

e (s

ec)

WSS (GB) with 50% write

FOAM - SSD

● FOAM on SSD slow→

● FOAM on RP-RAMFS space inefficient→

Page 78: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

PPP works effectively

0102030405060708090

0 8 16 24 32 40 48 56 64 72

Dow

ntim

e (s

ec)

WSS (GB) with 50% write

Out of memory error

FOAM - SSDFOAM - RP-RAMFS

● FOAM on SSD slow→

● FOAM on RP-RAMFS space inefficient→

Page 79: Instant OS Updates via Userspace Checkpoint-and-Restart · Facebook's memcached servers incur a downtime of 2-3 hours per machine – Warming cache (e.g., 120 GB) over the network

PPP works effectively

0102030405060708090

0 8 16 24 32 40 48 56 64 72

Dow

ntim

e (s

ec)

WSS (GB) with 50% write

Out of memory error

FOAM - SSDFOAM - RP-RAMFSPPP

● FOAM on SSD slow→

● FOAM on RP-RAMFS space inefficient→