Transparent Checkpoints of Closed Distributed Systems in Emulababurtsev/doc/Eurosys09... ·...
Transcript of Transparent Checkpoints of Closed Distributed Systems in Emulababurtsev/doc/Eurosys09... ·...
TransparentCheckpointofClosedDistributedSystemsin
EmulabAntonBurtsev,PrashanthRadhakrishnan,
MikeHibler,andJayLepreau
UniversityofUtah,SchoolofCompuEng
Emulab
• PublictestbedfornetworkexperimentaEon
2
• Complexnetworkingexperimentswithinminutes
Emulab—preciseresearchtool
• Realism:– Realdedicatedhardware• Machinesandnetworks
– RealoperaEngsystems– FreedomtoconfigureanycomponentofthesoNwarestack
– Meaningfulreal‐worldresults• Control:
– Closedsystem• Controlledexternaldependenciesandsideeffects
– Controlinterface– Repeatable,directedexperimentaEon
3
Goal:morecontroloverexecuEon
• Statefulswap‐out– Demandforphysicalresourcesexceedscapacity– PreempEveexperimentscheduling• Long‐running• Large‐scaleexperiments
– Nolossofexperimentstate
• Time‐travel– Replayexperiments• DeterminisEcallyornon‐determinisEcally
– Debuggingandanalysisaid
4
Challenge
• BothcontrolsshouldpreservefidelityofexperimentaEon
• Bothrelyontransparencyofdistributedcheckpoint
5
Transparentcheckpoint• TradiEonally,semanEctransparency:
– CheckpointedexecuEonisoneofthepossiblecorrectexecuEons
• Whatifwewanttopreserveperformancecorrectness?– CheckpointedexecuEonisoneofthecorrectexecuEonsclosesttoanon‐checkpointedrun
• Preservemeasurableparametersofthesystem– CPUallocaEon– ElapsedEme– Diskthroughput– Networkdelayandbandwidth
6
TradiEonalview
• Localcase– Transparency=smallestpossibledownEme– Severalmilliseconds[Remus]– Backgroundwork– Harmsrealism
• Distributedcase– Lamportcheckpoint• Providesconsistency
– Packetdelays,Emeouts,trafficbursts,replaybufferoverflows
7
Maininsight
• Concealcheckpointfromthesystemundertest– ButsEllstayontherealhardwareasmuchaspossible
• “Instantly”freezethesystem– TimeandexecuEon– Ensureatomicityofcheckpoint• Singlenon‐divisibleacEon
• ConcealcheckpointbyEmevirtualizaEon
8
ContribuEons
• Transparencyofdistributedcheckpoint• Localatomicity
– Temporalfirewall
• ExecuEoncontrolmechanismsforEmulab– Statefulswap‐out– Time‐travel
• Branchingstorage
9
ChallengesandimplementaEon
10
CheckpointessenEals• StateencapsulaEon
– SuspendexecuEon– Saverunningstateofthe
system• VirtualizaEonlayer
11
CheckpointessenEals• StateencapsulaEon
– SuspendexecuEon– Saverunningstateofthe
system• VirtualizaEonlayer
– Suspendsthesystem– Savesitsstate– Savesin‐flightstate– Disconnects/reconnectsto
thehardware
12
Firstchallenge:atomicity• PermanentencapsulaEonis
harmful– Tooslow– Somestateisshared
• Encapsulateduponcheckpoint
• ExternallytoVM– FullmemoryvirtualizaEon– NeedsdeclaraEvedescripEon
ofsharedstate
• InternallytoVM– Breaksatomicity
13
?
Atomicityinthelocalcase
• Temporalfirewall– SelecEvelysuspendsexecuEonandEme
– Providesatomicityinsidethefirewall
• ExecuEoncontrolintheLinuxkernel– Kernelthreads– Interrupts,excepEons,IRQs
• Concealscheckpoint– TimevirtualizaEon
14
Secondchallenge:synchronizaEon
???$%#!Timeout
• Lamportcheckpoint– NosynchronizaEon– SystemisparEally
suspended
• Preservesconsistency– Logsin‐flightpackets
• Onceloggedit’simpossibletoremove
• Unsuspendednodes– Time‐outs
15
Synchronizedcheckpoint
• Synchronizeclocksacrossthesystem
• Schedulecheckpoint
• Checkpointallnodesatonce
• Almostnoin‐flightpackets
16
Bandwidth‐delayproduct• Largenumberofin‐
flightpackets
• Slowlinksdominatethelog
• FasterlinkswaitfortheenErelogtocomplete
• Per‐pathreplay?– UnavailableatLayer2– Accuratereplayengineoneverynode
17
Checkpointthenetworkcore• LeverageEmulabdelay
nodes– Emulablinksareno‐delay– LinkemulaEondoneby
delaynodes
• Avoidreplayofin‐flightpackets
• Captureallin‐flightpacketsincore– Checkpointdelaynodes
18
Efficientbranchingstorage
• TobepracEcalstatefulswap‐outhastobefast
• Mostlyread‐onlyFS– Sharedacrossnodesandexperiments
• Deltasaccumulateacrossswap‐outs
• BasedonLVM– ManyopEmizaEons
19
EvaluaEon
EvaluaEonplan
• Transparencyofthecheckpoint• Measurablemetrics
– TimevirtualizaEon– CPUallocaEon– Networkparameters
21
TimevirtualizaEon
22
Checkpointevery5sec(24checkpoints)
Timeraccuracyis28μsec
Checkpointadds±80μsecerror
do{usleep(10ms)germeofday()}while()
sleep+overhead=20ms
CPUallocaEon
23
Checkpointevery5sec(29checkpoints)
do{stress_cpu()germeofday()}while()
stress+overhead=236.6ms
Normallywithin9msofaverage
Checkpointadds27mserror
ls/root–7msoverheadxmlist–130ms
Networktransparency:iperf
24
NoTCPwindowchangeNopacketdrops
ThroughputdropisduetobackgroundacEvity
Averageinter‐packetEme:18μsecCheckpointadds:330‐‐5801μsec
Checkpointevery5sec(4checkpoints)
‐1Gbps,0delaynetwork,‐ iperfbetweentwoVMs‐ tcpdumpinsideoneofVMs‐ averagingover0.5ms
Networktransparency:BitTorrent
25
100Mbps,lowdelay1BTserver+3clients3GBfile
Checkpointpreservesaveragethroughput
Checkpointevery5sec(20checkpoints)
Conclusions
• Transparentdistributedcheckpoint– Preciseresearchtool– Fidelityofdistributedsystemanalysis
• Temporalfirewall– GeneralmechanismtochangepercepEonofEmeforthesystem
– Concealvariousexternalevents
• FutureworkisEme‐travel
26
Thankyou
Backup
28
Branchingstorage
29
• Copy‐on‐writeasaredolog• Linearaddressing• FreeblockeliminaEon• ReadbeforewriteeliminaEon
Branchingstorage
30