Effectively Model Checking Real-World Distributed Systems Junfeng Yang Joint work with Huayang Guo,...
1
Effectively Model CheckingReal-World Distributed Systems
Junfeng YangJoint work with Huayang Guo, Ming Wu, Lidong Zhou, Gang Hu,
Lintao Zhang, Heming Cui, Jingyue Wu,Chia-che Tsai, John Gallagher
2
One-slide Summary
• Distributed systems: important, but hard to get right
• Model checking: finds serious bugs but is slow
• Dynamic Interface Reduction: a new type of state-space reduction technique in 25 years [DeMeter SOSP 11]
  – exponentially speeds up model checking
  – One data point: 34 years → 18 hours
• Stable Multithreading: a radically new approach [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13]
  – what-you-check-is-what-you-run
  – Billions of years → 7 hours
  – https://github.com/columbia/smt-mc
4
Distributed Systems: Hard to Get Right
• No node has a centralized view of the entire system
• Code must correctly handle many failures
  – Link failures, network partitions
  – Message loss, delay, or reordering
  – Machine crashes
• Worse: geo-distributed, larger deployments make weird failures more likely

Complex protocols → more complex code → bugs
5
Model Checking Distributed System Implementations
• Choices of actions
  – Send message
  – Recv message
  – Run thread
  – Delay message
  – Fail link
  – Crash machine
  – …
• Run checkers on states
  – E.g., assertions
[Figure: state-space tree whose edges are actions such as send, fail link, run thread, crash, …]
6
Good Error Detection Results
• E.g., [MoDist NSDI 09] [dBug SSV 10]
– Easy: check unmodified, real code in its native environment (“in-situ” [eXplode OSDI 06])
– Comprehensive: check many corner cases
– Deterministic: detected errors can be replayed
• MoDist results
  – Checked Berkeley DB replication, MPS (a Microsoft production system), PacificA
  – Found 35 bugs
    • 10 protocol flaws; bugs found in every system checked
  – Transferred to Microsoft product groups
7
But, the State Explosion Problem
• Real-world distributed systems have too many states to explore completely
  – Even for conceptually small state spaces
  – 3-node MPS: 34 years for MoDist!
• Incompleteness → low assurance
• Prior model checkers explored many redundant states
8
This Talk: Two Techniques to Effectively Reduce/Shrink the State Space
• Dynamic Interface Reduction: check components separately to avoid costly global exploration [DeMeter SOSP 11]
  – 34 years → 18 hours, a 10^5 reduction
• Leverage Stable Multithreading [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13] to make what-you-check what-you-run (ongoing)
9
Dynamic Interface Reduction (DIR)
• Insight: system builders decompose a system into components with narrow interfaces
  – e.g., [Clarke, Long, McMillan 87] [Laster, Grumberg 98]
• Distinguish global actions from local actions
• Check local actions via a conceptually local fork()

// main            // ckpt
n = recv()         Log(total)
total += n
Send(n)
10
Reduction Analysis
• N components, each having M local actions
  – w/o DIR: M * M * … * M = M^N
  – w/ DIR:  M + M + … + M = M * N
• Exponential reduction
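The arithmetic above, spelled out with illustrative numbers:

```python
# Back-of-the-envelope for the reduction above (illustrative numbers):
# N components of M local actions each. A global search interleaves the
# components (~M^N states), while DIR checks each one separately (~M*N).
M, N = 10, 5
global_cost = M ** N   # joint exploration of all components
dir_cost = M * N       # per-component exploration under DIR
print(global_cost, dir_cost)  # 100000 50
```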
11
Challenge in Implementing DIR
• How to automatically compute interfaces from real code w/o causing false positives or missing bugs?
• Manual spec: tedious, costly, error-prone
  – Required by prior compositional or modular model checking work
• Made-up interfaces: difficult-to-diagnose false positives [Guerraoui and Yabandeh, NSDI 11]
12
Automatically Discover Interfaces by Running Code
[Figure: a global explorer explores global actions; per-node local explorers explore local actions; message traces flow between the global and local explorers in both directions]

• Insight: message traces collectively define interfaces
13
Example

Client C:
if (Toss(2) == 0) {
  Send(P, 1); Send(P, 2);
} else {
  Send(P, 1); Send(P, 3);
}

Primary P:
// main               // ckpt
while (n = recv()) {  Log(total)
  total += n;
  Send(S, n);
}

Secondary S:
// main               // ckpt
while (n = recv()) {  Log(total)
  total += n;
}
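The example system can be sketched as a directly runnable simulation. This is a Python stand-in for the slide's pseudocode: Send/Recv are modeled as plain lists, and both outcomes of Toss(2) are simulated:

```python
# Runnable sketch of the slide's example: client C sends two numbers to
# primary P, which adds each to its total and forwards it to secondary S.

def run(toss):
    # Client C: the coin toss picks which pair of messages to send
    msgs = [1, 2] if toss == 0 else [1, 3]
    # Primary P
    p_total, to_s = 0, []
    for n in msgs:          # while (n = recv())
        p_total += n        # total += n
        to_s.append(n)      # Send(S, n)
    # Secondary S
    s_total = sum(to_s)     # while (n = recv()) total += n
    return p_total, s_total

print(run(0), run(1))  # (3, 3) (4, 4)
```

Both branches leave P and S with the same total, which is exactly the kind of invariant a checker would assert.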
14
Global Explorer: Compute Initial Global Trace
Global trace:
C.Toss(2) = 0
C.Send(P, 1)
P.Recv(C, 1)
P.Log
P.total+=1
P.Send(S, 1)
S.Recv(P, 1)
S.Log
S.total+=1
C.Send(P, 2)
P.Recv(C, 2)
P.total+=2
P.Send(S, 2)
S.Recv(P, 2)
15
Global Explorer: Project Message Traces
Projected message traces:
P: P.Recv(C, 1), P.Send(S, 1), P.Recv(C, 2), P.Send(S, 2)
S: S.Recv(P, 1), S.Recv(P, 2)
C: C.Send(P, 1), C.Send(P, 2)
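Projection can be sketched as a simple filter over the global trace: keep only a node's message actions and drop its local actions. The tuple encoding below is illustrative, not DeMeter's actual trace format:

```python
# Sketch of trace projection: a node's message trace is its Send/Recv
# actions from the global trace; local actions (Toss, Log, total+=n)
# drop out. Each entry is (node, operation, peer) — illustrative only.

GLOBAL = [
    ("C", "Toss", None), ("C", "Send", "P"), ("P", "Recv", "C"),
    ("P", "Log", None),  ("P", "total", None), ("P", "Send", "S"),
    ("S", "Recv", "P"),
]

def project(trace, node):
    return [(who, op, peer) for (who, op, peer) in trace
            if who == node and op in ("Send", "Recv")]

print(project(GLOBAL, "P"))  # [('P', 'Recv', 'C'), ('P', 'Send', 'S')]
```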
16
Local Explorers: Explore Local Actions Using Message Traces
17
Local Explorer of Primary: Explore Local Trace 1
Local trace 1 of P: P.Log, P.total+=1, P.total+=2
(the local traces differ in where the ckpt thread's P.Log is scheduled relative to the increments)
18
Local Explorer of Primary: Explore Local Trace 2
Local trace 2 of P: P.Log, P.total+=1, P.total+=2 (a different interleaving of the same local actions)
19
Local Explorer of Primary: Explore Local Trace 3
Local trace 3 of P: P.Log, P.total+=1, P.total+=2 (a third interleaving)
20
Local Explorer of Client
21
Local Explorer of Client
Local action explored: C.Toss(2) = 0 (the branch already recorded)
22
Local Explorer of Client: Found New Message Trace
Local action explored: C.Toss(2) = 1
New message trace of C: C.Send(P, 1), C.Send(P, 3)
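The discovery step can be sketched as follows. This is illustrative only: the client's local explorer tries every Toss outcome, and a run whose sends match no recorded message trace is reported back to the global explorer:

```python
# Sketch: the local explorer of C enumerates both Toss(2) outcomes;
# any run producing sends not covered by a recorded message trace is a
# new trace for the global explorer to compose. Illustrative encoding.

KNOWN = {(("Send", "P", 1), ("Send", "P", 2))}   # from the initial global trace

def client_sends(toss):
    return (("Send", "P", 1), ("Send", "P", 2)) if toss == 0 \
      else (("Send", "P", 1), ("Send", "P", 3))

new_traces = [client_sends(t) for t in (0, 1) if client_sends(t) not in KNOWN]
print(new_traces)  # only the Toss(2)=1 branch yields a new trace
```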
23
Global Explorer: Composition
(The global explorer composes the newly discovered client message trace with the previously recorded message traces of P and S.)
24
Global Explorer: New Global Trace
New global trace:
C.Toss(2) = 1
C.Send(P, 1)
P.Recv(C, 1)
P.Log
P.total+=1
P.Send(S, 1)
S.Recv(P, 1)
S.Log
S.total+=1
C.Send(P, 3)
25
Implementation
• 7,279 lines of C++
• Integrated DIR with
  – MoDist [MoDist NSDI 09]: 757 lines
  – MaceMC [MaceMC NSDI 07]: 1,114 lines
  – Easy
• Orthogonal to partial order reduction (via vector clock tricks)
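As a side note on the vector clock tricks: partial order reduction only needs a happens-before test between events, and two concurrent events' orders can be treated as equivalent. This is a generic sketch, not MoDist's actual code:

```python
# Minimal vector-clock sketch: event a happens-before event b iff a's
# clock is componentwise <= b's and they differ; two events are
# concurrent iff neither happens-before the other, so a partial order
# reducer may explore only one of their interleavings. Illustrative.

def happens_before(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    return not happens_before(a, b) and not happens_before(b, a)

print(happens_before((1, 0), (2, 1)))  # True: ordered, one order suffices
print(concurrent((2, 0), (0, 1)))      # True: orders are equivalent
```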
Verification/Reduction Results
• MPS: Microsoft production system
• BDB: Berkeley DB replication
• Chord: Chord implementation in Mace
• *-n: n nodes
• Results for other benchmarks are in [DeMeter SOSP 11]
26
App        MPS-2    MPS-3   BDB-2    BDB-3   Chord-2  Chord-3
Reduction    488  542,944     277  278,481        19    1,587
Speedup      153  217,178      50   44,203         7      547

(MPS and BDB checked with DIR-MoDist; Chord with DIR-MaceMC)
27
DIR Summary
• Proven sound (introduces no false positives) and complete (introduces no false negatives)
• Fully automatic, real, exponential reduction
• Works seamlessly with existing model checkers
  – Integrated into MoDist and MaceMC; easy
• Results
  – Verified instances of real-world systems
  – Empirically observed large reductions
    • 34 years → 18 hours (10^5) on MPS
28
This Talk: Two Techniques to Effectively Reduce the State Space

• Dynamic Interface Reduction: check components separately to avoid costly global exploration [DeMeter SOSP 11]
  – 34 years → 18 hours, a 10^5 reduction
• Leverage Stable Multithreading [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13] to make what-you-check what-you-run (ongoing)
29
Threads: Difficult to Model Check
• Many thread interleavings, or schedules
  – To verify, the local explorer must explore all schedules
• Wide interfaces between threads
  – Any shared-memory load/store
  – Tracing loads/stores is costly
  – DIR may not work well
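A quick count of why thread schedules explode: even two threads of k steps each have a binomial number of interleavings (a standard counting argument, not from the slides):

```python
# Interleavings of two threads with k steps each: choose which k of the
# 2k global steps belong to thread 1, i.e., C(2k, k). It grows fast.
from math import comb

k = 10
print(comb(2 * k, k))  # 184756 schedules for just two 10-step threads
```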
30
What-You-Check Is What-You-Run

• Coverage = C/R
  – R: all possible runtime schedules
  – C: model-checked schedules
• Reduction: enlarge C by exploiting equivalence
  – But equivalence is rare and hard to find! DIR took us 2-3 years
• Can we increase coverage w/o equivalence?
  – Shrink R with Stable Multithreading [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13]
31
Stable Multithreading
• Reuse well-checked schedules on different inputs
• How does it work? See papers [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13]
• So much easier that it feels like cheating

[Figure: spectrum of multithreading approaches, from nondeterministic through stable to deterministic]
32
Conclusion
• Dynamic Interface Reduction: check components separately to avoid costly global exploration [DeMeter SOSP 11]
  – Automatic, real, exponential reduction
  – Proven sound and complete
  – 34 years → 18 hours, a 10^5 reduction
• Leverage Stable Multithreading [Tern OSDI '10] [Peregrine SOSP '11] to make what-you-check what-you-run (ongoing)