Flexible approaches to replicating shared data consistently
Marc Shapiro
Joint work with Nishith Krishna and Karthikeyan Bhargavan
MS Academic Days Europe 2005-04 Exploring consistency design space 2
Sharing information on a global scale
Large numbers of users
Globally distributed
Concurrent access and update
Invariants between objects
Conflicts are rare but do occur
Variable network bandwidth, high latency
Replicate for fault tolerance, reduced latency, load balancing
Enterprise collaboration, business information
Important lesson #1
Replication is beneficial in many information sharing scenarios:
Preserves autonomy
Reduces access latency
Improves fault-tolerance
Supports disconnected operation
System model
Many possible schedules
In or out? Order?
Converge
action = value, delta or operation
[Figure: replicas at Bob, Suzy and Mary; actions are marked executed or rejected]
Synchronous updates (pessimistic replication)
1SR = 1-Copy Serialisability
Avoid conflicts a priori by locking
Sequential access
Intuitive, transparent
Vulnerable to: latency, disconnection, faults, deadlock
Doesn’t scale if write contention
[Figure: timeline with Bob and Mary; labels: lock, paint red, lock, insert smiley]
Asynchronous updates (optimistic replication)
Resolve conflicts a posteriori
Disconnected, cooperative
Powerful
Batch & optimise
Tentative: diverge, rollback
Different user experience
Doesn’t scale if write contention
[Figure: timeline with Suzy and Bob; labels: paint red, insert smiley, reconcile]
Important lesson #2
No replication scheme is ideal for all applications.
Performance, complexity, consistency trade-offs
No “one-size-fits-all” design
Pessimistic / optimistic mode visible to users
Contention / conflicts critical
Conflicts & non-commute
Conflict: concurrent execution would violate an application invariant
e.g. calendar: no double booking
Non-commuting operations: decide order
Commuting: optimisations
Scheduling
Conflict: Is action in or out?
Non-commuting: Ordering?
[Figure: calendar entries "Bob & Suzy, Mon 10:00" and "Mary & Suzy, Mon 10:00"; timeline with Bob and Suzy and the operations price -= 10€ and price *= 1.05]
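The two price operations from this slide make a small worked example of non-commutativity. The sketch below is illustrative only: the starting price of 100 and the helper names (discount, markup, commute) are assumptions, and exact fractions are used so the comparison is not blurred by floating-point rounding.

```python
from fractions import Fraction


def discount(price):
    """price -= 10 (a flat reduction)."""
    return price - 10


def markup(price):
    """price *= 1.05 (a 5% increase), kept exact with Fraction."""
    return price * Fraction(105, 100)


def commute(f, g, start):
    """True if applying f and g in either order gives the same result
    for this particular starting value (a spot check, not a proof)."""
    return f(g(start)) == g(f(start))


start = Fraction(100)                      # hypothetical starting price
discount_first = markup(discount(start))   # (100 - 10) * 1.05
markup_first = discount(markup(start))     # 100 * 1.05 - 10

print(discount_first, markup_first, commute(discount, markup, start))
```

Because the two orders give different prices, replicas must agree on an order; two flat discounts, by contrast, commute and need no ordering.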
Important lesson #3
Understand your application needs and design with replication in mind.
Capture invariants.
Design for commutativity.
Avoid concurrent non-commuting operations.
Avoid conflicting operations.
Otherwise have modest scalability expectations.
Exploring the consistency design space
Understanding replication & consistency
Semantics
Asynchronous / optimistic updates
Partial replication
Decentralised
Constraint-graph representation
Break into simpler sub-problems
Composable sub-algorithms
Spectrum of solutions
New serialisation algorithm
No unnecessary aborts
Scenario
[Figure: constraint graph over the actions "Promote 1 July: salary += 1000" and "Redundancy: salary := 0"; legend: MustHave, Before, NonCommuting, Conflict, Atomic, CausalDependence]
Multilog & schedule
M = (K, →, ◁, ∦): actions K, with constraints Before (→), MustHave (◁) and NonCommuting (∦)
Local view per site:
Known actions
Known constraints
Grows over time
Sound schedule: S = init … ∈ Σ(M): known actions, zero or once
α → β ∧ α, β ∈ S ⇒ α <S β
α ◁ β ∧ α ∈ S ⇒ β ∈ S
M sound ⇔ Σ(M) ≠ ∅
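The soundness conditions on this slide can be checked mechanically. The following is a minimal sketch under stated assumptions: a schedule is a Python list beginning with "init", Before and MustHave are given as sets of (a, b) pairs, and "a MustHave b" is read as "any schedule containing a must also contain b". The function name is_sound is hypothetical.

```python
def is_sound(schedule, actions, before, must_have):
    """Check one schedule against a multilog's actions and constraints."""
    # A schedule starts with the distinguished action init.
    if not schedule or schedule[0] != "init":
        return False
    # Each action appears zero times or once.
    if len(set(schedule)) != len(schedule):
        return False
    # Only known actions may be scheduled.
    if not set(schedule) <= set(actions) | {"init"}:
        return False
    position = {a: i for i, a in enumerate(schedule)}
    # Before: if both actions are scheduled, a must come first.
    for a, b in before:
        if a in position and b in position and position[a] >= position[b]:
            return False
    # MustHave: scheduling a obliges the schedule to contain b.
    for a, b in must_have:
        if a in position and b not in position:
            return False
    return True


actions = {"a", "b", "c"}
before = {("a", "b")}        # a Before b
must_have = {("b", "a")}     # b MustHave a
print(is_sound(["init", "a", "b"], actions, before, must_have))
```

Note that leaving an action out is always allowed unless a MustHave constraint forces it in, which is what makes "in or out?" a scheduling decision.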
Protocol primitives
Guar(M) = { α | α is in every sound schedule }
Dead(M) = { α | α is in no sound schedule }
Serialised(M) = { α | every β that does not commute with α is ordered Before or after α, or β ∈ Dead(M) }
Decided(M) = Dead(M) ∪ (Guar(M) ∩ Serialised(M))
Monotonic in t
M sound ⇒ Guar(M) ∩ Dead(M) = ∅
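For very small multilogs, the sets on this slide can be computed by brute force, which may help in checking one's intuition. This is a sketch, not the protocol itself: it enumerates every candidate schedule (exponential cost), and it assumes that an action counts as serialised when each action it does not commute with is either ordered against it by a Before edge or dead. The names sound_schedules and decide are hypothetical.

```python
from itertools import combinations, permutations

INIT = "init"


def sound_schedules(actions, before, must_have):
    """Enumerate all sound schedules of a tiny multilog."""
    found = []
    for size in range(len(actions) + 1):
        for subset in combinations(sorted(actions), size):
            for order in permutations(subset):
                s = (INIT,) + order
                pos = {a: i for i, a in enumerate(s)}
                ordered = all(pos[a] < pos[b] for a, b in before
                              if a in pos and b in pos)
                closed = all(b in pos for a, b in must_have if a in pos)
                if ordered and closed:
                    found.append(s)
    return found


def decide(actions, before, must_have, non_commuting):
    """Return (guaranteed, dead, decided) for a tiny multilog."""
    schedules = sound_schedules(actions, before, must_have)
    everything = set(actions) | {INIT}
    guar = {a for a in everything if all(a in s for s in schedules)}
    dead = {a for a in everything if not any(a in s for s in schedules)}

    def serialised(a):
        for x, y in non_commuting:
            if a in (x, y):
                other = y if a == x else x
                is_ordered = (a, other) in before or (other, a) in before
                if not is_ordered and other not in dead:
                    return False
        return True

    ser = {a for a in everything if serialised(a)}
    return guar, dead, dead | (guar & ser)


# a and b conflict (Before cycle); a needs b; init needs b; b, c do not commute.
before = {("a", "b"), ("b", "a")}
must_have = {("a", "b"), (INIT, "b")}
guar, dead, decided = decide({"a", "b", "c"}, before, must_have, {("b", "c")})
print(guar, dead, decided)
```

In this example b is guaranteed but stays undecided until it is ordered against c; adding the Before edge (b, c) moves b into the decided set.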
Consistency: a formal definition
Mergeability: Any combination of multilogs remains sound:
∀ i, i’, i’’, …, t, t’, t’’, …: Mi(t) ∪ Mi’(t’) ∪ Mi’’(t’’) ∪ … sound
Eventual Decision: Every action eventually decided everywhere:
∀ α, i, j, t: α ∈ Ki(t) ⇒ ∃ t’: α ∈ Decided(Mj(t’))
Omniscient observer: (Dead) (Guar)
Abstract consistency algorithm
Input: any application semantics (K, →, ◁, ∦)
Decompose into very simple sub-problems
Graphs:
I = input
B = Before
M = MustHave
S = Serialisation
O = output
Output: scheduling partial order
[Figure: I graph and O graph]
Conflict breaking
Make dead at least one action per cycle
B: Before edges from I
Redden a node
Delete red node and its edges
Terminate when acyclic
Concurrent, asynchronous
Numerous variants
[Figure: I graph and B graph]
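The redden-and-delete loop described above can be sketched sequentially as follows. This is one possible variant, not the distributed algorithm from the talk: it is centralised, and the choice of victim (the node of the cycle with the most Before edges) is an assumption in the spirit of a high-degree heuristic. The names find_cycle and break_conflicts are hypothetical.

```python
def find_cycle(nodes, edges):
    """Return the nodes of some cycle, or None if the graph is acyclic."""
    succ = {n: [] for n in nodes}
    for a, b in edges:
        if a in succ and b in succ:   # ignore edges of deleted nodes
            succ[a].append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}
    stack = []

    def visit(n):
        colour[n] = GREY
        stack.append(n)
        for m in succ[n]:
            if colour[m] == GREY:            # back edge closes a cycle
                return stack[stack.index(m):]
            if colour[m] == WHITE:
                found = visit(m)
                if found:
                    return found
        stack.pop()
        colour[n] = BLACK
        return None

    for n in sorted(nodes):
        if colour[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None


def break_conflicts(nodes, before):
    """Redden one node per remaining cycle until B is acyclic."""
    nodes = set(nodes)
    red = set()

    def degree(n):
        return sum(1 for a, b in before
                   if n in (a, b) and a in nodes and b in nodes)

    while True:
        cycle = find_cycle(nodes, before)
        if cycle is None:
            return red
        victim = max(sorted(cycle), key=degree)
        red.add(victim)          # redden the node
        nodes.remove(victim)     # delete it together with its edges


before = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("d", "a")]
print(break_conflicts({"a", "b", "c", "d"}, before))
```

Here b sits on both cycles, so reddening it alone leaves the graph acyclic.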
Conflict-breaking spectrum
B-Null: B assumed acyclic; do nothing [file systems, Usenet, ESDS]
B-TotalOrder, B-LocalMin: UIDs [DB]
B-Conservative: Redden every node in a cycle [Holliday]
B-HighDegree: Redden highest-degree node [Hamadi]
Sub-algorithms: Not optimal
B-IceCube: Globally minimise red nodes
B-Arbitrary: application/user
Agreement
If α MustHave β and β is Dead, then α is Dead
M: MustHave edges from I
Colour shared across graphs
Propagate colour along edges
Concurrent, asynchronous
[Figure: I graph and M graph]
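Colour propagation along MustHave edges amounts to a simple fixed-point computation. The sketch below is sequential and assumes that a pair (a, b) means "a MustHave b", so a dead b makes a dead as well; the function name propagate_dead is hypothetical.

```python
def propagate_dead(dead, must_have):
    """Close the set of dead (red) actions under MustHave edges."""
    dead = set(dead)
    changed = True
    while changed:                 # iterate until nothing new turns red
        changed = False
        for a, b in must_have:
            if b in dead and a not in dead:
                dead.add(a)        # a cannot run without b
                changed = True
    return dead


must_have = [("a", "b"), ("c", "a"), ("d", "e")]
print(sorted(propagate_dead({"b"}, must_have)))
```

The edge (d, e) is untouched because e is not dead, so d keeps its colour.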
Serialisation
Serialise non-dead NonCommuting edges
S: Before and NonCommuting edges from I
Delete red node & edges
Insert a Before edge along each NonCommuting edge in S, B, O
Delete the NonCommuting edge once it is ordered
Terminate when no NonCommuting edges remain
Concurrent
May create new cycles in B
Many variants
[Figure: I graph and S graph]
Serialisation spectrum
S-Null: Assume no unordered NonCommuting pairs; do nothing [Usenet, C-ESDS]
S-Random: baseline
S-Conservative: convert to conflict [DB]
S-TotalOrder: UIDs [NC-ESDS]
S-HappensBefore: follow Happens-Before [state-machine replication]
Output
O: edges from I
Colours from Conflict Breaking, Agreement
Edges from Serialisation
When 3 sub-algorithms have all terminated:
Make red nodes dead, others guaranteed
Scheduling partial order
[Figure: I graph and O graph, with init]
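Once the sub-algorithms have terminated, any linear extension of the output partial order over the non-red nodes is a schedule. The sketch below assumes Python 3.9 or later (for the standard graphlib module), that init is placed first by convention and is not listed among the nodes, and that the function name output_schedule is hypothetical.

```python
from graphlib import TopologicalSorter


def output_schedule(nodes, before, red):
    """Return one schedule compatible with the O graph's partial order."""
    # Red nodes are dead: drop them together with their edges.
    predecessors = {n: set() for n in nodes if n not in red}
    for a, b in before:
        if a in predecessors and b in predecessors:
            predecessors[b].add(a)
    # static_order raises graphlib.CycleError if a cycle is left.
    order = list(TopologicalSorter(predecessors).static_order())
    return ["init"] + order


schedule = output_schedule(
    nodes={"a", "b", "c", "d"},
    before=[("a", "b"), ("b", "c"), ("d", "c")],
    red={"b"},
)
print(schedule)
```

Several schedules may satisfy the same partial order, so different sites can replay different linearisations of it.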
Cycle-avoiding serialisation algorithm
Idea: given some node
Consider all 2^4 possible neighbourhoods
Serialise in a direction that cannot create a cycle, if one exists
Otherwise deterministic global order
[Figure: the possible neighbourhoods]
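One plausible reading of the neighbourhood idea is sketched below. It assumes that the neighbourhood of a non-commuting pair (a, b) is described by four yes/no facts: whether a and b each have incoming and outgoing Before edges. Ordering a before b can only close a cycle through a path from b back to a, which needs an outgoing edge at b and an incoming edge at a; if either is missing, that direction is safe. The fallback on identifier order and the name serialise_pair are assumptions.

```python
def serialise_pair(a, b, before):
    """Choose the Before edge to insert for the non-commuting pair (a, b)."""

    def has_in(n):
        return any(y == n for _, y in before)

    def has_out(n):
        return any(x == n for x, _ in before)

    # a -> b is safe when no path can lead from b back to a.
    if not has_in(a) or not has_out(b):
        return (a, b)
    # Symmetric test for b -> a.
    if not has_in(b) or not has_out(a):
        return (b, a)
    # Neither direction is locally safe: fall back to a global order.
    return (a, b) if a < b else (b, a)


print(serialise_pair("a", "b", [("x", "a"), ("b", "y")]))
```

The local test is conservative: a direction it rejects may still be cycle-free globally, but a direction it accepts never creates a cycle.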
Cycle-free serialisation algorithm
Start when B acyclic. In S:
Choose two nodes α, β
Lock α, β
Atomically perform the cycle-avoiding serialisation move:
• Insert the Before edge in S, O
• Delete the NonCommuting edge
Unlock
Never causes aborts
Pairwise agreement
Isolation
Transaction isolation
T: initially same as S + transactions
If … with … between T1, T2
Then … with … between T1, T2
Terminate when done
Concurrent, asynchronous
May create new cycles in B
C-B does not terminate before isolation
Many variants
[Figure: I graph and T graph]
Example
No two-phase commit
T1 = d1.r d2.w
T2 = d2.r d1.w d3.r
T3 = d3.w
schedule = d2.r d1.w d3.w d3.r d1.r d2.w
[Figure: data items d1 d2 / d1 d2 d3 / d3; labels: serialise, isolation, serialise]
Simulations: Pseudo-realistic
B-HD = high-degree
B-Cons = conservative
B-LM = local minimum
S-AC = avoid-cycles
S-Rand = random
S-Cons = conservative
Joyce
Document = multilog
1 operation log / user
Operations
Constraints = logical invariants
Local views
Write to my log: updates are local
View: collect multilog, break conflicts, replay
Consistent: resolution & replay satisfies constraints
Convergence: authoritative log
Joyce collaboration experience
[Figure: timeline with Suzy and Bob; labels: paint red, insert smiley, reconcile, reconcile]
Local views
Reconcile
Unlimited, selective undo
Convergence
Commit log
Conclusion
Actions & constraints: simple, formal model
Encode application semantics
Express consistency
Basic components of consistency
Decide
Mergeability
Universal consistency protocol
Sub-algorithms
Mix & Match
Cycle-avoiding serialisation
Partial replication
----
Site schedule
S ∈ Σ(M)
Choose any sound schedule
Si(t+1) / Si(t) / Si’(t) may differ greatly
More actions ⇒ more non-determinism
More constraints ⇒ less non-determinism
Enough to ensure consistency
Si(t) ∈ Σ(Mi(t))
Example
More actions, more schedules
More constraints, fewer schedules
Eventual Consistency
From the literature: EC =
If all clients stop submitting new updates,
Then eventually all replicas converge to the same value
(Eventually decide)
Common monotonic prefix property
There exists a prefix (i, t):
Monotonic: t < t’ ⇒ (i, t) << (i, t’)
Equivalence: (i, t) is equivalent to (i’, t)
Eventually inclusive: every action in Ki(t) is eventually in some (i, t’)
CMP: goals to achieve
Composing sub-algorithms
Parallel composition:
Any conflict-breaking algorithm
Any serialisation algorithm
Subtle termination conditions:
Parallel composition. Terminate: (1) Serialisation, (2) Conflict-breaking, (3) Agreement
Fast agreement minimises red nodes
Sequential composition: conflict breaking + agreement ⇒ S acyclic. Then S-NoCycles + synchronisation
S-AvoidCycles
Simulations: range
B-HighDegree + S-NoCycles
Simulations: Random
B-HD = high-degree
B-Cons = conservative
B-LM = local minimum
S-AC = avoid-cycles
S-Rand = random
S-Cons = conservative
Incremental algorithm
Cannot decide an action until all its constraints are known
Iteratively detect quiescent subgraph: timestamp matrix
Output from iteration n = input to iteration n+1
Verify inclusion property
Partial replication
A site replicates any number of disjoint databases
Receives actions, constraints relative to its replicas only
Consistency:
Mergeability
Eventual decision w.r.t. database
No need for global consensus
Omniscient observer = full replication site
Partial replication + Cycle-free serialisation
Partitioned database + partial replication
Operations commute across partition
A small number (often 1) of primary nodes decide partition
In-partition NonCommute: primary decides
Cross-partition: pairwise agreement
Total order unnecessary (unlike state-machine replication)