Flexible approaches to replicating shared data consistently

41
Flexible approaches to replicating shared data consistently Marc Shapiro Joint work with Nishith Krishna and Karthikeyan Bhargavan

description

Flexible approaches to replicating shared data consistently. Marc Shapiro Joint work with Nishith Krishna and Karthikeyan Bhargavan. Sharing information on a global scale. Enterprise collaboration, business information. Large numbers of users Globally distributed - PowerPoint PPT Presentation

Transcript of Flexible approaches to replicating shared data consistently

Page 1: Flexible approaches to replicating shared data consistently

Flexible approaches to replicating shared data consistently

Marc ShapiroJoint work with Nishith Krishna and Karthikeyan Bhargavan

Page 2: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 2

Sharing information on a global scale

Large numbers of users Globally distributed Concurrent access and update Invariants between objects Conflicts are rare but do occur Variable network bandwidth, high latency

Replicate for fault tolerance, reduced latency, load balancing

Enterprise collaboration, business information

Page 3: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 3

Important lesson #1

Replication is beneficial in many information sharing scenarios:Preserves autonomyReduces access latencyImproves fault-toleranceSupports disconnected operation

Page 4: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 4

System model

0

0

1

2

0 3

Many possible schedulesIn or out? Order?

Converge

rejectedrejected

executedexecuted

action = action = value, delta value, delta or operationor operationBob

Suzy

Mary

Page 5: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 5

Synchronous updates (pessimistic replication)

1SR = 1-Copy Serialisability

Avoid conflicts a priori by locking

Sequential access Intuitive, transparent

Vulnerable to: latency disconnection,

faults deadlock

Doesn’t scale if write contention

paintred

insertsmiley

timeBob

Mary

lock

lock

Page 6: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 6

Asynchronous updates (optimistic replication)

Resolve conflicts a posteriori

Disconnected, cooperative

Powerful Batch & optimise

Tentative: Diverge, rollback Different user

experienceDoesn’t scale if write contention

paintred

insertsmiley

time

Suzy

reconcileBob

Page 7: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 7

Important lesson #2

No replication scheme is ideal for all applications.

Performance, complexity, consistency trade-offs

No “one-size-fits-all” designPessimistic / optimistic mode visible

to usersContention / conflicts critical

Page 8: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 8

Conflicts & non-commute

Conflict: concurrent execution would violate application invariante.g. calendar: no double booking

Non-commuting operations: decide orderCommuting: optimisations

SchedulingConflict: Is action in or out?Non-commuting: Ordering?

Bob & SuzyMon 10:00

Mary & SuzyMon 10:00

time

Bob

Suzyprice -= 10€

price *= 1.05

Page 9: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 9

Important lesson #3

Understand your application needs and design with replication in mind.Capture invariants.Design for commutativity. Avoid concurrent non-commuting

operations.Avoid conflicting operations.Otherwise have modest scalability

expectations.

Page 10: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 10

Exploring the consistency design spaceUnderstanding replication & consistency

SemanticsAsynchronous / optimistic updatesPartial replicationDecentralised

Constraint-graph representationBreak into simpler sub-problemsComposable sub-algorithmsSpectrum of solutions

New serialisation algorithmNo unnecessary aborts

Page 11: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 11

Scenario

Promote 1 Julysalary += 1000

Redundancy

salary := 0

MustHave

Before

NonCommutingConflict

Atomic

CausalDependence

0 1

0 2

Page 12: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 12

Multilog & schedule

M = (K, , , ) Local view per site:Known actionsKnown constraintsGrows over time

Sound schedule: S = init … (M): known actions, zero or once , S <S S S

M sound (M)

Page 13: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 13

Protocol primitives

Guar (M) = { | every sound schedule } Dead (M) = { | every sound schedule }Serialised(M) = { |

Dead (M) }Decided (M) = Dead(M)

(Guar(M) Serialised(M))

Monotonic in tM sound Guar (M) Dead (M)

Page 14: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 14

Consistency: a formal definition

Mergeability: Any combination of multilogs remains sound: i, i’, i’’,…, t, t’, t’’…

Mi(t) Mi’(t’) Mi’’(t’’)… soundEventual Decision: Every action eventually

decided everywhere:, i, j, t:

Ki(t) t’, Decided (Mj (t’))

Omniscient observer:(Dead)(Guar )

Page 15: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 15

Abstract consistency algorithm

Input: any application semantics (K, , , )

Decompose into very simple sub-problems

Graphs:I = inputB = BeforeM = MustHaveS = SerialisationO = output

Output: scheduling partial order

O graph

I graph

Page 16: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 16

Conflict breaking

Make dead at least one action per cycle

B: Before edges from IRedden a nodeDelete red node and its

edgesTerminate when acyclic

Concurrent, asynchronousNumerous variants

B graph

I graph

Page 17: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 17

Conflict-breaking spectrum

B-Null: B assumed acyclic; do nothing [file systems, Usenet, ESDS]

B-TotalOrder, B-LocalMin: UIDs [DB]B-Conservative: Redden every node cycle

[Holliday]B-HighDegree: Redden highest-degree node

[Hamadi]Sub-algorithms: Not optimalB-IceCube: Globally minimise red nodesB-Arbitrary: application/user

Page 18: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 18

Agreement

If Dead then DeadM: MustHave edges from IColour shared across graphs

Propagate colour along edgesConcurrent, asynchronous

M graph

I graph

Page 19: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 19

Serialisation

Serialise non-dead edgesS: , edges from I

Delete red node & edgesInsert along in S, B, ODelete when Terminate when no edges

Concurrent May create new cycles in BMany variants

S graph

I graph

Page 20: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 20

Serialisation spectrum

S-Null: Assume no unordered ; do nothing [Usenet, C-ESDS]

S-Random: baselineS-Conservative: convert to conflict [DB]S-TotalOrder: UIDs [NC-ESDS]S-HappensBefore: follow Happens-Before

[state-machine replication]

Page 21: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 21

Output

O: edges, from IColours from Conflict Breaking,

Agreement edges from Serialisation

When 3 sub-algorithms have all terminated

Make red nodes dead, others guaranteed

Scheduling partial order

O graph

I graph

init

Page 22: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 22

Cycle-avoiding serialisation algorithmIdea: given some node

Consider all 24 possible neighbourhoodsSerialise in direction that cannot create

a cycle, if existsOtherwise deterministic global order

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

Page 23: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 23

Cycle-free serialisation algorithm

Start when B acyclic. In S:Choose two nodes , Lock , Atomically perform the cycle-avoiding

serialisation move:• Insert in S, O• Delete

UnlockNever causes abortsPairwise agreement

Page 24: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 24

Isolation

Transaction isolationT: initially same as S + transactions

If with between T1, T2Then with between T1, T2Terminate when done

Concurrent, asynchronousMay create new cycles in B

C-B does not terminate before isolation

Many variants

T graph

I graph

Page 25: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 25

Example

No two-phase commit

d1 d2

d1 d2 d3

d3

T1 = d1.r d2.w

T2 = d2.r d1.w d3.r

T3 = d3.w

schedule = d2.r d1.w d3.w d3.r d1.r d2.w

serialise

isolation

serialise

Page 26: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 26

Simulations: Pseudo-realistic

B-HD = high-degreeB-Cons = conservativeB-LM = local minimum

S-AC = avoid-cyclesS-Rand = randomS-Cons = conservative

Page 27: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 27

Joyce

Document = multilog1 operation log / userOperationsConstraints = logical invariants

Local viewsWrite to my log: updates are localView: collect multilog, break conflicts,

replayConsistent: resolution & replay satisfies

constraintsConvergence: authoritative log

Page 28: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 28

Joyce collaboration experience

paintred

insertsmiley

time

Suzy

reconcileBob

reconcile

Local views Reconcile Unlimited, selective undo

Convergence Commit log

Page 29: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 29

Conclusion

Actions & constraints: simple, formal modelEncode application semanticsExpress consistency

Basic components of consistency DecideMergeability

Universal consistency protocolSub-algorithmsMix & MatchCycle-avoiding serialisation

Partial replication

Page 30: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 30

----

Page 31: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 31

Site schedule

S (M)Choose any sound scheduleSi(t+1) / Si(t) / Si’(t) may differ greatly

More actions more non-determinism More constraints less non-determinismEnough to ensure consistency

0

0

0

Si(t) (Mi(t))

Page 32: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 32

Example more actionsmore actions

more schedulesmore schedulesmore constraintsmore constraints

fewer schedulesfewer schedules

Page 33: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 33

Eventual Consistency

From the literature: EC =If all clients stop submitting new

updates,Then eventually all replicas converge to

the same value(Eventually decide)

Page 34: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 34

Common monotonic prefix property

There exists prefix (i,t):Monotonic: t < t’ (i, t) << (i, t’)Equivalence: (i, t) (i’, t)Eventually inclusive: Ki(t) (i, t’)

CMP: goals to achieve

0

0

0

Page 35: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 35

Composing sub-algorithms

Parallel composition:Any conflict-breaking algorithmAny serialisation algorithm

Subtle termination conditions: Parallel composition. Terminate: (1)

Serialisation, (2) Conflict-breaking, (3) Agreement

Fast agreement minimises red nodes Sequential composition: conflict breaking +

agreement S acyclic. Then S-NoCycles + synchronisation

Page 36: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 36

S-AvoidCycles

Page 37: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 37

Simulations: range

B-HighDegree + S-NoCycles

Page 38: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 38

Simulations: Random

B-HD = high-degreeB-Cons = conservativeB-LM = local minimum

S-AC = avoid-cyclesS-Rand = randomS-Cons = conservative

Page 39: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 39

Incremental algorithm

Cannot decide until all its constraints known

Iteratively dectect quiescent subgraph: timestamp matrix

Output from interation n = input to iteration n+1Verify inclusion property

Page 40: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 40

Partial replication

A site replicates any number of disjoint databases

Receives actions, constraints relative to its replicas only

Consistency:MergeabilityEventual decision w.r.t. database

No need for global consensusOmniscient observer = full replication site

Page 41: Flexible approaches to replicating shared data consistently

MS Academic Days Europe 2005-04 Exploring consistency design space 41

Partial replication + Cycle-free serialisation

Partitioned database + partial replicationOperations commute across partitionA small number (often 1) of primary nodes

decide partitionIn-partition NonCommute: primary decidesCross-partition: pairwise agreementTotal order unnecessary ( state-machine

replication)