Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why...

53
KTH ROYAL INSTITUTE OF TECHNOLOGY Replication Vladimir Vlassov and Johan Montelius

Transcript of Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why...

Page 1: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

KTH ROYAL INSTITUTE OF TECHNOLOGY

Replication Vladimir Vlassov and Johan Montelius

Page 2: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Replication - why Performance

• latency • throughput

Availability • service respond despite crashes

Fault tolerance • service consistent despite failures

ID2201 DISTRIBUTED SYSTEMS / REPLICATION 2

Page 3: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Challenge A replicated service should, to the users, look like a non-replicated service. What do we mean by “look like”?

• linearizable • sequential consistency • causal consistency • eventual consistency

3 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 4: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Linearizable A replicated service is said to be linearizable if for any execution there is some interleaving of operations that:

• meets the specification of a non-replicated service • matches the real time order of operations in the real execution

All operations seam to have happened: atomically, at the correct time, one after the other.

A register that provides lineraizability is called an atomic register

4 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 5: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Registers Safe register

– If read does not overlap write, read returns the value written by the most recent write – the register is safe

– If read overlaps write, it returns any valid value Regular register

– If read does not overlap write, the register is safe – If read overlaps write, it returns either the old or the new value

Atomic register (linearizable) – If read does not overlap write, the register is safe – If read overlaps with write, it returns either the old value or the

new value but not newer than the next read

5 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 6: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Linearizable

6 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

server

p

q

r

w(x,5)

w(x,7)

r(x)

ok

ok

?

Page 7: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Linearizable

7 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

server

p

q

r

w(x,5)

w(x,7)

r(x)

ok

ok

7

Page 8: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Linearizable

8 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

server

p

q

r

w(x,5)

w(x,7)

r(x)

ok

ok

5

Page 9: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Linearizable

9 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

server

p

q

r

w(x,5)

w(x,7)

r(x)

ok

ok

0

We guarantee that there is a sequence that makes sense.

Page 10: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Why would it not make sense?

10 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

server

p

q

r

w(x,5)

r(x)

ok

?

cache p

cache q

ache r

w(x,7) ok

Page 11: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Why would it not make sense?

11 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

replica 1

p

q

r

w(x,5)

w(x,7)

r(x)

ok

ok

?

replica 2 replica 3

Page 12: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Sequential consistency A replicated service is said to be sequential consistent if for any execution there is some interleaving of operations that:

• meets the specification of a non-replicated service • matches the program order of operations in the real

execution Don’t worry about real time as long as it make sense.

12 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 13: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Still have to make sense

13 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Assume x and y is initially set to 0

p

q

w(x,5) ok

w(y,5) ok

r(y) ?

r(x) ?

Page 14: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Still have to make sense

14 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Assume x and y is initially set to 0

p

q

w(x,5) ok

w(y,5) ok

r(y) 0

r(x) ?

Page 15: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Still have to make sense

15 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Assume x and y is initially set to 0

p

q

w(x,5) ok

w(y,5) ok

r(y) ?

r(x) 0

Page 16: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Still have to make sense

16 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Assume x and y is initially set to 0

p

q

w(x,5) ok

w(y,5) ok

r(y) 0

r(x) 0

There should exist one total order of the operations that is consistent with the results. Total Order Store: this is still ok in X86 architecture.

Page 17: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Even more relaxed

17 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

p w(x,5) ok

w(x,7) ok q

As long as it make sense for each process. Causal consistency, unordered operations could be seen in different order.

r r(x) 5

s

r(x) 7

r(x) 7 r(x) 5

Page 18: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Eventual consistency There exist a total order that will eventually be visible to all. More on this later.

18 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 19: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Replication system model

19 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

replicated service

RM

RM

RM

RM

client

client

front end

front end

• front end knows about replication scheme, could be implemented on the client side

• replica managers (RM) coordinate operations to guarantee consistency

Page 20: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• Request: from front end to one or more replicas

• Coordination: decide on order etc • Execution: the actual execution of the

request • Agreement: agree on possible state

change • Response: reply received by front-

end and delivered to client

Replication system model

20 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

replicated service

front end

front end

Page 21: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• adding and removing nodes • ordered multicast • leader election • view delivery

Group membership service

21 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

group membership service

front end

front end

Page 22: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• reliable multicast • delivered in same view

View-synchronous group communication

22 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

p

q

r

view: p,q,r view: q,r

crash

Page 23: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• reliable multicast • delivered in same view • never deliver from excluded

node • never deliver not yet included

node

View-synchronous group communication

23 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

p

q

r

view: p,q,r view: q,r

crash

not allowed

Page 24: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Passive and active replication • Passive replication: one primary server and several

backup servers

• Active replication: servers on equal term

24 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 25: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• Request: front end sends request to primary

• Coordination: primary checks if it is a new request

• Execution: executes and stores response

• Agreement: sends updated state and reply to backup servers

• Response: sends reply to front-end

Passive replication

25 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM

RM

RM

RM

front end

front end

Page 26: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

What about crashes Primary crashes:

• backups will receive new view with primary missing • new primary is elected

if front end re-sends request • either the reply is known and is resent • or the execution proceeds as normal

26 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 27: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Passive replication - consistency The primary replica manager will serialize all operations.

We can provide linearizability.

27 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 28: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Passive replication – Pros and cons Pros

• All operations passes through a primary that linearize operations.

• Works even if execution is non-deterministic Cons

• Delivering state change can be costly. • Replicas under-utilized. • View-synchrony and leader election could be

expensive.

28 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 29: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• Request: front end multicast to all • Coordination: reliable total order delivery • Execution: all replicas execute request • Agreement: no need • Response: all replicas reply to front end

Active replication

29 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM

RM

RM

front end

front end

Page 30: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Active replication - consistency Sequential consistency:

• All replicas execute the same sequence of operations. • All replicas produce the same answer.

Linearizability: • Total order multicast does not guarantee real-time

order. • Linearizability not guaranteed if front-end acknowledge

operation before it has been processed by replicas.

30 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 31: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Active replication – Pros and cons Pros

• No need to send state changes. • No need to change existing servers. • Read request could possibly be sent directly to replicas. • Could survive Byzantine failures.

Cons: • Requires total order multicast. • Requires deterministic execution.

31 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 32: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Availability Both replication schemes require that servers are available.

If a server crashes it will take some time to detect and remove the faulty node.

Can we build a system that responds even if some nodes are not available?

32 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 33: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

It is impossible for a distributed system to provide all three guarantees at the same time: • Consistency (all nodes see the same

data at the same time) • Availability (every request receives a

response about whether it succeeded or failed)

• Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)

The CAP theorem

33 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

C

A P

Consistency

Availability Partition tolerance

Page 34: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

The CAP theorem You can not have a consistent and always available system if you’re in an environment where you face network partitions.

When there is a network partition: • limit operations i.e. some operations are not available, • continue, but record all operations that could cause an

inconsistency.

When the system re-connects: merge operations performed in separate partitions.

34 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 35: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

The CAP theorem An alternative is to relax consistency.

• BASE: Basic Availability, Soft-state, Eventual consistency Used by many large scale key-value stores and replicated distributed services

35 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 36: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• replica managers interchange update

messages • updates propagate through the network • sequential consistency not guaranteed • we want to provide causal consistency

Gossip architecture

36 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

What if we only need to provide causal consistency?

RM

RM

RM

RM

RM

RM

Page 37: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Vector clocks A vector clock with one index per replica manager. Each update will be tagged with a vector clock timestamp. Some updates are concurrent!

37 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 38: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• one index per replica manager • front ends keep vector clocks • replica mangers apply updates in order • causal consistency guaranteed

The front end

38 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM

RM

RM

front end

front end

Page 39: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• send a query with timestamp • check current time, wait for updates • update will arrive

The front end

39 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM-2

RM-1

RM-3

front end

<2,5,5> <2,4,6>

read <2,4,6>

update <2,5,6>

Page 40: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• send a query with timestamp • check current time, wait for updates • update will arrive • update state and clock

The front end

40 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM-2

RM-1

RM-3

front end

<2,5,6> <2,4,6>

read <2,4,6>

update <2,5,6>

Page 41: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• send a query with timestamp • check current time, wait for updates • update will arrive • update state and clock • reply

The front end

41 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM-2

RM-1

RM-3

front end

<2,5,6> <2,4,6>

ok <2,5,6>

update <2,5,6>

Page 42: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• send a query with timestamp • check current time, wait for updates • update will arrive • update state and clock • reply • update state and clock

The front end

42 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM-2

RM-1

RM-3

front end

<2,5,6> <2,5,6>

ok <2,5,6>

update <2,5,6>

Page 43: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

The replica manager The replica manager has a hold-back queue, operations that are too early to execute. As updates arrive the replica will execute updates, and pending read operations. Updates are partially ordered.

43 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 44: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• operation with timestamp

Update operation

44 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM-2

RM-1

RM-3

front end

<2,4,5> <2,5,6>

write <2,5,6>

Page 45: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• operation with timestamp • reply with unique timestamp

Update operation

45 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM-2

RM-1

RM-3

front end

<2,4,5> <3,5,6>

ok <3,5,6>

Page 46: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• operation with timestamp • reply with unique timestamp • wait for updates

Update operation

46 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM-2

RM-1

RM-3

front end

<2,4,6> <3,5,6>

update <2,4,6>

Page 47: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• operation with timestamp • reply with unique timestamp • wait for updates

Update operation

47 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM-2

RM-1

RM-3

front end

<2,5,6> <3,5,6>

update <2,5,5>

Page 48: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

• operation with timestamp • reply with unique timestamp • wait for updates • perform write when safe

Update operation

48 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

RM-2

RM-1

RM-3

front end

<3,5,6> <3,5,6>

Page 49: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Read operations: on hold until safe to answer. Update operations from front end.

• front end adds unique id • replica checks that it is not a duplicate • replica replies with unique timestamp • placed in update log

Implementation

49 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Gossip operations • interchange part of update log with

partners • place in update log • provide information on which message

a replica has seen • remove applied operations that has

been seen by all replicas Execute operations

• apply stable operations • in happen before order

Page 50: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Stable operations and order of execution • An operation in the log is stable if its time stamp, as provided by

the front end, is less than or equal to the value timestamp.

• Operations must be executed in the order as described by the replica managers in their replies to the front ends.

50 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 51: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Causal, forced and immediate Sometimes we would like to have stronger consistency guarantees:

• Forced: total order in relation to other forced updates. • Immediate: total order in relation to all updates.

Will of course require that we do some more book keeping.

51 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 52: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Gossip architectures • How many replicas can we have? • Have hundreds of read-only replicas and a handful of update replicas. • Will an application cope with causal consistency only? • How eager should the gossiping be? • False ordering - we order things that are not necessarily in causal

relation to each other.

52 ID2201 DISTRIBUTED SYSTEMS / REPLICATION

Page 53: Replication - KTH€¦ · Replication Vladimir Vlassov and Johan Montelius . Replication - why Performance • latency • throughput Availability • service respond despite crashes

Summary • Replication: performance, availability, fault tolerance • Consistency: linearizable, sequential consistency, ... • Passive or active replication • The CAP theorem • Gossip architectures for causal consistency

53 ID2201 DISTRIBUTED SYSTEMS / REPLICATION