CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Fall 2011, Prof. Jennifer Welch

Set 18: Wait-Free Simulations Beyond Registers



Page 2: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Data Types Beyond Registers


Registers support the operations read and write.

We've seen wait-free simulations of one kind of register out of another kind: different numbers of values, readers, writers.

What about (wait-free) simulating a significantly different kind of data type out of registers?

More generally, what about (wait-free) simulating an object of type X out of objects of type Y?

Page 3: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Key Insight


Ability of objects of type Y to be used to simulate an object of type X is related to the ability of those data types to solve consensus!

We are focusing on systems that are asynchronous, shared memory, and wait-free.

Page 4: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

FIFO Queue Example


Sequential specification of a FIFO queue:

  operation with invocation enq(x) and response ack

  operation with invocation deq and response return(x)

  a sequence of operations is allowable iff each deq returns the oldest enqueued value that has not yet been dequeued (returns ⊥ if queue is empty)
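The sequential specification above can be sketched as a small checker. This is only an illustration: the class name `SeqFifoQueue`, the helper `is_allowable`, and the use of `None` for the empty-queue response are assumptions made for the sketch, not part of the slides.

```python
from collections import deque

BOTTOM = None  # assumed stand-in for the "queue is empty" response


class SeqFifoQueue:
    """Sequential FIFO queue: enq(x) responds ack, deq responds return(x)."""

    def __init__(self):
        self._items = deque()

    def enq(self, x):
        self._items.append(x)      # newest value goes to the tail
        return "ack"

    def deq(self):
        # Oldest value not yet dequeued, or BOTTOM if the queue is empty.
        return self._items.popleft() if self._items else BOTTOM


def is_allowable(ops):
    """Check a sequence of (name, argument, response) triples against the spec."""
    q = SeqFifoQueue()
    for name, arg, response in ops:
        actual = q.enq(arg) if name == "enq" else q.deq()
        if actual != response:
            return False
    return True
```

A sequence in which a deq returns anything other than the oldest remaining value is rejected by `is_allowable`.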

Page 5: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Consensus Algorithm for n = 2 Using FIFO Queue


Shared objects: one shared FIFO queue Q, two shared registers Prefer[0] and Prefer[1]

Initially Q = [0] and Prefer[i] = ⊥

Code for pi:

  Prefer[i] := pi's input        // write my input into my register
  val := deq(Q)                  // use shared queue to arbitrate between the 2 procs
  if val = 0 then
      decide on pi's input       // first one to dequeue the initial 0 wins, decision value is its input
  else
      temp := Prefer[1 - i]      // loser obtains decision value from other proc's register
      decide temp
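The 2-proc algorithm can be tried out with two threads. This is a minimal sketch under stated assumptions: the lock inside `AtomicFifoQueue` only stands in for an atomic (linearizable) queue object and is not itself wait-free, and the names `QueueConsensus2` and `run_once` are made up for illustration.

```python
import threading
from collections import deque

BOTTOM = None  # assumed stand-in for the "empty" / "not yet written" value


class AtomicFifoQueue:
    """FIFO queue whose operations are atomic (here via a lock, for illustration)."""

    def __init__(self, initial):
        self._items = deque(initial)
        self._lock = threading.Lock()

    def deq(self):
        with self._lock:
            return self._items.popleft() if self._items else BOTTOM


class QueueConsensus2:
    """Consensus for procs p0 and p1; each calls decide at most once."""

    def __init__(self):
        self.Q = AtomicFifoQueue([0])      # initially Q = [0]
        self.prefer = [BOTTOM, BOTTOM]     # two shared registers

    def decide(self, i, my_input):
        self.prefer[i] = my_input          # write my input into my register
        val = self.Q.deq()                 # arbitrate using the shared queue
        if val == 0:
            return my_input                # winner: dequeued the initial 0
        return self.prefer[1 - i]          # loser: adopt the winner's input


def run_once(inputs):
    """Run p0 and p1 as threads and return their two decisions."""
    obj = QueueConsensus2()
    decisions = [None, None]

    def proc(i):
        decisions[i] = obj.decide(i, inputs[i])

    threads = [threading.Thread(target=proc, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

The loser can safely read the other register because the winner writes its input before it dequeues, so the value is already there when the loser's deq returns.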

Page 6: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Implications of Consensus Algorithm Using FIFO Queue


Suppose we want to wait-free simulate a FIFO queue using read/write registers.

Is this possible? No! If it were possible, we could solve consensus for 2 procs using only registers, which is impossible:

  simulate a FIFO queue using registers

  use simulated queue and previous algorithm to solve consensus

Page 7: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Extend Algorithm to More Procs?


Can we use FIFO queues to solve consensus with more than 2 procs?

The ability to atomically dequeue a value was key to the 2-proc alg:

  one proc. learns it is the winner

  the other learns it is the loser, therefore the id of the winner is obvious

Not clear how to handle 3 procs.

Suppose we have a different data type:

Page 8: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Compare & Swap Specification


compare&swap(X : shared memory address, old : value, new : value)
    previous := X        // previous is a local var.
    if previous = old then X := new
    return previous

[Figure: shared location X, with old replaced by new]
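The specification can be mirrored in a few lines. This is a sketch only: Python has no built-in compare&swap on ordinary variables, so a lock is used here to make the three steps atomic, and the class name `CompareAndSwap` and its `read` method are assumptions for illustration.

```python
import threading


class CompareAndSwap:
    """A shared location X supporting an atomic compare&swap (lock-based sketch)."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, old, new):
        with self._lock:               # the three steps below happen atomically
            previous = self._value     # previous is a local var
            if previous == old:
                self._value = new
            return previous

    def read(self):
        with self._lock:
            return self._value
```

A caller learns whether its swap took effect by comparing the returned value with `old`.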

Page 9: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Consensus Algorithm Using Compare-and-Swap


Shared object: one shared C&S object First

Initially First = ⊥

  val := compare&swap(First, ⊥, my input)    // if First = ⊥ then replace with my input
  if val = ⊥ then decide on my input
  else decide val

The single compare&swap simultaneously indicates the winner and the value to be decided by all the losers.
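The compare&swap algorithm works for any number of procs, which a thread-based sketch can illustrate. Assumptions: the lock-based `CompareAndSwap` class stands in for a hardware compare&swap object, `BOTTOM` is a sentinel assumed to differ from every input, and `CasConsensus` and `run` are hypothetical names.

```python
import threading

BOTTOM = object()  # sentinel for the initial value; assumed distinct from every input


class CompareAndSwap:
    """Lock-based stand-in for an atomic compare&swap object."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, old, new):
        with self._lock:
            previous = self._value
            if previous is old:
                self._value = new
            return previous


class CasConsensus:
    """Consensus for any number of procs using one compare&swap object."""

    def __init__(self):
        self.first = CompareAndSwap(BOTTOM)        # initially First = BOTTOM

    def decide(self, my_input):
        val = self.first.compare_and_swap(BOTTOM, my_input)
        if val is BOTTOM:
            return my_input                        # I was first: decide my input
        return val                                 # otherwise decide the winner's input


def run(inputs):
    """Run one thread per input and return the list of decisions."""
    obj = CasConsensus()
    decisions = [None] * len(inputs)

    def proc(i):
        decisions[i] = obj.decide(inputs[i])

    threads = [threading.Thread(target=proc, args=(i,)) for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Unlike the queue-based algorithm, a loser does not need to know who won: the value returned by compare&swap is already the decision.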

Page 10: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Impossibility of 3-Proc Consensus with FIFO Queue


Theorem (15.3): Wait-free consensus is impossible using FIFO queues and registers if n > 2.

Proof: Same structure as for registers. Key difference is when considering the situation in which C is bivalent, p0(C) is 0-valent, and p1(C) is 1-valent.

Page 11: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Impossibility of 3-Proc Consensus with FIFO Queues


p0 and p1 must be accessing the same FIFO queue.

Case 1: Both steps are deq's.

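[Figure: from bivalent C, p0 deq's leads to a 0-valent configuration and p1 deq's leads to a 1-valent configuration. After p1 then deq's on the left and p0 then deq's on the right, the resulting 0-valent and 1-valent configurations look the same to p2.]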

Page 12: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Impossibility Proof


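Case 2: p0 deq's and p1 enq's.

Case 2.1: The queue is not empty in C.

[Figure: from bivalent C, p0 deq's leads to a 0-valent configuration and p1 enq's leads to a 1-valent configuration. Then p1 enq's after p0's deq, and p0 deq's after p1's enq.]

Since the queue is not empty, the deq and the enq commute: both orders reach the same configuration (marked "?" in the figure), which would have to be both 0-valent and 1-valent.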

Page 13: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Impossibility Proof


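Case 2: p0 deq's and p1 enq's.

Case 2.2: The queue is empty in C.

[Figure: in bivalent C the queue is empty. On the left, p0 deq's leads to a 0-valent configuration in which the queue is still empty. On the right, p1 enq's leads to a 1-valent configuration, and then p0 deq's leads to a 1-valent configuration in which the queue is empty again. The 0-valent configuration on the left and the final 1-valent configuration on the right look the same to p2.]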

Page 14: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Impossibility Proof


Case 3: Both p0 and p1 enq (on same queue).

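[Figure: from bivalent C, p0 enq's A leads to a 0-valent configuration and p1 enq's B leads to a 1-valent configuration. Left branch: p1 then enq's B, p0 takes steps until deq'ing A, and p1 takes steps until deq'ing B; the result is 0-valent. Right branch: p0 then enq's A, p0 takes steps until deq'ing B, and p1 takes steps until deq'ing A; the result is 1-valent. The two final configurations look the same to p2.]

Why do these step sequences by p0 and p1 exist?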

Page 15: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Impossibility Proof


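Case 3 cont'd: Suppose the first step sequence (p0 takes steps until deq'ing A) does not exist:

[Figure: from bivalent C, p0 enq's A leads to a 0-valent configuration and p1 enq's B leads to a 1-valent configuration. Left branch: p1 then enq's B, and p0 takes steps until deciding but never deq's A; it decides 0. Right branch: p0 then enq's A, and p0 takes the same number of steps as on the left; it never deq's B and also decides 0.]

Deciding 0 on the right branch contradicts that branch being 1-valent.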

Page 16: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Impossibility Proof


Case 3 cont'd: Prove existence of the remaining step sequences similarly.

Thus there is no wait-free algorithm for consensus with 3 procs using FIFO queues and registers.

Page 17: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Implications


Suppose we want to wait-free simulate a compare&swap object using FIFO queues (and registers).

Is this possible? Not if n > 2! If it were possible, we could solve consensus using FIFO queues (and registers):

  simulate a compare&swap object using FIFO queues (and registers)

  use simulated compare&swap object and c&s algorithm to solve consensus

Page 18: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Generalize these Arguments


Previous results concerning FIFO queues and compare&swap suggest a criterion for determining if wait-free simulations exist:

based on ability of the data types to solve consensus for a certain number of procs.

Page 19: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Consensus Number


Data type X has consensus number n if n is the largest number of procs. for which consensus can be solved using only objects of type X and read/write registers.

data type              consensus number
read/write register    1
FIFO queue             2
compare&swap           ∞

Page 20: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Using Consensus Numbers


Theorem (15.5): If data type X has consensus number m and data type Y has consensus number n with n > m, then there is no wait-free simulation of an object of type Y using objects of type X and read/write registers in a system with more than m procs.

[Figure: an object of type Y simulated on top of objects of type X and registers]

Page 21: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Using Consensus Numbers


Proof: Suppose in contradiction there is a wait-free simulation S of Y using X and registers in a system with k procs, where m < k ≤ n.

Construct consensus algorithm for k > m procs using objects of type X (and registers):

  Use S to simulate some objects of type Y using objects of type X (and registers)

  Use the (simulated) type Y objects (and registers) in the k-proc consensus algorithm that exists since CN(Y) = n.

This solves consensus for k > m procs using only objects of type X and registers, contradicting CN(X) = m.

Page 22: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Corollaries


There is no wait-free simulation of any object with consensus number > 1 using just read/write registers.

There is no wait-free simulation of any object with consensus number > 2 using just FIFO queues and read/write registers.

Page 23: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Universality


Let's now consider positive results relating to consensus number.

A data type is universal if objects of that type (together with read/write registers) can wait-free simulate any data type.

Theorem: If data type X has consensus number n, then it is universal in a system with at most n procs.

Page 24: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Proving Universality Result


1. Describe an algorithm that simulates any data type

     uses compare&swap (instead of any object with consensus number n)

     simulation is only non-blocking, weaker than wait-free

2. Modify to use any object with consensus number n

3. Modify to be wait-free

4. Modify to bound shared memory used

Page 25: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Non-Blocking


Non-blocking vs. wait-free is analogous to no-deadlock vs. no-lockout for mutual exclusion.

Non-blocking simulation: at any point in an execution, if at least one operation is pending (response is not yet ready to be done), then there is a finite sequence of steps by a single proc that completes one of the pending operations.

Does not ensure that every pending operation is eventually completed.

Page 26: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Universal Construction


Keep history of operations that have been applied to the simulated object as a shared linked list.

To apply an operation on the simulated object, the invoking proc. must insert an appropriate "node" into the linked list:

  it is convenient to put the newest node at the head of the list

A compare&swap object is used to keep track of the head of the list.

Page 27: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Details on Linked List


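Each linked list node has

  operation invocation

  new state of the simulated object

  operation response

  pointer to previous node (previous op)

[Figure: linked list of nodes, each with fields invocation, state, response, before. Head points to the newest node; the before pointers lead back to the anchor node, which holds the initial state.]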

Page 28: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Simulation


Initially Head points to anchor node    // represents initial state of simulated object

When inv is invoked:
    allocate a new linked list node in shared memory, pointed to by local var point
    point.inv := inv
    repeat
        h := Head                                             // h is a local var
        point.state, point.response := apply(inv, h.state)    // depends on simulated data type
        point.before := h
    until compare&swap(Head, h, point) = h    // if Head still points to same node h points to, then make Head point to new node
    do the output indicated by point.response
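The non-blocking simulation can be sketched as follows. Assumptions are labeled loudly: the lock-based `CompareAndSwap` class stands in for an atomic compare&swap object, the simulated object's state must be an immutable value, and `UniversalObject` and `queue_apply` (a FIFO queue used as the simulated data type) are hypothetical names chosen for illustration.

```python
import threading


class CompareAndSwap:
    """Lock-based stand-in for an atomic compare&swap object holding a node."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, old, new):
        with self._lock:
            previous = self._value
            if previous is old:
                self._value = new
            return previous


class Node:
    def __init__(self, inv, state=None, response=None, before=None):
        self.inv = inv              # operation invocation
        self.state = state          # new state of the simulated object
        self.response = response    # operation response
        self.before = before        # pointer to previous node


class UniversalObject:
    def __init__(self, initial_state, apply):
        self.apply = apply                                   # depends on simulated data type
        self.head = CompareAndSwap(Node(None, initial_state))  # Head points to anchor

    def invoke(self, inv):
        point = Node(inv)
        while True:                                          # repeat ... until
            h = self.head.read()
            point.state, point.response = self.apply(inv, h.state)
            point.before = h
            if self.head.compare_and_swap(h, point) is h:    # Head had not moved on
                return point.response


def queue_apply(inv, state):
    """Sequential FIFO queue; state is a tuple, inv is (name, argument)."""
    name, arg = inv
    if name == "enq":
        return state + (arg,), "ack"
    if state:
        return state[1:], state[0]
    return state, None              # deq on an empty queue
```

A proc whose compare&swap fails simply retries against the new Head, so some operation always completes, but an individual proc may retry forever; that is why this version is only non-blocking.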

Page 29: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Simulation Figure


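[Figure: proc pi holds local vars point and h. point refers to the new node (invocation, state, response, before); h refers to the node that Head pointed to when pi read it; the new node's before field points to that node.]

If compare&swap indicates that Head has moved on, then try again to insert the new node, at the new location.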

Page 30: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Strengthenings of Algorithm


To replace compare&swap object with any object with consensus number n (the number of procs):

  define a consensus object (data type version of consensus problem)

  get around the difficulty that a consensus object can only be used once by adding a consensus object to each linked list node that points to next node in the list
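One way to picture the per-node consensus objects is the sketch below. It is a simplification under stated assumptions, not the full construction: `ConsensusObject` is a lock-based stand-in for a one-shot consensus object, every operation walks forward from the anchor instead of tracking the end of the list, the simulation is still only non-blocking, and all names (including the fetch&increment `counter_apply`) are hypothetical.

```python
import threading


class ConsensusObject:
    """One-shot consensus: the first proposal wins, every caller gets the winner."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def propose(self, value):
        with self._lock:
            if not self._decided:
                self._value = value
                self._decided = True
            return self._value


class Node:
    def __init__(self, inv, state=None):
        self.inv = inv
        self.state = state
        self.response = None
        self.next = ConsensusObject()   # decides which node follows this one


class UniversalObject:
    def __init__(self, initial_state, apply):
        self.apply = apply
        self.anchor = Node(None, initial_state)

    def invoke(self, inv):
        point = Node(inv)
        cur = self.anchor               # simplification: always start at the anchor
        while True:
            point.state, point.response = self.apply(inv, cur.state)
            winner = cur.next.propose(point)
            if winner is point:         # my node was chosen as cur's successor
                return point.response
            cur = winner                # someone else won here; move forward


def counter_apply(inv, state):
    """Fetch&increment counter: respond with the old value."""
    return state + 1, state
```

Each consensus object is used for exactly one decision (the successor of its node), which is how the one-shot limitation is avoided.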

Page 31: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Strengthenings of Algorithm


To get a wait-free implementation, use idea of helping: procs help each other to finish pending operations (not just their own)

To reduce the size of the linked list (so it doesn't grow without bound), need to keep track of which list nodes can be recycled.

Page 32: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Effect of Randomization


Suppose we relax the liveness condition for linearizable shared memory:

  operations must terminate with high probability

Now a randomized consensus algorithm can be used to simulate any data type out of any other data type, including read/write registers.

I.e., hierarchy collapses.