1 © R. Guerraoui Concurrent Algorithms (Overview) Prof R. Guerraoui Distributed Programming Laboratory


© R. Guerraoui

Concurrent Algorithms (Overview)

Prof. R. Guerraoui, Distributed Programming Laboratory


In short

This course is about the principles of robust concurrent computing


Principles

Certain things are incorrect, and it is important to understand why (or at least what correctness means)

Certain things are impossible, and it is important to understand why (at least so as not to try)


WARNING

This course is different from the master's course Distributed Algorithms

This course is about shared memory, whereas the other one is about message-passing systems

It does make a lot of sense to take both


Major chip manufacturers have recently announced what is perceived as a major paradigm shift in computing:

Multiprocessors vs faster processors

Maybe Moore was wrong…


Major chip manufacturers have recently announced a major paradigm shift:

New York Times, 8 May 2004: Intel … [has] decided to focus its development efforts on «dual core» processors … with two engines instead of one, allowing for greater efficiency because the processor workload is essentially shared.


The clock speed of a processor cannot be increased without overheating

But

More and more processors can fit in the same space


Multicores are almost everywhere

Dual-core commonplace in laptops
Quad-core in desktops
Dual quad-core in servers
All major chip manufacturers produce multicore CPUs

SUN Niagara (8 cores, 32 threads)
Intel Xeon (4 cores)
AMD Opteron (4 cores)


AMD Opteron (4 cores)

[Figure: chip layout with L1 cache, L2 cache and shared L3 cache]


SUN’s Niagara CPU2 (8 cores)


Multiprocessors

Multiple hardware processors, each executing a series of processes (software constructs) that model sequential programs

Multicore architecture: multiple processors are placed on the same chip


Principles of an architecture

Two fundamental components that are kept apart: processors and memory

The interconnect links the processors with the memory:
- SMP (symmetric): bus (a tiny Ethernet)
- NUMA (network): point-to-point network


Cycles

The basic unit of time is the cycle: time to execute an instruction

This changes with technology but the relative cost of instructions (local vs memory) does not


Simple view

[Figure: Processor + Cache, Bus, Memory]


Hardware synchronization objects

The basic unit of communication is the read and write to the memory (through the cache)

More sophisticated objects are sometimes provided and, as we will see, necessary: compare&swap (C&S), test&set (T&S), load-linked/store-conditional (LL/SC)


The free ride is over

Cannot rely on CPUs getting faster in every generation

Utilizing more than one CPU core requires concurrency


The free ride is over

One of the biggest future software challenges: exploiting concurrency

Every programmer will have to deal with it
Concurrent programming is hard to get right


Speed will be achieved by having several processors work on independent parts of a task

But

the processors would occasionally need to pause and synchronize


Why synchronize?

But

If the task is indeed common, then pure parallelism is usually impossible and, at best, inefficient


Shared object

Concurrent processes


Counter

public class Counter {
    private long value;

    public Counter(int i) { value = i; }

    // Returns the current value, then increments it (not safe under concurrency)
    public long getAndIncrement() {
        return value++;
    }
}
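The counter on this slide is a sequential object: `value++` is a read followed by a write, so concurrent callers may lose increments. As a minimal sketch of one possible thread-safe variant, the code below delegates to `java.util.concurrent.atomic.AtomicLong`. The class name `AtomicCounter` and the helper `demo` are illustrative and do not come from the course.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a counter whose getAndIncrement() is atomic.
final class AtomicCounter {
    private final AtomicLong value;

    AtomicCounter(long initial) {
        value = new AtomicLong(initial);
    }

    // Atomically returns the current value and adds one.
    long getAndIncrement() {
        return value.getAndIncrement();
    }

    long get() {
        return value.get();
    }

    // Hypothetical demo: `threads` threads each increment `perThread` times.
    static long demo(int threads, int perThread) {
        AtomicCounter c = new AtomicCounter(0);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    c.getAndIncrement();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 100_000)); // prints 400000
    }
}
```

With the plain `Counter`, the same experiment may print a smaller number, since two threads can read the same value before either writes it back.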


Locked object

Locking (mutual exclusion)


Locking with compare&swap()

A compare&swap object maintains a value x, set to some initial value, and a temporary y

It provides one operation: c&s(v,w)

Sequential spec:

c&s(old,new) {
  y := x;
  if x = old then x := new;
  return(y)
}


Locking with compare&swap()

lock() {
  repeat until unlocked = this.c&s(unlocked,locked)
}

unlock() {
  this.c&s(locked,unlocked)
}
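The compare&swap lock can be sketched in Java roughly as follows. This is an illustration under assumptions, not the course's code: it uses `AtomicInteger.compareAndSet`, which returns a boolean success flag rather than the old value returned by the slide's c&s, and the names `CasLock`, `UNLOCKED`, `LOCKED` and `demo` are made up for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a spin lock built on compare-and-set.
final class CasLock {
    private static final int UNLOCKED = 0;
    private static final int LOCKED = 1;
    private final AtomicInteger state = new AtomicInteger(UNLOCKED);

    void lock() {
        // compareAndSet succeeds only if it changed UNLOCKED to LOCKED.
        while (!state.compareAndSet(UNLOCKED, LOCKED)) {
            Thread.onSpinWait(); // spin hint, available since Java 9
        }
    }

    void unlock() {
        state.compareAndSet(LOCKED, UNLOCKED);
    }

    // Hypothetical demo: threads increment a plain variable under the lock.
    static long demo(int threads, int perThread) {
        CasLock lock = new CasLock();
        long[] shared = {0};
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try {
                        shared[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000)); // prints 40000
    }
}
```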


Locking with test&set()

A test&set object maintains binary values x, init to 0, and y;

It provides one operation: t&s()

Sequential spec: t&s() {y := x; x := 1; return(y);}


Locking with test&set()

lock() {
  repeat until (0 = this.t&s());
}

unlock() {
  this.setState(0);
}


Locking with test&set()

lock() {
  while (true) {
    repeat until (0 = this.getState());
    if (0 = this.t&s()) return(true);
  }
}

unlock() {
  this.setState(0);
}
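Both test&set variants can be sketched in Java as below. The sketch assumes that `AtomicBoolean.getAndSet(true)` plays the role of t&s() (with `false` standing for 0 and `true` for 1). The class name `TtasLock` and the methods `lockTas`, `lockTtas` and `demo` are hypothetical. The second variant first spins on a plain read and only then attempts the test&set.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the two test&set locks; false = 0 (free), true = 1 (taken).
final class TtasLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    // t&s(): sets the bit and returns its previous value.
    private boolean testAndSet() {
        return state.getAndSet(true);
    }

    // First variant: repeat t&s() until it returns 0.
    void lockTas() {
        while (testAndSet()) {
            Thread.onSpinWait();
        }
    }

    // Second variant: wait until the state reads 0, then try t&s().
    void lockTtas() {
        while (true) {
            while (state.get()) {
                Thread.onSpinWait();
            }
            if (!testAndSet()) {
                return;
            }
        }
    }

    void unlock() {
        state.set(false);
    }

    // Hypothetical demo: threads increment a plain variable under the lock.
    static long demo(boolean useTtas, int threads, int perThread) {
        TtasLock lock = new TtasLock();
        long[] shared = {0};
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    if (useTtas) lock.lockTtas(); else lock.lockTas();
                    try {
                        shared[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(false, 4, 10_000)); // prints 40000
        System.out.println(demo(true, 4, 10_000));  // prints 40000
    }
}
```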


Explicit use of a lock

Lock l = ...;
l.lock();
try {
    // access the resource protected by this lock
} finally {
    l.unlock();
}
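A complete version of this pattern might look like the sketch below. The slide leaves the lock unspecified (`Lock l = ...`), so the choice of `java.util.concurrent.locks.ReentrantLock` is an assumption made for the example, as is the class name `LockedCounter`.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a counter protected by an explicit lock.
final class LockedCounter {
    private final Lock l = new ReentrantLock(); // assumed lock implementation
    private long value;

    long getAndIncrement() {
        l.lock();
        try {
            // access the resource protected by this lock
            return value++;
        } finally {
            // released even if the protected code throws
            l.unlock();
        }
    }

    public static void main(String[] args) {
        LockedCounter c = new LockedCounter();
        System.out.println(c.getAndIncrement()); // prints 0
        System.out.println(c.getAndIncrement()); // prints 1
    }
}
```

Placing `unlock()` in the `finally` block is what keeps an exception in the protected code from leaving the lock held forever.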


Implicit use of a lock

public class SynchronizedCounter {
    private int c = 0;

    public synchronized void increment() {
        c++;
    }

    // Returns the value held before the increment
    public synchronized int getAndIncrement() {
        return c++;
    }

    public synchronized int value() {
        return c;
    }
}


Locking (mutual exclusion)

Difficult: 50% of the bugs reported in Java come from the use of « synchronized »

Fragile: a process holding a lock prevents all others from progressing
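The fragility point can be illustrated with a small sketch: one thread takes a lock and then stalls, and a second thread finds that it cannot acquire the lock in the meantime. Everything here is hypothetical scaffolding (the class `StalledHolder`, the latches, the use of `ReentrantLock.tryLock()`); the stall stands in for a page fault, a preemption or a crash.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a stalled lock holder prevents another thread from progressing.
final class StalledHolder {
    // Returns true if another thread could take the lock during the stall.
    static boolean otherCanProgress() {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                release.await(); // the holder is "delayed" while owning the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();

        try {
            held.await();                      // the holder now owns the lock
            boolean acquired = lock.tryLock(); // non-blocking attempt
            if (acquired) {
                lock.unlock();
            }
            release.countDown();               // end the simulated delay
            holder.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(otherCanProgress()); // prints false
    }
}
```

Had the second thread called `lock()` instead of `tryLock()`, it would simply have blocked for as long as the holder was delayed.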


Locked object

One process at a time


Processes are asynchronous

Page faults
Pre-emptions
Failures
Cache misses, …


Processes are asynchronous

A cache miss can delay a process by ten instructions
A page fault by a few million
An OS preemption by hundreds of millions…


Coarse grained locks => slow

Fine grained locks => errors


Double-ended queue

Enqueue Dequeue


Processes are asynchronous

Page faults, pre-emptions, failures, cache misses, …

A process can be delayed by millions of instructions …


Alternative to locking?


Wait-free atomic objects

Wait-freedom: every process that invokes an operation eventually returns from the invocation (robust … unlike locking)

Atomicity: every operation appears to execute instantaneously (as if the object was locked…)


In short

This course shows how to build wait-free implementations of high-level atomic objects out of more primitive base objects

Shared object

Concurrent processes


This course

Theoretical, but no specific theoretical background is required

Written exam at the end of the semester


Roadmap

Model
  Processes and objects
  Atomicity and wait-freedom

Examples

Content


Processes

We assume a finite set of processes

Processes are denoted by p1, ..., pN or p, q, r

Processes have unique identities and know each other (unless explicitly stated otherwise)


Processes

Processes are sequential units of computation

Unless explicitly stated otherwise, we make no assumption about the (relative) speed of processes


Processes

[Figure: timelines of processes p1, p2, p3]


Processes

A process either executes the algorithm assigned to it or crashes

A process that crashes does not recover (in the context of the considered computation)

A process that does not crash in a given execution (computation or run) is called correct (in that execution)


Processes

[Figure: timelines of processes p1, p2, p3, one of which ends in a crash]


On objects and processes

Processes execute local computation or access shared objects through their operations

Every operation is expected to return a reply


Processes

[Figure: timelines of processes p1, p2, p3, each with an operation]


On objects and processes

Sequentiality means here that, after invoking an operation op1 on some object O1, a process does not invoke a new operation (on the same or on some other object) until it receives the reply for op1

Remark. Sometimes we talk about operations when we should be talking about operation invocations


Processes

[Figure: timelines of processes p1, p2, p3, each with an operation]


Atomicity

We mainly focus in this course on how to implement atomic objects

Atomicity means that every operation appears to execute at some indivisible point in time (called its linearization point) between its invocation and reply events


Atomicity

[Figure, two consecutive slides: timelines of processes p1, p2, p3, each with an operation]


Atomicity (the crash case)

[Figure, three consecutive slides: timelines of processes p1, p2, p3 with their operations; p2 crashes]


Wait-freedom

We mainly focus in this course on wait-free implementations

An implementation is wait-free if any correct process that invokes an operation eventually gets a reply, no matter what happens to the other processes (crash or very slow)


Wait-freedom

[Figure: timelines of processes p1, p2, p3, with one operation]


Wait-freedom

Wait-freedom conveys the robustness of the implementation

With a wait-free implementation, a process gets replies despite the crash of the n-1 other processes

Note that this precludes implementations based on locks (mutual exclusion)


Wait-freedom

[Figure: timelines of processes p1, p2, p3, with one operation and two crashes]


Roadmap

Model
  Processes and objects
  Atomicity and wait-freedom

Examples

Content


Motivation

Most synchronization primitives (problems) can be precisely expressed as atomic objects (implementations)

Studying how to ensure robust synchronization boils down to studying wait-free atomic object implementations


Example 1

The reader/writer synchronization problem corresponds to the register object

Basically, the processes need to read or write a shared data structure such that the value read by a process at a time t is the last value written before t


Register

A register has two operations: read() and write()

We assume that a register contains an integer for presentation simplicity, i.e., the value stored in the register is an integer, denoted by x (initially 0)


Sequential specification

read()
  return(x)

write(v)
  x <- v;
  return(ok)
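To make the link between this sequential specification and atomicity concrete, here is a brute-force sketch that decides whether a small register history is atomic. It tries every ordering of the operations that respects real time (an operation that replied before another was invoked must come first) and replays it against the specification above. The class `AtomicityCheck`, its `Op` type, and the timestamps used in `main` are all hypothetical; the timings are not taken from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: exhaustive atomicity check for tiny register histories.
final class AtomicityCheck {
    // A completed operation with its invocation and reply times.
    static final class Op {
        final boolean isWrite;
        final int value; // value written, or value returned by a read
        final int inv;
        final int rep;

        Op(boolean isWrite, int value, int inv, int rep) {
            this.isWrite = isWrite;
            this.value = value;
            this.inv = inv;
            this.rep = rep;
        }

        static Op write(int v, int inv, int rep) { return new Op(true, v, inv, rep); }
        static Op read(int returned, int inv, int rep) { return new Op(false, returned, inv, rep); }
    }

    // Replays `order` against the sequential specification (x initially 0).
    static boolean legal(List<Op> order) {
        int x = 0;
        for (Op op : order) {
            if (op.isWrite) {
                x = op.value;
            } else if (op.value != x) {
                return false;
            }
        }
        return true;
    }

    static boolean isAtomic(List<Op> history) {
        return search(new ArrayList<>(history), new ArrayList<>());
    }

    private static boolean search(List<Op> remaining, List<Op> prefix) {
        if (remaining.isEmpty()) {
            return legal(prefix);
        }
        for (Op cand : remaining) {
            // cand may go next only if no remaining op replied before cand was invoked.
            boolean preceded = false;
            for (Op other : remaining) {
                if (other != cand && other.rep < cand.inv) {
                    preceded = true;
                }
            }
            if (preceded) {
                continue;
            }
            List<Op> rest = new ArrayList<>(remaining);
            rest.remove(cand);
            prefix.add(cand);
            boolean ok = search(rest, prefix);
            prefix.remove(prefix.size() - 1);
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // write(1) overlaps both reads; the read returning 1 ends before the read returning 0 starts.
        List<Op> inverted = Arrays.asList(Op.write(1, 0, 10), Op.read(1, 2, 4), Op.read(0, 5, 7));
        // Same timing, but the earlier read returns 0 and the later one returns 1.
        List<Op> fine = Arrays.asList(Op.write(1, 0, 10), Op.read(0, 2, 4), Op.read(1, 5, 7));
        System.out.println(isAtomic(inverted)); // prints false
        System.out.println(isAtomic(fine));     // prints true
    }
}
```

The search is exponential in the number of operations, which is acceptable only for slide-sized histories such as these.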


Atomicity?

[Figure, two consecutive slides: history over p1, p2, p3 with write(1) - ok, read() - 2, write(2) - ok]


Atomicity?

[Figure, two consecutive slides: history over p1, p2, p3 with write(1) - ok, read() - 1, write(2) - ok]


Atomicity?

[Figure: history over p1, p2, p3 with write(1) - ok, read() - 1, read() - 1]


Atomicity?

[Figure: history over p1, p2, p3 with write(1) - ok, read() - 1, read() - 0]


Atomicity?

[Figure, three consecutive slides: history over p1, p2, p3 with write(1) - ok, read() - 0, read() - 0]


Atomicity?

[Figure: history over p1, p2, p3 with write(1) - ok, read() - 1, read() - 0]


Atomicity?

[Figure: history over p1, p2, p3 with write(1) - ok, read() - 1, read() - 1]


Example 2

The producer/consumer synchronization problem corresponds to the queue object

Producer processes create items that need to be used by consumer processes

An item cannot be consumed by two processes and the first item produced is the first consumed


Queue

A queue has two operations: enqueue() and dequeue()

We assume that a queue internally maintains a list x, which exports the operations append(), to put an item at the end of the list, and remove(), to remove an element from the head of the list


Sequential specification

dequeue()
  if (x is empty) then return(nil);
  else return(x.remove())

enqueue(v)
  x.append(v);
  return(ok)
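The queue's sequential specification translates almost line for line into Java. The sketch below is a purely sequential object (no synchronization), backed by `java.util.ArrayDeque` as the internal list x. The class name `SeqQueue` is illustrative, `null` stands in for nil, and `enqueue` returns nothing where the specification returns ok.

```java
import java.util.ArrayDeque;

// Sketch: sequential specification of a FIFO queue (not thread-safe).
final class SeqQueue<T> {
    // The internal list x; ArrayDeque does not accept null elements.
    private final ArrayDeque<T> x = new ArrayDeque<>();

    // Returns null (nil) if the queue is empty, else removes and returns the head.
    T dequeue() {
        if (x.isEmpty()) {
            return null;
        }
        return x.removeFirst();
    }

    // Appends v at the end of the list.
    void enqueue(T v) {
        x.addLast(v);
    }

    public static void main(String[] args) {
        SeqQueue<String> q = new SeqQueue<>();
        q.enqueue("x");
        q.enqueue("y");
        System.out.println(q.dequeue()); // prints x
        System.out.println(q.dequeue()); // prints y
        System.out.println(q.dequeue()); // prints null
    }
}
```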


Atomicity?

[Figure, two consecutive slides: history over p1, p2, p3 with enq(x) - ok, deq() - y, deq() - x, enq(y) - ok]


Atomicity?

[Figure, two consecutive slides: history over p1, p2, p3 with enq(x) - ok, deq() - y, enq(y) - ok]


Roadmap

Model
  Processes and objects
  Atomicity and wait-freedom

Examples

Content


Content

(1) Implementing registers

(2) The power & limitation of registers

(3) Universal objects & synchronization number

(4) The power of time & failure detection

(5) Tolerating failure-prone objects

(6) Anonymous implementations

(7) Transactional memory


In short

This course shows how to build wait-free implementations of high-level atomic objects out of basic objects

Remark. Unless explicitly stated otherwise, objects mean atomic objects and implementations are wait-free