When Parallel met Distributed Hagit Attiya CS, Technion.

Page 1: When Parallel met Distributed Hagit Attiya CS, Technion.

When Parallel met Distributed

Hagit Attiya

CS, Technion

Page 2: When Parallel met Distributed Hagit Attiya CS, Technion.

June 15, 2008, SPAA

My Qualifications…

• 6 papers in SPAA…

• & one paper in AWOC 1988!
– About rings (w/ Snir)

Page 3: When Parallel met Distributed Hagit Attiya CS, Technion.

What’s Parallel Computing?

Page 4: When Parallel met Distributed Hagit Attiya CS, Technion.

What’s Distributed Computing?

Page 5: When Parallel met Distributed Hagit Attiya CS, Technion.

Top Keywords in SPAA (2003-2007)

• load balancing (95/11%)
• randomized algorithms (89/11%)
• online algorithms (87/10%)
• QoS (87/10%)
• sensor networks (83/10%)
• approximation algorithms (80/9%)
• simulation (76/9%)
• fault tolerance (75/9%)
• wireless networks (74/9%)
• performance evaluation (67/8%)
• mobile networks (67/8%)
• scheduling (64/8%)
• algorithms (63/7%)
• network (63/7%)
• peer-to-peer (59/7%)
• ad hoc networks (57/7%)

Page 6: When Parallel met Distributed Hagit Attiya CS, Technion.

Top Keywords in PODC (2003-2007)

• fault tolerance (211/24%)
• sensor networks (155/18%)
• distributed algorithms (144/17%)
• self-stabilization (126/14%)
• randomized algorithms (104/12%)
• dominating set (94/11%)
• ad hoc networks (88/10%)
• lower bounds (87/10%)
• security (83/10%)
• routing (82/9%)
• scalability (81/9%)
• shared memory (80/9%)
• replication (77/9%)
• reliability (77/9%)
• distributed systems (77/9%)
• mobile agents (75/9%)

Page 7: When Parallel met Distributed Hagit Attiya CS, Technion.

Let’s Compare

SPAA: load balancing (95/11%); randomized algorithms (89/11%); online algorithms (87/10%); QoS (87/10%); sensor networks (83/10%); approximation algorithms (80/9%); simulation (76/9%); fault tolerance (75/9%); wireless networks (74/9%); performance evaluation (67/8%); mobile networks (67/8%); scheduling (64/8%); algorithms (63/7%); network (63/7%); peer-to-peer (59/7%); ad hoc networks (57/7%)

PODC: fault tolerance (211/24%); sensor networks (155/18%); distributed algorithms (144/17%); self-stabilization (126/14%); randomized algorithms (104/12%); dominating set (94/11%); ad hoc networks (88/10%); lower bounds (87/10%); security (83/10%); routing (82/9%); scalability (81/9%); shared memory (80/9%); replication (77/9%); reliability (77/9%); distributed systems (77/9%); mobile agents (75/9%)

Page 8: When Parallel met Distributed Hagit Attiya CS, Technion.

Topics are Merging

It used to be…
– Synchronous shared-memory → SPAA
– Asynchronous message-passing → PODC

Nowadays…

The Network is a Computer
– Peer-to-peer systems, the grid, clusters

The Computer is a Network
– Network on chip, PRAM on chip

Page 9: When Parallel met Distributed Hagit Attiya CS, Technion.

What Does Parallel Take from Distributed?

• Uncertainty due to asynchrony

• Uncertainty due to scale

• Uncertainty due to failures

Page 10: When Parallel met Distributed Hagit Attiya CS, Technion.

What Does Distributed Take from Parallel?

• Simulations and reductions between models

and conversely,

• Separation between models

Page 11: When Parallel met Distributed Hagit Attiya CS, Technion.

Case in Point: Simulating Shared Memory

[Attiya, Bar-Noy, Dolev, PODC 1990]

• Provide a single-writer multi-reader register in a message-passing system
– Accessed by read and write operations

[Figure: operations on the register, labeled Read, Write(7) and Write(0)]

Page 12: When Parallel met Distributed Hagit Attiya CS, Technion.

Atomicity (AKA Linearizability)

[Figure: timeline of the operations Write(0), Write(7) and Read; the Read returns 7]
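The atomicity requirement can be made concrete with a small brute-force check. The sketch below is only an illustration: the `Op` record, the `is_linearizable` helper and the invocation/response times are assumptions made for this example, not part of the talk. It searches for a total order of the operations that respects real-time precedence and in which every read returns the most recently written value.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    kind: str     # "write" or "read"
    value: int    # value written, or value the read returned
    start: float  # invocation time (illustrative)
    end: float    # response time (illustrative)


def is_linearizable(history, initial=None):
    """Brute-force check for one register; only practical for tiny histories."""
    n = len(history)
    for order in permutations(range(n)):
        pos = {op_index: p for p, op_index in enumerate(order)}
        # Real-time order: an operation that finished before another
        # started must also precede it in the chosen total order.
        if any(history[a].end < history[b].start and pos[a] > pos[b]
               for a in range(n) for b in range(n)):
            continue
        current, legal = initial, True
        for i in order:
            op = history[i]
            if op.kind == "write":
                current = op.value
            elif op.value != current:  # a read must see the latest write
                legal = False
                break
        if legal:
            return True
    return False


# Write(0) completes, then Write(7) runs concurrently with a Read returning 7.
history = [Op("write", 0, 0, 1), Op("write", 7, 2, 5), Op("read", 7, 3, 4)]
print(is_linearizable(history))  # True
```

A read that starts after Write(7) has completed and still returns 0 would make the same check fail, which is exactly the "missed write" that atomicity rules out.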

Page 13: When Parallel met Distributed Hagit Attiya CS, Technion.

(Slight) Complication: Failures

• For now, only crash failures
– Processes just stop taking steps
– Further complicated due to asynchrony

[Figure: operations on the register, labeled Read, Write(7) and Write(0)]

Page 14: When Parallel met Distributed Hagit Attiya CS, Technion.

Simulating Shared Memory w/ Failures

• Requires a majority of nonfaulty processes

• Otherwise, the system can be partitioned
– A read will “miss” the latest write

[Figure: operations on the register, labeled Read, Write(7) and Write(0)]
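The majority requirement follows from a short counting argument. As a sketch, assume n processes of which at most f may crash, so an operation can wait for at most n − f replies; the symbols Q_1 and Q_2 (the sets of processes that answered two different operations) are introduced here only for illustration.

```latex
\[
|Q_1 \cap Q_2| \;\ge\; |Q_1| + |Q_2| - n \;=\; 2(n-f) - n \;=\; n - 2f ,
\qquad
n - 2f \ge 1 \iff f < \frac{n}{2} .
\]
```

When f ≥ n/2, the two sets can be disjoint, which is the partition scenario in which a read misses the latest write.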

Page 15: When Parallel met Distributed Hagit Attiya CS, Technion.

Two Inspirations

• Simulation of a PRAM on a synchronous interconnect (e.g., Ultracomputer)

[Upfal, Wigderson, FOCS 1984]

– Complete communication graph or a concentrator
– No failures
– Replicate data to reduce latency
– Access a majority

• The majority consensus approach to concurrency control

[Thomas, TODS 1979]

The abstraction

The algorithm

Page 16: When Parallel met Distributed Hagit Attiya CS, Technion.

The Algorithm in a Nutshell

• Each data item has a version number
– A sequence of values

• write(d, val, v#)
– Waits for n-f oks

• read(d) returns (val, v#)
– Waits for n-f responses, picks largest v#
– Does a write-back to ensure atomicity of reads
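The steps above can be sketched as a small in-process simulation. This is an assumption-laden illustration rather than the published algorithm: the `Replica` and `Register` classes are hypothetical names, there is no real network, and asynchrony and crashes are modeled by letting an arbitrary set of n − f replicas take part in each phase (the others behave as if crashed or arbitrarily slow).

```python
import random


class Replica:
    def __init__(self):
        self.val, self.ver = None, 0  # stored value and its version number

    def on_write(self, val, ver):
        if ver > self.ver:            # keep only the newest version
            self.val, self.ver = val, ver
        return "ok"

    def on_read(self):
        return self.val, self.ver


class Register:
    """Single-writer multi-reader register over n replicas, up to f crashes."""

    def __init__(self, n, f, seed=0):
        assert n > 2 * f, "a majority of processes must be nonfaulty"
        self.n, self.f = n, f
        self.replicas = [Replica() for _ in range(n)]
        self.rng = random.Random(seed)
        self.writer_ver = 0           # version counter kept by the single writer

    def _responders(self):
        # An arbitrary set of n - f replicas answers in this phase.
        return self.rng.sample(self.replicas, self.n - self.f)

    def _store(self, val, ver):
        acks = [r.on_write(val, ver) for r in self._responders()]
        assert len(acks) == self.n - self.f   # "waits for n-f oks"

    def write(self, val):
        self.writer_ver += 1
        self._store(val, self.writer_ver)

    def read(self):
        replies = [r.on_read() for r in self._responders()]
        val, ver = max(replies, key=lambda reply: reply[1])  # largest v#
        self._store(val, ver)         # write-back before returning
        return val


reg = Register(n=5, f=2)
reg.write(7)
print(reg.read())  # 7
```

Because any two sets of n − f replicas overlap when n > 2f, the read always hears from at least one replica that holds the latest version, whichever replicas happen to respond.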

Page 17: When Parallel met Distributed Hagit Attiya CS, Technion.

The Algorithm in Action: Write

[Figure: three replicas, each holding value: 0; the writer sends “write 1” messages to the replicas; an X marks a failure]

Page 18: When Parallel met Distributed Hagit Attiya CS, Technion.

The Algorithm in Action: Write

[Figure: two replicas now hold value: 1 and answer ok; the third still holds value: 0; an X marks a failure]

Page 19: When Parallel met Distributed Hagit Attiya CS, Technion.

The Algorithm in Action: Read

[Figure: the replicas hold value: 1, 1 and 0; a reader sends read messages to them; X marks failures; the responses shown are 0 and 1]

Page 20: When Parallel met Distributed Hagit Attiya CS, Technion.

Implications

• Allows porting algorithms from shared memory to message-passing systems, e.g.,
– atomic snapshots
– safe consensus
– approximate agreement
– randomized consensus

• Made the message-passing model “obsolete” when studying computability [Borowsky, Gafni] [Herlihy, Shavit] [Mostefaoui, Rajsbaum, Raynal] …

Page 21: When Parallel met Distributed Hagit Attiya CS, Technion.

An Abstract View: Quorums
[Gifford, SOSP 1979] [Garcia-Molina, Barbara, JACM 1985]

• read and write quorums
– Any pair of write-write or write-read quorums has a large intersection
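The intersection property can be checked mechanically for small systems. The sketch below is illustrative only: `majority_quorums` and `min_intersection` are hypothetical helper names, and quorums are represented as plain sets of process ids. How large the overlap must be depends on the failure model; for crash failures a non-empty intersection is enough.

```python
from itertools import combinations, product


def majority_quorums(processes):
    """All subsets containing a strict majority of the processes."""
    members = list(processes)
    k = len(members) // 2 + 1
    return [frozenset(c) for c in combinations(members, k)]


def min_intersection(write_quorums, read_quorums):
    """Smallest overlap over all write-write and write-read quorum pairs."""
    ww = [len(a & b) for a, b in combinations(write_quorums, 2)]
    wr = [len(w & r) for w, r in product(write_quorums, read_quorums)]
    return min(ww + wr)


majorities = majority_quorums(range(5))
print(min_intersection(majorities, majorities))  # 1

# A broken "quorum system": two disjoint write quorums never meet.
broken = [frozenset({0, 1}), frozenset({2, 3})]
print(min_intersection(broken, broken))  # 0
```

The same check accepts asymmetric choices such as write-all/read-one, where the only write quorum is the full set and every single process is a read quorum.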

Page 22: When Parallel met Distributed Hagit Attiya CS, Technion.

Sharing with Quorums

Apply the previous algorithm with write and read quorums

[Figure: Write(7) and Read operations accessing quorums]

Page 23: When Parallel met Distributed Hagit Attiya CS, Technion.

More on Quorums

• The simplest quorum system uses majority subsets

• But can pick other quorum systems
– When fewer processes fail
– So as to optimize the load and availability of quorums [Naor, Wool, FOCS 1994]

• Separation of concerns…

Page 24: When Parallel met Distributed Hagit Attiya CS, Technion.

Even More Robust: Dynamic Changes

RAMBO, e.g., [Lynch, Shvartsman 2002]

• Participants can join or leave

• Configuration: participants + set of read & write quorums

• Emulate reads and writes using the quorums (ABD)

Page 26: When Parallel met Distributed Hagit Attiya CS, Technion.

RAMBO: Reconfiguration

• Modify the set of participants and the quorums

• Need to agree on the new configuration: a safe consensus protocol
– Implemented from “shared registers”
– May take very long, perhaps even not terminate, since full-fledged consensus is impossible in this setting

Page 28: When Parallel met Distributed Hagit Attiya CS, Technion.

Reconfiguration: Co-Existence

• Reconfiguration proceeds concurrently with the quorum-based reading and writing algorithm

• When in transition between configurations, use representative quorums from all configurations

[Figure: a Write(7) operation]
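One way to picture "quorums from all configurations" is a completion test for a read or write phase. The sketch below is a simplified illustration and not RAMBO's actual protocol: the configuration layout (a dict with `members`, `read` and `write` entries), the majority quorums and the function names are all assumptions made for the example.

```python
from itertools import combinations


def majority_quorums(members):
    k = len(members) // 2 + 1
    return [frozenset(c) for c in combinations(sorted(members), k)]


def make_config(members):
    """Hypothetical configuration: participants plus read and write quorums."""
    quorums = majority_quorums(members)
    return {"members": frozenset(members), "read": quorums, "write": quorums}


def phase_complete(responders, active_configs, kind):
    """True once the responders include a `kind` quorum of every active configuration."""
    responders = frozenset(responders)
    return all(
        any(quorum <= responders for quorum in config[kind])
        for config in active_configs
    )


old = make_config({"a", "b", "c"})
new = make_config({"c", "d", "e"})

# During the transition both configurations are active.
print(phase_complete({"a", "b"}, [old, new], "write"))            # False
print(phase_complete({"a", "b", "c", "d"}, [old, new], "write"))  # True
```

Requiring a quorum from each active configuration means an operation that overlaps a reconfiguration still intersects operations that used only the old or only the new quorums.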

Page 29: When Parallel met Distributed Hagit Attiya CS, Technion.

Even More Robust: Byzantine Failures

• Nodes fail arbitrarily
– They lie, they collude

• Causes
– Malicious attacks
– Non-deterministic software errors

Page 30: When Parallel met Distributed Hagit Attiya CS, Technion.

Byzantine Quorums
[Malkhi, Reiter, STOC 1997]

• 3f+1 replicas are needed to survive f failures

• 2f+1 replicas form a quorum
– Ensures intersection of size f+1
– Need many copies with same v#

• Minimal in an asynchronous network

• There are other quorum systems [Malkhi, Reiter, Wool 1997] [Bazzi 1997] …
– Optimizing load and availability
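The quorum arithmetic on this slide can be confirmed by brute force for small f. The sketch below is only a sanity check under the stated sizes (n = 3f+1 replicas, quorums of 2f+1); `min_quorum_overlap` is a hypothetical helper name. An overlap of f+1 guarantees that any two quorums share at least one replica that is not among the f faulty ones.

```python
from itertools import combinations


def min_quorum_overlap(n, q):
    """Smallest intersection between any two q-subsets of n replicas."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


for f in (1, 2, 3):
    n, q = 3 * f + 1, 2 * f + 1
    overlap = min_quorum_overlap(n, q)
    print(f, n, q, overlap)
    assert overlap == f + 1  # at least one non-faulty replica in common

# With only n = 3f replicas and quorums of n - f = 2f, the overlap drops
# to f, so the shared replicas could all be Byzantine.
assert min_quorum_overlap(3, 2) == 1
```

The counting behind the check is 2(2f+1) − (3f+1) = f+1.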

Page 31: When Parallel met Distributed Hagit Attiya CS, Technion.

Application: Replicated Servers

• Clients invoke operations; servers only respond to them
– By the same protocol

• Clients may crash

Page 32: When Parallel met Distributed Hagit Attiya CS, Technion.

Disk Paxos

[Gafni, Lamport 2003]

• A protocol for replicated servers in a storage area network (SAN)

• Design a shared memory algorithm

• Translate to a SAN algorithm using ABD
– optimized: e.g., remove v# (can be inferred from protocol messages)

Page 33: When Parallel met Distributed Hagit Attiya CS, Technion.

Replicating a Server

[Figure: Servers, each holding State: …, and Clients; a client sends “write A” to the servers; an X marks a failure]

Page 34: When Parallel met Distributed Hagit Attiya CS, Technion.

Replicating a Server

[Figure: Servers and Clients; two servers now hold State: …A, the third still holds State: …; an X marks a failure]

Page 35: When Parallel met Distributed Hagit Attiya CS, Technion.

Replicating a Server

[Figure: Servers and Clients; two servers hold State: …A; a client sends “write B” to the servers; X marks failures]

Page 36: When Parallel met Distributed Hagit Attiya CS, Technion.

Byzantine Servers?

[Figure: four Servers, three with State: …A and one with State: …, and Clients; a client sends “write A” to the servers; an X marks a failure]

Page 37: When Parallel met Distributed Hagit Attiya CS, Technion.

Byzantine Servers: Quorums in Action

[Figure: four Servers with State: …A, State: …A B, State: …B and State: …B, and Clients; a client sends “write B” to the servers; an X marks a failure]

Page 38: When Parallel met Distributed Hagit Attiya CS, Technion.

Moral: The Art of Abstraction

• The right abstraction can lead to many crucial algorithms

• Finding the right abstractions is key for designing good systems
– Hide enough “under the hood” to provide system designers good leverage
– But not too much, so their implementation is efficient (or easily admits optimizations)

Page 39: When Parallel met Distributed Hagit Attiya CS, Technion.

More Context

• Client failures

• Optimizations
– Reducing communication and reconfiguration
– Improving the common case, without harming the worst case

• Adaptations to new network technologies
– Ad-hoc, mobile, sensor

Page 40: When Parallel met Distributed Hagit Attiya CS, Technion.

Thank you…