DISTRIBUTED HASH TABLES Building large-scale, robust distributed applications

47
DISTRIBUTED HASH TABLES Building large-scale, robust distributed applications Frans Kaashoek [email protected] Joint work with: H. Balakrishnan, P. Druschel , J. Hellerstein , D. Karger, R. Karp, J. Kubiatowicz, B. Liskov, D. Mazières, R. Morris, S. Shenker, I. Stoica

description

DISTRIBUTED HASH TABLES Building large-scale, robust distributed applications. Frans Kaashoek [email protected] Joint work with: H. Balakrishnan, P. Druschel , J. Hellerstein , D. Karger, R. Karp, J. Kubiatowicz, B. Liskov, D. Mazi è res, R. Morris, S. Shenker, I. Stoica. - PowerPoint PPT Presentation

Transcript of DISTRIBUTED HASH TABLES Building large-scale, robust distributed applications

Page 1: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

DISTRIBUTED HASH TABLESBuilding large-scale, robust

distributed applications

Frans [email protected]

Joint work with: H. Balakrishnan, P. Druschel , J. Hellerstein , D. Karger, R.

Karp, J. Kubiatowicz, B. Liskov, D. Mazières, R. Morris, S. Shenker, I.

Stoica

Page 2: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

P2P: an exciting social development

• Internet users cooperating to share, for example, music files• Napster, Gnutella, Morpheus, KaZaA, etc.

• Lots of attention from the popular press“The ultimate form of democracy on the

Internet” “The ultimate threat to copy-right protection

on the Internet”

• Many vendors have launched P2P efforts

Page 3: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

What is P2P?

• A distributed system architecture:• No centralized control• Nodes are symmetric in function

• Typically many nodes, but unreliable and heterogeneous

Client

Client

Client Client

Client

Internet

Page 4: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Traditional distributed computing:

client/server

• Successful architecture, and will continue to be so• Tremendous engineering necessary to make server farms scalable and

robust

Server

Client

Client Client

Client

Internet

Page 5: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Application-level overlays

• One per application• Nodes are decentralized• NOC is centralized

ISP3

ISP1 ISP2

Site 1

Site 4

Site 3Site 2

N

N N

N

N

N

P2P systems are overlay networks without central control

Page 6: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

(Potential) P2P advantages

• Allows for scalable incremental growth

• Aggregate tremendous amount of computation and storage resources

• Tolerate faults or intentional attacks

Page 7: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Example P2P problem: lookup

Internet

N1

N2 N3

N6N5

N4

Publisher

Key=“title”Value=file data… Client

Lookup(“title”)

?

• At the heart of all P2P systems

Page 8: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Centralized lookup (Napster)

Publisher@

Client

Lookup(“title”)

N6

N9 N7

DB

N8

N3

N2N1SetLoc(“title”, N4)

Simple, but O(N) state and a single point of failure

Key=“title”Value=file data…

N4

Page 9: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Flooded queries (Gnutella)

N4Publisher@

Client

N6

N9

N7N8

N3

N2N1

Robust, but worst case O(N) messages per lookup

Key=“title”Value=MP3 data…

Lookup(“title”)

Page 10: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Another approach: distributed hash tables

• Nodes are the hash buckets• Key identifies data uniquely• DHT balances keys and data across nodes• DHT replicates, caches, routes lookups, etc.

Distributed hash tables

Distributed applications

Lookup (key) data

node node node….

Insert(key, data)

Page 11: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Why DHTs now?

• Demand pulls• Growing need for security and robustness• Large-scale distributed apps are difficult to build• Many applications use location-independent data

• Technology pushes• Bigger, faster, and better: every PC can be a server• Scalable lookup algorithms are available• Trustworthy systems from untrusted components

Page 12: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

DHT is a good interface

• Supports a wide range of applications, because few restrictions

• Keys have no semantic meaning• Value is application dependent

• Minimal interface

lookup(key) dataInsert(key, data)

Send(IP address, data)Receive (IP address) data

DHT UDP/IP

Page 13: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

DHT is a good shared infrastructure

• Applications inherit some security and robustness from DHT• DHT replicates data• Resistant to malicious participants

• Low-cost deployment• Self-organizing across administrative domains• Allows to be shared among applications

• Large scale supports Internet-scale workloads

Page 14: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

DHTs support many applications

• File sharing [CFS, OceanStore, PAST, …]• Web cache [Squirrel, ..]• Censor-resistant stores [Eternity, FreeNet,..]• Event notification [Scribe]• Naming systems [ChordDNS, INS, ..]• Query and indexing [Kademlia, …]• Communication primitives [I3, …]• Backup store [HiveNet]• Web archive [Herodotus]

data is location-independent

Page 15: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Cooperative read-only file sharing

• DHT is a robust block store• Client of DHT implements file system

Distributed hash tables

File system

Lookup (key) block

node node node….

insert (key, block)

Page 16: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

File representation:self-authenticating data

• DHT key for block is SHA-1(content block)• File and file systems form Merkle hash trees

995:key=901key=732Signature

File System key=995

……

“a.txt” ID=144

key=431key=795

(root block)

(directory blocks)

(i-node block)

(data)

901= SHA-1 144 = SHA-1431=SHA-1

Page 17: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

DHT distributes blocks by hashing IDs

InternetNode ANode C

Node B

Node D

995:key=901key=732Signature

Block732

Block901

247:key=407key=992key=705Signature

Block992

Block407

Block705

• DHT replicates blocks for fault tolerance

• DHT caches popular blocks for load balance

Page 18: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Historical web archiver

• Goal: make and archive a daily check point of the Web

• Estimates: • Web is about 57 Tbyte, compressed

HTML+img• New data per day: 580 Gbyte 128 Tbyte per year with 5 replicas

• Design:• 12,810 nodes: 100 Gbyte disk each

and 61 Kbit/s per node

Page 19: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Implementation using DHT

• DHT usage:• Crawler distributes crawling load by hash(URL)• Crawler inserts Web pages by hash(URL)• Client retrieve Web pages by hash(URL)

• DHT replicates data for fault tolerance

Distributed hash tables

Crawler

Lookup (URL)

node node node….

Insert(sha-1(URL), page)Client

Insert(sha-1(URL), URL)

Page 20: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Backup store• Goal: backup on other user’s machines• Observations

• Many user machines are not backed up• Backup requires significant manual effort• Many machines have lots of spare disk space

• Using DHT:• Merkle tree to validate integrity of data• Administrative and financial costs are less for all

participants• Backups are robust (automatic off-site backups)• Blocks are stored once, if key = sha1(data)

Page 21: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Research challenges

1. Scalable lookup2. Balance load (flash crowds)3. Handling failures4. Coping with systems in flux5. Network-awareness for performance6. Robustness with untrusted participants7. Programming abstraction8. Heterogeneity9. Anonymity

Goal: simple, provably-good algorithms

thistalk

Page 22: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

1. Scalable lookup• Map keys to nodes in

a load-balanced way• Hash keys and nodes

into a string of digit• Assign key to “closest”

node

Examples: CAN, Chord, Kademlia, Pastry, Tapestry, Viceroy, ….

K20K5

K80

CircularID space N32

N90

N105

N60• Forward a lookup for a key to a closer node

• Insert: lookup + store

• Join: insert node in ring

Page 23: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Chord’s routing table: fingers

N80

½¼

1/8

1/161/321/641/128

Page 24: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Lookups take O(log(N)) hops

N32

N10

N5

N20

N110

N99

N80

N60

Lookup(K19)

K19

• Lookup: route to closest predecessor

Page 25: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

CAN: exploit d dimensions

• Each node is assigned a zone• Nodes are identified by zone boundaries• Join: chose random point, split its zone

(0,0) (1,0)

(0,1)

(0,0.5, 0.5, 1)

(0.5,0.5, 1, 1)

(0,0, 0.5, 0.5)

• •

••

(0.5,0.25, 0.75, 0.5)

(0.75,0, 1, 0.5)•

Page 26: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Routing in 2-dimensions

• Routing is navigating a d-dimensional ID space• Route to closest neighbor in direction of destination• Routing table contains O(d) neighbors

• Number of hops is O(dN1/d)

(0,0) (1,0)

(0,1)

(0,0.5, 0.5, 1)

(0.5,0.5, 1, 1)

(0,0, 0.5, 0.5) (0.75,0, 1, 0.5)

••

(0.5,0.25, 0.75, 0.5)

Page 27: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

2. Balance load

N32

N10

N5

N20

N110

N99

N80

N60

Lookup(K19)

K19

• Hash function balances keys over nodes

• For popular keys, cache along the path

K19

Page 28: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Why Caching Works Well

N20

• Only O(log N) nodes have fingers pointing to N20• This limits the single-block load on N20

Page 29: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

3. Handling failures: redundancy

N32

N10

N5

N20

N110

N99

N80

N60

• Each node knows IP addresses of next r nodes• Each key is replicated at next r nodes

N40

K19

K19

K19

Page 30: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Lookups find replicas

N40

N10

N5

N20

N110

N99

N80

N60

N50

N68

1.2.

3.

4.

Lookup(K19)

• Tradeoff between latency and bandwidth [Kademlia]

K19

Page 31: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

4. Systems in flux

• Lookup takes log(N) hopsIf system is stableBut, system is never stable!

• What we desire are theorems of the type:

1. In the almost-ideal state, ….log(N)…2. System maintains almost-ideal state

as nodes join and fail

Page 32: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Half-life [Liben-Nowell 2002]

• Doubling time: time for N joins • Halfing time: time for N/2 old nodes to fail• Half life: MIN(doubling-time, halfing-time)

N nodes

N new nodes join

N/2 old nodes leave

Page 33: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Applying half life

• For any node u in any P2P networks:If u wishes to stay connected with high

probability, then, on average, u must be notified

about (log N) new nodes per half life

• And so on, …

Page 34: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

5. Optimize routing to reduce latency

• Nodes close on ring, but far away in Internet• Goal: put nodes in routing table that result in

few hops and low latency

CA-T1CCIArosUtah

CMU

To vu.nlLulea.se

MITMA-CableCisco

Cornell

NYU

OR-DSLN20

N40N80N41

Page 35: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

“close” metric impacts choice of nearby nodes

• Chord’s numerical close and table restrict choice• Prefix-based allows for choice• Kademlia’s offers choice in nodes and places

nodes in absolute order: close (a,b) = XOR(a, b)

N32N103

N105

N60

N06 USA

Europe

USA

Far east

USA

K104

Page 36: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Neighbor set

• From k nodes, insert nearest node with appropriate prefix in routing table

• Assumption: triangle inequality holds

N32N103

N105

N60

N06 USA

Europe

USA

Far east

USA

K104

Page 37: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Finding k near neighbors

1. Ping random nodes2. Swap neighbor sets with

neighbors• Combine with random pings to

explore

3. Provably-good algorithm to find nearby neighbors based on sampling [Karger and Ruhl 02]

Page 38: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

6. Malicious participants

• Attacker denies service• Flood DHT with data

• Attacker returns incorrect data [detectable]• Self-authenticating data

• Attacker denies data exists [liveness]• Bad node is responsible, but says no• Bad node supplies incorrect routing info• Bad nodes make a bad ring, and good node joins

it

Basic approach: use redundancy

Page 39: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Sybil attack [Douceur 02]

• Attacker creates multiple identities

• Attacker controls enough nodes to foil the redundancy

N32

N10

N5

N20

N110

N99

N80

N60

N40

Need a way to control creation of node IDs

Page 40: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

One solution: secure node IDs

• Every node has a public key• Certificate authority signs public

key of good nodes• Every node signs and verifies

messages• Quotas per publisher

Page 41: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Another solution:exploit practical byzantine

protocols

• A core set of servers is pre-configured with keys and perform admission control• The servers achieve consensus with a practical byzantine recovery protocol

[Castro and Liskov ’99 and ’00]• The servers serialize updates [OceanStore] or assign secure node Ids

[Configuration service]

N32N103

N105

N60

N06N

N

N

N

Page 42: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

A more decentralized solution:

weak secure node IDs

• ID = SHA-1 (IP-address node)• Assumption: attacker controls limited IP

addresses

• Before using a node, challenge it to verify its ID

Page 43: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Using weak secure node IDS

• Detect malicious nodes• Define verifiable system properties

• Each node has a successor• Data is stored at its successor

• Allow querier to observe lookup progress• Each hop should bring the query closer

• Cross check routing tables with random queries

• Recovery: assume limited number of bad nodes

• Quota per node ID

Page 44: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

7. Programming abstraction

• Blocks versus files• Database queries (join, etc.)• Mutable data (writers)• Atomicity of DHT operations

Page 45: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Philosophical questions

• How decentralized should systems be?• Gnutella versus content distribution network• Have a bit of both? (e.g., OceanStore)

• Why does the distributed systems community have more problems with decentralized systems than the networking community?• “A distributed system is a system in which a

computer you don’t know about renders your own computer unusable”

• Internet (BGP, NetNews)

Page 46: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

What are we doing at MIT?

• Building a system based on Chord• Applications: CFS, Herodotus, Melody,

Backup store, R/W file system, …• Collaborate with other institutions

• P2P workshop• Big ITR

• Building a large-scale testbed• RON, PlanetLab

Page 47: DISTRIBUTED HASH TABLES Building large-scale, robust  distributed applications

Summary

• Once we have DHTs, building large-scale distributed applications is easy• Single, shared infrastructure for many applications• Robust in the face of failures and attacks• Scalable to large number of servers• Self configuring across administrative domains• Easy to program

• Let’s build DHTs …. stay tuned ….