Distributed Hash Table Systems

Hui Zhang, University of Southern California

CS551 - DHT 2

Outline

• History

• A brief description of DHT systems
  - General definition
  - Basic components

• Chord, a DHT system

• Proximity routing in DHTs

• Reading list for interested audience


History: client-server model

[Diagram: a single server holds both the index table and the data; clients a–d send queries to the server and transfer data from it.]


History: peer-to-peer model (Napster)

[Diagram: a central server holds only the index table; clients a–d hold the data, send queries to the server, and transfer data directly between themselves.]


History: peer-to-peer model (Gnutella)

[Diagram: no central server; each client a–d holds its own index table and data, queries are flooded among the peers, and data is transferred directly.]


History: peer-to-peer model (DHT systems)

[Diagram: as in Gnutella, each client a–d holds an index table and data, but queries are routed over a structured overlay rather than flooded; data is still transferred directly.]


DHT systems - definition

• a new class of peer-to-peer routing infrastructures
  - CAN, Chord, Pastry, Tapestry, Viceroy, Kademlia, etc.

• support hash table-like functionality at Internet scale
  - given a key, map it onto a node

• numerous applications, services, and infrastructures have been proposed on top of a DHT system:
  - web caching
  - naming services
  - event notification
  - indirection services
  - DoS attack prevention
  - application-layer multicast
  - etc.


Basic DHT components

• an overlay space and its partition approach

• a routing protocol
  - local routing table
  - next-hop decision

• a base hash function
  - variants: proximity-aware, locality-preserving, etc.


Consistent hashing [Karger et al. STOC97]

[Diagram: both data items and servers are hashed onto the same overlay space; each item is stored at the server it maps to.]
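The consistent-hashing idea can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; the names (`h`, `responsible_server`) and the SHA-1 choice are my own assumptions.

```python
import hashlib

M = 8  # bits in the overlay key space, as in the slides' examples

def h(value: str) -> int:
    """Hash an arbitrary string onto the key space [0, 2**M).
    (SHA-1 is an arbitrary choice for illustration.)"""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def responsible_server(key: int, server_ids: list[int]) -> int:
    """A key belongs to the first server at or after it on the ring,
    wrapping around at the top of the key space."""
    for sid in sorted(server_ids):
        if sid >= key:
            return sid
    return min(server_ids)  # wrapped past the largest server ID

servers = [h(f"server-{i}") for i in range(4)]
owner = responsible_server(h("some-data-item"), servers)
```

The property that motivated consistent hashing: adding or removing one server only moves the keys between that server and its ring predecessor, leaving all other assignments unchanged.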


Chord [Stoica et al. Sigcomm 2001]

• Overlay space
  - a one-dimensional, unidirectional key space 0 – 2^m-1.
  - Given two m-bit identifiers x and y, d(x, y) = (y - x + 2^m) mod 2^m.
  - Each node is assigned a random m-bit identifier.
  - Each node is responsible for all keys from just after its predecessor's identifier up to and including its own identifier.
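As a quick sanity check of the distance function, a minimal Python sketch (`SPACE` and `d` are illustrative names):

```python
M = 8
SPACE = 2 ** M

def d(x: int, y: int) -> int:
    """Unidirectional (clockwise) distance from x to y on the m-bit ring."""
    return (y - x + SPACE) % SPACE

# The distance is not symmetric, because the ring is traversed in one
# direction only: 250 -> 10 wraps past 0 and is short, 10 -> 250 is long.
clockwise = d(250, 10)
counter = d(10, 250)
```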


Chord – an example (m=8)

[Diagram: a Chord ring over the 8-bit key space 0–255, with 8 nodes at IDs 0, 32, 64, 96, 128, 160, 192, 224; a data item with key 120 is stored at its successor node, 128.]


Chord – routing protocol

• Routing table (finger table)
  - (at most) m entries. The i-th entry of node n points to the first node that succeeds n by at least 2^(i-1) on the key space, 1 ≤ i ≤ m.

• Next-hop decision:
  - For a given target key k, find the closest finger preceding k and forward the request to it.
  - Ending condition: the request terminates when k lies between the ID of the current node and the ID of its successor.
  - The routing path length is O(log n) for an n-node network with high probability (w.h.p.).
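The finger table and greedy lookup can be sketched for the 8-node example network. This is a simplified, centrally-simulated sketch (every function name here is illustrative); a real Chord node knows only its own fingers.

```python
M = 8
SPACE = 2 ** M
NODES = [0, 32, 64, 96, 128, 160, 192, 224]  # the example network

def successor(key, nodes):
    """First node at or after `key` on the ring (the key's owner)."""
    for n in sorted(nodes):
        if n >= key:
            return n
    return min(nodes)  # wrap around

def node_successor(n, nodes):
    """The node immediately after node n on the ring."""
    return successor((n + 1) % SPACE, nodes)

def finger_table(n, nodes):
    """Entry i points to the first node succeeding n by at least 2**(i-1)."""
    return [successor((n + 2 ** (i - 1)) % SPACE, nodes)
            for i in range(1, M + 1)]

def lookup(start, key, nodes):
    """Greedy routing: forward to the closest finger preceding the key;
    stop when the key falls between the current node and its successor."""
    node, path = start, [start]
    while True:
        if key == node:          # current node owns its own ID
            return path
        succ = node_successor(node, nodes)
        if (key - node) % SPACE <= (succ - node) % SPACE:
            path.append(succ)    # succ owns the key: final hop
            return path
        best, best_d = node, 0
        for f in set(finger_table(node, nodes)):
            df = (f - node) % SPACE
            if best_d < df < (key - node) % SPACE:
                best, best_d = f, df   # closest finger preceding key
        node = best
        path.append(node)
```

Tracing a lookup for key 120 from node 0 walks the ring in O(log n) hops, halving the remaining distance at each step.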


Chord – routing table setup

A Chord network with 8 nodes and m(=8)-bit key space; nodes at IDs 0, 32, 64, 96, 128, 160, 192, 224.

Finger ranges of node 0, each with a pointer to the first node in the range:
- Range 1: [1, 2)
- Range 2: [2, 4)
- Range 3: [4, 8)
- Range 4: [8, 16)
- Range 5: [16, 32)
- Range 6: [32, 64)
- Range 7: [64, 128)
- Range 8: [128, 256)


Chord – a lookup for key 120

[Diagram: the overlay routing path of a lookup for key 120 in the N(=8)-node, m(=8)-bit example network, drawn over the underlying physical links.]


Chord – node joining

1. Initializing fingers
2. Updating fingers of existing nodes
3. Transferring keys
   • between the new node and its successor node


Chord – stabilization mechanism

• Keep the pointer to the immediate successor up to date.
  - guarantees correctness of lookups

• Keep the pointers to the remaining (m-1) fingers up to date.
  - keeps lookups fast
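The successor-refresh rule above is the stabilize/notify pair from the Chord paper. A toy Python rendering of the idea (heavily simplified: no failures, no fingers, a plain integer ring; the scenario below is my own):

```python
def between(x, a, b):
    """x lies in the open ring interval (a, b)."""
    return a < x < b if a < b else x > a or x < b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        # Ask the successor for its predecessor; adopt it if it sits
        # between us and our current successor, then notify.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, n):
        # Node n believes it might be our predecessor.
        if self.predecessor is None or between(n.id, self.predecessor.id, self.id):
            self.predecessor = n

# A correct two-node ring; node 50 joins knowing only its successor.
n10, n50, n200 = Node(10), Node(50), Node(200)
n10.successor, n10.predecessor = n200, n200
n200.successor, n200.predecessor = n10, n10
n50.successor = n200

for _ in range(3):                  # periodic stabilization rounds
    for node in (n50, n10, n200):
        node.stabilize()
```

After a few rounds the successor pointers converge to the correct ring 10 → 50 → 200 → 10, which is exactly the correctness guarantee the slide refers to.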


Chord – successors-list mechanism

• Each node maintains a list of its nearest successors on the ring, so that when the immediate successor fails, the next node on the list can take over and lookups remain correct.


Chord – simulation result

[Stoica et al. Sigcomm2001]


Chord – “failure” experiment

The fraction of lookups that fail as a function of the fraction of nodes that fail. [Stoica et al. Sigcomm2001]


Outline

• History

• A brief description of DHT systems
  - General definition
  - Basic components

• Chord, a DHT system

• Proximity routing in DHTs

• Reading list for interested audience


Overlay performance of Chord

• Expected routing table size per node- O(log N) entries

• Expected hops per request - O(log N) hops


Latency stretch in Chord

[Diagram: the N(=8)-node, m(=8)-bit example network with nodes placed in the U.S.A. and China; a lookup for key 120 takes overlay hops that repeatedly cross the long physical link between the two regions, even though each overlay hop looks short.]


Latency stretch [Ratnasamy et al. 2001]

• stretch = (average latency of a lookup on the overlay topology) / (average latency on the underlying topology)

• In Chord, Θ(log N) hops per lookup on average ⇒ Θ(log N) stretch in the original Chord.

Could Chord do better, e.g., O(1) stretch, without much change?


Three categories of proximity routing [Gummadi2003]

• Proximity identifier selection: by carefully assigning IDs to nodes, make the overlay topology congruent to the underlying network topology.

• Proximity route selection: when forwarding a request message, the choice of the next hop depends on both the overlay distance and the underlying distance.

• Proximity neighbor selection: the neighbors in the routing table are chosen based on their proximity.


Underlying network topology decides the performance of proximity routing

[Diagram: the worst-case scenario, equal distance: every Chord node hangs off a router at the same physical distance from every other node, so proximity-based choices give no advantage.]


Latency expansion

• Let N_u(x) denote the number of nodes in the network G that are within latency x of node u.

  - power-law latency expansion: N_u(x) grows (i.e. "expands") proportionally to x^d, for all nodes u. Examples: ring (d=1), mesh (d=2).

  - exponential latency expansion: N_u(x) grows proportionally to α^x for some constant α > 1. Example: random graphs.
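The two named examples can be checked directly. A small Python illustration; the closed-form counts below follow from the geometry of a cycle and a 2-D grid (my derivation, not from the paper):

```python
def ring_expansion(n, x):
    """N_u(x) for an n-node cycle: x neighbors on each side plus the
    node itself, capped at n. Linear growth, so d = 1."""
    return min(2 * x + 1, n)

def mesh_expansion(x):
    """N_u(x) for a large 2-D grid under L1 (hop) distance: a diamond
    of 2*x*x + 2*x + 1 nodes. Quadratic growth, so d = 2."""
    return 2 * x * x + 2 * x + 1

# Doubling the radius roughly doubles the count on the ring (x^1)
# and roughly quadruples it on the mesh (x^2).
ring_ratio = ring_expansion(10**6, 40) / ring_expansion(10**6, 20)
mesh_ratio = mesh_expansion(40) / mesh_expansion(20)
```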


“Latency-stretch” theorem - I

• “Bad news” Theorem: If the underlying topology G is drawn from a family of graphs with exponential latency expansion, then the expected latency of Chord is Ω(L · log N), where L is the expected latency between pairs of nodes in G.


“Latency-stretch” theorem - II

• “Good news” Theorem: If

  (1) the underlying topology G is drawn from a family of graphs with d-power-law latency expansion, and

  (2) each node u in the Chord network samples (log N)^d nodes in each range uniformly at random and keeps a pointer to the nearest sampled node for future routing,

  then the expected latency of a request is O(L), where L is the expected latency between pairs of nodes in G; equivalently, the latency stretch of Chord is O(1).


Two remaining questions

• How does each node efficiently achieve (log N)^d samples from each range?

• Do real networks have the power-law latency expansion characteristic?


Uniform sampling in terms of ranges

For a routing request with Θ(log N) hops, the final node t is a random node in Θ(log N) different ranges along the path (node x denotes the node at hop x; node 0 is the request initiator; node t is the request terminator):

- Node 0: node t is in its range d
- Node 1: node t is in its range x (< d)
- Node 2: node t is in its range y (< x)
- …
- Node t: the request terminator


Lookup-Parasitic Random Sampling

1. Recursive lookup.

2. Each intermediate hop appends its IP address to the lookup message.

3. When the lookup reaches its target, the target informs each listed hop of its identity.

4. Each intermediate hop then sends one (or a small number) of pings to get a reasonable estimate of the latency to the target, and update its routing table accordingly.
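A toy sketch of steps 2–4 in Python. Simplified on purpose: the per-range bookkeeping of real LPRS is collapsed into a single nearest-sample entry per node, and the `latency` callback stands in for an actual ping; all names and the toy latency model are my own.

```python
def lprs_update(path, target, latency, nearest_sample):
    """Once a recursive lookup reaches `target`, every hop recorded on
    the path measures its latency to the target and keeps the target if
    it is the closest sample seen so far."""
    for hop in path:
        rtt = latency(hop, target)        # one ping, as in step 4
        best = nearest_sample.get(hop)
        if best is None or rtt < best[1]:
            nearest_sample[hop] = (target, rtt)

# Toy latency: absolute difference between hypothetical node ids.
lat = lambda a, b: abs(a - b)
table = {}
lprs_update([0, 64, 96], 128, lat, table)   # first lookup's path
lprs_update([0], 10, lat, table)            # a later lookup from node 0
```

The point of the scheme is that sampling is parasitic: no extra lookup traffic is generated, only a few pings per lookup that already happened.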


LPRS-Chord: convergence time

[Figure: convergence time of LPRS-Chord.]


LPRS-Chord: topology with power-law expansion

[Figure: stretch on a ring topology, measured at time 2 log N.]


One remaining question

• Do real networks have the power-law latency expansion characteristic?


Internet router-level topology: latency measurement

• Approximate link latency by geographical latency
  - assign geo-locations to nodes using GeoTrack [Padmanabhan 2001].

• A large router-level topology dataset
  - 320,735 nodes, mapped to 603 distinct cities all over the world.
  - 92,824 node pairs are sampled to tractably compute the latency expansion of this large topology.


Internet router-level topology: latency expansion

[Figure: latency expansion of the router-level topology.]


LPRS-Chord on router-level topology

[Figure: stretch on the router-level subgraphs, measured at time 2 log N.]


Outline

• History

• A brief description of DHT systems
  - General definition
  - Basic components

• Chord, a DHT system

• Proximity routing in DHTs

• Reading list for interested audience


A few interesting papers on DHT

• K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, I. Stoica. “The Impact of DHT Routing Geometry on Resilience and Proximity.” Sigcomm 2003.

• A. Gupta, B. Liskov, and R. Rodrigues. “Efficient Routing for Peer-to-Peer Overlays.” NSDI 2004.

• H. Balakrishnan, K. Lakshminarayanan, S. Ratnasamy, S. Shenker, I. Stoica, M. Walfish. “A Layered Naming Architecture for the Internet.” Sigcomm 2004.


Thank you!


Backup slides


A simple random sampling solution

[Diagram: a Chord network with an m-bit key space (0 to 2^m - 1), showing pointers across the ring and the distance measurements used for random sampling.]




Term definition (II)

• Range: for a given node with ID j in a Chord overlay, its i-th range R_i(j) is the interval [j + 2^(i-1), j + 2^i) on the key space, where 1 ≤ i ≤ m.

• Frugal routing:
  1. after each hop, the search space for the target reduces by a constant factor, and
  2. if w is an intermediate node in the route, v is the destination, and v ∈ R_i(w), then the node after w in the route depends only on w and i.
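The range definition maps cleanly onto bit lengths, since the clockwise distance from j to any key in R_i(j) lies in [2^(i-1), 2^i). A small Python helper; `range_index` and the bit-length trick are my own, not from the paper:

```python
def ranges(j, m):
    """All ranges R_i(j) = [j + 2**(i-1), j + 2**i), i = 1..m,
    with endpoints reduced mod 2**m."""
    return [((j + 2 ** (i - 1)) % 2 ** m, (j + 2 ** i) % 2 ** m)
            for i in range(1, m + 1)]

def range_index(j, k, m):
    """The i such that key k falls in R_i(j): the clockwise distance
    from j to k lies in [2**(i-1), 2**i), so i is its bit length."""
    dist = (k - j) % 2 ** m
    return dist.bit_length() if dist else None  # k == j is in no range
```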


LPRS-Chord: simulation methodology

Phase 1. N nodes join the network one by one.

Phase 2. Each node inserts, on average, four documents into the network.

Phase 3. Each node generates, on average, 3 log N data requests one by one.
- LPRS actions are enabled only in Phase 3.
- Performance measurement begins at Phase 3.


Comparison of 5 sampling strategies: definitions

• Consider a lookup that is initiated by node x_0, then forwarded to nodes x_1, x_2, ..., and finally reaches the request terminator, node x_n:

1. Node x_i samples node x_n, 0 ≤ i < n
2. Node x_n samples nodes x_0, …, x_(n-1)
3. Node x_i samples node x_(i-1), 1 ≤ i ≤ n
4. Node x_0 samples node x_n
5. Node x_i samples node x_0, 0 < i ≤ n


Comparison of 5 sampling strategies: simulation result


Zipf-ian document popularity


Impact of skewed request distributions