Structured P2P Overlays. Consistent Hashing – the Basis of Structured P2P Intuition: –We want to...

Structured P2P Overlays

Consistent Hashing – the Basis of Structured P2P

• Intuition:
– We want to build a distributed hash table where the number of buckets stays constant, even if the number of machines changes
• Properties:
– Requires a mapping from hash entries to nodes
– Don’t need to re-hash everything if a node joins/leaves
– Only the mapping (and allocation of buckets) needs to change when the number of nodes changes
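
The properties above can be illustrated with a minimal sketch. It assumes SHA-1 as the hash, a fixed space of 2**16 buckets, and hypothetical node names ("node-a" to "node-d"); none of these choices come from the slides. Keys and nodes are hashed into the same bucket space, and a key belongs to the first node at or after its bucket.

```python
import bisect
import hashlib

def bucket(name: str, buckets: int = 2**16) -> int:
    """Hash a string into one of a fixed number of buckets (assumed: SHA-1)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % buckets

def owner(key: str, nodes: list[str]) -> str:
    """Return the node whose bucket is the first at or after the key's bucket."""
    ring = sorted((bucket(n), n) for n in nodes)
    positions = [pos for pos, _ in ring]
    # Wrap around to the first node when the key hashes past the last one.
    i = bisect.bisect_left(positions, bucket(key)) % len(ring)
    return ring[i][1]

keys = [f"file-{i}" for i in range(1000)]
before = {k: owner(k, ["node-a", "node-b", "node-c"]) for k in keys}
after = {k: owner(k, ["node-a", "node-b", "node-c", "node-d"]) for k in keys}

# Keys whose owner changed when node-d joined.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
print(all(after[k] == "node-d" for k in moved))
```

The keys themselves are never re-hashed. Only the keys that now fall to the new node change owner; every other key stays where it was.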

Classification of the P2P File Sharing Systems

• Hybrid (Broker-mediated)
– Unstructured + centralized
  • Ex.: Napster (closed)
– Unstructured + super peer notion
  • Ex.: KaZaA, Morpheus (closed)
• Unstructured decentralized (or loosely controlled)
+ Files can be anywhere
+ Support of partial name and keyword queries
– Inefficient search (some heuristics exist) & no guarantee of finding
• Ex.: GTK-Gnutella, Frostwire
• Structured (or tightly controlled, DHT)
+ Files are rigidly assigned to specific nodes
+ Efficient search & guarantee of finding
– Lack of partial name and keyword queries
• Ex.: Chord, CAN, Pastry, Tapestry, Kademlia

Motivation

How to find data in a distributed file sharing system?

Lookup is the key problem

[Figure: nodes N1 to N5 connected over the Internet. A publisher holds Key=“LetItBe”, Value=MP3 data; a client issues Lookup(“LetItBe”) and does not know which node to ask.]

Centralized Solution

• Central server (Napster)
• Requires O(M) state
• Single point of failure

[Figure: nodes N1 to N5, the publisher (Value=MP3 data) and the client connected over the Internet; a central database (DB) answers the lookup.]

Distributed Solution (1)

• Flooding (Gnutella, Morpheus, etc.)
• Worst case O(N) messages per lookup

[Figure: nodes N1 to N5, the publisher (Value=MP3 data) and the client connected over the Internet; the lookup is flooded between nodes.]

Distributed Solution (2)

• Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
• Only exact matches

[Figure: nodes N1 to N5, the publisher (Value=MP3 data) and the client connected over the Internet; the lookup is routed from node to node.]

Distributed Hash Tables (DHT)

• Distributed version of a hash table data structure
• Stores (key, value) pairs
– The key is like a filename
– The value can be file contents
• Goal: Efficiently insert/lookup/delete (key, value) pairs
• Each peer stores a subset of (key, value) pairs in the system
• Core operation: Find node responsible for a key
– Map key to node
– Efficiently route insert/lookup/delete request to this node

Structured Overlays

• Properties
– Topology is tightly controlled
  • Well-defined rules determine to which other nodes a node connects
– Files placed at precisely specified locations
  • Hash function maps file names to nodes
– Scalable routing based on file attributes
• In these systems:
– files are associated with a key (produced, e.g., by hashing the file name) and
– each node in the system is responsible for storing a certain range of keys

Document Routing

• The core of these DHT systems is the routing algorithm

• The DHT nodes form an overlay network with each node having several other nodes as neighbors

• When a lookup(key) is issued, the lookup is routed through the overlay network to the node responsible for that key

• The scalability of these DHT algorithms is tied directly to the efficiency of their routing algorithms

Document Routing Algorithms

• They take, as input, a key and, in response, route a message to the node responsible for that key
– The keys are strings of digits of some length
– Nodes have identifiers, taken from the same space as the keys (i.e., same number of digits)
• Each node maintains a routing table consisting of a small subset of nodes in the system
• When a node receives a query for a key for which it is not responsible, the node routes the query to the neighbour node that makes the most “progress” towards resolving the query
– The notion of progress differs from algorithm to algorithm, but in general is defined in terms of some distance between the identifier of the current node and the identifier of the queried key

Content-Addressable Network (CAN)

• A typical document routing method
• Virtual Cartesian coordinate space is used
• Entire space is partitioned amongst all the nodes
– every node “owns” a zone in the overall space
• Abstraction
– can store data at “points” in the space
– can route from one “point” to another
• Point = node that owns the enclosing zone

CAN Example: Two Dimensional Space

• Space divided between nodes
• Together, all nodes cover the entire space
• Each node covers either a square or a rectangular area of ratios 1:2 or 2:1
• Example:
– Node n1:(1, 2), the first node that joins, covers the entire space
– Node n2:(4, 2) joins; space is divided between n1 and n2
– Node n3:(3, 5) joins; space is divided between n1 and n3
– Nodes n4:(5, 5) and n5:(6, 6) join
• Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
• Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
• Each item is stored by the node that owns its mapping in the space

[Figure: sequence of 2-D grids with both axes running from 0 to 7, showing the zones of n1 to n5 after each join and the positions of items f1 to f4.]
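
The placement rule can be sketched as follows. The zone boundaries are an assumption: the slides give only the node coordinates, so the sketch assumes an 8 x 8 space in which each join halves the zone that encloses the new node. The names Zone, ZONES and owner are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zone:
    """Axis-aligned zone with half-open ranges [x0, x1) x [y0, y1)."""
    x0: int
    x1: int
    y0: int
    y1: int

    def contains(self, point: tuple[int, int]) -> bool:
        x, y = point
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

# Assumed zones after n1..n5 have joined (not given explicitly in the slides).
ZONES = {
    "n1": Zone(0, 4, 0, 4),
    "n2": Zone(4, 8, 0, 4),
    "n3": Zone(0, 4, 4, 8),
    "n4": Zone(4, 6, 4, 8),
    "n5": Zone(6, 8, 4, 8),
}

# Item coordinates taken from the example.
ITEMS = {"f1": (2, 3), "f2": (5, 1), "f3": (2, 1), "f4": (7, 5)}

def owner(point: tuple[int, int]) -> str:
    """Return the node whose zone encloses the point."""
    return next(name for name, zone in ZONES.items() if zone.contains(point))

placement = {item: owner(point) for item, point in ITEMS.items()}
print(placement)  # {'f1': 'n1', 'f2': 'n2', 'f3': 'n1', 'f4': 'n5'}
```

Under these assumed zones, every node's own coordinate also lies inside its zone, and each zone is a square or a 1:2 rectangle.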

CAN: Query Example

• Each node knows its neighbours in the d-space

• Forward query to the neighbour that is closest to the query id

• Example: assume Node n1 queries File Item f4

[Figure: 2-D grid (both axes 0 to 7) with nodes n1 to n5 and items f1 to f4, illustrating the query from n1 for f4.]
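
A minimal sketch of the greedy forwarding for this query. Several things here are assumptions: the zone boundaries (an 8 x 8 space where each join halves the enclosing zone), the use of the Euclidean distance from a zone's centre as the "closeness" measure, and the omission of the wrap-around at the edges of the space. All function names are hypothetical.

```python
import math

# Assumed zones (x0, x1, y0, y1), half-open, on an 8 x 8 space.
ZONES = {
    "n1": (0, 4, 0, 4),
    "n2": (4, 8, 0, 4),
    "n3": (0, 4, 4, 8),
    "n4": (4, 6, 4, 8),
    "n5": (6, 8, 4, 8),
}

def contains(zone, point):
    x0, x1, y0, y1 = zone
    return x0 <= point[0] < x1 and y0 <= point[1] < y1

def are_neighbours(a, b):
    """Zones are neighbours if they touch along an edge of non-zero length."""
    ax0, ax1, ay0, ay1 = a
    bx0, bx1, by0, by1 = b
    x_overlap = min(ax1, bx1) - max(ax0, bx0)
    y_overlap = min(ay1, by1) - max(ay0, by0)
    return (x_overlap == 0 and y_overlap > 0) or (y_overlap == 0 and x_overlap > 0)

def distance(zone, point):
    """Euclidean distance from the centre of a zone to a point."""
    x0, x1, y0, y1 = zone
    return math.dist(((x0 + x1) / 2, (y0 + y1) / 2), point)

def route(start, target):
    """Forward to the neighbour closest to the target until its owner is reached."""
    path = [start]
    while not contains(ZONES[path[-1]], target):
        current = ZONES[path[-1]]
        neighbours = [n for n, z in ZONES.items() if are_neighbours(current, z)]
        path.append(min(neighbours, key=lambda n: distance(ZONES[n], target)))
    return path

print(route("n1", (7, 5)))  # ['n1', 'n2', 'n5']
```

Each node needs to know only its direct neighbours, which is what keeps the routing table small. This simple greedy rule is enough for the example but is not guaranteed to make progress on every possible zone layout.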

Document Routing – CAN

• Associate to each node and item a unique id (nodeId and fileId) in a d-dimensional space
• Goals
– Scales to hundreds of thousands of nodes
– Handles rapid arrival and failure of nodes
• Properties
– Routing table size O(d)
– Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes

Resource Discovery

Associative array

• An associative array is an abstract data type
– It is composed of a collection of (key, value) pairs
• A data structure is a way of storing and organizing data in a computer so that it can be used efficiently
• The dictionary problem is the task of designing a data structure that implements an associative array
– A solution: the hash table
• Binding: the association between a key and a value

Hashing

• Any algorithm that maps data of a variable length to data of a fixed length
• It is used to generate fixed-length output data
– It is a shortened reference to the original data
• In 1953, H.P. Luhn of IBM used the concept of hashing (as noted by D. Knuth)
• In 1968, R. Morris used the term “hashing” as formal terminology
– Before that, it was used only in technical jargon
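
The fixed-length property can be checked with Python's standard hashlib module. The sketch uses SHA-1 only as an example of a hash function; the helper name sha1_hex and the sample inputs are chosen for illustration.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of a string as 40 hexadecimal characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Inputs of very different lengths...
inputs = ["a", "LetItBe", "x" * 10_000]
# ...all map to outputs of the same length (160 bits = 40 hex digits).
lengths = [len(sha1_hex(text)) for text in inputs]
print(lengths)  # [40, 40, 40]
```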

Distributed Hash Tables

• Key identifies data uniquely
• DHT balances keys and data across nodes
• DHT replicates, caches, routes lookups, etc.

[Figure: layered diagram. Distributed applications call Insert(key, data) and Lookup(key), which returns data, on the distributed hash table; the hash table is spread over node, node, node, ….]

DHT Applications

• Many services can be built on top of a DHT interface
– File sharing
– Archival storage
– Databases
– Chat service
– Rendezvous-based communication
– Publish/Subscribe systems

Chord

• It is a protocol and algorithm
• It was introduced in 2001 by I. Stoica, R. Morris, D. Karger, F. Kaashoek, & H. Balakrishnan
• It is based on the SHA-1 hashing algorithm
– Secure Hash Standard
• It uses m-bit identifiers (IDs)
– for the hashed IP addresses of the computers (nodes)
– for the hashed data (keys)
• A logical ring with positions numbered 0 to 2^m - 1 is formed among nodes

P2P Ring

• Nodes are arranged in a ring based on ID
• Identifiers are arranged on an identifier circle modulo 2^m => Chord ring
• IDs are assigned randomly
• Very large ID space
– m is large enough to make collisions improbable

Construction of the Chord ring

• Data items (keys) also have IDs
• Every node is responsible for a subset of the keys
• A key is assigned to the first node whose ID (ID_node) is equal to or follows the key's ID (ID_key) on the ring
• This node is called the successor of the key and is the first node clockwise from ID_key
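
A minimal sketch of the key-to-node assignment, assuming the full list of node IDs is available in one place. The function name successor is hypothetical; the node IDs and key 60 are the ones used in the "LetItBe" ring example of these slides.

```python
import bisect

def successor(key_id: int, node_ids: list[int]) -> int:
    """First node ID clockwise from key_id (equal to it or following it)."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id)
    return ring[i % len(ring)]  # wrap around past the highest ID

nodes = [10, 32, 55, 90, 123]
print(successor(60, nodes))   # 90
print(successor(32, nodes))   # 32, an exact match stays on that node
print(successor(124, nodes))  # 10, wraps around the ring
```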

Chord structure

• Every node is responsible for a subset of the data

• Routing algorithm locates data, with small per-node routing state

• Volunteer nodes join and leave system at any time

• All nodes have identical responsibilities

• All communication is symmetric

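[Figure: ring of nodes with hexadecimal IDs 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1; a message Route(d46a1c) is forwarded towards key d46a1c.]

Lookup with global knowledge

• Every node knows of every other node: requires global information
• Routing tables are large: O(n)
• Lookups are fast: O(1)

[Figure: Chord ring with nodes N10, N32, N55, N90, N123. Hash(“LetItBe”) = K60; the question “Where is “LetItBe”?” is answered with “N90 has K60”.]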

Successor and predecessor

• Each node has a successor and a predecessor
• The successor of a given node is the node whose ID is equal to or follows the identifier of the given node
• If there are n nodes and k keys, then each node is responsible for about k/n keys
• Basic case: each node knows only the location of its successor
• Increasing the robustness: using more than one successor
– Each node knows r immediate successors
– After failure, will know first live successor
• Predecessor has less importance

Lookup with local knowledge

// ask node n to find the successor of id
n.find_successor(id)
  if (id ∈ (n, successor])
    return successor;
  else
    // forward the query around the circle
    return successor.find_successor(id);

• Disadvantage
– Number of messages linear in the number of nodes
  • O(n)
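
The pseudocode can be turned into a runnable sketch. The class Node and the helpers in_half_open and make_ring are hypothetical names, and the loop is an iterative equivalent of the recursive forwarding that also counts hops.

```python
class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = self  # a lone node is its own successor

def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], which may wrap past zero."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # wrapped interval; a == b covers the whole ring

def find_successor(node: Node, key_id: int):
    """Walk successor pointers until key_id falls in (node, node.successor]."""
    hops = 0
    while not in_half_open(key_id, node.id, node.successor.id):
        node = node.successor  # forward the query around the circle
        hops += 1
    return node.successor, hops

def make_ring(ids):
    """Link nodes into a ring in ID order (done here with global knowledge)."""
    nodes = [Node(i) for i in sorted(ids)]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.successor = b
    return nodes

ring = make_ring([10, 32, 55, 90, 123])
owner, hops = find_successor(ring[0], 60)
print(owner.id, hops)  # 90 2
```

In the worst case the query visits every node once before it reaches the owner, which is the linear cost noted above.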

Modulo operator

• Modulo operation finds the remainder of division of one number by another

• x mod y ↔ x - (y * int(x / y))
• Examples:
– 10 mod 8 = 2
– 5 mod 5 = 0
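
A short check of the formula against Python's built-in % operator, for non-negative operands. The function name mod is chosen for illustration; the last line shows the wrap-around on an assumed ring with m = 3.

```python
def mod(x: int, y: int) -> int:
    """x mod y written as on the slide: x - y * int(x / y)."""
    return x - y * int(x / y)

print(mod(10, 8), 10 % 8)  # 2 2
print(mod(5, 5), 5 % 5)    # 0 0

# On an identifier circle with m = 3 bits, positions wrap modulo 2^m = 8.
m = 3
print((6 + 4) % 2**m)      # 2
```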

Lookup with routing (finger) table

• Additional routing information to accelerate lookups
• Each node contains a routing table with up to m entries => finger table
– m is number of bits of the identifiers
– Every node knows m other nodes in the ring
• The ith entry of a given node N contains the address of successor((N + 2^(i-1)) mod 2^m)
– Increase distance exponentially

Division of the distance by finger tables

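Finger i points to successor of N + 2^i

In this case N=80, i=5

[Figure: Chord ring seen from N80. The fingers divide the ring into spans of 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128; the finger targets 80 + 2^0, 80 + 2^1, …, 80 + 2^6 are marked, and nodes N96, N112, N120 and N16 are shown.]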

Routing with finger tables

• Route via binary search
• Use fingers first
• Then successors that are not in the finger table
• Cost is O(log n)
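
A sketch of finger-based routing under stated assumptions: m = 7 bits (IDs 0 to 127), finger tables built with global knowledge for brevity, and the ring N5, N10, N20, N32, N60, N80, N99, N110 with key K19 borrowed from the lookup figures in these slides. The names between, build_fingers and lookup are hypothetical.

```python
import bisect

M = 7  # assumed number of identifier bits: IDs 0..127

def between(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b

def build_fingers(node_ids):
    """Map each node to its finger list; entry i-1 is successor(n + 2^(i-1))."""
    ring = sorted(node_ids)

    def succ(ident):
        return ring[bisect.bisect_left(ring, ident) % len(ring)]

    return {n: [succ((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]
            for n in ring}

def lookup(fingers, start: int, key: int):
    """Route towards the successor of key; return (owner, hops)."""
    node, hops = start, 0
    while True:
        successor = fingers[node][0]
        if key == successor or between(key, node, successor):
            return successor, hops
        # Closest preceding finger: scan from the farthest finger inwards.
        node = next((f for f in reversed(fingers[node]) if between(f, node, key)),
                    successor)
        hops += 1

fingers = build_fingers([5, 10, 20, 32, 60, 80, 99, 110])
print(lookup(fingers, 32, 19))  # (20, 3)
```

Each hop jumps to the farthest finger that does not overshoot the key, so the remaining distance shrinks quickly. In this run the query passes N99, N5 and N10 before N10 reports N20 as the owner.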

Scalable node localization

Finger table:

finger[i] = successor((N + 2^(i-1)) mod 2^m), i = 1, …, m; N = 0, …, n
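
The formula can be evaluated directly. The sketch assumes a small ring with m = 3 and nodes 0, 1 and 3, and computes successors from the full node list; the function names are hypothetical.

```python
import bisect

def successor(ident: int, ring: list[int]) -> int:
    """First node at or after ident on the sorted ring, wrapping around."""
    i = bisect.bisect_left(ring, ident)
    return ring[i % len(ring)]

def finger_table(n: int, node_ids: list[int], m: int):
    """Return [(start, succ)] for i = 1..m with start = (n + 2^(i-1)) mod 2^m."""
    ring = sorted(node_ids)
    table = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % 2 ** m
        table.append((start, successor(start, ring)))
    return table

for n in (0, 1, 3):
    print(n, finger_table(n, [0, 1, 3], m=3))
# 0 [(1, 1), (2, 3), (4, 0)]
# 1 [(2, 3), (3, 3), (5, 0)]
# 3 [(4, 0), (5, 0), (7, 0)]
```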

Scalable node localization

Important characteristics of this scheme:
• Each node stores information about only a small number of nodes (m)
• Each node knows more about nodes closely following it than about nodes farther away
• A finger table generally does not contain enough information to directly determine the successor of an arbitrary key k

Another example for finger tables

m = 3; finger start = (N + 2^(i-1)) mod 2^m, N = 0, …, n
Identifier circle 0 to 7 with nodes 0, 1 and 3 and keys 1, 2 and 6

Node 0 (keys: 6)
start   int.    succ.
1       [1,2)   1
2       [2,4)   3
4       [4,0)   0

Node 1 (keys: 1)
start   int.    succ.
2       [2,3)   3
3       [3,5)   3
5       [5,1)   0

Node 3 (keys: 2)
start   int.    succ.
4       [4,5)   0
5       [5,7)   0
7       [7,3)   0

int. = the interval in which node N searches for the successor

Node joins with finger tables

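Node 6 joins the identifier circle (0 to 7) of nodes 0, 1 and 3. Changed entries show the previous value in parentheses.

Node 0 (keys: none; key 6 moves to node 6)
start   int.    succ.
1       [1,2)   1
2       [2,4)   3
4       [4,0)   6 (was 0)

Node 1 (keys: 1)
start   int.    succ.
2       [2,3)   3
3       [3,5)   3
5       [5,1)   6 (was 0)

Node 3 (keys: 2)
start   int.    succ.
4       [4,5)   6 (was 0)
5       [5,7)   6 (was 0)
7       [7,3)   0

Node 6 (keys: 6)
start   int.    succ.
7       [7,0)   0
0       [0,2)   0
2       [2,6)   3

int. = the interval in which node N searches for the successor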

Node departures with finger tables

[Figure: the identifier circle (0 to 7) and the finger tables (start, int., succ., keys) of nodes 0, 1, 3 and 6 after a node departs, with the changed successor entries and key locations marked.]

Joining the Chord ring

Three-step process:
1. Initialize all fingers of new node
2. Update fingers of existing nodes
3. Transfer keys from successor to new node

Joining the Chord ring – step 1

Initialize the new node finger table
• Locate any node N in the ring
• Ask node N to lookup fingers of new node N36
• Return results to new node

[Figure: Chord ring with nodes N5, N20, N40, N60, N80, N99; the new node N36 issues 1. Lookup(37,38,40,…,100,164).]

Updating fingers of existing nodes
• New node calls update function on existing nodes
• Existing nodes can recursively update fingers of other nodes

[Figure: Chord ring with nodes N5, N20, N36, N40, N60, N80, N99.]

Transfer keys from successor node to new node
• Only keys in the range are transferred
• Copy keys 21..36 from N40 to N36

[Figure: Chord ring with nodes N5, N20, N36, N40, N60, N80, N99; K30 moves from N40 to N36, K38 stays at N40.]
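
The key transfer can be sketched as a split of the successor's key set. The function names are hypothetical; the IDs (N20, N36, N40, K30, K38) are the ones from the example.

```python
def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], which may wrap past zero."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def transfer_keys(successor_keys: set[int], predecessor_id: int, new_id: int):
    """Split the successor's keys: those in (predecessor, new] go to the new node."""
    moved = {k for k in successor_keys if in_half_open(k, predecessor_id, new_id)}
    return moved, successor_keys - moved

# N36 joins between N20 and N40; N40 currently stores K30 and K38.
moved, kept = transfer_keys({30, 38}, predecessor_id=20, new_id=36)
print(moved, kept)  # {30} {38}
```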


Joining the Chord ring

• To ensure correct lookups, all successor pointers must be up to date

• => stabilization protocol running periodically in the background

• Updates finger tables and successor pointers
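
A minimal sketch of such a periodic stabilization, covering successor and predecessor pointers only; refreshing finger tables is left out. The method names join, stabilize and notify and the interval helpers are assumptions of this sketch. The rounds are run in a simple loop here, whereas in a real system each node would run them on its own timer.

```python
def in_open(x, a, b):
    """True if x lies strictly inside the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)

def in_open_closed(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.predecessor = None

    def join(self, known):
        """Learn only a successor by walking the ring from a known node."""
        node = known
        while not in_open_closed(self.id, node.id, node.successor.id):
            node = node.successor
        self.successor = node.successor

    def stabilize(self):
        """Adopt a closer successor if one has appeared, then announce ourselves."""
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """Accept candidate as predecessor if it is closer than the current one."""
        if self.predecessor is None or in_open(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

n20, n40, n36 = Node(20), Node(40), Node(36)
nodes = [n20]
for newcomer in (n40, n36):
    newcomer.join(n20)
    nodes.append(newcomer)
    for _ in range(3):  # a few background rounds
        for n in nodes:
            n.stabilize()

order, node = [n20.id], n20.successor
while node is not n20:
    order.append(node.id)
    node = node.successor
print(order)  # [20, 36, 40]
```

A joining node sets only its own successor. The other pointers converge over the following rounds, so N20 learns about N36 from N40's predecessor pointer rather than from N36 directly.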

Lookup Mechanism

• Lookups take O(log n) hops
– n is the total number of nodes
• Lookup: route to closest predecessor

[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110; Lookup(K19) is routed around the ring towards K19.]

Measured lookup procedure

Cost of lookup is O(log n)

[Figure: plot of Average Messages per Lookup (y-axis) against Number of Nodes (x-axis).]

Handling failures: redundancy

• Each node knows IP addresses of next r nodes
• Each key is replicated at next r nodes

[Figure: Chord ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; three copies of K19 are shown.]
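
A sketch of replica placement and failover. It assumes that the "next r nodes" are the key's successor plus the nodes that follow it, r in total, and that node IDs are known in one place. The function names and the choice r = 3 are hypothetical.

```python
import bisect

def replica_nodes(key_id: int, node_ids: list[int], r: int) -> list[int]:
    """The successor of the key plus the nodes that follow it, r nodes in total."""
    ring = sorted(node_ids)
    first = bisect.bisect_left(ring, key_id)
    return [ring[(first + j) % len(ring)] for j in range(min(r, len(ring)))]

def live_replica(key_id, node_ids, r, failed):
    """First replica holder that has not failed, or None if all r are down."""
    return next((n for n in replica_nodes(key_id, node_ids, r) if n not in failed),
                None)

nodes = [5, 10, 20, 32, 40, 60, 80, 99, 110]
print(replica_nodes(19, nodes, r=3))            # [20, 32, 40]
print(live_replica(19, nodes, 3, failed={20}))  # 32
```

With r copies, a lookup still succeeds as long as at least one of the r holders is alive.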

Lookups find replicas

[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Lookup(K19) proceeds in steps 1. to 4. until a replica of K19 is found.]

Chord software

• 3000 lines of C++ code
• Library to be linked with the application
• Provides a lookup(key) function: yields the IP address of the node responsible for the key
• Notifies the node of changes in the set of keys the node is responsible for
