Exploiting Route Redundancy via Structured Peer to Peer Overlays
Structured P2P Overlays
Consistent Hashing – the Basis of Structured P2P
• Intuition:
– We want to build a distributed hash table where the number of buckets stays constant, even if the number of machines changes
• Properties:
– Requires a mapping from hash entries to nodes
– Don’t need to re-hash everything if a node joins/leaves
– Only the mapping (and allocation of buckets) needs to change when the number of nodes changes
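The properties above can be illustrated with a minimal sketch. It assumes a fixed bucket space of 2^16 positions, SHA-1 as the hash, and the rule that a bucket belongs to the first node at or after it on the ring. The names NUM_BUCKETS, bucket_of, owner and moved_keys are hypothetical and chosen only for this illustration.

```python
import bisect
import hashlib

NUM_BUCKETS = 2 ** 16  # assumed, fixed number of buckets


def bucket_of(name):
    """Hash a string to one of the fixed buckets."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS


def owner(key, nodes):
    """Return the node that owns the key's bucket (first node at or after it)."""
    ring = sorted((bucket_of(n), n) for n in nodes)
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_left(positions, bucket_of(key)) % len(ring)
    return ring[i][1]


def moved_keys(keys, before, after):
    """Keys whose owner differs between two node sets."""
    return [k for k in keys if owner(k, before) != owner(k, after)]


nodes = ["node-a", "node-b", "node-c"]
keys = ["file-%d" % i for i in range(200)]
moved = moved_keys(keys, nodes, nodes + ["node-d"])
print(len(moved), "of", len(keys), "keys changed owner after node-d joined")
```

Under these assumptions the key hashes never change when a node joins. Only the bucket-to-node mapping changes, and every key that moves ends up on the new node.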
Classification of the P2P File Sharing Systems
• Hybrid (broker-mediated)
– Unstructured + centralized
• Ex.: Napster (closed)
– Unstructured + super peer notion
• Ex.: KaZaA, Morpheus (closed)
• Unstructured decentralized (or loosely controlled)
+ Files can be anywhere
+ Support of partial name and keyword queries
– Inefficient search (some heuristics exist) & no guarantee of finding
• Ex.: GTK-Gnutella, Frostwire
• Structured (or tightly controlled, DHT)
+ Files are rigidly assigned to specific nodes
+ Efficient search & guarantee of finding
– Lack of partial name and keyword queries
• Ex.: Chord, CAN, Pastry, Tapestry, Kademlia
Motivation
• How to find data in a distributed file sharing system?
• Lookup is the key problem
[Figure: a publisher stores Key=“LetItBe”, Value=MP3 data on one of the nodes N1 to N5 connected through the Internet; a client issues Lookup(“LetItBe”).]
Centralized Solution
• Central server (Napster)
• Requires O(M) state
• Single point of failure
[Figure: the publisher (Key=“LetItBe”, Value=MP3 data) and the client’s Lookup(“LetItBe”) both go through a central DB; nodes N1 to N5 are connected through the Internet.]
Distributed Solution (1)
• Flooding (Gnutella, Morpheus, etc.)
• Worst case O(N) messages per lookup
[Figure: the client’s Lookup(“LetItBe”) is flooded across nodes N1 to N5 until it reaches the publisher’s data (Key=“LetItBe”, Value=MP3 data).]
Distributed Solution (2)
• Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
• Only exact matches
[Figure: the client’s Lookup(“LetItBe”) is routed through nodes N1 to N5 to the publisher’s data (Key=“LetItBe”, Value=MP3 data).]
Distributed Hash Tables (DHT)
• Distributed version of a hash table data structure
• Stores (key, value) pairs
– The key is like a filename
– The value can be file contents
• Goal: Efficiently insert/lookup/delete (key, value) pairs
• Each peer stores a subset of (key, value) pairs in the system
• Core operation: Find node responsible for a key
– Map key to node
– Efficiently route insert/lookup/delete request to this node
Structured Overlays
• Properties
– Topology is tightly controlled
• Well-defined rules determine to which other nodes a node connects
– Files placed at precisely specified locations
• Hash function maps file names to nodes
– Scalable routing based on file attributes
• In these systems:
– files are associated with a key (produced, e.g., by hashing the file name) and
– each node in the system is responsible for storing a certain range of keys
Document Routing
• The core of these DHT systems is the routing algorithm
• The DHT nodes form an overlay network with each node having several other nodes as neighbors
• When a lookup(key) is issued, the lookup is routed through the overlay network to the node responsible for that key
• The scalability of these DHT algorithms is tied directly to the efficiency of their routing algorithms
Document Routing Algorithms
• They take, as input, a key and, in response, route a message to the node responsible for that key
– The keys are strings of digits of some length
– Nodes have identifiers, taken from the same space as the keys (i.e., same number of digits)
• Each node maintains a routing table consisting of a small subset of nodes in the system
• When a node receives a query for a key for which it is not responsible, the node routes the query to the neighbour node that makes the most “progress” towards resolving the query
– The notion of progress differs from algorithm to algorithm, but in general is defined in terms of some distance between the identifier of the current node and the identifier of the queried key
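The forwarding rule described above can be sketched generically. This example assumes integer identifiers on a ring and takes the clockwise distance to the key as the measure of progress. The names ring_distance and route, the neighbour table, and the stopping rule are illustrative assumptions and do not correspond to any specific DHT.

```python
def ring_distance(a, b, size):
    """Clockwise distance from identifier a to identifier b."""
    return (b - a) % size


def route(start, key, neighbours, size):
    """Greedy routing: forward to the neighbour that is closest to the key."""
    path = [start]
    current = start
    while True:
        best = min(neighbours[current], key=lambda n: ring_distance(n, key, size))
        if ring_distance(best, key, size) >= ring_distance(current, key, size):
            return path  # no neighbour makes progress, so stop here
        current = best
        path.append(current)


# Hypothetical overlay: four nodes on a ring of 16 identifiers,
# each knowing the next node and the node halfway around.
neighbours = {0: [4, 8], 4: [8, 12], 8: [12, 0], 12: [0, 4]}
print(route(0, 13, neighbours, 16))
```

Because the distance to the key strictly decreases on every hop, the loop always terminates. Real systems differ mainly in how the distance and the neighbour sets are defined.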
Content-Addressable Network (CAN)
• A typical document routing method
• Virtual Cartesian coordinate space is used
• Entire space is partitioned amongst all the nodes
– every node “owns” a zone in the overall space
• Abstraction
– can store data at “points” in the space
– can route from one “point” to another
• Point = node that owns the enclosing zone
CAN Example: Two Dimensional Space
• Space divided between nodes
• All nodes cover the entire space
• Each node covers either a square or a rectangular area of ratios 1:2 or 2:1
• Example:
– Node n1:(1, 2), the first node that joins, covers the entire space
[Figure: two-dimensional grid with both axes running from 0 to 7; n1 owns the whole space.]
CAN Example: Two Dimensional Space
• Node n2:(4, 2) joins; the space is divided between n1 and n2
[Figure: grid with the zones of n1 and n2.]
CAN Example: Two Dimensional Space
• Node n3:(3, 5) joins; the space is divided between n1 and n3
[Figure: grid with the zones of n1, n2 and n3.]
CAN Example: Two Dimensional Space
• Nodes n4:(5, 5) and n5:(6, 6) join
[Figure: grid with the zones of n1, n2, n3, n4 and n5.]
CAN Example: Two Dimensional Space
• Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
• Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
[Figure: grid with the zones of n1 to n5 and the items f1 to f4 placed at their coordinates.]
CAN Example: Two Dimensional Space
• Each item is stored by the node that owns its mapping in the space
[Figure: grid with the zones of n1 to n5 and the items f1 to f4.]
CAN: Query Example
• Each node knows its neighbours in the d-space
• Forward query to the neighbour that is closest to the query id
• Example: assume Node n1 queries File Item f4
[Figure: grid with the zones of n1 to n5 and the items f1 to f4; the query travels from n1 towards f4.]
Document Routing – CAN
• Associate to each node and item a unique id (nodeId and fileId) in a d-dimensional space
• Goals
– Scales to hundreds of thousands of nodes
– Handles rapid arrival and failure of nodes
• Properties
– Routing table size O(d)
– Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
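To make the trade-off between the two properties concrete, the bound can be evaluated for a few dimensions. This is only a sketch: can_max_hops and can_table_size are hypothetical helper names, and the table size assumes two neighbours per dimension, which is consistent with the O(d) figure above.

```python
def can_max_hops(d, n):
    """Upper bound on lookup steps quoted above: d * n^(1/d)."""
    return d * n ** (1.0 / d)


def can_table_size(d):
    """Neighbours per node, assuming two per dimension (an O(d) table)."""
    return 2 * d


n = 4096  # illustrative network size
for d in (2, 3, 4):
    print("d=%d: table size %d, at most about %.0f steps"
          % (d, can_table_size(d), can_max_hops(d, n)))
```

For a fixed number of nodes, adding dimensions shortens the worst-case path at the cost of a slightly larger routing table.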
Resource Discovery
Associative array
• An associative array is an abstract data type
– It is composed of a collection of (key, value) pairs
• A data structure is a way of storing and organizing data in a computer so that it can be used efficiently
• The dictionary problem is the task of designing a data structure that implements an associative array
– A solution: the hash table
• Binding: the association between a key and a value
Hashing
• Any algorithm that maps data of a variable length to data of a fixed length
• It is used to generate fixed-length output data
– It is a shortened reference to the original data
• In 1953, H. P. Luhn of IBM used the concept of hashing, as later documented by D. Knuth
• In 1968, R. Morris used the term “hashing” in formal terminology
– Before that, it was used only as technical jargon
Distributed Hash Tables
• Key identifies data uniquely
• DHT balances keys and data across nodes
• DHT replicates, caches, routes lookups, etc.
[Figure: distributed applications call Insert(key, data) and Lookup(key), which returns data, on the distributed hash table layer; the layer spans many nodes.]
DHT Applications
• Many services can be built on top of a DHT interface
– File sharing
– Archival storage
– Databases
– Chat service
– Rendezvous-based communication
– Publish/Subscribe systems
Chord
• It is a protocol and algorithm
• It was introduced in 2001 by I. Stoica, R. Morris, D. Karger, F. Kaashoek, & H. Balakrishnan
• It is based on the SHA-1 hashing algorithm
– Secure Hash Standard
• It uses m-bit identifiers (IDs)
– for the hashed IP addresses of the computers (nodes)
– for the hashed data (keys)
• A logical ring with positions numbered 0 to 2^m - 1 is formed among nodes
P2P Ring
• Nodes are arranged in a ring based on ID
• Identifiers are arranged on an identifier circle modulo 2^m => Chord ring
• IDs are assigned randomly
• Very large ID space
– m is large enough to make collisions improbable
Construction of the Chord ring
• Data items (keys) also have IDs
• Every node is responsible for a subset of the keys
• A key is assigned to the first node whose ID (ID_node) is equal to or follows ID_key on the identifier circle
• This node is called the successor of the key; it is the first node clockwise from ID_key
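The assignment rule can be sketched in a few lines. The function name successor and the use of a sorted list with binary search are assumptions made for illustration. The node IDs and the key K60 are borrowed from the lookup figure that appears elsewhere in these slides.

```python
import bisect


def successor(node_ids, key_id, m):
    """First node clockwise from key_id on a ring of 2**m identifiers."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id % (2 ** m))
    return ring[i % len(ring)]  # wrap around past the largest ID


nodes = [10, 32, 55, 90, 123]
print(successor(nodes, 60, 7))   # node responsible for K60
print(successor(nodes, 124, 7))  # wraps around to the smallest ID
```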
Chord structure
• Every node is responsible for a subset of the data
• Routing algorithm locates data, with small per-node routing state
• Volunteer nodes join and leave system at any time
• All nodes have identical responsibilities
• All communication is symmetric
[Figure: nodes with hexadecimal IDs 65a1fc, d13da3, d4213f, d462ba, d467c4 and d471f1 handling the message Route(d46a1c).]
Lookup with global knowledge
• Every node knows of every other node: requires global information
• Routing tables are large: O(n)
• Lookups are fast: O(1)
[Figure: ring with nodes N10, N32, N55, N90 and N123. Hash(“LetItBe”) = K60; the question “Where is LetItBe?” is answered with “N90 has K60”.]
Successor and predecessor
• Each node has a successor and a predecessor
• The successor of a given node is the node whose ID is equal to or follows the identifier of the given node
• If there are n nodes and k keys, then each node is responsible for about k/n keys
• Basic case: each node knows only the location of its successor
• Increasing the robustness: using more than one successor
– Each node knows r immediate successors
– After failure, will know first live successor
• Predecessor has less importance
Lookup with local knowledge
// ask node n to find the successor of id
n.find_successor(id)
  if (id ∈ (n, successor])
    return successor;
  else
    // forward the query around the circle
    return successor.find_successor(id);
• Disadvantage
– Number of messages linear in the number of nodes: O(n)
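A runnable version of this successor-only lookup is sketched below. It is an illustrative assumption rather than the Chord library's code: the names Node, build_ring and in_half_open are hypothetical, and the recursion is written as a loop so that the hops can be counted.

```python
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self  # a lone node is its own successor


def build_ring(ids):
    """Link nodes into a ring ordered by ID (a stand-in for a running overlay)."""
    nodes = [Node(i) for i in sorted(ids)]
    for i, node in enumerate(nodes):
        node.successor = nodes[(i + 1) % len(nodes)]
    return nodes


def in_half_open(x, a, b, size):
    """True if x lies in the ring interval (a, b]."""
    if a == b:
        return True  # single-node ring: the interval is the whole circle
    return 0 < (x - a) % size <= (b - a) % size


def find_successor(n, key_id, size):
    """Follow successor pointers until the key falls in (n, successor]."""
    hops = 0
    while not in_half_open(key_id, n.id, n.successor.id, size):
        n = n.successor  # forward the query around the circle
        hops += 1
    return n.successor, hops


ring = build_ring([10, 32, 55, 90, 123])
owner, hops = find_successor(ring[0], 60, 2 ** 7)
print(owner.id, hops)
```

In the worst case the query visits almost every node, which is the linear cost noted above.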
Modulo operator
• Modulo operation finds the remainder of division of one number by another
• x mod y ↔ x - (y * int(x / y))
• Examples:
– 10 mod 8 = 2
– 5 mod 5 = 0
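The formula can be checked directly. One caveat is worth showing as a hedged side note: with truncation toward zero, as int() does, the result for a negative x is negative, whereas ring arithmetic needs a value between 0 and 2^m - 1. Python's % operator already returns such a value for a positive modulus. The helper names mod_truncating and ring_add are hypothetical.

```python
def mod_truncating(x, y):
    """x mod y as written above: x - y * int(x / y)."""
    return x - y * int(x / y)


def ring_add(node_id, offset, m):
    """Add an offset on an identifier circle of size 2**m."""
    return (node_id + offset) % (2 ** m)


print(mod_truncating(10, 8), mod_truncating(5, 5))
print(mod_truncating(-1, 8), -1 % 8)  # the two conventions differ for negative x
print(ring_add(6, 4, 3))
```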
Lookup with routing (finger) table
• Additional routing information to accelerate lookups
• Each node contains a routing table with up to m entries => finger table
– m is the number of bits of the identifiers
– Every node knows m other nodes in the ring
• The i-th entry of a given node N contains the address of successor((N + 2^(i-1)) mod 2^m)
– Increase distance exponentially
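The finger-table rule can be turned into a small computation. The sketch assumes global knowledge of all node IDs, which a real node does not have, purely to show which entries the rule produces. The function names are hypothetical, and the input (m = 3, nodes 0, 1 and 3) is a small illustrative ring.

```python
import bisect


def successor(ring, ident):
    """First node at or after ident in a sorted list of node IDs, with wrap-around."""
    i = bisect.bisect_left(ring, ident)
    return ring[i % len(ring)]


def finger_table(n, node_ids, m):
    """Return (start, successor) pairs for i = 1..m of node n."""
    ring = sorted(node_ids)
    table = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % (2 ** m)
        table.append((start, successor(ring, start)))
    return table


for node in (0, 1, 3):
    print(node, finger_table(node, [0, 1, 3], 3))
```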
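Division of the distance by finger tables
• Finger i points to successor of N + 2^i
• In this case N = 80, i = 5
[Figure: ring with N80 and its finger targets 80 + 2^0, 80 + 2^1, 80 + 2^2, 80 + 2^3, 80 + 2^4, 80 + 2^5 and 80 + 2^6, which span 1/128, 1/64, 1/32, 1/16, 1/8, 1/4 and 1/2 of the ring; nodes N96, N112, N120 and N16 are shown.]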
Routing with finger tables
• Route via binary search
• Use fingers first
• Then successors not belonging to the fingers
• Cost is O(log n)
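The routing procedure can be sketched as a simulation. The code assumes that each hop forwards to the farthest finger that still precedes the key, and it builds all finger tables from global knowledge for brevity. The names build_fingers and lookup are hypothetical. The node IDs are those of the Lookup(K19) figure elsewhere in these slides, and m = 7 is an assumed identifier length.

```python
import bisect


def in_open(x, a, b, size):
    """True if x lies in the ring interval (a, b)."""
    if a == b:
        return x != a
    return 0 < (x - a) % size < (b - a) % size


def in_half_open(x, a, b, size):
    """True if x lies in the ring interval (a, b]."""
    if a == b:
        return True
    return 0 < (x - a) % size <= (b - a) % size


def build_fingers(node_ids, m):
    """fingers[n][i] = successor((n + 2**i) mod 2**m), so fingers[n][0] is n's successor."""
    ring = sorted(node_ids)
    size = 2 ** m

    def succ(ident):
        return ring[bisect.bisect_left(ring, ident % size) % len(ring)]

    return {n: [succ(n + 2 ** i) for i in range(m)] for n in ring}


def lookup(start, key_id, fingers, m):
    """Return (node responsible for key_id, number of forwarding hops)."""
    size = 2 ** m
    n, hops = start, 0
    while not in_half_open(key_id, n, fingers[n][0], size):
        # use the farthest finger that still precedes the key
        n = next(f for f in reversed(fingers[n]) if in_open(f, n, key_id, size))
        hops += 1
    return fingers[n][0], hops


fingers = build_fingers([5, 10, 20, 32, 60, 80, 99, 110], 7)
print(lookup(32, 19, fingers, 7))
```

Each hop moves the query much closer to the key than a single successor step would, which is where the logarithmic cost comes from.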
Scalable node localization
Finger table:
finger[i] = successor((N + 2^(i-1)) mod 2^m), N = 0, …, n
Scalable node localization
Important characteristics of this scheme:
• Each node stores information about only a small number of nodes (m)
• Each node knows more about nodes closely following it than about nodes farther away
• A finger table generally does not contain enough information to directly determine the successor of an arbitrary key k
Another example for finger tables
m = 3, start = (N + 2^(i-1)) mod 2^m
int. = interval in which node N searches for the successor
[Figure: identifier circle 0 to 7 with nodes 0, 1 and 3.]
Node 0 (keys: 6), finger table:
start   int.    succ.
1       [1,2)   1
2       [2,4)   3
4       [4,0)   0
Node 1 (keys: 1), finger table:
start   int.    succ.
2       [2,3)   3
3       [3,5)   3
5       [5,1)   0
Node 3 (keys: 2), finger table:
start   int.    succ.
4       [4,5)   0
5       [5,7)   0
7       [7,3)   0
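Node joins with finger tables
int. = interval in which node N searches for the successor
[Figure: identifier circle 0 to 7 with nodes 0, 1 and 3; node 6 joins. Finger entries that pointed past position 6 now point to node 6, and key 6 moves to node 6.]
Node 0 (keys: none), finger table:
start   int.    succ.
1       [1,2)   1
2       [2,4)   3
4       [4,0)   6
Node 1 (keys: 1), finger table:
start   int.    succ.
2       [2,3)   3
3       [3,5)   3
5       [5,1)   6
Node 3 (keys: 2), finger table:
start   int.    succ.
4       [4,5)   6
5       [5,7)   6
7       [7,3)   0
Node 6 (keys: 6), finger table:
start   int.    succ.
7       [7,0)   0
0       [0,2)   0
2       [2,6)   3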
Node departures with finger tables
[Figure: identifier circle 0 to 7 with the updated finger tables (start, int., succ.) and keys of the remaining nodes after a node departure.]
Joining the Chord ring
Three-step process:
• Initialize all fingers of new node
• Update fingers of existing nodes
• Transfer keys from successor to new node
Joining the Chord ring – step 1
• Initialize the new node finger table
– Locate any node N in the ring
– Ask node N to lookup fingers of new node N36
– Return results to new node
[Figure: ring with nodes N5, N20, N40, N60, N80 and N99; new node N36 issues 1. Lookup(37,38,40,…,100,164).]
Joining the Chord ring – step 2
• Updating fingers of existing nodes
– new node calls update function on existing nodes
– existing nodes can recursively update fingers of other nodes
[Figure: ring with nodes N5, N20, N36, N40, N60, N80 and N99.]
Joining the Chord ring – step 3
• Transfer keys from successor node to new node
– only keys in the range are transferred
[Figure: ring with nodes N5, N20, N36, N40, N60, N80 and N99; “Copy keys 21..36 from N40 to N36”: K30 moves to N36, K38 stays at N40.]
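The key-transfer step can be sketched as a filter over the successor's keys. The function keys_to_transfer is a hypothetical helper. The ring size 2^8 is an assumption, chosen because the finger targets in step 1 go up to 164.

```python
def in_half_open(x, a, b, size):
    """True if x lies in the ring interval (a, b]."""
    if a == b:
        return True
    return 0 < (x - a) % size <= (b - a) % size


def keys_to_transfer(successor_keys, predecessor_id, new_id, size):
    """Keys held by the successor that now belong to the joining node."""
    return sorted(k for k in successor_keys
                  if in_half_open(k, predecessor_id, new_id, size))


# N36 joins between N20 and N40; N40 currently holds K30 and K38.
print(keys_to_transfer({30, 38}, 20, 36, 2 ** 8))
```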
Joining the Chord ring
• To ensure correct lookups, all successor pointers must be up to date
• => stabilization protocol running periodically in the background
• Updates finger tables and successor pointers
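A periodic stabilization round can be sketched as follows. This is a simplified illustration under stated assumptions, not the Chord library's implementation: finger tables are left out, the join walks successor pointers linearly, and the names Node, join, stabilize and notify are chosen for this example.

```python
class Node:
    def __init__(self, node_id, size=128):
        self.id = node_id
        self.size = size
        self.successor = self    # a lone node is its own successor
        self.predecessor = None

    def _strictly_between(self, x, a, b):
        """True if x lies in the open ring interval (a, b)."""
        if a == b:
            return x != a
        return 0 < (x - a) % self.size < (b - a) % self.size

    def join(self, known):
        """Set only the successor pointer; stabilization repairs the rest."""
        n = known
        while n.successor is not n and not (
            0 < (self.id - n.id) % self.size <= (n.successor.id - n.id) % self.size
        ):
            n = n.successor
        self.successor = n.successor

    def stabilize(self):
        """Adopt a closer successor if one appeared, then announce ourselves."""
        x = self.successor.predecessor
        if x is not None and self._strictly_between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """Accept the candidate as predecessor if it is closer than the current one."""
        if self.predecessor is None or self._strictly_between(
            candidate.id, self.predecessor.id, self.id
        ):
            self.predecessor = candidate


a, b, c = Node(10), Node(40), Node(20)
b.join(a)
c.join(a)
for _ in range(3):  # stands in for the periodic background timer
    for node in (a, b, c):
        node.stabilize()
print(a.successor.id, c.successor.id, b.successor.id)
```

Both new nodes initially point at the same successor. A few rounds are enough for the pointers to converge to the ring order 10, 20, 40.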
Lookup Mechanism
• Lookups take O(log n) hops
– n is the total number of nodes
• Lookup: route to closest predecessor
[Figure: ring with nodes N5, N10, N20, N32, N60, N80, N99 and N110; Lookup(K19) is routed towards key K19.]
Measured lookup procedure
• Cost of lookup is O(log n)
[Plot: average messages per lookup versus number of nodes.]
Handling failures: redundancy
• Each node knows IP addresses of next r nodes
• Each key is replicated at next r nodes
[Figure: ring with nodes N5, N10, N20, N32, N40, N60, N80, N99 and N110; three copies of K19 are shown.]
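The replication rule can be sketched as below. The exact counting is an assumption: this version places a key on its successor plus the r nodes that follow it, which is one possible reading of “next r nodes”. The helper names replica_nodes and first_live are hypothetical.

```python
import bisect


def replica_nodes(node_ids, key_id, r):
    """The key's successor followed by the next r nodes on the ring."""
    ring = sorted(node_ids)
    first = bisect.bisect_left(ring, key_id) % len(ring)
    count = min(r + 1, len(ring))
    return [ring[(first + i) % len(ring)] for i in range(count)]


def first_live(replicas, failed):
    """First replica holder that has not failed, or None if all have failed."""
    for node in replicas:
        if node not in failed:
            return node
    return None


ring = [5, 10, 20, 32, 40, 60, 80, 99, 110]
holders = replica_nodes(ring, 19, r=2)
print(holders, first_live(holders, failed={20, 32}))
```

With this placement a lookup still succeeds as long as at least one of the replica holders is alive.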
Lookups find replicas
[Figure: ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99 and N110; Lookup(K19) proceeds in steps 1 to 4 until it reaches a replica of K19.]
Chord software
• 3000 lines of C++ code
• Library to be linked with the application
• Provides a lookup(key) function: yields the IP address of the node responsible for the key
• Notifies the node of changes in the set of keys the node is responsible for