Structured P2P Networks
Guo Shuqiao, Yao Zhen, Rakesh Kumar Gupta
CS6203 Advanced Topics in Database Systems
Introduction-P2P Network
A peer-to-peer (P2P) network is a distributed system in which peers employ distributed resources to perform a critical function in a decentralized fashion [LW2004]
Classification of P2P networks
  Unstructured and Structured
  Centralized and Decentralized
  Hierarchical and Non-Hierarchical
Structured P2P network
Distributed hash table (DHT)
  A DHT is a structured overlay that offers extreme scalability and a hash-table-like lookup interface
  Examples: CAN, Chord, Pastry
Other techniques
  Skip list based: Skip graph, SkipNet
Outline
Hash-based techniques in P2P
  Hash-based structured P2P systems: Pastry, P-Grid
  Two important issues: load balancing, neighbor table consistency preserving
  Comparison of DHT techniques
Skip-list based system: SkipNet
Conclusion
Pastry [RD2001]
Pastry is a hash-based P2P object location and routing scheme
Properties
  Completely decentralized
  Scalable
  Self-organized
  Fault-resilient
  Efficient search
Design of Pastry
nodeID: each node has a unique 128-bit numeric identifier
  Assigned randomly, so nodes with adjacent nodeIDs are diverse in geography, ownership, etc.
  Assumption: nodeIDs are uniformly distributed in the ID space
  Presented as a sequence of digits with base 2^b
  b is a configuration parameter (typically 4)
Design of Pastry (cont’)
A message/query has a numeric key of the same length as a nodeID
  The key is presented as a sequence of digits with base 2^b
Route: a message is routed to the node with a nodeID that is numerically closest to the key
[Figure: destination of routing. A message with key 10 is routed to node 12, the node whose nodeID is numerically closest to the key among nodes 03, 12, 20, 23 and 31.]
Pastry Schema
Given a message of key k, a node A forwards the message to a node whose ID is numerically closest to k among all nodes known to A
Each node maintains some routing state
Pastry Node State
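Each node maintains a leaf set L, a routing table and a neighborhood set M
Example: state of the node with nodeID 10233102 (b = 2, so IDs are 8 digits in base 4)

Leaf set
  SMALLER: 10233033  10233021  10233001  10233000
  LARGER:  10233120  10233122  10233230  10233232

Routing table (the digit between hyphens selects the column; a digit in parentheses is the node's own digit in that row)
  Row 0: -0-2212102  (1)         -2-2301203  -3-1203203
  Row 1: (0)         1-1-301233  1-2-230203  1-3-021022
  Row 2: 10-0-31203  10-1-32102  (2)         10-3-23302
  Row 3: 102-0-0230  102-1-1302  102-2-2302  (3)
  Row 4: 1023-0-322  1023-1-000  1023-2-120  (3)
  Row 5: 10233-0-01  (1)         10233-2-32  empty
  Row 6: (0)         empty       102331-2-0  empty
  Row 7: empty       empty       (2)         empty

Neighborhood set
  13021022  10200230  11301233  31301233
  02212102  22301203  31203203  33213321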
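Meanings of 'Close'
  Closest according to the proximity metric (real distance): the nearest neighbor
  Closest according to numerical value: the node with the closest nodeID
[Figure: the nodes 03, 12, 20, 23 and 31 again, with nodes 31 and 23 highlighted to contrast the two notions.]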
Pastry Node State
A leaf set: the |L| nodes with the closest nodeIDs
  |L|/2 larger ones and |L|/2 smaller ones
  Useful in message routing
A neighborhood set: the |M| nearest neighbors
  Useful in maintaining locality properties
Leaf Set and Neighborhood Set
In this example
  b = 2, l = 8
  |L| = 2 × 2^b = 8
  |M| = 2 × 2^b = 8
Routing Table
l rows and 2^b columns
  i-th row: entries share an i-digit prefix with the nodeID
  j-th column: the next digit after the prefix is j
  b = 2, l = 8 -> 8 rows and 4 columns
E.g. row 2 of the routing table of node A (nodeID 10233102): 10-0-31203 (j = 0), 10-1-32102 (j = 1), 10-3-23302 (j = 3)
Routing
Step 1: If k falls within the range of nodeIDs covered by A's leaf set, forward the message to the node in the leaf set whose nodeID is closest to k
  E.g. k = 10233022 falls in the range (10233000, 10233232), so forward it to node 10233021
  If k is not covered by the leaf set, go to step 2
Routing
Step 2: Use the routing table and forward the message to a node whose ID shares a longer prefix with k than A's nodeID does
  E.g. k = 10223220: forward it to node 10222302 (entry 102-2-2302)
  If the appropriate entry in the routing table is empty, go to step 3
Step 3: Forward the message to a node in the leaf set whose ID shares a prefix with k at least as long as A's does, but is numerically closer to k than A
  E.g. k = 10233320: forward it to node 10233232
  If such a node does not exist, A is the destination node
Routing
The routing procedure always converges, since each step chooses a node that either
  shares a longer prefix with the key, or
  shares an equally long prefix but is numerically closer to the key
Routing performance
  The expected number of routing steps is log_{2^b} N
  Assumption: accurate routing tables and no recent node failures
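The three routing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: nodeIDs are base-4 digit strings (b = 2), the routing table is a hypothetical dictionary keyed by (row, digit), and only a few entries of the example node's table are filled in. The names `next_hop`, `LEAF_SET` and `ROUTING_TABLE` are not from Pastry itself.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(node_id, key, leaf_set, routing_table, base=4):
    """Return the nodeID to forward to (node_id itself if it is the destination)."""
    def dist(n):
        return abs(int(n, base) - int(key, base))

    # Step 1: the key lies within the range covered by the leaf set.
    if leaf_set:
        values = [int(n, base) for n in leaf_set]
        if min(values) <= int(key, base) <= max(values):
            return min(leaf_set + [node_id], key=dist)

    # Step 2: use the routing table entry that extends the shared prefix.
    p = shared_prefix_len(node_id, key)
    if p == len(key):
        return node_id
    entry = routing_table.get((p, key[p]))
    if entry is not None:
        return entry

    # Step 3: a leaf-set node with an equally long prefix that is numerically closer.
    closer = [n for n in leaf_set
              if shared_prefix_len(n, key) >= p and dist(n) < dist(node_id)]
    return min(closer, key=dist) if closer else node_id


# State of the example node 10233102 (routing table truncated for brevity).
NODE = "10233102"
LEAF_SET = ["10233033", "10233021", "10233001", "10233000",
            "10233120", "10233122", "10233230", "10233232"]
ROUTING_TABLE = {(0, "0"): "02212102", (2, "0"): "10031203",
                 (3, "2"): "10222302", (5, "2"): "10233232"}

for k in ("10233022", "10223220", "10233320"):
    print(k, "->", next_hop(NODE, k, LEAF_SET, ROUTING_TABLE))
```

The three keys are the ones used in the step-by-step examples, so the sketch can be checked against the slides.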
Performance
[Figure: average number of routing hops versus number of Pastry nodes; b = 4, |L| = 16, |M| = 32 and 200,000 lookups.]
Discussion of Pastry
Pastry: the parameters make it flexible
b is the most important parameter; it determines the power of the system
  Trade-off between routing efficiency (log_{2^b} N hops) and routing table size (log_{2^b} N × 2^b entries)
Each node can choose its own |L| and |M| based on its own situation
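To make the trade-off concrete, the two expressions can be evaluated for a few values of b. A minimal sketch using the formulas as stated on the slide; the function names are illustrative only.

```python
import math


def expected_hops(n_nodes: int, b: int) -> float:
    """Expected routing steps, log_{2^b} N."""
    return math.log(n_nodes, 2 ** b)


def routing_table_entries(n_nodes: int, b: int) -> float:
    """Approximate routing table size, log_{2^b} N x 2^b."""
    return expected_hops(n_nodes, b) * 2 ** b


for b in (1, 2, 4):
    hops = expected_hops(10 ** 6, b)
    size = routing_table_entries(10 ** 6, b)
    print(f"b={b}: about {hops:.1f} hops, about {size:.0f} table entries")
```

A larger b shortens routes but makes each routing table larger.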
Discussion of Pastry: routing scheme
Is the chosen next hop locally optimal?
E.g. k = 10233200 at node A (nodeID 10233102)
  In this example A's leaf set holds 10233133 and 10233132 in place of 10233232 and 10233230; the rest of A's state is unchanged
  X's nodeID = 10233232: Dis(k, X's ID) = Dis(10233200, 10233232) = 32 (base 4)
  Y's nodeID = 10233133: Dis(k, Y's ID) = Dis(10233200, 10233133) = 1
  The locally optimal node is Y
  Pastry forwards to node X
P-Grid [Aberer2001]
P-Grid is a scalable access structure for P2P
  Hash-based & virtual binary search tree
  Randomized algorithms are used for constructing the access structure
[Figure: virtual binary tree with leaves 00, 01, 10 and 11; peers 1 to 6 are placed at the leaves, each with its routing references, and an example query k = 100 is shown.]
P-Grid (cont’)
Properties
  Completely decentralized
  Scalable with the total number of nodes and data items
  Fault-resilient: search is robust against failures of nodes
  Efficient search
Discussion of Pastry and P-Grid
Both systems make a uniformity assumption
  Pastry: the ID space
  P-Grid: the data distribution and the behavior of peers
If the data/message/query distribution is skewed, Pastry and P-Grid are not able to balance the load
Load Balancing
Consider a DHT P2P system with N nodes
  Θ(log N) imbalance factor if item IDs are uniformly distributed [SMKKB2001]
  Even worse if applications associate semantics with the item IDs: the IDs would no longer be uniformly distributed
How to minimize the load imbalance?
How to minimize the amount of load moved?
Load Balancing
Challenges
  Data items are continuously inserted/deleted
  Nodes join and depart continuously
  The distribution of data item IDs and item sizes can be skewed
Solution: [GLSKS2004]
Load Balancing
Virtual server
  Represents a peer in the DHT, rather than a physical node
  A physical node hosts one or more virtual servers
  Total load of its virtual servers = load of the node
[Figure: e.g., in Chord, a ring with identifiers 0 to 7 in which virtual servers (labelled FT1, FT3) are hosted by physical nodes.]
Load Balancing
Basic idea
Directories
  Store load information of the peer nodes
  Periodically schedule reassignments of virtual servers
The distributed load balancing problem is reduced to a centralized problem at each directory
Load Balancing
Load balancing algorithm
Node
  Randomly chooses a directory (directory IDs are known to all nodes)
  Sends to the directory: (1) the loads of all virtual servers that it is responsible for, (2) its capacity
  Waits for a delay of T time, then reports again in the new cycle
  If its utilization exceeds Ke before then: emergency load balancing
Directory
  Receives information from nodes
  Computes a schedule of virtual server transfers among the nodes contacting it, in order to reduce their maximal utilization
Load Balancing
Load balancing algorithm (cont.)
Computing the optimal reassignment is NP-complete
Greedy algorithm, O(m log m)
  For each heavily loaded node, move its least loaded virtual server to a pool
  For each virtual server in the pool, from heaviest to lightest, assign it to the node n that minimizes the resulting load
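The greedy heuristic above can be sketched as follows. This is an illustration, not the algorithm of [GLSKS2004] verbatim: it assumes that "heavily loaded" means utilization above a hypothetical `target`, that a heavy node keeps shedding its lightest virtual server until it is below the target, and that "resulting load" is measured as utilization (load / capacity).

```python
def utilization(node):
    """Total load of the virtual servers on a node divided by its capacity."""
    return sum(node["servers"].values()) / node["capacity"]


def greedy_reassign(nodes, target=1.0):
    """nodes: name -> {"capacity": float, "servers": {virtual_server: load}}.

    Mutates `nodes` and returns a dict virtual_server -> new host.
    """
    pool = []
    # Phase 1: heavy nodes move their least loaded virtual servers to the pool.
    for node in nodes.values():
        while node["servers"] and utilization(node) > target:
            vs = min(node["servers"], key=node["servers"].get)
            pool.append((vs, node["servers"].pop(vs)))

    # Phase 2: from heaviest to lightest, place each pooled virtual server
    # on the node whose resulting utilization is smallest.
    moves = {}
    for vs, load in sorted(pool, key=lambda item: item[1], reverse=True):
        best = min(
            nodes,
            key=lambda n: (sum(nodes[n]["servers"].values()) + load)
            / nodes[n]["capacity"],
        )
        nodes[best]["servers"][vs] = load
        moves[vs] = best
    return moves


nodes = {
    "a": {"capacity": 10.0, "servers": {"a1": 6.0, "a2": 5.0}},
    "b": {"capacity": 10.0, "servers": {"b1": 3.0}},
}
moves = greedy_reassign(nodes)
print(moves, {name: utilization(n) for name, n in nodes.items()})
```

In the example, node a starts at utilization 1.1, sheds its lightest virtual server a2, and the directory places a2 on node b.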
Load Balancing
Performance
Trade-off: load movement vs. load balancing
  Load balancing is measured by the maximum node utilization
  When T decreases, the maximum node utilization decreases and the load movement increases
Effective in achieving load balancing for
  system utilization as high as 90%
  while transferring only 8% of the load that arrives in the system
Emergency load balancing is necessary
Consistency Preserving
Neighbor table
  A table of neighbor pointers
  Used for efficient routing in a P2P system
Challenge
  How to maintain consistent neighbor tables in a dynamic network where nodes may join, leave and fail concurrently and frequently?
Consistency Preserving
Consistent network
  For every entry in the neighbor tables, if there exists at least one qualified node in the network, then the entry stores at least one qualified node
  Otherwise, the entry is empty
Qualified node for an entry of a node's neighbor table: a node whose ID has the suffix required by that entry
Consistency Preserving
K-consistent network
  For every entry in the neighbor tables, if there exist H qualified nodes in the network, then the entry stores at least min(K, H) qualified nodes
  Otherwise, the entry is empty
For K > 0, K-consistency => consistency
1-consistency = consistency
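The definition can be turned into a small checker. The sketch below rests on an assumption the slides do not spell out: the entry at (level i, digit j) of node x requires the suffix formed by digit j followed by the rightmost i digits of x. The table layout and all function names are hypothetical.

```python
from itertools import product


def qualified_nodes(ids, x, level, digit):
    """IDs whose suffix is `digit` followed by the rightmost `level` digits of x."""
    suffix = digit + x[len(x) - level:]
    return {y for y in ids if y.endswith(suffix)}


def is_k_consistent(tables, k, alphabet="01"):
    """tables: node ID -> {(level, digit): list of neighbor IDs}."""
    ids = set(tables)
    for x, table in tables.items():
        for level, digit in product(range(len(x)), alphabet):
            qualified = qualified_nodes(ids, x, level, digit)
            stored = set(table.get((level, digit), ()))
            if not stored <= qualified:          # only qualified nodes may be stored
                return False
            if len(stored) < min(k, len(qualified)):
                return False                     # fewer than min(K, H) neighbors
    return True


def build_tables(ids, k, alphabet="01"):
    """Fill every entry with up to k qualified nodes (no proximity ordering)."""
    tables = {}
    for x in ids:
        tables[x] = {}
        for level, digit in product(range(len(x)), alphabet):
            q = sorted(qualified_nodes(ids, x, level, digit))
            if q:
                tables[x][(level, digit)] = q[:k]
    return tables


ids = ["000", "010", "011", "101", "111"]
print(is_k_consistent(build_tables(ids, 2), 2))
print(is_k_consistent(build_tables(ids, 1), 2))
```

Tables built with two neighbors per entry pass the check for K = 2 and K = 1, while tables built with a single neighbor per entry fail it for K = 2.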
Consistency Preserving
General strategy
  Identify a consistent subnet that is as large as possible
  Only replace a neighbor with a closer one if both of them belong to the subnet
  Expand the consistent subnet after new nodes join
  Maintain consistency of the subnet when nodes fail
Consistency Preserving
Approach of [LL2004b]
Design a join protocol such that
  an initially K-consistent network remains K-consistent after the join processes of a set of nodes terminate
  the termination of a join implies that the joined node belongs to the consistent subnet
Design a failure recovery protocol that
  recovers K-consistency of the subnet by repairing holes left by failed neighbors with qualified nodes in the subnet
  The protocol is presented in [LL2004a], but is integrated with join in the experiments of this paper
Consistency Preserving
Join protocol
Each node has a status: copying, waiting, notifying, cset_waiting or in_system
  S-node: a node in status in_system
  T-node: otherwise
All S-nodes form a consistent subnet
Consistency Preserving
Status transitions of a joining node x
copying: copy neighbor information from S-nodes to fill in most entries of its table, level by level
  -> waiting, when x cannot find a qualified S-node for a level i >= 1
waiting: try to find an S-node that shares at least the rightmost i-1 digits with x and stores x as a neighbor
  -> notifying, when such a node, say y, is found
notifying: seek and notify nodes that share the rightmost j digits with x, where j is the lowest level at which x is stored in y's table
  -> cset_waiting, when notifying is finished
cset_waiting: wait for the nodes that are joining concurrently and are likely to be in the same consistent subnet
  -> in_system, when x confirms that all of those nodes have exited the notifying status
Consistency Preserving
Performance
p-ratio
  In x's table, the primary neighbor of an entry is y, while the true primary neighbor should be z
  p-ratio = delay from x to y / delay from x to z
K-consistency is always maintained in all experiments
When K increases, the p-ratio decreases
  More neighbor information is stored => more messages
Even with massive joins and failures, the tables are still well optimized
Comparing DHTs [DGPR2003]
Each DHT algorithm has many details, making it difficult to compare.
We will use a component-based analysis approach
  Break the DHT design into independent components
  Analyze the impact of each component choice separately
Two types of components
  Routing-level: neighbor & route selection
  System-level: caching, replication, querying policy, latency
Metrics Used
Metrics used in comparison
  Flexibility: options in choosing neighbors and routes
  Resilience: does it still route when nodes go down?
  Load balancing: is the content distributed evenly?
  Proximity & latency: is the content stored nearby?
Aspects of a DHT
  Geometry: the structure that inspires a DHT design
  Distance function: the distance between two nodes
  Algorithm: the rules for selecting neighbors and routes using the distance function
Algorithm & Geometry
What are routing algorithm & geometry?
  Routing algorithm: the exact rules for selecting neighbors and routes (e.g. Chord, CAN, PRR, Tapestry, Pastry)
  Geometry: the algorithm's underlying structure, derived from the way in which neighbors and routes are chosen (e.g. Chord routes on a ring)
Why is geometry important?
  Geometry captures the flexibility in the selection of neighbors and routes.
  Neighbor selection: does the geometry allow neighbors to be chosen based on proximity? This leads to shorter paths.
  Route selection: the number of options for selecting the next hop. This leads to shorter, more reliable paths.
DHT Algorithms Analysis
The table summarizes the geometries & algorithms.

Geometry     Algorithm
Tree         PRR
Hypercube    CAN
Butterfly    Viceroy
Ring         Chord
XOR          Kademlia
Hybrid       Pastry

We will examine the flexibility metric in these two aspects
  Flexibility in neighbor selection
  Flexibility in route selection
Tree Geometry
PRR uses the tree geometry.
Distance between two nodes is the height of their smallest common subtree (at most log N in a well-balanced tree)
Neighbor selection flexibility: a node has 2^(i-1) options in choosing a neighbor at distance i.
No route selection flexibility
[Figure: binary tree with root, internal nodes 0 and 1, and leaves 00, 01, 10, 11; subtrees of height 1 and height 2 are marked.]
Hypercube Geometry
CAN uses a d-torus (a hypercube when d = log n).
Each node has log n neighbors; neighbors differ in exactly one bit.
Routing proceeds greedily by correcting bits in any order.
No flexibility in choosing neighbors.
Route selection flexibility: when the destination is at distance log n, the first hop has log n next-hop choices, the second hop has (log n) - 1 choices, and so on; hence (log n)! possible routes
[Figure: 3-dimensional hypercube with nodes 000 to 111.]
Butterfly Geometry
Viceroy uses the butterfly geometry.
Nodes are organized in a series of log n "stages", where all the nodes at stage i are capable of correcting the i-th bit.
Routing consists of 3 phases and is done in O(log N) hops
No flexibility in route selection or neighbor selection.
Ring Geometry
Chord uses the ring geometry
Each node maintains log n neighbors and routes to an arbitrary destination in O(log n) hops.
Flexibility in neighbor selection: a node has 2^(i-1) possible options to pick its i-th neighbor, giving approximately n^((log n)/2) possible routing tables for each node
Flexibility in route selection: (log n)! possible routes from a source to a destination at distance log n.
[Figure: ring with identifiers 0 to 7.]
Ring Geometry
[Figure: ring with identifiers 000 to 111.]
To route from 000 to 110, we have two routes.
  Route to 100 and then to 110.
  Route to 010 and then to 110.
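The two routes arise because the clockwise gap between source and destination can be covered by power-of-two jumps taken in any order. A minimal sketch, assuming each node has a neighbor at every power-of-two distance; `ring_routes` is an illustrative name, not part of Chord.

```python
from itertools import permutations


def ring_routes(src: int, dst: int, bits: int) -> list:
    """All routes from src to dst that use each required 2^i jump exactly once."""
    size = 1 << bits
    gap = (dst - src) % size
    # One jump per set bit of the clockwise gap.
    jumps = [1 << i for i in range(bits) if gap & (1 << i)]
    routes = []
    for order in permutations(sorted(jumps, reverse=True)):
        position, path = src, [src]
        for jump in order:
            position = (position + jump) % size
            path.append(position)
        routes.append(path)
    return routes


for route in ring_routes(0b000, 0b110, bits=3):
    print(" -> ".join(format(node, "03b") for node in route))
```

With k jumps needed there are k! such orderings, which is where the (log n)! figure for a destination at distance log n comes from.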
XOR
Kademlia uses the XOR geometry.
Distance between two nodes is the XOR of their identifiers, interpreted as a number.
A node has 2^(i-1) options in choosing a neighbor at distance i, giving approximately n^((log n)/2) possible routing tables per node.
Route flexibility: lower-order bits can be fixed before the higher-order bits if an optimal path is not available.
  This may result in longer paths, as the lower-order bits that were fixed need not be preserved by later routing steps.
Hybrid
Pastry is a hybrid: its nodes are regarded both as leaves of a binary tree and as points on a one-dimensional circle.
Distance between nodes is either the tree distance or the cyclic distance between them
A node has 2^(i-1) options in choosing a neighbor at distance i, giving approximately n^((log n)/2) possible routing tables per node.
Route selection freedom: hops on the ring are allowed, but such paths might not retain the O(log n) bound on route length.
Flexibility Overview

Property                                   Tree           Hypercube  Ring           Butterfly  XOR            Hybrid
Neighbor selection                         n^((log n)/2)  1          n^((log n)/2)  1          n^((log n)/2)  n^((log n)/2)
Route selection (optimal)                  1              c1(log n)  c1(log n)      1          1              1
Natural support for sequential neighbors?  no             no         yes            no         no             default: no, fallback: yes

Ring & Hypercube have twice the routing flexibility of the Hybrid & XOR geometries
Resilience
Two aspects of robust routing
  Static resilience measures how well the algorithm can route after nodes fail but before the recovery algorithms take effect.
  Dynamic recovery measures how quickly routing state is recovered after failures.
With 30% of the nodes failed
  Tree: 90% of routes failed (no route selection flexibility)
  Ring, Hypercube: 7% of routes failed (most route selection flexibility)
  Hybrid, XOR: 20% of routes failed (half the flexibility of Ring)
Route selection flexibility affects static resilience
[Figure: % failed paths (0 to 100) versus % failed nodes (0 to 90) for the Ring, Hybrid, XOR, Tree and Hypercube geometries.]
Path Latency
The goal is to minimise the end-to-end latency of overlay networks.
Two proximity methods are considered.
  Proximity Neighbor Selection (PNS): neighbors are chosen based on their proximity.
  Proximity Route Selection (PRS): routes are selected depending on the proximity of the neighbors.
PNS achieves an improvement over PRS, which in turn achieves an improvement over the plain version.
Geometry does not affect the performance of PNS / PRS. Thus it is important to choose a routing algorithm whose geometry accommodates PNS.
Local Convergence
Do messages sent from two nodes to the same destination converge at a node near the two sources?
Convergence leads to low latencies in the following:
  Overlay multicast
  Caching
  Server selection
Measured by the number of exit points in the network. In the best case, only one node sends a message off-domain.
Limitations & Findings
Limitations
  The authors have not considered all geometries
  Other factors and performance metrics are not considered
Findings
  Routing geometry is important.
  Flexibility improves resilience & proximity.
Why not the ring?
  Great flexibility in choosing neighbors and routes; both proximity methods, PNS & PRS, can be implemented.
  Highest performance in the resilience tests, and as good as the other geometries in path length and local convergence.
Skip List [PSL1990]
Skip lists are data structures that can be used in place of balanced trees. They use probabilistic balancing techniques, hence the algorithms are simpler and faster.
A skip list can be described as a sorted linked list in which some nodes are supplemented with pointers that skip over many list elements.
[Figure: skip list from HDR to NIL holding the keys 2, 5, 9, 16, 23, 25, 27, 29.]
Perfect Skip List
A perfect skip list is one where the height of the i-th node is the exponent of the largest power of two that divides i. Pointers at level h have length 2^h. A perfect skip list supports searches in O(log N).
Because insertions and deletions are expensive in a perfect skip list, a probabilistically balanced skip list is proposed, in which node heights are chosen by consulting a random number generator.
[Figure: perfect skip list from HDR to NIL over the keys 2, 5, 9, 16, 23, 25, 27, 29, annotated "Height is 2: (2^2)", "Height is 3: (2^3)" and "Level 2 pointer skips over 2^2 nodes".]
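The height rule for a perfect skip list is easy to compute directly. A small sketch, with 1-based node positions and a hypothetical helper name:

```python
def perfect_height(i: int) -> int:
    """Exponent of the largest power of two that divides i (i >= 1)."""
    # i & -i isolates the lowest set bit of i, i.e. that largest power of two.
    return (i & -i).bit_length() - 1


print([perfect_height(i) for i in range(1, 9)])
```

Every second node has height at least 1, every fourth node height at least 2, and so on, which is what makes the level h pointers span 2^h nodes.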
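Examples
Start with an empty list: HDR -> NIL
Add node 10 (height 1, chosen randomly): HDR -> 10 -> NIL
Add node 5 (height 0, chosen randomly): HDR -> 5 -> 10 -> NIL
Add node 8 (height 2, chosen randomly): HDR -> 5 -> 8 -> 10 -> NIL
Add node 12 (height 0, chosen randomly): HDR -> 5 -> 8 -> 10 -> 12 -> NIL
Add node 2 (height 0, chosen randomly): HDR -> 2 -> 5 -> 8 -> 10 -> 12 -> NIL
(Only the level 0 order is shown; higher-level pointers link the taller nodes.)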
Search Skip List
[Figure: skip list from HDR to NIL holding the keys 2, 5, 9, 16, 23, 25, 27, 29.]
• Search for Node 30. From HDR to Node 29. Then stop and search fails. (illustrated)
• Search for Node 23. From HDR to Node 16. Drop two levels, From Node 16 to Node 23. Found.
• Search for Node 27. From HDR to Node 16. Drop one level, From Node 16 to Node 25. Drop one level, from Node 25 to Node 27. Found.
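Insertion and search can be sketched as below. This is a minimal illustration rather than Pugh's original code: heights are 0-based as in the insertion examples, the caller may pass the height explicitly so that a run is reproducible, and the class and method names are made up for this sketch.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * (height + 1)  # next[h] is the successor at level h


class SkipList:
    def __init__(self, max_height=3):
        self.head = Node(None, max_height)  # HDR; None plays the role of NIL

    def _random_height(self, p=0.5):
        h = 0
        while h < len(self.head.next) - 1 and random.random() < p:
            h += 1
        return h

    def insert(self, key, height=None):
        if height is None:
            height = self._random_height()
        node = Node(key, height)
        cur = self.head
        # Walk down from the top level, splicing the node in at levels <= height.
        for h in range(len(self.head.next) - 1, -1, -1):
            while cur.next[h] is not None and cur.next[h].key < key:
                cur = cur.next[h]
            if h <= height:
                node.next[h] = cur.next[h]
                cur.next[h] = node

    def search(self, key):
        cur = self.head
        # Move right while the next key is smaller, otherwise drop one level.
        for h in range(len(self.head.next) - 1, -1, -1):
            while cur.next[h] is not None and cur.next[h].key < key:
                cur = cur.next[h]
        cur = cur.next[0]
        return cur is not None and cur.key == key

    def keys(self):
        out, cur = [], self.head.next[0]
        while cur is not None:
            out.append(cur.key)
            cur = cur.next[0]
        return out


sl = SkipList(max_height=3)
for key, height in [(10, 1), (5, 0), (8, 2), (12, 0), (2, 0)]:
    sl.insert(key, height)
print(sl.keys(), sl.search(8), sl.search(7))
```

The inserted keys and heights are those of the insertion examples.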
Skip List
Worst-case performance occurs when the list is significantly unbalanced.
Space efficient: can use as few as 1.33 pointers per element.
Maintains O(log N) searches with high probability.
Comparison with AVL trees, recursive 2-3 trees & self-adjusting trees
  Skip lists perform more comparisons than the other methods.
  Skip lists are slightly slower than AVL trees in searches, but insertions and deletions in a skip list are faster
  Skip lists are faster than self-adjusting trees when a uniform distribution is encountered, but slower for highly skewed distributions
SkipNet Introduction [SNL2003]
In DHTs, we cannot control where the data will be stored
  Data might be stored far away from its administrative domain, which makes it hard to administer privileges. Can we adapt?
  This gives rise to denial-of-service attacks and traffic analysis.
Solution: use SkipNet, a scalable overlay network that provides controlled data placement and guaranteed routing locality by organizing data by string names
  Content can be placed on a pre-defined node or distributed uniformly across the nodes of a hierarchical naming subtree.
Motivation
Disadvantages of Chord, CAN, Tapestry, Pastry:
No content locality
  Content locality: the ability to explicitly place data on specific overlay nodes or to distribute it across the nodes of a specified domain.
  Without it, data is exposed to traffic analysis & denial-of-service attacks
No path locality
  Path locality: a guarantee that the routing path between two overlay nodes in a domain does not leave the domain.
  Additional security: the traffic is not passed on to another domain, which could be a competitor.
SkipNet provides both content & path locality.
How does SkipNet do it?
Employs both a string name ID space and a numeric ID space.
  Node names and content identifier strings are mapped into the name ID space
  Hashes of the node names and content identifiers are mapped into the numeric ID space.
By arranging content in name ID order rather than dispersing it, we can achieve content & path locality.
Advantages of locality
Improved availability
  Data is stored within the organisation and can be searched even if the network is partitioned.
  Resilience against Internet failures: the nodes within a cluster gracefully survive failures that disconnect the cluster from the rest of the Internet (a useful property of SkipNet)
Performance
  Searches are faster, as data is stored near the nodes that use it.
Manageability
  Facilitates control and maintenance within an administrative domain
Security
  Can deal with traffic analysis & denial-of-service attacks.
SkipNet Structure
Adapts the skip list structure
  Traversals can start from any node
  State and processing costs should be the same for all nodes
  Uses rings that are doubly linked instead of a list
Other enhancements
  Each node stores 2 log N pointers rather than a highly variable number of pointers.
SkipNet
  Perfect: pointers at level h point to nodes that are exactly 2^h nodes to the left and right.
  Probabilistic: at each level h, a node probabilistically determines which ring it belongs to.
SkipNet Structure
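SkipNet nodes ordered by name ID: A, D, M, O, T, V, X, Z. Routing tables of nodes A and V shown (clockwise and counterclockwise pointer at each level).

Node A
Level  Clockwise  Counterclockwise
2      T          T
1      M          X
0      D          Z

Node V
Level  Clockwise  Counterclockwise
2      D          D
1      Z          O
0      X          T

[Figure: the eight nodes on a ring in name ID order, labelled with the numeric IDs 000 to 111.]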
SkipNet Structure
Level L = 3: Ring 000 {A}, Ring 001 {T}, Ring 010 {M}, Ring 011 {X}, Ring 100 {Z}, Ring 101 {O}, Ring 110 {D}, Ring 111 {V}
Level L = 2: Ring 00 {A, T}, Ring 01 {M, X}, Ring 10 {O, Z}, Ring 11 {D, V}
Level L = 1: Ring 0 {A, M, T, X}, Ring 1 {D, O, V, Z}
Level L = 0: Root ring {A, D, M, O, T, V, X, Z}
The full SkipNet routing infrastructure for an 8 node system, including the ring labels.
Routing By Name ID
Similar to search in skip lists
The message is routed along the highest-level pointer, in either the clockwise or the counterclockwise direction, that does not point past the destination name ID.
Routing terminates when the message arrives at the node whose name ID is closest to the destination.
Because nodes are doubly linked, the scheme routes along either left or right pointers, depending on the name IDs.
Number of hops is O(log N)
Example
Routing a message from node A to node V
Path:
  A (level 2, clockwise) -> T, since "T" < "V"
  T (level 2, clockwise): failed, the pointer to A goes past V
  T (level 1, clockwise): failed, the pointer to X goes past V
  T (level 0, clockwise) -> V (destination)

Routing tables used (level: clockwise, counterclockwise)
  Node A: 2: T, T; 1: M, X; 0: D, Z
  Node V: 2: D, D; 1: Z, O; 0: X, T
  Node T: 2: A, A; 1: X, M; 0: V, O
Routing Algorithm
SendMsg(nameID, msg) {
  if (LongestPrefix(nameID, localNode.nameID) == 0)
    msg.dir = RandomDirection();
  else if (nameID < localNode.nameID)
    msg.dir = counterClockwise;
  else
    msg.dir = clockwise;
  msg.nameID = nameID;
  RouteByNameID(msg);
}

// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
RouteByNameID(msg) {
  // Forward along the longest pointer
  // that is between us and msg.nameID.
  h = localNode.maxHeight;
  while (h >= 0) {
    nbr = localNode.RouteTable[msg.dir][h];
    if (LiesBetween(localNode.nameID, nbr.nameID, msg.nameID, msg.dir)) {
      SendToNode(msg, nbr);
      return;
    }
    h = h - 1;
  }
  // h<0 implies we are the closest node.
  DeliverMessage(msg.msg);
}
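The pseudocode can be exercised on the 8-node example. The Python sketch below is a single-process simulation, not the SkipNet implementation: the numeric IDs are read off the ring labels of the example, rings are rebuilt from numeric-ID prefixes, the direction is passed in by the caller instead of being chosen by `SendMsg`, and all function names are illustrative.

```python
NUMERIC_IDS = {"A": "000", "T": "001", "M": "010", "X": "011",
               "Z": "100", "O": "101", "D": "110", "V": "111"}


def build_tables(numeric_ids):
    """name -> list indexed by level of (clockwise, counterclockwise) neighbors."""
    tables = {}
    for name, num in numeric_ids.items():
        levels = []
        for h in range(len(num)):
            # The ring at level h holds the nodes sharing an h-digit numeric
            # prefix, kept in name ID order.
            ring = sorted(n for n, v in numeric_ids.items() if v[:h] == num[:h])
            if len(ring) < 2:
                break
            i = ring.index(name)
            levels.append((ring[(i + 1) % len(ring)], ring[i - 1]))
        tables[name] = levels
    return tables


def lies_between(cur, nbr, dest, clockwise):
    """True if nbr is reached before passing dest when leaving cur in that direction."""
    if clockwise:
        return cur < nbr <= dest if cur < dest else (nbr > cur or nbr <= dest)
    return dest <= nbr < cur if cur > dest else (nbr < cur or nbr >= dest)


def route_by_name_id(tables, src, dest, clockwise=True):
    path, cur = [src], src
    while cur != dest:
        # Try the highest-level pointer first, then drop down level by level.
        for cw, ccw in reversed(tables[cur]):
            nbr = cw if clockwise else ccw
            if lies_between(cur, nbr, dest, clockwise):
                cur = nbr
                path.append(cur)
                break
        else:
            break  # no pointer qualifies: cur is the closest node
    return path


tables = build_tables(NUMERIC_IDS)
print(route_by_name_id(tables, "A", "V", clockwise=True))
print(route_by_name_id(tables, "A", "V", clockwise=False))
```

Routing clockwise reproduces the path A, T, V from the example; routing counterclockwise goes through X instead.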
Routing By Numeric ID
Routing begins in the level 0 ring and proceeds until a node is found whose numeric ID matches the destination numeric ID in the first digit.
The message is then forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that the nodes in R_{h+1} share h+1 digits with the destination numeric ID.
Terminates when
  the message is delivered to the node whose numeric ID equals the key, or
  none of the nodes in R_h share h+1 digits with the destination numeric ID; then the node in R_h whose numeric ID is closest to the destination numeric ID is picked.
Number of message hops is O(log N)
Routing By Numeric ID
E.g. let Z = 1000 and O = 1001. Route from A to numeric ID 1011.
Path: A (0000) -> D (1100, move up a level) -> O (1001, move up a level) -> Z (1000) -> O (1001, closest match for 1011, deliver).
[Figure: the SkipNet ring hierarchy (root ring, Ring 0 and Ring 1, Ring 00 to Ring 11, and the rings labelled 0000 to 1101) with the routing path ending at O.]
Routing Algorithm
// Invoked at all nodes (including the source and destination nodes) along the routing path.
// Initially: msg.ringLvl = -1, msg.startNode = msg.bestNode = null & msg.finalDestination = false
RouteByNumericID(msg) {
  if (msg.numID == localNode.numID || msg.finalDestination) {
    DeliverMessage(msg.msg);
    return;
  }
  if (localNode == msg.startNode) {
    // Done traversing current ring.
    msg.finalDestination = true;
    SendToNode(msg, msg.bestNode);
    return;
  }
  h = CommonPrefixLen(msg.numID, localNode.numID);
  if (h > msg.ringLvl) {
    // Found a higher ring.
    msg.ringLvl = h;
    msg.startNode = msg.bestNode = localNode;
  } else if (abs(localNode.numID - msg.numID) < abs(msg.bestNode.numID - msg.numID)) {
    // Found a better candidate for current ring.
    msg.bestNode = localNode;
  }
  // Forward along current ring.
  nbr = localNode.RouteTable[clockWise][msg.ringLvl];
  SendToNode(msg, nbr);
}
Benefits
SkipNet supports routing by name ID and by numeric ID with the same data structure
The bottom (root) ring is sorted by name ID; membership in the higher-level rings is determined by numeric ID.
For a given node, the SkipNet rings to which it belongs form precisely a skip list that is circular & doubly linked.
Node Joins & Departure
Node joins
  A new node first finds the top-level ring that matches its numeric ID.
  It finds its neighbors in that top ring using a name ID search.
  Starting from one of these neighbors, it searches for its name ID at the next lower level and thus finds its neighbors at that level.
  This is repeated until it reaches the root ring.
  The existing nodes point to the new node only after it has joined the root ring.
  An insertion traverses O(log N) hops with high probability
Node departure
  SkipNet can route correctly as long as the root-level ring is maintained.
  Other levels are regarded as optimization hints; upper-ring membership is maintained through a background repair process.
Example
Join: insert node O (numeric ID 101)
Search by numeric ID 101
  The highest attainable level is 2
  O joins the ring containing Z at level 2
  Z forwards the join message to D at the next lower level, level 1
Proceed by searching by name ID in the next lower levels
  D, V are the neighbors at level 1
  M, T are the neighbors at level 0
[Figure: the SkipNet ring hierarchy (root ring, Ring 0 and Ring 1, Ring 00 to Ring 11, Ring 000 to Ring 111; levels L = 0 to L = 3), shown at successive stages of node O's join.]
Properties of SkipNet
Content & path locality
  Nodes are named like DNS entries with the components reversed, e.g. john.microsoft.com becomes com.microsoft.john
  Path locality holds for groups in which nodes share a single DNS suffix.
  Incorporating a node's name ID into a content name guarantees that the content will be hosted on that node, e.g. com.microsoft.john/doc-name
Constrained load balancing (CLB)
  A content name has two parts: a CLB domain and a CLB suffix, for example a document named msn.com/DataCenter!TopStories.html.
  Searching for the node
    Search for a node in the CLB domain using a name ID search.
    Then search by numeric ID for the hash of the CLB suffix, constrained to the CLB domain.
    Because the search is constrained by a name ID prefix, it uses the doubly linked list; this type of search affects performance by a factor of 2.
  CLB is performed over a naming subtree, not over an arbitrary subset of nodes.
Properties of SkipNet
Fault tolerance
  Only the neighbors at level 0 need to be maintained correctly
  Each node has 16 neighbors at level 0.
  Level 0 is repaired easily by contacting live nodes.
  Background stabilization mechanisms are employed when failures occur
  A failure along organizational boundaries only segments the overlay; each segment survives gracefully.
Security
  Nodes cannot create global names containing the suffix of a registered domain.
  Path locality avoids traffic analysis; however, outbound traffic is still prone to analysis.
Range queries
  Ability to perform queries over contiguous ring segments.
Enhancements
Use sparse & dense routing tables
  Use a density parameter k & non-binary random digits, to the base k, for the numeric ID.
Duplicate pointer elimination
  Remove duplicate pointers from the routing table. A 25% improvement can be achieved.
Incorporate network proximity for routing by name ID
  Introduce a P-table for proximity routing. The goal of the P-table is to maintain routing in O(log N) hops while ensuring that each hop has low latency.
  It keeps track of nodes that are close to the node itself in network distance.
Enhancements
Incorporate network proximity for routing by numeric ID
  Add a C-table to incorporate network proximity when searching by numeric ID.
  It keeps track of nodes that are close and within the CLB domain.
Design Alternatives
IP routing & DNS
o Content placement by routing using IP and DNS lookup.
Single overlay network
o For content locality, name each node with the hash of the data object's name. Requires a separate routing table for each object.
o Use a 2-part naming scheme: content names consist of node addresses concatenated with node-relative names. Does not support guaranteed path locality.
o Add constraints to messages to limit path locality. However, this prevents routing from being consistent.
o Use 2-part segments, with a numeric ID and a name ID like SkipNet. The result is a static form of constrained load balancing.
Design Alternatives (cont’)
Multiple overlay networks
o Multiple overlays with membership could be considered.
o Requires that access to other overlays is through gateways.
o Access to data is constrained and load balanced within a single overlay, and is not accessible to clients outside except via gateways.
SkipNet provides explicit content placement, allows clients to dynamically define new DHTs over any name prefix scope, and guarantees path locality within a shared name prefix, all within a single infrastructure.
Experiments
The authors ran experiments against the following:
Basic SkipNet using only the R-Table
Full SkipNet using the R-Table, P-Table and C-Table
Pastry
Chord
The following lookup performance metrics are used:
Relative Delay Penalty (RDP): latency of the overlay path compared to the IP path
Physical network hops: length of the overlay path measured in IP hops
Number of failed lookups
Other experimental parameters (refer to the paper):
Format of node names
Organisation size
Models for the distribution of nodes and data
Using host- or organisation-generated node names
Simulation of domain isolation by failing an organisation's link
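As a worked illustration of the RDP metric, the ratio can be computed as below. The function name and the latency figures are made up for the example and are not taken from the paper's experiments.

```python
def relative_delay_penalty(overlay_hop_latencies_ms, direct_ip_latency_ms):
    """RDP = total latency along the overlay path / latency of the direct IP path."""
    if direct_ip_latency_ms <= 0:
        raise ValueError("direct IP latency must be positive")
    return sum(overlay_hop_latencies_ms) / direct_ip_latency_ms


# Hypothetical lookup: three overlay hops versus a 40 ms direct IP path.
print(relative_delay_penalty([20, 35, 15], 40))  # 1.75
```

An RDP of 1 would mean the overlay route is as fast as the direct IP route; larger values measure the price paid for routing through the overlay.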
Experiment Results
Basic routing costs
Full SkipNet and Pastry are locality aware, while basic SkipNet and Chord are not. Hence the locality-aware systems performed better.
Non-uniform distribution of data does not affect performance.
Routing Entries per Node
System          Routing entries per node
Chord           16.3
Basic SkipNet   41.7
Full SkipNet    102.2
Pastry          63.2
Locality of Placement
Measures physical network hops.
Chord and Pastry have constant physical hops because they are oblivious to the locality of data: they diffuse data throughout the network.
SkipNet shows performance improvements as the locality of data references increases.
Experiment Results (cont’)
Fault tolerance when an organisation is disconnected
Locality improves fault tolerance.
Chord and Pastry fail totally for local lookups because data is diffused.
SkipNet keeps functioning and performs local lookups.
Constrained Load Balancing (within a domain)
Studies the Relative Delay Penalty (RDP) as the number of nodes increases.
Basic CLB using the R-Table causes higher delay penalties.
Full CLB causes intermediate delay penalties.
Pastry has low delay penalties.
Network proximity
Studies the effect on RDP of the density parameter k, which controls the P-Table entries.
RDP levels off after k=8 because of the increase of pointers in the P-Table.
SkipNet Summary
SkipNet is the first P2P system that achieves both path and content locality. It provides content locality at the desired degree and granularity.
Clustering node names allows SkipNet to perform gracefully in the face of link failures.
Performance is similar to other P2P systems such as Chord and Pastry under uniform access patterns.
Under access patterns where intra-organisation traffic predominates, SkipNet performs better.
SkipNet is also more resilient to network partitions than other P2P systems.
Conclusion
Looked at hash-based techniques in P2P
Pastry
P-Grid
Two important issues
Load balancing
Neighbor table consistency preserving
Comparison of DHT techniques
SkipNet: a skip list adaptation
References
[CAN2001] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. SIGCOMM’01, August 27-31, 2001.
[CPLS2001] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM’01, August 27-31, 2001.
[CSWH2000] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong, “Freenet: A distributed anonymous information storage and retrieval system”, Proc. of ICSI Workshop on Design Issues in Anonymity and Unobservability, 2000.
[DRGR2003] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, I. Stoica, The Impact of DHT Routing Geometry on Resilience and Proximity, SIGCOMM’03, August 25–29, 2003.
[LL2004a] S. S. Lam and H. Liu. Failure recovery for structured P2P networks: Protocol design and performance evaluation. In Proc. of ACM SIGMETRICS, June 2004.
[LL2004b] Consistency-preserving Neighbor Table Optimization for P2P Networks, Technical Report TR-04-01, Dept. of CS, Univ. of Texas at Austin, January 2004.
References (cont.)
[GLSKS2004] Load Balancing in Dynamic Structured P2P Systems, Proc. of IEEE INFOCOM, Portland, Oregon, USA, 2004.
[PSL1990] William Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, June 1990.
[RD2001] A. Rowstron and P. Druschel, “Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems”. In Proc. of the 18th IFIP/ACM International Conf. on Distributed Systems Platforms, November 2001.
[SMKKB2001] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan, Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, Proc. of SIGCOMM ’01, San Diego, California, USA.
[SML+2004] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications”, Proc. of the 2001 ACM Annual Conference of the Special Interest Group on Data Communication (ACM SIGCOMM’01), 2001.
[SNL2003] Nicholas J.A. Harvey, Michael B. Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman. SkipNet: A Scalable Overlay Network with Practical Locality Properties. Proceedings of the Fourth USENIX Symposium on Internet Technologies and Systems (USITS '03), Seattle, WA, March 2003.