ecs251 Spring 2007: Operating System Models
#3: Peer-to-Peer Systems

Dr. S. Felix Wu
Computer Science Department
University of California, Davis
http://www.cs.ucdavis.edu/~wu/

UC Davis, ecs251, Spring 2007 (05/03/2007)
The Role of the Service Provider

• Centralized management of services
  – DNS, Google, www.cnn.com, Blockbuster, SBC/Sprint/AT&T, cable service, Grid computing, AFS, bank transactions...
• Information, computing, and network resources owned by one or very few administrative domains
  – Some with an SLA (Service Level Agreement)
Interacting with the "SP"

• Service providers are the owners of the information and the interactions
  – Some enhance/establish the interactions
Let's Compare...

• Google, Blockbuster, CNN, MLB/NBA, LinkedIn, e-Bay
• Skype, BitTorrent, blogs, YouTube, botnets, cyber-paparazzi
Toward P2P

• More participation by the end nodes (or their users)
  – More decentralized computing/network resources available
  – End-user controllability and interactions
  – Security/robustness concerns
Service Providers in P2P

• We might not like SPs, but we still cannot avoid them entirely.
  – Who is going to lay the fiber and switches?
  – Can we avoid DNS?
  – How can we stop "cyber-bullying" and the like?
  – Copyright enforcement?
  – Does the Internet become a junkyard?
We Will Discuss...

• P2P system examples
  – Unstructured, structured, incentive-based
• Architectural analysis and issues
• Future P2P applications, and why
Challenge to You...

• Define a new P2P-related application, service, or architecture.
• Justify why it is practical, useful, and will scale well.
  – Example: sharing cooking recipes, or experiences & recommendations about restaurants and hotels
Napster

[Diagram: peers and the central Napster index server.]

1. File-location request sent to the Napster index server
2. The server returns a list of peers offering the file
3. File request sent to a chosen peer
4. File delivered directly between peers
5. Index update at the Napster server
Gnutella

• Originally conceived by Justin Frankel, the 21-year-old founder of Nullsoft
• March 2000: Nullsoft posts Gnutella to the web
• A day later, AOL removes Gnutella at the behest of Time Warner
• The Gnutella protocol, version 0.4 (http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf) and version 0.6 (http://rfc-gnutella.sourceforge.net/Proposals/Ultrapeer/Ultrapeers.htm)
• There are multiple open-source implementations at http://sourceforge.net/, including:
  – Jtella
  – Gnucleus
• Software released under the Lesser Gnu Public License (LGPL)
• The Gnutella protocol has been widely analyzed
Gnutella Protocol Messages

• Broadcast messages
  – Ping: initiating message ("I'm here")
  – Query: search pattern and TTL (time-to-live)
• Back-propagated messages
  – Pong: reply to a Ping; contains information about the peer
  – Query response: contains information about the computer that has the needed file
• Node-to-node messages
  – GET: return the requested file
  – PUSH: push the file to me
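The message families above can be sketched in code. This is an illustrative model of the Gnutella 0.4 message types, not the real wire format; the field names are assumptions.

```python
# Hypothetical sketch of the Gnutella 0.4 message types described above;
# field names are illustrative, not the actual wire encoding.
from dataclasses import dataclass

@dataclass
class Ping:            # broadcast: "I'm here"
    ttl: int

@dataclass
class Query:           # broadcast: search pattern plus TTL
    pattern: str
    ttl: int

@dataclass
class Pong:            # back-propagated reply to a Ping
    peer_addr: str
    shared_files: int

@dataclass
class QueryHit:        # back-propagated reply to a Query
    peer_addr: str
    matching_files: list

def forward(msg):
    """Broadcast messages are re-flooded only while TTL remains."""
    if isinstance(msg, (Ping, Query)) and msg.ttl > 1:
        msg.ttl -= 1
        return True   # forward to all other neighbors
    return False
```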
Gnutella Search Example

[Diagram: nodes 1-7, with file A held by nodes 5 and 7.]

Steps:
• Node 2 initiates a search for file A
• It sends the query message to all neighbors
• Neighbors forward the message
• Nodes that have file A initiate a reply message
• The query reply message is back-propagated to node 2
• File download
• Note: file transfer between clients that are both behind firewalls is not possible; if only one client, X, is behind a firewall, Y can request that X push the file to Y

This is limited-scope flooding with reverse-path forwarding.
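The node-2 walkthrough above can be sketched as a small simulation. This is a minimal, assumed model of limited-scope flooding with reverse-path forwarding; the topology and file placement follow the example, everything else is illustrative.

```python
# Minimal sketch of limited-scope flooding with reverse-path forwarding.
def flood_search(graph, files, origin, wanted, ttl=7):
    """Return (hit_node, reverse path back to origin) pairs."""
    hits = []
    visited = {origin}
    frontier = [(origin, [origin])]          # (node, path from origin)
    for _ in range(ttl):
        next_frontier = []
        for node, path in frontier:
            for nbr in graph[node]:
                if nbr in visited:
                    continue                 # duplicate suppression
                visited.add(nbr)
                if wanted in files.get(nbr, ()):
                    # the reply is back-propagated along the reversed path
                    hits.append((nbr, list(reversed(path + [nbr]))))
                next_frontier.append((nbr, path + [nbr]))
        frontier = next_frontier
    return hits

# Topology like the slides: node 2 searches for file A held by nodes 5 and 7.
graph = {1: [2, 3], 2: [1, 4], 3: [1, 5], 4: [2, 5, 6], 5: [3, 4, 7],
         6: [4], 7: [5]}
files = {5: {"A"}, 7: {"A"}}
```

Note how every reply path ends back at the querying node: replies retrace the query's route rather than being routed directly.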
• GUID: short for Globally Unique Identifier, a randomized string used to uniquely identify a host or message on the Gnutella network. This prevents duplicate messages from being sent on the network.
• GWebCache: a distributed system for helping servants connect to the Gnutella network, thus solving the "bootstrapping" problem. Servants query any of several hundred GWebCache servers to find the addresses of other servants. GWebCache servers are typically web servers running a special module.
• Host catcher: Pong responses allow servants to keep track of active Gnutella hosts.
• On most servants, the default port for Gnutella is 6346.
Gnutella Network Growth

[Chart: number of nodes in the largest network component (thousands), rising from roughly 10k to 50k nodes between 11/20/00 and 05/29/01.]
"Limited-Scope Flooding"

• Ripeanu reported that Gnutella traffic totals 1 Gbps (or 330 TB/month)
  – Compare to 15,000 TB/month in the US Internet backbone (December 2000)
  – This estimate excludes actual file transfers
• Reasoning:
  – QUERY and PING messages are flooded; they form more than 90% of generated traffic
  – Predominant TTL = 7; >95% of nodes are less than 7 hops away
  – Measured traffic at each link is about 6 kbps
  – Network with 50k nodes and 170k links
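The quoted figures can be sanity-checked with a one-line calculation: about 170k links, each carrying roughly 6 kbps of query/ping traffic, should total roughly 1 Gbps.

```python
# Sanity check on Ripeanu's figures quoted above.
links = 170_000
per_link_kbps = 6
total_gbps = links * per_link_kbps / 1_000_000   # kbps -> Gbps
print(round(total_gbps, 2))   # 1.02
```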
Inefficient Mapping

[Diagram: overlay connections among nodes A-H laid over the physical network; the physical link D-E needs to support six times higher traffic.]
Topology Mismatch

The overlay network topology doesn't match the underlying Internet infrastructure topology!
• 40% of all nodes are in the 10 largest Autonomous Systems (AS)
• Only 2-4% of all TCP connections link nodes within the same AS
• Largely "random wiring"
• Most Gnutella-generated traffic crosses AS borders, making the traffic more expensive
• May cause ISPs to change their pricing schemes
Scalability

• Whenever a node receives a message (ping/query), it sends copies out to all of its other connections.
• Existing mechanisms to reduce traffic:
  – TTL counter
  – Nodes cache information about messages they have received, so that they don't forward duplicate messages
• 70% of Gnutella users share no files
• 90% of users answer no queries
• Those who have files to share may limit the number of connections or the upload speed, resulting in a high download failure rate
• If only a few individuals contribute to the public good, these few peers effectively act as centralized servers
Anonymity

• Gnutella provides for anonymity by masking the identity of the peer that generated a query.
• However, IP addresses are revealed at various points in its operation: HITS packets include the URL for each file, revealing the IP addresses.
Query Expressiveness

• The format of the query is not standardized
• No standard format or matching semantics for the QUERY string; its interpretation is completely determined by each node that receives it
• String literal vs. regular expression
• Directory name, filename, or file contents
• Malicious users may even return files unrelated to the query
Superpeers

Cooperative, long-lived peers, typically with significant resources, that handle a very high volume of query-resolution traffic.
• Gnutella is a self-organizing, large-scale P2P application that produces an overlay network on top of the Internet; it appears to work
• Growth is hindered by the volume of generated traffic and inefficient resource use
• Since there is no central authority, the open-source community must commit to making any changes
• Suggested changes have been made in:
  – "Peer-to-Peer Architecture Case Study: Gnutella Network," by Matei Ripeanu
  – "Improving Gnutella Protocol: Protocol Analysis and Research Proposals," by Igor Ivkovic
Freenet

• Essentially the same as Gnutella:
  – Limited-scope flooding
  – Reverse-path forwarding
• Difference:
  – Data objects (i.e., files) are also delivered via reverse-path forwarding
P2P Issues

• Scalability & load balancing
• Anonymity
• Fairness, incentives & trust
• Security and robustness
• Efficiency
• Mobility
Incentive-Driven Fairness

• P2P means we all should contribute
  – Hopefully fairly, but the majority is selfish...
• We need an incentive for people to contribute
BitTorrent: "Tit for Tat"

• Equivalent retaliation (game theory)
  – A peer will initially cooperate, then respond in kind to an opponent's previous action: if the opponent previously was cooperative, the agent is cooperative; if not, the agent is not.
BitTorrent

• Fairness of download and upload between a pair of peers
• Every 10 seconds, estimate the download bandwidth from the other peer
  – Based on this performance estimate, decide whether to continue uploading to the other peer
Client & Its Peers

• Client
  – Download rate (from the peers)
• Peers
  – Upload rate (to the client)
BT Choking by the Client

• By default, every peer is "choked"
  – Stop uploading to them, but the TCP connection is still there
• Select four peers to "unchoke"
  – Best "upload rates" and "interested"
  – Upload to the unchoked ones and monitor the download rate for all the peers
  – "Re-choke" every 30 seconds
• Optimistic unchoking
  – Randomly select a choked peer to unchoke
"Interested"

A request for a piece (or its sub-pieces).
Becoming a "Seed"

Use the upload rate to the peers to decide which peers to unchoke.
BT Peer Selection

• From the "tracker":
  – We receive a partial list of all active peers for the same file
  – We can get another 50 from the tracker if we want
Piece Selection

• Piece (64 KB~1 MB), sub-piece (16 KB)
  – Piece size: a trade-off between performance and the size of the torrent file itself
  – A client might request different sub-pieces of the same piece from different peers
• Strict priority between sub-pieces and pieces
• Rarest first
  – Exception: "random first"
  – Get the content out of the seed(s) as soon as possible
Rarest First

• Exchange bitmaps with 20+ peers
  – Initial messages
  – "have" messages
• Array of buckets
  – The i-th bucket contains pieces with i known instances
  – Within the same bucket, the client randomly selects one piece
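The bucket structure above can be sketched in a few lines. This is a hedged illustration under assumed data shapes (sets of piece indices), not a real client's implementation.

```python
# Sketch of rarest-first: group pieces by how many known peers hold them,
# then pick randomly within the least-populated bucket.
import random
from collections import defaultdict

def rarest_first(peer_bitmaps, have):
    """peer_bitmaps: list of sets of piece indices each peer advertises.
    have: pieces the client already holds. Returns the next piece, or None."""
    counts = defaultdict(int)
    for bitmap in peer_bitmaps:
        for piece in bitmap:
            counts[piece] += 1
    buckets = defaultdict(list)          # i-th bucket: pieces with i copies
    for piece, n in counts.items():
        if piece not in have:
            buckets[n].append(piece)
    if not buckets:
        return None
    rarest = min(buckets)                # least-replicated bucket
    return random.choice(buckets[rarest])
```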
Random First

• Initially, the rarest pieces are rare: the client would have to get all the sub-pieces from one or very few peers.
• For the first 4~5 pieces, get some random pieces instead, so the client quickly has a few pieces to upload.
BitTorrent

• Connect to the tracker
• Connect to 20+ peers
• Random-first, then rarest-first
• Monitor the download rate from the peers (or the upload rate to the client)
• Unchoke and optimistic unchoke
Trackerless BitTorrent

• Every BT peer is a tracker!
• But how would they share and exchange information regarding other peers?
• Similar to Napster's index server or DNS
Pure P2P

• Every peer is a tracker
• Every peer is a DNS server
• Every peer is a Napster index server

How can this be done?
• We try to remove/reduce the role of "special servers"!
Structured Peering

• Peer identity and routability
• Key/content assignment
  – Which identity owns what? (Google Search?)
  – Napster: centralized index service
  – Skype/Kazaa: login server & superpeers
  – DNS: hierarchical DNS servers
• Two problems:
  (1) How to connect to the "ring"?
  (2) How to prevent failures/changes?
UCDavis, ecs251Spring 2007
DHTDHT
Distributed hash tables (DHTs)– decentralized lookup service of a hash table– (name, value) pairs stored in the DHT– any peer can efficiently retrieve the value
associated with a given name– the mapping from names to values is distributed
among peers
HT as a Search Table

• Index key: information/content is distributed, and we need to know where.
  – Where is this piece of music?
  – What is the location of this type of content?
  – What is the current IP address of this Skype user?
DHT as a Search Table

[Diagram, repeated over several slides: the index-key lookup table from the previous slide, now partitioned across the peers themselves.]
DHT

• Scalable
• Peer arrivals, departures, and failures
• Unstructured versus structured
DHT (Name, Value)

How can we utilize a DHT to avoid trackers in BitTorrent?
DHT-Based Tracker

• Index key: whoever owns this hash entry is the tracker for the corresponding key!
• Example: key = "FreeBSD 5.4 CD images"; publish the key on the class web site; the value is the seed's IP address
• Operations: PUT & GET
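The PUT/GET idea can be sketched as follows. This is a hedged illustration: the distributed table is faked with a plain dict, whereas a real trackerless client would route PUT/GET through the overlay to whichever peer owns the hash.

```python
# Sketch of a DHT-based tracker: the peer owning hash(key) stores the
# swarm's peer list. The dict below stands in for the distributed table.
import hashlib

dht = {}   # stand-in for the distributed table

def tracker_key(torrent_name):
    return hashlib.sha1(torrent_name.encode()).hexdigest()

def put(torrent_name, peer_addr):
    dht.setdefault(tracker_key(torrent_name), set()).add(peer_addr)

def get(torrent_name):
    return dht.get(tracker_key(torrent_name), set())

# The seed publishes itself; downloaders look it up by the same key.
put("FreeBSD 5.4 CD images", "10.0.0.1:6881")
print(get("FreeBSD 5.4 CD images"))   # {'10.0.0.1:6881'}
```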
Chord

• Consistent hashing
• A simple key lookup algorithm
• Scalable key lookup algorithm
• Node joins and stabilization
• Node failures
Chord

• Given a key (data item), it maps the key onto a peer
• Uses consistent hashing to assign keys to peers
• Solves the problem of locating a key in a collection of distributed peers
• Maintains routing information as peers join and leave the system
Issues

• Load balance: the distributed hash function spreads keys evenly over the peers
• Decentralization: Chord is fully distributed; no node is more important than any other, which improves robustness
• Scalability: lookup cost grows logarithmically with the number of peers, so even very large systems are feasible
• Availability: Chord automatically adjusts its internal tables to ensure that the peer responsible for a key can always be found
Example Application

• The highest layer provides a file-like interface to the user, including user-friendly naming and authentication
• This file system maps operations to lower-level block operations
• Block storage uses Chord to identify the node responsible for storing a block, and then talks to the block storage server on that node

[Diagram: a client and two servers, each stacked as File System / Block Store / Chord, with the Chord layer linking the nodes.]
Consistent Hashing

• A consistent hash function assigns each peer and key an m-bit identifier
• SHA-1 is used as the base hash function
• A peer's identifier is defined by hashing the peer's IP address
• A key identifier is produced by hashing the key (Chord doesn't define this; it depends on the application)
  – ID(peer) = hash(IP, port)
  – ID(key) = hash(key)
Consistent Hashing

• In an m-bit identifier space, there are 2^m identifiers
• Identifiers are ordered on an identifier circle modulo 2^m; the identifier ring is called the Chord ring
• Key k is assigned to the first peer whose identifier is equal to or follows (the identifier of) k in the identifier space
• This peer is the successor peer of key k, denoted successor(k)
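The successor(k) rule above can be captured in a few lines. This is a minimal sketch on a tiny 3-bit ring, assuming a sorted list of peer identifiers; a real Chord node would of course not have a global view of the ring.

```python
# Sketch of successor(k) on an m-bit Chord ring: key k is owned by the
# first peer identifier equal to or following k, modulo 2^m.
def successor(k, peers, m=3):
    """peers: sorted list of peer identifiers on the 2^m ring."""
    k %= 2 ** m
    for p in peers:
        if p >= k:
            return p
    return peers[0]          # wrap around the ring

peers = [0, 1, 3]            # the 3-bit example ring used in the slides
print(successor(1, peers))   # 1
print(successor(2, peers))   # 3
print(successor(6, peers))   # 0
```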
Consistent Hashing - Successor Peers

[Diagram: a 3-bit identifier circle (identifiers 0-7) with peers at 0, 1, and 3, and keys 1, 2, and 6.]

successor(1) = 1
successor(2) = 3
successor(6) = 0
Consistent Hashing - Join and Departure

• When a node n joins the network, certain keys previously assigned to n's successor now become assigned to n
• When node n leaves the network, all of its assigned keys are reassigned to n's successor
Node Join

[Diagram: the 3-bit ring; when a new node joins, it takes over from its successor the keys that now fall between its predecessor and itself.]
Node Departure

[Diagram: the 3-bit ring; when a node departs, its keys are reassigned to its successor.]
A Simple Key Lookup

• A very small amount of routing information suffices to implement consistent hashing in a distributed environment
• If each node knows only how to contact its current successor node on the identifier circle, all nodes can be visited in linear order
• A query for a given identifier is passed around the circle via these successor pointers until it encounters the node that holds the key
A Simple Key Lookup

Pseudocode for finding a successor:

// ask node n to find the successor of id
n.find_successor(id)
  if (id ∈ (n, successor])
    return successor;
  else
    // forward the query around the circle
    return successor.find_successor(id);
A Simple Key Lookup

[Diagram: the path taken by a query from node N8 for key 54.]
Successor

• Each active node MUST know the IP address of its successor!
  – N8 has to know that the next node on the ring is N14
• On N14's departure: N8 => N21
• But how about a failure or crash?
Robustness

• Keep successors within R hops
  – N8 => N14, N21, N32, N38 (R = 4)
  – Periodically ping along the path to check, and also to find out whether there are "new members" in between
Complexity of the Search

• Time/messages: O(N)
  – N: number of nodes on the ring
• Space: O(1)
  – We only need to remember R IP addresses
• Stabilization depends on the "period"
Scalable Key Location

• To accelerate lookups, Chord maintains additional routing information
• This additional information is not essential for correctness, which is achieved as long as each node knows its correct successor
Scalable Key Location - Finger Tables

• Each node n maintains a routing table with up to m entries (m is the number of bits in the identifiers), called the finger table
• The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle
  – s = successor(n + 2^(i-1))
  – s is called the i-th finger of node n, denoted n.finger(i)
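The finger-table definition above can be sketched directly. This is an illustrative computation on the tiny 3-bit example ring, assuming global knowledge of the peers; it is not how a real node builds its table.

```python
# Sketch of the finger table: the i-th finger of node n (1-based) is
# successor(n + 2^(i-1)) on the 2^m ring.
def finger_table(n, peers, m=3):
    def successor(k):
        k %= 2 ** m
        for p in peers:
            if p >= k:
                return p
        return peers[0]      # wrap around the ring
    return [successor(n + 2 ** (i - 1)) for i in range(1, m + 1)]

peers = [0, 1, 3]                 # the example ring from the slides
print(finger_table(0, peers))     # starts 1,2,4 -> [1, 3, 0]
print(finger_table(1, peers))     # starts 2,3,5 -> [3, 3, 0]
print(finger_table(3, peers))     # starts 4,5,7 -> [0, 0, 0]
```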
Scalable Key Location - Finger Tables

Example: a 3-bit ring with peers 0, 1, 3 and keys 1, 2, 6
(key 1 -> node 1, key 2 -> node 3, key 6 -> node 0).

Node 0's finger table:
start  succ.
0+2^0 = 1   1
0+2^1 = 2   3
0+2^2 = 4   0

Node 1's finger table:
start  succ.
1+2^0 = 2   3
1+2^1 = 3   3
1+2^2 = 5   0

Node 3's finger table:
start  succ.
3+2^0 = 4   0
3+2^1 = 5   0
3+2^2 = 7   0
Finger Tables

• A finger table entry includes both the Chord identifier and the IP address (and port number) of the relevant node
• The first finger of n is the immediate successor of n on the circle
Scalable Key Location - Example Query

[Diagram: the path a query for key 54 takes, starting at node N8.]
Scalable Key Location - A Characteristic

• Since each node has finger entries at power-of-two intervals around the identifier circle, each node can forward a query at least halfway along the remaining distance between the node and the target identifier. From this intuition follows a theorem:
• Theorem: With high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(log N).
Complexity of the Search

• Time/messages: O(log N)
  – N: number of nodes on the ring
• Space: O(log N)
  – We need to remember R IP addresses
  – We need to remember log N fingers
• Stabilization depends on the "period"
An Example

• M = 4096 (identifier size in bits); the ring size is 2^4096
• N = 2^16 (number of nodes)
• How many entries do we need in the finger table?
• Recall: each node n maintains a routing table with up to m entries (m is the number of bits in the identifiers), called the finger table. The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle: s = successor(n + 2^(i-1)).
Complexity of the Search

• Time/messages: O(M)
  – M: number of bits in the identifier
• Space: O(M)
  – We need to remember R IP addresses
  – We need to remember M fingers
• Stabilization depends on the "period"
Structured Peering

• Peer identity and routability
  – 2^M identifiers, finger-table routing
• Key/content assignment
  – Hashing
• Dynamics/failures
  – Inconsistency??
Node Joins and Stabilization

• The most important thing is the successor pointer. If the successor pointer is kept up to date, which is sufficient to guarantee correctness of lookups, then the finger table can always be verified.
• Each node runs a "stabilization" protocol periodically in the background to update its successor pointer and finger table.
Node Joins and Stabilization

The "stabilization" protocol contains 6 functions:
– create()
– join()
– stabilize()
– notify()
– fix_fingers()
– check_predecessor()
Node Joins - join()

• When node n first starts, it calls n.join(n'), where n' is any known Chord node
• The join() function asks n' to find the immediate successor of n
• join() does not make the rest of the network aware of n
Node Joins - join()

// create a new Chord ring.
n.create()
  predecessor = nil;
  successor = n;

// join a Chord ring containing node n'.
n.join(n')
  predecessor = nil;
  successor = n'.find_successor(n);
Node Joins - stabilize()

• Each time node n runs stabilize(), it asks its successor for the successor's predecessor p, and decides whether p should be n's successor instead
• stabilize() also notifies node n's successor of n's existence, giving the successor the chance to change its predecessor to n
• The successor does this only if it knows of no closer predecessor than n
Node Joins - stabilize()

// called periodically. verifies n's immediate
// successor, and tells the successor about n.
n.stabilize()
  x = successor.predecessor;
  if (x ∈ (n, successor))
    successor = x;
  successor.notify(n);

// n' thinks it might be our predecessor.
n.notify(n')
  if (predecessor is nil or n' ∈ (predecessor, n))
    predecessor = n';
Node Joins - Join and Stabilization

[Diagram: node n joins between np and its successor ns; initially succ(np) = ns and pred(ns) = np.]

• n joins
  – predecessor = nil
  – n acquires ns as its successor via some n'
• n runs stabilize()
  – n notifies ns that n may be its new predecessor
  – ns acquires n as its predecessor
• np runs stabilize()
  – np asks ns for its predecessor (now n)
  – np acquires n as its successor
  – np notifies n
  – n acquires np as its predecessor
• All predecessor and successor pointers are now correct
• Fingers still need to be fixed, but old fingers will still work
Node Joins - fix_fingers()

• Each node periodically calls fix_fingers() to make sure its finger table entries are correct
• It is how new nodes initialize their finger tables
• It is how existing nodes incorporate new nodes into their finger tables
Node Joins - fix_fingers()

// called periodically. refreshes finger table entries.
n.fix_fingers()
  next = next + 1;
  if (next > m)
    next = 1;
  finger[next] = find_successor(n + 2^(next-1));

// checks whether predecessor has failed.
n.check_predecessor()
  if (predecessor has failed)
    predecessor = nil;
Node Failures

• The key step in failure recovery is maintaining correct successor pointers
• To help achieve this, each node maintains a successor list of its r nearest successors on the ring
• If node n notices that its successor has failed, it replaces it with the first live entry in the list
• Successor lists are stabilized as follows:
  – Node n reconciles its list with its successor s by copying s's successor list, removing its last entry, and prepending s to it
  – If node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with its new successor
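The successor-list maintenance above can be sketched as follows. This is a hedged illustration with assumed names; identifiers like "N14" stand in for node references.

```python
# Sketch of successor-list maintenance: copy the successor's list, drop
# its last entry, and prepend the successor itself.
def reconcile(successor, successor_list_of_successor, r=4):
    lst = [successor] + successor_list_of_successor[:-1]
    return lst[:r]

def on_successor_failure(succ_list, alive):
    """Replace a failed successor with the first live entry in the list."""
    for s in succ_list:
        if s in alive:
            return s
    return None

print(reconcile("N14", ["N21", "N32", "N38", "N42"]))
# ['N14', 'N21', 'N32', 'N38']
```

Keeping r live successors means the ring only breaks if r consecutive nodes fail simultaneously, which becomes exponentially unlikely as r grows.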
Chord - The Math

• Every node is responsible for about K/N keys (N nodes, K keys)
• When a node joins or leaves an N-node network, only O(K/N) keys change hands (and only to and from the joining or leaving node)
• Lookups need O(log N) messages
• To re-establish routing invariants and finger tables after a node joins or leaves, only O(log^2 N) messages are required