Fall 2006, CS 561
Peer-to-Peer Networks

Outline
• Overview
• Pastry
• OpenDHT
Contributions from Peter Druschel & Sean Rhea
What is Peer-to-Peer About?
• Distribution
• Decentralized control
• Self-organization
• Symmetric communication

• P2P networks do two things:
  – Map objects onto nodes
  – Route requests to the node responsible for a given object
Object Distribution
Consistent hashing [Karger et al. ’97]:
• 128-bit circular id space (ids range from 0 to 2^128 − 1)
• nodeIds assigned uniformly at random
• objIds chosen uniformly at random
• Invariant: the node with the numerically closest nodeId maintains the object
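The invariant can be sketched with a toy consistent-hashing map. The 16-bit id space and function names below are illustrative assumptions (Pastry's real id space is 128 bits):

```python
# Sketch of consistent hashing on a circular id space (illustrative;
# Pastry uses 128-bit ids, we use 16-bit ones here for readability).
ID_SPACE = 2 ** 16  # ids live in [0, 2^16 - 1], wrapping around

def circular_distance(a: int, b: int) -> int:
    """Distance between two ids on the circle, in either direction."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)

def responsible_node(obj_id: int, node_ids: list[int]) -> int:
    """The node with the numerically closest nodeId maintains the object."""
    return min(node_ids, key=lambda n: circular_distance(n, obj_id))

nodes = [100, 20000, 40000, 60000]
print(responsible_node(65000, nodes))  # 100 -- the id space wraps around
```

Because both nodeIds and objIds are uniform, each node ends up responsible for roughly an equal share of objects.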
Object Insertion/Lookup
A message with key X is routed to the live node whose nodeId is numerically closest to X.

Problem: a complete routing table is not feasible at this scale.
Routing: Distributed Hash Table
Properties:
• log16 N routing steps
• O(log N) state per node

[Figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f, and d462ba to the destination region around d467c4 and d471f1.]
Leaf Sets
Each node maintains the IP addresses of the L nodes with the numerically closest larger and smaller nodeIds, respectively. The leaf set provides:
• routing efficiency/robustness
• fault detection (keep-alive)
• application-specific local coordination
Routing Procedure

if (D is within range of our leaf set)
    forward to numerically closest member
else
    let l = length of shared prefix
    let d = value of l-th digit in D's address
    if (RouteTab[l,d] exists)
        forward to RouteTab[l,d]
    else
        forward to a known node that
        (a) shares at least as long a prefix
        (b) is numerically closer than this node
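The procedure can be sketched as runnable Python. The 8-hex-digit ids and the data-structure shapes are assumptions for illustration:

```python
# Sketch of Pastry's forwarding decision for a message with key D.
# Illustrative only: ids are 8 hex digits (Pastry uses 128-bit ids),
# and leaf-set wraparound is ignored for simplicity.
DIGITS = 8

def to_digits(node_id):
    return format(node_id, f"0{DIGITS}x")

def shared_prefix_len(a, b):
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def next_hop(self_id, D, leaf_set, route_tab, known_nodes):
    # 1. D within the leaf set's range: forward to the numerically
    #    closest member (possibly ourselves -- we are the destination).
    if leaf_set and min(leaf_set) <= D <= max(leaf_set):
        return min(leaf_set + [self_id], key=lambda n: abs(n - D))
    # 2. Otherwise, use the routing-table entry indexed by the length
    #    of the shared prefix and D's next digit.
    l = shared_prefix_len(to_digits(self_id), to_digits(D))
    d = to_digits(D)[l]
    if (l, d) in route_tab:
        return route_tab[(l, d)]
    # 3. Fallback: any known node sharing at least as long a prefix
    #    that is numerically closer to D than we are.
    for n in known_nodes:
        if (shared_prefix_len(to_digits(n), to_digits(D)) >= l
                and abs(n - D) < abs(self_id - D)):
            return n
    return self_id  # no better node known: deliver locally
```

Each routing-table hop fixes at least one more digit of the key, which is where the log16 N step bound comes from.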
Routing
Integrity of overlay:
• guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously

Number of routing hops:
• No failures: < log16 N expected, 128/b + 1 maximum
• During failure recovery: O(N) worst case, average case much better
Node Addition
New node: d46a1c

[Figure: the new node joins by sending Route(d46a1c) via a nearby node 65a1fc; the message passes d13da3, d4213f, and d462ba toward d467c4 and d471f1.]
Node Departure (Failure)
Leaf set members exchange keep-alive messages
• Leaf set repair (eager): request set from farthest live node in set
• Routing table repair (lazy): get table from peers in the same row, then higher rows
PAST: Cooperative, archival file storage and distribution
• Layered on top of Pastry
• Strong persistence
• High availability
• Scalability
• Reduced cost (no backup)
• Efficient use of pooled resources
PAST API
• Insert - store replica of a file at k diverse storage nodes
• Lookup - retrieve file from a nearby live storage node that holds a copy
• Reclaim - free storage associated with a file
Files are immutable
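A minimal sketch of these three operations over an in-memory node set (the class and method bodies are assumptions, not the real PAST implementation):

```python
# Sketch of the PAST API over an in-memory set of storage nodes.
# Names (PastNetwork, etc.) are illustrative, not the real PAST code.
class PastNetwork:
    def __init__(self, node_ids, k):
        self.k = k                      # replication factor
        self.store = {n: {} for n in node_ids}

    def _closest(self, file_id):
        # The k nodes with nodeIds numerically closest to fileId.
        return sorted(self.store, key=lambda n: abs(n - file_id))[:self.k]

    def insert(self, file_id, data):
        """Store a replica of the file at k storage nodes."""
        for n in self._closest(file_id):
            self.store[n][file_id] = data

    def lookup(self, file_id):
        """Retrieve the file from a live node that holds a copy."""
        for n in self._closest(file_id):
            if file_id in self.store[n]:
                return self.store[n][file_id]
        return None

    def reclaim(self, file_id):
        """Free the storage associated with a file."""
        for n in self._closest(file_id):
            self.store[n].pop(file_id, None)

net = PastNetwork([10, 20, 30, 40, 50], k=3)
net.insert(22, b"song.mp3")
print(net.lookup(22))   # b'song.mp3'
net.reclaim(22)
print(net.lookup(22))   # None
```

Because files are immutable, replicas never need to be synchronized, only re-created when nodes fail.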
PAST: File storage
Storage Invariant: file “replicas” are stored on the k nodes with nodeIds closest to the fileId (k is bounded by the leaf set size).

[Figure: Insert(fileId) with k = 4; the replicas land on the four nodes closest to fileId.]
PAST: File Retrieval
• File located in log16 N steps (expected)
• A Lookup usually finds the replica nearest the client C (one of the k replicas)
DHT Deployment Today

[Figure: each application runs its own DHT directly over IP connectivity: CFS (MIT) on Chord, PAST (MSR/Rice) on Pastry, OStore (UCB) on Tapestry, PIER (UCB) on Bamboo, pSearch (HP) on CAN, Coral (NYU) on Kademlia, i3 (UCB) on Chord, and Overnet (open) on Kademlia.]

Every application deploys its own DHT (DHT as a library).
DHT Deployment Tomorrow?

[Figure: the same applications (CFS, PAST, OStore, PIER, pSearch, Coral, i3, Overnet) all use a single shared DHT layer, which provides indirection over IP connectivity.]

OpenDHT: one DHT, shared across applications (DHT as a service).
Two Ways To Use a DHT
1. The Library Model
   – DHT code is linked into the application binary
   – Pros: flexibility, high performance
2. The Service Model
   – DHT accessed as a service over RPC
   – Pros: easier deployment, less maintenance
The OpenDHT Service
• 200-300 Bamboo [USENIX ’04] nodes on PlanetLab
  – All in one slice, all managed by us
• Clients can be arbitrary Internet hosts
  – They access the DHT using RPC over TCP
• Interface is simple put/get:
  – put(key, value): stores value under key
  – get(key): returns all the values stored under key
• Running on PlanetLab since April 2004
  – Building a community of users
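The put/get semantics can be sketched locally; note that get returns all values ever put under a key, not just the latest. The class name is an assumption, and the real service is reached over RPC rather than in-process:

```python
from collections import defaultdict

# Local sketch of OpenDHT's put/get semantics: a key maps to the set
# of ALL values put under it. (The real service is reached over RPC.)
class PutGetTable:
    def __init__(self):
        self.table = defaultdict(set)

    def put(self, key: bytes, value: bytes) -> None:
        self.table[key].add(value)

    def get(self, key: bytes) -> set:
        return self.table.get(key, set())

dht = PutGetTable()
dht.put(b"k", b"v1")
dht.put(b"k", b"v2")          # both values are retained
print(sorted(dht.get(b"k")))  # [b'v1', b'v2']
```

Returning all values sidesteps write conflicts: two clients putting under the same key both see their values survive.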
OpenDHT Applications

Application                              Uses OpenDHT for
Croquet Media Manager                    replica location
DOA                                      indexing
HIP                                      name resolution
DTN Tetherless Computing Architecture    host mobility
Place Lab                                range queries
QStream                                  multicast tree construction
VPN Index                                indexing
DHT-Augmented Gnutella Client            rare object search
FreeDB                                   storage
Instant Messaging                        rendezvous
CFS                                      storage
i3                                       redirection
An Example Application: The CD Database
[Figure: the client computes a disc fingerprint and asks the service, “Recognize fingerprint?”; the service replies with the album and track titles.]
An Example Application: The CD Database
[Figure: this time the service replies “No such fingerprint”, so the user types in the album and track titles and submits them.]
A DHT-Based FreeDB Cache
• FreeDB is a volunteer service
  – Has suffered outages as long as 48 hours
  – Service costs are borne largely by volunteer mirrors
• Idea: build a cache of FreeDB with a DHT
  – Adds to the availability of the main service
  – Goal: explore how easy this is to do
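The cache is a standard cache-aside pattern keyed by disc fingerprint; every name below (lookup_disc, FakeDHT, the freedb stub) is an illustrative assumption:

```python
# Sketch of the DHT-based FreeDB cache: try the DHT first, fall back
# to the (slower, sometimes unavailable) FreeDB server, then populate
# the cache. All names here are illustrative.
def lookup_disc(fingerprint, dht, freedb_query):
    cached = dht.get(fingerprint)
    if cached:
        return cached[0]              # cache hit: skip FreeDB entirely
    info = freedb_query(fingerprint)  # cache miss: ask the main service
    if info is not None:
        dht.put(fingerprint, info)    # populate cache for later clients
    return info

class FakeDHT:                        # stand-in for the OpenDHT client
    def __init__(self): self.t = {}
    def get(self, k): return self.t.get(k, [])
    def put(self, k, v): self.t.setdefault(k, []).append(v)

dht = FakeDHT()
calls = []
def freedb(fp):
    calls.append(fp)
    return {"album": "Example Album"}

lookup_disc("fp1", dht, freedb)  # miss: hits FreeDB, fills the cache
lookup_disc("fp1", dht, freedb)  # hit: served from the DHT
print(len(calls))                # 1 -- FreeDB contacted only once
```

Once a fingerprint is cached, a FreeDB outage no longer affects lookups for that disc.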
Cache Illustration

[Figure: new albums are put into the DHT keyed by disc fingerprint; a client looks up a disc fingerprint in the DHT and receives the disc info.]
Is Providing DHT Service Hard?
• Is it any different than just running Bamboo?
  – Yes: sharing makes the problem harder
• OpenDHT is shared in two senses:
  – across applications, which requires a flexible interface
  – across clients, which requires resource allocation
Sharing Between Applications
• Must balance generality and ease of use
  – Many apps want only simple put/get
  – Others want lookup, anycast, multicast, etc.
• OpenDHT allows only put/get
  – A client-side library, ReDiR, builds the others on top
  – Supports lookup, anycast, multicast, and range search
  – Adds only a constant latency increase on average
  – (A different approach is used by DimChord [KR04])
Sharing Between Clients
• Must authenticate puts/gets/removes
  – If two clients put with the same key, who wins?
  – Who can remove an existing put?
• Must protect the system’s resources
  – Otherwise malicious clients can deny service to others
  – This is the focus of the remainder of this talk
Fair Storage Allocation
• Our solution: give each client a fair share
  – “Fairness” is defined a few slides ahead
• Limits the strength of malicious clients
  – They are only as powerful as they are numerous
• Protect storage on each DHT node separately
  – Global fairness is hard
  – Key-choice imbalance is a burden on the DHT
  – Reward clients that balance their key choices
Two Main Challenges
1. Making sure disk is available for new puts
   – As load changes over time, need to adapt
   – Without some free disk, our hands are tied
2. Allocating free disk fairly across clients
   – Adapt techniques from fair queuing
Making Sure Disk is Available
• Can’t store values indefinitely
  – Otherwise all storage will eventually fill
• Add a time-to-live (TTL) to puts
  – put(key, value) → put(key, value, ttl)
  – (A different approach is used by Palimpsest [RH03])
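The put-with-TTL interface can be sketched with a store that drops expired values on read; the class and parameter names are assumptions:

```python
# Sketch of put-with-TTL: each value records its expiry time, and
# expired values are dropped (storage reclaimed) on read.
class TTLStore:
    def __init__(self):
        self.table = {}   # key -> list of (value, expiry_time)

    def put(self, key, value, ttl, now):
        self.table.setdefault(key, []).append((value, now + ttl))

    def get(self, key, now):
        live = [(v, t) for v, t in self.table.get(key, []) if t > now]
        self.table[key] = live          # drop expired entries
        return [v for v, _ in live]

s = TTLStore()
s.put("k", "v", ttl=60, now=0)
print(s.get("k", now=30))   # ['v'] -- still live
print(s.get("k", now=90))   # []   -- expired, space reclaimed
```

A real node would also expire values proactively rather than only on read, so disk is freed even for keys nobody fetches.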
Making Sure Disk is Available
• TTLs prevent long-term starvation
  – Eventually all puts will expire
• Can still get short-term starvation:

[Timeline: Client A arrives and fills the entire disk; Client B arrives and asks for space; B starves until Client A’s values start expiring.]
Making Sure Disk is Available
• Stronger condition: be able to accept rmin bytes/sec of new data at all times

[Figure: space vs. time. Space is reserved for future puts along a line of slope rmin; a candidate put occupies its size in bytes until its TTL expires. The sum of existing data, the candidate put, and the reserved slope must stay below the maximum capacity.]
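One way to state the acceptance test sketched in the figure (the function names and the checkpoint trick are assumptions, not the actual OpenDHT code):

```python
# Sketch of the disk-availability test: a put of `size` bytes with
# lifetime `ttl` is accepted only if, at every future instant tau, the
# bytes still on disk plus the new put plus the space reserved for
# future puts (arriving at rmin bytes/sec) fit under the capacity.
def bytes_still_stored(puts, tau):
    """Bytes of existing data still on disk tau seconds from now
    (counting puts that expire exactly at tau, i.e. the left limit)."""
    return sum(size for size, expiry in puts if expiry >= tau)

def accept_put(puts, size, ttl, rmin, capacity):
    # Disk usage only decreases at expiry instants, while the reserved
    # term rmin * tau grows linearly, so it suffices to check tau = 0,
    # tau = ttl, and each expiry before ttl.
    checkpoints = {0, ttl} | {e for _, e in puts if e < ttl}
    return all(
        bytes_still_stored(puts, tau) + size + rmin * tau <= capacity
        for tau in checkpoints
    )

existing = [(500, 10)]   # 500 bytes that expire 10 seconds from now
print(accept_put(existing, 300, 20, rmin=10, capacity=1000))  # True
print(accept_put(existing, 300, 20, rmin=30, capacity=1000))  # False
```

Rejecting puts that would violate the rmin slope is what guarantees new clients can always get some storage immediately.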
Making Sure Disk is Available

[Figure: two space-vs-time examples of the same acceptance test, each showing a candidate put’s size × TTL footprint against the reserved slope rmin and the maximum capacity.]
Fair Storage Allocation
[Flowchart: each client has its own put queue. A put arriving to a full queue is rejected; otherwise it is enqueued. The node repeatedly selects the most under-represented client, waits until its put can be accepted without violating rmin, then stores the value and sends an accept message to the client.]

The Big Decision: the definition of “most under-represented”.
Defining “Most Under-Represented”

• Not just sharing disk, but disk over time
  – A 1-byte put for 100 s is the same as a 100-byte put for 1 s
  – So the units are byte-seconds; call them commitments
• Equalize the total commitments granted?
  – No: that leads to starvation
  – A fills the disk, B starts putting, and A starves for up to the maximum TTL

[Timeline: Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; now A starves!]
Defining “Most Under-Represented”

• Instead, equalize the rate of commitments granted
  – Service granted to one client depends only on the others putting “at the same time”

[Timeline: Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; thereafter A and B share the available rate.]
Defining “Most Under-Represented”

• Instead, equalize the rate of commitments granted
  – Service granted to one client depends only on the others putting “at the same time”
• Mechanism inspired by Start-time Fair Queuing
  – Maintain a virtual time v(t)
  – The i-th put from client c gets a start time S(p_i^c) and a finish time F(p_i^c):

    F(p_i^c) = S(p_i^c) + size(p_i^c) × ttl(p_i^c)
    S(p_i^c) = max(v(A(p_i^c)) − δ, F(p_{i-1}^c)),  where A(p) is p’s arrival time and δ is a constant
    v(t) = maximum start time of all accepted puts
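A sketch of this bookkeeping, assuming δ is the constant offset in the start-time formula and leaving aside the queue-selection step (all names illustrative):

```python
# Sketch of the start-time fair-queuing tags described above. `delta`
# stands in for the constant offset; all names are illustrative.
class FairQueue:
    def __init__(self, delta=0.0):
        self.delta = delta
        self.v = 0.0                  # virtual time v(t)
        self.last_finish = {}         # client -> F of its previous put

    def admit(self, client, size, ttl):
        """Assign start/finish tags to a put arriving at virtual time v."""
        prev_f = self.last_finish.get(client, 0.0)
        start = max(self.v - self.delta, prev_f)
        finish = start + size * ttl   # commitment = bytes * seconds
        self.last_finish[client] = finish
        return start, finish

    def accept(self, start):
        # v(t) = maximum start time over all accepted puts
        self.v = max(self.v, start)

fq = FairQueue()
# Client A puts steadily; B arrives later and starts at the current
# virtual time, not at zero, so it is not owed A's past commitments.
s, f = fq.admit("A", size=100, ttl=10); fq.accept(s)   # S=0,    F=1000
s, f = fq.admit("A", size=100, ttl=10); fq.accept(s)   # S=1000, F=2000
s, f = fq.admit("B", size=100, ttl=10); fq.accept(s)   # S=1000, not 0
print(fq.v)  # 1000.0
```

A node would serve the queued put with the smallest start tag, which is what equalizes the rate of commitments across active clients.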
Fairness with Different Arrival Times
Fairness With Different Sizes and TTLs
Performance
• Only 28 of 7 million values lost in 3 months
  – Where “lost” means unavailable for a full hour
• On Feb. 7, 2005, lost 60 of 190 nodes in 15 minutes to a PlanetLab kernel bug, yet lost only one value
Performance
• Median get latency ≈ 250 ms
  – Median RTT between hosts ≈ 140 ms
• But the 95th-percentile get latency is atrocious
  – And even the median spikes up from time to time
The Problem: Slow Nodes
• Some PlanetLab nodes are just really slow
  – But the set of slow nodes changes over time
  – Can’t “cherry-pick” a set of fast nodes
  – Seems to be the case on RON as well
  – May even be true for managed clusters (MapReduce)
• Modified OpenDHT to be robust to such slowness
  – Combination of delay-aware routing and redundancy
  – Median get latency now 66 ms; 99th percentile 320 ms, using 2× redundancy