An Overlay Infrastructure for Decentralized Object Location and Routing

Transcript of a talk by Ben Y. Zhao ([email protected])
Computer Science Division, University of California at Berkeley
April 22, 2023
Peer-based Distributed Computing

A cooperative approach to large-scale applications:
- peer-based: available resources scale with the number of participants
- better than client/server, where resources and scalability are limited

Large-scale, cooperative applications are coming:
- content distribution networks (e.g. FastForward)
- large-scale backup / storage utilities that leverage peers' storage for higher resiliency / availability
- cooperative web caching
- application-level multicast
- video on-demand, streaming movies
What Are the Technical Challenges?

File system example: replicate files for resiliency / performance.
- How do you find nearby replicas?
- How does this scale to millions of users? Billions of files?
Node Membership Changes

Nodes join and leave the overlay, or fail:
- data or control state needs to know about available resources
- node membership management is a necessity
A Fickle Internet

Internet disconnections are not rare (UMichTR98, IMC02):
- TCP retransmission is not enough; we need to route around failures
- IP route repair takes too long: IS-IS ~5s, BGP 3-15 minutes
- good end-to-end performance requires fast response to faults
The common challenges across these applications: reliable communication, dynamic node membership algorithms, and efficient, scalable data location.
[Figure: example applications -- FastForward, Yahoo IM, SETI -- each facing these same three needs]
An Infrastructure Approach

The first generation of large-scale apps took a vertical approach: each solved these hard problems itself, and they are difficult to get right. Instead, solve the common challenges once: build a single overlay infrastructure at the application layer.
[Figure: the overlay sits atop the Internet protocol stack -- physical, link, network, transport, session, presentation -- just below the application layer]
Personal Research Roadmap

Tapestry, a DOLR (decentralized object location and routing) system building on PRR 97:
- robust dynamic algorithms (SPAA 02 / TOCS)
- structured overlay APIs (IPTPS 03)
- resilient overlay routing (ICNP 03, JSAC 04)
- WAN deployment (1500+ downloads)
- landmark routing: Brocade (IPTPS 02)

Applications built on Tapestry:
- multicast: Bayeux (NOSSDAV 02)
- file system: OceanStore (ASPLOS 99 / FAST 03)
- spam filtering: SpamWatch (Middleware 03)
- rapid mobility: Warp (IPTPS 04)
- service discovery

Other work: XSet, a lightweight XML DB (Mobicom 99, 5000+ downloads); TSpaces; modeling of non-stationary datasets.
Talk Outline
- Motivation
- Decentralized object location and routing
- Resilient routing
- Tapestry deployment performance
- Wrap-up

What should this infrastructure look like? Here is one appealing direction…
Structured Peer-to-Peer Overlays

- Node IDs and keys come from a randomized namespace (SHA-1)
- Incremental routing towards the destination ID
- Each node has a small set of outgoing routes, e.g. organized by prefix
- log(n) neighbors per node, log(n) hops between any node pair
[Figure: a message addressed to key ABCD is routed by prefix matching through nodes such as A930, AB5F, and ABC0, arriving at ABCE, the closest matching node]
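The prefix-matching step can be sketched in a few lines of Python. The `(level, digit)` routing-table layout and the IDs are illustrative stand-ins, not Tapestry's actual data structures:

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current: str, dest: str, table: dict):
    """One prefix-routing step: from `current`, pick a neighbor that
    matches `dest` in one more leading digit. `table` maps
    (level, next_digit) -> neighbor ID (hypothetical layout)."""
    level = shared_prefix_len(current, dest)   # digits already resolved
    if level == len(dest):
        return None                            # already at the destination
    return table.get((level, dest[level]))
```

Each hop fixes at least one more digit of the destination, which is where the log(n)-hop bound on the slide comes from.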
Related Work

Unstructured peer-to-peer approaches:
- Napster, Gnutella, KaZaa
- probabilistic search (optimized for the hay, not the needle)
- locality-agnostic routing (resulting in high network bandwidth costs)

Structured peer-to-peer overlays:
- the first protocols (2001): Tapestry, Pastry, Chord, CAN
- then: Kademlia, SkipNet, Viceroy, Symphony, Koorde, Ulysseus…
- one distinction: how to choose your neighbors -- Tapestry and Pastry use a latency-optimized routing mesh
- another distinction: the application interface -- distributed hash table: put(key, data); data = get(key); Tapestry: decentralized object location and routing
Defining the Requirements

1. Efficient routing to nodes and data
   - low routing stretch (ratio of overlay latency to shortest-path distance)
2. Flexible data location
   - applications want/need to control data placement
   - allows for application-specific performance optimizations
   - directory interface: publish(ObjID), RouteToObj(ObjID, msg)
3. Resilient and responsive to faults
   - more than just retransmission: route around failures
   - reduce the negative impact (loss/jitter) on the application
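Requirement 2's directory interface can be illustrated with a toy, single-process sketch. The class and method names are hypothetical; real Tapestry stores these pointers in the network along the overlay route to the object's root, rather than in one table:

```python
class DolrDirectory:
    """Toy stand-in for the DOLR directory interface:
    publish(ObjID) and RouteToObj(ObjID, msg)."""

    def __init__(self):
        self.pointers = {}                       # obj_id -> hosting nodes

    def publish(self, obj_id, host):
        """A host announces that it stores a replica of obj_id."""
        self.pointers.setdefault(obj_id, []).append(host)

    def route_to_obj(self, obj_id, msg):
        """Deliver msg to a published replica (the nearest one in
        real Tapestry; simply the first one in this sketch)."""
        hosts = self.pointers.get(obj_id)
        if not hosts:
            raise KeyError(obj_id)
        return hosts[0], msg
```

The key contrast with a DHT is that `publish` records where the application chose to place the data, instead of the hash of the name dictating placement.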
Decentralized Object Location and Routing

- redirect data traffic using log(n) in-network redirection pointers
- average number of pointers per machine: log(n) × average files per machine
- keys to performance: a proximity-enabled routing mesh with routing convergence
[Figure: an object k is announced with publish(k); routeobj(k) messages from clients converge across the backbone onto the publish path and are redirected to the replica]
Why Proximity Routing?

Fewer and shorter IP hops mean shorter end-to-end latency, less bandwidth use and congestion, and less likelihood of crossing broken or lossy links.
Performance Impact (Proximity)

Simulated Tapestry with and without proximity on a 5000-node transit-stub network; measured pair-wise routing stretch between 200 random nodes.

Mean routing stretch, prefix routing with and without proximity:

                 In-LAN   In-WAN   Far-WAN
Ideal (RDP=1)      1.00     1.00     1.00
Proximity          1.46     1.73     1.79
Randomized       108.31    15.81     4.46
DOLR vs. Distributed Hash Table

DHT: the hash of the content name determines replica placement; modifications require replicating the new version back into the DHT.
DOLR: the application places a copy near its requests, and the overlay routes messages to it.
Performance Impact (DOLR)

Simulated Tapestry with DOLR and DHT interfaces on a 5000-node transit-stub network; measured route-to-object latency from clients in 2 stub networks. DHT: 5 object replicas; DOLR: 1 replica placed in each stub network.
[Figure: average routing latency vs. overlay size (64 to 4096 nodes), comparing DHT min / avg / max against DOLR]
Talk Outline
- Motivation
- Decentralized object location and routing
- Resilient and responsive routing
- Tapestry deployment performance
- Wrap-up

How do you get fast responses to faults?
Response time = fault detection + alternate path discovery + time to switch
Fast Response via Static Resiliency

Reducing fault-detection time:
- monitor paths to neighbors with periodic UDP probes
- O(log(n)) neighbors allow higher probe frequency at low bandwidth
- exponentially weighted moving average for link quality estimation
  - avoids route flapping due to short-term loss artifacts
  - loss rate: L_n = (1 - α)·L_{n-1} + α·p

Eliminate synchronous backup path discovery:
- actively maintain redundant paths; redirect traffic immediately
- repair redundancy asynchronously
  - create and store backups at node insertion
  - restore redundancy via random pair-wise queries after failures

End result: fast detection + precomputed paths = increased responsiveness
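The loss estimator can be sketched directly from the formula L_n = (1 - α)·L_{n-1} + α·p. The class name and the usability threshold are illustrative:

```python
class LinkQuality:
    """EWMA loss-rate estimator for one overlay link.
    alpha is the filter constant; p is the instantaneous loss
    indicator from the latest probe (1.0 = probe lost)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.loss = 0.0

    def record_probe(self, lost: bool) -> float:
        """Fold one probe result into the estimate: L_n = (1-a)L_{n-1} + a*p."""
        p = 1.0 if lost else 0.0
        self.loss = (1 - self.alpha) * self.loss + self.alpha * p
        return self.loss

    def usable(self, threshold=0.3) -> bool:
        """A link counts as 'usable' while estimated loss stays below T."""
        return self.loss < threshold
```

A small α smooths out short-term loss artifacts (avoiding route flapping); a larger α reacts to failures faster, trading stability for responsiveness.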
Routing Policies

Use estimated overlay link quality to choose the shortest "usable" link: the shortest overlay link with quality above a minimal threshold T.

Alternative policies prioritize low loss over latency:
- use the least lossy overlay link
- use the path with the minimal "cost function" cf = a·latency + b·loss rate, for weights a and b
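A minimal sketch of these policies, assuming links are given as (latency_ms, loss_rate) pairs; the weights a and b are illustrative defaults, not values from the talk:

```python
def choose_link(links, threshold=0.3, a=1.0, b=100.0):
    """Pick an outgoing overlay link.
    Primary policy: the shortest link whose estimated loss stays
    under the quality threshold T. If no link qualifies, fall back
    to minimizing the cost function cf = a*latency + b*loss_rate."""
    usable = [l for l in links if l[1] < threshold]
    if usable:
        return min(usable, key=lambda l: l[0])          # shortest usable
    return min(links, key=lambda l: a * l[0] + b * l[1])
```

The fallback is what a loss-prioritizing policy looks like; setting a=0 reduces it to "use the least lossy overlay link."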
Talk Outline
- Motivation
- Decentralized object location and routing
- Resilient and responsive routing
- Tapestry deployment performance
- Wrap-up
Tapestry, a DOLR Protocol

- Routing based on incremental prefix matching
- Latency-optimized routing mesh
  - nearest neighbor algorithm (HKRZ02)
  - supports massive failures and large group joins
- Built-in redundant overlay links: 2 backup links maintained with each primary
- "Objects" serve as endpoints for rendezvous; nodes publish names to announce their presence
  - e.g. a wireless proxy publishes a nearby laptop's ID
  - e.g. multicast listeners publish the multicast session name to self-organize
Weaving a Tapestry

Inserting node 0123 into an existing Tapestry network:
1. route to its own ID, find 012X nodes, fill the last column
2. request backpointers to 01XX nodes
3. measure distance, add to the routing table
4. prune to the nearest K nodes
5. repeat steps 2-4 for each shorter prefix
[Figure: routing table of node 0123, one column per prefix level: XXXX (1XXX, 2XXX, 3XXX), 0XXX (00XX, 02XX, 03XX), 01XX (010X, 011X, 013X), 012X (0120, 0121, 0122)]
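The insertion loop can be sketched as follows, under assumed helper APIs for backpointer queries and RTT measurement; none of these names come from the actual Tapestry implementation:

```python
def weave_in(new_id, seed_nodes, backpointers, rtt, K=3):
    """Sketch of insertion steps 2-5 above, assuming:
      backpointers(node, level) -> nodes sharing `level` prefix digits
      rtt(node)                 -> measured distance to node
    seed_nodes are the 012X-level nodes found by routing to own ID."""
    table = {}
    candidates = set(seed_nodes)
    for level in range(len(new_id) - 1, -1, -1):    # 012X, 01XX, 0XXX, XXXX
        # step 2: ask the current candidates for backpointers at this level
        for n in list(candidates):
            candidates |= set(backpointers(n, level))
        # steps 3-4: measure distance to each, prune to the nearest K
        nearest = sorted(candidates, key=rtt)[:K]
        table[level] = nearest
        candidates = set(nearest)                   # step 5: repeat shorter prefix
    return table
```

The survivors at each level seed the search at the next (shorter-prefix) level, which is what keeps the join cost bounded rather than scanning the whole network.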
Implementation Performance

Java implementation: 35,000+ lines in core Tapestry, 1500+ downloads.

Micro-benchmarks:
- per-message overhead ~50 μs; most latency comes from byte copying
- performance scales with CPU speedup
- 5KB messages on a P-IV 2.4GHz: throughput ~10,000 msgs/sec

Routing stretch:
- route to node: < 2
- route to objects/endpoints: < 3, with higher stretch for nearby objects
Responsiveness to Faults (PlanetLab)

- probe bandwidth grows slowly with network size N: ~7KB/s per node at N=300, ~20KB/s at N=10^6
- simulation: if the link failure rate is < 10%, Tapestry can route around 90% of survivable failures
[Figure: time to switch routes (ms) vs. link probe period (0-1200ms) for filter constants α=0.2 and α=0.4; representative switch times fall roughly in the 300-660ms range]
Stability Under Membership Changes

Routing operations on a 40-node Tapestry cluster. Churn: nodes join/leave every 10 seconds, with an average lifetime of 2 minutes.
[Figure: success rate (%) and network size over 30 minutes, through a kill-nodes phase, a large group join, and constant churn]
Talk Outline
- Motivation
- Decentralized object location and routing
- Resilient and responsive routing
- Tapestry deployment performance
- Wrap-up
Lessons and Takeaways

Consider system constraints in algorithm design:
- limited by finite resources (e.g. file descriptors, bandwidth)
- simplicity wins over small performance gains: easier adoption and faster time to implementation

Wide-area state management (e.g. routing state):
- reactive algorithms for best-effort, fast response
- proactive periodic maintenance for correctness

The naive event programming model is too low-level:
- much code complexity comes from managing stack state
- this matters for protocols with asynchronous control algorithms
- need explicit thread support for callbacks / stack management
Future Directions

Ongoing work to explore the p2p application space:
- resilient anonymous routing, attack resiliency

Intelligent overlay construction:
- router-level listeners allow application queries
- efficient meshes, fault-independent backup links, failure notification

Deploying and measuring a lightweight peer-based application:
- focus on usability and low overhead
- p2p incentives, security, and deployment meet the real world

A holistic approach to overlay security and control:
- p2p is good for self-organization, not for security / management
- decouple administration from normal operation
- explicit domains / hierarchy for configuration, analysis, control
Impact of Correlated Events

Web / application servers handle independent requests and maximize individual throughput. Correlated requests (A + B + C → D) instead require an event handler to combine related events; examples include online continuous queries, sensor aggregation, p2p control layers, and streaming data mining.
[Figure: independent requests flow through the network to separate servers, while correlated requests A, B, and C must be gathered and combined by an event handler]
Some Details

Simple fault detection techniques:
- periodically probe overlay links to neighbors
- exponentially weighted moving average for link quality estimation
  - avoids route flapping due to short-term loss artifacts
  - loss rate: L_n = (1 - α)·L_{n-1} + α·p, where p is the instantaneous loss rate and α the filter constant
- other techniques are topics of open research

How do we get and repair the backup links?
- each hop has a flexible routing constraint
  - e.g. in prefix routing, the 1st hop requires just 1 fixed digit
  - backups are always available until the last hop to the destination
- create and store backups at node insertion
- restore redundancy via random pair-wise queries after failures
  - e.g. to replace a 123X neighbor, talk to local 12XX neighbors
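The pair-wise repair step can be sketched as a query over same-prefix peers. The (level, digit) table layout and all names here are assumptions for illustration, matching the 123X / 12XX example above:

```python
def repair_backup(my_id, level, digit, peers, their_tables):
    """Replace a lost (level, digit) routing entry -- e.g. a 123X
    neighbor -- by asking same-prefix peers (the local 12XX
    neighbors) what they hold in that slot.
    their_tables: peer -> {(level, digit): neighbor} (assumed shape)."""
    for peer in peers:
        candidate = their_tables.get(peer, {}).get((level, digit))
        if candidate is not None and candidate != my_id:
            return candidate          # redundancy restored
    return None                       # no peer could supply a replacement
```

This works because any 12XX node's entry for the digit 3 at that level is, by the prefix-routing constraint, a valid 123X neighbor for us too.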
Route Redundancy (Simulator)

Simulation of Tapestry with 2 backup paths per routing entry; 2 backups provide low maintenance overhead and good resiliency.
[Figure: portion of all node pairs reachable vs. proportion of IP links broken (0-0.2), comparing instantaneous IP routing against Tapestry / FRLS]
Another Perspective on Reachability

[Figure: proportion of all paths vs. proportion of IP links broken (0-0.2), partitioned into: pairs where no failure-free path remains; paths where IP and FRLS both route successfully; paths that exist but neither IP nor FRLS can locate; and paths FRLS finds where short-term IP routing fails]
Single Node Software Architecture

[Figure: a node built on the SEDA event-driven framework atop the Java Virtual Machine: the core router with dynamic Tapestry, a distance map (Patchwork), an application programming interface above, applications on top, and the network below]
Related Work

Unstructured peer-to-peer applications:
- Napster, Gnutella, KaZaa: probabilistic search, difficult to scale, inefficient bandwidth use

Structured peer-to-peer overlays:
- Chord, CAN, Pastry, Kademlia, SkipNet, Viceroy, Symphony, Koorde, Coral, Ulysseus, …
- differ in routing efficiency and application interface

Resilient routing:
- traffic redirection layers: Detour, Resilient Overlay Networks (RON), Internet Indirection Infrastructure (I3)
- our goals: scalability, in-network traffic redirection
Node to Node Routing (PlanetLab)

Ratio of end-to-end latency to ping distance between nodes; all node pairs measured and placed into 5ms buckets.
[Figure: RDP (min, median, 90th percentile) vs. internode RTT ping time (0-300ms); median = 31.5, 90th percentile = 135]
Object Location (PlanetLab)

Ratio of end-to-end latency to client-object ping distance; local-area stretch improves with additional location state.
[Figure: RDP (min, median, 90th percentile) vs. client-to-object RTT ping time (1ms buckets, 0-200ms); 90th percentile = 158]
Micro-benchmark Results (LAN)

- per-message overhead ~50 μs; latency dominated by byte copying
- performance scales with CPU speedup
- for 5KB messages, throughput = ~10,000 msgs/sec
[Figure: transmission time (s) and bandwidth (MB/s) vs. message size (0.06KB-2048KB) for a P-III 1GHz and a P-IV 2.4GHz, locally and over 100Mb/s Ethernet, with a P-III-to-P-IV speedup curve of about 2.3×]
Traffic Tunneling

Legacy nodes A and B (identified by IP address) each register with a proxy on the structured peer-to-peer overlay. The overlay stores a mapping from each end host's IP to its proxy's overlay ID: put(hash(A), P'(A)) and put(hash(B), P'(B)). To reach B, a get(hash(B)) lookup returns P'(B), and traffic is tunneled through B's proxy. This is similar to the approach in Internet Indirection Infrastructure (I3).
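A minimal sketch of the registration and lookup steps, using a plain dict as the put/get overlay and Python's built-in hash in place of the overlay's SHA-1:

```python
def register(dht, host_ip, proxy_id, h=hash):
    """The slide's put(hash(ip), P'(ip)): record that legacy host
    host_ip is reachable via overlay proxy proxy_id.
    `dht` is any put/get table; `h` stands in for SHA-1."""
    dht[h(host_ip)] = proxy_id

def tunnel(dht, dest_ip, msg, h=hash):
    """Resolve get(hash(dest_ip)) -> P'(dest_ip) and hand msg to
    the destination's proxy for forwarding."""
    return dht[h(dest_ip)], msg
```

Because only hash(IP) keys are stored, legacy hosts need no overlay membership of their own; their proxies do the overlay routing on their behalf.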
Constrained Multicast

- used only when all paths are below the quality threshold
- send duplicate messages on multiple paths and leverage route convergence:
  - assign unique message IDs and mark duplicates
  - keep a moving window of IDs; recognize and drop duplicates

Limitations:
- assumes loss is not from congestion
- ideal for local-area routing
[Figure: a message duplicated across multiple overlay paths (through nodes such as 2046, 2281, 2530) converging toward destination 1111]
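The moving-window duplicate filter can be sketched as follows; the window size is an assumed parameter, not a value from the talk:

```python
from collections import deque

class DuplicateFilter:
    """Moving window of recently seen message IDs. Duplicates sent
    along multiple converging paths are recognized and dropped;
    old IDs age out so the window stays bounded."""

    def __init__(self, window=1024):
        self.order = deque()          # IDs in arrival order
        self.seen = set()             # fast membership test
        self.window = window

    def accept(self, msg_id) -> bool:
        """True if msg_id is new (deliver it); False if a duplicate (drop)."""
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        self.order.append(msg_id)
        if len(self.order) > self.window:
            self.seen.discard(self.order.popleft())   # evict the oldest ID
        return True
```

The bounded window is what makes this safe at the routers where duplicated paths converge: memory stays constant, at the cost that a very late duplicate arriving after eviction would be delivered twice.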
Link Probing Bandwidth (PlanetLab)

Bandwidth increases logarithmically with overlay size; medium-sized routing overlays incur low probing bandwidth.
[Figure: bandwidth per node (KB/s, 0-7) vs. size of overlay (1-1000), for probe periods of 300ms and 600ms]