
An Overlay Infrastructure for Decentralized Object Location and Routing

Ben Y. Zhao (ravenben@eecs.berkeley.edu)

University of California at Berkeley, Computer Science Division

Peer-based Distributed Computing

Cooperative approach to large-scale applications
o peer-based: available resources scale with the number of participants
o better than client/server: limited resources & scalability

Large-scale, cooperative applications are coming
o content distribution networks (e.g. FastForward)
o large-scale backup / storage utilities: leverage peers' storage for higher resiliency / availability
o cooperative web caching
o application-level multicast: video on-demand, streaming movies

What Are the Technical Challenges?

File system example: replicate files for resiliency/performance
o how do you find nearby replicas?
o how does this scale to millions of users? billions of files?

Node Membership Changes

Nodes join and leave the overlay, or fail
o data or control state needs to know about available resources
o node membership management is a necessity

A Fickle Internet

Internet disconnections are not rare (UMichTR98, IMC02)
o TCP retransmission is not enough; we need to route around failures
o IP route repair takes too long: IS-IS ~5 s, BGP 3-15 mins
o good end-to-end performance requires fast response to faults

[Figure: today each large-scale application (FastForward, Yahoo IM, SETI) separately rebuilds reliable communication, dynamic node membership algorithms, and efficient, scalable data location.]

An Infrastructure Approach

First generation of large-scale apps took a vertical approach: hard problems, difficult to get right
o instead, solve the common challenges once
o build a single overlay infrastructure at the application layer

[Figure: the overlay inserted as a layer into the protocol stack (physical, link, network, transport, session, presentation, overlay, application), running over the Internet.]

Personal Research Roadmap

[Figure: research roadmap, roughly as follows.]
o Tapestry, a DOLR building on PRR 97: robust dynamic algorithms (SPAA 02 / TOCS), structured overlay APIs (IPTPS 03), resilient overlay routing (ICNP 03), WAN deployment with 1500+ downloads (JSAC 04); landmark routing via Brocade (IPTPS 02)
o applications: multicast (Bayeux, NOSSDAV 02), file system (OceanStore, ASPLOS 99 / FAST 03), spam filtering (SpamWatch, Middleware 03), rapid mobility (Warp, IPTPS 04)
o earlier work: service discovery service, XSet lightweight XML DB (Mobicom 99, 5000+ downloads), TSpaces, modeling of non-stationary datasets

Talk Outline
o Motivation
o Decentralized object location and routing
o Resilient routing
o Tapestry deployment performance
o Wrap-up

What should this infrastructure look like?

here is one appealing direction…


Structured Peer-to-Peer Overlays

o node IDs and keys drawn from a randomized namespace (SHA-1)
o incremental routing toward the destination ID
o each node keeps a small set of outgoing routes, e.g. prefix routing
o log(n) neighbors per node, log(n) hops between any node pair

[Figure: prefix routing example, a message for key ABCD is forwarded through nodes matching progressively longer prefixes (e.g. A930, AB5F, ABC0) and delivered at node ABCE.]
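As a concrete illustration of the prefix-matching rule above, here is a minimal Java sketch (illustrative names, not the actual Tapestry API), assuming 160-bit SHA-1 IDs written as hex strings: each hop hands the message to a neighbor that matches at least one more digit of the destination ID than the current node does.

    import java.math.BigInteger;
    import java.security.MessageDigest;
    import java.util.*;

    public class PrefixRoutingSketch {
        // Derive a 160-bit ID (40 hex digits) from an arbitrary name, as with SHA-1.
        static String idFor(String name) throws Exception {
            byte[] d = MessageDigest.getInstance("SHA-1").digest(name.getBytes("UTF-8"));
            return String.format("%040x", new BigInteger(1, d));
        }

        // Length of the shared hex-digit prefix between two IDs.
        static int sharedPrefix(String a, String b) {
            int i = 0;
            while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) i++;
            return i;
        }

        // Pick a neighbor matching more digits of the destination than we do;
        // resolving one more digit per hop gives O(log n) hops between any node pair.
        static String nextHop(String localId, String destId, List<String> neighbors) {
            int bestLen = sharedPrefix(localId, destId);
            String best = null;
            for (String n : neighbors) {
                int len = sharedPrefix(n, destId);
                if (len > bestLen) { bestLen = len; best = n; }
            }
            return best;  // null: no neighbor is closer, this node is the root for the ID
        }

        public static void main(String[] args) throws Exception {
            String local = idFor("nodeA"), dest = idFor("some-object");
            List<String> neighbors = Arrays.asList(idFor("nodeB"), idFor("nodeC"), idFor("nodeD"));
            System.out.println("next hop: " + nextHop(local, dest, neighbors));
        }
    }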


Related Work

Unstructured peer-to-peer approaches
o Napster, Gnutella, KaZaa
o probabilistic search (optimized for the hay, not the needle)
o locality-agnostic routing (resulting in high network bandwidth costs)

Structured peer-to-peer overlays
o the first protocols (2001): Tapestry, Pastry, Chord, CAN; then: Kademlia, SkipNet, Viceroy, Symphony, Koorde, Ulysseus, …
o distinction: how to choose your neighbors (Tapestry, Pastry: latency-optimized routing mesh)
o distinction: application interface (distributed hash table: put(key, data), data = get(key); Tapestry: decentralized object location and routing)

Defining the Requirements

1. efficient routing to nodes and data
   o low routing stretch (ratio of overlay latency to shortest-path distance)
2. flexible data location
   o applications want/need to control data placement
   o allows for application-specific performance optimizations
   o directory interface: publish(ObjID), RouteToObj(ObjID, msg); a minimal sketch follows this list
3. resilient and responsive to faults
   o more than just retransmission: route around failures
   o reduce the negative impact (loss/jitter) on the application
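To make the interface distinction concrete, here is a minimal Java sketch of what a DOLR-style directory interface might look like next to a DHT-style one; the names are illustrative, not the actual Tapestry or DHT APIs.

    // DOLR: the application keeps the data and announces where it lives.
    public interface DolrSketch {
        // Announce that the local node stores a copy of the object; the overlay
        // installs location pointers along the path toward the object's root.
        void publish(byte[] objectId);

        // Route a message toward the object; it is delivered to a nearby replica
        // rather than to a fixed home node.
        void routeToObject(byte[] objectId, byte[] message);
    }

    // DHT: the overlay decides where the data lives (the node owning hash(key)).
    interface DhtSketch {
        void put(byte[] key, byte[] data);
        byte[] get(byte[] key);
    }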


Decentralized Object Location & Routing

o redirect data traffic using log(n) in-network redirection pointers
o average # of pointers per machine: log(n) × avg files per machine
o keys to performance: a proximity-enabled routing mesh with routing convergence

[Figure: a server publishes object k (publish(k)); clients' routeobj(k) messages travel over the backbone toward k's root and are redirected to the server as soon as they meet one of its pointers.]
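The following Java sketch (illustrative names, a single-process map standing in for the overlay) shows the mechanism the figure describes: publish() leaves a pointer at every hop on the path toward the object's root, and routeToObj() walks its own path toward the root, short-cutting to the replica at the first pointer it meets.

    import java.util.*;

    public class DolrPointersSketch {
        // Per overlay node: objectId -> node holding a replica.
        static Map<String, Map<String, String>> pointers = new HashMap<>();

        // Stand-in for prefix routing: a fixed path from a node toward the object's root.
        // Paths from different nodes converge, which is why the short-cut works.
        static List<String> pathToRoot(String from, String objectId) {
            return Arrays.asList(from, "nodeAB", "nodeABC", "root(" + objectId + ")");
        }

        static void publish(String serverNode, String objectId) {
            for (String hop : pathToRoot(serverNode, objectId)) {
                pointers.computeIfAbsent(hop, k -> new HashMap<>()).put(objectId, serverNode);
            }
        }

        static String routeToObj(String clientNode, String objectId) {
            for (String hop : pathToRoot(clientNode, objectId)) {
                Map<String, String> local = pointers.get(hop);
                if (local != null && local.containsKey(objectId)) {
                    return local.get(objectId);   // redirect here, before reaching the root
                }
            }
            return null;  // not published
        }

        public static void main(String[] args) {
            publish("serverX", "k");
            System.out.println("client routed to: " + routeToObj("clientY", "k"));
        }
    }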


Why Proximity Routing?

Fewer/shorter IP hops: shorter e2e latency, less bandwidth/congestion, less likely to cross broken/lossy links


Performance Impact (Proximity)

Simulated Tapestry with and without proximity on a 5000-node transit-stub network; measured pair-wise routing stretch between 200 random nodes.

Prefix routing with and without proximity (mean routing stretch):

                     In-LAN    In-WAN    Far-WAN
    Ideal (RDP=1)      1         1         1
    Proximity          1.46      1.73      1.79
    Randomized       108.31     15.81      4.46

DOLR vs. Distributed Hash Table

DHT: the hash of the content name determines replica placement; modifications require replicating the new version into the DHT.

DOLR: the application places copies near requests, and the overlay routes messages to them.

Performance Impact (DOLR)

Simulated Tapestry with DOLR and DHT interfaces on a 5000-node transit-stub network; measured route-to-object latency from clients in 2 stub networks. DHT: 5 object replicas; DOLR: 1 replica placed in each stub network.

[Chart: average routing latency vs. overlay size (64 to 4096 nodes), comparing DHT min/avg/max with DOLR.]

Talk Outline
o Motivation
o Decentralized object location and routing
o Resilient and responsive routing
o Tapestry deployment performance
o Wrap-up

How do you get fast responses to faults?

Response time = fault-detection + alternate path discovery + time to switch


Fast Response via Static Resiliency

Reducing fault-detection time
o monitor paths to neighbors with periodic UDP probes
o O(log(n)) neighbors: higher probe frequency at low bandwidth
o exponentially weighted moving average for link quality estimation (see the sketch below)
  avoids route flapping due to short-term loss artifacts
  loss rate: L_n = (1 − α)·L_{n−1} + α·p

Eliminate synchronous backup path discovery
o actively maintain redundant paths, redirect traffic immediately, repair redundancy asynchronously
o create and store backups at node insertion
o restore redundancy via random pair-wise queries after failures

End result: fast detection + precomputed paths = increased responsiveness
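A minimal Java sketch of the EWMA link-quality estimator referenced above (illustrative class name; the probe plumbing is an assumption, not the Tapestry code):

    public class LinkQualityEstimator {
        private final double alpha;     // filter constant, e.g. 0.2 or 0.4
        private double lossRate = 0.0;  // L_n, the smoothed loss estimate

        public LinkQualityEstimator(double alpha) { this.alpha = alpha; }

        // p = instantaneous loss rate observed over the last round of UDP probes (0.0 to 1.0).
        public void update(double p) {
            lossRate = (1 - alpha) * lossRate + alpha * p;   // L_n = (1 - α)·L_{n-1} + α·p
        }

        public double lossRate() { return lossRate; }

        public static void main(String[] args) {
            LinkQualityEstimator est = new LinkQualityEstimator(0.2);
            for (double p : new double[] {0.0, 0.0, 1.0, 0.0}) est.update(p);
            System.out.printf("smoothed loss rate = %.3f%n", est.lossRate());
        }
    }

A larger α reacts faster to fresh losses but is more prone to flapping; a smaller α smooths short-term artifacts at the cost of slower detection, which is the trade-off behind the α = 0.2 and α = 0.4 curves shown later.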


Routing Policies

Use estimated overlay link quality to choose the shortest "usable" link
o use the shortest overlay link whose quality exceeds a threshold T

Alternative policies prioritize low loss over latency
o use the least lossy overlay link
o use the path with minimal "cost function" cf = x·latency + y·loss rate (see the sketch below)


Talk Outline
o Motivation
o Decentralized object location and routing
o Resilient and responsive routing
o Tapestry deployment performance
o Wrap-up

Tapestry, a DOLR Protocol

o routing based on incremental prefix matching
o latency-optimized routing mesh: nearest-neighbor algorithm (HKRZ02) supports massive failures and large group joins
o built-in redundant overlay links: 2 backup links maintained with each primary
o use "objects" as endpoints for rendezvous
  nodes publish names to announce their presence
  e.g. a wireless proxy publishes a nearby laptop's ID
  e.g. multicast listeners publish the multicast session name to self-organize

Weaving a Tapestry

Inserting node 0123 into an existing Tapestry network (a sketch of these steps follows the figure note):
1. route to own ID, find 012X nodes, fill the last routing-table column
2. request backpointers to 01XX nodes
3. measure distance, add to the rTable
4. prune to the nearest K nodes
5. repeat steps 2-4 for each shorter prefix

[Figure: node 0123's routing table levels (XXXX, 0XXX, 01XX, 012X) filled with neighbors such as 1XXX/2XXX/3XXX, 00XX/02XX/03XX, 010X/011X/013X, and 0120/0121/0122.]
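A high-level Java sketch of the insertion steps above (illustrative interface and names; the network calls are placeholders, not the Tapestry wire protocol): fill each routing-table level from candidate nodes and their backpointers, sort by measured distance, and keep only the nearest K.

    import java.util.*;

    public class NodeInsertSketch {
        interface Net {
            List<String> nodesMatchingPrefix(String prefix);          // e.g. "012" -> known 012X nodes
            List<String> backpointersOf(String node, String prefix);  // step 2: who points at this node
            double rtt(String node);                                  // step 3: measured distance
        }

        static Map<Integer, List<String>> buildRoutingTable(String selfId, Net net, int k) {
            Map<Integer, List<String>> rTable = new HashMap<>();
            // Start from the longest matching prefix (route to own ID) and work backwards (step 5).
            for (int level = selfId.length() - 1; level >= 0; level--) {
                String prefix = selfId.substring(0, level);
                List<String> candidates = new ArrayList<>(net.nodesMatchingPrefix(prefix));     // step 1
                for (String n : new ArrayList<>(candidates)) {
                    candidates.addAll(net.backpointersOf(n, prefix));                           // step 2
                }
                candidates.sort(Comparator.comparingDouble(net::rtt));                          // step 3
                rTable.put(level, new ArrayList<>(
                        candidates.subList(0, Math.min(k, candidates.size()))));                // step 4
            }
            return rTable;
        }
    }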


Implementation Performance

Java implementation: 35,000+ lines in core Tapestry, 1500+ downloads

Micro-benchmarks
o per-message overhead ~50 μs, most latency from byte copying
o performance scales with CPU speedup: 5 KB messages on a P-IV 2.4 GHz give throughput of ~10,000 msgs/sec

Routing stretch
o route to node: < 2
o route to objects/endpoints: < 3 (higher stretch for nearby objects)

Responsiveness to Faults (PlanetLab)

Probing bandwidth stays low as the network grows: roughly 7 KB/s per node at N = 300, about 20 KB/s at N = 10^6. Simulation: if the link failure rate is under 10%, Tapestry can route around 90% of survivable failures.

[Chart: time to switch routes (ms) vs. link probe period (ms) on PlanetLab, for EWMA filter constants α = 0.2 and α = 0.4; annotated switch times of roughly 300 ms and 660 ms.]

Stability Under Membership Changes

Routing operations on a 40-node Tapestry cluster; churn: nodes join/leave every 10 seconds, average lifetime = 2 minutes.

[Chart: routing success rate (%) and network size over a 30-minute run that includes killing nodes, a large group join, and constant churn.]

Talk Outline
o Motivation
o Decentralized object location and routing
o Resilient and responsive routing
o Tapestry deployment performance
o Wrap-up

Lessons and Takeaways

Consider system constraints in algorithm design
o limited by finite resources (e.g. file descriptors, bandwidth)
o simplicity wins over small performance gains: easier adoption and faster time to implementation

Wide-area state management (e.g. routing state)
o reactive algorithm for best-effort, fast response
o proactive periodic maintenance for correctness

A naïve event programming model is too low-level
o much code complexity comes from managing stack state
o important for protocols with asynchronous control algorithms
o need explicit thread support for callbacks / stack management

Future Directions

Ongoing work to explore the p2p application space
o resilient anonymous routing, attack resiliency

Intelligent overlay construction
o router-level listeners allow application queries
o efficient meshes, fault-independent backup links, failure notification

Deploying and measuring a lightweight peer-based application
o focus on usability and low overhead
o p2p incentives, security, and deployment meet the real world

A holistic approach to overlay security and control
o p2p is good for self-organization, not for security/management
o decouple administration from normal operation
o explicit domains / hierarchy for configuration, analysis, control

Thanks!

Questions, comments?

ravenben@eecs.berkeley.edu


Impact of Correlated Events

Web / application servers handle independent requests and maximize individual throughput. Correlated requests (A + B + C → D), e.g. online continuous queries, sensor aggregation, a p2p control layer, or streaming data mining, must instead be combined by an event handler.

[Figure: independent requests arriving from the network vs. an event handler combining correlated events A, B, and C into a single result.]

Some Details

Simple fault detection techniques
o periodically probe overlay links to neighbors
o exponentially weighted moving average for link quality estimation
  avoids route flapping due to short-term loss artifacts
  loss rate: L_n = (1 − α)·L_{n−1} + α·p, where p = instantaneous loss rate and α = filter constant
o other techniques are topics of open research

How do we get and repair the backup links?
o each hop has a flexible routing constraint: in prefix routing, the 1st hop only requires 1 fixed digit, so backups are always available until the last hop to the destination
o create and store backups at node insertion
o restore redundancy via random pair-wise queries after failures, e.g. to replace a 123X neighbor, talk to local 12XX neighbors (see the sketch below)
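A small Java sketch of that repair step (illustrative interface, not the Tapestry code): when a 123X backup fails, query the surviving neighbors that share the shorter 12XX prefix for any other 123X nodes they know, and adopt one as the replacement.

    import java.util.*;

    public class BackupRepairSketch {
        interface Peer {
            List<String> neighborsWithPrefix(String prefix);  // remote query, e.g. "123"
        }

        static String findReplacement(String failedId, String neededPrefix,
                                      List<Peer> sameLevelNeighbors) {
            for (Peer p : sameLevelNeighbors) {               // random pair-wise queries
                for (String candidate : p.neighborsWithPrefix(neededPrefix)) {
                    if (!candidate.equals(failedId)) {
                        return candidate;                     // redundancy restored
                    }
                }
            }
            return null;  // no replacement known yet; retry on the next maintenance round
        }
    }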


Route Redundancy (Simulator)

Simulation of Tapestry with 2 backup paths per routing entry; 2 backups give low maintenance overhead and good resiliency.

[Chart: portion of all node pairs reachable vs. proportion of IP links broken (0 to 0.2), comparing instantaneous IP routing with Tapestry / FRLS.]

Another Perspective on Reachability

[Chart: proportion of all paths vs. proportion of IP links broken (0 to 0.2), split into regions: paths where IP and FRLS both route successfully; paths where FRLS finds a route while short-term IP routing fails; paths that exist but that neither IP nor FRLS can locate; and pair-wise paths where no failure-free path remains.]

Single Node Software Architecture

[Figure: applications sit on an application programming interface above the core router, the dynamic Tapestry component, the distance map, and Patchwork, all built on the SEDA event-driven framework inside a Java Virtual Machine, over the network.]

Related Work

Unstructured peer-to-peer applications
o Napster, Gnutella, KaZaa: probabilistic search, difficult to scale, inefficient bandwidth use

Structured peer-to-peer overlays
o Chord, CAN, Pastry, Kademlia, SkipNet, Viceroy, Symphony, Koorde, Coral, Ulysseus, …
o distinctions: routing efficiency, application interface

Resilient routing
o traffic redirection layers: Detour, Resilient Overlay Networks (RON), Internet Indirection Infrastructure (I3)
o our goals: scalability, in-network traffic redirection

Node to Node Routing (PlanetLab)

Ratio of end-to-end latency to ping distance between nodes

All node pairs measured, placed into buckets

[Chart: RDP (min, median, 90th percentile) vs. internode RTT ping time in 5 ms buckets (0 to 300 ms); annotated values: median = 31.5, 90th percentile = 135.]

Object Location (PlanetLab)

Ratio of end-to-end latency to client-object ping distance; local-area stretch improves with additional location state.

[Chart: RDP (min, median, 90th percentile) vs. client-to-object RTT ping time in 1 ms buckets (0 to 200 ms); annotated 90th percentile = 158.]

Micro-benchmark Results (LAN)

[Chart: per-message transmission time (s) vs. message size (0.06 KB to 2048 KB, log scale) on P-III 1 GHz and P-IV 2.4 GHz hosts, with a P-III curve scaled by the 2.3x CPU speedup for comparison.]

Per-message overhead ~50 μs, latency dominated by byte copying; performance scales with CPU speedup; for 5 KB messages, throughput ~10,000 msgs/sec.

[Chart: bandwidth (MB/s) vs. message size (KB) for P-III 1 GHz local, P-IV 2.4 GHz local, and P-IV 2.4 GHz over 100 Mb Ethernet, which is capped by the 100 Mb/s line.]

Traffic Tunneling

Store a mapping from each end host's IP address to its proxy's overlay ID; similar to the approach in Internet Indirection Infrastructure (I3).

[Figure: legacy nodes A and B (A, B are IP addresses) each register with a proxy; the proxies insert put(hash(A), P'(A)) and put(hash(B), P'(B)) into the structured peer-to-peer overlay, and A's proxy performs get(hash(B)) to obtain P'(B) and tunnel A's traffic to B.]
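A minimal Java sketch of that registration/lookup step (illustrative names; an in-memory map stands in for the overlay's put/get):

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.*;

    public class TunnelRegistrySketch {
        static Map<String, String> overlay = new HashMap<>();  // stand-in for the DHT layer

        static String hashKey(String ip) throws Exception {
            byte[] d = MessageDigest.getInstance("SHA-1").digest(ip.getBytes(StandardCharsets.UTF_8));
            return String.format("%040x", new BigInteger(1, d));
        }

        // Proxy P'(B) registers on behalf of legacy node B: put(hash(B), P'(B)).
        static void register(String legacyIp, String proxyOverlayId) throws Exception {
            overlay.put(hashKey(legacyIp), proxyOverlayId);
        }

        // A's proxy looks up where to tunnel traffic destined for B: get(hash(B)).
        static String lookupProxy(String legacyIp) throws Exception {
            return overlay.get(hashKey(legacyIp));
        }

        public static void main(String[] args) throws Exception {
            register("10.0.0.2", "proxyB-overlay-id");
            System.out.println("tunnel B's traffic via: " + lookupProxy("10.0.0.2"));
        }
    }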


Constrained Multicast

Used only when all paths are below the quality threshold
o send duplicate messages on multiple paths, leveraging route convergence
o assign unique message IDs to mark duplicates; keep a moving window of IDs to recognize and drop duplicates (see the sketch below)

Limitations
o assumes the loss is not from congestion
o ideal for local-area routing
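A short Java sketch of the duplicate-suppression window (illustrative class, not the Tapestry implementation): each message carries a unique ID, and a bounded window of recently seen IDs lets a receiver drop the extra copies arriving over redundant paths.

    import java.util.*;

    public class DuplicateFilterSketch {
        private final int windowSize;
        private final Deque<Long> order = new ArrayDeque<>();  // IDs in arrival order
        private final Set<Long> seen = new HashSet<>();        // fast membership test

        public DuplicateFilterSketch(int windowSize) { this.windowSize = windowSize; }

        // Returns true if the message should be delivered, false if it is a duplicate.
        public boolean accept(long messageId) {
            if (seen.contains(messageId)) return false;  // duplicate from another path
            seen.add(messageId);
            order.addLast(messageId);
            if (order.size() > windowSize) {             // slide the window forward
                seen.remove(order.removeFirst());
            }
            return true;
        }

        public static void main(String[] args) {
            DuplicateFilterSketch filter = new DuplicateFilterSketch(1024);
            System.out.println(filter.accept(42L));  // true: first copy delivered
            System.out.println(filter.accept(42L));  // false: duplicate dropped
        }
    }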


Link Probing Bandwidth (PL)

[Chart: probing bandwidth per node (KB/s) vs. overlay size (log scale, up to ~1000 nodes) on PlanetLab, for probe periods of 300 ms and 600 ms.]

Bandwidth increases logarithmically with overlay size; medium-sized routing overlays incur low probing bandwidth.