Resilient Overlay Networks (RON)
WORK BY: ANDERSEN, BALAKRISHNAN, KAASHOEK, AND MORRIS
APPEARED IN: SOSP OCT 2001
PRESENTED BY: MATT TROWER AND MARK OVERHOLT
SOME IMAGES ARE TAKEN FROM THE ORIGINAL PRESENTATION
Background
Overlay network: a network built on top of another network. Examples: the Internet (over the telephone network), Gnutella (over the Internet), RON (over the Internet + Internet2)
AUP: Acceptable Use Policy (the educational Internet2 cannot carry commercial traffic)
Motivation
BGP is slow to converge (~10 minutes); TCP timeouts are typically less than 512 seconds
End-to-end paths are unavailable 3.3% of the time
5% of outages last more than 2 hours
Failures are caused by configuration errors, cut lines (ships), and DoS attacks
What if we need more than four 9's of reliability?
Sidenote
Andersen started an ISP in Utah before going to MIT
Research → Industry
Goals
Detect failures and recover in less than 20s
Integrate routing decisions with application needs
Expressive policy routing: per-user rate controls OR packet-based AUPs
Big Idea
Use inherent diversity of paths to provide link redundancy
Use failure detectors for a small subset of nodes in the network
Select the best path from multiple options
Triangle Inequality
Best Path ≠ Direct Path
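The "best path ≠ direct path" observation can be sketched as a tiny relay search: compare the direct link against every single-intermediate detour, as RON does (it found one-hop redirection is usually enough). The latency matrix and site names below are hypothetical, not measurements from the paper.

```python
# Sketch of RON's core idea: the best overlay path may relay through an
# intermediate node rather than use the direct Internet route.
# Latencies (ms) below are made up for illustration.

def best_one_hop_path(latency, src, dst):
    """Return (path, cost), choosing between the direct link and every
    single-intermediate relay."""
    best = ([src, dst], latency[src][dst])
    for mid in latency:
        if mid in (src, dst):
            continue
        cost = latency[src][mid] + latency[mid][dst]
        if cost < best[1]:
            best = ([src, mid, dst], cost)
    return best

latency = {
    "MIT":  {"MIT": 0,   "Utah": 80, "CMU": 120},
    "Utah": {"MIT": 80,  "Utah": 0,  "CMU": 25},
    "CMU":  {"MIT": 120, "Utah": 25, "CMU": 0},
}
path, cost = best_one_hop_path(latency, "MIT", "CMU")
print(path, cost)  # ['MIT', 'Utah', 'CMU'] 105 -- the detour beats the 120 ms direct link
```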
Design
C++ application-level library (send/recv interface)
Components per node: Conduit, Forwarder, Router, Prober
Performance Database
Application-specific routing tables
Policy routing module
Failure Detection
Send probes over the other N−1 links (O(N²) probes system-wide)
Probe interval: 12 seconds
Probe timeout: 3 seconds
Routing update interval: 14 seconds
Send 3 fast retries on failure
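The failure-detection rule above can be sketched as follows, using the parameters from the talk (3-second probe timeout, 3 fast retries on loss). `send_probe` is a stand-in for a real network probe, not part of RON's actual API.

```python
# Sketch of RON-style failure detection: a lost probe triggers fast
# retries, and the link is declared down only if all of them fail.

PROBE_INTERVAL = 12.0   # seconds between routine probes (unused in this one-shot sketch)
PROBE_TIMEOUT = 3.0     # seconds before a probe is considered lost
FAST_RETRIES = 3        # immediate retries before declaring failure

def link_is_down(send_probe):
    """Probe once; on loss, fire FAST_RETRIES back-to-back probes.
    The link is declared down only if every retry also fails."""
    if send_probe(timeout=PROBE_TIMEOUT):
        return False
    return not any(send_probe(timeout=PROBE_TIMEOUT)
                   for _ in range(FAST_RETRIES))

# Simulated probe that fails the first 2 attempts, then succeeds:
attempts = iter([False, False, True, True])
print(link_is_down(lambda timeout: next(attempts)))  # False: a retry got through
```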
Overhead
Acceptable for small cliques
Experimental Setup
Two configurations:
RON1: 12 hosts in the US and Europe, a mix of academic and industry sites; 64 hours collected in March 2001
RON2: the original 12 hosts plus 4 new hosts; 85 hours collected in May 2001
Analysis – Packet Loss
(RON1, UDP)
Analysis - Latency
(RON1, UDP)
Analysis - Throughput
(RON1, TCP)
Resiliency to DOS Attacks
Done on a 3-node system
ACKs take a different path
Utah's Network Emulation Testbed
Pros & Cons
Pros:
Recovers from complete outages and severe congestion
Doubles throughput in 5% of samples
Single-hop redirection is sufficient
Cons:
Valid paths are not always considered (e.g., Cisco → MIT → NC-Cable → CMU)
N² growth limits scalability
Routed ACKs
Discussion
How does geographic distribution affect RON?
Why doesn’t BGP do this already?
What if everyone was part of a RON?
What if all latencies fall below 20ms?
WORK BY: RATNASAMY, FRANCIS, HANDLEY, AND KARP
APPEARED IN: SIGCOMM ‘01
A Scalable Content-Addressable Network
Distributed Hash Table
A decentralized system that provides a key-value lookup service across a distributed system.
DHTs support insertion, lookup, and deletion of data.
(Image from Wikipedia)
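The insert/lookup/delete contract can be sketched with a toy partitioned table: any participant can deterministically compute which node stores a key. Node names and the URL are illustrative; a real DHT locates the owner via overlay routing rather than a global node list.

```python
# Toy DHT interface: hash the key to pick the owning node, then do the
# operation in that node's local table. Names here are hypothetical.
import hashlib

NODES = ["node-a", "node-b", "node-c"]
STORE = {n: {} for n in NODES}   # each node's local key-value table

def owner(key):
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]  # real DHTs route to the owner instead

def insert(key, value):
    STORE[owner(key)][key] = value

def lookup(key):
    return STORE[owner(key)].get(key)

def delete(key):
    STORE[owner(key)].pop(key, None)

insert("song.mp3", "http://peer7/song.mp3")
print(lookup("song.mp3"))  # http://peer7/song.mp3
```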
Content Addressable Network
CAN is a design for a distributed hash table.
CAN was one of the original four DHT proposals, introduced concurrently with Chord, Pastry, and Tapestry.
Motivation
P2P applications were at the forefront of CAN's design.
P2P apps were scalable in their file transfers, but not very scalable in their indexing of content.
CAN was originally designed as a scalable index for P2P content.
Its use was not limited to P2P apps, however: it could also serve large-scale storage systems, content distribution systems, or DNS.
CAN: Basic Design
The Overlay Network is a Cartesian Coordinate Space on a d-torus (The coordinate system wraps around).
Each node of the CAN is assigned a “Zone” of the d-dimensional space to manage.
Each Node only has knowledge of nodes in Neighboring Zones.
Assume for now that each zone has only one node.
Example
Example
Neighbor lists:
Zone 1 knows about Zones 4, 5, and 3
Zone 2 knows about Zones 4, 5, and 3
Zone 3 knows about Zones 5, 1, and 2
Zone 4 knows about Zones 2 and 1
Zone 5 knows about Zones 1, 2, and 3
Inserting Data in a CAN
Given a (key, value) pair, hash the key d different ways, where d is the number of dimensions.
The resulting coordinate is mapped onto the overlay.
The node responsible for that coordinate stores the (key, value) pair.
Example
Given (key, value):
HashX(key) = x-coordinate
HashY(key) = y-coordinate
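The d-way hashing above can be sketched by salting one hash per dimension; the paper does not prescribe this exact construction, so treat it as one plausible choice.

```python
# Sketch of mapping a key to a point in CAN's d-dimensional unit space
# by hashing the key d different ways (one salted hash per dimension).
import hashlib

def key_to_point(key, d=2):
    coords = []
    for dim in range(d):
        h = hashlib.sha1(f"{dim}:{key}".encode()).hexdigest()
        coords.append(int(h, 16) / 16 ** len(h))  # normalize into [0, 1)
    return tuple(coords)

point = key_to_point("song.mp3")
print(len(point), all(0.0 <= c < 1.0 for c in point))  # 2 True
```

The same key always maps to the same point, so any node can recompute where a (key, value) pair lives.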
Routing in a CAN
A routing message hops from node to node, getting closer and closer to the destination.
A node only knows about its immediate neighbors.
Routing path length is (d/4)(n^(1/d)) hops.
As d approaches log(n), the total path length goes to log(n).
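The greedy step can be sketched as: measure distance on the wraparound (torus) space and forward to whichever neighbor is closest to the destination point. The zone centers below are hypothetical.

```python
# Sketch of CAN greedy routing: forward to the neighbor whose zone
# center is closest (on the torus) to the destination point.

def torus_dist(a, b):
    """Euclidean distance in [0,1)^d with wraparound in each dimension."""
    return sum(min(abs(x - y), 1 - abs(x - y)) ** 2
               for x, y in zip(a, b)) ** 0.5

def next_hop(my_center, neighbors, dest):
    """Pick a neighbor strictly closer to dest than we are, if any."""
    best = min(neighbors, key=lambda n: torus_dist(neighbors[n], dest))
    if torus_dist(neighbors[best], dest) < torus_dist(my_center, dest):
        return best
    return None  # no closer neighbor: we own the destination's zone

neighbors = {"n1": (0.25, 0.75), "n2": (0.75, 0.25), "n3": (0.75, 0.75)}
print(next_hop((0.25, 0.25), neighbors, (0.9, 0.9)))  # n3
```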
Adding Nodes in a CAN
A new node N asks a bootstrap node for the IP of any node S in the system.
N picks a random point P in the coordinate space; P lies in the zone managed by node D.
Using CAN routing, route from S to D.
Node D splits its zone and gives half to N to manage.
Update the neighbor lists in all nodes neighboring D and N, including D and N.
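The split in the join steps above can be sketched for 2-D zones; splitting along the longer side is one reasonable policy (the paper alternates dimensions in a fixed order, which behaves similarly for square-ish zones).

```python
# Sketch of the zone split when node N joins: D halves its own zone and
# hands one half to N. Here we split the longer side of a 2-D rectangle.

def split_zone(zone):
    """zone = ((xmin, xmax), (ymin, ymax)); returns (D's half, N's half)."""
    (x0, x1), (y0, y1) = zone
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        return ((x0, mid), (y0, y1)), ((mid, x1), (y0, y1))
    mid = (y0 + y1) / 2
    return ((x0, x1), (y0, mid)), ((x0, x1), (mid, y1))

d_zone, n_zone = split_zone(((0.0, 0.5), (0.0, 1.0)))
print(d_zone, n_zone)  # ((0.0, 0.5), (0.0, 0.5)) ((0.0, 0.5), (0.5, 1.0))
```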
Example
Node N routes to the zone containing point P.
The zone is split between the new node N and the old node D.
Node Removal
Need to repair the routing in case of a leaving node or a dead node.
If one of the neighboring zones can merge with the empty zone and maintain rectilinear integrity, it does so.
If not, the neighboring node with the smallest zone attempts to take over the zone of the dead node.
Each node independently sends a "Takeover" message to its neighbors.
If a node receives a Takeover message, it cancels its timer if the sending zone is smaller, or it sends a Takeover message of its own if the sending zone is bigger.
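The Takeover race above can be sketched as a comparison of zone volumes: the node with the smaller zone wins the claim, and everyone else cancels. The timer mechanics are omitted; only the message-handling decision is shown.

```python
# Sketch of the TAKEOVER race: neighbors of a failed node wait a time
# proportional to their own zone volume, so the smallest zone tends to
# claim first; on hearing from a smaller zone, a node stands down.

def zone_volume(zone):
    (x0, x1), (y0, y1) = zone
    return (x1 - x0) * (y1 - y0)

def on_takeover_msg(my_zone, sender_zone):
    """Decide how to react to a neighbor's TAKEOVER message."""
    if zone_volume(sender_zone) <= zone_volume(my_zone):
        return "cancel-timer"       # sender is the better candidate
    return "send-own-takeover"      # our zone is smaller: contest the claim

small = ((0.0, 0.25), (0.0, 0.25))
big = ((0.0, 0.5), (0.0, 0.5))
print(on_takeover_msg(big, small))   # cancel-timer
print(on_takeover_msg(small, big))   # send-own-takeover
```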
Proposed Improvements
Multi-dimensioned coordinate spaces
Multiple realities: multiple overlapping coordinate spaces
Better routing metrics
Multiple nodes per zone
Multiple hash functions (replication)
Geographically sensitive overlay
Experimental Data
Comparisons
Comparing Basic CAN to “Knobs on Full” CAN
Discussion - Pros
Using some of the improvements makes CAN a very robust routing and storage protocol.
Using geographic location in the overlay creation would create smarter hops between close nodes. (But what about a geographically centralized disaster?)
Discussion - Cons
Not much work on load-balancing the keys
When all of the extra features run at once, CAN becomes quite complicated
Tough to guarantee a uniform distribution of keys with hash functions at large scale
Query correctness
WORK BY: ROWSTRON AND DRUSCHEL
APPEARED IN: MIDDLEWARE ‘01
Pastry
The Problem
Maintain overlay network for both arrivals and failures
Load Balancing
Network proximity sensitive routing
Pastry
Lookup/insert in O(log N) hops
Per-node state: O(log N)
Network proximity-based routing
Design
(Diagram: objIds and nodeIds placed on a circular id space from 0 to 2^128 − 1)
Design
(Diagram: the same id space, highlighting the owner of obj, the node whose nodeId is closest to objId)
Lookup Table
Prefix matching based on Plaxton routing
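The prefix-matching step can be sketched as: forward to a known node whose id shares a strictly longer prefix with the key than ours does. Ids here are short hex strings for illustration (Pastry uses 128-bit ids in base 2^b digits), and the numeric tie-break is a simplification of Pastry's full rule.

```python
# Sketch of Pastry/Plaxton-style prefix routing: each hop extends the
# shared prefix between the current node's id and the key.

def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route_step(my_id, known_ids, key):
    """Pick a next hop with a longer shared prefix, else None (we own it)."""
    mine = shared_prefix_len(my_id, key)
    better = [n for n in known_ids if shared_prefix_len(n, key) > mine]
    if not better:
        return None
    # Tie-break by numeric closeness (simplified from Pastry's rule).
    return min(better, key=lambda n: abs(int(n, 16) - int(key, 16)))

known = ["d471f1", "d46a1c", "65a1fc"]
print(route_step("d13da3", known, "d46a1b"))  # d46a1c
```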
Locality of Search
Search widens as prefix match becomes longer!
Parameters
b: tradeoff between local storage and average hop count
L: resiliency of routing
Security
Choose next hop for routes randomly amongst choices
Replicate data to nearby nodes
Scalability
Distance
Discussion
What does bigger leafset gain you?
How do we decide proximity?
What other features might we want to create a lookup table based upon?