Resilient Overlay Networks (RON)

WORK BY: ANDERSEN, BALAKRISHNAN, KAASHOEK, AND MORRIS APPEARED IN: SOSP OCT 2001 PRESENTED BY: MATT TROWER AND MARK OVERHOLT SOME IMAGES ARE TAKEN FROM ORIGINAL PRESENTATION Resilient Overlay Networks (RON)


Page 1: Resilient Overlay Networks (RON)

WORK BY: ANDERSEN, BALAKRISHNAN, KAASHOEK, AND MORRIS

APPEARED IN: SOSP OCT 2001

PRESENTED BY: MATT TROWER AND MARK OVERHOLT

SOME IMAGES ARE TAKEN FROM ORIGINAL PRESENTATION

Resilient Overlay Networks (RON)

Page 2: Resilient Overlay Networks (RON)

2

Background

Overlay network: a network built on top of another network

Examples: the Internet (over the telephone network), Gnutella (over the Internet), RON (over the Internet + Internet2)

AUP: Acceptable Use Policy (educational Internet2 can’t be used for commercial traffic)

Page 3: Resilient Overlay Networks (RON)

3

Motivation

BGP is slow to converge (~10 minutes)

TCP timeouts are typically less than 512 seconds

End-to-End paths unavailable 3.3% of the time

5% of outages last more than 2 hours

Failures are caused by configuration errors, cut lines (e.g., by ships), and DoS attacks

What if we need more than four 9’s (99.99%) of reliability?

Page 4: Resilient Overlay Networks (RON)

4

Sidenote

Andersen started an ISP in Utah before going to MIT

Research / Industry

Page 5: Resilient Overlay Networks (RON)

5

Goals

Detect failures and recover in less than 20s

Integrate routing decisions with application needs

Expressive Policy Routing: per-user rate controls or packet-based AUPs

Page 6: Resilient Overlay Networks (RON)

6

Big Idea

Use inherent diversity of paths to provide link redundancy

Use failure detectors for a small subset of nodes in the network

Select best path from multiple options

Page 7: Resilient Overlay Networks (RON)

7

Triangle Inequality

Best Path ≠ Direct Path
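The triangle-inequality point can be made concrete with a small sketch. Given measured link latencies, it compares the direct path against every path through exactly one intermediate node, which is the single-hop redirection RON relies on. The node names, latency values, and the function name `best_path` are hypothetical and are not part of RON's API.

```python
from typing import Dict, List, Tuple

# Hypothetical latency matrix (ms): LATENCY[a][b] is the measured latency a -> b.
LATENCY: Dict[str, Dict[str, float]] = {
    "A": {"B": 100.0, "C": 30.0},
    "B": {"A": 100.0, "C": 40.0},
    "C": {"A": 30.0, "B": 40.0},
}


def best_path(src: str, dst: str,
              latency: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Return the lowest-latency route: direct, or via exactly one intermediate node."""
    best: List[str] = [src, dst]
    best_cost = latency[src].get(dst, float("inf"))  # inf if direct link is down
    for hop in latency:
        if hop in (src, dst):
            continue
        first = latency[src].get(hop)
        second = latency[hop].get(dst)
        if first is None or second is None:
            continue  # one of the two virtual links is missing or failed
        if first + second < best_cost:
            best, best_cost = [src, hop, dst], first + second
    return best, best_cost


if __name__ == "__main__":
    print(best_path("A", "B", LATENCY))  # (['A', 'C', 'B'], 70.0)
```

Here the indirect route A -> C -> B (30 + 40 ms) beats the 100 ms direct link, so the best path is not the direct path.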

Page 8: Resilient Overlay Networks (RON)

8

Design

C++ application-level library (send/recv interface)

Modules on each node: Prober, Router, Forwarder, Conduit

Performance Database

Application-specific routing tables; policy routing module

Page 9: Resilient Overlay Networks (RON)

9

Failure Detection

Each node sends probes over its N-1 virtual links: $O(N^2)$ probes across the whole RON

Probe interval: 12 seconds

Probe timeout: 3 seconds

Routing update interval: 14 seconds

Send 3 fast retries on failure
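The following is a minimal sketch of the probe-and-retry logic above. It assumes each probe round sends one regular probe and, if that probe times out, up to 3 fast retries before the link is marked down. `LinkDetector`, `probe_round`, and `seconds_until_declared_down` are hypothetical names. The sketch omits latency and loss bookkeeping.

```python
from dataclasses import dataclass
from typing import Callable

PROBE_TIMEOUT_S = 3.0  # from the slide: how long to wait for each reply
FAST_RETRIES = 3       # from the slide: extra probes sent right after a loss


@dataclass
class LinkDetector:
    """Hypothetical per-link failure detector (one per virtual link)."""
    alive: bool = True

    def probe_round(self, got_reply: Callable[[int], bool]) -> int:
        """Run one probe round and return how many probes were sent.

        got_reply(attempt) reports whether probe `attempt` was answered within
        PROBE_TIMEOUT_S. Attempt 0 is the regular probe sent every probe
        interval; attempts 1..FAST_RETRIES are the fast retries.
        """
        for attempt in range(1 + FAST_RETRIES):
            if got_reply(attempt):
                self.alive = True
                return attempt + 1
        self.alive = False  # regular probe and every fast retry were lost
        return 1 + FAST_RETRIES


def seconds_until_declared_down() -> float:
    # Assumption: probes go out one after another, each waiting a full timeout.
    return (1 + FAST_RETRIES) * PROBE_TIMEOUT_S


if __name__ == "__main__":
    link = LinkDetector()
    print(link.probe_round(lambda attempt: attempt == 2), link.alive)  # 3 True
    print(link.probe_round(lambda attempt: False), link.alive)         # 4 False
```

Under the sequential-timeout assumption, a dead link is declared down about 4 × 3 = 12 seconds after its regular probe is sent.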

Page 10: Resilient Overlay Networks (RON)

10

Overhead

Acceptable for small cliques
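A back-of-the-envelope sketch shows why probing suits only small cliques. It counts probe traffic alone: each of the N nodes probes its N-1 peers once per 12-second interval. `PROBE_BYTES` is a hypothetical packet size, not a figure from the paper. Routing updates (sent every 14 seconds) are ignored.

```python
from typing import Tuple

PROBE_INTERVAL_S = 12.0  # from the slides
PROBE_BYTES = 100        # hypothetical probe packet size, not from the paper


def probe_load(n: int) -> Tuple[float, float]:
    """Return (probes/s sent by one node, probe bytes/s across the whole RON)."""
    per_node_pps = (n - 1) / PROBE_INTERVAL_S  # one probe to each peer per interval
    total_bps = n * per_node_pps * PROBE_BYTES
    return per_node_pps, total_bps


if __name__ == "__main__":
    for n in (4, 16, 64):
        pps, total = probe_load(n)
        print(f"N={n:3d}: {pps:5.2f} probes/s per node, {total:9.1f} B/s total")
```

Going from 16 to 64 nodes multiplies total probe traffic by 64·63 / (16·15) = 16.8, which is the quadratic growth the slide warns about.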

Page 11: Resilient Overlay Networks (RON)

11

Experimental Setup

Two Configurations Ron1: 12 hosts in US and Europe, mix of academic

and industry 64 hours collected in March 2001

Ron2: Original 12 hosts plus 4 new hosts 85 hours collected in May 2001

Page 12: Resilient Overlay Networks (RON)

12

Analysis – Packet Loss

RON1 (UDP)

Page 13: Resilient Overlay Networks (RON)

13

Analysis - Latency

RON1 (UDP)

Page 14: Resilient Overlay Networks (RON)

14

Analysis - Throughput

RON1 (TCP)

Page 15: Resilient Overlay Networks (RON)

15

Resiliency to DOS Attacks

Done on a 3-node system

ACKs take a different path

Utah’s Network Emulation Testbed

Page 16: Resilient Overlay Networks (RON)

16

Pros & Cons

Pros

Recovers from complete outages and severe congestion

Doubles throughput in 5% of samples

Single-hop redirection is sufficient

Cons

Valid paths are not always considered (e.g., Cisco -> MIT -> NC-Cable -> CMU)

Growth limits scalability

Route ACKs

Page 17: Resilient Overlay Networks (RON)

17

Discussion

How does geographic distribution affect RON?

Why doesn’t BGP do this already?

What if everyone was part of a RON?

What if all latencies fall below 20ms?

Page 18: Resilient Overlay Networks (RON)

18

WORK BY: RATNASAMY, FRANCIS, HANDLEY, AND KARP

APPEARED IN: SIGCOMM ’01

A Scalable Content-Addressable Network

Page 19: Resilient Overlay Networks (RON)

19

Distributed Hash Table

A decentralized system that provides a key-value lookup service across many nodes.

DHTs support Insert, Lookup, and Deletion of Data

Image from Wikipedia

Page 20: Resilient Overlay Networks (RON)

20

Content Addressable Network

CAN is a design for a distributed hash table. CAN was one of the original four DHT proposals; it was introduced concurrently with Chord, Pastry, and Tapestry.

Page 21: Resilient Overlay Networks (RON)

21

Motivation

P2P applications were the driving force behind CAN's design.

P2P apps scaled well in their file transfers, but not in their indexing of content.

CAN was originally designed as a scalable index for P2P content.

However, CAN is not limited to P2P apps: it could also be used in large-scale storage systems, content distribution systems, or DNS.

Page 22: Resilient Overlay Networks (RON)

22

CAN: Basic Design

The Overlay Network is a Cartesian Coordinate Space on a d-torus (The coordinate system wraps around).

Each node of the CAN is assigned a “Zone” of the d-dimensional space to manage.

Each Node only has knowledge of nodes in Neighboring Zones.

Assume for now that each zone has only one node.

Page 23: Resilient Overlay Networks (RON)

23

Example

Page 24: Resilient Overlay Networks (RON)

24

Example

Neighbor Lists:

Zone 1 knows about Zones 4, 5, and 3

Zone 2 knows about Zones 4, 5, and 3

Zone 3 knows about Zones 5, 1, and 2

Zone 4 knows about Zones 2 and 1

Zone 5 knows about Zones 1, 2, and 3

Page 25: Resilient Overlay Networks (RON)

25

Inserting Data in a CAN

Given a (Key,Value) Pair, hash the Key d different ways, where d is the # of Dimensions

The resulting coordinate is mapped onto the Overlay.

The node responsible for that coordinate stores the (Key,Value) pair.
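Here is a sketch of the insertion step. It assumes a unit coordinate space [0, 1)^d and builds the d hash functions by salting SHA-1 with the dimension index; the slides do not say which hash functions CAN uses. For brevity it scans a global list of zones, whereas a real CAN node would route the (Key,Value) pair to the owning zone with CAN routing. `Zone`, `key_to_point`, and `insert` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def key_to_point(key: str, d: int) -> Point:
    """Hash `key` d different ways into [0, 1)^d.

    Assumption: hash function i is SHA-1 over "i:" + key; taking 48 bits
    keeps every coordinate strictly below 1.0.
    """
    coords = []
    for i in range(d):
        digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
        coords.append(int.from_bytes(digest[:6], "big") / 2**48)
    return tuple(coords)


@dataclass
class Zone:
    owner: str  # node responsible for this zone
    lo: Point   # inclusive lower corner
    hi: Point   # exclusive upper corner

    def contains(self, p: Point) -> bool:
        return all(low <= x < high for low, x, high in zip(self.lo, p, self.hi))


def insert(zones: List[Zone], store: Dict[str, Dict[str, str]],
           key: str, value: str) -> str:
    """Store (key, value) at the node whose zone contains the key's point."""
    p = key_to_point(key, len(zones[0].lo))
    for zone in zones:
        if zone.contains(p):
            store.setdefault(zone.owner, {})[key] = value
            return zone.owner
    raise ValueError("zones do not cover the whole coordinate space")


if __name__ == "__main__":
    quadrants = [Zone("A", (0.0, 0.0), (0.5, 0.5)), Zone("B", (0.5, 0.0), (1.0, 0.5)),
                 Zone("C", (0.0, 0.5), (0.5, 1.0)), Zone("D", (0.5, 0.5), (1.0, 1.0))]
    store: Dict[str, Dict[str, str]] = {}
    owner = insert(quadrants, store, "song.mp3", "10.0.0.7")
    print(owner, store[owner])
```

A lookup uses the same `key_to_point` mapping, so any node can compute where a key lives without a central index.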

Page 26: Resilient Overlay Networks (RON)

26

Example

Given (Key,Value):

HashX(Key) = Xcoord

HashY(Key) = Ycoord

Page 27: Resilient Overlay Networks (RON)

27

Routing in a CAN

A routing message hops from node to node, getting closer and closer to the destination.

A node only knows about its immediate neighbors

Average routing path length is $(d/4)\,n^{1/d}$ hops

As d approaches log(n), the total path length grows only as O(log n).
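An idealized sketch can check the path-length formula. It assumes n = k^d nodes whose zones are equal cells of a k × k × ... × k grid on the d-torus. Greedy forwarding moves one zone per hop toward the destination, taking the shorter way around each wrapped dimension. For even k, the average hop count over all source/destination pairs is exactly d·k/4 = $(d/4)\,n^{1/d}$. The grid layout and the function names are assumptions for illustration, since real CAN zones vary in size.

```python
import itertools
from typing import List, Tuple

Cell = Tuple[int, ...]  # zone index along each of the d dimensions


def next_hop(cur: Cell, dst: Cell, k: int) -> Cell:
    """Greedy step: move one zone along the first dimension that still differs,
    going whichever way around the torus is shorter."""
    for dim, (c, t) in enumerate(zip(cur, dst)):
        if c != t:
            forward = (t - c) % k  # hops needed if moving in the + direction
            step = 1 if forward <= k - forward else -1
            nxt = list(cur)
            nxt[dim] = (c + step) % k
            return tuple(nxt)
    return cur  # already at the destination


def route(src: Cell, dst: Cell, k: int) -> List[Cell]:
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, k))
    return path


def average_hops(d: int, k: int) -> float:
    """Mean hop count over all ordered (source, destination) pairs of k**d zones."""
    zones = list(itertools.product(range(k), repeat=d))
    total = sum(len(route(s, t, k)) - 1 for s in zones for t in zones)
    return total / len(zones) ** 2


if __name__ == "__main__":
    for d, k in [(2, 4), (2, 8), (3, 4)]:
        # k equals n ** (1 / d), so (d / 4) * k is the slide's formula.
        print(f"d={d}, n={k ** d}: measured {average_hops(d, k)}, formula {d * k / 4}")
```

Raising d shortens paths, but each node then tracks about 2d neighbors. This is the state versus path-length tradeoff behind the log(n) remark.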

Page 28: Resilient Overlay Networks (RON)

28

Adding Nodes in a CAN

A new node, N, asks a bootstrap node for the IP address of any node already in the system, S.

N picks a random point, P, in the coordinate space; P lies in the zone managed by node D.

Using CAN routing, route from S to D. Node D splits its zone and gives half to N to manage.

Update the neighbor lists of all nodes neighboring D and N, including D and N themselves.
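Here is a sketch of the zone split D performs when N joins. It assumes zones are axis-aligned boxes and that D halves its zone along its widest dimension; starting from a square zone, this alternates x, y, x, and so on. Giving N the half that contains P is also an assumption of the sketch. Updating the neighbor lists is left out.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, ...]


@dataclass
class Zone:
    owner: str
    lo: Point  # inclusive lower corner
    hi: Point  # exclusive upper corner

    def contains(self, p: Point) -> bool:
        return all(low <= x < high for low, x, high in zip(self.lo, p, self.hi))

    def volume(self) -> float:
        v = 1.0
        for low, high in zip(self.lo, self.hi):
            v *= high - low
        return v


def split(zone: Zone, new_owner: str, p: Point) -> Tuple[Zone, Zone]:
    """Halve `zone` along its widest dimension; the half holding P goes to N.

    Both the widest-dimension rule and giving N the half with P are
    assumptions of this sketch.
    """
    dim = max(range(len(zone.lo)), key=lambda i: zone.hi[i] - zone.lo[i])
    mid = (zone.lo[dim] + zone.hi[dim]) / 2
    lower_hi = list(zone.hi)
    lower_hi[dim] = mid
    upper_lo = list(zone.lo)
    upper_lo[dim] = mid
    lower = Zone(zone.owner, zone.lo, tuple(lower_hi))
    upper = Zone(zone.owner, tuple(upper_lo), zone.hi)
    if lower.contains(p):
        lower.owner = new_owner
    else:
        upper.owner = new_owner
    return lower, upper


if __name__ == "__main__":
    d_zone = Zone("D", (0.0, 0.0), (1.0, 1.0))
    lower, upper = split(d_zone, "N", (0.7, 0.2))
    print(lower)  # D keeps x in [0, 0.5)
    print(upper)  # N gets x in [0.5, 1), the half containing P
```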

Page 29: Resilient Overlay Networks (RON)

29

Example

Node N routes to the zone containing point P

The zone is split between the new node, N, and the old node, D.

Page 30: Resilient Overlay Networks (RON)

30

Node Removal

Need to repair the routing in case of a leaving node or a dead node.

If one of the neighboring zones can merge with the empty zone and maintain Rectilinear Integrity, it does so.

If not, the neighboring Node with the smallest zone attempts to Takeover the zone of the dead node.

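Each neighbor independently starts a timer proportional to the volume of its own zone; when the timer expires, it sends a “Takeover” message to the dead node's neighbors.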

If a node receives a Takeover message, it cancels its timer if the sending zone is smaller, or it sends a takeover message of its own if it is bigger.
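A small event loop can simulate the takeover race. The sketch makes these assumptions:

- timers are proportional to zone volume, with a hypothetical scale factor;
- there is a fixed, hypothetical one-way message delay;
- ties between equal zones are broken by node name (the slides do not say how ties are resolved).

The neighbor with the smallest zone ends up as the only claimant that never backs off.

```python
import heapq
from typing import Dict, List, Set, Tuple

TIMER_SCALE_S = 10.0  # hypothetical: timer seconds per unit of zone volume
MSG_DELAY_S = 1.0     # hypothetical: fixed one-way message delay


def takeover(volumes: Dict[str, float]) -> str:
    """Simulate the Takeover race among the dead node's neighbors.

    `volumes` maps each neighbor to the volume of its own zone. Returns the
    neighbor that ends up taking over the dead node's zone.
    """
    events: List[Tuple[float, int, str, str, str]] = []  # (time, seq, kind, node, sender)
    counter = 0
    sent: Set[str] = set()     # neighbors that broadcast a Takeover message
    yielded: Set[str] = set()  # neighbors that backed off after a better claim

    def push(t: float, kind: str, node: str, sender: str = "") -> None:
        nonlocal counter
        heapq.heappush(events, (t, counter, kind, node, sender))
        counter += 1

    def broadcast(t: float, sender: str) -> None:
        sent.add(sender)
        for other in volumes:
            if other != sender:
                push(t + MSG_DELAY_S, "msg", other, sender)

    def claim(node: str) -> Tuple[float, str]:
        return (volumes[node], node)  # smaller zone wins; name breaks ties

    for node, vol in volumes.items():
        push(vol * TIMER_SCALE_S, "timer", node)  # timer proportional to volume

    while events:
        t, _, kind, node, sender = heapq.heappop(events)
        if node in yielded:
            continue  # this neighbor has cancelled its timer
        if kind == "timer":
            if node not in sent:
                broadcast(t, node)
        elif claim(sender) < claim(node):
            yielded.add(node)  # sender's zone is smaller: cancel and back off
        elif node not in sent:
            broadcast(t, node)  # our zone is smaller: send our own Takeover

    (winner,) = sent - yielded
    return winner


if __name__ == "__main__":
    print(takeover({"A": 0.25, "B": 0.125, "C": 0.5}))  # B
```

Making timers proportional to volume means the smallest zone usually claims first. That tends to keep zone sizes balanced after failures.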

Page 31: Resilient Overlay Networks (RON)

31

Proposed Improvements

Multi-Dimensioned Coordinate Spaces

Multiple Realities: Multiple Overlapping Coordinate Spaces

Better Routing Metrics

Multiple Nodes per Zone

Multiple Hash Functions (Replication)

Geographically Sensitive Overlay

Page 32: Resilient Overlay Networks (RON)

32

Experimental Data

Page 33: Resilient Overlay Networks (RON)

33

Comparisons

Comparing Basic CAN to “Knobs on Full” CAN

Page 34: Resilient Overlay Networks (RON)

34

Discussion - Pros

Using some of the improvements makes CAN a very robust routing and storage protocol.

Using geographic location in the overlay creation would create smarter hops between close nodes. (But what about a geographically centralized disaster?)

Page 35: Resilient Overlay Networks (RON)

35

Discussion - Cons

Not much work on load-balancing the keys

When all of the extra features are running at once, CAN becomes quite complicated

Tough to guarantee a uniform distribution of keys with hash functions at large scale

Query correctness

Page 36: Resilient Overlay Networks (RON)

36

WORK BY: ROWSTRON AND DRUSCHEL

APPEARED IN: MIDDLEWARE ’01

Pastry

Page 37: Resilient Overlay Networks (RON)

37

The Problem

Maintain overlay network for both arrivals and failures

Load Balancing

Network proximity sensitive routing

Page 38: Resilient Overlay Networks (RON)

38

Pastry

Lookup/insert O(logN)

Per-node state O(logN)

Network proximity-based routing

Page 39: Resilient Overlay Networks (RON)

39

Design

objId

nodeIds

ID space: $0$ to $2^{128}-1$ (circular)

Page 40: Resilient Overlay Networks (RON)

40

Design

objId

nodeIds

ID space: $0$ to $2^{128}-1$ (circular)

Owner of obj
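Here is a sketch of the ownership rule in the diagram. An object is stored at the node whose nodeId is numerically closest to its objId in the 128-bit circular ID space. Deriving IDs from SHA-1 of a name (`make_id`) and breaking ties toward the smaller ID are assumptions for illustration.

```python
import hashlib
from typing import List

ID_BITS = 128
ID_SPACE = 1 << ID_BITS  # ids range over 0 .. 2**128 - 1


def make_id(name: str) -> int:
    """Hypothetical id assignment: the first 128 bits of SHA-1(name)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:16], "big")


def ring_distance(a: int, b: int) -> int:
    """Numerical distance between two ids on the circular id space."""
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def owner(obj_id: int, node_ids: List[int]) -> int:
    """The nodeId numerically closest to obj_id (ties go to the smaller id)."""
    return min(node_ids, key=lambda n: (ring_distance(n, obj_id), n))


if __name__ == "__main__":
    nodes = [make_id(f"node-{i}") for i in range(8)]
    obj = make_id("report.pdf")
    print(f"{obj:032x} -> {owner(obj, nodes):032x}")
```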

Page 41: Resilient Overlay Networks (RON)

41

Lookup Table

Prefix matching based on Plaxton Routing
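Here is a simplified sketch of one prefix-routing step. It assumes b = 4, so IDs are written as hex digits. It also uses a flat list of nodes this node knows about instead of Pastry's separate routing table and leaf set. The step forwards to the known node sharing the longest prefix with the key. Among equal prefixes it prefers the numerically closer node, using plain absolute distance and ignoring wrap-around. The 4-digit IDs are made up.

```python
from typing import Iterable, Optional, Tuple


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, key: str, known: Iterable[str]) -> Optional[str]:
    """One prefix-routing step (simplified; ids are equal-length hex strings).

    Prefer a known node whose id shares a longer prefix with `key` than
    `current` does; among equal prefixes prefer one numerically closer to
    `key`. Returns None when `current` is the best node it knows of.
    """
    def rank(node: str) -> Tuple[int, int]:
        return (shared_prefix_len(node, key), -abs(int(node, 16) - int(key, 16)))

    best: Optional[str] = None
    best_rank = rank(current)
    for node in known:
        if rank(node) > best_rank:
            best, best_rank = node, rank(node)
    return best


if __name__ == "__main__":
    known = ["65a1", "d13d", "d4f2", "d462"]
    print(next_hop("65a1", "d46a", known))  # d462
```

Each hop lengthens the matched prefix, which is why lookups take a logarithmic number of hops.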

Page 42: Resilient Overlay Networks (RON)

42

Locality of Search

Search widens as prefix match becomes longer!

Page 43: Resilient Overlay Networks (RON)

43

Parameters

b: tradeoff between local storage and average hop count

L (leaf set size): resiliency of routing
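The b tradeoff can be quantified. Pastry's routing table has $\lceil \log_{2^b} N \rceil$ rows of $2^b - 1$ entries each, and a lookup takes about $\lceil \log_{2^b} N \rceil$ hops. The sketch below evaluates both for a hypothetical network of one million nodes. Leaf-set entries, which are governed by L, are not counted.

```python
from typing import Tuple


def table_rows(n: int, b: int) -> int:
    """Smallest r with (2**b)**r >= n, i.e. ceil(log_{2^b} n), using integers only."""
    base, rows, reach = 2 ** b, 0, 1
    while reach < n:
        reach *= base
        rows += 1
    return rows


def pastry_tradeoff(n: int, b: int) -> Tuple[int, int]:
    """Return (routing-table entries, approximate lookup hops) for n nodes."""
    rows = table_rows(n, b)
    return rows * (2 ** b - 1), rows


if __name__ == "__main__":
    for b in (1, 2, 4):
        entries, hops = pastry_tradeoff(1_000_000, b)
        print(f"b={b}: {entries:3d} table entries, ~{hops} hops")
```

Raising b from 1 to 4 cuts lookups from 20 hops to 5, while the table grows from 20 to 75 entries.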

Page 44: Resilient Overlay Networks (RON)

44

Security

Choose next hop for routes randomly amongst choices

Replicate data to nearby nodes

Page 45: Resilient Overlay Networks (RON)

45

Scalability

Page 46: Resilient Overlay Networks (RON)

46

Distance

Page 47: Resilient Overlay Networks (RON)

47

Discussion

What does a bigger leaf set gain you?

How do we decide proximity?

What other features might we want to create a lookup table based upon?