MIT Laboratory for Computer Science
Transcript of MIT Laboratory for Computer Science
Resilient Overlay Networks
David G. Andersen
Hari Balakrishnan, M. Frans Kaashoek, Robert Morris
MIT Laboratory for Computer Science
October 2001
http://nms.lcs.mit.edu/ron/
The Internet Abstraction
����
��������
����
��������
� Any-to-any communication
Resilient Overlay Networks Slide 1
The Internet Abstraction
��������
��������
����
����
� Any-to-any communication
transparently routing around failures
Resilient Overlay Networks Slide 2
How Robust is Internet Routing?Paxson
95-97� 3.3% of routes had “serious problems”
Labovitz
97,00
� 10% of routes available < 95% of time
� 65% of routes available < 99.9% of time
� 3-min minimum detect+recover time;
often 15 minutes
� 40% of outages took 30+ mins to repair
Chandra
01
� 5% of faults > 2.75 hours
Resilient Overlay Networks Slide 3
The Internet has Redundancy
� Traceroute between 12 hosts,showing Autonomous Systems (AS’s)
AS5650
AS6521AS1239
AS226
AS3967
AS210AS701
UTREP
AS3
AS7015AS1742
AS145
AS1
AS7922 AS7018
AS3561
AMNAP
AS209
AS1103
AS3756
AS6453
AS1785
AS3356AS8297
AS5050
AS9
AS13790
AS702
AS26
AS1200
AS9057 AS8709
AS7280
AS13649
AS6114
AS1790
CCI
Utah
ArosSightpath
NYU
CA−T1
CMU
MA−Cable
MIT
VU−NL
CornellvBNSNYSERNet
Abilene
Known private peering
Resilient Overlay Networks Slide 4
How Robust is Internet Routing?
✔ Scales well
✘ Suffers slow outage detection and recovery
Internet backbone routing also cannot:
� Detect badly performing paths
� Efficiently leverage redundant paths
� Multi-home small customers
� Express sophisticated routing policy / metrics
➔ We’d like to fix these shortcomings
Resilient Overlay Networks Slide 5
GoalImprove communication availability
for small (3-50 node)communities:
� Collaboration and conferencing
� Virtual Private Networks (VPNs)
� 5 friends who want better service...
Interest in improving communication betweenany
members of the community
Resilient Overlay Networks Slide 6
RON: Routing around Internet Failures
��������
��������
����
����
The Internet takes a while to re-route
Resilient Overlay Networks Slide 7
RON: Routing around Internet Failures
��������
��������
����
����
The Internet takes a while to re-route
... Cooperating hosts in different routing domains
can do better by re-routing through a peer node
Resilient Overlay Networks Slide 8
Overlays
� Old idea in networks
✔ Easily deployed
✔ Lets Internet focus on scalability
✔ Keep functionality betweenactive peers
Resilient Overlay Networks Slide 9
The Approach
Probes and Routing
��������
��������
����
����
� Frequently measureall inter-node paths
� Exchange routing information
� Route along app-specific best path
consistent with routing policy
Resilient Overlay Networks Slide 10
Architecture: Probing
��������
��������
����
����
➔ Probe between nodes, determine path qualities
– O�
N2�
probe traffic with active probes
– Passive measurements
Resilient Overlay Networks Slide 11
Probing and Outage Detection
Record "success" with RTT 6
Node A Node BInitial Ping
Response 1
Response 2
ID 5: time 10
ID 5: time 15
ID 5: time 33
ID 5: time 39
Record "success" with RTT 5
� Probe every random(14) seconds
� 3 packets, both sides get RTT and reachability
� If “lost probe,” send next immediatelyTimeout based on RTT and RTT variance
� If N lost probes, notify outage
Resilient Overlay Networks Slide 12
Architecture: Performance Database
Node 2
Performance Database
Probes
Performance Database
Probes
Node 1���� ��
������
������������
� Probe between nodes, determine path qualities
➔ Store probe results in performance database
Resilient Overlay Networks Slide 13
Architecture: Routing Protocol
Node 3
Router
Forwarder
Probes
Router
Forwarder
Probes
Node 1
Performance Database
Node 2
Router
Forwarder
Probes��������
��������
����
����
� Probe between nodes, determine path qualities
� Store probe results in performance database
➔ Link-state routing protocol between nodes
Disseminates info using the overlay
Resilient Overlay Networks Slide 14
Routing: Building Forwarding TablesPolicy routing
� Classify by policy
� Generate table per policy
� E.g. Internet2 Clique
Metric optimization
� App tags packets
(e.g. “low latency”)
� Generate one table per metric
PolicyDemux
Lookup
HopDst Next
Next Hop Address
Metric Demux
Resilient Overlay Networks Slide 15
Architecture: Forwarding
Node 3
Conduit
Conduit
Router
Forwarder
Probes
Node 1
Performance Database
Node 2
Forwarder
Probes
Router
Forwarder
Data
Router
Probes��������
��������
����
����
� Probe between nodes, determine path qualities
� Store probe results in performance database
� Link-state routing protocol between nodes
➔ Data handled by application-specific conduitForwarded in UDP
Resilient Overlay Networks Slide 16
ScalingRouting and probing add packets:Responsiveness vs. overhead vs. size
10 nodes 13.3Kbps30 nodes
2.2Kbps
33Kbps50 nodes
0
35000
0 5 10 15 20 25 30 35 40 45 50
Ove
rhea
d (b
its/s
econ
d)
Number of Nodes
Overhead
25000
20000
15000
10000
5000
30000
✘ 50 nodes is pushing the limit
✔ But is enough formany community apps
Resilient Overlay Networks Slide 17
RON Clients and ApplicationsRON is a set oflibraries...
... you have to build something with them.
Resilient IP Forwarder ClientIP
Divert Socket
FreeBSD Networking
RON
Raw Socket
FreeBSD NetworkingIP
RON
➔ Transparent RON of any traffic
Resilient Overlay Networks Slide 18
Many Evaluation QuestionsSome of which we’ve answered...
� Does the RON approach work?
– How fast do we detect and avoid bad paths?
– How many Internet outages are avoidable?
– How does RON affect latency/throughput?
� Doesn’t RON violate network policies?
� Can RON’s routing behavior be stable?
� Is it safe to deploy these things?
Resilient Overlay Networks Slide 19
Evaluation
� Two datasets from Internet deployment
(running for several months now)
� RON1: 12 nodes, 64 hours, Mar 2001
� RON2: 16 nodes, 85 hours, May 2001
� Compared RON-chosen paths to the Internet
Resilient Overlay Networks Slide 20
Deployment
To kaist.kr
CCIArosUtah
CMU
To vu.nlLulea.se
MITMA−CableCisco
Cornell
NYU
OR−DSL
CA−T1
� 16 hosts in the US, Europe, and Asia.(A few more online now. Want a RON?)
� Variety of network types / bandwidths
� N2 scaling of paths seen
Resilient Overlay Networks Slide 21
Evaluation Methodology
� Loss & latency. Each node repeats:
1. Pick random node j
2. Pick a probe type (direct; latency; loss)
round-robin. Send to j3. Delay for random interval
� Throughput: As above, but do 1M TCP bulk
transfer to random host
� Record traceroutes for post-processing
Resilient Overlay Networks Slide 22
Evaluation Details: Policy
� Never, ever use the Internet2 to improve life
for a host not already connected to the
Internet2
✘ Internet2 is high-speed, research-only net:
atypically fast, uncongested, and reliable
➔ Implemented via RON’s policy routing
component
Resilient Overlay Networks Slide 23
Major Results
✔ Probe-based outage detection effective
– RON takes ~10s to route around failureCompared to BGP’s several minutes
– Many Internet outages are avoidable
– RON often improves latency / loss /throughput [paper]
✔ Single-hop indirect routing works well
✔ Scaling is explicitly not our fortebut big enough
Resilient Overlay Networks Slide 24
RON1 vs Internet 30 minute loss rates
RON loss rate(0,10)[10,20)
[20,30)
(0,10)
[10,20)
[20,30)
[80,90)
[90,100]
InternetLossRate
8 4
3
2
12
12
2188
3628783131
44
1
32
1
� 6,825 “path hours” (13,650 samples)
Resilient Overlay Networks Slide 25
Performance: Latency
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 50 100 150 200 250 300
Fra
ctio
n of
sam
ples
Latency (ms)
RONDirect
40ms
RON overhead increases latency byabout 1/2 ms on already fast paths
RON improves latencyby tens to hundreds of mson some slower paths
Resilient Overlay Networks Slide 26
Who Helps?
� How many different intermediates in RON1?# intermediates 0 1 2 3 4 5
# paths 10 31 33 17 14 5
Zeros were primarily Internet2 hosts...
� Popular intermediatesdirect aros Mcable Spath cornell CCI
72% 7.8% 6.7% 5.5% 4.0% 3.5%Not just one well-connected host
Resilient Overlay Networks Slide 27
Single-hop Indirect RoutingP [path good] = p
P [indirect good] =
P [indirect bad] =
RONGood(p)
Source Target
Good(p)
RON
’R’ nodes
Resilient Overlay Networks Slide 28
Single-hop Indirect RoutingP [path good] = p
P [indirect good] = p2
P [indirect bad] = 1� p2
Source Target
Good(p)
’R’ nodes
Good(p)
1−p2
RON
RON
Resilient Overlay Networks Slide 29
Single-hop Indirect RoutingP [path good] = p
P [indirect good] = p2
P [indirect bad] = 1� p2
Source Target
Good(p)
’R’ nodes
Good(p)
1−p2
RON
RON
P [At least one good] = 1 � (1� p2)R+1
Latency shortest paths from routing table dump:
� 48.8% direct paths best
� 99% best paths had 0 or 1 intermediate nodes
Resilient Overlay Networks Slide 30
Single-hop Indirect Routing
✘ Reality: Policy routing can force longer paths
More analysis needed for outage avoidance
✔ But realized many gains with only one hop
Resilient Overlay Networks Slide 31
Related work
� Early ARPANET routing
– OSPF-like queue-length dynamic routing
� Detour
– Performance optimization vs. reliability
– Long-term averages vs. quick re-routing
– Implementation and Internet evaluation
➔ Shows both insights exploitable
� X-Bone / DynaBone
Resilient Overlay Networks Slide 32
Future Work
� Fundamentals
– Internet scalability / resilience trade-off
� Scaling
– How big? What tactics?
– Interacting RONs? Stability?
� Development
– More applications
– Native RON applications
Resilient Overlay Networks Slide 33
Conclusions
✔ RONs improve packet delivery reliability
✔ Overlays attractive spot for resiliency:development, fewer nodes, simple substrate
✔ Single-hop indirection works well
✔ Small confederations respond quickly
➔ RON libraries are good platform fordevelopment, research
http://nms.lcs.mit.edu/ron/
Resilient Overlay Networks Slide 34
RON1 30 minute loss rate changes
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
−0.15 −0.1 −0.05 0 0.05 0.1 0.15
Fra
ctio
n of
sam
ples
Difference in RON/non−RON loss rate
Latency/losstradeoffs
5% improve loss rate by > 5%
Loss optimizedLatency optimized
Resilient Overlay Networks Slide 35
Outage Detection: Flooding Attack
0
200000
400000
600000
800000
1e+06
1.2e+06
0 10 20 30 40 50 60
TC
P S
eque
nce
#
Time (s)
Successful TCP packet
TCPFlood: TCP+RON
Flood: TCP
BGP can’ t handle this kind of problem...
Resilient Overlay Networks Slide 36
Routing: Announcements
� Link-state announcements from perf. db
� Announce every 10-20 seconds
� Latency: EWMA with parameter .9:
latavg = 0:9� latavg + :1� new sample
� Loss: Average of last 100 samples
� Outage: Any success in last 4 probes
Resilient Overlay Networks Slide 37
Routing: Predicting pathsCombine link metrics into a path estimate.
� Latency:P
L1; L2; :::; LN
� Loss:
Q�1; �2; :::; �N
� Throughput: (TCP Throughput Equation)
score =
p1:5
rtt � p�
� Outage: Any outage anywhere?
Resilient Overlay Networks Slide 38
The Big Picture
Divert Socket
Routing Pref Demux
Latency Loss B/W
HopDst Next
UDP
Type DemuxYes
LocalDataProtocolsandConduits
No
recv()
send()
PolicyDemuxvBNS
defaultRou
ter
Route Lookup
flagspolicy
Yes
Con
duit
Must_Fragment?
Classify
Encapsulate
is_local?
Forw
arde
rRaw Socket
ICMP
Emit
Outbound
FreeBSD NetworkingIP
To RONnodes
To/FromLocal Site
Resilient Overlay Networks Slide 39
Performance: Throughput
� 1% were 50% worse with RON
� 5% doubled throughput with RON
� Median unchanged: RON’s throughput
optimizer looks only for big wins.
Resilient Overlay Networks Slide 40
Performance: Throughput in RON2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.1 1 10
Fra
ctio
n of
sam
ples
Ratio of RON throughput to direct throughput (logscale)
2x increase from RON(135 samples)
1/2x decrease from RON(17 samples)
bw samples
Resilient Overlay Networks Slide 41
Policy
� RON supports flexible policies
– Exclusive cliques (e.g. Internet2)
– General policies (classifier + link set)
� RONs deployed between cooperating entities
No “ involuntary backdoors”
� Policy violations remain at the human level
How do users know what policy is?
More interesting future work.
Resilient Overlay Networks Slide 42
Related work 2Performance / egress optimization
� RouteScience: Optimize egress point selection
– “Probe” and TCP-monitoring based
– Aimed at BGP-capable customers
– Injects local BGP routes
� Sockeye
– Akamai-based
– Optimization for multihomed customers
Resilient Overlay Networks Slide 43
Traditional BGP Multihoming
✔ “Just works” - software transparent
✘ Depends on BGP failover times (3min+)
✘ Provider filtering (/19 — /24)
✘ Requires a decent-sized clue
➔ Not applicable to small networks
Resilient Overlay Networks Slide 44