Posted on 12-Sep-2021
Quality of Service in Overlay Networks
Why QoS in Overlay Networks?
Better handling of Internet path outages
Path outages lead to significant disruption in communication for 10 minutes or more [Paxson, ACM SIGCOMM '96]
Information is shared between ASes and providers through the Border Gateway Protocol (BGP), which hides topology information and traffic conditions
Provide desirable paths for QoS-sensitive applications
Design Goals
I. Get accurate path-quality information with low overhead
II. Find paths that satisfy user-desired QoS quickly
QoS metrics: loss rates, end-to-end delays, bandwidth
Note: Some existing work focuses on only one of these metrics
How do other researchers solve Problem I?
They design efficient and effective overlay monitoring systems
Components of Monitoring Services
Overlay link is an IP-layer path
Sometimes, these components are tightly integrated
[Figure: IP-layer topology (nodes A to I) and the overlay topology built on it (overlay nodes A, B, C, D)]
I. Overlay Link Monitoring
Existing work can be divided into:
No IP-level topological information used
  O(n^2), such as RON (full mesh)
  O(n log n), such as Pastry and Tapestry with O(log n) neighbors; a scalable content-addressable network with O(n^(1/d)) neighbors
  O(n), with a fixed set of probing neighbors for each node, like NICE
IP-level topological information used
  O(n log n) [ICNP 03, ICDCS 04, SIGCOMM 04]
Issues: estimation of quality, topology measurement, topology error handling, topology changes due to the addition/deletion of end hosts, route changes
II. Link Quality Exchange
Link state (e.g., RON)
Dissemination tree that satisfies certain properties
Minimum diameter, bounded link stress [Tang et al. ICDCS 04]
Try to avoid sending the quality information if measured/estimated quality remains the same [Tang et al. ICDCS 04]
Minimize network load
Other structures
III. Routing
No fixed topology; route based on user-defined criteria
E.g., RON
Some pre-defined topology [Li Infocom 04]
Minimum spanning tree
Topology-aware k minimal spanning tree
Resilient Overlay Networks
Authors: David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris
ACM SOSP, October 2001, Banff, Canada
Resilient Overlay Network (RON)
Goals:
Failure detection and recovery within 20 seconds
Provide tighter integration of routing and path selection with applications
Provide expressive policy routing
Implementation with real experiments on 12-16 nodes at different geographical locations
RON Probing Mechanism
Each peer probes its other n-1 neighbors using randomized periodic probing:
Repeat
  pick a random neighbor
  probe for bandwidth, loss rate, and latency
  wait for a random time between 1 and 2 sec.
Probing cost: O(n^2)
Maintain latency, loss rate, and throughput (TCP) for each overlay link
Path quality exchange using link state
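The probing loop above can be sketched as a small simulation of one node's probe schedule. This is a minimal sketch assuming a simulated clock and a seeded random generator; the function name probe_schedule and its parameters are hypothetical, and the actual bandwidth, loss-rate, and latency measurements are not modeled.

```python
import random

def probe_schedule(node, peers, duration_s, seed=0):
    """Simulate when one RON node probes which neighbor over duration_s seconds.

    Returns a list of (time_s, neighbor) events. The bandwidth, loss-rate and
    latency measurements themselves are not modeled.
    """
    rng = random.Random(seed)
    neighbors = [p for p in peers if p != node]  # the other n-1 peers
    t, events = 0.0, []
    while t <= duration_s:
        events.append((t, rng.choice(neighbors)))  # pick a random neighbor and probe it
        t += rng.uniform(1.0, 2.0)                 # wait a random time between 1 and 2 s
    return events
```

Because every one of the n nodes keeps measurements for all of its n-1 neighbors, keeping all n(n-1) overlay links fresh is what makes the total probing cost O(n^2).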
RON Mechanism for Detecting Path Outages
Each node performs outage detection for each overlay link:
Triggered by the loss of a probe packet during normal probing
Send a sequence of consecutive probes in quick succession, spaced PROBE_TIMEOUT apart
Consider the link down if a threshold number (OUTAGE_THRESH) of these probes go unanswered
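A minimal sketch of the outage check, assuming a hypothetical send_probe callback that sends one probe, waits up to PROBE_TIMEOUT, and reports whether a reply arrived. The value of OUTAGE_THRESH below is an illustrative placeholder, not a value given on the slide.

```python
# Illustrative threshold; the slides name OUTAGE_THRESH but give no value.
OUTAGE_THRESH = 4

def detect_outage(send_probe, outage_thresh=OUTAGE_THRESH):
    """Run after a routine probe to a neighbor is lost.

    send_probe() is a hypothetical callback: it sends one probe, waits up to
    PROBE_TIMEOUT for the reply, and returns True if a reply arrived.
    Returns True if the overlay link should be considered down.
    """
    for _ in range(outage_thresh):
        if send_probe():
            return False  # a reply came back: the link is still up
    return True  # outage_thresh consecutive probes went unanswered
```

For example, with replies = iter([False, False, True]), detect_outage(lambda: next(replies)) returns False because the third follow-up probe is answered.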
Routing in RON
The entry node tags the packet's RON header with the identifier of the flow the packet belongs to
Subsequent routers forward packets of the same flow along the same path as the first packet of that flow
Best-effort routing
The authors do not explicitly say that a list of routers is included in the packet header, as in source routing, but it appears that the complete route is determined at the entry node
Routing policies: choose the direct Internet path first before choosing a RON path
Minimized latency
Minimized loss rates
Optimized throughput
Latency Minimizer
For any link l, its latency estimate lat_l is updated as
$$lat_l \leftarrow \alpha \cdot lat_l + (1 - \alpha) \cdot new\_sample_l$$
using α = 0.9. The latency of a RON path (consisting of a set of overlay links) is
$$lat_{path} = \sum_{l \in path} lat_l$$
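The two formulas above translate directly into code. This is a minimal sketch; the function names are hypothetical and latencies are plain numbers in whatever unit the probes report.

```python
ALPHA = 0.9  # smoothing weight used on the slide

def update_latency(lat_l, new_sample, alpha=ALPHA):
    """One EWMA step: lat_l <- alpha * lat_l + (1 - alpha) * new_sample."""
    return alpha * lat_l + (1 - alpha) * new_sample

def path_latency(link_latencies):
    """Latency of a RON path: sum of the latency estimates of its overlay links."""
    return sum(link_latencies)
```

With α = 0.9, a single 200 ms sample moves a 100 ms estimate only to about 110 ms, so one outlier barely shifts the estimate.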
Loss-Minimizer
Compute the current loss rate as the average of the last 100 probe samples
The loss rate of a RON path is estimated as
$$lossrate_{path} = 1 - \prod_{l \in path} (1 - lossrate_l)$$
assuming that the loss rates of the overlay links on the path are independent
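A minimal sketch of the loss minimizer's two steps: a sliding window over the last 100 probe outcomes per link, and the independence-based path formula above. The class and function names are hypothetical.

```python
from collections import deque
from math import prod

class LinkLossEstimator:
    """Loss rate of one overlay link over its last `window` probes (100 on the slide)."""

    def __init__(self, window=100):
        self.samples = deque(maxlen=window)  # 1 = probe lost, 0 = probe answered

    def record(self, lost):
        self.samples.append(1 if lost else 0)

    def loss_rate(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

def path_loss_rate(link_loss_rates):
    """1 - prod(1 - loss_l) over the path's links, assuming independent losses."""
    return 1 - prod(1 - r for r in link_loss_rates)
```

Two links with 10% and 20% loss give a path loss rate of 1 - 0.9 * 0.8 = 0.28, slightly less than the naive sum of 0.3.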
TCP-Throughput Optimizer
Strive to avoid paths of low throughput
Focus on TCP flows, using a simplified formula to estimate TCP throughput:
$$score = \frac{\sqrt{1.5}}{rtt \cdot \sqrt{p}}$$
p: one-way loss probability, estimated as loss_two_way / 2
rtt: probed end-to-end round-trip time
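The throughput score above can be sketched as follows. The handling of a loss-free path (p = 0, where the formula divides by zero) is an assumption of this sketch and is not specified on the slide.

```python
from math import sqrt

def tcp_score(rtt_s, loss_two_way):
    """Simplified TCP throughput score: sqrt(1.5) / (rtt * sqrt(p))."""
    p = loss_two_way / 2  # one-way loss estimated from the round-trip loss
    if p == 0:
        # The formula is undefined at p = 0; treating a loss-free path as
        # the best possible choice is an assumption of this sketch.
        return float("inf")
    return sqrt(1.5) / (rtt_s * sqrt(p))
```

The score falls linearly with RTT but only with the square root of the loss rate: halving the RTT doubles the score, while quadrupling the loss only halves it.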
Other Routing Policy
Other policies specified by the user, e.g., disallow packets from commercial sites from going through Internet2
Implementation:
An entering packet is tagged with a policy by the entry RON node
The policy is used to identify the right routing table to look up
A separate set of routing tables is constructed for each policy by re-running the routing computation, removing disallowed links
Subsequent RON nodes just look at the tag for routing
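A minimal sketch of per-policy routing tables: for each policy tag, the disallowed links are removed and the routing computation is re-run. Here links are a hypothetical dict of directed overlay links with latency weights, each policy is a predicate over links, and Dijkstra on latency stands in for the routing computation; these are assumptions for illustration, not details given on the slide.

```python
import heapq

def shortest_paths(links, src):
    """Dijkstra over directed overlay links {(u, v): latency}; returns {dst: (latency, path)}."""
    adj = {}
    for (u, v), w in links.items():
        adj.setdefault(u, []).append((v, w))
    best = {src: (0, [src])}
    heap = [(0, src, [src])]
    while heap:
        d, u, path = heapq.heappop(heap)
        if d > best[u][0]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if v not in best or nd < best[v][0]:
                best[v] = (nd, path + [v])
                heapq.heappush(heap, (nd, v, path + [v]))
    return best

def build_policy_tables(links, src, policies):
    """One routing table per policy tag, recomputed without the disallowed links."""
    return {tag: shortest_paths({l: w for l, w in links.items() if allowed(l)}, src)
            for tag, allowed in policies.items()}
```

A forwarding node would then read the policy tag from the packet and look up the destination in tables[tag], without recomputing anything per packet.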
Implementation
RON provides a set of C++ libraries that a user-level RON client (e.g., a resilient IP forwarder or a prober) links against
No special kernel support
Allows sending data through RON without modifying transport protocols or applications
Experimental Study
RON1 (N=12): 36 different ASes, 74 inter-AS links
RON2 (N=16): 50 ASes, 118 inter-AS links
Policy: prohibit sending traffic to or from commercial sites over Internet2
RON1: 64 hours of data collected in March 2001; 2.6 million data samples
RON2: 85 hours of data collected in May 2001
The chosen RON path consists of one overlay link
Some Results
Declare a path outage if loss rate is greater than 30%
Conclusions
RON satisfies the design criteria
Many experimental results are reported
High probing overhead of O(n^2)
High overhead for exchanging link quality
Contributed to further investigation in this area in recent years
Uses a simple way of probing loss rates, round-trip time, and TCP throughput
Motivated further investigation of better ways to measure loss rates and bandwidth
On the cost-quality tradeoff in Topology-Aware Overlay Path Probing
Chiping Tang and Philip K. McKinley
ICNP 2003
Proposed Work
Propose a centralized algorithm that determines the probe set P with the least probing overhead while achieving high estimation accuracy
Probe set = {selected IP paths to probe}
Propose an inference algorithm to infer a quality bound for each unprobed path
Consider loss rates, latency, and available bandwidth
Assumption: IP-level topological information is known
Does not address topology measurement, topology errors, topology changes, or route changes
Why should it work?
IP paths on the Internet overlap considerably, so a small probe set, on the order of O(n log n), can suffice
Performance Metric
Estimation accuracy of a path p is
$$acc(p) = 1 - \delta(Q(p), Q'(p)) = 1 - \frac{|Q(p) - Q'(p)|}{\max(Q(p), Q'(p))}$$
where Q(p) is the real QoS of the path and Q'(p) is the inferred QoS
Overall estimation accuracy for a probe set P is
$$Z(P, Q) = \sum_i w_i \cdot acc(p_i)$$
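The accuracy metric above can be computed as follows. This is a minimal sketch; the slide does not define the weights w_i, so here they are simply passed in (equal weights summing to 1 make Z the mean accuracy, which is an assumption), and the case where both values are zero is treated as a perfect estimate, also an assumption.

```python
def path_accuracy(q_real, q_inferred):
    """acc(p) = 1 - |Q(p) - Q'(p)| / max(Q(p), Q'(p))."""
    denom = max(q_real, q_inferred)
    if denom == 0:
        return 1.0  # both values zero: treated as a perfect estimate (assumption)
    return 1 - abs(q_real - q_inferred) / denom

def overall_accuracy(pairs, weights):
    """Z(P, Q) = sum_i w_i * acc(p_i); pairs are (real, inferred) QoS values."""
    return sum(w * path_accuracy(q, qi) for (q, qi), w in zip(pairs, weights))
```

An inferred loss rate of 10% for a path whose real loss rate is 5% scores 0.5, because the error is measured relative to the larger of the two values.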
Approach
1. Generate an intermediate topology consisting of path segments in between the overlay topology and the IP-level topology
2. Path selection
Step 1: Select the probe set with the least probing cost such that the quality bounds of the unprobed paths can be inferred from the probed paths
Step 2: Add more paths to the probe set to tighten the inferred bounds of the unprobed paths
Path Segments
A path segment is a maximal subpath of a path such that the inner vertices of the subpath are not incident to any other physical links in the overlay network.
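One way to apply this definition: collect every physical link used by any overlay path, and cut a path wherever it passes through a vertex touching more than two of those links (another overlay path branches off there). This is a minimal sketch under that reading of the definition; paths are hypothetical lists of node names.

```python
from collections import defaultdict

def split_into_segments(path, overlay_paths):
    """Split one IP-level path (a list of nodes) into path segments.

    A node ends a segment if it is the last node of `path` or if it touches more
    than two distinct physical links used by the overlay (the union of all
    overlay_paths), i.e. some other overlay path branches off there.
    """
    neighbors = defaultdict(set)
    for p in overlay_paths:
        for u, v in zip(p, p[1:]):
            neighbors[u].add(v)
            neighbors[v].add(u)
    segments, current = [], [path[0]]
    for i, node in enumerate(path[1:], start=1):
        current.append(node)
        if i == len(path) - 1 or len(neighbors[node]) > 2:
            segments.append(tuple(current))
            current = [node]
    return segments
```

In a toy topology with links A-E, B-E, E-G, G-F, F-C, F-D, the path A-E-G-F-C splits into (A, E), (E, G, F), (F, C): router G has only two overlay links, so it stays inside a segment.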
Estimate the bound of a segment in a probed path
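Rules:
Latency of a path >= latency of any of its subpaths
Loss rate of a path >= loss rate of any of its subpaths
Available bandwidth of a path <= available bandwidth of any of its subpaths
Example: probing AB gives a 5% loss rate; probing AC gives a 3% loss rate
Hence every segment on AB has a loss-rate upper bound of 5% and every segment on AC an upper bound of 3%; a segment shared by both paths keeps the tighter bound, 3%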
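The bounding rule above can be sketched for loss rates: each segment's upper bound is the smallest loss rate measured on any probed path that contains it. This is a minimal sketch; the segment names are hypothetical labels.

```python
def segment_loss_bounds(probed):
    """probed: {path_name: (segments, measured_loss)}.

    A path's loss rate is >= the loss rate of any of its subpaths, so each
    segment's loss is at most the smallest loss measured on a probed path
    that contains it.
    """
    bounds = {}
    for segments, loss in probed.values():
        for s in segments:
            bounds[s] = min(loss, bounds.get(s, 1.0))
    return bounds
```

Latency upper bounds follow the same min rule. Bandwidth runs the other way: a path's available bandwidth is at most that of any subpath, so each segment gets a lower bound equal to the largest bandwidth measured on the probed paths containing it.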
Estimate the quality of an unprobed path
Latency of a path <= sum of the latency upper bounds of all its segments
Loss rate of a path p <= $1 - \prod_{s \in p} (1 - r_s)$, where r_s is the loss-rate upper bound of segment s
Available bandwidth of a path >= the minimum of the available-bandwidth lower bounds of all its segments
Ex. The loss rate of path BC is estimated as 1 - (0.95*0.97*0.97) = 0.106, i.e., about 0.11
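Given per-segment bounds, the three inference rules above combine as follows. This is a minimal sketch; the segment names and the latency and bandwidth numbers are hypothetical, and only the loss bounds reproduce the BC example.

```python
from math import prod

def infer_path_bounds(segments, loss_ub, lat_ub, bw_lb):
    """Quality bounds for an unprobed path made of `segments`."""
    return {
        "loss_upper": 1 - prod(1 - loss_ub[s] for s in segments),
        "latency_upper": sum(lat_ub[s] for s in segments),
        "bandwidth_lower": min(bw_lb[s] for s in segments),
    }
```

For a path whose three segments have loss-rate upper bounds 5%, 3%, and 3%, loss_upper is 1 - 0.95 * 0.97 * 0.97 = 0.106145, matching the BC example.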
Step 1: Determine the probe set
Choice I: Use the number of probe packets as the probing overhead (i.e., more paths means more probe packets)
Goal: Cover every path segment with the least probing overhead (standard minimum set cover)
Greedy heuristic: At each step, choose the path with the maximum number of unprobed segments
Choice II: Probing a path e incurs cost C(e) (minimum weighted set cover)
Greedy heuristic: At each step, choose the path with the minimum cost per unprobed segment
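Both greedy heuristics above can share one loop that differs only in how the next path is scored. This is a minimal sketch of standard greedy set cover; paths map hypothetical path names to sets of segment names, and ties go to the first path in dictionary order.

```python
def greedy_probe_set(paths, cost=None):
    """paths: {path_name: set_of_segments}; cost: {path_name: C(e)} or None.

    cost=None -> Choice I: pick the path covering the most unprobed segments.
    otherwise -> Choice II: pick the path with the least cost per unprobed segment.
    """
    uncovered = set().union(*paths.values())
    chosen = []
    while uncovered:
        candidates = [p for p in paths if p not in chosen and paths[p] & uncovered]
        if cost is None:
            best = max(candidates, key=lambda p: len(paths[p] & uncovered))
        else:
            best = min(candidates, key=lambda p: cost[p] / len(paths[p] & uncovered))
        chosen.append(best)
        uncovered -= paths[best]
    return chosen
```

The weighted version can prefer several cheap short paths over one expensive path that covers more segments at once.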
Step 2: Refine the probe set
Goal: Tighten the bounds of the inferred segment quality
Strategy:
Choice I: Pick an unprobed path randomly
Choice II: Pick the unprobed path with the lowest cost; use the fewest segments as the tie breaker
Choice III: Pick the unprobed path with the fewest segments; use the lowest cost as the tie breaker
Choice IV: Pick the unprobed path that overlaps most with the paths in the probe set; use the lowest cost as the tie breaker
Why: such a path has more chance to refine the bounds
Keep picking more paths until the desired estimation accuracy is achieved
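Choices II to IV above are orderings over the unprobed paths, so each can be expressed as a sort key. This is a minimal sketch; measuring "overlap" as the number of segments already covered by the probe set is an assumption, and Choice I (random order) is omitted.

```python
def refine_order(unprobed, probe_set, paths, cost, choice):
    """Order unprobed paths for Step 2 (Choices II-IV); paths: {name: set_of_segments}."""
    probed_segments = set().union(*(paths[p] for p in probe_set))
    keys = {
        "II": lambda p: (cost[p], len(paths[p])),                        # cheapest, then fewest segments
        "III": lambda p: (len(paths[p]), cost[p]),                       # fewest segments, then cheapest
        "IV": lambda p: (-len(paths[p] & probed_segments), cost[p]),     # most overlap, then cheapest
    }
    return sorted(unprobed, key=keys[choice])
```

The refinement loop would then take paths from the front of this order, probe them, and stop once the desired estimation accuracy is reached.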
Findings
It turns out that these approaches work quite well for bandwidth, but not as well for latency and loss rate. Why?
Latency is an additive metric and loss rate is a multiplicative metric, so deviations from the segment bounds accumulate through the addition or multiplication and have more impact
Improvements for Latency
Latency estimated as the sum of the segment bounds is not very accurate, so use an algebraic method
Measured latency of path AB is L_AB; measured latency of path CD is L_CD; measured latency of path AC is L_AC
$$\begin{aligned} L(w) + L(v) &= L_{AB} \\ L(v) + L(x) + L(y) &= L_{AC} \\ L(x) + L(y) &= L_{CD} \\ L(w) + L(x) + L(y) &= L_{BC} \\ L(v) + L(x) + L(z) &= L_{AD} \\ L(w) + L(x) + L(z) &= L_{BD} \end{aligned}$$
5 variables: w, v, x, y, z. Probing more paths may enable us to solve the linear equations
In general, we may not be able to solve the linear equations. In this case, use the estimated bounds for the variables that we cannot solve
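A minimal sketch of the algebraic step: put the probed-path equations in reduced row echelon form with exact rational arithmetic and read off the segments whose latency is uniquely determined. It assumes exact, consistent measurements (real noisy data would call for least squares instead); the function name is hypothetical.

```python
from fractions import Fraction

def solve_segments(equations, variables):
    """Find segment latencies that the probed-path equations pin down exactly.

    equations: list of (set_of_segments, measured_path_latency).
    Returns {segment: latency} only for segments whose value is uniquely
    determined; the rest must fall back to their estimated bounds.
    """
    n = len(variables)
    rows = [[Fraction(int(v in segs)) for v in variables] + [Fraction(lat)]
            for segs, lat in equations]
    r = 0
    for col in range(n):  # Gauss-Jordan elimination to reduced row echelon form
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        pv = rows[r][col]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    solved = {}
    for row in rows:
        nonzero = [j for j in range(n) if row[j] != 0]
        if len(nonzero) == 1:  # this row reads "1 * segment = value"
            solved[variables[nonzero[0]]] = row[n]
    return solved
```

With only AB, AC, and CD probed, w and v are solved exactly, but x and y always appear together as L(x) + L(y), so they keep their bounds.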
Improvement for Loss Rates
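Loss rate is a multiplicative metric. How do we use linear equations?
Take logarithms to turn the product into a sum:
$$R_p = 1 - \prod_{s \in p} (1 - r_s)$$
$$1 - R_p = \prod_{s \in p} (1 - r_s)$$
$$\log(1 - R_p) = \sum_{s \in p} \log(1 - r_s)$$
Each unknown log(1 - r_s) now enters linearly, just as L(·) does for latency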
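The log transform above can be sketched as a pair of mapping functions. This minimal sketch uses the negated form -log(1 - r), which is equivalent up to sign and keeps values non-negative like latencies, so the same linear-equation machinery applies to loss rates; the function names are hypothetical.

```python
from math import exp, log

def to_additive(loss_rate):
    """Map a loss rate r to -log(1 - r), which adds up over the segments of a path."""
    return -log(1 - loss_rate)

def from_additive(x):
    """Inverse map back to a loss rate: r = 1 - exp(-x)."""
    return 1 - exp(-x)
```

Summing the transformed segment values and mapping back reproduces the product formula exactly, so solved log-domain values can be converted back to segment loss rates.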
Performance Environment
Simulation on 6 different network topologies:
Real AS-level Internet topology (Feb 2000): 6474 nodes with an average degree of 2.15
3 generated by GT-ITM
2 generated by the Inet 3.0 simulator
Select overlay nodes “uniformly” from the physical nodes
Delay on backbone links: 1-50 ms
Delay on links from edge routers to end hosts: randomly set between 1 and 3 ms
LM1 model for backbone link loss rates:
Good link fraction is set to 90%; loss rate on a good link is between 0 and 1%; loss rate on a bad link is between 5 and 10%
For edge links: good link fraction is 50%; loss rate on a good link is 0-1%; loss rate on a bad link is 10-20%
Bandwidth on backbone links: 100-500 MB; bandwidth on edge links: 500 KB-1 MB
Some Results without Algebraic Method
All-bounded using set cover; random selection of additional paths is used in the path refinement
Summary of Algorithms with Algebraic Method
Select paths using path selection
If the segment vector of a later-selected path is linearly dependent on those of the paths selected earlier, discard this path and try another unselected path
Probe the selected paths
Derive segment latency and loss rate estimates
Solve a subset of the linear equations to get exact latency values
Compute the quality of unprobed paths from the quality of the probed paths
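The linear-dependence check in the first step can be sketched with an exact rank computation: a candidate path is kept only if adding its 0/1 segment vector raises the rank. This is a minimal sketch; the segment and path names follow the w, v, x, y, z example, and the function names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def select_independent(candidates, segments):
    """Keep a candidate path only if its segment vector is linearly independent
    of the vectors of the paths kept so far."""
    chosen, rows = [], []
    for name, segs in candidates:
        row = [1 if s in segs else 0 for s in segments]
        if rank(rows + [row]) > len(rows):  # kept rows are independent, so rank == len
            chosen.append(name)
            rows.append(row)
    return chosen
```

In the six-path example, BC equals AB - AC + 2 CD in segment terms and BD is likewise a combination of earlier paths, so only AB, AC, CD, and AD are kept.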
Improvement using Algebraic Method
Estimation accuracy improves quickly with the algebraic method
Findings
Different choices in the second phase do not have much impact on estimation accuracy
WSETCOVER_RANDOM incurs low average link stress
If overlay paths do not overlap, there is not much benefit, but paths do overlap on the Internet
Bounded bandwidth estimation achieves accuracy of up to 90% for all paths with O(n log n) probing overhead
(From plots)
Conclusion
Reduces probing overhead
Does not consider topology changes or errors from topology measurements