Quality of Service in Overlay Networks
web.cs.iastate.edu/~ruan/cs686/qos-overlay.pdf

1

Quality of Service in Overlay Networks

2

Why QoS in Overlay Networks?

Better handling of Internet path outages

Path outages lead to significant disruption in communication for 10 minutes or more [Paxson, ACM SIGCOMM ’96]

ASes and providers share information through the Border Gateway Protocol (BGP), which hides topology information and traffic conditions

Provide desirable paths for QoS sensitive applications

3

Design Goals

I. Get accurate path-quality information with low overhead

II. Find paths that satisfy user-desired QoS quickly

loss rates, end-to-end delays, bandwidth

Note: Some existing work focuses on only one of the metrics

4

How do other researchers solve Problem I?

Come up with efficient and effective overlay monitoring systems

5

Components of Monitoring Services

Overlay link is an IP-layer path

Sometimes they are tightly integrated

6

[Figure: overlay topology and IP-layer topology; node labels A through I]

7

I. Overlay Link Monitoring

Existing work can be divided into:

No IP-level topological information used

O(n^2), such as RON (full mesh)

O(n log n), such as Pastry and Tapestry with O(log n) neighbors; scalable content network with O(n^(1/d)) neighbors

O(n), with fixed probing neighbors for each node, like NICE

Use IP-level topological information

O(n log n) [ICNP 03, ICDCS 04, SIGCOMM 04]

Issues: Estimation of quality, topology measurements, topology error handling, topology changes due to addition/deletion of end hosts, route changes

8

II. Link Quality Exchange

Link state (e.g., RON)

Dissemination tree that satisfies certain properties

Minimum diameter, bounded link stress [Tang et al. ICDCS 04]

Try to avoid sending the quality information if measured/estimated quality remains the same [Tang et al. ICDCS 04]

Minimize network load

Other structures

9

III. Routing

No fixed topology; route based on user-defined criteria

E.g., RON

Some pre-defined topology [Li Infocom 04]

Minimum spanning tree

Topology-aware k minimum spanning tree

10

Resilient Overlay Network

Authors: David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris

ACM SOSP, October 2001, Banff, Canada

11

Resilient Overlay Network (RON)

Goals:

Failure detection and recovery within 20 seconds

Provide tighter integration of routing and path selection with applications

Provide expressive policy routing

Implementation with real experiments on 12-16 nodes at different geographical locations

12

RON Probing Mechanism

Each peer probes its n-1 neighbors

Randomized periodic probing

Repeat: pick a random neighbor; probe for bandwidth, loss rate, and latency; wait for a random time between 1 and 2 sec.

Probing cost: O(n^2)

Maintain latency, loss rate, and throughput (TCP) for each overlay link

Path quality exchange using link state

13

RON Mechanism for Detecting Path Outages

Each node does outage detection for each overlay link

Triggered by a loss of probe packets in normal probing

Send a sequence of consecutive probes in quick succession, spaced by the PROBE_TIMEOUT period

Consider the link down if the number of consecutive probes with no response reaches a threshold (OUTAGE_THRESH)
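The outage check described above can be sketched in a few lines of Python. This is only an illustration: `link_is_down` and `send_probe` are hypothetical names, and the default threshold of 4 is an assumed value, since the slide does not give one. The caller supplies `send_probe`, which is assumed to send one probe, wait up to PROBE_TIMEOUT for a reply, and report whether a response arrived.

```python
def link_is_down(send_probe, outage_thresh=4):
    """Return True if `outage_thresh` consecutive follow-up probes get no reply.

    send_probe: hypothetical callable that sends one probe, waits up to
    PROBE_TIMEOUT, and returns True if a response arrived.
    outage_thresh: stands in for OUTAGE_THRESH; 4 is an assumed value.
    """
    for _ in range(outage_thresh):
        if send_probe():
            return False  # any response means the link is still usable
    return True  # every follow-up probe was lost


# Usage: a link that answers the third follow-up probe is not declared down.
replies = iter([False, False, True])
print(link_is_down(lambda: next(replies)))
```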

14

Routing in RON

Entry node tags the packet’s RON header with the identifier of the flow the packet belongs to

Subsequent routers forward packets of the same flow along the same path as the first packet of that flow

Best-effort routing

The authors did not explicitly say that a list of routers is included in the packet header as in source-based routing, but it seems that the complete route is determined at the entry node

Routing policies: Choose the direct Internet path first before choosing a RON path

Minimized latency

Minimized loss rates

Optimized throughput

15

Latency Minimizer

For any link l, its latency estimate lat_l is updated as

$$lat_l \leftarrow \alpha \cdot lat_l + (1 - \alpha) \cdot new\_sample_l$$

Use alpha = 0.9

Latency of a RON path (consisting of a set of overlay links):

$$lat_{path} = \sum_{l \in path} lat_l$$
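As a small illustration of the latency estimate, the following Python sketch applies the exponentially weighted moving average with alpha = 0.9 and sums per-link estimates along a path. The function names are hypothetical and the sample values are made up.

```python
ALPHA = 0.9  # smoothing weight given on the slide


def update_latency(lat, new_sample, alpha=ALPHA):
    """One EWMA step: keep 90% of the old estimate, add 10% of the new sample."""
    return alpha * lat + (1 - alpha) * new_sample


def path_latency(link_latencies):
    """Latency of a path is the sum of the estimates of its overlay links."""
    return sum(link_latencies)


# Usage: an estimate of 100 ms moves only slightly after a 200 ms sample.
print(update_latency(100.0, 200.0))
print(path_latency([10.0, 20.0, 5.0]))
```

The large weight on the old estimate means a single slow probe changes the estimate only a little, so routing does not flap on one outlier.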

16

Loss-Minimizer

Compute current loss rate as the average of last 100 probe samples

Loss rate of a RON path is estimated as

$$lossrate_{path} = 1 - \prod_{l \in path} (1 - lossrate_l)$$

Assume that loss rates of overlay links on the path are independent
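The two steps of the loss estimate, a sliding window over the last 100 probes and the product rule for a path, might look like this in Python. `LossEstimator` and `path_loss_rate` are hypothetical names, and returning 0.0 before any sample has arrived is a choice made for this sketch.

```python
from collections import deque


class LossEstimator:
    """Average loss over the most recent `window` probe samples."""

    def __init__(self, window=100):
        self.samples = deque(maxlen=window)  # old samples fall off automatically

    def record(self, lost):
        self.samples.append(1 if lost else 0)

    def loss_rate(self):
        # Assumption of this sketch: report 0.0 before any probe is recorded.
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


def path_loss_rate(link_loss_rates):
    """1 - product of per-link delivery probabilities (independent links)."""
    delivered = 1.0
    for rate in link_loss_rates:
        delivered *= 1.0 - rate
    return 1.0 - delivered


# Usage: two links with 10% and 20% loss.
print(path_loss_rate([0.1, 0.2]))
```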

17

TCP-Throughput Optimizer

Strive to avoid paths of low throughput

Focus on TCP flows, using a simplified formula to estimate TCP throughput

$$score = \frac{\sqrt{1.5}}{rtt \cdot \sqrt{p}}$$

p: one-way loss probability, estimated as loss_two_way/2

rtt: end-to-end round trip time probed
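A minimal Python sketch of the throughput score, assuming the simplified TCP formula score = sqrt(1.5) / (rtt * sqrt(p)) with p taken as half the two-way loss. The function name is hypothetical, and treating a loss-free path as having an unbounded score is a choice made for this sketch only.

```python
import math


def tcp_score(rtt, loss_two_way):
    """Score a path for TCP throughput; higher is better.

    rtt: probed round trip time in seconds.
    loss_two_way: probed two-way loss fraction; the one-way loss p is
    estimated as half of it.
    """
    p = loss_two_way / 2
    if p <= 0:
        return math.inf  # sketch-only convention for a loss-free path
    return math.sqrt(1.5) / (rtt * math.sqrt(p))


# Usage: at equal rtt, the path with less loss gets the higher score.
print(tcp_score(0.1, 0.02) > tcp_score(0.1, 0.08))
```

Because the score falls with both rtt and the square root of the loss, a short lossy path can rank below a longer clean one.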

18

Other Routing Policy

Other policy specified by the user

E.g., disallow packets from commercial sites to go through Internet2

Implementation

Entering packet is tagged with policy by the entry RON node

Policy is used to identify the right routing table to look up

A separate set of routing tables is constructed for each policy by re-running the routing computation, removing disallowed links

Subsequent RON nodes just look at the tag for routing
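The per-policy routing tables can be illustrated with a short Python sketch: the same shortest-path computation is re-run once per policy, each time with the disallowed links filtered out. The four-node topology, the link costs, and the policy names are invented for illustration; Dijkstra's algorithm is used here as a stand-in for whatever routing computation RON actually runs.

```python
import heapq


def routing_table(links, source, allowed=lambda u, v: True):
    """Shortest paths over undirected links {(u, v): cost}; returns {dest: next_hop}."""
    graph = {}
    for (u, v), cost in links.items():
        if allowed(u, v):  # drop links the policy forbids
            graph.setdefault(u, []).append((v, cost))
            graph.setdefault(v, []).append((u, cost))
    dist, next_hop = {source: 0.0}, {}
    heap = [(0.0, source, None)]
    while heap:
        d, node, first = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if first is not None:
            next_hop.setdefault(node, first)
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr, first if first is not None else nbr))
    return next_hop


# Hypothetical topology: node "I2" stands for a hop inside Internet2.
links = {("A", "I2"): 1, ("I2", "C"): 1, ("A", "B"): 2, ("B", "C"): 2}
tables = {
    "default": routing_table(links, "A"),
    "no_internet2": routing_table(links, "A", lambda u, v: "I2" not in (u, v)),
}
print(tables["default"]["C"], tables["no_internet2"]["C"])
```

A packet tagged with a policy then only needs a dictionary lookup in the matching table, which is why subsequent nodes can route on the tag alone.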

19

Implementation

RON provides a set of C++ libraries for a user-level RON client (e.g., resilient IP forwarder, prober) to link against

No special kernel support

Allows sending data through RON without modification to transport protocols and applications

20

Experimental Study

RON1 (N=12): 36 different ASes, 74 inter-AS links

RON2 (N=16): 50 ASes, 118 inter-AS links

Policy: Prohibit sending traffic to or from commercial sites over Internet2

21

RON1: 64 hours of data collection in March 2001; 2.6 million data samples

RON2: 85 hours of data collection in May 2001

Chosen RON path consists of one overlay link

22

Some Results

Declare a path outage if loss rate is greater than 30%

23

Conclusions

RON satisfies the design criteria

Many experimental results are reported

High probing overhead of O(n^2)

High overhead for exchanging link quality

Contributed to more investigation in this area in recent years

Uses a simple way of probing loss rates, round-trip time and TCP throughput

Contributed to more investigation of better ways to measure loss rates and bandwidth

24

On the cost-quality tradeoff in Topology-Aware Overlay Path Probing

Chiping Tang and Philip K. McKinley

ICNP 2003

25

Proposed Work

Propose a centralized algorithm that determines the probe set P with the least probing overhead while achieving high estimation accuracy

Probe set = {selected IP paths to probe}

Propose an inference algorithm to infer a quality bound for each unprobed path

Consider loss rates, latency, and available bandwidth

Assumption: IP-level topological information is known

Does not address topology measurements, errors, topology changes, or route changes

26

Why should it work?

IP paths on the Internet overlap considerably, so a small probe set on the order of O(n log n) is enough

27

Performance Metric

Estimation accuracy of a path p is

$$acc(p) = 1 - \delta(Q(p), Q'(p)) = 1 - \frac{|Q(p) - Q'(p)|}{\max(Q(p), Q'(p))}$$

Q(p): real QoS of a path; Q'(p): inferred QoS

Overall estimation accuracy for a probe set P is

$$Z(P, Q) = \sum_i w_i \cdot acc(p_i)$$
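To make the metric concrete, here is a small Python sketch of acc(p) and the weighted overall accuracy Z. The function names and the numbers are illustrative, quality values are assumed non-negative, and returning 1.0 when both values are zero is a convention chosen for this sketch.

```python
def accuracy(q_real, q_inferred):
    """acc(p) = 1 - |Q - Q'| / max(Q, Q'), for non-negative quality values."""
    biggest = max(q_real, q_inferred)
    if biggest == 0:
        return 1.0  # sketch-only convention: two zero values agree exactly
    return 1.0 - abs(q_real - q_inferred) / biggest


def overall_accuracy(pairs, weights):
    """Z = sum of w_i * acc(p_i) over (real, inferred) pairs."""
    return sum(w * accuracy(q, qi) for w, (q, qi) in zip(weights, pairs))


# Usage: a real bandwidth of 8 inferred as 10 gives an accuracy of 0.8.
print(accuracy(8.0, 10.0))
print(overall_accuracy([(8.0, 10.0), (5.0, 5.0)], [0.5, 0.5]))
```

Dividing by the larger of the two values keeps acc(p) between 0 and 1 whether the inference overshoots or undershoots.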

28

Approach

1. Generate an intermediate topology consisting of path segments, in between the overlay topology and the IP-level topology

2. Path Selection

Step 1: Select the probe set with the least probing cost such that quality bounds of the unprobed paths can be inferred from the probed paths

Step 2: Add more paths into the probe set to tighten the inferred bounds of the unprobed paths

29

Path Segments

A path segment is one of the maximal subpaths in a path such that the inner vertices on the subpath are not incident to any other physical links in the overlay network.

30

Estimate the bound of a segment in a probed path

Rules

Latency of a path is > latency of any of its subpaths

Loss rate of a path is >= loss rate of any of its subpaths

Available bandwidth of a path is <= available bandwidth of any of its subpaths

Example: Probing AB gives a 5% loss rate; probing AC gives a 3% loss rate

[Figure: topology with each segment labeled by its inferred loss-rate bound, 5% or 3%]

31

Estimate the quality of an unprobed path

Latency of a path <= sum of the latency upper bounds of all its segments

Loss rate of a path p is <= $1 - \prod_{s \in p} (1 - r_s)$, where r_s is the loss rate upper bound of segment s

Available bandwidth is >= the minimum of the lower bounds of the available bandwidth of all its segments

Ex. Loss rate of path BC is estimated as 1-(0.95*0.97*0.97)=0.11
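The BC example can be reproduced with a short Python sketch. It assumes a segment layout that the slide's figure does not survive to confirm: AB crosses segments w and v, AC crosses v, x, y, and BC crosses w, x, y. The segment names and function names are hypothetical. Each segment inherits the tightest loss rate among the probed paths that contain it, and the unprobed path is bounded by the product rule.

```python
PROBED = {"AB": 0.05, "AC": 0.03}  # probe results from the slide

# Assumed segment layout (not given explicitly on the slide).
SEGMENTS_OF = {"AB": ["w", "v"], "AC": ["v", "x", "y"], "BC": ["w", "x", "y"]}


def segment_upper_bounds(probed, segments_of):
    """A segment loses at most as much as any probed path that contains it."""
    bounds = {}
    for path, loss in probed.items():
        for seg in segments_of[path]:
            bounds[seg] = min(loss, bounds.get(seg, 1.0))
    return bounds


def path_loss_upper_bound(path, bounds, segments_of):
    """1 - product over segments of (1 - r_s)."""
    delivered = 1.0
    for seg in segments_of[path]:
        delivered *= 1.0 - bounds[seg]
    return 1.0 - delivered


bounds = segment_upper_bounds(PROBED, SEGMENTS_OF)
print(round(path_loss_upper_bound("BC", bounds, SEGMENTS_OF), 2))
```

Under this layout segment v lies on both probed paths, so it takes the tighter 3% bound, while w keeps 5%.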

32

Step 1: Determine the probe set

Choice I: Use #probe packets as the probing overhead (i.e., more paths more probe packets)

Goal: Cover every path segment with the least probing overhead (standard minimum set cover)

Greedy Heuristic: At each step, choose a path with the maximum number of unprobed segments

Choice II: Probing a path e incurs cost C(e)

Minimum weighted set cover

Greedy Heuristic: At each step, choose the path with the minimum cost per unprobed segment in the path
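Both greedy heuristics fit in one Python function: with unit costs, the minimum cost per newly covered segment is the same as the maximum number of unprobed segments. The path-to-segment map below is an assumed example, the function name is hypothetical, and ties are broken alphabetically only to make the output deterministic.

```python
def greedy_probe_set(paths, cost=None):
    """Greedy (weighted) set cover.

    paths: {path_name: set of segments on that path}
    cost:  {path_name: probing cost}, or None for unit cost per path
    Returns the chosen paths in selection order.
    """
    uncovered = set().union(*paths.values())
    chosen = []
    while uncovered:
        def price(name):
            new = len(paths[name] & uncovered)
            c = 1.0 if cost is None else cost[name]
            return c / new if new else float("inf")

        best = min(sorted(paths), key=price)  # cheapest per newly covered segment
        chosen.append(best)
        uncovered -= paths[best]
    return chosen


# Assumed example: six overlay paths over five segments.
paths = {
    "AB": {"w", "v"}, "AC": {"v", "x", "y"}, "CD": {"y", "z"},
    "BC": {"w", "x", "y"}, "AD": {"v", "x", "z"}, "BD": {"w", "x", "z"},
}
print(greedy_probe_set(paths))
print(greedy_probe_set(paths, cost={**{p: 1 for p in paths}, "BD": 10}))
```

In this example two probes cover all five segments at unit cost; making BD expensive pushes the heuristic to three cheaper probes instead.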

33

Step 2: Refine the probe set

Goal: To tighten the bound of the inferred segment quality

Strategy

Choice I: Pick an unprobed path randomly

Choice II: Pick the unprobed path with the lowest cost; use the least number of segments as the tie breaker

Choice III: Pick the unprobed path with the least number of segments; use the least cost as the tie breaker

Choice IV: Pick the unprobed path that overlaps most with other paths in the probe set; use the least cost as the tie breaker. Why: such a path has more chance to refine the bound

Keep picking more paths until the desired estimation accuracy is achieved

34

Findings

It turns out that these approaches work quite well for bandwidth, but not as well for latency and loss rate

Why?

Latency is an additive metric

Loss rate is a multiplicative metric

Deviations from the bounds have more impact due to addition or multiplication

35

Improvements for Latency

Latency estimated as the sum of the bounds is not very accurate

Use an algebraic method

Measured latency of path AB is L_AB

Measured latency of path CD is L_CD

Measured latency of path AC is L_AC

L(w) + L(v) = L_AB

L(v) + L(x) + L(y) = L_AC

L(y) + L(z) = L_CD

L(w) + L(x) + L(y) = L_BC

L(v) + L(x) + L(z) = L_AD

L(w) + L(x) + L(z) = L_BD

5 variables: w, v, x, y, z

Probing more paths will enable us to possibly solve the linear equations

In general, we may not be able to solve the linear equations. In this case, use the estimated bound for the ones that we cannot solve
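A sketch of the algebraic idea in plain Python: each probed path contributes one linear equation over the five segment latencies, and Gauss-Jordan elimination reports only the segments that are uniquely determined. The segment layout is an assumption of this sketch (in particular, path CD is taken to cross segments y and z), the latency values are made up, and `solve_segments` is a hypothetical name.

```python
from fractions import Fraction

SEGMENTS = ["w", "v", "x", "y", "z"]

# Assumed layout: which segments each overlay path crosses.
PATHS = {
    "AB": ["w", "v"], "AC": ["v", "x", "y"], "CD": ["y", "z"],
    "BC": ["w", "x", "y"], "AD": ["v", "x", "z"], "BD": ["w", "x", "z"],
}


def solve_segments(measured):
    """Return {segment: latency} for every segment the probes pin down exactly."""
    rows = [[Fraction(int(s in PATHS[p])) for s in SEGMENTS] + [Fraction(lat)]
            for p, lat in measured.items()]
    pivots, r = {}, 0
    for c in range(len(SEGMENTS)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no remaining equation constrains this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots[c] = r
        r += 1
    solved = {}
    for c, ri in pivots.items():
        # Unique only if the pivot row mentions no other (free) segment.
        if all(rows[ri][j] == 0 for j in range(len(SEGMENTS)) if j != c):
            solved[SEGMENTS[c]] = rows[ri][-1]
    return solved


# Made-up measurements consistent with w=2, v=3, x=10, y=4, z=5.
five = solve_segments({"AB": 5, "AC": 17, "CD": 9, "BC": 16, "AD": 18})
print(sum(five[s] for s in PATHS["BD"]))  # inferred latency of unprobed path BD
print(solve_segments({"AB": 5, "AC": 17, "CD": 9}))  # three probes: nothing solved
```

With only AB, AC, and CD probed there are three equations for five unknowns, so no segment is determined and the bound-based estimate remains the fallback.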

36

Improvement for Loss Rates

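Loss rate is a multiplicative metric

How do we use linear equations? Take logarithms so that the product becomes a sum:

$$R_p = 1 - \prod_{s \in p} (1 - r_s)$$

$$1 - R_p = \prod_{s \in p} (1 - r_s)$$

$$\log(1 - R_p) = \log\Big(\prod_{s \in p} (1 - r_s)\Big) = \sum_{s \in p} \log(1 - r_s)$$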
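The logarithm trick can be checked numerically with a few lines of Python. The helper names are hypothetical and the loss values are illustrative; loss rates are assumed to be strictly below 1 so that the logarithm is defined.

```python
import math


def to_additive(loss_rate):
    """Map a loss rate r to log(1 - r), which adds up along a path."""
    return math.log(1.0 - loss_rate)


def from_additive(value):
    """Invert the mapping: recover a loss rate from a summed log value."""
    return 1.0 - math.exp(value)


# Usage: summing the transformed segment values reproduces the product rule.
segment_losses = [0.05, 0.03, 0.03]
path_loss = from_additive(sum(to_additive(r) for r in segment_losses))
print(round(path_loss, 6))
```

Once every measurement is expressed as log(1 - R_p), loss rates obey the same kind of linear equations as latencies and can be handled by the same solver.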

37

Performance Environment

Simulation on 6 different network topologies

Real AS-level Internet topology: Feb 2000; 6474 nodes with an average degree of 2.15

3 generated by GT-ITM

2 generated by the Inet 3.0 simulator

Select overlay nodes “uniformly” from physical nodes

Delay on backbone links: 1 ms-50 ms

Delay on links from edge routers to end hosts is randomly set between 1 and 3 ms

LM1 model for backbone link loss rate: good link fraction is set to 90%; loss rate on a good link is between 0 and 1%; loss rate on a bad link is between 5 and 10%

For edge links: good link fraction is 50%; loss rate on a good link is 0-1%; loss rate on a bad link is 10-20%

Bandwidth on backbone between 100-500 MB; bandwidth on edge links 500KB-1MB

38

Some Results without Algebraic Method

All-bounded using set cover

Random selection of more paths is used in the path refinement

[Figure: results plot; only the axis ticks 0 and 60000 survive]

39

Summary of Algorithms with Algebraic Method

Select paths using path selection

If the segment of a late-selected path is linearly dependent on any segment of paths selected earlier, discard this path and try another unselected path

Probe the selected paths

Derive segment latency and loss rate estimates

Solve a subset of the linear equations to get exact latency values

Compute quality of unprobed paths from quality of probed paths

40

Improvement using Algebraic Method

Estimation accuracy improves quickly with the algebraic method

41

Findings

Different choices in the second phase do not have much impact on estimation accuracy

WSETCOVER_RANDOM incurs low average link stress

If overlay paths do not overlap, there is not much benefit, but paths do overlap on the Internet

Bounded bandwidth estimation with accuracy up to 90% for all paths with O(n log n) probing overhead (from plots)

42

Conclusion

Reduces probing overhead

Does not consider topology changes and errors from topology measurements