
1

Server-based Characterization and Inference of Internet Performance

Venkat Padmanabhan, Lili Qiu, Helen Wang

Microsoft Research

UCLA/IPAM Workshop March 2002

2

Outline

• Overview
• Server-based characterization of performance
• Server-based inference of performance
  – Passive Network Tomography
• Summary and future work

3

Overview

• Goals
  – characterize end-to-end performance
  – infer characteristics of interior links
• Approach: server-based monitoring
  – passive monitoring is relatively inexpensive
  – enables large-scale measurements
  – diversity of network paths

4

[Figure: a Web server sends DATA to many clients, and ACKs flow back from the clients to the server]

5

Research Questions

• Server-based characterization of end-to-end performance
  – correlation with topological metrics
  – spatial locality
  – temporal stability
• Server-based inference of internal link characteristics
  – identification of lossy links

6

Related Work

• Server-based passive measurement
  – 1996 Olympics Web server study (Berkeley, 1997 & 1998)
  – characterization of TCP properties (Allman 2000)
• Active measurement
  – NPD (Paxson 1997)
  – stationarity of Internet path properties (Zhang et al. 2001)

7

Experiment Setting

• Packet sniffer at microsoft.com
  – 550 MHz Pentium III
  – sits on spanning port of Cisco Catalyst 6509
  – packet drop rate < 0.3%
  – traces up to 2+ hours long, 20-125 million packets, 50-950K clients
• Traceroute source
  – sits on a separate Microsoft network, but all external hops are shared
  – infrequent and in the background

8

Topological Metrics and Loss Rate

[Figure, two panels: E2E packet loss rate (0 to 0.25) vs. router hop count (0 to 30); E2E packet loss rate (0 to 0.12) vs. AS hop count (0 to 10)]

Topological distance is a poor predictor of packet loss rate.

All links are not equal => need to identify the lossy links

9

Spatial Locality

[Figure: cumulative probability (%) of the difference in loss rate (in buckets, 0 to 4), with one curve each for Subnet, BGP Prefix, AS, and Random]

Spatial locality => there may be a shared cause for packet loss

• Do clients in the same cluster see similar loss rates?
• Loss rate is quantized into buckets
  – 0-0.5%, 0.5-2%, 2-5%, 5-10%, 10-20%, 20+%
  – suggested by Zhang et al. (IMW 2001)
• Focus on lossy clusters
  – average loss rate > 5%
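
The bucket quantization above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `loss_bucket` and `bucket_difference` are made up here, and placing a rate that falls exactly on a boundary into the higher bucket is an assumption the slide does not settle.

```python
from bisect import bisect_right

# Upper edges of the first five buckets as fractions (0.5%, 2%, 5%, 10%, 20%).
# Any rate at or above 20% lands in the last bucket (index 5).
BUCKET_EDGES = [0.005, 0.02, 0.05, 0.10, 0.20]

def loss_bucket(loss_rate):
    """Return the bucket index (0..5) for a loss rate given as a fraction."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss rate must be between 0 and 1")
    # Assumption: a rate exactly on a boundary goes to the higher bucket.
    return bisect_right(BUCKET_EDGES, loss_rate)

def bucket_difference(rate_a, rate_b):
    """Difference in loss rate between two clients, measured in buckets."""
    return abs(loss_bucket(rate_a) - loss_bucket(rate_b))
```

Comparing clients in bucket units rather than raw percentages keeps small fluctuations inside one bucket from counting as a difference.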

10

Temporal Stability

• Loss rate again quantized into buckets
• Metric of interest: stability period (i.e., time until transition into new bucket)
• Median stability period ≈ 10 minutes
• Consistent with previous findings based on active measurements
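
A stability period as defined above can be computed from a time series of loss-rate samples roughly as follows. This is a sketch under assumptions: the input format (time-ordered `(timestamp_seconds, loss_rate)` pairs), the function name `stability_periods`, and the choice to ignore the final, still-open run are all illustrative and not taken from the talk.

```python
from bisect import bisect_right
from statistics import median

# Bucket edges from the spatial-locality slide, as fractions.
BUCKET_EDGES = [0.005, 0.02, 0.05, 0.10, 0.20]

def stability_periods(samples):
    """Return the durations (seconds) of completed runs within one bucket.

    samples: time-ordered (timestamp_seconds, loss_rate) pairs.
    The last run has no observed transition, so it is not counted.
    """
    periods = []
    run_start = None
    run_bucket = None
    for timestamp, rate in samples:
        bucket = bisect_right(BUCKET_EDGES, rate)
        if run_bucket is None:
            run_start, run_bucket = timestamp, bucket
        elif bucket != run_bucket:
            # Transition into a new bucket closes the current run.
            periods.append(timestamp - run_start)
            run_start, run_bucket = timestamp, bucket
    return periods

# Made-up samples: two transitions, at t=120 s and t=300 s.
example = [(0, 0.001), (60, 0.002), (120, 0.03),
           (180, 0.04), (300, 0.15), (360, 0.16)]
example_median = median(stability_periods(example))
```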

11

Putting it all together

• All links are not equal => need to identify the lossy links
• Spatial locality of packet loss rate => lossy links may well be shared
• Temporal stability => worthwhile to try and identify the lossy links

12

Passive Network Tomography

• Goal: determine characteristics of internal network links using end-to-end, passive measurements
• We focus on the link loss rate metric
  – primary goal: identifying lossy links
• Why is this interesting?
  – locating trouble spots in the network
  – keeping tabs on your ISP
  – server placement and server selection

13

[Figure: a Web server connected to clients through several providers (Sprint, AT&T, UUNET, C&W, Qwest, AOL, Earthlink), with speech bubbles "Darn, it's slow!" and "Why is it so slow?"]

14

Related Work

• MINC (Caceres et al. 1999)
  – multicast-based active probing
• Striped unicast (Duffield et al. 2001)
  – unicast-based active probing
• Passive measurement (Coates et al. 2002)
  – look for back-to-back packets
• Shared bottleneck detection
  – Padmanabhan 1999, Rubenstein et al. 2000, Katabi et al. 2001

15

Active Network Tomography

[Figure, two panels: multicast probes and striped unicast probes, each sent from a source S to receivers A and B]

16

Problem Formulation

[Figure: tree rooted at the server; links l1 to l8 lead down to five clients with end-to-end loss rates p1 to p5]

Collapse linear chains into virtual links

(1-l1)*(1-l2)*(1-l4) = (1-p1)
(1-l1)*(1-l2)*(1-l5) = (1-p2)
…
(1-l1)*(1-l3)*(1-l8) = (1-p5)

Under-constrained system of equations
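
To see why the system is under-constrained, the sketch below evaluates the path equations for two different link assignments that yield identical client loss rates. The tree layout is an assumption: the slide gives the paths for p1, p2, and p5 only, and the paths used here for p3 and p4 (through l3 and then l6 or l7) are a guess made for illustration. The name `client_loss_rates` is likewise hypothetical.

```python
# Hypothetical tree: indices 0..7 stand for links l1..l8.
# Paths for clients 1, 2, and 5 follow the slide; those for
# clients 3 and 4 are assumed for this example.
PATHS = [[0, 1, 3], [0, 1, 4], [0, 2, 5], [0, 2, 6], [0, 2, 7]]

def client_loss_rates(paths, link_loss):
    """End-to-end loss rate p_j for each client, given per-link loss rates."""
    rates = []
    for path in paths:
        success = 1.0
        for link in path:
            success *= 1.0 - link_loss[link]
        rates.append(1.0 - success)
    return rates

# Two different explanations for clients 1 and 2 seeing 10% loss:
shared_link = [0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # all loss on l2
last_hops = [0.0, 0.0, 0.0, 0.1, 0.1, 0.0, 0.0, 0.0]    # loss on l4 and l5

observed_a = client_loss_rates(PATHS, shared_link)
observed_b = client_loss_rates(PATHS, last_hops)
```

Both assignments produce the same five observations, so end-to-end loss rates alone cannot tell them apart.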

17

#1: Random Sampling

• Randomly sample the solution space
• Repeat this several times
• Draw conclusions based on overall statistics
• How to do random sampling?
  – determine loss rate bound for each link using best downstream client
  – iterate over all links:
    • pick loss rate at random within bounds
    • update bounds for other links
• Problem: little tolerance for estimation error

[Figure: the server-to-clients tree from the Problem Formulation slide, with links l1 to l8 and client loss rates p1 to p5]
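
One possible reading of the sampling procedure is sketched below. It is not the authors' implementation. Assumptions made here: the tree layout (paths for clients 3 and 4 are guessed), links are numbered so that each link precedes the links below it, loss rates are drawn uniformly within their bound, and each last-hop link absorbs whatever loss remains so that the path equations hold exactly. The function names are hypothetical.

```python
import random

# Hypothetical tree: indices 0..7 stand for links l1..l8, numbered top-down.
# Each path lists links from the server to one client; the final entry is
# that client's last-hop link.
PATHS = [[0, 1, 3], [0, 1, 4], [0, 2, 5], [0, 2, 6], [0, 2, 7]]

def sample_solution(paths, client_loss, rng):
    """Draw one set of link loss rates matching every client loss rate (< 1)."""
    n_links = 1 + max(link for path in paths for link in path)
    # Success rate per client still to be explained by unassigned links.
    remaining = [1.0 - p for p in client_loss]
    link_loss = [0.0] * n_links
    for i in range(n_links):
        downstream = [j for j, path in enumerate(paths) if i in path]
        if any(paths[j][-1] == i for j in downstream):
            # Last-hop link: takes all loss not yet explained upstream.
            link_loss[i] = max(0.0, 1.0 - remaining[downstream[0]])
        else:
            # Bound comes from the best (least lossy) downstream client.
            bound = max(0.0, 1.0 - max(remaining[j] for j in downstream))
            link_loss[i] = rng.uniform(0.0, bound)
        # Update what the other links on these paths may still contribute.
        for j in downstream:
            remaining[j] /= 1.0 - link_loss[i]
    return link_loss

def average_over_samples(paths, client_loss, n_samples, seed=0):
    """Repeat the sampling and report the mean loss rate per link."""
    rng = random.Random(seed)
    n_links = 1 + max(link for path in paths for link in path)
    totals = [0.0] * n_links
    for _ in range(n_samples):
        for i, loss in enumerate(sample_solution(paths, client_loss, rng)):
            totals[i] += loss
    return [total / n_samples for total in totals]
```

Because every sample satisfies the equations exactly, any error in the measured client loss rates is pushed straight into the link estimates, which is the weakness the slide points out.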

18

#2: Linear Optimization

Goals
• Parsimonious explanation
• Robust to estimation error

Li = log(1/(1-li)), Pj = log(1/(1-pj))

minimize Σi Li + Σj |Sj|
subject to
L1+L2+L4 + S1 = P1
L1+L2+L5 + S2 = P2
…
L1+L3+L8 + S5 = P5
Li >= 0

Can be turned into a linear program

[Figure: the server-to-clients tree from the Problem Formulation slide, with links l1 to l8 and client loss rates p1 to p5]
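
One standard way to remove the absolute values, shown here as a sketch rather than as the authors' exact formulation, is to split each slack variable into two non-negative parts. The symbols S_j^+ and S_j^- and the notation path(j) (the links between the server and client j) are introduced for this illustration only. Taking logarithms first turns each product constraint into a sum:

```latex
\begin{align*}
(1-l_1)(1-l_2)(1-l_4) = 1-p_1
  \;&\Longrightarrow\; L_1 + L_2 + L_4 = P_1,
  \qquad L_i = \log\frac{1}{1-l_i},\quad P_j = \log\frac{1}{1-p_j} \\[4pt]
\text{minimize}\quad & \sum_i L_i + \sum_j \left(S_j^{+} + S_j^{-}\right) \\
\text{subject to}\quad & \sum_{i \in \mathrm{path}(j)} L_i + S_j^{+} - S_j^{-} = P_j
  \quad \text{for each client } j, \\
& L_i \ge 0,\quad S_j^{+} \ge 0,\quad S_j^{-} \ge 0 .
\end{align*}
```

At an optimum at most one of S_j^+ and S_j^- is nonzero, so their sum equals |S_j| and the objective matches the one on the slide.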

19

#3: Bayesian Inference

• Basics:
  – D: observed data
    • sj: # packets successfully sent to client j
    • fj: # packets that client j fails to receive
  – Θ: unknown model parameters
    • li: packet loss rate of link i
  – Goal: determine the posterior P(Θ|D)
  – inference is based on loss events, not loss rates
• Bayes theorem
  – P(Θ|D) = P(D|Θ)P(Θ)/∫P(D|Θ)P(Θ)dΘ
  – hard to compute since Θ is multidimensional

[Figure: the server-to-clients tree from the Problem Formulation slide, with links l1 to l8; each client j is labeled with its counts (sj, fj), from (s1,f1) to (s5,f5)]
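
The slide does not spell out the likelihood P(D|Θ). Under the common assumption that packets are dropped independently on each link (an assumption made for this sketch), it can be written directly in terms of the counts (sj, fj), with path(j) denoting the links between the server and client j:

```latex
\begin{align*}
1 - p_j &= \prod_{i \in \mathrm{path}(j)} (1 - l_i), \\
P(D \mid \Theta) &= \prod_{j} (1 - p_j)^{s_j}\, p_j^{f_j}.
\end{align*}
```

Because the counts appear as exponents, a client that exchanged many packets constrains Θ more strongly than one that exchanged few, which is one way to read "loss events, not loss rates".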

20

Gibbs Sampling

• Markov Chain Monte Carlo (MCMC)
  – construct a Markov chain whose stationary distribution is P(Θ|D)
• Gibbs Sampling: defines the transition kernel
  – start with an arbitrary initial assignment of li
  – consider each link i in turn
  – compute P(li|D) assuming lj is fixed for j≠i
  – draw sample from P(li|D) and update li
  – after burn-in period, we obtain samples from the posterior P(Θ|D)

21

Gibbs Sampling Algorithm

1) Initialize link loss rates arbitrarily
2) For j = 1 : burn-in
     for each link i
       compute P(li|D, {li'}) and draw a new li from it,
       where li is the loss rate of link i, and {li'} = ∪(k≠i) lk
3) For j = 1 : realSamples
     for each link i
       compute P(li|D, {li'}) and draw a new li from it
4) Use all the samples obtained at step 3 to approximate P(Θ|D)
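
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' sampler: each li is restricted to a small grid of candidate values (all strictly between 0 and 1), the prior over the grid is uniform, losses on different links are independent, and the posterior is summarized by per-link means. The function names are hypothetical.

```python
import math
import random

def log_likelihood(paths, data, loss):
    """log P(D|Theta), assuming independent losses on every link."""
    total = 0.0
    for path, (s, f) in zip(paths, data):
        ok = 1.0  # end-to-end success probability for this client
        for i in path:
            ok *= 1.0 - loss[i]
        total += s * math.log(ok) + f * math.log(1.0 - ok)
    return total

def gibbs_posterior_means(paths, data, n_links, grid, burn_in, real_samples, rng):
    """Approximate the posterior mean loss rate of each link."""
    loss = [rng.choice(grid) for _ in range(n_links)]  # step 1: arbitrary start
    sums = [0.0] * n_links
    for sweep in range(burn_in + real_samples):        # steps 2 and 3
        for i in range(n_links):
            # Conditional of l_i with all other links held fixed,
            # evaluated on the grid (uniform prior assumed).
            log_w = []
            for candidate in grid:
                loss[i] = candidate
                log_w.append(log_likelihood(paths, data, loss))
            top = max(log_w)  # subtract the max to avoid underflow
            weights = [math.exp(w - top) for w in log_w]
            loss[i] = rng.choices(grid, weights=weights)[0]
        if sweep >= burn_in:
            # Keep only the samples drawn after the burn-in period.
            for i in range(n_links):
                sums[i] += loss[i]
    return [total / real_samples for total in sums]
```

With a tree in which two clients share link l2, heavy loss at both clients can be explained either by l2 or by their two last-hop links; a coordinate-wise sampler moves slowly between such explanations, so in practice one would run several chains from different starting points.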

22

Experimental Evaluation

• Simulation experiments
• Internet traffic traces

23

Simulation Experiments

• Advantage: no uncertainty about link loss rate
• Methodology
  – Topologies used:
    • randomly-generated: 20 - 3000 nodes, max degree = 5-50
    • real topology obtained by tracing paths to microsoft.com clients
  – randomly-generated packet loss events at each link
    • a fraction f of the links are good, and the rest are “bad”
    • LM1: good links: 0 – 1%, bad links: 5 – 10%
    • LM2: good links: 0 – 1%, bad links: 1 – 100%
• Goodness metrics:
  – Coverage: # correctly inferred lossy links
  – False positives: # incorrectly inferred lossy links
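
Both goodness metrics reduce to set operations once the true and inferred lossy links are known. The sketch below uses a hypothetical helper, `evaluate_inference`, and made-up link labels.

```python
def evaluate_inference(true_lossy, inferred_lossy):
    """Return (coverage, false_positives) as link counts."""
    truth = set(true_lossy)
    inferred = set(inferred_lossy)
    coverage = len(truth & inferred)         # correctly inferred lossy links
    false_positives = len(inferred - truth)  # inferred lossy, actually good
    return coverage, false_positives

# Made-up example: three truly lossy links, three inferred ones.
example_result = evaluate_inference({"l2", "l5", "l8"}, {"l2", "l8", "l3"})
```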

24

Simulation Results

1000-node random topologies (d=10, f=0.95)

[Bar chart: for each of Random, LP, and Gibbs, the # true lossy links, # correctly identified lossy links, and # false positives; y-axis: # links, 0 to 160]

25

Simulation Results

1000-node random topologies (d=10, f=0.5)

[Bar chart: for each of Random, LP, and Gibbs, the # true lossy links, # correctly identified lossy links, and # false positives; y-axis: # links, 0 to 600]

26

Simulation Results

Gibbs sampling for a 1000-node random topology (d = 10, f = 0.5)

[Plot: # correctly identified lossy links, # true lossy links, and # false positives (y-axis: # links, 0 to 600) against an x-axis running from 0 to 1000]

High confidence in top few inferences

27

Trade-off

Technique         Coverage   False positives   Computation
Random sampling   High       High              Low
LP                Medium     Low               Medium
Gibbs sampling    High       Low               High

28

Internet Traffic Traces

• Challenge: validation
  – Divide client traces into two: tomography set and validation set
  – Tomography data set => loss inference
  – Validation set => check if clients downstream of the inferred lossy links experience high loss
• Results
  – false positive rate is between 5 and 30%
  – likely candidates for lossy links:
    • links crossing an inter-AS boundary
    • links having a large delay (e.g. transcontinental links)
    • links that terminate at clients
  – example lossy links:
    • San Francisco (AT&T) → Indonesia (Indo.net)
    • Sprint → PacBell in California
    • Moscow → Tyumen, Siberia (Sovam Teleport)

29

Summary

• Poor correlation between topological metrics & performance
• Significant spatial locality and temporal stability
• Passive network tomography is feasible
• Tradeoff between computational cost and accuracy
• Future directions
  – real-time inference
  – selective active probing
• Acknowledgements:
  – MSR: Dimitris Achlioptas, Christian Borgs, Jennifer Chayes, David Heckerman, Chris Meek, David Wilson
  – Infrastructure: Rob Emanuel, Scott Hogan

http://www.research.microsoft.com/~padmanab
