Active Measurements on the AT&T IP Backbone

Len Ciavattone, Al Morton, Gomathi Ramachandran

AT&T Labs

Colleagues on This Project

• Nicole Kowalski
• Ron Kulper
• George Holubec
• Shashi Pulakurti

Measurements for Large Networks

Measurements must be:
• Easily understood
• Able to estimate or assess customer performance
• Useful for alarming and associated actions
• Not likely to generate false positives
• As close as possible to real-time notification
• Part of the traditional fault/passive management system

Traditional Measurements

Fault
• Triggered by hard failures (link, card, router, etc.)
• Near real-time alarms

Passive
• Element-level monitoring
• Traffic, drops, device health, card performance monitored
• Performance alarming possible per interface

What can be added to traditional measurements?
• Path-level performance information
• Delay and delay variation measurements
• Indication of customer degradation (except hard failures)

Active Measurements

Active measurements introduce synthetic traffic into the network.

Advantages:
• Traffic flow follows a sampled customer path
• Delay, delay variation and sampled loss directly measurable
• Possible to estimate customer impact of element-level degradation
• Well-designed sampling methodology will allow sound estimation of levels of degradation seen
• Can be used to give customers a sense of network behavior (e.g. AT&T’s Network Status Site http://www.att.com/ipnetwork)

Disadvantages:
• Need to introduce traffic into the network
• Based on sampling, not customer traffic

Practical Considerations

From a practical standpoint, what limits the measurements?
• Amount of data generated
• Desire to use a standard/unmodified UNIX kernel
• Expense of bigger and more powerful servers
• Cost of deployment of new servers in COs
• Difficulty of acquiring appropriate GPS feed

Measurement Design

Poisson Sequence
• 15 minute duration
• Rate λ = 0.3 pkts/sec
• Type UDP
• 278 bytes total
• Packet loss threshold is a min of 3 s

Periodic Sequence
• 1 minute duration
• Random start time
• 20 ms spacing
• Type UDP, IPv4
• 60 bytes total
• Packet loss threshold is a min of 3 s

[Figure: timeline of 15-minute test intervals repeating over 24 hours]

Presented at the IETF 50 IPPM meeting by Al Morton
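The two probe sequences described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the choice to draw the Periodic start time uniformly so that the whole 1-minute test fits inside the 15-minute cycle is an assumption the slide does not spell out.

```python
import random

def poisson_send_times(rate_pps, duration_s, rng):
    """Send times (s) whose gaps are exponential with mean 1/rate_pps."""
    times = []
    t = rng.expovariate(rate_pps)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_pps)
    return times

def periodic_send_times(spacing_s, duration_s, cycle_s, rng):
    """Evenly spaced send times beginning at a random offset in the cycle.

    Assumption: the start is uniform over the offsets that let the whole
    test finish before the cycle ends.
    """
    start = rng.uniform(0.0, cycle_s - duration_s)
    count = round(duration_s / spacing_s)
    return [start + i * spacing_s for i in range(count)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
poisson = poisson_send_times(0.3, 15 * 60, rng)          # 0.3 pkts/sec for 15 min
periodic = periodic_send_times(0.020, 60, 15 * 60, rng)  # 20 ms spacing for 1 min
print(len(poisson), len(periodic))
```

With these parameters the Periodic test always sends 3000 packets, while the Poisson test sends about 270 on average (0.3 pkts/sec over 900 s).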

Sampling and Event Detection

Poisson Sequence
• All 15 minutes tested with average inter-arrival time of 3.33 s
• Assume 10 s congestion events (minimum length)
• If

$$\lambda = \frac{\text{Number of probe packets}}{\text{Test Cycle Length}}$$

then the probability of detection by one or more packets is

$$P(\text{detection}) = 1 - e^{-\lambda \cdot \text{cong\_event\_length}} = 1 - e^{-0.3 \times 10} = 0.95$$
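The detection probability for the Poisson sequence can be checked numerically. The sketch below assumes the probes form a Poisson process with rate λ, so the chance that at least one probe falls inside an event of a given length is 1 − exp(−λ · length); the function name is hypothetical.

```python
import math

def poisson_detection_probability(rate_pps, event_length_s):
    """P(at least one Poisson probe is sent during the event)."""
    return 1.0 - math.exp(-rate_pps * event_length_s)

p_detect = poisson_detection_probability(0.3, 10.0)
print(f"{p_detect:.3f}")  # 0.950
```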

Sampling and Event Detection

Periodic Sequence
• 1-min test in a 15-min test cycle (2 min if considering RT processes)
• Assume 10 s congestion events (minimum length), assume 1 event per test cycle

$$P(\text{detection}) = \frac{\text{Test length} + \text{Congestion event length}}{\text{Test Cycle Length}}$$

$$\frac{60 + 10}{15 \times 60} = 0.0778 \quad \text{(one-way events)}$$

$$\frac{2 \times 60 + 10}{15 \times 60} = 0.144 \quad \text{(RT event)}$$

• Consider that only recurring events are actionable: average number of cycles to detection (one-way) = 1/0.0778 ≈ 13 test cycles

The Poisson Probe sequence detects accurately; the Periodic Probe sequence is used to characterize recurring events.
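The arithmetic for the Periodic sequence can be reproduced directly. This is a minimal sketch using the slide's numbers (60 s test, 10 s event, 15-minute cycle); the function name is hypothetical, and the expected number of cycles is taken as the reciprocal of the per-cycle probability, as on the slide.

```python
def periodic_detection_probability(test_length_s, event_length_s, cycle_length_s):
    """Chance that an event placed at random in the cycle overlaps the test."""
    return (test_length_s + event_length_s) / cycle_length_s

cycle_s = 15 * 60
one_way = periodic_detection_probability(60, 10, cycle_s)
round_trip = periodic_detection_probability(2 * 60, 10, cycle_s)
cycles_to_detect = 1.0 / one_way  # mean number of cycles until first detection

print(f"{one_way:.4f} {round_trip:.3f} {cycles_to_detect:.1f}")  # 0.0778 0.144 12.9
```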

Metrics

• Round Trip (RT) Loss
• RT Delay (std dev, 95th percentile, min, mean)
• Inter-Packet Delay Variation (IPDV) and DV jitter
• Out-of-sequence events (non-reversing sequence definition, up for consideration in the IETF IPPM)
• Approximate one-way loss
• Degraded seconds or minutes
• Loss pattern (number of consecutive losses)
• Distributions of delay variations
• Traceroutes performed at the beginning of each test

85 metrics kept indefinitely
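Two of the listed metrics, the RT delay summary and the loss pattern, are simple enough to sketch. The code below is an illustration only: the function names are hypothetical, the 95th percentile uses the nearest-rank method, and the standard deviation is the population form, neither of which the slides specify.

```python
import statistics

def rt_delay_summary(delays_ms):
    """Min, mean, population std dev and nearest-rank 95th percentile."""
    ordered = sorted(delays_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return {
        "min": ordered[0],
        "mean": statistics.fmean(ordered),
        "std_dev": statistics.pstdev(ordered),
        "p95": ordered[rank - 1],
    }

def loss_runs(received):
    """Lengths of each run of consecutive lost packets (False = lost)."""
    runs, current = [], 0
    for ok in received:
        if ok:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs

summary = rt_delay_summary([10.0] * 19 + [30.0])
runs = loss_runs([True, False, False, True, False, True])
print(summary, runs)
```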

IPDV Definition and Example

[Figure: timing diagram of packets 1 to 4 at Tx, Rcv and Playout, showing time spent in transit and in the receive buffer; one inter-packet arrival time is marked as longer than the send interval]

IPDV is a measure of transfer delay variation. For Packet n,

IPDV(n) = Delay(n) - Delay(n-1)

If the nominal transfer time is 10 msec, and packet 2 is delayed in transit for an additional 5 msec, then two IPDV values will be affected.

IPDV(2) = 15 - 10 = 5 msec

IPDV(3) = 10 - 15 = -5 msec

IPDV(4) = 10 - 10 = 0 msec
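The IPDV definition translates directly into code. The following sketch reproduces the slide's example (nominal delay 10 msec, packet 2 delayed by an extra 5 msec); the function name is hypothetical and the input is assumed to be one-way delays listed in send order with no lost packets.

```python
def ipdv(delays_ms):
    """IPDV(n) = Delay(n) - Delay(n-1) for each packet after the first."""
    return [delays_ms[n] - delays_ms[n - 1] for n in range(1, len(delays_ms))]

delays = [10, 15, 10, 10]  # packets 1..4
print(ipdv(delays))        # [5, -5, 0]
```

A single delayed packet therefore shows up as a positive value followed by a negative value of the same size.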

IP Packet Sequence

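[Figure: timing diagram of packets 1 to 4 at Src, Dst and Playout, showing time spent in transit and in the receive buffer; the tolerance on R2 arrival with a 2-packet buffer and the lateness t of packet 2 are marked]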

Arriving Packets are compared with the “next expected” RefNum.

Packet 2 arrives Out-of-Sequence, since Packet 3 has arrived and the “next expected” packet is Packet 4.

Packet 2 is Offset by 1 packet, or Late by (arrival time of Packet 2) - (arrival time of Packet 3) = t
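The "next expected" comparison can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function name is hypothetical, the offset is counted as the number of earlier arrivals with a higher RefNum, and lateness is measured against the earliest of those arrivals. Both match the slide's single-packet example, but the general definitions are assumed.

```python
def classify_arrivals(arrivals):
    """arrivals: (ref_num, arrival_time_s) tuples in order of arrival.

    Returns (ref_num, in_sequence, offset, lateness_s) per packet.
    """
    next_expected = 0   # any first packet counts as in sequence
    seen = []           # earlier arrivals as (ref_num, time)
    results = []
    for ref_num, t in arrivals:
        if ref_num >= next_expected:
            results.append((ref_num, True, 0, 0.0))
            next_expected = ref_num + 1   # never moves backwards
        else:
            earlier_higher = [ts for r, ts in seen if r > ref_num]
            offset = len(earlier_higher)
            lateness = t - min(earlier_higher)
            results.append((ref_num, False, offset, lateness))
        seen.append((ref_num, t))
    return results

# Packet 3 overtakes packet 2, as in the slide's example.
arrivals = [(1, 0.000), (3, 0.040), (2, 0.045), (4, 0.060)]
results = classify_arrivals(arrivals)
for row in results:
    print(row)
```

Only packet 2 is flagged: packet 3 advances the next expected RefNum to 4, and packet 4 then arrives in sequence.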

Common Problems Detected

• Route changes
• Card degradation
• Low-level fiber errors
• Effects of maintenance (card swaps, etc.)

Examples of Detection

• Bit errors that cause low-level (~0.03%) loss can be detected accurately using this method and can be fixed before customers feel the impact
• Typically in such cases the degradation is subtle enough that traditional IP alarms do not show the problem clearly
• Customers aren’t complaining... yet
• In the case shown, no customer complaints were made and the problem was fixed proactively
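The ~0.03% figure is consistent with the loss resolution of a single Periodic test. As a rough check, assuming one test sends 60 s / 20 ms = 3000 packets (an inference from the Measurement Design parameters, not a number stated on the slides), one or two lost packets give the following percentages. The function names are hypothetical.

```python
def packets_per_test(duration_s, spacing_s):
    """Number of evenly spaced probes sent in one test."""
    return round(duration_s / spacing_s)

def loss_percent(lost, sent):
    """Loss expressed as a percentage of packets sent."""
    return 100.0 * lost / sent

sent = packets_per_test(60, 0.020)
for lost in (1, 2):
    print(lost, f"{loss_percent(lost, sent):.3f}%")  # 0.033%, then 0.067%
```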

[Figure: Percentage loss (0 to 0.2) versus time, 02/05/2002 to 03/05/2002. Annotations: increasing bit errors; single packet loss per Periodic test; two packet losses per Periodic test; more occasional loss was seen with the Poisson Probe Sequence; fiber span taken out of service.]

Detection of Route Changes

Time, h:m   Type       Lost Packets   Burst Duration   Route Dur, m:s
0:49        Poisson    5 consec       23 sec           4:32
1:07        Poisson    5 consec       15 sec           <2:00
1:09        Periodic   54 consec      1.04 sec         (return)
1:18        Poisson    4 consec       26 sec           2:10
1:30        Poisson    5 consec       28 sec           2:03

[Figure: RT delay versus time from 1:00 to 1:15, with the Periodic Sequence and the events at 1:07 and 1:09 marked (labels 6 and 9)]

Poisson Probe Route change detection

[Figure: Delay (ms), 0 to 80, versus time, 19:45:08 11/28/2001 to 3:15:08 11/29/2001. Series: min, mean, 95%, Loss (%).]

Periodic probe (same incident)

[Figure: Delay (ms), 0 to 70, versus time, 19:38:03 11/28/2001 to 3:10:11 11/29/2001. Series: min, mean, 95%, %Loss.]

The “Blenders”

• First shown by Steve Casner et al. at the NANOG 22 conference (May 20-22, 2001, “A Fine-Grained View of High Performance Networking”, http://www.nanog.org/mtg-0105/agenda.html)
• Seem to be properties of route loops
• Rare events, but interesting as they may shed light on some properties of route convergence

Simple Blender

[Figure: RtDelay (ms), 0 to 2000, versus TxSeqNo, 0 to 18000]

• 88 packets arrive within 64 ms
• 79 OOS packets, 9 in sequence
• 7 sequence discontinuities
• Zero loss
• Delay and IPDV actually describe this event best

Simple Blender Magnified

[Figure: RtDelay (ms), 0 to 2000, versus TxSeqNo, 17000 to 19000]

Blender 2

[Figure: RtDelay (ms), 0 to 7000, versus TxSeqNo, 0 to 70000]

• Scattered loss throughout
• 250 packets in event
• 10 separate sequence discontinuities
• Delay of first packet 6 s

Blender 2

[Figure: RtDelay (ms), 0 to 7000, versus TxSeqNo, 53000 to 61000]

Summary

Active measurements:
• Can provide a view of customer performance
• Can be used to alert maintenance personnel proactively
• Can provide insight into network behavior
• Can be used to improve planned maintenance