Active Measurements on the AT&T IP Backbone Len Ciavattone, Al Morton, Gomathi Ramachandran AT&T...
Active Measurements on the AT&T IP Backbone
Len Ciavattone,
Al Morton, Gomathi Ramachandran
AT&T Labs
L. Ciavattone, A. Morton, G. Ramachandran AT&T Labs Page 2
Colleagues on This Project
Nicole Kowalski, Ron Kulper, George Holubec, Shashi Pulakurti
Measurements for Large Networks
Measurements must be:
• Easily understood
• Able to estimate or assess customer performance
• Useful for alarming and associated actions
• Unlikely to generate false positives
• As close as possible to real-time notification
• Part of the traditional fault/passive management system
Traditional Measurements
Fault
• Triggered by hard failures (link, card, router, etc.)
• Near real-time alarms
Passive
• Element-level monitoring
• Traffic, drops, device health, card performance monitored
• Performance alarming possible per interface
What can be added to traditional measurements?
• Path-level performance information
• Delay and delay variation measurements
• Indication of customer degradation (except hard failures)
Active Measurements
Active measurements introduce synthetic traffic into the network.
Advantages:
• Traffic flow follows a sampled customer path
• Delay, delay variation and sampled loss are directly measurable
• Possible to estimate the customer impact of element-level degradation
• A well-designed sampling methodology allows sound estimation of the levels of degradation seen
• Can be used to give customers a sense of network behavior (e.g. AT&T’s Network Status Site, http://www.att.com/ipnetwork)
Disadvantages:
• Need to introduce traffic into the network
• Based on sampling, not customer traffic
Practical Considerations
From a practical standpoint, what limits the measurements?
• Amount of data generated
• Desire to use a standard/unmodified UNIX kernel
• Expense of bigger and more powerful servers
• Cost of deploying new servers in COs
• Difficulty of acquiring an appropriate GPS feed
Measurement Design
Poisson Sequence
• 15 minute duration
• λ = 0.3 pkts/sec
• Type UDP, 278 bytes total
• Packet loss threshold is a minimum of 3 s
Periodic Sequence
• 1 minute duration
• Random start time
• 20 ms spacing
• Type UDP, IPv4, 60 bytes total
• Packet loss threshold is a minimum of 3 s
[Figure: timeline of 15-minute test cycles repeating over 24 hours]
Presented at the IETF 50 IPPM meeting by Al Morton
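The two probe streams described on this slide can be sketched as send-time schedules. This is a minimal illustration, not the authors' tool: the function names are hypothetical, and it assumes that the Poisson stream uses exponential gaps at 0.3 pkts/sec and that the periodic test's random start is drawn uniformly so that the whole 1-minute test fits inside the 15-minute cycle.

```python
import random

def poisson_send_times(duration_s=900.0, rate_pps=0.3, rng=None):
    """Send times (s) for a Poisson stream: exponential gaps with mean 1/rate."""
    rng = rng or random.Random()
    times = []
    t = rng.expovariate(rate_pps)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_pps)
    return times

def periodic_send_times(cycle_s=900.0, test_s=60.0, spacing_s=0.020, rng=None):
    """Send times (s) for one periodic test at a random start inside the cycle."""
    rng = rng or random.Random()
    # Assumption: uniform start time, chosen so the test ends within the cycle.
    start = rng.uniform(0.0, cycle_s - test_s)
    count = round(test_s / spacing_s)  # 3000 probes at 20 ms spacing
    return [start + i * spacing_s for i in range(count)]

rng = random.Random(50)  # fixed seed only so the sketch is repeatable
poisson = poisson_send_times(rng=rng)
periodic = periodic_send_times(rng=rng)
print(len(poisson), "Poisson probes;", len(periodic), "periodic probes")
```

The Poisson count varies from cycle to cycle around 0.3 × 900 = 270 probes, while the periodic test always sends 3000 probes under these parameters.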
Sampling and Event Detection
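Poisson Sequence
• All 15 minutes tested, with an average inter-arrival time of 3.33 s
• Assume 10 s congestion events (minimum length)
• If
$$\lambda = \frac{\text{Number of probe packets}}{\text{Test Cycle Length}} = 0.3\ \text{pkts/sec}$$
then the probability of detection by one or more packets is
$$P(\text{detection}) = 1 - e^{-\lambda \cdot \text{cong\_event\_length}} = 1 - e^{-0.3 \times 10} = 0.95$$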
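The Poisson detection probability on this slide can be checked numerically. The sketch below only evaluates 1 - exp(-rate × event length); the function name is hypothetical, and the extra event lengths in the loop are illustrative values, not figures from the talk.

```python
import math

def poisson_detection_probability(rate_pps, event_length_s):
    """P(at least one Poisson probe falls inside the event)."""
    return 1.0 - math.exp(-rate_pps * event_length_s)

# Slide parameters: 0.3 pkts/sec and a 10 s congestion event.
p = poisson_detection_probability(0.3, 10.0)
print(f"{p:.4f}")  # prints 0.9502

# Illustrative only: how detection changes with event length at the same rate.
for length_s in (5.0, 10.0, 20.0):
    print(length_s, round(poisson_detection_probability(0.3, length_s), 4))
```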
Sampling and Event Detection
Periodic Sequence
• 1-min test in a 15-min test cycle (2 min if considering RT processes)
• Assume 10 s congestion events (minimum length), and assume 1 event per test cycle
$$P(\text{detection}) = \frac{\text{Test length} + \text{Congestion event length}}{\text{Test Cycle Length}}$$
$$\frac{60 + 10}{15 \times 60} = 0.0778 \quad \text{(one-way events)}$$
$$\frac{2 \times 60 + 10}{15 \times 60} = 0.144 \quad \text{(RT event)}$$
• Consider that only recurring events are actionable: average number of cycles to detection (one-way) = 1/0.0778 ≈ 13 test cycles
• The Poisson Probe sequence detects accurately; the Periodic Probe sequence is used to characterize recurring events
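The periodic-sequence arithmetic on this slide can be reproduced in a few lines. This is a sketch with a hypothetical function name. It assumes one event per test cycle at an independent random position, so the mean number of cycles to first detection is 1/P.

```python
def periodic_detection_probability(test_s, event_s, cycle_s):
    """P(an event of length event_s overlaps a test of length test_s in one cycle)."""
    return (test_s + event_s) / cycle_s

cycle_s = 15 * 60
p_one_way = periodic_detection_probability(60, 10, cycle_s)
p_round_trip = periodic_detection_probability(2 * 60, 10, cycle_s)
mean_cycles = 1.0 / p_one_way  # mean of a geometric distribution

print(f"one-way: {p_one_way:.4f}")                  # one-way: 0.0778
print(f"round trip: {p_round_trip:.4f}")            # round trip: 0.1444
print(f"cycles to detection: {mean_cycles:.1f}")    # cycles to detection: 12.9
```

At 15 minutes per cycle, about 13 cycles is a little over three hours, which is why this sequence suits recurring events rather than one-off ones.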
Metrics
• Round Trip (RT) Loss
• RT Delay (std dev, 95th percentile, min, mean)
• Inter-Packet Delay Variation (IPDV) and DV jitter
• Out of sequence events (non-reversing sequence definition, up for consideration in the IETF IPPM)
• Approximate one-way loss
• Degraded seconds or minutes
• Loss pattern (number of consecutive losses)
• Distributions of delay variations
• Traceroutes performed at the beginning of each test
85 Metrics kept indefinitely
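A few of the listed metrics (RT loss, the RT delay summary, and the loss pattern) can be computed from one test's results as sketched below. The input layout, the function name, the nearest-rank percentile, and the population standard deviation are all assumptions made for illustration; the talk does not specify them.

```python
import math
import statistics

def summarize_round_trip(rt_delays_ms):
    """rt_delays_ms: one entry per probe sent, None where the probe was lost."""
    if not rt_delays_ms:
        raise ValueError("no probes in this test")
    received = [d for d in rt_delays_ms if d is not None]
    lost = len(rt_delays_ms) - len(received)

    # Loss pattern: length of each run of consecutive losses.
    runs, run = [], 0
    for d in rt_delays_ms:
        if d is None:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)

    ordered = sorted(received)
    # Nearest-rank 95th percentile (assumed definition).
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1] if ordered else None
    return {
        "loss_pct": 100.0 * lost / len(rt_delays_ms),
        "min_ms": ordered[0] if ordered else None,
        "mean_ms": statistics.fmean(received) if received else None,
        "std_ms": statistics.pstdev(received) if received else None,
        "p95_ms": p95,
        "loss_runs": runs,
    }

sample = [10.0, 12.0, None, None, 11.0, None, 10.0]  # made-up delays
print(summarize_round_trip(sample))
```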
IPDV Definition and Example
[Figure: timing diagram of packets 1 to 4 at Tx, Rcv, and Playout, showing the time spent in transit and in the receive buffer; one inter-packet arrival time is longer than the send interval]
IPDV is a measure of transfer delay variation. For Packet n,
IPDV(n) = Delay(n) - Delay(n-1)
If the nominal transfer time is 10 msec, and packet 2 is delayed in transit for an additional 5 msec, then two IPDV values will be affected:
IPDV(2) = 15 - 10 = 5 msec
IPDV(3) = 10 - 15 = -5 msec
IPDV(4) = 10 - 10 = 0 msec
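The IPDV example can be reproduced directly from the definition. This is a minimal sketch with a hypothetical function name; the delay list holds the per-packet transfer delays from the example (10 msec nominal, packet 2 delayed by an extra 5 msec).

```python
def ipdv(delays_ms):
    """IPDV(n) = Delay(n) - Delay(n-1), for n = 2..N."""
    return [cur - prev for prev, cur in zip(delays_ms, delays_ms[1:])]

delays_ms = [10, 15, 10, 10]  # packets 1..4
print(ipdv(delays_ms))  # prints [5, -5, 0]
```

A single delayed packet produces a positive value followed by a negative one of equal size, so isolated delay spikes show up as paired excursions in an IPDV series.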
IP Packet Sequence
[Figure: timing diagram of packets 1 to 4 at Src, Dst, and Playout, showing the time spent in transit and in the receive buffer, the interval t, and the tolerance on R2 arrival with a 2-packet buffer]
Arriving packets are compared with the “next expected” RefNum.
Packet 2 arrives Out-of-Sequence, since Packet 3 has already arrived and the “next expected” packet is Packet 4.
Packet 2 is Offset by 1 packet, or Late by t = (arrival time of Packet 2) - (arrival time of Packet 3).
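The “next expected” comparison can be sketched as follows. This is one reading of the slide, not the authors' implementation. It assumes RefNums start at 1. It also assumes that the offset counts earlier in-sequence arrivals with a higher RefNum, and that lateness is measured from the first such arrival. The function name and the arrival times are hypothetical.

```python
def find_out_of_sequence(arrivals):
    """arrivals: list of (ref_num, arrival_time_ms) in arrival order.

    Returns (ref_num, offset_packets, late_by_ms) for each out-of-sequence packet.
    """
    next_expected = 1  # assumption: RefNums start at 1
    in_sequence = []   # packets that advanced next_expected
    events = []
    for ref_num, t in arrivals:
        if ref_num >= next_expected:
            # In sequence; a gap may be loss, so it is not counted as reordering.
            next_expected = ref_num + 1
            in_sequence.append((ref_num, t))
            continue
        # Earlier arrivals with a higher RefNum overtook this packet.
        ahead = [(r, ta) for r, ta in in_sequence if r > ref_num]
        if not ahead:
            continue  # same RefNum seen before: treat as a duplicate
        events.append((ref_num, len(ahead), t - ahead[0][1]))
    return events

# Arrival order 1, 3, 2, 4 as on the slide; times in ms are made up.
print(find_out_of_sequence([(1, 0), (3, 40), (2, 45), (4, 60)]))
# prints [(2, 1, 5)]
```

Because `next_expected` only moves forward, a late packet never resets the comparison point, which matches the non-reversing behavior the Metrics slide refers to.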
Common Problems Detected
• Route changes
• Card degradation
• Low-level fiber errors
• Effects of maintenance (card swaps, etc.)
Examples of Detection
• Bit errors that cause low-level (~0.03%) loss can be detected accurately using this method and can be fixed before customers feel the impact
• Typically in such cases the degradation is subtle enough that traditional IP alarms do not show the problem clearly
• Customers aren’t complaining... yet
• In the case shown, no customer complaints were made and the problem was fixed proactively
[Figure: Percentage Loss (0 to 0.2) versus time, 02/05/2002 to 03/05/2002. Annotations: Increasing Bit Errors; Single packet loss per Periodic test; Two packet losses per Periodic test; Fiber span taken out of service]
More occasional loss was seen with the Poisson Probe Sequence.
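The loss levels in this example can be related to the periodic test design with a short calculation. The sketch assumes that every 1-minute periodic test sends all of its probes at 20 ms spacing and that loss is reported per test; the function name is hypothetical.

```python
def periodic_loss_pct(lost_packets, test_s=60.0, spacing_s=0.020):
    """Percentage loss for one periodic test with the given number of lost probes."""
    sent = round(test_s / spacing_s)  # 3000 probes per 1-minute test
    return 100.0 * lost_packets / sent

for lost in (1, 2):
    print(lost, f"{periodic_loss_pct(lost):.3f}%")
# prints:
# 1 0.033%
# 2 0.067%
```

Under these assumptions, a single lost packet in a periodic test corresponds to roughly the ~0.03% loss level mentioned for this example, and two lost packets to about twice that.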
Detection of Route Changes
Time, h:m  Type      Lost Packets  Burst Duration  Route Dur, m:s
0:49       Poisson   5 consec      23 sec          4:32
1:07       Poisson   5 consec      15 sec          <2:00
1:09       Periodic  54 consec     1.04 sec        (return)
1:18       Poisson   4 consec      26 sec          2:10
1:30       Poisson   5 consec      28 sec          2:03
[Figure: RT Delay versus Time, 1:00 to 1:15, marking the Periodic Sequence and the events at 1:07 and 1:09]
Poisson Probe Route change detection
[Figure: Delay (ms), 0 to 80, versus time from 19:45:08 11/28/2001 to 3:15:08 11/29/2001; series: min, mean, 95%, Loss (%)]
Periodic probe (same incident)
[Figure: Delay (ms), 0 to 70, versus time from 19:38:03 11/28/2001 to 3:10:11 11/29/2001; series: min, mean, 95%, %Loss]
The “Blenders”
• First shown by Steve Casner et al. at the NANOG 22 conference (May 20-22, 2001, “A Fine-Grained View of High Performance Networking”, http://www.nanog.org/mtg-0105/agenda.html)
• Seem to be properties of route loops
• Rare events, but interesting as they may shed light on some properties of route convergence
Simple Blender
[Figure: plot labeled RtDelay(ms) and TxSeqNo; vertical scale 0 to 2000, horizontal scale 0 to 18000]
• 88 packets arrive within 64 ms
• 79 OOS packets, 9 in sequence
• 7 sequence discontinuities
• Zero loss
• Delay and IPDV actually describe this event best
Simple Blender Magnified
[Figure: plot labeled RtDelay(ms) and TxSeqNo; vertical scale 0 to 2000, horizontal scale 17000 to 19000]
Blender 2
[Figure: plot labeled RtDelay(ms) and TxSeqNo; vertical scale 0 to 7000, horizontal scale 0 to 70000]
• Scattered loss throughout
• 250 packets in event
• 10 separate sequence discontinuities
• Delay of first packet 6 s
Blender 2
[Figure: plot labeled RtDelay(ms) and TxSeqNo; vertical scale 0 to 7000, horizontal scale 53000 to 61000]
Summary
Active measurements:
• Can provide a view of customer performance
• Can be used to alert maintenance personnel proactively
• Can provide insight into network behavior
• Can be used to improve planned maintenance