A Framework for Measuring and Predicting the Impact of Routing Changes

Ying Zhang, Z. Morley Mao, Jia Wang

Page 1: A Framework for Measuring and Predicting the Impact of Routing Changes

Ying Zhang, Z. Morley Mao, Jia Wang

Page 2: Internet routing changes

Various causes
  Link failures, configuration changes, topology changes, etc.
Direct influence on the data plane
  Transient data-plane disruption
    Packet loss, increased delay, forwarding loops

[Figure: a source and a destination connected across the Internet through routers labeled CBR, with the old path and the new path marked.]

Page 3: Motivation

Frequent routing dynamics can cause transient disruption in the data plane
  Inconsistent routes during convergence
Real-time applications can be affected
Predicting performance impact can assist more intelligent route selection

Page 4: Measuring and predicting the impact

Comprehensively measure the impact of routing changes
Characterize the properties of routing changes that cause traffic disruption
Search for patterns to help prediction

Page 5: Outline

Motivation
Methodology
Characterization of data-plane failures
Failure prediction model

Page 6: Methodology

Data collection
  Control plane: local real-time BGP updates
  Data plane: ping and traceroute probes for each update
A lightweight active probing methodology
  A coarse-grained performance metric: reachability
    Destination reachable: any ping reply
  Scalable to many destinations with live IPs
Measurement-based approach
  No simplifying assumptions
  Empirical evidence

Page 7: Our approach

Focus: measure data-plane failures caused by routing changes
  Coarse-grained performance metrics
Methodology: lightweight active probing
  Triggered by locally observed routing updates
  Probing target of a live IP within the prefix

[Figure: Measurement Framework. Labels: Internet, AS A, AS B, AS C, AS D, Prefix P, old path, new path, and an observed update with Prefix: P, AS path: A D B.]

Page 8: Our approach

Focus: measure data-plane failures caused by routing changes
Methodology: lightweight active probing
  Triggered by locally observed routing updates
  Probing target of a live IP within the prefix

[Figure: Measurement Framework. Labels: Internet, AS A, AS B, AS C, AS D, old path, new path, and ping and traceroute probes sent to Live IP 1 within Prefix P.]
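The triggered probing described on these slides can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `BgpUpdate`, `pick_live_ip`, `probe_on_update`, the `live_ips` table, and the injected `ping` callable are all hypothetical names. The only rule taken from the slides is that a destination counts as reachable if any ping reply comes back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class BgpUpdate:
    """A locally observed routing update (hypothetical record layout)."""
    prefix: str
    as_path: List[str]


def pick_live_ip(prefix: str, live_ips: Dict[str, List[str]]) -> Optional[str]:
    """Return a known-responsive IP inside the prefix, or None if none is known."""
    candidates = live_ips.get(prefix, [])
    return candidates[0] if candidates else None


def probe_on_update(
    update: BgpUpdate,
    live_ips: Dict[str, List[str]],
    ping: Callable[[str], bool],
    attempts: int = 3,  # assumed value, not from the slides
) -> Optional[bool]:
    """Probe the updated prefix once a routing update is observed.

    Returns True/False for reachable/unreachable, or None when there is
    no live IP to probe inside the prefix.
    """
    target = pick_live_ip(update.prefix, live_ips)
    if target is None:
        return None
    # Coarse-grained metric: reachable if ANY ping gets a reply.
    return any(ping(target) for _ in range(attempts))
```

Passing `ping` in as a callable keeps the sketch independent of any particular probing tool; a real deployment would wrap an actual ping and traceroute there.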

Page 9: Probing control

Background probing
  Identifying persistent failures
  Verifying the live IP’s response
Resource control
  Ignoring updates due to table transfers
  Imposing maximum probing duration
Accuracy control
  Imposing maximum waiting duration
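A rough sketch of the two resource-control items, under stated assumptions: the slides do not say how table transfers are recognized, so `is_table_transfer` uses an assumed burst heuristic, and the window, threshold, interval, and duration values are placeholders. The function names and the injected `clock` and `sleep` callables are hypothetical.

```python
from typing import Callable, List, Tuple


def is_table_transfer(
    update_times: List[float],
    now: float,
    window_s: float = 60.0,       # placeholder value
    burst_threshold: int = 1000,  # placeholder value
) -> bool:
    """Assumed heuristic: a large burst of updates inside a short window
    looks like a table transfer, so those updates should not trigger probes."""
    recent = [t for t in update_times if now - window_s <= t <= now]
    return len(recent) >= burst_threshold


def probe_with_limit(
    probe: Callable[[], bool],
    clock: Callable[[], float],
    sleep: Callable[[float], None],
    max_duration_s: float = 300.0,  # placeholder value
    interval_s: float = 10.0,       # placeholder value
) -> List[Tuple[float, bool]]:
    """Probe until a reply arrives or the maximum probing duration elapses.

    Returns (elapsed_seconds, reachable) for every probe that was sent.
    """
    start = clock()
    results: List[Tuple[float, bool]] = []
    while True:
        elapsed = clock() - start
        if elapsed > max_duration_s:
            break  # resource control: stop probing this destination
        reachable = probe()
        results.append((elapsed, reachable))
        if reachable:
            break  # destination recovered, no further probes needed
        sleep(interval_s)
    return results
```

In production the callables would be `time.monotonic` and `time.sleep`; injecting them makes the time limit easy to exercise without waiting.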

Page 10: Outline

Motivation
Methodology
Characterization of data-plane failures
Failure prediction model

Page 11: Characterization of data-plane failures

Failure types
  Reachability failure
    Ping reply is not received due to network problems
  Forwarding loops
    A subset of reachability failures
    Transient loops observed in the path
Failure properties
  Affected networks
  Failure duration
  Failure predictability
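One simple way to flag a forwarding loop in a traceroute result is to look for a router address that reappears after a different hop. The sketch below assumes the path is already parsed into a list of hop addresses, with `None` for hops that did not reply; `has_forwarding_loop` is a hypothetical helper, and the slides do not specify the detection rule that was actually used.

```python
from typing import List, Optional


def has_forwarding_loop(hops: List[Optional[str]]) -> bool:
    """Return True if a responding address reappears after a different hop.

    Non-responding hops (None) are skipped, and an address that answers
    for several consecutive TTLs is counted once, so neither case alone
    is reported as a loop.
    """
    path: List[str] = []
    for hop in hops:
        if hop is None:
            continue  # no reply at this TTL
        if path and path[-1] == hop:
            continue  # same router answering consecutive TTLs
        path.append(hop)
    # Any address seen twice in the collapsed path means the packet came back.
    return len(path) != len(set(path))
```

This deliberately ignores complications such as load-balanced paths, which can also make addresses repeat.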

Page 12: Overall reachability failure statistics

                 Incidence   Prefix   AS
Unreachable
  Loop               6%        23%    33%
  Other             36%        72%    38%
  All               42%        73%    63%
Reachable           57%        83%    98%

Internet experiments for 11 weeks

Page 13: Affected network locations

Understanding the networks affected by routing changes
  Most affected ASes are near the edge and in foreign countries
  A small fraction of destinations experience many unreachable incidences

Page 14: Failure durations

Short duration
  Most last less than 300 seconds
  Transient routing failure, convergence delay
10% of incidences have a longer duration
  Configuration errors or path failures

Page 15: Failure predictability

Destination prefix information
  Appearance probability
    Probability of an unreachable incidence for prefix D
Destination prefix and AS path segments
  Conditional probability on AS path segments
    Probability of an unreachable event occurring given a particular AS path segment
Responsible AS
  Where traceroute stops
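Both probabilities can be estimated as simple frequencies over a log of past probing results. The sketch below assumes each history record is a `(prefix, AS path, unreachable)` tuple and reads the appearance probability as the fraction of probed updates for prefix D that ended unreachable; the record layout, the function names, and that choice of denominator are assumptions, since the slides do not define them.

```python
from typing import List, Sequence, Tuple

# (destination prefix, AS path of the update, was the destination unreachable?)
Incident = Tuple[str, Tuple[str, ...], bool]


def appearance_probability(history: Sequence[Incident], prefix: str) -> float:
    """Fraction of probed updates for `prefix` that were unreachable."""
    outcomes = [bad for p, _, bad in history if p == prefix]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def segments(as_path: Sequence[str], length: int) -> List[Tuple[str, ...]]:
    """All contiguous AS path segments of the given length."""
    return [tuple(as_path[i:i + length]) for i in range(len(as_path) - length + 1)]


def segment_failure_probability(
    history: Sequence[Incident], prefix: str, segment: Tuple[str, ...]
) -> float:
    """Fraction of updates for `prefix` whose AS path contains `segment`
    and that were unreachable."""
    outcomes = [
        bad
        for p, path, bad in history
        if p == prefix and segment in segments(path, len(segment))
    ]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

Returning 0.0 for an unseen prefix or segment is a convenience of this sketch; a real estimator would likely treat "no history" separately from "never failed".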

Page 16: Outline

Motivation
Methodology
Characterization of data-plane failures
Failure prediction model

Page 17: Prediction model

Prefix and AS segment information
The data-plane failure likelihood ratio

\[ \Lambda(Y) = \frac{P(Y=1 \mid R; D)}{P(Y=0 \mid R; D)} \]

  P(Y=1|R;D): the conditional probability of data-plane failure given a routing update R for prefix D
Assuming the failure on each AS is independent

\[ P(Y=1 \mid R = x_1, x_2, \ldots, x_n; D) = 1 - \prod_{i=1}^{n} \bigl(1 - P(Y=1 \mid x_i; D)\bigr) \]

  x_i is the responsible AS in the historical data
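Under the independence assumption, the failure probability for an update is one minus the probability that no AS on the path fails. A minimal numeric sketch follows, assuming per-AS failure probabilities for prefix D have already been estimated from historical data; `failure_probability`, `likelihood_ratio`, and the `p_fail_by_as` mapping are hypothetical names.

```python
import math
from typing import Dict, Sequence


def failure_probability(
    as_path: Sequence[str],
    p_fail_by_as: Dict[str, float],
    default: float = 0.0,  # assumed probability for ASes with no history
) -> float:
    """P(Y=1 | R; D) = 1 - prod_i (1 - P(Y=1 | x_i; D)), assuming each AS
    on the path fails independently."""
    p_no_failure = 1.0
    for asn in as_path:
        p_no_failure *= 1.0 - p_fail_by_as.get(asn, default)
    return 1.0 - p_no_failure


def likelihood_ratio(p_fail: float) -> float:
    """P(Y=1 | R; D) / P(Y=0 | R; D); Y is binary, so the denominator
    is 1 - P(Y=1 | R; D)."""
    if p_fail >= 1.0:
        return math.inf
    return p_fail / (1.0 - p_fail)
```

For example, with illustrative probabilities 0.1 for AS A and 0.5 for AS D, and no history for AS B, the path A D B gets 1 - (0.9 × 0.5 × 1.0) = 0.55.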

Page 18: Evaluation

The trade-off between selectivity and sensitivity
  The decision threshold determines the false-positive and false-negative rates
  Receiver operating characteristic
Evaluation results
  60% detection rate with 18% false positives
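The threshold trade-off can be illustrated by sweeping a decision threshold over predicted scores and recording the detection rate and false-positive rate at each setting, which gives points on a receiver operating characteristic curve. This is a generic sketch, not the authors' evaluation code: `roc_points` is a hypothetical helper, and the scores could be either predicted failure probabilities or likelihood ratios.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float],
    labels: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false_positive_rate, detection_rate) per threshold.

    An update is predicted to cause a failure when its score >= threshold;
    labels[i] is True when update i actually caused a data-plane failure.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = []
    for th in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= th)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= th)
        detection_rate = tp / positives if positives else 0.0
        false_positive_rate = fp / negatives if negatives else 0.0
        points.append((th, false_positive_rate, detection_rate))
    return points
```

Lowering the threshold raises the detection rate at the cost of more false positives; the 60% detection rate with 18% false positives reported on the slide is one operating point on such a curve.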

Page 19: Conclusion

Developed an efficient framework for measuring and predicting data-plane failures caused by routing changes
Identified patterns to accurately predict data-plane failures
Provided suggestions for more intelligent route selection