1 Case Studies on Intra-Domain Routing Instability Zhang Shu Communications Research Laboratory,...

Case Studies on Intra-Domain Routing Instability. Zhang Shu, Communications Research Laboratory, Japan (to be renamed to National Institute of Information and Communications Technology). APAN17 – Engineering Session, 1/30/2004, Hawaii.


Page 1: Case Studies on Intra-Domain Routing Instability

Zhang Shu

Communications Research Laboratory, Japan (to be renamed to National Institute of Information and Communications Technology)

APAN17 – Engineering Session, 1/30/2004, Hawaii

Page 2: Overview

• What is routing instability?
• Methodology of the measurement
• Case study 1: WIDE Internet
• Case study 2: APAN Tokyo-XP
• Conclusion and future work

Page 3: Routing Instability

Routing instability
• Also called route flaps
• Unexpected topology change

Negative effects
• Packet loss
• Increased router load
• Wasted bandwidth

Causes
• Link failure, software bug

Types of routing instability
• Inter-domain
• Intra-domain

Page 4: Methodology

Methodology
• Use “tcpdump” to collect link state routing messages
• Then analyze the routing messages with self-made tools
  • Ospfanaly
  • Some other scripts, including a CGI Perl script for viewing the statistical results on the web
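The analysis step can be illustrated with a minimal Python sketch. It is not the Ospfanaly tool itself, whose internals the slides do not describe. It assumes the captured messages have already been decoded into hypothetical records of the form `(unix_time, lsa_key, body_digest)`, where `lsa_key` identifies one LSA and `body_digest` is any fingerprint of its contents. Under that assumption, a change is counted whenever a known LSA reappears with different contents, and the counts are grouped by year/month.

```python
from collections import Counter
from datetime import datetime, timezone


def count_changes_per_month(records):
    """Count LSA content changes per "YYYY/MM" bucket.

    records: iterable of (unix_time, lsa_key, body_digest) tuples in
    capture order. This record layout is an assumption made for the
    sketch, not the format used by the original tools.
    """
    last_digest = {}    # lsa_key -> digest seen most recently
    changes = Counter()  # "YYYY/MM" -> number of changes
    for unix_time, lsa_key, body_digest in records:
        previous = last_digest.get(lsa_key)
        # A periodic refresh carries the same contents, so it is not counted.
        if previous is not None and previous != body_digest:
            stamp = datetime.fromtimestamp(unix_time, tz=timezone.utc)
            changes[stamp.strftime("%Y/%m")] += 1
        last_digest[lsa_key] = body_digest
    return changes
```

Comparing content fingerprints rather than sequence numbers is a design choice of this sketch: it keeps routine LSA refreshes from being counted as instability.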

Page 5: OSPF

Open Shortest Path First
• A widely deployed intra-domain link state routing protocol
• OSPFv2 and OSPFv3

Link state advertisements (LSAs)
• OSPFv2
  • Router-LSA
  • Network-LSA
  • Network-summary-LSA
  • ASBR-summary-LSA
  • AS-external-LSA
• OSPFv3
  • Seven kinds of LSAs defined in RFC 2740
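To make the OSPFv2 LSA types above concrete, here is a small Python sketch that decodes the 20-byte LSA header laid out in RFC 2328 and maps the LS type code to the five names listed. The function and class names are illustrative only and are not taken from the tools mentioned in the talk. Locating the LSA inside a captured packet is left out.

```python
import struct
from typing import NamedTuple

# OSPFv2 LS type codes (RFC 2328) for the five LSA kinds listed above.
LS_TYPE_NAMES = {
    1: "Router-LSA",
    2: "Network-LSA",
    3: "Network-summary-LSA",
    4: "ASBR-summary-LSA",
    5: "AS-external-LSA",
}


class LsaHeader(NamedTuple):
    age: int
    options: int
    ls_type: int
    link_state_id: str
    advertising_router: str
    sequence: int  # signed 32-bit, as OSPF defines it
    checksum: int
    length: int


def dotted(value: int) -> str:
    """Render a 32-bit integer in dotted-decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def parse_lsa_header(data: bytes) -> LsaHeader:
    """Decode the fixed 20-byte header that starts every OSPFv2 LSA."""
    if len(data) < 20:
        raise ValueError("an OSPFv2 LSA header is 20 bytes long")
    # Network byte order: age(2) options(1) type(1) id(4) router(4)
    # sequence(4, signed) checksum(2) length(2)
    age, options, ls_type, ls_id, adv_router, seq, cksum, length = struct.unpack(
        "!HBBIIiHH", data[:20]
    )
    return LsaHeader(age, options, ls_type, dotted(ls_id),
                     dotted(adv_router), seq, cksum, length)
```

The triple (LS type, link state ID, advertising router) identifies one LSA, so it is a natural key when tracking how a given LSA changes over time.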

Page 6: Case Study One: WIDE Internet

WIDE Internet
• WIDE Project
  • http://www.wide.ad.jp
• Connecting hundreds of organizations

NARA-NOC
• Located in Nara Institute of Science and Technology, Japan
• The measurement machine is placed in one Ethernet segment of the NARA-NOC network

Page 7: Measurement Result of WIDE Internet (OSPFv2)

[Figure: number of LSA changes and number of LSAs, plotted against date (year/month)]

Page 8: The Case of OSPFv3

[Figure: number of LSA changes and number of LSAs, plotted against date (year/month)]

Page 9: Other Findings during the Analysis

Sometimes serious LSA oscillation happened
• The changes occur at intervals of 10 s to 200 s
• Usually lasts for hours, sometimes for days

Oscillation of router-LSA
• Most of the observed oscillation was the repeated up/down of routers’ interfaces
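The oscillation pattern described above (changes arriving every 10 s to 200 s over a long period) suggests a simple detection rule, sketched here in Python. The 200 s gap comes from the slide. The `min_changes` threshold and the function name are assumptions of this sketch, not parameters of the original monitoring system.

```python
def find_oscillation_episodes(change_times, max_gap=200.0, min_changes=5):
    """Group the change timestamps of one LSA into oscillation episodes.

    change_times: timestamps (seconds) at which one LSA changed.
    max_gap: largest interval between consecutive changes that still
        counts as the same episode (200 s, the upper bound on the slide).
    min_changes: assumed minimum number of changes for an episode to be
        reported; chosen for illustration only.
    Returns a list of (start_time, end_time, number_of_changes).
    """
    episodes = []
    run = []
    for t in sorted(change_times):
        # A gap longer than max_gap closes the current run.
        if run and t - run[-1] > max_gap:
            if len(run) >= min_changes:
                episodes.append((run[0], run[-1], len(run)))
            run = []
        run.append(t)
    if len(run) >= min_changes:
        episodes.append((run[0], run[-1], len(run)))
    return episodes
```

The episode duration (`end_time - start_time`) can then be compared against hours or days to separate brief bursts from the long-running oscillation reported here.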

Page 10: The Causes of the Flaps

The isolated causes
• Congestion
  • DDoS attacks
• Operational mistakes
  • Mis-configuration of router ID
• Software/hardware bugs
  • Zebra routing daemon
  • Cisco’s OSPF bug
  • Foundry switch

The causes of many flaps are still unknown
• The flaps occur randomly

Why did the flaps decrease in recent months?
• The change of routing protocol implementation style
  • Special processing of routing messages
• Bandwidth

Page 11: Case Study Two: APAN Tokyo-XP

APAN Tokyo-XP
• Located in Otemachi, Tokyo
• Seven routers in the backbone area
• Data collected on a FreeBSD box connected to an Ethernet segment

Page 12: Measurement Result of APAN Tokyo-XP (OSPFv2)

[Figure: number of LSAs and number of LSA changes, plotted against date (year/month)]

Although most of the updates are due to router maintenance, there are still unknown ones.

Page 13: Conclusion

Our investigation on WIDE Internet
• OSPF LSA oscillation may occur frequently at times
• Sometimes serious oscillation occurred
• It is difficult to determine what caused the flaps

Similar phenomena may be found on other networks, so it is important to deploy a measurement system on different networks

Page 14: Future Work

To do more measurement on other networks
• Abilene of Internet2

To improve our monitoring system

To isolate the causes
• When oscillation is detected, obtain helpful data for troubleshooting

Page 15:

If you would like to conduct a routing instability measurement on your own network, please contact

Zhang Shu, [email protected]@koganei.wide.ad.jp

Thank you for your attention!