Incident analysis using RIPE NCC tools - RIPE 61 and LINX 71


Presentation given at RIPE 61 and LINX 71

Transcript of Incident analysis using RIPE NCC tools - RIPE 61 and LINX 71

Page 1: Incident analysis using RIPE NCC tools - RIPE 61 and LINX 71

Incident analysis with RIPE NCC tools
Analysing the RIS/Duke BGP incident

Erik Romijn <[email protected]>

Senior Software Engineer

Page 2

RIPE NCC information collection

• Routing Information Service (RIS)
  - Listens to and stores all BGP updates
  - Receiving data from 600 peers

Page 3

RIPE NCC information collection

• DNS monitoring service (DNSMON)
  - Monitors critical DNS infrastructure
  - About 100 vantage points worldwide

Page 4

RIPE NCC information collection

• Test Traffic Measurements (TTM)
  - One-way latency/jitter/loss & traceroutes
  - About 100 nodes in full mesh

Page 5

Case study: RIPE NCC / Duke University BGP experiment

Page 6

RIS experiments & announcements

• RIS has a long tradition of supporting research

• Second AS in the world to announce 4-byte AS numbers

• Beacon prefixes from RIS available since 2002
  - Also a vital part of debogonizing

Page 7

Case study: RIPE NCC BGP experiment

• The RIPE NCC conducted an experiment on 27 August 2010
  - An optional BGP attribute was announced
  - This was an optional transitive attribute of 3000 bytes
  - The announcement was valid according to RFC 4271

• Some routers corrupted the route and sent it on
  - Peers that received the corrupted route dropped the session

• This caused disruption to some Internet traffic
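To make "optional transitive attribute of 3000 bytes" concrete, here is a minimal sketch of how a single BGP path attribute is laid out on the wire per RFC 4271, section 4.3: a flags octet, a type code octet, a length field and the value. Any value longer than 255 octets needs the Extended Length flag, which widens the length field to two octets. The type code 99 below is hypothetical; the slides do not say which code the experiment used.

```python
import struct

# Flag bits of the attribute flags octet (RFC 4271, section 4.3)
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20
EXTENDED_LENGTH = 0x10


def encode_path_attribute(type_code: int, value: bytes, flags: int) -> bytes:
    """Encode one path attribute as flags, type code, length and value."""
    if len(value) > 0xFFFF:
        raise ValueError("value does not fit in a two-octet length field")
    if len(value) > 0xFF:
        # Values longer than 255 octets need the Extended Length bit,
        # which switches the length field from one octet to two.
        flags |= EXTENDED_LENGTH
    if flags & EXTENDED_LENGTH:
        header = struct.pack("!BBH", flags, type_code, len(value))
    else:
        header = struct.pack("!BBB", flags, type_code, len(value))
    return header + value


def decode_path_attribute(data: bytes):
    """Decode one attribute; return (flags, type_code, value, octets_consumed)."""
    if len(data) < 3:
        raise ValueError("truncated attribute header")
    flags, type_code = data[0], data[1]
    if flags & EXTENDED_LENGTH:
        if len(data) < 4:
            raise ValueError("truncated attribute header")
        (length,) = struct.unpack("!H", data[2:4])
        offset = 4
    else:
        length, offset = data[2], 3
    if len(data) < offset + length:
        raise ValueError("attribute value shorter than its length field")
    return flags, type_code, data[offset:offset + length], offset + length


# Hypothetical type code, for illustration only
HYPOTHETICAL_TYPE_CODE = 99

attr = encode_path_attribute(HYPOTHETICAL_TYPE_CODE, bytes(3000), OPTIONAL | TRANSITIVE)
print(len(attr), hex(attr[0]))  # 3004 0xd0
```

RFC 4271 caps a whole BGP message at 4096 octets, so a 3000-byte attribute still fits in a valid UPDATE. A router that does not recognise an optional transitive attribute is expected to keep it, set the Partial bit and pass it on, which is why such an attribute can travel far beyond the originating AS.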

Page 8

Case study: 27 August 2010

• Announcement active from 08:41 to 09:08 UTC, using 93.175.144.0/24

• We later observed some negative impact

• Immediately started an extensive investigation
  - This pointed towards a Cisco IOS XR bug
  - Sent out a very detailed private announcement
  - Also provided Cisco with all details

• Cisco released security advisory cisco-sa-20100827-bgp
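Isolating the experimental announcement in a stream of collected updates comes down to matching the prefix and the active window given above. The sketch below assumes update records have already been parsed into a prefix string and a timezone-aware UTC timestamp. That record layout, and the function name `is_experiment_update`, are hypothetical and not part of any RIS tool.

```python
from datetime import datetime, timezone
from ipaddress import ip_network

# Prefix and active window as stated on the slide
EXPERIMENT_PREFIX = ip_network("93.175.144.0/24")
WINDOW_START = datetime(2010, 8, 27, 8, 41, tzinfo=timezone.utc)
WINDOW_END = datetime(2010, 8, 27, 9, 8, tzinfo=timezone.utc)


def is_experiment_update(prefix: str, timestamp: datetime) -> bool:
    """True if a parsed update is for the experimental prefix and falls
    inside the announcement window (both ends inclusive).

    timestamp must be timezone-aware; comparing a naive datetime with
    the aware window bounds raises TypeError.
    """
    return (ip_network(prefix) == EXPERIMENT_PREFIX
            and WINDOW_START <= timestamp <= WINDOW_END)


print(WINDOW_END - WINDOW_START)  # 0:27:00
```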

Page 9

Propagation of the announcement

[Diagram: a network of other routers]

Page 10

Propagation of the announcement

[Diagram: RIS AS65550 and RIS @ AMS-IX AS12654 with the other routers]

Page 11

Propagation of the announcement

[Diagram: RIS AS65550, RIS @ AMS-IX AS12654, a faulty router and the other routers]

Page 12

Propagation of the announcement

[Diagram: RIS AS65550, RIS @ AMS-IX AS12654, a faulty router and the other routers]

Page 13

Propagation of the announcement

[Diagram: RIS AS65550, RIS @ AMS-IX AS12654, a faulty router and the other routers]

Page 14

Propagation of the announcement

[Diagram: the faulty router and the other routers]

Page 15

Goal of the experiment

• A research group from Duke University approached the RIPE NCC for help

• Their goal was to measure support for long optional transitive attributes
  - Intended to be used for certificates for secure routing

• They did not have their own AS number or addresses

• They provided the RIPE NCC with a patched version of Quagga

Page 16

Expected results

A. The route propagates with the attribute intact

B. The route propagates, with some AS in the path removing the attribute

C. The route propagates, but takes a different path because some ASes drop the route

A and B were seen in the 4-byte AS number tests.
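The three expected outcomes can be expressed as a simple per-vantage-point classification. This is a heuristic sketch, not the method used in the experiment. It assumes a baseline AS path for the same prefix announced without the attribute (a hypothetical control announcement) and a flag saying whether the attribute arrived. A path change can have causes other than an AS dropping the route, so outcome C is only suggestive.

```python
from enum import Enum
from typing import Optional, Sequence


class Outcome(Enum):
    INTACT = "A"       # route propagates with the attribute intact
    STRIPPED = "B"     # route propagates, some AS removed the attribute
    REROUTED = "C"     # route propagates along a different path
    NOT_SEEN = "none"  # route not observed at this vantage point


def classify(baseline_path: Sequence[int],
             observed_path: Optional[Sequence[int]],
             attribute_present: bool) -> Outcome:
    """Heuristically classify what one vantage point saw.

    baseline_path: AS path seen for the prefix without the attribute
    (an assumed control announcement). observed_path: None if the
    route was not seen at all.
    """
    if observed_path is None:
        return Outcome.NOT_SEEN
    if list(observed_path) != list(baseline_path):
        # A changed path suggests some AS on the usual path dropped the route
        return Outcome.REROUTED
    return Outcome.INTACT if attribute_present else Outcome.STRIPPED
```

The usage in the test uses documentation ASNs (64496 to 64511) as stand-ins for real paths.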

Page 17

Impact of the experiment on the Internet

Page 18

Unstable prefixes

[Chart: unstable prefixes as a percentage of all 320,000 prefixes (0% to 100%), 8:00 to 9:50 UTC]
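The unstable-prefix charts express a per-minute count as a share of the 320,000-prefix table. The slides do not define "unstable", so the sketch below assumes a prefix counts as unstable in a given minute if at least one announcement or withdrawal for it was seen in that minute. Input is a hypothetical stream of `(unix_timestamp, prefix)` pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

TOTAL_PREFIXES = 320_000  # table size used for the percentage axis on the slides


def unstable_percentage(events: Iterable[Tuple[int, str]],
                        total: int = TOTAL_PREFIXES) -> Dict[int, float]:
    """Map each minute (unix time, floored to the minute) to the percentage
    of all prefixes with at least one update in that minute."""
    per_minute: Dict[int, Set[str]] = defaultdict(set)
    for timestamp, prefix in events:
        # A set, so repeated updates for one prefix in a minute count once
        per_minute[timestamp - timestamp % 60].add(prefix)
    return {minute: 100.0 * len(prefixes) / total
            for minute, prefixes in sorted(per_minute.items())}
```

With a 320,000-prefix table, 3,200 distinct prefixes changing within one minute corresponds to 1% on the chart.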

Page 19

E-mails per hour - 28-29 August

[Chart: mails per hour (0 to 60), 0:00 to 21:00 UTC over two days, showing traffic on AMS-IX, LINX & NANOG. Markers: initial RIPE NCC announcement / first AMS-IX post, first LINX post, first NANOG post]

Page 20

Unstable prefixes

[Chart: unstable prefixes as a percentage of all 320,000 prefixes (0% to 1.5%), 8:00 to 9:50 UTC]

Page 21

Length of invisibilities

[Chart: percentage of all 320,000 prefixes (0% to 0.15%) by duration of invisibility (0 to 100 minutes)]

• 8-10 UTC, July 30, 2010 (total: 0.24%)
• 8-10 UTC, Aug 20, 2010 (total: 0.11%)
• 8-10 UTC, Aug 26, 2010 (total: 0.26%)
• 8-10 UTC, Aug 27, 2010 (total: 0.69%)
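The invisibility histogram compares the same two-hour window on several days. A minimal sketch of the underlying computation is shown below. It assumes each prefix already has a time-ordered series of `(unix_timestamp, visible)` samples, for example derived from whether any RIS peer still carried the route. It also assumes each invisibility event is counted once, binned into 5-minute buckets, and expressed as a share of the 320,000-prefix table. All of these choices are assumptions, not the method behind the chart.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def invisibility_durations(events: List[Tuple[float, bool]]) -> List[float]:
    """events: time-ordered (unix_timestamp, visible) samples for one prefix.
    Returns the length in minutes of each completed invisibility.
    An invisibility still open at the end of the data is not counted."""
    durations: List[float] = []
    went_dark: Optional[float] = None
    for timestamp, visible in events:
        if not visible and went_dark is None:
            went_dark = timestamp  # prefix just disappeared
        elif visible and went_dark is not None:
            durations.append((timestamp - went_dark) / 60.0)
            went_dark = None  # prefix is visible again
    return durations


def invisibility_histogram(per_prefix: Dict[str, List[Tuple[float, bool]]],
                           total: int = 320_000,
                           bin_minutes: int = 5) -> Dict[int, float]:
    """Percentage of all prefixes per duration bin, keyed by bin start (minutes)."""
    counts: Counter = Counter()
    for events in per_prefix.values():
        for minutes in invisibility_durations(events):
            counts[int(minutes // bin_minutes) * bin_minutes] += 1
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}
```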

Page 22

Length of invisibilities

Page 23

Critical DNS infrastructure (from DNSMON)

• Root servers unaffected

• 57% of TLDs unaffected

• Minor effects for 38% of the TLDs
  - Some dropped queries for one or two servers

• More significant effects for 5% of the TLDs
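The DNSMON summary sorts TLDs into three buckets. A sketch of such a bucketing is shown below. It assumes a hypothetical input giving, per TLD, the fraction of queries dropped at each of its name servers. The "one or two servers" boundary for minor effects follows the slide. Treating dropped queries at three or more servers as significant is an assumption, not DNSMON's actual criterion.

```python
from typing import Dict


def classify_tld(drop_rate_per_server: Dict[str, float]) -> str:
    """Bucket one TLD by how many of its name servers dropped queries."""
    affected = sum(1 for rate in drop_rate_per_server.values() if rate > 0.0)
    if affected == 0:
        return "unaffected"
    if affected <= 2:
        return "minor"       # some dropped queries for one or two servers
    return "significant"     # assumed threshold: three or more servers


def summarise(tlds: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Percentage of TLDs in each bucket."""
    counts = {"unaffected": 0, "minor": 0, "significant": 0}
    if not tlds:
        return {bucket: 0.0 for bucket in counts}
    for servers in tlds.values():
        counts[classify_tld(servers)] += 1
    return {bucket: 100.0 * n / len(tlds) for bucket, n in counts.items()}
```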

Page 24

Critical DNS infrastructure

Page 32

View from a TTM probe in Prague, CZ

Page 33

Updates for IPv4 vs IPv6

[Chart: updates per minute per 1000 prefixes (0 to 30) for IPv4 and IPv6, 8:00 to 9:50 UTC]

Page 34

Updates for IPv4 vs IPv6

[Chart: updates per minute per 1000 prefixes (0 to 30) for IPv4 and IPv6, 8:00 to 9:50 UTC]

Most affected BGP sessions did not carry IPv6 routes
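Normalising update counts by table size is what lets the much smaller IPv6 table share an axis with IPv4. The sketch below assumes a hypothetical stream of `(unix_timestamp, family)` pairs and a known prefix count per address family. It returns updates per minute per 1000 prefixes for each `(minute, family)` pair. The IPv6 table size used in the test is illustrative, not a figure from the slides.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def updates_per_minute_per_1000(updates: Iterable[Tuple[int, str]],
                                table_sizes: Dict[str, int]) -> Dict[Tuple[int, str], float]:
    """updates: (unix_timestamp, family) pairs, e.g. family 'ipv4' or 'ipv6'.
    table_sizes: number of prefixes in each family's table.
    Returns (minute, family) -> updates in that minute per 1000 prefixes."""
    # Count updates per (minute floor, family)
    counts = Counter((ts - ts % 60, family) for ts, family in updates)
    return {key: n / (table_sizes[key[1]] / 1000)
            for key, n in sorted(counts.items())}
```

For example, 640 IPv4 updates against a 320,000-prefix table and 6 IPv6 updates against a 3,000-prefix table both come out at 2 updates per 1000 prefixes, even though the raw counts differ by two orders of magnitude.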

Page 35

Unstable prefixes vs number of updates

[Chart: updates per minute per 1000 prefixes (0 to 30), 8:00 to 9:50 UTC]

Page 36

View from a TTM probe in Prague, CZ

Page 37

Locality of effects - updates

[Chart: BGP updates on all RIS locations (IPv4), average updates/sec per full table peer (0 to 1000), 8:25 to 9:35 AM]

RIS locations: LINX, London; AMS-IX/NL-IX/GN-IX, Amsterdam; CIXP, Geneva; VIX, Vienna; DIX-IE, Tokyo; Netnod, Stockholm; MIX, Milan; NYIIX, New York; DE-CIX, Frankfurt; MSK-IX, Moscow; PAIX, Palo Alto; PTT, Sao Paulo; NOTA, Miami

Page 38

Locality of effects - withdrawals

[Chart: BGP withdrawals on all RIS locations (IPv4), average per second per full table peer (0 to 400), 8:25 to 9:35 AM]

RIS locations: LINX, London; AMS-IX/NL-IX/GN-IX, Amsterdam; CIXP, Geneva; VIX, Vienna; DIX-IE, Tokyo; Netnod, Stockholm; MIX, Milan; NYIIX, New York; DE-CIX, Frankfurt; MSK-IX, Moscow; PAIX, Palo Alto; PTT, Sao Paulo; NOTA, Miami
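The locality charts average message rates over full-table peers at each RIS location. Presumably this keeps peers that send only a partial table from diluting the average. The sketch below applies to either updates or withdrawals. It assumes hypothetical per-peer message counts for one interval and per-peer table sizes, and it treats the `full_table_threshold` cut-off as an assumption, since the slides do not define "full table".

```python
from typing import Dict


def avg_rate_per_full_table_peer(messages_per_peer: Dict[str, int],
                                 prefixes_per_peer: Dict[str, int],
                                 interval_seconds: float,
                                 full_table_threshold: int) -> float:
    """Average messages per second over the peers whose table size reaches
    full_table_threshold (an assumed cut-off for 'full table')."""
    full_peers = [peer for peer, size in prefixes_per_peer.items()
                  if size >= full_table_threshold]
    if not full_peers:
        return 0.0
    total = sum(messages_per_peer.get(peer, 0) for peer in full_peers)
    return total / len(full_peers) / interval_seconds
```

Running it once per RIS location and interval would give one line per location, as in the two charts.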

Page 39

Locality of effects - vendors per IX

[Pie charts: router vendor share per IX. Vendors: Cisco, Juniper, Brocade, Intel, Other]

Segment sizes per IX, largest first:
• VIX: 67%, 18%, 9%, 4%, 2%
• AMS-IX: 42%, 33%, 14%, 9%, 2%
• NYIIX: 54%, 23%, 9%, 9%, 4%
• LINX: 58%, 28%, 5%, 5%, 4%
• DE-CIX: 50%, 34%, 7%, 7%, 2%

Page 40

Lessons learned

• Future experiments should be pre-announced with sufficient lead time

• Detected vulnerabilities should be handled with more care

• More comprehensive impact assessments are needed

• Your input is welcome: <[email protected]>

Page 42

Updates per prefix range

[Chart: updates per 1000 prefixes per minute (0 to 3000), 8:00 to 9:50 UTC. Series: updates for prefixes 0-90 and updates for prefixes 100-255]

Page 43

AS path length

[Chart: average AS path length in updates (0 to 5), 8:00 to 10:50 UTC]
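The last chart tracks the average AS path length carried in updates over time. A minimal sketch of that aggregation is shown below. It assumes hypothetical `(unix_timestamp, as_path)` pairs with the path already flattened to a list of ASNs, so a prepended ASN counts once per occurrence, and it leaves out AS_SET handling. The 10-minute default bin mirrors the chart's tick spacing and is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def average_path_length(updates: Iterable[Tuple[int, Sequence[int]]],
                        bin_seconds: int = 600) -> Dict[int, float]:
    """Average AS path length of announcements per time bin (bin start in unix time)."""
    sums: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # bin -> [total length, count]
    for timestamp, path in updates:
        acc = sums[timestamp - timestamp % bin_seconds]
        acc[0] += len(path)
        acc[1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}
```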