BGP-lens: Patterns and Anomalies in Internet Routing Updates B. Aditya Prakash 1, Nicholas Valler 2,...

31
BGP-lens: Patterns and Anomalies in Internet Routing Updates B. Aditya Prakash 1 , Nicholas Valler 2 , David Andersen 1 , Michalis Faloutsos 2 , Christos Faloutsos 1 1 Carnegie Mellon University 2 UC-Riverside KDD 2009, Paris

Transcript of BGP-lens: Patterns and Anomalies in Internet Routing Updates B. Aditya Prakash 1, Nicholas Valler 2,...

BGP-lens: Patterns and Anomalies in Internet Routing Updates

B. Aditya Prakash1, Nicholas Valler2, David Andersen1, Michalis Faloutsos2, Christos Faloutsos1

1Carnegie Mellon University2UC-Riverside

KDD 2009, Paris

Introduction

• Border Gateway Protocol (BGP)– Internet Routing Protocol– Router sending messages to each other– Keeps path information up-to-date

• Ideal Setting - no BGP updates• Really – many updates

– link failures, router restarts, malicious behavior

2

Time peerAS originAS prefix

2005-02-17 12:39:42 ATT SPRINT 204.29.119.0/24

2005-02-17 12:39:43 VERIZON AOL 204.29.80.0/24

2005-02-17 12:39:46 WASH ATLA 204.29.79.0/24

…. …. …. ….

Each Row is an

update

Introduction contd.

Question: Find patterns/anomalies?• Challenges:

– Millions of updates sent over network– Data has multiple dimensions– Noisy Measurements– Impossible for human to sift through updates

3

Automated Tool needed!

The DataTime peerAS originAS prefix

2005-02-17 12:39:42 ATT SPRINT 204.29.119.0/24

2005-02-17 12:39:43 VERIZON AOL 204.29.80.0/24

2005-02-17 12:39:46 WASH ATLA 204.29.79.0/24

…. …. …. ….

• Data from Datapository.net• Abilene Network

4

18 million update messages – over two years!

Our Approach

• Look at a simple time-series• Focus on just the time• # of updates received every b seconds (bin size)

• Specific Problem we are tackling– Given such time-series– Report patterns and anomalies

• Also find suspicious entities (paths, ASes etc.)

5

Time

2005-02-17 12:39:42

2005-02-17 12:39:43

2005-02-17 12:39:46

2005-02-17 12:40:01

….

Time peerAS originAS prefix

2005-02-17 12:39:42 ATT SPRINT 204.29.119.0/24

2005-02-17 12:39:43 VERIZON AOL 204.29.80.0/24

2005-02-17 12:39:46 WASH ATLA 204.29.79.0/24

…. …. …. ….

b secs

time

Bin: 0 1 2 …

Count: 4 2 6 …

Real data: Washington Router

Very Bursty!

Traditional Tools like FFT, auto-regression

don’t work

6

# of Updates

Bin number (‘Time’)

Bin Size = 600s

Outline

• Introduction and Problem Statement• Techniques

– Temporal Analysis– Frequency Analysis

• BGP-lens at work• Conclusions

7

Temporal Analysis

• First Cut: Take log-linear plot– emphasizes small values over high values

8

Bin size: 10sBin size: 10s

9

But: Bin size is important!

10

‘Clotheslines’

Bin size: 600sBin size: 600s

Clotheslines

Q1: Why Clotheslines?– Near consecutive updates over long time-period

– Can be Route Flapping• advertise/withdraw same path frequently• important to identify

Q2: How to automate this discovery? 11

Proposal: Marginals to Rescue

• PDF of volume of updates– Number of time-bins with volume

Extremes == Height of the clotheslines!

12

Marginals to Rescue

• PDF of volume of updates– Number of time-bins with volume

13

Algorithm - Clotheslines

• For marginals plot use the median filtering approach to determine ‘outliers’;

• For each time interval found, report the most consistent IPs/ASes etc.

High Level Idea only – details in

paper!

14

Outline

• Introduction and Problem Statement• Techniques

– Temporal Analysis– Frequency Analysis

• BGP-lens at work• Conclusions

15

16

Low Freq.

HighFreq.

High energy Low energy‘Tornado’does nottouch down

time ->Signal

In real data…

17

E2

18

E2

~ 20,000 updates!

~ 8 hrs

Why Prolonged Spike?

• Bursts of short duration

• Can represent malicious behavior– Or simple router restarts!

• Exact cause hard to find – but important for system-administrators

19

Algorithm – Prolonged Spikes

• Basic idea: find tornados from scalogram• Find suitable starting point at higher levels• Extend downward as much as possible• The finest scale where tornado stops

– the shortest time period to look for a prolonged spike

• Again, details in paper!

20

Scalability

21

BGP-lens: User Interface

22

# of suspicious events sysadmin wants to check

duration: length of events to be checked (think daily vs weekly vs monthly)

optionaloptional

Outline

• Introduction and Problem Statement• Techniques

– Temporal Analysis– Frequency Analysis

• BGP-lens at work• Conclusions

23

BGP-lens at Work

• We found real events too . examples-Event 1:

50-clothesline– Prefix and Origin-AS pointed to Alabama

Supercomputing Net– When contacted sysadmins

• attributed changes to route flapping• “the route for 207.157.115.0/24 was appearing and

disappearing in [the] IGP routing table ... [which] may have caused BGP to flap.”

– Anomaly went undetected and unresolved for 30 days!

24

Results from real data

25

Event 2Prolonged Spike

– May 12th 2006 – 8hr spike – Most persistent IPs/ASes

• Primary and middle schools in a large district in a country

– Two more spikes Jan18-19, 2006 and Aug 1

Conclusions• Studied huge real data (~18 million updates)• Developed two new techniques

– effective• spots subtle phenomena like clotheslines and prolonged

spikes

– scalable

• BGP-lens: a user-friendly tool• provides reasonable defaults• provides easy-to-use knobs• leads like IPs/ASes

26

Thank You!

• Any questions?

• www.cs.cmu.edu/~badityap– We thank NSF, USA for their support.

• Author-Reel!

27

Extra - Frequency Analysis

• Data is self-similar!– we used the entropy-plot measure – also called the b-model [26]

– Corresponds to b-model of 75-25

– Multi-resolution techniques needed!

28

Extra - FFT

29

Extra – Marginals for 10sec

30

Extra – Prolonged Spike Algorithm

31