Post on 18-Oct-2020

A Fine-Grained Distance Metric for Analyzing Internet Topology

Mark Crovella, joint work with Gonca Gursun, Natali Ruchansky, and Evimaria Terzi

Distance in Graphs with Low Diameter

•  Do ‘neighborhoods’ still exist?
•  Under what distance metric?

A Motivating Problem

•  Internet routing between domains
   –  Autonomous Systems (ASes)
•  Simple question: what paths pass through my network?
•  Important for network planning, traffic management, security
   –  If someone at BU were to send an email to UM, would it go through my network?

Surprisingly hard to answer!

•  Routing consists of an AS choosing a next hop for each destination
   –  Destinations are ‘prefixes’
•  Decisions are only partially communicated to neighbors
   –  In general, decisions made by a remote AS are not known

Observing Traffic

•  An AS can observe the traffic passing through it
   –  If BU sends traffic to UM through Sprint, Sprint knows it
•  Traffic only provides positive information
   –  Absence of traffic is ambiguous
•  If the observer does not see traffic from i to j, it is either
   –  A true zero: the path from i to j does not go through the observer; or
   –  A false zero: the path goes through, but i is not sending anything to j

The Visibility-Inference Problem

•  For each observer there is a ground truth matrix T
   –  T(i,j) = 1 ⇒ path from i to j passes through observer
•  Traffic summarized in observable matrix M
   –  M(i,j) = 1 ⇒ traffic was seen flowing from i to j
   –  M(i,j) = 1 ⇒ T(i,j) = 1
•  Problem: label the zeros in M as either true or false
•  Success metric: Detection Rate, False Alarm Rate
   –  Of true zeros

\[
T = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\qquad
M = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\]
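The labeling task can be made concrete with the two example matrices from this slide. The sketch below is only an illustration: the function name `label_zeros` is hypothetical, indices are 0-based, and it uses T directly, which a real observer does not have (recovering these labels without T is the whole problem).

```python
# Ground truth T and observed traffic M for one observer (values from the slide).
T = [[0, 1, 0],
     [1, 1, 0],
     [1, 0, 1]]
M = [[0, 0, 0],
     [1, 0, 0],
     [1, 0, 1]]

def label_zeros(T, M):
    """Return {(i, j): label} for every zero entry of M (hypothetical helper)."""
    labels = {}
    for i, row in enumerate(M):
        for j, seen in enumerate(row):
            if seen == 1:
                # Observed traffic is positive evidence: M(i,j) = 1 implies T(i,j) = 1.
                assert T[i][j] == 1
                continue
            # A zero in M is "false" when the path does cross the observer.
            labels[(i, j)] = "false zero" if T[i][j] == 1 else "true zero"
    return labels

labels = label_zeros(T, M)
false_zeros = sorted(k for k, v in labels.items() if v == "false zero")
print(false_zeros)  # [(0, 1), (1, 1)]
```

Of the six zeros in M, two are false zeros and four are true zeros.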

Intuition

•  Amplify knowledge obtained from traffic observation
•  Empirically we observe that there are groups of sources, destinations exhibiting ‘similar routing’
•  Observed traffic provides positive knowledge for entire group
•  Requires a metric that captures the notion that d1 is ‘close’ to d2 while s1 is ‘far’ from s2

\[
M = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\]

[Figure: the matrix M annotated with sources s1, s2 and destinations d1, d2]

Seeking a metric for ‘neighborhoods’

•  Typical distance used in graphs is hop count
•  Not suitable in small worlds
•  90% of prefix pairs have hop distance < 5
   –  Clearly, typical distance metric is inappropriate
•  Need a metric that expresses ‘routed similarly in the Internet’
   –  or other graph

[Figure: hop distance (y-axis, 1 to 7) for 1000 prefix pairs (x-axis)]

Capturing global routing state

•  Conceptually, imagine capturing the entire routing state of the Internet in a matrix N
•  N(i,j) = next hop on path from i to j
•  Each row is actually the routing table of a single AS
•  Now consider the columns

[Figure: the matrix N]

Routing State Distance

•  rsd(a,b) = # of entries that differ in columns a and b of N
•  If rsd is small, most ASes think the two prefixes are ‘in the same direction’
•  A metric (obeys triangle inequality)

[Figure: two copies of N, each comparing a pair of columns; one pair has rsd = 3, the other rsd = 5]
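The definition of rsd translates almost directly into code. The following is a minimal sketch, assuming N is stored as a list of rows (one per AS) with one column per prefix; the small matrix and its next-hop labels are made up for illustration and are not data from the talk.

```python
def rsd(N, a, b):
    """Count the rows (ASes) whose next hop toward prefix a differs from that toward b."""
    return sum(1 for row in N if row[a] != row[b])

# Hypothetical next-hop matrix: 5 ASes (rows) x 3 prefixes (columns).
# Entries are arbitrary next-hop labels.
N = [
    ["A", "A", "B"],
    ["C", "C", "C"],
    ["D", "E", "E"],
    ["F", "F", "G"],
    ["H", "H", "H"],
]

print(rsd(N, 0, 1))  # 1
print(rsd(N, 1, 2))  # 2
print(rsd(N, 0, 2))  # 3

# Triangle inequality on this example: rsd(0,2) <= rsd(0,1) + rsd(1,2).
assert rsd(N, 0, 2) <= rsd(N, 0, 1) + rsd(N, 1, 2)
```

Because rsd counts the positions where two columns disagree, it is a Hamming distance between columns, which is why the triangle inequality holds.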

RSD in Practice

•  Key observation: we don’t need all of N to obtain a useful metric
   –  That’s good: the problem would be solved anyway if we had that
•  Many (most?) nodes contribute little information to RSD
   –  Nodes at edges of network have nearly-constant rows in N
•  Sufficient to work with a small set of well-chosen rows of N
   –  Empirically we observe this to be the case
•  Such a set is obtainable from publicly available BGP measurements
   –  Note that public BGP measurements require some careful handling to use properly for computing RSD

[Figure: two panels over the same 1000 prefix pairs; one shows hop distance (1 to 7), the other RSD (0 to 250)]

Using RSD to amplify observed traffic

•  For each zero (i,j) in M
   –  Find S_i, D_j
   –  Compute π(i,j)
   –  If π(i,j) > β then (i,j) is a false zero; otherwise it is a true zero
•  Thresholds τ and β are easy to set in an automatic way
•  Applied to three sets of ASes:
   –  Core-100 and Core-1000: centrally located ASes
   –  Edge-1000: randomly chosen ASes, mostly near edge of net

\[
S_i = \{\, i' \mid \mathrm{rsd}(i, i') < \tau \,\}, \qquad
D_j = \{\, j' \mid \mathrm{rsd}(j, j') < \tau \,\}
\]
\[
\pi(i, j) = \sum M(S_i, D_j)
\]
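The rule on this slide can be sketched as follows: collect the sources within distance τ of i and the destinations within distance τ of j, sum the observed traffic over that submatrix of M, and compare the sum with β. Everything here other than M (taken from the earlier slide) is an assumption for illustration: the distance matrices `src_dist` and `dst_dist`, the thresholds, and the function names.

```python
def neighborhood(dist, x, tau):
    """Indices y with dist[x][y] < tau; includes x itself because dist[x][x] == 0."""
    return [y for y in range(len(dist)) if dist[x][y] < tau]

def pi(M, S, D):
    """Sum of the entries of M over rows S and columns D."""
    return sum(M[s][d] for s in S for d in D)

def classify_zero(M, src_dist, dst_dist, i, j, tau, beta):
    """Label the zero at (i, j) using traffic seen in its neighborhood."""
    S_i = neighborhood(src_dist, i, tau)
    D_j = neighborhood(dst_dist, j, tau)
    return "false zero" if pi(M, S_i, D_j) > beta else "true zero"

# Observed matrix from the earlier slide (0-based indices).
M = [[0, 0, 0],
     [1, 0, 0],
     [1, 0, 1]]

# Hypothetical distances: all sources far apart; destinations 0 and 1 close.
src_dist = [[0, 9, 9], [9, 0, 9], [9, 9, 0]]
dst_dist = [[0, 1, 9], [1, 0, 9], [9, 9, 0]]

print(classify_zero(M, src_dist, dst_dist, 1, 1, tau=2, beta=0))  # false zero
print(classify_zero(M, src_dist, dst_dist, 0, 2, tau=2, beta=0))  # true zero
```

In this toy setting the zero at (1,1) is flagged as false because source 1 was seen sending to the nearby destination 0, while the zero at (0,2) has no supporting traffic and stays a true zero.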

Experimental setup

•  Ground-truth matrices from BGP data
   –  Collected all active paths from 38 sources to 135,000 destinations
   –  For every AS, construct 38 x 135,000 ground truth matrix T
   –  Data hygiene: discussed at end
•  Simulate traffic absence by setting some 1s to zeros
   –  Flipped at random from 1 to 0
   –  10%, 30%, 50%, 95%
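The flipping step can be sketched as below. This is one plausible reading of the setup, not the authors' code: it hides an exact fraction of the 1s (rather than flipping each independently), and the function name, seed, and small matrix are all hypothetical.

```python
import random

def simulate_observation(T, flip_rate, seed=0):
    """Return (M, hidden): M is T with a flip_rate fraction of its 1s set to 0."""
    rng = random.Random(seed)
    ones = [(i, j) for i, row in enumerate(T) for j, t in enumerate(row) if t == 1]
    # Choose which 1s to hide; these become the false zeros of M.
    hidden = set(rng.sample(ones, round(flip_rate * len(ones))))
    M = [[0 if (i, j) in hidden else t for j, t in enumerate(row)]
         for i, row in enumerate(T)]
    return M, hidden

# Hypothetical 4 x 5 ground truth with ten 1s (not data from the talk).
T = [[1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0],
     [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 0]]

M, hidden = simulate_observation(T, flip_rate=0.5)
print(len(hidden))       # 5
print(sum(map(sum, M)))  # 5
```

Every position in `hidden` is a false zero by construction, so the set doubles as the answer key when scoring a classifier on M.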

Quick Look

Performance

•  Accuracy on Edge-1000 is poorer
   –  At edge of network, problem is actually easier
   –  RSD not the right idea to use there

[Figure: ROC curves (true positive rate vs. false positive rate) for Core-1000 and Core-100, one curve per flip rate (10%, 30%, 50%, 95%); insets zoom in on false positive rate 0 to 0.1 and true positive rate 0.8 to 1]

Mean TPR and FPR: Core-1000

Flip Rate   TPR           FPR
10%         0.98 (0.03)   0.03 (0.04)
30%         0.98 (0.03)   0.03 (0.04)
50%         0.98 (0.03)   0.03 (0.05)
95%         0.98 (0.03)   0.21 (0.18)

Properties of RSD

On graphs of size n with shortest-path routing:

•  Tree: rsd = 1 + length of path in tree
   –  So rsd in a tree is low, generally O(log n)
•  Clique: rsd = n
•  Star: rsd = 3
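These three values can be checked on small graphs. The sketch below builds a next-hop matrix by breadth-first search from each destination and then counts differing rows. It assumes unweighted, undirected graphs and the convention N(j,j) = j (a node's next hop to itself is itself); that convention is an assumption, though the counts on this slide work out under it. The function names and graph sizes are illustrative.

```python
from collections import deque

def next_hop_matrix(adj):
    """N[i][j] = first hop on a shortest path from i to j; N[j][j] = j by convention."""
    n = len(adj)
    N = [[None] * n for _ in range(n)]
    for j in range(n):
        N[j][j] = j
        queue = deque([j])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if N[v][j] is None:
                    # v was reached from u, so u is v's next hop toward j.
                    N[v][j] = u
                    queue.append(v)
    return N

def rsd(N, a, b):
    return sum(1 for row in N if row[a] != row[b])

n = 6
star = [[1, 2, 3, 4, 5]] + [[0] for _ in range(5)]           # center 0, leaves 1..5
clique = [[v for v in range(n) if v != u] for u in range(n)]
# Balanced binary tree on 7 nodes: 0 is the root, 3..6 are leaves.
tree = [[1, 2], [0, 3, 4], [0, 5, 6], [1], [1], [2], [2]]

print(rsd(next_hop_matrix(star), 1, 2))    # 3
print(rsd(next_hop_matrix(clique), 1, 2))  # 6, i.e. n
print(rsd(next_hop_matrix(tree), 3, 6))    # 5 = 1 + path length 4
```

In the tree, only the nodes on the path between the two destinations disagree about the next hop, which is where "1 + length of path" comes from.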

Comparison to Classical Distance

                rsd     d
Star            3       2
Balanced tree   log n   log n
Clique          n       1

RSD on small worlds

[Figure: clustering coefficient and hop distance versus p (0.0001 to 1, log scale), shown first alone and then with RSD added; after W&S, Nature, ‘98]

RSD remains useful as hop distance breaks down

[Figure: panels for p = 0, p = 0.01 and p = 0.1, comparing d and rsd]

RSD as a tool for data analysis and discovery

•  RSD applied to Internet prefixes shows low effective rank
•  Implication: RSD is good for visualization of prefixes

[Figure: 20 largest singular values of 1000 x 1000 RSD matrix for Internet prefixes; y-axis in units of 10^4]

RSD for clustering – fine scale

[Figure: fine-scale 2-D scatter plot of prefixes, each point labeled with its AS name (for example ASN-QWEST, ESNET, TELIANET, HURRICANE, YAHOO, CMNET-GD); axes span roughly −0.04 to −0.03 and 0.02 to 0.07]

Visualizing all prefixes

[Figure: 2-D scatter plot of all prefixes; both axes span roughly −100 to 100]

Smaller cluster is about 25% of all prefixes

Consistent clustering in any sample of prefixes

Understanding Clustering under RSD

In terms of the next-hop matrix N:
•  A cluster C corresponds to a set of columns of N, i.e., N(:,C)
•  The columns are close in RSD
•  So they must be similar in some positions S

In terms of BGP routing:
•  For any row in N(S,C) the entries must be nearly the same
•  So S is a set of ASes making similar routing decisions w.r.t. C

We call such a pair (S,C) a local atom

[Figure: 2-D scatter plot of all prefixes; both axes span roughly −100 to 100]
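Given a cluster C of columns, one way to recover a matching S is to keep the rows of N whose entries over C are (nearly) all the same. The sketch below is an illustration of that idea only; the agreement threshold `min_agreement`, the function name, and the toy matrix are hypothetical and are not taken from the talk.

```python
from collections import Counter

def local_atom_sources(N, C, min_agreement=0.9):
    """Rows of N whose most common next hop over columns C covers
    at least min_agreement of those columns."""
    S = []
    for s, row in enumerate(N):
        counts = Counter(row[c] for c in C)
        top = counts.most_common(1)[0][1]  # size of the largest agreeing group
        if top / len(C) >= min_agreement:
            S.append(s)
    return S

# Hypothetical next-hop matrix: 4 ASes (rows) x 4 prefixes (columns).
N = [
    ["X", "X", "X", "P"],
    ["X", "X", "X", "X"],
    ["A", "B", "C", "D"],
    ["Q", "Q", "Q", "R"],
]
C = [0, 1, 2]

S = local_atom_sources(N, C, min_agreement=1.0)
print(S)  # [0, 1, 3]
```

Row 2 is excluded because it uses a different next hop for every prefix in C, so it contributes to the RSD between those columns rather than to their similarity.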

Why are Local Atoms Interesting?

•  Routing decisions by ASes are made independently
•  There will often be many next hops for any given prefix
•  Different ASes will choose different next hops for a prefix

Exposing Unexpected Coordination

[Figure: two 2-D scatter plots of all prefixes; axes span roughly −100 to 100]

A significant fraction of ASes are all following the same rule.

There is a special AS X in the system.

The rule is: “Always use AS X to get to any customer in its cone.”

The ASes making this coordinated decision are not related in any obvious way

The smaller cluster is a local atom.

Attractivity of AS6939

What is so special about Hurricane Electric?

•  A peering policy based only on colocation
   –  Settlement-free peering
   –  No commercial consideration or traffic ratios
•  Immensely attractive to small/medium NSPs
   –  “free” access to any customer network of HE
   –  HE backbone has global scope

“Hurricane Electric is willing to peer with networks which are connected to one or more exchange points which we have in common.”

he.net/adm/peering.html

Systematic Discovery of Local Atoms

•  Manual inspection uncovers a large-scale local atom
   –  The Hurricane Electric cluster
   –  Includes about 25% of prefixes in the Internet
•  Do smaller local atoms exist? How can we detect them?
•  Need an effective clustering strategy for RSD
•  A natural approach: seek clusters such that
   –  Within-cluster RSD is minimized
   –  Between-cluster RSD is maximized

Pivot Clustering

•  Minimizing P-Cost is NP-hard, but there is an expected 3-approximation algorithm for it: Pivot clustering
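A generic pivot scheme picks a random remaining item as the pivot, groups it with every remaining item that is close to it, removes the group, and repeats. The sketch below follows that scheme with a distance threshold `tau`. The slide does not show how P-Cost is defined or how closeness is decided in the talk's version, so the threshold, function name, seed, and toy distances are assumptions.

```python
import random

def pivot_clustering(dist, tau, seed=0):
    """Greedy pivot clustering over a symmetric distance matrix (hypothetical sketch)."""
    rng = random.Random(seed)
    remaining = list(range(len(dist)))
    clusters = []
    while remaining:
        pivot = rng.choice(remaining)
        # The pivot joins its own cluster because dist[pivot][pivot] == 0 <= tau.
        cluster = [x for x in remaining if dist[pivot][x] <= tau]
        clusters.append(sorted(cluster))
        members = set(cluster)
        remaining = [x for x in remaining if x not in members]
    return clusters

# Toy RSD-like distances: items {0, 1, 2} are mutually close, as are {3, 4}.
dist = [
    [0, 1, 1, 10, 10],
    [1, 0, 1, 10, 10],
    [1, 1, 0, 10, 10],
    [10, 10, 10, 0, 1],
    [10, 10, 10, 1, 0],
]

clusters = pivot_clustering(dist, tau=2)
print(sorted(clusters))  # [[0, 1, 2], [3, 4]]
```

On well-separated data like this the result does not depend on which pivots are drawn; on real data different seeds can give different clusterings, which is why the guarantee is stated in expectation.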

Largest 5 clusters found using Pivot Clustering

•  Clusters show clear separation

•  Each cluster corresponds to a local atom

Cluster   Number of Prefixes (C)   Number of Source ASes (S)   Destinations
C1        150                      16                          Ukraine 83%, Czech Rep. 10%
C2        170                      9                           Romania 33%, Poland 33%
C3        126                      7                           India 93%, US 2%
C4        484                      8                           Russia 73%, Czech Rep. 10%
C5        375                      15                          US 74%, Australia 16%

Interpreting Clusters

Conclusions

•  RSD is a metric that captures when two prefixes are “routed similarly in the Internet”

•  It has many useful properties
   –  a fine-grained distance metric even in small-world graphs
   –  captures a different notion of distance than shortest-path
   –  summarizes the routing state of the entire network
   –  effective for visualizing Internet prefixes
   –  allows in-depth analysis of BGP
   –  uncovers surprising and interesting patterns in routing

•  Local atoms

Thank you!

Gonca Gursun Natali Ruchansky Evimaria Terzi

Related Work

•  Reported that BGP tables provide an incomplete view of the AS graph. [Roughan et al. ‘11]
•  Visualization based on AS degree and geo-location. [Huffaker and k. claffy ‘10]
•  Small scale visualization through BGPlay and bgpviz
•  Clustering on the inferred AS graph. [Gkantsidis et al. ‘03]
•  Clustering prefixes that share the same BGP paths into policy atoms. [Broido and k. claffy ‘01]
•  Methods for calculating policy atoms and characteristics. [Afek et al. ‘02]

Data Hygiene Implications

•  BGP data is known to favor customer-provider links and miss peer-peer links

•  Our restriction to 38 x 135,000 known paths means that we are not missing any links in the scope of our experiments

•  Hence accuracy for the chosen subsets of M is not affected by missing links

•  However, the accuracy of our methods may be different on the full M
   –  Whether better or worse, it’s not clear
   –  There is some reason to believe it would be better…