A Fine-Grained Distance Metric for Analyzing Internet Topology

Mark Crovella joint with

Gonca Gursun, Natali Ruchansky, and Evimaria Terzi

Distance in Graphs with Low Diameter •  Do ‘neighborhoods’ still

exist? •  Under what distance

metric?

A Motivating Problem

•  Internet routing between domains –  Autonomous Systems (ASes)

•  Simple question: what paths pass through my network? •  Important for network planning, traffic management,

security –  If someone at BU were to send an email to UM, would it go

through my network?

Surprisingly hard to answer!

•  Routing consists of an AS choosing a next hop for each destination –  Destinations are ‘prefixes’

•  Decisions are only partially communicated to neighbors –  In general, decisions made by a remote AS are not known

Observing Traffic

•  An AS can observe the traffic passing through it –  If BU sends traffic to UM through Sprint, Sprint knows it

•  Traffic only provides positive information –  Absence of traffic is ambiguous

•  If the observer does not see traffic from I to j, it is either –  A true zero: the path from i to j does not go through the observer; or –  A false zero: the path goes through, but i is not sending anything to j

The Visibility-Inference Problem

•  For each observer there is a ground truth matrix T –  T(i,j) = 1 è path from i to j passes through observer

•  Traffic summarized in observable matrix M –  M(i,j) = 1 è traffic was seen flowing from i to j –  M(i,j) = 1 è T(i,j) = 1

•  Problem: label the zeros in M as either true or false

•  Success metric: Detection Rate, False Alarm Rate –  Of true zeros

40 1 01 1 01 0 1

40 0 01 0 01 0 1

Intuition

•  Amplify knowledge obtained from traffic observation •  Empirically we observe that there are groups of

sources, destinations exhibiting `similar routing’ •  Observed traffic provides positive knowledge for entire

•  Requires a metric that captures the notion that d1 is `close’ to d2 while s1 is `far’ from s2

40 0 01 0 01 0 1

d1 d2 s1

Seeking a metric for ‘neighborhoods’

•  Typical distance used in graphs is hop count •  Not suitable in small worlds

•  90% of prefix pairs have hop distance < 5 –  Clearly, typical distance metric is inappropriate

•  Need a metric that expresses ‘routed similarly in the Internet’ –  or other graph

0 200 400 600 800 10001

Prefix Pairs

Capturing global routing state •  Conceptually, imagine capturing the entire routing

state of the Internet in a matrix N •  N(i,j) = next hop on path from i to j •  Each row is actually the routing table of a single AS •  Now consider the columns

Routing State Distance •  rsd(a,b) = # of entries that differ in columns a and b of N •  If rsd is small, most ASes think the two prefixes are

‘in the same direction’ •  A metric (obeys triangle inequality)

rsd=3 rsd=5

77775N =

RSD in Practice

•  Key observation: we don’t need all of H to obtain a useful metric –  That’s good – the problem would be solved anyway if we had that

•  Many (most?) nodes contribute little information to RSD –  Nodes at edges of network have nearly-constant rows in H

•  Sufficient to work with a small set of well-chosen rows of H –  Empirically we observe this to be the case

•  Such a set is obtainable from publicly available BGP measurements –  Note that public BGP measurements require some careful handling to use

properly for computing RSD

0 200 400 600 800 10001

Prefix Pairs0 200 400 600 800 1000

Prefix Pairs

Using RSD to amplify observed traffic

•  For each zero (i,j) in M –  Find , –  Compute –  If then (i,j) is a false zero; o/w true zero

•  Thresholds τ and β are easy to set in an automatic way •  Applied to three sets of ASes:

–  Core-100 and Core-1000: centrally located ASes –  Edge-1000: randomly chosen ASes, mostly near edge of net

Si = {i0 | rsd(i, i0) < ⌧} Di = {j0 | rsd(j, j0) < ⌧}⇡(i, j) =

PM(Si, Dj)

⇡(i, j) > �

Experimental setup

•  Ground-truth matrices from BGP data –  Collected all active paths from 38 sources to 135,000

destinations –  For every AS, construct 38 x 135,000 ground truth matrix T –  Data hygiene: discussed at end

•  Simulate traffic absence by setting some 1s to zeros –  Flipped at random from 1 to 0 –  10%, 30%, 50%, 95%

Quick Look

Performance

•  Accuracy on Edge-1000 is poorer –  At edge of network, problem is actually easier –  RSD not the right idea to use there

0 0.2 0.4 0.6 0.8 10

False Positive Rate

10%30%50%95%

0 0.05 0.1

0 0.2 0.4 0.6 0.8 10

False Positive Rate

10%30%50%95%

0 0.05 0.1

Core-1000 Core-100

Mean TPR and FPR: Core-1000

Flip Rate TPR FPR

10% 0.98 (0.03) 0.03 (0.04)

30% 0.98 (0.03) 0.03 (0.04)

50% 0.98 (0.03) 0.03 (0.05)

95% 0.98 (0.03) 0.21 (0.18)

Properties of RSD

•  Tree: rsd = 1 + length of path in tree

–  So rsd in a tree is low, generally O(log n)

•  Clique: rsd = n

•  Star: rsd = 3

On graphs of size n with shortest-path routing

Comparison to Classical Distance

rsd dStar 3 2

Balanced tree log n log nClique n 1

RSD on small worlds

0.0001 0.001 0.01 0.1 1p

Clustering CoefficientHop Distance

0.0001 0.001 0.01 0.1 1p

Clustering CoefficientHop Distance

W&S, Nature, ‘98

RSD remains useful as hop distance breaks down

p = 0 p = 0.01 p = 0.1

RSD as a tool for data analysis and discovery

•  RSD applied to Internet prefixes shows low effective rank •  Implication: RSD is good for visualization of prefixes

0 5 10 15 200

15 x 104

20 largest singular values of 1000 x 1000 RSD matrix for Internet prefixes

RSD for clustering – fine scale

−0.04 −0.038 −0.036 −0.034 −0.032 −0.030.02

ASN−QWEST

GTANET−ASAPPLE

ASN852KREONET−AS−KR

TELE2TELIANET

ERX−JARINGWISCNET1−ASKANREN

INFOSPHERE

IDC2554

MORENET

SPIRITTEL−AS

XO−AS15

SINET−AS

ASC−NET

HARNET

UIOWA−AS INSINET

LGDACOM

CPCNET−AS4058

ERX−AU−NET

CTM−MO

INET−TH−AS

UNSPECIFIED

ASN−PACIFIC−INTERNET−IX

THAI−GATEWAY

HYUNDAI−KR

SAMART−BOARDER−AS

CSLOXINFO−AS−AP

WORLDNET−AS

KIXS−AS−KR

ICONZ−AS

GLOBE−TELECOM−AS

SEEDNET

LINTASARTA−AS−AP

ASN−IINETVOCUS−BACKBONE−AS

IINET−AU

DISC−AS−KR

ICN−AS

GLOBEINTERNET

GT−BELL

BAYAN−TELECOMMUNICATIONSHURRICANEWINDSTREAM

GRANDECOM−AS1

TRUEINTERNET−AS−AP

CSTNET−AS−AP

NETIRD

PI−AU

TPG−INTERNET−AP

HCNSEOCHO−AS−KR

APAN−JP

TRANSACT−SDN−AS

VISIONNET

BEZEQ−INTERNATIONAL−AS

CELLCOM−AS

SPEEDCAST−AP

ISP−AS−AP

HCNCHUNGJU−AS−KR

NEWTT−IP−AP

MULTIMEDIA−AS−AP

DREAMX−AS

SMARTONE−AS−AP

ASN−PLATFORM−AP

AP−AUST−AU−AS

FX−PRIMARY−AS

GENESIS−AP

ASN−OZONLINE−AU

MOPH−TH−AP

FCABLE−AS

TOTNET−TH−AS−AP

CHEONANVITSSEN−AS−KR

DONGBUIT−AS−KRPUBNET1−AS

CMNET−GD

CNNIC−CN−COLNET

ASN−ATHOMEJP

GITS−TH−AS−AP

CAT−AP

COMINDICO−AP

SAERONET−AS−KR

TOKAI PACNET

GAYANET−AS−KREFTEL−AS−APHCNKUMHO−AS−KR

YAHOO−1VISIONARY

AIRSTREAMCOMM−NET

TELWEST−NETWORK−SVCS−STATIC

AFILIAS−NST

TRUVISTA

IPPLANET−AS

ITGATE

TVC−AS1

ROCK−HILL−TELEPHONE

VEROXITY

BROADVIEWNET

INFOLINK−MIA−US

Hadara

RIPNET

CAVTEL02

HOMESCCITIZENS

CORPCOLO

NWT−AS−APK−OPTICOM

QALA−SG−AP

CITIC−HK−AP

ADC−BUDDYB−AS

GIGAPASS−AS−KR

ICE−AS−KRCOMCLARK−AS

SPTEL−AP AS−PNAPTOK

GIGAINFRA

CAMNET−AS

ASN−EQUINIX−APDREAMPLUS−AS−KR

GINAMHANVIT−AS−KR

HANVITIAB−AS−KR

ACE−1−WIFI−AS−AP

KBTNETSPEED−AS−AP

WWW−PH−AP

LINKTELECOM−NZ−AP

PLANET−OZI

POSNET

FPT−AS−AP

EGIHOSTING

USCARRIER

TWIN−LAKES

SERVERROOM

O1COMM

AMPATH−AS

BEN−LOMAND−TEL

C−R−T

HARCOM1

ISPNET−1

ALLIED−TELECOMBARR−XPLR−ASN

SKYBEST

VITSSEN−SUWON−AS−KR

INCHEON−AS−KR

ONLINE−AS

AINS−AS−AP

IPVG−AS−AP

SKYBB−AS−AP

BIGAIR−AP

PIPETRANSIT−AS−AP

KIRZ−AS−TH

BCL−TRANSIT−AS−AP

ABN−PEERING−AS−AP

CMNET−V4SHANGHAI−AS−AP

WICAM−AS

ISATNET−AS−IDDIGISATNET−AS−ID

YAHOO−JP−AS−AP

ENMAX−ENVISION

SANTEECOOPER

CooperaciÃ³nUniversidade

FIORD−AS

IPNXng

NASSIST−AS

VOXEL−DOT−NETCOLOGUYS

GIGANEWS

XITTEL−AS EONIX−CORPORATION−AS−PHX01−WWW−INFINITIE−NET

EVERN2

ONS−COS

NEMONT−CELLULAR

CANBYTELEPHONEASSOCIATION

RSN−1

PARC−ASN

VPLSNET

CNNIC−PBSL−AP

HELLONET−AS−KR

EXTREMEBB−AS−MY

SIS−GROUP−SYD−AS−AP

AIS−TH−AP

PACTEL−AS−AP

SUNNYVISION−AS−AP

HCLC−AS−KR

CMBHK−AS−KR

CMBDAEJEON−AS−KR

BB−BROADBAND−TH−AS−AP

MURAMATSU−AS−AP

GOSCOMB−AS

NORTHLAND−CABLE

BUDREMYER−AS

CTC−CORE−AS

EPSILON

IPSERVERONE−AS−AP

JCN−AS−KR

DAELIM

DEDICATED−SERVERS−SYD−AS−AP

SUPERBROADBANDNETWORK−AS−AP

GRNT−JP−AS−AP

JASTEL−NETWORK−TH−AP

TRIPLETNET−AS−AP

PFNL−ASN

EVERN−1

LIGHTOWER

IMPACT−AS−APCMNET−GUANGDONG−AP

CMNET−JIANGSU−AP

Visualizing all prefixes

−100 0 100

−100

100Smaller cluster is about 25% of all prefixes

Consistent clustering in any sample of prefixes

Understanding Clustering under RSD

In terms of the next-hop matrix N: •  A cluster C corresponds to a

set of columns of N, ie, N(:,C) •  The columns are close in RSD •  So they must be similar in

some positions S

In terms of BGP routing: •  For any row in N(S,C) the entries

must be nearly the same •  So S is a set of ASes making

similar routing decisions w.r.t. C

We call such a pair (S,C) a local atom

−100 0 100

−100

Why are Local Atoms Interesting?

•  Routing decisions by ASes are made independently •  There will often be many next hops for any given prefix •  Different ASes will choose different next hops for a prefix

Exposing Unexpected Coordination

−100 0 100

−100

−100 0 100

−100

100 A significant fraction of ASes are all following the same rule.

There is a special AS X in the system.

The rule is: “Always use AS X to get to any customer in its cone.”

The ASes making this coordinated decision are not related in any obvious way

The smaller cluster is a local atom.

Attractivity of AS6939

What is so special about Hurricane Electric?

•  A peering policy based only on colocation –  Settlement-free peering –  No commercial consideration or traffic ratios

•  Immensely attractive to small/medium NSPs –  “free” access to any customer network of HE –  HE backbone has global scope

“Hurricane Electric is willing to peer with networks which are connected to one or more exchange points which we have in common.”

he.net/adm/peering.html

Systematic Discovery of Local Atoms

•  Manual inspection uncovers a large-scale local atom –  The Hurricane Electric cluster –  Includes about 25% of prefixes in the Internet

•  Do smaller local atoms exist? How can we detect them? •  Need an effective clustering strategy for RSD

•  A natural approach: seek clusters such that –  Within-cluster RSD is minimized –  Between-cluster RSD is maximized

Pivot Clustering

•  Minimizing P-Cost is NP-hard, but there is an expected 3-approximation algorithm for it: Pivot clustering

Largest 5 clusters found using Pivot Clustering

•  Clusters show clear separation

•  Each cluster corresponds to a local atom

Number of Prefixes (C)

Number of Source ASes (S)

Destinations

C1 150 16 Ukraine 83% Czech. Rep 10%

C2 170 9 Romania 33% Poland 33%

C3 126 7 India 93% US 2%

C4 484 8 Russia 73% Czech rep. 10%

C5 375 15 US 74% Australia 16%

Interpreting Clusters

Conclusions

•  RSD is a metric that captures when two prefixes are “routed similarly in the Internet”

•  It has many useful properties –  a fine-grained distance metric even in small-world graphs –  captures a different notion of distance than shortest-path –  summarizes the routing state of the entire network –  effective for visualizing Internet prefixes –  allows in-depth analysis of BGP –  uncovers surprising and interesting patterns in routing

•  Local atoms

Thank you!

Gonca Gursun Natali Ruchansky Evimaria Terzi

Related Work •  Reported that BGP tables provide an incomplete view of the

AS graph. [Roughan et. al. ‘11]

•  Visualization based on AS degree and geo-location. [Huffaker and k. claffy ‘10]

•  Small scale visualization through BGPlay and bgpviz

•  Clustering on the inferred AS graph. [Gkantsidis et. al. ‘03]

•  Clustering prefixes that share the same BGP paths into policy atoms. [Broido and k. claffy ‘01]

•  Methods for calculating policy atoms and characteristics. [Afek et. al. ‘02]

Data Hygiene Implications

•  BGP data is known to favor customer-provider links and miss peer-peer links

•  Our restriction to 38 x 135000 known paths means that we are not missing any links in the scope of our experiments

•  Hence accuracy for the chosen subsets of M is not affected by missing links

•  However, the accuracy of our methods may be different on the full M –  Whether better or worse, it’s not clear –  There is some reason to believe it would be better…

A Fine-Grained Distance Metric for Analyzing Internet Topology · A Fine-Grained Distance Metric...

Transcript of A Fine-Grained Distance Metric for Analyzing Internet Topology · A Fine-Grained Distance Metric...

A Fine-Grained Distance Metric for Analyzing Internet Topology · A Fine-Grained Distance Metric...

Documents

Transcript of A Fine-Grained Distance Metric for Analyzing Internet Topology · A Fine-Grained Distance Metric...

Chapter 9 The Topology of Metric Spaces - …mfinkels/140T/chapter9.pdf · Chapter 9 The Topology of Metric Spaces ... topology (see Example 4), ... (Note that in general, will depend

1 The topology of metric spaces - Florida Atlantic …math.fau.edu/schonbek/Analysis/ia1fa14MetricSpaces.pdf · 1 THE TOPOLOGY OF METRIC SPACES 2 Theorem 1 Let M be a metric space

Topology of Nonarchimedean Analytic Spaces · Nonarchimedean analytic geometry In the 1960s, Tate developed rigid analytic spaces. Two key steps: Replace the metric topology on Kn

Fine-grained or coarse-grained? Strategies for ...

METRIC TOPOLOGY: A FIRST COURSE (FOR MATH 4450, SPRING …

Overview A quick review of general topology · 1. A quick review of general topology First let us review fundamental facts on topological spaces. The proofs are omitted. Metric space.

General Topology - maths.ed.ac.uktl/topology/topology_notes.pdf · Chapter A Topological spaces A1 Review of metric spaces For the lecture of Thursday, 18 September 2014 Almost everything

Topology on locally finite metric spaces

One Setting for All: Metric, Topology, Uniformity ... · METRIC, TOPOLOGY, UNIFORMITY, APPROACH STRUCTURE 129 The horizontal embeddings in the diagram above are both reﬂective and

Topology of B-metric spaces - Numdamarchive.numdam.org/article/CM_1954-1956__12__250_0.pdfthe topology induced in sets over a-complete Boolean algebras by the Kantorovitch topologies

Metric Spaces in Synthetic Topology - Andrej Bauermath.andrej.com/wp-content/uploads/2010/01/csms_in_synthtop.pdf · synthetic topology. Section 3 makes initial observations about

The Geometry of the Universal Teichmuller Space and the ...€¦ · strong metric, that is, a metric which induces the Hilbert space topology on each tangent spaces. They also show

Metric spaces - Department of Mathematics Spaces.pdf · Let be a set. A function is called a metric if it satisfies the following three conditions: A pair , ... Topology of metric

ANALYSIS SYLLABUS Metric Space Topology Rn, compactness ... · Continuity, uniform continuity, uniform convergence, Weierstrass Comparison Test, uniform convergence and limits of

Metric Topology yukita/.

Math201-Lecturenotes metric topology

METRIC FOUNDATIONS OF GEOMETRY. I · 1944] METRIC FOUNDATIONS OF GEOMETRY. I 467 of general topology apply to metric spaces(7). To indicate this, we recall the following common notions,

Deep Metric Learning With Angular Lossresearch.baidu.com/Public/uploads/5acc20706a719.pdfture matching [7], ﬁne-grained image classiﬁcation [33, 38], zero-shot learning [11, 35]

Halogen bonded Borromean networks by design: Topology ...Halogen bonded Borromean networks by design: Topology invariance and metric tuning in a library of multi component systems

General Topology lecture notes - WordPress.com choose a metric suited to particular purpose Metrics may be complicated, while the topology may be simple Can study families of metrics

Topology of B-metric spaces - Numdamarchive.numdam.org/article/CM_1954-195612250_0.pdfthe topology induced in sets over a-complete Boolean algebras by the Kantorovitch topologies