Post on 13-Jan-2016
Internet Tomography and Geography
What is this area all about?Related work in the areaMain Paper WEBMAPPER’s features How WEBMAPPER works WEBMAPPER results summarized
Internet Tomography / Geography
Where, geographically, are my clients located (or servers)?What’s the best server for me to get content from?Given two IP’s, what’s the latency between them? What IP’s all come from the same geographic location?Generally, show me a complete and accurate map of the Internet
Related Work
B. Krishnamurthy and J. Wang, “On network-aware clustering of webclients,” in Proceedings of SIGCOMM ’00, August 2000
Problem: Which IP’s are the same Does BGP-table based clustering Groups IP’s by common administrative control Passive
S. Jamin, C. Jin, Y. Jin, D. Raz, Y. Shavitt, and L. Zhang, “On the placement of intenet instrumentation,” in Proceedings of INFOCOM ’00, March 2000, pp. 295–304.
Problem: What’s latency between 2 IP’s Clusters the internet based on BGP address prefixes Places “Tracers” everywhere in the Internet that actively ping
each other
Related work
An Investigation of Geographic Mapping Techniques for Internet Hosts Padmanabhan, Subramanian, SIGCOMM 2001 Problem: geo-locate from IP 3 solutions
GeoTrack: infer location by DNS / traceroute GeoPing: probe target from known locations,
triangulate [active] GeoCluster: do passive BGP, IP-prefix based
clustering
Related work
Predicting Internet Network Distance with Coordinate-Based Approaches Problem: Given 2 IP’s, what’s latency Ng, Zhang, CMU, INFOCOM 2002 Active network of landmarks, pinging each
other Better math model (GNP) for interpolating
distance than IDMaps
Proprietary Solutions (Akamai, etc) Problem: What’s the closest server to an IP?
Clustering and Server Selection Using Passive Monitoring
M. Andrews, B. Sheperd, A. Srinivasan, P. Winkler, F. Zane(Bell Labs), INFOCOM 2002
Problem: Server Selection
Given a client, and a set of servers, all with identical content, tell me the “best” serverFor the client, “best” means lowest latency or highest throughputFor the whole system, it may mean something else
What is Passive Monitoring?
Content Servers don’t ping each otherContent Servers don’t ping clientsNo pinging is done: no additional network traffic is introducedInstead, servers record the Round-Trip Time of TCP handshake from a client
WEBMAPPER
WEBMAPPER: Clusterer
Clusterer uses the client-server latency pairs reported by the serversDetermines which address prefixes correspond to the same network location, and which don’tDoes more than just find the closest server to each cluster; assigns probabilities to each to balance the network flow
WEBMAPPER OutputClient cluster
Content server 1
Content server 2
Content server 3
25.135.64.0/19
0.9 0.05 0.05
132.0.0.0/8 0.85 0 0.15
WEBMAPPER: Big Tree
Giant Binary Tree of all IP addressesWell, not all… assume last 8 bits always clusteredRoot of tree: 0.0.0.0/0 Children 0.0.0.0/1, 128.0.0.0/1 Leaves 123.123.123.123/24
WEBMAPPER: Big Tree
Leaves also store Sum of recorded distances (per
server) Squared sum of recorded distances Number of recorded distances
Leaf data is periodically aged exponentially (I = I * 0.9)
WEBMAPPER: Small Tree
Clusters are formed from big tree by folding children into their parents when they’re “similar” to their siblingsUses statistical test to determine “similar” (two-sample t-test)Threshold based
Assigning Clusters to Servers
Assigning servers to clusters is complicated 1. Testing Index
Futzes with the probability of a cluster being assigned to a server
Based on multi-armed bandit solution
Assigning Clusters to Servers
Assigning Clusters to Servers
Calculate server capacityEach cluster assignment has a latency costTry to minimize the total cost without breaking any server’s capacityGraph theory to the rescue: min-cost flow
Results
Experiment 1: 28 day log of busy web traffic Recorded client IP, time, RTT Clustering produced 17,270 clusters Here are 12 example clusters for high
traffic days
Results
Experiment 2: Set up a west-coast and east-coast
server Force clients to download something
from each (to get actual measurments)
Have clients download something from the “either-or” server (WEBMAPPER powered)
Opinions
Statistics and weighing formulae are coolWhat’s a good way to tell if the clustering is any good aside from eyeing samples?Two server test is all we can get out of Bell Labs?