IIT RTC Conference October 15 - 17, 2013
1 SnT – Interdisciplinary Centre for Security, Reliability and Trust
2 Bell Laboratories, Alcatel-Lucent
Of maps and costs: Aggregating large-scale broadband measurements for the Application Layer Traffic Optimization (ALTO) protocol
David Goergen1
Vijay K. Gurbani2
Radu State1
OUTLINE
• Premise
• ALTO: background
• FCC dataset
• Processing
• Evaluation and discoveries
IIT RTC conference 21.04.23
Premise
• Essential to study trends and derive network analytics
• Two extremes exist
– Complete and highly detailed raw data
  • Users lost in details
  • High amount of data
– Highly aggregated and summarized reports
  • Human-readable format, e.g. charts, presentations, reports
  • Often cannot be further investigated
There is a need for an intermediate way
– The ALTO protocol seems a good choice.
ALTO Introduction
• ALTO solves the general rendezvous problem: given a choice of resources, which one is the best candidate?
• Recurring pattern in many domains:
– Peer-to-peer (BitTorrent)
  • Which peers are close to me? Which peers have high upload bandwidth?
– Content delivery networks (CDN)
  • Rendezvous me with nearest surrogate
– Network routing and distance calculation
  • Shortest path computation
– Data centers and cloud computing
  • Where is my nearest data center? Which server is lightly loaded? Which data center has the lowest network utilization?
ALTO Introduction
• History
– Circa 2008: Comcast and BitTorrent
  • P2P traffic dominates the Internet
  • Internet Service Providers wanted a well-behaved network
  • ISPs wanted to reduce transit costs
  • BitTorrent traffic behaves greedily, optimizing for local maxima at the expense of other time-sensitive traffic
– May 2008: IETF Workshop on P2P Infrastructure held at MIT to arrive at mitigating solutions
– Outcome: 2 Working Groups
  • LEDBAT: Low Extra Delay Background Transport
  • ALTO: Application Layer Traffic Optimization
ALTO Introduction
• ALTO is:
– An Application Layer Traffic Optimization protocol
– An IETF Working Group
– An IETF (soon-to-be) standard RFC
– A RESTful API that provides topology maps and cost maps to clients
– A RESTful API that provides building blocks to construct:
  • Ranking service
  • Endpoint cost service
  • Endpoint property service
  • Map filtering service
• What is an endpoint?
– An IP address, a MAC address, an aggregation of IP addresses, ...
ALTO Introduction
[Figure: ALTO Architecture. Labels: ALTO server; ALTO client; ALTO service discovery; routing protocols; provisioning policies; dynamic network information; external interfaces; ISP; third parties, content providers, ... Legend: standardized protocol; not subject to standardization.]
ALTO Introduction
• 2 main abstractions:
– Network Map
– Cost Map
• Network specified in terms of Partition/Provider ID (PID): aggregation of endpoints identified by a provider-defined network location identifier.
• Costs are normalized and have two attributes:
– Type: What does the cost represent? Air-miles, hop count, ...
– Mode: How to interpret the cost.
  • Numerical (mathematical operations)
  • Ordinal (position-based preferences)
• These abstractions help!
IT, meet NOC. NOC, meet IT!
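The difference between the two cost modes can be illustrated with a minimal Python sketch. The PID names and cost values are made up for illustration, and to_ordinal is a hypothetical helper, not part of any ALTO library.

```python
# Hypothetical numerical costs from one source PID to three destination PIDs.
numerical_costs = {"PID1": 5.0, "PID2": 20.0, "PID3": 10.0}


def to_ordinal(costs):
    """Convert numerical costs to ordinal ranks (1 = most preferred).

    Equal costs share the same rank. Only the ordering is preserved,
    so arithmetic on the resulting ranks is not meaningful.
    """
    distinct = sorted(set(costs.values()))
    rank_of = {value: rank for rank, value in enumerate(distinct, start=1)}
    return {pid: rank_of[value] for pid, value in costs.items()}


ordinal_costs = to_ordinal(numerical_costs)

# Numerical mode supports arithmetic: here PID2 costs 4x as much as PID1.
ratio = numerical_costs["PID2"] / numerical_costs["PID1"]

# Ordinal mode only expresses preference order: PID1, then PID3, then PID2.
preferred = min(ordinal_costs, key=ordinal_costs.get)
```

In this sketch a client using ordinal mode learns which destination to prefer but not by how much, which is the distinction the slide draws between mathematical operations and position-based preferences.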
ALTO Introduction: Maps (Network and cost)
Network map
[Figure: Datacenter 1, Datacenter 2, Datacenter 3]
Problem: Complexity and network structure exposed.
Graphics source: http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg
ALTO Introduction: Maps (Network and cost)
Network map
Hides complexity behind “partition IDs”
[Figure: Datacenter 1, Datacenter 2, Datacenter 3; PID 1, PID 2, PID 3]
Graphics source: http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg
ALTO Introduction: Maps (Network and cost)
Cost map
Network cost of linking the partitions
[Figure: Datacenter 1, Datacenter 2, Datacenter 3; PID 1, PID 2, PID 3; cost labels 20, 10, 22, 30, 5, 1]
Graphics source: http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg
ALTO Introduction: Example ALTO maps
Network map
Cost map
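As a rough illustration of how a client might use the two maps together, here is a minimal Python sketch. The prefixes, PID names and cost values are invented for the example (they are not the values from the slides), and pid_of and best_candidate are hypothetical helpers. The sketch also assumes non-overlapping prefixes, so it does no longest-prefix matching.

```python
import ipaddress

# Illustrative network map: each PID aggregates one or more endpoint prefixes.
network_map = {
    "PID1": ["192.0.2.0/24"],
    "PID2": ["198.51.100.0/24"],
    "PID3": ["203.0.113.0/24"],
}

# Illustrative cost map: cost_map[src][dst] is the cost from src PID to dst PID.
cost_map = {
    "PID1": {"PID1": 1, "PID2": 10, "PID3": 20},
    "PID2": {"PID1": 10, "PID2": 1, "PID3": 5},
    "PID3": {"PID1": 20, "PID2": 5, "PID3": 1},
}


def pid_of(address):
    """Return the PID whose prefix contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for pid, prefixes in network_map.items():
        if any(ip in ipaddress.ip_network(prefix) for prefix in prefixes):
            return pid
    return None


def best_candidate(client, candidates):
    """Pick the candidate endpoint with the lowest cost from the client's PID."""
    src = pid_of(client)
    return min(candidates, key=lambda c: cost_map[src][pid_of(c)])


# A client in PID1 chooses between a candidate in PID2 and one in PID3.
choice = best_candidate("192.0.2.7", ["198.51.100.9", "203.0.113.4"])
```

The client never sees the topology behind each PID. It only maps addresses to PIDs and compares the costs between them.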
FCC Dataset specification
• One country
• Time period: 01.01.2012 to 31.12.2012
• 7,782 anonymised volunteers spread across the country
• Each volunteer's unit queries a defined set of common web sites every hour
– e.g. Google, YouTube, CNN, …
• 75-78 million records per month
• 6-7 GB of data per month
FCC Dataset specification
• Consists of several files organized per month
– Linked together through the unit_id field
• For our first evaluation we use the curr_dns file to extract distinct unit_id values that are consistent over a certain period
– Use these to create a topology map for the ALTO protocol
FCC Dataset specification
Processing
• Find a stable set of unit_id
– DNS resolver appears in every file
– Location is fixed
• Location is resolved using a geo-IP database
• unit_id is assumed to be close to the DNS resolver location
Hadoop cluster specs
• Hadoop 2.0.0-cdh4.3.0
• 4 nodes
– Hexa-core 2.4 GHz Xeon
• 120 GB RAM
• HDFS 27.54 TB
• 2 x 1 Gb Ethernet, bonded
Hadoop job process
Outcome
• Output contains
– unit_id
– DNS resolver IP
– Occurrence
– Geo. location
• Post-process
– Filter out all non-stable unit_id
  • Occurrence < 12 months
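The stability filter can be sketched in plain Python as follows. This is not the Hadoop job itself: the record layout (month, unit_id, resolver IP) and the stable_units helper are assumptions made for illustration, and the fixed-location criterion is left out for brevity.

```python
from collections import defaultdict


def stable_units(records, required_months=12):
    """Return {unit_id: resolver_ip} for units seen with the same resolver
    in at least `required_months` distinct months."""
    months_seen = defaultdict(set)
    for month, unit_id, resolver_ip in records:
        months_seen[(unit_id, resolver_ip)].add(month)
    return {
        unit_id: resolver_ip
        for (unit_id, resolver_ip), months in months_seen.items()
        if len(months) >= required_months
    }


# Hypothetical records: (month, unit_id, resolver_ip).
records = (
    # Unit 101 uses the same resolver in all 12 months: stable.
    [(m, 101, "192.0.2.53") for m in range(1, 13)]
    # Unit 202 is missing from month 12: occurrence < 12, filtered out.
    + [(m, 202, "198.51.100.53") for m in range(1, 12)]
    # Unit 303 switches resolver after month 6: filtered out.
    + [(m, 303, "203.0.113.53" if m <= 6 else "203.0.113.54") for m in range(1, 13)]
)

stable = stable_units(records)
```

Counting distinct months per (unit_id, resolver) pair means a unit that changes resolver mid-year never reaches an occurrence of 12 for either pair, so it drops out along with units that are absent from some monthly files.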
Interesting Observation
• Some unit_id are located outside the US
– Assume user has manually configured DNS resolver
• OpenDNS and Google DNS resolvers were ignored
• Large convergence to a single point (Potwin, KS)
– Potwin is the geographical center of the US
– ISPs generally locate their primary or secondary DNS name servers
– Continue to investigate how to minimize the impact
• Some unit_id change ISP and/or location
Stable unit_id
Next steps
• Attempt to create a network map
– Rough PID groupings accomplished by unit IDs belonging to the same ISP.
– More formal PID groupings for further study (e.g., group by bandwidth speed irrespective of ISP, lowest jitter, …).
• Attempt to create a cost map
– Different cost maps for different applications (e.g., use UDP latency or jitter as a cost metric for VoIP applications).
• Cross-reference with other datasets (e.g., US Census Dataset).
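One possible starting point for the rough groupings is sketched below in Python. The per-unit summaries, field names and values are hypothetical, and the cost shown is a single per-PID figure (mean UDP latency of the member units), not a full PID-to-PID cost map.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-unit summaries; field names and values are illustrative.
units = [
    {"unit_id": 101, "isp": "ISP-A", "udp_latency_ms": 20.0},
    {"unit_id": 102, "isp": "ISP-A", "udp_latency_ms": 30.0},
    {"unit_id": 201, "isp": "ISP-B", "udp_latency_ms": 50.0},
]


def group_by_isp(units):
    """Rough PID grouping: one PID per ISP, listing its member unit IDs."""
    pids = defaultdict(list)
    for unit in units:
        pids[unit["isp"]].append(unit["unit_id"])
    return dict(pids)


def latency_cost(units):
    """Per-PID cost for a latency-sensitive application: mean UDP latency."""
    samples = defaultdict(list)
    for unit in units:
        samples[unit["isp"]].append(unit["udp_latency_ms"])
    return {pid: mean(values) for pid, values in samples.items()}


pid_members = group_by_isp(units)
voip_cost = latency_cost(units)
```

Swapping the latency field for jitter or bandwidth would yield a different cost map over the same PIDs, which is the per-application idea in the bullets above.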
Next steps
• Using stable unit IDs as landmarks in a virtual coordinate system.
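The slides do not say which coordinate scheme is intended. As one simple possibility, a host's coordinates can be taken as its vector of round-trip times to the landmarks, with the Euclidean distance between two vectors serving as a rough proximity estimate. The sketch below assumes this landmark-vector approach; the landmark names and RTT values are made up.

```python
import math

# Hypothetical landmarks (stable units) and measured round-trip times in ms.
landmarks = ["unit_101", "unit_202", "unit_303"]
rtt = {
    "host_a": {"unit_101": 10.0, "unit_202": 40.0, "unit_303": 25.0},
    "host_b": {"unit_101": 13.0, "unit_202": 36.0, "unit_303": 25.0},
}


def coordinates(host):
    """Coordinates of a host: its RTTs to the landmarks, in landmark order."""
    return [rtt[host][landmark] for landmark in landmarks]


def distance(host_x, host_y):
    """Euclidean distance between two hosts in landmark-RTT space."""
    return math.dist(coordinates(host_x), coordinates(host_y))


d = distance("host_a", "host_b")
```

Hosts with similar RTT vectors are likely to be close in the network, so two hosts can estimate their proximity without measuring each other directly.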
Thank you for your attention
QUESTIONS?