IIT RTC Conference October 15 - 17, 2013

Of maps and costs: Aggregating large-scale broadband measurements for the Application Layer Traffic Optimization (ALTO) protocol

David Goergen (1), Vijay K. Gurbani (2), Radu State (1)

(1) SnT – Interdisciplinary Centre for Security, Reliability and Trust
(2) Bell Laboratories, Alcatel-Lucent


OUTLINE

• Premise

• ALTO: background

• FCC dataset

• Processing

• Evaluation and discoveries


Premise

• Essential to study trends and derive network analytics
• Two extremes exist
– Complete and highly detailed raw data
  • Users lost in details
  • High amount of data
– Highly aggregated and summarized reports
  • Human-readable format, i.e. charts, presentations, reports
  • Often cannot be further investigated

There is a need for an intermediate way
– The ALTO protocol seems a good choice.


ALTO Introduction

• ALTO solves the general rendezvous problem: given a choice of resources, which one is the best candidate?
• Recurring pattern in many domains:
– Peer-to-peer (BitTorrent)
  • Which peers are close to me? Which peers have high upload bandwidth?
– Content delivery networks (CDN)
  • Rendezvous me with nearest surrogate
– Network routing and distance calculation
  • Shortest path computation
– Data centers and cloud computing
  • Where is my nearest data center? Which server is lightly loaded? Which data center has the lowest network utilization?


ALTO Introduction

• History
– Circa 2008: Comcast and BitTorrent
– P2P traffic dominates the Internet
– Internet Service Providers wanted a well-behaved network
– ISPs wanted to reduce transit costs
– BitTorrent traffic exhibits greedy behaviour, optimizing for local maxima at the expense of other, time-sensitive traffic
– May 2008: IETF Workshop on P2P Infrastructure held at MIT to arrive at mitigating solutions
– Outcome: 2 Working Groups
  • LEDBAT: Low Extra Delay Background Transport
  • ALTO: Application Layer Traffic Optimization


ALTO Introduction

• ALTO is:
– An Application Layer Traffic Optimization protocol
– An IETF Working Group
– An IETF (soon-to-be) standard RFC
– A RESTful API that provides topology maps and cost maps to clients
– A RESTful API that provides building blocks to construct:
  • Ranking service
  • Endpoint cost service
  • Endpoint property service
  • Map filtering service
• What is an endpoint?
– An IP address, a MAC address, an aggregation of IP addresses, ...


ALTO Introduction

ALTO Architecture

[Figure: ALTO architecture. Labels: ALTO server, ALTO client, ALTO service discovery, routing protocols, provisioning policies, dynamic network information, external interfaces, ISP, third parties and content providers. Legend: standardized protocol; not subject to standardization.]


ALTO Introduction

• 2 main abstractions:
– Network Map
– Cost Map
• Network specified in terms of Partition/Provider ID (PID): aggregation of endpoints identified by a provider-defined network location identifier.
• Costs are normalized and have two attributes:
– Type: What does the cost represent? Air-miles, hop count, ...
– Mode: How to interpret the cost.
  • Numerical (mathematical operations)
  • Ordinal (position-based preferences)
• These abstractions help!

IT, meet NOC. NOC, meet IT!
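The two cost modes can be illustrated with a minimal Python sketch. It assumes hypothetical PID names and cost values and shows one way to derive ordinal ranks from numerical costs; the slides do not prescribe how a server performs this conversion.

```python
def to_ordinal(costs):
    """Rank destinations by numerical cost: 1 is most preferred, ties share a rank."""
    distinct = sorted(set(costs.values()))
    rank_of = {value: position + 1 for position, value in enumerate(distinct)}
    return {dst: rank_of[value] for dst, value in costs.items()}


# Hypothetical numerical costs from one source PID to three destination PIDs.
numerical = {"PID2": 20.0, "PID3": 5.0, "PID4": 5.0}
ordinal = to_ordinal(numerical)
print(ordinal)
```

Numerical costs can be added or compared by magnitude; ordinal costs only say which destination is preferred over which.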


ALTO Introduction: Maps (Network and cost)

Network map

[Figure: Datacenter 1, Datacenter 2, Datacenter 3. Graphics source: http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg]

Problem: Complexity and network structure exposed.


ALTO Introduction: Maps (Network and cost)

Network map

Hides complexity behind “partition IDs”

[Figure: Datacenter 1, Datacenter 2 and Datacenter 3, labelled PID 1, PID 2 and PID 3. Graphics source: http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg]
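As a rough illustration of how a network map hides topology behind PIDs, the sketch below maps an endpoint IP address to a PID by longest-prefix match. The PID names, the prefixes (taken from the documentation address ranges) and the helper `pid_for` are all assumptions made for this example.

```python
import ipaddress

# Hypothetical network map: each PID aggregates one or more IP prefixes.
NETWORK_MAP = {
    "PID1": ["192.0.2.0/24"],
    "PID2": ["198.51.100.0/24"],
    "PID3": ["198.51.100.128/25", "203.0.113.0/24"],
}


def pid_for(address, network_map):
    """Return the PID whose prefix matches the address most specifically, or None."""
    ip = ipaddress.ip_address(address)
    best_pid, best_len = None, -1
    for pid, prefixes in network_map.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if ip.version == net.version and ip in net and net.prefixlen > best_len:
                best_pid, best_len = pid, net.prefixlen
    return best_pid


print(pid_for("198.51.100.200", NETWORK_MAP))
```

A client only ever sees the PID, not the internal structure of the datacenter behind it.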


ALTO Introduction: Maps (Network and cost)

Cost map

Network cost of linking the partitions

[Figure: Datacenter 1, Datacenter 2 and Datacenter 3, labelled PID 1, PID 2 and PID 3, with link costs 20, 10, 22, 30, 5 and 1 between the partitions. Graphics source: http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg]
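A cost map can be used to answer the rendezvous question directly. The sketch below reuses the six cost values from the slide, but the figure did not survive extraction, so the assignment of each value to a (source, destination) pair is an assumption, as is the helper `cheapest`.

```python
# Costs from the slide; which PID pair each value belongs to is assumed.
COST_MAP = {
    "PID1": {"PID2": 20, "PID3": 10},
    "PID2": {"PID1": 22, "PID3": 30},
    "PID3": {"PID1": 5, "PID2": 1},
}


def cheapest(source, candidates, cost_map):
    """Pick the candidate PID with the lowest cost from the source, or None."""
    known = cost_map.get(source, {})
    reachable = [pid for pid in candidates if pid in known]
    if not reachable:
        return None
    return min(reachable, key=lambda pid: known[pid])


print(cheapest("PID1", ["PID2", "PID3"], COST_MAP))
```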


ALTO Introduction: Example ALTO maps

Network map

Cost map
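The example maps on this slide did not survive extraction. As a stand-in, the sketch below builds a network map and a cost map in the JSON shape later standardized in RFC 7285 (published after this talk). The resource ID, tag, prefixes and cost values are hypothetical.

```python
import json

# Network map: PIDs and the prefixes they aggregate.
network_map_response = {
    "meta": {"vtag": {"resource-id": "example-network-map", "tag": "v1"}},
    "network-map": {
        "PID1": {"ipv4": ["192.0.2.0/24"]},
        "PID2": {"ipv4": ["198.51.100.0/24"]},
        "PID3": {"ipv4": ["203.0.113.0/24"]},
    },
}

# Cost map: refers back to the network map version it was computed for.
cost_map_response = {
    "meta": {
        "dependent-vtags": [{"resource-id": "example-network-map", "tag": "v1"}],
        "cost-type": {"cost-mode": "numerical", "cost-metric": "routingcost"},
    },
    "cost-map": {
        "PID1": {"PID1": 1, "PID2": 5, "PID3": 10},
        "PID2": {"PID1": 5, "PID2": 1, "PID3": 15},
        "PID3": {"PID1": 20, "PID2": 15, "PID3": 1},
    },
}

print(json.dumps(network_map_response, indent=2))
print(json.dumps(cost_map_response, indent=2))
```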


FCC Dataset specification

• One country
• Time period: 01.01.2012 to 31.12.2012
• 7,782 anonymised volunteers spread across the country
• Each volunteer's unit triggers hourly measurements against a defined set of common web sites
– e.g. Google, YouTube, CNN, …
• 75-78 million records per month
• 6-7 GB of data per month


FCC Dataset specification

• Consists of several files organized per month
– Linked together through the unit_id field
• For our first evaluation we use the curr_dns file to extract the distinct unit_id values that are consistent over a certain period
– Use these to create a topology map for the ALTO protocol
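Linking the monthly files through unit_id amounts to a join on that field. The sketch below is a minimal in-memory version; the second file (a latency file), the column names other than unit_id, and the sample rows are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpts of two files from the same month.
DNS_CSV = "unit_id,nameserver\n101,192.0.2.53\n102,198.51.100.53\n"
LATENCY_CSV = "unit_id,rtt_avg\n101,21000\n101,23000\n103,50000\n"


def rows_by_unit(csv_text):
    """Index the rows of one CSV file by their unit_id."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        index[row["unit_id"]].append(row)
    return index


def link(dns_text, latency_text):
    """Join two files on unit_id, keeping only units present in both."""
    dns = rows_by_unit(dns_text)
    latency = rows_by_unit(latency_text)
    shared = dns.keys() & latency.keys()
    return {unit: {"dns": dns[unit], "latency": latency[unit]} for unit in shared}


linked = link(DNS_CSV, LATENCY_CSV)
print(sorted(linked))
```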


FCC Dataset specification


Processing

• Find a stable set of unit_id
– DNS resolver appears in every file
– Location is fixed
• Location is resolved using a geo-IP database
• A unit_id is assumed to be close to the location of its DNS resolver
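The location step can be sketched as follows: each unit inherits the location of its DNS resolver. The `GEOIP` dictionary is a hypothetical stand-in for a real geo-IP database, and the addresses and coordinates are made up for the example.

```python
from typing import NamedTuple


class GeoRecord(NamedTuple):
    country: str
    lat: float
    lon: float


# Hypothetical stand-in for a geo-IP database, keyed by resolver IP.
GEOIP = {
    "192.0.2.53": GeoRecord("US", 41.88, -87.63),
    "198.51.100.53": GeoRecord("US", 40.71, -74.01),
}


def locate_units(unit_to_resolver, geoip):
    """Give each unit the location of its resolver; skip resolvers not in the table."""
    located = {}
    for unit_id, resolver_ip in unit_to_resolver.items():
        record = geoip.get(resolver_ip)
        if record is not None:
            located[unit_id] = record
    return located


units = {"101": "192.0.2.53", "102": "198.51.100.53", "103": "203.0.113.53"}
locations = locate_units(units, GEOIP)
print(locations)
```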


Hadoop cluster specs

• Hadoop 2.0.0-cdh4.3.0
• 4 nodes
– hexacore 2.4 GHz Xeon
• 120 GB RAM
• HDFS 27.54 TB
• 2 x 1 Gb Ethernet, bonded


Hadoop job process
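The job diagram on this slide was lost in extraction. The sketch below is one plausible map/reduce formulation, written as plain Python functions rather than actual Hadoop code: the mapper emits one (unit_id, resolver) key per DNS record, and the reducer counts the distinct months in which that pair appears. The row format and the `nameserver` column name are assumptions.

```python
from collections import defaultdict


def mapper(month, rows):
    """Emit ((unit_id, resolver_ip), month) for every DNS record of one monthly file."""
    for row in rows:
        yield (row["unit_id"], row["nameserver"]), month


def reducer(key, months):
    """Count the distinct months in which the (unit_id, resolver) pair appears."""
    unit_id, resolver_ip = key
    return unit_id, resolver_ip, len(set(months))


def run_job(files_by_month):
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for month, rows in files_by_month.items():
        for key, value in mapper(month, rows):
            grouped[key].append(value)
    return sorted(reducer(key, values) for key, values in grouped.items())


# Hypothetical input: two monthly files with a few records each.
files = {
    "2012-01": [
        {"unit_id": "101", "nameserver": "192.0.2.53"},
        {"unit_id": "101", "nameserver": "192.0.2.53"},
        {"unit_id": "102", "nameserver": "198.51.100.53"},
    ],
    "2012-02": [{"unit_id": "101", "nameserver": "192.0.2.53"}],
}
result = run_job(files)
print(result)
```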


Outcome

• Output contains
– unit_id
– DNS resolver IP
– Occurrence
– Geo. location
• Post-processing
– Filter out all non-stable unit_id
  • Occurrence < 12 months
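The post-processing filter is small enough to sketch directly. The record layout below (a dict per output line) and the sample values are assumptions; only the threshold of 12 months comes from the slide.

```python
REQUIRED_MONTHS = 12  # a unit is stable only if seen in every month of 2012


def stable_records(records, required=REQUIRED_MONTHS):
    """Drop every record whose occurrence count is below the required number of months."""
    return [record for record in records if record["occurrence"] >= required]


# Hypothetical job output.
records = [
    {"unit_id": "101", "resolver_ip": "192.0.2.53", "occurrence": 12},
    {"unit_id": "102", "resolver_ip": "198.51.100.53", "occurrence": 7},
]
stable = stable_records(records)
print([record["unit_id"] for record in stable])
```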


Interesting Observation

• Some unit_id are located outside the US
– Assume the user has manually configured the DNS resolver
• OpenDNS and Google DNS resolvers were ignored
• Large convergence to a single point (Potwin, KS)
– Potwin is the geographical center of the US
– ISPs generally locate their primary or secondary DNS name servers
– Continue to investigate how to minimize the impact
• Some unit_id change ISP and/or location
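These observations suggest a simple classification pass over the job output. The sketch below flags records that use a public resolver or that geolocate outside the US. The resolver set holds the well-known Google Public DNS and OpenDNS addresses; the slides do not list which addresses were actually ignored, and the record layout is an assumption.

```python
# Well-known public resolver addresses (Google Public DNS and OpenDNS).
PUBLIC_RESOLVERS = {"8.8.8.8", "8.8.4.4", "208.67.222.222", "208.67.220.220"}


def classify(record):
    """Label a record as 'public-resolver', 'outside-us' or 'ok'."""
    if record["resolver_ip"] in PUBLIC_RESOLVERS:
        return "public-resolver"
    if record["country"] != "US":
        return "outside-us"
    return "ok"


# Hypothetical records: resolver IP plus the country from the geo-IP lookup.
records = [
    {"unit_id": "101", "resolver_ip": "192.0.2.53", "country": "US"},
    {"unit_id": "102", "resolver_ip": "8.8.8.8", "country": "US"},
    {"unit_id": "103", "resolver_ip": "203.0.113.53", "country": "DE"},
]
labels = {record["unit_id"]: classify(record) for record in records}
print(labels)
```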


Stable unit_id


Next steps

• Attempt to create a network map
– Rough PID groupings accomplished by unit IDs belonging to the same ISP.
– More formal PID groupings for further study (e.g., group by bandwidth speed irrespective of ISP, lowest jitter, …).
• Attempt to create a cost map
– Different cost maps for different applications (e.g., use UDP latency or jitter as a cost metric for VoIP applications).
• Cross-reference with other datasets (e.g., US Census Dataset).
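A rough version of the first two steps can be sketched as follows: group unit IDs into one PID per ISP, then use the median latency between PIDs as the cost. The ISP names, the `pid-` naming scheme, and the unit-to-unit latency samples are all hypothetical; the slides do not say how costs would actually be derived from the dataset.

```python
from collections import defaultdict
from statistics import median


def network_map_by_isp(unit_isp):
    """Group unit IDs into one PID per ISP."""
    pids = defaultdict(list)
    for unit_id, isp in sorted(unit_isp.items()):
        pids["pid-" + isp].append(unit_id)
    return dict(pids)


def latency_cost_map(samples, unit_isp):
    """Build a cost map whose cost is the median latency (ms) between two PIDs."""
    per_pair = defaultdict(list)
    for src, dst, latency_ms in samples:
        pair = ("pid-" + unit_isp[src], "pid-" + unit_isp[dst])
        per_pair[pair].append(latency_ms)
    cost_map = defaultdict(dict)
    for (src_pid, dst_pid), values in per_pair.items():
        cost_map[src_pid][dst_pid] = median(values)
    return dict(cost_map)


# Hypothetical inputs: unit-to-ISP assignment and (source, destination, ms) samples.
unit_isp = {"101": "isp-a", "102": "isp-a", "103": "isp-b"}
samples = [("101", "103", 20.0), ("102", "103", 30.0), ("103", "101", 40.0)]
network_map = network_map_by_isp(unit_isp)
cost_map = latency_cost_map(samples, unit_isp)
print(network_map)
print(cost_map)
```

Swapping the latency samples for jitter samples would give a second cost map over the same network map, in line with the idea of different cost maps for different applications.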


Next steps

• Using stable unit IDs as landmarks in a virtual coordinate system.
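One simple landmark-based scheme, shown here only as an illustration since the slides do not say which coordinate system is intended, represents each host by its vector of round-trip times to the landmarks and estimates the distance between two hosts as the Euclidean distance between their vectors. The landmark IDs and RTT values are hypothetical.

```python
import math

# Hypothetical landmarks: three stable unit IDs.
LANDMARKS = ["101", "102", "103"]


def coordinates(rtt_to_landmarks, landmarks=LANDMARKS):
    """Turn a host's RTTs (ms) to each landmark into a coordinate vector."""
    return [rtt_to_landmarks[landmark] for landmark in landmarks]


def estimated_distance(a, b):
    """Euclidean distance between two coordinate vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


host_a = coordinates({"101": 10.0, "102": 40.0, "103": 30.0})
host_b = coordinates({"101": 13.0, "102": 44.0, "103": 30.0})
print(estimated_distance(host_a, host_b))
```

Hosts with similar RTT vectors are assumed to be close to each other in the network, without measuring the path between them directly.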


Thank you for your attention

QUESTIONS?
