Transcript of "Content Distribution", March 2, 2011 (2: Application Layer; 55 pages).

Page 1: Content Distribution

March 2, 2011

Page 2: Contents

- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)

Page 3: Pure P2P architecture

- no always-on server
- arbitrary end systems communicate directly (peer-to-peer)
- peers are intermittently connected and change IP addresses

Three topics:
- file distribution
- searching for information
- case study: Skype

Page 4: File distribution: server-client vs. P2P

Question: how much time does it take to distribute a file of size F from one server to N peers?

- us: server upload bandwidth
- ui: peer i upload bandwidth
- di: peer i download bandwidth

[Figure: a server with upload rate us and peers 1..N with rates u1/d1 ... uN/dN, connected by a network with abundant bandwidth]

Page 5: File distribution time: server-client

- server sequentially sends N copies: NF/us time
- client i takes F/di time to download

Time to distribute F to N clients using the client/server approach:

    dcs = max { NF/us, F/min_i di }

which increases linearly in N (for large N).
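As a quick check, the formula can be evaluated directly. This is a minimal sketch, not from the slides; the name `dcs` is my own.

```python
# Client/server distribution time: d_cs = max(N*F/us, F/min_i(di)).
# F is the file size; rates use any consistent units.
def dcs(F, us, d):
    N = len(d)                      # number of clients
    return max(N * F / us,          # server sends N sequential copies
               F / min(d))          # slowest client's download time

# F/us = 1 time unit, 10 clients whose downloads are never the bottleneck:
print(dcs(F=1.0, us=1.0, d=[100.0] * 10))  # 10.0 -- grows linearly in N
```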

Page 6: File distribution time: P2P

- server must send one copy: F/us time
- client i takes F/di time to download
- NF bits must be downloaded in aggregate; fastest possible aggregate upload rate: us + Σui

    dP2P = max { F/us, F/min_i di, NF/(us + Σui) }
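The same check for the P2P bound, a sketch with names of my choosing:

```python
# P2P distribution time: d_P2P = max(F/us, F/min_i(di), N*F/(us + sum_i(ui))).
def dp2p(F, us, u, d):
    N = len(u)                              # number of peers
    return max(F / us,                      # server sends one copy
               F / min(d),                  # slowest download
               N * F / (us + sum(u)))       # aggregate upload capacity

# 30 peers, each uploading at rate 1, server at rate 10, fast downloads:
print(dp2p(F=1.0, us=10.0, u=[1.0] * 30, d=[100.0] * 30))  # 0.75
```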

Page 7: Server-client vs. P2P: example

Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us.

[Figure: minimum distribution time (hours, 0 to 3.5) vs. N (0 to 35); the client-server curve grows linearly with N while the P2P curve saturates below 1 hour]
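The two curves can be reproduced from the formulas on the previous pages under exactly these parameters (a sketch; `times` is my own name, and I take dmin = us for concreteness):

```python
# Server-client vs. P2P under the example's parameters:
# client upload rate u, F/u = 1 hour, us = 10u, dmin >= us.
def times(N, F=1.0, u=1.0):
    us = dmin = 10 * u
    d_cs = max(N * F / us, F / dmin)                     # client/server
    d_p2p = max(F / us, F / dmin, N * F / (us + N * u))  # P2P
    return d_cs, d_p2p

for N in (5, 10, 20, 30):
    print(N, times(N))  # client/server grows as N/10; P2P stays below 1 hour
```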

Page 8: Contents

- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)

Page 9: P2P content distribution issues

Issues:
- peer discovery and group management
- data placement and searching
- reliable and efficient file exchange
- security/privacy/anonymity/trust

Approaches for group management and data search (i.e., who has what?):
- centralized (e.g., BitTorrent tracker)
- unstructured (e.g., Gnutella)
- structured (distributed hash tables [DHT])

Page 10: Centralized index (Napster)

Original "Napster" design:
1) when a peer connects, it informs the central server of its IP address and content
2) Alice queries for "Hey Jude"
3) Alice requests the file from Bob

[Figure: peers, including Alice and Bob, around a centralized directory server; steps 1-3 shown as arrows]

Page 11: Centralized model

File transfer is decentralized, but locating content is highly centralized.

[Figure: Bob, Alice, Jane, and Judy exchanging files directly, with lookups going through the central directory]

Page 12: Centralized

Benefits:
- low per-node state
- limited bandwidth usage
- short location time
- high success rate
- fault tolerant

Drawbacks:
- single point of failure
- limited scale
- possibly unbalanced load
- copyright infringement

Page 13: File distribution: BitTorrent

P2P file distribution:
- tracker: tracks peers participating in the torrent
- torrent: group of peers exchanging chunks of a file

[Figure: a joining peer obtains a list of peers from the tracker, then trades chunks with peers in the torrent]

Page 14: BitTorrent (1)

- file divided into 256 KB chunks
- a peer joining the torrent:
  - has no chunks, but will accumulate them over time
  - registers with the tracker to get a list of peers, and connects to a subset of peers ("neighbors")
- while downloading, a peer uploads chunks to other peers
- peers may come and go
- once a peer has the entire file, it may (selfishly) leave or (altruistically) remain

Page 15: BitTorrent (2)

Pulling chunks:
- at any given time, different peers have different subsets of file chunks
- periodically, a peer (Alice) asks each neighbor for the list of chunks it has
- Alice sends requests for her missing chunks, rarest first

Sending chunks: tit-for-tat
- Alice sends chunks to the four neighbors currently sending her chunks at the highest rate; she re-evaluates the top 4 every 10 secs
- every 30 secs: she randomly selects another peer and starts sending it chunks ("optimistic unchoke"); the newly chosen peer may join the top 4

Page 16: BitTorrent: tit-for-tat

(1) Alice "optimistically unchokes" Bob
(2) Alice becomes one of Bob's top-four providers; Bob reciprocates
(3) Bob becomes one of Alice's top-four providers

With a higher upload rate, a peer can find better trading partners and get the file faster!
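The selection rule can be sketched as follows. This is a simplification of the real client's choking algorithm, and all names here are my own:

```python
import random

def choose_unchoked(rates, k=4):
    """rates: {neighbor: rate at which it currently uploads to us}.
    Returns the k best uploaders plus one optimistic unchoke."""
    top = sorted(rates, key=rates.get, reverse=True)[:k]  # re-evaluated every 10 s
    rest = [p for p in rates if p not in top]
    optimistic = random.choice(rest) if rest else None    # re-chosen every 30 s
    return top, optimistic

rates = {"bob": 9, "carol": 7, "dan": 5, "erin": 3, "frank": 1}
top, opt = choose_unchoked(rates)
print(top)  # the four fastest uploaders
```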

Page 17: P2P case study: Skype

- inherently P2P: pairs of users communicate
- proprietary application-layer protocol (inferred via reverse engineering)
- hierarchical overlay with supernodes (SNs)
- index maps usernames to IP addresses; distributed over SNs

[Figure: Skype clients (SC), supernodes (SN), and the Skype login server]

Page 18: Peers as relays

Problem: when both Alice and Bob are behind NATs, the NAT prevents an outside peer from initiating a call to an inside peer.

Solution:
- using Alice's and Bob's SNs, a relay is chosen
- each peer initiates a session with the relay
- the peers can now communicate through the NATs via the relay

Page 19: Distributed hash table (DHT)

- DHT = distributed P2P database
- the database has (key, value) pairs, e.g.:
  - key: SS number; value: human name
  - key: content type; value: IP address
- peers query the DB with a key; the DB returns values that match the key
- peers can also insert (key, value) pairs

Page 20: DHT identifiers

- assign an integer identifier to each peer in the range [0, 2^n - 1]; each identifier can be represented by n bits
- require each key to be an integer in the same range
- to get integer keys, hash the original key, e.g., key = h("Led Zeppelin IV"); this is why it is called a distributed "hash" table
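The hashing step can be sketched like this; the slides do not name a hash function, so SHA-1 here is my own choice:

```python
import hashlib

# Hash an original key into the n-bit identifier space [0, 2**n - 1].
def dht_key(name, n=4):
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** n)

key = dht_key("Led Zeppelin IV")
print(key)  # an integer in [0, 15] for n = 4
```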


Page 21: How to assign keys to peers?

- central issue: assigning (key, value) pairs to peers
- rule: assign the key to the peer that has the closest ID
- convention in this lecture: closest is the immediate successor of the key
- ex: n = 4; peers: 1, 3, 4, 5, 8, 10, 12, 14
  - key = 13: successor peer = 14
  - key = 15: successor peer = 1
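The successor rule from the example can be written directly (a sketch; `successor` is my naming):

```python
# Assign a key to its immediate successor on the identifier circle:
# the smallest peer ID >= key, wrapping around to the smallest ID.
def successor(key, peers):
    ring = sorted(peers)
    for p in ring:
        if p >= key:
            return p
    return ring[0]  # wrap-around: key is larger than every peer ID

peers = [1, 3, 4, 5, 8, 10, 12, 14]
print(successor(13, peers))  # 14
print(successor(15, peers))  # 1
```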


Page 22: Chord (a circular DHT) (1)

[Figure: peers 1, 3, 4, 5, 8, 10, 12, 15 arranged on a ring]

Each peer is only aware of its immediate successor and predecessor. This forms an "overlay network".

Page 23: Chord (a circular DHT) (2)

[Figure: peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111 on a ring; the query "who is responsible for key 1110?" is forwarded peer to peer around the circle until peer 1111 answers "I am"]

Define "closest" as the closest successor. O(N) messages on average to resolve a query, when there are N peers.
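The O(N) walk can be simulated on the example ring, using the decimal values of the 4-bit IDs (a sketch; function names are mine):

```python
# On the plain ring each peer knows only its immediate successor, so a
# query is forwarded peer to peer until it reaches the peer responsible
# for the key (the key's closest successor).
def linear_lookup(start, key, peers):
    ring = sorted(peers)
    responsible = next((p for p in ring if p >= key), ring[0])
    idx, messages = ring.index(start), 0
    while ring[idx] != responsible:
        idx = (idx + 1) % len(ring)  # forward to immediate successor
        messages += 1
    return responsible, messages

# Peers 0001..1111 as integers; who is responsible for key 1110 (= 14)?
print(linear_lookup(1, 14, [1, 3, 4, 5, 8, 10, 12, 15]))  # (15, 7)
```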


Page 24: Chord (a circular DHT) with shortcuts

[Figure: the same ring of peers 1, 3, 4, 5, 8, 10, 12, 15, now with shortcut links; query "who is responsible for key 1110?"]

- each peer keeps track of the IP addresses of its predecessor, successor, and shortcuts
- in the example, the query is reduced from 6 to 2 messages
- it is possible to design shortcuts so that each peer has O(log N) neighbors and a query takes O(log N) messages

Page 25: Peer churn

- to handle peer churn, require each peer to know the IP addresses of its two successors
- each peer periodically pings its two successors to see if they are still alive

Example: peer 5 abruptly leaves. Peer 4 detects this and makes 8 its immediate successor; it then asks 8 who its immediate successor is, and makes 8's immediate successor its own second successor.

What if peer 13 wants to join?

[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]

Page 26: Contents

- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)

Page 27: Why content networks?

- more hops between client and Web server mean more congestion!
- the same data flows repeatedly over the links between clients and the Web server

[Figure: clients C1-C4 reaching server S across several IP routers]

Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt

Page 28: Why content networks?

- the origin server becomes a bottleneck as the number of users grows
- flash crowds (for instance, Sept. 11)

The content distribution problem: arrange a rendezvous between a content source at the origin server (www.cnn.com) and a content sink (us, as users).

Page 29: Example: Web server farm

- a simple solution to the content distribution problem: deploy a large group of servers
- arbitrate client requests to servers using an "intelligent" L4-L7 switch
- pretty widely used today

[Figure: requests from grad.umd.edu and ren.cis.udel.edu arriving at an L4-L7 switch, which spreads them across three copies of www.cnn.com]

Page 30: Example: caching proxy

- largely motivated by ISP business interests: a reduction in the ISP's bandwidth consumption from the Internet
- reduced network traffic
- reduced user-perceived latency

[Figure: clients ren.cis.udel.edu and merlot.cis.udel.edu inside an ISP; interceptors redirect TCP port 80 traffic to a proxy, while other traffic goes straight to the Internet and www.cnn.com]

Page 31: But on Sept. 11, 2001...

[Figure: 1,000,000s of hosts, including mslab.kaist.ac.kr, request the new content ("WTC News!") from www.cnn.com; the ISPs' caching proxies hold only old content, so the requests pour through to the server and congest the bottleneck links]

Page 32: Problems with the discussed approaches: server farms and caching proxies

- server farms do nothing about network congestion, nor do they improve latency issues due to the network
- caching proxies serve only their own clients, not all users on the Internet
- content providers (say, Web servers) cannot rely on the existence and correct implementation of caching proxies
- accounting issues with caching proxies: for instance, www.cnn.com needs to know the number of hits on a webpage for the advertisements displayed on it

Page 33: Again on Sept. 11, 2001, with a CDN

[Figure: 1,000,000s of users request the new content ("WTC News!") and get it from nearby surrogates (WA, CA, MI, IL, FL, DE, NY, MA); only the distribution infrastructure fetches the new content from www.cnn.com]

Page 34: Web replication: CDNs

- an overlay network to distribute content from origin servers to users
- avoids large amounts of the same data repeatedly traversing potentially congested links on the Internet
- reduces Web server load
- reduces user-perceived latency
- tries to route around congested networks

Page 35: CDN vs. caching proxies

- caches are used by ISPs to reduce bandwidth consumption; CDNs are used by content providers to improve quality of service to end users
- caches are reactive; CDNs are proactive
- caching proxies cater to their users (web clients) and not to content providers (web servers); CDNs cater to both the content providers and the clients
- CDNs give the content providers control over the content; caching proxies do not

Page 36: CDN architecture

[Figure: clients reach surrogates via the request-routing infrastructure; the distribution & accounting infrastructure connects the CDN to the origin server]

Page 37: CDN components

- content delivery infrastructure: delivering content to clients from surrogates
- request-routing infrastructure: steering or directing a content request from a client to a suitable surrogate
- distribution infrastructure: moving or replicating content from the content source (origin server, content provider) to surrogates
- accounting infrastructure: logging and reporting of distribution and delivery activities

Page 38: Server interaction with the CDN

1. The origin server pushes new content to the CDN, or the CDN pulls content from the origin server (distribution infrastructure).
2. The origin server requests logs and other accounting info from the CDN, or the CDN provides logs and other accounting info to the origin server (accounting infrastructure).

[Figure: origin server www.cnn.com exchanging content and accounting information with the CDN]

Page 39: Client interaction with the CDN

1. Client: "Hi! I need www.cnn.com/sept11"
2. Request-routing infrastructure: "Go to surrogate newyork.cnn.akamai.com"
3. Client, to the surrogate: "Hi! I need content /sept11"

Q: How did the CDN choose the New York surrogate (newyork.cnn.akamai.com) over the California surrogate (california.cnn.akamai.com)?

Page 40: Request-routing techniques

Request-routing techniques use a set of metrics to direct users to the "best" surrogate. The schemes are proprietary, but the underlying techniques are known:
- DNS-based request routing
- content modification (URL rewriting)
- anycast-based (how common is anycast?)
- URL-based request routing
- transport-layer request routing
- a combination of multiple mechanisms

Page 41: DNS-based request-routing

- common due to the ubiquity of DNS as a directory service
- a specialized DNS server is inserted into the DNS resolution process
- the DNS server is capable of returning a different set of A, NS or CNAME records based on policies/metrics
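A hypothetical sketch of such a policy-driven authoritative server: it returns a different A record depending on which resolver asks. The names and addresses reuse the lecture's example on the next pages; the policy table itself is invented for illustration.

```python
# Surrogates and policy are made-up illustrations, not a real Akamai API.
SURROGATES = {
    "newyork.cnn.akamai.com": "145.155.10.15",
    "california.cnn.akamai.com": "58.15.100.152",
}
POLICY = {"128.4.": "newyork.cnn.akamai.com"}  # resolver prefix -> surrogate

def resolve_a(qname, resolver_ip, default="california.cnn.akamai.com"):
    """Return the A record for qname, chosen per requesting resolver."""
    for prefix, surrogate in POLICY.items():
        if resolver_ip.startswith(prefix):
            return SURROGATES[surrogate]
    return SURROGATES[default]

print(resolve_a("www.cnn.com", "128.4.4.12"))  # 145.155.10.15 (for dns.nyu.edu)
```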


Page 42: DNS-based request-routing

[Figure: client test.nyu.edu (128.4.30.15) asks its local DNS server dns.nyu.edu (128.4.4.12) for www.cnn.com; the query reaches the Akamai DNS, which responds with A 145.155.10.15, the New York surrogate (newyork.cnn.akamai.com); the session then runs to that surrogate rather than to the California surrogate (58.15.100.152, california.cnn.akamai.com)]

Q: How does the Akamai DNS know which surrogate is closest?

Page 43: DNS-based request-routing

[Figure: the same setup; the surrogates measure their network distance to the client's DNS server (dns.nyu.edu, 128.4.4.12) and report the measurement results to the Akamai DNS, which uses them when answering queries]

Page 44: DNS-based request routing: caching

[Figure: the client DNS (76.43.32.4) caches the Akamai DNS's answer "www.cnn.com A 145.155.10.15" with TTL = 10 s; the Akamai DNS chose surrogate 145.155.10.15 for this requesting DNS because it measured available bandwidth = 10 kbps and RTT = 10 ms to it, versus 5 kbps and 100 ms for the other surrogate (58.15.100.152); the client itself is 76.43.35.53]

Page 45: DNS-based request routing: discussion

- originator problem: the client may be far removed from the client's DNS server
- client DNS masking problem: virtually all DNS servers, except for root DNS servers, honor requests for recursion
  - Q: Which DNS server resolves a request for test.nyu.edu?
  - Q: Which DNS server performs the last recursion of the DNS request?
- hidden load factor: a single DNS resolution may result in drastically different load on the selected surrogate; this is an issue when load-balancing requests and when predicting load on surrogates

Page 46: Server selection metrics

- network proximity (surrogate to client): network hops (traceroute), Internet mapping services (NetGeo, IDMaps), ...
- surrogate load: number of active TCP connections, HTTP request arrival rate, other OS metrics, ...
- bandwidth availability
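These metrics have to be combined somehow into one selection decision. The sketch below is illustrative only: the weights and the "lower score is better" convention are my own assumptions, since real CDNs use proprietary policies.

```python
# Combine proximity and load metrics into a single comparable score.
def surrogate_score(hops, active_conns, req_rate, w=(1.0, 0.1, 0.01)):
    return w[0] * hops + w[1] * active_conns + w[2] * req_rate

candidates = {
    "newyork":    surrogate_score(hops=3, active_conns=120, req_rate=50),
    "california": surrogate_score(hops=9, active_conns=20,  req_rate=10),
}
print(min(candidates, key=candidates.get))  # the less loaded surrogate wins here
```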

Page 47: P4P: Provider Portal for (P2P) Applications

Laboratory of Networked Systems, Yale University

Page 48: P2P: benefits and challenges

P2P is a key to content delivery:
- low costs to content owners/distributors
- scalability

Challenge: network-obliviousness usually leads to network inefficiency
- intradomain: in the Verizon network, P2P traffic traverses 1000 miles and 5.5 metro-hops on average
- interdomain: 50%-90% of the pieces already present locally at active users are nonetheless downloaded externally*

*Karagiannis et al. Should Internet service providers fear peer-assisted content distribution? In Proceedings of IMC 2005.

Page 49: ISP attempts to address P2P issues

- upgrade infrastructure
- customer pricing
- rate limiting, or termination of services
- P2P caching

ISPs cannot effectively address network efficiency alone.

Page 50: Locality-aware P2P: P2P's attempt to improve network efficiency

- P2P has flexibility in shaping communication patterns
- locality-aware P2P tries to use this flexibility to improve network efficiency; e.g., Karagiannis et al. 2005, Bindal et al. 2006, Choffnes et al. 2008 (Ono)

Page 51: Problems of locality-aware P2P

- locality-aware P2P needs to reverse-engineer network topology, traffic load and network policy
- locality-aware P2P may not achieve network efficiency: it can choose congested links, or traverse costly interdomain links

[Figure: ISPs 0, 1, 2, ..., K, with peer selections that pick congested links or costly interdomain links]

Page 52: A fundamental problem

Feedback from networks is limited, e.g., to end-to-end flow measurements or limited ICMP feedback.

Page 53: Our goal

Design a framework to enable better cooperation between networks and P2P: P4P, the Provider Portal for (P2P) Applications.

Page 54: P4P architecture

- providers publish information via an iTracker
- applications query the providers' information and adjust their traffic patterns accordingly

[Figure: a P2P application spanning ISP A and ISP B, each running its own iTracker]

Page 55: Example: tracker-based P2P information flow

1. a peer queries the appTracker
2/3. the appTracker queries the iTracker of ISP A
4. the appTracker selects a set of active peers and returns them to the peer

[Figure: peer, appTracker, and iTracker in ISP A, with messages 1-4]