1 Why is P2P the Most Effective Way to Deliver Internet Media Content Xiaodong Zhang Ohio State...


Why is P2P the Most Effective Way to Deliver Internet Media Content

Xiaodong Zhang

Ohio State University

In collaborations with

Lei Guo, Yahoo!

Songqing Chen, George Mason

Enhua Tan, Ohio State

Zhen Xiao, IBM T. J. Watson Research


Media contents on the Internet

• Video traffic is doubling every 3 to 4 months

Site ranking: 1. Yahoo, 2. Google, 3. YouTube (No. 3)

• Video applications are mainstream


Different media delivery approaches

Delivery techniques: P2P exchange, P2P swarming, Content Delivery Network, overlay multicast

Downloading: complete fetch (HTTP)

Streaming: user controlled access (RTSP)

Pseudo streaming: progressive download & play (HTTP)


The Power of measurements and modeling

• Media delivery on the Internet

– Internet is an open, complex system

– Media traffic is user-behavior driven

• Challenges

– Lack of QoS support

– Lack of Internet management and control for media flow

– Thousands of concurrent streams from diverse clients

• Measurements and modeling are critical for

– Evaluating system performance under the Internet environment

– Understanding user access patterns in media systems

– Providing guidance to media system design and management


Zipf distribution is believed to be the general model of Internet traffic patterns

• Zipf distribution (power law)

– Characterizes the property of scale invariance

– Heavy tailed, scale free

• 80-20 rule

– Income distribution: 80% of social wealth owned by 20% of people (Pareto law)

– Web traffic: 80% of Web requests access 20% of pages (Breslau, INFOCOM’99)

• System implications

– Objectively caching the working set in a proxy

– Significantly reduces network traffic

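Rank distribution: $y_i \propto i^{-\alpha}$

i : rank of objects

$y_i$ : number of references

$\alpha$ : 0.6~0.8

[Figure: log y versus log i is a straight line with slope $-\alpha$; y versus i in linear scale shows a heavy tail]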
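The Zipf rank distribution on this slide can be sketched numerically. The following is a minimal illustration, not the authors' code: the function names, the object count (1000), and the scaling constant `top_count` are assumptions made for the example, and only the exponent range 0.6~0.8 comes from the slide. It generates reference counts proportional to i^(-alpha), checks that the log-log slope is -alpha, and reports the share of references that go to the most popular 20% of objects.

```python
import math


def zipf_counts(n_objects, alpha, top_count=1000.0):
    """Reference counts y_i = top_count * i**(-alpha) for ranks i = 1..n_objects.

    top_count is a hypothetical scale (references to the rank-1 object).
    """
    return [top_count * i ** (-alpha) for i in range(1, n_objects + 1)]


def loglog_slope(counts):
    """Least-squares slope of log y against log i, where i is the rank (from 1)."""
    xs = [math.log(i) for i in range(1, len(counts) + 1)]
    ys = [math.log(y) for y in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def top_share(counts, fraction):
    """Share of all references that go to the most popular `fraction` of objects."""
    k = max(1, int(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)


if __name__ == "__main__":
    counts = zipf_counts(n_objects=1000, alpha=0.8)
    print("log-log slope:", loglog_slope(counts))
    print("share of references to the top 20% of objects:", top_share(counts, 0.2))
```

The exact share depends on alpha and on the number of objects, so the sketch reports whatever the chosen parameters give rather than assuming the 80-20 split.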


Does Internet media traffic follow Zipf’s law?

Web media systems

– Chesire, USITS’01: Zipf-like

– Cherkasova, NOSSDAV’02: non-Zipf

VoD media systems

– Acharya, MMCN’00: non-Zipf

– Yu, EUROSYS’06: Zipf-like

Live streaming and IPTV systems

– Veloso, IMW’02: Zipf-like

– Sripanidkulchai, IMC’04: non-Zipf

P2P media systems

– Gummadi, SOSP’03: non-Zipf

– Iamnitchi, INFOCOM’04: Zipf-like


Inconsistent media access pattern models

• Still based on the Zipf model (heuristic assumptions)

– Zipf with exponential cutoff

– Zipf-Mandelbrot distribution

– Generalized Zipf-like distribution

– Two-mode Zipf distribution

– Fetch-at-most-once effect

– Parabolic fractal distribution

– …

• All case studies

– Based on one or two workloads

– Differ from, or even conflict with, each other

• An insightful understanding is essential to

– Content delivery system design

– Internet resource provisioning

– Performance optimization


Challenges of addressing the issues

• Existing studies cannot identify a general media access pattern

– Limited number of workloads

– Constrained scope of media traffic

– Biased measurements and noise in the data sets

• Model should be accurate, simple, and meaningful

– Characterize the unique properties

– Have clear physical meanings

– Observable and verifiable predictions

– Impacts on system designs

• Model validation methodology

– Goodness-of-fit test

– Reexamination of previous observations

– Reappraisal of other models


Research Objectives

• Discover a general distribution model of media access patterns

– Comprehensive measurements and experiments

– Rigorous mathematical analysis and modeling

– Insights into media system designs

• Performance analysis of BitTorrent systems for media delivery

– Identifying the weakness of BitTorrent

– Modeling potential of collaboration among different torrents

– System facility and incentive mechanism for multi-torrent collaboration

• Designs and implementations of streaming media systems

– Reliable and scalable peer-to-peer media systems

– Power efficient wireless media systems

– High performance Internet streaming through WLANs


Outline

• Motivation and objectives

• Stretched exponential model of Internet media traffic

• Dynamics of access patterns in media systems

• Caching implications

• Concluding remarks


Workload summary

• 16 workloads in different media systems (nearly all workloads available on the Internet)

– Web, VoD, P2P, and live streaming

– Both client side and server side

• Different delivery techniques (all major delivery techniques)

– Downloading, streaming, pseudo streaming

– Overlay multicast, P2P exchange, P2P swarming

• Data set characteristics (data sets of different scales)

– Workload duration: 5 days - two years

– Number of users: $10^3$ - $10^5$

– Number of requests: $10^4$ - $10^8$

– Number of objects: $10^2$ - $10^6$


Stretched exponential distribution

• Media reference rank follows stretched exponential distribution (passed Chi-square test)

Probability distribution (Weibull): $P(X \le x) = 1 - \exp\left[-\left(x / x_0\right)^c\right]$

Rank distribution: $y_i^c = -a \log i + b \quad (1 \le i \le N), \quad a = x_0^c$

$b = 1 + a \log N$ (assuming $y_N = 1$)

$P(y \ge y_i) = i / N$

i : rank of media objects (N objects)

y : number of references

c : stretch factor

• fat head and thin tail in log-log scale

• straight line in log x - $y^c$ scale (intercept b, slope $-a$)
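The claim that a stretched exponential rank distribution is a straight line in log x - y^c scale, but not in log-log scale, can be checked with a short sketch. This is an illustration under stated assumptions, not the authors' fitting procedure: the function names are hypothetical, `log` is taken to be the natural logarithm, the intercept is set to b = 1 + a log N (so the least popular object has one reference), and the data are synthetic rather than any of the measured workloads.

```python
import math


def se_counts(n_objects, a, c):
    """Synthetic SE rank data: y_i = (b - a*ln i)**(1/c), with b = 1 + a*ln N."""
    b = 1.0 + a * math.log(n_objects)
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n_objects + 1)]


def linear_fit(xs, ys):
    """Least-squares line ys ~ slope*xs + intercept; returns (slope, intercept, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def fit_scales(counts, c):
    """Fit the same rank data in SE scale (y**c vs ln i) and in log-log scale."""
    log_rank = [math.log(i) for i in range(1, len(counts) + 1)]
    slope, intercept, r2_se = linear_fit(log_rank, [y ** c for y in counts])
    _, _, r2_loglog = linear_fit(log_rank, [math.log(y) for y in counts])
    return {"a": -slope, "b": intercept, "r2_se": r2_se, "r2_loglog": r2_loglog}


if __name__ == "__main__":
    data = se_counts(n_objects=1000, a=0.25, c=0.2)
    print(fit_scales(data, c=0.2))
```

With real traces the stretch factor c is not known in advance; one option, not shown here, is to scan candidate values of c and keep the one with the highest R^2 in SE scale.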


Evidence: Web media systems (server logs)

• HPC-98: enterprise streaming media server logs of HP corporation (29 months)

• HPLabs-99: logs of video streaming server for employees in HP Labs (21 months)

• ST-SVR-01: an enterprise streaming media server log workload like HPC-98 (4 months)

[Figure panels: *HPC-98 (14 MB), *HPLabs-99 (120 MB), ST-SVR-01 (15 MB). Each panel title gives the workload name (median file size). x: rank of media object, log scale; y: number of references to the object, shown in powered scale $y^c$ (stretched exponential scale) and in log scale (log-log scale). The data show a fat head and a thin tail in log-log scale. Annotation: c = 0.22, $R^2$ ~ 1.]

$R^2$: coefficient of determination (1 means a perfect fit)


Evidence: Web media systems (req packets)

All collected from a large cable network hosted by a well-known ISP

• PS-CLT-04: first IP packets of HTTP requests for media objects (downloading and pseudo streaming), 9 days

• ST-CLT-04: RTSP/MMS streaming requests (on-demand media), 9 days

• ST-CLT-05: RTSP/MMS streaming requests (on-demand media), 11 days

[Figure panels: PS-CLT-04 (1.5 MB), ST-CLT-04 (2 MB), ST-CLT-05 (4.5 MB). x: log scale; y: powered scale $y^c$ and log scale. The data show a fat head and a thin tail in log-log scale.]


Evidence: VoD media systems

• mMoD-98: logs of a multicast Media-on-Demand video server, 194 days

• CTVoD-04: streaming server logs of a large VoD system by China Telecom, 219 days, reported as Zipf in EUROSYS’06

• IFILM-06: number of web page clicks to video clips in IFILM site, 16 weeks (one week for the figure)

• YouTube-06: cumulative number of requests to YouTube video clips, by crawling on web pages publishing the data

[Figure panels: *mMoD-98 (125 MB), *CTVoD-04 (300 MB), IFILM-06 (2.25 MB), YouTube-06 (3.4 MB). x: log scale; y: powered scale $y^c$ and log scale. The data show a fat head and a thin tail in log-log scale.]


Evidence: P2P media systems

• KaZaa-02: large video files (> 100 MB; files smaller than 100 MB are intentionally removed) transferred in the KaZaa network, collected in a campus network, 203 days

• KaZaa-03: music files, movie clips, and movie files downloaded in the KaZaa network, 5 days, reported as Zipf in INFOCOM’04

• BT-03: 48 days of BitTorrent file downloading (large video and DVD images) recorded by two tracker sites

[Figure panels: *KaZaa-02 (300 MB), *KaZaa-03 (5 MB), BT-03 (636 MB)]


Evidence: Live streaming and other systems

• Akamai-03: server logs of live streaming media collected from Akamai CDN, 3 months, reported as two-mode Zipf in IMC’04

• Movie-02: US movie box office ticket sales of year 2002

• IMDB-06: cumulative number of votes for top 250 movies in Internet Movie Database web site

[Figure panels: Akamai-03, Movie-02, IMDB-06]


Why was Zipf observed before?

• Media traffic is driven by user requests

• Intermediate systems may affect traffic pattern

– Effect of extraneous traffic

– Filtering effect due to caching

• Biased measurements may cause Zipf observation

[Figure labels: cache proxy, ad server, media server]


Extraneous media traffic

[Figure labels: meta file link, web server, streaming media server, ads server, ads clip, flag clip, video program]

Ad and flag videos are pushed to clients mandatorily

Playback sequence: ads clip, flag clip, video prog 1, flag clip, video prog 2


Effects of extraneous traffic on reference rank distributions

• Extraneous traffic does not represent user access patterns

– High request rate (high popularity)

– High total number of requests

• Not necessarily Zipf with extraneous traffic

– Extraneous traffic changes

– Always SE without extraneous traffic

• Small object sizes, small traffic volume

[Figure: reference rates of prog, ads, and flag objects in 2004 and 2005 (2004: 2 objects; 2005: merged into 1 object). Panel labels with extraneous traffic: Non-Zipf, Zipf. Panel labels without extraneous traffic: SE (2004), SE (2005).]


Caching effect

• Web workload: caching can cause a “flattened head” in log-log scale

• Stretched exponential is not caused by caching effect

• Local replay events can be traced by WM/RM streaming media protocols

– Before replay: cache validation

– After replay: send feedback

– Recorded in server logs

– Captured in our network measurement

[Figure: log y versus log i for Zipf filtered by a Web cache, and for stretched exponential]

[Figure labels: DESCRIBE, SET_PARAMETER, Logplaystats, server logs, packet sniffer]


Why media access pattern is not Zipf

• “Rich-get-richer” phenomenon

– Pareto, power law, …

– The structure of WWW

• Web accesses are Zipf

– Popular pages can attract more users

– Pages are updated to stay popular

– Yahoo has ranked No. 1 for more than six years

– Zipf-like for long duration

• Media accesses are different

– Popularity decreases exponentially with time

– Media objects are immutable

– Rich-get-richer not present

– Non-Zipf in long duration

[Figure: number of distinct weekly top N popular objects in 16 weeks (x: popularity rank, y: number of distinct objects; series: Web, Video). The top 1 Web object never changes (1 distinct object); the top 1 video object changes every week (16 distinct objects).]

[Figure: BitTorrent media file (x: time after object birth in days, y: CCDF of requests in log scale; series: raw data, linear fit)]


Dynamics of Access Patterns in Media Systems

• Media reference rank distribution in log-log scale

– Different systems have different access patterns

– The distribution changes over time in a system (NOSSDAV’02)

• All follow stretched exponential distribution

– Stretch factor c

– Minus of slope a

• Physical meanings

– Media file sizes

– Aging effects of media objects

– Deviation from the Zipf model

[Figure: $y^c$ versus log i is a straight line with intercept b and slope $-a$; c: stretch factor]


Stretch factors of different systems

[Figure: four bar charts of stretch factor c (y axis, 0.00 to 0.60) against median file size]

• Different systems, similar file sizes: streaming (300 MB) vs. P2P (300 MB)

• Different systems, similar file sizes: streaming (2.25 MB) vs. P2P (5 MB)

• KaZaa systems, different file sizes: 5 MB, 300 MB

• Streaming systems, different file sizes: 2.25 MB, 4.5 MB, 120 MB, 300 MB


Stretch factor and media file sizes

• Other factors besides file size

– Different encoding rates and compression ratios

– Video and audio are different

– Different content type: entertainment, educational, business

[Figure: file size vs. stretch factor c (x: median file size in MB, log scale from 1 to 1000; y: stretch factor c, 0.00 to 0.60); labeled points: EDU, BIZ]

• c increases with file size

– 0 – 5 MB: c <= 0.2

– 5 – 100 MB: 0.2 ~ 0.3

– > 100 MB: c >= 0.3


Stretched exponential parameters

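• In a media system

– Constant request rate

– Constant object birth rate

– Constant median file size

• Stretch factor c is a time invariant constant

• Parameter a increases with time

[Equations not recoverable from the extraction; their surviving symbols are the request rate (subscript req), the object birth rate (subscript obj), c, a, y, N, and t]

[Figure: $y^c$ versus log i is a straight line with intercept b and slope $-a$]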


Modeling caching performance

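Parameter selection

• Zipf: typical Web workload ($\alpha$ = 0.8)

• SE: typical streaming workload (c = 0.2, a = 0.25, same as ST-CLT-05)

Asymptotic analysis for small cache size k (k << N)

• Zipf: $H_{zf}(k) = \sum_{i=1}^{k} i^{-\alpha} \Big/ \sum_{i=1}^{N} i^{-\alpha} \approx \left(k / N\right)^{1-\alpha}$

• SE: $H_{se}(k) = \sum_{i=1}^{k} y_i \Big/ \sum_{i=1}^{N} y_i$, with $y_i = (b - a \log i)^{1/c}$

• $\lim_{N \to \infty} H_{se}(k) / H_{zf}(k) = 0$

[Figure: hit ratio curves for Web (Zipf) and media (SE) workloads]

Media caching is far less efficient than Web caching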
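The comparison between Web caching and media caching can be illustrated numerically. The sketch below is a hedged example, not the authors' model: it assumes an idealized cache that holds exactly the k most popular objects, takes the hit ratio as the fraction of all references that go to those k objects, uses the parameters quoted on the slide (Zipf exponent 0.8; SE with c = 0.2 and a = 0.25), assumes natural logarithms and b = 1 + a log N, and picks N = 10000 objects arbitrarily. All function names are hypothetical.

```python
import math


def zipf_counts(n_objects, alpha):
    """Zipf reference counts, y_i proportional to i**(-alpha)."""
    return [i ** (-alpha) for i in range(1, n_objects + 1)]


def se_counts(n_objects, a, c):
    """Stretched exponential counts, y_i = (b - a*ln i)**(1/c), b = 1 + a*ln N."""
    b = 1.0 + a * math.log(n_objects)
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n_objects + 1)]


def hit_ratio(counts, cache_size):
    """Fraction of references served by a cache holding the top `cache_size` objects.

    `counts` must be ordered by popularity rank (most popular first).
    """
    return sum(counts[:cache_size]) / sum(counts)


if __name__ == "__main__":
    n = 10000  # hypothetical number of objects
    web = zipf_counts(n, alpha=0.8)
    media = se_counts(n, a=0.25, c=0.2)
    for k in (10, 100, 1000):
        print(k, round(hit_ratio(web, k), 3), round(hit_ratio(media, k), 3))
```

For cache sizes that are small relative to N, the SE workload concentrates fewer references in its top k objects than the Zipf workload does, which is the effect the asymptotic analysis describes.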


Long time to reach optimal

• Media objects have long lifespan

– Most requested objects were created a long time ago

– Most requests are for objects created a long time ago

• To achieve maximal concentration

– Very long time (months to years)

– Huge amount of storage

– Only peer-to-peer systems provide such a huge space over such a long time

[Figure annotations: 200 days, 150 days; 50%, 50%]


Summary

• Media access patterns do not fit Zipf model

• We give reasons why previous results were confusing

• Media access patterns are stretched exponential

• Our findings imply that

– Client-server based proxy systems are not effective for delivering media content

– P2P systems are most suitable for this purpose

• We provide an analytical basis for the effectiveness of a P2P media content delivery infrastructure  


Stretched Exponential Distribution: Decentralized Content Delivery in Internet

• Centralized Internet accesses follow Zipf

• Decentralized Internet accesses (in an organized way, such as P2P) follow SE

• Other P2P-like accesses follow SE

– Social networks

– Instant messages, QQ, VoIP, …

– Dictionary-type searches: Wikipedia, Yahoo Answers

• SE distributions also exist in business and sciences


References

The stretched exponential distribution, PODC’08

PSM-throttling, streaming in WLAN with low power, ICNP’07

SCAP, wireless AP caching for streaming, ICDCS’07

Quality and resource utilization of Internet streaming, IMC’06

Internet streaming workload analysis, WWW’05

Measuring and modeling BitTorrent, IMC’05

Sproxy, caching for streaming, INFOCOM’04