1 YouTube Traffic Characterization: A View From the Edge Phillipa Gill¹, Martin Arlitt²¹,...

Post on 22-Dec-2015


1

YouTube Traffic Characterization:

A View From the Edge

Phillipa Gill¹, Martin Arlitt²¹, Zongpeng Li¹, Anirban Mahanti³

¹Dept. of Computer Science, University of Calgary, Canada

²Enterprise Systems & Software Lab, HP Labs, USA

³Dept. of Computer Science and Engineering, IIT Delhi, India

2

Introduction

The way people use the Web is changing.

Creation and sharing of media: Fast, easy, cheap!

Volume of data associated with extremely popular online media.

3

What is Web 2.0?

User-generated content
Text: Wordpress, Blogspot
Photos: Flickr, Facebook
Video: YouTube, MySpace

Social networking: Facebook, MySpace

Tagging: Flickr, YouTube

4

YouTube: Facts and Figures

Founded in February 2005

Enabled users to easily share movies by converting them to Flash

Largest video sharing Website on the Internet [Alexa2007]

Sold to Google for $1.65 billion in November 2006

5

How YouTube Works (1/2)

GET: /watch?v=wQVEPFzkhaM

OK (text/html)

GET: /vi/fNaYQ4kM4FE/2.jpg

OK (image/jpeg)

6

How YouTube Works (2/2)

GET: swfobject.js

OK (application/x-javascript)

GET: /p.swf

OK (application/x-shockwave-flash)

GET: /get_video?video_id=wQVEPFzkhaM

OK (video/flv)
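The request sequence on these two slides can be turned into a small classifier, which is one way a log-processing script might label each transaction. This is a minimal sketch: the function names `classify_request` and `video_id` and the label strings are hypothetical, and the rules are inferred only from the example URLs shown on the slides, not from the authors' tooling.

```python
from urllib.parse import urlparse, parse_qs


def classify_request(url_path):
    """Label a request path with the role it plays in a YouTube page view.

    The rules mirror the example requests on the slides; real traffic
    would need a broader rule set (assumption).
    """
    path = urlparse(url_path).path
    if path == "/watch":
        return "watch page"
    if path == "/get_video":
        return "video"
    if path.startswith("/vi/") and path.endswith(".jpg"):
        return "thumbnail"
    if path.endswith(".swf"):
        return "player"
    if path.endswith(".js"):
        return "script"
    return "other"


def video_id(url_path):
    """Return the video identifier from a watch or get_video request, if any."""
    query = parse_qs(urlparse(url_path).query)
    for key in ("v", "video_id"):
        if key in query:
            return query[key][0]
    return None


if __name__ == "__main__":
    for request in ("/watch?v=wQVEPFzkhaM", "/vi/fNaYQ4kM4FE/2.jpg",
                    "swfobject.js", "/p.swf",
                    "/get_video?video_id=wQVEPFzkhaM"):
        print(request, "->", classify_request(request), video_id(request))
```

Keying on the path rather than on the response content type keeps the labels stable even when a server reports an unexpected MIME type.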

7

Our Contributions

Efficient measurement framework

One of the first extensive characterizations of Web 2.0 traffic
File properties
File access patterns
Transfer properties

Implications for network and content providers

8

Outline

Introduction & Background

Contributions

Methodology

Results

Implications

Conclusions

9

Our View Points

Edge (University Campus)
28,000 students
5,300 faculty & staff
/16 address space
300 Mb/s full-duplex network link

Global
Most popular videos

10

Campus Data Collection

Goals:
Collect data on all campus YouTube usage
Gather data for an extended period of time
Protect user privacy

Challenges:
YouTube’s popularity
Monitor limitations
Volume of campus Internet usage

11

Our Methodology

Identify servers providing YouTube content

Use bro to summarize each HTTP transaction in real time

Restart bro daily and compress the daily log

Map visitor identifier to a unique ID
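The last step, mapping a visitor identifier to a unique ID, is what protects user privacy in the logs. The slides do not say how the mapping was done, so the following is only a sketch of one common approach, a keyed hash; the function name `anonymize` and the `secret_key` parameter are assumptions introduced for illustration.

```python
import hashlib
import hmac


def anonymize(visitor_id, secret_key):
    """Map a visitor identifier to a stable pseudonymous ID.

    A keyed hash (HMAC-SHA256) gives the same ID for the same visitor
    while making it impractical to recover the identifier without the key.
    This is an illustrative choice, not the method used in the study.
    """
    digest = hmac.new(secret_key, visitor_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


if __name__ == "__main__":
    key = b"example-key-kept-off-the-monitor-logs"  # hypothetical key
    print(anonymize("visitor-A", key))
    print(anonymize("visitor-A", key))  # same visitor, same ID
    print(anonymize("visitor-B", key))  # different visitor, different ID
```

Because the mapping depends only on the key, it stays consistent across daily restarts of the monitor as long as the same key is reused.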

12

Categories of Transactions

Complete – the entire transaction was parsed successfully

Interrupted – TCP connection was reset

Gap – monitor missed a packet

Failure – transaction could not be parsed

13

Categories of Transactions (2)

Status        % of Total   % of Video
Complete      90.82        24.66
Interrupted    1.88        24.25
Gap            1.56        51.09
Failure        5.75        -

14

Our Traces

Start Date: Jan. 14, 2007

End Date: Apr. 8, 2007

Total Valid Transactions: 23,250,438

Total Bytes: 6.54 TB

Total Video Requests: 625,593

Total Video Bytes: 6.45 TB

Unique Video Requests: 323,677

Unique Video Bytes: 3.26 TB
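A few ratios derived from the trace totals make the numbers easier to interpret. The sketch below only divides the figures reported on this slide; the variable names are illustrative, and reading the repeat fraction as "requests for a video already seen in the trace" is an interpretation, not a claim from the slides.

```python
# Totals copied from the trace summary slide.
total_transactions = 23_250_438
total_bytes_tb = 6.54
total_video_requests = 625_593
total_video_bytes_tb = 6.45
unique_video_requests = 323_677
unique_video_bytes_tb = 3.26

# Share of all transactions and bytes that are video.
video_request_share = total_video_requests / total_transactions
video_byte_share = total_video_bytes_tb / total_bytes_tb

# Share of video requests and bytes that repeat an already-requested video.
repeat_request_fraction = 1 - unique_video_requests / total_video_requests
repeat_byte_fraction = 1 - unique_video_bytes_tb / total_video_bytes_tb

if __name__ == "__main__":
    print(f"video share of transactions: {video_request_share:.1%}")
    print(f"video share of bytes:        {video_byte_share:.1%}")
    print(f"repeated video requests:     {repeat_request_fraction:.1%}")
    print(f"repeated video bytes:        {repeat_byte_fraction:.1%}")
```

Video is a small share of transactions but nearly all of the bytes, and roughly half of the video requests and bytes are repeats.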

15

HTTP Response Codes

Code                    % of Responses   % of Bytes
200 (OK)                75.80            89.78
206 (Partial Content)    1.29            10.22
302 (Found)              0.05             0.00
303 (See Other)          5.33             0.00
304 (Not Modified)      17.34             0.00
4xx (Client Error)       0.19             0.00
5xx (Server Error)       0.01             0.00

16

Global Data Collection

Crawling all videos is infeasible

Focus on top 100 most popular videos

Four time frames: daily, weekly, monthly and all time

2-step data collection:
Retrieve pages of most popular videos
Use YouTube API to get details on these videos

17

Outline

Introduction & Background

Contributions

Methodology

Results

Implications

Conclusions

18

Results

Campus Usage Patterns

File Properties

File Access Patterns

Transfer Properties

19

Campus Usage Patterns

[Figure: campus YouTube usage over the trace period; annotation: Reading Break]

20

Results

Campus Usage Patterns

File Properties

File Access Patterns

Transfer Properties

21

Unique File Sizes

Video data is significantly larger than the other content types

22

Time Since Modification

Videos and images rarely modified

Text and application data modified more frequently

23

Video Durations

Spike around 3 minutes, likely music videos

Campus videos are relatively short: μ = 3.3 min

24

Summary of File Properties

Video content is much larger than other content types

Image and video content is more static than application and text content

Video durations are relatively short

Videos viewed on campus tend to be more than 1 month old

25

Results

Campus Usage Patterns

File Properties

File Access Patterns

Transfer Properties

26

Relative Popularity of Videos

Video popularity follows a weak Zipf distribution

Possibly due to edge network point of view

β = 0.56
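A Zipf exponent such as β can be estimated from per-video request counts by fitting a straight line to log(count) against log(rank). The slides do not state which fitting method was used, so the sketch below assumes ordinary least squares on the log-log data; `fit_zipf_exponent` is a hypothetical helper, and the input counts are synthetic, not taken from the trace.

```python
import math


def fit_zipf_exponent(counts):
    """Estimate beta in count ~ C * rank**(-beta) by log-log least squares.

    counts: per-video request counts in any order; zero counts are ignored.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is negative for a Zipf-like curve; beta is its magnitude.
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic counts generated with beta = 0.56 (illustration only).
    synthetic = [1000 * rank ** -0.56 for rank in range(1, 201)]
    print(round(fit_zipf_exponent(synthetic), 3))
```

A β well below 1 means requests are spread across many videos rather than concentrated on a few, which is what "weak" refers to here.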

27

Commonality of Videos

~10% commonality between consecutive days during the week

~5% commonality between consecutive days on the weekend
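The slides do not define commonality precisely. One plausible reading, assumed here, is the share of distinct videos requested on a given day that were also requested on the previous day. The sketch below computes that quantity; `commonality` is a hypothetical helper and the video identifiers are placeholders.

```python
def commonality(previous_day, current_day):
    """Fraction of the current day's distinct videos also seen the day before.

    Both arguments are iterables of video identifiers; repeated requests
    for the same video count once. This definition is an assumption.
    """
    previous = set(previous_day)
    current = set(current_day)
    if not current:
        return 0.0
    return len(previous & current) / len(current)


if __name__ == "__main__":
    monday = ["a", "b", "c", "a"]
    tuesday = ["c", "d"]
    print(commonality(monday, tuesday))
```

Other definitions, such as dividing by the union of both days, would give different percentages, so the figures above should be read with the study's own definition in mind.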

28

Summary of File Referencing

Zipf distribution is weak when observed from the edge of the network

There is some overlap between videos viewed on consecutive days

Significant amount of content viewed on campus is non-unique

29

Results

Campus Usage Patterns

File Properties

File Access Patterns

Transfer Properties

30

Transfer Sizes

[Figure annotations: Flash player (p.swf, player2.swf); JavaScript files]

31

Transfer Durations

Video transfers have significantly longer durations than other content types

32

Summary of Transfer Properties

JavaScript and Flash objects have an impact on the size of files transferred

Video transfers have significantly larger sizes and durations

33

Outline

Introduction & Background

Contributions

Methodology

Results

Implications

Conclusions

34

Implications for Network Providers

Web 2.0 poses challenges to caching
Larger multimedia files
More diversity in content

Meta data may be used to improve caching efficiency

35

Implications for Content Providers

Multimedia content is large!
65,000 videos/day x 10 MB/video = 19.5 TB/month

Long tail effect -> much of the content will be unpopular
Cheap storage solutions

Longer transfer durations for video files
More CPU cycles required for transfers
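The storage estimate on this slide can be reproduced with a short calculation. The sketch assumes a 30-day month and decimal units (1 TB = 1,000,000 MB), since those assumptions yield the 19.5 TB/month figure; `monthly_upload_volume_tb` is a hypothetical helper name.

```python
def monthly_upload_volume_tb(videos_per_day, mb_per_video, days_per_month=30):
    """Return the monthly volume of newly uploaded video in terabytes.

    Uses decimal units: 1 TB = 1,000,000 MB (assumption).
    """
    return videos_per_day * mb_per_video * days_per_month / 1_000_000


if __name__ == "__main__":
    # 65,000 videos/day at 10 MB each, as on the slide.
    print(monthly_upload_volume_tb(65_000, 10))
```

The same helper can be rerun with other upload rates or average file sizes to see how quickly storage demand scales.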

36

Conclusions

Multimedia content has much larger transfer sizes and durations than other content types

From the edge of the network, video popularity follows a weak Zipf distribution

Web 2.0 facilitates diversity in content which poses challenges to caching

New approaches are needed to efficiently handle the resource demands of Web 2.0 sites

37

Questions?

Contact: psessini@ucalgary.ca