Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload Krishna Gummadi, Richard...

Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload

Krishna Gummadi, Richard Dunn, Stefan SaroiuSteve Gribble, Hank Levy, John Zahorjan

Several slides were taken from the original presentation by Gummadi

The Internet has changed

• Explosive growth of P2P file-sharing systems

– now the dominant source of Internet traffic

– its workload consists of large multimedia (audio, video) files

• P2P file-sharing is very different than the Web

– in terms of both workload and infrastructure

– we understand the dynamics of the Web, but the dynamics

of P2P are largely unknown

Why measure?

Measure

Build model

Validate

Predict

The current paper

• Multimedia workloads– what files are being exchanged– goal: to identify the forces driving the workload and

understand the potential impacts of future changes in them

• P2P delivery infrastructure– how the files are being exchanged– goal: to understand the behavior of Kazaa peers, and

derive implications for P2P as a delivery infrastructure

Studies the KazaA peer-to-peer file-sharing system, to understand two separate phenomena

KazaA: Quick Overview

• Peers are individually owned computers– most connected by modems or broadband– no centralized components

• Two-level structure: some peers are “super-nodes”– super-nodes index content from peers underneath– files transferred in segments from multiple peers

simultaneously

• The protocol is proprietary

Methodology

• Capture a 6-month long trace of Kazaa traffic at

UW– trace gathered from May 28th – December 17th, 2002

• passively observe all objects flowing into UW campus

• classify based on port numbers and HTTP headers

• anonymize sensitive data before writing to disk

• Limitations:– only studied one population (UW)

– could see data transfers, but not encrypted control traffic

– cannot see internal Kazaa traffic

Trace Characteristics

Outline

• Introduction

• Some observations about Kazaa

• A model for studying multimedia workloads

• Locality-aware P2P request distribution

• Conclusions

Kazaa is really 2 workloads

<= 10 10-100 >100

object size (MB)

% of requests/bytes

requests

bytes transferred

• If you care about:– making users happy: make sure audio/video arrives quickly– making IT dept. happy: cache or rate limit video

Kazaa users are very patient

• audio file takes 1 hr to fetch over broadband, video takes 1 day– but in either case, Kazaa users were willing to wait for weeks!

download latency

% of requests (CDF)

5 mins 1 hour 1 week1 day 150 days

Kazaa objects are immutable

• The Web is driven by object change(many visit cnn.com every hour. Why?)

– users revisit popular sites, as their content changes– rate of change limits Web cache effectiveness [Wolman 99]

• In contrast, Kazaa objects never change– as a result, users rarely re-download the same object

• 94% of the time, a user fetches an object at-most-once• 99% of the time, a user fetches an object at-most-twice

– implications:• # requests to popular objects bounded by user population size

Kazaa popularity has high turnover

• Popularity is short lived: rankings constantly change– only 5% of the top-100 audio objects stayed in the top-100 over

our entire trace [video: 44%]

• Newly popular objects tend to be recently born– of audio objects that “broke into” the top-100, 79% were born a

month before becoming popular [video: 84%]

Zipf distribution

Zipf’s Law states that the popularity of an object

of rank k is 1/ kr of the popularity of the top-ranked

object (1 < r < 2).

rank1 2 3

y Log-log plot will be a straight linewith a negative slope

Kazaa does not obey Zipf’s law

10,000

100,000

1,000,000

10,000,000

1 10 100 1000 10000 100000 1000000 1E+07 1E+08

object rank

# of requests

WWW objects

• Kazaa: the most popular objects are 100x less popular than Zipf predicts

10,000

100,000

1,000,000

10,000,000

1 10 100 1000 10000 100000 1000000 1E+07 1E+08

object rank

# of requests

WWW objects

Factors driving P2P file-sharing workloads

• The traces suggest two factors drive P2P workloads:

1. Fetch-at-most-once behavior– resulting in a “flattened head” in popularity curve

2. The “dynamics” of objects and users over time– new objects are born, old objects lose popularity, and new

users join the system

• Let’s build a model to gain insight into these factors

It’s not just Kazaa

1 10 100 1000movie index

rental frequency

• Video rental and movie box office sales data show similar properties

– multimedia in general seems to be non-Zipf

video store rentals

box office sales

It’s not just Kazaa

100000

1000000

1 10 100 1000movie index

box office sales ($millions)

• Video rental and movie box office sales data show similar properties– multimedia in general seems

to be non-Zipf

video store rentals

box office sales

Outline

• Introduction

• Conclusions

Model basics

1. Objects are chosen from an underlying Zipf curve

2. But we enforce “fetch-at-most-once” behavior– when a user picks an object, it is removed from her

distribution

3. Fold in “user, object dynamics”– new objects inserted with initial popularity drawn from Zipf

• new popular objects displace the old popular objects

– new users begin with a fresh Zipf curve

Model parameters

C # of clients 1,000

O # of objects 40,000

λRclient req. rate 2 objs/day

r Zipf param driving obj. popularity

prob. client req. object of pop rank x

Zipf (1.0) +

fetch-at-most-once

prob. of new object inserted at pop rank x

Zipf (1.0)

M cache size (frac. of obj) varies

λO object arrival rate varies

λcclient arrival rate varies

Fetch-at-most-once flattens Zipf’s head

100000

1 10 100 1000 10000 100000

object rank

# of requests

fetch-at-most-once + Zipf

File sharing effectiveness

An organization is experiencing too much demand for external bandwidth for P2P applications. How will the demand change if a proxy cache is used? Let us examinethe hit ratio of the proxy cache.

Caching implications

0 100 200 300 400 500 600 700 800 900 1000

hit rate

cache size = 1000 objectsrequest rate per user = 2/day

• In the absence of new objects and users– fetch-many: cache hit rate is stable– fetch-at-most-once: hit rate degrades over time

Fetch repeatedlyLike Web objects

Popular objects areConsumed early. After this,

It is pretty much random

0 100 200 300 400 500 600 700 800 900 1000

hit rate

cache size = 1000 objectsrequest rate per user = 2/day

New objects help (not hurt)

• New objects do cause cold misses– but they replenish the supply of popular objects that are the

source of file sharing hits• A slow, constant arrival rate stabilizes performance

– rate needed is proportional to avg. per-user request rate

1 2 3 4 5 6 7 8 9

Object arrival rate / avg. per-user request rate

hit rate

cache size = 8000 objects

New users cannot help

• They have potential…– new users have a “fresh” Zipf curve to draw from

– therefore will have a high initial hit rate

• But the new users grow old too– ultimately, they increase the size of the “elderly” population

– to offset, must add users at exponentially increasing rate• not sustainable in the long run

Validating the model

• We parameterized our model using measured trace values

– its output closely matches the trace itself

1 10 100 1000 10000 100000

object rank

# requests

measured

our model

static Zipf

Outline

• Introduction

• Conclusions

Kazaa has significant untapped locality

• We simulated a proxy cache for UW P2P environment

– 86% of Kazaa bytes already exist within UW when they are downloaded externally by a UW peer

14.0% 11.5%

all objects Video objects Audio objects

% bytes transferred

Locality Aware Request Routing

• Idea: download content from local peers, if available– local peers as a distributed cache instead of a proxy cache

• Can be implemented in several ways– scheme 1: use a redirector instead of a cache

• redirector sits at organizational border, indexes content, reflects download requests to peers that can serve them

– scheme 2: decentralized request distribution• use location information in P2P protocols (e.g., a DHT)

• We simulated locality-awareness using our trace data– note that both schemes are identical w.r.t the simulation

Locality-aware routing performance

• “P2P-ness” introduces a new kind of miss: “unavailable” miss– even with pessimistic peer availability, locality-awareness saves

significant bandwidth– goal of P2P system: minimize the new miss types

• achieve upper bound imposed by workload (cold misses only)

all objects video objects audio objects

% bytes transferred

unavailable

hit67.9%

Eliminating unavailable misses

• Popularity drives a kind of “natural replication”– descriptive, but also predictive

• popular objects take care of themselves, unpopular can’t help• focus on “middle” popularity objects when designing systems

1 10 100 1000 10000 100000 1000000

object # (sorted by popularity)

unavailability miss bytes (CDF)

Video objects

Audio objects

Conclusions

• P2P file-sharing driven by different forces than the Web • Multimedia workloads:

– driven by two factors: fetch-at-most-once, object/user dynamics

– constructed a model that explains non-zipf behavior and validated it

• P2P infrastructure:– current file-sharing architectures miss opportunity

– locality-aware architectures can save significant bandwidth

– a challenge for P2P: eliminating unavailable misses

Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload Krishna Gummadi, Richard...

Documents

Transcript of Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload Krishna Gummadi, Richard...

OSN Research As If Sociology Mattered Krishna P. Gummadi Networked Systems Research Group MPI-SWS.

An Analysis of Internet Content Delivery Systems Stefan Saroiu, Krishna.

Measurement, Modeling, and Analysis of a Peer-to-Peer File ...hy554/papers/gummadi-sosp2003.pdf{gummadi,rdunn,tzoompy,gribble,levy,zahorjan}@cs.washington.edu ABSTRACT Peer-to-peer

"A Measurement Study of Peer-to-Peer File Sharing Systems" Stefan Saroiu, P. Krishna Gummadi Steven D. Gribble, "A Measurement Study of Peer-to-Peer File.

1 UFlood: High-Throughput Wireless Flooding Jayashree Subramanian Collaborators: Robert Morris, Ramakrishna Gummadi, and Hari Balakrishnan.

Metronome: Coordinating Spectrum Sharing in ...srini/papers/2009.Gummadi...Metronome: Coordinating Spectrum Sharing in Heterogeneous Wireless Networks Ramakrishna Gummadi* *MIT CSAIL

Eduardo Cuervo – Duke University Aruna Balasubramanian - University of Massachusetts Amherst Dae-ki Cho - UCLA Alec Wolman, Stefan Saroiu, Ranveer Chandra,

1 IT Services in Developing Nations Mark Tegtmeyer Stephanie Schmitt Aarti Dinesh Vijay Gummadi.

June 2020 Investor Presentation › mr5ircnw_encana › 881...($0.5) $0.0 $0.5 $1.0 Peer 28 Peer 27 Peer 26 Peer 25 Peer 24 Peer 23 Peer 22 Peer 21 Peer 20 Peer 19 Peer 18 Peer 17

Climate and crop modeling by Gummadi Sridhar,Gizachew Legesse,Pauline Chivenge, Martin Moyo,Lieven Claessens

ITrustPage: Pretty Good Phishing Protection Stefan Saroiu, Troy Ronda, and Alec Wolman University of Toronto and Microsoft Research.

Ben Greenstein, Jeff Pang, Ramki Gummadi, Srini Seshan, and David Wetherall

Data Center Services and Optimizationpbg.web.engr.illinois.edu/courses/cs538fa11/lectures/18-Chris-Sobir.pdfServices, by Sharad Agarwal, John Dunagan, Navendu Jain, Stefan Saroiu,

An Analysis of Internet Content Delivery Systems - …gummadi/papers/osdi.pdfAn Analysis of Internet Content Delivery Systems Stefan Saroiu, ... tems from the perspectives of clients,

Wireless Networks Should Spread Spectrum On Demand Ramki Gummadi (MIT) Joint work with Hari Balakrishnan.

Understanding and Mitigating the Impact of RF Interference ...srini/papers/2007.Gummadi...Ramakrishna Gummadi∗ David Wetherall†‡ Ben Greenstein† Srinivasan Seshan ∗USC †

1 Improving Retrieval Accuracy in Web Databases Using Attribute Dependencies Ravi Gummadi & Anupam Khulbe gummadi@asu.edu – akhulbe@asu.edu Computer Science.

Predictable Scheduling for a Soft Modem Stefan Saroiu – University of Washington tzoompy@cs.washington.edu Michael.

An Analysis of Internet Content Delivery Systemsgummadi/papers/osdi.pdfAn Analysis of Internet Content Delivery Systems Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D.

Simplifying Friendlist ManagementSimplifying Friendlist Management Yabing Liu† Bimal Viswanath‡ Mainack Mondal‡ Krishna Gummadi‡ Alan Mislove† †Northeastern University