Measurements of Peer-to-Peer Systems

Pradnya Karbhari

Nov 25th, 2003

CS 8803: Network Measurements Seminar

Introduction to Peer-to-Peer (P2P) systems

End systems (peers) can act as both clients and servers of data, which makes the system scalable and reliable

Peer participation is voluntary and membership is dynamic, so the topology keeps changing

Most popularly used for file sharing, so "peer-to-peer systems" has become synonymous with peer-to-peer file-sharing networks

Classification of P2P systems

- P2P computation (e.g. seti@home)
- P2P communication (instant messaging)
- P2P file-sharing networks
  - Centralized (e.g. Napster)
  - Decentralized
    - Structured (e.g. Chord, CAN, Pastry, Tapestry)
    - Unstructured (e.g. Gnutella, Kazaa, Freenet, eDonkey, eMule, Direct Connect, …)

Popularity of unstructured decentralized P2P networks

Gnutella host count, maintained by Limewire (http://www.limewire.com)

- good scope for measurement studies because these networks:
  - are deployed and widely used
  - use a lot of bandwidth during data transfer, hence a concern for network operators
- quite a few measurement studies have been done on these systems, some of which we will discuss in this seminar

Outline

- Characterization of users of P2P systems
  - Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002.
- Effect of P2P traffic on the underlying network
  - Sen et al., “Analyzing peer-to-peer traffic across large networks”, IMW’02
- Peer-to-Peer Topologies
  - Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002.
- Searching on the P2P network
  - Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001
- Deciphering proprietary P2P systems (like Kazaa)
  - Leibowitz et al., “Deconstructing the Kazaa Network”, WIAPP, 2003.

Gnutella protocol overview

- Connecting to the Gnutella network
  - bootstrap using the GWebCache system and a locally cached host list
  - Ping/Pong messages are exchanged with potential neighbors
- Searching on the network
  - Query messages are flooded on the network
  - QueryHit messages are received (back-propagated along the Query path) from peers having the requested content
- Downloading the content
  - peers download files directly from peers having the requested content
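The search step can be illustrated with a small simulation. This is only a sketch under stated assumptions: the overlay `overlay`, the peer names, and the function `flood_query` are hypothetical, and the TTL handling is simplified (a peer forwards a Query only while hops remain, and drops duplicates). It is not the Gnutella wire protocol.

```python
from collections import deque

def flood_query(graph, source, holders, ttl=7):
    """Flood a Query from `source` and return the reverse path of each QueryHit."""
    parent = {source: None}           # neighbor from which each peer first got the Query
    frontier = deque([(source, ttl)])
    hits = []
    while frontier:
        peer, hops_left = frontier.popleft()
        if peer in holders and peer != source:
            # The QueryHit travels back along the path the Query arrived on.
            path = [peer]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            hits.append(path)
        if hops_left == 0:
            continue                  # TTL exhausted: do not forward further
        for neighbor in graph[peer]:
            if neighbor not in parent:    # duplicate Queries are dropped
                parent[neighbor] = peer
                frontier.append((neighbor, hops_left - 1))
    return hits

# Hypothetical five-peer overlay; only peer E holds the requested file.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(flood_query(overlay, "A", {"E"}))          # QueryHit path from E back to A
print(flood_query(overlay, "A", {"E"}, ttl=2))   # TTL too small to reach E
```

The download itself is not modeled: once the QueryHit arrives, the requester would contact the holder directly.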

Characterization of Users of P2P systems

S. Saroiu, P. Gummadi and S. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN’02.

- first paper to characterize P2P file-sharing systems
- Goal: to analyze the following user characteristics
  - latency
  - lifetime of peers
  - bottleneck bandwidth
  - number of files shared and downloaded
  - degree of cooperation
- methodology: active crawling
- systems studied: Napster and Gnutella
- data collection: May 2001

Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

Measurement Methodology

- active crawling of the Napster and Gnutella systems
  - Napster: issued queries for popular content, then queried the central server for peer information
  - Gnutella: used the protocol's ping/pong messages to get metadata about peers, then their neighbors, and so on
- parallel measurements for:
  - peer lifetime: periodic probing of peers obtained from the crawlers
    - offline if no response to a TCP SYN
    - inactive if the response to the TCP SYN is a TCP RST
    - active if the peer accepts the incoming TCP connection on that port
  - latency: RTT measurements from one host
  - bottleneck link bandwidth: active probing using SProbe, a tool they developed based on the packet-pair dispersion technique
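Two of these probes can be sketched briefly. This is an illustration only, not the authors' tooling: `probe_peer` and `packet_pair_bandwidth_bps` are hypothetical names, the probe uses an ordinary TCP connect rather than raw SYN packets, and the packet-pair estimate uses the textbook relation (bottleneck bandwidth is roughly packet size divided by the spacing between two back-to-back packets at the receiver).

```python
import socket

def probe_peer(host, port, timeout=3.0):
    """Classify a peer as 'active', 'inactive' or 'offline' from one TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "active"       # connection accepted on the application port
    except ConnectionRefusedError:
        return "inactive"         # host answered with a TCP RST
    except OSError:
        return "offline"          # timeout or unreachable: no answer to the SYN

def packet_pair_bandwidth_bps(packet_size_bytes, dispersion_seconds):
    """Estimate bottleneck bandwidth from the spacing of two back-to-back packets."""
    return packet_size_bytes * 8 / dispersion_seconds

# Made-up numbers: 1500-byte packets arriving 12 ms apart suggest about 1 Mbps.
print(packet_pair_bandwidth_bps(1500, 0.012))
```

A real measurement would repeat both probes many times, since a single packet pair is easily distorted by cross traffic.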

Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

Host Lifetime analysis

- 20% of peers in Napster and Gnutella have an IP-level uptime of 93% or more
- Napster peers have higher application-level uptimes than Gnutella peers
- the best 20% of Napster peers have an uptime of 83% or more, and the best 20% of Gnutella peers have an uptime of 45% or more
- median session duration is 60 minutes for both Napster and Gnutella

Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

Latency analysis (Gnutella)

20% of peers have a latency of at most 70ms and 20% have a latency of at least 280ms

correlation between downstream bottleneck bandwidth and latency: two clusters, one for modems (20-60Kbps, 100-1000ms) and one for broadband (1Mbps, 60-300ms)

Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

Bottleneck Bandwidth Analysis (Gnutella)

92% of Gnutella peers have a downstream bottleneck bandwidth of at least 100Kbps

22% of peers have an upstream bottleneck bandwidth of 100Kbps or less

such peers are unsuitable for serving content

Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

Downloads, Uploads and Shared Files

relative number of downloads and uploads varies significantly across bandwidth classes

clear client/server behavior of different classes

Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

Shared files vs. shared data (Napster and Gnutella)

Strong correlation between the number of files shared and the amount of shared data in MB

slope of both lines is 3.7MB per file, the size of a typical MP3 audio file
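The slope of such a line is the average size per shared file. A minimal sketch of how it could be estimated with ordinary least squares follows; the data points are invented for illustration (they are not from the paper), and `least_squares_slope` is a hypothetical helper.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Invented peers: number of shared files and total shared data in MB.
files_shared = [10, 50, 100, 200]
shared_mb = [37.0, 185.0, 370.0, 740.0]

slope_mb_per_file = least_squares_slope(files_shared, shared_mb)
print(f"{slope_mb_per_file:.2f} MB per file")
```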

Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

Degree of Cooperation (Napster)

30% of the peers report bandwidth as 64Kbps or less, but actually have significantly higher bandwidths

10% of the peers reporting higher bandwidths (3Mbps or higher) actually have significantly lower bandwidth

Effect of P2P traffic on underlying network

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW 2002.

- Goal: to characterize P2P traffic at three aggregation levels (IP, prefix and AS)
  - host distribution and host connectivity
  - traffic volume and mean bandwidth usage
  - traffic patterns over time
  - connection duration and on-time
- methodology: passive measurements at routers (port based)
- systems studied: FastTrack (Kazaa), Gnutella, Direct Connect
- analysis of flow-level data collected from multiple border routers across a large tier-1 ISP’s backbone

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002

Measurement Methodology

- flow records from multiple border routers, matched on ports:
  - 6346/6347: Gnutella
  - 1214: FastTrack (Kazaa)
  - 411/412: Direct Connect
- processed data to eliminate:
  - private IP addresses
  - invalid AS numbers

final data set contained 800 million flow records
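Port-based matching of this kind can be sketched as follows. The record layout and the function `classify_flow` are hypothetical, and the port table assumes the commonly used defaults (6346/6347 for Gnutella, 1214 for FastTrack, 411/412 for Direct Connect). The AS-number check is not modeled here.

```python
import ipaddress

# Assumed default ports for each system.
PORT_TO_SYSTEM = {
    6346: "Gnutella",
    6347: "Gnutella",
    1214: "FastTrack",
    411: "Direct Connect",
    412: "Direct Connect",
}

def classify_flow(src_ip, dst_ip, src_port, dst_port):
    """Return the P2P system a flow belongs to, or None if it is filtered or unmatched."""
    # Drop flows that involve private address space.
    if ipaddress.ip_address(src_ip).is_private or ipaddress.ip_address(dst_ip).is_private:
        return None
    # A flow matches if either endpoint uses a well-known P2P port.
    for port in (src_port, dst_port):
        if port in PORT_TO_SYSTEM:
            return PORT_TO_SYSTEM[port]
    return None

print(classify_flow("8.8.8.8", "1.1.1.1", 40000, 1214))   # FastTrack
print(classify_flow("10.0.0.5", "1.1.1.1", 40000, 6346))  # None: private source
```

Port matching misses peers that run on non-standard ports, which is a known limitation of this approach.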

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002

Datasets used for analysis

FastTrack is most popular in terms of number of hosts participating and average traffic volume per day

rapid growth of P2P traffic is mainly caused by increasing number of hosts in the system

Direct Connect systems have higher traffic volume per IP address

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002

Host distribution analysis

# of IP addresses in FastTrack ranges from 0.5 to 2 million

ratio of # of IP addresses in FastTrack:Gnutella:DirectConnect is 150:30:1

Density of a prefix is the number of unique active IP addresses belonging to it

Density of an AS is the number of unique prefixes belonging to it

FastTrack hosts are distributed more densely than Gnutella and Direct Connect hosts (64:16:4)
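Prefix density, as defined on this slide, can be computed by grouping active addresses under their covering prefix. The sketch below uses invented addresses and prefixes; `prefix_density` is a hypothetical helper, and assigning each address to its longest matching prefix is an assumption of this example rather than something the slides state.

```python
import ipaddress
from collections import defaultdict

def prefix_density(active_ips, prefixes):
    """Count unique active IP addresses under each prefix (longest match wins)."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    members = defaultdict(set)
    for ip in active_ips:
        address = ipaddress.ip_address(ip)
        matching = [net for net in networks if address in net]
        if matching:
            best = max(matching, key=lambda net: net.prefixlen)
            members[str(best)].add(address)   # a set ignores repeated addresses
    return {prefix: len(addresses) for prefix, addresses in members.items()}

# Invented observations: one address appears twice and is counted once.
observed = ["8.8.8.8", "8.8.8.8", "8.8.8.9", "8.1.2.3"]
print(prefix_density(observed, ["8.8.8.0/24", "8.0.0.0/8"]))
```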

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002

Host connectivity analysis (FastTrack)

48% of individual IPs communicate with at most one IP and 89% with at most 10 IPs

75% of prefixes and ASes communicate with at least 2 prefixes or ASes

very few hosts have very high connectivity and most hosts have very low connectivity

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002

Traffic volume analysis

CDF of traffic volume per IP/prefix/AS for FastTrack (one day)

distribution of P2P upstream traffic volume across three months

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002

Mean bandwidth usage (FastTrack and Direct Connect)

FastTrack: 33% of IP addresses have a mean downstream bandwidth of 56Kbps or less; 50% have a mean upstream bandwidth of 56Kbps or less

Direct Connect: 20% of IP addresses have a mean downstream bandwidth of 56Kbps or less; 33% have a mean upstream bandwidth of 56Kbps or less

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002

Traffic patterns over time (FastTrack)

- traffic volume transferred every hour among FastTrack hosts
- number of unique IP addresses, prefixes and ASes active every hour
- number of active unique IP addresses in each bin of various sizes
- system is very dynamic: hosts join and leave frequently

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002

Connection duration and On-time (FastTrack)

- 50% of the IPs are online for less than one minute per day
- 60% of IPs, 40% of prefixes and 30% of ASes stay for less than 10 minutes per day
- 65% of the IPs join only once
- at the AS and prefix level, membership is not very transient

Peer-to-Peer Topologies

M. Ripeanu, I. Foster and A. Iamnitchi, “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing Journal, 2002.

Goal: To discover and analyze the Gnutella overlay topology and evaluate generated traffic

- methodology: active crawling
- datasets: Nov 2000, March 2001 and May 2001

Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems”, 2002

Gnutella Network Growth

number of nodes in the largest connected component in the Gnutella network

significantly larger network found during Memorial Day and Thanksgiving

50 times increase within 6 months

Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems”, 2002

Distribution of node-to-node shortest paths

more than 95% of node pairs are at most 7 hops apart

longest node-to-node path is 12 hops
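Given a crawled topology, a shortest-path distribution like this can be obtained with a breadth-first search from every node. The sketch below runs on a tiny invented graph; `hop_counts_from` and `path_length_distribution` are hypothetical helper names, and the graph is assumed to be undirected with string node names.

```python
from collections import Counter, deque

def hop_counts_from(graph, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

def path_length_distribution(graph):
    """Count unordered node pairs by the length of their shortest path."""
    counts = Counter()
    for source in graph:
        for target, hops in hop_counts_from(graph, source).items():
            if source < target:       # count each unordered pair once
                counts[hops] += 1
    return counts

# Invented four-node chain a - b - c - d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
distribution = path_length_distribution(chain)
print(dict(distribution), "longest path:", max(distribution))
```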

Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems”, 2002

Average node connectivity

average number of connections per node remains constant at 3.4

Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems”, 2002

Node connectivity distribution

- Nov 2000: the connectivity of Gnutella nodes follows a power law
- March 2001: connectivity does not look like a power law for all nodes; the power-law distribution is preserved for nodes with more than 10 links, while for nodes with fewer than 10 links the distribution is almost constant

Searching on the P2P network

K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001, http://www-2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html

methodology: passive measurements at one or two peers that joined the Gnutella network and logged the queries and query messages routed through them

data sets: Dec 2000, Jan 2001

K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001.

Top 20 most popular query types

17% of queries contained non-ASCII strings and were filtered out

most queries for artists, adult content and file extensions (audio)

some queries for books, software etc.

K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001.

Query popularity distribution

two distinct distributions of document popularity, with a break at query rank 100

most popular documents are equally popular

less popular documents follow a Zipf-like distribution, with alpha between 0.63 and 1.24
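In a Zipf-like distribution the request count at rank r is proportional to r^(-alpha), so alpha is the negated slope of the rank-frequency line on log-log axes. A minimal sketch of that fit follows, on synthetic counts generated with a known exponent; `fit_zipf_alpha` is a hypothetical helper, and the `first_rank` parameter is an assumption added so the flat head of the distribution can be skipped.

```python
import math

def fit_zipf_alpha(counts, first_rank=1):
    """Estimate alpha by least squares on log(count) versus log(rank)."""
    ordered = sorted(counts, reverse=True)
    points = [
        (math.log(rank), math.log(count))
        for rank, count in enumerate(ordered, start=1)
        if rank >= first_rank and count > 0
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return -covariance / variance     # slope is -alpha on log-log axes

# Synthetic query counts following rank^(-0.8); not measured data.
synthetic_counts = [1000.0 / rank ** 0.8 for rank in range(1, 201)]
print(round(fit_zipf_alpha(synthetic_counts), 3))
```

Least squares on log-log axes is the simplest estimator; it is sensitive to noise in the tail, so real traces usually need more careful fitting.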

Deciphering proprietary P2P systems

N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003.

- methodology: passive content-based data collection at a caching server installed at the border of a large ISP
  - L4 switch inspects the first few packets of each TCP connection to detect Kazaa download traffic
  - redirects Kazaa download traffic through the caching server
- focus on download traffic only, not control traffic (since it is encrypted)

N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003

Characteristics of Collected Traces

38% of all download sessions do not use standard Kazaa port (1214)

N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003

File download distribution by bytes

- CDF of byte popularity distribution for the 10% and 1% most popular files
- 0.8% of all files account for 80% of the generated traffic
- 0.1% of the most bandwidth-hungry files (top 1% of all files) generate 50% of the traffic
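Concentration figures of this kind come from sorting files by the bytes they generated and walking down the cumulative sum. A small sketch under stated assumptions: the byte counts are invented and `files_needed_for_share` is a hypothetical helper.

```python
def files_needed_for_share(bytes_per_file, share):
    """Fraction of files (largest first) needed to cover `share` of all bytes."""
    ordered = sorted(bytes_per_file, reverse=True)
    total = sum(ordered)
    running = 0
    for index, size in enumerate(ordered, start=1):
        running += size
        if running >= share * total:
            return index / len(ordered)
    return 1.0

# Invented trace: one file dominates the traffic.
traffic_bytes = [800, 100, 50, 30, 20]
print(files_needed_for_share(traffic_bytes, 0.5))    # fraction of files covering half the bytes
print(files_needed_for_share(traffic_bytes, 0.85))   # fraction covering 85% of the bytes
```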

N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003

File size distribution

note the log scale on the X-axis

3 distinct modes:
- 100KB for pictures
- 2-5MB for music files
- 700MB for movies

N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003

Quantity and Rate of Distinct Files

- new files seen at different time scales: every day, hour and minute
- 150,000 distinct files during a 17-day period
- daily graph: the number of new files seen continued to decrease, but no steady-state value (the rate of injection of files into the network) was reached
- hourly graph: time-of-day effect
- per-minute graph: 50 new files seen every minute on average

N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003

Rate of change of popularity of files

- percentage of files that make it to the list of the N most popular files, (a) in consecutive intervals and (b) after T intervals, compared with the first list
- measurement interval is 24 hours
- 15% of the highly popular files remain popular throughout the experiment; the rest are popular only for short intervals
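The two comparisons, consecutive intervals and each interval against the first, can be sketched as set overlaps between top-N lists. The daily download counts below are invented, and `top_n` and `popularity_persistence` are hypothetical helpers, not the authors' code.

```python
from collections import Counter

def top_n(download_counts, n):
    """Names of the n most downloaded files in one interval."""
    return {name for name, _ in Counter(download_counts).most_common(n)}

def popularity_persistence(interval_counts, n):
    """Overlap of each interval's top-n list with (a) the previous one and (b) the first one."""
    first = top_n(interval_counts[0], n)
    consecutive, versus_first = [], []
    for previous, current in zip(interval_counts, interval_counts[1:]):
        current_top = top_n(current, n)
        consecutive.append(len(top_n(previous, n) & current_top) / n)
        versus_first.append(len(first & current_top) / n)
    return consecutive, versus_first

# Invented download counts for three 24-hour intervals.
days = [
    {"a": 9, "b": 8, "c": 1},
    {"a": 7, "c": 6, "b": 2},
    {"c": 9, "d": 8, "a": 1},
]
consecutive, versus_first = popularity_persistence(days, n=2)
print(consecutive, versus_first)
```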

Open Questions

Mapping a global snapshot of the entire Gnutella topology

Bootstrapping of peers in unstructured peer-to-peer systems (work in progress)

More efficient searching on P2P networks: efforts in this direction include random walks, Bloom-filter-based techniques, etc.

End-point privacy/anonymity is absent in most of these peer-to-peer networks

References

Papers covered in the seminar:
- S. Saroiu, P. Gummadi and S. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN 2002.
- S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW 2002.
- M. Ripeanu, I. Foster and A. Iamnitchi, “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002.
- K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001.
- N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP 2003.

Papers not covered in the seminar:
- J. Chu, K. Labonte and B. Levine, “Availability and Locality Measurements of Peer-to-Peer File Systems”, SPIE, July 2002.
- F. Bustamante and Y. Qiao, “Friendships that last: Peer lifespan and its role in P2P protocols”, WCW 2003.
- R. Bhagwan, S. Savage and G. Voelker, “Understanding Availability”, IPTPS 2003.
- Saroiu et al., “An Analysis of Internet Content Delivery Systems”, OSDI 2002.
- Markatos et al., “Tracing a large-scale Peer-to-Peer System: An hour in the life of Gnutella”, CCGrid 2002.