Maurizio Naldi, Università di Roma “Tor Vergata”
POPULARITY DISTRIBUTIONS AND INTERNET TRAFFIC MODELLING
Workshop “Statistica e Telecomunicazioni”, Rome, 2-3 July 2001
Università di Roma “Tor Vergata”, Dipartimento di Informatica, Sistemi e Produzione
WHAT IS A POPULARITY MODEL?
Popularity models describe the way users distribute their preferences among a set of objects. They are represented either as a frequency-rank plot (suitable for highly preferred objects) or as a frequency-count plot (suitable for less preferred objects).
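A minimal sketch of how the data points of the two plots can be obtained, assuming the raw data is a log in which each entry names the requested object. The log and the function names are hypothetical, used only for illustration.

```python
from collections import Counter


def frequency_rank(requests):
    """Return (rank, preferences) pairs, most preferred object first."""
    counts = Counter(requests)  # preferences received by each object
    ordered = sorted(counts.values(), reverse=True)
    return [(rank, freq) for rank, freq in enumerate(ordered, start=1)]


def frequency_count(requests):
    """Return {preferences: number of objects having that many preferences}."""
    counts = Counter(requests)
    return dict(sorted(Counter(counts.values()).items()))


# Hypothetical request log: each entry identifies the requested object.
log = ["a", "b", "a", "c", "a", "b", "d", "e"]
print(frequency_rank(log))   # [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]
print(frequency_count(log))  # {1: 3, 2: 1, 3: 1}
```

The frequency-rank list keeps every object separate, which suits the few highly preferred ones; the frequency-count map groups the many objects that share the same small number of preferences.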
EXAMPLES OF FREQUENCY-RANK AND FREQUENCY-COUNT PLOTS
[Figure: a frequency-rank plot, number of preferences vs. rank]
[Figure: a frequency-count plot, number of preferences vs. number of objects that have those preferences]
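SOME POPULARITY MODELS (FREQUENCY-RANK LAWS)
f(r) denotes the number of preferences received by the object of rank r.
• Zipf: $f(r) = \beta r^{-\alpha}$
• Simon
• Yule: $f(r) = \beta r^{-\alpha} b^{r}$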
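A minimal sketch evaluating two of these laws, assuming the commonly used forms $f(r) = \beta r^{-\alpha}$ for Zipf and $f(r) = \beta r^{-\alpha} b^{r}$ for Yule. The parameter names and the numeric values are illustrative assumptions, not fitted to any data; Simon's law is not included.

```python
def zipf(r, beta, alpha):
    """Zipf frequency-rank law (assumed form): f(r) = beta * r**(-alpha)."""
    return beta * r ** (-alpha)


def yule(r, beta, alpha, b):
    """Yule frequency-rank law (assumed form): f(r) = beta * r**(-alpha) * b**r."""
    return beta * r ** (-alpha) * b ** r


# Illustrative parameters only.
for r in (1, 10, 100):
    print(r, zipf(r, 1000.0, 0.8), yule(r, 1000.0, 0.8, 0.99))
```

For $b < 1$ the extra factor $b^{r}$ makes the Yule law fall off faster than Zipf's in the tail, while the two laws differ little at the top ranks.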
RELATIONSHIP TO PARETO’S MODEL
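If the N objects in a set are ranked by size according to Zipf’s law
$$x_r = A r^{-\alpha},$$
then the number of objects having a size greater than or equal to x is
$$r = \left(\frac{A}{x}\right)^{1/\alpha}.$$
The probability distribution function is therefore
$$F(x) = 1 - \frac{r}{N} = 1 - \frac{1}{N}\left(\frac{A}{x}\right)^{1/\alpha},$$
i.e. of the Pareto type.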
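A quick numerical check of this derivation, as a sketch with illustrative values of N, A and α (not taken from any measurement): generate the sizes $x_r = A r^{-\alpha}$ for ranks 1..N and compare the empirical fraction of objects smaller than x with $F(x) = 1 - \frac{1}{N}(A/x)^{1/\alpha}$. The function names are hypothetical.

```python
def zipf_sizes(n, a, alpha):
    """Sizes x_r = a * r**(-alpha) of n objects ranked by Zipf's law."""
    return [a * r ** (-alpha) for r in range(1, n + 1)]


def pareto_cdf(x, n, a, alpha):
    """F(x) = 1 - (1/N) * (A/x)**(1/alpha), for A * N**(-alpha) <= x <= A."""
    return 1.0 - (a / x) ** (1.0 / alpha) / n


def empirical_cdf(x, sizes):
    """Fraction of objects whose size is strictly smaller than x."""
    return sum(s < x for s in sizes) / len(sizes)


N, A, ALPHA = 1000, 500.0, 0.8  # illustrative values
sizes = zipf_sizes(N, A, ALPHA)
for k in (1, 10, 100):
    x = sizes[k - 1]  # size of the object of rank k
    print(k, empirical_cdf(x, sizes), pareto_cdf(x, N, A, ALPHA))
```

At the size of the rank-k object exactly k objects are at least that large, so both columns equal $1 - k/N$ up to rounding.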
APPLICATIONS
Present:
• Cache algorithm design
• Address cache table dimensioning
• Optimization of Video-on-Demand servers’ architecture
Possible:
• Any communications context where the user has a wide choice
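As an illustration of the dimensioning application, a minimal sketch that finds the smallest number of entries C whose C most popular objects attract at least a target fraction of requests. It assumes Zipf-distributed, independent requests over n objects and an idealised cache that always holds the top C objects; the function names and parameters are hypothetical.

```python
def zipf_weights(n, alpha):
    """Unnormalised Zipf preferences r**(-alpha) for ranks 1..n."""
    return [r ** (-alpha) for r in range(1, n + 1)]


def min_cache_size(n, alpha, target):
    """Smallest C such that the top-C objects get at least `target` of requests.

    Assumes an idealised cache permanently holding the C most popular objects.
    """
    weights = zipf_weights(n, alpha)
    total = sum(weights)
    cumulative = 0.0
    for c, w in enumerate(weights, start=1):
        cumulative += w
        if cumulative / total >= target:
            return c
    return n  # guards against rounding when target is close to 1


# Illustrative: entries needed for a 70% hit ratio among 10,000 objects.
print(min_cache_size(10_000, 0.8, 0.7))
```

Under independent requests, practical replacement policies such as LRU generally reach a lower hit ratio than this idealised static cache, so the result is a lower bound on the table size they would need.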
TRAFFIC MONITORING POINTS
• Web proxy observation point (some-to-all): the requests of some users towards all sites
• Web server observation point (all-to-one): the requests of all users towards one site
OBSERVED REQUEST DISTRIBUTIONS
OBSERVED DISTRIBUTIONS OF USERS AMONG SITES
THE 20/80 (10/90) RULE
Proportion of documents [%] | Expected requests [%] | Observed requests [%]
1 | 19-46 | 20-35
10 | 44-68 | 45-55

Proportion of requests [%] | Expected proportion of documents [%] | Observed proportion of documents [%]
70 | 12-37 | 25-40
90 | 54-75 | 70-80
• The expected proportion of requests for the top documents overestimates the observed one
• Fixed-proportion rules (20/80, 10/90) do not hold
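A minimal sketch of how an expected share of requests for the top documents can be computed, assuming a Zipf model over a catalogue of n documents. The catalogue size and exponents below are illustrative, and this is not necessarily how the expected values in the table were obtained.

```python
def top_share(n, alpha, fraction):
    """Fraction of requests Zipf's law assigns to the top `fraction` of n documents."""
    weights = [r ** (-alpha) for r in range(1, n + 1)]
    k = max(1, round(fraction * n))  # number of top documents considered
    return sum(weights[:k]) / sum(weights)


# Illustrative catalogue of 10,000 documents, two example exponents.
for alpha in (0.6, 0.85):
    print(alpha,
          round(100 * top_share(10_000, alpha, 0.01), 1),
          round(100 * top_share(10_000, alpha, 0.10), 1))
```

The share depends on both the exponent and the catalogue size, which is one reason why a single fixed proportion cannot describe every data set.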
GENERAL COMMENTS
• When Zipf’s law is fitted by linear regression on the log-log frequency-rank plot, the estimated exponent typically lies in the 0.6-0.85 range
• All log-log frequency-rank plots exhibit an initial concavity (the fitted law overestimates the preferences of the top objects)
• All log-log frequency-count plots (count vs. frequency) exhibit spreading of the points in their final part
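A minimal sketch of the linear-regression fit mentioned in the first comment: ordinary least squares of $\log f(r)$ against $\log r$, whose negated slope estimates the Zipf exponent. It assumes the frequencies are strictly positive and already sorted by rank; the function name and the sample data are hypothetical.

```python
import math


def fit_zipf_loglog(frequencies):
    """Least-squares fit of log f(r) = log(beta) - alpha * log(r).

    `frequencies` must be positive and sorted in decreasing order (rank 1 first).
    Returns (alpha, beta).
    """
    xs = [math.log(r) for r in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)


# Hypothetical preference counts, rank 1 first.
alpha, beta = fit_zipf_loglog([120, 70, 45, 40, 30, 22, 20, 15])
print(alpha, beta)
```

Comparing the first few points with the fitted line is one way to see the initial concavity: if the top objects lie below the line, the fit overestimates their preferences.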
OPEN ISSUES
• Search for better models (solving the initial concavity problem)
• Search for parameter estimation methods other than linear regression
• Definition of proper goodness-of-fit tests
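One possible alternative to linear regression, named here as a swapped-in technique rather than the author's method, is maximum likelihood estimation of the Zipf exponent from the raw request counts. A minimal sketch, assuming requests are independent draws from a Zipf law over a fixed set of n objects ranked by observed count; the search bounds, tolerance and function names are arbitrary choices.

```python
import math


def zipf_log_likelihood(alpha, counts):
    """Log-likelihood of request counts (rank 1 first) under a finite Zipf law."""
    n = len(counts)
    log_norm = math.log(sum(r ** (-alpha) for r in range(1, n + 1)))
    return sum(c * (-alpha * math.log(r) - log_norm)
               for r, c in enumerate(counts, start=1))


def fit_zipf_mle(counts, lo=0.01, hi=3.0, tol=1e-6):
    """Maximise the log-likelihood over alpha by golden-section search.

    The log-likelihood is concave in alpha, so the search finds its maximum
    within [lo, hi].
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if zipf_log_likelihood(c, counts) < zipf_log_likelihood(d, counts):
            a = c  # maximum lies in [c, b]
        else:
            b = d  # maximum lies in [a, d]
    return (a + b) / 2


# Synthetic counts roughly following a Zipf law with exponent 0.8.
counts = [round(1e6 * r ** -0.8) for r in range(1, 501)]
print(fit_zipf_mle(counts))
```

Unlike the regression on the log-log plot, this estimator weights each object by its number of requests, so the sparsely observed tail does not dominate the fit.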