Power-Law Size Distributions - University of Vermont


Power-Law Size Distributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

References

1 of 43

Power-Law Size Distributions
Principles of Complex Systems
CSYS/MATH 300, Spring, 2013

Prof. Peter Dodds
@peterdodds

Department of Mathematics & Statistics | Center for Complex Systems | Vermont Advanced Computing Center | University of Vermont

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.


Outline

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

References


Let’s test our collective intuition:

Money ≡ Belief

Two questions about wealth distribution in the United States:

1. Please estimate the percentage of all wealth owned by individuals when grouped into quintiles.

2. Please estimate what you believe each quintile should own, ideally.

3. Extremes: 100, 0, 0, 0, 0 and 20, 20, 20, 20, 20


Wealth distribution in the United States: [8]

“Building a better America—One wealth quintile at a time” Norton and Ariely, 2011. [8]


Wealth distribution in the United States: [8]


Your turn—estimates:

[Figure: blank chart for the class's estimates, axis ticks 0, 20, 40, 60, 80, 100]


Your turn—ideal:

[Figure: blank chart for the class's ideal values, axis ticks 0, 20, 40, 60, 80, 100]


The sizes of many systems’ elements appear to obey an inverse power-law size distribution:

$P(\text{size} = x) \sim c\, x^{-\gamma}$

where $0 < x_{\min} < x < x_{\max}$ and $\gamma > 1$.

- Exciting class exercise: sketch this function.
- $x_{\min}$ = lower cutoff, $x_{\max}$ = upper cutoff.
- Negative linear relationship in log-log space:

$\log_{10} P(x) = \log_{10} c - \gamma \log_{10} x$

- We use base 10 because we are good people.
- Power-law decays in probability: The Statistics of Surprise.
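The log-log relationship above can be checked numerically. The following is a minimal sketch, assuming illustrative values c = 2 and γ = 2.5 (neither comes from the slides) and a hypothetical helper `fit_line`: it evaluates c x^(−γ) on a grid and fits a straight line to log10 P against log10 x, which should give slope −γ and intercept log10 c.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Illustrative values only; c and gamma are assumptions, not from the slides.
c, gamma = 2.0, 2.5
xs = [10 ** (k / 10) for k in range(41)]      # x from 1 to 10^4
ps = [c * x ** (-gamma) for x in xs]          # P(x) ~ c x^(-gamma)

slope, intercept = fit_line([math.log10(x) for x in xs],
                            [math.log10(p) for p in ps])
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

On real data the fit would only be meaningful over the range where the power law actually holds, which is why the cutoffs matter.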


Size distributions:

Usually, only the tail of the distribution obeys a power law:

$P(x) \sim c\, x^{-\gamma}$ for $x$ large.

- Still use term ‘power-law size distribution.’
- Other terms:
  - Fat-tailed distributions.
  - Heavy-tailed distributions.

Beware:
- Inverse power laws aren’t the only ones: lognormals, Weibull distributions, ...


Size distributions:

Many systems have discrete sizes $k$:
- Word frequency
- Node degree in networks: # friends, # hyperlinks, etc.
- # citations for articles, court decisions, etc.

$P(k) \sim c\, k^{-\gamma}$

where $k_{\min} \le k \le k_{\max}$

- Obvious fail for $k = 0$.
- Again, typically a description of distribution’s tail.


The statistics of surprise—words:
Brown Corpus (∼ $10^6$ words):

rank   word    % q
1.     the     6.8872
2.     of      3.5839
3.     and     2.8401
4.     to      2.5744
5.     a       2.2996
6.     in      2.1010
7.     that    1.0428
8.     is      0.9943
9.     was     0.9661
10.    he      0.9392
11.    for     0.9340
12.    it      0.8623
13.    with    0.7176
14.    as      0.7137
15.    his     0.6886

rank   word        % q
1945.  apply       0.0055
1946.  vital       0.0055
1947.  September   0.0055
1948.  review      0.0055
1949.  wage        0.0055
1950.  motor       0.0055
1951.  fifteen     0.0055
1952.  regarded    0.0055
1953.  draw        0.0055
1954.  wheel       0.0055
1955.  organized   0.0055
1956.  vision      0.0055
1957.  wild        0.0055
1958.  Palmer      0.0055
1959.  intensity   0.0055


Jonathan Harris’s Wordcount:

A word frequency distribution explorer:


The statistics of surprise—words:

First—a Gaussian example:

$P(x)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}\,dx$

[Figure: two panels for mean $\mu = 10$, variance $\sigma^2 = 1$. Linear: $P(x)$ versus $x$. Log-log: $\log_{10} P(x)$ versus $\log_{10} x$.]


The statistics of surprise—words:

Raw ‘probability’ (binned) for Brown Corpus:

[Figure: two panels. Linear: $N_q$ versus $q$. Log-log: $\log_{10} N_q$ versus $\log_{10} q$.]


The statistics of surprise—words:

‘Exceedance probability’ $N_{>q}$:

[Figure: two panels. Linear: $N_{>q}$ versus $q$. Log-log: $\log_{10} N_{>q}$ versus $\log_{10} q$.]


My, what big words you have...

- Test capitalizes on word frequency following a heavily skewed frequency distribution with a decaying power-law tail.

- Let’s do it collectively...


The statistics of surprise:

Gutenberg-Richter law

- Log-log plot
- Base 10
- Slope = −1

$N(M > m) \propto m^{-1}$

- From both the very awkwardly similar Christensen et al. and Bak et al.: “Unified scaling law for earthquakes” [4, 2]


The statistics of surprise:

From: “Quake Moves Japan Closer to U.S. and Alters Earth’s Spin” by Kenneth Chang, March 13, 2011, NYT:

‘What is perhaps most surprising about the Japan earthquake is how misleading history can be. In the past 300 years, no earthquake nearly that large—nothing larger than magnitude eight—had struck in the Japan subduction zone. That, in turn, led to assumptions about how large a tsunami might strike the coast.’

‘“It did them a giant disservice,” said Dr. Stein of the geological survey. That is not the first time that the earthquake potential of a fault has been underestimated. Most geophysicists did not think the Sumatra fault could generate a magnitude 9.1 earthquake, ...’


Well, that’s just great:

Two things we have poor cognitive understanding of:

1. Probability
   - Ex. The Monty Hall Problem
   - Ex. Daughter/Son born on Tuesday (see asides; Wikipedia entry Boy or Girl Paradox here).

2. Logarithmic scales.

On counting and logarithms:

- Listen to Radiolab’s “Numbers.”

- Later: Benford’s Law.


[Figure: twelve panels, (a) word frequency, (b) citations, (c) web hits, (d) books sold, (e) telephone calls received, (f) earthquake magnitude, (g) crater diameter in km, (h) peak intensity, (i) intensity, (j) net worth in US dollars, (k) name frequency, (l) population of city.]

FIG. 4 Cumulative distributions or “rank/frequency plots” of twelve quantities reputed to follow power laws. The distributions were computed as described in Appendix A. Data in the shaded regions were excluded from the calculations of the exponents in Table I. Source references for the data are given in the text. (a) Numbers of occurrences of words in the novel Moby Dick by Herman Melville. (b) Numbers of citations to scientific papers published in 1981, from time of publication until June 1997. (c) Numbers of hits on web sites by 60 000 users of the America Online Internet service for the day of 1 December 1997. (d) Numbers of copies of bestselling books sold in the US between 1895 and 1965. (e) Number of calls received by AT&T telephone customers in the US for a single day. (f) Magnitude of earthquakes in California between January 1910 and May 1992. Magnitude is proportional to the logarithm of the maximum amplitude of the earthquake, and hence the distribution obeys a power law even though the horizontal axis is linear. (g) Diameter of craters on the moon. Vertical axis is measured per square kilometre. (h) Peak gamma-ray intensity of solar flares in counts per second, measured from Earth orbit between February 1980 and November 1989. (i) Intensity of wars from 1816 to 1980, measured as battle deaths per 10 000 of the population of the participating countries. (j) Aggregate net worth in dollars of the richest individuals in the US in October 2003. (k) Frequency of occurrence of family names in the US in the year 1990. (l) Populations of US cities in the year 2000.


Size distributions:

Examples:
- Earthquake magnitude (Gutenberg-Richter law): [6, 2] $P(M) \propto M^{-2}$

- Number of war deaths: [11] $P(d) \propto d^{-1.8}$

- Sizes of forest fires [5]

- Sizes of cities: [12] $P(n) \propto n^{-2.1}$

- Number of links to and from websites [3]

- See in part Simon [12] and M.E.J. Newman [7] “Power laws, Pareto distributions and Zipf’s law” for more.

- Note: exponents are subject to error and vary across sources.


Size distributions:

Examples:
- Number of citations to papers: [9, 10] $P(k) \propto k^{-3}$.
- Individual wealth (maybe): $P(W) \propto W^{-2}$.
- Distributions of tree trunk diameters: $P(d) \propto d^{-2}$.
- The gravitational force at a random point in the universe: [1] $P(F) \propto F^{-5/2}$ (see the Holtsmark distribution and stable distributions).
- Diameter of moon craters: [7] $P(d) \propto d^{-3}$.
- Word frequency: [12] e.g., $P(k) \propto k^{-2.2}$ (variable)


power-law distributions

Gaussians versus power-law distributions:
- Mediocristan versus Extremistan
- Mild versus Wild (Mandelbrot)
- Example: Height versus wealth.
- See “The Black Swan” by Nassim Taleb. [13]


Turkeys...

From “The Black Swan” [13]


Taleb’s table [13]

Mediocristan / Extremistan
- Most typical member is mediocre / Most typical is either giant or tiny

- Winners get a small segment / Winner-take-almost-all effects

- When you observe for a while, you know what’s going on / It takes a very long time to figure out what’s going on

- Prediction is easy / Prediction is hard

- History crawls / History makes jumps

- Tyranny of the collective / Tyranny of the rare and accidental


Size distributions:

Power-law size distributions are sometimes called Pareto distributions after Italian scholar Vilfredo Pareto.

- Pareto noted wealth in Italy was distributed unevenly (80–20 rule; misleading).

- Term used especially by practitioners of the Dismal Science.


Devilish power-law size distribution details:

Exhibit A:
- Given $P(x) = c\, x^{-\gamma}$ with $0 < x_{\min} < x < x_{\max}$, the mean is ($\gamma \ne 2$):

$\langle x \rangle = \frac{c}{2-\gamma}\left( x_{\max}^{2-\gamma} - x_{\min}^{2-\gamma} \right).$

- Mean ‘blows up’ with upper cutoff if $\gamma < 2$.
- Mean depends on lower cutoff if $\gamma > 2$.
- $\gamma < 2$: Typical sample is large.
- $\gamma > 2$: Typical sample is small.

Insert question from assignment 1
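The dependence of the mean on the cutoffs can be tried out directly. This is a minimal sketch, assuming a hypothetical helper `power_law_mean` and illustrative cutoffs; the normalization constant c is not given on the slide and is obtained here by requiring that the density integrate to 1 over (x_min, x_max).

```python
def power_law_mean(gamma, xmin, xmax):
    """Mean of P(x) = c x^(-gamma) on (xmin, xmax), for gamma not equal to 1 or 2."""
    # Normalization (an added step): the integral of c x^(-gamma) over the range is 1.
    c = (gamma - 1) / (xmin ** (1 - gamma) - xmax ** (1 - gamma))
    # Mean formula from the slide.
    return c / (2 - gamma) * (xmax ** (2 - gamma) - xmin ** (2 - gamma))

# Illustrative cutoffs: fix xmin = 1 and push the upper cutoff outward.
for xmax in (1e2, 1e4, 1e6):
    print(xmax, power_law_mean(1.5, 1.0, xmax), power_law_mean(2.5, 1.0, xmax))
```

With γ = 1.5 the mean keeps growing as the upper cutoff grows, while with γ = 2.5 it settles near a value set by the lower cutoff.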


And in general...

Moments:
- All moments depend only on cutoffs.
- No internal scale that dominates/matters.
- Compare to a Gaussian, exponential, etc.

For many real size distributions: $2 < \gamma < 3$
- mean is finite (depends on lower cutoff)
- $\sigma^2$ = variance is ‘infinite’ (depends on upper cutoff)
- Width of distribution is ‘infinite’
- If $\gamma > 3$, distribution is less terrifying and may be easily confused with other kinds of distributions.

Insert question from assignment 1
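The claim for 2 < γ < 3 can be illustrated with the closed-form moments of a power law with cutoffs. This is a sketch under stated assumptions: `raw_moment` and `variance` are hypothetical helpers, γ = 2.5 and the cutoffs are illustrative, and the normalization is derived by integrating the density rather than taken from the slides.

```python
def raw_moment(n, gamma, xmin, xmax):
    """n-th raw moment of P(x) = c x^(-gamma) on (xmin, xmax); needs gamma != 1, n + 1."""
    c = (gamma - 1) / (xmin ** (1 - gamma) - xmax ** (1 - gamma))
    k = n + 1 - gamma            # exponent left after integrating x^n * x^(-gamma)
    return c / k * (xmax ** k - xmin ** k)

def variance(gamma, xmin, xmax):
    """Variance as second raw moment minus squared mean."""
    mean = raw_moment(1, gamma, xmin, xmax)
    return raw_moment(2, gamma, xmin, xmax) - mean ** 2

# Illustrative: gamma = 2.5, xmin = 1, upper cutoff raised by a factor of 100.
for xmax in (1e4, 1e6):
    print(xmax, raw_moment(1, 2.5, 1.0, xmax), variance(2.5, 1.0, xmax))
```

The mean barely moves when the upper cutoff is raised, whereas the variance grows roughly like x_max^(3−γ), which is the sense in which it is ‘infinite’.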


Moments

Standard deviation is a mathematical convenience:
- Variance is nice analytically...
- Another measure of distribution width:

Mean average deviation (MAD) = $\langle |x - \langle x \rangle| \rangle$

- For a pure power law with $2 < \gamma < 3$:

$\langle |x - \langle x \rangle| \rangle$ is finite.

- But MAD is mildly unpleasant analytically...
- We still speak of infinite ‘width’ if $\gamma < 3$.

Insert question from assignment 2


How sample sizes grow...

Given $P(x) \sim c\, x^{-\gamma}$:
- We can show that after $n$ samples, we expect the largest sample to be

$x_1 \gtrsim c'\, n^{1/(\gamma-1)}$

- Sampling from a finite-variance distribution gives a much slower growth with $n$.

- e.g., for $P(x) = \lambda e^{-\lambda x}$, we find

$x_1 \gtrsim \frac{1}{\lambda} \ln n.$

Insert question from assignment 2
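One common heuristic for these scalings, used here as an assumption rather than as the derivation intended in the course, is to take the largest of n samples to sit where about one sample is expected to exceed it, that is, n P≥(x1) = 1. The sketch below solves that condition for a pure power law with lower cutoff x_min and no upper cutoff, and for an exponential; the function names and parameter values are illustrative.

```python
import math

def largest_power_law(n, gamma, xmin=1.0):
    """Solve n * (x / xmin)^(-(gamma - 1)) = 1 for x."""
    return xmin * n ** (1.0 / (gamma - 1.0))

def largest_exponential(n, lam):
    """Solve n * exp(-lam * x) = 1 for x."""
    return math.log(n) / lam

# Illustrative parameters: gamma = 2.5 and lambda = 1.
for n in (10**2, 10**4, 10**6):
    print(n, largest_power_law(n, 2.5), largest_exponential(n, 1.0))
```

Under this heuristic the power-law maximum grows as a power of n, while the exponential maximum grows only logarithmically.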


Complementary Cumulative Distribution Function:

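CCDF:

$$
\begin{aligned}
P_{\ge}(x) &= P(x' \ge x) = 1 - P(x' < x) \\
&= \int_{x'=x}^{\infty} P(x')\,dx' \\
&\propto \int_{x'=x}^{\infty} (x')^{-\gamma}\,dx' \\
&= \left. \frac{1}{-\gamma+1}\,(x')^{-\gamma+1} \right|_{x'=x}^{\infty} \\
&\propto x^{-\gamma+1}
\end{aligned}
$$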
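The result of this derivation can be checked on synthetic data. The following is a minimal sketch, not a recommended estimation procedure: it assumes a pure power law with lower cutoff x_min = 1 and no upper cutoff, uses evenly spaced quantiles in place of random draws so that the outcome is reproducible, and fits a least-squares line to the empirical CCDF in log-log space. The helper names and γ = 2.5 are illustrative.

```python
import math

def fit_slope(xs, ys):
    """Least-squares slope of y against x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def empirical_ccdf(values):
    """Return the sorted values and, for each, the fraction of values >= it."""
    s = sorted(values)
    n = len(s)
    return s, [(n - i) / n for i in range(n)]

gamma, xmin, n = 2.5, 1.0, 10_000
# Inverse-CDF values at evenly spaced quantiles (an assumption standing in for sampling).
values = [xmin * (1 - (i + 0.5) / n) ** (-1 / (gamma - 1)) for i in range(n)]

xs, ccdf = empirical_ccdf(values)
slope = fit_slope([math.log10(x) for x in xs], [math.log10(p) for p in ccdf])
print(f"fitted CCDF slope = {slope:.3f}, expected about {-(gamma - 1):.3f}")
```

With real, noisy samples a plain least-squares fit like this is only a rough diagnostic.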


Complementary Cumulative Distribution Function:

CCDF:

$P_{\ge}(x) \propto x^{-\gamma+1}$

- Use when tail of $P$ follows a power law.
- Increases exponent by one.
- Useful in cleaning up data.

[Figure: two log-log panels. PDF: $\log_{10} N_q$ versus $\log_{10} q$. CCDF: $\log_{10} N_{>q}$ versus $\log_{10} q$.]


Complementary Cumulative Distribution Function:

- Discrete variables:

$$
\begin{aligned}
P_{\ge}(k) &= P(k' \ge k) \\
&= \sum_{k'=k}^{\infty} P(k') \\
&\propto k^{-\gamma+1}
\end{aligned}
$$

- Use integrals to approximate sums.
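How good the integral approximation is can be seen by comparing the two directly. This is a minimal sketch, assuming an unnormalized P(k) = k^(−γ) with γ = 2.5 and a sum truncated at a large, arbitrarily chosen k_max; both function names are hypothetical.

```python
def tail_sum(k, gamma, kmax=10**5):
    """Truncated sum of j^(-gamma) for j = k, ..., kmax (kmax is an arbitrary cutoff)."""
    return sum(j ** (-gamma) for j in range(k, kmax + 1))

def tail_integral(k, gamma):
    """Integral of x^(-gamma) from k to infinity, valid for gamma > 1."""
    return k ** (1 - gamma) / (gamma - 1)

gamma = 2.5
for k in (1, 10, 100, 1000):
    print(k, tail_sum(k, gamma) / tail_integral(k, gamma))
```

The ratio is far from 1 at k = 1 and approaches 1 as k grows, consistent with the power law describing the tail rather than the whole distribution.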


Zipfian rank-frequency plots

George Kingsley Zipf:
- Noted various rank distributions have power-law tails, often with exponent −1 (word frequency, city sizes...)

- Zipf’s 1949 Magnum Opus: [14]

- We’ll study Zipf’s law in depth...


Zipfian rank-frequency plots

Zipf’s way:
- Given a collection of entities, rank them by size, largest to smallest.
- $x_r$ = the size of the $r$th ranked entity.
- $r = 1$ corresponds to the largest size.
- Example: $x_1$ could be the frequency of occurrence of the most common word in a text.
- Zipf’s observation:

$x_r \propto r^{-\alpha}$


Size distributions:

Brown Corpus (1,015,945 words):

[Figure: two log-log panels. CCDF: $\log_{10} N_{>q}$ versus $\log_{10} q$. Zipf: $\log_{10} q_i$ versus $\log_{10}$ rank $i$.]

- The, of, and, to, a, ... = ‘objects’
- ‘Size’ = word frequency
- Beep: (Important) CCDF and Zipf plots are related...


Size distributions:

Brown Corpus (1,015,945 words):

[Figure: two log-log panels. CCDF: $\log_{10} N_{>q}$ versus $\log_{10} q$. Zipf, with axes swapped: $\log_{10}$ rank $i$ versus $\log_{10} q_i$.]

- The, of, and, to, a, ... = ‘objects’
- ‘Size’ = word frequency
- Beep: (Important) CCDF and Zipf plots are related...


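Observe:
- $N P_{\ge}(x)$ = the number of objects with size at least $x$, where $N$ = total number of objects.
- If an object has size $x_r$, then $N P_{\ge}(x_r)$ is its rank $r$.
- So

$$
\begin{aligned}
x_r \propto r^{-\alpha} &= \left( N P_{\ge}(x_r) \right)^{-\alpha} \\
&\propto x_r^{(-\gamma+1)(-\alpha)} \quad \text{since } P_{\ge}(x) \sim x^{-\gamma+1}.
\end{aligned}
$$

We therefore have $1 = (-\gamma+1)(-\alpha)$ or:

$\alpha = \frac{1}{\gamma - 1}$

- A rank distribution exponent of $\alpha = 1$ corresponds to a size distribution exponent $\gamma = 2$.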
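The relation between the two exponents can be checked on synthetic sizes. This is a minimal sketch, assuming a pure power law with γ = 2, lower cutoff 1, and no upper cutoff, with evenly spaced quantiles standing in for random draws so the outcome is reproducible. The hypothetical helper `rank_size_slope` ranks the sizes from largest to smallest and fits log10 size against log10 rank, so the negated slope estimates α.

```python
import math

def rank_size_slope(sizes):
    """Least-squares slope of log10(size) against log10(rank), with rank 1 = largest."""
    ranked = sorted(sizes, reverse=True)
    xs = [math.log10(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log10(s) for s in ranked]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

gamma, n = 2.0, 10_000
# Inverse-CDF values at evenly spaced quantiles (an assumption standing in for sampling).
sizes = [(1 - (i + 0.5) / n) ** (-1 / (gamma - 1)) for i in range(n)]

alpha_fit = -rank_size_slope(sizes)
print(f"fitted alpha = {alpha_fit:.3f}, 1/(gamma - 1) = {1 / (gamma - 1):.3f}")
```

Changing γ in the sketch should move the fitted α along 1/(γ − 1), subject to the same caveats about fitting straight lines to rank plots of real data.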


The Don.
Extreme deviations in test cricket:

[Figure: axis with ticks 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

- Don Bradman’s batting average = 166% next best.

- That’s pretty solid.
- Later in the course: Understanding success—is the Mona Lisa like Don Bradman?


References I

[1]

[2] P. Bak, K. Christensen, L. Danon, and T. Scanlon. Unified scaling law for earthquakes. Phys. Rev. Lett., 88:178501, 2002.

[3] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–511, 1999.

[4] K. Christensen, L. Danon, T. Scanlon, and P. Bak. Unified scaling law for earthquakes. Proc. Natl. Acad. Sci., 99:2509–2513, 2002.

[5] P. Grassberger. Critical behaviour of the Drossel-Schwabl forest fire model. New Journal of Physics, 4:17.1–17.15, 2002.


References II

[6] B. Gutenberg and C. F. Richter. Earthquake magnitude, intensity, energy, and acceleration. Bull. Seism. Soc. Am., 499:105–145, 1942.

[7] M. E. J. Newman. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, pages 323–351, 2005.

[8] M. I. Norton and D. Ariely. Building a better America—One wealth quintile at a time. Perspectives on Psychological Science, 6:9–12, 2011.


References III

[9] D. J. d. S. Price. Networks of scientific papers. Science, 149:510–515, 1965.

[10] D. J. d. S. Price. A general theory of bibliometric and other cumulative advantage processes. J. Amer. Soc. Inform. Sci., 27:292–306, 1976.

[11] L. F. Richardson. Variation of the frequency of fatal quarrels with magnitude. J. Amer. Stat. Assoc., 43:523–546, 1949.

[12] H. A. Simon. On a class of skew distribution functions. Biometrika, 42:425–440, 1955.


References IV

[13] N. N. Taleb. The Black Swan. Random House, New York, 2007.

[14] G. K. Zipf. Human Behaviour and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, 1949.