Areas of Countries and Benford’s Law


Model of a Dynamical System Proposed by V. Arnold

Alex Janke (1), Xiangyu Wang (2)

(1) University of Michigan, Ann Arbor, MI

(2) Peking University, Beijing, China

August 12, 2011

A. Janke, X. Wang Areas of Countries and Benford’s Law

Benford’s Law

Numbers from many real-life data sets, particularly those dominated by exponential processes, have leading digits distributed in a non-uniform way.

Let d ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9} be the leading digit of a number. A set of numbers (or a random variable) satisfies Benford’s law if d occurs with frequency (probability) given by the following:

$$P(d) = \log_{10}(d + 1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
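As a quick check of the formula, the nine probabilities can be tabulated directly. This is a minimal sketch; the function name `benford_probabilities` is ours and not part of the talk.

```python
import math

def benford_probabilities():
    """Return {d: P(d)} for leading digits d = 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_probabilities()
for d, p in probs.items():
    print(f"{d}: {p:.3f}")
```

The probabilities telescope to log10(10) = 1, so they form a proper distribution on the nine digits.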

Benford’s Law

The following shows the frequency of leading digits predicted by Benford’s law.

Benford’s Law

The following shows the leading-digit frequencies for the populations of 237 countries. The dots denote the frequencies predicted by Benford’s law.

Rationalizing Benford’s Law

The set $\{\alpha^n \mid n \in \mathbb{Z}\}$, with $\log_{10}(\alpha) \notin \mathbb{Q}$, satisfies Benford’s law. This is a consequence of the equidistribution theorem.

A continuous random variable X whose base-10 logarithm has fractional part uniformly distributed on [0, 1) will satisfy Benford’s law.

A continuous random variable X with a lognormal distribution will satisfy Benford’s law increasingly well as its second moment approaches infinity.
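The first statement can be illustrated with α = 2, for which log10(2) is irrational. The sketch below uses exact integer arithmetic to read off leading digits; the choice of 3000 powers and the helper name are assumptions made for illustration.

```python
import math
from collections import Counter

def leading_digit_frequencies_of_powers(base, count):
    """Leading-digit frequencies of base**0, ..., base**(count - 1), using exact integers."""
    counts = Counter()
    power = 1
    for _ in range(count):
        counts[int(str(power)[0])] += 1  # first decimal digit of the exact integer
        power *= base
    return {d: counts[d] / count for d in range(1, 10)}

freqs = leading_digit_frequencies_of_powers(2, 3000)
max_gap = max(abs(freqs[d] - math.log10(1 + 1 / d)) for d in range(1, 10))
print(f"largest deviation from Benford: {max_gap:.4f}")
```

Only non-negative exponents are used here; the equidistribution argument applies equally to negative ones.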

The Dynamical System Proposed by V. Arnold

Consider N countries with areas $A_1, \ldots, A_N$ drawn from some distribution such that

$$\sum_{i=1}^{N} A_i = 1.$$

At each iteration 1, ..., n, randomly select two countries to merge together and one country to split into two equal parts.

Experimental evidence suggests that for large N and n the areas of countries satisfy the first-digit law, irrespective of the initial distribution.
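One possible way to simulate this system is sketched below. The bookkeeping is an assumption on our part: the merged country is stored in the first selected slot, and the two halves of the split country occupy the other two slots, so that N stays fixed. The names `arnold_step` and `simulate` are illustrative.

```python
import random

def arnold_step(areas, rng):
    """One iteration: merge two random countries and split a third into equal halves."""
    i, j, k = rng.sample(range(len(areas)), 3)  # three distinct countries
    merged = areas[i] + areas[j]
    half = areas[k] / 2
    # Slot i holds the merged country; slots j and k hold the two halves,
    # so the number of countries and the total area are unchanged.
    areas[i], areas[j], areas[k] = merged, half, half

def simulate(areas, iterations, seed=0):
    """Run the given number of iterations on a copy of the initial areas."""
    rng = random.Random(seed)
    areas = list(areas)
    for _ in range(iterations):
        arnold_step(areas, rng)
    return areas

N = 1000
final = simulate([1 / N] * N, 10_000)
print(len(final), sum(final))
```

Both the number of countries and the total area are conserved by construction, which is what keeps the normalization intact across iterations.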

Experimental Evidence

Let N = 1000 and let every initial entry of A equal 1/N. We generate the entries of A after n = 10000 iterations.

[Figure: fractional part of log10 of country area versus country index, for indices 0 to 1000.]

Experimental Evidence

Let N = 1000 and let the initial entries of A be drawn from an exponential distribution with λ = 1000, normalized so that the entries sum to one. We generate the entries of A after n = 10000 iterations.

[Figure: log10 of country area versus country index, for indices 0 to 1000.]

[Figure: fractional part of log10 of country area versus country index, for indices 0 to 1000.]
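The two experiments can be sketched as follows. This is not the original experiment code: the seed, the slot bookkeeping in `simulate`, and the use of scientific-notation formatting to read the leading digit are all assumptions made for illustration.

```python
import math
import random

def simulate(areas, iterations, rng):
    """Merge two random countries and halve a third, repeated `iterations` times."""
    areas = list(areas)
    for _ in range(iterations):
        i, j, k = rng.sample(range(len(areas)), 3)
        areas[i], areas[j], areas[k] = areas[i] + areas[j], areas[k] / 2, areas[k] / 2
    return areas

def log10_fractional_part(x):
    """Fractional part of log10(x), in [0, 1); the quantity plotted in the figures."""
    return math.log10(x) % 1.0

def leading_digit(x):
    """First significant digit of a positive float, read from its scientific notation."""
    return int(f"{x:e}"[0])

def leading_digit_frequencies(values):
    counts = [0] * 10
    for v in values:
        counts[leading_digit(v)] += 1
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(2011)  # arbitrary seed, chosen only for reproducibility
N, n = 1000, 10_000
uniform_start = [1 / N] * N
raw = [rng.expovariate(1000) for _ in range(N)]
total = sum(raw)
exponential_start = [a / total for a in raw]  # normalize so the areas sum to one

results = {}
for name, start in [("uniform", uniform_start), ("exponential", exponential_start)]:
    final = simulate(start, n, rng)
    results[name] = leading_digit_frequencies(final)
    print(name, {d: round(f, 3) for d, f in results[name].items()})
```

Comparing the printed frequencies with log10(1 + 1/d) gives a numerical counterpart to the plots of fractional parts.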

Formalizing the Model

Let $A \in \mathbb{R}^N$ be a vector such that $\sum_{i=1}^{N} A_i = 1$. At each iteration, three distinct entries $(A_i, A_j, A_k)$ of A are chosen at random to form $V = (A_i, A_j, A_k)^T$. This vector is multiplied on the left by the following matrix:

$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & 0 & \frac{1}{2} \end{bmatrix}$$

We are interested in the evolution of the distribution function for the coordinates of A. Brackets on the matrix are in honor of Bif.

Density Functions

Suppose $A = (x_1, \ldots, x_N)^T$ initially. Then the initial density function is given by the following formula:

$$f_0(t) = \frac{1}{N} \sum_{i=1}^{N} \delta(t - x_i)$$

Suppose that coordinates $x_{j_1}, x_{j_2}, x_{j_3}$ are chosen. The new coordinates are given by $x'_{j_i} = \sum_{k=1}^{3} m_{ik} x_{j_k}$. Then the density function after a single iteration is given by the following formula:

$$f_1(t) = \frac{1}{N} \sum_{i=1}^{N} \delta(t - x_i) + \frac{1}{N} \sum_{i=1}^{3} \delta(t - x'_{j_i}) - \frac{1}{N} \sum_{i=1}^{3} \delta(t - x_{j_i})$$

Expectation of Discrete Transitions

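By linearity of the Laplace transform, the expectation of the moment generating function is just the moment generating function of the expected distribution.

$$\mathcal{L}\{f_0(t)\} = \frac{1}{N} \sum_{i=1}^{N} e^{-x_i s}$$

$$E[\mathcal{L}\{f_1(t)\}] = \frac{1}{N(N-1)(N-2)} \sum_{j_1, j_2, j_3} \left( \frac{1}{N} \sum_{i=1}^{N} e^{-x_i s} + \frac{1}{N} \sum_{i=1}^{3} e^{-\sum_{k=1}^{3} m_{ik} x_{j_k} s} - \frac{1}{N} \sum_{i=1}^{3} e^{-x_{j_i} s} \right)$$

Here the outer sum runs over all ordered triples of distinct indices $(j_1, j_2, j_3)$, of which there are N(N-1)(N-2).

We then consider the expected discrete transition as the difference $E[\mathcal{L}\{f_1(t)\}] - \mathcal{L}\{f_0(t)\}$. We can identify a stable distribution for this by setting the difference equal to zero.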

Stable Solution

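Let $\mathcal{L}\{f_0(t)\} = G_0(s)$. Some algebraic magic reveals:

$$E[G_1(s) \mid G_0(s)] - G_0(s) = \frac{1}{N} \sum_{i=1}^{3} \left( \prod_{j=1}^{3} G_0(m_{ij} s) - G_0(s) \right) \qquad [1.1]$$

Here we will substitute in the values from our matrix.

$$E[G_1(s) \mid G_0(s)] - G_0(s) = \frac{1}{N} \left( G_0^2(s) + 2 G_0\left(\frac{s}{2}\right) - 3 G_0(s) \right) \qquad [1.2]$$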
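The substitution can be written out row by row. The sketch below assumes that [1.1] is read as a product over the columns j of each row of the matrix with rows (1, 1, 0), (0, 0, 1/2), (0, 0, 1/2), uses $G_0(0) = 1$, and treats the two merged areas as independent draws, which we take to hold only approximately, for large N.

```latex
\begin{align*}
\text{row 1: } (1, 1, 0) &\;\Rightarrow\; G_0(s)\, G_0(s)\, G_0(0) = G_0^2(s) \\
\text{row 2: } (0, 0, \tfrac{1}{2}) &\;\Rightarrow\; G_0(0)\, G_0(0)\, G_0\!\left(\tfrac{s}{2}\right) = G_0\!\left(\tfrac{s}{2}\right) \\
\text{row 3: } (0, 0, \tfrac{1}{2}) &\;\Rightarrow\; G_0(0)\, G_0(0)\, G_0\!\left(\tfrac{s}{2}\right) = G_0\!\left(\tfrac{s}{2}\right) \\[4pt]
\sum_{i=1}^{3} \Bigl( \prod_{j=1}^{3} G_0(m_{ij} s) - G_0(s) \Bigr)
  &= G_0^2(s) + 2\, G_0\!\left(\tfrac{s}{2}\right) - 3\, G_0(s)
\end{align*}
```

Dividing by N gives the right-hand side of [1.2].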

Stable Solution

Suppose at each iteration we follow the conditional expectation, with $E[G_1(s) \mid G_0(s)] = G_1(s)$. Then the stable solution to [1.2] is:

$$G_\infty(s) = \sum_{i=0}^{\infty} a_i s^i, \quad a_0 = 1, \quad a_1 = -1, \quad a_n = \frac{\sum_{i=1}^{n-1} a_i a_{n-i}}{1 - 2^{1-n}} \qquad [1.3]$$

Theorem 1: $G_\infty(s)$ has a positive radius of convergence.

Proof: The coefficients of this power series grow no faster than exponentially; they grow as the Catalan numbers do. Thus the power series has a positive radius of convergence.
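The recurrence in [1.3] is easy to evaluate exactly. The following sketch computes the first coefficients with rational arithmetic and checks, coefficient by coefficient, that they make $G^2(s) + 2G(s/2) - 3G(s)$ vanish. Function names are ours, and the recurrence is applied for n ≥ 2 only, since $a_0$ and $a_1$ are given.

```python
from fractions import Fraction

def stable_coefficients(order):
    """Coefficients a_0..a_order of the stable solution via the recurrence [1.3]."""
    a = [Fraction(1), Fraction(-1)]
    for n in range(2, order + 1):
        convolution = sum(a[i] * a[n - i] for i in range(1, n))
        a.append(convolution / (1 - Fraction(2) ** (1 - n)))
    return a[: order + 1]

def stationarity_residual(a, n):
    """Coefficient of s**n in G^2(s) + 2 G(s/2) - 3 G(s); zero for a stable solution."""
    square = sum(a[i] * a[n - i] for i in range(n + 1))
    return square + 2 * a[n] / Fraction(2) ** n - 3 * a[n]

a = stable_coefficients(8)
print([str(c) for c in a[:5]])
print(all(stationarity_residual(a, n) == 0 for n in range(len(a))))
```

Printing `abs(a[n]) ** (1 / n)` for increasing n is one informal way to look at the growth rate used in the proof of Theorem 1.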

Continuous Approximation

We define a continuous approximation for these discrete transitions. Define G(s, t) by $G(s, 0) = G_0(s)$ and $t = n/N$. Let N → ∞ and n → ∞ to define the evolution equation by the following:

$$\partial_t G(s, t) = G^2(s, t) + 2 G\left(\frac{s}{2}, t\right) - 3 G(s, t) \qquad [1.4]$$

Theorem 2: $\| G_n(s) - G(s, t) \| < \frac{C_t}{N}$ for some $C_t > 0$, where $G_n(s)$ denotes the discrete iterate obtained by following the conditional expectation for n = tN steps.

Proof: The difference is bounded like the error of the Riemann approximation for the integral of the evolution equation.

Convergence to Stable Solution

Theorem 3: The stable solution $G_\infty(s)$ [1.3] is an attractor of the evolution equation [1.4] with a basin containing all analytic $G_0(s)$.

Proof: Write $G(s, t) = \sum_{n=0}^{\infty} (a_n - b_n(t)) s^n$, where $a_n$ is defined as in the stable solution. It can be shown that $b_n(t)$ decreases exponentially with t. This follows from the contractive properties of this mapping.
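The decay of $b_n(t)$ can be observed numerically by evolving the power-series coefficients of G(s, t) under [1.4]. The sketch below is an illustration rather than a proof: it assumes the initial condition $G_0(s) = e^{-s}$ (all areas equal, with mean area normalized to 1), truncates the series at order 5, and uses a plain forward Euler step, all of which are our choices.

```python
import math

def coefficient_derivative(c):
    """Right-hand side of [1.4] acting on the coefficients c[n] of s**n in G(s, t)."""
    return [
        # s**n coefficient of G^2, plus that of 2 G(s/2), minus that of 3 G
        sum(c[i] * c[n - i] for i in range(n + 1)) + (2.0 ** (1 - n) - 3.0) * c[n]
        for n in range(len(c))
    ]

def evolve(c, t_end, dt=0.01):
    """Forward Euler integration of the coefficient system from t = 0 to t_end."""
    c = list(c)
    for _ in range(round(t_end / dt)):
        c = [x + dt * dx for x, dx in zip(c, coefficient_derivative(c))]
    return c

order = 5
initial = [(-1) ** n / math.factorial(n) for n in range(order + 1)]  # G_0(s) = exp(-s)

# Stable coefficients a_n from the recurrence [1.3], for comparison.
stable = [1.0, -1.0]
for n in range(2, order + 1):
    stable.append(sum(stable[i] * stable[n - i] for i in range(1, n)) / (1 - 2.0 ** (1 - n)))

for t in (0, 5, 10, 40):
    c = evolve(initial, t)
    gap = max(abs(x - y) for x, y in zip(c, stable))
    print(f"t = {t:>2}: max |b_n(t)| = {gap:.3e}")

final = evolve(initial, 40)
```

The coefficient of $s^n$ depends only on coefficients of order at most n, so truncating the series does not change the evolution of the retained terms.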

Random Discrete Transitions

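Now let’s assume that at each step the areas of countries change randomly, instead of along the expectation.

$$E[G_n(s) \mid G_{n-1}(s)] - G_{n-1}(s) = \frac{1}{N} \left( G_{n-1}(s)^2 + 2 G_{n-1}\left(\frac{s}{2}\right) - 3 G_{n-1}(s) \right) \qquad [2.1]$$

If we fix the value of s, $\{G_n(s)\}$ is a sequence of random variables. Note that $E(E[X \mid Y]) = E(X)$ and $E(X^2) = E(X)^2 + \mathrm{Var}(X)$.

$$E(G_n(s)) - E(G_{n-1}(s)) = \frac{1}{N} \left( E(G_{n-1}(s))^2 + 2 E\left(G_{n-1}\left(\frac{s}{2}\right)\right) - 3 E(G_{n-1}(s)) + \mathrm{Var}(G_{n-1}(s)) \right) \qquad [2.2]$$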

Random Discrete Transitions

If this procedure really converges to a unique distribution F that has Laplace transform $\mathcal{L}_F(s)$, then

$$G_n(s) \to \mathcal{L}_F(s) \text{ a.s. and hence } E(G_n(s)) \to \mathcal{L}_F(s) \text{ a.s.}$$

Noticing that $\mathcal{L}_F(s)$ is a constant, by the Continuous Mapping theorem, we know

$$\mathrm{Var}(G_n(s)) \to 0 \text{ a.s.} \qquad [2.3]$$

Then equation [2.2] becomes identical to the expected discrete transition [1.2], whose stable solution is [1.3].

$$E(G_n(s)) \to G_\infty(s) \text{ a.s. and therefore } \mathcal{L}_F(s) = G_\infty(s) \text{ a.s.}$$

In other words, if this random procedure converges, it must converge to a distribution whose Laplace transform is identical to the stable solution obtained above.

Convergence

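Let $X^{(i)}_{N,n}$ represent the area of country i at the nth step, where we have N countries in total. Here areas are taken to be rescaled so that the mean area is 1, matching $a_1 = -1$ in [1.3]. Via more algebraic magic, we have:

$$E[X^{(1)}_{N,n+1} \mid \mathcal{F}_{N,n}] = \left(1 - \frac{2}{N}\right) X^{(1)}_{N,n} + \frac{2}{N} \qquad [2.4]$$

Let $\mathcal{F}_{N,n}$ be a filtration defined by:

$$\mathcal{F}_{N,n} = \sigma\left(\mathcal{F}_{N,n-1}, X^{(1)}_{N,n}, \ldots, X^{(N)}_{N,n}\right)$$

Then by the following transformation:

$$Z^{(i)}_{N,n} = \frac{X^{(i)}_{N,n} - 1}{\left(1 - \frac{2}{N}\right)^n}$$

We have by [2.4] that:

$$E[Z^{(1)}_{N,n+1} \mid \mathcal{F}_{N,n}] = Z^{(1)}_{N,n} \qquad [2.5]$$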
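The step from [2.4] to [2.5] is short enough to write out. The sketch below takes [2.4] to be the identity $E[X^{(1)}_{N,n+1} \mid \mathcal{F}_{N,n}] = (1 - 2/N) X^{(1)}_{N,n} + 2/N$ and uses only the definition of Z.

```latex
\begin{align*}
E\left[ Z^{(1)}_{N,n+1} \mid \mathcal{F}_{N,n} \right]
  &= \frac{E\left[ X^{(1)}_{N,n+1} \mid \mathcal{F}_{N,n} \right] - 1}{\left(1 - \frac{2}{N}\right)^{n+1}} \\
  &= \frac{\left(1 - \frac{2}{N}\right) X^{(1)}_{N,n} + \frac{2}{N} - 1}{\left(1 - \frac{2}{N}\right)^{n+1}} \\
  &= \frac{\left(1 - \frac{2}{N}\right) \left( X^{(1)}_{N,n} - 1 \right)}{\left(1 - \frac{2}{N}\right)^{n+1}}
   = Z^{(1)}_{N,n}
\end{align*}
```

With n = tN, the normalizing factor satisfies $(1 - 2/N)^{tN} \to e^{-2t}$ as N → ∞, which is where the factor $e^{2t}$ in the limit statements comes from.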

Convergence

Fix the ratio t = n/N and let n → ∞, N → ∞. This implies:

$$E\left( Z^{(1)+}_{N,n} \right) < +\infty \quad \text{if } n = t \times N$$

Then by the Martingale Convergence theorem, we know

$$Z^{(1)}_{N,n} \xrightarrow{P} Z_t \quad \text{and} \quad X^{(1)}_{N,n} \xrightarrow{P} 1 + \frac{Z_t}{e^{2t}} \qquad (n \to +\infty,\ N \to +\infty,\ n = Nt)$$

where $Z_t$ is a random variable with finite mean. We can then establish that the family $\{ Z_t / e^{2t} \}$ will converge in probability to a unique distribution $Z_\infty$ as t → ∞.

Convergence

If we can show that, as t → ∞, the areas tend to be pairwise independent or weakly dependent, then we can consider the empirical distribution given by:

$$F_N(x) = \frac{1}{N} \sum_{i=1}^{N} I\{ X^{(i)}_{N,n} < x \}$$

Then, by application of the Law of Large Numbers, we can show that:

$$F_N(x) \xrightarrow{d} Z_\infty$$

This implies that the areas of countries will converge to some distribution.

Conjecture: Final Distribution is Exponential to a Power

Our conjecture is that the final distribution function is:

$$F(x) = 1 - e^{-\lambda x^{1/b}}$$

where the density function is given by:

$$f(x) = \frac{\lambda}{b} \, x^{\frac{1}{b} - 1} e^{-\lambda x^{1/b}}$$

The coefficients of the moment generating function are:

$$a_n = \frac{(-1)^n \, \Gamma(nb + 1)}{\lambda^{nb} \, n!}$$

The numerically approximated value for b is 1.64677.
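The talk does not say how b was estimated. One way to arrive at a value of this size, offered here purely as an assumption, is to match the first two nontrivial coefficients of the stable solution: for $F(x) = 1 - e^{-\lambda x^{1/b}}$ the n-th moment is $\Gamma(nb + 1) / \lambda^{nb}$, so $a_1 = -1$ fixes $\lambda^b = \Gamma(b + 1)$, and $a_2 = 2$ then requires $\Gamma(2b + 1) / \Gamma(b + 1)^2 = 4$. The sketch solves that equation by bisection.

```python
import math

def second_coefficient_gap(b):
    """log(Gamma(2b + 1) / Gamma(b + 1)**2) - log(4); zero when a_2 = 2 given a_1 = -1."""
    return math.lgamma(2 * b + 1) - 2 * math.lgamma(b + 1) - math.log(4)

def solve_b(lo=1.0, hi=2.0, tol=1e-12):
    """Bisection on [1, 2]: the Gamma ratio equals 2 at b = 1 and 6 at b = 2."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if second_coefficient_gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b = solve_b()
lam = math.gamma(b + 1) ** (1 / b)  # from a_1 = -1, i.e. Gamma(b + 1) / lam**b = 1
print(f"b = {b:.5f}, lambda = {lam:.5f}")
```

The root can be compared with the reported 1.64677; matching higher coefficients $a_3, a_4, \ldots$ against [1.3] would be a further test of the conjecture that this sketch does not attempt.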

Conjecture: Final Distribution is Exponential to a Power

The red line is composed of points drawn from our conjectured distribution. The blue line shows the country areas from an experiment with n = 500000 and N = 10000 and a uniform initial distribution.

Outlook

Can we formalize our procedure to establish weak pairwise dependence between countries for t → ∞?

Can it be shown that this limiting distribution is in fact exponential with the parameter we estimated numerically?
