Academy of Management Montreal, 6 August 2010 Empirical Exploration of Complexity in Human Systems:...

Academy of Management

Montreal, 6 August 2010

Empirical Exploration of

Complexity in Human Systems:

Data Collection & Interpretation Techniques

Power law statistics and Pareto Science

Pierpaolo Andriani

Durham Business School, University of Durham, UK

Sampling & inference

LIKELIHOOD distribution:

PROB (data given population)

INFERENTIAL distribution:

PROB (population given data)

[Diagram: population (mean, variance, etc.; size is infinite) → take a sample → sample (sample mean, sample variance, etc.; sample size) → infer a value → population]

Statistical inference: drawing conclusions about the whole population on the basis of a sample

Precondition for statistical inference: a sample is randomly selected from the population (= probability sample)

Representative agent links population to sample level and allows reduction of population complexity to single agent complexity
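As a minimal sketch of the sampling-and-inference loop described above, the following Python draws a probability sample from a simulated Gaussian population and infers the population mean from it. The population parameters (mean 170, standard deviation 10), the sample size and the function names are illustrative assumptions, not values from the talk.

```python
import math
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
POP_MEAN = 170.0
POP_SD = 10.0

def draw_sample(n, seed=0):
    """Take a random (probability) sample of size n from the population."""
    rng = random.Random(seed)
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]

def infer_mean(sample):
    """Return the sample mean and its standard error."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation
    return mean, sd / math.sqrt(len(sample))

sample = draw_sample(1000)
estimate, std_err = infer_mean(sample)
print(f"inferred population mean: {estimate:.2f} +/- {std_err:.2f}")
```

In this Gaussian setting the sample mean is a stable stand-in for the population, which is what the representative-agent reduction relies on.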

From Starbuck: The production of knowledge (2006)

• Consensus favoring use of null-hypothesis significance tests affords a clear example of paradigm stability. Although methodologists have been trying to discourage the use of these tests since the 1950s, the tests have remained very prevalent, and there is no sign that social scientists are shifting to other criteria. …. Hubbard and Ryan (2000: 678) concluded: ‘It seems inconceivable to admit that a methodology as bereft of value as SST (statistical significance tests) has survived, as the centerpiece of inductive inference no less, more than four decades of criticism in the psychological literature’.

p. 77

From Starbuck: The production of knowledge (2006)


• Choosing two variables utterly at random, a researcher has 2-to-1 odds of finding a significant correlation on the first try, and 24-to-1 odds of finding a significant correlation within three tries. … the main inference I drew from these statistics was that the social sciences are drowning in statistically significant but meaningless noise. Because the differences and correlations that social scientists test have distributions quite different from those assumed in hypothesis test, social scientists are using tests that assign statistical significance to confounding background relationships. Because social scientists equate statistical significance with meaningful relationships, they often mistake confounding background relationships for theoretically important information. One result is that social science research creates a cloud of statistically significant differences and correlations that not only have no real meaning but also impede scientific progress by obscuring the truly meaningful relationships.

p. 49


• I began to think of statistical tests as arcane rituals that demonstrate membership in an esoteric subculture

p. 18

A tale of two worlds

| World | Gaussian | Paretian |
|---|---|---|
| Statistics | Bell distribution (finite variance distributions) | Pareto (infinite variance) |
| Relations btw UoA | Independence (or weak interdependence) | Interdependence |
| Scientific ‘approach’ | Linear science (principle of superposition) | Non-linear science |
| Scaling property | Phenomena have proper scale | Phenomena are fractal |
| Unit of analysis | ‘Things’, entities | Relations |
| Property of world | Closure | Openness |
| Philosophical origin | Parmenides, Plato, Newton | Heraclitus, Aristotle |
| Variability | Limited | Unbounded |

[Figure: Gaussian vs. Paretian networks. Gaussian: exponential network, with a bell curve distribution of node linkages (number of nodes against number of links). Paretian: scale-free network, with a power-law distribution of node linkages (number of nodes against number of links, also shown on log-log axes). Annotations: "Typical node", "No large number". From Barabási/Bonabeau, Scientific American, May 2003.]
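The contrast between the two network types can be sketched with a toy growth model. This is a minimal sketch, not the model from the Scientific American article: each new node adds exactly one link (an assumption made for brevity), and the function name and network size are illustrative. With uniform attachment the degree distribution decays exponentially; with degree-proportional attachment a few large hubs emerge.

```python
import random
from collections import Counter

def grow_network(n_nodes, preferential, seed=0):
    """Grow a network one node at a time; each new node adds one link.

    preferential=True : target chosen with probability proportional to degree.
    preferential=False: target chosen uniformly at random.
    Returns a list holding the degree (number of links) of every node.
    """
    rng = random.Random(seed)
    degree = [1, 1]      # start from two linked nodes
    endpoints = [0, 1]   # each link contributes both of its endpoints
    for new in range(2, n_nodes):
        if preferential:
            target = rng.choice(endpoints)  # degree-proportional choice
        else:
            target = rng.randrange(new)     # uniform choice among existing nodes
        degree.append(1)
        degree[target] += 1
        endpoints.extend([new, target])
    return degree

uniform = grow_network(20000, preferential=False)
scale_free = grow_network(20000, preferential=True)
print("largest hub, uniform attachment:     ", max(uniform))
print("largest hub, preferential attachment:", max(scale_free))
print("nodes per degree (preferential):", sorted(Counter(scale_free).items())[:5])
```

Plotting the degree counts on log-log axes would be the natural next step for comparing the two shapes.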

By assuming finite variability and compressing data around mean/variance, the Gaussian approach ignores or downplays extreme events on the right hand side of the distribution, but also ignores or downplays tiny initiating events on the left hand side of the distribution

http://www.zazzle.com/statisticians_do_it_within_3_standard_deviations_tshirt-235087605979353103

Rationality, stock market and the butterfly effect

Growth-related power laws - ratio imbalances

1. Surface/volume law

Organisms; villages: In organisms, surfaces absorbing energy grow by the square but the organism grows by the volume, resulting in an imbalance (Galileo 1638, Carneiro 1987); fractals emerge to bring surface/volume back into balance. West and Brown (1997) show that several phenomena in biology such as metabolic rate, height of trees, life span, etc. are described by allometric power laws whose exponents are multiples of ±¼. The cause is a fractal distribution of resources. Allometric power laws hold across 27 orders of magnitude (of mass).

2. Least effort

Language; transition: Word frequency is a function of ease of usage by both speaker/writer and listener/reader; this gives rise to Zipf’s (power) Law (1949); now found to apply to language, firms, and economies in transition (Ferrer i Cancho & Solé, 2003; Dahui et al., 2005; Ishikawa, 2005; Podobnik et al., 2006).

3. Hierarchical modularity

Growth unit connectivity: As cell fission occurs by the square, connectivity increases by n(n–1)/2, producing an imbalance between the gains from fission vs. the cost of maintaining connectivity; consequently organisms form modules so as to reduce the cost of connectivity; Simon argued that adaptive advantage goes to “nearly decomposable” systems (Simon, 1962; Bykoski, 2003). Complex adaptive systems: Heterogeneous agents seeking out other agents to copy/learn from so as to improve fitness generate networks; there is some probability of positive feedback such that some networks become groups, some groups form larger groups & hierarchies (Kauffman, 1969, 1993; Holland, 1995).

Combinations

4. Interactive breakage theory

Wealth; mass extinctions/explosions: A few independent elements having multiplicative effects produce lognormals; if the elements become interactive with positive feedback loops materializing, a power law results; based on Kolmogorov’s “breakage theory” of wealth creation (1941).

5. Combination theory

# of exponentials; complexity: Combining multiple exponential or lognormal distributions, or increasing the complexity of components (subtasks, processes), results in a power law distribution (Mandelbrot, 1963; West & Deering, 1995; Newman, 2005).

6. Interacting fractals

Food web; firm & industry size, heartbeats: The fractal structure of a species is based on the food web (Pimm, 1982), which is a function of the fractal structure of predators and niche resources (Preston 1950; Halloy, 1998; Solé & Alonso, 1998; Camacho & Solé, 2001; Kostylev & Erlandsson, 2001, West, 2006).

Positive feedback loops

7. Preferential attachment

Nodes; gravitational attraction: Given newly arriving agents into a system, larger nodes with an enhanced propensity to attract agents will become disproportionately even larger, resulting in the power law signature (Yule, 1925; Young, 1928; Arthur, 1988; Barabási, 2000).

8. Irregularity-generated gradients

Coral growth; blockages: Starting with a random, insignificant irregularity, coupled with positive feedback, the initial irregularity increases its effect. This explains the growth of coral reefs and blockages changing the course of rivers (Juarrero, 1999; Turner, 2000; Barabási, 2005). Diffusion-limited aggregation (DLA). See also “niche constructionism” in biology (Odling-Smee, 2003).

Contextual effects

9. Phase transitions

Turbulent flows: Exogenous energy impositions cause autocatalytic, interaction effects and percolation transitions at a specific energy level—the 1st critical value—such that new interaction groupings form with a Pareto distribution (Bénard, 1901; Prigogine, 1955; Stauffer, 1985; Newman, 2005).

10. Self-organized criticality

Sandpiles; forests; heartbeats: Under constant tension of some kind (gravity, ecological balance, delivery of oxygen), some systems reach a critical state where they maintain stasis by preservative behaviors—such as sand avalanches, forest fires, changing heartbeat rate—which vary in size of effect according to a power law (Bak et al., 1987; Drossel & Schwabl, 1992; Bak, 1996).

11. Niche proliferation

Markets: When production, distribution, and search become cheap and easily available, markets develop a long tail of proliferating niches containing fewer customers; they become Paretian with mass-market products at one end and a long tail of niches at the other (Anderson, 2006).
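To make one of the mechanisms above concrete, mechanism 10 (self-organized criticality) can be sketched with a toy sandpile in the spirit of Bak et al. (1987). This is a minimal sketch under stated assumptions: the grid size, the number of grains, the toppling threshold of 4 and the function name are illustrative choices, not parameters from the talk.

```python
import random

def sandpile_avalanches(size=20, n_grains=10000, seed=0):
    """Drop grains one at a time on a size x size grid.

    A cell holding 4 or more grains topples, sending one grain to each of its
    four neighbours; grains pushed over the edge leave the system.
    Returns the final grid and, for each dropped grain, the avalanche size
    (number of topplings it triggered).
    """
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    sizes = []
    for _ in range(n_grains):
        r, c = rng.randrange(size), rng.randrange(size)
        grid[r][c] += 1
        unstable = [(r, c)]
        topplings = 0
        while unstable:
            i, j = unstable.pop()
            if grid[i][j] < 4:
                continue  # already relaxed by an earlier toppling
            grid[i][j] -= 4
            topplings += 1
            if grid[i][j] >= 4:
                unstable.append((i, j))
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= ni < size and 0 <= nj < size:
                    grid[ni][nj] += 1
                    if grid[ni][nj] >= 4:
                        unstable.append((ni, nj))
        sizes.append(topplings)
    return grid, sizes

grid, sizes = sandpile_avalanches()
nonzero = sorted(s for s in sizes if s > 0)
print("avalanches:", len(nonzero))
print("median size:", nonzero[len(nonzero) // 2], "largest:", max(nonzero))
```

Most drops cause no avalanche or a tiny one, while a few cascade across much of the grid; a histogram of `sizes` on log-log axes is the usual way to inspect the tail.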

Gaussian – heights of individuals

Tallest man (Robert Pershing Wadlow) 272 cm

Shortest man (He Pingping) 74 cm

Ratio: 272/74 ≈ 3.7

Source: Lada Adamic - http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html

Source: Bak (1996) “How Nature Works”

Krugman on the Zipf law: “we are unused to seeing regularities this exact in economics – it is so exact that I find it spooky” (1996, p. 40)

Paretian: city size

Largest city (Mumbai, India) population 13,922,125

Smallest city (Hum, Croatia) population 23

Ratio: 13,922,125/23 ≈ 605,310
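The two largest-to-smallest ratios, for heights and for city sizes, can be reproduced directly from the figures quoted on the slides. The variable names below are illustrative; the numbers are the ones given in the talk.

```python
# Figures quoted on the slides.
tallest_cm, shortest_cm = 272, 74              # Wadlow, He Pingping
largest_city, smallest_city = 13_922_125, 23   # Mumbai, Hum

height_ratio = tallest_cm / shortest_cm
city_ratio = largest_city / smallest_city

print(f"height ratio:    {height_ratio:.1f}")   # 3.7
print(f"city-size ratio: {city_ratio:,.0f}")    # 605,310
```

The Gaussian variable spans less than one order of magnitude, the Paretian one almost six.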

Two tails of a power law


Gutenberg-Richter Law

[Figure: number of earthquakes per year, Nc (Earthquakes/Year), against earthquake magnitude (mb) ~ Log E, with the extreme events tail and the small events tail marked.]

Main properties of Paretian distributions

• Moments: $\Pr[X \ge x] = k\,x^{-\alpha}$

• Largest value:

– maximum value depends on size of sample

– highly skewed distribution (80/20 Rule)

• Scaling property: $p(bx) = g(b)\,p(x)$ for any $b$
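A short derivation shows how the scaling property and the sample-size dependence of the largest value follow from the tail formula. It assumes the formula holds over the whole range of interest and takes $p(x)$ to be the corresponding density; the symbol $x_n$ (the level exceeded on average once in $n$ draws) is notation introduced here, not in the slides.

```latex
\begin{align*}
p(x) &= -\frac{d}{dx}\Pr[X \ge x] = \alpha k\, x^{-(\alpha+1)} \\
p(bx) &= \alpha k\,(bx)^{-(\alpha+1)} = b^{-(\alpha+1)}\,p(x)
  \quad\Rightarrow\quad g(b) = b^{-(\alpha+1)} \\
n\,\Pr[X \ge x_n] &= 1
  \quad\Rightarrow\quad x_n = (k\,n)^{1/\alpha}
\end{align*}
```

Rescaling $x$ changes only a multiplicative constant, so the distribution has no characteristic scale, and the typical largest observation $x_n$ keeps growing as the sample size $n$ grows.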

Moments of distributions

• 1st: average

– Representative?

– Stable?

• 2nd: variance

– Finite or unbounded?

– Stable?

• 3rd: Skewness

• 4th: Kurtosis

[Figure: number of AOL visitors to other websites in 1997*]

* Lada Adamic, Zipf, Power-laws, and Pareto - a ranking tutorial, http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
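The questions about whether the first two moments are stable can be explored with simulated data. This is a minimal sketch using synthetic samples, not the AOL data: the tail exponent of 1.5, the sample sizes and the helper name are assumptions chosen for illustration.

```python
import random
import statistics

def largest_share_of_squares(values):
    """Fraction of the sum of squared deviations due to the single largest one."""
    mean = statistics.fmean(values)
    squares = [(v - mean) ** 2 for v in values]
    return max(squares) / sum(squares)

rng = random.Random(0)
n = 100_000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
paretian = [rng.paretovariate(1.5) for _ in range(n)]  # alpha = 1.5 (assumed)

for name, data in (("Gaussian", gaussian), ("Paretian", paretian)):
    for size in (1_000, 10_000, 100_000):
        part = data[:size]
        print(f"{name:9s} n={size:>7,d}  mean={statistics.fmean(part):8.3f}  "
              f"variance={statistics.pvariance(part):12.3f}")
    print(f"{name:9s} largest observation's share of the variance: "
          f"{largest_share_of_squares(data):.4f}")
```

In the Gaussian sample the estimates settle quickly and no single observation matters; in the Paretian sample the variance estimate is driven by a handful of extreme values and need not settle as the sample grows.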

Largest value

• Financial markets

Central limit theorem doesn’t apply. No convergence to the mean, no central tendency.

The world shows an unlimited and irreducible stock of surprises!

Scalability

Scalability in financial markets

Traditional statistics assume a bell-shaped distribution, with a typical scale (mean) and rapidly decaying tails.

Power-law distributions show no mean (scale-free) and exhibit long fat tails (infinite variance). A PL explores the maximum dynamic range of diversity of the variable, limited only by size of network and agent.

Neo-classical economics and equilibrium-based management theories assume normal distributions and descriptive/behavioral parameters gathering around means. Extreme events are very rare and therefore negligible.

In the power-law case, extreme events are more frequent and their magnitude is disproportionately bigger than in the bell distribution case.
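The difference in how fast the tails decay can be illustrated by comparing exceedance probabilities. This is a rough sketch: the Gaussian level is measured in standard deviations and the Paretian level in multiples of the minimum value, so the comparison is only indicative, and the exponent of 2 and the function names are assumptions.

```python
import math

def gaussian_tail(z):
    """Pr[Z >= z] for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def pareto_tail(x, alpha=2.0, x_min=1.0):
    """Pr[X >= x] = (x / x_min) ** -alpha for x >= x_min."""
    return (x / x_min) ** -alpha

for level in (2, 5, 10):
    print(f"level {level:>2}: Gaussian {gaussian_tail(level):.2e}   "
          f"Paretian {pareto_tail(level):.2e}")
```

At level 10 the Gaussian probability is vanishingly small while the Paretian one is still 1 in 100, which is why extreme events cannot be treated as negligible in the second case.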

Which approach to statistics?

In a Gaussian world:

Challenge: manage the population

– How: reduce population to the representative agent and define variance (of population)

– Manage around mean and variance

Change: gradualism

– Extreme events (EEs) are exceedingly rare and can be treated as perturbations (system restores equilibrium after transient)

Scale-free theories

– Don’t exist in Gaussian systems

In a Paretian world:

Challenge: manage the frontier

– Identify outliers and manage the tails (together with the bulk) of the distribution

– Manage the tails

Change: extreme events

– EEs arise in the tails and determine the industry’s next structure

Scale-free theories

– The growth of most systems follows a set of scaling trends that link tiny initiating events with more significant or even extreme outcomes.
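Why managing around the mean differs from managing the tails can be seen from how much of the total sits in the top of the distribution. The sketch below uses synthetic samples: the Gaussian mean and standard deviation, and the Paretian exponent of 1.16 (roughly the value associated with an 80/20 split), are assumptions, as is the helper name.

```python
import random

def top_share(values, fraction=0.2):
    """Share of the total held by the largest `fraction` of observations."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(0)
n = 100_000
gaussian = [rng.gauss(100.0, 15.0) for _ in range(n)]   # assumed mean and sd
paretian = [rng.paretovariate(1.16) for _ in range(n)]  # assumed alpha

print(f"Gaussian world: top 20% hold {top_share(gaussian):.0%} of the total")
print(f"Paretian world: top 20% hold {top_share(paretian):.0%} of the total")
print(f"Paretian world: top 1% hold {top_share(paretian, 0.01):.0%} of the total")
```

In the Gaussian sample the top fifth holds only slightly more than a fifth of the total, so the representative agent is a fair summary; in the Paretian sample most of the total sits in the tail.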

The danger of averages

Thank you

Any questions?
