
Fictionalized Historical Account of Statistical Inference

• Material developed largely for non-statistical audience
• Might seem a bit “under technical”
• But it can seem very strange at first
• Feedback would be appreciated


Statistical logic? (A non-technical introduction)

• It is how to make well-reasoned guesses
  – from what is known
  – about something that is not known
  – along with an explicit assessment of how good or valuable those reasoned guesses are

• Need not be technical or overly mathematical


Stealing an anecdote from physics

• Isaac Newton sitting under a tree
• When an apple fell on his head
• Prompting him to ponder what makes things fall
• Leading to the development of his theory of gravity
• Often loosely described as “discovering” gravity
• I’ll use a statistical version using Charles Darwin’s cousin Francis Galton and an orange tree
• (More almost true)


Stealing an anecdote from physics

• Francis Galton sitting under an orange tree
• When an orange fell on his head
• Prompting him to ponder what would be a well-reasoned guess at where it came from
• Not exactly where, but say within a foot?
• Can actually be described as him discovering statistical logic to get his well-reasoned guess


Best picture of an orange tree I could grab from the web


Physics of an orange tree

• Look at the drawing of an orange tree
  – notice the spread or distribution of the oranges
  – and the many branches that slope up (or down)
• Where an orange lands depends on
  – where it started to fall from and
  – how it haphazardly bounces through the branches

• (We will forget about the wind, birds, squirrels and farm tractor mishaps)
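The two ingredients above, where an orange starts and how it bounces, can be sketched as a tiny simulation. This is a minimal sketch, not anything from the original material, and its numbers are loudly hypothetical: positions are feet along a single line through the trunk, starting positions follow a normal (bell-shaped) spread with a standard deviation of 3 feet, and bouncing adds an independent normal drift with a standard deviation of 3 feet. `start_position` and `landing_position` are illustrative names.

```python
import random

# Hypothetical 1-D tree: positions are feet along a line through the trunk (0 = trunk).
PRIOR_SD = 3.0   # assumed spread of where oranges hang before they fall
BOUNCE_SD = 3.0  # assumed sideways drift picked up while bouncing through branches


def start_position(rng: random.Random) -> float:
    """Where an orange starts to fall from: more oranges near the trunk, fewer at the edge."""
    return rng.gauss(0.0, PRIOR_SD)


def landing_position(start: float, rng: random.Random) -> float:
    """Where it lands: its starting point plus haphazard bouncing (wind, birds ignored)."""
    return start + rng.gauss(0.0, BOUNCE_SD)


rng = random.Random(1889)  # fixed seed so the run is repeatable
for _ in range(5):
    s = start_position(rng)
    print(f"started at {s:6.2f} ft, landed at {landing_position(s, rng):6.2f} ft")
```

Each run prints five made-up falls. Because the two spreads add, landing positions end up more spread out than starting positions.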


Quick review

• What is known?
  – where Galton was sitting when he got hit
• What is not known?
  – where that orange fell from
• What did Galton want a reasoned guess of?
  – where that orange fell from

• What do you think he thought to do?


What Galton did

• Galton put a drum where he sat
• Then stood there watching carefully (along with his seven gardeners)
• Carefully noting where oranges fell from
• He recorded an orange’s initial position only if he heard that orange hit the drum
• (Not really, but something very similar that we will see later)


Noting where other oranges came from that would have hit him

• By doing this watching, listening and noting, he determined for sure where many other oranges came from that would have hit him

• He then took the area around all of those as likely locations for the orange that did hit him

• That is, he determined the subset of the initial locations that would have hit him and guessed the orange that did hit him came from the same area
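Galton’s watch-and-listen procedure is what today would usually be called rejection sampling: simulate many falls and keep the starting position only for those that land where he sat. A minimal sketch under hypothetical assumptions: positions are feet along one line through the trunk, starting positions and bouncing are normal with standard deviations of 3 feet, Galton sits 2 feet from the trunk, and the drum is 0.5 feet wide. `starts_that_would_hit` is an illustrative name.

```python
import random

# All numbers are hypothetical; positions are feet along a line through the trunk.
PRIOR_SD = 3.0          # spread of where oranges start to fall
BOUNCE_SD = 3.0         # sideways drift from bouncing through the branches
GALTON_AT = 2.0         # where Galton (and his drum) sat
DRUM_HALF_WIDTH = 0.25  # the drum covers GALTON_AT +/- 0.25 ft


def starts_that_would_hit(x, n_oranges, rng):
    """Watch n_oranges fall; note the start of each one heard hitting a drum at x."""
    kept = []
    for _ in range(n_oranges):
        start = rng.gauss(0.0, PRIOR_SD)             # where this orange started
        landing = start + rng.gauss(0.0, BOUNCE_SD)  # where it ended up
        if abs(landing - x) <= DRUM_HALF_WIDTH:      # heard it hit the drum
            kept.append(start)
    return kept


rng = random.Random(1889)
subset = starts_that_would_hit(GALTON_AT, 200_000, rng)
print(len(subset), "of 200,000 simulated oranges would have hit Galton")
print(f"their average starting position: {sum(subset) / len(subset):.2f} ft")
```

With these made-up numbers the average is pulled from Galton’s seat toward the trunk, where more oranges hang.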


A good reasoned guess

• Galton guessed that the orange that actually hit him was likely from around the same area as that subset that would have hit him

• And more likely from the middle of that area than from its edges
  – the locations are more crowded in the middle

• Why might that be a well-reasoned guess?
• How would you evaluate its “goodness”?
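The claim that the kept locations are more crowded in the middle can be seen with a crude text histogram. Same hypothetical set-up: 1-D positions in feet, normal spreads of 3 feet for starting positions and for bouncing, Galton at 2 feet, and a drum 0.5 feet wide. The 1-foot bins and the one-`#`-per-50-oranges scale are arbitrary display choices.

```python
import math
import random
from collections import Counter

PRIOR_SD, BOUNCE_SD = 3.0, 3.0          # hypothetical spreads (feet)
GALTON_AT, DRUM_HALF_WIDTH = 2.0, 0.25  # hypothetical seat and drum

rng = random.Random(7)
subset = []  # starting positions of oranges that would have hit Galton
for _ in range(200_000):
    start = rng.gauss(0.0, PRIOR_SD)
    if abs(start + rng.gauss(0.0, BOUNCE_SD) - GALTON_AT) <= DRUM_HALF_WIDTH:
        subset.append(start)

# Count kept starting positions in 1-foot bins (bin b covers b to b + 1 ft).
bins = Counter(math.floor(s) for s in subset)
for b in range(-6, 8):
    print(f"{b:>3} to {b + 1:>3} ft | {'#' * (bins[b] // 50)}")
```

The tallest bars sit in the middle of the subset and thin out toward both edges.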


Assessing goodness of a guess

• Let’s say you take the inner ninety percent of that distribution of orange locations

• That is, throw away the outer ten percent and take the area around what remains as your guess

• How good (in what sense) would that be?
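One way to read “throw away the outer ten percent” is to drop 5% from each end of the sorted starting positions and keep the range of the middle 90%. Splitting the ten percent evenly between the two ends is an assumption here, as are the model numbers (1-D positions in feet, normal spreads of 3 feet, Galton at 2 feet, a 0.5-foot drum). `galton_subset` and `central_area` are illustrative names.

```python
import random

PRIOR_SD, BOUNCE_SD = 3.0, 3.0          # hypothetical spreads (feet)
GALTON_AT, DRUM_HALF_WIDTH = 2.0, 0.25  # hypothetical seat and drum


def galton_subset(x, n_oranges, rng):
    """Starting positions of simulated oranges that land on a drum centered at x."""
    kept = []
    for _ in range(n_oranges):
        start = rng.gauss(0.0, PRIOR_SD)
        if abs(start + rng.gauss(0.0, BOUNCE_SD) - x) <= DRUM_HALF_WIDTH:
            kept.append(start)
    return kept


def central_area(starts, keep=0.90):
    """Drop the outer (1 - keep) share of starts, half from each end; return (low, high)."""
    ordered = sorted(starts)
    cut = round(len(ordered) * (1 - keep) / 2)  # how many to drop from each end
    return ordered[cut], ordered[len(ordered) - 1 - cut]


# Negative positions are on the other side of the trunk from Galton.
low, high = central_area(galton_subset(GALTON_AT, 200_000, random.Random(11)))
print(f"90% guess: the orange started between {low:.2f} ft and {high:.2f} ft")
```

The resulting area is not centered on Galton’s seat; like the subset itself, it leans toward the trunk.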


It would be 90% correct

• If 100 blindfolded people sat under the tree
• In different positions, and they waited until they were hit on the head, and they all did that tedious Galton guessing to find their area

• 90 areas would include where the orange that hit them came from and 10 would not

• Well, that is what would usually happen
• (Can repeat this over 1000 similar trees)
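The blindfolded-people check can be simulated directly. A minimal sketch under hypothetical assumptions: 1-D positions in feet, each person’s orange drawn from a normal spread of starting positions (standard deviation 3 feet), the person sitting wherever it lands after normal bouncing (standard deviation 3 feet), and each person’s area taken as the middle 90% of simulated starts that land on a 0.5-foot drum at their seat. To keep it fast, one pool of simulated falls is sorted by landing position and reused for everybody; that shortcut, the 1000-person count, and skipping anyone whose drum catches fewer than 20 simulated oranges are choices made for this sketch.

```python
import bisect
import random

PRIOR_SD, BOUNCE_SD = 3.0, 3.0  # hypothetical spreads (feet)
DRUM_HALF_WIDTH = 0.25          # hypothetical drum: seat +/- 0.25 ft


def make_pool(n, rng):
    """Simulate n falls once; return (landings, starts), both ordered by landing."""
    pairs = []
    for _ in range(n):
        start = rng.gauss(0.0, PRIOR_SD)
        pairs.append((start + rng.gauss(0.0, BOUNCE_SD), start))
    pairs.sort()
    return [p[0] for p in pairs], [p[1] for p in pairs]


def galton_area(x, landings, starts, keep=0.90):
    """Middle `keep` share of starts whose landing hit a drum at x; None if too few."""
    lo = bisect.bisect_left(landings, x - DRUM_HALF_WIDTH)
    hi = bisect.bisect_right(landings, x + DRUM_HALF_WIDTH)
    kept = sorted(starts[lo:hi])
    if len(kept) < 20:
        return None
    cut = round(len(kept) * (1 - keep) / 2)
    return kept[cut], kept[len(kept) - 1 - cut]


rng = random.Random(2010)
landings, starts = make_pool(100_000, rng)
hits = tried = 0
for _ in range(1000):                           # 1000 blindfolded people
    true_start = rng.gauss(0.0, PRIOR_SD)       # their orange, drawn from the tree's spread
    x = true_start + rng.gauss(0.0, BOUNCE_SD)  # where it hit them
    area = galton_area(x, landings, starts)
    if area is None:                            # seat too far out to catch enough oranges
        continue
    tried += 1
    if area[0] <= true_start <= area[1]:
        hits += 1
print(f"{hits} of {tried} areas contained the true starting point ({hits / tried:.1%})")
```

Under these assumptions the printed share should come out close to 90%, give or take ordinary simulation wobble; a different seed gives a slightly different count.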


Identifying today’s statistical jargon

• The initial distribution of the positions of the oranges
  – in Bayesian statistics is called the prior distribution

• The haphazard bouncing of oranges through branches
  – is called the data generation model or data model

• Reasoning about the original position of the orange given where one was hit on the head by it
  – in Bayesian statistics is called getting and using the posterior distribution (post-data)


The Bayesian intervals are 90% correct on AVERAGE

Only when averaged over all these initial locations will 90% of areas be correct and 10% incorrect

• In the evaluation above, the starting location of the orange that hit someone is randomly selected from the prior

• If we fix that location, the percent correct varies
• For instance, near the middle of the tree it might be 97%; at the edge of the tree it might be 50%
• With more oranges in the middle, it averages to 90%
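Fixing where the orange starts, instead of drawing it from the tree’s spread, shows the percent correct moving around. A minimal sketch under hypothetical assumptions: 1-D positions in feet, normal spreads of 3 feet for starting positions and for bouncing, a 0.5-foot drum, and one sorted pool of simulated falls shared by everybody. The three fixed positions (at the trunk, 3 feet out, 6 feet out) and 500 people per position are arbitrary choices, and the printed percentages depend entirely on these made-up numbers; they are not the 97% and 50% figures above. `percent_correct_at` is an illustrative name.

```python
import bisect
import random

PRIOR_SD, BOUNCE_SD = 3.0, 3.0  # hypothetical spreads (feet)
DRUM_HALF_WIDTH = 0.25          # hypothetical drum: seat +/- 0.25 ft


def make_pool(n, rng):
    """Simulate n falls once; return (landings, starts), both ordered by landing."""
    pairs = []
    for _ in range(n):
        start = rng.gauss(0.0, PRIOR_SD)
        pairs.append((start + rng.gauss(0.0, BOUNCE_SD), start))
    pairs.sort()
    return [p[0] for p in pairs], [p[1] for p in pairs]


def galton_area(x, landings, starts, keep=0.90):
    """Middle `keep` share of starts whose landing hit a drum at x; None if too few."""
    lo = bisect.bisect_left(landings, x - DRUM_HALF_WIDTH)
    hi = bisect.bisect_right(landings, x + DRUM_HALF_WIDTH)
    kept = sorted(starts[lo:hi])
    if len(kept) < 20:
        return None
    cut = round(len(kept) * (1 - keep) / 2)
    return kept[cut], kept[len(kept) - 1 - cut]


def percent_correct_at(true_start, landings, starts, n_people, rng):
    """Share of Galton areas that contain a *fixed* starting position."""
    hits = tried = 0
    for _ in range(n_people):
        x = true_start + rng.gauss(0.0, BOUNCE_SD)  # same start, different bounces
        area = galton_area(x, landings, starts)
        if area is not None:
            tried += 1
            if area[0] <= true_start <= area[1]:
                hits += 1
    return hits / tried


rng = random.Random(97)
landings, starts = make_pool(100_000, rng)
for where in (0.0, 3.0, 6.0):  # trunk, mid-canopy, near the edge
    share = percent_correct_at(where, landings, starts, 500, rng)
    print(f"orange always starting {where:.0f} ft out: {share:.0%} of areas correct")
```

Under these assumptions the trunk position comes out well above 90% and the 6-foot position well below it; averaging over where oranges actually start, mostly near the trunk, brings the overall share back to about 90%.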


An annotated picture from Galton, F. (1889) Natural Inheritance.


Historical perspective

• Reference: Stigler, Stephen M. 2010. Darwin, Galton and the statistical enlightenment. Journal of the Royal Statistical Society: Series A 173(3):469-482.


The only statistical formulas that are immediately “helpful” here

• We get p(a) and p(a given x) from simulation
• From the definition of conditional probability:
  – p(a given x) = p(a and x)/p(x)
  – p(a and x) can be written as p(x) * p(a given x) or as p(a) * p(x given a)
  – so p(a given x) = p(a) * p(x given a)/p(x)
• Interest: How did p(a) change to p(a given x)?
• Ratio is p(a given x)/p(a) = p(x given a)/p(x)
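The ratio identity can be checked on simulated counts. A minimal sketch under hypothetical assumptions: 1-D positions in feet, normal spreads of 3 feet for starting positions and for bouncing, Galton at 2 feet with a 0.5-foot drum. Event a is taken to be “the orange started within 1 foot of the trunk” and event x “the orange would have hit Galton’s drum”; both are choices made for illustration. Because every probability is estimated from the same counts, the two ratios agree up to floating-point rounding, which is the identity in counting form.

```python
import random

PRIOR_SD, BOUNCE_SD = 3.0, 3.0          # hypothetical spreads (feet)
GALTON_AT, DRUM_HALF_WIDTH = 2.0, 0.25  # hypothetical seat and drum

rng = random.Random(1877)
n = 200_000
n_a = n_x = n_ax = 0
for _ in range(n):
    start = rng.gauss(0.0, PRIOR_SD)
    landing = start + rng.gauss(0.0, BOUNCE_SD)
    a = abs(start) <= 1.0                            # a: started within 1 ft of the trunk
    x = abs(landing - GALTON_AT) <= DRUM_HALF_WIDTH  # x: would have hit Galton
    if a:
        n_a += 1
    if x:
        n_x += 1
    if a and x:
        n_ax += 1

p_a, p_x = n_a / n, n_x / n
p_a_given_x = n_ax / n_x  # among oranges that would hit him, share that started near the trunk
p_x_given_a = n_ax / n_a  # among oranges that started near the trunk, share that would hit him

print(f"p(a) = {p_a:.3f}, p(a given x) = {p_a_given_x:.3f}")
print(f"p(a given x)/p(a) = {p_a_given_x / p_a:.4f}")
print(f"p(x given a)/p(x) = {p_x_given_a / p_x:.4f}")
```

Under these made-up numbers p(a given x) comes out larger than p(a): learning that an orange would have hit Galton makes “started near the trunk” more likely.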