Applied Bayesian Statistics, webpages.math.luc.edu/~ebalderama/bayes_resources/slides/intro.pdf



Applied Bayesian Statistics
STAT 388/488

Dr. Earvin Balderama

Department of Mathematics & Statistics
Loyola University Chicago

August 29, 2017


Last edited August 21, 2017 by Earvin Balderama <[email protected]>


Introduction

Course Info

STAT 388/488 – Applied Bayesian Statistics: http://math.luc.edu/~ebalderama/bayes


A motivating example (see http://math.luc.edu/~ebalderama/bayes_resources/handouts/Eddy_What_is_Bayesian.pdf)

Alice and Bob play a casino game; the first player to reach 6 points wins.

1 Before the game starts, the casino rolls a ball randomly onto a pool table (which Alice and Bob can’t see) until it comes to a complete stop. Its position is marked and remains fixed for the duration of the game.

2 Each point is awarded by rolling another ball randomly onto the table:

If the ball stops to the left of the initial mark, Alice is awarded the point.
If the ball stops to the right of the initial mark, Bob is awarded the point.

Alice and Bob are told nothing except who is awarded each point.
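The game can be sketched as a short simulation. This is a minimal sketch under assumptions the slide does not state: the table is modelled as the unit interval [0, 1], every roll is uniform on it, and the function name `play_game` is hypothetical. Under these assumptions the hidden mark’s position is exactly θ, the probability that Alice wins any single point, and the players only ever see the sequence of winners.

```python
import random

def play_game(target=6, seed=None):
    """Simulate one game; returns (mark, alice_points, bob_points, history).

    Assumption: the table is modelled as [0, 1] and every roll is uniform,
    so the hidden mark position equals theta = P(Alice wins a point).
    """
    rng = random.Random(seed)
    mark = rng.random()            # hidden initial ball position
    alice = bob = 0
    history = []                   # the only information the players receive
    while alice < target and bob < target:
        if rng.random() < mark:    # ball stops left of the mark
            alice += 1
            history.append("A")
        else:                      # ball stops right of the mark
            bob += 1
            history.append("B")
    return mark, alice, bob, history

mark, alice, bob, history = play_game(seed=42)
print(f"theta = {mark:.3f}, final score {alice}-{bob}, points: {''.join(history)}")
```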


First, some questions

Let θ be the probability that Alice is awarded a point. Before the game starts,

1 What’s your best guess about θ?
2 What’s the probability that θ is greater than a half?

Suppose the game is being played, and the score is now Alice 5, Bob 3.

1 What’s your best guess about θ now?
2 What’s the probability that θ is greater than a half now?


Frequentist approach

The frequentist approach requires the (theoretical) notion of long-run frequency distributions: uncertainty is quantified in terms of repeating the sampling process many times.

The parameters are fixed and unknown.
The sample (data) is random.
Probability statements are only made about the randomness in the data.


Frequentist approach

Sample statistic

A statistic is a numerical summary of a sample. For example, the sample mean X̄ is a statistic, and it is an estimator of the population mean µ.

Here, one would never say “P(µ > 0) = 0.50”.

Sampling distribution

The distribution of a sample statistic that arises from repeating the process that generated the data many times.

Here, one would never say “the distribution of µ is Normal(5.3,0.8)”.
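A sampling distribution can be approximated by literally repeating the data-generating process. A minimal sketch, assuming a Normal population with µ = 5 and σ = 2 and samples of size n = 25 (illustrative values, not from the slides; `sampling_distribution_of_mean` is a hypothetical name). The randomness lives in X̄, not in µ, which stays fixed throughout.

```python
import random
import statistics

def sampling_distribution_of_mean(mu, sigma, n, reps, seed=0):
    """Simulate the sampling distribution of the sample mean X-bar.

    Each replication draws a fresh sample of size n from an assumed
    Normal(mu, sigma) population and records its mean.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = sampling_distribution_of_mean(mu=5.0, sigma=2.0, n=25, reps=10_000)
print(round(statistics.fmean(means), 2))   # close to mu = 5
print(round(statistics.stdev(means), 2))   # close to sigma / sqrt(n) = 0.4
```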


Frequentist approach

95% confidence interval

An interval constructed from the data by a procedure that, if we repeated the process that generated the data many times and computed an interval each time, would contain the true parameter value 95% of the time.

Here, one would never say “the probability that µ is in the interval (4.2, 5.6) is 0.95”.
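The “95% of the time” refers to the procedure, which a coverage simulation makes concrete. A minimal sketch, assuming a Normal population with known σ (so the interval is X̄ ± 1.96σ/√n) and the same illustrative values µ = 5, σ = 2, n = 25; `ci_coverage` is a hypothetical name.

```python
import random
import statistics

def ci_coverage(mu=5.0, sigma=2.0, n=25, reps=10_000, seed=1):
    """Fraction of 95% z-intervals (sigma assumed known) that contain mu.

    Each replication regenerates the data, builds
    X-bar +/- 1.96 * sigma / sqrt(n), and checks whether mu is inside.
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)   # approximately 1.96
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(ci_coverage())   # close to 0.95
```

Any single computed interval either contains µ or it does not; only the long-run fraction over repetitions is 0.95.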

p-value

The probability, computed assuming the null hypothesis H0 is true, of observing a test statistic at least as extreme as the one observed in the sample if we repeated the process that generated the data many times.

Here, one would never say “the probability that H0 is true is 0.027”.
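For the game, a p-value can be computed exactly or by repeating the data generation under H0. A hedged sketch, assuming the score Alice 5, Bob 3 is treated as Y ~ Binomial(8, θ), H0: θ = 0.5, and a one-sided alternative θ > 0.5 (all choices made here for illustration); both function names are hypothetical.

```python
import math
import random

def binomial_p_value_one_sided(y, n, theta0=0.5):
    """Exact P(Y >= y | theta = theta0) for Y ~ Binomial(n, theta0)."""
    return sum(math.comb(n, k) * theta0**k * (1 - theta0)**(n - k)
               for k in range(y, n + 1))

def simulated_p_value(y, n, theta0=0.5, reps=50_000, seed=2):
    """Repeat the data-generating process under H0 and count how often
    the statistic is at least as extreme as the observed y."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        y_sim = sum(rng.random() < theta0 for _ in range(n))
        if y_sim >= y:
            extreme += 1
    return extreme / reps

print(binomial_p_value_one_sided(5, 8))  # 0.36328125 (= 93/256)
print(simulated_p_value(5, 8))           # close to the exact value
```

The result is a statement about hypothetical repeated data under H0, not the probability that H0 is true.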


Frequentist approach

Examples of repeatable data generation:

Sometimes it’s hard to imagine repeating the data generation:


Some debate about the merits of the p-value

http://www.nature.com/news/scientific-method-statistical-errors-1.14700

http://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/

http://fivethirtyeight.com/features/science-isnt-broken/

http://www.scientificamerican.com/article/scientists-perturbed-by-loss-of-stat-tools-to-sift-research-fudge-from-fact/

http://www.tandfonline.com/doi/pdf/10.1080/01973533.2015.1012991


What are Frequentist answers to these questions?

Before the game starts,

1 What’s your best guess about θ?
2 What’s the probability that θ is greater than a half?

After collecting observations,

1 What’s your best guess about θ now?
2 What’s the probability that θ is greater than a half now?

Bonus question:

1 What’s Bob’s probability of winning?
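One common frequentist answer after observing the score is a point estimate plus a plug-in prediction; this is a hedged sketch, not the slides’ own solution. It assumes the score Alice 5, Bob 3, uses the maximum likelihood estimate θ̂ = y/n, and plugs it into Bob’s winning probability (1 − θ)³, since Bob must win the next three points before Alice wins one. Questions about P(θ > 1/2) have no frequentist answer, because θ is treated as fixed rather than random.

```python
from fractions import Fraction

target = 6
alice, bob = 5, 3                         # score from the earlier slide
theta_hat = Fraction(alice, alice + bob)  # MLE of theta: y / n
bob_needs = target - bob                  # Bob needs 3 consecutive points
# Plug-in estimate: treat theta_hat as if it were the true theta.
p_bob_plugin = (1 - theta_hat) ** bob_needs
print(theta_hat, float(p_bob_plugin))     # 5/8 0.052734375
```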


Bayesian approach

The Bayesian approach consists of finding the most credible values of a parameter, conditional on the data: uncertainty is described using probability distributions that are updated as data is observed.

The true parameter values are fixed and unknown, but their uncertainty is described probabilistically, so they are treated as random variables.
The sample (data) is considered fixed.
Probability statements express degree of belief and uncertainty in the unknown parameters.


Bayesian learning

Prior distribution, f(θ)

The uncertainty distribution of θ, before observing the data.

Posterior distribution, f(θ | y)

The uncertainty distribution of θ, after observing the data.

Bayes’ Rule

Provides the rule for updating the prior:

$$f(\theta \mid y) = \frac{f(y \mid \theta)\, f(\theta)}{f(y)}$$

Posterior ∝ Likelihood × Prior
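Bayes’ rule can be checked numerically on a grid of θ values. A minimal sketch, assuming the game’s Binomial likelihood with y = 5 of n = 8 points to Alice and a Uniform(0,1) prior; `grid_posterior` is a hypothetical helper, and the denominator f(y) is approximated by a Riemann sum over the grid. Under a uniform prior, f(y) = 1/(n + 1), so it should come out near 1/9.

```python
import math

def grid_posterior(y, n, grid_size=2001):
    """Posterior f(theta | y) on a grid via Bayes' rule, with a Uniform(0, 1) prior."""
    step = 1.0 / (grid_size - 1)
    thetas = [i * step for i in range(grid_size)]
    prior = [1.0 for _ in thetas]                                   # f(theta)
    likelihood = [math.comb(n, y) * t**y * (1 - t)**(n - y) for t in thetas]  # f(y | theta)
    unnormalised = [l * p for l, p in zip(likelihood, prior)]       # likelihood x prior
    f_y = sum(unnormalised) * step                                  # f(y): integral over theta
    posterior = [u / f_y for u in unnormalised]
    return thetas, posterior, f_y

thetas, post, f_y = grid_posterior(y=5, n=8)
post_mean = sum(t * p for t, p in zip(thetas, post)) * (thetas[1] - thetas[0])
print(round(f_y, 4), round(post_mean, 4))   # approximately 0.1111 and 0.6
```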


Bayesian learning

Likelihood function, f(y | θ)

Distribution of the data given θ.

This function is created by choosing a reasonable probability model for the data, then writing the “probability of the data” under this model. It is regarded as a function of the model’s parameters (remember, the data is considered fixed!).
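The two readings of f(y | θ) can be contrasted in a few lines. A small sketch, assuming the Binomial model used for the game with y = 5 of n = 8 (helper names are hypothetical): for a fixed θ it is a probability distribution over y, while with the data fixed it is a function of θ that ranks parameter values, peaking at y/n, and is not itself a density in θ.

```python
import math

def binomial_pmf(y, n, theta):
    """f(y | theta) for y ~ Binomial(n, theta)."""
    return math.comb(n, y) * theta**y * (1 - theta)**(n - y)

def make_likelihood(y, n):
    """Hold the observed data fixed and return L(theta) = f(y | theta)."""
    return lambda theta: binomial_pmf(y, n, theta)

# For a fixed theta, f(y | theta) is a distribution over the data: it sums to 1 over y.
total = sum(binomial_pmf(k, 8, 0.3) for k in range(9))

# With the data fixed (Alice 5 of 8 points), L is a function of theta only.
# It is not a density in theta; it ranks values of theta, peaking at y / n.
L = make_likelihood(5, 8)
grid = [i / 1000 for i in range(1001)]
theta_mle = max(grid, key=L)
print(total, theta_mle)
```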



Back to example

The probability of Alice being awarded a point is a random variable θ ∈ [0,1].

Usually, we form a prior by assigning (varying levels of) probability across all possible values of θ. If we have no relevant prior information, we might use an uninformative prior such as

θ ∼ Uniform(0,1)

If y is the number of points awarded to Alice out of n points played, the likelihood may be

y |θ ∼ Binomial(n, θ)

The posterior then turns out to be

θ |y ∼ Beta(y + 1,n − y + 1)
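The Beta(y + 1, n − y + 1) result can be checked by simulating the game itself: draw θ from the Uniform(0,1) prior (the mark), simulate n points, and keep θ only when Alice’s count matches the observed y. This rejection-sampling sketch is an illustration, not the slides’ derivation; it assumes y = 5 and n = 8, and `rejection_posterior` is a hypothetical name. The accepted θ values should have mean (y + 1)/(n + 2) = 0.6 and the Beta(6, 4) variance 24/1100.

```python
import random
import statistics

def rejection_posterior(y, n, draws=100_000, seed=3):
    """Sample from f(theta | y) by simulating the game and keeping matches.

    theta ~ Uniform(0, 1) is the prior; each of the n points goes to Alice
    with probability theta. Keeping only draws whose simulated count equals
    the observed y leaves samples from the posterior.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        theta = rng.random()                               # prior draw
        y_sim = sum(rng.random() < theta for _ in range(n))
        if y_sim == y:
            accepted.append(theta)
    return accepted

samples = rejection_posterior(y=5, n=8)
a, b = 5 + 1, 8 - 5 + 1                                    # Beta(y + 1, n - y + 1)
print(len(samples), statistics.fmean(samples), a / (a + b))
```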


Specifying a Beta prior distribution

A more flexible prior is θ ∼ Beta(a,b), where a and b control the shape. When a = b = 1, this specifies the uniform prior. The posterior then turns out to be

θ |y ∼ Beta(y + a,n − y + b).

$$E(\theta \mid y) = \frac{y + a}{n + a + b}, \qquad V(\theta \mid y) = \frac{(y + a)(n - y + b)}{(n + a + b)^2\,(n + a + b + 1)}$$

A prior is conjugate with respect to the likelihood if the posterior distribution is in the same family as the prior. Thus, the Beta prior is a conjugate prior for the Binomial likelihood.
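Because the update is conjugate, it reduces to adding counts to the prior parameters. A minimal sketch of the Beta-Binomial update and the posterior mean and variance formulas, assuming integer (or `Fraction`) values of a and b so the results are exact fractions; `beta_binomial_update` is a hypothetical name.

```python
from fractions import Fraction

def beta_binomial_update(a, b, y, n):
    """Conjugate update: Beta(a, b) prior + Binomial(n, theta) data with y successes.

    Returns the posterior parameters with its mean and variance, using
    E = (y + a) / (n + a + b) and
    V = (y + a)(n - y + b) / ((n + a + b)^2 (n + a + b + 1)).
    """
    a_post, b_post = a + y, b + n - y
    total = a_post + b_post                       # equals n + a + b
    mean = Fraction(a_post, total)
    var = Fraction(a_post * b_post, total**2 * (total + 1))
    return a_post, b_post, mean, var

# Uniform prior (a = b = 1) with Alice 5, Bob 3:
print(beta_binomial_update(1, 1, 5, 8))   # (6, 4, Fraction(3, 5), Fraction(6, 275))
```

Larger a and b act like extra prior “points” and pull the posterior mean toward a/(a + b).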


Back to example

The score is Alice 5, Bob 3.

1 What’s your best guess about θ now?
2 What’s the probability that θ is greater than a half now?

Bonus question:

1 What’s Bob’s expected probability of winning?
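Under the uniform prior, all three questions have closed-form answers from the Beta(6, 4) posterior. A hedged sketch: it assumes points are independent given θ, reads “Bob’s expected probability of winning” as the posterior mean of (1 − θ)³ (Bob must win the next three points before Alice wins one), and uses the standard integer-parameter identity linking the Beta CDF to a Binomial tail; `beta_cdf_int` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def beta_cdf_int(x, alpha, beta):
    """P(theta <= x) for theta ~ Beta(alpha, beta) with positive integer parameters,
    using the identity P(theta <= x) = P(Binomial(alpha + beta - 1, x) >= alpha)."""
    m = alpha + beta - 1
    return sum(comb(m, k) * x**k * (1 - x)**(m - k) for k in range(alpha, m + 1))

alpha, beta = 5 + 1, 3 + 1                 # Beta(y + 1, n - y + 1) with y = 5, n = 8
post_mean = Fraction(alpha, alpha + beta)  # best guess (posterior mean)
p_gt_half = 1 - beta_cdf_int(Fraction(1, 2), alpha, beta)
# Bob needs 3 straight points: E[(1 - theta)^3 | y] under Beta(alpha, beta)
bob_wins = Fraction(1)
for j in range(3):
    bob_wins *= Fraction(beta + j, alpha + beta + j)
print(post_mean, p_gt_half, bob_wins)      # 3/5 191/256 1/11
```

The answer 1/11 ≈ 0.091 is larger than the plug-in value (3/8)³ ≈ 0.053 obtained by fixing θ at 5/8, because it averages (1 − θ)³ over the posterior uncertainty in θ.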


Advantages of Bayesian approach

Bayesian concepts (arguably) easier to interpret than frequentist ideas.
Able to incorporate scientific/expert knowledge via the prior.
In some cases the computing is easier (hierarchical models).
Easy to incorporate data from multiple sources.
Sample size reduction via prior or adaptive trial design.
Imputing missing data comes naturally.
FDA document on the use of Bayesian methods: http://www.fda.gov/RegulatoryInformation/Guidances/ucm071072.htm


Disadvantages of Bayesian approach

Picking a prior can be subjective.
Slow computation time for complex problems.
Less common/familiar.
Nonparametric methods are challenging.
Frequentist properties are desirable.


Frequentist vs Bayesian
