Bayesian Scientific Computing, Spring 2013 (N. Zabaras)
Hypothesis Testing The Bayesian Way
Prof. Nicholas Zabaras
Materials Process Design and Control Laboratory
Sibley School of Mechanical and Aerospace Engineering
101 Frank H. T. Rhodes Hall
Cornell University
Ithaca, NY 14853-3801
Email: [email protected]
URL: http://mpdc.mae.cornell.edu/
December 27, 2013
Introduction to Bayesian Statistics
Reverend Thomas Bayes (ca. 1702-1761). His sole probability paper, "Essay Towards Solving a Problem in the Doctrine of Chances", was published posthumously in 1763.
References
C. P. Robert, The Bayesian Choice: From Decision-Theoretic Motivations to Computational Implementation, Springer-Verlag, NY, 2001 (online resource).
A. Gelman, J. B. Carlin, H. S. Stern and D. B. Rubin, Bayesian Data Analysis, Chapman and Hall/CRC Press, 2nd Edition, 2003.
J. M. Marin and C. P. Robert, The Bayesian Core, Springer-Verlag, 2007 (online resource).
D. Sivia and J. Skilling, Data Analysis: A Bayesian Tutorial, Oxford University Press, 2006.
B. Vidakovic, Bayesian Statistics for Engineering, online course at Georgia Tech.
Additional References with links are provided in the lecture slides.
Contents
Hypothesis testing the Bayesian way
Examples of parametric Bayesian models
Examples of Bayesian prediction
Sequential nature of Bayesian inference, with a Gaussian example
Bayes vs. MLE (the limit of large data sets)
Example: Bayes and the Poisson model
Hypothesis testing in a Bayesian framework (integration in parameter space)
Summary: Bayesian point estimates
Hypothesis Testing the Bayesian Way
Consider two hypotheses in coin flipping:
the coin is fair, with prior $\pi(h_1)$;
the coin always produces tails, with prior $\pi(h_2)$.
We flip the coin 5 times and obtain data x = {HTHTT}.
Inference: we want to assess the validity of the two hypotheses.
Likelihood $f(x \mid h_i)$:
$$f(x \mid h_1) = \left(\tfrac{1}{2}\right)^5, \qquad f(x \mid h_2) = 0$$
Posterior:
$$\frac{\pi(h_1 \mid x)}{\pi(h_2 \mid x)} = \frac{f(x \mid h_1)\,\pi(h_1)}{f(x \mid h_2)\,\pi(h_2)} \longrightarrow \infty$$
Since $f(x \mid h_2) = 0$ (the data contain heads), all posterior mass falls on $h_1$: here there is no effect of the prior.
Hypothesis Testing the Bayesian Way
Consider two hypotheses in coin flipping:
the coin is fair, with prior $\pi(h_1)$;
the coin always produces tails, with prior $\pi(h_2)$.
We flip the coin 5 times and obtain data x = {TTTTT}.
Inference: we want to assess the validity of the two hypotheses.
Likelihood $f(x \mid h_i)$:
$$f(x \mid h_1) = \left(\tfrac{1}{2}\right)^5 = \tfrac{1}{32}, \qquad f(x \mid h_2) = 1$$
Posterior:
$$\frac{\pi(h_1 \mid x)}{\pi(h_2 \mid x)} = \frac{f(x \mid h_1)\,\pi(h_1)}{f(x \mid h_2)\,\pi(h_2)} = \frac{1}{32}\,\frac{\pi(h_1)}{\pi(h_2)}$$
The data (evidence) point to `tails', but the posterior inferences also depend strongly on the priors!
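As a quick numerical check (an addition, not part of the original slides), here is a minimal Python sketch of the two-hypothesis calculation, using the priors and likelihood values from the two slides above:

```python
def posterior_h1(prior_h1, lik_h1, lik_h2):
    """Posterior probability of h1 via Bayes' rule for two hypotheses."""
    prior_h2 = 1.0 - prior_h1
    evidence = lik_h1 * prior_h1 + lik_h2 * prior_h2   # m(x)
    return lik_h1 * prior_h1 / evidence

# Data x = {HTHTT}: f(x|h1) = (1/2)^5, f(x|h2) = 0 (h2 cannot produce a head)
print(posterior_h1(0.5, 0.5**5, 0.0))         # 1.0 -- the prior has no effect

# Data x = {TTTTT}: f(x|h1) = (1/2)^5 = 1/32, f(x|h2) = 1
for p1 in (0.5, 0.9, 0.99):
    print(p1, posterior_h1(p1, 0.5**5, 1.0))  # posterior of h1 tracks the prior
```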
Parametric Bayesian Models
Let $\theta$ be the probability that the coin will draw heads, and let $\pi(\theta)$ be the prior for $\theta$, here taken as uniform:
$$\pi(\theta) = \mathbf{1}_{[0,1]}(\theta) \quad \text{(uniform)}$$
Consider that we have the data x = {HTHTT}. We want to make an inference about $\theta$.
Likelihood (binomial form):
$$f(x \mid \theta) = \theta^2 (1-\theta)^3$$
Posterior:
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)} = \frac{\theta^2 (1-\theta)^3}{B(3,4)} \sim \mathcal{B}eta(3,4)$$
[Figure: posterior densities of $\theta$ for the data sets 2H-3T, 20H-30T, and 0H-5T]
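The slides link a MATLAB implementation; a rough Python/scipy equivalent (an added sketch, not the original code) is below. Under the uniform prior Beta(1,1), data with nH heads and nT tails gives the posterior Beta(nH+1, nT+1):

```python
import numpy as np
from scipy.stats import beta

theta = np.linspace(0.0, 1.0, 201)
for nH, nT in [(2, 3), (20, 30), (0, 5)]:     # the data sets in the figure
    post = beta(nH + 1, nT + 1)               # posterior under the Beta(1,1) prior
    mode = theta[np.argmax(post.pdf(theta))]
    print(f"{nH}H-{nT}T: posterior Beta({nH+1},{nT+1}), mode near {mode:.2f}")
```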
Parametric Bayesian Models
Let $\theta$ be the probability that the coin will draw heads. Suppose that $\mathcal{B}eta(a,b)$ is now our prior distribution:
$$\pi(\theta) = \frac{1}{B(a,b)}\,\theta^{a-1}(1-\theta)^{b-1}$$
We are given data x = {HTHTT}.
Likelihood (binomial form):
$$f(x \mid \theta) = \theta^2 (1-\theta)^3$$
Posterior:
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)} = \frac{\theta^{a+2-1}(1-\theta)^{b+3-1}}{B(a+2,\,b+3)} \sim \mathcal{B}eta(a+2,\,b+3)$$
[Figure: posterior densities of $\theta$ for priors with a=1, b=1; a=100, b=100; and a=100, b=1]
Parametric Bayesian Models
Data given: the coin is flipped $n$ times, and $n_H$ of the flips come up heads.
Prior: we consider the Beta prior $\mathcal{B}eta(a,b)$, as in the earlier slide.
Posterior:
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)} = \frac{\theta^{a+n_H-1}(1-\theta)^{b+n-n_H-1}}{B(a+n_H,\,b+n-n_H)} \sim \mathcal{B}eta(a+n_H,\,b+n-n_H)$$
Posterior mean:
$$E[\theta \mid x] = \frac{a+n_H}{a+b+n}$$
Note that, as $n \to \infty$, $E[\theta \mid x] \to \dfrac{n_H}{n}$.
Posterior variance:
$$Var[\theta \mid x] = \frac{(a+n_H)(b+n-n_H)}{(a+b+n)^2 (a+b+n+1)}$$
so that $Var[\theta \mid x] \to 0$ as $O(1/n)$.
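The two limits above can be illustrated with a short sketch (the true value θ = 0.3 and the prior Beta(2,2) are assumed for illustration, not taken from the slides): the posterior mean approaches nH/n and the posterior variance shrinks like O(1/n).

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, theta_true = 2.0, 2.0, 0.3
for n in [10, 100, 1000, 10000]:
    nH = rng.binomial(n, theta_true)       # simulated coin flips
    mean = (a + nH) / (a + b + n)          # posterior mean
    var = (a + nH) * (b + n - nH) / ((a + b + n)**2 * (a + b + n + 1))
    print(f"n={n:6d}  E[theta|x]={mean:.4f}  Var[theta|x]={var:.2e}  nH/n={nH/n:.4f}")
```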
Prediction
Suppose we have observed x and want to make a prediction about (future) unknown observables: what is the probability of observing data $\tilde{x}$ if we have already observed x?
This means finding $g(\tilde{x} \mid x)$. Using the conditional independence of $\tilde{x}$ and x given $\theta$, we have:
$$g(\tilde{x} \mid x) = \int g(\tilde{x}, \theta \mid x)\,d\theta = \int \frac{\pi(\tilde{x}, x, \theta)}{m(x)}\,d\theta = \int \frac{\pi(\tilde{x}, x, \theta)}{\pi(x,\theta)}\,\frac{\pi(x,\theta)}{m(x)}\,d\theta = \int \underbrace{f(\tilde{x} \mid x, \theta)}_{\text{likelihood}}\,\underbrace{\pi(\theta \mid x)}_{\text{posterior}}\,d\theta = \int f(\tilde{x} \mid \theta)\,\pi(\theta \mid x)\,d\theta$$
Compare this with the normalizing factor:
$$m(x) = \int \underbrace{f(x \mid \theta)}_{\text{likelihood}}\,\underbrace{\pi(\theta)}_{\text{prior}}\,d\theta$$
Prediction: Example
Consider the coin flipping example. Let $\theta$ be the probability that the coin draws heads, and consider a uniform prior $\pi(\theta)$ for $\theta$. Given data x = {HTHTT}, we have seen that the posterior is $\mathcal{B}eta(3,4)$.
What is the probability that the next draw will be heads?
$$g(H \mid x) = \int f(H \mid \theta)\,\pi(\theta \mid x)\,d\theta = \int_0^1 \theta\,\frac{\theta^2(1-\theta)^3}{B(3,4)}\,d\theta = \frac{B(4,4)}{B(3,4)} = \frac{3}{7}$$
Prediction
Consider the coin flipping example. Let $\theta$ be the probability that the coin will draw heads, and consider a uniform prior $\pi(\theta)$ for $\theta$. Given data x = {HTHTT}, we have seen that the posterior is $\mathcal{B}eta(3,4)$.
What is the probability that the next 5 draws will all be heads?
$$g(HHHHH \mid x) = \int f(HHHHH \mid \theta)\,\pi(\theta \mid x)\,d\theta = \int_0^1 \theta^5\,\frac{\theta^2(1-\theta)^3}{B(3,4)}\,d\theta = \frac{B(8,4)}{B(3,4)} = \frac{1}{22}$$
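Both predictive probabilities reduce to ratios of Beta functions, which is easy to check numerically (an added check, not from the slides):

```python
from scipy.special import beta as B   # the Beta function B(a, b)

print(B(4, 4) / B(3, 4))   # g(H|x)     = 0.42857... = 3/7
print(B(8, 4) / B(3, 4))   # g(HHHHH|x) = 0.04545... = 1/22
```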
The Bayesian Analysis is Sequential
Using some given data x, we computed the posterior $\pi(\theta \mid x)$. If new data x* arrives, how can we update our inference?
We assume that x and x* are conditionally independent given $\theta$, i.e.:
$$f(x, x^* \mid \theta) = f(x \mid \theta)\,f(x^* \mid \theta)$$
The augmented posterior (based on both x and x*) is then:
$$\pi(\theta \mid x, x^*) = \frac{f(x, x^* \mid \theta)\,\pi(\theta)}{m(x, x^*)} = \frac{f(x^* \mid \theta)\,f(x \mid \theta)\,\pi(\theta)}{m(x^* \mid x)\,m(x)} = \frac{f(x^* \mid \theta)\,\pi(\theta \mid x)}{m(x^* \mid x)}$$
where $m(x^* \mid x) = m(x, x^*)/m(x)$. Note that the prior now is our old posterior computed with data x. Thus Bayesian analysis is sequential.
Sequential Nature of Bayesian Inference
Assume we have observed $x_1 \sim \mathcal{N}(\theta, \sigma^2)$ and computed the corresponding posterior. Now we observe a second realization $x_2$ of $X_2 \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$.
We are interested in updating our posterior:
$$\pi(\theta \mid x_1, x_2) \propto f(x_2 \mid \theta, x_1)\,\pi(\theta \mid x_1) \propto f(x_2 \mid \theta)\,f(x_1 \mid \theta)\,\pi(\theta) \propto f(x_2 \mid \theta)\,\pi(\theta \mid x_1)$$
Updating the prior one observation at a time or with all observations together does not matter.
The sequential approach is useful for massive data sets:
$$\pi(\theta \mid x_1, x_2, \ldots, x_n) \propto f(x_n \mid \theta)\,\pi(\theta \mid x_1, x_2, \ldots, x_{n-1})$$
i.e. the prior at time $n$ is the posterior at time $n-1$.
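A minimal sketch of the sequential principle in the coin model (an addition): updating the Beta posterior one flip at a time ends at the same Beta(3,4) as the all-at-once update of the earlier slides.

```python
flips = [1, 0, 1, 0, 0]      # x = {HTHTT}, with 1 = heads
a, b = 1.0, 1.0              # uniform prior Beta(1,1)
for x in flips:              # prior at step n = posterior at step n-1
    a, b = a + x, b + (1 - x)
print(a, b)                  # 3.0 4.0, i.e. Beta(3,4) as in the batch update
```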
A Gaussian Example
Consider $X_1 \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$, with prior $\theta \sim \mathcal{N}(m_0, \tau_0^2)$.
Then we can derive the following:
$$\pi(\theta \mid x_1) \propto f(x_1 \mid \theta)\,\pi(\theta) \propto \exp\left(-\frac{(x_1-\theta)^2}{2\sigma^2} - \frac{(\theta-m_0)^2}{2\tau_0^2}\right)$$
$$\pi(\theta \mid x_1) \propto \exp\left(-\frac{1}{2}\left[\frac{1}{\sigma^2}+\frac{1}{\tau_0^2}\right]\theta^2 + \theta\left[\frac{x_1}{\sigma^2}+\frac{m_0}{\tau_0^2}\right]\right) \propto \exp\left(-\frac{1}{2\tau_1^2}(\theta-m_1)^2\right)$$
Thus:
$$\theta \mid x_1 \sim \mathcal{N}(m_1, \tau_1^2), \quad \text{with} \quad \frac{1}{\tau_1^2} = \frac{1}{\sigma^2}+\frac{1}{\tau_0^2} \quad \text{and} \quad m_1 = \tau_1^2\left(\frac{x_1}{\sigma^2}+\frac{m_0}{\tau_0^2}\right)$$
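A minimal sketch of this conjugate update (the numbers m0 = 0, τ0² = 4, σ² = 1, x1 = 1.5 are assumed for illustration, not taken from the slides):

```python
m0, tau0_sq, sigma_sq, x1 = 0.0, 4.0, 1.0, 1.5

tau1_sq = 1.0 / (1.0 / sigma_sq + 1.0 / tau0_sq)   # 1/tau1^2 = 1/sigma^2 + 1/tau0^2
m1 = tau1_sq * (x1 / sigma_sq + m0 / tau0_sq)      # precision-weighted mean
print(m1, tau1_sq)                                 # theta | x1 ~ N(1.2, 0.8)
```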
A Gaussian Example: Continued
To predict the distribution of a new observation $\tilde{X} \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$ in light of $x_1$, we use the predictive distribution as follows:
$$f(\tilde{x} \mid x_1) = \int f(\tilde{x} \mid \theta, x_1)\,\pi(\theta \mid x_1)\,d\theta = \int \underbrace{f(\tilde{x} \mid \theta)}_{\text{likelihood}}\,\underbrace{\pi(\theta \mid x_1)}_{\text{posterior}}\,d\theta \propto \int \exp\left(-\frac{(\tilde{x}-\theta)^2}{2\sigma^2}\right)\exp\left(-\frac{(\theta-m_1)^2}{2\tau_1^2}\right)d\theta$$
We use the properties of the bivariate normal distribution. The product in the integrand is the exponential of a quadratic function in $(\tilde{x}, \theta)$; hence $(\tilde{x}, \theta)$ have a joint normal distribution. Completing the square in $\theta$ and integrating $\theta$ out, one can verify that the marginal is a Gaussian with:
$$\tilde{X} \mid x_1 \sim \mathcal{N}(m_1,\; \sigma^2 + \tau_1^2)$$
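A Monte Carlo sanity check (an addition, reusing the assumed numbers of the previous sketch): sampling θ from the posterior and then x̃ from f(x̃|θ) should reproduce mean m1 and variance σ² + τ1².

```python
import numpy as np

rng = np.random.default_rng(1)
m1, tau1_sq, sigma_sq = 1.2, 0.8, 1.0
theta = rng.normal(m1, np.sqrt(tau1_sq), 100_000)  # theta ~ posterior
x_new = rng.normal(theta, np.sqrt(sigma_sq))       # x ~ f(x|theta)
print(x_new.mean(), x_new.var())                   # approx 1.2 and 1.8 = sigma^2 + tau1^2
```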
A Gaussian Example: Continued
We can also derive the same results using the fact that $f(\tilde{x} \mid x_1)$ is Gaussian, so it is characterized fully by its mean and variance (use the earlier posterior results). Using $E[u] = E\left[E(u \mid v)\right]$:
$$E[\tilde{X} \mid x_1] = E\left[\,E(\tilde{X} \mid \theta, x_1)\,\middle|\,x_1\right] = E[\theta \mid x_1] = m_1 \quad \text{(posterior mean)}$$
Similarly, using $Var(u) = E\left[Var(u \mid v)\right] + Var\left[E(u \mid v)\right]$:
$$Var[\tilde{X} \mid x_1] = \underbrace{E\left[Var(\tilde{X} \mid \theta, x_1)\,\middle|\,x_1\right]}_{\text{model variance}} + \underbrace{Var\left[E(\tilde{X} \mid \theta, x_1)\,\middle|\,x_1\right]}_{\text{variance due to uncertainty in }\theta} = \underbrace{E[\sigma^2 \mid x_1]}_{\text{constant}} + Var[\theta \mid x_1] = \sigma^2 + \tau_1^2$$
Thus we obtain the same result as before:
$$\tilde{X} \mid x_1 \sim \mathcal{N}(m_1,\; \sigma^2 + \tau_1^2)$$
Proof of the Conditional Expectations
Note that:
$$E[u] = \int u\,\pi(u)\,du = \int\!\!\int u\,\pi(u \mid v)\,\pi(v)\,dv\,du = E\left[E(u \mid v)\right]$$
Let's re-write this equation conditioning on $x_1$:
$$E[u \mid x_1] = \int u\,\pi(u \mid x_1)\,du = \int\!\!\int u\,\pi(u \mid v, x_1)\,\pi(v \mid x_1)\,dv\,du = E\left[\,E(u \mid v, x_1)\,\middle|\,x_1\right]$$
A similar derivation can be given for the variance:
$$Var(u \mid x_1) = E\left[Var(u \mid v, x_1)\,\middle|\,x_1\right] + Var\left[E(u \mid v, x_1)\,\middle|\,x_1\right]$$
Gaussian Example: Bayes’ versus MLE
We have seen that the ML estimate of $\theta$ at time $N$ is simply the sample mean:
$$\theta_{ML} = \arg\sup_{\theta} \prod_{i=1}^{N} f(x_i \mid \theta) = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The posterior of $\theta$ at time $N$ is (simply generalizing the earlier result):
$$\theta \mid x_1, \ldots, x_N \sim \mathcal{N}(m_N, \tau_N^2), \quad \frac{1}{\tau_N^2} = \frac{N}{\sigma^2}+\frac{1}{\tau_0^2}, \quad m_N = \tau_N^2\left(\frac{\sum_{i=1}^N x_i}{\sigma^2}+\frac{m_0}{\tau_0^2}\right)$$
As $N \to \infty$,
$$\tau_N^2 \sim \frac{\sigma^2}{N}, \qquad m_N \sim \frac{1}{N}\sum_{i=1}^{N} x_i,$$
the prior is washed out by the data, and the posterior mean is the MLE estimate:
$$E[\theta \mid x_1, \ldots, x_N] \to \theta_{ML}$$
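A sketch of this washing-out effect (assumed values: true θ = 1, σ² = 1, and a deliberately poor prior N(5, 1), none taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
m0, tau0_sq, sigma_sq, theta_true = 5.0, 1.0, 1.0, 1.0
for N in [1, 10, 100, 10000]:
    x = rng.normal(theta_true, np.sqrt(sigma_sq), N)
    tauN_sq = 1.0 / (N / sigma_sq + 1.0 / tau0_sq)
    mN = tauN_sq * (x.sum() / sigma_sq + m0 / tau0_sq)   # posterior mean
    print(f"N={N:6d}  posterior mean {mN:.4f}  MLE {x.mean():.4f}")
```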
Gaussian Example: Bayes versus MLE
The information provided by the Bayesian approach is much richer than the simple MLE estimate.
You can compute, for example, posterior probabilities $\Pr(\theta \in A \mid x_1, \ldots, x_N)$ or posterior variances $Var(\theta \mid x_1, \ldots, x_N)$.
You can also predict future data through $f(\tilde{x} \mid x_1, \ldots, x_N)$.
Bayes and the Poisson model
Assume you have some counting observations, i.e.
$$X_i \overset{i.i.d.}{\sim} \mathcal{P}(\theta), \qquad f(x_i \mid \theta) = e^{-\theta}\,\frac{\theta^{x_i}}{x_i!}$$
Assume we adopt a Gamma prior for $\theta$, i.e. $\theta \sim \mathcal{G}(\alpha, \beta)$:
$$\pi(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\theta^{\alpha-1} e^{-\beta\theta} = \mathcal{G}(\theta;\,\alpha, \beta)$$
You can easily show that:
$$\pi(\theta \mid x_1, \ldots, x_N) = \mathcal{G}\left(\theta;\; \alpha + \sum_{i=1}^{N} x_i,\; \beta + N\right)$$
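A sketch of this Poisson-Gamma update (the values α = 2, β = 1, true rate θ = 4 are assumed for illustration; note that scipy parameterizes the Gamma by shape and scale = 1/rate):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)
alpha, beta_, theta_true, N = 2.0, 1.0, 4.0, 50
x = rng.poisson(theta_true, N)                            # counting observations
post = gamma(a=alpha + x.sum(), scale=1.0 / (beta_ + N))  # G(alpha + sum x, beta + N)
print(post.mean(), post.var())                            # posterior mean near theta_true
```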
Testing Hypotheses in a Bayesian Framework
Consider the problem where we have $X \mid \theta \sim \mathcal{B}in(n, \theta)$ and $\pi(\theta) = \mathcal{U}[0,1]$, so that
$$\Pr(X = x \mid \theta) = \binom{n}{x}\theta^x (1-\theta)^{n-x} \quad \Rightarrow \quad \pi(\theta \mid x) \sim \mathcal{B}eta(x+1,\; n-x+1)$$
To test
$$H_0: \theta \le \tfrac{1}{2} \quad \text{vs.} \quad H_1: \theta > \tfrac{1}{2}$$
using the posterior, we simply compute:
$$\pi(H_1 \mid x) = 1 - \pi(H_0 \mid x) = \int_{1/2}^{1} \pi(\theta \mid x)\,d\theta$$
Note that the integration is in parameter space: in Bayesian statistics you never integrate with respect to observations.
Contrary to a frequentist approach, hypothesis testing the Bayesian way is never based on data you don't observe!
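Since the posterior is Beta(x+1, n-x+1), π(H1|x) is one minus the Beta CDF evaluated at 1/2. A sketch with assumed data (x = 7 heads in n = 10 flips, chosen for illustration):

```python
from scipy.stats import beta

n, x = 10, 7
posterior = beta(x + 1, n - x + 1)    # Beta(x+1, n-x+1)
p_H1 = 1.0 - posterior.cdf(0.5)       # integral of the posterior over (1/2, 1]
print(p_H1)                           # posterior probability of H1: theta > 1/2
```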
Posterior Inference: Point Estimates
All point estimates below derive from the posterior
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}$$
Maximum A Posteriori estimate (MAP):
$$\theta^* = \arg\max_{\theta}\,\log\pi(\theta \mid x) = \arg\max_{\theta}\left[\log f(x \mid \theta) + \log\pi(\theta)\right]$$
Posterior mean:
$$E_{\pi(\theta \mid x)}[\theta] = \int \theta\,\pi(\theta \mid x)\,d\theta$$
Posterior quantiles: the $a$-quantile $\theta_a$ satisfies
$$\Pr(\theta \le \theta_a \mid x) = \int^{\theta_a} \pi(\theta \mid x)\,d\theta = a$$
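A sketch computing these point estimates for the Beta(3,4) posterior of the coin example from the earlier slides:

```python
from scipy.stats import beta

post = beta(3, 4)
map_est = (3 - 1) / (3 + 4 - 2)   # mode of Beta(a,b) is (a-1)/(a+b-2) = 0.4
mean_est = post.mean()            # a/(a+b) = 3/7
median = post.ppf(0.5)            # 0.5-quantile of the posterior
print(map_est, mean_est, median, post.ppf([0.05, 0.95]))  # plus a 90% interval
```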