Bayesian Scientific Computing, Spring 2013 (N. Zabaras)
Hypothesis Testing The Bayesian Way
Prof. Nicholas Zabaras
Materials Process Design and Control Laboratory
Sibley School of Mechanical and Aerospace Engineering
101 Frank H. T. Rhodes Hall
Cornell University
Ithaca, NY 14853-3801
Email: [email protected]
URL: http://mpdc.mae.cornell.edu/
December 27, 2013
Introduction to Bayesian Statistics
Reverend Thomas Bayes (ca. 1702-1761). His sole probability paper, "Essay Towards Solving a Problem in the Doctrine of Chances", was published posthumously in 1763.
References
C. P. Robert, The Bayesian Choice: From Decision-Theoretic Motivations to Computational Implementation, Springer-Verlag, NY, 2001 (online resource).
A. Gelman, J. B. Carlin, H. S. Stern and D. B. Rubin, Bayesian Data Analysis, Chapman and Hall/CRC Press, 2nd Edition, 2003.
J. M. Marin and C. P. Robert, The Bayesian Core, Springer-Verlag, 2007 (online resource).
D. Sivia and J. Skilling, Data Analysis: A Bayesian Tutorial, Oxford University Press, 2006.
B. Vidakovic, Bayesian Statistics for Engineering, online course at Georgia Tech.
Additional References with links are provided in the lecture slides.
Contents
Hypothesis testing the Bayesian way
Examples of parametric Bayesian models
Examples of Bayesian prediction
Sequential nature of Bayesian inference, with a Gaussian example
Bayes vs. MLE (the limit of large data sets)
Example: Bayes and the Poisson model
Hypothesis testing in a Bayesian framework (integration in parameter space)
Summary: Bayesian point estimates
Hypothesis Testing the Bayesian Way
Consider two hypotheses in coin flipping:
the coin is fair, with prior $\pi(h_1)$;
the coin always produces tails, with prior $\pi(h_2)$.
We flip the coin 5 times and obtain data x = {HTHTT}.
Inference: we want to assess the validity of the two hypotheses.
Likelihood $f(x \mid h_i)$:
$$f(x \mid h_1) = \left(\tfrac{1}{2}\right)^5, \qquad f(x \mid h_2) = 0$$
Posterior:
$$\frac{\pi(h_1 \mid x)}{\pi(h_2 \mid x)} = \frac{f(x \mid h_1)\,\pi(h_1)}{f(x \mid h_2)\,\pi(h_2)} \longrightarrow \infty$$
Since $f(x \mid h_2) = 0$ (the data contain heads), all posterior mass falls on $h_1$: here there is no effect of the prior.
Hypothesis Testing the Bayesian Way
Consider two hypotheses in coin flipping:
the coin is fair, with prior $\pi(h_1)$;
the coin always produces tails, with prior $\pi(h_2)$.
We flip the coin 5 times and obtain data x = {TTTTT}.
Inference: we want to assess the validity of the two hypotheses.
Likelihood $f(x \mid h_i)$:
$$f(x \mid h_1) = \left(\tfrac{1}{2}\right)^5 = \tfrac{1}{32}, \qquad f(x \mid h_2) = 1$$
Posterior:
$$\frac{\pi(h_1 \mid x)}{\pi(h_2 \mid x)} = \frac{f(x \mid h_1)\,\pi(h_1)}{f(x \mid h_2)\,\pi(h_2)} = \frac{1}{32}\,\frac{\pi(h_1)}{\pi(h_2)}$$
The data (evidence) point to `tails', but the posterior inferences also depend strongly on the priors!
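As a quick numerical check (an addition, not part of the original slides), here is a minimal Python sketch of the two-hypothesis calculation, using the priors and likelihood values from the two slides above:

```python
def posterior_h1(prior_h1, lik_h1, lik_h2):
    """Posterior probability of h1 via Bayes' rule for two hypotheses."""
    prior_h2 = 1.0 - prior_h1
    evidence = lik_h1 * prior_h1 + lik_h2 * prior_h2   # m(x)
    return lik_h1 * prior_h1 / evidence

# Data x = {HTHTT}: f(x|h1) = (1/2)^5, f(x|h2) = 0 (h2 cannot produce a head)
print(posterior_h1(0.5, 0.5**5, 0.0))         # 1.0 -- the prior has no effect

# Data x = {TTTTT}: f(x|h1) = (1/2)^5 = 1/32, f(x|h2) = 1
for p1 in (0.5, 0.9, 0.99):
    print(p1, posterior_h1(p1, 0.5**5, 1.0))  # posterior of h1 tracks the prior
```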
Parametric Bayesian Models
Let $\theta$ be the probability that the coin will draw heads, and let $\pi(\theta)$ be the prior for $\theta$, here taken as uniform:
$$\pi(\theta) = \mathbf{1}_{[0,1]}(\theta) \quad \text{(uniform)}$$
Consider that we have the data x = {HTHTT}. We want to make an inference about $\theta$.
Likelihood (binomial form):
$$f(x \mid \theta) = \theta^2 (1-\theta)^3$$
Posterior:
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)} = \frac{\theta^2 (1-\theta)^3}{B(3,4)} \sim \mathcal{B}eta(3,4)$$
[Figure: posterior densities of $\theta$ for the data sets 2H-3T, 20H-30T, and 0H-5T]
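The slides link a MATLAB implementation; a rough Python/scipy equivalent (an added sketch, not the original code) is below. Under the uniform prior Beta(1,1), data with nH heads and nT tails gives the posterior Beta(nH+1, nT+1):

```python
import numpy as np
from scipy.stats import beta

theta = np.linspace(0.0, 1.0, 201)
for nH, nT in [(2, 3), (20, 30), (0, 5)]:     # the data sets in the figure
    post = beta(nH + 1, nT + 1)               # posterior under the Beta(1,1) prior
    mode = theta[np.argmax(post.pdf(theta))]
    print(f"{nH}H-{nT}T: posterior Beta({nH+1},{nT+1}), mode near {mode:.2f}")
```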
Parametric Bayesian Models
Let $\theta$ be the probability that the coin will draw heads. Suppose that $\mathcal{B}eta(a,b)$ is now our prior distribution:
$$\pi(\theta) = \frac{1}{B(a,b)}\,\theta^{a-1}(1-\theta)^{b-1}$$
We are given data x = {HTHTT}.
Likelihood (binomial form):
$$f(x \mid \theta) = \theta^2 (1-\theta)^3$$
Posterior:
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)} = \frac{\theta^{a+2-1}(1-\theta)^{b+3-1}}{B(a+2,\,b+3)} \sim \mathcal{B}eta(a+2,\,b+3)$$
[Figure: posterior densities of $\theta$ for priors with a=1, b=1; a=100, b=100; and a=100, b=1]
Parametric Bayesian Models
Data given: the coin is flipped $n$ times, and $n_H$ of the flips come up heads.
Prior: we consider the Beta prior $\mathcal{B}eta(a,b)$, as in the earlier slide.
Posterior:
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)} = \frac{\theta^{a+n_H-1}(1-\theta)^{b+n-n_H-1}}{B(a+n_H,\,b+n-n_H)} \sim \mathcal{B}eta(a+n_H,\,b+n-n_H)$$
Posterior mean:
$$E[\theta \mid x] = \frac{a+n_H}{a+b+n}$$
Note that, as $n \to \infty$, $E[\theta \mid x] \to \dfrac{n_H}{n}$.
Posterior variance:
$$Var[\theta \mid x] = \frac{(a+n_H)(b+n-n_H)}{(a+b+n)^2 (a+b+n+1)}$$
so that $Var[\theta \mid x] \to 0$ as $O(1/n)$.
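The two limits above can be illustrated with a short sketch (the true value θ = 0.3 and the prior Beta(2,2) are assumed for illustration, not taken from the slides): the posterior mean approaches nH/n and the posterior variance shrinks like O(1/n).

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, theta_true = 2.0, 2.0, 0.3
for n in [10, 100, 1000, 10000]:
    nH = rng.binomial(n, theta_true)       # simulated coin flips
    mean = (a + nH) / (a + b + n)          # posterior mean
    var = (a + nH) * (b + n - nH) / ((a + b + n)**2 * (a + b + n + 1))
    print(f"n={n:6d}  E[theta|x]={mean:.4f}  Var[theta|x]={var:.2e}  nH/n={nH/n:.4f}")
```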
Prediction
Suppose we have observed x and want to make a prediction about (future) unknown observables: what is the probability of observing data $\tilde{x}$ if we have already observed x?
This means finding $g(\tilde{x} \mid x)$. Using the conditional independence of $\tilde{x}$ and x given $\theta$, we have:
$$g(\tilde{x} \mid x) = \int g(\tilde{x}, \theta \mid x)\,d\theta = \int \frac{\pi(\tilde{x}, x, \theta)}{m(x)}\,d\theta = \int \frac{\pi(\tilde{x}, x, \theta)}{\pi(x,\theta)}\,\frac{\pi(x,\theta)}{m(x)}\,d\theta = \int \underbrace{f(\tilde{x} \mid x, \theta)}_{\text{likelihood}}\,\underbrace{\pi(\theta \mid x)}_{\text{posterior}}\,d\theta = \int f(\tilde{x} \mid \theta)\,\pi(\theta \mid x)\,d\theta$$
Compare this with the normalizing factor:
$$m(x) = \int \underbrace{f(x \mid \theta)}_{\text{likelihood}}\,\underbrace{\pi(\theta)}_{\text{prior}}\,d\theta$$
Prediction: Example
Consider the coin flipping example. Let $\theta$ be the probability that the coin draws heads, and consider a uniform prior $\pi(\theta)$ for $\theta$. Given data x = {HTHTT}, we have seen that the posterior is $\mathcal{B}eta(3,4)$.
What is the probability that the next draw will be heads?
$$g(H \mid x) = \int f(H \mid \theta)\,\pi(\theta \mid x)\,d\theta = \int_0^1 \theta\,\frac{\theta^2(1-\theta)^3}{B(3,4)}\,d\theta = \frac{B(4,4)}{B(3,4)} = \frac{3}{7}$$
Prediction
Consider the coin flipping example. Let $\theta$ be the probability that the coin will draw heads, and consider a uniform prior $\pi(\theta)$ for $\theta$. Given data x = {HTHTT}, we have seen that the posterior is $\mathcal{B}eta(3,4)$.
What is the probability that the next 5 draws will all be heads?
$$g(HHHHH \mid x) = \int f(HHHHH \mid \theta)\,\pi(\theta \mid x)\,d\theta = \int_0^1 \theta^5\,\frac{\theta^2(1-\theta)^3}{B(3,4)}\,d\theta = \frac{B(8,4)}{B(3,4)} = \frac{1}{22}$$
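Both predictive probabilities reduce to ratios of Beta functions, which is easy to check numerically (an added check, not from the slides):

```python
from scipy.special import beta as B   # the Beta function B(a, b)

print(B(4, 4) / B(3, 4))   # g(H|x)     = 0.42857... = 3/7
print(B(8, 4) / B(3, 4))   # g(HHHHH|x) = 0.04545... = 1/22
```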
The Bayesian Analysis is Sequential
Using some given data x, we computed the posterior $\pi(\theta \mid x)$. If new data x* arrives, how can we update our inference?
We assume that x and x* are conditionally independent given $\theta$, i.e.:
$$f(x, x^* \mid \theta) = f(x \mid \theta)\,f(x^* \mid \theta)$$
The augmented posterior (based on both x and x*) is then:
$$\pi(\theta \mid x, x^*) = \frac{f(x, x^* \mid \theta)\,\pi(\theta)}{m(x, x^*)} = \frac{f(x^* \mid \theta)\,f(x \mid \theta)\,\pi(\theta)}{m(x^* \mid x)\,m(x)} = \frac{f(x^* \mid \theta)\,\pi(\theta \mid x)}{m(x^* \mid x)}$$
where $m(x^* \mid x) = m(x, x^*)/m(x)$. Note that the prior now is our old posterior computed with data x. Thus Bayesian analysis is sequential.
Sequential Nature of Bayesian Inference
Assume we have observed $x_1 \sim \mathcal{N}(\theta, \sigma^2)$ and computed the corresponding posterior. Now we observe a second realization $x_2$ of $X_2 \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$.
We are interested in updating our posterior:
$$\pi(\theta \mid x_1, x_2) \propto f(x_2 \mid \theta, x_1)\,\pi(\theta \mid x_1) \propto f(x_2 \mid \theta)\,f(x_1 \mid \theta)\,\pi(\theta) \propto f(x_2 \mid \theta)\,\pi(\theta \mid x_1)$$
Updating the prior one observation at a time or with all observations together does not matter.
The sequential approach is useful for massive data sets:
$$\pi(\theta \mid x_1, x_2, \ldots, x_n) \propto f(x_n \mid \theta)\,\pi(\theta \mid x_1, x_2, \ldots, x_{n-1})$$
i.e. the prior at time $n$ is the posterior at time $n-1$.
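A minimal sketch of the sequential principle in the coin model (an addition): updating the Beta posterior one flip at a time ends at the same Beta(3,4) as the all-at-once update of the earlier slides.

```python
flips = [1, 0, 1, 0, 0]      # x = {HTHTT}, with 1 = heads
a, b = 1.0, 1.0              # uniform prior Beta(1,1)
for x in flips:              # prior at step n = posterior at step n-1
    a, b = a + x, b + (1 - x)
print(a, b)                  # 3.0 4.0, i.e. Beta(3,4) as in the batch update
```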
A Gaussian Example
Consider $X_1 \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$, with prior $\theta \sim \mathcal{N}(m_0, \tau_0^2)$.
Then we can derive the following:
$$\pi(\theta \mid x_1) \propto f(x_1 \mid \theta)\,\pi(\theta) \propto \exp\left(-\frac{(x_1-\theta)^2}{2\sigma^2} - \frac{(\theta-m_0)^2}{2\tau_0^2}\right)$$
$$\pi(\theta \mid x_1) \propto \exp\left(-\frac{1}{2}\left[\frac{1}{\sigma^2}+\frac{1}{\tau_0^2}\right]\theta^2 + \theta\left[\frac{x_1}{\sigma^2}+\frac{m_0}{\tau_0^2}\right]\right) \propto \exp\left(-\frac{1}{2\tau_1^2}(\theta-m_1)^2\right)$$
Thus:
$$\theta \mid x_1 \sim \mathcal{N}(m_1, \tau_1^2), \quad \text{with} \quad \frac{1}{\tau_1^2} = \frac{1}{\sigma^2}+\frac{1}{\tau_0^2} \quad \text{and} \quad m_1 = \tau_1^2\left(\frac{x_1}{\sigma^2}+\frac{m_0}{\tau_0^2}\right)$$
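A minimal sketch of this conjugate update (the numbers m0 = 0, τ0² = 4, σ² = 1, x1 = 1.5 are assumed for illustration, not taken from the slides):

```python
m0, tau0_sq, sigma_sq, x1 = 0.0, 4.0, 1.0, 1.5

tau1_sq = 1.0 / (1.0 / sigma_sq + 1.0 / tau0_sq)   # 1/tau1^2 = 1/sigma^2 + 1/tau0^2
m1 = tau1_sq * (x1 / sigma_sq + m0 / tau0_sq)      # precision-weighted mean
print(m1, tau1_sq)                                 # theta | x1 ~ N(1.2, 0.8)
```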
A Gaussian Example: Continued
To predict the distribution of a new observation $\tilde{X} \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$ in light of $x_1$, we use the predictive distribution as follows:
$$f(\tilde{x} \mid x_1) = \int f(\tilde{x} \mid \theta, x_1)\,\pi(\theta \mid x_1)\,d\theta = \int \underbrace{f(\tilde{x} \mid \theta)}_{\text{likelihood}}\,\underbrace{\pi(\theta \mid x_1)}_{\text{posterior}}\,d\theta \propto \int \exp\left(-\frac{(\tilde{x}-\theta)^2}{2\sigma^2}\right)\exp\left(-\frac{(\theta-m_1)^2}{2\tau_1^2}\right)d\theta$$
We use the properties of the bivariate normal distribution. The product in the integrand is the exponential of a quadratic function in $(\tilde{x}, \theta)$; hence $(\tilde{x}, \theta)$ have a joint normal distribution. Completing the square in $\theta$ and integrating $\theta$ out, one can verify that the marginal is a Gaussian with:
$$\tilde{X} \mid x_1 \sim \mathcal{N}(m_1,\; \sigma^2 + \tau_1^2)$$
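A Monte Carlo sanity check (an addition, reusing the assumed numbers of the previous sketch): sampling θ from the posterior and then x̃ from f(x̃|θ) should reproduce mean m1 and variance σ² + τ1².

```python
import numpy as np

rng = np.random.default_rng(1)
m1, tau1_sq, sigma_sq = 1.2, 0.8, 1.0
theta = rng.normal(m1, np.sqrt(tau1_sq), 100_000)  # theta ~ posterior
x_new = rng.normal(theta, np.sqrt(sigma_sq))       # x ~ f(x|theta)
print(x_new.mean(), x_new.var())                   # approx 1.2 and 1.8 = sigma^2 + tau1^2
```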
A Gaussian Example: Continued
We can also derive the same results using the fact that $f(\tilde{x} \mid x_1)$ is Gaussian, so it is characterized fully by its mean and variance (use the earlier posterior results). Using $E[u] = E\left[E(u \mid v)\right]$:
$$E[\tilde{X} \mid x_1] = E\left[\,E(\tilde{X} \mid \theta, x_1)\,\middle|\,x_1\right] = E[\theta \mid x_1] = m_1 \quad \text{(posterior mean)}$$
Similarly, using $Var(u) = E\left[Var(u \mid v)\right] + Var\left[E(u \mid v)\right]$:
$$Var[\tilde{X} \mid x_1] = \underbrace{E\left[Var(\tilde{X} \mid \theta, x_1)\,\middle|\,x_1\right]}_{\text{model variance}} + \underbrace{Var\left[E(\tilde{X} \mid \theta, x_1)\,\middle|\,x_1\right]}_{\text{variance due to uncertainty in }\theta} = \underbrace{E[\sigma^2 \mid x_1]}_{\text{constant}} + Var[\theta \mid x_1] = \sigma^2 + \tau_1^2$$
Thus we obtain the same result as before:
$$\tilde{X} \mid x_1 \sim \mathcal{N}(m_1,\; \sigma^2 + \tau_1^2)$$
Proof of the Conditional Expectations
Note that:
$$E[u] = \int u\,\pi(u)\,du = \int\!\!\int u\,\pi(u \mid v)\,\pi(v)\,dv\,du = E\left[E(u \mid v)\right]$$
Let's re-write this equation conditioning on $x_1$:
$$E[u \mid x_1] = \int u\,\pi(u \mid x_1)\,du = \int\!\!\int u\,\pi(u \mid v, x_1)\,\pi(v \mid x_1)\,dv\,du = E\left[\,E(u \mid v, x_1)\,\middle|\,x_1\right]$$
A similar derivation can be given for the variance:
$$Var(u \mid x_1) = E\left[Var(u \mid v, x_1)\,\middle|\,x_1\right] + Var\left[E(u \mid v, x_1)\,\middle|\,x_1\right]$$
Gaussian Example: Bayes’ versus MLE
We have seen that the ML estimate of $\theta$ at time $N$ is simply the sample mean:
$$\theta_{ML} = \arg\sup_{\theta} \prod_{i=1}^{N} f(x_i \mid \theta) = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The posterior of $\theta$ at time $N$ is (simply generalizing the earlier result):
$$\theta \mid x_1, \ldots, x_N \sim \mathcal{N}(m_N, \tau_N^2), \quad \frac{1}{\tau_N^2} = \frac{N}{\sigma^2}+\frac{1}{\tau_0^2}, \quad m_N = \tau_N^2\left(\frac{\sum_{i=1}^N x_i}{\sigma^2}+\frac{m_0}{\tau_0^2}\right)$$
As $N \to \infty$,
$$\tau_N^2 \sim \frac{\sigma^2}{N}, \qquad m_N \sim \frac{1}{N}\sum_{i=1}^{N} x_i,$$
the prior is washed out by the data, and the posterior mean is the MLE estimate:
$$E[\theta \mid x_1, \ldots, x_N] \to \theta_{ML}$$
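A sketch of this washing-out effect (assumed values: true θ = 1, σ² = 1, and a deliberately poor prior N(5, 1), none taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
m0, tau0_sq, sigma_sq, theta_true = 5.0, 1.0, 1.0, 1.0
for N in [1, 10, 100, 10000]:
    x = rng.normal(theta_true, np.sqrt(sigma_sq), N)
    tauN_sq = 1.0 / (N / sigma_sq + 1.0 / tau0_sq)
    mN = tauN_sq * (x.sum() / sigma_sq + m0 / tau0_sq)   # posterior mean
    print(f"N={N:6d}  posterior mean {mN:.4f}  MLE {x.mean():.4f}")
```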
Gaussian Example: Bayes versus MLE
The information provided by the Bayesian approach is much richer than the simple MLE estimate.
You can compute, for example, posterior probabilities $\Pr(\theta \in A \mid x_1, \ldots, x_N)$ or posterior variances $Var(\theta \mid x_1, \ldots, x_N)$.
You can also predict future data through $f(\tilde{x} \mid x_1, \ldots, x_N)$.
Bayes and the Poisson model
Assume you have some counting observations, i.e.
$$X_i \overset{i.i.d.}{\sim} \mathcal{P}(\theta), \qquad f(x_i \mid \theta) = e^{-\theta}\,\frac{\theta^{x_i}}{x_i!}$$
Assume we adopt a Gamma prior for $\theta$, i.e. $\theta \sim \mathcal{G}(\alpha, \beta)$:
$$\pi(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\theta^{\alpha-1} e^{-\beta\theta} = \mathcal{G}(\theta;\,\alpha, \beta)$$
You can easily show that:
$$\pi(\theta \mid x_1, \ldots, x_N) = \mathcal{G}\left(\theta;\; \alpha + \sum_{i=1}^{N} x_i,\; \beta + N\right)$$
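A sketch of this Poisson-Gamma update (the values α = 2, β = 1, true rate θ = 4 are assumed for illustration; note that scipy parameterizes the Gamma by shape and scale = 1/rate):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)
alpha, beta_, theta_true, N = 2.0, 1.0, 4.0, 50
x = rng.poisson(theta_true, N)                            # counting observations
post = gamma(a=alpha + x.sum(), scale=1.0 / (beta_ + N))  # G(alpha + sum x, beta + N)
print(post.mean(), post.var())                            # posterior mean near theta_true
```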
Testing Hypotheses in a Bayesian Framework
Consider the problem where we have $X \mid \theta \sim \mathcal{B}in(n, \theta)$ and $\pi(\theta) = \mathcal{U}[0,1]$, so that
$$\Pr(X = x \mid \theta) = \binom{n}{x}\theta^x (1-\theta)^{n-x} \quad \Rightarrow \quad \pi(\theta \mid x) \sim \mathcal{B}eta(x+1,\; n-x+1)$$
To test
$$H_0: \theta \le \tfrac{1}{2} \quad \text{vs.} \quad H_1: \theta > \tfrac{1}{2}$$
using the posterior, we simply compute:
$$\pi(H_1 \mid x) = 1 - \pi(H_0 \mid x) = \int_{1/2}^{1} \pi(\theta \mid x)\,d\theta$$
Note that the integration is in parameter space: in Bayesian statistics you never integrate with respect to observations.
Contrary to a frequentist approach, hypothesis testing the Bayesian way is never based on data you don't observe!
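Since the posterior is Beta(x+1, n-x+1), π(H1|x) is one minus the Beta CDF evaluated at 1/2. A sketch with assumed data (x = 7 heads in n = 10 flips, chosen for illustration):

```python
from scipy.stats import beta

n, x = 10, 7
posterior = beta(x + 1, n - x + 1)    # Beta(x+1, n-x+1)
p_H1 = 1.0 - posterior.cdf(0.5)       # integral of the posterior over (1/2, 1]
print(p_H1)                           # posterior probability of H1: theta > 1/2
```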
Posterior Inference: Point Estimates
All point estimates below derive from the posterior
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}$$
Maximum A Posteriori estimate (MAP):
$$\theta^* = \arg\max_{\theta}\,\log\pi(\theta \mid x) = \arg\max_{\theta}\left[\log f(x \mid \theta) + \log\pi(\theta)\right]$$
Posterior mean:
$$E_{\pi(\theta \mid x)}[\theta] = \int \theta\,\pi(\theta \mid x)\,d\theta$$
Posterior quantiles: the $a$-quantile $\theta_a$ satisfies
$$\Pr(\theta \le \theta_a \mid x) = \int^{\theta_a} \pi(\theta \mid x)\,d\theta = a$$
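A sketch computing these point estimates for the Beta(3,4) posterior of the coin example from the earlier slides:

```python
from scipy.stats import beta

post = beta(3, 4)
map_est = (3 - 1) / (3 + 4 - 2)   # mode of Beta(a,b) is (a-1)/(a+b-2) = 0.4
mean_est = post.mean()            # a/(a+b) = 3/7
median = post.ppf(0.5)            # 0.5-quantile of the posterior
print(map_est, mean_est, median, post.ppf([0.05, 0.95]))  # plus a 90% interval
```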