Page 1

15. Bayesian Methods

© A. Colin Cameron & Pravin K. Trivedi 2006

These transparencies were prepared in 2003.

They can be used as an adjunct to

Chapter 13 of our subsequent book

Microeconometrics: Methods and Applications

Cambridge University Press, 2005.

Original version of slides: May 2003

Page 2

Outline

1. Introduction

2. Bayesian Approach

3. Bayesian Analysis of Linear Regression

4. Monte Carlo Integration

5. Markov Chain Monte Carlo Simulation

6. MCMC Example: Gibbs Sampler for SUR

7. Data Augmentation

8. Bayesian Model Selection

9. Practical Considerations

Page 3

1 Introduction

• Bayesian regression has grown greatly since the books by Arnold Zellner (1971) and Leamer (1978).

• Controversial. Requires specifying a probabilistic model of prior beliefs about the unknown parameters. [Though the role of the prior is negligible in large samples, and relatively uninformative priors can be specified.]

• Growth due to computational advances.

• In particular, even when the posterior is analytically intractable one can use simulation (Monte Carlo) methods to
  – estimate posterior moments
  – make draws from the posterior.

Page 4

2 Bayesian Approach

1. Prior $\pi(\theta)$: Uncertainty about the parameters $\theta$ is explicitly modelled by the density $\pi(\theta)$.

e.g. $\theta$ is an income elasticity and on the basis of an economic model or previous studies it is felt that $\Pr[0.8 \le \theta \le 1.2] = 0.95$. A possible prior is $\theta \sim N[1, 0.1^2]$.

2. Sample joint density or likelihood $f(y|\theta)$: Similar to the ML framework. In the single-equation case $y$ is an $N \times 1$ vector and dependence on regressors $X$ is suppressed.

3. Posterior $p(\theta|y)$: Obtained by combining the prior and the sample.

Page 5

2.1 Bayes Theorem

• Bayes' inverse law of probability gives the posterior

$$p(\theta|y) = \frac{f(y|\theta)\pi(\theta)}{f(y)}, \qquad (1)$$

where $f(y)$ is the marginal (with respect to $\theta$) probability distribution of $y$:

$$f(y) = \int f(y|\theta)\pi(\theta)\,d\theta. \qquad (2)$$

• Proof: Use $\Pr[A|B] = \dfrac{\Pr[A \cap B]}{\Pr[B]} = \dfrac{\Pr[B|A]\Pr[A]}{\Pr[B]}$.

• $f(y)$ in (1) is free of $\theta$, so we can write $p(\theta|y)$ as proportional to the product of the pdf and the prior:

$$p(\theta|y) \propto f(y|\theta)\pi(\theta). \qquad (3)$$

• Big difference:
  – Frequentist: $\theta_0$ is constant and $\hat{\theta}$ is random.
  – Bayesian: $\theta$ is random.

Page 6

2.2 Normal-Normal iid Example

1. Sample density $f(y|\theta)$: Assume $y_i|\theta \sim N[\theta, \sigma^2]$ with $\theta$ unknown and $\sigma^2$ given.

$$f(y|\theta) = (2\pi\sigma^2)^{-N/2}\exp\left\{-\sum_{i=1}^{N}(y_i-\theta)^2/2\sigma^2\right\} \propto \exp\left\{-\frac{N}{2\sigma^2}(\bar{y}-\theta)^2\right\},$$

2. Prior $\pi(\theta)$: Suppose $\theta \sim N[\mu, s^2]$ where $\mu$ and $s^2$ are given.

$$\pi(\theta) = (2\pi s^2)^{-1/2}\exp\left\{-(\theta-\mu)^2/2s^2\right\} \propto \exp\left\{-\frac{1}{2s^2}(\theta-\mu)^2\right\},$$

3. Posterior density $p(\theta|y)$:

$$p(\theta|y) \propto \exp\left\{-\frac{N}{2\sigma^2}(\bar{y}-\theta)^2\right\}\exp\left\{-\frac{1}{2s^2}(\theta-\mu)^2\right\}.$$

Page 7

• After some algebra (completing the square)

$$p(\theta|y) \propto \exp\left\{-\frac{1}{2}\left[\frac{(\theta-\mu_1)^2}{\sigma_1^2} + \frac{(\bar{y}-\mu)^2}{N^{-1}\sigma^2 + s^2}\right]\right\} \propto \exp\left\{-\frac{1}{2}\frac{(\theta-\mu_1)^2}{\sigma_1^2}\right\}$$

$$\mu_1 = \sigma_1^2\left(N\bar{y}/\sigma^2 + \mu/s^2\right), \qquad \sigma_1^2 = \left(N/\sigma^2 + 1/s^2\right)^{-1}.$$

• Properties of the posterior:
  – The posterior density is $\theta|y \sim N[\mu_1, \sigma_1^2]$.
  – The posterior mean $\mu_1$ is a weighted average of the prior mean $\mu$ and the sample average $\bar{y}$.
  – The posterior precision $\sigma_1^{-2}$ is the sum of the sample precision of $\bar{y}$, $N/\sigma^2$, and the prior precision $1/s^2$. [Precision is the reciprocal of the variance.]
  – As $N \to \infty$, $\theta|y \approx N[\bar{y}, \sigma^2/N]$.

Page 8

Normal-Normal example with $\sigma^2 = 100$, $\mu = 5$, $s^2 = 3$, $N = 50$ and $\bar{y} = 10$.
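As a quick numerical check of the formulas on the previous page, here is a minimal Python sketch (an added illustration, not part of the original slides) that plugs in these values:

```python
# Posterior for the normal-normal model: theta | y ~ N[mu1, sig1_sq] with
# mu1 = sig1_sq * (N*ybar/sigma_sq + mu/s_sq) and sig1_sq = (N/sigma_sq + 1/s_sq)^(-1).
sigma_sq = 100.0      # known data variance
mu, s_sq = 5.0, 3.0   # prior mean and prior variance
N, ybar = 50, 10.0    # sample size and sample mean

sig1_sq = 1.0 / (N / sigma_sq + 1.0 / s_sq)
mu1 = sig1_sq * (N * ybar / sigma_sq + mu / s_sq)
print(mu1, sig1_sq)   # 8.0, 1.2: the posterior mean is pulled from the prior mean 5 toward ybar = 10
```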

Page 9

2.3 Specification of the Prior

• Tricky. Not the focus of this talk.

• The prior can be improper yet yield a proper posterior.

• A noninformative prior has little impact on the resulting posterior distribution. Use the Jeffreys prior, not a uniform prior, as it is invariant to reparametrization.

• For an informative prior, prefer a natural conjugate prior as it yields an analytical posterior. ⇒ Exponential family prior, density and posterior. e.g. normal-normal, Poisson-gamma.

• Hierarchical priors are popular for multilevel models.

Page 10

2.4 Measures Related to Posterior

• Marginal posterior: $p(\theta_k|y) = \int p(\theta_1,\ldots,\theta_d|y)\,d\theta_1 \cdots d\theta_{k-1}\,d\theta_{k+1} \cdots d\theta_d$.

• Posterior moments: mean/median; standard deviation.

• Point estimation: no unknown $\theta_0$ to estimate. Instead find the value of $\theta$ that minimizes a loss function.

• Posterior intervals (95%): $\Pr[\theta_{k,.025} \le \theta_k \le \theta_{k,.975}\,|\,y] = 0.95$.

• Hypothesis testing: Not relevant. Bayes factors.

• Conditional posterior density: $p(\theta_k|\theta_j,\ \theta_j \in \theta_{-k}, y) = p(\theta|y)/p(\theta_j \in \theta_{-k}|y)$.

Page 11

2.5 Large Sample Behavior of Posterior

• Asymptotically the role of the prior disappears.

• If there is a true $\theta_0$ then the posterior mode $\hat{\theta}$ (the maximum of the posterior) is consistent for it.

• The posterior is asymptotically normal,

$$\theta|y \overset{a}{\sim} N\left[\hat{\theta},\ \mathbf{I}(\hat{\theta})^{-1}\right], \qquad (4)$$

centered around the posterior mode, where

$$\mathbf{I}(\hat{\theta}) = -\left.\frac{\partial^2 \ln p(\theta|y)}{\partial\theta\,\partial\theta'}\right|_{\theta=\hat{\theta}}.$$

• Called a Bayesian central limit theorem.

Page 12

3 Bayesian Linear Regression

• Linear regression model

$$y|X, \beta, \sigma^2 \sim N[X\beta, \sigma^2 I_N].$$

• Different results with noninformative and informative priors. Even within these, results differ according to the setup.

Page 13

3.1 Noninformative Priors

• Jeffreys' priors: $\pi(\beta_j) \propto c$ and $\pi(\sigma^2) \propto 1/\sigma^2$. All values of $\beta_j$ are equally likely. Smaller values of $\sigma^2$ are viewed as more likely.

$$\pi(\beta, \sigma^2) \propto 1/\sigma^2.$$

• Posterior density after some algebra:

$$p(\beta,\sigma^2|y,X) \propto \left(\frac{1}{\sigma^2}\right)^{K/2}\exp\left\{-\frac{1}{2}(\beta-\hat{\beta})'\frac{1}{\sigma^2}(X'X)(\beta-\hat{\beta})\right\}\times\left(\frac{1}{\sigma^2}\right)^{(N-K)/2+1}\exp\left(-\frac{(N-K)s^2}{2\sigma^2}\right)$$

• Conditional posterior $p(\beta|\sigma^2,y,X)$ is $N[\hat{\beta}_{OLS},\ \sigma^2(X'X)^{-1}]$.

• Marginal posterior $p(\beta|y,X)$ (integrate out $\sigma^2$) is a multivariate t-distribution centered at $\hat{\beta}$ with $N-K$ degrees of freedom and variance $s^2(N-K)(X'X)^{-1}/(N-K-2)$.
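A minimal Python sketch (an added illustration with toy data, not the authors' code) of sampling from this noninformative-prior posterior: draw $\sigma^2$ from its inverse-gamma marginal (equivalently $(N-K)s^2/\sigma^2 \sim \chi^2_{N-K}$), then draw $\beta|\sigma^2$ from the conditional normal above.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy regression data (assumed design, for illustration only)
N, K = 100, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (N - K)

S = 10_000
# (N-K) s^2 / sigma^2 ~ chi^2_{N-K}, i.e. sigma^2 | y, X is inverse gamma
sigma2 = (N - K) * s2 / rng.chisquare(N - K, size=S)
# beta | sigma^2, y, X ~ N[beta_hat, sigma^2 (X'X)^{-1}]
beta = beta_hat + (rng.multivariate_normal(np.zeros(K), XtX_inv, size=S)
                   * np.sqrt(sigma2)[:, None])

print(beta.mean(axis=0))                               # posterior means, close to OLS
print(np.quantile(beta[:, 1], [0.025, 0.975]))         # 95% posterior interval for the slope
```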

Page 14

• Marginal posterior $p(\sigma^2|y,X)$ is inverse gamma.

• Qualitatively similar to frequentist analysis in finite samples.

• Interpretation is quite different.

E.g. a Bayesian 95 percent posterior interval for $\beta_j$ is $\hat{\beta}_j \pm t_{.025,N-K} \times \mathrm{se}[\hat{\beta}_j]$
  – means that $\beta_j$ lies in this interval with posterior probability 0.95
  – not that if we had many samples and constructed many such intervals, 95 percent of them would contain the true $\beta_{j0}$.

Page 15

3.2 Informative Priors

• Use conjugate priors:
  – Prior for $\beta|\sigma^2$ is $N[\beta_0, \sigma^2\Sigma_0]$.
  – Prior for $\sigma^2$ is inverse-gamma.

• Posterior after much algebra is

$$p(\beta,\sigma^2|y,X) \propto (\sigma^2)^{-(\nu_0+N)/2-1}\exp\left(-\frac{s_1}{2\sigma^2}\right)(\sigma^2)^{-K/2}\exp\left(-\frac{1}{2\sigma^2}(\beta-\bar{\beta})'\Sigma_1^{-1}(\beta-\bar{\beta})\right),$$

where

$$\bar{\beta} = (\Sigma_0^{-1} + X'X)^{-1}(\Sigma_0^{-1}\beta_0 + X'X\hat{\beta}),$$
$$\Sigma_1^{-1} = \Sigma_0^{-1} + X'X,$$
$$s_1 = s_0 + \hat{u}'\hat{u} + (\beta_0 - \hat{\beta})'\left[\Sigma_0 + (X'X)^{-1}\right]^{-1}(\beta_0 - \hat{\beta}).$$

Page 16

• Conditional posterior $p(\beta|\sigma^2,y,X)$ is $N[\bar{\beta}, \sigma^2\Sigma_1]$.

• Marginal posterior $p(\beta|y,X)$ (integrate out $\sigma^2$) is a multivariate t-distribution centered at $\bar{\beta}$.

• Here $\bar{\beta}$ is a (matrix-weighted) average of $\hat{\beta}_{OLS}$ and the prior mean $\beta_0$. And the posterior precision is the sum of the prior and sample precisions.

Page 17

4 Monte Carlo Integration

• Compute key posterior moments without first obtaining the posterior distribution.

• Want $E[m(\theta)\,|\,y]$, where the expectation is with respect to the posterior density $p(\theta|y)$. For notational convenience suppress $y$.

• So we wish to compute

$$E[m(\theta)] = \int m(\theta)p(\theta)\,d\theta. \qquad (5)$$

• Need a numerical estimate of an integral:
  – Numerical quadrature: too hard.
  – Direct Monte Carlo with draws from $p(\theta)$: not possible.
  – Instead use importance sampling.

Page 18

4.1 Importance Sampling

• Rewrite

$$E[m(\theta)] = \int m(\theta)p(\theta)\,d\theta = \int\left(\frac{m(\theta)p(\theta)}{g(\theta)}\right)g(\theta)\,d\theta,$$

where $g(\theta) > 0$ is a known density with the same support as $p(\theta)$.

• The corresponding Monte Carlo integral estimate is

$$\hat{E}[m(\theta)] = \frac{1}{S}\sum_{s=1}^{S}\frac{m(\theta^s)p(\theta^s)}{g(\theta^s)}, \qquad (6)$$

where $\theta^s$, $s = 1,\ldots,S$, are $S$ draws of $\theta$ from $g(\theta)$, not $p(\theta)$.

• To apply this to the posterior we also need to account for the constant of integration in the denominator of (1).

Page 19

• Let $p^{ker}(\theta) = f(y|\theta)\pi(\theta)$ be the posterior kernel.

• Then the posterior density is

$$p(\theta) = \frac{p^{ker}(\theta)}{\int p^{ker}(\theta)\,d\theta},$$

with posterior moment

$$E[m(\theta)] = \int m(\theta)\left(\frac{p^{ker}(\theta)}{\int p^{ker}(\theta)\,d\theta}\right)d\theta = \frac{\int m(\theta)\,p^{ker}(\theta)\,d\theta}{\int p^{ker}(\theta)\,d\theta} = \frac{\int\left[m(\theta)\,p^{ker}(\theta)/g(\theta)\right]g(\theta)\,d\theta}{\int\left[p^{ker}(\theta)/g(\theta)\right]g(\theta)\,d\theta}.$$

• The importance sampling-based estimate is then

$$\hat{E}[m(\theta)] = \frac{\frac{1}{S}\sum_{s=1}^{S} m(\theta^s)\,p^{ker}(\theta^s)/g(\theta^s)}{\frac{1}{S}\sum_{s=1}^{S} p^{ker}(\theta^s)/g(\theta^s)}, \qquad (7)$$

where $\theta^s$, $s = 1,\ldots,S$, are $S$ draws of $\theta$ from the importance sampling density $g(\theta)$.
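A minimal Python sketch (an added illustration, not the authors' code) of the self-normalized estimator (7), using the normal-normal posterior kernel from Section 2 and a Student-t importance density with fat tails:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# data and prior as in the earlier example: y_i ~ N[theta, 100], prior theta ~ N[5, 3]
y = rng.normal(10.0, 10.0, size=50)

def log_kernel(theta):
    # log posterior kernel: log f(y|theta) + log pi(theta), up to additive constants
    return (stats.norm.logpdf(y[:, None], loc=theta, scale=10.0).sum(axis=0)
            + stats.norm.logpdf(theta, loc=5.0, scale=np.sqrt(3.0)))

# importance density g: t with 5 df centered near the data mean (thicker tails than p)
g = stats.t(df=5, loc=y.mean(), scale=2.0)
S = 50_000
theta_s = g.rvs(size=S, random_state=rng)

logw = log_kernel(theta_s) - g.logpdf(theta_s)   # log importance weights p_ker/g
w = np.exp(logw - logw.max())                    # stabilize before exponentiating
post_mean = np.sum(w * theta_s) / np.sum(w)      # estimate of E[theta | y], cf. (7) with m(theta) = theta
print(post_mean)
```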

Page 20

• The method was proposed by Kloek and van Dijk (1978).

• Geweke (1989) established consistency and asymptotic normality as $S \to \infty$ if
  – $E[m(\theta)] < \infty$, so the posterior moment exists;
  – $\int p(\theta)\,d\theta = 1$, so the posterior density is proper. May require $\int \pi(\theta)\,d\theta < \infty$.
  – $g(\theta) > 0$ over the support of $p(\theta)$;
  – $g(\theta)$ should have thicker tails than $p(\theta)$, to ensure that the importance weight $w(\theta) = p(\theta)/g(\theta)$ remains bounded. e.g. use a multivariate t.

• The importance sampling method can be used to estimate many quantities, including the mean, standard deviation and percentiles of the posterior.

Page 21

5 Markov Chain Monte Carlo Simulation

• If we can make $S$ draws from the posterior, $E[m(\theta)]$ can be estimated by $S^{-1}\sum_s m(\theta^s)$.

• But it is hard to make draws if there is no tractable closed-form expression for the posterior density.

• Instead make sequential draws that, if the sequence is run long enough, converge to a stationary distribution that coincides with the posterior density $p(\theta)$.

• Called Markov chain Monte Carlo, as it involves simulation (Monte Carlo) and the sequence is that of a Markov chain.

• Note that the draws are correlated.

Page 22

5.1 Markov Chains

• A Markov chain is a sequence of random variables $x_n$ ($n = 0, 1, 2, \ldots$) with

$$\Pr[x_{n+1} = x \,|\, x_n, x_{n-1}, \ldots, x_0] = \Pr[x_{n+1} = x \,|\, x_n],$$

so that the distribution of $x_{n+1}$ given the past is completely determined by the preceding value $x_n$.

• The transition probabilities are

$$t_{xy} = \Pr[x_{n+1} = y \,|\, x_n = x].$$

• For a finite-state Markov chain with $m$ states, form an $m \times m$ transition matrix $T$.

• Then for a transition from $x$ to $y$ in $n$ steps (stages), the transition probability is given by $T^n$, the $n$-fold matrix product of $T$.
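For intuition, a small numerical illustration (an added example with an arbitrary two-state chain, not from the slides) of how the rows of $T^n$ settle down to the stationary distribution:

```python
import numpy as np

# Two-state transition matrix T; T[i, j] = Pr[next state = j | current state = i].
T = np.array([[0.9, 0.1],
              [0.4, 0.6]])
Tn = np.linalg.matrix_power(T, 50)   # 50-step transition probabilities
print(Tn)                            # both rows approach the stationary distribution (0.8, 0.2)
```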

Page 23

• The rows $t_j^{(n)}$ of the matrix $T^n$ give the marginal distribution across the $m$ states at the $n$th stage.

• The chain is said to yield a stationary distribution or invariant distribution $t(x)$ if

$$\sum_{x \in A} t(x)\,T_{x,y} = t(y) \quad \forall\, y \in A.$$

• For the Bayesian application the chain is $\theta^{(n)}$, not $x_n$.

• We want the chain $\theta^{(n)}$: (1) to converge to a stationary distribution; and (2) this stationary distribution to be the desired posterior.

Page 24

5.2 Gibbs Sampler

• Easy to describe and implement.

• Let $\theta = [\theta_1'\ \theta_2']'$ have posterior density $p(\theta) = p(\theta_1, \theta_2)$.

• Suppose we know $p(\theta_1|\theta_2)$ and $p(\theta_2|\theta_1)$.

• Then alternating sequential draws from $p(\theta_1|\theta_2)$ and $p(\theta_2|\theta_1)$ converge in the limit to draws from $p(\theta_1, \theta_2)$.

Page 25

5.2.1 Gibbs Sampler Example

• Let $y = (y_1, y_2) \sim N[\mu, \Sigma]$, where $\mu = (\mu_1, \mu_2)'$ and $\Sigma$ has diagonal entries 1 and off-diagonals $\rho$.

• Then given a uniform prior for $\mu$ the posterior is

$$\mu|y \sim N[\bar{y}, N^{-1}\Sigma].$$

• So the conditional posterior distributions are

$$\mu_1|\mu_2, y \sim N\left[\bar{y}_1 + \rho(\mu_2 - \bar{y}_2),\ (1-\rho^2)/N\right]$$
$$\mu_2|\mu_1, y \sim N\left[\bar{y}_2 + \rho(\mu_1 - \bar{y}_1),\ (1-\rho^2)/N\right].$$

• Can iteratively sample from each conditional normal distribution using the updated values of $\mu_1$ and $\mu_2$.

• If the chain is run long enough then it will converge to the bivariate normal.
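A minimal Python sketch (an added illustration with assumed sample means, not from the slides) of this two-block Gibbs sampler, drawing $\mu_1$ and $\mu_2$ in turn from the conditional normals above:

```python
import numpy as np

rng = np.random.default_rng(0)
ybar1, ybar2 = 1.0, 2.0        # sample means (assumed values, for illustration)
rho, N = 0.8, 100
cond_sd = np.sqrt((1 - rho**2) / N)

mu1, mu2 = 0.0, 0.0            # arbitrary starting values
draws = []
for s in range(11_000):
    mu1 = rng.normal(ybar1 + rho * (mu2 - ybar2), cond_sd)   # draw mu1 | mu2, y
    mu2 = rng.normal(ybar2 + rho * (mu1 - ybar1), cond_sd)   # draw mu2 | mu1, y
    if s >= 1_000:             # discard burn-in
        draws.append((mu1, mu2))

draws = np.array(draws)
print(draws.mean(axis=0))           # close to (ybar1, ybar2)
print(np.corrcoef(draws.T)[0, 1])   # posterior correlation, close to rho
```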

Page 26

5.2.2 Gibbs Sampler

• More generally, suppose $\theta$ is partitioned into $d$ blocks. e.g. $\theta = [\beta'\ \sigma^2]'$ in a linear regression example.

• Let $\theta_k$ be the $k$th block and $\theta_{-k}$ denote all components of $\theta$ aside from $\theta_k$.

• Assume the full conditional distributions $p(\theta_k|\theta_{-k})$, $k = 1, \ldots, d$, are known.

• Then sequential sampling from the full conditionals can be set up as follows.

Page 27

1. Let the initial values of $\theta$ be $\theta^{(0)} = (\theta_1^{(0)}, \ldots, \theta_d^{(0)})$.

2. The next iteration involves sequentially revising all components of $\theta$ to yield $\theta^{(1)} = (\theta_1^{(1)}, \ldots, \theta_d^{(1)})$, generated using $d$ draws from the $d$ conditional distributions as follows:

$$p(\theta_1^{(1)} \,|\, \theta_2^{(0)}, \ldots, \theta_d^{(0)})$$
$$p(\theta_2^{(1)} \,|\, \theta_1^{(1)}, \theta_3^{(0)}, \ldots, \theta_d^{(0)})$$
$$\vdots$$
$$p(\theta_d^{(1)} \,|\, \theta_1^{(1)}, \theta_2^{(1)}, \ldots, \theta_{d-1}^{(1)})$$

3. Return to step one, reinitialize the vector $\theta$ at $\theta^{(1)}$ and cycle through step 2 again to obtain the new draw $\theta^{(2)}$. Repeat the steps until convergence is achieved.

Page 28

• Geman and Geman (1984) showed that the stochastic sequence $\{\theta^{(n)}\}$ is a Markov chain with the correct stationary distribution. See also Tanner and Wong (1987) and Gelfand and Smith (1990).

• These results do not tell us how many cycles are needed for convergence, which is model dependent.

• It is very important to ensure that a sufficient number of cycles are executed for the chain to converge. Discard the earliest results from the chain, the so-called "burn-in" phase. Diagnostic tests are available.

Page 29

5.3 Metropolis Algorithm

• The Gibbs sampler is the best known MCMC algorithm.

• It has limited applicability as it requires direct sampling from the full conditional distributions, which may not be known.

• Two extensions that allow MCMC to be applied more generally are the Metropolis algorithm and the Metropolis-Hastings algorithm.

• In applying MCMC we use a sequence of approximating posterior distributions; these are called transition distributions or transition kernels or proposal densities.

• Use the notation $J_n(\theta^{(n)}|\theta^{(n-1)})$, which emphasizes that the transition distribution varies with $n$.

Page 30

1. Draw a starting point $\theta^{(0)}$ from an initial approximation to the posterior for which $p(\theta^{(0)}) > 0$. e.g. draw from a multivariate t-distribution centered on the posterior mode.

2. Set $n = 1$. Draw $\theta^*$ from a symmetric jumping distribution $J_1(\theta^{(1)}|\theta^{(0)})$, i.e. for any arbitrary pair $(\theta_a, \theta_b)$, $J_n(\theta_a|\theta_b) = J_n(\theta_b|\theta_a)$. e.g. $\theta^{(1)}|\theta^{(0)} \sim N[\theta^{(0)}, V]$ for some fixed $V$.

3. Calculate the ratio of densities $r = p(\theta^*)/p(\theta^{(0)})$.

4. Set

$$\theta^{(1)} = \begin{cases} \theta^* & \text{with probability } \min(r, 1) \\ \theta^{(0)} & \text{with probability } 1 - \min(r, 1). \end{cases}$$

5. Return to step 2, increase the counter and repeat the following steps.
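A minimal Python sketch (an added illustration with an assumed scalar target, not from the slides) of these steps, using a normal random-walk jumping distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(theta):
    # log posterior kernel, here a N[3, 2^2] kernel for illustration;
    # any normalizing constant cancels in the ratio r
    return -0.5 * ((theta - 3.0) / 2.0) ** 2

theta = 0.0                     # step 1: starting value with p(theta) > 0
draws = []
for n in range(20_000):
    theta_star = theta + rng.normal(0.0, 1.0)         # step 2: symmetric jump
    r = np.exp(log_p(theta_star) - log_p(theta))      # step 3: density ratio
    if rng.uniform() < min(r, 1.0):                   # step 4: accept with probability min(r, 1)
        theta = theta_star
    draws.append(theta)                               # step 5: repeat

draws = np.array(draws[2_000:])                       # discard burn-in
print(draws.mean(), draws.std())                      # close to 3 and 2
```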

Page 31

• Can view this as an iterative method to maximize $p(\theta)$: if $\theta^*$ increases $p(\theta)$ then $\theta^{(n)} = \theta^*$ always; if $\theta^*$ decreases $p(\theta)$ then $\theta^{(n)} = \theta^*$ with probability $r$.

• Similar in spirit to accept-reject sampling, but with no requirement that a fixed multiple of the jumping distribution always covers the posterior.

• Metropolis generates a Markov chain with properties of reversibility, irreducibility and Harris recurrence that ensure convergence to a stationary distribution.

• To see that the Metropolis stationary distribution is the desired posterior $p(\theta)$, proceed as follows ...

Page 32

• Let $\theta_a$ and $\theta_b$ be points such that $p(\theta_b) \ge p(\theta_a)$.

• If $\theta^{(n-1)} = \theta_a$ and $\theta^* = \theta_b$ then $\theta^{(n)} = \theta_b$ with certainty and

$$\Pr[\theta^{(n)} = \theta_b,\ \theta^{(n-1)} = \theta_a] = J_n(\theta_b|\theta_a)\,p(\theta_a).$$

• If the order is reversed and $\theta^{(n-1)} = \theta_b$ and $\theta^* = \theta_a$, then $\theta^{(n)} = \theta_a$ with probability $r = p(\theta_a)/p(\theta_b)$ and

$$\Pr[\theta^{(n)} = \theta_a,\ \theta^{(n-1)} = \theta_b] = J_n(\theta_a|\theta_b)\,p(\theta_b)\,\frac{p(\theta_a)}{p(\theta_b)} = J_n(\theta_a|\theta_b)\,p(\theta_a) = J_n(\theta_b|\theta_a)\,p(\theta_a),$$

as the jumping distribution is symmetric.

• Symmetric joint distribution $\Rightarrow$ the marginal distributions of $\theta^{(n)}$ and $\theta^{(n-1)}$ are the same $\Rightarrow$ $p(\theta)$ is the stationary distribution.

Page 33

5.4 Metropolis-Hastings (M-H) Algorithm

• The Metropolis-Hastings (M-H) algorithm is the same as the Metropolis algorithm, except that in step 2 the jumping distribution need not be symmetric.

• Then in step 3 the acceptance probability is

$$r_n = \frac{p(\theta^*)/J_n(\theta^*|\theta^{(n-1)})}{p(\theta^{(n-1)})/J_n(\theta^{(n-1)}|\theta^*)} = \frac{p(\theta^*)\,J_n(\theta^{(n-1)}|\theta^*)}{p(\theta^{(n-1)})\,J_n(\theta^*|\theta^{(n-1)})}.$$

• Any normalizing constants present in either $p(\cdot)$ or $J_n(\cdot)$ cancel in $r_n$. So both the posterior and jump probabilities need only be computed up to this constant.

Page 34

5.5 M-H Examples

• Different jumping distributions lead to different M-H algorithms.

• The Gibbs sampler is a special case of M-H. If $\theta$ is partitioned into $d$ blocks, then there are $d$ Metropolis steps at the $n$th step of the algorithm. The jumping distribution is the conditional distribution given in subsection 5.2 and the acceptance probability is always 1. Gibbs sampling is also called alternating conditional sampling.

• Mixed strategies can be used. e.g. an M-H step combined with a Gibbs sampler.

• The independence chain makes all draws from a fixed density $g(\theta)$.

Page 35

• A random walk chain sets the draw $\theta^* = \theta^{(n-1)} + \varepsilon$, where $\varepsilon$ is a draw from $g(\varepsilon)$.

• Gelman et al. (1995, p. 334) consider $\theta \sim N[\mu, \Sigma]$. For Metropolis with

$$\theta^*|\theta^{(n-1)} \sim N[\theta^{(n-1)}, c^2\Sigma],$$

$c \simeq 2.4/\sqrt{q}$ leads to the greatest efficiency relative to direct draws from the $q$-variate normal. The efficiency is about 0.3, compared to $1/q$ for the Gibbs sampler for $\Sigma = \sigma^2 I_q$.

Page 36

6 Gibbs Sampler for SUR

• Two-equation example with $i$th observation

$$y_{1i} = \beta_{11} + \beta_{12}x_{1i} + \varepsilon_{1i}$$
$$y_{2i} = \beta_{21} + \beta_{22}x_{2i} + \varepsilon_{2i}, \qquad \begin{bmatrix}\varepsilon_{1i}\\ \varepsilon_{2i}\end{bmatrix} \sim N\left[\begin{bmatrix}0\\0\end{bmatrix},\ \Sigma = \begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{bmatrix}\right].$$

• Assume independent informative priors, with

$$\beta \sim N[\beta_0, B_0^{-1}], \qquad \Sigma^{-1} \sim \text{Wishart}[\nu_0, D_0].$$

• Some algebra yields the conditional posteriors

$$\beta|\Sigma, y, X \sim N\left[C_0\left(B_0\beta_0 + \sum_{i=1}^{N} x_i'\Sigma^{-1}y_i\right),\ C_0\right],$$
$$\Sigma^{-1}|\beta, y, X \sim \text{Wishart}\left[\nu_0 + N,\ \left(D_0^{-1} + \sum_{i=1}^{N}\varepsilon_i\varepsilon_i'\right)^{-1}\right],$$

Page 37

where $C_0 = \left(B_0 + \sum_{i=1}^{N} x_i'\Sigma^{-1}x_i\right)^{-1}$.

• The Gibbs sampler can be used since the conditionals are known.

• Simulation: $N = 1000$ or $N = 10000$; $x_{1i} \sim N[0,1]$ and $x_{2i} \sim N[0,1]$; $\beta_{11} = \beta_{12} = \beta_{21} = \beta_{22} = 1$; $\sigma_{11} = \sigma_{22} = 1$, $\sigma_{12} = -0.5$. Priors: $\beta_0 = 0$, $B_0^{-1} = \lambda I$ (with $\lambda = 10, 1, 0.1$), $D_0 = I$ and $\nu_0 = 5$.

• The Gibbs sampler samples recursively from the conditional posteriors. Discard the first 5,000 replications ("burn-in"). Use the subsequent 50,000 and 100,000 replications.

• The table reports the mean and standard deviation of the marginal posterior distribution of the 7 parameters.
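A minimal Python sketch (an added reconstruction under the stated priors and design, not the authors' code) of this Gibbs sampler for one design point (λ = 10, N = 1000), with far fewer replications than in the table:

```python
# Cycle between beta | Sigma ~ N[C0(B0 beta0 + sum_i X_i' Sigma^{-1} y_i), C0]
# and Sigma^{-1} | beta ~ Wishart[nu0 + N, (D0^{-1} + sum_i e_i e_i')^{-1}].
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(0)

# simulate data as in the slides' design
N = 1000
x1, x2 = rng.normal(size=N), rng.normal(size=N)
Sigma_true = np.array([[1.0, -0.5], [-0.5, 1.0]])
eps = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=N)
y1 = 1.0 + 1.0 * x1 + eps[:, 0]
y2 = 1.0 + 1.0 * x2 + eps[:, 1]

Y = np.column_stack([y1, y2])                 # N x 2
X = np.zeros((N, 2, 4))                       # X_i is 2 x 4; beta = (b11, b12, b21, b22)
X[:, 0, 0], X[:, 0, 1] = 1.0, x1
X[:, 1, 2], X[:, 1, 3] = 1.0, x2

# priors: beta ~ N[0, 10 I], i.e. B0 = (1/10) I;  Sigma^{-1} ~ Wishart[5, I]
beta0, B0 = np.zeros(4), np.eye(4) / 10.0
nu0, D0_inv = 5, np.eye(2)

beta, Sigma_inv = np.zeros(4), np.eye(2)
keep = []
for s in range(6_000):                        # 1,000 burn-in + 5,000 kept (small, for illustration)
    # draw beta | Sigma, y, X
    XtSX = np.einsum('nja,jk,nkb->ab', X, Sigma_inv, X)
    XtSy = np.einsum('nja,jk,nk->a', X, Sigma_inv, Y)
    C0 = np.linalg.inv(B0 + XtSX)
    beta = rng.multivariate_normal(C0 @ (B0 @ beta0 + XtSy), C0)
    # draw Sigma^{-1} | beta, y, X
    E = Y - np.einsum('nab,b->na', X, beta)   # N x 2 residuals
    Sigma_inv = wishart.rvs(df=nu0 + N, scale=np.linalg.inv(D0_inv + E.T @ E))
    if s >= 1_000:
        keep.append(np.r_[beta, np.linalg.inv(Sigma_inv)[[0, 0, 1], [0, 1, 1]]])

keep = np.array(keep)
print(keep.mean(axis=0))   # posterior means of (b11, b12, b21, b22, s11, s12, s22)
print(keep.std(axis=0))    # posterior standard deviations, comparable to Table 1
```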

Page 38

• First three columns: the results are not sensitive to different values of $\lambda$.

• Fourth column vs. first: doubling the number of replications has very little effect.

• Fifth column vs. first: increasing the sample size ten-fold to $N = 10{,}000$ has relatively little impact on the point estimates, though precision is much higher.

• When the number of replications is small ($\simeq 1000$) the autocorrelation coefficients of the parameters are found to be as high as 0.06. When the number of replications is $\simeq 50{,}000$, serial correlation is much lower ($< 0.01$).

Page 39

7 Data Augmentation

• The Gibbs sampler can sometimes be applied to a wider range of models by introducing auxiliary variables.

• In particular, this is the case for models involving latent variables, such as discrete choice, truncated and censored models.

• Observe only $y = g(y^*)$ for given $g(\cdot)$ and latent dependent variable $y^*$. e.g. Probit / logit have $y = \mathbf{1}(y^* > 0)$.

• Data augmentation replaces $y^*$ by imputed values and treats these as observed data.

• The essential insight, due to Tanner and Wong (1987), is that the posterior based only on the observed data is intractable, but the posterior obtained after data augmentation is often tractable using the Gibbs sampler.
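To make the idea concrete, a minimal Python sketch (an added illustration, not from the slides) of a standard data-augmentation Gibbs sampler for the probit model $y = \mathbf{1}(y^* > 0)$, $y^* = x'\beta + e$, $e \sim N[0,1]$, with a flat prior on $\beta$:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# simulated data (assumed design, for illustration only)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 1.0])
y = (X @ beta_true + rng.normal(size=N) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)
beta = np.zeros(2)
keep = []
for s in range(3_000):
    # 1. Draw latent y* | beta, y from normals truncated to (0, inf) if y = 1, (-inf, 0] if y = 0
    mu = X @ beta
    lower = np.where(y == 1, -mu, -np.inf)    # truncnorm bounds refer to the standardized error
    upper = np.where(y == 1, np.inf, -mu)
    ystar = mu + truncnorm.rvs(lower, upper, size=N, random_state=rng)
    # 2. Draw beta | y* from N[(X'X)^{-1} X'y*, (X'X)^{-1}]  (flat prior, unit error variance)
    beta_hat = XtX_inv @ (X.T @ ystar)
    beta = rng.multivariate_normal(beta_hat, XtX_inv)
    if s >= 500:                               # discard burn-in
        keep.append(beta)

print(np.mean(keep, axis=0))                   # posterior means, close to beta_true
```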

Page 40

8 Bayesian Model Selection

• The method uses Bayes factors.

• Two hypotheses under consideration:
  – $H_1$ and $H_2$, possibly non-nested.
  – Prior probabilities $\Pr[H_1]$ and $\Pr[H_2]$.
  – Sample dgp's $\Pr[y|H_1]$ and $\Pr[y|H_2]$.

• Posterior probabilities by Bayes theorem:

$$\Pr[H_k|y] = \frac{\Pr[y|H_k]\Pr[H_k]}{\Pr[y|H_1]\Pr[H_1] + \Pr[y|H_2]\Pr[H_2]}.$$

• The posterior odds ratio is

$$\frac{\Pr[H_1|y]}{\Pr[H_2|y]} = \frac{\Pr[y|H_1]\Pr[H_1]}{\Pr[y|H_2]\Pr[H_2]} \equiv B_{12}\,\frac{\Pr[H_1]}{\Pr[H_2]},$$

where $B_{12} = \Pr[y|H_1]/\Pr[y|H_2]$ is called the Bayes factor.

Page 41

• Hypothesis 1 is preferred if the posterior odds ratio $> 1$.

• Bayes factor $=$ posterior odds in favor of $H_1$ if $\Pr[H_1] = \Pr[H_2]$.

• The Bayes factor has the form of a likelihood ratio. But it depends on the unknown parameters $\theta_k$, which are eliminated by integrating over the parameter space with respect to the prior, so

$$\Pr[y|H_k] = \int \Pr[y|\theta_k, H_k]\,\pi(\theta_k|H_k)\,d\theta_k.$$

• This expression depends upon all the constants that appear in the likelihood. These constants can be neglected when evaluating the posterior, but are required for the computation of the Bayes factor.
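As a toy illustration (an added example with hypothetical priors, not from the slides), the marginal likelihoods $\Pr[y|H_k]$ can be estimated by simple Monte Carlo, averaging the likelihood over draws from each prior; the ratio then gives the Bayes factor:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
y = rng.normal(10.0, 10.0, size=50)      # data as in the earlier normal example (sigma = 10, N = 50)

def log_marglik(prior_mean, prior_sd, sigma=10.0, S=50_000):
    # Pr[y | Hk] = int f(y|theta) pi(theta|Hk) dtheta  ~=  (1/S) sum_s f(y|theta^s), theta^s ~ prior
    theta = rng.normal(prior_mean, prior_sd, size=S)
    loglik = norm.logpdf(y[:, None], loc=theta, scale=sigma).sum(axis=0)
    return loglik.max() + np.log(np.mean(np.exp(loglik - loglik.max())))   # log-sum-exp for stability

# H1: theta ~ N[5, 3];  H2: theta ~ N[0, 3]   (hypothetical priors, equal prior model probabilities)
logB12 = log_marglik(5.0, np.sqrt(3.0)) - log_marglik(0.0, np.sqrt(3.0))
print("Bayes factor B12 =", np.exp(logB12))   # equals the posterior odds of H1 vs H2 when Pr[H1] = Pr[H2]
```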

Page 42

9 Practical Considerations

• The WinBUGS package (Bayesian inference Using Gibbs Sampling) is especially useful for hierarchical models and missing-data problems.

• For more complicated models use Matlab or Gauss.

• Practical issue of how long to run the chain. Diagnostic checks for convergence are available, but often do not have universal applicability. Graphs of output for scalar parameters from the Markov chain are a visually attractive way of confirming convergence, but more formal approaches are available (Geweke, 1992). Gelman and Rubin (1992) use multiple (parallel) Gibbs samplers, each beginning with different starting values, to see whether different chains converge to the same posterior distribution. Zellner and Min (1995) propose several convergence criteria that can be used if the posterior can be written explicitly.

Page 43

10 Bibliography

• Useful books include Gamerman (1997), Gelman, Carlin, Stern and Rubin (1995), Gill (2002) and Koop (2003), plus the older texts by Zellner (1971) and Leamer (1978).

• Numerous papers by Chib and his collaborators, and by Geweke and his collaborators, cover many topics of interest in microeconometrics. See Chib and Greenberg (1996), Chib (2000) and Geweke and Keane (2000).

Albert, J.H. (1988), "Computational Methods for Using a Bayesian Hierarchical Generalized Linear Model", Journal of the American Statistical Association, 83, 1037-1045.

Casella, G. and E. George (1992), "Explaining the Gibbs Sampler", The American Statistician, 46, 167-174.

Page 44

Chib, S. (2000), "Markov Chain Monte Carlo Methods: Computation and Inference", chapter 57 in J.J. Heckman and E.E. Leamer, Editors, Handbook of Econometrics, Volume 5, 3570-3649.

Chib, S., and E. Greenberg (1995), "Understanding the Metropolis-Hastings Algorithm", The American Statistician, 49, 4, 327-335.

Chib, S., and E. Greenberg (1996), "Markov Chain Monte Carlo Simulation Methods in Econometrics", Econometric Theory, 12, 409-431.

Gamerman, D. (1997), Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, London: Chapman and Hall.

Gelfand, A.E. and A.F.M. Smith (1990), "Sampling Based Approaches to Calculating Marginal Densities", Journal of the American Statistical Association, 85, 398-409.

Gelman, A., J.B. Carlin, H.S. Stern and D.B. Rubin (1995), Bayesian Data Analysis, London: Chapman and Hall.

Gelman, A., and D.B. Rubin (1992), "Inference from Iterative Simulations Using Multiple Sequences", Statistical Science, 7, 457-511.

Page 45

Geman, S. and D. Geman (1984), "Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.

Geweke, J. (1989), "Bayesian Inference in Econometric Models Using Monte Carlo Integration", Econometrica, 57, 1317-1339.

Geweke, J. (1992), "Evaluating the Accuracy of Sampling-based Approaches to the Calculation of Posterior Moments (with discussion)", in J. Bernardo, J. Berger, A.P. Dawid, and A.F.M. Smith, Editors, Bayesian Statistics, 4, 169-193, Oxford: Oxford University Press.

Geweke, J. and M. Keane (2000), "Computationally Intensive Methods for Integration in Econometrics", chapter 56 in J.J. Heckman and E.E. Leamer, Editors, Handbook of Econometrics, Volume 5, 3463-3567.

Gill, J. (2002), Bayesian Methods: A Social and Behavioral Sciences Approach, Boca Raton (FL): Chapman and Hall.

Hastings, W.K. (1970), "Monte Carlo Sampling Methods Using Markov Chains and Their Applications", Biometrika, 57, 97-109.

Kass, R.E. and A.E. Raftery (1995), "Bayes Factors", Journal of the American Statistical Association, 90, 773-795.

Page 46

Kloek, T. and H.K. van Dijk (1978), "Bayesian Estimates of Equation System Parameters: An Application of Integration by Monte Carlo", Econometrica, 46, 1-19.

Koop, G. (2003), Bayesian Econometrics, Wiley.

Leamer, E.E. (1978), Specification Searches: Ad Hoc Inference with Nonexperimental Data, New York: John Wiley.

Robert, C.P., and G. Casella (1999), Monte Carlo Statistical Methods, New York: Springer-Verlag.

Tanner, M.A., and W.H. Wong (1987), "The Calculation of Posterior Distributions by Data Augmentation", Journal of the American Statistical Association, 82, 528-549.

Zellner, A. (1971), An Introduction to Bayesian Inference in Econometrics, New York: John Wiley.

Zellner, A. (1978), "Jeffreys-Bayes Posterior Odds Ratio and the Akaike Information Criterion for Discriminating Between Models", Economics Letters, 1, 337-342.

Zellner, A., and C-k. Min (1995), "Gibbs Sampler Convergence Criteria", Journal of the American Statistical Association, 90, 921-927.

Page 47

Table 1: Mean and standard deviation of the posterior distribution of a two-equation SUR model calculated by Gibbs sampling. Standard deviations in parentheses.

            λ = 10      λ = 1       λ = 1/10    λ = 10      λ = 10
  N         1000        1000        1000        1000        10000
  reps      50000       50000       50000       100000      100000
  β11       0.971       1.013       0.983       1.020       1.010
            (0.0310)    (0.0312)    (0.0316)    (0.0324)    (0.0100)
  β12       1.026       0.9835      1.006       1.006       1.015
            (0.0265)    (0.0271)    (0.0265)    (0.0268)    (0.0086)
  β21       1.016       0.972       0.993       1.017       0.991
            (0.0309)    (0.0325)    (0.0322)    (0.0326)    (0.0100)
  β22       0.983       0.992       0.979       1.005       1.007
            (0.0256)    (0.0285)    (0.0272)    (0.0277)    (0.0085)
  σ11       0.960       0.969       1.012       1.043       1.010
            (0.0429)    (0.0434)    (0.0453)    (0.0466)    (0.0143)
  σ12       -0.499      -0.507      -0.519      -0.576      -0.515
            (0.0340)    (0.0358)    (0.0368)    (0.0379)    (0.0113)
  σ22       0.950       1.066       1.049       1.062       1.002
            (0.425)     (0.0476)    (0.0467)    (0.0472)    (0.0141)