Transcript of defense slides: bcaffo/downloads/defense.pdf

Page 1:

Candidate Sampling Schemes and

Some Important Applications

Brian S. Caffo

Chair: James G. Booth

Page 2:

Dissertation outline

• Introduction

• Review of Monte Carlo

• Review of conditional inference

• An MCMC algorithm for approximating conditional probabilities

• ESUP accept/reject sampling

• MCEM algorithm

• Discussion

Page 3:

Candidate sampling schemes

• Target distribution F with density f

• Candidate distribution G with density g

• Supp(F) ⊂ Supp(G)

F and G are the same type

F ≪ G

• Accept/reject sampling

• Independence Metropolis algorithm

• Metropolis-Hastings algorithm, which allows G to depend on the previously generated variable

Page 4:

Metropolis-Hastings algorithm

• Current state x_i

• Target density f(·)

• Candidate transition density g(·|x_i)

• Generate Y ∼ g(y|x_i)

• Accept Y as the next state with probability

min( f(Y) g(x_i|Y) / [f(x_i) g(Y|x_i)], 1 )

otherwise the next state is x_i

• Markov chain with f as stationary density

• g_θ(·|x_i) can be selected at random

Select θ at random from π(θ)

Candidate is g_θ(·|x_i)
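
A minimal sketch of a Metropolis-Hastings step in Python, using a hypothetical standard-normal target f and a normal candidate g centered at the current state (both choices are assumptions for illustration, not the candidates used later in the dissertation):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f(x):
    # Hypothetical target density f, up to a constant: standard normal
    return -0.5 * x**2

def log_g(y, x_i, scale=1.0):
    # Hypothetical candidate transition density g(y|x_i): normal centered at x_i
    return -0.5 * ((y - x_i) / scale) ** 2

def mh_step(x_i):
    y = rng.normal(loc=x_i, scale=1.0)                  # generate Y ~ g(y|x_i)
    # log of f(Y) g(x_i|Y) / (f(x_i) g(Y|x_i))
    log_ratio = (log_f(y) + log_g(x_i, y)) - (log_f(x_i) + log_g(y, x_i))
    if np.log(rng.uniform()) <= min(log_ratio, 0.0):    # accept with probability min(ratio, 1)
        return y
    return x_i                                          # otherwise the next state is x_i

chain = [0.0]
for _ in range(5000):
    chain.append(mh_step(chain[-1]))                    # draws approximate the stationary density f
```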

Page 5:

Conditional analysis for log-linear models

Y = (Y_1, . . . , Y_n)^t ∼ Poisson(µ)

Alternative model

log µ = Xβ + Zλ

Null model H_0 : λ = 0

log µ = Xβ

Sufficient statistics for β under H_0 are S_β = X^t Y

Test the fit of the null model using the conditional distribution

f(y | S_β) ∝ ∏_{i=1}^n (y_i!)^{-1}

for y in Γ = {y : X^t y = X^t y_obs}
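
For a table small enough to work with directly, membership in Γ and the unnormalized conditional density above can be evaluated as follows; the 2×2 independence design X and the observed table y_obs are hypothetical choices for illustration:

```python
import numpy as np
from math import factorial

# Hypothetical 2x2 independence model: columns of X are row and column indicators,
# cells listed in row-major order, so X^t y gives the row and column margins
X = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 0, 1]])
y_obs = np.array([3, 1, 2, 4])      # hypothetical observed table

s_beta = X.T @ y_obs                # sufficient statistic S_beta = X^t y_obs

def in_gamma(y):
    # y lies in Gamma iff it reproduces the observed sufficient statistics
    return np.array_equal(X.T @ np.asarray(y), s_beta)

def cond_density_unnormalized(y):
    # f(y|S_beta) is proportional to prod_i 1/y_i! on Gamma and is 0 off Gamma
    if not in_gamma(y):
        return 0.0
    return float(np.prod([1.0 / factorial(int(yi)) for yi in y]))

# Example: [2, 2, 3, 3] has the same margins as y_obs, so it gets positive mass;
# [4, 0, 2, 4] does not, so it gets 0
print(cond_density_unnormalized([2, 2, 3, 3]), cond_density_unnormalized([4, 0, 2, 4]))
```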

Page 6:

Benefits of conditional inference

• Eliminates β

• Inferences are “exact”

• Induces correlation

• Avoids the Neyman/Scott problem

• Most standard tests are conditional tests

Criticisms of conditional inference

• Sβ not ancillary for λ

• Often too conservative

• Degenerate conditional distribution

• Often hard/impossible to do

The complexity or size of Γ can make enumerating all of its members impossible

Page 7:

Our algorithm for Monte Carlo conditional inference

• Use the MH algorithm to approximate expectations from f(y|S_β)

• Poisson rv’s are approximately normal

• Sampling normal random variables Y subject to X^t Y = s_β is easy (see the sketch after this list)

• Sampling normal random variables Y = [Y_1^t Y_2^t]^t subject to X^t Y = s_β and Y_2 = y_2 is nearly as easy

• Update a few random cells each iteration

• Round in a clever way

• Specify g(y_new | y_old)

• Specify g(y_old | y_new)

• Irreducibility
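
A sketch of the constrained normal draw used as a building block above: sampling Y ~ N(mu, I) conditioned on X^t Y = s_beta via the standard conditional-normal formulas. The N(mu, I) approximation, the 2×2 design, and the omission of the rounding and fixed-cell steps are all simplifications for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def constrained_normal_draw(mu, X, s_beta):
    """Draw Y ~ N(mu, I) conditioned on the linear constraint X^t Y = s_beta.

    The pseudoinverse handles redundant constraints (e.g. overlapping margins).
    """
    n = len(mu)
    XtX_pinv = np.linalg.pinv(X.T @ X)
    # Conditional mean: shift mu so that the constraint holds exactly
    mean = mu + X @ XtX_pinv @ (s_beta - X.T @ mu)
    # Conditional covariance: the projection onto the null space of X^t
    P = np.eye(n) - X @ XtX_pinv @ X.T
    z = rng.standard_normal(n)
    return mean + P @ z              # P is symmetric idempotent, so P z ~ N(0, P)

# Hypothetical 2x2 table whose row and column margins are held fixed
X = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 0, 1]], dtype=float)
y_obs = np.array([3.0, 1.0, 2.0, 4.0])
y_new = constrained_normal_draw(y_obs, X, X.T @ y_obs)
# X.T @ y_new reproduces the observed margins up to floating-point error
```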

Page 8:

Wife's Rating

                  N    F    V    A   Tot
Husband's    N    7    7    2    3    19
Rating       F    2    8    3    7    20
             V    1    5    4    9    19
             A    2    8    9   14    33
           Tot   12   28   18   33    91

Page 9:

Wife's Rating

                  N    F    V    A   Tot
Husband's    N    ?    7    2    ?    19
Rating       F    2    8    3    ?    20
             V    ?    5    ?    ?    19
             A    ?    ?    ?    ?    33
           Tot   12   28   18   33    91

Wife's Rating

                  N    F    V    A   Tot
Husband's    N    7    ?    2    ?    19
Rating       F    2    8    3    ?    20
             V    ?    5    4    ?    19
             A    ?    ?    ?   14    33
           Tot   12   28   18   33    91

Page 10:

Quasi-symmetry model for a 10×10 table

[Figure: pattern of ? cells in the 10 × 10 table]

Page 11:

Accept/reject sampling

• Target distribution F with density f

• Candidate distribution G with density g

• C ≡ sup_x f(x)/g(x) < ∞ (for now)

• Simulates G variates and accepts those most consistent with being F variates

Accept/reject algorithm

1 Generate X ∼ G

2 Generate U ∼ U(0,1)

3 Accept X if U ≤ f(X)/(C g(X))

• Accepted Xs have distribution F

• Acceptance rate is 1/C

• Only have to know f and g up to constants of proportionality

• Any upper bound on C works
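
A minimal sketch of the algorithm for an illustrative target: a Beta(2, 2) density with a Uniform(0, 1) candidate, for which C = sup_x f(x)/g(x) = 1.5:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return 6.0 * x * (1.0 - x)      # Beta(2, 2) density (illustrative target)

def g(x):
    return 1.0                      # Uniform(0, 1) candidate density

C = 1.5                             # sup_x f(x)/g(x); any upper bound also works

def accept_reject(n):
    samples = []
    while len(samples) < n:
        x = rng.uniform()                    # 1. generate X ~ G
        u = rng.uniform()                    # 2. generate U ~ U(0, 1)
        if u <= f(x) / (C * g(x)):           # 3. accept X if U <= f(X)/(C g(X))
            samples.append(x)                # accepted Xs have distribution F
    return np.array(samples)

draws = accept_reject(10_000)                # acceptance rate is about 1/C = 2/3
```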

Page 12:

Accept/reject sampling

Uniform Candidate

[Figure: candidate points spread uniformly under C g; the accepted points are those falling under the graph of f]

P(Point accepted) = (area under f) / (area under C g) = 1/C

P(Point accepted and X < x) = (area under f to the left of x) / (area under C g) = F(x)/C

P(X < x | Point accepted) = [F(x)/C] / [1/C] = F(x)

Page 13:

ESUP Accept/reject sampling

[Figure: the accept/reject picture with the known bound C alongside the picture with the estimated bound]

• Estimate C with the largest observed value of f(X_i)/g(X_i)

• The sequence of accepted Xs is no longer independent or identically distributed

• Conditional on the current value of the estimate Ĉ, we are really simulating from min(f, Ĉ g)

Page 14:

ESUP Accept/reject algorithm

1 Generate X ∼ G

2 Generate U ∼ U(0, 1)

3 Accept X if U ≤ f(X)/(Ĉ g(X))

4 Update Ĉ = max(Ĉ, f(X)/g(X))
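
A sketch of the same Beta(2, 2)/uniform example with the supremum estimated on the fly, as in steps 1-4 above (here the estimate is updated after every candidate, and the example target and candidate are again just illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return 6.0 * x * (1.0 - x)      # Beta(2, 2) density (illustrative target)

def g(x):
    return 1.0                      # Uniform(0, 1) candidate density

def esup_accept_reject(n):
    C_hat = 0.0                     # running estimate of C = sup f/g
    samples = []
    while len(samples) < n:
        x = rng.uniform()                         # 1. generate X ~ G
        u = rng.uniform()                         # 2. generate U ~ U(0, 1)
        ratio = f(x) / g(x)
        if C_hat > 0.0 and u <= ratio / C_hat:    # 3. accept X if U <= f(X)/(C_hat g(X))
            samples.append(x)
        C_hat = max(C_hat, ratio)                 # 4. update C_hat with the largest observed ratio
    return np.array(samples)

# Early draws come from min(f, C_hat g); as C_hat climbs toward C they approach F
draws = esup_accept_reject(10_000)
```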

• Prove everything about ESUP accept/reject by comparing the candidates it accepts with the candidates that KSUP accept/reject (accept/reject with the known supremum C) accepts

• Since Ĉ ≤ C, ESUP accept/reject always accepts any candidate that KSUP accept/reject accepts

• For convenience, and contrary to the algorithm above, we assume that Ĉ is updated only once for every accepted candidate

Page 15:

Notation

• Let {Xij} ∼ G

• Let {Uij} ∼ Uniform(0, 1)

• Y_i = X_{i τ_i}, where τ_i = min{ j : U_ij ≤ f(X_ij)/(C g(X_ij)) }   (KSUP)

• Ỹ_i = X_{i τ̃_i}, where τ̃_i = min{ j : U_ij ≤ f(X_ij)/(C_i g(X_ij)) }   (ESUP)

• Assume C = f(x)/g(x) for some x in the support of F

• Note: if Σ_{i=1}^∞ P(Y_i ≠ Ỹ_i) < ∞, it doesn't matter whether we use ESUP or KSUP accept/reject sampling

X_11  X_12  X_13  X_14  X_15
U_11  U_12  U_13  U_14  U_15

Ỹ_1 = X_12 (ESUP)        Y_1 = X_15 (KSUP)
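
The coupling can be simulated directly: for each i, run both acceptance rules on the same stream {X_ij, U_ij} and count the draws where they disagree. A sketch with the Beta(2, 2)/Uniform(0, 1) example (C = 1.5 known), updating the running estimate at every candidate rather than once per acceptance:

```python
import numpy as np

rng = np.random.default_rng(0)

def ratio(x):
    return 6.0 * x * (1.0 - x)      # f(x)/g(x) for a Beta(2, 2) target, uniform candidate

C_true = 1.5                        # known supremum used by KSUP
C_run = 1e-8                        # running ESUP estimate (small positive start)
n_draws, n_diff = 5000, 0

for i in range(n_draws):
    y_ksup = y_esup = None
    while y_ksup is None or y_esup is None:       # shared stream X_ij, U_ij for draw i
        x, u = rng.uniform(), rng.uniform()
        r = ratio(x)
        if y_esup is None and u <= r / C_run:     # ESUP stopping time (uses the estimate)
            y_esup = x
        if y_ksup is None and u <= r / C_true:    # KSUP stopping time (uses the true C)
            y_ksup = x
        C_run = max(C_run, r)
    n_diff += (y_esup != y_ksup)

# Disagreements concentrate in the early draws, as the bound on P(Y_i != Y~_i) suggests
print(n_diff, "of", n_draws, "draws differ")
```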

Page 16:

Theorem: If the support of F is discrete, then Σ_{i=1}^∞ P(Y_i ≠ Ỹ_i) < ∞

The sequences are the same for all but finitely many i

Argument

• For each candidate generated there is a positive probability that C_i = C

• The number of candidates generated until C_i = C is therefore finite with probability one, and

P(Y_i ≠ Ỹ_i) ≤ P(a geometric random variable ≥ i)

• It is then easy to show that the right-hand side of the inequality sums.

In a sense, this theorem also covers continuous cases

Page 17:

Continuous Case

Theorem: P(Y_i ≠ Ỹ_i) ≤ E[C/C_i − 1]

Theorem: E[C/C_i − 1] = O(i^-1)

Argument

Note that Z_i = C_i/C is the maximum of i i.i.d. random variables bounded by 1, so P(Z_i ≤ z) = F_Z(z)^i for the distribution function F_Z of a single ratio

E[1 − Z_i] = 1 − ∫_0^1 (1 − F_Z(z)^i) dz = ∫_0^1 F_Z(z)^i dz ≤ F_Z(ε)^i + ∫_ε^1 F_Z(z)^i dz

Need to show F_Z(z)^i ≤ z^{pi} for some 0 < p ≤ 1 and ε ≤ z ≤ 1

It suffices that f_Z(z) ≥ p z^{p−1} on that range

Always true for some p if f_Z(1) > 0, which is where the assumption that C = f(x)/g(x) for some x in the support of F is used

Page 18:

Page 19:

Main Theorem

• {Y_i}: the i.i.d. sequence of F variates from the KSUP algorithm

• {Ỹ_i}: the sequence from the ESUP algorithm

• If Y is an F variate, let the mean of h(Y) be µ_h and the variance of h(Y) be σ_h^2

(i.) Σ_{i=1}^n h(Y_i) / n → µ_h

(ii.) √n ( Σ_{i=1}^n h(Y_i)/n − µ_h ) / σ_h → N(0, 1) in distribution

Theorem: If h is continuous and E|h(Y)|^{2+δ} < ∞ for some δ > 0, then (i.) and (ii.) hold with Y_i replaced by Ỹ_i

Proof: P(Y_i ≠ Ỹ_i) = O(i^-1) and Hölder's inequality

Page 20:

Notes

• All theorems apply using any estimates of C that exceed the C_i

• Under smoothness assumptions, the large-sample behavior of C_i shows that P(Y_i ≠ Ỹ_i) = O(i^-2)

• An infinite value of C can be diagnosed using the large-sample behavior of exceedances

This results in a p-value based on Greenwood's statistic

• Convergence of the ESUP is independent of the dimension of x

Page 21:

Question

Gender  Q1  Q2  Q3  Count

Male yes yes yes 342

Male yes yes no 26

Male yes no yes 11

Male yes no no 32

Male no yes yes 6

Male no yes no 21

Male no no yes 19

Male no no no 356

Female yes yes yes 440

Female yes yes no 25

Female yes no yes 14

Female yes no no 47

Female no yes yes 14

Female no yes no 18

Female no no yes 22

Female no no no 457

Page 22:

Example: MCEM for a Logit/Normal Model

• Person i, question j

• Responses Y_ij | U_i are independent Bernoulli(π_ij)

log( π_ij / (1 − π_ij) ) = Intercept + Sex + Question + Person
                         = α + γ I(person i is female) + β_j + U_i

(the corresponding complete-data log-likelihood is sketched below)

• U_i are independent Normal(0, σ^2)

• Perhaps not the best model for this data
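
The complete-data log-likelihood log f(y_i, u; θ) used in the E-step on the next slide can be written down directly. A sketch, assuming β_3 is fixed at 0 as the identifiability constraint (that constraint, and the function and argument names, are guesses for illustration):

```python
import numpy as np

def log_f_complete(y_i, u, theta, female):
    """Complete-data log-density log f(y_i, u; theta) for one person.

    y_i    : array of 0/1 responses to the J = 3 questions
    u      : the person's random effect U_i
    theta  : (alpha, gamma, beta1, beta2, sigma); beta3 is taken to be 0
    female : indicator I(person i is female)
    """
    alpha, gamma, beta1, beta2, sigma = theta
    beta = np.array([beta1, beta2, 0.0])
    eta = alpha + gamma * female + beta + u                  # logit(pi_ij), j = 1, 2, 3
    # Bernoulli part: sum_j [ y_ij * eta_j - log(1 + exp(eta_j)) ]
    bernoulli = np.sum(y_i * eta - np.logaddexp(0.0, eta))
    # Random-effect part: U_i ~ Normal(0, sigma^2)
    normal = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (u / sigma) ** 2
    return bernoulli + normal
```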

Page 23:

MCEM Algorithm

• Let θ = (α, γ, β_1, β_2, σ)^t

• Let θt be the current estimate of θ

• Let ci be the count of people in group i

• The EM algorithm obtains θ_{t+1} by maximizing the expected complete-data log-likelihood

Q_t = Σ_{i=1}^{16} c_i E*_t [ log f(y_i, u_i; θ) ]

• where E*_t denotes expectation with respect to U_i | y_i, θ_t

• MCEM maximizes

Q_t = Σ_{i=1}^{16} (c_i / m_i) Σ_{k=1}^{m_i} log f(y_i, U_ik; θ)

where the U_ik are i.i.d. draws from U_i | y_i, θ_t
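
Given draws U_ik from U_i | y_i, θ_t for each of the 16 response patterns, the Monte Carlo Q_t is just a weighted average of complete-data log-likelihoods. A minimal sketch (the data structure and the reuse of a log f such as the sketch above are assumptions):

```python
import numpy as np

def monte_carlo_Q(theta, groups, log_f):
    """Q_t = sum_i (c_i / m_i) * sum_k log f(y_i, U_ik; theta).

    groups : list of dicts, one per response pattern, with keys
             'y' (responses y_i), 'c' (count c_i), 'female' (gender indicator),
             and 'U' (array of m_i draws U_ik from U_i | y_i, theta_t)
    log_f  : complete-data log-density, e.g. log_f_complete sketched above
    """
    total = 0.0
    for grp in groups:
        inner = np.mean([log_f(grp["y"], u, theta, grp["female"]) for u in grp["U"]])
        total += grp["c"] * inner
    return total

# One M-step would maximize this over theta, for example with
# scipy.optimize.minimize(lambda th: -monte_carlo_Q(th, groups, log_f_complete), theta_t)
```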

Page 24:

Details

• To minimize the Monte Carlo variance of Q_t we should set

d_i = c_i ( Var[ log f(y_i, U_ik; θ) ] )^{1/2},   m_i = M d_i / Σ_{i=1}^{16} d_i

Usually the counts dominate this estimate, so we can just set d_i = c_i (see the sketch after this list)

• ESUP accept/reject sampling

• Use a shifted and scaled t_3 distribution as the candidate distribution

• Location and scale parameters are second-order Taylor approximations to the moments of U_i | y_i, θ_t

• Difficult to calculate the exact C for this problem
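
A sketch of the allocation rule from the first bullet, given pilot draws of log f(y_i, U_ik; θ) for each group (the function and its inputs are illustrative):

```python
import numpy as np

def allocate_sample_sizes(counts, pilot_logliks, M):
    """Split the total Monte Carlo budget M across groups.

    counts        : the group counts c_i
    pilot_logliks : for each group, pilot draws of log f(y_i, U_ik; theta)
    Uses d_i = c_i * sd(log f) and m_i = M * d_i / sum_i d_i.
    """
    d = np.array([c * np.std(ll, ddof=1) for c, ll in zip(counts, pilot_logliks)])
    m = M * d / d.sum()
    return np.maximum(1, np.rint(m).astype(int))     # at least one draw per group

# When the counts dominate, d_i is roughly proportional to c_i,
# so simply setting d_i = c_i gives nearly the same allocation.
```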

Page 25:

Performance of ESUP for one iteration and one cluster in the EM algorithm

Table 1: Average number of differences (AND) and acceptance rate (AR) for marginal and Laplace candidates with z/n = 1/3 for M = 1,000.

          Marginal          Laplace/t
 n   z    AND      AR      AND    AR
 9   3    20.56    0.11    0.28   0.85
12   4    18.359   0.07    0.43   0.85
15   5    17.03    0.05    0.27   0.85
18   6    16.09    0.04    0.25   0.86
21   7    14.429   0.03    0.31   0.86
24   8    13.703   0.02    0.28   0.86
27   9    13.134   0.02    0.32   0.86
30  10    11.999   0.02    0.25   0.86

Page 26:

Future research

• Standard errors for MCMC exact conditional inference

Bounding the mixing time

Perfect sampling

• Extensions to the conditional saddlepoint approximation

• Gröbner bases?

• Completely automated accept/reject sampler

Rao-Blackwellization