Double Regression with Post-stratification (DRP) for high-dimensional survey data

Eli Ben-Michael, Avi Feller, and Erin Hartman
UC Berkeley & UCLA

BigSurv20, November 2020

Even in moderate dimensions, survey adjustment is hard

[Figure: spectrum from raking to post-stratification; moving toward post-stratification is labeled "More desirable" and "Less feasible"]

Quickly run into empty cells

How can we account for interactions in a principled way?

Adjusting for interactions in non-response is important
– Example: 2016 presidential election polling [Kennedy et al., 2018]

Ideally we’d post-stratify, but we can’t

This paper: Approximately post-stratify while at least raking on margins
– Leverage the value of interactions in a parsimonious way
– Dual representation as multilevel model of non-response

Combine with outcome model → Double Regression with Post-stratification (DRP)

Approximately post-stratifying while at least raking

Notation and setup

$i = 1, \ldots, N$ individuals
– Outcome $Y_i$, response $R_i$ with probability $P(R_i = 1) = \pi_i$
– $d$ categorical covariates with levels $J_1, \ldots, J_d$

Combine into cells $S_i \in \{1, \ldots, J\}$, where $J \equiv J_1 \times \cdots \times J_d$
– Overall count vector $N^P \in \mathbb{N}^J$ and response count vector $n^R \in \mathbb{N}^J$

Binary vector of $k$th order interaction terms for cell $s$: $D_s^k$

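Goal

Impute the average

$$\mu = \frac{1}{N} \sum_{i=1}^{N} Y_i = \frac{1}{N} \sum_s N_s^P \mu_s$$

From the respondents

$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} R_i \hat{\gamma}_i Y_i = \frac{1}{N} \sum_s n_s^R \hat{\gamma}_s \bar{Y}_s$$

Assume responses are Missing At Random (MAR) so that within each cell

$$E[\bar{Y}_s] = \mu_s$$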

Choosing weights: Raking and Post-stratification

Raking on margins

Exactly match the counts for margins:

$$\sum_s D_s^1 n_s^R \hat{\gamma}_s = \sum_s D_s^1 N_s^P$$

– Can usually compute if $d$ is moderate
– “Only” accounts for the linear variables

Post-stratification

Exactly match the counts within each cell:

$$\hat{\gamma}_s = \frac{N_s^P}{n_s^R}$$

– Impossible to compute with empty cells
– Unbiased when feasible
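The two weighting rules above can be illustrated on a toy 2 x 2 table. This is a minimal sketch, not the authors' implementation: the function names `poststratify` and `rake`, the cell counts, and the use of iterative proportional fitting as the raking routine are all assumptions made for illustration.

```python
def poststratify(pop, resp):
    """Cell weights gamma_s = N_s^P / n_s^R; fails if any cell has no respondents."""
    weights = {}
    for cell, n_pop in pop.items():
        n_resp = resp.get(cell, 0)
        if n_resp == 0:
            raise ValueError(f"cell {cell} has no respondents")
        weights[cell] = n_pop / n_resp
    return weights


def rake(pop, resp, n_iter=200):
    """Iterative proportional fitting: rescale cell weights, one covariate at a
    time, until weighted respondent counts match every population margin."""
    cells = [c for c in pop if resp.get(c, 0) > 0]
    weights = {c: 1.0 for c in cells}
    n_covariates = len(cells[0])
    for _ in range(n_iter):
        for j in range(n_covariates):
            for level in {c[j] for c in pop}:
                target = sum(n for c, n in pop.items() if c[j] == level)
                current = sum(resp[c] * weights[c] for c in cells if c[j] == level)
                if current > 0:
                    for c in cells:
                        if c[j] == level:
                            weights[c] *= target / current
    return weights


# Hypothetical counts; each cell is (level of covariate 1, level of covariate 2).
pop = {(0, 0): 40, (0, 1): 10, (1, 0): 20, (1, 1): 30}
resp = {(0, 0): 12, (0, 1): 4, (1, 0): 6, (1, 1): 8}

ps_weights = poststratify(pop, resp)
raked_weights = rake(pop, resp)
print("post-stratification:", ps_weights)
print("raking:", raked_weights)
```

Setting any respondent count to zero makes `poststratify` raise, which is the empty-cell problem from the slide; `rake` only needs the margins to be attainable from the non-empty cells.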

What do we want from calibration weights?
Is it worth it to try to post-stratify?

Population regression

$$Y_i = \sum_{k=1}^{K} \beta_k \cdot D_{S_i}^k + e_i$$

Estimation error

$$|\hat{\mu} - \mu| \le \frac{1}{N} \sum_{k=1}^{K} \|\beta_k\|_2 \left\| \sum_s D_s^k n_s^R \hat{\gamma}_s - D_s^k N_s^P \right\|_2 + \frac{1}{N} \|\hat{\gamma}\|_2 \|e\|_2$$

If interactions are weak, approximate post-stratification may be enough
– Bias depends on strength of interaction × imbalance
– Variance depends on sum of the squared weights $\|\hat{\gamma}\|_2^2$
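A short sketch of where a bound of this shape can come from. It rests on two assumptions that the slide does not state: the population residuals sum to zero, $\sum_i e_i = 0$ (as they would for a least-squares fit with an intercept), and the weights are constant within cells, $\hat{\gamma}_i = \hat{\gamma}_{S_i}$. The norms in the last term are taken over respondents here. The second line substitutes the population regression and drops $\frac{1}{N}\sum_i e_i$; the last line applies the Cauchy-Schwarz inequality to each term.

```latex
\begin{aligned}
\hat{\mu} - \mu
  &= \frac{1}{N} \sum_{i=1}^{N} \left( R_i \hat{\gamma}_i - 1 \right) Y_i \\
  &= \frac{1}{N} \sum_{k=1}^{K} \beta_k \cdot \sum_s D_s^k \left( n_s^R \hat{\gamma}_s - N_s^P \right)
     + \frac{1}{N} \sum_{i=1}^{N} R_i \hat{\gamma}_i e_i , \\
|\hat{\mu} - \mu|
  &\le \frac{1}{N} \sum_{k=1}^{K} \|\beta_k\|_2
       \left\| \sum_s D_s^k n_s^R \hat{\gamma}_s - D_s^k N_s^P \right\|_2
     + \frac{1}{N} \|\hat{\gamma}\|_2 \|e\|_2 .
\end{aligned}
```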

Approximately post-stratify while at least raking on margins

Find weights via convex optimization:

$$\min_{\gamma} \; \sum_{k=2}^{d} \frac{1}{\lambda_k} \left\| \sum_s D_s^k n_s^R \gamma_s - D_s^k N_s^P \right\|_2^2 + \|\gamma\|_2^2$$

$$\text{subject to} \quad \sum_s D_s^1 n_s^R \gamma_s = \sum_s D_s^1 N_s^P, \qquad \gamma_s > 0$$

Move smoothly between two extremes
– With $\lambda_k \to \infty$, recover raking
– With $\lambda_k \to 0$, recover post-stratification
– With $\lambda_k = 1$, cell weights are regularized by cell size

Based on calibration weighting and approximate balancing weights
[Deville and Sarndal, 1992; Deville et al., 1993; Zubizarreta, 2015; Hirshberg et al., 2019]
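The optimization above can be sketched for a 2 x 2 table, where there is a single second-order interaction. This is a simplified illustration under stated assumptions, not the authors' solver: it codes the margins as an intercept plus one indicator per covariate, takes $\|\gamma\|_2^2$ over cells, drops the constraint $\gamma_s > 0$ (positivity is only checked afterwards), and solves the resulting equality-constrained quadratic program through its KKT linear system. The names `solve_linear` and `multilevel_weights` and all counts are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def multilevel_weights(pop, resp, lam):
    """Minimize (1/lam) * (interaction imbalance)^2 + sum(gamma^2)
    subject to exact balance on the intercept and both main effects."""
    cells = sorted(pop)
    n_cells, n_margins = len(cells), 3
    d1 = [[1, c[0], c[1]] for c in cells]  # intercept + main effects
    d2 = [c[0] * c[1] for c in cells]      # second-order interaction
    a1 = [[d1[s][r] * resp[cells[s]] for s in range(n_cells)] for r in range(n_margins)]
    b1 = [sum(d1[s][r] * pop[cells[s]] for s in range(n_cells)) for r in range(n_margins)]
    a2 = [d2[s] * resp[cells[s]] for s in range(n_cells)]
    b2 = sum(d2[s] * pop[cells[s]] for s in range(n_cells))

    # KKT system: [[H, A1^T], [A1, 0]] [gamma; nu] = [(2/lam) a2 b2; b1],
    # with H = 2 * (I + a2 a2^T / lam).
    size = n_cells + n_margins
    kkt = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for s in range(n_cells):
        for t in range(n_cells):
            kkt[s][t] = 2.0 * a2[s] * a2[t] / lam + (2.0 if s == t else 0.0)
        rhs[s] = 2.0 * a2[s] * b2 / lam
        for r in range(n_margins):
            kkt[s][n_cells + r] = a1[r][s]
            kkt[n_cells + r][s] = a1[r][s]
    for r in range(n_margins):
        rhs[n_cells + r] = b1[r]
    solution = solve_linear(kkt, rhs)
    return dict(zip(cells, solution[:n_cells]))


# Hypothetical counts; each cell is (level of covariate 1, level of covariate 2).
pop = {(0, 0): 40, (0, 1): 10, (1, 0): 20, (1, 1): 30}
resp = {(0, 0): 12, (0, 1): 4, (1, 0): 6, (1, 1): 8}

for lam in (1e-6, 1.0, 1e8):
    gamma = multilevel_weights(pop, resp, lam)
    imbalance = resp[(1, 1)] * gamma[(1, 1)] - pop[(1, 1)]
    print(lam, gamma, "interaction imbalance:", imbalance)
```

In this sketch a very small `lam` drives the weights toward $N_s^P / n_s^R$, while a very large `lam` leaves the smallest-norm weights that still match the margins, so the interaction imbalance grows as `lam` grows.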

Dual view: multilevel regression for response

A regularized model for the inverse probability of response:
[Zhao and Percival, 2016; Wang and Zubizarreta, 2020; Chattopadhyay et al., 2020]

$$\frac{1}{\pi_i} \sim \left[ \theta_1 \cdot D_{S_i}^1 + \sum_{k=2}^{K} \theta_k \cdot D_{S_i}^k \right]_+$$

Regularization makes approximate post-stratification feasible

$$0 \times \|\theta_1\|_2^2 + \sum_{k=2}^{K} \lambda_k \|\theta_k\|_2^2$$

– At least raking → no regularization for marginal probabilities [Little and Wu, 1991]
– Approximate post-stratification → regularizing interaction terms

Link between calibration weighting and inverse propensity score weighting

Double Regression with Post-Stratification (DRP)

MRP-style approaches

Start with an outcome model (aggregated to cell-level)

$$\mu_s = \frac{1}{N_s^P} \sum_{S_i = s} Y_i$$

– Multilevel model, high dimensional regression, tree-based methods

Post-stratify using predictions instead of outcomes

$$\hat{\mu}_{\text{mrp}} = \frac{1}{N} \sum_s \frac{N_s^P}{n_s^R} n_s^R \hat{\mu}_s = \frac{1}{N} \sum_s N_s^P \hat{\mu}_s$$

DRP: a more surgical approach

Instead of relying on the model everywhere, use it to pick up slack from weighting

$$\hat{\mu}_{\text{drp}} = \hat{\mu}_{\text{weight}} + \frac{1}{N} \sum_s \hat{\mu}_s \times \underbrace{\left( N_s^P - n_s^R \hat{\gamma}_s \right)}_{\text{imbalance in cell } s}$$

$$\phantom{\hat{\mu}_{\text{drp}}} = \hat{\mu}_{\text{mrp}} + \frac{1}{N} \sum_s n_s^R \hat{\gamma}_s \times \underbrace{\left( \bar{Y}_s - \hat{\mu}_s \right)}_{\text{error in cell } s}$$

Related to double robust and bias-corrected estimators
[Cassel et al., 1976; Robins et al., 1994; Abadie and Imbens, 2006; Hirshberg and Wager, 2019]

– Relies on outcome model in cells where weighting doesn’t get it right
– Relies on weights to adjust cells where model is off
– If post-stratifying, collapses to weighting estimator $\hat{\mu}_{\text{drp}} = \hat{\mu}_{\text{weight}}$
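The two forms of the DRP estimator above agree because $\hat{\mu}_{\text{weight}} = \frac{1}{N} \sum_s n_s^R \hat{\gamma}_s \bar{Y}_s$ and $\hat{\mu}_{\text{mrp}} = \frac{1}{N} \sum_s N_s^P \hat{\mu}_s$, so both expand to the same sum. The sketch below checks this numerically on made-up numbers, including one empty cell where only the model contributes. The function name `drp_estimate`, the cell means, the model predictions and the weights are all hypothetical; no outcome model is actually fitted here.

```python
def drp_estimate(pop, resp, ybar, mu_hat, gamma):
    """Return (weighting, MRP, DRP form 1, DRP form 2) estimates of the mean."""
    n_total = sum(pop.values())
    observed = [c for c in pop if resp.get(c, 0) > 0]
    # Weighted respondent count n_s^R * gamma_s per cell (zero for empty cells).
    wtot = {c: (resp[c] * gamma[c] if c in observed else 0.0) for c in pop}
    mu_weight = sum(wtot[c] * ybar[c] for c in observed) / n_total
    mu_mrp = sum(pop[c] * mu_hat[c] for c in pop) / n_total
    # Form 1: weighting estimate + model prediction x leftover imbalance.
    drp_1 = mu_weight + sum(mu_hat[c] * (pop[c] - wtot[c]) for c in pop) / n_total
    # Form 2: MRP estimate + weighted model error among respondents.
    drp_2 = mu_mrp + sum(wtot[c] * (ybar[c] - mu_hat[c]) for c in observed) / n_total
    return mu_weight, mu_mrp, drp_1, drp_2


# Hypothetical inputs: cell (1, 0) has no respondents.
pop = {(0, 0): 40, (0, 1): 10, (1, 0): 20, (1, 1): 30}
resp = {(0, 0): 12, (0, 1): 4, (1, 0): 0, (1, 1): 8}
ybar = {(0, 0): 0.50, (0, 1): 0.25, (1, 1): 0.70}                   # respondent cell means
mu_hat = {(0, 0): 0.45, (0, 1): 0.35, (1, 0): 0.55, (1, 1): 0.65}   # model predictions
gamma = {c: 100 / 24 for c in ybar}                                 # uniform weights

mu_weight, mu_mrp, drp_1, drp_2 = drp_estimate(pop, resp, ybar, mu_hat, gamma)
print(mu_weight, mu_mrp, drp_1, drp_2)

# With every cell observed and post-stratification weights, the imbalance
# term vanishes and DRP collapses to the weighting estimator.
resp_full = {**resp, (1, 0): 6}
ybar_full = {**ybar, (1, 0): 0.60}
gamma_ps = {c: pop[c] / resp_full[c] for c in pop}
full = drp_estimate(pop, resp_full, ybar_full, mu_hat, gamma_ps)
print(full)
```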

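Estimation error

Estimation error depends on how good the weights and the model are together

$$\left| \hat{\mu}_{\text{drp}} - \mu \right| \lesssim \frac{1}{N} \sqrt{\sum_s n_s (\hat{\mu}_s - \mu_s)^2} \sqrt{\sum_s \left( n_s^R \hat{\gamma}_s - N_s^P \right)^2} + \frac{1}{N} \|\hat{\gamma}\|_2 \|e\|_2$$

Special case of linear model:

$$\left| \hat{\mu}_{\text{drp}} - \mu \right| \lesssim \frac{1}{N} \sum_{k=1}^{K} \|\hat{\beta}_k - \beta_k\|_2 \left\| \sum_s D_s^k n_s^R \hat{\gamma}_s - D_s^k N_s^P \right\|_2 + \frac{1}{N} \|\hat{\gamma}\|_2 \|e\|_2$$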

Example: 2016 presidential election

Pre-election Pew poll of vote intention
– Age, sex, race, region, party ID, education, income, born again Christian

Impute Republican vote share within each state

Ground truth: weighted CCES

Interactions should be important here [Kennedy et al., 2018]

First, look at imbalance with full weighted CCES as target

$$\frac{\left| N_s^P - n_s^R \hat{\gamma}_s \right|}{N_s^P}$$

Both multilevel weighting and raking exactly match marginals...

...but raking fails to balance higher order interactions

Approximate balance in 3rd order interactions

Weighting on higher order interactions helps

Double regression picks up some slack

Recap

Principled, parsimonious way of leveraging the value of interactions
– Multilevel weighting as middle ground between raking and post-stratification
– Dual view as multilevel model for non-response
– DRP adjusts cells where weighting misses the mark

Thank you!
ebenmichael.github.io

Appendix

References

Abadie, A. and Imbens, G. W. (2006). Large Sample Properties of Matching Estimators. Econometrica, 74(1):235–267.

Cassel, C. M., Sarndal, C.-E., and Wretman, J. H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika, 63(3):615–620.

Chattopadhyay, A., Hase, C. H., and Zubizarreta, J. R. (2020). Balancing Versus Modeling Approaches to Weighting in Practice. Statistics in Medicine, in press.

Deville, J. C. and Sarndal, C. E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418):376–382.

Deville, J. C., Sarndal, C. E., and Sautory, O. (1993). Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88(423):1013–1020.

Hirshberg, D. and Wager, S. (2019). Augmented Minimax Linear Estimation.

Hirshberg, D. A., Maleki, A., and Zubizarreta, J. (2019). Minimax Linear Estimation of the Retargeted Mean.

Kennedy, C., Blumenthal, M., Clement, S., Clinton, J. D., Durand, C., Franklin, C., McGeeney, K., Miringoff, L., Olson, K., Rivers, D., Saad, L., Witt, G. E., and Wlezien, C. (2018). An Evaluation of the 2016 Election Polls in the United States. Public Opinion Quarterly, 82(1):1–33.

Little, R. J. and Wu, M. M. (1991). Models for contingency tables with known margins when target and sampled populations differ. Journal of the American Statistical Association, 86(413):87–95.

Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors are not Always Observed. Journal of the American Statistical Association, 89(427):846–866.

Wang, Y. and Zubizarreta, J. R. (2020). Minimal dispersion approximately balancing weights: Asymptotic properties and practical considerations. Biometrika, 107(1):93–105.

Zhao, Q. and Percival, D. (2016). Entropy Balancing is Doubly Robust. Journal of Causal Inference.

Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511):910–922.