Selection Bias with Linear Probability Models (LPM)

Suneel Chatla, Galit Shmueli

Institute of Service Science, National Tsing Hua University, Taiwan

Outline

• Introduction to self-selection

• Popular methods for selection bias correction

  o Two step methods (2SLS)

  o Matching methods (PSM)

• Incorporating LPM into 2SLS and PSM

• Simulation study

• Conclusions

Quasi-experiments

Like randomized experimental designs, quasi-experiments test causal hypotheses, but they lack random assignment (subjects self-select into treatment)

Pros

• When random assignment is impractical and/or unethical

• Easier to set up, greater external validity

• Minimize threats to ecological validity

Cons

• Estimates are subject to contamination by confounding variables (Biased)

• Do not have total control over extraneous variables

Why do we need quasi-experiments?

Two Methods for Addressing Selection Bias

Two step methods: Heckman vs Olsen

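Heckman (1977), probit selection model

Stage 1: Selection model (T): $E[T \mid X] = \Phi(X\gamma)$

Adjustment: $\mathrm{IMR} = \phi(X\gamma) / \Phi(X\gamma)$

Stage 2: Outcome model (Y): $Y = X\beta + \delta\,\mathrm{IMR} + \varepsilon$

Olsen (1980), LPM selection model

Stage 1: Selection model (T): $E[T \mid X] = X\gamma$

Adjustment: $\lambda = X\gamma - 1$

Stage 2: Outcome model (Y): $Y = X\beta + \delta\lambda + \varepsilon$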
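Both adjustment terms can be motivated as the conditional mean of the outcome error in the selected sample. The block below is a sketch under textbook normalizations that are not stated on the slide: for Heckman, $\omega \sim N(0,1)$ with selection when $X\gamma + \omega > 0$; for Olsen, a uniform selection error $u \sim U(0,1)$ with selection when $u < X\gamma$ and $E[\varepsilon \mid u]$ linear in $u$. The symbols $\rho$ (error correlation), $\sigma_\varepsilon$ (outcome error standard deviation), and $u$ are introduced here for illustration.

```latex
\begin{align*}
% Heckman: bivariate normal errors
E[\varepsilon \mid T = 1, X]
  &= \rho\,\sigma_\varepsilon\,\frac{\phi(X\gamma)}{\Phi(X\gamma)}
   = \rho\,\sigma_\varepsilon\,\mathrm{IMR} \\[4pt]
% Olsen: uniform selection error, so E[u | u < X\gamma] = X\gamma / 2
E[\varepsilon \mid T = 1, X]
  &= \rho\,\sigma_\varepsilon\,\sqrt{12}\,\Bigl(\tfrac{1}{2}X\gamma - \tfrac{1}{2}\Bigr)
   = \sqrt{3}\,\rho\,\sigma_\varepsilon\,(X\gamma - 1)
\end{align*}
```

In both cases the constant in front of the adjustment term is absorbed into the second-stage coefficient $\delta$.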

Heckman’s

• Bivariate normality

• Inconsistent second stage standard errors

• Identification issues

• Expensive computation

• Convergence issues

Olsen’s

• Linear conditional expectation

• Inconsistent second stage standard errors

• Identification issues

• Cheaper computation

• No convergence issues

In Short: For Continuous Outcome

Open Research Questions

1. Selection model with unequal sample sizes (treat/control) - continuous outcome

2. Binary outcome model – coefficient consistency

3. Selection model with unequal sample sizes (treat/control) + binary outcome model with unequal sample sizes

Simulation Design

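Selection model: $S^{*} = -0.5 + 0.5x_1 - 0.5x_2 + 1.5x_3 - x_4 + \omega$

$T = 1$ if $S^{*} > 0$, $0$ otherwise

Continuous outcome model: $Y = 0.5 - 1.5x_1 + 0.5x_2 + x_3 + \varepsilon$

Binary outcome: $Y_{\text{bin}} = 1$ if $Y > 0$, $0$ otherwise

Errors: $(\omega, \varepsilon) \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0.5 & -0.4 \\ -0.4 & 0.5 \end{pmatrix} \right)$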
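The design above, together with the Heckman and Olsen two-step corrections, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. Several details are assumptions because the slides do not state them: the sample size `n`, independent standard normal covariates, and the sample-selection reading in which the second stage uses only rows with T = 1. The helper names (`probit_fit`, `ols`, `norm_cdf`, `norm_pdf`) are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # assumption: sample size is not given on the slide

# Assumption: covariates x1..x4 are independent standard normals.
x = rng.standard_normal((n, 4))
cov = np.array([[0.5, -0.4], [-0.4, 0.5]])  # error covariance from the slide
omega, eps = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Selection and continuous outcome equations from the slide.
s_star = -0.5 + 0.5 * x[:, 0] - 0.5 * x[:, 1] + 1.5 * x[:, 2] - x[:, 3] + omega
t = (s_star > 0).astype(float)
y = 0.5 - 1.5 * x[:, 0] + 0.5 * x[:, 1] + x[:, 2] + eps

X_sel = np.column_stack([np.ones(n), x])         # intercept, x1..x4
X_out = np.column_stack([np.ones(n), x[:, :3]])  # intercept, x1..x3

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)


def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def probit_fit(X, t, n_iter=25):
    """Probit maximum likelihood via Fisher scoring (IRLS)."""
    g = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ g
        p = np.clip(norm_cdf(eta), 1e-8, 1 - 1e-8)
        d = np.maximum(norm_pdf(eta), 1e-10)
        w = d ** 2 / (p * (1 - p))   # Fisher weights
        z = eta + (t - p) / d        # working response
        g = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return g


sel = t == 1  # assumption: the outcome model is fitted on selected rows only

# Heckman: probit first stage, inverse Mills ratio as the adjustment term.
idx = X_sel[sel] @ probit_fit(X_sel, t)
imr = norm_pdf(idx) / norm_cdf(idx)
b_heckman = ols(np.column_stack([X_out[sel], imr]), y[sel])

# Olsen: LPM first stage, lambda = X*gamma - 1 as the adjustment term.
lam = X_sel[sel] @ ols(X_sel, t) - 1.0
b_olsen = ols(np.column_stack([X_out[sel], lam]), y[sel])

print("Heckman [b0, b1, b2, b3, delta]:", np.round(b_heckman, 3))
print("Olsen   [b0, b1, b2, b3, delta]:", np.round(b_olsen, 3))
```

The first four entries of each vector are the outcome coefficients to compare against the true values (0.5, -1.5, 0.5, 1); the last entry is the coefficient on the adjustment term. Changing the intercept of the selection equation is one way to vary the treat/control ratio.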

Q1: Continuous outcome: treat/control sample size ratio has no influence

Q2: Binary outcome - coefficients inconsistent

How about marginals?
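For reference, a sketch of what "marginals" means here, using the standard textbook definitions rather than anything specific to the slides: in an LPM the marginal effect of $x_j$ is the coefficient itself, while in a probit it depends on where it is evaluated and is usually averaged over the sample. The superscripts and the index $i$ are notation introduced for this illustration.

```latex
\begin{align*}
\text{LPM:}    \quad & \frac{\partial P(Y = 1 \mid X)}{\partial x_j} = \beta_j^{\mathrm{LPM}} \\[4pt]
\text{Probit:} \quad & \frac{\partial P(Y = 1 \mid X)}{\partial x_j} = \phi(X\beta^{\mathrm{probit}})\,\beta_j^{\mathrm{probit}},
\qquad
\text{average marginal effect} = \frac{1}{n}\sum_{i=1}^{n} \phi(X_i\beta^{\mathrm{probit}})\,\beta_j^{\mathrm{probit}}
\end{align*}
```

Comparing marginal effects rather than raw coefficients puts the two models on the same probability scale.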

Q3: Binary outcome - divergence of marginals with imbalance ratio

[Figure: 3 × 3 grid of panels. Columns: outcome cut-off 50%, 25%, 5%. Rows: selection cut-off 50%, 25%, 5%.]

Summary: Heckman vs Olsen

• Continuous outcome: Heckman and Olsen corrections are similar, even when unbalanced

• Binary outcome: marginal effects from Heckman and Olsen corrections diverge with imbalance

• LPM in both stages provides consistent estimates (OLS)

• But how about Probit?

Two Methods for Addressing Selection Bias

Matching Methods

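Rosenbaum and Rubin (1985), logit selection model

Stage 1: Selection model (T): $\mathrm{logit}(E[T \mid X]) = X\gamma$

Covariate balance: $|p(T = 1) - p(T = 0)| < \epsilon$

Stage 2: Outcome model (Y): $Y = X\beta + \varepsilon$

LPM selection model

Stage 1: Selection model (T): $E[T \mid X] = X\gamma$

Covariate balance: $|p(T = 1) - p(T = 0)| < \epsilon$

Stage 2: Outcome model (Y): $Y = X\beta + \varepsilon$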

Propensity Score Matching (PSM)

• Only accounts for observable/observed covariates

• Requires large samples and substantial overlap between treatment and control

• What happens to ATE if we use LPM for matching?

Simulation Design

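Selection model: $T \sim \mathrm{Bernoulli}(p)$, with $p = 1 / (1 + e^{(\ldots)})$ (the exponent is not recoverable from the source)

Outcome model: $Y = T + X\beta + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$ and $\sigma \in \{0.1, 1, 5\}$

$X \sim N(0, 1)$ and $\beta = 1$

• Sample size: 1000

• Standard deviation: 0.1, 1, 5

• Bootstrap: 50

• ATE: $\mathrm{mean}(Y_{T=1}) - \mathrm{mean}(Y_{T=0})$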
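A matching comparison along these lines can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The selection probability is assumed to be $1 / (1 + e^{-X})$ because the slide's expression did not survive extraction. "Bootstrap 50" is read as 50 independent replications. Matching is 1:1 nearest neighbour with replacement on the estimated score, and the caliper of 0.05 is an arbitrary choice. The helper names (`logit_fit`, `lpm_fit`, `matched_ate`) are hypothetical.

```python
import numpy as np


def logit_fit(X, t, n_iter=25):
    """Logistic regression coefficients via Newton-Raphson."""
    g = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ g)))
        w = p * (1 - p)
        g = g + np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (t - p))
    return g


def lpm_fit(X, t):
    """Linear probability model coefficients via OLS."""
    return np.linalg.lstsq(X, t, rcond=None)[0]


def matched_ate(score, t, y, caliper=0.05):
    """Match each treated unit to its nearest control on the score.

    Pairs farther apart than the caliper are dropped. Returns
    mean(Y | T=1) - mean(Y | T=0) over the matched pairs.
    """
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    dist = np.abs(score[treated][:, None] - score[control][None, :])
    nearest = dist.argmin(axis=1)
    keep = dist.min(axis=1) < caliper
    return y[treated[keep]].mean() - y[control[nearest[keep]]].mean()


rng = np.random.default_rng(1)
n, n_rep, beta = 1000, 50, 1.0
results = {}

for sd in (0.1, 1.0, 5.0):
    ate_logit, ate_lpm = [], []
    for _ in range(n_rep):
        x = rng.standard_normal(n)
        # Assumption: logistic selection probability 1 / (1 + exp(-x)).
        t = rng.binomial(1, 1.0 / (1.0 + np.exp(-x))).astype(float)
        y = t + beta * x + rng.normal(0.0, sd, size=n)  # true ATE is 1
        X = np.column_stack([np.ones(n), x])

        score_logit = 1.0 / (1.0 + np.exp(-(X @ logit_fit(X, t))))
        score_lpm = X @ lpm_fit(X, t)  # may fall outside [0, 1]

        ate_logit.append(matched_ate(score_logit, t, y))
        ate_lpm.append(matched_ate(score_lpm, t, y))
    results[sd] = (float(np.mean(ate_logit)), float(np.mean(ate_lpm)))

for sd, (a_logit, a_lpm) in results.items():
    print(f"sd={sd}: ATE logit={a_logit:.3f}, ATE LPM={a_lpm:.3f}")
```

With a single covariate both scores are monotone in X, so the two matchings differ only in how the caliper and nearest-neighbour distances are scaled; with several covariates the rankings themselves could differ.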

Identical ATE from Logit and LPM matching

Summary & Future Research

• LPM similar to logit in terms of estimated Average Treatment Effect

• Ongoing work: what about binary outcome models?

• Logit faces problems if insufficient overlap between treat/control

• Ongoing work: does LPM have overlap issues?

Thank you!
