An introduction to causal inference for register researchers

Open national SIMSAM meeting, 12-13 October 2016

Ingeborg Waernbaum

Department of Statistics, Umeå University

Institute for Evaluation of Labour Market and Education Policy, IFAU, Uppsala


Register studies and causal questions

Register studies commonly involve drawing causal conclusions about investigated relationships.

Medical sciences – data from population-based hospital records linked to socio-economic registers are used both to support and to generate new hypotheses.

Social science – data from linked registers are often the only reliable source when studying individuals’ economic and social positions and life trajectories with respect to education, the job market and welfare systems.

Register-based research contributes to the forming of new knowledge of causal pathways and mechanisms in their respective subject matter disciplines.


What is causal inference?

There has been a rapid development in the statistical research field of causal inference.

Causal inference - statistical methods related to the estimation of causal parameters.

A causal parameter is defined as a summary measure of outcomes that would occur under hypothetical interventions on the exposure/treatment of interest.

Causal parameters can be estimated with experimental data and in observational studies under various sets of assumptions.

The methods are becoming more common in medical and other subject matter journals, although there may be a need to emphasize and explain the special precautions of causal analysis.


Outline

1. Science, statistics and causality

2. Causal inference with potential outcomes, model and theory, assumptions

3. Estimators - regression adjustment, matching/stratification, weighting, IV-methods

4. Confounder (covariate) selection

5. Assessing no unmeasured confounding - sensitivity analysis, negative controls

6. Mediation analysis - investigating causal pathways


Statistics; concepts and terminology

Statistics

“The science of collecting, describing and analyzing data”

Two theoretical frameworks

Probability - from the data generating process to the observed data

Inference - from the observed data to the data generating process


Basic concepts

Variable - a feature of interest of the units under study, e.g., blood pressure, sex, income, time to death, etc. Notation: U, V, W, X, Y, Z.

Parameter - a numerical descriptive measure of the population, e.g., mean, proportion, total, correlation, regression coefficient. Symbols used for parameters: θ, α, β, γ, µ, η, ψ.

Estimator - uses the sample data to calculate a single value which serves as a best guess of an unknown population parameter (e.g., the sample mean as an estimator of the population mean).

Properties of estimators address uncertainty - confidence intervals, hypothesis testing (p-value).


Causal inference with potential outcomes

The role of statistics: Scientific questions - parameters of interest

Scientific question - statistical model - parameter of interest

The scientific questions determine the parameters of interest, which in turn determine the selection of variables to study.

“How the translation from a subject-matter problem to a statistical model is done is often the most critical part of an analysis”.

- Sir David Cox, Principles of Applied Statistics.

Causal inference - estimation of causal parameters


Association and causation

STATISTICAL ASSOCIATION

In statistics, an association is any relationship between two measured quantities that renders them statistically dependent:

P(A, B) ≠ P(A) · P(B), or equivalently P(A | B) ≠ P(A)
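As a small worked illustration of this definition, the sketch below checks dependence for two binary events A and B. The 2×2 table of counts is hypothetical and chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

# Hypothetical joint counts for two binary events A and B (illustrative only).
counts = {(1, 1): 30, (1, 0): 20, (0, 1): 10, (0, 0): 40}
total = sum(counts.values())

p_ab = Fraction(counts[(1, 1)], total)                    # P(A, B)
p_a = Fraction(counts[(1, 1)] + counts[(1, 0)], total)    # P(A)
p_b = Fraction(counts[(1, 1)] + counts[(0, 1)], total)    # P(B)
p_a_given_b = p_ab / p_b                                  # P(A | B)

# A and B are associated when the joint probability differs from the product.
associated = p_ab != p_a * p_b
print(p_ab, p_a * p_b, p_a_given_b, associated)
```

With these counts P(A, B) = 3/10 while P(A) · P(B) = 1/5, and P(A | B) = 3/4 differs from P(A) = 1/2, so the two events are associated. Nothing in this calculation says anything about causation.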

CAUSE (dictionary):

a person or thing that acts, happens, or exists in such a way that some specific thing happens as a result; the producer of an effect: “You have been the cause of much anxiety. What was the cause of the accident?”


Statistics and causality

Then: conservative (cautious) approach

“Statistics can establish correlation but not causation”

Now: A currently dominant approach in statistics, econometrics and epidemiology to quantifying causal effects relies on potential outcomes (or counterfactuals) [Neyman, 1923; Rubin, 1974; Robins, 1986; Pearl, 1995]. However, there are also alternative approaches and an ongoing discussion.


Randomized experiments: a gold standard for measuring causal effects

A scientific study where individuals (units) are randomly assigned to treatment or control.

After follow-up the effect of the treatment is measured. By design, the estimated effect is not driven by systematic differences between the individuals selected for treatment and the controls.

“Randomization - causal effect, Observational study - association”


Potential outcomes: A theoretical framework

A statistical definition of a causal effect.

Causal conclusions rely on causal assumptions, which hold by design in a randomized experiment.

In an observational study these assumptions are not testable with the data at hand. They can be evaluated with subject matter theory and sensitivity analysis.


Causal effects with potential outcomes

A causal question is formulated as a contrast of potential outcomes that would occur under hypothetical interventions on the exposure of interest:

Would the outcome of an individual differ if that individual had been with versus without that exposure?

For an individual: Y1 − Y0

Example: The causal effect of an injury on health. What would be the difference in your health if you got injured compared to if you didn’t get injured?


The Journal of TRAUMA® Injury, Infection, and Critical Care, 2010. ORIGINAL ARTICLE

The Years After an Injury: Long-Term Consequences of Injury on Self-Rated Health

Anne Mette Hornbek Toft, MSc, Hanne Møller, MA, and Bjarne Laursen, PhD

Background: Knowledge on long-term consequences of injury on health is vital when injury prevention policies and emergency care are planned. However, few studies have described lasting health consequences associated with injury. This study analyses the relationship between injury and self-assessed health up to 10 years after the injury.

Methods: The study makes use of a public health research database linking health interview survey information with data from national health registries. Using this database, the health of a group of Danish patients with injury events during 1995 to 2005 was compared with a noninjured group up to 10 years after the injury. The association between self-assessed general health and self-reported depression and injury-related factors were estimated using logistic regression analysis.

Results: When patients with injuries compared with noninjured, the odds ratios of poor self-assessed general health and self-reported depression were 1.83 (confidence level, 1.53–2.19) and 1.33 (confidence level, 1.14–1.54), respectively. Although decreasing with time, the effect of injury on general health was significant up to 10 years after the injury. The injury type was significantly related to health, and in particular, patients with back, head, and neck injuries reporting poor general health. No gender differences were found in the effect of injury on self-assessed health.

Conclusions: Injuries have lasting consequences for physical and mental health up to 10 years after the injury event, in particular, for people sustaining head, neck, and back injuries. Sustaining an injury has the same effect on general health in men and women.

Key Words: Long-term injury outcome, Self-assessed health, Depression, Gender differences.

(J Trauma. 2010;69: 26–30)

In Denmark, unintentional injuries are one of the main causes for people <40 years of age to be treated at a hospital, and injuries are the number one cause of death in this age group [1]. There is a substantial amount of studies on the risk of injury and on the overall societal consequences of injury [2–5]. Quite a few studies have described the long-term health consequences associated with nonfatal injury. They document an increased risk of death attributable to injury up to 10 years after the injury [6]; higher rates of psychiatric treatment among injured compared with noninjured [7–10]; an increased risk of poor physical and mental health within the first years after an injury [11,12]; a high risk of work absence in the years after an injury [12]; and a gender difference in quality of life after a major trauma [13]. Nevertheless, the number of studies is limited, and previous studies are often limited by the study methods applied, e.g., lack of comparison group [9,13,14], and short follow-up periods [12,13,15].

Focusing on long-term outcomes of injury is important when addressing the definition of injury severity. A focus on the lasting consequences of injury enables a more complete and detailed understanding of injury severity. Priorities for injury prevention and health care services ought to depend on such knowledge. Thus, an analysis of the long-term outcomes of injury on health is important.

This article presents the results from a population-based study of Danish patients with injuries. The aim of the study was to quantify the relationship between injury and self-assessed health up to 10 years after the injury and, thus, to determine whether long-term consequences of injury on self-assessed general health and self-reported depression existed. In addition, the study aimed to determine whether the relationship between injury and health depends on gender, educational level, and injury type. As a population-based study, the focus was more on the frequent injuries than on the most severe injuries.

PATIENTS AND METHODS

Study Design

The study was designed as a retrospective, population-based cohort study with a follow-up time of between immediately after the injury and 10 years after, depending on the sample. The study population comprised people participating in the Danish Health Interview Surveys (HIS) in either 2000 or 2005; both were random samples of Danish citizens aged 16 years or older. HIS is a part of the Danish National Cohort Study (DANCOS), and the interview data were linked to the Danish National Patient Registry and other national registries at the individual level [16].

Registry information on injury-related hospital contacts during 1995 to 2005 was used to identify the injury group: the respondents having sustained a severe injury in the period from 1995 to the time of the interview. The survey responses from this injury group were compared with those of the noninjured group defined as respondents without a severe injury-related hospital contact during the same period.

Submitted for publication March 12, 2009. Accepted for publication January 12, 2010. Copyright © 2010 by Lippincott Williams & Wilkins.

From the National Institute of Public Health, University of Southern Denmark, Denmark. Supported by Trygfonden grant 7585-07.

Address for reprints: Bjarne Laursen, National Institute of Public Health, University of Southern Denmark, Øster Farimagsgade 5A, DK-1353 Copenhagen K, Denmark.

DOI: 10.1097/TA.0b013e3181d3cbf2


Causal assumption: No unmeasured confounding

[Figure: two causal diagrams linking injury and long term health, involving the covariates X and an unmeasured variable U.]


Causal assumption: Instrumental variable

[Figure: instrumental-variable diagram with nodes G, T, Y and an unmeasured variable U; arrows G → T → Y.]


Defining a causal effect through potential outcomes

Pre-treatment variables, covariates X

Treatment/Exposure, T (T = 1 if treated, T = 0 if control)

Potential outcomes: Y0,Y1

Observed response Y = TY1 + (1− T)Y0

Causal effect of the treatment: Y1 − Y0

Average causal effect: ∆ = E(Y1 − Y0)

Average causal effect of the treated: ∆1 = E(Y1 − Y0|T = 1)
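To make the notation concrete, here is a minimal simulation sketch in Python. The data-generating model (a standard normal covariate, a logistic treatment probability and an individual effect of 2 + 0.5·X) is an assumption made purely for illustration and is not taken from the slides. Because both potential outcomes are simulated, ∆ and ∆1 can be computed directly, which is never possible with real data.

```python
import math
import random

def simulate_units(n=20_000, seed=0):
    """Simulate covariate, treatment and both potential outcomes (toy model)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                  # pre-treatment covariate X
        y0 = x + rng.gauss(0.0, 1.0)             # potential outcome Y0
        y1 = y0 + 2.0 + 0.5 * x                  # potential outcome Y1
        p = 1.0 / (1.0 + math.exp(-x))           # assumed P(T = 1 | X)
        t = 1 if rng.random() < p else 0         # treatment indicator T
        y = t * y1 + (1 - t) * y0                # observed response Y
        units.append((x, t, y0, y1, y))
    return units

units = simulate_units()

# Average causal effect, Delta = E(Y1 - Y0)
effects = [y1 - y0 for (_, _, y0, y1, _) in units]
ate = sum(effects) / len(effects)

# Average causal effect of the treated, Delta_1 = E(Y1 - Y0 | T = 1)
treated_effects = [y1 - y0 for (_, t, y0, y1, _) in units if t == 1]
att = sum(treated_effects) / len(treated_effects)

print(round(ate, 2), round(att, 2))
```

In this toy model units with larger X are both more likely to be treated and benefit more, so ∆1 comes out larger than ∆.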


Observational study

The average causal effect: E(Y1 − Y0)

[Figure: the potential outcomes Y1 and Y0, of which only Y1 | T = 1 and Y0 | T = 0 are observed.]

Under randomization:

E(Y1|T = 1) = E(Y1) and E(Y0|T = 0) = E(Y0)


Confounding

“Confounding is a mixing of the effect of the exposure under study on the disease with that of a third factor. This third factor must be associated with the exposure and, independent of that exposure, be a risk factor for the disease. In such circumstances the observed relationship between the exposure and disease can be attributed, totally or in part, to the effect of the confounder.”

E(Y1 | T = 1) − E(Y0 | T = 0) ≠ E(Y1) − E(Y0)
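The inequality can be illustrated with a short simulation. The sketch below assumes a toy model in which a single factor X raises both the chance of treatment and the outcome, while the true causal effect is 2 for every unit. The function name `naive_contrast` and all numerical choices are hypothetical.

```python
import math
import random

def naive_contrast(n, confounded, seed=1):
    """Difference in observed group means, E(Y | T = 1) - E(Y | T = 0)."""
    rng = random.Random(seed)
    treated, controls = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # third factor affecting T and Y
        y0 = x + rng.gauss(0.0, 1.0)
        y1 = y0 + 2.0                        # true causal effect is 2 for everyone
        # Treatment depends on x only in the confounded (observational) setting.
        p = 1.0 / (1.0 + math.exp(-x)) if confounded else 0.5
        if rng.random() < p:
            treated.append(y1)               # only Y1 is observed for treated
        else:
            controls.append(y0)              # only Y0 is observed for controls
    return sum(treated) / len(treated) - sum(controls) / len(controls)

randomized = naive_contrast(20_000, confounded=False)
observational = naive_contrast(20_000, confounded=True)
print(round(randomized, 2), round(observational, 2))
```

Under randomization the simple contrast is close to the true effect, whereas in the confounded setting it overstates the effect because treated units tend to have larger X.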


Adjusting for confounders

The potential outcomes Y0, Y1 cannot both be observed for the same individual. We observe a random sample: (T, X, Y).

Identification of ∆ = E[Y1 − Y0] concerns the question of whether the observed data can be used to draw inference on ∆.

CAUSAL ASSUMPTION:

No unmeasured confounding (Y1,Y0) ⊥⊥ T | X

Overlap: 0 < P(T = 1|X) < 1


Adjusting for confounders

Under unconfoundedness the causal effect can be identified by comparing treated and controls conditional on X:

$$\Delta = E_X\left(E[Y \mid T = 1, X] - E[Y \mid T = 0, X]\right) = E_X\left(E[Y_1 - Y_0 \mid X]\right) = E[Y_1 - Y_0],$$

or by weighting with the inverse of the propensity score $e(X) = P(T = 1 \mid X)$:

$$\Delta = E(Y_1 - Y_0) = E\left(\frac{TY}{e(X)}\right) - E\left(\frac{(1 - T)Y}{1 - e(X)}\right).$$
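The first identification formula can be sketched for a single binary covariate, where conditioning on X simply means comparing treated and controls within each level of X and averaging over the distribution of X. The simulated data and the helper names `simulate`, `mean` and `standardized_effect` are assumptions for illustration; the true average causal effect in this toy model is 1.5.

```python
import random

def simulate(n=20_000, seed=2):
    """Toy data with one binary confounder X; returns a list of (x, t, y)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        e = 0.7 if x == 1 else 0.2               # P(T = 1 | X), so overlap holds
        t = 1 if rng.random() < e else 0
        y0 = x + rng.gauss(0.0, 1.0)
        y1 = y0 + 1.0 + x                        # effect 1 if x = 0, 2 if x = 1
        data.append((x, t, y1 if t == 1 else y0))
    return data

def mean(values):
    return sum(values) / len(values)

def standardized_effect(data):
    """E_X( E[Y | T=1, X] - E[Y | T=0, X] ) for a discrete covariate X."""
    estimate = 0.0
    for x in sorted({row[0] for row in data}):
        stratum = [row for row in data if row[0] == x]
        y_treated = [y for (_, t, y) in stratum if t == 1]
        y_control = [y for (_, t, y) in stratum if t == 0]
        weight = len(stratum) / len(data)        # estimate of P(X = x)
        estimate += weight * (mean(y_treated) - mean(y_control))
    return estimate

data = simulate()
naive = (mean([y for (_, t, y) in data if t == 1])
         - mean([y for (_, t, y) in data if t == 0]))
adjusted = standardized_effect(data)
print(round(naive, 2), round(adjusted, 2))
```

The unadjusted contrast is pulled away from 1.5 by the imbalance in X, while the stratified comparison recovers it. The approach only works because every level of X contains both treated and controls, which is the overlap assumption.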


Causal inference in register studies

The study population is defined by exclusion/inclusion criteria

The exclusion/inclusion criteria are often decided by subject matter theory and by the availability of the data: confounders (pre-treatment variables) and outcome variables.

- Evaluating the effect of infertility treatment on the risk of cesarean section.

- E.g., all mothers ≥40 that gave birth to their first child during 2007-2012.

Causal effects are conditional on the study population characteristics.

Conditioning on common effects of exposure and outcome introduces bias (Hernán et al., 2004).

Conditioning on post-exposure variables, i.e., variables on the causal pathway, is a further source of bias.


American Journal of Obstetrics and Gynecology, 2001

Infertility treatment is an independent risk factor for cesarean section among nulliparous women aged 40 and above

Eyal Sheiner, MD (a), Ilana Shoham-Vardi, PhD (b), Reli Hershkovitz, MD (a), Miriam Katz, MD (a), and Moshe Mazor, MD (a)

Beer Sheva, Israel

From the Departments of (a) Obstetrics and Gynecology and (b) Epidemiology and Health Services Evaluation, Faculty of Health Sciences, Soroka University Medical Center, Ben Gurion University of the Negev. Presented at the Twenty-first Annual Meeting of the Society for Maternal-Fetal Medicine, Reno, Nev, February 5-10, 2001. Copyright © 2001 by Mosby, Inc. doi:10.1067/mob.2001.117308

OBJECTIVE: To determine whether nulliparous women >40 years old with singleton pregnancies who conceived after infertility treatment are at an increased risk for cesarean section compared with older nulliparous patients who conceived spontaneously.

STUDY DESIGN: All subjects in this study were nulliparous women >40 years old with singleton gestations who were delivered of their infants between 1990 and 1998. The Mantel-Haenszel procedure was used to obtain the weighted odds ratios and to control for confounding variables.

RESULTS: During the study period, 115 nulliparous women >40 years old with singleton pregnancies were delivered of their infants in our institute. Of those, 80 pregnancies were spontaneous and 35 pregnancies occurred after infertility treatment. Women treated for infertility had a higher rate of low-birth-weight (<2500 g) newborns (34.3% versus 10.1%; odds ratio, 4.7; 95% CI, 1.5 to 14.6; P = .002). No other statistically significant demographic and obstetric differences were found between the groups. There were no cases of perinatal death in the study population. Women treated for infertility had statistically significant higher rates of cesarean section compared with those who conceived spontaneously (71.4% versus 41.3%; odds ratio, 3.6; 95% CI, 1.4 to 9.2; P = .002). Stratified analysis (the Mantel-Haenszel technique) was used to control for possible confounders such as low birth weight, pathologic presentations, failed induction, nonprogressive labor, and nonreassuring fetal heart rate tracings. None of those variables explained the higher incidence of cesarean section in the group treated for infertility.

CONCLUSION: A history of infertility treatment among nulliparous women >40 years old with singleton pregnancies increases the risk for cesarean delivery independently of other known risk factors. (Am J Obstet Gynecol 2001;185:888-92.)

Key words: Cesarean section, older nulliparous women, infertility treatment

Within the past few decades a new phenomenon has been observed in which an increasing number of women begin a realization of the family unit relatively late in their reproductive lives. Today this delay in childbearing is socially accepted and relates primarily to increased opportunities for education, career choices, and effective means of birth control [1-3]. Although it may be assumed that women >40 years old have made a conscious decision, have planned the pregnancy, and will enjoy the benefits of good support systems, at least half of those women experience various difficulties trying to conceive [4]. Moreover, the success rates of older women in programs of assisted reproductive techniques are much lower than the success rates of their younger counterparts [5].

Older patients have higher rates of preexisting medical problems (such as gestational diabetes and pregnancy-induced hypertension), a higher incidence of chromosomal abnormalities, and a higher incidence of cesarean deliveries [6-11]. All of the above components may increase the percentage of adverse pregnancy outcomes [6-11]. However, the risks for poor outcome of pregnancy and delivery for a 40-year-old woman who has had normal fecundity and has delayed childbearing voluntarily may differ from those who have had unsuccessful pregnancy attempts for an extended period of time. Perinatal outcome of pregnancies after assisted reproduction in general has been shown to be worse than that of spontaneously conceived pregnancies [12-16].

In this study the pregnancy outcomes of nulliparous women >40 years old with singleton pregnancies who conceived after assisted reproductive techniques were compared with the outcomes of those who conceived spontaneously. We specifically examined whether a history of infertility treatment among those women further increased the risk for cesarean delivery.


Estimating causal effects

Estimators of causal effects

Different estimators of the average causal effect and the average causal effect of the treated have been studied and proposed (review by Imbens and Wooldridge, 2009; Hernán and Robins, book forthcoming).

The estimators build on no unmeasured confounding and overlap, or on IV-assumptions, but also on other model assumptions (to various degrees).

Non-parametric estimators: no assumptions on the distribution or functional form of (X, Y, T) other than no unmeasured confounding and overlap. They suffer from the “curse of dimensionality”.

Semi-parametric estimators assume some model(s) and leave some aspects of p(y, t, x) unspecified.


Regression modelling, OR

Y = α0 + α1T + α2X1 + α3X2 + ... + ε

Direct interpretation of a treatment/exposure parameter in an outcome regression (OR) model as an estimate of an average causal effect has been discussed (Rubin, 1997; Lunceford and Davidian, 2004; Senn, Graf and Caputo, 2007).

- Only valid for a linear model and constant treatment effect

- Relies on correct specification of the full model

- Gives no warning if the overlap assumption is not satisfied
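As a sketch of what such an outcome regression computes, the code below estimates α1 by ordinary least squares for a model with a single covariate. It uses the Frisch-Waugh-Lovell result (regress the X-residuals of Y on the X-residuals of T), which keeps the example free of matrix algebra. The simulated data, with a linear outcome model and a constant effect of 2, and the helper names are assumptions for illustration.

```python
import math
import random

def residualize(values, x):
    """Residuals from a simple linear regression of `values` on x (with intercept)."""
    n = len(x)
    mx, mv = sum(x) / n, sum(values) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - mv) for a, b in zip(x, values)) / sxx
    return [b - mv - slope * (a - mx) for a, b in zip(x, values)]

def treatment_coefficient(t, x, y):
    """OLS estimate of alpha_1 in Y = alpha_0 + alpha_1 T + alpha_2 X + eps."""
    rt, ry = residualize(t, x), residualize(y, x)
    return sum(a * b for a, b in zip(rt, ry)) / sum(a * a for a in rt)

# Toy data: linear outcome model and a constant treatment effect of 2.
rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
t = [1 if rng.random() < 1.0 / (1.0 + math.exp(-a)) else 0 for a in x]
y = [2.0 * b + a + rng.gauss(0.0, 1.0) for a, b in zip(x, t)]

alpha1 = treatment_coefficient(t, x, y)
print(round(alpha1, 2))
```

The coefficient is close to 2 here only because the simulated data satisfy the conditions listed above. The calculation itself runs just as happily when the model is misspecified or when treated and controls barely overlap in X.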


[Figure: plot of Y against X for treated and controls.]


The propensity score, PS

Many semi-parametric estimators rely on the propensity score (PS).

The PS is a balancing score, i.e., conditional on the propensity score the treated and controls have the same distribution of the covariates, X.

Instead of controlling for X it is sufficient to adjust for the propensity score in the analysis (Rosenbaum and Rubin, 1983).

[Figure: two panels showing the densities f(x) of a covariate for treated and controls; the second panel is titled "X conditioning on a balancing score".]
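The balancing property can be checked numerically. The sketch below assumes two standard normal covariates and a known propensity score e(X) = 1 / (1 + exp(−(X1 + X2))). In practice the score would have to be estimated. Conditioning on the score is approximated by grouping units into ten equal-width PS strata, so a small residual imbalance within strata is expected.

```python
import math
import random

rng = random.Random(4)
rows = []                                       # (x1, treatment, propensity score)
for _ in range(20_000):
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    e = 1.0 / (1.0 + math.exp(-(x1 + x2)))      # true PS, assumed known here
    t = 1 if rng.random() < e else 0
    rows.append((x1, t, e))

def mean_difference(subset):
    """Mean of x1 among treated minus mean among controls (None if a group is empty)."""
    treated = [x for (x, t, _) in subset if t == 1]
    controls = [x for (x, t, _) in subset if t == 0]
    if not treated or not controls:
        return None
    return sum(treated) / len(treated) - sum(controls) / len(controls)

raw_difference = mean_difference(rows)

# Condition on the PS by grouping units into 10 equal-width PS strata.
strata = {}
for row in rows:
    strata.setdefault(min(int(row[2] * 10), 9), []).append(row)

# Size-weighted average of the within-stratum differences in x1.
weighted, used = 0.0, 0
for subset in strata.values():
    diff = mean_difference(subset)
    if diff is not None:
        weighted += len(subset) * diff
        used += len(subset)
within_difference = weighted / used

print(round(raw_difference, 2), round(within_difference, 2))
```

Before conditioning, treated units have a clearly higher mean of X1 than controls; within PS strata the difference shrinks towards zero.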


Example: Propensity score matching

The goal is to create a balanced sample, i.e., treated and controls have the same covariate distribution after matching.

Fit a first-order binary response model, e.g., logit or probit, for the treatment/exposure variable.

Match treated to controls (for ∆1), and also controls to treated (for ∆), with similar PS.

Evaluate covariate balance for the matched sample.

If variables are not balanced, add second-order and/or interaction terms.

[Figure: absolute standardized differences in means, before and after matching, for the covariates refugees, flatsmiss, flats, rubblemiss, rubble, indrate, emprate, popgrowth3339, popgrowth1939 and log2pop.]
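The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a recommended implementation: the data are simulated with a constant effect of 2, the logit model is fitted by simple gradient ascent, and matching is 1:1 nearest neighbour on the estimated PS with replacement, treated to controls only (targeting ∆1). All function names are hypothetical.

```python
import bisect
import math
import random

def fit_logit(X, t, steps=200, rate=1.0):
    """First-order logit model fitted by gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)                      # intercept, then slopes
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, ti in zip(X, t):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            resid = ti - 1.0 / (1.0 + math.exp(-z))
            grad[0] += resid
            for j, v in enumerate(row):
                grad[j + 1] += resid * v
        beta = [b + rate * g / len(t) for b, g in zip(beta, grad)]
    return beta

def propensity(beta, row):
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def std_diff(a, b):
    """Absolute standardized difference in means between two samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    return abs(ma - mb) / math.sqrt((va + vb) / 2.0)

# Simulated data (assumption): two covariates, constant treatment effect of 2.
rng = random.Random(5)
n = 2000
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
t = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(a + b))) else 0 for a, b in X]
y = [2.0 * ti + a + b + rng.gauss(0.0, 1.0) for (a, b), ti in zip(X, t)]

# Step 1: fit the PS model and compute estimated propensity scores.
beta = fit_logit(X, t)
ps = [propensity(beta, row) for row in X]

# Step 2: match each treated unit to the control with the closest PS.
controls = sorted((ps[i], i) for i in range(n) if t[i] == 0)
control_ps = [p for p, _ in controls]

def nearest_control(p):
    """Index of the control whose PS is closest to p (matching with replacement)."""
    k = bisect.bisect_left(control_ps, p)
    candidates = controls[max(k - 1, 0):k + 1]
    return min(candidates, key=lambda c: abs(c[0] - p))[1]

treated = [i for i in range(n) if t[i] == 1]
matches = [nearest_control(ps[i]) for i in treated]
att = sum(y[i] - y[j] for i, j in zip(treated, matches)) / len(treated)

# Step 3: evaluate balance of the first covariate before and after matching.
control_ids = [i for i in range(n) if t[i] == 0]
before = std_diff([X[i][0] for i in treated], [X[i][0] for i in control_ids])
after = std_diff([X[i][0] for i in treated], [X[j][0] for j in matches])
print(round(att, 2), round(before, 2), round(after, 2))
```

If the standardized difference after matching stayed large, the PS model would be refitted with second-order or interaction terms, as described above. Standard errors for matching estimators need separate treatment and are not covered by this sketch.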


Propensity score methods: Regression and stratification/matching, IPW

Propensity score matching - targeting ∆ or ∆1, defining a match, with or without replacement, 1:k (Abadie and Imbens, 2016).

Propensity score stratification – number of strata, additional regression adjustment (Imbens, 2015).

Propensity score regression (Vansteelandt and Daniel, 2014).

Inverse probability weighting (IPW) - stabilized weights, truncation (Lunceford and Davidian, 2004).

The estimators rely on the PS model in different ways.


Journal of Clinical Epidemiology 58 (2005) 550–559

Propensity score methods gave similar results to traditional regressionmodeling in observational studies: a systematic review

Baiju R. Shaha,b,*, Andreas Laupacisa,b, Janet E. Huxa,b, Peter C. Austina,c

aInstitute for Clinical Evaluative Sciences, G106 – 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, CanadabDepartment of Medicine and Clinical Epidemiology and Health Care Research Program, University of Toronto, Ontario, Canada

cDepartment of Public Health Sciences, University of Toronto, Ontario, Canada

Accepted 20 October 2004

Abstract

Objective: To determine whether adjusting for confounder bias in observational studies using propensity scores gives different resultsthan using traditional regression modeling.

Methods: Medline and Embase were used to identify studies that described at least one association between an exposure and anoutcome using both traditional regression and propensity score methods to control for confounding. From 43 studies, 78 exposure–outcomeassociations were found. Measures of the quality of propensity score implementation were determined. The statistical significance ofeach association using both analytical methods was compared. The odds or hazard ratios derived using both methods were comparedquantitatively.

Results: Statistical significance differed between regression and propensity score methods for only 8 of the associations (10%),κ � 0.79 (95% CI � 0.65–0.92). In all cases, the regression method gave a statistically significant association not observed with thepropensity score method. The odds or hazard ratio derived using propensity scores was, on average, 6.4% closer to unity than that derivedusing traditional regression.

Conclusions: Observational studies had similar results whether using traditional regression or propensity scores to adjust for con-founding. Propensity scores gave slightly weaker associations; however, many of the reviewed studies did not implement propensity scoreswell. � 2005 Elsevier Inc. All rights reserved.

Keywords: Statistical methods; Observational studies; Propensity scores; Regression modeling; Systematic reviews; Confounding

1. Introduction

In observational studies, patient assignment to the exposureof interest is not under the investigators’ control. Therefore,there are likely to be important differences in confounding fac-tors between the exposure groups, so any differences inoutcome may be caused by the exposure itself, by differencesin the measured and unmeasured confounders, or by both.

Multivariate regression is often used to lessen the biascaused by measured confounders, although it cannot adjustfor unmeasured confounders; however, investigators fre-quently seek to construct parsimonious regression modelsusing as few covariates as possible to predict the outcome,and interaction and nonlinear terms are rarely added. Achiev-ing the best possible adjustment for bias may be sacrificedto improve the comprehensibility of the model. Furthermore,

* Corresponding author. Tel.: 416-480-4055 ext. 3798; fax: 416-480-6048.

E-mail address: [email protected] (B.R. Shah).

0895-4356/05/$ – see front matter � 2005 Elsevier Inc. All rights reserved.doi: 10.1016/j.jclinepi.2004.10.016

regression modeling may not alert investigators to situationswhere the confounders do not adequately overlap betweenexposure groups, threatening the validity of conclusionsdrawn from the data. This problem could be exaggeratedwhen small differences in each of a large number of con-founders produce marked separation between the exposuregroups, and hence irresolvable selection bias.

Trying to circumvent these difficulties, Rosenbaum and Rubin [1] proposed “propensity scores” in 1983 as a method of controlling for confounding in observational studies. An individual’s propensity score is defined as his or her conditional probability of a particular exposure versus another, given the observed confounders. It can be estimated with logistic regression, modeling the exposure as the dependent variable and the potential confounders as the independent variables. Because the model itself is not the focus of the study, it need not be parsimonious and easy to understand, so it can include numerous covariates (including those with statistically insignificant coefficients) and interactions and nonlinear terms. Two patients with the same propensity

Estimating causal effects

PS Estimators: Matching and IPW

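Matching and IPW estimators:

$$\Delta_{SM} = \frac{1}{n}\left[\sum_{i=1}^{n_1}\left(Y_{1i} - \hat{Y}_{0i}\right) + \sum_{i=1}^{n_0}\left(\hat{Y}_{1i} - Y_{0i}\right)\right]$$

$$\Delta_{IPW} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{T_i Y_i}{e(X_i)} - \frac{(1 - T_i)\,Y_i}{1 - e(X_i)}\right]$$

Here $n_1$ and $n_0$ are the numbers of treated and control units, $n = n_1 + n_0$, and a hat marks a potential outcome that is not observed and is imputed from the matched unit(s) in the other group.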
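The IPW formula can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the presenter's code: the data-generating process, the true effect of 2, and the names `ipw_ate` and `naive_difference` are all hypothetical, and the true propensity score is plugged in, whereas in practice e(X) would be estimated (for example by logistic regression).

```python
import math
import random


def ipw_ate(t, y, e):
    """IPW estimate: average of T*Y/e(X) - (1 - T)*Y/(1 - e(X))."""
    n = len(y)
    total = 0.0
    for ti, yi, ei in zip(t, y, e):
        total += ti * yi / ei - (1 - ti) * yi / (1 - ei)
    return total / n


def naive_difference(t, y):
    """Unadjusted difference in mean outcome, treated minus controls."""
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    controls = [yi for ti, yi in zip(t, y) if ti == 0]
    return sum(treated) / len(treated) - sum(controls) / len(controls)


# Hypothetical simulated data: X confounds T and Y; the true effect is 2.
rng = random.Random(0)
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
e = [1.0 / (1.0 + math.exp(-0.5 * xi)) for xi in x]  # true propensity score (assumed known)
t = [1 if rng.random() < ei else 0 for ei in e]
y = [2.0 * ti + xi + rng.gauss(0.0, 1.0) for ti, xi in zip(t, x)]

naive_estimate = naive_difference(t, y)  # ignores X, so it absorbs the confounding
ipw_estimate = ipw_ate(t, y, e)          # reweights by the propensity score

print(f"naive: {naive_estimate:.2f}, IPW: {ipw_estimate:.2f}")
```

In this setup the unadjusted contrast overstates the effect because units with large X are both more likely to be treated and have higher outcomes, while the weighted contrast should land near the simulated effect.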

Estimating causal effects

Doubly robust methods

A doubly robust (DR) estimator is an IPW estimator that has been augmented to include outcome regression models for the treated and controls (Robins et al. 1994).

DR estimators rely on a propensity score (PS) model and an outcome regression (OR) model. If at least one of the two is correctly specified, the estimator is consistent for the average causal effect.

$$\Delta_{DR} = \frac{1}{n}\sum_{i=1}^{n}\frac{T_i Y_i - \left(T_i - e(X_i)\right)\mu_1(X_i)}{e(X_i)} - \frac{1}{n}\sum_{i=1}^{n}\frac{(1 - T_i)\,Y_i + \left(T_i - e(X_i)\right)\mu_0(X_i)}{1 - e(X_i)}$$

where $\mu_1(X) = E(Y_1 \mid X)$ and $\mu_0(X) = E(Y_0 \mid X)$.
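The double robustness property can be checked numerically. The sketch below is an illustration under assumptions, not the presenter's implementation: the simulated data, the true effect of 2, the helper names `fit_line` and `dr_ate`, and the particular misspecifications (a constant propensity score of 0.5, an outcome model fixed at 0) are all hypothetical choices.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def dr_ate(t, y, e, mu1, mu0):
    """Doubly robust estimate built from the two augmented IPW terms."""
    n = len(y)
    term1 = sum((ti * yi - (ti - ei) * m1) / ei
                for ti, yi, ei, m1 in zip(t, y, e, mu1)) / n
    term0 = sum(((1 - ti) * yi + (ti - ei) * m0) / (1 - ei)
                for ti, yi, ei, m0 in zip(t, y, e, mu0)) / n
    return term1 - term0


# Hypothetical simulated data: X confounds T and Y; the true effect is 2.
rng = random.Random(1)
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
e_true = [1.0 / (1.0 + math.exp(-0.5 * xi)) for xi in x]
t = [1 if rng.random() < ei else 0 for ei in e_true]
y = [2.0 * ti + xi + rng.gauss(0.0, 1.0) for ti, xi in zip(t, x)]

# Correctly specified outcome regressions, fitted separately in each group.
a1, b1 = fit_line([xi for xi, ti in zip(x, t) if ti == 1],
                  [yi for yi, ti in zip(y, t) if ti == 1])
a0, b0 = fit_line([xi for xi, ti in zip(x, t) if ti == 0],
                  [yi for yi, ti in zip(y, t) if ti == 0])
mu1_ok = [a1 + b1 * xi for xi in x]
mu0_ok = [a0 + b0 * xi for xi in x]

e_wrong = [0.5] * n   # misspecified PS model: ignores X
mu_wrong = [0.0] * n  # misspecified OR model: ignores X and T

dr_or_correct = dr_ate(t, y, e_wrong, mu1_ok, mu0_ok)     # only OR correct
dr_ps_correct = dr_ate(t, y, e_true, mu_wrong, mu_wrong)  # only PS correct
dr_both_wrong = dr_ate(t, y, e_wrong, mu_wrong, mu_wrong)  # neither correct

print(f"OR correct: {dr_or_correct:.2f}, PS correct: {dr_ps_correct:.2f}, "
      f"both wrong: {dr_both_wrong:.2f}")
```

With one of the two models correct the estimate should stay near the simulated effect; with both misspecified the confounding bias returns.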

Estimating causal effects

JAMA, 2013

Changes in Health Care Spending and Quality for Medicare Beneficiaries Associated With a Commercial ACO Contract

J. Michael McWilliams, MD, PhD; Bruce E. Landon, MD, MBA, MSc; Michael E. Chernew, PhD

IMPORTANCE In a multipayer system, new payment incentives implemented by one insurer for an accountable care organization (ACO) may also affect spending and quality of care for another insurer’s enrollees served by the ACO. Such spillover effects reflect the extent of organizational efforts to reform care delivery and can contribute to the net impact of ACOs.

OBJECTIVE We examined whether the Blue Cross Blue Shield (BCBS) of Massachusetts’ Alternative Quality Contract (AQC), an early commercial ACO initiative associated with reduced spending and improved quality for BCBS enrollees, was also associated with changes in spending and quality for Medicare beneficiaries, who were not covered by the AQC.

DESIGN, SETTING, AND PARTICIPANTS Quasi-experimental comparisons from 2007-2010 of elderly fee-for-service Medicare beneficiaries in Massachusetts (1 761 325 person-years) served by 11 provider organizations entering the AQC in 2009 or 2010 (intervention group) vs beneficiaries served by other providers (control group). Using a difference-in-differences approach, we estimated changes in spending and quality for the intervention group in the first and second years of exposure to the AQC relative to concurrent changes for the control group. Regression and propensity score methods were used to adjust for differences in sociodemographic and clinical characteristics.

MAIN OUTCOMES AND MEASURES The primary outcome was total quarterly medical spending per beneficiary. Secondary outcomes included spending by setting and type of service, 5 process measures of quality, potentially avoidable hospitalizations, and 30-day readmissions.

RESULTS Before entering the AQC, total quarterly spending per beneficiary for the intervention group was $150 (95% CI, $25-$274) higher than for the control group and increased at a similar rate. In year 2 of the intervention group’s exposure to the AQC, this difference was reduced to $51 (95% CI, −$109 to $210; P = .53), constituting a significant differential change of −$99 (95% CI, −$183 to −$16; P = .02) or a 3.4% savings relative to an expected quarterly mean of $2895. Savings in year 1 were not significant (differential change, −$34; 95% CI, −$83 to $16; P = .18). Year 2 savings derived largely from lower spending on outpatient care (differential change, −$73; 95% CI, −$97 to −$50; P < .001), particularly for beneficiaries with 5 or more conditions, and included significant differential changes in spending on procedures, imaging, and tests. Annual rates of low-density lipoprotein cholesterol testing differentially improved for beneficiaries with diabetes in the intervention group by 3.1 percentage points (95% CI, 1.4-4.8 percentage points; P < .001) and for those with cardiovascular disease by 2.5 percentage points (95% CI, 1.1-4.0 percentage points; P < .001), but performance on other quality measures did not differentially change.

CONCLUSIONS AND RELEVANCE The AQC was associated with lower spending for Medicare beneficiaries but not with consistently improved quality. Savings among Medicare beneficiaries and previously demonstrated savings among BCBS enrollees varied similarly across settings, services, and time, suggesting that organizational responses were associated with broad changes in patient care.

JAMA. 2013;310(8):829-836. doi:10.1001/jama.2013.276302
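The headline numbers in the abstract can be read as a difference-in-differences contrast. As a rough check using only the rounded figures quoted in the RESULTS paragraph (the symbols $D_{\text{pre}}$ and $D_{\text{post}}$ for the intervention-minus-control spending gap are introduced here for illustration and are not the authors' notation):

```latex
\Delta_{DiD} = D_{\text{post}} - D_{\text{pre}} = 51 - 150 = -99,
\qquad
\frac{99}{2895} \approx 0.034 \quad (3.4\% \text{ savings})
```

The published estimates come from adjusted models, so this arithmetic only illustrates the logic of the contrast, not how the estimates were computed.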

Covariate selection

Covariate selection

When adjusting for confounding in an observational study, there can be multiple covariate sets that can be used (Greenland, Pearl, Robins, 1999).

Covariate selection is often performed as part of model selection. However, using the covariates that are selected in a propensity score model is not optimal for the purpose of estimating a causal effect (Vansteelandt et al. 2012).

The covariate sets selected affect the bias and variance of estimators, depending on their cardinality and structure (de Luna et al. 2011).

Graphical confounder selection criteria (VanderWeele and Shpitser, 2011): adjust for covariates that are causes of the treatment, the outcome, or both.
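The last criterion can be sketched in a few lines. This is an illustrative reading of "causes of treatment, outcome or both", taking "cause" to mean an ancestor in a causal graph; the graph below, the covariate names, and the helper `ancestors` are hypothetical and are not taken from the slides.

```python
def ancestors(node, parents):
    """Return all ancestors of `node` in a DAG given as {child: [parents]}."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


# Hypothetical causal graph: X1 causes T, X2 causes T and Y, X3 causes Y,
# X4 causes X2 (so it is an indirect cause), X5 causes neither T nor Y.
parents = {
    "T": ["X1", "X2"],
    "Y": ["T", "X2", "X3"],
    "X2": ["X4"],
}
covariates = ["X1", "X2", "X3", "X4", "X5"]  # assumed pre-treatment

causes = ancestors("T", parents) | ancestors("Y", parents)
selected = sorted(c for c in covariates if c in causes)

print(selected)
```

The selection only needs knowledge of whether each covariate is a cause of the treatment or the outcome, not the full graph structure; the full graph is used here only to make the example concrete.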

Covariate selection

Sufficient set of confounders

[Figure: causal diagram relating treatment T, outcome Y, and covariates X1 to X6.]

Assessing no unmeasured confounding

Unobserved confounding

Some degree of unobserved confounding is almost certainly present in most observational studies.

It has been argued that researchers should supplement the primary analysis with an assessment of the possible impact of unobserved confounding.

Different approaches have been proposed (Rosenbaum, 2002).

Sensitivity analyses provide a measure of the possible impact of an unmeasured confounder on the conclusions drawn (Hsu et al., 2013).

Negative control outcomes (Tchetgen Tchetgen, 2014) provide a means to detect and correct for unmeasured confounding.
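The detection idea behind negative control outcomes can be shown with a toy simulation. This is a sketch under assumptions, not the method of the cited papers: N is a hypothetical outcome that the treatment cannot affect but that shares the unobserved confounder U with Y, and the sketch covers detection only, not correction.

```python
import random


def mean_diff(t, v):
    """Difference in the mean of v between treated (t == 1) and controls."""
    treated = [vi for ti, vi in zip(t, v) if ti == 1]
    controls = [vi for ti, vi in zip(t, v) if ti == 0]
    return sum(treated) / len(treated) - sum(controls) / len(controls)


def simulate(n, confounded, seed):
    """Simulate T, Y and a negative control outcome N sharing confounder U."""
    rng = random.Random(seed)
    t, y, nco = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)            # unobserved confounder
        if confounded:
            p = 0.8 if u > 0 else 0.2      # U drives treatment assignment
        else:
            p = 0.5                        # treatment as if randomized
        ti = 1 if rng.random() < p else 0
        t.append(ti)
        y.append(1.0 * ti + u + rng.gauss(0.0, 1.0))  # T affects Y
        nco.append(u + rng.gauss(0.0, 1.0))           # T does not affect N
    return t, y, nco


t_c, y_c, n_c = simulate(20000, confounded=True, seed=0)
t_r, y_r, n_r = simulate(20000, confounded=False, seed=0)

diff_confounded = mean_diff(t_c, n_c)  # clearly non-zero: signals confounding by U
diff_randomized = mean_diff(t_r, n_r)  # near zero: no signal

print(f"T-N contrast, confounded: {diff_confounded:.2f}, "
      f"randomized: {diff_randomized:.2f}")
```

Because the treatment has no effect on N by construction, any association between T and N points to a shared cause; the argument depends on N being affected by the same unobserved factors as Y.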

Assessing no unmeasured confounding

Negative control outcomes

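[Figure: two causal diagrams with observed covariates X and an unobserved confounder U: one for treatment T and outcome Y, and one for treatment T and the negative control outcome N.]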

Assessing no unmeasured confounding

Evaluating the Impact of Unmeasured Confounding with Internal Validation Data: An Example Cost Evaluation in Type 2 Diabetes

Douglas Faries, PhD¹,*, Xiaomei Peng, MS¹, Manjiri Pawaskar, PhD¹, Karen Price, PhD¹, James D. Stamey, PhD², John W. Seaman Jr., PhD²

¹Eli Lilly & Company, Indianapolis, IN, USA; ²Department of Statistics, Baylor University, Waco, TX, USA

ABSTRACT

The quantitative assessment of the potential influence of unmeasured confounders in the analysis of observational data is rare, despite reliance on the “no unmeasured confounders” assumption. In a recent comparison of costs of care between two treatments for type 2 diabetes using a health care claims database, propensity score matching was implemented to adjust for selection bias though it was noted that information on baseline glycemic control was not available for the propensity model. Using data from a linked laboratory file, data on this potential “unmeasured confounder” were obtained for a small subset of the original sample. By using this information, we demonstrate how Bayesian modeling, propensity score calibration, and multiple imputation can utilize this additional information to perform sensitivity analyses to quantitatively assess the potential impact of unmeasured confounding. Bayesian regression models were developed to utilize the internal validation data as informative prior distributions for all parameters, retaining information on the correlation between the confounder and other covariates. While assumptions supporting the use of propensity score calibration were not met in this sample, the use of Bayesian modeling and multiple imputation provided consistent results, suggesting that the lack of data on the unmeasured confounder did not have a strong impact on the original analysis, due to the lack of strong correlation between the confounder and the cost outcome variable. Bayesian modeling with informative priors and multiple imputation may be useful tools for unmeasured confounding sensitivity analysis in these situations. Further research to understand the operating characteristics of these methods in a variety of situations, however, remains.

Keywords: Bayesian methods, confounding, robustness, sensitivity analyses.

Copyright © 2013, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc.

Introduction

The use of retrospective observational research as a tool for medical decision making, particularly with data from health care claims databases and electronic medical records, has been growing in recent years. With large and heterogeneous populations of patients, such observational databases are a rich source of usual care data, which can potentially address a variety of medical questions [1,2]. The use of such data for comparative effectiveness, however, is challenged by selection bias and potential for unmeasured confounding [3–5]. Patients are not randomized to treatments and thus comparisons between treatment groups are subject to bias due to the many factors that influence treatment choices in usual care practice. Statistical adjustment for measured confounders is possible, such as through propensity score adjustment. The validity of such methods, however, relies on the assumption that there are no unmeasured confounders. That is, there are no factors related to both treatment and outcome that are not collected and appropriately utilized in the analysis. As this assumption cannot be verified, observational data have lower internal validity and are lower on the hierarchy of evidence relative to randomized clinical trials [6–9].

In prospective observational studies, a researcher can specify the collection of data on known confounders; however, this opportunity does not exist in retrospective database research. While researchers look for proxies for such known confounders within the existing database, the degree to which this addresses the confounding is unknown. In addition, unknown confounders may exist and without randomization such variables will cause the standard analyses to be biased. To ensure the robustness of the observational research findings, it is important to conduct sensitivity analyses to assess the potential impact of unmeasured confounding [4,9–10].

While many researchers mention the limitations on inferences from their work due to unmeasured confounding, few directly assess the potential impact in a quantitative fashion [10,11]. Even when no or limited additional data on the unmeasured confounders are available, there are several methods that can be utilized to assess sensitivity for unmeasured confounding, including the Rule Out [10] and Bayesian modeling with noninformative priors [12]. The Rule Out approach uses a simple model to quantify the level of unmeasured confounding necessary to eliminate the observed treatment difference (e.g., moves the risk ratio to 1). Researchers can then assess whether such a level of confounding is plausible for their scenario.
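The question "how much confounding would move the risk ratio to 1" can be made concrete with a classical bias formula for a single binary unmeasured confounder. This is a sketch under assumptions and not necessarily the model used in the cited Rule Out method: `p1` and `p0` are hypothetical confounder prevalences among the treated and untreated, `rr_cd` is the hypothetical confounder-outcome risk ratio, and the observed risk ratio of 1.5 is made up for illustration.

```python
def bias_factor(p1, p0, rr_cd):
    """Ratio of observed to true risk ratio induced by a binary confounder."""
    return (p1 * (rr_cd - 1.0) + 1.0) / (p0 * (rr_cd - 1.0) + 1.0)


def rr_cd_needed(rr_obs, p1, p0):
    """Confounder-outcome risk ratio that would fully explain rr_obs.

    Solves bias_factor(p1, p0, r) == rr_obs for r. Returns None when no
    finite confounder strength can explain the observed association.
    """
    denom = p1 - rr_obs * p0
    if denom <= 0.0:
        return None
    return 1.0 + (rr_obs - 1.0) / denom


# Hypothetical inputs: observed RR of 1.5, confounder present in 50% of the
# treated and 20% of the untreated.
rr_obs, p1, p0 = 1.5, 0.5, 0.2
needed = rr_cd_needed(rr_obs, p1, p0)
adjusted = rr_obs / bias_factor(p1, p0, needed)

print(f"confounder-outcome RR needed: {needed:.2f}, adjusted RR: {adjusted:.2f}")
```

A researcher would then judge whether a confounder of that strength and imbalance is plausible in the setting at hand.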

Value in Health 16 (2013) 259–266. http://dx.doi.org/10.1016/j.jval.2012.10.012

Causal mediation analysis

Mediation analysis

Causal inquiries often involve studying the direct and indirect effects of a treatment, taking an intermediate variable into account.

Often desired in empirical sciences to investigate causal mechanisms/pathways.

[Figure: two causal diagrams: the total effect T → Y, and the mediation model with paths T → M → Y and T → Y.]

Causal mediation analysis

Mediation analysis

We could be interested in

The effect of education on health (total effect).

Evaluate the role of income as a possible mediator (direct/indirect effects).

[Figure: two causal diagrams: the total effect T → Y, and the mediation model with paths T → M → Y and T → Y.]

Causal mediation analysis

Mediation analysis

Early methods (Baron and Kenny, 1986) assume linear models and no interaction between the treatment and the mediator.

Many developments in the research field in recent years (VanderWeele, 2015): nonparametric identification of natural direct and indirect effects, with parameters defined in an extended potential outcome framework.

Identification requires further untestable (causal) assumptions.
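The quantities these points refer to can be written out explicitly. The notation is an assumption made here for illustration and does not appear on the slide: $Y(t, m)$ is the potential outcome under treatment $t$ and mediator value $m$, and $M(t)$ is the potential mediator value under treatment $t$.

```latex
% Linear models without treatment-mediator interaction
M = \beta_0 + \beta_1 T + \varepsilon_M, \qquad
Y = \theta_0 + \theta_1 T + \theta_2 M + \varepsilon_Y
% direct effect: \theta_1; indirect effect: \beta_1 \theta_2;
% total effect: \theta_1 + \beta_1 \theta_2

% Natural direct and indirect effects in the potential outcome framework
NDE = E\left[Y(1, M(0)) - Y(0, M(0))\right], \qquad
NIE = E\left[Y(1, M(1)) - Y(1, M(0))\right]

TE = E\left[Y(1, M(1)) - Y(0, M(0))\right] = NDE + NIE
```

The nested counterfactual $Y(1, M(0))$ is never observed for any unit, which is one way to see why identification needs assumptions beyond those for the total effect.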

Causal mediation analysis

Mediation assumptions

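[Figure: causal diagram for the mediation assumptions, with treatment T, mediator M, outcome Y, and covariates X and W.]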

Causal mediation analysis

Epidemiology, Volume 26, Number 2, March 2015

Original Article

Mediators of the Effect of Body Mass Index on Coronary Heart Disease: Decomposing Direct and Indirect Effects

Yuan Lu, Kaveh Hajifathalian, Eric B. Rimm, Majid Ezzati, and Goodarz Danaei

Background: The prevalence of overweight and obesity is rising globally and together they constitute a major risk factor for coronary heart disease (CHD). Previous estimates of direct effects of high body mass index (BMI) on CHD did not consider an interaction between BMI and its mediators and did not include inflammatory biomarkers as potential mediators.

Methods: We analyzed data from 9 prospective cohort studies with 58,322 participants and 9,459 CHD events and decomposed the total effects into natural direct and indirect effects using a 2-stage regression model. We examined overweight (BMI = 25 to <30 kg/m²) separately. We pooled hazard ratios using random-effects models and calculated the percentages of excess relative risk mediated by blood pressure, cholesterol, glucose, fibrinogen and high-sensitive C-reactive protein.

Results: There was no interaction between BMI and its mediators in the multiplicative scale (P < 0.05 for all). Blood pressure was the most important mediator. The percentage of excess relative risk of overweight (versus normal BMI, 20 to <25 kg/m²) mediated was 28% for blood pressure, 10% for blood glucose, and 14% for cholesterol. The same percentages for obesity were 37% for blood pressure, 17% for blood glucose, and 6% for cholesterol. The percentage mediated through all three metabolic risk factors together was 47% (95% confidence interval = 33%–63%) for overweight and 52% (38%–68%) for obesity. Fibrinogen mediated 6% to 9% and high-sensitive C-reactive protein mediated 6% to 8% of the excess relative risk for overweight and obese participants.

Conclusions: Metabolic mediators explain about half of the adverse effects of high BMI on CHD. The role of inflammatory and prothrombotic biomarkers is much smaller than that of metabolic factors.

The prevalence of overweight and obesity, as measured by body mass index (BMI), is rising in most countries of the world [1,2]. Their associations with increased risk of cardiovascular disease (CVD) have been established in many prospective cohort studies [3–5]. Various mechanisms linking high BMI with CVD have been proposed. High BMI increases blood pressure, and serum cholesterol, and it is a major risk factor for diabetes [6,7]. In addition, there is increasing evidence that adipose tissue acts as an active endocrine organ and releases pro-inflammatory cytokines, which may have an important role in endothelial dysfunction, may induce low-grade systemic inflammation, and may affect fibrinolysis [6,7]. A third explanation for the associations between BMI, cardiovascular risk factors and CVD is that they share common causes such as diet and physical activity [8], which may not have been adequately controlled for in previous observational studies. Despite the rising levels of BMI, some individuals and populations have managed to control their blood pressure and serum cholesterol successfully [9–12]. Effective anti-inflammatory and anti-coagulation interventions are also available [13–15]. Therefore, it is evident that we can partially control the harmful effects of BMI on CVD. However, it is not clear to what extent the effect of BMI on CVD operates through metabolic risk factors, inflammatory and coagulatory pathways, or other unknown mechanisms.

In a previous meta-analysis, we used data from 97 prospective cohort studies to estimate the effects of high BMI on coronary heart disease (CHD) and stroke, with and without adjustment for metabolic mediators (blood pressure, serum cholesterol, and blood glucose) [16]. However, our models did not allow interactions between BMI and its mediators, and they did not include inflammatory and prothrombotic biomarkers as potential mediators. In addition, we did not have

(Epidemiology 2015;26: 153–162) doi: 10.1097/EDE.0000000000000234
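The "percentage of excess relative risk mediated" reported in the abstract can be illustrated with one commonly used definition on the hazard ratio scale. This is a sketch under assumptions: the formula may differ in detail from the paper's computation, the function name is hypothetical, and the hazard ratios below are made-up inputs, not results from the study.

```python
def percent_excess_risk_mediated(hr_total, hr_direct):
    """Share (in %) of the excess relative risk, HR - 1, not due to the direct effect."""
    if hr_total <= 1.0:
        raise ValueError("defined here only for hr_total > 1")
    return 100.0 * (hr_total - hr_direct) / (hr_total - 1.0)


# Hypothetical inputs: total-effect HR of 1.50, natural direct effect HR of 1.25.
example = percent_excess_risk_mediated(hr_total=1.50, hr_direct=1.25)

print(f"{example:.0f}% of the excess relative risk is mediated")
```

With these inputs, half of the excess risk above a hazard ratio of 1 runs through the mediator, which is the kind of statement the abstract makes for metabolic risk factors.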

Causal mediation analysis

Summary

There has been a rapid development of statistical methods for estimating causal effects under various sets of assumptions.

Causal parameters are identified in randomized experiments, and under no unmeasured confounding in observational studies.

The study population is selected using inclusion/exclusion criteria.

There are multiple sets of confounders that can be adjusted for in the analysis; the choice of set affects the estimators’ bias and variance.

Sensitivity analyses can be performed to assess the possible impact of unobserved confounders.

Causal mediation analysis

Literature: general

Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. Roczniki Nauk Rolniczych X, 1–51. In Polish, English translation by D. Dabrowska and T. Speed in Statistical Science 5, 465–472, 1990.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.

Robins, J. M. (1986). A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect. Mathematical Modelling 7:9-12, 1393-1512.

Pearl, J. (1995). Causal diagrams for empirical research. Biometrika 82:4, 669-688.

Pearl, J. (2009) Causality (2nd edition). Cambridge University Press.

Pearl, J., Glymour, M. and Jewell, N.P. (2016). Causal Inference in Statistics: A Primer. John Wiley & Sons.

Robins, J. M., Hernán, M. A., Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11:5, 550-560.

Rosenbaum, P. R., and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70:1, 41-55.

Stuart, E. (2010). Matching methods for causal inference: A review and look forward. Statistical Science, 25, 1-21.

Imbens, G. W., and Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47:1, 5-86.

Lunceford, J.K., Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23, 2937-2960.

Vansteelandt, S., and Daniel, R. M. (2014). On regression adjustment for the propensity score. Statistics in Medicine, 33, 4053-4072.

Hernán, M.A., Hernández-Díaz, S., Robins, J.M. (2004). A structural approach to selection bias. Epidemiology 15:5, 615-625.

Robins, J. M., Rotnitzky, A., Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846–866.

Imbens, G. W. (2015). Matching methods in practice: Three examples. Journal of Human Resources, 50:2, 373-419.

Causal mediation analysis

Literature: Covariate selection

Greenland, S., Pearl, J. and Robins, J.M. (1999). Causal diagrams for epidemiologic research. Epidemiology 10:1, 37-48.

De Luna, X., Waernbaum, I. and Richardson, T.S. (2011). Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, 98:4, 861-875.

Vansteelandt, S., Bekaert, M., Claeskens, G. (2012). On model selection and model misspecification in causal inference. Statistical Methods in Medical Research 21:1, 7-30.

VanderWeele, T. J., and Shpitser, I. (2011). A new criterion for confounder selection. Biometrics 67:4, 1406-1413.

Causal mediation analysis

Literature: Assessing no unmeasured confounding

Lipsitch, M., Tchetgen Tchetgen, E., Cohen, T. (2010). Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology 21:3, 383-388.

Tchetgen Tchetgen, E. (2014). The control outcome approach for causal inference with unobserved confounding. American Journal of Epidemiology, 179:5, 633-640.

Hsu, J. Y., Small, D.S., Rosenbaum, P.R. (2013). Effect modification and design sensitivity in observational studies. Journal of the American Statistical Association 108:501, 135-148.

Rosenbaum, P.R. (2002). Observational Studies (2nd edition). New York: Springer.

Causal mediation analysis

Literature: Mediation analysis

Pearl, J. (2001). Direct and indirect effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.

Imai, K., Keele, L., Yamamoto, T. (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25:1, 51-71.

VanderWeele, T. (2015). Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press.

VanderWeele, T., Vansteelandt, S. (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and its Interface 2, 457-468.

Causal mediation analysis

Software Stata

Abadie, A., et al. (2004). Implementing matching estimators for average treatment effects in Stata. The Stata Journal 4, 290-311.

Becker, S.O., Ichino, A. (2002). Estimation of average treatment effects based on propensity scores. The Stata Journal, 2:4, 358-377.

Emsley, R., et al. (2008). Implementing double-robust estimators of causal effects. The Stata Journal 8:3, 334-353.

Fewell, Z., et al. (2004). Controlling for time-dependent confounding using marginal structural models. The Stata Journal, 4:4, 402-420.

Hicks, R., and Tingley, D. (2011). Causal mediation analysis. The Stata Journal, 11:4, 605-619.

Causal mediation analysis

Software R

Sekhon, J. S. (2011). Multivariate and propensity score matching software with automated balance optimization: The Matching package for R. Journal of Statistical Software 42.

van der Wal, W. M., Geskus, R.B. (2011). ipw: An R package for inverse probability weighting. Journal of Statistical Software 43:13, 1-23.

Tingley, D., et al. (2014). mediation: R package for causal mediation analysis. Journal of Statistical Software, 59, 1–38.

Keele, L. (2010). An overview of rbounds: An R package for Rosenbaum bounds sensitivity analysis with matched data. Vignette.

Häggström, J., et al. (2015). CovSel: An R package for covariate selection when estimating average causal effects. Journal of Statistical Software 68, 1-20.