Multi-site Time Series Analysis Motivation and Methodology

Multi-site Time Series Analysis: Motivation and Methodology. SAMSI Spatial Epidemiology, Fall 2009. Howard Chang, [email protected]

Page 1: Multi-site Time Series Analysis Motivation and Methodology

Multi-site Time Series Analysis

Motivation and Methodology

SAMSI Spatial Epidemiology, Fall 2009

Howard Chang

[email protected]

Page 2: Multi-site Time Series Analysis Motivation and Methodology

Epidemiology

The study of factors affecting

the health of human populations

Some objectives of epidemiologic studies:

– Identify the cause of a disease and its risk factors.

– Measure the extent and occurrence of the disease.

– Quantify the burden of the disease.

– Evaluate current methods of health care delivery.

– Create preventive and intervention programs.

– Provide information for policy and regulatory decisions.

Page 3: Multi-site Time Series Analysis Motivation and Methodology

First Step in Epidemiology

Exposure → Adverse Health Outcome?

Exposure Examples

A few factors studied for breast cancer:

genes, physical activity, schizophrenia, birth-weight, obesity,

consumption of fruits and vegetables, total visual blindness, arthritis,

… (about 22,000 hits from PubMed)

Health Outcome Examples

Some ways to measure frailty in the elderly:

slow walking speed, poor grip strength, exhaustion,

unintended weight loss and low physical activity

Page 4: Multi-site Time Series Analysis Motivation and Methodology

Challenges in Epidemiologic Study

Test subjects = Humans

Study Design

– How to select and recruit subjects?

• experimental versus observational

• sample size and cost

– How to define and measure exposure?

• duration, intensity

– Ethical concerns

Interpretations

– How to establish causation through associations?

– Can the results be generalized to the whole population?

Bias, Confounder, Interaction

Page 5: Multi-site Time Series Analysis Motivation and Methodology

The London Smog (1952)

Adverse health effects of extreme air pollution are well established.

Page 6: Multi-site Time Series Analysis Motivation and Methodology

Air Pollution Epidemiology

Scientific Question:

Do everyday levels of air pollution affect human health?

Motivations:

Air pollution is experienced by everyone and there is no alternative to

breathing!

The health impact and economic cost to the population can be substantial.

Ambient pollutants are mostly generated by human activities and regulatory

policies are required to protect public health.

Page 7: Multi-site Time Series Analysis Motivation and Methodology

Background

The EPA currently regulates six criteria pollutants:

Ozone, particulate matter, carbon monoxide, nitrogen oxides,

sulfur dioxide and lead

The National Ambient Air Quality Standards (NAAQS) provide limits

on both long-term and short-term exposure.

Example: Fine particulate matter (PM2.5)

Averaging Time    Level
Annual            15 µg/m3
24-hour           35 µg/m3

Similarly, the health effects of air pollution are classified as chronic or acute, and the two are estimated using different study designs.

Page 8: Multi-site Time Series Analysis Motivation and Methodology

Time Series Analysis

It is the most common population-based study design to estimate the

short-term (acute) health effects of air pollution.

IDEA: Quantify the association between daily variations in air pollution

level and variations in daily adverse health outcomes.

Example: Cook, IL

Page 9: Multi-site Time Series Analysis Motivation and Methodology

Chronic Health Effects

Chronic effects cannot be studied with the time series design, which relies on temporal (between-day) comparison.

Studies of chronic health effects quantify the association between spatial variation in air pollution level and health outcomes in different geographic areas.

Annual Average Level of PM2.5 (µg/m3)

Page 10: Multi-site Time Series Analysis Motivation and Methodology

Multi-site Time Series Analysis

Goal: Estimate the acute health effect of an exposure that

varies both spatially and temporally.

[Figure, Daily Variation: daily PM2.5 level, 1999 to 2003, for Kern County and King County]

[Figure, Spatial Variation: annual average level of PM2.5 (µg/m3)]

Page 11: Multi-site Time Series Analysis Motivation and Methodology

Multi-site Time Series Analysis

Stage I

A single-site time series analysis is conducted within a community such

as a city, a county, or a metropolitan area.

Data:

– Outcome of interest: daily count of an adverse health outcome in the community. Examples: hospital admissions, deaths.

– Exposure of interest: daily community-level exposure to air pollution that reflects the average level of exposure experienced by all at-risk individuals.

– Other known predictors (confounders) of the health outcome, such as temperature, humidity, …

Stage II

A multi-site analysis combines the health effects across locations.

Page 12: Multi-site Time Series Analysis Motivation and Methodology

Case Study Example: NMMAPS

National Morbidity, Mortality, and Air Pollution Study

– Study period: 1987 ~ 2000

– 108 urban communities (cities).

– Daily mortality count from National Center for Health Statistics

– Daily air pollution data (PM2.5, PM10, O3, NO2, SO2, CO)

– Weather data from the National Climatic Data Center

– City characteristics from the 2000 Census

Page 13: Multi-site Time Series Analysis Motivation and Methodology

NMMAPS Resources

Website: http://www.ihapss.jhsph.edu/

Book:

Page 14: Multi-site Time Series Analysis Motivation and Methodology

Case Study Example: MCAPS

Medicare and Air Pollution Study

– Study period: 1999 ~ 2005 (on-going)

– Approximately 204 counties

– Medicare enrollees aged 65 or above

– Daily hospital admission count for primary diagnosis

– 11.5 million Medicare enrollees residing an average of 5.9 miles from a PM2.5

Page 15: Multi-site Time Series Analysis Motivation and Methodology

Case Study

Study Population

Medicare Enrollees from 204 US counties with population greater than 200,000

Exposure Data

Time series of daily county-level average concentrations of PM2.5 were calculated

using measurements from EPA's monitoring network.

Health Outcome Data

Time series of the daily number of hospitalizations for various cardiovascular and respiratory diseases were constructed for each county.

Time series of the total number of at-risk individuals were also constructed for each hospitalization outcome.

Page 16: Multi-site Time Series Analysis Motivation and Methodology

Stage I County-specific Model

For each county separately, we model the count outcome via Poisson regression with over-dispersion:

$y_{ct} \sim \text{Poisson}(\mu_{ct})$

$\log \mu_{ct} = \log N_{ct} + \alpha_c + \beta_c x_{c(t-p)} + \text{confounders}$

For county c:

$y_{ct}$ = number of admissions on day t

$x_{c(t-p)}$ = county-level PM2.5 exposure on day t − p, i.e. at lag p

(e.g. p = 0 for same-day exposure; p = 1 for previous-day exposure)

$N_{ct}$ = population at risk on day t
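The county-specific model on this slide can be sketched in a few lines of code. What follows is a minimal illustration only, assuming a single county, one (already lagged) exposure series, and no confounder terms. The function names `fit_poisson_offset` and `_mu_and_info` and the toy data are hypothetical. Over-dispersion is handled here with the Pearson scale estimate (a quasi-Poisson style adjustment), which is one common choice and not necessarily the one used in the study. A real analysis would use a GLM routine with spline terms for the confounders.

```python
import math

def _mu_and_info(alpha, beta, x, n_at_risk):
    """Fitted means and the 2 x 2 Fisher information for (alpha, beta)."""
    mu = [n * math.exp(alpha + beta * xi) for n, xi in zip(n_at_risk, x)]
    i_aa = sum(mu)
    i_ab = sum(xi * mi for xi, mi in zip(x, mu))
    i_bb = sum(xi * xi * mi for xi, mi in zip(x, mu))
    return mu, i_aa, i_ab, i_bb

def fit_poisson_offset(y, x, n_at_risk, max_iter=50, tol=1e-10):
    """Fit log(mu_t) = log(N_t) + alpha + beta * x_t by Newton-Raphson.

    x is assumed to be lagged already (x_t holds the exposure on day t - p).
    Returns (alpha, beta, var_beta, phi): phi is the Pearson over-dispersion
    estimate and var_beta is the variance of beta scaled by phi.
    """
    alpha = math.log(sum(y) / sum(n_at_risk))  # start at the crude rate
    beta = 0.0
    for _ in range(max_iter):
        mu, i_aa, i_ab, i_bb = _mu_and_info(alpha, beta, x, n_at_risk)
        # Score vector of the Poisson log-likelihood
        u_a = sum(yi - mi for yi, mi in zip(y, mu))
        u_b = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Newton step: inverse information times score
        det = i_aa * i_bb - i_ab ** 2
        step_a = (i_bb * u_a - i_ab * u_b) / det
        step_b = (i_aa * u_b - i_ab * u_a) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    mu, i_aa, i_ab, i_bb = _mu_and_info(alpha, beta, x, n_at_risk)
    det = i_aa * i_bb - i_ab ** 2
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    phi = pearson / (len(y) - 2)       # two fitted parameters
    var_beta = phi * i_aa / det        # (2,2) element of the inverse, scaled
    return alpha, beta, var_beta, phi

# Hypothetical toy data for one county over ten days
pm25 = [5.0, 8.0, 12.0, 20.0, 35.0, 15.0, 10.0, 25.0, 18.0, 7.0]
admissions = [19, 22, 20, 21, 23, 18, 21, 22, 19, 20]
at_risk = [100000.0] * 10
alpha_hat, beta_hat, var_beta_hat, phi_hat = fit_poisson_offset(
    admissions, pm25, at_risk)
```

The pair (`beta_hat`, `var_beta_hat`) is what a county would pass on to Stage II.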

Page 17: Multi-site Time Series Analysis Motivation and Methodology

Stage I Modelling

Time series analysis is ecological in time:

(1) We regress aggregated health outcome on aggregated

exposure.

(2) Day serves as the unit of comparison.

Over-dispersion may be due to residual confounding, measurement error, or

ecological bias.

The acute health effect βc represents:

– the county-specific log relative risk associated with a unit increase in same-day PM2.5 level, controlling for known confounders;

– equivalently, 100 × [exp(βc) − 1] is the % increase in hospital admissions associated with a unit increase in same-day PM2.5 level, controlling for known confounders;

– a single number with great policy implications!
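As a worked illustration of how a log relative risk translates into a percent increase, suppose, purely hypothetically, that βc = 0.001 per µg/m3. The value is chosen for arithmetic convenience and is not an estimate from the study.

```latex
\begin{align*}
\text{RR per } 1\ \mu g/m^3 &= \exp(\beta_c) = \exp(0.001) \approx 1.0010 \\
\%\ \text{increase per } 10\ \mu g/m^3 &= 100 \times [\exp(10\,\beta_c) - 1]
  = 100 \times [\exp(0.01) - 1] \approx 1.005\%
\end{align*}
```

For effects this small, 100 × 10 × βc = 1% is a close approximation, which is why log relative risks are often read directly as percent changes.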

Page 18: Multi-site Time Series Analysis Motivation and Methodology

Confounders

Also known as hidden variables or lurking variables.

In establishing whether A causes B,

factor C is a confounder if:

(1) C is a known risk factor for B

(2) C is associated with A but is not on the causal pathway from A to B.

[Diagram: (A) Air Pollution, (B) Health Outcome, (C) Temperature?]

Page 19: Multi-site Time Series Analysis Motivation and Methodology

Controlling for Confounders

It is important to rigorously control for confounders. A typical model will include:

• Day of the week

• Age-group categories (under 65 versus 65 to 75 versus 75+)

• Smooth function of calendar time to control for long-term trends and seasonality due to

epidemics of influenza and respiratory infections.

• Interaction between age-group and smooth function of time

• Smooth functions of current-day and previous-day temperature

• Smooth function of current-day and previous-day dew-point temperature to control for humidity

Smooth functions for the confounders are modelled via natural cubic splines.

Note that confounders that do not vary with time are automatically controlled for!
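To make the spline terms concrete, here is a minimal sketch of a natural cubic spline basis in its truncated-power form. The function name `natural_spline_basis` and the knot positions are hypothetical, and statistical software typically uses a numerically better-conditioned (B-spline based) construction of the same function space.

```python
def natural_spline_basis(x, knots):
    """Natural cubic spline basis (truncated-power form) evaluated at x.

    For K knots this returns K values [1, x, N_3(x), ..., N_K(x)].
    Any linear combination is cubic between knots and linear beyond
    the two boundary knots.
    """
    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    last = knots[-1]

    def d(k):
        # Scaled difference of truncated cubics at knot k and the last knot
        return (cube_plus(x - knots[k]) - cube_plus(x - last)) / (last - knots[k])

    n_knots = len(knots)
    basis = [1.0, float(x)]
    for k in range(n_knots - 2):
        basis.append(d(k) - d(n_knots - 2))
    return basis

# Hypothetical use: a smooth function of calendar time over one year,
# with seven knots (days counted from 0), evaluated on day 150.
time_knots = [0, 61, 122, 183, 244, 305, 365]
time_row = natural_spline_basis(150, time_knots)

# Day-of-week indicators for the same day (six dummies, day 0 as reference)
dow_row = [1.0 if 150 % 7 == j else 0.0 for j in range(1, 7)]
```

Each day contributes one such row to the design matrix. More knots per year give the smooth function of time more degrees of freedom.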

Page 20: Multi-site Time Series Analysis Motivation and Methodology

Controlling for Confounders Examples

(1) Mortality and Temperature

Association between lag 1 PM10 and mortality as

the number of lags of temperature included in the

model is increased, New York, NY, 1987–2000.

(2) Mortality and Time

Estimates of the log relative risk for PM10 in Denver, Colorado, 1987–2000, as the number of degrees of freedom per year in the smooth function of time is varied.

Peng RD, Dominici F (2008). Statistical Methods for Environmental Epidemiology in R: A Case Study in Air Pollution and Health, Springer.

Page 21: Multi-site Time Series Analysis Motivation and Methodology

Stage II Combining Across Locations

A simple hierarchical model:

Assuming the true location-specific log relative risks are independent across locations,

$\beta_c \sim \text{Normal}(\mu, \tau^2)$

µ = ( pooled / overall / average / national ) log relative risk

τ² = between-county variability (spatial heterogeneity) in log relative risks

One can view the adverse health effects of PM2.5 as treatments that were randomly assigned to the selected counties, or equivalently regard the risks as exchangeable among counties.

Page 22: Multi-site Time Series Analysis Motivation and Methodology

Estimation

We cannot carry out estimation for both Stage I and Stage II simultaneously because of

the large number of county-specific regression coefficients for confounders.

A two-stage approximation approach:

1. First estimate the county-specific log relative risk $\hat\beta_c$ and its variance $\hat V_c$.

2. Use an MLE-based Normal approximation:

$\hat\beta_c \mid \beta_c \sim \text{Normal}(\beta_c, \hat V_c)$

$\beta_c \sim \text{Normal}(\mu, \tau^2)$

The above two-level Normal-Normal model can be estimated via MCMC,

programs for meta-analysis, or the TLNISE algorithm of Everson and

Morris (2000)
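As one example of the "programs for meta-analysis" route, the sketch below fits the two-level Normal-Normal model with the DerSimonian-Laird method-of-moments estimator. This is a swapped-in, simpler technique and not the MCMC or TLNISE approach. The function name `pool_random_effects` and the input values are hypothetical.

```python
def pool_random_effects(beta_hat, var_hat):
    """DerSimonian-Laird fit of beta_hat_c ~ Normal(mu, V_c + tau2).

    beta_hat: Stage I log relative risks, one per county.
    var_hat:  their estimated variances.
    Returns (mu, se_mu, tau2).
    """
    w = [1.0 / v for v in var_hat]                 # fixed-effect weights
    sw = sum(w)
    mu_fixed = sum(wi * b for wi, b in zip(w, beta_hat)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (b - mu_fixed) ** 2 for wi, b in zip(w, beta_hat))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(beta_hat) - 1)) / c)  # truncate at zero
    w_star = [1.0 / (v + tau2) for v in var_hat]    # random-effects weights
    mu = sum(wi * b for wi, b in zip(w_star, beta_hat)) / sum(w_star)
    se_mu = (1.0 / sum(w_star)) ** 0.5
    return mu, se_mu, tau2

# Hypothetical Stage I output for four counties
beta_hat = [0.0012, 0.0004, 0.0020, 0.0009]
var_hat = [4e-7, 9e-7, 2.5e-7, 1e-6]
mu_hat, se_mu, tau2_hat = pool_random_effects(beta_hat, var_hat)
```

Unlike a fully Bayesian fit, this moment estimator does not propagate the uncertainty in τ² into the interval for µ, so it is best read as a quick approximation.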

Page 23: Multi-site Time Series Analysis Motivation and Methodology

National Estimates for PM2.5 and Admissions

Page 24: Multi-site Time Series Analysis Motivation and Methodology

Example: County-specific Effect of PM10 on Mortality

F. Dominici, A. McDermott, M. Daniels, S. L. Zeger, and J. M. Samet. Mortality among residents of 90 cities. In Revised Analyses of Time-Series Studies of Air Pollution and

Health, pages 9–24. The Health Effects Institute, Cambridge, MA, 2003.

MLE Estimates

Page 25: Multi-site Time Series Analysis Motivation and Methodology

Example: County-specific Effect of PM10 on Mortality

F. Dominici, A. McDermott, M. Daniels, S. L. Zeger, and J. M. Samet. Mortality among residents of 90 cities. In Revised Analyses of Time-Series Studies of Air Pollution and

Health, pages 9–24. The Health Effects Institute, Cambridge, MA, 2003.

Bayesian Estimates

Page 26: Multi-site Time Series Analysis Motivation and Methodology

County-specific Estimates

The hierarchical framework borrows strength across studies (locations).

In Stage I, county-specific relative risks are often poorly estimated.
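Borrowing strength can be made explicit under the two-level Normal-Normal model. Writing $\hat\beta_c$ for the Stage I estimate and $\hat V_c$ for its variance, and treating µ and τ² as known (an assumption made for illustration; in practice they are estimated and their uncertainty is propagated), the standard conjugate result is shown below. The shrinkage factor $B_c$ is notation introduced here, not taken from the slides.

```latex
\begin{align*}
B_c &= \frac{\hat V_c}{\hat V_c + \tau^2} \\
E[\beta_c \mid \hat\beta_c, \mu, \tau^2] &= (1 - B_c)\,\hat\beta_c + B_c\,\mu \\
\mathrm{Var}[\beta_c \mid \hat\beta_c, \mu, \tau^2] &= (1 - B_c)\,\hat V_c
\end{align*}
```

A county with a noisy Stage I estimate (large $\hat V_c$ relative to τ²) has $B_c$ near 1 and is pulled strongly toward the pooled mean, while a precisely estimated county keeps an estimate close to its own MLE.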

Example: Mortality and PM10

[Figure panels: MLE and Bayesian]

Log relative rates of mortality from exposure to PM10. Areas of the circles are proportional to the posterior precisions of the Bayesian estimates; larger circles indicate more precise estimates. Black outlines denote relative rates for which the ratio of posterior mean to posterior standard deviation exceeds 1.96.

Dominici F, McDermott A, Zeger SL, Samet JM. National Maps of the Effects of PM on Mortality: Exploring Geographical Variation. Environmental Health Perspectives, vol. 111, no. 1, 39-43.

Page 27: Multi-site Time Series Analysis Motivation and Methodology

Risk Heterogeneity

The observed heterogeneity in risks can arise from unmeasured confounders and from effect modification due to county-specific characteristics.

We can include higher level covariates in the hierarchical model:

$\beta_c \sim \text{Normal}(Z_c \gamma, \tau^2)$

County-specific covariates (Zc) may include factors that potentially modify the true

health effects. Examples:

To test the effect of         Variable
Exposure measurement error    Average distance between residents and monitor
Pollutant composition         % urbanicity
Socio-economic status         % poverty

Page 28: Multi-site Time Series Analysis Motivation and Methodology

Example of Risk Heterogeneity

[Diagram: Air Pollution → Health Outcome, with East versus West as a possible effect modifier]

Does region (East versus West) modify the health effects?
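The East-versus-West question can be framed as a Stage II regression of the county-specific estimates on a single binary covariate. The sketch below uses weighted least squares with weights 1 / (V_c + τ²), treating τ² as known. That is a simplifying assumption, as are the function name `meta_regression` and the made-up inputs; a full analysis would estimate τ² jointly with the coefficients.

```python
def meta_regression(beta_hat, var_hat, z, tau2):
    """Weighted least squares for beta_hat_c ~ Normal(g0 + g1 * z_c, V_c + tau2).

    z is a county-level covariate (here 1 = East, 0 = West).
    Returns (g0, g1, se_g1), with tau2 treated as known.
    """
    w = [1.0 / (v + tau2) for v in var_hat]
    s_w = sum(w)
    s_wz = sum(wi * zi for wi, zi in zip(w, z))
    s_wzz = sum(wi * zi * zi for wi, zi in zip(w, z))
    s_wb = sum(wi * b for wi, b in zip(w, beta_hat))
    s_wzb = sum(wi * zi * b for wi, zi, b in zip(w, z, beta_hat))
    # Solve the 2 x 2 weighted normal equations in closed form
    det = s_w * s_wzz - s_wz ** 2
    g0 = (s_wzz * s_wb - s_wz * s_wzb) / det
    g1 = (s_w * s_wzb - s_wz * s_wb) / det
    se_g1 = (s_w / det) ** 0.5
    return g0, g1, se_g1

# Hypothetical Stage I output: two Eastern and two Western counties
beta_hat = [0.0010, 0.0014, 0.0006, 0.0002]
var_hat = [1e-7, 2e-7, 1e-7, 3e-7]
east = [1, 1, 0, 0]
g0, g1, se_g1 = meta_regression(beta_hat, var_hat, east, tau2=1e-7)
z_stat = g1 / se_g1   # compare with 1.96 for a two-sided 5% test
```

Here `g0` is the pooled log relative risk in the West and `g1` is the East-minus-West difference, so evidence that `g1` differs from zero is evidence of effect modification by region.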

Page 29: Multi-site Time Series Analysis Motivation and Methodology

Example of Health Burden Estimates

Annual reduction $= [\exp(10 \times \mu) - 1] \times N$
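The burden formula is simple to evaluate once µ and N are in hand. The sketch below assumes that N is a baseline annual number of admissions and that the factor 10 is a 10 µg/m3 change in PM2.5. The slide does not define either, so both readings are assumptions, as are the function name `annual_reduction` and the input values.

```python
import math

def annual_reduction(mu, n, delta=10.0):
    """Evaluate [exp(delta * mu) - 1] * n.

    mu:    pooled log relative risk per unit of exposure.
    n:     assumed here to be the baseline annual number of admissions.
    delta: size of the exposure change (10 in the slide's formula).
    """
    return (math.exp(delta * mu) - 1.0) * n

# Hypothetical inputs: mu = 0.001 per µg/m3 and 100,000 admissions per year
reduction = annual_reduction(0.001, 100000)
```

Because the result scales linearly in N, even a small pooled relative risk can correspond to a large absolute number of admissions at the national level.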

Page 30: Multi-site Time Series Analysis Motivation and Methodology

Advantages of Multi-site Time Series

• A large study population and a long study period can be achieved by using publicly available national air pollution and health surveillance databases.

• Day-to-day comparison allows a community to serve as its own control, which removes confounding by unmeasured factors that are relatively constant between days.

• A multi-site approach combines evidence, borrows information across locations, and potentially enhances statistical power.

• A multi-site design ensures that the same analytic method is used at each location, minimizing publication/selection bias and allowing better generalizability of the results.

• By comparing risk estimates from different locations, effect modification due to location-specific characteristics can be examined.

Page 31: Multi-site Time Series Analysis Motivation and Methodology

Epidemiologic Evidence and Policy

Regarding the time series design, the EPA’s 2004 Criteria document for

particulate matter states that

“the temporal relationship supports a conclusion of a causal relation, even when both the outcome and the exposure are community indices.”

Consistency and Strength

Regarding the evidence on the health effects of fine PM,

“A growing body of epidemiologic evidence both (a) confirms associations between short-term ambient exposures to fine-fraction particles (generally indexed by PM2.5) and various mortality or morbidity endpoint effects and (b) supports the general conclusion that PM2.5 (or one or more PM2.5 components), acting alone and/or in combination with gaseous co-pollutants, are likely causally related to observed ambient fine particle associated health effects.”