Multi-Site Time Series Studies: Effect of Air Pollution on Morbidity
and Mortality (in Pittsburgh)
Francesca Dominici Department of Biostatistics
Harvard School of Public Health
May 7 2013
Outline
• What is a multi-site time series study?
• Which statistical methods do we use to analyze a multi-site time series study?
• What are the benefits of multi-site versus single-site studies?
• What are the strengths/limitations of multi-site versus meta-analysis?
• What do we know now about the short-term effects of air pollution?
Data
Single-city time series studies in the U.S.
Steubenville, OH
Schwartz, 1992
Philadelphia, PA
Kelsall et al. 1997
Birmingham, AL
Schwartz 1993
Utah Valley
Pope et al. 1992
National Data Bases
Health data (Medicare, NCHS): 120 GB
Exposure data (EPA): 2 GB
Weather data (NOAA): 5 GB
Daily time series data linked by county
National Morbidity Mortality Air Pollution Study
1987—2006
What is a multi-site time series study?
NMMAPS 1987—2006
• 108 urban communities (including Pittsburgh)
• Cause-specific mortality data from NCHS
– all-cause (non-accidental), CVD, respiratory, COPD, pneumonia, accidental
• Weather from NWS
– Temperature, dew point, relative humidity
• Air pollution data from the EPA
– PM10, PM2.5, O3, NO2, SO2, CO
• U.S. Census 1990, 2000
Methods
Multi-site time series models of air pollution and mortality
• Stage 1 (within city): Poisson regressions for estimating short-term association between air pollution and mortality, controlling for time-varying confounders
• Stage 2 (between cities): Hierarchical model for pooling information across neighboring cities and obtaining a national average effect
Confounding bias
• The association between air pollution and mortality is potentially confounded by:
– Weather: mortality is higher at low and high temperatures
– Seasonality: e.g. mortality generally peaks in winter because of influenza epidemics
– Long-term trends: e.g. improvements in medical practice, lower mortality over time
• All these phenomena cannot be attributed to air pollution
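The confounders listed above typically enter the Stage 1 model as smooth terms. The following is a sketch of one common form of the city-specific log-linear model; the notation ($Y_t^c$, $x_{t-\ell}^c$, the smooth functions $s(\cdot)$, and the day-of-week term) is assumed for illustration and is not taken from the slides.

```latex
\begin{align*}
Y_t^c &\sim \text{Poisson}(\mu_t^c) \\
\log \mu_t^c &= \beta_c \, x_{t-\ell}^c
  + s(\text{temperature}_t) + s(\text{dew point}_t)
  + s(t;\ \text{df per year}) + \text{day of week}_t
\end{align*}
```

Here $\beta_c$ is the log relative rate for city $c$ at lag $\ell$. The smooth function of calendar time $t$ absorbs seasonality and long-term trends, and its degrees of freedom per year control how aggressively that adjustment is made.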
[Figure: CVD per 100,000 per day by date, 2002 to 2008; curves for 2, 6, and 24 df/year]
[Figure: PM2.5 concentration by date, 2002 to 2008; curves for 2, 6, and 24 df/year]
[Figure: Estimate + bounds (% increase per 10 units PM2.5) plotted against degrees of freedom per year (0 to 100)]
To Pool or Not to Pool?
• Individual cities can be selected to show one point or another
• Results from individual cities are much more sensitive to model assumptions
• Results from individual cities are swamped by statistical error
• There is no reason to expect that two neighboring cities with similar sources of particles would have qualitatively different relative risks
Pooled estimate from multi site time series studies
• It does provide:
– evidence of short-term associations between particulate matter and mortality, on average across locations
– an estimate of the excess number of deaths associated with short-term air pollution exposure
• It does not provide:
– an estimate of how premature these deaths are
– an estimate of the extra deaths associated with sustained exposure, which are unrelated to the timing of an air pollution episode
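As a worked illustration of the "excess number of deaths" a pooled estimate supports, the sketch below converts a percent increase per 10 units of pollutant into a relative rate and then into expected excess deaths. The function names and all input numbers are hypothetical and chosen only to show the arithmetic; they are not results from the studies discussed here.

```python
import math


def percent_to_log_rate(pct_per_10: float) -> float:
    """Convert a % increase in mortality per 10-unit increase into a log relative rate per unit."""
    return math.log(1.0 + pct_per_10 / 100.0) / 10.0


def excess_deaths(baseline_deaths: float, pct_per_10: float, delta: float) -> float:
    """Expected excess deaths when the pollutant is `delta` units above the reference level."""
    beta = percent_to_log_rate(pct_per_10)
    relative_rate = math.exp(beta * delta)
    return baseline_deaths * (relative_rate - 1.0)


# Hypothetical inputs: 100 deaths/day at baseline, 0.5% per 10 units, a 20-unit episode.
print(excess_deaths(100.0, 0.5, 20.0))
```

This calculation counts deaths attributable to the short-term increase only. It says nothing about how premature those deaths are, which is the limitation noted above.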
Sensitivity of the national average lag effect of PM10 on mortality to different statistical models to adjust for confounding (NMMAPS 1987-2000)
Peng, Dominici, Louis, JRSSC 2006
Reported estimate
Different statistical models to adjust for confounding
weak moderate strong
Using information only at the very short time scales
Using information at the short and long (trend and seasonality) time scales
Meta Analysis or Multi site time series study?
Meta-analysis versus multi-site time series study (Bell et al 2005)
This indicates that the lag with the highest effect is more likely to have been reported
What are the benefits of multi-site versus single-site studies?
• The primary advantages of meta-analysis or multi-city study over a single city estimate are:
1. the statistical power gained from aggregating multiple estimates
2. the generation of an overall effect estimate
3. the possibility of exploring heterogeneity of the effect across locations
What are the benefits of multi-site time series studies versus meta-analysis?
• In the meta-analytic approach, the independently conducted single-city studies generally differ in the specification of the statistical model, the approach to addressing confounding by weather and long-term trends, and the adjustment for additional pollutants, which complicates interpretation of the overall effect.
• In addition, meta-analyses are subject to publication bias, and the degree of publication bias is difficult to quantify.
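The observation that the lag with the highest effect is more likely to have been reported can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not from the slides: every lag has the same true effect, estimates at different lags are independent, and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_reporting(true_effect=0.5, se=0.3, n_lags=3, n_studies=2000, seed=1):
    """Compare reporting a pre-specified lag with reporting the largest of several lags."""
    rng = random.Random(seed)
    fixed, best = [], []
    for _ in range(n_studies):
        # Every lag shares the same true effect; estimates differ only by noise.
        estimates = [rng.gauss(true_effect, se) for _ in range(n_lags)]
        fixed.append(estimates[0])   # lag chosen before seeing the data
        best.append(max(estimates))  # lag with the highest estimate
    return statistics.mean(fixed), statistics.mean(best)


mean_fixed, mean_best = simulate_reporting()
print(f"pre-specified lag: {mean_fixed:.2f}, highest lag: {mean_best:.2f}")
```

Averaging the "highest lag" estimates overstates the true effect even though no single study is wrong. A multi-site study that fixes the lag in advance for every city avoids this selection.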
The Evidence
PM10 Mortality NMMAPS
Seasonally varying effect of PM10 at lag 1 by region, 100 U.S. cities, 1987—2000
[Figure: % increase in mortality with a 10 µg/m3 increase in PM10 at lag 1, by month (Jan, Apr, July, Oct); panels: Industrial Midwest, North East, North West, Southern California, South East, South West, Upper Midwest, All Regions]
The National Medicare Cohort Study, 1999-2008 (MCAPS)
• Medicare data include:
– Billing claims for everyone over 65 enrolled in Medicare (~48 million people):
  • date of service
  • disease (ICD-9)
  • age, gender, and race
  • place of residence (zip code)
• Approximately 204 counties linked to the PM2.5 monitoring network
MCAPS study population: 204 counties with populations larger than 200,000 (11.5 million people)
PM2.5 and Admissions PM10-2.5 and Admissions
US EPA PM Fact Sheet 2006: To better protect public health, EPA issued the Agency's most protective suite of national air quality standards for particle pollution ever
Dominici et al JAMA 2006 Peng et al JAMA 2008
• Only seven of the 52 components contributed 1% or more to total mass for yearly or seasonal averages
1. OCM
2. Sulfate
3. Nitrate
4. EC
5. Silicon
6. Sodium Ion
7. Ammonium
Chemical composition data on PM2.5
OC 33%, SO4= 30%, NO3- 14%, NH4+ 12%, EC 5%, Other 4%, Si 1%, Na+ 1%
Exposure data: Chemical composition data on PM2.5 from the STN network
1. Constructed a database of time series data for 52 PM2.5 chemical constituents from over 250 STN monitors for 2000 to 2008
2. Identified a subset of PM2.5 components that substantially contribute and/or co-vary with daily PM2.5 concentrations
3. Constructed a database that links by zip code the chemical composition data to human health data
Bell et al EHP 2007
PM2.5 chemical components and mortality rates: 1999-2008
National average estimates and 95% posterior intervals for the percent increase in hospital admissions for cardiovascular diseases per 1 IQR increase in each of the seven PM2.5 components, 119 U.S. counties, 2000-2006.
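Reporting effects "per 1 IQR increase" puts components with very different concentration ranges on a comparable scale. The sketch below shows the conversion from a log-rate coefficient per unit of concentration to a percent increase per interquartile range. The function name and inputs are hypothetical, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ from the method used in the cited analyses.

```python
import math
import statistics


def percent_increase_per_iqr(beta_per_unit, concentrations):
    """Return (% increase in the outcome rate per IQR, IQR) for one component."""
    # Quartiles of the observed daily concentrations for this component.
    q1, _median, q3 = statistics.quantiles(concentrations, n=4)
    iqr = q3 - q1
    # A log-linear model implies a rate ratio of exp(beta * IQR).
    pct = 100.0 * (math.exp(beta_per_unit * iqr) - 1.0)
    return pct, iqr


# Hypothetical component with concentrations 1..9 and a log rate of 0.01 per unit.
pct, iqr = percent_increase_per_iqr(0.01, list(range(1, 10)))
print(iqr, pct)
```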
Peng et al submitted
Peng et al 2008, EHP
Concluding Thoughts
• Evidence that:
– PM2.5 effects vary by season and region, as does PM2.5 chemical composition
– Some PM sources and components are more harmful than others
• True harmful characteristics of PM not fully understood
• Policy challenge: which sources are most harmful?
• Many challenges remain for study of health and PM, as well as pollution mixtures in general
Questions?
Stage 1: City-specific model
Poisson regression model
Pollutant series
Estimated relative rate for city c: $\hat\beta_c$
True relative rate for city c: $\beta_c$
Stage 2: Pooling information across cities
True national-average relative rate: $\alpha$
$\hat\beta_c - \alpha = \underbrace{(\hat\beta_c - \beta_c)}_{\text{within city}} + \underbrace{(\beta_c - \alpha)}_{\text{across cities}}$
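Stage 2: Pooling information across cities
$\hat\beta_c = \beta_c + N(0, v_c)$
$\beta_c = \alpha + N(0, \tau^2)$
where $\hat\beta_c$ is the city-specific MLE, $\beta_c$ the city-specific true effect, $\alpha$ the national average, and $\tau^2$ the between-city variance (heterogeneity).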
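The two-level model above can be fitted in several ways. The sketch below uses the DerSimonian-Laird method-of-moments estimator, a simple frequentist stand-in rather than the Bayesian hierarchical fit that the posterior intervals elsewhere in these slides suggest. The function name and example inputs are hypothetical.

```python
def pool_random_effects(estimates, variances):
    """DerSimonian-Laird estimates of the national average alpha and tau^2.

    estimates: city-specific MLEs (beta-hat_c); variances: their variances (v_c).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two cities to estimate heterogeneity")
    # Fixed-effect (inverse-variance) average, used to measure excess spread.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sw
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    # Method-of-moments between-city variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight each city by its total variance v_c + tau^2.
    w_star = [1.0 / (v + tau2) for v in variances]
    alpha = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    se_alpha = (1.0 / sum(w_star)) ** 0.5
    return alpha, se_alpha, tau2


# Hypothetical estimates for three cities with equal sampling variance.
alpha, se_alpha, tau2 = pool_random_effects([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(alpha, se_alpha, tau2)
```

When the spread of the estimates is no larger than sampling error alone would produce, the truncation sets tau^2 to zero and the result reduces to the ordinary inverse-variance average.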
[Figure: three histograms of τ (x-axis 0 to 7) against frequency]
What is heterogeneity?
• It is the variance across cities of the true (not the estimated) air pollution effects
• The problem is that we do not see the true effects!
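Because only the estimates are observed, their spread mixes heterogeneity with statistical error. Under the Stage 2 model in these slides, and assuming the within-city and across-city error terms are independent (an assumption made here for the derivation), the variance of a city-specific estimate decomposes as:

```latex
\begin{align*}
\hat\beta_c - \alpha &= (\hat\beta_c - \beta_c) + (\beta_c - \alpha) \\
\operatorname{Var}(\hat\beta_c) &= v_c + \tau^2
\end{align*}
```

The raw variability of the estimates therefore overstates $\tau^2$ by the sampling variance $v_c$, and estimates can differ visibly across cities even when $\tau^2 = 0$.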
What do the data say about heterogeneity?
Small (there is evidence that it might be zero)
Medium (still including evidence that it might be zero)
Large (no evidence of homogeneity)
How do we know whether city-specific short-term effects of air pollution are truly different across cities?
[Figure: county-specific estimates β̂ (−40 to 40) by county (0 to 200), shown in three panels]
Stage 1: City-specific model
Poisson regression model
Weather
[Figure: Posterior distribution of the pooled effect α (x-axis 0.5 to 2.5) under large, medium, and small heterogeneity (High, Medium, Low)]
Stage 1: City-specific model
Poisson regression model
Seasonal and long-term trends