Statistical Modelling for Real-time Epidemiology · 2009-03-04 · spatial variation in Loa loa...



Page 1: Statistical Modelling for Real-time Epidemiology · 2009-03-04 · spatial variation in Loa loa prevalence in tropical Africa (with Discussion). Journal of the American Statistical

Statistical Modelling for Real-time Epidemiology

Peter J Diggle

School of Health and Medicine, Lancaster University

and

Department of Biostatistics, Johns Hopkins University

With help from:

Ciprian Crainiceanu, Barry Rowlingson, Ines Sousa, Alex Rodrigues

March 2009


Outline

• real-time epidemiology

• motivating examples:

– syndromic surveillance for gastroenteric illness

– tropical disease prevalence mapping

– detecting incipient renal failure

• spatio-temporal stochastic processes

– real-valued processes S(x, t)

– separability of spatio-temporal correlation

• examples re-visited

• outlook


Real-time epidemiology

Epidemiology: the study of health outcomes in their natural setting

• not just epidemics

• also social outcomes

Real-time epidemiology: analysis of health outcome data as they accrue

• time-scale is context-specific

• often concerned also with spatial variation


Epidemic vs endemic patterns of incidence

Two animations:

• foot-and-mouth in Cumbria, UK

• campylobacteriosis in Preston, Lancashire, UK

http://www.lancaster.ac.uk/staff/diggle/


Gastroenteric illness in Hampshire, UK

6 March, 2003


Loa loa in equatorial Africa


Incipient renal failure in Salford, UK

[Figure: four panels plotted against age (78 to 84): log(eGFR), ⟨A⟩, ⟨B⟩ and probability]


Spatio-temporal stochastic processes

Real-valued stochastic process $\{S(x, t) : x \in \mathbb{R}^2;\ t \in \mathbb{R}^+\}$

• µ(x, t) = E[S(x, t)] (mean function, or trend)

• γ(x, x′, t, t′) = Cov{S(x, t), S(x′, t′)} (covariance function)

Stationarity:

• mean stationary: µ(x, t) = µ

• covariance stationary: γ(x, x′, t, t′) = γ(u, v), where u = ||x − x′|| and v = |t − t′|

Gaussian process:

• $\{S(x_i, t_i) : i = 1, \ldots, m\}$ ∼ multivariate Gaussian


Modelling covariance structure

• positive-definiteness requirement imposes non-obvious constraints

• spectral representation easier, but maybe less intuitive

• some general considerations:

– separability: $\gamma(u, v) = \sigma^2 \rho(u, v) = \sigma^2 \rho_x(u)\,\rho_t(v)$

– standard families for $\rho_x(\cdot)$, $\rho_t(\cdot)$

– constructive definitions using convolutions

– low-rank approximations to ease computation


Separability

$\rho(u, v) = \rho_x(u)\,\rho_t(v)$

Conditioning on the past:

• $S_t = \{S(x, t) : x \in \mathbb{R}^2\}$

• model $S_t$ conditional on $\{S_v : v < t\}$

• add Markov assumption,

$[S_t \mid \{S_v : v < t\}] = [S_t \mid S_{t-1}]$

Separability implies

$[S(x, t) \mid \{S(u, t-1) : u \in \mathbb{R}^2\}] = [S(x, t) \mid S(x, t-1)]$
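
For a stationary Gaussian process restricted to a finite set of locations x_1, ..., x_n and unit time steps, the implication can be checked directly. The following is a sketch under those assumptions; the symbols R_x (the n × n spatial correlation matrix, taken to be invertible) and the vector S_t of values at the n locations are notation introduced here, not on the slide.

```latex
\begin{aligned}
\operatorname{Var}(S_{t-1}) &= \sigma^2 R_x, \qquad
\operatorname{Cov}(S_t, S_{t-1}) = \sigma^2 \rho_t(1)\, R_x, \\
\mathrm{E}[S_t \mid S_{t-1}]
  &= \mu + \rho_t(1)\, R_x R_x^{-1} (S_{t-1} - \mu)
   = \mu + \rho_t(1)\,(S_{t-1} - \mu), \\
\operatorname{Var}(S_t \mid S_{t-1})
  &= \sigma^2 R_x - \sigma^2 \rho_t(1)^2 R_x R_x^{-1} R_x
   = \sigma^2 \{1 - \rho_t(1)^2\}\, R_x .
\end{aligned}
```

The i-th component of the conditional mean involves only S(x_i, t − 1), and the i-th conditional variance is the constant σ²{1 − ρ_t(1)²}, so the conditional distribution of S(x_i, t) given the whole previous field coincides with its distribution given S(x_i, t − 1) alone.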


Standard families

Matérn family widely used in purely spatial case

$\rho(u) = \{2^{\kappa-1}\Gamma(\kappa)\}^{-1} (u/\phi)^{\kappa} K_{\kappa}(u/\phi)$

• shape parameter κ > 0, scale parameter φ > 0

• $K_{\kappa}(\cdot)$: modified Bessel function of the second kind, of order κ

– integer part of κ determines mean-square differentiability of S(·)

– but estimation of κ is difficult

• no obvious non-separable spatio-temporal extension
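
The Matérn correlation can be evaluated numerically as in the minimal sketch below. To stay dependency-free it computes the Bessel function from its integral representation with a trapezoidal rule; in practice a library routine such as `scipy.special.kv` would be the usual choice. The function names `bessel_k` and `matern` and the step size are illustrative assumptions.

```python
import math


def bessel_k(nu, x, step=0.01):
    """Modified Bessel function of the second kind, K_nu(x), for x > 0.

    Uses K_nu(x) = integral_0^inf exp(-x cosh t) cosh(nu t) dt,
    approximated by the trapezoidal rule with spacing `step`.
    For very large x the value is effectively zero.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    total = 0.5 * math.exp(-x)  # integrand at t = 0, with half weight
    t = step
    while True:
        # Upper bound on the log of the integrand; stop once it is negligible.
        if -x * math.cosh(t) + abs(nu) * t < -60.0:
            break
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
        t += step
    return total * step


def matern(u, phi=1.0, kappa=0.5):
    """Matern correlation at distance u, scale phi > 0, shape kappa > 0."""
    if u == 0:
        return 1.0
    z = abs(u) / phi
    norm = 2.0 ** (kappa - 1.0) * math.gamma(kappa)
    return (z ** kappa) * bessel_k(kappa, z) / norm


# kappa = 0.5 reduces to the exponential correlation exp(-u / phi).
print(matern(1.0, phi=1.0, kappa=0.5), math.exp(-1.0))
```

Two closed-form special cases give a quick check: κ = 0.5 gives exp(−u/φ), and κ = 1.5 gives (1 + u/φ) exp(−u/φ).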


Spatial convolution-based models

$S(x) = \int k(x - u)\,B(u)\,du$

• $\{B(u) : u \in \mathbb{R}^2\}$ spatially continuous white noise

• kernel function k(v), such that

$\int k(v)\,dv = 1$

$\int k(v)^2\,v\,dv < \infty$

Covariance structure of S(·) given by

$\mathrm{Cov}\{S(x), S(x - u)\} \propto \int k(u - v)\,k(v)\,dv$


Low-rank approximations

$S(x) = \sum_{i=1}^{m} k(x - s_i)\,B_i, \quad x \in A$

• $B = (B_1, \ldots, B_m)$ (coefficients)

• $s_i \in A$, $i = 1, \ldots, m$: pre-specified locations (knots)

• $k(\cdot)$: real-valued function (kernel)

Practical simplification (stationary case):

• white noise coefficients

• knots to form a regular lattice
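
A minimal one-dimensional sketch of the stationary simplification follows, assuming a Gaussian kernel, independent N(0, 1) coefficients and a regular lattice of knots; the slide's setting is a planar region A, and all numerical values here are illustrative. With white-noise coefficients the implied covariance is the sum over knots of k(x − s_i) k(x′ − s_i), which on a fine lattice should approach the continuous-convolution correlation exp{−d²/(4τ²)} at lag d.

```python
import math


def gaussian_kernel(v, tau):
    """Gaussian density kernel with standard deviation tau (integrates to 1)."""
    return math.exp(-0.5 * (v / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))


def low_rank_cov(x1, x2, knots, tau):
    """Cov{S(x1), S(x2)} for S(x) = sum_i k(x - s_i) B_i, B_i iid N(0, 1)."""
    return sum(gaussian_kernel(x1 - s, tau) * gaussian_kernel(x2 - s, tau)
               for s in knots)


def low_rank_corr(x1, x2, knots, tau):
    """Correlation implied by the low-rank model."""
    c12 = low_rank_cov(x1, x2, knots, tau)
    v1 = low_rank_cov(x1, x1, knots, tau)
    v2 = low_rank_cov(x2, x2, knots, tau)
    return c12 / math.sqrt(v1 * v2)


# Regular lattice of 81 knots on [-10, 10] with spacing 0.25 (illustrative).
knots = [-10.0 + 0.25 * i for i in range(81)]
tau = 1.0
lag = 1.5

approx = low_rank_corr(0.0, lag, knots, tau)
exact = math.exp(-lag ** 2 / (4.0 * tau ** 2))  # continuous-convolution limit
print(approx, exact)
```

The agreement degrades when the knot spacing is large relative to τ or when x lies near the edge of the lattice, which is one reason the choice of knots matters in practice.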


Extension to spatio-temporal setting

$\gamma(u, v) = \mathrm{Cov}\{S(x, t), S(x', t')\} \propto \iint k(u - r, v - s)\,k(r, s)\,dr\,ds$

• for separable correlation, specify k(u, v) = k1(u)k2(v)

• positive and negative non-separability:

– positively non-separable if ρ(u, v) > ρ(u, 0)ρ(0, v)

– negatively non-separable if ρ(u, v) < ρ(u, 0)ρ(0, v)


A non-separable kernel

$k(u, v) = \rho(v)^{c} \exp\{-\rho(v)^{\beta} (u/\tau)^2\}$

Requires β ≤ 1, c > 0, ρ(·) any valid correlation function

• negatively non-separable when β < 0

• separable when β = 0

• positively non-separable when β > 0
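
The role of β can be illustrated at the level of the kernel itself, as in the sketch below. It assumes an exponential correlation function for ρ(·) and unit values of c and τ, all chosen purely for illustration. The slide defines positive and negative non-separability through the induced correlation ρ(u, v); the ratio computed here applies the analogous comparison to k(u, v), so it is suggestive rather than a check of the covariance structure.

```python
import math


def rho_exp(v):
    """Exponential correlation function; an illustrative choice for rho(.)."""
    return math.exp(-abs(v))


def kernel(u, v, beta, c=1.0, tau=1.0, rho=rho_exp):
    """k(u, v) = rho(v)^c * exp{-rho(v)^beta * (u / tau)^2}."""
    r = rho(v)
    return r ** c * math.exp(-(r ** beta) * (u / tau) ** 2)


def separability_ratio(u, v, beta, **kwargs):
    """k(u, v) k(0, 0) / {k(u, 0) k(0, v)}; equals 1 if the kernel factorises."""
    num = kernel(u, v, beta, **kwargs) * kernel(0.0, 0.0, beta, **kwargs)
    den = kernel(u, 0.0, beta, **kwargs) * kernel(0.0, v, beta, **kwargs)
    return num / den


for beta in (-1.0, 0.0, 1.0):
    print(beta, separability_ratio(1.0, 1.0, beta))
```

Algebraically the ratio is exp{(1 − ρ(v)^β)(u/τ)²}, so for 0 < ρ(v) < 1 it exceeds 1 when β > 0, equals 1 when β = 0 and falls below 1 when β < 0, matching the direction of the three cases listed above.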

Use as

• extant model for measured (geostatistical) data

• latent model for point process data


The space-time kernel: c = 1

[Figure: four panels showing the kernel on space and time axes, for β = 0, β = 1, β = −1 and β = −3]


The induced covariance structure: c = 1

[Figure: four panels showing the induced covariance on space and time axes, for β = 0, β = 1, β = −1 and β = −3]


AEGISS

Problems with current surveillance system in UK include

• under-reporting by general practitioners (GPs)

• inconsistencies in reporting rates between GPs

• delays between onset and confirmation of cases

The AEGISS project: can spatio-temporal statistical modelling improve the early detection of anomalies in the observed pattern of incident cases?


Possible data-sources

GP-reported data

• GP locations form a discrete spatial network

• known number of patients registered with each GP

• possibility of individual follow-up

• problems with inconsistency of GP reporting

NHS Direct data

• a relatively new, 24-hour phone-in advisory service

• date and location (post-code) recorded for each call

• but spatial and temporal pattern of usage is unknown

• data on approximately 10,000 incident cases, 2001 to 2003


Statistical formulation

Notation

$\lambda_0(x, t)$ = expected intensity of incident cases

$\lambda(x, t)$ = actual intensity of incident cases

$R(x, t)$ = spatio-temporal variation from normal pattern

$\lambda(x, t) = \lambda_0(x, t)\,R(x, t) = \lambda_0(x)\,\mu_0(t) \exp\{S(x, t)\}$

S(x, t) modelled as stationary Gaussian process

Scientific objective

• use incident data up to time t to construct predictive distribution for current R(x, t)

• identify and map anomalies, for further investigation.

Diggle et al. (2003); Diggle, Rowlingson and Su (2005)


Spatial prediction

• plug-in for estimated model parameters

• MCMC to generate samples from conditional distribution of S(x, t) given data up to time t

• choose critical threshold value c > 1

• map empirical exceedance probabilities,

$p_t(x) = P(\exp\{S(x, t)\} > c \mid \text{data})$

• web-reporting with daily updates, demo at:

http://www.lancaster.ac.uk/staff/diggle/
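
The empirical exceedance probability at each location is the proportion of MCMC draws in which exp{S(x, t)} exceeds c. The sketch below assumes the draws are already available as a list of lists, one inner list per draw and one value per prediction location; the function name and the toy draws are made up for illustration and are not output from the AEGISS system.

```python
import math


def exceedance_probabilities(samples, c):
    """Empirical P(exp{S(x, t)} > c | data) at each prediction location.

    samples: list of MCMC draws; each draw is a list of S(x, t) values,
             one per location.
    c:       threshold on the relative-risk scale (c > 1).
    """
    if c <= 1:
        raise ValueError("threshold c should exceed 1")
    log_c = math.log(c)  # compare on the log scale: S > log c
    n_draws = len(samples)
    n_locs = len(samples[0])
    return [
        sum(1 for draw in samples if draw[j] > log_c) / n_draws
        for j in range(n_locs)
    ]


# Four made-up draws at two locations, purely for illustration.
draws = [[0.1, 0.9], [0.8, 1.0], [-0.2, 0.75], [0.3, 0.5]]
p = exceedance_probabilities(draws, c=2.0)
print(p)
```

Mapping these proportions location by location gives the kind of exceedance surface described above; locations where the value is close to 1 are candidates for further investigation.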


Spatial prediction : results for 6 March 2003

c = 2


Loa loa

WHO policy requirement: identify areas where prevalence exceeds 20%

Model:

[Figure: schematic of prevalence (0 to 1) against location (0 to 1), with annotations: environmental gradient, local fluctuation, sampling error]


Loa loa result

Predictive map: P(prevalence > 0.2|data)


RAPLOA

• A cheaper alternative to parasitological sampling:

– have you ever experienced eye-worm?

– did it look like this photograph?

– did it go away within a week?

• RAPLOA data collected:

– in sample of villages previously surveyed (to calibrate parasitology vs RAPLOA estimates)

– in villages not previously surveyed (to reduce local uncertainty)

• Calibration model needed to reconcile parasitological and RAPLOA prevalence estimates


RAPLOA calibration

[Figure: two scatterplots: (1) parasitology prevalence against RAPLOA prevalence, both 0 to 100; (2) parasitology logit against RAPLOA logit, both −6 to 2]

Empirical logit transformation linearises relationship

Colour-coding corresponds to four surveys in different regions
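
A minimal sketch of the transformation and its inverse follows, assuming the common definition of the empirical logit in which 0.5 is added to both counts; the slide does not state the exact form used, and the village counts below are hypothetical.

```python
import math


def empirical_logit(positives, n):
    """log{(y + 0.5) / (n - y + 0.5)}; finite even when y = 0 or y = n."""
    return math.log((positives + 0.5) / (n - positives + 0.5))


def inverse_logit(z):
    """Back-transform from the logit scale to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical village: 12 positives out of 60 people examined.
z = empirical_logit(12, 60)
print(z, inverse_logit(z))
```

The added 0.5 keeps the transform defined for villages with no positives or all positives, which a plain logit of the observed proportion would not handle.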


RAPLOA calibration (ctd)

Fit linear functional relationship on logit scale and back-transform

[Figure: parasitology prevalence against RAPLOA prevalence, both on a 0 to 100 scale]


ARLAT

A RapLoa Analysis Tool

• plug-in for QGIS (open-source GIS) by Barry Rowlingson

• updates predictive exceedance maps as and when new RAPLOA data acquired

• portable to laptop PCs for use in the field

• periodic off-line updating of base-map

Demo at http://www.lancaster.ac.uk/staff/diggle/


Predicting incipient renal failure

• kidney failure can be asymptomatic for many years

• kidney function measured by estimated glomerular filtration rate (eGFR)

• eGFR declines naturally with increasing age

• but unusually rapid rate of decline suggests need for specialist treatment

Goal: early detection of primary care patients whose annual rate of decline in eGFR is greater than 5%


Data: cross-sectional


Data: cross-sectional vs longitudinal


Dynamic Regression Model

• $Y_{ij}$ = log(eGFR) for patient i at time $t_{ij}$

• $E[Y_{ij}]$ = linear function of covariates, initial age and time

$Y_{ij} = \alpha_0(x_i) + \alpha_1 t_{ij} + U_i + C_i(t_{ij}) + Z_{ij}$

– $x_i$: base-line covariates (age, co-morbidities, ...)

– $U_i$: subject-specific random effect (intercept)

– $C_i(t_{ij})$: subject-specific random effect (time-varying)

– $Z_{ij}$: measurement error and/or short-term fluctuations

Model for $C_i(t)$ is integrated Brownian motion

$C_i(t) = \int_0^{t} B_i(u)\,du, \qquad B_i(u) \mid B_i(s) \sim N(B_i(s), (u - s)\sigma^2), \quad u > s$


Prediction of subject-specific rate of change

Goal: predict $[B_i(t_{ij}) \mid Y_{i1}, \ldots, Y_{i,j-1}, Y_{ij}]$

• $B_i(t)$ measures rate of change of log(eGFR) relative to expectation

• Kalman filter algorithm allows efficient updating of predictive distribution whenever new eGFR measurement becomes available

Predictive inference: $P(B_i(t_{ij}) < -0.05 \mid Y_{i1}, \ldots, Y_{i,j-1}, Y_{ij})$
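
The updating step can be sketched with a two-dimensional Kalman filter on the state (C(t), B(t)). This is a simplified illustration, not the fitted model: it assumes the fixed effects and the random intercept have already been subtracted, so that each observation is C(t) plus Gaussian noise with variance `tau2`, and it places a N(0, `b0_var`) prior on B(0). The function names, parameter values and the patient series are all hypothetical.

```python
import math


def kalman_rate_filter(times, ys, sigma2, tau2, b0_var=1.0):
    """Filter the state (C, B) of an integrated Brownian motion.

    Assumed observation model: y_j = C(t_j) + Z_j, Z_j ~ N(0, tau2).
    Returns (mean of B, variance of B) after each observation.
    """
    m_c, m_b = 0.0, 0.0                      # C(0) = 0; prior mean of B(0)
    p_cc, p_cb, p_bb = 0.0, 0.0, b0_var      # state covariance entries
    t_prev = 0.0
    out = []
    for t, y in zip(times, ys):
        d = t - t_prev
        # Prediction: C <- C + d * B, B <- B, plus process noise over d.
        m_c = m_c + d * m_b
        p_cc, p_cb, p_bb = (
            p_cc + 2.0 * d * p_cb + d * d * p_bb + sigma2 * d ** 3 / 3.0,
            p_cb + d * p_bb + sigma2 * d ** 2 / 2.0,
            p_bb + sigma2 * d,
        )
        # Update with the scalar observation y of C.
        s = p_cc + tau2
        k_c, k_b = p_cc / s, p_cb / s
        resid = y - m_c
        m_c, m_b = m_c + k_c * resid, m_b + k_b * resid
        p_cc, p_cb, p_bb = (
            p_cc - k_c * p_cc,
            p_cb - k_c * p_cb,
            p_bb - k_b * p_cb,
        )
        out.append((m_b, p_bb))
        t_prev = t
    return out


def prob_rapid_decline(mean_b, var_b, threshold=-0.05):
    """P(B < threshold) under a Gaussian predictive distribution."""
    z = (threshold - mean_b) / math.sqrt(var_b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical patient whose adjusted log(eGFR) falls by 0.1 per year.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-0.1 * t for t in times]
mean_b, var_b = kalman_rate_filter(times, ys, sigma2=1e-4, tau2=1e-4)[-1]
p_decline = prob_rapid_decline(mean_b, var_b)
print(mean_b, var_b, p_decline)
```

Because each update uses only the previous mean and covariance, the predictive probability can be refreshed as soon as a new measurement arrives, without reprocessing the patient's full history.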


Results for diabetic patient A in primary care

[Figure: four panels plotted against age (78 to 84): log(eGFR), ⟨A⟩, ⟨B⟩ and probability]


Results for diabetic patient B in primary care

[Figure: four panels plotted against age (71 to 76): log(eGFR), ⟨A⟩, ⟨B⟩ and probability]


Outlook

• real-time health outcome data are becoming more widely available

• but are often under-utilised

• real-time data present interesting challenges for statisticians

• statistical modelling is a two-edged sword

We buy information with assumptions

Coombs, 1964

• ⇒ need for close collaboration between statisticians and subject-matter experts


References

Crainiceanu, C., Diggle, P.J. and Rowlingson, B.S. (2008). Bivariate modelling and prediction of spatial variation in Loa loa prevalence in tropical Africa (with Discussion). Journal of the American Statistical Association, 103, 21–47.

Diggle, P.J. (2007). Spatio-temporal point processes: methods and applications. In Semstat2004: Statistics of Spatio-Temporal Systems, ed. B. Finkenstadt, L. Held and V. Isham, 1–45. London: CRC Press.

Diggle, P.J., Knorr-Held, L., Rowlingson, B., Su, T., Hawtin, P. and Bryant, T. (2003). Towards On-line Spatial Surveillance. In Monitoring the Health of Populations: Statistical Methods for Public Health Surveillance, ed. R. Brookmeyer and D. Stroup. Oxford: Oxford University Press.

Diggle, P.J. and Ribeiro, P.J. (2007). Model-based Geostatistics. New York: Springer.

Diggle, P., Rowlingson, B. and Su, T. (2005). Point process methodology for on-line spatio-temporal disease surveillance. Environmetrics, 16, 423–34.

Diggle, P.J., Thomson, M.C., Christensen, O.F., Rowlingson, B., Obsomer, V., Gardon, J., Wanji, S., Takougang, I., Enyong, P., Kamgno, J., Remme, H., Boussinesq, M. and Molyneux, D.H. (2007). Spatial modelling and prediction of Loa loa risk: decision making under uncertainty. Annals of Tropical Medicine and Parasitology, 101, 499–509.

Rodrigues, A. and Diggle, P.J. (2009). A class of convolution-based models for spatio-temporal processes with non-separable covariance structure. Scandinavian Journal of Statistics (submitted).

Sousa, I. and Diggle, P.J. (2009). Real-time detection of incipient renal failure in primary care patients using a dynamic time series model. Biostatistics (submitted).