Parameter Estimation in the Spatio-Temporal Mixed Effects Model –
Analysis of Massive Spatio-Temporal Data Sets

Matthias Katzfuß
Advisor: Dr. Noel Cressie

Department of Statistics
The Ohio State University

September 17, 2010

Matthias Katzfuß (OSU Statistics) STME Parameter Estimation September 17, 2010 1 / 23

Outline

1 Introduction: The STME Model

2 Parameter Estimation
   EM Estimation
   Bayesian Estimation

3 Application: Analysis of CO2 Data

4 Conclusions

Introduction: The STME Model

Notation

• Hidden spatio-temporal process $y_t(s)$ at time $t$ and location $s$

• Measurements: $z_t(s_{i,t}) = y_t(s_{i,t}) + \varepsilon_t(s_{i,t})$, for $i = 1, \ldots, n_t$ and $t = 1, \ldots, T$

• In vector notation: $z_{1:T} := [z_1', \ldots, z_T']'$, where $z_t := [z_t(s_{1,t}), \ldots, z_t(s_{n_t,t})]'$

• Goal: Predict $y_t(s_0)$; $t \in \{1, \ldots, T\}$

Motivating Example: Remote-Sensing Data

Example: Global satellite measurements of CO2

Challenges of global remote-sensing data:

• Massiveness
  • Need dimension reduction

• Sparseness
  • Need to take advantage of spatial and temporal correlations

• Nonstationarity
  • Need a flexible model

[Figure: CO2 measurements for Day 1, Day 2, and Day 3; color scale from 350 to 400]

Spatio-Temporal Mixed Effects Model (Cressie et al., 2010)

Process model: $y_t(s) = x(s)'\beta_t + b(s)'\eta_t + \gamma_t(s)$

• $x(s)'\beta_t$: large-scale trend

• $b(s) := [b_1(s), \ldots, b_r(s)]'$: vector of known spatial basis functions

• $\eta_t = H\eta_{t-1} + \delta_t$; $t = 1, 2, \ldots$
  • $\eta_0 \sim N_r(0, K_0)$
  • $\delta_t \sim N_r(0, U)$

• $\gamma_t(s) \sim N(0, \sigma^2_\gamma v_\gamma(s))$: fine-scale variation

Unknown parameters: $\theta := \{\{\beta_t\}, \sigma^2_\gamma, K_0, H, U\}$
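To make the process model concrete, the following sketch simulates one realization of $y_t(s)$ on a one-dimensional grid. Every numeric choice is an assumption made for illustration and does not come from the talk: the grid, five bisquare-shaped basis functions, $H = 0.8 I$, $K_0 = U = I$, $v_\gamma(s) \equiv 1$, and a trend $x(s) = [1, s]'$ with $\beta_t$ held constant over time.

```python
import numpy as np

def simulate_stme(T, s, centers, radius, H, K0, U, beta, sigma2_gamma, rng):
    """Simulate y_t(s) = x(s)'beta_t + b(s)'eta_t + gamma_t(s) for t = 1..T."""
    # Trend covariates x(s) = [1, s]' (hypothetical choice for this sketch).
    X = np.column_stack([np.ones_like(s), s])
    # Known basis functions b(s): bisquare bumps with assumed centers and radius.
    d = np.abs(s[:, None] - centers[None, :]) / radius
    B = np.where(d <= 1.0, (1.0 - d**2) ** 2, 0.0)
    r = B.shape[1]
    eta = rng.multivariate_normal(np.zeros(r), K0)        # eta_0 ~ N_r(0, K0)
    Y = np.empty((T, s.size))
    for t in range(T):
        delta = rng.multivariate_normal(np.zeros(r), U)   # delta_t ~ N_r(0, U)
        eta = H @ eta + delta                             # eta_t = H eta_{t-1} + delta_t
        # Fine-scale variation with v_gamma(s) = 1 (assumed).
        gamma = rng.normal(0.0, np.sqrt(sigma2_gamma), s.size)
        Y[t] = X @ beta[t] + B @ eta + gamma
    return Y

rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 50)
centers = np.linspace(0.0, 1.0, 5)
r = centers.size
T = 4
beta = np.tile([380.0, 2.0], (T, 1))   # one row of trend coefficients per time point
Y = simulate_stme(T, s, centers, 0.3, 0.8 * np.eye(r), np.eye(r), np.eye(r),
                  beta, 0.1, rng)
```

Each row of `Y` is the hidden process at one time point. The temporal dependence enters only through the $r$-dimensional vector $\eta_t$, which is the dimension reduction the model relies on.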

Previous Approaches to Massive S-T Data Sets

• Many ad hoc methods used outside the statistics literature (non-optimal, no measures of uncertainty)

• Other statistical spatio-temporal dimension-reduction models are less general (e.g., Nychka et al., 2002)

• STME model: Parameter estimation via binned method of moments (Kang et al., 2010):
  • Many arbitrary choices have to be made
  • Estimates have to be modified to be valid
  • Does not fully exploit temporal dependence in the data

Parameter Estimation: EM Estimation

Maximum-Likelihood Estimation

• Goal: Find $\hat\theta_{ML} = \arg\max_\theta f(z_{1:T} \mid \theta)$, where recall $z_t = X_t\beta_t + B_t\eta_t + \gamma_t + \varepsilon_t$

• Problem: Likelihood $f(z_{1:T} \mid \theta)$ is quite complicated

• Solution: Expectation-maximization algorithm (Dempster et al., 1977)
  • Maximization: “Complete-data likelihood” $f(\eta_{1:T}, \gamma_{1:T} \mid \theta)$ is easy to maximize
  • Expectation: $E_\theta( f(\eta_{1:T}, \gamma_{1:T} \mid \theta) \mid z_{1:T} )$ is obtained via FRS, a rapid sequential updating technique based on the Kalman filter (Kalman, 1960)
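The sequential updating idea can be illustrated with a single Kalman filter step for the $r$-dimensional state $\eta_t$. This is a minimal sketch of the textbook filter and not FRS itself. It assumes that $\gamma_t + \varepsilon_t$ has a diagonal covariance matrix with known variances `d` (a hypothetical argument name) and that $\beta_t$ is known. The update is written in information form so that only $r \times r$ matrices are inverted.

```python
import numpy as np

def kalman_filter_step(m_prev, P_prev, z, X, beta, B, d, H, U):
    """One forecast/update step for eta_t given z_t.

    Assumes z_t = X beta + B eta_t + xi_t with xi_t ~ N(0, diag(d)),
    where d collects the fine-scale plus measurement-error variances.
    """
    # Forecast: eta_t | z_{1:t-1} ~ N(m_pred, P_pred).
    m_pred = H @ m_prev
    P_pred = H @ P_prev @ H.T + U
    # Update in information form: only r x r matrices are inverted,
    # so the cost grows linearly with the number of observations n_t.
    Bd = B / d[:, None]                          # D^{-1} B, shape (n_t, r)
    precision = np.linalg.inv(P_pred) + B.T @ Bd
    P_filt = np.linalg.inv(precision)
    resid = z - X @ beta - B @ m_pred
    m_filt = m_pred + P_filt @ (Bd.T @ resid)
    return m_filt, P_filt
```

Because the diagonal structure is exploited, no $n_t \times n_t$ matrix is ever formed, which is what makes this kind of update feasible when $n_t$ is in the thousands.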

EM Estimation (Katzfuss & Cressie, 2010)

The EM algorithm:

• Choose initial value $\theta^{[0]}$

• For $l = 0, 1, 2, \ldots$ (until convergence):

1. E-Step: Run FRS with $\theta^{[l]}$ to obtain $E_{\theta^{[l]}}( f(\eta_{1:T}, \gamma_{1:T} \mid \theta) \mid z_{1:T} )$

2. M-Step: $\theta^{[l+1]} = \arg\max_\theta E_{\theta^{[l]}}( f(\eta_{1:T}, \gamma_{1:T} \mid \theta) \mid z_{1:T} )$

3. Go back to 1.

Properties of the resulting estimates:

• Parameter estimates guaranteed to be valid

• Here, convergence to a (possibly local) maximum of the likelihood function
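The slide does not spell out the M-step formulas. As an illustration, the sketch below gives the textbook closed-form M-step for the state equation $\eta_t = H\eta_{t-1} + \delta_t$ of a linear Gaussian state-space model, assuming unconstrained $H$ and $U$. The inputs `S00`, `S10`, and `S11` are hypothetical names for the smoothed second moments $\sum_t E(\eta_{t-1}\eta_{t-1}' \mid z_{1:T})$, $\sum_t E(\eta_t\eta_{t-1}' \mid z_{1:T})$, and $\sum_t E(\eta_t\eta_t' \mid z_{1:T})$, summed over $t = 1, \ldots, T$, which an E-step would supply. Whether this matches the updates in Katzfuss & Cressie (2010) term by term is an assumption.

```python
import numpy as np

def m_step_state(S00, S10, S11, T):
    """Closed-form updates of H and U from smoothed second moments."""
    # H_new = S10 S00^{-1}, computed with a linear solve instead of an inverse.
    H_new = np.linalg.solve(S00.T, S10.T).T
    # U_new = (S11 - H_new S10') / T, the average innovation covariance.
    U_new = (S11 - H_new @ S10.T) / T
    U_new = 0.5 * (U_new + U_new.T)   # symmetrize against round-off
    return H_new, U_new

# Consistency check with moments built from a known H and U (made-up values).
T = 10
H_true = np.array([[0.7, 0.1], [0.0, 0.5]])
U_true = np.array([[1.0, 0.2], [0.2, 0.5]])
S00 = T * np.array([[2.0, 0.3], [0.3, 1.0]])
S10 = H_true @ S00
S11 = H_true @ S00 @ H_true.T + T * U_true
H_new, U_new = m_step_state(S00, S10, S11, T)
```

In this check the updates recover `H_true` and `U_true` exactly, because the moments were constructed to be consistent with them.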

Parameter Estimation: Bayesian Estimation

Bayesian Inference

• Parameters θ have a prior distribution

• Obtain posterior distribution of unknowns $y_t(s_0)$ and $\theta$ given the data $z_{1:T}$ using Bayes’ Theorem

• In almost all cases, the posterior has to be approximated by sampling from it

• “Shrinkage”: Biased, but more efficient estimators
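Written out for the quantities above, Bayes' Theorem gives the joint posterior as follows. Using $p(\cdot)$ for the prior and posterior densities is an assumption of this sketch; $f$ is the likelihood as on the earlier slides.

```latex
p\bigl(y_t(s_0), \theta \mid z_{1:T}\bigr)
  = p\bigl(y_t(s_0) \mid \theta, z_{1:T}\bigr)\,
    \frac{f(z_{1:T} \mid \theta)\, p(\theta)}
         {\int f(z_{1:T} \mid \theta')\, p(\theta')\, d\theta'}
```

The integral in the denominator is rarely available in closed form, which is why the posterior is usually approximated by sampling.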

Priors and Posteriors

Prior distributions:

• “Standard” priors on $\{\beta_t\}$ and $\sigma^2_\gamma$

• Covariance matrices $K_0$ and $U$: Multiresolutional Givens-angle prior (Kang & Cressie, 2009)
  • Control extreme eigenvalues
  • Shrink off-diagonal elements toward zero

• Propagator matrix $H$: Shrink off-diagonal elements depending on how far corresponding basis functions are apart

Posterior distribution:

• Samples of posterior distribution obtained using MCMC
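The slide does not define the Givens-angle prior. The sketch below shows only the spectral parameterization that such a prior is presumably placed on, which is an assumption: a covariance matrix written as $P \Lambda P'$, with positive eigenvalues in $\Lambda$ and an orthogonal matrix $P$ built as a product of Givens rotations. The function names and the ordering of the rotations are hypothetical, and no prior density or multiresolutional structure is implemented.

```python
import numpy as np
from itertools import combinations

def givens_rotation(r, i, j, angle):
    """r x r rotation by `angle` in the (i, j) coordinate plane."""
    G = np.eye(r)
    c, s = np.cos(angle), np.sin(angle)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

def covariance_from_givens(eigenvalues, angles):
    """Build P diag(eigenvalues) P' with P a product of Givens rotations."""
    r = len(eigenvalues)
    pairs = list(combinations(range(r), 2))
    if len(angles) != len(pairs):
        raise ValueError("need r(r-1)/2 angles")
    P = np.eye(r)
    for (i, j), a in zip(pairs, angles):
        P = P @ givens_rotation(r, i, j, a)
    return P @ np.diag(eigenvalues) @ P.T

K = covariance_from_givens([3.0, 2.0, 1.0], [0.3, -0.2, 0.1])
K_diag = covariance_from_givens([3.0, 2.0, 1.0], [0.0, 0.0, 0.0])
```

Any set of angles combined with positive eigenvalues yields a valid covariance matrix. With all angles equal to zero the matrix is diagonal, so keeping the angles near zero keeps the off-diagonal elements near zero, and the eigenvalues can be controlled directly.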

Application: Analysis of CO2 Data

The Data

Mid-tropospheric CO2 on May 1–4, 2003, as measured by AIRS ($n_t \approx 14$K)

[Figure: maps of the data for Day 1, Day 2, Day 3, and Day 4; color scale from 350 to 400]

Statistical Analysis

• Trend: $x(s) = [1, \mathrm{lat}(s)]'$

• Make predictions on a hexagonal grid of size 57,065 for each day

• Basis functions: $r = 380$ bisquare functions at 3 spatial resolutions

[Figure: Bisquare function in one dimension; $b(s)$ plotted against $s$ on $[-1, 1]$ with values from 0 to 1, one curve for each of Res 1, Res 2, and Res 3]
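For reference, a one-dimensional bisquare function can be sketched as below, using the common form $b(s) = \{1 - (|s - c| / w)^2\}^2$ for $|s - c| \le w$ and $0$ otherwise. The center `center`, the radius `radius`, and the three radii used for the resolutions are illustrative assumptions, not the values behind the talk's 380 basis functions.

```python
import numpy as np

def bisquare(s, center, radius):
    """Bisquare bump: equals 1 at the center and 0 at or beyond the radius."""
    u = np.abs(np.asarray(s, dtype=float) - center) / radius
    return np.where(u <= 1.0, (1.0 - u**2) ** 2, 0.0)

s = np.linspace(-1.0, 1.0, 201)
# Three hypothetical resolutions: wide bumps are coarse, narrow bumps are fine.
basis = np.column_stack([bisquare(s, 0.0, w) for w in (1.0, 0.5, 0.25)])
```

Because each function is exactly zero outside its radius, a basis matrix built this way is sparse at the finer resolutions.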

EM Results

Predictions using EM

Standard errors using EM

EM computation time: 16 iterations × one minute each = 16 min total

Bayesian Results

Posterior means

Posterior standard deviations

1,500 MCMC iterations × 15 seconds each = 6.25 hours total

Estimates of the Propagator Matrix

[Figure: image plots of the estimated propagator matrices $H_{EM}$ (EM) and $H_B$ (Bayesian); row and column index ticks from 50 to 350, color scale from −1 to 1]

Conclusions

• STME Model
  • Scalable and flexible technique for analysis of massive, nonstationary spatio-temporal data sets
  • Provides uncertainty quantification
  • Here, successful use on CO2 satellite data

• Parameter estimation:
  • EM Estimation: Fast, easy
  • Bayesian estimation: Better prediction (≈ 10% for AIRS data), more accurate uncertainty assessment

References

• Cressie, N., Shi, T., & Kang, E. L. (2010). Fixed rank filtering for spatio-temporal data. Journal of Computational and Graphical Statistics. Forthcoming.

• Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

• Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35–45.

• Kang, E. L., & Cressie, N. (2009). Bayesian inference for the spatial random effects model. Department of Statistics Technical Report No. 830. The Ohio State University.

• Kang, E. L., Cressie, N., & Shi, T. (2010). Using temporal variability to improve spatial mapping with application to satellite data. Canadian Journal of Statistics. Forthcoming.

• Katzfuss, M., & Cressie, N. (2010). Spatio-Temporal Smoothing and EM Estimation for Massive Remote-Sensing Data Sets. Department of Statistics Technical Report No. 840. The Ohio State University.

• Nychka, D. W., Wikle, C., & Royle, J. (2002). Multiresolution models for nonstationary spatial covariance functions. Statistical Modelling, 2, 315–331.
