Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas...

Vecchia-Laplace Approximations for GeneralizedGaussian Processes

Matthias Katzfuss

Department of StatisticsTexas A&M University

Joint work with Daniel Zilber

September 28, 2018

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 1 / 27

Outline

1 Gaussian processes

2 The general Vecchia frameworkOverview and basic ideaGaussian noise

3 Vecchia-Laplace approximations for generalized GPs

4 Conclusions

Gaussian processes

Outline

4 Conclusions

Gaussian processes

Function estimation

Consider a function , observed incompletely , and with noise/error

Gaussian processes

Function estimation

●●

●● ●

●●

●●●

●●

●●●

●●

Gaussian processes

Function estimation

●●

Gaussian processes

Gaussian processes (GPs): Probabilistic function estimators

●●

GPs provide an optimal function estimate under the assumption of aninfinite-dimensional normal distributionand quantify uncertainty in the form of a joint probability distribution

Gaussian processes

●●

Gaussian processes

●●

●● ●

●●

●●●

●●

●●●

●●

Gaussian processes

Challenge: GPs are not scalable

For n data points, need to work with n × n covariance matrix→ Direct inference has O(n3) time and O(n2) memory complexity

0 20 40 60 80 100

n (thousands)

scubic

Want methods/approximations that scale linearly in nMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 6 / 27

Gaussian processes

Challenge: GPs are not scalable

For n data points, need to work with n × n covariance matrix→ Direct inference has O(n3) time and O(n2) memory complexity

0 20 40 60 80 100

n (thousands)

scubiclinear

Want methods/approximations that scale linearly in nMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 6 / 27

Gaussian processes

Existing approaches for computational feasibility

Let Σ be the n × n data covariance matrix.

Existing approaches include:

• Sparse Σ (e.g., Furrer et al., 2006; Kaufman et al., 2008)

• Sparse Σ−1 (e.g., Rue and Held, 2005; Lindgren et al., 2011; Nychkaet al., 2015)

• Low-rank Σ (e.g., Higdon, 1998; Wikle and Cressie, 1999;Quinonero-Candela and Rasmussen, 2005; Banerjee et al., 2008;Cressie and Johannesson, 2008)

• Sparse Cholesky factor of Σ−1: Vecchia

The general Vecchia framework

Outline

4 Conclusions

The general Vecchia framework Overview and basic idea

Outline

4 Conclusions

Vecchia approximation

Assume x = (x1, . . . , xN) is multivariate Gaussian. Density function can befactorized as

f (x) =N∏i=1

f (xi |x1:i−1),

where x1:i−1 := (x1, . . . , xi−1).

This factorization motivates the Vecchia (1988) approximation:

f (x) =N∏i=1

f (xi |xg(i)),

where g(i) ⊂ {1, . . . , i − 1} is the conditioning set of size |g(i)| ≤ m.

If screening effect holds, can get good approximation for m� N, whichcan lead to enormous computational savings.

f (x) =N∏i=1

f (xi |x1:i−1),

where x1:i−1 := (x1, . . . , xi−1).

f (x) =N∏i=1

f (xi |xg(i)),

f (x) =N∏i=1

f (xi |x1:i−1),

where x1:i−1 := (x1, . . . , xi−1).

f (x) =N∏i=1

f (xi |xg(i)),

General Vecchia framework (Katzfuss and Guinness, 2017)

If screening is weak for data/responses, might be able to includeunobserved (latent) variables in x for stronger screening effect.

Three choices for general Vecchia:

1. Which variables to include in x?

2. How to order the variables in x?

3. How to choose the conditioning sets g(i)?

Choices can result in tremendous differences in terms of approximationaccuracy and computational speed.

Special cases

• Standard/response Vecchia: Noisy responses in x (e.g., Vecchia,1988; Stein et al., 2004)

• Nearest-neighbor GP: Latent GP in x (Datta et al., 2016)

• Multi-resolution approximation (Katzfuss, 2017):Ordering/conditioning based on iterative domain partitioning

• Special cases: Modified predictive process (Finley et al., 2009),FSA-block (Snelson and Ghahramani, 2007; Sang et al., 2011)

Ordering and conditioning

Coordinate

● ● ● ● ● ● ● ● ● ●

●●

●●●

●●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ●

Max-min distance

● ●

● ● ● ●

● ● ● ● ● ● ● ●

● ●

● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ●

• Max-min distance ordering can be much more accurate thancoordinate ordering (Guinness, 2018)

• Conditioning usually on m nearest previously ordered locations, butmore complicated schemes possible

The general Vecchia framework Gaussian noise

Outline

4 Conclusions

Response Vecchia (Vecchia, 1988)

Vecchia approximation is applied to data/responses directly

Exact GP vs. response Vecchia with m = 4

●●

●● ●

●●

●●●

●●

●●●

●●

Works well for data without noiseWorks very poorly if data are noisyMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 15 / 27

●●

●● ●

●●

●●●

●●

●●●

●●

General Vecchia (Katzfuss and Guinness, 2017)

x consists of noisy data and (unknown/latent) GP realizations

●●

Works well even if data are noisy (m = 4)

Specific methods

• Sparse general Vecchia (SGV) approximations to the likelihood: xcontains noisy data and latent GP realizations (Katzfuss andGuinness, 2017)

• SGV predictions: x contains noisy data, latent GP realizations, andGP at prediction locations (Katzfuss et al., 2018)

• Multi-level Vecchia: Automatically tailors different approximations todifferent scales (Zhang & Katzfuss, in prep.)

Crucial details:

• Include variables and order them such that strong screening effectholds

• Integrate out latent/nuisance variables• Study sparsity and guarantee linear scaling using directed acyclic

graph (DAG) resultsMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 17 / 27

Specific methods

• Sparse general Vecchia (SGV) approximations to the likelihood: xcontains noisy data and latent GP realizations (Katzfuss andGuinness, 2017)

• SGV predictions: x contains noisy data, latent GP realizations, andGP at prediction locations (Katzfuss et al., 2018)

• Multi-level Vecchia: Automatically tailors different approximations todifferent scales (Zhang & Katzfuss, in prep.)

Crucial details:

• Include variables and order them such that strong screening effectholds

• Integrate out latent/nuisance variables• Study sparsity and guarantee linear scaling using directed acyclic

graph (DAG) resultsMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 17 / 27

Vecchia-Laplace approximations for generalized GPs

Outline

4 Conclusions

Non-Gaussian spatial data: Generalized GP (GGP)

Conditional on GP, data are independent from exponential family:binary, categorical, counts, . . .

Example: Binary classification using logistic GGP:• Take GP function• Transform into probability using logistic link, then draw from

Bernoulli distribution

●●

● ●

●●

●●● ●●●●●

●●

● ●

●●●●●

● ●

●●●●●●●

● ● ●●●

●●

Log-Gaussian Cox process (LGCP) as a GGP

• Poisson process with random intensity function λ(·) = ey(·), wherey(·) ∼ GP(µ,C )

• Approximation as GGP with Poisson likelihood:• Partition domain into grid cells A1, . . . ,An with center points a1, . . . , an• Data zi = z(Ai ): number of observed points in Ai

• Then, z1, . . . , zn|y(·) ind.∼ Poisson(µ(Ai )), where µ(Ai ) ≈ |Ai | ey(ai )

Latent

●●

Observed Down−Sampled

Laplace for non-Gaussian data

For generalized GP, posterior is intractable → 2nd-order Taylor expansionat the mode (Laplace approximation).

Laplace algorithm: Iterative GP prediction using Gaussian pseudo-data

●●

● ●

●●

●●● ●●●●●

●●

● ●

●●●●●

● ●

●●●●●●●

● ● ●●●

●●

But still O(n3) → infeasible for large n

●●

● ●

●●

●●● ●●●●●

●●

● ●

●●●●●

● ●

●●●●●●●

● ● ●●●

●●

● ●

●●

●●●

●●

●●●●●

● ●

●●●●●●

●●

●●●

●●

● ●

●●

●●● ●●●●●

●●

● ●

●●●●●

● ●

●●●●●●●

● ● ●●●

●●

● ●

●●

●●●

●●

●●●●●

● ●

●●●●●●

●●

●●●

●●

Vecchia-Laplace (Katzfuss & Zilber, in prep.)

Based on pseudo-data t, apply Vecchia to x = (t1, . . . , tn, y1, . . . , yn)

Comparison of MSE relative to Laplace:

Vecchia-Laplace (Katzfuss & Zilber, in prep.)

Based on pseudo-data t, apply Vecchia to x = (t1, . . . , tn, y1, . . . , yn)

Comparison of MSE relative to Laplace:

0 5 10 15 20 25 30

ν = 0.5 (Logistic)

Conditioning Set Size

● ●

0 5 10 15 20 25 30

ν = 0.5 (Poisson)

●● ●

0 5 10 15 20 25 30

ν = 0.5 (Gamma)

●● ●

● VL Laplace Low Rank

Comparison for simulated LGCP

Matern covariance with range 2.5 and smoothness ν on domainD = [0, 50]2, discretized to n = 2,500 = 50× 50 grid

ν = 0.5 ν = 1.5

0 20 40 60 0 20 40 60

Algorithm Laplace LowRank VL

Comparison for simulated LGCP

Matern covariance with range 2.5 and smoothness ν on domainD = [0, 50]2, discretized to n = 2,500 = 50× 50 grid

KL divergence

ν = 0.5 ν = 1.5

0 20 40 60 0 20 40 60

Algorithm Laplace LowRank VLMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 24 / 27

Vecchia-Laplace algorithm

• Complexity is linear in n

• Typically converges in < 10 iterations

• Unknown hyperparameters: Approximate the integrated likelihood

• Predictions at unobserved locations

Conclusions

Outline

4 Conclusions

Conclusions

• General Vecchia framework for GP approximations:• Accurate• Choice of x, ordering, and conditioning all important• Can guarantee linear scalability• Extension to GGPs and LGCPs

• R package: https://github.com/katzfuss-group/GPvecchia

• Papers:• Katzfuss, M. and Guinness, J. (2017). A general framework for Vecchia

approximations of Gaussian processes. arXiv:1708.06302.• Katzfuss, M., Guinness, J., and Gong, W. (2018). Vecchia

approximations of Gaussian-process predictions. arXiv:1805.03309.• Zilber, D. and Katzfuss, M. (in prep.) A Vecchia-Laplace

approximation for generalized Gaussian processes.

• Partially supported by NSF Grants DMS–1654083 and DMS–1521676

Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008). Gaussianpredictive process models for large spatial data sets. Journal of theRoyal Statistical Society, Series B, 70(4):825–848.

Cressie, N. and Johannesson, G. (2008). Fixed rank kriging for very largespatial data sets. Journal of the Royal Statistical Society, Series B,70(1):209–226.

Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016).Hierarchical nearest-neighbor Gaussian process models for largegeostatistical datasets. Journal of the American Statistical Association,111(514):800–812.

Finley, A. O., Sang, H., Banerjee, S., and Gelfand, A. E. (2009).Improving the performance of predictive process modeling for largedatasets. Computational Statistics & Data Analysis, 53(8):2873–2884.

Furrer, R., Genton, M. G., and Nychka, D. (2006). Covariance tapering forinterpolation of large spatial datasets. Journal of Computational andGraphical Statistics, 15(3):502–523.

Guinness, J. (2018). Permutation and grouping methods for sharpeningGaussian process approximations. Technometrics.

Higdon, D. (1998). A process-convolution approach to modellingtemperatures in the North Atlantic Ocean. Environmental andEcological Statistics, 5(2):173–190.

Katzfuss, M. (2017). A multi-resolution approximation for massive spatialdatasets. Journal of the American Statistical Association,112(517):201–214.

Katzfuss, M. and Guinness, J. (2017). A general framework for Vecchiaapproximations of Gaussian processes. arXiv:1708.06302.

Katzfuss, M., Guinness, J., and Gong, W. (2018). Vecchia approximationsof Gaussian-process predictions. arXiv:1805.03309.

Kaufman, C. G., Schervish, M. J., and Nychka, D. W. (2008). Covariancetapering for likelihood-based estimation in large spatial data sets.Journal of the American Statistical Association, 103(484):1545–1555.

Lindgren, F., Rue, H., and Lindstrom, J. (2011). An explicit link betweengaussian fields and gaussian markov random fields: the stochastic partialdifferential equation approach. Journal of the Royal Statistical Society:Series B, 73(4):423–498.

Nychka, D. W., Bandyopadhyay, S., Hammerling, D., Lindgren, F., andSain, S. R. (2015). A multi-resolution Gaussian process model for theanalysis of large spatial data sets. Journal of Computational andGraphical Statistics, 24(2):579–599.

Quinonero-Candela, J. and Rasmussen, C. E. (2005). A unifying view ofsparse approximate Gaussian process regression. Journal of MachineLearning Research, 6:1939–1959.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theoryand Applications. CRC press.

Sang, H., Jun, M., and Huang, J. Z. (2011). Covariance approximation forlarge multivariate spatial datasets with an application to multipleclimate model errors. Annals of Applied Statistics, 5(4):2519–2548.

Snelson, E. and Ghahramani, Z. (2007). Local and global sparse Gaussianprocess approximations. In Artificial Intelligence and Statistics 11(AISTATS).

Stein, M. L., Chi, Z., and Welty, L. (2004). Approximating likelihoods forlarge spatial data sets. Journal of the Royal Statistical Society: SeriesB, 66(2):275–296.

Vecchia, A. (1988). Estimation and model identification for continuousspatial processes. Journal of the Royal Statistical Society, Series B,50(2):297–312.

Wikle, C. K. and Cressie, N. (1999). A dimension-reduced approach tospace-time Kalman filtering. Biometrika, 86(4):815–829.

Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas...

Documents

Transcript of Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas...

3AI1 MATHEMATICS-III UNIT 1 : LAPLACE TRANSFORM - Laplace

La Transformation de Laplace - jehresmann.free.frjehresmann.free.fr/fichiers//FICHIERS PDF/SE/COURS/05 LAPLACE... · La Transformation de Laplace LA TRANSFORMATION DE LAPLACE : un

Laplace and inverse laplace transforms

Transformasi laplace latihan - edwardivanfadli.files.wordpress.com · Transformasi laplace latihan Author: CamScanner Subject: Transformasi laplace latihan ...

Vecchia TV vs Nuova TV (Giandomenico Celata, keynote speech)

The Laplace Transform Objective To study the Laplace transform and inverse Laplace transform

Department of Physics Bharathidasan University ... · Properties of Laplace transform – Inverse Laplace transform – Laplace transform of ... Laplace transform of derivatives –

Massa Vecchia Massa Vecchia. - Louis Dressner Selections

Solving circuits directly with Laplace - Iowa State Universitytuttle.merc.iastate.edu/ee230/topics/Laplace/Laplace... · 2020-01-24 · EE 230 Laplace – 1 Solving circuits directly

Vecchia Icnirpapproach

LAPLACE TRANSFORMATION AND APPLICATIONS Laplace transformation

Fast Uncertainty Estimates and Bayesian Model Averaging of DNNsbalaji/udl-camera-ready/UDL-15.pdf · Laplace approximations for neural networks and on the asymptotic distribution

Chapter 7: Laplace Transform...85 Chapter 5: Laplace Transform Sec 5.1 Introduction Example: Find Laplace of f(t) c using definition. Example: Find Laplace of f(t) eat The Laplace

Definitions of the Laplace Transform, Laplace Transform ...faculty.washington.edu/tadg/BEE235S10/Lessons/Week... · Laplace Transform. • 4. The Laplace Transform is used for analog

5 stolpersteine-im-soical-media-prof-martina-dalla-vecchia

Bayesian Computing with INLA: A Review - arXiv · the approach of Integrated Nested Laplace Approximations (INLA) to do approximate Bayesian inference for latent Gaussian models (LGMs).

Laplace Transform - devel.isa-afp.org€¦ · Laplace Transform Fabian Immler October 3, 2020 Abstract This entry formalizes the Laplace transform and concrete Laplace transforms

Turismo e outdoor vecchia

Transformarea Laplace. Aplicatii ale transformarii Laplace.

Accurate approximations for the complex error function ... · When the argument zis large enough by absolute value, say jx+ iyj>15, the truncation of the Laplace continued fraction