Post on 24-Feb-2021
Vecchia-Laplace Approximations for GeneralizedGaussian Processes
Matthias Katzfuss
Department of StatisticsTexas A&M University
Joint work with Daniel Zilber
September 28, 2018
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 1 / 27
Outline
Outline
1 Gaussian processes
2 The general Vecchia frameworkOverview and basic ideaGaussian noise
3 Vecchia-Laplace approximations for generalized GPs
4 Conclusions
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27
Gaussian processes
Outline
1 Gaussian processes
2 The general Vecchia frameworkOverview and basic ideaGaussian noise
3 Vecchia-Laplace approximations for generalized GPs
4 Conclusions
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 3 / 27
Gaussian processes
Function estimation
Consider a function , observed incompletely , and with noise/error
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 4 / 27
Gaussian processes
Function estimation
●
●●
●
●
●
●
●●
●
●
●
●● ●
●●
●
●
●
●
●
●
●●●
●●
●
●●
●
●●
●●
●
●
●
●●●
●
●
●
●
●
●
●●
Consider a function , observed incompletely , and with noise/error
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 4 / 27
Gaussian processes
Function estimation
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
Consider a function , observed incompletely , and with noise/error
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 4 / 27
Gaussian processes
Gaussian processes (GPs): Probabilistic function estimators
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
GPs provide an optimal function estimate under the assumption of aninfinite-dimensional normal distributionand quantify uncertainty in the form of a joint probability distribution
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 5 / 27
Gaussian processes
Gaussian processes (GPs): Probabilistic function estimators
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
GPs provide an optimal function estimate under the assumption of aninfinite-dimensional normal distributionand quantify uncertainty in the form of a joint probability distribution
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 5 / 27
Gaussian processes
Gaussian processes (GPs): Probabilistic function estimators
●
●●
●
●
●
●
●●
●
●
●
●● ●
●●
●
●
●
●
●
●
●●●
●●
●
●●
●
●●
●●
●
●
●
●●●
●
●
●
●
●
●
●●
GPs provide an optimal function estimate under the assumption of aninfinite-dimensional normal distributionand quantify uncertainty in the form of a joint probability distribution
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 5 / 27
Gaussian processes
Challenge: GPs are not scalable
For n data points, need to work with n × n covariance matrix→ Direct inference has O(n3) time and O(n2) memory complexity
0 20 40 60 80 100
010
2030
4050
n (thousands)
hour
scubic
Want methods/approximations that scale linearly in nMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 6 / 27
Gaussian processes
Challenge: GPs are not scalable
For n data points, need to work with n × n covariance matrix→ Direct inference has O(n3) time and O(n2) memory complexity
0 20 40 60 80 100
010
2030
4050
n (thousands)
hour
scubiclinear
Want methods/approximations that scale linearly in nMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 6 / 27
Gaussian processes
Existing approaches for computational feasibility
Let Σ be the n × n data covariance matrix.
Existing approaches include:
• Sparse Σ (e.g., Furrer et al., 2006; Kaufman et al., 2008)
• Sparse Σ−1 (e.g., Rue and Held, 2005; Lindgren et al., 2011; Nychkaet al., 2015)
• Low-rank Σ (e.g., Higdon, 1998; Wikle and Cressie, 1999;Quinonero-Candela and Rasmussen, 2005; Banerjee et al., 2008;Cressie and Johannesson, 2008)
• Sparse Cholesky factor of Σ−1: Vecchia
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 7 / 27
The general Vecchia framework
Outline
1 Gaussian processes
2 The general Vecchia frameworkOverview and basic ideaGaussian noise
3 Vecchia-Laplace approximations for generalized GPs
4 Conclusions
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 8 / 27
The general Vecchia framework Overview and basic idea
Outline
1 Gaussian processes
2 The general Vecchia frameworkOverview and basic ideaGaussian noise
3 Vecchia-Laplace approximations for generalized GPs
4 Conclusions
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 9 / 27
The general Vecchia framework Overview and basic idea
Vecchia approximation
Assume x = (x1, . . . , xN) is multivariate Gaussian. Density function can befactorized as
f (x) =N∏i=1
f (xi |x1:i−1),
where x1:i−1 := (x1, . . . , xi−1).
This factorization motivates the Vecchia (1988) approximation:
f (x) =N∏i=1
f (xi |xg(i)),
where g(i) ⊂ {1, . . . , i − 1} is the conditioning set of size |g(i)| ≤ m.
If screening effect holds, can get good approximation for m� N, whichcan lead to enormous computational savings.
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 10 / 27
The general Vecchia framework Overview and basic idea
Vecchia approximation
Assume x = (x1, . . . , xN) is multivariate Gaussian. Density function can befactorized as
f (x) =N∏i=1
f (xi |x1:i−1),
where x1:i−1 := (x1, . . . , xi−1).
This factorization motivates the Vecchia (1988) approximation:
f (x) =N∏i=1
f (xi |xg(i)),
where g(i) ⊂ {1, . . . , i − 1} is the conditioning set of size |g(i)| ≤ m.
If screening effect holds, can get good approximation for m� N, whichcan lead to enormous computational savings.
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 10 / 27
The general Vecchia framework Overview and basic idea
Vecchia approximation
Assume x = (x1, . . . , xN) is multivariate Gaussian. Density function can befactorized as
f (x) =N∏i=1
f (xi |x1:i−1),
where x1:i−1 := (x1, . . . , xi−1).
This factorization motivates the Vecchia (1988) approximation:
f (x) =N∏i=1
f (xi |xg(i)),
where g(i) ⊂ {1, . . . , i − 1} is the conditioning set of size |g(i)| ≤ m.
If screening effect holds, can get good approximation for m� N, whichcan lead to enormous computational savings.
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 10 / 27
The general Vecchia framework Overview and basic idea
General Vecchia framework (Katzfuss and Guinness, 2017)
If screening is weak for data/responses, might be able to includeunobserved (latent) variables in x for stronger screening effect.
Three choices for general Vecchia:
1. Which variables to include in x?
2. How to order the variables in x?
3. How to choose the conditioning sets g(i)?
Choices can result in tremendous differences in terms of approximationaccuracy and computational speed.
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 11 / 27
The general Vecchia framework Overview and basic idea
Special cases
• Standard/response Vecchia: Noisy responses in x (e.g., Vecchia,1988; Stein et al., 2004)
• Nearest-neighbor GP: Latent GP in x (Datta et al., 2016)
• Multi-resolution approximation (Katzfuss, 2017):Ordering/conditioning based on iterative domain partitioning
• Special cases: Modified predictive process (Finley et al., 2009),FSA-block (Snelson and Ghahramani, 2007; Sang et al., 2011)
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 12 / 27
The general Vecchia framework Overview and basic idea
Ordering and conditioning
Coordinate
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
●●
●●●
●●
●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ●
Max-min distance
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
● ●
●
●
● ●
●
●
● ●
●
●
● ●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
MRA
● ●
● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ●
● ●
● ●
● ●
● ●
● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ●
● ● ● ●
• Max-min distance ordering can be much more accurate thancoordinate ordering (Guinness, 2018)
• Conditioning usually on m nearest previously ordered locations, butmore complicated schemes possible
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 13 / 27
The general Vecchia framework Gaussian noise
Outline
1 Gaussian processes
2 The general Vecchia frameworkOverview and basic ideaGaussian noise
3 Vecchia-Laplace approximations for generalized GPs
4 Conclusions
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 14 / 27
The general Vecchia framework Gaussian noise
Response Vecchia (Vecchia, 1988)
Vecchia approximation is applied to data/responses directly
Exact GP vs. response Vecchia with m = 4
●
●●
●
●
●
●
●●
●
●
●
●● ●
●●
●
●
●
●
●
●
●●●
●●
●
●●
●
●●
●●
●
●
●
●●●
●
●
●
●
●
●
●●
Works well for data without noiseWorks very poorly if data are noisyMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 15 / 27
The general Vecchia framework Gaussian noise
Response Vecchia (Vecchia, 1988)
Vecchia approximation is applied to data/responses directly
Exact GP vs. response Vecchia with m = 4
●
●●
●
●
●
●
●●
●
●
●
●● ●
●●
●
●
●
●
●
●
●●●
●●
●
●●
●
●●
●●
●
●
●
●●●
●
●
●
●
●
●
●●
Works well for data without noiseWorks very poorly if data are noisyMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 15 / 27
The general Vecchia framework Gaussian noise
Response Vecchia (Vecchia, 1988)
Vecchia approximation is applied to data/responses directly
Exact GP vs. response Vecchia with m = 4
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
Works well for data without noiseWorks very poorly if data are noisyMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 15 / 27
The general Vecchia framework Gaussian noise
General Vecchia (Katzfuss and Guinness, 2017)
x consists of noisy data and (unknown/latent) GP realizations
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
Works well even if data are noisy (m = 4)
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 16 / 27
The general Vecchia framework Gaussian noise
Specific methods
• Sparse general Vecchia (SGV) approximations to the likelihood: xcontains noisy data and latent GP realizations (Katzfuss andGuinness, 2017)
• SGV predictions: x contains noisy data, latent GP realizations, andGP at prediction locations (Katzfuss et al., 2018)
• Multi-level Vecchia: Automatically tailors different approximations todifferent scales (Zhang & Katzfuss, in prep.)
Crucial details:
• Include variables and order them such that strong screening effectholds
• Integrate out latent/nuisance variables• Study sparsity and guarantee linear scaling using directed acyclic
graph (DAG) resultsMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 17 / 27
The general Vecchia framework Gaussian noise
Specific methods
• Sparse general Vecchia (SGV) approximations to the likelihood: xcontains noisy data and latent GP realizations (Katzfuss andGuinness, 2017)
• SGV predictions: x contains noisy data, latent GP realizations, andGP at prediction locations (Katzfuss et al., 2018)
• Multi-level Vecchia: Automatically tailors different approximations todifferent scales (Zhang & Katzfuss, in prep.)
Crucial details:
• Include variables and order them such that strong screening effectholds
• Integrate out latent/nuisance variables• Study sparsity and guarantee linear scaling using directed acyclic
graph (DAG) resultsMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 17 / 27
Vecchia-Laplace approximations for generalized GPs
Outline
1 Gaussian processes
2 The general Vecchia frameworkOverview and basic ideaGaussian noise
3 Vecchia-Laplace approximations for generalized GPs
4 Conclusions
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 18 / 27
Vecchia-Laplace approximations for generalized GPs
Non-Gaussian spatial data: Generalized GP (GGP)
Conditional on GP, data are independent from exponential family:binary, categorical, counts, . . .
Example: Binary classification using logistic GGP:• Take GP function• Transform into probability using logistic link, then draw from
Bernoulli distribution
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 19 / 27
Vecchia-Laplace approximations for generalized GPs
Non-Gaussian spatial data: Generalized GP (GGP)
Conditional on GP, data are independent from exponential family:binary, categorical, counts, . . .
Example: Binary classification using logistic GGP:• Take GP function• Transform into probability using logistic link, then draw from
Bernoulli distribution
−2
−1
01
2
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 19 / 27
Vecchia-Laplace approximations for generalized GPs
Non-Gaussian spatial data: Generalized GP (GGP)
Conditional on GP, data are independent from exponential family:binary, categorical, counts, . . .
Example: Binary classification using logistic GGP:• Take GP function• Transform into probability using logistic link, then draw from
Bernoulli distribution
−2
−1
01
2
0.0
0.2
0.4
0.6
0.8
1.0
●●
●
● ●
● ●
●●
●●● ●●●●●
●●
● ●
●●●●●
●
●
● ●
●●●●●●●
● ● ●●●
●
●
●
●
●
●
●●
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 19 / 27
Vecchia-Laplace approximations for generalized GPs
Log-Gaussian Cox process (LGCP) as a GGP
• Poisson process with random intensity function λ(·) = ey(·), wherey(·) ∼ GP(µ,C )
• Approximation as GGP with Poisson likelihood:• Partition domain into grid cells A1, . . . ,An with center points a1, . . . , an• Data zi = z(Ai ): number of observed points in Ai
• Then, z1, . . . , zn|y(·) ind.∼ Poisson(µ(Ai )), where µ(Ai ) ≈ |Ai | ey(ai )
Latent
−3
−2
−1
0
1
2
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
Observed Down−Sampled
12
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 20 / 27
Vecchia-Laplace approximations for generalized GPs
Laplace for non-Gaussian data
For generalized GP, posterior is intractable → 2nd-order Taylor expansionat the mode (Laplace approximation).
Laplace algorithm: Iterative GP prediction using Gaussian pseudo-data
0.0
0.2
0.4
0.6
0.8
1.0
●●
●
● ●
● ●
●●
●●● ●●●●●
●●
● ●
●●●●●
●
●
● ●
●●●●●●●
● ● ●●●
●
●
●
●
●
●
●●
But still O(n3) → infeasible for large n
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 21 / 27
Vecchia-Laplace approximations for generalized GPs
Laplace for non-Gaussian data
For generalized GP, posterior is intractable → 2nd-order Taylor expansionat the mode (Laplace approximation).
Laplace algorithm: Iterative GP prediction using Gaussian pseudo-data
0.0
0.2
0.4
0.6
0.8
1.0
●●
●
● ●
● ●
●●
●●● ●●●●●
●●
● ●
●●●●●
●
●
● ●
●●●●●●●
● ● ●●●
●
●
●
●
●
●
●●
−2
−1
01
23
●●
●
●
●
● ●
●●
●
●
●●●
●●●
●●
●●
●●●●●
●
●
● ●
●●●●●●
●
●●
●●●
●
●
●
●
●
●
●●
But still O(n3) → infeasible for large n
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 21 / 27
Vecchia-Laplace approximations for generalized GPs
Laplace for non-Gaussian data
For generalized GP, posterior is intractable → 2nd-order Taylor expansionat the mode (Laplace approximation).
Laplace algorithm: Iterative GP prediction using Gaussian pseudo-data
0.0
0.2
0.4
0.6
0.8
1.0
●●
●
● ●
● ●
●●
●●● ●●●●●
●●
● ●
●●●●●
●
●
● ●
●●●●●●●
● ● ●●●
●
●
●
●
●
●
●●
−2
−1
01
23
●●
●
●
●
● ●
●●
●
●
●●●
●●●
●●
●●
●●●●●
●
●
● ●
●●●●●●
●
●●
●●●
●
●
●
●
●
●
●●
But still O(n3) → infeasible for large n
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 21 / 27
Vecchia-Laplace approximations for generalized GPs
Vecchia-Laplace (Katzfuss & Zilber, in prep.)
Based on pseudo-data t, apply Vecchia to x = (t1, . . . , tn, y1, . . . , yn)
Comparison of MSE relative to Laplace:
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 22 / 27
Vecchia-Laplace approximations for generalized GPs
Vecchia-Laplace (Katzfuss & Zilber, in prep.)
Based on pseudo-data t, apply Vecchia to x = (t1, . . . , tn, y1, . . . , yn)
Comparison of MSE relative to Laplace:
0 5 10 15 20 25 30
1.00
1.05
1.10
1.15
1.20
ν = 0.5 (Logistic)
Conditioning Set Size
MS
E
●
●
●
●
●
●
● ●
0 5 10 15 20 25 30
1.00
1.05
1.10
1.15
1.20
ν = 0.5 (Poisson)
Conditioning Set Size
MS
E
●
●
●
●
●
●● ●
0 5 10 15 20 25 30
1.00
1.05
1.10
1.15
1.20
1.25
ν = 0.5 (Gamma)
Conditioning Set Size
MS
E
●
●
●
●
●
●● ●
● VL Laplace Low Rank
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 22 / 27
Vecchia-Laplace approximations for generalized GPs
Comparison for simulated LGCP
Matern covariance with range 2.5 and smoothness ν on domainD = [0, 50]2, discretized to n = 2,500 = 50× 50 grid
RMSE
ν = 0.5 ν = 1.5
0 20 40 60 0 20 40 60
1.2
1.5
1.8
1.00
1.05
1.10
1.15
1.20
1.25
m
RM
SE
Algorithm Laplace LowRank VL
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 23 / 27
Vecchia-Laplace approximations for generalized GPs
Comparison for simulated LGCP
Matern covariance with range 2.5 and smoothness ν on domainD = [0, 50]2, discretized to n = 2,500 = 50× 50 grid
KL divergence
ν = 0.5 ν = 1.5
0 20 40 60 0 20 40 60
0
1000
2000
3000
0
200
400
600
800
m
LS
Algorithm Laplace LowRank VLMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 24 / 27
Vecchia-Laplace approximations for generalized GPs
Vecchia-Laplace algorithm
• Complexity is linear in n
• Typically converges in < 10 iterations
• Unknown hyperparameters: Approximate the integrated likelihood
• Predictions at unobserved locations
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 25 / 27
Conclusions
Outline
1 Gaussian processes
2 The general Vecchia frameworkOverview and basic ideaGaussian noise
3 Vecchia-Laplace approximations for generalized GPs
4 Conclusions
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 26 / 27
Conclusions
Conclusions
• General Vecchia framework for GP approximations:• Accurate• Choice of x, ordering, and conditioning all important• Can guarantee linear scalability• Extension to GGPs and LGCPs
• R package: https://github.com/katzfuss-group/GPvecchia
• Papers:• Katzfuss, M. and Guinness, J. (2017). A general framework for Vecchia
approximations of Gaussian processes. arXiv:1708.06302.• Katzfuss, M., Guinness, J., and Gong, W. (2018). Vecchia
approximations of Gaussian-process predictions. arXiv:1805.03309.• Zilber, D. and Katzfuss, M. (in prep.) A Vecchia-Laplace
approximation for generalized Gaussian processes.
• Partially supported by NSF Grants DMS–1654083 and DMS–1521676
Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008). Gaussianpredictive process models for large spatial data sets. Journal of theRoyal Statistical Society, Series B, 70(4):825–848.
Cressie, N. and Johannesson, G. (2008). Fixed rank kriging for very largespatial data sets. Journal of the Royal Statistical Society, Series B,70(1):209–226.
Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016).Hierarchical nearest-neighbor Gaussian process models for largegeostatistical datasets. Journal of the American Statistical Association,111(514):800–812.
Finley, A. O., Sang, H., Banerjee, S., and Gelfand, A. E. (2009).Improving the performance of predictive process modeling for largedatasets. Computational Statistics & Data Analysis, 53(8):2873–2884.
Furrer, R., Genton, M. G., and Nychka, D. (2006). Covariance tapering forinterpolation of large spatial datasets. Journal of Computational andGraphical Statistics, 15(3):502–523.
Guinness, J. (2018). Permutation and grouping methods for sharpeningGaussian process approximations. Technometrics.
Higdon, D. (1998). A process-convolution approach to modellingtemperatures in the North Atlantic Ocean. Environmental andEcological Statistics, 5(2):173–190.
Katzfuss, M. (2017). A multi-resolution approximation for massive spatialdatasets. Journal of the American Statistical Association,112(517):201–214.
Katzfuss, M. and Guinness, J. (2017). A general framework for Vecchiaapproximations of Gaussian processes. arXiv:1708.06302.
Katzfuss, M., Guinness, J., and Gong, W. (2018). Vecchia approximationsof Gaussian-process predictions. arXiv:1805.03309.
Kaufman, C. G., Schervish, M. J., and Nychka, D. W. (2008). Covariancetapering for likelihood-based estimation in large spatial data sets.Journal of the American Statistical Association, 103(484):1545–1555.
Lindgren, F., Rue, H., and Lindstrom, J. (2011). An explicit link betweengaussian fields and gaussian markov random fields: the stochastic partialdifferential equation approach. Journal of the Royal Statistical Society:Series B, 73(4):423–498.
Nychka, D. W., Bandyopadhyay, S., Hammerling, D., Lindgren, F., andSain, S. R. (2015). A multi-resolution Gaussian process model for theanalysis of large spatial data sets. Journal of Computational andGraphical Statistics, 24(2):579–599.
Quinonero-Candela, J. and Rasmussen, C. E. (2005). A unifying view ofsparse approximate Gaussian process regression. Journal of MachineLearning Research, 6:1939–1959.
Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theoryand Applications. CRC press.
Sang, H., Jun, M., and Huang, J. Z. (2011). Covariance approximation forlarge multivariate spatial datasets with an application to multipleclimate model errors. Annals of Applied Statistics, 5(4):2519–2548.
Snelson, E. and Ghahramani, Z. (2007). Local and global sparse Gaussianprocess approximations. In Artificial Intelligence and Statistics 11(AISTATS).
Stein, M. L., Chi, Z., and Welty, L. (2004). Approximating likelihoods forlarge spatial data sets. Journal of the Royal Statistical Society: SeriesB, 66(2):275–296.
Vecchia, A. (1988). Estimation and model identification for continuousspatial processes. Journal of the Royal Statistical Society, Series B,50(2):297–312.
Wikle, C. K. and Cressie, N. (1999). A dimension-reduced approach tospace-time Kalman filtering. Biometrika, 86(4):815–829.
Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 27 / 27