Efficient Gaussian Process Regression for large Data Sets ANJISHNU BANERJEE, DAVID DUNSON, SURYA...

Efficient Gaussian Process Regression for large Data Sets

ANJISHNU BANERJEE, DAVID DUNSON, SURYA TOKDARBiometrika, 2008

IntroductionWe have noisy observations from the unknown function observed at locations respectively

The prediction for new input x,

fffT

fx Kkxg y1,, )()()(

Where K_{f,f} is n x n covariance matrix.

Problem: O(n^3) in performing necessary matrix inversions with n denoting the number of data points

nf yy ,...,1y

nf xx ,...,1x

The unknown function f is assumed to be a realization of a GP.

Key idea

• Existing solutions : “Knots” or “landmarks” based solutions– Determining the location and spacing of knots, with the

choice having substantial impact• Methods have been proposed for allowing uncertain numbers and locations of knots

in the predictive process using reversible jump– Unfortunately, such free knot methods increase the computational burden substantially, partially

eliminating the computational savings due to a low rank method

• Motivated by the literature on compressive sensing, the authors propose an alternative random projection of all the data points onto a lower-dimensional subspace

Nystrom Approximation (Williams and Seeger, 2000)

Landmark based Method

• Let X*= Let, Defining

An approximation to

Random projection method

• The key idea for random projection is to use instead of where

Tffff

TffffTTT

aug KK

KKxfxfK ,)(cov

Let be the random projection approximation to ,

ffTff

Tff

RPff KKKQ

1

Some properties

• When m = n, So, and we get back the original process with a full rank

random projection

• Relation to Nystrom approximation– Approximation in the machine learning literature were viewed as

reduced rank approximations to the covariance matrices– It is easy to see that corresponds to a Nystrom approximation to

Choice of Ф

Low Distortion embeddings

• Embed matrix K from a using random projection matrix

• Embeddings with low distortion properties have been well-studied and J-L transforms are among the most popular–

Given a fixed rank m what is the near optimal projection for that rank m?

Finding range for a given target error condition

Results

• Conditioning numbers– Full covariance matrix for a smooth GP tracked at a dense set of

locations will be ill-conditioned and nearly rank-deficient in practice– Inverses may be highly unstable and severely degrade inference

Results

• Parameter estimation (Toy data)

Results• Parameter estimation (real data)

Efficient Gaussian Process Regression for large Data Sets ANJISHNU BANERJEE, DAVID DUNSON, SURYA...

Documents

Transcript of Efficient Gaussian Process Regression for large Data Sets ANJISHNU BANERJEE, DAVID DUNSON, SURYA...