Scalable Gaussian Process Analysis – the SCALAGauss Project


Transcript of Scalable Gaussian Process Analysis – the SCALAGauss Project

Page 1

Scalable Gaussian Process Analysis – the SCALAGauss Project

Mihai Anitescu (MCS, Argonne)

With Jie Chen, Emil Constantinescu (ANL); Michael Stein (Chicago)
Project web site: http://press3.mcs.anl.gov/scalagauss/
VERSION OF 5/1/2012
At the SAMSI-HPC-UQ workshop, Oak Ridge, May 1, 2012

Page 2

Outline

1. Context
2. Scalable Max Likelihood Calculations with GPs
3. Linear Algebra, Preconditioning
4. Scalable Matrix-Vector Multiplication with Covariance Matrices
5. Scalable Sampling Methods for Gaussian Processes
6. Physics-based cross-covariance models


Page 3

1. CONTEXT


Page 4

Application: Interpolation with UQ of Spatio-Temporal Processes

[Figure. Source: http://www.ccs.ornl.gov/]

Page 5

Gaussian process regression (kriging): Setup


Gaussian process (GP):
\[ f(x) \sim \mathcal{N}\big(m(x), k(x, x')\big), \qquad \mathbb{E}\{f(x)\} = m(x), \qquad \mathrm{Cov}\big(f(x), f(x')\big) = k(x, x') \]

Most common: stationary, \( k(x, x') = k(x - x') \).

Data (observations)/predictions: \( y = f(x) + \varepsilon \) / \( y_* = f(x_*) \).

GP joint distribution:
\[ \begin{bmatrix} y \\ y_* \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix}, \begin{bmatrix} K_{11} + \Sigma & K_{12} \\ K_{21} & K_{22} \end{bmatrix} \right) \]

Predictive distribution:
\[ \mathbb{E}[y_* \mid X, X_*, y] = m(X_*) + K_{21}\,(K_{11} + \Sigma)^{-1}\,(y - m(X)) \]
\[ \mathrm{Cov}(y_* \mid X, X_*, y) = K_{22} - K_{21}\,(K_{11} + \Sigma)^{-1}\,K_{12} \]

[Figure: prior and posterior GP samples for the data x = [−0.5, 0, 0.5], y = [0.5, −0.5, 0.5].]
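To make the prediction formulas concrete, here is a minimal NumPy sketch, assuming a squared-exponential kernel, zero prior mean, and an illustrative noise level (none of these particular choices come from the slides):

```python
import numpy as np

def sq_exp(X1, X2, sigma2=1.0, ell=0.3):
    # Squared-exponential kernel k(d) = sigma^2 exp(-d^2 / (2 ell^2)).
    d = X1[:, None] - X2[None, :]
    return sigma2 * np.exp(-d**2 / (2 * ell**2))

# Toy data from the slide's figure.
x  = np.array([-0.5, 0.0, 0.5])
y  = np.array([0.5, -0.5, 0.5])
xs = np.linspace(-1.0, 1.0, 101)      # prediction points X_*
Sigma = 1e-4 * np.eye(len(x))         # observation noise (assumed value)

K11 = sq_exp(x, x) + Sigma
K21 = sq_exp(xs, x)
K22 = sq_exp(xs, xs)

# Predictive mean and covariance with zero prior mean m(x) = 0.
L = np.linalg.cholesky(K11)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = K21 @ alpha                    # m(X_*) + K21 (K11 + Sigma)^{-1} (y - m(X))
V = np.linalg.solve(L, K21.T)
cov = K22 - V.T @ V                   # K22 - K21 (K11 + Sigma)^{-1} K12
```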

Page 6

Gaussian process regression (kriging): inferences


[Figure: GP regression example showing the "truth", the prediction ± 2σ, and the (noisy) observations.]

Marginal likelihood:
\[ P(y \mid x) = \int P(y \mid f, x)\, P(f \mid x)\, df \]

Log-marginal likelihood (MLE-II):
\[ \log P(y \mid X; \theta) = -\tfrac{1}{2}\, y^T \big(K_{11}(\theta) + \Sigma\big)^{-1} y - \tfrac{1}{2} \log \big|K_{11}(\theta) + \Sigma\big| \]
\[ \theta = [\sigma^2, \ell^2, \ldots]^T; \qquad \theta^* = \arg\max_\theta\, \log P(y \mid X; \theta) \]

Covariance function (kernel), e.g. squared exponential:
\[ K(p, q) = k(d) = \sigma^2 \exp\!\left( -\frac{d^2}{2\ell^2} \right), \qquad d = |x_p - x_q| \]

Matérn covariance kernel:
\[ k(d) = \sigma^2\, \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \sqrt{2\nu}\, d \right)^{\nu} K_{\nu}\!\left( \sqrt{2\nu}\, d \right) \]

GP regression or kriging: related to autoregressive models, Kalman filtering.
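A minimal sketch of the log-marginal likelihood above for the squared-exponential kernel (toy data; the kernel choice, noise level, and inclusion of the constant term are illustrative assumptions):

```python
import numpy as np

def log_marginal_likelihood(theta, x, y, noise=1e-4):
    """log P(y | X; theta) for the squared-exponential kernel,
    theta = (sigma2, ell2); Sigma = noise * I is an assumed value."""
    sigma2, ell2 = theta
    d2 = (x[:, None] - x[None, :])**2
    K = sigma2 * np.exp(-d2 / (2 * ell2)) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log|K| from the Cholesky factor
    n = len(x)
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

x = np.array([-0.5, 0.0, 0.5])
y = np.array([0.5, -0.5, 0.5])
print(log_marginal_likelihood((1.0, 0.09), x, y))
```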

Page 7

What makes a covariance function acceptable? Bochner’s theorem (this slide from Rasmussen)


This defines the spectral density of a covariance process (i.e., its Fourier transform, which must be real and nonnegative everywhere).

Some example processes and densities

[Figure: squared exponential, Matérn, and Matérn 3/2 processes with their spectral densities.]
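Bochner's theorem can be checked numerically: sample a stationary kernel on a periodic grid and verify that its discrete Fourier transform is real and nonnegative, up to truncation and rounding error. A sketch, with illustrative grid and length-scale choices:

```python
import numpy as np

# Periodic grid of lags on [-L/2, L/2).
n, L = 512, 20.0
x = np.linspace(-L / 2, L / 2, n, endpoint=False)

def sq_exp(d, ell=1.0):
    return np.exp(-d**2 / (2 * ell**2))

def matern32(d, ell=1.0):
    a = np.sqrt(3) * np.abs(d) / ell
    return (1 + a) * np.exp(-a)

for name, k in [("squared exponential", sq_exp), ("Matern 3/2", matern32)]:
    # Move the zero lag to index 0, then take the DFT: the spectrum of a
    # valid stationary kernel is real and nonnegative (up to truncation).
    spec = np.fft.fft(np.fft.ifftshift(k(x)))
    print(f"{name:20s} max|imag| = {np.abs(spec.imag).max():.2e}, "
          f"min real = {spec.real.min():.2e}")
```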

Page 8

Tasks and challenges

Sampling
Maximum likelihood
Interpolation/kriging (solving linear systems with K)
Regression/classification (solving linear systems with K)
…

A lot of the basic tasks require matrix computations w.r.t. the covariance matrix K (and most often, Cholesky).

But for 10⁹ data points, storing K requires (10⁹)² entries × 8 bytes = 8×10¹⁸ bytes = 8 exabytes, so K cannot be stored.

How do you compute the log-det and apply a factor A of K (see below) without storing the covariance matrix? And hopefully in O(number of data points) operations?

The same challenges appear even outside GPs, as soon as you need to deal with full correlation.


\[ \log p(y; \theta) = -\tfrac{1}{2}\, y^T K^{-1} y + \tfrac{1}{2}\, y^T K^{-1} H \big(H^T K^{-1} H\big)^{-1} H^T K^{-1} y - \tfrac{1}{2} \log |K| - \tfrac{m}{2} \log(2\pi) \]
\[ K = A^T A, \qquad \xi \sim \mathcal{N}(0, I), \qquad y = m + A^T \xi \sim \mathcal{N}(m, K) \]

Page 9

2. SCALABLE MAXIMUM LIKELIHOOD CALCULATIONS


Page 10

Maximum Likelihood Estimation (MLE)

A family of covariance functions parameterized by θ: φ(x; θ)

Maximize the log-likelihood to estimate θ:

First-order optimality (also known as the score equations):


\[ \max_\theta\; \mathcal{L}(\theta) = \log\left\{ (2\pi)^{-n/2} (\det K)^{-1/2} \exp\big(-y^T K^{-1} y / 2\big) \right\} = -\tfrac{1}{2}\, y^T K^{-1} y - \tfrac{1}{2} \log(\det K) - \tfrac{n}{2} \log 2\pi \]
\[ \tfrac{1}{2}\, y^T K^{-1} (\partial_j K) K^{-1} y - \tfrac{1}{2}\, \mathrm{tr}\!\left[ K^{-1} (\partial_j K) \right] = 0 \]
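A small sketch that checks the score equation against a finite-difference gradient of the log-likelihood, for the σ² component of a squared-exponential kernel (dense linear algebra and toy sizes; for illustration only, not the project's solver):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(-1, 1, n))
noise = 1e-4

def build_K(theta):
    sigma2, ell2 = theta
    d2 = (x[:, None] - x[None, :])**2
    return sigma2 * np.exp(-d2 / (2 * ell2)) + noise * np.eye(n)

theta = np.array([1.0, 0.09])
K = build_K(theta)
y = np.linalg.cholesky(K) @ rng.standard_normal(n)   # sample y ~ N(0, K)

def loglik(theta):
    # Constant -(n/2) log 2*pi omitted; it does not affect the score.
    K = build_K(theta)
    sign, logdet = np.linalg.slogdet(K)
    return -0.5 * y @ np.linalg.solve(K, y) - 0.5 * logdet

# Analytic score for theta_1 = sigma2: dK/d(sigma2) = (K - noise*I) / sigma2.
dK = (K - noise * np.eye(n)) / theta[0]
Ky = np.linalg.solve(K, y)
score = 0.5 * Ky @ dK @ Ky - 0.5 * np.trace(np.linalg.solve(K, dK))

h = 1e-6
fd = (loglik(theta + [h, 0.0]) - loglik(theta - [h, 0.0])) / (2 * h)
print(score, fd)    # the two values should agree closely
```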

Page 11

Maximum Likelihood Estimation (MLE)

The log-det term poses a significant challenge for large-scale computations.

Cholesky of K: prohibitively expensive!
log(det K) = tr(log K): need some matrix-function methods to handle the log.
No existing method evaluates the log-det term with sufficient accuracy.


\[ \max_\theta\; -\tfrac{1}{2}\, y^T K^{-1} y - \tfrac{1}{2} \log(\det K) - \tfrac{n}{2} \log 2\pi \]

Page 12

Sample Average Approximation of Maximum Likelihood Estimation (MLE)

We consider approximately solving the first-order optimality instead, with a randomized trace estimator: \( \mathrm{tr}(A) = \mathbb{E}[u^T A u] \)

–  u  has  i.i.d.  entries  taking  ±1  with  equal  probability  

As N tends to infinity, the solution approaches the true estimate.
The variance introduced in approximating the trace is comparable with the variance of the sample y:
–  So the approximation does not lose too much accuracy.

  Numerically,  one  must  solve  linear  systems  with  O(N)  right-­‐hand  sides.    


\[ \tfrac{1}{2}\, y^T K^{-1} (\partial_j K) K^{-1} y - \tfrac{1}{2}\, \mathrm{tr}\!\left[ K^{-1} (\partial_j K) \right] \;\approx\; \tfrac{1}{2}\, y^T K^{-1} (\partial_j K) K^{-1} y - \frac{1}{2N} \sum_{i=1}^{N} u_i^T \left[ K^{-1} (\partial_j K) \right] u_i = 0 \]
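A minimal sketch of the randomized (Hutchinson-type) trace estimator with Rademacher probes; the estimator needs only matrix-vector products, which is what makes it usable when K cannot be stored (the dense test matrix here is only for checking):

```python
import numpy as np

def hutchinson_trace(matvec, n, N=100, rng=None):
    """tr(A) ~ (1/N) sum_i u_i^T A u_i with Rademacher probes u_i (i.i.d. ±1).
    Only matrix-vector products with A are required."""
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for _ in range(N):
        u = rng.choice([-1.0, 1.0], size=n)
        total += u @ matvec(u)
    return total / N

# Dense check (the point of the estimator is that A never needs to be formed).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 200))
A = A @ A.T
print("exact:", np.trace(A),
      " estimate:", hutchinson_trace(lambda v: A @ v, 200, N=500))
```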

Page 13

Stochastic Approximation of Trace

  When  entries  of  u  are  i.i.d.  with  mean  zero  and  covariance  I      

  The  es7mator  has  a  variance  

  If  each  entry  of  u  takes  ±1  with  equal  probability,  the  variance  is  the  smallest    


\[ \mathrm{tr}(A) = \mathbb{E}_u\!\left[ u^T A u \right] \]
\[ \mathrm{var}\{u^T A u\} = \sum_i \left( \mathbb{E}[u_i^4] - 1 \right) A_{ii}^2 + \frac{1}{2} \sum_{i \ne j} (A_{ij} + A_{ji})^2 \]
\[ \mathrm{var}\{u^T A u\} = \frac{1}{2} \sum_{i \ne j} (A_{ij} + A_{ji})^2 \qquad \text{(for ±1 entries)} \]
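A quick empirical comparison of the estimator variance for Gaussian versus ±1 (Rademacher) probes, matching the two formulas above (the test matrix and trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
A = 0.5 * (A + A.T)                       # symmetric test matrix

def sample_estimates(draw, trials=2000):
    return np.array([u @ A @ u for u in (draw() for _ in range(trials))])

gauss = sample_estimates(lambda: rng.standard_normal(n))
radem = sample_estimates(lambda: rng.choice([-1.0, 1.0], size=n))

# Theory (A symmetric): Rademacher variance = 2 * sum_{i != j} A_ij^2;
# Gaussian probes add 2 * sum_i A_ii^2 on top (E[u^4] = 3).
off = 2 * (np.sum(A**2) - np.sum(np.diag(A)**2))
print("Rademacher:", radem.var(), " theory:", off)
print("Gaussian:  ", gauss.var(), " theory:", off + 2 * np.sum(np.diag(A)**2))
```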

Page 14

Convergence of Stochastic Programming - SAA

Let
\[ \theta : \text{truth} \]
\[ \hat\theta : \text{solution of } \tfrac{1}{2}\, y^T K^{-1} (\partial_j K) K^{-1} y - \tfrac{1}{2}\, \mathrm{tr}\!\left[ K^{-1} (\partial_j K) \right] = 0 \]
\[ \hat\theta^N : \text{solution of } F = \tfrac{1}{2}\, y^T K^{-1} (\partial_j K) K^{-1} y - \frac{1}{2N} \sum_{i=1}^{N} u_i^T \left[ K^{-1} (\partial_j K) \right] u_i = 0 \]

First result:
\[ [V^N]^{-1/2} (\hat\theta^N - \hat\theta) \xrightarrow{D} \text{standard normal}, \qquad V^N = [J^N]^{-T} \Sigma^N [J^N]^{-1}, \]
where \( J^N = \nabla F(\hat\theta^N) \) and \( \Sigma^N = \mathrm{cov}\{ F(\hat\theta^N) \} \).

Note: \( \Sigma^N \) decreases as O(N⁻¹).

Page 15

Simulation: We scale

Truth θ = [7, 10], Matérn ν = 1.5


[Figure: estimated parameters θ₁, θ₂ and wall-clock time (seconds) vs. matrix dimension n, from 10⁴ to 10⁶.]

Grid          Time        Function evaluations
64x64         2.56 min    7
128x128       6.62 min    7
256x256       1.1 h       8
512x512       2.74 h      8
1024x1024     11.7 h      8

Page 16

“Optimal” Convergence

Let θ be the truth, \( \hat\theta \) the solution of the exact score equations, and \( \hat\theta^N \) the solution of the sample average approximation F = 0 (as defined on the previous slide).

Second result:
\[ C^{-1/2} (\hat\theta^N - \theta) \xrightarrow{D} \text{standard normal}, \qquad C = A^{-T} B A^{-1}, \]
where \( A = \mathcal{I} \), the Fisher matrix, and \( B = \mathcal{I} + \frac{1}{4N} J \).

Note: J has the bound \( J \preceq \frac{[\mathrm{cond}(K)+1]^2}{\mathrm{cond}(K)}\, \mathcal{I} \), so C converges to \( \mathcal{I}^{-1} \) in O(N⁻¹) if the condition number of K is bounded.

Page 17

3. LINEAR ALGEBRA; PRECONDITIONING


Page 18

LINEAR ALGEBRA CHALLENGES: PRECONDITIONING AND MATRIX-VECTOR MULTIPLICATIONS

We reduced max likelihood calculations to solving linear systems with K. We next focus on the linear algebra:
Preconditioning K
Matrix-vector multiplication with K
Solving linear systems w.r.t. K with multiple right-hand sides


Page 19

Covariance Model

Matérn covariance function

–  ν: example values 0.5, 1, 1.5, 2
–  θ: scale parameters to estimate
–  K_ν is the modified Bessel function of the second kind of order ν

Commonly used in spatial/temporal data modeling. The parameter ν models the level of smoothness of the data. When ν → ∞, the kernel becomes the Gaussian (squared exponential) kernel. Spectral density:


\[ \phi(x) = \frac{1}{2^{\nu-1} \Gamma(\nu)} \left( \sqrt{2\nu}\, r \right)^{\nu} K_{\nu}\!\left( \sqrt{2\nu}\, r \right), \qquad \text{where } r = \sqrt{ \sum_{j=1}^{d} \frac{x_j^2}{\theta_j^2} } \]
\[ f(\omega) \propto \left( 2\nu + \beta^2 \right)^{-(\nu + d/2)}, \qquad \text{where } \beta^2 = \sum_{j=1}^{d} (\theta_j \omega_j)^2 \]
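A minimal sketch of the Matérn model above using SciPy's modified Bessel function of the second kind (scipy.special.kv); the scale values are borrowed from the simulation slide, while the sample points are made up:

```python
import numpy as np
from scipy.special import kv, gamma

def matern(r, nu=1.5):
    # phi(r) = 2^{1-nu}/Gamma(nu) * (sqrt(2 nu) r)^nu * K_nu(sqrt(2 nu) r),
    # with phi(0) = 1 handled separately (kv diverges at 0).
    r = np.asarray(r, dtype=float)
    out = np.ones_like(r)
    m = r > 0
    a = np.sqrt(2 * nu) * r[m]
    out[m] = (2.0**(1 - nu) / gamma(nu)) * a**nu * kv(nu, a)
    return out

# Anisotropic scaled distance r = sqrt(sum_j x_j^2 / theta_j^2), as on the slide.
theta = np.array([7.0, 10.0])
X = np.random.default_rng(0).uniform(0.0, 100.0, (5, 2))
diff = X[:, None, :] - X[None, :, :]
r = np.sqrt(np.sum((diff / theta)**2, axis=-1))
print(matern(r, nu=1.5))
```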

Page 20

Why the Matérn Kernel?


In machine learning, people tend to use the squared exponential kernel a lot. This assumes that all realizations are infinitely smooth, a fact rarely supported by data, especially high-resolution data.
The Matérn kernel allows one to adjust smoothness.
The resulting covariance matrix is dense, compared with compactly supported kernels, but the likelihood surface is much smoother.

Page 21

Condition Number

K is increasingly ill-conditioned. If the grid is in a fixed, finite domain \( \subset \mathbb{R}^d \), then cond(K) = O(n^{2ν/d + 1}).

On a regular grid, K is (multi-level) Toeplitz, hence a circulant preconditioner applies.

More can be done by considering filtering.

Solving linear systems with K:

Circulant preconditioner [Chan, '88]:
\[ c(T_n) = \arg\min_{C_n} \| C_n - T_n \|_F \]

Simple construction:
\[
\begin{pmatrix}
t_0 & t_{-1} & \cdots & t_{-n+2} & t_{-n+1} \\
t_1 & t_0 & t_{-1} & & t_{-n+2} \\
\vdots & t_1 & t_0 & \ddots & \vdots \\
t_{n-2} & & \ddots & \ddots & t_{-1} \\
t_{n-1} & t_{n-2} & \cdots & t_1 & t_0
\end{pmatrix}
\longrightarrow
\begin{pmatrix}
c_0 & c_{n-1} & \cdots & c_2 & c_1 \\
c_1 & c_0 & c_{n-1} & & c_2 \\
\vdots & c_1 & c_0 & \ddots & \vdots \\
c_{n-2} & & \ddots & \ddots & c_{n-1} \\
c_{n-1} & c_{n-2} & \cdots & c_1 & c_0
\end{pmatrix}
\]
\[ c_i = \frac{(n-i)\, t_i + i\, t_{-n+i}}{n}, \qquad i = 0, \ldots, n-1 \]

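A sketch of Chan's construction and its FFT application, assuming a symmetric Toeplitz matrix given by its first column t (the exponential test kernel is illustrative). Since a circulant matrix is diagonalized by the DFT, applying C⁻¹ costs O(n log n):

```python
import numpy as np

def chan_circulant(t):
    # First column of Chan's circulant preconditioner for a symmetric
    # Toeplitz matrix with first column t: c_i = ((n-i) t_i + i t_{i-n}) / n,
    # using t_{i-n} = t_{n-i} by symmetry.
    n = len(t)
    c = np.empty(n)
    c[0] = t[0]
    i = np.arange(1, n)
    c[1:] = ((n - i) * t[1:] + i * t[n - i]) / n
    return c

def solve_circulant(c, r):
    # Solve C z = r via FFT diagonalization of the circulant C.
    lam = np.fft.fft(c)
    return np.real(np.fft.ifft(np.fft.fft(r) / lam))

# Illustrative Toeplitz first column from an exponential kernel on a 1-D grid.
n = 256
t = np.exp(-np.arange(n) / 10.0)
c = chan_circulant(t)
r = np.random.default_rng(0).standard_normal(n)
z = solve_circulant(c, r)
```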

Page 22

Condition Number

Filtering (1D), first differences: suppose \( f(\omega)\,\omega^2 \) is bounded away from 0 and ∞ as \( \omega \to \infty \). Let \( 0 \le x_0 \le x_1 \le \cdots \le x_n \le T \) and \( d_j = x_j - x_{j-1} \). Then \( K^{(1)} \) below has a condition number bounded independently of n.

Filtering (1D), second differences: suppose \( f(\omega)\,\omega^4 \) is bounded away from 0 and ∞ as \( \omega \to \infty \). Then \( K^{(2)} \) below has a condition number bounded independently of n.


\[ Y_j^{(1)} = \frac{Z(x_j) - Z(x_{j-1})}{d_j}, \qquad K^{(1)}(j, l) = \mathrm{cov}\big\{ Y_j^{(1)}, Y_l^{(1)} \big\} \]
\[ Y_j^{(2)} = \frac{Z(x_{j+1}) - Z(x_j)}{2\, d_{j+1}\, (d_{j+1} + d_j)} - \frac{Z(x_j) - Z(x_{j-1})}{2\, d_j\, (d_{j+1} + d_j)}, \qquad K^{(2)}(j, l) = \mathrm{cov}\big\{ Y_j^{(2)}, Y_l^{(2)} \big\} \]
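A small numerical illustration of the first filter: for the exponential kernel (Matérn ν = 1/2, whose spectral density satisfies the f(ω)ω² condition), forming K⁽¹⁾ = L K Lᵀ with the first-difference matrix L should shrink the condition number dramatically (grid and range values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x = np.sort(rng.uniform(0.0, 1.0, n + 1))     # 0 <= x_0 <= ... <= x_n <= T = 1
d = np.diff(x)                                # d_j = x_j - x_{j-1}

# Exponential kernel (Matérn nu = 1/2): f(omega) * omega^2 is bounded
# away from 0 and infinity, so the first-order filter applies.
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)

# First-order difference filter: Y_j = [Z(x_j) - Z(x_{j-1})] / d_j, i.e. Y = L Z.
L = np.zeros((n, n + 1))
L[np.arange(n), np.arange(n)] = -1.0 / d
L[np.arange(n), np.arange(n) + 1] = 1.0 / d

K1 = L @ K @ L.T                              # K^{(1)} = cov{Y^{(1)}}
print("cond(K)     =", np.linalg.cond(K))
print("cond(K^(1)) =", np.linalg.cond(K1))
```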

Page 23

Condition Number

Filtering (high dimension, regular grid): if \( f(\omega) \) is asymptotically \( (1 + |\omega|)^{-4\tau} \), then \( K^{[\tau]} \) has a condition number bounded independently of n.

Use the filter as a preconditioner. In 2D, L is the 5-point-stencil matrix with the rows corresponding to the grid boundary removed.

Similarly for the filters in the preceding slide.


\[ \Delta Z(x_j) = \sum_{p=1}^{d} \big[ Z(x_j - \delta e_p) - 2 Z(x_j) + Z(x_j + \delta e_p) \big] \]
\[ K^{[\tau]}(j, l) = \mathrm{cov}\big\{ \Delta^{[\tau]} Z(x_j),\; \Delta^{[\tau]} Z(x_l) \big\} \]
\[ K^{[\tau]} = \big[L^{[\tau]}\big] \cdots \big[L^{[2]}\big] \big[L^{[1]}\big]\, K\, \big[L^{[1]}\big]^T \big[L^{[2]}\big]^T \cdots \big[L^{[\tau]}\big]^T \]
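A sketch of the 5-point-stencil filter on a regular 2-D grid, applied to a Matérn ν = 1 covariance (whose spectral density decays like |ω|⁻⁴, matching τ = 1); the dense construction and parameter values are for illustration only:

```python
import numpy as np
from scipy.special import kv

def laplacian_filter_2d(m):
    """Dense 5-point-stencil Laplacian on an m x m regular grid; only the
    rows for interior grid points are kept (boundary rows removed)."""
    rows = []
    for i in range(1, m - 1):
        for j in range(1, m - 1):
            row = np.zeros(m * m)
            idx = i * m + j
            row[idx] = -4.0                   # center of the stencil
            for nb in (idx - 1, idx + 1, idx - m, idx + m):
                row[nb] = 1.0                 # the four neighbors
            rows.append(row)
    return np.array(rows)

# Matérn (nu = 1) covariance on a regular 2-D grid.
m = 16
g = np.arange(m) / m
X, Y = np.meshgrid(g, g, indexing="ij")
pts = np.column_stack([X.ravel(), Y.ravel()])
r = np.sqrt(2.0) * np.linalg.norm(pts[:, None] - pts[None, :], axis=-1) / 0.2
K = np.ones_like(r)
mask = r > 0
K[mask] = r[mask] * kv(1, r[mask])            # nu = 1 Matérn correlation

L1 = laplacian_filter_2d(m)
K_tau = L1 @ K @ L1.T                         # K^{[1]} = [L] K [L]^T
print("cond(K)     =", np.linalg.cond(K))
print("cond(K^[1]) =", np.linalg.cond(K_tau))
```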

Page 24

Condition Number

Effect of filtering (K^{[τ]} can be further preconditioned by a circulant preconditioner)


[Figure: condition number vs. matrix dimension n (10² to 10⁶) for K, K preconditioned, K^{[1]}, and K^{[1]} preconditioned.]

Page 25

Block CG

Preconditioned Conjugate Gradient (M is the preconditioner)


\( Ax = b \):
\[ x_{j+1} = x_j + \alpha_j p_j, \qquad r_{j+1} = r_j - \alpha_j A p_j, \qquad p_{j+1} = M r_{j+1} + \beta_j p_j \]
where
\[ \alpha_j = \frac{r_j^T M r_j}{p_j^T A p_j}, \qquad \beta_j = \frac{r_{j+1}^T M r_{j+1}}{r_j^T M r_j} \]

\( AX = B \) (block version):
\[ X_{j+1} = X_j + P_j \alpha_j, \qquad R_{j+1} = R_j - A P_j \alpha_j, \qquad P_{j+1} = (M R_{j+1} + P_j \beta_j)\, \rho_{j+1} \]
where
\[ \alpha_j = \big(P_j^T A P_j\big)^{-1} \rho_j^T \big(R_j^T M R_j\big), \qquad \beta_j = \rho_j^{-1} \big(R_j^T M R_j\big)^{-1} \big(R_{j+1}^T M R_{j+1}\big) \]
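A minimal sketch of the single-right-hand-side preconditioned CG above, combined with the circulant preconditioner sketched earlier (the block version replaces the vectors with blocks of right-hand sides); the Toeplitz test matrix is illustrative:

```python
import numpy as np

def pcg(matvec, b, precond=lambda r: r, tol=1e-10, maxiter=500):
    """Preconditioned CG for A x = b; the preconditioner solve is applied
    through precond(r)."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Toeplitz test matrix from an exponential kernel, with Chan's circulant
# preconditioner applied through the FFT.
n = 256
t = np.exp(-np.arange(n) / 10.0)
K = t[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]
i = np.arange(1, n)
c = np.concatenate(([t[0]], ((n - i) * t[1:] + i * t[n - i]) / n))
lam = np.fft.fft(c).real                      # eigenvalues of the circulant

x = pcg(lambda v: K @ v, np.ones(n),
        precond=lambda r: np.real(np.fft.ifft(np.fft.fft(r) / lam)))
print("residual:", np.linalg.norm(K @ x - np.ones(n)))
```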

Page 26

Block CG

CG, block CG, and the preconditioned versions using the circulant preconditioner


[Figure: residual vs. iteration for CG, PCG, block CG (BCG), and block PCG (BPCG).]

Page 27

Experimental Results

Combined effect of circulant preconditioning and filtering


[Figure: residual vs. iteration for K, K preconditioned, K^{[1]}, and K^{[1]} preconditioned.]

Page 28

3.1 PHYSICS-INSPIRED “PROBLEM”


Page 29

Stokes Flow

 


[Figure (Fig. 7 of Stein, Chen, Anitescu, "Approximation of score function"): Stokes flow data. Panels: (a) pressure field; (b) filtered pressure field in log scale; (c) partition of the data; (d) data in circle; (e) data in rectangle; (f) data on boundaries of shapes; (g) data in background. The right columns are QQ plots of the filtered data in regions.]

Page 30

Stokes Flow

Fitted a power-law model


\[ \phi(x; \alpha, C) = \Gamma(-\alpha/2)\, C\, |x|^{\alpha} \]

Data                  Circle     Rectangle   Boundary   Background
# Points              1.5e+5     7.9e+4      3.1e+4     4.7e+5
Fitted α              0.1819     0.3051      0.7768     1.5945
Fitted C              722.54     803.07      3309.4     626180.0
eig(Fisher⁻¹)^{1/2}   5.04e-4    9.80e-4     2.50e-3    8.11e-4
                      1.31e+1    1.86e+1     1.26e+2    5.44e+3

Page 31

Conclusion GP

State-of-the-art methods use Cholesky to do sampling and solve ML.
–  Can probably handle data size up to n = O(10⁴) or O(10⁵).

We propose a framework to overcome the Cholesky barrier.
–  Use a matrix-free method to do sampling.
–  Reformulate maximum likelihood using stochastic approximation.
–  Use iterative solvers to solve linear systems.
–  Use a filtering technique to reduce the condition number.

Ongoing work
–  Investigating the scaling of parallel FFT for n = O(10⁶) and larger computations.
–  For scattered points, investigating a discrete Laplace operator for filtering.
–  Implementing a fast summation method to do mat-vecs.

  Details:  ScalaGauss  project  web  site.    
