Session 1: Introduction
Geir-Arne Fuglstad <[email protected]>
Department of Mathematical Sciences, NTNU
April 14, 2016
www.ntnu.no G.-A. Fuglstad, Introduction
Instructors
— Geir-Arne Fuglstad
  • Postdoc with Håvard Rue
  • Working on sensitivity analysis and robustification of results from INLA
  • PhD on non-stationary spatial modelling with SPDE models
— Jingyi Guo
  • Ph.D. student with Andrea Riebler and Håvard Rue
  • Master in Statistics from Lund University
Practical information
Slides and code for the sessions are available at http://www.math.ntnu.no/~fuglstad/Lund2016
Practical information
Information about the software, examples, papers and help can be found at http://www.r-inla.org
What is INLA?
We distinguish between three different parts:
1. The INLA method
2. The SPDE models
3. The INLA R-package
The INLA method
An approach for fast Bayesian inference with “latent Gaussian models”
Read paper:
Rue, H., Martino, S. and Chopin, N. (2009) “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B, 71, 319–392.
The SPDE models
A novel way to get around the computational inefficiencies of continuously indexed spatial fields (Gaussian random fields, GRFs)
Read paper:
Lindgren, F., Rue, H. and Lindström, J. (2011) “An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.” Journal of the Royal Statistical Society: Series B, 73, 423–498.
The INLA package
The R-package is an implementation of the INLA method and the SPDE models with a flexible and simple interface
Download with:
source("http://www.math.ntnu.no/inla/givemeINLA-testing.R")
The history of INLA
— The development has been driven by Håvard Rue and is the result of many years of hard work
— Around 2002–2004 he and Leonhard Held started to realize the importance of the class of models that INLA handles
— In 2005 Håvard Rue and Leonhard Held wrote the book “Gaussian Markov Random Fields: Theory and Applications”
The history of INLA
— The first implementation in C was finished in 2007, but required hand-crafted input files
— Arnoldo Frigessi (Oslo) suggested that an R-interface was necessary to reach a broad audience
— Sara Martino wrote the first prototype of the R-interface in January/February 2008
— The source code now consists of many, many, many lines...
— The source code is available at https://bitbucket.org/hrue/r-inla/
Who develops INLA?
Håvard Rue, Finn Lindgren, Daniel Simpson, Andrea Riebler, (Sara Martino, Thiago Guerrera Martins, Rupali Akerkar) and others (photo from 2011)
Aims of the course
— Get an overview of latent Gaussian models
— Get an overview of the INLA method
— Learn how to use INLA for (generalized) linear models, and more
— Learn how to do spatial modelling with INLA
Structure of the course
Day 1:
10:30–11:45 Session 1: Introduction
13:15–15:00 Session 2: R-INLA
15:30–17:00 Session 3: Practical session with R-INLA
Day 2:
09:15–10:00 Session 4: Advanced example
10:30–11:45 Session 5: Spatial modelling with INLA
13:15–15:00 Session 6: Practical session with spatial modelling
In general
— Ask questions!
— Discuss with us!
— If you have questions, you can use the Google group or ...
Outline
Motivation
Bayesian hierarchical models
Latent Gaussian models
Deterministic inference
R and INLA
Why use INLA?
— Provides full Bayesian analysis
— Quick to write code; no need to write a sampler
— Runs quickly
— Can be used for a flexible class of models (latent Gaussian models)
Example: Ski flying records
We have ski flying world records y = (y1, . . . , yn) and their dates x1, . . . , xn, and want to fit a simple linear regression with Gaussian responses, where
E(yi) = µ + βxi, Var(yi) = τ⁻¹, i = 1, . . . , n
[Figure: world record Distance against Year, 1960–2010.]
Frequentist analysis
mod = lm(Length ~ Date, data = skiData)
summary(mod)
[Figure: data with fitted regression line; Distance against Year.]
Estimates: µ: −3986 (66), β: 2.10 (0.03), σ = 1/√τ: 3.98
Bayesian analysis
res = inla(Length ~ Date, data = skiData)
res$summary.fixed[, 1:2]; res$summary.hyperpar
[Figure: posterior marginals. mu: Mean = −3986, SD = 65; beta: Mean = 2.10, SD = 0.03; standard deviation.]
Real-world problems are typically more complicated!
Often we need to
— include complicated dependency structures
— stabilize the inference
Can be achieved with hierarchical Bayesian modelling, but...
Two main challenges:
— Need computationally efficient methods to calculate posteriors
— Select priors in a sensible way
Bayesian hierarchical models
INLA can analyze Bayesian hierarchical models specified in three stages:
Stage 1: What is the distribution of the responses?
Stage 1.5: How is the mean/variance/probability of the response linked to the underlying unobserved components?
Stage 2: What is the distribution of the underlying unobserved components?
Stage 3: What are our prior beliefs about the parameters controlling the components in the model?
Stage 1
How is the data (y) generated from the underlying components (x) and hyperparameters (θ) in the model:
— Gaussian response?
— Count data? (E.g., Poisson, negative binomial)
— Zero-inflation?
— Point pattern? (E.g., log-Gaussian Cox process)
— Binary data?
The response distribution is connected to x and θ through the likelihood π(y | x, θ)
Stage 1.5
In INLA, Stage 1 and Stage 2 must be connected through linear predictors by
π(y | x, θ) = ∏ᵢ₌₁ⁿ π(yi | ηi, θ),
where each ηi is a linear combination of the model components x.
For example, ηi = µ + βxi can be combined with
Gaussian: ηi = µi
Poisson: ηi = log(µi)
Binomial: ηi = logit(pi)
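In R-INLA the same linear predictor can be paired with different likelihoods just by changing the family argument; a minimal sketch, assuming a hypothetical data frame d with a response column y and covariate x (names not from these slides):

```r
library(INLA)

# Same formula eta_i = mu + beta * x_i, three Stage-1 choices
res.gauss <- inla(y ~ x, family = "gaussian", data = d)  # identity link
res.pois  <- inla(y ~ x, family = "poisson",  data = d)  # log link
res.binom <- inla(y ~ x, family = "binomial",
                  Ntrials = 1, data = d)                 # logit link
```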
Stage 2
The underlying unobserved components x are called latent components and can be:
— Covariates
— Unstructured random effects (individual effects, group effects)
— Structured random effects (AR(1), regional effects, continuously indexed spatial effects)
The distribution of the model components is specified by π(x | θ)
Stage 3
The likelihood and the latent model typically have hyperparameters that control their behavior. The hyperparameters θ can include:
— Variance of unstructured effects
— Range and variance of spatial effects
— Autocorrelation parameter
— Variance of observation noise
— Probability of a zero (zero-inflated models)
The a priori beliefs about these parameters are placed in the prior π(θ)
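In R-INLA such priors are set through the hyper argument of the relevant model component; a sketch under assumed names (data frame d, time index t), not taken from these slides:

```r
library(INLA)

# Stage 3: log-gamma prior on the log-precision of an AR(1) component,
# and a prior on (the internal transform of) its lag-one correlation rho
formula <- y ~ 1 + f(t, model = "ar1",
                     hyper = list(
                       prec = list(prior = "loggamma", param = c(1, 0.01)),
                       rho  = list(prior = "normal",   param = c(0, 0.15))))
res <- inla(formula, family = "gaussian", data = d)
```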
Statistical jargon
It can be phrased with equations as
Stage 1: y | x, θ ∼ π(y | x, θ) = ∏ᵢ₌₁ⁿ π(yi | ηi, θ)
Stage 1.5: Each ηi is a linear combination of elements of x
Stage 2: x | θ ∼ π(x | θ)
Stage 3: θ ∼ π(θ)
Example: Disease mapping in Germany
We have observed larynx cancer mortality counts for males in 544 districts of Germany from 1986 to 1990 and want to make a model.
Information given:
yi: the count at location i.
Ei: an offset; expected number of cases in district i.
ci: a covariate (level of smoking consumption) at location i.
si: spatial location i (here, district).
[Map of the districts; colour scale 0.0–2.5.]
Stage 1: The data
First we decide on the likelihood for our data y
— Need a distribution for counts
— We decide to model our responses as
yi | ηi ∼ Poisson(Ei exp(ηi)),
where ηi is a linear function of the latent components
Stage 2: The latent model
We choose four components
— Intercept µ
— Spatially structured effect fs and unstructured effect u
— Covariate effect f(ci) of the exposure covariate ci
Combine with the linear predictor ηi = µ + fs(si) + f(ci) + ui, and the full latent field x = (µ, {fs(·)}, {f(·)}, u1, u2, . . . , un)
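A hedged sketch of how this disease-mapping model might be written in R-INLA, assuming a data frame germany with columns Y, E, region (district index, duplicated as region2 for the iid term) and covariate x; the column names and graph file are assumptions, not from the slides:

```r
library(INLA)

# Stage 2 components: structured spatial effect (Besag/ICAR on the
# district adjacency graph), unstructured district effect (iid),
# and a smooth covariate effect (second-order random walk)
formula <- Y ~ 1 +
  f(region,  model = "besag", graph = "germany.graph") +
  f(region2, model = "iid") +
  f(x, model = "rw2")

# Stage 1: Poisson likelihood with expected counts E_i as offset
res <- inla(formula, family = "poisson", E = E, data = germany)
summary(res)
```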
Stage 3: Hyperparameters
The structured and unstructured spatial effects, as well as the smooth covariate effect, are each controlled by one parameter
— τc, τf, τη: the precisions (inverse variances) of the covariate effect, spatial effect and unstructured effect, respectively.
Hyperparameters θ = (τc, τf, τη) must be given a prior π(τc, τf, τη).
Quantities of interest
[Left: median of the structured spatial effect exp(fs(si)), scale 0.8–1.6. Right: covariate effect f(ci) with 0.025, 0.5 and 0.975 quantiles.]
Latent Gaussian models
A key feature of the example is that it is contained in the very flexible and useful class of models called latent Gaussian models
— The characteristic property is that the latent part of the hierarchical model is Gaussian, x | θ ∼ N(0, Q⁻¹)
  • The expected value is 0
  • The precision matrix (inverse covariance matrix) is Q
Together with the linear predictor restriction, this defines the class of models INLA can handle
The general set-up
The class contains GLMs, GLMMs, GAMs, GAMMs, and more. It can be constructed by connecting the mean µi to the linear predictor, ηi, through a link function g,
ηi = g(µi) = α + ziᵀβ + ui + ∑γ wγ,i fγ(cγ,i), i = 1, 2, . . . , n
where
α: intercept
β: fixed effects of covariates z
u: unstructured error terms
{fγ(·)}: non-linear/smooth effects of covariates c
{wγ,i}: known weights defined for each observed data point
Flexibility through f -functions
The functions {fγ} provide very different types of random effects
— f(time): e.g., an AR(1) process, RW1 or RW2
— f(spatial location): e.g., a Matérn field
— f(covariate): e.g., a RW1 or RW2 on the covariate values
— f(time, spatial location): spatio-temporal effect
— And much more
Additivity
— One of the most useful features of the framework is the additivity
— Effects can easily be removed and added without difficulty
— Each component might add a new latent part and new hyperparameters, but the modelling framework and computations stay the same
Example: Smoothing binary time-series
— Observed the sequence y1, y2, . . . , yn of 0s and 1s
— Each time t has an associated covariate xt
— We want to smooth the time series by inferring the sequence pt, for t = 1, 2, . . . , n, of probabilities of 1s at each time step
Example: Smoothing time series
Stage 1: Bernoulli distribution for the responses
yt | ηt ∼ Bernoulli(exp(ηt) / (1 + exp(ηt)))
Stage 2: Covariates, AR(1) component and random noise are connected to the likelihood by
ηt = β0 + β1xt + at + vt
Stage 3: ρ: dependence parameter of the AR(1) process
σa²: marginal variance of the AR(1) process
σv²: variance of the unstructured error
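A sketch of this model in R-INLA, assuming a hypothetical data frame d with columns y, x and time index t (with a copy t2 for the iid noise term); not taken verbatim from the course code:

```r
library(INLA)

# Stage 1: Bernoulli responses (binomial with Ntrials = 1, logit link)
# Stage 2: eta_t = beta0 + beta1 * x_t + a_t (AR(1)) + v_t (iid noise)
formula <- y ~ 1 + x +
  f(t,  model = "ar1") +   # structured temporal effect a_t
  f(t2, model = "iid")     # unstructured noise v_t

res <- inla(formula, family = "binomial", Ntrials = 1, data = d,
            control.predictor = list(compute = TRUE, link = 1))

# Smoothed probabilities p_t on the response scale
p <- res$summary.fitted.values$mean
```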
Loads of examples
— Dynamic linear models
— Stochastic volatility models
— Generalised linear (mixed) models
— Generalised additive (mixed) models
— Measurement error models
— Spline smoothing
— Semi-parametric regression
— Disease mapping
— Log-Gaussian Cox processes
— Spatio-temporal models
— Survival analysis
— And more!
Computations
Now we have a modelling framework.
But how are the calculations actually done?
It depends on what you want to compute!
What are we interested in?
— Quantiles for the fixed effects
— A linear combination of elements from the latent field (e.g. the average of a spatial effect over an area, or the difference of two effects)
— A single hyperparameter (e.g. the range)
— A non-linear combination of hyperparameters (e.g. breeding values for livestock)
— Predictions at unobserved locations
What do we need to compute?
Often we are interested in the marginal posteriors of the latent field

    π(x_i | y),

or the marginal posteriors of the hyperparameters

    π(θ_i | y),

or the posterior of some other statistic

    π(f(x, θ) | y).

However, these can almost never be computed analytically.
Traditional approach with MCMC
Construct Markov chains with the target posterior distribution as the stationary distribution.
— Extensively used for Bayesian inference since the 1980s
— Flexible and general
— Generic tools exist, such as JAGS or OpenBUGS
— As do more specialised tools for specific model classes, e.g. BayesX
Alternative approach
— MCMC “works” for everything, but it can be incredibly slow
— Is it possible to make a quicker, more specialised inference scheme which only needs to work for this limited class of models?
Our model framework
Latent Gaussian models:

Stage 1: y | x, θ ∼ ∏_i π(y_i | η_i, θ)
Stage 2: x | θ ∼ N(0, Q(θ)^(-1))   Gaussian!
Stage 3: θ ∼ π(θ)

where the precision matrix Q(θ) is sparse. Gaussian distributions with such sparse precision matrices are called Gaussian Markov random fields (GMRFs).

The sparseness can be exploited for very quick computations for the Gaussian part of the model through numerical algorithms for sparse matrices.
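The computational point can be seen already with the Matrix package (distributed with R): the precision matrix of an AR(1) process is tridiagonal, so storage and solves scale with n rather than n². A small sketch (the helper name is illustrative):

```r
library(Matrix)  # sparse-matrix algorithms, distributed with R

# Precision matrix of a stationary AR(1) process with dependence rho
# and unit innovation precision: tridiagonal, hence sparse
ar1.precision <- function(n, rho) {
  Q <- bandSparse(n, k = c(-1, 0, 1),
                  diagonals = list(rep(-rho, n - 1),
                                   c(1, rep(1 + rho^2, n - 2), 1),
                                   rep(-rho, n - 1)))
  forceSymmetric(Q)
}

n <- 1000
Q <- ar1.precision(n, rho = 0.9)

# Only 3n - 2 of the n^2 entries are non-zero
sum(Q != 0)   # 2998

# A sparse Cholesky factorisation makes solves with Q cheap,
# e.g. for computing conditional means in the Gaussian part
L <- Cholesky(Q)
b <- rnorm(n)
mu <- solve(L, b)   # solves Q mu = b
```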
The INLA idea
Directly approximate the posterior marginals
π(xi | y) and π(θj | y)
using the posterior distribution
π(x ,θ | y) ∝ π(θ)π(x | θ)π(y | x ,θ).
Toy example: Smoothing

Observations

    y_i = m(i) + ε_i,   i = 1, ..., n

— ε_i is i.i.d. Gaussian noise with known precision τ_0
— m(i) is an unknown smooth function of i
    n = 50
    idx = 1:n
    fun = 100*((idx - n/2)/n)^3
    y = fun + rnorm(n, mean=0, sd=1)
    plot(idx, y)

[Figure: scatter plot of y against idx, showing the noisy cubic trend.]
Assumed hierarchical model
1. Data: Gaussian observations with known precision τ_0

    y_i | x_i, θ ∼ N(x_i, 1/τ_0)

2. Latent model: A second-order random walk for the underlying smooth function (model="rw2" in INLA)

    π(x | θ) ∝ θ^((n-2)/2) exp( -(θ/2) ∑_{i=3}^{n} (x_i - 2x_{i-1} + x_{i-2})² )

3. Hyperparameter: The smoothing parameter θ is assigned a Γ(a, b) prior

    π(θ) ∝ θ^(a-1) exp(-bθ),   θ > 0
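The toy model can be fitted with a few lines of R-INLA. A sketch (fixing the observation precision at the known value τ_0 = 1 through control.family; the data simulation repeats the earlier slide):

```r
library(INLA)  # from http://www.r-inla.org

n <- 50
idx <- 1:n
y <- 100 * ((idx - n/2) / n)^3 + rnorm(n, mean = 0, sd = 1)

# Latent model: second-order random walk, no intercept
formula <- y ~ -1 + f(idx, model = "rw2")

# control.family fixes the Gaussian observation precision at
# exp(initial) = 1, i.e. the known tau_0
result <- inla(formula, family = "gaussian",
               data = data.frame(y = y, idx = idx),
               control.family = list(hyper = list(
                 prec = list(initial = 0, fixed = TRUE))))

# Posterior mean of the smooth function, overlaid on the data
plot(idx, y)
lines(idx, result$summary.random$idx$mean)
```

The Γ(a, b) prior on θ corresponds to INLA's default log-gamma prior on the log precision of the rw2 term.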
Derivation of posterior marginals (I)

Since

    x, y | θ ∼ N(·, ·)

(derived using π(x, y | θ) ∝ π(y | x, θ) π(x | θ)), we can compute (numerically) all marginals, using that

    π(θ | y) ∝ π(x, y | θ) π(θ) / π(x | y, θ),

where both π(x, y | θ) and π(x | y, θ) are Gaussian.
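For a Gaussian likelihood this identity holds exactly for any value of x, since π(x, y | θ)/π(x | y, θ) = π(y | θ). A base-R check on a deliberately simple stand-in model (iid Gaussian prior x | θ ∼ N(0, I/θ) instead of the rw2 prior; names and prior parameters are illustrative):

```r
set.seed(2)
tau0 <- 1                         # known observation precision
n <- 20
theta.true <- 4
y <- rnorm(n, sd = 1/sqrt(theta.true)) + rnorm(n, sd = 1/sqrt(tau0))

# log pi(theta | y) up to a constant, via the identity
# pi(theta | y) ∝ pi(x, y | theta) pi(theta) / pi(x | y, theta),
# evaluated at the arbitrary point x = 0
log.post <- function(theta, a = 1, b = 0.01, x = rep(0, n)) {
  lj <- sum(dnorm(y, x, 1/sqrt(tau0), log = TRUE)) +   # pi(y | x)
        sum(dnorm(x, 0, 1/sqrt(theta), log = TRUE))    # pi(x | theta)
  q <- theta + tau0                                    # precision of x | y, theta
  mu <- tau0 * y / q                                   # mean of x | y, theta
  lc <- sum(dnorm(x, mu, 1/sqrt(q), log = TRUE))       # pi(x | y, theta)
  lj + dgamma(theta, a, b, log = TRUE) - lc
}

# Cross-check against the directly available marginal likelihood
log.post.direct <- function(theta, a = 1, b = 0.01) {
  sum(dnorm(y, 0, sqrt(1/theta + 1/tau0), log = TRUE)) +
    dgamma(theta, a, b, log = TRUE)
}

thetas <- seq(0.5, 10, by = 0.5)
max(abs(sapply(thetas, log.post) - sapply(thetas, log.post.direct)))  # ~ 0
```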
Posterior marginal for hyperparameter

[Figure: two panels plotting the posterior marginal for theta against log(theta); the left panel shows the unnormalised density, the right panel the normalised density.]
Derivation of posterior marginals (II)

From

    x | y, θ ∼ N(·, ·)

we can compute

    π(x_i | y) = ∫ π(x_i | θ, y) π(θ | y) dθ
               ≈ ∑_k π(x_i | θ_k, y) π(θ_k | y) Δ_k,

where π(x_i | θ, y) is Gaussian, the θ_k, k = 1, ..., K, correspond to representative points of θ | y, and the Δ_k are the corresponding weights.
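The finite sum is just a weighted mixture of Gaussians. A base-R sketch with made-up integration points (all numbers below are illustrative, chosen to resemble the x_1 marginals on the following slides):

```r
# Hypothetical conditional moments of x_1 | theta_k, y at K = 3
# integration points, with weights w_k = pi(theta_k | y) * Delta_k
mu  <- c(-9.8, -9.5, -9.1)   # E[x_1 | theta_k, y]
sdv <- c(0.55, 0.50, 0.60)   # sd[x_1 | theta_k, y]
w   <- c(0.25, 0.50, 0.25)   # weights, normalised to sum to 1

# Approximate posterior marginal of x_1 as the weighted mixture
post.x1 <- function(x) {
  rowSums(sapply(1:3, function(k) w[k] * dnorm(x, mu[k], sdv[k])))
}

xs <- seq(-13, -6, by = 0.01)
dens <- post.x1(xs)
sum(dens) * 0.01   # ~ 1: the mixture is a proper density
plot(xs, dens, type = "l")
```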
Posterior marginal for latent parameters

[Figure: three plots of posterior marginals for x_1: first the conditional densities π(x_1 | θ_k, y) for each θ_k (unweighted), then the same densities weighted by π(θ_k | y)Δ_k, and finally the combined posterior marginal for x_1.]
Fitted spline

The posterior marginals are used to calculate summary statistics, such as means, variances and credible intervals:

[Figure: the data y plotted against idx together with the fitted spline.]
Comparison with maximum likelihood

The red line is the Bayesian posterior, the blue line is the “posterior” using the MLE of θ, and the vertical line is the observed value y_1.

[Figure: the two densities for x_1 overlaid.]
Extensions
This is the basic idea behind INLA.
However, we need to extend this basic idea to deal with
— More than one hyperparameter
— Non-Gaussian observations
Extension: More than one hyperparameter

Step 1: Explore π(θ | y)
— Locate the mode
— Use the Hessian to construct new variables
— Grid search
Step 2: Approximate the marginals based on these integration points
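Step 1 can be sketched in base R on a hypothetical two-dimensional log-posterior (the function, grid spacing and cut-off below are illustrative):

```r
# Illustrative log-posterior in theta = (theta_1, theta_2)
log.post <- function(theta) {
  -0.5 * (theta[1] - 1)^2 - 0.5 * (theta[2] + 0.5)^2 - 0.1 * theta[1] * theta[2]
}

# Locate the mode (and a numerical Hessian at it)
opt <- optim(c(0, 0), function(th) -log.post(th), hessian = TRUE)
mode <- opt$par
H <- opt$hessian                  # negative Hessian of log.post at the mode

# Use the Hessian to construct standardised variables z:
# theta(z) = mode + L z, with solve(H) = L %*% t(L)
L <- t(chol(solve(H)))

# Grid search in z-space, keeping points with non-negligible mass;
# these become the integration points theta_k
zgrid <- as.matrix(expand.grid(z1 = seq(-3, 3), z2 = seq(-3, 3)))
thetas <- t(mode + L %*% t(zgrid))
lp <- apply(thetas, 1, log.post)
keep <- lp > max(lp) - 5
```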
Non-Gaussian observations
— π(x | y, θ) is often very close to a Gaussian distribution even with a non-Gaussian likelihood, and can be replaced with a Laplace approximation
— All the difficult high-dimensional integrals with respect to the latent field are then easy, and only the integrals with respect to the hyperparameters remain
— These integrals can be done efficiently numerically when the number of hyperparameters is low
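The Laplace approximation is easy to visualise in one dimension. A base-R sketch for a Poisson observation with a Gaussian prior on the log-rate (the example and its numbers are illustrative, not from the slides):

```r
# Posterior for the log-rate eta of a Poisson observation y:
# pi(eta | y) ∝ exp(y * eta - exp(eta)) * exp(-0.5 * tau * eta^2)
y <- 7
tau <- 0.5
log.dens <- function(eta) y * eta - exp(eta) - 0.5 * tau * eta^2

# Laplace approximation: Gaussian centred at the mode, with the
# negative curvature of log.dens at the mode as precision
mode <- optimize(log.dens, c(-5, 5), maximum = TRUE)$maximum
prec <- exp(mode) + tau          # -(d^2/deta^2) log.dens at the mode
laplace <- function(eta) dnorm(eta, mode, 1/sqrt(prec))

# Compare with the exact posterior, normalised numerically
etas <- seq(-2, 5, by = 0.001)
exact <- exp(log.dens(etas))
exact <- exact / (sum(exact) * 0.001)
max(abs(exact - laplace(etas)))  # small: the Gaussian fit is close
```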
Limitations
— The dimension of the latent field x can be large (10^2–10^6)
— But the dimension of the hyperparameters θ must be small (≤ 9)

In other words, each random effect can be big, but there cannot be too many random effects unless they share parameters.
How to use INLA?
INLA is implemented through the package INLA in the R software, which
— is the most popular computing language in applied statistics
— is open source and free
— has a lot of packages that extend the base functionality
— has a very user-friendly formula interface

    linear_model <- lm(weight ~ group)

fits the linear model

    weight_i = µ + group_i + ε_i
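inla() reuses this formula syntax. A sketch using the PlantGrowth data set that ships with R (it has exactly these weight and group columns):

```r
library(INLA)  # from http://www.r-inla.org

# Same formula as the lm() call above, but inla() returns full
# posterior summaries instead of point estimates
result <- inla(weight ~ group, family = "gaussian", data = PlantGrowth)
summary(result)
```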
Summary of INLA
Three main ingredients in INLA:
— Latent Gaussian models
— Laplace approximations
— Gaussian Markov random fields

These ingredients lead to a very nice tool for Bayesian inference that is
— fast
— accurate
— well suited to problems of moderate size