
Inverse problems: From Regularization to Bayesian Inference¹

Daniela Calvetti

Case Western Reserve University

¹ Based on the forthcoming WIREs overview article "Bayesian methods in inverse problems"


Inverse problems are everywhere

The solution of an inverse problem is needed when

- recovering unknown causes from indirect, noisy measurements;
- determining poorly known, or unknown, parameters of a mathematical model;
- inferring the interior of a body from non-invasive measurements.


Going against causality

Forward models:

- move in the causal direction, from causes to consequences;
- are usually well-posed: small perturbations in the causes lead to little change in the consequences;
- may be very difficult to solve.

Inverse problems:

- go against causality, mapping consequences back to causes;
- their instability is the price paid for the well-posedness of the associated forward model;
- the more forgiving the forward problem, the more unstable the inverse one.


Instability vs regularization

- Regularization trades the original unstable problem for a nearby, approximate one.
- For a linear inverse problem with forward model f(x) = Ax, instability ⇐⇒ a wide range of singular values of A.
- Division by small singular values amplifies the noise in the data.
- Lanczos (1958) replaces A with a low-rank approximation prior to inversion.
- A similar idea was later formulated as Truncated SVD (TSVD) regularization (1987); a small numerical sketch follows.
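The sketch below is a minimal illustration of the TSVD idea, assuming a generic ill-conditioned matrix A and noisy data b (the test problem and all names are illustrative, not from the talk): only the k largest singular values are inverted, and the components associated with small singular values, which would amplify the noise, are discarded.

```python
import numpy as np

def tsvd_solve(A, b, k):
    """Truncated SVD solution: invert only the k largest singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T   # keep the k dominant singular triplets
    return Vk @ ((Uk.T @ b) / sk)               # small singular values are never inverted

# Illustrative ill-conditioned test problem.
n = 50
A = np.vander(np.linspace(0, 1, n), n, increasing=True)   # severely ill-conditioned
x_true = np.sin(2 * np.pi * np.linspace(0, 1, n))
b = A @ x_true + 1e-6 * np.random.randn(n)
x_tsvd = tsvd_solve(A, b, k=10)
```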


Tikhonov regularization

- Tikhonov (1960s) leaves the forward model intact.
- The solution is penalized for traits caused by amplified noise:

  xTik = argmin { ‖b − f(x)‖² + α G(x) }.

- Under suitable conditions on f and G, the solution exists and is unique.
- G(x) monitors the growth of the instability.
- α > 0 sets the relative weights of the fidelity and penalty terms.

The Tikhonov penalty requires an a priori belief about the solution!
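For a linear forward model f(x) = Ax and a quadratic penalty G(x) = ‖Lx‖², the Tikhonov solution can be computed by stacking fidelity and penalty into a single least-squares problem. A minimal sketch (the helper name and the choice L = I are illustrative assumptions):

```python
import numpy as np

def tikhonov_solve(A, b, L, alpha):
    """Minimize ||b - A x||^2 + alpha * ||L x||^2 via a stacked least-squares problem."""
    A_aug = np.vstack([A, np.sqrt(alpha) * L])
    b_aug = np.concatenate([b, np.zeros(L.shape[0])])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

# Zeroth-order Tikhonov regularization, i.e. L = I:
# x_tik = tikhonov_solve(A, b, np.eye(A.shape[1]), alpha=1e-3)
```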


Regularization operator

Standard choices of the regularization operator are

- G(x) = ‖Lx‖², with L = I, to penalize growth in the Euclidean norm of the solution;
- G(x) = ‖Lx‖², with L a discrete first-order differencing operator, to penalize growth in the gradient;
- G(x) = ‖Lx‖², with L the discrete Laplacian, to penalize growth in the curvature;
- G(x) = ‖x‖₁, to favor sparse solutions, related to compressed sensing;
- the total variation (TV) of x, to favor blocky solutions.

The discrete differencing operators are easy to assemble; see the sketch below.
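A minimal sketch of the first- and second-order differencing operators mentioned above, assembled as sparse matrices on a regular one-dimensional grid (the sign and boundary conventions are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import diags

def first_difference(n):
    """Discrete first-order differencing: (L1 x)_j = x_j - x_{j-1}."""
    return diags([np.ones(n), -np.ones(n - 1)], [0, -1], shape=(n, n))

def discrete_laplacian(n):
    """Discrete 1D Laplacian stencil (-1, 2, -1), as used for curvature penalties."""
    return diags([2 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1], shape=(n, n))

# L0 = np.eye(100); L1 = first_difference(100); L2 = discrete_laplacian(100)
```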


Regularization parameter

- The Tikhonov regularized solution depends heavily on the value of α.
- In the limit α → 0, one solves an output least squares problem.
- If α is very large, the solution may not predict the data well.
- Morozov discrepancy principle: if the norm of the noise, ‖e‖ ≈ δ, is known, α = α(δ) should be chosen so that

  ‖b − f(xα)‖ = δ.

- If the norm of the noise is not known, the selection of α can be based on heuristic criteria.

A sketch of the discrepancy principle in practice follows.
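A minimal sketch of Morozov's discrepancy principle for the linear Tikhonov problem, reusing the tikhonov_solve helper from the earlier sketch; the scalar equation ‖b − A xα‖ = δ is solved by bisection on log α, assuming δ lies between the minimal attainable residual and ‖b‖ (all names and bounds are illustrative):

```python
import numpy as np

def discrepancy_alpha(A, b, L, delta, log_lo=-12.0, log_hi=4.0, iters=60):
    """Bisection on log10(alpha) so that ||b - A x_alpha|| matches the noise level delta."""
    def excess(log_alpha):
        x = tikhonov_solve(A, b, L, 10.0 ** log_alpha)
        return np.linalg.norm(b - A @ x) - delta
    lo, hi = log_lo, log_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The residual norm grows monotonically with alpha, so the sign brackets the root.
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 10.0 ** (0.5 * (lo + hi))
```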


Semi-convergence and stopping rules

Given the linear system

  Ax = b,

and x0 = 0, a Krylov subspace least squares solver generates a sequence of approximate solutions x0, x1, …, xk, …, with

  xk ∈ Kk(AᵀA, Aᵀb) = span{ Aᵀb, (AᵀA)Aᵀb, …, (AᵀA)^(k−1) Aᵀb }.

- If Ax = b is a discrete inverse problem, the iterates first approach the solution, then diverge from it (semi-convergence).
- If the residuals dk = b − Axk form a decreasing sequence, iterate until

  ‖b − Axk‖ ≤ δ.

A CGLS sketch with this stopping rule follows.
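Below is a minimal CGLS (conjugate gradients on the normal equations) sketch with the discrepancy stopping rule above; the iteration is terminated as soon as ‖b − Axk‖ ≤ δ, before the semi-convergent iterates start picking up amplified noise (standard CGLS recursion, illustrative naming):

```python
import numpy as np

def cgls(A, b, delta, max_iter=200):
    """CGLS iterations for min ||b - A x||, stopped by the discrepancy principle."""
    x = np.zeros(A.shape[1])
    r = b.copy()                 # residual b - A x (x = 0 initially)
    s = A.T @ r
    p = s.copy()
    gamma = s @ s
    for _ in range(max_iter):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        if np.linalg.norm(r) <= delta:    # discrepancy stopping rule
            break
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x
```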


[Diagram: ingredients of classical regularization. Tikhonov regularization combines a fidelity term (data, forward model) with a penalty term (regularization operator built from a priori information on solution features and smoothness), weighted by a regularization parameter tied to the noise level; iterative regularization combines a fidelity term with a stopping rule based on the noise level. Both return a single solution.]


The Bayesian framework for inverse problems

- Model the unknown as a random variable X, and describe it via a distribution, not a single value.
- Model the observed data as a random variable B, describing its uncertainty.
- Stochastic extension of the model b = f(x) + e:

  B = f(X) + E.

- If X and E are mutually independent, the likelihood is of the form

  πB|X(b | x) = πE(b − f(x)).


Bayes' formula

Update the belief about X by the information in the observation B = b_observed using Bayes' formula,

  πX|B(x | b) = πB|X(b | x) πX(x) / πB(b),   b = b_observed,

where

  πB(b) = ∫_Rn πB|X(b | x) πX(x) dx

is a normalization factor.

This is the complete solution of the inverse problem!
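For a scalar unknown, Bayes' formula can be evaluated directly on a grid. A minimal sketch, assuming a quadratic forward model f(x) = x², Gaussian noise, and a Gaussian prior (all choices illustrative, not from the talk); the normalization factor πB(b) is computed by numerical quadrature:

```python
import numpy as np

sigma, gamma = 0.1, 1.0                     # assumed noise and prior standard deviations
b_observed = 0.8
x_grid = np.linspace(-3.0, 3.0, 2001)
dx = x_grid[1] - x_grid[0]

likelihood = np.exp(-0.5 * (b_observed - x_grid**2) ** 2 / sigma**2)   # pi_{B|X}(b | x)
prior = np.exp(-0.5 * x_grid**2 / gamma**2)                            # pi_X(x)

unnormalized = likelihood * prior
pi_B = unnormalized.sum() * dx              # normalization factor pi_B(b)
posterior = unnormalized / pi_B             # posterior density on the grid
```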


The art and science of designing priors

A prior is the initial description of the unknown X before considering the data. There are different types of priors:

- smoothness priors (0th, 1st and 2nd order);
- correlation priors;
- structural priors;
- sparsity promoting priors;
- sample-based priors.


Smoothness priors: 1st order

Let

  Xj − Xj−1 = γWj,  Wj ∼ N(0, 1),  1 ≤ j ≤ n,

or, collectively,

  L1X = γW,  W ∼ N(0, In),

where L1 is the n × n bidiagonal matrix

  L1 = [  1
         −1   1
              ⋱   ⋱
                  −1   1 ],

leading to the Gaussian prior model

  πX(x) ∝ exp( −(1/(2γ²)) ‖L1x‖² ),  x ∈ Rn.
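Since L1X = γW with W standard white noise, draws from this prior can be generated by solving one bidiagonal system per noise realization. A minimal sketch, reusing the first_difference helper from the earlier sketch (names illustrative):

```python
import numpy as np
from scipy.sparse.linalg import spsolve

def sample_first_order_prior(n, gamma, n_draws=5):
    """Draws from the 1st-order smoothness prior: solve L1 x = gamma * w for white noise w."""
    L1 = first_difference(n).tocsc()
    W = np.random.randn(n, n_draws)
    return np.column_stack([spsolve(L1, gamma * W[:, k]) for k in range(n_draws)])
```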


Smoothness priors: 2nd order

Let

  2Xj − (Xj−1 + Xj+1) = γWj,  Wj ∼ N(0, 1),  1 ≤ j ≤ n − 1.

Collectively,

  L2X = γW,  W ∼ N(0, In−1),

where L2 is the tridiagonal matrix

  L2 = [  2  −1
         −1   2   ⋱
               ⋱   ⋱  −1
                  −1   2 ].

This leads to the Gaussian prior

  πX(x) ∝ exp( −(1/(2γ²)) ‖L2x‖² ),  x ∈ Rn−1.


Whittle–Matérn priors

- Let D be a discrete approximation of the Laplacian in Ω ⊂ Rn.
- A prior for X defined in Ω can be described by the stochastic model

  (−D + λ⁻²I)^β X = γW,

- where γ, β > 0, λ > 0 is the correlation length, and W is a white noise process. (A one-dimensional sketch follows.)
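A minimal one-dimensional sketch of this construction with integer exponent β, reusing the discrete_laplacian helper from the earlier sketch; draws are obtained by repeatedly solving (−D + λ⁻²I) x = γw for white noise realizations w. Grid spacing, boundary treatment, and parameter values are all illustrative assumptions:

```python
import numpy as np
from scipy.sparse import identity
from scipy.sparse.linalg import spsolve

def sample_whittle_matern_1d(n, lam, gamma, beta=1):
    """Draw from a 1D Whittle-Matern type prior: (-D + lam^{-2} I)^beta x = gamma * w."""
    h = 1.0 / (n - 1)
    D = -discrete_laplacian(n) / h**2              # discrete Laplacian on a uniform grid
    K = (-D + identity(n) / lam**2).tocsc()        # shifted (negative) Laplacian
    x = gamma * np.random.randn(n)
    for _ in range(beta):                          # apply the inverse operator beta times
        x = spsolve(K, x)
    return x

# e.g. draw = sample_whittle_matern_1d(n=200, lam=0.1, gamma=1.0, beta=1)
```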


Structural priors

- The solution is expected to have a jump across structure boundaries.
- Within the structures, the smoothness prior is a valid description.
- Model:

  Xj − Xj−1 = γjWj,  Wj ∼ N(0, 1),  1 ≤ j ≤ n,

  where γj > 0 is small if we expect xj to be near xj−1, and large where xj could differ significantly from xj−1.

- With Γ = diag(γ1, …, γn), the prior is

  πstruct(x) ∝ exp( −½ ‖Γ⁻¹L1x‖² ).

- Examples (a one-dimensional sketch follows):
  - geophysical sounding problems with known sediment layers;
  - electrical impedance tomography with known anatomy.
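A minimal one-dimensional sketch of the structural prior, assuming the locations of the expected jumps are known a priori: the standard deviations γj are small inside the structures and large across the assumed boundaries, and draws are generated exactly as for the smoothness prior (all names and numbers are illustrative):

```python
import numpy as np
from scipy.sparse.linalg import spsolve

def sample_structural_prior(n, jump_indices, gamma_smooth=0.01, gamma_jump=1.0, n_draws=3):
    """Draws from the structural prior: increments x_j - x_{j-1} = gamma_j * w_j."""
    gammas = np.full(n, gamma_smooth)
    gammas[list(jump_indices)] = gamma_jump       # large variance where jumps are expected
    L1 = first_difference(n).tocsc()              # helper from the earlier sketch
    W = np.random.randn(n, n_draws)
    return np.column_stack([spsolve(L1, gammas * W[:, k]) for k in range(n_draws)])

# e.g. draws = sample_structural_prior(200, jump_indices=[70, 140])
```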

Page 18: Daniela Calvetti Case Western Reserve University...From Regularization to Bayesian Inference1 Daniela Calvetti Case Western Reserve University 1based on forthcoming WIRES overview

0 0.2 0.4 0.6 0.8 10

5

10

15

20

25

30

0 0.2 0.4 0.6 0.8 10

5

10

15

20

25

30

0 0.2 0.4 0.6 0.8 10

5

10

15

20

25

30

0 0.2 0.4 0.6 0.8 1−0.06

−0.04

−0.02

0

0.02

0.04

0 0.2 0.4 0.6 0.8 1−0.06

−0.04

−0.02

0

0.02

0.04

0 0.2 0.4 0.6 0.8 1−0.06

−0.04

−0.02

0

0.02

0.04

Page 19: Daniela Calvetti Case Western Reserve University...From Regularization to Bayesian Inference1 Daniela Calvetti Case Western Reserve University 1based on forthcoming WIRES overview

Sparsity promoting priors

These priors:

- assume very small values for most entries, while allowing isolated, sudden changes;
- satisfy

  πX(x) ∝ exp( −α‖x‖p );

- ℓp-priors are non-Gaussian, introducing computational challenges.


Sample-based priors

- Assume we can build a generative model that produces typical solutions.
- Generate a sample of typical solutions,

  X = { x^(1), x^(2), …, x^(N) }.

- Regard the solutions in X as independent draws from the approximate prior.
- Approximate the prior by a Gaussian (a computational sketch follows),

  πX(x) ∝ exp( −½ (x − x̄)ᵀ Γ⁻¹ (x − x̄) ),

  where

  x̄ = (1/N) Σ_{j=1}^N x^(j),   Γ = (1/N) Σ_{j=1}^N (x^(j) − x̄)(x^(j) − x̄)ᵀ + δ²I.

Solving inverse problems the Bayesian way

The main steps in solving inverse problems are:

- constructing an informative and computationally feasible prior;
- constructing the likelihood using the forward model and information about the noise;
- forming the posterior density using Bayes' formula;
- extracting information from the posterior.


[Diagram: the Bayesian workflow. A priori belief (smoothness, structure, sparsity, sample) feeds the prior density; the data, forward model, noise model, and modeling error feed the likelihood; Bayes' formula combines the two into the posterior density, from which either a single point estimate or a posterior sample is extracted.]


The pure Gaussian case

Assume a linear forward model f(x) = Ax, Gaussian noise

  E ∼ N(0, C),  i.e.  πE(e) = (2π)^(−m/2) det(C)^(−1/2) exp( −½ eᵀC⁻¹e ),

and a zero-mean Gaussian prior with covariance G,

  πX(x) ∝ exp( −½ xᵀG⁻¹x ).

Then the posterior is Gaussian,

  πX|B(x | b) ∝ exp( −½ (x − x̄)ᵀ B⁻¹ (x − x̄) ),

where the posterior mean and covariance are given by

  x̄ = (G⁻¹ + AᵀC⁻¹A)⁻¹ AᵀC⁻¹ b,
  B = (G⁻¹ + AᵀC⁻¹A)⁻¹.

A computational sketch of these formulas follows.
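A minimal numerical sketch of the posterior formulas above, assuming A, b, the noise covariance C, and the prior covariance G are given as dense arrays (illustrative; for large problems one would factor rather than invert):

```python
import numpy as np

def gaussian_posterior(A, b, C, G):
    """Posterior mean and covariance for a linear model with Gaussian noise and prior."""
    C_inv_A = np.linalg.solve(C, A)                   # C^{-1} A
    precision = np.linalg.inv(G) + A.T @ C_inv_A      # G^{-1} + A^T C^{-1} A
    cov_post = np.linalg.inv(precision)               # posterior covariance B
    mean_post = cov_post @ (A.T @ np.linalg.solve(C, b))
    return mean_post, cov_post
```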


Regularization and Bayes' formula

Assume

- a scaled white noise model, C = σ²I;
- a zero-mean Gaussian prior with covariance matrix G, where

  G⁻¹ = LᵀL.

Then the posterior is of the form

  πX|B(x | b) ∝ exp( −(1/(2σ²)) ( ‖b − f(x)‖² + σ²‖Lx‖² ) ).

The Tikhonov regularized solution with penalty G(x) = ‖Lx‖² is the Maximum A Posteriori (MAP) estimate of this posterior,

  xα = argmax πX|B(x | b),  with α = σ².


Hierarchical Bayes models and sparsity

- Conditionally Gaussian prior,

  πX|Θ(x | θ) = (2π)^(−n/2) exp( −½ Σ_{j=1}^n xj²/θj − ½ Σ_{j=1}^n log θj ).

- The solution is sparse, or close to sparse: most of the variances θj are small, with some outliers.
- Gamma hyperpriors,

  πΘ(θ) ∝ Π_{j=1}^n θj^(β−1) e^(−θj/θ0),

  with β > 0 and θ0 the shape and scaling parameters.
- Posterior distribution:

  πX,Θ|B(x, θ | b) ∝ πΘ(θ) πX|B,Θ(x | b, θ).

Efficient quasi-MAP estimate

To find a point estimate for the pair (x, θ) ∈ Rn × Rn₊, alternate the following two steps (a computational sketch follows):

- Given the current estimate θ^(k) ∈ Rn₊, update x ∈ Rn by minimizing

  E(x | θ^(k)) = (1/(2σ²)) ‖b − f(x)‖² + ½ Σ_{j=1}^n xj²/θj^(k) ;

- Given the updated x^(k), update θ^(k) → θ^(k+1) by minimizing

  E(θ | x^(k)) = ½ Σ_{j=1}^n xj²/θj + η Σ_{j=1}^n log θj + Σ_{j=1}^n θj/θ0,   η = β − 3/2.
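A minimal sketch of this alternating scheme for a linear forward model f(x) = Ax. The x-update is a stacked least-squares problem; for the θ-update, setting the derivative of E(θ | x) with respect to each θj to zero gives the quadratic θj² + ηθ0θj − θ0xj²/2 = 0, whose positive root is used below. The closed-form update and the parameter values are assumptions for illustration, not taken verbatim from the talk:

```python
import numpy as np

def quasi_map(A, b, sigma, beta, theta0, n_iter=30):
    """Alternating quasi-MAP sketch: least-squares x-update, closed-form theta-update."""
    m, n = A.shape
    eta = beta - 1.5
    theta = np.full(n, theta0)
    for _ in range(n_iter):
        # x-update: minimize ||b - A x||^2 / (2 sigma^2) + 0.5 * sum(x_j^2 / theta_j)
        A_aug = np.vstack([A / sigma, np.diag(1.0 / np.sqrt(theta))])
        b_aug = np.concatenate([b / sigma, np.zeros(n)])
        x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
        # theta-update: positive root of theta_j^2 + eta*theta0*theta_j - theta0*x_j^2/2 = 0
        theta = 0.5 * theta0 * (-eta + np.sqrt(eta**2 + 2.0 * x**2 / theta0))
        theta = np.maximum(theta, 1e-12)          # guard against exact zeros
    return x, theta
```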


Bayes meets Krylov...

When the problems are large and computing time is at a premium:

- the least squares problem in step 1 (the x-update) can be solved by CGLS with a suitable stopping rule;
- a right preconditioner accounts for the information in the prior;
- a left preconditioner accounts for the observational and modeling errors;
- the quasi-MAP estimate is computed very rapidly via an inner-outer iteration scheme (a sketch of the preconditioning idea follows);
- a number of open questions about the success of the partnership are currently under investigation.
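One way to realize the left and right preconditioning described above, sketched under the assumptions C = σ²I and a Gaussian prior with G⁻¹ = LᵀL (an illustrative reading of the idea, not necessarily the exact scheme used in the talk): whiten the residual by 1/σ, change variables to w = Lx, run the CGLS routine from the earlier sketch on the transformed system, and map back.

```python
import numpy as np

def priorconditioned_cgls(A, b, L, sigma, delta):
    """Left-whiten by 1/sigma, right-precondition by L^{-1}, solve with CGLS, map back."""
    L_inv = np.linalg.inv(L)                        # for illustration only; factor in practice
    A_tilde = (A @ L_inv) / sigma                   # whitened, priorconditioned operator
    w = cgls(A_tilde, b / sigma, delta / sigma)     # CGLS sketch from the stopping-rule slide
    return L_inv @ w                                # back to the original variable x
```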


The Bayesian horizon is wide open

When solving inverse problems in the Bayesian framework:

- we can compensate for uncertainties in the forward models (see Ville's talk);
- we can explore the posterior (see John's talk);
- we can generalize classical regularization techniques to account for all sorts of uncertainties.