A hand-waving introduction to sparsity for compressed tomography reconstruction
Dense slides. For future reference: http://www.slideshare.net/GaelVaroquaux
A hand-waving introduction to sparsity for compressed tomography reconstruction
Gael Varoquaux and Emmanuelle Gouillart
1 Sparsity for inverse problems
2 Mathematical formulation
3 Choice of a sparse representation
4 Optimization algorithms
G Varoquaux 2
1 Sparsity for inverse problems
Problem setting
Intuitions
1 Tomography reconstruction: a linear problem
y = A x
[Figure: projection data y and reconstructed image x]
y ∈ R^n, A ∈ R^(n×p), x ∈ R^p
n ∝ number of projections; p: number of pixels in the reconstructed image
We want to find x knowing A and y
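The linear model y = A x can be made concrete on a toy parallel-beam geometry. A minimal numpy sketch (the 4×4 image, the two projection angles, and all names are illustrative assumptions, not the slides' acquisition setup):

```python
import numpy as np

# Toy parallel-beam tomography on a 4x4 image: projections at 0 and 90
# degrees are column sums and row sums.  Each measurement is a linear
# function of the pixels, so the acquisition is y = A x with x the
# flattened image.
N = 4
p = N * N                         # number of pixels
A = np.zeros((2 * N, p))          # n = 8 measurements, p = 16 unknowns
for j in range(N):
    A[j, j::N] = 1.0              # column sums (projection at 0 degrees)
for i in range(N):
    A[N + i, i * N:(i + 1) * N] = 1.0   # row sums (projection at 90 degrees)

x = np.zeros((N, N))
x[1:3, 1:3] = 1.0                 # a small square object
y = A @ x.ravel()                 # the measured projections
```

With n = 8 measurements for p = 16 unknowns, the system is under-determined: exactly the small-n regime discussed next.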
1 Small n: an ill-posed linear problem
y = A x admits multiple solutions
The sensing operator A has a large null space: images that give null projections. In particular, it is blind to high spatial frequencies.
With a large number of projections the problem becomes ill-conditioned instead: A is "short-sighted" rather than blind ⇒ it captures noise on those components
1 A toy example: spectral analysis
Recovering the frequency spectrum
[Figure: a signal and its frequency spectrum]
signal = A · frequencies
1 A toy example: spectral analysis
Sub-sampling
[Figure: the sub-sampled signal and its frequency spectrum]
signal = A · frequencies
The recovery problem becomes ill-posed
1 A toy example: spectral analysis
Problem: aliasing
[Figure: the sub-sampled signal and its aliased frequency spectrum]
Information in the null-space of A is lost
Solution: incoherent measurements
[Figure: the incoherently sampled signal and its frequency spectrum]
i.e. a careful choice of the null-space of A
1 A toy example: spectral analysis
Incoherent measurements, but scarcity of data
[Figure: the sampled signal, its frequency spectrum, and the sparse frequency spectrum]
The null-space of A is spread out in frequency. Not much data ⇒ a large null-space, which captures "noise".
Impose sparsity: find a small number of frequencies to explain the signal
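The toy setting above can be written out numerically. A minimal sketch of "signal = A · frequencies" with incoherent (random) sub-sampling; the real cosine dictionary, the sizes, and the sampling pattern are illustrative assumptions:

```python
import numpy as np

# "signal = A . frequencies": A maps frequency amplitudes to time samples.
# A real-valued cosine dictionary stands in for the Fourier basis here.
p = 64
t = np.arange(p)
A_full = np.cos(2.0 * np.pi * np.outer(t, np.arange(p)) / p)

freqs = np.zeros(p)
freqs[[3, 17]] = 1.0             # sparse spectrum: 2 active frequencies
signal = A_full @ freqs

# Incoherent measurements: keep a few random time samples (rows of A)
rng = np.random.default_rng(0)
keep = rng.choice(p, size=16, replace=False)
A = A_full[keep]                 # n = 16 << p = 64: ill-posed without sparsity
y = signal[keep]
```

The sub-sampled operator A has 16 rows for 64 unknowns, yet the spectrum has only 2 non-zeros: the regime where sparse recovery can succeed.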
1 And for tomography reconstruction?
[Figure: original image, non-sparse reconstruction, and sparse reconstruction]
128 × 128 pixels, 18 projections
http://scikit-learn.org/stable/auto_examples/applications/plot_tomography_l1_reconstruction.html
1 Why does it work: a geometric explanation
Two coefficients of x not in the null-space of A:
[Figure: the constraint line y = A x in the (x1, x2) plane, with the true solution xtrue on an axis]
The sparsest solution lies on the blue cross (the union of the axes). It corresponds to the true solution (xtrue) if the slope is > 45°.
The cross can be replaced by its convex hull.
In high dimension: a large acceptable set
Recovery of sparse signal
Null space of the sensing operator incoherent with the sparse representation
⇒ excellent sparse recovery with few projections
Minimum number of observations necessary: n_min ∼ k log p, with k the number of non-zeros [Candes 2006]
Rmk: the theory is for i.i.d. samples
Related to "compressive sensing"
2 Mathematical formulation
Variational formulation
Introduction of noise
2 Maximizing the sparsity
ℓ0(x) = number of non-zeros
min_x ℓ0(x)  s.t.  y = A x
[Figure: the line y = A x in the (x1, x2) plane and the sparsest solution xtrue]
"Matching pursuit" problem [Mallat, Zhang 1993]
"Orthogonal matching pursuit" [Pati, et al 1993]
Problem: non-convex optimization
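Orthogonal matching pursuit attacks the ℓ0 problem greedily: pick the column of A most correlated with the residual, refit by least squares on the selected support, repeat. A minimal numpy sketch (the random test problem is an illustrative assumption):

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit sketch [Pati, et al 1993]: greedy
    support selection followed by a least-squares refit on the
    selected columns."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Column most correlated with the current residual
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# Noiseless recovery of a 3-sparse vector from 30 random projections
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
x_hat = omp(A, A @ x_true, k=3)
```

The greedy selection sidesteps the combinatorial ℓ0 search but can pick a wrong atom; the convex ℓ1 relaxation of the next slide avoids this failure mode.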
2 Maximizing the sparsity
ℓ1(x) = Σ_i |x_i|
min_x ℓ1(x)  s.t.  y = A x
[Figure: the line y = A x touching the ℓ1 ball at the sparse solution xtrue]
"Basis pursuit" [Chen, Donoho, Saunders 1998]
2 Modeling observation noise
y = A x + e,  e = observation noise
New formulation: min_x ℓ1(x)  s.t.  ‖y − A x‖₂² ≤ ε²
Equivalent: "Lasso estimator" [Tibshirani 1996]
min_x ‖y − A x‖₂² + λ ℓ1(x)
(first term: data fit; second term: penalization)
[Figure: contours of the data-fit term meeting the ℓ1 ball in the (x1, x2) plane]
Rmk: the kink in the ℓ1 ball creates sparsity
2 Probabilistic modeling: Bayesian interpretation
P(x|y) ∝ P(y|x) P(x)   (*)
P(x|y): the "posterior", the quantity of interest; P(y|x): the forward model; P(x): the "prior", our expectations on x
Forward model: y = A x + e, with e Gaussian noise ⇒ P(y|x) ∝ exp(−1/(2σ²) ‖y − A x‖₂²)
Prior: Laplacian, P(x) ∝ exp(−(1/µ) ‖x‖₁)
Negated log of (*): 1/(2σ²) ‖y − A x‖₂² + (1/µ) ℓ1(x)
The maximum of the posterior is the Lasso estimate. Note that this picture is limited: the Lasso is not a good Bayesian estimator for the Laplace prior [Gribonval 2011].
3 Choice of a sparse representation
Sparse in wavelet domain
Total variation
3 Sparsity in wavelet representation
Typical images are not sparse
[Figure: Haar decomposition of an image, levels 1 to 6]
⇒ Impose sparsity in the Haar representation
A → A H, where H is the Haar transform
[Figure: original image, non-sparse reconstruction, reconstruction sparse in pixels, reconstruction sparse in Haar]
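To see why the Haar representation helps, here is a single level of the 2-D Haar transform, hand-rolled as a minimal sketch (a real reconstruction would use a full multi-level transform, e.g. from PyWavelets): a piecewise-constant image has few non-zero Haar coefficients.

```python
import numpy as np

def haar_level(img):
    """One level of the orthonormal 2-D Haar transform:
    average + detail along rows, then along columns."""
    a = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2)   # row approximations
    d = (img[:, 0::2] - img[:, 1::2]) / np.sqrt(2)   # row details
    rows = np.hstack([a, d])
    a2 = (rows[0::2] + rows[1::2]) / np.sqrt(2)      # column approximations
    d2 = (rows[0::2] - rows[1::2]) / np.sqrt(2)      # column details
    return np.vstack([a2, d2])

# A piecewise-constant image is not sparse in pixels, but its Haar
# detail coefficients vanish inside the constant regions.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
coefs = haar_level(img)
```

Here the image has 16 non-zero pixels but only 4 non-zero Haar coefficients, and since the transform is orthonormal the Euclidean norm is preserved.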
3 Total variation
[Figure: original image, with the reconstruction and its error for the Haar wavelet vs the TV penalization]
Impose a sparse gradient: min_x ‖y − A x‖₂² + λ Σ_i ‖(∇x)_i‖₂
ℓ12 norm: the ℓ1 norm of the gradient magnitude
Sets the x and y components of the gradient to zero jointly
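The ℓ12 penalty of the slide can be computed directly. A minimal sketch with forward differences (the 8×8 test image is an illustrative assumption):

```python
import numpy as np

def tv(img):
    """Isotropic total variation: the l1 norm of the gradient magnitude
    (the l12 norm of the slide), using forward differences."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:-1] = np.diff(img, axis=0)       # vertical differences
    gy[:, :-1] = np.diff(img, axis=1)    # horizontal differences
    # l1 norm of the per-pixel gradient magnitude
    return np.sqrt(gx ** 2 + gy ** 2).sum()

img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0                      # piecewise constant: small TV
```

A constant image has zero TV, a piecewise-constant image has TV proportional to the length of its edges, and adding noise increases TV everywhere, which is why penalizing it favors flat regions with sharp boundaries.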
3 Total variation + interval
[Figure: original image, with the reconstruction and its error for the TV penalization vs TV + interval]
Bound x in [0, 1]: min_x ‖y − A x‖₂² + λ Σ_i ‖(∇x)_i‖₂ + I(x ∈ [0, 1])
[Figure: histograms of the reconstructed values in [0, 1] for TV and TV + interval]
Rmk: the constraint does more than folding the values outside of the range back in.
Analysis vs synthesis
Wavelet basis: min ‖y − A H x‖₂² + ‖x‖₁, with H the wavelet transform ("synthesis" formulation)
Total variation: min ‖y − A x‖₂² + ‖D x‖₁, with D the spatial derivation operator (∇) ("analysis" formulation)
Theory and algorithms are easier for the synthesis formulation. The two are equivalent iff D is invertible.
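The analysis operator D can be written down explicitly: in 1-D it is the forward-difference matrix, and ‖Dx‖₁ is small exactly for piecewise-constant signals (the example vector is an illustrative assumption):

```python
import numpy as np

p = 8
# Forward-difference ("analysis") operator: row i is -1 at i, +1 at i+1,
# so D x stacks the successive differences of x.
D = np.eye(p - 1, p, k=1) - np.eye(p - 1, p)

x = np.array([0., 0., 0., 1., 1., 1., 2., 2.])   # piecewise constant
tv_penalty = np.abs(D @ x).sum()                  # l1 norm of the gradient
```

Note that D is (p − 1) × p, hence not invertible: this is why the analysis and synthesis formulations are not equivalent for TV.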
4 Optimization algorithms
Non-smooth optimization ⇒ "proximal operators"
4 Smooth optimization fails!
[Figure: energy vs iterations for gradient descent, and its path in the (x1, x2) plane]
Smooth optimization fails in non-smooth regions. These are precisely the spots that interest us.
4 Iterative Shrinkage-Thresholding Algorithm
Setting: min f + g; f smooth, g non-smooth, f and g convex, ∇f L-Lipschitz
Typically f is the data-fit term and g the penalty, e.g. for the Lasso: f(x) = 1/(2σ²) ‖y − A x‖₂², g(x) = (1/µ) ℓ1(x)
Minimize successively: (quadratic approximation of f) + g
f(x) ≤ f(y) + ⟨x − y, ∇f(y)⟩ + (L/2) ‖x − y‖₂²
Proof sketch: by convexity f(y) ≤ f(x) + ⟨∇f(y), y − x⟩; in the second term write ∇f(y) = ∇f(x) + (∇f(y) − ∇f(x)), and upper-bound the last term with the Lipschitz continuity of ∇f.
x_{k+1} = argmin_x g(x) + (L/2) ‖x − (x_k − (1/L) ∇f(x_k))‖₂²   [Daubechies 2004]
Step 1: a gradient-descent step on f
Step 2: the proximal operator of g: prox_{λg}(x) := argmin_y ‖y − x‖₂² + λ g(y)
A generalization of the Euclidean projection on the convex set {x, g(x) ≤ 1}. Rmk: if g is the indicator function of a set S, the proximal operator is the Euclidean projection on S.
prox_{λℓ1}(x)_i = sign(x_i) (|x_i| − λ)₊: "soft thresholding"
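Putting the two steps together for the Lasso gives a compact ISTA implementation. A minimal numpy sketch (the step size, the penalty value, and the random test problem are illustrative assumptions):

```python
import numpy as np

def soft_threshold(v, t):
    """prox of t * ||.||_1: soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=3000):
    """ISTA for the Lasso min_x ||y - A x||_2^2 + lam * ||x||_1:
    alternate a gradient step on the data fit with soft thresholding."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2     # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y)      # gradient of the data-fit term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Noiseless recovery of a 3-sparse vector from 50 random projections
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
x_hat = ista(A, A @ x_true, lam=0.1)
```

Each iteration costs two matrix-vector products plus an elementwise threshold, and the thresholding is what sets most coefficients exactly to zero.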
4 Iterative Shrinkage-Thresholding Algorithm
[Figure: energy vs iterations for gradient descent and ISTA; in the (x1, x2) plane, ISTA alternates gradient-descent steps on f with projections on the ℓ1 ball]
4 Fast Iterative Shrinkage-Thresholding Algorithm
[Figure: energy vs iterations for gradient descent, ISTA, and FISTA]
As with conjugate gradient: add a memory term
dx_{k+1} = dx^{ISTA}_{k+1} + ((t_k − 1)/t_{k+1}) (dx_k − dx_{k−1}),  with t_1 = 1, t_{k+1} = (1 + √(1 + 4 t_k²)) / 2
⇒ O(k⁻²) convergence [Beck Teboulle 2009]
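The momentum term of the slide is a few extra lines on top of the ISTA step. A minimal sketch (same illustrative Lasso test problem as above; all sizes and values are assumptions):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, y, lam, n_iter=500):
    """FISTA sketch [Beck Teboulle 2009]: the ISTA step taken at an
    extrapolated point, with the t_k momentum sequence of the slide."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    v = x.copy()                      # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        x_next = soft_threshold(v - 2.0 * A.T @ (A @ v - y) / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        v = x_next + (t - 1.0) / t_next * (x_next - x)   # memory term
        x, t = x_next, t_next
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
x_hat = fista(A, A @ x_true, lam=0.1, n_iter=500)
```

With the O(k⁻²) rate, a few hundred FISTA iterations match what plain ISTA needs thousands of iterations to reach, at essentially the same cost per iteration.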
4 Proximal operator for total variation
Reformulate to smooth + non-smooth with a simple projection step and use FISTA [Chambolle 2004]:
prox_{λTV}(y) = argmin_x ‖y − x‖₂² + λ Σ_i ‖(∇x)_i‖₂ = λ div z* + y,  with z* = argmin_{z, ‖z‖∞ ≤ 1} ‖λ div z + y‖₂²
Proof: "dual norm": ‖v‖₁ = max_{‖z‖∞ ≤ 1} ⟨v, z⟩; div is the adjoint of −∇: ⟨∇v, z⟩ = ⟨v, −div z⟩; swap the min and the max and solve for x.
Duality: [Boyd 2004]; this proof: [Michel 2011]
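In 1-D the dual reformulation gives a very small prox-TV solver. The sketch below runs projected gradient on the dual variable z in the spirit of [Chambolle 2004]; it is a plain dual projected-gradient loop, not Chambolle's exact update, it uses the ½‖x − y‖² convention for the prox, and the step size and iteration count are illustrative assumptions:

```python
import numpy as np

def prox_tv_1d(y, lam, n_iter=3000):
    """prox of lam * TV in 1-D: min_x 0.5 ||x - y||^2 + lam * ||D x||_1,
    solved by projected gradient on the dual variable z, ||z||_inf <= 1."""
    def D(v):                        # forward differences
        return np.diff(v)
    def Dt(w):                       # adjoint of D
        return np.concatenate(([-w[0]], -np.diff(w), [w[-1]]))
    z = np.zeros(y.size - 1)
    tau = 1.0 / (4.0 * lam * lam)    # 1/L for the dual objective
    for _ in range(n_iter):
        x = y - lam * Dt(z)          # primal point for the current dual z
        z = np.clip(z + tau * lam * D(x), -1.0, 1.0)   # projected step
    return y - lam * Dt(z)

# A single step of height 1 over two length-4 plateaus: TV shrinks the
# two levels toward each other by lam / 4 each.
y = np.zeros(8)
y[4:] = 1.0
x = prox_tv_1d(y, lam=0.5)
```

The only non-trivial piece is the box projection (the clip), which is why this reformulation fits FISTA so well: the non-smooth part becomes a trivial projection.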
Sparsity for compressed tomography reconstruction
Add penalizations with kinks; choose the prior/sparse representation; use non-smooth optimization (FISTA)
Further discussion on the choice of prior/parameters: minimize the reconstruction error from degraded data of gold-standard acquisitions, or cross-validate: leave out half of the projections and minimize the projection error of the reconstruction.
Python code available: https://github.com/emmanuelle/tomo-tv
@GaelVaroquaux
Bibliography (1/3)
[Candes 2006] E. Candes, J. Romberg and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, Trans Inf Theory (52) 2006
[Wainwright 2009] M. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso), Trans Inf Theory (55) 2009
[Mallat, Zhang 1993] S. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, Trans Sign Proc (41) 1993
[Pati, et al 1993] Y. Pati, R. Rezaiifar, P. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, 27th Signals, Systems and Computers Conf 1993
Bibliography (2/3)
[Chen, Donoho, Saunders 1998] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J Sci Computing (20) 1998
[Tibshirani 1996] R. Tibshirani, Regression shrinkage and selection via the lasso, J Roy Stat Soc B, 1996
[Gribonval 2011] R. Gribonval, Should penalized least squares regression be interpreted as Maximum A Posteriori estimation?, Trans Sig Proc (59) 2011
[Daubechies 2004] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm Pure Appl Math (57) 2004
Bibliography (3/3)
[Beck Teboulle 2009] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J Imaging Sciences (2) 2009
[Chambolle 2004] A. Chambolle, An algorithm for total variation minimization and applications, J Math Imag Vision (20) 2004
[Boyd 2004] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press 2004
— Reference on convex optimization and duality
[Michel 2011] V. Michel et al., Total variation regularization for fMRI-based prediction of behaviour, Trans Med Imag (30) 2011
— Proof of the TV reformulation: appendix C