Automatic estimation of regularization parameters:...
Motivating Example 3D Data Context Regularization Parameter Estimation Using the Noise Properties Conclusions and Future Theoretical Discussion
AUTOMATIC ESTIMATION OF REGULARIZATION
PARAMETERS: INITIAL STEPS
Rosemary Renaut
http://math.asu.edu/~rosie
QUANTITATIVE SUSCEPTIBILITY MAPPING (QSM)
JULY 27, 2013
Acknowledgements: Cornell MRI Laboratory: Yi Wang, Tian Liu, Pascal Spincemaille, Shaui Wang
Outline
Motivating Example 3D Data
Context
Regularization Parameter Estimation
Using the Noise Properties
Conclusions and Future
Theoretical Discussion
Motivating Example for QSM
Neuroimage 59, 2012 (2560-2568): Liu et al
Morphology enabled dipole inversion for quantitative susceptibility mapping using structural consistency between the magnitude image and the susceptibility map.
Tissue local magnetic field ($b$) obtained as convolution of dipole kernel ($A$) with susceptibility ($x$):
$$ b \approx Ax $$
Least squares or image based formulation: solve for $x$
$$ \|W_b(Ax - b)\|^2 + \frac{1}{\sigma^2}\|W_G(\nabla x)\|^2 $$
$W_b$: weighting matrix for the noise on the data $b$.
$W_G$: weighting matrix for the gradient $\nabla x$, dependent on noise level.
$\sigma$ is the unknown regularization parameter.
Goals
1. Develop approach to automatically estimate parameter $\sigma = 1/\sqrt{\lambda}$
2. Use validated parameter estimation techniques
3. Employ statistical information from the data
4. Efficient implementation
5. Extend to L1 regularization
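Context: L2 regularization
Solve ill-conditioned $Ax \approx b$
Standard Tikhonov, $L$ approximates a derivative operator
$$ x(\lambda) = \arg\min_x \left\{ \frac{1}{2}\|Ax - b\|_2^2 + \frac{\lambda^2}{2}\|Lx\|_2^2 \right\} $$
$x(\lambda)$ solves the normal equations provided $\mathrm{null}(L) \cap \mathrm{null}(A) = \{0\}$
$$ (A^T A + \lambda^2 L^T L)\, x(\lambda) = A^T b $$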
Multiple approaches exist for estimating the parameter $\lambda = 1/\sigma$ (the Tikhonov weight $\lambda^2$ plays the role of $1/\sigma^2$).
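The normal equations above can be sketched numerically as follows. This is a minimal illustration, not the talk's implementation: the test matrix `A`, the data `b`, the forward-difference matrix used for $L$, and the value of `lam` are all hypothetical choices, and a dense solve is only sensible for small problems.

```python
import numpy as np

def first_difference(n):
    """(n-1) x n forward-difference matrix, a simple stand-in for a derivative operator L."""
    return np.diff(np.eye(n), axis=0)

def tikhonov_normal_equations(A, b, L, lam):
    """Solve (A^T A + lam^2 L^T L) x = A^T b with a dense solver."""
    lhs = A.T @ A + lam**2 * (L.T @ L)
    return np.linalg.solve(lhs, A.T @ b)

def tikhonov_stacked(A, b, L, lam):
    """Same minimizer from the stacked least squares problem [A; lam L] x ~ [b; 0]."""
    stacked = np.vstack([A, lam * L])
    rhs = np.concatenate([b, np.zeros(L.shape[0])])
    return np.linalg.lstsq(stacked, rhs, rcond=None)[0]

# Hypothetical small test problem (assumption: random data, not QSM data).
rng = np.random.default_rng(0)
n = 12
A = rng.standard_normal((20, n))
b = rng.standard_normal(20)
L = first_difference(n)
lam = 0.5

x_ne = tikhonov_normal_equations(A, b, L, lam)
x_ls = tikhonov_stacked(A, b, L, lam)
print("max difference between the two solves:", np.max(np.abs(x_ne - x_ls)))
```

The stacked form avoids forming $A^T A$ explicitly, which is usually preferable when $A$ is ill-conditioned.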
Some Methods: assume variance $\tau^2$ in weighted $W_b b$
Morozov Discrepancy: smooths; is a $\chi^2$ test on the residual (Residual Discrepancy, de Rochefort)
$$ \|W_b(Ax(\sigma) - b)\|^2 \approx \tau^2 $$
L-curve: well known; find the corner of the $(x, y)$ plot
$$ \left( \log\|W_b(Ax(\sigma) - b)\|^2,\ \log\|Lx\|^2 \right) $$
Generalized Cross Validation (GCV): minimization of
$$ \frac{\|W_b(Ax(\sigma) - b)\|^2}{\mathrm{Tr}\left(I_m - (A^T W_b A + \frac{1}{\sigma^2} L^T L)^{-1} A^T W_b A\right)} $$
Unbiased Predictive Risk Estimation (UPRE): minimization of
$$ \|W_b(Ax(\sigma) - b)\|^2 - 2\tau^2\left(m - \mathrm{Tr}\left((A^T W_b A + \frac{1}{\sigma^2} L^T L)^{-1} A^T W_b A\right)\right) $$
$\chi^2$ method: based on noise distribution in the data.
And others, e.g. Residual Periodogram ...
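To make the residual and trace terms in these functionals concrete, here is a hedged sketch for the simplest setting, assuming $W_b = I$ and $L = I$ so that one SVD of $A$ gives everything. The function name, the test data and the noise variance `tau2` are hypothetical. GCV is coded in its commonly used form with the trace term squared in the denominator, which is an assumption relative to the formula as transcribed above.

```python
import numpy as np

def method_functionals(A, b, sigma, tau2):
    """Residual norm, trace term, GCV and UPRE values for x(sigma), assuming W_b = I and L = I."""
    m = A.shape[0]
    U, sv, _ = np.linalg.svd(A, full_matrices=True)
    s = U.T @ b
    k = sv.size
    # Filter factors of the regularized solution: phi_i = sigma^2 sv_i^2 / (1 + sigma^2 sv_i^2)
    phi = (sigma * sv) ** 2 / (1.0 + (sigma * sv) ** 2)
    # Components in the range of A are damped by (1 - phi_i); the rest pass through unchanged.
    residual2 = np.sum(((1.0 - phi) * s[:k]) ** 2) + np.sum(s[k:] ** 2)
    trace = np.sum(phi)  # Tr((A^T A + I / sigma^2)^{-1} A^T A)
    return {
        "residual2": residual2,
        "trace": trace,
        "gcv": residual2 / (m - trace) ** 2,            # standard squared-trace form (assumption)
        "upre": residual2 - 2.0 * tau2 * (m - trace),   # form used on the slide
    }

# Hypothetical test problem with unit-variance white noise.
rng = np.random.default_rng(3)
A = rng.standard_normal((15, 8))
b = A @ rng.standard_normal(8) + rng.standard_normal(15)
tau2 = 1.0

# "Many lambda" methods: sweep a grid of sigma values and take the minimizer.
sigmas = np.logspace(-2, 2, 50)
upre_values = [method_functionals(A, b, sg, tau2)["upre"] for sg in sigmas]
print("UPRE minimizer on the grid:", sigmas[int(np.argmin(upre_values))])
```

For large problems the SVD is not available, and the trace has to be estimated, which is the expense these methods are known for.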
Some characteristics of the methods
Method        Idea   Many $\lambda$   Algorithm      Statistical   Unique
Discrepancy   Easy   No               Root finding   Yes           Yes
L-curve       Easy   Yes              spline         No            No
GCV           Hard   Yes              Minimum        Yes           No
UPRE          Hard   Yes              Minimum        Yes           No
$\chi^2$      Ok     No               Root finding   Yes           Yes
1. In particular $\chi^2$ and UPRE rely on provision of statistics of the noise distribution
2. UPRE and GCV require a matrix trace estimation, which is expensive
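Weighting for the noise - assume noise $\eta$ in $b$
Suppose $\eta \sim (0, C_b)$, i.e. $C_b$ is the covariance of the noise in $b$.
$C_b$ is SPD: hence $C_b = (C_b^{1/2})^2$ and is invertible.
Multiplying by $W_b^{1/2} = (C_b^{1/2})^{-1}$ whitens the noise in $b$
$$ W_b^{1/2}(A\hat{x} - b) = \bar{\eta}, \quad \text{where } \bar{\eta} \sim (0, W_b^{1/2} C_b (W_b^{1/2})^T) = (0, I_m) $$
i.e. we have the weighted form ($\|A\|_W^2 = A^T W A$)
$$ x(\sigma) = \arg\min_x \left\{ \|Ax - b\|_{W_b}^2 + \frac{1}{\sigma^2}\|x\|^2 \right\} $$
More generally: $W_x = \frac{1}{\sigma^2} I$ and augmented residual $r(\sigma)$
$$ x(W_x) = \arg\min_x \left\| \begin{pmatrix} W_b^{1/2} A \\ W_x^{1/2} \end{pmatrix} x - \begin{pmatrix} W_b^{1/2} b \\ 0_n \end{pmatrix} \right\|^2 := \arg\min_x \|r(\sigma)_A\|^2 $$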
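A short sketch of the whitening step and the augmented system, under stated assumptions: the covariance `C_b` below is a made-up SPD matrix, the symmetric inverse square root is computed by an eigendecomposition (one of several valid choices; a Cholesky factor would also whiten), and the helper names are hypothetical.

```python
import numpy as np

def inverse_sqrt_spd(C):
    """Symmetric inverse square root of an SPD matrix via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(C)
    return (evecs / np.sqrt(evals)) @ evecs.T

def solve_augmented(A, b, C_b, sigma):
    """Minimize ||W_b^{1/2}(Ax - b)||^2 + ||x||^2 / sigma^2 via the stacked least squares system."""
    n = A.shape[1]
    W_half = inverse_sqrt_spd(C_b)
    stacked = np.vstack([W_half @ A, np.eye(n) / sigma])   # W_x^{1/2} = I / sigma
    rhs = np.concatenate([W_half @ b, np.zeros(n)])
    return np.linalg.lstsq(stacked, rhs, rcond=None)[0]

# Hypothetical data: colored noise with covariance C_b = B B^T + I (SPD by construction).
rng = np.random.default_rng(4)
m, n = 6, 4
B = rng.standard_normal((m, m))
C_b = B @ B.T + np.eye(m)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
sigma = 2.0

W_half = inverse_sqrt_spd(C_b)
whitened_cov = W_half @ C_b @ W_half.T     # should be the identity I_m
x_aug = solve_augmented(A, b, C_b, sigma)
print("deviation from identity:", np.max(np.abs(whitened_cov - np.eye(m))))
```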
Statistical Properties of the Augmented Regularized Residual
For a given solution
$$ x(W_x) = W_x^{-1} A^T (A W_x^{-1} A^T + W_b^{-1})^{-1} b $$
the augmented residual is
$$ J(W_x) = b^T (A W_x^{-1} A^T + W_b^{-1})^{-1} b = \|r(W_x)\|^2 $$
Lemma (Distribution of the Cost Functional)
If $W_b$ and $W_x$ have been chosen appropriately, the functional $J$ is a random variable which follows a $\chi^2$ distribution with $m$ degrees of freedom:
$$ J(W_x) \sim \chi^2(m), \quad E(J(x(W_x))) = m, \quad \mathrm{Var}(J) = 2m $$
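The lemma can be checked empirically with a small Monte Carlo experiment. The sketch below assumes Gaussian noise and a zero-mean Gaussian model for $x$ with covariance $W_x^{-1}$, which is one reading of "chosen appropriately"; the sizes, variances and function names are hypothetical.

```python
import numpy as np

def chi2_functional(A, b, C_x, C_b):
    """J = b^T (A C_x A^T + C_b)^{-1} b, with C_x = W_x^{-1} and C_b = W_b^{-1}."""
    P = A @ C_x @ A.T + C_b
    return float(b @ np.linalg.solve(P, b))

def augmented_residual_norm2(A, b, C_x, C_b):
    """||W_b^{1/2}(Ax - b)||^2 + ||W_x^{1/2} x||^2 evaluated at the solution x(W_x)."""
    P = A @ C_x @ A.T + C_b
    x = C_x @ A.T @ np.linalg.solve(P, b)
    r = A @ x - b
    return float(r @ np.linalg.solve(C_b, r) + x @ np.linalg.solve(C_x, x))

rng = np.random.default_rng(1)
m, n = 20, 10
A = rng.standard_normal((m, n))
sigma_x, sigma_b = 2.0, 0.5          # assumed standard deviations of x and of the noise
C_x = sigma_x**2 * np.eye(n)
C_b = sigma_b**2 * np.eye(m)

# Draw many (x, noise) pairs consistent with the assumed covariances and record J.
J_samples = np.empty(4000)
for t in range(J_samples.size):
    x_true = sigma_x * rng.standard_normal(n)
    b = A @ x_true + sigma_b * rng.standard_normal(m)
    J_samples[t] = chi2_functional(A, b, C_x, C_b)

print("sample mean of J:", J_samples.mean(), "(lemma: m =", m, ")")
print("sample variance of J:", J_samples.var(), "(lemma: 2m =", 2 * m, ")")
```

If the weights are mis-specified, for example `sigma_x` too small, the sample mean drifts away from $m$; that drift is the signal the $\chi^2$ method uses to pick the parameter.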
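$\chi^2$ method to find the parameter (Mead and Renaut)
Find $W_x = \sigma^{-2} I$ such that
$$ m - \sqrt{2m}\, z_{\alpha/2} < b^T (A W_x^{-1} A^T + W_b^{-1})^{-1} b < m + \sqrt{2m}\, z_{\alpha/2} $$
Using the SVD $W_b^{1/2} A = U \Sigma V^T$, let $s = U^T W_b^{1/2} b$ and solve
$$ F(\sigma) = s^T \mathrm{diag}\left(\frac{1}{1 + \sigma^2 \sigma_i^2}\right) s - m = 0. $$
Spectral decompositions $A = G^* \Lambda G$: $s = G\tilde{b} = \hat{\tilde{b}}$, $\Lambda = \mathrm{diag}(\sigma_i)$
Large Scale: implement using CG or other projected methods with mapped regularization $L$
$$ \sigma^{(k+1)} = \sigma^{(k)} \left( 1 + \alpha^{(k)} \frac{1}{2} \left( \frac{\sigma^{(k)}}{\|Lx(\sigma^{(k)})\|} \right)^2 (J(\sigma^{(k)}) - \tilde{m}) \right) $$
$\tilde{m}$: degrees of freedom in the residual. $\alpha$: a line search parameter.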
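For a problem small enough to take an SVD, the scalar equation $F(\sigma) = 0$ can be solved directly. The sketch below deliberately swaps the Newton-type update on the slide for plain bisection in $\log \sigma$, which is slower but needs no line search; $F$ is monotonically decreasing in $\sigma$, so a sign change brackets a single root. The test data, the bracket `[lo, hi]` and the function names are assumptions.

```python
import numpy as np

def chi2_F(sigma, s, sv, m):
    """F(sigma) = sum_i s_i^2 / (1 + sigma^2 sv_i^2) - m, with sv_i = 0 beyond the rank."""
    sv_full = np.zeros(s.size)
    sv_full[: sv.size] = sv
    return np.sum(s**2 / (1.0 + (sigma * sv_full) ** 2)) - m

def chi2_sigma(A, b, W_half, lo=1e-8, hi=1e8, iters=200):
    """Bisection in log(sigma) for the root of F; returns (sigma, s, singular values)."""
    A_w = W_half @ A
    m = A_w.shape[0]
    U, sv, _ = np.linalg.svd(A_w, full_matrices=True)
    s = U.T @ (W_half @ b)
    if chi2_F(lo, s, sv, m) <= 0 or chi2_F(hi, s, sv, m) >= 0:
        raise ValueError("F does not change sign on [lo, hi]; no chi^2 root in the bracket")
    for _ in range(iters):
        mid = np.sqrt(lo * hi)              # geometric midpoint = bisection in log(sigma)
        if chi2_F(mid, s, sv, m) > 0:
            lo = mid                        # F is decreasing, so the root lies to the right
        else:
            hi = mid
    return np.sqrt(lo * hi), s, sv

# Hypothetical test problem: white unit-variance noise, so W_b^{1/2} = I.
rng = np.random.default_rng(2)
m, n = 60, 40
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.standard_normal(m)
sigma_hat, s, sv = chi2_sigma(A, b, np.eye(m))
print("chi^2 estimate of sigma:", sigma_hat)
print("F at the estimate:", chi2_F(sigma_hat, s, sv, m))
```

In the large scale setting the SVD is replaced by an iterative solve for $x(\sigma^{(k)})$, and the slide's update is a Newton step on the same function, since $dJ/d\sigma = -\frac{2}{\sigma^3}\|Lx(\sigma)\|^2$.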
Some Results: Simulated data with 10% colored noise - no masks
Figure: Estimates obtained automatically by the $\chi^2$ method, above, and below the optimal estimates by sweeping through 50 choices
Example for Dipole Inversion: The SNR estimates
Figure: Estimates obtained automatically by the $\chi^2$ method, indicated as compared to the optimum. SNR: $10 \log_{10}(\|x_{true}\|^2 / \|x_{true} - x\|^2)$. All image based methods, using CG.
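The SNR measure in the caption is simple to compute; a minimal sketch follows, with a made-up three-element example (the function name `snr_db` and the data are hypothetical).

```python
import numpy as np

def snr_db(x_true, x_est):
    """SNR in dB: 10 log10(||x_true||^2 / ||x_true - x_est||^2)."""
    signal = np.sum(np.abs(x_true) ** 2)
    error = np.sum(np.abs(x_true - x_est) ** 2)
    return 10.0 * np.log10(signal / error)

x_true = np.array([1.0, 2.0, 2.0])   # ||x_true||^2 = 9
x_est = np.array([1.0, 2.0, 1.7])    # squared error = 0.09
print(snr_db(x_true, x_est))         # ratio of about 100, i.e. roughly 20 dB
```

Because the measure needs `x_true`, it is only available for simulated or phantom data.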
Computational Costs
Costs in seconds are for the $\chi^2$ and optimal search
$\chi^2$   Opt    $\chi^2$   Opt    $\chi^2$   Opt    $\chi^2$   Opt
57         526    131        578    83         619    167        683
The ratio for the cost increase of searching optimally:
9.29   4.41   7.44   4.09
Clear dependence on model of regularization and weighting.
$\chi^2$ finds the optimal parameter at reduced cost.
Remarks
Noise distribution must be known.
Parameters must be tuned relating to $W_G$, $W_b$ and the truncation for the dipole (see talk of Karin Shmueli - still relevant for the forward operation with regularization).
Phantom Data
Figure: Estimates obtained automatically by the $\chi^2$ method, left, and right the optimal estimates by sweeping through 50 choices
Example comparing SNR estimates by k-space method: Simulation
Figure: Notice good estimates and minimal cost using $\chi^2$ applied to k-space data
Results in Fourier domain contaminated by aliasing/artifacts
Figure: Important to correctly identify noise levels and truncation for the dipole convolution
Observations / Conclusions
1. $\chi^2$ successfully applies for 3D inversion with noise information
2. $\chi^2$ has potential to steer toward optimal parameters
3. There are a number of theoretical results justifying the approach.
4. Still needs to be refined for use in spectral domain (to include gradients)
5. Efficient implementations require consideration of better Krylov methods.
6. Suggests use of $\chi^2$ in other formulations, e.g. L1
Extending for L1 using Augmented Lagrangian: Simple Example (here with UPRE)
Theoretical Results: Relating UPRE and $\chi^2$
1. UPRE is designed to minimize the bias in the solution
2. UPRE requires Trace operator (can be optimized)
Lemma (Connecting UPRE and $\chi^2$)
The $\sigma$ solving the $\chi^2$ functional provides a local minimum of the UPRE functional.
Proof: GSVD expansion for operators.
Lemma (Convergence with increasing resolution by $\chi^2$)
Suppose the kernel is square integrable. Then $\sigma^{(m)}_{\chi^2}$, as a function of the number of equations, converges with increasing $m$.
Remark
Both results assist in justification of use of the augmented discrepancy principle. Also, for certain kernels we may search extensively at low resolution.
Extending for L1 regularization
Finding the optimal parameter for Tikhonov regularization is a first step in Split Bregman (Goldstein and Osher, 2009).
Introduce d ≈ Lx and let
$$R(x) = \frac{1}{2\sigma^2}\|d - Lx\|_2^2 + \mu\|d\|_1,$$
$$(x, d)(\sigma, \mu) = \arg\min_{x,d}\left\{\frac{1}{2}\|Ax - b\|_2^2 + \frac{1}{2\sigma^2}\|d - Lx\|_2^2 + \mu\|d\|_1\right\}.$$
Alternating minimization separates the steps for d from those for x.
Various versions of the iteration can be defined. Fundamentally:
$$S1:\quad x^{(k+1)} = \arg\min_{x}\left\{\frac{1}{2}\|Ax - b\|_2^2 + \frac{1}{2\sigma^2}\|Lx - (d^{(k)} - g^{(k)})\|_2^2\right\}$$
$$S2:\quad d^{(k+1)} = \arg\min_{d}\left\{\frac{1}{2\sigma^2}\|d - (Lx^{(k+1)} + g^{(k)})\|_2^2 + \mu\|d\|_1\right\}$$
$$S3:\quad g^{(k+1)} = g^{(k)} + Lx^{(k+1)} - d^{(k+1)}.$$
Notice the dimension increase of the problem.
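The three steps S1 to S3 can be sketched in NumPy as follows. This is an illustrative sketch, not the implementation behind the slides. It assumes dense matrices, zero starting values for d and g, and that A and L share no common null vector, so the S1 system is solvable. The names `soft_threshold` and `split_bregman` are hypothetical. S2 has the closed-form solution given by soft thresholding with threshold µσ².

```python
import numpy as np

def soft_threshold(c, t):
    """Closed-form minimizer of (1/2)(d - c)^2 + t*|d|, applied elementwise."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def split_bregman(A, b, L, sigma, mu, n_iter=50):
    """Alternate S1 (Tikhonov solve), S2 (shrinkage), S3 (Bregman update)."""
    p = L.shape[0]
    d = np.zeros(p)   # assumed initial guess
    g = np.zeros(p)   # assumed initial guess
    x = np.zeros(A.shape[1])
    # The S1 system matrix does not change with k; only the right-hand side does.
    M = A.T @ A + (L.T @ L) / sigma**2
    Atb = A.T @ b
    for _ in range(n_iter):
        h = d - g
        x = np.linalg.solve(M, Atb + (L.T @ h) / sigma**2)  # S1
        d = soft_threshold(L @ x + g, mu * sigma**2)        # S2
        g = g + L @ x - d                                   # S3
    return x, d, g
```

With A = L = I the limit should be the soft-thresholded data, which gives a quick sanity check. The unknowns now include d and g of length p in addition to x, which is the dimension increase noted above.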
Focus: Tikhonov Step of the Algorithm
$$S1:\quad x^{(k+1)} = \arg\min_{x}\left\{\frac{1}{2}\|Ax - b\|_2^2 + \frac{1}{2\sigma^2}\|Lx - (d^{(k)} - g^{(k)})\|_2^2\right\}$$
Update for x: introduce
$$h^{(k)} = d^{(k)} - g^{(k)}.$$
Then
$$x^{(k+1)} = \arg\min_{x}\left\{\frac{1}{2}\|Ax - b\|_2^2 + \frac{1}{2\sigma^2}\|Lx - h^{(k)}\|_2^2\right\}.$$
Standard least squares update using a Tikhonov regularizer.
Depends on the changing right-hand side.
Also depends on the parameter σ.
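For reference, the update can be written out explicitly. Setting the gradient of the objective to zero gives the normal equations below, under the assumption (not stated on the slide) that A and L have no common null vector, so the system matrix is invertible. The stacked form shows the same problem as an ordinary least squares fit.

```latex
\begin{aligned}
0 &= A^T\bigl(Ax^{(k+1)} - b\bigr) + \frac{1}{\sigma^2} L^T\bigl(Lx^{(k+1)} - h^{(k)}\bigr), \\
\Bigl(A^T A + \frac{1}{\sigma^2} L^T L\Bigr)\, x^{(k+1)} &= A^T b + \frac{1}{\sigma^2} L^T h^{(k)}, \\
x^{(k+1)} &= \arg\min_x \left\| \begin{pmatrix} A \\ \sigma^{-1} L \end{pmatrix} x - \begin{pmatrix} b \\ \sigma^{-1} h^{(k)} \end{pmatrix} \right\|_2^2 .
\end{aligned}
```

Only the right-hand side changes with k, so for fixed σ a factorization of the system matrix can be reused across iterations; if σ is re-estimated at each step, the matrix changes as well.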
Theoretical results: using Unbiased Predictive Risk for the SB Tikhonov step
Lemma
Suppose the noise in $h^{(k)}$ is stochastic and inverse Gaussian covariance weighting is applied to both the data fit Ax ≈ b and the derivative fit Lx ≈ h, for b and h; then the optimal choice for σ at all steps is σ = 1. Otherwise $h^{(k+1)}$ is deterministic and σ changes with iteration.
Remark
Can we expect $h^{(k)}$ to be stochastic?
Remark
Because h changes, the optimal choice for σ changes with each iteration, converging as h converges.