Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guarantees
Transcript of Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guarantees
Gabriel Peyré
www.numerical-tours.com
Low Complexity Regularization of Inverse Problems
Course #2: Recovery Guarantees
Overview of the Course
• Course #1: Inverse Problems
• Course #2: Recovery Guarantees
• Course #3: Proximal Splitting Methods
Overview
• Low-complexity Regularization with Gauges
• Performance Guarantees
• Grid-free Regularization
Inverse Problem Regularization
Observations: y = Φx₀ + w ∈ ℝ^P.
Estimator: x(y) depends only on the observations y and a parameter λ.
Example: variational methods
  x(y) ∈ argmin_{x∈ℝ^N} (1/2)‖y − Φx‖² + λ J(x)
where the first term is the data fidelity and J(x) measures the regularity.
Choice of λ: tradeoff between the noise level ‖w‖ and the regularity J(x₀).
No noise: λ → 0⁺, minimize
  x⋆ ∈ argmin_{Φx = y} J(x)
This course: performance analysis, fast computational schemes.
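The variational estimator above can be computed numerically; here is a minimal sketch for J = ‖·‖₁, using plain iterative soft-thresholding (ISTA; the faster proximal splitting schemes are the topic of Course #3). The instance sizes and the value of λ are arbitrary choices for illustration.

```python
import numpy as np

def ista(Phi, y, lam, n_iter=2000):
    """Minimize 0.5*||y - Phi x||^2 + lam*||x||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(Phi, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        u = x - Phi.T @ (Phi @ x - y) / L      # gradient step on the data fidelity
        x = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)  # soft-thresholding
    return x

# Small synthetic inverse problem y = Phi x0 + w.
rng = np.random.default_rng(0)
P, N, s = 40, 60, 3
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N)
x0[rng.choice(N, s, replace=False)] = rng.choice([-1.0, 1.0], s)
y = Phi @ x0 + 0.01 * rng.standard_normal(P)
x = ista(Phi, y, lam=0.1)
# First-order optimality at the minimizer: ||Phi^T (Phi x - y)||_inf <= lam.
print(np.max(np.abs(Phi.T @ (Phi @ x - y))))
```

The soft-thresholding step is the proximal operator of λ‖·‖₁; the printed quantity should be at most λ up to the iteration tolerance.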
Union of Linear Models for Data Processing
Union of models: T ∈ 𝒯, a collection of linear spaces.
– Synthesis sparsity: coefficients x, image Φx.
– Structured sparsity.
– Analysis sparsity: image x, gradient D*x.
– Low-rank: multi-spectral imaging, x_{i,·} = Σ_{j=1}^r A_{i,j} S_{j,·}.
[Figure: the concept of hyperspectral imagery. Hyperspectral imagers measure reflected radiation at many narrow contiguous wavelength bands, so each pixel carries a complete spectrum (rows S₁,·, S₂,·, S₃,·), unlike multispectral imagers (e.g. Landsat, SPOT, AVHRR) that use a few wide, separated bands.]
Gauges for Union of Linear Models
Gauge: J : ℝ^N → ℝ⁺ convex, with ∀ α ∈ ℝ⁺, J(αx) = αJ(x).
Union of linear models (T)_{T∈𝒯} ⇔ piecewise regular ball.
Examples:
– T = sparse vectors: J(x) = ‖x‖₁.
– T = block sparse vectors: J(x) = |x₁| + ‖x_{2,3}‖.
– T = low-rank matrices: J(x) = ‖x‖_*.
– T = anti-sparse vectors: J(x) = ‖x‖_∞.
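Each of these gauges is straightforward to evaluate; a small sketch checking the positive homogeneity J(αx) = αJ(x) on the examples above (the block partition passed to `block_l1` is an illustrative assumption):

```python
import numpy as np

def l1(x):                # sparse vectors
    return np.sum(np.abs(x))

def linf(x):              # anti-sparse vectors
    return np.max(np.abs(x))

def block_l1(x, blocks):  # block sparse vectors: sum of within-block l2 norms
    return sum(np.linalg.norm(x[list(b)]) for b in blocks)

def nuclear(X):           # low-rank matrices: sum of singular values
    return np.sum(np.linalg.svd(X, compute_uv=False))

x = np.array([3.0, 0.0, -4.0])
X = np.outer([3.0, 4.0], [0.0, 5.0])   # rank-1 matrix, ||X||_* = 5 * 5 = 25
alpha = 7.0
for J, arg in [(l1, x), (linf, x), (nuclear, X)]:
    assert np.isclose(J(alpha * arg), alpha * J(arg))      # positive homogeneity
assert np.isclose(block_l1(x, [(0,), (1, 2)]), 3.0 + 4.0)  # |x_1| + ||x_{2,3}||
assert np.isclose(nuclear(X), 25.0)
```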
Subdifferentials and Models
  ∂J(x) = {η : ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩}
Example: J(x) = ‖x‖₁, with I = supp(x) = {i : xᵢ ≠ 0}:
  ∂‖x‖₁ = {η : η_I = sign(x_I), ∀ j ∉ I, |η_j| ≤ 1}
Definition: T_x = VectHull(∂J(x))^⊥; for J = ‖·‖₁, T_x = {η : supp(η) ⊂ I}.
Proposition: η ∈ ∂J(x) ⟹ Proj_{T_x}(η) = e_x; for J = ‖·‖₁, e_x = sign(x).
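The ℓ¹ characterization and the proposition can be checked numerically on a concrete vector; a sketch (the helper names are mine):

```python
import numpy as np

def in_l1_subdiff(x, eta, tol=1e-9):
    """eta ∈ ∂||x||_1  ⇔  eta_I = sign(x_I) and |eta_j| <= 1 outside I."""
    I = x != 0
    return bool(np.all(np.abs(eta[I] - np.sign(x[I])) <= tol)
                and np.all(np.abs(eta[~I]) <= 1 + tol))

def proj_Tx(x, eta):
    """Orthogonal projection onto T_x = {z : supp(z) ⊂ supp(x)}."""
    z = np.zeros_like(eta)
    I = x != 0
    z[I] = eta[I]
    return z

x = np.array([2.0, 0.0, -1.0, 0.0])
eta = np.array([1.0, 0.5, -1.0, -0.3])   # one element of ∂||x||_1
assert in_l1_subdiff(x, eta)
# Proj_{T_x}(eta) recovers e_x = sign(x) on the support, as the proposition states.
assert np.allclose(proj_Tx(x, eta), [1.0, 0.0, -1.0, 0.0])
```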
Examples
– ℓ¹ sparsity: J(x) = ‖x‖₁; e_x = sign(x), T_x = {z : supp(z) ⊂ supp(x)}.
– Structured sparsity: J(x) = Σ_b ‖x_b‖; e_x = (𝒩(x_b))_{b∈B} with 𝒩(a) = a/‖a‖, T_x = {z : supp(z) ⊂ supp(x)}.
– Nuclear norm: J(x) = ‖x‖_*; with the SVD x = UΛV*, e_x = UV*, T_x = {UA + BV* : (A, B) ∈ (ℝ^{n×n})²}.
– Anti-sparsity: J(x) = ‖x‖_∞; with I = {i : |xᵢ| = ‖x‖_∞}, e_x = |I|⁻¹ sign(x), T_x = {y : y_I ∝ sign(x_I)}.
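For the nuclear norm, e_x = UV* follows from the SVD; a short sketch verifying two consequences, ⟨e_x, x⟩ = ‖x‖_* and ‖e_x‖ = 1 in spectral norm (the helper name is mine):

```python
import numpy as np

def nuclear_e(X, tol=1e-10):
    """e_X = U V* from the reduced SVD X = U diag(s) V*, keeping singular values > tol."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol))
    return U[:, :r] @ Vt[:r, :]

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # a rank-2 matrix
e = nuclear_e(X)
nuc = np.sum(np.linalg.svd(X, compute_uv=False))
assert np.isclose(np.sum(e * X), nuc)           # <e_X, X> = ||X||_*
assert np.isclose(np.linalg.norm(e, 2), 1.0)    # spectral norm of e_X is 1
```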
Overview
• Low-complexity Regularization with Gauges
• Performance Guarantees
• Grid-free Regularization
Dual Certificates
Noiseless recovery: min_{Φx = Φx₀} J(x)   (P₀)
Dual certificates: D(x₀) = Im(Φ*) ∩ ∂J(x₀)
Proposition: x₀ is a solution of (P₀) ⟺ ∃ η ∈ D(x₀).
Proof: (P₀) ⟺ min_{δ∈ker(Φ)} J(x₀ + δ).
For all (η, δ) ∈ ∂J(x₀) × ker(Φ), J(x₀ + δ) ≥ J(x₀) + ⟨δ, η⟩.
If η ∈ Im(Φ*), then ⟨δ, η⟩ = 0, hence x₀ is a solution.
Conversely, if x₀ is a solution, then ∀ δ ∈ ker(Φ), ⟨δ, η⟩ ≤ 0, hence η ∈ ker(Φ)^⊥ = Im(Φ*).
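The mechanism of this proof can be probed numerically for J = ‖·‖₁: build a candidate certificate η ∈ Im(Φ*) interpolating sign(x₀) on the support, and check that ⟨δ, η⟩ = 0 for δ ∈ ker(Φ), so the subgradient inequality certifies x₀. A sketch under arbitrary instance sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
P, N, s = 30, 40, 2
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N)
x0[rng.choice(N, s, replace=False)] = rng.choice([-1.0, 1.0], s)

# Candidate certificate: eta = Phi^* q, with eta_I = sign(x0_I) by construction.
I = x0 != 0
q = np.linalg.pinv(Phi[:, I]).T @ np.sign(x0[I])
eta = Phi.T @ q
assert np.allclose(eta[I], np.sign(x0[I]))      # eta matches the sign on the support
is_certificate = np.max(np.abs(eta[~I])) <= 1   # remaining subdifferential condition

# Any feasible point is x0 + delta with delta ∈ ker(Phi); <delta, eta> = 0 since
# eta ∈ Im(Phi^*), so the subgradient inequality gives J(x0 + delta) >= J(x0).
Pker = np.eye(N) - np.linalg.pinv(Phi) @ Phi    # orthogonal projector onto ker(Phi)
for _ in range(100):
    delta = Pker @ rng.standard_normal(N)
    assert abs(delta @ eta) < 1e-8
    if is_certificate:
        assert np.sum(np.abs(x0 + delta)) >= np.sum(np.abs(x0)) - 1e-10
```

For these sizes the subdifferential condition holds with high probability on Φ, in which case every random feasible perturbation indeed increases the ℓ¹ norm.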
Dual Certificates and L2 Stability
ri(E) = relative interior of E (interior for the topology of aff(E)).
Tight dual certificates: D̄(x₀) = Im(Φ*) ∩ ri(∂J(x₀))
Theorem [Fadili et al. 2013]: if ∃ η ∈ D̄(x₀), then for λ ∼ ‖w‖ one has ‖x⋆ − x₀‖ = O(‖w‖).
[Grasmair 2012]: J(x⋆ − x₀) = O(‖w‖). [Grasmair, Haltmeier, Scherzer 2010]: J = ‖·‖₁.
→ The constants depend on N . . .
Compressed Sensing Setting
Random matrix: Φ ∈ ℝ^{P×N}, Φ_{i,j} ∼ 𝒩(0,1) i.i.d.
Theorem [Rudelson, Vershynin 2006] [Chandrasekaran et al. 2011]:
Sparse vectors, J = ‖·‖₁. Let s = ‖x₀‖₀. If P ≥ 2s log(N/s),
then ∃ η ∈ D̄(x₀) with high probability on Φ.
Theorem [Chandrasekaran et al. 2011]:
Low-rank matrices, J = ‖·‖_*. Let r = rank(x₀), x₀ ∈ ℝ^{N₁×N₂}. If P ≥ 3r(N₁ + N₂ − r),
then ∃ η ∈ D̄(x₀) with high probability on Φ.
→ Similar results for ‖·‖_{1,2}, ‖·‖_∞.
Phase Transitions
Empirical probability of recovery as a function of P/N: the transition is located at s/N for J = ‖·‖₁ and at r/√N for J = ‖·‖_*. From [Amelunxen et al. 2013].
[Figure 2.2 of Amelunxen et al., "The Geometry of Phase Transitions in Convex Optimization": phase transitions for linear inverse problems. Left: recovery of sparse vectors x₀ ∈ ℝ¹⁰⁰ by ℓ¹ minimization from random linear measurements. Right: recovery of low-rank matrices X₀ ∈ ℝ³⁰ˣ³⁰ by S₁ (nuclear norm) minimization. The colormap indicates the empirical probability of success (black = 0%, white = 100%); the theoretical prediction, given by the statistical dimension of the descent cone, coincides almost perfectly with the empirical 50% success isocline.]
Minimal-norm Certificate
With T = T_{x₀} and e = e_{x₀}:
  η ∈ D(x₀) ⟹ η = Φ*q with Proj_T(η) = e.
Minimal-norm pre-certificate:
  η₀ = Φ*q₀, where q₀ = argmin {‖q‖ : η = Φ*q, η_T = e}
Proposition: one has η₀ = (Φ_T⁺ Φ)* e, where Φ_T = Φ ∘ Proj_T.
Theorem [Vaiter et al. 2013]: if η₀ ∈ D̄(x₀) and λ ∼ ‖w‖,
the unique solution x⋆ of P_λ(y) for y = Φx₀ + w satisfies
  T_{x⋆} = T_{x₀} and ‖x⋆ − x₀‖ = O(‖w‖).
Special cases: [Fuchs 2004]: J = ‖·‖₁. [Bach 2008]: J = ‖·‖_{1,2} and J = ‖·‖_*. [Vaiter et al. 2011]: J = ‖D*·‖₁.
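The formula of the proposition is directly computable; a sketch for J = ‖·‖₁, where T is the set of coordinates in supp(x₀) and e = sign(x₀) (instance sizes are arbitrary):

```python
import numpy as np

def eta_0(Phi, T_mask, e):
    """Minimal-norm pre-certificate eta_0 = (Phi_T^+ Phi)^* e, with Phi_T = Phi ∘ Proj_T."""
    Phi_T = Phi * T_mask[None, :]              # zero the columns outside T
    return (np.linalg.pinv(Phi_T) @ Phi).T @ e

rng = np.random.default_rng(3)
P, N, s = 25, 50, 3
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N)
x0[rng.choice(N, s, replace=False)] = rng.choice([-1.0, 1.0], s)
eta0 = eta_0(Phi, (x0 != 0).astype(float), np.sign(x0))

# By construction Proj_T(eta_0) = e ...
assert np.allclose(eta0[x0 != 0], np.sign(x0)[x0 != 0])
# ... and eta_0 ∈ D̄(x0) amounts to the off-support sup-norm being < 1:
print(np.max(np.abs(eta0[x0 == 0])))
```

When the printed value is below 1, the theorem guarantees model stability T_{x⋆} = T_{x₀} for small noise.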
Compressed Sensing Setting
Theorem [Wainwright 2009] [Dossal et al. 2011]:
Random matrix: Φ ∈ ℝ^{P×N}, Φ_{i,j} ∼ 𝒩(0,1) i.i.d. Sparse vectors, J = ‖·‖₁. Let s = ‖x₀‖₀. If P ≥ 2s log(N),
then η₀ ∈ D̄(x₀) with high probability on Φ.
Phase transitions: P ∼ 2s log(N) for model stability vs. P ∼ 2s log(N/s) for L2 stability.
→ Similar results for ‖·‖_{1,2}, ‖·‖_*, ‖·‖_∞.
→ Not using RIP techniques (non-uniform result on x₀).
1-D Sparse Spikes Deconvolution
Operator: Φx = Σᵢ xᵢ φ(· − Δi), acting on a grid of spacing Δ.
Increasing Δ: reduces correlation; reduces resolution.
Regularization: J(x) = ‖x‖₁. With I = {j : x₀(j) ≠ 0},
  η₀ ∈ D̄(x₀) ⟺ ‖η₀,I^c‖_∞ < 1 ⟺ support recovery.
[Figure: the signal x₀ and its observation Φx₀; the certificate norm ‖η₀,I^c‖_∞ plotted against Δ.]
Overview
• Low-complexity Regularization with Gauges
• Performance Guarantees
• Grid-free Regularization
Support Instability and Measures
When N → +∞, the support is not stable:
  ‖η₀,I^c‖_∞ → c > 1   as N → +∞.
Intuition: spikes want to move laterally.
→ Use Radon measures m ∈ ℳ(𝕋), 𝕋 = ℝ/ℤ.
Extension of ℓ¹: the total variation
  ‖m‖_TV = sup_{‖g‖_∞ ≤ 1} ∫_𝕋 g(x) dm(x).
Discrete measure: m_{x,a} = Σᵢ aᵢ δ_{xᵢ}. One has ‖m_{x,a}‖_TV = ‖a‖₁.
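The identity ‖m_{x,a}‖_TV = ‖a‖₁ can be illustrated through the dual definition: pairing the discrete measure with any continuous g such that g(xᵢ) = sign(aᵢ) and ‖g‖_∞ ≤ 1 attains the supremum. A sketch (helper names are mine):

```python
import numpy as np

def tv_norm(a):
    """||m_{x,a}||_TV = ||a||_1 for the discrete measure m_{x,a} = sum_i a_i delta_{x_i}."""
    return np.sum(np.abs(a))

def pair_with(g, x, a):
    """∫ g dm_{x,a} = sum_i a_i g(x_i); the sup over ||g||_inf <= 1 is the TV norm."""
    return sum(ai * g(xi) for xi, ai in zip(x, a))

x = np.array([0.1, 0.4, 0.8])        # spike locations in T = R/Z
a = np.array([1.5, -2.0, 0.5])       # amplitudes
assert np.isclose(tv_norm(a), 4.0)
# An optimal test function interpolates sign(a_i) at the spikes with |g| <= 1:
g_opt = lambda u: np.interp(u, x, np.sign(a))
assert np.isclose(pair_with(g_opt, x, a), tv_norm(a))
```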
Sparse Measure Regularization
Measurements: y = Φ(m₀) + w, where m₀ ∈ ℳ(𝕋), Φ : ℳ(𝕋) → L²(𝕋), w ∈ L²(𝕋).
Acquisition operator:
  Φ(m)(x) = ∫_𝕋 φ(x, x′) dm(x′), where φ ∈ C²(𝕋 × 𝕋).
Total-variation regularization over measures:
  min_{m∈ℳ(𝕋)} (1/2)‖Φ(m) − y‖² + λ‖m‖_TV
→ Infinite-dimensional convex program.
→ If dim(Im(Φ)) < +∞, the dual is finite-dimensional.
→ If Φ is a filtering, the dual can be re-cast as an SDP program.
Fuchs vs. Vanishing Pre-Certificates
Measures: min_{m∈ℳ} (1/2)‖Φm − y‖² + λ‖m‖_TV
On a grid z: min_{a∈ℝ^N} (1/2)‖Φ_z a − y‖² + λ‖a‖₁
For m₀ = m_{z,a₀} with supp(m₀) = x₀ and supp(a₀) = I:
  η_F = Φ* Φ_I^{+,*} sign(a₀,I)
  η_V = Φ* Γ_{x₀}^{+,*} (sign(a₀), 0)*
where Γ_x(a, b) = Σᵢ aᵢ φ(·, xᵢ) + bᵢ φ′(·, xᵢ).
Theorem [Fuchs 2004]: if ∀ j ∉ I, |η_F(z_j)| < 1, then supp(a_λ) = supp(a₀).
Theorem [Duval, Peyré 2013]: if ∀ t ∉ x₀, |η_V(t)| < 1, then m_λ = m_{x_λ,a_λ} with ‖x_λ − x₀‖_∞ = O(‖w‖).
(Both hold for ‖w‖ small enough and λ ∼ ‖w‖.)
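The vanishing-derivative pre-certificate η_V can be formed by discretizing L²(𝕋). A sketch with a periodized Gaussian standing in for φ (an assumption for simplicity; the next slide uses an ideal low-pass filter), checking the defining interpolation conditions Γ*q = (sign(a₀), 0), i.e. η_V(xᵢ) = sign(aᵢ) and η_V′(xᵢ) = 0:

```python
import numpy as np

n = 512
t = np.arange(n) / n                         # grid discretizing T = R/Z
dt = 1.0 / n

def phi(u, v, s=0.05):                       # assumed kernel phi(u, v)
    d = (u - v + 0.5) % 1.0 - 0.5            # periodic distance on R/Z
    return np.exp(-d**2 / (2 * s**2))

def dphi(u, v, s=0.05):                      # derivative of phi w.r.t. the spike location v
    d = (u - v + 0.5) % 1.0 - 0.5
    return (d / s**2) * np.exp(-d**2 / (2 * s**2))

x0 = np.array([0.3, 0.7])                    # spike locations
a0 = np.array([1.0, -1.0])                   # spike amplitudes
# Gamma_x(a, b) = sum_i a_i phi(., x_i) + b_i phi'(., x_i), sampled on the grid.
G = np.column_stack([phi(t, xi) for xi in x0] + [dphi(t, xi) for xi in x0])
u = np.concatenate([np.sign(a0), np.zeros(len(x0))])
# q = Gamma^{+,*} u in the dt-weighted L2 inner product; eta_V = Phi^* q.
q = G @ np.linalg.solve(dt * G.T @ G, u)
eta_V = dt * phi(t[:, None], t[None, :]) @ q

# Interpolation conditions: values +-1 and vanishing derivatives at the spikes.
assert np.allclose(dt * G.T @ q, u, atol=1e-8)
```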
Numerical Illustration
Ideal low-pass filter: φ(x, x′) = sin((2f_c + 1)π(x − x′)) / sin(π(x − x′)), with f_c = 6.
[Figure: the pre-certificates η_F and η_V, with a zoom near ±1, and the solution path λ ↦ a_λ.]
Theorem [Duval, Peyré 2013], discrete → continuous: if η_V is valid, then a_λ is supported on pairs of neighbors around supp(m₀).
(Holds for λ ∼ ‖w‖ small enough.)
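The ideal low-pass filter above is the Dirichlet kernel; a small sketch checking its closed form against the truncated Fourier sum Σ_{|k|≤f_c} e^{2iπku} = 1 + 2 Σ_{k=1}^{f_c} cos(2πku), and its on-diagonal value 2f_c + 1:

```python
import numpy as np

fc = 6

def dirichlet(u):
    """phi(x, x') as a function of u = x - x'; equals 2*fc + 1 at u = 0 (limit value)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    out = np.full_like(u, 2.0 * fc + 1.0)
    den = np.sin(np.pi * u)
    nz = np.abs(den) > 1e-12
    out[nz] = np.sin((2 * fc + 1) * np.pi * u[nz]) / den[nz]
    return out

u = np.linspace(0.01, 0.49, 25)
fourier = 1 + 2 * sum(np.cos(2 * np.pi * k * u) for k in range(1, fc + 1))
assert np.allclose(dirichlet(u), fourier)           # matches the Fourier sum
assert np.isclose(dirichlet(np.array([0.0]))[0], 2 * fc + 1)   # value 13 on the diagonal
```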
Conclusion
Gauges: encode linear models as singular points.
Performance measures: L2 error vs. model recovery, with different CS guarantees.
Specific certificates: η₀, η_F, η_V, . . .
Open problems:
– CS performance with arbitrary gauges.
– Approximate model recovery T_{x⋆} ≈ T_{x₀} (e.g. grid-free recovery).