Prox versus Prior?
Who cares—Just learn the damn thing!
Philip Schniter and Jeremy Vila
Supported in part by NSF-I/UCRC grant IIP-0968910, by NSF grant CCF-1018368, and by DARPA/ONR grant N66001-10-1-4090.
BASP Frontiers — Jan ’13
Compressive Sensing
Goal: recover signal x from noisy sub-Nyquist measurements
y = Ax + w,  x ∈ R^N,  y, w ∈ R^M,  M < N,
where x is K-sparse with K < M, or compressible.
With sufficient sparsity and appropriate conditions on the mixing matrix A (e.g., RIP, nullspace), accurate recovery of x is possible using polynomial-complexity algorithms.
A common approach (LASSO) is to solve the convex problem
min_x ‖y − Ax‖_2^2 + α‖x‖_1
where α can be tuned in accordance with sparsity and SNR.
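For reference, the LASSO problem above can be attacked with simple iterative soft thresholding (ISTA); a minimal NumPy sketch (the step-size choice is an assumption, not from the slides):

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft threshold: the prox operator of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_lasso(A, y, alpha, n_iter=500):
    """Minimize ||y - A x||_2^2 + alpha*||x||_1 by iterative soft thresholding."""
    lip = 2.0 * np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y)      # gradient of ||y - A x||_2^2
        x = soft_threshold(x - grad / lip, alpha / lip)
    return x
```

ISTA is slow compared to the AMP methods discussed later, but it makes the role of the tuning parameter α concrete.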
Philip Schniter and Jeremy Vila (OSU) EM-GM-AMP BASP Frontiers — Jan ’13 2 / 20
Phase Transition Curves (PTC)
The PTC identifies ratios (M/N, K/M) for which perfect noiseless recovery of K-sparse x occurs (as M, N, K → ∞ under i.i.d sub-Gaussian A).
Suppose {x_n} are drawn i.i.d. from
p_X(x_n) = λ f_X(x_n) + (1−λ) δ(x_n)
with known λ ≜ K/N.
LASSO's PTC is invariant to f_X(·). Thus, LASSO is robust in the face of unknown f_X(·).
MMSE reconstruction's PTC is far better than LASSO's, but requires knowing the prior f_X(·).
[Figure: PTCs, K/M (sparsity) vs. M/N (undersampling), for MMSE reconstruction, theoretical LASSO, and empirical AMP.]
Wu and Verdu, “Optimal phase transitions in compressed sensing,” arXiv Nov. 2011.
Motivations
For practical compressive sensing. . .
want minimal MSE
– distributions are unknown ⇒ can't formulate the MMSE estimator
– but there is hope: various algorithms are seen to outperform LASSO for specific signal classes
– really, we want a universal algorithm: good for all signal classes
want fast runtime
– especially for large signal length N (i.e., scalable)
want to avoid algorithmic tuning parameters
– who wants to tweak an algorithm when you could be skiing in the Alps!
Proposed Approach: “EM-GM-AMP”
Model the signal and noise using flexible distributions:
– i.i.d Bernoulli Gaussian-mixture (GM) signal:
p(x_n) = λ Σ_{l=1}^L ω_l N(x_n; θ_l, φ_l) + (1−λ) δ(x_n),  ∀n
– i.i.d Gaussian noise with variance ψ (but easy to generalize)
Learn the prior parameters q ≜ {λ, {ω_l, θ_l, φ_l}_{l=1}^L, ψ}
– treat as deterministic and use expectation-maximization (EM)
Exploit the learned priors in near-MMSE signal reconstruction
– use the approximate message passing (AMP) algorithm
– AMP also provides EM with everything it needs to know
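To make the signal model concrete, here is a sketch that draws samples from the Bernoulli Gaussian-mixture prior above (the parameter values in the example are illustrative, not from the slides):

```python
import numpy as np

def sample_bernoulli_gm(N, lam, omega, theta, phi, rng=None):
    """Draw N i.i.d. samples from lam * sum_l omega_l N(theta_l, phi_l) + (1-lam) * delta(0)."""
    rng = np.random.default_rng() if rng is None else rng
    omega, theta, phi = map(np.asarray, (omega, theta, phi))
    active = rng.random(N) < lam                      # Bernoulli support pattern
    comp = rng.choice(len(omega), size=N, p=omega)    # mixture-component labels
    x = rng.normal(theta[comp], np.sqrt(phi[comp]))   # phi_l are variances
    return np.where(active, x, 0.0)

# example: 10%-sparse signal, two-component zero-mean mixture
x = sample_bernoulli_gm(10_000, lam=0.1, omega=[0.5, 0.5], theta=[0.0, 0.0], phi=[1.0, 9.0])
```

With L large enough, this family can approximate essentially any i.i.d sparse prior, which is what makes the approach "universal".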
Approximate Message Passing (AMP)
AMP methods infer x from y = Ax + w using loopy belief propagation with carefully constructed approximations.
The original AMP [Donoho, Maleki, Montanari '09] solves the LASSO problem assuming an i.i.d sub-Gaussian matrix A.
The Bayesian AMP [Donoho, Maleki, Montanari '10] framework tackles MMSE inference under any factorized signal prior, AWGN, and i.i.d A.
The generalized AMP [Rangan '10] framework tackles MAP or MMSE inference under any factorized signal prior & likelihood, and generic A.
AMP is a form of iterative thresholding, requiring only two applications of A per iteration and ≈ 25 iterations. Very fast!
Rigorous large-system analyses (under i.i.d sub-Gaussian A) have established that (G)AMP follows a state-evolution trajectory with certain optimalities [Bayati, Montanari '10], [Rangan '10].
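For the LASSO case, the original AMP recursion is exactly soft thresholding plus an "Onsager" correction to the residual; a sketch under the i.i.d sub-Gaussian A assumption (the rms-based threshold schedule is a common heuristic, assumed here rather than taken from the slides):

```python
import numpy as np

def soft(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def amp_lasso(A, y, theta=1.4, n_iter=30):
    """Original AMP: soft thresholding plus the Onsager correction term.
    Note only two applications of A per iteration."""
    M, N = A.shape
    x, z = np.zeros(N), y.astype(float).copy()
    for _ in range(n_iter):
        r = x + A.T @ z                              # pseudo-data
        tau = theta * np.sqrt(np.mean(z ** 2))       # heuristic threshold schedule
        x = soft(r, tau)
        onsager = (z / M) * np.count_nonzero(x)      # (N/M) * z * <eta'>, <eta'> = ||x||_0 / N
        z = y - A @ x + onsager
    return x
```

The Onsager term is what distinguishes AMP from plain iterative thresholding and what makes the state-evolution analysis go through.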
AMP Heuristics (Sum-Product)
[Factor graph: prior factors p_X(x_1), ..., p_X(x_N) attach to variable nodes x_1, ..., x_N, which connect to likelihood factors N(y_1; [Ax]_1, ψ), ..., N(y_M; [Ax]_M, ψ); messages p_{i→j}(x_j) and p_{i←j}(x_j) travel along the edges.]
1. Message from y_i node to x_j node:
p_{i→j}(x_j) ∝ ∫_{{x_r}_{r≠j}} N(y_i; Σ_r a_{ir} x_r, ψ) Π_{r≠j} p_{i←r}(x_r)
≈ ∫_{z_i} N(y_i; z_i, ψ) N(z_i; ẑ_i(x_j), ν^z_i(x_j)) ∼ N,
where Σ_r a_{ir} x_r ≈ N via the CLT.
To compute ẑ_i(x_j) and ν^z_i(x_j), the means and variances of {p_{i←r}}_{r≠j} suffice; thus Gaussian message passing!
Remaining problem: we have 2MN messages to compute (too many!).
2. Exploiting similarity among the messages {p_{i←j}}_{i=1}^M, AMP employs a Taylor-series approximation of their difference whose error vanishes as M→∞ for dense A (and similarly for {p_{i→j}}_{j=1}^N as N→∞). Finally, we need to compute only O(M+N) messages!
The GAMP Algorithm
Require: matrix A, Bayes ∈ {0, 1}, initializations x^0, ν^0_x
t = 0, s^{−1} = 0, ∀mn: S_mn = |A_mn|^2
repeat
  ν^t_p = S ν^t_x,  p^t = A x^t − s^{t−1} . ν^t_p   (gradient step)
  if Bayes then
    ∀m: z^t_m = E(Z|P; p^t_m, ν^t_{pm}),  ν^t_{zm} = var(Z|P; p^t_m, ν^t_{pm})
  else
    ∀m: z^t_m = prox_{ν^t_{pm} f_z}(p^t_m),  ν^t_{zm} = ν^t_{pm} prox'_{ν^t_{pm} f_z}(p^t_m)
  end if
  ν^t_s = (1 − ν^t_z ./ ν^t_p) ./ ν^t_p,  s^t = (z^t − p^t) ./ ν^t_p   (dual update)
  ν^t_r = 1 ./ (S^T ν^t_s),  r^t = x^t + ν^t_r . A^T s^t   (gradient step)
  if Bayes then
    ∀n: x^{t+1}_n = E(X|R; r^t_n, ν^t_{rn}),  ν^{t+1}_{xn} = var(X|R; r^t_n, ν^t_{rn})
  else
    ∀n: x^{t+1}_n = prox_{ν^t_{rn} f_x}(r^t_n),  ν^{t+1}_{xn} = ν^t_{rn} prox'_{ν^t_{rn} f_x}(r^t_n)
  end if
  t ← t+1
until terminated
Note connections to primal-dual, ADMM, split-Bregman, proximal FB splitting, DR...
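A scalar-per-component NumPy sketch of the listing above, specialized to an AWGN output channel (closed-form E(Z|P) and var(Z|P)) and the non-Bayes soft-threshold input denoiser f_x = α|·|; the small variance floor is a numerical safeguard we add, not part of the listing:

```python
import numpy as np

def gamp_lasso(A, y, psi, alpha, n_iter=50):
    """GAMP per the listing: AWGN output channel N(y; z, psi) and the
    soft-threshold (prox) input branch with f_x = alpha*|.|."""
    M, N = A.shape
    S = np.abs(A) ** 2
    x, nu_x = np.zeros(N), np.ones(N)
    s = np.zeros(M)
    for _ in range(n_iter):
        nu_p = S @ nu_x
        p = A @ x - s * nu_p                            # gradient step
        z = (nu_p * y + psi * p) / (nu_p + psi)         # E(Z|P) for AWGN
        nu_z = nu_p * psi / (nu_p + psi)                # var(Z|P) for AWGN
        s = (z - p) / nu_p                              # dual update
        nu_s = (1.0 - nu_z / nu_p) / nu_p
        nu_r = 1.0 / (S.T @ nu_s)
        r = x + nu_r * (A.T @ s)                        # gradient step
        x = np.sign(r) * np.maximum(np.abs(r) - alpha * nu_r, 0.0)   # prox
        nu_x = np.maximum(nu_r * (np.abs(r) > alpha * nu_r), 1e-12)  # prox' (floored)
    return x
```

Swapping the two prox lines for posterior means/variances under the Bernoulli-GM prior gives the Bayes branch used by EM-GM-AMP.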
Expectation-Maximization
We use expectation-maximization (EM) to learn the signal and noise prior parameters q ≜ {λ, ω, θ, φ, ψ}.
Alternate between
E: constructing a lower bound on ln p(y; q) at the current q
M: maximizing that lower bound over q.
Lower bound via the posterior approximation Π_{n=1}^N p(x_n|y; q^i) ≈ p(x|y; q^i):
ln p(y; q) = Σ_{n=1}^N ∫_{x_n} p(x_n|y; q^i) ln p(x_n; q)   [lower bound]
           + D(p̂_{x|y} ‖ p_{x|y})   [≥ 0]
Incremental maximization: λ, θ_1, ..., θ_L, φ_1, ..., φ_L, ω, ψ.
All quantities needed for the EM updates are provided by AMP!
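As one concrete instance (the closed form below is the standard M-step for a Bernoulli sparsity rate, though the slides do not spell it out), the update for λ simply averages the AMP-computed posterior activity probabilities:

```python
import numpy as np

def em_update_lambda(pi):
    """M-step for the Bernoulli sparsity rate: given the AMP-provided posterior
    activity probabilities pi_n = Pr{x_n != 0 | y; q^i}, the lower bound
    sum_n E[ln p(x_n; q)] is maximized over lambda by their average."""
    return float(np.mean(pi))

# e.g., posterior activity probabilities for N = 4 coefficients
lam_new = em_update_lambda(np.array([0.9, 0.1, 0.0, 1.0]))  # -> 0.5
```

The updates for ω, θ, φ, ψ have the same flavor: weighted averages of posterior moments that AMP already computes.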
Parameter Initialization
Initialization matters; EM can get stuck in a local max. Our approach:
initialize the sparsity λ according to the theoretical LASSO PTC.
initialize the noise and active-signal variances using the known energies ‖y‖_2^2 and ‖A‖_F^2 and SNR^0 = 20 dB (or a user-supplied value):
ψ^0 = ‖y‖_2^2 / ((SNR^0 + 1) M),   (σ^2)^0 = (‖y‖_2^2 − M ψ^0) / (λ^0 ‖A‖_F^2)
fix L = 3 and initialize the GM parameters (ω, θ, φ) as the best fit to a uniform distribution with variance σ^2.
As other options, we provide
a “heavy tailed” mode that forces zero-mean GM components.
a model-order-selection procedure to learn L.
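The variance initializations above, as a sketch (here λ⁰ is passed in directly rather than derived from the LASSO PTC):

```python
import numpy as np

def init_variances(y, A, snr0_db=20.0, lam0=0.1):
    """Initialize noise variance psi0 and active-signal variance sigma2_0
    from ||y||_2^2, ||A||_F^2, an assumed SNR0, and the sparsity init lam0."""
    M = len(y)
    snr0 = 10.0 ** (snr0_db / 10.0)                       # dB -> linear
    psi0 = np.sum(y ** 2) / ((snr0 + 1.0) * M)            # noise variance
    sigma2_0 = (np.sum(y ** 2) - M * psi0) / (lam0 * np.sum(A ** 2))
    return psi0, sigma2_0
```

Note that both formulas only need energies (‖y‖², ‖A‖²_F), so the initialization also works when A is an implicit fast operator with known Frobenius norm.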
Examples of Learned Signal-pdfs
Example comparing EM-GM-AMP-learned priors to the actual priors used to generate the data realization.
The top corresponds to an i.i.d triangle-mixture signal and the bottom to an i.i.d Student-t signal.
[Figure: True and EM-GM-AMP-learned signal prior pdfs (true vs. estimated pdf/pmf for the triangle-mixture and Student-t signals).]
Empirical PTCs: Bernoulli-Gaussian signals
We now evaluate noiseless reconstruction performance via phase-transition curves constructed using N = 1000-length i.i.d signals, i.i.d Gaussian A, and 100 realizations.
We see all variants of EM-GM-AMP performing significantly better than LASSO for i.i.d Bernoulli-Gaussian signals.
Perhaps not a fair comparison, because the true prior is matched to our model.
[Figure: Empirical noiseless Bernoulli-Gaussian PTCs, K/M vs. M/N, for EM-GM-AMP, EM-GM-AMP-3, EM-BG-AMP, genie GM-AMP, theoretical LASSO, and Laplacian-AMP.]
PTCs for Bernoulli and Bernoulli-Rademacher signals
[Figures: Empirical noiseless Bernoulli (left) and Bernoulli-Rademacher (right) PTCs, K/M vs. M/N, for EM-GM-AMP, EM-GM-AMP-3, EM-BG-AMP, genie GM-AMP, theoretical LASSO, and Laplacian-AMP.]
For these signals, we see EM-GM-AMP performing...
significantly better than LASSO,
with model-order selection enabled, as good as genie-aided GM-AMP,
significantly better than EM-BG-AMP for the i.i.d Bernoulli-Rademacher signal.
Noisy Recovery: Bernoulli-Rademacher (±1) signals
We now compare the normalized MSE of EM-GM-AMP to several state-of-the-art algorithms (SL0, T-MSBL, BCS, LASSO via SPGL1) for the task of noisy i.i.d signal recovery under i.i.d Gaussian A.
For this, we fixed N = 1000, K = 100, SNR = 25 dB, and varied M.
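NMSE here is reported in dB; a one-line helper for reproducing such comparisons (our definition, matching common usage):

```python
import numpy as np

def nmse_db(x_hat, x):
    """Normalized MSE in dB: 10*log10(||x_hat - x||_2^2 / ||x||_2^2)."""
    return 10.0 * np.log10(np.sum((x_hat - x) ** 2) / np.sum(x ** 2))
```

So the all-zeros estimate scores 0 dB, and each extra digit of accuracy buys 20 dB.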
For these i.i.d Bernoulli-Rademacher signals, we see EM-GM-AMP outperforming the other algorithms at all undersampling ratios M/N.
Notice that the EM-BG-AMP algorithm cannot accurately model the Bernoulli-Rademacher prior.
[Figure: Noisy Bernoulli-Rademacher recovery NMSE (dB) vs. M/N for EM-GM-AMP, EM-GM-AMP-3, EM-BG-AMP, BCS, T-MSBL, SL0, genie Lasso, OMP, and SP.]
Noisy Recovery: Bernoulli-Gaussian and Bernoulli signals
[Figures: Noisy Bernoulli-Gaussian (left) and Bernoulli (right) recovery NMSE (dB) vs. M/N for EM-GM-AMP, EM-GM-AMP-L/-3, EM-BG-AMP, BCS, T-MSBL, SL0, genie Lasso, OMP, and SP.]
For i.i.d Bernoulli-Gaussian and i.i.d Bernoulli signals, EM-GM-AMP again dominates the other algorithms.
We attribute the excellent performance of EM-GM-AMP to its ability to learn and exploit the true signal prior.
Noisy Recovery of Heavy-Tailed signals
[Figures: Noisy Student-t (left) and log-normal (right) recovery NMSE (dB) vs. M/N for EM-GM-AMP, EM-GM-AMP-4, EM-BG-AMP, BCS, T-MSBL, SL0, genie Lasso, OMP, and SP.]
In its "heavy tailed" mode, EM-GM-AMP again uniformly outperforms all the other algorithms.
Rankings among the other algorithms differ across signal types (compare the OMP and SL0 performances).
Runtime versus signal-length N
We fix M/N = 0.5, K/N = 0.1, SNR = 25 dB, and average over 50 trials.
[Figures: Noisy Bernoulli-Rademacher recovery runtime in seconds (left) and NMSE in dB (right) vs. signal length N (10^3 to 10^6), for EM-GM-AMP, EM-GM-AMP-3, EM-BG-AMP, BCS, T-MSBL, SL0, SPGL1, OMP, SP, and the fast-operator variants EM-GM-AMP fft, EM-GM-AMP-3 fft, EM-BG-AMP fft, and SPGL1 fft.]
For all N > 1000, EM-GM-AMP has the fastest runtime!
EM-GM-AMP can also leverage fast operators for A (e.g., FFT).
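Leveraging a fast operator means never forming A explicitly: AMP needs only products with A and A^T per iteration. A sketch with a randomly row-subsampled unitary FFT (an illustrative operator choice, not necessarily the one used in the plots):

```python
import numpy as np

class SubsampledFFT:
    """Implicit A = P F, where F is the unitary N-point FFT and P keeps the
    rows in `rows`. Both products cost O(N log N) instead of O(M N)."""
    def __init__(self, N, rows):
        self.N = N
        self.rows = np.asarray(rows)

    def matvec(self, x):                  # A @ x
        return np.fft.fft(x, norm="ortho")[self.rows]

    def rmatvec(self, z):                 # A^H @ z (adjoint)
        full = np.zeros(self.N, dtype=complex)
        full[self.rows] = z
        return np.fft.ifft(full, norm="ortho")
```

For this operator |A_mn|^2 = 1/N for every entry, so the S-matrix products in GAMP reduce to scaled sums and no explicit S is needed either.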
Extension to structured sparsity (Justin Ziniel)
Recovery of an audio signal sparsified via DCT Ψ and compressively sampled via i.i.d Gaussian Φ (so that A = ΦΨ).
Exploit persistence of support across time via Markov chains and turbo AMP.
[Factor graph: per-frame AMP blocks for t = 0, ..., T, with measurement nodes g^(t)_m, coefficient nodes x^(t)_n, and prior nodes f^(t)_n, coupled across frames by Markov-chain factors h^(t)_n on the support variables s^(t)_n and d^(t)_n on the amplitude variables θ^(t)_n.]
algorithm          | M/N = 1/5         | M/N = 1/3          | M/N = 1/2
EM-GM-AMP-3        | -9.04 dB, 8.77 s  | -12.72 dB, 10.26 s | -17.17 dB, 11.92 s
turbo EM-GM-AMP-3  | -12.34 dB, 9.37 s | -16.07 dB, 11.05 s | -20.94 dB, 12.96 s
Conclusions
We proposed a sparse-reconstruction algorithm that uses EM and AMP to learn and exploit the GM signal prior and AWGN variance.
Advantages of EM-GM-AMP (for signal length N ≳ 1000):
State-of-the-art NMSE performance with all tested i.i.d priors.
State-of-the-art complexity.
Minimal tuning: choose between "sparse" or "heavy-tailed" modes.
Ongoing related work:
Extensions beyond AWGN (e.g., phase retrieval, binary classification).
Universal learning/exploitation of structured sparsity.
Extensions to matrix completion, dictionary learning, robust PCA, NNMF.
EM-AMP theory:
Asymptotic consistency under a matched prior with certain identifiability conditions:
Kamilov, Rangan, Fletcher, Unser, "Approximate message passing with consistent parameter estimation and applications to sparse learning," NIPS 2012.
EM-GM-AMP is integrated into GAMPmatlab: http://sourceforge.net/projects/gampmatlab/
Interface and examples at http://ece.osu.edu/~vilaj/EMGMAMP/EMGMAMP.html
Thanks!