Topics in MMSE Estimation for Sparse Approximation


Topics in MMSE Estimation for Sparse Approximation

Michael Elad, The Computer Science Department, The Technion – Israel Institute of Technology, Haifa 32000, Israel


Joint work with

Irad Yavneh, Matan Protter, Javier Turek

The CS Department, The Technion


Part I - Motivation: Denoising by Averaging Several Sparse Representations

Sparse Representation Denoising

[Figure: a fixed dictionary D of size N×K, and the signal model x = Dα with a sparse α.]

Sparse representation modeling: assume that we get a noisy measurement vector y = x + v, where v is AWGN. Our goal is the recovery of x (or α). The common practice is to approximate the solution of

$$\min_{\alpha}\ \|\alpha\|_0 \quad \text{s.t.} \quad \|D\alpha-y\|_2^2\le\varepsilon^2.$$


Orthogonal Matching Pursuit

OMP finds one atom at a time for approximating the solution of

$$\min_{\alpha}\ \|\alpha\|_0 \quad \text{s.t.} \quad \|D\alpha-y\|_2^2\le\varepsilon^2.$$

Initialization: n = 0, α⁰ = 0, S⁰ = ∅, r⁰ = y − Dα⁰ = y.

Main iteration (n ← n + 1):
1. Compute E(i) = min_z ‖z·d_i − r^{n−1}‖₂² for all 1 ≤ i ≤ K.
2. Choose i₀ s.t. ∀ 1 ≤ i ≤ K, E(i₀) ≤ E(i).
3. Update support: S^n = S^{n−1} ∪ {i₀}.
4. LS: α^n = ArgMin_α ‖Dα − y‖₂² s.t. supp(α) = S^n.
5. Update residual: r^n = y − Dα^n.

Stop if ‖r^n‖₂ ≤ ε; otherwise, iterate.
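To make the loop concrete, here is a minimal NumPy sketch of the OMP above. The function name omp and its interface are our own; we assume the columns (atoms) of D are ℓ2-normalized, so that E(i) = ‖r‖² − (d_iᵀr)² and minimizing E(i) is the same as maximizing |d_iᵀr|.

```python
import numpy as np

def omp(D, y, eps):
    """Greedy OMP sketch: add the atom minimizing E(i) until ||r||_2 <= eps.

    Assumes l2-normalized atoms, so E(i) = ||r||^2 - (d_i^T r)^2.
    """
    K = D.shape[1]
    S = []                                  # current support S^n
    alpha = np.zeros(K)
    r = y.astype(float).copy()              # r^0 = y
    while np.linalg.norm(r) > eps and len(S) < K:
        E = np.linalg.norm(r) ** 2 - (D.T @ r) ** 2   # E(i) for every atom
        E[S] = np.inf                       # never re-pick a chosen atom
        S.append(int(np.argmin(E)))         # step 2: the best atom
        coef, *_ = np.linalg.lstsq(D[:, S], y, rcond=None)  # step 4: LS on S^n
        alpha = np.zeros(K)
        alpha[S] = coef
        r = y - D @ alpha                   # step 5: residual update
    return alpha
```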


Using Several Representations

Consider the denoising problem

$$\min_{\alpha}\ \|\alpha\|_0 \quad \text{s.t.} \quad \|D\alpha-y\|_2^2\le\varepsilon^2,$$

and suppose that we can find a group of J candidate solutions {α_j}_{j=1}^J such that, for all j, ‖Dα_j − y‖₂² ≤ ε² and ‖α_j‖₀ ≪ N.

Basic questions:
- What could we do with such a set of competing solutions in order to better denoise y?
- Why should this help?
- How shall we practically find such a set of solutions?

Relevant work: [Leung & Barron ('06)], [Larsson & Selen ('07)], [Schniter et al. ('08)], [Elad & Yavneh ('08)], [Giraud ('08)], [Protter et al. ('10)], …


Generating Many Representations

Our answer: randomizing the OMP. The algorithm is identical to OMP except for the atom-selection rule in step 2:

Initialization: n = 0, α⁰ = 0, S⁰ = ∅, r⁰ = y.

Main iteration (n ← n + 1):
1. Compute E(i) = min_z ‖z·d_i − r^{n−1}‖₂² for all 1 ≤ i ≤ K.
2. Choose i₀ at random, with probability proportional to exp(−c·E(i₀)).
3. Update support: S^n = S^{n−1} ∪ {i₀}.
4. LS: α^n = ArgMin_α ‖Dα − y‖₂² s.t. supp(α) = S^n.
5. Update residual: r^n = y − Dα^n.

Stop if ‖r^n‖₂ ≤ ε; otherwise, iterate.

* Larsson and Schniter propose a more complicated and deterministic tree-pruning method.

For now, let's set the parameter c manually for best performance. Later we shall define a way to set it automatically.
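The only change relative to the OMP sketch above is the selection rule; a minimal sketch under the same assumptions (names ours, atoms ℓ2-normalized). Since E(i) = ‖r‖² − (d_iᵀr)² for normalized atoms, exp(−c·E(i)) is proportional to exp(c·(d_iᵀr)²), and the constant ‖r‖² cancels in the normalization.

```python
import numpy as np

def rand_omp(D, y, eps, c, rng=None):
    """One RandOMP run: like OMP, but the atom is drawn at random
    with probability proportional to exp(-c * E(i))."""
    rng = np.random.default_rng() if rng is None else rng
    K = D.shape[1]
    S, alpha, r = [], np.zeros(K), y.astype(float).copy()
    while np.linalg.norm(r) > eps and len(S) < K:
        corr2 = (D.T @ r) ** 2                    # (d_i^T r)^2 per atom
        w = np.exp(c * (corr2 - corr2.max()))     # shifted for numerical stability
        w[S] = 0.0                                # never re-pick a chosen atom
        S.append(int(rng.choice(K, p=w / w.sum())))
        coef, *_ = np.linalg.lstsq(D[:, S], y, rcond=None)
        alpha = np.zeros(K)
        alpha[S] = coef
        r = y - D @ alpha
    return alpha
```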


Let's Try

Proposed experiment:
- Form a random D of size 100×200.
- Multiply by a sparse vector α₀ with ‖α₀‖₀ = 10.
- Add Gaussian iid noise v (σ = 1) and obtain y = Dα₀ + v.
- Solve

$$\min_{\alpha}\ \|\alpha\|_0 \quad \text{s.t.} \quad \|D\alpha-y\|_2^2\le 100$$

  using OMP, and obtain α̂^{OMP}.
- Use RandOMP and obtain {α̂_j^{RandOMP}}_{j=1}^{1000}.

Let's look at the obtained representations…
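A sketch of this experiment, reusing the omp and rand_omp routines above. The column normalization of D and the value c = 0.5 are our placeholders; the slides set c by hand for best performance.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, k0, sigma, J = 100, 200, 10, 1.0, 1000

D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)               # l2-normalized atoms

alpha0 = np.zeros(K)
support = rng.choice(K, size=k0, replace=False)
alpha0[support] = rng.standard_normal(k0)    # sparse vector, ||alpha0||_0 = 10
y = D @ alpha0 + sigma * rng.standard_normal(N)

eps = np.sqrt(N) * sigma                     # so that ||D a - y||^2 <= 100
a_omp = omp(D, y, eps)
reps = [rand_omp(D, y, eps, c=0.5, rng=rng) for _ in range(J)]
a_avg = np.mean(reps, axis=0)                # average of the J RandOMP runs

def noise_attenuation(a):
    return (np.linalg.norm(D @ a - D @ alpha0) ** 2
            / np.linalg.norm(y - D @ alpha0) ** 2)

print("OMP:            ", noise_attenuation(a_omp))
print("RandOMP average:", noise_attenuation(a_avg))
```

The averaged representation is dense, yet, as the following slides show, its noise attenuation is markedly better than OMP's.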


Some Observations

[Figure: four panels comparing the 1000 RandOMP representations to the single OMP one – (a) histogram of cardinalities (Random-OMP cardinalities vs. the OMP cardinality); (b) histogram of the representation errors; (c) histogram of the noise attenuation; (d) noise attenuation as a function of cardinality.]

Here the representation error is ‖Dα̂ − y‖₂², and the noise attenuation (denoising factor) is

$$\frac{\|D\hat\alpha - D\alpha_0\|_2^2}{\|y - D\alpha_0\|_2^2}.$$

We see that

•The OMP gives the sparsest solution

•Nevertheless, it is not the most effective for denoising.

•The cardinality of a representation does not reveal its efficiency.


The Surprise… (to some of us)

Let's propose the average

$$\hat\alpha = \frac{1}{1000}\sum_{j=1}^{1000}\hat\alpha_j^{RandOMP}$$

as our representation.

[Figure: the averaged representation vs. the original and the OMP representations, entry by entry (index vs. value).]

This representation IS NOT SPARSE AT ALL, but its noise attenuation is 0.06 (OMP gives 0.16).


Repeat This Experiment…

- Dictionary (random) of size N = 100, K = 200.
- True support of α is 10.
- σ_x = 1 and ε = 10.
- We run OMP for denoising.
- We run RandOMP J = 1000 times and average:

$$\hat\alpha = \frac{1}{J}\sum_{j=1}^{J}\hat\alpha_j^{RandOMP}$$

- Denoising is assessed by

$$\frac{\|D\hat\alpha - D\alpha_0\|_2^2}{\|y - D\alpha_0\|_2^2}.$$

[Figure: scatter of the RandOMP vs. the OMP denoising factors over the repeated trials, with the mean point marked; the points with a zero solution are the cases where ‖y‖₂² ≤ 100.]

Part II - Explanation: It Is Time to Be More Precise


Our Signal Model

[Figure: a fixed dictionary D of size N×K and the ideal signal x = Dα.]

D is fixed and known. Assume that α is built by:
- Choosing the support s with probability P(s) from all the 2^K possibilities Ω; we assume that the events {i ∈ s} are drawn independently, with P(i ∈ s) = P_i.
- Choosing the α_s coefficients as iid Gaussian entries: α_s ~ N(0, σ_x²I).

The ideal signal is x = Dα = D_sα_s. The p.d.f.'s P(α) and P(x) are clear and known.


Adding Noise

[Figure: the noisy measurement y = Dα + v.]

Noise assumed: the noise v is an additive white Gaussian vector with probability density P_v(v), so that

$$P(y\,|\,x) = C\cdot\exp\left(-\frac{\|y-x\|_2^2}{2\sigma^2}\right).$$

The conditional p.d.f.'s P(y|α), P(α|y), and even P(y|s), P(s|y), are all clear and well-defined (although they may appear nasty).


The Key – The Posterior P(α|y)

We have access to P(α|y), and the estimation of x is done by estimating α and multiplying by D. Three estimators of interest:

- MAP: α̂^{MAP} = ArgMax_α P(α|y) *
- MMSE: α̂^{MMSE} = E[α|y]
- Oracle: the estimate obtained when the support s is known.

The MAP and the MMSE estimators are impossible to compute, as we show next.

* Actually, there is a delicate problem with this definition, due to the unavoidable mixture of continuous and discrete p.d.f.'s. The solution is to estimate the MAP's support S.

Let's Start with the Oracle

When s is known, Bayes' rule gives

$$P(\alpha_s\,|\,y,s) = \frac{P(y\,|\,\alpha_s)\,P(\alpha_s)}{P(y)},$$

with

$$P(\alpha_s)\propto\exp\left(-\frac{\|\alpha_s\|_2^2}{2\sigma_x^2}\right),\qquad P(y\,|\,\alpha_s)\propto\exp\left(-\frac{\|D_s\alpha_s-y\|_2^2}{2\sigma^2}\right),$$

so that

$$P(\alpha_s\,|\,y)\propto\exp\left(-\frac{\|D_s\alpha_s-y\|_2^2}{2\sigma^2}-\frac{\|\alpha_s\|_2^2}{2\sigma_x^2}\right),$$

and the oracle estimate is

$$\hat\alpha_s=\left(\frac{1}{\sigma^2}D_s^TD_s+\frac{1}{\sigma_x^2}I\right)^{-1}\frac{1}{\sigma^2}D_s^Ty\;\triangleq\;Q_s^{-1}h_s.$$

Comments:
- This estimate is both the MAP and the MMSE (for the known support).
- The oracle estimate of x is obtained by multiplication by D_s.

Marginalizing over α_s,

$$P(y\,|\,s)=\int P(y\,|\,s,\alpha_s)\,P(\alpha_s)\,d\alpha_s\;\propto\;\sigma_x^{-|s|}\exp\left(\frac{h_s^TQ_s^{-1}h_s}{2}-\frac{\log\det(Q_s)}{2}\right),$$

and, based on our prior for generating the support,

$$P(s)=\prod_{i\in s}P_i\prod_{j\notin s}(1-P_j).$$
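In code, the oracle is a single regularized least-squares solve; a minimal sketch (function name ours):

```python
import numpy as np

def oracle_estimate(D, y, s, sigma, sigma_x):
    """Oracle estimate alpha_s = Q_s^{-1} h_s for a known support s."""
    Ds = D[:, list(s)]
    Q = Ds.T @ Ds / sigma**2 + np.eye(len(s)) / sigma_x**2
    h = Ds.T @ y / sigma**2
    return np.linalg.solve(Q, h)
```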


The MAP Estimation

$$\hat s^{MAP}=\underset{s}{\mathrm{ArgMax}}\;P(s\,|\,y)=\underset{s}{\mathrm{ArgMax}}\;\frac{P(y\,|\,s)\,P(s)}{P(y)}$$

$$\hat s^{MAP}=\underset{s}{\mathrm{ArgMax}}\;\sigma_x^{-|s|}\exp\left(\frac{h_s^TQ_s^{-1}h_s}{2}-\frac{\log\det(Q_s)}{2}\right)\prod_{i\in s}P_i\prod_{j\notin s}(1-P_j)$$


The MAP Estimation

Implications:
- The MAP estimator requires testing all the possible supports for the maximization; for the found support, the oracle formula is used.
- In typical problems, this process is impossible, as there is a combinatorial set of possibilities.
- This is why we rarely use the exact MAP, and we typically replace it with approximation algorithms (e.g., OMP).

In log form,

$$\hat s^{MAP}=\underset{s}{\mathrm{ArgMax}}\;\left\{\frac{h_s^TQ_s^{-1}h_s}{2}-\frac{\log\det(Q_s)}{2}-|s|\log\sigma_x+\sum_{i\in s}\log P_i+\sum_{j\notin s}\log(1-P_j)\right\}.$$

The MMSE Estimation

For a known support, the oracle we have just derived gives

$$E[\alpha\,|\,y,s]=\hat\alpha_s=Q_s^{-1}h_s,$$

and the posterior of the support is

$$P(s\,|\,y)\propto P(s)\,P(y\,|\,s)\propto\sigma_x^{-|s|}\exp\left(\frac{h_s^TQ_s^{-1}h_s}{2}-\frac{\log\det(Q_s)}{2}\right)\prod_{i\in s}P_i\prod_{j\notin s}(1-P_j).$$

Therefore,

$$\hat\alpha^{MMSE}=E[\alpha\,|\,y]=\sum_s P(s\,|\,y)\,E[\alpha\,|\,y,s]=\sum_s P(s\,|\,y)\,\hat\alpha_s.$$

The MMSE Estimation

$$\hat\alpha^{MMSE}=E[\alpha\,|\,y]=\sum_s P(s\,|\,y)\,\hat\alpha_s$$

Implications:

The best estimator (in terms of L2 error) is a weighted average of many sparse representations!!!

As in the MAP case, in typical problems one cannot compute this expression, as the summation is over a combinatorial set of possibilities. We should propose approximations here as well.
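When K is small enough to enumerate all 2^K supports (as in the 10×16 comparison of 'Comparative Results' below), the exact MAP and MMSE can be computed directly from these formulas. A sketch, assuming a constant P_i = P and working with log-weights for numerical stability:

```python
import numpy as np
from itertools import combinations

def exact_map_mmse(D, y, P, sigma, sigma_x):
    """Enumerate all supports to form P(s|y), the exact MAP and the exact
    MMSE estimates. Feasible only for small K: the sum has 2^K terms."""
    N, K = D.shape
    log_w, alphas = [], []
    for size in range(K + 1):
        for s in combinations(range(K), size):
            a, lw = np.zeros(K), 0.0
            if s:
                Ds = D[:, list(s)]
                Q = Ds.T @ Ds / sigma**2 + np.eye(size) / sigma_x**2
                h = Ds.T @ y / sigma**2
                a[list(s)] = np.linalg.solve(Q, h)   # oracle on this support
                lw = (h @ np.linalg.solve(Q, h) / 2  # h^T Q^{-1} h / 2
                      - np.linalg.slogdet(Q)[1] / 2  # - log det(Q) / 2
                      - size * np.log(sigma_x))
            lw += size * np.log(P) + (K - size) * np.log(1 - P)  # prior P(s)
            log_w.append(lw)
            alphas.append(a)
    w = np.exp(np.array(log_w) - max(log_w))
    w /= w.sum()                                     # posterior P(s | y)
    alphas = np.array(alphas)
    return alphas[np.argmax(w)], w @ alphas          # (MAP, MMSE)
```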

The Case of |s| = 1 and P_i = P

For a single-atom support s = {i} (d_i being the i-th atom in D), the posterior reduces to

$$P(s=\{i\}\,|\,y)\;\propto\;\frac{P}{1-P}\,\exp\left(\frac{\sigma_x^2}{2\sigma^2(\sigma_x^2+\sigma^2)}\,(y^Td_i)^2\right).$$

The constant in the exponent is our c in the Random-OMP.

Based on this, we can propose a greedy algorithm for both MAP and MMSE:
- MAP – choose the atom with the largest inner product (out of K), and do so one at a time, while freezing the previous ones (almost OMP).
- MMSE – draw an atom at random in a greedy algorithm, based on the above probability set, getting close to P(s|y) in the overall draw (almost RandOMP).

Comparative Results

The following results correspond to a small dictionary (10×16), where the combinatorial formulas can be evaluated as well.

Parameters:
- N, K: 10×16
- P = 0.1 (varying cardinality)
- σ_x = 1
- J = 50 (RandOMP)
- Averaged over 1000 experiments

[Figure: the relative representation mean-squared error of the Oracle, the exact MMSE, the exact MAP, OMP, and Rand-OMP estimators, as a function of the noise level.]

Part III - Diving In: A Closer Look at the Unitary Case (DᵀD = DDᵀ = I)


Few Basic Observations

Let us denote β = Dᵀy and c² = σ_x²/(σ_x² + σ²). Then:

$$Q_s=\frac{1}{\sigma^2}D_s^TD_s+\frac{1}{\sigma_x^2}I=\left(\frac{1}{\sigma^2}+\frac{1}{\sigma_x^2}\right)I,\qquad h_s=\frac{1}{\sigma^2}D_s^Ty=\frac{1}{\sigma^2}\beta_s$$

$$\hat\alpha_s^{oracle}=Q_s^{-1}h_s=\frac{\sigma_x^2}{\sigma_x^2+\sigma^2}\,\beta_s=c^2\beta_s\quad\text{(the oracle)}$$

$$\frac{h_s^TQ_s^{-1}h_s}{2}=\frac{c^2}{2\sigma^2}\|\beta_s\|_2^2,\qquad \frac{\log\det(Q_s)}{2}=\frac{|s|}{2}\log\frac{1}{\sigma_x^2(1-c^2)}$$


Back to the MAP Estimation

Plugging these into the posterior of the support gives

$$P(s\,|\,y)\;\propto\;\prod_{i\in s}q_i,\qquad q_i\;\triangleq\;\frac{P_i}{1-P_i}\,\sqrt{1-c^2}\,\exp\left(\frac{c^2\beta_i^2}{2\sigma^2}\right).$$


The MAP Estimator

Ŝ^{MAP} is obtained by maximizing the expression

$$P(s\,|\,y)\propto\prod_{i\in s}q_i,\qquad q_i=\frac{P}{1-P}\sqrt{1-c^2}\exp\left(\frac{c^2\beta_i^2}{2\sigma^2}\right).$$

Thus, every i such that q_i > 1 should be in the support, which leads to the elementwise rule

$$\hat\alpha_i^{MAP}=\begin{cases}c^2\beta_i, & \dfrac{c^2\beta_i^2}{2\sigma^2}>\log\dfrac{1-P}{P\sqrt{1-c^2}}\\[4pt]0, & \text{otherwise.}\end{cases}$$

[Figure: α̂_i^{MAP} as a function of β_i – a hard-thresholding curve; P = 0.1, σ = 0.3, σ_x = 1.]


The MMSE Estimation

Some algebra, and we get that

$$\hat\alpha_i^{MMSE}=\frac{q_i}{1+q_i}\,c^2\beta_i,\qquad q_i=\frac{P}{1-P}\sqrt{1-c^2}\exp\left(\frac{c^2\beta_i^2}{2\sigma^2}\right).$$

This result leads to a dense representation vector. The curve is a smoothed version of the MAP one.

[Figure: α̂_i^{MMSE} as a function of β_i – a shrinkage curve; P = 0.1, σ = 0.3, σ_x = 1.]
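Both closed-form rules are elementwise, so they reduce to scalar input-output curves; a short sketch that evaluates them on a grid (function name ours, parameters as in the plots):

```python
import numpy as np

def unitary_map_mmse(beta, P=0.1, sigma=0.3, sigma_x=1.0):
    """Elementwise MAP (hard threshold) and MMSE (shrinkage) rules for a
    unitary dictionary, applied to beta = D^T y."""
    c2 = sigma_x**2 / (sigma_x**2 + sigma**2)
    q = (P / (1 - P)) * np.sqrt(1 - c2) * np.exp(c2 * beta**2 / (2 * sigma**2))
    g = q / (1 + q)
    a_map = np.where(q > 1, c2 * beta, 0.0)   # keep entry i iff q_i > 1
    a_mmse = g * c2 * beta                    # smoothed version of the MAP rule
    return a_map, a_mmse

beta = np.linspace(-3, 3, 601)
a_map, a_mmse = unitary_map_mmse(beta)        # the two curves of the figures
```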

What About the Error?

With g_i ≜ q_i/(1 + q_i), the three estimators admit closed-form errors:

$$E\left[\|\hat\alpha^{oracle}-\alpha\|_2^2\right]=\sum_s P(s\,|\,y)\,\mathrm{trace}(Q_s^{-1})=\cdots=c^2\sigma^2\sum_{i=1}^{n}g_i$$

$$E\left[\|\hat\alpha^{MMSE}-\alpha\|_2^2\right]=\sum_s P(s\,|\,y)\left[\mathrm{trace}(Q_s^{-1})+\|\hat\alpha^{MMSE}-\hat\alpha_s^{oracle}\|_2^2\right]=\cdots=c^2\sigma^2\sum_{i=1}^{n}g_i+c^4\sum_{i=1}^{n}\beta_i^2\,g_i(1-g_i)$$

$$E\left[\|\hat\alpha^{MAP}-\alpha\|_2^2\right]=\|\hat\alpha^{MAP}-\hat\alpha^{MMSE}\|_2^2+E\left[\|\hat\alpha^{MMSE}-\alpha\|_2^2\right]=\cdots=c^2\sigma^2\sum_{i=1}^{n}g_i+c^4\sum_{i=1}^{n}\beta_i^2\left(g_i+I_i^{MAP}(1-2g_i)\right)$$

where I_i^{MAP} indicates whether entry i belongs to the MAP support.
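These closed-form expressions are easy to check by Monte-Carlo; a sketch for the MMSE error in the unitary case (we take D = I, so β = y, and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, P, sigma, sigma_x, T = 100, 0.1, 0.3, 1.0, 2000
c2 = sigma_x**2 / (sigma_x**2 + sigma**2)

emp = theo = 0.0
for _ in range(T):
    s = rng.random(n) < P                            # support, P(i in s) = P
    alpha = np.where(s, sigma_x * rng.standard_normal(n), 0.0)
    beta = alpha + sigma * rng.standard_normal(n)    # D = I, so beta = y
    q = (P / (1 - P)) * np.sqrt(1 - c2) * np.exp(c2 * beta**2 / (2 * sigma**2))
    g = q / (1 + q)
    emp += np.sum((g * c2 * beta - alpha) ** 2) / T          # empirical error
    theo += np.sum(c2 * sigma**2 * g + c2**2 * beta**2 * g * (1 - g)) / T

print(emp, theo)   # the two numbers should nearly coincide
```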

A Synthetic Experiment

The following results correspond to a unitary dictionary of size 100×100.

Parameters:
- n, K: 100×100
- P = 0.1
- σ_x = 1
- Averaged over 1000 experiments
- The average errors are shown relative to nσ²

[Figure: the relative mean-squared error as a function of σ (0 to 2) – empirical and theoretical curves for the Oracle, the MMSE, and the MAP estimators.]

Part IV - Theory: Estimation Errors

Useful Lemma

Let (a_k, b_k), k = 1, 2, …, n, be pairs of positive real numbers. Let m be the index of a pair such that

$$\forall k,\qquad \frac{a_k}{b_k}\le\frac{a_m}{b_m}.$$

Then

$$\frac{\sum_{k=1}^{n}a_k}{\sum_{k=1}^{n}b_k}\;\le\;\frac{a_m}{b_m}.$$

Equality is obtained only if all the ratios a_k/b_k are equal.
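The proof is elementary:

```latex
% Proof: for every k, positivity of b_k and a_k/b_k <= a_m/b_m give
% a_k <= (a_m/b_m) b_k. Summing over k and dividing by sum_k b_k > 0:
\[
\sum_{k=1}^{n} a_k \;\le\; \frac{a_m}{b_m}\sum_{k=1}^{n} b_k
\qquad\Longrightarrow\qquad
\frac{\sum_{k=1}^{n} a_k}{\sum_{k=1}^{n} b_k} \;\le\; \frac{a_m}{b_m},
\]
% with equality iff a_k = (a_m/b_m) b_k for every k.
```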

We are interested in this result because of the closed-form errors derived above:

$$E\|\hat\alpha^{oracle}-\alpha\|_2^2=c^2\sigma^2\sum_{i=1}^{n}g_i$$

$$E\|\hat\alpha^{MMSE}-\alpha\|_2^2=c^2\sigma^2\sum_{i=1}^{n}g_i+c^4\sum_{i=1}^{n}\beta_i^2\,g_i(1-g_i)$$

$$E\|\hat\alpha^{MAP}-\alpha\|_2^2=c^2\sigma^2\sum_{i=1}^{n}g_i+c^4\sum_{i=1}^{n}\beta_i^2\left(g_i+I_i^{MAP}(1-2g_i)\right)$$

This leads to …

Theorem 1 – MMSE Error

Define

$$G_k\;\triangleq\;\frac{P_k}{1-P_k}\,\sqrt{1-c^2}$$

and choose m such that ∀k, G_m ≥ G_k. Then

$$\frac{E\|\hat\alpha^{MMSE}-\alpha\|_2^2}{E\|\hat\alpha^{oracle}-\alpha\|_2^2}\;\le\;\begin{cases}\dfrac{1}{4}\left(1+2\log\dfrac{1}{G_m}\right)+\dfrac{e}{4}, & G_m\le e^{-1}\\[6pt]1+\dfrac{2}{e\,G_m}, & \text{otherwise.}\end{cases}$$

For P_k = P = 1/K, this error-ratio bound becomes Const·log K.

Theorem 2 – MAP Error

With the same G_k and the same choice of m (∀k, G_m ≥ G_k), we have

$$\frac{E\|\hat\alpha^{MAP}-\alpha\|_2^2}{E\|\hat\alpha^{oracle}-\alpha\|_2^2}\;\le\;\begin{cases}1+2\log\dfrac{1}{G_m}, & G_m\le e^{-1}\\[6pt]1+\dfrac{2}{e\,G_m}, & \text{otherwise.}\end{cases}$$

Again, for P_k = P = 1/K, this error-ratio bound becomes Const·log K.

The Bounds' Factors vs. P

[Figure: the two bound factors of Theorems 1 and 2 (for the MMSE and the MAP), plotted as a function of P over [10⁻⁴, 1] on a logarithmic axis.]

Parameters:
- P ∈ [0, 1]
- σ_x = 1
- σ = 0.3

Notice that the tendency of the two estimators to align for P → 0 is not reflected in these bounds.

Part V - We Are Done: Summary and Conclusions


Today We Have Seen that …

- How are sparsity and redundancy used for denoising of signals/images? By finding the sparsest representation and using it to recover the clean signal.
- Can we do better? Yes! Averaging several representations leads to better denoising, as it approximates the MMSE.
- What about the unitary case? There, the MAP and the MMSE enjoy closed-form, exact, and cheap formulae, and their error is bounded and tightly related to the oracle's error.

More on these topics (including the slides and the relevant papers) can be found at http://www.cs.technion.ac.il/~elad
