
Sparse Coding and Dictionary Learning for Image Analysis

Julien Mairal

INRIA Visual Recognition and Machine Learning Summer School, 27th July 2010

Julien Mairal Sparse Coding and Dictionary Learning 1/137

Page 2: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

What is this lecture about?

Why sparsity, what for and how?

Signal and image processing: Restoration, reconstruction.

Machine learning: Selecting relevant features.

Computer vision: Modelling the local appearance of image patches.

Computer vision: Recent (and intriguing) results in bags of words models.

Optimization: Solving challenging problems.

Julien Mairal Sparse Coding and Dictionary Learning 2/137

Page 3: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

The Image Denoising Problem

$$\underbrace{y}_{\text{measurements}} \;=\; \underbrace{x_{\text{orig}}}_{\text{original image}} \;+\; \underbrace{w}_{\text{noise}}$$

Julien Mairal Sparse Coding and Dictionary Learning 3/137

Page 4: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration

$$\underbrace{y}_{\text{measurements}} \;=\; \underbrace{x_{\text{orig}}}_{\text{original image}} \;+\; \underbrace{w}_{\text{noise}}$$

Energy minimization problem - MAP estimation

$$E(x) \;=\; \underbrace{\tfrac{1}{2}\|y - x\|_2^2}_{\text{relation to measurements}} \;+\; \underbrace{\Pr(x)}_{\text{image model }(-\log\text{ prior})}$$

Some classical priors

Smoothness: $\lambda\|Lx\|_2^2$
Total variation: $\lambda\|\nabla x\|_1$
MRF priors

. . .

Julien Mairal Sparse Coding and Dictionary Learning 4/137

Page 5: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

What is a Sparse Linear Model?

Let $x \in \mathbb{R}^m$ be a signal.

Let $D = [d_1, \dots, d_p] \in \mathbb{R}^{m\times p}$ be a set of normalized “basis vectors”. We call it a dictionary.

D is “adapted” to x if it can represent it with a few basis vectors, that is, there exists a sparse vector α in $\mathbb{R}^p$ such that x ≈ Dα. We call α the sparse code.

$$\underbrace{\;x\;}_{x\,\in\,\mathbb{R}^m} \;\approx\; \underbrace{\begin{bmatrix} d_1 & d_2 & \cdots & d_p \end{bmatrix}}_{D\,\in\,\mathbb{R}^{m\times p}} \; \underbrace{\begin{bmatrix} \alpha[1] \\ \alpha[2] \\ \vdots \\ \alpha[p] \end{bmatrix}}_{\alpha\,\in\,\mathbb{R}^p,\ \text{sparse}}$$

Julien Mairal Sparse Coding and Dictionary Learning 5/137

Page 6: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

First Important Idea

Why Sparsity?

A dictionary can be good for representing a class of signals, but not for representing white Gaussian noise.

Julien Mairal Sparse Coding and Dictionary Learning 6/137

Page 7: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

The Sparse Decomposition Problem

$$\min_{\alpha\in\mathbb{R}^p} \underbrace{\frac{1}{2}\|x - D\alpha\|_2^2}_{\text{data fitting term}} + \underbrace{\lambda\,\psi(\alpha)}_{\substack{\text{sparsity-inducing}\\ \text{regularization}}}$$

ψ induces sparsity in α. It can be

the ℓ0 “pseudo-norm”: $\|\alpha\|_0 := \#\{i \ \text{s.t.}\ \alpha[i] \neq 0\}$ (NP-hard)

the ℓ1 norm: $\|\alpha\|_1 := \sum_{i=1}^p |\alpha[i]|$ (convex),

. . .

This is a selection problem. When ψ is the ℓ1-norm, the problem is called the Lasso [Tibshirani, 1996] or basis pursuit [Chen et al., 1999].

Julien Mairal Sparse Coding and Dictionary Learning 7/137

Page 8: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration

Designed dictionaries

[Haar, 1910], [Zweig, Morlet, Grossman ∼70s], [Meyer, Mallat, Daubechies, Coifman, Donoho, Candes ∼80s-today]. . . (see [Mallat, 1999])
Wavelets, Curvelets, Wedgelets, Bandlets, . . . lets

Learned dictionaries of patches

[Olshausen and Field, 1997], [Engan et al., 1999], [Lewicki and Sejnowski, 2000], [Aharon et al., 2006], [Roth and Black, 2005], [Lee et al., 2007]

$$\min_{\alpha_i,\,D\in\mathcal{C}} \sum_i \underbrace{\frac{1}{2}\|x_i - D\alpha_i\|_2^2}_{\text{reconstruction}} + \underbrace{\lambda\,\psi(\alpha_i)}_{\text{sparsity}}$$

ψ(α) = ‖α‖0 (“ℓ0 pseudo-norm”)

ψ(α) = ‖α‖1 (ℓ1 norm)

Julien Mairal Sparse Coding and Dictionary Learning 8/137

Page 9: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration

Solving the denoising problem

[Elad and Aharon, 2006]

Extract all overlapping 8 × 8 patches $x_i$.

Solve a matrix factorization problem:

$$\min_{\alpha_i,\,D\in\mathcal{C}} \sum_{i=1}^{n} \underbrace{\frac{1}{2}\|x_i - D\alpha_i\|_2^2}_{\text{reconstruction}} + \underbrace{\lambda\,\psi(\alpha_i)}_{\text{sparsity}},$$

with n > 100,000.

Average the reconstruction of each patch.
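In MATLAB, the whole pipeline might look like the sketch below. This is an illustrative reconstruction, not the authors' code: it assumes a pre-trained dictionary D with unit-norm columns, uses the SPAMS routine mexOMP (shown later in this lecture) for the sparse coding step, and the mean-removal step and the patch-averaging loop are my own assumptions.

I = double(imread('noisy.png'))/255;       % noisy grayscale image (hypothetical file)
w = 8;                                     % 8 x 8 patches
X = im2col(I, [w w], 'sliding');           % one column per overlapping patch
mu = mean(X); X = bsxfun(@minus, X, mu);   % remove the mean (DC) of each patch
param.L = 10;                              % sparsity level for OMP
alpha = mexOMP(X, D, param);               % sparse codes, one column per patch
Xhat = bsxfun(@plus, D*alpha, mu);         % reconstructed patches, means restored
Ihat = zeros(size(I)); counts = zeros(size(I));
k = 1;                                     % im2col orders patch positions column-major
for j = 1:size(I,2)-w+1
    for i = 1:size(I,1)-w+1
        patch = reshape(Xhat(:,k), [w w]);
        Ihat(i:i+w-1, j:j+w-1)   = Ihat(i:i+w-1, j:j+w-1) + patch;
        counts(i:i+w-1, j:j+w-1) = counts(i:i+w-1, j:j+w-1) + 1;
        k = k + 1;
    end
end
Ihat = Ihat ./ counts;                     % average the overlapping reconstructions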

Julien Mairal Sparse Coding and Dictionary Learning 9/137

Page 10: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration
K-SVD: [Elad and Aharon, 2006]

Figure: Dictionary trained on a noisy version of the image boat.

Julien Mairal Sparse Coding and Dictionary Learning 10/137

Page 11: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration

Inpainting, Demosaicking

$$\min_{D\in\mathcal{C},\,\alpha} \sum_i \frac{1}{2}\|\beta_i \otimes (x_i - D\alpha_i)\|_2^2 + \lambda_i\,\psi(\alpha_i)$$

RAW Image Processing

White balance → Black subtraction → Denoising → Demosaicking → Conversion to sRGB → Gamma correction.

Julien Mairal Sparse Coding and Dictionary Learning 11/137

Page 12: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration
[Mairal, Bach, Ponce, Sapiro, and Zisserman, 2009b]

Julien Mairal Sparse Coding and Dictionary Learning 12/137

Page 13: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration
[Mairal, Sapiro, and Elad, 2008d]

Julien Mairal Sparse Coding and Dictionary Learning 13/137

Page 14: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration
Inpainting, [Mairal, Elad, and Sapiro, 2008b]

Julien Mairal Sparse Coding and Dictionary Learning 14/137

Page 15: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration
Inpainting, [Mairal, Elad, and Sapiro, 2008b]

Julien Mairal Sparse Coding and Dictionary Learning 15/137

Page 16: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for video restoration

Key ideas for video processing

[Protter and Elad, 2009]

Using a 3D dictionary.

Processing of many frames at the same time.

Dictionary propagation.

Julien Mairal Sparse Coding and Dictionary Learning 16/137

Page 17: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration
Inpainting, [Mairal, Sapiro, and Elad, 2008d]

Figure: Inpainting results (a sequence of example images, slides 17-21).

Page 22: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparse representations for image restoration
Color video denoising, [Mairal, Sapiro, and Elad, 2008d]

Figure: Denoising results, σ = 25 (a sequence of example frames, slides 22-26).

Page 27: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Digital Zooming
Couzinie-Devy, 2010

Figures: two examples, each shown as the original image, bicubic interpolation, and the proposed method (slides 27-32).

Page 33: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Inverse half-toning

Figures: five examples, each shown as the original and the reconstructed image (slides 33-42).

Page 43: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

One short slide on compressed sensing

Important message

Sparse coding is not “compressed sensing”.

Compressed sensing is a theory [see Candes, 2006] saying that a sparse signal can be recovered with high probability from a few linear measurements, under some conditions.

Signal Acquisition: $W^\top x$, where $W \in \mathbb{R}^{m\times s}$ is a “sensing” matrix with $s \ll m$.

Signal Decoding: $\min_{\alpha\in\mathbb{R}^p} \|\alpha\|_1$ s.t. $W^\top x = W^\top D\alpha$,

with extensions to approximately sparse signals, noisy measurements.

Remark

The dictionaries we are using in this lecture do not satisfy the recovery assumptions of compressed sensing.

Julien Mairal Sparse Coding and Dictionary Learning 43/137

Page 44: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Important messages

Patch-based approaches are achieving state-of-the-art results for many image processing tasks.

Dictionary Learning adapts to the data you want to restore.

Dictionary Learning is well adapted to data that admit sparse representations. Sparsity is for sparse data only.

Julien Mairal Sparse Coding and Dictionary Learning 44/137

Page 45: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Next topics

Why does the ℓ1-norm induce sparsity?

Some properties of the Lasso.

Beyond sparsity: Group-sparsity.

The simplest algorithm for learning dictionaries.

Links between dictionary learning and matrix factorization techniques.

Julien Mairal Sparse Coding and Dictionary Learning 45/137

Page 46: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Why does the ℓ1-norm induce sparsity?
Example: quadratic problem in 1D

$$\min_{\alpha\in\mathbb{R}} \frac{1}{2}(x - \alpha)^2 + \lambda|\alpha|$$

Piecewise quadratic function with a kink at zero.

Derivative at 0+: $g_+ = -x + \lambda$ and at 0−: $g_- = -x - \lambda$.

Optimality conditions. α is optimal iff:

either $|\alpha| > 0$ and $-(x - \alpha) + \lambda\,\mathrm{sign}(\alpha) = 0$,
or $\alpha = 0$ and $g_+ \ge 0$ and $g_- \le 0$.

The solution is a soft-thresholding:

$$\alpha^\star = \mathrm{sign}(x)\,(|x| - \lambda)_+.$$
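A quick numerical sanity check of this closed form (an illustrative MATLAB snippet, not from the slides): it compares the soft-thresholding formula with a brute-force grid minimization of the 1D objective.

x = 1.3; lambda = 0.5;
soft = @(x, lambda) sign(x) .* max(abs(x) - lambda, 0);   % soft-thresholding operator
alpha_star = soft(x, lambda)                              % closed form: 0.8
a = linspace(-3, 3, 1e6);                                 % brute-force grid search
[~, idx] = min(0.5*(x - a).^2 + lambda*abs(a));
alpha_grid = a(idx)                                       % approximately 0.8, matches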

Julien Mairal Sparse Coding and Dictionary Learning 46/137

Page 47: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Why does the ℓ1-norm induce sparsity?

Figure: (a) the soft-thresholding operator and (b) the hard-thresholding operator, plotted as α⋆ versus x.

Julien Mairal Sparse Coding and Dictionary Learning 47/137

Page 48: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Why does the ℓ1-norm induce sparsity?
Analysis of the norms in 1D

For $\psi(\alpha) = \alpha^2$: $\psi'(\alpha) = 2\alpha$.

For $\psi(\alpha) = |\alpha|$: $\psi'_-(\alpha) = -1$, $\psi'_+(\alpha) = 1$.

The gradient of the ℓ2-norm vanishes when α gets close to 0. On its differentiable part, the norm of the gradient of the ℓ1-norm is constant.

Julien Mairal Sparse Coding and Dictionary Learning 48/137

Page 49: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Why does the ℓ1-norm induce sparsity?
Geometric explanation

$$\min_{\alpha\in\mathbb{R}^p} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1$$

$$\min_{\alpha\in\mathbb{R}^p} \|x - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_1 \le T.$$

Julien Mairal Sparse Coding and Dictionary Learning 49/137

Page 50: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Important property of the Lasso
Piecewise linearity of the regularization path

Figure: Regularization path of the Lasso (coefficient values $\alpha_1,\dots,\alpha_5$ as functions of λ) for

$$\min_{\alpha\in\mathbb{R}^p} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1.$$

Julien Mairal Sparse Coding and Dictionary Learning 50/137

Page 51: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparsity-Inducing Norms (1/2)

$$\min_{\alpha\in\mathbb{R}^p} \overbrace{f(\alpha)}^{\text{data fitting term}} + \lambda\, \underbrace{\psi(\alpha)}_{\text{sparsity-inducing norm}}$$

Standard approach to enforce sparsity in learning procedures:

Regularizing by a sparsity-inducing norm ψ.

The effect of ψ is to set some αj’s to zero, depending on the regularization parameter λ ≥ 0.

The most popular choice for ψ:

The ℓ1 norm, $\|\alpha\|_1 = \sum_{j=1}^p |\alpha_j|$.

For the square loss, Lasso [Tibshirani, 1996].

However, the ℓ1 norm encodes poor information, just cardinality!

Julien Mairal Sparse Coding and Dictionary Learning 51/137

Page 52: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Sparsity-Inducing Norms (2/2)

Another popular choice for ψ:

The ℓ1-ℓ2 norm,

$$\sum_{G\in\mathcal{G}} \|\alpha_G\|_2 = \sum_{G\in\mathcal{G}} \Big(\sum_{j\in G} \alpha_j^2\Big)^{1/2}, \quad \text{with } \mathcal{G} \text{ a partition of } \{1,\dots,p\}.$$

The ℓ1-ℓ2 norm sets to zero groups of non-overlapping variables (as opposed to single variables for the ℓ1 norm).

For the square loss, group Lasso [Yuan and Lin, 2006].

However, the ℓ1-ℓ2 norm encodes fixed/static prior information, and requires knowing in advance how to group the variables!

Applications:

Selecting groups of features instead of individual variables.

Multi-task learning.

Julien Mairal Sparse Coding and Dictionary Learning 52/137

Page 53: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Optimization for Dictionary Learning

$$\min_{\substack{\alpha\in\mathbb{R}^{p\times n}\\ D\in\mathcal{C}}} \sum_{i=1}^{n} \frac{1}{2}\|x_i - D\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1$$

$$\mathcal{C} := \{D \in \mathbb{R}^{m\times p} \ \text{s.t.}\ \forall j = 1,\dots,p,\ \|d_j\|_2 \le 1\}.$$

Classical optimization alternates between D and α.

Good results, but slow!

Instead use online learning [Mairal et al., 2009a]
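For comparison with the online approach, here is a minimal MATLAB sketch of the classical alternating scheme (illustrative only: X, D and nIter are assumed to be defined, mexLasso is the SPAMS solver used in this lecture, and the MOD-style least-squares dictionary update with projection onto C is my own choice, not necessarily the authors' implementation):

param.lambda = 0.15;
for iter = 1:nIter
    alpha = mexLasso(X, D, param);                          % sparse coding step, D fixed
    D = (X*alpha') / (alpha*alpha' + 1e-8*eye(size(D,2)));  % least-squares dictionary update, alpha fixed
    nrm = sqrt(sum(D.^2));                                  % project onto C: columns with norm <= 1
    D = bsxfun(@rdivide, D, max(nrm, 1));
end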

Julien Mairal Sparse Coding and Dictionary Learning 53/137

Page 54: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Optimization for Dictionary Learning
Inpainting a 12-Mpixel photograph

Figures: inpainting of a 12-Mpixel photograph (slides 54-57).

Page 58: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matrix Factorization Problems and Dictionary Learning

$$\min_{\substack{\alpha\in\mathbb{R}^{p\times n}\\ D\in\mathcal{C}}} \sum_{i=1}^{n} \frac{1}{2}\|x_i - D\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1$$

can be rewritten

$$\min_{\substack{\alpha\in\mathbb{R}^{p\times n}\\ D\in\mathcal{C}}} \frac{1}{2}\|X - D\alpha\|_F^2 + \lambda\|\alpha\|_1,$$

where $X = [x_1,\dots,x_n]$ and $\alpha = [\alpha_1,\dots,\alpha_n]$.

Julien Mairal Sparse Coding and Dictionary Learning 58/137

Page 59: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matrix Factorization Problems and Dictionary Learning
PCA

$$\min_{\substack{\alpha\in\mathbb{R}^{p\times n}\\ D\in\mathbb{R}^{m\times p}}} \|X - D\alpha\|_F^2,$$

with the additional constraints that D is orthonormal and $\alpha^\top$ is orthogonal.

D = [d1, . . . ,dp] are the principal components.

Julien Mairal Sparse Coding and Dictionary Learning 59/137

Page 60: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matrix Factorization Problems and Dictionary Learning
Hard clustering

$$\min_{\substack{\alpha\in\mathbb{R}^{p\times n}\\ D\in\mathbb{R}^{m\times p}}} \|X - D\alpha\|_F^2,$$

with the additional constraints that α is binary and its columns sum to one.

Julien Mairal Sparse Coding and Dictionary Learning 60/137

Page 61: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matrix Factorization Problems and Dictionary Learning
Soft clustering

$$\min_{\substack{\alpha\in\mathbb{R}^{p\times n}\\ D\in\mathbb{R}^{m\times p}}} \|X - D\alpha\|_F^2,$$

with the additional constraints that the columns of α sum to one.

Julien Mairal Sparse Coding and Dictionary Learning 61/137

Page 62: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matrix Factorization Problems and Dictionary Learning
Non-negative matrix factorization [Lee and Seung, 2001]

$$\min_{\substack{\alpha\in\mathbb{R}^{p\times n}\\ D\in\mathbb{R}^{m\times p}}} \|X - D\alpha\|_F^2,$$

with the additional constraints that the entries of D and α are non-negative.

Julien Mairal Sparse Coding and Dictionary Learning 62/137

Page 63: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matrix Factorization Problems and Dictionary Learning
NMF + sparsity?

$$\min_{\substack{\alpha\in\mathbb{R}^{p\times n}\\ D\in\mathbb{R}^{m\times p}}} \|X - D\alpha\|_F^2 + \lambda\|\alpha\|_1,$$

with the additional constraints that the entries of D and α are non-negative.

Most of these formulations can be addressed with the same types of algorithms.

Julien Mairal Sparse Coding and Dictionary Learning 63/137

Page 64: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matrix Factorization Problems and Dictionary Learning
Natural Patches

(a) PCA (b) NNMF (c) DL

Julien Mairal Sparse Coding and Dictionary Learning 64/137

Page 65: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matrix Factorization Problems and Dictionary Learning
Faces

(d) PCA (e) NNMF (f) DL

Julien Mairal Sparse Coding and Dictionary Learning 65/137

Page 66: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Important messages

The ℓ1-norm induces sparsity and shrinks the coefficients (soft-thresholding).

The regularization path of the Lasso is piecewise linear.

Sparsity can be induced at the group level.

Learning the dictionary is simple, fast and scalable.

Dictionary learning is related to several matrix factorization problems.

Software SPAMS is available for all of this:

www.di.ens.fr/willow/SPAMS/.

Julien Mairal Sparse Coding and Dictionary Learning 66/137

Page 67: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Next topics: Computer Vision

Intriguing results on the use of dictionary learning for bags of words.

Modelling the local appearance of image patches.

Julien Mairal Sparse Coding and Dictionary Learning 67/137

Page 68: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Learning Codebooks for Image Classification

Idea

Replacing Vector Quantization by Learned Dictionaries!

unsupervised: [Yang et al., 2009]

supervised: [Boureau et al., 2010, Yang et al., 2010]

Julien Mairal Sparse Coding and Dictionary Learning 68/137

Page 69: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Learning Codebooks for Image Classification

Let an image be represented by a set of low-level descriptors $x_i$ at N locations identified with their indices $i = 1,\dots,N$.

hard-quantization:

$$x_i \approx D\alpha_i, \quad \alpha_i \in \{0,1\}^p \ \ \text{and}\ \ \sum_{j=1}^p \alpha_i[j] = 1$$

soft-quantization:

$$\alpha_i[j] = \frac{e^{-\beta\|x_i - d_j\|_2^2}}{\sum_{k=1}^p e^{-\beta\|x_i - d_k\|_2^2}}$$

sparse coding:

$$x_i \approx D\alpha_i, \quad \alpha_i = \operatorname*{argmin}_{\alpha} \frac{1}{2}\|x_i - D\alpha\|_2^2 + \lambda\|\alpha\|_1$$
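As a concrete illustration, the three encodings of a single descriptor xi could be computed as follows in MATLAB (a sketch under assumptions: D has unit-norm columns, beta and lambda are given parameters, and mexLasso is the SPAMS solver shown later in this lecture):

dist2 = sum(bsxfun(@minus, D, xi).^2);        % squared distances to all atoms
[~, j] = min(dist2);                          % hard-quantization: nearest atom
alpha_hard = zeros(size(D,2), 1); alpha_hard(j) = 1;
w = exp(-beta*dist2);                         % soft-quantization: normalized Gaussian weights
alpha_soft = (w / sum(w))';
param.lambda = lambda;                        % sparse coding: Lasso
alpha_sparse = mexLasso(xi, D, param);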

Julien Mairal Sparse Coding and Dictionary Learning 69/137

Page 70: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Learning Codebooks for Image Classification
Table from Boureau et al. [2010]

Yang et al. [2009] won the PASCAL VOC’09 challenge using this kind of technique.

Julien Mairal Sparse Coding and Dictionary Learning 70/137

Page 71: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Learning dictionaries with a discriminative cost function

Idea:

Let us consider 2 sets S−, S+ of signals representing 2 different classes. Each set should admit a dictionary best adapted to its reconstruction.

Classification procedure for a signal $x \in \mathbb{R}^n$:

$$\min\big(R^\star(x, D_-),\, R^\star(x, D_+)\big) \quad \text{where} \quad R^\star(x, D) = \min_{\alpha\in\mathbb{R}^p} \|x - D\alpha\|_2^2 \ \ \text{s.t.}\ \ \|\alpha\|_0 \le L.$$

“Reconstructive” training

$$\min_{D_-} \sum_{i\in S_-} R^\star(x_i, D_-) \qquad\qquad \min_{D_+} \sum_{i\in S_+} R^\star(x_i, D_+)$$

[Grosse et al., 2007], [Huang and Aviyente, 2006], [Sprechmann et al., 2010] for unsupervised clustering (CVPR ’10)

Julien Mairal Sparse Coding and Dictionary Learning 71/137

Page 72: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Learning dictionaries with a discriminative cost function

“Discriminative” training

[Mairal, Bach, Ponce, Sapiro, and Zisserman, 2008a]

$$\min_{D_-,\,D_+} \sum_i \mathcal{C}\Big(\lambda z_i \big(R^\star(x_i, D_-) - R^\star(x_i, D_+)\big)\Big),$$

where $z_i \in \{-1, +1\}$ is the label of $x_i$ and $\mathcal{C}$ is the logistic regression function.

Julien Mairal Sparse Coding and Dictionary Learning 72/137

Page 73: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Learning dictionaries with a discriminative cost function
Examples of dictionaries

Top: reconstructive, Bottom: discriminative, Left: Bicycle, Right: Background.

Julien Mairal Sparse Coding and Dictionary Learning 73/137

Page 74: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Learning dictionaries with a discriminative cost function

Figures: texture segmentation (slides 74-75), pixelwise classification (slide 76), weakly-supervised pixel classification (slide 77).

Julien Mairal Sparse Coding and Dictionary Learning 77/137

Page 78: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Application to edge detection and classification
[Mairal, Leordeanu, Bach, Hebert, and Ponce, 2008c]

Good edges Bad edges

Julien Mairal Sparse Coding and Dictionary Learning 78/137

Page 79: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Application to edge detection and classification
Berkeley segmentation benchmark

Figures: example images with the raw edge detection shown on the right (slides 79-81).

Julien Mairal Sparse Coding and Dictionary Learning 81/137

Page 82: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Application to edge detection and classification
Contour-based classifier: [Leordeanu, Hebert, and Sukthankar, 2007]

Is there a bike, a motorbike, a car or a person in this image?

Julien Mairal Sparse Coding and Dictionary Learning 82/137

Page 83: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Application to edge detection and classification

Julien Mairal Sparse Coding and Dictionary Learning 83/137

Page 84: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Application to edge detection and classification
Performance gain due to the prefiltering

Ours + [Leordeanu ’07]   [Leordeanu ’07]   [Winn ’05]
96.8%                    89.4%             76.9%

Recognition rates for the same experiment as [Winn et al., 2005] on VOC 2005.

Category     Ours + [Leordeanu ’07]   [Leordeanu ’07]
Aeroplane    71.9%                    61.9%
Boat         67.1%                    56.4%
Cat          82.6%                    53.4%
Cow          68.7%                    59.2%
Horse        76.0%                    67%
Motorbike    80.6%                    73.6%
Sheep        72.9%                    58.4%
Tvmonitor    87.7%                    83.8%
Average      75.9%                    64.2%

Recognition performance at equal error rate for 8 classes on a subset of images from Pascal 07.

Julien Mairal Sparse Coding and Dictionary Learning 84/137

Page 85: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Julien Mairal Sparse Coding and Dictionary Learning 85/137

Page 86: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Julien Mairal Sparse Coding and Dictionary Learning 86/137

Page 87: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Digital Art Authentication
Data courtesy of Hugues, Graham, and Rockmore [2009]

Figures: pairs of authentic and fake drawings, with the classification result (Authentic / Fake) shown for each test example (slides 87-89).

Julien Mairal Sparse Coding and Dictionary Learning 89/137

Page 90: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Important messages

Learned dictionaries are well adapted to model the local appearance of images and edges.

They can be used to learn dictionaries of SIFT features.

Julien Mairal Sparse Coding and Dictionary Learning 90/137

Page 91: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Next topics

Optimization for solving sparse decomposition problems

Optimization for dictionary learning

Julien Mairal Sparse Coding and Dictionary Learning 91/137

Page 92: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Recall: The Sparse Decomposition Problem

$$\min_{\alpha\in\mathbb{R}^p} \underbrace{\frac{1}{2}\|x - D\alpha\|_2^2}_{\text{data fitting term}} + \underbrace{\lambda\,\psi(\alpha)}_{\substack{\text{sparsity-inducing}\\ \text{regularization}}}$$

ψ induces sparsity in α. It can be

the ℓ0 “pseudo-norm”: $\|\alpha\|_0 := \#\{i \ \text{s.t.}\ \alpha[i] \neq 0\}$ (NP-hard)

the ℓ1 norm: $\|\alpha\|_1 := \sum_{i=1}^p |\alpha[i]|$ (convex)

. . .

This is a selection problem.

Julien Mairal Sparse Coding and Dictionary Learning 92/137

Page 93: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Finding your way in the sparse coding literature. . .

. . . is not easy. The literature is vast, redundant, sometimes confusing, and many papers are claiming victory. . .

The main classes of methods are

greedy procedures [Mallat and Zhang, 1993], [Weisberg, 1980]

homotopy [Osborne et al., 2000], [Efron et al., 2004],[Markowitz, 1956]

soft-thresholding based methods [Fu, 1998], [Daubechies et al.,2004], [Friedman et al., 2007], [Nesterov, 2007], [Beck andTeboulle, 2009], . . .

reweighted-ℓ2 methods [Daubechies et al., 2009],. . .

active-set methods [Roth and Fischer, 2008].

. . .

Julien Mairal Sparse Coding and Dictionary Learning 93/137

Page 94: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matching Pursuit: geometric illustration (slides 94-103)

The figures illustrate Matching Pursuit in $\mathbb{R}^3$ with a dictionary $\{d_1, d_2, d_3\}$. The residual r starts at x with α = (0, 0, 0); the most correlated atom d3 is selected and the projection ⟨r, d3⟩ d3 is subtracted from r, giving α = (0, 0, 0.75); selecting d2 then gives α = (0, 0.24, 0.75); a later re-selection of d3 adjusts its coefficient, giving α = (0, 0.24, 0.65).

Page 104: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Matching Pursuit

$$\min_{\alpha\in\mathbb{R}^p} \|\underbrace{x - D\alpha}_{r}\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le L$$

1: α ← 0
2: r ← x (residual).
3: while ‖α‖0 < L do
4:   Select the atom with maximum correlation with the residual: $\hat{\imath} \leftarrow \operatorname*{argmax}_{i=1,\dots,p} |d_i^T r|$
5:   Update the residual and the coefficients: $\alpha[\hat{\imath}] \leftarrow \alpha[\hat{\imath}] + d_{\hat{\imath}}^T r$, $\quad r \leftarrow r - (d_{\hat{\imath}}^T r)\, d_{\hat{\imath}}$
6: end while
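A direct MATLAB transcription of this algorithm (an illustrative sketch: it assumes the columns of D have unit ℓ2 norm, and runs a fixed number of L iterations instead of the ‖α‖0 < L stopping rule):

function alpha = matching_pursuit(x, D, L)
    alpha = zeros(size(D, 2), 1);
    r = x;                               % residual
    for t = 1:L
        c = D' * r;                      % correlations with all atoms
        [~, i] = max(abs(c));            % atom with maximum correlation
        alpha(i) = alpha(i) + c(i);      % update the coefficient
        r = r - c(i) * D(:, i);          % update the residual
    end
end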

Julien Mairal Sparse Coding and Dictionary Learning 104/137

Page 105: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Orthogonal Matching Pursuit: geometric illustration (slides 105-107)

The figures illustrate OMP on the same example. Starting from α = (0, 0, 0) and Γ = ∅, the signal x is orthogonally projected onto the span of the selected atoms at each step: after selecting d3, α = (0, 0, 0.75) with Γ = {3}; after selecting d2, α = (0, 0.29, 0.63) with Γ = {3, 2}.

Page 108: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Orthogonal Matching Pursuit

$$\min_{\alpha\in\mathbb{R}^p} \|x - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le L$$

1: Γ = ∅.
2: for iter = 1, . . . , L do
3:   Select the atom which most reduces the objective: $\hat{\imath} \leftarrow \operatorname*{argmin}_{i \in \Gamma^C} \big\{ \min_{\alpha'} \|x - D_{\Gamma\cup\{i\}}\alpha'\|_2^2 \big\}$
4:   Update the active set: Γ ← Γ ∪ {ı̂}.
5:   Update the residual (orthogonal projection): $r \leftarrow \big(I - D_\Gamma (D_\Gamma^T D_\Gamma)^{-1} D_\Gamma^T\big)\, x$.
6:   Update the coefficients: $\alpha_\Gamma \leftarrow (D_\Gamma^T D_\Gamma)^{-1} D_\Gamma^T x$.
7: end for
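A compact MATLAB sketch of OMP (illustrative only: it uses the standard maximum-correlation selection rule, a simpler proxy for the exact objective-decrease rule of step 3, and solves the least-squares step with the backslash operator rather than a Cholesky update):

function alpha = omp(x, D, L)
    alpha = zeros(size(D, 2), 1);
    Gamma = [];                          % active set
    r = x;                               % current residual
    for t = 1:L
        [~, i] = max(abs(D' * r));       % atom most correlated with the residual
        Gamma = [Gamma, i];              % grow the active set
        a = D(:, Gamma) \ x;             % orthogonal projection of x onto span(D_Gamma)
        r = x - D(:, Gamma) * a;         % new residual
    end
    alpha(Gamma) = a;
end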

Julien Mairal Sparse Coding and Dictionary Learning 108/137

Page 109: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Orthogonal Matching Pursuit

Contrary to MP, an atom can only be selected one time with OMP. It is, however, more difficult to implement efficiently. The keys for a good implementation in the case of a large number of signals are:

Precompute the Gram matrix $G = D^T D$ once and for all,

Maintain the computation of $D^T r$ for each signal,

Maintain a Cholesky decomposition of $(D_\Gamma^T D_\Gamma)^{-1}$ for each signal.

The total complexity for decomposing n L-sparse signals of size m with a dictionary of size p is

$$\underbrace{O(p^2 m)}_{\text{Gram matrix}} + \underbrace{O(nL^3)}_{\text{Cholesky}} + \underbrace{O\big(n(pm + pL^2)\big)}_{D^T r} = O\big(np(m + L^2)\big)$$

It is also possible to use the matrix inversion lemma instead of a Cholesky decomposition (same complexity, but less numerical stability).

Julien Mairal Sparse Coding and Dictionary Learning 109/137

Page 110: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Example with the software SPAMS

Software available at http://www.di.ens.fr/willow/SPAMS/

>> I=double(imread('data/lena.eps'))/255;
>> %extract all patches of I
>> X=im2col(I,[8 8],'sliding');
>> %load a dictionary of size 64 x 256
>> D=load('dict.mat');
>>
>> %set the sparsity parameter L to 10
>> param.L=10;
>> alpha=mexOMP(X,D,param);

On an 8-core 2.83GHz machine: 230000 signals processed per second!

Julien Mairal Sparse Coding and Dictionary Learning 110/137

Page 111: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Optimality conditions of the Lasso
Nonsmooth optimization

Directional derivatives and subgradients are useful tools for studying ℓ1-decomposition problems:

$$\min_{\alpha\in\mathbb{R}^p} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1$$

In this tutorial, we use the directional derivatives to derive simple optimality conditions of the Lasso.

For more information on convex analysis and nonsmooth optimization, see the following books: [Boyd and Vandenberghe, 2004], [Nocedal and Wright, 2006], [Borwein and Lewis, 2006], [Bonnans et al., 2006], [Bertsekas, 1999].

Julien Mairal Sparse Coding and Dictionary Learning 111/137

Page 112: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Optimality conditions of the Lasso
Directional derivatives

Directional derivative in the direction u at α:

$$\nabla f(\alpha, u) = \lim_{t\to 0^+} \frac{f(\alpha + t u) - f(\alpha)}{t}$$

Main idea: in nonsmooth situations, one may need to look at all directions u and not simply p independent ones!

Proposition 1: if f is differentiable in α, ∇f (α,u) = ∇f (α)Tu.

Proposition 2: α is optimal iff for all u in Rp, ∇f (α,u) ≥ 0.

Julien Mairal Sparse Coding and Dictionary Learning 112/137

Page 113: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Optimality conditions of the Lasso

$$\min_{\alpha\in\mathbb{R}^p} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1$$

α⋆ is optimal iff for all u in $\mathbb{R}^p$, $\nabla f(\alpha^\star, u) \ge 0$, that is,

$$-u^T D^T (x - D\alpha^\star) + \lambda \sum_{i,\ \alpha^\star[i]\neq 0} \operatorname{sign}(\alpha^\star[i])\, u[i] + \lambda \sum_{i,\ \alpha^\star[i]=0} |u[i]| \;\ge\; 0,$$

which is equivalent to the following conditions:

$$\forall i = 1,\dots,p, \quad \begin{cases} |d_i^T(x - D\alpha^\star)| \le \lambda & \text{if } \alpha^\star[i] = 0 \\ d_i^T(x - D\alpha^\star) = \lambda\, \operatorname{sign}(\alpha^\star[i]) & \text{if } \alpha^\star[i] \neq 0 \end{cases}$$
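These conditions are easy to check numerically. A small MATLAB sketch (illustrative; it assumes D, x, lambda and a candidate solution alpha are in the workspace, and uses a tolerance to account for finite solver precision):

g   = D' * (x - D*alpha);                 % the quantities d_i'(x - D*alpha)
tol = 1e-6;
on  = abs(alpha) > 0;                     % non-zero coefficients
ok_active   = all(abs(g(on) - lambda*sign(alpha(on))) < tol);
ok_inactive = all(abs(g(~on)) <= lambda + tol);
is_optimal  = ok_active && ok_inactive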

Julien Mairal Sparse Coding and Dictionary Learning 113/137

Page 114: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Homotopy

A homotopy method provides a set of solutions indexed by a parameter.

The regularization path (λ, α⋆(λ)) for instance!!

It can be useful when the path has some “nice” properties (piecewise linear, piecewise quadratic).

LARS [Efron et al., 2004] starts from a trivial solution, and follows the regularization path of the Lasso, which is piecewise linear.

Julien Mairal Sparse Coding and Dictionary Learning 114/137

Page 115: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Homotopy, LARS
[Osborne et al., 2000], [Efron et al., 2004]

$$\forall i = 1,\dots,p, \quad \begin{cases} |d_i^T(x - D\alpha^\star)| \le \lambda & \text{if } \alpha^\star[i] = 0 \\ d_i^T(x - D\alpha^\star) = \lambda\, \operatorname{sign}(\alpha^\star[i]) & \text{if } \alpha^\star[i] \neq 0 \end{cases} \qquad (1)$$

The regularization path is piecewise linear:

$$D_\Gamma^T (x - D_\Gamma \alpha^\star_\Gamma) = \lambda\, \operatorname{sign}(\alpha^\star_\Gamma)$$

$$\alpha^\star_\Gamma(\lambda) = (D_\Gamma^T D_\Gamma)^{-1}\big(D_\Gamma^T x - \lambda\, \operatorname{sign}(\alpha^\star_\Gamma)\big) = A + \lambda B$$

A simple interpretation of LARS

Start from the trivial solution $(\lambda = \|D^T x\|_\infty,\ \alpha^\star(\lambda) = 0)$.

Maintain the computations of $|d_i^T(x - D\alpha^\star(\lambda))|$ for all i.

Maintain the computation of the current direction B.

Follow the path by reducing λ until the next kink.

Julien Mairal Sparse Coding and Dictionary Learning 115/137

Page 116: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Example with the software SPAMS
http://www.di.ens.fr/willow/SPAMS/

>> I=double(imread('data/lena.eps'))/255;
>> %extract all patches of I
>> X=normalize(im2col(I,[8 8],'sliding'));
>> %load a dictionary of size 64 x 256
>> D=load('dict.mat');
>>
>> %set the sparsity parameter lambda to 0.15
>> param.lambda=0.15;
>> alpha=mexLasso(X,D,param);

On an 8-core 2.83GHz machine: 77000 signals processed per second!

Note that it can also solve constrained versions of the problem. The complexity is more or less the same as OMP and it uses the same tricks (Cholesky decomposition).

Julien Mairal Sparse Coding and Dictionary Learning 116/137

Page 117: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Coordinate Descent

Coordinate descent + nonsmooth objective: WARNING: not convergent in general.

Here, the problem is equivalent to a convex smooth optimization problem with separable constraints:

$$\min_{\alpha_+,\,\alpha_-} \frac{1}{2}\|x - D\alpha_+ + D\alpha_-\|_2^2 + \lambda\,\alpha_+^T \mathbf{1} + \lambda\,\alpha_-^T \mathbf{1} \quad \text{s.t.} \quad \alpha_-, \alpha_+ \ge 0.$$

For this specific problem, coordinate descent is convergent.

Supposing $\|d_i\|_2 = 1$, updating the coordinate i:

$$\alpha[i] \leftarrow \operatorname*{argmin}_{\beta} \frac{1}{2}\Big\| \underbrace{x - \sum_{j\neq i} \alpha[j] d_j}_{r} - \beta d_i \Big\|_2^2 + \lambda|\beta| \;\leftarrow\; \operatorname{sign}(d_i^T r)\big(|d_i^T r| - \lambda\big)_+$$

⇒ soft-thresholding!
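In MATLAB, a minimal coordinate-descent sketch for the Lasso following this update (illustrative; it assumes unit-norm columns of D and runs a fixed number of sweeps nIter):

function alpha = cd_lasso(x, D, lambda, nIter)
    p = size(D, 2);
    alpha = zeros(p, 1);
    res = x;                                 % residual x - D*alpha (alpha = 0 initially)
    for it = 1:nIter
        for i = 1:p
            res = res + alpha(i)*D(:, i);    % r = x - sum_{j~=i} alpha(j) d_j
            c = D(:, i)' * res;              % d_i' r
            alpha(i) = sign(c) * max(abs(c) - lambda, 0);   % soft-thresholding
            res = res - alpha(i)*D(:, i);    % restore the full residual
        end
    end
end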

Julien Mairal Sparse Coding and Dictionary Learning 117/137

Page 118: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Example with the software SPAMS
http://www.di.ens.fr/willow/SPAMS/

>> I=double(imread('data/lena.eps'))/255;
>> %extract all patches of I
>> X=normalize(im2col(I,[8 8],'sliding'));
>> %load a dictionary of size 64 x 256
>> D=load('dict.mat');
>>
>> %set the sparsity parameter lambda to 0.15
>> param.lambda=0.15;
>> param.tol=1e-2;
>> param.itermax=200;
>> alpha=mexCD(X,D,param);

On an 8-core 2.83GHz machine: 93000 signals processed per second!

Julien Mairal Sparse Coding and Dictionary Learning 118/137

Page 119: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

first-order/proximal methods

$$\min_{\alpha\in\mathbb{R}^p} f(\alpha) + \lambda\psi(\alpha)$$

f is strictly convex and continuously differentiable with a Lipschitz gradient.

Generalize the idea of gradient descent:

$$\alpha_{k+1} \leftarrow \operatorname*{argmin}_{\alpha\in\mathbb{R}^p} \; f(\alpha_k) + \nabla f(\alpha_k)^T(\alpha - \alpha_k) + \frac{L}{2}\|\alpha - \alpha_k\|_2^2 + \lambda\psi(\alpha)$$

$$\alpha_{k+1} \leftarrow \operatorname*{argmin}_{\alpha\in\mathbb{R}^p} \; \frac{1}{2}\Big\|\alpha - \Big(\alpha_k - \frac{1}{L}\nabla f(\alpha_k)\Big)\Big\|_2^2 + \frac{\lambda}{L}\psi(\alpha)$$

When λ = 0, this is equivalent to a classical gradient descent step.

Julien Mairal Sparse Coding and Dictionary Learning 119/137

Page 120: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

first-order/proximal methods

They require solving efficiently the proximal operator:

$$\min_{\alpha\in\mathbb{R}^p} \frac{1}{2}\|u - \alpha\|_2^2 + \lambda\psi(\alpha)$$

For the ℓ1-norm, this amounts to a soft-thresholding:

$$\alpha^\star[i] = \operatorname{sign}(u[i])\,(|u[i]| - \lambda)_+.$$

There exist accelerated versions based on Nesterov's optimal first-order method (gradient method with “extrapolation”) [Beck and Teboulle, 2009, Nesterov, 2007, 1983], suited for large-scale experiments.
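Putting the last two slides together, a bare-bones (non-accelerated) ISTA sketch for the Lasso in MATLAB (illustrative; D, x, lambda and nIter are assumed given, and the step size uses L = ‖D‖², an upper bound on the Lipschitz constant of the gradient):

soft = @(u, t) sign(u) .* max(abs(u) - t, 0);   % ell_1 proximal operator
L = norm(D)^2;                                  % Lipschitz constant of the gradient of f
alpha = zeros(size(D, 2), 1);
for k = 1:nIter
    grad  = D' * (D*alpha - x);                 % gradient of 0.5*||x - D*alpha||_2^2
    alpha = soft(alpha - grad/L, lambda/L);     % proximal-gradient (soft-thresholding) step
end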

Julien Mairal Sparse Coding and Dictionary Learning 120/137

Page 121: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Optimization for Grouped Sparsity

The formulation:

$$\min_{\alpha\in\mathbb{R}^p} \underbrace{\frac{1}{2}\|x - D\alpha\|_2^2}_{\text{data fitting term}} + \lambda \underbrace{\sum_{g\in\mathcal{G}} \|\alpha_g\|_q}_{\substack{\text{group-sparsity-inducing}\\ \text{regularization}}}$$

The main classes of algorithms for solving grouped-sparsity problems are

Greedy approaches

Block-coordinate descent

Proximal methods

Julien Mairal Sparse Coding and Dictionary Learning 121/137

Page 122: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Optimization for Grouped Sparsity

The proximal operator:

$$\min_{\alpha\in\mathbb{R}^p} \frac{1}{2}\|u - \alpha\|_2^2 + \lambda \sum_{g\in\mathcal{G}} \|\alpha_g\|_q$$

For q = 2,

$$\alpha^\star_g = \frac{u_g}{\|u_g\|_2}\big(\|u_g\|_2 - \lambda\big)_+, \quad \forall g \in \mathcal{G}$$

For q = ∞,

$$\alpha^\star_g = u_g - \Pi_{\|\cdot\|_1 \le \lambda}[u_g], \quad \forall g \in \mathcal{G}$$

These formulas generalize soft-thresholding to groups of variables. They are used in block-coordinate descent and proximal algorithms.
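For instance, the q = 2 case can be written in a few lines of MATLAB (a sketch under the assumption that the groups are given as a cell array of index vectors):

function alpha = prox_group_l2(u, lambda, groups)
    alpha = zeros(size(u));
    for g = 1:numel(groups)
        idx = groups{g};
        ng  = norm(u(idx));                                  % ||u_g||_2
        if ng > 0
            alpha(idx) = (u(idx)/ng) * max(ng - lambda, 0);  % group soft-thresholding
        end
    end
end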

Julien Mairal Sparse Coding and Dictionary Learning 122/137

Page 123: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Reweighted ℓ2

Let us start from something simple

$$a^2 - 2ab + b^2 \ge 0.$$

Then, for b > 0,

$$a \le \frac{1}{2}\Big(\frac{a^2}{b} + b\Big), \quad \text{with equality iff } a = b,$$

and

$$\|\alpha\|_1 = \min_{\eta_j \ge 0} \frac{1}{2} \sum_{j=1}^p \frac{\alpha[j]^2}{\eta_j} + \eta_j.$$

The formulation becomes

$$\min_{\alpha,\ \eta_j \ge \varepsilon} \frac{1}{2}\|x - D\alpha\|_2^2 + \frac{\lambda}{2} \sum_{j=1}^p \frac{\alpha[j]^2}{\eta_j} + \eta_j.$$

Julien Mairal Sparse Coding and Dictionary Learning 123/137

Page 124: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Important messages

Greedy methods directly address the NP-hard ℓ0-decompositionproblem.

Homotopy methods can be extremely efficient for small or medium-sized problems, or when the solution is very sparse.

Coordinate descent generally provides a solution of small/medium precision quickly, but gets slower when there is a lot of correlation in the dictionary.

First-order methods are very attractive in the large-scale setting.

Other good alternatives exist: active-set and reweighted ℓ2 methods, stochastic variants, variants of OMP, . . .

Julien Mairal Sparse Coding and Dictionary Learning 124/137

Page 125: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Optimization for Dictionary Learning

$$\min_{\substack{\alpha\in\mathbb{R}^{p\times n}\\ D\in\mathcal{C}}} \sum_{i=1}^{n} \frac{1}{2}\|x_i - D\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1$$

$$\mathcal{C} := \{D \in \mathbb{R}^{m\times p} \ \text{s.t.}\ \forall j = 1,\dots,p,\ \|d_j\|_2 \le 1\}.$$

Classical optimization alternates between D and α.

Good results, but very slow!

Julien Mairal Sparse Coding and Dictionary Learning 125/137

Page 126: Sparse Coding and Dictionary Learning for Image Analysislear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf · Sparse Coding and Dictionary Learning for Image Analysis

Optimization for Dictionary Learning[Mairal, Bach, Ponce, and Sapiro, 2009a]

Classical formulation of dictionary learning

minD∈C

fn(D) = minD∈C

1

n

n∑

i=1

l(xi ,D),

where

l(x,D)△

= minα∈Rp

1

2‖x−Dα‖22 + λ‖α‖1.

Which formulation are we interested in?

$$\min_{\mathbf{D}\in\mathcal{C}} \Big\{ f(\mathbf{D}) = \mathbb{E}_{\mathbf{x}}\big[l(\mathbf{x},\mathbf{D})\big] \approx \lim_{n\to+\infty} \frac{1}{n}\sum_{i=1}^n l(\mathbf{x}_i,\mathbf{D}) \Big\}

[Bottou and Bousquet, 2008]: Online learning can

handle potentially infinite or dynamic datasets,

be dramatically faster than batch algorithms.


Optimization for Dictionary Learning

Require: $\mathbf{D}_0 \in \mathbb{R}^{m\times p}$ (initial dictionary); $\lambda \in \mathbb{R}$
1: $\mathbf{A}_0 = 0$, $\mathbf{B}_0 = 0$.
2: for $t = 1, \ldots, T$ do
3:   Draw $\mathbf{x}_t$
4:   Sparse coding:
$$\alpha_t \leftarrow \arg\min_{\alpha\in\mathbb{R}^p} \frac{1}{2}\|\mathbf{x}_t - \mathbf{D}_{t-1}\alpha\|_2^2 + \lambda\|\alpha\|_1,$$
5:   Aggregate sufficient statistics:
$$\mathbf{A}_t \leftarrow \mathbf{A}_{t-1} + \alpha_t\alpha_t^T, \qquad \mathbf{B}_t \leftarrow \mathbf{B}_{t-1} + \mathbf{x}_t\alpha_t^T$$
6:   Dictionary update (block-coordinate descent):
$$\mathbf{D}_t \leftarrow \arg\min_{\mathbf{D}\in\mathcal{C}} \frac{1}{t}\sum_{i=1}^t \Big(\frac{1}{2}\|\mathbf{x}_i - \mathbf{D}\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1\Big).$$
7: end for
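Below is a minimal NumPy sketch of this online procedure. The sparse-coding step is handled by a plain ISTA loop, and step 6 is approximated by a single pass of block-coordinate descent over the columns of D using the sufficient statistics A_t and B_t; these implementation choices, the sampling scheme, and the iteration counts are assumptions for illustration, not a substitute for the SPAMS implementation.

import numpy as np

def soft_threshold(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def lasso_ista(D, x, lam, n_iter=100):
    # Simple ISTA solver for the sparse-coding step (step 4 of the algorithm).
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        alpha = soft_threshold(alpha - D.T @ (D @ alpha - x) / L, lam / L)
    return alpha

def online_dictionary_learning(X, p, lam, T=1000, seed=0):
    # Draw a signal, sparse-code it, aggregate A and B, then update D column by column.
    m, n = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((m, p))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((p, p))
    B = np.zeros((m, p))
    for t in range(T):
        x = X[:, rng.integers(n)]              # draw x_t
        alpha = lasso_ista(D, x, lam)          # sparse coding
        A += np.outer(alpha, alpha)            # A_t = A_{t-1} + alpha alpha^T
        B += np.outer(x, alpha)                # B_t = B_{t-1} + x alpha^T
        # Dictionary update: one pass of block-coordinate descent on the surrogate.
        for j in range(p):
            if A[j, j] > 1e-10:                # skip atoms that were never used
                u = (B[:, j] - D @ A[:, j]) / A[j, j] + D[:, j]
                D[:, j] = u / max(np.linalg.norm(u), 1.0)   # project onto the unit ball
    return D

Each column update minimizes the quadratic surrogate with respect to one atom while keeping the others fixed, followed by the projection onto the constraint set C.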


Optimization for Dictionary Learning

Which guarantees do we have?

Under a few reasonable assumptions,

we build a surrogate function $\hat{f}_t$ of the expected cost $f$ verifying
$$\lim_{t\to+\infty} \hat{f}_t(\mathbf{D}_t) - f(\mathbf{D}_t) = 0,$$
$\mathbf{D}_t$ is asymptotically close to a stationary point.

Extensions (all implemented in SPAMS):

non-negative matrix decompositions.

sparse PCA (sparse dictionaries).

fused-lasso regularizations (piecewise constant dictionaries).


Optimization for Dictionary Learning: experimental results, batch vs. online

[Figure omitted in transcript: batch vs. online comparison, m = 8 × 8, p = 256]


Optimization for Dictionary Learning: experimental results, batch vs. online

[Figure omitted in transcript: batch vs. online comparison, m = 12 × 12 × 3, p = 512]


References I

M. Aharon, M. Elad, and A. M. Bruckstein. The K-SVD: An algorithm for designing overcomplete dictionaries for sparse representations. IEEE Transactions on Signal Processing, 54(11):4311–4322, November 2006.

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1999.

J. F. Bonnans, J. C. Gilbert, C. Lemaréchal, and C. A. Sagastizábal. Numerical Optimization: Theoretical and Practical Aspects. Springer-Verlag, New York, 2006.

J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer, 2006.

L. Bottou and O. Bousquet. The trade-offs of large scale learning. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20, pages 161–168. MIT Press, Cambridge, MA, 2008.

Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. Learning mid-level features for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.


References II

S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

E. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematicians, volume 3, 2006.

S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33–61, 1999.

I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57:1413–1457, 2004.

I. Daubechies, R. DeVore, M. Fornasier, and S. Güntürk. Iteratively re-weighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics, 2009.

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.

M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, December 2006.


References III

K. Engan, S. O. Aase, and J. H. Husøy. Frame based signal compression using method of optimal directions (MOD). In Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, volume 4, 1999.

J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Annals of Applied Statistics, 1(2):302–332, 2007.

W. J. Fu. Penalized regressions: The bridge versus the Lasso. Journal of Computational and Graphical Statistics, 7:397–416, 1998.

R. Grosse, R. Raina, H. Kwong, and A. Y. Ng. Shift-invariant sparse coding for audio classification. In Proceedings of the Twenty-third Conference on Uncertainty in Artificial Intelligence, 2007.

A. Haar. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 69:331–371, 1910.

K. Huang and S. Aviyente. Sparse representation for signal classification. In Advances in Neural Information Processing Systems, Vancouver, Canada, December 2006.

J. M. Hughes, D. J. Graham, and D. N. Rockmore. Quantification of artistic style through sparse coding analysis in the drawings of Pieter Bruegel the Elder. Proceedings of the National Academy of Sciences, 107(4):1279–1283, 2010.


References IV

D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, 2001.

H. Lee, A. Battle, R. Raina, and A. Y. Ng. Efficient sparse coding algorithms. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, volume 19, pages 801–808. MIT Press, Cambridge, MA, 2007.

M. Leordeanu, M. Hebert, and R. Sukthankar. Beyond local appearance: Category recognition from pairwise interactions of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.

M. S. Lewicki and T. J. Sejnowski. Learning overcomplete representations. Neural Computation, 12(2):337–365, 2000.

J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Discriminative learned dictionaries for local image analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008a.

J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, January 2008b.

J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce. Discriminative sparse image models for class-specific edge detection and image interpretation. In Proceedings of the European Conference on Computer Vision (ECCV), 2008c.


References V

J. Mairal, G. Sapiro, and M. Elad. Learning multiscale sparse representations for image and video restoration. SIAM Multiscale Modelling and Simulation, 7(1):214–241, April 2008d.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In Proceedings of the International Conference on Machine Learning (ICML), 2009a.

J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2009b.

S. Mallat. A Wavelet Tour of Signal Processing, Second Edition. Academic Press, New York, September 1999.

S. Mallat and Z. Zhang. Matching pursuit in a time-frequency dictionary. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.

H. M. Markowitz. The optimization of a quadratic function subject to linear constraints. Naval Research Logistics Quarterly, 3:111–133, 1956.

Y. Nesterov. A method for solving a convex programming problem with convergence rate O(1/k^2). Soviet Math. Dokl., 27:372–376, 1983.


References VI

Y. Nesterov. Gradient methods for minimizing composite objective function. Technical report, CORE, 2007.

J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 2nd edition, 2006.

B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311–3325, 1997.

M. R. Osborne, B. Presnell, and B. A. Turlach. On the Lasso and its dual. Journal of Computational and Graphical Statistics, 9(2):319–337, 2000.

M. Protter and M. Elad. Image sequence denoising via sparse and redundant representations. IEEE Transactions on Image Processing, 18(1):27–36, 2009.

S. Roth and M. J. Black. Fields of experts: A framework for learning image priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.

V. Roth and B. Fischer. The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In Proceedings of the International Conference on Machine Learning (ICML), 2008.

P. Sprechmann, I. Ramirez, G. Sapiro, and Y. C. Eldar. Collaborative hierarchical sparse modeling. Technical report, 2010. Preprint arXiv:1003.0400v1.


References VII

R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society. Series B, 58(1):267–288, 1996.

S. Weisberg. Applied Linear Regression. Wiley, New York, 1980.

J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2005.

J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

J. Yang, K. Yu, and T. Huang. Supervised translation-invariant sparse coding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68:49–67, 2006.
