Transcript of "Sparse Representations and the Basis Pursuit Algorithm"
Sparse Modeling in
Image Processing and Deep Learning
Michael Elad, Computer Science Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel
The research leading to these results has received funding from the European Union's Seventh Framework Program (FP7/2007-2013), ERC grant agreement ERC-SPARSE-320649
This Lecture is About …
A Proposed Theory for Deep-Learning (DL)
Explanation:
o DL has been extremely successful in solving a variety of learning problems
o DL is an empirical field, with numerous tricks and know-how, but almost no theoretical foundations
o A theory for DL has become the holy-grail of current research in Machine-Learning and related fields
Who Needs Theory?
We All Do!!
… because … a theory
o … could bring the next rounds of ideas to this field, breaking existing barriers and opening new opportunities
o … could clearly map the limitations of existing DL solutions, and point to the key features that control their performance
o … could remove the feeling many of us have that DL is "dark magic", turning it into a solid scientific discipline
Understanding is a good thing … but another goal is inventing methods. In the history of science and technology, engineering preceded theoretical understanding:
Lens & telescope → Optics
Steam engine → Thermodynamics
Airplane → Aerodynamics
Radio & Comm. → Info. Theory
Computer → Computer Science
"Machine learning has become alchemy" - Ali Rahimi, NIPS 2017 Test-of-Time Award (a claim publicly disputed by Yann LeCun)
A Theory for DL?
Architecture
Algorithms
Data
Rene Vidal (JHU): Explained the ability to optimize the typical non-convex objective and yet reach a global minimum
Naftali Tishby (HUJI): Introduced the Information Bottleneck (IB) concept and demonstrated its relevance to deep learning
Stefano Soattoโs team (UCLA): Analyzed the Stochastic Gradient Descent (SGD) algorithm, connecting it to the IB objective
Stephane Mallat (ENS) & Joan Bruna (NYU): Proposed the scattering transform (wavelet-based) and emphasized the treatment of invariances in the input data
Richard Baraniuk & Ankit Patel (RICE): Offered a generative probabilistic model for the data, showing that classic architectures relate to it
Raja Giryes (TAU): Studied the architecture of DNN in the context of their ability to give distance-preserving embedding of signals
Gitta Kutyniok (TU) & Helmut Bolcskei (ETH): Studied the ability of DNN architectures to approximate families of functions
So, is there a Theory for DL?
The answer is tricky: there are already various such attempts, and some of them are truly impressive … but … none of them is complete
Interesting Observations
o Theory origins: Signal Processing, Control Theory, Information Theory, Harmonic Analysis, Sparse Representations, Quantum Physics, PDE, Machine Learning …
Ron Kimmel: "DL is a dark monster covered with mirrors. Everyone sees his reflection in it …"
David Donoho: "… these mirrors are taken from Cinderella's story, telling each that he is the most beautiful"
o Today's talk is on our proposed theory … and yes, our theory is the best
Joint work with Vardan Papyan, Yaniv Romano and Jeremias Sulam
Architecture
Algorithms
Data
Sparseland: Sparse Representation Theory
CSC: Convolutional Sparse Coding
ML-CSC: Multi-Layered Convolutional Sparse Coding
Another underlying idea that accompanies us:
Generative modeling of data sources enables
o A systematic algorithm development, &
o A theoretical analysis of their performance
This Lecture: More Specifically
Sparsity-Inspired Models → Deep-Learning
Disclaimer: Being a lecture on the theory of DL, this lecture is … theoretical … and mathematically oriented
Multi-Layered Convolutional Sparse Modeling
Our eventual goal in today's talk is to present the …
Multi-Layered Convolutional Sparse Modeling
… so, let's use this as our running title, parse it into words, and explain each of them
Our Data is Structured
o We are surrounded by various diverse sources of massive information: still images, videos, voice signals, text documents, 3D objects, medical imaging, radar imaging, stock market data, biological signals, matrix data, social networks, traffic info, seismic data, …
o Each of these sources has an internal structure, which can be exploited
o This structure, when identified, is the engine behind our ability to process data
o How do we identify structure?
Using Models
Model? Effective removal of noise (and many other tasks) relies on a proper modeling of the signal
Fact 1: This signal contains AWGN 𝒩(0,1)
Fact 2: The clean signal is believed to be piece-wise constant (PWC)
Models
o A model: a mathematical description of the underlying signal of interest, describing our beliefs regarding its structure
o The following is a partial list of commonly used models for images:
Principal-Component-Analysis
Gaussian-Mixture
Markov Random Field
Laplacian Smoothness
DCT concentration
Wavelet Sparsity
Piece-Wise-Smoothness
C2-smoothness
Besov-Spaces
Total-Variation
Beltrami-Flow
(these span a spectrum between Simplicity and Reliability)
o Good models should be simple while matching the signals
o Models are almost always imperfect
What is this Talk all About?
Data Models and Their Use
o Almost any task in data processing requires a model - true for denoising, deblurring, super-resolution, inpainting, compression, anomaly-detection, sampling, recognition, separation, and more
o Sparse and Redundant Representations offer a new and highly effective model - we call it Sparseland
o We shall describe this and descendant versions of it that lead all the way to … deep-learning
Multi-Layered Convolutional Sparse Modeling
A New Emerging Model: Sparseland
Drawing on: Machine Learning, Mathematics, Signal Processing - Wavelet Theory, Signal Transforms, Multi-Scale Analysis, Approximation Theory, Linear Algebra, Optimization Theory
Serving applications such as: Denoising, Interpolation, Prediction, Compression, Inference (solving inverse problems), Anomaly detection, Clustering, Summarizing, Sensor-Fusion, Source-Separation, Segmentation, Recognition, Semi-Supervised Learning, Identification, Classification, Synthesis
The Sparseland Model
o Task: model image patches of size 8×8 pixels
o We assume that a dictionary of such image patches is given, containing 256 atom images
o The Sparseland model assumption: every image patch can be described as a linear combination of few atoms
[Figure: a patch synthesized as a weighted sum Σ of three atoms, with coefficients α₁, α₂, α₃]
The Sparseland Model
o We start with an 8-by-8 pixels patch and represent it using 256 numbers
- This is a redundant representation
o However, out of those 256 elements in the representation, only 3 are non-zeros
- This is a sparse representation
o Bottom line in this case: 64 numbers representing the patch are replaced by 6 (3 for the indices of the non-zeros, and 3 for their entries)
Properties of this model: Sparsity and Redundancy
Chemistry of Data
We could refer to the Sparseland model as the chemistry of information:
o Our dictionary stands for the Periodic Table containing all the elements
o Our model follows a similar rationale: every molecule is built of few elements
Sparseland: A Formal Description
o Every column in D (the dictionary, of size n×m) is a prototype signal (atom)
o The vector α is generated with few non-zeros at arbitrary locations and values, and the signal is x = Dα
o This is a generative model that describes how (we believe) signals are created
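As a minimal numerical sketch of this generative model (the sizes 64, 256 and 3 match the slides; the random dictionary is an illustrative stand-in for a learned one):

```python
import numpy as np

# Sparseland generation sketch: a signal x is a linear combination of
# k = 3 atoms out of m = 256 in a dictionary D (illustrative random D).
rng = np.random.default_rng(0)
n, m, k = 64, 256, 3                      # patch dim (8x8), # atoms, # non-zeros

D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)            # normalize the atoms (columns)

alpha = np.zeros(m)
support = rng.choice(m, size=k, replace=False)
alpha[support] = rng.standard_normal(k)   # few non-zeros, arbitrary locations/values

x = D @ alpha                             # the generated Sparseland signal
print(x.shape, np.count_nonzero(alpha))
```

Note that storing `alpha` requires only the 3 indices and 3 values, exactly the "64 numbers replaced by 6" point made above.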
Difficulties with Sparseland
o Problem 1: Given a signal, how can we find its atom decomposition?
o A simple example:
There are 2000 atoms in the dictionary
The signal is known to be built of 15 atoms
→ (2000 choose 15) ≈ 2.4e37 possibilities
If each of these takes 1 nano-sec to test, this will take ~7.5e20 years to finish!!
o So, are we stuck?
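The combinatorial claim on this slide is easy to verify directly:

```python
import math

# Choosing which 15 of the 2000 atoms participate gives C(2000, 15) options.
count = math.comb(2000, 15)
print(f"{count:.1e}")                       # on the order of 2.4e+37

# At one test per nanosecond, the exhaustive search would take:
years = count * 1e-9 / (3600 * 24 * 365)
print(f"{years:.1e}")                       # on the order of 7.5e+20 years
```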
Atom Decomposition Made Formal
min_α ‖α‖₀ s.t. x = Dα          (exact)
min_α ‖α‖₀ s.t. ‖Dα − y‖₂ ≤ ε   (noisy)
L0 - counting the number of non-zeros in the vector
This is a projection onto the Sparseland model
These problems are known to be NP-Hard
Approximation algorithms:
Greedy methods: Thresholding / OMP
Relaxation methods: Basis-Pursuit
Pursuit Algorithms
Approximation algorithms for min_α ‖α‖₀ s.t. ‖Dα − y‖₂ ≤ ε:
Basis Pursuit: change the L0 into L1, and then the problem becomes convex and manageable:
min_α ‖α‖₁ s.t. ‖Dα − y‖₂ ≤ ε
Matching Pursuit: find the support greedily, one element at a time
Thresholding: multiply y by Dᵀ and apply shrinkage: α̂ = 𝒮_β(Dᵀy)
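A minimal sketch of the simplest of these, the Thresholding pursuit, under illustrative sizes and a random dictionary (not the talk's code): correlate y with all atoms and keep the k strongest entries.

```python
import numpy as np

# Thresholding pursuit sketch: back-project y onto the atoms (D^T y) and
# keep only the k largest-magnitude entries (hard shrinkage).
rng = np.random.default_rng(1)
n, m, k = 64, 256, 3
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)

alpha = np.zeros(m)
alpha[rng.choice(m, k, replace=False)] = 5.0    # a planted sparse signal
y = D @ alpha

proj = D.T @ y                                   # correlations with all atoms
alpha_hat = np.zeros(m)
top = np.argsort(np.abs(proj))[-k:]              # hard THR: keep k largest
alpha_hat[top] = proj[top]
print(sorted(top), sorted(np.flatnonzero(alpha)))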
Difficulties with Sparseland
o There are various pursuit algorithms
o Here is an example using the Basis Pursuit (L1)
o Surprising fact: many of these algorithms are often accompanied by theoretical guarantees for their success, if the unknown is sparse enough
The Mutual Coherence
o Compute the Gram matrix DᵀD (assuming normalized columns)
o The Mutual Coherence μ(D) is the largest off-diagonal entry of DᵀD in absolute value
o We will pose all the theoretical results in this talk using this property, due to its simplicity
o You may have heard of other ways to characterize the dictionary (Restricted Isometry Property - RIP, Exact Recovery Condition - ERC, Babel function, Spark, …)
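The definition above translates directly into a few lines (the random dictionary is illustrative):

```python
import numpy as np

# Mutual coherence: normalize columns, form the Gram matrix D^T D, and take
# the largest off-diagonal magnitude.
def mutual_coherence(D):
    Dn = D / np.linalg.norm(D, axis=0)    # normalized columns
    G = np.abs(Dn.T @ Dn)                 # Gram matrix magnitudes
    np.fill_diagonal(G, 0.0)              # ignore the unit diagonal
    return G.max()

rng = np.random.default_rng(2)
mu = mutual_coherence(rng.standard_normal((64, 256)))
print(round(mu, 3))                       # some value strictly between 0 and 1

# Sanity check: an orthonormal basis has zero coherence.
print(mutual_coherence(np.eye(8)))        # 0.0
```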
Basis-Pursuit Success [Donoho, Elad & Temlyakov ('06)]
Theorem: Given a noisy signal y = Dα + v, where ‖v‖₂ ≤ ε and α is sufficiently sparse,
‖α‖₀ < ¼(1 + 1/μ),
then Basis-Pursuit: min_α ‖α‖₁ s.t. ‖Dα − y‖₂ ≤ ε
leads to a stable result: ‖α̂ − α‖₂² ≤ 4ε² / (1 − μ(4‖α‖₀ − 1))
Comments:
o If ε = 0 → α̂ = α
o This is a worst-case analysis - better bounds exist
o Similar theorems exist for many other pursuit algorithms
Difficulties with Sparseland
o Problem 2: Given a family of signals, how do we find the dictionary to represent it well?
o Solution: Learn! Gather a large set of signals (many thousands), and find the dictionary that sparsifies them
o Such algorithms were developed in the past 10 years (e.g., K-SVD), and their performance is surprisingly good
o We will not discuss this matter further in this talk due to lack of time
Difficulties with Sparseland
o Problem 3: Why is this model suitable to describe various sources? E.g., is it good for images? Audio? Stocks? …
o General answer: Yes, this model is extremely effective in representing various sources
Theoretical answer: clear connections to other models
Empirical answer: in a large variety of signal and image processing (and later machine learning) tasks, this model has been shown to lead to state-of-the-art results
Difficulties with Sparseland?
o Problem 1: Given an image patch, how can we find its atom decomposition?
o Problem 2: Given a family of signals, how do we find the dictionary to represent it well?
o Problem 3: Is this model flexible enough to describe various sources? E.g., is it good for images? Audio? …
o Sparseland has had great success in signal & image processing and machine learning tasks
o In the past 8-9 years, many books have been published on this and closely related fields
This Field has been rapidly GROWING โฆ
A New Massive Open Online Course
Sparseland for Image Processing
o When handling images, Sparseland is typically deployed on small overlapping patches, due to the desire to train the model to fit the data better
o The model assumption is: each patch in the image is believed to have a sparse representation w.r.t. a common local dictionary
o What is the corresponding global model? This brings us to … the Convolutional Sparse Coding (CSC)
Multi-Layered Convolutional Sparse Modeling
Joint work with Vardan Papyan, Yaniv Romano and Jeremias Sulam
Convolutional Sparse Coding (CSC)
X = Σᵢ₌₁ᵐ dᵢ ∗ Γᵢ
m filters convolved with their sparse representations:
o X - an image with N pixels
o dᵢ - the i-th filter, of small size n
o Γᵢ - the i-th feature-map: an image of the same size as X, holding the sparse representation related to the i-th filter
This model emerged in 2005-2010, developed and advocated by Yann LeCun and others. It serves as the foundation of Convolutional Neural Networks.
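A minimal 1-D numpy sketch of this synthesis equation (filters, sizes, and non-zero locations are illustrative, not from the talk):

```python
import numpy as np

# CSC synthesis sketch: X = sum_i d_i * Γ_i, i.e., m small filters, each
# convolved with a sparse feature-map of the signal's size.
rng = np.random.default_rng(7)
N, n, m = 100, 5, 3                        # signal length, filter size, # filters
filters = [rng.standard_normal(n) for _ in range(m)]

feature_maps = np.zeros((m, N))
feature_maps[0, 10] = 1.0                  # a few non-zeros per map
feature_maps[1, 40] = -2.0
feature_maps[2, 75] = 0.5

X = sum(np.convolve(fm, d, mode="same")    # each filter convolved with its map
        for d, fm in zip(filters, feature_maps))
print(X.shape)
```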
CSC in Matrix Form
o Here is an alternative global sparsity-based model formulation:
X = Σᵢ₌₁ᵐ Cᵢγᵢ = [C₁ ⋯ C_m] [γ₁; ⋮; γ_m] = DΓ
o Cᵢ ∈ ℝ^{N×N} is a banded and Circulant matrix containing a single atom with all of its shifts
o γᵢ ∈ ℝᴺ are the corresponding coefficients, ordered as column vectors
The CSC Dictionary
[Figure: the global dictionary D = [C₁ C₂ C₃ ⋯] is a banded, shift-invariant structure, built by placing a small local dictionary D_L (holding the m filters) at every spatial location]
Why CSC?
x = DΓ
Consider the i-th patch: Rᵢx = Ωγᵢ, where Ω is the stripe-dictionary (of size n × (2n−1)m) and γᵢ is the corresponding stripe vector taken from Γ; similarly, Rᵢ₊₁x = Ωγᵢ₊₁
Every patch has a sparse representation w.r.t. the same local dictionary (Ω), just as assumed for images
Classical Sparse Theory for CSC?
min_Γ ‖Γ‖₀ s.t. ‖X − DΓ‖₂ ≤ ε
Theorem: BP is guaranteed to "succeed" … if ‖Γ‖₀ < ¼(1 + 1/μ(D))
o Assuming that m = 2 and n = 64, we have that [Welch, '74]: μ(D) ≥ 0.063
o Success of pursuits is guaranteed as long as
‖Γ‖₀ < ¼(1 + 1/μ(D)) ≤ ¼(1 + 1/0.063) ≈ 4.2
o Only few (4) non-zeros GLOBALLY are allowed!! This is a very pessimistic result!
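The pessimistic number on this slide is a one-line computation:

```python
# The classical guarantee ||Γ||_0 < (1/4)(1 + 1/μ), evaluated at the
# Welch lower bound μ(D) >= 0.063 quoted on the slide.
mu = 0.063
bound = 0.25 * (1 + 1 / mu)
print(round(bound, 1))   # ~4.2 non-zeros allowed globally
```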
Moving to Local Sparsity: Stripes
The main question we aim to address is this: can we generalize the vast theory of Sparseland to this new notion of local sparsity? For example, could we provide guarantees for the success of pursuit algorithms?
min_Γ ‖Γ‖₀,∞ˢ s.t. ‖X − DΓ‖₂ ≤ ε
ℓ₀,∞ norm: ‖Γ‖₀,∞ˢ = maxᵢ ‖γᵢ‖₀
‖Γ‖₀,∞ˢ is low → all stripes γᵢ are sparse → every patch has a sparse representation over Ω
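A sketch of this ℓ₀,∞ measure, modeling a stripe simply as a sliding window of length s over the global sparse vector (an illustrative simplification of the true stripe extraction):

```python
import numpy as np

# l0,inf sketch: the maximal number of non-zeros over all (windowed) stripes.
def l0inf(gamma, s):
    nnz = (gamma != 0).astype(int)
    windows = np.lib.stride_tricks.sliding_window_view(nnz, s)
    return int(windows.sum(axis=1).max())

gamma = np.zeros(100)
gamma[[3, 5, 40, 90]] = 1.0
print(l0inf(gamma, s=10))   # 2: indices 3 and 5 share a stripe
```

Note how the global count (4 non-zeros) can be much larger than the local measure, which is the whole point of the stripe-based analysis.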
Success of the Basis Pursuit [Papyan, Sulam & Elad ('17)]
Theorem: For Y = DΓ + E, if λ = 4‖E‖₂,∞ᵖ and if ‖Γ‖₀,∞ˢ < ⅓(1 + 1/μ(D)),
then Basis Pursuit, Γ_BP = argmin_Γ ½‖Y − DΓ‖₂² + λ‖Γ‖₁, performs very well:
1. The support of Γ_BP is contained in that of Γ
2. ‖Γ_BP − Γ‖_∞ ≤ 7.5‖E‖₂,∞ᵖ
3. Every entry greater than 7.5‖E‖₂,∞ᵖ is found
4. Γ_BP is unique
This is a much better result - it allows a few non-zeros locally in each stripe, implying a permitted O(N) non-zeros globally
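The Lagrangian Basis Pursuit objective in this theorem can be solved by simple iterative soft-thresholding (ISTA). A minimal sketch on a generic (non-convolutional) dictionary, with illustrative sizes and λ:

```python
import numpy as np

# ISTA sketch for min_Γ 0.5||Y - DΓ||_2^2 + λ||Γ||_1.
def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(D, y, lam, n_iter=500):
    c = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    gamma = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ gamma - y)
        gamma = soft(gamma - grad / c, lam / c)
    return gamma

rng = np.random.default_rng(3)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)
gamma_true = np.zeros(128)
gamma_true[[10, 50, 100]] = [4.0, -3.0, 5.0]
y = D @ gamma_true                          # noiseless for simplicity

gamma_hat = ista(D, y, lam=0.1)
print(np.flatnonzero(np.abs(gamma_hat) > 1.0))  # the recovered support
```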
Multi-Layered Convolutional Sparse Modeling
From CSC to Multi-Layered CSC
CSC: X ∈ ℝᴺ, with a convolutional dictionary D₁ ∈ ℝ^{N×Nm₁} and a sparse Γ₁ ∈ ℝ^{Nm₁}
Convolutional sparsity (CSC) assumes an inherent structure is present in natural signals
We propose to impose the same structure on the representations themselves: Γ₁ = D₂Γ₂, with D₂ ∈ ℝ^{Nm₁×Nm₂} and a sparse Γ₂ ∈ ℝ^{Nm₂}
→ Multi-Layer CSC (ML-CSC)
Intuition: From Atoms to Molecules
X ∈ ℝᴺ, D₁ ∈ ℝ^{N×Nm₁}, Γ₁ ∈ ℝ^{Nm₁}, D₂ ∈ ℝ^{Nm₁×Nm₂}, Γ₂ ∈ ℝ^{Nm₂}
o We can chain all the dictionaries into one effective dictionary: D_eff = D₁D₂D₃ ⋯ D_K, so that x = D_eff Γ_K
o This is a special Sparseland (indeed, a CSC) model
o However, a key property in this model is the sparsity of the intermediate representations
o The effective atoms: atoms → molecules → cells → tissue → body-parts → …
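The chaining idea is just matrix composition. A tiny sketch with generic (non-convolutional, illustrative) dictionaries; note that in ML-CSC the intermediate Γ₁ is also required to be sparse, which generic random dictionaries do not enforce - this only checks the arithmetic:

```python
import numpy as np

# Effective-dictionary sketch: x = D1 @ (D2 @ Γ2) = (D1 @ D2) @ Γ2.
rng = np.random.default_rng(4)
D1 = rng.standard_normal((64, 128))
D2 = rng.standard_normal((128, 256))
D_eff = D1 @ D2                        # effective "molecule" atoms

gamma2 = np.zeros(256)
gamma2[[7, 200]] = 1.0                 # sparse deepest representation
gamma1 = D2 @ gamma2                   # intermediate representation
x = D1 @ gamma1                        # the generated signal

print(np.allclose(x, D_eff @ gamma2))  # same signal either way
```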
A Small Taste: Model Training (MNIST)
MNIST Dictionary:
•D1: 32 filters of size 7×7, with stride of 2 (dense)
•D2: 128 filters of size 5×5×32, with stride of 1 - 99.09% sparse
•D3: 1024 filters of size 7×7×128 - 99.89% sparse
Effective atoms: D1 (7×7), D1D2 (15×15), D1D2D3 (28×28)
ML-CSC: Pursuit
o Deep-Coding Problem DCP_λ (dictionaries are known):
Find {Γⱼ}ⱼ₌₁ᴷ s.t.
X = D₁Γ₁,  ‖Γ₁‖₀,∞ˢ ≤ λ₁
Γ₁ = D₂Γ₂,  ‖Γ₂‖₀,∞ˢ ≤ λ₂
⋮
Γ_{K−1} = D_KΓ_K,  ‖Γ_K‖₀,∞ˢ ≤ λ_K
o Or, more realistically for noisy signals (DCP_λ^ℰ):
Find {Γⱼ}ⱼ₌₁ᴷ s.t.
‖Y − D₁Γ₁‖₂ ≤ ℰ,  ‖Γ₁‖₀,∞ˢ ≤ λ₁
Γ₁ = D₂Γ₂,  ‖Γ₂‖₀,∞ˢ ≤ λ₂
⋮
Γ_{K−1} = D_KΓ_K,  ‖Γ_K‖₀,∞ˢ ≤ λ_K
A Small Taste: Pursuit
[Figure: a test image Y and its recovered representations Γ₁, Γ₂, Γ₃ - 99.51% sparse (5 nnz), 99.52% sparse (30 nnz), 94.51% sparse (213 nnz) - together with the reconstructions x̂ = D₁Γ₁, x̂ = D₁D₂Γ₂, and x̂ = D₁D₂D₃Γ₃]
ML-CSC: The Simplest Pursuit
The simplest pursuit algorithm (single-layer case) is the THR algorithm, which operates on a given input signal Y by:
Γ̂ = 𝒮_β(DᵀY), where Y = DΓ + E and Γ is sparse
Consider this for Solving the DCP
DCP_λ^ℰ: Find {Γⱼ}ⱼ₌₁ᴷ s.t. ‖Y − D₁Γ₁‖₂ ≤ ℰ, Γⱼ₋₁ = DⱼΓⱼ, ‖Γⱼ‖₀,∞ˢ ≤ λⱼ
o Layered thresholding (LT): estimate Γ₁ via the THR algorithm, then estimate Γ₂ via the THR algorithm:
Γ̂₂ = 𝒮_{β₂}(D₂ᵀ 𝒮_{β₁}(D₁ᵀY))
o Now let's take a look at how a Conv. Neural Network operates:
f(Y) = ReLU(b₂ + W₂ᵀ ReLU(b₁ + W₁ᵀY))
The layered (soft nonnegative) thresholding and the CNN forward pass algorithm are the very same thing!!
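This identity can be checked numerically: the nonnegative soft-thresholding operator 𝒮⁺_β(z) = max(z − β, 0) is exactly ReLU with a bias, so two thresholding layers coincide with a two-layer forward pass (weights and thresholds below are illustrative):

```python
import numpy as np

# Layered nonnegative soft-thresholding vs. CNN forward pass.
def relu(z):
    return np.maximum(z, 0.0)

def soft_nonneg(z, beta):
    return np.maximum(z - beta, 0.0)   # S+_beta

rng = np.random.default_rng(5)
W1, W2 = rng.standard_normal((64, 128)), rng.standard_normal((128, 32))
b1, b2 = -0.5 * np.ones(128), -0.2 * np.ones(32)  # biases play the role of -beta
y = rng.standard_normal(64)

cnn = relu(b2 + W2.T @ relu(b1 + W1.T @ y))               # forward pass
lt = soft_nonneg(W2.T @ soft_nonneg(W1.T @ y, 0.5), 0.2)  # layered THR
print(np.allclose(cnn, lt))   # True
```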
Theoretical Path
Generative model: X = D₁Γ₁, Γ₁ = D₂Γ₂, …, Γ_{K−1} = D_KΓ_K, where each Γᵢ is ℓ₀,∞-sparse; the measured signal is Y = X + E
Pursuit: DCP_λ^ℰ, the Layered THR (the Forward Pass), or maybe other algorithms
Armed with this view of a generative source model, we may ask new and daring theoretical questions
Success of the Layered-THR [Papyan, Romano & Elad ('17)]
Theorem: If ‖Γᵢ‖₀,∞ˢ < ½(1 + (1/μ(Dᵢ))·(|Γᵢᵐⁱⁿ|/|Γᵢᵐᵃˣ|)) − (1/μ(Dᵢ))·(εᴸⁱ⁻¹/|Γᵢᵐᵃˣ|),
then the Layered Hard THR (with the proper thresholds) finds the correct supports and ‖ΓᵢᴸT − Γᵢ‖₂,∞ᵖ ≤ εᴸⁱ, where we have defined εᴸ⁰ = ‖E‖₂,∞ᵖ and
εᴸⁱ = √(‖Γᵢ‖₀,∞ᵖ) · (εᴸⁱ⁻¹ + μ(Dᵢ)(‖Γᵢ‖₀,∞ˢ − 1)|Γᵢᵐᵃˣ|)
The stability of the forward pass is guaranteed if the underlying representations are locally sparse and the noise is locally bounded
Problems: 1. Contrast  2. Error growth  3. Error even if no noise
Layered Basis Pursuit (BP)
o We chose the Thresholding algorithm due to its simplicity, but we do know that there are better pursuit methods - how about using them?
o Let's use the Basis Pursuit instead …
DCP_λ^ℰ: Find {Γⱼ}ⱼ₌₁ᴷ s.t. ‖Y − D₁Γ₁‖₂ ≤ ℰ, Γⱼ₋₁ = DⱼΓⱼ, ‖Γⱼ‖₀,∞ˢ ≤ λⱼ
Γ₁ᴸᴮᴾ = argmin_{Γ₁} ½‖Y − D₁Γ₁‖₂² + λ₁‖Γ₁‖₁
Γ₂ᴸᴮᴾ = argmin_{Γ₂} ½‖Γ₁ᴸᴮᴾ − D₂Γ₂‖₂² + λ₂‖Γ₂‖₁
Deconvolutional networks [Zeiler, Krishnan, Taylor & Fergus '10]
Success of the Layered BP [Papyan, Romano & Elad ('17)]
Theorem: Assuming that ‖Γᵢ‖₀,∞ˢ < ⅓(1 + 1/μ(Dᵢ)),
then the Layered Basis Pursuit performs very well:
1. The support of Γᵢᴸᴮᴾ is contained in that of Γᵢ
2. The error is bounded: ‖Γᵢᴸᴮᴾ − Γᵢ‖₂,∞ᵖ ≤ εᴸⁱ, where εᴸⁱ = 7.5ⁱ ‖E‖₂,∞ᵖ ∏ⱼ₌₁ⁱ √(‖Γⱼ‖₀,∞ᵖ)
3. Every entry in Γᵢ greater than εᴸⁱ/√(‖Γᵢ‖₀,∞ᵖ) will be found
Problems: 1. Contrast  2. Error growth  3. Error even if no noise
Layered Iterative Thresholding
Layered BP: Γⱼᴸᴮᴾ = argmin_{Γⱼ} ½‖Γⱼ₋₁ᴸᴮᴾ − DⱼΓⱼ‖₂² + ξⱼ‖Γⱼ‖₁
Layered Iterative Soft-Thresholding (for each layer j, iterating over t):
Γⱼᵗ = 𝒮_{ξⱼ/cⱼ}(Γⱼᵗ⁻¹ + (1/cⱼ)Dⱼᵀ(Γⱼ₋₁ − DⱼΓⱼᵗ⁻¹))
This can be seen as a very deep recurrent neural network [Gregor & LeCun '10]
Note that our suggestion implies that groups of layers share the same dictionaries
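The layered scheme above can be sketched directly: run ISTA per layer, feeding each layer's estimate to the next. Dictionaries, sizes, and thresholds here are illustrative, and the step-size 1/cⱼ is one standard choice:

```python
import numpy as np

# Layered Iterative Soft-Thresholding sketch.
def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def layer_ista(D, target, xi, n_iter=300):
    c = np.linalg.norm(D, 2) ** 2                       # step-size constant c_j
    g = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = soft(g + D.T @ (target - D @ g) / c, xi / c)
    return g

rng = np.random.default_rng(6)
D1 = rng.standard_normal((64, 128)); D1 /= np.linalg.norm(D1, axis=0)
D2 = rng.standard_normal((128, 256)); D2 /= np.linalg.norm(D2, axis=0)
y = rng.standard_normal(64)

g1 = layer_ista(D1, y, xi=0.1)    # layer 1: estimate Γ1 from Y
g2 = layer_ista(D2, g1, xi=0.1)   # layer 2: estimate Γ2 from the estimated Γ1
print(g1.shape, g2.shape)
```

Unrolling the inner loop across iterations and layers is exactly what makes this look like a very deep recurrent network, as noted above.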
Where are the Labels?
We presented the ML-CSC as a machine that produces signals X:
X = D₁Γ₁, Γ₁ = D₂Γ₂, …, Γ_{K−1} = D_KΓ_K, with each Γᵢ being ℓ₀,∞-sparse
Answer 1:
o We do not need labels, because everything we show refers to the unsupervised case, in which we operate on signals, not necessarily in the context of recognition
Where are the Labels?
Answer 2:
o In fact, this model could be augmented by a synthesis of the corresponding label L(X) by:
L(X) = sign(c + Σⱼ₌₁ᴷ wⱼᵀΓⱼ)
o This assumes that knowing the representations (or maybe their supports?) suffices for identifying the label
o Thus, a successful pursuit algorithm can lead to an accurate recognition if the network is augmented by a FC classification layer
What About Learning?
All these models rely on proper Dictionary Learning Algorithms to fulfil their mission:
o Sparseland: We have unsupervised and supervised such algorithms, and a beginning of a theory to explain how these work
o CSC: We have only few, unsupervised, methods, and even these are not fully stable/clear
o ML-CSC: One algorithm has been proposed (unsupervised) - see ArXiv
(Sparseland: Sparse Representation Theory → CSC: Convolutional Sparse Coding → ML-CSC: Multi-Layered Convolutional Sparse Coding)
Time to Conclude
This Talk
Sparseland - the desire to model data: we spoke about the importance of models in signal/image processing and described Sparseland in detail
A novel view of Convolutional Sparse Coding: we presented a theoretical study of the CSC model and how to operate locally while getting global optimality
Multi-Layer Convolutional Sparse Coding: we proposed a multi-layer extension of CSC, shown to be tightly connected to CNN
A novel interpretation and theoretical understanding of CNN: the ML-CSC was shown to enable a theoretical study of CNN, along with new insights
Take Home Message 1: Generative modeling of data sources enables algorithm development along with a theoretical analysis of the algorithms' performance
Take Home Message 2: The Multi-Layer Convolutional Sparse Coding model could be a new platform for understanding and developing deep-learning solutions
More on these (including these slides and the relevant papers) can be found at http://www.cs.technion.ac.il/~elad
Questions?