Rigid-Motion Scattering for Image Classification
Laurent Sifre
PhD Defense
October 6th, 2014
Image Classification
[Figure: image → representation → classifier → class. Training set: minimize training loss. Testing set: evaluate testing error. Example classes: fabric, tree, brick.]
Representation For Image Classification
Building a representation that is:
- invariant to geometric transformations,
- informative,
- stable to deformations,
is a fundamental problem of computer vision.
Satisfying all three requirements at once is hard. For example, the modulus of the Fourier transform preserves the energy of the image and is translation invariant, but it is highly unstable to deformations: a small dilation moves high-frequency coefficients by large amounts.
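The two properties of the Fourier modulus can be checked numerically. The sketch below assumes 1D Gaussian-windowed cosines and an illustrative 5% dilation. It verifies that |FFT| ignores a circular shift. It then measures how much |FFT| changes under the same small dilation at a low and at a high frequency.

```python
import numpy as np

n = 1024
t = np.arange(n)
c = n / 2

def windowed_cosine(freq, dilation=0.0, sigma=64.0):
    """Gaussian-windowed cosine f((1 - dilation) * (t - c)); freq in cycles per sample."""
    u = (1.0 - dilation) * (t - c)
    return np.exp(-u**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * u)

def fourier_modulus(x):
    return np.abs(np.fft.fft(x))

def relative_change(freq, dilation=0.05):
    """Relative L2 change of the Fourier modulus under a small dilation of the signal."""
    a = fourier_modulus(windowed_cosine(freq))
    b = fourier_modulus(windowed_cosine(freq, dilation))
    return np.linalg.norm(a - b) / np.linalg.norm(a)

x = windowed_cosine(0.2)
# A circular translation only changes the phase of the FFT, not its modulus.
shift_invariant = np.allclose(fourier_modulus(x), fourier_modulus(np.roll(x, 37)))
# The same 5% dilation moves a high-frequency bump much further than a low-frequency one.
low, high = relative_change(10 / n), relative_change(200 / n)
print(shift_invariant, round(low, 2), round(high, 2))
```

At frequency index 200, the dilated spectrum barely overlaps the original one. The representation change is therefore of the order of the signal itself, although the deformation is small.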
Existing image representations
Which filters, non-linearities, and connectivity? Many of these questions are answered mostly by empirical performance.
Deep convolutional networks (ConvNets; Hinton, LeCun, Bengio): a cascade of convolutions and regularizing non-linearities. Filters are learned.
Scattering networks (Mallat, Bruna): a mathematical construction of deep « scattering » networks, with a thorough analysis of their properties.
Handcrafted shallow representations (SIFT, HOG, RIFT; Lowe and many others): various ad hoc techniques, difficult to combine to tackle harder problems.
[Figures: SIFT, LeNet]
Overview
• Wavelet transform and scattering network.
• Rigid-motion group.
• Separable scattering.
• Rigid-motion wavelet transform.
• Joint scattering.
• Application: texture classification.
• Application: separable ConvNets.
Problem: how can scattering be extended to other groups?
Wavelet Transform
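Convolution with a window $\phi_J$ builds local translation invariance but also loses most of the information.
Complementary information is recovered by wavelets rotated by an orientation $\theta$ and dilated by a scale $2^j$: $\psi_{\theta,j}(u) = 2^{-2j}\,\psi(2^{-j} r_{-\theta} u)$.
The wavelet transform decomposes $x$ into a local average and wavelet coefficients:
$$W x = \{x\star\phi_J,\ x\star\psi_{\theta,j}\}_{\theta,\,j<J}.$$
Unitary Wavelets
Definitions: the norm of the wavelet transform is
$$\|W x\|^2 = \|x\star\phi_J\|^2 + \sum_{\theta,\,j<J}\|x\star\psi_{\theta,j}\|^2.$$
The associated Littlewood-Paley function is
$$A(\omega) = |\hat\phi_J(\omega)|^2 + \sum_{\theta,\,j<J}|\hat\psi_{\theta,j}(\omega)|^2.$$
Theorem (Littlewood-Paley): if the wavelets tile the Fourier plane tightly, that is $1-\epsilon \le A(\omega) \le 1$ for all $\omega$, then the wavelet transform almost preserves the norm:
$$(1-\epsilon)\|x\|^2 \le \|W x\|^2 \le \|x\|^2.$$
In this case we say that $\{\phi_J, \psi_{\theta,j}\}_{\theta,\,j<J}$ is an $\epsilon$-frame.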
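A small numerical check of the Littlewood-Paley bound follows. It is a sketch that assumes a hypothetical bank of Gaussian bumps defined directly in the 2D Fourier domain, not the Morlet wavelets of the thesis. The code computes $A(\omega) = |\hat\phi_J(\omega)|^2 + \sum_{\theta,j}|\hat\psi_{\theta,j}(\omega)|^2$ and rescales the filters so that $\max A = 1$. It sets $\epsilon = 1 - \min A$ and checks that $\|Wx\|^2/\|x\|^2$ lies in $[1-\epsilon, 1]$ for a random image.

```python
import numpy as np

def gaussian_bank(n, J=3, L=8):
    """Hypothetical Gabor-like filter bank, defined directly in the 2D Fourier domain."""
    w = 2 * np.pi * np.fft.fftfreq(n)                  # angular frequencies in [-pi, pi)
    wx, wy = np.meshgrid(w, w, indexing="ij")
    psis = []
    for j in range(J):
        xi = 0.75 * np.pi / 2**j                        # central frequency at scale 2^j
        sigma = 0.4 * xi                                # bandwidth proportional to xi
        for l in range(L):
            theta = 2 * np.pi * l / L                   # orientations on the full circle
            cx, cy = xi * np.cos(theta), xi * np.sin(theta)
            psis.append(np.exp(-((wx - cx)**2 + (wy - cy)**2) / (2 * sigma**2)))
    sigma_J = 0.4 * 0.75 * np.pi / 2**J
    phi = np.exp(-(wx**2 + wy**2) / (2 * sigma_J**2))   # low-pass window phi_J
    return phi, psis

n = 64
phi, psis = gaussian_bank(n)
A = phi**2 + sum(p**2 for p in psis)                    # Littlewood-Paley function A(omega)
norm = np.sqrt(A.max())
phi, psis, A = phi / norm, [p / norm for p in psis], A / A.max()   # rescale so that max A = 1
eps = 1.0 - A.min()

def wavelet_energy(x):
    """||Wx||^2 = ||x * phi_J||^2 + sum ||x * psi_{theta,j}||^2, with circular convolutions."""
    X = np.fft.fft2(x)
    return sum(np.sum(np.abs(np.fft.ifft2(X * h))**2) for h in [phi] + psis)

x = np.random.default_rng(0).standard_normal((n, n))
ratio = wavelet_energy(x) / np.sum(x**2)
print(f"eps = {eps:.3f}, ||Wx||^2 / ||x||^2 = {ratio:.3f}")
```

The bound holds whatever the filter shapes. By Parseval, $\|Wx\|^2$ is a weighted average of $|\hat x(\omega)|^2$ with weights $A(\omega)$. A well-designed bank only makes $\epsilon$ small.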
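Wavelet Modulus
The complex modulus has a regularizing effect on analytic wavelet coefficients: it removes the fast oscillations of $x\star\psi_{\theta,j}$ and keeps a smooth envelope.
Wavelet modulus operator, made of an invariant part and a non-linear covariant part:
$$|W| x = \{\underbrace{x\star\phi_J}_{\text{invariant part}},\ \underbrace{|x\star\psi_{\theta,j}|}_{\text{non-linear covariant part}}\}_{\theta,\,j<J}$$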
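The regularizing effect of the modulus can be seen on a 1D toy signal. The sketch assumes a slow Gaussian envelope on a fast carrier, and a hypothetical analytic Gabor-like filter centred on the carrier frequency. The complex coefficients keep oscillating at the carrier frequency. Their modulus is approximately the smooth envelope, so its energy sits near zero frequency.

```python
import numpy as np

n = 1024
t = np.arange(n)
omega = np.fft.fftfreq(n)                                  # frequencies in cycles/sample

envelope = np.exp(-((t - n / 2) / 80.0)**2 / 2)            # slowly varying envelope
x = envelope * np.cos(2 * np.pi * 0.2 * t)                 # fast carrier at 0.2 cycles/sample

# Analytic Gabor-like wavelet tuned to the carrier: its Fourier transform is a Gaussian
# centred at +0.2 and numerically zero on negative frequencies.
psi_hat = np.exp(-(omega - 0.2)**2 / (2 * 0.02**2))
coeffs = np.fft.ifft(np.fft.fft(x) * psi_hat)               # complex coefficients x * psi

def mean_abs_frequency(s):
    """Energy-weighted average of |omega|: a crude measure of how oscillatory s is."""
    power = np.abs(np.fft.fft(s))**2
    return np.sum(np.abs(omega) * power) / np.sum(power)

oscillating = mean_abs_frequency(coeffs.real)   # about the carrier frequency, 0.2
smooth = mean_abs_frequency(np.abs(coeffs))     # close to 0: only the envelope remains
print(oscillating, smooth)
```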
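Translation Scattering
Scattering is a cascade of wavelet modulus operators. Each path applies $m$ wavelet modulus operators before a final average; $m$ is the « scattering order ».
Order 0 Scattering
$$S_0 x = x\star\phi_J$$
• Local average of the image.
• Not very informative.
Order 1 Scattering
$$S_1 x(u, \lambda_1) = |x\star\psi_{\lambda_1}|\star\phi_J(u), \qquad \lambda_1 = (\theta_1, j_1)$$
• Local average of the wavelet modulus coefficients.
• Similar to SIFT descriptors.
Order 2 Scattering
$$S_2 x(u, \lambda_1, \lambda_2) = ||x\star\psi_{\lambda_1}|\star\psi_{\lambda_2}|\star\phi_J(u)$$
• Deep coefficients.
• Much richer information.
• Similar to ConvNets.
Order m Scattering
$$S_m x(u, \lambda_1, \dots, \lambda_m) = |\cdots||x\star\psi_{\lambda_1}|\star\psi_{\lambda_2}|\cdots\star\psi_{\lambda_m}|\star\phi_J(u)$$
Compact « path » notation: for a path $p = (\lambda_1, \dots, \lambda_m)$,
$$U[p]x = |\cdots|x\star\psi_{\lambda_1}|\cdots\star\psi_{\lambda_m}|, \qquad S[p]x = U[p]x\star\phi_J.$$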
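The cascade can be sketched in a few lines. The sketch makes three simplifying assumptions: a 1D signal, a hypothetical bank of analytic Gaussian filters indexed only by scale $j$ (no orientation), and a fully delocalized average (global mean) in place of $\phi_J$. Paths are tuples $p = (j_1, \dots, j_m)$, with $U[p]x$ and $S[p]x$ stored in dictionaries keyed by path.

```python
import numpy as np

def gabor_bank_1d(n, J):
    """Hypothetical 1D analytic filters psi_j, central frequency 0.35 / 2^j cycles/sample."""
    omega = np.fft.fftfreq(n)
    filters = []
    for j in range(J):
        xi = 0.35 / 2**j
        filters.append(np.exp(-(omega - xi)**2 / (2 * (xi / 3)**2)))
    return filters

def scattering_1d(x, J=4, order=2):
    """Fully delocalized scattering: S[p]x = mean(U[p]x), U[p]x = |...|x * psi_j1| ... * psi_jm|."""
    filters = gabor_bank_1d(len(x), J)
    U = {(): x}                     # empty path (order 0): U[()]x = x
    S = {(): x.mean()}              # S[()]x = average of x
    for _ in range(order):
        next_U = {}
        for path, u in U.items():
            u_hat = np.fft.fft(u)
            for j, h in enumerate(filters):
                v = np.abs(np.fft.ifft(u_hat * h))  # wavelet modulus |u * psi_j|
                next_U[path + (j,)] = v
                S[path + (j,)] = v.mean()           # the global mean plays the role of phi_J
        U = next_U
    return S

x = np.random.default_rng(1).standard_normal(256)
S, S_shifted = scattering_1d(x), scattering_1d(np.roll(x, 50))
print(len(S), all(np.isclose(S[p], S_shifted[p]) for p in S))
```

Because circular convolution and the modulus commute with circular shifts, and the mean discards position, every $S[p]x$ is unchanged by the translation. With $J = 4$ and order 2, there are $1 + 4 + 16$ paths.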
Scattering Properties
Theorem: (Mallat) Scattering almost preserves the norm. If , then
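Theorem (Mallat): scattering is invariant to translations and stable to deformations. For any $x \in L^2(\mathbb{R}^2)$ and any twice differentiable deformation $\tau$ such that $\|\nabla\tau\|_\infty \le 1/2$, the scattering $S$ truncated at order $m$ satisfies
$$\|S L_\tau x - S x\| \le C\, m\, \|x\|\, K(\tau),$$
where
$$K(\tau) = 2^{-J}\|\tau\|_\infty + \|\nabla\tau\|_\infty + \|H\tau\|_\infty.$$
A deformation $\tau$ acts on image $x$ with $L_\tau x(u) = x(u - \tau(u))$.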
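The Rigid-Motion Group
• Rigid motion: $g = (v, \theta)$, with a translation $v \in \mathbb{R}^2$ and a rotation angle $\theta \in [0, 2\pi)$.
• Action on image position: $g\cdot u = v + r_\theta u$, where $r_\theta$ is the rotation by $\theta$.
• Compatibility equation: $g\cdot(g'\cdot u) = (gg')\cdot u$.
• Group law: $gg' = (v + r_\theta v',\ \theta + \theta')$.
• Action on images: $L_g x(u) = x(g^{-1}\cdot u) = x(r_{-\theta}(u - v))$.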
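The rigid-motion group is small enough to code directly. The sketch below implements the action on positions $g\cdot u = v + r_\theta u$, the group law $(v,\theta)(v',\theta') = (v + r_\theta v', \theta+\theta')$, the inverse, and the action on images $L_g x(u) = x(g^{-1}\cdot u)$. Images are represented as plain Python functions of position, and the random test values are an illustrative assumption. The checks confirm the compatibility equation and $L_g L_h = L_{gh}$.

```python
import numpy as np

def rot(theta):
    """Rotation matrix r_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def act(g, u):
    """Action on positions: g . u = v + r_theta u, for g = (v, theta)."""
    v, theta = g
    return v + rot(theta) @ u

def compose(g, h):
    """Group law: (v, theta)(v', theta') = (v + r_theta v', theta + theta')."""
    (v, theta), (w, phi) = g, h
    return (v + rot(theta) @ w, (theta + phi) % (2 * np.pi))

def inverse(g):
    """Inverse: (v, theta)^{-1} = (-r_{-theta} v, -theta)."""
    v, theta = g
    return (-rot(-theta) @ v, (-theta) % (2 * np.pi))

def act_on_image(g, x):
    """Action on images: (L_g x)(u) = x(g^{-1} . u) = x(r_{-theta}(u - v))."""
    return lambda u: x(act(inverse(g), u))

def image(p):
    """Any function of position can play the role of an image."""
    return np.sin(p[0]) + p[1] ** 2

rng = np.random.default_rng(0)
g = (rng.standard_normal(2), 0.7)
h = (rng.standard_normal(2), 2.1)
u = rng.standard_normal(2)
print(np.allclose(act(g, act(h, u)), act(compose(g, h), u)))   # compatibility equation
```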
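Separable Rigid-Motion Invariance
• Suppose that we have two operators $\Phi_1$ and $\Phi_2$ that build invariance to translations and to rotations, respectively.
• Suppose that $\Phi_1$ is also covariant to rotations, that is, there exists an action $\tilde L_\alpha$ of rotations on the output of $\Phi_1$ such that $\Phi_1 L_\alpha x = \tilde L_\alpha \Phi_1 x$.
• Theorem 1: we can factorize the output of $\Phi_1$ into disjoint orbits of this action and apply $\Phi_2$ along each orbit. The resulting operator is invariant to rigid motions.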
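Rotation Covariance of Scattering
The Morlet wavelets are oriented. For a rotation $L_\alpha x(u) = x(r_{-\alpha}u)$, a change of variable therefore shows that
$$|L_\alpha x\star\psi_{\theta,j}| = L_\alpha|x\star\psi_{\theta-\alpha,j}|.$$
Cascading this yields $U[p]L_\alpha x = L_\alpha U[p-\alpha]x$, where $p - \alpha = (\theta_1-\alpha, j_1, \dots, \theta_m-\alpha, j_m)$ for $p = (\theta_1, j_1, \dots, \theta_m, j_m)$.
For a fully delocalized scattering (average over all positions), $S[p]L_\alpha x = S[p-\alpha]x$: a rotation of the image becomes a translation of all orientation variables along the path.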
Separable Scattering
[Figure: translation scattering → orbit extraction → orientation scattering.]
• Same properties as scattering.
• Plus rotation invariance.
Separable invariants are invariant to a larger group than intended.
[Figure: each row of an image is translated independently. A 1D Fourier modulus along rows followed by a 1D Fourier modulus along columns gives identical representations for the two images. The 2D Fourier modulus gives non-identical representations.]
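The row-translation example can be reproduced directly. The sketch assumes a random 32×32 image with circular boundaries, with each row shifted by its own random amount. The separable invariant (1D Fourier modulus along rows, then along columns) cannot tell the two images apart. The 2D Fourier modulus can.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
# Translate each row independently by its own random amount (circularly).
y = np.stack([np.roll(row, s) for row, s in zip(x, rng.integers(0, 32, size=32))])

def separable_modulus(img):
    """1D Fourier modulus along rows, then 1D Fourier modulus along columns."""
    return np.abs(np.fft.fft(np.abs(np.fft.fft(img, axis=1)), axis=0))

def joint_modulus(img):
    """2D Fourier modulus."""
    return np.abs(np.fft.fft2(img))

same_separable = np.allclose(separable_modulus(x), separable_modulus(y))
same_joint = np.allclose(joint_modulus(x), joint_modulus(y))
print(same_separable, same_joint)   # True False
```

The first modulus discards each row's shift before the columns are ever compared. The relative positions of the rows are therefore lost, which is exactly the extra invariance the slide warns about.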
Wavelet Modulus Separates Horizontal and Vertical Grids
[Figure: two textures; panels labelled Equal, Equal, Translated.]
Scattering transforms the different paths again independently and then averages them, which removes the relative translation.
Both textures have the same scattering.
Signal Processing on Groups
• Recent works [Boscain12, Duits12] have developed signal processing tools on the rigid-motion group.
• A recent ConvNet [Krizhevsky12] uses three-dimensional convolutions to capture higher-level concepts.
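Rigid-Motion Convolution
• For any group $G$: $x\circledast y(g) = \int_G x(g')\, y(g'^{-1}g)\, dg'$.
• For the rigid-motion group, with $g = (v,\theta)$ and $g' = (v',\theta')$: $x\circledast y(v,\theta) = \int\!\!\int x(v',\theta')\, y(r_{-\theta'}(v - v'),\ \theta - \theta')\, dv'\, d\theta'$.
• Naive implementation: $O(P^2 L^2)$ operations for filters with full support, where $P$ = # positions and $L$ = # orientations.
• Convolutions can be factorized for separable filters.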
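Fast Rigid-Motion Convolutions
• Separable rigid-motion filters, a 2D spatial filter times a 1D orientation filter: $y(v,\theta) = y_s(v)\, y_o(\theta)$.
• Naive convolution: a full 3D sum over positions and orientations.
• Separable convolution: a 2D spatial convolution for each orientation $\theta'$ (with the spatial filter rotated by $\theta'$), followed by a 1D convolution along orientations. With $\circledast$ the rigid-motion convolution,
$$x\circledast y(v,\theta) = \int \big(x(\cdot,\theta')\star y_s(r_{-\theta'}\,\cdot)\big)(v)\; y_o(\theta-\theta')\, d\theta'.$$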
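The factorization can be checked against a brute-force implementation. The sketch makes several assumptions for exactness. It uses a small periodic $n\times n$ grid and $L = 4$ orientations (quarter turns), so rotating the spatial filter is an exact permutation of grid points. Filters and inputs are random. The naive version evaluates the rigid-motion convolution sum $\sum_{v',\theta'} x(v',\theta')\,y(r_{-\theta'}(v-v'), \theta-\theta')$ literally. The separable version runs one FFT-based 2D convolution per orientation, then a short 1D convolution along orientations.

```python
import numpy as np

def rot_minus(u1, u2, k, n):
    """Apply r_{-k pi/2} to (u1, u2) on an n x n periodic grid (exact for quarter turns)."""
    for _ in range(k % 4):
        u1, u2 = u2, -u1
    return u1 % n, u2 % n

def rigid_conv_naive(x, y):
    """(x * y)(v, t) = sum_{v', t'} x(v', t') y(r_{-t'}(v - v'), t - t'), with L = 4."""
    n, _, L = x.shape
    out = np.zeros_like(x)
    for v1 in range(n):
        for v2 in range(n):
            for t in range(L):
                total = 0.0
                for w1 in range(n):
                    for w2 in range(n):
                        for tp in range(L):
                            a, b = rot_minus(v1 - w1, v2 - w2, tp, n)
                            total += x[w1, w2, tp] * y[a, b, (t - tp) % L]
                out[v1, v2, t] = total
    return out

def rigid_conv_separable(x, ys, yo):
    """Same convolution for y(u, t) = ys(u) yo(t): 2D convs per orientation, then 1D conv in t."""
    n, _, L = x.shape
    z = np.empty_like(x)
    for tp in range(L):
        # Spatial filter rotated for input orientation tp: ys_rot(u) = ys(r_{-tp} u).
        ys_rot = np.array([[ys[rot_minus(u1, u2, tp, n)] for u2 in range(n)] for u1 in range(n)])
        z[:, :, tp] = np.real(np.fft.ifft2(np.fft.fft2(x[:, :, tp]) * np.fft.fft2(ys_rot)))
    out = np.zeros_like(x)
    for t in range(L):
        for tp in range(L):
            out[:, :, t] += z[:, :, tp] * yo[(t - tp) % L]   # circular 1D conv along orientation
    return out

rng = np.random.default_rng(0)
n, L = 6, 4
x = rng.standard_normal((n, n, L))
ys, yo = rng.standard_normal((n, n)), rng.standard_normal(L)
y = ys[:, :, None] * yo[None, None, :]          # separable rigid-motion filter
print(np.allclose(rigid_conv_naive(x, y), rigid_conv_separable(x, ys, yo)))
```

The naive loop visits every pair of group elements. The separable version replaces it with $L$ two-dimensional convolutions and a short 1D convolution per output orientation.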
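2D Separable Wavelets
From 1D wavelets, a low-pass filter $\phi$ and a high-pass wavelet $\psi$, 2D separable wavelets are built as products $a(u_1)\,b(u_2)$ with $a, b \in \{\phi, \psi\}$.
There are 4 types of wavelets, one for every possible combination.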
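A minimal illustration of the four combinations follows. It assumes the 1D Haar pair as a stand-in for the wavelets used in the thesis. Taking outer products of $\phi$ and $\psi$ along the two axes gives the 4 types of 2D separable filters. Because the 1D pair is orthonormal, the 4 products form an orthonormal basis of $2\times 2$ patches.

```python
import numpy as np

# Hypothetical choice: 1D Haar low-pass (phi) and high-pass (psi) filters.
phi = np.array([1.0, 1.0]) / np.sqrt(2)
psi = np.array([1.0, -1.0]) / np.sqrt(2)

# The 4 types of 2D separable wavelets: one per combination of (phi, psi) along each axis.
filters_2d = {
    (a_name, b_name): np.outer(a, b)
    for a_name, a in (("phi", phi), ("psi", psi))
    for b_name, b in (("phi", phi), ("psi", psi))
}

# Flatten each 2x2 filter into a vector; their Gram matrix should be the 4x4 identity.
basis = np.stack([f.ravel() for f in filters_2d.values()])
print(np.round(basis @ basis.T, 12))
```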
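Separable Joint Wavelet Transform
A separable joint wavelet family is defined on the rigid-motion group as
$$\tilde\psi_{\theta,j,k}(v, \theta') = \psi_{\theta,j}(v)\,\bar\psi_k(\theta'),$$
where $\theta$ = spatial orientation, $j$ = spatial scale and $k$ = orientation scale. The family also includes the products that replace $\psi_{\theta,j}$ by the spatial window $\phi_J$, $\bar\psi_k$ by the orientation window $\bar\phi_K$, or both.
The associated wavelet transform is an operator $\tilde W$ that convolves a function $y(v, \theta)$ with every member $\tilde\psi$ of this family, where $\circledast$ denotes the rigid-motion convolution:
$$\tilde W y = \{\, y\circledast\tilde\psi \,\}_{\tilde\psi}.$$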
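Rigid-Motion Wavelet Frame
Theorem 2: if there exist $\epsilon_1, \epsilon_2 \ge 0$ such that the spatial and orientation Littlewood-Paley functions satisfy
$$1-\epsilon_1 \le A(\omega) \le 1$$
and
$$1-\epsilon_2 \le \bar A(\bar\omega) \le 1,$$
then the family is an $\epsilon$-frame with $1-\epsilon = (1-\epsilon_1)(1-\epsilon_2)$, i.e.
$$(1-\epsilon)\|y\|^2 \le \|\tilde W y\|^2 \le \|y\|^2,$$
where $\|y\|^2 = \int\!\!\int |y(v,\theta)|^2\, dv\, d\theta$.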
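2D Fast Wavelet Transform (FWT)
Suppose that there exist small filters $h$ and $g_\theta$ such that
$$\phi_{j+1} = \phi_j \star h_j, \qquad \psi_{\theta,j+1} = \phi_j \star g_{\theta,j},$$
where $h_j$ and $g_{\theta,j}$ are $h$ and $g_\theta$ dilated by $2^j$.
Then
$$x\star\phi_{j+1} = (x\star\phi_j)\star h_j, \qquad x\star\psi_{\theta,j+1} = (x\star\phi_j)\star g_{\theta,j},$$
so the whole wavelet transform can be computed as a cascade of convolutions with the small filters $h$ and $g_\theta$.
Rigid-Motion FWT
For each orientation slice, a 2D FWT; for each leaf of the resulting tree, a 1D FWT along orientations.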
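One level of the rigid-motion FWT can be sketched with orthonormal Haar filters. These are an assumption chosen because the orthonormality is easy to check, not the filters of the thesis. The input $y(v_1, v_2, \theta)$ is an array of shape $(n, n, L)$. Each orientation slice goes through a 2D separable FWT. Each of the 4 resulting leaves then goes through a 1D FWT along the orientation axis. With orthonormal filters, the total energy is preserved exactly.

```python
import numpy as np

def haar_1d(a, axis):
    """One level of the orthonormal 1D Haar FWT along `axis` (its length must be even)."""
    even = np.take(a, np.arange(0, a.shape[axis], 2), axis=axis)
    odd = np.take(a, np.arange(1, a.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_2d(img):
    """One level of the 2D separable FWT: the 4 subbands (low/high along each axis)."""
    lo, hi = haar_1d(img, axis=0)
    return [*haar_1d(lo, axis=1), *haar_1d(hi, axis=1)]

def rigid_motion_fwt(y):
    """y has shape (n, n, L): 2D FWT on each orientation slice, then 1D FWT along orientation."""
    L = y.shape[2]
    slices = [haar_2d(y[:, :, t]) for t in range(L)]                        # slices[t][band]
    leaves = [np.stack([s[b] for s in slices], axis=-1) for b in range(4)]  # each (n/2, n/2, L)
    return [part for leaf in leaves for part in haar_1d(leaf, axis=2)]     # 8 output arrays

y = np.random.default_rng(0).standard_normal((16, 16, 8))
outputs = rigid_motion_fwt(y)
energy_out = sum(np.sum(o**2) for o in outputs)
print(len(outputs), np.isclose(energy_out, np.sum(y**2)))
```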
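Covariance of the Wavelet Transform
Property: the 2D wavelet transform is covariant to the action of the rigid-motion group. Assume the window $\phi_J$ is rotation invariant. Then for any rigid motion $g = (w, \alpha)$ and any image $x$,
$$W L_g x = \tilde L_g W x,$$
where $\tilde L_g$ is defined for the low-pass output $x\star\phi_J$ as
$$\tilde L_g (x\star\phi_J) = L_g (x\star\phi_J)$$
and for the wavelet coefficients $y(v, \theta, j) = x\star\psi_{\theta,j}(v)$ as
$$\tilde L_g\, y(v, \theta, j) = y(r_{-\alpha}(v - w),\ \theta - \alpha,\ j).$$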
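Joint Scattering
First, a 2D wavelet modulus:
$$U_1 x(v, \theta, j_1) = |x\star\psi_{\theta,j_1}(v)|.$$
Then, a cascade of rigid-motion wavelet modulus operators. Here $\circledast$ is the rigid-motion convolution in $(v,\theta)$, and $\tilde\psi$ ranges over a separable joint wavelet family with low-pass $\tilde\phi_{J,K}$:
$$U_{m+1}x = |U_m x\circledast\tilde\psi|, \qquad S_m x = U_m x\circledast\tilde\phi_{J,K}.$$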
Order 0 Joint Scattering
• Same as order 0 translation scattering.
• Local average of the input image.
• Local translation invariance.
• Full rotation invariance.
• Not very informative.
Order 1 Joint Scattering
• Spatial and orientation local average of the wavelet modulus.
• Indexed by position and scale.
• A family of fully rotation-invariant 2D signals if the orientation average covers all orientations, as is the case here.
• A family of partially invariant 3D signals if it covers only part of them.
Order 2 Joint Scattering
• Same invariance properties as order 1.
• Interactions between different positions and orientations.
Joint Scattering Invariance
Theorem 3: there exists a constant such that for any the rigid-motion joint scattering at spatial scale and at rotational scale verifies
• The term is the largest
displacement induced by on the support of .
• If and then
Outex 10 with Separable Scattering
Training: single orientation.
Testing (rot): 8 rotations.
Testing (rot-shear): 8 orientations, plus a horizontal shear of 1.3.
• Good test case for invariant descriptors: the invariance cannot be learned from the data.
• Nearest-neighbor classifier.
[Figure: stability; higher orders improve results.]
Scale Invariance
• Scale is different from rotations:
 - limited range of available scales;
 - not a periodic group;
 - wavelet coefficients at different scales are sampled at different resolutions.
• These differences make it difficult to use wavelets along scales.
• Scattering is stable to deformations and dilations, so slightly dilated or deformed versions of the same signal lie close to a low-dimensional subspace.
• We thus use the PCA classifier from [Bruna12] for experiments involving significant deformations and scale changes.
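PCA Classifier
At training time, the PCA classifier models each class $c$ as the affine subspace $\mu_c + \mathbf{V}_c$. Here $\mu_c$ is the mean scattering vector of the class and $\mathbf{V}_c$ is spanned by the first $d$ singular vectors in the SVD of the centered scattering vectors of the class.
At testing time, a test image $x$ is classified according to the minimum projection error of its scattering vector:
$$\hat c(x) = \arg\min_c \big\|Sx - \mu_c - P_{\mathbf{V}_c}(Sx - \mu_c)\big\|.$$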
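A compact sketch of the PCA classifier follows. It uses synthetic feature vectors in place of scattering vectors: two hypothetical classes lying near different 2D affine subspaces of $\mathbb{R}^{20}$. Training stores each class mean and the first $d$ right singular vectors of the centered class data. Prediction returns the class with the smallest residual after projection onto its affine subspace.

```python
import numpy as np

def fit_pca_classifier(features, labels, d):
    """Per class: mean mu_c and the first d principal directions of the centred vectors (SVD)."""
    models = {}
    for c in np.unique(labels):
        X = features[labels == c]
        mu = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
        models[c] = (mu, Vt[:d])                 # rows of Vt[:d] span the class subspace
    return models

def predict(models, s):
    """Class whose affine subspace mu_c + span(V_c) gives the smallest projection error."""
    def error(c):
        mu, V = models[c]
        r = s - mu
        return np.linalg.norm(r - V.T @ (V @ r))
    return min(models, key=error)

# Synthetic stand-in for scattering vectors: two classes near different 2D affine subspaces.
rng = np.random.default_rng(0)
dim, d = 20, 2
bases = [np.linalg.qr(rng.standard_normal((dim, d)))[0] for _ in range(2)]
means = [3 * rng.standard_normal(dim) for _ in range(2)]

def sample(c, k):
    return means[c] + rng.standard_normal((k, d)) @ bases[c].T + 0.01 * rng.standard_normal((k, dim))

train = np.vstack([sample(0, 50), sample(1, 50)])
train_labels = np.array([0] * 50 + [1] * 50)
models = fit_pca_classifier(train, train_labels, d)
test = np.vstack([sample(0, 20), sample(1, 20)])
test_labels = np.array([0] * 20 + [1] * 20)
accuracy = np.mean([predict(models, s) == c for s, c in zip(test, test_labels)])
print(accuracy)
```

The per-class affine model only needs the within-class variability to be nearly low-dimensional. This is the property that the stability of scattering to small dilations and deformations is meant to provide.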
Logarithm and Scale Augmentation
• Scattering vectors of texture images typically have a power-law behavior with respect to scale. A logarithm helps linearize this behavior further.
• To improve scale invariance, we augment the training set with the scattering of dilated versions of each image at several scales.
• At testing time, we average the scattering of dilated versions of the original image.
• The scattering is covariant to scale: dilated scattering vectors can be deduced from the scattering vector of the original image.
KTH-Tips
• 10 classes.
• 9 scales, 3 viewpoints, 3 illuminations = 81 images/class.
• Low resolution 200x200.
• No in-plane rotation.
• Data is split between training and testing.
• Results are averaged over 200 random splits.
Rotation invariance does not degrade accuracy.
Scale invariance increases accuracy.
UIUCTex
• 25 classes of 40 images.
• Higher resolution 640x480.
• Large, uncalibrated affine transformations.
• Large deformations.
Rotation invariance increases accuracy.
Scale invariance increases accuracy.
UMD
• 25 classes of 40 images.
• Similar to UIUC.
• Higher resolution 1280x960.
Rotation invariance increases accuracy.
Scale invariance increases accuracy.
Hyperparameters
• max scale for KTH-Tips, UIUCTex, UMD.
• scales per octave.
• orientations between
• full rotational invariance.
• 4 dilations for scale invariance.
• Mirror padding to avoid losing too much at the boundaries.
State-of-the-art results on three datasets with almost the same hyperparameters.
Separable ConvNet
• The main difference between translation and joint scattering is the 3D convolution, which recombines the information from different paths.
• ConvNets of the 2000s: 2D convolutions.
• ConvNets of the 2010s: 3D convolutions.
[Figures: AlexNet, LeNet]
First Two Layers of AlexNet
[Figure: first-layer and second-layer filters.]
The filters are highly redundant along the input depth: a waste of capacity.
Separable Convolutions in ConvNets
[Figure: a vanilla layer applies a 3D convolution; a separable layer applies a 2D convolution followed by a 1D convolution.]
For a given capacity:
• less to learn;
• less to compute.
ImageNet ILSVRC2012: 1K classes, 1.2M images.
20% fewer steps-to-accuracy with Google’s AlexNet implementation.
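The "less to learn" claim can be made concrete. The sketch assumes circular boundaries and one 2D filter per input channel. It also assumes that the 1D convolution along depth has full support, i.e. a pointwise recombination $P$ of shape $C_{out}\times C_{in}$; this is the structure of what is commonly called a depthwise separable convolution. The sketch checks that the separable layer equals a vanilla 3D-convolution layer whose kernel is $W[o,i] = P[o,i]\,D[i]$. It then counts parameters for an illustrative 64 → 128 channel layer with 3×3 filters.

```python
import numpy as np

def conv2d_circular(img, kernel):
    """Circular 2D convolution via FFT; the kernel is zero-padded to the image size."""
    padded = np.zeros_like(img)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded)))

def vanilla_layer(x, W):
    """3D convolution: out[o] = sum_i x[i] * W[o, i], with W of shape (C_out, C_in, k, k)."""
    return np.stack([sum(conv2d_circular(x[i], W[o, i]) for i in range(x.shape[0]))
                     for o in range(W.shape[0])])

def separable_layer(x, D, P):
    """2D conv per input channel (D: C_in x k x k), then recombination across depth (P: C_out x C_in)."""
    z = np.stack([conv2d_circular(x[i], D[i]) for i in range(x.shape[0])])
    return np.tensordot(P, z, axes=(1, 0))

def param_counts(c_in, c_out, k):
    """(vanilla, separable) number of weights, biases ignored."""
    return k * k * c_in * c_out, k * k * c_in + c_in * c_out

rng = np.random.default_rng(0)
c_in, c_out, k = 3, 5, 3
x = rng.standard_normal((c_in, 16, 16))
D, P = rng.standard_normal((c_in, k, k)), rng.standard_normal((c_out, c_in))
W = P[:, :, None, None] * D[None, :, :, :]      # full 3D kernel represented by the factorization
print(np.allclose(vanilla_layer(x, W), separable_layer(x, D, P)))
print(param_counts(64, 128, 3))   # (73728, 8768)
```

The factorized layer represents only a subset of all 3D kernels. The slides argue this is a good trade given how redundant learned filters are along the input depth.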
Conclusion
• Problem: the scattering transform (Mallat et al.) is a translation-invariant, informative and stable signal representation. How can its properties be extended to other, more complicated groups that affect natural images?
• Focus on the affine group for theory and on the rigid-motion group (translations and rotations) in applications.
• The separable scattering is the most straightforward way. It cascades scattering along position and along the orientation parameter. It loses information about the joint distribution of internal variables.
• The joint scattering recombines the internal variables of intermediate layers by cascading wavelet modulus operators on the geometric group. It is a tighter invariant.
• Proofs that these operators are unitary and invariant.
• Fast algorithms.
• Texture classification: state-of-the-art on most datasets.
• Generic object classification: more efficient convolutions in ConvNets.