CVPR2012 Poster Linear Discriminative Image Processing Operator Analysis
Linear Discriminative Image Processing Operator Analysis Toru Tamaki, Bingzhi Yuan, Kengo Harada, Bisser Raytchev, Kazufumi Kaneda
Most discriminative image processing operators (IPOs)
recognition
Feature space
LDA classifier
Generating matrices (image processing operators)
Goal
Motivation
Contribution
Find a most discriminative set of image processing operations for LDA.
For a small sample size problem, many studies increase the training set by synthetically generating new training samples. But how?
Ad-hoc… discriminatively !
Simultaneous estimation of both LDA feature space and a set of discriminative generating matrices.
Linear IPO + LDA = LDA with increased samples
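Increased sample, where $x$ is an original sample and $G_j$ is a generating matrix (an image processing operator):
$x_j = G_j x$

Mean of class $i$ for increased samples, where $\bar{G}$ is the average of image processing operations:
$m'_i = \frac{1}{J n_i} \sum_{j=1}^{J} \sum_{x \in X_i} G_j x = \frac{1}{J} \sum_{j=1}^{J} G_j m_i = \bar{G} m_i$

Mean of all increased samples:
$m' = \bar{G} m$

Scatter matrix of class $i$ for increased samples:
$S'_i = \bar{G} (S_i - R_i) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_i G_j^T$

Within-class scatter matrix for increased samples:
$S'_W = \bar{G} (S_W - R_W) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_W G_j^T$

Between-class scatter matrix for increased samples:
$S'_B = \bar{G} S_B \bar{G}^T$

$S_i$, $S_W$, $S_B$ are scatter matrices for original (non-increased) samples, and
$R_i = \frac{1}{n_i} \sum_{x \in X_i} x x^T$, $R_W = \sum_{i}^{c} R_i$

$X' = \bar{G} (X - R_{\mathrm{all}}) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_{\mathrm{all}} G_j^T$

Scatter matrices:
$\tilde{S}'_i = A^T P^T S'_i P A$
$\tilde{S}'_W = A^T P^T S'_W P A$
$\tilde{S}'_B = A^T P^T S'_B P A$
$y_j = A^T P^T x_j = A^T P^T G_j x$
$\tilde{m}'_i = A^T P^T m'_i = A^T P^T \bar{G} m_i$
$\tilde{m}' = A^T P^T m' = A^T P^T \bar{G} m$

Generalized Eigenvalue problem:
$(P^T S'_W P)^{-1} P^T S'_B P$

Rayleigh quotient:
$\frac{\mathrm{tr}(\tilde{S}'_B)}{\mathrm{tr}(\tilde{S}'_W)}$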
Training sample
Scatter matrices
PCA
LDA Feature space
$P$: PCA projection matrix
Covariance matrix
$S_W$, $S_B$
Dimensionality reduction
Given $\{G_j\}$, we don't need to actually increase the training samples. But we need more memory to store…
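The claim that the increased samples never have to be generated can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the toy sizes, the random data and the random stand-ins for the generating matrices are all assumptions, and the scatter matrix is assumed to be normalized by $1/n_i$ to match the definition of $R_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_i, J = 6, 5, 4                      # toy sizes (assumed for illustration)
X_i = rng.normal(size=(n_i, d))          # rows are original samples of one class
Gs = rng.normal(size=(J, d, d))          # hypothetical generating matrices G_j

# Statistics computed from the original samples only.
m_i = X_i.mean(axis=0)                   # class mean m_i
R_i = X_i.T @ X_i / n_i                  # autocorrelation R_i = (1/n_i) sum x x^T
S_i = R_i - np.outer(m_i, m_i)           # scatter S_i (assumed 1/n_i normalization)
G_bar = Gs.mean(axis=0)                  # average operator

# Closed form: no increased sample is ever built.
S_closed = G_bar @ (S_i - R_i) @ G_bar.T + sum(G @ R_i @ G.T for G in Gs) / J

# Explicit route: build all J * n_i increased samples x_j = G_j x.
X_aug = np.concatenate([X_i @ G.T for G in Gs])
m_aug = X_aug.mean(axis=0)
S_explicit = (X_aug - m_aug).T @ (X_aug - m_aug) / len(X_aug)

closed_form_matches = np.allclose(S_closed, S_explicit)
mean_matches = np.allclose(m_aug, G_bar @ m_i)
print(closed_form_matches, mean_matches)
```

Both routes give the same mean and scatter, but the closed form still needs every $G_j$ in memory, which is the cost noted above.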
Analysis of IPO: the spectral decomposition
Definition 1 Let $f(x), g(x) \in L^2(\mathbb{R}^2)$ be complex-valued 2D functions where $x \in \mathbb{R}^2$. The inner product is defined as
$(f, g) \equiv \int_{\mathbb{R}^2} f(x) \overline{g(x)} \, dx,$
where $\bar{g}$ is the complex conjugate of $g$.
An operator $G : f \mapsto g$ is linear if it satisfies $G(af + bg) = aG(f) + bG(g)$, $\forall a, b \in \mathbb{R}$.
$G^*$ is the adjoint operator of $G$ if it satisfies $(Gf, g) = (f, G^* g)$.
Corollary 1 Filtering or geometric transformation operators $G$ are normal operators which satisfy $G^* G = G G^*$.
$G = \sum_i \lambda_i P_i$
A normal operator can be decomposed into projection operators!
Figure: (a) $x$, (b) $Gx$, (c) $G^T G x$, (d) $G^T x$
Figure: $x$, $E_1 x$, $E_2 x$, $E_3 x$, $E_4 x$, $E_5 x$, $E_6 x$
Figure: eigenvalue versus index plots (index 100 to 1000, eigenvalue −0.1 to 0.15) for (a) $H_{11}$, $H_{21}$; (b) $H_{12}$, $H_{22}$; (c) $H_{13}$, $H_{23}$
Figure: $P_{11i} x$, $P_{21i} x$, $P_{12i} x$, $P_{22i} x$, $P_{13i} x$, $P_{23i} x$
But is it feasible for a generating matrix? Yes!
Is a filtering Hermitian? $\|G - G^T\| < 10^{-6}$: almost symmetric.
Is a geometric transformation unitary? The transpose is apparently the inverse.
Are eigenvalues complex? Use the Hermitian decomposition:
$G = H_1 + iH_2$, $H_1 = \frac{G + G^T}{2}$, $H_2 = \frac{G - G^T}{2i}$, $i = \sqrt{-1}$
Here $G$ is an operator, and $H_1$, $H_2$ are two Hermitian operators (which have real eigenvalues).
So, a two-step approximation:
$G \simeq \sum_j a_j E_j = \sum_j a_j (H_{1j} + iH_{2j}) \simeq \sum_j a_j \sum_i (\lambda_{1ji} P_{1ji} + i\lambda_{2ji} P_{2ji})$
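The second step of this approximation can be illustrated on a single matrix. The sketch below assumes a random real matrix as a stand-in for one operator and uses a hypothetical helper `spectral_sum`. It covers only the Hermitian split and the truncated spectral sums, not the first step that expands $G$ over the eigenoperators $E_j$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
G = rng.normal(size=(n, n))              # stand-in for a real generating matrix

H1 = (G + G.T) / 2                       # real symmetric part
H2 = (G - G.T) / 2j                      # Hermitian; i*H2 is the antisymmetric part

# eigh is meant for Hermitian input and returns real eigenvalues.
lam1, U1 = np.linalg.eigh(H1)
lam2, U2 = np.linalg.eigh(H2)

def spectral_sum(lam, U, keep):
    """Sum lam_i * P_i over the `keep` largest-magnitude eigenvalues, P_i = u_i u_i^*."""
    idx = np.argsort(-np.abs(lam))[:keep]
    return (U[:, idx] * lam[idx]) @ U[:, idx].conj().T

# Keeping every eigenpair reproduces G; keeping half gives a compressed approximation.
G_full = spectral_sum(lam1, U1, n) + 1j * spectral_sum(lam2, U2, n)
G_low = spectral_sum(lam1, U1, n // 2) + 1j * spectral_sum(lam2, U2, n // 2)

exact = np.allclose(G_full, G)
err_low = np.linalg.norm(G_low - G) / np.linalg.norm(G)
print(exact, err_low)
```

How many eigenpairs can be dropped depends on how quickly the eigenvalue magnitudes decay for the operator at hand. A random matrix is a pessimistic case in that respect.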
Examples
Real eigenvalues can be small so that we can compress them. Eigenprojections of eigenoperators transform images to … wavelets?
Eigenoperators transform images to variants.
Q: To reduce the memory cost of generating matrices, can we use a decomposition for operators just like for images?
A: Yes.
LDA + IPO = LDIPOA: find a set of discriminative IPOs
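$\bar{G}^{(k)} = \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)}$, $D = PA$
$\tilde{S}'^{(k)}_W = D^T \bar{G}^{(k)} (S_W - R_W) \bar{G}^{(k)T} D + \frac{1}{k+1} \sum_{l=0}^{k} D^T G^{(l)} R_W G^{(l)T} D$
$\tilde{S}'^{(k)}_B = D^T \bar{G}^{(k)} S_B \bar{G}^{(k)T} D$
$X'^{(k)} = \bar{G}^{(k)} (X - R_{\mathrm{all}}) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_{\mathrm{all}} G^{(l)T}$
$S'^{(k)}_W = \bar{G}^{(k)} (S_W - R_W) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_W G^{(l)T}$
$S'^{(k)}_B = \bar{G}^{(k)} S_B \bar{G}^{(k)T}$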
Algorithm 1 LDIPOA
Compute PCA $P$ and LDA $A$. $G^{(0)} \leftarrow I$.
for $k = 1, \ldots,$ do
    repeat
        $\alpha$ step: $\alpha^{(k)} = \arg\max_{\alpha} E(A, P, \alpha)$
        PCA step: Compute $P$ with $\alpha^{(k)}$.
        LDA step: $A = \arg\max_{A} E(A, P, \alpha^{(k)})$
    until $E$ converges
end for
alpha step
PCA step
LDA step
At each step k, estimate a single generating matrix represented as a linear combination.
$G^{(k)} = \sum_{j}^{J} \alpha^{(k)}_j G_j$, $(\alpha^{(k)}_1, \alpha^{(k)}_2, \ldots, \alpha^{(k)}_J)^T = \alpha^{(k)}$
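To make the linear combination concrete, here is a minimal sketch of how a candidate $\alpha$ could be scored. It is not the authors' implementation: `combine` and `trace_ratio` are hypothetical helpers, the data are random, the projection `D` is a random stand-in for $D = PA$, and taking the trace ratio $\mathrm{tr}(\tilde{S}'_B)/\mathrm{tr}(\tilde{S}'_W)$ as the objective $E$ is an assumption. No optimization over $\alpha$ is performed.

```python
import numpy as np

def combine(alpha, Gs):
    """G^(k) = sum_j alpha_j G_j for a stack Gs of shape (J, d, d)."""
    return np.tensordot(alpha, Gs, axes=1)

def trace_ratio(D, G_list, S_W, S_B, R_W):
    """tr(D^T S'_B D) / tr(D^T S'_W D) for the matrices G^(0), ..., G^(k) in G_list."""
    G_bar = np.mean(G_list, axis=0)
    S_W_inc = G_bar @ (S_W - R_W) @ G_bar.T + sum(G @ R_W @ G.T for G in G_list) / len(G_list)
    S_B_inc = G_bar @ S_B @ G_bar.T
    return np.trace(D.T @ S_B_inc @ D) / np.trace(D.T @ S_W_inc @ D)

rng = np.random.default_rng(2)
d, c, n_i, J = 5, 3, 4, 3                               # toy sizes (assumed)
classes = [rng.normal(loc=i, size=(n_i, d)) for i in range(c)]
means = [X.mean(axis=0) for X in classes]
m = np.mean(means, axis=0)
R_W = sum(X.T @ X / n_i for X in classes)               # R_W = sum_i R_i
S_W = R_W - sum(np.outer(mi, mi) for mi in means)       # assumed S_W = sum_i S_i
S_B = sum(np.outer(mi - m, mi - m) for mi in means)     # assumed unweighted S_B

D = rng.normal(size=(d, 2))                             # stand-in for D = P A
Gs = rng.normal(size=(J, d, d))                         # hypothetical initial G_j
alpha = np.array([0.5, 0.3, 0.2])                       # one candidate alpha^(1)

G0 = np.eye(d)                                          # G^(0) = I
G1 = combine(alpha, Gs)
E0 = trace_ratio(D, [G0], S_W, S_B, R_W)                # reduces to plain LDA
E1 = trace_ratio(D, [G0, G1], S_W, S_B, R_W)
print(E0, E1)
```

With only $G^{(0)} = I$ the score reduces to the ordinary LDA trace ratio, which gives a baseline to compare each candidate $\alpha^{(k)}$ against.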
Experiments with FERET dataset
The proposed algorithm iteratively estimates • α (coeffs. of generating matrices) • P (PCA) • A (LDA) at the same time.
k: the number of estimated generating matrices
10 generating matrices are used to increase the dataset 11-fold.
1 generating matrix is used to double the dataset.
No generating matrices are used (normal LDA)
The Rayleigh quotient
$x_j = G_j x$
Proposition 1 A filtering is defined as
$Gf(x) = \int G(x, y) f(y) \, dy,$
where the kernel is symmetric, $G(x, y) = G(y, x)$, and real valued.
$G$ is a Hermitian operator which satisfies $G^* = G$.
Proposition 2 A geometric (affine) transformation $G$ is defined as
$Gf(x) = |A|^{1/2} f(Ax + t),$
where $|A| \neq 0$. $G$ is a unitary operator which satisfies $G^* G = I$.
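A discrete analogue of the two propositions can be checked on a small 1D signal. The sketch below rests on assumptions chosen so that the properties hold exactly: a circular Gaussian blur as the filtering, and a circular shift by 3 pixels as the geometric transformation ($A = 1$, $t = 3$). Operators on real images that involve interpolation or boundary handling would be expected to satisfy them only approximately.

```python
import numpy as np

n = 16
idx = np.arange(n)

# Filtering: circular Gaussian blur with a symmetric, real-valued kernel.
gap = np.abs(idx[:, None] - idx[None, :])
dist = np.minimum(gap, n - gap)                  # circular distance between pixels
G_blur = np.exp(-dist**2 / (2 * 1.5**2))
G_blur /= G_blur.sum(axis=1, keepdims=True)      # every row has the same sum

# Geometric transformation: circular shift by 3 pixels (a permutation matrix).
G_shift = np.roll(np.eye(n), 3, axis=0)

blur_is_symmetric = np.allclose(G_blur, G_blur.T)                    # G* = G
shift_is_unitary = np.allclose(G_shift.T @ G_shift, np.eye(n))       # G* G = I
shift_is_symmetric = np.allclose(G_shift, G_shift.T)

# Both are normal operators: G* G = G G*.
blur_is_normal = np.allclose(G_blur.T @ G_blur, G_blur @ G_blur.T)
shift_is_normal = np.allclose(G_shift.T @ G_shift, G_shift @ G_shift.T)
print(blur_is_symmetric, shift_is_unitary, shift_is_symmetric)
```

The shift is unitary but not symmetric, so its eigenvalues are complex in general, which is where the split into two Hermitian parts becomes useful.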
Figure panels: real, imag (three pairs)
Size of images: 32x32
Size of generating matrices: 1024x1024
Number of classes: 1001 (fa)
Training images per class: 1 (fa)
Test images per class: 1 (fb)
Eigen-generating matrices: 96
Initial generating matrices: 567 (3 scaling, 7 rotations, 3 Gaussian and 9 motion blurs)
Classifiers: nearest neighbor
PCA rates: 80% and 95%; for eigen-generating matrices (G-PCA); for PCA step (LDA-PCA)
Maximized in a few steps
A few generating matrices are enough to improve the performance.
Bad approximations of generating matrices do not lead to any improvement…
i = 1
i = 2
...