CVPR2012 Poster Linear Discriminative Image Processing Operator Analysis


Description

Toru Tamaki, Bingzhi Yuan, Kengo Harada, Bisser Raytchev, Kazufumi Kaneda: "Linear Discriminative Image Processing Operator Analysis", Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR2012), pp. 2526-2532, 2012. Providence Convention Center, Providence, Rhode Island, USA, June 16-21, 2012.

Transcript of CVPR2012 Poster Linear Discriminative Image Processing Operator Analysis

Page 1: CVPR2012 Poster Linear Discriminative Image Processing Operator Analysis

Linear Discriminative Image Processing Operator Analysis Toru Tamaki, Bingzhi Yuan, Kengo Harada, Bisser Raytchev, Kazufumi Kaneda

Most discriminative image processing operators (IPOs)

(Overview figure: feature space, LDA classifier, recognition.)

Generating matrices (image processing operators)

Goal

Motivation

Contribution

Find the most discriminative set of image processing operators for LDA.

For the small sample size problem, many studies increase the number of training samples by synthetically generating new ones. But how?

Ad hoc so far… here, discriminatively!

Simultaneous estimation of both the LDA feature space and a set of discriminative generating matrices.

Linear IPO + LDA = LDA with increased samples

Increased sample: $x_j = G_j x$, where $x$ is an original sample and $G_j$ is a generating matrix (an image processing operator).

Mean of class $i$ for the increased samples, with $\bar{G} = \frac{1}{J}\sum_{j=1}^{J} G_j$ the average of the image processing operators:
$$m'_i = \frac{1}{J n_i} \sum_{j=1}^{J} \sum_{x \in X_i} G_j x = \frac{1}{J} \sum_{j=1}^{J} G_j m_i = \bar{G} m_i$$

Mean of all increased samples: $m' = \bar{G} m$.

Scatter matrix of class $i$ for the increased samples:
$$S'_i = \bar{G} (S_i - R_i) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_i G_j^T$$

Within-class scatter matrix for the increased samples:
$$S'_W = \bar{G} (S_W - R_W) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_W G_j^T$$

Between-class scatter matrix for the increased samples:
$$S'_B = \bar{G} S_B \bar{G}^T$$

Here $S_i$, $S_W$, $S_B$ are the scatter matrices of the original (non-increased) samples, and
$$R_i = \frac{1}{n_i} \sum_{x \in X_i} x x^T, \qquad R_W = \sum_i^c R_i$$

Covariance matrix for the increased samples:
$$X' = \bar{G} (X - R_{\mathrm{all}}) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_{\mathrm{all}} G_j^T$$

Projected scatter matrices, samples, and means ($P$: PCA projection matrix, $A$: LDA):
$$\tilde{S}'_i = A^T P^T S'_i P A, \qquad \tilde{S}'_W = A^T P^T S'_W P A, \qquad \tilde{S}'_B = A^T P^T S'_B P A$$
$$y_j = A^T P^T x_j = A^T P^T G_j x, \qquad \tilde{m}'_i = A^T P^T m'_i = A^T P^T \bar{G} m_i, \qquad \tilde{m}' = A^T P^T m' = A^T P^T \bar{G} m$$

Rayleigh quotient: $\mathrm{tr}(\tilde{S}'_B) / \mathrm{tr}(\tilde{S}'_W)$. Generalized eigenvalue problem: $(P^T S'_W P)^{-1} P^T S'_B P$.

Pipeline: training samples → scatter matrices $S_W, S_B$ (covariance matrix) → PCA (dimensionality reduction, projection matrix $P$) → LDA → feature space.

Given $\{G_j\}$, we don't need to actually increase the training samples. But we need more memory to store $\{G_j\}$…
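To make the bookkeeping concrete, here is a minimal numpy sketch (function and variable names are mine, not the poster's) that computes $S'_W$ and $S'_B$ directly from $\{G_j\}$ and the original-sample statistics, then extracts LDA directions from the generalized eigenvalue problem, without ever materializing the $Jn$ increased samples:

```python
import numpy as np

def increased_scatter(Gs, S_W, S_B, R_W):
    """Scatter matrices of the virtually increased sample set.

    Gs  : list of J generating matrices, each d x d
    S_W : within-class scatter of the original samples
    S_B : between-class scatter of the original samples
    R_W : sum over classes of the autocorrelations R_i = (1/n_i) sum_x x x^T
    """
    J = len(Gs)
    G_bar = sum(Gs) / J  # average of the image processing operators
    # S'_W = Gbar (S_W - R_W) Gbar^T + (1/J) sum_j G_j R_W G_j^T
    S_W_inc = G_bar @ (S_W - R_W) @ G_bar.T
    S_W_inc += sum(G @ R_W @ G.T for G in Gs) / J
    # S'_B = Gbar S_B Gbar^T
    S_B_inc = G_bar @ S_B @ G_bar.T
    return S_W_inc, S_B_inc

def lda_directions(S_W_inc, S_B_inc, P, dim):
    """LDA matrix A from the generalized eigenvalue problem
    (P^T S'_W P)^{-1} P^T S'_B P, keeping the top `dim` directions."""
    M = np.linalg.solve(P.T @ S_W_inc @ P, P.T @ S_B_inc @ P)
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(-evals.real)
    return evecs[:, order[:dim]].real
```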

Analysis of IPO: the spectral decomposition

Definition 1. Let $f(x), g(x) \in L^2(\mathbb{R}^2)$ be complex-valued 2D functions, where $x \in \mathbb{R}^2$. The inner product is defined as
$$(f, g) \equiv \int_{\mathbb{R}^2} f(x) \overline{g(x)}\, dx,$$
where $\bar{g}$ is the complex conjugate of $g$. An operator $G : f \mapsto g$ is linear if it satisfies $G(af + bg) = aG(f) + bG(g)$ for all $a, b \in \mathbb{R}$. $G^*$ is the adjoint operator of $G$ if it satisfies $(Gf, g) = (f, G^* g)$.

Corollary 1. Filtering or geometric transformation operators $G$ are normal operators, which satisfy $G^* G = G G^*$.
$$G = \sum_i \lambda_i P_i$$
A normal operator can be decomposed into projection operators!
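This is easy to check numerically. In the sketch below (my own toy example, not from the poster), a circulant blur stands in for a filtering operator: it commutes with its adjoint, and because its eigenvalues happen to be distinct here, np.linalg.eig returns orthonormal eigenvectors, so $G$ really is the sum of its eigenvalue-weighted projections $P_i = v_i v_i^*$:

```python
import numpy as np

# A circulant (periodic) convolution is a normal operator.
d = 64
kernel = np.zeros(d)
kernel[[0, 1, -1]] = [0.5, 0.3, 0.2]                  # asymmetric 1D blur
G = np.stack([np.roll(kernel, s) for s in range(d)])  # circulant matrix

# Normality: G* G = G G*
assert np.allclose(G.T @ G, G @ G.T)

# Spectral decomposition G = sum_i lambda_i P_i with projections P_i = v_i v_i*
lam, V = np.linalg.eig(G)
G_rebuilt = sum(l * np.outer(v, v.conj()) for l, v in zip(lam, V.T))
assert np.allclose(G, G_rebuilt)
```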

(Figure: an input image $x$, its transformed versions $Gx$, $G^T G x$, $G^T x$, and the eigenprojections $E_1 x, \ldots, E_6 x$.)

(Plots: eigenvalue vs. index for the Hermite parts (a) $H_{11}, H_{21}$; (b) $H_{12}, H_{22}$; (c) $H_{13}, H_{23}$, with the corresponding eigenprojections $P_{11i}x$, $P_{21i}x$, $P_{12i}x$, $P_{22i}x$, $P_{13i}x$, $P_{23i}x$.)

But is it feasible for a generating matrix? Yes!

Is a filtering Hermite? $\|G - G^T\| < 10^{-6}$: almost symmetric.

Is a geometric transformation unitary? The transpose is apparently the inverse.

Are the eigenvalues complex? Use the Hermite decomposition (below). So: a two-step approximation.
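Both checks are easy to reproduce on small discrete operators. The sketch below is my own construction (a constant-normalized Gaussian kernel and a lossless 90° rotation stand in for the poster's filters and geometric transforms): it probes an operator with basis images to build its matrix, then measures the claimed symmetry and orthogonality:

```python
import numpy as np

n = 16           # image side; operators act on flattened n*n vectors
d = n * n

def op_matrix(apply_fn):
    """Matrix of a linear image operator, built by probing the standard basis."""
    G = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = 1.0
        G[:, j] = apply_fn(e.reshape(n, n)).ravel()
    return G

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian filtering; the 1D kernel matrix stays symmetric
    because it is normalized by a single constant, not per-row sums."""
    r = np.arange(n)
    K = np.exp(-(r[:, None] - r[None, :]) ** 2 / (2 * sigma ** 2))
    K /= K[n // 2].sum()
    return K @ img @ K.T

G_blur = op_matrix(gaussian_blur)
print(np.linalg.norm(G_blur - G_blur.T))            # ~0: filtering is symmetric

G_rot = op_matrix(lambda img: np.rot90(img))        # lossless geometric transform
print(np.linalg.norm(G_rot.T @ G_rot - np.eye(d)))  # ~0: transpose is the inverse
```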

An operator splits into two Hermite operators (which have real eigenvalues):
$$G = H_1 + iH_2, \qquad H_1 = \frac{G + G^T}{2}, \qquad H_2 = \frac{G - G^T}{2i}, \qquad i = \sqrt{-1}$$
$$G \simeq \sum_j a_j E_j = \sum_j a_j (H_{1j} + iH_{2j}) \simeq \sum_j a_j \sum_i (\lambda_{1ji} P_{1ji} + i \lambda_{2ji} P_{2ji})$$
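A numpy sketch of this two-step approximation (the energy threshold and the function names are my own choices, not the paper's): split $G$ into its Hermitian parts, eigendecompose each with real eigenvalues, and keep only the dominant eigenpairs:

```python
import numpy as np

def compress_operator(G, keep=0.99):
    """Two-step approximation of a generating matrix G:
    1) Hermite decomposition G = H1 + i*H2 into parts with real eigenvalues,
    2) keep only the eigenpairs carrying a `keep` fraction of sum(lambda^2).
    Returns [(lam1, V1), (lam2, V2)] for H1 and H2."""
    H1 = (G + G.T) / 2
    H2 = (G - G.T) / 2j                        # both parts are Hermitian
    parts = []
    for H in (H1, H2):
        lam, V = np.linalg.eigh(H)             # real eigenvalues
        order = np.argsort(-np.abs(lam))
        lam, V = lam[order], V[:, order]
        total = np.sum(lam ** 2)
        if total == 0.0:                       # e.g. H2 vanishes for symmetric G
            parts.append((lam[:0], V[:, :0]))
            continue
        r = np.searchsorted(np.cumsum(lam ** 2) / total, keep) + 1
        parts.append((lam[:r], V[:, :r]))
    return parts

def reconstruct(parts):
    """Rebuild G ~ sum_i lam_1i P_1i + i sum_i lam_2i P_2i with P = v v*."""
    (l1, V1), (l2, V2) = parts
    H1 = (V1 * l1) @ V1.conj().T
    H2 = (V2 * l2) @ V2.conj().T
    return H1 + 1j * H2
```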

Examples

Real eigenvalues can be small, so we can compress them. Eigenprojections of eigenoperators transform images into… wavelets?

Eigenoperators transform images into variants.

Q: To reduce the memory cost of generating matrices, can we use a decomposition for operators just like for images?

A: Yes.

LDA + IPO = LDIPOA: find a set of discriminative IPOs

$$\bar{G}^{(k)} = \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)}, \qquad D = PA$$

$$\tilde{S}'^{(k)}_W = D^T \bar{G}^{(k)} (S_W - R_W) \bar{G}^{(k)T} D + \frac{1}{k+1} \sum_{l=0}^{k} D^T G^{(l)} R_W G^{(l)T} D$$

$$\tilde{S}'^{(k)}_B = D^T \bar{G}^{(k)} S_B \bar{G}^{(k)T} D$$

$$X'^{(k)} = \bar{G}^{(k)} (X - R_{\mathrm{all}}) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_{\mathrm{all}} G^{(l)T}$$

$$S'^{(k)}_W = \bar{G}^{(k)} (S_W - R_W) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_W G^{(l)T}$$

$$S'^{(k)}_B = \bar{G}^{(k)} S_B \bar{G}^{(k)T}$$

Algorithm 1 LDIPOA
1: Compute PCA P and LDA A. G^(0) ← I.
2: for k = 1, 2, … do
3:   repeat
4:     α step: α^(k) = argmax_α E(A, P, α)
5:     PCA step: compute P with α^(k).
6:     LDA step: A = argmax_A E(A, P, α^(k))
7:   until E converges
8: end for

(Loop: α step → PCA step → LDA step.)

At each step k, estimate a single generating matrix represented as a linear combination.

$$G^{(k)} = \sum_{j}^{J} \alpha^{(k)}_j G_j, \qquad \alpha^{(k)} = (\alpha^{(k)}_1, \alpha^{(k)}_2, \ldots, \alpha^{(k)}_J)^T$$
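The α step can be sketched as follows. Hedged: `objective_E` mirrors the trace Rayleigh quotient above, but the random-perturbation search is a toy stand-in for the paper's actual argmax, and all names are mine:

```python
import numpy as np

def objective_E(G_list, S_W, S_B, R_W, D):
    """tr(S~'(k)_B) / tr(S~'(k)_W) for the operators G(0),...,G(k) chosen
    so far, with D = P A stacking the PCA and LDA projections."""
    m = len(G_list)
    G_bar = sum(G_list) / m
    S_W_t = D.T @ G_bar @ (S_W - R_W) @ G_bar.T @ D
    S_W_t += sum(D.T @ G @ R_W @ G.T @ D for G in G_list) / m
    S_B_t = D.T @ G_bar @ S_B @ G_bar.T @ D
    return np.trace(S_B_t) / np.trace(S_W_t)

def alpha_step(alpha0, basis_Gs, fixed_Gs, S_W, S_B, R_W, D,
               iters=200, step=0.01, seed=0):
    """Toy hill climbing for alpha(k) = argmax_alpha E(A, P, alpha);
    fixed_Gs holds the previously estimated G(0),...,G(k-1)."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha0, dtype=float)
    G_k = sum(a * G for a, G in zip(alpha, basis_Gs))
    best = objective_E(fixed_Gs + [G_k], S_W, S_B, R_W, D)
    for _ in range(iters):
        cand = alpha + step * rng.standard_normal(alpha.shape)
        G_k = sum(a * G for a, G in zip(cand, basis_Gs))
        score = objective_E(fixed_Gs + [G_k], S_W, S_B, R_W, D)
        if score > best:
            best, alpha = score, cand
    return alpha, best
```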

Experiments with FERET dataset

The proposed algorithm iteratively estimates α (coefficients of the generating matrices), P (PCA), and A (LDA) at the same time.

k: the number of estimated generating matrices

10 generating matrices are used to increase the dataset 11 times.

1 generating matrix is used to double the dataset.

No generating matrices are used (normal LDA)

(Plot: the Rayleigh quotient over the LDIPOA steps, for increased samples $x_j = G_j x$.)

Proposition 1. A filtering is defined as
$$Gf(x) = \int G(x, y) f(y)\, dy,$$
where the kernel is symmetric, $G(x, y) = G(y, x)$, and real-valued. $G$ is a Hermite operator, which satisfies $G^* = G$.

Proposition 2. A geometric (affine) transformation $G$ is defined as
$$Gf(x) = |A|^{1/2} f(Ax + t),$$
where $|A| \neq 0$. $G$ is a unitary operator, which satisfies $G^* G = I$.
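For completeness, the one-line verification behind Proposition 1 (a standard argument, not spelled out on the poster): with a real, symmetric kernel,
$$(Gf, g) = \int\!\!\int G(x, y)\, f(y)\, \overline{g(x)}\, dy\, dx = \int f(y)\, \overline{\int G(y, x)\, g(x)\, dx}\, dy = (f, Gg),$$
using $G(x, y) = G(y, x)$ and the fact that a real kernel is unchanged by conjugation, so $G^* = G$.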

(Plots: real and imaginary parts of the eigenvalues of the generating matrices.)

Experimental settings:
- Size of images: 32×32
- Size of generating matrices: 1024×1024
- Number of classes: 1001 (fa)
- Training images per class: 1 (fa)
- Test images per class: 1 (fb)
- Eigen-generating matrices: 96
- Initial generating matrices: 567 (3 scalings, 7 rotations, 3 Gaussian and 9 motion blurs)
- Classifier: nearest neighbor
- PCA rates: 80% and 95%, for the eigen-generating matrices (G-PCA) and for the PCA step (LDA-PCA)

The Rayleigh quotient is maximized in a few steps.

A few generating matrices are enough to improve the performance.

A bad approximation of the generating matrices does not lead to any improvement… (Examples shown for $i = 1$, $i = 2$, ….)