LEC 3: Fisher Discriminant Analysis (FDA) - SJSUgchen/Math285S16/lec3fda.pdf · LEC 3: Fisher...


LEC 3: Fisher Discriminant Analysis (FDA) – A Supervised Dimensionality Reduction Approach

Dr. Guangliang Chen

February 18, 2016


Outline

• Motivation:

– PCA is unsupervised: it does not use training labels

– Variance is not always useful for classification

• FDA: a supervised dimensionality reduction approach

– 2-class FDA

– Multiclass FDA

• Comparison between PCA and FDA


Fisher Discriminant Analysis (FDA)

Two-class FDA

See Prof. Olga Veksler’s slides at

http://www.csd.uwo.ca/~olga/Courses/CS434a_541a/Lecture8.pdf

Dr. Guangliang Chen | Mathematics & Statistics, San José State University 3/20


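Two-class FDA (a summary)

The optimal discriminatory direction is

$$v^* = S_w^{-1}(\mu_1 - \mu_2) \quad \text{(plus normalization)}$$

It is the solution of

$$\max_{v:\,\|v\|=1} \frac{v^T S_b v}{v^T S_w v} \;\longleftarrow\; \frac{(\tilde\mu_1 - \tilde\mu_2)^2}{\tilde s_1^2 + \tilde s_2^2}$$

where

$$S_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$$

$$S_w = S_1 + S_2, \qquad S_i = \sum_{x \in \text{Class } i} (x - \mu_i)(x - \mu_i)^T$$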
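The closed-form direction above can be sketched in a few lines of NumPy. This is only an illustration on synthetic data: the function name `fda_two_class` and the two Gaussian classes are assumptions made for the example, not part of the lecture.

```python
import numpy as np

def fda_two_class(X1, X2):
    """Return the unit-norm Fisher direction for two classes (rows are samples)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter S_w = S_1 + S_2 (sums of outer products, not covariances)
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    Sw = S1 + S2
    # v* = S_w^{-1} (mu1 - mu2); solve() avoids forming the inverse explicitly
    v = np.linalg.solve(Sw, mu1 - mu2)
    return v / np.linalg.norm(v)  # the "plus normalization" step

# Synthetic example: large variance along axis 0, class separation along axis 1
rng = np.random.default_rng(0)
stretch = np.array([[2.0, 0.0], [0.0, 0.3]])
X1 = rng.normal(size=(200, 2)) @ stretch + np.array([0.0, 1.0])
X2 = rng.normal(size=(200, 2)) @ stretch + np.array([0.0, -1.0])

v = fda_two_class(X1, X2)
print(v)  # dominated by the second coordinate, unlike the top PCA direction
```

In this toy setup the direction of largest variance (axis 0) carries almost no class information, which is the situation the outline's motivation describes.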


Experiment (2 digits)

MNIST handwritten digits 0 and 1

[Figure: two panels titled "PCA (95%) + FDA" and "PCA", each with a horizontal axis from 0 to 14000; legend: digits 0 and 1.]


How to extend to c ≥ 3 classes?

Let’s start by finding the most discriminatory direction.

For any v, the total within-class scatter in the v space is

$$\sum_i \tilde s_i^2 = \sum_i v^T S_i v = v^T \Big( \sum_i S_i \Big) v = v^T S_w v$$

where the Si are defined in the same way as before.


To define the between-class scatter in the v space, we need to introduce

• the global center of the training data

$$\mu = \frac{1}{n} \sum x_i = \frac{1}{n} \sum n_i \mu_i,$$

• and its projection onto v:

$$\tilde\mu = v^T \mu = \frac{1}{n} \sum y_i = \frac{1}{n} \sum n_i \tilde\mu_i$$


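The between-class scatter in the v space is defined as

$$\sum_i n_i (\tilde\mu_i - \tilde\mu)^2 = \sum_i n_i\, v^T (\mu_i - \mu)(\mu_i - \mu)^T v = v^T \Big( \sum_i n_i (\mu_i - \mu)(\mu_i - \mu)^T \Big) v = v^T S_b v.$$

We have thus arrived at the same kind of problem

$$\max_{v:\,\|v\|=1} \frac{v^T S_b v}{v^T S_w v} \;\longleftarrow\; \frac{\sum n_i (\tilde\mu_i - \tilde\mu)^2}{\sum \tilde s_i^2}$$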
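As a sanity check, both scatter identities (within-class and between-class) can be verified numerically. The sketch below uses three made-up Gaussian classes and a random unit vector; the data and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three hypothetical classes in R^4 with different sizes and centers
classes = [rng.normal(loc=m, size=(n, 4)) for m, n in [(0.0, 30), (1.0, 50), (-2.0, 20)]]
X = np.vstack(classes)
mu = X.mean(axis=0)  # global center

# Scatter matrices as defined in the slides
Sw = sum((Xi - Xi.mean(axis=0)).T @ (Xi - Xi.mean(axis=0)) for Xi in classes)
Sb = sum(len(Xi) * np.outer(Xi.mean(axis=0) - mu, Xi.mean(axis=0) - mu) for Xi in classes)

v = rng.normal(size=4)
v /= np.linalg.norm(v)

# The same quantities computed directly from the 1-D projections y = X v
within_1d = sum(((Xi @ v - (Xi @ v).mean()) ** 2).sum() for Xi in classes)
between_1d = sum(len(Xi) * ((Xi @ v).mean() - (X @ v).mean()) ** 2 for Xi in classes)

print(np.isclose(within_1d, v @ Sw @ v), np.isclose(between_1d, v @ Sb @ v))
```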


The solution is given by the eigenvector of $S_w^{-1} S_b$ associated with its largest eigenvalue $\lambda_1$:

$$S_w^{-1} S_b v = \lambda_1 v.$$

It is also a generalized eigenvector:

$$S_b v = \lambda_1 S_w v.$$

However, the formula $v^* = S_w^{-1}(\mu_1 - \mu_2)$ is no longer valid.
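One possible way to compute this direction numerically is sketched below with plain NumPy. The helper names `scatter_matrices` and `top_fda_direction` and the synthetic three-class data are hypothetical, and $S_w$ is assumed to be nonsingular.

```python
import numpy as np

def scatter_matrices(classes):
    """Within- and between-class scatter for a list of (n_i, d) arrays."""
    X = np.vstack(classes)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for Xi in classes:
        mu_i = Xi.mean(axis=0)
        Sw += (Xi - mu_i).T @ (Xi - mu_i)
        Sb += len(Xi) * np.outer(mu_i - mu, mu_i - mu)
    return Sw, Sb

def top_fda_direction(Sw, Sb):
    """Eigenpair of Sw^{-1} Sb with the largest eigenvalue (Sw assumed nonsingular)."""
    M = np.linalg.solve(Sw, Sb)          # Sw^{-1} Sb without an explicit inverse
    eigvals, eigvecs = np.linalg.eig(M)  # M is not symmetric in general
    k = np.argmax(eigvals.real)
    v = eigvecs[:, k].real
    return eigvals[k].real, v / np.linalg.norm(v)

rng = np.random.default_rng(2)
classes = [rng.normal(loc=m, size=(40, 3)) for m in ([0, 0, 0], [3, 0, 0], [0, 3, 0])]
Sw, Sb = scatter_matrices(classes)
lam, v = top_fda_direction(Sw, Sb)

# The same pair also satisfies the generalized form Sb v = lambda Sw v
print(np.allclose(Sb @ v, lam * (Sw @ v)))
```

A symmetric generalized eigensolver would be another reasonable choice; the version above simply mirrors the equation $S_w^{-1} S_b v = \lambda v$ as written.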


The connection to 2-class FDA

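Proposition. When c = 2, we have

$$\sum_i n_i (\tilde\mu_i - \tilde\mu)^2 = \frac{n_1 n_2}{n} (\tilde\mu_1 - \tilde\mu_2)^2$$

and

$$S_b = \frac{n_1 n_2}{n} (\mu_2 - \mu_1)(\mu_2 - \mu_1)^T.$$

This implies that the criterion $\dfrac{\sum_i n_i (\tilde\mu_i - \tilde\mu)^2}{\sum_i \tilde s_i^2}$ is a generalization of that of the two-class FDA.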
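The matrix identity in the proposition is easy to confirm numerically. The following sketch uses two made-up classes of unequal size; all data and names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.normal(size=(30, 5))            # hypothetical class 1, n1 = 30
X2 = rng.normal(loc=1.5, size=(70, 5))   # hypothetical class 2, n2 = 70
n1, n2 = len(X1), len(X2)
n = n1 + n2
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
mu = (n1 * mu1 + n2 * mu2) / n           # global center as a weighted mean

# Multiclass definition of S_b specialised to c = 2
Sb_multiclass = n1 * np.outer(mu1 - mu, mu1 - mu) + n2 * np.outer(mu2 - mu, mu2 - mu)
# Two-class form from the proposition
Sb_two_class = (n1 * n2 / n) * np.outer(mu2 - mu1, mu2 - mu1)

print(np.allclose(Sb_multiclass, Sb_two_class))
```

Since the two matrices differ from the two-class $S_b$ only by the positive factor $n_1 n_2 / n$, the maximizing direction is unchanged.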


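Proof: We prove the first identity below:

$$\begin{aligned}
\sum_i n_i (\tilde\mu_i - \tilde\mu)^2
&= n_1 \left( \tilde\mu_1 - \frac{n_1 \tilde\mu_1 + n_2 \tilde\mu_2}{n} \right)^2 + n_2 \left( \tilde\mu_2 - \frac{n_1 \tilde\mu_1 + n_2 \tilde\mu_2}{n} \right)^2 \\
&= \frac{n_1 n_2^2}{n^2} (\tilde\mu_1 - \tilde\mu_2)^2 + \frac{n_2 n_1^2}{n^2} (\tilde\mu_2 - \tilde\mu_1)^2 \\
&= \frac{n_1 n_2}{n} (\tilde\mu_2 - \tilde\mu_1)^2.
\end{aligned}$$

The proof of the second identity is very similar:

$$S_b = \sum n_i (\mu_i - \mu)(\mu_i - \mu)^T = \cdots$$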
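One way to carry out the omitted steps, following the same pattern as the first identity (a sketch that is not taken from the slides), starts from $n = n_1 + n_2$ and $\mu = (n_1 \mu_1 + n_2 \mu_2)/n$:

$$\mu_1 - \mu = \frac{n_2}{n} (\mu_1 - \mu_2), \qquad \mu_2 - \mu = \frac{n_1}{n} (\mu_2 - \mu_1),$$

so that

$$\begin{aligned}
S_b &= n_1 \frac{n_2^2}{n^2} (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T + n_2 \frac{n_1^2}{n^2} (\mu_2 - \mu_1)(\mu_2 - \mu_1)^T \\
&= \frac{n_1 n_2 (n_1 + n_2)}{n^2} (\mu_2 - \mu_1)(\mu_2 - \mu_1)^T \\
&= \frac{n_1 n_2}{n} (\mu_2 - \mu_1)(\mu_2 - \mu_1)^T.
\end{aligned}$$

The second line uses the fact that $a a^T = (-a)(-a)^T$ for any vector $a$.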


How many discriminatory directions can/should we use?

The answer is at most c − 1.

The discriminatory directions all satisfy the equation

$$S_w^{-1} S_b v = \lambda v,$$

with the corresponding eigenvalues representing the “magnitudes” of separation.

Therefore, we only need to count the number of nonzero eigenvalues.


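The within-class scatter matrix $S_w$ is assumed to be nonsingular. However, the between-class scatter matrix $S_b$ is of low rank:

$$S_b = \sum n_i (\mu_i - \mu)(\mu_i - \mu)^T
= \begin{bmatrix} \sqrt{n_1}(\mu_1 - \mu) & \cdots & \sqrt{n_c}(\mu_c - \mu) \end{bmatrix}
\cdot
\begin{bmatrix} \sqrt{n_1}(\mu_1 - \mu)^T \\ \vdots \\ \sqrt{n_c}(\mu_c - \mu)^T \end{bmatrix}$$

Observe that the columns of the left matrix are linearly dependent:

$$\sqrt{n_1} \cdot \sqrt{n_1}(\mu_1 - \mu) + \cdots + \sqrt{n_c} \cdot \sqrt{n_c}(\mu_c - \mu) = 0$$

and thus the column rank is at most c − 1. Consequently, rank($S_b$) ≤ c − 1, so $S_w^{-1} S_b$ has at most c − 1 nonzero eigenvalues.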
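The rank bound can be observed numerically. The sketch below builds the factor matrix from the slide for four made-up classes in ten dimensions; the sizes, the data, and the explicit rank tolerance are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
c, d = 4, 10  # hypothetical: 4 classes in R^10
classes = [rng.normal(loc=rng.normal(size=d), size=(25 + 5 * i, d)) for i in range(c)]
X = np.vstack(classes)
mu = X.mean(axis=0)

# Columns sqrt(n_i) (mu_i - mu), so that Sb = A @ A.T
A = np.column_stack([np.sqrt(len(Xi)) * (Xi.mean(axis=0) - mu) for Xi in classes])
Sb = A @ A.T

# The weighted combination of the columns from the slide vanishes
weights = np.sqrt([len(Xi) for Xi in classes])
dependent = np.allclose(A @ weights, 0.0)

# Explicit tolerance (an assumed value) to separate rounding noise from true rank
rank = np.linalg.matrix_rank(Sb, tol=1e-9 * np.linalg.norm(Sb, 2))
print(dependent, rank, "<=", c - 1)
```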


Multiclass FDA: A summary

Input: c training classes

Output: At most c − 1 discriminatory directions

Steps:

1. Form $S_w = \sum_i \sum_{x \in \text{Class } i} (x - \mu_i)(x - \mu_i)^T$ and $S_b = \sum_i n_i (\mu_i - \mu)(\mu_i - \mu)^T$.

2. Solve the eigenvalue problem $S_w^{-1} S_b v = \lambda v$.

3. Return all eigenvectors $v_1, \ldots, v_k$ (k ≤ c − 1) with nonzero eigenvalues, in decreasing order of eigenvalue.
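The three steps above could be sketched as follows. This is a minimal illustration rather than a reference implementation: the name `fda_fit`, the relative threshold used to decide which eigenvalues count as nonzero, and the synthetic data are all assumptions, and $S_w$ is taken to be nonsingular.

```python
import numpy as np

def fda_fit(X, y, tol=1e-10):
    """Multiclass FDA sketch. X: (n, d) samples in rows; y: (n,) integer labels.
    Returns (eigenvalues, directions as columns), at most c - 1 of them."""
    labels = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for label in labels:
        Xc = X[y == label]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)               # Step 1: within-class scatter
        Sb += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # Step 1: between-class scatter
    # Step 2: eigenvalue problem Sw^{-1} Sb v = lambda v
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    # Step 3: decreasing order, at most c - 1, dropping (numerically) zero eigenvalues
    order = np.argsort(eigvals)[::-1][: len(labels) - 1]
    order = order[eigvals[order] > tol * eigvals[order[0]]]
    return eigvals[order], eigvecs[:, order]

# Synthetic stand-in for a 3-class data set in R^4
rng = np.random.default_rng(5)
centers = np.array([[0, 0, 0, 0], [4, 0, 0, 0], [0, 4, 0, 0]], dtype=float)
X = np.vstack([rng.normal(loc=m, size=(50, 4)) for m in centers])
y = np.repeat([0, 1, 2], 50)

lams, W = fda_fit(X, y)
Z = X @ W  # reduced data: one column per discriminatory direction
print(lams, Z.shape)
```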


Multiclass FDA Illustration


An important practical issue

In the case of high-dimensional data, the within-class scatter matrix $S_w \in \mathbb{R}^{d \times d}$ is often singular due to a lack of observations (in certain dimensions).

Two common fixes:

• Apply PCA before FDA.

• Regularize $S_w$ by replacing it with $S_w' = S_w + \beta I_d$ for some $\beta > 0$.
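The regularization fix can be illustrated on a small synthetic problem with more dimensions than samples. In the sketch below, the data and the way $\beta$ is scaled (a small fraction of the average diagonal entry of $S_w$) are assumptions, not a recommendation from the lecture.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n_per_class = 50, 10                  # hypothetical: more dimensions than samples
X1 = rng.normal(size=(n_per_class, d))
X2 = rng.normal(loc=0.5, size=(n_per_class, d))
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

rank_Sw = np.linalg.matrix_rank(Sw)      # far below d, so Sw is singular
print(rank_Sw, "<", d)

beta = 1e-3 * np.trace(Sw) / d           # assumed heuristic scale for beta
Sw_reg = Sw + beta * np.eye(d)           # S_w' = S_w + beta * I_d is positive definite

# With S_w' in place of S_w, the two-class direction is well defined again
v = np.linalg.solve(Sw_reg, mu1 - mu2)
v /= np.linalg.norm(v)
```

The choice of $\beta$ trades off numerical stability against fidelity to the original criterion, so in practice it would typically be tuned on held-out data.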


Experiment (3 digits)

MNIST handwritten digits 0, 1, and 2

[Figure: two panels titled "PCA (95%) + FDA" and "PCA"; legend: digits 0, 1, and 2.]


Comparison between PCA and FDA

                         PCA                  FDA
Use labels?              no (unsupervised)    yes (supervised)
Criterion                variance             discriminatory
Linear separation?       yes                  yes
Nonlinear separation?    no                   no
#Dimensions              any                  ≤ c − 1
Solution                 SVD                  eigenvalue problem

Remark. In the case of nonlinear separation, PCA (applied conservatively) often works better than FDA as the latter can only find at most c − 1 directions (which are insufficient to preserve all the separation in the training data).


HW2b (due Friday, March 4)

First apply PCA 95% + FDA to all 10 classes of the MNIST digits and then do the following.

4. Apply the plain kNN classifier to the reduced data with k = 1, . . . , 10 and display the test error curve. Compare with that of PCA 50 + kNN (for each k). What is your conclusion?

5. Repeat Question 4 with local kmeans instead of kNN (everything else being the same).

Note. Be sure to project the test data onto the same PCA 95% and FDA bases learned on training data, in the same order!
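The note about reusing the training bases can be made concrete with a short sketch. Everything here is an assumption for illustration: the helper names `pca_fit` and `fda_fit`, the synthetic three-class data standing in for MNIST, and the reading of "PCA 95%" as keeping the fewest components that reach 95% of the variance. The classifiers asked for in the questions are deliberately left out.

```python
import numpy as np

def pca_fit(X, var_ratio=0.95):
    """Training mean and PCA basis reaching var_ratio of the total variance."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cum, var_ratio) + 1)
    return mean, Vt[:k].T                      # shapes (d,), (d, k)

def fda_fit(Z, y):
    """FDA basis (columns) with at most c - 1 directions; Sw assumed nonsingular."""
    labels, mu, d = np.unique(y), Z.mean(axis=0), Z.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for label in labels:
        Zc = Z[y == label]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc - mu, mc - mu)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(vals.real)[::-1][: len(labels) - 1]
    return vecs[:, order].real

# Synthetic stand-in for a train/test split with 3 classes in R^6
rng = np.random.default_rng(7)
centers = np.zeros((3, 6))
centers[1, 0], centers[2, 1] = 3.0, 3.0
y_train = rng.integers(0, 3, size=300)
X_train = centers[y_train] + rng.normal(size=(300, 6))
y_test = rng.integers(0, 3, size=100)
X_test = centers[y_test] + rng.normal(size=(100, 6))

# Learn both bases on the training data only
mean, P = pca_fit(X_train)
W = fda_fit((X_train - mean) @ P, y_train)

# Apply them to both sets in the same order: center, then PCA, then FDA
Z_train = (X_train - mean) @ P @ W
Z_test = (X_test - mean) @ P @ W               # same mean, same P, same W
print(Z_train.shape, Z_test.shape)
```

Recomputing the mean or either basis on the test set would place the test points in a different coordinate system from the training points, which is what the note warns against.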
