



A Tutorial on Data Reduction

Linear Discriminant Analysis (LDA)

Shireen Elhabian and Aly A. Farag, University of Louisville, CVIP Lab

September 2009


Outline

• LDA objective

• Recall … PCA

• Now … LDA

• LDA … Two Classes

– Counter example

• LDA … C Classes

– Illustrative Example

• LDA vs PCA Example

• Limitations of LDA


LDA Objective

• The objective of LDA is to perform dimensionality reduction …

– So what? PCA does this too …

• However, we want to preserve as much of the class discriminatory information as possible.

– OK, that’s new; let’s delve deeper …


Recall … PCA

• In PCA, the main idea is to re-express the available dataset so as to extract the relevant information by reducing the redundancy and minimizing the noise.

• We didn’t care whether this dataset represents features from one or more classes; i.e., the discrimination power was not taken into consideration while we were talking about PCA.

• In PCA, we had a dataset matrix X with dimensions m×n, where columns represent different data samples.

• We first started by subtracting the mean to have a zero-mean dataset, then we computed the covariance matrix S_x = XX^T.

• Eigenvalues and eigenvectors were then computed for S_x. The new basis vectors are those eigenvectors with the highest eigenvalues, where the number of those vectors was our choice.

• Thus, using the new basis, we can project the dataset onto a lower-dimensional space with a more powerful data representation.

[Figure: the m×n data matrix X — n feature vectors (data samples), each an m-dimensional data vector.]
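As a rough illustration of these steps, here is a minimal MATLAB sketch (our own, not from the tutorial; it assumes X is the m×n data matrix described above, and the number k of kept basis vectors is our choice):

```matlab
% X is m x n: each column is one m-dimensional data sample.
X  = X - mean(X, 2);                    % subtract the mean -> zero-mean dataset
Sx = X * X';                            % covariance (scatter) matrix Sx = X*X'
[V, D] = eig(Sx);                       % eigenvectors (columns of V), eigenvalues (diag of D)
[~, order] = sort(diag(D), 'descend');  % rank eigenvectors by eigenvalue
k = 2;                                  % number of basis vectors kept -- our choice
B = V(:, order(1:k));                   % new basis: eigenvectors with highest eigenvalues
Y = B' * X;                             % project onto the lower-dimensional space
```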


Now … LDA

• Consider a pattern classification problem where we have C classes, e.g. sea bass, tuna, salmon, …

• Each class has Ni m-dimensional samples, where i = 1,2, …, C.

• Hence we have a set of m-dimensional samples {x1, x2, …, xNi} belonging to class ωi.

• We stack these samples from different classes into one big fat matrix X such that each column represents one sample.

• We seek to obtain a transformation of X to Y through projecting the samples in X onto a hyperplane with dimension C−1.

• Let’s see what this means.


LDA … Two Classes

• Assume we have m-dimensional samples {x1, x2, …, xN}, N1 of which belong to ω1 and N2 of which belong to ω2.

• We seek to obtain a scalar y by projecting the samples x onto a line (a C−1 = 1 dimensional space, since C = 2):

\[ y = w^T x, \quad \text{where } x = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} \text{ and } w = \begin{bmatrix} w_1 \\ \vdots \\ w_m \end{bmatrix} \]

– where w is the projection vector used to project x to y.

• Of all the possible lines, we would like to select the one that maximizes the separability of the scalars.

The two classes are not well separated when projected onto this line

This line succeeded in separating the two classes while also reducing the dimensionality of our problem from two features (x1, x2) to a single scalar value y.


LDA … Two Classes

• In order to find a good projection vector, we need to define a measure of separation between the projections.

• The mean vector of each class in the x and y feature spaces is:

\[ \mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x \quad \text{and} \quad \tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y = \frac{1}{N_i} \sum_{x \in \omega_i} w^T x = w^T \mu_i \]

– i.e. projecting x to y will lead to projecting the mean of x to the mean of y.

• We could then choose the distance between the projected means as our objective function:

\[ J(w) = \left| \tilde{\mu}_1 - \tilde{\mu}_2 \right| = \left| w^T \mu_1 - w^T \mu_2 \right| = \left| w^T (\mu_1 - \mu_2) \right| \]


LDA … Two Classes

• However, the distance between the projected means is not a very good measure, since it does not take into account the standard deviation within the classes.

This axis has a larger distance between means

This axis yields better class separability


LDA … Two Classes

• The solution proposed by Fisher is to maximize a function that represents the difference between the means, normalized by a measure of the within-class variability, or the so-called scatter.

• For each class we define the scatter, an equivalent of the variance, as the sum of squared differences between the projected samples and their class mean:

\[ \tilde{s}_i^2 = \sum_{y \in \omega_i} \left( y - \tilde{\mu}_i \right)^2 \]

• \( \tilde{s}_i^2 \) measures the variability within class ωi after projecting it onto the y-space.

• Thus \( \tilde{s}_1^2 + \tilde{s}_2^2 \) measures the variability within the two classes at hand after projection; hence it is called the within-class scatter of the projected samples.


LDA … Two Classes

• The Fisher linear discriminant is defined as the linear function w^T x that maximizes the criterion function (the distance between the projected means, normalized by the within-class scatter of the projected samples):

\[ J(w) = \frac{\left( \tilde{\mu}_1 - \tilde{\mu}_2 \right)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} \]

• Therefore, we will be looking for a projection where examples from the same class are projected very close to each other and, at the same time, the projected means are as far apart as possible.


LDA … Two Classes

• In order to find the optimum projection w*, we need to express J(w) as an explicit function of w.

• We will define measures of the scatter in the multivariate feature space x, denoted as scatter matrices:

\[ S_i = \sum_{x \in \omega_i} \left( x - \mu_i \right)\left( x - \mu_i \right)^T, \qquad S_W = S_1 + S_2 \]

• where S_i is (up to a constant scale factor) the covariance matrix of class ωi, and S_W is called the within-class scatter matrix.


LDA … Two Classes

• Now, the scatter of the projection y can be expressed as a function of the scatter matrices in the feature space x:

\[ \tilde{s}_i^2 = \sum_{y \in \omega_i} \left( y - \tilde{\mu}_i \right)^2 = \sum_{x \in \omega_i} \left( w^T x - w^T \mu_i \right)^2 = \sum_{x \in \omega_i} w^T \left( x - \mu_i \right)\left( x - \mu_i \right)^T w = w^T S_i w \]

\[ \tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_1 w + w^T S_2 w = w^T S_W w = \tilde{S}_W \]

• where \( \tilde{S}_W \) is the within-class scatter matrix of the projected samples y.


LDA … Two Classes

• Similarly, the difference between the projected means (in y-space) can be expressed in terms of the means in the original feature space (x-space):

\[ \left( \tilde{\mu}_1 - \tilde{\mu}_2 \right)^2 = \left( w^T \mu_1 - w^T \mu_2 \right)^2 = w^T \underbrace{\left( \mu_1 - \mu_2 \right)\left( \mu_1 - \mu_2 \right)^T}_{S_B} w = w^T S_B w = \tilde{S}_B \]

• The matrix S_B is called the between-class scatter of the original samples/feature vectors, while \( \tilde{S}_B \) is the between-class scatter of the projected samples y.

• Since S_B is the outer product of two vectors, its rank is at most one.


LDA … Two Classes

• We can finally express the Fisher criterion in terms of S_W and S_B as:

\[ J(w) = \frac{\left( \tilde{\mu}_1 - \tilde{\mu}_2 \right)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \frac{w^T S_B w}{w^T S_W w} \]

• Hence J(w) is a measure of the difference between class means (encoded in the between-class scatter matrix) normalized by a measure of the within-class scatter.


LDA … Two Classes

• To find the maximum of J(w), we differentiate and equate to zero:

\[ \frac{d}{dw} J(w) = \frac{d}{dw} \left( \frac{w^T S_B w}{w^T S_W w} \right) = 0 \]

\[ \Rightarrow \left( w^T S_W w \right) \frac{d\left( w^T S_B w \right)}{dw} - \left( w^T S_B w \right) \frac{d\left( w^T S_W w \right)}{dw} = 0 \]

\[ \Rightarrow \left( w^T S_W w \right) 2 S_B w - \left( w^T S_B w \right) 2 S_W w = 0 \]

Dividing by \( 2\, w^T S_W w \):

\[ \Rightarrow \left( \frac{w^T S_W w}{w^T S_W w} \right) S_B w - \left( \frac{w^T S_B w}{w^T S_W w} \right) S_W w = 0 \]

\[ \Rightarrow S_B w - J(w)\, S_W w = 0 \;\Rightarrow\; S_W^{-1} S_B w - J(w)\, w = 0 \]


LDA … Two Classes

• Solving the generalized eigenvalue problem

\[ S_W^{-1} S_B\, w = \lambda w, \quad \text{where } \lambda = J(w) = \text{scalar} \]

yields

\[ w^* = \arg\max_w J(w) = \arg\max_w \left( \frac{w^T S_B w}{w^T S_W w} \right) = S_W^{-1} \left( \mu_1 - \mu_2 \right) \]

• This is known as Fisher’s linear discriminant, although it is not a discriminant but rather a specific choice of direction for the projection of the data down to one dimension.

• Using the same notation as PCA, the solution will be the eigenvector(s) of

\[ S_X = S_W^{-1} S_B \]
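A minimal MATLAB sketch of this closed-form solution might look as follows (our own variable names; X1 and X2 are assumed to hold one sample per column):

```matlab
% X1: m x N1 samples of class w1, X2: m x N2 samples of class w2 (columns = samples)
mu1 = mean(X1, 2);   mu2 = mean(X2, 2);
S1 = (X1 - mu1) * (X1 - mu1)';   % scatter matrix of class 1
S2 = (X2 - mu2) * (X2 - mu2)';   % scatter matrix of class 2
SW = S1 + S2;                    % within-class scatter matrix
w  = SW \ (mu1 - mu2);           % Fisher's direction: SW^(-1) (mu1 - mu2)
w  = w / norm(w);                % overall scale is irrelevant; normalize for convenience
y1 = w' * X1;   y2 = w' * X2;    % projected scalar samples
```

Note that the overall scale of S_W does not matter: normalizing each class scatter by N_i − 1 (as the numeric example below does) rescales w but does not change its direction.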


LDA … Two Classes - Example

• Compute the linear discriminant projection for the following two-dimensional dataset:

– Samples for class ω1: X1 = (x1, x2) = {(4,2), (2,4), (2,3), (3,6), (4,4)}

– Samples for class ω2: X2 = (x1, x2) = {(9,10), (6,8), (9,5), (8,7), (10,8)}

[Figure: scatter plot of the two classes in the (x1, x2) plane.]


LDA … Two Classes - Example

• The class means are:

\[ \mu_1 = \frac{1}{N_1} \sum_{x \in \omega_1} x = \frac{1}{5} \left[ \begin{pmatrix}4\\2\end{pmatrix} + \begin{pmatrix}2\\4\end{pmatrix} + \begin{pmatrix}2\\3\end{pmatrix} + \begin{pmatrix}3\\6\end{pmatrix} + \begin{pmatrix}4\\4\end{pmatrix} \right] = \begin{pmatrix}3\\3.8\end{pmatrix} \]

\[ \mu_2 = \frac{1}{N_2} \sum_{x \in \omega_2} x = \frac{1}{5} \left[ \begin{pmatrix}9\\10\end{pmatrix} + \begin{pmatrix}6\\8\end{pmatrix} + \begin{pmatrix}9\\5\end{pmatrix} + \begin{pmatrix}8\\7\end{pmatrix} + \begin{pmatrix}10\\8\end{pmatrix} \right] = \begin{pmatrix}8.4\\7.6\end{pmatrix} \]


LDA … Two Classes - Example

• Covariance matrix of the first class:

\[ S_1 = \frac{1}{N_1 - 1} \sum_{x \in \omega_1} \left( x - \mu_1 \right)\left( x - \mu_1 \right)^T = \begin{pmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{pmatrix} \]

(Here the scatter is normalized by N_1 − 1, i.e. the sample covariance; this normalization only rescales S_W and does not change the LDA direction.)


LDA … Two Classes - Example

• Covariance matrix of the second class:

\[ S_2 = \frac{1}{N_2 - 1} \sum_{x \in \omega_2} \left( x - \mu_2 \right)\left( x - \mu_2 \right)^T = \begin{pmatrix} 2.3 & 0.05 \\ 0.05 & 3.3 \end{pmatrix} \]


LDA … Two Classes - Example

• Within-class scatter matrix:

\[ S_W = S_1 + S_2 = \begin{pmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{pmatrix} + \begin{pmatrix} 2.3 & 0.05 \\ 0.05 & 3.3 \end{pmatrix} = \begin{pmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{pmatrix} \]


LDA … Two Classes - Example

• Between-class scatter matrix:

\[ S_B = \left( \mu_1 - \mu_2 \right)\left( \mu_1 - \mu_2 \right)^T = \begin{pmatrix} -5.4 \\ -3.8 \end{pmatrix} \begin{pmatrix} -5.4 & -3.8 \end{pmatrix} = \begin{pmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{pmatrix} \]


LDA … Two Classes - Example

• The LDA projection is then obtained as the solution of the generalized eigenvalue problem:

\[ S_W^{-1} S_B\, w = \lambda w \;\Rightarrow\; \left| S_W^{-1} S_B - \lambda I \right| = 0 \]

\[ S_W^{-1} S_B = \begin{pmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{pmatrix}^{-1} \begin{pmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{pmatrix} = \begin{pmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{pmatrix} \begin{pmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{pmatrix} = \begin{pmatrix} 9.2213 & 6.489 \\ 4.2339 & 2.9794 \end{pmatrix} \]

\[ \begin{vmatrix} 9.2213 - \lambda & 6.489 \\ 4.2339 & 2.9794 - \lambda \end{vmatrix} = \left( 9.2213 - \lambda \right)\left( 2.9794 - \lambda \right) - 6.489 \times 4.2339 = \lambda^2 - 12.2007\,\lambda \approx 0 \]

\[ \Rightarrow \lambda_1 = 0, \quad \lambda_2 = 12.2007 \]


LDA … Two Classes - Example

• Hence, the eigenvectors:

\[ \begin{pmatrix} 9.2213 & 6.489 \\ 4.2339 & 2.9794 \end{pmatrix} w_1 = \lambda_1 w_1 = 0 \cdot w_1 \;\Rightarrow\; w_1 = \begin{pmatrix} 0.5755 \\ -0.8178 \end{pmatrix} \]

\[ \begin{pmatrix} 9.2213 & 6.489 \\ 4.2339 & 2.9794 \end{pmatrix} w_2 = \lambda_2 w_2 = 12.2007\, w_2 \;\Rightarrow\; w_2 = \begin{pmatrix} 0.9088 \\ 0.4173 \end{pmatrix} \]

• The optimal projection is the one that gives the maximum λ = J(w); thus:

\[ w^* = w_2 = \begin{pmatrix} 0.9088 \\ 0.4173 \end{pmatrix}, \quad \lambda^* = \lambda_2 = 12.2007 \]


LDA … Two Classes - Example

• Or directly:

\[ w^* = S_W^{-1} \left( \mu_1 - \mu_2 \right) = \begin{pmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{pmatrix}^{-1} \left[ \begin{pmatrix} 3 \\ 3.8 \end{pmatrix} - \begin{pmatrix} 8.4 \\ 7.6 \end{pmatrix} \right] = \begin{pmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{pmatrix} \begin{pmatrix} -5.4 \\ -3.8 \end{pmatrix} \propto \begin{pmatrix} 0.9088 \\ 0.4173 \end{pmatrix} \]

(up to sign and normalization to unit length)
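As a sanity check, the whole example can be reproduced in a few lines of MATLAB (a sketch; it uses the same N−1 normalization as MATLAB’s cov, which the example above uses):

```matlab
X1 = [4 2; 2 4; 2 3; 3 6; 4 4]';        % class w1, one sample per column
X2 = [9 10; 6 8; 9 5; 8 7; 10 8]';      % class w2
mu1 = mean(X1, 2);  mu2 = mean(X2, 2);  % (3, 3.8)' and (8.4, 7.6)'
SW = cov(X1') + cov(X2');               % [3.3 -0.3; -0.3 5.5]
SB = (mu1 - mu2) * (mu1 - mu2)';        % [29.16 20.52; 20.52 14.44]
[V, D] = eig(SW \ SB);                  % eigenvalues: 0 and 12.2007
[lambda, i] = max(real(diag(D)));       % lambda = J(w*) = 12.2007
w = V(:, i);                            % +-(0.9088, 0.4173)'
```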


LDA - Projection

[Figure: the two-class data and the projection line in the (x1, x2) plane, together with the class-conditional PDFs p(y|ωi) of the projected data, using the LDA projection vector with the other eigenvalue = 8.8818e−16.]

• This is the projection vector corresponding to the smallest eigenvalue; using this vector leads to bad separability between the two classes.


LDA - Projection

[Figure: the two-class data and the projection line in the (x1, x2) plane, together with the class-conditional PDFs p(y|ωi) of the projected data, using the LDA projection vector with the highest eigenvalue = 12.2007.]

• This is the projection vector corresponding to the highest eigenvalue; using this vector leads to good separability between the two classes.


LDA … C-Classes

• Now, we have C classes instead of just two.

• We are now seeking (C−1) projections [y1, y2, …, y_{C−1}] by means of (C−1) projection vectors w_i.

• The w_i can be arranged by columns into a projection matrix W = [w1 | w2 | … | w_{C−1}] such that:

\[ y_i = w_i^T x \;\Rightarrow\; y = W^T x \]

\[ \text{where } x = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}_{m \times 1}, \quad y = \begin{bmatrix} y_1 \\ \vdots \\ y_{C-1} \end{bmatrix}_{(C-1) \times 1}, \quad W = \left[\, w_1 \,|\, w_2 \,|\, \cdots \,|\, w_{C-1} \,\right]_{m \times (C-1)} \]


LDA … C-Classes

• If we have n feature vectors, we can stack them into one matrix as follows:

\[ Y = W^T X \]

\[ \text{where } X = \begin{bmatrix} x_1^1 & x_1^2 & \cdots & x_1^n \\ \vdots & & & \vdots \\ x_m^1 & x_m^2 & \cdots & x_m^n \end{bmatrix}_{m \times n}, \quad Y = \begin{bmatrix} y_1^1 & y_1^2 & \cdots & y_1^n \\ \vdots & & & \vdots \\ y_{C-1}^1 & y_{C-1}^2 & \cdots & y_{C-1}^n \end{bmatrix}_{(C-1) \times n} \]

(subscripts index features, superscripts index samples)


LDA – C-Classes

• Recall that in the two-class case, the within-class scatter was computed as:

\[ S_W = S_1 + S_2 \]

• This can be generalized to the C-class case as:

\[ S_W = \sum_{i=1}^{C} S_i, \quad \text{where } S_i = \sum_{x \in \omega_i} \left( x - \mu_i \right)\left( x - \mu_i \right)^T \;\text{ and }\; \mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x \]

[Figure: two-dimensional features (m = 2) with three classes (C = 3), showing the class means μ1, μ2, μ3 and the per-class scatters.]

• N_i: number of data samples in class ωi.


LDA – C-Classes

• Recall that in the two-class case, the between-class scatter was computed as:

\[ S_B = \left( \mu_1 - \mu_2 \right)\left( \mu_1 - \mu_2 \right)^T \]

• For the C-class case, we will measure the between-class scatter with respect to the mean of all classes as follows:

\[ S_B = \sum_{i=1}^{C} N_i \left( \mu_i - \mu \right)\left( \mu_i - \mu \right)^T, \quad \text{where } \mu = \frac{1}{N} \sum_{\forall x} x = \frac{1}{N} \sum_{\forall i} N_i\, \mu_i \;\text{ and }\; \mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x \]

[Figure: two-dimensional features (m = 2) with three classes (C = 3), showing the class means μ1, μ2, μ3 and the overall mean μ.]

• N_i: number of data samples in class ωi. N: total number of data samples.


LDA – C-Classes

• Similarly,

– We can define the mean vectors for the projected samples y as:

\[ \tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y \quad \text{and} \quad \tilde{\mu} = \frac{1}{N} \sum_{\forall y} y \]

– While the scatter matrices for the projected samples y will be:

\[ \tilde{S}_W = \sum_{i=1}^{C} \tilde{S}_i = \sum_{i=1}^{C} \sum_{y \in \omega_i} \left( y - \tilde{\mu}_i \right)\left( y - \tilde{\mu}_i \right)^T \]

\[ \tilde{S}_B = \sum_{i=1}^{C} N_i \left( \tilde{\mu}_i - \tilde{\mu} \right)\left( \tilde{\mu}_i - \tilde{\mu} \right)^T \]


LDA – C-Classes

• Recall that in the two-class case, we expressed the scatter matrices of the projected samples in terms of those of the original samples as:

\[ \tilde{S}_W = W^T S_W W, \qquad \tilde{S}_B = W^T S_B W \]

This still holds in the C-class case.

• Recall that we are looking for a projection that maximizes the ratio of between-class to within-class scatter.

• Since the projection is no longer a scalar (it has C−1 dimensions), we use the determinant of the scatter matrices to obtain a scalar objective function:

\[ J(W) = \frac{\left| \tilde{S}_B \right|}{\left| \tilde{S}_W \right|} = \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|} \]

• And we will seek the projection W* that maximizes this ratio.


LDA – C-Classes

• To find the maximum of J(W), we differentiate with respect to W and equate to zero.

• Recall that in the two-class case, we solved the eigenvalue problem:

\[ S_W^{-1} S_B\, w = \lambda w, \quad \text{where } \lambda = J(w) = \text{scalar} \]

• For the C-class case, we have C−1 projection vectors, hence the eigenvalue problem can be generalized as:

\[ S_W^{-1} S_B\, w_i = \lambda_i w_i, \quad \text{where } \lambda_i = J(w_i) = \text{scalar}, \quad i = 1, 2, \dots, C-1 \]

• Thus, it can be shown that the optimal projection matrix W* is the one whose columns are the eigenvectors corresponding to the largest eigenvalues of the following generalized eigenvalue problem:

\[ S_W^{-1} S_B\, W^* = \lambda W^*, \quad \text{where } \lambda = J(W^*) = \text{scalar and } W^* = \left[\, w_1^* \,|\, w_2^* \,|\, \cdots \,|\, w_{C-1}^* \,\right] \]
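A hedged MATLAB sketch of this C-class procedure (our own; it assumes the data sit in an m×n matrix X with a 1×n vector `labels` of class indices):

```matlab
% X: m x n data, one sample per column; labels: 1 x n class indices in 1..C
C  = max(labels);
mu = mean(X, 2);                                  % overall mean
SW = zeros(size(X, 1));  SB = zeros(size(X, 1));
for i = 1:C
    Xi  = X(:, labels == i);                      % samples of class i
    Ni  = size(Xi, 2);
    mui = mean(Xi, 2);
    SW  = SW + (Xi - mui) * (Xi - mui)';          % within-class scatter
    SB  = SB + Ni * (mui - mu) * (mui - mu)';     % between-class scatter
end
[V, D] = eig(SW \ SB);                            % generalized eigenvalue problem
[~, order] = sort(real(diag(D)), 'descend');      % eigenvalues are real; real() guards roundoff
W = V(:, order(1:C-1));                           % C-1 eigenvectors with largest eigenvalues
Y = W' * X;                                       % projected samples, (C-1) x n
```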


Illustration – 3 Classes

• Let’s generate a dataset for each class to simulate the three classes shown.

• For each class, do the following:

– Use the random number generator to generate a uniform stream of 500 samples that follows U(0,1).

– Using the Box–Muller approach, convert the generated uniform stream to N(0,1).

– Then use the method of eigenvalues and eigenvectors to manipulate the standard normal to have the required mean vector and covariance matrix.

– Estimate the mean and covariance matrix of the resulting dataset.

[Figure: the three classes in the (x1, x2) plane with means μ1, μ2, μ3 and overall mean μ.]


Dataset Generation

• By visual inspection of the figure, the class parameters (means and covariance matrices) can be given as follows:

[Equations: the numeric values of the class means μ1, μ2, μ3, the covariance matrices S1, S2, S3, and the overall mean μ.]

Zero covariance to lead to data samples distributed horizontally.

Positive covariance to lead to data samples distributed along the y = x line.

Negative covariance to lead to data samples distributed along the y = -x line.


In Matlab
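The original slide showed a code screenshot; a sketch following the recipe on the previous slide could look like this (the mean and covariance below are illustrative placeholders, not the slide’s values):

```matlab
% One class: 500 samples of N(mu, S), built from U(0,1) via Box-Muller.
n  = 500;
u1 = rand(1, n);  u2 = rand(1, n);          % uniform streams ~ U(0,1)
z  = [sqrt(-2*log(u1)) .* cos(2*pi*u2); ...
      sqrt(-2*log(u1)) .* sin(2*pi*u2)];    % Box-Muller: 2 x n standard normal N(0,1)
mu = [5; 5];  S = [4 0; 0 4];               % placeholder class mean and covariance
[V, D] = eig(S);                            % eigenvectors/values of the target covariance
X  = V * sqrt(D) * z + mu;                  % color N(0,1) to obtain N(mu, S)
mu_hat = mean(X, 2);  S_hat = cov(X');      % estimate mean and covariance of the result
```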


It’s Working …

[Figure: scatter plot of the three generated classes; X1 — the first feature (horizontal axis), X2 — the second feature (vertical axis).]


Computing LDA Projection Vectors

Recall:

\[ S_B = \sum_{i=1}^{C} N_i \left( \mu_i - \mu \right)\left( \mu_i - \mu \right)^T, \quad \text{where } \mu = \frac{1}{N} \sum_{\forall x} x = \frac{1}{N} \sum_{\forall i} N_i\, \mu_i \;\text{ and }\; \mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x \]

\[ S_W = \sum_{i=1}^{C} S_i, \quad \text{where } S_i = \sum_{x \in \omega_i} \left( x - \mu_i \right)\left( x - \mu_i \right)^T \]

The projection vectors are the eigenvectors of \( S_W^{-1} S_B \).


Let’s visualize the projection vectors W

[Figure: the LDA projection vectors overlaid on the three-class scatter plot; X1 — the first feature, X2 — the second feature.]


Projection … y = W^T x — Along the first projection vector

[Figure: class-conditional PDFs p(y|ωi) of the data projected along the first projection vector, with eigenvalue = 4508.2089.]


Projection … y = W^T x — Along the second projection vector

[Figure: class-conditional PDFs p(y|ωi) of the data projected along the second projection vector, with eigenvalue = 1878.8511.]


Which is Better?!!!

• Apparently, the projection vector that has the highest eigenvalue provides higher discrimination power between classes.

[Figure: side-by-side class-conditional PDFs p(y|ωi) — the second projection vector (eigenvalue = 1878.8511) versus the first projection vector (eigenvalue = 4508.2089); the first yields the better separation.]


PCA vs LDA


Limitations of LDA

• LDA produces at most C−1 feature projections.

– If the classification error estimates establish that more features are needed, some other method must be employed to provide those additional features.

• LDA is a parametric method, since it assumes unimodal Gaussian likelihoods.

– If the distributions are significantly non-Gaussian, the LDA projections will not be able to preserve any complex structure in the data, which may be needed for classification.


Limitations of LDA

• LDA will fail when the discriminatory information is not in the mean but rather in the variance of the data.


Thank You