Linear Discriminant Analysis (LDA)
Introduction
Fisher’s Linear Discriminant Analysis
Introduced in Fisher’s 1936 paper, “The Use of Multiple Measurements in Taxonomic Problems” (link)
Statistical technique for classification
LDA = two classes
MDA (multiple discriminant analysis) = more than two classes
Used in statistics, pattern recognition, and machine learning
Purpose
Discriminant analysis classifies objects into two or more groups according to a linear combination of their features.
Feature selection: which set of features best determines group membership of the object? (dimension reduction)
Classification: what classification rule or model best separates those groups?
Method (1)
[Scatter plot of the training data (curvature vs. diameter) with classes Passed and Not Passed, contrasting a projection with good separation against one with bad separation]
Method (2)
Maximize the between-class scatter: the difference of the mean values (m1 - m2)
Minimize the within-class scatter: the covariance within each class
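For reference, this trade-off is usually formalized as the Fisher criterion. The slide only states the idea, so the standard textbook form is spelled out below (S_W is the within-class scatter, the sum of the two class scatter matrices):

```latex
% Fisher criterion: choose the projection w that maximizes
% between-class scatter relative to within-class scatter.
J(w) = \frac{\bigl(w^{\top}(m_1 - m_2)\bigr)^{2}}{w^{\top} S_W\, w},
\qquad
w^{*} \propto S_W^{-1}(m_1 - m_2)
```

Since the pooled covariance C used in the example below is proportional to S_W, the maximizer w* points in the same direction as the weight vector W = S * (m1 - m2) computed later.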
Formula
Assumption: both classes share the same covariance matrix, Σ_{y=0} = Σ_{y=1} = Σ (equal covariances)
Idea: apply Bayes' theorem, where x is the object and i, j are the classes (groups)
Derivation: the class-conditional probability density functions are assumed to be normally distributed, each class with its own mean value and covariance
If the covariances are not assumed equal, the result is QDA (quadratic discriminant analysis); with equal covariances the rule reduces to FLD (Fisher's linear discriminant)
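Written out as a sketch of the derivation the slide alludes to (using the W and W0 notation of the example below): with equal-covariance normal densities, Bayes' theorem makes the log-ratio of the two posteriors linear in x:

```latex
\log\frac{P(1 \mid x)}{P(2 \mid x)}
  = x^{\top}\underbrace{\Sigma^{-1}(m_1 - m_2)}_{W}
  + \underbrace{\ln\frac{P(1)}{P(2)}
      - \tfrac{1}{2}(m_1 + m_2)^{\top}\Sigma^{-1}(m_1 - m_2)}_{W_0}
```

The quadratic terms in x cancel only because the covariances are equal; otherwise they remain and give QDA. The object is assigned to class 1 when the score W·x + W0 is positive.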
Example
Training set from a factory producing high-quality chip rings:

Curvature  Diameter  Quality Control Result
2.95       6.63      Passed
2.53       7.79      Passed
3.57       5.65      Passed
3.16       5.47      Passed
2.58       4.46      Not Passed
2.16       6.22      Not Passed
3.27       3.52      Not Passed
Normalization of data
Average (global mean over all objects): X1 = 2.888, X2 = 5.676

Training data:
X1    X2    class
2.95  6.63  1
2.53  7.79  1
3.57  5.65  1
3.16  5.47  1
2.58  4.46  0
2.16  6.22  0
3.27  3.52  0

Mean-corrected data (global mean subtracted from each object):
X1o     X2o     class
 0.060   0.951  1
-0.357   2.109  1
 0.679  -0.025  1
 0.269  -0.209  1
-0.305  -1.218  0
-0.732   0.547  0
 0.386  -2.155  0
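A minimal NumPy sketch of this step (the array names are mine, not from the slide):

```python
import numpy as np

# Chip-ring training data: columns are curvature (X1) and diameter (X2)
X = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47],
              [2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])
y = np.array([1, 1, 1, 1, 0, 0, 0])  # 1 = Passed, 0 = Not Passed

mean_global = X.mean(axis=0)  # approx. (2.889, 5.677), the "average" row
X0 = X - mean_global          # mean-corrected data, the X1o/X2o table
print(np.round(X0, 3))
```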
Covariance
The covariance matrix of class i is computed from that class's mean-corrected rows.

Covariance of class 1 - C1:
 0.166  -0.192
-0.192   1.349

Covariance of class 2 - C2:
 0.259  -0.286
-0.286   2.142

Pooled covariance matrix C (class covariances weighted by group size):
 0.206  -0.233
-0.233   1.689

Inverse covariance matrix S = C^-1 (approximately):
 5.75   0.79
 0.79   0.70
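The same computation in NumPy (a sketch; note that the slide divides by the group size n_i rather than n_i - 1 and mean-corrects with the global mean, so np.cov is not a drop-in replacement here):

```python
import numpy as np

X = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47],
              [2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])
y = np.array([1, 1, 1, 1, 0, 0, 0])
X0 = X - X.mean(axis=0)        # global-mean-corrected data

def class_cov(Xc):
    # Scatter of the mean-corrected rows, divided by the group size
    return Xc.T @ Xc / len(Xc)

C1 = class_cov(X0[y == 1])     # close to the slide's C1 (rounding differs)
C2 = class_cov(X0[y == 0])     # close to the slide's C2

# Pooled covariance: class covariances weighted by group size
C = ((y == 1).sum() * C1 + (y == 0).sum() * C2) / len(X)
S = np.linalg.inv(C)           # inverse covariance matrix S
print(np.round(C, 3), np.round(S, 3), sep="\n")
```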
Mean values

Class    N   P(i)   m(X1)  m(X2)
Class 1  4   0.571  3.05   6.38   (m1)
Class 2  3   0.429  2.67   4.73   (m2)
Sum      7          5.72   11.12  (m1 + m2)

N - number of objects in the class
P(i) - prior probability of class i (N_i / N)
m1 - mean vector of class 1 (m(X1), m(X2))
m2 - mean vector of class 2 (m(X1), m(X2))
S - inverse of the pooled covariance matrix (previous slide)

m1 - m2 = (0.38, 1.65)

W = S * (m1 - m2) = (3.487916, 1.456612)
W0 = ln[P(1)/P(2)] - 1/2 * (m1 + m2) · W = -17.7856
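Putting the pieces together in NumPy (a sketch; small differences from the slide's numbers come from the slide's hand-rounded intermediate tables):

```python
import numpy as np

X = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47],
              [2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])
y = np.array([1, 1, 1, 1, 0, 0, 0])

X0 = X - X.mean(axis=0)
cov = lambda Xc: Xc.T @ Xc / len(Xc)
C = (4 * cov(X0[y == 1]) + 3 * cov(X0[y == 0])) / 7  # pooled covariance
S = np.linalg.inv(C)                                 # inverse covariance

m1 = X[y == 1].mean(axis=0)   # approx. (3.05, 6.38)
m2 = X[y == 0].mean(axis=0)   # approx. (2.67, 4.73)
p1, p2 = 4 / 7, 3 / 7         # prior probabilities P(1), P(2)

W = S @ (m1 - m2)                            # approx. (3.49, 1.45)
W0 = np.log(p1 / p2) - 0.5 * (m1 + m2) @ W   # approx. -17.76
print(W, W0)
```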
Result

score = X · W + W0

X1    X2    score   class
2.95  6.63   2.149  1
2.53  7.79   2.380  1
3.57  5.65   2.887  1
3.16  5.47   1.189  1
2.58  4.46  -2.285  0
2.16  6.22  -1.203  0
3.27  3.52  -1.240  0

[Plot: discriminant scores of the training objects on an axis from -3.000 to 4.000; Passed objects lie above 0, Not Passed objects below 0]
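Reproducing the score column is a single matrix product once W and W0 are known; here the slide's values are plugged in directly (a sketch; the third decimal differs from the table because W0 is rounded):

```python
import numpy as np

X = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47],
              [2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])
W = np.array([3.487916, 1.456612])  # from the previous slide
W0 = -17.7856

scores = X @ W + W0                 # score = X · W + W0 for every object
print(np.round(scores, 3))
# approx. [ 2.161  2.386  2.896  1.204 -2.290 -1.192 -1.253]
# positive score -> Passed (class 1), negative -> Not Passed (class 0)
```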
Prediction
New chip: curvature = 2.81, diameter = 5.46
score = X · W + W0, with W = S * (m1 - m2)
score = -0.036
Rule: if score > 0 then class 1 (Passed), else class 2 (Not Passed)
score = -0.036 => class 2
Prediction: the chip will not pass
Prediction correct!

[Plot: the new chip's score of -0.036 falls just below 0, on the Not Passed side of the score axis]
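The same check as a short sketch, reusing the slide's W and W0:

```python
import numpy as np

W = np.array([3.487916, 1.456612])
W0 = -17.7856

x_new = np.array([2.81, 5.46])   # new chip: curvature, diameter
score = x_new @ W + W0           # approx. -0.03, i.e. just below zero
print("Passed" if score > 0 else "Not Passed")  # -> Not Passed
```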
Pros & Cons
Cons
Old algorithm
Newer algorithms often give much better predictions
Pros
Simple
Fast and portable
Still beats some algorithms (e.g. logistic regression) when its assumptions are met
Good to use when beginning a project
Conclusion
FisherFace: one of the best algorithms for face recognition
Often used for dimensionality reduction
Basis for newer algorithms
Good for the beginning of data mining projects
Though old, still worth trying
Thank you for your attention!
Questions?