Linear Classifier


Transcript of Linear Classifier

Page 1: Linear Classifier

Linear Classifier

Team teaching

Page 2: Linear Classifier


Linear Methods for Classification

Lecture Notes for CMPUT 466/551

Nilanjan Ray

Page 3: Linear Classifier


Linear Classification

• What is meant by linear classification?
  – The decision boundaries in the feature (input) space are linear
• Should the regions be contiguous?

[Figure: piecewise linear decision boundaries in a 2D input space (X1, X2), partitioning it into regions R1–R4]

Page 4: Linear Classifier


Linear Classification…

• There is a discriminant function δ_k(x) for each class k

• Classification rule:

  R_k = { x : k = argmax_j δ_j(x) }

• In higher-dimensional input spaces the decision boundaries are piecewise hyperplanar

• Remember that the 0-1 loss function led to the classification rule:

  R_k = { x : k = argmax_j Pr(G = j | X = x) }

• So Pr(G = k | X = x) can serve as δ_k(x)

Page 5: Linear Classifier


Linear Classification…

• All we require here is that the class boundaries { x : δ_k(x) = δ_j(x) } be linear for every (k, j) pair

• One can achieve this if the δ_k(x) themselves are linear, or if some monotone transform of δ_k(x) is linear
  – An example:

  Pr(G = 1 | X = x) = exp(β_0 + β^T x) / (1 + exp(β_0 + β^T x))
  Pr(G = 2 | X = x) = 1 / (1 + exp(β_0 + β^T x))

  so that

  log[ Pr(G = 1 | X = x) / Pr(G = 2 | X = x) ] = β_0 + β^T x   ← Linear
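A quick numeric check of this example (not from the slides; β_0, β, and x are arbitrary made-up values, and numpy is assumed) confirms that the log-odds of the two posteriors is exactly the linear function β_0 + β^T x:

```python
import numpy as np

# Arbitrary illustrative parameters (not from the slides)
beta0 = -0.5
beta = np.array([1.2, -0.8])
x = np.array([0.3, 2.0])

z = beta0 + beta @ x
p1 = np.exp(z) / (1 + np.exp(z))   # Pr(G = 1 | X = x)
p2 = 1 / (1 + np.exp(z))           # Pr(G = 2 | X = x)

# The log-odds recovers the linear function beta0 + beta^T x
print(np.log(p1 / p2), z)          # both values agree up to float precision
```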

Page 6: Linear Classifier


Linear Discriminant Analysis

Posterior probability (application of Bayes' rule):

  Pr(G = k | X = x) = π_k f_k(x) / Σ_{l=1}^{K} π_l f_l(x)

• π_k is the prior probability for class k
• f_k(x) is the class-conditional density (likelihood):

  f_k(x) = 1 / ( (2π)^{p/2} |Σ_k|^{1/2} ) · exp( −½ (x − μ_k)^T Σ_k^{-1} (x − μ_k) )

• Essentially the minimum-error Bayes' classifier
• Assumes that the conditional class densities are (multivariate) Gaussian
• Assumes equal covariance for every class (Σ_k = Σ)
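As a minimal sketch of this model (assuming numpy and scipy are available; the function and variable names are mine and the numbers are made up), the posterior is just the prior-weighted Gaussian likelihood, normalized over the classes:

```python
import numpy as np
from scipy.stats import multivariate_normal

def lda_posteriors(x, means, priors, cov):
    """Pr(G = k | X = x) from Gaussian class densities with a shared covariance."""
    likelihoods = np.array([multivariate_normal.pdf(x, mean=m, cov=cov)
                            for m in means])      # f_k(x)
    joint = priors * likelihoods                  # pi_k * f_k(x)
    return joint / joint.sum()                    # normalize over the K classes

# Illustrative two-class setup (made-up numbers)
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
priors = np.array([0.6, 0.4])
cov = np.eye(2)
print(lda_posteriors(np.array([1.0, 0.5]), means, priors, cov))
```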

Page 7: Linear Classifier


LDA…

For two classes k and l, take the log-ratio of the posteriors:

  log [ Pr(G = k | X = x) / Pr(G = l | X = x) ]
    = log(π_k / π_l) + log( f_k(x) / f_l(x) )
    = log(π_k / π_l) − ½ μ_k^T Σ^{-1} μ_k + ½ μ_l^T Σ^{-1} μ_l + x^T Σ^{-1} (μ_k − μ_l)

This log-ratio is linear in x, so it can be written as δ_k(x) − δ_l(x) with the linear discriminant functions

  δ_k(x) = x^T Σ^{-1} μ_k − ½ μ_k^T Σ^{-1} μ_k + log π_k

Classification rule:

  Ĝ(x) = argmax_k δ_k(x)

is equivalent to:

  Ĝ(x) = argmax_k Pr(G = k | X = x)

The good old Bayes classifier!
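A minimal sketch of this rule (assuming numpy; the helper names and the example parameters are mine, not the lecture's): compute δ_k(x) for every class and take the argmax.

```python
import numpy as np

def lda_discriminant(x, mu_k, pi_k, cov_inv):
    """delta_k(x) = x^T Sigma^{-1} mu_k - 1/2 mu_k^T Sigma^{-1} mu_k + log pi_k."""
    return x @ cov_inv @ mu_k - 0.5 * mu_k @ cov_inv @ mu_k + np.log(pi_k)

def lda_predict(x, mus, pis, cov):
    """Assign x to the class whose linear discriminant is largest."""
    cov_inv = np.linalg.inv(cov)
    scores = [lda_discriminant(x, mu, pi, cov_inv) for mu, pi in zip(mus, pis)]
    return int(np.argmax(scores))

# Illustrative parameters (made up for the sketch)
mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
pis = [0.6, 0.4]
print(lda_predict(np.array([1.5, 0.5]), mus, pis, np.eye(2)))
```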

Page 8: Linear Classifier


LDA…

When are we going to use the training data?

  (x_i, g_i), i = 1, …, N:  N input-output pairs in total, N_k pairs in class k, K classes in total

The training data are utilized to estimate

• Prior probabilities:  π̂_k = N_k / N
• Means:  μ̂_k = Σ_{g_i = k} x_i / N_k
• Covariance matrix:  Σ̂ = Σ_{k=1}^{K} Σ_{g_i = k} (x_i − μ̂_k)(x_i − μ̂_k)^T / (N − K)
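These three estimates take only a few lines of numpy. The sketch below uses my own naming (lda_fit is not from the lecture) and the (N − K) denominator shown above; the tiny data set in the usage lines is just four rows of the chip-ring example that follows.

```python
import numpy as np

def lda_fit(X, y):
    """Estimate priors pi_k, class means mu_k, and the pooled covariance Sigma."""
    classes = np.unique(y)
    N, p = X.shape
    K = len(classes)
    priors = np.array([np.mean(y == k) for k in classes])        # N_k / N
    means = np.array([X[y == k].mean(axis=0) for k in classes])  # mu_k
    Sigma = np.zeros((p, p))
    for k, mu in zip(classes, means):
        D = X[y == k] - mu                                       # centered class-k rows
        Sigma += D.T @ D                                         # within-class scatter
    return priors, means, Sigma / (N - K)                        # pooled covariance

# Tiny usage example (a subset of the chip-ring data from the case study below)
X = np.array([[2.95, 6.63], [2.53, 7.79], [2.58, 4.46], [2.16, 6.22]])
y = np.array([1, 1, 2, 2])
print(lda_fit(X, y))
```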

Page 9: Linear Classifier


LDA: Example

LDA was able to avoid masking here

Page 10: Linear Classifier

Study case

• Factory “ABC” produces very expensive, high-quality chip rings whose quality is measured in terms of curvature and diameter. The result of quality control by the experts is given in the table below.

Page 11: Linear Classifier

Curvature   Diameter   Quality Control Result
2.95        6.63       Passed
2.53        7.79       Passed
3.57        5.65       Passed
3.16        5.47       Passed
2.58        4.46       Not passed
2.16        6.22       Not passed
3.27        3.52       Not passed

Page 12: Linear Classifier

• As a consultant to the factory, you get the task of setting up the criteria for automatic quality control. The manager of the factory also wants to test your criteria on a new type of chip ring about which even the human experts argue with each other. The new chip ring has curvature 2.81 and diameter 5.46.

• Can you solve this problem by employing Discriminant Analysis?

Page 13: Linear Classifier

Solutions

• When we plot the features, we can see that the data is linearly separable: we can draw a line that separates the two groups. The problem is to find that line, and to rotate the features in such a way that the distance between groups is maximized and the distance within each group is minimized.

Page 14: Linear Classifier

• x = features (or independent variables) of all data. Each row (denoted by x_k) represents one object; each column stands for one feature.

• y = group of the object (or dependent variable) of all data. Each row represents one object, and it has only one column.

Page 15: Linear Classifier

x = [ 2.95  6.63
      2.53  7.79
      3.57  5.65
      3.16  5.47
      2.58  4.46
      2.16  6.22
      3.27  3.52 ]

y = [ 1
      1
      1
      1
      2
      2
      2 ]

Page 16: Linear Classifier

• x_k = data of row k; for example, x_3 = [ 3.57  5.65 ]

• g = number of groups in y; in our example, g = 2

• x_i = features data for group i. Each row represents one object; each column stands for one feature. We separate x into several groups based on the number of categories in y.

Page 17: Linear Classifier

x_1 = [ 2.95  6.63
        2.53  7.79
        3.57  5.65
        3.16  5.47 ]

x_2 = [ 2.58  4.46
        2.16  6.22
        3.27  3.52 ]

Page 18: Linear Classifier

• μ_i = mean of the features in group i, which is the average of x_i:

  μ_1 = [ 3.05  6.38 ],   μ_2 = [ 2.67  4.73 ]

• μ = global mean vector, that is, the mean of the whole data set. In this example,

  μ = [ 2.88  5.676 ]
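A short numpy check (the array names are mine) reproduces these group and global means from the data on the previous slides:

```python
import numpy as np

# Chip-ring data (curvature, diameter): first four rows passed, last three did not
x1 = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47]])
x2 = np.array([[2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])

mu1 = x1.mean(axis=0)                    # ~ [3.05, 6.38]
mu2 = x2.mean(axis=0)                    # ~ [2.67, 4.73]
mu = np.vstack([x1, x2]).mean(axis=0)    # global mean (the slide rounds it to [2.88, 5.676])
print(mu1, mu2, mu)
```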

Page 19: Linear Classifier

• x_i^0 = mean-corrected data, that is, the features data for group i (x_i) minus the global mean vector μ:

  x_1^0 = [  0.060   0.951
            -0.357   2.109
             0.679  -0.025
             0.269  -0.209 ]

  x_2^0 = [ -0.305  -1.218
            -0.732   0.547
             0.386  -2.155 ]

Page 20: Linear Classifier

Covariance matrix of group i:

  c_i = (x_i^0)^T x_i^0 / n_i

  C_1 = [  0.166  -0.192
          -0.192   1.349 ]

  C_2 = [  0.259  -0.286
          -0.286   2.142 ]
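Continuing the numpy check from the previous sketch (array names mine), subtracting the global mean and forming (x_i^0)^T x_i^0 / n_i reproduces C_1 and C_2 up to rounding:

```python
import numpy as np

x1 = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47]])
x2 = np.array([[2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])
mu = np.vstack([x1, x2]).mean(axis=0)   # global mean vector

x1_0 = x1 - mu                          # mean-corrected data, group 1
x2_0 = x2 - mu                          # mean-corrected data, group 2
C1 = x1_0.T @ x1_0 / len(x1)            # ~ [[0.166, -0.192], [-0.192, 1.349]]
C2 = x2_0.T @ x2_0 / len(x2)            # ~ [[0.259, -0.286], [-0.286, 2.142]]
print(C1, C2, sep="\n")
```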

Page 21: Linear Classifier

C = pooled within-group covariance matrix, computed entry by entry as

  C(r, s) = (1/n) Σ_{i=1}^{g} n_i · c_i(r, s)

In our example: 4/7 · 0.166 + 3/7 · 0.259 = 0.206, 4/7 · (−0.192) + 3/7 · (−0.286) = −0.233, and 4/7 · 1.349 + 3/7 · 2.142 = 1.689, therefore

Page 22: Linear Classifier

C = [  0.206  -0.233
      -0.233   1.689 ]

The inverse of the covariance matrix is:

C^{-1} = [ 5.745  0.791
           0.791  0.701 ]
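The same two steps in numpy (names mine), weighting each group covariance by n_i/n and then inverting:

```python
import numpy as np

# Group covariances from the previous slide (rounded values)
C1 = np.array([[0.166, -0.192], [-0.192, 1.349]])
C2 = np.array([[0.259, -0.286], [-0.286, 2.142]])
n1, n2 = 4, 3

C = (n1 * C1 + n2 * C2) / (n1 + n2)   # pooled covariance, ~ [[0.206, -0.233], [-0.233, 1.689]]
C_inv = np.linalg.inv(C)              # ~ [[5.745, 0.791], [0.791, 0.701]]
print(C, C_inv, sep="\n")
```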

Page 23: Linear Classifier

• p = prior probability vector (each row represents the prior probability of group i). If we do not know the prior probabilities, we simply take the number of samples in each group divided by the total number of samples, that is

  p = [ 4/7 ]  =  [ 0.571 ]
      [ 3/7 ]     [ 0.429 ]

Page 24: Linear Classifier

• Discriminant function:

  f_i = μ_i C^{-1} x_k^T − ½ μ_i C^{-1} μ_i^T + ln(p_i)

• We assign object k to the group i that has the maximum f_i
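Putting the pieces together for the disputed chip ring (curvature 2.81, diameter 5.46), a sketch in numpy (variable names mine; the means, pooled covariance, and priors are the rounded values from the previous slides) evaluates f_1 and f_2 and picks the larger one:

```python
import numpy as np

# Group means, pooled covariance, and priors from the previous slides (rounded)
mu = [np.array([3.05, 6.38]), np.array([2.67, 4.73])]   # group 1 = passed, group 2 = not passed
C = np.array([[0.206, -0.233], [-0.233, 1.689]])
p = [4 / 7, 3 / 7]
C_inv = np.linalg.inv(C)

x_new = np.array([2.81, 5.46])                           # the disputed chip ring

# f_i = mu_i C^{-1} x^T - 1/2 mu_i C^{-1} mu_i^T + ln(p_i)
f = [m @ C_inv @ x_new - 0.5 * m @ C_inv @ m + np.log(pi) for m, pi in zip(mu, p)]
print(f, "-> assign to group", int(np.argmax(f)) + 1)
```

With these rounded inputs the two scores come out very close, which is consistent with the human experts disagreeing about this borderline ring.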

Page 25: Linear Classifier
Page 26: Linear Classifier

LDA

Page 27: Linear Classifier

Assignment

• Use Excel/Matlab/other tools to classify the breast tissue data set with:

  • Naïve Bayes
  • LDA

Present your results next week.