Assessing and Comparing Classification Algorithms

Transcript of lecture slides: Introduction, Resampling and Cross-Validation, Measuring Error, Interval Estimation and Hypothesis Testing, Assessing and Comparing Performance.

Page 1

Assessing and Comparing Classification Algorithms

• Introduction
• Resampling and Cross Validation
• Measuring Error
• Interval Estimation and Hypothesis Testing
• Assessing and Comparing Performance

Page 2

Lecture notes for E. Alpaydın, Introduction to Machine Learning, © 2004 The MIT Press (V1.1).

Introduction

Questions:
• Assessment of the expected error of a learning algorithm: is the error rate of 1-NN less than 2%?
• Comparing the expected errors of two algorithms: is k-NN more accurate than MLP?

Training/validation/test sets
Resampling methods: K-fold cross-validation

Page 3


Algorithm Preference

Criteria (application-dependent):
• Misclassification error, or risk (loss functions)
• Training time/space complexity
• Testing time/space complexity
• Interpretability
• Easy programmability

Cost-sensitive learning

Page 4

Assessing and Comparing Classification Algorithms

• Introduction
• Resampling and Cross Validation
• Measuring Error
• Interval Estimation and Hypothesis Testing
• Assessing and Comparing Performance

Page 5


Resampling and K-Fold Cross-Validation

The need for multiple training/validation sets: {T_i, V_i}_i is the training/validation set of fold i.

K-fold cross-validation: divide X into K equal parts X_i, i = 1, ..., K. Each part serves once as the validation set, with the remaining parts forming the training set, so any two training sets T_i share K − 2 parts:

$$\mathcal{V}_1 = \mathcal{X}_1, \quad \mathcal{T}_1 = \mathcal{X}_2 \cup \mathcal{X}_3 \cup \cdots \cup \mathcal{X}_K$$
$$\mathcal{V}_2 = \mathcal{X}_2, \quad \mathcal{T}_2 = \mathcal{X}_1 \cup \mathcal{X}_3 \cup \cdots \cup \mathcal{X}_K$$
$$\vdots$$
$$\mathcal{V}_K = \mathcal{X}_K, \quad \mathcal{T}_K = \mathcal{X}_1 \cup \mathcal{X}_2 \cup \cdots \cup \mathcal{X}_{K-1}$$
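A minimal sketch of this fold construction in Python (NumPy assumed; the function name and the shuffle-then-split choice are illustrative, not prescribed by the slides):

```python
import numpy as np

def k_fold_splits(n_samples, K, seed=0):
    """Yield (T_i, V_i) index pairs: V_i = X_i, T_i = the union of the other parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)       # shuffle the instance indices once
    parts = np.array_split(idx, K)         # the K parts X_1, ..., X_K
    for i in range(K):
        val = parts[i]                     # V_i = X_i
        train = np.concatenate([parts[j] for j in range(K) if j != i])  # T_i
        yield train, val

# Example: 10 instances, K = 5 folds
for train, val in k_fold_splits(10, K=5):
    print("T:", train, " V:", val)
```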

Page 6


5×2 Cross-Validation

Five times two-fold cross-validation (Dietterich, 1998): repeat five times: split the data randomly into two halves X_i^(1) and X_i^(2), and use each half once for training and once for validation:

$$\mathcal{T}_1 = \mathcal{X}_1^{(1)},\; \mathcal{V}_1 = \mathcal{X}_1^{(2)} \qquad \mathcal{T}_2 = \mathcal{X}_1^{(2)},\; \mathcal{V}_2 = \mathcal{X}_1^{(1)}$$
$$\mathcal{T}_3 = \mathcal{X}_2^{(1)},\; \mathcal{V}_3 = \mathcal{X}_2^{(2)} \qquad \mathcal{T}_4 = \mathcal{X}_2^{(2)},\; \mathcal{V}_4 = \mathcal{X}_2^{(1)}$$
$$\vdots$$
$$\mathcal{T}_9 = \mathcal{X}_5^{(1)},\; \mathcal{V}_9 = \mathcal{X}_5^{(2)} \qquad \mathcal{T}_{10} = \mathcal{X}_5^{(2)},\; \mathcal{V}_{10} = \mathcal{X}_5^{(1)}$$
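A sketch of the 5×2 scheme under the same assumptions (the generator yields the ten (T_i, V_i) index pairs in the order listed above; names are illustrative):

```python
import numpy as np

def five_by_two_cv(n_samples, seed=0):
    """Dietterich's 5x2 cv: 5 random halvings; each half trains once, validates once."""
    rng = np.random.default_rng(seed)
    for _ in range(5):                               # the five replications
        idx = rng.permutation(n_samples)
        half1, half2 = idx[: n_samples // 2], idx[n_samples // 2 :]
        yield half1, half2                           # T = X^(1), V = X^(2)
        yield half2, half1                           # T = X^(2), V = X^(1)

for i, (train, val) in enumerate(five_by_two_cv(8), start=1):
    print(f"fold {i}: T={train} V={val}")
```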

Page 7


Bootstrapping

Draw N instances from a dataset of size N with replacement. The probability that we do not pick a given instance after N draws is

$$\left(1 - \frac{1}{N}\right)^{N} \approx e^{-1} \approx 0.368$$

that is, only 36.8% of the instances are new (never drawn); the bootstrap sample contains about 63.2% of the distinct instances.
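The 36.8% figure is easy to verify empirically; a minimal simulation (NumPy assumed, N chosen arbitrarily for illustration):

```python
import numpy as np

N = 100_000
rng = np.random.default_rng(0)
sample = rng.integers(0, N, size=N)          # N draws with replacement
picked = len(np.unique(sample)) / N          # fraction of distinct instances drawn
print(f"picked: {picked:.3f}")               # ~0.632
print(f"never picked: {1 - picked:.3f}")     # ~ 1/e = 0.368
```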

Page 8

Assessing and Comparing Classification Algorithms

• Introduction
• Resampling and Cross Validation
• Measuring Error
• Interval Estimation and Hypothesis Testing
• Assessing and Comparing Performance

Page 9


Measuring Error

• Error rate = # of errors / # of instances = (FN + FP) / N
• Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
• Precision = # of found positives / # of instances declared positive = TP / (TP + FP)
• Specificity = TN / (TN + FP)
• False alarm rate = FP / (FP + TN) = 1 − specificity
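All five measures follow from the four confusion-matrix counts; a small sketch (the function name and the example counts are hypothetical):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the measures above from confusion-matrix counts."""
    n = tp + fp + tn + fn
    return {
        "error rate":       (fn + fp) / n,
        "recall (TPR)":     tp / (tp + fn),   # sensitivity, hit rate
        "precision":        tp / (tp + fp),
        "specificity":      tn / (tn + fp),
        "false alarm rate": fp / (fp + tn),   # = 1 - specificity
    }

print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```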

Page 10

Methods for Performance Evaluation

How to obtain a reliable estimate of performance? The performance of a model may depend on factors other than the learning algorithm:
• Class distribution
• Cost of misclassification
• Size of the training and test sets

Page 11

Learning Curve

A learning curve shows how accuracy changes with varying training sample size. Creating one requires a sampling schedule:
• Arithmetic sampling (Langley et al.)
• Geometric sampling (Provost et al.)

Effects of a small sample size:
• Bias in the estimate
• Variance of the estimate

Page 12

ROC (Receiver Operating Characteristic)

• Developed in the 1950s in signal detection theory, to analyze noisy signals
• Characterizes the trade-off between positive hits and false alarms
• The ROC curve plots the TP rate (on the y-axis) against the FP rate (on the x-axis)
• The performance of each classifier is represented as a point on the ROC curve; changing the threshold of the algorithm, the sample distribution, or the cost matrix changes the location of the point

http://en.wikipedia.org/wiki/Receiver_operating_characteristic
http://www.childrensmercy.org/stats/ask/roc.asp

Page 13

ROC Curve

A one-dimensional data set containing two classes (positive and negative); any point located at x > t is classified as positive.

At threshold t: TP rate = 0.5, FN rate = 0.5, FP rate = 0.12, TN rate = 0.88.

Page 14

ROC Curve

Points on the curve, written as (TP rate, FP rate):
• (0, 0): declare everything to be the negative class
• (1, 1): declare everything to be the positive class
• (1, 0): ideal
• Diagonal line: random guessing
• Below the diagonal line: the prediction is the opposite of the true class

Page 15

Using ROC for Model Comparison

• No model consistently outperforms the other: M1 is better for small FPR, M2 is better for large FPR
• Area Under the ROC Curve (AUC): ideal classifier: area = 1; random guessing: area = 0.5

Page 16

How to Construct an ROC Curve

Instance  P(+|A)  True Class
    1      0.95       +
    2      0.93       +
    3      0.87       -
    4      0.85       -
    5      0.85       -
    6      0.85       +
    7      0.76       -
    8      0.53       +
    9      0.43       -
   10      0.25       +

• Use a classifier that produces a posterior probability P(+|A) for each test instance

• Sort the instances according to P(+|A) in decreasing order

• Apply threshold at each unique value of P(+|A)

• Count the number of TP, FP, TN, FN at each threshold

• TP rate, TPR = TP/(TP+FN)

• FP rate, FPR = FP/(FP + TN)

Page 17

How to construct an ROC curve: apply a threshold at each value of P(+|A); an instance is declared positive if its P(+|A) is at or above the threshold:

Class         +     -     +     -     -     -     +     -     +     +
Threshold ≥  0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP            5     4     4     3     3     3     3     2     2     1     0
FP            5     5     4     4     3     2     1     1     0     0     0
TN            0     0     1     1     2     3     4     4     5     5     5
FN            0     1     1     2     2     2     2     3     3     4     5
TPR           1     0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2   0
FPR           1     1     0.8   0.8   0.6   0.4   0.2   0.2   0     0     0

The ROC curve is plotted from the (FPR, TPR) pairs above, traversed in reverse order (from threshold 1.00 down to 0.25).
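The counts above can be reproduced in a few lines of Python (NumPy assumed); the trapezoidal AUC at the end is an illustrative addition, not part of the slide:

```python
import numpy as np

# The ten instances from the earlier slide: posterior P(+|A) and true class (1 = +)
p = np.array([0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25])
y = np.array([1,    1,    0,    0,    0,    1,    0,    1,    0,    1])

P, N = y.sum(), (1 - y).sum()                 # 5 positives, 5 negatives
pts = []
for t in sorted(set(p) | {1.0}):              # each unique threshold, plus 1.00
    pred = p >= t                             # declare positive if P(+|A) >= t
    tp = int(np.sum(pred & (y == 1)))
    fp = int(np.sum(pred & (y == 0)))
    pts.append((fp / N, tp / P))
    print(f">= {t:.2f}: TP={tp} FP={fp} TPR={tp / P:.1f} FPR={fp / N:.1f}")

# Area under the curve by the trapezoidal rule on the sorted (FPR, TPR) points
pts.sort()
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
print("AUC =", auc)
```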

Page 18

Assessing and Comparing Classification Algorithms

• Introduction
• Resampling and Cross Validation
• Measuring Error
• Interval Estimation and Hypothesis Testing
• Assessing and Comparing Performance

Page 19


Interval Estimation

X = {x^t}_t where x^t ~ N(μ, σ²), so the sample mean satisfies m ~ N(μ, σ²/N) and

$$\sqrt{N}\,\frac{(m-\mu)}{\sigma} \sim Z$$

$$P\left(-1.96 < \sqrt{N}\,\frac{(m-\mu)}{\sigma} < 1.96\right) = 0.95$$

$$P\left(m - 1.96\,\frac{\sigma}{\sqrt{N}} < \mu < m + 1.96\,\frac{\sigma}{\sqrt{N}}\right) = 0.95$$

In general, the 100(1 − α) percent confidence interval is

$$P\left(m - z_{\alpha/2}\,\frac{\sigma}{\sqrt{N}} < \mu < m + z_{\alpha/2}\,\frac{\sigma}{\sqrt{N}}\right) = 1 - \alpha$$
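A sketch of the z-based interval (SciPy assumed; the synthetic sample and the known σ are for illustration only):

```python
import numpy as np
from scipy import stats

def z_interval(x, sigma, alpha=0.05):
    """100(1 - alpha)% confidence interval for mu when sigma^2 is known."""
    m, n = np.mean(x), len(x)
    z = stats.norm.ppf(1 - alpha / 2)        # z_{alpha/2}, e.g. 1.96 for alpha = 0.05
    half = z * sigma / np.sqrt(n)
    return m - half, m + half

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)  # sample with known sigma = 2
print(z_interval(x, sigma=2.0))              # covers mu = 5 about 95% of the time
```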

Page 20


One-sided confidence interval (σ known):

$$P\left(\sqrt{N}\,\frac{(m-\mu)}{\sigma} < 1.64\right) = 0.95 \quad\Rightarrow\quad P\left(m - 1.64\,\frac{\sigma}{\sqrt{N}} < \mu\right) = 0.95$$

$$P\left(m - z_{\alpha}\,\frac{\sigma}{\sqrt{N}} < \mu\right) = 1 - \alpha$$

When σ² is not known, estimate it by

$$S^2 = \frac{\sum_t (x^t - m)^2}{N - 1}$$

Then

$$\sqrt{N}\,\frac{(m-\mu)}{S} \sim t_{N-1}$$

$$P\left(m - t_{\alpha/2,\,N-1}\,\frac{S}{\sqrt{N}} < \mu < m + t_{\alpha/2,\,N-1}\,\frac{S}{\sqrt{N}}\right) = 1 - \alpha$$
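The corresponding sketch when σ² must be estimated (same assumptions; note ddof=1, which gives the N − 1 denominator of S²):

```python
import numpy as np
from scipy import stats

def t_interval(x, alpha=0.05):
    """100(1 - alpha)% confidence interval for mu when sigma^2 is estimated by S^2."""
    m, n = np.mean(x), len(x)
    s = np.std(x, ddof=1)                     # S, with the N - 1 denominator
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t_{alpha/2, N-1}
    half = t * s / np.sqrt(n)
    return m - half, m + half

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=20)
print(t_interval(x))
```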

Page 21


Hypothesis Testing

Reject a null hypothesis if it is not supported by the sample with enough confidence.

X = {x^t}_t where x^t ~ N(μ, σ²)

H₀: μ = μ₀ vs. H₁: μ ≠ μ₀

Accept H₀ with level of significance α if μ₀ lies in the 100(1 − α)% confidence interval. For the two-sided test, accept if

$$\sqrt{N}\,\frac{(m-\mu_0)}{\sigma} \in \left(-z_{\alpha/2},\; z_{\alpha/2}\right)$$

Page 22


One-sided test: H₀: μ ≤ μ₀ vs. H₁: μ > μ₀. Accept H₀ if

$$\sqrt{N}\,\frac{(m-\mu_0)}{\sigma} \in \left(-\infty,\; z_{\alpha}\right)$$

Variance unknown: use t instead of z. Accept H₀: μ = μ₀ if

$$\sqrt{N}\,\frac{(m-\mu_0)}{S} \in \left(-t_{\alpha/2,\,N-1},\; t_{\alpha/2,\,N-1}\right)$$
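Both acceptance rules in one sketch (SciPy assumed; the function names and the synthetic sample are illustrative):

```python
import numpy as np
from scipy import stats

def z_test_two_sided(x, mu0, sigma, alpha=0.05):
    """Accept H0: mu = mu0 if sqrt(N)(m - mu0)/sigma lies in (-z_{alpha/2}, z_{alpha/2})."""
    z = np.sqrt(len(x)) * (np.mean(x) - mu0) / sigma
    return abs(z) < stats.norm.ppf(1 - alpha / 2)

def t_test_two_sided(x, mu0, alpha=0.05):
    """Variance unknown: use S and t_{alpha/2, N-1} instead of sigma and z."""
    n = len(x)
    t = np.sqrt(n) * (np.mean(x) - mu0) / np.std(x, ddof=1)
    return abs(t) < stats.t.ppf(1 - alpha / 2, df=n - 1)

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=30)
print(z_test_two_sided(x, mu0=5.0, sigma=2.0))   # H0 should usually be accepted
print(t_test_two_sided(x, mu0=6.5))              # a mean this far off is likely rejected
```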

Page 23

Assessing and Comparing Classification Algorithms

• Introduction
• Resampling and Cross Validation
• Measuring Error
• Interval Estimation and Hypothesis Testing
• Assessing and Comparing Performance

Page 24


Assessing Error

H₀: p ≤ p₀ vs. H₁: p > p₀

Single training/validation set: the binomial test. If the error probability is p₀, the probability that there are e errors or fewer in N validation trials is

$$P\{X \le e\} = \sum_{j=0}^{e} \binom{N}{j} p_0^{\,j}\,(1-p_0)^{N-j}$$

Accept H₀ if this probability is less than 1 − α. (Example: N = 100, e = 20.)
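A sketch of this test using SciPy's binomial CDF. N = 100 and e = 20 are the slide's example; p0 = 0.15 and α = 0.05 are assumptions added for illustration:

```python
from scipy import stats

N, e, p0, alpha = 100, 20, 0.15, 0.05   # p0 and alpha chosen for illustration

prob = stats.binom.cdf(e, N, p0)        # P{X <= e} when the error probability is p0
print(f"P(X <= {e}) = {prob:.3f}")

# The slide's rule: accept H0: p <= p0 if this probability is less than 1 - alpha
print("accept H0" if prob < 1 - alpha else "reject H0")
```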

Page 25


Normal Approximation to the Binomial

The number of errors X is approximately normal with mean Np₀ and variance Np₀(1 − p₀):

$$\frac{X - Np_0}{\sqrt{Np_0\,(1 - p_0)}} \approx Z$$

Accept H₀ if this statistic, evaluated at X = e, is less than z₁₋α.
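The same illustrative numbers under the normal approximation, standardizing the error count and comparing with z₁₋α:

```python
import numpy as np
from scipy import stats

N, e, p0, alpha = 100, 20, 0.15, 0.05           # same illustrative values as above

z = (e - N * p0) / np.sqrt(N * p0 * (1 - p0))   # (X - Np0)/sqrt(Np0(1 - p0)) at X = e
print(f"z = {z:.2f}, z_(1-alpha) = {stats.norm.ppf(1 - alpha):.2f}")
print("accept H0" if z < stats.norm.ppf(1 - alpha) else "reject H0")
```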

Page 26


Paired t Test

Multiple training/validation sets. Let x_i^t = 1 if instance t is misclassified on fold i, and 0 otherwise. The error rate of fold i is

$$p_i = \frac{\sum_{t=1}^{N} x_i^t}{N}$$

With m and S² the average and variance of the p_i, we accept the hypothesis of error p₀ or less if

$$\sqrt{K}\,\frac{(m - p_0)}{S} \sim t_{K-1}$$

is less than t_{α, K−1}.
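A sketch of this one-sided test (SciPy assumed; the per-fold error rates are hypothetical):

```python
import numpy as np
from scipy import stats

def accept_error_at_most(p, p0, alpha=0.05):
    """Accept the hypothesis of error p0 or less, given per-fold error rates p_1..p_K."""
    K = len(p)
    m, s = np.mean(p), np.std(p, ddof=1)
    t = np.sqrt(K) * (m - p0) / s
    return t < stats.t.ppf(1 - alpha, df=K - 1)      # compare with t_{alpha, K-1}

fold_errors = [0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14]
print(accept_error_at_most(fold_errors, p0=0.15))    # True: 15% error is plausible
```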

Page 27


K-Fold CV Paired t Test

Use K-fold cross-validation to get K training/validation folds. Let p_i¹ and p_i² be the error rates of classifiers 1 and 2 on fold i, and p_i = p_i¹ − p_i² the paired difference on fold i. The null hypothesis is that p_i has mean 0:

$$H_0: \mu = 0 \quad \text{vs.} \quad H_1: \mu \ne 0$$

$$m = \frac{\sum_{i=1}^{K} p_i}{K}, \qquad s^2 = \frac{\sum_{i=1}^{K} (p_i - m)^2}{K - 1}$$

Under H₀,

$$\frac{\sqrt{K}\,m}{s} \sim t_{K-1}$$

so accept H₀ if this statistic lies in $\left(-t_{\alpha/2,\,K-1},\; t_{\alpha/2,\,K-1}\right)$.
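A sketch of the test (SciPy assumed; the per-fold errors are hypothetical). scipy.stats.ttest_rel(p1, p2) computes the same paired statistic directly from the raw pairs:

```python
import numpy as np
from scipy import stats

def cv_paired_t_test(p1, p2, alpha=0.05):
    """Accept H0: both classifiers have the same expected error,
    given error rates p1_i, p2_i measured on the same K folds."""
    p = np.asarray(p1) - np.asarray(p2)              # paired differences p_i
    K = len(p)
    m, s = np.mean(p), np.std(p, ddof=1)
    t = np.sqrt(K) * m / s
    crit = stats.t.ppf(1 - alpha / 2, df=K - 1)      # t_{alpha/2, K-1}
    return -crit < t < crit                          # accept H0 if inside the interval

# Hypothetical per-fold error rates of two classifiers on the same 10 folds
err1 = [0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14]
err2 = [0.13, 0.14, 0.12, 0.15, 0.12, 0.13, 0.15, 0.13, 0.14, 0.15]
print(cv_paired_t_test(err1, err2))                  # True -> no significant difference
```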