Performance Evaluation: Recognition Rate Estimate

J.-S. Roger Jang (張智星)

CSIE Dept., National Taiwan Univ.

http://mirlab.org/jang

[email protected]

Machine Learning Performance Evaluation



Outline

Performance indices of a given classifier/model

• Accuracy (recognition rate)

• Computation load

Methods to estimate the recognition rate

• Inside test

• One-sided holdout test

• Two-sided holdout test

• M-fold cross validation

• Leave-one-out cross validation

Quiz!


Synonyms

The following sets of synonyms will be used interchangeably:

• Classifier, model

• Recognition rate, accuracy


Performance Indices

Performance indices of a classifier

• Recognition rate

- Requires an objective procedure to derive it

• Computation load

- Design-time computation

- Run-time computation

Our focus

• Recognition rate and the procedures to derive it

• The estimated accuracy depends on

- Dataset

- Model (types and complexity)


Methods for Deriving Recognition Rates

Methods to derive the recognition rates

• Inside test (resubstitution recognition rate)

• One-sided holdout test

• Two-sided holdout test

• M-fold cross validation

• Leave-one-out cross validation

Data partitioning

• Training set

• Training and test sets

• Training, validation, and test sets


Inside Test (1/2)

Dataset partitioning

• Use the whole dataset for training and evaluation

Recognition rate

• Inside-test recognition rate

• Resubstitution accuracy

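Training dataset = whole set $D = \{(x_i^D, y_i^D)\}$

$F_D(\cdot)$: Model identified by the training data $D$

$$\text{RR} = 1 - \frac{1}{|D|} \sum_i \operatorname{sgn}\left(\left|y_i^D - F_D(x_i^D)\right|\right)$$

Each term of the sum is 1 for a misclassified sample and 0 for a correctly classified one.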


Inside Test (2/2)

Characteristics

• Too optimistic, since the inside-test RR tends to be higher than the true RR

• For instance, 1-NNC (nearest neighbor classifier) always has an inside-test RR of 100%!

• Can be used as the upper bound of the true RR.

Potential reasons for low inside-test RR:

• Bad features of the dataset

• Bad method for model construction, such as

- Bad results from neural network training

- Bad results from k-means clustering
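
The inside test and the 1-NNC remark can be illustrated with a minimal sketch. The helper names (`nn1_predict`, `recognition_rate`) and the toy dataset `D` are hypothetical, chosen only for this illustration; the classifier is a plain 1-nearest-neighbor rule with squared Euclidean distance.

```python
def nn1_predict(train, x):
    """Return the label of the training sample closest to x (1-NNC)."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return nearest[1]


def recognition_rate(train, test):
    """RR = 1 - (number of misclassified test samples) / |test|."""
    errors = sum(1 for x, y in test if nn1_predict(train, x) != y)
    return 1 - errors / len(test)


# Hypothetical toy dataset D: (feature vector, class label) pairs.
# The last two points lie close together but carry different labels.
D = [
    ((0.0, 0.0), 0), ((0.2, 0.1), 0),
    ((1.0, 1.0), 1), ((0.9, 1.2), 1),
    ((0.5, 0.5), 0), ((0.55, 0.5), 1),
]

# Inside test: the same set D is used for training and for evaluation.
inside_rr = recognition_rate(D, D)
print(inside_rr)  # 1.0, because every sample is its own nearest neighbor
```

The 100% figure holds here because no two samples share the same feature vector; it says nothing about how the classifier behaves on unseen data.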


One-sided Holdout Test (1/2)

Dataset partitioning

• Training set for model construction

• Test set for performance evaluation

Recognition rate

• Inside-test RR

• Outside-test RR

Training set $A = \{(x_i^A, y_i^A)\}$, test set $B = \{(x_i^B, y_i^B)\}$

$F_A(\cdot)$: Model identified using set $A$

$$\text{Outside-test RR} = 1 - \frac{1}{|B|} \sum_i \operatorname{sgn}\left(\left|y_i^B - F_A(x_i^B)\right|\right)$$


One-sided Holdout Test (2/2)

Characteristics

• Highly affected by data partitioning

• Usually adopted when design-time computation load is high
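
How strongly the outside-test RR depends on the partition can be seen by repeating a one-sided holdout test with different random splits. The following is a sketch under stated assumptions: the synthetic two-class data, the 1-NN classifier, and the helper `holdout_rr` are all hypothetical choices made for illustration.

```python
import random


def nn1_predict(train, x):
    """Return the label of the training sample closest to x (1-NNC)."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return nearest[1]


def recognition_rate(train, test):
    """RR = 1 - (number of misclassified test samples) / |test|."""
    errors = sum(1 for x, y in test if nn1_predict(train, x) != y)
    return 1 - errors / len(test)


def holdout_rr(data, train_fraction, seed):
    """Split data into training set A and test set B, then return
    (inside-test RR on A, outside-test RR on B) for a model built on A."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    A, B = shuffled[:cut], shuffled[cut:]
    return recognition_rate(A, A), recognition_rate(A, B)


# Hypothetical synthetic data: two overlapping Gaussian classes in 2-D.
gen = random.Random(0)
data = [((gen.gauss(2 * c, 1.0), gen.gauss(2 * c, 1.0)), c)
        for c in (0, 1) for _ in range(30)]

# The model is built only once per split, but the result varies with the split.
for seed in range(5):
    inside, outside = holdout_rr(data, 0.5, seed)
    print(f"seed={seed}  inside RR={inside:.2f}  outside RR={outside:.2f}")
```

Only one model is constructed per split, which is why this test suits classifiers whose training is expensive.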


Two-sided Holdout Test (1/3)

Dataset partitioning

• Training set for model construction

• Test set for performance evaluation

• Role reversal

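Data set $A = \{(x_i^A, y_i^A)\}$, data set $B = \{(x_i^B, y_i^B)\}$

$F_A(\cdot)$: Model identified using data set $A$

$F_B(\cdot)$: Model identified using data set $B$

$$\text{Outside-test RR} = 1 - \frac{\sum_i \operatorname{sgn}\left(\left|y_i^B - F_A(x_i^B)\right|\right) + \sum_i \operatorname{sgn}\left(\left|y_i^A - F_B(x_i^A)\right|\right)}{|A| + |B|}$$

$$\text{Inside-test RR} = 1 - \frac{\sum_i \operatorname{sgn}\left(\left|y_i^A - F_A(x_i^A)\right|\right) + \sum_i \operatorname{sgn}\left(\left|y_i^B - F_B(x_i^B)\right|\right)}{|A| + |B|}$$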


Two-sided Holdout Test (2/3)

Two-sided holdout test (used in GMDH)

[Diagram] Data set A is used to construct Model A, which is evaluated on data set B to give $RR_A$. Data set B is used to construct Model B, which is evaluated on data set A to give $RR_B$.

Outside-test RR = $(RR_A + RR_B)/2$


Two-sided Holdout Test (3/3)

Characteristics

• Better usage of the dataset

• Still highly affected by the partitioning

• Suitable for models/classifiers with high design-time computation load
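
The role reversal of the two-sided holdout test can be sketched as follows. The 1-NN classifier, the helper `two_sided_holdout`, and the two small data sets are hypothetical and serve only to make the averaging step concrete.

```python
def nn1_predict(train, x):
    """Return the label of the training sample closest to x (1-NNC)."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return nearest[1]


def recognition_rate(train, test):
    """RR = 1 - (number of misclassified test samples) / |test|."""
    errors = sum(1 for x, y in test if nn1_predict(train, x) != y)
    return 1 - errors / len(test)


def two_sided_holdout(A, B):
    """Return (RR_A, RR_B, outside-test RR) with the roles of A and B swapped."""
    rr_a = recognition_rate(A, B)  # Model A: built on A, evaluated on B
    rr_b = recognition_rate(B, A)  # Model B: built on B, evaluated on A
    return rr_a, rr_b, (rr_a + rr_b) / 2


# Hypothetical data sets of equal size: (feature vector, class label) pairs.
A = [((0, 0), 0), ((0, 1), 0), ((3, 3), 1), ((3, 4), 1)]
B = [((1, 0), 0), ((1, 1), 0), ((4, 3), 1), ((2, 2), 0)]

rr_a, rr_b, outside_rr = two_sided_holdout(A, B)
print(rr_a, rr_b, outside_rr)  # 0.75 1.0 0.875
```

Model A misclassifies the sample `(2, 2)` in B, while Model B classifies every sample in A correctly. Because the two sets have the same size here, the average of the two rates equals the pooled rate computed over all samples.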


M-fold Cross Validation (1/3)

Data partitioning

• Partition the dataset into m folds

• One fold for test, the other folds for training

• Repeat m times

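Data set $S = S_1 \cup S_2 \cup S_3 \cup \dots \cup S_m$, with $S_i \cap S_j = \emptyset$ for $i \neq j$

Class distribution within each $S_i$ should be as close as possible to that of $S$

$$\text{Cross-validation RR} = 1 - \frac{1}{|S|} \sum_{j=1}^{m} \sum_{(x_i, y_i) \in S_j} \operatorname{sgn}\left(\left|y_i - F_{S - S_j}(x_i)\right|\right)$$

Here $F_{S - S_j}(\cdot)$ is the model identified using all folds except $S_j$.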


M-fold Cross Validation (2/3)

[Diagram] The data set is split into m disjoint sets $S_1, S_2, \dots, S_m$. For each $k$, Model $k$ is constructed from all sets except $S_k$ and evaluated on $S_k$ (outside test), which gives $RR_k$.

$$RR_{CV} = \frac{1}{m} \sum_{k=1}^{m} RR_k$$


M-fold Cross Validation (3/3)

Characteristics

• When m = 2 → two-sided holdout test

• When m = n → leave-one-out cross validation

• The value of m depends on the computation load imposed by the selected model/classifier.
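
The M-fold procedure, including the requirement that each fold mirror the class distribution of the whole set, can be sketched as below. This is one possible implementation, not the one used in the course: the round-robin dealing in `stratified_folds`, the 1-NN classifier, and the synthetic data are assumptions made for illustration.

```python
import random
from collections import defaultdict


def nn1_predict(train, x):
    """Return the label of the training sample closest to x (1-NNC)."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return nearest[1]


def stratified_folds(data, m, seed=0):
    """Split data into m disjoint folds whose class mix approximates that of data."""
    if not 2 <= m <= len(data):
        raise ValueError("m must satisfy 2 <= m <= len(data)")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample in data:
        by_class[sample[1]].append(sample)
    folds = [[] for _ in range(m)]
    position = 0  # running counter, so that m = n yields one sample per fold
    for samples in by_class.values():
        rng.shuffle(samples)
        for sample in samples:
            folds[position % m].append(sample)  # deal samples round-robin
            position += 1
    return folds


def cross_validation_rr(data, m, seed=0):
    """Pooled M-fold RR: 1 - (total errors over all test folds) / |S|."""
    folds = stratified_folds(data, m, seed)
    errors = 0
    for j, test_fold in enumerate(folds):
        # Train on every fold except fold j, then test on fold j.
        train = [s for k, fold in enumerate(folds) if k != j for s in fold]
        errors += sum(1 for x, y in test_fold if nn1_predict(train, x) != y)
    return 1 - errors / len(data)


# Hypothetical synthetic data: two overlapping Gaussian classes, 30 samples each.
gen = random.Random(0)
data = [((gen.gauss(2 * c, 1.0), gen.gauss(2 * c, 1.0)), c)
        for c in (0, 1) for _ in range(30)]

for m in (2, 5, 10, len(data)):  # m = len(data) is leave-one-out
    print(f"m={m:2d}  cross-validation RR={cross_validation_rr(data, m):.3f}")
```

The number of models to construct grows linearly with m, which is the computation load the last bullet refers to.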


Leave-one-out Cross Validation (1/3)

Data partitioning

• When $m = n$ and $S_i = \{(x_i, y_i)\}$

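Data set $S = S_1 \cup S_2 \cup S_3 \cup \dots \cup S_m$, with $S_i \cap S_j = \emptyset$ for $i \neq j$

$$\text{Cross-validation RR} = 1 - \frac{1}{|S|} \sum_{j=1}^{m} \sum_{(x_i, y_i) \in S_j} \operatorname{sgn}\left(\left|y_i - F_{S - S_j}(x_i)\right|\right)$$

Since each $S_j$ holds a single sample, the inner sum has exactly one term and $|S| = n$.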


Leave-one-out Cross Validation (2/3)

Leave-one-out CV

[Diagram] The data set consists of n i/o pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. For each $k$, Model $k$ is constructed from all pairs except $(x_k, y_k)$ and evaluated on $(x_k, y_k)$ (outside test), which gives $RR_k$, either 0% or 100%!

$$RR_{LOOCV} = \frac{1}{n} \sum_{k=1}^{n} RR_k$$


Leave-one-out Cross Validation (3/3)

General method for LOOCV

• Perform model construction (as a black box) n times → slow!

To speed up the computation of LOOCV

• Construct a common part that will be used repeatedly, such as

- Global mean and covariance for QC (quadratic classifier)

• More info on cross-validation is available on Wikipedia
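
The speed-up idea can be sketched with a simpler stand-in for the quadratic classifier: a nearest-class-mean classifier, which needs only class means and no covariance. This substitution is an assumption made to keep the example short. The per-class sums are the "common part": they are computed once, and each held-out sample is subtracted from its class sum instead of rebuilding the model. All function names and the toy data are hypothetical.

```python
def dist2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def class_sums(data):
    """Per-class coordinate sums and sample counts."""
    sums, counts = {}, {}
    for x, y in data:
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [s + a for s, a in zip(sums.get(y, [0.0] * len(x)), x)]
    return sums, counts


def nearest_mean_predict(means, x):
    """Return the class whose mean vector is closest to x."""
    return min(means, key=lambda c: dist2(means[c], x))


def loocv_blackbox(data):
    """General method: rebuild the model from scratch n times."""
    hits = 0
    for k, (x, y) in enumerate(data):
        sums, counts = class_sums(data[:k] + data[k + 1:])
        means = {c: [s / counts[c] for s in sums[c]] for c in sums}
        hits += nearest_mean_predict(means, x) == y
    return hits / len(data)


def loocv_shared(data):
    """Speed-up: compute the class sums once, then remove the held-out sample."""
    sums, counts = class_sums(data)  # common part, built a single time
    hits = 0
    for x, y in data:
        means = {}
        for c in sums:
            if c != y:
                means[c] = [s / counts[c] for s in sums[c]]
            elif counts[c] > 1:  # a class with one sample vanishes when held out
                means[c] = [(s - a) / (counts[c] - 1) for s, a in zip(sums[c], x)]
        hits += nearest_mean_predict(means, x) == y
    return hits / len(data)


# Hypothetical toy data; the last sample sits closer to the class-0 cluster.
data = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 0),
        ((4, 4), 1), ((5, 4), 1), ((4, 5), 1), ((2, 2), 1)]

print(loocv_blackbox(data), loocv_shared(data))  # 0.875 0.875
```

Both routines give the same rate on this data; the second one avoids a full pass over the data for each of the n left-out samples. For a quadratic classifier the same idea would extend to downdating the class covariance, which is not shown here.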


Applications and Misuse of CV

Applications of CV

• Input (feature) selection

• Model complexity determination

• Performance comparison among different models

Misuse of CV

• Do not try to boost validation RR too much, or you are running the risk of indirectly training on the left-out data!
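
Model complexity determination with CV can be sketched as choosing k for a k-nearest-neighbor classifier by leave-one-out RR. To respect the warning about misuse, this sketch keeps a separate final test set that plays no part in the selection. The classifier, the candidate values of k, the synthetic data, and every name below are assumptions made for illustration.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def recognition_rate(train, test, k):
    """Fraction of test samples classified correctly by k-NN built on train."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def loocv_rr(data, k):
    """Leave-one-out RR of k-NN on data."""
    hits = sum(knn_predict(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)


# Hypothetical synthetic data: two overlapping Gaussian classes, 40 samples each.
gen = random.Random(0)
data = [((gen.gauss(2 * c, 1.0), gen.gauss(2 * c, 1.0)), c)
        for c in (0, 1) for _ in range(40)]
gen.shuffle(data)

# The final test set is set aside and never used while choosing k.
design, final_test = data[:60], data[60:]

candidates = [1, 3, 5, 7]  # odd values avoid voting ties with two classes
cv_scores = {k: loocv_rr(design, k) for k in candidates}
best_k = max(candidates, key=lambda k: cv_scores[k])

final_rr = recognition_rate(design, final_test, best_k)
print(cv_scores, best_k, final_rr)
```

The validation RR of the selected k is optimistic to the extent that k was tuned against it, so the rate on the untouched final test set is the one to report.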


Quiz

Explain the following terms for performance evaluation

• Inside test

• One-sided holdout test

• Two-sided holdout test

• M-fold cross validation

• Leave-one-out cross validation