Performance Evaluation: Recognition Rate Estimate J.-S. Roger Jang ( 張智星 ) CSIE Dept.,...

Performance Evaluation:Recognition Rate EstimatePerformance Evaluation:

Recognition Rate Estimate

J.-S. Roger Jang ( J.-S. Roger Jang ( 張智星張智星 ))

CSIE Dept., National Taiwan Univ.CSIE Dept., National Taiwan Univ.http://mirlab.org/janghttp://mirlab.org/jang

[email protected]@mirlab.org

Machine Learning Performance Evaluation


2

OutlineOutline

Performance indices of a given classifier/model• Accuracy (recognition rate)

• Computation load

Methods to estimate the recognition rate• Inside test

• One-sided holdout test

• Two-sided holdout test

• M-fold cross validation

• Leave-one-out cross validation

Quiz!


3

Synonyms Synonyms

The following sets of synonyms will be use interchangeably

• Classifier, model

• Recognition rate, accuracy


4

Performance IndicesPerformance Indices

Performance indices of a classifier• Recognition rate

- Requires an objective procedure to derive it

• Computation load

- Design-time computation

- Run-time computation

Our focus• Recognition rate and the procedures to derive it

• The estimated accuracy depends on

- Dataset

- Model (types and complexity)


5

Methods for Deriving Recognition ratesMethods for Deriving Recognition ratesMethods to derive the recognition rates

• Inside test (resubstitution recog. rate)





Data partitioning• Training set

• Training and test sets

• Training, validating, and test sets


6

Inside Test (1/2)Inside Test (1/2)

Dataset partitioning• Use the whole dataset for training & evaluation

Recognition rate• Inside-test recognition rate

• Resubstitution accuracy

Training dataset whole set D = ;

( ) : Model identified by the training data

1RR 1 sgn ( )

| |

D Di i

D

D Di D i

i

x y

F

y F xD


7

Inside Test (2/2)Inside Test (2/2)

Characteristics• Too optimistic since RR tends to be higher

• For instance, 1-NNC always has an RR of 100%!

• Can be used as the upper bound of the true RR.

Potential reasons for low inside-test RR:• Bad features of the dataset

• Bad method for model construction, such as

- Bad results from neural network training

- Bad results from k-means clustering


8

One-side Holdout Test (1/2)One-side Holdout Test (1/2)

Dataset partitioning• Training set for model construction

• Test set for performance evaluation

Recognition rate• Inside-test RR

• Outside-test RR

Training set A = ; , test set B = ;

( ) : Model identified using set A

1Outside-test RR 1 sgn ( )

| |

A A B Bi i i i

A

B Bi A i

i

x y x y

F

y F xB


9

One-side Holdout Test (2/2)One-side Holdout Test (2/2)

Characteristics• Highly affected by data partitioning

• Usually Adopted when design-time computation load is high


10

Two-sided Holdout Test (1/3)Two-sided Holdout Test (1/3)

Dataset partitioning• Training set for model construction

• Test set for performance evaluation

• Role reversal

Data set A = ; , data Set B = ;

( ) : Model identified using data set A

( ) : Model identified using data set B

1Outside-test RR 1 sgn ( ) sgn ( )

| | | |

Inside-te

A A B Bi i i i

A

B

B B A Ai A i i B i

i i

x y x y

F

F

y F x y F xA B

1st RR 1 sgn ( ) sgn ( )

| | | |A A B Bi A i i B i

i i

y F x y F xA B


11


Two-sided holdout test (used in GMDH)

Data set A

Data set B

Model AModel A

constructionconstruction

RRAevaluationevaluation

Model BModel B RRB


evaluationevaluation

Outside-test RR = (RRA + RRB)/2


12


Characteristics• Better usage of the dataset

• Still highly affected by the partitioning

• Suitable for models/classifiers with high design-time computation load


13

M-fold Cross Validation (1/3)M-fold Cross Validation (1/3)

Data partitioning• Partition the dataset into m fold

• One fold for test, the other folds for training

• Repeat m times

1 2 3

1 2 3

1 1;

Data set S=S S S S

S S S S

S S ,

S .

Cross-validation RR 1 sgn ( ) / | |i

i i i

m

m

i j

i

m m

i S S i jj jx y S

i j

Class distribution within each should be as close as possible to S

y F x S


14


Model Model kk

1S

...

...

...mdisjoint

sets



2S

mS

kS kRR

m

RRRR

m

kk

CV

1

Outside test


15


Characteristics• When m=2 Two-sided holdout test

• When m=n Leave-one-out cross validation

• The value of m depends on the computation load imposed by the selected model/classifier.


16

Leave-one-out Cross Validation (1/3)Leave-one-out Cross Validation (1/3)

Data partitioning• When m=n and Si=(xi, yi)

1 2 3

1 2 3

1 1;

Data set S=S S S S

S S S S

S S ,

S .

Cross-validation RR 1 sgn ( ) / | |i

i i i

m

m

i j

i

m m

i S S i jj jx y S

i j

Class distribution within each should be as close as possible to S

y F x S


17


Leave-one-out CV

Model kModel k

11 ; yx

...

...

...ni/o

pairs



22 ; yx

nn yx ;

kk yx ;

n

RRRR

n

kk

LOOCV

1

kRR

0% or 100%!

Outside test


18


General method for LOOCV• Perform model construction (as a blackbox) n

times Slow!

To speed up the computation LOOCV • Construct a common part that will be used

repeatedly, such as

- Global mean and covariance for QC

• More info of cross-validation on Wikipedia


19

Applications and Misuse of CVApplications and Misuse of CV

Applications of CV• Input (feature) selection

• Model complexity determination

• Performance comparison among different models

Misuse of CV• Do not try to boost validation RR too much, or you

are running the risk of indirectly training on the left-out data!


20

QuizQuiz

Explain the following terms for performance evaluation

• Inside test





Performance Evaluation: Recognition Rate Estimate J.-S. Roger Jang ( 張智星 ) CSIE Dept.,...

Documents

Transcript of Performance Evaluation: Recognition Rate Estimate J.-S. Roger Jang ( 張智星 ) CSIE Dept.,...