Performance Evaluation: Recognition Rate Estimate J.-S. Roger Jang ( 張智星 ) CSIE Dept.,...
-
Upload
rose-elliott -
Category
Documents
-
view
222 -
download
0
Transcript of Performance Evaluation: Recognition Rate Estimate J.-S. Roger Jang ( 張智星 ) CSIE Dept.,...
Performance Evaluation:Recognition Rate EstimatePerformance Evaluation:
Recognition Rate Estimate
J.-S. Roger Jang ( J.-S. Roger Jang ( 張智星 張智星 ))
CSIE Dept., National Taiwan Univ.CSIE Dept., National Taiwan Univ.http://mirlab.org/janghttp://mirlab.org/jang
[email protected]@mirlab.org
Machine Learning Performance Evaluation
Machine Learning Performance Evaluation
2
OutlineOutline
Performance indices of a given classifier/model• Accuracy (recognition rate)
• Computation load
Methods to estimate the recognition rate• Inside test
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation
Quiz!
Machine Learning Performance Evaluation
3
Synonyms Synonyms
The following sets of synonyms will be use interchangeably
• Classifier, model
• Recognition rate, accuracy
Machine Learning Performance Evaluation
4
Performance IndicesPerformance Indices
Performance indices of a classifier• Recognition rate
- Requires an objective procedure to derive it
• Computation load
- Design-time computation
- Run-time computation
Our focus• Recognition rate and the procedures to derive it
• The estimated accuracy depends on
- Dataset
- Model (types and complexity)
Machine Learning Performance Evaluation
5
Methods for Deriving Recognition ratesMethods for Deriving Recognition ratesMethods to derive the recognition rates
• Inside test (resubstitution recog. rate)
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation
Data partitioning• Training set
• Training and test sets
• Training, validating, and test sets
Machine Learning Performance Evaluation
6
Inside Test (1/2)Inside Test (1/2)
Dataset partitioning• Use the whole dataset for training & evaluation
Recognition rate• Inside-test recognition rate
• Resubstitution accuracy
Training dataset whole set D = ;
( ) : Model identified by the training data
1RR 1 sgn ( )
| |
D Di i
D
D Di D i
i
x y
F
y F xD
Machine Learning Performance Evaluation
7
Inside Test (2/2)Inside Test (2/2)
Characteristics• Too optimistic since RR tends to be higher
• For instance, 1-NNC always has an RR of 100%!
• Can be used as the upper bound of the true RR.
Potential reasons for low inside-test RR:• Bad features of the dataset
• Bad method for model construction, such as
- Bad results from neural network training
- Bad results from k-means clustering
Machine Learning Performance Evaluation
8
One-side Holdout Test (1/2)One-side Holdout Test (1/2)
Dataset partitioning• Training set for model construction
• Test set for performance evaluation
Recognition rate• Inside-test RR
• Outside-test RR
Training set A = ; , test set B = ;
( ) : Model identified using set A
1Outside-test RR 1 sgn ( )
| |
A A B Bi i i i
A
B Bi A i
i
x y x y
F
y F xB
Machine Learning Performance Evaluation
9
One-side Holdout Test (2/2)One-side Holdout Test (2/2)
Characteristics• Highly affected by data partitioning
• Usually Adopted when design-time computation load is high
Machine Learning Performance Evaluation
10
Two-sided Holdout Test (1/3)Two-sided Holdout Test (1/3)
Dataset partitioning• Training set for model construction
• Test set for performance evaluation
• Role reversal
Data set A = ; , data Set B = ;
( ) : Model identified using data set A
( ) : Model identified using data set B
1Outside-test RR 1 sgn ( ) sgn ( )
| | | |
Inside-te
A A B Bi i i i
A
B
B B A Ai A i i B i
i i
x y x y
F
F
y F x y F xA B
1st RR 1 sgn ( ) sgn ( )
| | | |A A B Bi A i i B i
i i
y F x y F xA B
Machine Learning Performance Evaluation
11
Two-sided Holdout Test (2/3)Two-sided Holdout Test (2/3)
Two-sided holdout test (used in GMDH)
Data set A
Data set B
Model AModel A
constructionconstruction
RRAevaluationevaluation
Model BModel B RRB
constructionconstruction
evaluationevaluation
Outside-test RR = (RRA + RRB)/2
Machine Learning Performance Evaluation
12
Two-sided Holdout Test (3/3)Two-sided Holdout Test (3/3)
Characteristics• Better usage of the dataset
• Still highly affected by the partitioning
• Suitable for models/classifiers with high design-time computation load
Machine Learning Performance Evaluation
13
M-fold Cross Validation (1/3)M-fold Cross Validation (1/3)
Data partitioning• Partition the dataset into m fold
• One fold for test, the other folds for training
• Repeat m times
1 2 3
1 2 3
1 1;
Data set S=S S S S
S S S S
S S ,
S .
Cross-validation RR 1 sgn ( ) / | |i
i i i
m
m
i j
i
m m
i S S i jj jx y S
i j
Class distribution within each should be as close as possible to S
y F x S
Machine Learning Performance Evaluation
14
M-fold Cross Validation (2/3)M-fold Cross Validation (2/3)
Model Model kk
1S
...
...
...mdisjoint
sets
constructionconstruction
evaluationevaluation
2S
mS
kS kRR
m
RRRR
m
kk
CV
1
Outside test
Machine Learning Performance Evaluation
15
M-fold Cross Validation (3/3)M-fold Cross Validation (3/3)
Characteristics• When m=2 Two-sided holdout test
• When m=n Leave-one-out cross validation
• The value of m depends on the computation load imposed by the selected model/classifier.
Machine Learning Performance Evaluation
16
Leave-one-out Cross Validation (1/3)Leave-one-out Cross Validation (1/3)
Data partitioning• When m=n and Si=(xi, yi)
1 2 3
1 2 3
1 1;
Data set S=S S S S
S S S S
S S ,
S .
Cross-validation RR 1 sgn ( ) / | |i
i i i
m
m
i j
i
m m
i S S i jj jx y S
i j
Class distribution within each should be as close as possible to S
y F x S
Machine Learning Performance Evaluation
17
Leave-one-out Cross Validation (2/3)Leave-one-out Cross Validation (2/3)
Leave-one-out CV
Model kModel k
11 ; yx
...
...
...ni/o
pairs
constructionconstruction
evaluationevaluation
22 ; yx
nn yx ;
kk yx ;
n
RRRR
n
kk
LOOCV
1
kRR
0% or 100%!
Outside test
Machine Learning Performance Evaluation
18
Leave-one-out Cross Validation (3/3)Leave-one-out Cross Validation (3/3)
General method for LOOCV• Perform model construction (as a blackbox) n
times Slow!
To speed up the computation LOOCV • Construct a common part that will be used
repeatedly, such as
- Global mean and covariance for QC
• More info of cross-validation on Wikipedia
Machine Learning Performance Evaluation
19
Applications and Misuse of CVApplications and Misuse of CV
Applications of CV• Input (feature) selection
• Model complexity determination
• Performance comparison among different models
Misuse of CV• Do not try to boost validation RR too much, or you
are running the risk of indirectly training on the left-out data!
Machine Learning Performance Evaluation
20
QuizQuiz
Explain the following terms for performance evaluation
• Inside test
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation