Performance Evaluation in Computer Vision
Kyungnam Kim
Computer Vision Lab, University of Maryland, College Park
Contents
- Error Estimation in Pattern Recognition: Jain et al., “Statistical Pattern Recognition: A Review”, IEEE PAMI 2000 (Section 7, Error Estimation).
- Assessing and Comparing Algorithms: Adrian Clark and Christine Clark, “Performance Characterization in Computer Vision: A Tutorial”. Receiver Operating Characteristic (ROC) curve; Detection Error Trade-off (DET) curve; Confusion Matrix; McNemar’s test.
http://peipa.essex.ac.uk/benchmark/
Error Estimation in Pattern Recognition
Reference: Jain et al., “Statistical Pattern Recognition: A Review”, IEEE PAMI 2000 (Section 7, Error Estimation).
It is very difficult to obtain a closed-form expression for the error rate Pe.
In practice, the error rate must be estimated from the available samples, split into training and test sets.
Error estimate = percentage of misclassified test samples.
A reliable error estimate requires (1) a large sample size and (2) independent training and test samples.
Error Estimation in Pattern Recognition
The error estimate (a function of the specific training and test sets used) is a random variable. Given a classifier, let t be the number of misclassified test samples out of n; t then follows a binomial distribution.
The maximum-likelihood estimate Pe_hat of Pe is Pe_hat = t/n, with E(Pe_hat) = Pe and Var(Pe_hat) = Pe(1 - Pe)/n. Since Pe_hat is a random variable, it admits a confidence interval (which shrinks as n increases).
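The point above can be made concrete with a short sketch: the estimate t/n plus an approximate confidence interval using the normal approximation to the binomial, with Var(Pe_hat) = Pe(1 - Pe)/n estimated by plugging in Pe_hat. (The function name and the z = 1.96 default for a 95% interval are illustrative choices, not from the slides.)

```python
import math

def error_estimate_ci(t, n, z=1.96):
    """Maximum-likelihood error estimate Pe_hat = t/n and an approximate
    confidence interval from the normal approximation to the binomial.
    t: number of misclassified test samples; n: total test samples."""
    pe_hat = t / n
    # Var(Pe_hat) = Pe(1 - Pe)/n, estimated by substituting Pe_hat for Pe
    se = math.sqrt(pe_hat * (1 - pe_hat) / n)
    return pe_hat, (pe_hat - z * se, pe_hat + z * se)

# 15 misclassified out of 100 test samples:
pe, (lo, hi) = error_estimate_ci(t=15, n=100)
```

As the slide notes, the interval shrinks as n increases: with t=150, n=1000 the same point estimate 0.15 comes with a much tighter interval.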
Versions of the cross-validation approach include ‘leave all in’ (resubstitution), holdout, leave-one-out, rotation (k-fold cross-validation), and the bootstrap.
The bootstrap is resampling based on the analogy: a bootstrap resample stands to the sample as the sample stands to the population.
Bootstrap references:
http://www.uvm.edu/~dhowell/StatPages/Resampling/Bootstrapping.html
http://www.childrens-mercy.org/stats/ask/bootstrap.asp
http://www.cnr.colostate.edu/class_info/fw663/bootstrap.pdf
http://www.maths.unsw.edu.au/ForStudents/courses/math3811/lecture9.pdf
Error Estimation in Pattern Recognition
Receiver Operating Characteristic (ROC) curve: detailed later.
‘Reject rate’: reject doubtful patterns near the decision boundary (low confidence).
A well-known reject option is to reject a pattern if its maximum a posteriori probability is below a threshold.
There is a trade-off between the ‘reject rate’ and the ‘error rate’.
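The reject option described above can be sketched in a few lines: accept the arg-max class only when the maximum a posteriori probability clears a threshold. (The function name, the 0.8 default threshold, and the example class labels are illustrative.)

```python
def classify_with_reject(posteriors, threshold=0.8):
    """Return the arg-max class, or None (reject) when the maximum
    a posteriori probability falls below `threshold`.
    `posteriors` maps class label -> posterior probability."""
    label, p_max = max(posteriors.items(), key=lambda kv: kv[1])
    return label if p_max >= threshold else None

# A doubtful pattern near the decision boundary is rejected:
doubtful = classify_with_reject({"face": 0.55, "non-face": 0.45})
# A confident pattern is accepted:
confident = classify_with_reject({"face": 0.95, "non-face": 0.05})
```

Raising the threshold lowers the error rate on the patterns that are accepted but raises the reject rate, which is exactly the trade-off noted above.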
Assessing and Comparing Algorithms
Reference: Adrian Clark and Christine Clark, “Performance Characterization in Computer Vision: A Tutorial”. http://peipa.essex.ac.uk/benchmark/tutorials/essex/tutorial.pdf
Use the same training and test sets; some standard sets are FERET and PETS.
Is it enough simply to see which algorithm has the better success rate? No — a standard statistical test, McNemar’s test, is required.
Two types of testing:
- Technology evaluation: the response of an underlying generic algorithm to factors such as adjustment of its tuning parameters, noisy input data, etc.
- Application evaluation: how well an algorithm performs a particular task.
Assessing and Comparing Algorithms
Receiver Operating Characteristic (ROC) curve
FP rate (false positive rate) = FP / (FP + TN)
TP rate (true positive rate) = TP / (TP + FN)
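An ROC curve is traced by sweeping a decision threshold over the classifier’s scores and recording the two rates above at each setting. A minimal sketch (the function name and the toy scores/labels are illustrative):

```python
def roc_points(scores, labels):
    """Sweep a decision threshold over classifier scores and return
    (FP rate, TP rate) pairs; labels are 1 (positive) / 0 (negative)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))  # (FP/(FP+TN), TP/(TP+FN))
    return points

pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1])
```

Each returned pair is one operating point; plotting FP rate on the x-axis against TP rate on the y-axis gives the ROC curve.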
Assessing and Comparing Algorithms
Detection Error Trade-off (DET) curve
- logarithmic scales on both axes
- more spread out, easier to distinguish
- close to linear
Assessing and Comparing Algorithms
Detection Error Trade-off (DET) curve
- Forensic applications: track down a suspect
- High-security applications: ATM machines
- EER (equal error rate)
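The EER is the operating point at which the false-positive and false-negative rates coincide. On sampled curves the crossing rarely falls exactly on a measured point, so a common approximation — sketched below with illustrative names and toy rates — takes the threshold index where the two rates are closest and averages them:

```python
def equal_error_rate(fp_rates, fn_rates):
    """Approximate the EER: the operating point where the false-positive
    and false-negative rates are (closest to) equal. Both lists hold
    rates measured at the same sequence of threshold settings."""
    i = min(range(len(fp_rates)), key=lambda k: abs(fp_rates[k] - fn_rates[k]))
    return (fp_rates[i] + fn_rates[i]) / 2

# As the threshold loosens, FP rate rises while FN rate falls:
eer = equal_error_rate([0.01, 0.05, 0.10, 0.30], [0.40, 0.12, 0.08, 0.02])
```

Here the closest pair is (0.10, 0.08), giving an approximate EER of 0.09.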
Assessing and Comparing Algorithms
Crossing ROC curves
Comparisons of algorithms tend to be performed with a specific set of tuning parameter values; running them with settings that correspond to the EER is probably the most sensible.
Assessing and Comparing Algorithms
McNemar’s test (a form of chi-square test)
An appropriate statistical test must take into account not only the numbers of false positives, etc., but also the number of tests.
http://www.zephryus.demon.co.uk/geography/resources/fieldwork/stats/chi.html
http://www.isixsigma.com/dictionary/Chi_Square_Test-67.htm
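A short sketch of McNemar’s test for comparing two classifiers on the same test set: only the samples on which the two disagree enter the statistic, which is then compared against the chi-square distribution with one degree of freedom. (The function name, the continuity correction, and the example counts are illustrative assumptions.)

```python
def mcnemar_statistic(n01, n10):
    """McNemar's chi-square statistic (with continuity correction).
    n01: samples classifier A got right and classifier B got wrong.
    n10: samples A got wrong and B got right.
    Samples both got right or both got wrong do not enter the test."""
    if n01 + n10 == 0:
        return 0.0
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)

# Two classifiers disagree on 35 test samples:
chi2 = mcnemar_statistic(n01=25, n10=10)
# Compare against the 5% chi-square critical value with 1 dof:
significant = chi2 > 3.84
```

Here chi2 = (|25 - 10| - 1)^2 / 35 = 5.6 > 3.84, so the difference between the two classifiers is significant at the 5% level — something a bare comparison of success rates could not establish.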