1 The Expected Performance Curve Samy Bengio, Johnny Mariéthoz, Mikaela Keller MI – 25. oktober...
1
The Expected Performance Curve
Samy Bengio, Johnny Mariéthoz, Mikaela Keller
MI – 25 October 2007
Kresten Toftgaard Andersen
2
Introduction to the paper
By Samy Bengio, Johnny Mariéthoz and Mikaela Keller, 2005.
Written for the machine learning community and other researchers who need to compare models.
Content of the paper:
• Briefly introduces ROC curves.
• Points out some risks of using ROC curves to compare different classification models.
• Argues, by showing experimental results, that ROC curves can be misleading.
• Contributes a so-called “Expected Performance Curve” (EPC) and argues why it is better for comparing models.
• Extends the EPC with confidence intervals and statistical difference tests.
• Concludes by summarizing the contribution and listing the strengths and weaknesses of ROC and EPC.
• Acknowledgements and references.
3
Content
• Motivation
• Introduce terminology and notation, define the problem
• Introduce ROC curves
• Example: how to calculate an ROC
• Present arguments for why ROC curves should be used with great care
• Introduce the EPC
• Continue the example, showing how to calculate an EPC
• Present arguments for why the EPC might be better than ROC
• Confidence intervals
• My opinion
• Discussion
4
Motivation
ROC analysis is an important way to compare binary classifier models.
It can be used to select optimal models and discard suboptimal ones.
Areas of use:
• Medicine (diagnostic testing, evaluating evidence-based medicine approaches)
• Epidemiology (factors affecting health, evaluating optimal treatment approaches)
• Radiology (radar signals, evaluating new radiology techniques)
• Psychology (signal detection, assessing human detection of weak signals)
• Machine learning (evaluating machine learning techniques)
• …
5
Definition of 2-class classifiers
Definition of 2-class classification problems:
Apply the function and its associated threshold to a separate test set (the true class must be known) and count the outcomes.
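A common way to write such a decision rule, assuming a real-valued scoring function $f$ and a threshold $\theta$ (this notation is an assumption here, chosen to match the $\theta$ used for the EPC), is:

```latex
\hat{y}(x) =
\begin{cases}
  \text{positive} & \text{if } f(x) > \theta \\
  \text{negative} & \text{otherwise}
\end{cases}
```

Counting the outcomes then means comparing $\hat{y}(x)$ with the known true class of every test instance.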
6
Confusion matrix
Given a 2-class classifier and an instance, there are four possible outcomes:
• TP (true positive): the instance is positive and is classified as positive
• FN (false negative): the instance is positive and is classified as negative
• TN (true negative): the instance is negative and is classified as negative
• FP (false positive): the instance is negative and is classified as positive
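As a minimal sketch, assuming labels coded as 1 (positive) and 0 (negative) and the rule "positive iff score > θ", the four outcomes can be counted like this (the function name `confusion_counts` is hypothetical):

```python
from typing import Dict, Sequence


def confusion_counts(scores: Sequence[float], labels: Sequence[int], theta: float) -> Dict[str, int]:
    """Count TP, FN, TN and FP for the rule: predict positive iff score > theta."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for s, y in zip(scores, labels):
        predicted_positive = s > theta
        if y == 1:  # truly positive instance
            counts["TP" if predicted_positive else "FN"] += 1
        else:       # truly negative instance
            counts["FP" if predicted_positive else "TN"] += 1
    return counts


# Usage: four hypothetical test instances, threshold 0.5.
print(confusion_counts([0.9, 0.4, 0.2, 0.7], [1, 1, 0, 0], 0.5))
# {'TP': 1, 'FN': 1, 'TN': 1, 'FP': 1}
```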
7
Performance metrics
The selected measure is a pair of values, generically called V1 and V2. They can be computed in many ways depending on the situation; all are simple combinations of TP, TN, FP and FN. The exact definition of V1 and V2 does not matter for the paper's argument.
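Three common choices of the pair (V1, V2), written as a minimal sketch; the function names are hypothetical, and returning 0.0 for an empty denominator is a convention assumed here:

```python
from typing import Tuple


def safe_div(num: int, den: int) -> float:
    # Convention assumed here: a rate with an empty denominator is 0.0.
    return num / den if den else 0.0


def far_frr(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """(V1, V2) = (false acceptance rate, false rejection rate)."""
    far = safe_div(fp, fp + tn)  # share of negatives wrongly accepted
    frr = safe_div(fn, fn + tp)  # share of positives wrongly rejected
    return far, frr


def precision_recall(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """(V1, V2) = (precision, recall); tn is unused but kept for a uniform signature."""
    return safe_div(tp, tp + fp), safe_div(tp, tp + fn)


def fpr_tpr(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """(V1, V2) = (false positive rate, true positive rate), the classic ROC axes."""
    return safe_div(fp, fp + tn), safe_div(tp, tp + fn)


# Usage with hypothetical counts.
print(far_frr(tp=40, tn=45, fp=5, fn=10))
print(precision_recall(tp=40, tn=45, fp=5, fn=10))
```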
8
Performance metrics
A single measure, generically called V, combines V1 and V2. V can also be computed in several ways depending on the situation, for example the Half Total Error Rate (HTER), the mean of the false acceptance and false rejection rates.
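Two common ways of combining the pair into a single V, sketched with hypothetical function names: HTER for (FAR, FRR) and the F1 measure for (precision, recall):

```python
def hter(far: float, frr: float) -> float:
    """Half Total Error Rate: the mean of the two error rates."""
    return (far + frr) / 2


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage with hypothetical values.
print(hter(0.1, 0.2))
print(f1(0.8, 0.6))
```

HTER weights both error types equally, while F1 is dominated by the smaller of precision and recall.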
9
What is a ROC curve?
ROC is an abbreviation for “Receiver Operating Characteristic”. It is a technique for visualizing, organizing and selecting classifiers based on their performance. An ROC can be presented either as a single point in ROC space (a graph) or as a curve.
Classifiers:
• Discrete classifiers (decision trees, rule sets, etc.) produce a single point.
• Probabilistic classifiers (Naive Bayes, neural networks, etc.) output a score; varying the threshold on that score traces a curve (the ROC).
The following example shows this.
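A minimal sketch of how sweeping the threshold traces an ROC, assuming the rule "positive iff score > θ", labels coded 1/0, and the classic (false positive rate, true positive rate) axes; the data and the name `roc_points` are hypothetical:

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) for every candidate threshold, from strictest to loosest.

    Decision rule assumed: predict positive iff score > theta. Candidates are
    the distinct observed scores plus one value below the minimum, so the
    curve runs from (0, 0) to (1, 1).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    thresholds = sorted(set(scores), reverse=True) + [min(scores) - 1.0]
    points = []
    for theta in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical scores for six test instances (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each printed pair is one point of the curve; lowering the threshold can only move the point up or to the right.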
10
Example
11
Example
12
Example
Threshold
13
Example
Threshold
14
Example
Threshold
15
Example
Threshold
16
Example
17
Example
18
ROC curves
• BEP = Break-Even Point
• The BEP corresponds to the threshold nearest to a solution where V1 = V2.
• The selected threshold has a significant impact on the model's performance.
• The threshold represents a trade-off between giving importance to V1 or to V2.
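A minimal sketch of searching for the BEP, assuming V1 = FAR and V2 = FRR and using the observed scores as candidate thresholds (the helper names and data are hypothetical); it returns the candidate with the smallest |V1 − V2|:

```python
from typing import Optional, Sequence, Tuple


def far_frr_at(scores: Sequence[float], labels: Sequence[int], theta: float) -> Tuple[float, float]:
    """FAR and FRR for the rule: accept (predict positive) iff score > theta."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    far = sum(s > theta for s in neg) / len(neg)   # negatives accepted
    frr = sum(s <= theta for s in pos) / len(pos)  # positives rejected
    return far, frr


def break_even_threshold(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Candidate threshold whose FAR and FRR are closest (V1 ≈ V2)."""
    best_theta, best_gap = None, float("inf")
    for theta in sorted(set(scores)):
        far, frr = far_frr_at(scores, labels, theta)
        gap = abs(far - frr)
        if gap < best_gap:
            best_theta, best_gap = theta, gap
    return best_theta


scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
theta = break_even_threshold(scores, labels)
print(theta, far_frr_at(scores, labels, theta))
```

With finite data an exact V1 = V2 may not exist, which is why the search keeps the nearest candidate.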
19
Potential risk of using ROC
• Each point corresponds to a particular setting of the threshold, but in “real applications” the threshold must be decided before seeing the test set.
• Normally the threshold is found by searching for the BEP (V1 = V2) on the training set. Because the training set differs from the test set, a mismatch is possible.
• Situations may occur where the optimal threshold found using the training set does not correspond to the optimal threshold on the test set.
• One parameter, the threshold, is tuned on the training set, so there is a risk of expecting the training error to reflect the generalization error.
• “Real applications often suffer from an additional mismatch between training and test conditions”.
• There is a risk of a different trade-off between V1 and V2 on the test set. ROC curves do not take this risk of mismatch into account; this probability should be reflected in the procedure used to compute the performance curve.
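To illustrate the mismatch, here is a sketch with hypothetical scores: the test scores are the training scores shifted upward, mimicking a change in conditions. The BEP threshold chosen on the training set no longer balances FAR and FRR on the test set:

```python
from typing import Sequence, Tuple


def far_frr_at(scores: Sequence[float], labels: Sequence[int], theta: float) -> Tuple[float, float]:
    # Accept (predict positive) iff score > theta.
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return (sum(s > theta for s in neg) / len(neg),
            sum(s <= theta for s in pos) / len(pos))


def bep_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    # Candidate threshold minimising |FAR - FRR| (the first one wins on ties).
    def gap(theta: float) -> float:
        far, frr = far_frr_at(scores, labels, theta)
        return abs(far - frr)
    return min(sorted(set(scores)), key=gap)


labels = [1, 1, 0, 1, 0, 0]
train_scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3]
# Hypothetical test scores: the same instances scored 0.1 higher.
test_scores = [1.0, 0.9, 0.7, 0.65, 0.5, 0.4]

theta = bep_threshold(train_scores, labels)  # chosen before seeing the test set
print("train:", far_frr_at(train_scores, labels, theta))
print("test: ", far_frr_at(test_scores, labels, theta))
print("threshold that would balance the test set:", bep_threshold(test_scores, labels))
```

On the training set the chosen threshold gives FAR = FRR, but on the shifted test set it gives FAR ≠ FRR, i.e. a different trade-off than intended.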
20
Potential risk of using ROC
ROCs of two real models for a Text-Independent Speaker Verification task.
Looking only at the curves, model B seems to be better than model A.
Looking at the thresholds actually selected, model A is in fact the better model.
21
Expected performance curve
The EPC presents a range of possible expected performances on the test set. The calculation takes into account the possible mismatch when estimating the desired threshold. A parameter alpha ($\alpha$) is used to account for the possible mismatch of the threshold.
Framework:
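Parametric performance measure: $C(V_1(\theta, D), V_2(\theta, D); \alpha)$. It depends on the parameter $\alpha$ and on $V_1$ and $V_2$ computed on some data $D$ for the threshold $\theta$.
Example: $C(V_1(\theta, D), V_2(\theta, D); \alpha) = C(\mathrm{Precision}(\theta, D), \mathrm{Recall}(\theta, D); \alpha) = -\big(\alpha \cdot \mathrm{Precision}(\theta, D) + (1 - \alpha) \cdot \mathrm{Recall}(\theta, D)\big)$
Procedure: vary $\alpha$ inside a reasonable range; for each $\alpha$, estimate the $\theta$ that minimizes $C(\cdot, \cdot; \alpha)$ on a development set, then use the obtained $\theta$ to compute $V$ on the test set. Finally, plot $V$ with respect to $\alpha$.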
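A minimal sketch of the procedure above for the precision/recall example, assuming the rule "positive iff score > θ", candidate thresholds taken from the development scores (plus one below their minimum), and F1 as the reported V; the names `epc` and `precision_recall_at` and the data are hypothetical, and this is not the authors' code:

```python
from typing import List, Sequence, Tuple


def precision_recall_at(scores: Sequence[float], labels: Sequence[int], theta: float) -> Tuple[float, float]:
    """Precision and recall for the rule: predict positive iff score > theta."""
    tp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= theta and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0.0 if nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


def epc(dev_scores: Sequence[float], dev_labels: Sequence[int],
        test_scores: Sequence[float], test_labels: Sequence[int],
        alphas: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (alpha, F1 on test) pairs; theta is always chosen on the dev set."""
    candidates = [min(dev_scores) - 1.0] + sorted(set(dev_scores))
    curve = []
    for alpha in alphas:
        def cost(theta: float) -> float:
            # C(Precision, Recall; alpha) = -(alpha * P + (1 - alpha) * R) on dev data
            p, r = precision_recall_at(dev_scores, dev_labels, theta)
            return -(alpha * p + (1 - alpha) * r)
        theta = min(candidates, key=cost)  # argmin of C on the development set
        p, r = precision_recall_at(test_scores, test_labels, theta)
        curve.append((alpha, f1(p, r)))    # V measured on the test set
    return curve


labels = [1, 1, 0, 1, 0, 0]
dev = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3]
test = [1.0, 0.9, 0.7, 0.65, 0.5, 0.4]
for alpha, v in epc(dev, labels, test, labels, [0.0, 0.25, 0.5, 0.75, 1.0]):
    print(f"alpha={alpha:.2f}  F1={v:.3f}")
```

The key design point is that the test set is never used to pick θ: it only measures the performance obtained with the threshold selected on the development set.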
22
EPC Algorithm
23
Example
24
Example
25
Example
26
Example
27
Example
28
Example of a typical EPC
Alpha > 0.5: more importance is given to false acceptance errors
Alpha < 0.5: more importance is given to false rejection errors
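The EPC procedure (choose θ on a development set for each α, then measure V on the test set) for person authentication can be sketched under the assumption that the criterion is $C = \alpha \cdot \mathrm{FAR} + (1 - \alpha) \cdot \mathrm{FRR}$ on the development set and that HTER is reported on the test set (function names and data are hypothetical). With this weighting, a larger α selects a threshold with fewer false acceptances:

```python
from typing import List, Sequence, Tuple


def far_frr_at(scores: Sequence[float], labels: Sequence[int], theta: float) -> Tuple[float, float]:
    """FAR and FRR for the rule: accept iff score > theta (labels: 1 client, 0 impostor)."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return (sum(s > theta for s in neg) / len(neg),
            sum(s <= theta for s in pos) / len(pos))


def select_threshold(scores: Sequence[float], labels: Sequence[int], alpha: float) -> float:
    """Threshold minimising alpha * FAR + (1 - alpha) * FRR (first candidate wins ties)."""
    candidates = [min(scores) - 1.0] + sorted(set(scores))

    def cost(theta: float) -> float:
        far, frr = far_frr_at(scores, labels, theta)
        return alpha * far + (1 - alpha) * frr

    return min(candidates, key=cost)


def epc_hter(dev_scores: Sequence[float], dev_labels: Sequence[int],
             test_scores: Sequence[float], test_labels: Sequence[int],
             alphas: Sequence[float]) -> List[Tuple[float, float]]:
    """(alpha, HTER on test) pairs, with theta chosen on the development set."""
    curve = []
    for alpha in alphas:
        theta = select_threshold(dev_scores, dev_labels, alpha)
        far, frr = far_frr_at(test_scores, test_labels, theta)
        curve.append((alpha, (far + frr) / 2))
    return curve


labels = [1, 1, 0, 1, 0, 0]
dev = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3]
test = [1.0, 0.9, 0.7, 0.65, 0.5, 0.4]
print(select_threshold(dev, labels, 0.9), select_threshold(dev, labels, 0.1))
print(epc_hter(dev, labels, test, labels, [0.1, 0.5, 0.9]))
```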
29
EPC in real applications
Expected Performance Curves for person authentication, where one wants to trade off false acceptance rates against false rejection rates.
Expected Performance Curves for text categorization, where one wants to trade off precision against recall and plot the F1 measure.
30
Confidence Interval
Confidence intervals are used to indicate the reliability of an estimate.
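One simple way to attach a confidence interval to an HTER value, sketched under the assumption that FAR and FRR are independent binomial proportions (normal approximation); this is not necessarily the exact procedure used in the paper, and the function name and numbers are hypothetical:

```python
import math
from typing import Tuple


def hter_confidence_interval(far: float, frr: float, n_neg: int, n_pos: int,
                             z: float = 1.96) -> Tuple[float, float]:
    """Approximate two-sided CI for HTER = (FAR + FRR) / 2.

    Assumes FAR and FRR are independent binomial proportions measured on
    n_neg negative and n_pos positive test examples (normal approximation);
    z = 1.96 corresponds to roughly 95% coverage.
    """
    hter = (far + frr) / 2
    # Var((FAR + FRR) / 2) = Var(FAR) / 4 + Var(FRR) / 4
    variance = far * (1 - far) / (4 * n_neg) + frr * (1 - frr) / (4 * n_pos)
    half_width = z * math.sqrt(variance)
    return hter - half_width, hter + half_width


# Hypothetical result: FAR = 10% on 1000 impostor trials, FRR = 20% on 200 client trials.
low, high = hter_confidence_interval(0.10, 0.20, n_neg=1000, n_pos=200)
print(f"HTER 95% CI: [{low:.3f}, {high:.3f}]")
```

Computing such an interval at every α gives a band around the EPC; fewer test examples give a wider band.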
31
My opinion
• The authors have a point, and the idea is good.
• Good for comparing models…
• …but it is hard to read much from an EPC; the ROC is more informative.
• The EPC is cumbersome to compute.
• Useful… maybe?
• Apparently only used by the authors?
32
End of Line
Questions
Discussion