Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of...
What ROC Curves Can't Do
(and Cost Curves Can)
Robert Holte
Computing Science Dept.
University of Alberta
Chris Drummond
Research Officer NRC
Adjunct Professor SITE
Cost Curves
Scalar Measures Don’t Even Come Close
How to Evaluate Performance?
• Scalar measure summarizing performance
– Accuracy
– Expected cost
• Performance Visualization Techniques
– ROC curve
– Area under the ROC curve
– Cost Curve
The Lure of Scalar Measures
“…it is often preferable to employ a single value
measure which summarizes the performance
of a classifier, e.g. because there are several
classifiers to be compared and there is no
clear dominance of one ROC curve above the
others.
The most widely used single measure is the
Area Under the ROC Curve …”
– paraphrase from a workshop paper
What’s Genuinely Good About Scalar Measures?
• We know how to
– average them,
– compute confidence intervals,
– test for significance, etc.
• and there is off-the-shelf software to do these calculations for us.
• But they often hide important details.
2 Splitting Criteria for C4.5
[Figure: ROC curves (True Positive Rate vs. False Positive Rate) for Criterion-A and Criterion-D on the Appendicitis Dataset]
The key question is NOT "Is A better than D?" BUT "When is A better than D?"
Is AUC=0.95 better than AUC=0.75?
[Figure: two ROC curves, True Positive Rate vs. False Positive Rate]
FP = 0.75, TP = 1.0 (AUC = 0.75)
FP = 0, TP = 0.95 (AUC = 0.95)
When positives >> negatives (25:1), AUC=0.95 has more than twice the error rate of AUC=0.75.
Cost Curves
Error Curves
[Figure: Error Rate (0.0 to 1.0) vs. Probability of Positive P(+) (0.0 to 1.0). Each classifier is a straight line running from FP at P(+) = 0 to FN = 1-TP at P(+) = 1.]
Classifier 1: TP = 0.4, FP = 0.3
Classifier 2: TP = 0.7, FP = 0.5
Classifier 3: TP = 0.6, FP = 0.2
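The straight lines on this slide follow the formula given on the "Taking Costs Into Account" slide, Y = FN•X + FP•(1-X) with X = P(+). A minimal Python sketch, assuming that formula and the three (TP, FP) pairs listed here; the function name `error_rate` is illustrative and not from the slides:

```python
def error_rate(tp, fp, p_pos):
    """Error rate of a classifier with true/false positive rates tp, fp
    when a fraction p_pos of the examples is positive."""
    fn = 1.0 - tp                      # false negative rate
    return fn * p_pos + fp * (1.0 - p_pos)

# (TP, FP) pairs taken from the slide
classifiers = {
    "Classifier 1": (0.4, 0.3),
    "Classifier 2": (0.7, 0.5),
    "Classifier 3": (0.6, 0.2),
}

for name, (tp, fp) in classifiers.items():
    left = error_rate(tp, fp, 0.0)     # equals FP
    mid = error_rate(tp, fp, 0.5)      # balanced classes
    right = error_rate(tp, fp, 1.0)    # equals FN = 1 - TP
    print(f"{name}: P(+)=0 -> {left:.2f}, P(+)=0.5 -> {mid:.2f}, P(+)=1 -> {right:.2f}")
```

Because each classifier is a single line, the classifier with the lowest error can change as P(+) moves from 0 to 1.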
Operating Range
[Figure: Error Rate (0.0 to 1.0) vs. Probability of Positive P(+) (0.0 to 1.0), showing a classifier's line, the trivial classifiers "always negative" and "always positive", and the classifier's operating range marked on the P(+) axis]
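In the cost-curve literature, a classifier's operating range is usually described as the set of P(+) values where it has lower error than both trivial classifiers. A sketch of that calculation, assuming the error line Y = FN•X + FP•(1-X), with "always negative" having error P(+) and "always positive" having error 1-P(+); the function name `operating_range` is illustrative:

```python
def operating_range(tp, fp):
    """Interval of P(+) on which the classifier's error line lies below
    both trivial classifiers. Returns None if there is no such interval."""
    fn, tn = 1.0 - tp, 1.0 - fp
    if tp <= fp:
        return None  # no better than chance: never beats both trivial classifiers
    # Beats "always negative" (error = P(+)) when P(+) > FP / (FP + TP)
    lower = fp / (fp + tp)
    # Beats "always positive" (error = 1 - P(+)) when P(+) < TN / (TN + FN)
    upper = tn / (tn + fn)
    return lower, upper

# Example with TP = 0.6, FP = 0.2 (Classifier 3 on the Error Curves slide)
print(operating_range(0.6, 0.2))
```

Outside this interval, one of the trivial classifiers has a lower error rate than the classifier itself.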
Lower Envelope
[Figure: Error Rate (0.0 to 1.0) vs. Probability of Positive P(+) (0.0 to 1.0), showing the lower envelope of the classifier lines together with "always negative" and "always positive"]
The lower envelope is a biased estimate of performance.
Fresh data is needed to get an unbiased estimate.
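A lower envelope can be sketched as the pointwise minimum over all the error lines. The snippet below is an illustration under that assumption, evaluated on a grid of P(+) values; the name `lower_envelope` and the grid size are hypothetical choices, not part of the slides:

```python
def lower_envelope(classifiers, grid_size=101):
    """Pointwise minimum error rate over a set of (tp, fp) classifiers,
    evaluated on an evenly spaced grid of P(+) values in [0, 1]."""
    # Trivial classifiers: "always negative" is (TP=0, FP=0),
    # "always positive" is (TP=1, FP=1).
    candidates = list(classifiers) + [(0.0, 0.0), (1.0, 1.0)]
    envelope = []
    for i in range(grid_size):
        x = i / (grid_size - 1)
        best = min((1.0 - tp) * x + fp * (1.0 - x) for tp, fp in candidates)
        envelope.append((x, best))
    return envelope

# (TP, FP) pairs from the Error Curves slide
env = lower_envelope([(0.4, 0.3), (0.7, 0.5), (0.6, 0.2)])
for x, y in env[::25]:
    print(f"P(+)={x:.2f}: lowest error {y:.3f}")
```

Since the minimum is chosen using the same data that produced the lines, the result is optimistic, which is the bias the slide warns about.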
Taking Costs Into Account
Error curve: X = p(+), Y = error rate
Cost curve: X = PC(+) - Probability Cost, Y = E[cost] normalized to [0,1]
PC(+) = p(+)•C(-|+) / ( p(+)•C(-|+) + p(-)•C(+|-) )
Y = FN•X + FP•(1-X)
[Figure: the Error Rate vs. Probability of Positive P(+) axes become Normalized Expected Cost vs. PC(+); the line runs from FP at X = 0 to FN at X = 1]
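The change of axes can be sketched in a few lines of Python. This assumes the reading of the slide in which PC(+) is p(+)•C(-|+) divided by p(+)•C(-|+) + p(-)•C(+|-); the function and parameter names are illustrative only:

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """PC(+), the x-axis of a cost curve.
    cost_fn is C(-|+), the cost of labelling a positive as negative;
    cost_fp is C(+|-), the cost of labelling a negative as positive.
    Assumes the two terms are not both zero."""
    pos_term = p_pos * cost_fn
    neg_term = (1.0 - p_pos) * cost_fp
    return pos_term / (pos_term + neg_term)

def normalized_expected_cost(tp, fp, pc_pos):
    """Y = FN*X + FP*(1-X) with X = PC(+)."""
    fn = 1.0 - tp
    return fn * pc_pos + fp * (1.0 - pc_pos)

# 20% positives, but a missed positive costs four times a false alarm
x = probability_cost(p_pos=0.2, cost_fn=4.0, cost_fp=1.0)
print(x, normalized_expected_cost(tp=0.6, fp=0.2, pc_pos=x))
```

With equal misclassification costs, PC(+) reduces to p(+) and the cost curve coincides with the error curve.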
Comparing Cost Curves
[Figure: Normalized Expected Cost vs. PC(+) - Probability Cost]
ROC, Selection Procedure
[Figure: True Positive Rate vs. False Positive Rate]
Suppose this classifier was produced by a training set with a class ratio of 10:1, and was used whenever the deployment situation had a 10:1 class ratio.
Cost Curves, Selection Procedure
[Figure: Normalized Expected Cost vs. PC(+) - Probability Cost]
Averaging Cost Curves
[Figure: Normalized Expected Cost vs. PC(+) - Probability Cost]
Averaging ROC Curves
[Figure: True Positive Rate vs. False Positive Rate]
Confidence Intervals
Original (TP = 0.78, FP = 0.40)
True \ Predicted   pos   neg
pos                 78    22
neg                 40    60
Resample #1 (TP = 0.75, FP = 0.45)
True \ Predicted   pos   neg
pos                 75    25
neg                 45    55
Resample #2 (TP = 0.83, FP = 0.38)
True \ Predicted   pos   neg
pos                 83    17
neg                 38    62
Resample the confusion matrix 10000 times and take the 95% envelope.
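One possible reading of this procedure is sketched below. The slide does not spell out the resampling scheme; because both resampled matrices keep 100 examples in each true class, the sketch assumes each row is resampled separately, and it assumes the envelope is taken pointwise from percentiles at each PC(+) value. All function names are illustrative:

```python
import random

def resample_rates(n_tp, n_fn, n_fp, n_tn, rng):
    """Draw one bootstrap confusion matrix, resampling each true class
    separately (row totals stay fixed), and return its (TP, FP) rates."""
    n_pos, n_neg = n_tp + n_fn, n_fp + n_tn
    tp_rate, fp_rate = n_tp / n_pos, n_fp / n_neg
    new_tp = sum(rng.random() < tp_rate for _ in range(n_pos))
    new_fp = sum(rng.random() < fp_rate for _ in range(n_neg))
    return new_tp / n_pos, new_fp / n_neg

def cost_curve_band(n_tp, n_fn, n_fp, n_tn, n_resamples=10000,
                    grid_size=11, level=0.95, seed=0):
    """Pointwise band for normalized expected cost over a PC(+) grid."""
    rng = random.Random(seed)
    rates = [resample_rates(n_tp, n_fn, n_fp, n_tn, rng)
             for _ in range(n_resamples)]
    lo_idx = int((1.0 - level) / 2.0 * (n_resamples - 1))
    hi_idx = int((1.0 + level) / 2.0 * (n_resamples - 1))
    band = []
    for i in range(grid_size):
        x = i / (grid_size - 1)
        costs = sorted((1.0 - tp) * x + fp * (1.0 - x) for tp, fp in rates)
        band.append((x, costs[lo_idx], costs[hi_idx]))
    return band

# Original confusion matrix from the slide: TP=78, FN=22, FP=40, TN=60
for x, lo, hi in cost_curve_band(78, 22, 40, 60, n_resamples=2000):
    print(f"PC(+)={x:.1f}: [{lo:.3f}, {hi:.3f}]")
```

The example uses 2000 resamples to keep the run short; the slide's figure of 10000 is the function's default.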
Confidence Interval Example
[Figure: Normalized Expected Cost vs. PC(+) - Probability Cost]
Paired Resampling to Test Statistical Significance
For the 100 test examples in the negative class:
Classifier1 \ Classifier2   pos   neg
pos                          30    10
neg                           0    60
FP for classifier1: (30+10)/100 = 0.40
FP for classifier2: (30+0)/100 = 0.30
FP2 – FP1 = -0.10
Resample this matrix 10000 times to get (FP2-FP1) values.
Do the same for the matrix based on positive test examples.
Plot and take 95% envelope as before.
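The FP half of this procedure might look like the sketch below. It assumes the 100 negative examples are resampled with replacement from the four cells of the paired table and that a percentile interval stands in for the 95% envelope; the slide does not give these details, and the name `paired_fp_difference` is hypothetical:

```python
import random

def paired_fp_difference(counts, n_resamples=10000, level=0.95, seed=0):
    """Bootstrap interval for FP2 - FP1 from a paired table on negative
    examples. counts = (both_pos, only_c1_pos, only_c2_pos, both_neg)."""
    rng = random.Random(seed)
    n = sum(counts)
    cells = range(4)
    diffs = []
    for _ in range(n_resamples):
        sample = rng.choices(cells, weights=counts, k=n)
        only_c1 = sample.count(1)   # false positive for classifier1 only
        only_c2 = sample.count(2)   # false positive for classifier2 only
        # False positives shared by both classifiers cancel in FP2 - FP1
        diffs.append((only_c2 - only_c1) / n)
    diffs.sort()
    lo = diffs[int((1.0 - level) / 2.0 * (n_resamples - 1))]
    hi = diffs[int((1.0 + level) / 2.0 * (n_resamples - 1))]
    return lo, hi

# Table from the slide: 30 both pos, 10 only classifier1 pos,
# 0 only classifier2 pos, 60 both neg
lo, hi = paired_fp_difference((30, 10, 0, 60), n_resamples=2000)
print(f"95% interval for FP2 - FP1: [{lo:.2f}, {hi:.2f}]")
```

Resampling the paired table, rather than each classifier's results separately, preserves the correlation between the two classifiers' errors on the same examples.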
[Figure: Normalized Expected Cost vs. PC(+) - Probability Cost, with curves labelled classifier1 and classifier2 and the labels FP2-FP1, FN2-FN1, and 0]
Low correlation = Low significance
Comparing J48 and AdaBoost
Conclusions
• Scalar performance measures, including
AUC, do not indicate when one classifier is
better than another.
• Cost/ROC curve software is available
• Cost curves enable easy visualization of
– average performance (expected cost)
– operating range
– confidence intervals on performance
– difference in performance and its significance.