Post on 19-Jan-2018
ROC curve estimation
Index
• Introduction to ROC
• ROC curve
• Area under ROC curve
• Visualization using ROC curve
ROC curve
• ROC originally stands for Receiver Operating Characteristic curve.
• It is widely used in biomedical applications such as radiology and imaging.
• An important use is to assess classifiers in machine learning.
Example situation
• Consider a diagnostic test for a disease.
• The test has 2 possible outcomes: positive or negative.
• Based on this, the next slides explain the various notations used in ROC curves.
Data distribution available
[Figure: distributions of the test result for patients with the disease and for patients without the disease.]

Threshold
[Figure: a threshold on the test result divides the patients into those called "negative" and those called "positive".]
True Positives
[Figure: patients with the disease who are called "positive" are the true positives.]
Some definitions ...
False Positives
[Figure: patients without the disease who are called "positive" are the false positives.]
True Negatives
[Figure: patients without the disease who are called "negative" are the true negatives.]
False Negatives
[Figure: patients with the disease who are called "negative" are the false negatives.]
Confusion Matrix
• A confusion matrix is a matrix consisting of two rows and two columns.
• If the confusion matrix is called CMat, its entries are oriented as follows:
• CMat[1][1] = True Positives and CMat[1][2] = False Positives.
• Similarly, CMat[2][1] = False Negatives and CMat[2][2] = True Negatives.
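The layout above can be sketched in Python (illustrative only: `confusion_matrix` here is a local helper written for these slides, not a library function, and Python indexes from 0 where the slides index from 1):

```python
def confusion_matrix(y_true, y_pred):
    """Return CMat in the slides' orientation:
    CMat[0][0]=TP, CMat[0][1]=FP, CMat[1][0]=FN, CMat[1][1]=TN.
    Labels: 1 = has the disease / tests positive, 0 otherwise."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return [[tp, fp], [fn, tn]]

# Made-up example: six patients, three with the disease
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(confusion_matrix(y_true, y_pred))  # [[2, 1], [1, 2]]
```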
2-class Confusion Matrix
• Reduce the 4 numbers to two rates:
  true positive rate = TP = (#TP)/(#P)
  false positive rate = FP = (#FP)/(#N)
• Rates are independent of class ratio*

                 Predicted class
True class       positive   negative
positive (#P)    #TP        #P - #TP
negative (#N)    #FP        #N - #FP
Comparing classifiers using Confusion Matrix

Classifier 1:
        Predicted
True    pos   neg
pos      60    40
neg      20    80
TP = 0.6, FP = 0.2

Classifier 2:
        Predicted
True    pos   neg
pos      70    30
neg      50    50
TP = 0.7, FP = 0.5

Classifier 3:
        Predicted
True    pos   neg
pos      40    60
neg      30    70
TP = 0.4, FP = 0.3
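The two rates follow directly from any table of this shape; a minimal Python sketch, reusing one table's counts (60/40 and 20/80) as an example:

```python
def rates(cm):
    """cm rows are the true classes, columns the predictions,
    as in the tables above: cm = [[TP, FN], [FP, TN]]."""
    tp, fn = cm[0]
    fp, tn = cm[1]
    tpr = tp / (tp + fn)   # (#TP)/(#P)
    fpr = fp / (fp + tn)   # (#FP)/(#N)
    return tpr, fpr

print(rates([[60, 40], [20, 80]]))  # (0.6, 0.2)
```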
Interpretations from the Confusion Matrix
• The following metrics can be calculated from the confusion matrix and used to evaluate a classifier:
• Accuracy = (TP+TN)/(TP+TN+FP+FN)
• Precision = TP/(TP+FP)
• Recall = TP/(TP+FN)
• F-Score = 2*recall*precision/(recall + precision)
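These formulas can be checked with a short Python sketch (the counts below are made-up example values, not data from the slides):

```python
def metrics(tp, fp, fn, tn):
    """Compute the four evaluation metrics from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_score   = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f_score

acc, prec, rec, f1 = metrics(60, 20, 40, 80)
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))
# 0.7 0.75 0.6 0.67
```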
ROC curve
[Figure: True Positive Rate (sensitivity), 0% to 100%, plotted against False Positive Rate (1 - specificity), 0% to 100%.]
ROC curve comparison
[Figure: two ROC plots of True Positive Rate (0%-100%) against False Positive Rate (0%-100%), labelled "A good test" and "A poor test".]
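One way to trace such a curve is to sweep the decision threshold across the classifier's scores and record an (FPR, TPR) point at each setting; a minimal Python sketch with made-up scores:

```python
def roc_points(scores, labels):
    """Sweep the threshold over every observed score (highest first)
    and return one (FPR, TPR) point per threshold, starting at (0, 0)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]  # classifier scores (made up)
labels = [1,   1,   0,   1,   0,   0]    # 1 = diseased, 0 = healthy
print(roc_points(scores, labels))
```

Plotting these points with any charting tool reproduces the curves sketched above; a good classifier's points bow toward the top-left corner.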
Area under ROC curve (AUC)
• Overall measure of test performance.
• Comparisons between two tests are based on differences between their (estimated) AUCs.
• For continuous data, AUC is equivalent to the Mann-Whitney U-statistic (a nonparametric test of difference in location between two populations).
• It gives a single-number summary of a classifier's accuracy in machine learning.
AUC for ROC curves
[Figure: four ROC plots of True Positive Rate (0%-100%) against False Positive Rate (0%-100%), with AUC = 50%, AUC = 90%, AUC = 65%, and AUC = 100%.]
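The equivalence with the Mann-Whitney U-statistic noted earlier gives a direct way to compute AUC: it is the fraction of (diseased, healthy) patient pairs in which the diseased patient receives the higher score, with ties counting half. A Python sketch with made-up scores:

```python
def auc(scores, labels):
    """AUC as the Mann-Whitney rank statistic: fraction of
    (positive, negative) pairs ranked correctly, ties counting 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Same made-up scores as before: 8 of the 9 pairs are ranked correctly
print(auc([0.9, 0.8, 0.7, 0.6, 0.4, 0.3], [1, 1, 0, 1, 0, 0]))
```

A perfect classifier scores 1.0 (AUC = 100%) and random scoring tends toward 0.5 (AUC = 50%), matching the figure above.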
Further evaluation methods
• ROC curve based visualization.
• Visualizing the ROC curve is a very good method of evaluating a classifier.
• Tools like Matlab, Weka and Orange provide facilities to support visualization of the ROC curve.
• ROCR, an R package, is one such tool that provides effective visualization.