Receiver Operating Characteristic (ROC) Curves
homepage.stat.uiowa.edu/.../ROC_introduction.pdf


  • Receiver Operating Characteristic (ROC) Curves

    Evaluating a classifier and predictive performance

    [Figure: ROC curve; x-axis: 1-Specificity (i.e. % of true negatives incorrectly declared positive), the false positive rate; y-axis: Sensitivity (% of true positives declared positive), the true positive rate]

  • Classification (supervised learning)

    •  In supervised learning, we are interested in building a model to predict the class (or outcome) of a new observation based on observable predictors, using the train-set/test-set framework.

    – Which students are not likely to return for their second year of college?

    – Which animals admitted to a veterinary hospital are likely to survive (or not survive)?

    – Which bank customers are likely to default on a loan?

  • Some classification procedures (fitting sketches follow below):

    •  Logistic regression (0-1 response) – Choose variables, fit model, calculate the predicted probability p̂ – p̂ ≥ 0.5: predicted class 1; p̂ < 0.5: predicted class 0

    •  Classification tree – Use a 'greedy' algorithm to choose variables and create the decision tree (prediction at the tree leaves)

    •  Linear Discriminant Analysis (LDA) – Built on multivariate normal (spheroids) – Classify a new observation to the nearest centroid
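    A minimal sketch of fitting the three procedures in R; the data frame dat, response y, and predictors x1 and x2 are hypothetical placeholders, not the slides' data:

    library(rpart)   # classification trees
    library(MASS)    # lda()

    ## Logistic regression (0-1 response): p-hat >= 0.5 => class 1
    fit.lr   = glm(y ~ x1 + x2, data = dat, family = binomial)
    p.hat    = predict(fit.lr, type = "response")
    class.lr = ifelse(p.hat >= 0.5, 1, 0)

    ## Classification tree (greedy recursive partitioning):
    fit.tree   = rpart(factor(y) ~ x1 + x2, data = dat, method = "class")
    class.tree = predict(fit.tree, type = "class")

    ## Linear discriminant analysis (classify to the nearest centroid):
    fit.lda   = lda(factor(y) ~ x1 + x2, data = dat)
    class.lda = predict(fit.lda)$class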

  • Assumptions:

    •  Logistic regression – Predictors are related to the response with a sigmoidal mean structure (illustrated below) – Conditional Bernoulli given the mean

    •  Classification tree (maybe?) – The data can be described by features – There exists some tree that isn't too big that can predict reasonably well

    •  Linear discriminant analysis – Data follow a multivariate Gaussian – Equal covariance matrices
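    A quick illustration of the sigmoidal mean structure; the coefficients b0 = -2 and b1 = 1.5 are made up for the plot:

    ## Sigmoidal mean: P(Y = 1 | x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)).
    ## plogis() is exactly this inverse-logit function:
    curve(plogis(-2 + 1.5 * x), from = -4, to = 6,
          xlab = "x", ylab = "P(Y = 1 | x)", main = "Sigmoidal mean structure")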

  • Pros/Cons of these classifiers:

    NOTE: In simulation studies, we have found that each of these classifiers performs better than the others when the data were simulated under the given model assumptions.

    Logistic regression
      Pros: Interpretation of covariates. More robust to deviations from assumptions than LDA.
      Cons: Computationally complex iterative algorithm that may not converge. Depends on sigmoidal mean structure and conditional binomial distributions.

    Trees
      Pros: Fewer assumptions about the data; more flexible algorithm; easy interpretation of certain characteristics.
      Cons: Instability (random forests can help); lack of smoothness; uses a greedy algorithm.

    LDA
      Pros: Dimensionality reduction; estimation under the assumptions uses maximum likelihood.
      Cons: Not flexible when the data deviate from the multivariate normal assumptions.

  • What makes a good classifier?

    •  Your predictions are correct. – True positives were correctly classified – True negatives were correctly classified

    •  Example: Disease diagnostic test – Patients are given the diagnostic test

    – The test gives those with the disease a 'positive': high sensitivity (correctly classifying true positives)

    – The test gives those without the disease a 'negative': high specificity (correctly classifying true negatives)

  • Misclassification rate

    •  Misclassification – The observation belongs to one class, but the model classifies it as a member of a different class.

    •  A classifier that makes no errors would be perfect – Not likely to find such a classifier in the real world – Noise – Other relevant variables not in the available dataset

  • Misclassification rate

    •  We can present the accuracy of the results with a confusion matrix (the arithmetic is sketched in code below).

    •  Overall error rate = (18 + 5)/1000 = 2.3%

    •  NOTE: The predicted values are often calculated through leave-one-out cross-validation in logistic regression.

                   Predict as 1   Predict as 0
    Actual 1            20              5
    Actual 0            18            957
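    A sketch of that arithmetic, assuming hypothetical 0-1 vectors actual and predicted (e.g. predicted = round(loo.preds)):

    ## Confusion matrix and overall error rate:
    cm = table(actual, predicted)
    overall.error = 1 - sum(diag(cm)) / sum(cm)
    ## For the matrix on this slide: (18 + 5) / 1000 = 0.023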

  • Beating the naïve classifier

    •  In this dataset, the 0's were much more prevalent than the 1's.

    •  What if we just classified ALL observations as 0? How well do we do?

    •  Overall error rate = 25/1000 = 2.5% (the naïve rule misclassifies exactly the 25 actual 1's in the confusion matrix, repeated below for reference)

                   Predict as 1   Predict as 0
    Actual 1            20              5
    Actual 0            18            957

  • Distinguishing between the two types of mistakes to be made

    •  Sensitivity = (# of true positives declared 'positive') / (# of true positives). Is your disease diagnostic tool getting the ones you want?

    •  Specificity = (# of true negatives not declared 'positive') / (# of true negatives). Is your disease diagnostic tool NOT getting the ones you DON'T want?

    •  For a given classifier, we would like both to be high (one-liners for both follow below).
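    The same two definitions as R one-liners, again assuming hypothetical 0-1 vectors actual and predicted:

    ## Sensitivity: fraction of true positives declared 'positive'
    sens = sum(actual == 1 & predicted == 1) / sum(actual == 1)
    ## Specificity: fraction of true negatives not declared 'positive'
    spec = sum(actual == 0 & predicted == 0) / sum(actual == 0)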

  • Receiver Operating Characteristic (ROC) curve

    •  To create an ROC curve, we first order the predicted probabilities from highest to lowest.

    •  The highest probabilities are predicted to have the disease (we'll want to classify those as 'disease').

    •  The lowest probabilities are predicted to not have the disease (we'll want to classify those as 'not disease').

    •  Probability = 0.5? Flip of the coin.

  • ROC curve

    •  To create an ROC curve, we start at the highest predicted probability and work our way down the list (i.e. highest prob to lowest prob). We stop at each position C (i.e. potential cutoff) and use it as the classifier (i.e. C or above = 1, below C = 0), and ask "How well does this classifier do? Sensitivity? Specificity?"

         preds
    [1,] 0.9299   ← Most likely to have disease
    [2,] 0.9033
    [3,] 0.8918
    [4,] 0.8851
    [5,] 0.8687
     ...

    For example, if we classify cases with C = 0.8918 and higher as '1' and probabilities less than C = 0.8918 as '0', how well do we do? (A sketch follows below.)
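    A sketch of evaluating one candidate cutoff, assuming preds and a hypothetical 0-1 vector actual:

    ## Use C = 0.8918 as the classifier: C or above => '1', below C => '0'
    C = 0.8918
    declared.pos = (preds >= C)
    sens = sum(declared.pos & actual == 1) / sum(actual == 1)   # sensitivity at C
    spec = sum(!declared.pos & actual == 0) / sum(actual == 0)  # specificity at C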

  • ROC curve

    •  NOTE: If we start at the top of the list and we don't go very far down the list for C, we'll probably get mostly true positives (i.e. low false positive rate, good), but there are still lots of true positives that we missed (low true positive rate, bad).

  • The ROC curve shows us what happens to sensitivity and specificity as we move our C (threshold for the classifier) down the list.

    – C at a high predicted probability (very few cases declared 'disease'): low false positive rate (good), low true positive rate (bad). This is the sensitivity and specificity info from the "first" C threshold.

    – C at a low predicted probability (almost all cases declared 'disease'): high false positive rate (bad), high true positive rate (good). This is the sensitivity and specificity info from the "last" C threshold.

    [Figure: ROC curve; x-axis: False positive rate; y-axis: True positive rate]

  • [Figure: ROC curve; x-axis: False positive rate; y-axis: True positive rate]

    Each move down the list of probabilities coincides with either a move up on the ROC curve (correct choice as 'disease') or to the right (incorrect choice when declared 'disease'). Thus, we see a 'stairstep' phenomenon in the ROC curve. For small datasets, this is very apparent.

    We essentially start at the point (0,0). If the first case (highest probability) is correctly classified when declared 'disease', we move vertically up (true positive). If the first case (highest probability) is incorrectly classified as 'disease', we move to the right (false positive).

    As we start moving down the list, we want to move up on the ROC curve, not to the right. (A toy example is sketched below.)
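    A toy sketch of the stairstep, with six made-up cases ordered by predicted probability (largest first):

    ## 1 = disease, 0 = no disease, ordered by predicted probability:
    ordered.actual = c(1, 1, 0, 1, 0, 0)
    x = c(0, cumsum(1 - ordered.actual) / sum(ordered.actual == 0))  # false positive rate
    y = c(0, cumsum(ordered.actual)     / sum(ordered.actual == 1))  # true positive rate
    plot(x, y, type = "l", xlab = "False positive rate",
         ylab = "True positive rate", main = "Stairstep ROC (toy example)")
    abline(0, 1, lty = 2)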

  • See an animation of ROC curve creation:

    http://homepage.stat.uiowa.edu/~rdecook/stat6220/ROC_animated1.html

  • What ROC shape indicates a good classifier?

    •  We want it to go up vertically very quickly – i.e. as we move down the list of predicted probabilities, we're getting all 'diseased' cases and no 'non-diseased'.

    •  We know in the end (when everyone is classified as a 'disease' case) all the negatives will be misclassified and all the positives will be correctly classified. So, we'll always end at (1,1).

  • Comparing classifiers with ROC analysis

    [Figure: several ROC curves, from the best of these to the worst of these, with the diagonal line representing choosing at random]

    The AUC, or area under the curve, compares the classifiers too. (Choosing at random corresponds to the diagonal, with AUC = 0.5.)

  • ROC curve: Provost's Office Client

    •  Example: Predict which students will not return for their second year of college.

    – We use the predicted probabilities from the logistic regression for classification.

    – If the probability of a case being in class 1 (not retained) is equal to or greater than 0.5, that case is classified as a 1.

    – Any case with an estimated probability of less than 0.5 would be classified as a 0 (retained).

  • Provost's Office Example

    •  Logistic regression variables – Stafford loan – Live on campus – First-generation college student – RAI – Selective program of study – High school GPA (a fit is sketched below)
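    A sketch of that fit; the data frame name (retention) and variable names below are hypothetical stand-ins for the Provost's Office data:

    fit = glm(NotRetained ~ StaffordLoan + LiveOnCampus + FirstGen +
                RAI + SelectiveProgram + HSGPA,
              data = retention, family = binomial)
    preds.in.sample = predict(fit, type = "response")  # predicted P(not retained)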

  • Provost's Office Example

    •  The predicted probability of not being retained is used to classify each case.

    •  Students with a very high probability are expected to not return.

    •  Students with a very low probability are expected to return for their 2nd year.

  • Provost's Office Example

    [Figure: "ROC from logistic regression classifier"; x-axis: False positive rate; y-axis: True positive rate]

    The ROC curve from the given classifier (logistic regression predicted probabilities)… meh.

    *Plot generated from the ROCR package in R.

  • Provost's Office Example

    The blue point represents the classifier based on the probability = 0.5 cutoff.

    Sensitivity: 41/859 = 4.77%
    Specificity: 4762/4799 = 99.23%
    Overall misclassification: 855/5658 = 0.1511

    [Figure: "ROC from logistic regression classifier" with the 0.5-cutoff point marked; x-axis: False positive rate; y-axis: True positive rate]

    *Point calculated by hand and added.

  • Provost's Office Example

    We can also look at accuracy (1 - misclassification rate) vs. the cutoff. The best cutoff for highest accuracy is 0.4623 (pretty close to our 0.5 threshold).

    Sensitivity: 66/859 = 7.68%
    Specificity: 4750/4799 = 98.97%
    Overall misclassification: 842/5658 = 0.1488

    [Figure: Accuracy vs. cutoff]

    *Plot generated from the ROCR package in R.

  • Provost's Office Example

    The green point represents the optimal classifier based on equal importance of sensitivity and specificity (as the max of "sensitivity + specificity"), which is the prob = 0.191 cutoff.

    Sensitivity: 55.2%
    Specificity: 77.8%
    Overall misclassification: 1456/5658 = 0.2573

    [Figure: "ROC from logistic regression classifier" with the 0.191-cutoff point marked; x-axis: False positive rate; y-axis: True positive rate]

    *Point determined from the Epi package.

  • Provost's Office Example

    Optimal cutoffs can be found from software or calculated (a sketch follows below).

    [Figure: Sensitivity, Specificity, Sens + Spec, and distance from the top-left corner to the ROC curve, each plotted vs. cutoff (prob). Marked cutoffs: max sens+spec cutoff (0.191), our 0.5 cutoff, min Euclidean distance cutoff (0.146), the cutoff where everyone is declared positive, and the cutoff where no one is declared positive.]
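    A sketch of calculating the two optimal cutoffs directly, assuming preds and the 0-1 response NotRetained from the slides:

    ## Sweep over the observed probabilities as candidate cutoffs:
    cuts = sort(unique(preds))
    sens = sapply(cuts, function(C) mean(preds[NotRetained == 1] >= C))
    spec = sapply(cuts, function(C) mean(preds[NotRetained == 0] <  C))
    cuts[which.max(sens + spec)]                  # max sens+spec cutoff
    cuts[which.min((1 - spec)^2 + (1 - sens)^2)]  # min distance from (0,1) to the ROC curve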

  • Comparing classifiers with ROCs: The curves may visually "look" different, but are they really? If we collected data again, would they look this different?

  • Bootstrapping the ROC

    Can use bootstrapping to create a confidence band for the sensitivity (given specificity) for the ROC curve.

    [Figure: "ROC curve with 95% CIs"; x-axis: Specificity (%); y-axis: Sensitivity (%); AUC: 71.8% (69.9%–73.7%)]

    *Plot generated from the pROC package in R.

  • Bootstrapping the ROC

    But the bootstrap results don't always look so nice.

    [Figure: "ROC curve with 95% CIs"; x-axis: Specificity (%); y-axis: Sensitivity (%); AUC: 83.0% (60.4%–100.0%). Inset ROC curve; x-axis: 1-Specificity (i.e. % of true negatives incorrectly declared positive); y-axis: Sensitivity (% of true positives declared positive)]

    *Plot generated from the pROC package in R.

  • Bootstrapping ROC curves for comparison

    Semi-transparent coloring in R can come in handy:

    # semi-transparent red:  col="#ff000030"
    # semi-transparent blue: col="#0000ff20"

    [Figure: "ROC curve with 95% Bootstrapped CIs"; x-axis: Specificity (%); y-axis: Sensitivity (%)]

    *Plot generated from the pROC package in R.

  • ############################################
    ## 1a) Manually scripting the ROC curve:
    ############################################

    ## 'NotRetained' is the 0-1 response variable.
    ## 'preds' are the leave-one-out predicted probabilities.

    ## Use 0.5 as the cutoff for classifying:
    pred.outcome = round(preds)

    ## Gather the data for the ROC curve:
    ROC.info.LogReg = data.frame(NotRetained, preds, pred.outcome)
    ## Put in order by predicted probability, largest first:
    ROC.info.LogReg = ROC.info.LogReg[rev(order(ROC.info.LogReg$preds)), ]

    ## A 'positive' will be considered as someone who was not retained.
    (num.true.pos = sum(NotRetained == 1))
    (num.true.neg = sum(NotRetained == 0))

    ## Set up an empty plot, then draw the ROC curve and the diagonal:
    plot(c(0, 1), c(0, 1), type = "n",
         xlab = "1-Specificity (i.e. % of true neg's incorrectly declared positive)",
         ylab = "Sensitivity (% of true pos's declared positive)",
         main = "ROC curve (logistic regression)",
         sub = "False Positive Rate")
    x = cumsum(1 - ROC.info.LogReg$NotRetained) / num.true.neg
    y = cumsum(ROC.info.LogReg$NotRetained) / num.true.pos
    lines(x, y)
    abline(0, 1)
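    For reference, a minimal sketch of how such leave-one-out predicted probabilities might be produced; the retention data frame and the formula NotRetained ~ . are hypothetical, and the loop is slow for large n:

    n = nrow(retention)
    loo.preds = numeric(n)
    for (i in 1:n) {
      ## Refit without case i, then predict case i:
      fit.i = glm(NotRetained ~ ., data = retention[-i, ], family = binomial)
      loo.preds[i] = predict(fit.i, newdata = retention[i, ], type = "response")
    }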

  • ####################################
    ## 1b) Manually calculating AUC:
    ####################################

    ## Calculate AUC by summing the areas of the trapezoids under the curve:
    ## http://stats.stackexchange.com/questions/145566/how-to-calculate-area-under-the-curve-auc-or-the-c-statistic-by-hand

    ## Same x, y labeling as the previous slide:
    x = cumsum(1 - ROC.info.LogReg$NotRetained) / num.true.neg
    y = cumsum(ROC.info.LogReg$NotRetained) / num.true.pos

    n = length(x)                   # number of points on the curve
    height = (y[-1] + y[-n]) / 2    # average height of each trapezoid
    width = diff(x)                 # width of each trapezoid
    sum(height * width)             # AUC

  • ##########################################################
    ## 1c) Manually calculating sensitivity, specificity,   ##
    ## misclassification rate for the 0.5 threshold:        ##
    ##########################################################

    ## All the needed info is in the following table:
    table(NotRetained, pred.outcome)
    #                pred.outcome
    # NotRetained       0     1
    #           0    4762    37
    #           1     818    41
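    Reading the three quantities off that table (a sketch, using the row/column labels above):

    cm = table(NotRetained, pred.outcome)
    sens     = cm["1", "1"] / sum(cm["1", ])            # 41/859    = 4.77%
    spec     = cm["0", "0"] / sum(cm["0", ])            # 4762/4799 = 99.23%
    misclass = (cm["0", "1"] + cm["1", "0"]) / sum(cm)  # 855/5658  = 0.1511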

  • ################################################
    ## 2) ROCR package for creating ROC curve:
    ################################################
    library(ROCR)

    pred = prediction(preds, NotRetained)
    perf = performance(pred, "tpr", "fpr")
    plot(perf, main = "ROC from logistic regression classifier")
    (AUC.ROCR = performance(pred, "auc")@y.values[[1]])

    ## Plot accuracy (1 - misclassification rate) vs. cutoff:
    acc = performance(pred, "acc")
    (ac.val = max(unlist(acc@y.values)))
    # [1] 0.8511842

    ## Find the cutoff achieving the maximum accuracy:
    th = unlist(acc@x.values)[unlist(acc@y.values) == ac.val]

    plot(acc)
    abline(v = th, col = "grey", lty = 2)
    th
    # [1] 0.462324

  • ################################################
    ## 3) pROC package for bootstrapped ROC:
    ################################################
    library(pROC)

    ## 'NotRetained' is the 0-1 response variable.
    ## 'preds' are the leave-one-out predicted probabilities.

    ## Use 0.5 as the cutoff for classifying:
    pred.outcome = round(preds)

    rocobj = plot.roc(NotRetained, preds, main = "ROC curve with 95% CIs",
                      percent = TRUE, ci = TRUE, print.auc = TRUE)

    ## Calculate the CI of sensitivity at a select set of
    ## specificities and form a 'band' (might take a bit):
    ciobj = ci.se(rocobj, specificities = seq(0, 100, 5))
    plot(ciobj, type = "shape", col = "#1c61b6AA")  # blue band

    ## Use the bootstrap to add a CI in both directions at the "best" cutoff:
    plot(ci(rocobj, of = "thresholds", thresholds = "best"),
         col = "yellow", lwd = 2)

  • ## Overlay bootstrapped ROCs:
    rocobj = plot.roc(NotRetained, preds,
                      main = "ROC curve with 95% Bootstrapped CIs",
                      percent = TRUE, ci = TRUE, print.auc = FALSE)
    ciobj = ci.se(rocobj, specificities = seq(0, 100, 5))
    plot(ciobj, type = "shape", col = "#ff000030")  # semi-transparent red

    ## Gather info on the second classifier:
    ROC.info.LogReg.2 = data.frame(NotRetained, preds.2, pred.outcome.2)

    ## Overlay the second ROC curve onto the first:
    rocobj2 = plot.roc(ROC.info.LogReg.2$NotRetained, ROC.info.LogReg.2$preds.2,
                       percent = TRUE, ci = TRUE, print.auc = FALSE, add = TRUE)
    ciobj2 = ci.se(rocobj2, specificities = seq(0, 100, 5))
    plot(ciobj2, type = "shape", col = "#0000ff20")  # semi-transparent blue

  • Some references

    •  Flach, P. A. (2016). ROC Analysis. Chapter in Encyclopedia of Machine Learning and Data Mining.
       – https://research-information.bristol.ac.uk/files/94977288/Peter_Flach_ROC_Analysis.pdf

    •  James, G., Witten, D., Hastie, T. and Tibshirani, R. (2015). An Introduction to Statistical Learning with Applications in R.
       – http://www-bcf.usc.edu/~gareth/ISL/ – Click 'Download the book PDF'

    •  Fawcett, T. (2006). An Introduction to ROC Analysis. Pattern Recognition Letters 27, pp. 861-874.