Receiver Operating Characteristic (ROC) Curves
homepage.stat.uiowa.edu/.../ROC_introduction.pdf


  • Receiver Operating Characteristic (ROC) Curves

    Evaluating a classifier and predictive performance

    [Figure: ROC curve; x-axis: 1-Specificity (i.e. % of true negatives incorrectly declared positive), the false positive rate; y-axis: Sensitivity (% of true positives declared positive), the true positive rate]

  • Classification (supervised learning)

    •  In supervised learning, we are interested in building a model to predict the class (or outcome) of a new observation based on observable predictors, using the train-set/test-set framework.

    – Which students are not likely to return for their second year of college?

    – Which animals admitted to a veterinary hospital are likely to survive (or not survive)?

    – Which bank customers are likely to default on a loan?

  • Some classification procedures (fitting sketches follow below):

    •  Logistic regression (0-1 response) – Choose variables, fit model, calculate the predicted probability p̂ – p̂ ≥ 0.5: predicted class 1; p̂ < 0.5: predicted class 0

    •  Classification tree – Use a 'greedy' algorithm to choose variables and create the decision tree (prediction at the tree leaves)

    •  Linear Discriminant Analysis (LDA) – Built on multivariate normal (spheroids) – Classify a new observation to the nearest centroid
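    A minimal sketch of fitting the three procedures in R; the data frame dat, response y, and predictors x1 and x2 are hypothetical placeholders, not the slides' data:

    library(rpart)   # classification trees
    library(MASS)    # lda()

    ## Logistic regression (0-1 response): p-hat >= 0.5 => class 1
    fit.lr   = glm(y ~ x1 + x2, data = dat, family = binomial)
    p.hat    = predict(fit.lr, type = "response")
    class.lr = ifelse(p.hat >= 0.5, 1, 0)

    ## Classification tree (greedy recursive partitioning):
    fit.tree   = rpart(factor(y) ~ x1 + x2, data = dat, method = "class")
    class.tree = predict(fit.tree, type = "class")

    ## Linear discriminant analysis (classify to the nearest centroid):
    fit.lda   = lda(factor(y) ~ x1 + x2, data = dat)
    class.lda = predict(fit.lda)$class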

  • Assumptions:

    •  Logistic regression – Predictors are related to the response with a sigmoidal mean structure (illustrated below) – Conditional Bernoulli given the mean

    •  Classification tree (maybe?) – The data can be described by features – There exists some tree that isn't too big that can predict reasonably well

    •  Linear discriminant analysis – Data follow a multivariate Gaussian – Equal covariance matrices
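    A quick illustration of the sigmoidal mean structure; the coefficients b0 = -2 and b1 = 1.5 are made up for the plot:

    ## Sigmoidal mean: P(Y = 1 | x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)).
    ## plogis() is exactly this inverse-logit function:
    curve(plogis(-2 + 1.5 * x), from = -4, to = 6,
          xlab = "x", ylab = "P(Y = 1 | x)", main = "Sigmoidal mean structure")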

  • Pros/Cons of these classifiers:

    NOTE: In simulation studies, we have found that each of these classifiers performs better than the others when the data were simulated under the given model assumptions.

    Logistic regression
      Pros: Interpretation of covariates. More robust to deviations from assumptions than LDA.
      Cons: Computationally complex iterative algorithm that may not converge. Depends on sigmoidal mean structure and conditional binomial distributions.

    Trees
      Pros: Fewer assumptions about the data; more flexible algorithm; easy interpretation of certain characteristics.
      Cons: Instability (random forests can help); lack of smoothness; uses a greedy algorithm.

    LDA
      Pros: Dimensionality reduction; estimation under the assumptions uses maximum likelihood.
      Cons: Not flexible when the data deviate from the multivariate normal assumptions.

  • What makes a good classifier?

    •  Your predictions are correct. – True positives were correctly classified – True negatives were correctly classified

    •  Example: Disease diagnostic test – Patients are given the diagnostic test

    – The test gives those with the disease a 'positive': high sensitivity (correctly classifying true positives)

    – The test gives those without the disease a 'negative': high specificity (correctly classifying true negatives)

  • Misclassification rate

    •  Misclassification – The observation belongs to one class, but the model classifies it as a member of a different class.

    •  A classifier that makes no errors would be perfect – Not likely to find such a classifier in the real world – Noise – Other relevant variables not in the available dataset

  • Misclassification rate

    •  We can present the accuracy of the results with a confusion matrix (the arithmetic is sketched in code below).

    •  Overall error rate = (18 + 5)/1000 = 2.3%

    •  NOTE: The predicted values are often calculated through leave-one-out cross-validation in logistic regression.

                   Predict as 1   Predict as 0
    Actual 1            20              5
    Actual 0            18            957
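    A sketch of that arithmetic, assuming hypothetical 0-1 vectors actual and predicted (e.g. predicted = round(loo.preds)):

    ## Confusion matrix and overall error rate:
    cm = table(actual, predicted)
    overall.error = 1 - sum(diag(cm)) / sum(cm)
    ## For the matrix on this slide: (18 + 5) / 1000 = 0.023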

  • Beating the naïve classifier

    •  In this dataset, the 0's were much more prevalent than the 1's.

    •  What if we just classified ALL observations as 0? How well do we do?

    •  Overall error rate = 25/1000 = 2.5% (the naïve rule misclassifies exactly the 25 actual 1's in the confusion matrix, repeated below for reference)

                   Predict as 1   Predict as 0
    Actual 1            20              5
    Actual 0            18            957

  • Distinguishing between the two types of mistakes to be made

    •  Sensitivity = (# of true positives declared 'positive') / (# of true positives). Is your disease diagnostic tool getting the ones you want?

    •  Specificity = (# of true negatives not declared 'positive') / (# of true negatives). Is your disease diagnostic tool NOT getting the ones you DON'T want?

    •  For a given classifier, we would like both to be high (one-liners for both follow below).
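    The same two definitions as R one-liners, again assuming hypothetical 0-1 vectors actual and predicted:

    ## Sensitivity: fraction of true positives declared 'positive'
    sens = sum(actual == 1 & predicted == 1) / sum(actual == 1)
    ## Specificity: fraction of true negatives not declared 'positive'
    spec = sum(actual == 0 & predicted == 0) / sum(actual == 0)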

  • Receiver Operating Characteristic (ROC) curve

    •  To create an ROC curve, we first order the predicted probabilities from highest to lowest.

    •  The highest probabilities are predicted to have the disease (we'll want to classify those as 'disease').

    •  The lowest probabilities are predicted to not have the disease (we'll want to classify those as 'not disease').

    •  Probability = 0.5? Flip of the coin.

  • ROC curve

    •  To create an ROC curve, we start at the highest predicted probability and work our way down the list (i.e. highest prob to lowest prob). We stop at each position C (i.e. potential cutoff) and use it as the classifier (i.e. C or above = 1, below C = 0), and ask "How well does this classifier do? Sensitivity? Specificity?"

         preds
    [1,] 0.9299   ← Most likely to have disease
    [2,] 0.9033
    [3,] 0.8918
    [4,] 0.8851
    [5,] 0.8687
     ...

    For example, if we classify cases with C = 0.8918 and higher as '1' and probabilities less than C = 0.8918 as '0', how well do we do? (A sketch follows below.)
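    A sketch of evaluating one candidate cutoff, assuming preds and a hypothetical 0-1 vector actual:

    ## Use C = 0.8918 as the classifier: C or above => '1', below C => '0'
    C = 0.8918
    declared.pos = (preds >= C)
    sens = sum(declared.pos & actual == 1) / sum(actual == 1)   # sensitivity at C
    spec = sum(!declared.pos & actual == 0) / sum(actual == 0)  # specificity at C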

  • ROC curve

    •  NOTE: If we start at the top of the list and we don't go very far down the list for C, we'll probably get mostly true positives (i.e. low false positive rate, good), but there are still lots of true positives that we missed (low true positive rate, bad).

  • The ROC curve shows us what happens to sensitivity and specificity as we move our C (threshold for the classifier) down the list.

    – C at a high predicted probability (very few cases declared 'disease'): low false positive rate (good), low true positive rate (bad). This is the sensitivity and specificity info from the "first" C threshold.

    – C at a low predicted probability (almost all cases declared 'disease'): high false positive rate (bad), high true positive rate (good). This is the sensitivity and specificity info from the "last" C threshold.

    [Figure: ROC curve; x-axis: False positive rate; y-axis: True positive rate]

  • [Figure: ROC curve; x-axis: False positive rate; y-axis: True positive rate]

    Each move down the list of probabilities coincides with either a move up on the ROC curve (correct choice as 'disease') or to the right (incorrect choice when declared 'disease'). Thus, we see a 'stairstep' phenomenon in the ROC curve. For small datasets, this is very apparent.

    We essentially start at the point (0,0). If the first case (highest probability) is correctly classified when declared 'disease', we move vertically up (true positive). If the first case (highest probability) is incorrectly classified as 'disease', we move to the right (false positive).

    As we start moving down the list, we want to move up on the ROC curve, not to the right. (A toy example is sketched below.)
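    A toy sketch of the stairstep, with six made-up cases ordered by predicted probability (largest first):

    ## 1 = disease, 0 = no disease, ordered by predicted probability:
    ordered.actual = c(1, 1, 0, 1, 0, 0)
    x = c(0, cumsum(1 - ordered.actual) / sum(ordered.actual == 0))  # false positive rate
    y = c(0, cumsum(ordered.actual)     / sum(ordered.actual == 1))  # true positive rate
    plot(x, y, type = "l", xlab = "False positive rate",
         ylab = "True positive rate", main = "Stairstep ROC (toy example)")
    abline(0, 1, lty = 2)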

  • See an animation of ROC curve creation:

    http://homepage.stat.uiowa.edu/~rdecook/stat6220/ROC_animated1.html

  • What ROC shape indicates a good classifier?

    •  We want it to go up vertically very quickly – i.e. as we move down the list of predicted probabilities, we're getting all 'diseased' cases and no 'non-diseased'.

    •  We know in the end (when everyone is classified as a 'disease' case) all the negatives will be misclassified and all the positives will be correctly classified. So, we'll always end at (1,1).

  • Comparing classifiers with ROC analysis

    [Figure: several ROC curves, from the best of these to the worst of these, with the diagonal line representing choosing at random]

    The AUC, or area under the curve, compares the classifiers too. (Choosing at random corresponds to the diagonal, with AUC = 0.5.)

  • ROC curve: Provost's Office Client

    •  Example: Predict which students will not return for their second year of college.

    – We use the predicted probabilities from the logistic regression for classification.

    – If the probability of a case being in class 1 (not retained) is equal to or greater than 0.5, that case is classified as a 1.

    – Any case with an estimated probability of less than 0.5 would be classified as a 0 (retained).

  • Provost's Office Example

    •  Logistic regression variables – Stafford loan – Live on campus – First-generation college student – RAI – Selective program of study – High school GPA (a fit is sketched below)
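    A sketch of that fit; the data frame name (retention) and variable names below are hypothetical stand-ins for the Provost's Office data:

    fit = glm(NotRetained ~ StaffordLoan + LiveOnCampus + FirstGen +
                RAI + SelectiveProgram + HSGPA,
              data = retention, family = binomial)
    preds.in.sample = predict(fit, type = "response")  # predicted P(not retained)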

  • Provost's Office Example

    •  The predicted probability of not being retained is used to classify each case.

    •  Students with a very high probability are expected to not return.

    •  Students with a very low probability are expected to return for their 2nd year.

  • Provost's Office Example

    [Figure: "ROC from logistic regression classifier"; x-axis: False positive rate; y-axis: True positive rate]

    The ROC curve from the given classifier (logistic regression predicted probabilities)… meh.

    *Plot generated from the ROCR package in R.

  • Provost's Office Example

    The blue point represents the classifier based on the probability = 0.5 cutoff.

    Sensitivity: 41/859 = 4.77%
    Specificity: 4762/4799 = 99.23%
    Overall misclassification: 855/5658 = 0.1511

    [Figure: "ROC from logistic regression classifier" with the 0.5-cutoff point marked; x-axis: False positive rate; y-axis: True positive rate]

    *Point calculated by hand and added.

  • Provost's Office Example

    We can also look at accuracy (1 - misclassification rate) vs. the cutoff. The best cutoff for highest accuracy is 0.4623 (pretty close to our 0.5 threshold).

    Sensitivity: 66/859 = 7.68%
    Specificity: 4750/4799 = 98.97%
    Overall misclassification: 842/5658 = 0.1488

    [Figure: Accuracy vs. cutoff]

    *Plot generated from the ROCR package in R.

  • Provost's Office Example

    The green point represents the optimal classifier based on equal importance of sensitivity and specificity (as the max of "sensitivity + specificity"), which is the prob = 0.191 cutoff.

    Sensitivity: 55.2%
    Specificity: 77.8%
    Overall misclassification: 1456/5658 = 0.2573

    [Figure: "ROC from logistic regression classifier" with the 0.191-cutoff point marked; x-axis: False positive rate; y-axis: True positive rate]

    *Point determined from the Epi package.

  • Provost's Office Example

    Optimal cutoffs can be found from software or calculated (a sketch follows below).

    [Figure: Sensitivity, Specificity, Sens + Spec, and distance from the top-left corner to the ROC curve, each plotted vs. cutoff (prob). Marked cutoffs: max sens+spec cutoff (0.191), our 0.5 cutoff, min Euclidean distance cutoff (0.146), the cutoff where everyone is declared positive, and the cutoff where no one is declared positive.]
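    A sketch of calculating the two optimal cutoffs directly, assuming preds and the 0-1 response NotRetained from the slides:

    ## Sweep over the observed probabilities as candidate cutoffs:
    cuts = sort(unique(preds))
    sens = sapply(cuts, function(C) mean(preds[NotRetained == 1] >= C))
    spec = sapply(cuts, function(C) mean(preds[NotRetained == 0] <  C))
    cuts[which.max(sens + spec)]                  # max sens+spec cutoff
    cuts[which.min((1 - spec)^2 + (1 - sens)^2)]  # min distance from (0,1) to the ROC curve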

  • Comparing classifiers with ROCs: The curves may visually "look" different, but are they really? If we collected data again, would they look this different?

  • Bootstrapping the ROC

    Can use bootstrapping to create a confidence band for the sensitivity (given specificity) for the ROC curve.

    [Figure: "ROC curve with 95% CIs"; x-axis: Specificity (%); y-axis: Sensitivity (%); AUC: 71.8% (69.9%–73.7%)]

    *Plot generated from the pROC package in R.

  • Bootstrapping the ROC

    But the bootstrap results don't always look so nice.

    [Figure: "ROC curve with 95% CIs"; x-axis: Specificity (%); y-axis: Sensitivity (%); AUC: 83.0% (60.4%–100.0%). Inset ROC curve; x-axis: 1-Specificity (i.e. % of true negatives incorrectly declared positive); y-axis: Sensitivity (% of true positives declared positive)]

    *Plot generated from the pROC package in R.

  • Bootstrapping ROC curves for comparison

    Semi-transparent coloring in R can come in handy:

    # semi-transparent red:  col="#ff000030"
    # semi-transparent blue: col="#0000ff20"

    [Figure: "ROC curve with 95% Bootstrapped CIs"; x-axis: Specificity (%); y-axis: Sensitivity (%)]

    *Plot generated from the pROC package in R.

  • ############################################
    ## 1a) Manually scripting the ROC curve:
    ############################################

    ## 'NotRetained' is the 0-1 response variable.
    ## 'preds' are the leave-one-out predicted probabilities.

    ## Use 0.5 as the cutoff for classifying:
    pred.outcome = round(preds)

    ## Gather the data for the ROC curve:
    ROC.info.LogReg = data.frame(NotRetained, preds, pred.outcome)
    ## Put in order by predicted probability, largest first:
    ROC.info.LogReg = ROC.info.LogReg[rev(order(ROC.info.LogReg$preds)), ]

    ## A 'positive' will be considered as someone who was not retained.
    (num.true.pos = sum(NotRetained == 1))
    (num.true.neg = sum(NotRetained == 0))

    ## Set up an empty plot, then draw the ROC curve and the diagonal:
    plot(c(0, 1), c(0, 1), type = "n",
         xlab = "1-Specificity (i.e. % of true neg's incorrectly declared positive)",
         ylab = "Sensitivity (% of true pos's declared positive)",
         main = "ROC curve (logistic regression)",
         sub = "False Positive Rate")
    x = cumsum(1 - ROC.info.LogReg$NotRetained) / num.true.neg
    y = cumsum(ROC.info.LogReg$NotRetained) / num.true.pos
    lines(x, y)
    abline(0, 1)
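    For reference, a minimal sketch of how such leave-one-out predicted probabilities might be produced; the retention data frame and the formula NotRetained ~ . are hypothetical, and the loop is slow for large n:

    n = nrow(retention)
    loo.preds = numeric(n)
    for (i in 1:n) {
      ## Refit without case i, then predict case i:
      fit.i = glm(NotRetained ~ ., data = retention[-i, ], family = binomial)
      loo.preds[i] = predict(fit.i, newdata = retention[i, ], type = "response")
    }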

  • ####################################
    ## 1b) Manually calculating AUC:
    ####################################

    ## Calculate AUC by summing the areas of the trapezoids under the curve:
    ## http://stats.stackexchange.com/questions/145566/how-to-calculate-area-under-the-curve-auc-or-the-c-statistic-by-hand

    ## Same x, y labeling as the previous slide:
    x = cumsum(1 - ROC.info.LogReg$NotRetained) / num.true.neg
    y = cumsum(ROC.info.LogReg$NotRetained) / num.true.pos

    n = length(x)                   # number of points on the curve
    height = (y[-1] + y[-n]) / 2    # average height of each trapezoid
    width = diff(x)                 # width of each trapezoid
    sum(height * width)             # AUC

  • ##########################################################
    ## 1c) Manually calculating sensitivity, specificity,   ##
    ## misclassification rate for the 0.5 threshold:        ##
    ##########################################################

    ## All the needed info is in the following table:
    table(NotRetained, pred.outcome)
    #                pred.outcome
    # NotRetained       0     1
    #           0    4762    37
    #           1     818    41
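    Reading the three quantities off that table (a sketch, using the row/column labels above):

    cm = table(NotRetained, pred.outcome)
    sens     = cm["1", "1"] / sum(cm["1", ])            # 41/859    = 4.77%
    spec     = cm["0", "0"] / sum(cm["0", ])            # 4762/4799 = 99.23%
    misclass = (cm["0", "1"] + cm["1", "0"]) / sum(cm)  # 855/5658  = 0.1511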

  • ################################################
    ## 2) ROCR package for creating ROC curve:
    ################################################
    library(ROCR)

    pred = prediction(preds, NotRetained)
    perf = performance(pred, "tpr", "fpr")
    plot(perf, main = "ROC from logistic regression classifier")
    (AUC.ROCR = performance(pred, "auc")@y.values[[1]])

    ## Plot accuracy (1 - misclassification rate) vs. cutoff:
    acc = performance(pred, "acc")
    (ac.val = max(unlist(acc@y.values)))
    # [1] 0.8511842

    ## Find the cutoff achieving the maximum accuracy:
    th = unlist(acc@x.values)[unlist(acc@y.values) == ac.val]

    plot(acc)
    abline(v = th, col = "grey", lty = 2)
    th
    # [1] 0.462324

  • ################################################
    ## 3) pROC package for bootstrapped ROC:
    ################################################
    library(pROC)

    ## 'NotRetained' is the 0-1 response variable.
    ## 'preds' are the leave-one-out predicted probabilities.

    ## Use 0.5 as the cutoff for classifying:
    pred.outcome = round(preds)

    rocobj = plot.roc(NotRetained, preds, main = "ROC curve with 95% CIs",
                      percent = TRUE, ci = TRUE, print.auc = TRUE)

    ## Calculate the CI of sensitivity at a select set of
    ## specificities and form a 'band' (might take a bit):
    ciobj = ci.se(rocobj, specificities = seq(0, 100, 5))
    plot(ciobj, type = "shape", col = "#1c61b6AA")  # blue band

    ## Use the bootstrap to add a CI in both directions at the "best" cutoff:
    plot(ci(rocobj, of = "thresholds", thresholds = "best"),
         col = "yellow", lwd = 2)

  • ## Overlay bootstrapped ROCs:
    rocobj = plot.roc(NotRetained, preds,
                      main = "ROC curve with 95% Bootstrapped CIs",
                      percent = TRUE, ci = TRUE, print.auc = FALSE)
    ciobj = ci.se(rocobj, specificities = seq(0, 100, 5))
    plot(ciobj, type = "shape", col = "#ff000030")  # semi-transparent red

    ## Gather info on the second classifier:
    ROC.info.LogReg.2 = data.frame(NotRetained, preds.2, pred.outcome.2)

    ## Overlay the second ROC curve onto the first:
    rocobj2 = plot.roc(ROC.info.LogReg.2$NotRetained, ROC.info.LogReg.2$preds.2,
                       percent = TRUE, ci = TRUE, print.auc = FALSE, add = TRUE)
    ciobj2 = ci.se(rocobj2, specificities = seq(0, 100, 5))
    plot(ciobj2, type = "shape", col = "#0000ff20")  # semi-transparent blue

  • Some references

    •  Flach, P. A. (2016). ROC Analysis. Chapter in Encyclopedia of Machine Learning and Data Mining.
       – https://research-information.bristol.ac.uk/files/94977288/Peter_Flach_ROC_Analysis.pdf

    •  James, G., Witten, D., Hastie, T. and Tibshirani, R. (2015). An Introduction to Statistical Learning with Applications in R.
       – http://www-bcf.usc.edu/~gareth/ISL/ – Click 'Download the book PDF'

    •  Fawcett, T. (2006). An Introduction to ROC Analysis. Pattern Recognition Letters 27, pp. 861-874.