Civitas Learning: Understanding ROC Curves

Introduction to ROC Curves Data Science Basics Series May 14, 2014


ROC curves can help us determine the effectiveness of predictive models. This presentation gives an introduction to understanding and using ROC curves to maximize impact.

Transcript of Civitas Learning: Understanding ROC Curves

Page 1: Civitas Learning: Understanding ROC Curves

Introduction to ROC Curves Data Science Basics Series

May 14, 2014

Page 2: Civitas Learning: Understanding ROC Curves

CIVITAS LEARNING, INC. – CONFIDENTIAL INFORMATION

What is ROC? Receiver Operating Characteristic

Systematically trade off detection against false alarm

[Illustration captions: "You woke me up at 3 am!!!" and "Wake up, you’re late for class!!!"]

Page 3: Civitas Learning: Understanding ROC Curves

A Brief History of ROC Curves

• Developed by electrical engineers and radar operators during WWII to detect enemy airplanes vs. geese.
• Illustrates the performance of binary classifiers (elements in a set divided into two groups)
• Compares trade-offs between detection and false alarm rate
• Now used in many fields
  • Psychology
  • Medicine and biometrics
  • More recently in machine learning and data mining

Page 4: Civitas Learning: Understanding ROC Curves

Detection vs. False Alarm

• Detection rate (sensitivity, true positive rate) measures the fraction of truly positive cases that are correctly detected
• False alarm rate (false positive rate, equal to 1 − specificity) measures the fraction of truly negative cases that are incorrectly flagged
• Tradeoff: Usually can optimize for one but not both
• Example: Disease detection
  • Sacrifice false alarm for detection if cost of missed detection is alarmingly high
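The two rates above can be sketched from the four counts of a confusion matrix. This is a minimal illustration, not something taken from the slides: the function name `rates` and the screening counts are hypothetical.

```python
def rates(tp, fp, fn, tn):
    """Return (detection_rate, false_alarm_rate) from confusion-matrix counts.

    tp: positives correctly flagged     fn: positives missed
    fp: negatives incorrectly flagged   tn: negatives correctly left alone
    """
    detection_rate = tp / (tp + fn)     # true positive rate, i.e. sensitivity
    false_alarm_rate = fp / (fp + tn)   # false positive rate, i.e. 1 - specificity
    return detection_rate, false_alarm_rate


# Hypothetical disease-screening counts (not from the slides): the test
# catches 90 of 100 sick patients but also flags 200 of 900 healthy ones.
det, fa = rates(tp=90, fp=200, fn=10, tn=700)
print(f"detection rate:   {det:.1%}")
print(f"false alarm rate: {fa:.1%}")
```

Lowering the cutoff of such a test would raise both numbers together, which is the tradeoff the slide describes.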

Page 5: Civitas Learning: Understanding ROC Curves

How is ROC Generated?

Features → Scores → PDF → ROC

[Figure: student features (GPA, activities, courses, financial aid, SAT/ACT, high school) feed a model that outputs a predicted risk score. ROC plot axes: probability of false alarm (x) vs. probability of detection (y).]

Optimal point on the ROC curve depends on reach capacity and ROI

Page 6: Civitas Learning: Understanding ROC Curves

How is ROC Generated?

Features → Scores → PDF → ROC

[Figure: same diagram as on the previous slide.]

Optimal point on the ROC curve depends on reach capacity and ROI

Page 7: Civitas Learning: Understanding ROC Curves

How is ROC Generated?

Features → Scores → PDF → ROC

[Figure: same diagram as on the previous slide, now with a cutoff threshold marked.]

Optimal point on the ROC curve depends on reach capacity and ROI
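The last step of the pipeline (scores to ROC) can be sketched as a sweep of the cutoff threshold: every cutoff produces one (false alarm, detection) point. The sketch below assumes a higher score means higher predicted risk and that label 1 marks the outcome to detect. The function name `roc_points` and the eight scores are made up for illustration.

```python
def roc_points(scores, labels):
    """Sweep the cutoff from high to low and collect ROC points.

    scores: predicted risk scores (higher = more at risk)
    labels: 1 if the student did not continue (positive class), else 0
    Returns a list of (false_alarm_rate, detection_rate) pairs.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cutoff above every score: nobody is flagged
    for cutoff in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= cutoff]
        tp = sum(flagged)            # flagged students who did not continue
        fp = len(flagged) - tp       # flagged students who continued
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores for eight students (not from the slides).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
curve = roc_points(scores, labels)
for false_alarm, detection in curve:
    print(f"false alarm {false_alarm:.2f}  detection {detection:.2f}")
```

The curve always runs from (0, 0), where nobody is flagged, to (1, 1), where everybody is. Choosing an operating point means choosing one cutoff along that path.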

Page 8: Civitas Learning: Understanding ROC Curves

Model Performance

The overlap between the score distributions of the two outcomes measures how well the model separates success from failure.

With a strong model you can be confident of assigning a particular score to an outcome category.

With a weaker model, the distributions overlap heavily, so a particular score could correspond to a good or a bad outcome with roughly equal probability.

[Figure: predicted risk score distributions and the resulting ROC for a STRONG MODEL and a WEAK MODEL.]
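To make the effect of overlap concrete, the sketch below simulates scores for the two outcome groups and reports the detection rate reached at a fixed 10% false alarm rate. Everything here is an assumption made for illustration: the Gaussian score distributions, the `separation` parameter, and the helper name `detection_at_false_alarm` do not come from the slides.

```python
import random


def detection_at_false_alarm(separation, false_alarm=0.10, n=5000, seed=0):
    """Detection rate at a fixed false alarm rate for simulated scores.

    Scores of continuing students ~ N(0, 1); scores of non-continuing
    students ~ N(separation, 1). Larger separation means less overlap.
    """
    rng = random.Random(seed)
    continued = [rng.gauss(0.0, 1.0) for _ in range(n)]
    not_continued = [rng.gauss(separation, 1.0) for _ in range(n)]
    # Pick the cutoff so that `false_alarm` of continuing students score above it.
    cutoff = sorted(continued, reverse=True)[int(false_alarm * n)]
    return sum(s > cutoff for s in not_continued) / n


strong = detection_at_false_alarm(separation=3.0)  # little overlap
weak = detection_at_false_alarm(separation=0.5)    # heavy overlap
print(f"strong model detection at 10% false alarm: {strong:.1%}")
print(f"weak model detection at 10% false alarm:   {weak:.1%}")
```

With little overlap, almost all non-continuing students sit above the cutoff. With heavy overlap, the same false alarm budget catches only a small share of them.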

Page 9: Civitas Learning: Understanding ROC Curves

Parts of a ROC Curve

[Figure: ROC plot of detection rate vs. false alarm rate, with curves for the Civitas Model and Random Ordering.]

Page 10: Civitas Learning: Understanding ROC Curves

Parts of a ROC Curve

[Figure: ROC plot of detection rate vs. false alarm rate, with a point marked on the line.]

Total Population:
• 10,000 students
• 9,000 continued
• 1,000 did not continue

Point on Line:
• 1,250 students
• 1,125 continued
• 125 did not continue

ROC Information
• Correct identification rate of non-continuing students = 125/1,250 = 10%

Page 11: Civitas Learning: Understanding ROC Curves

Parts of a ROC Curve

[Figure: ROC plot of detection rate vs. false alarm rate, with a point marked on the line.]

Total Population:
• 10,000 students
• 9,000 continued
• 1,000 did not continue

Point on Line:
• 7,500 students
• 6,750 continued
• 750 did not continue

ROC Information
• Correct identification rate of non-continuing students = 750/7,500 = 10%
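The two points quoted on this slide and the previous one (1,250 and 7,500 students contacted) can be checked with a few lines of arithmetic. The counts come from the slides; the helper name `point_summary` is only illustrative.

```python
TOTAL_CONTINUED = 9_000       # from the slide: 9,000 of 10,000 continued
TOTAL_NOT_CONTINUED = 1_000   # from the slide: 1,000 did not continue


def point_summary(contacted_continued, contacted_not_continued):
    """Rates for one point on the line, given who was contacted."""
    contacted = contacted_continued + contacted_not_continued
    return {
        "detection_rate": contacted_not_continued / TOTAL_NOT_CONTINUED,
        "false_alarm_rate": contacted_continued / TOTAL_CONTINUED,
        "correct_identification_rate": contacted_not_continued / contacted,
    }


small = point_summary(contacted_continued=1_125, contacted_not_continued=125)
large = point_summary(contacted_continued=6_750, contacted_not_continued=750)
for name, point in (("1,250 contacted", small), ("7,500 contacted", large)):
    print(name, {k: f"{v:.1%}" for k, v in point.items()})
```

At both points the detection rate equals the false alarm rate (12.5% and 75%), which places them on the diagonal, and the correct identification rate stays at the 10% base rate.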

Page 12: Civitas Learning: Understanding ROC Curves

[Figure: ROC plot of detection rate vs. false alarm rate.]

Tradeoffs: Without the model, more advisors are needed to reach more students who will not persist.

As you go up and to the right, you reach out to more at-risk students (higher detection rate), but more interventions require more advising time and resources, because the correct identification rate of non-continuing students stays at the same 10%.

Page 13: Civitas Learning: Understanding ROC Curves

Model Performance: With the model, the same number of advisors can reach out to about 5X as many students who will not persist.

[Figure: ROC plot of detection rate vs. false alarm rate, with curves for the Civitas Model and Random Ordering and a gap of ~5X marked between them.]

Total Population:
• 10,000 students
• 9,000 continued
• 1,000 did not continue

Point on Line:
• 1,250 students
• 1,125 continued
• 125 did not continue
• Correct = 125/1,250 = 10.0%

ROC Information:
• 1,250 students
• 650 continued
• 600 did not continue
• Correct identification rate of non-continuing students = 600/1,250 = 48.0%
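The "~5X" figure follows directly from the counts on this slide. The short check below uses only those numbers; the variable names are illustrative.

```python
contacted = 1_250          # advising capacity, from the slide
random_hits = 125          # non-continuing students reached by random ordering
model_hits = 600           # non-continuing students reached with the model
model_false_alarms = 650   # continuing students contacted with the model

random_rate = random_hits / contacted        # correct identification, random
model_rate = model_hits / contacted          # correct identification, model
lift = model_hits / random_hits              # improvement over random ordering

# The same operating point expressed in ROC coordinates.
model_detection = model_hits / 1_000             # share of all non-continuing students
model_false_alarm = model_false_alarms / 9_000   # share of all continuing students

print(f"random: {random_rate:.1%}, model: {model_rate:.1%}, lift: {lift:.1f}x")
print(f"model point: detection {model_detection:.1%}, false alarm {model_false_alarm:.1%}")
```

The lift works out to 4.8, which the slide rounds to ~5X.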

Page 14: Civitas Learning: Understanding ROC Curves

Model Evaluation

With a stronger predictive model
• Detection rate improves
• False alarm rate decreases
• Correctness increases at every student threshold

[Figure: ROC plot of detection rate vs. false alarm rate, with curves for the Civitas Model and Random Ordering.]

Page 15: Civitas Learning: Understanding ROC Curves

ACCURACY VS. ROC CURVES

Why is accuracy an incomplete and likely misleading measure of a predictive model?

Page 16: Civitas Learning: Understanding ROC Curves

Accuracy vs. ROC Curves

Case: You use an algorithm to identify students who are at risk of not continuing to the next term. Following the case study, 10% of students do not persist. You test your predictive model on the data and find that you made correct predictions 92% of the time.

Page 17: Civitas Learning: Understanding ROC Curves

Accuracy vs. ROC Curves

A crackpot scientist tells you, “I could’ve gotten 90% accuracy just by predicting everyone will persist. After all the math, you gained only 2%?!”

Don’t give up yet! Your predictive model is still helpful.
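The scientist's 90% baseline, and why it is useless for outreach, can be sketched as follows. The slides do not say which cutoff produced the 92% figure, so as an assumption the second calculation uses the 1,250-student operating point quoted elsewhere in this deck (600 non-continuing and 650 continuing students flagged). The function names are illustrative.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def detection_rate(tp, fn):
    """Share of non-persisting students that are flagged."""
    return tp / (tp + fn)


# Positive class = student will not persist (1,000 of 10,000 students).

# Baseline: predict that everyone persists, so nobody is ever flagged.
baseline_acc = accuracy(tp=0, fp=0, fn=1_000, tn=9_000)
baseline_det = detection_rate(tp=0, fn=1_000)

# Assumed model operating point: flag the 1,250 highest-risk students.
model_acc = accuracy(tp=600, fp=650, fn=400, tn=8_350)
model_det = detection_rate(tp=600, fn=400)

print(f"baseline: accuracy {baseline_acc:.1%}, detection {baseline_det:.1%}")
print(f"model:    accuracy {model_acc:.1%}, detection {model_det:.1%}")
```

At this assumed cutoff the model's accuracy (89.5%) is no better than the baseline's 90%, yet it finds 60% of the students who will not persist while the baseline finds none. Accuracy alone hides that difference.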

Page 18: Civitas Learning: Understanding ROC Curves

Accuracy vs. ROC Curves

You have a team of advisors, and they have time to reach out to 1,250 students to suggest ways they can increase their likelihood of persisting.

[Figure legend: one icon = 100 students]

Page 19: Civitas Learning: Understanding ROC Curves

Accuracy vs. ROC Curves

Without the predictive model, you have to pick 1,250 students at random to assist. If 10% of them are expected to not persist, only 125 students would be likely to benefit from the intervention.

Page 20: Civitas Learning: Understanding ROC Curves

Accuracy vs. ROC Curves

With the predictive model, you can choose the 1,250 students by ranking all students by predicted risk score and taking those with the highest scores. The test case reveals 600 of these students are at risk and would be most likely to benefit from the right intervention at the right time.
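Choosing students by predicted risk amounts to a sort followed by a cut at the advising capacity. A minimal sketch, with a hypothetical function name and made-up student IDs and scores:

```python
def select_for_outreach(students, capacity):
    """Return the IDs of the `capacity` students with the highest risk scores.

    students: list of (student_id, predicted_risk_score) pairs
    capacity: number of students the advisors have time to contact
    """
    ranked = sorted(students, key=lambda pair: pair[1], reverse=True)
    return [student_id for student_id, _ in ranked[:capacity]]


# Hypothetical scores for five students (not from the slides).
students = [("s1", 0.12), ("s2", 0.81), ("s3", 0.45), ("s4", 0.93), ("s5", 0.30)]
chosen = select_for_outreach(students, capacity=2)
print(chosen)  # ['s4', 's2']
```

In the slide's scenario the same call would use all 10,000 students and `capacity=1250`.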

Page 21: Civitas Learning: Understanding ROC Curves

The ROC Curve Tradeoff

Students most likely to benefit from an intervention

[Figure: WITHOUT PREDICTIVE MODEL vs. WITH PREDICTIVE MODEL, ~5x improvement]

Page 22: Civitas Learning: Understanding ROC Curves

THANK YOU

VIEW this webinar on-demand on our LinkedIn Page

FOLLOW @CivitasLearning to continue the conversation on Twitter

SHARE comments and ideas for future webinars on the Civitas Learning Space

linkedin.com/company/Civitas-Learning twitter.com/CivitasLearning civitaslearningspace.com