AUC: at what cost(s)?

Transcript of AUC: at what cost(s)?

Page 1: AUC: at what cost(s)?

12 July 2016

AUC - at what cost(s)?
Evaluating and comparing machine learning models
Alex Korbonits, Data Scientist

Page 2: AUC: at what cost(s)?

Introduction

About Remitly and Me

Page 3: AUC: at what cost(s)?

Introduction

• Model selection: data and algorithms aren’t the only knobs

• Problems with typical model selection strategies

• Review of model evaluation metrics

• Augmenting these metrics to address practical problems

• Why this matters to Remitly

Agenda

Page 4: AUC: at what cost(s)?

You may think that in order to solve all of your machine learning problems, you only need to have…

Page 5: AUC: at what cost(s)?
Page 6: AUC: at what cost(s)?
Page 7: AUC: at what cost(s)?

... but you need to think carefully about model selection.

Page 8: AUC: at what cost(s)?

Why is model selection important?

• Big data is not enough:

• Not everyone has it. Or maybe the big data you have isn’t useful.

• Fancy algorithms are not enough:

• No Free Lunch Theorem (Wolpert, 1997). There isn’t a “one-size-fits-all” model class. Deep learning is not a silver bullet.

• Inadequate coverage in the literature:

• This is a practical problem; it’s hard, and it matters.

• Problems such as class imbalance and inclusion of economic constraints.

Model Selection

Page 9: AUC: at what cost(s)?

ML + Economics

• Loss matrices inadequate:

• Penalty of misclassification may vary per instance.

• E.g., the size of a transaction: misclassifications from the same class don’t all incur the same penalty (a toy sketch follows below).

• Indifference curves good for post-training selection:

• We can compare tradeoffs of selecting different classification thresholds.

• EXTREMELY IMPORTANT when costs of false positives and false negatives are very, very different.

Economics: including costs/revenue into model selection
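
The toy sketch below (Python; not Remitly’s actual cost model) illustrates instance-dependent penalties: an approved fraudulent transaction loses its full amount, while an approved legitimate one earns a small margin. The labels, amounts, margin_rate, and loss_rate are all made-up assumptions.

```python
import numpy as np

def expected_profit(y_true, y_pred, amounts, margin_rate=0.01, loss_rate=1.0):
    """Toy per-instance profit for a fraud filter (1 = fraud, 0 = legitimate).

    Approved legitimate transactions earn margin_rate of their amount;
    approved fraudulent transactions lose loss_rate of their amount.
    Both rates are illustrative assumptions, not real unit economics.
    """
    approved = (y_pred == 0)          # predicted negative == transaction approved
    legit = (y_true == 0)
    revenue = amounts[approved & legit].sum() * margin_rate
    losses = amounts[approved & ~legit].sum() * loss_rate
    return revenue - losses

# Two errors from the same class, very different penalties:
y_true = np.array([1, 1, 0, 0])
y_pred = np.array([0, 0, 0, 1])                  # both frauds were approved
amounts = np.array([10.0, 5000.0, 200.0, 50.0])
print(expected_profit(y_true, y_pred, amounts))  # dominated by the $5000 miss
```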

Page 10: AUC: at what cost(s)?

Classic machine learning

• Test positive and test negative (prediction outcomes)

• Condition positive and condition negative (actual values)

• True positive: condition positive and test positive

• True negative: condition negative and test negative

• False positive (Type I error): condition negative and test positive

• False negative (Type II error): condition positive and test negative

Confusion matrix
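
A minimal sketch of how these four cells fall out of actual and predicted labels, using scikit-learn’s confusion_matrix on a made-up pair of label vectors:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # condition (actual) values
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # test (predicted) outcomes

# For labels [0, 1], scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp} (Type I)  FN={fn} (Type II)")
```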

Page 11: AUC: at what cost(s)?

Radar in WWII

• Classic approach: measure the area under the receiver operating characteristic (ROC) curve

• Pros:

• Standard in the literature

• Descriptive of predictive power across thresholds

• Cons:

• Ignores class imbalances

• Ignores constraints such as costs of FP vs. FN

My curve is better than your curve
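
For reference, a small sketch of computing the ROC curve and its AUC on synthetic data; the dataset, the logistic-regression model, and the 10% positive rate are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

# Synthetic, mildly imbalanced data purely for illustration
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)  # one (FPR, TPR) point per threshold
print("ROC AUC:", roc_auc_score(y_te, scores))
```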

Page 12: AUC: at what cost(s)?

Metrics affected by class imbalance

• X axis is recall == TPR == TP / (TP + FN).

• I.e., of the total positive instances, what proportion did our model classify as positive?

• Y axis is precision == TP / (TP + FP).

• I.e., of the positive classifications, what proportion were actually positive instances?

• Class imbalance affects this: all else equal, a smaller positive class shifts curves down.

• There exists a one-to-one mapping from ROC space to PR space. But optimizing ROC AUC != optimizing PR AUC.

Precision and Recall curves
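
To see that sensitivity to imbalance, here is a small sketch comparing ROC AUC with PR AUC (estimated via average precision) on a balanced versus a heavily imbalanced synthetic dataset; the datasets, model, and positive rates are illustrative assumptions only:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

for pos_frac in (0.5, 0.02):  # balanced vs. 2% positives
    X, y = make_classification(n_samples=20000,
                               weights=[1 - pos_frac, pos_frac],
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    s = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(f"positives={pos_frac:.0%}  ROC AUC={roc_auc_score(y_te, s):.3f}  "
          f"PR AUC={average_precision_score(y_te, s):.3f}")
```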

Page 13: AUC: at what cost(s)?

Inclusion of costs in ROC Space

• Indifference Curve:

• Level set that defines, e.g., where your classifier implies business profitability vs. loss.

• Defined via a constraint built from, e.g., the costs of the quadrants of your confusion matrix and your class priors.

• Points above this curve satisfy the constraint and are good. Points below == bad.

• Why we care:

• The orange model doesn’t have a threshold that crosses your indifference curve, even if its AUC is larger: no threshold for the orange model can satisfy your constraint (sketched below).

Cost curves in ROC Space
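
A minimal sketch of one way to write down such an indifference (break-even) curve in ROC space, assuming a simple per-instance expected-profit model; the positive-class rate and all dollar values below are made-up parameters, not real costs:

```python
import numpy as np

P_POS = 0.05        # assumed prior probability of the positive class
BENEFIT_TP = 10.0   # assumed value of catching a positive
COST_FN = 10.0      # assumed cost of missing a positive
COST_FP = 1.0       # assumed cost of a false alarm; true negatives count as 0

def expected_profit(fpr, tpr):
    """Expected profit per scored instance at the ROC operating point (fpr, tpr)."""
    p_neg = 1.0 - P_POS
    return P_POS * (tpr * BENEFIT_TP - (1.0 - tpr) * COST_FN) - p_neg * fpr * COST_FP

# The indifference curve is the level set expected_profit(fpr, tpr) == 0.
# Solving for TPR gives a straight line in ROC space:
fpr_grid = np.linspace(0.0, 1.0, 101)
tpr_breakeven = ((1.0 - P_POS) * fpr_grid * COST_FP + P_POS * COST_FN) \
                / (P_POS * (BENEFIT_TP + COST_FN))

# An operating point (fpr, tpr) satisfies the constraint only if it lies on or
# above this line, i.e. expected_profit(fpr, tpr) >= 0.
```

An ROC curve that never rises above this line (the orange-model case above) has no acceptable threshold, no matter how large its AUC.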

Page 14: AUC: at what cost(s)?

How do I pick the right threshold?

• Threshold choices:

• Find the point with the maximum distance above the indifference curve.

• Of your threshold choices, this point maximizes your utility.

• Technically, you’re on a higher indifference curve.

• Other things to consider:

• Changes in your constraints: if costs change, your indifference curve can change.

• Update models and thresholds subject to such changes.

Picking the right classifier threshold
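
A sketch of threshold selection under such a constraint, reusing the hypothetical expected_profit() function from the previous sketch: scan the thresholds implied by the ROC curve and keep the operating point with the highest expected profit, i.e., the point farthest above the indifference curve.

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold(y_true, scores, profit_fn):
    """Return the threshold whose (FPR, TPR) operating point maximizes profit_fn."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    profits = np.array([profit_fn(f, t) for f, t in zip(fpr, tpr)])
    best = int(np.argmax(profits))
    return thresholds[best], profits[best]

# Usage with the earlier sketches (names are assumptions, not a fixed API):
# threshold, profit = pick_threshold(y_te, scores, expected_profit)
# If profit < 0, no threshold for this model satisfies the constraint.
```

If costs or class priors change, the indifference curve moves, so the scan (and possibly the model choice) should be rerun.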

Page 15: AUC: at what cost(s)?

Citing our sources

Bibliography

Davis, Jesse, and Mark Goadrich. "The relationship between Precision-Recall and ROC curves." In Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240. ACM, 2006.

Raghavan, V., Bollmann, P., & Jung, G. S. (1989). A critical investigation of recall and precision as measures of retrieval system performance. ACM Transactions on Information Systems, 7, 205–229.

Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. Proceedings of the 15th International Conference on Machine Learning (pp. 445–453). Morgan Kaufmann, San Francisco, CA.

Drummond, C., & Holte, R. (2000). Explicitly representing expected cost: an alternative to ROC representation. Proceedings of Knowledge Discovery and Data Mining (pp. 198–207).

Drummond, C., & Holte, R. C. (2004). What ROC curves can't do (and cost curves can). ROCAI (pp. 19–26).

Bradley, A. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30, 1145–1159.

Fawcett, Tom. "An introduction to ROC analysis." Pattern Recognition Letters 27, no. 8 (2006): 861–874.

Metz, Charles E. "Basic principles of ROC analysis." In Seminars in Nuclear Medicine, vol. 8, no. 4, pp. 283–298. WB Saunders, 1978.

Saito, Takaya, and Marc Rehmsmeier. "The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets." PLoS ONE 10, no. 3 (2015): e0118432.

"Information Theoretic Metrics for Multi-class Predictor Evaluation", Sam Steingold, 2016, accessed 23 June 2016, http://www.slideshare.net/SessionsEvents/sam-steingold-lead-data-scientist-magnetic-media-online-at-mlconf-sea-5201

"Machine Learning Meets Economics", Datacratic, 2016, accessed 23 June 2016, http://blog.mldb.ai/blog/posts/2016/01/ml-meets-economics/

Page 16: AUC: at what cost(s)?

What we talked about

• Model selection: data and algorithms aren’t the only knobs

• Problems with typical model selection strategies

• Review of model evaluation metrics

• Augmenting these metrics to address practical problems

• Why this matters to Remitly

Summary

Page 17: AUC: at what cost(s)?

Remitly’s Data Science team uses ML for a variety of purposes.

ML applications are core to our business – therefore our business must be core to our ML applications.

Machine learning at Remitly

Page 18: AUC: at what cost(s)?

www.remitly.com/careers

We’re [email protected]