Sam Steingold, Lead Data Scientist, Magnetic Media Online at MLconf SEA - 5/20/16
-
Upload
mlconf -
Category
Technology
-
view
290 -
download
1
Transcript of Sam Steingold, Lead Data Scientist, Magnetic Media Online at MLconf SEA - 5/20/16
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Information Theoretic Metrics forMulti-Class Predictor Evaluation
Sam Steingold, Michal Laclavık
MLconf Seattle 2016-05-20
Slide 1 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Who We Are: A Fast Growing Startup
I One of the fastest growing tech startups – 50% YoYgrowth.
I Marketing Tech Platform: Email + Site + Ads + Mobile
I 700+ customers in Retail, Auto, Travel, CPG, Finance
I Four Engineering locations: NY, SF, London, Ann Arbor
I 275+ employees. We are hiring aggressively!
Slide 2 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Slide 3 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Predictor
A predictor is a black box which spits out an estimate of anunknown parameter.
E.g.:
I Will it rain tomorrow?
I Will this person buy this product?
I Is this person a terrorist?
I Is this stock a good investment?
Slide 4 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Examples
Perfect – always right
Mislabeled – always the opposite
Random – independent of the actualI San Diego Weather Forecast:
Actual : 3 days of rain per 365 daysPredict : sunshine always!
I Coin flip
Actual : true half the timePredict : true if coin lands Head
Slide 5 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Why Evaluate Predictors?
I Which one is better?I How much to pay for one?
I You can always flip the coin yourself, so the random predictor isthe least valuable!
I When to use this one and not that one?
Slide 6 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Confusion Matrix + Cost
PredictorPredicted
Sun Rain Hurricane
ActualSun 100 10 1Rain 5 20 6
Hurricane 0 3 2
CostsPredicted
Sun Rain Hurricane
ActualSun 0 1 3Rain 2 0 2
Hurricane 10 5 0
Total cost (i.e., predictor value) = 45
Slide 7 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Confusion/Costs Matrix
Probability/CostsPredicted
Target Non-target
ActualBought 1% $1 9% $0
Did not buy 9% ($0.1) 81% $0
ProfitableExpected value of one customer: $0.001 > 0.
Worthless!The Predicted and Actual are independent!
Slide 8 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Slide 9 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
What if the Cost is Unknown?
TotalN
PredictedPopulation True(PT) False(PF)
ActualTrue(AT) TP FN(type2)False(AF) FP(type1) TN
Perfect : FN = FP = 0
Mislabeled : TP = TN = 0
Random (Predicted & Actual are independent) :TPN = PT
N ×ATN
FNN = PF
N ×ATN
FPN = PT
N ×AFN
TNN = PF
N ×AFN
Slide 10 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Metrics Based on the Confusion Matrix8 partial measures
1. Positive predictive value (PPV, Precision): TPPT
2. False discovery rate (FDR): FPPT
3. False omission rate (FOR): FNPF
4. Negative predictive value (NPV): TNPF
5. True positive rate (TPR, Sensitivity, Recall): TPAT
6. False positive rate (FPR, Fall-out): FPAF
7. False negative rate (FNR): FNAT
8. True negative rate (TNR, Specificity): TNAF
Slide 11 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Metrics Based on the Confusion Matrix4 total measures
1. Accuracy: P(Actual = Predicted).
2. F1: the harmonic average of Precision and Recall
3. Matthew’s Correlation Coefficient (MCC): AKA Pearsoncorrelation coefficient.
4. Proficiency: the proportion of the information containedin the Actual distribution which is captured by thePredictor.
Slide 12 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Metric RequirementsMeaning : the meaning of the metric should be transparent
without resorting to averages of meaningful values
Discrimination :
Weak : its value is 1 for the perfect predictor (and only forit)
Strong : additionally, its value is 0 for a worthless (randomwith any base rate) predictor (and only for such apredictor)
Universality : the metric should be usable in any setting,whether binary or multi-class, classification (a unique classis assigned to each example) or categorization/communitydetection (an example can be placed into multiplecategories or communities)
Slide 13 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Accuracy
I P(Actual = Predicted) = tp+tnN
I Perfect: 1.
I Mislabeled: 0.
I Sun Diego Weather Forecast:I Accuracy = 362/365 = 99.2%I The predictor is worthless!
I Does not detect random predictors.
Slide 14 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
F1-Score
I The harmonic mean of Precision and Recall: 2×tp2×tp+fp+fn
I Perfect: 1
I 0 if either Precision or Recall is 0
I Correctly handles SDWF (because Recall = 0)...
I ...But only if we label rain as True!
I Otherwise: Recall = 100%, Precision = 99.2%,F1 = 99.6%
I F1 is Asymmetric (Positive vs Negative)
I Why “1”?! Fβ = (1+β2)×tp(1+β2)×tp+β2×fn+fp
Slide 15 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Matthews correlation coefficient
I AKA Phi coefficient, Pearson correlation coefficient:
tp× tn− fp× fn√(tp + fp)× (tp + fn)× (fp + tn)× (fn + tn)
I Range: [−1; 1]
I Perfect: 1
I Mislabeled: −1
I Random: 0
I Handles San Diego Weather Forecast
I Hard to generalize to non-binary classifiers.
Slide 16 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Uncertainty coefficient
I AKA Proficiency: α = I (Predicted;Actual)H(Actual)
I Range: [0; 1].
I Measures the share of information (percentage of bits)contained in the Actual which is captured by thePredictor.
I 1 for both Perfect and Mislabeled predictors.
I 0 for random predictors.
I Handles San Diego Weather Forecast and all the possiblequirks – the best.
I Easily generalizes to any number of categories.
Slide 17 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Comparison
i=0TP=20 FN=0FP=0 TN=180
i=50TP=15 FN=5FP=45 TN=135
i=100TP=10 FN=10FP=90 TN=90
i=150TP=5 FN=15
FP=135 TN=45
i=200TP=0 FN=20
FP=180 TN=0
0 50 100 150 200
−10
0−
500
5010
0
Binary Predictor Metric Comparison (base rate=10%)
i=0..200; confusion matrix: tp=0.1*(200−i), fn=0.1*i, fp=0.9*i, tn=0.9*(200−i)
met
ric, %
AccuracyF−scorePearson's phiProficiency
Slide 18 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
2 Against 2 – take 1
A =
[tp = 2 fn = 3fp = 0 tn = 45
]; B =
[tp = 5 fn = 0fp = 7 tn = 38
]A B
Proficiency 30.96% 49.86%Pearson’s φ 61.24% 59.32%
Accuracy 94.00% 86.00%F1-score 57.14% 58.82%
Slide 19 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
2 Against 2 – take 2
A =
[tp = 3 fn = 2fp = 2 tn = 43
]; B =
[tp = 5 fn = 0fp = 7 tn = 38
]A B
Proficiency 28.96% 49.86%Pearson’s φ 55.56% 59.32%
Accuracy 92.00% 86.00%F1-score 60.00% 58.82%
Slide 20 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Proficiency – The Odd One Out
A =
[tp = 3 fn = 2fp = 1 tn = 44
]; B =
[tp = 5 fn = 0fp = 6 tn = 39
]A B
Proficiency 35.55% 53.37%Pearson’s φ 63.89% 62.76%
Accuracy 94.00% 88.00%F1-score 66.67% 62.50%
Slide 21 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Accuracy – The Odd One Out
A =
[tp = 1 fn = 4fp = 0 tn = 45
]; B =
[tp = 5 fn = 0fp = 13 tn = 32
]A B
Proficiency 14.77% 34.57%Pearson’s φ 42.86% 44.44%
Accuracy 92.00% 74.00%F1-score 33.33% 43.48%
Slide 22 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
F1-score – The Odd One Out
A =
[tp = 1 fn = 4fp = 0 tn = 45
]; B =
[tp = 2 fn = 3fp = 2 tn = 43
]A B
Proficiency 14.77% 14.71%Pearson’s φ 42.86% 39.32%
Accuracy 92.00% 90.00%F1-score 33.33% 44.44%
Slide 23 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Predictor Re-Labeling
For a predictor P , let 1− P be the re-labeled predictor, i.e.,when P predicts 1, 1− P predicts 0 and vice versa.
Then
Accuracy(1− P) = 1− Accuracy(P)
φ(1− P) = −φ(P)
α(1− P) = α(P)
No similar simple relationship exists for F1.
Slide 24 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Slide 25 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Multi-Class Prediction
I Examples:I Character recognition
I Mislabeling is badI Group detection
I Label rotation is fine
I Metrics:I Accuracy = P(Actual = Predicted)I No Recall, Precision, F1!
Slide 26 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Pearson’s φ
Define
φ2 =χ2
N=
N∑i=1
N∑j=1
(Oij − Eij)2
Eij
where
Oij = P(Predicted = i & Actual = j)
Eij = P(Predicted = i)× P(Actual = j)
I 0 for a worthless (independent) predictor
I Perfect predictor: depends on the data
Slide 27 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
ProficiencySame as before!
α =I (Predicted; Actual)
H(Actual)
H(A) = −N∑i=1
P(A = i) log2 P(A = i)
I (P ; A) =N∑i=1
N∑j=1
Oij log2
Oij
Eij
I 0 for the worthless predictor
I 1 for the perfect (and mis-labeled!) predictor
Slide 28 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
φ vs α
φ is to Chi-squared testsame as
α is to Likelihood-ratio test
Neyman-Pearson lemmaLikelihood-ratio test is the most powerful test.
Slide 29 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
This Metric is Old! Why is it Ignored?
Tradition : My teacher used it
Inertia : I used it previously
Cost : Log is more computationally expensive than ratiosI Not anymore!
Intuition : Information Theory is hardI Intuition is learned: start Information Theory in High
School!
Mislabeled = Perfect : Can be confusing or outrightundesirable
I Use the Hungarian algorithm
Slide 30 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Slide 31 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Multi-Label Categorization
I Examples:I Text Categorization
I Mislabeling is badI But may indicate problems with taxonomy
I Community DetectionI Label rotation is fine
I Metrics:I No Accuracy: cannot handle partial matchesI Precision & Recall work again!
Slide 32 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Precision & Recall
Recall =
∑i #{objects correctly classified as ci}∑
i #{objects actually in ci}
=
∑i #{oj | ci ∈ Actual(oj) ∩ Predicted(oj)}∑
i #{oj | ci ∈ Actual(oj)}
=
∑j #[Actual(oj) ∩ Predicted(oj)]∑
j #Actual(oj)
Precision =
∑i #{objects correctly classified as ci}∑
i #{objects classified as ci}
=
∑i #{oj | ci ∈ Actual(oj) ∩ Predicted(oj)}∑
i #{oj | ci ∈ Predicted(oj)}
=
∑j #[Actual(oj) ∩ Predicted(oj)]∑
j #Predicted(oj)
Slide 33 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Precision & Recall – ?!
I The above is the “macro” Precision & Recall (and F1)
I Can also define “micro” Precision & Recall (and F1)
I There is some confusion as to which is which
Side NoteSingle label per object =⇒
Precision = Recall = Accuracy = F1
Slide 34 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Proficiency: DefinitionIntroduce binary random variables:
Aci := ci ∈ Actual
Pci := ci ∈ Predicted
Define:
α =I (∏
i Pci ;∏
i Aci)
H(∏
i Aci)
Problem: cannot compute!
I KDD Cup 2005 Taxonomy: 67 categories
I Cartesian product: 267 > 1020 � 800k examples
Slide 35 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Proficiency: Estimate
Numerator : Assume that Aci is independent of everythingbut Pci (similar to Naıve Bayes).
Denominator : Use H(A× B) ≥ H(A) + H(B)
Define:
α =
∑i I (Pci ; Aci)∑
i H(Aci)=
∑i H(Aci)α(Pci ,Aci)∑
i H(Aci)
where
α(Pci ,Aci) =I (Pci ; Aci)
H(Aci)
Slide 36 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Proficiency: Permuted
Recover re-labeling invarianceLet M(c) be the optimal assignment with the cost matrixbeing the pairwise mutual informations.Define the Permuted Proficiency metric:
α′ =
∑i I (M(Pci); Aci)∑
i H(Aci)=
∑i H(Aci)α(M(Pci),Aci)∑
i H(Aci)
M is optimal =⇒ α ≤ α′
(equality iff the optimal assignment is the identity.)
Slide 37 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Proficiency: Properties
Meaning(An estimate of) the share of the information contained in theactual distribution recovered by the classifier.
Strong DiscriminationYes!
UniversalityThe independence assumption above weakens the claim thatthe metric has the same meaning across all domains and datasets.
Slide 38 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Example: KDD Cup 2005
I 800 queries
I 67 categories
I 3 human labelers
Actual labeler 1 labeler 2 labeler 3Predicted labeler 2 labeler 3 labeler 1
Precision 63.48% 36.50% 58.66%Recall 41.41% 58.62% 55.99%α 24.73% 28.06% 33.26%α′ 25.02% 28.62% 33.51%
Reassigned 9 12 11
Slide 39 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Each Human Against Dice
Pit each of the three human labelers against the randomlabeler with the same category probability distribution:
Labeler 1 Labeler 2 Labeler 3
F1 14.3% 7.7% 19.2%examples/category 3.7± 1.1 2.4± 0.9 3.8± 1.1categories/example 44± 56 28± 31 48± 71
More examples/category and categories/example =⇒ higherF1!
Slide 40 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Example: Academic Setting
Consider a typical University department:Every professor serves on 9 administrative committees out of10 available.
Worthless PredictorAssign each professor to 9 random committees.
Performance
I Precision = Recall = F1 = 90%
I Proficiency: α = 0
Slide 41 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Example: Brand DetectionLaclavik, Steingold, Ciglan, WSDM 2016
I 10k queries, 1k brandsI 90% of queries have no brands, 9% - one brandI Precision = 30%,Recall = 93%,F1 = 45%.I Proficiency: α = 67%!I Precision & Recall don’t care about no-brand queries!
Add 10k queries without brands, Precision & Recall arethe same, Proficiency: α = 69%.
I Replace the empty brands with the None brand.I Original:
Precision = 78%,Recall = 81%,F1 = 80%, α = 61%.I Extra empties:
Precision = 89%,Recall = 91%,F1 = 90%, α = 65%.
Slide 42 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Numeric Stability
I Think of the data as an infinite stream of observations,and view the actually available data as a sample.
I How would the metrics change if the sample is different?I All metrics have approximately the same variability
(standard deviation):I ≈ 1% for 800 observations of KDD Cup 2005I ≈ 0.5% for 10,000 observations in the Magnetic data set
Slide 43 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Slide 44 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Summary
I If you know the costs – use the expected value.
I If you know what you want (Recall/Precision &c) – use it.
I If you want a general metric, use Proficiency instead of F1.
Slide 45 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Implementation
Python code inhttps://github.com/Magnetic/proficiency-metric
Contributions of implementations in other languages arewelcome!
Slide 46 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Slide 47 of 48
InformationTheoretic Metrics
for Multi-ClassPredictor
Evaluation
Sam Steingold,Michal Laclavık
Introduction:predictors andtheir evaluation
Binary Prediction
Multi-ClassPrediction
Multi-LabelCategorization
Summary
Conclusion
Conclusion
Questions?
Slide 48 of 48