Ideas on Machine Learning Interpretability
Patrick Hall, Wen Phan, SriSatish Ambati, and the H2O.ai team
February 2017
Contents

Part 1: Seeing your data
• Glyphs
• Correlation graphs
• 2-D projections
• Partial dependence plots
• Residual analysis

Part 2: Using machine learning in regulated industry
• OLS regression alternatives
• Build toward ML model benchmarks
• ML in traditional analytics processes
• Small, interpretable ensembles
• Monotonicity constraints

Part 3: Understanding complex ML models
• Surrogate models
• LIME
• Maximum activation analysis
• Sensitivity analysis
• Variable importance measures
• TreeInterpreter
A framework for interpretability

Complexity of learned functions:
• Linear, monotonic
• Nonlinear, monotonic
• Nonlinear, non-monotonic
Enhancing trust and understanding: the mechanisms and results of an interpretable model should be both transparent AND dependable.
Scope of interpretability: Global vs. local
Application domain: Model-agnostic vs. model-specific
Part 1: Seeing your data
Glyphs

Variables and their values can be represented by small pictures with certain attributes, called “glyphs”.
Here the four variables are represented by their position in a square and their values are represented by a color.
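A minimal sketch of the idea with matplotlib, using hypothetical stand-in data: each record becomes a 2x2 square of cells, where position within the square encodes the variable and color encodes its value.

```python
# Glyph sketch: one 2x2 colored square per record (stand-in data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.random((12, 4))  # 12 records, 4 variables scaled to [0, 1]

fig, axes = plt.subplots(3, 4, figsize=(6, 4.5))
for record, ax in zip(data, axes.ravel()):
    # Position in the square encodes the variable; color encodes its value.
    ax.imshow(record.reshape(2, 2), cmap="viridis", vmin=0, vmax=1)
    ax.set_xticks([]); ax.set_yticks([])
plt.tight_layout()
plt.show()
```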
Correlation graphs
The nodes of this graph are the variables in a data set. The weights between the nodes are defined by the absolute value of their pairwise Pearson correlation.
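A sketch of this construction with networkx and pandas; the DataFrame below is synthetic stand-in data, and the 0.3 edge threshold is an arbitrary choice for readability.

```python
# Correlation graph: nodes are variables, edge weights are |Pearson r|.
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
df = pd.DataFrame({                      # synthetic stand-in data
    "a": base[:, 0],
    "b": base[:, 0] + 0.3 * rng.normal(size=200),
    "c": base[:, 1],
    "d": base[:, 1] - 0.5 * rng.normal(size=200),
    "e": rng.normal(size=200),
})

corr = df.corr().abs()  # absolute pairwise Pearson correlation
G = nx.Graph()
for i in corr.columns:
    for j in corr.columns:
        if i < j and corr.loc[i, j] > 0.3:  # prune weak edges for readability
            G.add_edge(i, j, weight=corr.loc[i, j])

pos = nx.spring_layout(G, seed=0)
widths = [5 * G[u][v]["weight"] for u, v in G.edges()]
nx.draw(G, pos, with_labels=True, width=widths, node_color="lightblue")
plt.show()
```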
2-D projections
• 784 dimensions to 2 dimensions with PCA
• 784 dimensions to 2 dimensions with an autoencoder network
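A sketch of the PCA half using scikit-learn; the slide projects 784-dimensional MNIST, while the 64-dimensional digits data stands in here. The autoencoder projection would need a deep learning framework and is omitted.

```python
# Project high-dimensional data to 2-D with PCA and plot it by class.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

X, y = load_digits(return_X_y=True)       # 64-dimensional stand-in for MNIST
X2 = PCA(n_components=2).fit_transform(X)  # 64 dimensions down to 2

plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap="tab10", s=8)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.show()
```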
Partial dependence plots
Source: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
HomeValue ~ MedInc + AveOccup + HouseAge + AveRooms
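The formula above matches scikit-learn's California housing data, so a partial dependence sketch might look like the following; the gradient boosting model is an assumed stand-in for whatever fitted model is being explained.

```python
# One-variable partial dependence plots for the four terms above.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt

data = fetch_california_housing()
model = GradientBoostingRegressor(random_state=0).fit(data.data, data.target)

features = [data.feature_names.index(f)
            for f in ("MedInc", "AveOccup", "HouseAge", "AveRooms")]
PartialDependenceDisplay.from_estimator(model, data.data, features,
                                        feature_names=data.feature_names)
plt.show()
```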
Residual analysis
Residuals from a machine learning model should be randomly distributed; obvious patterns in residuals can indicate problems with data preparation or model specification.
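A residual-plot sketch, assuming a random forest fit to the California housing data; a well-specified model shows a structureless band around zero.

```python
# Plot predicted values against residuals on held-out data.
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
plt.scatter(pred, y_te - pred, s=5, alpha=0.4)
plt.axhline(0, color="red")  # structure around this line signals trouble
plt.xlabel("Predicted"); plt.ylabel("Residual")
plt.show()
```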
Part 2: Using machine learning in regulated industry
OLS regression alternatives
• Penalized regression
• Generalized additive models
• Quantile regression
Source: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
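Sketches of two of the three alternatives above via scikit-learn (generalized additive models typically need a separate package such as pyGAM, so they are omitted); the penalty strengths and the subsample size are arbitrary assumptions.

```python
# Two OLS alternatives: elastic net and quantile (median) regression.
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import ElasticNet, QuantileRegressor
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)
X, y = StandardScaler().fit_transform(X)[:2000], y[:2000]  # subsample for speed

# Penalized regression: L1 + L2 penalties shrink and select coefficients.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("elastic net coefficients:", enet.coef_)

# Quantile regression: models the conditional median, not the mean.
qr = QuantileRegressor(quantile=0.5, alpha=0.01).fit(X, y)
print("median regression coefficients:", qr.coef_)
```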
Build toward ML model benchmarks
Incorporate interactions and piecewise linear components to increase the accuracy of linear models relative to machine learning models.
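One way to operationalize this with scikit-learn: bin each variable for piecewise components, cross the bins for interactions, and compare cross-validated R² against the plain linear fit. The bin count and interaction degree here are arbitrary choices.

```python
# Enhance a linear model with piecewise (binned) and interaction terms.
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures

X, y = fetch_california_housing(return_X_y=True)

plain = LinearRegression()
enhanced = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="onehot-dense"),   # piecewise components
    PolynomialFeatures(degree=2, interaction_only=True,  # pairwise interactions
                       include_bias=False),
    LinearRegression(),
)
print("plain    R^2:", cross_val_score(plain, X, y).mean())
print("enhanced R^2:", cross_val_score(enhanced, X, y).mean())
```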
Machine learning in traditional analytics processes
Small, interpretable ensembles
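The slide gives no recipe, so the following is only one plausible reading: blend a small number of individually explainable models, here a logistic regression and a depth-3 decision tree, with a simple average.

```python
# A two-model ensemble in which each member stays interpretable on its own.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

glm = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The blend is a simple average of two transparent models' probabilities.
blend = (glm.predict_proba(X_te)[:, 1] + tree.predict_proba(X_te)[:, 1]) / 2
print("blend accuracy:", ((blend > 0.5) == y_te).mean())
```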
Monotonicity constraints
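A sketch using scikit-learn's HistGradientBoostingRegressor, whose monotonic_cst argument enforces per-feature monotonicity (H2O and XGBoost expose similar options); the chosen directions are illustrative assumptions.

```python
# Constrain a gradient boosting model to be monotonic in chosen features.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import HistGradientBoostingRegressor

data = fetch_california_housing()
cst = [0] * len(data.feature_names)               # 0: unconstrained
cst[data.feature_names.index("MedInc")] = 1       # +1: non-decreasing in MedInc
cst[data.feature_names.index("AveOccup")] = -1    # -1: non-increasing in AveOccup

model = HistGradientBoostingRegressor(monotonic_cst=cst)
model.fit(data.data, data.target)
```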
Part 3: Understanding complex machine learning models
Surrogate models
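The classic recipe: fit an interpretable model to the predictions of a complex one, then read the surrogate as an approximate global explanation. A sketch, with the depth-3 tree an arbitrary choice:

```python
# Train a shallow decision tree on a random forest's predictions.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

data = fetch_california_housing()
complex_model = RandomForestRegressor(n_estimators=50, random_state=0)
complex_model.fit(data.data, data.target)

# Fit the surrogate to the complex model's predictions, not the real targets.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(data.data, complex_model.predict(data.data))

print("fidelity (R^2 vs. complex model):",
      surrogate.score(data.data, complex_model.predict(data.data)))
print(export_text(surrogate, feature_names=data.feature_names))
```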
Local interpretable model-agnostic explanations (LIME)
Source: https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime
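A usage sketch of the lime package described in that article (assumes pip install lime); the dataset and classifier here are stand-ins.

```python
# Explain a single prediction with LIME's tabular explainer.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(data.data,
                                 feature_names=list(data.feature_names),
                                 class_names=list(data.target_names))
exp = explainer.explain_instance(data.data[0], model.predict_proba,
                                 num_features=5)
print(exp.as_list())  # local, per-prediction feature weights
```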
Maximum activation analysis
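The slide shows no code, so this is only one simple variant of the idea: find the real examples that most strongly activate each hidden unit of a small scikit-learn MLP, recomputing the first-layer activations by hand from the fitted weights.

```python
# Find the training rows that maximally activate each hidden unit.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X, y)

# First-layer activations by hand: relu(X @ W + b).
hidden = np.maximum(0, X @ mlp.coefs_[0] + mlp.intercepts_[0])
for unit in range(hidden.shape[1]):
    print(f"unit {unit}: most-activating row index {hidden[:, unit].argmax()}")
```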
Sensitivity analysis
Data distributions shift over time. How will your model handle these shifts?
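A sketch of that question in code: simulate a plausible shift (here an assumed 10% drift in median income) and measure how the model's predictions move.

```python
# Sensitivity analysis: perturb one input and watch the predictions.
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

data = fetch_california_housing()
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(data.data, data.target)

shifted = data.data.copy()
inc = data.feature_names.index("MedInc")
shifted[:, inc] *= 1.10  # simulate a 10% upward drift in median income

delta = model.predict(shifted) - model.predict(data.data)
print("mean prediction shift:", delta.mean(), "max:", np.abs(delta).max())
```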
Variable importance measures

Global variable importance indicates the impact of a variable on the model for the entire training data set.

Local variable importance can indicate the impact of a variable on each decision a model makes, similar to reason codes.
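A sketch of global importance via permutation importance in scikit-learn: shuffle one column at a time and record the score drop. Local importances, like the reason codes mentioned above, need per-prediction methods such as LIME or TreeInterpreter.

```python
# Global variable importance by permutation.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

data = fetch_california_housing()
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(data.data, data.target)

result = permutation_importance(model, data.data, data.target,
                                n_repeats=5, random_state=0)
for name, imp in sorted(zip(data.feature_names, result.importances_mean),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```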
TreeInterpreter
Source: http://blog.datadive.net/interpreting-random-forests/
TreeInterpreter decomposes decision tree and random forest predictions into bias (overall average) and per-variable contribution terms.

This slide portrays the decomposition of the decision path into bias and individual contributions for a single decision tree; treeinterpreter simply prints a ranked list of the bias and individual contributions for a given prediction.
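A usage sketch of the treeinterpreter package from the cited post (assumes pip install treeinterpreter); the data and forest are stand-ins.

```python
# Decompose one prediction: prediction = bias + sum(contributions).
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from treeinterpreter import treeinterpreter as ti

data = fetch_california_housing()
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(data.data, data.target)

prediction, bias, contributions = ti.predict(model, data.data[:1])
print("bias (training-set average):", bias[0])
for name, contrib in sorted(zip(data.feature_names, contributions[0]),
                            key=lambda t: -abs(t[1])):
    print(f"{name}: {contrib:+.3f}")  # ranked per-variable contributions
print("reconstructed prediction:", bias[0] + contributions[0].sum())
```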