Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation
Marco Ancona1, Cengiz Öztireli2, Markus Gross1,2
1 Department of Computer Science, ETH Zurich, Switzerland; 2 Disney Research, Zurich, Switzerland
Attribution methods
[Slide diagram: a pre-trained model and a target output feed into an attribution method]
• Layer-wise Relevance Propagation (LRP) (Bach et al., 2015)
• DeepLIFT (Shrikumar et al., 2017)
• Saliency Maps (Simonyan et al., 2015)
• Integrated Gradients (Sundararajan et al., 2017)
• Grad-CAM (Selvaraju et al., 2016)
• Simple occlusion (Zeiler et al., 2014)
• LIME (Ribeiro et al., 2016)
• Guided Backpropagation (Springenberg et al., 2014)
• Prediction Difference Analysis (Zintgraf et al., 2017)
• Meaningful Perturbation (Fong et al., 2017)
• Gradient * Input (Shrikumar et al., 2016)
• …
• KernelSHAP / DeepSHAP (Lundberg et al., 2017)
Evaluating attribution methods
• No ground-truth explanation → not easy to evaluate empirically
• Often based on heuristics → not easy to justify theoretically
“Axiomatic approach”: from a set of desired properties to the method definition
(Some) desirable properties
• Completeness: attributions should sum up to the output of the function being considered, for comprehensive accounting.
• Continuity: attributions for two nearly identical inputs on a continuous function should be nearly identical.
• Linearity: attributions generated for a linear combination of two models should also be a linear combination of the original attributions.
• Symmetry: if two features have exactly the same role in the model, they should receive the same attribution.
Shapley Values (Shapley, Lloyd S., 1953)
The only attribution method that satisfies all the aforementioned properties.
Let f be the function to analyze (e.g. the map from the input layer to a specific output neuron in a DNN), P the set of input features, and S any subset of features taken from P. The Shapley value of feature i is

\varphi_i = \sum_{S \subseteq P \setminus \{i\}} \frac{|S|!\,(|P| - |S| - 1)!}{|P|!} \bigl( f(S \cup \{i\}) - f(S) \bigr)

“The average marginal contribution of a feature with respect to all subsets of other features”
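The formula above can be computed exactly by brute force for a handful of features. A minimal sketch (the `shapley_values` helper and the toy weighted-sum game are hypothetical illustrations, not from the talk's code):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, n):
    """Exact Shapley values for an n-feature set function f.

    f maps a frozenset of "on" feature indices to a number.
    Runtime is exponential in n, so this only works for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Shapley weight: |S|! (|P| - |S| - 1)! / |P|!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                # Marginal contribution of feature i to coalition S
                phi[i] += weight * (f(frozenset(S) | {i}) - f(frozenset(S)))
    return phi

# Toy "model": a weighted sum over the active features (hypothetical example).
weights = [1.0, 2.0, 3.0]
f = lambda S: sum(weights[i] for i in S)
print(shapley_values(f, 3))  # linear game: each feature recovers its own weight
```

Note how Completeness holds: the attributions sum to f(P), the value of the full feature set.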
Issue: testing all subsets is infeasible, as their number grows exponentially with the number of input features!
Shapley value sampling (Castro et al., 2009)
[Slide animation: per-feature attribution estimates filled in one at a time, e.g. 0.16, 0.10, 0.25, −0.35]
Pros: Shapley value sampling is unbiased.
Cons: might require a lot of samples (network evaluations) to produce an accurate result.
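The sampling estimator of Castro et al. averages marginal contributions over random feature orderings. A minimal sketch (the `shapley_sampling` name and the saturating toy game are hypothetical, for illustration only):

```python
import random

def shapley_sampling(f, n, num_samples=2000, seed=0):
    """Unbiased Monte Carlo estimate of Shapley values (Castro et al., 2009).

    Draw random permutations of the features; a feature's marginal
    contribution when added after its predecessors in the permutation
    is an unbiased sample of its Shapley value.
    """
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_samples):
        perm = list(range(n))
        rng.shuffle(perm)
        S = frozenset()
        prev = f(S)
        for i in perm:
            S = S | {i}
            cur = f(S)
            phi[i] += cur - prev  # marginal contribution of i in this ordering
            prev = cur
    return [v / num_samples for v in phi]

# Hypothetical toy game: the value saturates once 2 features are active,
# so by symmetry each of the 4 features has an exact Shapley value of 0.5.
f = lambda S: min(len(S), 2)
print(shapley_sampling(f, 4))
```

Each sample costs one pass over the features, i.e. n evaluations of f; for a DNN every evaluation is a forward pass, which is why accuracy can get expensive.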
Can we avoid sampling?
Deep Approximate Shapley Propagation
Idea: instead of evaluating every coalition, treat “k out of N features on” as a distribution over the network's pre-activations. Passing a Normal distribution through a ReLU yields a “rectified” Normal distribution.
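For reference, the first two moments of the rectified Gaussian have a standard closed form (a known result, stated here for context rather than taken from the slides). With $X \sim \mathcal{N}(\mu, \sigma^2)$ and $\phi$, $\Phi$ the standard normal pdf and cdf:

```latex
\mathbb{E}[\mathrm{ReLU}(X)] = \mu\,\Phi\!\left(\frac{\mu}{\sigma}\right) + \sigma\,\phi\!\left(\frac{\mu}{\sigma}\right)
\qquad
\mathbb{E}[\mathrm{ReLU}(X)^2] = (\mu^2 + \sigma^2)\,\Phi\!\left(\frac{\mu}{\sigma}\right) + \mu\sigma\,\phi\!\left(\frac{\mu}{\sigma}\right)
```

The variance follows as $\mathbb{E}[\mathrm{ReLU}(X)^2] - \mathbb{E}[\mathrm{ReLU}(X)]^2$, which is what allows a Gaussian approximation to be carried past each ReLU.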
Deep Approximate Shapley Propagation
0
0
0
0
0
0
Deep Approximate Shapley Propagation
0
0
0
0
0
0
To propagate distributions through the network layers we use Lightweight Probabilistic Deep Networks (Gast et al., 2018), which provide probabilistic versions of common layers:
• Affine transformation
• Rectified Linear Unit
• Leaky Rectified Linear Unit
• Mean pooling
• Max pooling
• …
The use of other probabilistic frameworks is also possible.
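A minimal moment-matching sketch of such a propagation step, assuming independent diagonal Gaussians (the helper names `affine_moments` and `relu_moments` are hypothetical, not the LPDN library API):

```python
import math

def affine_moments(mus, vars_, w, b):
    """Moments of w·x + b for independent inputs x_j ~ N(mu_j, var_j)."""
    mean = sum(wj * mj for wj, mj in zip(w, mus)) + b
    var = sum(wj * wj * vj for wj, vj in zip(w, vars_))
    return mean, var

def relu_moments(mu, var):
    """Mean/variance of ReLU(X), X ~ N(mu, var), via the rectified
    Gaussian's closed-form moments (moment matching as in Gast et al., 2018)."""
    sigma = math.sqrt(var)
    if sigma == 0.0:
        return max(mu, 0.0), 0.0
    a = mu / sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    mean = mu * cdf + sigma * pdf
    second = (mu * mu + var) * cdf + mu * sigma * pdf
    return mean, max(second - mean * mean, 0.0)

# Hypothetical one-neuron example: affine layer followed by a ReLU.
mu, var = affine_moments([1.0, 0.5], [0.2, 0.2], w=[2.0, -1.0], b=0.1)
print(relu_moments(mu, var))
```

Stacking such steps layer by layer gives the output distribution in a single forward pass, instead of one network evaluation per sampled coalition.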
DASP vs other methods
• Gradient-based methods: ✓ (very) fast; ✗ poor Shapley Value estimation
• Sampling-based methods: ✓ unbiased Shapley Value estimator; ✗ slow
• DASP
For details, come to the poster: Pacific Ballroom #63
Thank you
Lightweight Probabilistic Deep Network (Keras): github.com/marcoancona/LPDN
Deep Approximate Shapley Propagation: github.com/marcoancona/DASP
References
• Lloyd S. Shapley, A value for n-person games, 1953
• Castro et al., Polynomial calculation of the Shapley value based on sampling, 2009
• Fatima et al., A linear approximation method for the Shapley value, 2014
• Ribeiro et al., "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016
• Sundararajan et al., Axiomatic attribution for deep networks, 2017
• Shrikumar et al., Learning important features through propagating activation differences, 2017
• Lundberg et al., A Unified Approach to Interpreting Model Predictions, 2017
• Gast et al., Lightweight Probabilistic Deep Networks, 2018