Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep...

41
Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation Marco Ancona, Cengiz Öztireli 2 , Markus Gross 1,2 1 Department of Computer Science, ETH Zurich, Switzerland 2 Disney Research, Zurich, Switzerland

Transcript of Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep...

Page 1: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Marco Ancona, Cengiz Öztireli2, Markus Gross1,2

1Department of Computer Science, ETH Zurich, Switzerland2Disney Research, Zurich, Switzerland

Page 2: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Attribution method

Pre-trainedmodel

TARGET

Page 3: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Layer-wise Relevance Propagation (LRP)Bach et al. 2015

DeepLIFTShrikumar et al. 2017

Saliency MapsSimonyan et al. 2015

Integrated GradientsSundararajan et al. 2017

Grad-CAMSelvaraju et al. 2016

Simple occlusionZeiler et al. 2014

LIMERibeiro et al. 2016

GuidedBackpropagationSpringenberg et al. 2014

Prediction DifferenceAnalysisZintgraf et al. 2017

Meaningful PerturbationFong et al. 2017

Gradient * InputShrikumar et al. 2016

KernelSHAP/DeepSHAPLundberg et al., 2017

Attribution methods

Page 4: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Evaluating attribution methods

• No ground-truth explanation à not easy to evaluate empirically

Page 5: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Evaluating attribution methods

• No ground-truth explanation à not easy to evaluate empirically• Often based on heuristics à not easy to justify theoretically

Page 6: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Evaluating attribution methods

• No ground-truth explanation à not easy to evaluate empirically• Often based on heuristics à not easy to justify theoretically

“Axiomatic approach”From a set of desired properties to the method definition

Page 7: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

(Some) desirable propertiesCompleteness

ContinuityAttributions for two nearly identical inputs on a continuous function should be nearly identical.

LinearityAttributions generated for a linear combination of two models should also be a linear combination of the original attributions.

Symmetry

Attributions should sum up to the output of the function being considered, for comprehensive accounting.

If two features have exactly the same role in the model, they should receive the same attribution.

Page 8: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values Shapley,LloydS.,1953

The only attribution method that satisfies all the aforementioned properties.

Page 9: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values Shapley,LloydS.,1953

The only attribution method that satisfies all the aforementioned properties.

Page 10: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values

The function to analyze(eg. the map from the input layer to a specific output

neuron in a DNN)

Shapley,LloydS.,1953

Page 11: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley ValuesS is a given set of input features

Shapley,LloydS.,1953

Page 12: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values Shapley,LloydS.,1953

Page 13: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values

All unique subsets S of features taken from the input (set) P

Shapley,LloydS.,1953

Page 14: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values

All unique subsets S of features taken from the input (set) P

Shapley,LloydS.,1953

Page 15: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values

average

Shapley,LloydS.,1953

Page 16: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values

marginal contributionaverage

Shapley,LloydS.,1953

Page 17: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values

marginal contributionaverageall subsets

Shapley,LloydS.,1953

Page 18: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values

marginal contributionaverageall subsets

“The average marginal contribution of a feature with respect to all subsets of other features”

Shapley,LloydS.,1953

Page 19: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley Values

Issue: testing all subsets is unfeasible!

Shapley,LloydS.,1953

Page 20: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley value sampling Castroetal.,2009

0.16

Page 21: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

0.16 0.10

Shapley value sampling Castroetal.,2009

Page 22: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

0.16 0.10 0.25

Shapley value sampling Castroetal.,2009

Page 23: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

0.16 0.10 0.25 -0.35

Shapley value sampling Castroetal.,2009

Page 24: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Pros: Shapley value sampling is unbiased

Page 25: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Pros: Shapley value sampling is unbiased

Cons: might require a lot of samples (network evaluations) to produce an accurate result

Page 26: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Pros: Shapley value sampling is unbiased

Cons: might require a lot of samples (network evaluations) to produce an accurate result

Can we avoid sampling?

Page 27: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation
Page 28: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation
Page 29: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Shapley value sampling

Page 30: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley Propagation

Page 31: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley Propagation

Page 32: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley Propagation

ReLU

Page 33: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley Propagation

ReLU

k out of NFeatures on

Page 34: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley Propagation

ReLU

k out of NFeatures on

Page 35: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley Propagation

“Rectified” Normal Distribution

ReLU

k out of NFeatures on

Page 36: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley Propagation

0

0

0

0

0

0

Page 37: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley Propagation

0

0

0

0

0

0

Page 38: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley PropagationTo propagate distributions through the network layers we use Lightweight Probabilistic Deep Networks Gast etal.,2018

Page 39: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Deep Approximate Shapley PropagationTo propagate distributions through the network layers we use Lightweight Probabilistic Deep Networks Affine transformation

Rectified Linear Unit

Leaky Rectified Linear Unit

Mean pooling

Max pooling

Gast etal.,2018

The use of other probabilistic frameworks is also possible

Page 40: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

DASP vs other methods

Gradient-based methods ü (Very) fast✗ Poor Shapley Value estimation

Sampling-based methods ü Unbiased Shapley Value estimator✗ Slow

DASP

Page 41: Explaining Deep Neural Networks with a Polynomial Time ...13-09-00)-13-09-25... · Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

For details, come at the posterPacific Ballroom #63

Thank you

Lightweight Probabilistic Deep Network (Keras)github.com/marcoancona/LPDN

Deep Approximate Shapley Propagationgithub.com/marcoancona/DASP

ReferencesLloyd S. Shapley, A value for n-person games, 1952Castro et al., Polynomial calculation of the Shapley value based on sampling, 2009Fatima et al., A linear approximation method for the Shapley value, 2014Ribeiro et al., "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016Sundararajan et al., Axiomatic attribution for deep networks, 2017Shrikumar at al., Learning important features through propagating activation differences, 2017 Lundberg et al., A Unified Approach to Interpreting Model Predictions, 2017Gast et al., Lightweight Probabilistic Deep Networks, 2018