Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions...
Transcript of Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions...
![Page 1: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/1.jpg)
Explaining and Interpreting Deep Neural Networks
Klaus-Robert Müller, Wojciech Samek, Gregoire
Montavon, Sebastian Lapuschkin, Kristof Schütt et al.
![Page 2: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/2.jpg)
Outline
• general remarks on ML, also on explaining and interpreting
• understanding single decisions of nonlinear learners
• Layer-wise Relevance Propagation (LRP)
• Applications in Neuroscience and Physics
![Page 3: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/3.jpg)
ML in a nutshell
Kernel Methods: SVM etc.
Deep Neural Networks
![Page 4: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/4.jpg)
Based on: ICASSP 2017 Tutorial
![Page 5: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/5.jpg)
Acknowledgements
![Page 6: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/6.jpg)
Recent ML systems reach superhuman performance
ML in the sciences
![Page 7: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/7.jpg)
From Data to Information
![Page 8: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/8.jpg)
From Data to Information
![Page 9: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/9.jpg)
Interpretable vs. powerful models?
![Page 10: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/10.jpg)
Interpretable vs. powerful models?!
Kernel machines
![Page 11: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/11.jpg)
Interpretable vs. powerful models?
![Page 12: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/12.jpg)
Different dimensions of interpretability
![Page 13: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/13.jpg)
Why interpretability?
![Page 14: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/14.jpg)
Why interpretability?
![Page 15: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/15.jpg)
Why interpretability?
![Page 16: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/16.jpg)
Why interpretability?
![Page 17: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/17.jpg)
Why interpretability?
![Page 18: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/18.jpg)
Why interpretability?
![Page 19: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/19.jpg)
Techniques of Interpretation
![Page 20: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/20.jpg)
Techniques of Interpretation
![Page 21: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/21.jpg)
Techniques of Interpretation
![Page 22: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/22.jpg)
Techniques of Interpretation
![Page 23: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/23.jpg)
Techniques of Interpretation
![Page 24: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/24.jpg)
Techniques of Interpretation
![Page 25: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/25.jpg)
Techniques of Interpretation
![Page 26: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/26.jpg)
Techniques of Interpretation
![Page 27: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/27.jpg)
Interpreting models
![Page 28: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/28.jpg)
Interpreting with class prototypes
![Page 29: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/29.jpg)
Examples of Class Prototypes
![Page 30: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/30.jpg)
Building more natural prototypes
Montavon, Samek, Müller arxiv 2017
![Page 31: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/31.jpg)
Building Prototypes using a generator
![Page 32: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/32.jpg)
Building Prototypes using a generator
![Page 33: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/33.jpg)
Types of Interpretation
![Page 34: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/34.jpg)
Approaches to interpretability
![Page 35: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/35.jpg)
Explaining models
![Page 36: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/36.jpg)
Explaining Neural Network Predictions
Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers
- based on generic theory (related to Taylor decomposition – deep taylor decomposition M et al 16)
- applicable to any NN with monotonous activation, BoW models, Fisher Vectors, SVMs etc.
Explanation: “Which pixels contribute how much to the classification” (Bach et al 2015)
(what makes this image to be classified as a car)
Sensitivity / Saliency: “Which pixels lead to increase/decrease of prediction score when changed”
(what makes this image to be classified more/less as a car) (Baehrens et al 10, Simonyan et al 14)
Cf. Deconvolution: “Matching input pattern for the classified object in the image” (Zeiler & Fergus 2014)
(relation to f(x) not specified) Activation Maximization
Each method solves a different problem!!!
![Page 37: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/37.jpg)
Classification
cat
ladybug
dog
large activation
Explaining Neural Network Predictions
![Page 38: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/38.jpg)
Explanation
cat
ladybug
dog
=
Initialization
Explaining Neural Network Predictions
![Page 39: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/39.jpg)
Explanation
cat
ladybug
dog
Theoretical interpretation
Deep Taylor Decomposition
?
Explaining Neural Network Predictions
depends on the activations and the weights
![Page 40: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/40.jpg)
Explanation
cat
ladybug
dog
Relevance Conservation Property
Explaining Neural Network Predictions
large relevance
![Page 41: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/41.jpg)
Advantages of LRP over Sensitivity
1. Global explanations: What makes a car a car and not what makes a car less / more a car.
2. No discontinuities: small variations do not result in large changes of the relevance.
![Page 42: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/42.jpg)
Advantages of LRP over both Sensitivity and Deconvolution
Image specific explanations: LRP takes into account the activations.
LRP provides different
explanations for different
input images.
For NNs without pooling
layers Sensitvity and
Deconvolution provides the
same explanations for
different samples.
![Page 43: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/43.jpg)
Positive and Negative Evidence: LRP distinguishs between positive evidence,
supporting the classification decision, and negative evidence, speaking against the
prediction
LRP indicates what speaks
for class ‘3’ and speaks
against class ‘9’
The sign of Sensitivity and
Deconvolution does not have
this interpretation.
-> taking norm gives unsigned
visualizations
Advantages of LRP over both Sensitivity and Deconvolution
![Page 44: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/44.jpg)
Male or Female?
![Page 45: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/45.jpg)
Advantages of LRP over both Sensitivity and Deconvolution
Aggregation of Relevance: LRP explanations are normalized (conservation of
relevance). This allows to meaningfully aggregate relevance over datasets or
regions in an image.
![Page 46: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/46.jpg)
Explaining Neural Network Predictions
Sensitivity Deconvolution LRP
![Page 47: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/47.jpg)
Application: understanding different DNN Architectures
GoogleNet focuses on the
animal faces and only few pixels.
BVLC CaffeNet is less sparse.
![Page 48: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/48.jpg)
Explaining Predictions Pixel-wise
Neural networks Kernel methods
![Page 49: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/49.jpg)
Understanding learning models
for complex gaming scenarios
![Page 50: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/50.jpg)
Analysing Breakout: LRP vs. Sensitivity
LRP sensitivity
![Page 51: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/51.jpg)
Perspectives
![Page 52: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/52.jpg)
Is the Generalization Error
all we need?
![Page 53: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/53.jpg)
Application: Comparing Classifiers
![Page 54: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/54.jpg)
Machine Learning in the Sciences
![Page 55: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/55.jpg)
Machine Learning in Neuroscience
![Page 56: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/56.jpg)
BBCI Set-up: Let the machines learn
Artifact removal
[cf. Müller et al. 2001, 2007, 2008, Dornhege et al. 2003, 2007, Blankertz et al. 2004, 2005, 2006, 2007, 2008]
![Page 57: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/57.jpg)
Brain Computer Interfacing: ‚Brain Pong‘
Leitmotiv: ›let the machines learn‹
Berlin Brain Computer Ínterface
• ML reduces patient training from
300h -> 5min
Applications
• help/hope for patients (ALS,
stroke…)
• neuroscience
• neurotechnology (video
coding, gaming, monitoring
driving)
![Page 58: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/58.jpg)
DNN Explanation Motor Imagery BCI
Note: Explanation available for single Trial (Sturm et al submitted)
![Page 59: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/59.jpg)
Machine Learning in Chemistry,
Physics and Materials
Matthias Rupp, Anatole von Lilienfeld,
Alexandre Tkatchenko, Klaus-Robert Müller
![Page 60: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/60.jpg)
Machine Learning for chemical compound space
Ansatz:
instead of
[from von Lilienfeld]
![Page 61: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/61.jpg)
Coulomb representation of molecules
2.4
iii Z=M
ji
ji
ijRR
ZZ=M
{Z1,R
1}
{Z2,R
2}
{Z3,R
3}
{0,R22}{0,R
21} {0,R23}
+ phantom atoms
{Z4,R
4}
...
Coulomb Matrix (Rupp, Müller et al 2012, PRL)
ijM
2323 M
![Page 62: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/62.jpg)
Kernel ridge regression
Distances between M define Gaussian kernel matrix K
Predict energy as sum over weighted Gaussians
using weights that minimize error in training set
Exact solution
As many parameters as molecules + 2 global parameters, characteristic length-scale or kT of system (σ), and noise-level (λ)
[from von Lilienfeld]
![Page 63: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/63.jpg)
Predicting Energy of small molecules: Results
March 2012
Rupp et al., PRL
9.99 kcal/mol
(kernels + eigenspectrum)
December 2012
Montavon et al., NIPS
3.51 kcal/mol
(Neural nets + Coulomb sets)
2015 Hansen et al 1.3kcal/mol at
10 million times faster than the
state of the art
Prediction considered chemically
accurate when MAE is below 1
kcal/mol
Dataset available at http://quantum-machine.org
![Page 64: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/64.jpg)
Learning Atomistic Representations with
Deep Tensor Neural Networks
Kristof Schütt,Farhad Arbabzadah,
Stefan Chmiela, Alexandre Tkatchenko,
Klaus-Robert Müller
![Page 65: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/65.jpg)
Input Representation
![Page 66: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/66.jpg)
Deep Tensor Neural Network
![Page 67: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/67.jpg)
![Page 68: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/68.jpg)
DTNN in detail
![Page 69: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/69.jpg)
DTNN in detail II
![Page 70: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/70.jpg)
Chemical Compound Space
![Page 71: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/71.jpg)
Molecular Dynamics Simulations
![Page 72: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/72.jpg)
Explaining and Visualizing the learned interactions
![Page 73: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/73.jpg)
Local ‚potentials‘ for various probes
![Page 74: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/74.jpg)
Quantum Chemical Insights: aromaticity
![Page 75: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/75.jpg)
Quantum Chemical Insights
![Page 76: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/76.jpg)
Conclusion
• explaining & interpreting nonlinear models is essential
• orthogonal to improving DNNs and other models
• need for opening the blackbox …
• understanding nonlinear models is essential for Sciences & AI
• new theory: LRP is based on deep taylor expansion SAMEK LECTURE
Remark: @NIPS 2017 ML4QC & XPLAINABLE ML workshops
![Page 77: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/77.jpg)
![Page 78: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/78.jpg)
© 2013 Berlin Big Data Center • All Rights Reserved
![Page 79: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/79.jpg)
Further Reading I
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W. (2015). On pixel-wise
explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7).
Bießmann, F., Meinecke, F. C., Gretton, A., Rauch, A., Rainer, G., Logothetis, N. K., & Müller, K. R. (2010).
Temporal kernel CCA and its application in multimodal neuronal data analysis. Machine Learning,
79(1-2), 5-27.
Blum, L. C., & Reymond, J. L. (2009). 970 million druglike small molecules for virtual screening in the
chemical universe database GDB-13. Journal of the American Chemical Society, 131(25), 8732-8733.
Braun, M. L., Buhmann, J. M., & Müller, K. R. (2008). On relevant dimensions in kernel feature spaces. The
Journal of Machine Learning Research, 9, 1875-1908
Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., & Müller, K. R. (2017). Machine
learning of accurate energy-conserving molecular force fields. Science Advances, 3(5), e1603015.
Hansen, K., Montavon, G., Biegler, F., Fazli, S., Rupp, M., Scheffler, M., von Lilienfeld, A.O., Tkatchenko,
A., and Muller, K.-R. "Assessment and validation of machine learning methods for predicting molecular
atomization energies." Journal of Chemical Theory and Computation 9, no. 8 (2013): 3404-3419.
Hansen, K., Biegler, F., Ramakrishnan, R., Pronobis, W., von Lilienfeld, O. A., Müller, K. R., & Tkatchenko,
A. (2015). Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and
Nonlocality in Chemical Space, J. Phys. Chem. Lett. 6, 2326−2331.
Harmeling, S., Ziehe, A., Kawanabe, M., & Müller, K. R. (2003). Kernel-based nonlinear blind source
separation. Neural Computation, 15(5), 1089-1124.
Sebastian Mika, Gunnar Ratsch, Jason Weston, Bernhard Scholkopf, KR Muller (1999), Fisher discriminant
analysis with kernels, Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE
Signal Processing Society Workshop, 41-48.
Kloft, M., Brefeld, U., Laskov, P., Müller, K. R., Zien, A., & Sonnenburg, S. (2009). Efficient and accurate lp-
norm multiple kernel learning. In Advances in neural information processing systems (pp. 997-1005).
![Page 80: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/80.jpg)
Further Reading II
Laskov, P., Gehl, C., Krüger, S., & Müller, K. R. (2006). Incremental support vector learning: Analysis,
implementation and applications. The Journal of Machine Learning Research, 7, 1909-1936 Mika, S.,
Schölkopf, B., Smola, A. J., Müller, K. R., Scholz, M., & Rätsch, G. (1998). Kernel PCA and De-Noising
in Feature Spaces. In NIPS (Vol. 4, No. 5, p. 7).
Müller, K. R., Mika, S., Rätsch, G., Tsuda, K., & Schölkopf, B. (2001). An introduction to kernel-based
learning algorithms. Neural Networks, IEEE Transactions on, 12(2), 181-201.
Montavon, G., Braun, M. L., & Müller, K. R. (2011). Kernel analysis of deep networks. The Journal of
Machine Learning Research, 12, 2563-2581.
Montavon, Grégoire, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska Biegler, Andreas Ziehe,
Alexandre Tkatchenko, Anatole V. Lilienfeld, and Klaus-Robert Müller. "Learning invariant
representations of molecules for atomization energy prediction." In Advances in Neural Information
Processing Systems, pp. 440-448. 2012.
Montavon, G., Braun, M., Krueger, T., & Muller, K. R. (2013). Analyzing local structure in kernel-based
learning: Explanation, complexity, and reliability assessment. IEEE Signal Processing Magazine, 30(4),
62-74.
Montavon, G., Orr, G. & Müller, K. R. (2012). Neural Networks: Tricks of the Trade, Springer LNCS 7700.
Berlin Heidelberg.
Montavon, Grégoire, Matthias Rupp, Vivekanand Gobre, Alvaro Vazquez-Mayagoitia, Katja Hansen,
Alexandre Tkatchenko, Klaus-Robert Müller, and O. Anatole von Lilienfeld. "Machine learning of
molecular electronic properties in chemical compound space." New Journal of Physics 15, no. 9
(2013): 095003.
Snyder, J. C., Rupp, M., Hansen, K., Müller, K. R., & Burke, K. Finding density functionals with
machine learning. Physical review letters, 108(25), 253002. 2012.
![Page 81: Explaining and Interpreting Deep Neural Networks · Explaining Neural Network Predictions Layer-wise relevance Propagation (LRP, Bach et al 15) first method to explain nonlinear classifiers](https://reader033.fdocuments.us/reader033/viewer/2022051922/600f7794ec31e570bf644097/html5/thumbnails/81.jpg)
Further Reading III
Pozun, Z. D., Hansen, K., Sheppard, D., Rupp, M., Müller, K. R., & Henkelman, G., Optimizing transition
states via kernel-based machine learning. The Journal of chemical physics, 136(17), 174101. 2012 .
K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K. R. Müller, and E. K. U. Gross, How to represent crystal
structures for machine learning: Towards fast prediction of electronic properties Phys. Rev. B 89,
205118 (2014)
K.T. Schütt, F Arbabzadah, S Chmiela, KR Müller, A Tkatchenko, Quantum-chemical insights from deep
tensor neural networks, Nature Communications 8, 13890 (2017)
Rätsch, G., Onoda, T., & Müller, K. R. (2001). Soft margins for AdaBoost. Machine learning, 42(3), 287-
320.
Rupp, M., Tkatchenko, A., Müller, K. R., & von Lilienfeld, O. A. (2012). Fast and accurate modeling of
molecular atomization energies with machine learning. Physical review letters, 108(5), 058301.
Schölkopf, B., Smola, A., & Müller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue
problem. Neural computation, 10(5), 1299-1319.
Smola, A. J., Schölkopf, B., & Müller, K. R. (1998). The connection between regularization operators and
support vector kernels. Neural networks, 11(4), 637-649.
Schölkopf, B., Mika, S., Burges, C. J., Knirsch, P., Müller, K. R., Rätsch, G., & Smola, A. J. (1999). Input
space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10(5),
1000-1017.
Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., & Müller, K. R. (2002). A new discriminative kernel
from probabilistic models. Neural Computation, 14(10), 2397-2414.
Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lengauer, T., & Müller, K. R. (2000). Engineering support
vector machine kernels that recognize translation initiation sites. Bioinformatics, 16(9), 799-807.
.