L14: Visualizing CNNs (Georgia Institute of Technology)
CS 4803 / 7643: Deep Learning
Zsolt Kira, Georgia Tech
Topics: – Visualizing CNNs
Plan for Today
• Visualizing CNNs
– Visualizing filters
– Last layer embeddings
– Visualizing activations
– Maximally activating patches
– Occlusion maps
– Salient or “important” pixels
• Gradient-based visualizations
– How to evaluate visualizations?
– Creating “prototypical” images for a class
– Creating adversarial images
– Deep dream
– Feature inversion
– Style Transfer
(C) Dhruv Batra and Zsolt Kira 2
What’s going on inside ConvNets?
This image is CC0 public domain
Class Scores: 1000 numbers
Input Image: 3 x 224 x 224
What are the intermediate features looking for?
Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012. Figure reproduced with permission.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
First Layer: Visualize Filters
AlexNet:64 x 3 x 11 x 11
ResNet-18:64 x 3 x 7 x 7
ResNet-101:64 x 3 x 7 x 7
DenseNet-121:64 x 3 x 7 x 7
Krizhevsky, “One weird trick for parallelizing convolutional neural networks”, arXiv 2014
He et al., “Deep Residual Learning for Image Recognition”, CVPR 2016
Huang et al., “Densely Connected Convolutional Networks”, CVPR 2017
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Visualize the filters/kernels (raw weights)
We can visualize filters at higher layers, but not that interesting
(these are taken from ConvNetJS CIFAR-10 demo)
layer 1 weights
layer 2 weights
layer 3 weights
16 x 3 x 7 x 7
20 x 16 x 7 x 7
20 x 20 x 7 x 7
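To display raw filter weights as images, each filter is typically rescaled independently so its full range maps to [0, 1]. A minimal numpy sketch, using a random stand-in for a conv1 filter bank (the array here is illustrative, not real trained weights):

```python
import numpy as np

def filters_to_grid(weights):
    """Normalize a (N, C, H, W) filter bank to [0, 1] per filter for display.

    Each filter is shifted and scaled independently so its full dynamic
    range is visible, which is how first-layer filter grids are usually
    rendered.
    """
    w = weights.copy().astype(np.float64)
    for i in range(w.shape[0]):
        lo, hi = w[i].min(), w[i].max()
        w[i] = (w[i] - lo) / (hi - lo + 1e-8)
    return w

# Random stand-in for AlexNet's 64 x 3 x 11 x 11 conv1 weights
rng = np.random.default_rng(0)
fake_conv1 = rng.normal(size=(64, 3, 11, 11))
grid = filters_to_grid(fake_conv1)
print(grid.shape)  # (64, 3, 11, 11), each filter now in [0, 1]
```

Each normalized 3 x 11 x 11 filter can then be shown as a small RGB tile in a grid, as in the figures on this slide.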
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
FC7 layer
4096-dimensional feature vector for an image (the layer immediately before the classifier)
Run the network on many images, collect the feature vectors
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Last Layer
Last Layer: Nearest Neighbors
Test image L2 Nearest neighbors in feature space
4096-dim vector
Recall: Nearest neighbors in pixel space
Krizhevsky et al, “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012.Figures reproduced with permission.
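The procedure above (run many images through the network, collect FC7 vectors, then find L2 nearest neighbors in that feature space) can be sketched in numpy; the random `feats` matrix is a stand-in for real collected features:

```python
import numpy as np

# Stand-in: suppose we already ran the network over a dataset and
# stacked the 4096-d FC7 vectors into `feats` (one row per image).
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 4096))
query = feats[0]                        # feature vector of a test image

# L2 nearest neighbors in feature space (smallest distance first)
dists = np.linalg.norm(feats - query, axis=1)
nearest = np.argsort(dists)[:6]         # index 0 is the query itself
print(nearest[0])  # 0
```

Neighbors found this way tend to be semantically similar images even when their raw pixels are very different, which is the point of the comparison with pixel-space nearest neighbors.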
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Last Layer: Dimensionality Reduction
Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008
Figure copyright Laurens van der Maaten and Geoff Hinton, 2008. Reproduced with permission.
Visualize the “space” of FC7 feature vectors by reducing dimensionality of vectors from 4096 to 2 dimensions
Simple algorithm: Principal Component Analysis (PCA)
More complex: t-SNE
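The PCA route can be written in a few lines of numpy (SVD on centered features); t-SNE would replace this linear projection with a nonlinear embedding that better preserves local neighborhoods. The random `feats` matrix is a stand-in for real FC7 vectors:

```python
import numpy as np

def pca_project(X, k=2):
    """Project rows of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)                 # center the features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                    # (n, k) 2-D coordinates

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 4096))        # stand-in for FC7 vectors
coords = pca_project(feats, k=2)
print(coords.shape)  # (200, 2)
```

Plotting each image thumbnail at its 2-D coordinate gives embeddings like the t-SNE grid on the next slide.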
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Last Layer: Dimensionality Reduction
Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008
Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012. Figure reproduced with permission.
See high-resolution versions at http://cs.stanford.edu/people/karpathy/cnnembed/
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Visualizing Activations
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.Figure copyright Jason Yosinski, 2014. Reproduced with permission.
Visualizing Activations
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.Figure copyright Jason Yosinski, 2014. Reproduced with permission.
conv5 feature map is 128x13x13; visualize as 128 13x13 grayscale images
https://youtu.be/AgkfIQ4IGaM?t=92
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Maximally Activating Patches
Pick a layer and a channel; e.g., conv5 is 128 x 13 x 13, pick channel 17 of 128
Run many images through the network, record values of chosen channel
Visualize image patches that correspond to maximal activations
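The three steps above can be sketched in numpy: record the chosen channel's response over many images, rank images by peak activation, and note where each peak fires (the random `acts` array stands in for real conv5 responses):

```python
import numpy as np

# Stand-in: acts[i] is the conv5 response (128 x 13 x 13) of image i
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 128, 13, 13))
channel = 17

# Peak value of the chosen channel for each image
peaks = acts[:, channel].reshape(len(acts), -1).max(axis=1)
top_images = np.argsort(peaks)[::-1][:9]       # 9 strongest responders
locs = [np.unravel_index(acts[i, channel].argmax(), (13, 13))
        for i in top_images]
# Each (row, col) in `locs` maps back through the receptive field to an
# input patch; those cropped patches are what the figure shows.
print(len(top_images), len(locs))  # 9 9
```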
Springenberg et al, “Striving for Simplicity: The All Convolutional Net”, ICLR Workshop 2015Figure copyright Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller, 2015; reproduced with permission.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Plan for Today
• Visualizing CNNs
– Visualizing filters
– Last layer embeddings
– Visualizing activations
– Maximally activating patches
– Occlusion maps
– Salient or “important” pixels
• Gradient-based visualizations
– How to evaluate visualizations?
– Creating “prototypical” images for a class
– Creating adversarial images
– Deep dream
– Feature inversion
(C) Dhruv Batra and Zsolt Kira 13
Visual Explanations
Where does an intelligent system “look” to make its predictions?
(C) Dhruv Batra and Zsolt Kira 14
Which pixels matter: Occlusion Maps
Mask part of the image before feeding to CNN, check how much predicted probabilities change
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014
Boat image is CC0 public domain
Elephant image is CC0 public domain
Go-Karts image is CC0 public domain
P(elephant) = 0.95
P(elephant) = 0.75
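The occlusion procedure can be sketched as a short numpy loop: slide a gray patch over the image, re-score each occluded copy, and record the probability in a heatmap. The `toy_model` here is an illustrative stand-in for a real CNN's class probability:

```python
import numpy as np

def occlusion_map(image, model, patch=16, stride=16, fill=0.5):
    """Slide a gray patch over the image and record the class score.

    `model` is any callable image -> probability. Low values in the
    returned map mark regions the prediction depends on.
    """
    H, W = image.shape[:2]
    heat = np.zeros(((H - patch) // stride + 1, (W - patch) // stride + 1))
    for i, y in enumerate(range(0, H - patch + 1, stride)):
        for j, x in enumerate(range(0, W - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = model(occluded)
    return heat

# Toy model: "probability" is the mean of the top-left quadrant, so
# occluding that region should lower the score.
toy_model = lambda img: img[:32, :32].mean()
img = np.ones((64, 64))
heat = occlusion_map(img, toy_model)
print(heat.shape)  # (4, 4)
```

With a real network the per-position drop from the unoccluded probability (e.g., 0.95 to 0.75 above) is what gets plotted as the heatmap.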
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014
Boat image is CC0 public domain
Elephant image is CC0 public domain
Go-Karts image is CC0 public domain
Mask part of the image before feeding to CNN, check how much predicted probabilities change
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Which pixels matter: Occlusion Maps
What if our model was linear?
(C) Dhruv Batra and Zsolt Kira 17
What if our model was linear?
(C) Dhruv Batra and Zsolt Kira 18
(Figure: a toy linear model; example weights and pixel values, where each weight times its pixel value gives that pixel’s contribution to the score)
But it’s not
(C) Dhruv Batra and Zsolt Kira 19
Can we make it linear?
(C) Dhruv Batra and Zsolt Kira 20
Taylor Series
(C) Dhruv Batra and Zsolt Kira 21
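The linearization the slide is pointing at is the first-order Taylor expansion of the class score $S_c$ around the current image $I_0$:

```latex
S_c(I) \;\approx\; S_c(I_0) \;+\; \left.\frac{\partial S_c}{\partial I}\right|_{I_0}^{\!\top} (I - I_0)
```

Locally, the network behaves like a linear model whose “weights” are the gradient $\partial S_c / \partial I$, and that gradient is exactly what one backward pass computes.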
Feature Importance in Deep Models
(C) Dhruv Batra and Zsolt Kira 22
Backprop!
Which pixels matter: Saliency via Backprop
Dog
Forward pass: Compute probabilities
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Which pixels matter: Saliency via Backprop
Dog
Forward pass: Compute probabilities
Compute gradient of (unnormalized) class score with respect to image pixels, take absolute value and max over RGB channels
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Why unnormalized class scores?
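The recipe above (gradient of the class score with respect to pixels, absolute value, max over RGB channels) reduces to one line of numpy once the gradient is in hand. As a toy stand-in, for a linear score $s = (w \cdot I)$ the gradient with respect to the image is just $w$, so no autodiff framework is needed here:

```python
import numpy as np

def saliency_map(grad):
    """Saliency from the gradient of the class score w.r.t. pixels:
    absolute value, then max over the 3 color channels."""
    return np.abs(grad).max(axis=0)     # (3, H, W) -> (H, W)

# Toy stand-in gradient: for a linear score s = (w * image).sum(),
# dScore/dImage is just w.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 8, 8))
sal = saliency_map(w)
print(sal.shape)  # (8, 8)
```

Unnormalized scores are used because the softmax probability can rise either by increasing the target class score or by decreasing the others; the raw score isolates evidence for the class itself.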
Saliency Maps
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Saliency Maps: Segmentation without supervision
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014. Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
Rother et al., “Grabcut: Interactive foreground extraction using iterated graph cuts”, ACM TOG 2004
Use GrabCut on saliency map
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Aside: DeconvNet
27
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014
Gradient-based Visualizations
• Raw Gradients – [Simonyan et al., ICLR Workshop ’14]
• ‘Deconvolution’ – [Zeiler & Fergus, ECCV ’14]
• Guided Backprop – [Springenberg et al., ICLR ’15]
(C) Dhruv Batra and Zsolt Kira 28
Identical for all layers except ReLU
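Since the three methods differ only in how the gradient passes backward through a ReLU, the distinction fits in one function. A numpy sketch, where `x` is the ReLU's forward input and `g` is the gradient arriving from the layer above:

```python
import numpy as np

def relu_backward(x, g, mode="backprop"):
    """Backward ReLU rules for the three visualization methods.

    backprop : pass gradient where the forward input was positive
    deconv   : pass gradient where the gradient itself is positive
    guided   : pass gradient only where BOTH are positive
    """
    if mode == "backprop":
        return np.where(x > 0, g, 0.0)
    if mode == "deconv":
        return np.where(g > 0, g, 0.0)
    if mode == "guided":
        return np.where((x > 0) & (g > 0), g, 0.0)
    raise ValueError(mode)

x = np.array([-1.0, 2.0, 3.0, -4.0])
g = np.array([5.0, -6.0, 7.0, 8.0])
print(relu_backward(x, g, "guided"))  # [0. 0. 7. 0.]
```

Guided backprop's extra masking discards negative gradient signal, which is why its visualizations tend to look cleanest.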
Remember ReLUs?
(C) Dhruv Batra and Zsolt Kira 29
(C) Dhruv Batra and Zsolt Kira 30
Backprop vs Deconv vs Guided BP
• Guided Backprop tends to be “cleanest”
31
Backprop Deconv Guided Backprop
Intermediate features via (guided) backprop
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014Springenberg et al, “Striving for Simplicity: The All Convolutional Net”, ICLR Workshop 2015Figure copyright Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller, 2015; reproduced with permission.
Maximally activating patches(Each row is a different neuron)
Guided Backprop
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Intermediate features via (guided) backprop
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014Springenberg et al, “Striving for Simplicity: The All Convolutional Net”, ICLR Workshop 2015Figure copyright Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller, 2015; reproduced with permission.
Maximally activating patches(Each row is a different neuron)
Guided Backprop
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Visualizing Activations
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.Figure copyright Jason Yosinski, 2014. Reproduced with permission.
Problem with Guided Backprop
• Not very “class-discriminative”
35
GB for “bus”; GB for “airliner”
Slide Credit: Ram Selvaraju
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization [ICCV ’17]
Ramprasaath Selvaraju Michael Cogswell Abhishek Das Ramakrishna Vedantam
Devi Parikh Dhruv Batra
Grad-CAM
37
Backprop till conv
Rectified ConvFeature Maps
+
Grad-CAM
Slide Credit: Ram Selvaraju
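The pipeline in the figure (backprop the class score to the chosen conv layer, average the gradients per channel, take a weighted sum of the rectified feature maps) reduces to a few numpy lines. The random arrays stand in for real feature maps and backpropagated gradients:

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """Grad-CAM from a conv layer's feature maps and the gradients of
    the class score w.r.t. those maps (both shaped C x H x W).

    1. Global-average-pool the gradients -> one weight per channel
    2. Weighted sum of the feature maps
    3. ReLU: keep only positive evidence for the class
    """
    weights = grads.mean(axis=(1, 2))                  # (C,)
    cam = np.tensordot(weights, feature_maps, axes=1)  # (H, W)
    return np.maximum(cam, 0.0)

rng = np.random.default_rng(0)
A = rng.normal(size=(128, 13, 13))   # stand-in conv feature maps
dA = rng.normal(size=(128, 13, 13))  # stand-in gradients from backprop
cam = grad_cam(A, dA)
print(cam.shape)  # (13, 13)
```

The coarse 13 x 13 map is upsampled to image size; multiplying it elementwise with Guided Backprop gives the sharp, class-discriminative Guided Grad-CAM on the next slide.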
Grad-CAM
38
Guided Grad-CAM
Backprop till conv
Guided Backpropagation
Rectified ConvFeature Maps
+
Guided Grad-CAM
Slide Credit: Ram Selvaraju
Ground-truth: Volcano
Predicted: Vine snake
Ground-truth: coil
Predicted: Car mirror
Reasonable predictions are made in many failure cases.
39
Analyzing Failure Modes with Grad-CAM
Slide Credit: Ram Selvaraju
(C) Dhruv Batra and Zsolt Kira 40
Grad-CAM Visual Explanations for Captioning
41
Slide Credit: Ram Selvaraju
(C) Dhruv Batra and Zsolt Kira 47
Plan for Today
• Visualizing CNNs
– Visualizing filters
– Last layer embeddings
– Visualizing activations
– Maximally activating patches
– Occlusion maps
– Salient or “important” pixels
• Gradient-based visualizations
– How to evaluate visualizations?
– Creating “prototypical” images for a class
– Creating adversarial images
– Deep dream
– Feature inversion
(C) Dhruv Batra and Zsolt Kira 48
How do we evaluate explanations?
• Class-discriminative?
– Do they show what they say they found?
• Building trust with a user?
– Do they help users?
• Human-like?
– Do machines look where humans look?
(C) Dhruv Batra and Zsolt Kira 49
Is Grad-CAM more class discriminative?
• Can people tell which class is being visualized?
• Images from Pascal VOC’07 with exactly 2 categories.
• Intuition: A good explanation produces discriminative visualizations for the class of interest.
50
Slide Credit: Ram Selvaraju
Is Grad-CAM more class discriminative?
• Human accuracy for 2-class classification
51
Slide Credit: Ram Selvaraju
+17%
Grad-CAM makes existing visualizations class discriminative.
Does it help establish trust with a user?
• Given explanations from 2 models (VGG16 and AlexNet), which one is more trustworthy?
• Pick images where both models make the correct prediction
• Show these to AMT workers and evaluate
52
Slide Credit: Ram Selvaraju
Does it help establish trust with a user?
53
Slide Credit: Ram Selvaraju
Users place higher trust in a model that generalizes better.
Where do humans choose to look to answer visual questions?
54
55
VQA-HAT (Human ATtention)
56
What food is on the table? Cake
VQA-HAT (Human ATtention)
Slide Credit: Abhishek Das
57
What animal is she riding? Horse
VQA-HAT (Human ATtention)
Slide Credit: Abhishek Das
58
What number of cats are laying on the bed? 2
VQA-HAT (Human ATtention)
Slide Credit: Abhishek Das
Are Grad-CAM explanations human-like?
• Correlation with human attention maps [Das & Agarwal et al., EMNLP ’16]
What are they doing? Grad-CAM for ‘eating’; Human ATtention map (HAT) for ‘eating’
Correlation with HAT: Guided Backpropagation 0.122; Guided Grad-CAM 0.136
Current models look at regions more similar to humans than baselines
59
Slide Credit: Ram Selvaraju
Plan for Today
• Visualizing CNNs
– Visualizing filters
– Last layer embeddings
– Visualizing activations
– Maximally activating patches
– Occlusion maps
– Salient or “important” pixels
• Gradient-based visualizations
– How to evaluate visualizations?
– Creating “prototypical” images for a class
– Creating adversarial images
– Deep dream
– Feature inversion
(C) Dhruv Batra and Zsolt Kira 60
Visualizing CNN features: Gradient Ascent on Pixels
(Guided) backprop: find the part of an image that a neuron responds to
Gradient ascent on pixels: generate a synthetic image that maximally activates a neuron
I* = argmax_I f(I) + R(I)
where f(I) is the neuron value and R(I) is a natural-image regularizer
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Visualizing CNN features: Gradient Ascent on Pixels
score for class c (before Softmax)
zero image
1. Initialize image to zeros
Repeat:
2. Forward the image to compute current scores
3. Backprop to get the gradient of the neuron value with respect to image pixels
4. Make a small update to the image
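The loop above can be sketched in numpy with a stand-in gradient function; for a toy linear score $s(I) = (w \cdot I)$ the gradient with respect to the image is just $w$, so no autodiff framework is needed in this illustration:

```python
import numpy as np

def gradient_ascent_image(score_grad, shape=(3, 32, 32),
                          steps=50, lr=0.1, l2=0.01):
    """Synthesize an image that maximizes a neuron/class score.

    `score_grad(I)` returns dScore/dI; the l2 term implements the
    simple natural-image regularizer that penalizes ||I||^2.
    """
    I = np.zeros(shape)                  # 1. initialize image to zeros
    for _ in range(steps):               # repeat:
        g = score_grad(I)                # 2-3. forward + backprop
        I += lr * (g - 2 * l2 * I)       # 4. small ascent step
    return I

# Toy stand-in: score s(I) = (w * I).sum(), so dScore/dI = w
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 32, 32))
img = gradient_ascent_image(lambda I: w, shape=w.shape)
print(img.shape)  # (3, 32, 32)
```

With a real CNN, `score_grad` would be one forward/backward pass; stronger regularizers (blurring, clipping small values) produce the cleaner images shown on later slides.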
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Visualizing CNN features: Gradient Ascent on Pixels
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
Simple regularizer: Penalize L2 norm of generated image
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Visualizing CNN features: Gradient Ascent on Pixels
Simonyan, Vedaldi, and Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.Figures copyright Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, 2014; reproduced with permission.
Simple regularizer: Penalize L2 norm of generated image
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Visualizing CNN features: Gradient Ascent on Pixels
Simple regularizer: Penalize L2 norm of generated image
Yosinski et al, “Understanding Neural Networks Through Deep Visualization”, ICML DL Workshop 2014.Figure copyright Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson, 2014. Reproduced with permission.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 61: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/61.jpg)
Fooling Images / Adversarial Examples
(1) Start from an arbitrary image
(2) Pick an arbitrary class
(3) Modify the image to maximize the class
(4) Repeat until network is fooled
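The same gradient-ascent machinery produces fooling images. The sketch below illustrates the recipe on a hypothetical linear classifier (random weight matrix `W`); a real adversarial attack would ascend the target-class score through a CNN via backprop.

```python
import numpy as np

# Toy illustration of the fooling recipe on a linear 3-class classifier.
# W and x are random stand-ins; a real attack backprops through a CNN.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 16))             # hypothetical classifier weights

def predict(x):
    return int(np.argmax(W @ x))

x = rng.normal(size=16)                  # (1) start from an arbitrary image
orig_class = predict(x)
target = (orig_class + 1) % 3            # (2) pick an arbitrary other class

lr = 0.05
while predict(x) != target:              # (4) repeat until the network is fooled
    x += lr * W[target]                  # (3) ascend the target-class score

print(predict(x) == target)              # classifier now outputs the target class
```

With a deep network the perturbation found this way can be visually imperceptible, which is what makes these examples "adversarial".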
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 62: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/62.jpg)
Fooling Images / Adversarial Examples
Boat image is CC0 public domainElephant image is CC0 public domain
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 63: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/63.jpg)
Plan for Today• Visualizing CNNs
– Visualizing filters
– Last layer embeddings
– Visualizing activations
– Maximally activating patches
– Occlusion maps
– Salient or “important” pixels
• Gradient-based visualizations
– How to evaluate visualizations?
– Creating “prototypical” images for a class
– Creating adversarial images
– Deep dream
– Feature inversion
(C) Dhruv Batra and Zsolt Kira 68
![Page 64: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/64.jpg)
DeepDream: Amplify existing features
Rather than synthesizing an image to maximize a specific neuron, instead try to amplify the neuron activations at some layer in the network
Choose an image and a layer in a CNN; repeat:
1. Forward: compute activations at chosen layer
2. Set gradient of chosen layer equal to its activation
3. Backward: compute gradient on image
4. Update image
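The loop above can be sketched with a toy linear "layer" f(I) = A·I, where setting the layer's gradient to its own activation (step 2) and chaining back to the image (step 3) amounts to gradient ascent on ½ Σᵢ fᵢ(I)². `A` and the starting image are random stand-ins; a real DeepDream would forward a CNN and use autograd for the backward pass.

```python
import numpy as np

# Toy DeepDream loop with a linear "layer" f(I) = A @ I. A real
# implementation forwards a CNN and uses autograd for step 3.
rng = np.random.default_rng(2)
A = rng.normal(size=(32, 64))            # hypothetical layer weights
image = rng.normal(size=64)              # start from a chosen image

lr = 1e-3
before = np.sum((A @ image) ** 2)
for _ in range(50):
    act = A @ image                      # 1. forward: activations at the layer
    grad_layer = act                     # 2. set layer gradient = its activation
    grad_image = A.T @ grad_layer        # 3. backward: chain rule through f(I) = A I
    image += lr * grad_image             # 4. update the image

print(np.sum((A @ image) ** 2) > before) # activations have been amplified
```

Step 2 is the whole trick: it turns the backward pass into the gradient of the squared activation norm, which is why existing features get amplified rather than a single neuron maximized.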
Mordvintsev, Olah, and Tyka, “Inceptionism: Going Deeper into Neural Networks”, Google Research Blog. Images are licensed under CC-BY 4.0
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 65: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/65.jpg)
DeepDream: Amplify existing features
Rather than synthesizing an image to maximize a specific neuron, instead try to amplify the neuron activations at some layer in the network
Equivalent to: I* = arg max_I ∑_i f_i(I)²
Mordvintsev, Olah, and Tyka, “Inceptionism: Going Deeper into Neural Networks”, Google Research Blog. Images are licensed under CC-BY 4.0
Choose an image and a layer in a CNN; repeat:
1. Forward: compute activations at chosen layer
2. Set gradient of chosen layer equal to its activation
3. Backward: compute gradient on image
4. Update image
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 66: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/66.jpg)
Sky image is licensed under CC-BY SA 3.0
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 67: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/67.jpg)
Image is licensed under CC-BY 4.0
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 68: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/68.jpg)
Image is licensed under CC-BY 4.0
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 69: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/69.jpg)
Feature Inversion
Given a CNN feature vector for an image, find a new image that:
- Matches the given feature vector
- “Looks natural” (image prior regularization)
Mahendran and Vedaldi, “Understanding Deep Image Representations by Inverting Them”, CVPR 2015
Given feature vector
Features of new image
Total Variation regularizer (encourages spatial smoothness)
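A minimal inversion sketch, assuming a hypothetical linear feature map φ(x) = A·x in place of real CNN features: minimize the feature-matching loss plus a total-variation smoothness term (β = 2 here so the regularizer is quadratic and its gradient is simple). A real inversion would backprop through the network.

```python
import numpy as np

# Feature inversion sketch: match a given feature vector under a
# total-variation (beta = 2) prior. phi(x) = A @ x.ravel() is a toy
# stand-in for CNN features.
rng = np.random.default_rng(3)
H = W = 8
A = rng.normal(size=(16, H * W))
target_feat = rng.normal(size=16)        # the given feature vector phi_0

def tv2_grad(x):
    """Gradient of the sum of squared neighbor differences (smoothness)."""
    g = np.zeros_like(x)
    dx = x[:, 1:] - x[:, :-1]
    g[:, 1:] += 2 * dx
    g[:, :-1] -= 2 * dx
    dy = x[1:, :] - x[:-1, :]
    g[1:, :] += 2 * dy
    g[:-1, :] -= 2 * dy
    return g

def loss(x, lam):
    r = A @ x.ravel() - target_feat      # feature-matching residual
    dx = x[:, 1:] - x[:, :-1]
    dy = x[1:, :] - x[:-1, :]
    return r @ r + lam * (np.sum(dx ** 2) + np.sum(dy ** 2))

x = np.zeros((H, W))
lr, lam = 1e-3, 0.1
start = loss(x, lam)
for _ in range(500):
    r = A @ x.ravel() - target_feat
    grad = (2 * A.T @ r).reshape(H, W) + lam * tv2_grad(x)
    x -= lr * grad                       # descend: match features, stay smooth

print(loss(x, lam) < start)              # objective decreased
```

The TV term is what supplies the "looks natural" prior: it penalizes high-frequency noise that feature matching alone would happily produce.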
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 70: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/70.jpg)
Feature Inversion: Reconstructing from different layers of VGG-16
Mahendran and Vedaldi, “Understanding Deep Image Representations by Inverting Them”, CVPR 2015Figure from Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016. Copyright Springer, 2016. Reproduced for educational purposes.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 71: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/71.jpg)
Side effect: style transfer
• Content representation: feature map at each layer
• Style representation: covariance (Gram) matrix at each layer
– Spatially invariant
– Averages second-order statistics
• Idea: Optimize x to match the content of one image and the style of another
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arXiv preprint arXiv:1508.06576 (2015).
![Page 72: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/72.jpg)
Neural Style
Step 1: Extract content targets (ConvNet activations of all layers for the given content image)
content activations
e.g. at the CONV5_1 layer we would have a [14x14x512] array of target activations
Andrej Karpathy
![Page 73: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/73.jpg)
Neural Style
Step 2: Extract style targets (Gram matrices of ConvNet activations of all layers for the given style image)
style Gram matrices
e.g. the CONV1 layer (with [224x224x64] activations) would give a [64x64] Gram matrix of all pairwise activation covariances (summed across spatial locations)
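The Gram computation itself is one matrix product. The sketch below uses random numbers in place of real CONV1 activations, but the shapes match the slide's example: a [224x224x64] activation tensor yields a [64x64] Gram matrix summed across spatial locations.

```python
import numpy as np

# Style statistic: Gram matrix of a layer's activations, summed over
# spatial locations. The activations here are random stand-ins for a
# real CONV1 feature map.
rng = np.random.default_rng(4)
H, W, C = 224, 224, 64                   # CONV1-sized activations
acts = rng.normal(size=(H, W, C))

F = acts.reshape(-1, C)                  # flatten space: (H*W) x C
gram = F.T @ F                           # C x C pairwise channel products

print(gram.shape)                        # (64, 64)
```

Because the spatial dimension is summed out, the Gram matrix is the same no matter where in the image a texture appears, which is exactly the spatial invariance the style representation wants.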
Andrej Karpathy
![Page 74: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/74.jpg)
Neural Style
Step 3: Optimize over the image to have:
- The content of the content image (activations match content)
- The style of the style image (Gram matrices of activations match style)
match content
match style
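The two terms being matched in step 3 can be written out directly. In this sketch the activation tensors are random stand-ins for ConvNet feature maps at one layer, and the weights `alpha` and `beta` are hypothetical; real style transfer sums these losses over several VGG layers and descends on the image pixels.

```python
import numpy as np

# The two loss terms of step 3, computed for one layer. All activation
# tensors here are random stand-ins for real ConvNet feature maps.
rng = np.random.default_rng(5)

def gram(acts):
    """Style statistic: C x C Gram matrix, summed over spatial locations."""
    F = acts.reshape(-1, acts.shape[-1])
    return F.T @ F

content_acts = rng.normal(size=(14, 14, 512))  # target content activations
style_acts = rng.normal(size=(14, 14, 512))    # activations of the style image
x_acts = rng.normal(size=(14, 14, 512))        # activations of the image x

content_loss = np.sum((x_acts - content_acts) ** 2)          # match content
style_loss = np.sum((gram(x_acts) - gram(style_acts)) ** 2)  # match style

alpha, beta = 1.0, 1e-6                  # hypothetical trade-off weights
total = alpha * content_loss + beta * style_loss
print(total > 0)
```

The ratio alpha/beta controls the trade-off: a larger style weight yields an image that keeps the content layout but adopts the style image's textures.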
Adapted from Andrej Karpathy
![Page 75: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/75.jpg)
Style transfer
![Page 76: L14 cnns vis - Georgia Institute of Technology · Van der Maaten and Hinton, “Visualizing Data using t-SNE”, JMLR 2008 Figure copyright Laurens van der Maaten and Geoff Hinton,](https://reader034.fdocuments.us/reader034/viewer/2022052518/5f04c1747e708231d40f8b95/html5/thumbnails/76.jpg)
Plan for Today• Visualizing CNNs
– Visualizing filters
– Last layer embeddings
– Visualizing activations
– Maximally activating patches
– Occlusion maps
– Salient or “important” pixels
• Gradient-based visualizations
– How to evaluate visualizations?
– Creating “prototypical” images for a class
– Creating adversarial images
– Deep dream
– Feature inversion
– Style Transfer
(C) Dhruv Batra and Zsolt Kira 81