Top-down Neural Attention by Excitation Backprop
Jianming Zhang1, Zhe Lin1, Jonathan Brandt1,
Xiaohui Shen1, Stan Sclaroff2
1ADOBE RESEARCH
2BOSTON UNIVERSITY
ECCV 2016 Amsterdam
Motivation
Artificial Neural Networks
© soul wind / stock.adobe.com
Object Categories
Captions
Stories
Can these models ground their own predictions?
Goal: Generate Top-Down Attention Maps
[Figure: bottom-up inference passes the input through conv, conv, and inner-product layers, producing activation maps and the predictions “elephant” and “zebra”; top-down attention then traces a selected prediction back through the same layers to an attention map over the input.]
Related Work
Masking-based [1, 2] · Optimization-based [3] · Fully-conv-based [4, 5] · Backprop-based [6, 7, 8]
› General: applicable to a wide variety of DNNs
› Simple: can generate an attention map in a single backward pass
[1] Zhou et al. “Object detectors emerge in deep scene CNNs.” ICLR, 2015.
[2] Bergamo et al. “Self-taught object localization with deep networks.” arXiv preprint arXiv:1409.3964, 2014.
[3] Cao et al. “Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks.” ICCV, 2015.
[4] Sermanet et al. “Overfeat: Integrated recognition, localization and detection using convolutional networks.” ICLR, 2014.
[5] Zhou et al. “Learning Deep Features for Discriminative Localization.” CVPR, 2016.
[6] Zeiler et al. “Visualizing and understanding convolutional networks.” ECCV, 2014.
[7] Simonyan et al. “Deep inside convolutional networks: Visualizing image classification models and saliency maps.” ICLRW, 2014.
[8] Bach et al. “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation.” PLoS ONE, 2015.
Contributions
Excitation Backprop
• Based on the biologically-inspired Selective Tuning model of visual attention
• Probabilistic Winner-Take-All scheme that is applicable to modern DNNs
Contrastive Top-down Attention Formulation
• Significantly improves the discriminativeness of our attention maps
The Selective Tuning Model [Tsotsos et al. 1995]
Forward pass to compute the feature values at each layer, as well as the predictions.
Backward pass (Winner-Take-All, starting from the output layer) to localize relevant regions.
[1] Tsotsos et al. “Modeling Visual Attention via Selective Tuning.” Artificial Intelligence, 1995.
For deep neural networks, this greedy winner-take-all method produces very sparse binary maps and uses information from only a very small portion of the whole network.
Our Approach: Probabilistic Winner-Take-All
Winner-Take-All [1] → Winner Sampling
Marginal Winning Probability (MWP): equivalent to an Absorbing Markov Chain process.
[1] Tsotsos et al. “Modeling Visual Attention via Selective Tuning.” Artificial Intelligence, 1995.
Excitation Backprop
Assumptions:
The responses of the activation neurons are non-negative.
An activation neuron is tuned to detect certain visual features. Its response is positively correlated with its confidence of the detection.
[Figure: activation layer N−1 feeding activation layer N through excitatory (+) and inhibitory (−) connections.]
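Under these assumptions, the probabilistic Winner-Take-All rule for a single fully connected layer can be sketched as below (a minimal NumPy sketch with our own function and variable names, not the paper's Caffe implementation):

```python
import numpy as np

def excitation_backprop_linear(W, a, p_parent, eps=1e-12):
    """One step of Excitation Backprop through a fully connected layer.

    W        : (out, in) weight matrix of the layer
    a        : (in,) non-negative input activations (e.g. post-ReLU)
    p_parent : (out,) Marginal Winning Probabilities of the output neurons

    Returns the (in,) MWP of the input neurons. Only excitatory
    (non-negative-weight) connections propagate probability mass.
    """
    W_pos = np.maximum(W, 0.0)   # keep excitatory connections only
    z = W_pos @ a                # per-parent normalization constant
    # Each parent distributes its probability mass over its excitatory
    # children in proportion to the child's contribution a_j * w_ji.
    p_child = a * (W_pos.T @ (p_parent / (z + eps)))
    return p_child
```

Note that the total probability mass is conserved: the child MWPs sum to the same total as the parent MWPs, which is what makes the maps across layers comparable without extra normalization.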
Excitation Backprop
Running excitation backprop, we can extract attention maps from different layers.
Lower layers can generate maps that highlight features of smaller scale.
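The per-layer extraction can be sketched by chaining the single-layer rule down a stack of layers (again a hypothetical NumPy sketch restricted to plain linear layers; `eb_step` and `layerwise_attention` are our own names):

```python
import numpy as np

def eb_step(W, a, p, eps=1e-12):
    # Excitation Backprop through one linear layer (excitatory weights only).
    W_pos = np.maximum(W, 0.0)
    z = W_pos @ a
    return a * (W_pos.T @ (p / (z + eps)))

def layerwise_attention(weights, activations, p_top, eps=1e-12):
    """Propagate the top-down signal down a stack of layers, keeping
    the MWP map at every depth.

    weights     : [W_1, ..., W_L]; W_l maps layer l-1 activations to layer l
    activations : [a_0, ..., a_{L-1}]; non-negative inputs to each layer
    p_top       : MWP vector at the output layer

    Returns the maps from the deepest layer down to the input.
    """
    maps = []
    p = p_top
    for W, a in zip(reversed(weights), reversed(activations)):
        p = eb_step(W, a, p, eps)
        maps.append(p)
    return maps
```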
Challenge: Responsive to Top-down Signals?
[Maps obtained using VGG16 pool3 for “zebra” and “elephant”]
Dominant neurons always win!
Negating the Output Layer for Contrastive Signals
[Figure: the zebra classifier produces the zebra map; negating its output layer gives a non-zebra classifier, which produces the non-zebra map.]
Contrastive Maps
[Contrastive maps for “zebra” and “elephant”; negative values truncated to 0 and values rescaled for visualization.]
The contrastive attention map can be computed in a single backward pass.
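For a single linear output layer, the negation trick can be sketched as follows (a hypothetical NumPy sketch; in the full method the contrastive signal is propagated down through all layers, not just the output layer):

```python
import numpy as np

def eb_step(W, a, p, eps=1e-12):
    # Excitation Backprop through one linear layer (excitatory weights only).
    W_pos = np.maximum(W, 0.0)
    z = W_pos @ a
    return a * (W_pos.T @ (p / (z + eps)))

def contrastive_mwp(w_target, a, eps=1e-12):
    """Contrastive top-down signal at the layer below the output.

    w_target : (in,) weights of the target class's output unit
    a        : (in,) activations feeding the output layer

    The 'non-target' classifier is the target unit with negated weights;
    the contrastive map is the truncated difference of the two MWP maps.
    """
    p = np.array([1.0])                           # all mass on the target unit
    pos = eb_step(w_target[None, :], a, p, eps)   # target map
    neg = eb_step(-w_target[None, :], a, p, eps)  # non-target map
    return np.maximum(pos - neg, 0.0)             # truncate negatives at 0
```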
Evaluation: The Pointing Game
Task:
› Given an image and an object category, point to the targets.
Evaluation Metric:
› Mean pointing accuracy across categories
› Pointing anywhere on the targets is fine
CNN Models Tested:
› CNN-S [Chatfield et al. BMVC’14]
› VGG16 [Simonyan et al. ICLR’15]
› GoogleNet [Szegedy et al. CVPR’15]
Model Training:
› Multi-label cross-entropy loss
› Do not use any localization annotations
credit: elena milevska / stock.adobe.com
credit: howtomontessori.com
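The task and metric above can be sketched in a few lines (function names are our own; this assumes the target objects are given as a binary mask):

```python
import numpy as np

def pointing_game_hit(attention_map, target_mask):
    """One trial of the Pointing Game: point at the attention map's maximum
    and check whether it lands anywhere on the target.

    attention_map : (H, W) top-down attention map for the queried category
    target_mask   : (H, W) boolean mask of the target object(s)
    """
    y, x = np.unravel_index(np.argmax(attention_map), attention_map.shape)
    return bool(target_mask[y, x])

def pointing_accuracy(hits, misses):
    # Per-category accuracy = hits / (hits + misses); the reported metric
    # is the mean of this accuracy over categories.
    return hits / (hits + misses)
```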
Results on VOC07 (GoogleNet)
[Bar chart: mean pointing accuracy over categories, 60–90% scale; reported bar values are 69.5, 79.3, 74.3, 72.8, 80.8, 79.3, and 85.1.]
[1] Simonyan et al. “Deep inside convolutional networks: Visualizing image classification models and saliency maps.” ICLRW, 2014.
[2] Zeiler et al. “Visualizing and understanding convolutional networks.” ECCV, 2014.
[3] Bach et al. “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation.” PLoS ONE, 2015.
[4] Zhou et al. “Learning Deep Features for Discriminative Localization.” CVPR, 2016.
Results on MS COCO (GoogleNet)
[Bar chart: mean pointing accuracy over categories, 20–60% scale; reported bar values are 27.7, 42.6, 35.7, 40.2, 41.6, 43.6, and 53.8.]
Qualitative Comparison
[Two slides of qualitative comparison figures.]
Top-down Attention from an 18K-Tag Classifier
Train an image tag classifier for ~18K tags
› 6M Stock images with user tags
› Pre-trained GoogleNet model from Caffe Model Zoo
› Cross entropy multi-label loss
![Page 31: Top-down Neural Attention by Excitation Backprop€¦ · [5] Zhou et al. “Learning Deep Features for Discriminative Localization.” CVPR, 2016. [6] Zeiler et al. “Visualizing](https://reader035.fdocuments.us/reader035/viewer/2022070113/605c2d373fa71844251291e2/html5/thumbnails/31.jpg)
An Interesting Case
![Page 32: Top-down Neural Attention by Excitation Backprop€¦ · [5] Zhou et al. “Learning Deep Features for Discriminative Localization.” CVPR, 2016. [6] Zeiler et al. “Visualizing](https://reader035.fdocuments.us/reader035/viewer/2022070113/605c2d373fa71844251291e2/html5/thumbnails/32.jpg)
Phrase Localization
Follow the evaluation protocol of the Flickr30K entities dataset
Localization based on top-down attention maps:
› Take the average of word attention maps to get the phrase attention map
› Compute object proposals
› Re-rank proposals using the top-down phrase map
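The three steps above can be sketched as follows (a hypothetical NumPy sketch; scoring a proposal by its mean contained attention is one simple choice, since the slide does not specify the exact scoring function):

```python
import numpy as np

def phrase_attention(word_maps):
    # Average the word-level attention maps to get the phrase attention map.
    return np.mean(np.stack(word_maps), axis=0)

def rerank_proposals(phrase_map, proposals):
    """Re-rank object proposals using the top-down phrase attention map.

    proposals : list of (x0, y0, x1, y1) boxes in attention-map coordinates.
    Returns the proposals sorted best-first by mean attention inside the box.
    """
    def score(box):
        x0, y0, x1, y1 = box
        region = phrase_map[y0:y1, x0:x1]
        return region.mean() if region.size else 0.0
    return sorted(proposals, key=score, reverse=True)
```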
[Bar chart: Accuracy/Recall@1 on all phrases and on small objects, 0–30% scale, for MCG_base, Grad (MCG), Deconv (MCG), LRP (MCG), CAM (MCG), Ours (MCG), and CCA [1] (EdgeBoxes).]
[1] Plummer et al. “Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models.” ICCV, 2015.
Conclusion
Excitation Backprop + Contrastive Attention = Discriminative Top-down Attention Map
GPU & CPU implementation in Caffe:
https://github.com/jimmie33/Caffe-ExcitationBP
Backup Slides
Does the Contrastive Attention Formulation Work for Other Methods?
[Bar chart, VOC07 Difficult Set, 40–75% scale: original vs. contrastive accuracy for Ours, Grad, CAM, and Deconv; reported bar values are 60.4, 61.4, 61.9, 49.4, 70.6, 61.9, and 67.7.]
Deconv:
+ Truncates negative signals
− Requires normalization
− Requires two backward passes
− Does not use the activation values in the backpropagation
Phrase Localization on the Flickr30K Entity Dataset [1]
[1] Plummer et al. “Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models.” ICCV, 2015.
Results
Mean Accuracy over Object Categories in the Pointing Game
Example Results
Contrastive Attention
[Contrastive maps thresholded at 0 for “zebra” vs. “elephant” and for “elephant” vs. “zebra”.]
The pair of maps is well normalized using our probabilistic framework.
The contrastive attention map can be computed in a single backward pass.