Transcript of "Understanding Deep Networks through Properties of the Input Space" · 2019-03-29
German Research Center for Artificial Intelligence (DFKI)
ALL RIGHTS RESERVED. No part of this work may be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without express written permission from the authors.
Understanding Deep Networks through Properties of the Input Space
GTC 2019
By: Sebastian Palacio
[Figure: a neural network]
Deep Neural Networks Work... DUH!
...yet they can be easily tricked
Safeguarding becomes a "thing"
[Figure: neural network with safeguards: filter, harden, flag]
Cat and Mouse Chase
How do Attacks Work?
[Diagram: input → features → features → features → output]
Two entry points: modify the network, or modify the input.
1. Pass the input through the network: f(x)
2. Compute the sensitivity: f′(x)
3. Modify the input according to the sensitivity.
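The three steps above can be sketched as an FGSM-style attack (Goodfellow et al.'s fast gradient sign method). This is a minimal illustration on a toy linear "network" f(x) = w·x, not the talk's actual setup; all names and numbers are made up.

```python
def f(x, w):
    """Step 1. Forward pass: a toy linear classifier score."""
    return sum(wi * xi for wi, xi in zip(w, x))

def grad_f(x, w):
    """Step 2. Sensitivity: for f(x) = w . x the input gradient is w."""
    return list(w)

def fgsm_perturb(x, w, eps=0.1):
    """Step 3. Modify the input along the sign of the gradient:
    x' = x + eps * sign(df/dx)."""
    g = grad_f(x, w)
    return [xi + eps * (1 if gi > 0 else -1 if gi < 0 else 0)
            for xi, gi in zip(x, g)]

x = [0.5, -0.2, 0.1]
w = [1.0, -2.0, 0.5]
x_adv = fgsm_perturb(x, w, eps=0.1)
# The score moves up because every coordinate stepped along the gradient sign.
assert f(x_adv, w) > f(x, w)
```

With a real network the gradient comes from backpropagation rather than a closed form, but the attack loop has the same shape: forward pass, input gradient, signed step.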
Gradients are good estimators of the input space's distribution
[Figure: input, its gradient, and the resulting perturbation]
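To make "f′(x)", the input gradient used throughout, concrete: autodiff frameworks compute it exactly during backpropagation, but it can be approximated with a central finite difference. A small self-contained sketch (the toy scalar function is illustrative only):

```python
def input_gradient(f, x, h=1e-5):
    """Approximate df/dx at x, one coordinate at a time,
    using a central finite difference."""
    grad = []
    for i in range(len(x)):
        up = [xj + (h if j == i else 0.0) for j, xj in enumerate(x)]
        dn = [xj - (h if j == i else 0.0) for j, xj in enumerate(x)]
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

# Toy scalar "network output"; true gradient at [2, 1] is [4, 3].
f = lambda x: x[0] ** 2 + 3.0 * x[1]
g = input_gradient(f, [2.0, 1.0])
assert abs(g[0] - 4.0) < 1e-4 and abs(g[1] - 3.0) < 1e-4
```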
How do Attacks Work?
1. Reconstruction: [figure: reconstruction gradients]
2. Classification: [figure: classification gradients]
Idea against attacks!
Give me Gradients!
Reconstruction Gradients
Classification Gradients: AVOID THIS
Hypothesis: bigger problems are better
Reconstruction Gradients
Classification Gradients
MNIST → ImageNet
...so we tried: SegNet + YFCC100m
(YFCC100m: roughly 69× the size of ImageNet)
How to Compare: ResNet-50 vs. SegNet
[Plot: model accuracy vs. noise level]
Perceptually similar!
Targeted vs. Untargeted Attacks
Untargeted: push the true class down until any other class wins.
Targeted: push a randomly selected target up until it wins.
[Figure: class-score changes 𝚫y]
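The two attack objectives can be written down directly on a vector of class logits. These helper names are hypothetical; real attacks optimize these quantities iteratively through the network's input gradient.

```python
def untargeted_objective(logits, true_class):
    """Untargeted: push the true class down until any other wins.
    Attack succeeds when this becomes positive."""
    others = [l for i, l in enumerate(logits) if i != true_class]
    return max(others) - logits[true_class]

def targeted_objective(logits, target_class):
    """Targeted: push the chosen target up until it wins.
    Attack succeeds when this becomes positive."""
    others = [l for i, l in enumerate(logits) if i != target_class]
    return logits[target_class] - max(others)

logits = [2.0, 0.5, -1.0]                    # class 0 currently wins
assert untargeted_objective(logits, 0) < 0   # not yet fooled
assert targeted_objective(logits, 2) < 0     # target 2 does not win yet
```

Note the asymmetry: the untargeted objective only needs *some* other class to overtake the true one, which is why untargeted attacks typically succeed with smaller perturbations.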
Quick, pick one at random!
HYPOTHESIS: adversarial ↔ non-adversarial
[Figure: input, input gradients, and the resulting perturbation for both cases]
Adversaries fighting an attack-agnostic Autoencoder on ImageNet
[Plot legend:]
● Baseline (no attack)
● Classifier only (no defense)
● Classifier with Autoencoder
● ALP for targeted PGD (Kannan et al. 2018)
● ALP for untargeted PGD (Engstrom et al. 2018)
[Attack annotations: simple attack · loop with clipping · amount of noise · same but in a loop · fancy optimization]
No AE: 74.02 · With AE: 71.19
Structural Gradients Obstruct Gradient-Based Attacks*
Reconstruction Gradients
Classification Gradients
*as long as structure is not tightly related to semantics
A closer look at adversarial noise (MNIST)
Expectation: structural change
Reality: non-structural changes
Uninformative dimensions!
Effects of extra dimensions
[Figure: 1D and 2D examples with a perturbation 𝚫x]
From 2D to 3D: semantic information lives on the x- and y-axes; the z-axis is uninformative
Decision Boundaries
Expected Boundary
● The z-axis does not interfere
● Perturbations need to go in the direction of the training samples
Vulnerable Boundary
● Small perturbations along the "extra" dimension change the predicted class!
● The class boundary extends over the domain of other classes
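A vulnerable boundary of this kind fits in a few lines: the label depends only on the first coordinate, but a (made-up) learned linear boundary also puts weight on the uninformative second axis, so a tiny step along that axis flips the prediction. All numbers are illustrative.

```python
def predict(point, w, b=0.0):
    """Linear decision rule: class 1 if w . point + b > 0, else 0."""
    return 1 if sum(wi * pi for wi, pi in zip(w, point)) + b > 0 else 0

# [x weight, z weight]: the z weight should be 0 but the model learned 4.0.
w_learned = [1.0, 4.0]

sample = [0.5, 0.0]          # class 1 by the semantic x-coordinate alone
assert predict(sample, w_learned) == 1

# A perturbation entirely along the uninformative z-axis...
adversarial = [0.5, -0.2]
# ...changes the predicted class without touching the semantic x-axis.
assert predict(adversarial, w_learned) == 0
```

This is the 2D version of what the slides extrapolate to 784 dimensions: the more uninformative directions the input space has, the more room the boundary has to extend where no training data constrains it.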
Extrapolating... 1D → 2D → 3D → ... → 784D
Preserve only the information that is useful for classification
Step 1: train a classifier (ImageNet)
Step 2: train an autoencoder (YFCC100M)
Step 3: fine-tune the decoder with gradients from the classifier
Palacio, Sebastian et al. "What do Deep Networks Like to See?" CVPR (2018)
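A structural sketch of this three-step recipe, with stand-in "parameters" instead of real networks; the point is only which components get updated at each step (the trainer function is hypothetical, not an API from the paper):

```python
def train(params, dataset, frozen=()):
    """Hypothetical trainer: marks every non-frozen parameter as updated."""
    return {name: ("updated" if name not in frozen else val)
            for name, val in params.items()}

params = {"classifier": "init", "encoder": "init", "decoder": "init"}

# Step 1: train a classifier on ImageNet.
params = train(params, "ImageNet", frozen=("encoder", "decoder"))

# Step 2: train an autoencoder (encoder + decoder) on YFCC100M.
params = train(params, "YFCC100M", frozen=("classifier",))

# Step 3: fine-tune ONLY the decoder with gradients flowing back from
# the frozen classifier, so reconstructions keep what the classifier needs.
params = train(params, "ImageNet", frozen=("classifier", "encoder"))

assert all(v == "updated" for v in params.values())
```

The key design choice is step 3: the classifier and encoder stay frozen, so the classifier's gradients only reshape the decoder's output distribution rather than the whole pipeline.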
Accuracy on ResNet-50:
No AE: 74.02
With AE: 71.19 (-2.83 pp)
With fine-tuned AE: 74.94 (+0.92 pp)
Palacio, Sebastian et al. "What do Deep Networks Like to See?" CVPR (2018)
Looking up Reconstructions
[Figure: original images vs. ResNet-50 reconstructions]
Experiments with S2SNets (on ImageNet)
● Baseline (no attack)
● Classifier only (no defense)
● Classifier with Autoencoder
● Classifier with S2SNet
Observations:
● Consistent offset (projection of unnecessary input signal)
● Not tied to any specific adversarial attack.
● Zero compromise for clean images (no attack): with S2SNet, 74.94
So, did we solve adversarial attacks?
● It works as a proof of concept for a defense principle:
○ Gradients are stable but convey information that is less effective for adversarial attacks.
○ No gradient obfuscation :)
● Content dependent.
● Still vulnerable under some specific but common threat conditions.
Summary
● Manifold exploration is possible through input gradients. They express different things depending on the task.
● If structural info != semantic info, autoencoders can help against adversarial attacks. It's a sound design principle against gradient-based attacks.
● Projection of redundant dimensions can be achieved via S2SNets.
● High dimensionality of the input space induces (exploitable) irregularities in decision boundaries.
Enhancing robustness against adversarial attacks!
Thank you!
Sebastian Palacio
[email protected]
@spalaciob
"Adversarial Defense using Structure-to-Signal Autoencoders"
https://arxiv.org/abs/1803.07994
In collaboration with:
● Joachim Folz (equal contribution)
● Jörn Hees
● Federico Raue
Supervisor:
● Andreas Dengel
DFKI Kaiserslautern
Some images have been taken from www.pexels.com and www.openclipart.org