Transcript of "Understanding Deep Networks through Properties of the Input Space" · 2019-03-29
German Research Center for Artificial Intelligence (DFKI)
ALL RIGHTS RESERVED. No part of this work may be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without express written permission from the authors.
Understanding Deep Networks through Properties of the Input Space
GTC 2019
By: Sebastian Palacio
[Figure: a neural network]
Deep Neural Networks Work... DUH!
...yet they can be easily tricked
Safeguarding becomes a "thing"
[Figure: neural network with safeguards: filter, harden, flag]
Cat and Mouse Chase
How do Attacks Work?
[Diagram: input → features → features → features → output]
Two entry points: modify the network, or modify the input.
1. Pass the input through the network: f(x)
2. Compute the sensitivity: f′(x)
3. Modify the input according to the sensitivity.
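The three steps above can be sketched as an FGSM-style attack (Goodfellow et al.'s fast gradient sign method). This is a minimal illustration on a toy linear "network" f(x) = w·x, not the talk's actual setup; all names and numbers are made up.

```python
def f(x, w):
    """Step 1. Forward pass: a toy linear classifier score."""
    return sum(wi * xi for wi, xi in zip(w, x))

def grad_f(x, w):
    """Step 2. Sensitivity: for f(x) = w . x the input gradient is w."""
    return list(w)

def fgsm_perturb(x, w, eps=0.1):
    """Step 3. Modify the input along the sign of the gradient:
    x' = x + eps * sign(df/dx)."""
    g = grad_f(x, w)
    return [xi + eps * (1 if gi > 0 else -1 if gi < 0 else 0)
            for xi, gi in zip(x, g)]

x = [0.5, -0.2, 0.1]
w = [1.0, -2.0, 0.5]
x_adv = fgsm_perturb(x, w, eps=0.1)
# The score moves up because every coordinate stepped along the gradient sign.
assert f(x_adv, w) > f(x, w)
```

With a real network the gradient comes from backpropagation rather than a closed form, but the attack loop has the same shape: forward pass, input gradient, signed step.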
Gradients are good estimators of the input space's distribution
[Figure: input, its gradient, and the resulting perturbation]
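To make "f′(x)", the input gradient used throughout, concrete: autodiff frameworks compute it exactly during backpropagation, but it can be approximated with a central finite difference. A small self-contained sketch (the toy scalar function is illustrative only):

```python
def input_gradient(f, x, h=1e-5):
    """Approximate df/dx at x, one coordinate at a time,
    using a central finite difference."""
    grad = []
    for i in range(len(x)):
        up = [xj + (h if j == i else 0.0) for j, xj in enumerate(x)]
        dn = [xj - (h if j == i else 0.0) for j, xj in enumerate(x)]
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

# Toy scalar "network output"; true gradient at [2, 1] is [4, 3].
f = lambda x: x[0] ** 2 + 3.0 * x[1]
g = input_gradient(f, [2.0, 1.0])
assert abs(g[0] - 4.0) < 1e-4 and abs(g[1] - 3.0) < 1e-4
```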
How do Attacks Work?
1. Reconstruction: [figure: reconstruction gradients]
2. Classification: [figure: classification gradients]
Idea against attacks!
Give me Gradients!
Reconstruction Gradients
Classification Gradients: AVOID THIS
Hypothesis: bigger problems are better
Reconstruction Gradients
Classification Gradients
MNIST → ImageNet
...so we tried: SegNet + YFCC100m
(YFCC100m: roughly 69× the size of ImageNet)
How to Compare: ResNet-50 vs. SegNet
[Plot: model accuracy vs. noise level]
Perceptually similar!
Targeted vs. Untargeted Attacks
Untargeted: push the true class down until any other class wins.
Targeted: push a randomly selected target up until it wins.
[Figure: class-score changes 𝚫y]
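The two attack objectives can be written down directly on a vector of class logits. These helper names are hypothetical; real attacks optimize these quantities iteratively through the network's input gradient.

```python
def untargeted_objective(logits, true_class):
    """Untargeted: push the true class down until any other wins.
    Attack succeeds when this becomes positive."""
    others = [l for i, l in enumerate(logits) if i != true_class]
    return max(others) - logits[true_class]

def targeted_objective(logits, target_class):
    """Targeted: push the chosen target up until it wins.
    Attack succeeds when this becomes positive."""
    others = [l for i, l in enumerate(logits) if i != target_class]
    return logits[target_class] - max(others)

logits = [2.0, 0.5, -1.0]                    # class 0 currently wins
assert untargeted_objective(logits, 0) < 0   # not yet fooled
assert targeted_objective(logits, 2) < 0     # target 2 does not win yet
```

Note the asymmetry: the untargeted objective only needs *some* other class to overtake the true one, which is why untargeted attacks typically succeed with smaller perturbations.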
Quick, pick one at random!
HYPOTHESIS: adversarial ↔ non-adversarial
[Figure: input, input gradients, and the resulting perturbation for both cases]
Adversaries fighting an attack-agnostic Autoencoder on ImageNet
[Plot legend:]
● Baseline (no attack)
● Classifier only (no defense)
● Classifier with Autoencoder
● ALP for targeted PGD (Kannan et al. 2018)
● ALP for untargeted PGD (Engstrom et al. 2018)
[Attack annotations: simple attack · loop with clipping · amount of noise · same but in a loop · fancy optimization]
No AE: 74.02 · With AE: 71.19
Structural Gradients Obstruct Gradient-Based Attacks*
Reconstruction Gradients
Classification Gradients
*as long as structure is not tightly related to semantics
A closer look at adversarial noise (MNIST)
Expectation: structural change
Reality: non-structural changes
Uninformative dimensions!
Effects of extra dimensions
[Figure: 1D and 2D examples with a perturbation 𝚫x]
From 2D to 3D: semantic information lives on the x- and y-axes; the z-axis is uninformative
Decision Boundaries
Expected Boundary
● The z-axis does not interfere
● Perturbations need to go in the direction of the training samples
Vulnerable Boundary
● Small perturbations along the "extra" dimension change the predicted class!
● The class boundary extends over the domain of other classes
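A vulnerable boundary of this kind fits in a few lines: the label depends only on the first coordinate, but a (made-up) learned linear boundary also puts weight on the uninformative second axis, so a tiny step along that axis flips the prediction. All numbers are illustrative.

```python
def predict(point, w, b=0.0):
    """Linear decision rule: class 1 if w . point + b > 0, else 0."""
    return 1 if sum(wi * pi for wi, pi in zip(w, point)) + b > 0 else 0

# [x weight, z weight]: the z weight should be 0 but the model learned 4.0.
w_learned = [1.0, 4.0]

sample = [0.5, 0.0]          # class 1 by the semantic x-coordinate alone
assert predict(sample, w_learned) == 1

# A perturbation entirely along the uninformative z-axis...
adversarial = [0.5, -0.2]
# ...changes the predicted class without touching the semantic x-axis.
assert predict(adversarial, w_learned) == 0
```

This is the 2D version of what the slides extrapolate to 784 dimensions: the more uninformative directions the input space has, the more room the boundary has to extend where no training data constrains it.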
Extrapolating... 1D → 2D → 3D → ... → 784D
Preserve only the information that is useful for classification
Step 1: train a classifier (ImageNet)
Step 2: train an autoencoder (YFCC100M)
Step 3: fine-tune the decoder with gradients from the classifier
Palacio, Sebastian et al. "What do Deep Networks Like to See?" CVPR (2018)
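A structural sketch of this three-step recipe, with stand-in "parameters" instead of real networks; the point is only which components get updated at each step (the trainer function is hypothetical, not an API from the paper):

```python
def train(params, dataset, frozen=()):
    """Hypothetical trainer: marks every non-frozen parameter as updated."""
    return {name: ("updated" if name not in frozen else val)
            for name, val in params.items()}

params = {"classifier": "init", "encoder": "init", "decoder": "init"}

# Step 1: train a classifier on ImageNet.
params = train(params, "ImageNet", frozen=("encoder", "decoder"))

# Step 2: train an autoencoder (encoder + decoder) on YFCC100M.
params = train(params, "YFCC100M", frozen=("classifier",))

# Step 3: fine-tune ONLY the decoder with gradients flowing back from
# the frozen classifier, so reconstructions keep what the classifier needs.
params = train(params, "ImageNet", frozen=("classifier", "encoder"))

assert all(v == "updated" for v in params.values())
```

The key design choice is step 3: the classifier and encoder stay frozen, so the classifier's gradients only reshape the decoder's output distribution rather than the whole pipeline.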
Accuracy on ResNet-50:
No AE: 74.02
With AE: 71.19 (-2.83 pp)
With fine-tuned AE: 74.94 (+0.92 pp)
Palacio, Sebastian et al. "What do Deep Networks Like to See?" CVPR (2018)
Looking up Reconstructions
[Figure: original images vs. ResNet-50 reconstructions]
Experiments with S2SNets (on ImageNet)
● Baseline (no attack)
● Classifier only (no defense)
● Classifier with Autoencoder
● Classifier with S2SNet
Observations:
● Consistent offset (projection of unnecessary input signal)
● Not tied to any specific adversarial attack.
● Zero compromise for clean images (no attack): with S2SNet, 74.94
So, did we solve adversarial attacks?
● It works as a proof of concept for a defense principle:
○ Gradients are stable but convey information that is less effective for adversarial attacks.
○ No gradient obfuscation :)
● Content dependent.
● Still vulnerable under some specific but common threat conditions.
Summary
● Manifold exploration is possible through input gradients. They express different things depending on the task.
● If structural info != semantic info, autoencoders can help against adversarial attacks. It's a sound design principle against gradient-based attacks.
● Projection of redundant dimensions can be achieved via S2SNets.
● High dimensionality of the input space induces (exploitable) irregularities in decision boundaries.
Enhancing robustness against adversarial attacks!
Thank you!
Sebastian Palacio
[email protected]
@spalaciob
"Adversarial Defense using Structure-to-Signal Autoencoders"
https://arxiv.org/abs/1803.07994
In collaboration with:
● Joachim Folz (equal contribution)
● Jörn Hees
● Federico Raue
Supervisor:
● Andreas Dengel
DFKI Kaiserslautern
Some images have been taken from www.pexels.com and www.openclipart.org