18Webnar feature squeezing-V2 · Other Potential Squeezers 27 C Xie, et al. Mitigating Adversarial...

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks Weilin Xu David Evans Yanjun Qi http://www.cs.virginia.edu/yanjun/


Page 1:

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

Weilin Xu, David Evans, Yanjun Qi
http://www.cs.virginia.edu/yanjun/

Page 2:

Deep Learning is Solving Many of Our Problems!

Auto-Driving Car

Voice Assistant

Spam Detector

Medical Genomics

Page 3:

However, Deep Learning Classifiers are Easily Fooled

[Figure: an original example classified as “1” with 100% confidence, plus a perturbation, becomes an adversarial example. Columns: Original Example, Perturbations, Adversarial Examples. Rows: BIM → “4” (100%); JSMA → “2” (99.9%); CW2 → “2” (83.8%).]

C. Szegedy et al., Intriguing Properties of Neural Networks. In ICLR 2014.

Page 4:

Classifiers Under Attack: Adversary Adapts

[Figure, ACM CCS 2016. Rows: Actual images; Recognized faces.]

Page 5:

Solution Strategy

Solution Strategy 1: Train a perfect vision model. Not yet feasible.

Solution Strategy 2: Make it harder to find adversarial examples. Arms race!

Feature Squeezing: A general framework that reduces the search space available to an adversary and detects adversarial examples.

Simple, Cheap, Effective!

Page 6:

Roadmap

• Feature Squeezing Detection Framework

• Feature Squeezers
  • Bit Depth Reduction
  • Spatial Smoothing

• Detection Evaluation
  • Oblivious adversary
  • Adaptive adversary

Page 7:

Background: Machine Learning

• Machine Learning: learn to find models that can generalize from observed data to unseen data.

[Figure: Input X → Model f(.) → Output Y; the model f(.) generalizes to unseen X’. For instance: image → Trained Deep Learning Model → “panda”.]

Page 8:

Background: Adversarial Examples

[Figure: x + 0.007 × [noise] = x’. The trained deep learning model labels x as y: “panda” and labels x’ = x + r as t: “gibbon”.]

x: original sample

x’ = x + r: adversarial sample

C. Szegedy et al., Intriguing Properties of Neural Networks. In ICLR 2014.

Page 9:

Background: Adversarial Examples

[Figure: x + 0.007 × [noise] = x’. The trained deep learning model labels x as y: “panda” and labels x’ = x + r as t: “gibbon”.]

Many different formulations exist to search for x’ from x, e.g.,

$$\text{minimize} \quad \underbrace{\lVert f(x') - t \rVert}_{\text{Misclassification term}} + \lambda \cdot \underbrace{\Delta(x, x')}_{\text{Distance term}}$$

Page 10:

Intriguing Property of Adversarial Examples

[Figure: Original X → Trained Deep Learning Model → y: “1”. Adversarial Example X + r → Trained Deep Learning Model → y: “2”.]

Irrelevant features used in classification tasks are the major cause of adversarial examples.

Page 11:

Intriguing Property of Adversarial Examples

[Figure: Original X → Trained Deep Learning Model → y: “1”. Adversarial Example X + r → Trained Deep Learning Model → y: “2”. Squeeze(X) → Trained Deep Learning Model → y: “1”. Squeeze(X + r) → Trained Deep Learning Model → y: “1”.]

Page 12:

Motivation

• Irrelevant features used in classification tasks are the root cause of adversarial examples.

• The feature spaces are unnecessarily large in deep learning tasks, e.g., raw image pixels.

• We may reduce the search space of possible perturbations available to an adversary using Feature Squeezing.

[Figure: Bit Depth Reduction]

Page 13:

Weilin Xu, David Evans, Yanjun Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. 2018 Network and Distributed System Security Symposium (NDSS 2018).

Page 14:

Weilin Xu, David Evans, Yanjun Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. 2018 Network and Distributed System Security Symposium (NDSS 2018).

Page 15:

Detection Framework

[Diagram: Input → Model → Prediction0. Input → Squeezer 1 → Model → Prediction1. d1 is the L1 distance between Prediction0 and Prediction1. Is d1 > T? Yes → Adversarial; No → Legitimate.]

A Feature Squeezer coalesces similar samples into a single one.
• It barely changes legitimate inputs.
• It destroys adversarial perturbations.
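The comparison in this diagram can be written out as a short worked formula. This is a minimal sketch, assuming that $f(x)$ denotes the model's output probability vector and that $s_1$ denotes Squeezer 1; both symbols are introduced here for illustration and do not appear on the slide.

```latex
d_1 = \lVert f(x) - f(s_1(x)) \rVert_1 ,
\qquad
x \text{ is flagged as adversarial} \iff d_1 > T
```

Under that assumption both vectors are probability distributions, so $0 \le d_1 \le 2$.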

Page 16:

Detection Framework: Multiple Squeezers

[Diagram: Input → Model → Prediction0. Input → Squeezer 1 → Model → Prediction1, with L1 distance d1 to Prediction0. Input → Squeezer 2 → Model → Prediction2, with L1 distance d2 to Prediction0. Is max(d1, d2) > T? Yes → Adversarial; No → Legitimate.]

Squeezers:
• Bit Depth Reduction
• Spatial Smoothing
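The joint test max(d1, d2) > T can be sketched in a few lines of Python. This is an illustrative sketch only: `toy_model`, `one_bit`, and `three_bit` are hypothetical stand-ins invented for this example (the slide pairs bit depth reduction with spatial smoothing, not two bit depths), and the threshold value is arbitrary.

```python
import numpy as np

def l1_distance(p, q):
    """L1 distance between two prediction vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.abs(p - q)))

def joint_score(model, squeezers, x):
    """max(d1, d2, ...): largest L1 distance between the prediction on x
    and the prediction on each squeezed version of x."""
    prediction0 = model(x)
    return max(l1_distance(prediction0, model(squeeze(x))) for squeeze in squeezers)

def is_adversarial(model, squeezers, x, threshold):
    """Flag x as adversarial when the joint score exceeds the threshold T."""
    return joint_score(model, squeezers, x) > threshold

# Hypothetical stand-ins, for illustration only.
def toy_model(x):
    """Two-class 'model' whose class-1 probability is the mean pixel value."""
    m = float(np.clip(np.mean(x), 0.0, 1.0))
    return np.array([1.0 - m, m])

def one_bit(x):
    """Assumed 1-bit squeezer: plain rounding to 0 or 1."""
    return np.round(x)

def three_bit(x):
    """Assumed 3-bit squeezer: 8 evenly spaced levels in [0, 1]."""
    return np.round(np.asarray(x, dtype=float) * 7) / 7

x = np.full(4, 0.45)
print(joint_score(toy_model, [one_bit, three_bit], x))
print(is_adversarial(toy_model, [one_bit, three_bit], x, threshold=0.5))
```

Taking the maximum means a single squeezer that disagrees strongly with the original prediction is enough to flag the input.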

Page 17:

Bit Depth Reduction

Signal Quantization

[Figure: target value versus original value, both from 0 to 1, for 1-bit, 3-bit, and 8-bit depth.]

Reduce to 1-bit: $x_i = \mathrm{round}(x_i \times (2^1 - 1)) / (2^1 - 1)$

X = [0.012 0.571 … 0.159 0.951] → [0. 1. … 0. 1.]

X_adv = [0.312 0.471 … 0.157 0.851] → [0. 0. … 0. 1.]
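Bit depth reduction is small enough to sketch directly. The sketch below assumes the i-bit rule round(x · (2^i − 1)) / (2^i − 1) for inputs scaled to [0, 1], under which 1-bit reduction is plain rounding to 0 or 1. The function name `reduce_bit_depth` is hypothetical, and only the four visible entries of the slide's example vectors are used.

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Quantize values in [0, 1] to 2**bits evenly spaced levels.

    Assumed rule: round(x * (2**bits - 1)) / (2**bits - 1).
    Note that np.round rounds exact halves to the nearest even integer.
    """
    levels = 2 ** bits - 1
    return np.round(np.asarray(x, dtype=float) * levels) / levels

# The four visible entries of the slide's example vectors.
x = np.array([0.012, 0.571, 0.159, 0.951])
x_adv = np.array([0.312, 0.471, 0.157, 0.851])

print(reduce_bit_depth(x, 1))      # [0. 1. 0. 1.]
print(reduce_bit_depth(x_adv, 1))  # [0. 0. 0. 1.]
```

Lower bit depths merge more nearby values into one level, which is what shrinks the space of perturbations that survive squeezing.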

Page 18:

Bit Depth Reduction

Eliminating adversarial perturbations while preserving semantics.

Legitimate   FGSM   BIM   CW∞   CW2
1            1      4     2     2
1            1      1     1     1

Page 19:

Accuracy with Bit Depth Reduction

Dataset    Squeezer          Adversarial Examples   Legitimate Images
MNIST      None (baseline)   13.0%                  99.43%
MNIST      1-bit Depth       62.7%                  99.33%
ImageNet   None (baseline)   2.78%                  69.70%
ImageNet   4-bit Depth       52.11%                 68.00%

Adversarial examples: FGSM, BIM, CW∞, DeepFool, CW2, CW0, JSMA.

Page 20:

Distribution of Distance(Prediction, Squeezed Prediction) (MNIST)

[Figure: histograms for Legitimate and Adversarial examples. x-axis: Maximum L1 Distance, 0.0 to 2.0. y-axis: Number of Examples, 0 to 800.]

Page 21:

Spatial Smoothing: Median Filter

• Replace a pixel with the median of its neighbors.
• Effective in eliminating “salt-and-pepper” noise.

[Figure: 3x3 Median Filter. *Image from https://sultanofswing90.wordpress.com/tag/image-processing/]
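The median filter described above can be sketched with NumPy alone. This is a minimal illustration for a 2-D grayscale image, not the authors' implementation: the name `median_filter`, the reflect padding at the borders, and the window alignment for even sizes are all assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(img, size=3):
    """Replace each pixel with the median of its size x size neighborhood.

    Borders use reflect padding (an assumption); for even sizes the window
    extends one pixel further up and left than down and right.
    """
    img = np.asarray(img, dtype=float)
    before = size // 2
    after = size - 1 - before
    padded = np.pad(img, ((before, after), (before, after)), mode="reflect")
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return np.median(windows, axis=(-2, -1))

# A single "salt" pixel on a black image is removed by a 3x3 median filter.
img = np.zeros((5, 5))
img[2, 2] = 1.0
print(median_filter(img, size=3))
```

An isolated outlier never reaches the median of its window, which is why the filter suits sparse, large-magnitude perturbations.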

Page 22:

Spatial Smoothing: Non-local Means

• Replace a patch with the weighted mean of similar patches.
• Preserves more edges.

$$p' = \sum_i w(p, q_i) \times q_i$$

[Figure: a patch p and similar patches q1 and q2.]
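The weighted mean above can be illustrated with a naive, pixel-wise non-local means filter. This is a slow sketch under stated assumptions, not the implementation used in the talk: the Gaussian weight w = exp(−d²/h²) on the mean squared patch difference, the normalization of the weights to sum to 1, the reflect padding, and the parameter names `search`, `patch`, and `h` are all choices made for this example.

```python
import numpy as np

def non_local_means(img, search=5, patch=3, h=0.1):
    """Naive non-local means for a 2-D grayscale image with values in [0, 1].

    Each pixel becomes a weighted mean of the pixels in its search window,
    weighted by how similar their surrounding patches are to its own patch.
    """
    img = np.asarray(img, dtype=float)
    half_p, half_s = patch // 2, search // 2
    pad = half_p + half_s
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img)
    height, width = img.shape
    for y in range(height):
        for x in range(width):
            cy, cx = y + pad, x + pad
            ref = padded[cy - half_p:cy + half_p + 1, cx - half_p:cx + half_p + 1]
            weights, values = [], []
            for dy in range(-half_s, half_s + 1):
                for dx in range(-half_s, half_s + 1):
                    qy, qx = cy + dy, cx + dx
                    cand = padded[qy - half_p:qy + half_p + 1, qx - half_p:qx + half_p + 1]
                    d2 = np.mean((ref - cand) ** 2)       # patch dissimilarity
                    weights.append(np.exp(-d2 / (h * h)))  # similar patches weigh more
                    values.append(padded[qy, qx])
            weights = np.array(weights)
            out[y, x] = np.sum(weights * np.array(values)) / np.sum(weights)
    return out

rng = np.random.default_rng(0)
noisy = np.clip(0.5 + 0.05 * rng.standard_normal((8, 8)), 0.0, 1.0)
print(noisy.std(), non_local_means(noisy).std())
```

Because dissimilar patches receive weights near zero, pixels across an edge contribute little to the average, which is one way to read the "preserves more edges" bullet.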

Page 23:

Squeezer                   Original         BIM (L∞)         JSMA (L0)
None                       Airplane 94.4%   Truck 99.9%      Automobile 56.5%
Median Filter (2*2)        Airplane 98.4%   Airplane 99.9%   Ship 46.0%
Non-local Means (13-3-4)   Airplane 98.3%   Airplane 80.8%   Airplane 70.0%

Page 24:

Accuracy with Spatial Smoothing

Dataset    Squeezer                 Adversarial Examples   Legitimate Images
ImageNet   None (baseline)          2.78%                  69.70%
ImageNet   Median Filter 2*2        68.11%                 65.40%
ImageNet   Non-local Means 11-3-4   57.11%                 65.40%

Adversarial examples: FGSM, BIM, CW∞, DeepFool, CW2, CW0.

Page 25:

Distribution of Distance(Prediction, Squeezed Prediction) (ImageNet)

[Figure: histograms for Legitimate and Adversarial examples. x-axis: Maximum L1 Distance, 0.0 to 2.0. y-axis: Number of Examples, 0 to 140.]

Page 26:

[Figure panels: Adversarial inputs (CW2 attack); Legitimate inputs.]

Page 27:

Other Potential Squeezers

• Thermometer Encoding (learnable bit depth reduction)

• Image denoising using bilateral filter, autoencoder, wavelet, etc.

• Image resizing

C. Xie et al., Mitigating Adversarial Effects Through Randomization, ICLR 2018.

J. Buckman et al., Thermometer Encoding: One Hot Way To Resist Adversarial Examples, ICLR 2018.

D. Meng and H. Chen, MagNet: a Two-Pronged Defense against Adversarial Examples, in CCS 2017.

F. Liao et al., Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser, arXiv 1712.02976.

A. Prakash et al., Deflecting Adversarial Attacks with Pixel Deflection, arXiv 1801.08926.

Page 28:

Experimental Setup

• Datasets and Models
  • MNIST, 7-layer CNN
  • CIFAR-10, DenseNet
  • ImageNet, MobileNet

• Attacks (100 examples for each attack)
  • Untargeted: FGSM, BIM, DeepFool
  • Targeted (Next/Least-Likely): JSMA, Carlini-Wagner L2/L∞/L0

• Detection Datasets
  • A balanced dataset with legitimate examples.
  • 50% for training the detector, the remainder for validation.

Page 29:

Threat Models

• Oblivious adversary: The adversary has full knowledge of the target model, but is not aware of the detector.

• Adaptive adversary: The adversary has full knowledge of the target model and the detector.

Page 30:

Detection Framework: Multiple Squeezers

[Diagram: Input → Model → Prediction0. Input → Squeezer 1 → Model → Prediction1, with L1 distance d1 to Prediction0. Input → Squeezer 2 → Model → Prediction2, with L1 distance d2 to Prediction0. Is max(d1, d2) > T? Yes → Adversarial; No → Legitimate.]

Squeezers:
• Bit Depth Reduction
• Spatial Smoothing

Page 31:

How to find T for the detector (MNIST)

Select a threshold value that gives a 5% false positive rate (FPR).

[Figure: histograms for Legitimate and Adversarial examples. x-axis: Maximum L1 Distance, 0.0 to 2.0. y-axis: Number of Examples, 0 to 800.]
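One simple way to pick a threshold with a target false positive rate is to take a quantile of the scores measured on legitimate examples. The sketch below assumes that approach; `select_threshold` and `false_positive_rate` are hypothetical helper names, and the scores are synthetic values generated for illustration, not data from the talk.

```python
import numpy as np

def select_threshold(legitimate_scores, target_fpr=0.05):
    """Return T such that roughly target_fpr of legitimate scores exceed T."""
    scores = np.asarray(legitimate_scores, dtype=float)
    return float(np.quantile(scores, 1.0 - target_fpr))

def false_positive_rate(legitimate_scores, threshold):
    """Fraction of legitimate examples that would be flagged as adversarial."""
    scores = np.asarray(legitimate_scores, dtype=float)
    return float(np.mean(scores > threshold))

# Synthetic stand-in for maximum L1 distances of legitimate training examples.
rng = np.random.default_rng(0)
legit_scores = np.clip(rng.exponential(scale=0.05, size=1000), 0.0, 2.0)

T = select_threshold(legit_scores, target_fpr=0.05)
print(T, false_positive_rate(legit_scores, T))
```

The threshold is chosen on the training half of the detection dataset; the false positive rate on held-out legitimate examples will generally differ somewhat from the target.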

Page 32:

Detect Successful Adv. Examples (MNIST)

                L∞ Attacks              L2 Attacks   L0 Attacks
Squeezer        FGSM    BIM     CW∞     CW2          CW0     JSMA
1-bit Depth     100%    97.9%   100%    100%         55.6%   100%
Median 2*2      73.1%   27.7%   100%    94.4%        82.2%   100%
[Best Single]   100%    97.9%   100%    100%         82.2%   100%
Joint           100%    97.9%   100%    100%         91.1%   100%

Bit Depth Reduction is more effective on L∞ and L2 attacks.

Median Smoothing is more effective on L0 attacks.

Joint detection improves performance.

Page 33:

Aggregated Detection Results

Dataset    Squeezers                                                  Threshold   False Positive Rate   Detection Rate (SAEs)   ROC-AUC (Exclude FAEs)
MNIST      Bit Depth (1-bit), Median (2x2)                            0.0029      3.98%                 98.2%                   99.44%
CIFAR-10   Bit Depth (5-bit), Median (2x2), Non-local Mean (13-3-2)   1.1402      4.93%                 84.5%                   95.74%
ImageNet   Bit Depth (5-bit), Median (2x2), Non-local Mean (11-3-4)   1.2128      8.33%                 85.9%                   94.24%

Page 34:

Threat Models

• Oblivious attack: The adversary has full knowledge of the target model, but is not aware of the detector.

• Adaptive attack: The adversary has full knowledge of the target model and the detector.

Page 35:

Adaptive Adversary

Adaptive CW2 attack, unbounded adversary.

$$\text{minimize} \quad \underbrace{\lVert f(x') - t \rVert}_{\text{Misclassification term}} + \lambda \cdot \underbrace{\Delta(x, x')}_{\text{Distance term}} + k \cdot \underbrace{\mathrm{detectScore}(x')}_{\text{Detection term}}$$

Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song, Adversarial Example Defense: Ensembles of Weak Defenses are not Strong, USENIX WOOT ’17.

Page 36:

Adaptive Adversarial Examples

No successful adversarial examples were found for images originally labeled as 3 or 8.

Mean L2: 2.80 (Untargeted), 4.14 (Targeted-Next), 4.67 (Targeted-LL)

Page 37:

Adaptive Adversary Success Rates

[Figure: Adversary’s Success Rate (y-axis, 0.0 to 0.6) versus Clipped ε (x-axis, 0.0 to 1.0) for Targeted (Next), Targeted (LL), and Untargeted attacks, with “Common ε” and “Unbounded” marked. Data labels: 0.68, 0.06, 0.01, 0.44, 0.01, 0.24.]

Page 38:

Counter Measure: Randomization

• Binary filter: threshold := 0.5 is replaced with threshold := 𝒩(0.5, 0.0625)

• Strengthen the adaptive adversary: attack an ensemble of 3 detectors with thresholds := [0.4, 0.5, 0.6]

[Figure: two plots, each with both axes running from 0 to 1.]
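The randomized binary filter can be sketched as follows. Two points are assumptions rather than facts from the slide: 0.0625 is treated here as the standard deviation of the Gaussian (the slide does not say whether it is a standard deviation or a variance), and a fresh threshold is drawn for every pixel. The function names are hypothetical.

```python
import numpy as np

def binary_filter(x, threshold=0.5):
    """Deterministic 1-bit squeezer: 1.0 where x exceeds the threshold, else 0.0."""
    return (np.asarray(x, dtype=float) > threshold).astype(float)

def randomized_binary_filter(x, rng, mean=0.5, std=0.0625):
    """1-bit squeezer with thresholds drawn from a Gaussian.

    Assumptions: `std` is a standard deviation, and one threshold is
    sampled independently for each pixel on every call.
    """
    x = np.asarray(x, dtype=float)
    thresholds = rng.normal(loc=mean, scale=std, size=x.shape)
    return (x > thresholds).astype(float)

rng = np.random.default_rng(0)
x = np.array([0.05, 0.48, 0.52, 0.95])
print(binary_filter(x))
print(randomized_binary_filter(x, rng))
```

Pixels far from 0.5 are squeezed the same way on almost every draw, while pixels near 0.5 become unpredictable, so an adversary cannot rely on a perturbation that sits just across a fixed threshold.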

Page 39:

Mean L2

Attack                          Untargeted   Targeted-Next   Targeted-LL
Attack Deterministic Detector   2.80         4.14            4.67
Attack Randomized Detector      3.63         5.48            5.76

Page 40:

Conclusion

• Feature Squeezing hardens deep learning models.
• Feature Squeezing gives advantages to the defense side in the arms race with an adaptive adversary.

Page 41:

Thank you! Reproduce our results using EvadeML-Zoo: https://evadeML.org/zoo