Transcript of webinar: Feature Squeezing (V2)
Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks
Weilin Xu, David Evans, Yanjun Qi
http://www.cs.virginia.edu/yanjun/
Deep Learning is Solving Many of Our Problems!
• Auto-Driving Car
• Voice Assistant
• Spam Detector
• Medical Genomics
However, Deep Learning Classifiers are Easily Fooled
[Figure: an original example classified as "1" with 100% confidence, plus small perturbations from BIM, JSMA, and CW2, yields adversarial examples classified as "4" (100%), "2" (99.9%), and "2" (83.8%), respectively.]
C Szegedy et al., Intriguing Properties of Deep Neural Networks. In ICLR 2014.
Classifiers Under Attack: Adversary Adapts
[Figure: actual images vs. the faces recognized by the classifier (ACM CCS 2016).]
Solution Strategy
• Strategy 1: Train a perfect vision model. Not feasible yet.
• Strategy 2: Make it harder to find adversarial examples. An arms race!
Feature Squeezing: a general framework that reduces the search space available to an adversary and detects adversarial examples. Simple, cheap, effective!
Roadmap
• Feature Squeezing Detection Framework
• Feature Squeezers
  • Bit Depth Reduction
  • Spatial Smoothing
• Detection Evaluation
  • Oblivious adversary
  • Adaptive adversary
Background: Machine Learning
• Machine Learning: learn to find models that can generalize from observed data to unseen data.
[Diagram: input X → trained model f(.) → output Y; the model generalizes to unseen X′. For instance, a trained deep learning model maps an image to the label "panda".]
Background: Adversarial Examples
[Figure: x (classified y: "panda") + 0.007 × [noise] = x′ = x + r, which the same trained deep learning model classifies as t: "gibbon".]
• x: original sample
• x′ = x + r: adversarial sample
C Szegedy et al., Intriguing Properties of Deep Neural Networks. In ICLR 2014.
Background: Adversarial Examples (continued)
Many different variations of formulations exist to search for x′ from x, e.g.:

minimize ‖f(x′) − t‖ + λ · Δ(x, x′)

where ‖f(x′) − t‖ is the misclassification term and Δ(x, x′) is the distance term.
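As a concrete illustration, here is a minimal PyTorch sketch of this kind of targeted search. The `model`, `x`, and `target` arguments are placeholders; using cross-entropy as a surrogate for the misclassification term, along with the λ and step settings, is an illustrative assumption rather than any specific published attack.

```python
import torch
import torch.nn.functional as F

def targeted_attack(model, x, target, lam=0.1, steps=100, lr=0.01):
    """Search for x' near x that `model` classifies as `target`:
    minimize misclassification(f(x'), t) + lam * distance(x, x')."""
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        misclassification = F.cross_entropy(model(x_adv), target)
        distance = torch.norm(x_adv - x)   # L2 distance term, Δ(x, x')
        (misclassification + lam * distance).backward()
        opt.step()
        x_adv.data.clamp_(0.0, 1.0)        # keep pixels in a valid range
    return x_adv.detach()
```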
Intriguing Property of Adversarial Examples
[Figure: a trained deep learning model classifies the original X as y: "1", but the adversarial example X + r as y: "2".]
Irrelevant features used in classification tasks are the major cause of adversarial examples.
Intriguing Property of Adversarial Examples (continued)
[Figure: as before, original X → y: "1" and adversarial X + r → y: "2"; but after squeezing, both Squeeze(X) → y: "1" and Squeeze(X + r) → y: "1".]
Motivation
• Irrelevant features used in classification tasks are the root cause of adversarial examples.
• The feature spaces in deep learning tasks are unnecessarily large, e.g., raw image pixels.
• We may reduce the search space of possible perturbations available to an adversary using Feature Squeezing.
Bit Depth Reduction
[Example figures from: Weilin Xu, David Evans, Yanjun Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. NDSS 2018.]
Detection Framework
[Diagram: the input is classified directly (Prediction0) and after Squeezer1 (Prediction1); if the L1 distance d1 between the two predictions exceeds the threshold T, the input is flagged as adversarial, otherwise legitimate.]
A feature squeezer coalesces similar samples into a single one:
• Barely changes legitimate inputs.
• Destroys adversarial perturbations.
Detection Framework: Multiple Squeezers
[Diagram: the input is classified directly (Prediction0) and after each squeezer (Squeezer1 → Prediction1, Squeezer2 → Prediction2); if max(d1, d2) > T, the input is flagged as adversarial, otherwise legitimate.]
Squeezers used:
• Bit Depth Reduction
• Spatial Smoothing
A code sketch of the joint detector follows below.
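In code, the joint detector reduces to a max over per-squeezer L1 scores; a minimal NumPy sketch, assuming `model` returns a softmax probability vector and `squeezers` is a list of callables:

```python
import numpy as np

def is_adversarial(model, squeezers, x, T):
    """Flag x as adversarial when any squeezer moves the model's
    prediction by more than T in L1 distance: max(d1, d2, ...) > T."""
    p0 = model(x)                                  # prediction on the raw input
    distances = [np.abs(p0 - model(squeeze(x))).sum()   # d_i = ||p0 - p_i||_1
                 for squeeze in squeezers]
    return max(distances) > T
```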
Bit Depth Reduction
[Figure: quantization curves mapping original pixel value (x-axis, 0 to 1) to target value (y-axis, 0 to 1) for 1-bit, 3-bit, and 8-bit depths.]
Signal Quantization
Reduce to b-bit depth: xᵢ′ = round(xᵢ · (2ᵇ − 1)) / (2ᵇ − 1); for 1-bit, this is simply round(xᵢ).

X     = [0.012 0.571 … 0.159 0.951]  →  [0. 1. … 0. 1.]
X_adv = [0.312 0.471 … 0.157 0.851]  →  [0. 0. … 0. 1.]
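A one-line NumPy implementation of this squeezer, checked against the slide's example values:

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Quantize pixel values in [0, 1] to the given bit depth:
    x' = round(x * (2**bits - 1)) / (2**bits - 1)."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

X = np.array([0.012, 0.571, 0.159, 0.951])
print(reduce_bit_depth(X, 1))   # [0. 1. 0. 1.], matching the slide
```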
Bit Depth Reduction
Eliminating adversarial perturbations while preserving semantics.
[Figure: an MNIST digit, legitimate and under four attacks, classified before and after 1-bit squeezing:]

          Legitimate   FGSM   BIM   CW∞   CW2
Before    1            1      4     2     2
After     1            1      1     1     1
Accuracy with Bit Depth Reduction

Dataset    Squeezer          Adversarial Examples*   Legitimate Images
MNIST      None (baseline)   13.0%                   99.43%
MNIST      1-bit Depth       62.7%                   99.33%
ImageNet   None (baseline)   2.78%                   69.70%
ImageNet   4-bit Depth       52.11%                  68.00%

*FGSM, BIM, CW∞, DeepFool, CW2, CW0, JSMA.
Distribution of Distance(Prediction, Squeezed Prediction) (MNIST)
[Histogram: number of examples (0 to 800) vs. maximum L1 distance (0.0 to 2.0); legitimate examples cluster near 0, while adversarial examples concentrate at large distances.]
Spatial Smoothing: Median Filter
• Replace each pixel with the median of its neighbors.
• Effective in eliminating "salt-and-pepper" noise.
[Figure: effect of a 3x3 median filter; image from https://sultanofswing90.wordpress.com/tag/image-processing/]
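A minimal sketch of this squeezer using SciPy; the H x W x C image layout is an assumption:

```python
import numpy as np
from scipy.ndimage import median_filter

# 2x2 median smoothing of a single H x W x C image; the window
# spans only the spatial dimensions, not the color channels.
x = np.random.rand(32, 32, 3)
x_smoothed = median_filter(x, size=(2, 2, 1))
```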
Spatial Smoothing: Non-local Means
• Replace each patch with a weighted mean of similar patches: p′ = Σᵢ w(p, qᵢ) · qᵢ
• Preserves more edges.
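OpenCV ships a fast non-local means implementation that can serve as this squeezer; reading "11-3-4" as search window 11, patch size 3, and filter strength 4 is an assumption about the slide's parameter naming:

```python
import numpy as np
import cv2

# Non-local means denoising on one 8-bit RGB image (assumed parameter
# reading: searchWindowSize=11, templateWindowSize=3, strength h=4).
x = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
x_nlm = cv2.fastNlMeansDenoisingColored(
    x, None, h=4, hColor=4, templateWindowSize=3, searchWindowSize=11)
```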
[CIFAR-10 example: model prediction and confidence before and after squeezing.]

                           Original          BIM (L∞)         JSMA (L0)
No squeezing               Airplane 94.4%    Truck 99.9%      Automobile 56.5%
Median Filter (2x2)        Airplane 98.4%    Airplane 99.9%   Ship 46.0%
Non-local Means (13-3-4)   Airplane 98.3%    Airplane 80.8%   Airplane 70.0%
Accuracy with Spatial Smoothing

Dataset    Squeezer                 Adversarial Examples*   Legitimate Images
ImageNet   None (baseline)          2.78%                   69.70%
ImageNet   Median Filter 2x2        68.11%                  65.40%
ImageNet   Non-local Means 11-3-4   57.11%                  65.40%

*FGSM, BIM, CW∞, DeepFool, CW2, CW0.
Distribution of Distance(Prediction, Squeezed Prediction) (ImageNet)
[Histogram: number of examples (0 to 140) vs. maximum L1 distance (0.0 to 2.0); legitimate inputs cluster at small distances, while adversarial inputs (CW2 attack) concentrate at large distances.]
Other Potential Squeezers
• Thermometer encoding (learnable bit depth reduction)
• Image denoising using bilateral filter, autoencoder, wavelet, etc.
• Image resizing

C Xie, et al. Mitigating Adversarial Effects Through Randomization. ICLR 2018.
J Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial Examples. ICLR 2018.
D Meng and H Chen. MagNet: a Two-Pronged Defense against Adversarial Examples. CCS 2017.
F Liao, et al. Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser. arXiv:1712.02976.
A Prakash, et al. Deflecting Adversarial Attacks with Pixel Deflection. arXiv:1801.08926.
Experimental Setup
• Datasets and Models:
  • MNIST, 7-layer CNN
  • CIFAR-10, DenseNet
  • ImageNet, MobileNet
• Attacks (100 examples for each attack):
  • Untargeted: FGSM, BIM, DeepFool
  • Targeted (Next/Least-Likely): JSMA, Carlini-Wagner L2/L∞/L0
• Detection Datasets:
  • A balanced dataset with legitimate examples.
  • 50% for training the detector, the remainder for validation.
Threat Models
• Oblivious adversary: the adversary has full knowledge of the target model, but is not aware of the detector.
• Adaptive adversary: the adversary has full knowledge of both the target model and the detector.
Detection Framework: Multiple Squeezers (recap)
[The multi-squeezer detection diagram shown earlier is repeated: flag the input as adversarial when max(d1, d2) > T.]
How to Find T for the Detector (MNIST)
Select a threshold value T that gives a 5% false positive rate on legitimate examples.
[Histogram: number of examples (0 to 800) vs. maximum L1 distance (0.0 to 2.0), legitimate vs. adversarial, as shown earlier; T falls where only 5% of legitimate examples score higher.]
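A minimal sketch of this selection rule, assuming `legitimate_scores` holds the detector's max-L1 scores computed on the legitimate half of the training set:

```python
import numpy as np

def select_threshold(legitimate_scores, target_fpr=0.05):
    """Pick T as the (1 - FPR) percentile of scores on legitimate
    examples, so ~5% of them would be falsely flagged."""
    return float(np.percentile(legitimate_scores, 100 * (1 - target_fpr)))
```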
Detect Successful Adv. Examples (MNIST)

                L∞ Attacks               L2 Attacks   L0 Attacks
Squeezer        FGSM    BIM     CW∞      CW2          CW0     JSMA
1-bit Depth     100%    97.9%   100%     100%         55.6%   100%
Median 2x2      73.1%   27.7%   100%     94.4%        82.2%   100%
[Best Single]   100%    97.9%   100%     100%         82.2%   100%
Joint           100%    97.9%   100%     100%         91.1%   100%

Bit depth reduction is more effective against L∞ and L2 attacks.
Median smoothing is more effective against L0 attacks.
Joint detection improves overall performance.
Aggregated Detection Results
(SAEs: successful adversarial examples; FAEs: failed adversarial examples, excluded from the ROC-AUC.)

Dataset    Squeezers                                                   Threshold   False Positive Rate   Detection Rate (SAEs)   ROC-AUC (excluding FAEs)
MNIST      Bit Depth (1-bit), Median (2x2)                             0.0029      3.98%                 98.2%                   99.44%
CIFAR-10   Bit Depth (5-bit), Median (2x2), Non-local Means (13-3-2)   1.1402      4.93%                 84.5%                   95.74%
ImageNet   Bit Depth (5-bit), Median (2x2), Non-local Means (11-3-4)   1.2128      8.33%                 85.9%                   94.24%
Threat Models (recap)
The oblivious attack was covered above; next is the adaptive attack, where the adversary has full knowledge of both the target model and the detector.
Adaptive Adversary
Adaptive CW2 attack, unbounded adversary (Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song. Adversarial Example Defense: Ensembles of Weak Defenses are not Strong. USENIX WOOT '17):

minimize ‖f(x′) − t‖ + λ · Δ(x, x′) + k · detectScore(x′)

with a misclassification term, a distance term, and a new detection term.
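A sketch of the extended objective, reusing the earlier attack loss; `detect_score` stands in for a differentiable surrogate of the detector's max-L1 score, which is an assumption, as is the cross-entropy surrogate:

```python
import torch
import torch.nn.functional as F

def adaptive_loss(model, detect_score, x_adv, x, target, lam=0.1, k=1.0):
    """Adaptive CW2-style objective: the adversary also pushes the
    detector's score below its threshold via the detection term."""
    misclassification = F.cross_entropy(model(x_adv), target)
    distance = torch.norm(x_adv - x)
    detection = detect_score(x_adv)   # differentiable surrogate of max-L1 score
    return misclassification + lam * distance + k * detection
```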
Adaptive Adversarial Examples
No successful adversarial examples were found for images originally labeled as 3 or 8.
[Figure: adaptive adversarial examples with mean L2 distortions of 2.80 (untargeted), 4.14 (targeted-next), and 4.67 (targeted-LL).]
Adaptive Adversary Success Rates
[Plot: adversary's success rate (y-axis, 0.0 to 0.6+) vs. clipped ε (x-axis, 0.0 to 1.0) for three attack variants: Targeted (Next), Targeted (LL), and Untargeted. Marked rates include 0.68, 0.44, 0.24, 0.06, and 0.01, with annotations for the unbounded adversary and a common ε.]
Counter Measure: Randomization
• Binary filter: replace the fixed threshold := 0.5 with a randomized threshold := N(0.5, 0.0625), as sketched below.
• Strengthen the adaptive adversary: attack an ensemble of 3 detectors with thresholds := [0.4, 0.5, 0.6].
[Figure: transfer curves of the binary filter with the fixed threshold and with the randomized threshold.]
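A minimal sketch of the randomized binary filter; reading 0.0625 as the variance (standard deviation 0.25) is an assumption:

```python
import numpy as np

def random_binary_filter(x, rng=None):
    """1-bit squeezer with a randomized cutoff drawn from N(0.5, 0.0625)
    instead of the fixed 0.5 threshold."""
    if rng is None:
        rng = np.random.default_rng()
    threshold = rng.normal(loc=0.5, scale=0.25)  # 0.0625 read as variance (assumption)
    return (x > threshold).astype(x.dtype)
```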
Mean L2 distortions required, deterministic vs. randomized detector:

Attack          Attack Deterministic Detector   Attack Randomized Detector
Untargeted      2.80                            3.63
Targeted-Next   4.14                            5.48
Targeted-LL     4.67                            5.76
Conclusion
• Feature Squeezing hardens deep learning models.
• Feature Squeezing gives advantages to the defense side in the arms race with adaptive adversaries.
Thank you!
Reproduce our results using EvadeML-Zoo: https://evadeML.org/zoo