
RAIN: A Simple Approach for Robust and Accurate Image Classification Networks

Jiawei Du 1,2 ∗  [email protected]
Hanshu Yan 1 ∗  [email protected]
Vincent Y. F. Tan 1  [email protected]
Joey Tianyi Zhou 2  [email protected]
Rick Siow Mong Goh 2  [email protected]
Jiashi Feng 1  [email protected]

1 Dept. ECE, National University of Singapore, Singapore
2 Institute of High Performance Computing, A*STAR, Singapore

Abstract

It has been shown that the majority of existing adversarial defense methods achieve robustness at the cost of sacrificing prediction accuracy. We propose a novel defense framework, Robust and Accurate Image classificatioN (RAIN), to improve the robustness of given CNN classifiers and, at the same time, preserve their high prediction accuracies. RAIN introduces a new randomization-enhancement scheme. It applies randomization over inputs to break the ties between the model forward prediction path and the backward gradient path, thus improving the model robustness. It then enhances the input's high-frequency details to retain the CNN's high prediction accuracy. Concretely, RAIN consists of two complementary randomization modules: randomized small circular shift (RdmSCS) and randomized down-upsampling (RdmDU). The RdmDU module first randomly downsamples the input image. Then, the RdmSCS module circularly shifts the input image along a randomly chosen direction by a small but random number of pixels. Finally, the RdmDU module performs upsampling with a high-performance super-resolution model, such as the EDSR, to reconstruct an image with rich details, since an empirical study we conduct reveals that the loss of high-frequency components in input images leads to a drop in the accuracy of a classifier. We conduct extensive experiments on the STL10 and ImageNet datasets to verify the effectiveness of RAIN. Our numerical results show that RAIN outperforms several state-of-the-art methods in both robustness and prediction accuracy. The code of our work is available at https://github.com/dydjw9/RAIN.

1 Introduction

In recent years, classification models based on convolutional neural networks (CNNs) have been successfully applied to a variety of areas such as financial marketing [5], security [35], and autonomous vehicles [27]. In these real-world applications, robustness is an essential property of CNN classifiers. However, CNN classifiers have been revealed to be highly vulnerable to adversarial examples [36, 4]: even certain visually imperceptible perturbations to inputs can easily fool the CNNs, resulting in incorrect predictions. To ameliorate this problem, researchers have developed a variety of adversarial defense methods.

∗ Equal contribution.

Preprint. Under review.

arXiv:2004.14798v3 [cs.CR] 20 Jun 2020


These can be roughly divided into two categories: adversarial training and input preprocessing. The adversarial training-based methods train CNN classifiers on clean images as well as their adversarial examples [23, 41, 25, 42, 39]; the preprocessing methods develop dedicated transformations to map the adversarial examples onto the clean image manifold without the need to re-train classifiers [30, 26, 33, 38, 28, 21]. In practice, input images can be either clean or adversarially perturbed, and classifiers are usually uninformed about the type of each input. Thus, to ensure reliability, the classifiers have to treat clean and perturbed images on the same footing and try to make correct predictions for both types of images.

While it has been reported that most existing defense methods improve the robustness of classifiers, an undesirable side effect is that they sacrifice the prediction accuracy [34, 42]. In this work, we aim to tackle this problem by developing a novel preprocessing-based defense framework that can enhance the robustness of CNN classifiers and simultaneously retain a high prediction accuracy. Our framework, Robust and Accurate Image classificatioN (RAIN), consists of two randomized modules: randomized small circular shift (RdmSCS) and randomized down-upsampling (RdmDU). In the RdmSCS module, input images are circularly shifted along a randomly chosen direction by a small and random number of pixels. In the RdmDU module, input images are first downsampled by a factor of 2 through randomly sampling one pixel from each 2×2 patch. We then use a well-trained deep super-resolution (SR) model, EDSR [22], to reconstruct images for prediction. Our RAIN framework combines these two randomized modules in the following order: (i) randomized downsampling, (ii) RdmSCS, (iii) EDSR upsampling.

The strategy underlying RAIN is straightforward. On the one hand, randomization can improve adversarial robustness by inducing a mismatch between the inference forward path and the gradient backward path [38]. On the other hand, to maintain high prediction accuracy, we should apply image transformations to which CNN classifiers are (almost) invariant. In consideration of both aspects, we develop and investigate the efficacies of the RdmSCS and RdmDU operations. We empirically show that these two operations effectively improve the robustness of CNN classifiers with minimal degradations to the prediction accuracies. It is worth noting that, for the RdmDU operation, performing upsampling with bicubic interpolation suppresses the high-frequency components of the original input images and results in a severe drop in prediction accuracy. A further study we conduct reveals that the removal of such high-frequency details is a critical reason for this drop in accuracy. Given this observation, we ultimately use the EDSR network to perform the upsampling and reconstruct high-quality images with rich (high-frequency) details.

The effectiveness of our RAIN framework is evaluated on the STL10 [9] and ImageNet [10] datasets. The results show that RAIN-based methods achieve higher prediction accuracy and better robustness compared to several baseline methods [38, 28, 23, 39]. In summary, the main contributions of this work are as follows:

1) We propose the RAIN framework, which enhances the robustness of a given CNN classifier while retaining its high prediction accuracy. The method works in a plug-in manner and is easy to implement for any given CNN classifier.

2) We introduce two simple yet effective components of the RAIN framework: RdmSCS and RdmDU. Each of these modules can independently improve the robustness with a minimal penalty to the prediction accuracy.

3) We reveal that the loss or inadvertent removal of high-frequency components in input images is a critical factor in the degradation of a CNN classifier's accuracy. Hence, we are inspired to use the EDSR model for upsampling within the RdmDU module.

2 Preliminaries on Adversarial Robustness

To evaluate the robustness of a classifier, we adopt five white-box [14, 23, 24, 6, 8] and two black-box attack methods to generate potentially harmful perturbations. We denote a certain attack method with perturbation budget ε by Aε(·); this means that the maximum value of a certain norm (to be specified later) of the perturbation is ε. It is noteworthy that, in the white-box attack scenario, we allow the adversary to access full information of the given CNN model as well as the preprocessing-based defense module. In other words, the attacker can compute the gradient of predictions w.r.t. the input through the defense module. Compared to some existing methods [26, 33, 30], in which the gradients of the defense modules are not available, the end-to-end white-box attacks in our setting are much stronger.

Given a CNN classifier C(x), its robustness is quantified by the classification accuracy over the adversarially perturbed versions of input images. Since it is less meaningful to attack inputs that are already classified incorrectly [38, 28], we randomly collect a test set of images, T = {(xj, yj)}, j = 1, ..., M, where each element (xj, yj) is correctly classified by C(x), i.e., C(xj) = yj for all 1 ≤ j ≤ M. For an adversarial attack method Aε(·), the robustness of C(x) is evaluated as

RT(C, Aε) = (1/M) Σ_{j=1}^{M} 1[C(xj + Aε(xj)) = yj].   (1)
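As an illustration of Equation (1), the following is a minimal PyTorch sketch of the evaluation loop; the names evaluate_robustness, classifier, attack (a callable returning the perturbation Aε(x)), and test_loader are placeholders for this illustration rather than the exact evaluation code used in our experiments.

import torch

def evaluate_robustness(classifier, attack, test_loader, device="cuda"):
    # Estimate R_T(C, A_eps) from Eq. (1): the fraction of originally
    # correctly classified images that remain correct after perturbation.
    classifier.eval()
    n_correct, n_total = 0, 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        # Eq. (1) is defined only over images the classifier already gets right.
        with torch.no_grad():
            clean_pred = classifier(x).argmax(dim=1)
        keep = clean_pred == y
        if keep.sum() == 0:
            continue
        x, y = x[keep], y[keep]
        x_adv = x + attack(x, y)          # attack returns the perturbation A_eps(x)
        with torch.no_grad():
            adv_pred = classifier(x_adv).argmax(dim=1)
        n_correct += (adv_pred == y).sum().item()
        n_total += y.numel()
    return n_correct / max(n_total, 1)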

Experimental Settings: Since experiments are interspersed throughout the paper starting from Section 3, we describe the common experimental settings here. We conduct experiments on the STL10 [9] and/or ImageNet [10] datasets. We use a seven-layer CNN classifier for the STL10 dataset and the ResNet-50 [15] model for ImageNet. Details about the network architectures can be found in Section 8.1. For each dataset, we randomly select 5,000 images from the validation set as the robustness test set, i.e., M = 5,000 in Equation (1).

For the white-box attacks, we consider one single-step attack method (FGSM [14]) and four iterative attack methods (PGD [23], C&W [6], DeepFool [24] and EAD [8]). The FGSM and DeepFool attacks are based on the l∞ norm with ε = 8/255. The PGD attack is based on the l∞ norm with ε = 16/255 and learning rate 1/255. The C&W attack is based on the l2 norm with learning rate 1/255. The EAD attack is based on the l1 norm with learning rate 1/255. All four iterative attacks run for 40 iterations. For the black-box attacks, we consider the ZOO method [7] with a maximum of 10,240 queries and the NES method [16] with a maximum of 2,000 queries. Both of these methods are based on the l∞ norm with ε = 8/255. All the experiments follow these settings unless otherwise specified.
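As a concrete illustration of the iterative attacks above, the following is a from-scratch sketch of l∞ PGD with the stated budget, step size, and iteration count; it is not the exact attack implementation used in our experiments, and it assumes the model (including any preprocessing defense) is differentiable end-to-end.

import torch

def pgd_linf(model, x, y, eps=16/255, step=1/255, iters=40):
    # Minimal l_inf PGD sketch; `model` maps images in [0, 1] to logits.
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()               # ascend the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to the eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)             # keep a valid image
    return x_adv.detach() - x                                # return the perturbation A_eps(x)

The returned perturbation can be plugged into the robustness sketch above, e.g., attack = lambda x, y: pgd_linf(model, x, y).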

3 Randomization for Model Robustness

Inspired by the invariance of CNNs to image shifting and scaling [18, 19], we aim to utilize these two properties to design robust and accurate CNN models. In particular, we consider two randomized transformation modules: RdmSCS and RdmDU. On the one hand, randomization can result in a mismatch between the forward prediction path and the gradient back-propagation path [38]. Hence, these two transformations are expected to improve the adversarial robustness of classifiers. On the other hand, CNNs are capable of learning translation-invariant representations due to the gradually increasing size of the receptive field [13] and the pooling operations [17]. The down-upsampling operation also does not change the size of the images or the positions of their contents. Thus, the two transformations of interest would likely not degrade the prediction accuracy significantly.

3.1 Randomized Small Circular Shift

Figure 1: An illustration of the RdmSCS module. Left: an input image; Middle: the input shifted with p = 0.05; Right: the input shifted with p = 0.25.

We first consider a random shift operation on the input. Before it is fed to a given CNN model, the input is shifted by a random number of pixels along a randomly selected direction. More precisely, two shifting parameters, ∆h and ∆w, are randomly sampled from uniform distributions, i.e., ∆h ∼ U(−hp, hp) and ∆w ∼ U(−wp, wp), where h and w are the height and width of the input image respectively, and p is a predefined positive constant. The sign of ∆h (or ∆w) represents the vertical (or horizontal) direction for shifting, while the magnitude quantifies the number of pixels to be shifted. Algorithm 1 elaborates on the implementation of the proposed random shifting operation. We use the superscript s to denote the shifted version of an image.

Conventional translation operations pad zeros into the empty regions [2]. Instead, our proposed operation splices the shifted part of the input image back circularly from the opposite boundary (see Figure 1). The aim of this operation is to retain the frequency components and other statistics (such as the means and variances of pixel values) of the original images as much as possible, so that the preprocessing operation does not have an obvious negative impact on the prediction accuracy.

3

Page 4: RAIN: A Simple Approach for Robust and Accurate Image ... · forward prediction path and the backward gradient path, thus improving the model robustness. It then enhances the input’s

Algorithm 1 RdmSCS

1: function RANDOM_SHIFT(x, p)
2:     Randomly sample ∆h ∼ U(−hp, hp), ∆w ∼ U(−wp, wp)
3:     for is in (0, h − 1) and js in (0, w − 1) do
4:         i ≡ (is + ∆h) mod h
5:         j ≡ (js + ∆w) mod w
6:         xs(is, js) = x(i, j)
7:     return xs

For this reason, we usually set p to be a small value in practice (e.g., p = 0.05 in the proposed module). Accordingly, we dub the proposed shifting operation Randomized Small Circular Shift (RdmSCS).
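A minimal PyTorch sketch of RdmSCS is given below; torch.roll implements the circular wrap-around described above, and the (C, H, W) tensor layout is an assumption of this illustration.

import torch

def rdm_scs(x, p=0.05):
    # Randomized Small Circular Shift (sketch). x: image tensor of shape (C, H, W);
    # p: maximum shift as a fraction of the image size.
    _, h, w = x.shape
    # One random offset per image, as described above; the sign encodes direction.
    dh = int(torch.empty(1).uniform_(-h * p, h * p).round())
    dw = int(torch.empty(1).uniform_(-w * p, w * p).round())
    # torch.roll wraps the shifted-out pixels around, i.e. a circular shift.
    return torch.roll(x, shifts=(dh, dw), dims=(1, 2))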

Table 1: Evaluation of RdmSCS on the robustness and prediction accuracy on the STL10 dataset.

                        Accuracy        Robustness
STL10                   Clean Images    FGSM    PGD     Deep Fool
CNN                     1.000           0.162   0.000   0.000
CNN + RdmSCS            0.958           0.290   0.079   0.093

Evaluation: We conduct experiments on the STL10 dataset to examine the effectiveness of RdmSCS. From Table 1, we can see that the proposed RdmSCS clearly improves the robustness of CNN classifiers (from 16.2% to 29.0% against FGSM). At the same time, the RdmSCS-modified classifier maintains a high accuracy on clean images (95.8%).

3.2 Randomized Down-Upsampling

In addition to the RdmSCS operation, we also introduce a Randomized Down-Upsampling (RdmDU) module that further improves the robustness while simultaneously preserving the classification accuracy. The RdmDU module consists of a randomized downsampling operation followed by an upsampling operation. Given an input image x ∈ R^(h×w×3), the random downsampling operation of RdmDU partitions the input into non-overlapping 2×2 patches and randomly picks one of the four pixels in each patch, each with probability 1/4. Consequently, the resultant image, denoted as x↓, has size (h/2) × (w/2) × 3. As the CNN classifier is trained with images of the original size, the upsampling operation of RdmDU reconstructs a high-resolution image of size h × w × 3, denoted as x↓↑, from x↓. Algorithm 2 shows the implementation of the RdmDU module in detail.

Algorithm 2 RdmDU

Input: Input image x
1: x↓ = RANDOM_DOWNSAMPLING(x)
2: x↓↑ = Upsampling(x↓, factor = 2)
Output: x↓↑
3: function RANDOM_DOWNSAMPLING(x)
4:     for i in (0, ⌈h/2⌉ − 1) and j in (0, ⌈w/2⌉ − 1) do
5:         Randomly sample ∆i ∼ U({0, 1}), ∆j ∼ U({0, 1})
6:         x↓(i, j) = x(min(2i + ∆i, h − 1), min(2j + ∆j, w − 1))
7:     return x↓

For the upsampling operation, we recall that the bicubic interpolation method [20] is widely used for image resizing and is easy to implement. Thus, we start by using bicubic interpolation within the RdmDU module to super-resolve the randomly downsampled input image. We modify CNN classifiers with the bicubic-based RdmDU (RdmDU-bic) and evaluate the robustness and prediction accuracy on the STL10 dataset. We find that RdmDU-bic can effectively enhance the robustness of the CNN classifier. Unfortunately, the prediction accuracy drops significantly to 81.2%, which is undesirable in this and other applications. To ameliorate this problem, we investigate the underlying causes of this performance drop of the RdmDU-bic module by appealing to frequency-domain analyses in the next section.
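The following sketch illustrates RdmDU-bic in PyTorch: the random downsampling of Algorithm 2 implemented via tensor unfolding, followed by bicubic upsampling with torch.nn.functional.interpolate. It is an illustrative reading of the module rather than the exact implementation, and it assumes images with even height and width in (C, H, W) layout.

import torch
import torch.nn.functional as F

def rdm_downsample(x):
    # Randomized downsampling (sketch): pick one pixel uniformly at random
    # from every non-overlapping 2x2 patch. x: (C, H, W) with H, W even.
    c, h, w = x.shape
    patches = x.unfold(1, 2, 2).unfold(2, 2, 2)            # (C, H/2, W/2, 2, 2)
    patches = patches.reshape(c, h // 2, w // 2, 4)
    idx = torch.randint(0, 4, (1, h // 2, w // 2, 1)).expand(c, -1, -1, 1)
    return patches.gather(3, idx).squeeze(3)               # (C, H/2, W/2)

def rdm_du_bic(x):
    # RdmDU with bicubic upsampling (the variant evaluated in Table 2).
    x_down = rdm_downsample(x)
    x_up = F.interpolate(x_down.unsqueeze(0), scale_factor=2,
                         mode="bicubic", align_corners=False)
    return x_up.squeeze(0)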


Table 2: Evaluation of RdmDU-bic on the robustness and prediction accuracy on the STL10 dataset.

                        Accuracy        Robustness
STL10                   Clean Images    FGSM    PGD     Deep Fool
CNN                     1.000           0.162   0.000   0.000
CNN + RdmDU-bic         0.812           0.475   0.050   0.539

4 What is lacking in preserving the CNN’s accuracy?

To discover the reason for the significant drop in accuracy due to the RdmDU-bic operation (see Table 2), we compare input and output images of RdmDU-bic. From Figure 2, we see that, visually, the difference, if any, between a certain original image x and its processed version x↓↑ is imperceptible. However, if we transform the image into the frequency domain using the 2D Fast Fourier Transform (2D-FFT) F(·), we find that RdmDU-bic tends to suppress the high-frequency components of the original image. Therefore, we hypothesize that the accuracy drop arises because of the loss of high-frequency components. To corroborate this hypothesis, we then conduct experiments to study the contribution of high-frequency components to the accuracy of a well-trained CNN model.

Figure 2: Comparison between an input and the output of RdmDU-bic. From left to right, images in the upper row: input x, output x↓↑, and x − x↓↑; spectra in the lower row: F(x), F(x↓↑), and F(x) − F(x↓↑).

Setup: Given a CNN classifier, we remove an increasing amount of high-frequency components and examine how the prediction accuracy changes with respect to the loss of high-frequency components. Specifically, we preprocess input images in two steps, namely low-pass filtering and energy normalization:

(1) The low-pass filter removes the high-frequency components above a certain threshold. We first compute the spectrum z of an input image x via the 2D-FFT, i.e., z = F(x). Next, we remove the high-frequency components above a given threshold r and obtain the resultant spectrum z′, where

z′(µ, ν) = z(µ, ν) · 1( d((µ, ν), (ch, cw)) / ((1/2)√(h² + w²)) ≤ r ).   (2)

Here d(·, ·) denotes the Euclidean distance between two points and (ch, cw) is the position of the center, which corresponds to the zero-frequency component.

(2) Removing high-frequency components reduces the energy of the spectrum E(z), where E(z) ≡ ∫µ ∫ν |z(µ, ν)|² dµ dν. To ensure that the comparisons are fair, we normalize the spectra of the images before and after processing so that they have the same energy. Thus, we multiply the resultant spectrum z′ by √(E(z)/E(z′)) and reconstruct the RGB image for making subsequent predictions.
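A minimal NumPy sketch of this two-step preprocessing (low-pass filtering per Equation (2) followed by energy normalization) is shown below; it operates on a single channel and is intended only to illustrate the procedure, not to reproduce our exact experimental pipeline.

import numpy as np

def lowpass_and_renormalize(x, r):
    # Remove frequency components farther than a fraction r of the maximum
    # radius from the spectrum centre, then rescale so that the total
    # spectral energy E(z) is unchanged. x: one channel, shape (h, w).
    h, w = x.shape
    z = np.fft.fftshift(np.fft.fft2(x))                    # centred spectrum
    ch, cw = h // 2, w // 2
    mu, nu = np.ogrid[:h, :w]
    dist = np.sqrt((mu - ch) ** 2 + (nu - cw) ** 2)
    mask = dist <= r * 0.5 * np.sqrt(h ** 2 + w ** 2)      # indicator of Eq. (2)
    z_filt = z * mask
    # Energy normalization: match E(z') to E(z) before reconstruction.
    scale = np.sqrt((np.abs(z) ** 2).sum() / ((np.abs(z_filt) ** 2).sum() + 1e-12))
    return np.fft.ifft2(np.fft.ifftshift(z_filt * scale)).real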

Figure 3: Prediction accuracy on the STL10 and ImageNet datasets as a function of the frequency threshold r. Even though the energy of the filtered spectrum is kept equal to that of the original, removing high-frequency components still impairs the test accuracy.

Findings: We conduct experiments on the STL10 and ImageNet datasets. For each dataset, we plot the performance of the classifier as a function of the frequency threshold r. From Figure 3, we find that, on both datasets, the prediction accuracies drop rapidly as r decreases, corresponding to an increase in the amount of high-frequency components being filtered out. Thus, the high-frequency components of the input images are critical for maintaining the accuracy of predictions. Motivated by this observation, a possible solution to mitigate the degradation in accuracy resulting from RdmDU-bic is to replace the bicubic interpolation with a high-performance super-resolution (SR) model, so that we can reconstruct the input image with richer details.

RdmDU-EDSR: Deep-learning-based SR methods have been shown to achieve impressive performance on SR tasks [11, 22]. Here, we adopt the EDSR model [22], which is one of the most effective implementations for SR tasks. Our final randomized down-upsampling operation is termed RdmDU-EDSR. We conduct experiments on the STL10 dataset to evaluate the effectiveness of RdmDU-EDSR.


The results in Table 3 show that the proposed RdmDU-EDSR module further improves the robustness of CNN classifiers in comparison to RdmDU-bic (e.g., from 47.5% to 59.5% against FGSM). Moreover, the classifier's prediction accuracy is significantly higher than with RdmDU-bic (93.5% as compared to 81.2%).

Table 3: Evaluation of RdmDU-EDSR on the robustness and prediction accuracy on the STL10 dataset.

                        Accuracy        Robustness
STL10                   Clean Images    FGSM    PGD     Deep Fool
CNN + RdmDU-bic         0.812           0.475   0.050   0.539
CNN + RdmDU-EDSR        0.935           0.595   0.128   0.664

5 RAIN: Robust and Accurate Image Classification Networks

In the previous sections, we described two randomized preprocessing operations, both of which were shown to improve the robustness of CNN classifiers with minimal degradations to the prediction accuracies. Here, we formally introduce the Robust and Accurate Image classificatioN (RAIN) framework, which combines RdmSCS and RdmDU-EDSR.

Figure 4: The pipeline of our proposed RAIN framework.

To be specific, the whole pipeline of RAIN is as follows (see Figure 4). For a given well-trained classification model C(·), RAIN first downsamples a certain image x through the randomized downsampling operation within the RdmDU module and then shifts the resultant image via the RdmSCS module. Then, a well-trained EDSR model upsamples the image so that the resulting size is the same as that of the original image and the details are also enriched. Lastly, the resultant image is fed to the given CNN classifier C(·) for prediction. To generate image details that are most useful for C(·), we fix the CNN's parameters and fine-tune the EDSR model's parameters to minimize the loss of the classification task over a few training epochs. In the following, we conduct extensive experiments to verify the efficacy, in terms of accuracy and robustness, of the combined RdmSCS and RdmDU-EDSR modules within the RAIN framework.
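The following sketch illustrates how the full pipeline could be composed in PyTorch. It reuses the rdm_downsample and rdm_scs sketches from Sections 3.1 and 3.2; sr_model stands for a pretrained x2 super-resolution network such as EDSR (optionally fine-tuned with the classification loss while the classifier is kept frozen) and is a placeholder for this illustration, not our exact implementation.

import torch

class RAIN(torch.nn.Module):
    # Sketch of the RAIN pipeline (Figure 4): randomized downsampling,
    # RdmSCS, EDSR upsampling, then the given classifier.
    def __init__(self, classifier, sr_model, p=0.05):
        super().__init__()
        self.classifier = classifier
        self.sr_model = sr_model
        self.p = p

    def forward(self, x):                                        # x: (N, C, H, W)
        x = torch.stack([rdm_downsample(img) for img in x])      # (i) random downsampling
        x = torch.stack([rdm_scs(img, self.p) for img in x])     # (ii) small circular shift
        x = self.sr_model(x)                                     # (iii) x2 upsampling (EDSR)
        return self.classifier(x)

Because the transformations are applied per forward pass, both the prediction path and any white-box gradient computation see a freshly randomized pipeline.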

5.1 An Ablation Study on the Combination of RdmSCS and RdmDU

The two randomized preprocessing modules can be combined in three different orders: RdmSCS+RdmDU, RdmDU+RdmSCS, and RAIN. Here, we conduct experiments on the STL10 dataset to show that RAIN outperforms the other two variants. The experimental settings have been described in Section 2. From Table 4, we can see that all three variants are more robust than the original CNN model. However, the RAIN architecture achieves the best accuracy (92.9%) and robustness (70.8% against FGSM and 78.1% against Deep Fool) among the three variants.

Table 4: Comparison of the three types of combinations of RdmSCS and RdmDU-EDSR. We use 'D', 'E' and 'S' to denote Randomized Downsampling, EDSR upsampling and RdmSCS respectively. Among the three types, the type 'D-S-E' (RAIN) achieves the strongest robustness.

                        Accuracy        Robustness
STL10                   Clean Images    FGSM    PGD     Deep Fool
Original model          1.000           0.162   0.000   0.000
S - D - E               0.909           0.680   0.237   0.754
D - E - S               0.899           0.701   0.188   0.688
D - S - E               0.929           0.708   0.237   0.781


5.2 Evaluating the Adversarial Robustness of RAIN

We evaluate the robustness of our proposed RAIN on the STL10 [9] and ImageNet [10] datasets under white-box and black-box attacks. We also compare RAIN with four other recent adversarial defense methods: two of them are preprocessing-based defense methods, namely Random Resizing & Padding [38] and Pixel Deflection [28]; the other two are adversarial-training-based methods, namely PGD Adversarial Training [23] and Feature Denoising [39].

Table 5: Robustness comparison of different defense methods against white-box attacks. We report the classification accuracies on clean images and various types of adversarial examples. The results show that the proposed RAIN framework achieves the best performance in prediction accuracy and robustness. The methods of [23] and [39] train classifiers with PGD adversarial examples; thus, we do not evaluate their robustness with respect to PGD attacks for comparison.

                        Accuracy   Robustness
STL10                   Clean      FGSM    PGD     C&W-l2   Deep Fool   EAD-l1
Pixel Deflection [28]   0.883      0.286   0.065   0.320    0.117       0.407
Rand. Pad.&Res. [38]    0.907      0.576   0.070   0.699    0.305       0.789
Madry [23]              0.705      0.592   -       0.289    0.234       0.028
Feature Denoising [39]  0.696      0.631   -       0.300    0.227       0.034
RAIN                    0.929      0.708   0.237   0.734    0.781       0.828

ImageNet                Clean      FGSM    PGD     C&W-l2   Deep Fool   EAD-l1
Pixel Deflection [28]   0.858      0.406   0.117   0.320    0.210       0.365
Rand. Pad.&Res. [38]    0.928      0.644   0.154   0.438    0.703       0.759
Madry [23]              0.623      0.620   -       0.355    0.426       0.063
Feature Denoising [39]  0.653      0.648   -       0.398    0.410       0.115
RAIN                    0.933      0.686   0.273   0.523    0.867       0.857

White-box Attacks: We follow the white-box attack setting described in Section 2, i.e., the attacker has full information of the defense modules. From Table 5, we observe that our RAIN framework outperforms the other four baselines in terms of both prediction accuracy and robustness. More precisely, RAIN results in the best prediction accuracies on the STL10 (92.9%) and ImageNet (93.3%) datasets. Regarding adversarial robustness, RAIN consistently outperforms the other baselines against various attacks. This is especially so for the DeepFool attack, for which RAIN's performance is significantly better than that of the second-best competitor (78.1% vs. 30.5% on STL10 and 86.7% vs. 70.3% on ImageNet).

Black-box Attacks: We follow the black-box attack setting described in Section 2. All the black-box attacks have a perturbation budget of ε = 8/255. We compare the proposed RAIN architecture to the baseline methods under black-box adversarial attacks. Table 6 shows that our proposed RAIN framework achieves the highest accuracy and the best robustness against the ZOO attack (91.2% on STL10, 88.5% on ImageNet). For the NES attack, the performance of RAIN is very close to that of the Random Resizing & Padding method [38] and is clearly better than that of the other three baselines.

Table 6: Evaluation of the robustness of the proposed RAIN framework and the baseline methods under black-box attacks on the STL10 and ImageNet datasets.

                        Accuracy        Robustness
STL10                   Clean Images    ZOO     NES
Pixel Deflection [28]   0.883           0.679   0.650
Rand. Pad.&Res. [38]    0.907           0.854   0.873
Madry [23]              0.705           0.705   0.663
Feature Denoising [39]  0.696           0.621   0.594
RAIN                    0.929           0.912   0.871

ImageNet                Clean Images    ZOO     NES
Pixel Deflection [28]   0.858           0.846   0.841
Rand. Pad.&Res. [38]    0.928           0.867   0.881
Madry [23]              0.623           0.620   0.611
Feature Denoising [39]  0.653           0.663   0.641
RAIN                    0.933           0.885   0.882

In summary, our experiments convincingly show that our proposed RAIN framework can effectively boost the robustness of CNN classifiers against various types of adversarial attacks. Unlike existing defense methods, our RAIN framework preserves the high prediction accuracies of the given underlying classifiers. We conclude that the RAIN framework is a promising solution for building reliable CNN-based classification systems.


6 Related Work

Robustness measures the sensitivity of the accuracy of a classification model to perturbations of the inputs. Many approaches have been proposed to estimate the most harmful adversarial examples within a given neighborhood, such as FGSM [14], DeepFool [24], and PGD [23]. These methods compute adversarial perturbations based on available gradients evaluated at neighborhoods of the given input images. They can access the gradients in each layer and are thus called white-box attacks. In contrast, several other methods [7, 16, 37, 12] propose to estimate the adversarial perturbation by using a number of queries. These approaches, without information about the gradients in each layer, are called black-box attacks. In this work, we use five white-box and two black-box attack methods to evaluate the robustness of classifiers.

To enhance the adversarial robustness of given classifiers, researchers have recently proposed a wide variety of defense methods. Here, we review two main classes of such methods, namely adversarial training and input preprocessing methods. We also discuss the connections of these methods to our work.

Adversarial Training. Adversarial training-based methods train CNN classifiers from scratch on both clean images and their corresponding adversarial examples to enhance their robustness. In each training epoch, the adversarial perturbations corresponding to the clean images are generated in real time. The works in [23, 1] show that adversarial training is an effective defense mechanism against certain types of adversarial perturbations. However, training CNNs from scratch and computing adversarial examples in real time has an adverse effect on the training cost; in fact, a study [31] has shown that it increases the training cost by 3 to 30 times. This computational burden restricts the applicability of adversarial training methods on large-scale datasets. To ameliorate the problem, researchers have proposed several methods for efficiently estimating adversarial perturbations [41, 32]. In addition, another severe drawback of adversarial training methods is that the resultant classifiers tend to over-fit a certain class of adversarial examples [29, 3, 31]. This results in a sharp drop in the prediction accuracy. Consequently, the classifiers are still vulnerable to other types of attacks.

Input Preprocessing. Preprocessing-based defense methods remove the adversarial information from input images by transforming the images in a dedicated way before they are fed into given CNN classifiers. One straightforward idea is to map input images onto the manifold generated by the clean data. To this end, several works [30, 33] propose to train generative models on the original dataset and use them to reconstruct images that are close to the adversarial examples for making predictions. Similarly, the work of [40] first impairs small patches of the adversarial examples and then performs reconstruction via matrix estimation. Traditional image processing operations have also been shown to be effective in robustifying given CNN classifiers. For example, Mustafa et al. [26] propose to use SR models to deactivate adversarial perturbations. However, the enhanced robustness that results from these methods is due to the fact that the attackers are not provided with the internal gradients of the preprocessing modules [1, 30, 33]. Under the end-to-end white-box attacks used in this work, these methods fail to defend against the adversaries. To mitigate this drawback, the works of [38, 28] propose to randomize image processing operations such as padding and resizing. The introduced randomization effectively breaks the connection between the forward inference path and the backward gradient path, and consequently improves robustness against adversaries. However, these randomization-based methods also suffer from deteriorations in prediction accuracy because CNN models are not invariant to the random padding and resizing operations.

Our work aims to develop an effective and efficient defense framework, which can robustify given CNN classifiers and maintain high prediction accuracies. The RAIN framework amalgamates randomization concepts into two image processing modules that preserve the overall classification accuracy.

7 Conclusions

This paper proposes a novel adversarial defense framework, RAIN, which combines two randomization modules, namely RdmSCS and RdmDU. As shown through our extensive experiments on benchmark datasets, RAIN can significantly enhance the robustness of given CNN classifiers and simultaneously preserve their high prediction accuracies.


RAIN ensures the reliability of CNN classifiers when the set of inputs is a combination of clean images and their adversarially perturbed versions.

In the future, we aim to develop a defense framework that is applicable to the classification of low-resolution and low-quality images, because the quality of such images can be severely degraded by preprocessing operations such as downsampling. The degradation may be so severe that effective SR models cannot reconstruct the images to a level at which predictions can be reliably made. We also plan to analyze the fundamental limits of the trade-off between prediction accuracy and adversarial robustness in a theoretically grounded manner. This may lead to a defense mechanism against adversarial attacks that has theoretical guarantees.


Broader Impact

The RAIN framework has the potential to robustify CNN classifiers so that they make reliable predictions. This is especially critical in domains such as medical diagnosis, where trustworthy predictions need to be made on clean input images as well as on those that have been maliciously perturbed. In addition, our observation that high-frequency details are paramount in ensuring correct predictions may also inspire the research community to pay more attention to the high-frequency components of input images when developing algorithms for enhancing certain properties of CNN classifiers. Hence, our work has the potential to achieve not only scientific but also societal impact.

References

[1] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

[2] A. Azulay and Y. Weiss. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint arXiv:1805.12177, 2018.
[3] Y. Balaji, T. Goldstein, and J. Hoffman. Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets. arXiv preprint arXiv:1910.08051, 2019.
[4] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndic, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2013.
[5] L. Bottou, Y. Bengio, and Y. Le Cun. Global training of document processing systems using graph transformer networks. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 489–494. IEEE, 1997.
[6] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017.
[7] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
[8] P.-Y. Chen, Y. Sharma, H. Zhang, J. Yi, and C.-J. Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[9] A. Coates, A. Ng, and H. Lee. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 215–223, 2011.
[10] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[11] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2015.
[12] J. Du, H. Zhang, J. T. Zhou, Y. Yang, and J. Feng. Query-efficient meta attack to deep neural networks. arXiv preprint arXiv:1906.02398, 2019.
[13] R. Gens and P. M. Domingos. Deep symmetry networks. In Advances in Neural Information Processing Systems, 2014.
[14] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[15] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.


[16] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598, 2018.
[17] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial transformer networks. In Advances in Neural Information Processing Systems, 2015.
[18] E. Kauderer-Abrams. Quantifying translation-invariance in convolutional neural networks. arXiv, 2018.
[19] O. S. Kayhan and J. C. van Gemert. On translation invariance in CNNs: Convolutional layers can exploit absolute spatial location. arXiv, 2020.
[20] R. G. Keys. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981.
[21] C. Kou, H. K. Lee, E.-C. Chang, and T. K. Ng. Enhancing transformation-based defenses against adversarial attacks with a distribution classifier. In International Conference on Learning Representations, 2020.
[22] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 136–144, 2017.
[23] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[24] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
[25] M. Mosbach, M. Andriushchenko, T. Trost, M. Hein, and D. Klakow. Logit pairing methods can fool gradient-based attacks. arXiv preprint arXiv:1810.12042, 2018.
[26] A. Mustafa, S. H. Khan, M. Hayat, J. Shen, and L. Shao. Image super-resolution as a defense against adversarial attacks. IEEE Transactions on Image Processing, 29:1711–1724, 2019.
[27] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arXiv:1602.02697, 2016.
[28] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. Deflecting adversarial attacks with pixel deflection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8571–8580, 2018.
[29] L. Rice, E. Wong, and J. Z. Kolter. Overfitting in adversarially robust deep learning. arXiv preprint arXiv:2002.11569, 2020.
[30] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018.
[31] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pages 5014–5026, 2018.
[32] A. Shafahi, M. Najibi, M. A. Ghiasi, Z. Xu, J. Dickerson, C. Studer, L. S. Davis, G. Taylor, and T. Goldstein. Adversarial training for free! In Advances in Neural Information Processing Systems, pages 3353–3364, 2019.
[33] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766, 2017.
[34] D. Su, H. Zhang, H. Chen, J. Yi, P.-Y. Chen, and Y. Gao. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In Proceedings of the European Conference on Computer Vision, pages 631–648, 2018.


[35] W. Sultani, C. Chen, and M. Shah. Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6479–6488, 2018.
[36] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[37] C.-C. Tu, P. Ting, P.-Y. Chen, S. Liu, H. Zhang, J. Yi, C.-J. Hsieh, and S.-M. Cheng. AutoZOOM: Autoencoder-based zeroth order optimization method for attacking black-box neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 742–749, 2019.
[38] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
[39] C. Xie, Y. Wu, L. v. d. Maaten, A. L. Yuille, and K. He. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 501–509, 2019.
[40] Y. Yang, G. Zhang, D. Katabi, and Z. Xu. ME-Net: Towards effective adversarial robustness with matrix estimation. arXiv preprint arXiv:1905.11971, 2019.
[41] D. Zhang, T. Zhang, Y. Lu, Z. Zhu, and B. Dong. You only propagate once: Accelerating adversarial training via maximal principle. In Advances in Neural Information Processing Systems, pages 227–238, 2019.
[42] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.


8 Appendix

8.1 Networks Used on the STL10 and ImageNet datasets

CNN Classifier on the STL10: The CNN architecture used on the STL10 dataset is shown in Table 7. Our implementation builds on open-source code.²

Table 7: The architecture of the CNN classifier on the STL10 dataset.

STL10                     Repetition    Layer
Convolutional Layers      ×1            Conv(3, 32, 3, 1) + MaxPooling2d + ReLU
                          ×1            Conv(32, 64, 3, 1) + MaxPooling2d + ReLU
                          ×1            Conv(64, 128, 3, 1) + MaxPooling2d + ReLU
                          ×1            Conv(128, 128, 3, 1) + MaxPooling2d + ReLU
                          ×2            Conv(128, 256, 3, 0) + ReLU
Fully Connected Layer     ×1            MaxPooling2d + Linear(256, 10)
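A possible PyTorch reading of Table 7 is sketched below; the Conv(in, out, kernel, padding) notation, the 2×2 max pooling, and the global max pool before the linear layer are assumptions of this illustration rather than the exact training code.

import torch.nn as nn

# Hypothetical instantiation of the seven-layer STL10 classifier in Table 7.
stl10_cnn = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),    nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1),   nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1),  nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(128, 128, 3, padding=1), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(128, 256, 3, padding=0), nn.ReLU(),
    nn.Conv2d(256, 256, 3, padding=0), nn.ReLU(),
    nn.AdaptiveMaxPool2d(1), nn.Flatten(), nn.Linear(256, 10),
)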

CNN Classifier on ImageNet: We use ResNet-50 [15] as the classification model on the ImageNet dataset. A pretrained version of ResNet-50 can be found in the Torchvision repository.³

EDSR: Our RAIN uses the same EDSR model for both datasets. We use the single-scale baseline version of EDSR,⁴ which consists of only 16 residual blocks (see Table 1 in [22]).

8.2 RAIN with adversarial training

We also conduct experiments on the STL10 dataset to evaluate the robustness of RAIN-modified CNN classifiers trained via adversarial training. The adversarial examples for training are generated by the PGD attack. We compare our RAIN to two state-of-the-art adversarial training-based methods [23, 39]. The robustness evaluation setting is described in Section 2.

Table 8: Evaluation of the robustness and prediction accuracy of various defense methods on the STL10 dataset.

                        Accuracy        Robustness
STL10                   Clean Images    FGSM    PGD     C&W-l2   Deep Fool   EAD-l1
Madry [23]              0.705           0.592   0.649   0.289    0.234       0.028
Feature Denoising [39]  0.696           0.631   0.668   0.300    0.227       0.034
RAIN                    0.929           0.708   0.237   0.734    0.781       0.828
RAIN-adv                0.745           0.690   0.485   0.656    0.578       0.664

From Table 8, we observe that the adversarially trained RAIN (RAIN-adv) outperforms the two baseline methods [23, 39] in accuracy and in all types of adversarial robustness except the PGD attack. On the one hand, although the methods in [23] and [39] achieve better robustness against the PGD attack, they fail to generalize to other types of attacks. More precisely, their classification accuracies under the C&W, Deep Fool, and EAD attacks are all lower than 30.0%. On the other hand, comparing RAIN and RAIN-adv, we can see that training with adversarial examples clearly improves the robustness against various types of attacks; however, it also leads to a significant drop in prediction accuracy. Thus, the adversarial training strategy is a double-edged sword.
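For completeness, the following is a hedged sketch of how a RAIN-wrapped classifier could be adversarially trained with PGD examples, as in RAIN-adv; pgd_linf refers to the attack sketch in Section 2, and the training-time attack hyper-parameters (step size, iteration count) are placeholders since they are not specified above.

import torch

def adv_train_epoch(rain_model, loader, optimizer, eps=16/255, step=1/255, iters=7,
                    device="cuda"):
    # One epoch of PGD adversarial training for a RAIN-wrapped classifier (sketch).
    # Gradients for the attack are taken end-to-end through the preprocessing modules.
    loss_fn = torch.nn.CrossEntropyLoss()
    rain_model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Craft training-time adversarial examples with the PGD sketch from Section 2.
        x_adv = x + pgd_linf(rain_model, x, y, eps=eps, step=step, iters=iters)
        optimizer.zero_grad()
        loss = loss_fn(rain_model(x_adv), y)
        loss.backward()
        optimizer.step()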

² https://github.com/aaron-xichen/pytorch-playground
³ https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
⁴ https://github.com/thstkdgus35/EDSR-PyTorch
