On the Robustness of Deep k-Nearest Neighbor
Chawin Sitawarin, EECS, UC Berkeley ([email protected])
David Wagner, EECS, UC Berkeley ([email protected])
2nd Deep Learning and Security Workshop (IEEE S&P 2019)
Motivation
“ Adversarial examples for kNN and DkNN ”
• No previous work attacks kNN directly
• Deep k-Nearest Neighbor (DkNN) shows a possibility for detecting adversarial examples, but it is difficult to evaluate
• kNN is not differentiable, so most existing attacks do not work
• To measure how robust they really are, we need a white-box attack (no security through obscurity)
Chawin Sitawarin, DLS '19 (IEEE S&P), On the Robustness of Deep k-Nearest Neighbor
k-Nearest Neighbor
Threat model: white-box, untargeted, Lp norm-ball adversarial examples
o All training samples are known to the attacker
[Figure: kNN example with k = 3; query point z classified by its three nearest neighbors]
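For reference, plain kNN classification (majority vote among the k nearest training points under Euclidean distance) can be sketched in a few lines of NumPy; the toy data here is illustrative, not from the talk:

```python
import numpy as np

def knn_predict(x_train, y_train, z, k=3):
    """Classify query z by majority vote over its k nearest training points."""
    dists = np.linalg.norm(x_train - z, axis=1)       # Euclidean distance to every training sample
    nearest = np.argsort(dists)[:k]                   # indices of the k nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority label among the neighbors

# Toy 2-D data: class 0 clustered near the origin, class 1 near (5, 5)
x_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
```

The talk's kNN experiments use cosine distance; swapping the `np.linalg.norm` line for a cosine distance gives that variant.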
Attack on kNN
• Baseline: mean attack
  o Move z towards the mean of the nearest class
  o Use binary search to determine the distance
• But this is not optimal
[Figure: Mean Attack, moving z toward the class mean]
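The baseline mean attack can be sketched as: interpolate z toward the mean of a chosen target class and binary-search the smallest interpolation coefficient that flips the kNN prediction. How the target class is picked, and the toy data, are assumptions for illustration:

```python
import numpy as np

def knn_predict(x_train, y_train, z, k=3):
    """Majority vote over the k nearest training points (Euclidean distance)."""
    nearest = np.argsort(np.linalg.norm(x_train - z, axis=1))[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def mean_attack(x_train, y_train, z, y_true, target, steps=30):
    """Binary-search the smallest step toward the target-class mean that flips kNN."""
    mean = x_train[y_train == target].mean(axis=0)    # mean of the target class
    if knn_predict(x_train, y_train, mean) == y_true:
        return None                                   # even the class mean fails to flip the label
    lo, hi = 0.0, 1.0                                 # interpolation coefficient along z -> mean
    for _ in range(steps):
        mid = (lo + hi) / 2
        if knn_predict(x_train, y_train, (1 - mid) * z + mid * mean) == y_true:
            lo = mid                                  # still correctly classified: move further
        else:
            hi = mid                                  # misclassified: try a smaller step
    return (1 - hi) * z + hi * mean                   # smallest flipping step found

x_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
z = np.array([0.5, 0.5])                              # correctly classified as class 0
z_adv = mean_attack(x_train, y_train, z, y_true=0, target=1)
```

The binary search finds the boundary along a single fixed direction, which is why this baseline is not optimal: a shorter perturbation in some other direction may also flip the prediction.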
Attack on kNN
• Our gradient-based attack
  o Main idea: move z towards a set of m nearest neighbors from a different class, {x_i}
  o Set up as a constrained optimization problem*:

    minimize_δ   Σ_{i=1}^{m} ‖(z + δ) − x_i‖²      (z: input, δ: perturbation, x_i: nearby training sample)
    subject to   ‖δ‖_p ≤ ε                          (Lp-norm constraint)
                 z + δ in the valid input domain    (constraint on input domain)

  *We use Euclidean distance here, but it can be directly substituted with cosine distance

[Figure: query z and its m = 5 nearest neighbors x_1, …, x_5 from a different class]
Attack on kNN
• Our gradient-based attack (as before): move z towards the m nearest neighbors {x_i} from a different class, set up as a constrained optimization problem
• It is sufficient to be close to only two of the neighbors (x_2 and x_3 in the figure)!
• But it is difficult to know which x_i ahead of time
[Figure: z and the m = 5 neighbors; only x_2 and x_3 need to be close]
Attack on kNN
• Our gradient-based attack (as before): move z towards the m nearest neighbors {x_i} from a different class, set up as a constrained optimization problem
• We want to ignore samples that are too far away by setting a distance threshold η
• But a hard threshold is not differentiable
[Figure: z, the m = 5 neighbors, and the threshold radius η]
Attack on kNN
• Our gradient-based attack (as before): move z towards the m nearest neighbors {x_i} from a different class, set up as a constrained optimization problem
  o Use sigmoid as a soft threshold: approximate the hard threshold with a soft, differentiable one
  o Choose η to be the mean distance to the k-th neighbor
[Figure: sigmoid function acting as a soft threshold at η]
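Putting the pieces together, a rough sketch of the gradient-based attack: minimize the sum of soft-thresholded distances σ(c·(d_i − η)) to the m target-class neighbors, so neighbors far beyond η sit on the sigmoid's flat region and contribute almost no gradient. Plain projected gradient descent with an L2 budget, and the constants (step size, steepness c, iteration count), are assumptions here, not the authors' exact optimizer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gradient_attack(z, x_nbrs, eta, eps, step=0.1, c=5.0, iters=200):
    """Move z toward target-class neighbors x_nbrs by minimizing
    sum_i sigmoid(c * (||z + delta - x_i|| - eta)), keeping ||delta||_2 <= eps."""
    delta = np.zeros_like(z)
    for _ in range(iters):
        diff = (z + delta) - x_nbrs                   # (m, d) vectors to each neighbor
        d = np.linalg.norm(diff, axis=1) + 1e-12      # distances d_i
        s = sigmoid(c * (d - eta))
        # dL/ddelta = sum_i s_i * (1 - s_i) * c * (z + delta - x_i) / d_i
        grad = ((s * (1 - s) * c / d)[:, None] * diff).sum(axis=0)
        delta -= step * grad
        norm = np.linalg.norm(delta)                  # project onto the L2 ball of radius eps
        if norm > eps:
            delta *= eps / norm
    return np.clip(z + delta, 0.0, 1.0)               # stay in the valid input domain

z = np.array([0.1, 0.1])
x_nbrs = np.array([[0.9, 0.9], [0.8, 0.9]])           # toy target-class neighbors
adv = gradient_attack(z, x_nbrs, eta=0.5, eps=0.6)
```

The sigmoid's derivative peaks near d = η, so optimization pressure concentrates on neighbors close to the threshold, which is exactly the "ignore samples that are too far away" behavior the hard threshold was meant to give.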
Results on kNN
• kNN uses cosine distance with k = 75 on the MNIST dataset
• Most adversarial examples have a perceptible / semantic perturbation

Attacks               Accuracy (%)   Mean Perturbation (L2)
No Attack             95.74          -
Mean Attack           5.89           8.611
Our Gradient Attack   9.89           6.565

[Figure: clean vs. adversarial MNIST digits]
Deep k-Nearest Neighbor
• Proposed by Papernot & McDaniel ’18
• Essentially, kNN on outputs of multiple layers of a neural network
• Simple scheme that offers some interpretability
• Can detect out-of-distribution samples and adversarial examples to some degree
[Figure: DkNN pipeline; a legitimate input's representations at layers 1–4 each vote via kNN, prediction: 6]
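Under the description above, a DkNN prediction can be sketched as running kNN in each layer's representation space and pooling the neighbor votes across layers. The two toy "layers" below are stand-ins for real network activations:

```python
import numpy as np

def knn_labels(feats_train, y_train, feat_z, k):
    """Labels of the k nearest training points in one representation space."""
    d = np.linalg.norm(feats_train - feat_z, axis=1)
    return y_train[np.argsort(d)[:k]]

def dknn_predict(layer_fns, x_train, y_train, z, k=3):
    """kNN at every layer's output; majority vote over the pooled neighbor labels."""
    votes = []
    for f in layer_fns:                               # f maps an input to that layer's representation
        feats_train = np.array([f(x) for x in x_train])
        votes.extend(knn_labels(feats_train, y_train, f(z), k))
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

# Toy "layers": identity and squared features stand in for network activations
layer_fns = [lambda x: x, lambda x: x ** 2]
x_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
```

The per-layer neighbor lists are also what gives DkNN its interpretability: disagreement among layers flags a suspicious input.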
Attack on DkNN
• Baseline: mean attack
  o Same as for kNN
• Our gradient-based attack
  o Similar to our gradient-based attack on kNN
Attack on DkNN
• Our gradient-based attack
  o Similar to our gradient-based attack on kNN
  o Instead of distance in the pixel space, we consider distance in the representation space
  o And sum over all the layers (f_λ denotes the representation at the λ-th layer)
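The resulting attack loss can be sketched as the per-layer soft-thresholded distances summed over layers; in practice the gradient through each f_λ would come from an automatic-differentiation framework. The toy layer functions, neighbors, and per-layer thresholds below are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dknn_attack_loss(z_adv, layer_fns, nbrs_per_layer, etas, c=5.0):
    """Soft-thresholded distances to target-class neighbors, computed in each
    layer's representation space and summed over all layers."""
    loss = 0.0
    for f, nbrs, eta in zip(layer_fns, nbrs_per_layer, etas):
        d = np.linalg.norm(f(z_adv) - nbrs, axis=1)   # distances at the λ-th layer
        loss += sigmoid(c * (d - eta)).sum()          # far neighbors saturate near 1, close ones near 0
    return loss

# Toy "layers" and target-class neighbors (stand-ins for real network activations)
layer_fns = [lambda x: x, lambda x: x ** 2]
x_nbrs = np.array([[0.9, 0.9], [0.8, 0.9]])
nbrs_per_layer = [np.array([f(x) for x in x_nbrs]) for f in layer_fns]
etas = [0.5, 0.5]                                     # per-layer thresholds
loss_far = dknn_attack_loss(np.array([0.1, 0.1]), layer_fns, nbrs_per_layer, etas)
loss_near = dknn_attack_loss(np.array([0.8, 0.85]), layer_fns, nbrs_per_layer, etas)
```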
Results on DkNN
• We use the same network and hyperparameters suggested by Papernot & McDaniel ’18
Attacks               Accuracy (%)   Mean Perturbation (L2)
No Attack             98.83          -
Mean Attack           13.13          4.408
P&M '18 Attack        16.02          3.459
Our Gradient Attack   0.00           2.164
Results on DkNN
[Figure: for the Mean Attack and Our Gradient Attack, the adversarial input (predicted as 5) and its nearest neighbors at layers 1–4]
Results on DkNN
• Some perturbations have semantic meaning
• But some are imperceptible
• Suggests that the L2-norm is not always a good metric
• Suggests that there is some hope for the defense
[Figure: clean vs. adversarial image pairs]
Credibility
• DkNN can output a credibility score for a given input
• It can be used to filter out adversarial examples and out-of-distribution samples
• Promising but not very effective currently
  o Some adversarial examples have a high credibility score
  o Some clean samples have a low credibility score
• We refer to the paper for more details
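For concreteness, the conformal-prediction recipe behind credibility in Papernot & McDaniel '18 works roughly as sketched below: an input's nonconformity for a label is the number of pooled neighbors disagreeing with that label, and credibility is the fraction of held-out calibration scores at least as large. The exact definitions and the toy numbers are assumptions here:

```python
import numpy as np

def nonconformity(neighbor_labels, label):
    """Number of pooled DkNN neighbors (across all layers) disagreeing with `label`."""
    return int(np.sum(np.asarray(neighbor_labels) != label))

def credibility(neighbor_labels, label, calib_scores):
    """Empirical p-value: fraction of held-out calibration nonconformity scores
    at least as large as this input's score for its predicted label."""
    a = nonconformity(neighbor_labels, label)
    return float(np.mean(np.asarray(calib_scores) >= a))

calib = [0, 1, 1, 2, 3, 5, 8]                      # toy calibration scores from clean data
cred_high = credibility([1, 1, 1, 0], 1, calib)    # 1 disagreement: high credibility
cred_low = credibility([0, 0, 2, 0], 1, calib)     # 4 disagreements: low credibility
```

This also shows why the filter is imperfect: an adversarial example whose pooled neighbors mostly agree with the (wrong) predicted label still receives a high score.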
Conclusion
• We propose attacks on kNN and DkNN
• Nonetheless, they appear to be more robust out of the box compared to other algorithms
  o Requires larger perturbations
  o Some perturbations also have semantic meaning
• Improving the DkNN
  o Ongoing work: DkNN on representations of a robust network (e.g., adversarially trained networks)
  o More robust variants of kNN (e.g., weighted voting)
Extra Slides
Sigmoid
• Gradient-based attack
  o Main idea: move z towards a group of m nearby samples {x_i} from a different class
  o Set up as a constrained optimization problem
  o Use sigmoid as a soft threshold
  o Choose η to be the mean distance to the k-th neighbor
• The sigmoid σ(x) = 1 / (1 + e^(−x)) approximates the hard threshold at η:
    σ(d − η) ≈ 0 if d < η,   σ(d − η) ≈ 1 if d > η
Credibility
[Figure: distribution of credibility scores]