Privacy Risks of Securing Machine Learning Models
against Adversarial Examples
Liwei Song1, Reza Shokri2, Prateek Mittal1
1Princeton University, 2National University of Singapore
Trustworthy Machine Learning
❑ Properties of trustworthy machine learning models
❑ Privacy-preserving with regard to training data
❑ Robust against adversarial attacks, which cause model misclassifications
[Figures: membership inference attacks against machine learning models (Shokri et al. S&P'17); evasion attacks (adversarial examples) against machine learning models (Goodfellow et al. ICLR'15)]
Our Contribution
❑ Jointly consider privacy and robustness for trustworthy machine learning
❑ Investigate membership inference privacy risks of defenses against adversarial examples
❑ Propose new membership inference attacks against robust machine learning models
❑ Adversarial defense methods indeed increase privacy leakage!
Membership Inference
❑ Infer participation in the training set
❑ Attacks: Shokri et al. S&P'17; Yeom et al. CSF'18

Adversarial Examples
❑ Perturb inputs at test time
❑ Defenses: Madry et al. ICLR'18; Wong et al. NeurIPS'18…
Privacy Risks in Machine Learning
❑ Membership inference attacks
❑ Guess whether a sample was used to train the target machine learning model or not
❑ Distinguishability between training data (members) and test data (non-members)
[Diagram: input (𝑥, 𝑦) → target classifier 𝐹 → membership inference adversary, who guesses whether (𝑥, 𝑦) ∈ 𝐷train]
Security Risks in Machine Learning
❑ Evasion attacks (adversarial examples)
❑ Attacks during test time to cause model misclassifications by adding small perturbations
❑ 𝐹(𝑥 + 𝛿) ≠ 𝐹(𝑥), with ‖𝛿‖𝑝 ≤ 𝜖
[Figure: adversarial examples]
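The threat model above can be illustrated with a one-step ℓ∞ attack in the style of FGSM (Goodfellow et al. ICLR'15). This is a minimal numpy sketch on a toy linear classifier; the model, function names, and numbers are assumptions for demonstration, not the talk's setup.

```python
import numpy as np

def fgsm_perturb(x, grad_x, epsilon):
    """One-step l_inf evasion attack: move every coordinate of x by
    epsilon in the loss-increasing direction (the sign of the gradient),
    so ||delta||_inf <= epsilon holds by construction."""
    return x + epsilon * np.sign(grad_x)

# Toy setting (illustrative): logistic loss of a linear classifier w.x,
# with label y in {-1, +1}.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])
y = 1
# Gradient of log(1 + exp(-y * w.x)) with respect to x.
grad_x = -y * w / (1.0 + np.exp(y * w.dot(x)))

x_adv = fgsm_perturb(x, grad_x, epsilon=8 / 255)
```

The perturbation stays inside the 𝜖-ball while increasing the loss; with a large enough budget this flips the prediction, i.e. 𝐹(𝑥 + 𝛿) ≠ 𝐹(𝑥).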
Robustness (against Adversarial Examples) in Machine Learning
❑ Defense methods against adversarial examples
min𝜃 (1/|𝐷train|) Σ(𝑥,𝑦)∈𝐷train [𝛼 · natural loss(𝑥, 𝑦) + (1 − 𝛼) · adversarial loss(𝑥, 𝑦)]
[Figure: natural classifier vs. robust classifier]
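The objective above can be sketched in code, assuming a toy linear softmax model. The inner maximisation behind "adversarial loss" is only approximated here by sampling corners of the ℓ∞ ball, so the sampling scheme and all names are illustrative, not any cited defense.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def natural_loss(w, x, y):
    """Cross-entropy of a linear softmax classifier with weight rows w."""
    return -np.log(softmax(w @ x)[y])

def adversarial_loss(w, x, y, epsilon, n_corners=8, seed=0):
    """Crude stand-in for the inner maximisation: evaluate the loss at a
    few random corners of the l_inf ball and keep the worst one."""
    rng = np.random.default_rng(seed)
    deltas = epsilon * np.sign(rng.standard_normal((n_corners, x.size)))
    return max(natural_loss(w, x + d, y) for d in deltas)

def robust_training_objective(w, data, alpha, epsilon):
    """alpha * natural loss + (1 - alpha) * adversarial loss, averaged
    over the training set, matching the slide's objective."""
    return float(np.mean([
        alpha * natural_loss(w, x, y)
        + (1 - alpha) * adversarial_loss(w, x, y, epsilon)
        for x, y in data
    ]))
```

Setting 𝛼 = 1 recovers natural training; 𝛼 = 0 is pure robust training.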
❑ Empirical defenses: robust training with adversarial examples (Madry et al. ICLR'18; Sinha et al. ICLR'18; Zhang et al. ICML'19)
❑ Verifiable defenses: robust training with a verified upper bound on the adversarial loss (Wong et al. NeurIPS'18; Mirman et al. ICML'18; Gowal et al. SECML'18)
Why Do Robust Models Have Higher Privacy Risks? (I)
❑ Intuition 1: training samples have a larger influence on robust classifiers than on natural classifiers.
[Figure: decision boundaries of a natural classifier vs. a robust classifier]
Quantifying Influence of Training Data
1. Randomly select a training sample.
2. Retrain the model without the selected training sample.
3. Compute the difference between the selected sample's prediction confidence in the retrained model and in the original model.

Training samples have a larger influence on the robust classifier, indicating a higher privacy risk.
[Figure: influence of training data on natural and robust CIFAR10 classifiers (Madry et al. ICLR'18)]
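The three steps above can be mimicked end-to-end with a toy model. A nearest-centroid classifier stands in for the neural network so that "retraining" is instantaneous; the classifier and its confidence score are assumptions for illustration, not the talk's actual measurement.

```python
import numpy as np

def confidence(train_X, train_y, x, y):
    """Prediction confidence of a nearest-centroid 'classifier': softmax
    over negative distances to each class centroid, read at true class y."""
    classes = sorted(set(train_y))
    d = np.array([np.linalg.norm(x - train_X[np.asarray(train_y) == c].mean(axis=0))
                  for c in classes])
    p = np.exp(-d) / np.exp(-d).sum()
    return p[classes.index(y)]

def influence(train_X, train_y, idx):
    """Steps 1-3 from the slide: confidence on the selected sample with it
    in the training set, minus confidence after retraining without it."""
    x, y = train_X[idx], train_y[idx]
    conf_with = confidence(train_X, train_y, x, y)
    keep = np.arange(len(train_y)) != idx
    conf_without = confidence(train_X[keep], np.asarray(train_y)[keep], x, y)
    return conf_with - conf_without
```

A positive influence means the model is measurably more confident on a sample when that sample was part of training, which is exactly the signal a membership inference adversary exploits.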
Why Do Robust Models Have Higher Privacy Risks? (II)
❑ Intuition 2: a larger divergence between model predictions on training data (members) and test data (non-members)
❑ Drop in test accuracy
❑ Lack of robustness generalization
Divergence between Members and Non-Members
Robust CIFAR10 classifier (Madry et al. ICLR'18): 99% train accuracy and 87% test accuracy in the benign setting; 96% train accuracy and 47% test accuracy in the adversarial setting.
Natural (undefended) CIFAR10 classifier: 100% train accuracy and 95% test accuracy in the benign setting; 0% train accuracy and 0% test accuracy in the adversarial setting.
The robust classifier has a larger divergence between the loss distributions over members and non-members, indicating a higher privacy risk.
Membership Inference Attack based on Benign Examples’ Predictions
❑ Strategy ℐB (Yeom et al. CSF'18): compare the prediction confidence on the benign input with a threshold value (𝜏B)
[Diagram: benign input (𝑥, 𝑦) → target robust classifier 𝐹 with perturbation constraint 𝔅𝜖 → membership inference adversary ℐB: member iff 𝐹(𝑥)𝑦 ≥ 𝜏B]
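Strategy ℐB is simple enough to state directly in code. A minimal sketch, assuming the confidence scores 𝐹(𝑥)𝑦 on the true labels have already been collected; the helper names and toy scores are illustrative.

```python
import numpy as np

def attack_IB(confidence_on_true_label, tau_B):
    """Strategy I_B (Yeom et al. CSF'18): flag a sample as a member iff
    the model's confidence F(x)_y on the true label is at least tau_B."""
    return np.asarray(confidence_on_true_label) >= tau_B

def inference_accuracy(member_conf, nonmember_conf, tau_B):
    """Balanced membership inference accuracy of the threshold attack:
    average of the member and non-member per-class accuracies."""
    true_pos = np.mean(attack_IB(member_conf, tau_B))
    true_neg = np.mean(~attack_IB(nonmember_conf, tau_B))
    return (true_pos + true_neg) / 2

# Toy scores: members tend to receive higher confidence than non-members.
members = [0.99, 0.95, 0.90]
nonmembers = [0.50, 0.40, 0.92]
acc = inference_accuracy(members, nonmembers, tau_B=0.90)
```

In practice the adversary tunes 𝜏B (e.g. on shadow models) to maximise this accuracy; 50% corresponds to random guessing.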
New Membership Inference Attack based on Adversarial Examples’ Predictions
❑ Strategy ℐA: compare the prediction confidence on the adversarial input with a threshold value (𝜏A)
[Diagram: benign input (𝑥, 𝑦) and adversarial input 𝑥adv → target robust classifier 𝐹 with perturbation constraint 𝔅𝜖 → membership inference adversary ℐA: member iff 𝐹(𝑥adv)𝑦 ≥ 𝜏A]
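Strategy ℐA can be sketched the same way, with the extra step of crafting 𝑥adv first. Here a single gradient-sign step on a toy linear softmax model stands in for the full attack; the model, step count, and thresholds are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attack_IA(w, x, y, epsilon, tau_A):
    """Strategy I_A: perturb x inside the l_inf ball of radius epsilon to
    lower the true-class confidence, then compare F(x_adv)_y with tau_A.
    A robust model keeps members' confidence high even under perturbation,
    while non-members' confidence collapses."""
    p = softmax(w @ x)
    # Gradient of -log softmax(w @ x)[y] with respect to x.
    grad = (p[:, None] * w).sum(axis=0) - w[y]
    # One attack step, clipped back into the perturbation constraint.
    x_adv = np.clip(x + epsilon * np.sign(grad), x - epsilon, x + epsilon)
    return softmax(w @ x_adv)[y] >= tau_A

w = np.eye(2)  # toy linear softmax classifier with two classes
```

On this toy model, a high-confidence input survives the perturbation and is flagged as a member, while a borderline input does not.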
New Membership Inference Attack based on Verified Worst-Case Predictions
❑ Strategy ℐV (applied only to verifiably robust classifiers): compare the verified worst-case prediction confidence with a threshold value (𝜏V)
[Diagram: benign input (𝑥, 𝑦) and verification method 𝒱 → target robust classifier 𝐹 with perturbation constraint 𝔅𝜖 → membership inference adversary ℐV: member iff 𝒱(𝐹(𝑥)𝑦, 𝔅𝜖) ≥ 𝜏V]
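For a linear model the verification 𝒱 can be carried out in closed form, which makes strategy ℐV easy to sketch. This is a hypothetical illustration, not any of the cited verification methods: over the ℓ∞ ball, the logit gap to each rival class 𝑘 shrinks by at most 𝜖 · ‖𝑤𝑦 − 𝑤𝑘‖₁.

```python
import numpy as np

def verified_worst_case_confidence(w, x, y, epsilon):
    """Sound lower bound on the class-y softmax confidence of a linear
    model over the l_inf ball: each rival logit gap (w_y - w_k).x shrinks
    by at most epsilon * ||w_y - w_k||_1, and every rival is allowed to
    reach its own worst case at once (pessimistic, hence a valid bound)."""
    logits = w @ x
    worst = logits.copy()
    for k in range(len(logits)):
        if k != y:
            gap = (w[y] - w[k]) @ x - epsilon * np.abs(w[y] - w[k]).sum()
            worst[k] = logits[y] - gap
    e = np.exp(worst - worst.max())
    return (e / e.sum())[y]

def attack_IV(w, x, y, epsilon, tau_V):
    """Strategy I_V: flag a sample as a member iff the verified worst-case
    confidence V(F(x)_y, B_epsilon) is at least tau_V."""
    return verified_worst_case_confidence(w, x, y, epsilon) >= tau_V
```

The verified bound is at most the benign confidence and equals it at 𝜖 = 0, so ℐV reduces to ℐB when the budget vanishes.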
Madry et al.'s Robust Training: Standard Membership Inference Attacks

| Training method | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) |
|---|---|---|---|---|---|
| Natural | 100% | 95.01% | 0% | 0% | 57.43% |
| Madry et al. ICLR'18 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% |
❑ CIFAR10 Wide ResNet classifiers with an 𝑙∞ perturbation constraint of 8/255 (‖𝛿‖∞ ≤ 8/255)
❑ Random guessing leads to 50% membership inference accuracy.
❑ The robust classifier has a much higher membership inference accuracy.
[Figure annotations: natural classifier: no robustness, lower privacy risk; robust classifier: much better robustness, much higher privacy risk]
Madry et al.’s Robust Training: New Membership Inference Attacks
| Training method | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|
| Natural | 100% | 95.01% | 0% | 0% | 57.43% | 50.86% |
| Madry et al. ICLR'18 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% | 75.67% |
More privacy leakage by exploiting adversarial robustness.
❑ CIFAR10 Wide ResNet classifiers with an 𝑙∞ perturbation constraint of 8/255 (‖𝛿‖∞ ≤ 8/255)
❑ Using model predictions on adversarial examples (ℐA) leads to more privacy leakage.
Privacy Risk with Mixed Robust Training
❑ Only part of the training set is used for robust training; the remaining part is used for natural training.
❑ With more training samples used for robust training, the training set has a larger influence on the robust classifier.
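The per-sample split above can be sketched as follows, assuming per-sample natural and adversarial losses are already available; the function name and the random split are illustrative, not the talk's implementation.

```python
import numpy as np

def mixed_robust_losses(natural_losses, adversarial_losses, adv_train_ratio, seed=0):
    """Per-sample mixed robust training: a random fraction 'adv_train_ratio'
    of the training set contributes its adversarial loss; the remaining
    samples contribute their natural loss."""
    natural_losses = np.asarray(natural_losses, dtype=float)
    adversarial_losses = np.asarray(adversarial_losses, dtype=float)
    n = len(natural_losses)
    rng = np.random.default_rng(seed)
    robust_subset = rng.choice(n, size=round(adv_train_ratio * n), replace=False)
    use_adv = np.zeros(n, dtype=bool)
    use_adv[robust_subset] = True
    return np.where(use_adv, adversarial_losses, natural_losses)
```

A ratio of 0 recovers natural training and a ratio of 1 recovers full robust training, matching the first and last rows of the table that follows.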
Madry et al.’s Robust Training: Privacy Risk with Mixed Robust Training
| Adv-train ratio | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|
| 0 | 100% | 95.01% | 0% | 0% | 57.43% | 50.85% |
| 1/2 | 100% | 87.78% | 75.85% | 43.23% | 67.20% | 66.36% |
| 3/4 | 100% | 86.68% | 88.34% | 45.66% | 71.07% | 72.22% |
| 1 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% | 75.67% |
❑ Mixed robust training on the CIFAR10 dataset with 𝜖 = 8/255: only a fraction of the training set, given by the "adv-train ratio", is used to compute the adversarial loss during training.
❑ With more samples used for robust training, the model is more vulnerable to membership inference attacks.
Privacy Risk with Perturbation Budget
❑ With a larger adversarial perturbation budget, the training set has a larger influence on the robust classifier.
Madry et al.’s Robust Training: Privacy Risk with Perturbation Budget
| Perturbation budget (𝜖) | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|
| 2/255 | 100% | 93.74% | 99.99% | 82.20% | 64.48% | 66.54% |
| 4/255 | 100% | 91.19% | 99.89% | 70.03% | 69.44% | 72.43% |
| 8/255 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% | 75.67% |
❑ Train robust CIFAR10 classifiers with different adversarial perturbation constraints.
❑ The robust model with a larger perturbation budget is more vulnerable to membership inference attacks.
Madry et al.’s Robust Training: Robustness with Model Capacity
❑ Robust classifiers need large model capacity (Madry et al. ICLR’18).
❑ A robust classifier with a larger perturbation budget needs a larger capacity scale.
[Figure: adversarial training accuracy of robust CIFAR10 classifiers (Madry et al. ICLR'18) with different capacity scales (number of output channels in Wide ResNet)]
Madry et al.’s Robust Training: Privacy Risk with Model Capacity
❑ The model with a larger capacity is more vulnerable to membership inference attacks.
[Figure: membership inference attacks on robust CIFAR10 classifiers (Madry et al. ICLR'18) with different capacity scales (number of output channels in Wide ResNet)]
Madry et al.’s Robust Training: More Datasets
❑ Robust classifiers on all datasets have higher membership inference accuracy.
❑ Using adversarial examples’ predictions (ℐA) leads to more privacy leakage.
| Dataset | model architecture | training method | 𝝐 | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|---|---|---|
| Yale Face | VGG | Natural | N.A. | 100% | 98.25% | 4.53% | 2.92% | 55.85% | 54.27% |
| Yale Face | VGG | Madry et al. ICLR'18 | 8/255 | 98.89% | 96.69% | 99.00% | 77.63% | 61.69% | 68.83% |
| Fashion-MNIST | VGG | Natural | N.A. | 100% | 92.18% | 4.35% | 4.14% | 57.12% | 50.95% |
| Fashion-MNIST | VGG | Madry et al. ICLR'18 | 0.1 | 99.93% | 90.88% | 96.91% | 68.06% | 58.32% | 64.49% |
| CIFAR10 | Wide ResNet | Natural | N.A. | 100% | 95.01% | 0% | 0% | 57.43% | 50.86% |
| CIFAR10 | Wide ResNet | Madry et al. ICLR'18 | 8/255 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% | 75.67% |
Empirical Robust Training: Membership Inference Attacks
| Training method | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|
| Natural | 100% | 98.25% | 4.53% | 2.92% | 55.85% | 54.27% |
| Madry et al. ICLR'18 | 99.89% | 96.69% | 99.00% | 77.63% | 61.69% | 68.83% |
| Sinha et al. ICLR'18 | 99.58% | 93.77% | 83.26% | 55.06% | 62.23% | 64.07% |
| Zhang et al. ICML'19 | 99.53% | 93.77% | 99.42% | 83.85% | 58.06% | 65.59% |
❑ Yale Face VGG classifiers with an 𝑙∞ perturbation constraint (𝜖) of 8/255
❑ All three empirically robust models have higher membership inference accuracy.
❑ Using adversarial examples' predictions (ℐA) leads to more privacy leakage.
Verifiable Robust Training: Membership Inference Attacks
| Training method | train acc | test acc | adv-train acc | adv-test acc | ver-train acc | ver-test acc | inference acc (ℐB) | inference acc (ℐA) | inference acc (ℐV) |
|---|---|---|---|---|---|---|---|---|---|
| Natural | 100% | 98.25% | 4.53% | 2.92% | N.A. | N.A. | 55.85% | 54.27% | N.A. |
| Wong et al. NeurIPS'18 | 98.89% | 92.80% | 98.53% | 83.66% | 96.37% | 68.87% | 55.90% | 60.40% | 64.48% |
| Mirman et al. ICML'18 | 99.26% | 83.27% | 85.68% | 50.39% | 43.32% | 18.09% | 65.11% | 65.64% | 67.05% |
| Gowal et al. SECML'18 | 99.16% | 85.80% | 94.42% | 69.68% | 89.58% | 36.77% | 60.45% | 66.28% | 76.05% |
❑ Yale Face VGG classifiers with an 𝑙∞ perturbation constraint (𝜖) of 8/255
❑ All three verifiably robust models have higher membership inference accuracy.
❑ Using verified worst-case predictions (ℐV) leads to more privacy leakage.
Summary
❑ Measure the success of membership inference attacks against robust machine learning models
❑ Robust training methods increase privacy leakage.
❑ We propose new membership inference attacks by exploiting the structural properties of robust models.
❑ The privacy leakage is related to the perturbation budget, model capacity, and the ratio of robust training samples.
❑ Jointly consider privacy and robustness in trustworthy machine learning.
❑ Source code: https://github.com/inspire-group/privacy-vs-robustness