Privacy Risks of Securing Machine Learning Models
against Adversarial Examples
Liwei Song1, Reza Shokri2, Prateek Mittal1
1Princeton University, 2National University of Singapore
Trustworthy Machine Learning
❑ Properties of trustworthy machine learning models
❑ Privacy-preserving with regard to training data
❑ Robust against adversarial attacks, which cause model misclassifications
[Figures: membership inference attacks against machine learning models (Shokri et al. S&P'17); evasion attacks (adversarial examples) against machine learning models (Goodfellow et al. ICLR'15)]
Our Contribution
❑ Jointly consider privacy and robustness for trustworthy machine learning
❑ Investigate membership inference privacy risks of defenses against adversarial examples
❑ Propose new membership inference attacks against robust machine learning models
❑ Adversarial defense methods indeed increase privacy leakage!
Membership Inference
❑ Infer participation in the training set
❑ Attacks: Shokri et al. S&P'17; Yeom et al. CSF'18

Adversarial Examples
❑ Perturb inputs at test time
❑ Defenses: Madry et al. ICLR'18; Wong et al. NeurIPS'18…
Privacy Risks in Machine Learning
❑ Membership inference attacks
❑ Guess whether a sample was used to train the target machine learning model or not
❑ Distinguishability between training data (members) and test data (non-members)
[Diagram: input (𝑥, 𝑦) → target classifier 𝐹 → membership inference adversary, who guesses whether (𝑥, 𝑦) ∈ 𝐷train]
Security Risks in Machine Learning
❑ Evasion attacks (adversarial examples)
❑ Attacks during test time to cause model misclassifications by adding small perturbations
❑ 𝐹(𝑥 + 𝛿) ≠ 𝐹(𝑥), with ‖𝛿‖𝑝 ≤ 𝜖
[Figure: adversarial examples]
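The threat model above can be illustrated with a one-step ℓ∞ attack in the style of FGSM (Goodfellow et al. ICLR'15). This is a minimal numpy sketch on a toy linear classifier; the model, function names, and numbers are assumptions for demonstration, not the talk's setup.

```python
import numpy as np

def fgsm_perturb(x, grad_x, epsilon):
    """One-step l_inf evasion attack: move every coordinate of x by
    epsilon in the loss-increasing direction (the sign of the gradient),
    so ||delta||_inf <= epsilon holds by construction."""
    return x + epsilon * np.sign(grad_x)

# Toy setting (illustrative): logistic loss of a linear classifier w.x,
# with label y in {-1, +1}.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])
y = 1
# Gradient of log(1 + exp(-y * w.x)) with respect to x.
grad_x = -y * w / (1.0 + np.exp(y * w.dot(x)))

x_adv = fgsm_perturb(x, grad_x, epsilon=8 / 255)
```

The perturbation stays inside the 𝜖-ball while increasing the loss; with a large enough budget this flips the prediction, i.e. 𝐹(𝑥 + 𝛿) ≠ 𝐹(𝑥).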
Robustness (against Adversarial Examples) in Machine Learning
❑ Defense methods against adversarial examples
min𝜃 (1/|𝐷train|) Σ(𝑥,𝑦)∈𝐷train [𝛼 · natural loss(𝑥, 𝑦) + (1 − 𝛼) · adversarial loss(𝑥, 𝑦)]
[Figure: natural classifier vs. robust classifier]
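The objective above can be sketched in code, assuming a toy linear softmax model. The inner maximisation behind "adversarial loss" is only approximated here by sampling corners of the ℓ∞ ball, so the sampling scheme and all names are illustrative, not any cited defense.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def natural_loss(w, x, y):
    """Cross-entropy of a linear softmax classifier with weight rows w."""
    return -np.log(softmax(w @ x)[y])

def adversarial_loss(w, x, y, epsilon, n_corners=8, seed=0):
    """Crude stand-in for the inner maximisation: evaluate the loss at a
    few random corners of the l_inf ball and keep the worst one."""
    rng = np.random.default_rng(seed)
    deltas = epsilon * np.sign(rng.standard_normal((n_corners, x.size)))
    return max(natural_loss(w, x + d, y) for d in deltas)

def robust_training_objective(w, data, alpha, epsilon):
    """alpha * natural loss + (1 - alpha) * adversarial loss, averaged
    over the training set, matching the slide's objective."""
    return float(np.mean([
        alpha * natural_loss(w, x, y)
        + (1 - alpha) * adversarial_loss(w, x, y, epsilon)
        for x, y in data
    ]))
```

Setting 𝛼 = 1 recovers natural training; 𝛼 = 0 is pure robust training.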
❑ Empirical defenses: robust training with adversarial examples (Madry et al. ICLR'18; Sinha et al. ICLR'18; Zhang et al. ICML'19)
❑ Verifiable defenses: robust training with a verified upper bound on the adversarial loss (Wong et al. NeurIPS'18; Mirman et al. ICML'18; Gowal et al. SECML'18)
Why Do Robust Models Have Higher Privacy Risks? (I)
❑ Intuition 1: training samples have a larger influence on robust classifiers than on natural classifiers.
[Figure: decision boundaries of a natural classifier vs. a robust classifier]
Quantifying Influence of Training Data
1. Randomly select a training sample.
2. Retrain the model without the selected training sample.
3. Compute the difference between the selected sample's prediction confidence in the retrained model and in the original model.

Training samples have a larger influence on the robust classifier, indicating a higher privacy risk.
[Figure: influence of training data on natural and robust CIFAR10 classifiers (Madry et al. ICLR'18)]
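The three steps above can be mimicked end-to-end with a toy model. A nearest-centroid classifier stands in for the neural network so that "retraining" is instantaneous; the classifier and its confidence score are assumptions for illustration, not the talk's actual measurement.

```python
import numpy as np

def confidence(train_X, train_y, x, y):
    """Prediction confidence of a nearest-centroid 'classifier': softmax
    over negative distances to each class centroid, read at true class y."""
    classes = sorted(set(train_y))
    d = np.array([np.linalg.norm(x - train_X[np.asarray(train_y) == c].mean(axis=0))
                  for c in classes])
    p = np.exp(-d) / np.exp(-d).sum()
    return p[classes.index(y)]

def influence(train_X, train_y, idx):
    """Steps 1-3 from the slide: confidence on the selected sample with it
    in the training set, minus confidence after retraining without it."""
    x, y = train_X[idx], train_y[idx]
    conf_with = confidence(train_X, train_y, x, y)
    keep = np.arange(len(train_y)) != idx
    conf_without = confidence(train_X[keep], np.asarray(train_y)[keep], x, y)
    return conf_with - conf_without
```

A positive influence means the model is measurably more confident on a sample when that sample was part of training, which is exactly the signal a membership inference adversary exploits.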
Why Do Robust Models Have Higher Privacy Risks? (II)
❑ Intuition 2: a larger divergence between model predictions on training data (members) and test data (non-members)
❑ Drop in test accuracy
❑ Lack of robustness generalization
Divergence between Members and Non-Members
Robust CIFAR10 classifier (Madry et al. ICLR'18): 99% train accuracy and 87% test accuracy in the benign setting; 96% train accuracy and 47% test accuracy in the adversarial setting.
Natural (undefended) CIFAR10 classifier: 100% train accuracy and 95% test accuracy in the benign setting; 0% train accuracy and 0% test accuracy in the adversarial setting.
The robust classifier has a larger divergence between the loss distributions over members and non-members, indicating a higher privacy risk.
Membership Inference Attack based on Benign Examples’ Predictions
❑ Strategy ℐB (Yeom et al. CSF'18): compare the prediction confidence on the benign input with a threshold value (𝜏B)
[Diagram: benign input (𝑥, 𝑦) → target robust classifier 𝐹 with perturbation constraint 𝔅𝜖 → membership inference adversary ℐB: member iff 𝐹(𝑥)𝑦 ≥ 𝜏B]
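Strategy ℐB is simple enough to state directly in code. A minimal sketch, assuming the confidence scores 𝐹(𝑥)𝑦 on the true labels have already been collected; the helper names and toy scores are illustrative.

```python
import numpy as np

def attack_IB(confidence_on_true_label, tau_B):
    """Strategy I_B (Yeom et al. CSF'18): flag a sample as a member iff
    the model's confidence F(x)_y on the true label is at least tau_B."""
    return np.asarray(confidence_on_true_label) >= tau_B

def inference_accuracy(member_conf, nonmember_conf, tau_B):
    """Balanced membership inference accuracy of the threshold attack:
    average of the member and non-member per-class accuracies."""
    true_pos = np.mean(attack_IB(member_conf, tau_B))
    true_neg = np.mean(~attack_IB(nonmember_conf, tau_B))
    return (true_pos + true_neg) / 2

# Toy scores: members tend to receive higher confidence than non-members.
members = [0.99, 0.95, 0.90]
nonmembers = [0.50, 0.40, 0.92]
acc = inference_accuracy(members, nonmembers, tau_B=0.90)
```

In practice the adversary tunes 𝜏B (e.g. on shadow models) to maximise this accuracy; 50% corresponds to random guessing.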
New Membership Inference Attack based on Adversarial Examples’ Predictions
❑ Strategy ℐA: compare the prediction confidence on the adversarial input with a threshold value (𝜏A)
[Diagram: benign input (𝑥, 𝑦) and adversarial input 𝑥adv → target robust classifier 𝐹 with perturbation constraint 𝔅𝜖 → membership inference adversary ℐA: member iff 𝐹(𝑥adv)𝑦 ≥ 𝜏A]
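Strategy ℐA can be sketched the same way, with the extra step of crafting 𝑥adv first. Here a single gradient-sign step on a toy linear softmax model stands in for the full attack; the model, step count, and thresholds are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attack_IA(w, x, y, epsilon, tau_A):
    """Strategy I_A: perturb x inside the l_inf ball of radius epsilon to
    lower the true-class confidence, then compare F(x_adv)_y with tau_A.
    A robust model keeps members' confidence high even under perturbation,
    while non-members' confidence collapses."""
    p = softmax(w @ x)
    # Gradient of -log softmax(w @ x)[y] with respect to x.
    grad = (p[:, None] * w).sum(axis=0) - w[y]
    # One attack step, clipped back into the perturbation constraint.
    x_adv = np.clip(x + epsilon * np.sign(grad), x - epsilon, x + epsilon)
    return softmax(w @ x_adv)[y] >= tau_A

w = np.eye(2)  # toy linear softmax classifier with two classes
```

On this toy model, a high-confidence input survives the perturbation and is flagged as a member, while a borderline input does not.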
New Membership Inference Attack based on Verified Worst-Case Predictions
❑ Strategy ℐV (applied only to verifiably robust classifiers): compare the verified worst-case prediction confidence with a threshold value (𝜏V)
[Diagram: benign input (𝑥, 𝑦) and verification method 𝒱 → target robust classifier 𝐹 with perturbation constraint 𝔅𝜖 → membership inference adversary ℐV: member iff 𝒱(𝐹(𝑥)𝑦, 𝔅𝜖) ≥ 𝜏V]
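For a linear model the verification 𝒱 can be carried out in closed form, which makes strategy ℐV easy to sketch. This is a hypothetical illustration, not any of the cited verification methods: over the ℓ∞ ball, the logit gap to each rival class 𝑘 shrinks by at most 𝜖 · ‖𝑤𝑦 − 𝑤𝑘‖₁.

```python
import numpy as np

def verified_worst_case_confidence(w, x, y, epsilon):
    """Sound lower bound on the class-y softmax confidence of a linear
    model over the l_inf ball: each rival logit gap (w_y - w_k).x shrinks
    by at most epsilon * ||w_y - w_k||_1, and every rival is allowed to
    reach its own worst case at once (pessimistic, hence a valid bound)."""
    logits = w @ x
    worst = logits.copy()
    for k in range(len(logits)):
        if k != y:
            gap = (w[y] - w[k]) @ x - epsilon * np.abs(w[y] - w[k]).sum()
            worst[k] = logits[y] - gap
    e = np.exp(worst - worst.max())
    return (e / e.sum())[y]

def attack_IV(w, x, y, epsilon, tau_V):
    """Strategy I_V: flag a sample as a member iff the verified worst-case
    confidence V(F(x)_y, B_epsilon) is at least tau_V."""
    return verified_worst_case_confidence(w, x, y, epsilon) >= tau_V
```

The verified bound is at most the benign confidence and equals it at 𝜖 = 0, so ℐV reduces to ℐB when the budget vanishes.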
Madry et al.'s Robust Training: Standard Membership Inference Attacks

| Training method | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) |
|---|---|---|---|---|---|
| Natural | 100% | 95.01% | 0% | 0% | 57.43% |
| Madry et al. ICLR'18 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% |
❑ CIFAR10 Wide ResNet classifiers with an 𝑙∞ perturbation constraint of 8/255 (‖𝛿‖∞ ≤ 8/255)
❑ Random guessing leads to 50% membership inference accuracy.
❑ The robust classifier has a much higher membership inference accuracy.
[Figure annotations: natural classifier: no robustness, lower privacy risk; robust classifier: much better robustness, much higher privacy risk]
Madry et al.’s Robust Training: New Membership Inference Attacks
| Training method | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|
| Natural | 100% | 95.01% | 0% | 0% | 57.43% | 50.86% |
| Madry et al. ICLR'18 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% | 75.67% |
More privacy leakage by exploiting adversarial robustness.
❑ CIFAR10 Wide ResNet classifiers with an 𝑙∞ perturbation constraint of 8/255 (‖𝛿‖∞ ≤ 8/255)
❑ Using model predictions on adversarial examples (ℐA) leads to more privacy leakage.
Privacy Risk with Mixed Robust Training
❑ Only part of the training set is used for robust training; the remaining part is used for natural training.
❑ With more training samples used for robust training, the training set has a larger influence on the robust classifier.
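The per-sample split above can be sketched as follows, assuming per-sample natural and adversarial losses are already available; the function name and the random split are illustrative, not the talk's implementation.

```python
import numpy as np

def mixed_robust_losses(natural_losses, adversarial_losses, adv_train_ratio, seed=0):
    """Per-sample mixed robust training: a random fraction 'adv_train_ratio'
    of the training set contributes its adversarial loss; the remaining
    samples contribute their natural loss."""
    natural_losses = np.asarray(natural_losses, dtype=float)
    adversarial_losses = np.asarray(adversarial_losses, dtype=float)
    n = len(natural_losses)
    rng = np.random.default_rng(seed)
    robust_subset = rng.choice(n, size=round(adv_train_ratio * n), replace=False)
    use_adv = np.zeros(n, dtype=bool)
    use_adv[robust_subset] = True
    return np.where(use_adv, adversarial_losses, natural_losses)
```

A ratio of 0 recovers natural training and a ratio of 1 recovers full robust training, matching the first and last rows of the table that follows.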
Madry et al.’s Robust Training: Privacy Risk with Mixed Robust Training
| Adv-train ratio | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|
| 0 | 100% | 95.01% | 0% | 0% | 57.43% | 50.85% |
| 1/2 | 100% | 87.78% | 75.85% | 43.23% | 67.20% | 66.36% |
| 3/4 | 100% | 86.68% | 88.34% | 45.66% | 71.07% | 72.22% |
| 1 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% | 75.67% |
❑ Mixed robust training on the CIFAR10 dataset with 𝜖 = 8/255: only a fraction of the training set, given by the "adv-train ratio", is used to compute the adversarial loss during training.
❑ With more samples used for robust training, the model is more vulnerable to membership inference attacks.
Privacy Risk with Perturbation Budget
❑ With a larger adversarial perturbation budget, the training set has a larger influence on the robust classifier.
Madry et al.’s Robust Training: Privacy Risk with Perturbation Budget
| Perturbation budget (𝜖) | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|
| 2/255 | 100% | 93.74% | 99.99% | 82.20% | 64.48% | 66.54% |
| 4/255 | 100% | 91.19% | 99.89% | 70.03% | 69.44% | 72.43% |
| 8/255 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% | 75.67% |
❑ Train robust CIFAR10 classifiers with different adversarial perturbation constraints.
❑ The robust model with a larger perturbation budget is more vulnerable to membership inference attacks.
Madry et al.’s Robust Training: Robustness with Model Capacity
❑ Robust classifiers need large model capacity (Madry et al. ICLR’18).
❑ A robust classifier with a larger perturbation budget needs a larger capacity scale.
[Figure: adversarial training accuracy of robust CIFAR10 classifiers (Madry et al. ICLR'18) with different capacity scales (number of output channels in Wide ResNet)]
Madry et al.’s Robust Training: Privacy Risk with Model Capacity
❑ The model with a larger capacity is more vulnerable to membership inference attacks.
[Figure: membership inference attacks on robust CIFAR10 classifiers (Madry et al. ICLR'18) with different capacity scales (number of output channels in Wide ResNet)]
Madry et al.’s Robust Training: More Datasets
❑ Robust classifiers on all datasets have higher membership inference accuracy.
❑ Using adversarial examples’ predictions (ℐA) leads to more privacy leakage.
| Dataset | model architecture | training method | 𝝐 | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|---|---|---|
| Yale Face | VGG | Natural | N.A. | 100% | 98.25% | 4.53% | 2.92% | 55.85% | 54.27% |
| Yale Face | VGG | Madry et al. ICLR'18 | 8/255 | 98.89% | 96.69% | 99.00% | 77.63% | 61.69% | 68.83% |
| Fashion-MNIST | VGG | Natural | N.A. | 100% | 92.18% | 4.35% | 4.14% | 57.12% | 50.95% |
| Fashion-MNIST | VGG | Madry et al. ICLR'18 | 0.1 | 99.93% | 90.88% | 96.91% | 68.06% | 58.32% | 64.49% |
| CIFAR10 | Wide ResNet | Natural | N.A. | 100% | 95.01% | 0% | 0% | 57.43% | 50.86% |
| CIFAR10 | Wide ResNet | Madry et al. ICLR'18 | 8/255 | 99.99% | 87.25% | 96.08% | 46.61% | 74.89% | 75.67% |
Empirical Robust Training: Membership Inference Attacks
| Training method | train acc | test acc | adv-train acc | adv-test acc | inference acc (ℐB) | inference acc (ℐA) |
|---|---|---|---|---|---|---|
| Natural | 100% | 98.25% | 4.53% | 2.92% | 55.85% | 54.27% |
| Madry et al. ICLR'18 | 99.89% | 96.69% | 99.00% | 77.63% | 61.69% | 68.83% |
| Sinha et al. ICLR'18 | 99.58% | 93.77% | 83.26% | 55.06% | 62.23% | 64.07% |
| Zhang et al. ICML'19 | 99.53% | 93.77% | 99.42% | 83.85% | 58.06% | 65.59% |
❑ Yale Face VGG classifiers with an 𝑙∞ perturbation constraint (𝜖) of 8/255
❑ All three empirically robust models have higher membership inference accuracy.
❑ Using adversarial examples' predictions (ℐA) leads to more privacy leakage.
Verifiable Robust Training: Membership Inference Attacks
| Training method | train acc | test acc | adv-train acc | adv-test acc | ver-train acc | ver-test acc | inference acc (ℐB) | inference acc (ℐA) | inference acc (ℐV) |
|---|---|---|---|---|---|---|---|---|---|
| Natural | 100% | 98.25% | 4.53% | 2.92% | N.A. | N.A. | 55.85% | 54.27% | N.A. |
| Wong et al. NeurIPS'18 | 98.89% | 92.80% | 98.53% | 83.66% | 96.37% | 68.87% | 55.90% | 60.40% | 64.48% |
| Mirman et al. ICML'18 | 99.26% | 83.27% | 85.68% | 50.39% | 43.32% | 18.09% | 65.11% | 65.64% | 67.05% |
| Gowal et al. SECML'18 | 99.16% | 85.80% | 94.42% | 69.68% | 89.58% | 36.77% | 60.45% | 66.28% | 76.05% |
❑ Yale Face VGG classifiers with an 𝑙∞ perturbation constraint (𝜖) of 8/255
❑ All three verifiably robust models have higher membership inference accuracy.
❑ Using verified worst-case predictions (ℐV) leads to more privacy leakage.
Summary
❑ Measure the success of membership inference attacks against robust machine learning models
❑ Robust training methods increase privacy leakage.
❑ We propose new membership inference attacks by exploiting the structural properties of robust models.
❑ The privacy leakage is related to the perturbation budget, model capacity, and the ratio of robust training samples.
❑ Jointly consider privacy and robustness in trustworthy machine learning.
❑ Source code: https://github.com/inspire-group/privacy-vs-robustness