Security and Privacy in Machine Learning
Nicolas Papernot
Research done at Pennsylvania State University & Google Brain
December 2017 - Tutorial at IEEE WIFS, Rennes, France
@NicolasPapernot
Thank you to my collaborators
Patrick McDaniel (Penn State), Ian Goodfellow (Google Brain), Martín Abadi (Google Brain), Pieter Abbeel (Berkeley), Michael Backes (CISPA), Dan Boneh (Stanford), Z. Berkay Celik (Penn State), Yan Duan (OpenAI), Úlfar Erlingsson (Google Brain), Matt Fredrikson (CMU), Kathrin Grosse (CISPA), Sandy Huang (Berkeley), Somesh Jha (U of Wisconsin), Alexey Kurakin (Google Brain), Praveen Manoharan (CISPA), Ilya Mironov (Google Brain), Ananth Raghunathan (Google Brain), Arunesh Sinha (U of Michigan), Shuang Song (UCSD), Ananthram Swami (US ARL), Kunal Talwar (Google Brain), Florian Tramèr (Stanford), Michael Wellman (U of Michigan), Xi Wu (Google)
2
The attack surface
3
4
Machine Learning Classifier
x → f(x,θ) = [p(0|x,θ), p(1|x,θ), p(2|x,θ), …, p(7|x,θ), p(8|x,θ), p(9|x,θ)]
Example output: [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]
Classifier: map inputs to one class among a predefined set
5
Machine Learning Classifier
[Figure: handwritten digit images paired with their one-hot label vectors, e.g., an image of “1” ↔ [0 1 0 0 0 0 0 0 0 0], an image of “9” ↔ [0 0 0 0 0 0 0 0 0 1]]
Learning: find internal classifier parameters θ that minimize a cost/loss function (~model error)
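To make the setup above concrete, here is a minimal, hedged sketch of a softmax classifier trained by minimizing cross-entropy with gradient descent on toy data; it only illustrates the idea, not the models used in the tutorial.

```python
import numpy as np

# Minimal sketch: softmax classifier f(x, theta) trained by minimizing a
# cross-entropy loss with gradient descent. Toy random data; illustration only.
rng = np.random.default_rng(0)
n, d, k = 200, 10, 3                      # samples, features, classes
X = rng.normal(size=(n, d))
y = rng.integers(0, k, size=n)            # integer labels
Y = np.eye(k)[y]                          # one-hot labels, as on the slide

W = np.zeros((d, k))                      # parameters theta
for step in range(500):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    loss = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
    grad = X.T @ (probs - Y) / n          # gradient of the loss w.r.t. W
    W -= 0.5 * grad                       # gradient descent step

print("final training loss:", loss)
print("f(x, theta) for one input:", probs[0])   # probability vector over classes
```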
Adversarial goals
6
Confidentiality and Privacy
● Confidentiality of the model itself (e.g., intellectual property)
○ Tramèr et al. Stealing Machine Learning Models via Prediction APIs
● Privacy of the training or test data (e.g., medical records)
○ Fredrikson et al. Model Inversion Attacks that Exploit Confidence Information
○ Shokri et al. Membership Inference Attacks Against Machine Learning Models
○ Chaudhuri et al. Privacy-preserving logistic regression
○ Song et al. Stochastic gradient descent with differentially private updates
○ Abadi et al. Deep learning with differential privacy
○ Papernot et al. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
Integrity and Availability
● Integrity of the predictions (wrt expected outcome)
● Availability of the system deploying machine learning
Training time
○ Kearns et al. Learning in the presence of malicious errors
○ Biggio et al. Support Vector Machines Under Adversarial Label Noise
○ Kloft et al. Online anomaly detection under adversarial impact
Inference time
● White-box attacker:
○ Szegedy et al. Intriguing Properties of Neural Networks
○ Biggio et al. Evasion Attacks against Machine Learning at Test Time
○ Goodfellow et al. Explaining and Harnessing Adversarial Examples
● Black-box attacker:
○ Dalvi et al. Adversarial Learning
○ Szegedy et al. Intriguing Properties of Neural Networks
○ Xu et al. Automatically evading classifiers: A Case Study on PDF Malware Classifiers
Adversarial capabilities for integrity attacks
7
Part I
Security in machine learning
9
Adversarial examples (white-box attacks)
10
Generalization error
11
Legend: task decision boundary; training points for class 1; training points for class 2
Generalization error
12
Legend: task decision boundary; model decision boundary; training points for class 1; training points for class 2
Generalization error
13
Legend: task decision boundary; model decision boundary; training points for class 1; training points for class 2; testing points for class 1
Generalization error
14
Legend: task decision boundary; model decision boundary; training points for class 1; training points for class 2; adversarial examples for class 1; testing points for class 1
Adversarial examples
15
Optimization with L-BFGS:
[Figure: legitimate input x + perturbation r = adversarial example x*]
Szegedy et al. Intriguing Properties of Neural Networks
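A hedged sketch of this formulation: find a small perturbation r such that x + r is classified as a chosen target, by minimizing c·||r||² plus the model's loss toward the target under box constraints. The helper `target_loss_and_grad` is hypothetical (it must return the model's loss toward the target class and its gradient with respect to the input); this is an illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def lbfgs_adversarial_example(x, target_loss_and_grad, c=0.1, box=(0.0, 1.0)):
    """Sketch of the L-BFGS adversarial example optimization (Szegedy et al.)."""
    x = x.ravel().astype(np.float64)

    def objective(x_adv):
        loss, grad = target_loss_and_grad(x_adv)   # hypothetical user helper
        r = x_adv - x
        # c * ||r||^2 keeps the perturbation small; the loss pushes toward the target.
        return c * np.dot(r, r) + loss, 2.0 * c * r + grad

    bounds = [box] * x.size                        # keep pixels in a valid range
    res = minimize(objective, x0=x.copy(), jac=True,
                   method="L-BFGS-B", bounds=bounds)
    return res.x                                   # adversarial example x* = x + r
```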
Jacobian-based Saliency Map Approach (JSMA)
16
Papernot et al. The Limitations of Deep Learning in Adversarial Settings
17
Jacobian-Based Iterative Approach: source-target misclassification
Papernot et al. The Limitations of Deep Learning in Adversarial Settings
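A hedged, simplified sketch of the saliency-map idea behind JSMA (one feature per iteration). The helpers `predict(x)` and `jacobian(x)` are hypothetical placeholders: `jacobian` must return the matrix of derivatives of every class score with respect to every input feature.

```python
import numpy as np

def jsma_sketch(x, target, predict, jacobian, theta=1.0, max_iters=50):
    """Greedy saliency-map attack sketch (source-target misclassification)."""
    x_adv = x.copy()
    for _ in range(max_iters):
        if predict(x_adv) == target:
            break
        J = jacobian(x_adv)                           # shape (num_classes, num_features)
        target_grad = J[target]                       # raises the target score
        other_grad = J.sum(axis=0) - J[target]        # raises the other scores
        # A feature is salient if increasing it raises the target class
        # while lowering all other classes.
        saliency = np.where((target_grad > 0) & (other_grad < 0),
                            target_grad * np.abs(other_grad), 0.0)
        i = int(np.argmax(saliency))
        if saliency[i] == 0:
            break                                     # no useful feature left
        x_adv[i] = np.clip(x_adv[i] + theta, 0.0, 1.0)
    return x_adv
```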
Attacking other machine learning techniques
18
● Logistic Regression, Nearest Neighbors (with soft-min): attacks for neural networks adapt
● Support Vector Machines: perturb orthogonally to the separating hyperplane (see the sketch below)
● Decision Trees: greedy search algorithm
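A minimal sketch of the hyperplane case, assuming a binary linear model sign(w·x + b) such as an SVM or logistic regression; stepping along ±w/||w|| is the shortest path across the decision boundary.

```python
import numpy as np

def linear_evasion(x, w, b, overshoot=0.05):
    """Perturb orthogonally to the hyperplane w.x + b = 0, just past the boundary."""
    w = np.asarray(w, dtype=float)
    margin = (np.dot(w, x) + b) / np.linalg.norm(w)   # signed distance to the boundary
    direction = -np.sign(margin) * w / np.linalg.norm(w)
    return x + (abs(margin) + overshoot) * direction

# Example: a point at distance 2 from the boundary gets pushed across it.
x = np.array([2.0, 0.0]); w = np.array([1.0, 0.0]); b = 0.0
print(linear_evasion(x, w, b))   # roughly [-0.05, 0.]
```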
Evading a Neural Network Malware Classifier
DREBIN dataset of Android applications
Add constraints to JSMA approach:
- only add features: keep malware behavior
- only features from manifest: easy to modify
“Most accurate” neural network:
- 98% accuracy, with 9.7% FP and 1.3% FN
- evaded with a 63.08% success rate
19Grosse et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification
P[X = Malware] = 0.90, P[X = Benign] = 0.10 → P[X* = Malware] = 0.10, P[X* = Benign] = 0.90
Supervised vs. reinforcement learning
20
Supervised learning Reinforcement learning
Model inputs: observation (e.g., traffic sign, music, email) vs. environment & reward function
Model outputs: class (e.g., stop/yield, jazz/classical, spam/legitimate) vs. action
Training “goal” (i.e., cost/loss): minimize class prediction error over pairs of (inputs, outputs) vs. maximize reward by exploring the environment and taking actions
Adversarial attacks on neural network policies
21
Huang et al. Adversarial Attacks on Neural Network Policies
Adversarial examples (black-box attacks)
22
Threat model of a black-box attack
Adversarial capabilities: no access to the training data, model architecture, model parameters, or model scores; only (limited) oracle access to labels.
Adversarial goal: force an ML model remotely accessible through an API to misclassify.
23
Example
Our approach to black-box attacks
24
Alleviate lack of knowledge about model
Alleviate lack of training data
Adversarial example transferability
Adversarial examples have a transferability property:
samples crafted to mislead a model A are likely to mislead a model B
This property comes in several variants:
● Intra-technique transferability:
○ cross-model transferability
○ cross-training-set transferability
● Cross-technique transferability
25
Szegedy et al. Intriguing properties of neural networks
Adversarial example transferability
Intra-technique transferability: cross training data
28
[PMG16b] Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
[Figure: intra-technique transferability matrix, with transferability rates ranging from strong to intermediate to weak]
Cross-technique transferability
30
Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Large adversarial subspaces enable transferability
32
Tramèr et al. The Space of Transferable Adversarial Examples
On average, 44 orthogonal adversarial directions can be found, of which about 25 transfer to the other model.
Our approach to black-box attacks
33
Adversarial example transferability from a substitute model to the target model
Alleviate lack of knowledge about model
Alleviate lack of training data
Attacking remotely hosted black-box models
34
Remote ML sys
“no truck sign”, “STOP sign”
“STOP sign”
(1) The adversary queries remote ML system for labels on inputs of its choice.
35
Remote ML sys
Local substitute
“no truck sign”, “STOP sign”
“STOP sign”
(2) The adversary uses this labeled data to train a local substitute for the remote system.
Attacking remotely hosted black-box models
36
Remote ML sys
Local substitute
“no truck sign”, “STOP sign”
(3) The adversary selects new synthetic inputs for queries to the remote ML system based on the local substitute’s output surface sensitivity to input variations.
Attacking remotely hosted black-box models
37
Remote ML sys
Local substitute
“yield sign”
(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system because of transferability.
Attacking remotely hosted black-box models
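A hedged end-to-end sketch of steps (1)-(4). All helpers are hypothetical placeholders: `query_remote(X)` returns the oracle's labels, `train_substitute(X, y)` fits the local model, and `loss_gradient(model, X, y)` returns the gradient of that local model's loss with respect to its inputs. The augmentation and final FGSM step only illustrate the structure of the attack.

```python
import numpy as np

def black_box_attack(X_seed, query_remote, train_substitute, loss_gradient,
                     rho=4, lam=0.1, eps=0.3):
    X = X_seed.copy()
    for _ in range(rho):                              # substitute training rounds
        y = query_remote(X)                           # (1) label inputs via the oracle
        substitute = train_substitute(X, y)           # (2) fit a local substitute
        # (3) Jacobian-based dataset augmentation: add points in directions the
        # substitute is most sensitive to, to better approximate the boundary.
        grads = loss_gradient(substitute, X, y)
        X = np.concatenate([X, X + lam * np.sign(grads)], axis=0)
    # (4) Craft adversarial examples on the substitute (FGSM here) and rely on
    # transferability to fool the remote model.
    y_seed = query_remote(X_seed)
    X_adv = X_seed + eps * np.sign(loss_gradient(substitute, X_seed, y_seed))
    return np.clip(X_adv, 0.0, 1.0)
```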
Our approach to black-box attacks
38
Adversarial example transferability from a substitute model to the target model
Synthetic data generation
+
Alleviate lack of knowledge about model
Alleviate lack of training data
Results on real-world remote systems
39
All remote classifiers are trained on the MNIST dataset (10 classes, 60,000 training samples)
Remote platform | ML technique | Number of queries | Adversarial examples misclassified (after querying)
— | Deep Learning | 6,400 | 84.24%
— | Logistic Regression | 800 | 96.19%
— | Unknown | 2,000 | 97.72%
[PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
Defending against adversarial examples
40
Distillation and the softmax layer
41
Hinton et al. Distilling the Knowledge in a Neural Network
During training, increase the temperature T in the softmax layer:
The temperature T controls the transformation of logits z(x) into probabilities f(x):
● Smaller values of T:
○ probabilities are more discrete (peaked)
○ example: [0, 0, 0.99, 0.01, 0]
● Higher values of T:
○ probabilities are closer to uniform
○ example: [0.15, 0.15, 0.4, 0.15, 0.15]
(see the sketch below)
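A small numeric illustration of the tempered softmax, f_i(x) = exp(z_i/T) / Σ_j exp(z_j/T); the function name is mine, the behavior matches the two bullets above.

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

z = np.array([5.0, 2.0, 9.0, 4.0, 1.0])   # logits z(x)
print(softmax_with_temperature(z, T=1.0))    # peaked, close to one-hot
print(softmax_with_temperature(z, T=20.0))   # much closer to uniform
```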
Defensive distillation
42
[PMW16] Papernot et al. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Defensive distillation against non-adaptive attacks
43
Fast Gradient Sign Method / Jacobian-based Saliency Map Approach
[PMW16] Papernot et al. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Role of temperature during defensive distillation
44
We measured the mean gradient amplitudes on the 10,000 test set inputs of the CIFAR-10 dataset.
[PMW16] Papernot et al. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Defensive distillation (and adversarial training, for different reasons) decreases the magnitude of gradients around the training and test data.
For a non-adaptive adversary, this makes it harder to find the model’s error subspace, but does not correct the model predictions in this subspace.
If the adversary adapts, it can use alternative ways to find this subspace: e.g., the gradients of another model.
Hence, we can evade defensive distillation and adversarial training with the black-box attack if the substitute is undefended.
Gradient masking in defenses and black-box attacks
45
Defended model
Undefended substitute
[PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
Adversarial training
46
Small when prediction is correct on legitimate input
Adversarial training
47
Small when prediction is correct on legitimate input
Small when prediction is correct on adversarial input
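A hedged sketch of the two-term objective annotated above, following the Goodfellow et al. formulation: a weighted sum of the loss on the legitimate input and the loss on an FGSM perturbation of it. The helpers `loss(x, y)` and `loss_grad_x(x, y)` are hypothetical placeholders for the model being trained.

```python
import numpy as np

def adversarial_training_loss(x, y, loss, loss_grad_x, eps=0.3, alpha=0.5):
    x_adv = x + eps * np.sign(loss_grad_x(x, y))     # FGSM adversarial input
    # First term: small when the prediction is correct on the legitimate input.
    # Second term: small when the prediction is correct on the adversarial input.
    return alpha * loss(x, y) + (1.0 - alpha) * loss(x_adv, y)
```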
Adversarial training
Intuition: inject adversarial examples, with their correct labels, during training
Goal: improve model generalization outside of training manifold
48
Figure by Ian Goodfellow (x-axis: training time in epochs)
Gradient masking in adversarially trained models
52–54
Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr.
[Figure: around an input, stepping along the adversarially trained model’s gradient direction yields a non-adversarial example, while stepping along another model’s gradient direction reaches an adversarial example.]
Evading gradient masking (1)
55
Threat model: white-box adversary
Attack:
(1) Random step (of norm alpha)
(2) FGSM step (of norm eps - alpha)
Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr.
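A hedged sketch of this white-box attack (the R+FGSM idea of Tramèr et al.): a small random step escapes the region where gradients are masked, followed by a gradient-sign step. The helper `loss_grad_x(x, y)` is a hypothetical placeholder returning the defended model's loss gradient with respect to the input.

```python
import numpy as np

def r_plus_fgsm(x, y, loss_grad_x, eps=0.3, alpha=0.15,
                rng=np.random.default_rng()):
    x_rand = x + alpha * np.sign(rng.normal(size=x.shape))            # (1) random step
    x_adv = x_rand + (eps - alpha) * np.sign(loss_grad_x(x_rand, y))  # (2) FGSM step
    return np.clip(x_adv, 0.0, 1.0)
```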
Threat model: black-box attack
Attack:
(1) Learn substitute for defended model
(2) Find adversarial direction using substitute
Evading gradient masking (2)
56
Papernot et al. Practical Black-Box Attacks against Machine Learning
Papernot et al. Towards the Science of Security and Privacy in Machine Learning
Complete post on gradient masking at cleverhans.io (joint blog with Ian Goodfellow)
Adversarial training
57
Small when prediction is correct on legitimate input
Small when prediction is correct on adversarial input (but the gradient used to generate that input is not adversarial)
Ensemble adversarial training
Intuition: during the training of Model A, present adversarial gradients from multiple models (Model A itself and pre-trained Models B, C, D); at inference, only Model A is used.
58–65
Experimental results on MNIST
66
(from holdout)
Experimental results on ImageNet
67
(from holdout)
Benchmarking progress in the adversarial ML community
68
69
70
Growing community
1.3K+ stars, 340+ forks
40+ contributors
71
Adversarial examples represent worst-case distribution drifts
[DDS04] Dalvi et al. Adversarial Classification (KDD)
72
Adversarial examples are a tangible instance of hypothetical AI safety problems
Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg
Part II
Privacy in machine learning
73
Types of adversaries and our threat model
74
In our work, the threat model assumes:
- Adversary can make a potentially unbounded number of queries
- Adversary has access to model internals
Model inspection (white-box adversary)
Zhang et al. (2017) Understanding DL requires rethinking generalization
Model querying (black-box adversary)
Shokri et al. (2016) Membership Inference Attacks against ML Models
Fredrikson et al. (2015) Model Inversion Attacks
A definition of privacy
75
[Figure: two neighbouring databases (differing in a single record) are each fed to the same randomized algorithm; the resulting sequences of answers (Answer 1, Answer 2, …, Answer n) should be nearly indistinguishable.]
Our design goals
76
Problem: preserve privacy of training data when learning classifiers
Goals:
● Differential privacy protection guarantees
● Intuitive privacy protection guarantees
● Generic* (independent of the learning algorithm)
*This is a key distinction from previous work, such as:
Pathak et al. (2011) Privacy preserving probabilistic inference with hidden Markov models
Jagannathan et al. (2013) A semi-supervised learning approach to differential privacy
Shokri et al. (2015) Privacy-preserving Deep Learning
Abadi et al. (2016) Deep Learning with Differential Privacy
Hamm et al. (2016) Learning privately from multiparty data
The PATE approach
77
Teacher ensemble
78
Partition 1
Partition 2
Partition n
Partition 3
...
Teacher 1
Teacher 2
Teacher n
Teacher 3
...
Training
Sensitive Data
Data flow
Aggregation
79
Count votes → Take maximum
Intuitive privacy analysis
80
If most teachers agree on the label, it does not depend on specific partitions, so the privacy cost is small.
If two classes have close vote counts, the disagreement may reveal private information.
Noisy aggregation
81
Count votes → Add Laplacian noise → Take maximum
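A runnable sketch of this noisy aggregation step: count the teachers' votes per class, add Laplacian noise with scale 1/γ (γ is the privacy parameter), and return the noisy argmax. The function name and example numbers are illustrative.

```python
import numpy as np

def noisy_aggregate(teacher_votes, num_classes, gamma=0.05,
                    rng=np.random.default_rng()):
    counts = np.bincount(teacher_votes, minlength=num_classes)   # count votes
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / gamma,
                                 size=num_classes)               # add Laplacian noise
    return int(np.argmax(noisy))                                 # take maximum

# Example: 250 teachers voting over 10 classes, most agreeing on class 3.
votes = np.array([3] * 180 + [7] * 40 + [1] * 30)
print(noisy_aggregate(votes, num_classes=10))   # almost always 3
```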
Teacher ensemble
82
Partition 1
Partition 2
Partition n
Partition 3
...
Teacher 1
Teacher 2
Teacher n
Teacher 3
...
Aggregated Teacher
Training
Sensitive Data
Data flow
Student training
83
Partition 1
Partition 2
Partition n
Partition 3
...
Teacher 1
Teacher 2
Teacher n
Teacher 3
...
Aggregated Teacher Student
Training
Available to the adversary / Not available to the adversary
Sensitive Data
Public Data
Inference Data flow
Queries
Why train an additional “student” model?
84
The aggregated teacher violates our threat model:
1. Each prediction increases total privacy loss. Privacy budgets create a tension between the accuracy and number of predictions.
2. Inspection of internals may reveal private data. Privacy guarantees should hold in the face of white-box adversaries.
Deployment
86
Inference
Available to the adversary
Queries
Student
Differential privacy: A randomized algorithm M satisfies (ε, δ)-differential privacy if for all pairs of neighbouring datasets (d, d’), for all subsets S of outputs: Pr[M(d) ∈ S] ≤ e^ε · Pr[M(d’) ∈ S] + δ
Application of the Moments Accountant technique (Abadi et al, 2016)
Strong quorum ⟹ Small privacy cost
Bound is data-dependent: computed using the empirical quorum
Differential privacy analysis
87
PATE-G:the generative variant of PATE
88
Generative Adversarial Networks (GANs)
89
2 competing models trying to game each other:
Goodfellow et al. (2014) Generative Adversarial Networks
Generator:
Input: noise sampled from random distribution
Output: synthetic input close to the expected training distribution
Discriminator
Input: output from generator OR example from real training distribution
Output: in distribution OR fake
[Figure: Gaussian sample → generator → fake sample; a real sample or the fake sample → discriminator → P(real), P(fake)]
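A hedged sketch of one round of this two-player game. The `generator`, `discriminator`, and `*_update` helpers are hypothetical placeholders for whatever models and optimizer are used; only the structure of the objectives is shown.

```python
import numpy as np

def gan_training_step(real_batch, generator, discriminator,
                      generator_update, discriminator_update, noise_dim=100,
                      rng=np.random.default_rng()):
    z = rng.normal(size=(len(real_batch), noise_dim))   # noise from a random distribution
    fake_batch = generator(z)                           # synthetic inputs

    # Discriminator objective: push P(real) up on real data, down on generated data.
    d_loss = -(np.log(discriminator(real_batch)).mean()
               + np.log(1.0 - discriminator(fake_batch)).mean())
    discriminator_update(d_loss)                         # placeholder optimizer step

    # Generator objective: fool the discriminator into assigning P(real) to fakes.
    g_loss = -np.log(discriminator(generator(z))).mean()
    generator_update(g_loss)                             # placeholder optimizer step
    return d_loss, g_loss
```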
GANs for semi-supervised learning
90
2 competing models trying to game each other:
Generator:
Input: noise sampled from random distribution
Output: synthetic input close to the expected training distribution
Discriminator
Input: output from generator OR example from real training distribution
Output: in distribution (which class) OR fake
[Figure: as above, but the discriminator now outputs P(real 0), P(real 1), …, P(real n), and P(fake)]
Salimans et al. (2016) Improved techniques for training GANs
Student training in PATE-G
91
Partition 1
Partition 2
Partition n
Partition 3
...
Teacher 1
Teacher 2
Teacher n
Teacher 3
...
Aggregated Teacher
Queries
Training
Available to the adversary / Not available to the adversary
Sensitive Data
Inference Data flow
Generator
Discriminator
Public Data
Deployment of PATE-G
92
Inference
Available to the adversary
Queries
Discriminator
Experimental results
93
Experimental setup
94
Dataset Teacher Model Student Model
MNIST Convolutional Neural Network Generative Adversarial Networks
SVHN Convolutional Neural Network Generative Adversarial Networks
UCI Adult Random Forest Random Forest
UCI Diabetes Random Forest Random Forest
/ /models/tree/master/differential_privacy/multiple_teachers
Aggregated teacher accuracy
95
Trade-off between student accuracy and privacy
96
Trade-off between student accuracy and privacy
97
UCI Diabetes: ε = 1.44, δ = 10⁻⁵; student accuracy 93.94% (non-private baseline: 93.81%)
Synergy between privacy and generalization
98
Thank you for listening! Get involved at:
99
www.cleverhans.io
@NicolasPapernot
github.com/tensorflow/cleverhans