[Slide 1]
Federated Learning: Privacy-Preserving Collaborative Machine Learning without Centralized Training Data
Jakub Konečný, presenting the work of many
Trends in Optimization Seminar, University of Washington in Seattle
Jan 30, 2018
[Slide 2]
This Talk
1. Why Federated Learning?
2. FL vs. other distributed optimization
3. FL Algorithms
4. Minimizing Communication
5. Secure Aggregation
6. Differential Privacy
[Slide 3]
Federated Learning: Our Goal
Imbue mobile devices with state-of-the-art machine learning systems without centralizing data and with privacy by default.
[Slide 4]
Federated Learning: Our Goal
Imbue mobile devices with state-of-the-art machine learning systems without centralizing data and with privacy by default.
A very personal computer:
2015: 79% away from phone ≤ 2 hours/day; 63% away from phone ≤ 1 hour/day; 25% can't remember being away at all. [1]
2013: 72% of users within 5 feet of phone most of the time. [2]
Plethora of sensors; innumerable digital interactions.
[1] 2015 Always Connected Research Report, IDC and Facebook.
[2] 2013 Mobile Consumer Habits Study, Jumio and Harris Interactive.
[Slide 5]
Federated Learning: Our Goal
Imbue mobile devices with state-of-the-art machine learning systems without centralizing data and with privacy by default.
Deep Learning: non-convex; millions of parameters; complex structure (e.g. LSTMs).
[Slide 6]
Cloud-centric ML for Mobile
[Slide 7]
Current Model Parameters
The model lives in the cloud.
[Slide 8]
training data
We train models in the cloud.
[Slide 9]
Mobile Device
Current Model Parameters
[Slide 10]
request
prediction
Make predictions in the cloud.
[Slide 11]
training data
request
prediction
Gather training data in the cloud.
[Slide 12]
training data
And make the models better.
[Slide 13]
On-device inference
On-device inference: using a cloud-distributed model to make predictions directly on an edge device, without a cloud round-trip.
[Slide 14]
request
prediction
Instead of making predictions in the cloud...
[Slide 15]
Distribute the model, make predictions on device.
[Slide 16]
Privacy
Offline
Latency
Power
Sensors
Data Caps
[Slide 17]
Machine Intelligence for Mobile Devices
MI models in the data center: e.g. ranking a large inventory, speech transcription.
MI models on the device: e.g. keyboard suggestions, ambient audio analysis.
[Slide 18]
training data
But how do we continue to improve the model?
[Slide 20]
Interactions generate training data on device... (local training data)
[Slide 21]
Local Training Data
Which we gather to the cloud.
[Slide 22]
And make the model better.
[Slide 23]
And make the model better (for everyone).
[Slide 24]
Privacy
Offline
Latency
Power
Sensors
Data Caps
[Slide 25]
On-Device Learning (Personalization)
[Slide 26]
Local Training Data
Instead of centralizing the training data...
[Slide 27]
Train models right on the device. Better for everyone (individually).
[Slide 28]
But what about…
1. New User Experience
2. Benefiting from peers' data
[Slide 29]
Federated computation and learning
Federated computation: a server coordinates a fleet of participating devices to compute aggregations of the devices' private data.
Federated learning: a shared global model is trained via federated computation.
[Slide 30]
Federated Learning
Mobile Device
Local Training Data
Current Model Parameters
Cloud Service Provider
[Slide 31]
Many devices will be offline.
[Slide 32]
Federated Learning
1. Server selects a sample of e.g. 100 online devices.
[Slide 33]
Federated Learning
2. Selected devices download the current model parameters.
[Slide 34]
Federated Learning
3. Devices compute an update using local training data.
[Slide 35]
Federated Learning
4. Server aggregates users' updates (∑) into a new model.
[Slide 36]
Federated Learning
5. Repeat until convergence.
[Slide 37]
To make the model better (for everyone).
[Slide 38]
And personalize it, for every one.
[Slide 39]
Privacy
Offline
Latency
Power
Sensors
Data Caps
[Slide 40]
Privacy
Offline
Latency
Power
In Vivo
Sensors
Data Caps
Personalization
Training & Evaluation
[Slide 42]
Applications of Federated Learning
What makes a good application?
● On-device data is more relevant than server-side proxy data
● On-device data is privacy sensitive or large
● Labels can be inferred naturally from user interaction
Example applications
● Language modeling for mobile keyboards and voice recognition
● Image classification for predicting which photos people will share
● ...
[Slide 43]
Federated Learning in Gboard
[Slide 44]
Federated Learning in Gboard
[Slide 45]
This Talk
1. Why Federated Learning?
2. FL vs. other distributed optimization
3. FL Algorithms
4. Minimizing Communication
5. Secure Aggregation
6. Differential Privacy
[Slides 46-51]
Atypical Federated Learning assumptions
Massively Distributed: training data is stored across a very large number of devices.
Limited Communication: only a handful of rounds of unreliable communication with each device.
Unbalanced Data: some devices have few examples, some have orders of magnitude more.
Highly Non-IID Data: data on each device reflects one individual's usage pattern.
Unreliable Compute Nodes: devices go offline unexpectedly; expect faults and adversaries.
Dynamic Data Availability: the subset of data available is non-constant, e.g. time-of-day vs. country.
[Slide 52]
This Talk
1. Why Federated Learning?
2. FL vs. other distributed optimization
3. FL Algorithms
4. Minimizing Communication
5. Secure Aggregation
6. Differential Privacy
[Slide 53]
The Federated Averaging algorithm

Server (until converged):
1. Select a random subset (e.g. 1000) of the (online) clients.
2. In parallel, send current parameters θt to those clients.
3. θt+1 = θt + data-weighted average of client updates.

Selected client k:
1. Receive θt from server.
2. Run some number of minibatch SGD steps, producing θ'.
3. Return θ' - θt to server.

H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017.
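To make the two boxes concrete, here is a minimal NumPy sketch of one FedAvg round under simplifying assumptions: clients are synchronous Python objects, and grad_fn, sample_minibatch, and num_examples are illustrative names, not from the talk.

```python
import numpy as np

def client_update(theta_t, client, grad_fn, local_steps=10, lr=0.1):
    """Selected client k: receive theta_t, run minibatch SGD, return theta' - theta_t."""
    theta = theta_t.copy()
    for _ in range(local_steps):
        xb, yb = client.sample_minibatch()        # hypothetical data accessor
        theta -= lr * grad_fn(theta, xb, yb)
    return theta - theta_t, client.num_examples   # update and its averaging weight

def server_round(theta_t, online_clients, grad_fn, sample_size=1000):
    """Server: sample clients, broadcast theta_t, apply the data-weighted average."""
    rng = np.random.default_rng()
    idx = rng.choice(len(online_clients),
                     size=min(sample_size, len(online_clients)), replace=False)
    updates, weights = zip(*(client_update(theta_t, online_clients[i], grad_fn)
                             for i in idx))
    w = np.asarray(weights, dtype=float)
    avg_update = sum(wi * ui for wi, ui in zip(w, updates)) / w.sum()
    return theta_t + avg_update                    # this is theta_{t+1}
```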
[Slide 54]
Large-scale LSTM for next-word prediction
Dataset: large social network, 10M public posts, grouped by author
Rounds to reach 10.5% accuracy: FedSGD 820; FedAvg 35
23x decrease in communication rounds
H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017.
[Slide 55]
CIFAR-10 convolutional model (IID and balanced data)
Updates to reach 82% accuracy: SGD 31,000; FedSGD 6,600; FedAvg 630
49x decrease in communication (updates) vs SGD
H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017.
[Slide 56]
Open Questions
[Slide 57]
Privacy
Offline
Latency
Power
In Vivo
Sensors
Data Caps
Personalization
Training & Evaluation
[Slide 58]
Federated Learning + Personalized Learning
[Slide 59]
Federated Learning + Personalized Learning
Take the federated model as a recipe for personalized models (rather than as a task-useful model on its own).
[Slide 60]
Take the federated model as a recipe for personalized models (rather than as a task-useful model on its own).
1. Multitask Learning
The federated model represents a regularizer for training per-user models, i.e. smoothing among models for similar users.
P. Vanhaesebrouck, A. Bellet and M. Tommasi. Decentralized Collaborative Learning of Personalized Models over Networks. AISTATS, 2017.
V. Smith, C. K. Chiang, M. Sanjabi, & A. Talwalkar. Federated Multi-Task Learning. arXiv preprint, 2017.
[Slide 61]
2. Learning to Learn
The federated model represents a learning procedure that is biased towards learning efficiently.
This procedure either replaces standard SGD or controls any free parameters (e.g. learning rate) when training the personal model.
O. Wichrowska, N. Maheswaranathan, M. W. Hoffman, S. G. Colmenarejo, M. Denil, N. de Freitas, & J. Sohl-Dickstein. Learned Optimizers that Scale and Generalize. arXiv preprint, 2017.
N. Mishra, M. Rohaninejad, X. Chen, P. Abbeel. Meta-Learning with Temporal Convolutions. arXiv preprint, 2017.
[Slide 62]
3. Model-Agnostic Meta Learning
The federated model represents a good initialization from which standard algorithms can very quickly learn a personal model.
"Personalize to new users quickly" becomes the training objective for the federated model, optimized by back-propagating through the training procedure for the personal model.
Chelsea Finn, P. Abbeel, S. Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. arXiv preprint, 2017.
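As a concrete (and deliberately simplified) illustration of back-propagating through the training procedure, here is a first-order MAML sketch on a toy quadratic per-user loss. Everything here (grad_fn, the learning rates, the toy tasks) is illustrative rather than from the talk, and the first-order variant drops the second-order terms of full MAML.

```python
import numpy as np

def adapt(theta, task, grad_fn, lr_inner=0.1, steps=1):
    """Personalize: a few SGD steps starting from the shared initialization."""
    phi = theta.copy()
    for _ in range(steps):
        phi -= lr_inner * grad_fn(phi, task)
    return phi

def maml_step(theta, tasks, grad_fn, lr_inner=0.1, lr_outer=0.05):
    """First-order MAML: evaluate each task's gradient *after* adaptation,
    then apply the average to the shared initialization theta."""
    g = np.zeros_like(theta)
    for task in tasks:
        g += grad_fn(adapt(theta, task, grad_fn, lr_inner), task)
    return theta - lr_outer * g / len(tasks)

# Toy setup: user u's loss is ||theta - c_u||^2, so its gradient is 2 (theta - c_u).
grad_fn = lambda th, c: 2.0 * (th - c)
theta = np.zeros(2)
users = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for _ in range(200):
    theta = maml_step(theta, users, grad_fn)  # an initialization that adapts quickly
```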
[Slide 63]
Better algorithms than Federated Averaging?
(Under the same atypical assumptions as before: massively distributed, limited communication, unbalanced data, highly non-IID data, unreliable compute nodes, dynamic data availability.)
[Slide 64]
This Talk
1. Why Federated Learning?
2. FL vs. other distributed optimization
3. FL Algorithms
4. Minimizing Communication
5. Secure Aggregation
6. Differential Privacy
[Slide 66]
The Federated Averaging algorithm, revisited: the client's upload of θ' - θt in step 3 is a potential bottleneck.
H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017.
[Slides 67-70]
Upload time can be the bottleneck
● Asymmetric connection speed: download is generally faster than upload (http://www.speedtest.net/reports/).
● Some markets are more data sensitive: e.g. India, Nigeria, Brazil, ...
● Secure Aggregation: further increases the data that must be communicated.
● Energy consumption: data transmission is generally power intensive.
[Slide 71]
Federated Learning: we are not interested in the individual updates themselves, only in their aggregate (∑).
[Slide 72]
Distributed Mean Estimation with Limited Communication
n clients each hold a vector x_i; the server wants the mean x̄ = (1/n) Σ_i x_i.
[Slide 73]
Distributed Mean Estimation with Limited Communication
Pipeline: each client sends a stochastic quantization of x_i; the server unquantizes and averages to estimate x̄ = (1/n) Σ_i x_i.
A. T. Suresh, F. Yu, S. Kumar, & H. B. McMahan. Distributed Mean Estimation with Limited Communication. ICML 2017.
[Slide 74]
Stochastic Binary Quantization (1 bit per dimension): each client sends a binary vector y_i in place of x_i.
A. T. Suresh, F. Yu, S. Kumar, & H. B. McMahan. Distributed Mean Estimation with Limited Communication. ICML 2017.
[Slide 75]
Stochastic Binary Quantization (1 bit per dimension)
Mean squared error: grows linearly with the dimension d; if d is large, this is prohibitive.
A. T. Suresh, F. Yu, S. Kumar, & H. B. McMahan. Distributed Mean Estimation with Limited Communication. ICML 2017.
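A minimal sketch of the 1-bit scheme, assuming the two endpoints (min and max) are transmitted alongside the bits; the decoding is unbiased coordinate-wise, E[decode] = x, but its variance grows with d:

```python
import numpy as np

def binary_quantize(x, rng):
    """Send 1 bit per dimension, plus the endpoints lo and hi."""
    lo, hi = x.min(), x.max()
    p = (x - lo) / max(hi - lo, 1e-12)   # this choice of P(bit=1) makes decoding unbiased
    return rng.random(x.size) < p, lo, hi

def unquantize(bits, lo, hi):
    return np.where(bits, hi, lo)        # E[unquantize(...)] = x, coordinate-wise

rng = np.random.default_rng(0)
x = rng.normal(size=5)
bits, lo, hi = binary_quantize(x, rng)
x_hat = unquantize(bits, lo, hi)         # unbiased, but high-variance for large d
```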
[Slide 76]
Add a random rotation: each client computes z_i = R x_i and quantizes that to y_i; the server averages, unquantizes, and applies the inverse rotation.
A. T. Suresh, F. Yu, S. Kumar, & H. B. McMahan. Distributed Mean Estimation with Limited Communication. ICML 2017.
[Slide 77]
With the random rotation, the MSE is much better for large d, and can be modified to O(1/n).
A. T. Suresh, F. Yu, S. Kumar, & H. B. McMahan. Distributed Mean Estimation with Limited Communication. ICML 2017.
[Slide 78]
Efficiency: avoid representing, transmitting, or inverting R, which is d × d.
A. T. Suresh, F. Yu, S. Kumar, & H. B. McMahan. Distributed Mean Estimation with Limited Communication. ICML 2017.
[Slide 79]
Structured matrix: R = HD
D: diagonal matrix of i.i.d. Rademacher entries (±1)
H: Walsh-Hadamard matrix, defined recursively by H_2m = [H_m H_m; H_m -H_m]; its inverse is proportional to H itself.
A. T. Suresh, F. Yu, S. Kumar, & H. B. McMahan. Distributed Mean Estimation with Limited Communication. ICML 2017.
[Slide 80]
Structured matrix: R = HD
Rotation & inverse rotation: time O(d log d); additional space O(1); communication (a random seed) O(1).
A. T. Suresh, F. Yu, S. Kumar, & H. B. McMahan. Distributed Mean Estimation with Limited Communication. ICML 2017.
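A sketch of the structured rotation R = HD: only a seed needs to be shared, the transform runs in O(d log d) with O(1) extra space, and with the normalized H both H and D are their own inverses. This sketch assumes d is a power of two.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(d log d); assumes len(x) is a power of 2."""
    x = x.copy()
    d, h = len(x), 1
    while h < d:
        for i in range(0, d, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(d)   # normalized: the transform is orthonormal and self-inverse

def rotate(x, seed):
    """z = H D x, where D's ±1 diagonal is regenerated from the shared seed."""
    signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=len(x))
    return fwht(signs * x)

def inverse_rotate(z, seed):
    """x = D H z (both H and D are involutions here, so no explicit inverse is needed)."""
    signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=len(z))
    return signs * fwht(z)
```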
[Slide 81]
Add subsampling to the pipeline (Subsample on the client, Expand on the server): randomly select a subset of elements to transmit.
[Slide 82]
Efficiency:
● Communicate only the subsampled values.
● The corresponding indices can be represented as a random seed.
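A sketch of subsample/expand where the kept indices are derived from a shared seed, so only the kept values travel; scaling by 1/keep_frac keeps the mean estimate unbiased (keep_frac and the helper names are illustrative):

```python
import numpy as np

def subsample(z, keep_frac, seed):
    """Client: transmit only a random subset of coordinates."""
    mask = np.random.default_rng(seed).random(z.size) < keep_frac
    return z[mask]

def expand(values, keep_frac, seed, d):
    """Server: rebuild the same mask from the seed and re-inflate the vector."""
    mask = np.random.default_rng(seed).random(d) < keep_frac
    out = np.zeros(d)
    out[mask] = values / keep_frac   # divide by the keep probability for unbiasedness
    return out
```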
[Slide 83]
In practice: some subsampling and moderate quantization is better than no subsampling and aggressive quantization.
[Slide 84]
Existing related work
● Mostly alternatives to the quantization step.
● Complementary to subsampling and rotation.
  ○ Not necessarily efficient (MPI GPU-to-GPU communication).
For instance:
D. Alistarh, D. Grubic, J. Li, R. Tomioka, M. Vojnovic. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. NIPS 2017.
W. Wen, C. Xu, F. Yan, C. Wu, Y. Wang, Y. Chen, H. Li. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning. NIPS 2017.
[Slide 85]
Federated Learning with Compressed Updates
J. Konečný, H. B. McMahan, F. Yu, P. Richtarik, A. T. Suresh, D. Bacon. Federated Learning: Strategies for Improving Communication Efficiency. arXiv:1610.05492.
[Slide 87]
This Talk
1. Why Federated Learning?
2. FL vs. other distributed optimization
3. FL Algorithms
4. Minimizing Communication
5. Secure Aggregation
6. Differential Privacy
[Slide 88]
Federated Learning
Recall the aggregation step of federated learning: the server aggregates users' updates (∑) into a new model.
[Slides 89-93]
Might these updates contain privacy-sensitive data? Three mitigating properties: the updates are
1. Ephemeral
2. Focused
3. Only in Aggregate (∑)
→ Federated Learning is a privacy-preserving technology.
[Slide 94]
Wouldn't it be great if...
The server aggregates (∑) users' updates, but cannot inspect the individual updates.
[Slide 95]
Secure Aggregation.
[Slides 96-97]
Secure Aggregation
Existing protocols either:
● transmit a lot of data, or
● fail when users drop out
(or both).
A novel protocol: K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth. Practical Secure Aggregation for Privacy-Preserving Machine Learning. To appear at CCS 2017.
[Slide 98]
Random positive/negative pairs, aka antiparticles
Devices (Alice, Bob, Carol) cooperate to sample random pairs of 0-sum perturbation vectors; each matched pair sums to 0.
[Slide 100]
Add antiparticles before sending to the server: each contribution looks random on its own...
[Slide 101]
...but paired antiparticles cancel out when the contributions are summed (∑).
[Slide 102]
Revealing the sum: after the antiparticles cancel, the server obtains the ∑ of the true updates.
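A toy sketch of the antiparticle idea, assuming every party stays online and drawing the pair vectors directly (the seed-based version appears later in the deck):

```python
import numpy as np

def pairwise_masks(n_users, d, rng):
    """For each pair (i, j), draw one random vector: i adds it, j subtracts it."""
    masks = [np.zeros(d) for _ in range(n_users)]
    for i in range(n_users):
        for j in range(i + 1, n_users):
            pair = rng.normal(size=d)   # the 'antiparticle' pair: +pair for i, -pair for j
            masks[i] += pair
            masks[j] -= pair
    return masks

rng = np.random.default_rng(0)
updates = [rng.normal(size=4) for _ in range(3)]               # Alice, Bob, Carol
sent = [u + m for u, m in zip(updates, pairwise_masks(3, 4, rng))]
assert np.allclose(sum(sent), sum(updates))                    # masks cancel in the sum
```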
[Slide 103]
But there are two problems...
[Slide 104]
1. These vectors are big! How do users agree efficiently?
[Slide 105]
2. What if someone drops out?
[Slide 106]
If a user drops out, the unmatched antiparticles no longer cancel in the sum ∑. That's bad.
[Slide 107]
Pairwise Diffie-Hellman Key Agreement
Each user holds a secret exponent: a (Alice), b (Bob), c (Carol).
[Slide 108]
Pairwise Diffie-Hellman Key Agreement
Public parameters: g and the modulus p.
From secret x, each user computes g^x mod p (i.e. g^a, g^b, g^c), which is safe to reveal: it is cryptographically hard to infer the secret from g^x.
[Slide 109]
Because the g^x are public, we can share them via the server.
[Slide 110]
The server saves a copy of each g^x and forwards the other users' public values to everyone.
[Slide 111]
Each user combines the received public values with its own secret.
[Slide 112]
Exponentiation is a commutative op, so each pair arrives at a shared secret: (g^a)^b = (g^b)^a = g^ab, and likewise g^ac and g^bc.
[Slide 113]
Each pair now holds a shared secret. But the secrets are scalars...
[Slide 114]
Pairwise Diffie-Hellman Key Agreement + PRNG Expansion
Secrets are scalars, but... use each shared secret to seed a pseudorandom number generator and generate the paired antiparticle vectors: PRNG(g^ab) yields a mask for one user and its negation for the other.
[Slide 116]
Pairwise Diffie-Hellman Key Agreement + PRNG Expansion
1. Efficiency via the pseudorandom generator.
2. Mobile phones typically don't support peer-to-peer communication anyhow.
3. Fewer secrets = easier recovery.
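A toy end-to-end sketch of key agreement plus PRNG expansion. The parameters p and g are deliberately small and insecure (illustrative only; real deployments use vetted cryptographic groups):

```python
import numpy as np

p, g = 2**61 - 1, 5                       # toy public parameters, NOT secure choices

def public_key(secret):
    return pow(g, secret, p)              # g^x mod p: safe to publish

def shared_seed(my_secret, their_public):
    return pow(their_public, my_secret, p)   # (g^b)^a = (g^a)^b = g^(ab) mod p

def mask_from_seed(seed, d, sign):
    rng = np.random.default_rng(seed)     # both parties expand the same seed
    return sign * rng.normal(size=d)      # one adds the mask, the other subtracts it

a, b = 123456789, 987654321               # Alice's and Bob's secret exponents
A, B = public_key(a), public_key(b)
assert shared_seed(a, B) == shared_seed(b, A)          # same pairwise secret
total = mask_from_seed(shared_seed(a, B), 4, +1) + \
        mask_from_seed(shared_seed(b, A), 4, -1)
assert np.allclose(total, 0)                           # the pair cancels exactly
```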
[Slide 117]
k-out-of-n Threshold Secret Sharing
Goal: break a secret s into n pieces, called shares:
● fewer than k shares: learn nothing
● k or more shares: recover s perfectly
Illustration: 2-out-of-3 secret sharing, where each line is a share and the x-coordinate of the intersection is the secret.
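A minimal Shamir-style sketch of k-out-of-n sharing over a prime field, using Lagrange interpolation at x = 0 for recovery (the field size P here is an arbitrary illustrative prime):

```python
import random

P = 2**61 - 1   # a Mersenne prime; shares live in the field GF(P)

def make_shares(secret, k, n):
    """Hide the secret as the constant term of a random degree-(k-1) polynomial."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 recovers the secret from any k shares."""
    total = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P   # Fermat inverse of den
    return total

shares = make_shares(secret=42, k=2, n=3)
assert recover(shares[:2]) == 42   # any 2 of the 3 shares suffice
```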
![Page 120: Learning without Centralized Training Data Privacy ...jakubkonecny.com/files/2018-01_UW_Federated_Learning.pdf · Meta Learning represents a good initialization for standard algorithms](https://reader034.fdocuments.us/reader034/viewer/2022042321/5f0a9c1c7e708231d42c79eb/html5/thumbnails/120.jpg)
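The slide's picture is geometric; the standard algebraic construction (Shamir's scheme) hides the secret in a random polynomial instead. A self-contained sketch, with all names and the field size illustrative:

```python
import random

P = 2**31 - 1   # a Mersenne prime; all arithmetic is in the field GF(P)

def make_shares(secret: int, k: int, n: int):
    """Split `secret` into n shares; any k of them recover it (Shamir)."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for xj, yj in shares:
        num = den = 1
        for xm, _ in shares:
            if xm != xj:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(secret=42, k=2, n=3)
assert recover(shares[:2]) == 42   # any 2 of the 3 shares suffice
assert recover(shares[1:]) == 42
```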
Users make shares of their secrets and exchange them with their peers.

(Diagram: Alice, Bob, and Carol each split their private keys a, b, c into shares and send one share to every other participant.)
(Diagram: users submit their masked updates toward the sum ∑, but Bob drops out before sending his. The pairwise masks that Alice and Carol shared with Bob no longer cancel, so the sum is garbage.)

That's bad.
(Diagram, over several frames: the server asks the surviving users "b?", requesting their shares of Bob's secret. Alice and Carol respond; the server reconstructs b, recombines it with the public keys g^a and g^c to regenerate Bob's pairwise masks, and removes them from the sum ∑.)

Enough honest users + a high enough threshold ⇒ dishonest users cannot reconstruct the secret.

However….
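A sketch of the single-mask scheme and this dropout recovery, continuing the NumPy stand-ins from above (all seeds and updates are toy values):

```python
import numpy as np

def mask(seed: int, dim: int) -> np.ndarray:
    rng = np.random.default_rng(seed % 2**63)
    return rng.integers(-2**15, 2**15, size=dim)

dim = 4
s_ab, s_ac, s_bc = 11, 22, 33                              # pairwise secrets (toy)
x = {u: np.arange(dim) + i for i, u in enumerate("abc")}   # toy model updates

# Each user adds the mask for higher-indexed peers and subtracts it for
# lower-indexed ones, so every pairwise mask appears once with each sign.
y_a = x["a"] + mask(s_ab, dim) + mask(s_ac, dim)
y_b = x["b"] - mask(s_ab, dim) + mask(s_bc, dim)
y_c = x["c"] - mask(s_ac, dim) - mask(s_bc, dim)

# Everyone present: the masks cancel and the server sees only the true sum.
assert (y_a + y_b + y_c == x["a"] + x["b"] + x["c"]).all()

# Bob drops out. From the survivors' shares the server reconstructs Bob's
# secret, regenerates his masks, and cancels them out of the partial sum.
partial = y_a + y_c
repaired = partial - mask(s_ab, dim) + mask(s_bc, dim)
assert (repaired == x["a"] + x["c"]).all()
```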
(Diagram: Bob's masked update then arrives after all; he wasn't dead, just late. The server now holds both Bob's masked input and his reconstructed masks, so it can unmask Bob's individual update.)

Oops.
(Diagram, over several frames: to fix this, each user adds a second, personal random mask to their own update, and secret-shares this self-mask with the other participants alongside their Diffie-Hellman secret.)
(Diagram, over several frames: each user now holds a table of shares, one row per participant a, b, c, covering both the pairwise Diffie-Hellman secrets and the personal self-masks. Bob drops out again. For each user the server requests exactly one kind of share from the survivors: Bob's Diffie-Hellman secret, but only the self-masks of users who are still present, never both kinds for the same user. With these it strips Bob's pairwise masks and the survivors' self-masks, recovering the correct sum ∑.)
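A sketch of this double-masking, again with NumPy stand-ins and toy seeds; in the real protocol the self-mask seeds and Diffie-Hellman secrets are only ever recovered from shares:

```python
import numpy as np

def prg(seed: int, dim: int) -> np.ndarray:
    rng = np.random.default_rng(seed % 2**63)
    return rng.integers(-2**15, 2**15, size=dim)

dim = 4
s_ab, s_ac, s_bc = 11, 22, 33    # pairwise DH secrets (toy values)
b_a, b_b, b_c = 101, 102, 103    # per-user self-mask seeds (toy values)
x = {u: np.arange(dim) + i for i, u in enumerate("abc")}

# Double masking: pairwise masks that cancel, plus a personal self-mask.
y_a = x["a"] + prg(b_a, dim) + prg(s_ab, dim) + prg(s_ac, dim)
y_b = x["b"] + prg(b_b, dim) - prg(s_ab, dim) + prg(s_bc, dim)
y_c = x["c"] + prg(b_c, dim) - prg(s_ac, dim) - prg(s_bc, dim)

# Bob drops out. The server asks survivors for shares of Bob's DH secret
# and for shares of the *survivors'* self-masks -- never both per user.
partial = y_a + y_c
unmasked = (partial
            - prg(b_a, dim) - prg(b_c, dim)        # survivors' self-masks
            - prg(s_ab, dim) + prg(s_bc, dim))     # Bob's pairwise masks
assert (unmasked == x["a"] + x["c"]).all()

# If Bob's y_b now shows up late, it is still protected by prg(b_b, ...):
# honest users have already revealed shares of Bob's DH secret and will
# never also reveal shares of b_b, so y_b stays masked permanently.
```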
(Diagram: Bob's masked update again arrives late. Permanent: honest users already gave b, having treated Bob as dropped, and by the protocol they will never also reveal the shares of his self-mask, so his late update stays masked forever.)
Secure Aggregation

Server aggregates users' updates ∑, but cannot inspect the individual updates.

Communication Efficient

| # Params | Bits/Param | # Users | Expansion |
|---|---|---|---|
| 2^20 = 1 M | 16 | 2^10 = 1 K | 1.73x |
| 2^24 = 16 M | 16 | 2^14 = 16 K | 1.98x |

Secure: ⅓ malicious clients + a fully observed server
Robust: ⅓ of clients can drop out
Interactive Cryptographic Protocol: in each phase, 1000 clients + server exchange messages over 4 rounds of communication.

K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
This Talk
1. Why Federated Learning?
2. FL vs. other distributed optimization
3. FL Algorithms
4. Minimizing Communication
5. Secure Aggregation
6. Differential Privacy
Federated Learning

Might these updates contain privacy-sensitive data? Might the final model memorize a user's data?

1. Ephemeral
2. Focused
3. Only in Aggregate
4. Differential Privacy
Differential Privacy
Differential privacy is the statistical science of trying to learn as much as possible about a group while learning as little as possible about any individual in it.
Andy Greenberg, Wired, 2016.06.13

Differential Privacy (trusted aggregator)

(Diagram: users' updates flow to the aggregator ∑, and noise is added to the aggregate before it is released.)
Federated Averaging

Server
Until converged:
1. Select a random subset (e.g. C = 100) of the (online) clients.
2. In parallel, send current parameters θt to those clients.
3. θt+1 = θt + data-weighted average of client updates.

Selected Client k
1. Receive θt from server.
2. Run some number of minibatch SGD steps, producing θ'.
3. Return θ' - θt to server.
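A compact NumPy sketch of one FederatedAveraging round on a toy least-squares objective; the function names and hyperparameters are illustrative, not from the talk:

```python
import numpy as np

def client_update(theta, data, lr=0.1, steps=10, batch=32):
    """Selected client: a few minibatch SGD steps; return only the delta."""
    X, y = data
    w = theta.copy()
    rng = np.random.default_rng()
    for _ in range(steps):
        idx = rng.choice(len(X), size=min(batch, len(X)), replace=False)
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)  # least squares
        w -= lr * grad
    return w - theta                       # theta' - theta_t

def server_round(theta, clients, C=100):
    """One FedAvg round: sample C clients, data-weighted-average the deltas."""
    rng = np.random.default_rng()
    chosen = rng.choice(len(clients), size=min(C, len(clients)), replace=False)
    deltas = [client_update(theta, clients[k]) for k in chosen]
    weights = [len(clients[k][0]) for k in chosen]   # weight by data size
    return theta + np.average(deltas, axis=0, weights=weights)
```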
Differentially-Private Federated Averaging
McMahan, Ramage, Talwar, Zhang. Learning Differentially Private Recurrent Language Models.

Four changes to Federated Averaging, introduced one slide at a time:

Server
Until converged:
1. Select each user independently with probability q, for say E[C] = 1000 clients.
2. In parallel, send current parameters θt to those clients.
3. θt+1 = θt + bounded-sensitivity data-weighted average of client updates + Gaussian noise N(0, Iσ²).

Selected Client k
1. Receive θt from server.
2. Run some number of minibatch SGD steps, producing θ'.
3. Return Clip(θ' - θt) to server.
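A sketch of those changes in code, reusing client_update from the FedAvg sketch above. This simplifies the paper's estimator (uniform rather than data-weighted averaging, and a noise scale stated only up to the sensitivity analysis); all parameter values are illustrative:

```python
import numpy as np

def clip(update, S):
    """Scale the update so its L2 norm is at most S (bounded sensitivity)."""
    norm = np.linalg.norm(update)
    return update * min(1.0, S / norm) if norm > 0 else update

def dp_fedavg_round(theta, clients, q=0.01, S=1.0, sigma_mult=1.0):
    rng = np.random.default_rng()
    # Select each user independently with probability q.
    chosen = [c for c in clients if rng.random() < q]
    if not chosen:
        return theta
    # Clients return clipped deltas, bounding each user's influence by S.
    deltas = [clip(client_update(theta, c), S) for c in chosen]
    avg = np.mean(deltas, axis=0)
    # Gaussian noise scaled to the clipped sensitivity of the average.
    noise = rng.normal(0.0, sigma_mult * S / len(chosen), size=theta.shape)
    return theta + avg + noise
```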
Privacy Accounting for Noisy SGD: Moments Accountant
M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, & L. Zhang. Deep Learning with Differential Privacy. CCS 2016.
Moments Accountant vs. previous composition theorems

(Chart: cumulative privacy cost ε over the course of training; the Moments Accountant curve sits far below the previous composition theorems. ← Better: smaller epsilon = more privacy.)
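For intuition about why tighter accounting matters, here is a sketch contrasting the two classical composition bounds. This is not the moments accountant itself (which tracks moments of the privacy-loss random variable and is tighter than both); the numbers are illustrative:

```python
import math

def basic_composition(eps_step, T):
    """Basic composition: epsilon grows linearly with the number of steps."""
    return T * eps_step

def advanced_composition(eps_step, T, delta_prime):
    """Strong composition (Dwork-Rothblum-Vadhan): roughly sqrt(T) growth."""
    return (math.sqrt(2 * T * math.log(1 / delta_prime)) * eps_step
            + T * eps_step * (math.exp(eps_step) - 1))

T, eps_step = 10_000, 0.01
print(basic_composition(eps_step, T))            # 100.0
print(advanced_composition(eps_step, T, 1e-9))   # ~7.4
```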
User Differential Privacy for Federated Language Models
McMahan, Ramage, Talwar, Zhang. Learning Differentially Private Recurrent Language Models.

Baseline training: 100 users per round; 17.5% accuracy in 4120 rounds.
(1.152, 1e-9)-DP training: 5000 users per round; 17.5% accuracy in 5000 rounds.

Private training achieves equal accuracy, but uses roughly 60x more computation (5000 users × 5000 rounds vs. 100 × 4120 client-rounds).
Where can the noise be added?

Differential Privacy (trusted aggregator): users send raw updates; the server adds noise to the aggregate ∑.

Local Differential Privacy: each user adds noise to their own update before sending it, so no trust in the server is required, at the cost of much more total noise.

Differential Privacy with Secure Aggregation: each user adds only a little noise, and Secure Aggregation guarantees the server sees nothing but the already-noisy sum ∑.
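A back-of-envelope sketch of why the three placements differ, under the simplifying assumption of independent Gaussian noise (numbers illustrative):

```python
import numpy as np

n, sigma_target = 1000, 1.0   # users, noise std needed on the aggregate

# Trusted aggregator: one noise draw of std sigma_target on the sum.
central_sum_std = sigma_target

# Local DP: each user must add std sigma_target on their own, since the
# server sees every individual update; independent noises add in variance,
# so the released sum carries std sigma_target * sqrt(n): ~31.6x more noise.
local_sum_std = sigma_target * np.sqrt(n)

# DP + Secure Aggregation: each user adds only sigma_target / sqrt(n); the
# server sees nothing but the sum, whose noise std is again sigma_target.
secagg_user_std = sigma_target / np.sqrt(n)
secagg_sum_std = secagg_user_std * np.sqrt(n)
assert np.isclose(secagg_sum_std, central_sum_std)
```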
Inherently Complementary Techniques

Federated Learning (FL)
● Touching each user's data infrequently is good for user-level DP guarantees
● FedAvg updates are simple averages, making DP analysis easier and allowing SA
● Averaging over many users allows high CU without losing too much signal

Differential Privacy (DP)
● Local noise may have regularization effect for FL (open question)
● Requires bounding update norms, reducing error of quantization in CU
● Bounding the norm of individual users may mitigate abuse potential under SA (open)

Secure Aggregation (SA)
● Less local noise for DP because individual updates not observed
● Quantization (necessary for SA) also used by CU (more efficient combination open question)

Compressed Updates (CU)
● Lower communication costs good for FL settings, especially with SA
● Structured noise of CU may complement or replace local noise for DP (open question)
Federated Learning: Privacy-Preserving Collaborative Machine Learning without Centralized Training Data
Jakub Konečný / [email protected]

This Talk
1. Why Federated Learning?
2. FL vs. other distributed optimization
3. FL Algorithms
4. Minimizing Communication
5. Secure Aggregation
6. Differential Privacy
On-device learning has many advantages.

Federated Learning: training a shared global model, from a federation of participating devices which maintain control of their own data, with the facilitation of a central server.

Federated Learning is practical: FederatedAveraging often converges quickly (in terms of communication rounds).

Inherently complementary techniques: Federated Learning, Differential Privacy, Secure Aggregation, Compressed Updates.