Collaborative, Privacy-Preserving Data Aggregation at Scale
description
Transcript of Collaborative, Privacy-Preserving Data Aggregation at Scale
![Page 1: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/1.jpg)
Collaborative, Privacy-PreservingData Aggregation at Scale
Michael J. FreedmanPrinceton University
Joint work with: Benny Applebaum, Haakon Ringberg, Matthew Caesar, and Jennifer Rexford
![Page 2: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/2.jpg)
Problem:Network Anomaly Detection
![Page 3: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/3.jpg)
Collaborative anomaly detection
• Some attacks look like normal traffic– e.g., SQL-injection, application-level DoS [Srivatsa TWEB ‘08]
• Is it a DDoS attack or a flash crowd? [Jung WWW ‘02]
Yahoo!Google
Bing
I’m not sure about Beasty!
I’m not sure about Beasty!
I’m not sure about Beasty!
![Page 4: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/4.jpg)
Collaborative anomaly detection
• Targets (victims) could correlate attacks/attackers[Katti IMC ’05], [Allman Hotnets ‘06], [Kannan SRUTI ‘06], [Moore INFOC ‘03]
Yahoo!Google
Bing
“Fool us once, shame on you. Fool us N times, shame on us.”
![Page 5: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/5.jpg)
Problem:Network Anomaly Detection
Solution:• Aggregate suspect IPs from many ISPs• Flag those IPs that appear > threshold τ
![Page 6: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/6.jpg)
Problem:Distributed Ranking
Solution:• Collect domain statistics from many users• Aggregate data by domain
![Page 7: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/7.jpg)
Problem:
…
Solution:• Aggregate (id, data) from many sources• Analyze data grouped by id
![Page 8: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/8.jpg)
But what about privacy?
What inputs are submitted?
Who submitted what?
![Page 9: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/9.jpg)
Data Aggregation Problem
• Many participants, each with (key, value) observation• Goal: Aggregate observations by key
Key Valuesk1 ( va, vb )k2 ( vi, vj, vk )…kn ( vx )
A A
A
![Page 10: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/10.jpg)
Data Aggregation Problem
• Many participants, each with (key, value) observation• Goal: Aggregate observations by key
Key Valuesk1 ( va, vb )k2 ( vi, vj, vk )…kn ( vx )
A A
A
F (F (
F (
))
)
PDA: Only release the value column CR-PDA: Plus keys whose values satisfy some func
![Page 11: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/11.jpg)
Data Aggregation Problem
• Many participants, each with (key, value) observation• Goal: Aggregate observations by key
Key Valuesk1 ( 1, 1 )k2 ( 1, 1, 1 )…kn ( 1 )
Σ Σ
Σ
PDA: Only release the value column CR-PDA: Plus keys whose values satisfy some func
≥ τ?
≥ τ?
≥ τ?
![Page 12: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/12.jpg)
Goals• Keyword privacy: No party learns anything about keys
• Participant privacy: No party learns who submitted what
• Efficiency: Scale to many participants, each with many inputs
• Flexibility: Support variety of computations over values
• Lack of coordination: – No synchrony required, individuals cannot prevent progress– All participants need not be online at same time
![Page 13: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/13.jpg)
Potential solutionsApproach
KeywordPrivacy
ParticipantPrivacy Efficiency Flexibility
Lack ofCoord
GarbledCircuit
Evaluation
MultipartySet Intersection
Yes Yes Very Poor Yes No
Yes Yes Poor No NoDece
ntra
lized
![Page 14: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/14.jpg)
Security Efficiency
• Weaken security assumptions?– Assume honest but curious participants?
– Assume no collusion among malicious participants?
• In large/open setting, easy to operate multiple nodes (so-called “Sybil attack”)
![Page 15: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/15.jpg)
Towards Centralization?
DB
Participants
![Page 16: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/16.jpg)
Potential solutionsApproach
KeywordPrivacy
ParticipantPrivacy Efficiency Flexibility
Lack ofCoord
GarbledCircuit
Evaluation
MultipartySet Intersection
HashingInputs
NetworkAnonymization
Yes Yes Very Poor Yes No
Yes Yes Poor No No
No No Very Good Yes Yes
No Yes Very Good Yes Yes
Dece
ntra
lized
Cent
raliz
ed
![Page 17: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/17.jpg)
Towards semi-centralization
Participants
Proxy DB
Assumption: Proxy and DB do
not collude
![Page 18: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/18.jpg)
Potential solutionsApproach
KeywordPrivacy
ParticipantPrivacy Efficiency Flexibility
Lack ofCoord
GarbledCircuit
Evaluation
MultipartySet Intersection
HashingInputs
NetworkAnonymization
ThisWork
Yes Yes Very Poor Yes No
Yes Yes Poor No No
No No Very Good Yes Yes
No Yes Very Good Yes Yes
Yes Yes Good Yes Yes
Dece
ntra
lized
Cent
raliz
ed
![Page 19: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/19.jpg)
Privacy Guarantees
• Privacy of PDA against malicious entities and participants – Malicious participant may collude with either malicious
proxy or DB, but not both– May violate correctness in almost arbitrary ways
• Privacy of CR-PDA against honest-but-curious entities and malicious participants
![Page 20: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/20.jpg)
PDA Strawman #0
Participant Proxy DB
1. Client sends input k
k
![Page 21: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/21.jpg)
PDA Strawman #1
Participant Proxy DB
1. Client sends encrypted input k2. Proxy batches and retransmits3. DB decrypts input
ds
k #1.1.1.1 12.2.2.2 9
Violates keyword privacy
EDB(k) EDB(k)
![Page 22: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/22.jpg)
ds
PDA Strawman #2
Participant Proxy DB
1. Client sends hashes of k2. Proxy batches and retransmits3. DB decrypts input
H (k) #H(1.1.1
.1)1
H(2.2.2.2)
9
Still violates keyword privacy:IPs drawn from small domains
EDB( H (k) ) EDB( H (k) )
![Page 23: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/23.jpg)
PDA Strawman #3
Participant Proxy DB
1. Client sends keyed hashes of k– Keyed hash function (PRF)– Key s known only by proxy
Fs (k) #Fs(1.1.1
.1)1
Fs(2.2.2.2)
9
EDB( Fs (k) ) EDB( Fs (k) )
But how do clients learn Fs (IP)) ?
Secret s
![Page 24: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/24.jpg)
Our Basic PDA Protocol
Participant Proxy DB
1. Client sends keyed hashes of k– Fs(x) learned by client through
Oblivious PRF protocol
2. Proxy batches and retransmits keyed hash3. DB decrypts input
Fs (k) #Fs(1.1.1
.1)1
Fs(2.2.2.2)
9
EDB( Fs (k) )OPRF
EDB( Fs (k) ) Fs (k)
Secret s
![Page 25: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/25.jpg)
Fs (k) #Fs(1.1.1
.1)1
Fs(2.2.2.2)
9
retransmits
Basic CR-PDA Protocol
Participant Proxy DB
1. Client sends keyed hashes of k,and encrypted k for recovery
2. Proxy retransmits keyed hash3. DB decrypts input4. Identify rows to release and transmit EPRX (k) to proxy5. Proxy decrypts k and releases
EDB( Fs (k) ) Fs (k)EDB(EPRX (k))
EPRX (k)
Fs (k) # Enc’d k
Fs(1.1.1.1)
1 EPRX(1.1.1
.1)Fs(2.2.2.
2)9 EPRX(2.2.2
.2)
Secret s
![Page 26: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/26.jpg)
retransmits
Privacy Properties
Participant Proxy DB
• Any coalition of HBC participants• HBC coalition of proxy and participants• HBC database
EDB( Fs (k) ) Fs (k)EDB(EPRX (k))
EPRX (k)
• Keyword privacy: Nothing learned about unreleased keys• Participant privacy: Key Participant not learned
Secret s
![Page 27: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/27.jpg)
retransmits
Privacy Properties
Participant Proxy DB
• Any coalition of HBC participants• HBC coalition of proxy and participants• HBC database
EDB( Fs (k) ) Fs (k)EDB(EPRX (k))
EPRX (k)
• Keyword privacy: Nothing learned about unreleased keys• Participant privacy: Key Participant not learned
Secret s
malicious participants
HBC coalition of DB and participants
![Page 28: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/28.jpg)
retransmits
More Robust PDA Protocol
Participant Proxy DB
• Any coalition of HBC participants• HBC coalition of proxy and participants• HBC database
EDB( Fs (k) ) Fs (k)EDB(EPRX (k))
EPRX (k)Secret s
malicious participants
HBC coalition of DB and participants
• ORPF Encrypted OPRF Protocol• Ciphertext re-randomization by proxy• Proof by participant that submitted k’s match
![Page 29: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/29.jpg)
Encrypted-OPRF protocol• Problem: in basic OPRF protocol, participant learns Fs(k)
• Encrypted-OPRF protocol:– Client learns blinded Fs(k)
– Client encrypts to DB
– Proxy can unblind Fs(k) “under the encryption”
( ) r -1Enc ( )
( ) rFs(k)
(π si)ki=1El Gamal g mod p
![Page 30: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/30.jpg)
Encrypted-OPRF protocol• Problem: in basic OPRF protocol, participant learns Fs(k)
• Encrypted-OPRF protocol– Client learns blinded Fs(k)
– Client encrypts to DB
– Proxy can unblind Fs(k) “under the encryption”
• OPRF runs OT protocol for each bit of input k• OT protocols expensive, so use batch OT protocol [Ishai et al]
( ) r -1Enc ( )
( ) rFs(k)
![Page 31: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/31.jpg)
Scalable Protocol Architecture
ParticipantsClient-Facing
Proxies
Share secret s
Proxy Decryption
Oracles
SharePRX key
Front-EndDB Tier
ShareDB key
Back-EndDB Storage
PartitionFs keyspace
![Page 32: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/32.jpg)
Evaluation• Scalable architecture implemented– Basic CR-PDA / PDA protocol
+ and encrypted-OPRF protocol w/ Batch OT
– ~5000 lines of threaded C++, GnuPG for crypto
• Testbed of 2 GHz Linux machines
Algorithm Parameter ValueRSA / ElGamal key size 1024 bits
Oblivious Transfer k 80AES key size 256 bits
![Page 33: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/33.jpg)
Throughput vs. participant batch size
Single CPU core for DB and proxy each
![Page 34: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/34.jpg)
Maximum throughput per server
Four CPU cores for DB and proxy (each)
![Page 35: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/35.jpg)
Throughput scalability
Number CPU cores per DB and proxy (each)
![Page 36: Collaborative, Privacy-Preserving Data Aggregation at Scale](https://reader036.fdocuments.us/reader036/viewer/2022062323/5681672e550346895ddbd33d/html5/thumbnails/36.jpg)
Summary• Privacy-Preserving Data Aggregation protects:– Participants: Do not reveal who submitted what– Keywords: Only reveal values / released keys
• Novel composition of crypto primitives– Based on assumption that 2+ known parties don’t collude
• Efficient implementation of architecture– Scales linearly with computing resources– Ex: Millions of suspected IPs in hours
• Of independent interest…– Introduced encrypted OPRF protocol– First implementation/validation of Batch OT protocol