Dual Query: Practical Private Query Release for High Dimensional Data
Speaker: Steven Wu, University of Pennsylvania
ICML 2014
Joint work with Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, Aaron Roth
Private Query Release
Sensitive database D (medical records)
Queries
Release answers that preserve privacy
Differential Privacy
[Figure: an algorithm is run on databases of individual records (Alice, Bob, Chris, Donna, Ernie, Xavier); the ratio of its output probabilities is bounded.]
Differential Privacy (DMNS06)
• An algorithm A that takes databases over a domain X and produces outcomes in a range R satisfies ε-differential privacy if for every outcome r and every pair of databases D, D’ differing in one record:
Pr[ A(D) = r ] ≤ e^ε · Pr[ A(D’) = r ]   (for small ε, e^ε ≈ 1 + ε)
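To make the ratio bound concrete, here is a minimal sketch using randomized response on a one-record database holding a single private bit. Randomized response is a standard textbook mechanism chosen here for illustration; it is not part of the talk. The sketch uses the standard e^ε form of the bound, which is close to 1 + ε for small ε. The function names are hypothetical.

```python
import math

def randomized_response_probs(bit, epsilon):
    """Output distribution of randomized response on one private bit.

    The true bit is reported with probability e^eps / (1 + e^eps),
    otherwise the flipped bit is reported.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def max_probability_ratio(epsilon):
    """Largest Pr[A(D) = r] / Pr[A(D') = r] over outcomes r and neighbors D, D'."""
    worst = 0.0
    for b in (0, 1):            # database D holds bit b
        for b_prime in (0, 1):  # neighboring database D' holds bit b_prime
            p = randomized_response_probs(b, epsilon)
            q = randomized_response_probs(b_prime, epsilon)
            for r in (0, 1):    # every possible outcome
                worst = max(worst, p[r] / q[r])
    return worst

epsilon = 0.1
ratio = max_probability_ratio(epsilon)
# The worst-case ratio matches e^eps, so the mechanism is eps-differentially private.
print(ratio, math.exp(epsilon))
```

The check enumerates every pair of neighboring inputs and every outcome, which is only feasible because the toy database has a single bit.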
Useful Properties:
• Strong, worst-case notion of privacy
• Similar to stability for learning algorithms
More Formally
Release approximate answers to a large collection of queries with privacy and accuracy
Answer exponentially many queries
• Privately learn a distribution D’ approximating D
[Figure: a learning algorithm maps the true database to an approximate database that gives approximately the same answers on the queries.]
Learn from Learning Theory
• [DRV08]: query release via boosting
• [HR10]: use multiplicative weights (MW) update algorithm to learn a distribution
• [HLM12]: experimentally evaluated the MW algorithm, which performs well for ≤ 80 attributes
What is the bottleneck?
The algorithm operates on a distribution over all possible data records, whose size is exponential in the number of attributes d!
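The bottleneck shows up directly in a sketch of the multiplicative weights update. The version below is non-private (no noise is added and the worst query is selected exactly), assumes binary attributes, and uses hypothetical single-attribute queries and toy data. It is meant only to show that the weight vector has one entry per possible record, that is, 2**d entries.

```python
import itertools
import math

def signed_errors(true_answers, weights, domain, queries):
    """True answer minus the answer on the current weighted distribution."""
    total = sum(weights)
    errors = []
    for q, t in zip(queries, true_answers):
        approx = sum(w * q(x) for w, x in zip(weights, domain)) / total
        errors.append(t - approx)
    return errors

def mw_learn(data, queries, d, rounds=200):
    """Non-private multiplicative weights over the full domain {0,1}^d."""
    domain = list(itertools.product((0, 1), repeat=d))  # 2**d records
    weights = [1.0] * len(domain)                       # one weight per record
    true_answers = [sum(q(x) for x in data) / len(data) for q in queries]
    for _ in range(rounds):
        errors = signed_errors(true_answers, weights, domain, queries)
        # Pick the query on which the current distribution is most wrong.
        j = max(range(len(queries)), key=lambda i: abs(errors[i]))
        # Shift weight toward (or away from) records that satisfy that query.
        weights = [w * math.exp(queries[j](x) * errors[j] / 2.0)
                   for w, x in zip(weights, domain)]
    return domain, weights, true_answers

d = 4
data = [(1, 0, 1, 1), (1, 0, 0, 1), (1, 1, 1, 1), (0, 0, 0, 1)]  # toy database
queries = [lambda x, i=i: x[i] for i in range(d)]  # fraction of records with attribute i set

domain, weights, true_answers = mw_learn(data, queries, d)
final_error = max(abs(e) for e in signed_errors(true_answers, weights, domain, queries))
print(len(domain), final_error)
```

With d = 4 the domain has 16 records; every additional attribute doubles the length of `weights` and the cost of each round.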
Impossibility Result
• No private algorithm can answer an exponentially large collection of queries efficiently and accurately
• Shown by a line of lower bounds: [DNRRV09] [Ullman-Vadhan11] [Ullman13] [BUV14]
• Problem theoretically hard in the worst case
• But can we do something in practice? (not with exponential space)
Computing the Equilibrium
Data Player (Multiplicative Weights) vs. Query Player (Best Response)
Data player maintains an exponential-size distribution
Converge to an approximate equilibrium
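The dynamics of multiplicative weights against best response can be sketched on a generic zero-sum game. The example below is not the query release game itself: the loss matrix is a hypothetical rock-paper-scissors game with losses in [0, 1], and the step size and round count are illustrative choices. The averaged strategies of the two players form an approximate equilibrium.

```python
import math

def mw_vs_best_response(loss, rounds=2000, eta=0.05):
    """Row player runs multiplicative weights; column player best-responds.

    loss[i][j] is the row player's loss (and the column player's gain).
    Returns the time-averaged strategies of both players.
    """
    n_rows, n_cols = len(loss), len(loss[0])
    p = [1.0 / n_rows] * n_rows
    avg_row = [0.0] * n_rows
    avg_col = [0.0] * n_cols
    for _ in range(rounds):
        # Best response: the column that maximizes the row player's expected loss.
        expected = [sum(p[i] * loss[i][j] for i in range(n_rows)) for j in range(n_cols)]
        j = max(range(n_cols), key=lambda c: expected[c])
        for i in range(n_rows):
            avg_row[i] += p[i] / rounds
        avg_col[j] += 1.0 / rounds
        # Multiplicative weights update: penalize rows that lost against column j.
        w = [p[i] * math.exp(-eta * loss[i][j]) for i in range(n_rows)]
        total = sum(w)
        p = [x / total for x in w]
    return avg_row, avg_col

def equilibrium_gap(loss, row, col):
    """Worst-case loss of `row` minus the best achievable loss against `col`."""
    n_rows, n_cols = len(loss), len(loss[0])
    worst_for_row = max(sum(row[i] * loss[i][j] for i in range(n_rows)) for j in range(n_cols))
    best_vs_col = min(sum(col[j] * loss[i][j] for j in range(n_cols)) for i in range(n_rows))
    return worst_for_row - best_vs_col

# Rows and columns are rock, paper, scissors; 1 = row loses, 0.5 = tie, 0 = row wins.
rps_loss = [[0.5, 1.0, 0.0],
            [0.0, 0.5, 1.0],
            [1.0, 0.0, 0.5]]
avg_row, avg_col = mw_vs_best_response(rps_loss)
gap = equilibrium_gap(rps_loss, avg_row, avg_col)
print(avg_row, avg_col, gap)
```

A gap of zero would mean an exact equilibrium; the regret bound for multiplicative weights keeps the gap small once the number of rounds is large relative to 1/eta.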
Dual Approach
Query Player (Multiplicative Weights) vs. Data Player (Best Response)
Data player's best response: solve an NP-hard problem
Best Response Problem
• Minimize error w.r.t. the query player’s distribution
• Concisely represented but NP-hard
• Can be encoded as an integer program
Send it to the CPLEX solver
Don’t Need to Optimize Exactly
If the optimization problem is too hard, stop CPLEX and return the current solution
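The idea of stopping early and keeping the current solution can be sketched without a solver. The example below is a stand-in, not the CPLEX encoding from the talk: it enumerates candidate records under an evaluation budget and returns the best record seen so far. The objective (number of sampled queries a record satisfies), the sampled queries, and the function names are all hypothetical.

```python
import itertools

def score(record, sampled_queries):
    """Number of sampled queries that the record satisfies."""
    return sum(q(record) for q in sampled_queries)

def anytime_best_response(sampled_queries, d, max_evals):
    """Search records in {0,1}^d, stopping after max_evals evaluations.

    Returns the incumbent (best record found so far) and its score,
    whether or not the search ran to completion.
    """
    best_record, best_score = None, -1
    for n, record in enumerate(itertools.product((0, 1), repeat=d)):
        if n >= max_evals:
            break  # budget exhausted: keep the current solution
        s = score(record, sampled_queries)
        if s > best_score:
            best_record, best_score = record, s
    return best_record, best_score

d = 4
# Hypothetical sampled queries: conjunctions over attributes, one with a negation.
sampled_queries = [
    lambda x: x[0] * x[1] * x[2],
    lambda x: x[1] * x[2] * x[3],
    lambda x: (1 - x[0]) * x[1] * x[3],
]

full = anytime_best_response(sampled_queries, d, max_evals=2 ** d)
early = anytime_best_response(sampled_queries, d, max_evals=6)
print(full, early)
```

The early-stopped call returns a valid but possibly suboptimal record, which is the trade-off being described: an approximate best response in exchange for bounded running time.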
Take-Away
• Private query release for high dimensional data is hard
• Reconfigure an existing algorithm to isolate the hard part
• Dual Query: an algorithm that performs well in practice