Boosting and Differential Privacy
Cynthia Dwork, Microsoft Research
The Power of Small, Private, Miracles
Joint work with Guy Rothblum and Salil Vadhan
Boosting [Schapire, 1989]
A general method for improving the accuracy of any given learning algorithm.
Example: learning to recognize spam e-mail. The "base learner" receives labeled examples and outputs a heuristic; labels are {+1, -1}. Run it many times and combine the resulting heuristics.
[Slide diagram: the boosting loop. A set S of labeled examples drawn from D feeds the Base Learner, which outputs hypotheses A1, A2, …, each doing well on 1/2 + η of D. Terminate? If not, update D and repeat; if so, combine A1, A2, … into A.]
[The same boosting-loop diagram, now asking: how is D updated, and when do we terminate?]
Boosting for People [variant of AdaBoost, FS95]
The initial distribution D is uniform on the database rows. S is always a set of k elements drawn from D^k. The combiner is majority. Weight update:
If correctly classified by the current A, decrease the weight by a factor of e ("subtract 1 from the exponent").
If incorrectly classified by the current A, increase the weight by a factor of e ("add 1 to the exponent").
Re-normalize to obtain the updated D.
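One round of this weight update can be sketched as follows; the labels, predictions, and four-row toy database are hypothetical stand-ins for a real base learner's output.

```python
import numpy as np

def update_distribution(D, labels, predictions):
    """One boosting round: multiply each row's weight by e^{-1} if the
    current hypothesis classified it correctly ("subtract 1 from the
    exponent") and by e^{+1} otherwise ("add 1"), then re-normalize."""
    correct = labels == predictions           # labels, predictions in {-1, +1}
    exponents = np.where(correct, -1.0, 1.0)  # -1 if correct, +1 if not
    weights = D * np.exp(exponents)
    return weights / weights.sum()            # re-normalize to a distribution

# Toy round on 4 rows: the hypothesis gets rows 0-2 right, row 3 wrong.
D = np.full(4, 0.25)                          # initially uniform
labels      = np.array([+1, -1, +1, -1])
predictions = np.array([+1, -1, +1, +1])      # wrong on the last row
D = update_distribution(D, labels, predictions)
```

After the round, the misclassified row carries most of the mass, which is exactly what drives the base learner to focus on it next time.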
Why does it work?
Update rule: multiply the weight of example i by exp(-c_t(i)), where c_t(i) = +1 if A_t classifies i correctly and c_t(i) = -1 otherwise.
D_{t+1}(i) = D_t(i) exp(-c_t(i)) / N_t
N_t D_{t+1}(i) = D_t(i) exp(-c_t(i))
N_t N_{t-1} ⋯ N_1 D_{t+1}(i) = D_1(i) exp(-∑_s c_s(i))
(∏_s N_s) D_{t+1}(i) = (1/m) exp(-∑_s c_s(i))
∑_i (∏_s N_s) D_{t+1}(i) = (1/m) ∑_i exp(-∑_s c_s(i))
∏_s N_s = (1/m) ∑_i exp(-∑_s c_s(i))
∏_s N_s = (1/m) ∑_i exp(-∑_s c_s(i)), and ∏_s N_s shrinks exponentially (at a rate depending on η):
The normalizers are sums of weights; at the start of each round these sum to 1. Because the base learner is good, "more" weight decreases than increases: more weight has its exponent shrink than otherwise.
∑_i exp(-∑_s c_s(i)) = ∑_i exp(-y_i ∑_s A_s(i)). This is an upper bound on the number of incorrectly classified examples:
If y_i ≠ sign[∑_s A_s(i)] (= majority{A_1(i), A_2(i), …}),
then y_i ∑_s A_s(i) < 0, so exp(-y_i ∑_s A_s(i)) ≥ 1.
Therefore the number of incorrectly classified examples is exponentially small in t.
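The counting argument above is easy to check numerically; the random ±1 hypothesis matrix below is a hypothetical stand-in for the base learner's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 50, 10                                   # examples, boosting rounds
y = rng.choice([-1, 1], size=m)                 # true labels y_i
# A[s, i] in {-1, +1}: round-s hypothesis's vote on example i (random here).
A = rng.choice([-1, 1], size=(T, m))

margins = y * A.sum(axis=0)                     # y_i * sum_s A_s(i)
errors = np.sum(np.sign(A.sum(axis=0)) != y)    # majority vote is wrong
bound = np.sum(np.exp(-margins))                # sum_i exp(-y_i sum_s A_s(i))
# Every misclassified i has margin <= 0, so it contributes >= 1 to `bound`;
# hence errors <= bound, regardless of the hypotheses.
```

With a real base learner (advantage η per round) the bound itself, m·∏_s N_s, shrinks exponentially in T, which is the content of the slide.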
[The boosting-loop diagram instantiated for people: D is initially uniform on the DB rows, the update is the -1/+1 re-normalizing rule, and the combiner is majority. New question: Privacy?]
Private Boosting for People
The base learner must be differentially private. The main concern is rows whose weight grows too large; this affects the termination test, sampling, and re-normalizing. The problem is similar to one arising when learning in the presence of noise, and the solution is similar: smooth boosting.
Remove (give up on) elements that become too heavy. Carefully! Removing one heavy element and re-normalizing may cause another element to become heavy. Ensure this is rare (else we give up on too many elements and hurt accuracy).
Iterative Smoothing
Not today.
Boosting for Queries?
Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that ∀ q ∈ Q we can extract from O an approximation of q(DB).
Assume the existence of an (ε₀, δ₀)-dp base learner producing an object O that does well on more than half of D:
Pr_{q∼D}[ |q(O) - q(DB)| < λ ] > 1/2 + η
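In code, the base learner's success condition is just a statement about D-mass; the per-query errors below are hypothetical numbers standing in for |q(O) - q(DB)|.

```python
import numpy as np

def success_mass(D, errors, lam):
    """Pr_{q ~ D}[ |q(O) - q(DB)| < lam ]: the D-mass of the queries
    that the object O answers to within lam."""
    return D[np.abs(errors) < lam].sum()

# Toy check: 5 queries under a uniform D; hypothetical per-query errors.
D = np.full(5, 0.2)
errors = np.array([0.01, 0.50, 0.02, 0.03, 0.04])
eta = 0.2
ok = success_mass(D, errors, lam=0.1) > 0.5 + eta   # 4 of 5 queries are good
```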
[The boosting-loop diagram for queries: D is initially uniform on Q; the base learner's output does well on 1/2 + η of D.]
[The query-boosting diagram: D initially uniform on Q, the -1/+1 re-normalizing update, and median as the combiner. New question: Privacy? An individual can affect many queries at once!]
Privacy is Problematic
In smooth boosting for people, at each round an individual has only a small effect on the probability distribution. In boosting for queries, an individual can affect the quality of q(A_t) simultaneously for many q. As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different A_t's. This is slightly ameliorated by sampling (if only a few samples are taken, maybe we can avoid the q's on the edge?).
How can we make the re-weighting less sensitive?
Private Boosting for Queries [variant of AdaBoost]
The initial distribution D is uniform on the queries in Q. S is always a set of k queries drawn according to D^k. The combiner is median [viz. Freund92]. Weight update for queries:
If very well approximated by A_t, decrease the weight by a factor of e ("-1").
If very poorly approximated by A_t, increase the weight by a factor of e ("+1").
In between, scale with the distance from the midpoint (down or up): the exponent is 2(|q(DB) - q(A_t)| - (λ + μ/2))/μ (sensitivity 2ρ/μ).
Theorem (minus some parameters)
Let all q ∈ Q have sensitivity ≤ ρ. Run the query-boost algorithm for T = log|Q|/η² rounds with
μ = ((log|Q|/η²)² ρ √k) / ε.
The resulting object A is (ε + Tε₀, Tδ₀)-dp and, whp, gives (λ+μ)-accurate answers to all the queries in Q.
Better privacy (smaller ε) gives worse utility (larger μ); a better base learner (smaller k, larger η) helps.
Proving Privacy
Technique #1: Pay Your Debt and Move On. Fix A_1, A_2, …, A_t and record the D vs. D' confidence gain ("pay your debt"). Then focus on the gain from the selection of S ∈ Q^k in round t+1 ("move on"), based on the distributions D_{t+1} and D'_{t+1} determined in round t; call them D and D'.
Technique #2: Evolution of Confidence [DiDwN03] ("delay payment until the final reckoning"). Choose q_1, q_2, … in turn. For each q ∈ Q, bound the log-ratio |ln(D[q]/D'[q])| by A and the expectation |E_{q∼D} ln(D[q]/D'[q])| by B. Then
Pr_{q_1,…,q_k}[ |∑_i ln(D[q_i]/D'[q_i])| > z√k(A + B) + kB ] < exp(-z²/2)
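A quick Monte Carlo sanity check of this tail bound; the pair D, D' below is built by tilting a random distribution so that every log-ratio is at most A, and B is taken to be the 2A² bound on the expectation proved on the next slide. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A, n, k, z = 0.3, 50, 100, 2.0
B = 2 * A**2                                # bound on |E ln(D/D')|

# D' is an A-dp tilt of D: exponents in [-A/2, A/2] plus re-normalization
# keep every ratio D(q)/D'(q) within [e^-A, e^A].
D = rng.dirichlet(np.ones(n))
Dp = D * np.exp(rng.uniform(-A / 2, A / 2, size=n))
Dp /= Dp.sum()
log_ratio = np.log(D / Dp)

# Draw k queries from D many times; tally how often the accumulated
# confidence gain exceeds the z*sqrt(k)*(A+B) + k*B threshold.
trials = 2000
sums = np.array([
    np.abs(log_ratio[rng.choice(n, size=k, p=D)].sum())
    for _ in range(trials)
])
threshold = z * np.sqrt(k) * (A + B) + k * B
failure_rate = np.mean(sums > threshold)
# Per the slide, failure_rate should fall below exp(-z^2/2) ~ 0.135.
```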
Bounding E_{q∼D} ln(D[q]/D'[q])
Assume D and D' are A-dp with respect to one another, for A < 1. Then 0 ≤ E_{q∼D} ln[D(q)/D'(q)] ≤ 2A² (that is, B ≤ 2A²).
KL(D||D') = ∑_q D(q) ln[D(q)/D'(q)]; always ≥ 0.
So KL(D||D') ≤ KL(D||D') + KL(D'||D)
= ∑_q D(q)( ln[D(q)/D'(q)] + ln[D'(q)/D(q)] ) + (D'(q) - D(q)) ln[D'(q)/D(q)]
≤ ∑_q 0 + |D'(q) - D(q)| · A
= A ∑_q [ max(D(q), D'(q)) - min(D(q), D'(q)) ]
≤ A ∑_q [ e^A min(D(q), D'(q)) - min(D(q), D'(q)) ]
= A ∑_q (e^A - 1) min(D(q), D'(q))
≤ 2A² when A < 1
Compare DiDwN03
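The derivation above can be verified numerically; the tilted pair of distributions below is a hypothetical construction guaranteeing A-dp closeness, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = 0.5                                    # D, D' are A-dp w.r.t. one another
n = 100

# Tilt D by exponents in [-A/2, A/2]; after re-normalizing, the
# normalizer lies in [e^{-A/2}, e^{A/2}], so |ln(D(q)/D'(q))| <= A.
D = rng.dirichlet(np.ones(n))
Dp = D * np.exp(rng.uniform(-A / 2, A / 2, size=n))
Dp /= Dp.sum()

log_ratio = np.log(D / Dp)
kl = np.sum(D * log_ratio)                 # KL(D||D') = E_{q~D} ln[D(q)/D'(q)]
# The slide's claim: 0 <= KL(D||D') <= 2A^2 whenever A < 1.
```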
Motivation and Application
Boosting for People: logistic regression for 3000+-dimensional data. A slight twist on CM did pretty well (ε = 1.5); we thought about alternatives.
Boosting for Queries: reducing the dependence on the concept class in the work on synthetic databases in DNRRV09 (Salil's talk). We over-interpreted the polytime DiNi-style attacks (we were spoiled).
One can't have cn queries with error o(√n). BLR08: cn queries with error O(n^{2/3}). DNRRV09: O(n^{1/2} |Q|^{o(1)}). Now: O(n^{1/2} log² |Q|).
The result is more general, but we only know of a base learner for counting queries.