Boosting and Differential Privacy

Cynthia Dwork, Microsoft Research

Transcript of Boosting and Differential Privacy

Page 1: Boosting and Differential  Privacy

Boosting and Differential Privacy

Cynthia Dwork, Microsoft Research

Page 2: Boosting and Differential  Privacy

The Power of Small, Private, Miracles

Joint work with Guy Rothblum and Salil Vadhan

Page 3: Boosting and Differential  Privacy

Boosting [Schapire, 1989]: a general method for improving the accuracy of any given learning algorithm.

Example: learning to recognize spam e-mail. The "base learner" receives labeled examples and outputs a heuristic classifier; labels are {+1, -1}. Run the base learner many times and combine the resulting heuristics.

Page 4: Boosting and Differential  Privacy

[Diagram: the boosting loop. The base learner receives S, labeled examples drawn from D, and outputs A_t, which does well on 1/2 + η of D. D is then updated and a termination test is run; on termination, A_1, A_2, ... are combined into the final output A.]

Page 5: Boosting and Differential  Privacy

[Diagram: the same boosting loop, now annotated with the question "How?" at the update of D and the termination test.]

Page 6: Boosting and Differential  Privacy

Boosting for People [variant of AdaBoost, FS95]

The initial distribution D is uniform on the database rows; S is always a set of k elements drawn from D^k; the combiner is majority vote.

Weight update:

If an example is correctly classified by the current A, decrease its weight by a factor of e ("subtract 1 from the exponent").

If an example is incorrectly classified by the current A, increase its weight by a factor of e ("add 1 to the exponent").

Re-normalize to obtain the updated D.
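A minimal sketch of this round update in Python (the array names and the example data are illustrative assumptions, not from the talk):

```python
import numpy as np

def boost_round_update(D, correct):
    """One 'boosting for people' weight update (AdaBoost variant, FS95):
    multiply each example's weight by e^{-1} if the current hypothesis A_t
    classified it correctly, by e^{+1} otherwise, then re-normalize.

    D       : current distribution over the m database rows (sums to 1)
    correct : boolean array; correct[i] is True iff A_t classifies row i correctly
    """
    c = np.where(correct, 1.0, -1.0)   # c_t(i) = +1 if correct, -1 if not
    w = D * np.exp(-c)                 # "subtract/add 1 in the exponent"
    return w / w.sum()                 # re-normalize to a distribution

# Example: start uniform on m rows, as on the slide
m = 5
D = np.full(m, 1.0 / m)
D = boost_round_update(D, np.array([True, True, False, True, False]))
```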

Page 7: Boosting and Differential  Privacy

Why does it work? The update rule multiplies each weight by exp(-c_t(i)), where c_t(i) = +1 if A_t classifies example i correctly and c_t(i) = -1 otherwise, and N_t denotes the round-t normalizer:

D_{t+1}(i) = [ D_t(i) exp(-c_t(i)) ] / N_t

N_t D_{t+1}(i) = D_t(i) exp(-c_t(i))

N_t N_{t-1} ... N_1 D_{t+1}(i) = D_1(i) exp(-Σ_s c_s(i))

(Π_s N_s) D_{t+1}(i) = (1/m) exp(-Σ_s c_s(i))

Summing over i and using Σ_i D_{t+1}(i) = 1:

Π_s N_s = (1/m) Σ_i exp(-Σ_s c_s(i))


Page 11: Boosting and Differential  Privacy

Π_s N_s = (1/m) Σ_i exp(-Σ_s c_s(i)), and Π_s N_s shrinks exponentially (at a rate depending on η).

The normalizers are sums of weights, and at the start of each round the weights sum to 1. Because the base learner is good, there is "more" decrease than increase: more of the weight has its exponent shrink than grow, so each N_t is bounded below 1.

Σ_i exp(-Σ_s c_s(i)) = Σ_i exp(-y_i Σ_s A_s(i)). This is an upper bound on the number of incorrectly classified examples: if y_i ≠ sign[Σ_s A_s(i)] ( = majority{A_1(i), A_2(i), ...} ), then y_i Σ_s A_s(i) ≤ 0, so exp(-y_i Σ_s A_s(i)) ≥ 1.

Therefore the number of incorrectly classified examples, at most m Π_s N_s, is exponentially small in t.
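A quick numeric check of the telescoping identity and the error bound above (a sketch with fabricated labels and hypotheses; nothing here is from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 8, 40
y = rng.choice([-1, 1], size=m)        # true labels
# Hypotheses correct on ~3/4 of examples: a base learner with eta ~ 1/4
A = np.array([np.where(rng.random(m) < 0.75, y, -y) for _ in range(T)])

D = np.full(m, 1.0 / m)
prod_N = 1.0
for t in range(T):
    c = y * A[t]                       # c_t(i) = +1 if correct, -1 if not
    w = D * np.exp(-c)
    N_t = w.sum()
    prod_N *= N_t
    D = w / N_t

margins = (A * y).sum(axis=0)          # y_i * sum_s A_s(i) = sum_s c_s(i)
# Identity: prod_s N_s = (1/m) sum_i exp(-sum_s c_s(i))
assert np.isclose(prod_N, np.mean(np.exp(-margins)))
# Bound: # misclassified by the majority vote <= m * prod_s N_s
errors = np.sum(np.sign(A.sum(axis=0)) != y)
assert errors <= m * prod_N + 1e-9
print(errors, m * prod_N)
```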

Page 12: Boosting and Differential  Privacy

[Diagram: the boosting-for-people loop. Initially D is uniform on the database rows. The base learner receives S, labeled examples drawn from D, and outputs A_t, which does well on 1/2 + η of D. D is updated (-1/+1 in the exponent, then renormalize), a termination test is run, and A_1, A_2, ... are combined by majority vote. Annotation: Privacy?]

Page 13: Boosting and Differential  Privacy

Private Boosting for People: the base learner must be differentially private. The main concern is rows whose weight grows too large; this affects the termination test, sampling, and re-normalization. The problem is similar to one arising when learning in the presence of noise, and admits a similar solution: smooth boosting.

Remove (give up on) elements that become too heavy, but carefully: removing one heavy element and re-normalizing may cause another element to become heavy. Ensure this is rare (else we give up on too many elements and hurt accuracy). A naive version of the removal is sketched below.
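The talk defers the actual iterative-smoothing procedure (next slide), so the following is only an illustration of the naive "remove and re-normalize until nothing is heavy" idea; the `threshold` parameter is a hypothetical stand-in:

```python
import numpy as np

def remove_heavy(D, threshold):
    """Illustration only: repeatedly give up on elements whose probability
    exceeds `threshold`, re-normalizing after each removal. Removing one
    heavy element can make another heavy, hence the loop."""
    D = D.copy()
    removed = []
    while True:
        heavy = np.flatnonzero(D > threshold)
        if heavy.size == 0:
            return D, removed          # nothing heavy remains
        i = int(heavy[0])
        removed.append(i)
        D[i] = 0.0
        total = D.sum()
        if total == 0.0:               # gave up on everything: accuracy is lost
            return D, removed
        D = D / total
```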

Page 14: Boosting and Differential  Privacy

Iterative Smoothing: not today.

Page 15: Boosting and Differential  Privacy

Boosting for Queries? Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that ∀ q ∈ Q one can extract from O an approximation of q(DB).

Assume the existence of an (ε_0, δ_0)-differentially private base learner producing an object O that does well on more than half of D:

Pr_{q ∼ D} [ |q(O) - q(DB)| < λ ] > 1/2 + η
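Stated as code, the base learner's guarantee is a simple predicate; a hypothetical checker (all names are placeholders):

```python
import numpy as np

def base_learner_good(q_on_O, q_on_DB, D, lam, eta):
    """Check Pr_{q~D}[ |q(O) - q(DB)| < lam ] > 1/2 + eta.
    q_on_O[j], q_on_DB[j] : the j-th query evaluated on O and on DB
    D                     : current distribution over the queries in Q"""
    close = np.abs(q_on_O - q_on_DB) < lam
    return float(np.dot(D, close)) > 0.5 + eta
```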

Page 16: Boosting and Differential  Privacy

[Diagram: the boosting loop over queries. Initially D is uniform on Q. The base learner receives S, labeled examples (queries) drawn from D, and outputs A_t, which does well on 1/2 + η of D; D is updated and A_1, A_2, ... are combined.]

Page 17: Boosting and Differential  Privacy

[Diagram: the boosting-for-queries loop. Initially D is uniform on Q. The base learner receives S, queries sampled from D, and outputs A_t, which does well on 1/2 + η of D. D is updated (-1/+1, renormalize), a termination test is run, and A_1, A_2, ... are combined by taking the median. Annotations: Privacy? An individual can affect many queries at once!]

Page 18: Boosting and Differential  Privacy

Privacy is problematic. In smooth boosting for people, at each round an individual has only a small effect on the probability distribution. In boosting for queries, an individual can affect the quality of q(A_t) simultaneously for many q. As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different A_t's. This is slightly ameliorated by sampling (if only a few samples are taken, maybe the q's on the edge can be avoided?).

How can we make the re-weighting less sensitive?

Page 19: Boosting and Differential  Privacy

Private Boosting for Queries [variant of AdaBoost]: the initial distribution D is uniform on the queries in Q, and S is always a set of k elements drawn from Q^k. The combiner is the median [viz. Freund92].

Weight update for queries:

If q is very well approximated by A_t (distance at most λ), decrease its weight by a factor of e ("-1").

If q is very poorly approximated by A_t (distance at least λ + μ), increase its weight by a factor of e ("+1").

In between, scale the exponent (down or up) with the distance from the midpoint:

2 ( |q(DB) - q(A_t)| - (λ + μ/2) ) / μ     (sensitivity 2ρ/μ)
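A minimal sketch of this update (assumed interface: q_on_DB[j] and q_on_At[j] hold the j-th query's answers on DB and on the current A_t; clamping the exponent to [-1, +1] is how I read the slide's two endpoint cases):

```python
import numpy as np

def query_weight_update(D, q_on_DB, q_on_At, lam, mu):
    """Weight update for private boosting over queries (AdaBoost variant).
    distance <= lam       : exponent -1 (well approximated; weight / e)
    distance >= lam + mu  : exponent +1 (poorly approximated; weight * e)
    in between            : linear in the distance from the midpoint lam + mu/2
    The exponent changes by at most 2*rho/mu when one row of DB changes,
    for queries of sensitivity rho."""
    d = np.abs(q_on_DB - q_on_At)
    expo = np.clip(2.0 * (d - (lam + mu / 2.0)) / mu, -1.0, 1.0)
    w = D * np.exp(expo)
    return w / w.sum()                 # renormalize
```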

Page 20: Boosting and Differential  Privacy

Theorem (minus some parameters). Let every q ∈ Q have sensitivity ≤ ρ. Run the query-boosting algorithm for T = log|Q| / η² rounds with

μ = ( (log|Q| / η²)² ρ √k ) / ε

The resulting object O is (ε + Tε_0, Tδ_0)-differentially private and, with high probability, gives (λ + μ)-accurate answers to all the queries in Q.

Better privacy (smaller ε) gives worse utility (larger μ); a better base learner (smaller k, larger η) helps.
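Plugging illustrative (entirely made-up) numbers into these formulas shows the tradeoff:

```python
import math

def boost_params(q_count, eta, k, rho, eps):
    """T and mu from the theorem statement (illustrative numbers only)."""
    T = math.log(q_count) / eta**2
    mu = T**2 * rho * math.sqrt(k) / eps
    return T, mu

# Smaller eps (better privacy) inflates mu (worse utility):
for eps in (0.1, 1.0):
    T, mu = boost_params(q_count=2**20, eta=0.2, k=100, rho=1.0, eps=eps)
    print(f"eps={eps}: T={T:.0f} rounds, mu={mu:.1f}")
```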

Page 21: Boosting and Differential  Privacy

Proving Privacy. Technique #1: "pay your debt and move on." Fix A_1, A_2, ..., A_t and record the confidence gain of D vs. D' so far ("pay your debt"); then focus on the gain in the selection of S ∈ Q^k in round t+1 ("move on"), based on the distributions D_{t+1} and D'_{t+1} determined in round t. Call these D and D'.

Technique #2: evolution of confidence [DiDwN03], "delay payment until the final reckoning." Choose q_1, q_2, ..., in turn. For each q ∈ Q, bound the per-query loss |ln( D[q] / D'[q] )| ≤ A and its expectation |E_{q∼D} ln( D[q] / D'[q] )| ≤ B. Then

Pr_{q_1,...,q_k} [ |Σ_i ln( D[q_i] / D'[q_i] )| > z√k (A + B) + kB ] < exp(-z²/2)
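A Monte Carlo sanity check of this tail bound (a sketch only; the two distributions are fabricated to be close in max log-ratio, and A and B are measured from them):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, z = 1000, 50, 2.0

# Two nearby distributions over n queries
D = rng.dirichlet(np.ones(n))
Dp = D * np.exp(rng.uniform(-0.1, 0.1, size=n))
Dp /= Dp.sum()
logratio = np.log(D / Dp)
A = np.abs(logratio).max()             # realized max |ln(D/D')|
B = abs(np.dot(D, logratio))           # realized |E_{q~D} ln(D/D')|

# Draw k queries from D many times; sum the log-ratios; compare tails
sums = logratio[rng.choice(n, size=(100_000, k), p=D)].sum(axis=1)
bound = z * np.sqrt(k) * (A + B) + k * B
print((np.abs(sums) > bound).mean(), "<?", np.exp(-z**2 / 2))
```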

Page 22: Boosting and Differential  Privacy

Bounding E_{q∼D} ln( D[q] / D'[q] ). Assume D and D' are A-dp with respect to one another, for A < 1. Then

0 ≤ E_{q∼D} ln[ D(q)/D'(q) ] ≤ 2A²   (that is, B ≤ 2A²).

KL(D||D') = Σ_q D(q) ln[ D(q)/D'(q) ] is always ≥ 0. For the upper bound:

KL(D||D') ≤ KL(D||D') + KL(D'||D)
          = Σ_q D(q) ( ln[ D(q)/D'(q) ] + ln[ D'(q)/D(q) ] ) + (D'(q) - D(q)) ln[ D'(q)/D(q) ]
          ≤ Σ_q 0 + |D'(q) - D(q)| · A
          = A Σ_q [ max(D(q), D'(q)) - min(D(q), D'(q)) ]
          ≤ A Σ_q [ e^A min(D(q), D'(q)) - min(D(q), D'(q)) ]
          = A Σ_q (e^A - 1) min(D(q), D'(q))
          ≤ 2A²   when A < 1,

using Σ_q min(D(q), D'(q)) ≤ 1 and e^A - 1 ≤ 2A for A < 1.

Compare DiDwN03.
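A numeric spot-check of the 2A² bound (fabricated distributions, as above):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Build D, D' that are A-dp with respect to one another
D = rng.dirichlet(np.ones(n))
Dp = D * np.exp(rng.uniform(-0.5, 0.5, size=n))
Dp /= Dp.sum()
A = np.abs(np.log(D / Dp)).max()        # realized A after renormalizing

kl = float(np.sum(D * np.log(D / Dp)))  # KL(D||D') = E_{q~D} ln(D/D')
assert 0.0 <= kl <= 2 * A**2
print(kl, "<=", 2 * A**2)
```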

Page 23: Boosting and Differential  Privacy

Motivation and Application

Boosting for people: logistic regression on 3000+-dimensional data. A slight twist on CM did pretty well (ε = 1.5), and we thought about alternatives.

Boosting for queries: reducing the dependence on the concept class in the work on synthetic databases in DNRRV09 (Salil's talk). We had over-interpreted the polytime DiNi-style attacks (we were spoiled): one cannot have cn queries with error o(√n). BLR08: cn queries with error O(n^{2/3}). DNRRV09: O(n^{1/2} |Q|^{o(1)}). Now: O(n^{1/2} log² |Q|).

The result is more general, but we only know of a base learner for counting queries.

Page 24: Boosting and Differential  Privacy

[Closing slide: the boosting loop diagram from Page 4, repeated.]