
Gaussian Differential Privacy with Applications to Deep Learning

Weijie Su

Wharton School, University of Pennsylvania

Big Brother is watching you! [1984, George Orwell]

Can anonymization preserve privacy?

The Netflix competition

• In 2006, Narayanan and Shmatikov demonstrated that

Netflix ratings + IMDb = De-anonymization

• The second Netflix competition was canceled



Releasing summary statistics?

Genomic research often releases minor allele frequencies (MAFs), i.e., sample means

In 2008, Homer et al shocked the genetics community by showing that MAFs are not private

What do we lose if we give up privacy? [WSJ ’13]

Peggy Noonan: A loss of privacy is a loss of something personal and intimate

Nat Hentoff: Privacy is an American constitutional liberty right

Is this our future?


From hypothesis testing to privacy

In 2006, Dwork, McSherry, Nissim, and Smith related privacy to hypothesis testing


Interpreting privacy via hypothesis testing

Two neighboring datasets:

S = {Anne, Jane, Ed,Bob} and S′ = {Eva, Jane, Ed,Bob}

Based on the output of an algorithm, perform the hypothesis test

H0 : true dataset is S vs H1 : true dataset is S′, that is, H0 : Anne is in the dataset vs H1 : Eva is in the dataset

• The privacy of Anne and Eva is preserved if this hypothesis testing problem is difficult

• This is the essence of differential privacy (DP)


The impact of differential privacy

Google (Chrome), Apple (iOS 10+), Microsoft, U.S. Census Bureau [Dwork and Roth ’14; Erlingsson et al ’14; Apple DP team ’17; Ding et al ’17; Abowd ’16]

Test of time: 2017 Gödel Prize

What’s new in this talk?


Collaborators

Jinshuo Dong (Penn CS) Aaron Roth (Penn CS)


A new privacy notion and the old one

f-differential privacy: this talk

• Interpreting privacy via hypothesis testing

• Privacy measure: trade-off between type I and type II errors

• Privacy functional parameter: f : [0, 1] → [0, 1]

• How to achieve: adding Gaussian noise

(ε, δ)-differential privacy: Dwork et al

• Interpreting privacy via hypothesis testing

• Privacy measure: worst-case likelihood ratio

• Privacy parameters: ε > 0, 0 ≤ δ < 1

• How to achieve: adding Laplace noise


A very early preview of f-DP


The zoo of differential privacy

Informativeness Composition Subsampling

ε-DP

(ε, δ)-DP

Divergence based DPs

f-DP


Informativeness

Informative representation of privacy guarantees of algorithms?

• The two parameters in (ε, δ)-DP are not enough

• Rényi divergence is lossy (concentrated DP [Dwork, Rothblum ’16], zero-concentrated DP [Bun, Steinke ’16], truncated concentrated DP [Bun, Dwork, Rothblum, Steinke ’18], Rényi DP [Mironov ’17])

Composition

How does the overall privacy degrade under a sequence of private algorithms?

• The advanced composition theorem in (ε, δ)-DP is not accurate

Subsampling

How is privacy amplified with subsampling?

• Privacy amplification by subsampling is either inaccurate or complicated in divergence-based DPs

Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


Trade-off functions

H0 : true dataset is S vs H1 : true dataset is S′

H0 : P vs H1 : Q

For a rejection rule φ ∈ [0, 1], denote by αφ = E_P[φ] the type I error and by βφ = 1 − E_Q[φ] the type II error

Definition
For two probability distributions P and Q, define the trade-off function T(P, Q) : [0, 1] → [0, 1] as

T(P, Q)(α) = inf_φ { βφ : αφ ≤ α }

• The infimum is characterized by the Neyman–Pearson lemma

• A function f is a trade-off function if and only if f is convex, continuous, non-increasing, and f(α) ≤ 1 − α for α ∈ [0, 1]
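For the Gaussian pair P = N(0, 1), Q = N(µ, 1), the Neyman–Pearson lemma gives the trade-off function in closed form: T(P, Q)(α) = Φ(Φ⁻¹(1 − α) − µ). A minimal sketch using only the standard library (the function name is ours):

```python
from statistics import NormalDist

_nd = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}

def gaussian_tradeoff(mu, alpha):
    """T(N(0,1), N(mu,1))(alpha) = Phi(Phi^{-1}(1 - alpha) - mu),
    the type II error of the most powerful level-alpha test."""
    if alpha <= 0.0:
        return 1.0
    if alpha >= 1.0:
        return 0.0
    return _nd.cdf(_nd.inv_cdf(1.0 - alpha) - mu)
```

Setting µ = 0 recovers perfect privacy, T(P, P)(α) = 1 − α, and the function is its own inverse, reflecting the symmetry of the two error types.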


Definition of f-DP

Definition
A (randomized) algorithm M is said to be f-differentially private if

T(M(S), M(S′)) ≥ f

for all neighboring datasets S and S′

• The randomness of M(S) and M(S′) comes from the algorithm M

• Distinguishing between Anne and Eva is no easier than telling apart P and Q if f = T(P, Q)

• Related to the hypothesis testing region [Kairouz et al ’17]

When is it f-DP?

[Figure: curves in the (type I error, type II error) unit square. A trade-off curve lying everywhere on or above f is f-DP; curves dipping below f are not f-DP]

Symmetrization

Type I and type II errors in privacy are symmetric. How about f?

Proposition

Let M be f-DP. Then M is f^symm-DP with f^symm(α) = max{f(α), f⁻¹(α)}

• The inverse f⁻¹(α) := inf{t ∈ [0, 1] : f(t) ≤ α}

• f^symm is a trade-off function


Connection with (ε, δ)-DP

(ε, δ)-DP requires that for any (measurable) event E

P(M(S) ∈ E) ≤ e^ε P(M(S′) ∈ E) + δ

Proposition (Wasserman and Zhou ’10)

Denote fε,δ(α) = max {0, 1 − δ − e^ε α, e^{−ε}(1 − δ − α)}. An algorithm M is (ε, δ)-DP if and only if it is fε,δ-DP

[Figure: the piecewise linear curve fε,δ in the (type I error, type II error) square, with intercept 1 − δ and initial slope −e^ε; the region above it is indistinguishable]
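The trade-off function in the proposition is a pointwise maximum of three lines and can be evaluated directly; a small sketch (name ours):

```python
import math

def f_eps_delta(eps, delta, alpha):
    """Trade-off function equivalent to (eps, delta)-DP (Wasserman-Zhou):
    the max of 0, 1 - delta - e^eps * alpha, and e^{-eps} (1 - delta - alpha)."""
    return max(0.0,
               1.0 - delta - math.exp(eps) * alpha,
               math.exp(-eps) * (1.0 - delta - alpha))
```

Its graph starts at height 1 − δ with initial slope −e^ε, matching the figure.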


Happy with (ε, δ)-DP?

• Four linear segments. A bit ad hoc?

• With probability δ, very bad events can happen


A primal-dual perspective


From dual to primal

[Figure: supporting lines with intercept 1 − δi and slope −e^εi, one per (εi, δi)-DP guarantee, whose upper envelope assembles into a trade-off curve]


From dual to primal

Proposition

Let I be an index set. An algorithm satisfies the collection of (εi, δi)-DP guarantees for all i ∈ I if and only if it is f-DP with f(α) = sup_{i∈I} fεi,δi(α)
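The envelope in the proposition is a pointwise supremum, so a collection of (εi, δi) guarantees converts to f-DP in one line; a sketch (names ours):

```python
import math

def f_eps_delta(eps, delta, alpha):
    # trade-off function equivalent to a single (eps, delta)-DP guarantee
    return max(0.0, 1.0 - delta - math.exp(eps) * alpha,
               math.exp(-eps) * (1.0 - delta - alpha))

def f_from_guarantees(pairs, alpha):
    """Trade-off function implied by a collection {(eps_i, delta_i)}:
    the pointwise supremum of the individual f_{eps_i, delta_i}."""
    return max(f_eps_delta(e, d, alpha) for e, d in pairs)
```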

From primal to dual

[Figure: the trade-off function f together with its supporting lines; each supporting line corresponds to an (ε, δ(ε))-DP guarantee]


From primal to dual

Proposition

An algorithm is f-DP if and only if it is (ε, δ(ε))-DP for all ε ≥ 0 with

δ(ε) = 1 + f∗(−e^ε)

• The conjugate f∗(y) = sup_{x∈[0,1]} (yx − f(x))

• f-DP is equivalent to an infinite collection of (ε, δ)-DP guarantees

• Work in whichever domain is more convenient: a powerful tool


Is f too general? Let’s focus!


Gaussian differential privacy (GDP)

Consider the Gaussian trade-off function

Gµ := T(N(0, 1), N(µ, 1))

for µ ≥ 0. Explicitly, Gµ(α) = Φ(Φ⁻¹(1 − α) − µ)

Definition
An algorithm M is said to be µ-GDP if

T(M(S), M(S′)) ≥ Gµ

for all neighboring datasets S and S′

• Focal to f-DP (a central limit theorem phenomenon)


How to interpret µ in GDP?

[Figure: left, the curve fε,δ with intercept 1 − δ and slope −e^ε; right, the Gaussian trade-off curves G0.5, G1, G3, and G6]

• Privacy amounts to distinguishing between N(0, 1) and N(µ, 1)

• µ ≤ 0.5: reasonably private; µ ≥ 6: blatantly non-private

Mask individual characteristics via adding noise

Sensitivity ∆θ := max_{S∼S′} |θ(S) − θ(S′)|

Theorem
Consider the Gaussian mechanism M(S) = θ(S) + ξ, where ξ ∼ N(0, σ²) for σ > 0. Then M is µ-GDP for µ = ∆θ/σ

• The Gaussian mechanism is to GDP as the Laplace mechanism is to ε-DP

• Lighter tail than Laplace noise
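A sketch of the mechanism (the function names and the toy mean query are ours):

```python
import random

def gaussian_mechanism(value, sensitivity, mu, rng=None):
    """Release value + N(0, sigma^2) with sigma = sensitivity / mu,
    which the theorem above guarantees to be mu-GDP."""
    rng = rng or random
    sigma = sensitivity / mu
    return value + rng.gauss(0.0, sigma)

# Example: the mean of n values in [0, 1] has sensitivity 1/n
def private_mean(xs, mu, rng=None):
    n = len(xs)
    return gaussian_mechanism(sum(xs) / n, 1.0 / n, mu, rng)
```

Smaller µ (more privacy) means larger σ, i.e., more noise masking any individual record.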


Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


Blackwell’s theorem

We say (P′, Q′) is Blackwell harder to distinguish than (P, Q) if there is a post-processing Proc such that P′ = Proc(P) and Q′ = Proc(Q)

Theorem (Blackwell ’50)
The following two statements are equivalent:

(a) T(P′, Q′) ≥ T(P, Q)

(b) (P′, Q′) is Blackwell harder to distinguish than (P, Q)

• Trade-off functions induce the same ordering as post-processing

Rényi divergence is not as informative

Rényi divergence of order γ:

Rγ(P‖Q) := (1/(γ − 1)) log E_Q (dP/dQ)^γ

• Concentrated DP [Dwork, Rothblum ’16], zero-concentrated DP [Bun, Steinke ’16], truncated concentrated DP [Bun, Dwork, Rothblum, Steinke ’18], and Rényi DP [Mironov ’17] are all defined via Rényi divergence

Proposition

Let Pε = Bern(e^ε/(1 + e^ε)) and Qε = Bern(1/(1 + e^ε)). For 0 < ε < 4, the following are true:

(a) For all γ > 1, Rγ(Pε‖Qε) < Rγ(N(0, 1)‖N(ε, 1))

(b) Using total variation, dTV(Pε, Qε) > dTV(N(0, 1), N(ε, 1))

• No such phenomenon exists for trade-off functions

• Similar examples exist for (ε, δ)-DP


Properties of f-DP

• Informative representation of privacy

Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


What is composition?


Composition surely leads to a privacy compromise. But how fast?

Definition of composition

Let M1 : X → Y1 and M2 : X × Y1 → Y2 be private algorithms. Define their composition M : X → Y1 × Y2 as

M(S) = (M1(S), M2(S, M1(S)))

Given a sequence of algorithms Mi : X × Y1 × · · · × Yi−1 → Yi for i ≤ k, recursively define the composition

M : X → Y1 × · · · × Yk

Tensor product of trade-off functions

Definition

The tensor product of two trade-off functions f = T(P, Q) and g = T(P′, Q′) is defined as

f ⊗ g := T(P × P′, Q × Q′)

• Well-defined

• The operator ⊗ is commutative and associative

• For GDP, Gµ1 ⊗ Gµ2 ⊗ · · · ⊗ Gµk = Gµ, where µ = √(µ1² + · · · + µk²)

Composition is an algebra

Theorem
Suppose Mi(·, y1, . . . , yi−1) is fi-DP for all y1 ∈ Y1, . . . , yi−1 ∈ Yi−1. Then the composition algorithm M : X → Y1 × · · · × Yk is f1 ⊗ · · · ⊗ fk-DP

• Cannot be improved in general

• Composition in f-DP is reduced to algebra

• The k-step composition of µ-GDP algorithms is √k·µ-GDP
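For GDP the composition algebra collapses to a single line; a sketch (name ours):

```python
import math

def compose_gdp(mus):
    """Overall GDP parameter of composing mu_i-GDP algorithms:
    G_{mu_1} (x) ... (x) G_{mu_k} = G_mu with mu = sqrt(mu_1^2 + ... + mu_k^2)."""
    return math.sqrt(sum(m * m for m in mus))
```

In particular, a k-step composition of µ-GDP algorithms gives √k·µ.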


Central limit theorem for f-DP

Theorem (informal)
Let {fki : 1 ≤ i ≤ k, k = 1, 2, . . .} be a triangular array of trade-off functions, each being O(1/√k) close to perfect privacy. Then

lim_{k→∞} fk1 ⊗ fk2 ⊗ · · · ⊗ fkk = Gµ

• The convergence is uniform on [0, 1]

• µ can be computed from {fki}

• If Mki is fki-DP, their composition is approximately µ-GDP

• An effective approximation tool

• GDP is to f-DP as Gaussian random variables are to general random variables

• The proof follows from Le Cam’s third lemma


Central limit theorem for ε-DP

Theorem

Fix µ > 0 and set ε = µ/√k. Then, for a constant c depending only on µ,

Gµ(α + c/k) − c/k ≤ fε,0⊗k(α) ≤ Gµ(α − c/k) + c/k

• Exact computation of the k-fold composition is #P-complete [Murtagh and Vadhan ’16]

• Sharper than the O(1/√k) bound in Berry–Esseen

Privacy CLT beats Berry–Esseen for ε-DP! Why?

• The randomization of rejection rules leads to continuity of trade-off functions


In CLT we trust

[Figure] 10-fold composition of (1/√10, 0)-DP; δ = 0.001 in the green curve

Properties of f-DP

• Informative representation of privacy

• Algebraically convenient and tight composition operations


Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


What is subsampling for privacy?

Given a dataset S, apply the algorithm M to a subsampled dataset sub(S), resulting in a new algorithm M ∘ sub

• Subsampling provides stronger privacy guarantees than running on the whole dataset

• A frequently used tool for amplifying privacy

Subsampling theorem for f-DP

sub_m uniformly picks an m-sized subset of S. Let p := m/n

The p-sampling operator Cp acting on trade-off functions:

Cp(f) := Conv(min{fp, fp⁻¹}) = (min{fp, fp⁻¹})∗∗

• fp = pf + (1 − p)Id, with Id(α) = 1 − α

• (min{fp, fp⁻¹})∗∗ is the double (convex) conjugate of min{fp, fp⁻¹}, i.e., its greatest convex lower bound

Theorem
If M is f-DP, then M ∘ sub_m is Cp(f)-DP, and this bound is tight

• The subsampling theorem for Rényi DP is quite complicated [Wang, Balle, and Kasiviswanathan ’18]
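The operator can be evaluated numerically: form fp, symmetrize by taking the minimum with its numerical inverse, then take the greatest convex minorant, here via a lower convex hull over a grid. A sketch under those steps (names ours; f is any trade-off function given as a callable):

```python
from statistics import NormalDist

_nd = NormalDist()

def G(mu, alpha):
    # Gaussian trade-off function G_mu
    if alpha <= 0.0:
        return 1.0
    if alpha >= 1.0:
        return 0.0
    return _nd.cdf(_nd.inv_cdf(1.0 - alpha) - mu)

def subsample_tradeoff(f, p, grid=1001):
    """Evaluate C_p(f) = Conv(min{f_p, f_p^{-1}}) on a grid, where
    f_p = p*f + (1 - p)*Id. Returns (xs, ys)."""
    xs = [i / (grid - 1) for i in range(grid)]

    def fp(a):
        return p * f(a) + (1.0 - p) * (1.0 - a)

    def fp_inv(a):  # inf{t : f_p(t) <= a}; f_p is strictly decreasing
        lo, hi = 0.0, 1.0
        for _ in range(40):
            mid = (lo + hi) / 2.0
            if fp(mid) <= a:
                hi = mid
            else:
                lo = mid
        return hi

    pts = [(x, min(fp(x), fp_inv(x))) for x in xs]
    # greatest convex minorant = lower convex hull of the sampled points
    hull = []
    for (px, py) in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (px - x1) * (y2 - y1) <= 0.0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    ys, j = [], 0
    for x in xs:  # interpolate the hull linearly back onto the grid
        while j + 1 < len(hull) - 1 and hull[j + 1][0] < x:
            j += 1
        (x1, y1), (x2, y2) = hull[j], hull[j + 1]
        ys.append(y1 if x2 == x1 else y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    return xs, ys
```

For p = 1 (no subsampling) and a symmetric f such as Gµ, the operator returns f itself; for p < 1 the curve moves toward Id, i.e., toward perfect privacy.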


Numerical examples

[Figure, two panels. Left: f = G1.8 with p = 0.35, showing f, fp, fp⁻¹, and Cp(f). Right: fε,δ with ε = 3, δ = 0.1, p = 0.2, showing Cp(fε,δ) against fε′,δ′]

Our gain

Properties of f-DP

• Informative representation of privacy

• Algebraically convenient and tight composition operations

• Sharp privacy amplification via subsampling


Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


Collaborators

Zhiqi Bu (Wharton) Jinshuo Dong (Penn CS) Qi Long (Penn Biostatistics)


Privacy concerns in deep learning

[Diagram: Dataset → Server → Model, highlighting privacy issues of the training data]

• Influential paper Deep Learning with Differential Privacy by Google Brain [Abadi et al ’16]


Private deep learning


Reproducing results of Google Brain

[Figure: test accuracy vs epochs under medium noise (σ = 0.7), comparing private test accuracy with non-private test accuracy. Experiments on MNIST [Abadi et al ’16]]

Can the f-DP framework improve privacy analysis and prediction accuracy?

Composition and subsampling via f-DP

SGD equation: θ_{t+1} = SGD ∘ sub(S; θ_t)

Lemma
SGD with noise sampled from N(0, (4C²/(B²µ²)) I_d) is µ-GDP

Thus, we get

Theorem
The SGD mechanism M(S) = (θ1, θ2, . . . , θm) is

C_{B/n}(Gµ) ⊗ C_{B/n}(Gµ) ⊗ · · · ⊗ C_{B/n}(Gµ)  (m factors) -DP,

which is, asymptotically, µ̃-GDP with

µ̃ = (B/n) √( 2m (e^{µ²} Φ(3µ/2) + 3Φ(−µ/2) − 2) )
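Plugging numbers into the asymptotic formula is a one-liner. A sketch (names ours), assuming the per-step parameter relates to the noise multiplier σ of [Abadi et al ’16] as µ = 1/σ and MNIST-style settings n = 60000, B = 256 (both are our assumptions, chosen to match the experiments reported in this talk):

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def noisy_sgd_gdp(B, n, m, mu):
    """Asymptotic GDP parameter of m-step noisy SGD (theorem above):
    mu_tilde = (B/n) * sqrt(2m (e^{mu^2} Phi(3mu/2) + 3 Phi(-mu/2) - 2))."""
    h = math.exp(mu ** 2) * Phi(1.5 * mu) + 3.0 * Phi(-0.5 * mu) - 2.0
    return (B / n) * math.sqrt(2.0 * m * h)
```

With 15 epochs (m = 15n/B steps) and σ = 1.3 this evaluates to ≈ 0.285, in line with the 0.285-GDP figure quoted for MNIST.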

Tighter privacy analysis via f-DP

[Figure, three panels of trade-off curves. Solid red: our subsampling theorem and CLT; dashed blue: the moments accountant (MA) developed in [Abadi et al ’16].
Left: 95.0% accuracy in 15 epochs, σ = 1.3 — 0.285-GDP by CLT vs (1.19, 1.0e-5)-DP by MA.
Middle: 96.6% accuracy in 60 epochs, σ = 1.1 — 0.737-GDP by CLT vs (3.01, 1.0e-5)-DP by MA.
Right: 97.0% accuracy in 45 epochs, σ = 0.7 — 1.554-GDP by CLT vs (7.1, 1.0e-5)-DP by MA]


Privacy guarantees in (ε, δ)-DP

[Figure: the privacy parameter ε as a function of epochs under medium noise (σ = 0.7), accounted via GDP vs via (ε, δ)-DP. Experiments on MNIST, with δ = 10⁻⁷]

Same ε and δ, but less noise

[Figure, two panels for σ = 0.7. Left: the noise added under (ε, δ)-DP accounting vs the smaller noise shown necessary by GDP. Right: test accuracy with only the necessary noise by GDP, compared with the (ε, δ)-DP private accuracy and the non-private accuracy]

More privacy and higher accuracy

[Diagram: Dataset → Server → Model — privacy with better accuracy]

Concluding remarks


Summary

Informativeness Composition Subsampling

ε-DP

(ε, δ)-DP

Divergence based DPs

f-DP

In the f-DP framework

• Trade-off functions are informative for privacy loss

• Composition is equivalent to tensor product

• Subsampling is to average and convexify


Take-home messages

• Gaussian Differential Privacy, with Jinshuo Dong and Aaron Roth. arXiv:1905.02383

• Deep Learning with Gaussian Differential Privacy, with Zhiqi Bu, Jinshuo Dong, and Qi Long. arXiv:1911.11607

NSF CAREER DMS-1847415, CCF-1763314, CCF-1934876, and Wharton Dean’s Fund
