
Gaussian Differential Privacy with Applications to Deep Learning

Weijie Su

Wharton School, University of Pennsylvania

Big Brother is watching you! [1984, George Orwell]

Can anonymization preserve privacy?

The Netflix competition

• In 2006, Narayanan and Shmatikov demonstrated that

Netflix ratings + IMDb = De-anonymization

• The second Netflix competition was canceled



Releasing summary statistics?

Genomic research often releases minor allele frequencies (MAFs), i.e., sample means

In 2008, Homer et al shocked the genetics community by showing that MAFs are not private

What do we lose if we give up privacy? [WSJ ’13]

Peggy Noonan: A loss of privacy is a loss of something personal and intimate

Nat Hentoff: Privacy is an American constitutional liberty right

Is this our future?


From hypothesis testing to privacy

In 2006, Dwork, McSherry, Nissim, and Smith related privacy to hypothesis testing


Interpreting privacy via hypothesis testing

Two neighboring datasets:

S = {Anne, Jane, Ed,Bob} and S′ = {Eva, Jane, Ed,Bob}

Based on the output of an algorithm, perform the hypothesis test

H0 : true dataset is S vs H1 : true dataset is S′, that is, H0 : Anne is in the dataset vs H1 : Eva is in the dataset

• The privacy of Anne and Eva is preserved if this hypothesis testing problem is difficult

• This is the essence of differential privacy (DP)


The impact of differential privacy

Google (Chrome), Apple (iOS 10+), Microsoft, U.S. Census Bureau [Dwork and Roth ’14; Erlingsson et al ’14; Apple DP team ’17; Ding et al ’17; Abowd ’16]

Test of time: 2017 Gödel Prize

What’s new in this talk?


Collaborators

Jinshuo Dong (Penn CS) Aaron Roth (Penn CS)


A new privacy notion and the old one

f-differential privacy: this talk

• Interpreting privacy via hypothesis testing

• Privacy measure: trade-off between type I and type II errors

• Privacy functional parameter: f : [0, 1] → [0, 1]

• How to achieve: adding Gaussian noise

(ε, δ)-differential privacy: Dwork et al

• Interpreting privacy via hypothesis testing

• Privacy measure: worst-case likelihood ratio

• Privacy parameters: ε > 0, 0 ≤ δ < 1

• How to achieve: adding Laplace noise


A very early preview of f-DP


The zoo of differential privacy

Informativeness Composition Subsampling

ε-DP

(ε, δ)-DP

Divergence based DPs

f-DP


Informativeness

Informative representation of privacy guarantees of algorithms?

• The two parameters in (ε, δ)-DP are not enough

• Rényi divergence is lossy (concentrated DP [Dwork, Rothblum ’16], zero-concentrated DP [Bun, Steinke ’16], truncated concentrated DP [Bun, Dwork, Rothblum, Steinke ’18], Rényi DP [Mironov ’17])

Composition

How does the overall privacy degrade under a sequence of private algorithms?

• The advanced composition theorem in (ε, δ)-DP is not accurate

Subsampling

How is privacy amplified with subsampling?

• Privacy amplification by subsampling is either inaccurate or complicated in divergence-based DPs

Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


Trade-off functions

H0 : true dataset is S vs H1 : true dataset is S′

H0 : P vs H1 : Q

For a rejection rule φ ∈ [0, 1], denote by αφ = E_P[φ] the type I error and by βφ = 1 − E_Q[φ] the type II error

Definition
For two probability distributions P and Q, define the trade-off function T(P, Q) : [0, 1] → [0, 1] as

T(P, Q)(α) = inf_φ { βφ : αφ ≤ α }

• The infimum is characterized by the Neyman–Pearson lemma

• A function f is a trade-off function if and only if f is convex, continuous, non-increasing, and f(α) ≤ 1 − α for α ∈ [0, 1]
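For the Gaussian pair P = N(0, 1), Q = N(µ, 1), the Neyman–Pearson lemma gives the trade-off function in closed form: T(P, Q)(α) = Φ(Φ⁻¹(1 − α) − µ). A minimal sketch using only the standard library (the function name is ours):

```python
from statistics import NormalDist

_nd = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}

def gaussian_tradeoff(mu, alpha):
    """T(N(0,1), N(mu,1))(alpha) = Phi(Phi^{-1}(1 - alpha) - mu),
    the type II error of the most powerful level-alpha test."""
    if alpha <= 0.0:
        return 1.0
    if alpha >= 1.0:
        return 0.0
    return _nd.cdf(_nd.inv_cdf(1.0 - alpha) - mu)
```

Setting µ = 0 recovers perfect privacy, T(P, P)(α) = 1 − α, and the function is its own inverse, reflecting the symmetry of the two error types.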


Definition of f-DP

Definition
A (randomized) algorithm M is said to be f-differentially private if

T(M(S), M(S′)) ≥ f

for all neighboring datasets S and S′

• The randomness of M(S) and M(S′) comes from the algorithm M

• Distinguishing between Anne and Eva is no easier than telling apart P and Q if f = T(P, Q)

• Related to the hypothesis testing region [Kairouz et al ’17]

When is it f-DP?

[Figure: curves in the (type I error, type II error) unit square. A trade-off curve lying everywhere on or above f is f-DP; curves dipping below f are not f-DP]

Symmetrization

Type I and type II errors in privacy are symmetric. How about f?

Proposition

Let M be f-DP. Then M is f^symm-DP with f^symm(α) = max{f(α), f⁻¹(α)}

• The inverse f⁻¹(α) := inf{t ∈ [0, 1] : f(t) ≤ α}

• f^symm is a trade-off function


Connection with (ε, δ)-DP

(ε, δ)-DP requires that for any (measurable) event E

P(M(S) ∈ E) ≤ e^ε P(M(S′) ∈ E) + δ

Proposition (Wasserman and Zhou ’10)

Denote fε,δ(α) = max {0, 1 − δ − e^ε α, e^{−ε}(1 − δ − α)}. An algorithm M is (ε, δ)-DP if and only if it is fε,δ-DP

[Figure: the piecewise linear curve fε,δ in the (type I error, type II error) square, with intercept 1 − δ and initial slope −e^ε; the region above it is indistinguishable]
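The trade-off function in the proposition is a pointwise maximum of three lines and can be evaluated directly; a small sketch (name ours):

```python
import math

def f_eps_delta(eps, delta, alpha):
    """Trade-off function equivalent to (eps, delta)-DP (Wasserman-Zhou):
    the max of 0, 1 - delta - e^eps * alpha, and e^{-eps} (1 - delta - alpha)."""
    return max(0.0,
               1.0 - delta - math.exp(eps) * alpha,
               math.exp(-eps) * (1.0 - delta - alpha))
```

Its graph starts at height 1 − δ with initial slope −e^ε, matching the figure.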


Happy with (ε, δ)-DP?

• Four linear segments. A bit ad hoc?

• With probability δ, very bad events can happen


A primal-dual perspective


From dual to primal

[Figure: supporting lines with intercept 1 − δi and slope −e^εi, one per (εi, δi)-DP guarantee, whose upper envelope assembles into a trade-off curve]


From dual to primal

Proposition

Let I be an index set. An algorithm satisfies the collection of (εi, δi)-DP guarantees for all i ∈ I if and only if it is f-DP with f(α) = sup_{i∈I} fεi,δi(α)
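The envelope in the proposition is a pointwise supremum, so a collection of (εi, δi) guarantees converts to f-DP in one line; a sketch (names ours):

```python
import math

def f_eps_delta(eps, delta, alpha):
    # trade-off function equivalent to a single (eps, delta)-DP guarantee
    return max(0.0, 1.0 - delta - math.exp(eps) * alpha,
               math.exp(-eps) * (1.0 - delta - alpha))

def f_from_guarantees(pairs, alpha):
    """Trade-off function implied by a collection {(eps_i, delta_i)}:
    the pointwise supremum of the individual f_{eps_i, delta_i}."""
    return max(f_eps_delta(e, d, alpha) for e, d in pairs)
```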

From primal to dual

[Figure: the trade-off function f together with its supporting lines; each supporting line corresponds to an (ε, δ(ε))-DP guarantee]


From primal to dual

Proposition

An algorithm is f-DP if and only if it is (ε, δ(ε))-DP for all ε ≥ 0 with

δ(ε) = 1 + f∗(−e^ε)

• The conjugate f∗(y) = sup_{x∈[0,1]} (yx − f(x))

• f-DP is equivalent to an infinite collection of (ε, δ)-DP guarantees

• Work in whichever domain is more convenient: a powerful tool


Is f too general? Let’s focus!


Gaussian differential privacy (GDP)

Consider the Gaussian trade-off function

Gµ := T(N(0, 1), N(µ, 1))

for µ ≥ 0. Explicitly, Gµ(α) = Φ(Φ⁻¹(1 − α) − µ)

Definition
An algorithm M is said to be µ-GDP if

T(M(S), M(S′)) ≥ Gµ

for all neighboring datasets S and S′

• Focal to f-DP (a central limit theorem phenomenon)


How to interpret µ in GDP?

[Figure: left, the curve fε,δ with intercept 1 − δ and slope −e^ε; right, the Gaussian trade-off curves G0.5, G1, G3, and G6]

• Privacy amounts to distinguishing between N(0, 1) and N(µ, 1)

• µ ≤ 0.5: reasonably private; µ ≥ 6: blatantly non-private

Mask individual characteristics via adding noise

Sensitivity ∆θ := max_{S∼S′} |θ(S) − θ(S′)|

Theorem
Consider the Gaussian mechanism M(S) = θ(S) + ξ, where ξ ∼ N(0, σ²) for σ > 0. Then M is µ-GDP for µ = ∆θ/σ

• The Gaussian mechanism is to GDP as the Laplace mechanism is to ε-DP

• Lighter tail than Laplace noise
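A sketch of the mechanism (the function names and the toy mean query are ours):

```python
import random

def gaussian_mechanism(value, sensitivity, mu, rng=None):
    """Release value + N(0, sigma^2) with sigma = sensitivity / mu,
    which the theorem above guarantees to be mu-GDP."""
    rng = rng or random
    sigma = sensitivity / mu
    return value + rng.gauss(0.0, sigma)

# Example: the mean of n values in [0, 1] has sensitivity 1/n
def private_mean(xs, mu, rng=None):
    n = len(xs)
    return gaussian_mechanism(sum(xs) / n, 1.0 / n, mu, rng)
```

Smaller µ (more privacy) means larger σ, i.e., more noise masking any individual record.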


Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


Blackwell’s theorem

We say (P′, Q′) is Blackwell harder to distinguish than (P, Q) if there is a post-processing Proc such that P′ = Proc(P) and Q′ = Proc(Q)

Theorem (Blackwell ’50)
The following two statements are equivalent:

(a) T(P′, Q′) ≥ T(P, Q)

(b) (P′, Q′) is Blackwell harder to distinguish than (P, Q)

• Trade-off functions induce the same ordering as post-processing

Rényi divergence is not as informative

Rényi divergence of order γ:

Rγ(P‖Q) := (1/(γ − 1)) log E_Q (dP/dQ)^γ

• Concentrated DP [Dwork, Rothblum ’16], zero-concentrated DP [Bun, Steinke ’16], truncated concentrated DP [Bun, Dwork, Rothblum, Steinke ’18], and Rényi DP [Mironov ’17] are all defined via Rényi divergence

Proposition

Let Pε = Bern(e^ε/(1 + e^ε)) and Qε = Bern(1/(1 + e^ε)). For 0 < ε < 4, the following are true:

(a) For all γ > 1, Rγ(Pε‖Qε) < Rγ(N(0, 1)‖N(ε, 1))

(b) Using total variation, dTV(Pε, Qε) > dTV(N(0, 1), N(ε, 1))

• No such phenomenon exists for trade-off functions

• Similar examples exist for (ε, δ)-DP


Properties of f-DP

• Informative representation of privacy

Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


What is composition?


Composition surely leads to a privacy compromise. But how fast?

Definition of composition

Let M1 : X → Y1 and M2 : X × Y1 → Y2 be private algorithms. Define their composition M : X → Y1 × Y2 as

M(S) = (M1(S), M2(S, M1(S)))

Given a sequence of algorithms Mi : X × Y1 × · · · × Yi−1 → Yi for i ≤ k, recursively define the composition

M : X → Y1 × · · · × Yk

Tensor product of trade-off functions

Definition

The tensor product of two trade-off functions f = T(P, Q) and g = T(P′, Q′) is defined as

f ⊗ g := T(P × P′, Q × Q′)

• Well-defined

• The operator ⊗ is commutative and associative

• For GDP, Gµ1 ⊗ Gµ2 ⊗ · · · ⊗ Gµk = Gµ, where µ = √(µ1² + · · · + µk²)

Composition is an algebra

Theorem
Suppose Mi(·, y1, . . . , yi−1) is fi-DP for all y1 ∈ Y1, . . . , yi−1 ∈ Yi−1. Then the composition algorithm M : X → Y1 × · · · × Yk is f1 ⊗ · · · ⊗ fk-DP

• Cannot be improved in general

• Composition in f-DP is reduced to algebra

• The k-step composition of µ-GDP algorithms is √k·µ-GDP
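For GDP the composition algebra collapses to a single line; a sketch (name ours):

```python
import math

def compose_gdp(mus):
    """Overall GDP parameter of composing mu_i-GDP algorithms:
    G_{mu_1} (x) ... (x) G_{mu_k} = G_mu with mu = sqrt(mu_1^2 + ... + mu_k^2)."""
    return math.sqrt(sum(m * m for m in mus))
```

In particular, a k-step composition of µ-GDP algorithms gives √k·µ.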


Central limit theorem for f-DP

Theorem (informal)
Let {fki : 1 ≤ i ≤ k, k = 1, 2, . . .} be a triangular array of trade-off functions, each being O(1/√k) close to perfect privacy. Then

lim_{k→∞} fk1 ⊗ fk2 ⊗ · · · ⊗ fkk = Gµ

• The convergence is uniform on [0, 1]

• µ can be computed from {fki}

• If Mki is fki-DP, their composition is approximately µ-GDP

• An effective approximation tool

• GDP is to f-DP as Gaussian random variables are to general random variables

• The proof follows from Le Cam’s third lemma


Central limit theorem for ε-DP

Theorem

Fix µ > 0 and set ε = µ/√k. Then, for a constant c depending only on µ,

Gµ(α + c/k) − c/k ≤ fε,0⊗k(α) ≤ Gµ(α − c/k) + c/k

• Exact computation of the k-fold composition is #P-complete [Murtagh and Vadhan ’16]

• Sharper than the O(1/√k) bound in Berry–Esseen

Privacy CLT beats Berry–Esseen for ε-DP! Why?

• The randomization of rejection rules leads to continuity of trade-off functions


In CLT we trust

[Figure] 10-fold composition of (1/√10, 0)-DP; δ = 0.001 in the green curve

Properties of f-DP

• Informative representation of privacy

• Algebraically convenient and tight composition operations


Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


What is subsampling for privacy?

Given a dataset S, apply the algorithm M to a subsampled dataset sub(S), resulting in a new algorithm M ∘ sub

• Subsampling provides stronger privacy guarantees than running on the whole dataset

• A frequently used tool for amplifying privacy

Subsampling theorem for f-DP

sub_m uniformly picks an m-sized subset of S. Let p := m/n

The p-sampling operator Cp acting on trade-off functions:

Cp(f) := Conv(min{fp, fp⁻¹}) = (min{fp, fp⁻¹})∗∗

• fp = pf + (1 − p)Id, with Id(α) = 1 − α

• (min{fp, fp⁻¹})∗∗ is the double (convex) conjugate of min{fp, fp⁻¹}, i.e., its greatest convex lower bound

Theorem
If M is f-DP, then M ∘ sub_m is Cp(f)-DP, and this bound is tight

• The subsampling theorem for Rényi DP is quite complicated [Wang, Balle, and Kasiviswanathan ’18]
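The operator can be evaluated numerically: form fp, symmetrize by taking the minimum with its numerical inverse, then take the greatest convex minorant, here via a lower convex hull over a grid. A sketch under those steps (names ours; f is any trade-off function given as a callable):

```python
from statistics import NormalDist

_nd = NormalDist()

def G(mu, alpha):
    # Gaussian trade-off function G_mu
    if alpha <= 0.0:
        return 1.0
    if alpha >= 1.0:
        return 0.0
    return _nd.cdf(_nd.inv_cdf(1.0 - alpha) - mu)

def subsample_tradeoff(f, p, grid=1001):
    """Evaluate C_p(f) = Conv(min{f_p, f_p^{-1}}) on a grid, where
    f_p = p*f + (1 - p)*Id. Returns (xs, ys)."""
    xs = [i / (grid - 1) for i in range(grid)]

    def fp(a):
        return p * f(a) + (1.0 - p) * (1.0 - a)

    def fp_inv(a):  # inf{t : f_p(t) <= a}; f_p is strictly decreasing
        lo, hi = 0.0, 1.0
        for _ in range(40):
            mid = (lo + hi) / 2.0
            if fp(mid) <= a:
                hi = mid
            else:
                lo = mid
        return hi

    pts = [(x, min(fp(x), fp_inv(x))) for x in xs]
    # greatest convex minorant = lower convex hull of the sampled points
    hull = []
    for (px, py) in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (px - x1) * (y2 - y1) <= 0.0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    ys, j = [], 0
    for x in xs:  # interpolate the hull linearly back onto the grid
        while j + 1 < len(hull) - 1 and hull[j + 1][0] < x:
            j += 1
        (x1, y1), (x2, y2) = hull[j], hull[j + 1]
        ys.append(y1 if x2 == x1 else y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    return xs, ys
```

For p = 1 (no subsampling) and a symmetric f such as Gµ, the operator returns f itself; for p < 1 the curve moves toward Id, i.e., toward perfect privacy.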


Numerical examples

[Figure, two panels. Left: f = G1.8 with p = 0.35, showing f, fp, fp⁻¹, and Cp(f). Right: fε,δ with ε = 3, δ = 0.1, p = 0.2, showing Cp(fε,δ) against fε′,δ′]

Our gain

Properties of f-DP

• Informative representation of privacy

• Algebraically convenient and tight composition operations

• Sharp privacy amplification via subsampling


Outline

1. Introduction of f-DP

2. Informative representation of privacy

3. Composition and central limit theorems

4. Amplifying privacy via subsampling

5. Application to deep learning


Collaborators

Zhiqi Bu (Wharton) Jinshuo Dong (Penn CS) Qi Long (Penn Biostatistics)


Privacy concerns in deep learning

[Diagram: Dataset → Server → Model, highlighting privacy issues of the training data]

• Influential paper Deep Learning with Differential Privacy by Google Brain [Abadi et al ’16]


Private deep learning


Reproducing results of Google Brain

[Figure: test accuracy vs epochs under medium noise (σ = 0.7), comparing private test accuracy with non-private test accuracy. Experiments on MNIST [Abadi et al ’16]]

Can the f-DP framework improve privacy analysis and prediction accuracy?

Composition and subsampling via f-DP

SGD equation: θ_{t+1} = SGD ∘ sub(S; θ_t)

Lemma
SGD with noise sampled from N(0, (4C²/(B²µ²)) I_d) is µ-GDP

Thus, we get

Theorem
The SGD mechanism M(S) = (θ1, θ2, . . . , θm) is

C_{B/n}(Gµ) ⊗ C_{B/n}(Gµ) ⊗ · · · ⊗ C_{B/n}(Gµ)  (m factors) -DP,

which is, asymptotically, µ̃-GDP with

µ̃ = (B/n) √( 2m (e^{µ²} Φ(3µ/2) + 3Φ(−µ/2) − 2) )
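Plugging numbers into the asymptotic formula is a one-liner. A sketch (names ours), assuming the per-step parameter relates to the noise multiplier σ of [Abadi et al ’16] as µ = 1/σ and MNIST-style settings n = 60000, B = 256 (both are our assumptions, chosen to match the experiments reported in this talk):

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def noisy_sgd_gdp(B, n, m, mu):
    """Asymptotic GDP parameter of m-step noisy SGD (theorem above):
    mu_tilde = (B/n) * sqrt(2m (e^{mu^2} Phi(3mu/2) + 3 Phi(-mu/2) - 2))."""
    h = math.exp(mu ** 2) * Phi(1.5 * mu) + 3.0 * Phi(-0.5 * mu) - 2.0
    return (B / n) * math.sqrt(2.0 * m * h)
```

With 15 epochs (m = 15n/B steps) and σ = 1.3 this evaluates to ≈ 0.285, in line with the 0.285-GDP figure quoted for MNIST.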

Tighter privacy analysis via f-DP

[Figure, three panels of trade-off curves. Solid red: our subsampling theorem and CLT; dashed blue: the moments accountant (MA) developed in [Abadi et al ’16].
Left: 95.0% accuracy in 15 epochs, σ = 1.3 — 0.285-GDP by CLT vs (1.19, 1.0e-5)-DP by MA.
Middle: 96.6% accuracy in 60 epochs, σ = 1.1 — 0.737-GDP by CLT vs (3.01, 1.0e-5)-DP by MA.
Right: 97.0% accuracy in 45 epochs, σ = 0.7 — 1.554-GDP by CLT vs (7.1, 1.0e-5)-DP by MA]


Privacy guarantees in (ε, δ)-DP

[Figure: the privacy parameter ε as a function of epochs under medium noise (σ = 0.7), accounted via GDP vs via (ε, δ)-DP. Experiments on MNIST, with δ = 10⁻⁷]

Same ε and δ, but less noise

[Figure, two panels for σ = 0.7. Left: the noise added under (ε, δ)-DP accounting vs the smaller noise shown necessary by GDP. Right: test accuracy with only the necessary noise by GDP, compared with the (ε, δ)-DP private accuracy and the non-private accuracy]

More privacy and higher accuracy

[Diagram: Dataset → Server → Model — privacy with better accuracy]

Concluding remarks


Summary

Informativeness Composition Subsampling

ε-DP

(ε, δ)-DP

Divergence based DPs

f-DP

In the f-DP framework

• Trade-off functions are informative for privacy loss

• Composition is equivalent to tensor product

• Subsampling is to average and convexify


Take-home messages

• Gaussian Differential Privacy, with Jinshuo Dong and Aaron Roth. arXiv:1905.02383

• Deep Learning with Gaussian Differential Privacy, with Zhiqi Bu, Jinshuo Dong, and Qi Long. arXiv:1911.11607

NSF CAREER DMS-1847415, CCF-1763314, CCF-1934876, and Wharton Dean’s Fund
