Local Differential Privacy
Peter Kairouz
Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign
Joint work with Sewoong Oh (UIUC) and Pramod Viswanath (UIUC)
1 / 33
Wireless Communication
The fundamental limits of digital communication are well understood
2 / 33
Rise of the Planet of the Apps!
we study the fundamental trade-off between privacy and utility
3 / 33
Does Privacy Matter? [Greenwald 2014]
“If you’re doing something that you don’t want other people to know, maybe you shouldn’t be doing it in the first place”
“Privacy is no longer a social norm!”
4 / 33
Recent Privacy Leaks
from anonymous faces to social security numbers
deanonymizing Netflix data, identifying personal genomes, etc.
5 / 33
Privacy is a fundamental human right!
6 / 33
The Ultimate Protection
“The future of privacy is lying”
randomizing = systematic lying
7 / 33
Privacy via Plausible Deniability [Warner 1965]
“Have you ever used illegal drugs?”
[diagram: at random, either say “yes” or answer truthfully]
8 / 33
Privacy via Plausible Deniability [Warner 1965]
[diagram: binary channel from X ∈ {0, 1} to Y ∈ {0, 1} with transition probabilities 1/2]
Q: privatization mechanism
instead of X = x, share Y = y w.p. Q(y|x)
Q: |X | × |Y| stochastic mapping
8 / 33
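Warner's scheme can be simulated end to end; a minimal sketch in Python, where the truth probability p = 0.75, the function names, and the sample sizes are illustrative choices of mine, not from the talk:

```python
import random

def warner_response(x, p=0.75):
    """Randomized response: report the true bit x w.p. p, flip it w.p. 1 - p.
    p = 0.75 is an illustrative choice (Warner's original used a coin flip)."""
    return x if random.random() < p else 1 - x

def estimate_proportion(responses, p=0.75):
    """Unbiased estimate of the true 'yes' rate q from privatized answers:
    E[Y] = p*q + (1 - p)*(1 - q)  =>  q = (mean(Y) - (1 - p)) / (2p - 1)."""
    y_bar = sum(responses) / len(responses)
    return (y_bar - (1 - p)) / (2 * p - 1)

random.seed(0)
true_rate = 0.3
data = [1 if random.random() < true_rate else 0 for _ in range(100_000)]
responses = [warner_response(x) for x in data]
# each individual answer is deniable, yet the aggregate is recoverable
print(round(estimate_proportion(responses), 2))
```

This is the "systematic lying" idea in miniature: no single response reveals the respondent's bit, but the population statistic survives.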
The Local Privacy Model [Duchi et al., 2012]
[diagram: clients hold X1, X2; each applies privatization mechanism Q, and the data analyst observes Y1, Y2]
clients receive a service if they share their data
clients do not trust data analysts
9 / 33
Inference of Information
X ∈ X = {0, 1} → Privatization Q → Y ∈ Y
X : input alphabet
Y: output alphabet
Given Y = y and Q, detect whether X = 0 or X = 1
10 / 33
Inference of Information
given Y and Q, detect whether X = 0 or X = 1
two types of error: false alarm and missed detection
S0: decide X = 0; S1: decide X = 1
PFA = P(Y ∈ S1 | X = 0) and PMD = P(Y ∈ S0 | X = 1)
10 / 33
Inference of Information
Case 1: Q1
[diagram: binary channel keeping the input w.p. 9/10 and flipping it w.p. 1/10; its achievable error region in the (PFA, PMD) plane is R1]
10 / 33
Inference of Information
Case 2: Q2
[diagram: binary channel keeping the input w.p. 7/10 and flipping it w.p. 3/10; its achievable error region in the (PFA, PMD) plane is R2]
10 / 33
Inference of Information
[plot: regions R1 and R2 side by side in the (PFA, PMD) plane]
if R2 ⊂ R1, Q2 guarantees more privacy
10 / 33
Local Differential Privacy (Binary Data)
[plot: the (PFA, PMD) plane with the privacy region R(ε,δ), whose boundary intercepts the axes at 1/(1+e^ε), and a mechanism's region RQ inside it]
Q is (ε, δ)-differentially private if RQ ⊆ R(ε,δ)
11 / 33
Local Differential Privacy
Q is ε-locally differentially private iff for all x, x′ ∈ X and y ∈ Y
e^−ε ≤ Q(y|x) / Q(y|x′) ≤ e^ε
ε controls the level of privacy
ε ↓ =⇒ more private
ε ↑ =⇒ less private
12 / 33
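The ratio condition above can be checked mechanically for any finite channel; a minimal sketch, where the dictionary representation of Q and the helper name is_ldp are my own:

```python
import math

def is_ldp(Q, eps):
    """Return True if the channel Q (dict: input x -> dict of output y -> prob)
    satisfies e^-eps <= Q(y|x) / Q(y|x') <= e^eps for all x, x', y."""
    xs = list(Q)
    ys = Q[xs[0]].keys()
    for y in ys:
        probs = [Q[x][y] for x in xs]
        # the worst-case ratio over (x, x') is max/min for each output y
        if max(probs) > math.exp(eps) * min(probs) + 1e-12:
            return False
    return True

eps = math.log(3)  # e^eps = 3
# binary randomized response: keep the bit w.p. e^eps / (1 + e^eps) = 3/4
Q = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}
print(is_ldp(Q, eps))          # True
print(is_ldp(Q, math.log(2)))  # False: the ratio 0.75/0.25 = 3 exceeds e^eps = 2
```

The same check works for any alphabet sizes, since ε-LDP only constrains likelihood ratios column by column.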
Local Differential Privacy
Q is ε-locally differentially private iff for all x, x′ ∈ X:
PFA + e^ε PMD ≥ 1
e^ε PFA + PMD ≥ 1
[plot: the region Rε in the (PFA, PMD) plane, with axis intercepts at 1/(1+e^ε), and a mechanism's region RQ inside it]
Q is ε-LDP iff RQ ⊆ Rε for all x, x′ ∈ X
12 / 33
Privacy vs. Utility
the more private you want to be, the less utility you get
there is a fundamental trade-off between privacy and utility
maximize_Q U(Q)
subject to Q ∈ Dε
U(Q): application dependent utility function
Dε: set of all ε-locally differentially private mechanisms
13 / 33
Summary of Results
Binary data: |X | = 2
The Binary Randomized Response
lie w.p. 1/(1+e^ε), answer truthfully w.p. e^ε/(1+e^ε)
optimal for all ε
optimal for all U(Q) obeying the data processing inequality
14 / 33
Summary of Results
k-ary data: |X | = k > 2
Locally Differentially Private Mechanisms
Staircase Mechanisms
Binary Mechanism (BM)
Randomized Response (RR)
staircase mechanisms are optimal for all ε and a rich class of utilities
BM and RR are optimal in the high and low privacy regimes, respectively
14 / 33
CASE 1: BINARY DATA
15 / 33
Utility Functions
[diagram: X ∈ X → Q → Y ∈ Y → W → Z ∈ Z, with utilities U, U(Q), U(QW)]
utility functions obeying the data processing inequality:
U(QW) ≤ U(Q) ≤ U
further randomization can only reduce utility
note: Q ∈ Dε =⇒ QW ∈ Dε
16 / 33
Main Result [Kairouz et al., 2014]
The binary randomized response is optimal
[diagram: binary channel QBRR keeping the input w.p. e^ε/(1+e^ε) and flipping it w.p. 1/(1+e^ε); its error region satisfies R_QBRR = Rε]
=⇒ ∀U obeying the data processing inequality: U(Q) ≤ U(QRR)
17 / 33
CASE 2: k-ARY DATA
18 / 33
Information Theoretic Utility Functions
|X | = k > 2
focus on a rich set of convex utility functions
maximize_Q U(Q)
subject to Q ∈ Dε
includes all f-divergences, mutual information, etc.
19 / 33
Statistical Data Model
[diagram: clients hold X1, X2 ∼ P; each applies privatization mechanism Q, and the data analyst observes Y1, Y2 ∼ M]
analyst interested in statistics of data rather than individual samples
privatized data: Yi ∼ Mν, where Mν(Y = y) = Σ_{x∈X} Q(Y = y|x) Pν(X = x)
20 / 33
Hypothesis Testing
[diagram: clients hold X1, X2 ∼ Pν and share them directly with the data analyst]
Given {Xi}ni=1, detect whether ν = 0 or ν = 1
performance is a function of the distance between P0 and P1
Chernoff-Stein lemma: PMD ≈ e^−nDkl(P0‖P1)
21 / 33
Private Hypothesis Testing
[diagram: clients hold X1, X2 ∼ Pν; privatization Q produces Y1, Y2 ∼ Mν; the data analyst outputs ν̂ ∈ {0, 1}]
Given {Yi}ni=1, detect whether ν = 0 or ν = 1
performance is a function of the distance between M0 and M1
Chernoff-Stein lemma: PMD ≈ e^−nDkl(M0‖M1)
22 / 33
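The induced output distributions Mν = QPν and their divergence can be computed directly; a minimal sketch, where the mechanism, the hypothesis pair, and the helper names are illustrative choices of mine:

```python
import math

def induced_marginal(Q, P):
    """M(y) = sum_x Q(y|x) P(x): the output distribution the analyst sees.
    Q is a row-stochastic matrix, Q[x][y] = Q(Y = y | X = x)."""
    return [sum(Q[x][y] * P[x] for x in range(len(P))) for y in range(len(Q[0]))]

def kl(P, Q):
    """KL divergence Dkl(P || Q) in nats."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

# illustrative mechanism and hypothesis pair (not from the talk): a
# binary-output channel with eps = ln 3 on a ternary alphabet
Q = [[0.75, 0.25], [0.75, 0.25], [0.25, 0.75]]
P0, P1 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
M0, M1 = induced_marginal(Q, P0), induced_marginal(Q, P1)
# data processing: privatization can only shrink the divergence,
# so the error exponent of the private test is smaller
print(kl(M0, M1) <= kl(P0, P1))  # True
```

Maximizing Dkl(M0‖M1) over Q ∈ Dε is exactly the optimization the next slides address.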
f-Divergences
A rich family of measures of differences between distributions:
maximize_Q Df(QP0 ‖ QP1), where M0 = QP0 and M1 = QP1
subject to Q ∈ Dε
KL divergence Dkl(M0||M1)
total variation ‖M0 −M1‖TV
minimax rates and error exponents
23 / 33
Staircase Mechanisms
Recall that Q is ε-locally differentially private iff for all x, x′ ∈ X and y ∈ Y:
e^−ε ≤ Q(y|x) / Q(y|x′) ≤ e^ε
ε controls the level of privacy
ε ↓ =⇒ more private
ε ↑ =⇒ less private
24 / 33
Staircase Mechanisms
Q is a staircase mechanism if for all x, x′ ∈ X and y ∈ Y:
Q(y|x) / Q(y|x′) ∈ {e^−ε, 1, e^ε}
Locally Differentially Private Mechanisms
Staircase Mechanisms
Binary Mechanism (BM)
Randomized Response (RR)
24 / 33
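The staircase property is also easy to verify numerically; a minimal sketch, where the matrix representation and the helper name is_staircase are my own:

```python
import math

def is_staircase(Q, eps, tol=1e-9):
    """Return True if every likelihood ratio Q(y|x) / Q(y|x') lies in
    {e^-eps, 1, e^eps}. Q is a list of rows Q[x][y] with positive entries."""
    allowed = (math.exp(-eps), 1.0, math.exp(eps))
    for y in range(len(Q[0])):
        for x in range(len(Q)):
            for xp in range(len(Q)):
                ratio = Q[x][y] / Q[xp][y]
                if not any(abs(ratio - a) < tol for a in allowed):
                    return False
    return True

eps = math.log(3)
# a binary mechanism on a ternary alphabet: every ratio is 3, 1, or 1/3
binary_mech = [[0.75, 0.25], [0.75, 0.25], [0.25, 0.75]]
print(is_staircase(binary_mech, eps))                             # True
print(is_staircase([[0.6, 0.4], [0.3, 0.7], [0.25, 0.75]], eps))  # False
```

Note that every staircase mechanism is ε-LDP (each ratio is within [e^−ε, e^ε]), but not conversely; staircase mechanisms sit at the extreme points.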
Main Result [Kairouz et al., 2014]
maximize_{Q ∈ Dε} Df(QP0 ‖ QP1) = maximize_{Q ∈ Sε} Df(QP0 ‖ QP1)
where M0 = QP0 and M1 = QP1
Sε: set of all staircase mechanisms with |Y| ≤ |X |
staircase mechanisms are optimal
no gain in larger output alphabets
there are finitely many staircase mechanisms
same result holds for a rich set of convex utility functions
25 / 33
Binary Mechanisms
[figure: staircase channel matrix mapping x = 1, 2, 3, 4, 5 to y ∈ {0, 1}, with entries e^ε/(1+e^ε) and 1/(1+e^ε)]
maps k-ary inputs to binary outputs
26 / 33
Binary Mechanisms
A deterministic binary mapping followed by a randomized response
[diagram: a deterministic binary quantization followed by a binary randomized response with staying probability e^ε/(1+e^ε)]
X ∈ S =⇒ Y = 0
X ∈ Sc =⇒ Y = 1
a highly quantized version of the original data
26 / 33
Optimality of Binary Mechanisms (High Privacy)
QB(0|x) = e^ε/(1+e^ε) if P0(x) ≥ P1(x), and 1/(1+e^ε) if P0(x) < P1(x)
QB(1|x) = e^ε/(1+e^ε) if P0(x) < P1(x), and 1/(1+e^ε) if P0(x) ≥ P1(x)
∀ P0,P1, ∃ ε(P0, P1) > 0 such that ∀ε ≤ ε(P0, P1), QB is optimal
27 / 33
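The construction above can be written out directly; a minimal sketch, where the function name and the example distributions are illustrative choices of mine:

```python
def binary_mechanism(P0, P1, e_eps):
    """Build QB: x is deterministically quantized by the sign of P0(x) - P1(x),
    then passed through a binary randomized response with staying probability
    e_eps / (1 + e_eps). Returns rows QB[x] = [QB(0|x), QB(1|x)]."""
    hi, lo = e_eps / (1 + e_eps), 1 / (1 + e_eps)
    return [[hi, lo] if p0 >= p1 else [lo, hi] for p0, p1 in zip(P0, P1)]

# illustrative hypothesis pair (not from the talk); eps = ln 3, so e^eps = 3
P0, P1 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
QB = binary_mechanism(P0, P1, e_eps=3.0)
print(QB)  # [[0.75, 0.25], [0.75, 0.25], [0.25, 0.75]]
```

Unlike randomized response, QB depends on (P0, P1) through the quantization set S = {x : P0(x) ≥ P1(x)}.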
Randomized Response
[figure: 4×4 randomized response matrix over x, y = 1, 2, 3, 4, with diagonal entries e^ε/(3+e^ε) and off-diagonal entries 1/(3+e^ε)]
maps k-ary inputs to k-ary outputs
28 / 33
Randomized Response
lie w.p. (|X |−1)/(|X |−1+e^ε), answer truthfully w.p. e^ε/(|X |−1+e^ε)
lie = choose another character in X uniformly at random
can be viewed as the k-ary extension of the binary randomized response
28 / 33
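Sampling from this mechanism takes a few lines; a minimal sketch, where the function name and the parameter choices are illustrative, not from the talk:

```python
import math
import random

def randomized_response(x, alphabet, eps):
    """k-ary randomized response: report the true symbol w.p.
    e^eps / (k - 1 + e^eps), otherwise report one of the other
    k - 1 symbols chosen uniformly at random."""
    k = len(alphabet)
    p_truth = math.exp(eps) / (k - 1 + math.exp(eps))
    if random.random() < p_truth:
        return x
    return random.choice([a for a in alphabet if a != x])

random.seed(1)
alphabet = list(range(6))  # |X| = 6, as in the numerical example later
samples = [randomized_response(2, alphabet, eps=1.0) for _ in range(50_000)]
# the empirical truth rate should be close to e / (5 + e) ≈ 0.352
print(samples.count(2) / len(samples))
```

With k = 2 this reduces exactly to the binary randomized response from the first part of the talk.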
Optimality of Randomized Response (Low Privacy)
QRR(y|x) = e^ε/(|X |−1+e^ε) if y = x, and 1/(|X |−1+e^ε) if y ≠ x
∀ P0,P1, ∃ ε(P0, P1) > 0 such that ∀ε ≥ ε(P0, P1), QRR is optimal
note that QRR does not depend on P0 and P1
29 / 33
Numerical Example for KL Divergence
[plot: Dkl(QP0‖QP1) / Dkl(QoptP0‖QoptP1) vs. ε ∈ [0, 10] for the binary mechanism and randomized response]
|X | = 6
100 random pairs of (P0, P1)
30 / 33
Numerical Example for KL Divergence
[plot: Dkl(QP0‖QP1) / Dkl(P0‖P1) vs. ε ∈ [0, 10] for the optimal mechanism, binary mechanism, randomized response, and geometric mechanism]
|X | = 6
averaged over 100 random pairs of (P0, P1)
30 / 33
Big Picture
local differential privacy is crucial for data collection applications
we studied a broad class of information theoretic utilities
we provided explicit constructions of optimal mechanisms
more results on local privacy:
“Extremal mechanisms for local differential privacy”, NIPS 2014
“The composition theorem for differential privacy”, ICML 2015
“Optimality of non-interactive randomized response”, arXiv:1407.1546
31 / 33
Going Forward
private green button
private genomic data sharing
private RAPPOR
32 / 33
Thank You
Questions?
33 / 33