Page 1

Pufferfish: A Semantic Approach to Customizable Privacy

Ashwin Machanavajjhala

ashwin AT cs.duke.edu

Collaborators: Daniel Kifer (Penn State), Bolin Ding (UIUC, Microsoft Research)

iDASH Privacy Workshop 9/29/2012

Page 2

Outline

• Background

– How to define privacy?

• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]

• Pufferfish Privacy Framework [Kifer-M PODS’12]

• Case Study [Kifer-M PODS’12 & Ding-M ‘12]

– Privately handling correlations due to exact release of counts

• Open Challenges

Page 3

Data Privacy Problem

[Diagram: Individual 1, Individual 2, Individual 3, …, Individual N hold records r1, r2, r3, …, rN; a Server holds the database DB.]

Utility:

Privacy: No breach about any individual

Page 4

Data Privacy in the real world

Application | Data Collector | Third Party (adversary) | Private Information | Function (utility)
Medical | Hospital | Epidemiologist | Disease | Correlation between disease and geography
Genome analysis | Hospital | Statistician/Researcher | Genome | Correlation between genome and disease
Advertising | Google/FB/Y! | Advertiser | Clicks/Browsing | Number of clicks on an ad by age/region/gender …
Social Recommendations | Facebook | Another user | Friend links / profile | Recommend other users or ads to users based on social network

Page 5

Many definitions

• K-Anonymity: Sweeney et al. IJUFKS ‘02

• L-diversity: Machanavajjhala et al. TKDD ‘07

• T-closeness: Li et al. ICDE ‘07

• E-Privacy: Machanavajjhala et al. VLDB ‘09

• Differential Privacy: Dwork et al. ICALP ‘06

& several attacks

• Linkage attack

• Background knowledge attack

• Minimality/Reconstruction attack

• de Finetti attack

• Composition attack

Page 6

Differential Privacy

• Domain independent privacy definition that is independent of the attacker.

• Tolerates many attacks that other definitions are susceptible to.

– Avoids composition attacks

– Claimed to be tolerant against adversaries with arbitrary background knowledge.

• Allows simple, efficient and useful privacy mechanisms

– Used in a live US Census Product [M et al ICDE ‘08]

Page 7

No Free Lunch Theorem

It is not possible to guarantee any utility in addition to privacy, without making assumptions about

• the data generating distribution

• the background knowledge available to an adversary

[Kifer-Machanavajjhala SIGMOD ‘11] [Dwork-Naor JPC ‘10]

Page 8

Outline

• Background

– How to define privacy?

• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]

• Pufferfish Privacy Framework [Kifer-M PODS’12]

• Case Study [Kifer-M PODS’12 & Ding-M ‘12]

– Privately handling correlations due to exact release of counts

• Open Challenges

Page 9

Contingency tables

2 2

2 8

D

Count( , )

Each tuple takes k=4 different values

Page 10

Contingency tables

? ?

? ?

D

Count( , )

Want to release counts privately

Page 11

Laplace Mechanism

2 + Lap(1/ε) 2 + Lap(1/ε)

2 + Lap(1/ε) 8 + Lap(1/ε)

D

Mean: 8 Variance: 2/ε²

Guarantees differential privacy.
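
A minimal sketch of the Laplace mechanism applied to the 2×2 table above, assuming each cell is perturbed independently with Lap(1/ε) noise as on the slide. The names `laplace_noise` and `release_counts` are illustrative and do not come from the talk; the noise is sampled as the difference of two exponential draws, which is a standard identity rather than anything stated in the slides.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    Uses the identity that the difference of two i.i.d. Exponential(rate=1/scale)
    variables is Laplace-distributed with the given scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def release_counts(table, epsilon, rng):
    """Add independent Lap(1/epsilon) noise to every cell of a contingency table."""
    return [[cell + laplace_noise(1.0 / epsilon, rng) for cell in row] for row in table]

if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed so the run is repeatable
    D = [[2, 2], [2, 8]]        # the example table from the slide
    print(release_counts(D, epsilon=1.0, rng=rng))
```

Each released cell then has mean equal to the true count and variance 2/ε², matching the figures on the slide.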

Page 12

Marginal counts

2 + Lap(1/ε) | 2 + Lap(1/ε) | 4
2 + Lap(1/ε) | 8 + Lap(1/ε) | 10
4 | 10 |

(Last column and last row: exact marginal counts.)

D

Does Laplace mechanism still guarantee privacy?

Auxiliary marginals are published for the following reasons:

1. Legal: 2002 Supreme Court case Utah v. Evans

2. Contractual: Advertisers must know exact demographics at coarse granularities

Page 13

Marginal counts

2 + Lap(1/ε) | 2 + Lap(1/ε) | 4
2 + Lap(1/ε) | 8 + Lap(1/ε) | 10
4 | 10 |

D

Count( , ) = 8 + Lap(1/ε)

Count( , ) = 8 - Lap(1/ε)

Count( , ) = 8 - Lap(1/ε)

Count( , ) = 8 + Lap(1/ε)

Page 14

Marginal counts

2 + Lap(1/ε) | 2 + Lap(1/ε) | 4
2 + Lap(1/ε) | 8 + Lap(1/ε) | 10
4 | 10 |

D

Mean: 8 Variance: 2/(kε²)

Can reconstruct the table with high precision for large k
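
The reconstruction argument can be simulated as follows. This is a sketch for the 2×2 example only, assuming the four cells carry independent Lap(1/ε) noise and the row and column sums are released exactly; all helper names are hypothetical. Each noisy cell, combined with the exact marginals, yields one estimate of the bottom-right count, and averaging the k = 4 estimates should cut the variance from 2/ε² to about 2/(kε²).

```python
import random

def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials is Laplace(0, scale).
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def estimates_of_bottom_right(noisy, row_sums, col_sums):
    """Four estimates of the bottom-right cell, one per noisy cell, using exact marginals."""
    (n00, n01), (n10, n11) = noisy
    return [
        n11,                               # the noisy cell itself
        row_sums[1] - n10,                 # second row total minus its other cell
        col_sums[1] - n01,                 # second column total minus its other cell
        row_sums[1] - col_sums[0] + n00,   # via the opposite corner
    ]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def simulate(epsilon, trials, rng):
    """Return (variance of one noisy cell, variance of the averaged estimate)."""
    table = [[2, 2], [2, 8]]
    row_sums = [sum(r) for r in table]          # exact row marginals
    col_sums = [sum(c) for c in zip(*table)]    # exact column marginals
    single, averaged = [], []
    for _ in range(trials):
        noisy = [[c + laplace_noise(1.0 / epsilon, rng) for c in row] for row in table]
        est = estimates_of_bottom_right(noisy, row_sums, col_sums)
        single.append(est[0])
        averaged.append(sum(est) / len(est))
    return variance(single), variance(averaged)

if __name__ == "__main__":
    v1, v4 = simulate(epsilon=1.0, trials=50000, rng=random.Random(0))
    print(f"single cell variance ~ {v1:.2f}, averaged variance ~ {v4:.2f}")
```

The signs of the noise terms in the four estimates (+, -, -, +) match the four Count expressions shown on the previous slide.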

Page 15

Customizing Privacy

• For handling correlations

– Social networks [Kifer-M SIGMOD ‘11]

• Utility driven applications

– “Realistic” vs “Worst-case” adversaries [M et al PVLDB ‘09]

• Dealing with aggregate secrets [Kifer-M PODS ‘12]

Question: How to design principled privacy definitions customized to such scenarios?

Page 16

Outline

• Background

– How to define privacy?

• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]

• Pufferfish Privacy Framework [Kifer-M PODS’12]

• Case Study [Kifer-M PODS’12 & Ding-M ‘12]

– Privately handling correlations due to exact release of counts

• Open Challenges

Page 17

Pufferfish Semantics

• What is being kept secret?

• Who are the adversaries?

• How is information disclosure bounded?

Page 18

Sensitive Information

• Secrets: Let S be a set of potentially sensitive statements

– “individual j’s record is in the data, and j has Cancer”

– “individual j’s record is not in the data”

• Discriminative Pairs: Spairs is a subset of S × S, consisting of mutually exclusive pairs of secrets.

– (“Bob is in the table”, “Bob is not in the table”)

– (“Bob has cancer”, “Bob has diabetes”)

Page 19

Adversaries

• An adversary can be completely characterized by his/her prior information about the data

– We do not assume computational limits

• Data Evolution Scenarios: set of all probability distributions that could have generated the data.

– No assumptions: All probability distributions over data instances are possible.

– I.I.D.: Set of all f such that: P(data = {r1, r2, …, rk}) = f(r1) × f(r2) × … × f(rk)

Page 20

Information Disclosure

• Mechanism M satisfies ε-Pufferfish(S, Spairs, D), if for every

– w ∈ Range(M),

– (si, sj) ∈ Spairs,

– θ ∈ D such that P(si | θ) ≠ 0 and P(sj | θ) ≠ 0:

P(M(data) = w | si, θ) ≤ e^ε P(M(data) = w | sj, θ)
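
The definition can be checked by brute force on a toy instance. The sketch below is an illustration under stated assumptions, not the authors' code: the data is two bits, the mechanism is randomized response (chosen here only because its output distribution is easy to enumerate), and the function `pufferfish_epsilon` and both priors are hypothetical. It also checks both orderings of each pair, which goes beyond the single inequality shown on the slide.

```python
import itertools
import math

def pufferfish_epsilon(mechanism, secret_pairs, priors):
    """Smallest epsilon for which the inequality holds over all listed cases.

    mechanism:    function(dataset) -> dict {output: probability}
    secret_pairs: list of (s_i, s_j), each a predicate on a dataset
    priors:       list of dicts {dataset: probability}, one per theta in D
    """
    def output_dist_given(secret, theta):
        # P(M(data) = w | secret, theta) for every w; None if P(secret | theta) = 0.
        mass = sum(p for d, p in theta.items() if secret(d))
        if mass == 0:
            return None
        dist = {}
        for d, p in theta.items():
            if secret(d) and p > 0:
                for w, q in mechanism(d).items():
                    dist[w] = dist.get(w, 0.0) + (p / mass) * q
        return dist

    worst = 0.0
    for theta in priors:
        for s_i, s_j in secret_pairs:
            p_i = output_dist_given(s_i, theta)
            p_j = output_dist_given(s_j, theta)
            if p_i is None or p_j is None:
                continue  # the definition skips secrets with zero prior probability
            for w in set(p_i) | set(p_j):
                a, b = p_i.get(w, 0.0), p_j.get(w, 0.0)
                if a == 0.0 and b == 0.0:
                    continue
                if a == 0.0 or b == 0.0:
                    return math.inf
                worst = max(worst, abs(math.log(a / b)))
    return worst

def randomized_response(eps):
    """Report each bit truthfully with probability e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    def mechanism(data):
        out = {}
        for w in itertools.product([0, 1], repeat=len(data)):
            prob = 1.0
            for bit, reported in zip(data, w):
                prob *= keep if bit == reported else 1.0 - keep
            out[w] = prob
        return out
    return mechanism

datasets = list(itertools.product([0, 1], repeat=2))
# One discriminative pair: "individual 1 has value 1" vs "individual 1 has value 0".
pairs = [(lambda d: d[0] == 1, lambda d: d[0] == 0)]
independent = {d: 0.25 for d in datasets}   # bits drawn independently
correlated = {(0, 0): 0.5, (1, 1): 0.5}     # the two bits always agree

eps = 0.5
M = randomized_response(eps)
print(pufferfish_epsilon(M, pairs, [independent]))   # close to eps
print(pufferfish_epsilon(M, pairs, [correlated]))    # close to 2 * eps
```

Under the independent prior the mechanism meets the bound with ε, while under the correlated prior the same mechanism only meets it with 2ε, which illustrates why the set D of data evolution scenarios is part of the definition.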

Page 21

Pufferfish Semantic Guarantee

Prior odds of si vs sj

Posterior odds of si vs sj
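
The formula on this slide did not survive extraction. A reconstruction consistent with the ε-Pufferfish inequality and with the two labels above, obtained by applying Bayes' rule, would be the following; the two-sided form is an assumption, since the transcript states the inequality in one direction only.

```latex
e^{-\varepsilon} \;\le\;
\frac{P(s_i \mid M(\mathrm{data}) = w,\, \theta) \,/\, P(s_j \mid M(\mathrm{data}) = w,\, \theta)}
     {P(s_i \mid \theta) \,/\, P(s_j \mid \theta)}
\;\le\; e^{\varepsilon}
```

Here the numerator is the posterior odds of si vs sj after the adversary sees the output w, and the denominator is the prior odds. By Bayes' rule the ratio equals P(M(data) = w | si, θ) / P(M(data) = w | sj, θ), which is the quantity the definition bounds.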

Page 22

Applying Pufferfish to Differential Privacy

• Spairs:

– “record j is in the table” vs “record j is not in the table”

– “record j is in the table with value x” vs “record j is not in the table”

• Data evolution:

– Probability record j is in the table: πj

– Probability distribution over values of record j: fj

– For all θ = [f1, f2, f3, …, fk, π1, π2, …, πk]

– P[Data = D | θ] = Π_{rj ∉ D} (1 − πj) × Π_{rj ∈ D} πj × fj(rj)
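
The product formula for P[Data = D | θ] can be evaluated directly. The sketch below assumes a dataset is represented as a mapping from individual index to value; the function name and the example θ (two individuals, two disease values) are hypothetical and chosen only for illustration.

```python
import itertools

def dataset_probability(dataset, pi, f):
    """P[Data = D | theta] for theta = (f_1..f_k, pi_1..pi_k).

    dataset: dict {j: value} for the individuals whose record is present
    pi:      list, pi[j] = probability that record j is in the table
    f:       list of dicts, f[j][value] = probability of that value for record j
    """
    prob = 1.0
    for j in range(len(pi)):
        if j in dataset:
            prob *= pi[j] * f[j][dataset[j]]   # record j present with this value
        else:
            prob *= 1.0 - pi[j]                # record j absent
    return prob

def all_datasets(k, values):
    """Enumerate every possible dataset over k individuals."""
    for present in itertools.product([False, True], repeat=k):
        idx = [j for j in range(k) if present[j]]
        for combo in itertools.product(values, repeat=len(idx)):
            yield dict(zip(idx, combo))

# Hypothetical theta for k = 2 individuals and two possible values.
pi = [0.9, 0.4]
f = [{"cancer": 0.2, "diabetes": 0.8}, {"cancer": 0.5, "diabetes": 0.5}]

# Individual 0 present with "cancer", individual 1 absent.
print(dataset_probability({0: "cancer"}, pi, f))

# Sanity check: the probabilities of all datasets should sum to 1 (up to rounding).
print(sum(dataset_probability(d, pi, f) for d in all_datasets(2, ["cancer", "diabetes"])))
```

Because each individual contributes an independent factor, every θ of this form describes data with no correlation across individuals.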

Page 23

Applying Pufferfish to Differential Privacy

• Spairs:

– “record j is in the table” vs “record j is not in the table”

– “record j is in the table with value x” vs “record j is not in the table”

• Data evolution:

– For all θ = [f1, f2, f3, …, fk, π1, π2, …, πk]

– P[Data = D | θ] = Π_{rj ∉ D} (1 − πj) × Π_{rj ∈ D} πj × fj(rj)

A mechanism M satisfies differential privacy

if and only if

it satisfies Pufferfish instantiated using Spairs and {θ} (as defined above)

Page 24

Differential Privacy

• Sensitive information: All pairs of secrets “individual j is in the table with value x” vs “individual j is not in the table”

• Adversary: Adversaries who believe the data is generated using any probability distribution that is independent across individuals

• Disclosure: ratio of the prior and posterior odds of the adversary is bounded by e^ε

Page 25

Characterizing “good” privacy definition

• Any privacy definition that can be phrased as follows composes with itself.

[Formula not preserved in the transcript]

where I is the set of all tables.

Page 26

Outline

• Background

– How to define privacy?

• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]

• Pufferfish Privacy Framework [Kifer-M PODS’12]

• Case Study [Kifer-M PODS’12 & Ding-M ‘12]

– Privately handling correlations due to exact release of counts

• Open Challenges

Page 27

Induced Neighbor Privacy

• Differential Privacy: Neighboring tables differ in one value

• Induced neighbors: tables whose shortest path does not contain another table that is consistent with the marginals (e.g., D and D’)

[Diagram: tables D, D’’ and D’]

Page 28

Induced Neighbor Privacy

For every pair of induced neighbors D1, D2

For every output O

Adversary should not be able to distinguish between any D1 and D2 based on any O

log( Pr[A(D1) = O] / Pr[A(D2) = O] ) < ε (ε > 0)

Page 29

Induced Neighbors Privacy and Pufferfish

• Given a set of count constraints Q,

• Spairs:

– “record j is in the table” vs “record j is not in the table”

– “record j is in the table with value x” vs “record j is not in the table”

• Data evolution:

– For all θ = [f1, f2, f3, …, fk, π1, π2, …, πk]

– P[Data = D | θ] ∝ Π_{rj ∉ D} (1 − πj) × Π_{rj ∈ D} πj × fj(rj), if D satisfies Q

– P[Data = D | θ] = 0, if D does not satisfy Q

A mechanism M satisfies induced neighbors privacy

if and only if

it satisfies Pufferfish instantiated using Spairs and {θ}

Page 30

Computing induced sensitivity

2D case:

– qall: outputs all the counts in a 2-D contingency table.

– Marginals: row and column sums.

– The induced sensitivity of qall = min(2r, 2c).

General case: Deciding whether the induced sensitivity S_in(q) > 0 is NP-hard.

Conjecture: Computing S_in(q) is hard (and complete) for the second level of the polynomial hierarchy.

Page 31

Outline

• Background

– How to define privacy?

• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]

• Pufferfish Privacy Framework [Kifer-M PODS’12]

• Case Study [Kifer-M PODS’12 & Ding-M ‘12]

– Privately handling correlations due to exact release of counts

• Open Challenges

Page 32

Open Questions

• Privacy of social networks

– Adversaries may use social network evolution models to infer sensitive information about edges in a network [Kifer-M SIGMOD ‘11]

– Can correlations in a social network be generatively described?

• Characterize necessary and sufficient conditions for resilience against classes of attacks.

– Sufficient conditions for composition. [Kifer-M PODS’12]

• Attack-resilient relaxations of differential privacy

– Existing privacy definitions do not provide sufficient utility for certain applications (e.g., social recommendations [M et al PVLDB’11])

Page 33

Thank you

[M et al PVLDB’11] A. Machanavajjhala, A. Korolova, A. Das Sarma, “Personalized Social Recommendations – Accurate or Private?”, PVLDB 4(7) 2011

[Kifer-M SIGMOD’11] D. Kifer, A. Machanavajjhala, “No Free Lunch in Data Privacy”, SIGMOD 2011

[Kifer-M PODS’12] D. Kifer, A. Machanavajjhala, “A Rigorous and Customizable Framework for Privacy”, PODS 2012

[Ding-M ‘12] B. Ding, A. Machanavajjhala, “Induced Neighbors Privacy (Work in progress)”, 2012