Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu. Department of Electrical Engineering & Computer Science Syracuse University, Syracuse, New York.

Page 1:

Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification

Wenliang (Kevin) Du, Zhouxuan Teng, and Zutao Zhu

Department of Electrical Engineering & Computer Science
Syracuse University, Syracuse, New York

Page 2:

Introduction

Privacy-Preserving Data Publishing.
The impact of background knowledge:
  How does it affect privacy?
  How to measure its impact on privacy?
Integrate background knowledge in privacy quantification.
  Privacy-MaxEnt: a systematic approach.
  Based on well-established theories.
Evaluation.

Page 3:

Privacy-Preserving Data Publishing

Data disguise methods:
  Randomization
  Generalization (e.g., Mondrian)
  Bucketization (e.g., Anatomy)

Our Privacy-MaxEnt method can be applied to both Generalization and Bucketization. We use Bucketization in this presentation.

Page 4:

Data Sets

[Table: example data set, with columns grouped as Identifier, Quasi-Identifier (QI), and Sensitive Attribute (SA).]

Page 5:

Bucketized Data

[Table: bucketized data set, with columns grouped as Quasi-Identifier (QI) and Sensitive Attribute (SA).]

P( Breast cancer | {female, college}, bucket=1 ) = 1/4
P( Breast cancer | {female, junior}, bucket=2 ) = 1/3
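The 1/4 on this slide is what an adversary gets with no background knowledge: the fraction of the bucket's sensitive values equal to the target value. A minimal sketch, assuming a made-up bucket of four records (the bucket contents and the name `naive_posterior` are illustrative, not taken from the slides):

```python
from collections import Counter
from fractions import Fraction

# Hypothetical bucket: QI tuples and SA values are published as two
# separate lists, so the link between them inside the bucket is hidden.
bucket_1 = {
    "qi": [("female", "college"), ("male", "college"),
           ("male", "high school"), ("female", "junior")],
    "sa": ["breast cancer", "flu", "flu", "pneumonia"],
}

def naive_posterior(bucket, sa_value):
    """P(sa_value | q, bucket) for any q in the bucket, with no background knowledge."""
    counts = Counter(bucket["sa"])
    return Fraction(counts[sa_value], len(bucket["sa"]))

p = naive_posterior(bucket_1, "breast cancer")
print(p)  # 1/4
```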

Page 6:

Impact of Background Knowledge

Background Knowledge:
  It is rare for a male to have breast cancer.

This analysis is hard for large data sets.
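To make the effect concrete, here is a brute-force sketch on a made-up bucket with two females and two males. It assumes that "rare" is treated as "never" and that every assignment of sensitive values to records is equally likely; both are simplifications for illustration, not the method of the talk. Enumerating assignments grows factorially with bucket size, which is one reason a manual analysis does not scale.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical bucket (not from the slides): two females, two males.
qi = [("female", "college"), ("female", "junior"),
      ("male", "college"), ("male", "junior")]
sa = ["breast cancer", "flu", "flu", "pneumonia"]

def posterior(target_idx, target_sa, allowed):
    """P(record target_idx has target_sa), uniform over allowed assignments."""
    total = hits = 0
    # Each permutation is one way to pair the SA values with the records.
    for perm in permutations(range(len(sa))):
        assignment = [sa[j] for j in perm]
        if not allowed(assignment):
            continue
        total += 1
        hits += assignment[target_idx] == target_sa
    return Fraction(hits, total)

def no_knowledge(assignment):
    return True

def males_never_have_breast_cancer(assignment):
    return all(not (q[0] == "male" and s == "breast cancer")
               for q, s in zip(qi, assignment))

before = posterior(0, "breast cancer", no_knowledge)
after = posterior(0, "breast cancer", males_never_have_breast_cancer)
print(before, after)  # 1/4 1/2
```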

Page 7:

Previous Studies

Martin, et al. ICDE’07.
  First formal study on background knowledge.
Chen, LeFevre, Ramakrishnan. VLDB’07.
  Improves the previous work.
They deal with rule-based knowledge.
  Deterministic knowledge.
Background knowledge can be much more complicated.
  Uncertain knowledge.

Page 8:

Complicated Background Knowledge

Rule-based knowledge:
  P (s | q) = 1.
  P (s | q) = 0.
Probability-based knowledge:
  P (s | q) = 0.2.
  P (s | Alice) = 0.2.
Vague background knowledge:
  0.3 ≤ P (s | q) ≤ 0.5.
Miscellaneous types:
  P (s | q1) + P (s | q2) = 0.7.
  One of Alice and Bob has “Lung Cancer”.

Page 9:

Challenges

How to analyze privacy in a systematic way for large data sets and complicated background knowledge?
  Directly computing P( S | Q ) is hard.
What do we want to compute?
  P( S | Q ), given the background knowledge and the published data set.
  P( S | Q ) is the primitive for most privacy metrics.

Page 10:

Our Approach

Consider P( S | Q ) as variable x (a vector).

[Diagram: boxes labeled Background Knowledge, Published Data, and Public Information lead to two sets of Constraints on x; Solve x gives the most unbiased solution.]

Page 11:

Maximum Entropy Principle

“Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum entropy estimate. It is the least biased estimate possible on the given information.”
  E. T. Jaynes, 1957.
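A classic illustration of the principle, outside the privacy setting, is a die whose average roll is the only thing known. The sketch below assumes a single mean constraint, for which the maximum entropy solution has the exponential form p_k ∝ exp(λk); the function name `maxent_die` and the bisection bounds are illustrative choices.

```python
import math

def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6), tol=1e-12):
    """Maximum-entropy distribution over `faces` with a given mean.

    The solution has the form p_k ∝ exp(lam * k); lam is found by
    bisection because the mean increases monotonically with lam.
    """
    def dist(lam):
        weights = [math.exp(lam * k) for k in faces]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(k * p for k, p in zip(faces, dist(lam)))

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

fair = maxent_die(3.5)     # no real information: the uniform distribution
loaded = maxent_die(4.5)   # probabilities tilt toward the high faces
```

With a mean of 3.5 the constraint carries no information and the estimate is uniform; any other mean tilts the distribution only as much as the constraint forces.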

Page 12:

The MaxEnt Approach

[Diagram: boxes labeled Background Knowledge, Published Data, and Public Information lead to two sets of Constraints on P( S | Q ); a Maximum Entropy Estimate then gives the estimate of P( S | Q ).]

Page 13:

Entropy

Entropy of S given Q and B:

$$H(S \mid Q, B) = -\sum_{Q,S,B} P(Q, B)\, P(S \mid Q, B) \log P(S \mid Q, B).$$

Entropy of (Q, S, B):

$$H(Q, S, B) = -\sum_{Q,S,B} P(Q, S, B) \log P(Q, S, B).$$

Because H(S | Q, B) = H(Q, S, B) − H(Q, B):
  Constraints should use P(Q, S, B) as variables.
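The identity H(S | Q, B) = H(Q, S, B) − H(Q, B) can be checked numerically. The joint distribution below is made up for illustration; the helper names are not from the slides.

```python
import math
from collections import defaultdict

# Hypothetical joint distribution P(q, s, b); the numbers are made up.
joint = {
    ("q1", "s1", 1): 0.10, ("q1", "s2", 1): 0.10,
    ("q2", "s1", 1): 0.20, ("q2", "s2", 1): 0.10,
    ("q1", "s3", 2): 0.25, ("q3", "s3", 2): 0.05,
    ("q3", "s4", 2): 0.20,
}

def entropy(dist):
    """Shannon entropy in bits of a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal P(q, b), obtained by summing the joint over s.
p_qb = defaultdict(float)
for (q, s, b), p in joint.items():
    p_qb[(q, b)] += p

def conditional_entropy(joint, p_qb):
    """H(S | Q, B) = -sum P(q, b) P(s | q, b) log P(s | q, b)."""
    h = 0.0
    for (q, s, b), p in joint.items():
        if p > 0:
            cond = p / p_qb[(q, b)]
            h -= p_qb[(q, b)] * cond * math.log2(cond)
    return h

lhs = conditional_entropy(joint, p_qb)
rhs = entropy(joint) - entropy(p_qb)
```

Here `lhs` and `rhs` agree up to floating-point rounding.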

Page 14:

Maximum Entropy Estimate

Let vector x = P(Q, S, B).
Find the value of x that maximizes its entropy H(Q, S, B), while satisfying:
  h1(x) = c1, …, hu(x) = cu : equality constraints
  g1(x) ≤ d1, …, gv(x) ≤ dv : inequality constraints
A special case of Non-Linear Programming.

Page 15:

Constraints from Knowledge

[Diagram: Background Knowledge → Constraints on P(Q, S, B).]

Linear model: quite generic.
Conditional probability:
  P (S | Q) = P(Q, S) / P(Q).
Background knowledge has nothing to do with B:
  P(Q, S) = P(Q, S, B=1) + … + P(Q, S, B=m).
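Combining the two identities on this slide turns each knowledge type from the earlier "Complicated Background Knowledge" slide into a linear constraint on the variables P(q, s, b). The worked form below assumes that P(q) is a known constant, which holds if the QI values can be read off the published data.

```latex
% Assumption: P(q), P(q_1), P(q_2) are known constants from the published QI values.
\begin{align*}
P(s \mid q) = 0.2
  &\;\Longrightarrow\; \sum_{b=1}^{m} P(q, s, b) = 0.2\, P(q), \\
0.3 \le P(s \mid q) \le 0.5
  &\;\Longrightarrow\; 0.3\, P(q) \le \sum_{b=1}^{m} P(q, s, b) \le 0.5\, P(q), \\
P(s \mid q_1) + P(s \mid q_2) = 0.7
  &\;\Longrightarrow\; \frac{1}{P(q_1)} \sum_{b=1}^{m} P(q_1, s, b)
     + \frac{1}{P(q_2)} \sum_{b=1}^{m} P(q_2, s, b) = 0.7.
\end{align*}
```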

Page 16:

Constraints from Published Data

[Diagram: Published Data Set D’ → Constraints on P(Q, S, B).]

Constraints:
  Truth and only the truth.
  Absolutely correct for the original data set.
  No inference.

Page 17:

Assignment and Constraints

Observation: the original data is one of the assignments.
Constraint: true for all possible assignments.

Page 18:

QI Constraint

Constraint:

$$\sum_{j=1}^{h} P(q, s_j, b) = P(q, b)$$

Example:

$$P(q_1, s_1, 1) + P(q_1, s_2, 1) + P(q_1, s_3, 1) = P(q_1, 1) = 0.2$$

Page 19:

SA Constraint

Constraint:

$$\sum_{i=1}^{g} P(q_i, s, b) = P(s, b)$$

Example:

$$P(q_1, s_4, 2) + P(q_3, s_4, 2) + P(q_4, s_4, 2) = P(s_4, 2) = 0.1$$

Page 20:

Zero Constraint

P(q, s, b) = 0, if q or s does not appear in Bucket b.
We can reduce the number of variables.
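The QI, SA, and zero constraints can all be generated mechanically from the published buckets. The sketch below assumes a made-up published data set and a simple representation in which each constraint is a list of variables plus a right-hand side; the layout and the name `build_constraints` are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical published data: per bucket, the QI values and the SA values
# are released as two separate multisets.
buckets = {
    1: {"qi": ["q1", "q1", "q2"], "sa": ["s1", "s2", "s2"]},
    2: {"qi": ["q1", "q3"],       "sa": ["s3", "s4"]},
}
N = sum(len(b["qi"]) for b in buckets.values())

def build_constraints(buckets, n):
    """Return (variables, constraints); each constraint is (terms, rhs)."""
    variables, constraints = [], []
    for b, data in buckets.items():
        qi_counts, sa_counts = Counter(data["qi"]), Counter(data["sa"])
        # Zero constraint: only (q, s, b) with q and s both in bucket b
        # become variables; every other P(q, s, b) is fixed to 0.
        variables += [(q, s, b) for q in qi_counts for s in sa_counts]
        # QI constraint: sum over s of P(q, s, b) = P(q, b)
        for q, c in qi_counts.items():
            constraints.append(([(q, s, b) for s in sa_counts], Fraction(c, n)))
        # SA constraint: sum over q of P(q, s, b) = P(s, b)
        for s, c in sa_counts.items():
            constraints.append(([(q, s, b) for q in qi_counts], Fraction(c, n)))
    return variables, constraints

variables, constraints = build_constraints(buckets, N)
```

Without the zero constraint this toy data set would need 3 × 4 × 2 = 24 variables; with it, only 8 remain.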

Page 21:

Theoretic Properties

Soundness: Are they correct?
  Easy to prove.
Completeness: Have we missed any constraint?
  See our theorems and proofs.
Conciseness: Are there redundant constraints?
  Only one redundant constraint in each bucket.
Consistency: Is our approach consistent with the existing methods (i.e., when background knowledge is Ø)?
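One way to see where a redundant constraint in each bucket comes from (a reading of the conciseness claim, not the proof from the paper): summing all QI constraints of bucket b and summing all SA constraints of bucket b yield the same equation, so any single one of them can be derived from the rest.

```latex
\begin{align*}
\sum_{q} P(q, b)
  = \sum_{q} \sum_{s} P(q, s, b)
  = \sum_{s} \sum_{q} P(q, s, b)
  = \sum_{s} P(s, b)
  = P(b).
\end{align*}
```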

Page 22:

Completeness w.r.t. Equations

Have we missed any equality constraint?
  Yes! If F1 = C1 and F2 = C2 are constraints, F1 + F2 = C1 + C2 is too. However, it is redundant.
Completeness Theorem:
  U: our constraint set.
  All linear constraints can be written as linear combinations of the constraints in U.

Page 23:

Completeness w.r.t. Inequalities

Have we missed any inequality constraint?
  Yes! If F = C, then F ≤ C + 0.2 is also valid (but redundant).
Completeness Theorem:
  Our constraint set is also complete in the inequality sense.

Page 24:

Putting Them Together

[Diagram: boxes labeled Background Knowledge, Published Data, and Public Information lead to two sets of Constraints on P( S | Q ); a Maximum Entropy Estimate then gives the estimate of P( S | Q ).]

Tools: LBFGS, TOMLAB, KNITRO, etc.
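When there is no background knowledge, the whole pipeline can be sketched without a numerical solver. With only the QI and SA constraints, the maximum entropy solution inside each bucket is the product form P(q, s, b) = P(q, b) P(s, b) / P(b), a standard property of maximum entropy under fixed marginals. The data and function names below are made up for illustration; this is a closed-form shortcut, not the LBFGS-based implementation named on the slide.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical published buckets.
buckets = {
    1: {"qi": ["q1", "q1", "q2", "q3"], "sa": ["s1", "s2", "s2", "s3"]},
    2: {"qi": ["q1", "q4", "q4"],       "sa": ["s1", "s4", "s4"]},
}

def maxent_without_knowledge(buckets):
    """Closed-form MaxEnt estimate of P(q, s, b) under QI/SA constraints only."""
    n = sum(len(d["qi"]) for d in buckets.values())
    joint = {}
    for b, d in buckets.items():
        size = len(d["qi"])
        for q, cq in Counter(d["qi"]).items():
            for s, cs in Counter(d["sa"]).items():
                # P(q, b) * P(s, b) / P(b) = (cq/n) * (cs/n) / (size/n)
                joint[(q, s, b)] = Fraction(cq * cs, n * size)
    return joint

def posterior(joint):
    """P(s | q) = sum over b of P(q, s, b), divided by P(q)."""
    p_q = defaultdict(Fraction)
    p_qs = defaultdict(Fraction)
    for (q, s, b), p in joint.items():
        p_q[q] += p
        p_qs[(q, s)] += p
    return {(q, s): p / p_q[q] for (q, s), p in p_qs.items()}

joint = maxent_without_knowledge(buckets)
post = posterior(joint)
print(post[("q1", "s1")])  # 5/18
```

The value 5/18 is the bucket-level fractions 1/4 and 1/3 weighted by how often q1 falls in each bucket (2/3 and 1/3), which matches the per-bucket counting used when no background knowledge is available.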

Page 25:

Inevitable Questions

Where do we get background knowledge?
  Do we have to be very, very knowledgeable?
For P (s | q) type of knowledge:
  All useful knowledge is in the original data set.
  Association rules:
    Positive: Q ⇒ S
    Negative: Q ⇒ ¬S, ¬Q ⇒ S, ¬Q ⇒ ¬S
Bound the knowledge in our study.
  Top-K strongest association rules.
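A small sketch of how Top-K rules might be mined from the original data. It assumes that rule strength is measured by confidence, i.e. the empirical P(s | q) for a positive rule and 1 − P(s | q) for a negative rule of the form Q ⇒ ¬S; the slide does not define "strongest", so this ranking, the toy records, and the name `top_k_rules` are assumptions.

```python
from collections import Counter

# Hypothetical original data set of (q, s) pairs.
records = [("q1", "s1"), ("q1", "s1"), ("q1", "s2"),
           ("q2", "s2"), ("q2", "s2"), ("q2", "s2"),
           ("q3", "s1"), ("q3", "s3")]

def top_k_rules(records, k):
    """Rank rules Q => S (positive) and Q => not S (negative) by confidence."""
    q_counts = Counter(q for q, _ in records)
    qs_counts = Counter(records)
    sa_values = sorted({s for _, s in records})
    rules = []
    for q in sorted(q_counts):
        for s in sa_values:
            conf = qs_counts[(q, s)] / q_counts[q]   # empirical P(s | q)
            rules.append((conf, f"{q} => {s}"))
            rules.append((1 - conf, f"{q} => not {s}"))
    # Highest confidence first; ties are broken alphabetically.
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules[:k]

strongest = top_k_rules(records, 3)
```

Each selected rule with confidence c can then be written as a constraint P(s | q) = c (or P(s | q) = 1 − c for a negative rule).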

Page 26:

Knowledge about Individuals

Alice: (i1, q1)
Bob: (i4, q2)
Charlie: (i9, q5)

Knowledge 1: Alice has either s1 or s4.

Constraint:

$$P(i_1, q_1, s_1, 1) + P(i_1, q_1, s_1, 2) + P(i_1, q_1, s_4, 2) = P(i_1, q_1) = \frac{1}{N}$$

Knowledge 2: Two people among Alice, Bob, and Charlie have s4.

Constraint:

$$P(i_1, q_1, s_4, 2) + P(i_4, q_2, s_4, 3) + P(i_9, q_5, s_4, 3) = \frac{2}{N}$$

Page 27:

Evaluation

Implementation:
  Lagrange multipliers: Constrained Optimization → Unconstrained Optimization.
  LBFGS: solving the unconstrained optimization problem.
Pentium 3GHz CPU with 4GB memory.
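The step from constrained to unconstrained optimization can be sketched on a tiny problem. For linear equality constraints A x = c, setting the derivative of the Lagrangian to zero gives x_i = exp(−1 − Σ_j λ_j A_ji), and the multipliers λ minimize an unconstrained convex dual. The sketch below swaps in plain gradient descent for LBFGS to stay dependency-free; the step size, iteration count, and toy bucket are assumptions chosen for illustration.

```python
import math

def solve_maxent(A, c, step=0.5, iters=20000):
    """Maximize -sum x_i log x_i subject to A x = c, via the Lagrangian dual.

    Stationarity gives x_i = exp(-1 - sum_j lam_j * A[j][i]); the multipliers
    are found by unconstrained gradient descent on the dual (not LBFGS).
    """
    m, n = len(A), len(A[0])
    lam = [0.0] * m

    def primal(lam):
        return [math.exp(-1.0 - sum(lam[j] * A[j][i] for j in range(m)))
                for i in range(n)]

    for _ in range(iters):
        x = primal(lam)
        for j in range(m):
            residual = sum(A[j][i] * x[i] for i in range(n)) - c[j]
            lam[j] += step * residual   # dual gradient is c - A x
    return primal(lam)

# One hypothetical bucket holding all records.
# Variables: x = [P(q1,s1,1), P(q1,s2,1), P(q2,s1,1), P(q2,s2,1)]
A = [[1, 1, 0, 0],   # QI constraint for q1: P(q1, 1) = 0.5
     [0, 0, 1, 1],   # QI constraint for q2: P(q2, 1) = 0.5
     [1, 0, 1, 0],   # SA constraint for s1: P(s1, 1) = 0.25
     [0, 1, 0, 1]]   # SA constraint for s2: P(s2, 1) = 0.75
c = [0.5, 0.5, 0.25, 0.75]

x_plain = solve_maxent(A, c)

# Background knowledge P(s1 | q1) = 0.4, i.e. P(q1, s1, 1) = 0.4 * P(q1) = 0.2.
x_known = solve_maxent(A + [[1, 0, 0, 0]], c + [0.2])
```

Without knowledge, the constraints admit the product-form solution P(q, b) P(s, b) / P(b). In this 2 × 2 toy case the extra knowledge row pins down every variable, so it shows the mechanics of adding a constraint rather than the effect of entropy maximization.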

Page 28:

Privacy versus Knowledge

Estimation Accuracy: KL Distance between P(MaxEnt) (S | Q) and P(Original) (S | Q).
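A minimal sketch of such an accuracy measure. The slide does not give the exact formula, so two choices here are assumptions: the original distribution is used as the reference, and the per-q divergences are weighted by P(q). The function name and the toy numbers are illustrative.

```python
import math

def kl_distance(p_orig, p_est, p_q):
    """Sum over q of P(q) * KL( P_orig(. | q) || P_est(. | q) ), in bits."""
    total = 0.0
    for q, weight in p_q.items():
        for s, p in p_orig[q].items():
            if p == 0:
                continue            # 0 * log(0 / x) is taken as 0
            est = p_est[q].get(s, 0.0)
            if est == 0:
                return math.inf     # the estimate rules out a value that occurs
            total += weight * p * math.log2(p / est)
    return total

# Hypothetical distributions for a single QI value.
p_q = {"q1": 1.0}
p_original = {"q1": {"s1": 0.5, "s2": 0.5}}
p_maxent = {"q1": {"s1": 0.25, "s2": 0.75}}

same = kl_distance(p_original, p_original, p_q)
diff = kl_distance(p_original, p_maxent, p_q)
```

A distance of 0 means the estimate reproduces the original P(S | Q) exactly; larger values mean the adversary's estimate is further from the truth.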

Page 29:

Privacy versus # of QI attributes

Page 30:

Performance vs. Knowledge

Page 31:

Running Time vs. Data Size

Page 32:

Iteration vs. Data size

Page 33:

Conclusion

Privacy-MaxEnt is a systematic method:
  Models various types of knowledge.
  Models the information from the published data.
  Based on well-established theory.
Future work:
  Reducing the # of constraints.
  Vague background knowledge.
  Background knowledge about individuals.