PRIVACY CRITERIA. Roadmap Privacy in Data mining Mobile privacy (k-e) – anonymity (c-k) – safety...
Date posted: 15-Jan-2016
PRIVACY CRITERIA
Roadmap
Privacy in Data mining
Mobile privacy
(k, e)-anonymity
(c, k)-safety
Privacy skyline
Privacy in data mining
Random perturbation (quantitative data)
  Given a value x, return the value x + r, where r is a random value drawn from a distribution
  Construct a decision-tree classifier on the perturbed data such that its accuracy is comparable to that of classifiers built on the original data
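A minimal sketch of the perturbation step, assuming uniform zero-mean noise; the `perturb` helper, the `spread` parameter, and the age data are illustrative and not from the slides. It only shows that aggregates survive while single values are disguised; building the classifier is out of scope here.

```python
import random
import statistics

def perturb(values, spread, rng):
    """Return each value x as x + r, with r drawn uniformly from [-spread, spread]."""
    return [x + rng.uniform(-spread, spread) for x in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
ages = [rng.randint(20, 60) for _ in range(10_000)]  # made-up quantitative attribute
noisy_ages = perturb(ages, spread=10.0, rng=rng)

# Zero-mean noise hides individual values but leaves the overall mean almost unchanged.
mean_gap = abs(statistics.mean(ages) - statistics.mean(noisy_ages))
```

The noise distribution is a design choice: a wider spread hides more but makes any model trained on the perturbed data less accurate.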
Randomized response (categorical data)
  Basic idea: disguise the data by probabilistically changing the value of a sensitive attribute to another value
  The distribution of the original data can be reconstructed from the disguised data
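The reconstruction claim can be sketched for the simplest case, a binary sensitive attribute. This assumes each value is kept with a known probability `p_keep` and flipped otherwise; the function names and the 30% base rate are made up for illustration.

```python
import random

def randomize(values, p_keep, rng):
    """Keep each binary value with probability p_keep, otherwise flip it."""
    return [v if rng.random() < p_keep else 1 - v for v in values]

def estimate_true_rate(disguised, p_keep):
    """Invert observed = p_keep*pi + (1 - p_keep)*(1 - pi) to recover pi."""
    observed = sum(disguised) / len(disguised)
    return (observed - (1 - p_keep)) / (2 * p_keep - 1)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
true_values = [1 if rng.random() < 0.3 else 0 for _ in range(50_000)]
disguised = randomize(true_values, p_keep=0.8, rng=rng)

true_rate = sum(true_values) / len(true_values)
estimate = estimate_true_rate(disguised, p_keep=0.8)  # close to true_rate
```

The estimator is undefined for `p_keep = 0.5`, where the disguised data carries no information about the original distribution.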
Mobile privacy
Spatial cloaking
  Cloaked region: contains location q and at least k-1 other user locations
  Circular region around location q: contains location q and a number of dummy locations generated by the client
Transformation-based matching
  Transform the region through Hilbert curves by using Hilbert keys
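As an illustration of what a Hilbert key is, the sketch below uses the standard iterative grid-to-curve mapping. The function name `hilbert_key` and the 4 x 4 grid are assumptions; the slides do not say how keys are generated or kept secret.

```python
def hilbert_key(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its index along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is traversed in the right orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

cells = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(cells, key=lambda c: hilbert_key(4, *c))  # cells in curve order
```

Cells that are consecutive along the curve are always neighbours in the grid, which is why the one-dimensional keys preserve spatial locality reasonably well.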
Casper: the user registers with a (k, Amin) profile
  k: the user is k-anonymous
  Amin: minimum acceptable resolution of the cloaked spatial region
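A naive sketch of how a (k, Amin) profile constrains a cloaked region. This is not Casper's actual algorithm; it simply grows a square around q, and the `cloak` function, the `step` parameter, and the coordinates are hypothetical.

```python
def cloak(q, others, k, a_min, step=1.0):
    """Grow a square centred on q until it holds q plus >= k-1 other users and has area >= a_min."""
    if len(others) < k - 1:
        raise ValueError("not enough users to reach k-anonymity")
    half = step
    while True:
        inside = [p for p in others
                  if abs(p[0] - q[0]) <= half and abs(p[1] - q[1]) <= half]
        area = (2 * half) ** 2
        if len(inside) >= k - 1 and area >= a_min:
            # Region returned as (x_min, y_min, x_max, y_max).
            return (q[0] - half, q[1] - half, q[0] + half, q[1] + half)
        half += step

users = [(1, 1), (2, 5), (6, 2), (9, 9)]  # made-up locations of other users
region = cloak(q=(3, 3), others=users, k=3, a_min=16.0)
```

Centring the region on q makes q easy to guess, so a practical scheme would snap the region to a grid or otherwise offset it.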
(k, e)-anonymity
Privacy protection for numerical sensitive attributes
Goal: group the sensitive attribute values such that each group has
  no fewer than k distinct values
  a range larger than threshold e
Permutation-based technique to support aggregate queries
  Constructing a help table
Aggregate Query Answering on Anonymized Tables @ ICDE 2007
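The two group conditions can be checked with a few lines. The sketch below is an assumption-laden illustration: the function name and salary figures are made up, groups are assumed non-empty, and the range bound is treated as "at least e" (switch to a strict comparison if a strict bound is intended).

```python
def is_k_e_anonymous(groups, k, e):
    """Check that every group has >= k distinct sensitive values and a value range >= e."""
    for values in groups:
        if len(set(values)) < k:
            return False
        # Assumption: the range bound is inclusive.
        if max(values) - min(values) < e:
            return False
    return True

# Made-up sensitive values (salaries), already partitioned into groups.
salary_groups = [[30_000, 42_000, 55_000], [61_000, 64_000, 90_000]]
ok = is_k_e_anonymous(salary_groups, k=3, e=20_000)
```

The range condition is what separates this criterion from purely counting distinct values: three salaries within a few dollars of each other are distinct but still reveal the value.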
(k, e)-anonymity
[Tables not preserved in the transcript: Original Table, Table after Permutation, Help Table]
(c, k)-safety
Goal:
  quantify the attacker's background knowledge by a number k
  the maximum disclosure with respect to k is less than threshold c
Express background knowledge through a language
Worst-Case Background Knowledge for Privacy-Preserving Data Publishing @ ICDE 2007
(c, k)-safety
Create buckets, then randomly permute the sensitive attribute values within each bucket
[Tables not preserved in the transcript: Original Table, Bucketized Table]
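Bucketization as described above can be sketched as follows. The fixed-size chunking, the `bucketize` name, and the person/disease pairs are illustrative assumptions; the slides do not say how rows are assigned to buckets.

```python
import random

def bucketize(rows, bucket_size, rng):
    """Split rows into buckets and shuffle the sensitive column inside each bucket."""
    buckets = []
    for start in range(0, len(rows), bucket_size):
        chunk = rows[start:start + bucket_size]
        sensitive = [s for _, s in chunk]
        rng.shuffle(sensitive)  # breaks the link between a person and their value
        buckets.append([(person, s) for (person, _), s in zip(chunk, sensitive)])
    return buckets

# Made-up (person, sensitive value) rows.
rows = [("Jack", "flu"), ("Charlie", "flu"), ("Ann", "cancer"), ("Tom", "AIDS")]
buckets = bucketize(rows, bucket_size=2, rng=random.Random(7))
```

Each bucket keeps the same multiset of sensitive values, so per-bucket counts stay exact while the person-to-value assignment is hidden.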
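(c, k)-safety
Bound the background knowledge, i.e., the attacker knows k basic implications
Atom: $t_p[S] = s$, with $s \in S$ and $p \in \text{Person}$, e.g. $t_{\text{Jack}}[\text{Disease}] = \text{flu}$
Basic implication: $(A_1 \wedge \dots \wedge A_m) \rightarrow (B_1 \vee \dots \vee B_n)$ for some m, n and atoms $A_i$, $B_j$
  e.g. $t_{\text{Jack}}[\text{Disease}] = \text{flu} \rightarrow t_{\text{Charlie}}[\text{Disease}] = \text{flu}$
$\mathcal{L}^k_{\text{basic}}$ is the language consisting of conjunctions of k basic implications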
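To make the language concrete, the sketch below evaluates basic implications against one candidate assignment of sensitive values. The tuple encoding of atoms and all function names are assumptions made for illustration.

```python
def holds(atom, table):
    """An atom (person, value) holds if the table assigns that value to that person."""
    person, value = atom
    return table.get(person) == value

def implication_holds(antecedents, consequents, table):
    """(A1 and ... and Am) -> (B1 or ... or Bn), evaluated on one concrete table."""
    if all(holds(a, table) for a in antecedents):
        return any(holds(b, table) for b in consequents)
    return True  # an implication with a false antecedent is vacuously true

def consistent(knowledge, table):
    """Knowledge is a conjunction of basic implications; all of them must hold."""
    return all(implication_holds(a, c, table) for a, c in knowledge)

# "If Jack has flu then Charlie has flu", i.e. k = 1 basic implication.
knowledge = [([("Jack", "flu")], [("Charlie", "flu")])]
```

An attacker with this knowledge can discard every assignment for which `consistent` returns `False`, which is how background knowledge raises disclosure.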
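(c, k)-safety
Find a bucketization B of the original table such that B is (c, k)-safe
  The maximum disclosure of B with respect to the language $\mathcal{L}^k_{\text{basic}}$ of conjunctions of k basic implications is less than threshold c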
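For intuition only, here is a sketch of the special case k = 0, where the attacker has no background knowledge. It assumes every permutation within a bucket is equally likely, so the attacker's best guess for anyone in a bucket is that bucket's most frequent value. The general case with k > 0 implications is harder and is not covered; the function names and data are hypothetical.

```python
from collections import Counter

def max_disclosure_no_knowledge(buckets):
    """Largest fraction any single sensitive value takes within a bucket (the k = 0 case)."""
    worst = 0.0
    for values in buckets:
        most_common_count = Counter(values).most_common(1)[0][1]
        worst = max(worst, most_common_count / len(values))
    return worst

def is_safe_without_knowledge(buckets, c):
    """Safe if the maximum disclosure stays strictly below the threshold c."""
    return max_disclosure_no_knowledge(buckets) < c

# Made-up sensitive values per bucket.
sensitive_buckets = [["flu", "flu", "cancer", "AIDS"], ["flu", "cancer"]]
disclosure = max_disclosure_no_knowledge(sensitive_buckets)
```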
Privacy skyline
Original data is transformed into generalized or bucketized data
Quantify external knowledge through a skyline for each sensitive value
External knowledge for each individual
  having a single sensitive value
  having multiple sensitive values
Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge @ VLDB 2007
Privacy skyline
Three types of knowledge (l, k, m), e.g. (2, 3, 1)
l: knowledge about the target individual t
  flu ∉ Tom[S] and cancer ∉ Tom[S] (obtained from Tom's friend)
k: knowledge about individuals (u1, …, uk) other than t
  flu ∈ Bob[S] and flu ∈ Cary[S] and cancer ∈ Frank[S] (obtained from another hospital)
m: knowledge about the relationship between t and other individuals (v1, …, vm)
  AIDS ∈ Ann[S] → AIDS ∈ Tom[S] (because Ann is Tom's wife)
Privacy skyline
Example: knowledge threshold (1, 5, 2) and confidence c = 50% for sensitive value AIDS
  The adversary knows l ≤ 1 sensitive values that t does not have
  The adversary knows the sensitive values of k ≤ 5 other individuals
  The adversary knows m ≤ 2 members of t's same-value family
When the above hold, the adversary cannot predict that individual t has AIDS with confidence of 50% or more
Privacy skyline
If the transformed data D* is safe for (1, 5, 2), it is also safe for any (l, k, m) with l ≤ 1, k ≤ 5, m ≤ 2,
i.e., every point dominated by (1, 5, 2) (the shaded region in the slide figure, not preserved here)
Privacy skyline
Skyline for set of incomparable points {(1, 1, 5), (1, 3, 4), (1, 5, 2)}
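A skyline keeps only the points that no other point dominates. The sketch below computes it for (l, k, m) triples; the two extra dominated points are added for illustration and are not from the slides.

```python
def dominates(p, q):
    """p dominates q if p is at least as large in every coordinate and differs from q."""
    return p != q and all(a >= b for a, b in zip(p, q))

def skyline(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(o, p) for o in points)]

# The three slide points plus two made-up dominated points.
points = [(1, 1, 5), (1, 3, 4), (1, 5, 2), (1, 2, 2), (0, 5, 1)]
result = skyline(points)
```

Since safety at a point implies safety at every point it dominates, only the skyline points need to be checked.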
Privacy skyline
Given a skyline {(l1, k1, m1, c1), …, (lr, kr, mr, cr)}, a release candidate D* is safe for sensitive value $\sigma$ iff, for i = 1 to r,
$\max \{\Pr(\sigma \in t[S] \mid L_{t,\sigma}(l_i, k_i, m_i), D^*)\} < c_i$
That is, the maximum probability that $\sigma$ is a sensitive value of individual t, given the external knowledge and the release candidate, is below the confidence threshold $c_i$
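The safety test is a loop over the skyline points. In the sketch below the worst-case probability is supplied by the caller, because computing it from D* is the hard part and is not shown in the slides; `toy_probability` and its numbers are made-up stand-ins.

```python
def is_safe(skyline, max_breach_probability):
    """True iff the worst-case probability stays below c_i for every skyline point."""
    return all(max_breach_probability(l, k, m) < c for (l, k, m, c) in skyline)

# Hypothetical stand-in: a real implementation would derive these values from D*.
TOY_VALUES = {(1, 1, 5): 0.42, (1, 3, 4): 0.47, (1, 5, 2): 0.55}

def toy_probability(l, k, m):
    return TOY_VALUES[(l, k, m)]

strict = [(1, 1, 5, 0.5), (1, 3, 4, 0.5), (1, 5, 2, 0.5)]
relaxed = [(1, 1, 5, 0.6), (1, 3, 4, 0.6), (1, 5, 2, 0.6)]
```

With these toy numbers the release would be rejected under the 50% thresholds, because the point (1, 5, 2) reaches 0.55, and accepted under the 60% thresholds.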
[Tables not preserved in the transcript: Original Table, Generalized Table, Bucketized Table]