Privacy-preserving Anonymization of Set Value Data
Manolis Terrovitis, Institute for the Management of Information Systems (IMIS), RC Athena
Nikos Mamoulis, University of Hong Kong (HKU)
Panos Kalnis, King Abdullah University of Science and Technology (KAUST)
Motivation
- The attacker can see up to m items of a transaction
- Any m items: there is no distinction between sensitive and non-sensitive items
[Figure: Helen with the items 0% Milk, Pregnancy test, and Beer]
Motivation (cont.)
Database:
Helen: Beer, 0% Milk, Pregnancy test
John: Cola, Cheese
Tom: 2% Milk, Coffee
…
Mary: Wine, Beer, Full-fat Milk

Published (names removed):
t1: Beer, 0% Milk, Pregnancy test
t2: Cola, Cheese
t3: 2% Milk, Coffee
…
tn: Wine, Beer, Full-fat Milk

Attacker: find all transactions that contain Beer & 0% Milk

Published after generalizing 0% Milk, 2% Milk and Full-fat Milk to Milk:
t1: Beer, Milk, Pregnancy test
t2: Cola, Cheese
t3: Milk, Coffee
…
tn: Wine, Beer, Milk
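The attacker's query can be replayed in a few lines of Python. This is a minimal sketch that uses only the rows shown (the elided rows are left out); `gen_map`, `generalize` and `query` are hypothetical names, and the milk mapping is read off the generalized rows above.

```python
# Published transactions from the example (elided rows omitted).
published = {
    "t1": {"Beer", "0% Milk", "Pregnancy test"},
    "t2": {"Cola", "Cheese"},
    "t3": {"2% Milk", "Coffee"},
    "tn": {"Wine", "Beer", "Full-fat Milk"},
}

# Hypothetical generalization map: every milk variant becomes "Milk".
gen_map = {"0% Milk": "Milk", "2% Milk": "Milk", "Full-fat Milk": "Milk"}


def generalize(items, mapping):
    """Replace each item by its generalized form (unmapped items stay as-is)."""
    return {mapping.get(item, item) for item in items}


def query(db, terms):
    """Return the ids of all transactions that contain every query term."""
    return sorted(tid for tid, items in db.items() if terms <= items)


generalized = {tid: generalize(items, gen_map) for tid, items in published.items()}

print(query(published, {"Beer", "0% Milk"}))  # ['t1']
print(query(generalized, generalize({"Beer", "0% Milk"}, gen_map)))  # ['t1', 'tn']
```

On the raw data the two known items isolate t1, i.e. Helen; after generalization the same knowledge matches two transactions among the rows shown.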
$k^m$-anonymity
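Set of items: $\mathcal{I} = \{o_1, o_2, \ldots\}$
Transaction database: $D = \{t_1, t_2, \ldots\}$, where each transaction $t_i \subseteq \mathcal{I}$
Query terms: $qs \subseteq \mathcal{I}$ with $|qs| \le m$
Query result: $res = \{t \mid t \in D,\ qs \subseteq t\}$
$k^m$-anonymity: for every such $qs$, either $|res| = 0$ or $|res| \ge k$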
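The $k^m$-anonymity condition can be checked by brute force: any itemset with nonzero support is a subset of some transaction, so it is enough to count every subset of size 1 to m of each transaction and flag those whose support is below k. A minimal Python sketch; `km_violations` and `is_km_anonymous` are hypothetical helpers, and the exhaustive enumeration is only practical for small m and short transactions.

```python
from collections import Counter
from itertools import combinations


def km_violations(db, k, m):
    """Return {itemset: support} for itemsets of size <= m with 0 < support < k."""
    support = Counter()
    for t in db:
        items = sorted(set(t))  # canonical order so each itemset has one key
        for size in range(1, m + 1):
            for subset in combinations(items, size):
                support[subset] += 1
    # Itemsets that never occur are absent from the Counter (support 0 is allowed).
    return {s: c for s, c in support.items() if c < k}


def is_km_anonymous(db, k, m):
    return not km_violations(db, k, m)


example_db = [{"a1", "b1"}, {"a1", "b1"}, {"a1", "b2"}, {"a1", "b2"}]
print(is_km_anonymous(example_db, k=2, m=2))  # True
print(km_violations(example_db, k=3, m=1))  # {('b1',): 2, ('b2',): 2}
```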
Related Work: k-Anonymity [Swe02]
(a) Microdata (quasi-identifier: Age, ZipCode)

| Age | ZipCode | Disease |
|---|---|---|
| 42 | 25000 | Flu |
| 46 | 35000 | AIDS |
| 50 | 20000 | Cancer |
| 54 | 40000 | Gastritis |
| 48 | 50000 | Dyspepsia |
| 56 | 55000 | Bronchitis |

(b) 2-anonymous microdata

| Age | ZipCode | Disease |
|---|---|---|
| 42-46 | 25000-35000 | Flu |
| 42-46 | 25000-35000 | AIDS |
| 50-54 | 20000-40000 | Cancer |
| 50-54 | 20000-40000 | Gastritis |
| 48-56 | 50000-55000 | Dyspepsia |
| 48-56 | 50000-55000 | Bronchitis |

Not suitable for high-dimensional data

[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
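Whether table (b) is 2-anonymous can be checked mechanically: group the rows on the quasi-identifier columns and require every group to contain at least k rows. A minimal Python sketch; `is_k_anonymous` and its `qi_columns` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter

# Table (b): (Age, ZipCode, Disease); Age and ZipCode form the quasi-identifier.
table_b = [
    ("42-46", "25000-35000", "Flu"),
    ("42-46", "25000-35000", "AIDS"),
    ("50-54", "20000-40000", "Cancer"),
    ("50-54", "20000-40000", "Gastritis"),
    ("48-56", "50000-55000", "Dyspepsia"),
    ("48-56", "50000-55000", "Bronchitis"),
]


def is_k_anonymous(rows, k, qi_columns=(0, 1)):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return all(size >= k for size in groups.values())


print(is_k_anonymous(table_b, 2))  # True
print(is_k_anonymous(table_b, 3))  # False
```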
Related Work: L-diversity in Transactions
[GTK08] G. Ghinita, Y. Tao, P. Kalnis. On the Anonymization of Sparse High-Dimensional Data. ICDE, 2008.
Requires knowledge of (non)-sensitive attributes
Our Approach: Employs Generalization
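Generalization hierarchy: item A generalizes the leaf items a1 and a2

Information loss of an item p, where $|u_p|$ is the number of leaf items under p and $|\mathcal{I}|$ is the total number of leaf items:

$$NCP(p) = \begin{cases} 0, & \text{if } p \text{ is a leaf node} \\ \dfrac{|u_p|}{|\mathcal{I}|}, & \text{otherwise} \end{cases}$$

Example: k = 2, m = 2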
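The NCP formula can be evaluated directly on a small hierarchy. In this minimal Python sketch only A over a1 and a2 comes from the slide; the B branch and the root ALL are assumptions added so that $|\mathcal{I}| = 4$. `leaf_count` and `ncp` are hypothetical helper names.

```python
# Hypothetical hierarchy (parent -> children); only A -> {a1, a2} is from the slide.
CHILDREN = {"ALL": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}


def leaf_count(node, children):
    """|u_p|: number of leaf items in the subtree rooted at node."""
    kids = children.get(node)
    if not kids:
        return 1  # a leaf counts itself
    return sum(leaf_count(kid, children) for kid in kids)


def ncp(node, children, domain_size):
    """NCP(p) = 0 for a leaf node, |u_p| / |I| otherwise."""
    if node not in children:
        return 0.0
    return leaf_count(node, children) / domain_size


DOMAIN_SIZE = leaf_count("ALL", CHILDREN)  # |I| = 4 leaf items
for p in ("a1", "A", "ALL"):
    print(p, ncp(p, CHILDREN, DOMAIN_SIZE))
# a1 0.0
# A 0.5
# ALL 1.0
```

Generalizing to the root costs the maximum penalty of 1, while keeping an original item costs nothing.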
Lattice of Generalizations
Optimal Algorithm
Count Tree
[Figure: count tree for a transaction t over the items a1, a2, b1, b2 and their generalizations A, B; each path is an itemset and each node carries a support count]
All generalized forms of the paths reside in the tree, so we can easily find which anonymizations are needed.
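A count tree can be sketched as a trie over sorted itemsets. This minimal Python sketch assumes that each node stores the support of the itemset spelled by its path from the root; to make that hold, every subset of size 1 to `max_size` is inserted and only the last node of each path is incremented. Generalized items are not added here (the transactions could be extended with them beforehand). `CountTreeNode`, `build_count_tree` and `low_support_paths` are hypothetical names.

```python
from itertools import combinations


class CountTreeNode:
    """Trie node; `count` is the support of the itemset spelled by its path."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_count_tree(transactions, max_size):
    root = CountTreeNode()
    for t in transactions:
        items = sorted(set(t))  # fixed item order so each itemset has one path
        for size in range(1, max_size + 1):
            for subset in combinations(items, size):
                node = root
                for item in subset:
                    node = node.children.setdefault(item, CountTreeNode())
                node.count += 1  # only the last node of the path is incremented
    return root


def low_support_paths(node, k, prefix=()):
    """Yield (itemset, support) for every path whose support is below k."""
    for item, child in sorted(node.children.items()):
        path = prefix + (item,)
        if child.count < k:
            yield path, child.count
        yield from low_support_paths(child, k, path)


tree = build_count_tree([{"a1", "b1"}, {"a1", "b1"}, {"a2", "b2"}], max_size=2)
print(list(low_support_paths(tree, k=2)))
# [(('a2',), 1), (('a2', 'b2'), 1), (('b2',), 1)]
```

Because every prefix of a sorted subset is itself a shorter sorted subset, it was inserted too, which is why intermediate nodes hold correct supports.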
Apriori-based Anonymization
- Globally optimal vs. locally optimal solution for each path
- We examine the paths by size (Apriori principle); paths with invalid nodes are skipped
Apriori-based Anonymization
1. Initialize gen_map
2. For i := 1 to m do
   1. For all t ∈ D do
      1. Extend t according to gen_map
      2. Add all i-subsets of extended t to count-tree
   2. Check all paths in count-tree and update gen_map
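The loop structure of the steps above can be sketched in Python under several simplifying assumptions: a `Counter` stands in for the count-tree, "extending" t here only applies the current gen_map (it does not add ancestor items), and the rule for choosing which item to generalize is a hypothetical greedy choice, not the selection procedure used in the paper. The hierarchy `PARENT` and the helpers `current_form` and `apriori_anonymize` are illustrative names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hierarchy (child -> parent); ALL is the root.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "A": "ALL", "B": "ALL"}


def current_form(item, gen_map):
    """Follow gen_map to the item's current generalized form."""
    while item in gen_map:
        item = gen_map[item]
    return item


def apriori_anonymize(db, k, m, parent):
    gen_map = {}  # step 1: nothing is generalized yet (global recoding map)
    for i in range(1, m + 1):  # step 2: itemset sizes 1..m, Apriori order
        while True:
            # Apply gen_map to every t and count the support of its i-subsets.
            support = Counter()
            for t in db:
                generalized = sorted({current_form(x, gen_map) for x in t})
                support.update(combinations(generalized, i))
            violating = [s for s, c in support.items() if c < k]
            if not violating:
                break  # every i-itemset now has support 0 or >= k
            # Hypothetical greedy choice: rarest violating itemset, first item
            # in it that can still be generalized one level up.
            worst = min(violating, key=lambda s: (support[s], s))
            item = next((x for x in worst if x in parent), None)
            if item is None:
                raise ValueError("cannot reach k^m-anonymity (fewer than k transactions?)")
            target = parent[item]
            # Global recoding: every child of target is mapped to target.
            for child, p in parent.items():
                if p == target:
                    gen_map[child] = target
    return gen_map


db = [{"a1", "b1"}, {"a2", "b1"}, {"a1", "b2"}, {"a2", "b2"}]
print(apriori_anonymize(db, k=2, m=2, parent=PARENT))  # {'a1': 'A', 'a2': 'A'}
```

Each iteration adds at least one new entry to gen_map, so the loop terminates; it raises only when even the root cannot reach support k.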
Small Datasets (2-15K, BMS-WebView2)
|I|=40..60, k=100, m=3
Small Datasets (BMS-WebView2)
|D|=10K, k=100, m=1..4
Apriori Anonymization for Large Datasets
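[Figure: running time (sec) of Apriori anonymization on three large datasets]

Dataset sizes (|D| transactions, |I| distinct items):

| Transactions | Distinct items |
|---|---|
| 515K | 1657 |
| 59K | 497 |
| 77K | 3340 |

Parameters: k = 5, m = 3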
Points to Remember
Anonymization of transactional data:
- The attacker knows up to m items; any m items can be the quasi-identifier
- Global recoding method
  - Optimal solution: too slow
  - Apriori Anonymization: fast, with low information loss
Extensions (VLDBJ 2010):
- Local recoding (sort by Gray order and partition)
- Global recoding (by partitioning the data domain)