1
Publishing Naive Bayesian Classifiers: Privacy without Accuracy Loss
Author: Barzan Mozafari and Carlo Zaniolo
Speaker: Hongwei Tian
2
Outline
• Motivation
• Brief background on NBC
• Privacy breach for views
• Transformation from unsafe views to safe views
• Extension for arbitrary prior distributions
• Experiments
• Conclusion
3
Motivation
• PPDM methods seek to achieve the benefits of data mining without compromising the privacy of the individuals in the data.
– data collection phase
– data publishing phase
– data mining phase
4
Motivation
• Privacy breaches when publishing NBCs
– Bob knows that Alice lives on Westwood and she is in her 40s
– Bob’s prior belief that Alice earns 70K was 5/7 ≈ 71%
– After seeing the views, Bob infers that, with probability 1/10 × (4/5 + 4×3/4 + 5×1) = 88%, Alice earns a 70K salary.
5
Motivation
• Publishing better views
– Bob’s posterior belief: 1/6 × (2/3 + 1/2 + 1/2 + 1 + 1 + 1) ≈ 78%
– 71%-to-78% is safer than 71%-to-88%
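The prior and the two posterior figures in this example can be checked with a few lines of exact arithmetic (only the fractions quoted on the slides are used; the view contents themselves are not shown here):

```python
from fractions import Fraction as F

# Bob's prior belief that Alice earns 70K: 5 of 7 tuples.
prior = F(5, 7)

# Posterior after the original (unsafe) views: 1/10 * (4/5 + 4*(3/4) + 5*1)
posterior_unsafe = F(1, 10) * (F(4, 5) + 4 * F(3, 4) + 5 * 1)

# Posterior after the better views: 1/6 * (2/3 + 1/2 + 1/2 + 1 + 1 + 1)
posterior_safe = F(1, 6) * (F(2, 3) + F(1, 2) + F(1, 2) + 1 + 1 + 1)

print(round(float(prior), 2), round(float(posterior_unsafe), 2),
      round(float(posterior_safe), 2))  # 0.71 0.88 0.78
```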
6
Motivation
• Achieve the same classification results
– Test input is <P, 30>
– The NBC built on V1 predicts the class label as 50K, because 5/7 × 1/5 × 1/5 < 2/7 × 1/2 × 1/2
– The prediction from the second classifier (built on V2) is again 50K, because 3/5 × 1/3 × 1/3 < 2/5 × 1/2 × 1/2
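These comparisons are easy to reproduce. The sketch below scores each class with the products quoted above and confirms that both classifiers return 50K (only the slide's numbers are used, not the underlying views V1 and V2):

```python
# Per-class scores for test input <P, 30>, as quoted on the slide.
v1_scores = {"70K": (5/7) * (1/5) * (1/5), "50K": (2/7) * (1/2) * (1/2)}
v2_scores = {"70K": (3/5) * (1/3) * (1/3), "50K": (2/5) * (1/2) * (1/2)}

def predict(scores):
    # NBC picks the class with the largest score.
    return max(scores, key=scores.get)

print(predict(v1_scores), predict(v2_scores))  # 50K 50K
```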
7
Motivation
• NBC has proved to be one of the most effective classifiers in practice and in theory.
• Given an unsafe NBC, it is possible to find an equivalent one that is safer to publish.
• The objective is to determine whether a set of NBC-enabling views is safe to publish,
• and, if not, how to find a secure database that produces the same NBC model while satisfying the privacy requirements.
8
Brief Background on NBC
• The original database T is an instance of a relation $R(A_1, \ldots, A_n, C)$.
• In order to build an NBC, the only views that need to be published are $\pi_{A_i, C}(T)$ for all $1 \le i \le n$, and $\pi_C(T)$.
• Equivalently to publishing these views, one can instead publish the following counts. For $1 \le i \le n$, $t_i \in A_i$, $c \in C$:
$N_{t_i,c} = |\sigma_{A_i = t_i \wedge C = c}(T)|$ and $P_c = |\sigma_{C = c}(T)|$
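A minimal sketch of these counts, computed over a hypothetical toy instance of R (the column names and values below are illustrative, not from the paper):

```python
from collections import Counter

# Toy instance of R(A1, A2, C): (street, age-group, salary) rows.
rows = [
    ("P", "30s", "70K"),
    ("P", "40s", "70K"),
    ("W", "40s", "70K"),
    ("W", "30s", "50K"),
    ("P", "30s", "50K"),
]
n = 2  # number of non-class attributes

# P_c = |sigma_{C = c}(T)|
P = Counter(r[-1] for r in rows)

# N_{t_i,c} = |sigma_{A_i = t_i AND C = c}(T)|, keyed by (i, t_i, c)
N = Counter((i, r[i], r[-1]) for r in rows for i in range(n))

print(P["70K"], N[(0, "P", "70K")])  # 3 2
```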
9
Brief Background on NBC
• Using these counts, we can express the NBC’s probability estimation as follows. For all $t = (t_1, \ldots, t_n) \in A_1 \times \cdots \times A_n$ and for all $c \in C$:
$\Pr(\mathit{Class} = c \mid A_1 = t_1, \ldots, A_n = t_n) \;\propto\; \frac{P_c}{|T|} \prod_{i=1}^{n} \frac{N_{t_i,c}}{P_c}$
• The NBC’s prediction is the class $c$ that maximizes $X_c = P_c \prod_{i=1}^{n} N_{t_i,c} / P_c$, i.e., $c$ is predicted over $c'$ whenever $X_c \ge X_{c'}$, where $X_{c'} = P_{c'} \prod_{i=1}^{n} N_{t_i,c'} / P_{c'}$.
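In code, the prediction is simply an argmax of $X_c$ over the published counts. The counts below are made up for illustration (the $1/|T|$ factor is the same for every class, so it can be dropped from the comparison):

```python
# Hypothetical published counts, keyed by (attribute index, value, class).
P = {"70K": 5, "50K": 2}
N = {(0, "P", "70K"): 1, (0, "P", "50K"): 1,
     (0, "W", "70K"): 4, (0, "W", "50K"): 1,
     (1, "30", "70K"): 1, (1, "30", "50K"): 1,
     (1, "40", "70K"): 4, (1, "40", "50K"): 1}

def nbc_predict(t):
    # X_c = P_c * prod_i N_{t_i,c} / P_c
    def score(c):
        x = P[c]
        for i, ti in enumerate(t):
            x *= N.get((i, ti, c), 0) / P[c]
        return x
    return max(P, key=score)

print(nbc_predict(("P", "30")))  # 50K
```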
10
Privacy Breach for Views
• Prior and posterior knowledge
Quasi-identifier: $I = \{A_1, A_3\}$, with value $I_0 = t[A_1, A_3]$
Family of all table instances: $D = \{d \mid \exists t \in d,\; t.I = I_0\}$
All instances satisfying the given views: $S = \{d \in D \mid V(d) = V_0\}$
Prior belief: $P_1^{c,I_0} = \sum_{d \in D} \Pr(t.C = c \mid t.I = I_0,\, d)\,\Pr(d)$
Posterior belief: $P_2^{c,I_0} = \sum_{d \in S} \Pr(t.C = c \mid t.I = I_0,\, d)\,\Pr(d \mid V(d) = V_0) = \dfrac{\sum_{d \in S} \Pr(t.C = c \mid t.I = I_0,\, d)\,\Pr(d)}{\sum_{d \in S} \Pr(d)}$
11
Privacy Breach for Views
• For a given table T, publishing V(T) = V0 causes a privacy breach with respect to a pair of given constants $0 < L_1 < L_2 < 1$, if either of the following holds:
$P_1^{c,I_0} < L_1 \;\wedge\; L_2 \le P_2^{c,I_0}$
or
$P_2^{c,I_0} \le 1 - L_2 \;\wedge\; 1 - L_1 \le P_1^{c,I_0}$
• For example, 0.5-to-0.8 does not satisfy the privacy requirement $L_1 = 0.51$ and $L_2 = 0.8$, but 0.5-to-0.78 does.
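The breach test can be sketched as a small predicate; note that reproducing the 0.5-to-0.8 example requires the comparison against L2 to be inclusive:

```python
def is_breach(p1, p2, l1, l2):
    # Upward breach: prior below L1, posterior at or above L2.
    # Downward breach: the symmetric case on the complements.
    assert 0 < l1 < l2 < 1
    return (p1 < l1 and l2 <= p2) or (p2 <= 1 - l2 and 1 - l1 <= p1)

print(is_breach(0.5, 0.80, 0.51, 0.8))  # True  (0.5-to-0.8 breaches)
print(is_breach(0.5, 0.78, 0.51, 0.8))  # False (0.5-to-0.78 is safe)
```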
12
Privacy Breach for Views
• Assume a uniform distribution over database instances and a uniform distribution over class values. Then:
$P_1^{c,I_0} = \sum_{d \in D} \Pr(t.C = c \mid t.I = I_0,\, d)\,\Pr(d) = \dfrac{1}{|C|}$
$P_2^{c,I_0} = \sum_{d \in S} \alpha_c^d\,\Pr(d \mid V(d) = V_0) = \dfrac{1}{|S|} \sum_{d \in S} \alpha_c^d$, where $\alpha_c^d = \Pr(t.C = c \mid t.I = I_0,\, d)$
13
Privacy Breach for Views
• Let I0 be the value of a given quasi-identifier I, and let V0 be the value of a given view V(T). If there exist some $m_1, m_2 > 0$ such that for all $c \in C$:
$\dfrac{m_1}{|C|} \;\le\; \dfrac{1}{|S|} \sum_{d \in S} \alpha_c^d \;\le\; \dfrac{m_2}{|C|}$
then for any c and any pair of $L_1, L_2 > 0$, publishing V0 will not cause any privacy breaches w.r.t. L1 and L2, provided that the following amplification criterion holds:
$\dfrac{m_2}{m_1} \;\le\; \dfrac{L_2\,(1 - L_1)}{L_1\,(1 - L_2)}$
14
Privacy Breach for Views
• For a given quasi-identifier I = I0, a given view V(T) = V0 is safe to publish against any L1-to-L2 privacy breaches, if there exists $\gamma \ge 1$ such that the following conditions hold:
$\dfrac{\gamma\,\bigl(\gamma\,(|C| - 1) + 1\bigr)}{\gamma + |C| - 1} \;\le\; \dfrac{L_2\,(1 - L_1)}{L_1\,(1 - L_2)}$
and for all $c, c' \in C$:
$\dfrac{\sum_{d \in S} \alpha_c^d}{\sum_{d \in S} \alpha_{c'}^d} \;\le\; \gamma$
• Select the largest possible $\gamma$
• For a given $\gamma$, recast the privacy goal as that of checking/enforcing the second condition
15
Privacy Breach for Views
• With respect to a given I0 as the value of a quasi-identifier I, and a given amplification ratio $\gamma$, the viewset (P, N) is safe to publish, if for all $c, c' \in C$, $1 \le i \le n$, and $t_i \in A_i$, the following conditions hold:
$0 < \dfrac{N_{t_i,c}}{N_{t_i,c'}} \le \gamma^{1/(2|I|)}$ and $0 < \dfrac{P_c}{P_{c'}} \le \gamma^{1/(2|I|)}$
16
Privacy Breach for Views
• Two observations
– All quasi-identifiers that have the same cardinality (i.e., number of attributes) can be blocked at the same time, since the conditions are functions of |I|, and not of I or I0.
– All privacy breaches for all quasi-identifiers of any cardinality can be blocked by simply blocking the one with the largest cardinality, namely n, because for $\gamma \ge 1$ and $|I| \le n$:
$\gamma^{1/(2n)} \le \gamma^{1/(2|I|)}$
17
Privacy Breach for Views
• With respect to a given amplification ratio $\gamma$, the viewset (P, N) is safe to publish, if for all $c, c' \in C$, $1 \le i \le n$, and $t_i \in A_i$, the following conditions hold:
$0 < \dfrac{N_{t_i,c}}{N_{t_i,c'}} \le \gamma^{1/(2n)}$ and $0 < \dfrac{P_c}{P_{c'}} \le \gamma^{1/(2n)}$
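A sketch of this check, under one plausible reading of the condition (every cross-class ratio of matching counts bounded by $\gamma^{1/(2n)}$; the dict layout keyed by class and by (i, t_i, class) is an assumption of this sketch):

```python
def viewset_is_safe(P, N, domains, gamma, n):
    # Every ratio must lie in (0, gamma**(1/(2*n))]; checking all
    # ordered pairs (c, c') enforces the symmetric lower bound too.
    bound = gamma ** (1.0 / (2 * n))
    for c in P:
        for c2 in P:
            if c == c2:
                continue
            if not 0 < P[c] / P[c2] <= bound:
                return False
            for i, domain in domains.items():
                for ti in domain:
                    num, den = N.get((i, ti, c), 0), N.get((i, ti, c2), 0)
                    if den == 0 or not 0 < num / den <= bound:
                        return False
    return True

balanced = {(0, "x", "a"): 2, (0, "x", "b"): 2,
            (0, "y", "a"): 2, (0, "y", "b"): 2}
print(viewset_is_safe({"a": 4, "b": 4}, balanced, {0: ["x", "y"]}, 2.0, 1))  # True
```

A skewed viewset (say, one count raised to 8) fails the same check, since its ratio 4 exceeds the bound.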
18
Transformation from unsafe views to safe views
• NBC-Equivalence
• Let f and f’ be two functions that map each element of $\bigl(\bigcup_{i} A_i\bigr) \times C$ to a non-negative real number. We call f and f’ NBC-equivalent if the classifiers they induce always agree, i.e., for all $c, c' \in C$ with $c \ne c'$ and every tuple $t = (t_1, \ldots, t_n)$:
$\prod_{i} f(t_i, c) \ge \prod_{i} f(t_i, c') \;\Longleftrightarrow\; \prod_{i} f'(t_i, c) \ge \prod_{i} f'(t_i, c')$
19
Transformation from unsafe views to safe views
• Transformation algorithm
– Input: the given viewset V, consisting of the counts $N_{t_i,c}$ and $P_c$; amplification ratio $\gamma$
– Description:
• Step 1: Replace all counts $N_{t_i,c}$ that are 0 with non-zero values
• Step 2: Scale down all $N_{t_i,c}$ to new rational numbers that satisfy the given amplification ratio
• Step 3: Adjust the numbers such that $\sum_{t_i} N_{t_i,c} = P_c$ again
• Step 4: Normalize the numbers or turn them into integers
– Output: the transformed viewset V
(1) Raising all the counts to the same power does not change the classification;
(2) In other words, the set of NBC-equivalent viewsets is closed under exponentiation.
Example: 10 > 4 and $10^2 = 100 > 4^2 = 16$; squaring preserves the ordering even though the gaps change (100 − 16 vs. 10 − 4).
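A loose sketch of the four steps (the smoothing value, the single global exponent, and the per-class rescaling are ad-hoc illustrative choices, not the paper's exact procedure):

```python
import math
from collections import defaultdict

def make_safe(N, P, gamma, n):
    # Step 1: replace zero counts with a small non-zero value.
    N = {k: (float(v) if v > 0 else 0.5) for k, v in N.items()}

    # Step 2: scale the counts down by raising them all to a common
    # power q <= 1, chosen so the worst cross-class ratio fits within
    # gamma**(1/(2*n)). Exponentiation preserves the classification
    # (NBC-equivalent viewsets are closed under exponentiation).
    groups = defaultdict(list)
    for (i, ti, c), v in N.items():
        groups[(i, ti)].append(v)
    bound = gamma ** (1.0 / (2 * n))
    worst = max(max(vs) / min(vs) for vs in groups.values())
    q = 1.0 if worst <= bound else math.log(bound) / math.log(worst)
    N = {k: v ** q for k, v in N.items()}

    # Step 3: rescale so that, for each attribute i and class c, the
    # counts again sum to P_c.
    for c, Pc in P.items():
        for i in range(n):
            s = sum(v for (j, ti, c2), v in N.items() if j == i and c2 == c)
            for k in [k for k in N if k[0] == i and k[2] == c]:
                N[k] *= Pc / s
    return N  # Step 4 (normalizing / turning into integers) is omitted.

out = make_safe({(0, "x", "a"): 3, (0, "y", "a"): 0,
                 (0, "x", "b"): 1, (0, "y", "b"): 1},
                {"a": 3, "b": 2}, 2.0, 1)
print(sorted(out))
```

Note that the naive rescaling in step 3 can perturb the ratios established in step 2; keeping both properties at once is exactly what the full algorithm has to manage.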
20
Extension for arbitrary prior distributions
• See a tiny example
21
Experiments
• Adult dataset containing 32,561 tuples
• The attributes used were Age, Years of education, Work hours per week, and Salary.
• An NBC trained on the k-anonymous data vs. an NBC trained on the output of the Safety Views Transformation
22
Conclusion
• Reformulated privacy breach for view publishing
• Presented sufficient conditions that are easy to check/enforce
• Provided algorithms that guarantee the privacy of the individuals who provided the training data, and incur zero accuracy loss in terms of building an NBC.