Privacy Preserving Indexing of Documents on the Network Mayank Bawa Roberto J. Bayardo Jr. Rakesh...
Privacy Preserving Indexing of Documents on
the Network
Mayank Bawa, Roberto J. Bayardo Jr., Rakesh Agrawal
Sharing Private Content
• Rapid growth in private and semi-private information on the network
– Experimental results of drug tests
– Drafts of research papers, patents, …
– Architectural CAD documents
• Mechanisms to search information have failed to keep pace
– Public information: Google, Yahoo!
– Private information: ???
Talk Overview
1. Content Privacy issues in sharing access-controlled content
2. Data structure for search on access-controlled content
3. Algorithm for building such a data structure
Provider
• Shares documents
• Enforces access policy
P1
Alzheimer’s Disease (Alice, Bob)
AIDS (Alice)
…
Small-Pox (Alice, Bob, Lisa, …)
[Figure: providers P1, P2, P3, P32, P2026 on the network]
Searcher
• Wants documents that match her keyword query Q
• Has an identity
[Figure: searcher Alice issues Q = “Amyloid Peptide” to providers P1, P2, P3, P32, P2026]
Automating Search
A searcher s issues a query q expecting a set of documents d such that
1. d is shared by some provider p
2. d matches the query q
3. d is accessible to s as dictated by p’s access policy
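The three conditions can be read as a reference semantics for search. The sketch below is a minimal illustration, assuming conjunctive keyword matching and a per-document reader list; the `Document` class and the function names are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    provider: str                                # condition 1: shared by some provider p
    text: str
    readers: set = field(default_factory=set)    # p's access policy for this document


def matches(doc, query):
    """Condition 2 (assumed conjunctive): every query term occurs in the document."""
    words = set(doc.text.lower().split())
    return all(term in words for term in query.lower().split())


def search(searcher, query, documents):
    """Return the documents that satisfy conditions 1-3 for this searcher."""
    return [d for d in documents
            if matches(d, query) and searcher in d.readers]   # condition 3


docs = [
    Document("P1", "amyloid peptide and alzheimer's disease", {"Alice", "Bob"}),
    Document("P1", "aids study", {"Alice"}),
    Document("P2", "amyloid peptide draft", {"Bob"}),
]
alice_hits = search("Alice", "Amyloid Peptide", docs)
```

Here Alice receives only the P1 document she is allowed to read, while a searcher with no access rights receives nothing.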
Content Privacy
An adversary A should not be able to deduce, using the search mechanism, that provider P is sharing document d with keywords q unless A has been granted access to d by P
Soln #1: Document Index
[Figure: Alice sends Q = “Amyloid Peptide” to an inverted index that holds P1’s documents and access policy]
Soln #2: Keyword Index
[Figure: Alice/George send Q = “Amyloid Peptide” to a keyword index built from P1’s keywords; the index answers “P1 has a document with words ‘Amyloid Peptide’”]
Keyword Index
$t_i \to \{p : t_i \in d,\ \mathrm{provider}(d) = p\}$
Example
Amyloid → {…, P1, …}
Peptide → {…, P1, …}
Problem cause: every term is mapped precisely
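To make the leak concrete, here is a small sketch of a precise keyword index, assuming whitespace tokenization and a dictionary from provider to document texts. The helper names and the sample data are hypothetical.

```python
from collections import defaultdict


def build_keyword_index(shared):
    """shared: provider -> list of document texts. Returns term -> set of providers."""
    index = defaultdict(set)
    for provider, texts in shared.items():
        for text in texts:
            for term in text.lower().split():
                index[term].add(provider)
    return index


def lookup(index, query):
    """Providers listed under every term of the query."""
    sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*sets) if sets else set()


shared = {
    "P1": ["amyloid peptide assay"],
    "P2": ["patent draft"],
    "P3": ["peptide synthesis"],
}
index = build_keyword_index(shared)
leak = lookup(index, "Amyloid Peptide")   # exact answer, no access check involved
```

Because every term maps exactly to the providers that hold it, anyone who can query the index learns that P1 shares a document with these words, whether or not P1 has granted them access.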
Soln #2: Keyword Index
Intuition
Add “false positives”
Example
Amyloid → {…, P1, P2, …}
Peptide → {…, P1, P2, …}
Soln #3: Privacy Preserving Index
Soln #3: Privacy Preserving Index (PPI)
[Figure: Alice/George send Q = “Amyloid Peptide” to the privacy preserving index, which lists P1 and P2; the index answers “P1 or P2 may have a document with words ‘Amyloid Peptide’”]
Soln #3: Privacy Preserving Index
Privacy Preserving Index
$t_i \to M \subseteq P$
[A] $M = \emptyset$ only if $\nexists\, d_j : t_i \in d_j$
[B] $M = P_{true} \cup P_{false}$, $|P_{false}| \geq |P_{true}|$
[C] $M = P$
Completeness, Quantifiable Privacy on Reiter-Rubin scale, Loss in Selectivity
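The three cases can be checked mechanically for a single term. The sketch below is an illustration only: it assumes that case [B] requires at least as many false positives as true positives, and the function name `ppi_case` and the sample provider set are hypothetical.

```python
def ppi_case(mapping, true_providers, all_providers):
    """Classify the provider set M of one term as 'A', 'B', 'C', or None if invalid."""
    m, p_true, p_all = set(mapping), set(true_providers), set(all_providers)
    if not m:
        # [A]: an empty answer is allowed only when no provider holds the term
        return "A" if not p_true else None
    if m == p_all:
        return "C"                       # [C]: every provider is listed
    if not p_true <= m:
        return None                      # a true positive is missing: incomplete
    p_false = m - p_true
    # [B]: assumed condition |P_false| >= |P_true|
    return "B" if len(p_false) >= len(p_true) else None


P = {"P1", "P2", "P3", "P32", "P2026"}
case_for_amyloid = ppi_case({"P1", "P2"}, {"P1"}, P)
```

An entry that lists only the true provider is rejected, which is exactly the precise mapping that the plain keyword index produced.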
Consistency of Behavior
1. Results for “Peptide” should tally with results from searches earlier
2. Results for “Amyloid Peptide”, “Amyloid”, and “Peptide” should tally
3. …
Filtering of “noise” impossible
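One way to see why consistency matters: if an index drew fresh false positives on every query, an adversary could repeat the query and intersect the answers. The simulation below is a hypothetical sketch (20 providers, one true positive, three decoys per answer), not the construction from the talk.

```python
import random

PROVIDERS = [f"P{i}" for i in range(1, 21)]
TRUE = {"P1"}                                    # the only real match in this toy setup


def noisy_answer(rng, k=3):
    """True positives plus k randomly drawn decoy providers."""
    decoys = rng.sample([p for p in PROVIDERS if p not in TRUE], k)
    return TRUE | set(decoys)


def filter_noise(answers):
    """Adversary's attack: keep only providers present in every answer."""
    return set.intersection(*answers)


# Inconsistent index: new decoys on each of 50 repeated queries.
rng = random.Random(0)
survivors = filter_noise([noisy_answer(rng) for _ in range(50)])

# Consistent index: decoys drawn once and stored, so repeats reveal nothing new.
stored = noisy_answer(random.Random(1))
consistent = filter_noise([stored for _ in range(50)])
```

With stored decoys the intersection is the stored answer itself, so repeating the query does not help the adversary separate noise from true positives.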
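Step 3: Group (OR) Vector
Error: $0 < \epsilon < 1$
$r \geq \max\left(3,\ -\log\left[1 - \left\{\tfrac{8}{7}(1 - \epsilon)\right\}^{1/(c-1)}\right]\right)$
Theorem: After $r$ rounds, the Group Vector $V_G$ subsumes $\bigvee_{i} V_i$ with prob. $1 - \epsilon$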
Searches
[Figure: providers P2, P1, P3, P32, P2026 partitioned into privacy groups A, F, S; Alice/George send Q = “Amyloid Peptide” to the keyword index (PPI), which answers with Group F]
Intuition: 3. Group Vector
Group Vector is a logical OR => Members are indistinguishable
Privacy ∝ size of group
Search Cost ∝ size of group
Privacy vs Performance Tradeoff
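The tradeoff can be sketched with plain bit vectors. This illustration assumes a fixed shared vocabulary and represents each provider by the set of terms it holds; the function names and the group contents are hypothetical.

```python
def term_vector(terms, vocabulary):
    """Bit vector with a 1 for each vocabulary term the provider holds."""
    return [1 if t in terms else 0 for t in vocabulary]


def group_vector(member_vectors):
    """Bitwise OR of the members' vectors."""
    return [int(any(bits)) for bits in zip(*member_vectors)]


def providers_to_contact(query_terms, groups, vocabulary):
    """Every member of each group whose OR vector has all query terms set.

    groups: group name -> {provider: set of terms}. Query terms are assumed
    to be in the vocabulary.
    """
    contact = []
    for members in groups.values():
        gv = group_vector([term_vector(t, vocabulary) for t in members.values()])
        if all(gv[vocabulary.index(q)] for q in query_terms):
            contact.extend(sorted(members))      # the searcher cannot tell members apart
    return contact


vocabulary = ["amyloid", "peptide", "patent"]
groups = {
    "A": {"P2": {"patent"}, "P32": set()},
    "F": {"P1": {"amyloid", "peptide"}, "P3": {"peptide"}},
}
contacted = providers_to_contact(["amyloid", "peptide"], groups, vocabulary)
```

The searcher must contact both P1 and P3, so a larger group hides P1 among more members but also raises the number of providers queried.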
Evaluation Procedure
• YouServ: Personal web-server deployed within IBM corporate intranet since 2001
• Content from 324 YouServ web-servers
• Partitioned into privacy groups of size C
• Query Set consisting of 100 queries chosen randomly from YouServ query logs
Summary
• Searches on access-controlled data
– Privacy Preserving Indexes
– Randomized Construction
• Project Home
– Google: Stanford Peers
– Google: IBM YouServ
Comments & Questions
• Google: Stanford Peers
– http://www-db.stanford.edu/peers
• Google: IBM YouServ
– http://almaden.ibm.com/cs/people/bayardo/userv
Growing Privacy Concerns
• Popular Press
– Economist: The End of Privacy (’99)
– Time: The Death of Privacy (’97)
• Govt. Directives/Commissions
– European Union Directive on Privacy Protection (’98)
– Canadian Personal Information Protection Act (’01)
Context
“The misuse of subpoena process by an adult entertainment company emphasizes the potential for abuse with insufficient privacy protections in the law.”
--- Cindy Cohn (Legal Director, Electronic Frontier Foundation)
Context
“Better support for anonymity and privacy is sorely needed […] amid the RIAA’s campaign to subpoena information about customers.”
--- Wendy Seltzer
(Staff Attorney, Electronic Frontier Foundation)
Growing Privacy Concerns
In 07/2003, the RIAA began filing, at the rate of 75 or more per day, DMCA Section 512(h) subpoenas to force ISPs to identify file sharers.
DMCA 512(h) subpoenas are issued without prior judicial review […and so…] may be used to obtain identity information in cases where there is no copyright infringement.
Growing Privacy Concerns
• Unfair: Walmart/KMart against a customer who posted their prices at a comparison-shopping site
• Errors: RIAA against Prof. Usher at Penn State Dept. of Astronomy & Astrophysics [+ dozen other cases]
• Vested: A person against ISPs to erase record of his past messages
• Others: Against Internet Archive, …
Adversary
Passive (observes sent messages: queries, responses, indexes)
Active (acts deliberately: searcher, provider, indexer)
Global/Local view
Collude/Independent actions
Search Methodology
Privacy Preserving Index
$t_i \to M \subseteq P$
[A] $M = \emptyset$ only if $\nexists\, d_j : t_i \in d_j$
[B] $M = P_{true} \cup P_{false}$, $|P_{false}| \geq |P_{true}|$
[C] $M = P$
Loss in Selectivity: $|P_{false}| / |P_{true}|$ for [B]; at most 2 for [C]
Correctness: No true positives excluded; provider enforces access control
Privacy: All providers equivalent in [A, C]
[Figure: privacy scale marked 0, 1/2, 1, with [B] placed on it]
3. Constructing OR Vector
Group F
For each bit $i$, with $b_i$ and $B_i$ the two bits being compared:
if ($b_i = B_i$) nop
else if ($b_i = 1 \wedge B_i = 0$) $B_i \leftarrow 1$ with prob. $P_{in}$
else if ($b_i = 0 \wedge B_i = 1$) $B_i \leftarrow 0$ with prob. $P_{out}$
Start: $P_{in} = 1/2$, $P_{out} = 1 - P_{in}$
Every round: $P_{out} = 1 - P_{in}$; the accompanying update of $P_{in}$ (a factor of $1/2$ is visible) is not recoverable from the extracted slide
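The bit-flipping step can be simulated as follows. This is a sketch under stated assumptions, not the authors' exact procedure: it takes b_i to be the provider's own bit and B_i the bit of the vector circulating in the group, starts the group vector at all zeros, and assumes a schedule in which P_out starts at 1/2 and halves after every round with P_in = 1 - P_out. The schedule in particular is an assumption, since the slide's update rule did not survive extraction.

```python
import random


def flip_pass(own, group, p_in, p_out, rng):
    """One provider's pass over the circulating group vector."""
    out = list(group)
    for i, (b, B) in enumerate(zip(own, group)):
        if b == 1 and B == 0 and rng.random() < p_in:
            out[i] = 1          # insert the provider's own bit with prob. P_in
        elif b == 0 and B == 1 and rng.random() < p_out:
            out[i] = 0          # erase a bit the provider lacks with prob. P_out
    return out


def randomized_or(vectors, rounds, seed=0):
    """Circulate the group vector through all members for the given number of rounds."""
    rng = random.Random(seed)
    group = [0] * len(vectors[0])   # assumed initial vector
    p_out = 0.5                     # start value: P_in = P_out = 1/2
    for _ in range(rounds):
        p_in = 1.0 - p_out
        for own in vectors:         # pass the vector around the group
            group = flip_pass(own, group, p_in, p_out, rng)
        p_out /= 2                  # ASSUMED schedule: erasures become rarer each round
    return group


members = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
result = randomized_or(members, rounds=10)
```

Under these assumptions a bit that no member holds can never be set, while bits that some member holds are set with growing probability as the erase probability shrinks, so the vector drifts toward the logical OR without any single pass revealing a member's own vector.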
Construction Properties
Completeness: For any query q, the result set Mq contains all providers that share documents matching q
Correctness: The mapping Mq is expected to be a Privacy Preserving Index
Construction Properties
Privacy: Within a privacy group G, an active adversary can only breach its neighbor’s privacy with probability < 0.71 (Possible Innocence)
[Figure: privacy scale marked 0, 1/2, 1]