Sovereign Information Sharing, Searching and Mining
Rakesh Agrawal
IBM Almaden Research Center
Thesis
Organizational boundaries are blurring in the emerging networked economy
– Compete and co-operate simultaneously
– Int’l value chain
Need to rethink information sharing, searching, and mining in the brave new world of virtual organizations
Sovereign Information Sharing
Separate databases due to statutory, competitive, or security reasons.
Selective, minimal sharing on need-to-know basis.
Example: Among those who took a particular drug, how many had adverse reaction and their DNA contains a specific sequence?
Researchers must not learn anything beyond counts.
Commutative Encryption: E1(E2(T)) = E2(E1(T))
Minimal Necessary Sharing
R = {a, u, v, x}, S = {b, u, v, y}, R ∩ S = {u, v}
R must not know that S has b & y
S must not know that R has a & x
Count (R ∩ S): R & S do not learn anything except that the result is 2.
SIGMOD 00
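The slide states only the commutative property E1(E2(T)) = E2(E1(T)) and the desired outcome (both sides learn the count 2 and nothing else). As a rough illustration of how such a property can yield an intersection count, here is a minimal sketch that assumes modular exponentiation as the commutative cipher. The modulus, the hash-to-group step, and every function name are assumptions made for this example, not details from the talk, and the parameters are toy-sized rather than secure.

```python
import hashlib
import math
import secrets

# Assumption: a toy prime modulus (the Mersenne prime 2**127 - 1).
# A real deployment would need a much larger, carefully chosen group.
P = 2**127 - 1


def hash_to_group(value: str) -> int:
    """Map a value to an integer in [1, P-1] via SHA-256 (illustrative choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen() -> int:
    """Pick a secret exponent coprime to P-1, so encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # k in [2, P-2]
        if math.gcd(k, P - 1) == 1:
            return k


def encrypt(key: int, x: int) -> int:
    """E_key(x) = x**key mod P; commutative because (x**a)**b == (x**b)**a."""
    return pow(x, key, P)


def intersection_size(r_values, s_values) -> int:
    """Simulate both parties in one process; each key stays with its owner."""
    k_r, k_s = keygen(), keygen()
    # Each party encrypts its own hashed values with its own key.
    r_once = [encrypt(k_r, hash_to_group(v)) for v in r_values]
    s_once = [encrypt(k_s, hash_to_group(v)) for v in s_values]
    # The lists are exchanged and each party applies its key to the other's list.
    r_twice = {encrypt(k_s, c) for c in r_once}
    s_twice = {encrypt(k_r, c) for c in s_once}
    # Equal plaintexts give equal double encryptions, so only the count leaks.
    return len(r_twice & s_twice)


print(intersection_size(["a", "u", "v", "x"], ["b", "u", "v", "y"]))
```

In an actual two-party run each side would also shuffle its list before sending it, so that positions do not reveal which values matched; the single-process simulation above omits that step.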
Privacy Preserving Data Mining
[Figure: histogram of the Original, Randomized, and Reconstructed distributions; horizontal axis 2 to 82, vertical axis 0 to 1200]
[Figure: data flow. Original records (50 | 40K | ..., 30 | 70K | ...) pass through a Randomizer to give randomized records (65 | 20K | ..., 25 | 60K | ...), e.g. age 30+35 = 65. The distributions of Age and Salary are reconstructed and fed to Data Mining Algorithms, which produce the Data Mining Model. Labels: Alice’s age, Alice’s salary, Bob’s age]
[Figure: Original, Randomized, and Reconstructed plotted against Randomization Level (10 to 200); vertical axis 0 to 120]
Insight: Preserve privacy at the individual level, while still building accurate data mining models at the aggregate level.
Add random noise to individual values to protect privacy.
EM algorithm to estimate original distribution of values given randomized values + randomization function.
Algorithms for building classification models and discovering association rules on top of privacy-preserved data with only small loss of accuracy.
SIGMOD 00
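The randomize-then-reconstruct idea above can be sketched in a few lines. The slide names an EM algorithm but gives no details, so the following is only one plausible reading: it assumes uniform noise of known width, a small discrete set of candidate original values, and an iterative Bayes-style update. The function names and parameters are hypothetical.

```python
import random


def randomize(values, noise_width, rng):
    """Add independent uniform noise in [-noise_width, +noise_width] to each value."""
    return [v + rng.uniform(-noise_width, noise_width) for v in values]


def reconstruct(randomized, bin_centers, noise_width, iterations=50):
    """Estimate the original distribution over bin_centers from randomized values.

    Assumption: originals lie on bin_centers and the noise is uniform, so the
    noise density is constant inside the window and zero outside it.
    """
    n_bins = len(bin_centers)
    probs = [1.0 / n_bins] * n_bins  # start from a uniform guess

    def noise_density(diff):
        return 1.0 / (2.0 * noise_width) if abs(diff) <= noise_width else 0.0

    for _ in range(iterations):
        new_probs = [0.0] * n_bins
        for w in randomized:
            # Posterior weight of each candidate original value given w.
            weights = [noise_density(w - a) * p for a, p in zip(bin_centers, probs)]
            total = sum(weights)
            if total == 0.0:
                continue  # w is not explainable by any bin; skip it
            for j in range(n_bins):
                new_probs[j] += weights[j] / total
        norm = sum(new_probs)
        probs = [p / norm for p in new_probs]
    return probs


rng = random.Random(0)
ages = [rng.choice([20, 30, 30, 30, 40, 50]) for _ in range(500)]
noisy_ages = randomize(ages, 25.0, rng)
estimate = reconstruct(noisy_ages, [20, 30, 40, 50], 25.0, iterations=20)
print([round(p, 3) for p in estimate])
```

Only the noisy values and the noise distribution enter `reconstruct`; the individual original ages are never consulted, which is the point of the approach.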
Finessing Schema Chaos
[Figure: two scatter plots, both axes 0 to 50]
[Figure: chart with labels Query Size (1, 2, 3, 4, 5, 7), Accuracy, Non-Reflectivity, and Randomized Non-Reflectivity; axis 0 to 100]
Use a simple regular expression extractor to get numbers
Do simple data extraction to get hints
– Hint for unit: the word following the number.
– Hint for attribute name: the k words following the number.
– Example: 256 MB SDRAM memory
– Unit Hint: MB
– Attribute Hint: SDRAM, Memory
Use only numbers in the queries
Treat any attribute name in the query also as hint
Reflectivity estimates accuracy
WWW 03
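The hint extraction described on this slide might look roughly like the sketch below. It assumes that the unit hint is the first word after a number and that the attribute hints are the next k words after the unit, which matches the "256 MB SDRAM memory" example. The regular expressions, the `NumberHint` type, and the default k = 2 are illustrative choices, not taken from the talk.

```python
import re
from typing import List, NamedTuple, Optional


class NumberHint(NamedTuple):
    number: float
    unit_hint: Optional[str]
    attribute_hints: List[str]


# Assumption: a token is either a plain decimal number or a word starting with a letter.
NUMBER = re.compile(r"\d+(?:\.\d+)?")
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z][A-Za-z0-9-]*")


def extract_hints(text: str, k: int = 2) -> List[NumberHint]:
    """For each number, collect the following word as unit hint and k more as attribute hints."""
    tokens = TOKEN.findall(text)
    hints = []
    for i, tok in enumerate(tokens):
        if not NUMBER.fullmatch(tok):
            continue
        words = []
        for nxt in tokens[i + 1:]:
            # Stop at the next number or once unit + k attribute words are collected.
            if NUMBER.fullmatch(nxt) or len(words) == k + 1:
                break
            words.append(nxt)
        unit = words[0] if words else None
        hints.append(NumberHint(float(tok), unit, words[1:]))
    return hints


print(extract_hints("256 MB SDRAM memory"))
```

Stopping at the next number keeps the hints of adjacent specifications (for example "800 MHz processor with 40 GB disk") from bleeding into each other.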
Privacy Preserving Indexing
A public mapping function that maps a query to a set of providers P that may contain the desired document
P contains false positives (providers that do not hold the desired document)
Providers return a document only if the searcher is authorized to access the document
VLDB 03
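The two-stage lookup on this slide (a public mapping to candidate providers, then an access check at each provider) can be illustrated with a small sketch. Everything here is hypothetical: the `Provider` class, the term-based index, and the access-control lists are stand-ins chosen for the example, and the way a real index would be built so that it reveals little is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Provider:
    name: str
    documents: Dict[str, Set[str]]  # document id -> terms it contains
    acl: Dict[str, Set[str]]        # document id -> users allowed to read it

    def search(self, user: str, term: str) -> List[str]:
        """Return matching documents, but only those the user is authorized to access."""
        return sorted(
            doc for doc, terms in self.documents.items()
            if term in terms and user in self.acl.get(doc, set())
        )


def candidate_providers(index: Dict[str, Set[str]], term: str) -> Set[str]:
    """Public mapping: providers that MAY hold a match; some may hold nothing."""
    return index.get(term, set())


def search(index, providers, user: str, term: str) -> List[Tuple[str, str]]:
    """Ask every candidate provider; each one enforces its own access control."""
    results = []
    for name in sorted(candidate_providers(index, term)):
        results.extend((name, doc) for doc in providers[name].search(user, term))
    return results


providers = {
    "p1": Provider("p1", {"d1": {"merger", "plan"}}, {"d1": {"alice"}}),
    "p2": Provider("p2", {"d2": {"budget"}}, {"d2": {"alice", "bob"}}),
}
# The public index lists p2 for "merger" even though p2 holds no such document.
index = {"merger": {"p1", "p2"}, "budget": {"p2"}}
print(search(index, providers, "alice", "merger"))
print(search(index, providers, "bob", "merger"))
```

Because the public index deliberately over-reports, an observer of the index alone cannot tell which listed provider actually holds a matching document.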
Some Interesting Topics
Current integration approaches do not scale
– Information integration per se is not interesting
– Static vs. dynamic plumbing
Incentive compatibility
Auditing interactions