Distributing Data for Secure Data Services
Vignesh Ganapathy, Dilys Thomas, Tomas Feder,
Hector Garcia Molina, Rajeev Motwani
April 8th, 2011
Stanford, TRDDC, TRUST
RoadMap
Motivation for Secure Databases
Column level distribution
Encryption, Distribution
Privacy constraints
Set cover initialization
Query Mediation
Cost estimation
Where and Select clause processing
Query decomposition
Experiments
Related Work
Motivation 1: Data Privacy in Enterprises
Health: Personal medical details, Disease history, Clinical research data
Banking: Bank statement, Loan details, Transaction history
Finance: Portfolio information, Credit history, Transaction records, Investment details
Insurance: Claims records, Accident history, Policy details
Outsourcing: Customer data for testing, Remote DB Administration, BPO & KPO
Retail Business: Inventory records, Individual credit card details, Audits
Manufacturing: Process details, Blueprints, Production data
Govt. Agencies: Census records, Economic surveys, Hospital Records
Motivation 2: Government Regulations
Country: Privacy Legislation
Australia: Privacy Amendment Act of 2000
European Union: Personal Data Protection Directive 1998
Hong Kong: Personal Data (Privacy) Ordinance of 1995
United Kingdom: Data Protection Act of 1998
United States: Security Breach Information Act (S.B. 1386) of 2002; Gramm-Leach-Bliley Act of 1999; Health Insurance Portability and Accountability Act of 1996
Motivation 3: Personal Information
• Emails
• Searches on Google/Yahoo
• Profiles on social networking sites
• Passwords / credit card / personal information at multiple e-commerce sites / organizations
• Documents on the computer / network
Losses due to Lack of Privacy: ID-Theft
• 3% of households in the US affected by ID-Theft
• US $5-50B losses/year
• UK £1.7B losses/year
• AUS $1-4B losses/year
Data Privacy
• Value disclosure: what is the value of attribute salary of person X?
– Perturbation
– Privacy Preserving OLAP
• Identity disclosure: is an individual present in the database table?
– Randomization, K-Anonymity, etc.
– Data for Outsourcing / Research
• Linkage disclosure: linking columns from multiple sites
Masketeer: A tool for data privacy
Lodha, Patwardhan, Roy, Sundaram et al.
Two Can Keep a Secret: A Distributed Architecture for Secure Database Services
Aggarwal, Bawa, Ganesan, Garcia-Molina, Kenthapadi,
Motwani, Srivastava, Thomas, Xu
CIDR 2005
How to distribute data across multiple sites for (1) redundancy and (2) privacy, so that a single site being compromised does not lead to data loss
Motivation
• Data outsourcing growing in popularity
– Cheap, reliable data storage and management
    • 1TB $399 < $0.5 per GB
    • $5000 – Oracle 10g / SQL Server
    • $68k/year DBAdmin
• Privacy concerns looming ever larger
– High-profile thefts (often insiders)
    • UCLA lost 900k records
    • Berkeley lost laptop with sensitive information
    • Acxiom, JP Morgan, Choicepoint
    • www.privacyrights.org
Present solutions
Application level: Salesforce.com
On-Demand Customer Relationship Management
$65/User/Month ---- $995 / 5 Users / 1 Year
Amazon Elastic Compute Cloud
1 instance = 1.7 GHz x86 processor, 1.75 GB RAM, 160 GB local disk, 250 Mb/s network bandwidth
Elastic, Completely controlled, Reliable, Secure
$0.10 per instance hour
$0.20 per GB of data in/out of Amazon
$0.15 per GB-Month of Amazon S3 storage used
Google Apps for your domain
Small businesses, Enterprise, School, Family or Group
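Encryption Based Solution
[Figure: the Client encrypts its data and stores it at the DSP. A Client-side Processor rewrites Query Q into Q’ for the DSP, receives the “Relevant Data”, and computes the Answer.]
Problem: Q’ is effectively “SELECT *”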
The Power of Two
[Figure: the Client stores its data across two providers, DSP1 and DSP2. A Client-side Processor splits Query Q into Q1 for DSP1 and Q2 for DSP2.]
Key: ensure Cost(Q1) + Cost(Q2) is comparable to Cost(Q)
SB1386 Privacy
{ Name, SSN},
{ Name, LicenceNo}
{ Name, CaliforniaID}
{ Name, AccountNumber}
{ Name, CreditCardNo, SecurityCode}
are all to be kept private.
A set is private if at least one of its elements is “hidden”.
Element in encrypted form ok
Techniques
• Vertical Fragmentation
– Partition attributes across R1 and R2
– E.g., to obey constraint {Name, SSN}: R1 ← Name, R2 ← SSN
– Use tuple IDs for reassembly: R = R1 JOIN R2
• Encoding
– One-time Pad: for each value v, construct a random bit sequence r; R1 ← v XOR r, R2 ← r
– Deterministic Encryption: R1 ← E_K(v), R2 ← K. Can detect equality and push selections with an equality predicate
– Random addition: R1 ← v + r, R2 ← r. Can push aggregate SUM
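The one-time pad and random addition encodings above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the sample salaries, and the choice to do the random addition modulo 2^64 are all assumptions made for the example.

```python
import secrets

# Assumption: numeric values fit in 64 bits, so shares are taken mod 2**64.
MODULUS = 2**64


def otp_split(value: bytes):
    """One-time pad: R1 stores v XOR r, R2 stores the random pad r."""
    pad = secrets.token_bytes(len(value))
    share1 = bytes(a ^ b for a, b in zip(value, pad))
    return share1, pad


def otp_join(share1: bytes, pad: bytes) -> bytes:
    """Client-side reassembly: (v XOR r) XOR r = v."""
    return bytes(a ^ b for a, b in zip(share1, pad))


def add_split(value: int):
    """Random addition: R1 stores v + r, R2 stores r."""
    r = secrets.randbelow(MODULUS)
    return (value + r) % MODULUS, r


# SUM can be pushed to both servers and combined at the client.
salaries = [52000, 61000, 75000]  # hypothetical data
shares = [add_split(s) for s in salaries]
sum_at_r1 = sum(s1 for s1, _ in shares) % MODULUS  # computed by server 1
sum_at_r2 = sum(r for _, r in shares) % MODULUS    # computed by server 2
total = (sum_at_r1 - sum_at_r2) % MODULUS          # combined by the client

recovered = otp_join(*otp_split(b"Tom"))
```

Neither server alone learns anything about a value from its share; only the client, holding both shares, recovers `v` or the true SUM.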
Example
An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode}
Privacy Constraints
{Telephone}, {Email}
{Name, Salary}, {Name, Position}, {Name, DoB}
{DoB, Gender, ZipCode}
{Position, Salary}, {Salary, DoB}
Will use just Vertical Fragmentation and Encoding.
Decomposed Schema
R1:{TID, Name, Email, Telephone, Gender, Salary}
R2:{TID, Name, Email, Telephone, DoB, Position, ZipCode}
Encrypted Attributes E: {Telephone, Email, Name}
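The decomposition above can be checked mechanically against the privacy constraints. The sketch below assumes that "hidden" means the attribute is either absent from a server or stored there only in encrypted form; the function name `exposed` is hypothetical.

```python
# Privacy constraints, schema, and encrypted attributes from the example.
CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Name", "Email", "Telephone", "DoB", "Position", "ZipCode"}
ENCRYPTED = {"Telephone", "Email", "Name"}


def exposed(constraints, stored, encrypted):
    """Return the constraints whose attributes are ALL visible in plaintext
    at one server (assumption: encrypted attributes count as hidden)."""
    visible = stored - encrypted
    return [c for c in constraints if c <= visible]


violations_r1 = exposed(CONSTRAINTS, R1, ENCRYPTED)
violations_r2 = exposed(CONSTRAINTS, R2, ENCRYPTED)

# Counter-example: also storing Salary in plaintext at R2 would break
# {Position, Salary} and {Salary, DoB}.
violations_bad = exposed(CONSTRAINTS, R2 | {"Salary"}, ENCRYPTED)
```

Both `violations_r1` and `violations_r2` come out empty, so no single server sees a full constraint set in plaintext.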
Partitioning, Execution
• Partitioning Problem
– Partition to minimize communication cost for given workload
– Even simplified version hard to approximate
– Hill Climbing algorithm after starting with weighted set cover
• Query Reformulation and Execution
– Consider only centralized plans
– Algorithm to partition select and where clause predicates between the two partitions
Set Cover + Greedy for partitioning
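The figure for this slide is not preserved in the transcript, so the following is only a generic sketch of greedy weighted set cover, the kind of routine that could seed the hill-climbing search. The mapping used here is an assumption, not taken from the slides: each privacy constraint is an element to cover, "encrypt attribute A" covers every constraint containing A, and the weights are made-up encryption costs.

```python
def greedy_weighted_set_cover(universe, candidates, weights):
    """Repeatedly pick the candidate with the lowest cost per newly covered
    element. candidates: name -> set of elements; weights: name -> cost > 0."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, None
        for name, covers in candidates.items():
            gain = len(covers & uncovered)
            if gain == 0:
                continue
            ratio = weights[name] / gain
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("some elements cannot be covered")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Constraints from the Employee example; weights are hypothetical.
CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
WEIGHTS = {"Telephone": 1, "Email": 1, "Name": 2, "Salary": 5,
           "Position": 3, "DoB": 3, "Gender": 1, "ZipCode": 1}

universe = set(range(len(CONSTRAINTS)))
attributes = sorted(set().union(*CONSTRAINTS))
candidates = {a: {i for i, c in enumerate(CONSTRAINTS) if a in c}
              for a in attributes}

to_encrypt = greedy_weighted_set_cover(universe, candidates, WEIGHTS)
```

Encrypting every attribute in `to_encrypt` hides at least one element of each constraint, giving a valid (if not cheapest) starting point that a local search could then improve.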
Cost Estimation
State Definitions
• 0: condition clause cannot be pushed to either server
• 1: condition clause can be pushed to Server 1
• 2: condition clause can be pushed to Server 2
• 3: condition clause can be pushed to both servers
• 4: condition clause can be pushed to either server
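As a rough illustration of the state definitions, a single predicate can be classified by which server stores all the attributes it mentions. This is a sketch under assumptions: the function name and the rule itself are hypothetical, encrypted attributes are assumed to still support the predicate (for example, equality under deterministic encryption), and state 3 is not produced here because it presumably arises only when clauses are combined by the AND/OR tables, which are not reproduced in this transcript.

```python
# State codes as defined on the slide.
NONE, SERVER1, SERVER2, BOTH, EITHER = 0, 1, 2, 3, 4

# Schema from the running example.
R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Name", "Email", "Telephone", "DoB", "Position", "Zipcode"}


def leaf_state(pred_attrs, r1_attrs=R1, r2_attrs=R2):
    """Classify one predicate by where all of its attributes are stored
    (assumed rule; combining states via AND/OR is not modelled)."""
    on1 = pred_attrs <= r1_attrs
    on2 = pred_attrs <= r2_attrs
    if on1 and on2:
        return EITHER
    if on1:
        return SERVER1
    if on2:
        return SERVER2
    return NONE


states = {
    "Name = 'Tom'": leaf_state({"Name"}),
    "Position = 'Staff'": leaf_state({"Position"}),
    "Zipcode = '94305'": leaf_state({"Zipcode"}),
    "Salary > 60000": leaf_state({"Salary"}),
    "Salary > DoB (cross-server)": leaf_state({"Salary", "DoB"}),
}
```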
OR State Evaluation
AND State Evaluation
Query Partitioning
Original Query
SELECT Name, DoB, Salary
FROM R
WHERE (Name = 'Tom' AND Position = 'Staff') AND
      (Zipcode = '94305' OR Salary > 60000)
R1:{TID, Name, Email, Telephone, Gender, Salary}
R2:{TID, Name, Email, Telephone, DoB, Position, Zipcode}
• Query 1:
SELECT TID, Name, Salary
FROM R1
WHERE Name = 'Tom'
• Query 2:
SELECT TID, DoB, Zipcode
FROM R2
WHERE Position = 'Staff'
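To make the partitioned execution concrete, the sketch below runs Query 1 and Query 2 against two separate in-memory SQLite databases standing in for the two servers, then joins on TID and applies the remaining predicate at the client. The sample rows are invented, the tables omit Email, Telephone, and Gender for brevity, and encryption of Name is ignored; all of these are simplifying assumptions.

```python
import sqlite3

# Two independent connections stand in for the two service providers.
server1 = sqlite3.connect(":memory:")
server2 = sqlite3.connect(":memory:")

server1.executescript("""
CREATE TABLE R1 (TID INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER);
INSERT INTO R1 VALUES (1, 'Tom', 70000), (2, 'Tom', 50000),
                      (3, 'Ann', 90000), (4, 'Tom', 40000),
                      (5, 'Tom', 30000);
""")
server2.executescript("""
CREATE TABLE R2 (TID INTEGER PRIMARY KEY, DoB TEXT, Position TEXT, Zipcode TEXT);
INSERT INTO R2 VALUES (1, '1980-01-01', 'Staff', '10001'),
                      (2, '1975-05-05', 'Staff', '94305'),
                      (3, '1990-09-09', 'Staff', '94305'),
                      (4, '1985-03-03', 'Manager', '94305'),
                      (5, '1988-08-08', 'Staff', '10001');
""")

# Pushed-down sub-queries, as on the slide.
q1 = server1.execute(
    "SELECT TID, Name, Salary FROM R1 WHERE Name = 'Tom'").fetchall()
q2 = server2.execute(
    "SELECT TID, DoB, Zipcode FROM R2 WHERE Position = 'Staff'").fetchall()

# Client side: join on TID, then apply the predicate that spans both
# fragments (Zipcode = '94305' OR Salary > 60000), then project.
right = {tid: (dob, zipcode) for tid, dob, zipcode in q2}
result = []
for tid, name, salary in sorted(q1):
    if tid not in right:
        continue
    dob, zipcode = right[tid]
    if zipcode == "94305" or salary > 60000:
        result.append((name, dob, salary))
```

The OR clause mixes Zipcode (held by R2) with Salary (held by R1), so neither server can evaluate it alone; that is why it is left for the client after the join.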
Distributed Query Plan
Performance Gain Experiment
Iterations Vs Privacy Constraints
Acknowledgements: Collaborators
Stanford Privacy Group
TRDDC Privacy Group
PORTIA, TRUST, Google
March 18, 2011
Back Up slides