Privacy Preserving Schema and Data Matching
Scannapieco, Bertino, Figotin and Elmagarmid
Presented by : Vidhi Thapa
INTRODUCTION
Record matching: the process of identifying records that represent the same real-world entity
Can be executed:
within a single source
across sources
Goal: Record matching that preserves privacy of both data and schema
RECORD MATCHING
Record matching involves:
sharing and integrating data
protecting the privacy of data
Two major innovations:
approximate matching
awareness of schema information
EMBEDDING
Embedding: embed records in Euclidean space (method used: SparseMap)
Comparison functions: edit distance
Matching decision rule: classify records as match / non-match
Record matching
EXAMPLE: EDIT DISTANCE
e(“Virginia”, “Vermont”) = 5
Virginia
Verginia
Verminia
Vermonia
Vermonta
Vermont
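The chain above uses five single-character edits (four substitutions and one deletion). A minimal sketch of the standard dynamic-programming computation of edit distance (Levenshtein distance) is given here for illustration; the function name is an assumption and is not taken from the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("Virginia", "Vermont"))  # 5
```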
HYPOTHESIS
Parties P and Q store the records to be matched in the relations RP(A1,…,An) and RQ(B1,…,Bn) respectively.
Two hypotheses:
1. RP and RQ have identical schemas
2. RP and RQ have possible schema-level conflicts
Record matching between RP and RQ
P will know only a set PMatch, consisting of records in RP that match with records in RQ.
Similarly Q will know only the set QMatch.
SECURE DATA MATCHING
Pairs of records are compared by means of a comparison function
A third party is introduced to assure privacy
SparseMap: reference set, metric space
No. of subsets = ⌊log2 N⌋^2
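A minimal sketch of the embedding idea, assuming a Lipschitz-style construction: each coordinate of an embedded object is its distance to the nearest member of one reference subset. The layout of the subsets (⌊log2 N⌋ groups, with sets in group j of size 2^j), the sample pool of strings, and all function names are assumptions made for illustration; SparseMap's distance-approximation heuristic is not implemented here.

```python
import math
import random


def edit_distance(a, b):
    """Compact Levenshtein distance used as the metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def build_reference_sets(pool, rng):
    """Draw floor(log2 N)^2 random subsets of the pool, N = len(pool).
    Assumed layout: floor(log2 N) groups, sets in group j have size 2^j."""
    k = int(math.floor(math.log2(len(pool))))
    return [rng.sample(pool, min(2 ** j, len(pool)))
            for j in range(1, k + 1) for _ in range(k)]


def embed(obj, reference_sets, dist):
    """Coordinate i = distance from obj to its nearest member of set i."""
    return [min(dist(obj, s) for s in ref) for ref in reference_sets]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
pool = ["Virginia", "Vermont", "Nevada", "Nebraska",
        "Montana", "Maine", "Ohio", "Iowa"]  # hypothetical reference strings
refs = build_reference_sets(pool, rng)
vector = embed("Verginia", refs, edit_distance)
print(len(refs), len(vector))  # 9 9
```

With N = 8 the sketch produces 3^2 = 9 subsets, so every string is mapped to a 9-dimensional vector.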
HEURISTIC
Distance Approximation
Input: object o, set Si
Output: approximation of d(o, Si)
Greedy Sampling
Input: m co-ordinates
Output: t <= m most discriminating co-ordinates
DATA MATCHING PROTOCOL
Assume parties P and Q store the records to be matched in the relations RP(A1,…,An) and RQ(B1,…,Bn) respectively
A third-party-based protocol consists of the following three phases:
Phase 1: Setting of the embedding space
Phase 2: Embedding of RP and RQ values
Phase 3: Comparison to decide matching records
Phase 1
Phase 2
ILLUSTRATION: Stress
E.g., Academic (8.0, 5.0, 7.0, 7.0) and usefull (6.0, 6.0, 6.0, 7.0)
Using 1st co-ordinate: 0.5625
Using 2nd co-ordinate: 0.7656
Using 3rd co-ordinate: 0.7656
Using 4th co-ordinate: 1.0
Choose 1st co-ordinate
Using 1st and 2nd co-ordinate: 0.5191
Using 1st and 3rd co-ordinate: 0.5191
Using 1st and 4th co-ordinate: 0.5625
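The figures in this illustration can be reproduced with a short sketch, under two assumptions that the slide does not state: stress is taken as the sum of (d' - d)^2 divided by the sum of d^2, where d' is the Euclidean distance over the selected co-ordinates and d is the original distance, and d is the edit distance between the two strings, which is 8. Function names are hypothetical.

```python
import math


def stress(pairs, coords):
    """pairs: list of (vector_a, vector_b, original_distance).
    Assumed definition: sum((d' - d)^2) / sum(d^2), where d' is the
    Euclidean distance restricted to the co-ordinates in coords."""
    num = den = 0.0
    for va, vb, d in pairs:
        d_emb = math.sqrt(sum((va[c] - vb[c]) ** 2 for c in coords))
        num += (d_emb - d) ** 2
        den += d ** 2
    return num / den


def greedy_sampling(pairs, m, t):
    """Pick t of the m co-ordinates, each time adding the co-ordinate
    that gives the lowest stress together with those already chosen."""
    chosen, remaining = [], list(range(m))
    while len(chosen) < t:
        best = min(remaining, key=lambda c: stress(pairs, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# "Academic" and "usefull" with an assumed original (edit) distance of 8
pairs = [((8.0, 5.0, 7.0, 7.0), (6.0, 6.0, 6.0, 7.0), 8)]
print(f"{stress(pairs, [0]):.4f}")     # 0.5625
print(f"{stress(pairs, [0, 1]):.4f}")  # 0.5191
print(greedy_sampling(pairs, m=4, t=2))  # [0, 1]
```

Indices are zero-based, so `[0, 1]` corresponds to the 1st and 2nd co-ordinates. The 1st-and-3rd combination has the same stress; the sketch simply keeps the first minimum it finds.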
Phase 3
Given a vector v in Pstr and a vector w in Qstr, the Euclidean distance between them is calculated
The decision rule is applied to all record comparisons: if it holds, the records of Pstr and Qstr are inserted in two sets, PMatch and QMatch respectively
The final sets are sent to the two parties respectively
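A minimal sketch of the comparison step as the third party might run it. The slides do not spell out the decision rule, so a simple distance threshold is assumed here; the record identifiers, vectors and threshold value are hypothetical.

```python
import math


def match_records(p_vectors, q_vectors, threshold):
    """p_vectors, q_vectors: dict mapping record id -> embedded vector.
    Assumed decision rule: two records match when the Euclidean
    distance between their vectors is at most threshold."""
    p_match, q_match = set(), set()
    for pid, v in p_vectors.items():
        for qid, w in q_vectors.items():
            if math.dist(v, w) <= threshold:
                p_match.add(pid)  # reported to P only
                q_match.add(qid)  # reported to Q only
    return p_match, q_match


p_vectors = {"p1": (8.0, 5.0), "p2": (1.0, 1.0)}
q_vectors = {"q1": (8.0, 6.0), "q2": (20.0, 20.0)}
p_match, q_match = match_records(p_vectors, q_vectors, threshold=1.5)
print(p_match, q_match)  # {'p1'} {'q1'}
```

Each party receives only its own set of matching record identifiers, in line with the requirement that P knows only PMatch and Q knows only QMatch.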
SECURE SCHEMA MATCHING
SW: global schema owned by third party W
LW: language
αw: alphabet
SP and SQ are the source schemas owned by two parties
If SW is Customer(Name, DateofBirth, ResidenceAddress) and SP is Cust(FirstName, LastName, DateofBirth), the mapping is concatenate(Cust.FirstName, Cust.LastName) = Customer.Name
SECURE SCHEMA MATCHING (contd)
P generates SP’ (D1, . . . , Ds) from the mapping of SP with SW(D1, . . . , DL);
Q generates SQ’(D1, . . . , Dx) from the mapping of SQ with SW(D1, . . . , DL);
P and Q negotiate: a secret key k; embedding parameters (Lx, N, dist); a hash function h;
P sends HP = (h(D1, k), . . . , h(Ds, k)) to W;
Q sends HQ = (h(D1, k), . . . , h(Dx, k)) to W;
W computes the intersection HP ∩ HQ
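A minimal sketch of the hashed exchange. The slides only specify a hash function h taking an attribute and the key k; HMAC-SHA256 is used here as one possible keyed hash, and the key value and attribute lists are made up for illustration.

```python
import hashlib
import hmac


def keyed_hashes(attributes, key: bytes):
    """Return the set of keyed digests h(D, k) for the attribute names.
    HMAC-SHA256 is an assumed choice of h, not taken from the slides."""
    return {hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
            for name in attributes}


key = b"secret negotiated by P and Q"  # hypothetical key k, unknown to W

# Attributes of SP' and SQ', expressed in the terms of the global schema SW
hp = keyed_hashes(["Name", "DateofBirth"], key)       # sent by P to W
hq = keyed_hashes(["Name", "ResidenceAddress"], key)  # sent by Q to W

common = hp & hq  # W sees only digests and computes HP ∩ HQ
print(len(common))  # 1
```

Because W does not hold k, it can tell that one attribute is shared but cannot recompute the digest of a guessed attribute name; P and Q can map the shared digest back to `Name` by hashing their own attribute lists.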
SECURITY ANALYSIS
Length of the database
Database size
Set of matching records
Set of matching attributes
Number of matching attributes
EXPERIMENTAL EVALUATION
CONCLUSION
Privacy-preserving record matching between two parties that can have different schemas
Requires privacy at the schema level
Privacy is obtained by embedding records in a vector space
Applications: DNA sequences, images, proteins, etc.