Privacy Preserving Schema and Data Matching. Scannapieco, Bertino, Figotin and Elmagarmid. Presented by: Vidhi Thapa


Page 1:

Privacy Preserving Schema and Data Matching

Scannapieco, Bertino, Figotin and Elmagarmid

Presented by: Vidhi Thapa

Page 2:

INTRODUCTION

Record Matching

Process of identifying records that represent the same real-world entity

Can be executed:

Within a single source

Across sources

Goal: Record matching that preserves the privacy of both data and schema

Page 3:

RECORD MATCHING

Record matching involves:

Sharing and integrating data

Protecting privacy of data

Two major innovations:

Approximate matching

Awareness of schema information

Page 4:

Record Matching

EMBEDDING

Embed records in Euclidean space

Method used: SparseMap

Comparison Functions

Edit distance

Matching Decision Rule

Classify records as a match or a non-match

Page 5:

EXAMPLE: EDIT DISTANCE

e(“Virginia”, “Vermont”) = 5

Virginia

Verginia

Verminia

Vermonia

Vermonta

Vermont
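The chain of single-character edits above can be checked with a short dynamic-programming routine. This is a minimal sketch of the standard Levenshtein distance (insertions, deletions and substitutions each cost 1); the function name edit_distance is chosen here for illustration and does not come from the slides.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions that turn s into t."""
    # prev[j] holds the distance between the first i-1 chars of s
    # and the first j chars of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string needs i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(edit_distance("Virginia", "Vermont"))  # 5
```

The five edits are four substitutions (i→e, g→m, i→o, i→t) and one deletion (the trailing a), matching the sequence on the slide.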

Page 6:

HYPOTHESIS

Parties P and Q store the records to be matched in the relations RP(A1,…An) and RQ(B1,…Bn) respectively.

Two hypotheses:

1. RP and RQ have identical schemas

2. RP and RQ have possible schema-level conflicts

Record matching between RP and RQ:

P will learn only the set PMatch, consisting of the records in RP that match records in RQ.

Similarly, Q will learn only the set QMatch.

Page 7:

SECURE DATA MATCHING

Pairs of records are compared by means of a comparison function

A third party is introduced to assure privacy

SparseMap:

Reference sets

Metric space

No. of subsets = ⌊log₂ N⌋²
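To make the role of the reference sets concrete, here is a simplified sketch of a Lipschitz-style embedding in the spirit of SparseMap: coordinate i of an object is its distance to the nearest member of reference set Si. Several things here are assumptions rather than details from the slides: the reference sets are drawn at random from a small pool of strings, their sizes double from 2 up to 2^⌊log₂ N⌋, and SparseMap's two heuristics are left out. The names make_reference_sets and embed are hypothetical.

```python
import random


def edit_distance(s, t):
    """Levenshtein distance, used here as the metric on strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def make_reference_sets(pool, seed=0):
    """Draw floor(log2 N)^2 random subsets of the pool (assumed scheme:
    floor(log2 N) subsets for each size 2, 4, ..., 2^floor(log2 N))."""
    k = len(pool).bit_length() - 1  # floor(log2 N) for N >= 1
    rng = random.Random(seed)
    return [rng.sample(pool, 2 ** i)
            for i in range(1, k + 1) for _ in range(k)]


def embed(obj, reference_sets, dist=edit_distance):
    """Coordinate i is the distance from obj to the closest member of S_i."""
    return [min(dist(obj, r) for r in s) for s in reference_sets]


pool = ["Virginia", "Vermont", "Georgia", "Florida",
        "Nevada", "Montana", "Indiana", "Alabama"]
reference_sets = make_reference_sets(pool)  # N = 8, so 3^2 = 9 subsets
v = embed("Virginia", reference_sets)
w = embed("Vermont", reference_sets)
print(len(reference_sets), v, w)
```

Because the edit distance is a metric, no single coordinate can differ between two objects by more than their original distance, which is what makes the embedded vectors usable for approximate matching.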

Page 8:

HEURISTICS

Distance Approximation

Input: Object o, Set Si

Output: Approx d(o, Si)

Greedy Sampling

Input: m co-ordinates

Output: t <= m most discriminating co-ordinates

Page 9:

DATA MATCHING PROTOCOL

Assume parties P and Q store the records to be matched in the relations RP(A1,…An) and RQ(B1,…Bn) respectively

A third-party-based protocol consisting of the following three phases:

Phase 1: Setting of the embedding space

Phase 2: Embedding of RP and RQ values

Phase 3: Comparison to decide matching records

Page 10:

Phase 1

Page 11:

Phase 2

Page 12:

ILLUSTRATION

Stress

E.g.: Academic(8.0, 5.0, 7.0, 7.0) and usefull(6.0, 6.0, 6.0, 7.0)

Using 1st co-ordinate: 0.5625

Using 2nd co-ordinate: 0.7656

Using 3rd co-ordinate: 0.7656

Using 4th co-ordinate: 1.0

Choose 1st co-ordinate

Using 1st and 2nd co-ordinates: 0.5191

Using 1st and 3rd co-ordinates: 0.5191

Using 1st and 4th co-ordinates: 0.5625
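The stress values in this illustration can be reproduced with the sketch below. It assumes the stress of a set of co-ordinates is the sum of squared differences between embedded (Euclidean) and original (edit) distances, divided by the sum of squared original distances; that formula is not stated on the slide but is consistent with every number shown. The helper names stress and greedy_select are hypothetical, and ties are broken by lowest co-ordinate index.

```python
import math


def edit_distance(s, t):
    """Levenshtein distance, the original distance between the strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def stress(embedded, coords, dist=edit_distance):
    """Assumed stress formula over all pairs of embedded objects,
    using only the co-ordinates listed in coords."""
    names = list(embedded)
    num = den = 0.0
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            d_emb = math.sqrt(sum((embedded[a][c] - embedded[b][c]) ** 2
                                  for c in coords))
            d = dist(a, b)
            num += (d_emb - d) ** 2
            den += d ** 2
    return num / den


def greedy_select(embedded, t, dist=edit_distance):
    """Pick t co-ordinates one at a time, each time adding the one
    that gives the lowest stress together with those already chosen."""
    m = len(next(iter(embedded.values())))
    chosen = []
    while len(chosen) < t:
        best = min((c for c in range(m) if c not in chosen),
                   key=lambda c: stress(embedded, chosen + [c], dist))
        chosen.append(best)
    return chosen


embedded = {"Academic": (8.0, 5.0, 7.0, 7.0),
            "usefull": (6.0, 6.0, 6.0, 7.0)}
for coords in ([0], [1], [2], [3], [0, 1], [0, 2], [0, 3]):
    print(coords, round(stress(embedded, coords), 4))
print(greedy_select(embedded, 2))
```

The edit distance between the two strings is 8, so the 1st co-ordinate alone gives (2 − 8)² / 8² = 0.5625, and it is chosen first because no other single co-ordinate does better.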

Page 13:

Phase 3

Given a vector v in Pstr and a vector w in Qstr, the Euclidean distance between them is calculated

The decision rule is applied to every record comparison: if it is true, the corresponding records of Pstr and Qstr are inserted in two sets, PMatch and QMatch respectively

The final sets are sent to the two parties respectively
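A minimal sketch of this comparison step, as the third party might run it, is given below. The slide does not say what the decision rule is; a simple distance threshold is assumed here, and the record identifiers, the sample vectors and the function name match_records are all hypothetical.

```python
import math


def match_records(p_vectors, q_vectors, threshold):
    """Compare every embedded record of P with every embedded record of Q.

    Assumed decision rule: two records match when the Euclidean distance
    between their vectors is at most `threshold`.
    """
    p_match, q_match = set(), set()
    for pid, v in p_vectors.items():
        for qid, w in q_vectors.items():
            if math.dist(v, w) <= threshold:
                p_match.add(pid)  # later sent to P only
                q_match.add(qid)  # later sent to Q only
    return p_match, q_match


# Hypothetical embedded records, keyed by record identifier.
pstr = {"p1": (8.0, 5.0), "p2": (1.0, 1.0)}
qstr = {"q1": (7.0, 5.0), "q2": (20.0, 20.0)}

p_match, q_match = match_records(pstr, qstr, threshold=1.5)
print(p_match, q_match)  # {'p1'} {'q1'}
```

The third party sees only vectors and identifiers, and each party receives only its own set of matching records.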

Page 14:

SECURE SCHEMA MATCHING

SW: global schema owned by third party W

LW: language

αW: alphabet

SP and SQ are the source schemas owned by the two parties

If SW is Customer(Name, DateofBirth, ResidenceAddress) and SP is Cust(FirstName, LastName, DateofBirth), it is mapped as concatenate(Cust.FirstName, Cust.LastName) = Customer.Name

Page 15:

SECURE SCHEMA MATCHING (contd)

P generates SP’(D1, . . . , Ds) from the mapping of SP with SW(D1, . . . , DL)

Q generates SQ’(D1, . . . , Dx) from the mapping of SQ with SW(D1, . . . , DL)

P and Q negotiate:

Secret key k

Embedding parameters (Lx, N, dist)

Hash function h

P sends HP = (h(D1, k), . . . , h(Ds, k)) to W

Q sends HQ = (h(D1, k), . . . , h(Dx, k)) to W

W computes the intersection HP ∩ HQ
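The exchange above can be sketched as follows. The slides only say that P and Q agree on a hash function h and a secret key k; using HMAC-SHA256 for h(D, k) is an assumption made for this example, as are the key value, the sample attribute lists and the helper names keyed_hash and hash_schema.

```python
import hashlib
import hmac


def keyed_hash(attribute: str, key: bytes) -> str:
    """h(D, k), assumed here to be HMAC-SHA256 of the attribute name."""
    return hmac.new(key, attribute.encode("utf-8"), hashlib.sha256).hexdigest()


def hash_schema(attributes, key):
    """Hash every attribute of a mapped schema with the shared key."""
    return {keyed_hash(a, key) for a in attributes}


# Negotiated by P and Q only; W never sees the key.
key = b"secret negotiated by P and Q"

# Hypothetical mapped schemas, expressed in the global schema's attributes.
sp_mapped = ["Name", "DateofBirth"]
sq_mapped = ["Name", "ResidenceAddress"]

hp = hash_schema(sp_mapped, key)  # sent by P to W
hq = hash_schema(sq_mapped, key)  # sent by Q to W

common = hp & hq  # W computes the intersection on hashed values only

# P knows the key, so it can map the common hashes back to its attributes.
p_lookup = {keyed_hash(a, key): a for a in sp_mapped}
matching_attributes = sorted(p_lookup[h] for h in common)
print(matching_attributes)  # ['Name']
```

Without the key, W can tell how many attributes the two schemas share but cannot recompute the hashes to learn which attributes they are.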

Page 16:

SECURITY ANALYSIS

Length of the database

Database size

Set of matching records

Set of matching attributes

Number of matching attributes

Page 17:

EXPERIMENTAL EVALUATION

Page 18:

EXPERIMENTAL EVALUATION

Page 19:

CONCLUSION

Privacy-preserving record matching between two parties that can have different schemas

Requires privacy at schema level

Obtains privacy by embedding records in a vector space

Applications: DNA sequences, images, proteins, etc.