On method-specific record linkage for risk assessment

Posted on 16-Jan-2016


Jordi Nin, Javier Herranz, Vicenç Torra

Contents

Disclosure Risk Scenario: How an intruder re-identifies an individual

Preliminaries: Protection methods and Record Linkage

Location record linkage: A new way to compute the disclosure risk

Conclusions and future work

Disclosure Risk Scenario

Data set X: n records, a attributes

Attribute classification

Identifiers: Passport number

Quasi-identifiers: Age, postal code

Confidential: Income

id    Sex    Marital status    Income
1     Male   Single            13.500
2     Male   Single            11.000
...   ...


Re-identification scenario

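$X = id \,\|\, X_{nc} \,\|\, X_c$ (original data set), $X' = X'_{nc} \,\|\, X_c$ (protected data set)

Privacy is ensured: the quasi-identifiers $X_{nc}$ are anonymized into $X'_{nc}$

Data quality is preserved: the confidential attributes $X_c$ are released unchanged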


Record Linkage

Data set 1          Data set 2
X1 X2 X3 X4         X’1 X’2 X’3 X’4
X1 X2 X3 X4         X’1 X’2 X’3 X’4
X1 X2 X3 X4         X’1 X’2 X’3 X’4

Problem: Find a correct mapping between the records of data file 1 and data file 2


Distance-based record linkage

• The nearest pairs of records are considered linked pairs

• It is very easy to tune

• Results depend heavily on the parameters

• Moderate time cost

Probabilistic record linkage

• Linked pairs are computed using conditional probabilities

• Tuning is difficult

• Few parameters

• High time cost
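The distance-based variant listed above can be illustrated with a minimal sketch. It assumes numeric attributes, standardizes them with the statistics of the original file, and uses the Euclidean distance; the function names, the standardization step and the convention that protected record i derives from original record i are assumptions made for this illustration, not details given in the slides.

```python
import math
import statistics


def distance_based_linkage(original, protected):
    """Return, for each protected record, the index of the nearest original record."""
    n_attrs = len(original[0])
    columns = list(zip(*original))
    means = [statistics.fmean(col) for col in columns]
    # pstdev is 0 for a constant attribute; fall back to 1 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def standardize(record):
        return [(record[j] - means[j]) / stds[j] for j in range(n_attrs)]

    std_original = [standardize(r) for r in original]
    links = []
    for record in protected:
        z = standardize(record)
        distances = [math.dist(z, o) for o in std_original]
        links.append(distances.index(min(distances)))
    return links


def count_correct_links(links):
    # Assumption: protected record i was derived from original record i.
    return sum(1 for i, link in enumerate(links) if link == i)
```

Every protected record is compared with every original record, which is where the moderate time cost comes from: the work grows with the product of the two file sizes.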

Preliminaries

Rank swapping - p

Algorithm

For each attribute attr_j, where 1 ≤ j ≤ n

attr_j is sorted

each value x_ij is swapped with a randomly chosen value x_lj, where i < l ≤ i + p

the sorting of attr_j is undone

End for

End algorithm

• Simple

• Preserves µ and σ of each attribute

• All original combinations of values disappear
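The algorithm above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: p is read as a percentage of the number of records (as in the p = 20% example), each value is swapped at most once, and the `seed` parameter and function name are hypothetical.

```python
import random


def rank_swap(data, p, seed=None):
    """Swap each sorted value with a random partner at most `window` ranks above it."""
    rng = random.Random(seed)
    n = len(data)
    n_attrs = len(data[0])
    # Assumption: p is a percentage of the number of records.
    window = max(1, round(n * p / 100))
    protected = [list(row) for row in data]
    for j in range(n_attrs):
        # order[r] is the index of the record holding the r-th smallest value.
        order = sorted(range(n), key=lambda i: data[i][j])
        values = [data[i][j] for i in order]
        swapped = [False] * n
        for i in range(n):
            if swapped[i]:
                continue
            upper = min(i + window, n - 1)
            candidates = [l for l in range(i + 1, upper + 1) if not swapped[l]]
            if not candidates:
                continue  # no free partner inside the window
            l = rng.choice(candidates)
            values[i], values[l] = values[l], values[i]
            swapped[i] = swapped[l] = True
        # Undo the sorting: write the swapped values back to their records.
        for rank, i in enumerate(order):
            protected[i][j] = values[rank]
    return protected
```

Because values are only permuted within each attribute, every attribute keeps exactly the same set of values, which is why its mean and standard deviation are preserved.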

Rank swapping - p example

p = 20%

[Figure: ten values (8, 6, 10, 7, 9, 2, 1, 4, 5, 3) shown next to the same values in sorted order (1 to 10)]

Microaggregation - ka

• k: minimum number of records per cluster; a: number of attributes microaggregated together

• a = 1: optimal

• a > 1: NP-hard, heuristic

[Figure: microaggregation example with k = 3]


Optimal univariate Microaggregation

Result 1. When the elements are sorted by the attribute, the elements of each cluster in any optimal partition are contiguous (the clusters do not overlap)

Result 2. All clusters of any optimal partition have between k and 2k-1 elements.

[Figure: graph built over the sorted elements x1, x2, x3, x4 with k = 2]

Clusters are built from the nodes of the path returned by the shortest-path algorithm
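Results 1 and 2 reduce the search to contiguous clusters of k to 2k-1 sorted elements. The sketch below assumes the usual graph construction (one arc per admissible cluster, weighted by the within-cluster sum of squared errors) and solves the shortest-path problem by dynamic programming over the sorted values; the function name and the use of squared error as the cost are assumptions for illustration.

```python
def optimal_univariate_microaggregation(values, k):
    """Replace each value by the mean of its cluster in a minimum-SSE partition."""
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    v = [values[i] for i in order]
    # Prefix sums give the SSE of any contiguous block in constant time.
    s1, s2 = [0.0], [0.0]
    for x in v:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [0.0] + [inf] * n   # best[j]: cheapest partition of the first j values
    prev = [-1] * (n + 1)      # prev[j]: start of the last cluster in that partition
    for j in range(k, n + 1):
        # Result 2: the last cluster v[i:j] holds between k and 2k-1 elements.
        for i in range(max(0, j - 2 * k + 1), j - k + 1):
            cost = best[i] + sse(i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    protected = [0.0] * n
    j = n
    while j > 0:  # walk the shortest path backwards, one cluster per arc
        i = prev[j]
        mean = (s1[j] - s1[i]) / (j - i)
        for r in range(i, j):
            protected[order[r]] = mean
        j = i
    return protected
```

For example, with k = 2 the values 10, 1, 12, 2, 11 are split into the clusters {1, 2} and {10, 11, 12}, so each value is replaced by 1.5 or 11.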


MDAV Microaggregation

[Figure: MDAV example with k = 2, mapping X to X’]

MDAV is a heuristic method for multivariate microaggregation
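The slide does not list the MDAV steps, so the following is a sketch of the heuristic as it is commonly described: repeatedly take the record r farthest from the centroid and the record s farthest from r, and build one cluster of k records around each. The function name, the squared Euclidean distance and the tie handling are assumptions of this sketch.

```python
def mdav(data, k):
    """Heuristic multivariate microaggregation with fixed-size clusters (sketch)."""
    n = len(data)
    if n < k:
        raise ValueError("need at least k records")
    dims = range(len(data[0]))

    def centroid(idx):
        return [sum(data[i][d] for i in idx) / len(idx) for d in dims]

    def sqdist(a, b):
        return sum((a[d] - b[d]) ** 2 for d in dims)

    def farthest(point, idx):
        return max(idx, key=lambda i: sqdist(data[i], point))

    protected = [list(row) for row in data]
    remaining = list(range(n))

    def aggregate(members):
        c = centroid(members)
        for i in members:
            protected[i] = list(c)  # every member is replaced by the centroid

    def take_cluster(center):
        # The k remaining records nearest to `center` form one cluster.
        nearest = sorted(remaining, key=lambda i: sqdist(data[i], data[center]))[:k]
        aggregate(nearest)
        for i in nearest:
            remaining.remove(i)

    while len(remaining) >= 3 * k:
        r = farthest(centroid(remaining), remaining)
        s = farthest(data[r], remaining)
        take_cluster(r)
        take_cluster(s)
    if len(remaining) >= 2 * k:
        take_cluster(farthest(centroid(remaining), remaining))
    if remaining:
        aggregate(remaining)  # the last k to 2k-1 records form the final cluster
    return protected
```

Each protected record is the centroid of a cluster built from whole records, so all attributes of a record are treated together rather than one attribute at a time.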


Score: Protection method evaluation

Score = 0.5 IL + 0.5 DR

IL = 100(0.2 IL1+0.2 IL2+0.2 IL3+0.2 IL4+0.2 IL5)

IL1 = mean of absolute error

IL2 = mean variation of average

IL3 = mean variation of variance

IL4 = mean variation of covariance

IL5 = mean variation of correlation

DR = 0.25 DLD+0.25 PLD+0.5 ID

DLD = number of links found using distance-based record linkage (DBRL)

PLD = number of links found using probabilistic record linkage (PRL)

ID = protected values that lie near the original ones
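The three formulas combine as in the short sketch below. It assumes that IL1 to IL5 are fractions in [0, 1] (hence the factor 100) and that DLD, PLD and ID are already expressed as percentages; the slide does not state these scales, and the function names are illustrative.

```python
def information_loss(il1, il2, il3, il4, il5):
    # IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)
    return 100 * (0.2 * il1 + 0.2 * il2 + 0.2 * il3 + 0.2 * il4 + 0.2 * il5)


def disclosure_risk(dld, pld, interval_disclosure):
    # DR = 0.25 DLD + 0.25 PLD + 0.5 ID
    return 0.25 * dld + 0.25 * pld + 0.5 * interval_disclosure


def score(il, dr):
    # Score = 0.5 IL + 0.5 DR
    return 0.5 * il + 0.5 * dr
```

With made-up inputs, five IL components of 0.1 give IL = 10, and DLD = 20, PLD = 40, ID = 30 give DR = 30, so the score is 20.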

Location Problem Description

L-RL: Location Record Linkage

Standard record linkage compares all records

Rank swapping, univariate microaggregation and other methods use only some of the original records to create each protected value

It is therefore unnecessary to compare all the records

Location Record Linkage

Method Description

[Figure: Xext and X’]

Example: Rank swapping

[Figure: rank swapping example with p = 20%; label: Distance]
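One possible reading of the rank swapping example is sketched below: since a swapped value moves at most p ranks, only original records whose value lies within that window of the protected value can be its source, and the candidate sets of all attributes can be intersected. The sketch assumes the intruder knows p and the original values of the attributes, and that p is a percentage of the number of records; the function name is hypothetical and the slides do not give this exact procedure.

```python
from bisect import bisect_left, bisect_right


def rank_swap_candidates(original, protected, p):
    """For each protected record, return the set of original records that may be its source."""
    n = len(original)
    n_attrs = len(original[0])
    # Assumption: p is a percentage of the number of records.
    window = max(1, round(n * p / 100))
    candidates = [set(range(n)) for _ in range(n)]
    for j in range(n_attrs):
        col = sorted(row[j] for row in original)
        for i, row in enumerate(protected):
            # Ranks that the protected value occupies in the sorted original column.
            lo = bisect_left(col, row[j])
            hi = bisect_right(col, row[j]) - 1
            # Values that sit at most `window` ranks away on either side.
            low_val = col[max(lo - window, 0)]
            high_val = col[min(hi + window, n - 1)]
            candidates[i] &= {
                t for t in range(n) if low_val <= original[t][j] <= high_val
            }
    return candidates
```

A linkage step would then compare each protected record only with its candidate set instead of with all records, for example by picking the nearest candidate.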


Rank Swapping Experiments

Data sets:

Census (1080 records & 13 attributes)

EIA (4092 records & 10 attributes)

Rank swapping configurations:

p = 2 … 20

Score modifications:

DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID (LLD = number of links found using L-RL)


L-RL: Rank Swapping Linkage Results


L-RL: Rank Swapping Score Results


Univariate Microaggregation Experiments

Data sets:

Census (1080 records & 13 attributes)

EIA (4092 records & 10 attributes)

Univariate microaggregation configurations:

k = 10 … 50

Score modifications:

DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID


L-RL: Univariate Microaggregation Linkage Results


L-RL: Univariate Microaggregation Score Results


MDAV Experiments

Data sets:

Census (1080 records & 13 attributes)

EIA (4092 records & 10 attributes)

MDAV configurations:

k = 10 … 50

Score modifications:

DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID


L-RL: MDAV Linkage Results


L-RL: MDAV Score Results

Conclusions and future work

Conclusions

• We have presented a new type of record linkage designed to exploit the limitations of some protection methods

• The L-RL method obtains a more accurate DR evaluation for rank swapping and univariate microaggregation

• MDAV is immune to the location problem

Future work

• We plan to study the DR of MDAV and other protection methods using other ad-hoc methods
