WP 15 Experience of using a Post Randomisation Method (PRAM) at ONS

Christine Bycroft, Katherine Merrett

Office for National Statistics, UK

Outline

• What is PRAM

• Why we needed to adapt the PRAM method

• Adapted PRAM Methodology

• Disclosure risks

• Effect on Data Quality

• Conclusions

What is PRAM

• PRAM is a disclosure control technique for categorical data in microdata files.

• The values of a categorical variable are changed according to a prescribed probability.

• Each new perturbed value may or may not be different from the original value.

• For example, a person who is classified as a widow may be re-classified as single.

Probability mechanism for PRAM

• The probability mechanism is described by an invertible transition matrix P

• One P matrix for each variable

• Let $P = (p_{ij})$ be an $L \times L$ matrix for a variable having $L$ categories. The entries of the matrix are the conditional probabilities

$$p_{ij} = \Pr(\text{new value} = j \mid \text{old value} = i)$$

• $p_{ii}$ is the probability of no change
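The probability mechanism can be illustrated with a minimal Python sketch. It is not ONS code: the function `pram`, the marital-status categories and the probabilities in `P` are hypothetical. Each original value in category $i$ is replaced by a value drawn with the probabilities in row $i$ of $P$, so it stays unchanged with probability $p_{ii}$.

```python
import random


def pram(values, categories, P, rng):
    """Apply PRAM to one categorical variable.

    values:     original category labels, one per record
    categories: the L category labels, in the row/column order of P
    P:          L x L list of lists, P[i][j] = Pr(new = categories[j] | old = categories[i])
    rng:        a random.Random instance, so that runs are reproducible
    """
    index = {c: i for i, c in enumerate(categories)}
    perturbed = []
    for v in values:
        row = P[index[v]]
        # Draw the new value using row i of P as weights; the diagonal entry
        # p_ii is the chance that the value is left unchanged.
        perturbed.append(rng.choices(categories, weights=row, k=1)[0])
    return perturbed


# Hypothetical marital-status example (probabilities are illustrative only).
categories = ["single", "married", "widowed"]
P = [
    [0.90, 0.10, 0.00],
    [0.05, 0.95, 0.00],
    [0.20, 0.00, 0.80],  # a widow is re-classified as single with probability 0.2
]
records = ["single", "married", "widowed", "widowed", "married"]
print(pram(records, categories, P, random.Random(2024)))
```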

Risk and data utility for PRAM

Disclosure risk
• PRAM offers protection by inflow and outflow:
  – inflow from safe combinations of values to risky combinations
  – outflow from risky combinations to safe combinations

Data utility
• The invariant PRAM method preserves univariate frequencies in expectation
• There is no control over joint distributions, so PRAM may create edit failures (e.g. a 14-year-old doctor) or highly unusual combinations (e.g. a 17-year-old widow)
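A transition matrix is invariant when $T'P = T'$, where $T$ is the vector of category frequencies, so that the expected frequencies after PRAM equal the original ones. The sketch below (hypothetical frequencies and matrix) checks this condition and builds an invariant matrix $R = PQ$ from an arbitrary $P$, using a construction from the PRAM literature with backward matrix $q_{ij} = p_{ji} T_j / \lambda_i$ and $\lambda = T'P$. It is shown only to make invariance concrete; the adapted ONS method obtained its matrices by linear programming instead.

```python
def expected_frequencies(T, P):
    """Expected category frequencies after PRAM: (T'P)_j = sum_i T_i * p_ij."""
    L = len(T)
    return [sum(T[i] * P[i][j] for i in range(L)) for j in range(L)]


def is_invariant(T, P, tol=1e-9):
    """True when T'P = T', i.e. univariate frequencies are preserved in expectation."""
    return all(abs(e - t) <= tol * max(1.0, abs(t))
               for e, t in zip(expected_frequencies(T, P), T))


def make_invariant(T, P):
    """Return R = P Q, which is invariant for T.

    Q is the 'backward' matrix q_ij = p_ji * T_j / lambda_i with lambda = T'P.
    Each row of Q sums to 1, and T'PQ = lambda'Q = T'.
    Assumes every lambda_i > 0.
    """
    L = len(T)
    lam = expected_frequencies(T, P)
    Q = [[P[j][i] * T[j] / lam[i] for j in range(L)] for i in range(L)]
    return [[sum(P[i][k] * Q[k][j] for k in range(L)) for j in range(L)]
            for i in range(L)]


T = [50, 30, 20]  # hypothetical category frequencies
P = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
print(expected_frequencies(T, P))  # roughly [45, 31, 24], so P is not invariant
R = make_invariant(T, P)
print(is_invariant(T, R))  # True
```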

Why adapt the PRAM method?

• Applied to the 2001 Individual Sample of Anonymised Records (SARs) drawn from the Census (population uniques are known from the Census records)

• Used recoding as the first method to reduce risk

• Do not apply PRAM to the whole file

• Perturb only the remaining high-risk records (a small proportion of all records)

• Wish to preserve exact univariate frequencies, not just expected values

• Wish to control joint distributions to minimise edit failures and unusual combinations
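Flagging the remaining high-risk records might look like the sketch below. It assumes, hypothetically, that a record is high risk when its combination of key variables is unique in the Census population; the key variables, records and function name are illustrative, and the slides do not state the exact ONS risk criteria.

```python
from collections import Counter


def population_uniques(sample, population, keys):
    """Indices of sample records whose combination of key variables is unique in the population."""
    def combo(record):
        return tuple(record[k] for k in keys)

    pop_counts = Counter(combo(r) for r in population)
    return [i for i, r in enumerate(sample) if pop_counts[combo(r)] == 1]


# Hypothetical key variables and records; in practice the population counts
# would come from the full Census, and risk criteria may be more elaborate.
keys = ["age", "marital", "region"]
population = [
    {"age": 34, "marital": "married", "region": "North"},
    {"age": 34, "marital": "married", "region": "North"},
    {"age": 17, "marital": "widowed", "region": "North"},
    {"age": 90, "marital": "single", "region": "South"},
    {"age": 45, "marital": "divorced", "region": "South"},
    {"age": 45, "marital": "divorced", "region": "South"},
]
sample = [population[0], population[2], population[3], population[4]]
print(population_uniques(sample, population, keys))  # [1, 2]
```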

Adapted PRAM Methodology

• Perturbing only those records which are high risk

• For the transition matrix P, we want to:
  – Maximise the probability of changing values
  – Preserve frequencies (i.e. P is invariant)
  – Create perturbed records that are feasible and will not result in highly unusual combinations

• Define a linear programming problem

Adapted PRAM Methodology

• The LP routine minimised the objective function, subject to constraints. The objective function is

$$\min_P \; \sum_i w_{ii}\, p_{ii} + \sum_i \sum_{j \neq i} w_{ij}\, p_{ij}$$

where $W = (w_{ij})$ is a weight matrix, with a low weight for a preferred transition and a high weight for a non-preferred transition.

• We have set up a weight matrix to avoid extreme transitions.

• Rather than making extreme changes that might create highly unusual individuals or invalid combinations, we prefer to keep the values as they are.
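The LP can be sketched with SciPy's `linprog` standing in for the SAS routine. Assumptions, not taken from the slides: the unknowns are the $p_{ij}$, the constraints are that each row of $P$ sums to 1, that $P$ is invariant ($T'P = T'$) and that $0 \le p_{ij} \le 1$, and the frequencies `T` and weights `W` are hypothetical. Since the diagonal term plus the sum over $j \ne i$ is simply $\sum_{ij} w_{ij} p_{ij}$, the objective is passed as the flattened weight matrix.

```python
import numpy as np
from scipy.optimize import linprog


def solve_pram_lp(T, W):
    """Find a transition matrix P minimising sum_ij w_ij * p_ij.

    Assumed constraints: every row of P sums to 1, P is invariant (T'P = T'),
    and 0 <= p_ij <= 1. The unknowns are flattened row-major: x[i*L + j] = p_ij.
    """
    T = np.asarray(T, dtype=float)
    W = np.asarray(W, dtype=float)
    L = len(T)
    A_eq, b_eq = [], []
    for i in range(L):  # row sums: sum_j p_ij = 1
        row = np.zeros(L * L)
        row[i * L:(i + 1) * L] = 1.0
        A_eq.append(row)
        b_eq.append(1.0)
    # Invariance: sum_i T_i * p_ij = T_j. The equation for the last column is
    # implied by the row sums and the other columns, so it is left out.
    for j in range(L - 1):
        row = np.zeros(L * L)
        row[j::L] = T  # positions i*L + j for every i
        A_eq.append(row)
        b_eq.append(T[j])
    res = linprog(W.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(L, L)


# Hypothetical three-category variable (e.g. ordered age bands).
T = [50, 30, 20]
W = [[10, 1, 50],   # diagonal weight 10 discourages "no change",
     [1, 10, 1],    # neighbouring categories (weight 1) are preferred,
     [50, 1, 10]]   # and the extreme jump between the end categories costs 50
P = solve_pram_lp(T, W)
print(np.round(P, 3))
```

The diagonal weights matter: if each $w_{ii}$ were the smallest weight in its row, the identity matrix (no change) would be an optimal solution, because it is always feasible. Setting $w_{ii}$ above the weights of preferred transitions pushes probability off the diagonal, while the large weights on distant categories keep extreme transitions out.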

Implementation

• PRAM the variables sequentially, starting with the variable that contributes most to risk

• Define a weight matrix for each variable

• Solve the LP in SAS to obtain the transition matrix P

• PRAM within control variables (e.g. PRAM age within marital status categories)

• Implementation of pij probabilities preserves exact frequencies

• Check for edit failures, and correct them

• Perturbed records are flagged as imputed (whether or not their values were changed)
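One possible way to make the implementation preserve frequencies exactly, rather than only in expectation, is to allocate counts instead of drawing each record independently: of the $n_i$ high-risk records in category $i$, $n_i p_{ij}$ (rounded by largest remainder) are moved to category $j$, with the records chosen at random. The slides do not describe the ONS procedure, so this is an assumption, and the function names are hypothetical. Frequencies are preserved exactly when $P$ is invariant for the high-risk counts and every $n_i p_{ij}$ is a whole number, as in the example; otherwise rounding can shift the totals slightly.

```python
import math
import random


def allocate_counts(n, probs):
    """Split n records over target categories in proportion to probs (largest-remainder rounding)."""
    raw = [n * p for p in probs]
    counts = [math.floor(x) for x in raw]
    shortfall = n - sum(counts)
    # Hand the remaining records to the categories with the largest fractional parts.
    order = sorted(range(len(probs)), key=lambda j: raw[j] - counts[j], reverse=True)
    for j in order[:shortfall]:
        counts[j] += 1
    return counts


def pram_exact(values, high_risk, categories, P, rng):
    """PRAM only the records listed in high_risk, allocating counts rather than independent draws.

    Returns the new values and an 'imputed' flag that is True for every
    perturbed record, whether or not its value actually changed.
    """
    index = {c: i for i, c in enumerate(categories)}
    new_values = list(values)
    for c in categories:
        rows = [r for r in high_risk if values[r] == c]
        rng.shuffle(rows)  # which records receive which new value is random
        start = 0
        for j, k in enumerate(allocate_counts(len(rows), P[index[c]])):
            for r in rows[start:start + k]:
                new_values[r] = categories[j]
            start += k
    at_risk = set(high_risk)
    imputed = [r in at_risk for r in range(len(values))]
    return new_values, imputed


# Hypothetical file: 12 records, of which 10 are high risk (5 single, 3 married, 2 widowed).
categories = ["single", "married", "widowed"]
values = ["single"] * 6 + ["married"] * 4 + ["widowed"] * 2
high_risk = [0, 1, 2, 3, 4, 6, 7, 8, 10, 11]
# P is invariant for the high-risk counts (5, 3, 2) and every n_i * p_ij is a whole number.
P = [[0.6, 0.2, 0.2],
     [1 / 3, 2 / 3, 0.0],
     [0.5, 0.0, 0.5]]
new_values, imputed = pram_exact(values, high_risk, categories, P, random.Random(7))
print(new_values)
```

To PRAM within control variables, the same function could be called separately on the high-risk records of each control category (for example each marital-status group when perturbing age).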

Results: Disclosure risks

• Our aim was only to protect against attempts at exact matching. We assumed that perturbing the value of one variable in a high-risk record provides sufficient protection

• Protection by high outflow but low inflow

• Results showed high proportions of values changed, except for the last variables in the sequence

• Acceptable, since these variables had the lowest overall contribution to disclosure risk, and only a small number of records were affected

Results: Data Quality

Preservation of the univariate frequencies: excellent results

Preservation of the multivariate frequencies:

a) very few records failed the edit checks

b) tables were compared before and after PRAM

c) for each cell, the ratio of the relative error due to PRAM to the relative sampling error was calculated
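The cell-level quality measure can be sketched as follows. The slides do not say how the sampling error was estimated, so the binomial standard error with a finite-population correction used here is an assumption, as are the function name and the example numbers. Because both errors are relative to the same cell count, the ratio reduces to $|n_{\text{after}} - n_{\text{before}}| / \mathrm{SE}$.

```python
import math


def pram_error_ratio(before, after, sample_size, sampling_fraction):
    """Ratio of the relative error due to PRAM to the relative sampling error for one table cell.

    Assumption (not from the source): the sampling error of a cell count is the
    binomial standard error for a simple random sample with a finite-population
    correction, SE = sqrt(n * p * (1 - p) * (1 - f)), where p = before / n.
    """
    if before <= 0:
        raise ValueError("relative errors are undefined for an empty cell")
    p = before / sample_size
    se = math.sqrt(sample_size * p * (1 - p) * (1 - sampling_fraction))
    relative_pram_error = abs(after - before) / before
    relative_sampling_error = se / before
    return relative_pram_error / relative_sampling_error


# Hypothetical cells from the same sample: the same absolute change of 4
# matters far more in a small cell than in a large one.
small = pram_error_ratio(before=16, after=20, sample_size=10_000, sampling_fraction=0.03)
large = pram_error_ratio(before=500, after=504, sample_size=10_000, sampling_fraction=0.03)
print(round(small, 2), round(large, 2))
```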

Effect on Data Quality

• Results from 15 tables (nearly 3,000 cells)

• The effect of perturbation relative to sampling error decreases as the cell size increases. Thus the damage done by PRAM is greater for cells with low frequencies.

Table 1: Percentage of cells across all tables with a ratio of the error due to PRAM to the sampling error greater than 1 and greater than 2

| Cell frequency before PRAM | 0-5 | 6-10 | 11-20 | 21-40 | 41-90 | 91-150 | 150-500 | 500+ | Total |
|---|---|---|---|---|---|---|---|---|---|
| Percentage of cells with a ratio > 1 | 35 | 25 | 24 | 13 | 15 | 10 | 17 | 10 | 16 |
| Percentage of cells with a ratio > 2 | 9 | 8 | 6 | 4 | 5 | 4 | 7 | 4 | 5 |
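Tabulating the ratios by cell size, as in Table 1, could be sketched like this. The cells are made up rather than taken from the ONS results, and the treatment of boundary values between bands is an assumption, since the band labels do not settle it.

```python
import bisect

# Band upper bounds matching the column headings of Table 1. How boundary
# values are assigned (e.g. a cell of exactly 150 or 500) is an assumption.
BAND_UPPER = [5, 10, 20, 40, 90, 150, 500]
BAND_LABELS = ["0-5", "6-10", "11-20", "21-40", "41-90", "91-150", "150-500", "500+"]


def percent_above(cells, threshold):
    """cells: (count_before_pram, ratio) pairs. Returns {band: % of cells with ratio > threshold}."""
    totals = [0] * len(BAND_LABELS)
    above = [0] * len(BAND_LABELS)
    for before, ratio in cells:
        band = bisect.bisect_left(BAND_UPPER, before)  # first band whose upper bound >= before
        totals[band] += 1
        if ratio > threshold:
            above[band] += 1
    result = {label: 100.0 * a / t
              for label, a, t in zip(BAND_LABELS, above, totals) if t}
    result["Total"] = 100.0 * sum(above) / sum(totals)
    return result


# Made-up cells, for illustration only.
cells = [(3, 1.5), (4, 0.5), (8, 2.5), (600, 0.2)]
print(percent_above(cells, 1))
print(percent_above(cells, 2))
```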

Conclusions

• When applied in this way to targeted records, PRAM is an efficient method of data perturbation that can be closely controlled.

• Applying PRAM to a small proportion of the file has allowed us to strike a good balance between recoding and minimising the damage from perturbation.