WP 15 Experience of using a Post Randomisation Method (PRAM) at ONS
Christine Bycroft, Katherine Merrett
Office for National Statistics, UK
Outline
• What is PRAM
• Why we needed to adapt the PRAM method
• Adapted PRAM Methodology
• Disclosure risks
• Effect on Data Quality
• Conclusions
What is PRAM
• PRAM is a disclosure control technique for categorical data in microdata files.
• The values of a categorical variable are changed according to a prescribed probability.
• Each new perturbed value may or may not be different from the original value.
• For example, a person who is classified as a widow may be re-classified as single.
Probability mechanism for PRAM
• The probability mechanism is described by an invertible transition matrix P
• One P matrix for each variable
• Let P = (p_ij) be an L×L matrix for a variable having L categories. The entries of the matrix are the conditional probabilities
$p_{ij} = \Pr(\text{new value} = j \mid \text{old value} = i)$
• p_ii is the probability of no change
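As an illustration of the mechanism, the sketch below draws a new value for each record from the row of P that corresponds to its old value. The function name `apply_pram`, the category labels, and the matrix entries are hypothetical and chosen only for the example; they are not taken from the ONS work.

```python
import random

def apply_pram(values, categories, P, rng):
    """Return a perturbed copy of `values`.

    P[i][j] is the probability that a record whose old value is
    categories[i] receives the new value categories[j].
    """
    index = {c: i for i, c in enumerate(categories)}
    perturbed = []
    for v in values:
        row = P[index[v]]
        # Draw the new value from the row of P for the old value.
        perturbed.append(rng.choices(categories, weights=row, k=1)[0])
    return perturbed

categories = ["single", "married", "widowed"]
# Hypothetical transition matrix; each row sums to 1.
P = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.10, 0.80],
]
values = ["single", "married", "widowed", "married"]
perturbed = apply_pram(values, categories, P, random.Random(0))
```

Because the diagonal entries are large, most records keep their original value, which matches the point that a perturbed value may or may not differ from the original.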
Risk and data utility for PRAM
Disclosure risk
• PRAM offers protection by inflow and outflow:
– inflow from safe combinations of values to risky combinations
– outflow from risky combinations to safe combinations.
Data Utility
• the Invariant PRAM method preserves univariate frequencies in expectation
• No control over joint distributions
– may create edit failures, e.g. 14 year old doctor
– or highly unusual combinations, e.g. 17 year old widow
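The statement that invariant PRAM preserves univariate frequencies in expectation can be checked numerically: if t is the vector of category counts, invariance means that the expected counts after perturbation, sum over i of t_i p_ij, equal t_j. The sketch below uses made-up counts and a made-up matrix built from symmetric flows between categories, which is one simple way to obtain an invariant matrix; it is an illustration, not the matrix used at ONS.

```python
from fractions import Fraction as F

def expected_counts(counts, P):
    """Expected count in category j after PRAM: sum_i counts[i] * P[i][j]."""
    n = len(counts)
    return [sum(counts[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical counts for a three-category variable.
counts = [60, 30, 10]
# Hypothetical matrix: the expected flow from i to j equals the flow
# from j to i, so expected inflow and outflow cancel in every category.
P = [
    [F(56, 60), F(3, 60), F(1, 60)],
    [F(3, 30), F(25, 30), F(2, 30)],
    [F(1, 10), F(2, 10), F(7, 10)],
]
result = expected_counts(counts, P)
```

Exact fractions are used so that the comparison with the original counts is not affected by floating-point rounding.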
Why adapt the PRAM method?
• Applied to the 2001 Individual Sample of Anonymised Records (SARs) drawn from the Census (population uniques are known from the Census records)
• Used recoding as first method to reduce risk
• Do not apply PRAM to the whole file
• Perturb only remaining high risk records (small proportion of all records)
• Wish to preserve exact univariate frequencies, not just expected values
• Wish to control joint distributions to minimise edit failures and unusual combinations
Adapted PRAM Methodology
• Perturbing only those records which are high risk
• For the transition matrix P, we want to:
– Maximise the probability of changing values
– Preserve frequencies (i.e. P is invariant)
– Create perturbed records that are feasible and will not result in highly unusual combinations
• Define a linear programming problem
Adapted PRAM Methodology
• The LP routine minimised the objective function, subject to constraints. The objective function is
$$\sum_{i} w_{ii}\, p_{ii} + \sum_{i} \sum_{j \neq i} w_{ij}\, p_{ij}$$
where $W = (w_{ij})$ is a weight matrix; a low weight for a preferred transition and a high weight for a non-preferred transition.
• We have set up a Weight Matrix to avoid extreme transitions.
• Rather than having extreme changes that might create highly unusual individuals or invalid combinations, we prefer to keep the values as they are.
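The slides do not list the constraints of the linear programme. One plausible formulation, assuming the constraints are exactly those implied by the stated goals (each row of P is a probability distribution and P is invariant), is sketched below. The symbol $t_i$, the frequency of category i among the records being perturbed, is introduced here for illustration and does not appear in the original.

```latex
\begin{aligned}
\min_{P}\quad & \sum_{i}\sum_{j} w_{ij}\, p_{ij} \\
\text{subject to}\quad & \sum_{j} p_{ij} = 1 \quad \text{for all } i, \\
& \sum_{i} t_i\, p_{ij} = t_j \quad \text{for all } j, \\
& p_{ij} \ge 0 \quad \text{for all } i, j.
\end{aligned}
```

Under this reading, giving the diagonal weights values above those of preferred transitions but below those of extreme transitions would encourage change where a sensible alternative exists and leave values unchanged otherwise.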
Implementation
• PRAM variables sequentially - greatest contribution to risk first
• Define weight matrix for each variable
• LP solved in SAS, to get P transition matrix
• PRAM within control variables (e.g. PRAM age within marital status categories)
• Implementation of pij probabilities preserves exact frequencies
• Check for edit failures, and correct
• Perturbed records are flagged as being imputed (whether changed or not)
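The slides do not say how the pij probabilities were implemented so that frequencies are preserved exactly. One way this could be done, shown here purely as an assumption, is to turn the probabilities into integer flows (the number of records moving from category i to category j) whose row sums and column sums both equal the category counts, and then choose at random which records make each move. The function `perturb_exact` and the `flows` matrix are hypothetical.

```python
import random
from collections import Counter

def perturb_exact(values, categories, flows, rng):
    """Move exactly flows[i][j] records from categories[i] to categories[j].

    `flows` is a hypothetical integer matrix whose row sums and column
    sums both equal the category counts, so the counts are unchanged.
    """
    result = list(values)
    for i, old in enumerate(categories):
        positions = [k for k, v in enumerate(values) if v == old]
        assert len(positions) == sum(flows[i]), "row sum must match count"
        rng.shuffle(positions)  # random choice of which records change
        start = 0
        for j, new in enumerate(categories):
            for k in positions[start:start + flows[i][j]]:
                result[k] = new
            start += flows[i][j]
    return result

categories = ["a", "b", "c"]
values = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
# Rows and columns both sum to the counts 6, 3, 1.
flows = [
    [4, 1, 1],
    [2, 1, 0],
    [0, 1, 0],
]
perturbed = perturb_exact(values, categories, flows, random.Random(0))
```

The diagonal of `flows` gives the number of records left unchanged in each category, so the off-diagonal total is the number of records whose value actually changes.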
Results: Disclosure risks
• Our aim was to only protect against attempts at exact matching. Assumed that perturbing the value of one variable in a high risk record provides sufficient protection
• Protection by high outflow, but low inflow
• Results showed high proportions changed, except for last variables in sequence
• Acceptable, since these variables had the lowest overall contribution to disclosure risk, and only a small number of records were affected
Results: Data Quality
Preservation of the univariate frequencies - excellent results
Preservation of the multivariate frequencies
a) very few records failed the edit checks
b) compare tables before and after PRAM:
c) Each cell: ratio of the relative error due to PRAM to the relative sampling error
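The per-cell comparison can be sketched as follows. The slides do not give the formula used for the sampling error, so the sketch assumes, for illustration only, that the standard error of a cell count is approximated by the square root of the count before PRAM. The function name `error_ratio` and the example cell values are hypothetical.

```python
import math

def error_ratio(before, after):
    """Ratio of the PRAM error to an assumed sampling error for one cell.

    Assumption: the sampling standard error of a cell count is
    approximated by sqrt(before); the source does not give its formula.
    """
    if before == 0:
        return None  # ratio undefined for empty cells
    rel_pram_error = abs(after - before) / before
    rel_sampling_error = math.sqrt(before) / before
    return rel_pram_error / rel_sampling_error

# Hypothetical (before, after) cell counts.
ratios = [error_ratio(b, a) for b, a in [(100, 105), (4, 8), (0, 1)]]
```

A ratio above 1 means the change introduced by PRAM exceeds the assumed sampling error for that cell.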
Effect on Data Quality
• Results from 15 tables (nearly 3,000 cells)
• The effect of perturbation relative to sample error decreases as the cell size increases. Thus the damage done by PRAM is greater for cells with low frequencies.
Table 1: Percentage of Cells across all tables with a ratio of the error due to PRAM and the sampling error of greater than 1 and 2
| Cell Frequency Before PRAM | 0-5 | 6-10 | 11-20 | 21-40 | 41-90 | 91-150 | 150-500 | 500+ | Total |
|---|---|---|---|---|---|---|---|---|---|
| Percentage of cells with a ratio >1 | 35 | 25 | 24 | 13 | 15 | 10 | 17 | 10 | 16 |
| Percentage of cells with a ratio >2 | 9 | 8 | 6 | 4 | 5 | 4 | 7 | 4 | 5 |
Conclusions
• As used in this context on targeted records, PRAM is an efficient and easily controlled method of data perturbation.
• Applying PRAM to a small proportion of the file has allowed us to strike a good balance between recoding and minimising the damage from perturbation.