The Application for Statistical Processing at SURS · 2017-11-03 · The Application for...

The Application for Statistical Processing at SURS

Andreja Smukavec, SURS

Rudi Seljak, SURS

UNECE Statistical Data Confidentiality Work Session

Helsinki, 5 – 7 October 2015

Old system

• Stove-pipe oriented production

– Ad-hoc solutions were developed for a particular survey

• Survey methodologists' drive for improvement was crucial

– “Our data are not confidential”

• Process metadata were not organized

– Difficulties when a survey methodologist resigns

Renovation

• An internal project started in 2012

– IT, General Methodology and subject-matter specialists

– Build a global solution appropriate for most of the surveys

– Solution which covers most of the parts of statistical production:

• Data validation

• Data editing and imputation

• Aggregation and standard error estimation

• Statistical disclosure control for tabular data

• Tabulation

Renewed system

• Generalised metadata driven application

– Database of process metadata

• MS Access -> ORACLE

• For each survey instance

– General SAS code

– GUI for process metadata

– Different microdata environments allowed, just some basic rules for the structure of microdata databases

• Ad hoc SAS program for preparation of microdata
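
The idea of a metadata-driven application can be illustrated with a minimal sketch: the processing rules live as data, and one general program applies whatever rules it finds. The sketch below is written in Python rather than SAS, and the rule layout, the variable names (`turnover`, `employees`) and the function `validate` are hypothetical; the actual structure of the SURS process metadata is not described on the slides.

```python
import operator

# Comparison operators that a rule may refer to by name.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

# Hypothetical process metadata: in a real system these rows would be
# read from the metadata database instead of being hard-coded.
VALIDATION_RULES = [
    {"rule_id": "R1", "variable": "turnover", "op": ">=", "value": 0},
    {"rule_id": "R2", "variable": "employees", "op": ">", "value": 0},
]


def validate(records, rules):
    """Return (record index, rule_id) for every rule a record fails."""
    failures = []
    for i, rec in enumerate(records):
        for rule in rules:
            check = OPS[rule["op"]]
            if not check(rec[rule["variable"]], rule["value"]):
                failures.append((i, rule["rule_id"]))
    return failures


records = [
    {"turnover": 100, "employees": 5},
    {"turnover": -1, "employees": 0},
]
print(validate(records, VALIDATION_RULES))  # [(1, 'R1'), (1, 'R2')]
```

Changing a rule here means editing a metadata row, not the general program, which is the property the renewed system relies on.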

Schematic presentation of the renewed system

[Diagram. Components shown: Different microdata databases; Ad-hoc programs; General SAS program; Database of process metadata; Metadata repository; Data on tables and variables; Application for management; Different kinds of output]

Tabular data protection

1. Calculation of primary sensitivity for seven types of statistics: number, total, share, ratio, average…

– Threshold, p%-rule, (n,k)-dominance rule

– “Holding rule” + sampling weights

– Zeroes unsafe

2. Secondary suppression applied in case of sensitive statistics (number and total)

– SAS-Tool (Excel file with metadata, Tau-Argus, SAS macros)
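
The three primary sensitivity rules named above can be sketched for a single cell as follows. This is an illustration in Python using the common textbook formulations, not the SAS code of the application. The parameter values (`min_count=3`, `p=10`, `n=1`, `k=80`) are assumptions chosen for the example, contributions are assumed to be non-negative, and the holding rule and sampling weights are not modelled.

```python
def threshold_unsafe(contributions, min_count=3):
    """Threshold rule: unsafe if the cell has fewer than min_count contributors."""
    return len(contributions) < min_count


def p_percent_unsafe(contributions, p=10.0):
    """p%-rule: unsafe if the cell total minus the two largest contributions
    is less than p% of the largest contribution."""
    xs = sorted(contributions, reverse=True)
    if not xs:
        return False  # an empty cell is left to the other rules
    largest = xs[0]
    second = xs[1] if len(xs) > 1 else 0.0
    remainder = sum(xs) - largest - second
    return remainder < (p / 100.0) * largest


def dominance_unsafe(contributions, n=1, k=80.0):
    """(n,k)-dominance rule: unsafe if the n largest contributions
    exceed k% of the cell total."""
    xs = sorted(contributions, reverse=True)
    total = sum(xs)
    if total <= 0:
        return False
    return sum(xs[:n]) > (k / 100.0) * total


def primary_sensitivity(contributions):
    """Return the names of all rules that mark the cell as unsafe."""
    checks = [
        ("threshold", threshold_unsafe),
        ("p_percent", p_percent_unsafe),
        ("dominance", dominance_unsafe),
    ]
    return [name for name, rule in checks if rule(contributions)]


print(primary_sensitivity([90, 5, 3, 2]))     # ['p_percent', 'dominance']
print(primary_sensitivity([30, 30, 20, 20]))  # []
```

In the first cell one contributor holds 90 of 100, so the remaining small contributors give the second-largest one a close estimate of the largest; the second cell is spread evenly and passes all three rules.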

Tabular data protection

• Results for each survey instance saved in the database with statistics (ORACLE)

– Statuses for lower precision

– Confidentiality flags for the type of primary and secondary suppression

• 3 types of tabulation (codelists)

– Excel format (the most user-friendly)

– plain text format (.tab, .hrc) for Tau-Argus

– plain text format (.csv) for PX-Edit (SURS’s publication tool)

Tabulation & Tabular Data Protection

[Diagram. Components shown: Different microdata databases; Ad-hoc program; General SAS program; Database of process metadata; Calculation of statistics; Database with statistics; Tabular protection; Tabulation; Output tables]

Parameters for SDC in MetaSOP

Tabulation in MetaSOP

Processing in MetaSOP

Example of 3-dimensional table

After aggregation

CC_SI  Dim_2  Dim_3: TOT    F          O
TOT    TOT    1209943548    1.09E+09   1.23E+08
TOT    1      37700934.42   35625442   2075493
TOT    11     47110694.48   46417660   693034.1
TOT    2      733763444.2   6.62E+08   71456295
TOT    21     517712620.1   4.8E+08    37489998
TOT    22     161044502.5   1.1E+08    50837088
TOT    23     37903335.85   37783060   120275.8
TOT    24     343495995.1   2.86E+08   57438583
11     TOT    59283130.99   56199883   3083248
11     1      64428657.15   62453677   1974980
11     11     21989840.69   21609892   379948.2
11     2      69502173.33   67377101   2125073
11     21     13959568.67   13959569   -
11     22     338148.7639   338148.8   z
11     23     7911125.122   7911125    -
11     24     27886089.54   26016025   1870064
12     TOT    215349659.2   2.04E+08   11792968
12     1      5993635.356   5993635    -
12     11     2035728.954   2035729    -
12     2      55635358.28   54430511   1204847
12     21     146242216.3   1.43E+08   2783876
12     22     4164502.417   3872003    292499.2
12     23     38774447.75   34931862   3842585
12     24     42332750.72   37447112   4885639
21     TOT    176972728     1.76E+08   1323998
21     1      2248602.352   2248602    z
21     11     166013.5624   166013.6   z
21     2      372993785.9   3.69E+08   4134769
21     21     418831917.8   4.08E+08   10337323
21     22     29411096.08   29411096   z
21     23     56581.5975    56581.6    z
21     24     88244091.34   86483431   1760660

After use of SAS-Tool

CC_SI  Dim_2  Dim_3: TOT    F          O
TOT    TOT    1209943548    1.09E+09   1.23E+08
TOT    1      37700934.42   35625442   2075493
TOT    11     47110694.48   46417660   693034.1
TOT    2      733763444.2   6.62E+08   71456295
TOT    21     517712620.1   4.8E+08    37489998
TOT    22     161044502.5   1.1E+08    50837088
TOT    23     37903335.85   37783060   120275.8
TOT    24     343495995.1   2.86E+08   57438583
11     TOT    59283130.99   56199883   3083248
11     1      64428657.15   z          z
11     11     21989840.69   z          z
11     2      69502173.33   z          z
11     21     13959568.67   13959569   -
11     22     338148.763    z          z
11     23     7911125.122   7911125    -
11     24     27886089.54   z          z
12     TOT    215349659.2   2.04E+08   11792968
12     1      5993635.356   5993635    -
12     11     2035728.954   2035729    -
12     2      55635358.28   54430511   1204847
12     21     146242216.3   1.43E+08   2783876
12     22     4164502.417   z          z
12     23     38774447.75   z          z
12     24     42332750.72   z          z
21     TOT    176972728     1.76E+08   1323998
21     1      z             z          z
21     11     z             z          z
21     2      z             z          z
21     21     418831917.8   4.08E+08   10337323
21     22     29411096.08   z          z
21     23     z             z          z
21     24     88244091.34   z          z
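
Why secondary suppression is needed can be seen from a simple row check: if the total of a row is published and exactly one of its component cells is hidden, the hidden value follows by subtraction. The Python sketch below flags such rows. It looks only along Dim_3 within one row, whereas a tool such as Tau-Argus protects all dimensions and hierarchies at once. The function name and data layout are hypothetical, and treating "-" as a published (empty) cell is an assumption. The three sample rows are taken from the two tables above.

```python
def single_suppressed_rows(rows, marker="z"):
    """Return labels of rows whose total is published while exactly one
    component cell is suppressed (so it can be derived by subtraction)."""
    flagged = []
    for label, total, parts in rows:
        n_hidden = sum(1 for p in parts if p == marker)
        if total != marker and n_hidden == 1:
            flagged.append(label)
    return flagged


# (CC_SI, Dim_2), value for Dim_3 = TOT, [value for F, value for O]
after_aggregation = [
    (("11", "21"), 13959568.67, [13959569, "-"]),
    (("11", "22"), 338148.7639, [338148.8, "z"]),
    (("21", "1"), 2248602.352, [2248602, "z"]),
]
after_sas_tool = [
    (("11", "21"), 13959568.67, [13959569, "-"]),
    (("11", "22"), 338148.763, ["z", "z"]),
    (("21", "1"), "z", ["z", "z"]),
]

print(single_suppressed_rows(after_aggregation))  # [('11', '22'), ('21', '1')]
print(single_suppressed_rows(after_sas_tool))     # []
```

After aggregation, two of the sample rows still expose their single hidden cell; after the SAS-Tool run no sample row has a lone suppressed cell next to a published total.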

New organization

• Old system:

– Every survey had its own programmer and its own general methodologist

• Renewed system:

– General methodologist and IT expert (“support team”) help the subject-matter specialist to

• enter and edit the process metadata (except for SDC) in the application

• run particular parts of the statistical process

Advantages

• The subject-matter personnel's skills improve (higher quality of data)

• The process metadata can be changed easily and the procedure can be repeated in short time (flexibility)

• The rules for data processing are gathered in one place (transparency)

Drawbacks

• High risk of syntax errors when inserting metadata expressions

• Subject-matter personnel have to learn some new skills (SAS expressions)

• An error during execution can cause problems if the support team is busy or not available

Challenges for the future

• Introduce the application successfully into production

– Adjusting to changes by the subject-matter specialists

– Building a qualified support team

• Adding new functionalities

– Indices

– Secondary suppression for other types of statistics

– GUI instead of the Excel file for the SAS-Tool

Thank you for your attention.