Pilot Census in Poland: Some Quality Aspects. Geneva, 7-9 July 2010



Janusz Dygaszewicz

Central Statistical Office, Poland

Data processing infrastructure

[Diagram. Components shown: Registry 1, Registry 2, ..., Registry n; XML and TXT files; ETL Tools; Portal; CAxI; Questionnaires; Operational Microdata Base; Golden Record; Statistical Files; Analytical Microdata Base; Metadata server; Metadata; SDMX.]

Census Quality

Key elements of the census process in terms of census quality:
• Census planning (scope of the census),
• Data sources,
• Data collecting,
• Data storing,
• Data processing,
• Development of census results,
• Dissemination of census results,
• Census Metadata System.

CENSUS PLANNING


Census Quality

Census planning
Quality aspects: relevance, accuracy, costs including the burden on respondents, information security

• Determining the data scope defined in the Act, including:
  • compliance with the needs of domestic and EU users,
  • quality of data sources,
  • coherence and comparability of results from the 2011 and 2002 censuses.

DATA ACQUISITION

[Diagram: data processing infrastructure, data acquisition stage]

Data acquisition

File formats:
• flat files,
• XML files,
• local databases.

XML files integration.

Data acquisition - Portal


Census Quality – data acquisition

Data sources
Quality aspects: accuracy, timeliness and punctuality, comparability and coherence, costs including the burden on respondents, information security

• Assessment of data source quality for the census:
  • analysis of the methodological compliance of concepts and definitions from registers with those adopted in statistics and in the UNECE and EUROSTAT Recommendations for the 2010 Censuses of Population and Housing,
  • developing a methodology for compliance analyses,
  • constructing the IT system PiK for describing, comparing and assessing the level of coherence.

Census Quality – data acquisition

Registers
• developing a methodology for assessing quality: dimensions, quality indicators,
• evaluation and description of source quality,
• MATRIX that represents the possibility of obtaining the values for the census from registers:
  • census variable compliance indicators (methodology compliance indicator),
  • register suitability indicators (population coverage indicator for data from the register).
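As a rough illustration of the two kinds of indicators in the matrix, the sketch below computes a population coverage indicator as the share of a reference population found in a register, and a methodology compliance indicator as the share of census variables whose register definition is judged compatible. Both formulas, the function names and the sample data are assumptions made for illustration; the presentation does not define how the indicators are calculated.

```python
def population_coverage(register_ids, reference_ids):
    """Share of the reference population that appears in the register (0.0-1.0)."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference population is empty")
    return len(reference & set(register_ids)) / len(reference)


def methodology_compliance(variable_assessments):
    """Share of census variables whose register definition was judged compliant.

    variable_assessments maps a census variable name to True or False.
    """
    if not variable_assessments:
        raise ValueError("no variables assessed")
    return sum(variable_assessments.values()) / len(variable_assessments)


# Hypothetical data: person identifiers in a register vs. a reference list.
register = ["P1", "P2", "P3", "P5"]
reference = ["P1", "P2", "P3", "P4"]
coverage = population_coverage(register, reference)

# Hypothetical expert assessment of three census variables.
assessments = {"sex": True, "date_of_birth": True, "marital_status": False}
compliance = methodology_compliance(assessments)

print(f"coverage={coverage:.2f}, compliance={compliance:.2f}")
```

One row of such a matrix would then hold, for each register and census variable, a pair of values like these.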

Census Quality – data acquisition

Data sets
• developing a methodology for assessing quality,
• evaluation and description of data set quality,
• developing a methodology for improving the quality of source data sets, with rules for: standardization, normalization, de-duplication, editing, imputation, calibration.
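Two of the rule types listed above, standardization and de-duplication, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the field names, the choice to upper-case and strip diacritics, and the "keep the first record" policy are all hypothetical and are not taken from the presentation.

```python
import unicodedata


def standardize(value):
    """Trim, collapse inner whitespace, upper-case and strip combining accents.

    Letters without a Unicode decomposition (for example the Polish L with
    stroke) are left unchanged by this simple approach.
    """
    collapsed = " ".join(value.split()).upper()
    decomposed = unicodedata.normalize("NFKD", collapsed)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def deduplicate(records, key_fields):
    """Keep the first record for each standardized key and return the kept records."""
    seen = set()
    kept = []
    for record in records:
        key = tuple(standardize(record[field]) for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


# Hypothetical source rows: the first two describe the same person.
records = [
    {"name": "Jan  Kowalski", "city": "Kraków"},
    {"name": "JAN KOWALSKI", "city": "krakow "},
    {"name": "Anna Nowak", "city": "Gdańsk"},
]
unique_records = deduplicate(records, ["name", "city"])
print(len(unique_records))
```

Standardizing before comparing is what lets the two spellings of the same person collapse into one record.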

CENSUS FRAME PREPARATION


Census Frame preparation

• Preparing the list of citizens, buildings and dwellings,
• Integrating the list of citizens, buildings and dwellings with statistical data,
• Preparing the Census Frame.

Quality of Census Frame

• Goal frame preparation,
• Random sample preparation.

Census frame pre-census revision: field checks by enumerators

Census frame preparation: validation and updating in counties

Enumerator tracking


Census Completeness Monitoring


TRANSFORMATION TO STATISTICAL REGISTER

[Diagram: data processing infrastructure, source data collection and preparation stage]

Source data collection and preparation

• Loading registers into the data laboratory environment,
• Denormalization,
• Standardization,
• Deduplication,
• Validation,
• Data completion,
• Vocabulary validation and automatic correction,
• Generation of statistical files (registers).
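The step "vocabulary validation and automatic correction" could look roughly like the sketch below, which checks a value against a closed dictionary and, if it is not found, substitutes the closest dictionary entry. The use of Python's difflib, the similarity cutoff of 0.8 and the sample vocabulary are assumptions for illustration; the presentation does not say which matching technique was used.

```python
import difflib

# Hypothetical controlled vocabulary for one register variable.
VOCABULARY = ["single", "married", "widowed", "divorced"]


def correct_value(value, vocabulary, cutoff=0.8):
    """Return (value, status); status is 'valid', 'corrected' or 'rejected'."""
    candidate = value.strip().lower()
    if candidate in vocabulary:
        return candidate, "valid"
    # Pick the single closest dictionary entry above the similarity cutoff.
    matches = difflib.get_close_matches(candidate, vocabulary, n=1, cutoff=cutoff)
    if matches:
        return matches[0], "corrected"
    # Nothing similar enough: leave the value for manual review.
    return value, "rejected"


for raw in ["married", "Maried", "xyz"]:
    print(raw, "->", correct_value(raw, VOCABULARY))
```

Values that come back as "rejected" would be candidates for the manual correction mentioned elsewhere in the deck.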

Census Quality – collection and preparation

Collecting data
Quality aspects: accuracy, costs including the burden on respondents, information security

• Collecting data from information systems:
  • central registers,
  • distributed registers,
• format / file structure (XSD schemas),
• data transfer platform,
• application for encrypted data transfer,
• application for validation and data set control.

Source data loading and correction

• Loading data into the Operational Microdata Base,
• Validation,
• Manual and automatic correction (cleaning),
• Deduplication,
• Calculating variables.

[Diagram: data processing infrastructure, CAxI stage]

CAxI

• CAII - Computer Assisted Internet Interview,
• CAPI - Computer Assisted Personal Interview,
• CATI - Computer Assisted Telephone Interviewing.

Census Quality – data collection by electronic questionnaire

CAxI
• Collecting data from respondents: CAII, CAPI, CATI;
• CAxI input validation:
  • numerical data validation (answers within boundaries),
  • cross-question arithmetical validation,
  • hints and automatic answer completion,
  • dictionaries and drop-down menus;
• CAxI logical validation:
  • answers determined by questions,
  • cross-question logical validation,
  • logical paths of data collection.
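The validation types listed above can be illustrated with a small checker for a single electronic questionnaire. This is only a sketch: the question names, the age limits, the minimum age for the employment question and the rule that adults plus children equals household size are hypothetical and do not come from the actual census forms.

```python
def validate_answers(answers):
    """Return a list of error messages for one questionnaire (empty if valid)."""
    errors = []

    # Numerical validation: the answer must lie within boundaries.
    age = answers.get("age")
    if age is None or not 0 <= age <= 120:
        errors.append("age must be between 0 and 120")

    # Cross-question arithmetical validation: parts must sum to the total.
    total = answers.get("household_size", 0)
    adults = answers.get("adults", 0)
    children = answers.get("children", 0)
    if adults + children != total:
        errors.append("adults + children must equal household_size")

    # Logical path: the employment question is skipped for young respondents.
    if age is not None and age < 15 and answers.get("employment_status") is not None:
        errors.append("employment_status must be skipped for respondents under 15")

    # Dictionary validation: the answer must come from a closed list.
    if answers.get("sex") not in {"F", "M"}:
        errors.append("sex must be one of F, M")

    return errors


valid_form = {"age": 34, "household_size": 3, "adults": 2, "children": 1,
              "employment_status": "employed", "sex": "F"}
invalid_form = {"age": 10, "household_size": 4, "adults": 2, "children": 1,
                "employment_status": "employed", "sex": "X"}

print(validate_answers(valid_form))
print(validate_answers(invalid_form))
```

Running such checks at entry time, in all three interview modes, is what lets errors be fixed while the respondent is still available.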

Census Quality

Data storing
Quality aspects: information security

• Storing data in the Operational Microdata Base,
• Notifying the Operational Microdata Base for registration by the General Inspector for Protection of Personal Data.

GOLDEN RECORD

[Diagram: data processing infrastructure, Golden Record generation stage]

[Diagram: data processing infrastructure, export to the Analytical Microdata Base stage]

Golden Record generation

• Integration with Census Frame and CAxI data,
• Validation,
• Correction,
• Operational imputation,
• Transfer of proper values to the Golden Record.
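One simple way to picture "transfer of proper values to the Golden Record" is a priority rule: for each census variable, take the first non-empty value from an ordered list of sources. The sketch below assumes such a rule; the source names, their order and the variables are hypothetical, and the presentation does not state how the proper value is actually chosen.

```python
# Assumed priority order: earlier sources win when they have a value.
SOURCE_PRIORITY = ["caxi", "register_1", "register_2"]


def build_golden_record(person_id, sources, variables):
    """Return (record, provenance) built from the first non-empty source value."""
    record = {"person_id": person_id}
    provenance = {}
    for variable in variables:
        record[variable] = None  # stays None if no source has a value
        for source in SOURCE_PRIORITY:
            value = sources.get(source, {}).get(variable)
            if value not in (None, ""):
                record[variable] = value
                provenance[variable] = source
                break
    return record, provenance


# Hypothetical values for one person from three sources.
sources = {
    "caxi": {"education": "tertiary", "address": ""},
    "register_1": {"address": "Warszawa", "date_of_birth": "1980-05-17"},
    "register_2": {"date_of_birth": "1980-05-17", "education": "secondary"},
}
golden, origin = build_golden_record(
    "P1", sources, ["education", "address", "date_of_birth", "citizenship"]
)
print(golden)
print(origin)
```

Variables left as None after this step would be the ones passed on to imputation.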

Export to Analytical Microdata Base

OMB layers: Registers 1..n, CAxI, Golden Record

• Preparing transition tables,
• Anonymisation of Golden Records,
• Transfer to the Analytical Microdata Base.
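The anonymisation step before export might, in its simplest form, drop direct identifiers, replace the person identifier with a keyed hash and coarsen the date of birth. The sketch below shows that idea only; the field names and the choice of HMAC-SHA-256 are assumptions, and a keyed hash is strictly pseudonymisation, which may be weaker than whatever anonymisation rules were actually applied.

```python
import hashlib
import hmac

# Assumed names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "address"}


def anonymise(record, secret_key):
    """Return a copy of the record that is safer to export for analysis."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if field == "person_id":
            # Keyed hash: stable across exports, not reversible without the key.
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            out["pseudo_id"] = digest.hexdigest()
        elif field == "date_of_birth":
            out["birth_year"] = int(str(value)[:4])  # keep the year only
        else:
            out[field] = value
    return out


# Hypothetical Golden Record and key.
golden_record = {
    "person_id": "P1",
    "name": "Jan Kowalski",
    "address": "Warszawa",
    "date_of_birth": "1980-05-17",
    "education": "tertiary",
}
exported = anonymise(golden_record, secret_key=b"example-key")
print(sorted(exported))
```

Because the hash is keyed and deterministic, the same person receives the same pseudo_id in every export made with the same key, which keeps records linkable inside the analytical base.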

Census Quality

Data processing
Quality aspects: accuracy

• Developing quality indicators for data sets at each stage of data processing and the procedures for calculating their values,
• Developing procedures for bringing data from administrative sources into full compliance, or minimum discrepancy, with the appropriate methodology adopted in statistics,
• Developing procedures for normalization and editing of data sets from the administrative systems, including the imputation of data (administrative data sets),
• Developing procedures for synchronization of data from administrative systems,
• Developing rules for linking data from different administrative systems,
• Developing rules for linking data from administrative systems with data from CAII, CAPI, CATI,
• Developing rules for calculation of Golden Record census variables,
• Developing rules for anonymisation of Golden Record census data.
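The rules for linking administrative data with CAII, CAPI and CATI data can be pictured as a deterministic join on a shared identifier, with the match rate serving as a simple quality indicator. The sketch below assumes that a common key called person_id exists in both sources; that key, the field names and the match-rate indicator are illustrative assumptions rather than the rules developed for the census.

```python
def link_records(register_rows, caxi_rows, key="person_id"):
    """Join register rows to CAxI rows on a shared key.

    Returns (linked, unmatched, match_rate), where match_rate is the share of
    register rows that found a CAxI partner.
    """
    caxi_by_key = {row[key]: row for row in caxi_rows}
    linked, unmatched = [], []
    for row in register_rows:
        match = caxi_by_key.get(row[key])
        if match is None:
            unmatched.append(row)
        else:
            # Prefix CAxI fields so the origin of each value stays visible.
            extra = {f"caxi_{k}": v for k, v in match.items() if k != key}
            linked.append({**row, **extra})
    match_rate = len(linked) / len(register_rows) if register_rows else 0.0
    return linked, unmatched, match_rate


# Hypothetical rows from one register and from the electronic questionnaires.
register_rows = [
    {"person_id": "P1", "address": "A"},
    {"person_id": "P2", "address": "B"},
    {"person_id": "P3", "address": "C"},
]
caxi_rows = [
    {"person_id": "P1", "education": "tertiary"},
    {"person_id": "P3", "education": "primary"},
]
linked, unmatched, match_rate = link_records(register_rows, caxi_rows)
print(len(linked), len(unmatched), round(match_rate, 2))
```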

ANALYTICAL MICRODATABASE

[Diagram: data processing infrastructure, Analytical Microdata Base stage]

Analytical Microdata Base - process

• Process data: load data and metadata, integrate data, classify and code data, edit and validate data, impute, derive new variables, wage, aggregate, create files
• Analyse
• Disseminate
• Archive
• Manage metainformation
• Manage quality

Analytical Microdatabase

Functionality:
• Administration
• Information Security Management
• Data Processing
• Information Analysis
• Requirement and Product Management
• Dissemination
• Metadata
• Quality Management

Census Quality

Development of census results
Quality aspects: relevance, accuracy, comparability and coherence

• Developing rules for missing data completion: imputation and calibration,
• Developing rules for creating derived objects, that is, new objects (households, families),
• Developing a model / method of data estimation using data from administrative systems and sample surveys,
• Developing rules for calculating data outputs.
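Calibration, mentioned above alongside imputation, can be illustrated in its simplest form as post-stratification: sample units in each stratum receive a weight so that the weighted count matches a known total, for example one taken from an administrative register. The strata, totals and field names below are hypothetical, and the estimation method actually used for the census is not described in the presentation.

```python
from collections import Counter


def poststratify(sample, stratum_totals, stratum_field="stratum"):
    """Attach a weight to each unit so weighted counts match known stratum totals."""
    counts = Counter(unit[stratum_field] for unit in sample)
    weighted = []
    for unit in sample:
        stratum = unit[stratum_field]
        # Each unit represents (known total / sampled units) population members.
        weight = stratum_totals[stratum] / counts[stratum]
        weighted.append({**unit, "weight": weight})
    return weighted


# Hypothetical sample of three persons and known totals from a register.
sample = [
    {"id": 1, "stratum": "urban"},
    {"id": 2, "stratum": "urban"},
    {"id": 3, "stratum": "rural"},
]
totals = {"urban": 100, "rural": 30}
weighted_sample = poststratify(sample, totals)
print([unit["weight"] for unit in weighted_sample])
```

After weighting, the sum of weights in each stratum equals the register total, which is the defining property of a calibrated estimate.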

DISSEMINATION

Census Quality - dissemination

Dissemination of census results
Quality aspects: relevance, timeliness and punctuality, accessibility and clarity, comparability and coherence, information security

• Designing Analytical Microdata Base features, including compliance with users' needs and the accessibility and clarity of census data.

METAINFORMATION MANAGEMENT

[Diagram: data processing infrastructure, Metadata server highlighted as the stage caption]

Metainformation management

[Diagram: metainformation categories: Definition, Business, Referential, Conceptual, Methodical, Quality, Structural, Technical, System, Postprocessing]

Census Quality – metainformation

Census Metadata System
Quality aspects: accessibility and clarity

• Developing quality indicators at each stage of the census and the procedures for calculating their value.
