On Implementing CSPA Specifications for Editing and
Imputation Services
Donato Summa, Monica Scannapieco, Diego Zardetto, Istat, Italy
Istituto Nazionale di Statistica – ISTAT
The CSPA concept
• National Statistical Institutes (NSIs) produce official statistics with very similar goals
• Common activities are carried out independently, almost without relying on shared solutions
• Statistical organizations have attempted many times to share their processes, methodologies and software solutions, but integrating them requires significant work
Donato Summa, Work Session on Statistical Data Editing, Paris, 28-30/04/2014
The CSPA concept
• As part of the modernization effort in the field of official statistics, the High Level Group for the Modernization of Statistical Production and Services (HLG) has taken action to address these issues
• The HLG promotes the development and implementation of the CSPA (Common Statistical Production Architecture)
The CSPA concept
CSPA provides a template architecture for official statistics, describing:
• What the official statistical industry wants to achieve
• How the industry can achieve this, i.e. principles that guide how statistics are produced
• What the industry will have to do, i.e. comply with the CSPA
The CSPA concept
[Figure: tools (editrules, CANCEIS, SCSTools), CSPA-compliant services, platforms]
The Error Localization service
• In the 2013 CSPA proof of concept (POC) initiative, Istat took responsibility for developing the CSPA Error Localization service, acting as designer, builder and assembler
• It was decided to wrap the “localizeErrors” function of the “editrules” R package, developed at Statistics Netherlands
The Error Localization service
• Data used for the test cases come from Istat’s Structure of Earnings Survey
• Input unit data sets involve 20 variables
• The rule set consists of 44 edits involving 17 numeric variables appearing in the unit data sets
• 3 different test cases use the same rule set
Data set 1: 1000 erroneous records
Data set 2: 2000 exact records
Data set 3: 3000 mixed records
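To make the notions of "edit" and "erroneous record" concrete, the following is a minimal Java sketch of checking a record against a set of numeric edits. It is an illustration only: the variable names (`paidHours`, `grossEarnings`, `overtimeEarnings`) and the two edits are hypothetical and are not taken from the Structure of Earnings Survey rule set. The sketch only reports which edits a record violates; it does not perform error localization (deciding which fields are wrong), which is the task of “localizeErrors”.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class EditCheckSketch {

    /** Returns the names of the edits that the given record violates. */
    static List<String> violatedEdits(Map<String, Double> record,
                                      Map<String, Predicate<Map<String, Double>>> edits) {
        List<String> violated = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, Double>>> edit : edits.entrySet()) {
            // An edit is a condition that a correct record must satisfy.
            if (!edit.getValue().test(record)) {
                violated.add(edit.getKey());
            }
        }
        return violated;
    }

    /** Two hypothetical edits on hypothetical numeric variables. */
    static Map<String, Predicate<Map<String, Double>>> exampleEdits() {
        Map<String, Predicate<Map<String, Double>>> edits = new LinkedHashMap<>();
        edits.put("hours_nonnegative", r -> r.get("paidHours") >= 0);
        edits.put("gross_at_least_overtime",
                r -> r.get("grossEarnings") >= r.get("overtimeEarnings"));
        return edits;
    }

    public static void main(String[] args) {
        // A record whose overtime earnings exceed its gross earnings.
        Map<String, Double> record = Map.of(
                "paidHours", 160.0,
                "grossEarnings", 500.0,
                "overtimeEarnings", 800.0);
        System.out.println(violatedEdits(record, exampleEdits()));
    }
}
```

In these terms, an "exact" record is one for which the returned list is empty, and an "erroneous" record is one that violates at least one edit.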
The Error Localization service
• The service was implemented as a standalone Java application (executable jar file) that wraps the “localizeErrors” function of the “editrules” R package
• The jar can be called from a GUI or from the command line and is responsible for:
– Taking the input parameters from the user (or the calling application)
– Invoking the execution of the R script in the R environment with the provided input parameters
– Returning the output parameters (output file generation)
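The three responsibilities listed above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the Istat implementation: the slides do not describe the jar's actual interface, so the class name, the argument order (script, data file, rules file, output file) and the convention that the R script writes its result to the given output path are all hypothetical. It assumes the standard `Rscript` launcher is available on the system path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class ErrorLocalizationWrapperSketch {

    /** Builds the command line that runs the R script with the given inputs. */
    static List<String> buildCommand(String script, String data, String rules, String output) {
        // Arguments after the script name are passed on to the R script by Rscript.
        // The argument order is an assumption made for this sketch.
        return Arrays.asList("Rscript", script, data, rules, output);
    }

    /** Runs the R script and returns the path of the generated output file. */
    static Path run(String script, String data, String rules, String output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(script, data, rules, output))
                .inheritIO() // forward the R process's stdout and stderr
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("R script failed with exit code " + exitCode);
        }
        Path out = Paths.get(output);
        if (!Files.exists(out)) {
            throw new IOException("R script did not produce " + out);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Step 1: take the input parameters from the command line.
        if (args.length != 4) {
            System.err.println("usage: <script.R> <data file> <rules file> <output file>");
            return;
        }
        // Steps 2 and 3: invoke R, then report where the output file was written.
        System.out.println(run(args[0], args[1], args[2], args[3]));
    }
}
```

Keeping the command construction in its own method makes the wrapper testable without an R installation, and the same `run` method could be called from a GUI as well as from the command line.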
The Error Localization service
• The Error Localization service wrapped by the Java program was then deployed on CORE, thus demonstrating the full compatibility of CSPA services with a specific NSI’s internal platform
• CORE (COmmon Reference Environment) is Istat’s internal platform for executing statistical processes
[Figure: diagram with the labels Tool, CSPA Service, Platform]
Conclusion
• Istat is currently involved in the 2014 CSPA Implementation project, with the role of developing the Error Correction service
• The following activities are ongoing:
– study how to extend such a service in order to perform a full editing and imputation process
– design a CSPA specification, to be shared and agreed among CSPA implementation project participants
– implement the specifications by means of concrete CSPA services wrapping existing tools
Thank you for your attention!