
DwB – Data without Boundaries

Additional Workshop – Metadata Standards

07/12/2011


DWB workshop on Metadata Standards

Session 3: SDMX and GSBPM

August GÖTZFRIED, Eurostat

The outline of the presentation:

1. Introduction
2. What is SDMX and why do we need it?
3. The SDMX information model
4. SDMX used along the GSBPM
5. SDMX used in the European Statistical System
6. Conclusion

1. Introduction

The Commission Communication 404/2009 “On the production method of EU statistics: a vision for the next decade”

The basic ideas of this communication (from 08/2009) are:
• From statistical stove pipes to more integrated statistical production processes;
• Better integration of the ESS in terms of IT infrastructure, software, data quality, metadata, methodology etc. (both in terms of horizontal and vertical integration);
• Increased use of administrative data sources in the statistical data production processes;
• Statistical legislation should be broader and more cross-cutting in covering larger statistical domains.


The Joint ESS strategy paper (05/2010): The basic principles

This joint ESS strategy paper was created on the basis of the Communication 404/2009.

• Principle 1: fulfilling the needs of multiple users for policy decision making (e.g. EU 2020).
• Principle 2: focussing on processes, tools and infrastructure of statistical production; priorities, programs, products etc. are developed in parallel.
• Principle 3: medium and long-term perspective for statistical business process integration; it covers processes, statistics, IT, data quality, methodology, …
• Principle 4: Better integration of ESS planning.
• Principle 5: subsidiarity needs to be respected; unnecessary duplication of work should however be avoided.

• Principle 6: the ESS is based on a partnership;
• Principle 7: Modernisation of statistical business processes based on integration and standardisation, generic IT tools, infrastructure, methods, metadata, etc.
  – Integration of different data sources (some actions in the MEETS program); increased use of administrative data...
  – Data linking instead of new surveys…
  – Horizontal integration (i.e. between statistical domains) and vertical integration (between the ESS NSIs) to be achieved.
  – European approach to statistics; collaborative networks (ESSnets, sponsorships…).

The Joint ESS strategy paper (05/2010): Actions to be undertaken

• More harmonisation and standardisation of statistical methodologies for data collection, data validation, dissemination etc.
• Harmonising the IT infrastructure and sharing IT tools as facilitator of the use of agreed statistical methods.
• Harmonising metadata (role of metadata in the GSBPM).
• More emphasis on cross-cutting areas such as data quality, statistical methods, methodology, metadata, IT…
• Data quality: need for harmonised high-level quality assurance and adaptation of the quality framework.

• Identify generic processes for producing ESS statistics;
• Metadata driven IT infrastructure based on SDMX;
• Some examples: the New Dissemination chain, the ESS Metadata Handler, Vision Infrastructure Projects...
• First candidates for integration: broader statistical domains such as business statistics, price statistics, etc.
• Many actions already launched some years ago or ongoing (e.g. those based on the MEETS program);
• ……………

Overall: the aim is to reduce cost and burden for the whole European Statistical System

The Joint ESS strategy paper (05/2010): The instruments

• New generation of framework regulations related to the standardisation of the statistical business processes (e.g. related to data/metadata exchange, metadata, data quality, etc.).
• Community financial contribution: ESSnets, grants etc.
• Sponsorships, ESSnets, MEETS and other collaboration networks.
• SDMX (providing statistical and technical standards plus an IT infrastructure and IT tools).

Therefore: SDMX is one of the main enablers of this ESS strategy.

2. What is SDMX and why do we need it?

Some parallels with everyday life

SDMX is an ISO standard allowing data and metadata to be exchanged, understood and operated using standard software.

• USB plug

Role of SDMX?

SDMX aims at improving existing technical and statistical standards for the exchange of statistical information, adapting to new demands generated by the Internet revolution.

What does SDMX consist of?

• A model to describe statistical data and metadata
• A standard for automated communication from machine to machine
• A technology supporting standardised IT tools

In order to take advantage of all this:

• Statisticians agree to use a common description for data and metadata
• The data exchange process is then driven by the common description
• Data descriptions are made available for everybody who wants to understand and reuse the data

SDMX: The background

• The exchange of statistical data and metadata is complex, resource intensive and expensive.
• In the past, national and international organisations have developed specific processes and IT solutions.
• Opportunities and challenges related to new technologies such as XML, web services, etc. arose in the last years.

SDMX is the global answer by main statistical organisations in the world.

SDMX is global

Seven international organizations (BIS, ECB, Eurostat, IMF, OECD, UN, World Bank) have joined forces based on:
– Memorandum of Understanding signed in March 2007
– Rotating chair every two years

U.N. Statistical Commission (02/2008):
• SDMX is recognised as “the preferred standard for the exchange and sharing of data and metadata in the global statistical community”.
• Emphasised the need of further involving national and international agencies.
• Underlined importance of capacity building and outreach (seminars, workshops, manuals, training, technical assistance).
• Requested SDMX Sponsors to continue the SDMX development of technical and statistical standards, IT service infrastructure and IT tools.

The SDMX components

SDMX consists of technical and statistical standards and guidelines, together with an IT service infrastructure and IT tools. This means more in particular:

• The SDMX information model for data and metadata;
• SDMX Content-oriented Guidelines as statistical standards;
• SDMX IT architecture for data and metadata exchange;
• SDMX IT tools supporting the implementation and use of the SDMX technical and statistical standards.

SDMX is not just a data transmission format, but should be used from end-to-end of the statistical business process.

a. The SDMX information model (IM)

• SDMX technical standards: 2.0/2.1
• XML format for the exchange of SDMX structured data and metadata
• Data/metadata structure definitions (DSDs/MSDs) to be defined for statistical domains

More detailed information on the IM in the next chapter

b. The SDMX Content Oriented Guidelines (COG)

• Statistical Cross-Domain Concepts + Code lists: short list of statistical concepts relevant to all statistical domains; to be used within the SDMX technical standards (e.g. frequency, observation status, time format, unit of measurement)
• Statistical Subject-Matter Domains: list of subject-matter domains (e.g. demographic and social statistics, economic statistics, environment)
• Metadata Common Vocabulary: common cross-domain statistical terminology used in the SDMX content-oriented guidelines

See also www.sdmx.org

SDMX takes care of every element of a statistical table

Number of touristic establishments in Italy, annual data

Time       A100 Hotels and similar   B010 Tourist Campsites   B020 Holiday dwellings
2002A00    33411                     2374                     61479
2003A00    33480                     2530                     58526
2004A00    33518                     2529                     56586
2005A00    33527                     2411                     68385

[Figure labels attached to the table: TIME, COUNTRY, FREQUENCY, TOURISM TOPIC, OBS_VALUE, OBS_STATUS (flags P and E); grouped as DIMENSIONS, ATTRIBUTES, MEASURES]

Pos in Key   Dimension or attribute name   Identifier      Presentation   Attachment level   Code list
1            Frequency                     FREQ            A1                                CL_FREQ
2            Country                       COUNTRY         A2                                CL_AREA
3            Tourism topic                 TOURISM_TOPIC   AN4                               CL_TOPIC
4            Time                          TIME            N4
             Observation status            OBS_STATUS      A1             Observation        CL_OBS_STATUS
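The structure table above can be written down as a small data structure. The following Python sketch is only an illustration, not an official SDMX artefact: the class name `Component`, the function `series_key`, the country code `IT` and the reading of "A1" as "one alphabetic character" are assumptions made for this example. It shows how the key positions of the dimensions determine a dot-separated series key.

```python
from typing import NamedTuple, Optional


class Component(NamedTuple):
    identifier: str            # e.g. "FREQ"
    role: str                  # "dimension" or "attribute"
    presentation: str          # e.g. "A1" (assumed: one alphabetic character)
    code_list: Optional[str]   # None where the table lists no code list


# Components copied from the structure table, dimensions in key order.
TOURISM_DSD = [
    Component("FREQ", "dimension", "A1", "CL_FREQ"),
    Component("COUNTRY", "dimension", "A2", "CL_AREA"),
    Component("TOURISM_TOPIC", "dimension", "AN4", "CL_TOPIC"),
    Component("TIME", "dimension", "N4", None),
    Component("OBS_STATUS", "attribute", "A1", "CL_OBS_STATUS"),
]


def series_key(observation: dict) -> str:
    """Join the non-time dimension values in key order, separated by dots."""
    dims = [c for c in TOURISM_DSD
            if c.role == "dimension" and c.identifier != "TIME"]
    return ".".join(observation[c.identifier] for c in dims)


# Hotels and similar establishments in Italy, 2002 (value from the table).
obs = {"FREQ": "A", "COUNTRY": "IT", "TOURISM_TOPIC": "A100",
       "TIME": "2002", "OBS_VALUE": 33411}
print(series_key(obs))  # A.IT.A100
```

All observations that share the same series key belong to one time series; only the TIME value and the observation itself change.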

c. The SDMX IT architecture

- The “push” mode

[Figure: push mode between sending organisations and receiving organisations. Labels: Input environment; Processing environment; Verification / Conversion to SDMX; Warehouse storage; XSLT for SDMX-ML; PUSH; SDMX-EDI file; SDMX-ML file; Loader; Dissemination]

- The “pull” mode

[Figure: pull mode between sending organisations and receiving organisations. Labels: Input environment; Processing environment; Warehouse storage; XSLT for SDMX-ML; Database; Web Service; RSS; SDMX-ML file; PULL; Pull Requestor; Received data in SDMX-ML; Loader; Dissemination]
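In pull mode the receiving organisation asks the sender's web service for the data it needs. The slides do not specify the interface of that web service, so the sketch below is an assumption: it builds a query URL following the `data/{flow}/{key}` pattern with `startPeriod` and `endPeriod` parameters defined for SDMX 2.1 RESTful web services. The base URL, the dataflow identifier `TOURISM` and the function name `build_data_query` are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_data_query(base_url, flow, key, start_period=None, end_period=None):
    """Build a data query URL in the SDMX 2.1 REST style (no request is sent)."""
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    url = f"{base_url.rstrip('/')}/data/{flow}/{key}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint and dataflow; the key is a dot-separated series key.
url = build_data_query("https://example.org/sdmx/rest", "TOURISM",
                       "A.IT.A100", "2002", "2005")
print(url)
# https://example.org/sdmx/rest/data/TOURISM/A.IT.A100?startPeriod=2002&endPeriod=2005
```

The pull requestor would fetch this URL, receive an SDMX-ML message and hand it to the loader.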

- The “pull” mode & hub

[Figure: pull mode with a data hub between sending organisations and receiving organisations. Labels: Database and Web Service (three sending organisations); HUB; Data HUB; SDMX Data Message cache; XSL for SDMX-ML; Dissemination; Web Graphic User Interface]

d. The SDMX IT tools

• The SDMX IT tools are available in the SDMX IT tool database (on the SDMX website).
• SDMX IT applications can be retrieved according to the functionalities they offer, e.g. visualization, syntax transformation, analytical tools, etc.
• More SDMX IT applications will be added when they become available.

Benefits of SDMX

The use of the SDMX standards and guidelines provides the following main benefits:

– Reducing reporting burden;
– Fostering international data and metadata consistency;
– Enhancing the integration and the efficiency of statistical business processes (vertical and horizontal);
– Providing standard dissemination formats;
– Facilitating data and metadata use and analysis;
– Open standards maintained by the SDMX sponsors with the input of the statistical organisations.

SDMX: what is next?

Available Now
• SDMX technical and statistical standards (2.0/2.1)
• SDMX IT tools and IT infrastructure
• SDMX Website
• SDMX-COG
• Two SDMX working groups put in place (SDMX TWG and SWG)
• The SDMX Action Plan 2011 to 2015

Coming Next
• More DSDs and MSDs (e.g. in Balance of Payment Statistics/National Accounts)
• Improved SDMX Content-oriented Guidelines
• Global SDMX Registries
• Additional deliveries of the SDMX Statistical and Technical working groups (for more details see also the SDMX Action Plan)

Summarising …

• SDMX is global.
• SDMX is not only about IT standards, but also about statistical standards.
• SDMX is used more and more by statistical organisations.
• SDMX is one of the main enablers when it comes to the integration and harmonisation of statistical business processes.

3. The SDMX Information Model (IM)

The main SDMX groups of components:

• Technical Specifications: The SDMX Information Model
• Guidelines to harmonise content: The Content Oriented Guidelines (COG)
• Tools: IT Architectures for data exchange; SDMX compliant tools

The SDMX-IM provides a way of modelling data, metadata and exchange processes

The SDMX Information Model is a meta-model describing the objects involved in:
• The collection
• The dissemination
• The publication
of aggregate statistics and related metadata.

The abstract model is like a structured set of containers.

• Everything in SDMX is model-driven:
  – All messages and interfaces are implementations of the information model

[Figure: a dataset consists of data and its structure (structural metadata). The Data Structure Definition (DSD) identifies and describes the data through:
• Dimensions (ex: country, variable/topic, year)
• Attributes (ex: unit of measure): metadata about an individual value, a time series or a group of time series]

The main elements of the SDMX-IM for data and metadata exchange:

[Figure labels: Data & Metadata Flows; Structure Definition; Category Scheme; Category; Constraint; Provision Agreement; Data Provider; Data & Metadata set]

Data & Metadata in the SDMX-IM

- The different Statistical Data (Figures) representations:
  • Time series
  • Cross-sectional

- The different Metadata types:
  • Structural metadata (identifiers)
  • Reference metadata (descriptors)


Statistical data: Time-series Representation


Statistical data: Cross-sectional Representation
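The figures for these two slides did not survive in this transcript. As a rough illustration of the difference, the Python sketch below groups the same observations in both ways, using some of the values from the tourism table in this presentation. The function names and the tuple layout are assumptions made for the example; they are not SDMX message formats.

```python
from collections import defaultdict

# (time period, tourism topic, observed value) for Italy, annual data.
observations = [
    ("2002", "A100", 33411), ("2002", "B010", 2374), ("2002", "B020", 61479),
    ("2003", "A100", 33480), ("2003", "B010", 2530), ("2003", "B020", 58526),
]


def as_time_series(obs):
    """One series per topic, with the values ordered along time."""
    series = defaultdict(dict)
    for period, topic, value in obs:
        series[topic][period] = value
    return dict(series)


def as_cross_sections(obs):
    """One section per time period, with the values spread across topics."""
    sections = defaultdict(dict)
    for period, topic, value in obs:
        sections[period][topic] = value
    return dict(sections)


print(as_time_series(observations)["A100"])
# {'2002': 33411, '2003': 33480}
print(as_cross_sections(observations)["2003"])
# {'A100': 33480, 'B010': 2530, 'B020': 58526}
```

Both groupings carry exactly the same observations; only the dimension used to organise them differs.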

Structural Metadata: What does the data mean?

• Structural metadata aim at identifying/browsing/retrieving the data.
• Structural metadata must be associated to the data, otherwise the data (values) cannot be understood.

Number of touristic establishments in Italy, annual data

Time       A100 Hotels and similar   B010 Tourist Campsites   B020 Holiday dwellings
2002A00    33411                     2374                     61479
2003A00    33480                     2530                     58526
2004A00    33518                     2529                     56586
2005A00    33527                     2411                     68385
2006A00    33768                     2510                     68376
2007A00    34058                     2587                     61810

• Structural metadata generally express concepts and are often represented by code lists.

Concepts may have different roles:

• Dimensions: to actually identify/distinguish the data
• Attributes: additional information on the data
• Measures: the statistical value (figure)

The Data Structure Definition

To easily exchange and process data, we first define a standard container based on the structure of the statistical table: the Data Structure Definition (DSD).

[Figure: the DSD groups Concepts into Dimensions, Attributes and Measures, each linked to Code lists]

The DSD can be seen as a "logical container" for a specific set of data that should be exchanged. It includes the concepts that represent the data, gives them roles (Dimension, Measure, Attributes) and links them to code lists.
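Because a DSD gives each concept a role and links it to a code list, it can be used to check incoming data automatically. The Python sketch below illustrates this idea for the tourism example of this presentation. The contents of the code lists, the dictionary layout and the function name `validate` are assumptions made for the example; real code lists are maintained in the SDMX guidelines and registries.

```python
# Hypothetical excerpts of the code lists referenced by the tourism DSD.
CODE_LISTS = {
    "CL_FREQ": {"A", "Q", "M"},
    "CL_AREA": {"IT", "FR", "DE"},
    "CL_TOPIC": {"A100", "B010", "B020"},
    "CL_OBS_STATUS": {"P", "E"},
}

# Concept roles and their code lists (None: no code list in the DSD table).
DSD = {
    "dimensions": {"FREQ": "CL_FREQ", "COUNTRY": "CL_AREA",
                   "TOURISM_TOPIC": "CL_TOPIC", "TIME": None},
    "attributes": {"OBS_STATUS": "CL_OBS_STATUS"},
    "measures": ["OBS_VALUE"],
}


def validate(observation):
    """Return a list of problems found when checking one observation."""
    errors = []
    for name, code_list in DSD["dimensions"].items():
        if name not in observation:
            errors.append(f"missing dimension {name}")
        elif code_list and observation[name] not in CODE_LISTS[code_list]:
            errors.append(f"{name}={observation[name]} not in {code_list}")
    for name, code_list in DSD["attributes"].items():
        # Attributes are treated as optional in this sketch.
        if name in observation and observation[name] not in CODE_LISTS[code_list]:
            errors.append(f"{name}={observation[name]} not in {code_list}")
    for name in DSD["measures"]:
        if name not in observation:
            errors.append(f"missing measure {name}")
    return errors


good = {"FREQ": "A", "COUNTRY": "IT", "TOURISM_TOPIC": "A100",
        "TIME": "2002", "OBS_VALUE": 33411, "OBS_STATUS": "P"}
bad = {"FREQ": "A", "COUNTRY": "IT", "TOURISM_TOPIC": "Z999", "TIME": "2002"}

print(validate(good))  # []
print(validate(bad))
# ['TOURISM_TOPIC=Z999 not in CL_TOPIC', 'missing measure OBS_VALUE']
```

The same description drives both sides of an exchange: the sender can check its data before transmission and the receiver can load it without domain-specific code.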


The elements of a Data Structure Definition (DSD):

Reference Metadata

• Reference metadata aim at describing the data and provide information about the methodology applied or data quality;
• Reference metadata can be exchanged independently of the data.

The Metadata Structure Definition:

[Figure label: Concepts]

Some other elements to enable the exchange:

[Figure labels: Production and dissemination of statistical data; Production and dissemination of Reference Metadata]

SDMX Information Model – summary

4. SDMX and the GSBPM

4. SDMX and the GSBPM: Use of SDMX statistical standards

[Figure label: SDMX]

4. SDMX and the GSBPM: Use of the SDMX technical standards

[Figure labels: SDMX/DDI; SDMX; Currently investigated; National and international data/metadata exchange]

4. SDMX and the GSBPMOverall

� For progressing towards more integration of the ESS:there is need for the ESS to also define a technicalstandard for micro-data production and exchange;

� There should be no competition between technicalstandards: use the more appropriate one for therespective steps of the GSBPM;

� If technical standards are combined, terminology shouldbe aligned/translated;

� Important: the governance of the standards: how can theappropriate governance of the standards be assured?

5. SDMX implementation in the ESS

The general principles (valid since 2008/2009):

• In each statistical domain where the dataset is newly shaped or considerably revised, the SDMX technical and statistical standards and guidelines should be implemented.
• In each statistical domain where the collection of national reference data is newly shaped or revised, the ESS reference metadata standards should be used (such as the Euro SDMX Metadata Structure (ESMS), the ESQRS or other upcoming standards).

SDMX is currently implemented in a series of statistical domains:

– Work is ongoing in National accounts, Balance of payment statistics, fishery statistics, waste statistics, transport statistics, etc.
– In addition: the ESS Census Hub: fully fledged SDMX implementation using the SDMX hub architecture (attention: this means an overall shortening of the ESS business process);
– In addition: the Euro Group Register…

The progress on SDMX implementation with regard to reference metadata:

– The Euro SDMX Metadata Structure (ESMS) is fully used in production and dissemination for Eurostat reference metadata; the ESMS is now also used for more and more national reference metadata flows;
– The ESQRS is used in more and more statistical domains for national reference metadata flows;
– The National Reference Metadata Editor is the ESS Metadata application which can be used when producing and exchanging national ESMS/ESQRS files.

The upcoming EP/EC Regulation on processes, standards and metadata:

– New framework regulation which makes the use of SDMX compulsory within the ESS;
– It harmonises the data and metadata exchange processes within the ESS;
– It also makes the use of the SDMX based reference metadata structures (ESMS, ESQRS) and Standard Code lists compulsory;
– Transition periods will be needed.

Some problems:

– Slow SDMX implementation with regard to aggregate data: data sets are not clear enough, slow international agreement on DSDs, incomplete SDMX standards, …
– Better progress on reference metadata due to only a few generic MSDs and supporting IT applications;
– SDMX implementation sometimes requires changes in the domain specific business processes towards more business process integration;
– Not easy to break up statistical domains organised in stove-pipes.

6. Conclusions

• SDMX and its use are moving ahead;
• When implementing SDMX, one needs to use business opportunities and explain well the benefits of standardisation;
• The full potential of SDMX needs to be exploited, also with regard to the use of the SDMX data hubs;
• More clarity is needed for the ESS on which technical standard to use for micro-data production and exchange.