Emerging Trends in Data Exchange and Data Hubbing Jacob Assa, UN Statistics Division Regional Workshop on Data Dissemination and Communication Manila, the Philippines June 20-22, 2012 United Nations Statistics Division 2012

Emerging Trends in Data Exchange

and Data Hubbing

Jacob Assa, UN Statistics Division

Regional Workshop on Data Dissemination and Communication

Manila, the Philippines

June 20-22, 2012

United Nations Statistics Division

2012

2

Outline of the Presentation

1. Data Dissemination in Context

2. Dissemination History at UNSD

3. Dissemination versus Communication

4. Data Exchange and SDMX

5. Data Hubbing Nationally and Globally

3

Data Dissemination in Context

Virtual Value Chain (Svend and Hollensen, 2001):

Define information problem -> Organize, select and compile information -> Synthesize information -> Distribute information -> Value

Dissemination: the last but not least step
- Often done as an afterthought
- Can be made more efficient and effective:
  - From Data Publishing to Data Exchange
  - From Data Silos to Data Hubbing

4

Dissemination History in UNSD

League of Nations
- 1919-1948: print publications

United Nations
- 1948-1995: print publications (yearbooks, manuals)
- 1995-2000: CD-ROM, static web pages
- 2000-2008: online databases, dynamic web queries (UN Comtrade, UN Common Database)
- 2008: launch of UNdata, the UN System data portal
- 2010: World Statistics Pocketbook app for iPhones and iPads
- 2012: launch of CountryData, the UN national data portal

5

Dissemination versus Communication

- One-way vs. two-way communication
- Considerable evolution of statistical communication over recent years
- Traditionally, statistical organizations focused on
  - Dissemination through printed publications
  - One-way communication through few media channels
    - Newspapers
    - Radio
    - Television
- Since the 1990s, acknowledged need to do more than just disseminate data
  - Employing communication professionals
  - Widespread use of the Internet
  - New methods of communication and dissemination

6

Dissemination versus Communication

New methods of communication:
- Web 2.0 technologies
  - Blogs
  - Wikis
  - Social networks
- Interactive websites
  - Allow users to upload data and create graphs
  - Sharing and discussion with other users

7

Data Exchange - Unstructured

- Paper questionnaires
- Excel sheets
- CSV files
- Email
- Semi-structured XML files
  - However, XML in itself is simply a mark-up language and does not standardize data structure between exchanging parties

8

XML - Example

Philippines, GDP in constant 2000 US$ (World Bank)

Year    GDP
1960    17,990,832,237
1961    19,001,301,599
1962    19,908,256,877
1963    21,313,876,851
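The figures above can illustrate why XML on its own does not standardize exchange. The following is a minimal Python sketch, not the markup shown on the original slide: the two layouts and all element and attribute names (`gdp`, `obs`, `Series`, `Point`, and so on) are invented for illustration. Two senders encode the same four observations in well-formed XML, yet a reader written for one layout finds nothing in the other.

```python
import xml.etree.ElementTree as ET

# Observations from the slide: Philippines GDP, constant 2000 US$ (World Bank).
GDP = {1960: 17_990_832_237, 1961: 19_001_301_599,
       1962: 19_908_256_877, 1963: 21_313_876_851}

def layout_a(data):
    """Hypothetical sender A: one <obs> element per year, values as attributes."""
    root = ET.Element("gdp", country="Philippines")
    for year, value in data.items():
        ET.SubElement(root, "obs", year=str(year), value=str(value))
    return ET.tostring(root, encoding="unicode")

def layout_b(data):
    """Hypothetical sender B: same data, different tag names and nesting."""
    root = ET.Element("Series")
    ET.SubElement(root, "Area").text = "Philippines"
    for year, value in data.items():
        point = ET.SubElement(root, "Point")
        ET.SubElement(point, "Period").text = str(year)
        ET.SubElement(point, "Value").text = str(value)
    return ET.tostring(root, encoding="unicode")

def read_layout_a(xml_text):
    """A reader written for layout A only."""
    root = ET.fromstring(xml_text)
    return {int(o.get("year")): int(o.get("value")) for o in root.iter("obs")}

xml_a, xml_b = layout_a(GDP), layout_b(GDP)
print(read_layout_a(xml_a) == GDP)  # the reader recovers the data from layout A
print(read_layout_a(xml_b))         # layout B has no <obs> elements to read
```

Both documents are valid XML, so the mismatch only surfaces when the receiving party tries to interpret the content. Agreeing on the structure beforehand is the gap that a shared standard is meant to close.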

9

10

Data Exchange - Structured

Statistical Data and Metadata Exchange (SDMX)

What is it?
- An initiative to foster standards for the electronic exchange of statistical information
- Goal: explore e-standards that could increase efficiency and avoid duplication
- Sponsored by BIS, ECB, EUROSTAT, IMF, OECD, UN, WB

What it is not
- Not a technology, but implemented using technology (XML, EDIFACT syntax and GESMES/TS message)

How does it work?
- Exchange partners agree on Data Structure Definitions
- Data and metadata are exported and imported accordingly

11

Benefits of SDMX

- Protection of existing technology investments
- Many different types:
  - Data warehouses
  - OLAP cubes
  - GESMES/TS
  - Publication systems
- SDMX standardizes formats and protocols at the point where data and metadata go between counter-parties

12

SDMX Registry/Repository

SDMX Registry Interfaces: Register, Query, Submit, Subscription/Notification

[Diagram: three components behind the registry interfaces]
- REPOSITORY, Structural Metadata (Submit, Query): describes data and metadata structures
- REPOSITORY, Provisioning Metadata (Submit, Query): describes data and metadata sources and reporting processes
- REGISTRY, Data Set/Metadata Set (Register, Query): indexes data and metadata

13

Impact of the SDMX Registry

The SDMX Registry allows for one of the major efficiency gains possible with SDMX:

Shifting from “push”-based reporting to “pull”-based reporting

This can save lots of time and duplication of effort
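The shift from push to pull can be sketched in a few lines of Python. This is an illustrative toy, not the SDMX registry API: the `Registry` class, the `pull` and `fetch` helpers, the dataset id `PHL_GDP` and the `example.org` URL are all assumptions made for the example. The point it shows is that the provider publishes once and registers the location, and each collector then discovers and retrieves the data itself.

```python
from dataclasses import dataclass, field

@dataclass
class Registry:
    """Toy registry: it indexes where data lives, it does not store the data."""
    index: dict = field(default_factory=dict)  # dataset id -> provider URL

    def register(self, dataset_id, url):
        self.index[dataset_id] = url

    def query(self, dataset_id):
        return self.index.get(dataset_id)

# Hypothetical provider endpoint; a real system would serve this over HTTP.
PROVIDER_SITES = {
    "https://example.org/nso/gdp": {"1960": 17_990_832_237},
}

def fetch(url):
    """Stand-in for an HTTP GET against the provider's site."""
    return PROVIDER_SITES[url]

def pull(registry, dataset_id):
    """Collector side of pull-based reporting: discover, then retrieve."""
    url = registry.query(dataset_id)
    if url is None:
        raise KeyError(f"{dataset_id} is not registered")
    return fetch(url)

registry = Registry()
# The provider publishes once on its own site and registers the location...
registry.register("PHL_GDP", "https://example.org/nso/gdp")
# ...and any number of collectors pull the data when they need it.
print(pull(registry, "PHL_GDP"))
```

Under push-based reporting the provider would instead prepare and send a separate submission to every collector, which is where the duplicated effort comes from.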

14

What is a Data Structure Definition?

- Specifies a set of concepts which describe and identify a set of data
- Tells which concepts are the dimensions (identification and description) and which are attributes (just description)
- Tells which code lists provide the possible values for the dimensions and attributes
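The three roles of a Data Structure Definition can be made concrete with a small Python sketch. It is a simplification under stated assumptions, not an SDMX implementation: the class and its methods are hypothetical, every attribute is treated as mandatory, and the concept names and code values in the usage lines are chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class DataStructureDefinition:
    dimensions: dict  # concept -> code list; dimensions identify and describe
    attributes: dict  # concept -> code list; attributes only describe

    def validate(self, observation):
        """Return a list of problems; an empty list means the observation conforms."""
        problems = []
        for concept, codes in {**self.dimensions, **self.attributes}.items():
            if concept not in observation:
                problems.append(f"missing {concept}")
            elif observation[concept] not in codes:
                problems.append(f"{concept}={observation[concept]!r} not in code list")
        return problems

    def series_key(self, observation):
        """Only the dimensions take part in identifying the data."""
        return tuple(observation[c] for c in self.dimensions)

# Illustrative structure agreed between two exchange partners.
dsd = DataStructureDefinition(
    dimensions={"REF_AREA": {"PH", "KH"}, "INDICATOR": {"GDP"}, "FREQ": {"A"}},
    attributes={"UNIT_MULT": {"0", "6"}},
)
obs = {"REF_AREA": "PH", "INDICATOR": "GDP", "FREQ": "A", "UNIT_MULT": "0"}

print(dsd.validate(obs))                                  # conforming observation
print(dsd.series_key(obs))                                # identity from dimensions only
print(dsd.validate({**obs, "REF_AREA": "Philippines"}))   # value outside the code list
```

Because both partners validate against the same definition, a value such as "Philippines" in place of the agreed code is caught at the point of exchange rather than after loading.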

15

16

17

What is Data Hubbing?

In general, a hub is the central part of a wheel where the spokes come together. The term is familiar to frequent fliers who travel through airport "hubs" to make connecting flights from one point to another.

In data communications, a hub is a place of convergence where data arrives from one or more directions and is forwarded out in one or more other directions.

Source: http://searchnetworking.techtarget.com

18

Data Hubbing at the National Level

Cambodia – DFID Project Objectives

Improve coordination in the National Statistical System

Collate development data in one place/hub

Make access to national data easier

Reduce data request burden

Use of latest IT software and practices

19

Project Dissemination Model

[Diagram: three columns labelled Line Ministries, National Statistical Office and United Nations. Components: Line Ministry Database, National Indicator Registry, National Repository DB, DevInfo, Mapping tool. Flow labels: Upload, XLS, Scripts, Register files, Post notification, Publish, SDMX-ML, Download.]

20

Data Hubbing at the International Level (1)

The Joint External Debt Hub (JEDH)

Jointly developed by
- Bank for International Settlements (BIS)
- International Monetary Fund (IMF)
- Organisation for Economic Co-operation and Development (OECD)
- World Bank (WB)

21

JEDH Site before SDMX

[Diagram: BIS, IMF, OECD and World Bank each feed the website in various formats, on a 3-month production cycle.]

22

JEDH with SDMX

[Diagram: BIS, IMF, OECD and World Bank each expose their data as SDMX-ML, including SDMX-ML from a Debtor database, and information about the data is registered in the SDMX Registry. An SDMX "Agent" discovers data and URLs through the registry and retrieves the data from the sites. The SDMX-ML is loaded into the JEDH DB, and data is provided in real time to the JEDH site.]

23

Data Hubbing at the International Level (2)

UNdata Portal

Before, a researcher interested in analyzing the effects of population, health and education on per capita income growth would need to visit:
- UNSD website for population figures
- WHO website for health indicators
- UNESCO website for education indicators
- UNSD/World Bank/IMF website for income data

Now all these indicators are available in one place through a single user interface

24

[Diagram: UNdata architecture]
- Source Databases: Comtrade, World Bank, IMF, ILO, FAO, Population, UNESCO
- Abstraction layer
- UNdata Portal: the data hub contains cached copies of the source databases
- Internet
- Search Engine
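The layering above (source databases, an abstraction layer, cached copies, one search entry point) can be sketched as follows. This is a toy model under stated assumptions, not the UNdata implementation: the `DataHub` class, the adapter functions, the record layouts and the indicator names are hypothetical, and the numeric values are placeholders rather than real statistics.

```python
class DataHub:
    """Toy hub: caches copies of source databases behind one search interface."""

    def __init__(self):
        self._cache = {}  # source name -> list of records in a common layout

    def refresh(self, source_name, raw_records, adapter):
        """Abstraction layer: an adapter maps each source's layout to a common one."""
        self._cache[source_name] = [
            {"source": source_name, **adapter(raw)} for raw in raw_records
        ]

    def search(self, term):
        """Single entry point: case-insensitive match on the indicator name."""
        term = term.lower()
        return [rec for records in self._cache.values()
                for rec in records if term in rec["indicator"].lower()]

hub = DataHub()
# Each source delivers records in its own (made-up) layout; values are placeholders.
hub.refresh("WHO", [("Mortality rate", 1.0)],
            lambda r: {"indicator": r[0], "value": r[1]})
hub.refresh("UNESCO", [{"name": "Enrolment rate", "val": 2.0}],
            lambda r: {"indicator": r["name"], "value": r["val"]})

results = hub.search("rate")
print([rec["source"] for rec in results])
```

Keeping the per-source differences inside the adapters means the search code never has to know how an individual agency structures its database.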

25

http://data.un.org/

26

Data Hubbing at the International Level (3)

European Central Bank (ECB)

Push vs. pull plus a hybrid approach

Central Hub to which all member banks submit their SDMX data

The ECB then pulls the entire dataset from the Central Hub

SDMX-based visualizations

27

28

Resources

UNSD - Handbook of Statistical Organization (3rd ed.): http://unstats.un.org/unsd/dnss/hb/default.aspx

UNECE - Making Data Meaningful (2 parts): http://www.unece.org/stats/documents/writing/

SDMX - http://sdmx.org/

Contacts

United Nations Statistics Hotline - [email protected]

Jacob Assa, UNSD - [email protected]