Post on 18-Dec-2015
OPERATIONAL METADATA FOR FEDERATING STATISTICAL
REFERENCE SYSTEMS AT EUROSTAT
G. Pongas, F. Vernadat
EC Eurostat B2
Overview of the talk
• Introduction
• CVD (Cycle de Vie des Données)
• REFIN: Internal Reference
• Eurostat Dissemination Portal (Site 3)
• Conclusion
Introduction
Metadata in statistical information:
• define some of the semantics of data
• needed for proper production and usage of data
• make data comparable
• ensure some level of data quality
• required for efficient search
CVD (Cycle de Vie des Données)
[Figure: the data life cycle. Labels: Data Providers; Data Collection; Validation; Correction Imputation; Internal Reference Operations; External Reference; Dissemination; Statistical Metadata; Production. Data states: Raw data, Validated data, Derived data, Public data.]
Current Situation at Eurostat
[Figure: current data flow at Eurostat. Labels: Suppliers (NSI, NSI, …); Collection Environment (Stadium, Statel); Production Environment (PS, PS, PS, PS, PS, PS); Reference Environment; Dissemination Environment (Comext, NewCronos, …); Customers (Public. Office, Web Site, Info. relays, DG, ECB, External users, NSI, OECD).]
EUROSTAT INTERNAL REFERENCE: The problem
Too many different systems at EUROSTAT for handling data:
– FAME
– Oracle Express
– Oracle RDBMS
– SAM
– SAS
REFIN: The problem (Cont’d)
• Some of them are general purpose (e.g. Oracle RDBMS) whereas others may include special features (for data validation or computation) but they all have their own access methods and user interfaces (Express Analyser, FAME...)
• Major drawbacks:
– High complexity for users
– Data comparison between different systems is not easy
What is REFIN ?
The REFIN system specifically addresses these issues
– Gives access to heterogeneous systems
– Provides the users with a common interface
  • Data location and source system are hidden
  • Data are not duplicated; users access the original data
– Uses a unique exchange format (PIVOT)
– Implements specific security rules
REFIN architecture
[Figure: REFIN architecture. Labels: FAME data bases + metadata (HLI); ORACLE EXPRESS data bases + metadata (SNAPI); ORACLE DBMS data bases + metadata (OCI); MICROSOFT ACCESS; SAM + metadata; DAO/ODA/ODBC; REFIN ADAPTOR; SECURITY LAYER; METADATA; REFIN INTERNAL REFERENCE (metadata + localisation data + procedures); RPC/DCE or XML.]
REFIN architecture
[Figure: metadata flow. Labels: source systems (SAM, EXPRESS, ORACLE, FAME); Metabase Builder; REFIN Particular Metabases; Metabase Converter; REFIN Common Metabase.]
1) Generation of REFIN metadata
2) Mapping to Common Metadata
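The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layouts, the `CommonEntry` class, the `to_common` and `build_common_metabase` functions, and the sample series and table names are all hypothetical, not the actual REFIN metabase format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonEntry:
    """One record of a hypothetical common metabase."""
    common_name: str  # name shown to users
    system: str       # source system that holds the data
    local_name: str   # name of the object inside that system


# Step 1 (assumed shape): particular metabases, one per source system.
particular_metabases = {
    "FAME": [{"series": "GDP.Q.FR"}],
    "ORACLE": [{"table": "TRADE", "column": "VALUE_EUR"}],
}


def to_common(system, entry):
    """Step 2: map one system-specific entry to the common layout."""
    if system == "FAME":
        local = entry["series"]
    elif system == "ORACLE":
        local = f'{entry["table"]}.{entry["column"]}'
    else:
        raise ValueError(f"no mapping defined for system {system!r}")
    # Illustrative naming rule only: lower case, dots become underscores.
    common_name = local.lower().replace(".", "_")
    return CommonEntry(common_name, system, local)


def build_common_metabase(particular):
    """Merge all particular metabases into one dictionary keyed by common name."""
    common = {}
    for system, entries in particular.items():
        for entry in entries:
            record = to_common(system, entry)
            common[record.common_name] = record
    return common
```

Each common entry keeps the source system and the local name, so the data can stay where they are and still be located later.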
REFIN architecture
[Figure: REFIN API on top of one driver per source system: FAME Driver (HLI) for FAME, SAM Driver (ODBC) for SAM, EXPRESS Driver (SNAPI) for EXPRESS, ORACLE Driver (OCI) for ORACLE.]
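The driver layering can be illustrated with a small sketch. It assumes a single `fetch` method per driver and uses in-memory stand-ins for the native interfaces (HLI, ODBC, SNAPI, OCI); the class names `Driver`, `InMemoryDriver` and `RefinApi`, the catalogue layout, and the sample data are hypothetical and not taken from REFIN itself.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class Driver(ABC):
    """Common contract every system-specific driver is assumed to fulfil."""

    @abstractmethod
    def fetch(self, local_name: str) -> Dict[str, float]:
        """Return a series as {period: value} for a system-local name."""


class InMemoryDriver(Driver):
    """Stand-in for a real driver; a real one would call the native interface."""

    def __init__(self, store: Dict[str, Dict[str, float]]):
        self._store = store

    def fetch(self, local_name: str) -> Dict[str, float]:
        return dict(self._store[local_name])


class RefinApi:
    """Single entry point: the caller never sees where the data live."""

    def __init__(self, drivers: Dict[str, Driver],
                 catalogue: Dict[str, Tuple[str, str]]):
        self._drivers = drivers
        self._catalogue = catalogue  # common name -> (system, local name)

    def get(self, common_name: str) -> Dict[str, float]:
        system, local_name = self._catalogue[common_name]
        return self._drivers[system].fetch(local_name)


api = RefinApi(
    drivers={
        "FAME": InMemoryDriver({"GDP.Q.FR": {"2000Q1": 1.0, "2000Q2": 1.5}}),
        "ORACLE": InMemoryDriver({"TRADE.VALUE_EUR": {"2000Q1": 7.0}}),
    },
    catalogue={
        "gdp_fr": ("FAME", "GDP.Q.FR"),
        "trade_eur": ("ORACLE", "TRADE.VALUE_EUR"),
    },
)
```

A call such as `api.get("gdp_fr")` looks the same whichever system holds the series, which is the point of the common interface.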
New possibilities provided by REFIN
To build heterogeneous data sets by mixing data from different origins and systems
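As a rough illustration of such a mixed data set, the sketch below aligns series from different origins on their time period. The `combine` function and the sample series names are hypothetical; it assumes each series has already been retrieved as a `{period: value}` mapping.

```python
def combine(series_by_name):
    """Align several {period: value} series on the union of their periods.

    Periods missing from a given series are filled with None.
    """
    periods = sorted({p for values in series_by_name.values() for p in values})
    return [
        {"period": p,
         **{name: values.get(p) for name, values in series_by_name.items()}}
        for p in periods
    ]


rows = combine({
    "gdp": {"2000": 1.0, "2001": 2.0},  # e.g. retrieved from one system
    "trade": {"2001": 5.0},             # e.g. retrieved from another
})
```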
Eurostat Dissemination Portal (Site 3)
[Figure: ESTAT Portal Platform, organised in three layers: Presentation Layer, Business Layer, Back-office Layer.
Users and access: Professional user; Internet user; Web Cache; Internet Portal Web Server; LDAP; User Groups (EC, Journalists, Students, Citizens, ...).
Content: Publications; Datasets; Dedicated sections; + virtual Publi/Datasets (URL's) + METADATA; Statistical Metadata; Open Datasets.
Application Server: Quick/Adv. Search; Subscription; Alert/Info push; Content Download; Content Import; Print; E-commerce; Local DB/File server; XML/XSL; JSP.
Application Integration: WSDL/SOAP; Service call; Web services; API.
Back-office systems: NewCronos (Num. Data + metadata); SUITE; EVA/EVALight; Datasets (fixed); Datasets (open); Comext DB (Ref. Data + statistical metadata); Comext Client; Comext Server; RAMON; CODED; CIRCA; STATPUB; DOUCEUR; EU-Bookshop (OPOCE); EU-DOR; EU off. Publi.]
Site 3 Metadata
Columns on the slide: Site3 Attribute, Description, Dublin, Mandatory, Domain. Rows below give attribute, description and domain.
Site3 Attribute | Description | Domain
product_code | Unique identifier of the content object | string
ISBN_ISSN | Official ISBN or ISSN publication code | string
author | Author's name(s) of the content object | string
responsible_unit | Identification of Unit responsible for the availability of the content object | LOV
coeditor | Name of co-editing organisation, if any | string
creator | Name of user who uploaded the content object | string
approved_by | Name of person who approved content object upload/creation on the Website | string
current_version | Current version of the content object | string
release_date | Issue date of content object by authoring unit | date
creation_date | Date of creation/upload of the content object | date
start_effectivity_date | Date and time at which the content object becomes visible on the Website | full date
end_effectivity_date | Date and time at which the content object becomes invisible on the Website | full date
expiration_date | Date at which the content object must be deleted/purged from the Website | date
theme | Theme name of content object | LOV
collection | Collection name of content object | LOV
language | Default language of the content object | LOV
other_languages | List of other languages in which a version of the content object is available | LOV
Site 3 Metadata (Cont’d)
Site3 Attribute | Description | Domain
table_of_contents | Name of file containing the table of contents | string
title | Official title of content object | string
summary | Content object summary or official abstract | string
keywords | Unordered list of keywords (maximum 10) | string
freetext | Free text to add comments if needed. Not visible on the Website | string
graphs | Indicate if there is any graph attached to the content object | Boolean
tables | Indicate if there is any data table in the content object | Boolean
maps | Indicate if there is any map attached to the content object | Boolean
cover_image | Name of file containing data for the cover image, if any | string
filename_url | URL or file name of the content object (if not physically stored on the Website) | string
related_products | Name(s) of related products/datasets | string
type | Type of content object (publication, dataset, metadata, link) | LOV
support_format | Medium type (electronic or paper) | LOV
other_formats | List of other available mediums | LOV
layout_size | Size of publication format (e.g. A4, A5…) | string
page_nb | Number of pages of the content object | number
table_number | Dataset identifier (logical name) | string
price | Selling price of the content object in Euro | number
out_of_stock | Indicate if publication is out of stock | Boolean
update_frequency | Update frequency (e.g. daily, weekly, monthly, quarterly, yearly, biannual) | LOV
coverage | Time period covered by publication or dataset | string
status | Content object status (visible, embargoed…) | LOV
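To show how a few of these attributes might be used in practice, here is a minimal sketch. It assumes a content object is held as a plain dictionary keyed by the attribute names above, that a missing effectivity date means "no limit", and that the end date is exclusive; the functions `is_visible` and `validate` are hypothetical helpers, not part of the portal.

```python
from datetime import datetime


def is_visible(record, now):
    """Apply start_effectivity_date / end_effectivity_date to one record.

    Assumptions: a missing date means no limit; the end date is exclusive.
    """
    start = record.get("start_effectivity_date")
    end = record.get("end_effectivity_date")
    if start is not None and now < start:
        return False
    if end is not None and now >= end:
        return False
    return True


def validate(record):
    """Return a list of problems found in the record (empty if none)."""
    errors = []
    # The attribute table states a maximum of 10 keywords.
    if len(record.get("keywords", [])) > 10:
        errors.append("keywords: at most 10 allowed")
    start = record.get("start_effectivity_date")
    end = record.get("end_effectivity_date")
    # Ordering check is an assumption, not stated in the table.
    if start is not None and end is not None and end <= start:
        errors.append("end_effectivity_date must be after start_effectivity_date")
    return errors


example = {
    "product_code": "EXAMPLE-001",  # made-up identifier
    "keywords": ["trade", "metadata"],
    "start_effectivity_date": datetime(2002, 1, 1),
    "end_effectivity_date": datetime(2003, 1, 1),
}
```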
Conclusion
• Importance of linking data and metadata
• Importance of having an integrated metadata environment
• Clear distinction between
– Statistical metadata
– IT metadata
– Dissemination metadata