Pascal Heus Open Data Foundation pheus@opendatafoundation opendatafoundation

Workshop on Metadata Standards and Best Practices, November 19-20th, 2007. Session 2: Metadata specifications for socio-economic science and supporting initiatives. Pascal Heus, Open Data Foundation, [email protected], http://www.opendatafoundation.org


Page 1

Workshop on Metadata Standards and Best Practices

November 19-20th, 2007

Session 2

Metadata specifications for socio-economic science and supporting initiatives

Pascal Heus

Open Data Foundation

[email protected]

http://www.opendatafoundation.org

Page 2

http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Outline

• Metadata specifications
• Key players
• Ongoing initiatives
• Conclusions / Q&A

Page 3

What is Metadata?

• Common definition: Data about Data

[Image: unlabeled stuff vs. labeled stuff]

The bean example is taken from: A Manager's Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf

Page 4

What are XML specifications? (1)

• XML is a language that facilitates the capture of descriptive elements and attributes
• Different objects carry different characteristics (book, car, weather)
• We need to agree on a common set of descriptive elements (semantics)
• Just as when designing a database, we have to describe the structure
• This modeling process creates a Document Type Definition (DTD) or an XML Schema
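
The idea of descriptive elements and attributes can be sketched in a few lines of Python. This is a minimal illustration only: the `book` record, its element names and its attribute names are invented for the example and do not come from any of the specifications discussed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element and attribute names are invented for illustration.
BOOK_XML = """
<book isbn="0-000-00000-0" language="en">
  <title>An Example Title</title>
  <author>A. Writer</author>
  <year>2007</year>
</book>
"""

def describe(xml_text):
    """Return the object type, its attributes and its child element values."""
    root = ET.fromstring(xml_text)
    elements = {child.tag: child.text for child in root}
    return {"object": root.tag, "attributes": dict(root.attrib), "elements": elements}

record = describe(BOOK_XML)
print(record["object"])
print(record["attributes"]["language"])
print(record["elements"]["title"])
```

A car or a weather observation would need a different set of elements, which is why the agreed structure is written down in a DTD or an XML Schema.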

Page 5

What are XML specifications? (2)

• Specifications are made available to the general public on the web
– Usually a URL
• Can be turned into a “standard” (ISO)
• Typically maintained by a consortium of agencies
– Independent model
– OASIS, W3C
– ISO

Page 6

A suggested set for socio-economic data

• Statistical Data and Metadata Exchange (SDMX)
– Macrodata, time series, indicators, registries
– http://www.sdmx.org
• Data Documentation Initiative (DDI)
– Microdata (surveys, studies)
– http://www.ddialliance.org
• ISO 11179
– Semantic modeling, concepts, registries
– http://metadata-standards.org/11179/
• ISO 19115
– Geography
– http://www.isotc211.org/
• Dublin Core
– Resources (documentation, images, multimedia)
– http://www.dublincore.org

Page 7

Statistical Data and Metadata Exchange (SDMX)

• Purpose: Exchange of statistical information (time series/indicators).
– Covers the metadata capture as well as implementation of registries.
– Currently version 2.0 and also an ISO standard (17369:2005)

• Sponsors: Bank for International Settlements (BIS), European Central Bank (ECB), EUROSTAT, International Monetary Fund (IMF), Organization for Economic Cooperation and Development (OECD), United Nations (UN), World Bank

• Can actually be used for many other purposes. It’s a metadata metadata model.

• http://www.sdmx.org

Page 8

Data Documentation Initiative 1/2.x

• Purpose: Archive and document survey microdata
– Effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation for datasets in the social and behavioral sciences
– Sections: document, survey, files, variables, other material
– Used by data archives (producers) and librarians
• Sponsors: DDI Alliance
• http://www.ddialliance.org
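
The five sections listed above correspond to the top-level elements of a DDI 1/2.x codebook. The sketch below builds a bare skeleton with Python's standard library. It is a simplified illustration and not a schema-valid DDI document: the `build_codebook` helper, the study title and the variables are hypothetical, and most elements required in practice are left out.

```python
import xml.etree.ElementTree as ET

def build_codebook(study_title, variables):
    """Build a skeleton with the five sections listed on the slide.

    `variables` maps a variable name to its label. The result is a
    simplified illustration and is not validated against the DDI schema.
    """
    root = ET.Element("codeBook")
    ET.SubElement(root, "docDscr")             # document description
    study = ET.SubElement(root, "stdyDscr")    # survey (study) description
    citation = ET.SubElement(study, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = study_title
    ET.SubElement(root, "fileDscr")            # data files
    data = ET.SubElement(root, "dataDscr")     # variables
    for name, label in variables.items():
        var = ET.SubElement(data, "var", name=name)
        ET.SubElement(var, "labl").text = label
    ET.SubElement(root, "otherMat")            # other material
    return root

codebook = build_codebook(
    "Example Household Survey 2007",           # hypothetical study
    {"hhsize": "Household size", "region": "Region of residence"},
)
print(ET.tostring(codebook, encoding="unicode"))
```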

Page 9

Data Documentation Initiative 3.0

• Purpose: Document the survey life cycle
– Major shift from DDI 1/2.x
– Currently in candidate recommendation, release in 2008
• Sponsors: DDI Alliance
• http://www.ddialliance.org/ddi3

Page 10

DDI & SDMX

• Are complementary specifications
• DDI 3.0 and SDMX 2.0 have been designed to work with each other
– SDMX registries can wrap DDI documents
– Microdata: single point in time / geography, high level of detail (for statisticians, researchers)
– Macrodata: high level indicators across time and geography (for economists, policy makers)
– Using DDI+SDMX allows linkages and drilling down from an indicator to its source

• See "DDI and SDMX: Complementary, Not Competing, Standards", A. Gregory, P. Heus, July 2007 available at http://www.opendatafoundation.org/?lvl1=resources&lvl2=papers
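
The drill-down from an indicator to its source can be pictured with a small sketch. It is a conceptual illustration in plain Python, not an implementation of either specification: the `MicroRecord` and `Indicator` classes, the study identifiers and the income figures are all hypothetical. The point is only that each aggregated value keeps a reference to the survey it was computed from.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class MicroRecord:
    study_id: str      # hypothetical identifier of the documented survey
    year: int
    region: str
    income: float

@dataclass(frozen=True)
class Indicator:
    name: str
    year: int
    region: str
    value: float
    source_study: str  # link back to the survey behind the value

def aggregate(records, name):
    """Compute one mean-income indicator per (study, year, region) group."""
    groups = {}
    for r in records:
        groups.setdefault((r.study_id, r.year, r.region), []).append(r.income)
    return [
        Indicator(name, year, region, mean(values), study)
        for (study, year, region), values in sorted(groups.items())
    ]

def drill_down(indicator, records):
    """Return the microdata records behind one indicator value."""
    key = (indicator.source_study, indicator.year, indicator.region)
    return [r for r in records if (r.study_id, r.year, r.region) == key]

records = [
    MicroRecord("HHS-2006", 2006, "North", 100.0),
    MicroRecord("HHS-2006", 2006, "North", 300.0),
    MicroRecord("HHS-2006", 2006, "South", 150.0),
    MicroRecord("HHS-2007", 2007, "North", 250.0),
]
indicators = aggregate(records, "mean_income")
first = indicators[0]
print(first, len(drill_down(first, records)))
```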

Page 11

ISO 11179

• Purpose: Manage registries / concepts
– International standard for representing metadata for an organization in a Metadata Registry (a central location in an organization where metadata definitions are stored and maintained in a controlled method)

– Compliance with this standard is important and both DDI 3.0 and SDMX have mapping mechanisms

• Sponsors: ISO/IEC Joint Technical Committee on Metadata Standards

• http://metadata-standards.org/

Page 12

ISO 19115

• Purpose: Capture geography
– It is a component of the series of ISO 191xx standards for Geospatial metadata.
– ISO 19115 defines how to describe geographical information and associated services, including contents, spatial-temporal extent, data quality, access and rights to use.
– Compliance in DDI 3.0

• Sponsors: ISO/TC 211 Geographic information/Geomatics

• http://www.isotc211.org/

Page 13

Dublin Core

• Purpose: describe resources
– Standard for cross-domain information resource description
– Widely used to describe digital materials such as video, sound, image, text, and composite media
– Small core set of elements
– Used for survey documentation
• Sponsors: Dublin Core Metadata Initiative
• http://dublincore.org/
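
As a rough illustration of how small the core element set is, the sketch below describes one survey document with five Dublin Core elements. The namespace URI is the one used for the simple Dublin Core element set. The `<record>` wrapper, the `dublin_core_record` helper and the field values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the usual "dc:" prefix

def dublin_core_record(**fields):
    """Wrap simple Dublin Core elements in a hypothetical <record> container."""
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record

# Hypothetical description of a questionnaire distributed with a survey.
record = dublin_core_record(
    title="Survey questionnaire",
    creator="Example Statistics Office",
    date="2007-11-19",
    format="application/pdf",
    language="en",
)
print(ET.tostring(record, encoding="unicode"))
```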

Page 14

Advantages of XML metadata

• Metadata is easy to transform
– From one standard to another or into a different format
  • DDI to SDMX, Dublin Core, MARC
– To other formats for presentation
  • HTML, PDF
• Metadata is easy to exchange
– Web services (SOAP, REST, etc.)
• Metadata is searchable
– XPath, XQuery
• All these are native features of XML
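
Searching and transforming can be sketched with Python's standard library. Two caveats apply: `xml.etree.ElementTree` supports only a subset of XPath, and the standard library has no XSLT processor, so the transformation to HTML is written here as ordinary Python rather than as a stylesheet. The variable list is a hypothetical, simplified fragment and not a schema-valid document.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified variable list; not a schema-valid document.
METADATA_XML = """
<dataDscr>
  <var name="age"><labl>Age in years</labl></var>
  <var name="sex"><labl>Sex of respondent</labl></var>
  <var name="income"><labl>Monthly income</labl></var>
</dataDscr>
"""

root = ET.fromstring(METADATA_XML)

def find_label(root, var_name):
    """Search: look up one variable label with an XPath-style expression."""
    node = root.find(f"./var[@name='{var_name}']/labl")
    return None if node is None else node.text

def to_html(root):
    """Transform: render the variable list as an HTML table."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            escape(var.get("name")), escape(var.findtext("labl", ""))
        )
        for var in root.findall("./var")
    )
    return f"<table>{rows}</table>"

print(find_label(root, "sex"))
print(to_html(root))
```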

Page 15

PART 2

Active agencies and ongoing initiatives

Page 16

DDI Alliance

• Membership-based organization
– Agencies: ICPSR, World Bank, Open Data Foundation
– National data archives: Danish, Finnish, Dutch, Norwegian, Swiss, UK
– Germany: Centre for Survey Research and Methodology (ZUMA), German Socio-Economic Panel Study (SOEP), Zentralarchiv fuer Empirische Sozialforschung (University of Koeln)
– Universities: Alberta, Berkeley, Guelph, Harvard/MIT, Minnesota, etc.
• Steering and Expert Committee
• Meets annually at IASSIST
• http://www.ddialliance.org

Page 17

ICPSR

• The Inter-university Consortium for Political and Social Research
• The world's largest archive of digital social science data
– Acquire and preserve social science data
– Provide open and equitable access to these data
– Promote effective data use
• Home of the DDI Alliance
• http://www.icpsr.umich.edu

Page 18

International Household Survey Network

• Partnership of international organizations seeking to improve the availability, quality and use of survey data in developing countries

• United Kingdom Department for International Development (DfID), International Labour Organization (ILO), Partnership for Statistics in the 21st Century (PARIS21), United Nations Children's Fund (UNICEF), United Nations Statistics Division (UNSD), World Health Organization and the Health Metrics Network (WHO/HMN), World Bank
• Plays a major role in the adoption of DDI around the globe, active in many developing countries
• Developer of the Microdata Management Toolkit
• http://www.surveynetwork.org

Page 19

Open Data Foundation

• US-based non-profit organization
• Adoption of global metadata standards and the development of open-source solutions promoting the use of statistical data
• Coordination of development efforts
• Board of directors, advisors and management group
• Open to individual membership, institutional association is through projects
• http://www.opendatafoundation.org

Page 20

Metadata Technology

• UK-based private company
• Consulting services and development of tools based on open standards and open source
• Training services, registry services, metadata repositories, hosting
• Focus on SDMX, DDI and related standards
• http://www.metadatechnology.com

Page 21

IASSIST

• International Association for Social Science Information Services and Technology
• IASSIST is an international organization of professionals working in and with information technology and data services to support research and teaching in the social sciences.
• Individual-based membership
• Primary platform for DDI community
• Annual conference
– 2008: Stanford, CA, 2009: Tampere, Finland
– DDI Alliance annual meeting

• http://www.iassistdata.org/

Page 22

DDI Foundation Tools Program

• Initiative aiming at the development of a Foundation Framework and a Toolkit to support the implementation of DDI applications and utilities (open source)

• MOU established September 2007, 2-year program (renewable on an annual basis afterwards)

• Canada Research Data Centre Network, Danish Data Archive, DDI Alliance, GESIS-ZUMA, National Opinion Research Center (NORC), Open Data Foundation (ODaF), and the UK Data Archive (UKDA)

• Web site coming soon

Page 23

UKDA Data Exchange Tools (DExT)

• Aims to develop, refine and test models for data exchange for both survey data and qualitative research data based on XML/RDF schema, and to develop tools for data import and export
• Researches the feasibility of developing automated conversion procedures for legacy formats
• ODaF currently involved in the data conversion tool and qualitative metadata (QuDExT)

• http://www.data-archive.ac.uk/dext/

Page 24

NORC Data Enclave

• National Opinion Research Center
• Provides a secure environment within which authorized researchers can access sensitive microdata remotely from their offices or onsite
• Data from the National Institute of Standards and Technology's (NIST) Technology Innovation Program (TIP), the Ewing Marion Kauffman Foundation, and the Economic Research Service at the US Department of Agriculture
• Possibly the first virtual data enclave
• http://dataenclave.norc.org

Page 25

Canada RDC Project

• Consists of 14 Research Data Centres, 6 branch RDCs and the Federal Research Data Centre in Ottawa
• Data provided by Statistics Canada
• RDCs are now connected through a high speed secure network
• Project to adopt a DDI 3.0 based metadata framework for survey documentation and research work and sponsor development of tools
• ODaF providing technical assistance
• http://www.statcan.ca/english/rdc/index.htm

Page 26

EU 7th Research Framework Program

• Under Socio-economic Sciences and Humanities – related specific 2007 objectives: to bring together existing research infrastructures to support the efficient provision of essential research services

• INFRA-2008-1.1.2.27: promoting European wide access to microdata sets of official statistics for research and leading to a European statistical system open to researchers.
– INFRA-2008-1.1.2.28 (through the development, harmonisation and optimal use of indicators and data for economic and innovation research)
– INFRA-2008-1.1.2.29 (Developing improved access to historical archives and cultural collections for research purpose).
• Call coming out this month (due mid-Feb)
• Proposal will be made for RDC networking/remote access, data disclosure and metadata (Germany contact is Stefan Bender at IAB Nurnberg RDC)

Page 27

Conclusions

• Metadata specifications available but need tools

• Lots of complementary ongoing initiatives and potential synergies

• Need coordination and partnerships (ODaF)