Dataset Citation and Identification

Dataset citation and identification. Adam Farquhar, PhD, Head of Digital Library Technology, The British Library; President, DataCite. December 2009.

Description

This presentation sets out some of the challenges around citing and identifying datasets and introduces DataCite, the international data citation initiative. DataCite was founded on 1 December 2009 to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence. This presentation was given by Adam Farquhar at the STM Publishers Association Innovation Conference on 4 December 2009.

Transcript of Dataset Citation and Identification

Page 1: Dataset Citation and Identification

Dataset citation and identification

Adam Farquhar, PhD

Head of Digital Library Technology, The British Library

President, DataCite

December 2009

Page 2: Dataset Citation and Identification

Widening gap

A widening gap in the scientific record between published research and the data that underlies it

Published work held by libraries

Datasets held by data centres

No effective way to link between datasets and articles

No widely used method to identify datasets

No widely used method to cite datasets

As a result, datasets are

Difficult to discover

Difficult to access

Second-class citizens in the scientific record

Page 3: Dataset Citation and Identification

Datasets – first class citizens?

Datasets

Data is difficult to manage after project funding ceases

Informal networks provide the primary means of sharing

Only 21% use a national or international facility

Datasets are not included in impact analysis

Good luck finding it or getting permission to use it (your discipline may vary)

Source: UKRDS Study

Published articles

Libraries ensure long-term storage and management

Established funded services provide the primary means of access

Nearly all published articles are held in multiple national libraries

Articles and citations form the backbone of impact analysis

Catalogues and full-text search support discovery

Page 4: Dataset Citation and Identification

Dataset citation using Digital Object Identifiers (DOIs)

The DOI system offers an easy way to connect the article with the underlying data

Several organisations already assign DOIs to datasets:

IUCr, ICPSR, and OECD, through CrossRef

PANGAEA, Mare, and others, through TIB (German National Library of Science and Technology)

Dataset: G. Yancheva, N. R. Nowaczyk et al. (2007) Rock magnetism and X-ray fluorescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA

doi:10.1594/PANGAEA.587840

Article: G. Yancheva, N. R. Nowaczyk et al. (2007) Influence of the intertropical convergence zone on the East Asian monsoon, Nature 445, 74-77

doi:10.1038/nature05431

Article cites dataset
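The two citations on this slide follow one pattern: creators, year, title, source (data centre or journal), and a DOI that a resolver turns into a link. A minimal Python sketch of that pattern, assuming a simple record with exactly those fields; `Citation`, `format_citation`, and `resolver_url` are hypothetical names for illustration, not part of any DataCite or CrossRef API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    creators: str  # as printed on the slide, e.g. "G. Yancheva, N. R. Nowaczyk et al."
    year: int
    title: str
    source: str    # data centre for a dataset, journal details for an article
    doi: str       # bare DOI, without a "doi:" prefix


def format_citation(c: Citation) -> str:
    """Creators (year) title, source, doi:... (the layout used on the slide)."""
    return f"{c.creators} ({c.year}) {c.title}, {c.source}, doi:{c.doi}"


def resolver_url(doi: str) -> str:
    """Actionable link through the public DOI resolver at https://doi.org/."""
    return "https://doi.org/" + doi


dataset = Citation(
    creators="G. Yancheva, N. R. Nowaczyk et al.",
    year=2007,
    title=("Rock magnetism and X-ray fluorescence spectrometry analyses on "
           "sediment cores of the Lake Huguang Maar, Southeast China"),
    source="PANGAEA",
    doi="10.1594/PANGAEA.587840",
)
article = Citation(
    creators="G. Yancheva, N. R. Nowaczyk et al.",
    year=2007,
    title="Influence of the intertropical convergence zone on the East Asian monsoon",
    source="Nature 445, 74-77",
    doi="10.1038/nature05431",
)

# Article and dataset are cited the same way, so both become stable links.
for item in (article, dataset):
    print(format_citation(item))
    print(resolver_url(item.doi))
```

Treating the dataset and the article with the same record type is the point of the slide: once both carry DOIs, the dataset can be cited and linked exactly like the article. DOIs containing characters such as `#` or `?` would need percent-encoding before being appended to the resolver URL; this sketch does not handle that.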

Page 5: Dataset Citation and Identification

DataCite – International Data Citation Initiative

Our long-term vision is to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.

Milestones

2005, Hannover: TIB begins to issue DOIs for datasets

March 2009, Paris: memorandum signed at ICSTI

December 2009, London: DataCite Association founded

(DataCite : Data Centres :: CrossRef : Publishers)

Page 6: Dataset Citation and Identification

Global partnership

Germany - Technische Informationsbibliothek (TIB)

United Kingdom - The British Library

France - L’Institut de l’Information Scientifique et Technique (INIST)

Switzerland - Library of the ETH Zürich

Denmark - Technical Information Center of Denmark

Netherlands - Library of TU Delft

Canada - Canadian Institute for Scientific and Technical Information (CISTI)

Australia - Australian National Data Service (ANDS)

USA - California Digital Library

USA - Purdue University

Page 7: Dataset Citation and Identification

DataCite

The DataCite registration agency

Maintains the resolution infrastructure

Maintains a searchable database of metadata

Manages the identifiers over the long term

Establishes and shares best practice

Publishing agents (data centres, research institutes, publishers) are responsible for

Quality assurance

Content storage and access

Creating the identifier

Creating and updating metadata
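The division of labour above can be illustrated with a toy in-memory registry: the publishing agent supplies the identifier, the content location, and the metadata, while the registration agency resolves identifiers and answers metadata searches. This is a sketch under stated assumptions only; `Registry`, `register`, `update`, `resolve`, `search`, and the example.org URLs are all hypothetical and do not describe DataCite's actual services or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    doi: str        # identifier as created by the publishing agent
    url: str        # current content location, kept up to date by the publishing agent
    metadata: dict  # descriptive metadata, created and updated by the publishing agent


@dataclass
class Registry:
    """Toy stand-in for a registration agency (hypothetical; not DataCite's API)."""
    records: Dict[str, Record] = field(default_factory=dict)

    def register(self, doi: str, url: str, metadata: dict) -> None:
        key = doi.lower()  # DOI names are case-insensitive, so normalise the lookup key
        if key in self.records:
            raise ValueError(f"DOI already registered: {doi}")
        self.records[key] = Record(doi, url, dict(metadata))

    def update(self, doi: str, url: Optional[str] = None,
               metadata: Optional[dict] = None) -> None:
        # Content can move and metadata can change; the identifier itself stays fixed.
        record = self.records[doi.lower()]
        if url is not None:
            record.url = url
        if metadata is not None:
            record.metadata.update(metadata)

    def resolve(self, doi: str) -> str:
        # Resolution: identifier -> current location of the content.
        return self.records[doi.lower()].url

    def search(self, term: str) -> List[str]:
        # Metadata search: naive case-insensitive substring match over all values.
        term = term.lower()
        return [r.doi for r in self.records.values()
                if any(term in str(v).lower() for v in r.metadata.values())]


registry = Registry()
registry.register(
    "10.1594/PANGAEA.587840",
    "https://example.org/data/587840",  # hypothetical landing page
    {"title": "Rock magnetism and X-ray fluorescence spectrometry analyses on "
              "sediment cores of the Lake Huguang Maar, Southeast China",
     "publisher": "PANGAEA"},
)
# The data centre moves the content; citations using the DOI keep working.
registry.update("10.1594/PANGAEA.587840", url="https://example.org/archive/587840")
print(registry.resolve("10.1594/pangaea.587840"))  # -> https://example.org/archive/587840
print(registry.search("huguang"))                  # -> ['10.1594/PANGAEA.587840']
```

The design choice worth noticing is that `update` changes the location but never the key: long-term management of the identifier is what lets a citation outlive a move of the underlying data.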

Page 8: Dataset Citation and Identification

DataCite Structure

[Diagram: DataCite, its member institutions and their data centres, the International DOI Foundation, and the managing agent (TIB); membership types: Member, Associate, Stakeholder]

Page 17: Dataset Citation and Identification

Research Data in Articles

Page 18: Dataset Citation and Identification

How can we work together?

DataCite supports researchers by enabling them to locate, identify, and cite research datasets with confidence

This is the start of a conversation

We welcome your comments, questions, and ideas!

Contact: adam.farquhar {@} bl.uk

jan.brase {@} tib.uni-hannover.de

Help to establish best practices

Adjust author policies to require clear, unambiguous citations for datasets

Integrate links to datasets into delivery platforms

Collaborate to understand evolving roles and responsibilities among publishers, data centres, and libraries

Help me to rewrite this list!
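One possible first step toward integrating links to datasets into delivery platforms is to find DOIs in reference text and turn each into a resolver link. A minimal sketch, assuming the regular expression Crossref has published for matching modern DOIs (`10.`, a 4 to 9 digit registrant code, a slash, then a suffix); it will miss some older or unusual DOIs, and `find_dois` and `link_dois` are hypothetical helper names.

```python
import re
from typing import List, Tuple

# Approximation of the pattern Crossref has published for modern DOIs;
# it does not match every DOI ever issued.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_dois(text: str) -> List[str]:
    """Find DOI-like strings in free text, dropping trailing sentence punctuation."""
    return [match.rstrip(".,;") for match in DOI_PATTERN.findall(text)]


def link_dois(text: str) -> List[Tuple[str, str]]:
    """Pair each DOI with a link through the public https://doi.org/ resolver."""
    return [(doi, "https://doi.org/" + doi) for doi in find_dois(text)]


reference = (
    "G. Yancheva, N. R. Nowaczyk et al. (2007) Influence of the intertropical "
    "convergence zone on the East Asian monsoon, Nature 445, 74-77, "
    "doi:10.1038/nature05431. Data: PANGAEA, doi:10.1594/PANGAEA.587840."
)

for doi, url in link_dois(reference):
    print(f"{doi} -> {url}")
```

This only works when authors write the DOI into the citation, which is why the policy suggestion above (clear, unambiguous dataset citations) comes first: a platform can link only what it can reliably find.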