COMPLETING THE CIRCLE:

PERSPECTIVES ON INTEGRATING DATASETS IN BASIC RESEARCH AND DISCOVERY

Panelists: Mary Vardigan, John Kunze, Michael Diepenbroek, Nigel Robinson

January 31, 2013

PANELIST CONTACT INFORMATION

  Nigel Robinson (moderator, presenter)
    Director, York Operations, Thomson Reuters, United Kingdom
    nigel.robinson@thomsonreuters.com
    go.thomsonreuters.com/DCI

  Mary Vardigan
    Assistant Director, Inter-university Consortium for Political and Social Research (ICPSR), United States
    vardigan@umich.edu
    http://www.icpsr.umich.edu/icpsrweb/landing.jsp

  John Kunze
    Associate Director, University of California Curation Center, California Digital Library, United States
    john.kunze@ucop.edu
    http://www.cdlib.org/

  Michael Diepenbroek
    Managing Director, PANGAEA Data Publisher for Earth & Environmental Sciences (ICSU World Data System - WDS), Germany
    mdiepenbroek@pangaea.de
    www.pangaea.de

AGENDA

  INTRODUCTION

  GUEST SPEAKERS

  Q&A

THE DIGITAL UNIVERSE EXPANSION

DIGITAL SCHOLARSHIP

Digital Scholarship

  Very visible within the literature as a concept
  Articles, projects, university labs all devoted to digital scholarship in various ways

Interested Parties

  Authors / researchers
  Research administrators
  Librarians, data archivists
  Publishers
  Grant funding organizations

Content

  Discipline-specific and multidisciplinary content
  Needs and requirements vary by discipline
  Diverse content formats, with few standards
  Includes collaboration and communications

“Data is the new gold” – Neelie Kroes, EU Digital Agenda Commissioner

THE INCREASING VISIBILITY OF DATA

  Grant funding agencies

  Journal publishers
    –  Publisher website
    –  Data journals

  Data repositories & registration agencies

SHARING RESEARCH DATA: HOW CAN WE ENCOURAGE GOOD PRACTICE?

Mary Vardigan Assistant Director, ICPSR January 31, 2013

OUTLINE OF PRESENTATION

  What is ICPSR?

  Importance of data sharing

  Ways ICPSR is encouraging good practice

  Benefits of the data citation index

WHAT IS ICPSR?

  Repository of social science data established in 1962

  Over 8,000 studies, over 60,000 datasets

  Membership-based organization – over 700 members

  Source for training in statistics and data curation through the summer program

  www.icpsr.umich.edu

IMPORTANCE OF DATA SHARING

  Open scientific inquiry – Findings can be verified

  New research – Extend original findings, address new questions

  Reduced costs – Large collections such as the General Social Survey are intended for sharing (over 9,000 publications based on them)

  Training – Students benefit from using others’ data

MORE ON DATA SHARING

  Colleagues surveyed principal investigators on data sharing behavior

  Findings: when data are shared, two to three times as many primary publications result¹

  Data sharing leads to more science, more knowledge

¹ A. Pienta, G. Alter, J. Lyle (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

ENCOURAGING GOOD PRACTICE

PROMOTE STANDARDS FOR DATA CITATION

  Have provided data citations since 1990, DOIs since 2008

  With Data-PASS partners, contacted major journals in sociology, economics, and political science
    –  Highlighted past data citation practices
    –  Emphasized use of citations and persistent identifiers
    –  ASR (American Sociological Review) revised its submission guidelines to reflect a data citation requirement

BUILD COMMUNITY ENGAGEMENT

  Sloan Foundation grant: three meetings

  Establish consistent citation of data in social science journals

  Promote research transparency and replication

  Optimize editorial workflows related to data

  Develop common standards/solutions for repositories

  Explore models for sustainability of repositories

  Challenge grants – three to five grants of up to $20,000 each (announcement coming soon)

OFFER RICH METADATA

  Metadata are key to discovery and to effective data use

  ICPSR’s 8,000 studies have structured XML metadata compliant with the DDI (Data Documentation Initiative) standard

  We also provide searchable metadata at the question/variable level
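Because the metadata are structured, searching at the question/variable level can be a simple walk over the XML tree. The sketch below is a minimal, hypothetical example, assuming a simplified DDI Codebook-style record (XML namespace omitted; the study title, variable names, and question text are invented for illustration). It uses only Python's standard library and is not ICPSR's actual search implementation.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical, simplified DDI Codebook-style record; namespace omitted and
# all content invented for illustration.
SAMPLE_DDI = """
<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Example Household Survey</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="V1">
      <labl>Age of respondent</labl>
      <qstn><qstnLit>What is your age?</qstnLit></qstn>
    </var>
    <var name="V2">
      <labl>Household income</labl>
      <qstn><qstnLit>What was your total household income last year?</qstnLit></qstn>
    </var>
  </dataDscr>
</codeBook>
"""


def search_variables(ddi_xml: str, keyword: str) -> list[tuple[str, str]]:
    """Return (variable name, label) pairs whose label or question text contains keyword."""
    root = ET.fromstring(ddi_xml.strip())
    needle = keyword.casefold()
    hits = []
    for var in root.iter("var"):
        label = var.findtext("labl", default="")
        question = var.findtext("qstn/qstnLit", default="")
        # Case-insensitive match against both the label and the literal question text.
        if needle in label.casefold() or needle in question.casefold():
            hits.append((var.get("name", ""), label))
    return hits


if __name__ == "__main__":
    print(search_variables(SAMPLE_DDI, "income"))  # [('V2', 'Household income')]
```

Matching on the literal question text as well as the short label is what distinguishes variable-level search from searching study titles alone.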

LINK DATA AND PUBLICATIONS

  ICPSR’s Bibliography of Data-related Literature – 60,000 citations

  Two-way linking: Data link to publications, publications to data

  Forms the basis for information in the Data Citation Index (DCI)

  Proper data citation practice with DOIs can automate links between data and publications
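To illustrate how consistent DOI use lets two-way links be built automatically, here is a minimal sketch, assuming each publication's reference list is available as plain text and a repository holds the set of its own dataset DOIs. The regular expression follows Crossref's published pattern for modern DOIs; the publication IDs (`pub-A`, `pub-B`) and helper names are hypothetical, and this is not how the Data Citation Index itself is built.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Pattern based on Crossref's guidance for matching most modern DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(text: str) -> set[str]:
    """Find DOIs in free text, trimming trailing punctuation.

    DOI names are case-insensitive, so they are lower-cased for matching.
    """
    return {m.group(0).rstrip(".,;:").lower() for m in DOI_PATTERN.finditer(text)}


def build_links(references: dict[str, str], dataset_dois: set[str]):
    """Return two maps: publication -> cited datasets, dataset -> citing publications."""
    known = {doi.lower() for doi in dataset_dois}
    pub_to_data: dict[str, set[str]] = defaultdict(set)
    data_to_pub: dict[str, set[str]] = defaultdict(set)
    for pub_id, text in references.items():
        # Only DOIs that belong to the repository's own datasets become links.
        for doi in extract_dois(text) & known:
            pub_to_data[pub_id].add(doi)
            data_to_pub[doi].add(pub_id)
    return dict(pub_to_data), dict(data_to_pub)


if __name__ == "__main__":
    refs = {
        "pub-A": "Data: http://dx.doi.org/10.3886/ICPSR09907.v1.",
        "pub-B": "No dataset cited.",
    }
    forward, backward = build_links(refs, {"10.3886/ICPSR09907.v1"})
    print(forward)   # {'pub-A': {'10.3886/icpsr09907.v1'}}
    print(backward)  # {'10.3886/icpsr09907.v1': {'pub-A'}}
```

Both directions come from a single pass, which is the point of citing data with DOIs: once the identifier appears in the reference, no manual matching is needed.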

PARTICIPATE IN THE DATA CITATION INDEX

  Reinforces good practice – linking publications and data, data citation, metadata, access

  Brings greater visibility for data resources and data producers

  Elevates status of research data

  Highlights DOIs for data prominently

  Broadens resource discovery across disciplines

  Shows impact of investment in data to funders

SUGGESTIONS FOR THE DATA CITATION INDEX

  Add more links between data and publications for more repositories

  Integrate data fully into the Web of Knowledge, using appropriate language (e.g., “this dataset has been cited”)

THANK YOU…

Mary Vardigan ICPSR vardigan@umich.edu

LIBRARY TOOLS SUPPORTING DATA-RICH RESEARCH

UNIVERSITY OF CALIFORNIA CURATION CENTER, CALIFORNIA DIGITAL LIBRARY

THE RESEARCH DATA PROBLEM

                                                     Journal article   Research data
  Uniquely and persistently identified               Yes               Nope
  Concept of “publish”                               Yes               Not really
  Multiple copies                                    Yes               Typically one
  Easily findable                                    Yes               Difficult
  Services: impact metrics, citation tracking, etc.  Yes               Nope

Research data is seen as a second-class citizen in the scholarly record.  

WHERE CAN LIBRARIES MAKE A DIFFERENCE?

[Diagram: Research & Scholarship Lifecycle (Collect, Save, Publish, Share, Research, Create Knowledge)]

COLLECT> PUBLISH> SHARE> SAVE> RESEARCH

Capture today’s web; build tomorrow’s archives

Create, edit, share, and save data management plans

Open source curation add-in for Microsoft Excel

Create and manage persistent identifiers: ARKs, DOIs, etc.

An infrastructure to publish and get credit for sharing research data
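Persistent identifiers only pay off if they resolve. As a minimal sketch, assuming identifiers arrive either as `ark:/NAAN/name` (or `ark:NAAN/name`) or as a DOI optionally prefixed with `doi:`, the hypothetical helper below classifies an identifier and builds a URL on a public resolver (doi.org for DOIs, n2t.net for ARKs). The simplified patterns, such as a numeric NAAN of at least five digits, are assumptions for illustration and do not implement either specification in full.

```python
from __future__ import annotations

import re

DOI_RESOLVER = "https://doi.org/"
ARK_RESOLVER = "https://n2t.net/"

# Simplified patterns (assumption): numeric NAAN for ARKs, "10.<prefix>/<suffix>" for DOIs.
ARK_RE = re.compile(r"^ark:/?(\d{5,})/(\S+)$", re.IGNORECASE)
DOI_RE = re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)


def resolver_url(identifier: str) -> tuple[str, str]:
    """Classify a persistent identifier and return (scheme, resolvable URL)."""
    ident = identifier.strip()
    ark = ARK_RE.match(ident)
    if ark:
        naan, name = ark.groups()
        # Normalise both ARK spellings to the "ark:/NAAN/name" form.
        return "ark", f"{ARK_RESOLVER}ark:/{naan}/{name}"
    doi = DOI_RE.match(ident)
    if doi:
        return "doi", DOI_RESOLVER + doi.group(1)
    raise ValueError(f"unrecognised identifier: {identifier!r}")


if __name__ == "__main__":
    print(resolver_url("doi:10.3886/ICPSR09907.v1"))
    print(resolver_url("ark:/12345/x5example"))  # hypothetical ARK
```

Keeping the resolver base separate from the identifier reflects the idea that the identifier, not any particular hostname, is the persistent part.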

Curation repository: store, manage, preserve, and share research data

Open deposit, open access repository for spreadsheet data

Data Observation Network for Earth

What’s missing to complete the “incentive” circuit?

  Impact measures, citation tracking

“Connecting the data to the research it informs”

Altmetrics tools to measure non-traditional products and uses, etc.

THE REST OF THE STORY

www.cdlib.org/uc3

dataup.cdlib.org

www.escholarship.org

wokinfo.com/products_tools/multidisciplinary/dci/

John.Kunze@ucop.edu

RESEARCH DATA ENTERS SCHOLARLY COMMUNICATION: TOWARDS AN INFRASTRUCTURE FOR DATA PUBLICATION IN THE EMPIRICAL SCIENCES

Michael Diepenbroek, Hannes Grobe, Uwe Schindler PANGAEA® - AWI / MARUM

PREREQUISITES FOR DATA PUBLICATION?

  Licenses (Creative Commons)
  Business models
  Open Access
  Persistent identification
  Trusted & certified archives

[Chart: effort needed and value for data and for articles, as seen by researchers and by publishers. Source: PARSE Insight, Report 3.4, www.parse-insight.eu]

PREREQUISITES FOR DATA PUBLICATION?

  QA/QC -> review procedures
  (Meta)data & interoperability standards (machine readable)

[Graphic: file formats DOC, PDF, CSV, NetCDF, TXT, XML, XLSX, XLS, GRIB]

OECD principles and guidelines for access to research data (2007):
  Professionalism
  Interoperability
  Quality
  Efficiency

PREREQUISITES FOR DATA PUBLICATION?

  Citability

[Diagram: articles and data over time]

PREREQUISITES FOR DATA PUBLICATION?

COLLABORATION BETWEEN DATA ARCHIVES & SCIENCE JOURNALS

  Linking editorial workflows
  Linking services

ICSU WDS – ROLES & RELATIONS IN A FEDERATED SYSTEM

  Publishers: commercial, open access (e.g., ESSD journal), cross-referencing
  Data Collection & Processing Facilities: QA/QC, data products, also data rescue
  Data Archiving & Publication Facilities: certified repositories
  Related Networks & Programs: GEOSS, GMES, WMO-IS, IOC, etc.
  Metadata & Data Services: web portals, catalogues
  Visualisation & Analysis: compute systems, virtual labs, GIS systems
  Research Institutions: universities, research institutes
  Research Projects / Programs: national, EU, international
  Libraries: DOI registry, interdisciplinary catalogues
  Research Facilities: satellites, vessels, observatories, alert systems, etc.
  Education & Outreach
  Scientific Communities & Other Stakeholders

Datasets and Data Citation Index, 2013 ~ www.icsu-wds.org

BIBLIOMETRICS

35% to 69% more citations

Courtesy of Jon Sears (AGU)

Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308

PANGAEA MULTIDISCIPLINARY DATA ARCHIVE AND PUBLISHER

  Hydrosphere, Lithosphere, Atmosphere, Cryosphere
  Total number of data sets: ~350,000
  Data items: ~6.3 billion

LINKING INFRASTRUCTURE

  Publishers: Elsevier, Nature, Springer, Wiley
  Bibliometrics: Thomson
  Catalogues: Scopus, WDS, GEOSS
  PANGAEA linking: web services (WS), OAI-PMH
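OAI-PMH, one of the linking channels listed above, is a simple HTTP protocol for harvesting metadata records. The sketch below, assuming a hypothetical endpoint `https://repository.example.org/oai` and the `oai_dc` metadata format, builds `ListRecords` request URLs and pulls record identifiers and the `resumptionToken` (used to page through large result sets) out of a response. It makes no network calls and is not PANGAEA's actual service.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical one-page response, invented for illustration.
SAMPLE_PAGE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repository.example.org:1</identifier></header></record>
    <record><header><identifier>oai:repository.example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: str | None = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH 2.0, so on
    follow-up requests it replaces metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[str], str | None]:
    """Return (record identifiers, resumption token or None) from one response page."""
    root = ET.fromstring(xml_text.strip())
    ids = [el.text or "" for el in
           root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty token means there are no further pages.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return ids, token or None


if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(parse_list_records(SAMPLE_PAGE))
```

A harvester would loop: request a page, store its records, and request again with the returned token until `parse_list_records` reports `None`.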

DATA PUBLISHING – CROSS-REFERENCING

LINKING INFRASTRUCTURE

[Diagram: several data archives connected to publishers, bibliometrics services, and catalogues]

ICSU WDS PERSPECTIVE

  ICSU WDS
  Certified Data Archives
  Journals
  Registries: Crossref, DataCite, ORCID
  Bibliometric Services: Thomson Reuters Citation Indexes
  Catalogues: Web of Knowledge, Google Scholar, Scopus

COLLABORATING TO CREATE THE DATA CITATION INDEX

Nigel Robinson

DEPOSITION OF DATA BY RESEARCHERS

[Chart: responses to Q16, “Where do you place your non-traditional scholarly output to make it available to others?” (n=471). Response options: publisher website; repository managed by a third party (e.g., domain-…); department or institutional repository; personal website; other. Reported shares: 24%, 36%, 47%, 51%, 17%.]

RESEARCHERS NOT RECEIVING CREDIT

  Barriers to creating and sharing data:
    –  Work is not adequately exposed or accredited
    –  Data repositories do not have clear standards or mechanisms in place for doing so

BARRIERS TO RESEARCHERS CITING DATA

  Researchers agree that data should be cited, but there are currently no universally accepted standards for citing data

  “Lack of knowledge about standards for citation and of proper scholarly recognition and/or evaluation of such materials…”
  “…cumbersome citation formats including very long internet addresses.”
  “Incomplete citation information available (dates and real author names as distinct from aliases).”

DATA CITATION BEHAVIOUR

  Current citation style (in full text of article)
  Desired/future citation style (as part of cited references)

  U.S. Dept. of Justice, Bureau of Justice Statistics (1996): MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988. Version 1. Inter-university Consortium for Political and Social Research. http://dx.doi.org/10.3886/ICPSR09907.v1

  Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho, Sangchul; Hwang, Daehee (2008): GSE11574: The responses of astrocytes … extracellular α-synuclein. Gene … acc=GSE11574
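The desired style treats a dataset like any other cited reference, assembled from a handful of structured fields. A minimal sketch, assuming a repository can supply creator, year, title, version, publisher, and DOI for each dataset (`DatasetRecord` and `format_data_citation` are hypothetical names), reproduces the pattern of the ICPSR example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata needed for a dataset citation."""
    creator: str
    year: int
    title: str
    version: str
    publisher: str
    doi: str


def format_data_citation(rec: DatasetRecord, resolver: str = "http://dx.doi.org/") -> str:
    """Format as: Creator (Year): Title. Version N. Publisher. <resolver><DOI>."""
    return (f"{rec.creator} ({rec.year}): {rec.title}. Version {rec.version}. "
            f"{rec.publisher}. {resolver}{rec.doi}")


if __name__ == "__main__":
    record = DatasetRecord(
        creator="U.S. Dept. of Justice, Bureau of Justice Statistics",
        year=1996,
        title="MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988",
        version="1",
        publisher="Inter-university Consortium for Political and Social Research",
        doi="10.3886/ICPSR09907.v1",
    )
    print(format_data_citation(record))
```

The resolver prefix is a parameter because the DOI, not the resolver hostname, is the persistent part; the DOI Foundation now recommends `https://doi.org/` over the older `dx.doi.org` form.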

OBSERVED RESEARCHER PROBLEMS

  Access & discovery

  Citation standards

  Lack of willingness to deposit and cite

  Lack of recognition / credit

WHERE DO WE START?

  Enable the discovery of data repositories, data studies and data sets in the context of traditional literature

  Help researchers find data sets and studies and track the full impact of their research output

  Provide expanded measurement of researcher and institutional research output and assessment

  Facilitate more accurate and comprehensive bibliometric analyses

DATA REPOSITORIES

  Over 500 repositories identified

INDEXING A DATA REPOSITORY ON WEB OF KNOWLEDGE

  Repository/Source: Comprises data studies, data sets and/or microcitations. Stores and provides access to the raw data.

  Data Study: Descriptions of studies or experiments, together with the associated data used in them. Includes serial or longitudinal studies over time.

  Data Set: A single or coherent set of data or a data file provided by the repository, as part of a collection, data study or experiment.

  Microcitation (nanopublication): An assertion about concepts that have been found to be linked by scientific enquiry, and that can be uniquely identified and attributed to its author. Made up of three parts: a subject, a predicate and an object.
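Because a microcitation is defined as a subject-predicate-object assertion that can be uniquely identified and attributed, it maps naturally onto a small record type. A minimal sketch follows; deriving the identifier from a SHA-256 hash of the content is an assumption made for illustration (published nanopublications are usually expressed as RDF graphs with their own identifier schemes), and the concepts and author are placeholders.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Microcitation:
    """A subject-predicate-object assertion attributed to an author."""
    subject: str
    predicate: str
    obj: str
    author: str

    def identifier(self) -> str:
        """Hypothetical content-derived ID: the same assertion by the same author gets the same ID."""
        payload = "\t".join((self.subject, self.predicate, self.obj, self.author))
        return "mc:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    claim = Microcitation("ex:ConceptA", "ex:isLinkedTo", "ex:ConceptB", "Example, A.")
    print(claim.identifier())
```

Including the author in the hashed payload means the same assertion made by two different people yields two citable records, which keeps attribution intact.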

Record Types

DCI record types: data repository, data study, data set, microcitation

Workflow: descriptive metadata feed from repository -> repository raw metadata is analysed -> metadata added

DATA REPOSITORY MODEL

Repository -> Data Study -> Data Set -> Microcitation

Data Study, Data Set and Microcitation levels are optional
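The hierarchy can be sketched as nested record types in which every level below the repository may be empty. This is a minimal sketch, assuming (beyond what the slide states) that a data set may hang either under a data study or directly under the repository; the class and function names are hypothetical, not the Data Citation Index's internal schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSet:
    title: str
    microcitations: list[str] = field(default_factory=list)


@dataclass
class DataStudy:
    title: str
    data_sets: list[DataSet] = field(default_factory=list)


@dataclass
class Repository:
    name: str
    studies: list[DataStudy] = field(default_factory=list)
    # Assumption: data sets may also be attached directly to the repository.
    data_sets: list[DataSet] = field(default_factory=list)


def count_records(repo: Repository) -> dict[str, int]:
    """Count records at each level; lower levels are optional and may be zero."""
    all_sets = repo.data_sets + [ds for study in repo.studies for ds in study.data_sets]
    return {
        "repository": 1,
        "data_study": len(repo.studies),
        "data_set": len(all_sets),
        "microcitation": sum(len(ds.microcitations) for ds in all_sets),
    }


if __name__ == "__main__":
    bare = Repository("Example Repository")
    print(count_records(bare))
    # {'repository': 1, 'data_study': 0, 'data_set': 0, 'microcitation': 0}
    full = Repository(
        "Example Repository",
        studies=[DataStudy("Study 1", [DataSet("Set 1", ["mc:1", "mc:2"])])],
        data_sets=[DataSet("Set 2")],
    )
    print(count_records(full))
    # {'repository': 1, 'data_study': 1, 'data_set': 2, 'microcitation': 2}
```

Default-empty lists are what make the lower levels optional: a repository-only record is valid on its own.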

CHALLENGES

  Metadata availability
    –  Lack of resources
    –  Lack of expertise

  Metadata quality
    –  Metadata inconsistencies

  Data repositories are not static

  Partnerships

DATA CITATION INDEX - METADATA PARTNERSHIPS

[Diagram, two panels: DataCite, Repositories 1–3, and the Data Citation Index]

COLLABORATION BENEFITS

  Any repository providing metadata to the aggregator is included in the Data Citation Index

  Uniform data

  Faster and more frequent updates
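The “uniform data” benefit comes from mapping each repository's own field names onto one shared schema before records reach the index. A minimal sketch, assuming two hypothetical repositories with invented field names (`ttl`, `pid`, `yr` and `name`, `doi`, `published`); a real aggregator such as DataCite works with its own published metadata schema, which this does not reproduce.

```python
from __future__ import annotations

from typing import Iterable, Mapping

UNIFORM_FIELDS = ("title", "identifier", "year")

# Hypothetical per-repository mappings from local field names to uniform ones.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "repo1": {"ttl": "title", "pid": "identifier", "yr": "year"},
    "repo2": {"name": "title", "doi": "identifier", "published": "year"},
}


def normalize(repo: str, raw: Mapping[str, object]) -> dict[str, object]:
    """Rename known fields; uniform fields missing from the source become None."""
    mapping = FIELD_MAPS[repo]
    renamed = {uniform: raw[local] for local, uniform in mapping.items() if local in raw}
    record = {name: renamed.get(name) for name in UNIFORM_FIELDS}
    record["source"] = repo
    return record


def aggregate(feeds: Mapping[str, Iterable[Mapping[str, object]]]) -> list[dict[str, object]]:
    """Merge per-repository feeds into one list of uniformly shaped records."""
    return [normalize(repo, raw) for repo, records in feeds.items() for raw in records]


if __name__ == "__main__":
    feeds = {
        "repo1": [{"ttl": "Survey A", "pid": "10.1234/a", "yr": 2012}],
        "repo2": [{"name": "Survey B", "doi": "10.1234/b"}],
    }
    for rec in aggregate(feeds):
        print(rec)
```

Missing fields surface as explicit `None` values rather than absent keys, so every record reaching the index has the same shape.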

QUESTION & ANSWER

Please type any questions into the WebEx chat panel.