Poster presented at ISMB2011 in Vienna July 18


description

Contributor identification is a core challenge in data publication. As in scholarly communication more generally, non-unique person names and the current lack of a global identification infrastructure for producers of scholarly content make it difficult to establish the identity of authors and other contributors. This in turn makes it difficult to accurately attribute datasets published via online digital repositories to their creators, which is one of several key requirements for including these important outputs in the scholarly record.

In the GEN2PHEN project (http://www.gen2phen.org) we are developing a series of novel web-based systems and processes for online dissemination of genetic variation and other research data. The core aim is to ensure that data creators are recognized and rewarded for publishing data. This work builds on and integrates with recently launched international initiatives to i) extend and adapt the existing DOI infrastructure for identifying, locating and citing online datasets (DataCite: http://www.datacite.org), and to ii) create a global registry of unique identifiers for authors and other contributors (ORCID: http://www.orcid.org). The technical approach we are exploring in this pilot project uses this emerging global data citation and contributor identification framework to allow published datasets to be discovered, cited in a scholarly context and unambiguously attributed. We argue that, along with other measures, such an incentive-based approach is key to motivating the sharing of data and other types of digital research outputs in the life sciences.

This document is published under the CC-BY license (http://creativecommons.org/licenses/by/3.0/). This means that you can copy, redistribute and adapt the content, as long as you attribute the original work.

Transcript of Poster presented at ISMB2011 in Vienna July 18


GEN2PHEN is funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 200754.

www.gen2phen.org

Motivating online sharing of research data

Box 1: Identifying digital research outputs

Challenges

Current methods for monitoring data use/reuse and assessing impact rely on various referencing standards and conventions. Tracking reuse is difficult, time-consuming and inaccurate, not least because the datasets in question are hard to identify.

Existing and emerging solutions

Assigning digital identifiers (IDs) to published works allows them to be reliably identified and cited. In order to fulfill the requirements of the scholarly record, IDs should be persistent, globally unique and citable. Together with unique IDs for contributors (Box 2), this forms the basis of unambiguous attribution.

Digital Object Identifiers (DOIs) are widely used for identifying and citing STM publications, via the not-for-profit CrossRef publishers' association (http://www.crossref.org). DOIs issued via DataCite (http://datacite.org) are increasingly used for scientific datasets published in online digital repositories.
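As a minimal sketch of how a DOI name might be handled in software, the Python snippet below splits a DOI into its prefix and suffix and builds a resolver URL. The helper names split_doi and doi_url are hypothetical, the example DOI is the illustrative one from the data citation on this poster, and the sketch assumes the public doi.org resolver.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    name = doi.strip()
    # Accept the "doi:" label commonly used in citations.
    if name.lower().startswith("doi:"):
        name = name[4:]
    prefix, sep, suffix = name.partition("/")
    # DOI prefixes start with "10."; reject anything else.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def doi_url(doi: str) -> str:
    """Return a resolver URL for a DOI name (assumes the doi.org resolver)."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"


print(doi_url("doi:10.1255/cafevariome.BRCA2-2352354"))
```

Because the identifier itself stays fixed while the resolver redirects to the dataset's current location, a citation that carries the DOI remains usable if the repository URL changes.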

IRISC2011 - Identity in Research Infrastructure and Scientific Communication

The 2-day IRISC2011 international workshop will be held September 12-13 in Helsinki, Finland. This event will bring together key stakeholders and experts and help foster collaboration, coordination and awareness in this area, not only in biomedicine and bioinformatics but in all areas of scientific research. Agenda and other info at http://irisc-workshop.org

For further information please contact [email protected]

Gudmundur A. Thorisson, Owen Lancaster and Anthony J. Brookes

Department of Genetics, University of Leicester, Leicester, UK

Identifying knowledge contributors

As in scholarly communication more generally, non-unique person names and the current lack of a global identification infrastructure for producers of scholarly content make it difficult to establish the identity of authors and other contributors. This in turn creates challenges in attributing credit for contributions to science, as well as in tracking use/reuse and assessing the impact of research outputs.

We are developing a series of novel web-based systems and processes for online dissemination of genetic variation and other research data. The technical approach we are exploring uses emerging frameworks for data identification and citation (Box 1) and for contributor identification (Box 2) to allow published datasets to be discovered, cited in a scholarly context and unambiguously attributed. The core aim is to ensure that data creators are recognized and rewarded for publishing their data. We argue that, along with other measures, such an incentive-based approach is key to motivating the sharing of data and other types of digital research outputs in the life sciences.

Pilot project: Cafe Variome - facilitating exchange of genetic variation data and attributing data creators

Box 2: Identifying contributors

Challenges

Approximately 2/3 of the ~6M authors in PubMed share a last name + first initial with at least one other author. This name ambiguity creates difficulties in identifying and attributing creators of published works, including datasets published via online digital repositories. Solving the contributor identification challenge is key to including these important outputs in the scholarly record.
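To illustrate what sharing a last name + first initial means in practice, here is a small Python sketch that computes the fraction of authors whose (last name, first initial) key collides with that of at least one other author. The function names and the author list are made up for illustration, the sketch assumes every list entry is a distinct person written as "First ... Last", and it is not the method behind the PubMed figure.

```python
from collections import Counter


def name_key(full_name: str) -> tuple[str, str]:
    """Reduce 'First ... Last' to (last name, first initial), lowercased."""
    parts = full_name.split()
    return parts[-1].lower(), parts[0][0].lower()


def ambiguous_fraction(names: list[str]) -> float:
    """Fraction of authors whose key is shared with at least one other author."""
    keys = [name_key(n) for n in names]
    if not keys:
        return 0.0
    counts = Counter(keys)
    shared = sum(1 for k in keys if counts[k] > 1)
    return shared / len(keys)


# Made-up author list, for illustration only.
authors = [
    "John Smith", "Jane Smith",   # both reduce to ('smith', 'j')
    "Wei Zhang", "Wen Zhang",     # both reduce to ('zhang', 'w')
    "Anna Lee", "Omar Haddad",    # unique keys
]
print(ambiguous_fraction(authors))
```

In this toy list four of the six authors collide, so a name string alone cannot tell them apart; a unique contributor ID removes the need to rely on the name at all.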

Emerging solutions

With contributions from GEN2PHEN, the international ORCID initiative (http://www.orcid.org) is creating a global infrastructure to "support the creation of a permanent, clear and unambiguous record of scholarly communication".

ORCID will enable contributors to be identified via unique IDs and reliably linked with their published works, including but not limited to:

- Peer-reviewed publications (CrossRef DOIs)

- Datasets (DataCite DOIs)

- Publications in the 'grey' literature

The new infrastructure will help solve many current identification-related problems and create new opportunities, such as:

Discovery:

- Which other papers were published by co-authors of this paper?

- Which datasets were made available by this research project?

Evaluation:

- What is the scholarly record of this job applicant?

- How often was the paper we published cited in the last 2 years?

- What is the total number of citations and other references to papers, datasets and other outputs of the project we funded?

Figure: Cafe Variome data flow between diagnostic laboratories, a central 'clearinghouse' and end-users (e.g. LSDB curators).

Publish data: Mutations are submitted from diagnostic labs using “Café Variome enabled” software via a simple button click.

Retrieve Atom feeds: Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval.
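The automated feed-based monitoring mentioned above could look roughly like the following Python sketch, which parses an Atom feed and reports entries that have not been seen before. The feed content, the entry IDs and the function name new_entries are all hypothetical, since the poster does not show the actual Cafe Variome feed layout; only the Atom namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace; everything in SAMPLE_FEED is made up.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example mutation feed</title>
  <entry>
    <id>urn:example:mutation:1</id>
    <title>Variant A</title>
    <updated>2011-01-21T00:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:mutation:2</id>
    <title>Variant B</title>
    <updated>2011-01-22T00:00:00Z</updated>
  </entry>
</feed>"""


def new_entries(feed_xml: str, seen_ids: set[str]) -> list[tuple[str, str]]:
    """Return (id, title) for feed entries whose id is not in seen_ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall("atom:entry", ATOM):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM)
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        if entry_id and entry_id not in seen_ids:
            fresh.append((entry_id, title))
    return fresh


# A client that already holds entry 1 only picks up entry 2.
print(new_entries(SAMPLE_FEED, {"urn:example:mutation:1"}))
```

A real client would fetch the feed over HTTP at intervals and persist the set of seen IDs between runs; both are left out here to keep the sketch self-contained.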

Data citation: G. A. Thorisson (ORCID:35-883-3523) and O. Lancaster (ORCID:35-992-3523). 4x variants in BRCA2 gene. Published online via Cafe Variome. 21 January (2011) doi:10.1255/cafevariome.BRCA2-2352354

G. A. Thorisson, Univ. Leicester [email protected] ORCID:35-883-3523

Unique DOI name for dataset in DataCite, located at: http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354

Unique identifier for contributor in ORCID
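As a rough sketch of how the data citation above could be assembled from structured metadata, the Python snippet below joins contributor names, their ORCID identifiers and the dataset DOI into a single citation string. The Contributor class and format_citation function are hypothetical, the citation layout simply mirrors the example on this poster, and the identifier values are the illustrative ones printed there.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    name: str
    orcid: str  # identifier as printed on the poster (illustrative value)


def format_citation(contributors, title, publisher, date, doi):
    """Build a citation string in the layout used on the poster."""
    # Simple " and " join; a longer author list would need comma handling.
    people = " and ".join(f"{c.name} (ORCID:{c.orcid})" for c in contributors)
    return f"{people}. {title}. Published online via {publisher}. {date} doi:{doi}"


citation = format_citation(
    [Contributor("G. A. Thorisson", "35-883-3523"),
     Contributor("O. Lancaster", "35-992-3523")],
    title="4x variants in BRCA2 gene",
    publisher="Cafe Variome",
    date="21 January (2011)",
    doi="10.1255/cafevariome.BRCA2-2352354",
)
print(citation)
```

Keeping the contributor IDs and the dataset DOI as separate fields, rather than only as free text, is what allows a citation like this to be matched back to the right people and the right dataset.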