Poster presented at ISMB2011 in Vienna, July 18

GEN2PHEN is funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 200754.

www.gen2phen.org

Motivating online sharing of research data

Box 1: Identifying digital research outputs

Challenges: Current methods for monitoring data use/reuse and assessing impact rely on various referencing standards and conventions. Tracking reuse is difficult, time-consuming and inaccurate, not least because the datasets in question are hard to identify.

Existing and emerging solutions: Assigning digital identifiers (IDs) to published works allows them to be reliably identified and cited. To fulfill the requirements of the scholarly record, IDs should be persistent, globally unique and citable. Together with unique IDs for contributors (Box 2), this forms the basis of unambiguous attribution.

Digital Object Identifiers (DOIs) are widely used for identifying and citing STM publications, via the not-for-profit CrossRef publishers' association (http://www.crossref.org). DOIs issued via DataCite (http://datacite.org) are increasingly used for scientific datasets published in online digital repositories.
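As a minimal sketch of what makes a DOI name machine-actionable, the snippet below splits a DOI into its prefix and suffix and builds a resolver link. The helper names `parse_doi` and `resolver_url` are hypothetical, the example DOI is the illustrative one from the data citation on this poster rather than a registered identifier, and the sketch assumes the public resolver at https://doi.org/.

```python
from typing import NamedTuple
from urllib.parse import quote


class Doi(NamedTuple):
    prefix: str  # registrant part, always starts with "10."
    suffix: str  # item part chosen by the registrant


def parse_doi(text: str) -> Doi:
    """Split a DOI name such as 'doi:10.1255/x' into prefix and suffix."""
    name = text.strip()
    if name.lower().startswith("doi:"):
        name = name[4:]
    # The prefix ends at the first slash; the suffix may contain more slashes.
    prefix, sep, suffix = name.partition("/")
    if not (sep and suffix and prefix.startswith("10.") and len(prefix) > 3):
        raise ValueError(f"not a DOI name: {text!r}")
    return Doi(prefix, suffix)


def resolver_url(doi: Doi) -> str:
    """Build a link via the public DOI resolver (assumed: https://doi.org/)."""
    return "https://doi.org/" + doi.prefix + "/" + quote(doi.suffix, safe="/")


# Illustrative DOI taken from the poster's example citation.
example = parse_doi("doi:10.1255/cafevariome.BRCA2-2352354")
print(example.prefix, example.suffix)
print(resolver_url(example))
```

Because the identifier, not the hosting URL, is what gets cited, the dataset can move between servers without breaking existing citations.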

IRISC2011 - Identity in Research Infrastructure and Scientific Communication

The 2-day IRISC2011 international workshop will be held September 12-13 in Helsinki, Finland. This event will bring together key stakeholders and experts and help foster collaboration, coordination and awareness in this area, not only in biomedicine and bioinformatics but in all areas of scientific research. Agenda and other info at http://irisc-workshop.org

For further information please contact [email protected]

Gudmundur A. Thorisson, Owen Lancaster and Anthony J. Brookes

Department of Genetics, University of Leicester, Leicester, UK

Identifying knowledge contributors

As in scholarly communication more generally, non-unique person names and the current lack of a global identification infrastructure for producers of scholarly content make it difficult to establish the identity of authors and other contributors. This in turn creates challenges in attributing credit for contributions to science, as well as in tracking use/reuse and assessing the impact of research outputs.

We are developing a series of novel web-based systems and processes for online dissemination of genetic variation and other research data. The technical approach we are exploring uses emerging frameworks for data identification and citation (Box 1) and for contributor identification (Box 2), so that published datasets can be discovered, cited in a scholarly context and unambiguously attributed. The core aim is to ensure that data creators are recognized and rewarded for publishing their data. We argue that, along with other measures, such an incentive-based approach is key to motivating the sharing of data and other types of digital research outputs in the life sciences.

Pilot project: Cafe Variome - facilitating exchange of genetic variation data and attributing data creators

Box 2: Identifying contributors

Challenges: Approximately two thirds of the ~6M authors in PubMed share a last name and first initial with at least one other author. This name ambiguity creates difficulties in identifying and attributing creators of published works, including datasets published via online digital repositories. Solving the contributor identification challenge is key to including these important outputs in the scholarly record.
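To illustrate why a last name plus first initial is a weak key, the sketch below groups a handful of made-up author names by that key and reports the share of authors who collide with at least one other. The names and the helper `name_key` are hypothetical and are not drawn from PubMed.

```python
from collections import defaultdict


def name_key(full_name: str) -> str:
    """Reduce 'First [Middle] Last' to 'last f', the ambiguous key."""
    parts = full_name.split()
    return f"{parts[-1].lower()} {parts[0][0].lower()}"


# Hypothetical authors, for illustration only.
authors = [
    "Jane Smith",
    "John Smith",
    "Julia A. Smith",
    "Wei Wang",
    "Wen Wang",
    "Gudrun Olafsdottir",
]

# Group authors by the reduced key.
groups = defaultdict(list)
for author in authors:
    groups[name_key(author)].append(author)

# An author is ambiguous if at least one other author has the same key.
ambiguous = [a for names in groups.values() if len(names) > 1 for a in names]
share = len(ambiguous) / len(authors)
print(f"{len(ambiguous)} of {len(authors)} authors share a key ({share:.0%})")
```

A unique contributor ID sidesteps the problem because the key no longer depends on how a name happens to be written.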

Emerging solutions: With contributions from GEN2PHEN, the international ORCID initiative (http://www.orcid.org) is creating a global infrastructure to "support the creation of a permanent, clear and unambiguous record of scholarly communication".

ORCID will enable contributors to be identified via unique IDs and reliably linked with their published works, including but not limited to:

- Peer-reviewed publications (CrossRef DOIs)

- Datasets (DataCite DOIs)

- Publications in the 'grey' literature

The new infrastructure will help solve many current identification-related problems and create new opportunities, such as:

Discovery:

- Which other papers were published by co-authors of this paper?

- Which datasets were made available by this research project?

Evaluation:

- What is the scholarly record of this job applicant?

- How often were the papers we published cited in the last 2 years?

- What is the total number of citations and other references to papers, datasets and other outputs of the project we funded?

Figure: data flow from diagnostic laboratories, via a central 'clearinghouse', to end-users (e.g. LSDB curators)

Publish data: Mutations are submitted from diagnostic labs using "Café Variome enabled" software via a simple button click.

Retrieve Atom feeds: Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval.
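The automated feed-based retrieval described above could look roughly like the following sketch, which parses an Atom document with Python's standard library and lists the entries a consumer has not seen before. The feed content, the entry IDs and the `new_entries` helper are invented for illustration; the actual Cafe Variome feed layout is not specified on this poster.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom Syndication Format.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed, standing in for what a clearinghouse might serve.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example mutation feed</title>
  <entry>
    <id>urn:example:mutation:1</id>
    <title>Variant A</title>
    <updated>2011-01-21T10:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:mutation:2</id>
    <title>Variant B</title>
    <updated>2011-01-22T09:30:00Z</updated>
  </entry>
</feed>"""


def new_entries(feed_xml: str, seen_ids: set) -> list:
    """Return (id, title, updated) for entries whose id is not in seen_ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall("atom:entry", ATOM):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM)
        if entry_id and entry_id not in seen_ids:
            title = entry.findtext("atom:title", default="", namespaces=ATOM)
            updated = entry.findtext("atom:updated", default="", namespaces=ATOM)
            fresh.append((entry_id, title, updated))
    return fresh


# A consumer that has already processed the first entry sees only the second.
seen = {"urn:example:mutation:1"}
for entry_id, title, updated in new_entries(SAMPLE_FEED, seen):
    print(entry_id, title, updated)
```

In practice a consumer such as an LSDB curator's tool would fetch the feed periodically and persist the set of seen IDs between runs.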

Data citation: G. A. Thorisson (ORCID:35-883-3523) and O. Lancaster (ORCID:35-992-3523). 4x variants in BRCA2 gene. Published online via Cafe Variome. 21 January (2011) doi:10.1255/cafevariome.BRCA2-2352354

G. A. Thorisson, Univ. [email protected]:35-883-3523

Unique DOI name for dataset in DataCite, located at: http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354

Unique identifier for contributor in ORCID
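Once a dataset and its contributors carry identifiers, a citation like the one above can be assembled from structured metadata. The sketch below reproduces the poster's example string; the `Contributor` and `Dataset` classes and their field names are hypothetical, and the identifiers are the illustrative values from the example, not registered ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contributor:
    name: str
    orcid: str  # contributor ID, as shown in the poster's example


@dataclass
class Dataset:
    title: str
    contributors: List[Contributor]
    publisher: str
    day_month: str
    year: int
    doi: str


def format_citation(ds: Dataset) -> str:
    """Render a data citation in the layout used by the poster's example."""
    # Simplistic join: fine for two contributors, crude for longer lists.
    people = " and ".join(f"{c.name} (ORCID:{c.orcid})" for c in ds.contributors)
    return (
        f"{people}. {ds.title}. Published online via {ds.publisher}. "
        f"{ds.day_month} ({ds.year}) doi:{ds.doi}"
    )


example = Dataset(
    title="4x variants in BRCA2 gene",
    contributors=[
        Contributor("G. A. Thorisson", "35-883-3523"),
        Contributor("O. Lancaster", "35-992-3523"),
    ],
    publisher="Cafe Variome",
    day_month="21 January",
    year=2011,
    doi="10.1255/cafevariome.BRCA2-2352354",
)
print(format_citation(example))
```

Keeping the identifiers as separate fields, rather than only inside the rendered string, is what lets citations be counted and attributed automatically.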
