NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

The rise of the data-centric research and publication enterprises. Susanna-Assunta Sansone, PhD (@biosharing, @isatools, @scientificdata). iDASH meeting, San Diego, Sept 15-16, 2014. Roles: Data Consultant; Honorary Academic Editor; Associate Director, Principal Investigator; Board of Directors; Technical Advisory Board; Coordinating Editors; Sector Lead.

description

http://idash.ucsd.edu/data-integration-analysis-and-sharing-symposium

Transcript of NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Page 1: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Data Consultant,

Honorary Academic Editor

Associate Director,

Principal Investigator

The rise of the data-centric research and publication enterprises

Susanna-Assunta Sansone, PhD

@biosharing

@isatools

@scientificdata

iDASH meeting, San Diego, Sept 15-16, 2014

Board of Directors; Technical Advisory Board;

Coordinating Editors; Sector Lead

Page 2: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Credit to: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/

Page 3: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Worldwide movement for FAIR data

Credit: Barend Mons

Page 4: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Worldwide movement for FAIR data

Credit: Barend Mons

http://bd2k.nih.gov/workshops.html#ADDS

Page 5: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Notes and narrative; spreadsheets and tables; linked data and nanopublications

Increase the level of annotation at the source, tracking provenance and using community standards

Doing my fair share of work

Working with and for:

Page 6: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project

To make any dataset ‘FAIR’, one must have standards, tools and best practices to:

• report sufficient details

• capture all salient features of the experimental workflow

• make annotation explicit and discoverable

• structure the descriptions for consistency

• ensure/regulate access

• deposit and publish

• etc.

Page 7: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

…breadth and depth of the experimental context

…is pivotal

Page 8: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

sample characteristic(s)

experimental design

experimental variable(s)

technology(ies)

measurement(s)

protocol(s)

data file(s)

......
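As a rough illustration, the elements of experimental context listed on this slide could be captured in one structured record that travels with the data files. The class name, field names and sample values below are hypothetical, chosen only to mirror the slide's list; they are not taken from any particular standard or tool.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ExperimentalContext:
    """Hypothetical container mirroring the elements listed on the slide."""
    experimental_design: str
    sample_characteristics: dict = field(default_factory=dict)
    experimental_variables: list = field(default_factory=list)
    technologies: list = field(default_factory=list)
    measurements: list = field(default_factory=list)
    protocols: list = field(default_factory=list)
    data_files: list = field(default_factory=list)

    def to_json(self):
        """Serialize the description so it can be stored next to the data."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up values, for illustration only.
example = ExperimentalContext(
    experimental_design="dose-response",
    sample_characteristics={"organism": "Mus musculus", "tissue": "liver"},
    experimental_variables=["dose", "time point"],
    technologies=["mass spectrometry"],
    measurements=["metabolite profiling"],
    protocols=["sample collection", "extraction"],
    data_files=["run_01.raw"],
)

if __name__ == "__main__":
    print(example.to_json())
```

Keeping the description in one explicit structure, rather than scattered across notes, is what makes the context discoverable by software as well as by people.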

Page 9: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

The role of reporting or content standards

• Including minimum information reporting requirements, or checklists to report the same core, essential information

• Including controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’

• Including conceptual model, conceptual schema from which an exchange format is derived to allow data to flow from one system to another

Community-developed “norms” set to structure and enrich the description of datasets, facilitating understanding, sharing and reuse
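The three kinds of content standard on this slide (checklist, terminology, format) can be sketched as three small steps applied to one record. Everything here is an assumption made for illustration: the required fields, the vocabulary, and the `EX:` identifiers are placeholders, not real ontology accessions or an actual checklist.

```python
import json

# Hypothetical checklist: the "same core, essential information" to report.
REQUIRED_FIELDS = ["organism", "technology", "protocol"]

# Hypothetical controlled vocabulary: several labels, one shared term.
# The identifiers are placeholders, not real ontology accessions.
ORGANISM_TERMS = {
    "mouse": "EX:0001",
    "mus musculus": "EX:0001",
    "human": "EX:0002",
    "homo sapiens": "EX:0002",
}


def check_minimum_information(record):
    """Checklist step: return required fields that are missing or empty."""
    return [name for name in REQUIRED_FIELDS
            if not str(record.get(name, "")).strip()]


def normalize_organism(record):
    """Terminology step: map a free-text organism label to a shared term."""
    label = record["organism"].strip().lower()
    return {**record, "organism": ORGANISM_TERMS.get(label, record["organism"])}


def to_exchange_format(record):
    """Format step: serialize with stable key order for other systems."""
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    record = {"organism": "Mouse", "technology": "NMR", "protocol": "extraction"}
    print(check_minimum_information(record))
    print(to_exchange_format(normalize_organism(record)))
```

Two records that say "Mouse" and "Mus musculus" end up with the same term after the terminology step, which is the point of using the same word to refer to the same 'thing'.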

Page 10: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

A community mobilization - some examples

de jure de facto

grass-roots groups

standard organizations

Nanotechnology Working Group

Page 11: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Organizational and operational structures - quite diverse

de jure de facto

grass-roots groups

standard organizations

Nanotechnology Working Group

Page 12: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Technologically-delineated views of the world: transcriptomics, proteomics, metabolomics

Technologies shown: arrays, scanning, columns, gels, MS, FTIR, NMR

Biologically-delineated views of the world: plant biology, epidemiology, microbiology

Generic features (‘common core’):

- description of source biomaterial

- experimental design components

Fragmentation, duplications and gaps

To compare and integrate data we need interoperable standards

Page 13: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Growing number of reporting standards

~ 156

~ 70

~ 334

Source: BioPortal

Databases, annotation, curation tools implementing standards

Minimum information checklists and reporting guidelines: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …

Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …

Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …

Page 14: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Which standards and databases can we use/recommend?

Page 15: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

BioSharing works to map the landscape of content standards in the life sciences, broadly covering biological, natural and biomedical sciences

The web-based, curated and searchable registry works to ensure the standards are informative and discoverable, monitoring their development and evolution, as well as their use in databases and adoption in data policies.

Page 16: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

BioSharing’s goal is to assist stakeholders to make informed decisions:

• researchers, developers and curators, who lack support and guidance on how best to navigate and select among the various content standards, understand their maturity, or find databases that implement them;

• funders, journals and librarians, who do not have enough information to make informed decisions on which content standards or databases should be recommended in their policies, funded or implemented.

Page 17: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Operational Team

Advisory Board and RDA Working Group

Page 18: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Core functionalities:

• search and filtering

• submission forms to add new records

• “claim” functionality for existing records

• person’s profile (as maintainer of records) linked to the ORCID profile

• visualization and views of content

Current content:

• Over 500

• Over 600
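To make the search, filtering and "claim" functions concrete, here is a minimal in-memory sketch. It is not BioSharing's implementation or API: the record structure and function names are hypothetical, and the ORCID used in the usage lines is only a placeholder value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryRecord:
    """Hypothetical registry entry for a standard."""
    name: str
    kind: str  # e.g. "reporting guideline", "format", "terminology"
    description: str
    maintainer_orcid: Optional[str] = None


def search(records, text="", kind=None):
    """Filter by kind (if given) and by case-insensitive text match."""
    text = text.lower()
    return [
        r for r in records
        if (kind is None or r.kind == kind)
        and (text in r.name.lower() or text in r.description.lower())
    ]


def claim(record, orcid):
    """Attach a maintainer's ORCID to a record that nobody has claimed yet."""
    if record.maintainer_orcid is not None:
        raise ValueError(f"{record.name} is already claimed")
    record.maintainer_orcid = orcid
    return record


RECORDS = [
    RegistryRecord("MIAME", "reporting guideline",
                   "Minimum Information About a Microarray Experiment"),
    RegistryRecord("ISA-Tab", "format",
                   "Tabular format for experimental metadata"),
    RegistryRecord("OBI", "terminology",
                   "Ontology for Biomedical Investigations"),
]

if __name__ == "__main__":
    print([r.name for r in search(RECORDS, "experiment")])
    print([r.name for r in search(RECORDS, kind="terminology")])
    # Placeholder ORCID, used here only as sample input.
    print(claim(RECORDS[0], "0000-0002-1825-0097"))
```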

Page 19: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Registering and cataloging is just step one; the next steps include:

• Develop assessment criteria for usability and popularity of standards

CTSA Omics Data Standards Working Group

Page 20: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Registering and cataloging is just step one; the next steps include:

• Develop assessment criteria for usability and popularity of standards

• Associate standards to data policies and databases

• Assemble journal and funder policies re data storage

• Make fully cross-searchable

• Continue to embed it in the ecosystem of complementary registries

Page 23: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
Page 25: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

General-purpose, configurable format for the description of experimental metadata

Designed to support:

• provenance tracking

• use of community minimal reporting guidelines and terminologies

- reference system to link to (CDISC) SDTM files; further connections explored via

Designed to be converted to:

• a growing number of other metadata formats, e.g. used by EBI repositories

• RDF representation with mapping to several ontologies, incl. PROV-O to deliver

analysis method script

Data file or record in a database
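A small sketch of the provenance idea behind a tabular metadata format: the table below is a simplified, ISA-Tab-style study table whose column headers follow the ISA-Tab naming pattern. The rows are made up, the function name is hypothetical, and nothing here is validated against the ISA specification or uses the ISA tools.

```python
import csv
import io

# Simplified, ISA-Tab-style study table (tab-delimited).
# Rows are made-up illustrative values, not real study data.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\n"
    "source1\tHomo sapiens\tsample collection\tsample1\n"
    "source1\tHomo sapiens\tsample collection\tsample2\n"
    "source2\tMus musculus\tsample collection\tsample3\n"
)


def samples_by_source(table_text):
    """Group sample names under the source material they derive from."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    provenance = {}
    for row in reader:
        provenance.setdefault(row["Source Name"], []).append(row["Sample Name"])
    return provenance


if __name__ == "__main__":
    for source, samples in samples_by_source(STUDY_TABLE).items():
        print(source, "->", ", ".join(samples))
```

Because each row links a source, a protocol and a resulting sample, the chain from biological material to data file can be followed by reading the table left to right.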

Page 26: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
Page 27: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

ISA powers data collection, curation resources and repositories, e.g.:

Page 28: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Embedding and in activities

CEDAR: Center for Expanded Data Annotation and Retrieval

(PI: Musen; pending notification of award)

The centre will take advantage of the recent growth in community-driven metadata standards to develop innovative methods to facilitate the annotation, cataloguing, and retrieval of dataset collections.

(pending final decision and notification of award)

Page 29: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Role of publishers as “agents of change”

• Data has to become an integral part of scholarly communication

• Responsibilities lie across several stakeholder groups: researchers, data centers, librarians, funding agencies and publishers

• Publishers occupy a leverage point in this process

Page 30: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Launched on May 27th, 2014

A new online-only publication for descriptions of scientifically valuable datasets in the life, environmental and biomedical sciences, but not limited to these

Credit for sharing your data

Focused on reuse and reproducibility

Peer reviewed, curated

Promoting Community Data Repositories

Open Access

Supported by:

Page 31: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Data Descriptor: narrative and structure

Experimental metadata or structured component (in-house curated, machine-readable formats)

Article or narrative component (PDF and HTML)

Page 33: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Data Descriptor - focus on reuse

Sections:

• Title

• Abstract

• Background & Summary

• Methods

• Technical Validation

• Data Records

• Usage Notes

• Figures & Tables

• References

• Data Citations

Detailed descriptions of methods and technical analyses supporting quality of the measurements; does not contain tests of new scientific hypotheses

In traditional publications this information is not provided in a sufficiently detailed manner

However, this information is essential for understanding, reusing, and reproducing datasets
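An author preparing a Data Descriptor could check a draft against the section list above with a few lines of code. This is only a sketch: the function name and the draft structure (a mapping of section name to text) are hypothetical, and treating every listed section as required is an assumption made for the example, not a statement of journal policy.

```python
# Section names as listed on the slide. Treating all of them as required
# is an assumption made for this sketch.
DATA_DESCRIPTOR_SECTIONS = [
    "Title", "Abstract", "Background & Summary", "Methods",
    "Technical Validation", "Data Records", "Usage Notes",
    "Figures & Tables", "References", "Data Citations",
]


def missing_sections(draft):
    """Return the listed sections that are absent or empty in a draft.

    `draft` is a hypothetical mapping of section name -> section text.
    """
    return [name for name in DATA_DESCRIPTOR_SECTIONS
            if not draft.get(name, "").strip()]


if __name__ == "__main__":
    draft = {
        "Title": "Example dataset",
        "Abstract": "A short summary.",
        "Methods": "How the data were generated.",
        "Data Records": "   ",  # present but empty, so reported as missing
    }
    print(missing_sections(draft))
```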

Page 34: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Relation with traditional articles - content

Scientific hypotheses:

Synthesis

Analysis

Conclusions

Methods and technical analyses supporting the quality of the measurements:

What did I do to generate the data?

How was the data processed?

Where is the data?

Who did what when?

Page 35: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Relation with traditional articles - time

BEFORE: get your data to the community as soon as possible (see NPG pre-publication policy)

AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)

AFTER: expand on your research articles, adding further information for reuse of the data

Page 36: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Citations of and links to data files - databases

Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group

Page 37: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Value added component integrated in a growing ecosystem

We currently recognize over 50 public data repositories

Research papers

Data records

Data Descriptors

Page 38: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Peer review process focused on quality and reuse

Evaluation is not based on the perceived impact or novelty of the findings

• Experimental rigour and technical data quality

o Methodologically sound

o Technical validation experiments and statistical analyses

o Depth, coverage, size, and/or completeness of data sufficient for the types of applications

• Completeness of the description

o Sufficient details to allow others to reproduce the results, reuse or integrate it with other data

o Compliance with relevant minimum information or reporting standards

• Integrity of the data files and repository record

o Data files match the descriptions in the Data Descriptor

o Deposited in the most appropriate available data repository

Page 39: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Current content is diverse - bimonthly releases

• Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology etc.

• New and previously published individual datasets, curated aggregations and citizen science:

o a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step

o additional tutorial-like information for scientists interested in reusing or integrating the data with their own

• Datasets in figshare, Dryad and domain specific databases

• Code deposited in figshare and GitHub

• First collection:

Page 41: NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data

Acknowledgements

Advisory Boards and Collaborators

Philippe Rocca-Serra, PhD

Alejandra Gonzalez-Beltran, PhD

Eamonn Maguire

Milo Thurston, PhD

Visit nature.com/scientificdata

Email [email protected]

Tweet @ScientificData

Honorary Academic Editor Susanna-Assunta Sansone, PhD

Managing Editor Andrew L Hufton, PhD

Editorial Curator Victoria Newman

Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators