NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
Data Consultant,
Honorary Academic Editor
Associate Director,
Principal Investigator
The rise of the data-centric
research and publication enterprises
Susanna-Assunta Sansone, PhD
@biosharing
@isatools
@scientificdata
iDASH meeting, San Diego, Sept 15-16, 2014
Board of Directors; Technical Advisory Board;
Coordinating Editors; Sector Lead
Credit to: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/
Worldwide movement for FAIR data
Credit: Barend Mons
http://bd2k.nih.gov/workshops.html#ADDS
Notes and narrative
Spreadsheets and tables
Linked data and nanopublications
Increase the level of annotation at the source, tracking provenance and using community standards
Doing my fair share of work
Working with and for:
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
To make any dataset ‘FAIR’, one must have standards, tools and best practices to:
• report sufficient details
• capture all salient features of the experimental workflow
• make annotation explicit and discoverable
• structure the descriptions for consistency
• ensure/regulate access
• deposit and publish
• etc.
…breadth and depth
of the experimental context
…is pivotal
sample characteristic(s)
experimental design
experimental variable(s)
technology(ies)
measurement(s)
protocol(s)
data file(s)
......
The role of reporting or content standards
• Including minimum information reporting requirements, or checklists, to report the same core, essential information
• Including controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’
• Including conceptual model, conceptual schema from which an exchange format is derived to allow data to flow from one system to another
Community-developed “norms” set to structure and enrich the
description of datasets, facilitating understanding, sharing and reuse
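The three kinds of content standard described above (a checklist, a controlled vocabulary and an exchange format) can be illustrated with a minimal sketch. Everything in it is hypothetical: the field names in `REQUIRED_FIELDS`, the tiny `ORGANISM_TERMS` vocabulary, the `EXAMPLE` record and the JSON layout are assumptions made for illustration, not any published checklist, ontology or format.

```python
import json

# Hypothetical minimum-information checklist: core fields every record must report.
REQUIRED_FIELDS = [
    "sample_characteristics",
    "experimental_design",
    "technology",
    "protocol",
    "data_file",
]

# Hypothetical controlled vocabulary: free-text synonyms mapped to one agreed term.
ORGANISM_TERMS = {
    "human": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def missing_fields(record):
    """Return the checklist fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def normalise_organism(label):
    """Map a free-text organism label to the agreed term; raise if unknown."""
    key = label.strip().lower()
    if key not in ORGANISM_TERMS:
        raise ValueError(f"unknown organism label: {label!r}")
    return ORGANISM_TERMS[key]


def to_exchange_format(record):
    """Serialise a complete record to JSON, standing in for an exchange format.

    Assumes record["sample_characteristics"] is a dict with an "organism" entry.
    """
    gaps = missing_fields(record)
    if gaps:
        raise ValueError("record is missing: " + ", ".join(gaps))
    out = dict(record)
    traits = dict(record["sample_characteristics"])
    traits["organism"] = normalise_organism(traits["organism"])
    out["sample_characteristics"] = traits
    return json.dumps(out, sort_keys=True)


# Illustrative record; all values are made up.
EXAMPLE = {
    "sample_characteristics": {"organism": "human", "tissue": "blood"},
    "experimental_design": "case-control",
    "technology": "NMR",
    "protocol": "sample collection",
    "data_file": "run01.csv",
}

if __name__ == "__main__":
    print(to_exchange_format(EXAMPLE))
```

The checklist catches missing information, the vocabulary makes two annotators write the same term for the same thing, and the serialised form is what would travel between systems.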
A community mobilization - some examples
de jure; de facto
grass-roots groups
standard organizations
Nanotechnology Working Group
Organizational and operational structures - quite diverse
de jure; de facto
grass-roots groups
standard organizations
Nanotechnology Working Group
Technologically-delineated views of the world
Biologically-delineated views of the world
Generic features (‘common core’):
• description of source biomaterial
• experimental design components
[Figure labels: arrays, scanning, columns, gels, MS, FTIR, NMR; transcriptomics, proteomics, metabolomics; plant biology, epidemiology, microbiology]
Fragmentation, duplications and gaps
To compare and integrate data we need interoperable standards
Growing number of reporting standards
~ 156
~ 70
~ 334
Source: BioPortal
Databases, annotation,
curation tools
implementing standards
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
Which standards and databases can we use or recommend?
BioSharing works to map the landscape of content standards in the
life sciences, broadly covering biological, natural and
biomedical sciences
The web-based, curated and searchable registry works to ensure the
standards are informative and discoverable, monitoring their
development and evolution, as well as their use in databases
and adoption in data policies.
BioSharing’s goal is to assist stakeholders to make informed decisions:
• researchers, developers and curators who lack support and guidance on how to
best navigate and select the various content standards and understand their
maturity, or find databases that implement them;
• funders, journals, and librarians who do not have enough information to
make informed decisions on which content standards or databases should be
recommended in their policies, or funded or implemented.
Operational Team
Advisory Board and RDA Working Group
Core functionalities:
• search and filtering
• submission forms to add new records
• “claim” functionality for existing records
• person’s profile (as maintainer of records) associated with the ORCID profile
• visualization and views of content
Current content:
• Over 500
• Over 600
Registering and cataloging is just step one; the next steps include:
• Develop assessment criteria for usability and popularity of standards
• Associate standards with data policies and databases
• Assemble journal and funder policies re data storage
• Make fully cross-searchable
• Continue to embed it in the ecosystem of complementary registries
CTSA Omics Data Standards Working Group
General-purpose, configurable format for
the description of experimental metadata
Designed to support:
• provenance tracking
• use of community minimal reporting
guidelines and terminologies
- reference system to link to (CDISC)
SDTM files; further connections
explored via
Designed to be converted to:
• a growing number of other metadata
formats, e.g. used by EBI repositories
• RDF representation with mapping to
several ontologies, incl. PROV-O to
deliver
analysis method script
Data file or record in a database
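As a rough illustration of what a tabular, provenance-tracking description of experimental metadata can look like, the sketch below writes a tab-delimited table in which each row links a source to a sample through a named protocol. The column headers are modelled loosely on ISA-Tab conventions; the row values and the `rows_to_tsv` and `provenance` helpers are assumptions for illustration, and the output has not been validated against the ISA-Tab specification.

```python
import csv
import io

# Column headers modelled loosely on ISA-Tab study tables (an assumption, not validated).
COLUMNS = [
    "Source Name",
    "Characteristics[organism]",
    "Term Source REF",
    "Term Accession Number",
    "Protocol REF",
    "Sample Name",
]

# Made-up rows: each one records which protocol turned a source into a sample.
ROWS = [
    {
        "Source Name": "donor-1",
        "Characteristics[organism]": "Homo sapiens",
        "Term Source REF": "NCBITAXON",
        "Term Accession Number": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
        "Protocol REF": "sample collection",
        "Sample Name": "donor-1.blood",
    },
    {
        "Source Name": "donor-2",
        "Characteristics[organism]": "Homo sapiens",
        "Term Source REF": "NCBITAXON",
        "Term Accession Number": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
        "Protocol REF": "sample collection",
        "Sample Name": "donor-2.blood",
    },
]


def rows_to_tsv(rows):
    """Write the rows as a tab-delimited table with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def provenance(tsv_text):
    """Read the table back and return (source, protocol, sample) triples."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(r["Source Name"], r["Protocol REF"], r["Sample Name"]) for r in reader]


if __name__ == "__main__":
    table = rows_to_tsv(ROWS)
    print(table)
    for source, protocol, sample in provenance(table):
        print(f"{source} --[{protocol}]--> {sample}")
```

Keeping the organism as a term plus an ontology accession, rather than free text, is what lets such a table be mapped later onto other metadata formats or an RDF representation.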
ISA powers data collection, curation resources and repositories, e.g.:
Embedding and in activities
CEDAR: Center for Expanded Data Annotation and Retrieval
(PI: Musen; pending notification of award)
The centre will take advantage of the recent growth in community-driven metadata standards to develop innovative methods to facilitate the annotation, cataloguing, and retrieval of dataset collections.
(pending final decision and notification of award)
Role of publishers as “agents of change”
• Data has to become an integral part
of scholarly communication
• Responsibilities lie across several
stakeholder groups: researchers,
data centers, librarians, funding
agencies and publishers
• Publishers occupy a leverage point
in this process
Launched on May 27th, 2014
A new online-only publication for descriptions of scientifically valuable datasets in the life, environmental and biomedical sciences, but not limited to these
Credit for sharing your data
Focused on reuse and reproducibility
Peer reviewed, curated
Promoting Community Data Repositories
Open Access
Supported by:
Experimental metadata or
structured component
(in-house curated, machine-readable formats)
Article or
narrative component
(PDF and HTML)
Data Descriptor: narrative and structure
Data Descriptor - focus on reuse
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
Detailed descriptions of methods and technical analyses supporting quality
of the measurements; does not contain tests of new scientific hypotheses
Traditional publications do not provide this information in sufficient detail
However, this information is essential for understanding, reusing, and reproducing datasets
Scientific hypotheses:
Synthesis
Analysis
Conclusions
Methods and technical analyses supporting the quality of the measurements:
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what, and when?
Relation with traditional articles - content
BEFORE: get your data to the community as soon as possible (see NPG pre-publication policy)
AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
AFTER: expand on your research articles, adding further information for reuse of the data
Relation with traditional articles - time
Citations of and links to data files - databases
Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group
Value added component integrated in a growing ecosystem
We currently recognize over 50 public data repositories
Research papers
Data records
Data Descriptors
Evaluation is not based on the perceived impact or novelty of the findings
• Experimental rigour and technical data quality
o Methodologically sound
o Technical validation experiments and statistical analyses
o Depth, coverage, size, and/or completeness of data sufficient for the types of applications
• Completeness of the description
o Sufficient details to allow others to reproduce the results, reuse or integrate it with other data
o Compliance with relevant minimum information or reporting standards
• Integrity of the data files and repository record
o Data files match the descriptions in the Data Descriptor
o Deposited in the most appropriate available data repository
Peer review process focused on quality and reuse
• Neuroscience, ecology, epidemiology, environmental science, functional
genomics, metabolomics, toxicology etc.
• New and previously published individual datasets, curated aggregations and
citizen science:
o a fuller, more in-depth look at the data processing steps, supported by
additional data files and code from each step
o additional tutorial-like information for scientists interested in reusing or
integrating the data with their own
• Datasets in figshare, Dryad and domain specific databases
• Code deposited in figshare and GitHub
• First collection:
Current content is diverse - bimonthly releases
Acknowledgements
Advisory Boards and Collaborators
Philippe Rocca-Serra, PhD
Alejandra Gonzalez-Beltran, PhD
Eamonn Maguire
Milo Thurston, PhD
Visit nature.com/scientificdata
Email [email protected]
Tweet @ScientificData
Honorary Academic Editor Susanna-Assunta Sansone, PhD
Managing Editor Andrew L Hufton, PhD
Editorial Curator Victoria Newman
Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators