Data Consultant, Honorary Academic Editor
Susanna-Assunta Sansone, PhD Associate Director, Principal Investigator
RDA Engagement IG, Sept, 2013
Mapping the landscape of stakeholders and standards in the life sciences
§ Researchers and bioinformaticians in both
academic and commercial arenas, along with
funding agencies and publishers, embrace
the concept that community-developed, open, common reporting standards are pivotal to
structure and enrich the annotation of
• entities of interest (e.g., genes,
metabolites, phenotypes) and
• experimental steps (e.g.,
provenance of study materials,
technology and measurement types)
Standards for describing and reporting datasets
A ‘general mobilization’ to develop standards, e.g. to:
• report the same core, essential information
• use the same word and refer to the same ‘thing’
• allow data to flow from one system to another
A ‘general mobilization’ to develop standards… BUT
§ Fragmentation of the standards is a major issue!
• Being focused on particular communities’ interests, be they individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, to the development of (largely arbitrarily) different standards
• This severely hinders the interoperability of databases and tools and ultimately the integration of datasets
Growing number of reporting standards
+ 130
Estimated
+ 150
Source: MIBBI, EQUATOR
+ 303
Source: BioPortal
Databases, annotation,
curation tools
Checklists: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB
Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
To track provenance of the information and ensure richness of data and experimental metadata descriptions, to maximize reusability
But how much do we know about these standards?
• A coherent, curated and searchable registry of standards for describing and reporting experiments in life science, environmental, biomedical and biotechnological domains
• Progressively associate standards to data policies and databases
• Develop assessment criteria for usability and popularity of standards
• Help stakeholders to make informed decisions on e.g. what standards or databases to use or recommend
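As a rough illustration of the registry idea above, the following sketch keeps standards in memory, links them to the databases that implement them, and supports a simple text search. It is not the actual registry implementation: the class names `StandardEntry` and `Registry`, the field names, and the database name `ExampleDB` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StandardEntry:
    """One registry record (hypothetical structure, for illustration only)."""
    name: str
    kind: str  # assumed vocabulary: "checklist", "format" or "terminology"
    domains: set[str] = field(default_factory=set)
    used_by_databases: set[str] = field(default_factory=set)


class Registry:
    """Minimal in-memory registry of standards, keyed by lower-cased name."""

    def __init__(self) -> None:
        self._entries: dict[str, StandardEntry] = {}

    def add(self, entry: StandardEntry) -> None:
        self._entries[entry.name.lower()] = entry

    def link_database(self, standard_name: str, database_name: str) -> None:
        # Associate a standard with a database that implements it.
        self._entries[standard_name.lower()].used_by_databases.add(database_name)

    def adoption_count(self, standard_name: str) -> int:
        # Crude popularity indicator: number of linked databases.
        return len(self._entries[standard_name.lower()].used_by_databases)

    def search(self, text: str) -> list[str]:
        # Case-insensitive substring match on standard name or domain.
        needle = text.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if needle in e.name.lower()
            or any(needle in d.lower() for d in e.domains)
        )


registry = Registry()
registry.add(StandardEntry("MIAME", "checklist", {"microarray"}))
registry.add(StandardEntry("MIQE", "checklist", {"qPCR"}))
registry.link_database("MIAME", "ExampleDB")  # ExampleDB is a made-up name
print(registry.search("micro"))
print(registry.adoption_count("MIAME"))
```

Counting linked databases is only one possible stand-in for "popularity"; the assessment criteria themselves are still to be developed, as the bullet above notes.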
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
Users can claim entries and maintain them
Criteria to be used in evaluating standards for adoption:
§ Existence of a formal specification, with:
• good level of documentation, with scope and use cases
• ease of implementation
• human and machine readability
§ Broad adoption and implementation, outside the initial group, by:
• community databases (hence existence of standards-annotated datasets)
• software (e.g. for reporting, editing, curating, submitting to databases)
§ Active user community, also providing:
• support
• responsiveness to community requests
• examples
§ Interoperability with and extensibility to other standards, ranging from:
• compatibility with other standards
• flexibility to cover new domains
• conversion and mapping, if applicable
§ Openness
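The five top-level criteria listed above could be recorded as a simple checklist when comparing candidate standards. The sketch below is only one way to do that: the identifier names, the `assess` function and the equal weighting of criteria are assumptions for illustration, not part of the criteria themselves.

```python
# Top-level criteria from the slide, as hypothetical identifier names.
CRITERIA = (
    "formal_specification",
    "broad_adoption",
    "active_user_community",
    "interoperability_and_extensibility",
    "openness",
)


def assess(answers: dict[str, bool]) -> dict:
    """Summarise which criteria a standard meets.

    Criteria absent from `answers` are treated as not met.
    Equal weighting is an assumption made for this sketch.
    """
    unknown = set(answers) - set(CRITERIA)
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")
    met = [c for c in CRITERIA if answers.get(c, False)]
    missing = [c for c in CRITERIA if c not in met]
    return {
        "met": met,
        "missing": missing,
        "fraction_met": len(met) / len(CRITERIA),
    }


report = assess({"formal_specification": True, "openness": True})
print(report["missing"])
```

A yes/no answer per criterion is deliberately coarse; the sub-points under each criterion (documentation, support, mappings and so on) would need finer-grained fields in any real assessment.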
Jessica D. Tenenbaum Duke Translational Medicine Institute
Melissa Haendel OHSU Library
Susanna-Assunta Sansone University of Oxford
also as part of the NIH Clinical and Translational Science Award (CTSA) program
Core attributes to describe databases and assist in evaluating scope and relevance as well as access to data:
§ Database name
§ Main resource URL
§ Contact information
§ Date resource established (year)
§ Conditions of use (free, or type of license)
§ Scope: data types captured, curation policy
§ Standards implemented: checklists, terminologies, formats
§ Taxonomic coverage
§ Data accessibility/output options
§ Data release frequency
§ Versioning period and access to historical files
§ Documentation available
§ User support options
§ Data submission policy
§ Relevant publications
§ Tools available
Gaudet et al. NAR Database, 2011
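The core attributes above map naturally onto a structured record. The sketch below shows one possible representation, with a helper that reports which attributes are still unfilled. The class name `DatabaseDescriptor`, the field names and the free-text field types are assumptions for illustration; they are not taken from the cited publication.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatabaseDescriptor:
    """One field per core attribute on the slide (names are hypothetical)."""
    name: Optional[str] = None
    main_url: Optional[str] = None
    contact: Optional[str] = None
    year_established: Optional[int] = None
    conditions_of_use: Optional[str] = None
    scope: Optional[str] = None                  # data types, curation policy
    standards_implemented: Optional[str] = None  # checklists, terminologies, formats
    taxonomic_coverage: Optional[str] = None
    data_accessibility: Optional[str] = None
    release_frequency: Optional[str] = None
    versioning: Optional[str] = None
    documentation: Optional[str] = None
    user_support: Optional[str] = None
    submission_policy: Optional[str] = None
    publications: Optional[str] = None
    tools: Optional[str] = None


def missing_attributes(descriptor: DatabaseDescriptor) -> list[str]:
    """Return the names of attributes that are unset or empty."""
    return [
        f.name
        for f in fields(descriptor)
        if getattr(descriptor, f.name) in (None, "")
    ]


# ExampleDB and its URL are placeholders, not a real resource.
entry = DatabaseDescriptor(name="ExampleDB", main_url="https://example.org")
print(missing_attributes(entry))
```

Making every field optional lets a partially described database be registered first and completed later, with `missing_attributes` showing what remains.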
Besides grass-roots initiatives and formal standardization initiatives, which other stakeholders are relevant and operative in the data area?
Data publication platforms, e.g.:
§ Pharma R&D has invested heavily in procedures and tools that integrate external information with its own data to enhance the decision-making process
§ Now pre-competitive initiatives and private-public partnerships are blooming as ways to reduce the costs associated with data management and curation and to maximize data interoperability
Pre-competitive initiative
Big Life Science Company
| | Yesterday | Today | Tomorrow |
| Innovation Model | Innovation inside | Searching for Innovation | Heterogeneity of collaborations; part of the wider ecosystem |
| IT | Internal apps & data | Struggling with change, security and trust | Cloud, services |
| Data | Mostly inside | In and out | Distributed |
| Portfolio | Internally driven and owned | Partially shared | Shared portfolio |
Credit to: Pistoia Alliance
Big Life Science Company
Proprietary content provider
Public content provider
Academic group
Software vendor
CRO
Service provider
Regulatory authorities
The information landscape in the industrial sector …evolving…
Our industry needs a Disruptive Innovation. That Disruption...is Pistoia
Credit to: Pistoia Alliance
If you want to go fast, go alone. If you want to go far, go together.