FAIRsharing Keynote - International Workshop on Sharing, Citation and Publication of Scientific Data...

Describing and Connecting Standards, Databases and Policies Across Disciplines Peter McQuilton, PhD @fairsharing_org International Workshop on Sharing, Citation and Publication of Scientific Data across Disciplines, Tachikawa, Tokyo, 5-7 December 2017

Transcript of FAIRsharing Keynote - International Workshop on Sharing, Citation and Publication of Scientific Data...

Science is big data!

Credit to: https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/ 2014

But we don’t handle data well

A set of principles, for those wishing to enhance the value of their data holdings

Designed and endorsed by a diverse set of stakeholders - representing academia, industry, funding agencies, and scholarly publishers.

FAIR

Findable

Accessible

Interoperable

Reusable

Visible, citable

Trackable

Community standards

Reproducible

These put emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals

Most data aren’t FAIR

• Not always well cited, stored

o Software, code, workflows are hard to find/access

• Poorly described for third party reuse

o Different level of detail and annotation

• Curation activities are perceived as time-consuming

o Collection and harmonization of detailed methods and experimental steps is rushed at the publication stage

Not FAIR – low findability and badly documented

To do better science, more efficiently, we need data that are…

• Available in a public repository

• Findable through some sort of search facility

• Retrievable in a standard format

• Self-described so that third parties can make sense of it

• Intended to outlive the experiment for which they were collected

My database is going offline, where should I put the data, and in what format?

Before accepting my paper, this journal wants my data to be in a public repository, but which one?

My funder says I should deposit the data in a reputable repository. But which one?

I’m collecting in-vivo animal testing data – what metadata should I curate?

I’m about to start a set of experiments. In what format should I record the data?

A web-based, curated, and searchable portal that monitors the development and evolution of standards*, across all disciplines, inter-related to databases/repositories and data policies

* A standard is a formal community specification for reporting, sharing and citing data, metadata and other digital assets.
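
The idea of standards being inter-related to databases/repositories and data policies can be pictured as a small linked registry. The following is only a minimal sketch under stated assumptions: the class names, field names and example entries are hypothetical illustrations and do not reflect FAIRsharing's actual schema or content.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """One registry entry. All field names here are hypothetical."""
    name: str
    kind: str  # "standard", "database" or "policy"
    related: set[str] = field(default_factory=set)  # names of linked records


class Registry:
    """Toy in-memory registry of inter-related records."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = {}

    def add(self, name: str, kind: str) -> None:
        self.records[name] = Record(name, kind)

    def link(self, a: str, b: str) -> None:
        # Store the relationship in both directions so either side can be queried.
        self.records[a].related.add(b)
        self.records[b].related.add(a)

    def related_of_kind(self, name: str, kind: str) -> list[str]:
        """Names of records of the given kind that are linked to `name`."""
        return sorted(
            r for r in self.records[name].related
            if self.records[r].kind == kind
        )


# Hypothetical entries, for illustration only.
registry = Registry()
registry.add("Example Format", "standard")
registry.add("Example Repository", "database")
registry.add("Example Journal Policy", "policy")
registry.link("Example Repository", "Example Format")
registry.link("Example Journal Policy", "Example Repository")

print(registry.related_of_kind("Example Journal Policy", "database"))
print(registry.related_of_kind("Example Repository", "standard"))
```

With links of this kind, a question such as "which repository does this policy point to, and which format does that repository use?" becomes a two-step lookup.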

Initial focus on metadata (or content) standards

Content standards

Models/Formats = Conceptual model, conceptual schema, exchange formats

Terminologies = Controlled vocabularies, taxonomies, thesauri, ontologies etc.

Guidelines = Minimum information reporting requirements, checklists

Formats Terminologies Guidelines

Counts shown on the slide: 240+, 119+, 709+

Guidelines: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT…

Formats: SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA, CML, MITAB…

Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO…

~1500

FAIRsharing enhances their findability

Content standards

Data policies by funders, journals and other organizations

Databases/Repositories

Formats Terminologies Guidelines

Mapping a complex and evolving landscape

[Figure: landscape diagram; numeric labels as extracted: 270, 4823, 2, 97, 87, 4, 204, 9, 6, 8]

Paper in preparation, preliminary information as of July 2017

Ready for use, implementation, or recommendation

In development

Status uncertain

Deprecated as subsumed or superseded

All records are manually curated in-house and verified by the community behind each resource

Community verified status indicators
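
The four status indicators listed above lend themselves to a simple enumeration. This is a hedged sketch, not the portal's implementation: the enum member names, the `usable` helper and the example records are all hypothetical; only the four status labels come from the slide.

```python
from enum import Enum


class Status(Enum):
    """Status labels as worded on the slide; member names are hypothetical."""
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


def usable(records):
    """Return the names of records marked READY, preserving input order."""
    return [name for name, status in records if status is Status.READY]


# Hypothetical records, for illustration only.
examples = [
    ("Standard A", Status.READY),
    ("Standard B", Status.IN_DEVELOPMENT),
    ("Standard C", Status.DEPRECATED),
    ("Standard D", Status.READY),
]

print(usable(examples))
```

A filter like this is one way a consumer could restrict a search to resources that are ready to be implemented or recommended.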

Finding and Accessing the data

Collections group together one or more types of resource by domain, project or organization.

Recommendations are a core set of resources that are selected and recommended by a funder or journal data policy.

Grouping the data
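
The difference between the two groupings can be sketched in a few lines. This is an illustrative assumption, not the portal's data model: the `Collection` and `Recommendation` classes, their fields, the `recommended_in` helper and all example names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """Resources grouped by domain, project or organization."""
    name: str
    grouped_by: str  # "domain", "project" or "organization"
    resources: tuple[str, ...]


@dataclass(frozen=True)
class Recommendation:
    """Resources selected and recommended by a funder or journal data policy."""
    policy: str
    resources: tuple[str, ...]


def recommended_in(collection: Collection, recommendation: Recommendation) -> list[str]:
    """Resources in the collection that the policy also recommends."""
    wanted = set(recommendation.resources)
    return [r for r in collection.resources if r in wanted]


# Hypothetical data, for illustration only.
project = Collection(
    "Example Project", "project",
    ("Format A", "Repository B", "Checklist C"),
)
journal = Recommendation(
    "Example Journal Data Policy",
    ("Repository B", "Repository D"),
)

print(recommended_in(project, journal))
```

Keeping the two groupings as separate types makes it explicit that a Collection is defined by who or what it belongs to, while a Recommendation is always tied to a policy.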

Data Policy

Visualizing the relationships between data…

Making FAIRsharing FAIR - Interoperability/Accessibility

• Data annotation:

o Users/Maintainers – ORCID

o Organisations – FundRef

o Species – NCBI Taxon ontology

o Disciplines and Domains – re3data/EDAM/BRO

• API – swagger (ELIXIR guidelines)

• DOIs for standards (coming soon)

Making FAIRsharing FAIR - Findable - Embeddable Widget

• Recommendation/Collection Widget for embedding in third-party websites

• Journal data policies (GigaScience, PLOS, Springer Nature…)

• Standard Developing Organisations (e.g. TDWG)

• Societies/Organisations (e.g. ELIXIR)

Dr Massimiliano Izzo

Standard developing groups, incl:

Journal publishers, incl:

Cross-links, data exchange, incl:

Societies and organisations, incl:

Institutional RDM services, incl:

Projects, programmes, incl:

Working with and for the community

OBO

The FAIRsharing team

Our Advisory Board

Thank you for listening. Questions?