Prof. Barend Mons, Biosemantics Group at Leiden University Medical Center and Head of Node of...


description

RDA Fourth Plenary Keynote - Prof. Barend Mons, Biosemantics Group at Leiden University Medical Center and Head of Node of ELIXIR-NL - Keynote "Bringing Data to Broadway" - Monday 22nd Sept 2014, Amsterdam, the Netherlands https://rd-alliance.org/plenary-meetings/fourth-plenary/plenary4-programme.html

Transcript of Prof. Barend Mons, Biosemantics Group at Leiden University Medical Center and Head of Node of...

Page 1: Prof. Barend Mons, Biosemantics Group at Leiden University Medical Center and Head of Node of ELIXIR-NL - Keynote "Bringing Data to Broadway"

by Barend Mons

Brought to you for

in V parts

A Plea For Professional Data Publishing

Bringing Data to Broadway

Page 2:

The Cast

FAIR play For Research Data and other Research Objects

Findable Accessible

Interoperable Reusable

Page 3:

Part I

Moaning and Lamenting

Page 4:

Singers and Dancers

Page 5:

A-

The Curse of Multidisciplinarity

Page 6:

I cannot keep my data experts!!!

Page 7:

2005: Text Mining? Why bury it first and then mine it again!


Page 8:
Page 9:
Page 10:

Part II

The Explicitome

and the Elusive Part (our own fault)

The Explicitome: everything we already asserted

Page 11:

The Elusive Explicitome Phenomenon (example from Yepes & Verspoor, 2013)

Figure: # of assertions and # of SNP-Phen by article section (narrative, tables/figures, abstract, supplementary data). Values shown on the slide: 2%, 4%, 50%*; 5, 500*, 1000, 50K-1M+

The Elusive Explicitome: what escapes us (95%)

Hurdle 1: Paywalls

Hurdle 2: ‘TIF’walls

Hurdle 3: The Wall of Broken Links

Page 12:

Data loss is real and significant, while data growth is staggering

Nature news, 19 December 2013

• Computer speed and storage capacity are doubling every 18 months, and this rate is steady

• DNA sequence data has been doubling every 6-8 months over the last 3 years and looks set to continue for this decade

‘Oops, that link was the laptop of my PhD student’

Page 13:

The trends in e-Science

Computer Analytics (takes charge)

Enormity of datasets (beyond narrative)

Collaborative Intelligence (calls for million minds)

Irreversible movement (towards OA)

FAIR Data Publishing & Stewardship

?

Page 14:
Page 15:
Page 16:
Page 17:
Page 18:

Professionalise Data Stewardship

Educate, Reward and Keep Data Experts

FAIR

Page 19:

Part 3 Unavoidable: some science of ‘our own’

but…..as examples, sorry

Part III

INTERMEZZO

Some Research…

…Sorry for the LS examples…

Page 20:

Simplified eScience

ROs

The Explicitome

+ WorkFlows

User

New dataset

New Insights

Ridiculogram

Page 21:

Thanks to Peter Wittenburg

Page 22:

AERIAL SURVEY: pattern recognition in Ridiculograms

HUMAN EXCAVATION: rationalisation and ‘confirmational reading’

‘Why would I believe this association???’

FAIR for computers

FAIR for people

Page 23:

For knowledge discovery (KD) we need each association only once

Cardinal Assertion (<10^11)

n identical assertions

‘n’ different provenances
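
The idea on this slide (n identical assertions collapse into one cardinal assertion that keeps all n provenances) could be sketched roughly as follows. This is only an illustration under assumptions: the `Assertion` type, the `to_cardinal` helper and the example triples are hypothetical and do not come from the talk or from any BioSemantics software.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Assertion(NamedTuple):
    """One asserted association plus where it was asserted (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    provenance: str  # assumed source label, e.g. a paper or a database


def to_cardinal(assertions: List[Assertion]) -> Dict[Tuple[str, str, str], List[str]]:
    """Group identical (subject, predicate, object) triples and keep every provenance."""
    cardinal: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for a in assertions:
        cardinal[(a.subject, a.predicate, a.obj)].append(a.provenance)
    return dict(cardinal)


# Made-up example data, not taken from the talk.
raw = [
    Assertion("gene_A", "associated_with", "disease_X", "paper_1"),
    Assertion("gene_A", "associated_with", "disease_X", "database_2"),
    Assertion("gene_A", "associated_with", "disease_X", "paper_3"),
    Assertion("gene_B", "associated_with", "disease_Y", "paper_1"),
]

cardinal = to_cardinal(raw)
for triple, sources in cardinal.items():
    print(triple, "asserted by", len(sources), "source(s)")
```

Here four raw assertions reduce to two cardinal assertions, while the list of sources per triple preserves the answer to "why would I believe this association".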

Page 24:

We publish about fewer than a million life-science (LS) concepts!

10^6 concept clusters (Knowlets)

Page 25:

BioSemantics Knowledge Discovery Pipeline

data sources

‘coordinated’ data

nanopub cache

cardinal assertion store

semantic data modelling

indexing

reasoning algorithms

trends

phase transitions

‘new’ data

differentials

alerts

funding priorities

semantic query

• gene

• disease

LUMC - LIACS

www.biosemantics.org

Page 26:

© Phortos Consultants

44,000 hypotheses (PPI)

What about the other 43,999?

Page 27:


Part IV

Towards Solutions

Bigger is not Better

Zipping the Explicitome

Page 28:

Electronic Health Databases

Value Added Databases

The Rescued Explicitome

narrative

Tables/figures

Supplementary data

abstract

Total Explicitome: an estimated 10^14 asserted associations in 2,500 data sources

PROVENANCE

ETL to FAIR

FAIR to read

Page 29:

Zipping the Explicitome

Assertions: 10^14

Cardinal: 10^11

Concepts: 10^6

80%

20%

Semantic MedLine

U+C+CT+EG+GO = 36 M

Page 30:


Part V

(FAIR) data should take CENTER STAGE

Page 31:

DOI

PID

ARK

Handles

UUID

TURI’s

?

Page 32:

PID

'provenance' (user defined)

Data (elements)

Metadata (intrinsic)

A simplified diagram of a Digital (data) Object irrespective of technological choices and naming
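
The four parts of the diagram can be written down as a small data structure. The sketch below is one possible reading, independent of technological choices just as the slide says; the class name `DigitalObject`, the field types and the example values are assumptions for illustration, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DigitalObject:
    """Hypothetical container mirroring the four boxes on the slide."""
    pid: str                                                   # persistent identifier
    metadata: Dict[str, Any] = field(default_factory=dict)    # intrinsic metadata
    provenance: Dict[str, Any] = field(default_factory=dict)  # user-defined 'provenance'
    data: List[Any] = field(default_factory=list)             # data elements


# Placeholder values only; "example:0001" is not a real identifier.
obj = DigitalObject(
    pid="example:0001",
    metadata={"format": "csv", "size_bytes": 2048},
    provenance={"creator": "hypothetical lab", "created": "2014-09-22"},
    data=[("gene_A", "associated_with", "disease_X")],
)
print(obj.pid, sorted(obj.metadata), sorted(obj.provenance), len(obj.data))
```

Keeping intrinsic metadata and user-defined provenance in separate fields follows the separation drawn on the slide; how each part is actually stored or resolved is left open.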

Page 33:

PID

'provenance' (user defined)

Data (elements)

Metadata (intrinsic)

Digital Object Architecture s are Digital Objects

Nanopublications are Research Objects

Some Research Objects are

Page 34:

PID

Metadata (intrinsic)

'provenance' (user defined)

Data (elements)

Totally UNFAIR

PID

Metadata (intrinsic)

'provenance' (user defined)

Data (elements)

Findable Usable for Humans

PID

Metadata (intrinsic)

'provenance' (user defined)

Data (elements)

FAIR metadata

PID

Metadata (intrinsic)

'provenance' (user defined)

Data (elements)

FAIR data - restricted access

PID

Metadata (intrinsic)

'provenance' (user defined)

Data (elements)

FAIR data - Open Access

PID

Metadata (intrinsic)

'provenance' (user defined)

Data (elements)

FAIR data - Open Access/Functionally Linked

Data as increasingly FAIR Digital Objects
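
The six labels on this slide read as an ordered ladder, which could be captured as an ordered enumeration. This is a sketch under the assumption that the order in which the labels appear is the order of increasing FAIRness; the names `FairStage`, `is_at_least` and `next_stage` are invented here, and the slide gives no formal criteria for assigning an object to a stage.

```python
from enum import IntEnum


class FairStage(IntEnum):
    """Stages in slide order (assumed to run from least to most FAIR)."""
    TOTALLY_UNFAIR = 0
    FINDABLE_USABLE_FOR_HUMANS = 1
    FAIR_METADATA = 2
    FAIR_DATA_RESTRICTED_ACCESS = 3
    FAIR_DATA_OPEN_ACCESS = 4
    FAIR_DATA_OPEN_ACCESS_FUNCTIONALLY_LINKED = 5


def is_at_least(stage: FairStage, minimum: FairStage) -> bool:
    """True if `stage` is at or above `minimum` on the ladder."""
    return stage >= minimum


def next_stage(stage: FairStage) -> FairStage:
    """Return the next stage up, or the same stage if already at the top."""
    return FairStage(min(stage + 1, max(FairStage)))


current = FairStage.FAIR_METADATA
print(current.name, "->", next_stage(current).name)
print(is_at_least(current, FairStage.FAIR_DATA_OPEN_ACCESS))
```

An ordered type makes statements such as "at least FAIR metadata" easy to check, but deciding which stage a real dataset has reached still needs criteria that the slide does not spell out.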

Page 35:

The Data Stewardship Cycle


5%

Page 36:
Page 37:

Repositories

Data Owners

(supp)data

Databases

ELIXIR FAIR Data Search Index

End-users

FAIR L2

ELIXIR semantic data repository

ELIXIR Data FAIR Port

ELIXIR federated data

FAIR L1

Search for datasets

Download data (sub)sets in many formats (XML, RDF, JSON, etc.)

FAIR L3

FAIR L4

ASPs, in-house IT, bioinformatics, etc.

Tools & Applications

Elixir Fin.

Elixir Esp.

Elixir Nor.

Elixir UK

Elixir SWE

Elixir NL

..

FAIRport proof of concept

www.nanopubmed.org

Page 38:

Parties needed (with typical candidates and NL-example)

1. Trusted Party: usually public sector, with 'data stewardship' mandate

2. Executive Party/Coordinator: usually public or private sector, with expert knowledge on project and relation management

3, 4. Technology Providers, PID/ARTA stewards

5. DOA architecture/IMS: CNRI + EURETOS

6. Publishing pipeline: EURETOS

7. Repository Software

8. eInfrastructure

NL-example: DTL/ELIXIR-nl, others

Page 39:

Malpractices…

Journal Impact Factor

Ignore Altmetrics

No data stewardship plan

Obstruct Tenure Data Experts

‘supplementary data’

Knowledge Sharing Impaired

Page 40:


EUDAT

DATAVERSE

BD2K

ELIXIR

NIH Commons

H2020

DRYAD

RDA

FigShare

Nanopub

Biosharing

Elsevier

Nature

Science

SageBio

NITRD

FORCE11

ORCID

VIVO

HVP

DataCite

EGA

Research Objects

Nebulus

Embassy

SADI

EURETOS

YARCdata

IMI

DANS

interoperability

ISA

Open PHACTS

Data Fabric

Page 41:

Good practices (apart from collaborating)

RO Impact Factor

Award Altmetrics

5% for data stewardship plan

Train & Tenure Data Experts

‘professional data publishing’

FAIR play

Page 42:

THE END

Thank you!

Page 43:

COMMENT: (till October 1st)

ENDORSE: (after October 1st)

1. FAIR guiding principles with public discussion forum: https://www.force11.org/group/fairgroup/fairprinciples

2. Notes and Annexes: https://www.force11.org/node/6062/

3. Group home page: https://www.force11.org/group/fairgroup

Endorsed by 82 organisations and [y] individuals