OmicsDI – Discovering and Linking Public ‘Omics’ Datasets Henning Hermjakob European Bioinformatics Institute UCLA HeartBD2K Center [email protected]


Page 1: Discovering and Linking Public ‘Omics’ Datasets 2 HENNING...OmicsDI – Discovering and Linking Public ‘Omics’ Datasets Henning Hermjakob European Bioinformatics Institute.

OmicsDI – Discovering and Linking Public ‘Omics’ Datasets

Henning Hermjakob
European Bioinformatics Institute
UCLA HeartBD2K Center

[email protected]

Page 2:

OmicsDI Vision
A PubMed for (omics) datasets

http://omicsdi.org

Page 3:

[Architecture diagram. Components shown: Omics databases (XML); EBI Search Indexer; indexing engine on the EBI cluster, with indexed data held in Lucene indexes (520 GB, 1.1B entries); search engine with cache servers, reached through REST web services; Omics App, consisting of a RESTful web service (end points: Statistics, Datasets), a database, and a web app (search, statistics, tagging).]

Page 4:

Dataset XML
Validator

Mandatory Fields:
• Repository Id
• Dataset Title
• Publication date
• Submitter information (Name, Affiliation)
• Original URL

Desired Fields:
• Description/Abstract
• Sample and Data Protocols
• PubMed Id
• Organism, Tissue, Disease

Additional Fields:
• Protein Id (Ensembl or UniProt)
• Metabolite Id (ChEMBL)
• More…
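
The field tiers on this slide lend themselves to a simple validation pass. The following is a minimal sketch in Python, assuming a flat XML layout with hypothetical element names (`repository_id`, `title`, and so on); the actual OmicsDI XML schema is not shown in the slides and may well differ. Missing mandatory fields are reported as errors, missing desired fields as warnings.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real OmicsDI schema is not given in the slides.
MANDATORY = [
    "repository_id", "title", "publication_date",
    "submitter_name", "submitter_affiliation", "original_url",
]
DESIRED = [
    "description", "sample_protocol", "data_protocol",
    "pubmed_id", "organism", "tissue", "disease",
]


def _missing(entry, names):
    """Return the names whose element is absent or contains only whitespace."""
    missing = []
    for name in names:
        node = entry.find(name)
        if node is None or not (node.text or "").strip():
            missing.append(name)
    return missing


def validate_dataset_xml(xml_text):
    """Report missing mandatory fields as errors and missing desired fields as warnings."""
    entry = ET.fromstring(xml_text)
    return {"errors": _missing(entry, MANDATORY), "warnings": _missing(entry, DESIRED)}


# Illustrative record, not a real dataset.
EXAMPLE = """
<dataset>
  <repository_id>EXAMPLE-0001</repository_id>
  <title>Illustrative dataset</title>
  <publication_date>2016-01-01</publication_date>
  <submitter_name>A. Submitter</submitter_name>
  <original_url>http://example.org/EXAMPLE-0001</original_url>
  <organism>Homo sapiens</organism>
</dataset>
"""

report = validate_dataset_xml(EXAMPLE)
print(report["errors"])    # mandatory fields that are missing
print(report["warnings"])  # desired fields that are missing
```

Treating the two tiers differently means a repository can be rejected only for what is strictly required, while still being told which optional metadata would improve discoverability.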

Page 5:

Graphical Browsing

Perez-Riverol, Yasset, et al. "Omics Discovery Index - Discovering and Linking Public Omics Datasets." bioRxiv (2016): 049205.

Page 6:

Search results overview

Page 7:

Dataset view

Page 8:

Access Metrics

Page 9:

Multi-Omics linking

Page 10:

Data re-use

Page 11:

Data re-use

Data re-use

Page 12:

Schema.org integration
- Search engine exposure
- Citability

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Dataset",
  "name": "Expression data from skin biopsy samples from patients with moderate-to-severe psoriasis",
  "description": "A gene expression profiling sub-study was conducted in which skin biopsy samples were collected from 85 patients with moderate-to-severe psoriasis who were participating in ACCEPT, an IRB-approved Phase 3, multicenter, randomized trial. This analysis identified 4,175 probe-sets as being significantly modulated in psoriasis lesions (LS) compared with matched biopsies of non-lesional (NL) skin. Skin biopsy samples (n=170) were collected at baseline for RNA extraction and microarray analysis from 85 patients with moderate-to-severe psoriasis without receiving active psoriasis therapy.",
  "sameAs": "http://www.ebi.ac.uk/gxa/experiments/E-GEOD-30999",
  "creator": {
    "@type": "Person",
    "name": "Suárez-Fariñas Mayte"
  },
  "url": "http://www.omicsdi.org/dataset/ExpressionAtlas/E-GEOD-30999"
}
</script>

http://www.omicsdi.org/dataset/atlas-experiments/E-GEOD-30999
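
A JSON-LD block like the one on this slide can be generated from dataset metadata rather than written by hand. The sketch below is one way to do it in Python with the standard `json` module; the function name `dataset_jsonld` and its parameters are illustrative and are not taken from the OmicsDI code base. It uses only the schema.org properties that appear on the slide.

```python
import json


def dataset_jsonld(name, description, url, same_as, creator_name):
    """Build a schema.org Dataset description wrapped in a JSON-LD <script> element."""
    record = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "sameAs": same_as,
        "creator": {"@type": "Person", "name": creator_name},
        "url": url,
    }
    # ensure_ascii=False keeps names such as "Suárez-Fariñas" readable in the page source.
    body = json.dumps(record, indent=2, ensure_ascii=False)
    # "\/" is a valid JSON escape for "/", so this stops a value from closing the script tag early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


snippet = dataset_jsonld(
    name="Expression data from skin biopsy samples from patients with moderate-to-severe psoriasis",
    description="(abstract omitted in this example)",
    url="http://www.omicsdi.org/dataset/ExpressionAtlas/E-GEOD-30999",
    same_as="http://www.ebi.ac.uk/gxa/experiments/E-GEOD-30999",
    creator_name="Suárez-Fariñas Mayte",
)
print(snippet)
```

Serialising through `json.dumps` guarantees straight double quotes and correct escaping, which hand-edited markup can easily get wrong.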

Page 13:

Biological Similarity

[Diagram: proteomic (P), genomic (G) and metabolomic (M) datasets, each with its publication and metadata. Datasets are linked through cross-references to UniProt, Ensembl, ChEMBL and PubMed, and through Reactome pathways and IntAct.]
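
The slide does not say how biological similarity is scored. As a sketch under that caveat, one simple option is the Jaccard index over the cross-referenced identifiers that two datasets share. The Python below assumes that choice; the dataset names and the assignment of identifiers to datasets are illustrative and do not describe real OmicsDI entries.

```python
def jaccard(a, b):
    """Jaccard index of two identifier sets: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical datasets mapped to the protein identifiers they cross-reference.
datasets = {
    "proteomics_A": {"P04637", "P38398", "P00533"},
    "proteomics_B": {"P04637", "P00533", "P01116"},
    "proteomics_C": {"P68871"},
}


def most_similar(query, catalogue):
    """Rank all other datasets by Jaccard similarity to `query`, highest first."""
    scores = [
        (other, jaccard(catalogue[query], ids))
        for other, ids in catalogue.items()
        if other != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("proteomics_A", datasets)
print(ranking)
```

Here `proteomics_A` and `proteomics_B` share two of four distinct identifiers, so they score 0.5, while `proteomics_C` shares none and scores 0.0.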

Page 14:

Biological Similarity

Page 15:

OmicsDI and bioCADDIE
• Originally independently funded
• Administrative supplement 2016

PIs
• Peipei Ping, UCLA
• Lucila Ohno-Machado, UCSD
• Susanna-Assunta Sansone, U Oxford
• Eric Deutsch, ISB
• Henning Hermjakob, EBI

WPs
• Map OmicsDI, DATS data model
• Re-usable visualisation widgets
• Access metrics

Collaboration
• OmicsDI provides metadata from “its” repositories to DataMed
• OmicsDI goes more into the “depth” for omics
• DataMed focuses on breadth

Page 16:

Acknowledgements

Yasset Perez-Riverol
Mingze Bai
Gaurhari Dass

PRIDE Team
Rui Wang, Tobias Ternent, Noemi del Toro, Juan Antonio Vizcaino, Henning Hermjakob

Metabolights Team
Kenneth Haug, Pablo Conesa Mingo

EGA Team
Helen Parkinson, Justin Paschall, Dylan Spalding

NIH BD2K Center of Excellence @ UCLA
Grant number 1U54GM114833-01
Peipei Ping, Vincent Ky

EBI Search Team
Silvano Squizzato, Young Mi Park, Rodrigo Lopez

Page 17:

Page 18:

The Bigger Picture - Workflows

Data Discovery:
• OmicsDI

Tool Discovery:
• BD2K Coordination Center

Data Access: ProXI
• Web services based retrieval of data from OmicsDI repositories

Data Analysis:
• Reactome
Many options, data dependent

Page 19:

Finding a publication
• Straightforward through PubMed or (Europe) PubMed Central

Page 20:

Finding a Dataset

• Many disconnected search entry points

• Google does not work well, as it does not separate out datasets

• Vision: A PubMed for (‘omics) datasets

Page 21:

Indexing System

Lucene-based Indexer System:

Strengths:
• Already implemented
• Open source if we need to migrate the infrastructure
• Being indexed alongside all other EBI information facilitates cross-references
• Indexes all of EBI (1.1B entries), known to scale well

Limitations:
• Only an indexing system, not a database -> no persistence
• Relies on EBI infrastructure

[Diagram: EBI Search Indexer. Indexing engine on the EBI cluster, with indexed data held in Lucene indexes (520 GB, 1.1B entries); search engine with cache servers, reached through REST web services.]
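
To illustrate what an index of this kind does, and why the slide describes it as "not a database", here is a toy in-memory inverted index in Python. It is not Lucene and not the EBI Search implementation; the class name `ToyIndex` and the sample records are hypothetical. The index maps each token to the ids of the documents that contain it and stores nothing else, so it can answer "which records match?" but cannot return or persist the records themselves.

```python
from collections import defaultdict
import re


class ToyIndex:
    """In-memory inverted index: token -> set of document ids. Nothing is persisted."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query token (AND semantics)."""
        tokens = self._tokens(query)
        if not tokens:
            return set()
        hits = [self._postings.get(t, set()) for t in tokens]
        return set.intersection(*hits)


index = ToyIndex()
index.add("DS1", "Human heart proteome")
index.add("DS2", "Mouse heart transcriptome")
index.add("DS3", "Human liver metabolome")

print(index.search("human heart"))  # only DS1 contains both tokens
```

A separate store, such as the database listed for the DDI application, is still needed for anything that must survive a rebuild of the index.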

Page 22:

DDI application

• Database (Mongo)
  • Access statistics
• Web Service
  • Search
  • Statistics
• Web Application
  • Statistics
  • Browsing
  • Knowledge Discovery

[Diagram: Omics App. RESTful web service (end points: Statistics, Datasets), database, and web app (search, statistics, tagging).]
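
The slide names two web service end points, Statistics and Datasets, without giving their URLs. The following Python sketch shows how a client might compose GET requests against such end points. The base URL is a placeholder, and the paths `dataset/search` and `statistics` and the parameter names `query`, `start` and `size` are assumptions made for illustration; consult the service documentation for the real ones.

```python
from urllib.parse import urlencode


def build_url(base_url, endpoint, **params):
    """Compose a GET URL for a REST end point, dropping parameters that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + "/" + endpoint.strip("/")
    return url + ("?" + query if query else "")


# Placeholder base URL and hypothetical paths mirroring the two end points on the slide.
BASE = "http://example.org/ws"

search_url = build_url(BASE, "dataset/search", query="psoriasis skin", start=0, size=20)
stats_url = build_url(BASE, "statistics")

print(search_url)
print(stats_url)
```

Keeping URL construction in one function makes it easy to swap the placeholder base for the real service address and to add paging parameters consistently.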

Page 23:

Ontology-aware Indexing
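
The slide carries only its title. One common reading of ontology-aware indexing is that an entry annotated with a specific term is also indexed under that term's ancestors, with synonyms mapped to a preferred term, so that a search for a broader term still finds it. The Python sketch below assumes that reading; the tiny ontology and the synonym table are hypothetical stand-ins for a real ontology.

```python
# Hypothetical mini-ontology: term -> parent term (None marks a root).
PARENT = {
    "cardiac muscle": "heart",
    "heart": "organ",
    "liver": "organ",
    "organ": None,
}
# Hypothetical synonyms mapped to their preferred term.
SYNONYM = {"myocardium": "cardiac muscle"}


def index_terms(annotation):
    """Return the annotation's preferred term followed by all of its ancestors."""
    term = SYNONYM.get(annotation.lower(), annotation.lower())
    terms = []
    while term is not None:
        terms.append(term)
        term = PARENT.get(term)  # unknown terms have no parent, which ends the walk
    return terms


print(index_terms("Myocardium"))
```

With these tables, a dataset annotated "Myocardium" would be indexed under "cardiac muscle", "heart" and "organ", so a query for "heart" would retrieve it.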

Page 24:

Ontology Highlighting