Kolker E. An Introduction to MOPED: Multi-Omics Profiling Expression Database


Page 1

BDBM, Moscow, June 30, 2014

An Introduction to MOPED: Multi-Omics Profiling Expression Database

Eugene Kolker, [email protected], [email protected]

Page 2

What is MOPED?

moped.proteinspire.org

Publicly accessible multi-omics database

Protein, gene, and pathway expression data

Expression categorized by organism, tissue, condition, and localization

More info on kolkerlab.org

Thanks to Oxana Trifonova, Andrey Lisitsa!


Page 3

The Multi-Omics Cascade

GENOME: What can happen?

TRANSCRIPTOME: What appears to be happening?

PROTEOME: What has happened and what makes it happen?

METABOLOME: What has happened and what is happening?

Cell & Organism: ???

Modified from Hammock, 2007

Page 4

Key MOPED Features

Protein and Gene pages summarize expression, external links, and pathway connections

Expression data consistently processed from raw data

Relative expression experiments for comparisons across tissues and conditions

Page 5

Key MOPED Features (continued)

Discover expression of pathways within experiments

Experiment metadata linked to expression data

Visualizations of expression data along the chromosome

Page 6

Protein Details

Multi-omics connection to the corresponding gene

Connections to:

Pathways (from Reactome, BioCyc, and PANTHER)

External databases (including GeneCards, UniProt, NCBI)

Protein concentrations in ppm, ng/mL, and nM
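Of the concentration units listed on this slide, the two absolute ones can be interconverted with standard arithmetic once a protein's molecular weight is known. The sketch below is illustrative only, not MOPED's own code, and the helper names are hypothetical. Since 1 ng/mL equals 1e-6 g/L, nM = (ng/mL) × 1000 / MW, with MW in g/mol (Da). Values in ppm are assumed here to be relative abundances (parts per million of total protein), so converting them to ng/mL or nM would also require the sample's total protein amount. That step is not shown.

```python
# Hypothetical helpers (not part of MOPED): convert a protein's mass
# concentration (ng/mL) to molar concentration (nM) and back.
# The molecular weight must be supplied in g/mol (numerically equal to Da).


def ng_per_ml_to_nm(conc_ng_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert ng/mL to nM for a protein of the given molecular weight."""
    if mw_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    # 1 ng/mL = 1e-6 g/L; dividing by g/mol gives mol/L; multiplying by 1e9 gives nmol/L.
    return conc_ng_per_ml * 1e3 / mw_g_per_mol


def nm_to_ng_per_ml(conc_nm: float, mw_g_per_mol: float) -> float:
    """Inverse of ng_per_ml_to_nm: convert nM back to ng/mL."""
    if mw_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_nm * mw_g_per_mol / 1e3


if __name__ == "__main__":
    # Made-up example: a 50 kDa protein measured at 10 ng/mL.
    nm = ng_per_ml_to_nm(10.0, 50_000.0)
    print(f"{nm:.3f} nM")  # 0.200 nM
    print(f"{nm_to_ng_per_ml(nm, 50_000.0):.1f} ng/mL")  # 10.0 ng/mL
```

Because the conversion depends on molecular weight, two proteins with the same ng/mL value can differ severalfold in nM, which is one reason to report both units side by side.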

Page 7

Gene Chromosome Visualizations

Advanced Filtering

Relative Gene Expression Data

Page 8

Experiment Metadata

Nature, 2013

OMICS, 2014

Page 9

Release Statistics

• >4 million Gene Expression Records

• >600,000 Protein Expression Records

• Data on Human, Mouse, Worm, Yeast

• >60,000 proteins

• >90,000 genes

• >5,000 pathways

• >22,000 users from 90 countries

Nature, 2014

Pandey: ~2,200 raw data sets, 1.2 TB

Kuster: half as much, plus other labs' data

Page 10

5 Vs of Big Data

Volume, Veracity, Velocity, Variety, and Value

Banking/Marketing/IT: Volume, Velocity

Value

Life Sciences/Healthcare: Veracity, Variety

Big Data, 2013, 1(1)

Page 11

What is DELSA?

Data-Enabled Life Sciences Alliance @ delsaglobal.org

Data → Knowledge → Action → Outcomes

Page 12

Contact EK: [email protected] [email protected]

For more info: moped.proteinspire.org and kolkerlab.org

Thank you!

Questions?

Page 13

Protein Relative Expression

Page 14

Life Sciences and the Fourth Paradigm

- Theory, Experimentation, Simulation, & Data-enabled Science

- Enormous increase in the scale of data generation; vast data diversity and complexity

- Development, improvement, and sustainability of 21st-century tools, databases, algorithms & cyberinfrastructure

- Past: 1 PI (Lab/Institute/Consortium) = 1 (Gene) Problem

- Future: knowledge ecologies and new metrics to assess scientists & outcomes (lab's capabilities vs. ideas/impact)

- Unprecedented opportunities for scientific discovery and solutions to major world problems

Urgent Need:

A Sustainable Supporting Ecosystem!

Page 15

From Data to Outcomes

High-dimensional data are particularly prone to overfitting; as a result, a computational model emerging from the research and discovery phase may function well on the samples used for the discovery research, but is inaccurate on any other sample.

Micheel, Nass, Omenn, US National Academies, 2012

The future of science will be influenced by the interconnectivity of governments, research and educational institutions, and individual citizens around the globe.

Subra Suresh, NSF, 2012

Page 16

What is the Local FDR (LFDR)?

• FDR measures the cumulative false rate above the threshold (shaded areas)

• LFDR measures the false rate at the threshold itself (heights)

• LFDR = b/(a+b), where a and b are the heights of the correct-ID and incorrect-ID score distributions at the threshold

• If there are many IDs above the threshold, FDR can be small (e.g., 2%) while LFDR is large (e.g., 20%)

• Using LFDR prevents bad IDs near the threshold from being lumped together with good IDs

Bioinformatics, 2008

Proteomics, 2010
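The difference between FDR (areas above the threshold) and LFDR (heights at the threshold) can be illustrated with a toy two-component model. This is a minimal sketch, not the estimation method of the cited Bioinformatics (2008) and Proteomics (2010) papers. It assumes, hypothetically, that scores of correct and incorrect identifications follow normal distributions with known means, a shared standard deviation, and known counts. In practice these distributions have to be estimated, for example from target-decoy searches. Here `a` and `b` are the heights of the correct and incorrect score curves at threshold `t`, as in LFDR = b/(a+b).

```python
import math

# Toy model (assumed for illustration): correct-ID scores ~ Normal(mu_correct, sd),
# incorrect-ID scores ~ Normal(mu_incorrect, sd), with known counts of each.


def normal_pdf(x: float, mu: float, sd: float) -> float:
    """Height of the normal density at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_sf(x: float, mu: float, sd: float) -> float:
    """P(score > x) for a normal distribution (area above x)."""
    return 0.5 * math.erfc((x - mu) / (sd * math.sqrt(2.0)))


def fdr_and_lfdr(t: float, n_correct: int, mu_correct: float,
                 n_incorrect: int, mu_incorrect: float, sd: float = 1.0):
    """Return (FDR, LFDR) at score threshold t under the toy model."""
    # FDR: incorrect IDs above t divided by all IDs above t (ratio of areas).
    false_above = n_incorrect * normal_sf(t, mu_incorrect, sd)
    true_above = n_correct * normal_sf(t, mu_correct, sd)
    fdr = false_above / (true_above + false_above)
    # LFDR: curve heights exactly at t, LFDR = b / (a + b).
    a = n_correct * normal_pdf(t, mu_correct, sd)
    b = n_incorrect * normal_pdf(t, mu_incorrect, sd)
    lfdr = b / (a + b)
    return fdr, lfdr


if __name__ == "__main__":
    # Made-up numbers: 1000 correct IDs centred at score 4, 1000 incorrect at 0.
    fdr, lfdr = fdr_and_lfdr(t=2.0, n_correct=1000, mu_correct=4.0,
                             n_incorrect=1000, mu_incorrect=0.0)
    print(f"FDR = {fdr:.3f}, LFDR = {lfdr:.3f}")  # FDR = 0.023, LFDR = 0.500
```

With these made-up numbers only about 2.3% of all IDs accepted above a score of 2 are false, yet an ID scoring exactly 2 is as likely to be false as correct. This is the situation the slide warns about, where a small cumulative FDR hides poor IDs sitting right at the threshold.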