Kolker E. An introduction to MOPED: Multi-Omics Profiling Expression Database

Post on 16-Jul-2015


BDBM, Moscow June 30, 2014

An Introduction to MOPED: Multi-Omics Profiling Expression Database

Eugene Kolker

eugene.kolker@seattlechildrens.org, egnklkr@gmail.com

What is MOPED?

moped.proteinspire.org

Publicly accessible Multi-Omics Database

Protein, Gene, and Pathway expression data

Expression categorized by organism, tissue, condition, localization

More info on kolkerlab.org

Thanks to Oxana Trifonova, Andrey Lisitsa!


The Multi-OMICS Cascade

PROTEOME

TRANSCRIPTOME

GENOME

METABOLOME

Cell & Organism

What can happen?

What appears to be happening?

What has happened and what makes it happen?

What has happened and what is happening?

???

Modified from Hammock, 2007

Protein and Gene pages summarize expression, external links, and pathway connections

Consistently processed expression from raw data

Relative expression experiments for comparisons across tissues and conditions

Key MOPED Features

Discover expression of pathways within experiments

Experiment Metadata linked to expression data

Visualizations of expression data along the chromosome

Key MOPED Features, 2

Protein Details

Multi-omics connection to Gene

Connections to:

Pathways (from Reactome, BioCyc, and PANTHER)

External Databases (including GeneCards, UniProt, NCBI)

Protein concentrations in ppm, ng/mL, nM

Gene Chromosome Visualizations

Advanced Filtering

Relative Gene Expression Data

Experiment Metadata

Nature, 2013

OMICS, 2014

• >4 million Gene Expression Records

• >600,000 Protein Expression Records

• Data on Human, Mouse, Worm, Yeast

• >60,000 proteins

• >90,000 genes

• >5,000 pathways

• >22,000 users from 90 countries

Nature, 2014

Pandey: ~2200 raw data sets, 1.2 TB

Kuster: about half as much, plus other labs’ data

Release Statistics

Volume, Veracity, Velocity, Variety, and Value

Banking/Marketing/IT: Volume, Velocity

Value

Life Sciences/Healthcare: Veracity, Variety

5 Vs of Big Data

Big Data, 2013, 1(1)

What is DELSA?

Data-Enabled Life Sciences Alliance @ delsaglobal.org

Data

Knowledge

Action

Outcomes

Contact EK: egnklkr@gmail.com or eugene.kolker@seattlechildrens.org

For more info: moped.proteinspire.org and kolkerlab.org

Thank you!

Questions?

Protein Relative Expression

Life Sciences and Fourth Paradigm

- Theory, Experimentation, Simulation, & Data-enabled Science

- Enormous increase in scale of data generation, vast data diversity and complexity

- Development, improvement and sustainability of 21st Century tools, databases, algorithms & cyberinfrastructure

- Past: 1 PI (Lab/Institute/Consortium) = 1 (Gene) Problem

- Future: Knowledge ecologies and New metrics to assess scientists & outcomes (lab’s capabilities vs. ideas/impact)

- Unprecedented opportunities for scientific discovery and solutions to major world problems

Urgent Need:

A Sustainable Supporting Ecosystem!

High-dimensional data are particularly prone to overfitting; as a result, a computational model emerging from the research and discovery phase may function well on the samples used for the discovery research, but is inaccurate on any other sample.

Micheel, Nass, Omenn, US National Academies, 2012

The future of science will be influenced by the interconnectivity of governments, research and educational institutions, and individual citizens around the globe.

Subra Suresh, NSF, 2012

From Data to Outcomes

What is the Local FDR (LFDR)?

• FDR measures the cumulative false rate over all IDs above the threshold (shaded areas)

• LFDR measures the false rate at a given threshold (heights)

• LFDR = b/(a+b)

• If there are many IDs above the threshold, it is possible for FDR to be small (e.g. 2%) and LFDR big (e.g. 20%)

• Using LFDR prevents bad IDs from being lumped with good IDs
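
The difference between the two measures can be illustrated with a minimal Python sketch. It is not the method from the cited papers. It assumes that a and b are the heights of the correct-ID and incorrect-ID score distributions at the threshold, represented here as histogram counts per score bin. The function name and the counts are hypothetical, chosen only to reproduce the 2% vs. 20% situation described above.

```python
def fdr_and_lfdr(correct_counts, incorrect_counts, threshold_bin):
    """Return (FDR, LFDR) for a threshold placed at `threshold_bin`.

    Bins are ordered by increasing score. The counts are hypothetical
    histograms of correct and incorrect identifications (IDs).
    """
    # Local FDR: heights of the two distributions at the threshold itself.
    a = correct_counts[threshold_bin]    # assumed: correct IDs at threshold
    b = incorrect_counts[threshold_bin]  # assumed: incorrect IDs at threshold
    lfdr = b / (a + b) if (a + b) else 0.0

    # FDR: cumulative areas at or above the threshold.
    good_above = sum(correct_counts[threshold_bin:])
    bad_above = sum(incorrect_counts[threshold_bin:])
    total = good_above + bad_above
    fdr = bad_above / total if total else 0.0
    return fdr, lfdr


# Illustrative counts only: many high-scoring correct IDs dilute the
# cumulative rate, while the rate right at the threshold stays high.
correct = [8, 40, 150, 300, 500]
incorrect = [2, 6, 5, 4, 3]

fdr, lfdr = fdr_and_lfdr(correct, incorrect, threshold_bin=0)
print(f"FDR  = {fdr:.3f}")   # 0.020
print(f"LFDR = {lfdr:.3f}")  # 0.200
```

With these counts, one in five IDs sitting right at the threshold is wrong even though only about 2% of all accepted IDs are, which is the case in which filtering on LFDR rather than FDR would reject the borderline IDs.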

Bioinformatics, 2008

Proteomics, 2010