Data integration - Integration of functional associations using STRING

Posted on 10-May-2015

EMBO World Practical Course on Computational Biology, Shanghai Jiao Tong University, Shanghai, China, August 22, 2009.

Data integration - Integration of functional associations using STRING

Lars Juhl Jensen

Jensen, Kuhn et al., Nucleic Acids Research, 2009

functional associations

confidence scores

cross-species integration

630 genomes

model organism databases

Ensembl

RefSeq

defining orthology

two modes

protein mode

von Mering et al., Nucleic Acids Research, 2005

COG mode

von Mering et al., Nucleic Acids Research, 2005

genomic context

gene fusion

Korbel et al., Nature Biotechnology, 2004

conserved neighborhood

operons

Korbel et al., Nature Biotechnology, 2004

bidirectional promoters

Korbel et al., Nature Biotechnology, 2004

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
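
The phylogenetic-profile idea (genes that are present and absent in the same genomes are likely to be functionally associated) can be sketched as a comparison of presence/absence sets. This is a minimal illustration only: the gene names, genome labels, and the choice of Jaccard similarity are assumptions made for the example, not the scoring scheme used in STRING.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sets of genome labels)."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical profiles: the set of genomes in which each gene has an ortholog.
profiles = {
    "geneA": {"g1", "g2", "g4"},
    "geneB": {"g1", "g2", "g4"},
    "geneC": {"g3", "g5"},
}

# Identical profiles score high; non-overlapping profiles score zero.
score_ab = jaccard(profiles["geneA"], profiles["geneB"])
score_ac = jaccard(profiles["geneA"], profiles["geneC"])
```

A real analysis would also have to account for the phylogenetic relatedness of the genomes, which this sketch ignores.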

examples

bacterial Cox assembly

Banci et al., PNAS, 2005

Banci et al., PNAS, 2005

cellulose degradation

[Figure labels: Cell, Cellulosomes, Cellulose]

experimental data

protein interactions

yeast two-hybrid

affinity purification

fragment complementation

Jensen & Bork, Science, 2008

genetic interactions

Beyer et al., Nature Reviews Genetics, 2007

BIND - Biomolecular Interaction Network Database

BioGRID - General Repository for Interaction Datasets

DIP - Database of Interacting Proteins

IntAct

MINT - Molecular Interactions Database

HPRD - Human Protein Reference Database

PDB - Protein Data Bank

inferred associations

gene coexpression

GEO - Gene Expression Omnibus

expression compendia

curated knowledge

complexes

MIPS - Munich Information Center for Protein Sequences

Gene Ontology

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

KEGG - Kyoto Encyclopedia of Genes and Genomes

MetaCyc

Reactome

PID - NCI-Nature Pathway Interaction Database

literature mining

>10 km

MEDLINE

SGD - Saccharomyces Genome Database

The Interactive Fly

OMIM - Online Mendelian Inheritance in Man

co-mentioning
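
Co-mentioning can be sketched as counting how often two gene names appear in the same document. The example below assumes that name recognition has already been done and that each abstract is given as a set of recognized names; the three toy abstracts are invented for illustration, and raw counts like these would still need to be turned into a quality score.

```python
from collections import Counter
from itertools import combinations


def comention_counts(documents):
    """Count, for each unordered pair of names, the documents mentioning both."""
    counts = Counter()
    for names in documents:
        # Sort so each pair has one canonical order, e.g. ("CYC1", "HAP1").
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts


# Hypothetical abstracts, each reduced to the gene names recognized in it.
abstracts = [
    {"CYC1", "CYC7", "HAP1"},
    {"CYC1", "HAP1"},
    {"GAL4"},
]

counts = comention_counts(abstracts)
```

Here `counts[("CYC1", "HAP1")]` is 2, because both names occur together in two of the three abstracts.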

NLP - Natural Language Processing

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

[nxgene The GAL4 gene]

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

easy in theory …

… but not in practice

many data types

not comparable

variable quality

many sources

different file formats

different gene identifiers

partially redundant

spread over 630 genomes

quality scores

reproducibility

von Mering et al., Nucleic Acids Research, 2005

intergenic distances

benchmarking

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005
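
Calibration against a gold standard can be sketched as follows: bin the raw scores and, for each bin, measure the fraction of scored pairs that the gold standard confirms. The pairs, raw scores, bin edges, and gold-standard set below are all hypothetical, and simple binning stands in for whatever curve fitting is used in practice.

```python
def calibrate(scored_pairs, gold_positive, bin_edges):
    """Return, per raw-score bin, the fraction of pairs found in the gold standard.

    Bins are half-open intervals [edge_i, edge_i+1); empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for pair, raw in scored_pairs:
        for i in range(n_bins):
            if bin_edges[i] <= raw < bin_edges[i + 1]:
                totals[i] += 1
                if pair in gold_positive:
                    hits[i] += 1
                break
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical raw quality scores for four protein pairs.
scored_pairs = [
    (("a", "b"), 0.9),
    (("a", "c"), 0.8),
    (("b", "c"), 0.2),
    (("c", "d"), 0.1),
]
gold_positive = {("a", "b")}  # pairs assumed to be true associations

calibrated = calibrate(scored_pairs, gold_positive, [0.0, 0.5, 1.0])
```

The resulting per-bin fractions play the role of probabilistic scores, which makes evidence from different data types comparable.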

raw quality scores

probabilistic scores

integrate over orthologs

protein mode

von Mering et al., Nucleic Acids Research, 2005

COG mode

von Mering et al., Nucleic Acids Research, 2005

combine all evidence

Frishman et al., Modern Genome Annotation, 2009
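
Once every evidence type has a probabilistic score, the scores can be combined. A minimal sketch, assuming the evidence types are independent: the combined score is the probability that at least one of them is correct, 1 - (1 - s1)(1 - s2)...(1 - sn). The function name and the example scores are illustrative, and any prior correction or other refinement used in practice is omitted.

```python
import math


def combine(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - s_i)."""
    return 1.0 - math.prod(1.0 - s for s in scores)


# Two hypothetical evidence types, each with a 0.5 confidence score.
combined_two = combine([0.5, 0.5])

# Three hypothetical evidence types with different confidence scores.
combined_three = combine([0.6, 0.7, 0.4])
```

The combined score is never lower than the best individual score, so weak but independent lines of evidence can add up to a confident association.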

small molecules

Kuhn et al., Nucleic Acids Research, 2008

metametabolomics

Acknowledgments

Christian von Mering

Michael Kuhn

Manuel Stark

Samuel Chaffron

Philippe Julien

Monica Campillos

Tobias Doerks

Jan Korbel

Berend Snel

Martijn Huynen

Peer Bork

larsjuhljensen