Computational Proteomics of Protein Complexes
Mark B Gerstein, Yale U
Talk at NIH, 2003.04.07

(c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu. Gerstein.info/talks (c) 2003. Do not reproduce without permission.

Page 1:

Computational Proteomics of Protein Complexes

Mark B Gerstein, Yale U

Talk at NIH, 2003.04.07

Page 2:

The Interactome: the Next ‘omic Step

Interactome

Proteome

Transcriptome

Genome

Page 3:

The popularity of interactome information

[Figure: citations per year (y-axis, 0 to 450) against year (x-axis, 1999 to 2003). Series: Gavin et al. p-p int dataset; Ho et al. p-p int dataset; Uetz et al. p-p int dataset; Ribosome Structure; Spellman et al. Expression Expt.; deRisi et al. Expression Expt.]

Page 4:

Computational Proteomics of Complexes

1. Interactions provide a systematic way of defining protein function on a genomic scale

2. Known complexes provide a benchmark to validate and integrate genome-wide interaction experiments, providing a more accurate interactome

3. Known complexes provide a focus for the integration of (non-interaction) genomic information – e.g. expression data

4. Extrapolating from known complexes, one can predict protein complexes on a genome-scale via integrating experimental interactions and non-interaction information (combining #2 and #3)

Page 5:

Circumscribing Protein Function in terms of Interactions

Page 6:

Understanding Protein Function on a Genomic Scale

• 250 of 650 known on chr. 22 [Dunham et al.]

• >>30K+ proteins in entire human genome (alt. splicing)

Page 7:

Issues in defining protein function on a genomic scale

• Multi-functionality: 2 functions/protein (also 2 proteins/function)

• Role Conflation: molecular, cellular, phenotypic

• Fun terms… but do they scale?
   Starry night; Sarah (affects female fertility); Sonic; Darkener of apricot & suppressor of white apricot; Redtape, gridlock, roadblock (when mutated block transport along axons); ROP vs ROM ("Regulator of Copy Number" or RNA-I-II-complex-binding-protein)

• For now, definable aspects of function: interactions, location, enzymatic rxn. [Babbit]

Page 8:

Ontologies for function: Networks, Hierarchies, DAGs

All of SCOP entries

ENZYME
   1 Oxidoreductases
      1.1 Acting on CH-OH
         1.1.1 NAD and NADP acceptor
            1.1.1.1 Alcohol dehydrogenase
            1.1.1.3 Homoserine dehydrogenase
      1.5 Acting on CH-NH
   3 Hydrolases
      3.1 Acting on ester bonds
         3.1.1 Carboxylic ester hydrolases
            3.1.1.1 Carboxylesterase
            3.1.1.8 Cholinesterase
      3.4 Acting on peptide bonds

NON-ENZYME
   1 Metabolism
      1.1 Carb. metab.
         1.1.1 Polysach. metab.
            1.1.1.1 Glycogen metab.
            1.1.1.2 Starch metab.
      1.2 Nucleotide metab.
   3 Cell structure
      3.1 Nucleus
      3.8 Extracel. matrix
         3.8.2 Extracel. matrix glycoprotein
            3.8.2.1 Fibronectin
            3.8.2.2 Tenascin

General similarity; Functional class similarity; Precise functional similarity

Page 9:

Ontologies for function: Interaction vectors

Lan et al. IEEE (2002) & COSB (2003)

Page 10:

Validating and Integrating Genomic Protein-Protein Interaction Datasets with Known Complexes

Page 11:

Protein interaction data

• Databases (BIND, DIP, MIPS etc.): literature

• High-throughput datasets: in vivo pull-down, yeast two-hybrid

• Computational predictions

Tangential genomic data

• Expression data

• Phenotypic data

• Localization data

Page 12:

Combining interaction data

• High-throughput data is less reliable than more careful, smaller-scale experiments

• Orthogonal datasets

• Combining data increases accuracy and coverage

• How to do this in a quantitative way?
   How to weight the different data sources?
   General classification problem (machine learning)
   Bayesian networks: probabilistic

Page 13:

Example of data integration: RNA polymerase II

Which subunits interact? -> protein-protein interaction experiments

Kornberg et al., 2001

Compare with Gold Std. structure:

Edwards, Kus, Jansen, Greenbaum, Greenblatt, Gerstein, TIG (2002)

Page 14:

Data integration: RNA polymerase II

Subunit A 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 11

Subunit B 2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 12 12
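The Subunit A and Subunit B rows read as an enumeration of every unordered pair among the ten subunit labels that appear in them. A minimal Python sketch of that enumeration follows; the names `SUBUNITS` and `subunit_pairs` are illustrative and not from the talk.

```python
from itertools import combinations

# Subunit labels as they appear in the Subunit A / Subunit B rows.
SUBUNITS = [1, 2, 3, 5, 6, 8, 9, 10, 11, 12]


def subunit_pairs(subunits):
    """Return every unordered pair (a, b), with a listed before b."""
    return list(combinations(subunits, 2))


pairs = subunit_pairs(SUBUNITS)
print(len(pairs))            # number of candidate subunit pairs
print(pairs[:3], pairs[-1])  # first few pairs and the last one
```

Each column of the later rows (structural contact, Far western, and so on) then refers to one of these pairs.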

Page 15:

Data integration: RNA polymerase II

Subunit A 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 11

Subunit B 2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 12 12

structural contact 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Page 16:

Data integration: RNA polymerase II

Interaction experiments before structure was known

Subunit A 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 11

Subunit B 2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 12 12

structural contact 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Far western 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0

Cross-linking 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1

Far western 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Pull-down 1 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0

Pull-down 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0

Pull-down 1 1 1 0 1 0 0 1 0

Far western 1 0 0 0 1 0

Page 17:

Data integration: RNA polymerase II

Subunit A 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 11

Subunit B 2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 12 12

structural contact 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Far western 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0

Cross-linking 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1

Far western 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Pull-down 1 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0

Pull-down 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0

Pull-down 1 1 1 0 1 0 0 1 0

Far western 1 0 0 0 1 0

[Legend: false / true]

Page 18:

Data integration: RNA polymerase II

Integrate using naive Bayes classifier

Subunit A 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 11

Subunit B 2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 12 12

structural contact 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Far western 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0

Cross-linking 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1

Far western 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Pull-down 1 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0

Pull-down 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0

Pull-down 1 1 1 0 1 0 0 1 0

Far western 1 0 0 0 1 0

Combined (Bayesian) 0 1 1 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

[Legend: false / true]
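As a rough illustration of what integrating with a naive Bayes classifier can look like, the Python sketch below estimates, for each experiment, how often it reports an interaction for true and for false gold-standard pairs, then multiplies the resulting likelihood ratios into the prior odds. The toy data, the function names and the pseudocount smoothing are assumptions made for illustration; they are not the values or the exact procedure used in the talk.

```python
import math


def fit_naive_bayes(gold, sources, pseudocount=1.0):
    """Estimate the prior and, per source, P(report=1 | contact) and P(report=1 | no contact).

    gold    : list of 0/1 gold-standard labels, one per pair
    sources : list of rows, each a list of 0, 1 or None (None = pair not tested)
    """
    prior = (sum(gold) + pseudocount) / (len(gold) + 2 * pseudocount)
    params = []
    for row in sources:
        # counts[label][report], smoothed so no probability is exactly 0 or 1
        counts = [[pseudocount, pseudocount], [pseudocount, pseudocount]]
        for label, report in zip(gold, row):
            if report is not None:
                counts[label][report] += 1
        p_hit_pos = counts[1][1] / sum(counts[1])
        p_hit_neg = counts[0][1] / sum(counts[0])
        params.append((p_hit_pos, p_hit_neg))
    return prior, params


def posterior(reports, prior, params):
    """Posterior probability of a true contact given one report per source."""
    log_odds = math.log(prior / (1.0 - prior))
    for report, (p_pos, p_neg) in zip(reports, params):
        if report is None:
            continue  # an untested pair contributes no evidence
        if report == 1:
            log_odds += math.log(p_pos / p_neg)
        else:
            log_odds += math.log((1.0 - p_pos) / (1.0 - p_neg))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Toy data (hypothetical, not the slide's values): 6 pairs, 2 sources.
toy_gold = [1, 1, 1, 0, 0, 0]
toy_sources = [
    [1, 1, 0, 0, 0, None],  # fairly reliable source
    [1, 0, None, 1, 0, 1],  # noisier source
]
prior, params = fit_naive_bayes(toy_gold, toy_sources)
print(round(posterior([1, 1], prior, params), 2))
print(round(posterior([None, None], prior, params), 2))
```

Treating the sources as conditionally independent is what makes this "naive"; a pair is called an interaction when its posterior exceeds a chosen threshold.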

Page 19:

Data integration: RNA polymerase II

Integrate using naive Bayes classifier

Majority 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Intersection 1 1 1 0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Union 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
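The Majority, Intersection and Union rows are simple voting rules over the individual experiments. A minimal Python sketch of the three rules for a single subunit pair is given below; how untested pairs and tied votes were treated on the slide is not stated, so ignoring untested entries and counting a tie as 0 are assumptions.

```python
def combine(reports, rule):
    """Combine 0/1 reports for one pair; None (not tested) is ignored.

    Returns None when no source tested the pair.
    """
    votes = [r for r in reports if r is not None]
    if not votes:
        return None
    if rule == "union":
        return int(any(votes))
    if rule == "intersection":
        return int(all(votes))
    if rule == "majority":
        # Assumption: a tie counts as 0; the slide does not say how ties were handled.
        return int(sum(votes) * 2 > len(votes))
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical reports for one pair from three experiments (one did not test it).
example = [1, 0, None]
print({rule: combine(example, rule) for rule in ("union", "intersection", "majority")})
```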

Page 20:

Data integration: RNA polymerase II

Data set                  Subunit pairs covered   Fraction true [%]
Far western               15                      53
Cross-linking             20                      65
Far western               30                      77
Pull-down                 35                      57
Pull-down                 35                      66
Pull-down                 9                       44
Far western               6                       50

Combined (Naive Bayes)    45                      80
Union                     45                      60
Intersection              45                      76
Majority                  45                      73
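For the four combined rows, the "Fraction true" values can be checked from the 45-value rows shown on the earlier slides, if "fraction true" is read as the percentage of subunit pairs whose combined call agrees with the structural contact row. That reading is an assumption; the Python sketch below transcribes the rows (grouped by Subunit A) and computes the agreement. The helper names are illustrative.

```python
def row(*groups):
    """Build a 45-value row from strings grouped by Subunit A (9+8+7+6+5+4+3+2+1)."""
    bits = [int(ch) for ch in "".join(groups)]
    assert len(bits) == 45, "each row should cover all 45 subunit pairs"
    return bits


ZEROS = ("00000", "0000", "000", "00", "0")  # Subunit A = 6, 8, 9, 10, 11

# Rows transcribed from the slides.
STRUCTURE = row("101111010", "10001101", "0000111", "000000", *ZEROS)
COMBINED = {
    "Combined (Naive Bayes)": row("011110001", "11000101", "0000010", "000000", *ZEROS),
    "Union": row("111110111", "11110111", "1110110", "110100", "10000", "0100", "000", "00", "0"),
    "Intersection": row("111010001", "11000101", "1000000", "000000", *ZEROS),
    "Majority": row("111110101", "11010101", "1000010", "010000", *ZEROS),
}


def percent_agreement(calls, gold):
    """Percentage of pairs where the call equals the gold-standard value."""
    same = sum(c == g for c, g in zip(calls, gold))
    return round(100 * same / len(gold))


for name, calls in COMBINED.items():
    print(name, percent_agreement(calls, STRUCTURE))
```

The individual experiments cannot be checked the same way from this transcript, because their rows list only the covered pairs without column positions.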

Page 21:

Comparison of interaction data sets

[Table columns: Data set; Method]

Page 22:

Comparison of experimental data with gold standards

Positives: 8250 interactions in MIPS complexes

Negatives: ~2.7 M pairs in diff. subcellular compartments

[Diagram: the set of experimental “interactions” overlaps the positives (TP) and the negatives (FP)]
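Counting TP and FP for a set of experimental "interactions" comes down to intersecting it with the positive and the negative gold-standard sets. A minimal Python sketch with made-up protein names follows; `pair` and `score_against_gold` are illustrative helpers, not part of any published tool.

```python
def pair(a, b):
    """Order-independent key for a protein pair."""
    return frozenset((a, b))


def score_against_gold(predicted, positives, negatives):
    """Count predicted pairs that fall in the positive (TP) and negative (FP) gold sets."""
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    return tp, fp


# Hypothetical protein names, for illustration only.
positives = {pair("P1", "P2"), pair("P2", "P3")}
negatives = {pair("P1", "P9"), pair("P4", "P9")}
predicted = {pair("P2", "P1"), pair("P1", "P9"), pair("P5", "P6")}

tp, fp = score_against_gold(predicted, positives, negatives)
print(tp, fp, tp / fp if fp else float("inf"))
```

Pairs that are in neither gold set, such as the third predicted pair here, count toward neither TP nor FP.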

Page 23:

Combining experimental data

[Figure: overlap of the Gavin, Uetz and Ho data sets, each region labeled TP / FP]

90/556711/135

1357/6226

6/6

353/21218/6

15/1

Jansen et al. JSFG 2002

Page 24:

Integrating Structural Complexes with Non-interaction Genomic Information: Using them to Interpret Gene Expression data

Page 25:

Format of Gene Expression Data

Columns: conditions (e.g. cancers) or timepoints, numbered 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ….., with labels A B A A A B B B A B B B B B A …..

Rows: genes (MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ….)

[Figure: gene-by-gene matrix with both axes labeled MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1]

Page 26:

Expression Correlations Segment Replication Complex into Component Parts

[Figure: gene-by-gene matrix with both axes labeled MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1. Block labels: MCMs; Polym. & prots.; ORC]

Page 27:

Range of Expression Correlations within Complexes

Replication Cplx: Overall .05; ORC .19; MCMs .75; Pol. .45, .75

Ribosome: Overall .80; Large .80; Small .81

Proteasome: Overall .43; 20S .50; 19S .51
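One plausible way to obtain a single correlation figure for a complex is to average the Pearson correlation of expression profiles over all pairs of its subunits. The slide does not state the exact procedure, so the Python sketch below is an assumption, and the gene names and values are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise_correlation(profiles):
    """profiles: dict mapping gene name to its expression values; needs at least 2 genes."""
    values = [
        pearson(profiles[g], profiles[h])
        for g, h in combinations(sorted(profiles), 2)
    ]
    return sum(values) / len(values)


# Hypothetical expression profiles over four timepoints.
toy_complex = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(round(mean_pairwise_correlation(toy_complex), 3))
```

Applying the same function to a subset of genes (for example only the subunits of one subcomplex) gives a per-component value of the kind listed above.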

Page 28:

Protein-Protein Interactions & Expression

[Figure labels: between selected expression timecourses; Cell Cycle CDC28 expt. (Davis); sets of interactions: (all pairs, control), (from MIPS), (Uetz et al.), pairwise interactions; (strong interactions in permanent complexes, clearly diff.)]

Page 29:

Permanent v. Transient Complexes

Jansen et al., Genome Research, 2002

Page 30:

Genome-wide prediction of protein complexes based on both high-throughput interaction data and non-interaction, genomic information

Page 31:

Global Network of 3 Different Types of Relationships

~313K significant relationships from ~18M possible

Page 32:

Global Network of 3 Different Types of Relationships

Simultaneous 188K; Inverted 63K; Shifted 67K

~313K significant relationships from ~18M possible
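To make the three relationship types concrete, the Python sketch below scores a pair of timecourses with plain Pearson correlation: as given (simultaneous), with the sign flipped (inverted), and after sliding one profile by a fixed lag (shifted). This is a simplified stand-in chosen for illustration, not the clustering method behind the counts above; the `lag` parameter and the toy profiles are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relationship_scores(x, y, lag=1):
    """Score one pair of timecourses for each relationship type (lag must be >= 1)."""
    r = pearson(x, y)
    return {
        "simultaneous": r,
        "inverted": -r,                           # high when y mirrors x
        "shifted": pearson(x[:-lag], y[lag:]),    # y follows x by `lag` timepoints
    }


# Hypothetical profiles: y repeats x one timepoint later.
x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
y = [-1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
scores = relationship_scores(x, y, lag=1)
print({name: round(value, 3) for name, value in scores.items()})
```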

Page 33:

Globally, how well do expression relationships predict known interactions?

CC: 313K relationships from ~18M possible from clustering cell-cycle expt.

Coverage of the 8250 Known Interactions in Complexes Found [MIPS], with Enrichment Compared to Randomized Expression Relationships:

Random: coverage ~2% (313K/18M), enrichment 1x

CC: coverage 42%, enrichment 24x

Page 34:

Combining Expression Data Sets Increases Coverage & Decreases Noise

KO: 278K relationships from clustering knock-out profiles [Rosetta]

Coverage of the 8250 Known Interactions in Complexes Found [MIPS], with Enrichment Compared to Randomized Expression Relationships:

KO: coverage 34%, enrichment 22x

Page 35:

Combining Expression Data Sets Increases Coverage & Decreases Noise

CC: 313K relationships from ~18M possible from clustering cell-cycle expt.

KO: 278K relationships from clustering knock-out profiles [Rosetta]

Coverage of the 8250 Known Interactions in Complexes Found [MIPS], with Enrichment Compared to Randomized Expression Relationships:

CC: coverage 42%, enrichment 24x

KO: coverage 34%, enrichment 22x

KO v CC: coverage 55%, enrichment 111x

KO ^ CC: coverage 21%, enrichment 254x
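The CC and KO enrichment figures are consistent with reading enrichment as the coverage of known interactions divided by the fraction of all possible pairs that were selected (the random baseline, 313K/18M for CC). The Python sketch below works through that arithmetic; using ~18M possible pairs for KO as well is an assumption, since the slides state it only for CC.

```python
def enrichment(coverage, n_relationships, n_possible):
    """Coverage of known interactions divided by the coverage expected at random."""
    random_coverage = n_relationships / n_possible
    return coverage / random_coverage


N_POSSIBLE = 18_000_000  # "~18M possible"; assumed to apply to KO as well

cc = enrichment(0.42, 313_000, N_POSSIBLE)  # cell-cycle relationships
ko = enrichment(0.34, 278_000, N_POSSIBLE)  # knock-out relationships
print(round(cc), round(ko))
```

The union and intersection rows cannot be checked this way from the transcript, because the number of relationships in each combined set is not given.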

Page 36:

Computational Proteomics of Complexes

1. Interactions provide a systematic way of defining protein function on a genomic scale

2. Known complexes provide a benchmark to validate and integrate genome-wide interaction experiments, providing a more accurate interactome

3. Known complexes provide a focus for the integration of (non-interaction) genomic information – e.g. expression data

4. Extrapolating from known complexes, one can predict protein complexes on a genome-scale via integrating experimental interactions and non-interaction information (combining #2 and #3)

Page 37:

For the Future

• Developing an accurate interactome for the cell, from prediction and through integration of high-throughput information

• Development of statistical approaches to combine and integrate information

• Development of database technologies to store heterogeneous and noisy genome-wide interaction datasets

• A moderate number of structural complexes are very useful as gold standard data

Page 38:

Protein complexes & Structural Genomics

• A computational challenge following from the solution of the parts list
   Given many monomeric structures produced by structural genomics, predict (or rationalize) the interactome through docking

• Maybe many structures will only be solved as complexes….

Page 39:

Bottlenecks in analysis of all of TargetDB (Interologs)

Page 40:

Acknowledgements

J Qian, R Jansen, A Drawid, C Wilson, D Greenbaum, C Goh, N Lan, H Hegyi, R Das, S Douglas, B Stenger, J Lin, Y Kluger

Collaborators: M Snyder (A Kumar, H Zhu, …)

A Edwards, B Kus, J Greenblatt

NIH

GeneCensus.org