
Structural Systems Biology:

Modelling Protein Interactions and Complexes

Patrick Aloy

BWS – Feb ‘07

Proteins are social molecules

ccl1 YPR025C - cyclin
kin28 YDL108W - pkinase
cdc28 YBR160W - pkinase
cks1 YBR135W - CKS
cln1 YMR199W - cyclin
cln2 YPL256C - cyclin
clb2 YPR119W - cyclin
clb4 YLR210W - cyclin
clb3 YDL155W - cyclin
clb5 YPR120C - cyclin
cln3 YAL040C - cyclin
clb6 YGR109C - cyclin
clb1 YGR108W - cyclin
pho85 YPL031C - pkinase
pcl1 YNL289W - cyclin
pcl2 YDL127W - cyclin
pcl9 YDL179W - cyclin
pcl5 YHR071W - cyclin

cdc42 YLR229C - ras
ste20 YHL007C - PBD
gic2 YDR309C - PBD
cla4 YNL298W - PH
rdi1 YDL135C - Rho_GDI
gsp1 YLR293C - ras
yrb2 YIL063C - Ran_BP1
ras1 YOR101W - ras
sdc25 YLL016W - RasGEF
cdc25 YLR310C - RasGEF
ras2 YNL098C - ras
rho1 YPR165W - ras
sac7 YDR389W - RhoGAP
rho4 YKR055W - ras
ira2 YOL081W - RasGAP
mge1 YOR232W - GrpE
ssc1 YJR045C - HSP70
act1 YFL039C - actin
pfy1 YOR122C - profilin
spt15 YER148W - TBP
TFIIIB YGR246C - transcript_fac2
vam3 YOR106W - Syntaxin
sed5 YLR026C - Syntaxin
vps45 YGL095C - Sec1
sly1 YDR189W - Sec1
vps33 YLR396C - Sec1
tlg2 YOL018C - Syntaxin

Gavin*, Aloy* et al, Nature (2006).

A great tool to study complexes (TAP/MS)

[Figure: mass spectrum, relative intensity (%) versus m/z (1000 to 3000)]


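[Figure: TAP tagging by homologous recombination. A PCR product carrying the TAP tag and the Kluyveromyces lactis URA3 marker recombines with the chromosomal ORF, producing a TAP-fusion protein with the tag at the COOH terminus]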


Genome-wide analysis of the yeast proteome

ORFs processed: 6,466 (30% with clear human orthologues)

ORFs with positive homologous recombination: 5,474 (85%)

Selection of strains expressing TAP-fusion proteins: 3,206 (59%)

Successful TAP purifications: 1,993 (62%)

MALDI-TOF samples: 52,000

Protein IDs: 36,000 (2,760 non-redundant)
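The percentages in this summary are step-wise yields, each relative to the previous stage of the pipeline. A minimal check of that arithmetic, using only the counts listed above (the helper name `step_rates` is ours, not part of the screen's software):

```python
# Counts from the genome-wide TAP screen summary (Gavin*, Aloy* et al, Nature 2006).
STEPS = [
    ("ORFs processed", 6466),
    ("ORFs with positive homologous recombination", 5474),
    ("Strains expressing TAP-fusion proteins", 3206),
    ("Successful TAP purifications", 1993),
]


def step_rates(steps):
    """Return (stage, rounded % of the previous stage) for every stage after the first."""
    return [
        (name, round(100 * count / previous))
        for (name, count), (_, previous) in zip(steps[1:], steps[:-1])
    ]


for stage, pct in step_rates(STEPS):
    print(f"{stage}: {pct}%")

# Overall yield: successful purifications as a share of all processed ORFs.
print(f"Overall: {round(100 * STEPS[-1][1] / STEPS[0][1])}%")
```

The step-wise values come out as 85%, 59% and 62%, matching the slide; the overall yield (about 31% of processed ORFs giving a successful purification) is derived here rather than quoted on the slide.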

Extensive re-purification of complexes

64% of the known complexes were purified more than once

Reverse tagging is a means to validate new interactors

Screen ran to saturation

Reproducibility rate of 69% on 139 repeated purifications

Capturing complex dynamics

Can we use our complete screen for complexes in yeast to extract general biological principles?

And just for the record: purifications are NOT complexes


De novo definition of protein complexes

Affinity purification data

[Figure: a bait and its co-purified proteins V, W, X, Y and Z, interpreted with the spoke model (bait-prey pairs only) and the matrix model (all pairs of co-purified proteins)]

Pros: information on biological re-use

Cons: no direct interactions
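The two readings of a purification differ only in which pairs they infer. A minimal sketch using the standard spoke and matrix definitions (the function names are ours): with one bait and five preys the spoke model yields 5 pairs and the matrix model 15, which shows how quickly matrix-derived pairs outnumber direct bait-prey evidence.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: only bait-prey pairs are inferred."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model: every pair of co-purified proteins (bait included) is inferred."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


bait, preys = "Bait", ["V", "W", "X", "Y", "Z"]
print(len(spoke_pairs(bait, preys)), len(matrix_pairs(bait, preys)))
```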

Socio-affinity index

[Figure: purifications of tagged (TAG) baits among proteins A, B, C and D; the evidence for each protein pair is classified as spoke (S) or matrix (M) and combined into a low, medium or high score]

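$$A(i,j) = S_{i,j \mid i=\text{bait}} + S_{i,j \mid j=\text{bait}} + M_{i,j}$$

$$S_{i,j \mid i=\text{bait}} = \log\left(\frac{n_{i,j \mid i=\text{bait}}}{f_i^{\text{bait}}\, n^{\text{bait}}\, f_j^{\text{prey}}\, n^{\text{prey}}_{i=\text{bait}}}\right)$$

$$M_{i,j} = \log\left(\frac{n^{\text{prey}}_{i,j}}{f_i^{\text{prey}}\, f_j^{\text{prey}} \sum_{\text{all baits}} n^{\text{prey}}\left(n^{\text{prey}}-1\right)/2}\right)$$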
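A minimal sketch of the three terms of the index, taking the counts and frequencies as inputs. Assumptions, not stated on the slides: natural logarithms; pairs never observed contribute 0 rather than log(0); and S_{i,j|j=bait} is obtained by calling the same spoke function with the roles of i and j swapped. The function names are hypothetical.

```python
import math


def _log_ratio(observed, expected):
    # Assumption: unobserved pairs (or empty expectations) contribute 0 instead of log(0).
    if observed <= 0 or expected <= 0:
        return 0.0
    return math.log(observed / expected)


def spoke_term(n_ij_bait_i, f_i_bait, n_bait, f_j_prey, n_prey_i_bait):
    """S_{i,j|i=bait}: observed vs expected retrievals of prey j with bait i."""
    expected = f_i_bait * n_bait * f_j_prey * n_prey_i_bait
    return _log_ratio(n_ij_bait_i, expected)


def matrix_term(n_ij_prey, f_i_prey, f_j_prey, preys_per_purification):
    """M_{i,j}: observed vs expected co-purification of i and j as preys."""
    possible_pairs = sum(n * (n - 1) / 2 for n in preys_per_purification)
    expected = f_i_prey * f_j_prey * possible_pairs
    return _log_ratio(n_ij_prey, expected)


def socio_affinity(s_i_bait, s_j_bait, m_ij):
    """A(i, j) as the sum of both spoke terms and the matrix term."""
    return s_i_bait + s_j_bait + m_ij


# Hypothetical counts and frequencies, for illustration only.
s_ij = spoke_term(2, 0.1, 10, 0.05, 4)
m_ij = matrix_term(3, 0.1, 0.2, [3, 4])
print(socio_affinity(s_ij, 0.0, m_ij))
```

A frequent prey (large f_j^prey), such as a "sticky" protein, raises the expected count in the denominator and therefore lowers its score, which is how the index down-weights promiscuous co-purification.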

Biophysical meaning of socio-affinities: real affinity?

[Figure: left, interaction affinity (log scale, 0.0001 to 1) versus socio-affinity interaction score (0 to 20) for full-length proteins and domains; right, percentage of interactions per log(Kd) bin (-10 to -3) for all interactions, affinity purifications (AP) and yeast two-hybrid (Y2H)]

P < 0.08

APs cover a broad range of Kds

Biophysical meaning of socio-affinities: physical proximity?

[Figure: percentage of protein pairs in physical contact, according to PDB structures and yeast two-hybrid (Y2H) data, per socio-affinity interaction-score bin (<5, 5-6, ..., 14-15, >15)]

Socio-affinity is very good at removing “sticky” proteins (e.g. Vma2 is present in 552 purifications but scores well only with Vma5, Vma10, Vma6 and Rav1)


• Socio-affinities capture the tendency of two proteins to be together under different conditions and thus can be used to define complexes

• It is known that proteins can belong to multiple complexes

• We need an iterative clustering procedure to disentangle the biological redundancy and versatility of protein complex composition

De novo definition of protein complexes

Clustering strategy

[Figure: iterative clustering. At each iteration the score matrix is turned into a dendrogram, which is cut at a threshold to define complexes; a penalty of -2 is applied between iterations. Dendrogram leaf orders: A B C D E F G H I (first iteration) and A B C E D F G H I (second iteration)]

Score matrix, first iteration:

     A   B   C   D   E   F   G   H   I
A    -  10   9   6   5   0   0   0   0
B    -   -  11   6   5   0   0   0   0
C    -   -   -   6   5   0   0   0   0
D    -   -   -   -   0   0   0   0   0
E    -   -   -   -   -   0   0   0   0
F    -   -   -   -   -   -  10   6   4
G    -   -   -   -   -   -   -   4   6
H    -   -   -   -   -   -   -   -  10
I    -   -   -   -   -   -   -   -   -

Score matrix, second iteration (after the -2 penalty):

     A   B   C   D   E   F   G   H   I
A    -   8   7   4   5   0   0   0   0
B    -   -   9   4   5   0   0   0   0
C    -   -   -   4   5   0   0   0   0
D    -   -   -   -   0   0   0   0   0
E    -   -   -   -   -   0   0   0   0
F    -   -   -   -   -   -   8   4   2
G    -   -   -   -   -   -   -   2   4
H    -   -   -   -   -   -   -   -  10
I    -   -   -   -   -   -   -   -   -
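The iterative procedure can be sketched as follows, under loudly stated assumptions: a single-linkage cut at a fixed threshold stands in for the dendrogram cut, and the penalty is applied to every scored pair inside each cluster found in an iteration. The function names, the threshold of 6 and this penalty rule are hypothetical, so the output (a larger group A-D followed by a smaller core A, B, C) illustrates the mechanism rather than reproducing the slide's second matrix.

```python
from itertools import combinations


def single_linkage_clusters(proteins, scores, threshold):
    """Connected components of the graph whose edges score >= threshold."""
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)
    return [frozenset(g) for g in groups.values() if len(g) > 1]


def iterative_complexes(proteins, scores, threshold, penalty, iterations):
    """Cluster, penalise scored pairs inside each cluster, repeat; collect distinct clusters."""
    scores = dict(scores)  # work on a copy so the caller's scores are untouched
    found = []
    for _ in range(iterations):
        for cluster in single_linkage_clusters(proteins, scores, threshold):
            if cluster not in found:
                found.append(cluster)
            for pair in combinations(sorted(cluster), 2):
                if pair in scores:
                    scores[pair] += penalty
    return found


# Non-zero scores of the first example matrix (keys are alphabetically sorted pairs).
SCORES = {("A", "B"): 10, ("A", "C"): 9, ("A", "D"): 6, ("A", "E"): 5,
          ("B", "C"): 11, ("B", "D"): 6, ("B", "E"): 5, ("C", "D"): 6,
          ("C", "E"): 5, ("F", "G"): 10, ("F", "H"): 6, ("F", "I"): 4,
          ("G", "H"): 4, ("G", "I"): 6, ("H", "I"): 10}

for complex_ in iterative_complexes("ABCDEFGHI", SCORES, threshold=6, penalty=-2, iterations=2):
    print(sorted(complex_))
```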

Exploring the parameter space

• We explored a sensible range of clustering parameters (number of iterations, penalty values, etc) and generated 1,784 potential sets of protein complexes with varying degrees of stringency

• We compared each set in terms of accuracy and coverage to a hand-curated set of protein complexes (Aloy et al. Science, 2004)

• The best set consisted of 491 complexes with a coverage of 83% and an accuracy of 78%

• Known complexes and/or functional variations were in sets with slightly poorer accuracy and coverage

• We picked all the sets with values of accuracy and coverage above 70% and clustered the similar complexes
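The slides do not define how accuracy and coverage were computed against the hand-curated reference. A minimal sketch, assuming both are measured on co-complex protein pairs (accuracy as the fraction of predicted pairs found in the reference, coverage as the fraction of reference pairs recovered) and applying the 70% cut-off from the last bullet; all names are hypothetical.

```python
from itertools import combinations


def co_complex_pairs(complexes):
    """All unordered protein pairs that share at least one complex."""
    pairs = set()
    for complex_ in complexes:
        pairs.update(frozenset(p) for p in combinations(sorted(complex_), 2))
    return pairs


def accuracy_coverage(predicted, reference):
    """Pair-level accuracy (precision) and coverage (recall): an assumed definition."""
    pred, ref = co_complex_pairs(predicted), co_complex_pairs(reference)
    shared = len(pred & ref)
    accuracy = shared / len(pred) if pred else 0.0
    coverage = shared / len(ref) if ref else 0.0
    return accuracy, coverage


def select_sets(candidate_sets, reference, cutoff=0.70):
    """Names of candidate sets whose accuracy and coverage are both above the cut-off."""
    selected = []
    for name, complexes in candidate_sets.items():
        accuracy, coverage = accuracy_coverage(complexes, reference)
        if accuracy > cutoff and coverage > cutoff:
            selected.append(name)
    return selected


reference = [{"A", "B", "C"}]
candidates = {"set1": [{"A", "B", "C"}], "set2": [{"A", "B"}], "set3": [{"A", "B", "C", "D"}]}
print(select_sets(candidates, reference))
```

Here the too-small set loses coverage and the too-large set loses accuracy, so only the exact match passes.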

Definitive set of protein complexes

• We ended up with 5,488 slightly different variations (isoforms) of 491 complexes

• The procedure increased the coverage to 90%

• We retrieved 61% of the 279 previously known complexes (MIPS + literature mining) and identified, on average, 80% of their components

• 257 out of the 491 complexes are entirely novel

• We found no novel components for only 20 of the 279 complexes in our gold-standard set


Modular organisation of protein complexes

• Core average size: 3.1 proteins (range 1-23)

• Module average size: 2.9 proteins (range 2-9)

• Each module is associated with 3.3 cores on average
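One way to make the core/module vocabulary concrete is to derive it from the isoforms of a single complex. This is a hypothetical sketch, not the published definition: a core protein occurs in at least `core_fraction` of the isoforms (all of them by default), and non-core proteins that always occur together form a module, consistent with the minimum module size of 2 quoted above; non-core singletons are reported as attachments.

```python
from collections import defaultdict


def core_and_modules(isoforms, core_fraction=1.0):
    """Split a complex's isoforms into core, modules and single attachments (assumed rules)."""
    n = len(isoforms)
    occurrences = defaultdict(set)  # protein -> indices of the isoforms containing it
    for idx, isoform in enumerate(isoforms):
        for protein in isoform:
            occurrences[protein].add(idx)
    core = {p for p, idxs in occurrences.items() if len(idxs) / n >= core_fraction}
    groups = defaultdict(set)  # isoform signature -> non-core proteins sharing it
    for protein, idxs in occurrences.items():
        if protein not in core:
            groups[frozenset(idxs)].add(protein)
    modules = [frozenset(g) for g in groups.values() if len(g) >= 2]
    attachments = {p for g in groups.values() if len(g) == 1 for p in g}
    return core, modules, attachments


isoforms = [{"A", "B", "C", "D", "E"}, {"A", "B", "C", "D", "E", "F"}, {"A", "B", "C"}]
core, modules, attachments = core_and_modules(isoforms)
print(sorted(core), [sorted(m) for m in modules], sorted(attachments))
```

In this toy example A, B and C form the core, D and E travel together as a module, and F is a lone attachment.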

Evidence supporting the modular organisation

Functional requirements (RNA processing and degradation)

Modularity and cross-talk between functions & compartments

[Figure: functional categories of modules versus cores (cell cycle, cell fate, cell transport, defense, energy, environment, metabolism, protein fate, protein synthesis, transcription, mRNA processing, signaling, unknown)]


• Protein networks may provide a molecular framework for interpreting “simple” genetic traits such as essentiality (only ~20% of yeast genes are essential)

• Recent phenotypic screens have moved beyond essentiality in a single growth condition

• They aim to provide phenotypic profiles for each gene

Rationalising phenotypes through complex architecture

[Figure: number of complexes per phenotypic similarity score class (0, ≤50, >50) for complex cores compared with random groups of proteins]

Hierarchical, dynamical and modular organisation of protein complexes

Gavin*, Aloy*, et al. (2006) Nature
Bravo & Aloy (2006) Curr Opin Struct Biol

• 491 complexes (257 novel) with over 5000 isoforms
• 147 functional (??) modules

But where are the details?

ras1 YOR101W - ras
sdc25 YLL016W - RasGEF
cdc25 YLR310C - RasGEF
ras2 YNL098C - ras
ira2 YOL081W - RasGAP


Can we use 3D structures to understand the interaction space?

[Figure: structure of a ras-domain protein in complex with a RhoGAP]

1. Interface
2. Specificity

Do homologous proteins interact in the same way?

Aloy et al. (2003) J Mol Biol

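[Figure: interacting pairs A-B, A'-B' and A''-B'' formed by homologous proteins; interface RMSD (iRMSD, axis up to 10 Å) versus % sequence identity, split into low, medium and high identity, with the 80th percentile curve]

Chothia & Lesk, EMBO J. 1986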

Aloy et al. (2003) J Mol Biol
Aloy et al. (2005) Curr Opin Struct Biol

iRMSD vs PID

[Figure: iRMSD versus percentage sequence identity (PID), with 80th and 90th percentile curves]

http://www.russell.embl.de/simint
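iRMSD compares the relative placement of two interacting partners in two complexes. A minimal sketch of the underlying calculation, assuming the equivalent interface atoms (for example Cα atoms of aligned interface residues) have already been paired up: superpose them optimally with the Kabsch algorithm and report the RMSD. The function name and the residue selection are assumptions; the slides do not specify them.

```python
import numpy as np


def superposed_rmsd(coords_a, coords_b):
    """RMSD between two equally ordered (N, 3) coordinate sets after optimal superposition.

    Kabsch algorithm. For an iRMSD, the inputs are assumed to be the paired
    interface atoms of the two complexes (the selection is not shown here).
    """
    P = np.asarray(coords_a, dtype=float)
    Q = np.asarray(coords_b, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of paired coordinates")
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # d = -1 flags a reflection, corrected below
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q  # rotate P onto Q and compare
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# A rigid rotation (90 degrees about z) plus translation should give an RMSD of ~0.
P = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
Q = [[5, 5, 5], [5, 6, 5], [4, 5, 5], [5, 5, 6]]
print(superposed_rmsd(P, Q))
```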

[Figure: example interaction geometries: ferredoxin-like domains (Dom1-Dom2 in Asp transcarbamylase and Thr deaminase) and SH2-SH3 domain pairs (lck and abl)]


[Figure: complexes of CDKs with cyclins, CKSs, p25 and p18, grouped into interaction types 1 to 4]

Interaction type (equivalent to the concept of fold)

Aloy & Russell (2004) Nature Biotechnol

Structural data

Interaction data

Functional data

Genomic data

$$N_{\text{Civilizations}} = N_{*} \times f_p \times n_e \times f_l \times f_i \times f_c \times f_L$$


Is Nature restricted to a few interaction types?

… emulating Cyrus again (Chothia, 1992)

Is the number of interaction types limited?

$$N_{\text{Types}} = N_{\text{Ints}} \times C \times r_{\text{FN}}^{-1} \times r_{\text{FP}} \times E_{\text{All-species}}$$

10,000 interaction types

EU Sixth Framework IP (~14 Million €)

Growth in the number of interaction types

[Figure: total available interactions (0 to 9,000), interaction types and new interaction types (0 to 400) per year, 1981 to 2003]


Can we use 3D structures to understand the interaction space?

[Figure: structure of a ras-domain protein in complex with a RhoGAP]

1. Interface
2. Specificity

[Figure: interactions between members of protein family A and members of family B versus non-family-B proteins]

What about specificity?

[Figure: residue contacts across an interface of known structure (Asp, Arg, Phe), scored with interface pair potentials for side-chain to side-chain and side-chain to main-chain contacts]

InterPreTS: Interaction Prediction through Tertiary Structure

Aloy & Russell, PNAS, 99, 5896, 2002.
Aloy & Russell, Bioinformatics, 19, 161, 2003.

Alignments:

1tx4A       PIVLRETVAYLQA-------HALTTE ...
YFE7_YEAST  PLIISSIFSYMDKIYPDLPNDKVR-T ...

1tx4B       KLVIVGDGACGKTCLLIVNSKDQF-- ...
RHO4_YEAST  KIVVVGDGAVGKTCLLISYVQGTFPT ...

Score

Significance (Do RHO4 & YFE7 interact?)
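As we understand it, InterPreTS scores a putative interaction by placing the query sequences onto a template complex of known structure (here 1tx4) through the alignments, evaluating the template's interface contacts with empirical pair potentials, and judging significance against randomised sequences. The sketch below mirrors that outline only: the toy potential values, the column-indexed contact list and the shuffling scheme are hypothetical stand-ins, not the published potentials or protocol.

```python
import random
import statistics

# Hypothetical toy potential: higher = more favourable contact (not InterPreTS values).
TOY_POTENTIAL = {
    ("D", "R"): 1.0,   # Asp-Arg: opposite charges
    ("F", "F"): 0.8,   # Phe-Phe: aromatic contact
    ("D", "D"): -0.6,  # Asp-Asp: like charges
}


def pair_score(a, b, potential):
    return potential.get(tuple(sorted((a, b))), 0.0)


def interface_score(seq_a, seq_b, contacts, potential):
    """Sum pair scores over template contacts (i, j), read from the aligned query residues."""
    total = 0.0
    for i, j in contacts:
        a, b = seq_a[i], seq_b[j]
        if a != "-" and b != "-":  # gaps have no residue to score
            total += pair_score(a, b, potential)
    return total


def interface_z_score(seq_a, seq_b, contacts, potential, n_shuffles=1000, seed=0):
    """Z-score of the observed score against scores of shuffled query sequences."""
    rng = random.Random(seed)
    observed = interface_score(seq_a, seq_b, contacts, potential)
    background = []
    for _ in range(n_shuffles):
        a, b = list(seq_a), list(seq_b)
        rng.shuffle(a)
        rng.shuffle(b)
        background.append(interface_score(a, b, contacts, potential))
    sd = statistics.pstdev(background)
    return 0.0 if sd == 0 else (observed - statistics.mean(background)) / sd


contacts = [(0, 0), (1, 1)]  # hypothetical template contacts, as alignment columns
print(interface_z_score("DFKD", "RFAD", contacts, TOY_POTENTIAL, n_shuffles=200))
```

Here a positive Z-score means the aligned query residues fit the template interface better than shuffled sequences do; the sign convention is our choice.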

[Figure: FGF-receptor example; β-trefoil family members FGF, FHF, IL-1 and ricin]


Ras binding domains

Blind test on 33 potential binders

22/27 (81%) correct predictions

[Figure: number of cases by Z-score (-2 to 5) for Ras-binding proteins (RBPs) tested against Ras, classed as bind, unclear or don't bind]

Structure-based protein-protein interaction network for yeast

Aloy & Russell (2005) FEBS Lett. (Systems Biology issue)

Putting structure into pathways

Aloy & Russell, Nat. Rev. Mol. Cell. Biol. 2006

Interactions of known structure

Interaction Discovery (‘Omics)

Cell Biology (EM)

We can predict interactions, good for us… and now what?

Complex structure prediction

[Figure: modelling a five-component complex by combining an X-ray structure, homology models, a two-hybrid network and electron microscopy]

Russell et al, Curr. Opin. Struct. Biol. 2004
Aloy et al, Curr. Opin. Struct. Biol. 2005


Structure-based assembly of protein complexes from binary interactions

Aloy et al. (2004) Science

Modelling complexes from binary interactions

[Figure: proteins A to F of the same complex; each interacting pair is modelled on a known structure of homologous proteins]
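At the level of topology, assembling a complex from binary interactions amounts to chaining pairwise models that share a subunit. A hypothetical sketch, assuming we already know which pairs have a homologous interaction of known structure; it ignores the geometric superposition and clash checks a real pipeline needs and only reports an assembly order plus the components that cannot be placed.

```python
from collections import deque


def assembly_order(components, modellable_pairs):
    """Breadth-first chaining of binary models that share a subunit.

    modellable_pairs: 2-element frozensets {X, Y} for which a homologous
    interaction of known structure is assumed to exist.
    Returns ([(anchor, newly_placed), ...], set_of_unplaced_components).
    """
    partners = {p: set() for p in components}
    for pair in modellable_pairs:
        a, b = tuple(pair)
        if a in partners and b in partners:
            partners[a].add(b)
            partners[b].add(a)
    # Seed with the most connected component (ties broken alphabetically).
    seed = min(components, key=lambda p: (-len(partners[p]), p))
    placed, steps, queue = {seed}, [], deque([seed])
    while queue:
        anchor = queue.popleft()
        for partner in sorted(partners[anchor]):
            if partner not in placed:
                placed.add(partner)
                steps.append((anchor, partner))
                queue.append(partner)
    return steps, set(components) - placed


pairs = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]}
steps, unplaced = assembly_order(list("ABCDEF"), pairs)
print(steps)
print(sorted(unplaced))  # E and F form a separate, unconnected sub-assembly
```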

3D-Repertoire pipeline

1,739 genes
589 multi-protein assemblies
232 complexes
126 purifications
102 manually annotated complexes
634 proteins
EM quality 6-9

Structural overview (102 hand-annotated complexes)

[Figure: pie chart of structural coverage in five categories (nearly complete; most individual components and few interactions; most individual components; some individual components; no structural information), with segment counts 42, 12, 20, 25 and 3]

Aloy et al, Science. 2004


Target: fumarate reductase, W. succinogenes (1qla)

Templates sharing less than 40% sequence identity (<25%, <28% and <27% id):
- Respiratory fumarate reductase, S. putrefaciens (1d4d)
- Adenylylsulfate reductase, A. fulgidus (1jnr)
- Succinate dehydrogenase, E. coli (1nek)

Models are filtered by:
- Quality of the target/template superposition
- Geometrical clashes (bumps, interactions made)
- Quality of contacts (InterPreTS)

In this case:
- 4/7 domains could be modelled
- Distance to the original complex: 8.1 Å
- Good InterPreTS scores

Matthieu Pichaud (EMBL-HD)
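The geometric filter described for these models (bumps and interactions made) can be illustrated with a simple distance check between two placed subunits. A minimal sketch; the 3.0 Å clash and 5.0 Å contact cut-offs and the pass criteria are hypothetical, and a real filter would work on full atomic models and add the InterPreTS contact score.

```python
import math


def bumps_and_contacts(chain_a, chain_b, clash_cutoff=3.0, contact_cutoff=5.0):
    """Count inter-chain atom pairs that clash (< clash_cutoff) or touch (< contact_cutoff), in Å."""
    bumps = contacts = 0
    for atom_a in chain_a:
        for atom_b in chain_b:
            d = math.dist(atom_a, atom_b)
            if d < clash_cutoff:
                bumps += 1
            elif d < contact_cutoff:
                contacts += 1
    return bumps, contacts


def passes_geometry_filter(chain_a, chain_b, max_bumps=0, min_contacts=1):
    """Keep a model only if it has few clashes and still makes interface contacts."""
    bumps, contacts = bumps_and_contacts(chain_a, chain_b)
    return bumps <= max_bumps and contacts >= min_contacts


a = [(0.0, 0.0, 0.0)]
print(bumps_and_contacts(a, [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]))  # (1, 1)
```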

Structure-based assembly of protein complexes … and networks

[Figure: network of proteins A to K with cross-talk between complexes. Legend: complex from affinity purification; complex from literature, etc.; interaction from two-hybrid; interaction predicted by structure; sequence similarity; similarity inferred through structure]

Bridge the gap between abstract networks and real cells

Aloy & Russell, Nat. Rev. Mol. Cell. Biol. 2006


[Figure: levels of description: protein interaction network, sub-network/pathway and interaction interface, alongside whole-cell tomogram, macromolecular complex and binary interaction]

Building the cell from pieces

Understanding cell networks at the atomic level

Acknowledgements

Rob Russell

Anne-Claude Gavin

Structural Bioinformatics @ IRB

Andreas Zanzoni
Amelie Stein
Sasha Panjkovich
Roland Pache