Structural Systems Biology:
Modelling Protein Interactions and Complexes
Patrick Aloy
BWS – Feb ‘07
Proteins are social molecules
ccl1 YPR025C - cyclin
kin28 YDL108W - pkinase
cdc28 YBR160W - pkinase
cks1 YBR135W - CKS
cln1 YMR199W - cyclin
cln2 YPL256C - cyclin
clb2 YPR119W - cyclin
clb4 YLR210W - cyclin
clb3 YDL155W - cyclin
clb5 YPR120C - cyclin
cln3 YAL040C - cyclin
clb6 YGR109C - cyclin
clb1 YGR108W - cyclin
pho85 YPL031C - pkinase
pcl1 YNL289W - cyclin
pcl2 YDL127W - cyclin
pcl1 YDL179W - cyclin
pcl5 YHR071W - cyclin

cdc42 YLR229C - ras
ste20 YHL007C - PBD
gic2 YDR309C - PBD
cla4 YNL298W - PH
rdi1 YDL135C - Rho_GDI
gsp1 YLR293C - ras
yrb2 YIL063C - Ran_BP1
ras1 YOR101W - ras
sdc25 YLL016W - RasGEF
cdc25 YLR310C - RasGEF
ras2 YNL098C - ras
rho1 YPR165W - ras
sac7 YDR389W - RhoGAP
rho4 YKR055W - ras
ira2 YOL081W - RasGAP

mge1 YOR232W - GrpE
ssc1 YJR045C - HSP70
act1 YFL039C - actin
pfy1 YOR122C - profilin
spt15 YER148W - TBP
TFIIIB YGR246C - transcript_fac2

vam3 YOR106W - Syntaxin
sed5 YLR026C - Syntaxin
vps45 YGL095C - Sec1
sly1 YDR189W - Sec1
vps33 YLR396C - Sec1
tlg2 YOL018C - Syntaxin
Gavin*, Aloy* et al, Nature (2006).
A great tool to study complexes (TAP/MS)
[Figure: mass spectrum, relative intensity (%) vs. m/z (1000-3000)]
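[Figure: TAP tagging. A PCR product carrying the TAP tag and the Kluyveromyces lactis URA3 marker is integrated at the chromosomal ORF by homologous recombination, producing a TAP-fusion protein (NH2-protein-TAP-COOH)]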
Genome-wide analysis of the yeast proteome
ORFs processed: 6,466 (30% with clear human orthologues)
ORFs with positive homologous recombination: 5,474 (85%)
Selection of strains expressing TAP-fusion proteins: 3,206 (59%)
Successful TAP purifications: 1,993 (62%)
MALDI-TOF samples: 52,000; protein IDs: 36,000 (2,760 non-redundant)
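The percentages in the pipeline are step-to-step yields, each relative to the previous stage rather than to the 6,466 starting ORFs. A minimal sketch that recomputes them from the counts listed above (the helper name `step_yields` is hypothetical):

```python
# Counts taken from the genome-wide TAP screen summary above.
STEPS = [
    ("ORFs processed", 6466),
    ("positive homologous recombination", 5474),
    ("strains expressing TAP-fusion proteins", 3206),
    ("successful TAP purifications", 1993),
]


def step_yields(steps):
    """Fraction retained at each stage relative to the previous stage."""
    return [(name, n / prev_n) for (_, prev_n), (name, n) in zip(steps, steps[1:])]


for name, fraction in step_yields(STEPS):
    print(f"{name}: {fraction:.0%}")
```

Multiplying the three yields gives the overall success rate from ORF to purification (1,993 / 6,466, about 31%).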
Extensive re-purification of complexes
64% of the known complexes were purified more than once
Reverse tagging is a means to validate new interactors
Screen ran to saturation
Reproducibility rate of 69% on 139 repeated purifications
Capturing complex dynamics
Can we use our complete screen for complexes in yeast to extract general biological principles?
And just for the record: purifications are NOT complexes
De novo definition of protein complexes
[Figure: a bait and its co-purified preys (V, W, X, Y, Z), interpreted with the spoke model (bait-prey pairs only) or the matrix model (all pairs)]
Affinity purification data
Pros: information on biological re-use
Cons: no direct interactions
Socio-affinity index
[Figure: toy example of four tagged purifications (TAG) of proteins A-D; each pair (A-B, A-C, A-D, B-C, B-D, C-D) collects spoke (S) and matrix (M) evidence, giving a low, medium or high score]
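Pair evidence (spoke, matrix)
$$A(i,j) = S_{i,j}\big|_{i=\text{bait}} + S_{i,j}\big|_{j=\text{bait}} + M_{i,j}$$
$$S_{i,j}\big|_{i=\text{bait}} = \log\left(\frac{n_{i,j}\big|_{i=\text{bait}}}{f_i^{\text{bait}}\, n^{\text{bait}}\, f_j^{\text{prey}}\, n^{\text{prey}}_{i=\text{bait}}}\right)$$
$$M_{i,j} = \log\left(\frac{n_{i,j}^{\text{prey}}}{f_i^{\text{prey}}\, f_j^{\text{prey}} \sum_{\text{all baits}} n^{\text{prey}}\left(n^{\text{prey}}-1\right)/2}\right)$$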
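The socio-affinity index compares how often two proteins are seen together (as bait and prey, or as two preys) with how often they would co-occur by chance given their overall frequencies. The sketch below evaluates the three terms of A(i, j) on toy data. The slide does not define every symbol, so these readings are assumptions: n^bait is the number of purifications, f_i^bait the fraction of purifications with i as bait, f_j^prey the fraction of all retrieved preys that are j, n^prey_{i=bait} the mean number of preys per purification with bait i, "log" is the natural log, and unobserved terms contribute 0. The function name `socio_affinity` and the data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def socio_affinity(purifications):
    """A(i, j) = S(i,j | i=bait) + S(i,j | j=bait) + M(i,j).

    `purifications` is a list of (bait, set_of_preys); the bait is not listed
    among its own preys. Symbol interpretations are assumptions (see text).
    """
    n_bait = len(purifications)                      # number of purifications
    bait_counts = Counter(b for b, _ in purifications)
    prey_counts = Counter(p for _, preys in purifications for p in preys)
    total_preys = sum(prey_counts.values())
    prey_slots = Counter()                           # preys retrieved per bait
    spoke = Counter()                                # (bait, prey) observations
    matrix = Counter()                               # sorted (prey, prey) co-occurrences
    total_pairs = 0                                  # sum of n_prey * (n_prey - 1) / 2
    for bait, preys in purifications:
        prey_slots[bait] += len(preys)
        for prey in preys:
            spoke[(bait, prey)] += 1
        total_pairs += len(preys) * (len(preys) - 1) // 2
        for pair in combinations(sorted(preys), 2):
            matrix[pair] += 1

    def f_prey(x):
        return prey_counts[x] / total_preys

    def spoke_term(i, j):
        n = spoke[(i, j)]
        if n == 0:                                   # unobserved: contributes 0 (assumption)
            return 0.0
        f_bait = bait_counts[i] / n_bait
        mean_preys = prey_slots[i] / bait_counts[i]  # assumed meaning of n_prey|i=bait
        return math.log(n / (f_bait * n_bait * f_prey(j) * mean_preys))

    def matrix_term(i, j):
        n = matrix[(i, j)]                           # i < j, so the key is already sorted
        if n == 0:
            return 0.0
        return math.log(n / (f_prey(i) * f_prey(j) * total_pairs))

    proteins = sorted(set(bait_counts) | set(prey_counts))
    return {(i, j): spoke_term(i, j) + spoke_term(j, i) + matrix_term(i, j)
            for i, j in combinations(proteins, 2)}


# Toy data (hypothetical): S is a "sticky" protein retrieved in every purification.
PURIFICATIONS = [
    ("A", {"B", "C", "S"}),
    ("B", {"A", "C", "S"}),
    ("C", {"A", "B", "S"}),
    ("D", {"E", "S"}),
    ("E", {"D", "S"}),
]
scores = socio_affinity(PURIFICATIONS)
print(f"A-B {scores[('A', 'B')]:.2f}  A-S {scores[('A', 'S')]:.2f}")
```

Although S co-purifies with A more often than B does, the pair A-B scores higher than A-S, because S is expected to turn up everywhere; this is the same effect as the Vma2 example further on.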
[Figure: interaction affinity vs. interaction (socio-affinity) score; distribution of log(Kd) (-10 to -3) for full-length proteins and domains, for all interactions, affinity purification (AP) and yeast two-hybrid (Y2H) data]
Biophysical meaning of Socio-affinities
Real affinity?
P < 0.08
APs cover a broad range of Kds
Socio-affinity
Biophysical meaning of Socio-affinities
Physical proximity?
[Figure: percentage of pairs in physical contact (PDB and Y2H data), binned by interaction score from <5 to >15]
Socio-affinity
Very good at removing “sticky” proteins (e.g. Vma2 is present in 552 purifications but scores well only with Vma5, Vma10, Vma6 and Rav1)
• Socio-affinities capture the tendency of two proteins to be together under different conditions and thus can be used to define complexes
• It is known that proteins can belong to multiple complexes
• We need an iterative clustering procedure to disentangle the biological redundancy and versatility of protein complex composition
De novo definition of protein complexes
Clustering strategy (toy example): score matrix → dendrogram → complexes, repeated at each iteration with a threshold

Score matrix, iteration 1:
     A   B   C   D   E   F   G   H   I
A    -  10   9   6   5   0   0   0   0
B    -   -  11   6   5   0   0   0   0
C    -   -   -   6   5   0   0   0   0
D    -   -   -   -   0   0   0   0   0
E    -   -   -   -   -   0   0   0   0
F    -   -   -   -   -   -  10   6   4
G    -   -   -   -   -   -   -   4   6
H    -   -   -   -   -   -   -   -  10
I    -   -   -   -   -   -   -   -   -

Score matrix, iteration 2 (penalty -2):
     A   B   C   D   E   F   G   H   I
A    -   8   7   4   5   0   0   0   0
B    -   -   9   4   5   0   0   0   0
C    -   -   -   4   5   0   0   0   0
D    -   -   -   -   0   0   0   0   0
E    -   -   -   -   -   0   0   0   0
F    -   -   -   -   -   -   8   4   2
G    -   -   -   -   -   -   -   2   4
H    -   -   -   -   -   -   -   -  10
I    -   -   -   -   -   -   -   -   -

[Figure: dendrograms (leaf orders ABCDEFGHI and ABCEDFGHI) cut at a threshold to give the complexes of each iteration]
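The iterative idea (cluster, then penalise pairs that ended up together, then cluster again) lets a protein appear in more than one complex across iterations. A minimal sketch on the toy scores shown for iteration 1, assuming single-linkage clustering cut at a fixed threshold (computed as connected components of the graph of pairs scoring at or above the threshold). That clustering choice, the threshold of 6 and the function names are assumptions for illustration, so the sketch does not reproduce the slide's second matrix exactly.

```python
from itertools import combinations

# Toy upper-triangle scores from the iteration-1 matrix (zero entries omitted).
SCORES = {
    ("A", "B"): 10, ("A", "C"): 9, ("A", "D"): 6, ("A", "E"): 5,
    ("B", "C"): 11, ("B", "D"): 6, ("B", "E"): 5, ("C", "D"): 6, ("C", "E"): 5,
    ("F", "G"): 10, ("F", "H"): 6, ("F", "I"): 4, ("G", "H"): 4, ("G", "I"): 6,
    ("H", "I"): 10,
}


def threshold_clusters(proteins, scores, threshold):
    """Single-linkage cut at `threshold`: connected components of the graph
    whose edges are pairs scoring >= threshold (found with union-find)."""
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), s in scores.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    # Keep multi-protein clusters only, in a deterministic order.
    return sorted(tuple(sorted(g)) for g in groups.values() if len(g) > 1)


def iterative_clustering(proteins, scores, threshold, penalty, iterations):
    """Re-cluster repeatedly, penalising pairs that were clustered together."""
    scores = dict(scores)  # work on a copy
    history = []
    for _ in range(iterations):
        complexes = threshold_clusters(proteins, scores, threshold)
        history.append(complexes)
        for members in complexes:
            for pair in combinations(members, 2):  # members are sorted, so pairs match keys
                if pair in scores:
                    scores[pair] -= penalty
    return history


for i, complexes in enumerate(iterative_clustering("ABCDEFGHI", SCORES, 6, 2, 2), 1):
    print(i, complexes)
```

In the second iteration the weaker member D drops out of the A-B-C group and F-I splits into two smaller assemblies, which is the kind of core-versus-variant structure the procedure is meant to expose.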
Exploring the parameter space
• We explored a sensible range of clustering parameters (number of iterations, penalty values, etc.) and generated 1,784 potential sets of protein complexes with varying degrees of stringency
• We compared each set in terms of accuracy and coverage to a hand-curated set of protein complexes (Aloy et al. Science, 2004)
• The best set consisted of 491 complexes with a coverage of 83% and an accuracy of 78%
• Known complexes and/or functional variations were in sets with slightly poorer accuracy and coverage
• We picked all the sets with values of accuracy and coverage above 70% and clustered the similar complexes
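Comparing each candidate set with the hand-curated reference and keeping those above 70% on both measures can be sketched as follows. The slide does not say how a predicted complex is matched to a known one, so the Jaccard overlap, its 0.5 cut-off and all function names are assumptions; coverage is taken as the fraction of reference complexes matched and accuracy as the fraction of predicted complexes that match something in the reference.

```python
def jaccard(a, b):
    """Overlap between two complexes given as sets of proteins."""
    return len(a & b) / len(a | b)


def coverage_accuracy(predicted, reference, min_overlap=0.5):
    """Coverage: fraction of reference complexes matched by some prediction.
    Accuracy: fraction of predicted complexes matching some reference complex.
    The Jaccard criterion and the 0.5 cut-off are assumptions."""
    covered = sum(any(jaccard(r, p) >= min_overlap for p in predicted) for r in reference)
    accurate = sum(any(jaccard(p, r) >= min_overlap for r in reference) for p in predicted)
    return covered / len(reference), accurate / len(predicted)


def select_sets(candidate_sets, reference, cutoff=0.70):
    """Keep the candidate sets whose coverage and accuracy both exceed `cutoff`."""
    selected = []
    for name, complexes in candidate_sets.items():
        coverage, accuracy = coverage_accuracy(complexes, reference)
        if coverage > cutoff and accuracy > cutoff:
            selected.append(name)
    return selected


# Hypothetical reference and candidate sets.
REFERENCE = [{"A", "B", "C"}, {"D", "E"}]
CANDIDATES = {
    "strict": [{"A", "B", "C"}, {"D", "E"}],
    "loose": [{"A", "B"}, {"X", "Y"}],
}
print(select_sets(CANDIDATES, REFERENCE))
```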
Definitive set of protein complexes
• We ended up with 5,488 slightly different variations (isoforms) of 491 complexes
• The procedure increased the coverage to 90%
• We retrieved 61% of the 279 previously known complexes (MIPS + literature mining) and identified, on average, 80% of their components
• 257 out of the 491 complexes are entirely novel
• We found no novel components for only 20 of the 279 complexes in our gold-standard set
Modular organisation of protein complexes
• Core average size: 3.1 [1-23]
• Module average size: 2.9 [2-9]
• Each module is associated with 3.3 cores on average
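The three statistics above follow directly from representing each complex as a core plus a list of reusable modules. A minimal sketch of that data structure, with hypothetical toy complexes and module names (not data from the screen), counting a module's reuse as the number of complexes whose core it is attached to:

```python
from statistics import mean

# Hypothetical toy data: each complex has a core and a list of attached modules.
COMPLEXES = {
    "cx1": {"core": {"A", "B", "C"}, "modules": ["m1", "m2"]},
    "cx2": {"core": {"D", "E"}, "modules": ["m1"]},
    "cx3": {"core": {"F"}, "modules": ["m2", "m3"]},
}
MODULES = {"m1": {"X", "Y"}, "m2": {"Z", "W", "V"}, "m3": {"U", "T"}}


def modular_stats(complexes, modules):
    """Average core size, average module size, and average number of cores per module."""
    core_sizes = [len(c["core"]) for c in complexes.values()]
    module_sizes = [len(members) for members in modules.values()]
    cores_per_module = [
        sum(name in c["modules"] for c in complexes.values()) for name in modules
    ]
    return mean(core_sizes), mean(module_sizes), mean(cores_per_module)


print(modular_stats(COMPLEXES, MODULES))
```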
Evidence supporting the modular organisation
Functional requirements (RNA processing and degradation)
Modularity and cross-talk between functions & compartments
[Figure: functional categories of cores vs. modules (cell cycle, cell fate, cell transport, defense, energy, environment, metabolism, protein fate, protein synthesis, transcription, mRNA processing, signaling, unknown)]
• Protein networks may provide a molecular frame for interpreting “simple” genetic traits such as essentiality (only ~20% of yeast genes are essential)
• Recent phenotypic screens have moved beyond essentiality in a single growth condition
• They aim to provide phenotypic profiles for each gene
Rationalising phenotypes through complex architecture
[Figure: number of complexes by phenotypic similarity score (0, ≤50, >50) for complex cores vs. random sets]
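The comparison asks whether proteins in the same complex core share phenotypic profiles more than unrelated proteins do. A minimal sketch under stated assumptions: the slide's similarity score (binned 0, ≤50, >50) is not defined here, so cosine similarity between profile vectors is swapped in, the "random" baseline is replaced by the mean over all protein pairs, and the profiles are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two phenotypic profiles (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_pairwise_similarity(profiles, genes):
    return mean(cosine(profiles[a], profiles[b]) for a, b in combinations(genes, 2))


def core_vs_background(profiles, core):
    """Mean similarity within a core vs. over all protein pairs (baseline)."""
    return mean_pairwise_similarity(profiles, core), mean_pairwise_similarity(profiles, list(profiles))


# Hypothetical growth phenotypes across four conditions.
PROFILES = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 0, 0],
    "C": [1, 0.9, 0, 0.1],
    "D": [0, 0, 1, 1],
    "E": [0, 1, 1, 0],
}
print(core_vs_background(PROFILES, ["A", "B", "C"]))
```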
Hierarchical, dynamical and modular organisation of protein complexes
Gavin*, Aloy*, et al. (2006) Nature; Bravo & Aloy (2006) Curr Opin Struct Biol
• 491 complexes (257 novel) with over 5,000 isoforms
• 147 functional (??) modules
But where are the details?
ras1 YOR101W - ras
sdc25 YLL016W - RasGEF
cdc25 YLR310C - RasGEF
ras2 YNL098C - ras
ira2 YOL081W - RasGAP
Can we use 3D structures to understand the interaction space?
[Figure: a ras-family protein bound to RhoGAP]
1. Interface
2. Specificity
Do homologous proteins interact in the same way?
Aloy et al. (2003) J Mol Biol
[Figure: homologous interacting pairs A-B, A′-B′ and A″-B″]
Chothia & Lesk, EMBO J. 1986
iRMSD vs PID
[Figure: interface RMSD (iRMSD, 0-10 Å) vs. % sequence identity (PID), with low, medium and high identity regions and the 80th and 90th percentile curves]
Aloy et al. (2003) J Mol Biol; Aloy et al. (2005) Curr Opin Struct Biol
http://www.russell.embl.de/simint
[Figure: examples: ferredoxin-like domain pairs (Dom1-Dom2) in aspartate transcarbamylase and threonine deaminase; SH2-SH3 domain pairs in Lck and Abl]
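The percentile curves in the iRMSD vs. PID plot summarise, for each sequence-identity range, the iRMSD below which most pairs of homologous interactions fall. A minimal sketch, assuming the iRMSD values have already been computed by structural superposition, 10% identity bins and the nearest-rank percentile; the data points and function names are hypothetical.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, p):
    """Smallest value such that at least p% of the data are <= it."""
    ordered = sorted(values)
    k = max(0, math.ceil(p * len(ordered) / 100) - 1)
    return ordered[k]


def irmsd_percentile_curve(pairs, p=80, bin_width=10):
    """pairs: iterable of (percent_identity, irmsd_angstrom).
    Returns {bin_start: p-th percentile iRMSD within that identity bin}."""
    bins = defaultdict(list)
    for pid, irmsd in pairs:
        start = min(int(pid // bin_width) * bin_width, 100 - bin_width)  # 100% goes in the last bin
        bins[start].append(irmsd)
    return {start: percentile_nearest_rank(v, p) for start, v in sorted(bins.items())}


# Hypothetical (PID, iRMSD) pairs.
PAIRS = [(25, 4.0), (28, 6.0), (22, 2.0), (21, 8.0), (29, 5.0), (95, 0.5), (92, 1.0)]
print(irmsd_percentile_curve(PAIRS))
```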
[Figure: CDK bound to cyclins, CKSs, p25 and p18, grouped into interaction types 1-4]
Interaction type (equivalent to the concept of fold)
Aloy & Russell (2004) Nature Biotechnol
Structural data
Interaction data
Functional data
Genomic data
$$N_{\text{civilizations}} = N_{*} \times f_p \times n_e \times f_l \times f_i \times f_c \times L$$
Aloy & Russell (2004) Nature Biotechnol
Is Nature restricted to a few interaction types?
… emulating Cyrus again (Chothia, 1992)
Is the number of interaction types limited?
$$N_{\text{Types}} = N_{\text{Ints}} \times C \times r_{\text{FN}}^{-1} \times r_{\text{FP}} \times E_{\text{All-species}}$$
10,000 interaction types
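Like the Drake equation it imitates, the estimate is a plain product of factors. The slide neither defines the terms nor gives their values, so the sketch below only evaluates the expression; the function name and the numbers in the usage line are placeholders, not the inputs behind the 10,000 estimate.

```python
def estimate_interaction_types(n_ints, c, r_fn, r_fp, e_all_species):
    """N_Types = N_Ints * C * r_FN**-1 * r_FP * E_All-species (inputs hypothetical)."""
    if r_fn <= 0:
        raise ValueError("r_FN must be positive")
    return n_ints * c * (1 / r_fn) * r_fp * e_all_species


# Placeholder values, chosen only to show the arithmetic.
print(estimate_interaction_types(1000, 2, 0.5, 0.8, 3))
```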
EU Sixth Framework IP (~14 Million €)
[Figure: growth in the number of interaction types, 1981-2003 (series: total available interactions, interaction types, new interaction types)]
[Figure: members of Family A, Family B and non-Family B]
What about the specificity?
InterPreTS: Interaction Prediction through Tertiary Structure
Aloy & Russell, PNAS, 99, 5896, 2002. Aloy & Russell, Bioinformatics, 19, 161, 2003.
[Figure: contacts in a template structure (e.g. Asp-Arg, Phe-Phe) scored with interface pair potentials, side-chain to side-chain and side-chain to main-chain]
Alignments:
1tx4A      PIVLRETVAYLQA-------HALTTE ...
YFE7_YEAST PLIISSIFSYMDKIYPDLPNDKVR-T ...
1tx4B      KLVIVGDGACGKTCLLIVNSKDQF-- ...
RHO4_YEAST KIVVVGDGAVGKTCLLISYVQGTFPT ...
Score
Significance (Do RHO4 & YFE7 interact?)
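The scoring step can be sketched as: map the query residues onto the template's interface contacts through the alignment, sum a pair potential over those contacts, and ask whether the score is better than expected by chance. This is not the InterPreTS code. The potential values, the contact list and the assumption that the queries are already mapped onto template positions are hypothetical, and significance is estimated here with a Z-score against shuffled sequences, one common way to assess such scores.

```python
import random
import statistics

# Hypothetical pair potentials (negative = favourable); unlisted pairs score 0.
POTENTIALS = {("D", "R"): -1.0, ("D", "K"): -0.9, ("F", "F"): -0.8, ("D", "D"): 0.6}


def pair_potential(a, b):
    return POTENTIALS.get(tuple(sorted((a, b))), 0.0)


def interface_score(seq_a, seq_b, contacts):
    """Sum potentials over template contacts (i in chain A, j in chain B),
    assuming the query sequences are already mapped onto template positions."""
    return sum(pair_potential(seq_a[i], seq_b[j]) for i, j in contacts)


def z_score(seq_a, seq_b, contacts, shuffles=200, seed=0):
    """Observed score and its Z-score against shuffled query sequences."""
    rng = random.Random(seed)
    observed = interface_score(seq_a, seq_b, contacts)
    background = []
    for _ in range(shuffles):
        shuffled_a = "".join(rng.sample(seq_a, len(seq_a)))
        shuffled_b = "".join(rng.sample(seq_b, len(seq_b)))
        background.append(interface_score(shuffled_a, shuffled_b, contacts))
    sd = statistics.stdev(background)
    # A score lower (more favourable) than the shuffled background gives a positive Z.
    z = (statistics.mean(background) - observed) / sd if sd > 0 else 0.0
    return observed, z


print(z_score("DDDAAA", "RRRAAA", [(0, 0), (1, 1), (2, 2)]))
```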
[Figure: β-trefoil examples: FGF-receptor complex; FGF, IL-1 and ricin; FGF and FHF]
Ras binding domains
Blind test on 33 potential binders
22/27 (81%) correct predictions
[Figure: number of cases vs. Z-score (-2 to 5) for Ras and candidate RBPs, grouped as Bind, Unclear and Don't bind]
Structure-based P-P yeast interaction network
Aloy & Russell (2005) FEBS Lett. (Systems Biology issue)
Putting structure into pathways
Aloy & Russell, Nat. Rev. Mol. Cell. Biol. 2006
Interactions of known structure
Interaction discovery ('omics)
Cell biology (EM)
We can predict interactions, good for us… and now what?
Complex structure prediction
[Figure: a five-component complex modelled from an X-ray structure, homology models and a two-hybrid network, plus electron microscopy]
Russell et al., Curr. Opin. Struct. Biol. 2004; Aloy et al., Curr. Opin. Struct. Biol. 2005
Structure-based assembly of protein complexes from binary interactions
Aloy et al. (2004) Science
Modelling complexes from binary interactions
[Figure: proteins A-F known to belong to the same complex; their binary interactions are modelled on known structures of homologous proteins]
Aloy et al. (2004) Science
3D repertoire pipeline: 1,739 genes
589 multi-protein assemblies, 232 complexes
126 purifications, 102 manually annotated complexes
EM quality 6-9
634 proteins
Categories: nearly complete; most individual components & few interactions; most individual components; some individual components; no structural information
Counts: 42, 12, 20, 25, 3
Structural overview (102 hand-annotated complexes)
Aloy et al, Science. 2004
Respiratory fumarate reductase, S. putrefaciens (1d4d)
Adenylylsulfate reductase, A. fulgidus (1jnr)
Succinate dehydrogenase, E. coli (1nek)
Templates sharing less than 40% sequence identity (<25%, <28% and <27% identity)
Models are filtered by:
- Quality of the target/template superposition
- Geometrical clashes (bumps, interactions made)
- Quality of contacts (InterPreTS)
In this case:
- 4/7 domains could be modelled
- Distance to the original complex: 8.1 Å
- Good InterPreTS scores
Fumarate reductase, W. succinogenes (1qla)
Matthieu Pichaud (EMBL-HD)
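The three model filters can be combined into a single accept/reject test. A minimal sketch, assuming each candidate model is summarised by a superposition RMSD, a clash count and an InterPreTS-style Z-score; the dictionary keys, the 3.0 Å clash distance and every threshold are hypothetical, and the clash count uses a plain inter-chain distance cut-off between atoms.

```python
import math
from itertools import product


def count_clashes(chain_a, chain_b, cutoff=3.0):
    """Number of atom pairs between two chains closer than `cutoff` Å."""
    return sum(1 for p, q in product(chain_a, chain_b) if math.dist(p, q) < cutoff)


def passes_filters(model, max_rmsd=3.0, max_clashes=10, min_z=2.0):
    """Keep a model only if every (hypothetical) criterion is met."""
    return (model["superposition_rmsd"] <= max_rmsd
            and model["clashes"] <= max_clashes
            and model["interprets_z"] >= min_z)


# Hypothetical coordinates (Å) for two chains of a candidate model.
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
model = {"superposition_rmsd": 2.0, "clashes": count_clashes(chain_a, chain_b), "interprets_z": 2.5}
print(model["clashes"], passes_filters(model))
```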
Structure-based assembly of protein complexes …
… and networks
[Figure: network of proteins A-K, including cross-talk between complexes]
Legend: complex from affinity purification; complex from literature, etc.; interaction from two-hybrid; interaction predicted by structure; sequence similarity; similarity inferred through structure
Bridge the gap between abstract networks and real cells
Aloy & Russell, Nat. Rev. Mol. Cell. Biol. 2006
[Figure: levels from whole-cell tomogram to macromolecular complex and binary interaction, alongside protein interaction network, sub-network/pathway and interaction interface]
Building the cell from pieces
Understanding cell networks at atomic level
Acknowledgements
Rob Russell
Anne-Claude Gavin
Structural Bioinformatics @ IRB
Andreas Zanzoni, Amelie Stein, Sasha Panjkovich, Roland Pache