EMBL-EBI MSDpisa: a web service for studying Protein Interfaces, Surfaces and Assemblies. Eugene Krissinel. http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html


Page 1: MSDpisa

EMBL-EBI

MSDpisa: a web service for studying Protein Interfaces, Surfaces and Assemblies

Eugene Krissinel

http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html

Page 2: What PISA is about

More than 80% of protein structures are solved by means of X-ray diffraction on crystals.

An X-ray diffraction experiment produces atomic coordinates of the crystal's Asymmetric Unit (ASU).

In general, neither the ASU nor the Unit Cell has any relation to the Biological Unit, that is, the stable protein complex which acts as a unit in physiological processes.

Is there a way to infer the Biological Unit from protein crystallography data?

[Figure labels: PDB file; Unit Cell = all space symmetry group mates of ASU; Crystal = translated Unit Cell]
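The relation "Unit Cell = all space symmetry group mates of ASU" and "Crystal = translated Unit Cell" can be sketched in a few lines. This is only an illustration, not PISA code: it assumes fractional coordinates, uses the two general-position operators of space group P2_1 (unique axis b) as an example, and the two-atom ASU and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
SymOp = Tuple[Sequence[Sequence[int]], Vec]  # (rotation matrix, translation)

# General-position operators of space group P2_1 (unique axis b),
# written for fractional coordinates.
P21_OPS: List[SymOp] = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),    # x, y, z
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),  # -x, y+1/2, -z
]

def apply_op(op: SymOp, xyz: Vec) -> Vec:
    """Apply x' = R x + t to one atom in fractional coordinates."""
    rot, trans = op
    return tuple(
        sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i] for i in range(3)
    )

def unit_cell(asu: List[Vec], ops: List[SymOp]) -> List[List[Vec]]:
    """One copy of the ASU per symmetry operator (the symmetry mates)."""
    return [[apply_op(op, atom) for atom in asu] for op in ops]

def translate(copy: List[Vec], shift: Tuple[int, int, int]) -> List[Vec]:
    """Move a copy by whole unit cells; this is how the crystal is built."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in copy]

# Hypothetical two-atom ASU, for illustration only.
asu = [(0.10, 0.20, 0.30), (0.15, 0.25, 0.35)]
cell = unit_cell(asu, P21_OPS)             # all symmetry mates of the ASU
neighbour = translate(cell[1], (1, 0, 0))  # same mate, one cell along a
```

Atoms are deliberately not wrapped back into the 0..1 range, so that each copy of a molecule stays in one piece.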

Page 3: In (very) simple words …

[Figure: 1. in vivo: no image or bad image (?); 2. crystallisation; 3. in crystal: good image but no associations (? ?)]

Page 4: At first glance …

… the solution is as simple as 1-2:

1. Evaluate all protein contacts (interfaces) in the crystal
2. Leave only the strongest ("biologically relevant") ones

- and what you get has a good chance of being a stable protein complex.

Small technical problem:

How to discriminate between "real" (biologically relevant) and "superficial" (inter-assembly, or crystal packing) interfaces?

Page 5: Real and superficial protein interfaces

[Figure: buried ASA [Å²] (axis 0 to 7000) for each PDB entry (axis 0 to 80); series: dimers, monomers]

The most often used discrimination criterion is interface area.

A cut-off at 900 Å² gives about an 80% success rate of discrimination between monomers and dimers.

Big proteins would always be sticky if this criterion were true …
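A single-parameter cut-off of this kind amounts to a one-line classifier. The sketch below is only an illustration: the 900 Å² threshold comes from the slide, while the function names and the five sample entries are made up and are not the data behind the plot.

```python
AREA_CUTOFF_A2 = 900.0  # cut-off quoted on the slide, in square angstroms

def looks_biological(buried_area_a2: float, cutoff: float = AREA_CUTOFF_A2) -> bool:
    """Call an interface 'real' if its buried area reaches the cut-off."""
    return buried_area_a2 >= cutoff

def success_rate(samples, cutoff: float = AREA_CUTOFF_A2) -> float:
    """Fraction of (buried_area, is_dimer) samples classified correctly."""
    hits = sum(looks_biological(area, cutoff) == is_dimer for area, is_dimer in samples)
    return hits / len(samples)

# Hypothetical entries: (buried area in A^2, known to be a dimer?)
toy_samples = [
    (1500.0, True),
    (2300.0, True),
    (700.0, True),    # small but genuine interface: missed by the cut-off
    (400.0, False),
    (1100.0, False),  # large crystal contact: wrongly accepted
]
rate = success_rate(toy_samples)
```

The two misclassified toy entries show the weakness noted on the slide: area alone cannot tell a large crystal contact from a small genuine interface.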

Page 6: Real and superficial protein interfaces

[Figure: free energy gain [kcal/mol] (axis -80 to 0) for each PDB entry (axis 0 to 80); series: dimers, monomers]

Free energy gain of interface formation.

A cut-off at -8 kcal/mol gives about an 82% success rate of discrimination between monomers and dimers.

Can an energy measure be uniform for all weights and shapes?

Page 7: Real and superficial protein interfaces

[Figure: P-value of hydrophobic patch (axis 0 to 0.8) for each PDB entry (axis 0 to 80); series: dimers, monomers]

P-value of hydrophobic patches.

A measure of the probability that an interface could be more hydrophobic than the one found.

A cut-off at 0.2 gives about a 60% success rate of discrimination between monomers and dimers.

Page 8: Real and superficial protein interfaces

[Figure: packing edge factor (axis 0.0 to 1.0) for each PDB entry (axis 0 to 80); series: dimers, monomers]

Packing edge factor.

A measure showing how closely the mass packing edge matches the actual interface.

A cut-off at 0.3 gives about a 60% success rate of discrimination between monomers and dimers.

[Figure labels: interface; packing edge]

Page 9: Real and superficial protein interfaces

No ultimate discriminating parameter for the identification of biologically relevant protein interfaces can be proposed at present, even for dimeric complexes.

Jones, S. & Thornton, J.M. (1996) Principles of protein-protein interactions, Proc. Natl. Acad. Sci. USA, 93, 13-20.

Formation of N-meric complexes (N > 2) is most probably a cooperative process involving a set of interfaces. Therefore the significance of an interface should not be detached from the context of the protein complex.

Page 10: Making assemblies from significant interfaces

Despite the failure to find an ultimate measure of interface biological relevance, two approaches were developed that use scoring of individual interfaces:

PQS server @ MSD-EBI (Kim Henrick), Trends in Biochem. Sci. (1998) 23, 358

Method: recursive splitting of the largest complexes as allowed by crystal symmetry. The termination criterion is derived from the individual statistical scores of crystal contacts. The results are not curated.

PITA software @ Thornton group, EBI (Hannes Ponstingl), J. Appl. Cryst. (2003) 36, 1116

Method: progressive build-up by addition of monomeric chains that suit the selection criteria. The results are partly curated.

Page 11: Chemical stability of protein complexes

It is not the properties of individual interfaces but rather the chemical stability of the protein complex as a whole that really matters.

Protein chains will most likely associate into the largest complexes that are still stable.

A protein complex is stable if its free energy of dissociation is positive:

$$\Delta G_{diss} = -\Delta G_{int} - T\Delta S > 0$$

How to calculate $\Delta G_{diss}$?

Page 12: Protein affinity

$$\Delta G_{int} = \Delta G_s(A_1 A_2 \ldots A_n) - \sum_{i=1}^{n} \Delta G_s(A_i) - N_{hb} E_{hb} - N_{sb} E_{sb}$$

• $\Delta G_s(A_1 A_2 \ldots A_n)$: solvation energy of the protein complex
• $\Delta G_s(A_i)$: solvation energies of the dissociated subunits
• $E_{hb}$: free energy of H-bond formation
• $N_{hb}$: number of H-bonds between the dissociated subunits
• $E_{sb}$: free energy of salt bridge formation
• $N_{sb}$: number of salt bridges between the dissociated subunits

Here $E_{hb}$ and $E_{sb}$ are taken as positive magnitudes, so each H-bond and salt bridge lowers $\Delta G_{int}$.

Choice of dissociation subunits: dissociation into stable subunits with minimum $\Delta G_{diss}$.

[Figure: alternative dissociation patterns of a complex $A_1 A_2 A_3$]

$\Delta G_{int}$ is a function of protein interfaces.

Page 13: Solvation free energy

$$\Delta G_s(A) = \sum_k \omega_k \left( a_k - a_k^r \right)$$

• $\omega_k$: atomic solvation parameters
• $a_k$: atom's accessible surface area
• $a_k^r$: atom's accessible surface area in the reference (unfolded) state

[Figure: protein in solvent, with the accessible surface area $a_k$ of an atom marked]

Eisenberg, D. & McLachlan, A.D. (1986) Nature 319, 199-203.
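The solvation term is a plain weighted sum over atoms, which can be sketched as follows. This assumes the per-atom form of the Eisenberg and McLachlan model, a solvation parameter times the change in accessible surface area relative to the reference state; the function name and the parameter values in the example are placeholders, not the published atomic solvation parameters.

```python
from typing import Iterable, Tuple

# One atom: (solvation parameter, ASA in the structure, ASA in the reference state)
AtomTerm = Tuple[float, float, float]

def solvation_energy(atoms: Iterable[AtomTerm]) -> float:
    """Sum of omega_k * (a_k - a_k_ref) over all atoms."""
    return sum(omega * (a - a_ref) for omega, a, a_ref in atoms)

# Placeholder values for illustration only (energy per A^2, then areas in A^2).
example_atoms = [
    (0.016, 10.0, 40.0),   # apolar atom, mostly buried on folding
    (-0.006, 5.0, 30.0),   # polar atom, mostly buried on folding
]
dg_s = solvation_energy(example_atoms)
```

Burying an atom with a positive parameter lowers the energy, while burying one with a negative parameter raises it.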

Page 14: Entropy of macromolecules in solutions

$$S = S_{trans}(m) + S_{rot}(\hat{I}, \sigma_S) + S_{surf}(a)$$

Translational entropy ($m$: mass):

$$S_{trans}(m) = c_t + \frac{3}{2} R \log m$$

Rotational entropy ($\hat{I}$: tensor of inertia with principal moments $I_1, I_2, I_3$; $\sigma_S$: symmetry number):

$$S_{rot}(\hat{I}, \sigma_S) = c_r + \frac{1}{2} R \log \frac{I_1 I_2 I_3}{\sigma_S^2}$$

Sidechain entropy ($a$: solvent-accessible surface area):

$$S_{surf}(a) = F a$$

$c_t$, $c_r$ and $F$ are semiempirical parameters.

Murray, C.W. and Verdonk, M.L. (2002) J. Comput.-Aided Mol. Design 16, 741-753.
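The three entropy contributions can be written down directly as functions. This is a minimal sketch assuming the forms S_trans = c_t + (3/2) R log m, S_rot = c_r + (1/2) R log(I1 I2 I3 / sigma^2) and S_surf = F a; the function names are illustrative, and c_t, c_r and F are left as arguments because the slides do not give them separately.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def s_trans(mass: float, c_t: float) -> float:
    """Translational entropy: c_t + (3/2) R log(m)."""
    return c_t + 1.5 * R * math.log(mass)

def s_rot(moments, sigma: int, c_r: float) -> float:
    """Rotational entropy: c_r + (1/2) R log(I1*I2*I3 / sigma^2)."""
    i1, i2, i3 = moments
    return c_r + 0.5 * R * math.log(i1 * i2 * i3 / sigma ** 2)

def s_surf(area: float, f: float) -> float:
    """Sidechain (surface) entropy: F * a."""
    return f * area

def entropy(mass, moments, sigma, area, c_t, c_r, f) -> float:
    """Total entropy of one macromolecule as the sum of the three terms."""
    return s_trans(mass, c_t) + s_rot(moments, sigma, c_r) + s_surf(area, f)
```

With these forms, doubling the symmetry number lowers the rotational entropy by exactly R log 2, whatever the moments of inertia are.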

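Page 15: Entropy of dissociation

$$\Delta S = \sum_{i=1}^{n} S(A_i) - S(A_1 A_2 \ldots A_n)$$

$$\Delta S = (n-1)\,C + \frac{3}{2} R \log \frac{\prod_i m_i}{\sum_i m_i} + \frac{1}{2} R \sum_{k=1}^{3} \log \frac{\prod_i I_k(A_i)}{I_k(A_1 A_2 \ldots A_n)} - R \log \frac{\prod_i \sigma_S(A_i)}{\sigma_S(A_1 A_2 \ldots A_n)} + F\, a_{buried}$$

• $C$: fitted parameter ($C = c_t + c_r$ follows from the single-molecule expressions)
• $F$: fitted parameter
• $m_i$: mass of the i-th subunit
• $I_k(A_i)$: k-th principal moment of inertia of the i-th subunit
• $a_{buried}$: surface area buried in the complex, $\sum_i a(A_i) - a(A_1 A_2 \ldots A_n)$

$\Delta S$ is a function of the protein complex.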

Page 16: How to identify an assembly in crystal?

We now know (or we think that we know) how to evaluate the chemical stability of protein complexes.

Given a 3D arrangement of protein chains, we can now say whether there is a chance that this arrangement is a stable assembly, or biological unit.

But how do we get potential assemblies in the first place?

Page 17: Method of Desert Lion

How to catch a Desert Lion?

Catch all lions and keep the one living in the Desert.

Page 18: Enumerating assemblies in crystal

• The crystal is represented as a periodic graph with monomeric chains as vertices and interfaces as edges.
• Each set of assemblies is identified by its engaged interface types.
• All assemblies may be enumerated by a backtracking scheme engaging all possible combinations of different interface types.

Example: crystal with 3 interface types

Assembly set | Engaged interface types
1 | 000 - only monomers
2 | 001 - dimer N1
3 | 010 - dimer N2
4 | 011
5 | 100 - dimer N3
6 | 101
7 | 110
8 | 111 - all crystal
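The numbering in the example is simply a binary count over the interface types, one bit per type. A minimal sketch that reproduces the eight assembly sets for three interface types (the function name is illustrative):

```python
from itertools import product

def enumerate_assembly_sets(n_types: int):
    """Yield (set number, bit string) for every combination of engaged
    interface types; '1' means the interface type is engaged."""
    for number, bits in enumerate(product("01", repeat=n_types), start=1):
        yield number, "".join(bits)

for number, engaged in enumerate_assembly_sets(3):
    print(number, engaged)
```

The count doubles with every additional interface type, which is why plain enumeration stops being practical long before a hundred types.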

Page 19: Clever backtracking

The number of different interface types may reach a hundred. The algorithm is not going to complete backtracking of 2^100 combinations unless it is clever enough to:

• check geometry and engage induced interfaces as soon as they emerge;
• check geometry and terminate backtracking if the assembly contains two identical chains in parallel orientations (otherwise the assembly will be infinite due to translation symmetry in the crystal);
• see the future and terminate backtracking if there are no stable assemblies down the current branch of the recursion tree (based on the observation that the entropy of dissociation of unstable assemblies only increases down the recursion tree).

[Figure labels: engaged interfaces; induced interface]

… only then does the algorithm complete, in 0.1 seconds to 1.5 hours depending on the structure …
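The second check, rejecting assemblies that would be infinite because of translation symmetry, can be illustrated on a toy periodic graph. This is a sketch under stated assumptions rather than the PISA implementation: vertices are chains of the reference unit cell, each engaged interface is a hypothetical triple (chain_u, chain_v, cell_shift) meaning that chain_u in some cell touches chain_v in the cell displaced by cell_shift, and an assembly is taken to be infinite when the same chain is reached in two different cells.

```python
from collections import deque

def is_infinite(start, edges):
    """Return True if the assembly grown from `start` over the engaged
    interfaces contains a translated copy of one of its own chains."""
    # Build an adjacency list that can be walked in both directions.
    adj = {}
    for u, v, (dx, dy, dz) in edges:
        adj.setdefault(u, []).append((v, (dx, dy, dz)))
        adj.setdefault(v, []).append((u, (-dx, -dy, -dz)))

    seen = {start: (0, 0, 0)}  # chain -> cell in which it was first reached
    queue = deque([(start, (0, 0, 0))])
    while queue:
        chain, cell = queue.popleft()
        for nxt, shift in adj.get(chain, []):
            nxt_cell = tuple(c + s for c, s in zip(cell, shift))
            if nxt not in seen:
                seen[nxt] = nxt_cell
                queue.append((nxt, nxt_cell))
            elif seen[nxt] != nxt_cell:
                # Same chain, different cell: a pure translation links two
                # members of the assembly, so it repeats forever.
                return True
    return False

dimer = [("A", "B", (0, 0, 0))]                          # closed A-B pair
fibre = [("A", "B", (0, 0, 0)), ("B", "A", (1, 0, 0))]   # A-B-A'-B'-... along a
```

In a backtracking search, a positive answer would end the current branch at once, since engaging further interface types cannot make such an assembly finite again.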

Page 20: Detection of Biological Units in Crystals

Method Summary

1. Build the periodic graph of the crystal

2. Enumerate all possibly stable assemblies

3. Evaluate assemblies for chemical stability

4. Leave only sets of stable assemblies in the list and rank them by their chances of being a biological unit:

• Larger assemblies take preference
• Single-assembly solutions take preference
• Otherwise, assemblies with higher $\Delta G_{diss}$ take preference
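Step 4 of the summary can be sketched as a filter followed by a sort. The record layout and the reading of the three preference rules as a strict priority order (size first, then single-assembly solutions, then the dissociation free energy) are assumptions made for illustration; the slides do not spell out how the rules are combined.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Solution:
    name: str
    size: int          # number of chains in the largest assembly of the set
    n_assemblies: int  # how many different assemblies the set contains
    dg_diss: float     # dissociation free energy, kcal/mol

def rank(solutions: List[Solution]) -> List[Solution]:
    """Keep stable solutions (dg_diss > 0) and order them by preference."""
    stable = [s for s in solutions if s.dg_diss > 0]
    return sorted(
        stable,
        key=lambda s: (-s.size, s.n_assemblies != 1, -s.dg_diss),
    )

# Hypothetical candidates for one crystal.
candidates = [
    Solution("T1", size=4, n_assemblies=1, dg_diss=12.0),
    Solution("T2", size=4, n_assemblies=1, dg_diss=20.0),
    Solution("T3", size=4, n_assemblies=2, dg_diss=25.0),
    Solution("D", size=2, n_assemblies=1, dg_diss=30.0),
    Solution("H", size=6, n_assemblies=1, dg_diss=-3.0),  # unstable, dropped
]
ranking = [s.name for s in rank(candidates)]
```

Under this reading the hexamer is discarded as unstable, and the dimer ranks last despite having the highest dissociation free energy, because size is compared first.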

Page 21: Are we any close?

Assembly classification on the benchmark set of 218 structures published in

Ponstingl, H., Kabir, T. and Thornton, J. (2003) Automatic inference of protein quaternary structures from crystals. J. Appl. Cryst. 36, 1116-1122.

     | 1mer | 2mer  | 3mer | 4mer | 6mer | Other | Sum    | Correct
1mer | 50   | 4     | 0    | 1    | 0    | 0     | 55     | 91%
2mer | 6    | 68+11 | 0    | 2+1  | 0    | 0     | 76+12  | 90%
3mer | 1    | 0     | 22   | 0    | 1    | 0     | 24     | 92%
4mer | 2    | 3     | 0    | 27+6 | 0    | 0     | 32+6   | 87%
6mer | 0    | 0     | 0    | 1    | 10+2 | 0     | 11+2   | 92%
Total:                                            198+20 | 90%

198+20 <=> 198 homomers and 20 heteromers
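The "Correct" column can be re-derived from the diagonal entries and the row sums of the table, counting homomers and heteromers together. A short check of that arithmetic, with the numbers copied from the table:

```python
# row -> (correct homomers, correct heteromers, total homomers, total heteromers)
benchmark = {
    "1mer": (50, 0, 55, 0),
    "2mer": (68, 11, 76, 12),
    "3mer": (22, 0, 24, 0),
    "4mer": (27, 6, 32, 6),
    "6mer": (10, 2, 11, 2),
}

def percent_correct(correct_homo, correct_hetero, total_homo, total_hetero):
    """Share of correctly classified entries, rounded to a whole percent."""
    return round(100 * (correct_homo + correct_hetero) / (total_homo + total_hetero))

per_row = {name: percent_correct(*counts) for name, counts in benchmark.items()}

all_correct = sum(c[0] + c[1] for c in benchmark.values())  # 196
all_entries = sum(c[2] + c[3] for c in benchmark.values())  # 218
overall = round(100 * all_correct / all_entries)
```

This gives 91, 90, 92, 87 and 92 percent for the five rows, and 196 correct out of 218 overall, which rounds to the quoted 90%.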

Fitted parameters:

1. Free energy of an H-bond: $E_{hb}$ = 0.51 kcal/mol

2. Free energy of a salt bridge: $E_{sb}$ = 0.21 kcal/mol

3. Constant entropy term: $T \cdot C$ = 11.7 kcal/mol

4. Surface entropy factor: $T \cdot F$ = 0.57·10^-3 kcal/(mol·Å²)

Classification error in $\Delta G_{diss}$: ± 5 kcal/mol
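Putting the pieces together, a stability verdict for one candidate assembly might be computed as below. This is a sketch under stated assumptions: it takes ΔG_int as the solvation energy of the complex minus that of the subunits, minus N_hb·E_hb and N_sb·E_sb, and ΔG_diss as -ΔG_int - TΔS, with the fitted bond energies and the ± 5 kcal/mol error quoted above. The function names, the sign convention and the example numbers are illustrative, and T·ΔS is passed in as a ready-made number.

```python
E_HB = 0.51     # fitted free energy per H-bond, kcal/mol (from the slide)
E_SB = 0.21     # fitted free energy per salt bridge, kcal/mol (from the slide)
DG_ERROR = 5.0  # quoted classification error in dG_diss, kcal/mol

def delta_g_int(dgs_complex, dgs_subunits, n_hb, n_sb):
    """Assumed form: dGs(complex) - sum dGs(subunits) - N_hb*E_hb - N_sb*E_sb."""
    return dgs_complex - sum(dgs_subunits) - n_hb * E_HB - n_sb * E_SB

def delta_g_diss(dg_int, t_delta_s):
    """Assumed form: dG_diss = -dG_int - T*dS."""
    return -dg_int - t_delta_s

def verdict(dg_diss):
    """Stable or unstable, with a grey zone inside the quoted error band."""
    if dg_diss > DG_ERROR:
        return "stable"
    if dg_diss < -DG_ERROR:
        return "unstable"
    return "borderline"

# Hypothetical dimer: all energies in kcal/mol.
dg_int = delta_g_int(-250.0, [-120.0, -118.0], n_hb=12, n_sb=4)
dg_diss = delta_g_diss(dg_int, t_delta_s=12.5)
result = verdict(dg_diss)
```

For these made-up inputs the binding energy is about -19 kcal/mol and the dissociation free energy about 6.5 kcal/mol, just outside the error band.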

Page 22: A better method?

Percent of successful classifications, as measured on the same benchmark set of 218 PDB entries:

PQS server : 78% (not optimised on the benchmark set, but manually curated)

PITA software : 84% (optimised with 18 parameters, system overfit(?))

Present study : 90% (optimised with 4 parameters, system underfit)

Page 23: What is beyond the benchmark set?

Classification results obtained for 366 recent depositions into the PDB, in reference to manual classification in MSD-EBI:

      | 1mer | 2mer  | 3mer | 4mer | 5mer | 6mer | 8mer | 10mer | 12mer | Other | Sum    | Correct
1mer  | 131  | 11    | 0    | 4    | 0    | 2    | 2    | 0     | 0     | 0     | 150    | 87%
2mer  | 12+6 | 88+12 | 1    | 4    | 0    | 1    | 2    | 0     | 0     | 0     | 105+21 | 79%
3mer  | 1    | 2     | 6+2  | 0    | 0    | 1    | 0    | 0     | 0     | 0     | 7+5    | 67%
4mer  | 1+1  | 5+2   | 0    | 25+5 | 0    | 0    | 1+2  | 0     | 0     | 0     | 32+10  | 71%
5mer  | 1    | 0     | 0    | 0    | 2+1  | 0    | 0    | 0     | 0     | 0     | 2+2    | 75%
6mer  | 1    | 2+1   | 0    | 0    | 0    | 13+2 | 0    | 0     | 0     | 0     | 15+4   | 79%
8mer  | 0    | 1     | 0    | 0    | 0    | 0    | 0+2  | 0     | 0     | 0     | 1+2    | 67%
10mer | 0    | 0     | 0    | 0    | 0    | 0    | 0    | 2     | 0     | 0     | 2      | 100%
12mer | 2    | 0     | 0    | 0    | 0    | 0    | 0    | 0     | 5+1   | 0     | 7+1    | 75%
Total:                                                                           321+45 | 81%

321+45 <=> 321 homomers and 45 heteromers

Classification error in $\Delta G_{diss}$: ± 5 kcal/mol

Page 24: Is it ever going to be 100%?

Nobody should be that naive, because:

• theoretical models for protein affinity and entropy change upon protein complexation are primitive
• coordinate (experimental) data is of limited accuracy
• there is no feasible way to take conformations in the crystal into account
• experimental data on multimeric states is very limited and not always reliable - calibration of parameters is difficult
• protein assemblies may exist in some environments and dissociate in others - a definite answer is simply not there

Page 25: Web-server PISA

A new MSD-EBI tool for working with Protein Interfaces, Surfaces and Assemblies

http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html


Page 29: Conclusions

Stable protein complexes, which are likely to be biological units, may be calculated from protein crystallography data with an 80-90% success rate.

The biological relevance of a particular protein interface cannot be reliably inferred from the interface properties alone. Instead, the significance of an interface should be judged from an analysis of the relevant protein assemblies.

Acknowledgement. This work has been supported by research grant No. 721/B19544 from the Biotechnology and Biological Sciences Research Council (BBSRC) UK.