Databanks + New tools = New insights THE AXIOM S imple A tom D epth I ndex C alculator protein fold...

Post on 20-Jan-2016


Databanks + New tools = New insights

THE AXIOM

Simple Atom Depth Index Calculator

protein fold barcoding

CATH – ADAPT… -1

protein folding

Birth of the Earth

Digging inside objects to discover their origins

SADIC: a new tool to analyze atom depth

* Chakravarty S, Varadarajan R. Residue depth: a novel parameter for the analysis of protein structure and stability. Structure Fold Des. 1999 7:723-732

* Pintar A, Carugo O, Pongor S. Atom depth as a descriptor of the protein interior. Biophys J. 2003 84:2553-2561.

atom depth calculated as the distance to:

the closest external water*

the closest dot of the water accessible surface*

the closest surface exposed atom*
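As a minimal sketch of the third definition, assuming coordinates in Å and a list stating which atoms are surface exposed (how exposure is decided is not specified on the slide and is left to the caller), the depth of each atom is its distance to the nearest exposed atom. The function name `depth_to_nearest_exposed` and the three toy atoms are hypothetical.

```python
import math

def depth_to_nearest_exposed(coords, exposed):
    """Distance-based atom depth: distance of each atom to the closest exposed atom.

    coords  -- list of (x, y, z) tuples in Å
    exposed -- list of booleans, True where the atom is surface exposed (assumed given)
    """
    surface = [c for c, e in zip(coords, exposed) if e]
    if not surface:
        raise ValueError("at least one exposed atom is required")
    return [min(math.dist(c, s) for s in surface) for c in coords]

# Toy example: one exposed atom and two buried ones
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
exposed = [True, False, False]
depths = depth_to_nearest_exposed(coords, exposed)
```

An exposed atom gets depth 0 under this definition, which is one reason a volume-based index can discriminate better among surface atoms.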

[Figure: 2D atom depth, HEWL (PDB ID 4LZT)]

* Varrazzo D, Bernini A, Spiga O, Ciutti A, Chiellini S, Venditti V, Bracci L, Niccolai N. Three-dimensional computation of atomic depth in complex molecular structures. Bioinformatics. 2005 21:2856-2860.

Calculation of exposed volumes

[Figure: 2D and 3D atom depth, HEWL (PDB ID 4LZT)]

Depth index:

$$D_{i,r} = \frac{2\,V_{i,r}}{V_{0,r}}$$

where $V_{i,r}$ is the exposed volume of a sphere of radius $r$ centered on atom $i$ of the molecule and $V_{0,r}$ is the exposed volume of the same sphere when centered on an isolated atom
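The depth index definition can be sketched numerically. This is not the SADIC implementation: it assumes atoms are hard spheres, estimates the exposed volume by counting free points on a regular grid, and takes the isolated-atom reference as the same sphere with only the central atom present. The names `exposed_volume` and `depth_index`, the 1.7 Å atom radius and the 0.5 Å grid step are illustrative assumptions.

```python
import math

def exposed_volume(center, atoms, r, step=0.5):
    """Grid estimate of the part of the sphere (radius r, centered at `center`)
    that is not inside any atom. `atoms` is a list of ((x, y, z), radius)."""
    n = math.ceil(r / step)
    cx, cy, cz = center
    free = 0
    for ix in range(-n, n + 1):
        for iy in range(-n, n + 1):
            for iz in range(-n, n + 1):
                p = (cx + ix * step, cy + iy * step, cz + iz * step)
                if math.dist(p, center) > r:
                    continue  # outside the probe sphere
                if any(math.dist(p, pos) <= rad for pos, rad in atoms):
                    continue  # occupied by an atom
                free += 1
    return free * step ** 3

def depth_index(i, atoms, r, step=0.5):
    """D(i,r) = 2 * V(i,r) / V(0,r) for atom i of `atoms`."""
    center = atoms[i][0]
    v_i = exposed_volume(center, atoms, r, step)
    v_0 = exposed_volume(center, [atoms[i]], r, step)  # isolated-atom reference
    return 2.0 * v_i / v_0

# Toy data: an isolated atom, and the same atom with six close neighbours
isolated = [((0.0, 0.0, 0.0), 1.7)]
neighbours = [((2.0, 0.0, 0.0), 1.7), ((-2.0, 0.0, 0.0), 1.7),
              ((0.0, 2.0, 0.0), 1.7), ((0.0, -2.0, 0.0), 1.7),
              ((0.0, 0.0, 2.0), 1.7), ((0.0, 0.0, -2.0), 1.7)]
d_isolated = depth_index(0, isolated, r=4.0)
d_crowded = depth_index(0, isolated + neighbours, r=4.0)
```

With this definition an isolated atom has an index of 2, and the index drops towards 0 as more of the sphere is filled by the rest of the molecule.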

[Figure: 3D atom depth, HEWL (PDB ID 4LZT)]

the sphere radius r should be the largest value for which Vi,r = 0 for the most buried atom
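The radius rule can be sketched as a scan over candidate radii. The helper name `largest_zero_volume_radius` is hypothetical, and `toy_min_volume` is a made-up stand-in for a real calculation of the exposed volume of the most buried atom; it is not data from the slides.

```python
def largest_zero_volume_radius(min_exposed_volume, radii):
    """Return the largest candidate radius for which the exposed volume of the
    most buried atom is still zero, or None if no candidate qualifies."""
    zero = [r for r in radii if min_exposed_volume(r) == 0.0]
    return max(zero) if zero else None

# Toy stand-in: the most buried atom starts to "see" solvent beyond 9 Å
def toy_min_volume(r):
    return max(0.0, r - 9.0)

best_r = largest_zero_volume_radius(toy_min_volume, [6.0, 7.0, 8.0, 9.0, 10.0, 11.0])
```

In practice `min_exposed_volume` would wrap an exposed-volume routine evaluated on the most buried atom of the structure under study.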

[Plot: Di,r (0.0 to 2.0) versus r [Å] (4.0 to 24.0) for the α carbons of residues 47, 58 and 28]

Thr 47 α carbon: Di,9 = 1.59

Ile 58 α carbon: Di,9 = 0.13

Trp 28 α carbon: Di,9 = 0.03

atom depth: 3D vs 2D

HEWL 4lzt

3D atom depth analysis

from PDB ID 1UBQ

http://www.sbl.unisi.it/prococoa/

Di

SBL Bioinformatics Projects

SADIC-related projects:

1. fold dependent aa compositions of protein cores;

2. towards i-SADIC.

Projects unrelated to SADIC:

1. systematic analysis of PPI

Di analysis of protein atoms

defining structural layers in protein 3D structures

each structural layer includes atoms with similar Di's

fast and accurate analysis of aa content of structural layers

Ln    Di           color
L6    > 1.2        red
L5    1.0 - 1.2    orange
L4    0.8 - 1.0    yellow
L3    0.6 - 0.8    green
L2    0.4 - 0.6    blue
L1    0.2 - 0.4    indigo
L0    < 0.2        violet
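The layer table maps directly onto a lookup. In this minimal sketch the function name `structural_layer` is hypothetical, and the handling of values that fall exactly on a boundary (assigned to the upper layer) is an assumption, since the table does not say which side is inclusive.

```python
from bisect import bisect_right

# Upper bounds of layers L0..L5; anything at or above 1.2 falls in L6 here
LAYER_BOUNDS = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
LAYER_COLORS = ["violet", "indigo", "blue", "green", "yellow", "orange", "red"]

def structural_layer(di):
    """Return (layer name, color) for a depth index value."""
    index = bisect_right(LAYER_BOUNDS, di)
    return f"L{index}", LAYER_COLORS[index]

deep = structural_layer(0.02)
middle = structural_layer(0.50)
outer = structural_layer(1.29)
```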

3VTR (chitinolytic enzyme, 572 aa)

Di analysis of protein atoms

3D atom depth analysis, from PDB ID 1UBQ

http://www.sbl.unisi.it/prococoa/

K63: N 0.19, CA 0.30, C 0.25, O 0.23, CB 0.50, CG 0.68, CD 0.91, CE 1.11, NZ 1.29

E24: N 0.38, CA 0.52, C 0.50, O 0.52, CB 0.76, CG 0.95, CD 1.17, OE1 1.24, OE2 1.24

L43: N 0.10, CA 0.05, C 0.11, O 0.18, CB 0.02, CG 0.02, CD1 0.02, CD2 0.00

[Figure: Dimax marked for each of the three residues]

Dimax analysis of protein residues

defining aa occupancy in protein structural layers

each structural layer includes residues with similar Dimax's

fast and accurate analysis of aa distribution in protein structures
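A residue-level value can be derived from the atom values listed for 1UBQ. The sketch below assumes that Dimax is the largest Di among the atoms of a residue; the name suggests this, but the slides do not define it explicitly. The function name `residue_dimax` is hypothetical.

```python
# Atom Di values for three residues of PDB ID 1UBQ, as listed in the slides
atom_di = {
    "K63": {"N": 0.19, "CA": 0.30, "C": 0.25, "O": 0.23, "CB": 0.50,
            "CG": 0.68, "CD": 0.91, "CE": 1.11, "NZ": 1.29},
    "E24": {"N": 0.38, "CA": 0.52, "C": 0.50, "O": 0.52, "CB": 0.76,
            "CG": 0.95, "CD": 1.17, "OE1": 1.24, "OE2": 1.24},
    "L43": {"N": 0.10, "CA": 0.05, "C": 0.11, "O": 0.18, "CB": 0.02,
            "CG": 0.02, "CD1": 0.02, "CD2": 0.00},
}

def residue_dimax(atom_values):
    """Assumed definition: Dimax is the largest Di among the residue's atoms."""
    return max(atom_values.values())

dimax = {residue: residue_dimax(values) for residue, values in atom_di.items()}
```

Under this reading K63 and E24 reach the outermost layer through their side-chain tips, while every atom of L43 stays below 0.2.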

Dimax analysis of protein singles

quite a few proteins like to stay single (at least in the crystalline state)

Bioinformatiha 2, Florence, 18 October

a database of protein singles

Experimental Method: X-RAY (79,770)

Chain Type: Protein (74,456)

Only 1 chain in asym. unit: (28,803)

Oligomeric state: 1 (21,193)

Number of Entities: 1 (3,517)

Homologue Removal @ 95% identity (2,410)

2,410 proteins in the dataset

4,657,574 atoms

589,383 residues
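The selection steps above amount to a chain of filters followed by redundancy removal. The sketch below runs them on made-up records: the field names (`method`, `chain_type`, `chains_in_asym_unit`, `oligomeric_state`, `entities`, `cluster95`) and the entry ids are hypothetical, and homologue removal is reduced to keeping the first entry of each precomputed 95% identity cluster.

```python
def select_singles(entries):
    """Apply the selection criteria, then keep one entry per 95% identity cluster."""
    kept = [
        e for e in entries
        if e["method"] == "X-RAY"
        and e["chain_type"] == "Protein"
        and e["chains_in_asym_unit"] == 1
        and e["oligomeric_state"] == 1
        and e["entities"] == 1
    ]
    seen, singles = set(), []
    for e in kept:
        if e["cluster95"] not in seen:  # first representative of its cluster
            seen.add(e["cluster95"])
            singles.append(e)
    return singles

def toy(entry_id, **overrides):
    """Build a made-up record that passes every filter unless overridden."""
    record = {"id": entry_id, "method": "X-RAY", "chain_type": "Protein",
              "chains_in_asym_unit": 1, "oligomeric_state": 1,
              "entities": 1, "cluster95": entry_id}
    record.update(overrides)
    return record

toy_entries = [
    toy("toyA", cluster95=1),
    toy("toyB", cluster95=1),          # homologue of toyA, removed
    toy("toyC", method="NMR"),         # wrong experimental method
    toy("toyD", oligomeric_state=2),   # not a monomer
    toy("toyE", cluster95=2),
]
singles = select_singles(toy_entries)
single_ids = [e["id"] for e in singles]
```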

DOOPS: a database of protein singles

DOOPS: 2,410 proteins in the dataset (4,657,574 atoms; 589,383 residues)

Swiss-Prot: 540,958 proteins in the dataset (192 Maa)

[Histograms: axis ticks from 2 to 1922 in steps of 160, from 0 to 18 in steps of 2, and from 0 to 2000 in steps of 1000]

calculation of % amino acid content in L0

the first quantitative analysis of a large array of protein cores!

aa               % in L0
Alanine          11.51
Cysteine          2.63
Aspartate         1.77
Glutamate         1.2
Phenylalanine     6.36
Glycine          10.81
Histidine         1.32
Isoleucine       11.74
Lysine            0.58
Leucine          16.27
Methionine        2.49
Asparagine        1.7
Proline           2.45
Glutamine         1.21
Arginine          0.83
Serine            4.85
Threonine         4.65
Valine           13.7
Tryptophan        1.43
Tyrosine          2.5
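Percentages of this kind follow from counting residue types among the residues assigned to L0. A minimal sketch, with a hypothetical function name `aa_percentages` and a made-up four-residue core instead of the DOOPS data:

```python
from collections import Counter

def aa_percentages(core_residues):
    """Percentage of each amino acid type in a list of three-letter codes."""
    counts = Counter(core_residues)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}

# Toy core, not taken from the dataset
toy_core = ["LEU", "LEU", "ALA", "VAL"]
pct = aa_percentages(toy_core)
```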

Dimax analysis of protein cores

DOOPS: 2,410 proteins; 4,657,574 atoms; 589,383 residues

core aa if Dimax < 0.2

ΣDOOPS aa(L0) = 106,088 (from 2,410 proteins), ~20 % of total molecular volume


Class               Architectures   Topology   Homologous superfamily   Domains
1 (mainly α)                    5        386                      875    37,038
2 (mainly β)                   20        229                      520    43,881
3 (α & β)                      14        594                     1113    90,029
4 (few sec. str.)               1        104                      118     2,588
Total                          40       1313                     2626   173,536

Di analysis of protein cores: folding clues from aa core composition?

DOOPS + CATH: selected Architectures with ≥ 10 PDB files

Architecture   Proteins (mono)
1.10           213 (84)
1.20           84 (40)
1.25           19 (17)
1.50           10 (3)
2.10           17 (13)
2.30           57 (37)
2.40           94 (73)
2.60           134 (110)
2.80           12 (12)
3.10           84 (73)
3.20           52 (44)
3.30           139 (106)
3.40           218 (203)
3.60           10 (8)
3.90           49 (49)
total          1,190 (872)

Towards protein folding barcodes

[Figure legend: aa % average value (av); av + σ; av + 2σ; av - σ; av - 2σ]

Cys: PDB ID 1UZK (A01), ribbon

Leu, Phe: PDB ID 1RG8 (A00), trefoil

Val: PDB ID 2IMH (A01), four layer sandwich


% L0   1.10  1.20  1.25  1.50  2.10  2.30  2.40  2.60  2.80  3.10  3.20  3.30  3.40  3.60  3.90  overall
ALA   13.28 10.32 21.46 12.74  9.26 10.05  8.43  9.32   5.5 10.69 10.08 12.58 11.88 14.95 12.01    11.51
ARG     0.6  1.28  0.24  1.39     0  0.64  1.72  0.75     0  0.55  1.11  1.75   0.3  0.47  0.95     0.83
ASN    0.67  2.62  0.73  2.77  1.85  2.04  1.77  1.36     0   2.1   2.9  0.96  1.52   2.8   2.1     1.70
ASP    1.61  2.62  0.24  2.91  1.23  1.27  2.03  1.79     0   2.1   2.9  3.02  1.77  2.34  0.95     1.77
CYS    3.35  2.99  5.37  0.83 22.84  2.04  1.46  4.42  0.92  2.83   2.1  1.49  1.86   1.4  3.05     2.63
GLN     0.6   1.5  0.24  1.11  1.23  1.15  1.81  1.69     0  0.46  1.56  2.15  0.99   1.4  1.33     1.21
GLU    1.48  1.44  0.73  1.52     0  1.15  1.19  1.04     0  0.91  2.59  2.41  1.08  0.93  0.67     1.20
GLY    8.05  8.72  9.76 13.85 16.05  9.92  16.2 10.82  9.17  8.78 11.81 11.35 12.64 13.08  9.91    10.81
HIS    1.01   1.6  2.44  1.11  0.62  0.76  0.79  0.56     0  2.65  1.96  3.02  1.91  0.47  2.48     1.32
ILE   12.68  9.95 10.73  8.59  6.79 13.61 10.68 10.78 13.76  12.8 11.77 12.53 11.53  7.01 11.34    11.74
LEU   23.88 18.34 22.44 11.77  8.02 17.18 12.97 13.98 33.94 16.54  11.9 14.33 14.22 15.42 13.63    16.27
LYS    0.67  0.91     0  1.11     0  0.38  0.49  0.56     0  0.09  0.62  1.36  0.55     0  0.67     0.58
MET    2.62  4.17  1.71  4.99     0   2.8  2.65  3.15  1.83  2.93  2.76  2.41  2.39  3.27  1.91     2.49
PHE    6.44  6.79  2.93  4.57  4.32  7.12  7.06  6.73  15.6  7.22  4.95  6.18  6.07  4.21  6.01     6.36
PRO    1.34  2.46  3.41  2.63  3.09  3.31     3  2.78     0  3.29   2.9  1.84  2.25   1.4  1.81     2.45
SER    3.49  4.55  3.66  5.96  3.09  5.34  5.56  5.13  2.75  2.83  5.35  4.43  4.23  6.07  5.34     4.85
THR    2.28  4.81  4.15   7.2  5.56  3.31  5.12  4.47  0.92   3.2  5.22  4.25  4.94  5.14  5.91     4.65
TRP    1.01  1.55     0  2.77   3.7  0.38  1.63  2.78  2.75  2.19  1.52  0.66  1.26  0.47   2.1     1.43
TYR    2.62  3.69  0.24  4.57  2.47  1.27  2.69  4.38  0.92  3.29  3.12  1.58  2.32     0  2.29     2.50
VAL   12.34  9.68  9.51  7.62  9.88 16.28 12.75 13.51 11.93 14.53 12.88  11.7 16.29 19.16 15.54     13.7

# PDB (same column order): 213 (84), 84 (40), 19 (17), 10 (3), 17 (13), 57 (37), 94 (73), 134 (110), 12 (12), 84 (73), 52 (44), 139 (106), 218 (203), 10 (8), 49 (49); overall 2,410
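One way to read a "barcode" out of this table is to flag architectures where a residue type exceeds av + 2σ. The slides do not say how av and σ are computed; the sketch below assumes an unweighted mean and population standard deviation over the 15 architecture columns, and uses the CYS row as input. The function name `enriched` is hypothetical.

```python
from statistics import mean, pstdev

ARCHITECTURES = ["1.10", "1.20", "1.25", "1.50", "2.10", "2.30", "2.40", "2.60",
                 "2.80", "3.10", "3.20", "3.30", "3.40", "3.60", "3.90"]

# CYS % in L0 for each architecture, copied from the table
CYS_L0 = [3.35, 2.99, 5.37, 0.83, 22.84, 2.04, 1.46, 4.42,
          0.92, 2.83, 2.1, 1.49, 1.86, 1.4, 3.05]

def enriched(values, labels, n_sigma=2.0):
    """Labels whose value exceeds the mean by more than n_sigma standard deviations."""
    av, sigma = mean(values), pstdev(values)
    return [label for label, v in zip(labels, values) if v > av + n_sigma * sigma]

cys_enriched = enriched(CYS_L0, ARCHITECTURES)
```

Under these assumptions only architecture 2.10 stands out for cysteine, in line with the Cys example shown for the ribbon architecture.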

Di of 173,536 CATH domains: 28 h 5 min (average computation time 1.72 s/domain)

Calculations performed on a 6-core 990X CPU based computer

Ala: PDB ID 3CKC (A02), alpha horseshoe

CATH-ADAPT

CATH - atom depth assisted protein tomography

Towards protein folding barcodes

Putting the protein universe in order

towards i-SADIC (implemented SADIC)

H/D exchange rate profiles

2D atom depth or 3D atom depth

data from Pedersen TG, Thomsen NK, Andersen KV, Madsen JC, Poulsen FM. Determination of the rate constants k1 and k2 of the Linderstrom-Lang model for protein amide hydrogen exchange. A study of the individual amides in hen egg-white lysozyme. J Mol Biol. 1993 230(2):651-660.

Dnwi: atom distance from the nearest water molecule

Di,9: atom depth index with a probe of radius 9 Å

iSADIC atom depth, 3D atom depth

$$iD_{i,9} = \frac{a\,D_{i,9} + b\,ASA_i}{c\,D_{i,9} + d\,Dnw_i}$$
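Read as a ratio, with the first line of the slide formula as numerator and the second as denominator (an interpretation of the layout, not something the slide states), the combined index can be sketched as below. The coefficients a, b, c and d are not given in the slides, so the values used here are placeholders chosen only to make the arithmetic easy to follow; the function name `i_depth_index` is hypothetical.

```python
def i_depth_index(d_i9, asa_i, dnw_i, a, b, c, d):
    """(a*Di,9 + b*ASAi) / (c*Di,9 + d*Dnwi), assuming the ratio reading."""
    denominator = c * d_i9 + d * dnw_i
    if denominator == 0:
        raise ZeroDivisionError("c*Di,9 + d*Dnwi is zero for these inputs")
    return (a * d_i9 + b * asa_i) / denominator

# Placeholder coefficients: the index reduces to Di,9 / Dnwi
value = i_depth_index(d_i9=0.5, asa_i=10.0, dnw_i=2.0, a=1.0, b=0.0, c=0.0, d=1.0)
```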

protein-protein interface analysis

biological vs crystallographic interfaces

crystallographic dimers vs biological dimers

ARG atoms: N, CA, C, O, CB, CG, CD, NE, CZ, NH1, NH2, H, HA, HB2, HB3, HG2, HG3, HD2, HD3, HE, HH11, HH12, HH21, HH22

LYS atoms: N, CA, C, O, CB, CG, CD, CE, NZ, H, HA, HB2, HB3, HG2, HG3, HD2, HD3, HE2, HE3, HZ1, HZ2, HZ3