DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION
![Page 1: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/1.jpg)
Andrzej Kloczkowski, Robert L. Jernigan, Zhijun Wu, Guang Song, Lei Yang - Iowa State University, USA
Andrzej Kolinski, Piotr Pokarowski - Warsaw University, Poland
![Page 2: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/2.jpg)
Matrices containing structural information
• Distance matrix $(d_{ij})$
• Matrix of squared distances $D = (d_{ij}^2)$
• Contact matrix $C = (c_{ij})$: $c_{ij} = 1$ if $d_{ij} < d_{\text{cutoff}}$, otherwise $c_{ij} = 0$
• Laplacian of $C$ (Kirchhoff matrix): $L_C = \mathrm{diag}\left(\sum_j c_{ij}\right) - C$
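A minimal NumPy sketch of these four matrices, assuming an (n, 3) array of Cα coordinates in Å and a hypothetical contact cutoff of 7 Å (the slide does not fix a value). Self-contacts are excluded, which does not change $L_C$ because the diagonal $c_{ii}$ cancels in $\mathrm{diag}(\sum_j c_{ij}) - C$.

```python
import numpy as np

def structural_matrices(coords, d_cutoff=7.0):
    """Distance, squared-distance, contact and Kirchhoff matrices for (n, 3) coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))            # (d_ij)
    sq_dist = dist ** 2                                 # D = (d_ij^2)
    contact = (dist < d_cutoff).astype(float)           # c_ij = 1 if d_ij < d_cutoff
    np.fill_diagonal(contact, 0.0)                      # drop self-contacts
    kirchhoff = np.diag(contact.sum(axis=1)) - contact  # L_C = diag(sum_j c_ij) - C
    return dist, sq_dist, contact, kirchhoff

# Hypothetical toy chain: a random walk with 3.8 Å steps between consecutive residues.
rng = np.random.default_rng(0)
steps = rng.normal(size=(20, 3))
steps *= 3.8 / np.linalg.norm(steps, axis=1, keepdims=True)
coords = np.cumsum(steps, axis=0)
dist, sq_dist, C, L_C = structural_matrices(coords)
print(np.allclose(L_C.sum(axis=1), 0.0))  # every row of a Laplacian sums to zero
```

The random walk is only a stand-in for real coordinates; 3.8 Å is the usual Cα-Cα virtual bond length.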
![Page 3: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/3.jpg)
$L_C^{-1}$, the generalized inverse of $L_C$, defines the covariance between fluctuations in elastic network models.
Similarly, we can define the Laplacian of $D$, $L_D$, and its generalized inverse $L_D^{-1}$.
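One way to compute these generalized inverses is a Moore-Penrose pseudoinverse built only from the eigenvectors with nonzero eigenvalues; for a connected network the single zero mode is the uniform vector. The relative tolerance, the toy chain and the 7 Å cutoff are assumptions for illustration. In the GNM, this $L_C^{-1}$ is scaled by $3kT/\gamma$ to give the covariances.

```python
import numpy as np

def laplacian(M):
    """Laplacian of a symmetric matrix with nonnegative entries: diag(row sums) - M."""
    return np.diag(M.sum(axis=1)) - M

def generalized_inverse(L, rel_tol=1e-10):
    """Pseudoinverse sum_k v_k v_k^T / lambda_k over the nonzero eigenvalues of L."""
    w, V = np.linalg.eigh(L)
    keep = w > rel_tol * w.max()          # skip the zero mode(s)
    return (V[:, keep] / w[keep]) @ V[:, keep].T

# Hypothetical toy chain with 3.8 Å steps and an assumed 7 Å contact cutoff.
rng = np.random.default_rng(1)
steps = rng.normal(size=(15, 3))
steps *= 3.8 / np.linalg.norm(steps, axis=1, keepdims=True)
coords = np.cumsum(steps, axis=0)
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
C = (d < 7.0).astype(float)
np.fill_diagonal(C, 0.0)

L_C, L_D = laplacian(C), laplacian(d ** 2)
G_C, G_D = generalized_inverse(L_C), generalized_inverse(L_D)
print(np.allclose(L_C @ G_C @ L_C, L_C), np.allclose(L_D @ G_D @ L_D, L_D))
```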
![Page 4: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/4.jpg)
Spectral decomposition of structural matrices
$A = \sum_k \lambda_k v_k v_k^T$
expresses a symmetric structural matrix $A$ through its eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$.
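A short check of the decomposition with NumPy's symmetric eigensolver. Ordering the terms by $|\lambda_k|$ is a choice made for this sketch, so that truncating the sum keeps the largest terms first.

```python
import numpy as np

def spectral_terms(A):
    """Eigenvalues and eigenvectors of a symmetric matrix, largest |lambda_k| first."""
    w, V = np.linalg.eigh(A)
    order = np.argsort(-np.abs(w))
    return w[order], V[:, order]

def reconstruct(w, V, k=None):
    """Sum of the first k terms lambda_k v_k v_k^T (all terms when k is None)."""
    k = len(w) if k is None else k
    return (V[:, :k] * w[:k]) @ V[:, :k].T

rng = np.random.default_rng(3)
B = rng.normal(size=(6, 6))
A = B + B.T                              # any symmetric matrix
w, V = spectral_terms(A)
print(np.allclose(reconstruct(w, V), A))
```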
![Page 5: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/5.jpg)
Spectral decomposition of a squared-distance matrix
The spectral decomposition of a squared-distance matrix is a complete and simple description of a system of points. For points in three dimensions it has at most 5 nonzero, interpretable terms:
the dominant eigenvector is proportional to $r^2$, the squared distance of the points to the center of mass, and the next three are the principal components of the system of points.
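The bound of 5 terms follows from $d_{ij}^2 = r_i^2 + r_j^2 - 2\,x_i \cdot x_j$ with $x_i$ measured from the center of mass: $D$ is two rank-one terms plus a rank-three Gram matrix. The sketch below counts the nonzero eigenvalues for a hypothetical random point cloud (equal masses assumed) and reports the cosine between the dominant eigenvector and $r^2$; it is an illustration, not the authors' analysis.

```python
import numpy as np

# Hypothetical point cloud standing in for Calpha coordinates (equal masses assumed).
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3)) * [10.0, 6.0, 3.0]
Xc = X - X.mean(axis=0)                   # coordinates relative to the center of mass
r2 = (Xc ** 2).sum(axis=1)                # squared distance of each point to the center

# Squared-distance matrix written as two rank-one terms plus a rank-three Gram matrix.
D = r2[:, None] + r2[None, :] - 2.0 * Xc @ Xc.T

w, V = np.linalg.eigh(D)
order = np.argsort(-np.abs(w))
w, V = w[order], V[:, order]

nonzero = int(np.sum(np.abs(w) > 1e-8 * np.abs(w).max()))
cos = abs(V[:, 0] @ r2) / np.linalg.norm(r2)   # eigenvector sign is arbitrary, hence abs
print(nonzero, round(cos, 3))
```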
![Page 6: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/6.jpg)
![Page 7: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/7.jpg)
![Page 8: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/8.jpg)
![Page 9: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/9.jpg)
![Page 10: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/10.jpg)
CN – contact number
PECM – principal eigenvector of the contact matrix
GNM – fluctuations of residues computed from the Gaussian Network Model (Bahar et al. 1997)
SVR – Support Vector Regression – variant of SVM for continuous variables
B-factor – temperature factor from X-ray crystallography
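A sketch of three of these per-residue profiles computed from coordinates, assuming a hypothetical 7 Å contact cutoff and a connected contact network. The GNM values are the diagonal of the generalized inverse of the Kirchhoff matrix, so they omit the factor $3kT/\gamma$. B-factors would be read from the PDB file and SVR predictions from a trained model; neither is modeled here.

```python
import numpy as np

def residue_profiles(coords, cutoff=7.0):
    """CN, PECM and GNM profiles for (n, 3) coordinates; cutoff is an assumed value."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = (d < cutoff).astype(float)
    np.fill_diagonal(C, 0.0)
    cn = C.sum(axis=1)                             # CN: contacts per residue

    _, V = np.linalg.eigh(C)
    pecm = V[:, -1]                                # eigenvector of the largest eigenvalue
    pecm = pecm * np.sign(pecm.sum())              # sign is arbitrary; make it positive on balance

    lam, U = np.linalg.eigh(np.diag(cn) - C)       # Kirchhoff matrix
    lam, U = lam[1:], U[:, 1:]                     # drop the zero mode (connected network assumed)
    gnm = (U ** 2 / lam).sum(axis=1)               # diagonal of the generalized inverse
    return cn, pecm, gnm

# Hypothetical toy chain with 3.8 Å steps.
rng = np.random.default_rng(5)
steps = rng.normal(size=(30, 3))
steps *= 3.8 / np.linalg.norm(steps, axis=1, keepdims=True)
cn, pecm, gnm = residue_profiles(np.cumsum(steps, axis=0))
```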
![Page 11: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/11.jpg)
B-factors correlate with $r^2$, the squared distance from the center of mass (Petsko 1980)
Residue fluctuations correlate with the inverse of the residue contact number (Halle 2002)
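Both relationships can be quantified with a Pearson correlation coefficient. In this sketch `fluct` stands for any per-residue fluctuation profile (for example B-factors), the 7 Å cutoff is assumed, the centroid stands in for the center of mass, and every residue must have at least one contact so that $1/\mathrm{CN}$ is defined.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient of two 1-D profiles."""
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def profile_correlations(coords, fluct, cutoff=7.0):
    """Correlation of a fluctuation profile with r^2 and with 1/CN."""
    r2 = ((coords - coords.mean(axis=0)) ** 2).sum(axis=1)   # squared distance to the centroid
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cn = (d < cutoff).sum(axis=1) - 1                         # remove the self-contact on the diagonal
    return pearson(fluct, r2), pearson(fluct, 1.0 / cn)
```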
![Page 12: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/12.jpg)
Approximation of distance matrices
• $A = \sum_k \lambda_k v_k v_k^T$
• We used a nonredundant set of 680 structures from the ASTRAL database
• $r^2$ alone approximates the structures with a DRMS of 7.3 Å
• $r^2$ combined with the first principal component approximates the structures with a DRMS of 4.0 Å
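One way such approximations can be scored: keep the $k$ spectral terms of $D$ with the largest $|\lambda_k|$, convert back to distances, and compute the DRMS over all pairs $i < j$. Ranking terms by $|\lambda_k|$ and clipping negative squared distances to zero are assumptions of this sketch and may not match how the authors selected the $r^2$ and principal-component terms.

```python
import numpy as np

def drms(d1, d2):
    """Distance RMS deviation over all pairs i < j."""
    iu = np.triu_indices_from(d1, k=1)
    return float(np.sqrt(np.mean((d1[iu] - d2[iu]) ** 2)))

def truncated_distances(D, k):
    """Distances from the k terms of the squared-distance matrix D with the largest |lambda|."""
    w, V = np.linalg.eigh(D)
    order = np.argsort(-np.abs(w))[:k]
    Dk = (V[:, order] * w[order]) @ V[:, order].T
    return np.sqrt(np.clip(Dk, 0.0, None))        # negative squared distances clipped to 0

rng = np.random.default_rng(7)
X = rng.normal(size=(40, 3)) * [8.0, 5.0, 3.0]    # hypothetical point cloud
d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
for k in range(1, 6):
    print(k, round(drms(d, truncated_distances(d ** 2, k)), 3))
```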
![Page 13: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/13.jpg)
Current work:
Prediction of $r^2$ from the sequence with SVR
Prediction of the first structural component from the sequence
![Page 14: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/14.jpg)
Principal Component Analysis of Multiple HIV-1 Protease Structures
• We analysed 164 X-ray and 28 NMR structures from the PDB, together with 10,000 snapshots from Molecular Dynamics simulations.
• Principal Component Analysis was performed on each of these three datasets.
• The results were compared with normal modes computed from the Anisotropic Network Model, an Elastic Network Model that accounts for the anisotropy of residue fluctuations in proteins.
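A minimal sketch of ANM normal modes: every pair of residues closer than a cutoff is joined by a spring, giving a $3n \times 3n$ Hessian whose six zero eigenvalues correspond to rigid-body motions. The 15 Å cutoff and unit spring constant are assumed values, not taken from the slides.

```python
import numpy as np

def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3n x 3n ANM Hessian for (n, 3) coordinates."""
    n = len(coords)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            rij = coords[j] - coords[i]
            d2 = rij @ rij
            if d2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(rij, rij) / d2      # off-diagonal 3x3 super-element
            H[3*i:3*i+3, 3*j:3*j+3] = block
            H[3*j:3*j+3, 3*i:3*i+3] = block
            H[3*i:3*i+3, 3*i:3*i+3] -= block              # diagonal blocks make each block row sum to zero
            H[3*j:3*j+3, 3*j:3*j+3] -= block
    return H

def anm_modes(coords, cutoff=15.0, gamma=1.0):
    """Eigenvalues/eigenvectors without the six rigid-body modes (rigid network assumed)."""
    w, V = np.linalg.eigh(anm_hessian(coords, cutoff, gamma))
    return w[6:], V[:, 6:]

rng = np.random.default_rng(8)
coords = rng.normal(size=(30, 3)) * 5.0   # hypothetical coordinates
w, V = anm_modes(coords)
slowest = V[:, 0].reshape(-1, 3)          # per-residue displacement vectors of the slowest mode
```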
![Page 15: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/15.jpg)
The α-carbon trace of the HIV-1 structure
![Page 16: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/16.jpg)
Elastic network models
Rubber elasticity (polymers - Flory)
Intrinsic motions of structures (Tirion 1996)
Simple elastic networks of uniform material
Appropriate for the largest, most important domain motions of proteins, independent of many structural details
High-resolution structures are not needed to learn about important motions
Rubbery Bodies with Well Defined, Highly Controlled Motions
![Page 17: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/17.jpg)
Elastic Network Models
Calculating Protein Position Fluctuations
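$$V_{\text{tot}}(t) = \frac{\gamma}{2}\,\mathrm{tr}\left[\Delta R(t)^T\,\Gamma\,\Delta R(t)\right]$$
$$\langle \Delta R_i \cdot \Delta R_j \rangle = \frac{1}{Z_N}\int (\Delta R_i \cdot \Delta R_j)\,\exp\{-V_{\text{tot}}/kT\}\,d\{\Delta R\} = \frac{3kT}{\gamma}\left[\Gamma^{-1}\right]_{ij}$$
$\Gamma$ = Kirchhoff matrix of contacts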
Compute Normal Modes for Fluctuations and Correlations
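Following the formula above, the normal modes are the eigenvectors of $\Gamma$, and fluctuations and correlations are sums over the nonzero modes. This sketch assumes a connected contact network (one zero eigenvalue) and a hypothetical 7 Å cutoff, and it reports results in units of $kT/\gamma$; passing `n_modes` keeps only the slowest modes.

```python
import numpy as np

def gnm_modes(coords, cutoff=7.0):
    """Nonzero eigenvalues (ascending) and eigenvectors of the Kirchhoff matrix Gamma."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = (d < cutoff).astype(float)
    np.fill_diagonal(C, 0.0)
    gamma_mat = np.diag(C.sum(axis=1)) - C
    lam, U = np.linalg.eigh(gamma_mat)
    return lam[1:], U[:, 1:]                 # skip the zero mode (connected network assumed)

def gnm_covariance(lam, U, kT_over_gamma=1.0, n_modes=None):
    """<dR_i . dR_j> = (3kT/gamma) sum_k u_k u_k^T / lambda_k over the slowest n_modes."""
    k = len(lam) if n_modes is None else n_modes
    return 3.0 * kT_over_gamma * (U[:, :k] / lam[:k]) @ U[:, :k].T

def normalized_correlation(cov):
    """Cross-correlations C_ij / sqrt(C_ii C_jj)."""
    s = np.sqrt(np.diag(cov))
    return cov / np.outer(s, s)

# Hypothetical toy chain with 3.8 Å steps.
rng = np.random.default_rng(9)
steps = rng.normal(size=(20, 3))
steps *= 3.8 / np.linalg.norm(steps, axis=1, keepdims=True)
lam, U = gnm_modes(np.cumsum(steps, axis=0))
msf = np.diag(gnm_covariance(lam, U))                # mean-square fluctuations, all modes
slow = np.diag(gnm_covariance(lam, U, n_modes=1))    # contribution of the slowest mode only
```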
![Page 18: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/18.jpg)
HIV Reverse Transcriptase – Slowest Motion
Push-pull Hinge
![Page 19: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/19.jpg)
Modes of Motion – HIV Protease
Mode 1 Mode 2 Mode 3
Three Ways to Open the Flaps
![Page 20: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/20.jpg)
NMR Structures Fit Elastic Networks Better than X-Ray Structures
Results for 164 X-ray and 28 NMR HIV Protease Structures
HIV Protease: overlaps between directions of motions
(dot products of vectors)
Includes Many Drug-Bound Structures
Distortions for Drug Binding Are Intrinsic to Protein Structure
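The overlaps are absolute dot products between unit direction vectors. This sketch extracts principal directions from an ensemble and compares them with any other set of directions (for example ANM modes, or the PCs of a second ensemble). It assumes the structures have already been superimposed onto a common frame, which PCA of coordinates requires; the synthetic ensemble is hypothetical.

```python
import numpy as np

def ensemble_pca(ensemble):
    """PCA of an (m, n, 3) array of m superimposed structures."""
    X = ensemble.reshape(len(ensemble), -1)
    X = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s ** 2 / (len(X) - 1), Vt          # variances (descending), unit directions as rows

def overlap_matrix(P, Q):
    """|dot products| between the rows of P and the rows of Q after normalizing them."""
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    return np.abs(P @ Q.T)

# Hypothetical ensemble: one collective motion plus small noise, split into two halves.
rng = np.random.default_rng(10)
m, n = 200, 10
mean = rng.normal(size=(n, 3)) * 5.0
v = rng.normal(size=3 * n)
v /= np.linalg.norm(v)
ens = mean + (rng.normal(scale=5.0, size=m)[:, None] * v).reshape(m, n, 3)
ens = ens + rng.normal(scale=0.1, size=ens.shape)
_, pcs_a = ensemble_pca(ens[:100])
_, pcs_b = ensemble_pca(ens[100:])
print(np.round(overlap_matrix(pcs_a[:3], pcs_b[:3]), 2))
```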
![Page 21: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/21.jpg)
Cumulative Overlaps with NMR Motions
NMR Agreement Better than X-ray
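The slides do not define the cumulative overlap; a common definition, assumed here, is $CO(k) = \sqrt{\sum_{j \le k} (\hat{p} \cdot m_j)^2}$ for a unit direction $\hat{p}$ (for example a PC of the NMR ensemble) and orthonormal modes $m_j$. It grows with $k$ and reaches 1 when the modes span the whole space.

```python
import numpy as np

def cumulative_overlap(p, modes, k):
    """CO(k) for direction p and orthonormal mode vectors stored in the columns of `modes`."""
    p = p / np.linalg.norm(p)
    return float(np.sqrt(np.sum((p @ modes[:, :k]) ** 2)))

# Hypothetical example: a random direction against a random orthonormal basis.
rng = np.random.default_rng(11)
Q, _ = np.linalg.qr(rng.normal(size=(12, 12)))
p = rng.normal(size=12)
print([round(cumulative_overlap(p, Q, k), 2) for k in (1, 3, 6, 12)])
```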
![Page 22: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/22.jpg)
Structural Refinement Using Distribution of Distances
• We have developed a method of refining NMR structures using derived distance constraints and mean-force potentials.
• The original NMR experimental constraints for the structures were downloaded from BioMagResBank.
• The structures were refined using the default dynamic simulated annealing protocol implemented in the CNS software (Brunger et al., Yale University).
• We also added mean-force potentials, $E = -kT \ln P(r)$, to the energy function of the NMR modeling software CNS. After refinement with these database-derived mean-force potentials, the structures improved significantly (in terms of RMSD, energy, NOEs, etc.).
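The potential $E = -kT \ln P(r)$ is a Boltzmann inversion of an observed distance distribution. This sketch estimates $P(r)$ by histogramming database distances; the binning, the pseudocount for empty bins and $kT \approx 0.593$ kcal/mol (about 298 K) are assumptions, no reference state is subtracted, and the output is not in CNS input format.

```python
import numpy as np

def mean_force_potential(distances, bins, kT=0.593, pseudocount=1e-6):
    """Bin centers and E(r) = -kT ln P(r) from a histogram of observed distances."""
    counts, edges = np.histogram(distances, bins=bins)
    p = counts / counts.sum() + pseudocount       # pseudocount keeps log finite in empty bins
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, -kT * np.log(p)

# Hypothetical distances for one atom-pair type, in Å.
rng = np.random.default_rng(12)
sample = rng.normal(loc=6.0, scale=1.0, size=5000)
centers, E = mean_force_potential(sample, np.linspace(2.0, 10.0, 33))
```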
![Page 23: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/23.jpg)
CASPR 2006
• We have successfully used this method in the CASPR 2006 structure refinement experiment.
• The figure below shows the application of our method to a model of 1WHZ (70 residues): the model was refined from 2.19 Å to 1.80 Å.
![Page 24: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/24.jpg)
Distance Intervals
The distances are given with their possible ranges. Find all $x_j$ such that
$$l_{i,j} \le \|x_i - x_j\| \le u_{i,j}, \quad (i,j) \in S.$$
NP-hard!
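Because the exact problem is NP-hard, a common practical fallback (not necessarily the authors' approach) is to minimize a penalty for interval violations from a starting guess. This sketch uses plain gradient descent with an assumed step size and iteration count; it can stop in a local minimum and does not enumerate all solutions.

```python
import numpy as np

def interval_violation(X, pairs, lower, upper):
    """Sum of squared violations of l_ij <= ||x_i - x_j|| <= u_ij over the pairs in S."""
    d = np.linalg.norm(X[pairs[:, 0]] - X[pairs[:, 1]], axis=1)
    below = np.clip(lower - d, 0.0, None)
    above = np.clip(d - upper, 0.0, None)
    return float(np.sum(below ** 2 + above ** 2))

def violation_gradient(X, pairs, lower, upper):
    i, j = pairs[:, 0], pairs[:, 1]
    diff = X[i] - X[j]
    d = np.linalg.norm(diff, axis=1)
    # derivative of the penalty with respect to each distance d_ij
    g_d = -2.0 * np.clip(lower - d, 0.0, None) + 2.0 * np.clip(d - upper, 0.0, None)
    g_pair = (g_d / np.maximum(d, 1e-12))[:, None] * diff   # chain rule: dd/dx_i = diff / d
    G = np.zeros_like(X)
    np.add.at(G, i, g_pair)
    np.add.at(G, j, -g_pair)
    return G

def fit_intervals(X0, pairs, lower, upper, step=0.002, iters=3000):
    """Gradient descent on the violation penalty (a local heuristic)."""
    X = X0.copy()
    for _ in range(iters):
        X -= step * violation_gradient(X, pairs, lower, upper)
    return X
```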
![Page 25: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/25.jpg)
A Generalized Distance Geometry Problem
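$$\max_{x,\,r} \sum_{j=1}^{n} r_j^3$$
subject to
$$\|x_i - x_j\| + r_i + r_j \le u_{i,j},$$
$$\|x_i - x_j\| - r_i - r_j \ge l_{i,j}, \quad (i,j) \in S$$
(Figure: atoms $i$ and $j$ drawn as spheres of radii $r_i$ and $r_j$ at distance $d_{i,j}$; the radii represent root mean square fluctuations, related to B-factors.)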
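A small helper for checking a candidate solution $(x, r)$ of the problem above: it returns the objective $\sum_j r_j^3$ and the largest constraint violation (0 for a feasible point). The function name and the representation of $S$ as an array of index pairs are illustrative assumptions.

```python
import numpy as np

def gdg_objective_and_violation(X, r, pairs, lower, upper):
    """Objective sum_j r_j^3 and the worst violation of the two sphere constraints."""
    i, j = pairs[:, 0], pairs[:, 1]
    d = np.linalg.norm(X[i] - X[j], axis=1)
    upper_gap = d + r[i] + r[j] - upper      # > 0: ||x_i - x_j|| + r_i + r_j exceeds u_ij
    lower_gap = lower - (d - r[i] - r[j])    # > 0: ||x_i - x_j|| - r_i - r_j falls below l_ij
    worst = max(0.0, float(upper_gap.max()), float(lower_gap.max()))
    return float(np.sum(r ** 3)), worst
```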
![Page 26: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/26.jpg)
Data generation:
$f_i$: the rms fluctuation of atom $i$
$S = \{(i,j) : d_{i,j} = \|y_i - y_j\| < 5~\text{Å}\}$
$l_{i,j} = d_{i,j} - f_i - f_j$
$u_{i,j} = d_{i,j} + f_i + f_j$
Problem solved:
$r_i$: the fluctuation radius of atom $i$
$\max_{x,\,r} \sum_i r_i^3$, with $d_{i,j} = \|x_i - x_j\|$, subject to
$l_{i,j} \le d_{i,j} - r_i - r_j$
$u_{i,j} \ge d_{i,j} + r_i + r_j, \quad (i,j) \in S$
Original vs. computed structure (figure): $\mathrm{RMSD}(x, y) = 3.6 \times 10^{-7}$
Protein 1AX8, 1017 atoms
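The data-generation step can be sketched as follows, assuming `y` holds the original coordinates in Å and `f` the rms fluctuations (the function name is hypothetical). By construction, the original pair $(y, f)$ satisfies every constraint with equality, so it is a feasible point of the problem solved.

```python
import numpy as np

def make_instance(y, f, cutoff=5.0):
    """Index pairs S and bounds l, u from coordinates y (n, 3) and rms fluctuations f (n,)."""
    i, j = np.triu_indices(len(y), k=1)
    d = np.linalg.norm(y[i] - y[j], axis=1)
    keep = d < cutoff                          # S = {(i, j) : d_ij < 5 Å}
    i, j, d = i[keep], j[keep], d[keep]
    lower = d - f[i] - f[j]                    # l_ij = d_ij - f_i - f_j
    upper = d + f[i] + f[j]                    # u_ij = d_ij + f_i + f_j
    return np.column_stack([i, j]), lower, upper

# Hypothetical coordinates and fluctuations.
rng = np.random.default_rng(15)
y = rng.normal(size=(40, 3)) * 4.0
f = rng.uniform(0.05, 0.25, size=40)
pairs, lower, upper = make_instance(y, f)
```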
![Page 27: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/27.jpg)
(Figure: Atomic Fluctuations; original $f_i$ and computed $r_i$ for each atom of 1AX8.)
![Page 28: DISTANCE MATRIX-BASED APPROACH TO PROTEIN STRUCTURE PREDICTION](https://reader036.fdocuments.us/reader036/viewer/2022062305/5681596e550346895dc6afec/html5/thumbnails/28.jpg)
Acknowledgments:
• NIH support:
• 1R01GM081680-01 (AKlo)
• 1R01GM073095-01A2 (RLJ)
• 1R01GM072014-01 (RLJ)