Correlated Movements in Proteins:
Insights from the Dynamics of Protein Structure Networks
Saraswathi Vishveshwara
Molecular Biophysics Unit
JNC, Nov 15, 2007
Indian Institute of Science
Bangalore, India
Outline
1. Correlated fluctuations from protein dynamics
a) residue-wise RMSD
b) Pair-wise cross correlations
c) Important modes from essential dynamics
2. Protein structure Networks
3. Dynamics of networks:
Results based on the equilibrium dynamics of
the protein: Methionine tRNA synthetase
Essential modes- effect of ligand binding
Cross-correlation maps
Communication pathways
Depend crucially upon the folded states of the proteins.
Biological Functions of Proteins
Protein Structure to Function
Biological
multimeric
states
Disease states
mutations
Active sites, enzyme clefts
Antigenic sites
Surface properties
3D STRUCTURE
Protein-Ligand
Interactions
While the rigidity is required to maintain the structure, flexibility is needed to perform the function
Flexibility and correlated motions from MD simulations
Residue-wise RMSD- fluctuations of individual residues
Cross correlation map- correlations between residues
Essential modes- global motions in the protein
Simulation on Eosinophil Cationic Protein (ECP)
Sanjeev & Vishveshwara, JBSD (2004, 2005)
RMSD Trajectories
RMSD as a function of simulation time
continuous line: w.r.t <MD>
broken line: w.r.t crystal structure
RMSD as a function of residue number
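The two RMSD profiles named above (versus simulation time and versus residue number) can be sketched as follows. This is a minimal illustration, assuming one position per residue and frames that have already been superposed on the reference; the function names and the toy trajectory are made up for the example.

```python
import numpy as np

def rmsd_vs_time(traj, ref):
    """RMSD of every frame from a reference structure.

    traj: (T, N, 3) array of already-superposed coordinates (assumption).
    ref:  (N, 3) reference, e.g. the crystal structure or the MD average.
    """
    diff = traj - ref
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))

def rmsd_per_residue(traj):
    """RMS deviation of each residue from its own MD-average position."""
    diff = traj - traj.mean(axis=0)
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=0))

# Toy trajectory: 2 frames, 2 residues; residue 0 is fixed, residue 1 moves.
traj = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                 [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]])
md_average = traj.mean(axis=0)
time_profile = rmsd_vs_time(traj, md_average)
residue_profile = rmsd_per_residue(traj)
```

Passing the crystal structure instead of `md_average` as `ref` gives the second curve described in the legend.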
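Pair-wise Cross Correlation
\[ C_{ij} = \frac{\left\langle \left( r_i - \langle r_i \rangle \right) \left( r_j - \langle r_j \rangle \right) \right\rangle}{\sqrt{\left( \langle r_i^2 \rangle - \langle r_i \rangle^2 \right) \left( \langle r_j^2 \rangle - \langle r_j \rangle^2 \right)}} \]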
Cij – the cross correlation between the residues i and j
ri, rj the coordinates at a given time point,
<ri>, <rj> are the averages over the trajectory
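A cross correlation map of this kind can be computed directly from a trajectory array. The sketch below assumes one position vector per residue (for example the C-alpha atom) and superposed frames, and treats the product of displacements as a dot product; these choices and the function name are assumptions, not details given on the slide.

```python
import numpy as np

def cross_correlation_map(traj):
    """Pair-wise cross correlations C_ij from a trajectory.

    traj: (T, N, 3) array, one position per residue, frames superposed.
    Residues with zero fluctuation would give a division by zero.
    """
    disp = traj - traj.mean(axis=0)                 # r_i(t) - <r_i>
    n_frames = traj.shape[0]
    # <(r_i - <r_i>) . (r_j - <r_j>)>, averaged over the trajectory
    cov = np.einsum('tia,tja->ij', disp, disp) / n_frames
    msf = np.diag(cov)                              # <r_i^2> - <r_i>^2
    return cov / np.sqrt(np.outer(msf, msf))

# Toy data: residues 0 and 1 move together, residue 2 moves the opposite way.
traj = np.array([[[0, 0, 0], [0, 0, 0], [5, 0, 0]],
                 [[1, 0, 0], [2, 0, 0], [4, 0, 0]],
                 [[2, 0, 0], [4, 0, 0], [3, 0, 0]]], dtype=float)
C = cross_correlation_map(traj)
```

Values near +1 indicate correlated motion and values near -1 anti-correlated motion.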
Collective modes from Essential Dynamics
Molecule of N atoms has 3N degrees of freedom. Essential dynamics is a way of capturing important collective modes
Covariance matrix (M=[mij])
mij=(1/S) Σt(xi(t)-<xi>)(xj(t)-<xj>)
where S = total number of configurations
t = time in picoseconds
i = ith coordinate (i=1,2,...,3N) ('N' being number of atoms)
<xi>= average value of xi over all configurations
Find Eigenvalues and normalized Eigenvectors of Covariance matrix
Reference:
A. Amadei, A.B. Linssen, H.J. Berendsen, Proteins: Struct. Func. Gen., 17, 412 (1993).
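The covariance and diagonalization steps above can be sketched with NumPy. The function name and the synthetic data are assumptions for illustration; `np.linalg.eigh` is used because the covariance matrix is symmetric, and it returns unit-norm eigenvectors.

```python
import numpy as np

def essential_modes(coords):
    """Eigenvalues and eigenvectors of the covariance matrix M = [m_ij].

    coords: (S, 3N) array, one flattened configuration per row.
    Returns eigenvalues in descending order and matching eigenvectors
    as columns.
    """
    dev = coords - coords.mean(axis=0)          # x_i(t) - <x_i>
    M = dev.T @ dev / coords.shape[0]           # m_ij, a 3N x 3N matrix
    eigvals, eigvecs = np.linalg.eigh(M)        # ascending order
    order = np.argsort(eigvals)[::-1]           # largest-amplitude mode first
    return eigvals[order], eigvecs[:, order]

# Synthetic data: motion almost entirely along the direction (1, 1, 0).
rng = np.random.default_rng(0)
amplitude = rng.normal(size=(500, 1))
coords = amplitude * np.array([1.0, 1.0, 0.0]) + 0.01 * rng.normal(size=(500, 3))
eigvals, eigvecs = essential_modes(coords)
```

The first column of `eigvecs` then points along the dominant collective motion.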
Relative positional fluctuation :
\[ \mathrm{RPF}(n) = \frac{\sum_{i=1}^{n} \lambda(i)}{\sum_{i=1}^{3N} \lambda(i)} \]
where λ(i) is the ith eigenvalue.
RPF(n) gives the fraction of the motion contained in the
subspace spanned by the first n eigenvectors.
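As a small worked example, RPF is a cumulative sum of the eigenvalues, largest first, divided by their total. The function name and the three eigenvalues are made up for illustration.

```python
import numpy as np

def relative_positional_fluctuation(eigenvalues):
    """Return RPF(1), RPF(2), ... for eigenvalues sorted largest first."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return np.cumsum(lam) / lam.sum()

# Made-up eigenvalues; rpf[n - 1] holds RPF(n).
rpf = relative_positional_fluctuation([1.0, 5.0, 4.0])
```

Here the top mode alone carries half of the total fluctuation and the top two carry 90%.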
Rossmann fold (RED): CP (Green): Anticodon domain (Navy Blue):
Structure of Methionine tRNA Synthetase
Molecular Dynamics (MD) simulations on native and ligand-bound methionine t-RNA synthetase
1. MetRS (550 amino acids)
2. MetRS:Met complex (ligand: methionine)
3. MetRS:Met AMP complex (ligand: methionyl adenylate)
(Amit Ghosh and SV -unpublished)
RMSD Plots
[Figure: RMSD versus time (ns) for EcMetRS, EcMetRS:Met and EcMetRS:Met AMP]
Top ten modes account for about 72% of the motion in essential space
RPF Plots
Essential Dynamics
Superposition of the top modes
free (MetRS): Opening mode (red)
ligand-bound MetRS (MetRS:Met AMP): Closing mode (blue)
Essential Dynamics
captures the global movements in proteins, such as inter-domain motions, hinge bending etc.
Dynamical cross correlation map representing inter-residue cross correlations
MetRS+tRNA+MetAMP
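\[ C_{ij} = \frac{\left\langle \left( r_i - \langle r_i \rangle \right) \left( r_j - \langle r_j \rangle \right) \right\rangle}{\sqrt{\left( \langle r_i^2 \rangle - \langle r_i \rangle^2 \right) \left( \langle r_j^2 \rangle - \langle r_j \rangle^2 \right)}} \]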
Protein structures as Graphs
Protein Graphs
• Nodes : Amino acid Residues
• Edges : Spatial neighbours
within fixed distance
Main Chain Interaction (back bone level )
S.M. Patra, Kannan, Vishveshwara, Biophys. Chem. (2000); JMB (1999)
High and low contact criteria. A pair of phenylalanine rings interacting with each other are shown. The lines between the phenylalanines indicate the atoms that are within a distance of 4.5Å.
Side Chain Interaction
a) High contact (Iij = 11%)
b) Low contact (Iij = 4%)
c) No interaction (Iij = 0%)
Iij = (nij / √(Ni × Nj)) × 100
Imin is user defined interaction cutoff.
An (ij) residue pair with Iij > Imin is connected by an edge.
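The edge criterion can be sketched in a few lines. The slide does not define the symbols, so the reading used here is an assumption: nij is taken as the number of side-chain atom pairs of residues i and j within 4.5 Å, and Ni, Nj as normalization values for the two residue types. The numbers in the example are hypothetical.

```python
import math

def interaction_strength(n_ij, N_i, N_j):
    """Iij in percent: (nij / sqrt(Ni * Nj)) * 100."""
    return n_ij / math.sqrt(N_i * N_j) * 100.0

def has_edge(n_ij, N_i, N_j, I_min):
    """An (i, j) pair is connected when Iij exceeds the user-defined Imin."""
    return interaction_strength(n_ij, N_i, N_j) > I_min

# Hypothetical pair: 8 atom pairs in contact, normalization values of 100 each.
strength = interaction_strength(8, 100, 100)
edge_at_6 = has_edge(8, 100, 100, I_min=6.0)
edge_at_10 = has_edge(8, 100, 100, I_min=10.0)
```

Raising Imin therefore removes the weaker contacts and leaves a sparser graph.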
Backbone-based versus the Side-chain-based
Protein Structure Graphs
Backbone-based (coarse grained)
Based on C-alpha-C-alpha distance
The extent of side-chain interaction is not considered
Side-chain-based
The interactions between sidechains are quantified, hence a weighted graph can be constructed or graphs can be constructed on the basis of the strength of interaction
Graph spectral parameters
Provide information on the clusters of interacting residues
Detect cluster centres, which play a crucial role in the integrity of the cluster
Centre of a Graph
E(v) = max d(v, vi), taken over all vi ∈ G
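The eccentricity E(v) can be computed with a breadth-first search. The sketch below assumes an unweighted, connected graph stored as a neighbour dictionary, and takes the centre to be the node(s) of minimum eccentricity, which is the standard graph-theory definition; the example graph is made up.

```python
from collections import deque

def eccentricity(graph, v):
    """Largest shortest-path distance (in edges) from v to any other node.

    graph: dict mapping node -> iterable of neighbours; assumed connected.
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

def graph_centre(graph):
    """Nodes whose eccentricity is the smallest in the graph."""
    ecc = {v: eccentricity(graph, v) for v in graph}
    smallest = min(ecc.values())
    return [v for v in graph if ecc[v] == smallest]

# A five-node chain 1-2-3-4-5: the middle node is the centre.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
centre = graph_centre(chain)
```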
Clusters (at interaction
strength Imin =6%)
in Ornithine Decarboxylase
Advantages of Graph Spectral Analysis
• Solution to weighted graph
• Identification of Cluster Centre
Construction of Graphs and Networks
[Flowchart labels: Interacting biomolecular residues; Adjacency Matrix; PSG: Side chain Clusters (DFS); PSN: Size of largest cluster, Degree distribution, Hubs]
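Side-chain clusters can be read off an adjacency matrix with a depth-first search (DFS). This is a minimal sketch; the function name and the five-node example are illustrative assumptions, and isolated residues come out as clusters of size one.

```python
import numpy as np

def clusters_dfs(adjacency):
    """Connected clusters of a graph, largest first, found by iterative DFS."""
    A = np.asarray(adjacency)
    n = A.shape[0]
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        stack, members = [start], []
        seen[start] = True
        while stack:
            u = stack.pop()
            members.append(u)
            for w in np.flatnonzero(A[u]):      # neighbours of u
                if not seen[w]:
                    seen[w] = True
                    stack.append(int(w))
        clusters.append(sorted(members))
    return sorted(clusters, key=len, reverse=True)

# Made-up graph: residues 0-1-2 form one cluster, 3-4 another.
A = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1
clusters = clusters_dfs(A)
largest_cluster_size = len(clusters[0])
```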
Hubs in Protein Structure
Networks
• Hubs: highly connected amino acids in the protein structure.
• Identified as amino acids having a contact number of >= 4 at a given Imin.
[Figure residue labels: Arg, Pro, His, Ser, Asp, Arg]
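With an adjacency matrix built at a given Imin, the hub criterion (contact number >= 4) and the count of nodes with k links reduce to row sums. The function names and the star-shaped example graph are assumptions for illustration.

```python
import numpy as np

def find_hubs(adjacency, min_contacts=4):
    """Indices of residues whose contact number is at least min_contacts."""
    degree = np.asarray(adjacency).sum(axis=1)
    return np.flatnonzero(degree >= min_contacts).tolist()

def degree_distribution(adjacency):
    """Entry k of the result is the number of nodes with exactly k links."""
    degree = np.asarray(adjacency).sum(axis=1).astype(int)
    return np.bincount(degree).tolist()

# Made-up star graph: residue 0 contacts residues 1 to 4.
A = np.zeros((5, 5), dtype=int)
for j in range(1, 5):
    A[0, j] = A[j, 0] = 1
hubs = find_hubs(A)
distribution = degree_distribution(A)
```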
Applications of Graph Spectral Analysis
to Protein structures
A) Clusters of Importance:
a) Active/Binding site
b) Domain identification
c) Determinant of thermal stability-Aromatic clusters
d) Protein-Protein interaction surfaces
B) Cluster centers:
Identification of hot spots, Signature motifs of oligomerization
A case study
Oligomerization in Legume lectins
Brinda, Surolia, Vishveshwara, 2005, 2006
Legume Lectins:
High sequence similarity
Similar 3-D structures
Diverse Quaternary Associations
Interface clusters at different
types of dimeric interfaces
Interface clusters
Different colors represent residues from different chains (note: the residues are not sequential, hence the signature of the association type cannot be obtained from sequence analysis)
Cluster Center
The position of each residue in the cluster (interface graph) is obtained from graph spectral analysis. The residues with the highest rank are the cluster centers (hot spots). They may be targets for mutational experiments
Pentraxin is yet another quaternary structure of a lectin
Protein-Protein Interaction Networks
Strong Interface
Centre of interface cluster
Weak Interface
5-aminolevulinic acid dehydratase (1B4K)
Urate Oxidase (1UOX)
Brinda, Vishveshwara, BMC Bioinformatics, 2005
(Analysis of a large dataset)
References on Protein structure networks
Vendruscolo, M., E. Paci, C. M. Dobson, and M. Karplus. 2001. Three key residues form a critical contact network in a protein folding transition state. Nature 409:641-645.
Vendruscolo, M., N. V. Dokholyan, E. Paci, and M. Karplus. 2002. Small-world view of the amino acids that play a key role in protein folding. Phys. Rev. E 65:061910.
Dokholyan, N. V., L. Li, F. Ding, and E. I. Shakhnovich. 2002. Topological determinants of protein folding. Proc. Natl. Acad. Sci. USA 99:8637-8641.
Greene, L. H., and V. A. Higman. 2003. Uncovering network systems within protein structures. J. Mol. Biol. 334:781-791.
Amitai, G., A. Shemesh, E. Sitbon, M. Shklar, D. Netanely, I. Venger, and S. Pietrokovski. 2004. Network analysis of protein structures identifies functional residues. J. Mol. Biol. 344:1135-1146.
Atilgan, A. R., P. Akan, and C. Baysal. 2004. Small-world communication of residues and significance for protein dynamics. Biophys. J. 86:85-91.
Bagler, G., and S. Sinha. 2005. Network properties of protein structures. Physica A 346:27-33.
Brinda, K. V., and S. Vishveshwara. 2005. A Network Representation of Protein Structures: Implications for Protein Stability. Biophysical Journal, 6, 296.
Ghosh, A., K. V. Brinda, and S. Vishveshwara. 2007. Dynamics of Lysozyme Structure Network: Probing the Processes of Unfolding. Biophysical Journal 92 (April 1, 2007).
Protein Structure Graph and Network Topology
Brinda and S. Vishveshwara, Biophysical Journal, Dec 2005
Investigation of a large number of non-redundant structures
Giant cluster profile
Number of nodes with ‘k’ links
Degree Distribution plots
The Protein network topology exhibits
a complex behaviour
Poisson distribution?
Scale free- power law?
The topology depends on the Interaction strength (Imin)
Poisson Distribution when all the weak interactions are considered
Exponential-like behaviour at Interaction strengths greater than Icritical
Protein Structure Networks: Two proteins performing the same function at different temperatures. Although the overall structures are very similar, the additional stability of the protein functioning at high temperature is attributed to an increased number of hubs (shown in green). The concept is useful in protein design
Brinda and S.Vishveshwara, Biophysical journal , Dec 2005
Hubs in carboxypeptidase of a thermophilic and a mesophilic organism
Blue: hubs common to both proteins
Green: hubs exclusive to the thermophilic
Orange: hubs exclusive to the mesophilic
Hubs and Thermal Stability of Proteins
Dynamics of structure networks
Analysis of Network parameters from MD trajectories
Insights into biological processes such as
The mechanism of protein folding
Long distance communications (allosteric effect)
Identification of communication pathways in MetRS from network analysis of MD trajectories
Amit Ghosh and S Vishveshwara PNAS September 2007
MD simulations on MetRS complexes
A:MetRS
B: MetRS+MetAMP
C:MetRS+tRNA
D:MetRS+tRNA+MetAMP
Communication between the anticodon region and the active-site region in MetRS is crucial for the faithful translation of the genetic code
RMSD Profiles
MetRS complexes Simulations
The CP domain opens up when MetRS is bound to both the tRNA and the activated methionine
Conformation of tRNA
MetRS+tRNA MetRS+tRNA+MetAMP
Interactions at the active-site
How is the message communicated between the anticodon region and the activation site, which are separated by about 70 Å?
Size of the largest cluster in MetRS (averaged over MD snapshots) as a function of interaction strength
Imin (%)
Fluctuations in top three clusters of MetRS during MD simulation
Dynamical cross correlation maps representing the collective atomic fluctuations
MetRS+tRNA+MetAMP MetRS
Cross Correlated residues identified from MD simulations need not be connected in space.
The residues that connect the correlated ones through non-covalent interactions are identified from the protein structure network, by analysing the shortest path between the selected residues
Deduced from dynamic cross correlations and identification of shortest paths connected through non-covalent interactions at Imin=3%
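The shortest-path step can be sketched as a breadth-first search over the structure network built at a chosen Imin. This is an unweighted illustration under stated assumptions: the residue numbers and the small graph are hypothetical, and the published analysis may weight or filter paths differently.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Shortest path (fewest edges) between two residues, or None.

    graph: dict mapping residue -> list of non-covalently connected residues.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:        # walk back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in graph[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return None

# Hypothetical network: residues 10 and 50 are correlated but not in contact.
network = {10: [20, 30], 20: [10, 40], 30: [10, 40],
           40: [20, 30, 50], 50: [40], 99: []}
path = shortest_path(network, 10, 50)
unreachable = shortest_path(network, 10, 99)
```

The intermediate residues on the returned path are the candidates for relaying the correlated motion.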
Communication pathways from the anticodon region to the aminoacylation site
Communication paths between the anti-codon region and the active site in MetRS
Summary
Correlated movements are required for the functioning of
proteins. They can be identified at different levels.
Residue-wise RMSD, Cross correlations, Essential dynamics
The global non-covalent connectivity in proteins can be
represented as graphs and networks
Dynamics of such networks can be analyzed from MD trajectories.
Cross correlation maps, in combination with network analysis
yield interesting information on important dynamical events such
as: Protein folding, Communication paths
The communication pathways between the anticodon region and the amino acylation site have been deduced from the dynamic cross correlations and the network analysis of the MD trajectories of Methionyl tRNA Synthetase
Acknowledgements
Amit Ghosh: Lysozyme and MetRS
Drs. Brinda, Kannan, Swarna M. Patra: structure network
Department of Biotechnology, Government of India
EcMetRS:Met EcMetRS
Closing Mode Opening Mode
Amit Ghosh, Brinda, Vishveshwara,
Biophysical J April 2007
Concept of Structure Network Integrated with Dynamics
Equilibrium and unfolding simulations of T4-Lysozyme
probed from Network perspective
Protein Folding/unfolding
Analysis of the dynamics trajectories
T4 Lysozyme
Simulation details
Simulation Profiles: Root Mean Square Deviation (RMSD)
[Figure: RMSD as a function of time (ns)]
Interactions across secondary structures (300K)
Legend: o = native contact retained; the other marker = non-native contact compared with the crystal structure
a. α3 with α1, β1, β3, α2
b. α1 with α5
c. α1 with α10, α11
d. α5 with α3, α4
e. α5 with α6
f. α5 with α9, α10
g. α7 with α8
300K Imin=3.4%
around transition
after transition
Interactions across secondary structures (500K)
* Non-native contact compared with 300K simulation
o Native contact retained
500K Imin=2.5%
Unfolding events
Analysis of Network Parameters
Side-Chain Interaction Strength Dependent Analysis
•Degree distribution profiles
•Largest cluster profiles
•Native/Non-native contacts as a function of simulation time
Distribution of nodes with k links (degree distribution)
300K 500K
At 300K, for Imin values less than 5%, nodes with one or two links outnumber nodes with no links (orphans).
The transition from a bell-shaped curve to a decay-like curve takes place at a lower Imin of 3% at 500K
Size of the Largest Cluster Profiles
The size of the largest cluster in each of the simulations undergoes a transition at Imin = Icritical (the Imin at which the size of the largest cluster is half of its maximum).
Icritical shifts from 3.4% at 300K to 2.5% at 500K.
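Given a largest-cluster profile, Icritical can be estimated as the Imin at which the cluster size drops to half of its maximum. The sketch below uses linear interpolation and assumes the size decreases monotonically with Imin; the function name and the profile values are made up and are not data from the simulations.

```python
import numpy as np

def critical_imin(imin_values, largest_cluster_sizes):
    """Imin at which the largest-cluster size is half of its maximum."""
    imin = np.asarray(imin_values, dtype=float)
    size = np.asarray(largest_cluster_sizes, dtype=float)
    half = size.max() / 2.0
    # np.interp needs increasing x values; size falls as Imin rises,
    # so both arrays are reversed.
    return float(np.interp(half, size[::-1], imin[::-1]))

# Made-up profile: Imin in percent versus size of the largest cluster.
imin_values = [0, 1, 2, 3, 4, 5]
sizes = [100, 96, 80, 60, 30, 10]
i_critical = critical_imin(imin_values, sizes)
```

Applying the same function to profiles from two temperatures would show the kind of shift in Icritical reported above.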
Size of the largest cluster at Imin=0% (averaged over the simulation)
Large (135) at 300K Small (104) at 500K
Native/Non native contacts
The average number of contacts
                    300K   400K
native (0%)          206    160
native (3.5%)         98     72
non-native (0%): 50
non-native (3.5%): 45
The ratio of native to non-native contacts reaches 1 at approximately 2 ns and 0.7 ns for Imin values of 0% and 2.5%, respectively. The non-native contacts increase after these time points
Ratio of native to non-native contacts is one
Native/Non native contacts@500K
Size and the composition of the large clusters
• 300K simulation:
At Imin = Icritical, the largest cluster encompasses the residues of both the domains
Average cluster composition (residues present in the largest cluster in >50% snapshots) is fairly constant. The largest cluster includes a considerable amount of hydrophobic residues
At Imin > Icritical, the top 2 largest clusters belong to the larger domain and the 3rd largest cluster belongs to the smaller domain. The participation of hydrophobic residues decreases considerably and aromatic residues dominate
• 500K simulation
Large fluctuations in the size and the composition of the clusters.
Different unfolding states such as the transition state, collapse state, unfolded state can be recognized from the size and the composition of the top large cluster
Lysozyme unfolding monitored through the changes in the size and the composition of large clusters formed by non-covalent interactions of amino acid side-chains
Composition of the top three large clusters (Imin=5%)
[Figure panels: 1st, 2nd and 3rd largest clusters]
Correlations with Experiments
Mutation studies:
A large number of mutations have been made experimentally and their effects on stability measured.
The destabilizing mutations (Val111, Trp138, Phe153) have been identified as hubs from our study. We predict that the mutation of other residues (Phe104, Arg145, Arg148) will also destabilize the protein, since they are hubs at high Imin and also remain as hubs in the 400K simulation
Stages of domain formation during folding:
Our high temperature simulation points to the fact that the domain D2 is formed at an early stage, reinforcing the experimental findings
Outline
1. Introduction to protein structure networks
2. Biologically relevant information from graph/network analysis
3. Dynamics of structure networks
a) Unfolding simulation of Lysozyme
Network changes during folding/unfolding transition
b) Equilibrium simulation of methionyl tRNA synthetase
Network of communication pathways
Size of the largest cluster in PSG is dependent on the
interaction strength and a phase transition is seen at
Imin=Icritical in all the proteins
Property of a small world network
Threshold of the giant cluster appearance
|E| > (n/2) ?
Trajectory of the size of the largest cluster at Imin = 4% (500K simulation)
Rg fluctuates around 13.4 Å at 300K and is slightly higher at 400K. Large fluctuations, varying from 13 Å to 18 Å, are seen in all three simulations at 500K.
II. Radius of gyration trajectories at different temperatures.
Composition of the largest cluster (Imin=5%)
[Figure panels: 1st, 2nd and 3rd largest clusters]
Relative Positional Fluctuation(RPF)
Simulations on proteins of Ribonuclease-A family
Essential dynamics is captured by fewer than 20 top modes
Projection of top 3 vectors on to the MD Trajectory
ECP-A (top mode)
backbone
Projection of the Essential Modes
Structures obtained from top 3 modes
along with side chains significantly
contributing to these modes
Displacement along a single eigenvector (S): S= (r-<r>) .e
B.S.Sanjeev and S. Vishveshwara, Eur Phys. J D 20, 601-608 (2002)
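The displacement along a single eigenvector, S = (r - <r>) . e, is one dot product per frame. The sketch assumes flattened coordinates and normalizes the mode vector before projecting; the function name and the toy data are made up for the example.

```python
import numpy as np

def project_on_mode(coords, mode):
    """Displacement of every frame along one eigenvector.

    coords: (T, 3N) array of flattened configurations.
    mode:   (3N,) eigenvector; normalized here in case it is not unit length.
    """
    coords = np.asarray(coords, dtype=float)
    e = np.asarray(mode, dtype=float)
    e = e / np.linalg.norm(e)
    return (coords - coords.mean(axis=0)) @ e

# Toy trajectory that moves only along the first coordinate.
coords = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
displacement = project_on_mode(coords, [1.0, 0.0])
```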
II. Essential Dynamics:
Covariance matrix elements:
\[ m_{ij} = \frac{1}{S} \sum_{t} \left( x_i(t) - \langle x_i \rangle \right) \left( x_j(t) - \langle x_j \rangle \right) \]
where S = total number of configurations, t = time in ps,
xi = ith coordinate (i = 1, 2, …, 3N) (N being the number of atoms),
<xi> = average value of xi over all configurations.
Relative positional fluctuation (RPF) was then calculated as:
\[ \mathrm{RPF}(n) = \frac{\sum_{i=1}^{n} \lambda(i)}{\sum_{i=1}^{3N} \lambda(i)} \]
where λ(i) is the ith eigenvalue. RPF(n) gives the amount of motion associated with the subspace spanned by the first n eigenvectors.
Essential dynamics captures a major part of the dynamics in a few essential modes.
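Matrix representation: Unweighted and Weighted
[Figure: two three-node graphs with nodes 1, 2 and 3. Unweighted: every edge has weight 1. Weighted: edge 1-2 = 0.6, edge 1-3 = 0.5, edge 2-3 = 0.7]
Unweighted graph:
A =  0  1  1
     1  0  1
     1  1  0
L =  2 -1 -1
    -1  2 -1
    -1 -1  2
Weighted graph:
A =  0.0  0.6  0.5
     0.6  0.0  0.7
     0.5  0.7  0.0
L =  1.1 -0.6 -0.5
    -0.6  1.3 -0.7
    -0.5 -0.7  1.2
A = Adjacency; L = Laplacian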
Aij = 1, if i≠j and i and j are connected in the graph.
Aij = 0, otherwise.
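The Laplacian follows from the adjacency matrix as L = D - A, where D is the diagonal matrix of row sums. The sketch below reproduces the weighted three-node example from the slide; the function name is an assumption.

```python
import numpy as np

def laplacian(adjacency):
    """Graph Laplacian L = D - A, with D the diagonal matrix of row sums."""
    A = np.asarray(adjacency, dtype=float)
    return np.diag(A.sum(axis=1)) - A

# Weighted three-node graph from the slide.
A_weighted = [[0.0, 0.6, 0.5],
              [0.6, 0.0, 0.7],
              [0.5, 0.7, 0.0]]
L_weighted = laplacian(A_weighted)

# Unweighted version of the same triangle.
A_unweighted = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
L_unweighted = laplacian(A_unweighted)
```

Every row of a Laplacian sums to zero, which is why its smallest eigenvalue is always 0.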
Protein structure graphs: N×N matrix
N = No. of residues
Weights in protein side chain graphs = 1/dij
dij = Cβ-Cβ distance in Å (Cα in case of Gly).
Eigen Spectra of the Laplacian Matrix of a graph
Eigenvalues  0.0000   0.0800   0.9016   1.5500   1.9500   2.0600   2.7420   4.5164
Node         EVC(1)   EVC      EVC      EVC      EVC      EVC      EVC      EVC
1            0.3536   0.2739   0.1380  -0.0000   0.0000   0.0000   0.5362   0.7024(2)
2            0.3536   0.2739  -0.2560  -0.0000  -0.0000   0.7071   0.2422  -0.4193
3            0.3536   0.2739  -0.4375  -0.0000  -0.0000  -0.0000  -0.7031   0.3380
4            0.3536   0.2739  -0.2560  -0.0000  -0.0000  -0.7071   0.2422  -0.4193
5            0.3536   0.2739   0.8115   0.0000  -0.0000  -0.0000  -0.3175  -0.2018
6            0.3536  -0.4564  -0.0000  -0.4082  -0.7071   0.0000   0.0000   0.0000
7            0.3536  -0.4564   0.0000  -0.4082   0.7071  -0.0000  -0.0000  -0.0000
8            0.3536  -0.4564  -0.0000   0.8165  -0.0000   0.0000   0.0000   0.0000
(1) EVC: Eigenvector Components
(2) Largest vector components of top eigenvalues of each cluster are shown in blue
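In the table above, the eigenvector of the second-smallest eigenvalue takes one value on nodes 1 to 5 and another on nodes 6 to 8, which separates the two groups. The sketch below illustrates that idea on a made-up eight-node weighted graph (not the graph behind the table), splitting nodes by the sign of that eigenvector; the function name is an assumption and the graph spectral method used in the talk may differ in detail.

```python
import numpy as np

def spectral_bipartition(adjacency):
    """Split a connected graph in two using the second Laplacian eigenvector."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    second = eigvecs[:, 1]                      # vector of the 2nd-smallest eigenvalue
    side_a = set(np.flatnonzero(second >= 0).tolist())
    side_b = set(range(A.shape[0])) - side_a
    return side_a, side_b

# Made-up graph: a 5-node ring (0-4) and a triangle (5-7) joined by a weak edge.
A = np.zeros((8, 8))
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 0, 1),
         (5, 6, 1), (6, 7, 1), (7, 5, 1), (4, 5, 0.05)]
for i, j, w in edges:
    A[i, j] = A[j, i] = w
groups = spectral_bipartition(A)
```

The overall sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which side is labelled positive.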