Structural glycoinformatics approaches
• Structural modeling
  – Comparative modeling of glycoproteins
  – Complex modeling: glycoprotein replacement
• Modeling of the complexes of glycans with GBPs and GTs
  – Docking
  – Analysis of interaction specificities
    • Key residues vs. specific glycan conformations
• Molecular dynamics
  – Modeling the dynamics of the recognition of glycans by GBPs
  – Modeling the enzymology of GTs: quantum mechanical calculations
Approaches to predicting protein structures
[Flowchart: obtain sequence (target); fold assignment by sequence-sequence or sequence-structure alignment; comparative modeling (high identity, long alignment) or ab initio modeling (low identity, fragment alignment); build and assess model]
Comparative modeling of proteins
• Definition: prediction of the three-dimensional structure of a target protein from its amino acid sequence (primary structure), using as a template a homologous protein for which an X-ray or NMR structure is available.
• Why a model: a model is desirable when X-ray crystallography or NMR spectroscopy cannot determine the structure of a protein in time, or at all. The model provides residue-level information on how the protein functions, e.g. its interactions with ligands, or the interactions of GBPs/GTs with glycans.
Comparative modeling (or homology modeling)
Sequence with unknown structure (??):
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
Sequence with known structure:
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
The two proteins share similar sequences and are homologous, so the known structure is used as the template to build the model. PDB entries shown: 8lyz, 1alc.
Homology models have RMSDs less than 2Å more than 70% of the time.
Homology models can be very smart!
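The RMSD quoted above measures how far the atoms of a model lie from the corresponding atoms of the experimental structure. A minimal sketch of the calculation follows, assuming the two structures are already superimposed and list the same atoms in the same order. The coordinates are made up for illustration, and no optimal superposition step is included.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical C-alpha positions (in Angstroms): the "experimental" atoms
# are the model atoms shifted by 1 Angstrom along y.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
experiment = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(model, experiment))  # 1.0
```

In practice the two structures are first superimposed (for example with a least-squares fit) before the RMSD is reported.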
[Figure: percentage sequence identity/similarity (0 to 100) plotted against number of residues aligned (0 to 250); B. Rost, Columbia, New York. Regions marked on the plot: "Sequence identity implies structural similarity"; "Don't know region"; "Sequence similarity implies structural similarity?"]
Step 1: Fold identification
Aim: to find one or more template structures in the Protein Data Bank (PDB)
• Improved multiple sequence alignment methods increase sensitivity and find remote homologs: PSI-BLAST, CLUSTAL
• Pairwise sequence alignment finds high-homology sequences: BLAST
• Fold recognition programs find low-homology sequences (threading, profile-profile alignment)
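To make the idea of pairwise alignment concrete, here is a minimal sketch of a global (Needleman-Wunsch) alignment followed by a percent-identity calculation. The scoring scheme (match +1, mismatch −1, gap −1) is an assumption chosen for brevity; tools such as BLAST use substitution matrices, affine gap penalties and heuristics that are not reproduced here. The two sequences are the ones shown on the comparative modeling slide.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(aln_a, aln_b):
    """Percent identical residues over the columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

seq_a = ("KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRN"
         "ICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE")
seq_b = ("KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPG"
         "SRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL")
aligned_a, aligned_b = global_align(seq_a, seq_b)
print(aligned_a)
print(aligned_b)
print(round(percent_identity(aligned_a, aligned_b), 1), "% identity over aligned columns")
```

The identity value depends on the scoring scheme, so numbers from this sketch are not directly comparable with those reported by BLAST or CLUSTAL.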
Step 2: Model construction
Aim: to build the three-dimensional (3D) structure of the target protein, i.e. the coordinates of every atom
Approach 1: protein structure buildup: cores, loops and side chains
Approach 2: whole-protein modeling: constraint-based optimization
Commonly used programs:
• Modeller (http://salilab.org/modeller/)
• Swiss-model (http://swissmodel.expasy.org/)
• Geno3D (http://geno3d-pbil.ibcp.fr/)
• …
Modeling of glycan-protein complexes
• Template: glycan-protein complex
  – Case 1: same glycan, different protein
    • Glycoprotein replacement: comparative modeling of protein structure
    • Energy minimization, allowing structural flexibility of glycans
  – Case 2: same protein, different glycan
    • Flexible docking of glycans
  – Case 3: different protein and different glycan
    • Comparative modeling of proteins
    • Flexible docking of glycan
    • Can also be applied without a template of the complex
Flexible docking
• Semi-flexible (rigid protein, flexible ligand)
  – Useful for drug screening
  – >150 programs: Dock, AutoDock, FlexX/FlexE, …
• Flexible protein: mainly side chains (hard)
• Two elements of semi-flexible docking algorithms
  – Ligand sampling methods
    • Pattern matching, genetic algorithm, molecular dynamics, Monte Carlo, …
  – Treatment of intermolecular forces
    • Simplified scoring functions: empirical, knowledge-based and molecular mechanics, e.g. AMBER, CHARMM, GROMOS, ...
    • Solvation and entropy are treated very simply, or ignored completely
Flexible docking of glycans to proteins
• Glycan structure sampling
  – Automatic generation / sampling of 3D glycan structures: Sweet II (http://www.dkfz-heidelberg.de/spec/sweet2)
• Docking of each glycan conformation to the GBP: scoring schemes
  – Empirical scores
  – Force field
    • GLYCAM: modified AMBER force field / MD tools for glycans (R. Woods group)
  – Challenge: water molecules
Flexibility of molecules
• Atoms connected by covalent bonds
• Bond lengths and bond angles are rigid
• Torsion (dihedral) angles are flexible
Frequently used definitions of glycosidic torsion angles
Angle               NMR style        C−1 crystallographic style   C+1 crystallographic style
ϕ                   H1-C1-O-C′x      O5-C1-O-C′x                  O5-C1-O-C′x
ψ                   C1-O-C′x-H′x     C1-O-C′x-C′(x−1)             C1-O-C′x-C′(x+1)
ψ [(1–6)-linkage]   C1-O-C′6-C′5     C1-O-C′6-C′5                 C1-O-C′6-C′5
ω [(1–6)-linkage]   O-C′6-C′5-H′5    O-C′6-C′5-C′4                O-C′6-C′5-O′5
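Each of the torsion angles defined above is computed from the coordinates of four consecutive atoms. The sketch below shows one common way to do this with plain vector algebra. The function name and the example coordinates are illustrative assumptions, not values taken from a real glycan.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond, in the range (-180, 180]."""
    def sub(u, v):
        return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)   # normals of the two planes
    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical positions for four atoms, e.g. O5, C1, O, C'x when computing phi
# in the crystallographic style.
o5 = (1.0, 0.0, 0.0)
c1 = (0.0, 0.0, 0.0)
o = (0.0, 0.0, 1.0)
cx = (0.0, 1.0, 1.0)
print(dihedral(o5, c1, o, cx))  # 90.0 for this geometry
```

Swapping the first or last atom (for example H1 in place of O5) gives the NMR-style value for the same linkage, which is why the definition in use must always be stated.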
sweet2: http://www.dkfz-heidelberg.de/spec/sweet2/
Cone-like (left) and umbrella-like (right) topologies of 2-3 and 2-6 sialylated glycans binding to influenza viral HAs
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162
Combine structural analysis with the glycan array analysis: providing structural insights.
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162
Ligand binding by the scavenger receptor C-type lectin (SRCL) and LSECtin
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162
Binding of multiple classes of ligands to DC-SIGN and the macrophage galactose receptor. Model of the binding site in the macrophage galactose receptor with a bound GalNAc residue, based on the structure of the galactose-binding mutant of mannose-binding protein that was created by insertion of key binding site residues from the galactose-binding receptor.
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162
Mechanisms of mannose-binding protein interaction with ligands.
Molecular Dynamics: simulation of molecular motions
• Energy model of conformation
• Two main approaches:
  – Monte Carlo: stochastic
  – Molecular dynamics: deterministic
• Understand molecular function and interactions
  – Catalysis of enzymes
• Complementary to experiments
• Obtain a movie of the interacting molecules
Basic Concepts of simulation of molecular motion
1. Compute energy for the interaction between all pairs of atoms.
2. Move atoms to the next state.
3. Repeat.
Energy Function
• Target function that MD uses to govern the motion of molecules (atoms)
• Describes the interaction energies of all atoms and molecules in the system
• Always an approximation
  – Closer to real physics means more realistic but more computation time (i.e. smaller time steps and more interactions increase accuracy)
Scale in Simulations
[Figure: simulation domains arranged by length scale (10^-10 m to 10^-4 m) and time scale (10^-12 s to 10^-6 s): quantum chemistry, molecular dynamics, Monte Carlo, mesoscale, continuum. Annotations on the figure: F = MA; exp(-E/kT)]
Taken from Grant D. Smith, Department of Materials Science and Engineering, Department of Chemical and Fuels Engineering, University of Utah, http://www.che.utah.edu/~gdsmith/tutorials/tutorial1.ppt
The energy model
The NIH Guide to Molecular Modeling: http://cmm.cit.nih.gov/modeling/guide_documents/molecular_mechanics_document.html
• Proposed by Linus Pauling in the 1930s
• Bond angles and lengths are almost always the same
• Energy model broken up into two parts:
  – Covalent terms
    • Bond distances (1-2 interactions)
    • Bond angles (1-3)
    • Dihedral angles (1-4)
  – Non-covalent terms
    • Forces at a distance between all non-bonded atoms
The energy equation
Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy
These equations, together with the parameters required to describe the behavior of different kinds of atoms and bonds, are called a force field.
Bond Stretching Energy
kb is the spring constant of the bond.
r0 is the bond length at equilibrium.
Unique kb and r0 values are assigned to each bonded pair of atom types, e.g. C-C, O-H
Bending Energy
kθ is the spring constant of the bend.
θ0 is the bond angle at equilibrium.
Unique parameters for angle bending are assigned to each bonded triplet of atoms based on their types (e.g. C-C-C, C-O-C, C-C-H, etc.)
Torsion Energy
A controls the amplitude of the curve
n controls its periodicity
φ shifts the entire curve along the rotation angle axis (τ).
The parameters are determined from curve fitting.
Unique parameters for torsional rotation are assigned to each bonded quartet of atoms based on their types (e.g. C-C-C-C, C-O-C-N, H-C-C-H, etc.)
Non-bonded Energy
A determines the degree of attractiveness
B determines the degree of repulsion
q is the charge
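The four energy terms can be sketched in code using the parameters named above. The equations themselves are not reproduced in this text, so the functional forms below (harmonic stretch and bend, cosine torsion, 6-12 van der Waals plus Coulomb) are an assumption based on commonly used molecular mechanics expressions. Individual force fields differ in details such as a factor of 1/2 in the harmonic terms or a dielectric constant in the Coulomb term.

```python
import math

def bond_energy(r, kb, r0):
    """Harmonic stretch: zero at the equilibrium length r0."""
    return kb * (r - r0) ** 2

def angle_energy(theta, k_theta, theta0):
    """Harmonic bend: zero at the equilibrium angle theta0 (radians)."""
    return k_theta * (theta - theta0) ** 2

def torsion_energy(tau, amplitude, n, phi):
    """Periodic torsion: amplitude A, periodicity n, phase shift phi (radians)."""
    return amplitude * (1.0 + math.cos(n * tau - phi))

def nonbonded_energy(r, a, b, qi, qj):
    """Van der Waals attraction (-A/r^6) and repulsion (B/r^12) plus Coulomb (qi*qj/r)."""
    return -a / r ** 6 + b / r ** 12 + qi * qj / r

# Illustrative, made-up parameters in arbitrary consistent units.
total = (
    bond_energy(r=1.55, kb=300.0, r0=1.53)
    + angle_energy(theta=math.radians(112.0), k_theta=60.0, theta0=math.radians(109.5))
    + torsion_energy(tau=math.radians(60.0), amplitude=1.4, n=3, phi=0.0)
    + nonbonded_energy(r=3.5, a=500.0, b=500000.0, qi=0.2, qj=-0.2)
)
print(total)
```

In a real force field each function is summed over every bond, angle, torsion and non-bonded pair in the system, with parameters looked up by atom type.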
Simulating in a solvent
• The smaller the system, the larger the fraction of particles on the surface
  – 1000-atom cubic crystal: 49% on surface
  – 10^6-atom cubic crystal: 6% on surface
• Would like to simulate infinite bulk surrounding the N-particle system
• Two approaches:
  – Implicitly
  – Explicitly
• Periodic boundary conditions
Schematic representation of periodic boundary conditions.
http://www.ccl.net/cca/documents/molecular-modeling/node9.html
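The surface fractions quoted above can be checked by counting the atoms in the outer layer of an n × n × n cube. The same sketch also shows the two small operations that implement periodic boundary conditions in one dimension: wrapping a coordinate back into the box, and taking the shortest separation between two particles across periodic images (the minimum image convention). Function names are illustrative.

```python
def surface_fraction(n_per_edge):
    """Fraction of atoms in the outer layer of a cubic crystal with n atoms per edge."""
    total = n_per_edge ** 3
    interior = max(n_per_edge - 2, 0) ** 3
    return (total - interior) / total

def wrap(x, box_length):
    """Map a coordinate back into the primary box [0, box_length)."""
    return x % box_length

def minimum_image(dx, box_length):
    """Shortest separation along one axis, taking periodic images into account."""
    return dx - box_length * round(dx / box_length)

print(surface_fraction(10))        # 1000 atoms: about 0.49
print(surface_fraction(100))       # 10^6 atoms: about 0.06
print(wrap(10.5, 10.0))            # 0.5: the particle re-enters from the opposite face
print(minimum_image(9.0, 10.0))    # -1.0: the nearest image is one unit away
```

Applying `wrap` and `minimum_image` to each Cartesian component gives the behavior of a cubic periodic box.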
Parameters for MD: Forcefield
• Derived from direct experimental measurements on small molecules (~10 atoms)
• Commonly used: AMBER, CHARMM, GROMOS, etc.
  – GLYCAM for MD of glycoconjugates (derived from the AMBER force field)
Monte Carlo
Explore the energy surface by randomly probing the configuration space with a Markov chain approach.
Metropolis method (can escape local minima):
1. Specify the initial atom coordinates.
2. Select atom i randomly and move it by a random displacement.
3. Calculate the change in potential energy, ΔE, corresponding to this displacement.
4. If ΔE < 0, accept the new coordinates and go to step 2.
5. Otherwise, if ΔE ≥ 0, select a random R in the range [0,1] and:
   1. If e^(−ΔE/kT) > R, accept and go to step 2.
   2. If e^(−ΔE/kT) ≤ R, reject and go to step 2.
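The Metropolis steps can be sketched for the simplest possible system, a single particle moving in one dimension. The energy function, step size and random seed below are assumptions for illustration. A molecular simulation would pick a random atom and displace it in three dimensions, but the acceptance rule is the same.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random):
    """Accept downhill moves always; accept uphill moves with probability exp(-dE/kT)."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def metropolis_sample(energy, x0, n_steps, kT=1.0, max_step=0.5, seed=0):
    """Random-walk Metropolis sampling of a 1D coordinate; returns the visited positions."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_step, max_step)   # random displacement
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, kT, rng):
            x, e = trial, e_trial                      # move accepted
        samples.append(x)                              # a rejected move repeats the old state
    return samples

# Harmonic well E = x^2 / 2 at kT = 1: the mean of x^2 should come out close to 1.
samples = metropolis_sample(lambda x: 0.5 * x * x, 0.0, 20000, kT=1.0, max_step=1.5, seed=1)
print(sum(s * s for s in samples) / len(samples))
```

Counting a rejected move as another visit to the current state is what makes the samples follow the Boltzmann distribution.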
Deterministic Approach
• Provides us with a trajectory of the system.
  – From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.
  – Integrating these infinitesimal steps yields the trajectory of the system for any desired time range.
• Typical simulations of small proteins, including surrounding solvent, run on the picosecond time scale.
$F_i = -\partial E / \partial x_i$
$F = m a$
Deterministic / MD methodology
• There are efficient methods for integrating these elementary steps with Verlet and leapfrog algorithms being the most commonly used.
MD algorithm
• Initialize system
  – Ensure particles do not overlap in initial positions (can use lattice)
  – Randomly assign velocities
• Move and integrate
  {r(t), v(t)} → {r(t+Δt), v(t+Δt)}
Leapfrog algorithm
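A minimal sketch of the integration step follows, written in the "kick-drift-kick" form of the leapfrog scheme, which is algebraically equivalent to velocity Verlet. The test system (a single particle on a spring with unit mass and force constant) and the time step are assumptions for illustration. In a real MD code the force comes from the gradient of the force-field energy.

```python
import math

def leapfrog(x, v, force, mass, dt, n_steps):
    """Advance {x(t), v(t)} to {x(t+dt), v(t+dt)} repeatedly; returns the trajectory."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a      # half kick: velocity at t + dt/2
        x = x + dt * v_half            # drift: position at t + dt
        a = force(x) / mass            # new acceleration from the new position
        v = v_half + 0.5 * dt * a      # half kick: velocity at t + dt
        trajectory.append((x, v))
    return trajectory

# Harmonic oscillator, F = -x, started at x = 1 with zero velocity.
trajectory = leapfrog(x=1.0, v=0.0, force=lambda x: -x, mass=1.0, dt=0.01, n_steps=628)
x_end, v_end = trajectory[-1]
print(x_end, math.cos(6.28))                       # numerical vs. exact position
print(0.5 * v_end ** 2 + 0.5 * x_end ** 2)         # total energy, initially 0.5
```

The total energy stays very close to its initial value over the run, which is the main reason this family of integrators is preferred in MD.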
MD studies of Prion proteins
• Prion protein (PrP) is associated with an unusual class of neurodegenerative diseases
  – Scrapie (sheep); bovine spongiform encephalopathy (BSE) in cattle; kuru, Creutzfeldt-Jakob disease (CJD), Gerstmann-Sträussler-Scheinker syndrome (GSS), and fatal familial insomnia (FFI) in humans
• Protein-only hypothesis (Prusiner, 1982): the disease is caused by an abnormal form of the 250 amino acid PrP, which accumulates in plaques in the brain.
• The abnormal form (PrPSc) differs from the normal cellular form (PrPC) only in its 3-D structure; FTIR and CD spectra indicate that it has a significantly increased content of β-sheet conformation compared with PrPC
• Glycosylation appears to protect prion protein (PrPC) from the conformational transition to the disease-associated scrapie form (PrPSc)
PrP is a glyco-protein
• Available NMR structures are for non-glycosylated PrPC only
• Objective: study of the influence of two N-linked glycans (Asn181 and Asn197) and of the GPI anchor attached to Ser230
Zuegg et al., Glycobiology, 2000, 10(10):959-974.
MD simulations
• Molecular dynamics simulations on the C-terminal region of human prion protein HuPrP(90–230), with and without the three glycans
• AMBER94 force field in a periodic box model with explicit water molecules, considering all long-range electrostatic interactions
• HuPrP(127–227) is stabilized overall by addition of the glycans, specifically by extension of two helices and reduced flexibility of the linking turn containing Asn197
• The stabilization appears indirect, arising from reduced mobility of the surrounding water molecules rather than from specific interactions such as H bonds or ion pairs
  – Asn197 has a stabilizing role, while Asn181 is within a region with already stable secondary structure
Zuegg et al., Glycobiology, 2000, 10(10):959-974.
Cone-like (left) and umbrella-like (right) topologies of 2-3 and 2-6 sialylated glycans binding to influenza viral HAs
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)
A retrospective analysis
MD simulation of glycan binding of influenza HAs
• A combined approach (MD + sequences) to predict ligand-binding mutants of H5N1 influenza HA
  – Model the ligand-bound state of H5N1 HA using the isolate VN1194 bound to α2,3-sialyllactose, as previously crystallized
  – Compute excess mutual information between each residue of each monomer and the corresponding bound ligand, using the average mutual information between the residue and all residues as an estimate of the “background” mutual information
  – Combine these results with sequence analysis of H5N1 mutational data to predict clusters of residues that undergo coordinated mutation: residues that have some capacity to vary but whose mutations are subject to selective pressure. These residues may be richer targets for changing ligand specificity than residues that are absolutely conserved or that display uncorrelated mutations (involved in immune escape).
Kasson et al., JACS, 2009, 131 (32), pp 11338–11340
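The "excess mutual information" idea can be illustrated with a small sketch. It assumes that each residue and the ligand have already been reduced to a sequence of discrete conformational states, one per simulation frame. That discretization, the plug-in estimator and the toy data are assumptions made here for illustration and are not taken from the cited paper.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length sequences of discrete states."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

def excess_mutual_information(residue, ligand, other_residues):
    """MI between a residue and the ligand, minus the residue's average MI with other residues."""
    background = sum(mutual_information(residue, o) for o in other_residues) / len(other_residues)
    return mutual_information(residue, ligand) - background

# Toy state trajectories over four frames (hypothetical data).
residue_states = [0, 1, 0, 1]
ligand_states = [0, 1, 0, 1]       # moves in step with the residue
other_states = [[0, 0, 1, 1]]      # uncorrelated with the residue
print(excess_mutual_information(residue_states, ligand_states, other_states))  # 1.0
```

With real trajectories the estimate is sensitive to how states are defined and to the amount of sampling, so a background correction of this kind helps separate ligand-specific coupling from a residue's general mobility.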
Experimentally identified ligand-binding mutations in red, the top 5% of residues by dynamics scoring in cyan (overlap of these two in magenta), and the six mutation sites identified by both dynamics and sequence analysis in yellow.
The top three mutations from the ligand dissociation analyses in yellow. A modeled α2,3-sialyllactose is shown in orange.
Prediction of dissociation rate for HA mutants (in silico mutagenesis)
• Bayesian analysis methods predict dissociation rates from extensive simulation of each mutant and evaluate whether a mutant dissociates faster than the influenza clinical isolate used as the wild-type reference.
• These simulations were used to estimate the dissociation rate for each mutation.
• The mutation sites predicted by analysis of the molecular dynamics data include both residues immediately contacting the bound glycan and residues located farther away on the globular head of the hemagglutinin molecule.