Post on 20-Jan-2016
Computational Method for Predicting Amyloidogenic Sequences
Bill Welsh
UMDNJ - Robert Wood Johnson Medical School
welshwj@umdnj.edu
Amyloid Fibril Formation
A Common Mechanism for Protein Misfolding Diseases
• Numerous amyloid & misfolding diseases
• All of them are incurable at present
• Short list of more familiar examples
– Alzheimer’s disease
– Parkinson’s disease
– Huntington’s disease
– Creutzfeldt-Jakob disease (“Mad Cow”)
– Familial Amyloidosis
– Type II Diabetes
• Triggered by short sequences that convert from native α-helix or coil to β-strand
• We call this trait ‘hidden β-strand propensity’
Problems
1. No sequence specificities
2. Absence of detailed structural information on misfolded proteins (amyloid fibrils)
Our Solution
1. Misfolding process is triggered by short (5-7 residue) sequences
2. Redefine sequence-structure relationships in terms of tertiary context
3. Identify short sequences that exhibit non-native (hidden) β-strand propensity [HβP].
Relative Occurrence of Secondary Structure Elements in Different Tertiary Contact States
                    Secondary structure
Tertiary contacts   Coil    α-helix   β-strand   Total sequences
Low                 38 %    59 %      3 %        191,300
Medium              47 %    37 %      16 %       112,199
High                39 %    11 %      50 %       150,288
All                 41 %    38 %      21 %       453,787
Based on SCOP20v1.57
Tertiary Contact (TC)
Two non-H atoms within 4 Å of each other, separated by more than 4 residues in sequence
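The TC definition can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it reads "4Å apart" as a distance cutoff of 4 Å, and the function name `tertiary_contacts`, its input layout, and the toy coordinates in the usage note are all hypothetical. How the per-residue counts are averaged or normalized into the TC values used on later slides is not stated in the deck, so the sketch stops at raw counts.

```python
import math


def tertiary_contacts(atoms, residue_index, cutoff=4.0, min_separation=4):
    """Count heavy-atom tertiary contacts made by one residue.

    atoms: list of (residue_index, (x, y, z)) tuples for non-hydrogen atoms.
    A contact is a pair of atoms at most `cutoff` angstroms apart whose
    residues are more than `min_separation` positions apart in sequence.
    """
    own_atoms = [xyz for idx, xyz in atoms if idx == residue_index]
    count = 0
    for idx, xyz in atoms:
        if abs(idx - residue_index) <= min_separation:
            continue  # too close in sequence to count as a tertiary contact
        for own_xyz in own_atoms:
            if math.dist(own_xyz, xyz) <= cutoff:
                count += 1
    return count


# Toy coordinates (hypothetical, one atom per residue for brevity).
toy_atoms = [
    (1, (0.0, 0.0, 0.0)),
    (2, (1.0, 0.0, 0.0)),    # sequence neighbour of residue 1: ignored
    (10, (3.0, 0.0, 0.0)),   # far in sequence, close in space
    (12, (10.0, 0.0, 0.0)),  # far in sequence, far in space
]
```

With the toy data, `tertiary_contacts(toy_atoms, 1)` counts only the atom of residue 10, because residue 2 is excluded by the sequence-separation rule and residue 12 lies beyond the cutoff.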
Intriguing Relationship Between
Tertiary Contacts and Secondary Structure
Striking Conclusion
α-helix dominates in low-TC regions
β-sheet dominates in high-TC regions
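The conclusion can be checked against the table with a short script. This is a sketch under one stated assumption: the three percentage columns are taken to be coil, α-helix, and β-strand in that order, which is inferred from the conclusion itself because the column labels did not survive. The names `TABLE`, `dominant_state`, and `pooled_percent` are illustrative.

```python
# Percentages and counts from the slide's table. The column labels are an
# assumption inferred from the stated conclusion, not read from the source.
TABLE = {
    "low":    {"coil": 38, "helix": 59, "strand": 3,  "total": 191_300},
    "medium": {"coil": 47, "helix": 37, "strand": 16, "total": 112_199},
    "high":   {"coil": 39, "helix": 11, "strand": 50, "total": 150_288},
}
STATES = ("coil", "helix", "strand")


def dominant_state(row):
    """Return the secondary-structure label with the largest percentage."""
    return max(STATES, key=lambda state: row[state])


def pooled_percent(table, state):
    """Weight each TC class by its sequence count to estimate the 'All' row."""
    total = sum(row["total"] for row in table.values())
    weighted = sum(row[state] * row["total"] for row in table.values())
    return weighted / total
```

`dominant_state` picks helix for the low-TC row and strand for the high-TC row, and the count-weighted averages from `pooled_percent` land within one percentage point of the "All" row (41 %, 38 %, 21 %), which is as close as the rounded inputs allow.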
TC Influence on Secondary Structure Propensity
Average Tertiary Contacts (TCs) in SCOP20
Residue     L     S     A     Q     E     K     D     N     R     I     F     T     M     P     G     W     C     V     Y     H
Average TC  9.9   8.3   6.4   10.2  8.6   8.0   9.1   10.6  15.9  11.0  18.5  9.3   11.9  7.0   6.5   25.9  12.7  10.1  21.7  16.2
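A quick way to read these averages is to rank the residues. The sketch below pairs each one-letter code with the value in the same position on the slide (that pairing is an assumption about the layout), and the name `rank_by_tc` is illustrative.

```python
# Average TC per residue, paired by position with the slide's two rows.
AVERAGE_TC = {
    "L": 9.9, "S": 8.3, "A": 6.4, "Q": 10.2, "E": 8.6, "K": 8.0, "D": 9.1,
    "N": 10.6, "R": 15.9, "I": 11.0, "F": 18.5, "T": 9.3, "M": 11.9, "P": 7.0,
    "G": 6.5, "W": 25.9, "C": 12.7, "V": 10.1, "Y": 21.7, "H": 16.2,
}


def rank_by_tc(average_tc):
    """Return residue codes sorted from most to fewest average contacts."""
    return sorted(average_tc, key=average_tc.get, reverse=True)


ranking = rank_by_tc(AVERAGE_TC)
```

Under that pairing, the three residues with the most contacts are the aromatics Trp, Tyr, and Phe, and the smallest residues (Ala, Gly) sit at the bottom of the ranking.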
[Figure: "Fragments predicted as helix" (helix propensity, y-axis 0 to 0.9) vs. TC (x-axis 0 to 2); curves for beta-strand and coil]
α-helix propensity of β-strands increases sharply at low TCs
[Figure: "Fragments predicted as beta-strand" (β-strand propensity, y-axis 0 to 0.9) vs. TC (x-axis 0 to 2); curves for helix and coil]
β-strand propensity of helices increases sharply at high TCs
The CSSP Algorithm: Locating Sequences Exhibiting HβP
[Diagram: SCOP20 sequences; 3D structure from PDB; DSSP secondary structure (Sec Str); Tertiary Contact (TC); database of >450,000 7-residue sequences with secondary structure & TCs]
Query example: amyloid fibrils from myoglobin (Fandrich et al., 2001 Nature)
PHD prediction of secondary structure
Amino acid |… …AGHGQEVLIRLFTGHPETL… …|
PHD        |… …HHHHHHHHHHHHHH HHH… …|
P(α)       |… …7756899999999623469… …|
P(β)       |… …0000000000000000000… …|
[Diagram: query sequence A G H G Q E V L I R L F T G H P E T L with a row of W values under each residue for Low TC and for High TC; sliding 7-residue window (-Q-E-V-L-I-R-L-) compared with similar sequences; outputs P(|low) and P(|high); sequence of hidden β-propensity]
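The sliding 7-residue window from the diagram can be sketched as follows. Only the windowing step is shown; how each window is scored against the database is not described in enough detail on the slide to reproduce. The function name `sliding_windows` is illustrative.

```python
def sliding_windows(sequence, width=7):
    """Yield (start, fragment) for every `width`-residue window.

    `start` is the 0-based position of the window's first residue.
    """
    for start in range(len(sequence) - width + 1):
        yield start, sequence[start:start + width]


# Myoglobin fragment shown on the slide.
query = "AGHGQEVLIRLFTGHPETL"
windows = list(sliding_windows(query))
```

For the 19-residue fragment this gives 13 overlapping windows, and the one starting at position 4 is `QEVLIRL`, the window drawn on the slide.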
Sensitivity of the CSSP Method
Chameleon Sequences
ASVKQVS in β-sheet: 1AMP (Aminopeptidase)
ASVKQVS in α-helix: 1GKY (Guanylate kinase)
Query local sequence: ASVKQVS
HβP prediction (0-10 scale)
PDB ID  Name              Native secondary structure  Tertiary Contacts (TC)  P(α)  P(β)  P(Coil)
1AMP    Aminopeptidase    strand                      1.3                     2     7     1
1GKY    Guanylate kinase  helix                       0.4                     8     1     1
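The chameleon example can be restated as a small check that the highest-scoring state matches the native structure in each host protein. The sketch assumes the three score columns are P(α), P(β), and P(Coil) in that order; `PREDICTIONS`, `predicted_state`, and `agreement` are illustrative names, and taking the maximum score as the call is a simplification for illustration rather than the CSSP decision rule.

```python
# (PDB ID, native structure, TC, scores on the slide's 0-10 scale)
PREDICTIONS = [
    ("1AMP", "strand", 1.3, {"helix": 2, "strand": 7, "coil": 1}),
    ("1GKY", "helix", 0.4, {"helix": 8, "strand": 1, "coil": 1}),
]


def predicted_state(scores):
    """Return the state with the highest score (assumed decision rule)."""
    return max(scores, key=scores.get)


def agreement(predictions):
    """Map each PDB ID to whether the top-scoring state matches the native one."""
    return {
        pdb_id: predicted_state(scores) == native
        for pdb_id, native, _tc, scores in predictions
    }
```

Both entries agree: the same seven residues are called strand in the higher-TC environment (1AMP, TC 1.3) and helix in the lower-TC one (1GKY, TC 0.4).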
Hidden β-propensity in Alzheimer’s Disease
[Legend: helix, beta, and coil propensity shown as strong, moderate, weak, or very weak]
[Figure: propensity profiles for the amyloidogenic wild type Aβ fragment and the non-amyloidogenic mutant Aβ fragment]
KLVFF are key residues in amyloid fibril polymerization (Tjernberg et al., JBC 1996)
Yoon and Welsh, Protein Science (2004); ibid., Proteins (2005)
hIAPP sequence (Type 2 Diabetes)
hIAPP sequence (4-34) associated with type II diabetes
-NFLVH- -FLVHS- Mazor et al., JMB (2002)
-NFGAIL- Zanuy, Nussinov, et al., Biophysical Journal (2003)
NAC sequence (Parkinson’s disease)
NAC sequence of α-synuclein associated with Parkinson’s disease
VTNVGGAVVTGVTAVA VTGVTAVAQKTV GAVVTGVTAVA Bodles et al., J Neurochem (2001)
Beta propensity of acetylcholinesterase (AChE)
and its homolog butyrylcholinesterase (BuChE)
Amyloidogenic AChE(586-599) fragment
Nonamyloidogenic BuChE(573-586) fragment
Cottingham et al., Biochemistry (2002); ibid., (2003): AChE(586-599) and BuChE(573-586)
Amyloid Formation by G334V
Mutant p53 Associated with Lung Cancer
Higashimoto et al, Biochemistry 45, 1608-1619 (2006)
Amyloidogenic Sequence Knowledge Base (ASKB)
http://askb.umdnj.edu/askb/welcome.html
CSSP algorithm that predicts “hidden” β-strand propensity in proteins & polypeptides
Searchable peptide database
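Estimating Free Energies
[Scheme: unfolded (random coil), α-helix, β-strand, partially folded, and β-rich amyloid states, connected by free-energy changes ΔG]
[Equations not recoverable from the extraction: they express ΔG(hidden) through RT log K terms and through logarithms of propensity (P) ratios involving the coil state, and relate ΔG(amyloid) to ΔG(hidden)]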
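The slide's own expressions did not survive extraction, so the sketch below shows only the textbook relation ΔG = −RT ln K that the surviving "RT log K" fragments point to. Everything beyond that is an assumption: using the natural logarithm, working in kcal/mol at 298.15 K, and (in the hypothetical helper `delta_g_from_propensities`) treating the ratio of two predicted propensities as an equilibrium constant. It should not be read as the authors' formula.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(k_eq, temperature=298.15):
    """Standard relation dG = -RT ln K, returned in kcal/mol."""
    if k_eq <= 0:
        raise ValueError("equilibrium constant must be positive")
    return -R_KCAL * temperature * math.log(k_eq)


def delta_g_from_propensities(p_from, p_to, temperature=298.15):
    """Hypothetical: use the ratio p_to / p_from as K for the step from -> to."""
    if p_from <= 0 or p_to <= 0:
        raise ValueError("propensities must be positive")
    return delta_g(p_to / p_from, temperature)
```

Because logarithms add, the free energy of a two-step path (for example helix to coil, then coil to strand) equals the sum of the two steps, and under the ratio assumption the intermediate state's propensity cancels out of that sum.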
Predicted vs. Expt’l β-Sheet Structure of Prion Protein Peptide
• Decatur and coworkers employed FTIR spectroscopy to determine % β-sheet structure for peptides based on residues 109-122 of the Syrian hamster prion protein (H1) substituted at position 117.
• We plotted our calculated HβP metrics for the sequences H1, A117G, A117V, A117L, and A117I vs. Decatur’s expt’l values.
• Strong correlation (R² = 0.96) suggests that calculated HβP profiles are excellent predictors of β-sheet nature.
SA Petty, T Adalsteinsson, & SM Decatur, Biochemistry 44:4720-4726 (2005)
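The reported R² is presumably the squared Pearson correlation between the calculated metric and the measured % β-sheet; the slide does not say how it was computed, so that reading is an assumption. A minimal sketch of the calculation, with the illustrative name `r_squared`:

```python
def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```

To reproduce the slide's figure one would pass the five calculated values (H1, A117G, A117V, A117L, A117I) as `xs` and the five FTIR percentages as `ys`; neither set of numbers appears in the deck, so none is shown here. With only five points, a high R² is suggestive rather than conclusive.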
General Observations and Implications
• The CSSP algorithm successfully pinpoints amyloidogenic sequences in numerous examples where expt’l data are available
• These sequences possess hidden β-strand propensity
– generally short sequences (4-7 residues) that serve as ‘core nucleation motifs’ to trigger amyloid fibril formation
– adopt α-helix in low contact regions (low TC) and β-strand in high contact regions (high TC)
• These sequences are conformationally ambivalent
– interconvertible between α-helix and β-strand
– highly sensitive to tertiary environment
– generally contain hydrophobic, aromatic residues (Phe, Trp, Tyr)
– consistent with recent findings: Rojas Quijano et al., Biochemistry (2006)
• Ability to form amyloid is a generic trait of all proteins
Thank You!
welshwj@umdnj.edu