Post on 22-Jun-2020
3/11/14
1
Programme
8.00-8.30 Last week’s quiz results
8.30-9.00 Prediction of secondary structure & surface exposure
9.00-9.20 Protein disorder prediction
9.20-9.30 Break
9.30-11.00 Ex.: Secondary structure prediction
11.00-11.10 Break
11.10-11.40 Summary & discussion
11.40-12.00 Quiz
1
Feedback Persons
2
1-D Predictions
Prediction of local features: secondary structure & surface exposure
4
Learning Objectives
§ After today’s session you should be able to:
– Explain the meaning and usage of the following local feature terms:
• Secondary structure
• Surface accessibility/exposure
• Transmembrane helix
• Signal peptide
• Protein disorder
– Use different 1-D prediction servers and interpret the results (the exercise).
5
Residue Patterns
§ Helices
– Helix capping
– Amphiphilic residue patterns
§ Sheets
– Amphiphilic residue patterns
– Residue preferences at edges vs. middle
§ Special residues
– Proline
• Helix breaker
– Glycine
• In turns/loops/bends
[Figure labels: N, C]
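One common way to quantify an amphiphilic residue pattern is a hydrophobic moment: each residue's hydrophobicity is treated as a vector, rotated by a fixed angle per residue (about 100° for an α-helix, close to 180° for a β-strand), and the vectors are summed. The sketch below is only an illustration; the choice of the Kyte-Doolittle scale and the function name are assumptions, not something the slide specifies.

```python
import math

# Kyte-Doolittle hydropathy values (the choice of scale is an assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(seq, angle_deg):
    """Mean hydrophobic moment of seq for a given rotation per residue."""
    angle = math.radians(angle_deg)
    x = sum(KD[aa] * math.cos(i * angle) for i, aa in enumerate(seq))
    y = sum(KD[aa] * math.sin(i * angle) for i, aa in enumerate(seq))
    return math.hypot(x, y) / len(seq)

# A toy sequence that alternates hydrophobic and charged residues,
# i.e. the pattern expected for an amphiphilic strand.
strand_like = "VKVKVKVK"
print(hydrophobic_moment(strand_like, 180.0))  # strand periodicity
print(hydrophobic_moment(strand_like, 100.0))  # helix periodicity
```

For the alternating toy sequence the moment is much larger at 180° than at 100°, which is what a strand-like pattern should give.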
6
1-D predictions
§ Local Structures
– Secondary Structure
– Trans Membrane Helix
§ Features
– Surface Accessibility
– Signal Peptides
7
Secondary Structure Elements
§ α-helix = H
§ 3₁₀-helix = G
§ π-helix = I
§ Extended (β)-Strand = E
§ Isolated β-bridge = B
§ Turn = T
§ Bend = S
§ Rest (Coil) = C/.
8
Assignment from Structure
• DSSP ( http://www.cmbi.kun.nl/gv/dssp/ )
• STRIDE ( http://www.hgmp.mrc.ac.uk/Registered/Option/stride.html )
• DSSPcont ( http://cubic.bioc.columbia.edu/services/DSSPcont/ )
9
Helices
10
Three-State Prediction of Classes
§ α-helix = H
§ 3₁₀-helix = G
§ π-helix = I
§ Extended (β)-Strand = E
§ Isolated β-bridge = B
§ Turn = T
§ Bend = S
§ The Rest (Coil) = ./C
[Figure: the eight states above grouped into the three predicted classes H, E and C]
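The grouping of the eight assigned states into the three predicted classes can be sketched as below. The slide does not spell out the grouping, so the mapping used here (H, G, I to helix; E, B to strand; everything else to coil) is an assumption based on a common convention; other reductions are in use.

```python
# Assumed 8-to-3 state mapping (a common convention, not taken from the slide).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def to_three_state(dssp):
    """Reduce a DSSP-style string to the classes H, E and C."""
    # Any state not listed above (T, S, '.', C, ...) becomes coil.
    return "".join(EIGHT_TO_THREE.get(state, "C") for state in dssp)

# DSSP string taken from the "Amino Acid Statistics" slide.
dssp = ".HHHHHHHHHHHHHHHHS.......GGGEEEEEEEEE.SS.EEEEEEETTTTEEEEEEEEEHHHHHHTT.HHHHHHHHHH"
print(to_three_state(dssp))
```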
11
Prediction Servers
§ PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/)
§ PHDProf
§ Jpred
12
PSIPRED
PSIPRED PREDICTION RESULTS

Key

Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

# PSIPRED HFORMAT (PSIPRED V2.3 by David Jones)

Conf: 962265677776523477650688877787645776578999999733875215678887
Pred: CCCHHHHHHHHHHHCCCCCCCHHHHHHHHHHHCCCCCCHHHHHHHHHCCCCCCHHHHHHH
  AA: MSLLTEVETYVLSIIPSGPLKAEIAQRLEDVFAGKNTDLEVLMEWLKTRPILSPLTKGIL
              10        20        30        40        50        60

Conf: 754642045401245555330224688880246788999999865213001344431012
Pred: HHHHHHCCCCHHHHHHHHHHHCCCCCCCCCHHHHHHHHHHHHHHHHCCHHHHHHHHHCCC
  AA: GFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDKAVKLYRKLKREITFHGAKEISLSYS
              70        80        90       100       110       120

Conf: 113899999987067751045678889988888742346778777764042033332466
Pred: HHHHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCHHHHHHHHH
  AA: AGALASCMGLIYNRMGAVTTEVAFGLVCATCEQIADSQHRSHRQMVTTTNPLIRHENRMV
             130       140       150       160       170       180

Conf: 554368888741366024789999999999999999862489875310478899999998
Pred: HHHHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCHHHHHHHHHHHHH
  AA: LASTTAKAMEQMAGSSEQAAEAMEVASQARQMVQAMRTIGTHPSSSAGLKNDLLENLQAY
             190       200       210       220       230       240

Conf: 886363002159
Pred: HHHHCCHHHCCC
  AA: QKRMGVQMQRFK
             250

Calculate PostScript, PDF and JPEG graphical output for this result using:
http://bioinf2.cs.ucl.ac.uk/cgi-bin/psipred/gra/nph-view2.cgi?id=3644f256afcf5ec3.psi
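To work with a report like the one above outside the browser, the Conf/Pred/AA rows can be joined into three strings and scanned for helix segments and low-confidence positions. This is a minimal sketch that assumes the row layout shown on the slide; the function names and the confidence cut-off of 5 are illustrative choices, not part of PSIPRED.

```python
from itertools import groupby

def parse_psipred_horiz(text):
    """Concatenate the Conf/Pred/AA rows of a PSIPRED horizontal report."""
    rows = {"Conf": [], "Pred": [], "AA": []}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in rows:
            value = value.strip()
            # The key lines ("Conf: Confidence (0=low, 9=high)") contain
            # spaces; the data rows do not, so this skips the legend.
            if value and " " not in value:
                rows[key].append(value)
    return {key: "".join(parts) for key, parts in rows.items()}

def segments(pred, state="H"):
    """Return 1-based (start, end) ranges where pred equals state."""
    found, pos = [], 1
    for symbol, run in groupby(pred):
        length = len(list(run))
        if symbol == state:
            found.append((pos, pos + length - 1))
        pos += length
    return found

# Last block of the report on the slide; positions here are relative to
# this excerpt (1-12), not to the full sequence (241-252).
SAMPLE = """\
Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

Conf: 886363002159
Pred: HHHHCCHHHCCC
  AA: QKRMGVQMQRFK
"""

result = parse_psipred_horiz(SAMPLE)
helices = segments(result["Pred"], "H")
low_conf = [i for i, c in enumerate(result["Conf"], start=1) if int(c) < 5]
print(helices, low_conf)
```

Positions with low confidence are the ones to treat with caution when interpreting the predicted segments in the exercise.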
13
PSIPRED
14
Trans-Membrane Helices
15
Transmembrane Helix Predictors
§ TMHMM
§ HMMTOP
§ DAS
16
Signal Peptide
§ SignalP
§ Phobius
§ Philius
17
Prediction Methods
Exemplified by Secondary Structure Predictions
18
Amino Acid Statistics
VKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRI
.HHHHHHHHHHHHHHHHS.......GGGEEEEEEEEE.SS.EEEEEEETTTTEEEEEEEEEHHHHHHTT.HHHHHHHHHH

Helix
VKEFLAKAK
KEFLAKAKE
EFLAKAKED
...

Strand
QLDQFDRIK
LDQFDRIKT
DQFDRIKTL
...

Coil
KKWETPSQN
KWETPSQNT
WETPSQNTA
...
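In the example above, each nine-residue window appears to be filed under the class of its central residue. A minimal sketch of that bookkeeping follows; the 8-to-3 state mapping and the function name are assumptions, and the slide's own grouping may differ in detail.

```python
from collections import Counter

SEQ = "VKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRI"
SS = ".HHHHHHHHHHHHHHHHS.......GGGEEEEEEEEE.SS.EEEEEEETTTTEEEEEEEEEHHHHHHTT.HHHHHHHHHH"

# Assumed reduction to three classes; anything not listed counts as coil.
THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def windows_by_class(seq, ss, size=9):
    """Group every full window of `size` residues by its central residue's class."""
    half = size // 2
    grouped = {"H": [], "E": [], "C": []}
    for center in range(half, len(seq) - half):
        cls = THREE.get(ss[center], "C")
        grouped[cls].append(seq[center - half:center + half + 1])
    return grouped

grouped = windows_by_class(SEQ, SS)
print(grouped["H"][:3])

# Amino acid counts over all helix windows: the raw material for statistics.
helix_counts = Counter("".join(grouped["H"]))
print(helix_counts.most_common(5))
```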
19
Propensities
Helix
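A propensity is usually computed as the frequency of an amino acid within a structural class divided by its frequency overall, so values above 1 mean the residue is over-represented in that class. The sketch below uses that definition on made-up toy data; the slide's figure may be based on a different normalisation.

```python
from collections import Counter

def propensities(seq, ss3, state="H"):
    """Propensity of each amino acid for `state`: P(state | aa) / P(state)."""
    total = Counter(seq)
    in_state = Counter(aa for aa, s in zip(seq, ss3) if s == state)
    frac_state = sum(in_state.values()) / len(seq)
    return {aa: (in_state[aa] / total[aa]) / frac_state for aa in total}

# Toy data, invented for illustration only.
toy = propensities("AAGGAL", "HHCCHC", state="H")
print(toy)
```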
20
BLOSUM Substitution
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
21
Position Specific Substitution Matrices (PSSM)
22
PSSM
      A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 1 I -2 -4 -5 -5 -2 -4 -4 -5 -5  6  0 -4  0 -2 -4 -4 -2 -4 -3  4
 2 K -1 -1 -2 -2 -3 -1  3 -3 -2 -2 -3  4 -2 -4 -3  1  1 -4 -3  2
 3 E  5 -3 -3 -3 -3  3  1 -2 -3 -3 -3 -2 -2 -4 -3 -1 -2 -4 -3  1
 4 E -4 -3  2  5 -6  1  5 -4 -3 -6 -6 -2 -5 -6 -4 -2 -3 -6 -5 -5
 5 H -4  2  1  1 -5  1 -2 -4  9 -5 -2 -3 -4 -4 -5 -3 -4 -5  1 -5
 6 V -3  0 -4 -5 -4 -4 -2 -3 -5  1 -2  1  0  1 -4 -3  3 -5 -3  5
 7 I  0 -2 -4  1 -4 -2 -4 -4 -5  1  0 -2  0  2 -5  1 -1 -5 -3  4
 8 I -3  0 -5 -5 -4 -2 -5 -6  1  2  4 -4 -1  0 -5 -2  0 -3  5 -1
 9 Q -2 -3 -2 -3 -5  4 -1  3  5 -5 -3 -3 -4 -2 -4  2 -1 -4  2 -2
10 A  2 -4 -4 -3  2 -3 -1 -4 -2  1 -1 -4 -3 -4  1  2  3 -5 -1  1
11 E -1  3  1  1 -1  0  1 -4 -3 -1 -3  0  3 -5  4 -1 -3 -6 -3 -1
12 F -3 -5 -5 -5 -4 -4 -4 -1 -1  1  1 -5  2  5 -1 -4 -4 -3  5  2
13 Y  3 -5 -5 -6  3 -4 -5 -2 -1  0 -4 -5 -3  3 -5 -2 -2 -2  7  1
14 L -1 -3 -4 -2  1  5  1 -1 -1 -1  1 -3 -3  1 -5 -1 -1 -2  3 -2
15 N -1 -4  4  1  5 -3 -4  2 -4 -4 -4 -3 -2 -4 -5  2  0 -5  0  0
16 P -2  4 -4 -4 -5  0 -3  3  2 -5 -4  0 -4 -3  0  1 -2 -1  5 -3
17 D -3 -2  1  5 -6 -2  2  2 -1 -2 -2 -3 -5 -4 -5 -1  2 -6 -3 -4
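A PSSM like this is typically turned into predictor input by taking the rows in a window around each residue and flattening them into one feature vector. The sketch below shows that step for the first three rows of the table; the window size, the zero padding at the sequence ends and the function name are assumptions made for illustration.

```python
# First three rows (I, K, E) of the PSSM on the slide, columns A..V.
PSSM = [
    [-2, -4, -5, -5, -2, -4, -4, -5, -5, 6, 0, -4, 0, -2, -4, -4, -2, -4, -3, 4],
    [-1, -1, -2, -2, -3, -1, 3, -3, -2, -2, -3, 4, -2, -4, -3, 1, 1, -4, -3, 2],
    [5, -3, -3, -3, -3, 3, 1, -2, -3, -3, -3, -2, -2, -4, -3, -1, -2, -4, -3, 1],
]

def window_features(pssm, center, size=3, pad=0):
    """Flatten the PSSM rows in a window around `center` (0-based).

    Positions that fall outside the sequence are filled with `pad`
    (zero padding is an assumption; other schemes exist).
    """
    half = size // 2
    width = len(pssm[0])
    features = []
    for i in range(center - half, center + half + 1):
        if 0 <= i < len(pssm):
            features.extend(pssm[i])
        else:
            features.extend([pad] * width)
    return features

first = window_features(PSSM, center=0)
print(len(first))  # 3 positions x 20 scores
```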
23
Neural Networks
§ Benefits
– Generally applicable
– Can capture higher order correlations
– Inputs other than sequence information
§ Drawbacks
– Needs a lot of data (different solved structures with low sequence identity).
– Complex methods with several pitfalls.
24
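Neural Networks
[Figure: a window (I K E E H V I I Q A E) slides along the sequence IKEEHVIIQAEFYLNPDQSGEF… and feeds the input layer; weights connect the input layer to a hidden layer and the hidden layer to an output layer with the classes H, E and C]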
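The architecture in the figure (window, input layer, hidden layer, three outputs) can be sketched as a single forward pass. This is not the network behind any of the servers: the weights are random and untrained, and the one-hot encoding, hidden layer size and activation functions are assumptions chosen to keep the example small.

```python
import math
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def one_hot_window(window):
    """Encode each residue of the window as 20 inputs (sparse encoding)."""
    vec = []
    for residue in window:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec

def dense(x, weights, biases):
    """One fully connected layer: weighted sums plus bias."""
    return [sum(xi * w for xi, w in zip(x, row)) + b
            for row, b in zip(weights, biases)]

def forward(x, w1, b1, w2, b2):
    """Input -> sigmoid hidden layer -> softmax over the classes H, E, C."""
    hidden = [1.0 / (1.0 + math.exp(-z)) for z in dense(x, w1, b1)]
    logits = dense(hidden, w2, b2)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
window = "IKEEHVIIQAE"          # the 11-residue window shown on the slide
x = one_hot_window(window)
n_hidden = 4                    # arbitrary size for illustration
w1 = [[random.uniform(-0.1, 0.1) for _ in x] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(3)]
b2 = [0.0] * 3

probs = forward(x, w1, b1, w2, b2)
print(dict(zip("HEC", probs)))  # untrained, so the values carry no meaning
```

Training would adjust the weights so that the output for the central residue of each window matches its known class.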
25
NetSurfP
Prediction of Real Value Solvent Accessibility
By Bent Petersen
26
Objective
§ Predict residues as being either buried or exposed (25 % threshold)
– Two states/classes, Buried/Exposed
§ Predict the Relative Solvent Accessibility
– “Real” Value
27
Why predict RSA?
§ Residues exposed on the surface can be:
– Involved in PTMs
– Potential antigenic regions
– Involved in protein-protein interactions
– Relevant for the prediction of disease SNPs
28
What is ASA?
§ Accessible Surface Area, Å²
§ Surface area accessible to a rolling water molecule
29
RSA
RSA = ACC / ASA
RSA = Relative Solvent Accessibility
ACC = Accessible area in protein structure
ASA = Accessible Surface Area in Gly-X-Gly or Ala-X-Ala
Classification networks: Buried = RSA < 25 %, Exposed = RSA > 25 %
“Real” value networks: values 0 - 1, RSA > 1 set to 1
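The two targets described here can be sketched in a few lines: RSA is the accessible area in the structure divided by the reference area for that residue type, capped at 1, and the class follows from the 25 % threshold. The reference value of 120 Å² in the example is a placeholder, not a real table entry, and treating exactly 25 % as buried is an assumption, since the slide leaves that case open.

```python
def relative_accessibility(acc, max_asa):
    """RSA = ACC / ASA(reference tripeptide), with values above 1 set to 1."""
    return min(acc / max_asa, 1.0)

def classify(rsa, threshold=0.25):
    """Two-state label; RSA exactly at the threshold is treated as buried."""
    return "Exposed" if rsa > threshold else "Buried"

# Hypothetical reference area in Å² for some residue type X.
REFERENCE_ASA_X = 120.0

for acc in (12.0, 30.0, 90.0, 150.0):
    rsa = relative_accessibility(acc, REFERENCE_ASA_X)
    print(acc, round(rsa, 3), classify(rsa))
```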
30
Learning / Training dataset
§ Training set: Cull_1764:
– Max. Seq. ID: 25 %
– Resolution: ≤ 2.0 Å
– R-Factor: ≤ 0.2
– Seq. Length 30-3000 AA
– Excluding non-X-ray entries
31
Learning / Training dataset
§ Homology reduced against evaluation set CB513 (302 sequences removed)
§ Final Training set:
– 1764 sequences
– 417,978 amino acids
• Buried: 55.80 % (233,221 amino acids)
• Exposed: 44.20 % (184,757 amino acids)
32
Neural Network - Input
Position Specific Scoring Matrices, PSSM
              A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
B H 2BEM.A 1 -4 -3 -2 -4 -6 -2 -3 -5 11 -6 -5 -3 -4 -4 -5 -3 -4 -5 -1 -6
A G 2BEM.A 2 -2 -5 -3 -4 -5 -4 -5  7 -5 -7 -6 -4 -5 -6 -5 -3 -4 -5 -6 -6
A Y 2BEM.A 3 -1  1 -4 -3 -5 -4 -4 -4  1 -4 -1 -4 -1  2 -5  0 -1  4  7 -2
A V 2BEM.A 4 -1 -5 -5 -6 -4 -4 -5 -5 -5  4  1 -5  6 -3 -2 -2  0 -5 -4  4
B E 2BEM.A 5 -2 -4 -3  0 -4 -1  3 -2 -4  0 -3 -2  1 -2 -3  3  3 -5 -4  0
(4 iterations of PSI-BLAST against nr70)

Secondary Structure predictions
B H 2BEM.A 1 0.003 0.003 0.966
A G 2BEM.A 2 0.018 0.086 0.868
A Y 2BEM.A 3 0.020 0.199 0.752
A V 2BEM.A 4 0.021 0.271 0.679
B E 2BEM.A 5 0.020 0.199 0.752
(sec predictor by Pernille Andersen)
33
Method
34
Results - Real Value Prediction
§ Training / Evaluation
                             Train          Evaluated      Method
Ahmad et al. (2003)          Not Published  0.48           ANN
Yuan and Huang (2004)        Not Published  0.52           SVR
Nguyen and Rajapakse (2006)  Not Published  0.66           Two-Stage SVR
Dor and Zhou (2007)          0.738          Not Published  ANN
NetSurfP                     0.722          0.70           ANN
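The slide does not name the score in the Train and Evaluated columns. Assuming it is a correlation between predicted and observed RSA values, which is a common way to score real-value predictions, it could be computed as in the sketch below. The numbers in the example are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented predicted and observed RSA values, for illustration only.
predicted = [0.10, 0.35, 0.80, 0.55, 0.05]
observed = [0.15, 0.30, 0.90, 0.40, 0.10]
print(round(pearson(predicted, observed), 3))
```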
35
NetSurfP
/usr/cbs/bio/src/NetSurfP/NetSurfP -h
36
NetSurfP Output
37
Protein Disorder
Introduction to DisEMBL, IUPred & FoldUnfold
39
Protein Folding
§ Initially formed structure is in molten globule state (ensemble).
§ Molten globule condenses to native fold via transition state.
[Figure: energy diagram (E, ΔG) with the states U, T and F]
– U: Unfolded state, ensemble
– F: Native fold, one structure
– T: Transition state(s), one or more narrow ensembles
40
Degrees of Structure
41
Structures of Unstructured Regions
§ Estimate: 20% of all proteins contain unstructured regions.
– 1% of structures in PDB contain unstructured regions.
§ Structural genomics
– Special structural genomics projects
– Selection and modification of targets
– Prediction of crystallisable domains
[Figure: Protein disorder publications in PubMed. Iakoucheva & Dunker, Structure 2003]
42
What’s the Fuss About?
§ Properties of Disordered Regions
– Flexible, i.e. adaptable
– Accessible
• Contain Extended Linear Motifs (ELM)
– Different behaviour in interaction interfaces
• Very adaptable
• Many hydrophobic interactions (close packing)
– No fixed structure without interaction partner
– Folding upon binding
43
DisEMBL
§ Basic notion
– No consensus on protein disorder definition.
– Defines three types of disorder
§ The method
– ANN-based
§ Disorder definitions
– Loop/Coil (DSSP-assigned residues: T, S, B, I)
– Hot loops (high B-factor)
– Missing residues (in X-ray structures, “Remark 465”)
Linding et al. Structure 2003
44
DisEMBL § Derived propensity scale (implicit)
45
DisEMBL Output § Ero1-Lα
46
IUPred
§ Basic notion:
– Globular proteins need to make a large number of inter-residue interactions to overcome the loss of entropy upon folding.
§ The method
– 20 x 20 energy predictor matrix (pairwise interactions).
• Derived from globular proteins.
– Quadratic expression in amino acid composition.
§ Definitions
– Binary definition: Order/disorder
– Two ranges:
• Long ~ regions/domains
• Short ~ loops
– Domain prediction (inverse of long range predictions).
Dosztányi et al. Bioinformatics 2005
47
IUPred Output
§ Ero1-Lα
Position  Residue  Disorder Tendency
1         E        0.5055
2         E        0.3740
3         Q        0.1731
4         P        0.2164
5         P        0.1852
…
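The per-residue disorder tendency becomes the binary order/disorder definition by applying a cut-off; 0.5 is the value usually used with IUPred scores, but it is treated here as an adjustable assumption. The rows below are the five shown on the slide.

```python
# (position, residue, disorder tendency) for the first five Ero1-Lα residues.
ROWS = [
    (1, "E", 0.5055),
    (2, "E", 0.3740),
    (3, "Q", 0.1731),
    (4, "P", 0.2164),
    (5, "P", 0.1852),
]

def disordered_positions(rows, cutoff=0.5):
    """Positions whose disorder tendency lies above the cut-off."""
    return [pos for pos, _residue, score in rows if score > cutoff]

print(disordered_positions(ROWS))
```

Only the first residue of this excerpt lies above the cut-off.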
48
FoldUnfold
§ Basic notion
– Globular proteins need to establish a high number of interactions to compensate for the loss of entropy during the folding process.
§ The method
– Mean packing density
• Derived from globular proteins.
– ANN-based.
§ Definitions
– Binary definition: Order/disorder
– Two ranges:
• Long ~ regions/domains
• Short ~ loops
Galzitskaya et al. Bioinformatics 2006 & Protein Science 2000
49
FoldUnfold Output
§ Ero1-Lα
disordered: 77 - 99
disordered: 110 - 126
disordered: 135 - 152
disordered: 196 - 207
disordered: 341 - 351
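To compare this segment list with per-residue output from other predictors, the ranges can be parsed into pairs of numbers and expanded into a set of residue positions. The sketch assumes only that each segment is written as "disordered:" followed by a start and an end number; whatever separator sits between the two numbers is ignored.

```python
import re

FOLDUNFOLD_OUTPUT = """\
disordered: 77 - 99
disordered: 110 - 126
disordered: 135 - 152
disordered: 196 - 207
disordered: 341 - 351
"""

def parse_ranges(text):
    """Extract (start, end) pairs; any non-digit separator is accepted."""
    pattern = r"disordered:\s*(\d+)\D+(\d+)"
    return [(int(a), int(b)) for a, b in re.findall(pattern, text)]

def to_positions(ranges):
    """Expand inclusive ranges into a set of residue positions."""
    return {pos for start, end in ranges for pos in range(start, end + 1)}

ranges = parse_ranges(FOLDUNFOLD_OUTPUT)
positions = to_positions(ranges)
print(ranges)
print(len(positions), "residues predicted as disordered")
```

With positions in set form, the overlap with another predictor is a plain set intersection.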
50
Comparison
Disordered residues: 77-99, 110-126, 135-152, 196-207, 341-351
[Figure: predictions from DisEMBL, IUPred and FoldUnfold shown side by side]
51
Ero1 example
52
Links
§ DisEMBL:
– http://dis.embl.de/
§ IUPred:
– http://iupred.enzim.hu/
§ FoldUnfold:
– http://skuld.protres.ru/~mlobanov/ogu/
53
Exercise
http://xray.bmc.uu.se/gerard/embo2001/predic/index.html Step 1-5
55