Lecture 3.31 Superposition & Threading † Gary Van Domselaar University of Alberta...
Lecture 3.3 1
Superposition & Threading†
Gary Van Domselaar
University of Alberta
†Slides adapted from David Wishart
Lecture 3.3 2
Outline
• Vectors, matrices and other geometry issues
• General Superposition concepts
• Threading and threading methods
Lecture 3.3 3
Vectors Define Bonds and Atomic Positions
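[Figure: an amino acid (labels H3N+, O, R, H) drawn on x, y, z axes; vectors from the Origin locate the atoms, and the vector between two atoms defines the CO bond]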
Lecture 3.3 4
Review - Vectors
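[Figure: vector u drawn from (0,0,0) to (1,2,1) on x, y, z axes]
$$\mathbf{u} = 1\hat{\imath} + 2\hat{\jmath} + 1\hat{k} = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$$
$$|\mathbf{u}| = \sqrt{(1-0)^2 + (2-0)^2 + (1-0)^2} = \sqrt{6}$$
Vectors have a length & a direction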
Lecture 3.3 5
Review - Vectors
• Vectors can be added together
• Vectors can be subtracted
• Vectors can be multiplied (dot or cross or by a matrix)
• Vectors can be transformed (resized)
• Vectors can be translated
• Vectors can be rotated
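The operations listed above can be sketched in a few lines of plain Python. This is only an illustration: the function names (`add`, `subtract`, `scale`, `dot`, `cross`, `length`) and the second vector `v` are assumptions made for the example, not part of the lecture. The vector `u` is the (1,2,1) example from the previous slide.

```python
import math

def add(u, v):
    """Vector addition; adding a displacement also translates a point."""
    return tuple(a + b for a, b in zip(u, v))

def subtract(u, v):
    """Vector subtraction."""
    return tuple(a - b for a, b in zip(u, v))

def scale(u, s):
    """Resize a vector by the scalar s."""
    return tuple(s * a for a in u)

def dot(u, v):
    """Dot (scalar) product."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross (vector) product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def length(u):
    """Length of a vector measured from the origin."""
    return math.sqrt(dot(u, u))

u = (1, 2, 1)   # the example vector from the slide
v = (0, 1, 0)   # hypothetical second vector
```

Here `length(u)` is the square root of 6, matching the worked example, and `cross(u, v)` is perpendicular to both inputs.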
Lecture 3.3 6
Matrices
• A matrix is a table or “array” of characters
• A matrix is also called a tensor of “rank 2”

A 5 x 6 Matrix (# rows x # columns):
2 4 6 8 9 4
1 3 5 7 9 3
1 0 1 0 1 0
9 4 6 4 3 5
3 4 3 4 3 4
[Figure labels: one row and one column of the matrix are marked]
Lecture 3.3 7
Different Types of Matrices
A square Matrix:
2 4 6 8 9 4
1 3 5 7 9 3
1 0 1 0 1 0
9 4 6 4 3 5
3 4 3 4 3 4
3 6 7 9 1 0

A symmetric Matrix:
2 4 6 8 9 4
4 3 5 7 9 3
6 5 1 0 1 0
8 7 0 4 3 5
9 9 1 3 3 4
4 3 0 5 4 0

A column Matrix (A vector):
1
3
5
9
7
3
Lecture 3.3 8
Different Types of Matrices
A rectangular Matrix:
A B C D E F
G H I J K L
M N O P Q R
S T U V W X

A rotation Matrix:
$$\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

A row Matrix (A vector):
2 4 6 8 9
Lecture 3.3 9
Review - Matrix Multiplication
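$$\begin{pmatrix} 2 & 4 & 0 \\ 1 & 3 & 1 \\ 1 & 0 & 0 \end{pmatrix} \times \begin{pmatrix} 1 & 0 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 2{\cdot}1 + 4{\cdot}2 + 0{\cdot}0 & 2{\cdot}0 + 4{\cdot}1 + 0{\cdot}1 & 2{\cdot}2 + 4{\cdot}3 + 0{\cdot}0 \\ 1{\cdot}1 + 3{\cdot}2 + 1{\cdot}0 & 1{\cdot}0 + 3{\cdot}1 + 1{\cdot}1 & 1{\cdot}2 + 3{\cdot}3 + 1{\cdot}0 \\ 1{\cdot}1 + 0{\cdot}2 + 0{\cdot}0 & 1{\cdot}0 + 0{\cdot}1 + 0{\cdot}1 & 1{\cdot}2 + 0{\cdot}3 + 0{\cdot}0 \end{pmatrix} = \begin{pmatrix} 10 & 4 & 16 \\ 7 & 4 & 11 \\ 1 & 0 & 2 \end{pmatrix}$$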
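The row-by-column rule in this example can be checked with a short, dependency-free sketch. The function name `matmul` and the variable names are assumptions made for illustration; the two input matrices are the ones on the slide.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 4, 0],
     [1, 3, 1],
     [1, 0, 0]]
B = [[1, 0, 2],
     [2, 1, 3],
     [0, 1, 0]]

# Each entry C[i][j] is row i of A times column j of B.
C = matmul(A, B)
```

Note that the bottom-right entry is 1x2 + 0x3 + 0x0 = 2, so the product is [[10, 4, 16], [7, 4, 11], [1, 0, 2]].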
Lecture 3.3 10
Rotation
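Rotate about x:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}$$
Rotate about z:
$$\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
[Figure: x, y, z axes]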
Lecture 3.3 11
Rotation
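Clockwise about x:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}$$
Clockwise about z:
$$\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
Counterclockwise about x:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}$$
Counterclockwise about z:
$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$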
Lecture 3.3 12
Rotation
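$$X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \qquad X' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$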
Lecture 3.3 13
Rotation (Detail)
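$$X' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ \cos\theta + \sin\theta \\ -\sin\theta + \cos\theta \end{pmatrix}$$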
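A minimal sketch of this rotation, using the slides' "clockwise about x" matrix and its counterclockwise counterpart. The function names and the choice of a 90 degree angle are assumptions made for illustration.

```python
import math

def rot_x_clockwise(theta):
    """The slides' 'clockwise about x' matrix for an angle theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0],
            [0, c, s],
            [0, -s, c]]

def rot_x_counterclockwise(theta):
    """The slides' 'counterclockwise about x' matrix (the inverse rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0],
            [0, c, -s],
            [0, s, c]]

def apply(matrix, vector):
    """Multiply a 3 x 3 matrix by a column vector."""
    return tuple(sum(m * x for m, x in zip(row, vector)) for row in matrix)

theta = math.radians(90)                      # hypothetical angle
rotated = apply(rot_x_clockwise(theta), (1, 1, 1))
restored = apply(rot_x_counterclockwise(theta), rotated)
```

With theta = 90 degrees, `rotated` is (1, cos + sin, -sin + cos) = (1, 1, -1) up to rounding, and applying the counterclockwise matrix with the same angle brings the point back to (1, 1, 1).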
Lecture 3.3 14
Superposition
• Objective is to match or overlay 2 or more similar objects
• Requires use of translation and rotation operators (matrices/vectors)
• Recall that every three-dimensional object can be represented by a plane defined by 3 points
Lecture 3.3 15
Superposition
[Figure: two objects on x, y, z axes, one marked with points a, b, c and the other with points a’, b’, c’]
Identify 3 “equivalence” points in objects to be aligned
Lecture 3.3 16
Superposition
[Figure: triangles a,b,c and a’,b’,c’ on x, y, z axes, before and after translation]
Translate points a,b,c and a’,b’,c’ to origin
Lecture 3.3 17
Superposition
[Figure: triangles a,b,c and b’,c’ on x, y, z axes, before and after the rotation]
Rotate the a,b,c plane clockwise about the x axis (first rotation angle)
Lecture 3.3 18
Superposition
[Figure: triangles a,b,c and b’,c’ on x, y, z axes, before and after the rotation]
Rotate the a,b,c plane clockwise about the z axis (second rotation angle)
Lecture 3.3 19
Superposition
[Figure: triangles a,b,c and b’,c’ on x, y, z axes, before and after the rotation]
Rotate the a,b,c plane clockwise about the x axis (third rotation angle)
Lecture 3.3 20
Superposition
[Figure: triangles a,b,c and b’,c’ on x, y, z axes, before and after the rotation]
Rotate the a’,b’,c’ plane anticlockwise about the x axis (by its own, primed, first angle)
Lecture 3.3 21
Superposition
[Figure: triangles a,b,c and b’,c’ on x, y, z axes, before and after the rotation]
Rotate the a’,b’,c’ plane anticlockwise about the z axis (by its own, primed, second angle)
Lecture 3.3 22
Superposition
[Figure: triangles a,b,c and b’,c’ on x, y, z axes, before and after the rotation]
Rotate the a’,b’,c’ plane clockwise about the x axis (by its own, primed, third angle)
Lecture 3.3 23
Superposition
Apply all rotations and translations to remaining points
[Figure: triangles a,b,c and b’,c’ on x, y, z axes, before and after the remaining points are transformed]
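The translate-then-rotate procedure of the preceding slides can be sketched as follows. This is an illustration under stated assumptions, not the lecture's implementation: the rotation angles and their signs are computed with `atan2` rather than taken from the slides, the helper names (`rot_x`, `rot_z`, `to_standard_frame`) are made up for the example, and the two test objects are hypothetical. Each object is moved so that a sits at the origin, b lies on the x axis and c lies in the x-y plane; once both objects are in this standard position, equivalent points coincide.

```python
import math

def rot_x(p, t):
    """Rotate point p by t radians about the x axis (counterclockwise for t > 0)."""
    x, y, z = p
    c, s = math.cos(t), math.sin(t)
    return (x, y * c - z * s, y * s + z * c)

def rot_z(p, t):
    """Rotate point p by t radians about the z axis (counterclockwise for t > 0)."""
    x, y, z = p
    c, s = math.cos(t), math.sin(t)
    return (x * c - y * s, x * s + y * c, z)

def to_standard_frame(points):
    """points[0:3] are the equivalence points a, b, c; the rest ride along."""
    a = points[0]
    pts = [tuple(pi - ai for pi, ai in zip(p, a)) for p in points]  # a -> origin
    t1 = -math.atan2(pts[1][2], pts[1][1])   # about x: bring b into the x-y plane
    pts = [rot_x(p, t1) for p in pts]
    t2 = -math.atan2(pts[1][1], pts[1][0])   # about z: bring b onto the x axis
    pts = [rot_z(p, t2) for p in pts]
    t3 = -math.atan2(pts[2][2], pts[2][1])   # about x: bring c into the x-y plane
    return [rot_x(p, t3) for p in pts]

# Hypothetical object: a, b, c plus one "remaining" point.
obj1 = [(1, 1, 1), (2, 1, 1), (1, 3, 1), (1, 1, 4)]
# Hypothetical second object: the same shape, rotated and shifted.
obj2 = [tuple(v + d for v, d in zip(rot_x(rot_z(p, 0.7), -1.1), (3, -2, 5)))
        for p in obj1]

std1 = to_standard_frame(obj1)
std2 = to_standard_frame(obj2)   # coincides with std1 up to rounding
```

Every point of each object, not just a, b and c, goes through the same translation and three rotations, which is what the slide means by applying the transformations to the remaining points.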
Lecture 3.3 24
Superposition
[Figure: Before and After panels showing triangles a,b,c and a’,b’,c’ on x, y, z axes, separate and superimposed]
Lecture 3.3 25
Returning to the “red” frame
[Figure: Before and After panels showing the superimposed triangles and triangle a,b,c on x, y, z axes]
Lecture 3.3 26
Returning to the “red” frame
• Begin with the superimposed structures on the x-y plane
• Apply counterclockwise rotation by the third angle
• Apply counterclockwise rotation by the second angle
• Apply counterclockwise rotation by the first angle
• Apply red translation to red origin
Just do things in reverse order!
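"Reverse order" can be made concrete with a small sketch: the forward transform is a translation followed by rotations about x, z and x, and the return trip applies the opposite rotations last-to-first and then undoes the translation. The function names, the angles and the origin below are hypothetical values chosen for illustration.

```python
import math

def rot_x(p, t):
    """Rotate point p by t radians about the x axis."""
    x, y, z = p
    c, s = math.cos(t), math.sin(t)
    return (x, y * c - z * s, y * s + z * c)

def rot_z(p, t):
    """Rotate point p by t radians about the z axis."""
    x, y, z = p
    c, s = math.cos(t), math.sin(t)
    return (x * c - y * s, x * s + y * c, z)

def translate(p, d):
    """Shift point p by displacement d."""
    return tuple(a + b for a, b in zip(p, d))

def forward(p, origin, t1, t2, t3):
    """Move origin to (0,0,0), then rotate about x, z and x."""
    p = translate(p, tuple(-o for o in origin))
    return rot_x(rot_z(rot_x(p, t1), t2), t3)

def backward(p, origin, t1, t2, t3):
    """Undo forward(): opposite rotations in reverse order, then translate back."""
    p = rot_x(rot_z(rot_x(p, -t3), -t2), -t1)
    return translate(p, origin)

origin = (4.0, -1.0, 2.5)          # hypothetical "red" origin
angles = (0.3, -1.2, 2.0)          # hypothetical rotation angles in radians
point = (1.0, 2.0, 3.0)
moved = forward(point, origin, *angles)
returned = backward(moved, origin, *angles)   # equals point up to rounding
```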
Lecture 3.3 27
Superposition - Applications
• Ideal for comparing or overlaying two or more protein structures
• Allows identification of structural homologues (CATH and SCOP)
• Allows loops to be inserted or replaced from loop libraries (comparative modelling)
• Allows side chains to be replaced or inserted with relative ease
Lecture 3.3 28
Side Chain Placement
http://www.fccc.edu/research/labs/dunbrack/scwrl/
SCWRL
Lecture 3.3 29
Amino Acid Side Chains
[Figure: amino acid structure with H2N, C, H and COOH groups and a side chain ending in NH3+]
Lecture 3.3 30
Adding a Side Chain
[Figure: side chain placement shown on x, y, z axes]
Lecture 3.3 31
Adding a Side Chain
[Figure: side chain placement shown on x, y, z axes]
Lecture 3.3 32
Adding a Side Chain
[Figure: side chain placement shown on x, y, z axes]
Lecture 3.3 33
Adding a Side Chain
[Figure: side chain placement shown on x, y, z axes]
Lecture 3.3 34
Adding a Side Chain
[Figure: side chain placement shown on x, y, z axes]
Lecture 3.3 35
Superposition
• The concept of superposition is key to many aspects of protein structure generation and comparison
• Superposition may be used to insert side chains and loops (for homology models)
• Side chains require more consideration as side chain packing ultimately determines the 3D structure of proteins
Lecture 3.3 36
Superposition - RMSD
• The degree of similarity between two or more structures is described by their root mean square deviation (RMSD):
$$\mathrm{RMSD}(N; x, y) = \sqrt{\frac{\sum_{i=1}^{N} \lVert x_i - y_i \rVert^2}{N}}$$
[Figure: five paired points x1 to x5 and y1 to y5, with labelled pair distances of 0, 0.8, 0.7 and 1 Å]
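A minimal sketch of the RMSD formula for two already-superimposed, equal-length lists of 3D coordinates. The function name `rmsd` and the two-point example are assumptions made for illustration.

```python
import math

def rmsd(xs, ys):
    """Root mean square deviation between paired coordinates xs[i] and ys[i]."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between each equivalent pair of points.
    total = sum(sum((a - b) ** 2 for a, b in zip(x, y)) for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))

# Hypothetical coordinates: the first pair coincides, the second is 2 units apart.
xs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ys = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(xs, ys)   # sqrt((0 + 4) / 2)
```

The value only measures structural similarity if the two structures have been superimposed first; otherwise it also includes the rigid-body offset between them.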
Lecture 3.3 37
Superposition Software
• Swiss PDB Viewer
  – Aligns 2 homologous structures
Lecture 3.3 38
Superposition Software
• CE: Structure Comparison by Combinatorial Extension
• http://cl.sdsc.edu/ce.html
• Superposition for 2 chains and for multiple chains (new)
Lecture 3.3 39
Superposition Software
• SuperPose
  – http://wishart.biology.ualberta.ca/SuperPose/
• Superposition for 2 chains and for multiple chains
• Subdomain superposition
• Superposition of structures with low sequence identity
Lecture 3.3 40
Definition
• Threading - A protein fold recognition technique that involves incrementally replacing the sequence of a known protein structure with a query sequence of unknown structure. The new “model” structure is evaluated using a simple heuristic measure of protein fold quality. The process is repeated against all known 3D structures until an optimal fit is found.
Lecture 3.3 41
Why Threading?
• Secondary structure is more conserved than primary structure
• Tertiary structure is more conserved than secondary structure
• Therefore very remote relationships can be better detected through 2° or 3° structural homology instead of sequence homology
Lecture 3.3 42
Visualizing Threading
TH
READ
THREADINGSEQNCEECNQESGNIERHTHREADINGSEQNCETHREADGSEQNCEQCQESGIDAERTHR...
Lecture 3.3 43
Visualizing Threading
TH
RE
THREADINGSEQNCEECNQESGNIERHTHREADINGSEQNCETHREADGSEQNCEQCQESGIDAERTHR...
Lecture 3.3 44
Visualizing Threading
TH
THREADINGSEQNCEECNQESGNIERHTHREADINGSEQNCETHREADGSEQNCEQCQESGIDAERTHR...
Lecture 3.3 45
Visualizing Threading
THREADINGSEQNCEECNQESGNIERHTHREADINGSEQNCETHREADGSEQNCEQCQESGIDAERTHR...
Lecture 3.3 46
Visualizing Threading
THREAD..SEQNCEECN..
Lecture 3.3 47
Threading
• Database of 3D structures and sequences
  – Protein Data Bank (or non-redundant subset)
• Query sequence
  – Sequence < 25% identity to known structures
• Alignment protocol
  – Dynamic programming
• Evaluation protocol
  – Distance-based potential or secondary structure
• Ranking protocol
Lecture 3.3 48
2 Kinds of Threading
• 2D Threading or Prediction Based Methods (PBM)
  – Predict secondary structure (SS) or ASA of query
  – Evaluate on basis of SS and/or ASA matches
• 3D Threading or Distance Based Methods (DBM)
  – Create a 3D model of the structure
  – Evaluate using a distance-based “hydrophobicity” or pseudo-thermodynamic potential
Lecture 3.3 49
2D Threading Algorithm
• Convert PDB to a database containing sequence, SS and ASA information
• Predict the SS and ASA for the query sequence using a “high-end” algorithm
• Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA)
• Rank the alignments and select the most probable fold
Lecture 3.3 50
Database Conversion
>Protein1
THREADINGSEQNCEECNQESGNI
HHHHHHCCCCEEEEECCCHHHHHH
ERHTHREADINGSEQNCETHREAD
HHCCEEEEECCCCCHHHHHHHHHH
>Protein2
QWETRYEWQEDFSHAECNQESGNI
EEEEECCCCHHHHHHHHHHHHHHH
YTREWQHGFDSASQWETRA
CCCCEEEEECCCEEEEECC
>Protein3
LKHGMNSNWEDFSHAECNQESG
EEECCEEEECCCEEECCCCCCC
Lecture 3.3 51
Secondary Structure
Phi & Psi angles for Regular Secondary Structure Conformations (Table 10)

Structure               Phi (φ)   Psi (ψ)
Antiparallel β-sheet     -139      +135
Parallel β-sheet         -119      +113
Right-handed α-helix      -64       -40
3₁₀ helix                 -49       -26
π helix                   -57       -70
Polyproline I             -83      +158
Polyproline II            -78      +149
Polyglycine II            -80      +150
Lecture 3.3 52
2° Structure Identification
• DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp)
• VADAR - Volume Area Dihedral Angle Reporter (redpoll.pharmacy.ualberta.ca)
• PDB - Protein Data Bank (www.rcsb.org)

QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCA
HHHHHHCCEEEEEEEEEEECCHHHHHHHCCCCCCC
Lecture 3.3 53
Accessible Surface Area
[Figure labels: Solvent Probe; Accessible Surface; Van der Waals Surface; Reentrant Surface]
Lecture 3.3 54
ASA Calculation
• DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp)
• VADAR - Volume Area Dihedral Angle Reporter (www.redpoll.pharmacy.ualberta.ca/vadar/)
• GetArea - www.scsb.utmb.edu/getarea/area_form.html

QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCAMD
BBPPBEEEEEPBPBPBPBBPEEEPBPEPEEEEEEEEE
1056298799415251510478941496989999999
Lecture 3.3 55
Other ASA sites
• Connolly Molecular Surface Home Page
  – http://www.biohedron.com/
• Naccess Home Page
  – http://sjh.bi.umist.ac.uk/naccess.html
• ASA Parallelization
  – http://cmag.cit.nih.gov/Asa.htm
• Protein Structure Database
  – http://www.psc.edu/biomed/pages/research/PSdb/
Lecture 3.3 56
2D Threading Algorithm
• Convert PDB to a database containing sequence, SS and ASA information
• Predict the SS and ASA for the query sequence using a “high-end” algorithm
• Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA)
• Rank the alignments and select the most probable fold
Lecture 3.3 57
ASA Prediction
• PredictProtein-PHDacc (58%)
  – http://cubic.bioc.columbia.edu/predictprotein
• PredAcc (70%?)
  – condor.urbb.jussieu.fr/PredAccCfg.html

QHTAW...
QHTAWCLTSEQHTAAVIW
BBPPBEEEEEPBPBPBPB
Lecture 3.3 58
2D Threading Algorithm
• Convert PDB to a database containing sequence, SS and ASA information
• Predict the SS and ASA for the query sequence using a “high-end” algorithm
• Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA)
• Rank the alignments and select the most probable fold
Lecture 3.3 59
G E N E T I C S
| | | | * | |
G E N E S I S

    G   E   N   E   T   I   C   S
G  10   0   0   0   0   0   0   0
E   0  10   0  10   0   0   0   0
N   0   0  10   0   0   0   0   0
E   0   0   0  10   0  10   0   0
S   0   0   0   0   0   0   0  10
I   0   0   0   0   0  10   0   0
S   0   0   0   0   0   0   0  10

    G   E   N   E   T   I   C   S
G  60  40  30  20  20   0  10   0
E  40  50  30  30  20   0  10   0
N  30  30  40  20  20   0  10   0
E  20  20  20  30  20  10  10   0
S  20  20  20  20  20   0  10  10
I  10  10  10  10  10  20  10   0
S   0   0   0   0   0   0   0  10
Lecture 3.3 60
Sij (Identity Matrix)
  A C D E F G H I K L M N P Q R S T V W Y
A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
L 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
M 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
N 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
P 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Q 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
W 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Lecture 3.3 61
  A A T V D
A 1
V
V
D

  A A T V D
A 1 1
V
V
D

  A A T V D
A 1 1 0 0 0
V
V
D

  A A T V D
A 1 1 0 0 0
V 0
V
D

  A A T V D
A 1 1 0 0 0
V 0 1 1
V
D

  A A T V D
A 1 1 0 0 0
V 0 1 1 2
V
D
Lecture 3.3 62
A Simple Example...
  A A T V D
A 1 1 0 0 0
V 0 1 1 2 1
V
D

  A A T V D
A 1 1 0 0 0
V 0 1 1 2 1
V 0 1 1 2 2
D 0 1 1 1 3

A A T V D
|   | | |
A - V V D

A A T V D
  | | | |
  A V V D

A A T V D
| |   | |
A V - V D
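The numbers in this example are consistent with a simple cumulative fill rule: each cell holds the match score for its residue pair plus the best value found anywhere in the rows above and columns to the left, with no gap penalty. The sketch below implements that rule as inferred from the slide's numbers; it is an assumption about the scheme, and the names `fill_matrix` and `identity` are made up for illustration.

```python
def identity(a, b):
    """Identity scoring: 1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0

def fill_matrix(row_seq, col_seq, score):
    """Fill H[i][j] = score(i, j) + best H[p][q] with p < i and q < j."""
    h = [[0] * len(col_seq) for _ in row_seq]
    for i, a in enumerate(row_seq):
        for j, b in enumerate(col_seq):
            # Best score of any alignment ending strictly above and to the left.
            best_prev = max((h[p][q] for p in range(i) for q in range(j)),
                            default=0)
            h[i][j] = score(a, b) + best_prev
    return h

H = fill_matrix("AVVD", "AATVD", identity)
best = max(max(row) for row in H)   # score of the best alignment
```

The filled matrix matches the completed table above, and the maximum (3) is the number of identical pairs in each of the three equally good alignments. Production alignment code would normally add gap penalties and a traceback step, which this sketch omits.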
Lecture 3.3 63
Let’s Include 2o info & ASA
$S^{strc}_{ij}$:
  H E C
H 1 0 0
E 0 1 0
C 0 0 1

$S^{asa}_{ij}$:
  E P B
E 1 0 0
P 0 1 0
B 0 0 1

$$S^{total}_{ij} = k_1 S^{seq}_{ij} + k_2 S^{strc}_{ij} + k_3 S^{asa}_{ij}$$
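The weighted sum can be sketched directly, using identity scoring for all three terms as in the matrices on this slide. The function names, the tuple layout and the default weights of 1 are assumptions made for illustration.

```python
def identity(a, b):
    """1 if the two states are identical, 0 otherwise."""
    return 1 if a == b else 0

def total_score(res_i, res_j, k1=1.0, k2=1.0, k3=1.0):
    """Combined score for two residues.

    Each residue is a hypothetical (amino acid, SS state, ASA state) tuple,
    with SS states H/E/C and ASA states E/P/B as on the slide.
    """
    seq_i, strc_i, asa_i = res_i
    seq_j, strc_j, asa_j = res_j
    return (k1 * identity(seq_i, seq_j)
            + k2 * identity(strc_i, strc_j)
            + k3 * identity(asa_i, asa_j))

full_match = total_score(("A", "E", "B"), ("A", "E", "B"))   # all three terms match
ss_only = total_score(("A", "E", "B"), ("V", "E", "P"))      # only the SS term matches
```

Raising k2 or k3 relative to k1 makes the alignment rely more on predicted structure than on sequence, which is the point of 2D threading for remote homologues.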
Lecture 3.3 64
    E E E C C
    A A T V D
E A 2
E V
C V
C D

    E E E C C
    A A T V D
E A 2 2
E V
C V
C D

    E E E C C
    A A T V D
E A 2 2 1 0 0
E V
C V
C D

    E E E C C
    A A T V D
E A 2 2 1 0 0
E V 1
C V
C D

    E E E C C
    A A T V D
E A 2 2 1 0 0
E V 1 3 3
C V
C D

    E E E C C
    A A T V D
E A 2 2 1 0 0
E V 1 3 3 3
C V
C D
Lecture 3.3 65
A Simple Example...
    E E E C C
    A A T V D
E A 2 2 1 0 0
E V 1 3 3 3 2
C V
C D

    E E E C C
    A A T V D
E A 2 2 1 0 0
E V 1 3 3 3 2
C V 0 2 3 5 4
C D 0 2 3 4 7

A A T V D
|   | | |
A - V V D

A A T V D
  | | | |
  A V V D

A A T V D
| |   | |
A V - V D
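The same cumulative fill, with the secondary structure term added to the sequence term, reproduces the numbers in this example. As before, the fill rule (pair score plus the best cell strictly above and to the left, no gap penalty) is inferred from the slide's numbers, the ASA term is left out because the slide does not show it, and all names and weights are assumptions made for illustration.

```python
def match(a, b):
    """1 if the two symbols are identical, 0 otherwise."""
    return 1 if a == b else 0

def fill_matrix(rows, cols, k1=1, k2=1):
    """rows and cols are lists of (residue, SS state) pairs."""
    h = [[0] * len(cols) for _ in rows]
    for i, (res_a, ss_a) in enumerate(rows):
        for j, (res_b, ss_b) in enumerate(cols):
            pair = k1 * match(res_a, res_b) + k2 * match(ss_a, ss_b)
            # Best score of any alignment ending strictly above and to the left.
            best_prev = max((h[p][q] for p in range(i) for q in range(j)),
                            default=0)
            h[i][j] = pair + best_prev
    return h

rows = list(zip("AVVD", "EECC"))     # sequence AVVD with states E E C C
cols = list(zip("AATVD", "EEECC"))   # sequence AATVD with states E E E C C
H = fill_matrix(rows, cols)
best = max(max(row) for row in H)
```

With both weights set to 1 the best score rises from 3 (sequence only) to 7, because the matching secondary structure states add to the identical residues along the alignment.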
Lecture 3.3 66
2D Threading Performance
• In test sets 2D threading methods can identify 30-40% of proteins having very remote homologues (i.e. not detected by BLAST) using “minimal” non-redundant databases (<700 proteins)
• If the database is expanded ~4x the performance jumps to 70-75%
• Performs best on true homologues as opposed to postulated analogues
Lecture 3.3 67
2D Threading Advantages
• Algorithm is easy to implement
• Algorithm is very fast (10x faster than 3D threading approaches)
• The 2D database is small (<500 kbytes) compared to 3D database (>1.5 Gbytes)
• Appears to be just as accurate as DBM or other 3D threading approaches
• Very amenable to web servers
Lecture 3.3 68
Servers - PredictProtein
Lecture 3.3 69
Servers - 123D
Lecture 3.3 70
Servers - GenThreader
Lecture 3.3 71
More Servers - www.bronco.ualberta.ca
Lecture 3.3 72
2D Threading Disadvantages
• Reliability is not 100% making most threading predictions suspect unless experimental evidence can be used to support the conclusion
• Does not produce a 3D model at the end of the process
• Doesn’t include all aspects of 2° and 3° structure features in prediction process
• PSI-BLAST may be just as good (faster too!)
Lecture 3.3 73
Making it Better
• Include 3D threading analysis as part of the 2D threading process -- offers another layer of information
• Include more information about the “coil” state (3-state prediction isn’t good enough)
• Include other biochemical (ligands, function, binding partners, motifs) or phylogenetic (origin, species) information
Lecture 3.3 74
3D Threading Servers
• Generate 3D models or coordinates of possible models based on input sequence
• Loopp (version 2)
  – http://ser-loopp.tc.cornell.edu/loopp.html
• 3D-PSSM
  – http://www.sbg.bio.ic.ac.uk/~3dpssm/
• All require email addresses since the process may take hours to complete