Superposition & Threading - Bioinformatics
Lecture 3.3 1
Superposition & Threading†
Gary Van Domselaar, University of Alberta
†Slides adapted from David Wishart
Lecture 3.3 2
Outline
• Vectors, matrices and other geometry issues
• General Superposition concepts
• Threading and threading methods
Lecture 3.3 3
Vectors Define Bonds and Atomic Positions
[Figure: a molecule drawn in x, y, z axes; surviving labels: H3N+, O, O, R, H, Origin, CO bond]
Lecture 3.3 4
Review - Vectors
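[Figure: vector u drawn from (0,0,0) to (1,2,1) in x, y, z axes]
$\mathbf{u} = 1\hat{\imath} + 2\hat{\jmath} + 1\hat{k} = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$
$|\mathbf{u}| = \sqrt{(1-0)^2 + (2-0)^2 + (1-0)^2} = \sqrt{6}$
Vectors have a length & a direction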
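The length calculation on this slide can be sketched in a few lines of Python. The function name `vector_length` is an illustrative choice, not something from the lecture; it simply applies the distance formula to the tail and head of the vector.

```python
import math

def vector_length(tail, head):
    """Length of the vector running from point `tail` to point `head`."""
    return math.sqrt(sum((h - t) ** 2 for t, h in zip(tail, head)))

# The slide's example: u runs from the origin to (1, 2, 1).
u_length = vector_length((0, 0, 0), (1, 2, 1))
print(u_length)  # sqrt(6), roughly 2.449
```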
Lecture 3.3 5
Review - Vectors
• Vectors can be added together
• Vectors can be subtracted
• Vectors can be multiplied (dot or cross or by a matrix)
• Vectors can be transformed (resized)
• Vectors can be translated
• Vectors can be rotated
Lecture 3.3 6
Matrices
• A matrix is a table or “array” of characters
• A matrix is also called a tensor of “rank 2”
2 4 6 8 9 4
1 3 5 7 9 3
1 0 1 0 1 0
9 4 6 4 3 5
3 4 3 4 3 4
A 5 x 6 Matrix (# rows x # columns)
Lecture 3.3 7
Different Types of Matrices
A square Matrix
2 4 6 8 9 4
1 3 5 7 9 3
1 0 1 0 1 0
9 4 6 4 3 5
3 4 3 4 3 4
3 6 7 9 1 0
A symmetric Matrix
2 4 6 8 9 4
4 3 5 7 9 3
6 5 1 0 1 0
8 7 0 4 3 5
9 9 1 3 3 4
4 3 0 5 4 0
A column Matrix (A vector)
1
3
5
9
7
3
Lecture 3.3 8
Different Types of Matrices
A rectangular Matrix
A B C D E F
G H I J K L
M N O P Q R
S T U V W X
A rotation Matrix
 cosθ  sinθ  0
-sinθ  cosθ  0
  0     0    1
A row Matrix (A vector)
2 4 6 8 9
Lecture 3.3 9
Review - Matrix Multiplication
A =
2 4 0
1 3 1
1 0 0
B =
1 0 2
2 1 3
0 1 0
A x B =
2x1 + 4x2 + 0x0   2x0 + 4x1 + 0x1   2x2 + 4x3 + 0x0
1x1 + 3x2 + 1x0   1x0 + 3x1 + 1x1   1x2 + 3x3 + 1x0
1x1 + 0x2 + 0x0   1x0 + 0x1 + 0x1   1x2 + 0x3 + 0x0
=
10  4 16
 7  4 11
 1  0  2
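As a quick check of the row-by-column rule, here is a minimal Python sketch that multiplies the two 3 x 3 matrices from this slide. The helper name `matmul` is hypothetical; each entry is the sum of products of a row of the first matrix with a column of the second, so the bottom-right entry works out to 1x2 + 0x3 + 0x0 = 2.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 4, 0], [1, 3, 1], [1, 0, 0]]
B = [[1, 0, 2], [2, 1, 3], [0, 1, 0]]
product = matmul(A, B)
print(product)  # [[10, 4, 16], [7, 4, 11], [1, 0, 2]]
```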
Lecture 3.3 10
Rotation
Rotate about x (by θ):
1    0     0
0   cosθ  sinθ
0  -sinθ  cosθ
Rotate about z (by φ):
 cosφ  sinφ  0
-sinφ  cosφ  0
  0     0    1
[Figure: x, y, z axes with θ marked about the x axis and φ marked about the z axis]
Lecture 3.3 11
Rotation
Clockwise about x:
1    0     0
0   cosθ  sinθ
0  -sinθ  cosθ
Clockwise about z:
 cosφ  sinφ  0
-sinφ  cosφ  0
  0     0    1
Counterclockwise about x:
1    0     0
0   cosθ -sinθ
0   sinθ  cosθ
Counterclockwise about z:
 cosφ -sinφ  0
 sinφ  cosφ  0
  0     0    1
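The four matrices can be written as small Python functions, which makes it easy to confirm that a counterclockwise rotation undoes the clockwise one. This is only a sketch: the function names and the 30 degree test angle are illustrative assumptions, and the sign convention is copied from the slide.

```python
import math

def rot_x_clockwise(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]

def rot_x_counterclockwise(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z_clockwise(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

def rot_z_counterclockwise(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(matrix, point):
    """Multiply a 3 x 3 matrix by a column vector (x, y, z)."""
    return [sum(m * p for m, p in zip(row, point)) for row in matrix]

theta = math.radians(30)  # arbitrary test angle
p = [1.0, 1.0, 1.0]
rotated = apply(rot_x_clockwise(theta), p)
# rotated is (1, cos(theta) + sin(theta), -sin(theta) + cos(theta))
restored = apply(rot_x_counterclockwise(theta), rotated)
# restored equals p again, up to floating-point rounding
```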
Lecture 3.3 12
Rotation
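$X' = \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$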
Lecture 3.3 13
Rotation (Detail)
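$X' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ \cos\theta + \sin\theta \\ -\sin\theta + \cos\theta \end{pmatrix}$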
Lecture 3.3 14
Superposition
• Objective is to match or overlay 2 or more similar objects
• Requires use of translation and rotation operators (matrices/vectors)
• Recall that every three-dimensional object can be represented by a plane defined by 3 points
Lecture 3.3 15
Superposition
[Figure: triangles a, b, c and a’, b’, c’ drawn in x, y, z axes]
Identify 3 “equivalence” points in objects to be aligned
Lecture 3.3 16
Superposition
[Figure: triangles a, b, c and a’, b’, c’ in x, y, z axes, before and after the translation]
Translate points a,b,c and a’,b’,c’ to origin
Lecture 3.3 17
Superposition
[Figure: triangles a, b, c and a’, b’, c’ in x, y, z axes, with angle θ marked]
Rotate the a,b,c plane clockwise by θ about x axis
Lecture 3.3 18
Superposition
[Figure: triangles a, b, c and a’, b’, c’ in x, y, z axes, with angle φ marked]
Rotate the a,b,c plane clockwise by φ about z axis
Lecture 3.3 19
Superposition
[Figure: triangles a, b, c and a’, b’, c’ in x, y, z axes, with angle ψ marked]
Rotate the a,b,c plane clockwise by ψ about x axis
Lecture 3.3 20
Superposition
[Figure: triangles a, b, c and a’, b’, c’ in x, y, z axes, with angle θ’ marked]
Rotate the a’,b’,c’ plane anticlockwise by θ’ about x axis
Lecture 3.3 21
Superposition
[Figure: triangles a, b, c and a’, b’, c’ in x, y, z axes, with angle φ’ marked]
Rotate the a’,b’,c’ plane anticlockwise by φ’ about z axis
Lecture 3.3 22
Superposition
[Figure: triangles a, b, c and a’, b’, c’ in x, y, z axes, with angle ψ’ marked]
Rotate the a’,b’,c’ plane clockwise by ψ’ about x axis
Lecture 3.3 23
Superposition
Apply all rotations and translations to remaining points
[Figure: triangles a, b, c and a’, b’, c’ overlaid in x, y, z axes]
Lecture 3.3 24
Superposition
[Figure, two panels: "Before" shows triangles a, b, c and a’, b’, c’ apart in x, y, z axes; "After" shows them superimposed]
Lecture 3.3 25
Returning to the “red” frame
[Figure, two panels labelled "Before" and "After": triangles a, b, c and a’, b’, c’ in x, y, z axes]
Lecture 3.3 26
Returning to the “red” frame
• Begin with the superimposed structures on the x-y plane
• Apply counterclockwise rotation by ψ
• Apply counterclockwise rotation by φ
• Apply counterclockwise rotation by θ
• Apply red translation to red origin
Just do things in reverse order!
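A minimal Python sketch of "doing things in reverse order", assuming the forward path is the one walked through on the earlier slides (translate to the origin, then clockwise rotations by θ about x, φ about z and ψ about x). The function names, the sample point and the angles are all illustrative; a counterclockwise rotation is obtained here by negating the angle of the clockwise matrix.

```python
import math

def rot_x(angle):
    """Clockwise rotation about x; pass a negative angle for counterclockwise."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]

def rot_z(angle):
    """Clockwise rotation about z; pass a negative angle for counterclockwise."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

def apply(matrix, p):
    return [sum(m * x for m, x in zip(row, p)) for row in matrix]

def to_common_frame(p, origin, theta, phi, psi):
    """Forward path: translate to the origin, then rotate theta, phi, psi."""
    p = [x - o for x, o in zip(p, origin)]
    for m in (rot_x(theta), rot_z(phi), rot_x(psi)):
        p = apply(m, p)
    return p

def to_original_frame(p, origin, theta, phi, psi):
    """Reverse path: undo psi, phi, theta in that order, then translate back."""
    for m in (rot_x(-psi), rot_z(-phi), rot_x(-theta)):
        p = apply(m, p)
    return [x + o for x, o in zip(p, origin)]

start = [3.0, -1.0, 2.5]                      # hypothetical atom position
origin = [1.0, 2.0, 3.0]                      # hypothetical reference point
angles = (math.radians(20), math.radians(75), math.radians(-40))
moved = to_common_frame(start, origin, *angles)
back = to_original_frame(moved, origin, *angles)
```

Because each rotation is undone by its own inverse, `back` agrees with `start` up to rounding; applying the inverses in the forward order instead would not return the original point in general.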
Lecture 3.3 27
Superposition - Applications
• Ideal for comparing or overlaying two or more protein structures
• Allows identification of structural homologues (CATH and SCOP)
• Allows loops to be inserted or replaced from loop libraries (comparative modelling)
• Allows side chains to be replaced or inserted with relative ease
Lecture 3.3 28
Side Chain Placement
http://www.fccc.edu/research/labs/dunbrack/scwrl/
SCWRL
Lecture 3.3 29
Amino Acid Side Chains
[Figure: amino acid structure; surviving labels: C, COOH, H2N, H, NH3+]
Lecture 3.3 30
Adding a Side Chain
[Figure sequence, slides 30-34: animation frames showing x, y, z coordinate axes; no other text survives]
Lecture 3.3 35
Superposition
• The concept of superposition is key to many aspects of protein structure generation and comparison
• Superposition may be used to insert side chains and loops (for homology models)
• Side chains require more consideration as side chain packing ultimately determines the 3D structure of proteins
Lecture 3.3 36
Superposition - RMSD
• The degree of similarity between two or more structures is described by their root mean square deviation (RMSD):
$\mathrm{RMSD}(N; x, y) = \sqrt{\frac{\sum_{i=1}^{N} \lVert x_i - y_i \rVert^2}{N}}$
[Figure: five equivalent point pairs x1..x5 and y1..y5, with separations labelled 0, 0.3, 0.8, 0.7 and 1]
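The RMSD definition translates directly into code. The sketch below is illustrative only: `rmsd` takes two equal-length lists of already superimposed (x, y, z) coordinates, and `rmsd_from_distances` is a hypothetical helper for the case where only the pairwise separations are known, as with the five values 0, 0.3, 0.8, 0.7 and 1 in the slide's figure.

```python
import math

def rmsd(xs, ys):
    """RMSD between two equal-length lists of (x, y, z) points."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty point lists of equal length")
    total = sum(sum((a - b) ** 2 for a, b in zip(x, y))
                for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))

def rmsd_from_distances(distances):
    """RMSD when only the pairwise distances |x_i - y_i| are known."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

figure_rmsd = rmsd_from_distances([0, 0.3, 0.8, 0.7, 1])
print(round(figure_rmsd, 3))  # 0.666
```

Note that RMSD is only meaningful after the structures have been superimposed and the equivalent points paired up.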
Lecture 3.3 37
Superposition Software
• Swiss PDB Viewer
– Aligns 2 homologous structures
Lecture 3.3 38
Superposition Software
• CE: Structure Comparison by Combinatorial Extension
• http://cl.sdsc.edu/ce.html
• Superposition for 2 chains and for multiple chains (new)
Lecture 3.3 39
Superposition Software
• SuperPose
• http://wishart.biology.ualberta.ca/SuperPose/
• Superposition for 2 chains and for multiple chains
• Subdomain superposition
• Superposition of structures with low sequence identity
Lecture 3.3 40
Definition
• Threading - A protein fold recognition technique that involves incrementally replacing the sequence of a known protein structure with a query sequence of unknown structure. The new “model” structure is evaluated using a simple heuristic measure of protein fold quality. The process is repeated against all known 3D structures until an optimal fit is found.
Lecture 3.3 41
Why Threading?
• Secondary structure is more conserved than primary structure
• Tertiary structure is more conserved than secondary structure
• Therefore very remote relationships can be better detected through 2° or 3° structural homology instead of sequence homology
Lecture 3.3 42
Visualizing Threading
TH
READ
THREADINGSEQNCEECNQESGNIERHTHREADINGSEQNCETHREADGSEQNCEQCQESGIDAERTHR...
Lecture 3.3 43
Visualizing Threading
TH
RE
THREADINGSEQNCEECNQESGNIERHTHREADINGSEQNCETHREADGSEQNCEQCQESGIDAERTHR...
Lecture 3.3 44
Visualizing Threading
TH
THREADINGSEQNCEECNQESGNIERHTHREADINGSEQNCETHREADGSEQNCEQCQESGIDAERTHR...
Lecture 3.3 45
Visualizing Threading
THREADINGSEQNCEECNQESGNIERHTHREADINGSEQNCETHREADGSEQNCEQCQESGIDAERTHR...
Lecture 3.3 46
Visualizing Threading
THREAD..SEQNCEECN..
Lecture 3.3 47
Threading
• Database of 3D structures and sequences
– Protein Data Bank (or non-redundant subset)
• Query sequence
– Sequence < 25% identity to known structures
• Alignment protocol
– Dynamic programming
• Evaluation protocol
– Distance-based potential or secondary structure
• Ranking protocol
Lecture 3.3 48
2 Kinds of Threading
• 2D Threading or Prediction Based Methods (PBM)
– Predict secondary structure (SS) or ASA of query
– Evaluate on basis of SS and/or ASA matches
• 3D Threading or Distance Based Methods (DBM)
– Create a 3D model of the structure
– Evaluate using a distance-based “hydrophobicity” or pseudo-thermodynamic potential
Lecture 3.3 49
2D Threading Algorithm
• Convert PDB to a database containing sequence, SS and ASA information
• Predict the SS and ASA for the query sequence using a “high-end” algorithm
• Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA)
• Rank the alignments and select the most probable fold
Lecture 3.3 50
Database Conversion
>Protein1
THREADINGSEQNCEECNQESGNI
HHHHHHCCCCEEEEECCCHHHHHH
ERHTHREADINGSEQNCETHREAD
HHCCEEEEECCCCCHHHHHHHHHH
>Protein2
QWETRYEWQEDFSHAECNQESGNI
EEEEECCCCHHHHHHHHHHHHHHH
YTREWQHGFDSASQWETRA
CCCCEEEEECCCEEEEECC
>Protein3
LKHGMNSNWEDFSHAECNQESG
EEECCEEEECCCEEECCCCCCC
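A converted database of this kind is easy to read back into memory. The sketch below assumes a layout in which each record is a `>name` header followed by alternating lines of sequence and secondary-structure codes (H, E, C); that layout and the function name `parse_ss_database` are assumptions made for illustration, not a documented file format.

```python
def parse_ss_database(text):
    """Return {name: (sequence, secondary_structure)} from '>name' records.

    Assumed layout: after each header, lines alternate between a stretch of
    sequence and the matching stretch of H/E/C secondary-structure codes.
    """
    blocks = {}
    name = None
    for line in text.split():
        if line.startswith(">"):
            name = line[1:]
            blocks[name] = []
        elif name is not None:
            blocks[name].append(line)
    records = {}
    for name, lines in blocks.items():
        seq = "".join(lines[0::2])   # 1st, 3rd, ... lines: residues
        ss = "".join(lines[1::2])    # 2nd, 4th, ... lines: SS codes
        if len(seq) != len(ss):
            raise ValueError(f"{name}: sequence and SS lengths differ")
        records[name] = (seq, ss)
    return records

example = """>Protein3
LKHGMNSNWEDFSHAECNQESG
EEECCEEEECCCEEECCCCCCC
"""
db = parse_ss_database(example)
```

Keeping the sequence and its structure string the same length means position i in one always describes residue i in the other, which is what the alignment step relies on.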
Lecture 3.3 51
Secondary Structure
Structure               Phi (Φ)   Psi (Ψ)
Antiparallel β-sheet     -139      +135
Parallel β-sheet         -119      +113
Right-handed α-helix      -64       -40
3₁₀ helix                 -49       -26
π helix                   -57       -70
Polyproline I             -83      +158
Polyproline II            -78      +149
Polyglycine II            -80      +150
Table 10: Phi & Psi angles (in degrees) for Regular Secondary Structure Conformations
Lecture 3.3 52
2° Structure Identification
• DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp)
• VADAR - Volume Area Dihedral Angle Reporter (redpoll.pharmacy.ualberta.ca)
• PDB - Protein Data Bank (www.rcsb.org)
QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCA
HHHHHHCCEEEEEEEEEEECCHHHHHHHCCCCCCC
Lecture 3.3 53
Accessible Surface Area
[Figure labels: Solvent Probe, Accessible Surface, Van der Waals Surface, Reentrant Surface]
Lecture 3.3 54
ASA Calculation
• DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp)
• VADAR - Volume Area Dihedral Angle Reporter (www.redpoll.pharmacy.ualberta.ca/vadar/)
• GetArea - www.scsb.utmb.edu/getarea/area_form.html
QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCAMD
BBPPBEEEEEPBPBPBPBBPEEEPBPEPEEEEEEEEE
1056298799415251510478941496989999999
Lecture 3.3 55
Other ASA sites
• Connolly Molecular Surface Home Page
– http://www.biohedron.com/
• Naccess Home Page
– http://sjh.bi.umist.ac.uk/naccess.html
• ASA Parallelization
– http://cmag.cit.nih.gov/Asa.htm
• Protein Structure Database
– http://www.psc.edu/biomed/pages/research/PSdb/
Lecture 3.3 57
ASA Prediction
• PredictProtein-PHDacc (58%)
– http://cubic.bioc.columbia.edu/predictprotein
• PredAcc (70%?)
– condor.urbb.jussieu.fr/PredAccCfg.html
QHTAW...
QHTAWCLTSEQHTAAVIW
BBPPBEEEEEPBPBPBPB
Lecture 3.3 59
G E N E T I C S
| | | | * | |
G E N E S I S
    G   E   N   E   T   I   C   S
G  10   0   0   0   0   0   0   0
E   0  10   0  10   0   0   0   0
N   0   0  10   0   0   0   0   0
E   0   0   0  10   0  10   0   0
S   0   0   0   0   0   0   0  10
I   0   0   0   0   0  10   0   0
S   0   0   0   0   0   0   0  10

    G   E   N   E   T   I   C   S
G  60  40  30  20  20   0  10   0
E  40  50  30  30  20   0  10   0
N  30  30  40  20  20   0  10   0
E  20  20  20  30  20  10  10   0
S  20  20  20  20  20   0  10  10
I  10  10  10  10  10  20  10   0
S   0   0   0   0   0   0   0  10
Lecture 3.3 60
Sij (Identity Matrix)
  A C D E F G H I K L M N P Q R S T V W Y
A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
L 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
M 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
N 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
P 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Q 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
W 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Lecture 3.3 61
  A A T V D
A 1
V
V
D

  A A T V D
A 1 1
V
V
D

  A A T V D
A 1 1 0 0 0
V
V
D

  A A T V D
A 1 1 0 0 0
V 0
V
D

  A A T V D
A 1 1 0 0 0
V 0 1 1
V
D

  A A T V D
A 1 1 0 0 0
V 0 1 1 2
V
D
Lecture 3.3 62
A Simple Example...
  A A T V D
A 1 1 0 0 0
V 0 1 1 2 1
V
D

  A A T V D
A 1 1 0 0 0
V 0 1 1 2 1
V 0 1 1 2 2
D 0 1 1 1 3

A A T V D
|   | | |
A - V V D

A A T V D
  | | | |
  A V V D

A A T V D
| |   | |
A V - V D
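The numbers in this example are consistent with a simple fill rule: each cell holds its own match score plus the best value found anywhere above and to the left of it, with no gap penalty. That rule is inferred from the slide's matrix rather than stated in the lecture, so treat the Python sketch below as one possible reading; `fill_matrix` and `identity` are hypothetical names, and the traceback that produces the alignments is left out.

```python
def identity(a, b):
    """Identity scoring: 1 for a match, 0 otherwise."""
    return 1 if a == b else 0

def fill_matrix(template, query, score=identity):
    """Cumulative score matrix, rows = template residues, columns = query.

    h[i][j] = score(template[i], query[j]) + max(h[k][l] for k < i, l < j),
    where the max over an empty set counts as 0.
    """
    rows, cols = len(template), len(query)
    h = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            best_prev = max((h[k][l] for k in range(i) for l in range(j)),
                            default=0)
            h[i][j] = score(template[i], query[j]) + best_prev
    return h

matrix = fill_matrix("AVVD", "AATVD")
for row in matrix:
    print(row)
# [1, 1, 0, 0, 0]
# [0, 1, 1, 2, 1]
# [0, 1, 1, 2, 2]
# [0, 1, 1, 1, 3]
```

The bottom-right value of 3 is the best total score; the three alignments shown on the slide are the different paths through the matrix that reach it.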
Lecture 3.3 63
Let’s Include 2° info & ASA
$S_{ij}^{strc}$:
  H E C
H 1 0 0
E 0 1 0
C 0 0 1
$S_{ij}^{asa}$:
  E P B
E 1 0 0
P 0 1 0
B 0 0 1
$S_{ij}^{total} = k_1 S_{ij}^{seq} + k_2 S_{ij}^{strc} + k_3 S_{ij}^{asa}$
Lecture 3.3 64
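Query sequence A A T V D has secondary structure E E E C C (top); template A V V D has E E C C (side).

     E E E C C
     A A T V D
E A  2
E V
C V
C D

     E E E C C
     A A T V D
E A  2 2
E V
C V
C D

     E E E C C
     A A T V D
E A  2 2 1 0 0
E V
C V
C D

     E E E C C
     A A T V D
E A  2 2 1 0 0
E V  1
C V
C D

     E E E C C
     A A T V D
E A  2 2 1 0 0
E V  1 3 3
C V
C D

     E E E C C
     A A T V D
E A  2 2 1 0 0
E V  1 3 3 3
C V
C D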
Lecture 3.3 65
A Simple Example...
     E E E C C
     A A T V D
E A  2 2 1 0 0
E V  1 3 3 3 2
C V
C D

     E E E C C
     A A T V D
E A  2 2 1 0 0
E V  1 3 3 3 2
C V  0 2 3 5 4
C D  0 2 3 4 7

A A T V D
|   | | |
A - V V D

A A T V D
  | | | |
  A V V D

A A T V D
| |   | |
A V - V D
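The same cumulative fill reproduces this matrix once the per-cell score combines sequence identity with secondary-structure identity. The Python sketch below assumes weights k1 = k2 = 1 and leaves out the ASA term, because the slide's example carries no ASA labels; the fill rule itself (own score plus the best value above and to the left, no gap penalty) is inferred from the numbers, and all function names are illustrative.

```python
def total_score(t_res, t_ss, q_res, q_ss, k1=1, k2=1):
    """Weighted sum of sequence identity and secondary-structure identity."""
    return k1 * int(t_res == q_res) + k2 * int(t_ss == q_ss)

def fill_matrix_2d(t_seq, t_ss, q_seq, q_ss, k1=1, k2=1):
    """Cumulative matrix: own score plus best value above and to the left."""
    rows, cols = len(t_seq), len(q_seq)
    h = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            best_prev = max((h[k][l] for k in range(i) for l in range(j)),
                            default=0)
            own = total_score(t_seq[i], t_ss[i], q_seq[j], q_ss[j], k1, k2)
            h[i][j] = own + best_prev
    return h

# Template A V V D (E E C C) against query A A T V D (E E E C C).
matrix_2d = fill_matrix_2d("AVVD", "EECC", "AATVD", "EEECC")
for row in matrix_2d:
    print(row)
# [2, 2, 1, 0, 0]
# [1, 3, 3, 3, 2]
# [0, 2, 3, 5, 4]
# [0, 2, 3, 4, 7]
```

Raising k2 relative to k1 makes the alignment follow predicted structure more closely than residue identity, which is the point of threading a sequence with low identity to the template.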
Lecture 3.3 66
2D Threading Performance
• In test sets 2D threading methods can identify 30-40% of proteins having very remote homologues (i.e. not detected by BLAST) using “minimal” non-redundant databases (<700 proteins)
• If the database is expanded ~4x the performance jumps to 70-75%
• Performs best on true homologues as opposed to postulated analogues
Lecture 3.3 67
2D Threading Advantages
• Algorithm is easy to implement
• Algorithm is very fast (10x faster than 3D threading approaches)
• The 2D database is small (<500 kbytes) compared to 3D database (>1.5 Gbytes)
• Appears to be just as accurate as DBM or other 3D threading approaches
• Very amenable to web servers
Lecture 3.3 68
Servers - PredictProtein
Lecture 3.3 69
Servers - 123D
Lecture 3.3 70
Servers - GenThreader
Lecture 3.3 71
More Servers - www.bronco.ualberta.ca
Lecture 3.3 72
2D Threading Disadvantages
• Reliability is not 100% making most threading predictions suspect unless experimental evidence can be used to support the conclusion
• Does not produce a 3D model at the end of the process
• Doesn’t include all aspects of 2° and 3° structure features in prediction process
• PSI-BLAST may be just as good (faster too!)
Lecture 3.3 73
Making it Better
• Include 3D threading analysis as part of the 2D threading process -- offers another layer of information
• Include more information about the “coil” state (3-state prediction isn’t good enough)
• Include other biochemical (ligands, function, binding partners, motifs) or phylogenetic (origin, species) information
Lecture 3.3 74
3D Threading Servers
• Generate 3D models or coordinates of possible models based on input sequence
• Loopp (version 2)
– http://ser-loopp.tc.cornell.edu/loopp.html
• 3D-PSSM
– http://www.sbg.bio.ic.ac.uk/~3dpssm/
• All require email addresses since the process may take hours to complete