2009 CSBB Lab New Student Training

Protein structure concepts and related computational problems. Speaker: Chia Han Chu (PhD candidate), NTHU CSBB Lab, 21/07/2009.

Transcript of 2009 CSBB Lab New Student Training

Page 1

Protein structure concepts and related computational problems

Speaker: Chia Han Chu (PhD candidate)

21/07/2009, NTHU CSBB Lab

Page 2

What are proteins made of?
• The parts of a protein: backbone and side chain

[Figure labels: H, OH. “Backbone”: N, C, C, N, C, C…; R: “side chain”]

Page 3

What are proteins made of?
• By varying the R group, twenty amino acids can be formed. They are grouped according to the chemical and physical properties (e.g. size) of R.

Page 4

What are proteins made of?
• A peptide is a substance intermediate in size between an amino acid (a.a. for short) and a protein.
• The smallest molecule is an amino acid and the largest is a protein.
• Two or more amino acids form a peptide through peptide bond formation, with removal of water.
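The "removal of water" in peptide bond formation can be checked with simple arithmetic: each peptide bond removes one water molecule from the combined mass of the free amino acids. The sketch below is only an illustration; the mass values are standard average molecular masses that are not given on the slide, and the function name `peptide_mass` is made up for this example.

```python
# Average molecular masses in g/mol (textbook values, not taken from the slides).
AMINO_ACID_MASS = {"Gly": 75.07, "Ala": 89.09, "Ser": 105.09}
WATER_MASS = 18.02


def peptide_mass(residues):
    """Mass of a linear peptide: free amino acid masses minus one water per peptide bond."""
    if not residues:
        raise ValueError("a peptide needs at least one residue")
    n_bonds = len(residues) - 1  # n residues are joined by n - 1 peptide bonds
    return sum(AMINO_ACID_MASS[r] for r in residues) - n_bonds * WATER_MASS


if __name__ == "__main__":
    print(f"Gly-Ala dipeptide: {peptide_mass(['Gly', 'Ala']):.2f} g/mol")
```

A single residue has no peptide bond, so no water is subtracted; a tripeptide loses two waters.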

Page 5

What are proteins made of?
• Dipeptide and peptide bond

[Figure labels: carboxyl group, amino group; dehydration (loss of water)]

Page 6

What is protein structure?
• Proteins are linear polymers that fold up by themselves… mostly.

Page 7

What is protein structure?
• Quaternary structures
  – Proteins that comprise more than one polypeptide chain
  – Each polypeptide chain in such a protein is called a subunit
  – Example: hemoglobin

Page 8

What are the main secondary structures?
• The alpha helix (α-helix), a common motif in the secondary structure of proteins, is a coiled conformation that can be right- or left-handed; in proteins it is almost always right-handed.
• 3.6 amino acid residues per turn
• The backbone C=O of residue i hydrogen bonds to the backbone N–H of residue i+4: O(i) → N(i+4)

Source: Wikipedia
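The two numbers on this slide (3.6 residues per turn, O(i) bonded to N(i+4)) can be turned into a small sketch. This is an illustration only: the function names are made up for this example, and residues are numbered from 1.

```python
RESIDUES_PER_TURN = 3.6  # value quoted on the slide


def helix_hbonds(n_residues, offset=4):
    """Backbone hydrogen-bond pairs (i, i + offset) in an ideal helix, numbered from 1."""
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


def rotation_per_residue(residues_per_turn=RESIDUES_PER_TURN):
    """Degrees of rotation about the helix axis contributed by each residue."""
    return 360.0 / residues_per_turn


if __name__ == "__main__":
    print(helix_hbonds(8))
    print(round(rotation_per_residue(), 1))
```

A helix shorter than five residues cannot form any i → i+4 bond, which is why the function returns an empty list in that case.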

Page 9

What are the main secondary structures?
• A beta strand (β-strand) is a stretch of amino acids, typically 5–10 residues long, whose peptide backbone is almost fully extended.
• The β sheet (also β-pleated sheet) is the second form of regular secondary structure in proteins. It consists of beta strands connected laterally by three or more hydrogen bonds, forming a generally twisted, pleated sheet.

Picture source: Wikipedia

Page 10

What are the main secondary structures?
• Parallel and anti-parallel sheets

[Figure panels: Parallel, Anti-parallel]

Page 11

What are the main secondary structures?
• Loops
• Connect the secondary structure elements (helix or strand)
• Have various lengths and shapes
• Are located at the surface of the folded protein and therefore may have an important role in biological recognition processes
• Proteins that are evolutionarily related have the same helices and sheets but may vary in loop structures

Figure 2.8, Branden & Tooze

Page 12

What are the super-secondary structures?
• Simple combinations of secondary structural elements, called motifs or supersecondary structures

[Figure panels: Beta hairpin, Beta-alpha-beta unit, Helix hairpin]

Page 13

What are the super-secondary structures?
• Assemblies of secondary structure elements that are shared by many structures

β hairpin

Page 14

What are the super-secondary structures?
• Assemblies of secondary structure elements that are shared by many structures

Greek key

Page 15

What are the super-secondary structures?
• Assemblies of secondary structure elements that are shared by many structures

β-α-β: found in almost every protein structure with a parallel β-sheet

Page 16

What is a protein domain?
• A protein domain is a part of a protein's sequence and structure that can evolve, function, and exist independently of the rest of the protein chain.
• Each domain forms a compact three-dimensional structure and can often be independently stable and folded.
• One domain may appear in a variety of evolutionarily related proteins.
• Domains vary in length from about 25 a.a. up to 500 a.a.

[Figure: pyruvate kinase, a protein with three domains (PDB 1pkn), labeled Domain 1, Domain 2, Domain 3. Picture source: Wikipedia]

Page 17

What is a protein domain?
• Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin.
• The EF hand is a helix-loop-helix structural domain found in a large family of calcium-binding proteins.
• Another example is the protein parvalbumin, which contains three such motifs and is probably involved in muscle relaxation via its calcium-binding activity.

[Figure: calmodulin with four EF-hand motifs; each loop region is usually about 12 amino acids. Picture source: Wikipedia]

Page 18

What is a protein domain?
• Because domains are independently stable, they can be "swapped" between one protein and another by genetic engineering to make chimeric proteins.

[Figure: BS-RNase. Picture source: "3D domain swapping: A mechanism for oligomer assembly", Protein Science (1995)]

Page 19

General concepts for structural bioinformatics

[Diagram relating Sequence, Structure and Function; labels: Analysis, Classification, Prediction, Modelling, Design, Engineering]

Page 20

Structure Databases
• Original database: PDB
  – There is only one central repository for experimentally determined macromolecular structures: the Protein Data Bank (PDB)
  – Established 1971
  – Walter Hamilton @ Brookhaven
  – 7 structures
  – “PDB format”
  – Magnetic tape distribution

Page 21

Other primary structure databases
• NDB – Nucleic acid Data Base
  – Most structures also in PDB
• BMRB – BioMagResBank
  – Experimental NMR data
  – Joined wwPDB in 2006
• CSD – Cambridge Structural Database
  – Small molecules, including some peptides and antibiotics
  – You have to pay to use it!

Page 22

Structure Databases
• PDB accepts experimental structures of “biopolymers”
• When is a biomolecule big enough?
  – Polypeptides: > 23 residues
  – Polynucleotides: > 3 residues ??
  – Polysaccharides: > 3 sugar residues
  – Fibers (only repeating unit deposited)
• Where do smaller molecules go?
  – Deposit at the Cambridge Crystallographic Data Centre (CCDC) or NDB

Page 23

Structure Databases
• International effort
  – Curated by RCSB (USA), PDBe (EBI-MSD; Europe) and PDBj (Japan), plus BMRB (USA) for NMR data
• > 58,000 structures (July 2009)
• Distributed over the internet
• Updated daily
• “The PDB” = FTP archive of files in the “flat” PDB file format

Page 24

Structure Databases

Page 25

Structure Databases
• Redundancy
  – There are > 58,000 structures (July 2009)
  – There are > 120,000 chains
    • Multiple copies per entry (e.g. dimers, viruses)
  – However, there are only ~8,600 unique proteins. Why?
    • Non-protein entries (DNA, RNA, carbohydrates, antibiotics)
    • Different laboratories
    • Complexes
    • Mutants
    • Paralogs and orthologs

Page 26

Structure Databases
• To err is human...
  – Experimental structures
    • May contain errors!
    • Need for validation!

Page 27

Structure Databases
• PDB files

Page 28

Structure Databases
• PDB files
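As a rough illustration of what the coordinate section of a PDB file contains, the sketch below parses a single ATOM record. It assumes the standard fixed-column layout of the PDB format (serial number in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54). The function names are hypothetical, and the example record is built in code rather than copied from a real entry.

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record using fixed column positions; return None otherwise."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal ATOM record (up to the coordinates) for demonstration purposes."""
    return (
        f"ATOM  {serial:>5}  {name:<3} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


if __name__ == "__main__":
    record = make_atom_line(1, "CA", "MET", "A", 1, 27.340, 24.430, 2.614)
    print(record)
    print(parse_atom_line(record))
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real PDB files can touch each other with no space in between.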

Page 29

Structure Databases
• Other formats
  – PDB format is not compatible with modern database technology
  – Internally, wwPDB uses
    • ORACLE for web services
  – Exchange formats
    • mmCIF – macromolecular Crystallographic Information File
    • XML – eXtensible Markup Language

Page 30

Structure Databases
• wwPDB front-ends
  – Several front-ends provide raw and derived data, and links to other databases, for all PDB entries.
    • RCSB (often, inaccurately, called “PDB”)
    • PDBe
    • PDBj
    • OCA
    • PDBsum (lots of derived information)
    • MMDB (integrated with all of NCBI’s databases)
    • Jena Library

Page 31

Structure Databases
• Is wwPDB enough?
  – All entries in the RCSB PDB are whole proteins or parts of proteins.
  – However, what interests biologists is often the relationships between the basic protein units (domains), not between whole proteins.
  – Q: How do you extract the domains from PDB?

Page 32

Structure Classification Databases
• Structural alignment can be used to classify known (and new!) structures
  – SCOP (manual)
  – FSSP/DDD (automatic)
  – CATH (mixed)

Page 33

Structure Classification Databases
• SCOP database
  – Structural Classification Of Proteins (SCOP for short)
  – It is created and maintained at the University of Cambridge, UK.
  – The SCOP database aims to provide a detailed and comprehensive description of the structural and functional relationships between all proteins whose structure is known.
  – Proteins are classified to reflect both structural and evolutionary relatedness.
  – Classification is done manually.

Page 34

Structure Classification Databases
• SCOP database
  – The basic unit of classification is the protein domain.
  – SCOP hierarchy

Page 35

Structure Classification Databases
• SCOP database
  – sunid, a new SCOP identifier, is simply a number that uniquely identifies each entry in the SCOP hierarchy, from root to leaves.
  – sccs, a new concise classification string, is a compact representation of a SCOP domain classification that includes only the most relevant levels: class, fold, superfamily, family.
  – For example, PDB entry 1g61, chain A:
    • sunid: cl=53931, cf=55908, sf=55909, fa=55910, dm=55911, sp=55912, px=41126
    • sccs: d.126.1.1
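Both identifiers from the 1g61 example are easy to take apart programmatically. The sketch below is a minimal illustration, not SCOP's own tooling: the function names are made up, and the mapping of the class letters a to d onto the first four SCOP classes follows the usual SCOP convention rather than anything stated on the slide.

```python
# Class letters for the first four SCOP classes (usual SCOP convention, assumed here).
SCOP_CLASS_NAMES = {
    "a": "all alpha",
    "b": "all beta",
    "c": "alpha/beta",
    "d": "alpha+beta",
}


def parse_sccs(sccs):
    """Split an sccs string such as 'd.126.1.1' into class, fold, superfamily, family."""
    parts = sccs.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    cls, fold, superfamily, family = parts
    return {
        "class": cls,
        "class_name": SCOP_CLASS_NAMES.get(cls, "other"),
        "fold": int(fold),
        "superfamily": int(superfamily),
        "family": int(family),
    }


def parse_sunid(text):
    """Turn 'cl=53931,cf=55908,...' into a dict mapping each level code to its sunid."""
    result = {}
    for item in text.split(","):
        if item.strip():
            level, value = item.split("=")
            result[level.strip()] = int(value)
    return result


if __name__ == "__main__":
    print(parse_sccs("d.126.1.1"))
    print(parse_sunid("cl=53931,cf=55908,sf=55909,fa=55910,dm=55911, sp=55912,px=41126"))
```

Two domains share a fold when the first two fields of their sccs strings agree, and share a superfamily when the first three agree.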

Page 36

Structure Classification Databases

Family: clear evolutionary relationship
Proteins are clustered together into families on the basis of one of two criteria that imply a common evolutionary origin.
Criterion 1: All proteins that have residue identities of 30% and greater.
Criterion 2: Proteins with lower sequence identities but whose functions and structures are very similar. For example, globins with sequence identities of 15%.

Superfamily: probable common evolutionary origin
Families whose proteins have low sequence identities, but whose structures and, in many cases, functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies.
Example: actin, the ATPase domain of the heat-shock protein, and hexokinase.

Fold: major structural similarity
Superfamilies and families are defined as having a common fold if their proteins have the same major secondary structures in the same arrangement with the same topological connections.
Advantage: There may, however, be cases where a common evolutionary origin is obscured by the extent of the divergence in sequence, structure and function. In these cases, it is possible that the discovery of new structures, with folds between those of the previously known structures, will make clear their common evolutionary relationship.

Class
(1) α-helical domains
(2) β-sheet domains
(3) α/β domains, which consist of "beta-alpha-beta" structural units or "motifs" that form mainly parallel β-sheets
(4) α+β domains, formed by independent α-helices and mainly antiparallel β-sheets
(5) multi-domain proteins (for those with domains of different fold and for which no homologues are known at present)
(6) membrane and cell surface proteins and peptides
(7) small proteins
(8) coiled-coil proteins
(9) low-resolution protein structures
(10) peptides and fragments
(11) designed proteins of non-natural sequence

Information comes from Murzin, A., Brenner, S.E., Hubbard, T.J.P. and Chothia, C. (1995) SCOP: a Structural Classification of Proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540, and Wikipedia.

Page 37

Structure Classification Databases

• All α: secondary structure exclusively or almost exclusively α-helical

Page 38

Structure Classification Databases

• All β: secondary structure exclusively or almost exclusively β sheets

Page 39

Structure Classification Databases

• α/β: helices and sheets assembled from β-α-β units

Page 40

Structure Classification Databases

• α+β: α helices and β sheets separated in different parts of the molecule; absence of β-α-β motifs

Page 41

Structure Classification Databases

• The SCOP website at a glance

Page 42

Structure Classification Databases
• CATH classification
  – C = Class
    • Mainly α, mainly β, mixed α/β, few SSEs (secondary structure elements)
  – A = Architecture
    • Overall domain shape and orientation, but not connectivity, of SSEs
  – T = Topology = fold
  – H = Homologous superfamily
    • Groups proteins thought to share a common ancestor

Page 43

Structure Classification Databases
• CATH classification
  – Lower levels are sequence-based (%SI = percent sequence identity)
    • S = %SI ≥ 35%
    • O = %SI ≥ 60%
    • L = %SI ≥ 90%
    • I = %SI = 100%
  – D = domain
    • Individual domains for each I-level
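The nested thresholds can be read as a lookup: a pair of domains at a given percent sequence identity falls into the same cluster at every level whose threshold it meets. The sketch below uses the thresholds exactly as listed on this slide; the function name is made up, and real CATH clustering involves more than a single identity cutoff, so treat this as a reading aid only.

```python
# Thresholds as listed on the slide (percent sequence identity).
CATH_SEQUENCE_LEVELS = [("S", 35.0), ("O", 60.0), ("L", 90.0), ("I", 100.0)]


def shared_levels(percent_identity):
    """Sequence-based CATH levels at which two domains would be grouped together."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    return [name for name, threshold in CATH_SEQUENCE_LEVELS if percent_identity >= threshold]


if __name__ == "__main__":
    for si in (20, 72, 100):
        print(si, shared_levels(si))
```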

Page 44

Structure Classification Databases
• CATH classification

Page 45

Structure Classification Databases
• CATH classification

Page 46

Structure Classification Databases
• CATH classification

Page 47

Structure Classification Databases
• CATH classification

Page 48

Structure – sequence relationship
• Two conserved sequences → similar structures (sure)
• Two similar structures → conserved sequences?
  – Human myoglobin, pdb:2mm1
  – Human hemoglobin alpha-chain, pdb:1jebA
  – Sequence id: 27%
  – Structural id: 90%
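A figure such as "Sequence id: 27%" is computed from an alignment of the two sequences. The sketch below shows one common convention, counting identical residues over aligned (non-gap) columns; other conventions divide by the alignment length or by the shorter sequence, so the choice of denominator here is an assumption. The sequences in the usage lines are toy strings, not the real myoglobin or hemoglobin sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACDE", "ACDF"))
    print(percent_identity("AC-EG", "ACDEH"))
```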

Page 49

Principles of Protein Structure
• Today's proteins reflect millions of years of evolution
• 3D structure is better conserved than sequence during evolution
• Similarities among sequences or among structures may reveal information about shared biological functions of a protein family

Page 50

Why structural alignment?
• In evolutionarily related proteins, structure is much better preserved than sequence
• Similar structures may predict similar biological function
• Gaining insight into protein folding
• Two structures are similar when they can be superimposed well.

Page 51

Structure superimposition
• What is the best transformation that superimposes the unicorn on the lion?

Page 52

Structure superimposition
• This is not a good result…

Page 53

Structure superimposition
• Good result:

Page 54

Structure superimposition
• Find the transformation matrix that best overlaps the table and the chair
• i.e. find the transformation matrix that minimizes the root mean square deviation between corresponding points of the table and the chair

Page 55

Kinds of transformations
• Rotation
• Translation
• Scaling
• And more…
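The three transformations listed here can be written as operations on an array of 3-D points. The following is a minimal NumPy sketch; the function names and the choice to apply scaling first, then rotation, then translation are assumptions made for this illustration. Rotation and translation together form a rigid-body transform, which preserves all distances; scaling does not.

```python
import numpy as np


def rotation_z(theta):
    """3x3 matrix for a counter-clockwise rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def transform(points, rotation=None, translation=None, scale=1.0):
    """Apply scaling, then rotation, then translation to an (n, 3) array of points."""
    pts = np.asarray(points, dtype=float) * scale
    if rotation is not None:
        pts = pts @ np.asarray(rotation, dtype=float).T  # rotate each row vector
    if translation is not None:
        pts = pts + np.asarray(translation, dtype=float)
    return pts


if __name__ == "__main__":
    square = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
    print(transform(square, rotation=rotation_z(np.pi / 2), translation=[2, 0, 0]))
```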

Page 56

Translation

[Figure: translation shown on X and Y axes]

Page 57

Rotation

[Figure: rotation shown on X and Y axes]

Page 58

Scale

[Figure: scaling shown on X and Y axes]

Page 59

Correspondence is Unknown
• Given two configurations of points in three-dimensional space

[Figure: the two point configurations]

Page 60

Correspondence is Unknown
• Find those rotations and translations of one of the point sets that produce “large” superimpositions of corresponding 3-D points

Page 61

Correspondence is Unknown
• Simple case: two closely related proteins with the same number of amino acids.

Question: how do we assess the quality of the transformation?

Page 62

Scoring the Alignment
• Two point sets: $A = \{a_i\},\ i = 1 \ldots n$ and $B = \{b_j\},\ j = 1 \ldots m$
• Pairwise correspondence: $(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \ldots, (a_{k_N}, b_{t_N})$
• RMSD (Root Mean Square Distance):

$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert a_{k_i} - b_{t_i} \rVert^2}$$

Page 63

Scoring the Alignment
• Given two sets of 3-D points $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \ldots, n$:

$$\mathrm{rmsd}(P, Q) = \sqrt{\frac{1}{n} \sum_{i} \lvert p_i - q_i \rvert^2}$$

• Find a 3-D transformation $T^*$ such that:

$$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\frac{1}{n} \sum_{i} \lvert T(p_i) - q_i \rvert^2}$$

• Goal: find the highest number of atoms aligned with the lowest RMSD
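When the correspondence is already known and T is restricted to rotations plus translations, the minimization above has a closed-form solution. The slides do not name a method; the sketch below uses the Kabsch algorithm (SVD of the covariance matrix of the centered point sets), which is one standard choice. The function names are made up for this illustration, and P and Q are assumed to be (n, 3) arrays whose rows correspond one to one.

```python
import numpy as np


def rmsd(P, Q):
    """Root mean square deviation between two (n, 3) arrays of corresponding points."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return float(np.sqrt(((P - Q) ** 2).sum(axis=1).mean()))


def kabsch(P, Q):
    """Rotation R and translation t that minimize rmsd(P @ R.T + t, Q)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_center = P.mean(axis=0)
    q_center = Q.mean(axis=0)
    H = (P - p_center).T @ (Q - q_center)  # 3x3 covariance of the centered sets
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_center - R @ p_center
    return R, t


def superimpose(P, Q):
    """Move P onto Q with the optimal rigid transform; return moved points and the RMSD."""
    R, t = kabsch(P, Q)
    moved = np.asarray(P, dtype=float) @ R.T + t
    return moved, rmsd(moved, Q)
```

If Q is an exact rotated and translated copy of P, `superimpose(P, Q)` returns an RMSD of essentially zero. The harder problem discussed on the following pages is that the correspondence itself is usually unknown.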

Page 64

Matching of structures
• Two structures A and B match iff:
  1. Correspondence: there is a one-to-one map between their elements
  2. Alignment: there exists a rigid-body transform T such that the RMSD between the elements in A and those in T(B) is less than some threshold.

Page 65

Matching of structures
• Complete match

Page 66

Matching of structures
• But a complete match is rarely possible
  – The molecules have different sizes
  – Their shapes are only locally similar

[Figure: alignment of 3adk and 1gky]

Page 67

Matching of structures
• Notion of support σ of the match: the match is between σ(A) and σ(B)
• Dual problem:
  – What is the support?
  – What is the transform?
• Often several (many) possible supports
• Small supports → motifs

Page 68

Matching of structures
• Mathematical relative

[Figure: two functions f and g compared by $\lVert f - g \rVert^2$ over a support σ]

Over which support?

Page 69

Matching of structures
• Multiple Partial Matches

Page 70

Matching of structures
• Multiple Partial Matches

Page 71

Matching of structures
• What is best?

[Figure: two alternative alignments of structures A and B]

Should gaps be penalized?

Page 72

Matching of structures
• What about this?

[Figure: alignment of structures A and B]

Sequence along backbone is not preserved

Page 73

Matching of structures
• A similarity measure is unlikely to satisfy the triangle inequality for partial matches

Page 74

Scoring Issues
• Trade-off between size of σ and RMSD
• How should gaps be counted?
• Is there a “quality” of the correspondence? [The correspondence may, or may not, satisfy type and/or backbone sequence preferences]
• Should accessible surface be given more importance?
• Similarity measure may be different from the inverse of RMSD (though no consensus on best measure!)
• But RMSD is computationally very convenient!

Page 75

RMSD vs. similarity measure

STRUCTAL’s similarity measure (emphasizes similarities → larger support):

$$\max_T \left[ \sum_{i \in \sigma(T)} \frac{A}{1 + \lVert a_i - T(b_i) \rVert^2 / B} - \frac{A \, N_{GAP}}{2} \right]$$

The second term is the gap penalty.

RMSD dissimilarity measure (emphasizes differences → smaller support):

$$\min_T \sqrt{\frac{1}{\lvert \sigma(T) \rvert} \sum_{i \in \sigma(T)} \lVert a_i - T(b_i) \rVert^2}$$

Page 76

Comparison of Similarity Measures
• A.C.M. May. Toward more meaningful hierarchical classification of amino acids scoring functions. Protein Engineering, 12:707-712, 1999. Reviews 37 protein structure similarity measures.
• The difficulty of defining a similarity score is probably due to the fact that structure comparison is an ill-posed problem with multiple solutions

Page 77

Bottom Line
• Finding an optimal partial match is NP-hard: no fast algorithm is known that is guaranteed to give an optimal answer for any given measure [Godzik, 1996]
  – Heuristic/approximate algorithms
  – Probably not a single solution, but application-dependent solutions
  – But there exist general algorithmic principles

Page 78

Algorithms for structure superimposition
• Distance-based methods
  – DALI (Holm and Sander): aligning scalar distance plots
  – STRUCTAL (Gerstein and Levitt): dynamic programming using pairwise inter-molecular distances
  – SSAP (Orengo and Taylor): dynamic programming using intramolecular vector distances
  – MINAREA (Falicov and Cohen): minimizing soap-bubble surface area
• Vector-based methods
  – VAST (Bryant): graph-theory-based secondary structure alignment
  – 3dSearch (Singh and Brutlag): fast secondary structure index lookup
• Both vector- and distance-based
  – LOCK (Singh and Brutlag): hierarchically uses both secondary structure vectors and atomic distances

Page 80

Dali

[Figure: an intra-molecular distance plot for myoglobin]

Page 81

Dali
• http://www.ebi.ac.uk/dali/
• Based on aligning 2-D intra-molecular distance matrices
• Computes the best subset of corresponding residues from the two proteins such that the similarity between the 2-D distance matrices is maximized
• Searches the space of possible residue alignments using Monte Carlo and branch-and-bound algorithms
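The reason distance matrices are attractive is that they do not change when a structure is rotated or translated, so two structures can be compared without superimposing them first. The sketch below only illustrates that idea with NumPy: it computes an intra-molecular distance matrix and a simple mean absolute difference over a given set of matched residues. It is not DALI's actual scoring function, and the function names and the `pairs` argument are made up for this example.

```python
import numpy as np


def distance_matrix(coords):
    """All-against-all distances between the rows of an (n, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def matrix_difference(coords_a, coords_b, pairs):
    """Mean absolute difference between the distance sub-matrices of matched residues.

    pairs is a list of (i, j) index pairs: residue i of A is matched to residue j of B.
    """
    idx_a = [i for i, _ in pairs]
    idx_b = [j for _, j in pairs]
    d_a = distance_matrix(coords_a)[np.ix_(idx_a, idx_a)]
    d_b = distance_matrix(coords_b)[np.ix_(idx_b, idx_b)]
    return float(np.abs(d_a - d_b).mean())
```

A low value means the matched residues have nearly the same internal geometry in both structures, regardless of how the two structures are positioned in space.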

Page 82

VAST

Page 83

VAST
• http://www.ncbi.nih.gov/Structure/VAST/vast.shtml
• Aligns only secondary structure elements (SSEs)
• Represents each SSE as a vector
• Finds all possible pairs of vectors from the two structures that are similar
• Uses a graph theory algorithm to find the maximal subset of similar vectors
• Overall alignment score is based on the number of similar pairs of vectors between the two structures

Page 84

Recommended books

Page 85

Recommended books

Page 86

Thank you for your attention!