Structure databases, searches and alignments
![Page 1: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/1.jpg)
Structure databases, searches and alignments
Marian Novotny
Molecular Bioinformatics X3
![Page 2: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/2.jpg)
Outline
1. Structure databases
- why do we need them?
- types of structural databases
- Protein Data Bank
- other useful databases
2. Searches
- text searches
3. Structure searches (alignments)
- why?
- how?
- comparison of available tools
![Page 3: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/3.jpg)
Structure databases
Why?
data tend to get lost
source of information for further analysis
better access to data by general public
validation of data is (sometimes) possible
![Page 4: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/4.jpg)
Database is…
… a structured collection of data held in computer storage; esp. one that incorporates software to make it accessible in a variety of ways; transf., any large collection of information.
Oxford English Dictionary
… a usually large collection of data organized especially for rapid search and retrieval (as by a computer)
Merriam-Webster Online
![Page 5: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/5.jpg)
Databases
Primary databases
Added-value databases
Derived databases
RCSB
MSD
PDBJ
NDB
CSD
OCA
PDBSum
EDS
Whatcheck
Jena Image library
![Page 6: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/6.jpg)
ftp archive of flat files
![Page 7: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/7.jpg)
Primary databases
- repositories of experimental data of macromolecular structures (X-ray, NMR, electron microscopy…)
- RCSB (USA), MSD (Europe) and PDBJ (Japan) collaborate to form wwPDB. Data can be submitted to any of these databases. The databases exchange their new data on a regular basis, so their content is identical.
- primary databases differ in presentation of data and the amount of extra services and links they provide
![Page 8: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/8.jpg)
The Protein Data Bank (PDB)
- established in 1971 by Walter Hamilton at Brookhaven National Laboratory
- seven structures were deposited at the beginning
- the database was distributed on magnetic tapes
- RCSB is now run by a consortium of three institutions (San Diego Supercomputer Center, Rutgers University and the Center for Advanced Research in Biotechnology)
- 29326 structures (26.01.2005)
- distributed over internet
- released once a week
![Page 9: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/9.jpg)
![Page 10: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/10.jpg)
![Page 11: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/11.jpg)
![Page 12: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/12.jpg)
![Page 13: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/13.jpg)
![Page 14: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/14.jpg)
![Page 15: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/15.jpg)
![Page 16: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/16.jpg)
PDB file
HEADER HYDROLASE 27-OCT-03 1UR9
TITLE INTERACTIONS OF A FAMILY 18 CHITINASE WITH THE DESIGNED
TITLE 2 INHIBITOR HM508, AND ITS DEGRADATION PRODUCT,
TITLE 3 CHITOBIONO-DELTA-LACTONE
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: CHITINASE B;
COMPND 3 CHAIN: A, B;
COMPND 4 EC: 3.2.1.14;
COMPND 5 ENGINEERED: YES;
COMPND 6 MUTATION: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: SERRATIA MARCESCENS;
SOURCE 3 STRAIN: BJL200;
SOURCE 4 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE 5 EXPRESSION_SYSTEM_STRAIN: DH5 ALPHA;
SOURCE 6 OTHER_DETAILS: CLONED GENE
KEYWDS CHITINASE, INHIBITION, LACTONE, CHITIN DEGRADATION,
KEYWDS 2 HYDROLASE, GLYCOSIDASE
EXPDTA X-RAY DIFFRACTION
AUTHOR G.VAAJE-KOLSTAD,A.VASELLA,M.G.PETER,C.NETTER,D.R.HOUSTON,
AUTHOR 2 B.WESTERENG,B.SYNSTAD,V.G.H.EIJSINK,D.M.F.VAN AALTEN
REVDAT 1 27-APR-04 1UR9 0
JRNL AUTH G.VAAJE-KOLSTAD,A.VASELLA,M.G.PETER,C.NETTER,
JRNL AUTH 2 D.R.HOUSTON,B.WESTERENG,B.SYNSTAD,V.G.H.EIJSINK
JRNL AUTH 2 D.M.F.VAN AALTEN
JRNL TITL INTERACTIONS OF A FAMILY 18 CHITINASE WITH THE
JRNL TITL 2 DESIGNED INHIBITOR HM508 AND ITS DEGRADATION
JRNL TITL 3 PRODUCT, CHITOBIONO-DELTA-LACTONE
JRNL REF J.BIOL.CHEM. V. 279 3612 2004
JRNL REFN ASTM JBCHA3 US ISSN 0021-9258
REMARK 1
REMARK 1 REFERENCE 1
REMARK 1 AUTH D.M.F.VAN AALTEN,D.KOMANDER,B.SYNSTAD,S.GASEIDNES,
REMARK 1 AUTH 2 M.G.PETER,V.G.H.EIJSINK
REMARK 1 TITL STRUCTURAL INSIGHTS INTO THE CATALYTIC MECHANSIM OF
REMARK 1 TITL 2 A FAMILY 18 EXOCHITINASE
REMARK 1 REF PROC.NAT.ACAD.SCI.USA V. 98 8979 2001
REMARK 1 REFN ASTM PNASA6 US ISSN 0027-8424
REMARK 1 REFERENCE 2
REMARK 1 AUTH D.M.F.VAN AALTEN,B.SYNSTAD,M.B.BRURBERG,E.HOUGH,
REMARK 1 AUTH 2 B.RIISE,V.G.H.EIJSINK,R.K.WIERENGA
REMARK 1 TITL STRUCTURE OF A TWO-DOMAIN CHITOTRIOSIDASE FROM
![Page 17: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/17.jpg)
PDB file format
12345678901234567890123456789012345678901234567890123456789012345678901234567890
         1         2         3         4         5         6         7         8
ATOM    340  N   PHE A  43       3.853  28.346  32.161  1.00 10.57           N
ATOM    341  CA  PHE A  43       3.839  29.688  32.724  1.00 12.33           C
ATOM    342  C   PHE A  43       3.096  29.747  34.047  1.00 13.20           C
ATOM    343  O   PHE A  43       2.361  28.823  34.393  1.00 12.52           O
ATOM    344  CB  PHE A  43       3.228  30.659  31.700  1.00 10.99           C
ATOM    345  CG  PHE A  43       3.993  30.709  30.401  1.00  9.80           C
ATOM    346  CD1 PHE A  43       3.743  29.794  29.386  1.00  9.85           C
ATOM    347  CD2 PHE A  43       5.032  31.615  30.233  1.00 11.37           C
ATOM    348  CE1 PHE A  43       4.528  29.781  28.220  1.00 10.71           C
ATOM    349  CE2 PHE A  43       5.816  31.612  29.075  1.00 10.61           C
ATOM    350  CZ  PHE A  43       5.569  30.697  28.067  1.00 10.48           C
Fields, left to right:
Atom number
Atom identifier
Residue type
Chain
Residue number
X,Y,Z coordinates
Occupancy
Temperature factor
Atom type
![Page 18: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/18.jpg)
ATOM    340  N   PHE A  43       3.853  28.346  32.161  1.00 10.57           N
ATOM    341  CA  PHE A  43       3.839  29.688  32.724  1.00 12.33           C
ATOM    342  C   PHE A  43       3.096  25.747  34.047  1.00 13.20           C
ATOM    343  O   PHE A  43       2.361  28.823  34.393  1.00 12.52           O
ATOM    344  CB  PHE A  43       3.228  30.659  31.700  1.00 10.99           C
ATOM    345  CG  PHE A  43       3.993  30.709  30.401  1.00  9.80           C
ATOM    346  CD1 PHE A  43       4.743  29.794  29.386  1.00  9.85           C
ATOM    347  CD2 PHE A  43       5.032  31.615  30.233  1.00 11.37           C
ATOM    348  CE1 PHE A  43       4.528  32.781  28.220  1.00 10.71           C
ATOM    349  CE2 PHE A  43       5.816  31.612  29.075  1.00 10.61           C
ATOM    350  CZ  PHE A  43       5.569  30.697  28.067  1.00 10.48           C

12345678901234567890123456789012345678901234567890123456789012345678901234567890
         1         2         3         4         5         6         7         8
ATOM    340  N   PHE A  43       3.853  28.346  32.161  1.00 10.57           N
ATOM    341  CA  PHE A  43       3.839  29.688  32.724  1.00 12.33           C
ATOM    342  C   PHE A  43       3.096  29.747  34.047  1.00 13.20           C
ATOM    343  O   PHE A  43       2.361  28.823  34.393  1.00 12.52           O
ATOM    344  CB  PHE A  43       3.228  30.659  31.700  1.00 10.99           C
ATOM    345  CG  PHE A  43       3.993  30.709  30.401  1.00  9.80           C
ATOM    346  CD1 PHE A  43       3.743  29.794  29.386  1.00  9.85           C
ATOM    347  CD2 PHE A  43       5.032  31.615  30.233  1.00 11.37           C
ATOM    348  CE1 PHE A  43       4.528  29.781  28.220  1.00 10.71           C
ATOM    349  CE2 PHE A  43       5.816  31.612  29.075  1.00 10.61           C
ATOM    350  CZ  PHE A  43       5.569  30.697  28.067  1.00 10.48           C
![Page 19: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/19.jpg)
PDB files - problems
- the PDB format uses fixed-width fields, so one entry is limited to 99,999 atom records and the chain identifier to a single character, which is too little for huge complexes such as ribosomes and viruses
12345678901234567890123456789012345678901234567890123456789012345678901234567890
         1         2         3         4         5         6         7         8
ATOM    340  N   PHE A  43       3.853  28.346  32.161  1.00 10.57           N
- parsing PDB files is difficult: apart from the ATOM records the file is almost unstructured (e.g. there are no rules for describing structure determination in the REMARK records)
mmCIF and XML formats deal with these issues
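The fixed-width layout can be made concrete with a small parser. The following is a minimal sketch in Python, not part of the original slides; the `Atom` type and the `parse_atom_record` function are names invented here, and the column ranges follow the standard ATOM record layout. Because the atom number has only five columns, it cannot exceed 99,999, and the chain identifier is a single column.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_record(line: str) -> Atom:
    """Slice one ATOM record by its fixed column positions (1-based in the format)."""
    line = line.ljust(80)  # pad short lines so every slice is valid
    return Atom(
        serial=int(line[6:11]),        # columns 7-11: atom number
        name=line[12:16].strip(),      # columns 13-16: atom identifier
        res_name=line[17:20].strip(),  # columns 18-20: residue type
        chain=line[21],                # column 22: chain, one character only
        res_seq=int(line[22:26]),      # columns 23-26: residue number
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        occupancy=float(line[54:60]),  # columns 55-60
        b_factor=float(line[60:66]),   # columns 61-66: temperature factor
        element=line[76:78].strip(),   # columns 77-78: atom type
    )


record = "ATOM    340  N   PHE A  43       3.853  28.346  32.161  1.00 10.57           N"
atom = parse_atom_record(record)
print(atom.serial, atom.name, atom.res_name, atom.chain, atom.res_seq)
print(atom.x, atom.y, atom.z, atom.occupancy, atom.b_factor, atom.element)
```

Slicing by column rather than splitting on whitespace matters because neighbouring fields may touch, for example a four-character atom name followed directly by the residue type.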
![Page 20: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/20.jpg)
Trust PDB?
The database centres cannot refuse any data, even if the curators of the PDB know that the data contain serious errors.
So the PDB does contain a lot of errors, from sequence inconsistencies (you’ll deal with them) to completely wrong folds.
And even the best entries are still only models that best fit the experimental data.
Never trust the PDB!
![Page 21: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/21.jpg)
Trp D 67 7GPB
Do you find this Trp normal?
![Page 22: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/22.jpg)
Validation of structure files
- check statistics for bond lengths, angles, Ramachandran plots….
- do statistics look similar to those of other proteins?
WhatCheck, Procheck
- how well does the model fit experimental data?
EDS
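A geometry check of the kind listed above can be sketched in a few lines of Python. This is only an illustration, not how WhatCheck or Procheck are implemented. The ideal bond lengths are approximate textbook values, and `TOLERANCE` and `check_bonds` are assumptions made for this sketch. The coordinates are the backbone atoms of Phe A 43 from the ATOM records shown earlier, once as deposited and once with the y coordinate of atom C changed from 29.747 to 25.747.

```python
import math

# Approximate ideal backbone bond lengths in angstroms.
IDEAL_BOND_LENGTHS = {("N", "CA"): 1.46, ("CA", "C"): 1.52}
TOLERANCE = 0.10  # hypothetical cut-off chosen for this sketch


def check_bonds(coords):
    """Return (bond, length, ok) for each backbone bond of one residue.

    coords maps atom name -> (x, y, z).
    """
    report = []
    for (a, b), ideal in IDEAL_BOND_LENGTHS.items():
        length = math.dist(coords[a], coords[b])
        report.append(((a, b), round(length, 3), abs(length - ideal) <= TOLERANCE))
    return report


deposited = {
    "N": (3.853, 28.346, 32.161),
    "CA": (3.839, 29.688, 32.724),
    "C": (3.096, 29.747, 34.047),
}
# Same residue with a single digit of one coordinate altered.
corrupted = dict(deposited, C=(3.096, 25.747, 34.047))

for label, residue in (("deposited", deposited), ("corrupted", corrupted)):
    for bond, length, ok in check_bonds(residue):
        print(label, bond, length, "ok" if ok else "OUTLIER")
```

A one-digit change that is almost invisible in the text file stretches the CA-C bond to several angstroms, which is exactly the kind of outlier such statistics are meant to expose.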
![Page 23: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/23.jpg)
Electron Density Server
![Page 24: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/24.jpg)
PDBsum
![Page 25: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/25.jpg)
![Page 26: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/26.jpg)
PDBSum-Highlights
![Page 27: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/27.jpg)
![Page 28: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/28.jpg)
![Page 29: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/29.jpg)
Text searches in structural databases
Options:
PDB - SearchLite, SearchFields
MSD - MSDlight, MSDpro (Java), MSDmine
OCA
Find all the structures deposited by Gerard Kleywegt with resolution better than 2Å and published in Journal of Molecular Biology
![Page 30: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/30.jpg)
![Page 31: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/31.jpg)
Search Fields
![Page 32: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/32.jpg)
![Page 33: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/33.jpg)
![Page 34: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/34.jpg)
![Page 35: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/35.jpg)
![Page 36: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/36.jpg)
![Page 37: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/37.jpg)
Summary
- three major repositories of structural data: RCSB, MSD and PDBJ
- all three are part of wwPDB
- structural data are deposited as PDB files, a format with known problems
- new formats - mmCIF, XML
- validation tools are necessary - WHATCheck, EDS
- new services are developed to analyze the whole database (MSD services)
- searches at various levels of depth/complexity - Searchlite, Search Fields
- added-value databases - OCA, PDBSum
![Page 38: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/38.jpg)
Structural alignment
![Page 39: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/39.jpg)
Why structural alignment?
We already have sequence alignment (Clustal…):
KTHLCV
KSHA-V
It shows which amino acids correspond to each other in two (or more) proteins.
That lets us infer information about the function
and evolution of the protein,
if the sequences are similar enough!
![Page 40: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/40.jpg)
What is the twilight zone?
Sequence alignment unambiguously distinguishes protein pairs of similar structure from pairs of dissimilar structure only when the pairwise sequence identity is high.
High sequence identity roughly means over 40 %.
The signal gets blurred in the twilight zone of 20-35 % sequence identity.
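To make the percentages concrete, here is a small Python sketch of how pairwise sequence identity can be computed from an alignment. It assumes one common definition, identical residues divided by the number of columns in which neither sequence has a gap; other definitions (for example dividing by the full alignment length) give different numbers. The function names are invented for this example, and the input is the toy alignment KTHLCV / KSHA-V from the earlier slide.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def in_twilight_zone(identity: float) -> bool:
    """True for the 20-35 % range quoted on the slide."""
    return 20.0 <= identity <= 35.0


identity = percent_identity("KTHLCV", "KSHA-V")
print(f"{identity:.1f} % identity, twilight zone: {in_twilight_zone(identity)}")
```

Three of the five ungapped columns are identical, so this toy pair sits well above the twilight zone; real cases of interest for structural alignment are the ones that fall inside or below it.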
![Page 41: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/41.jpg)
More of the twilight zone
More than 90 % of sequence pairs with sequence identity lower than 25 % have different structures.
Significance of sequence alignments is length dependent.
The longer the sequence, the lower the identity required for the alignment to be called significant. Nevertheless, the threshold converges to 25 % for alignments longer than 80 amino acids.
The ‘more similar than identical’ rule can reduce the number of false positives.
Using intermediate sequences for finding links between more distant families can also reduce the number of false positives.
![Page 42: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/42.jpg)
How far can the sequence identity drop?
Average sequence identity of random alignments - 5.6 %
Average sequence identity of remote homologues - 8.5 %
![Page 43: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/43.jpg)
How does it work?
From http://www.biochem.unizh.ch/antibody/Introduction/Institutsseminar97/source/slide2.htm
![Page 44: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/44.jpg)
Structural alignment because:
Structures are better conserved than sequences.
Structural alignment can imply a functional similarity that is not detectable from a sequence alignment.
It might help to improve sequence alignment when structures are available (phylogenetic studies, homology modeling).
It will improve sequence alignment methods (substitution matrices and gap penalties derived from structural alignments).
It will improve sequence prediction methods.
![Page 45: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/45.jpg)
1FWR_A -------------------------MKNWKTSAESILTTGPVVPVIVVKKLEHAVPMAKA
2YPI_A ARTFFVGGNFKLNGSKQSIKEIVERLNTASIPENVEVVICPPATYLDYSVSLVKKPQVTV
::. . . : :. * .. : . * ...

1FWR_A LVAGGVRVLEVTLRTECAVDAIRAIAKEVPEAIVGAGTVLNPQQLAEVTE-------AGA
2YPI_A GAQNAYLKASGAFTGENSVDQIKDVGAKWVILGHSERRSYFHEDDKFIADKTKFALGQGV
. .. . :: * :** *: :. : . :: ::: *.

1FWR_A QFAISPGLTEPLLKAATEGTIPLIPGISTVSELMLGMDYGLKEFQFFPAEANGGVKALQA
2YPI_A GVILCIGETLEEKKAGKTLDVVERQLNAVLEEVKDWTNVVVAYEPVWAIGTGLAATPEDA
. :. * * **.. : :.:.*: : : .:. :. .... :*

1FWR_A IAGPFSQVRFCPKGGISPANYRDYLALKSVLCIGGSWLVPADALEAGDYDRITKLAREAV
2YPI_A QDIHASIRKFLASKLGDKAASELRILYGGSANGSNAVTFKDKADVDGFLVGGASLKPEFV
* :* .. . * . : . ..: . .* * :.* * *

1FWR_A EGAKL--
2YPI_A DIINSRN
Structural versus sequence alignment
Sequence 1 ------------ART---FFVGGNFKLNG-SKQSI-KEIVERLNTASI--PENVEVVICP
.=ALI |=ID | |.... .. ..... . ....|... . | ...
Sequence 2 MKNWKTSAESIL--TTGP--VVPVI--VVKKLEHAVP-MAKALVAG-GVR-----V-LEV

Sequence 1 ------PATYLDYSVSLV-KKPQVTVGAQ-N--AY-LKASGAFTGEN-S---VDQIKDVG
.=ALI |=ID ...........| . ..|||. . . . . . .|
Sequence 2 TLRTECAVDAIRAIAKEVP-E--AIVGAGTVLN-PQ----------QLAEVT--E---AG

Sequence 1 AKWVILGH--SERRSYFHEDDKFIADKTKFALGQGVGVILCIGETLEEKKAGKTLDVVER
.=ALI |=ID |...|. . .....|.|.......|..|. ...
Sequence 2 AQFAIS-PGL-------------TEPLLKAATEGTIPLIPGIS--------------TVS

Sequence 1 QLNAV-LEEVKDW-TNVVVAYEP--VW--AIGTGLAATPEDA--QDI--HASI-RKFLA-
.=ALI |=ID .|... . .. . .....| . . . . .. . .
Sequence 2 ELMLGMD--YG-LK---EFQFFPAE-ANG-------G----VKA--LQA--IAG-P--FS

Sequence 1 SKLGDKAA-SELRILYGGSANGSN-AVTF---KDK-ADVDGFLVGGA-SLK---------
.=ALI |=ID . |....|... .. . . . ..|..... .. ..
Sequence 2 -------QV---RFCPKGGIS-PANY--RDYL--ALKSVLCIGG-SWL-VPADALEAGDY

Sequence 1 --P--EFV--DIIN--SR-N
.=ALI |=ID . . . . . ..
Sequence 2 DRITKL-AREA--VEGAKL-
![Page 46: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/46.jpg)
  1   2   3   4   5   6   7   8   9  10  11  12  13  14
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
Sequence versus structural alignment
![Page 47: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/47.jpg)
Is it difficult to make a structural alignment?
Yes, it is.
Structural alignment is an NP-hard problem (NP = nondeterministic polynomial time).
In other words, no known algorithm solves it exactly in reasonable time.
Even if it could be solved exactly, the result would be optimal from a technical point of view but not necessarily from a biological one.
![Page 48: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/48.jpg)
General solution
Use a heuristic approach:
1. Represent the proteins A and B in some coordinate independent space
2. Compare A and B
3. Optimize the alignment between A and B (e.g. minimize R.M.S.d.)
4. Measure the statistical significance of the alignment against some random set of structure comparisons
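Step 4 is often expressed as a Z-score: how many standard deviations a hit's score lies above the scores obtained for unrelated structure pairs. The sketch below illustrates only that idea in Python; the function name and all numbers are hypothetical, and the individual servers each define their own scores and significance measures.

```python
import statistics
from typing import Sequence


def z_score(score: float, background: Sequence[float]) -> float:
    """Standard deviations by which score exceeds the background mean."""
    mean = statistics.mean(background)
    spread = statistics.stdev(background)  # sample standard deviation
    return (score - mean) / spread


# Hypothetical similarity scores from comparing a query with unrelated structures.
random_scores = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 12.0, 11.0]
hit_score = 25.0  # hypothetical score of the candidate match

print(round(z_score(hit_score, random_scores), 2))
```

A score equal to the background mean gives a Z-score of zero, while a hit far outside the random distribution gives a large positive value.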
![Page 49: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/49.jpg)
“..in some coordinate independent space…”
Make the problem easier by:
- comparing only distance matrices of atoms
- comparing secondary structure elements (SSE)
- comparing cartoons
- comparing vectors of SSE
- a combination of the methods mentioned above
- ….
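The first option in the list, distance matrices, can be illustrated with a short Python sketch. Intra-molecular distances do not change when a structure is rotated or translated, so two structures can be compared without superimposing them first. The coordinates below are made up, the helper names are invented for this example, and the atoms of the two structures are assumed to be paired already, which is the hard part that real tools have to solve.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between points, e.g. C-alpha atoms."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def drmsd(coords_a, coords_b):
    """Root-mean-square difference between two distance matrices.

    Assumes coords_a[i] and coords_b[i] are already paired as equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally long sets of at least two points")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    squared = [(da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(squared) / len(squared))


def rotate_z_and_shift(point, angle, shift):
    """Rotate a point about the z axis, then translate it."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


# Made-up coordinates for four atoms, then the same atoms rotated and translated.
original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.5, 4.2, 1.5)]
moved = [rotate_z_and_shift(p, math.radians(40.0), (10.0, -5.0, 2.0)) for p in original]
# A genuinely different shape: the last atom is displaced by 2 angstroms.
stretched = original[:3] + [(10.5, 4.2, 1.5)]

print(f"{drmsd(original, moved):.6f}")      # rigid motion leaves distances unchanged
print(f"{drmsd(original, stretched):.6f}")  # a real shape change does not
```

One known limitation of this representation is that a structure and its mirror image have identical distance matrices.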
![Page 50: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/50.jpg)
None of the methods is guaranteed to find the closest structure, and two methods can disagree at every amino acid position.
Nevertheless, they can still provide valuable insight into the history of the protein and give hints concerning its function.
![Page 51: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/51.jpg)
Methods for fold comparison

| Server | Location | Method |
| --- | --- | --- |
| CE | http://cl.sdsc.edu | Extension of optimal path [1] |
| DALI | http://www2.ebi.ac.uk/dali | Distance-matrix alignment [2] |
| DEJAVU | http://portray.bmc.uu.se/cgi-bin/dennis/dejavu.pl | SSE alignment with Cα atom optimisation [3] |
| LOCK | http://gene.stanford.edu/LOCK/ | Absolute orientation of corresponding points [4] |
| MATRAS | http://bongo.lab.nig.ac.jp/~takawaba/Matras.html | Markov transition model of evolution [5] |
| PRIDE | http://hydra.icgeb.trieste.it/pride/ | Cα-Cα atom distances [6] |
| SSM | http://www.ebi.ac.uk/msd-srv/ssm/ssmstart.html | Graph matching algorithm |
| TOP | http://bioinfo1.mbfys.lu.se/TOP | SSE alignment [7] |
| TOPS | http://tops.ebi.ac.uk/tops/compare1.html | TOPS-diagram alignment [8] |
| TOPSCAN | http://www.rubic.rdg.ac.uk/~andrew/bioinf.org/topscan | Secondary topology-string alignment [9] |
| VAST | http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html | Vector alignment [10] |
![Page 52: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/52.jpg)
Protein structure classification
If you want to know which structures are similar to a known structure, these systems might help:
A) Manual - SCOP
B) Semi-automatic - CATH
C) Automatic - FSSP
![Page 53: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/53.jpg)
![Page 54: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/54.jpg)
![Page 55: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/55.jpg)
CATH
http://www.biochem.ucl.ac.uk/bsm/cath
CATH Topology or fold group level From C. Orengo talk at EMBO course, Cambridge 2004
![Page 56: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/56.jpg)
TIM barrel enzymes – 18 different homologous families
>60 different E.C. numbers
EC wheel of TIM barrels
Structure of TIM barrel: Triose phosphate isomerase
From J. Thornton talk at EMBO course, Cambridge 2004
![Page 57: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/57.jpg)
CATH
Rossmann fold
Alpha-beta plait
TIM barrel
Jelly roll
Immunoglobulin
OB fold
SH3-like
Up-down
Arc repressor-like
Nearly one third of the superfamilies belong to fewer than 10 fold groups
From C. Orengo talk at EMBO course, Cambridge 2004
![Page 58: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/58.jpg)
TargetDB
contains 84063 sequences (January 2005) annotated like:
- hypothetical protein Af0491 from A. fulgidus
- putative serine hydrolase from S. cerevisiae
- predicted glutamine amidotransferase from P. aeruginosa
http://targetdb.pdb.org/
PDB contains about 500 structures with a similar degree of confidence in functional assignment
Hypothetical Protein Mth938 (PDB ID:1ihn)
![Page 59: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/59.jpg)
Function from structure
![Page 60: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/60.jpg)
Fold and structural motifs
SSM fold search
Surface clefts
Residue conservation
DNA-binding HTH motifs
Nest analysis
Sequence motifs (PROSITE, BLOCKS, SMART, Pfam, etc.)
Sequence scans
Sequence search vs PDB
Sequence search vs Uniprot
Superfamily HMM library
Gene neighbours
n-residue templates
Enzyme active sites
Ligand binding sites
DNA binding sites
Reverse templates
![Page 61: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/61.jpg)
Summary
Structural alignment can help with protein annotations even when the sequence similarity is not significant.
Sequence identity of two proteins with similar structures can be lower than 10 %; the number of folds is limited.
Recent progress in protein structure determination increases the usefulness of structural alignment.
Structural alignment is a difficult problem that is solved by heuristic methods.
These methods simplify the problem and sacrifice the optimal result for speed.
![Page 62: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/62.jpg)
Summary II
Different methods can provide completely different alignments.
In our results, CE, Dali, Matras and Vast were the best servers for finding structural relatives.
Several structural classification systems have been developed (CATH, FSSP, SCOP). They provide a hierarchical classification of protein structures and make it possible to infer functional and evolutionary relationships between proteins.
Folds are not distributed equally. The ten most frequent folds represent almost one third of all structures.
![Page 63: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/63.jpg)
![Page 64: Structure databases, searches and alignments](https://reader035.fdocuments.us/reader035/viewer/2022081519/568143d5550346895db06247/html5/thumbnails/64.jpg)