Pattern databases in protein analysis
![Page 1: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/1.jpg)
Pattern databases in protein analysis
Arthur Gruber
Instituto de Ciências Biomédicas
Universidade de São Paulo
AG-ICB-USP
![Page 2: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/2.jpg)
Protein databases
• GenPept – protein sequences translated from GenBank entries
• UniProtKB/TrEMBL – a computer-annotated protein sequence database complementing the UniProtKB/Swiss-Prot Protein Knowledgebase
• UniProtKB/Swiss-Prot – a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases
![Page 3: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/3.jpg)
How to assign protein functions?
• Similar proteins may share common functions, but… proteins that share common domains may have evolved to perform distinct functions
• Proteins that perform similar functions may share common domains, but… domain sequences are not always very similar, so more refined methods than simple similarity searches are required
• Proteins may share common domains but have different architectures: no single domain is necessarily responsible for protein function, and many proteins use multiple domains to perform their activities
![Page 4: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/4.jpg)
Some conclusions
• Similarity searches may reveal proteins that share very similar sequences and functions, i.e. high similarity over the full length of the query sequence
• An output with no significant hits, or with hits only to unannotated proteins, will not reveal the possible function of the query protein
• Similarity searches do not differentiate orthologues from paralogues
• When matching multidomain proteins, it may not be appropriate to transfer the functional annotation – the context is important!
![Page 5: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/5.jpg)
So what do proteins with similar functions have in common?
![Page 6: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/6.jpg)
residues, motifs, domains, architecture…
![Page 7: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/7.jpg)
Pattern databases
• Databases that contain patterns of residue conservation within groups of related sequences
• There are several methods to determine patterns
• There are many different pattern databases
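To make "patterns of residue conservation" concrete, the sketch below scores each column of an alignment by its Shannon entropy: 0 bits means a fully conserved column, and higher values mean more variable ones. This is one common conservation measure, not necessarily the one used by any of the databases listed here; the function names and the toy alignment are hypothetical, and gap characters ('-') are simply ignored.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column; gap characters '-' are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return float("nan")  # an all-gap column carries no conservation signal
    n = len(residues)
    # H = sum over residue types of p * log2(1/p); 0.0 means fully conserved.
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy of every column of a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


# Hypothetical toy alignment of four related sequences.
toy = ["ACDEG", "ACDKG", "ACEKG", "SCDEG"]
print([round(h, 3) for h in conservation_profile(toy)])  # [0.811, 0.0, 0.811, 1.0, 0.0]
```

In this toy case the second and fifth columns are invariant (0 bits), which makes them the obvious candidates for fixed positions in a pattern.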
![Page 8: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/8.jpg)
Pattern databases
![Page 9: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/9.jpg)
Common protein pattern databases
• PROSITE patterns – regular expressions
• PROSITE profiles – weight matrices (profiles)
• Pfam – a database of protein domain families; contains curated multiple sequence alignments for each family and the corresponding HMMs
• PRINTS – a database of groups of motifs that, taken together, are more powerful for assigning protein function than single motifs
• ProDom – an automatically generated database based on a recursive use of PSI-BLAST similarity searches
• InterPro – an integrated database that combines different protein signature recognition methods in one single resource
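Because PROSITE patterns are essentially regular expressions, a basic pattern can be translated mechanically into Python's `re` syntax and scanned against a sequence. The sketch below handles only the core syntax (`x`, `[...]`, `{...}`, repeat counts such as `x(2,4)`, and the `<`/`>` terminal anchors) and ignores special cases such as `>` inside brackets; the helper names and the test sequence are hypothetical. The example pattern `N-{P}-[ST]-{P}` is the well-known N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # '<' anchors the pattern to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # '>' anchors the pattern to the C-terminus
            suffix, element = "$", element[:-1]
        # Separate an optional repeat count such as G(3) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            regex = "."  # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core  # a single residue or a [ABC] class is already valid regex
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every match, including overlaps."""
    # A capturing group inside a lookahead lets finditer report overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))            # [AG].{4}GK[ST]
print(find_sites("N-{P}-[ST]-{P}.", "MANGTKNPSFNESLN"))  # [(3, 'NGTK'), (11, 'NESL')]
```

The lookahead is a deliberate choice: PROSITE sites can overlap, and a plain `finditer` would skip a match that starts inside a previous one.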
![Page 10: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/10.jpg)
How to start building a pattern database?
![Page 11: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/11.jpg)
How to start building a pattern database?
![Page 12: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/12.jpg)
How to start building a pattern database?
With multiple sequence alignments of functionally related proteins
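As a naive illustration of this starting point, the sketch below turns a gap-free alignment into a PROSITE-style pattern: invariant columns become a single residue, columns with a few alternatives become a `[...]` class, and highly variable columns become `x`. The `max_class` threshold, the gap-free restriction and the toy alignment are assumptions made for illustration; real pattern databases involve much more careful curation and testing than this.

```python
def alignment_to_pattern(alignment: list[str], max_class: int = 3) -> str:
    """Derive a PROSITE-style pattern from equal-length, gap-free aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length or "-" in seq for seq in alignment):
        raise ValueError("expected equal-length aligned sequences without gaps")
    elements = []
    for i in range(length):
        residues = sorted({seq[i] for seq in alignment})
        if len(residues) == 1:
            elements.append(residues[0])  # invariant column
        elif len(residues) <= max_class:
            elements.append("[" + "".join(residues) + "]")  # a few tolerated residues
        else:
            elements.append("x")  # too variable: any residue
    # Collapse consecutive 'x' elements into x(n), as PROSITE patterns do.
    collapsed, run = [], 0
    for element in elements + [None]:  # None is a sentinel that flushes the last run
        if element == "x":
            run += 1
            continue
        if run:
            collapsed.append("x" if run == 1 else f"x({run})")
            run = 0
        if element is not None:
            collapsed.append(element)
    return "-".join(collapsed) + "."


# Hypothetical toy alignment of a conserved region.
toy = ["ACDEGKL", "ACDKGRM", "SCEKGHN", "TCDEGWP"]
print(alignment_to_pattern(toy))  # [AST]-C-[DE]-[EK]-G-x(2).
```

A low `max_class` yields stricter patterns (fewer false positives, more missed family members); a high one does the opposite, which is the basic trade-off any pattern builder has to tune.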
![Page 13: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/13.jpg)
Some definitions
• Protein motif – a single conserved region
• PROSITE pattern – a consensus expression of a conserved region
• Frequency matrices (PRINTS) – matrices that contain the frequencies with which residues occur at each position of a given motif
• PSSM – position-specific scoring (weight) matrices (BLOCKS); they add a scoring scheme to the frequency matrices
• HMM profiles – probabilistic models derived from alignment profiles
• Protein domain – a part of a protein's sequence and structure that can evolve, function, and exist independently of the rest of the protein chain
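The step from a frequency matrix to a PSSM can be sketched as follows: count residues at each motif position (adding a pseudocount so that unseen residues do not get a zero frequency), convert the frequencies to log-odds scores against a background distribution, and slide the resulting matrix along a query sequence. The uniform background, the pseudocount of 1, the function names and the toy motif are all assumptions; PRINTS and BLOCKS use their own weighting and scoring schemes, which are not reproduced here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(motif: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-position residue frequencies of an ungapped motif alignment, with pseudocounts."""
    matrix = []
    for i in range(len(motif[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in motif:
            counts[seq[i]] += 1  # raises KeyError for non-standard residues
        total = sum(counts.values())
        matrix.append({aa: c / total for aa, c in counts.items()})
    return matrix


def pssm(freqs, background=None):
    """Log-odds scores log2(f / b) per position; a uniform background is assumed by default."""
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return [{aa: math.log2(col[aa] / background[aa]) for aa in AMINO_ACIDS} for col in freqs]


def best_hit(matrix, sequence: str) -> tuple[int, float]:
    """Slide the PSSM along the sequence; return (1-based start, score) of the best window."""
    width = len(matrix)
    windows = range(len(sequence) - width + 1)  # max() below fails if this is empty
    return max(
        ((start + 1, sum(matrix[j][sequence[start + j]] for j in range(width))) for start in windows),
        key=lambda hit: hit[1],
    )


# Hypothetical ungapped motif instances and query sequence.
motif = ["GKS", "GKT", "GKS", "AKS"]
matrix = pssm(frequency_matrix(motif))
print(best_hit(matrix, "MMGKSMM"))  # best window starts at position 3 ("GKS")
```

Positive scores mark residues seen more often in the motif than expected by chance, negative scores mark residues seen less often; summing them over a window gives the log-odds that the window belongs to the motif rather than to the background.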
![Page 14: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/14.jpg)
![Page 15: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/15.jpg)
![Page 16: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/16.jpg)
![Page 17: Pattern databases in protein analysis](https://reader033.fdocuments.us/reader033/viewer/2022051821/56815d4f550346895dcb59b1/html5/thumbnails/17.jpg)