Protein Bioinformatics Course Matthew Betts & Rob Russell AG Russell (Protein Evolution)
Course overview
• Day 1 - Modularity
• Day 2 - Interactions
• Day 3 - Modularity & Interactions
• Day 4 - Structure
• Day 5 - Structure & Interactions
Daily schedule
• 10:00-11:00 lecture
• 11:00-12:00 work on exercises in pairs
• 12:00-13:00 lunch
• 13:00-15:30 work on exercises in pairs
• 16:00-17:00 presentations by you
Protein Sequence Databases
• Homologues = proteins with a common ancestor
• Homology --> similar function
• Sequence similarity --> homology
• Find homologues using:
  – BLAST
  – profile searching
Database Searching
www.proteinmodelportal.org
Scores and E-values
Score: how similar is my sequence to one in the database?
• Alignment
• Substitution matrix
• Gap penalties
E-value: how many matches with a score >= this one would I expect by chance alone?
• cf. random sequences
• E = 1: one such match expected by chance
• E < 0.01: significant
• Depends on the database:
  – size: the same score gives a larger (less significant) E-value in a larger database
  – composition (the statistics assume random sequences)
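The E-value of a local alignment score can be sketched with the Karlin-Altschul formula E = K·m·n·e^(−λS), where m is the query length, n the total length of the database and λ, K are constants for the scoring system. The λ and K values below are assumptions (roughly what BLAST reports for BLOSUM62 with gap costs 11/1), and real BLAST also uses edge-corrected "effective" lengths, which this sketch ignores.

```python
import math

# Assumed Karlin-Altschul constants, roughly the values BLAST reports for
# BLOSUM62 with gap costs 11/1; other matrices or gap costs need other values.
LAMBDA = 0.267
K = 0.041


def e_value(raw_score: float, query_len: int, db_len: int,
            lam: float = LAMBDA, k: float = K) -> float:
    """Expected number of chance hits scoring >= raw_score: E = K * m * n * exp(-lambda * S).

    query_len (m) and db_len (n = total residues in the database) are used as given;
    BLAST additionally shortens both to "effective" lengths, which is ignored here.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalised score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2 ** (-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    query_len = 300
    # Same raw score, two database sizes: E scales linearly with database size.
    for db_len in (10**6, 10**8):
        e = e_value(60, query_len, db_len)
        print(f"db_len={db_len:.0e}  bits={bit_score(60):.1f}  E={e:.2g}")
```

Because E grows linearly with n, the same raw score is 100 times less significant in a database that is 100 times larger.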
Homology comes in two main types: orthology and paralogy.
What is the difference and why does this matter?
Figure: gene tree with duplication and speciation events. Genes separated by a duplication are paralogues; genes separated by a speciation are orthologues.
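Orthology and paralogy can be read off a gene tree whose internal nodes are labelled as speciation or duplication events: two genes are orthologues if their last common ancestor is a speciation, and paralogues if it is a duplication. The minimal Python sketch below assumes such a labelled (reconciled) tree is already available; the `Node` class and the example tree are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality/hashing, so nodes can go in sets
class Node:
    """Gene-tree node; internal nodes record the event that created their children."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" for internal nodes
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, *kids: Node) -> Node:
        """Attach child nodes and return self, so trees can be built in one expression."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def ancestors(node: Optional[Node]) -> list[Node]:
    """The node itself followed by its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def relationship(a: Node, b: Node) -> str:
    """Classify two genes by the event at their last common ancestor."""
    above_a = set(ancestors(a))
    lca = next(n for n in ancestors(b) if n in above_a)
    if lca.event == "speciation":
        return "orthologues"
    if lca.event == "duplication":
        return "paralogues"
    raise ValueError(f"last common ancestor {lca.name!r} has no event label")


def example_tree() -> dict[str, Node]:
    """Hypothetical tree: a duplication (genes A and B), then a human/mouse speciation."""
    leaves = {name: Node(name) for name in ("human_A", "mouse_A", "human_B", "mouse_B")}
    Node("dup", "duplication").add(
        Node("spec_A", "speciation").add(leaves["human_A"], leaves["mouse_A"]),
        Node("spec_B", "speciation").add(leaves["human_B"], leaves["mouse_B"]),
    )
    return leaves


if __name__ == "__main__":
    t = example_tree()
    for x, y in [("human_A", "mouse_A"), ("human_A", "human_B"), ("human_A", "mouse_B")]:
        print(x, y, relationship(t[x], t[y]))
```

In the example, human_A and mouse_B are paralogues even though they sit in different species, because the duplication preceded the speciation.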
Different Fates
Orthologues:
• Both copies required (one in each species)
  – conservation of function (‘same gene’)
  – adaptation to new environment
Easier to transfer knowledge of function between orthologues
Paralogues:
• Both copies useful
  – conservation of function
• One copy freed from selection
  – disabled
  – new function
• Different parts of each free from selection
  – function split between them
Assignment of orthology / paralogy can be complicated by:
• duplication preceding speciation
• lineage-specific deletions of paralogues
• complete genome duplications
• many-to-one relationships
• multi-domain proteins
Homology is usually detected by sequence similarity, but proteins with dissimilar sequences can still be homologous.
Betts, Guigo, Agarwal, Russell, EMBO J 2001
Proteins are modular
Since the early 1970s it has been observed that protein structures are divided into discrete elements or domains that appear to fold, function and evolve independently.
• Functional domains (Pfam, SMART, COGs, CDD, etc.)
• Intrinsic features
  – Signal peptide, transit peptides (SignalP)
  – Transmembrane segments (TMpred, etc.)
  – Coiled-coils (COILS server)
  – Low-complexity regions, disorder (e.g. SEG, DisEMBL)
• Hints about structure?
Given a sequence, what should you look for?
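Transmembrane segments are long stretches of hydrophobic residues, so a crude way to spot candidates is a sliding-window hydropathy profile. The sketch below uses the classic Kyte-Doolittle scale with a 19-residue window and a mean-hydropathy cut-off of 1.6. This is not how TMpred works (TMpred scores segments with statistics derived from known transmembrane proteins), hydrophobic signal peptides will also be flagged, and the sequence is assumed to contain only the 20 standard one-letter codes.

```python
from __future__ import annotations

# Kyte & Doolittle (1982) hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; element i covers seq[i:i + window]."""
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          cutoff: float = 1.6) -> list[tuple[int, int]]:
    """Merge windows with mean hydropathy above cutoff into (start, end) ranges.

    Positions are 0-based and end is exclusive, as in Python slicing.
    """
    segments: list[tuple[int, int]] = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > cutoff:
            if segments and i <= segments[-1][1]:
                # Overlaps or touches the previous hydrophobic stretch: extend it.
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


if __name__ == "__main__":
    # Hypothetical sequence: a poly-Leu stretch between charged, hydrophilic flanks.
    seq = "DEKRS" * 6 + "L" * 22 + "DEKRS" * 6
    print(candidate_tm_segments(seq))
```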
SMART domain ‘bubblegram’ for human fibroblast growth factor (FGF) receptor 1 (type P11362 into the web site smart.embl.de), with annotations:
• “Low sequence complexity” (linker regions? flexible? junk?)
• Signal peptide (secreted or membrane-attached)
• Transmembrane segment (crosses the membrane)
• Tyrosine kinase (phosphorylates Tyr)
• Immunoglobulin domains (bind ligands?)
Given a sequence, what should you look for?
Protein Modularity
• discrete structural and functional units
• found in different combinations in different proteins
Figure: domain architectures of receptor-related and non-receptor tyrosine kinases
• consider each domain separately when making predictions
Finding Protein Domains
• through partial matches to whole sequences
• by comparison to databases of domains (Pfam, SMART, InterPro)
• domains can be separated by:
  – low-complexity and disordered regions (SEG)
  – transmembrane regions (TMAP)
  – coiled-coils (COILS)
Figure: query sequence with separate partial matches to different regions
Repeat searches using each domain separately
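Splitting a query before repeating searches can be sketched as follows: residues in low-complexity windows are masked by their Shannon entropy, and the remaining unmasked stretches become separate queries. This is a simplified stand-in, not SEG's actual algorithm; the window length (12), entropy cut-off (2.2 bits) and minimum segment length (30) are illustrative assumptions.

```python
from __future__ import annotations

import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def low_complexity_mask(seq: str, window: int = 12, min_entropy: float = 2.2) -> list[bool]:
    """Flag every residue covered by at least one window with entropy below min_entropy."""
    mask = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) < min_entropy:
            mask[i:i + window] = [True] * window
    return mask


def split_on_mask(seq: str, mask: list[bool], min_len: int = 30) -> list[tuple[int, str]]:
    """Return (0-based start, segment) for unmasked stretches of at least min_len residues."""
    segments: list[tuple[int, str]] = []
    start = None
    for i, masked in enumerate(mask + [True]):  # trailing sentinel closes the last stretch
        if not masked and start is None:
            start = i
        elif masked and start is not None:
            if i - start >= min_len:
                segments.append((start, seq[start:i]))
            start = None
    return segments


if __name__ == "__main__":
    # Hypothetical query: two complex stretches separated by a poly-Q run.
    query = "ACDEFGHIKLMNPQRSTVWY" * 2 + "Q" * 20 + "ACDEFGHIKLMNPQRSTVWY" * 2
    for start, segment in split_on_mask(query, low_complexity_mask(query)):
        print(start, segment)
```

Each returned segment could then be searched on its own, so that a strong match in one region does not hide weaker matches elsewhere.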
12 000 domain alignments make sequence searching easier
WPP domain alignment
Alignments provide more information about a protein family than a single sequence does, and so allow more sensitive searches.
Domain alignments also (normally) lack the low-complexity or disordered regions and the additional domains that can confuse single-sequence searches.
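Why an alignment is more sensitive than a single sequence can be illustrated with a position-specific scoring matrix (PSSM): each alignment column gets its own log-odds scores, so conserved positions weigh more than variable ones. Pfam and SMART actually use profile hidden Markov models (HMMER), which also model insertions and deletions; the sketch below is a simpler ungapped PSSM with an assumed uniform background and a pseudocount of 1, and the toy alignment is hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AMINO_ACIDS)  # assumed uniform background frequency


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds score (bits) for every residue in every column of an ungapped alignment."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm: list[dict[str, float]], query: str) -> tuple[int, float]:
    """Slide the profile along the query; return (0-based offset, score) of the best window."""
    width = len(pssm)
    scores = [
        (i, sum(col[aa] for col, aa in zip(pssm, query[i:i + width])))
        for i in range(len(query) - width + 1)
    ]
    return max(scores, key=lambda t: t[1])


if __name__ == "__main__":
    pssm = build_pssm(["LPPLP", "LPPVP", "IPPLP", "VPPLP"])  # hypothetical toy alignment
    print(best_hit(pssm, "GSGSLPPLPGSGS"))
```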
Finding domains in a sequence
Cryptic domains: at the border of sequence detectability
Gallego et al, Mol Sys Biol 2010
Identified using more sensitive fold recognition methods that use structure to help find weak members of sequence families.
If Pfam or SMART or similar do not find a domain, and the region is probably not disordered, then fold recognition might help.
Domain peptide interactions
Recognition of ligands or targeting signals
Post-translational modifications
SH3-interacting motif PxxP, example instances:
3BP1_MOUSE/528-537 APTMPPPLPP
PTN8_MOUSE/612-629 IPPPLPERTP
SOS1_HUMAN/1149-1157 VPPPVPPRR
NCF1_HUMAN/359-390 SKPQPAVPPRPSA
PEXE_YEAST/85-94 MPPTLPHRDW
Terms used in the figure: “perpetrator”, “instance”, “motif”, “victim”
Peptides that interact with a common domain often share a common pattern, or motif, usually 3-8 amino acids long.
Linear motifs
Puntervoll et al, NAR, 2003; www.elm.org (Eukaryotic Linear Motif DB)
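Linear motifs such as PxxP are conventionally written as regular expressions (ELM stores its motif definitions this way). A minimal scan with Python's `re` module might look like this; wrapping the pattern in a lookahead reports overlapping matches, which matter in proline-rich stretches. The function name `find_motif` is hypothetical.

```python
from __future__ import annotations

import re

# "PxxP": a proline, any two residues, then another proline ("." matches any residue).
# The lookahead (?=...) consumes nothing, so overlapping matches are all reported;
# the capture group holds the matched instance.
PXXP = re.compile(r"(?=(P..P))")


def find_motif(seq: str, pattern: re.Pattern = PXXP) -> list[tuple[int, str]]:
    """Return (1-based start, instance) for every match; pattern must capture group 1."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # SOS1_HUMAN/1149-1157 instance; positions are relative to this peptide.
    print(find_motif("VPPPVPPRR"))
```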
Domains: large globular segments of the proteome that fold into discrete structures and belong to sequence families.
Linear motifs: small, non-globular segments that do not adopt a regular structure, and aren’t homologous to each other in the way domains are.
Motifs lie mostly in the disordered parts of the proteome.
Linear motifs versus domains
Intrinsically unstructured or disordered proteins or protein fragments
Disorder predictors (IUPred, RONN, DISOPRED, etc.)
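The dedicated predictors use more sophisticated models, but a classic whole-sequence heuristic shows the underlying idea: disordered proteins tend to combine low mean hydropathy with high mean net charge. The sketch below applies a charge-hydropathy boundary of the form ⟨H⟩ = (⟨R⟩ + 1.151) / 2.785, with Kyte-Doolittle hydropathy rescaled to 0..1; it is not how IUPred, RONN or DISOPRED work, it gives one call per sequence rather than per residue, and it averages over the whole sequence instead of using windows.

```python
from __future__ import annotations

# Kyte & Doolittle (1982) hydropathy scale, from -4.5 (Arg) to 4.5 (Ile).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def charge_hydropathy(seq: str) -> tuple[float, float]:
    """Mean net charge <R> and mean Kyte-Doolittle hydropathy <H> rescaled to 0..1."""
    seq = seq.upper()
    n = len(seq)
    positive = sum(seq.count(aa) for aa in "KR")  # His ignored (mostly neutral at pH 7)
    negative = sum(seq.count(aa) for aa in "DE")
    mean_net_charge = abs(positive - negative) / n
    mean_hydropathy = sum((KD[aa] + 4.5) / 9.0 for aa in seq) / n
    return mean_net_charge, mean_hydropathy


def looks_disordered(seq: str) -> bool:
    """True if <H> lies below the boundary (<R> + 1.151) / 2.785 (Uversky-style plot)."""
    r, h = charge_hydropathy(seq)
    return h < (r + 1.151) / 2.785


if __name__ == "__main__":
    for s in ("E" * 20, "LIVF" * 5):  # hypothetical test sequences
        print(s, looks_disordered(s))
```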
Neduva & Russell, Curr. Opin. Biotech, 2006
Linear-motif-mediated interactions are everywhere
They include motifs for:
• Targeting – e.g. KDEL
• Modifications – e.g. phosphorylation
• Signaling – e.g. SH3-binding PxxP
About 200 are currently known, and probably many more are still to be discovered.
Finding linear motifs in a sequence
Linear motifs are much harder to find than domains:
• Domains are long (>30 aa) and belong to sequence families, which help to detect new family members.
• Linear motifs are short (typically <8 aa) simple patterns; e.g. PxxP will occur by chance in most sequences.
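A back-of-the-envelope calculation shows why PxxP turns up by chance in most sequences. If residues were independent with a proline frequency of about 4.7% (an assumed, approximate proteome-wide value), each start position matches with probability p², giving about (L − 3)·p² expected matches in a sequence of length L. The Poisson step treats matches as independent, which overlapping matches are not, so the probabilities are only approximate.

```python
import math


def expected_pxxp(length: int, p_pro: float = 0.047) -> float:
    """Expected number of (possibly overlapping) PxxP matches in a random sequence.

    Each of the length - 3 start positions matches with probability p_pro ** 2:
    positions 1 and 4 must both be Pro, the two middle residues are unconstrained.
    """
    return max(length - 3, 0) * p_pro ** 2


def prob_at_least_one(length: int, p_pro: float = 0.047) -> float:
    """Poisson approximation to the probability of at least one chance PxxP."""
    return 1.0 - math.exp(-expected_pxxp(length, p_pro))


if __name__ == "__main__":
    for n in (100, 500, 1000):
        print(n, round(expected_pxxp(n), 2), round(prob_at_least_one(n), 2))
```

For a 500-residue protein this gives about one expected chance match, so a PxxP on its own says little without further evidence.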
www.russelllab.org/wiki