Protein Bioinformatics Course Matthew Betts & Rob Russell AG Russell (Protein Evolution)


Course overview
• Day 1 - Modularity
• Day 2 - Interactions
• Day 3 - Modularity & Interactions
• Day 4 - Structure
• Day 5 - Structure & Interactions

Daily schedule
• 10:00-11:00 lecture
• 11:00-12:00 work on exercises in pairs
• 12:00-13:00 lunch
• 13:00-15:30 work on exercises in pairs
• 16:00-17:00 presentations by you

Protein Sequence Databases

• Homologues = proteins with a common ancestor
• Homology --> similar function
• Sequence similarity --> homology

• Find homologues using:
  • BLAST
  • Profile searching

Database Searching

www.proteinmodelportal.org

Scores and E-values

Score: how similar is my sequence to one in the database?
• Alignment
• Substitution matrix
• Gap penalties

E-value: how often would I expect to get >= this score by chance alone?
• cf. random sequences
• E = 1: one such match by chance
• E < 0.01: significant
• Depends on database:
  • size: larger = better
  • composition (random assumed)
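How an E-value depends on score and database size can be illustrated with the Karlin-Altschul estimate E = K·m·n·e^(−λS), where m is the query length, n the database length and S the raw alignment score. The sketch below is only an illustration: the function name and the default values of λ and K are assumptions chosen for the example, since the real values depend on the substitution matrix and gap penalties in use.

```python
import math

def expected_hits(score, query_len, db_len, lam=0.267, k=0.041):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    lam and k are illustrative assumptions; in practice they are
    fitted for a given substitution matrix and gap-penalty scheme.
    """
    return k * query_len * db_len * math.exp(-lam * score)

# Same query and score against databases of two different sizes.
for db_len in (1e8, 1e9):
    e = expected_hits(score=100, query_len=300, db_len=db_len)
    print(f"database of {db_len:.0e} residues: E = {e:.2e}")
```

For a fixed score, the expected number of chance matches grows in proportion to the number of residues searched, and it falls exponentially as the score rises.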

Homology comes in two main types:

Orthology and Paralogy

What is the difference and why does this matter?

[Figure: gene tree in which a duplication gives rise to paralogues and speciation gives rise to orthologues]

Different fates

Orthologues:
• Both copies required (one in each species)
  • conservation of function (‘same gene’)
  • adaptation to new environment

Easier to transfer knowledge of function between orthologues

Paralogues:
• Both copies useful
  • conservation of function
• One copy freed from selection
  • disabled
  • new function
• Different parts of each free from selection
  • function split between them

Assignment of orthology / paralogy can be complicated by:
• duplication preceding speciation
• lineage-specific deletions of paralogues
• complete genome duplications
• many-to-one relationships
• multi-domain proteins

Homology is usually found by sequence similarity, but proteins with dissimilar sequences can still be homologous.

Betts, Guigo, Agarwal, Russell, EMBO J 2001

Proteins are modular

Since the early 1970s it has been observed that protein structures are divided into discrete elements or domains that appear to fold, function and evolve independently.

• Functional domains (Pfam, SMART, COGs, CDD, etc.)
• Intrinsic features
  – Signal peptide, transit peptides (SignalP)
  – Transmembrane segments (TMpred, etc.)
  – Coiled-coils (COILS server)
  – Low complexity regions, disorder (e.g. SEG, DisEMBL)
• Hints about structure?

Given a sequence, what should you look for?
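To give a feel for what "low complexity" means, the sketch below flags windows of a sequence whose residue composition has low Shannon entropy. This is not the SEG algorithm, only a simplified stand-in for illustration. The function names, window size, threshold and toy sequence are all assumptions made up for the example.

```python
import math
from collections import Counter

def window_entropy(seq, window=12):
    """Shannon entropy (bits) of the residue composition of each window."""
    entropies = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        entropy = -sum((n / window) * math.log2(n / window)
                       for n in counts.values())
        entropies.append(entropy)
    return entropies

def low_complexity_windows(seq, window=12, threshold=2.2):
    """1-based start positions of windows with entropy below the threshold."""
    return [i + 1
            for i, h in enumerate(window_entropy(seq, window))
            if h < threshold]

# Toy sequence (hypothetical): a proline/serine-rich stretch between
# two more varied regions.
toy = "MKTAYIAKQRQISFVKSHFSRQ" + "PPPPSPPPPSPP" + "LEERLGLIEVQAPILSRVGDGT"
print(low_complexity_windows(toy))
```

Windows dominated by one or two residue types score close to zero, while windows with a mixed composition score several bits. Any cut-off between the two is a judgement call.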

SMART domain ‘bubblegram’ for human fibroblast growth factor (FGF) receptor 1 (type P11362 into web site: smart.embl.de), with annotated features:
• “Low sequence complexity” (linker regions? flexible? junk?)
• Signal peptide (secreted or membrane attached)
• Transmembrane segment (crosses the membrane)
• Tyrosine kinase (phosphorylates Tyr)
• Immunoglobulin domains (bind ligands?)


Protein Modularity

• discrete structural and functional units

• found in different combinations in different proteins

[Figure: domain architectures of receptor-related tyrosine kinases and non-receptor tyrosine kinases]

• consider separately in predictions

Finding Protein Domains

• through partial matches to whole sequences
• compare to databases of domains (Pfam, SMART, InterPro)
• can be separated by:
  • low-complexity and disordered regions (SEG)
  • trans-membrane regions (TMAP)
  • coiled-coils (COILS)

[Figure: query sequence with two separate regions, each labelled "match"]

Repeat searches using each domain separately
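Cutting a query into its domains before searching again can be sketched as follows. The helper name, the toy sequence and the domain boundaries are hypothetical. In practice the coordinates would come from a domain database search such as Pfam or SMART.

```python
def split_by_domains(sequence, matches):
    """Return one (name, sub-sequence) pair per domain match.

    matches: list of (name, start, end) with 1-based, inclusive
    coordinates, as domain databases usually report them.
    """
    return [(name, sequence[start - 1:end]) for name, start, end in matches]

# Hypothetical query: two matched regions separated by a serine-rich linker.
query = "MKTLLVAGG" + "ACDEFGHIKL" + "SSSSSSSS" + "MNPQRSTVWY"
matches = [("domA", 10, 19), ("domB", 28, 37)]

for name, subseq in split_by_domains(query, matches):
    # Each sub-sequence would be submitted as its own database search.
    print(f">{name}\n{subseq}")
```

Searching with each piece on its own keeps hits to one domain from being confused with hits to another, and leaves the linker out of the search entirely.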

12 000 domain alignments make sequence searching easier

WPP domain alignment

Alignments provide more information about a protein family than a single sequence does, and thus allow for more sensitive searches.

Domain alignments also lack low-complexity or disorder (normally) and other domains that can make single sequence searches confusing.

Finding domains in a sequence

Cryptic domains: at the border of sequence detectability

Gallego et al, Mol Sys Biol 2010

Identified using more sensitive fold recognition methods that use structure to help find weak members of sequence families.

If Pfam or SMART or similar do not find a domain, and the region is probably not disordered, then fold recognition might help.

Domain peptide interactions

Recognition of ligands or targeting signals

Post-translational modifications

3BP1_MOUSE/528-537    APTMPPPLPP
PTN8_MOUSE/612-629    IPPPLPERTP
SOS1_HUMAN/1149-1157  VPPPVPPRRR
NCF1_HUMAN/359-390    SKPQPAVPPRPSA
PEXE_YEAST/85-94      MPPTLPHRDW

SH3-interacting motif: PxxP
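The PxxP pattern can be written as a regular expression and checked against the five peptides listed here. This is a minimal sketch. The function name is an assumption, and the reported positions count from the start of each peptide fragment, not from the start of the full-length protein.

```python
import re

# "P..P" = proline, any two residues, proline. The lookahead lets
# overlapping occurrences (e.g. in PPPVPP) all be reported.
PXXP = re.compile(r"(?=(P..P))")

peptides = {
    "3BP1_MOUSE": "APTMPPPLPP",
    "PTN8_MOUSE": "IPPPLPERTP",
    "SOS1_HUMAN": "VPPPVPPRRR",
    "NCF1_HUMAN": "SKPQPAVPPRPSA",
    "PEXE_YEAST": "MPPTLPHRDW",
}

def find_pxxp(seq):
    """Return (1-based position within seq, matched text) for each PxxP."""
    return [(m.start() + 1, m.group(1)) for m in PXXP.finditer(seq)]

for name, seq in peptides.items():
    print(name, find_pxxp(seq))
```

Every peptide contains at least one match, and several contain more than one, which already hints at how unspecific such a short pattern is.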

[Figure labels: “perpetrator”, “instance”, “motif”, “victim”]

Peptides interacting with a common domain often show a common pattern or motif, usually 3-8 amino acids long.

Linear motifs

Puntervoll et al, NAR, 2003; www.elm.org (Eukaryotic Linear Motif DB)

Domains: large globular segments of the proteome that fold into discrete structures and belong in sequence families.

Linear motifs: small, non-globular segments that do not adopt a regular structure, and aren’t homologous to each other in the way domains are.

Motifs lie in the disordered part of the proteome.

Linear motifs versus domains

Intrinsically unstructured or disordered proteins or protein fragments

Disorder predictors (IUPred, RONN, DISOPRED, ...)

Neduva & Russell, Curr. Opin. Biotech, 2006

Linear motif mediated interactions are everywhere

Include motifs for:
• Targeting – e.g. KDEL
• Modifications – e.g. phosphorylation
• Signaling – e.g. SH3

About 200 are currently known, likely many more still to be discovered

Finding linear motifs in a sequence

Linear motifs are much harder to find than domains.

• Domains: long (>30 AA), belong to sequence families that help detect new family members
• Linear motifs: short (typically <8 AA), simple patterns, e.g. PxxP, that will occur in most sequences by chance
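A back-of-the-envelope calculation shows why a pattern such as PxxP turns up by chance. The sketch assumes residues are independent and that proline makes up about 5% of residues. Both are simplifying assumptions, and the function name is hypothetical.

```python
def expected_pxxp(length, p_pro=0.05):
    """Expected number of PxxP matches in a random sequence.

    Assumes independent residues with proline frequency p_pro
    (an illustrative value, not a measured one).
    """
    positions = max(length - 3, 0)   # number of 4-residue windows
    return positions * p_pro ** 2    # both fixed positions must be proline

for length in (100, 400, 1000):
    print(f"{length} residues: {expected_pxxp(length):.2f} matches expected")
```

Under these assumptions a 400-residue protein is expected to contain about one PxxP by chance, so a bare pattern match says little without further evidence such as disorder, conservation or context.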

www.russelllab.org/wiki
