Phylogenetics workshop: Protein sequence phylogeny week 2


Page 1: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Phylogenetics workshop: Protein sequence phylogeny

week 2

Darren Soanes

Page 2: Phylogenetics  workshop: Protein sequence  phylogeny week 2

• Species trees

• Interpretation of trees

• Taxon sampling

• Tools

• Lateral (horizontal) gene transfer

• Fast evolving genes

Page 3: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Using DNA sequence to construct trees

TGCTATT TGCTTTT TGCTTTT

TGCTATT – ancestral DNA sequence

TGCTTTT – sequence change due to mutation

Page 4: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Reversals can confuse phylogenies

TGCTATT TGCTATT TGCTTTT TGCTTTT TGCTTTT

TGCTATT – ancestral DNA sequence

TGCTTTT – sequence change

TGCTATT – reversal
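
The effect of a reversal can be illustrated with a short sketch. It uses the three sequences from the slide as a single lineage (ancestor, mutation, reversal) and compares the number of substitutions that actually occurred with the number visible when only the two ends are compared. The function name `hamming` and the variable names are illustrative choices, not part of the workshop material.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Lineage from the slide: ancestral sequence -> mutation -> reversal.
lineage = ["TGCTATT", "TGCTTTT", "TGCTATT"]

# Substitutions that actually happened along the lineage.
actual_changes = sum(hamming(a, b) for a, b in zip(lineage, lineage[1:]))

# Differences visible when only the ancestor and the final sequence are compared.
observed_changes = hamming(lineage[0], lineage[-1])

print(f"actual substitutions: {actual_changes}")
print(f"observed differences: {observed_changes}")
```

Two substitutions occurred, but the end points are identical, so a method that only counts observed differences would underestimate the amount of change on this branch.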

Page 5: Phylogenetics  workshop: Protein sequence  phylogeny week 2

To minimise the effect of reversals

• Use DNA sequences that are evolving slowly – mutations happen rarely.

• Use long stretches of DNA.

• Align sequences, and use the parts of the alignment that show a high degree of conservation.

• rDNA sequences (genes that encode ribosomal RNA) are often used.

Page 6: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Species tree constructed using ribosomal DNA (rDNA) sequence

Page 7: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Using protein sequences to create species trees

• Advantages

– protein sequences evolve more slowly than DNA sequences (many DNA mutations are neutral – they do not change amino acid sequences)

– reversals are less common than in DNA

• Single copy protein encoding genes identified

• Protein sequences joined together to create a multiple protein sequence for each species

• Sequences aligned

• Disadvantage – need sequenced genomes
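
The joining step can be sketched as follows. This is a minimal illustration, assuming the single-copy proteins have already been identified and stored per species under shared gene names. The function name `concatenate_proteins`, the gene names and the toy sequences are all hypothetical.

```python
def concatenate_proteins(proteins_by_species, gene_order):
    """Join each species' proteins, in the same gene order, into one sequence."""
    joined = {}
    for species, proteins in proteins_by_species.items():
        missing = [gene for gene in gene_order if gene not in proteins]
        if missing:
            # A species tree needs every chosen gene in every species.
            raise KeyError(f"{species} lacks: {', '.join(missing)}")
        joined[species] = "".join(proteins[gene] for gene in gene_order)
    return joined


# Toy input: invented species labels, gene names and sequences.
proteins_by_species = {
    "species_A": {"gene1": "MKTAY", "gene2": "MLSRW"},
    "species_B": {"gene1": "MKSAY", "gene2": "MLTRW"},
}

joined = concatenate_proteins(proteins_by_species, ["gene1", "gene2"])
for species, sequence in joined.items():
    print(species, sequence)
```

Using one fixed gene order for every species keeps the same gene in the same region of each joined sequence, which the subsequent alignment step relies on.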

Page 8: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Fungal species trees – more proteins = better resolution

[Figure: fungal species trees built from 30 proteins and from 60 proteins. Labels on the trees: basidiomycetes, ascomycetes, filamentous ascomycetes, yeasts, zygomycete, microsporidia, oomycete (not fungi), plant.]

Page 9: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Fungal Species Tree (based on 153 concatenated protein sequences)

Page 10: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Clades

A clade consists of an ancestor organism and all its descendants.
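
The definition can be made concrete with a small sketch. It assumes a rooted tree written as nested tuples, where a leaf is a string, and checks whether a group of taxa is exactly the set of descendants of some node. The representation, the function names and the taxon labels A to D are assumptions made for illustration.

```python
def leaves(node):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the leaf set of every clade: each node with all its descendants."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_clade(tree, group):
    """True if the group is exactly the set of leaves below one node."""
    return frozenset(group) in set(clades(tree))


# Toy rooted tree: A and B are sister taxa, then C, then D.
tree = ((("A", "B"), "C"), "D")

print(is_clade(tree, {"A", "B"}))       # A and B share an ancestor with no other descendants
print(is_clade(tree, {"A", "B", "C"}))
print(is_clade(tree, {"B", "C"}))       # their common ancestor also gave rise to A
```

The group {B, C} fails the check because the most recent common ancestor of B and C is also an ancestor of A, so the group leaves out one of that ancestor's descendants.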

Page 11: Phylogenetics  workshop: Protein sequence  phylogeny week 2
Page 12: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Gene trees

• The evolutionary history of genes can be represented as phylogenetic trees based on alignment of protein sequences.

• Gene duplication and loss can be inferred from phylogenetic trees.

• Protein sequences evolve more slowly than DNA sequences (due to redundancy in the genetic code).

Page 13: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Gene duplication

• Gene duplication due to unequal crossing over during meiosis can create gene families.

• Sequence and function of different members of a gene family can diverge.

Page 14: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Gene duplication

Page 15: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Sequence homology (1)

• Genes are said to be homologous if they share a common evolutionary ancestor.

• Orthologues are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologues retain the same function in the course of evolution (e.g. myoglobin in mammals).

Page 16: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Sequence homology (2)

• Paralogous genes are related by duplication within a genome. Paralogues often evolve new functions, even if these are related to the original one.

• In-paralogues are paralogues that were duplicated after a speciation event and are therefore in the same species.

• Out-paralogues are paralogues that were duplicated before a speciation event. They are not necessarily in the same species.

Page 17: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Orthology and paralogy

Page 18: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Paralogues

In-paralogues

Out-paralogues

A, B and C are different species

α and β are different paralogues of the same gene

Page 19: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Evolution of globin superfamily in human lineage

Page 20: Phylogenetics  workshop: Protein sequence  phylogeny week 2

TOR gene duplication events in fungi

TOR: protein kinase, subunit of a complex that regulates cell growth in response to nutrient availability and cellular stresses

Page 21: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Taxon sampling methods

• BLAST – easiest, though subjective

• Occurrence of Pfam (protein family) motif

• Clustering e.g.

– INPARANOID http://inparanoid.sbc.su.se/cgi-bin/index.cgi

– orthoMCL http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi

Page 22: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Minimum bootstrap

• A bootstrap value of 70% is thought to be broadly similar to a P-value of 0.05.

• Minimum bootstrap used depends on the study.

• To improve bootstrap support

– remove poorly aligned sequences if possible; poor alignment can be due to mis-annotation of genomes

– change taxon sampling

Page 23: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Collapse branches with bootstrap less than defined value
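
Collapsing can be sketched as below. This is a simplified illustration rather than the behaviour of any particular tree viewer. It assumes a tree stored as nested dictionaries with hypothetical keys `name`, `support` and `children`, ignores branch lengths, and leaves internal nodes without a support value untouched.

```python
def collapse(node, min_support):
    """Return a copy of the tree with weakly supported internal branches removed."""
    new_children = []
    for child in node.get("children", []):
        child = collapse(child, min_support)
        support = child.get("support")
        is_internal = bool(child.get("children"))
        if is_internal and support is not None and support < min_support:
            # Attach the grandchildren directly to this node (forms a polytomy).
            new_children.extend(child["children"])
        else:
            new_children.append(child)
    result = {key: value for key, value in node.items() if key != "children"}
    if new_children:
        result["children"] = new_children
    return result


# Toy tree: the (B, C) grouping has only 55% bootstrap support.
tree = {
    "children": [
        {
            "support": 95,
            "children": [
                {"name": "A"},
                {"support": 55, "children": [{"name": "B"}, {"name": "C"}]},
            ],
        },
        {"name": "D"},
    ]
}

collapsed = collapse(tree, min_support=70)
print([leaf["name"] for leaf in collapsed["children"][0]["children"]])
```

With a threshold of 70, the weakly supported (B, C) grouping disappears and A, B and C become direct children of the well supported node, so the tree no longer claims a resolution the data do not support.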

Page 24: Phylogenetics  workshop: Protein sequence  phylogeny week 2
Page 25: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Lateral gene transfer (purine-cytosine permease)

oomycete

fungi

Page 26: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Eukaryotic Tree of Life

Phytophthora sojae

Aspergillus oryzae

Page 27: Phylogenetics  workshop: Protein sequence  phylogeny week 2
Page 28: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Genes that evolve quickly (1)

• Synonymous substitution – change in DNA sequence that does not affect the amino acid sequence, often in the third position of a codon, e.g. CCG (Pro)→CCA (Pro).

• Non-synonymous substitution - change in DNA sequence that does affect the amino acid sequence, often in the first or second position of a codon, e.g. CCG (Pro)→CAG (Gln).
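
The two examples above can be checked with a short sketch that translates each codon using the standard genetic code and compares the amino acids. The function name `classify` is an illustrative choice, and the sketch assumes the standard nuclear code rather than any organelle-specific code.

```python
from itertools import product

# Standard genetic code, listed in TCAG order with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(before, after):
    """Label a codon change as synonymous or non-synonymous."""
    aa_before = CODON_TABLE[before.upper()]
    aa_after = CODON_TABLE[after.upper()]
    kind = "synonymous" if aa_before == aa_after else "non-synonymous"
    return kind, aa_before, aa_after


print(classify("CCG", "CCA"))  # third-position change, Pro -> Pro
print(classify("CCG", "CAG"))  # second-position change, Pro -> Gln
```

Amino acids are reported as one-letter codes, so proline appears as `P` and glutamine as `Q`.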

Page 29: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Genes that evolve quickly (2)

• For a given protein encoding gene (comparison between orthologues in more than one species)

• dN = number of non-synonymous substitutions per non-synonymous site

• dS = number of synonymous substitutions per synonymous site

• We can calculate the ratio dN/dS.

• For most genes this is < 1.

• For genes under evolutionary pressure to change protein sequence (diversify), dN/dS > 1.
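
How the ratio is read can be sketched as follows. The sketch assumes dN and dS have already been estimated by a dedicated program. The function names and the example values are hypothetical, and real analyses normally add a statistical test rather than relying on the raw ratio alone.

```python
def dn_ds_ratio(dn, ds):
    """Return dN/dS, refusing to divide when no synonymous change was seen."""
    if ds == 0:
        raise ValueError("dS is zero, so dN/dS is undefined")
    return dn / ds


def interpret(omega):
    """Give the usual reading of a dN/dS value."""
    if omega > 1:
        return "positive (diversifying) selection"
    if omega < 1:
        return "purifying selection"
    return "neutral evolution"


# Invented example values, for illustration only.
for gene, dn, ds in [("gene_x", 0.02, 0.10), ("gene_y", 0.30, 0.12)]:
    omega = dn_ds_ratio(dn, ds)
    print(f"{gene}: dN/dS = {omega:.2f} ({interpret(omega)})")
```

A value well below 1 means that amino acid changes are being removed by selection, which is the situation for most genes.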

Page 30: Phylogenetics  workshop: Protein sequence  phylogeny week 2

Genes that evolve quickly (3)

• CodeML (part of the PAML package) will calculate dN/dS for a set of orthologues from different (closely related) species.

• Human vs Chimpanzee – rapidly evolving genes involved in immunity, reproduction and olfaction (smell).

• Genes with very low dN/dS (under purifying selection) involved in metabolism, intracellular signalling, nerve / brain function.