Phylogenetics workshop: Protein sequence phylogeny week 2 Darren Soanes.


Phylogenetics workshop: Protein sequence phylogeny

week 2

Darren Soanes

• Species trees
• Interpretation of trees
• Taxon sampling
• Tools
• Lateral (horizontal) gene transfer
• Fast evolving genes

Using DNA sequence to construct trees

TGCTATT TGCTTTT TGCTTTT

TGCTATT – ancestral DNA sequence

TGCTTTT – sequence change due to mutation

Reversals can confuse phylogenies

TGCTATT TGCTATT TGCTTTT TGCTTTT TGCTTTT

TGCTATT – ancestral DNA sequence

TGCTTTT – sequence change

TGCTATT – reversal
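The effect of a reversal can be sketched with the sequences from the slide. This is only an illustration: the `hamming` helper and the three-step lineage (ancestral, mutated, reverted) are assumptions made for the example, not part of the workshop material.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


ancestral = "TGCTATT"
mutated = "TGCTTTT"    # A -> T at position 5
reverted = "TGCTATT"   # T -> A at the same position (reversal)

# Mutation events that actually happened along the lineage
# ancestral -> mutated -> reverted.
true_events = hamming(ancestral, mutated) + hamming(mutated, reverted)

# Differences visible when only the end points are compared.
observed = hamming(ancestral, reverted)

print(f"events along the lineage: {true_events}, observed differences: {observed}")
```

Comparing only the end points hides both changes, which is why a lineage with a reversal can be grouped with sequences that never mutated.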

To minimise the effect of reversals

• Use DNA sequences that are evolving slowly – mutations happen rarely.

• Use long stretches of DNA.

• Align sequences and use the parts of the alignment that show a high degree of conservation.

• rDNA sequences (genes that encode ribosomal RNA) are often used.
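One way to keep only the conserved parts of an alignment is sketched below. The function names, the toy alignment and the 75% threshold are all assumptions chosen for illustration; dedicated alignment-trimming tools use more refined criteria.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.8):
    """Return indices of columns whose most common residue (gaps ignored)
    is present in at least min_fraction of all sequences."""
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        if counts and counts.most_common(1)[0][1] / n >= min_fraction:
            keep.append(i)
    return keep


def trim(alignment, columns):
    """Keep only the listed columns of every sequence."""
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Hypothetical toy alignment ("-" marks a gap).
alignment = [
    "TGCTATTA-G",
    "TGCTTTTACG",
    "TGCTTTTCCG",
    "TGCTATTG-G",
]
cols = conserved_columns(alignment, min_fraction=0.75)
trimmed = trim(alignment, cols)
print(cols)
print(trimmed)
```

Variable and gap-rich columns are dropped, leaving the slowly changing positions for tree building.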

Species tree constructed using ribosomal DNA (rDNA) sequence

Using protein sequences to create species trees

• Advantages

– protein sequences evolve more slowly than DNA sequences (many DNA mutations are silent – they do not change the amino acid sequence)

– reversals are less common than in DNA

• Single copy protein encoding genes identified
• Protein sequences joined together to create a multiple protein sequence for each species
• Sequences aligned
• Disadvantage – need sequenced genomes
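The joining step can be sketched as follows. This sketch assumes each protein has already been aligned separately before joining (a common variant of the workflow, and a different order from the slide); the gene names, species names and sequences are hypothetical.

```python
def concatenate(alignments):
    """Join per-gene aligned sequences into one long sequence per species.

    alignments: {gene: {species: aligned_sequence}}
    Only species present in every gene alignment are kept.
    """
    shared = set.intersection(*(set(per_gene) for per_gene in alignments.values()))
    genes = sorted(alignments)  # fixed gene order, identical for every species
    return {
        species: "".join(alignments[gene][species] for gene in genes)
        for species in sorted(shared)
    }


# Hypothetical single copy proteins, each aligned on its own.
alignments = {
    "geneA": {"sp1": "MKV-L", "sp2": "MKVAL", "sp3": "MRVAL"},
    "geneB": {"sp1": "GTP", "sp2": "GSP", "sp3": "GTP", "sp4": "GTP"},
}
supermatrix = concatenate(alignments)
for species, seq in supermatrix.items():
    print(species, seq)
```

Species missing from any one protein set (here `sp4`) are dropped, which reflects the need for sequenced genomes across all taxa of interest.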

Fungal species trees – more proteins = better resolution

[Figure: species trees built from 30 proteins and from 60 proteins. Labels: basidiomycetes, ascomycetes, filamentous ascomycetes, yeasts, zygomycete, oomycete (not fungi), microsporidia, plant]

Fungal Species Tree (based on 153 concatenated protein sequences)

Clades

A clade consists of an ancestor organism and all its descendants.
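The definition can be checked mechanically on a rooted tree. The sketch below assumes a tree written as nested tuples with hypothetical taxa A to E; it asks whether a set of taxa is exactly the set of descendants of some node.

```python
def leaves(node):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def is_clade(tree, taxa):
    """True if some node of the rooted tree has exactly these taxa as descendants."""
    target = frozenset(taxa)
    if leaves(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, target) for child in tree)


# Hypothetical rooted tree: ((A,B),C) is sister to (D,E).
tree = ((("A", "B"), "C"), ("D", "E"))

print(is_clade(tree, {"A", "B"}))
print(is_clade(tree, {"A", "B", "C"}))
print(is_clade(tree, {"B", "C"}))
```

The group B plus C is not a clade because their common ancestor also gave rise to A.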

Gene trees

• The evolutionary history of genes can be represented as phylogenetic trees based on alignment of protein sequences.

• Gene duplication and loss can be inferred from phylogenetic trees.

• Protein sequences evolve more slowly than DNA sequences (due to redundancy in the genetic code).

Gene duplication

• Gene duplication due to unequal crossing over during meiosis can create gene families.

• Sequence and function of different members of a gene family can diverge.

Gene duplication

Sequence homology (1)

• Genes are said to be homologous if they share a common evolutionary ancestor.

• Orthologues are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologues retain the same function in the course of evolution (e.g. myoglobin in mammals).

Sequence homology (2)

• Paralogous genes are related by duplication within a genome. Paralogues often evolve new functions, even if these are related to the original one.

• In-paralogues: paralogues that arose by duplication after a speciation event and are therefore in the same species.

• Out-paralogues: paralogues that arose by duplication before a speciation event. They are not necessarily in the same species.
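These definitions amount to a small decision rule, sketched below. The function name and the use of event ages (time before present, arbitrary units) are assumptions made for the example; real orthology tools infer these events from trees or sequence similarity rather than from known dates.

```python
def classify_pair(divergence_event, event_age, speciation_age):
    """Classify two homologous genes relative to a reference speciation event.

    divergence_event: "speciation" or "duplication", the event that separated the genes
    event_age: age of that event (time before present)
    speciation_age: age of the reference speciation event (time before present)
    """
    if divergence_event == "speciation":
        return "orthologues"
    if divergence_event == "duplication":
        # A smaller age means the duplication is more recent than the speciation.
        if event_age < speciation_age:
            return "in-paralogues"
        return "out-paralogues"
    raise ValueError("divergence_event must be 'speciation' or 'duplication'")


# Hypothetical ages, with the reference speciation at 100 units before present.
print(classify_pair("speciation", 100, 100))
print(classify_pair("duplication", 40, 100))
print(classify_pair("duplication", 250, 100))
```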

Orthology and paralogy

Paralogues

In-paralogues

Out-paralogues

A, B and C are different species

α and β are different paralogues of the same gene

Evolution of globin superfamily in human lineage

TOR gene duplication events in fungi

TOR: protein kinase, subunit of a complex that regulates cell growth in response to nutrient availability and cellular stresses

Taxon sampling methods

• BLAST – easiest, though subjective
• Occurrence of Pfam (protein family) motif
• Clustering, e.g.

– INPARANOID http://inparanoid.sbc.su.se/cgi-bin/index.cgi

– orthoMCL http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi

Minimum bootstrap

• A bootstrap value of 70% is thought to be broadly similar to a P-value of 0.05

• Minimum bootstrap used depends on the study
• To improve bootstrap support

– remove poorly aligned sequences if possible; these can be due to mis-annotation of genomes

– change taxon sampling

Collapse branches with bootstrap less than defined value
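Collapsing can be sketched on a small tree held as nested dictionaries. The tree layout, the helper names, the bootstrap values and the threshold of 70 are assumptions for illustration; tree viewers and phylogenetics libraries provide their own implementations.

```python
def leaf(name):
    """A leaf node, identified by its taxon name."""
    return {"name": name}


def collapse(node, threshold):
    """Return a copy of the tree in which internal nodes with bootstrap
    support below threshold are removed and their children are attached
    directly to the parent (producing a polytomy)."""
    if "children" not in node:
        return node
    new_children = []
    for child in node["children"]:
        child = collapse(child, threshold)
        support = child.get("support")
        if "children" in child and support is not None and support < threshold:
            new_children.extend(child["children"])
        else:
            new_children.append(child)
    return {**node, "children": new_children}


def to_newick(node):
    """Write the tree in Newick-like form, with support values after each clade."""
    if "children" not in node:
        return node["name"]
    inner = ",".join(to_newick(child) for child in node["children"])
    support = node.get("support")
    return f"({inner}){support if support is not None else ''}"


# Hypothetical tree with bootstrap values 45, 95 and 60.
tree = {"children": [
    {"support": 95, "children": [
        {"support": 45, "children": [leaf("A"), leaf("B")]},
        leaf("C"),
    ]},
    {"support": 60, "children": [leaf("D"), leaf("E")]},
]}

collapsed = collapse(tree, threshold=70)
print(to_newick(tree))
print(to_newick(collapsed))
```

Only the grouping supported at 95 survives; the weakly supported groupings become unresolved branches.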

Lateral gene transfer (purine-cytosine permease)

[Figure labels: oomycete, fungi]

Eukaryotic Tree of Life

[Figure labels: Phytophthora sojae, Aspergillus oryzae]

Genes that evolve quickly (1)

• Synonymous substitution – change in DNA sequence that does not affect the amino acid sequence, often in the third position of a codon, e.g. CCG (Pro)→CCA (Pro).

• Non-synonymous substitution – change in DNA sequence that does affect the amino acid sequence, often in the first or second position of a codon, e.g. CCG (Pro)→CAG (Gln).
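The two examples on this slide can be checked against the standard genetic code. The sketch below builds the codon table from the conventional TCAG ordering; the `classify` function name is an assumption for the example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(before, after):
    """Label a codon change as synonymous or non-synonymous."""
    before, after = before.upper(), after.upper()
    if before == after:
        return "no change"
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "non-synonymous"


print(CODON_TABLE["CCG"], CODON_TABLE["CCA"], classify("CCG", "CCA"))
print(CODON_TABLE["CCG"], CODON_TABLE["CAG"], classify("CCG", "CAG"))
```

The table uses one-letter amino acid codes, so Pro appears as P and Gln as Q.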

Genes that evolve quickly (2)

• For a given protein encoding gene (comparison between orthologues in more than one species):

• dN = number of non-synonymous substitutions per non-synonymous site
• dS = number of synonymous substitutions per synonymous site
• We can calculate the ratio dN/dS.
• For most genes this is < 1.
• For genes under evolutionary pressure to change protein sequence (diversify), dN/dS > 1.
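The ratio and its interpretation can be sketched as below. This is a deliberately simplified calculation: it takes substitution and site counts as given (hypothetical numbers here) and applies no correction for multiple substitutions at the same site, unlike the model-based estimates produced by dedicated software.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Simplified dN/dS from raw counts (no multiple-hit correction)."""
    dn = nonsyn_subs / nonsyn_sites   # non-synonymous substitutions per non-synonymous site
    ds = syn_subs / syn_sites         # synonymous substitutions per synonymous site
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


def interpret(ratio):
    """Conventional reading of a dN/dS ratio."""
    if ratio > 1:
        return "positive (diversifying) selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral evolution"


# Hypothetical counts for one pair of orthologues.
ratio = dn_ds(nonsyn_subs=6, nonsyn_sites=300, syn_subs=10, syn_sites=100)
print(round(ratio, 3), interpret(ratio))
```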

Genes that evolve quickly (3)

• CodeML (part of the PAML package) will calculate dN/dS for a set of orthologues from different (closely related) species.

• Human vs Chimpanzee – rapidly evolving genes are involved in immunity, reproduction and olfaction (smell).

• Genes with very low dN/dS (under purifying selection) are involved in metabolism, intracellular signalling, nerve / brain function.