Phylogeny reconstruction How do we reconstruct the tree of life? Outline: Terminology Methods distance parsimony maximum likelihood bootstrapping Problems homoplasy hybridisation Dr. Sean Graham, UBC.
Date posted: 19-Dec-2015

Page 1:

Phylogeny reconstruction How do we reconstruct the tree of life?

Outline:

Terminology

Methods

distance

parsimony

maximum likelihood

bootstrapping

Problems

homoplasy

hybridisation

Dr. Sean Graham, UBC.

Page 2:

Phylogenetic reconstruction

Page 3:

• Rooted trees

Phylogenetic reconstruction

Page 4:

• Rooted trees

Outgroup:

Phylogenetic reconstruction

Page 5:

Phylogenetic reconstruction

Introduction

Page 6:

Understanding Trees

[Figure: two tree diagrams of the same taxa. Tip order in the first: Birds, Crocodiles, Turtles, Amphibians, Mammals, Lizards, Snakes. Tip order in the second: Turtles, Amphibians, Mammals, Lizards, Snakes, Crocodiles, Birds.]

Page 7:

Do these phylogenies agree?

Figure 14.17

Page 8:

Branch lengths

[Figure: two trees with tips A, B, C, D; scale bar: 1 nt change]

Page 9:

Understanding Trees

Trees can be used to describe taxonomic groups

[Figure: three trees with tips A, B, C, D, E, labelled Monophyletic, Paraphyletic, and Polyphyletic]

Page 10:

What is the relationship between taxonomic names and phylogenetic groups?

[Figure: tree with tips Birds, Crocodiles, Turtles, Amphibians, Mammals, Lizards, Snakes; character marked: Amnion; group: Amniotes]

Page 11:

What is the relationship between taxonomic names and phylogenetic groups?

[Figure: tree with tips Birds, Crocodiles, Turtles, Lizards, Snakes; character marked: Cold Blooded; group: Reptiles]
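A monophyletic group can be tested mechanically: it must match the full set of tips below some node of the tree. The sketch below is illustrative only. The tree drawings did not survive in this transcript, so the topology in `TREE` is an assumed textbook-style arrangement, and `tips`, `clades` and `is_monophyletic` are hypothetical helper names.

```python
# Assumed topology for illustration; the slide's actual tree is not recoverable.
TREE = ("Amphibians",
        ("Mammals",
         ("Turtles",
          (("Lizards", "Snakes"), ("Crocodiles", "Birds")))))

def tips(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))

def clades(node):
    """Yield the tip set of every node in the tree, including the tips themselves."""
    yield tips(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(group, tree):
    """A group is monophyletic if it equals all the tips below some node."""
    return set(group) in list(clades(tree))

amniotes = {"Mammals", "Turtles", "Lizards", "Snakes", "Crocodiles", "Birds"}
reptiles = {"Turtles", "Lizards", "Snakes", "Crocodiles"}  # birds left out

print(is_monophyletic(amniotes, TREE))  # True
print(is_monophyletic(reptiles, TREE))  # False
```

Under this assumed tree, "Reptiles" fails the test because the smallest clade containing all reptiles also contains birds. Telling paraphyletic from polyphyletic groups needs more than the tip sets (whether the common ancestor shares the defining trait), so the sketch does not attempt it.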

Page 12:

What is the relationship between taxonomic names and phylogenetic groups?

[Figure: tree with tips Birds, Crocodiles, Turtles, Amphibians, Rodents, Lizards, Snakes, Bats; character marked: Wings]

Page 13:

Polyphyletic example: Amentiferae

Page 14:

Polyphyletic example: Amentiferae

Ancestor with separate flowers

Willows, Walnuts, Oaks

Evolution of catkins

Page 15:

Vertebrate Phylogeny

Are these groups monophyletic, paraphyletic or polyphyletic?

fish?

tetrapods? (= four limbed)

amphibians?

mammals?

endotherms (= warm blooded)?

Page 16:

Constructing Trees

Methods:

distance (UPGMA, Neighbor joining)

parsimony

maximum likelihood (Bayesian)

Page 17:

Distance Methods (phenetics)

Page 18:

Distance methods rely on clustering algorithms (e.g. UPGMA)

Example 1: morphology

[Figure: taxa A, B, C, D plotted on axes Trait 1 and Trait 2]

Distance matrix

     A    B    C    D
A         1.0  3.0  4.9
B              3.3  3.0
C                   3.0
D

Page 19:

UPGMA

Example 1: morphology

[Figure: taxa A, B, C, D plotted on axes Trait 1 and Trait 2; partial tree with tips A and B]

Distance matrix

     A    B    C    D
A         1.0  3.0  4.9
B              3.3  3.0
C                   3.0
D

Page 20:

UPGMA

Example 1: morphology

[Figure: taxa A, B, C, D plotted on axes Trait 1 and Trait 2; tree with tips A, B, C, D]

Distance matrix

     A    B    C    D
A         1.0  3.0  4.9
B              3.3  3.0
C                   3.0
D

Page 21:

Distance methods with sequence data

A: ATTGCAATCGG

B: ATTACGATCGG

C: GTTACAACCGG

D: CTCGTAGTCGA

Distance matrix

     A    B    C    D
A         1    3    5
B              3    7
C                   7
D

[Figure: partial tree with tips A and B]
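A simple distance between aligned sequences is the number of sites at which they differ. The sketch below counts those differences for the four sequences on this slide; the function name `count_differences` is a hypothetical helper, and counting raw differences (with no correction for multiple hits) is an assumption about how the slide's matrix was built.

```python
from itertools import combinations

# Sequences as printed on the slide.
SEQS = {
    "A": "ATTGCAATCGG",
    "B": "ATTACGATCGG",
    "C": "GTTACAACCGG",
    "D": "CTCGTAGTCGA",
}

def count_differences(seq1, seq2):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))

distances = {
    (x, y): count_differences(SEQS[x], SEQS[y])
    for x, y in combinations(SEQS, 2)
}

for pair, d in distances.items():
    print(pair, d)
```

Counted this way, five of the six values agree with the slide's matrix (A-C 3, A-D 5, B-C 3, B-D 7, C-D 7). The transcribed sequences for A and B differ at two sites, whereas the matrix lists 1, so one of the two was probably altered somewhere in transcription.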

Page 22:

Distance methods with sequence data

Original distance matrix

     A    B    C    D
A         1    3    5
B              3    7
C                   7
D

New distance matrix: take averages

     AB   C    D
AB        3    6
C              7
D

[Figure: partial tree with tips A and B]

Page 23:

Distance methods with sequence data

Original distance matrix

     A    B    C    D
A         1    3    5
B              3    7
C                   7
D

Averaged distance matrix

     AB   C    D
AB        3    6
C              7
D

[Figure: tree built up step by step, with tips A, B, C, D]

Page 24:

Distance methods with sequence data

[Matrices as on Page 23; figure: tree with tips A, B, C, D]
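The averaging step on these slides can be written out as a short clustering loop. This is a minimal UPGMA sketch run on the slide's matrix, not a full implementation: `upgma` is a hypothetical helper, it records only the merge order and the distance at which each merge happens, and it does not compute branch lengths.

```python
from itertools import combinations

# Pairwise distances as listed in the slide's matrix.
SLIDE_MATRIX = {
    ("A", "B"): 1, ("A", "C"): 3, ("A", "D"): 5,
    ("B", "C"): 3, ("B", "D"): 7, ("C", "D"): 7,
}

def upgma(pairwise):
    """Return the merges as (cluster_x, cluster_y, distance) tuples, in order."""
    dist = {frozenset(pair): float(d) for pair, d in pairwise.items()}
    sizes = {name: 1 for pair in pairwise for name in pair}  # cluster -> number of tips
    merges = []
    while len(sizes) > 1:
        # Join the two closest clusters.
        x, y = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = "".join(sorted(x + y))
        nx, ny = sizes.pop(x), sizes.pop(y)
        # Distance to every other cluster is the size-weighted average.
        for z in sizes:
            dist[frozenset((merged, z))] = (
                nx * dist[frozenset((x, z))] + ny * dist[frozenset((y, z))]
            ) / (nx + ny)
        sizes[merged] = nx + ny
        merges.append((x, y, dist[frozenset((x, y))]))
    return merges

for step in upgma(SLIDE_MATRIX):
    print(step)
```

On this matrix the first merge is A with B (distance 1), which reproduces the slide's averaged values AB-C = 3 and AB-D = 6. C then joins AB at 3, and D joins last at the weighted average (2 × 6 + 1 × 7) / 3 ≈ 6.33.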

Page 25:

Assumptions of distance methods

Page 26:

Strengths and weaknesses of distance methods

Page 27:

II. Parsimony Methods (Cladistics)

Hennig (German entomologist) wrote in 1950

Translated into English in 1966: very influential

Page 28:

Applying parsimony

• Consider four taxa (1-4) and four characters (A-D)

• Ancestral state: abcd

        Trait
Taxon   A    B    C    D
1       a’   b    c    d
2       a’   b’   c’   d’
3       a’   b’   c’   d
4       a’   b’   c    d

Page 29:

Applying parsimony

• Consider four taxa (1-4) and four characters (A-D)

• Ancestral state: abcd

        Trait
Taxon   A    B    C    D
1       a’   b    c    d
2       a’   b’   c’   d’
3       a’   b’   c’   d
4       a’   b’   c    d

[Figure: tree rooted at abcd with tips in the order 1 (a’bcd), 2 (a’b’c’d’), 3 (a’b’c’d), 4 (a’b’cd); changes marked a’, d’, c’, b, b’; legend: unique changes, convergences or reversals]

5 steps

Page 30:

Applying parsimony

• Consider four taxa (1-4) and four characters (A-D)

• Ancestral state: abcd

        Trait
Taxon   A    B    C    D
1       a’   b    c    d
2       a’   b’   c’   d’
3       a’   b’   c’   d
4       a’   b’   c    d

[Figure: tree rooted at abcd with tips in the order 1 (a’bcd), 4 (a’b’cd), 3 (a’b’c’d), 2 (a’b’c’d’); changes marked a’, d’, c’, b’; legend: unique changes, convergences or reversals]

4 steps
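The step counts on these two slides can be checked with Fitch's small-parsimony algorithm, which finds the minimum number of changes a tree needs for each character. In the sketch below the two topologies are assumptions, because the drawings did not survive in this transcript; they were chosen to match the tip orders and the marked changes. States are coded 0 (ancestral) and 1 (primed), the ancestor abcd is attached as an extra tip named "anc", and all function names are hypothetical.

```python
# Character states per taxon: 0 = ancestral (a, b, c, d), 1 = derived (primed).
MATRIX = {
    "anc": "0000",  # ancestral state abcd, used as an outgroup tip
    "1": "1000",    # a' b  c  d
    "2": "1111",    # a' b' c' d'
    "3": "1110",    # a' b' c' d
    "4": "1100",    # a' b' c  d
}

# Assumed topologies as nested pairs (not recoverable from the transcript).
TREE_FIRST = ("anc", (("1", ("2", "3")), "4"))   # tip order 1, 2, 3, 4
TREE_SECOND = ("anc", ("1", ("4", ("3", "2"))))  # tip order 1, 4, 3, 2

def fitch(node, states):
    """Return (possible state set, minimum changes) for one character."""
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_steps = fitch(node[0], states)
    right_set, right_steps = fitch(node[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No state in common: one change is needed on this pair of branches.
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, matrix):
    """Total number of steps over all characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )

print(tree_length(TREE_FIRST, MATRIX))   # 5
print(tree_length(TREE_SECOND, MATRIX))  # 4
```

Under these assumptions the second tree is the more parsimonious, because every derived state arises exactly once on it. The first tree needs an extra step in character B.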

Page 31:

Strengths and weaknesses of parsimony

Strengths

Weaknesses

Page 32:

Parsimony practice

        Position
Taxon   1234567
K       AGTACCG
L       AAGACTA
M       AACCTTA
N       AAAGTTA

Which unrooted tree is most parsimonious?

[Figure: three unrooted trees of K, L, M, N, with tip labels in the order L, M, N, K; L, K, N, M; and N, L, M, K. The change at position 2 is marked on each tree.]

Plot each change on each tree. Positions 1 and 2 are done.

Which positions help to determine relationships?
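One way to approach the last question is to screen for parsimony-informative positions: sites where at least two different states each occur in at least two taxa. Only such sites can favour one unrooted tree over another. The sketch below applies that rule to the practice data; `informative_positions` is a hypothetical helper name.

```python
from collections import Counter

# Practice data from the slide (positions 1-7).
ALIGNMENT = {
    "K": "AGTACCG",
    "L": "AAGACTA",
    "M": "AACCTTA",
    "N": "AAAGTTA",
}

def informative_positions(alignment):
    """Return 1-based positions where two or more states are each shared by two or more taxa."""
    seqs = list(alignment.values())
    hits = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i] for seq in seqs)
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            hits.append(i + 1)
    return hits

print(informative_positions(ALIGNMENT))  # [5]
```

Constant sites, sites where a single taxon differs, and sites where all taxa differ cost the same number of steps on every tree, which is why they drop out of the comparison.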

Page 33:

Inferring the direction of evolution

Taxa: Chimp, Human, Gorilla, Bonobo, Orangutan, Mouse

Sequences, in the order listed on the slide:

ACGCTAGCTACG

ACGCTAGCTACG

ACGCTAGCTAGG

ACGCTAGCTAGG

ACGCTAGCTAGG

ACGCTAGCTAGG

Where did the mutation occur, and what was the change?

Page 34:

III. Maximum likelihood (and Bayesian)

Page 35:

Maximum likelihood: a starting sketch

• Probabilities – transition: 0.2, transversion: 0.1, no change: 0.7

[Figure: the four nucleotides A, C, T, G with arrows marked as transitions and transversions; candidate trees with nucleotide states at the nodes]

Find the tree with the highest probability

Page 36:

Maximum likelihood: a starting sketch

• Probabilities – transition: 0.2, transversion: 0.1, no change: 0.7

[Figure: the four nucleotides A, C, T, G with arrows marked as transitions and transversions; candidate trees with nucleotide states at the nodes]

Find the tree with the highest probability

P = (.7)(.1)(.2)(.7)(.7)

Page 37:

Maximum likelihood: a starting sketch

• Probabilities – transition: 0.2, transversion: 0.1, no change: 0.7

[Figure: the four nucleotides A, C, T, G with arrows marked as transitions and transversions; three candidate trees with nucleotide states at the nodes]

Find the tree with the highest probability

P = (.7)(.1)(.2)(.7)(.7)

P = (.7)(.1)(.7)(.7)(.7)

P = (.1)(.2)(.7)(.7)(.2)
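The three products above can be evaluated directly. The sketch below uses the slide's per-branch probabilities and its three lists of factors. `branch_probability` is a hypothetical helper that shows how a single branch would be scored: purine-to-purine or pyrimidine-to-pyrimidine changes are transitions, and everything else is a transversion.

```python
from math import prod

# Per-branch probabilities from the slide.
P_TRANSITION, P_TRANSVERSION, P_NO_CHANGE = 0.2, 0.1, 0.7
PURINES = {"A", "G"}  # C and T are pyrimidines

def branch_probability(parent, child):
    """Probability of one branch, given the nucleotides at its two ends."""
    if parent == child:
        return P_NO_CHANGE
    same_class = (parent in PURINES) == (child in PURINES)
    return P_TRANSITION if same_class else P_TRANSVERSION

# The three products, factor by factor, as written on the slide.
TREE_FACTORS = [
    [0.7, 0.1, 0.2, 0.7, 0.7],
    [0.7, 0.1, 0.7, 0.7, 0.7],
    [0.1, 0.2, 0.7, 0.7, 0.2],
]

tree_probabilities = [prod(factors) for factors in TREE_FACTORS]
best = max(range(len(tree_probabilities)), key=tree_probabilities.__getitem__)

for i, p in enumerate(tree_probabilities, start=1):
    print(f"tree {i}: P = {p:.5f}")
print("highest probability: tree", best + 1)
```

The second product is the largest (about 0.024, against about 0.0069 and 0.0020), so in this toy comparison the second candidate would be preferred.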

Page 38:

Assessment of Maximum Likelihood (also Bayesian)

• Strengths

• Weaknesses

Page 39:

Characters to use in phylogeny

• Morphology

• DNA sequence

Page 40:

Challenges of using DNA data

Alignment can be very challenging!

Taxon 1 AATGCGC
Taxon 2 AATCGCT

Taxon 1 AATGCGC
Taxon 2

Page 41:

Informative sequences evolve at moderate rates

• Too slow?
– not enough variation
– Taxon 1 AATGCGC
– Taxon 2 AATGCGC
– Taxon 3 AATGCGC

Polytomy

Page 42:

Example of insufficient evidence: metazoan phylogeny

Fungi

Metazoans

Page 43:

Challenges: sunflower phylogeny

[Figure legend: two symbols, "= 15 spp!" and "= 12 spp!"]

• Recent radiation (200,000 years)

• Many species, much hybridization

• Need more rapidly evolving markers!!

Page 44:

Informative sequences evolve at moderate rates

• Too fast?
– homoplasy likely
– “saturation” – only 4 possible states for DNA
– Taxon 1 ATTCTGA
– Taxon 2 GTAGTGG
– Taxon 3 CGTGCTG

Polytomy

Page 45:

Saturation

• Imagine changing one nucleotide every hour to a random nucleotide

• Split the ancestral population in 2.

Time         Lineage 1   Lineage 2
Start        ACGTGCT     ACGTGCT
One hour     ACTTGCT     ACGAGCT
Four hours   ACCTGAA     GCGATCC
8 hours      ACCAGAA     AGCCTCC
12 hours     AGCGGAA     GAGCTCC
24 hours?

Red indicates multiple mutations at a site
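The thought experiment above is easy to simulate. The sketch below splits the ancestral sequence into two lineages, changes one random site per lineage per hour, and records how many sites differ between them. It assumes each change picks a nucleotide different from the current one, so that every hour is a real substitution; `mutate` and `simulate_divergence` are hypothetical names.

```python
import random

NUCLEOTIDES = "ACGT"

def mutate(seq, rng):
    """Replace one randomly chosen site with a different nucleotide."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([n for n in NUCLEOTIDES if n != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]

def simulate_divergence(ancestor, hours, rng):
    """Return the number of differing sites between two lineages after each hour."""
    lineage1 = lineage2 = ancestor
    observed = []
    for _ in range(hours):
        lineage1 = mutate(lineage1, rng)
        lineage2 = mutate(lineage2, rng)
        observed.append(sum(a != b for a, b in zip(lineage1, lineage2)))
    return observed

rng = random.Random(0)  # fixed seed so the run is repeatable
diffs = simulate_divergence("ACGTGCT", 24, rng)

for hour in (1, 4, 8, 12, 24):
    actual = 2 * hour  # one substitution per lineage per hour
    print(f"{hour:>2} h: {actual:>2} substitutions so far, {diffs[hour - 1]} sites differ")
```

The number of substitutions keeps growing, but the observed difference can never exceed the seven sites in the sequence and soon stops tracking time. That plateau is what saturation means.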

Page 46:

Saturation: mammalian mitochondrial DNA

Page 47:

Forces of evolution and phylogeny reconstruction

How does each force affect the ability to reconstruct phylogeny?

mutation?

drift?

selection?

non-random mating?

migration?

Page 48:

Phylogeny case study I: whales

Are whales ungulates (hoofed mammals)? Figure 14.4

Page 49:

Whales: DNA sequence data

Hillis, D. M. 1999.

How reliable is this tree? Bootstrapping.

Page 50:

How consistent are the data?

• Take the dataset (5 taxa, 10 characters)

• Create a new data set by sampling characters at random, with replacement

Original dataset

Taxon     1  2  3  4  5  6  7  8  9  10
Human     A  C  G  T  T  G  T  A  C  T
Chimp     A  G  G  T  T  C  T  A  T  T
Bonobo    A  G  G  T  T  C  T  A  T  G
Gorilla   A  C  T  T  G  C  T  G  T  C
Orang     T  C  G  T  G  T  A  C  C  C

Resampled dataset (header row gives the original character numbers)

Taxon     3  8  2  6  10 10 5  8  8  7  3
Human     G  A  C  G  T  T  T  A  A  T  G
Chimp     G  A  G  C  T  T  T  A  A  T  G
Bonobo    G  A  G  C  G  G  T  A  A  T  G
Gorilla   T  G  C  C  C  C  G  G  G  T  T
Orang     G  C  C  T  C  C  G  C  C  A  G
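The resampling step can be sketched in a few lines. `resample_columns` rebuilds a dataset from a given list of character numbers, and `bootstrap_replicate` draws those numbers at random with replacement; both names are hypothetical. The usual convention, assumed here, is to draw as many characters as the original dataset contains, although the slide's example happens to list eleven draws.

```python
import random

# Original dataset from the slide (characters 1-10).
DATASET = {
    "Human":   "ACGTTGTACT",
    "Chimp":   "AGGTTCTATT",
    "Bonobo":  "AGGTTCTATG",
    "Gorilla": "ACTTGCTGTC",
    "Orang":   "TCGTGTACCC",
}

def resample_columns(dataset, columns):
    """Build a new dataset from the given 1-based character numbers."""
    return {
        taxon: "".join(seq[c - 1] for c in columns)
        for taxon, seq in dataset.items()
    }

def bootstrap_replicate(dataset, rng):
    """Draw as many characters as the dataset has, at random, with replacement."""
    n_chars = len(next(iter(dataset.values())))
    columns = [rng.randint(1, n_chars) for _ in range(n_chars)]
    return columns, resample_columns(dataset, columns)

# Reproduce the slide's resampled dataset from its listed character numbers.
slide_columns = [3, 8, 2, 6, 10, 10, 5, 8, 8, 7, 3]
for taxon, seq in resample_columns(DATASET, slide_columns).items():
    print(f"{taxon:<8}{seq}")

# Draw a fresh random replicate.
columns, replicate = bootstrap_replicate(DATASET, random.Random(1))
print(columns)
```

In a full analysis a tree would be built from each of many such replicates. The share of replicates that recover a given group is reported as that group's bootstrap support.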

Page 51:

Whales: DNA sequence data

Hillis, D. M. 1999.

Page 52:

Molecular clocks

Page 53:

Basic idea of molecular clocks

[Figure: tree with tips chimps, humans, whales, hippos; labels: 56 mya, 60 substitutions, 6 substitutions]
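The arithmetic behind the figure can be sketched as follows. The pairing of labels is an assumption, since the figure itself is lost: the whale-hippo split is taken as the calibration point (56 mya, 60 substitutions), and the chimp-human pair as the one with 6 substitutions. The function name `estimate_divergence_mya` is hypothetical, and a constant substitution rate is assumed.

```python
def estimate_divergence_mya(substitutions, calib_substitutions, calib_time_mya):
    """Estimate a divergence time from a calibrated, constant substitution rate."""
    rate = calib_substitutions / calib_time_mya  # substitutions per million years
    return substitutions / rate

# Assumed reading of the slide: whales/hippos calibrate the clock.
chimp_human_mya = estimate_divergence_mya(
    substitutions=6, calib_substitutions=60, calib_time_mya=56.0
)
print(f"estimated chimp-human divergence: {chimp_human_mya:.1f} mya")
```

With one tenth as many substitutions, the estimated split is one tenth as old, about 5.6 million years under this reading. The estimate is only as good as the assumption that the rate is the same in both pairs of lineages.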

Page 54:

Challenges for phylogeny: gene flow

Page 55:

Sunflower annuals

Page 56:

Different genes may have different histories!

Page 57:

Phylogeny summary

Page 58:

Phylogeny study questions

1) Explain in words the difference between monophyletic, paraphyletic, and polyphyletic taxa. Draw a hypothetical phylogeny representing each type. Give an actual example of a commonly recognized paraphyletic taxon in both animals and in plants.

2) How can a reconstructed phylogeny be used to determine if a similar character in two taxa is due to homoplasy?

3) Whales are classified as cetaceans, not artiodactyl ungulates. This makes artiodactyls paraphyletic – why? What is the evidence that whales belong in the artiodactyls?

4) Phenetics (distance methods) and cladistics (parsimony) differ in the ways they recognize and use similarities among taxa to form phylogenetic groupings. What types of similarity does each school recognize, and how useful is each type of similarity considered to be for identifying groups?

Page 59:

Phylogeny study questions

5) What is “bootstrapping” in the context of phylogenetic analysis, and why is this procedure performed?

6) Why are maximum likelihood methods increasing in popularity for reconstructing phylogenies? In your answer, include a short description of how this method identifies the best phylogeny.

7) For what kinds of data can maximum likelihood methods of phylogeny construction be used? Why is this so? What types of data are typically not used, and why?

8) Would animal mitochondrial DNA provide a reasonable molecular tool for evaluating deep phylogenetic relationships between animal phyla? What about ribosomal DNA? Justify your answers.

9) Integrative question: Draw a pair of axes with “Time since divergence” on the x axis and “percent of sites that are the same” on the y axis. Draw a graph that shows the basic pattern for third codon sites: is your graph linear? Explain why or why not.

Page 60:

Phylogeny study questions

10) You are studying a group of species that lives in two very different environments. You build two phylogenies: one is based on a locus that is probably under divergent selection in the two environments, while the other phylogeny is based on a neutral locus. Which phylogeny would be more likely to represent the species history? Why?

11) It has been known for a number of years that Anolis lizards found in similar micro-habitats on many separate islands in the Caribbean are very similar to each other (for example, large lizards that feed on the ground, smaller lizards that feed on tree trunks, and very small lizards that feed at the tops of branches). Two different historical explanations have been proposed for this pattern: each morph has evolved repeatedly on each island, or each morph has evolved just once, then dispersed. Sketch a phylogeny that would support each hypothesis.

12) Integrative question: the Cameroon lake cichlid phylogeny, showing that the lake species were monophyletic, was based on mitochondrial DNA. Explain why this might not reflect the species history. How could you be more certain about the phylogeny?

13) Explain why allopolyploid taxa pose problems for phylogenies.