
The Evolutionary Basis of Bioinformatics: An Introduction to Phylogenetics

http://bioquest.org/bedrock

> Sequence 1
GAGGTAGTAATTAGATCCGAAA…
> Sequence 2
GAGGTAGTAATTAGATCTGAAA…
> Sequence 3
GAGGTAGTAATTAGATCTGTCA…

What is phylogenetics?

Phylogenetics is the study of evolutionary relationships among and within species.

[Figure: tree relating crocodiles, birds, lizards, snakes, rodents, primates, and marsupials.]

This is an example of a phylogenetic tree.

Applications of phylogenetics

• Forensics: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

• Conservation: How much gene flow is there among local populations of island foxes off the coast of California?

• Medicine: What are the evolutionary relationships among the various prion-related diseases?

To be continued…

Phylogenetic concepts: Interpreting a Phylogeny

[Figure: rooted tree with tips Sequence A through Sequence E, drawn along a time axis.]

Which sequence is most closely related to B?

A, because B diverged from A more recently than from any other sequence.

Physical position in tree is not meaningful! Only tree structure matters.
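A minimal sketch of why only tree structure matters. The five-tip topology below is hypothetical (the slide's figure is not recoverable from this transcript), and the function names are illustrative. Trees are written as nested pairs; rotating branches at any node changes how the tree is drawn but not the relationships it encodes.

```python
# Hypothetical rooted tree of five sequences, written as nested pairs.
# This topology is an assumption for illustration, not taken from the slide.
tree = ((("A", "B"), "C"), ("D", "E"))
# The same tree with the branches rotated at every node.
rotated = (("E", "D"), ("C", ("B", "A")))


def canonical(node):
    """Return an order-independent form of a (sub)tree."""
    if isinstance(node, str):
        return node
    return frozenset(canonical(child) for child in node)


def tips(node):
    """Set of tip names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def closest_relatives(node, target):
    """Tips that share the most recent common ancestor with `target`."""
    if isinstance(node, str):
        return None
    for child in node:
        if child == target:
            others = [c for c in node if c != target]
            return set().union(*(tips(c) for c in others))
    for child in node:
        found = closest_relatives(child, target)
        if found is not None:
            return found
    return None


print(canonical(tree) == canonical(rotated))  # same structure, different drawing
print(closest_relatives(tree, "B"))
print(closest_relatives(rotated, "B"))
```

In this assumed topology, both drawings give A as the closest relative of B, even though B sits next to C in the rotated drawing.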

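Phylogenetic concepts: Rooted and Unrooted Trees

[Figure: rooted tree of A, B, C, and D (with a time axis) beside the equivalent unrooted tree with its root marked; a second panel marks a point X on the unrooted tree and asks what the rooted tree looks like if the root is placed there.]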

Rooting and Tree Interpretation

[Figure: one tree of bacteria, archaea (archaebacteria), oak, fruit fly, chicken, and human drawn with different root positions. Depending on the root, the same characters read as gains (+ cell nuclei, + bones) or as losses (– cell nuclei, – bones).]

Rooting Methods

Outgroup root: Add 2+ taxa whose branches contain the tree’s new root.

[Figure: trees of trout, eagle, bat, and mouse, with shark and ray.]

Must already know the position of the new tree’s root (often go from higher to lower taxonomic unit, e.g. family to genus).

How Many Trees?

(assuming bifurcation only)

                                     Unrooted trees                Rooted trees
# sequences  # pairwise distances    # branches/tree  # trees      # branches/tree  # trees
3            3                       3                1            4                3
4            6                       5                3            6                15
5            10                      7                15           8                105
6            15                      9                105          10               945
10           45                      17               2,027,025    18               34,459,425
30           435                     57               8.69 × 10^36 58               4.95 × 10^38

For N sequences:

• # pairwise distances: $\frac{N(N-1)}{2}$

• Unrooted trees: $2N-3$ branches per tree; $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$ trees

• Rooted trees: $2N-2$ branches per tree; $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$ trees
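The counts on this slide can be reproduced directly from the closed-form expressions for bifurcating trees: N(N - 1)/2 pairwise distances, (2N - 5)! / (2^(N - 3) (N - 3)!) unrooted trees, and (2N - 3)! / (2^(N - 2) (N - 2)!) rooted trees. The sketch below uses illustrative function names and assumes N is at least 3.

```python
from math import factorial


def pairwise_distances(n):
    """Number of distinct sequence pairs: N(N - 1) / 2."""
    return n * (n - 1) // 2


def unrooted_trees(n):
    """Bifurcating unrooted trees: (2N - 5)! / (2^(N - 3) * (N - 3)!)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_trees(n):
    """Bifurcating rooted trees: (2N - 3)! / (2^(N - 2) * (N - 2)!)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 6, 10, 30):
    print(
        n,
        pairwise_distances(n),
        2 * n - 3,          # branches per unrooted tree
        unrooted_trees(n),
        2 * n - 2,          # branches per rooted tree
        rooted_trees(n),
    )
```

Each unrooted tree can be rooted on any of its 2N - 3 branches, which is why the rooted count for N sequences equals the unrooted count multiplied by 2N - 3.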

Tree Types

Evolutionary trees measure time.

[Figure: rooted tree of sharks, seahorses, frogs, owls, crocodiles, armadillos, and bats; scale bar: 50 million years.]

Phylograms measure change.

[Figure: rooted phylogram of the same taxa; scale bar: 5% change.]

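Tree Properties

Ultrametricity: All tips are an equal distance from the root.

[Figure: rooted tree with tips X and Y and branches labeled a through e.]

a = b + c + d + e

Additivity: Distance between any two tips equals the total branch length between them.

[Figure: rooted tree with tips X and Y joined by branches labeled a through e.]

XY = a + b + c + d + e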

In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.
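Both properties can be checked on a small tree with branch lengths. The sketch below is illustrative only: the node names, topology, and branch lengths are made up, and the tree is stored as a map from each node to its parent and the length of the branch above it.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
# Names and lengths are assumptions for illustration.
clock_tree = {
    "n1": ("root", 1.0),
    "X": ("root", 3.0),
    "n2": ("n1", 1.0),
    "Y": ("n1", 2.0),
    "W": ("n2", 1.0),
    "Z": ("n2", 1.0),
}
tip_names = ["W", "X", "Y", "Z"]


def path_to_root(parent, node):
    """Branches (keyed by the node below them) from a node up to the root."""
    path = {}
    while node in parent:
        up, length = parent[node]
        path[node] = length
        node = up
    return path


def root_distance(parent, tip):
    return sum(path_to_root(parent, tip).values())


def tip_distance(parent, a, b):
    """Additive distance: total branch length on the path between two tips."""
    pa, pb = path_to_root(parent, a), path_to_root(parent, b)
    # Branches above the most recent common ancestor lie on both paths and are skipped.
    return sum(
        length
        for node, length in {**pa, **pb}.items()
        if (node in pa) != (node in pb)
    )


def is_ultrametric(parent, tips):
    depths = [root_distance(parent, t) for t in tips]
    return max(depths) - min(depths) < 1e-9


# Same topology, but the branch leading to Y has accumulated more change.
phylogram = dict(clock_tree, Y=("n1", 5.0))

print(is_ultrametric(clock_tree, tip_names), tip_distance(clock_tree, "W", "Y"))
print(is_ultrametric(phylogram, tip_names), tip_distance(phylogram, "W", "Y"))
```

The second tree is no longer ultrametric, but its tip-to-tip distances are still sums of branch lengths, which is the sense in which a phylogram stays additive.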

Tree Building Exercise

Ultrametricity: All tips are an equal distance from the root.

[Figure: rooted tree with tips X and Y and branches labeled a through e, where a = b + c + d + e.]

Using the distance matrix given, construct an ultrametric tree.
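One common way to build an ultrametric tree from distances is UPGMA (average-linkage clustering). The slide does not name a method, so treat this as one possible approach rather than the intended solution. The exercise's distance matrix is not included in this transcript, so the four-tip matrix below is hypothetical, as are the function and variable names.

```python
def upgma(dist):
    """Build a tree by UPGMA.

    `dist` maps (tip, tip) pairs to distances. Returns the tree as nested
    pairs plus a dict giving the height of every node above the tips.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {tip: 1 for pair in dist for tip in pair}
    height = {tip: 0.0 for tip in size}
    while len(size) > 1:
        # Join the two closest clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair, key=str)
        merged = (a, b)
        height[merged] = d[pair] / 2  # the new node sits halfway between them
        # Distance to every other cluster is the size-weighted average.
        for other in size:
            if other not in pair:
                d[frozenset((merged, other))] = (
                    size[a] * d[frozenset((a, other))]
                    + size[b] * d[frozenset((b, other))]
                ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        del size[a], size[b]
        d = {k: v for k, v in d.items() if a not in k and b not in k}
    (tree,) = size
    return tree, height


# Hypothetical distance matrix (not the one from the exercise).
example = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
tree, height = upgma(example)
print(tree)
print(height[tree])
```

Each branch length is the difference between the heights of the nodes at its two ends, so every tip ends up the same distance from the root.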

Phylogenetic Methods

Many different procedures exist. Three of the most popular:

Neighbor-joining
• Minimizes distance between nearest neighbors

Maximum parsimony
• Minimizes total evolutionary change

Maximum likelihood
• Maximizes likelihood of observed data

Comparison of Methods

Neighbor-joining
• Very fast
• Easily trapped in local optima
• Good for generating tentative tree, or choosing among multiple trees

Maximum parsimony
• Slow
• Assumptions fail when evolution is rapid
• Best option when tractable (<30 taxa, strong conservation)

Maximum likelihood
• Very slow
• Highly dependent on assumed evolution model
• Good for very small data sets and for testing trees built using other methods

Phylogenetic concepts: Homology and Homoplasy

Hair? Wings?

[Figure: tree of bat, chimp, and hawk starting from an ancestor with no hair and no wings. Hair is gained once (+ hair); wings are gained twice (+ wings, + wings).]

Homology: identity due to shared ancestry (evolutionary signal)

Homoplasy: identity despite separate ancestry (evolutionary noise)

Trees are hypotheses about evolutionary history

So far, we’ve looked at understanding and formulating these hypotheses. Now, let’s turn our attention to testing them.

Tree Testing

Let’s study the following four sequences:

How can we explain the indicated character?

P. A C A T A C G
Q. G T A T A C G
R. G C A C A T G
S. G C A C A C A

1. Homology: Changed just once.
2. Homoplasy: Changed twice or more.

[Figure: four-tip tree with P and Q on one side and R and S on the other.]

Homology more likely, but homoplasy still feasible.

Tree Testing

Now let’s look at four other sequences:

W. A C A T G T C A G A C G
X. G T A T G T C A G A C G
Y. G C A C A C T G A A T G
Z. G C A C A C T G A A C A

[Figure: the same four-tip tree layout as in the previous example.]

Same two explanations possible. Any changes to their relative likelihood?

Homology much more likely; homoplasy implausible.
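One way to put numbers on this comparison is to count the minimum number of changes each tree requires, using Fitch's parsimony algorithm. This is a swapped-in illustration, not necessarily the reasoning the slides intend. The function names are illustrative, and the trees assume the figure groups the first two sequences against the last two.

```python
def fitch_changes(tree, states):
    """Minimum number of changes for one character on a binary tree (Fitch)."""
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        if left & right:
            return left & right
        changes += 1  # the two subtrees disagree: one change is needed here
        return left | right

    visit(tree)
    return changes


def tree_length(tree, seqs):
    """Total minimum changes over all aligned positions."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in seqs.items()})
        for i in range(n_sites)
    )


first = {"P": "ACATACG", "Q": "GTATACG", "R": "GCACATG", "S": "GCACACA"}
second = {
    "W": "ACATGTCAGACG", "X": "GTATGTCAGACG",
    "Y": "GCACACTGAATG", "Z": "GCACACTGAACA",
}

# Grouping the first two sequences together vs. one alternative grouping.
print(tree_length((("P", "Q"), ("R", "S")), first),
      tree_length((("P", "R"), ("Q", "S")), first))
print(tree_length((("W", "X"), ("Y", "Z")), second),
      tree_length((("W", "Y"), ("X", "Z")), second))
```

In the first data set the alternative grouping costs only one extra change, so homoplasy at a single position would be enough to mislead. In the second it costs six extra changes, all of which would have to be independent coincidences.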

Tree Testing

Basic principle:

• Long branches: Strong evolutionary signal

• Short branches: Weak evolutionary signal

• Zero-length branches: NO evolutionary signal

[Figure: three trees of A, B, C, and D, drawn with long, short, and zero-length branches.]

Tree-testing methods: Bootstrapping, Jackknifing, Split decomposition, …

Applications of phylogenetics

1. Forensics

Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

Phylogenetic analysis

So what do the results mean?

• 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses?

• Do we have enough data to be confident in our conclusions? What additional data would help?

• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?

Applications of phylogenetics

2. Conservation

How much gene flow is there among local populations of island foxes off the coast of California?

Wayne, R. K., and Morin, P. A. 2004. Conservation genetics in the new molecular age. Frontiers in Ecology and the Environment 2: 89-97. (ESA publication)

Applications of phylogenetics

3. Medicine

What are the evolutionary relationships among the various prion-related diseases?

Linking Sequence and Structure

Enolase