An Introduction to Phylogenetics
> Sequence 1
GAGGTAGTAATTAGATCCGAAA…
> Sequence 2
GAGGTAGTAATTAGATCTGAAA…
> Sequence 3
GAGGTAGTAATTAGATCTGTCA…
Anton E. Weisstein
Indiana State University
March 11-14, 2004
Outline
I. Overview
II. Building and Interpreting Phylogenies
III. Evolutionary Inference
IV. Specific Applications
What is phylogenetics?
Phylogenetics is the study of evolutionary relationships.
Relationships among species:
[Figure: tree relating crocodiles, birds, lizards, snakes, rodents, primates, and marsupials.]
This is an example of a phylogenetic tree.
What is phylogenetics?
Relationships within species: HIV subtypes
[Figure: tree of HIV isolates grouped into subtypes A, B, C, D, F, and G. Tips are labeled by country of origin: Rwanda, Ivory Coast, Uganda, U.S., Italy, U.K., India, Ethiopia, S. Africa, Tanzania, Romania, Brazil, Cameroon, Netherlands, Taiwan, and Russia.]
So what is phylogenetics good for?
Phylogenetics has direct applications to:
• Conservation: test wood, ivory, meat products for poaching
• Agriculture: analyze specific differences between cultivars
• Forensics: DNA fingerprinting
• Medicine: determine specific biochemical function of cancer-causing genes
HIV Example 1: Florida dentist case
1990 case: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
II. Building and Interpreting Phylogenies
Phylogenetic concepts: Interpreting a Phylogeny
[Figure: rooted tree of Sequences A, B, C, D, and E drawn along a time axis.]
Which sequence is most closely related to B?
A, because B diverged from A more recently than from any other sequence.
Physical position in tree is not meaningful! Only tree structure matters.
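Phylogenetic concepts: Rooted and Unrooted Trees
[Figure: a rooted tree of A, B, C, and D drawn along a time axis, shown as equivalent to an unrooted tree with its root position marked. A second unrooted tree of A, B, C, and D with an unknown root position (X) leaves the rooted form undetermined.]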
How Many Trees?
# sequences | # pairwise distances | Unrooted: # trees               | Unrooted: # branches/tree | Rooted: # trees                 | Rooted: # branches/tree
3           | 3                    | 1                               | 3                         | 3                               | 4
4           | 6                    | 3                               | 5                         | 15                              | 6
5           | 10                   | 15                              | 7                         | 105                             | 8
6           | 15                   | 105                             | 9                         | 945                             | 10
10          | 45                   | 2,027,025                       | 17                        | 34,459,425                      | 18
30          | 435                  | 8.69 × 10^36                    | 57                        | 4.95 × 10^38                    | 58
N           | N(N - 1)/2           | (2N - 5)! / [2^(N-3) (N - 3)!]  | 2N - 3                    | (2N - 3)! / [2^(N-2) (N - 2)!]  | 2N - 2
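The general formulas in the last row of the table can be checked directly. The following is a minimal Python sketch; the function names are illustrative and not part of the original slides.

```python
from math import factorial

def pairwise_distances(n):
    """Number of pairwise distances among n sequences: N(N - 1)/2."""
    return n * (n - 1) // 2

def unrooted_trees(n):
    """Number of unrooted bifurcating trees: (2N - 5)! / [2^(N-3) (N - 3)!]."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def rooted_trees(n):
    """Number of rooted bifurcating trees: (2N - 3)! / [2^(N-2) (N - 2)!]."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

# Reproduce the rows of the table for the listed numbers of sequences.
for n in (3, 4, 5, 6, 10, 30):
    print(n, pairwise_distances(n), unrooted_trees(n), 2 * n - 3,
          rooted_trees(n), 2 * n - 2)
```

The number of trees grows so quickly that examining every tree becomes impractical well before 30 sequences.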
Tree Types
Evolutionary trees measure time.
[Figure: rooted tree of sharks, seahorses, frogs, owls, crocodiles, armadillos, and bats; scale bar: 50 million years.]
Phylograms measure change.
[Figure: rooted tree of the same taxa; scale bar: 5% change.]
Tree Properties
Ultrametricity: All tips are an equal distance from the root.
[Figure: rooted tree with tips X and Y and branch lengths a, b, c, d, e.]
a = b + c + d + e
Additivity: Distance between any two tips equals the total branch length between them.
[Figure: rooted tree with tips X and Y and branch lengths a, b, c, d, e.]
XY = a + b + c + d + e
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.
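Both properties can be tested on a matrix of pairwise distances without drawing the tree. The sketch below assumes the standard three-point condition for ultrametricity (in every triple of tips, the two largest distances are equal) and the four-point condition for additivity (in every quartet, the two largest of the three pairwise-sum combinations are equal). The two example matrices are hypothetical.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances tie."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True

def is_additive(d, tol=1e-9):
    """Four-point condition: in every quartet, the two largest sums tie."""
    for i, j, k, l in combinations(range(len(d)), 4):
        _, middle, largest = sorted((d[i][j] + d[k][l],
                                     d[i][k] + d[j][l],
                                     d[i][l] + d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True

# Hypothetical clock-like tree ((A,B),(C,D)): every tip is 3 units from the root.
clocklike = [[0, 2, 6, 6],
             [2, 0, 6, 6],
             [6, 6, 0, 3],
             [6, 6, 3, 0]]

# Hypothetical phylogram ((A,B),(C,D)) with unequal tip branches
# (A=1, B=3, C=2, D=5) and an internal branch of length 2.
phylogram = [[0, 4, 5, 8],
             [4, 0, 7, 10],
             [5, 7, 0, 7],
             [8, 10, 7, 0]]

print(is_ultrametric(clocklike), is_additive(clocklike))
print(is_ultrametric(phylogram), is_additive(phylogram))
```

The first matrix satisfies both conditions; the second is additive but not ultrametric, which is the phylogram case described in the text.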
Tree Building Exercise
Ultrametricity: All tips are an equal distance from the root.
Using the distance matrix given, construct an ultrametric tree.
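One common way to build an ultrametric tree from distances is UPGMA (average-linkage clustering). The exercise does not say which method to use, and its distance matrix is not reproduced in this transcript, so the sketch below uses a hypothetical four-sequence matrix and an illustrative function name.

```python
from itertools import combinations

def upgma(labels, matrix):
    """Build an ultrametric tree by repeatedly joining the closest clusters.

    Returns nested tuples (left, right, height), where height is the
    distance from the joining node down to any tip below it.
    """
    # cluster id -> (subtree, number of tips, node height)
    clusters = {i: (labels[i], 1, 0.0) for i in range(len(labels))}
    dist = {frozenset((i, j)): matrix[i][j]
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        i, j = sorted(pair)
        (tree_i, n_i, _), (tree_j, n_j, _) = clusters.pop(i), clusters.pop(j)
        height = dist[pair] / 2                 # the new node sits halfway
        # Distance to every remaining cluster is the size-weighted average.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        # Drop distances that involve the two merged clusters.
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical distance matrix (not the one from the exercise).
labels = ["A", "B", "C", "D"]
matrix = [[0, 2, 6, 6],
          [2, 0, 6, 6],
          [6, 6, 0, 3],
          [6, 6, 3, 0]]
tree = upgma(labels, matrix)
print(tree)
```

Here A and B join first at height 1.0, C and D join at height 1.5, and the two pairs meet at the root at height 3.0, so every tip is the same distance from the root.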
Phylogenetic Methods
Many different procedures exist. Three of the most popular:
Neighbor-joining
• Minimizes distance between nearest neighbors
Maximum parsimony
• Minimizes total evolutionary change
Maximum likelihood
• Maximizes likelihood of observed data
Comparison of Methods
Neighbor-joining
• Uses only pairwise distances
• Minimizes distance between nearest neighbors
• Very fast
• Easily trapped in local optima
• Good for generating tentative tree, or choosing among multiple trees
Maximum parsimony
• Uses only shared derived characters
• Minimizes total distance
• Slow
• Assumptions fail when evolution is rapid
• Best option when tractable (<30 taxa, homoplasy rare)
Maximum likelihood
• Uses all data
• Maximizes tree likelihood given specific parameter values
• Very slow
• Highly dependent on assumed evolution model
• Good for very small data sets and for testing trees built using other methods
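To make the neighbor-joining criterion concrete, here is a sketch of its first step only: choosing which pair of sequences to join. It assumes the standard selection score Q(i, j) = (n - 2) d(i, j) - (sum of row i) - (sum of row j), where the pair with the lowest Q is joined. The five-sequence distance matrix and the function name are hypothetical.

```python
from itertools import combinations

def nj_pick_pair(d):
    """Return (Q, i, j) for the pair that neighbor-joining would join first."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    best = None
    for i, j in combinations(range(n), 2):
        # Correct the raw distance by how far i and j are from everything else.
        q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
        if best is None or q < best[0]:
            best = (q, i, j)
    return best

# Hypothetical distances among five sequences (indices 0 to 4).
d = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(nj_pick_pair(d))
```

In this matrix the smallest raw distance is between sequences 3 and 4, but the pair with the lowest score is 0 and 1. The correction term is what lets neighbor-joining cope with lineages that evolve at different rates.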
Which procedure should we use?
Neighbor-joining? Maximum parsimony? Maximum likelihood?
All that we can!
• Each method has its own strengths
• Use multiple methods for cross-validation
• In some cases, none of the three gives the correct phylogeny!
III. Evolutionary Inference
Phylogenetic concepts: Homology and Homoplasy
Homology: identical character due to shared ancestry (evolutionary signal)
Homoplasy: identical character due to evolutionary convergence or reversal (evolutionary noise)
[Figure: three example trees.
Homology: lizards, snakes, rodents, primates; +hair marked once, on the branch leading to rodents and primates.
Homoplasy (convergence): birds, snakes, rodents, bats; +flight marked twice, once for birds and once for bats.
Homoplasy (reversal): worms, lizards, snakes; +legs followed by –legs.]
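The difference between the hair and flight examples can be counted. The sketch below uses Fitch's parsimony algorithm to find the minimum number of character changes a tree requires: a shared character that needs one change is consistent with homology, while one that needs two or more implies homoplasy. The tree shapes are assumptions read off the figure labels, and the 0/1 coding is illustrative.

```python
def fitch(tree, states):
    """Minimum number of state changes for one character on a binary tree.

    tree   : a tip name (str) or a 2-tuple of subtrees
    states : dict mapping tip name -> character state
    Returns (possible ancestral states, number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed.
        return shared, left_changes + right_changes
    # No state in common: one change must occur on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1

# Assumed topologies for the hair and flight examples.
hair_tree = (("lizards", "snakes"), ("rodents", "primates"))
hair = {"lizards": 0, "snakes": 0, "rodents": 1, "primates": 1}

flight_tree = (("birds", "snakes"), ("rodents", "bats"))
flight = {"birds": 1, "snakes": 0, "rodents": 0, "bats": 1}

print("hair changes:", fitch(hair_tree, hair)[1])
print("flight changes:", fitch(flight_tree, flight)[1])
```

Hair needs a single change on its tree, while flight needs two, matching the two +flight marks in the convergence example.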
Watching the Molecular Clock
Mutation occurs as a random (Poisson) process. If mutations accumulate at a constant rate over time and across all branches, the phylogeny is said to obey a molecular clock.
[Figure: tree of sequences sampled in 2000, 2001, and 2002; axis: % genetic difference.]
BUT:
• Natural selection favors some mutations and eliminates others
• Selection varies over time and across lineages
[Figure: a second tree of sequences sampled in 2000, 2001, and 2002; axis: % genetic difference.]
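Setting selection aside, the strict-clock model can be written down in a few lines. This sketch assumes mutations on a branch follow a Poisson distribution whose mean is rate × time; the rate of 0.5 mutations per year is a made-up value for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, rate, time):
    """Probability of exactly k mutations on a branch, given a constant
    mutation rate (per unit time) and the branch duration."""
    lam = rate * time  # expected number of mutations on the branch
    return exp(-lam) * lam ** k / factorial(k)

def expected_mutations(rate, time, max_k=100):
    """Mean number of mutations, summed from the distribution itself."""
    return sum(k * poisson_pmf(k, rate, time) for k in range(max_k))

rate = 0.5  # hypothetical: mutations per year
for years in (1, 2, 4):
    print(years, round(expected_mutations(rate, years), 6),
          round(poisson_pmf(0, rate, years), 4))
```

Under the clock, the expected number of mutations doubles when the elapsed time doubles, but any single branch can still deviate from that expectation by chance. A tree can therefore look slightly non-clocklike even when the rate is truly constant.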
Trees are hypotheses about evolutionary history
So far, we’ve looked at understanding and formulating these hypotheses. Now, let’s turn our attention to testing them.
Tree Testing: Split Decomposition
Split decomposition is one method for testing a tree.
Under this procedure, we choose exactly four taxa (A, B, C, D) and examine the topologies of all possible unrooted trees. How many such trees are there?
[Figure: the three possible unrooted topologies for four taxa: (A,B | C,D), (A,D | B,C), and (A,C | B,D).]
Only one of these topologies is right. How can we quantitatively assess the support for each tree?
The correct tree should be approximately additive; the others usually will not. For each tree, we calculate split indices that estimate the length of the internal branch:
[Figure: split-index formula, a signed combination of pairwise distances among A, B, C, and D divided by 2, which equals the internal branch length if (A,C | B,D) is the right phylogeny.]
• Large split indices → Long internal branch → Topology strongly supported
• Small split indices → Short internal branch → Topology weakly supported
• Negative split indices → Biologically impossible → Topology probably wrong
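The calculation can be sketched in code. This example assumes the standard four-point form of the index, which may differ in detail from the formula drawn on the slide: for a topology that groups (a, b) against (c, e), the two indices are [d(a,c) + d(b,e) - d(a,b) - d(c,e)] / 2 and [d(a,e) + d(b,c) - d(a,b) - d(c,e)] / 2. The distances are hypothetical and were generated from a tree that groups A with B.

```python
def make_matrix(pairs):
    """Build a symmetric lookup d[x][y] from {(x, y): distance}."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d

def split_indices(d, pair1, pair2):
    """Two estimates of the internal branch length for pair1 | pair2."""
    (a, b), (c, e) = pair1, pair2
    within = d[a][b] + d[c][e]   # distances that do not cross the split
    return ((d[a][c] + d[b][e] - within) / 2,
            (d[a][e] + d[b][c] - within) / 2)

# Hypothetical additive distances from the tree (A,B | C,D) with tip
# branches A=1, B=3, C=2, D=5 and an internal branch of length 2.
d = make_matrix({("A", "B"): 4, ("A", "C"): 5, ("A", "D"): 8,
                 ("B", "C"): 7, ("B", "D"): 10, ("C", "D"): 7})

for pair1, pair2 in ((("A", "B"), ("C", "D")),
                     (("A", "C"), ("B", "D")),
                     (("A", "D"), ("B", "C"))):
    print(pair1, pair2, split_indices(d, pair1, pair2))
```

Only the (A,B | C,D) topology gives two positive indices, and both recover the internal branch length of 2. The other two topologies each produce a negative index, which marks them as probably wrong.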
Tree Testing: Bootstrapping
• Used to assess the support for individual branches
• Randomly resample characters, with replacement
• Repeat many times (1000 or more)
• How often does a specific branch appear?
[Figure: tree of rat, human, turtle, fruit fly, oak, and duckweed with bootstrap support values of 100, 98, and 73.]
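The resampling loop itself is short. The sketch below is a simplification: instead of rebuilding a full tree for every replicate, it asks only whether two chosen sequences are still the closest pair (by number of differing sites), as a stand-in for "does this branch appear?". The sequences, names, and default settings are hypothetical.

```python
import random

def hamming(s, t):
    """Number of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(s, t))

def closest_pair(seqs):
    """Names of the two most similar sequences (ties broken alphabetically)."""
    names = sorted(seqs)
    pairs = [(hamming(seqs[a], seqs[b]), a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs)[1:]

def bootstrap_support(seqs, pair, replicates=1000, seed=0):
    """Fraction of resampled alignments in which `pair` is the closest pair.

    `pair` must be a tuple of two names in alphabetical order.
    """
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns (characters) with replacement.
        columns = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(seq[c] for c in columns)
                     for name, seq in seqs.items()}
        if closest_pair(resampled) == pair:
            hits += 1
    return hits / replicates

# Hypothetical aligned sequences.
seqs = {"rat":    "ACGTACGTAC",
        "human":  "ACGTACGTTC",
        "turtle": "ACGAACCTTG",
        "oak":    "TCGAAGCTTG"}
print(bootstrap_support(seqs, ("human", "rat")))
```

A real analysis would rebuild the tree for each replicate with the chosen method and record how often each branch reappears, which is where the support values on the tree come from.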
Tree Testing: Bootstrapping
MacClade Example: Vertebrate evolution
IV. Specific Applications
HIV Example 1: Florida dentist case
• 1990 case: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• HIV evolves so fast that transmission patterns can be reconstructed from viral sequence (molecular forensics).
• Compared viral sequence from the dentist, three of his HIV+ patients, and two HIV+ local controls.
Florida dentist case
So what do the results mean?
• 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses?
• Do we have enough data to be confident in our conclusions? What additional data would help?
• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?