The Evolutionary Basis of Bioinformatics: An Introduction to Phylogenetics
http://bioquest.org/bedrock
> Sequence 1
GAGGTAGTAATTAGATCCGAAA…
> Sequence 2
GAGGTAGTAATTAGATCTGAAA…
> Sequence 3
GAGGTAGTAATTAGATCTGTCA…
What is phylogenetics?
Phylogenetics is the study of evolutionary relationships among and within species.
crocodiles
birds
lizards
snakes
rodents
primates
marsupials
This is an example of a phylogenetic tree.
Applications of phylogenetics
• Forensics: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• Conservation: How much gene flow is there among local populations of island foxes off the coast of California?
• Medicine: What are the evolutionary relationships among the various prion-related diseases?
To be continued…
Phylogenetic concepts: Interpreting a Phylogeny
Sequence A
Sequence B
Sequence C
Sequence D
Sequence E
Time
Which sequence is most closely related to B?
A, because B diverged from A more recently than from any other sequence.
Physical position in tree is not meaningful! Only tree structure matters.
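Phylogenetic concepts: Rooted and Unrooted Trees
[Figure: a rooted tree of A, B, C, and D drawn along a time axis, set equal to an unrooted tree of the same tips with its root marked X; a second unrooted tree of A, B, C, and D with the root X placed elsewhere is followed by "=?" and a rooted tree whose tips are all "?".]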
Rooting and Tree Interpretation
[Figure: trees of bacteria, archaebacteria (archaea), oak, fruit fly, chicken, and human, drawn unrooted and with different roots; character changes are marked as either losses (– bones, – cell nuclei) or gains (+ cell nuclei, + bones).]
Rooting Methods
Outgroup root: Add 2+ taxa whose branches contain the tree’s new root.
[Figure: a tree of trout, eagle, bat, and mouse, shown again with shark and ray added.]
Must already know the position of the new tree’s root (often go from higher to lower taxonomic unit, e.g. family → genus).
How Many Trees?
(assuming bifurcation only)

# sequences | # pairwise distances | Unrooted: # trees | Unrooted: # branches/tree | Rooted: # trees | Rooted: # branches/tree
3 | 3 | 1 | 3 | 3 | 4
4 | 6 | 3 | 5 | 15 | 6
5 | 10 | 15 | 7 | 105 | 8
6 | 15 | 105 | 9 | 945 | 10
10 | 45 | 2,027,025 | 17 | 34,459,425 | 18
30 | 435 | $8.69 \times 10^{36}$ | 57 | $4.95 \times 10^{38}$ | 58
N | $\frac{N(N-1)}{2}$ | $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$ | $2N-3$ | $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$ | $2N-2$
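The counts in this table can be reproduced directly from the general formulas in its last row. The following is a minimal Python sketch, assuming strictly bifurcating trees as the slide does; the function names are illustrative and do not come from the slides.

```python
from math import factorial


def pairwise_distances(n: int) -> int:
    """Number of pairwise distances among n sequences: N(N - 1) / 2."""
    return n * (n - 1) // 2


def unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees: (2N - 5)! / (2^(N-3) (N - 3)!)."""
    if n < 3:
        raise ValueError("need at least 3 sequences for an unrooted tree")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees: (2N - 3)! / (2^(N-2) (N - 2)!)."""
    if n < 2:
        raise ValueError("need at least 2 sequences for a rooted tree")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 6, 10, 30):
    print(
        n,
        pairwise_distances(n),
        unrooted_trees(n),   # each unrooted tree has 2N - 3 branches
        2 * n - 3,
        rooted_trees(n),     # each rooted tree has 2N - 2 branches
        2 * n - 2,
    )
```

Each unrooted tree can be rooted on any of its 2N - 3 branches, which is why the rooted count equals the unrooted count multiplied by 2N - 3.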
Tree Types
[Figure: rooted tree of sharks, seahorses, frogs, owls, crocodiles, armadillos, and bats; scale bar: 50 million years.]
Evolutionary trees measure time.
[Figure: rooted tree of the same taxa; scale bar: 5% change.]
Phylograms measure change.
Tree Properties
Ultrametricity: All tips are an equal distance from the root.
[Figure: rooted tree with tips X and Y and branch lengths a, b, c, d, e.]
a = b + c + d + e
Additivity: Distance between any two tips equals the total branch length between them.
[Figure: rooted tree with tips X and Y and branch lengths a, b, c, d, e.]
XY = a + b + c + d + e
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.
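Both properties can be checked mechanically once a tree is written down with branch lengths. The sketch below assumes a small hypothetical tree (tips X, Y, Z and one internal node n1, none of which come from the slides) stored as a child-to-parent map; it is one possible representation, not the one used in the original exercise.

```python
# Hypothetical tree: each node maps to (parent, branch length to that parent).
# The root is the only node without an entry.
PARENT = {
    "n1": ("root", 2.0),
    "X": ("n1", 1.0),
    "Y": ("n1", 1.0),
    "Z": ("root", 3.0),
}


def path_to_root(node, parent):
    """Return [(ancestor, distance from the starting node)], ending at the root."""
    path = [(node, 0.0)]
    dist = 0.0
    while node in parent:
        node, length = parent[node]
        dist += length
        path.append((node, dist))
    return path


def root_distance(tip, parent):
    """Total branch length from a tip up to the root."""
    return path_to_root(tip, parent)[-1][1]


def tip_distance(a, b, parent):
    """Additive distance: total branch length on the path between two tips."""
    above_a = dict(path_to_root(a, parent))
    for node, dist_b in path_to_root(b, parent):
        if node in above_a:  # first shared ancestor on the way up from b
            return above_a[node] + dist_b
    raise ValueError("tips are not in the same tree")


def is_ultrametric(tips, parent, tol=1e-9):
    """True if every tip is the same distance from the root."""
    depths = [root_distance(t, parent) for t in tips]
    return max(depths) - min(depths) <= tol


print(is_ultrametric(["X", "Y", "Z"], PARENT))  # True
print(tip_distance("X", "Y", PARENT))           # 2.0
print(tip_distance("X", "Z", PARENT))           # 6.0
```

Lengthening the branch to Z (for example to 5.0) leaves the tree additive, since tip distances are still sums of branch lengths, but it is no longer ultrametric.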
Tree Building Exercise
Ultrametricity: All tips are an equal distance from the root.
[Figure: rooted tree with tips X and Y and branch lengths a, b, c, d, e.]
a = b + c + d + e
Using the distance matrix given, construct an ultrametric tree.
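The distance matrix for this exercise is not included in the transcript. As an illustration only, the sketch below builds an ultrametric tree from a made-up matrix using UPGMA (average-linkage clustering), a standard method for this task that the slides do not name. The tip names and distances are hypothetical.

```python
from itertools import combinations


def upgma(names, distances):
    """Average-linkage clustering (UPGMA).

    distances maps frozenset({a, b}) to the distance between tips a and b.
    Returns (tree, heights): tree is a nested tuple of tip names, and heights
    maps each internal node to its height above the tips.
    """
    d = dict(distances)
    sizes = {name: 1 for name in names}  # number of tips in each cluster
    heights = {}
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        heights[merged] = d[frozenset((a, b))] / 2
        # Distance from the new cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            if other != a and other != b:
                d[frozenset((merged, other))] = (
                    sizes[a] * d[frozenset((a, other))]
                    + sizes[b] * d[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    (tree,) = sizes
    return tree, heights


# Hypothetical distance matrix (not the one from the original exercise).
TIPS = ["A", "B", "C", "D"]
MATRIX = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

tree, heights = upgma(TIPS, MATRIX)
print(tree)           # ('D', ('C', ('A', 'B')))
print(heights[tree])  # 5.0
```

Each node's height is half the distance between the clusters it joins, so every tip below a node sits at the same distance from it, which is exactly the ultrametric property.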
Phylogenetic Methods
Many different procedures exist. Three of the most popular:
• Neighbor-joining: Minimizes distance between nearest neighbors
• Maximum parsimony: Minimizes total evolutionary change
• Maximum likelihood: Maximizes likelihood of observed data
Comparison of Methods
Neighbor-joining | Maximum parsimony | Maximum likelihood
Very fast | Slow | Very slow
Easily trapped in local optima | Assumptions fail when evolution is rapid | Highly dependent on assumed evolution model
Good for generating tentative tree, or choosing among multiple trees | Best option when tractable (<30 taxa, strong conservation) | Good for very small data sets and for testing trees built using other methods
Phylogenetic concepts: Homology and Homoplasy
[Table: Hair? and Wings? for Bat, Chimp, and Hawk.]
[Figure: tree of bat, chimp, and hawk with branch labels "no hair, no wings", "+ hair", "+ wings", and "+ wings".]
Homology: identity due to shared ancestry (evolutionary signal)
Homoplasy: identity despite separate ancestry (evolutionary noise)
Trees are hypotheses about evolutionary history
So far, we’ve looked at understanding and formulating these hypotheses. Now, let’s turn our attention to testing them.
Tree Testing
Let’s study the following four sequences:
P. A C A T A C G
Q. G T A T A C G
R. G C A C A T G
S. G C A C A C A
How can we explain the indicated character?
1. Homology: Changed just once.
2. Homoplasy: Changed twice or more.
[Figure: unrooted tree with P and Q on one side and R and S on the other.]
Homology more likely, but homoplasy still feasible.
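One way to make "changed once" versus "changed twice or more" concrete is to count the minimum number of changes each site needs on a given tree. The sketch below uses Fitch's small-parsimony algorithm, which the slides do not name, on sequences P to S with spaces removed. The transcript does not say which character the slide marked, so the code reports every site; the second tree is a hypothetical alternative grouping added for comparison.

```python
def fitch_site(tree, states):
    """Return (candidate ancestral states, minimum changes) for one site.

    tree is a nested 2-tuple of tip names; states maps tip name -> character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: at least one extra change is required here.
    return left_set | right_set, left_cost + right_cost + 1


def changes_per_site(tree, seqs):
    """Minimum number of changes at each alignment column on the given tree."""
    length = len(next(iter(seqs.values())))
    return [
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(length)
    ]


SEQS = {"P": "ACATACG", "Q": "GTATACG", "R": "GCACATG", "S": "GCACACA"}

grouped_pq = (("P", "Q"), ("R", "S"))  # the grouping drawn on the slide
grouped_pr = (("P", "R"), ("Q", "S"))  # hypothetical alternative grouping

print(changes_per_site(grouped_pq, SEQS))  # [1, 1, 0, 1, 0, 1, 1]
print(changes_per_site(grouped_pr, SEQS))  # [1, 1, 0, 2, 0, 1, 1]
```

On the tree that groups P with Q, the fourth site (T, T, C, C) needs only one change, which fits the homology explanation. Grouping P with R instead forces two changes at that site, so the shared T would have to be homoplasy.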
Tree Testing
Now let’s look at four other sequences:
W. A C A T G T C A G A C G
X. G T A T G T C A G A C G
Y. G C A C A C T G A A T G
Z. G C A C A C T G A A C A
[Figure: unrooted tree with tips labeled P, Q, R, and S.]
Same two explanations possible. Any changes to their relative likelihood?
Homology much more likely; homoplasy implausible.
Tree Testing
Basic principle:
Long branches: Strong evolutionary signal
[Figure: tree of A, B, C, and D.]
Short branches: Weak evolutionary signal
[Figure: tree of A, B, C, and D.]
Zero-length branches: NO evolutionary signal
[Figure: tree of A, B, C, and D.]
Tree-testing methods: Bootstrapping, Jackknifing, Split decomposition, …
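Of the methods listed, bootstrapping is the easiest to sketch: resample the alignment columns with replacement, rebuild the tree from each replicate, and report how often a grouping reappears. The example below is a rough illustration using sequences W to Z from the earlier slide, with spaces removed. For brevity it "rebuilds the tree" with a simple four-taxon parsimony rule (pick the split supported by the most informative sites); that rule, the function names, and the replicate count are assumptions, not the procedure from the slides.

```python
import random

# Sequences W-Z from the slide, with spaces removed.
SEQS = {
    "W": "ACATGTCAGACG",
    "X": "GTATGTCAGACG",
    "Y": "GCACACTGAATG",
    "Z": "GCACACTGAACA",
}
NAMES = ("W", "X", "Y", "Z")


def best_split(seqs, names):
    """Return the quartet split backed by the most sites, or None on a tie."""
    a, b, c, d = names
    counts = {f"{a}{b}|{c}{d}": 0, f"{a}{c}|{b}{d}": 0, f"{a}{d}|{b}{c}": 0}
    for sa, sb, sc, sd in zip(*(seqs[n] for n in names)):
        if sa == sb and sc == sd and sa != sc:
            counts[f"{a}{b}|{c}{d}"] += 1
        elif sa == sc and sb == sd and sa != sb:
            counts[f"{a}{c}|{b}{d}"] += 1
        elif sa == sd and sb == sc and sa != sb:
            counts[f"{a}{d}|{b}{c}"] += 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return None if ranked[0][1] == ranked[1][1] else ranked[0][0]


def bootstrap_alignment(seqs, rng):
    """Resample alignment columns with replacement, keeping the length."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in seqs.items()}


def bootstrap_support(seqs, names, split, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which `split` is the best split."""
    rng = random.Random(seed)
    hits = sum(
        best_split(bootstrap_alignment(seqs, rng), names) == split
        for _ in range(replicates)
    )
    return hits / replicates


print(best_split(SEQS, NAMES))  # WX|YZ
print(bootstrap_support(SEQS, NAMES, "WX|YZ"))
```

Six of the twelve sites support grouping W with X and none support either alternative, so nearly every replicate recovers the same split. That is the "long branch, strong signal" case described in the basic principle.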
Applications of phylogenetics
1. Forensics
Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
So what do the results mean?
• 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses?
• Do we have enough data to be confident in our conclusions? What additional data would help?
• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?
Applications of phylogenetics
2. Conservation
How much gene flow is there among local populations of island foxes off the coast of California?
Wayne, R. K., and Morin, P. A. 2004. Conservation genetics in the new molecular age. Frontiers in Ecology and the Environment 2: 89-97. (ESA publication)
Applications of phylogenetics
3. Medicine
What are the evolutionary relationships among the various prion-related diseases?