Phylogenetic Analysis
Phylogenetic Analysis Overview
• Insight into evolutionary relationships
• Inferred or estimated evolutionary relationships are shown as branches of a tree
• Branch length and nesting reflect the degree of similarity between any two items (in our case, sequences)
Phylogenetics and Cladistics
• Clade = a set of descendants from a single ancestor (from the Greek word for branch)
• Three basic assumptions
– Any group of organisms is related by descent from a common ancestor
– There is a bifurcating pattern of cladogenesis
– Change in characteristics occurs in lineages over time
More default assumptions
1. Correct sequences and origins
2. Shared ancestral origin
3. Homologous sequences
4. No mixtures of nuclear and organellar sequences
5. Large enough taxa sampling size
6. Contains representative sequence variations
7. Sufficient sequence variations
Basic Terminology
• Clades: a group of organisms or genes that includes the most recent common ancestor of all of its members and all of the descendants of that most recent common ancestor.
• Taxa: any named groups of organisms; a taxon is not necessarily a clade.
• Branches: branch lengths sometimes correspond to the degree of divergence.
• Nodes: a bifurcating branch point.
[Figure: two example trees, one in which branch lengths are not significant and one in which branch lengths are significant]
Basic Definition
• Homologous: sequences inferred to share a common ancestor; in practice, sequences that share an arbitrary threshold level of similarity determined by alignment of matching bases
• Similarity: a quantifiable term that refers to a degree of relatedness between sequences, but does not necessarily reflect ancestry.
• Orthologs: homologs produced by speciation; derived from a common ancestor; tend to have similar function
• Paralogs: homologs produced by gene duplication; derived within an organism, tend to have differing functions
• Xenologs: homologs resulting from horizontal gene transfer between two organisms; difficult to verify; variable function but tends to be similar.
Phylogenetic Analysis Overview
• Objective:
– Determine branch lengths and figure out how the tree should be drawn
– Sequences that are most closely related are drawn as neighboring branches
Phylogenetic Analysis Overview
• Dependent upon good multiple sequence alignment programs
• Group sequences with similar patterns of substitutions in order to reconstruct a phylogenetic tree
Phylogenetic Analysis Overview
• Consider two sequences that are related
– The ancestral sequence can be (partially) derived
– With additional sequences, more information can be gathered to support a correct derivation
Phylogenetic Analysis Overview
• Example: C-Terminal Motor Kinesin sequences
– http://www.proweb.org/kinesin/BE4_Cterm.html
Practical use of phylogenetic analysis
• To prioritize the analysis of genes in the target family – give insight into protein functions
• P. aeruginosa, a bacterium that is one of the top 3 causes of opportunistic infections, is noted for its antimicrobial resistance and resistance to detergents.
• 3 homologous outer membrane proteins, OprJ, OprM and OprN, were identified as playing a role in this antimicrobial resistance.
Figure 14.2 Example of a phylogenetic tree based on genes that does not match organismal phylogeny, suggesting horizontal gene transfer has occurred.
Possible horizontal gene transfer
Uses of Phylogenetic Analysis
• Given a set of genes, determine which genes are likely to have equivalent functions
• Follow changes occurring in a rapidly changing species such as a virus
– Example: influenza
– Study of rapidly changing genes
– Next year’s strain can be predicted
– Flu vaccination can be developed
Tree of Life
• Phylogenies describe how the evolution of species has occurred
• Image: http://microbialgenome.org/primer/tree.html
Tree of Life
• Traditionally, morphological characters (visible features) have been used to classify organisms
– Living organisms
– Fossil records
• Sequence data are beginning to take a larger role
Tree of Life
• Many different resources including:
– NCBI taxonomy web sites
– University of Arizona’s tree of life project
NCBI Taxonomy Web Site
• http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/
Taxonomy: a system of classification; the science of classification
Evolutionary Trees
• Two-dimensional graph showing evolutionary relationships among a set of items
• Items can be organisms, genes, or sequences
• Each unit is defined by a distinct branch on the tree
Evolutionary Trees
• leaves represent the units (taxa) being studied
• nodes and branches represent the relationships among the taxa
• Two taxa derived from the same common ancestor will share a node in the graph
Evolutionary Trees
• length of each branch may be drawn according to the number of sequence level changes that occurred
• distance may not be in direct relation to evolutionary time
• analyses that assume a uniform rate of mutation rely on the molecular clock hypothesis
Rooted Trees
• One sequence (root) defined to be common ancestor of all of the other sequences
• A unique path leads from the root node to any other node
• Direction of path indicates evolutionary time
• Root chosen as a sequence thought to have branched off earliest
Rooted Trees
• If molecular clock hypothesis holds, it is possible to predict a root
• As the number of sequences increases, the number of possible rooted trees increases very rapidly
• In most cases, a bifurcating binary tree is the best model to simulate evolutionary events
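To make the growth in the number of trees concrete, here is a small Python sketch. It assumes the standard counting formulas for fully bifurcating trees on n labeled sequences, which the slides do not state: (2n - 3)!! rooted trees and (2n - 5)!! unrooted trees. The function names are illustrative.

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees on n labeled sequences (n >= 2)."""
    return double_factorial(2 * n - 3)


def count_unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees on n labeled sequences (n >= 3)."""
    return double_factorial(2 * n - 5)


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, count_rooted_trees(n), count_unrooted_trees(n))
```

Under these formulas, four sequences give 15 rooted trees and 3 unrooted trees, and ten sequences already give more than 34 million rooted trees.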
Example Rooted Tree
Image source: http://www.ncbi.nlm.nih.gov/About/primer/phylo.html (Systematics and Molecular Phylogenetics)
Unrooted Tree (Star)
• Indicates evolutionary relationships without revealing the location of the oldest ancestor
• There are fewer possible unrooted trees than rooted trees for the same set of sequences
Example Unrooted Tree
Image source: http://www.shef.ac.uk/english/language/quantling/images/quantling1.jpg
• Image: http://www.ncbi.nlm.nih.gov/About/primer/phylo.html
Methods for Determining Trees
• Three main methods:
– Maximum parsimony
– Distance
– Maximum likelihood
Maximum Parsimony
• Predicts evolutionary tree minimizing number of steps required to generate observed variation
• Multiple sequence alignment must first be obtained
Maximum Parsimony
• For each position, phylogenetic trees requiring the smallest number of evolutionary changes to produce the observed sequence changes are identified
• Trees that produce the smallest number of changes for all sequence positions are identified
Maximum Parsimony
• Time consuming algorithm
• Only works well if the sequences have a strong sequence similarity
Maximum Parsimony Example
1 A A G A G T G C A
2 A G C C G T G C G
3 A G A T A T C C A
4 A G A G A T C C G
• four sequences, three possible unrooted trees
Maximum Parsimony Example
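Possible Trees:
[Figure: the three possible unrooted trees for four sequences, grouping them as (1,2 | 3,4), (1,3 | 2,4) and (1,4 | 2,3)]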
Maximum Parsimony Example
• Some sites are informative, and other sites are not
• An informative site has at least two different sequence characters, each present in at least two different sequences
• Only the informative sites need to be considered
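The selection of informative sites can be sketched in Python. This is a minimal illustration rather than a reference implementation: it uses the usual parsimony criterion (at least two different characters, each occurring in at least two sequences), and the function name `informative_columns` is made up for this example. The input is the four-sequence alignment from the example slides.

```python
from collections import Counter


def informative_columns(sequences):
    """Return 1-based indices of parsimony-informative alignment columns.

    A column counts as informative when at least two different characters
    each occur in at least two sequences.
    """
    informative = []
    for index, column in enumerate(zip(*sequences), start=1):
        counts = Counter(column)
        shared = [base for base, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            informative.append(index)
    return informative


alignment = [
    "AAGAGTGCA",  # sequence 1
    "AGCCGTGCG",  # sequence 2
    "AGATATCCA",  # sequence 3
    "AGAGATCCG",  # sequence 4
]
print(informative_columns(alignment))
```

For these four sequences the function reports columns 5, 7 and 9.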
Maximum Parsimony Example
1 A A G A G T G C A
2 A G C C G T G C G
3 A G A T A T C C A
4 A G A G A T C C G
Three informative columns (columns 5, 7 and 9)
Maximum Parsimony Example
1 G G A
2 G G G
3 A C A
4 A C G
[Figure: the three possible unrooted trees, drawn once each for Column 1, Column 2 and Column 3, with a marker on each branch where a substitution is required]
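The column-by-column comparison of the three trees can be sketched in Python. The counting procedure used here is Fitch's algorithm, a standard way to get the minimum number of changes for a fixed tree; the slides do not name a specific procedure, so that choice and the function names are assumptions made for illustration.

```python
def fitch_changes(tree, states):
    """Return (state_set, changes) for one alignment column.

    tree   -- a leaf name, or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its character in the column
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_changes = fitch_changes(tree[0], states)
    right_set, right_changes = fitch_changes(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # No shared state: one substitution is needed at this node.
    return left_set | right_set, left_changes + right_changes + 1


def tree_score(tree, columns):
    """Total number of changes the tree requires over all columns."""
    return sum(fitch_changes(tree, col)[1] for col in columns)


# The three informative columns, keyed by sequence number.
columns = [
    {1: "G", 2: "G", 3: "A", 4: "A"},
    {1: "G", 2: "G", 3: "C", 4: "C"},
    {1: "A", 2: "G", 3: "A", 4: "G"},
]

# Each unrooted four-taxon tree, rooted arbitrarily on its internal branch
# (the rooting does not affect the parsimony count).
trees = {
    "(1,2 | 3,4)": ((1, 2), (3, 4)),
    "(1,3 | 2,4)": ((1, 3), (2, 4)),
    "(1,4 | 2,3)": ((1, 4), (2, 3)),
}

scores = {name: tree_score(tree, columns) for name, tree in trees.items()}
print(scores)
```

With this counting, the grouping (1,2 | 3,4) needs 4 changes, against 5 and 6 for the other two trees, so it is the most parsimonious of the three.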
Distance Method
• Looks at the number of changes between each pair in a group of sequences
• Goal is to identify a tree that positions neighbors correctly and that also has branch lengths which reproduce the original data as closely as possible
Distance Method
• CLUSTALW uses the neighbor-joining method to build the guide tree for multiple sequence alignments
• The PHYLIP suite of programs employs neighbor-joining methods
– http://evolution.genetics.washington.edu/phylip.html
Distance Programs in Phylip
• NEIGHBOR: estimates phylogenies using either:
– neighbor-joining (no molecular clock assumed)
– unweighted pair group method with arithmetic mean (UPGMA) (molecular clock assumed)
Distance Analysis
• Distance score counted as
– number of mismatched positions in the alignment
– number of sequence positions that must be changed to generate the second sequence
• Success depends on the degree to which the distances among a set of sequences can be made additive on a predicted evolutionary tree
Example of Distance Analysis
• Consider the alignment:
A ACGCGTTGGGCGATGGCAAC
B ACGCGTTGGGCGACGGTAAT
C ACGCATTGAATGATGATAAT
D ACACATTGAGTGATAATAAT
Example of Distance Analysis
• Distances can be shown as a table
A ACGCGTTGGGCGATGGCAAC
B ACGCGTTGGGCGACGGTAAT
C ACGCATTGAATGATGATAAT
D ACACATTGAGTGATAATAAT
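The distance table can be computed directly from the alignment by counting mismatched positions for each pair. The Python sketch below does this; the function and variable names are illustrative.

```python
from itertools import combinations

alignment = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def mismatches(seq1, seq2):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for x, y in zip(seq1, seq2) if x != y)


# One entry per unordered pair of sequence names.
distances = {
    (p, q): mismatches(alignment[p], alignment[q])
    for p, q in combinations(alignment, 2)
}

for (p, q), d in distances.items():
    print(f"{p}-{q}: {d}")
```

This gives A-B = 3, A-C = 7, A-D = 8, B-C = 6, B-D = 7 and C-D = 3.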
Example of Distance Analysis
• Using this information, a tree can be drawn:
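A ACGCGTTGGGCGATGGCAAC
B ACGCGTTGGGCGACGGTAAT
C ACGCATTGAATGATGATAAT
D ACACATTGAGTGATAATAAT
[Tree figure: A and B join at one internal node with branch lengths 2 and 1; C and D join at the other with branch lengths 1 and 2; the internal branch between the two nodes has length 4]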
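Whether a tree reproduces the distance data can be checked by summing branch lengths along the path between each pair of leaves. The Python sketch below assumes the tree has A and B on one side (branch lengths 2 and 1), C and D on the other (1 and 2), and an internal branch of length 4; this is the assignment of the five branch lengths that matches the mismatch counts from the alignment. The names used are illustrative.

```python
from itertools import combinations

# Mismatch counts between the aligned sequences A-D.
observed = {
    ("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8,
    ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3,
}

# Assumed tree: A and B hang off one internal node, C and D off the other.
leaf_branch = {"A": 2, "B": 1, "C": 1, "D": 2}
side = {"A": "left", "B": "left", "C": "right", "D": "right"}
internal_branch = 4


def tree_distance(p, q):
    """Path length between two leaves on the assumed tree."""
    length = leaf_branch[p] + leaf_branch[q]
    if side[p] != side[q]:
        length += internal_branch  # path crosses the internal branch
    return length


predicted = {pair: tree_distance(*pair) for pair in combinations("ABCD", 2)}
print(predicted == observed)
```

Here every predicted distance equals the observed one, so the distances are perfectly additive on this tree; with real data the fit is usually only approximate.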
Fitch and Margoliash Algorithm (3 sequences)
• Distance table used
• Sequences combined in threes
– define the branches of the predicted tree
– calculate the branch lengths of the tree
Fitch and Margoliash Algorithm (3 sequences)
• 1) Draw unrooted tree with three branches originating from common node:
[Tree figure: leaves A, B and C joined at a common node by branches a, b and c respectively]
Fitch and Margoliash Algorithm (3 sequences)
• 2) Calculate lengths of tree branches algebraically:
– distance from A to B = a + b = 22 (1)
– distance from A to C = a + c = 39 (2)
– distance from B to C = b + c = 41 (3)
– subtracting (2) from (3) yields: (b + c) - (a + c) = 41 - 39, so b - a = 2 (4)
– adding (1) and (4) yields 2b = 24; b = 12
– so a + 12 = 22; a = 10
– 10 + c = 39; c = 29
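The same three equations can be solved in closed form. The Python sketch below is one way to write it; the function name `three_leaf_branches` is illustrative.

```python
def three_leaf_branches(d_ab, d_ac, d_bc):
    """Solve a + b = d_ab, a + c = d_ac, b + c = d_bc for (a, b, c)."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


a, b, c = three_leaf_branches(22, 39, 41)
print(a, b, c)
```

For the distances 22, 39 and 41 this returns a = 10, b = 12 and c = 29.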
Fitch and Margoliash Algorithm (3 sequences)
• 3) Resulting tree:
[Tree figure: branches of length 10 to A, 12 to B and 29 to C from the common node]
Fitch and Margoliash Algorithm (5 sequences)
• Algorithm can be extended to more sequences. Consider the distances:
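[Figure: distance table for five sequences A to E (not recoverable from this transcript) and a tree with leaves A, B, C, D, E and branches a, b, c, d, e, f, g]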
Summary of Fitch-Margoliash
1) Find the most closely related pair of sequences (A, B).
2) Treat the rest of the sequences as a composite. Calculate the average distance from A to all others; and from B to all others.
3) Use these values to calculate the length of the edges a and b.
Summary of Fitch-Margoliash
4) Treat A and B as a composite. Calculate the average distances between AB and each of the other sequences. Create a new distance table.
5) Identify next pair of related sequences and begin as with step 1.
6) Subtract extended branch lengths to calculate lengths of intermediate branches.
Summary of Fitch-Margoliash
7) Repeat the entire process with all possible pairs of sequences.
8) Calculate predicted distances between each pair of sequences for each tree to find the best tree.
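One round of steps 1 to 4 can be sketched in Python as follows. This is a simplified illustration, not a full Fitch-Margoliash implementation: it does not cover the branch subtraction of step 6 or the search over alternative trees in steps 7 and 8. The distance values are illustrative only, and the helper names are hypothetical.

```python
from itertools import combinations


def dist(table, x, y):
    """Look up a symmetric distance stored under either key order."""
    return table[(x, y)] if (x, y) in table else table[(y, x)]


def fitch_margoliash_step(table, names):
    """One round: join the closest pair and build the reduced table."""
    # Step 1: the most closely related pair.
    p, q = min(combinations(names, 2), key=lambda pair: dist(table, *pair))
    others = [n for n in names if n not in (p, q)]

    # Step 2: average distance from each member of the pair to the rest.
    avg_p = sum(dist(table, p, o) for o in others) / len(others)
    avg_q = sum(dist(table, q, o) for o in others) / len(others)

    # Step 3: three-point formula with the rest treated as one composite.
    d_pq = dist(table, p, q)
    branch_p = (d_pq + avg_p - avg_q) / 2
    branch_q = (d_pq + avg_q - avg_p) / 2

    # Step 4: replace the pair by a composite and average its distances.
    merged = p + q
    new_table = {
        (x, y): dist(table, x, y) for x, y in combinations(others, 2)
    }
    for o in others:
        new_table[(o, merged)] = (dist(table, o, p) + dist(table, o, q)) / 2
    return (p, branch_p), (q, branch_q), new_table, others + [merged]


# Illustrative distances only; the table from the original slide is not
# available in this transcript.
table = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}
first, second, reduced, names = fitch_margoliash_step(table, list("ABCDE"))
print(first, second)
print(reduced)
```

With these illustrative values the closest pair is D and E, their branch lengths come out at about 4 and 6, and the reduced table has the composite DE at distances 40, 42 and 19 from A, B and C. Calling the function again on `reduced` and `names` would perform the next round.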
Neighbor Joining
• Similar to Fitch-Margoliash
• Sequences chosen to give best least-squares estimate of branch length
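The slides do not spell out how neighbor joining picks the pair to join. A common formulation, assumed here, is the Q criterion: for n sequences, Q(i, j) = (n - 2) d(i, j) - sum of d(i, k) - sum of d(j, k), and the pair with the smallest Q is joined first. The Python sketch below computes Q for the mismatch counts of the four-sequence distance example; it shows only the pair selection, not the full algorithm, and the names are illustrative.

```python
from itertools import combinations

names = ["A", "B", "C", "D"]
# Mismatch counts between four short aligned sequences.
d = {
    ("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8,
    ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3,
}


def dist(x, y):
    """Symmetric lookup; the distance of a sequence to itself is 0."""
    if x == y:
        return 0
    return d[(x, y)] if (x, y) in d else d[(y, x)]


def q_value(x, y):
    """Neighbor-joining selection criterion for the pair (x, y)."""
    n = len(names)
    row_x = sum(dist(x, k) for k in names)
    row_y = sum(dist(y, k) for k in names)
    return (n - 2) * dist(x, y) - row_x - row_y


q = {pair: q_value(*pair) for pair in combinations(names, 2)}
best_pair = min(q, key=q.get)
print(q)
print(best_pair)
```

For this data the pairs (A, B) and (C, D) tie for the smallest Q, which agrees with a tree that groups A with B and C with D.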
Maximum Likelihood
• Calculates likelihood of a tree given an alignment
• Trees with least number of changes will be most likely
Maximum Likelihood (ML)
• Probability of each tree is the product of the probabilities of the changes along each branch
• Likelihoods given by each column multiplied to give the likelihood of the tree
Maximum Likelihood (ML)
• Disadvantages:
– Computationally intensive
– Can only be done for a handful of sequences
Which Method to Choose?
• Depends upon the sequences that are being compared
– Strong sequence similarity: maximum parsimony
– Clearly recognizable sequence similarity: distance methods
– All others: maximum likelihood
Distance, Parsimony and ML
• Distance matrix: simply count the number of differences between two sequences.
• Maximum Parsimony: search for a tree that requires the smallest number of changes to explain the differences observed among the taxa.
• ML: evaluates the probability that the chosen evolutionary model has generated the observed data. A simple model is that changes between all nucleotides (or amino acids) are equally probable. The probabilities of all possible reconstructions are summed to yield the likelihood for one particular site. The likelihood for the tree is the product of the likelihoods for all alignment positions in the dataset.
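The product over alignment positions can be illustrated with the smallest possible case, a tree of just two sequences. The Python sketch below assumes the Jukes-Cantor model (all changes equally probable, equal base frequencies), which is one concrete instance of the simple model described above; the slides do not name a model. With only two sequences there are no internal reconstructions to sum over, so that part of the calculation is not shown. Function names are illustrative.

```python
import math


def jc69_log_likelihood(t, n_same, n_diff):
    """Log-likelihood of two aligned sequences separated by distance t.

    t is the expected number of substitutions per site under the
    Jukes-Cantor model, in which all changes are equally probable.
    """
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # site unchanged after distance t
    p_diff = 0.25 - 0.25 * decay   # site changed to one specific other base
    # Each site contributes (base frequency 1/4) * (transition probability);
    # the product over sites becomes a sum of logarithms.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def jc69_distance(n_same, n_diff):
    """Closed-form maximum-likelihood distance under Jukes-Cantor."""
    p = n_diff / (n_same + n_diff)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two aligned 20-base sequences that differ at 3 positions.
n_same, n_diff = 17, 3
t_hat = jc69_distance(n_same, n_diff)

# A coarse grid search should peak at (or next to) the closed-form value.
grid = [i / 1000 for i in range(1, 1001)]
t_grid = max(grid, key=lambda t: jc69_log_likelihood(t, n_same, n_diff))
print(round(t_hat, 4), t_grid)
```

The grid search and the closed-form estimate should agree to within the grid spacing, which shows that the likelihood, written as a product over sites, is maximized at the corrected distance rather than at the raw mismatch fraction.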
Which Method to Choose?
• Best to choose at least two approaches
• Compare the results – if they are similar, you can have more confidence
Difficulties With Phylogenetic Analysis
• Horizontal or lateral transfer of genetic material (for instance through viruses) makes it difficult to determine phylogenetic origin of some evolutionary events.
• Genes under selective pressure can evolve rapidly, masking earlier changes that had occurred phylogenetically.
Difficulties With Phylogenetic Analysis
• Two sites within comparative sequences may be evolving at different rates.
• Re-arrangements of genetic material can lead to false conclusions.
• Duplicated genes can evolve along separate pathways, leading to different functions
There are some 264 phylogeny packages and 30 free servers available
Exercise
• Multiple Sequence Alignment
– Sequence Alignment: CLUSTALW
– Sample sequences: found on E-learning system
Explanation of the parameters
Exercise