Methods of molecular phylogenybio.biomedicine.gu.se/courses/ht09/genomics/molphy.pdf · Methods of...
Methods of molecular phylogeny
Peter Norberg([email protected])
Content
• Introduction to Evolution and taxonomy
• Phylogenetic analysis
• Algorithmics
• Applied phylogenetics
• Computer Software
• Practical session
Evolution
• Charles Darwin
• "Tree of life"
• Phylogenetic tree
• Root = Ancestor to all species

Rooted or unrooted trees?
• Trees show evolutionary relationships
• The root shows direction
Different representations
[Figure: the same tree of taxa A, B, C and D drawn in several different layouts]
Trees can be based on:
• Outer appearances (e.g. wings, shape of bills)
• Functionality
• Complexity
• A combination of…
• …
• …
• DNA, RNA, AA, gene order…
Phylogenetic trees based on DNA
[Figure: tree of the DNA sequences AATTGGCC, AATAGGCC, AATTGGCG, AATAGGAC, AATAGGCA, AGTTGGCG and TATTGGCG; the bottom row of the tree holds AATAGGAC, AATAGGCA, AGTTGGCG and TATTGGCG]
Genomic region
• Same genomic region for all taxa!
• Not too similar
• Not too diverged
• Insertions/deletions
Sequence alignment
(1) AATGGCAACCGCATTCAGGATTTAA
(4) AATGGTAACCGCATTCAGGAATTA
(2) AATGGTAACCGCAAGGATTTAA
(3) ATGGTAACCGCATTGAGGATTTAA
(5) TGGTAACCGCATTCAGGAATTAA

Correct:
(1) AATGGCAACCGCATTCAGGATTTAA
(2) AATGGTAACCGCAA---GGATTTAA
(3) -ATGGTAACCGCATTGAGGATTTAA
(4) AATGGTAACCGCATTCAGGAATTA-
(5) --TGGTAACCGCATTCAGGATTTAA

Wrong:
(1) AATGGCAACCGCATTCAGGATTTAA
(2) AATGGTAACCGCAAGGATTTAA
(3) ATGGTAACCGCATTGAGGATTTAA
(4) AATGGTAACCGCATTCAGGAATTA
(5) TGGTAACCGCATTCAGGATTTAA
Sequence alignment, our example
AATTGGCC
AATAGGCC
AATAGGAC
AATTGGCG
AGTTGGCG
TATTGGCG
AATAGGCA
Phylogenetic principles
• Similar DNA sequences = closely related
• Inherited mutations.
• Simplest “route”!
• Homoplasy unlikely (not always true).
Homology vs. homoplasy
• Homology = similarity due to a common ancestor
• Homoplasy = similarity that arose independently (convergent evolution), not inherited from a common ancestor
Algorithms for constructing phylogenetic trees
• What is an algorithm?
• Several different phylogenetic algorithms exist.
• How do they work?
Algorithms for constructing phylogenetic trees
• Distance matrices
  – Neighbour Joining
  – UPGMA
• Maximum Parsimony
• Maximum Likelihood
• Bayesian inference
Distance matrices
• Based on the genetic distance
• Genetic distance based on nucleotide substitutions
• Typically # of differences / total # of nt
(1) AATTCCGG
(2) AATACCGG
(3) AATTAATG

Number of differences:
     1      2      3
1    0
2    1      0
3    3      4      0

Differences / total number of nt:
     1      2      3
1    0
2    0.125  0
3    0.375  0.5    0
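The two matrices above can be reproduced with a few lines of Python. This is a minimal sketch and not part of the original slides; the function names `count_differences` and `p_distance` are chosen here for illustration, and the sequences are assumed to be already aligned.

```python
from itertools import combinations

def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def p_distance(a: str, b: str) -> float:
    """Number of differences divided by the total number of nucleotides."""
    return count_differences(a, b) / len(a)

sequences = {1: "AATTCCGG", 2: "AATACCGG", 3: "AATTAATG"}

# Lower-triangle entries of the distance matrix, keyed by (row, column).
distances = {
    (j, i): p_distance(sequences[i], sequences[j])
    for i, j in combinations(sorted(sequences), 2)
}

for (row, col), d in sorted(distances.items()):
    print(f"{row} vs {col}: {d}")
```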
Neighbour Joining
• Cluster in pairs
• Shortest distance first
• => Similar sequences located closely together in the tree
• Fast algorithm!
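[Figure: tree of sequences 1, 2 and 3 drawn from the normalized distance matrix (1 vs 2: 0.125, 1 vs 3: 0.375, 2 vs 3: 0.5), and a four-taxon tree of A, B, C and D]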
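The "cluster in pairs, shortest distance first" idea can be sketched as below. Note the simplification: this is plain average-linkage clustering in the style of UPGMA, not the full neighbour-joining algorithm, which additionally corrects each distance for how far the two sequences are from all the others. The function name `cluster_closest_first` and the nested-tuple tree format are made up for this illustration.

```python
def cluster_closest_first(names, dist):
    """Repeatedly merge the two closest clusters (average linkage, UPGMA-style).

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b
    Returns the tree as nested tuples.
    """
    # Each cluster is (tree, members); start with one cluster per taxon.
    clusters = [(name, [name]) for name in names]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the pair of clusters with the shortest distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        # Replace the two merged clusters with the new, larger one.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return clusters[0][0]

# Normalized distances for sequences 1, 2 and 3 from the distance-matrix slide.
dist = {
    frozenset({"1", "2"}): 0.125,
    frozenset({"1", "3"}): 0.375,
    frozenset({"2", "3"}): 0.5,
}
tree = cluster_closest_first(["1", "2", "3"], dist)
print(tree)
```

Sequences 1 and 2 are joined first because 0.125 is the shortest distance; sequence 3 is attached afterwards.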
Maximum Parsimony
• Utilizes so-called informative sites.
• Simplest path (fewest mutations)
• Build all possible trees.
• Choose the tree that requires the fewest mutations
• Relatively fast
Maximum Parsimony, example
(1) AATTCC
(2) AAGTCC
(3) AATTCC
(4) AAGTCT

[Figure: the alternative tree topologies for sequences 1 to 4, grouping 1+2 / 3+4, 1+3 / 2+4 and 1+4 / 2+3, with marked branches]
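The example can be checked by counting the mutations each of the three possible unrooted four-taxon trees requires. The sketch below uses Fitch's counting method, which is an assumption made here: the slides do not say which counting method is meant. The names `fitch_column`, `fitch_score` and `topologies` are illustrative.

```python
def fitch_column(tree, column):
    """Return (possible ancestral states, mutation count) for one alignment column.

    tree:   nested tuples of taxon labels, e.g. ((1, 2), (3, 4))
    column: dict mapping taxon label -> nucleotide in this column
    """
    if not isinstance(tree, tuple):          # a leaf: its state is known
        return {column[tree]}, 0
    left_states, left_cost = fitch_column(tree[0], column)
    right_states, right_cost = fitch_column(tree[1], column)
    common = left_states & right_states
    if common:                               # children can agree: no extra mutation
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

def fitch_score(tree, sequences):
    """Minimum number of mutations the tree needs, summed over all columns."""
    length = len(next(iter(sequences.values())))
    total = 0
    for pos in range(length):
        column = {taxon: seq[pos] for taxon, seq in sequences.items()}
        total += fitch_column(tree, column)[1]
    return total

sequences = {1: "AATTCC", 2: "AAGTCC", 3: "AATTCC", 4: "AAGTCT"}

# The three ways of pairing four taxa, written as nested tuples.
topologies = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]
scores = {tree: fitch_score(tree, sequences) for tree in topologies}
best = min(scores, key=scores.get)
print(scores, best)
```

Only the third column (T, G, T, G) distinguishes the trees; the last column needs one mutation whichever tree is chosen, which is why it is not an informative site.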
Maximum Likelihood and Bayesian inference
• Statistical method including an evolutionary model
• Combine the likelihood over all columns
• Calculate the likelihood for all possible trees
• Good but slow!
• Bayesian inference faster
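How the likelihood is accumulated over alignment columns can be illustrated on the smallest possible case: two sequences joined by a single branch. The sketch assumes the Jukes-Cantor substitution model, chosen here for simplicity and not named in the slides; the function name `log_likelihood` and the grid of candidate branch lengths are likewise illustrative.

```python
import math

def log_likelihood(seq_a: str, seq_b: str, d: float) -> float:
    """Log-likelihood of two aligned sequences separated by branch length d
    (expected substitutions per site) under the Jukes-Cantor model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e      # probability that a site keeps its nucleotide
    p_diff = 0.25 - 0.25 * e      # probability of changing to one specific other nucleotide
    total = 0.0
    for x, y in zip(seq_a, seq_b):
        site = 0.25 * (p_same if x == y else p_diff)  # 0.25 = base frequency
        total += math.log(site)   # multiplying site likelihoods = adding their logs
    return total

seq_1 = "AATTCCGG"
seq_3 = "AATTAATG"

# Try a grid of branch lengths and keep the one with the highest likelihood.
grid = [i / 1000 for i in range(1, 3000)]
best_d = max(grid, key=lambda d: log_likelihood(seq_1, seq_3, d))
print(round(best_d, 3))
```

For a full tree the same per-column calculation is repeated for every candidate topology and set of branch lengths, which is what makes the method slow.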
To test all possible trees
• Is it possible?
• => Takes too long!
• Analyzing 20 taxa gives ~10^22 different possible trees (10,000,000,000,000,000,000,000)
• What to do?
• => Use sophisticated algorithms to limit the search space
• These usually produce good results, but not necessarily the best
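The size of the search space can be computed directly. The sketch assumes strictly bifurcating trees and uses the standard counting formulas, (2n-3)!! rooted and (2n-5)!! unrooted trees for n taxa; the function names are illustrative. Under these formulas 20 taxa give roughly 8 x 10^21 rooted trees, the same order of magnitude as the figure quoted above.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k-2) * (k-4) * ... down to 1 (used here for odd k)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees for n >= 2 taxa: (2n-3)!!"""
    return double_factorial(2 * n - 3)

def unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees for n >= 3 taxa: (2n-5)!!"""
    return double_factorial(2 * n - 5)

for n in (4, 10, 20):
    print(n, unrooted_trees(n), f"{rooted_trees(n):.1e}")
```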
To root an unrooted tree
• Include an “outgroup”
• Outgroup = more distantly related (but not too distantly)
• Place the root where the outgroup connects to the tree
Rooting a tree
[Figure: an unrooted tree of taxa A to G with the outgroup indicated, and the corresponding rooted tree]
Significance
• Is the tree reliable?
• Is it the only probable tree?
• Bootstrap, jackknife etc.
Bootstrap
• Construct several new sequence sets (1000 sets)
• A new sequence set is generated by randomly picking columns, with replacement, from the original set
• Apply the phylogenetic algorithm to all sets.
• Make one consensus tree from all trees
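Generating one bootstrap set amounts to sampling alignment columns with replacement. A minimal sketch, assuming the alignment is held as a dict of equal-length strings; the function name `bootstrap_alignment` and the fixed random seed are illustrative choices, and the tree-building and consensus steps are left to the phylogenetic software.

```python
import random

def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build a new alignment of the same size by drawing columns with replacement."""
    length = len(next(iter(alignment.values())))
    # The same column indices are used for every sequence, so columns stay intact.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

alignment = {
    "A": "AACTTAACCACGCTATCGATGCAATTATATA",
    "B": "AATTTGACTGCGGTACCGATCCAATTATATA",
    "C": "AATTTGACTGGGCTACCGATCCAATTATATA",
    "D": "AACTTAACCGCGCTACTGATCGAATTATATA",
}

rng = random.Random(42)   # fixed seed so the example is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(1000)]
print(replicates[0])
```

Each replicate has the same number of sequences and columns as the original; some original columns appear several times and others not at all.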
Bootstrapping
A: AACTTAACCACGCTATCGATGCAATTATATA
B: AATTTGACTGCGGTACCGATCCAATTATATA
C: AATTTGACTGGGCTACCGATCCAATTATATA
D: AACTTAACCGCGCTACTGATCGAATTATATA

A: CACC
B: TGCT
C: TGCT
D: CAGC

[Figure: three alternative trees with their values: A+D / B+C: 96; A+B / C+D: 3; A+C / B+D: 1]
Pitfalls?
• Homoplasy (convergent evolution)
  - Selection pressure
  - Hypervariable regions
  - Random events
• Gene duplication
• Recombination
  - Different regions have different ancestries
Recombination
[Figure: sequences A and B and their recombinants]
Detection of recombinants
[Figure: trees of taxa A to I together with a sequence X]
Phylogenetic networks
[Figure: trees of taxa A to D with an additional sequence R]
Applied phylogenetics
• Reconstruct evolutionary history
• Animals, plants, bacteria, viruses, plasmids, ……
• Establish evolutionary mechanisms
• Functional studies
• Trace pandemic diseases
• Forensic medicine
Examples
Practical session
Phylip
• Software package for phylogenetic analysis
• Several small (command-line) applications
• Many different algorithms
• Widely used by the scientific community

• seqboot -> Constructs bootstrap sets
• dnapars -> Constructs a maximum parsimony tree
• consense -> Constructs a consensus tree
• drawtree -> Draws the tree
Herpes Simplex Virus Type 1 & 2
• ~100 nm in diameter
• Capsid surrounded by an envelope
• Different glycoproteins in the envelope

• Usually asymptomatic
• Cause oral and genital lesions, encephalitis, meningitis and keratitis
• Transmitted via direct contact
• Lifelong infection in the sensory ganglia
• HSV-1: 70-80%, HSV-2: 20-30%
Photo by Linda M. Stannard, University of Cape Town.
HSV-1 US7 (Glycoprotein I)
Clinical samples

Isolate   Gender   Location
E4        F        brain (encephalitis)
25        M        brain (encephalitis)
274       F        brain (encephalitis)
1666      M        brain (encephalitis)
3355      F        brain (encephalitis)
7682      M        brain (encephalitis)
90132     F        oral
90147     M        oral
90238     M        oral
90395     F        oral
90444     F        brain (encephalitis)
90579     F        oral
90602     M        oral
94783     M        oral
97869     F        genital
981264    F        genital
982466    M        genital
983501    F        genital
993412    F        oral
993515    F        oral
993565    F        genital
993576    F        genital
993594    F        oral
993606    F        oral
993608    F        genital
993615    F        genital
993621    F        genital
993626    F        genital