From basic Concepts to Advanced applications
Molecular Evolution and Phylogeny
By Ofir Cohen
The Bioinformatics Unit, G.S. Wise Faculty of Life Sciences
Tel Aviv University, Israel, 2011
http://ibis.tau.ac.il/twiki/bin/view/Bioinformatics/Phylogeny
2 of ~28
Darwin’s teachings – common descent and tree-like evolution
Introduction – The tree concept
3 of ~28
Common Descent – Modern evidence
"The unity of life is no less remarkable than its diversity" (Theodosius Dobzhansky)
4 of ~28
What is a Phylogenetic Tree?
Phylogenetic tree: a (hypothetical) historical pattern of evolutionary relationships among organisms
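[Figure: example phylogenetic tree with leaves Homo, Bos, Mus, Rattus and Gallus, annotated with the terms Root, Node, Leaf and Branch; branch lengths shown: 0.01, 0.011, 0.011, 0.012, 0.025, 0.038, 0.066]
Phylogenetic (Greek: phylon = race and genetic = birth)
Horizontal branch length is proportional to evolutionary distance (unit = substitutions per site)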
5 of ~28
Molecular evidence of HIV transmission in a criminal case
Introduction - Anecdotes
Metzker, Michael L. et al. (2002) Proc. Natl. Acad. Sci. USA 99, 14292-14297
6 of ~28
Criminal investigation
August 1994: a nurse tests negative for HIV and breaks off a messy 10-year affair with a doctor. Three weeks later the doctor gives his ex-mistress a vitamin B-12 shot.
In January 1995, the nurse tests positive for both HIV and hepatitis C.
The doctor’s office records from that day are missing (but eventually found). The doctor had drawn blood samples from a known HIV patient and a known hepatitis C patient on the same day as the vitamin B-12 shot. The nurse had never had contact with either patient.
Circumstantial evidence that the doctor injected blood from one of his patients into his ex-girlfriend…
How can this be proved using a phylogenetic approach?
7 of ~28
HIV – short background
Extreme heterogeneity: within each patient there are many different viral strains ("quasi-species")
8 of ~28
History of the virus: gp120
[Figure: gp120 phylogeny with the PATIENT, VICTIM and CONTROLS sequences labeled. ©2002 National Academy of Sciences, U.S.A.]
9 of ~28
History of the virus: RT
[Figure: RT phylogeny with the VICTIM and PATIENT sequences labeled]
Source sequences that are paraphyletic (other sequences are nested within them) with respect to the recipient sequences provide evidence for the direction of transmission.
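The paraphyly criterion can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy tree, the labels P1..P3 (source), V1..V2 (recipient) and C1..C2 (controls), and the helper names. It checks topology only on a rooted tree and is not the analysis used in the published case.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a node (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every clade (subtree) of a rooted tree."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some clade contains exactly the labels in `group`."""
    return frozenset(group) in set(clades(tree))


def source_is_paraphyletic(tree, source, recipient):
    """True if the recipient clade is nested inside the source sequences.

    Simplified rule (an assumption of this sketch): source + recipient
    form one clade, the recipient alone forms a clade, and the source
    alone does not.
    """
    source, recipient = frozenset(source), frozenset(recipient)
    return (
        is_monophyletic(tree, source | recipient)
        and is_monophyletic(tree, recipient)
        and not is_monophyletic(tree, source)
    )


# Hypothetical rooted tree written as nested tuples.
toy_tree = (("C1", "C2"), ("P1", ("P2", ("P3", ("V1", "V2")))))
patient = {"P1", "P2", "P3"}
victim = {"V1", "V2"}

print(source_is_paraphyletic(toy_tree, patient, victim))
print(source_is_paraphyletic(toy_tree, victim, patient))
```

In this toy tree the victim sequences sit inside the patient sequences, so the first call reports paraphyly of the patient group and the reversed call does not.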
10 of ~28
Phylogenetic analysis: Not only among organisms - Cancer phylogeny
A phylogeny of acute myeloid leukemia (AML) subtypes
Riester et al. 2010; Liu et al. 2009
11 of ~28
Phylogenetic analysis: Not only in biology – Language evolution
Gray and Atkinson, 2003
Researchers study the evolution of languages by treating them like genomes.
Instead of COGs (gene families), they analyze COGNATES (word families)
12 of ~28
Comparative Genomics – "All life is one"
Compare homologous sequences
13 of ~28
Newick format with branch lengths
(A:0.3,((B1:0.1,B2:0.1):0.3,(C1:0.1,C2:0.1):0.5):0.3);
[Figure: the tree above drawn in FigTree, with leaves A, B1, B2, C1 and C2 and a scale bar of 0.1]
http://tree.bio.ed.ac.uk/software/figtree/
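To make the Newick notation concrete, here is a small Python sketch that parses the example string from this slide and sums branch lengths from the root to each leaf. The parser is a simplification written for illustration: it assumes no whitespace, quoted labels or comments, and the function names are hypothetical.

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, length, children) tuples."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                if text[pos] == ")":
                    pos += 1  # end of this group of children
                    break
                raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]  # empty for unnamed internal nodes
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    root = parse_node()
    if pos != len(text) - 1:
        raise ValueError("trailing characters after the tree")
    return root


def root_to_leaf_distances(node, so_far=0.0):
    """Sum branch lengths along the path from the root to every leaf."""
    name, length, children = node
    total = so_far + length
    if not children:
        return {name: total}
    distances = {}
    for child in children:
        distances.update(root_to_leaf_distances(child, total))
    return distances


tree = parse_newick("(A:0.3,((B1:0.1,B2:0.1):0.3,(C1:0.1,C2:0.1):0.5):0.3);")
for leaf, dist in sorted(root_to_leaf_distances(tree).items()):
    print(leaf, round(dist, 3))
```

Each pair of parentheses is one internal node and each number after a colon is the length of the branch leading to that node, so A lies 0.3 from the root, B1 and B2 lie 0.3 + 0.3 + 0.1, and C1 and C2 lie 0.3 + 0.5 + 0.1.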
14 of ~28
Alignment and phylogeny are mutually dependent
[Figure: workflow diagram with the labels: unaligned sequences, sequence alignment, MSA, phylogeny reconstruction, inaccurate tree building; scale bar 0.4]
16 of ~28
Multiple sequence alignment (MSA)
Several advanced MSA programs are available. Today we will use two:
• MAFFT: fastest and one of the most accurate
• PRANK: distinct from all other MSA programs because of its correct treatment of insertions/deletions
17 of ~28
MAFFT
Web server & download: http://align.bmr.kyushu-u.ac.jp/mafft/online/server/
Efficiency-tuned variants: quick & dirty or slow but accurate
Kazutaka Katoh, Kazuharu Misawa, Kei-ichi Kuma and Takashi Miyata (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30(14), 3059-3066. © 2002 Oxford University Press
18 of ~28
Choosing a MAFFT strategy
[Figure: MAFFT strategies arranged along an axis from "quick & dirty" to "slow but accurate"]
21 of ~28
Choosing a MAFFT strategy
L-INS-i
ooooooooooooooooooooooooooooooooXXXXXXXXXXX-XXXXXXXXXXXXXXX------------------
--------------------------------XX-XXXXXXXXXXXXXXX-XXXXXXXXooooooooooo-------
------------------ooooooooooooooXXXXX----XXXXXXXX---XXXXXXXooooooooooo-------
--------ooooooooooooooooooooooooXXXXX-XXXXXXXXXX----XXXXXXXoooooooooooooooooo
--------------------------------XXXXXXXXXXXXXXXX----XXXXXXX------------------
G-INS-i
XXXXXXXXXXX-XXXXXXXXXXXXXXX
XX-XXXXXXXXXXXXXXX-XXXXXXXX
XXXXX----XXXXXXXX---XXXXXXX
XXXXX-XXXXXXXXXX----XXXXXXX
XXXXXXXXXXXXXXXX----XXXXXXX
E-INS-i
oooooooooXXX------XXXX---------------------------------XXXXXXXXXXX-XXXXXXXXXXXXXXXooooooooooooo
---------XXXXXXXXXXXXXooo------------------------------XXXXXXXXXXXXXXXXXX-XXXXXXXX-------------
-----ooooXXXXXX---XXXXooooooooooo----------------------XXXXX----XXXXXXXXXXXXXXXXXXooooooooooooo
---------XXXXX----XXXXoooooooooooooooooooooooooooooooooXXXXX-XXXXXXXXXXXX--XXXXXXX-------------
---------XXXXX----XXXX---------------------------------XXXXX---XXXXXXXXXX--XXXXXXXooooo--------
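For the downloadable version of MAFFT, the strategies named on this slide correspond to command-line options. The sketch below only builds the argument list and does not run anything. The option sets follow the MAFFT manual as I understand it; the helper name `mafft_command` and the dictionary are hypothetical, and the options should be checked against the installed version.

```python
# Hypothetical lookup table: strategy name -> MAFFT command-line options.
MAFFT_STRATEGIES = {
    "auto": ["--auto"],                                   # let MAFFT choose
    "FFT-NS-2": ["--retree", "2"],                        # quick & dirty
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],   # one alignable domain, unalignable flanks
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],  # globally alignable sequences
    "E-INS-i": ["--ep", "0", "--genafpair", "--maxiterate", "1000"],  # several domains, long gaps
}


def mafft_command(fasta_path, strategy="auto"):
    """Return the argument list for running MAFFT with a named strategy."""
    if strategy not in MAFFT_STRATEGIES:
        raise ValueError(f"unknown MAFFT strategy: {strategy!r}")
    return ["mafft", *MAFFT_STRATEGIES[strategy], fasta_path]


print(" ".join(mafft_command("fahA.fas", "L-INS-i")))
```

MAFFT writes the alignment to standard output, so one way to use the list is `subprocess.run(cmd, stdout=handle, check=True)` with `handle` an open output file, assuming the `mafft` executable is on the PATH.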
22 of ~28
MAFFT output
Saving the output:
• Choose a format: Clustal, Fasta, or click "Reformat" to convert to a selection of other formats
• Save the page as a text file
• A colored view of the alignment
23 of ~28
PRANK
24 of ~28
Classical alignment errors for HIV env
25 of ~28
PRANK Web server: http://www.ebi.ac.uk/goldman-srv/webPRANK/
26 of ~28
PRANK output
If you need a different format – copy the results to the READSEQ sequence converter: http://www-bimas.cit.nih.gov/molbio/readseq/
27 of ~28
Downloadable PRANK: http://www.ebi.ac.uk/goldman-srv/prank/prank/
• PRANK: a command-line program
• PRANKSTER: a program with a graphical user interface
28 of ~28
1. Download and unzip the sequence files from my homepage (Google "Ofir Cohen" and look for the workshop materials under "Teaching"). Open "fahA.fas" in Notepad; it contains 65 protein sequences in FASTA format.
2. Run PRANKSTER, open the "fahA.fas" file, and run "Alignment" → "Make alignment".
3. While you wait: copy the sequences into the MAFFT web server and run the "automatic" "moderate" strategy. Which strategy did MAFFT choose for you? Click "Reformat", choose "phylip|phylip4", and save as "fahA.mafft.phylip".
4. When PRANKSTER finishes, click File → Save and save the MSA in PHYLIP format under the name "fahA.prank.phylip".
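As a quick sanity check on a FASTA file such as "fahA.fas", the sequences can be counted with a few lines of Python. This is a minimal sketch: the parser name `parse_fasta` and the two toy records are made up for illustration, and it keeps only the first word of each header as the sequence name.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            name = fields[0]  # first word of the header line
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)  # sequences may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


# Toy input standing in for the real file.
example = """>seq1 hypothetical protein
MKTAYIAK
QRQISFVK
>seq2
MKTAYLAK
"""

sequences = parse_fasta(example)
print(len(sequences), "sequences")
for seq_name, seq in sequences.items():
    print(seq_name, len(seq))
```

For the workshop file, reading the text with `open("fahA.fas").read()` and passing it to the same function should report the number of sequences in the file.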
29 of ~28
Phylogeny reconstruction
Different approaches (algorithms / programs):
• Distance-based methods (e.g. neighbor-joining, as in ClustalW): fast but inaccurate
• Maximum parsimony (e.g. MEGA)
• Maximum likelihood methods (e.g. PhyML, RAxML): accurate but slower
• Bayesian methods (e.g. MrBayes): most accurate but very slow
[Figure: sequences A to E with the labels: pairwise distance table, guide tree, MSA]
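The pairwise distance table that distance-based methods start from can be sketched in Python. The example uses the uncorrected p-distance (the fraction of differing sites, skipping columns with a gap), chosen here for simplicity; real programs usually apply a substitution-model correction. The toy alignment and function names are assumptions.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def distance_table(alignment):
    """Return {(name1, name2): p-distance} for every pair of sequences."""
    table = {}
    for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2):
        table[(name1, name2)] = p_distance(seq1, seq2)
    return table


# Hypothetical toy alignment of three short protein fragments.
toy_alignment = {
    "A": "MKTAYIAK",
    "B": "MKTAYLAK",
    "C": "MRT-YLAK",
}

for pair, dist in distance_table(toy_alignment).items():
    print(pair, round(dist, 3))
```

A neighbor-joining program would take such a table as input and join the closest pairs step by step to build the tree.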
30 of ~28
PhyML
The most widely used maximum likelihood (ML) program
Web server & download: http://www.atgc-montpellier.fr/phyml/
Accepts input MSA in PHYLIP format only:
• Interleaved
• Sequential
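The sequential PHYLIP layout can be illustrated with a short Python sketch that writes an alignment as a header line (number of sequences, number of sites) followed by one line per sequence. It writes a "relaxed" variant, with the name separated from the sequence by spaces rather than padded to exactly 10 characters, which is an assumption of this sketch; the function name and toy data are hypothetical.

```python
def to_phylip_sequential(alignment):
    """Format a {name: aligned_sequence} dict as sequential PHYLIP text."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    for name in alignment:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid sequence name: {name!r}")
    width = max(len(name) for name in alignment)
    # Header: number of sequences, then number of alignment columns.
    lines = [f" {len(alignment)} {n_sites}"]
    for name, seq in alignment.items():
        # Pad names to a common width so the sequences line up.
        lines.append(f"{name.ljust(width)}  {seq}")
    return "\n".join(lines) + "\n"


# Hypothetical toy alignment.
toy = {"seqA": "MKT-Y", "seqB2": "MRTAY"}
print(to_phylip_sequential(toy), end="")
```

In the interleaved layout the same header is used, but the sequences are split into blocks that alternate between taxa instead of each sequence being written in full.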
32 of ~28
1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the phyML webserver (don't forget to choose "Amino-acids" and enter your email)
2. Run it with the local installation of "phyml.bat"
• You should end up with a file: "fahA.prank.phylip_phyml_tree.txt"
33 of ~28
RAxML
Web server: http://phylobench.vital-it.ch/raxml-bb/
Maximum likelihood (ML) methodology similar to PhyML, but much faster:
• Faster results
• Better results in the same run time
35 of ~28
1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the RAxML webserver (don't forget to tick "Protein sequences" and enter your email)
• Save the resulting tree file as: "fahA.prank.phylip.raxml"
36 of ~28
FigTree: tree visualization and figure creation
Manipulate a node
Manipulate a clade
Manipulate a taxon
37 of ~28
1. Open "fahA.prank.phylip_phyml_tree.txt" in FigTree
2. Play around with the different options and make a pretty figure!
   1. Find out how to color specific clades, as below
   2. Try each of the three options under "Layout"
   3. Export a figure in PDF format (File → Export Graphic…)
38 of ~28
Final Questions…
Thanks for your attention