
Page 1:

Phylogeny Workshop

By Eyal Privman

The Bioinformatics Unit
G.S. Wise Faculty of Life Science

Tel Aviv University, Israel
November 2009

http://ibis.tau.ac.il/twiki/bin/view/Bioinformatics/Phylogeny2009

Page 2:

Why should we care about phylogeny?

"Nothing in biology makes sense except in the light of evolution"

(Theodosius Dobzhansky, 1973)

Page 3:

Alignment and phylogeny are mutually dependent

[Figure: diagram with the labels "Unaligned sequences", "Inaccurate tree building", "Sequence alignment", "MSA" and "Phylogeny reconstruction"]

Page 4:

Alignment and phylogeny are both challenging

25% of residues are aligned wrong

Based on BAliBASE: a large representative set of proteins

Page 5:

Alignment and phylogeny are both challenging

5% of tree branches are wrong

Based on simulations of 100 protein sequences

Page 6:

Multiple sequence alignment (MSA)

Progressive alignment

[Figure labels: sequences A to E, pairwise distance table, guide tree, MSA, "Iterative"]
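
The first step of progressive alignment, building a pairwise distance table from unaligned sequences and picking the closest pair to join first in the guide tree, can be sketched roughly as below. This is only an illustration and not what MAFFT or PRANK actually implement: the k-mer Jaccard distance, the function names, and the toy sequences are all assumptions chosen for brevity.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """Jaccard distance between k-mer sets: 0 = same set, 1 = nothing shared."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def distance_table(seqs, k=3):
    """Map each unordered pair of sequence names to its k-mer distance."""
    return {
        (x, y): kmer_distance(seqs[x], seqs[y], k)
        for x, y in combinations(sorted(seqs), 2)
    }


def closest_pair(table):
    """The pair a guide tree would join first: the smallest distance."""
    return min(table, key=table.get)


# Toy, made-up sequences (not taken from the workshop data).
toy = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYIAKQK",
    "C": "MKSAYLAKQR",
    "D": "GGGPPPLLLW",
}
table = distance_table(toy)
first_join = closest_pair(table)
print(first_join, round(table[first_join], 3))
```

A real guide tree would go on to merge the remaining sequences and clusters in the same greedy fashion; iterative methods then revisit the alignment after the first pass.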

Page 7:

Multiple sequence alignment (MSA)

Several advanced MSA programs are available. Today we will use two:

• MAFFT – fastest and one of the most accurate

• PRANK – distinct from all other MSA programs because of its correct treatment of insertions/deletions

Page 8:

MAFFT

• Web server & download: http://align.bmr.kyushu-u.ac.jp/mafft/online/server/

• Efficiency-tuned variants: quick & dirty, or slow but accurate

Reference: Kazutaka Katoh, Kazuharu Misawa, Kei-ichi Kuma and Takashi Miyata. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 2002, Vol. 30, No. 14, 3059-3066.

Pages 9-11:

Choosing a MAFFT strategy

[Figure: scale from "quick & dirty" to "slow but accurate"]

Page 12:

Choosing a MAFFT strategy

L-INS-i

ooooooooooooooooooooooooooooooooXXXXXXXXXXX-XXXXXXXXXXXXXXX------------------
--------------------------------XX-XXXXXXXXXXXXXXX-XXXXXXXXooooooooooo-------
------------------ooooooooooooooXXXXX----XXXXXXXX---XXXXXXXooooooooooo-------
--------ooooooooooooooooooooooooXXXXX-XXXXXXXXXX----XXXXXXXoooooooooooooooooo
--------------------------------XXXXXXXXXXXXXXXX----XXXXXXX------------------

G-INS-i

XXXXXXXXXXX-XXXXXXXXXXXXXXX
XX-XXXXXXXXXXXXXXX-XXXXXXXX
XXXXX----XXXXXXXX---XXXXXXX
XXXXX-XXXXXXXXXX----XXXXXXX
XXXXXXXXXXXXXXXX----XXXXXXX

E-INS-i

oooooooooXXX------XXXX---------------------------------XXXXXXXXXXX-XXXXXXXXXXXXXXXooooooooooooo
---------XXXXXXXXXXXXXooo------------------------------XXXXXXXXXXXXXXXXXX-XXXXXXXX-------------
-----ooooXXXXXX---XXXXooooooooooo----------------------XXXXX----XXXXXXXXXXXXXXXXXXooooooooooooo
---------XXXXX----XXXXoooooooooooooooooooooooooooooooooXXXXX-XXXXXXXXXXXX--XXXXXXX-------------
---------XXXXX----XXXX---------------------------------XXXXX---XXXXXXXXXX--XXXXXXXooooo--------

[Figure: scale from "quick & dirty" to "slow but accurate"]
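
The workshop uses the MAFFT web server, but the same strategies can be requested from a downloaded copy. The sketch below builds the corresponding command lines, using the option sets listed in the MAFFT manual for the three accuracy-oriented methods. It assumes a local `mafft` executable on the PATH; the helper names and the output file name in the usage note are hypothetical.

```python
import shutil
import subprocess

# Option sets as given in the MAFFT manual.
# L-INS-i: one alignable domain with unalignable flanking regions.
# G-INS-i: sequences alignable over their whole length.
# E-INS-i: several alignable segments separated by long unalignable regions.
STRATEGY_FLAGS = {
    "auto": ["--auto"],
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],
    "E-INS-i": ["--genafpair", "--maxiterate", "1000"],
}


def mafft_command(input_fasta, strategy="auto"):
    """Build the argument list for one MAFFT run."""
    if strategy not in STRATEGY_FLAGS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return ["mafft", *STRATEGY_FLAGS[strategy], input_fasta]


def run_mafft(input_fasta, output_fasta, strategy="auto"):
    """Run MAFFT; it writes the alignment to standard output, so redirect it."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH (assumed local install)")
    with open(output_fasta, "w") as out:
        subprocess.run(mafft_command(input_fasta, strategy), stdout=out, check=True)


cmd = mafft_command("fahA.fas", "L-INS-i")
print(" ".join(cmd))
```

Calling `run_mafft("fahA.fas", "fahA.mafft.fas")` would then produce an aligned FASTA file, provided MAFFT is installed.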

Page 13:

MAFFT output

Saving the output

• Choose a format: Clustal, Fasta, or click "Reformat" to convert to a selection of other formats

• Save page as a text file

A colored view of the alignment

Page 14:

PRANK

Page 15:

Classical alignment errors for HIV env

Page 16:

PRANK

• Web server: http://www.ebi.ac.uk/goldman-srv/webPRANK/

Page 17:

PRANK output

If you need a different format – copy the results to the READSEQ sequence converter: http://www-bimas.cit.nih.gov/molbio/readseq/

Page 18:

Downloadable PRANK

• http://www.ebi.ac.uk/goldman-srv/prank/prank/

  – PRANK: a command-line program

  – PRANKSTER: a program with a graphical user interface

Page 19:

1. Download and unzip the sequence files from my homepage (Google "Eyal Privman" and look for the workshop materials under "Teaching"). Open "fahA.fas" in Notepad – these are 65 protein sequences in FASTA format.

2. Run PRANKSTER, open the "fahA.fas" file, and run "Alignment" → "Make alignment".

3. While you wait: copy the sequences into the MAFFT web server and run the "automatic" "moderate" strategy. Which strategy did MAFFT choose for you? Click "Reformat", choose "phylip|phylip4", and save as "fahA.mafft.phylip".

4. When PRANKSTER finishes, click File → Save, and save the MSA in PHYLIP format under the name "fahA.prank.phylip".
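
Before aligning, it can help to confirm that the input file really holds the expected number of sequences. The following is a minimal FASTA reader written for illustration only; the function name and the toy records are assumptions, and it keeps just the first word of each header line as the sequence name.

```python
def read_fasta(lines):
    """Parse FASTA text lines into an ordered list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Toy input (made up); sequences may span several lines.
example = """>seq1 hypothetical protein
MKT
AYI
>seq2
MKS
""".splitlines()
records = read_fasta(example)
print(len(records), records)
```

For the exercise, `with open("fahA.fas") as fh: print(len(read_fasta(fh)))` should report 65 if the file matches the description above.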

Page 20:

Phylogeny reconstruction

Different approaches (algorithms / programs):

• Distance-based methods (e.g. neighbor-joining, as in ClustalW): fast but inaccurate

• Maximum parsimony (e.g. MEGA)

• Maximum likelihood methods (e.g. PhyML, RAxML): accurate but slower

• Bayesian methods (e.g. MrBayes): most accurate but very slow

[Figure labels: sequences A to E, pairwise distance table, guide tree, MSA]
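
Distance-based methods such as neighbor-joining start from a pairwise distance table computed on the aligned sequences. As a rough sketch of that first step, the code below computes the uncorrected p-distance (the fraction of differing residues) for every pair in a toy MSA. The choice of p-distance, the gap handling, and the toy alignment are assumptions made for illustration; real programs typically apply a substitution-model correction.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Fraction of differing residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(msa):
    """Map each unordered pair of names to its p-distance."""
    return {
        (x, y): p_distance(msa[x], msa[y])
        for x, y in combinations(sorted(msa), 2)
    }


# Toy alignment (made up, not the workshop data).
msa = {
    "A": "MKTAYI-K",
    "B": "MKTAYIAK",
    "C": "MRSAY-AK",
}
matrix = distance_matrix(msa)
for pair, d in matrix.items():
    print(pair, round(d, 3))
```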

Page 21:

PhyML

The most widely used maximum likelihood (ML) program

• Web server & download: http://www.atgc-montpellier.fr/phyml/

Accepts input MSA in PHYLIP format only:

• Interleaved:

• Sequential:
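
The difference between the two PHYLIP layouts can be illustrated with a small writer. This is a sketch under stated assumptions, not a reference implementation: names are assumed to be at most 10 characters with no whitespace, the function names are hypothetical, and the toy alignment and block width are chosen only to keep the output short.

```python
def _check(records):
    """Validate an alignment given as (name, sequence) pairs; return its length."""
    if not records:
        raise ValueError("no sequences")
    length = len(records[0][1])
    for name, seq in records:
        if len(seq) != length:
            raise ValueError("all aligned sequences must have the same length")
        if not name or len(name) > 10 or any(c.isspace() for c in name):
            raise ValueError(f"name not usable in this sketch: {name!r}")
    return length


def to_phylip_sequential(records):
    """Sequential layout: each name is followed by its complete sequence."""
    length = _check(records)
    lines = [f"{len(records)} {length}"]
    lines += [f"{name:<10} {seq}" for name, seq in records]
    return "\n".join(lines) + "\n"


def to_phylip_interleaved(records, width=60):
    """Interleaved layout: blocks of `width` columns; names only in the first block."""
    length = _check(records)
    lines = [f"{len(records)} {length}"]
    for start in range(0, length, width):
        if start > 0:
            lines.append("")  # blank line between blocks
        for name, seq in records:
            chunk = seq[start:start + width]
            lines.append(f"{name:<10} {chunk}" if start == 0 else chunk)
    return "\n".join(lines) + "\n"


# Toy alignment (made up).
records = [("A", "MKTAYIAK"), ("B", "MKTAYI-K")]
sequential = to_phylip_sequential(records)
interleaved = to_phylip_interleaved(records, width=4)
print(sequential)
print(interleaved)
```

Both layouts begin with a header line giving the number of sequences and the alignment length.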

Page 22:

Downloadable PhyML

Less user-friendly, but lets you use your local computing power

• Run "phyml.bat"

• Drag the file from Windows Explorer to the blue window

• Enter "d" to switch from DNA to AA

• Enter "y" to run

Page 23:

1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the PhyML web server (don't forget to choose "Amino-acids" and enter your email)

2. Run it with the local installation of "phyml.bat"

You should end up with a file: "fahA.prank.phylip_phyml_tree.txt"
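
A quick sanity check on the resulting tree file is to count its leaves and compare them with the input sequences. The sketch below assumes the file holds a single Newick tree with unquoted names; the function name and the toy tree are made up for illustration.

```python
import re


def newick_leaves(newick):
    """Return leaf names from a Newick string (unquoted labels only).

    A leaf label directly follows '(' or ','. Labels that follow ')' are
    internal node labels or support values, and text after ':' is a branch
    length, so neither is picked up.
    """
    return [m.group(1) for m in re.finditer(r"[(,]\s*([^\s(),:;]+)", newick)]


# Toy tree (made up) with branch lengths and support values.
toy_tree = "((A:0.1,B:0.2)0.95:0.05,(C:0.3,D:0.1)0.80:0.02,E:0.4);"
leaves = newick_leaves(toy_tree)
print(len(leaves), leaves)
```

For the exercise, reading "fahA.prank.phylip_phyml_tree.txt" with `open(...).read()` and passing the text to `newick_leaves` should give 65 names if every input sequence made it into the tree.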

Page 24:

RAxML

• Web server: http://phylobench.vital-it.ch/raxml-bb/

• Similar maximum likelihood (ML) methodology to PhyML, but much faster: faster results, or better results in the same run-time

Page 25:

Downloadable RAxML

• A command-line program: http://icwww.epfl.ch/~stamatak/index-Dateien/Page443.htm (on that page you will also find instructions for running on Windows, and the RAxML manual)

• easyRAx takes care of some of the RAxML options for you: http://projects.exeter.ac.uk/ceem/easyRAx.html but installation is somewhat more complex

Page 26:

1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the RAxML web server (don't forget to tick "Protein sequences" and enter your email)

Save the resulting tree file as: "fahA.prank.phylip.raxml"

Page 27:

FigTree: tree visualization and figure creation

Manipulate a node

Manipulate a clade

Manipulate a taxon

Page 28:

1. Open "fahA.prank.phylip_phyml_tree.txt" in FigTree

2. Play around with the different options and make a pretty figure!

   1. Find out how to color specific clades, as below

   2. Try each of the three options under "Layout"

   3. Export a figure in PDF format (File → Export Graphic…)

Page 29:

Thanks for your attention

and

happy phylogeny…