From basic Concepts to Advanced applications Molecular Evolution and Phylogeny By Ofir Cohen The Bioinformatics Unit G.S. Wise Faculty of Life Science Tel Aviv University, Israel 2011 http://ibis.tau.ac.il/twiki/bin/view/Bioinformatics/Phylogeny

Page 1:

From basic Concepts to Advanced applications

Molecular Evolution and Phylogeny

By Ofir Cohen

The Bioinformatics Unit, G.S. Wise Faculty of Life Science

Tel Aviv University, Israel, 2011

http://ibis.tau.ac.il/twiki/bin/view/Bioinformatics/Phylogeny

Page 2:

Darwin’s teachings – common descent and tree-like evolution

Introduction – The tree concept

Page 3:

Common Descent – Modern evidence

Introduction – The tree concept

"The unity of life is no less remarkable than its diversity" – Theodosius Dobzhansky

Page 4:

What is a Phylogenetic Tree?

Phylogenetic tree: a (hypothetical) historical pattern of evolutionary relationships among organisms (Greek: phylon = race, genetic = birth)

Introduction – The tree concept

[Figure: rooted example tree with leaves Homo, Bos, Mus, Rattus and Gallus; labels mark the root, a node, a leaf and a branch; branch lengths shown: 0.011, 0.025, 0.012, 0.011, 0.038, 0.066; scale bar: 0.01 sps]

Horizontal branch length – proportional to evolutionary distance (unit = substitutions per site, sps)

Page 5:

Molecular evidence of HIV transmission in a criminal case

Introduction - Anecdotes

Metzker, Michael L. et al. (2002) Proc. Natl. Acad. Sci. USA 99, 14292-14297

Page 6:

Criminal investigation

In August 1994, a nurse tests negative for HIV and breaks off a messy 10-year affair with a doctor. Three weeks later, the doctor gives his ex-mistress a vitamin B-12 shot.

In January 1995, the nurse tests positive for both HIV and hepatitis C.

The doctor’s office records from that day are missing (but are eventually found). The doctor had drawn blood samples from a known HIV patient and a known hepatitis C patient on the same day as the vitamin B-12 shot. The nurse had never had contact with either patient.

Introduction - Anecdotes

Circumstantial evidence suggests that the doctor injected blood from one of his patients into his ex-girlfriend…

How can this be proved using a phylogenetic approach?

Page 7:

HIV – short background

Extreme heterogeneity: within each patient there are many different viral strains ("quasi-species")

Introduction - Anecdotes

Page 8:

History of the virus:

[Figure: phylogeny of HIV gp120 sequences; labeled groups: PATIENT, VICTIM, CONTROLS. ©2002 National Academy of Sciences, U.S.A.]

Introduction - Anecdotes

Page 9:

History of the virus:

[Figure: phylogeny of HIV RT sequences; labeled groups: VICTIM, PATIENT]

Introduction - Anecdotes

Source sequences that are paraphyletic (other sequences are nested within them) with respect to the recipient sequences provide evidence for the direction of transmission.
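The nesting argument can be made concrete on a toy tree. The sketch below is an illustration only: the rooted nested-tuple tree and the labels C1, C2 (controls), P1 to P3 (source patient) and V1, V2 (recipient) are made up, not data from Metzker et al. It assumes a simple reading of the criterion: the recipient sequences form a clade, source and recipient together form a clade, but the source sequences alone do not.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every subtree (every clade) of a rooted tree."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some subtree contains exactly the members of `group`."""
    return frozenset(group) in set(clades(tree))


def source_is_paraphyletic(tree, source, recipient):
    """True if the recipient clade is nested inside the source sequences."""
    together = set(source) | set(recipient)
    return (is_monophyletic(tree, recipient)
            and is_monophyletic(tree, together)
            and not is_monophyletic(tree, source))


# Hypothetical toy tree: victim sequences nested within patient sequences.
toy_tree = ("C1", ("C2", ("P1", ("P2", ("P3", ("V1", "V2"))))))
patient = {"P1", "P2", "P3"}
victim = {"V1", "V2"}

print("patient -> victim supported:", source_is_paraphyletic(toy_tree, patient, victim))
print("victim -> patient supported:", source_is_paraphyletic(toy_tree, victim, patient))
```

In a real analysis the tree would come from a phylogeny program and the support for each clade would also need to be assessed; this sketch only checks the topology.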

Page 10:

Phylogenetic analysis: Not only among organisms – Cancer phylogeny

A phylogeny of acute myeloid leukemia (AML) subtypes

Riester et al. 2010; Liu et al. 2009

Page 11:

Phylogenetic analysis: Not only in biology – Language evolution

Russell and Atkinson. 2003

Researchers study the evolution of languages by treating them like genomes.

Instead of COGs (gene families), they analyze COGNATES (word families)

Page 12:

Comparative Genomics – "All life is one"

Compare homologous sequences

Page 13:

Newick format with branch lengths

(A:0.3,((B1:0.1,B2:0.1):0.3,(C1:0.1,C2:0.1):0.5):0.3);

[Figure: the Newick tree above drawn in FigTree, with leaves A, B1, B2, C1 and C2 and a scale bar of 0.1]

http://tree.bio.ed.ac.uk/software/figtree/
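To see how the Newick string encodes the tree, here is a minimal sketch of a parser, written for this example rather than taken from any tool. It assumes a string without whitespace, quotes or comments that ends in ";", and it treats a missing branch length as 0. The function names are made up for illustration.

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                       # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                   # skip ","
                children.append(node())
            pos += 1                       # skip ")"
        start = pos
        while text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]             # empty for unnamed internal nodes
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def root_to_leaf(tree, so_far=0.0):
    """Sum branch lengths from the root down to every leaf."""
    name, length, children = tree
    here = so_far + length
    if not children:
        return {name: here}
    distances = {}
    for child in children:
        distances.update(root_to_leaf(child, here))
    return distances


newick = "(A:0.3,((B1:0.1,B2:0.1):0.3,(C1:0.1,C2:0.1):0.5):0.3);"
tree = parse_newick(newick)
for leaf, dist in sorted(root_to_leaf(tree).items()):
    print(leaf, round(dist, 3))
```

Each pair of parentheses is one internal node, and the number after each ":" is the length of the branch leading to that node or leaf, which is what FigTree draws horizontally.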

Page 14:

Alignment and phylogeny are mutually dependent

[Figure: workflow from unaligned sequences, via sequence alignment, to an MSA, and via phylogeny reconstruction to a tree (scale bar 0.4); one step is labeled "Inaccurate tree building"]

Page 15:

Multiple sequence alignment (MSA)

Several advanced MSA programs are available. Today we will use two:

• MAFFT – fastest and one of the most accurate

• PRANK – distinct from all other MSA programs because of its correct treatment of insertions/deletions

Page 16:

MAFFT

Web server & download: http://align.bmr.kyushu-u.ac.jp/mafft/online/server/

Efficiency-tuned variants: quick & dirty or slow but accurate

Kazutaka Katoh, Kazuharu Misawa, Kei-ichi Kuma and Takashi Miyata. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 2002, Vol. 30, No. 14, 3059-3066. © 2002 Oxford University Press

Page 17:

Choosing a MAFFT strategy

[Figure: MAFFT strategy options, ordered from quick & dirty to slow but accurate]

Page 20:

Choosing a MAFFT strategy

L-INS-i

ooooooooooooooooooooooooooooooooXXXXXXXXXXX-XXXXXXXXXXXXXXX------------------

--------------------------------XX-XXXXXXXXXXXXXXX-XXXXXXXXooooooooooo-------

------------------ooooooooooooooXXXXX----XXXXXXXX---XXXXXXXooooooooooo-------

--------ooooooooooooooooooooooooXXXXX-XXXXXXXXXX----XXXXXXXoooooooooooooooooo

--------------------------------XXXXXXXXXXXXXXXX----XXXXXXX------------------

G-INS-i

XXXXXXXXXXX-XXXXXXXXXXXXXXX

XX-XXXXXXXXXXXXXXX-XXXXXXXX

XXXXX----XXXXXXXX---XXXXXXX

XXXXX-XXXXXXXXXX----XXXXXXX

XXXXXXXXXXXXXXXX----XXXXXXX

E-INS-i

oooooooooXXX------XXXX---------------------------------XXXXXXXXXXX-XXXXXXXXXXXXXXXooooooooooooo

---------XXXXXXXXXXXXXooo------------------------------XXXXXXXXXXXXXXXXXX-XXXXXXXX-------------

-----ooooXXXXXX---XXXXooooooooooo----------------------XXXXX----XXXXXXXXXXXXXXXXXXooooooooooooo

---------XXXXX----XXXXoooooooooooooooooooooooooooooooooXXXXX-XXXXXXXXXXXX--XXXXXXX-------------

---------XXXXX----XXXX---------------------------------XXXXX---XXXXXXXXXX--XXXXXXXooooo--------

[Axis label: quick & dirty to slow but accurate]
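For those using the downloaded MAFFT rather than the web server, the strategies above correspond to command-line options. The sketch below only builds and prints the commands; it does not run anything. The option sets are our understanding of the MAFFT manual and should be checked against the installed version, and the helper name mafft_command is made up for illustration.

```python
import shlex

# Assumed mapping from strategy name to MAFFT options (verify with the manual).
MAFFT_OPTIONS = {
    "auto": ["--auto"],
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],
    "E-INS-i": ["--genafpair", "--maxiterate", "1000"],
}


def mafft_command(strategy, fasta_path):
    """Return the argument list for one MAFFT run (nothing is executed)."""
    if strategy not in MAFFT_OPTIONS:
        raise ValueError(f"unknown strategy: {strategy}")
    return ["mafft", *MAFFT_OPTIONS[strategy], fasta_path]


for name in MAFFT_OPTIONS:
    print(name, "->", shlex.join(mafft_command(name, "fahA.fas")))
```

MAFFT writes the alignment to standard output, so an actual run would redirect it to a file, for example with subprocess.run(cmd, stdout=handle, check=True).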

Page 21:

MAFFT output

Saving the output:

• Choose a format: Clustal, Fasta, or click "Reformat" to convert to a selection of other formats

• Save page as a text file

A colored view of the alignment

Page 22:

PRANK

Page 23:

Classical alignment errors for HIV env

Page 24:

PRANK Web server: http://www.ebi.ac.uk/goldman-srv/webPRANK/

Page 25:

PRANK output

If you need a different format – copy the results to the READSEQ sequence converter: http://www-bimas.cit.nih.gov/molbio/readseq/

Page 26:

Downloadable PRANK: http://www.ebi.ac.uk/goldman-srv/prank/prank/

• PRANK: a command-line program

• PRANKSTER: a program with a graphical user interface

Page 27:

1. Download and unzip the sequence files from my homepage (Google "Ofir Cohen" and look for the workshop materials under "Teaching"). Open "fahA.fas" in Notepad – these are 65 protein sequences in FASTA format.

2. Run PRANKSTER, open the "fahA.fas" file, and run "Alignment" → "Make alignment"

3. While you wait: Copy the sequences into the MAFFT web server and run the "automatic" "moderate" strategy – which strategy did MAFFT choose for you? Click "Reformat", choose "phylip|phylip4", and save as "fahA.mafft.phylip"

4. When PRANKSTER finishes, click File → Save, and save the MSA in Phylip format under the name "fahA.prank.phylip"
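As a quick sanity check before aligning, the FASTA file can also be read with a few lines of code. This is a minimal sketch, not part of the workshop materials: the function name read_fasta and the two toy records are made up, and each record is keyed by the first word of its header, so duplicate names would overwrite each other.

```python
def read_fasta(lines):
    """Read FASTA records from an iterable of lines into {name: sequence}."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]     # first word of the header line
            parts[name] = []
        elif name is not None:
            parts[name].append(line)       # sequences may span several lines
    return {n: "".join(chunks) for n, chunks in parts.items()}


# Made-up toy input standing in for the real file.
toy = """>seq1 hypothetical protein
MKVLAAGIV
GSTL
>seq2
MKILAAGLV
"""
records = read_fasta(toy.splitlines())
for name, seq in records.items():
    print(name, len(seq))

# With the workshop file one would instead write:
#   with open("fahA.fas") as handle:
#       records = read_fasta(handle)
# and len(records) should then match the 65 sequences mentioned in step 1.
```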

Page 28:

Phylogeny reconstruction

Different approaches (algorithms / programs):

• Distance-based methods (e.g. neighbor-joining, as in ClustalW): fast but inaccurate

• Maximum parsimony (e.g. MEGA)

• Maximum likelihood methods (e.g. phyML, RAxML): accurate but slower

• Bayesian methods (e.g. MrBayes): most accurate but very slow

[Figure: distance-based pipeline for sequences A to E: MSA, pairwise distance table, guide tree]
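The pairwise distance table that distance-based methods start from can be sketched in a few lines. The example below uses the simplest possible measure, the p-distance (fraction of differing sites), and skips columns where either sequence has a gap; real programs usually apply a substitution-model correction on top of this. The toy alignment and function names are made up for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites, ignoring columns with a gap in either row."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_table(msa):
    """Build a symmetric {name: {name: distance}} table from an alignment."""
    names = list(msa)
    table = {n: {n: 0.0} for n in names}
    for a, b in combinations(names, 2):
        d = p_distance(msa[a], msa[b])
        table[a][b] = d
        table[b][a] = d
    return table


# Made-up toy alignment (all rows have the same length).
toy_msa = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACCTAC-A",
}
table = distance_table(toy_msa)
for row in toy_msa:
    print(row, [round(table[row][col], 3) for col in toy_msa])
```

A neighbor-joining program would then repeatedly join the closest pair in such a table to build the tree.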

Page 29:

PhyML

The most widely used maximum likelihood (ML) program

Web server & download: http://www.atgc-montpellier.fr/phyml/

Accepts input MSA in PHYLIP format only:

• Interleaved

• Sequential
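For reference, the sequential flavor of PHYLIP is simple enough to write by hand: a header line with the number of sequences and the number of alignment columns, then one line per sequence. The sketch below is an assumption-laden illustration, not the converter used in the workshop: it pads names to 10 characters, rejects longer names instead of truncating them, and the function name and toy alignment are made up.

```python
def to_phylip_sequential(msa):
    """Format an aligned {name: sequence} dict as sequential PHYLIP text."""
    lengths = {len(seq) for seq in msa.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    for name in msa:
        if len(name) > 10 or any(ch.isspace() for ch in name):
            raise ValueError(f"name not usable in a 10-character field: {name!r}")
    n_sites = lengths.pop()
    lines = [f"{len(msa)} {n_sites}"]
    for name, seq in msa.items():
        # Name padded to 10 characters, then a space, then the whole sequence.
        lines.append(f"{name:<10} {seq}")
    return "\n".join(lines) + "\n"


# Made-up toy alignment of two short protein sequences.
toy_msa = {"seqA": "MKV-LA", "seqB": "MKILLA"}
phylip_text = to_phylip_sequential(toy_msa)
print(phylip_text, end="")
```

In the interleaved flavor the same sequences are instead written in blocks of columns, with the names appearing only in the first block.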

Page 30:

1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the phyML webserver (don't forget to choose "Amino-acids" and enter your email)

2. Run it with the local installation of "phyml.bat"

• You should end up with a file: "fahA.prank.phylip_phyml_tree.txt"

Page 31:

RAxML

Web server: http://phylobench.vital-it.ch/raxml-bb/

Similar maximum likelihood (ML) methodology to phyML, but much faster:

• Faster results

• Better results in the same run-time

Page 32:

1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the RAxML webserver (don't forget to tick "Protein sequences" and enter your email)

• Save the resulting tree file as: "fahA.prank.phylip.raxml"

Page 33:

FigTree: tree visualization and figure creation

Manipulate a node

Manipulate a clade

Manipulate a taxon

Page 34:

1. Open "fahA.prank.phylip_phyml_tree.txt" in FigTree

2. Play around with the different options and make a pretty figure!

   a. Find out how to color specific clades, as below

   b. Try each of the three options under "Layout"

   c. Export a figure in PDF format (File → Export Graphic…)

Page 35:

Final Questions…

Thanks for your attention