1 Construction of Phylogenetic Trees Walter M. Fitch and Emanuel Margoliash Science, New Series,...

Post on 29-Dec-2015


1

Construction of Phylogenetic Trees

Walter M. Fitch and Emanuel Margoliash
Science, New Series, Volume 155, Issue 3760 (Jan. 20, 1967), 279-284

Speaker: Fang-Ling Lin

Advisor: Prof. R. C. T. Lee

National Chi-Nan University

2

Outline

• Basic terms
• Constructing the phylogenetic tree
• Analyzing the phylogenetic tree
• Reconstructing the ancestral cytochrome c amino acid sequences

3

Introduction

Biochemists have attempted to use quantitative estimates of variance between substances obtained from different species to construct phylogenetic trees.

These methods have not been completely satisfactory, for three reasons:

1. Restricted
2. Accuracy
3. Mathematical

4

What is cytochrome c?

Cytochrome c is a protein that participates in the metabolism of the mitochondrion.

When it is released from the mitochondrion into the cytoplasm, it triggers the death of the cell (apoptosis).

5

Determining the Mutation Distance

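Mutation distance: the minimal number of nucleotides that would need to be altered in order for the gene for one cytochrome to code for the other.

Example: aligning ACTGAT with TCTATC

A C T G A T -
T C T - A T C

This alignment implies three changes: one substitution (A/T), one deletion (G) and one insertion (C).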
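The count of three changes in this toy alignment can be checked with a unit-cost edit distance. This is a simplification: the paper's distances come from protein sequences (the minimum number of nucleotide changes implied by the genetic code), whereas the hypothetical function `mutation_distance` below works directly on nucleotide strings and charges one change per substitution, insertion or deletion.

```python
def mutation_distance(s: str, t: str) -> int:
    """Minimum number of single-nucleotide substitutions, insertions and
    deletions that turn s into t (unit-cost edit distance)."""
    # prev[j] holds the distance between the prefix of s processed so far and t[:j].
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, tc in enumerate(t, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete sc
                curr[j - 1] + 1,           # insert tc
                prev[j - 1] + (sc != tc),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


print(mutation_distance("ACTGAT", "TCTATC"))  # 3
```

Because the two strings have equal length, no alignment with fewer than three changes exists: two substitutions are not enough (four positions differ), and a single deletion plus a single insertion cannot make them match.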

6

Problem

Given: the mutation distances between every pair of proteins

Output: phylogenetic tree

7

The construction of the tree

Assume there are three proteins, A, B and C, with the mutation distances below.

There are two fundamental problems:
1. Which pair does one join together first?
2. What are the lengths of legs a, b, and c?

     B    C
A   24   28
B        32

8

Which pair does one join together first?

Simply choose the pair with the smallest mutation distance: here, A and B (24).

9

What are the lengths of legs a, b, and c?

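$a + b = 24,\quad a + c = 28,\quad b + c = 32$

Adding the first two equations and subtracting the third gives $2a = 24 + 28 - 32$, and likewise for b and c:

$a = 10,\quad b = 14,\quad c = 18$

[Figure: tree joining A, B and C by legs a, b and c]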
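The same elimination works for any three distances. Here is a minimal sketch (the name `three_legs` is hypothetical) that solves the three equations in closed form, assuming the tree must reproduce the three input distances exactly:

```python
def three_legs(d_ab: float, d_ac: float, d_bc: float) -> tuple[float, float, float]:
    """Leg lengths (a, b, c) of a three-leaf tree whose path lengths reproduce
    d_ab = a + b, d_ac = a + c and d_bc = b + c exactly."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


print(three_legs(24, 28, 32))  # (10.0, 14.0, 18.0)
```

Nothing in these formulas forces a leg to be non-negative. If one input distance exceeds the sum of the other two, a leg comes out negative.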

10

When information from more than three proteins is utilized

When information from more than three proteins is utilized, the basic procedure is the same. One simply joins two subsets to create a single subset, and repeats until all proteins are members of a single subset. As the example below shows, distances to a newly formed subset are averages of the distances to its members.
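The joining loop can be sketched as follows. This covers only the join order and the averaged distance tables, not the leg lengths. The update rule is an assumption inferred from the example tables: the new subset's distance to every other subset is the simple mean of the two joined rows, without weighting by subset size. The names `pair` and `join_sequence` are hypothetical.

```python
from itertools import combinations


def pair(a: str, b: str) -> frozenset:
    """Unordered key for the distance between subsets a and b."""
    return frozenset((a, b))


def join_sequence(labels: list[str], dist: dict) -> list[tuple[str, str, float]]:
    """Repeatedly join the closest pair of subsets until one subset remains.

    dist maps pair(x, y) to the distance between subsets x and y.
    Returns (subset, subset, distance) for each join, in order.
    """
    clusters = list(labels)
    d = dict(dist)
    joins = []
    while len(clusters) > 1:
        # Closest pair; on ties, min() keeps the first pair in list order.
        a, b = min(combinations(clusters, 2), key=lambda p: d[pair(*p)])
        joins.append((a, b, d[pair(a, b)]))
        merged = f"{a},{b}"
        clusters = [c for c in clusters if c not in (a, b)]
        for k in clusters:
            # Assumed update: simple mean of the two joined rows.
            d[pair(merged, k)] = (d[pair(a, k)] + d[pair(b, k)]) / 2
        clusters.append(merged)
    return joins


# Upper triangle of the 5-protein example matrix.
UPPER = {
    ("1", "2"): 1, ("1", "3"): 13, ("1", "4"): 17, ("1", "5"): 16,
    ("2", "3"): 12, ("2", "4"): 16, ("2", "5"): 15,
    ("3", "4"): 10, ("3", "5"): 8,
    ("4", "5"): 1,
}
joins = join_sequence(["1", "2", "3", "4", "5"],
                      {pair(*p): v for p, v in UPPER.items()})
for a, b, d_ab in joins:
    print(f"join {a} + {b} at {d_ab}")
```

On the example matrix this joins {1,2}, then {4,5}, then 3 with {4,5} at 9, and finally {1,2} with {3,4,5} at 14.25. This matches the tables in the example. The tie between d(1,2) and d(4,5), both 1, is broken by list order.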

11

Example: 5 proteins

     1    2    3    4    5
1    0    1   13   17   16
2         0   12   16   15
3              0   10    8
4                   0    1
5                        0

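1 and 2 (distance 1) are joined first. Distances to the new subset are averages:

       1,2    3      4      5
1,2    0     12.5   16.5   15.5
3             0     10      8
4                    0      1
5                           0

where 12.5 = (13+12)/2, 16.5 = (17+16)/2 and 15.5 = (16+15)/2.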

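Legs for 1 (a), 2 (b) and the remaining subset {3,4,5} (c), averaging over the original distances:

$a + b = 1,\quad a + c = (13 + 17 + 16)/3 = 15.33,\quad b + c = (12 + 16 + 15)/3 = 14.33$

$a = 1,\quad b = 0,\quad c = 14.33$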
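Here is a sketch of this step: the two proteins being joined play A and B, and every remaining protein is pooled into C by averaging its original distances. The helper names `d` and `legs_vs_rest` are hypothetical. Exact fractions are used so that the slide's rounded 15.33 and 14.33 stay exact.

```python
from fractions import Fraction

# Upper triangle of the 5-protein example, keyed by (i, j) with i < j.
D = {
    (1, 2): 1, (1, 3): 13, (1, 4): 17, (1, 5): 16,
    (2, 3): 12, (2, 4): 16, (2, 5): 15,
    (3, 4): 10, (3, 5): 8,
    (4, 5): 1,
}


def d(i: int, j: int) -> int:
    """Symmetric lookup into the upper-triangle table."""
    return D[(min(i, j), max(i, j))]


def legs_vs_rest(x: int, y: int, rest: list[int]) -> tuple[Fraction, Fraction, Fraction]:
    """Legs a (to x), b (to y) and c (to the pooled 'rest' group)."""
    d_xy = Fraction(d(x, y))
    d_xc = Fraction(sum(d(x, k) for k in rest), len(rest))  # mean distance x -> rest
    d_yc = Fraction(sum(d(y, k) for k in rest), len(rest))  # mean distance y -> rest
    a = (d_xy + d_xc - d_yc) / 2
    b = (d_xy + d_yc - d_xc) / 2
    c = (d_xc + d_yc - d_xy) / 2
    return a, b, c


a, b, c = legs_vs_rest(1, 2, [3, 4, 5])
print(a, b, round(float(c), 2))  # 1 0 14.33
```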

12

Example: 5 proteins

4 and 5 (distance 1) are joined next:

        1,2    3     4,5
1,2     0     12.5   16
3              0      9
4,5                   0

where 16 = (16.5+15.5)/2 and 9 = (10+8)/2.

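Legs for 4 (a), 5 (b) and the rest, {1,2} and 3 (c):

$a + b = 1,\quad a + c = (16.5 + 10)/2 = 13.25,\quad b + c = (15.5 + 8)/2 = 11.75$

$a = 1.25,\quad b = -0.25,\quad c = 12$

Note that the method can yield a slightly negative leg, as b does here.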

13

Example: 5 proteins

        1,2    3,4,5
1,2     0      14.25
3,4,5          0

where 14.25 = (12.5+16)/2.

3 is joined to {4,5} (distance 9). Legs for 3 (a), {4,5} (b) and {1,2} (c):

$a + b = 9,\quad a + c = 12.5,\quad b + c = 16$

$a = 2.75,\quad b = 6.25,\quad c = 9.75$

14

Example: 5 proteins

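Legs b = 6.25 and c = 9.75 end at subsets rather than at single proteins, so each is the average distance from the new node to that subset's leaves. Subtracting the legs already inside each subset gives the internal edges: x, between the new node and the node joining 4 and 5, and y, between the new node and the node joining 1 and 2.

$\frac{(x + 1.25) + (x - 0.25)}{2} = 6.25 \;\Rightarrow\; x = 5.75$

$\frac{(y + 1) + (y + 0)}{2} = 9.75 \;\Rightarrow\; y = 9.25$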
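Each of these equations has the form "mean over the subset's leaves of (edge + depth of leaf) = leg". Solving for the edge gives the leg minus the mean depth. Below is a minimal sketch; `internal_edge` and `leaf_depths` are hypothetical names. For deeper subsets, the depths would be full path lengths from the subset's node down to each leaf, not just the last leg.

```python
def internal_edge(avg_leg: float, leaf_depths: list[float]) -> float:
    """Solve mean(e + depth for depth in leaf_depths) == avg_leg for e.

    avg_leg: leg computed to an already-joined subset (average distance from
             the new node to that subset's leaves).
    leaf_depths: path lengths from the subset's node down to each of its leaves.
    """
    return avg_leg - sum(leaf_depths) / len(leaf_depths)


x = internal_edge(6.25, [1.25, -0.25])  # edge to the node joining 4 and 5
y = internal_edge(9.75, [1, 0])         # edge to the node joining 1 and 2
print(x, y)  # 5.75 9.25
```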

15

Testing Alternative Trees

This method is deterministic: the same input always produces the same output tree.

Because a particular assignment of species to the A and B subsets defines a tree, different assignments of species to A and B produce different trees, and each alternative can be tested.

Fig. 1 is the best of the 40 phylogenetic trees tested.

16

Phylogenetic Tree of 20 species


Fig.1

17

Reconstructed distances

Values in the upper-right half of the table are reconstructed distances, found by summing the leg lengths along the path between species i and j in Fig. 1.

[Table: species i by species j; original input values and reconstructed values]
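Fig. 1 itself is not reproduced here. As a sketch of the summing step, the code below applies it to the 5-protein tree from the worked example. The internal node names `n12`, `n45` and `n345` and the function `path_length` are hypothetical. Because a tree has exactly one path between any two nodes, a simple depth-first walk that never steps back to its parent finds that path.

```python
from itertools import combinations

# 5-protein tree from the worked example: n12 joins 1 and 2, n45 joins 4 and 5,
# n345 joins 3 with n45; y = 9.25 links n12 and n345, x = 5.75 links n345 and n45.
EDGES = [
    ("n12", "1", 1.0), ("n12", "2", 0.0), ("n12", "n345", 9.25),
    ("n345", "3", 2.75), ("n345", "n45", 5.75),
    ("n45", "4", 1.25), ("n45", "5", -0.25),
]


def path_length(edges, start, goal):
    """Sum of leg lengths on the unique path between two nodes of a tree."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, w in adj[node]:
            if nxt != parent:  # no cycles in a tree, so this suffices
                stack.append((nxt, node, dist + w))
    raise ValueError(f"{goal!r} is not reachable from {start!r}")


reconstructed = {
    (i, j): path_length(EDGES, i, j) for i, j in combinations("12345", 2)
}
print(reconstructed[("1", "4")])  # 17.25 (original input: 17)
```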

18

Standard deviation

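The fit of a tree is measured by the percentage change of each reconstructed distance from the input data.

Percent “standard deviation”: computed from these percentage changes, summed over all values of i < j.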
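The formula itself did not survive in this transcript, so the sketch below rests on a stated assumption. It takes each pair's deviation as a percentage of the original input, squares it, averages over all pairs i < j, and takes the square root. The paper's exact denominator may differ. The function name `percent_sd` is hypothetical, and original distances of 0 are not handled.

```python
import math


def percent_sd(original: dict, reconstructed: dict) -> float:
    """Percent 'standard deviation' of reconstructed vs. original distances.

    ASSUMED normalisation: root of the mean squared percentage deviation
    over all pairs i < j, relative to the original input value.
    """
    pct = [
        100 * (reconstructed[p] - original[p]) / original[p]
        for p in original
    ]
    return math.sqrt(sum(v * v for v in pct) / len(pct))


print(percent_sd({("A", "B"): 10}, {("A", "B"): 11}))  # 10.0
```

When comparing alternative trees, the one whose reconstructed distances give the lowest value is preferred.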

19

The statistically optimal tree

In testing phylogenetic alternatives, one is seeking to minimize the percent “standard deviation.”

Fig.1 has a percent “standard deviation” of 8.7, the lowest of the 40 alternatives so far tested.

The percent “standard deviation” for the initial tree was 12.3.

20

The statistically optimal tree

21

Fig. 1 is remarkably similar to the tree constructed in accord with classical zoological comparisons.

Almost all of the alternative phylogenetic schemes tested involved rearrangements within the groups of birds (turkey, chicken) and nonprimate mammals (cow, sheep, pig).

22

Three noticeable deviations

1. Birds of flight (Neognathae) and penguin (Impennae)
2. Kangaroo vs. nonprimate mammals, and placental mammals vs. marsupials
3. The turtle appears more closely associated with the birds than with its fellow reptile, the rattlesnake.

Fig.1

23

Indeed, today’s descendants of any common ancestor are equidistant from it with respect to time, but not genetically.

The method indicates the lines of descent in which the gene has undergone more rapid change.

For example, the mutation distance from the ancestral mammal to the primates is 7.5, whereas that to the nonprimate mammals is 5.8. The cytochrome c gene has therefore changed much more rapidly in the descent of the primates than in that of the other mammals (Fig. 1).

24

Reconstruction of the ancestral cytochrome c amino acid sequences.

The procedure depends on the phylogenetic tree on which these sequence data are arranged.
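To show concretely why the result depends on the tree, here is a sketch of a different, later technique: Fitch's small-parsimony procedure (1971), applied to one aligned amino acid position. The slides do not say that the 1967 reconstruction used this method. The function name `fitch_site` and the example residues are hypothetical. The sketch covers only the bottom-up pass. A top-down pass would still be needed to pick a single residue for each ancestor.

```python
from typing import Union

# A tree is either a leaf residue (one-letter code) or a (left, right) pair.
Tree = Union[str, tuple]


def fitch_site(tree: Tree) -> tuple[set[str], int]:
    """Bottom-up Fitch pass for one aligned position on a rooted binary tree.

    Returns the candidate residues at the root and the minimum number of
    changes needed along the tree.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    ls, lc = fitch_site(left)
    rs, rc = fitch_site(right)
    common = ls & rs
    if common:
        return common, lc + rc  # children agree: no extra change needed
    return ls | rs, lc + rc + 1  # children disagree: one change somewhere below


print(fitch_site((("V", "V"), ("W", "V"))))  # ({'V'}, 1)
```

The same four residues arranged as (("V", "W"), ("V", "V")) also give one change, but as (("V", "W"), ("W", "V")) they need two. The tree topology fixes which ancestral states are most parsimonious.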

25

Amino acid No.

Ancestral Mammal / Ancestral Primate / Monkey / Man / ---------- / Kangaroo / ---------- / Rabbit / ---------- / Dog / Ancestral Ungulate / Pig / Ancestral Perissodactyl / Donkey / Horse

17 18 21 39 41 50 52 53 56 64 66 68 89 94 95 98 109

V Q L H U P F A E I G L I E Q NS

V Q L H U P F S A E Y G L I Y Q N

V Q L H U P F S A E I G L I E Q N

V Q L H U P F E A E I G L I E Q N

V Q L H U V F S A E Y A L I A L N

W M S H U P O S L E Y A V I G L N

W M S H U P F S L W Y A V I G L N

V Q L N W P F S A W Y A L I Y L N

V Q L H U P F S A E Y G L E Y L I

V Q L H U P O S A E Y A L I G L N

W M S H U P O S L E Y A V I G L N

V Q L H U P F S A E Y A L I Y L N

V Q L H U P F S A E Y A L I Y L N

V Q L H U P O S A E Y G L I Y L N

Y YV Q L H U P F S A E G L I Q N

26

There is presently no detectable relationship between the primary structures of cytochrome c and those of the hemoglobins. Reconstructing and comparing the ancestral amino acid sequences may reveal a homology that cannot be detected in present-day proteins.

The employment of such ancestral sequences may be generally useful for detecting common ancestry not otherwise observable.

27

Thank you!