Phylogenetic Trees Tutorial 6. Measuring distance Bottom-up algorithm (Neighbor Joining) –Distance...

Phylogenetic Trees Tutorial 6
• Date posted: 21-Dec-2015



• Measuring distance

• Bottom-up algorithm (Neighbor Joining)
  – Distance-based algorithm
  – Relative-distance-based algorithm


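Measuring Distance

• Problem: as sequences diverge, the observed fraction of differing sites approaches the fraction expected by chance between unrelated sequences, so the distance measure converges (saturates) instead of continuing to grow.

• Jukes-Cantor correction:

  $d_{i,j} = -\frac{3}{4}\ln\left(1 - \frac{4}{3} f_{i,j}\right)$

  where $f_{i,j}$ is the fraction of sites where the residues of sequences i and j differ.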
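A minimal Python sketch of the Jukes-Cantor correction, assuming two already aligned DNA sequences of equal length and counting every position (gap handling is not covered here). When f reaches 3/4 the argument of the logarithm is no longer positive, so the sketch returns infinity instead of a number.

```python
import math

def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    # f: fraction of sites where the residues differ (every position counted)
    f = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if f >= 0.75:
        # at or beyond the chance level the correction is undefined
        return math.inf
    return -0.75 * math.log(1 - 4 * f / 3)

# 2 differences out of 10 sites: f = 0.2
print(jukes_cantor("ACGTACGTAC", "ACGTACGTTT"))
```

Note that the corrected distance is always larger than f itself, and the gap widens as f approaches 0.75.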

Measuring Distance (cont.)

• Euclidean distance: given a multiple sequence alignment of n positions, take the square root of the sum of the squared scores at every position between two sequences a and b:

  $d_{a,b} = \sqrt{\sum_{i=1}^{n} s(a_i, b_i)^2}$

• The score $s(a_i, b_i)$ increases in proportion to the dissimilarity between the residues $a_i$ and $b_i$.
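This is a short sketch of the Euclidean distance formula. The scoring function `mismatch_score` is a hypothetical stand-in that returns 0 for identical residues and 1 otherwise. A real analysis would plug in a dissimilarity score derived from a substitution matrix.

```python
import math
from typing import Callable

def mismatch_score(x: str, y: str) -> float:
    """Hypothetical score: 0 for identical residues, 1 for different ones."""
    return 0.0 if x == y else 1.0

def euclidean_distance(a: str, b: str,
                       score: Callable[[str, str], float] = mismatch_score) -> float:
    """d(a, b) = sqrt(sum over aligned positions i of s(a_i, b_i)^2)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return math.sqrt(sum(score(x, y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance("ACGT", "TGCA"))  # 4 mismatches -> sqrt(4) = 2.0
```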

Star Structure

Assumption: sequences diverge at a constant rate, so the distance to the root is equal for all leaves.

[Figure: star tree with leaves a, b, c, d]

     a  b  c  d
  a  0  8  7  5
  b  8  0  3  9
  c  7  3  0  8
  d  5  9  8  0

Basic Algorithm

Start from the initial star diagram and the distance matrix of a, b, c, d shown above.

Selection step

Choose the nodes with the shortest distance and fuse them.

[Figure: the closest pair, b and c (distance 3), is fused into a new node e; a and d are then fused into a new node f]
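The selection step on the star matrix can be sketched as a minimum over all pairs. Storing one entry per unordered pair in a dictionary is an assumption of this sketch.

```python
from itertools import combinations

# Distance matrix of the star example (leaves a, b, c, d), one entry per pair
LABELS = ["a", "b", "c", "d"]
D = {
    ("a", "b"): 8, ("a", "c"): 7, ("a", "d"): 5,
    ("b", "c"): 3, ("b", "d"): 9, ("c", "d"): 8,
}

def closest_pair(labels, dist):
    """Return (i, j, d_ij) for the pair with the smallest distance."""
    i, j = min(combinations(labels, 2), key=lambda p: dist[p])
    return i, j, dist[(i, j)]

print(closest_pair(LABELS, D))  # ('b', 'c', 3)
```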

Neighbor Joining Algorithm

Constructs an unrooted tree.

Step by step summary:

1. Calculate all pairwise distances.

2. Pick the two nodes (i and j) for which the distance is minimal.

3. Define a new node (X) and re-calculate the distances from the free nodes to the new node.

4. Calculate D_iX and D_jX, the distances from the chosen nodes i and j to the new node X, as well as the distance from X to all other nodes.

5. Repeat until only two nodes remain, then connect them with an edge.
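The following is a sketch of this simplified merging loop on the four-leaf star matrix. It assumes a symmetric dict-of-dicts matrix without diagonal entries. New nodes get hypothetical names x1, x2, and so on. For brevity it tracks only the merge order and leaves out the branch lengths of step 4.

```python
from itertools import combinations

def simplified_join(dist):
    """Repeatedly fuse the closest pair into a new node.

    dist: symmetric dict-of-dicts without diagonal entries.
    Returns the merges (i, j, new_node) and the final pair joined by an edge.
    """
    dist = {k: dict(v) for k, v in dist.items()}  # keep the caller's matrix intact
    merges = []
    while len(dist) > 2:
        # Step 2: the pair (i, j) with the minimal distance
        i, j = min(combinations(sorted(dist), 2), key=lambda p: dist[p[0]][p[1]])
        # Step 3: a new node x and its distances to the free nodes
        x = f"x{len(merges) + 1}"
        d_ij = dist[i][j]
        row = {m: (dist[i][m] + dist[j][m] - d_ij) / 2
               for m in dist if m not in (i, j)}
        del dist[i], dist[j]
        for m in dist:
            del dist[m][i], dist[m][j]
            dist[m][x] = row[m]
        dist[x] = row
        merges.append((i, j, x))
    # Step 5: the last two nodes are connected by an edge
    return merges, tuple(dist)

STAR = {
    "a": {"b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8},
}
print(simplified_join(STAR))
```

On the star matrix this fuses b and c first and then a and d, which matches the selection-step figure.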

Simplified Neighbor Joining (merging close sequences; not the actual algorithm)

Pick the two nodes (i, j) for which the distance is minimal. In the example these are nodes 5 and 6; node 10 is the new node that joins them.

Re-calculate the distances from the new node

i, j: the fused nodes (5, 6)
X: the newly added node (node 10)
m: the remaining nodes in the star

$d_{X,m} = \frac{d_{i,m} + d_{j,m} - d_{i,j}}{2}$
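The re-calculation formula can be written as a small function. For illustration it is applied to fusing b and c in the star matrix from the basic algorithm rather than to nodes 5 and 6, whose full matrix is not given here.

```python
def new_node_distances(dist, i, j):
    """d(X, m) = (d(i, m) + d(j, m) - d(i, j)) / 2 for every remaining node m."""
    d_ij = dist[i][j]
    return {m: (dist[i][m] + dist[j][m] - d_ij) / 2
            for m in dist if m not in (i, j)}

# Star matrix from the basic algorithm (symmetric, no diagonal entries)
STAR = {
    "a": {"b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8},
}
print(new_node_distances(STAR, "b", "c"))  # {'a': 6.0, 'd': 7.0}
```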

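Calculate D_iX and D_jX

r: approximately the average distance from a node to the other nodes
L: the number of leaves left in the tree (leaf nodes represent taxa, sequences, etc.)

$d_{X,i} = \frac{d_{i,j} + r_i - r_j}{2}$

$d_{X,j} = \frac{d_{i,j} + r_j - r_i}{2} = d_{i,j} - d_{X,i}$

$r_i = \frac{\sum_k d_{i,k}}{L-2}, \qquad r_j = \frac{\sum_k d_{j,k}}{L-2}$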
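This sketch computes r and the two branch lengths. It again uses the star matrix as illustrative data and takes L to be the number of nodes currently in the matrix.

```python
def r(dist, i):
    """r_i = (sum over k of d(i, k)) / (L - 2), L = number of nodes left."""
    L = len(dist)
    return sum(dist[i].values()) / (L - 2)

def branch_lengths(dist, i, j):
    """Return (d(X, i), d(X, j)) for the new node X that joins i and j."""
    d_ij = dist[i][j]
    d_xi = (d_ij + r(dist, i) - r(dist, j)) / 2
    return d_xi, d_ij - d_xi

STAR = {
    "a": {"b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8},
}
print(branch_lengths(STAR, "b", "c"))  # (2.0, 1.0)
```

Here r_b = 10 and r_c = 9. Because b is on average farther from the other leaves, it receives the longer of the two branches.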

Calculate D_iX and D_jX (example with L = 9)

r5 = ΣD5k/(L-2) = 3.22406/(9-2) = 0.46058

r6 = ΣD6k/(L-2) = 3.22758/(9-2) = 0.461083

D10,5 = (D5,6 + r5 - r6)/2 = (0.06088 + 0.46058 - 0.461083)/2 = 0.0301886

D10,6 = D5,6 - D10,5 = 0.06088 - 0.0301886 = 0.0306914
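The arithmetic of this example can be reproduced directly. The only inputs are the sums ΣD5k and ΣD6k and the distance D5,6 taken from the slides.

```python
L = 9                              # leaves left in the tree at this step
sum_d5k, sum_d6k = 3.22406, 3.22758
d56 = 0.06088                      # D5,6

r5 = sum_d5k / (L - 2)
r6 = sum_d6k / (L - 2)
d10_5 = (d56 + r5 - r6) / 2        # branch from node 5 to new node 10
d10_6 = d56 - d10_5                # branch from node 6 to new node 10
print(f"r5={r5:.5f} r6={r6:.6f} D10,5={d10_5:.7f} D10,6={d10_6:.7f}")
```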

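[Figures: Steps 2 to 7 repeat the procedure on the remaining nodes; the branch lengths labelled in Step 2 are 0.080375 and 0.044625, and in Step 3 they are 0.069258 and 0.040447]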

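Problems

When sequences do not diverge at equal rates, the two nodes with the smallest distance are not necessarily neighbors in the true tree.

[Figure: a tree with leaves 1, 2, 3, 4; three branches of length 0.1 and two of length 0.4]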
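To illustrate the problem, the sketch below assumes one possible arrangement of the branch lengths in the figure. In this arrangement leaves 1 and 3 are neighbors, as are leaves 2 and 4. The 0.4 branches lead to leaves 3 and 4, and the three 0.1 branches lead to leaves 1 and 2 and form the internal edge. This topology is an assumption. Distances are exact fractions so that ties are not disturbed by rounding. The smallest raw distance pairs 1 with 2, which are not neighbors. The relative distance M from the following slides picks a true neighbor pair instead.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical tree ((1:0.1, 3:0.4):0.1, (2:0.1, 4:0.4)); distances = path lengths
D = {
    (1, 2): F("0.3"), (1, 3): F("0.5"), (1, 4): F("0.6"),
    (2, 3): F("0.6"), (2, 4): F("0.5"), (3, 4): F("0.9"),
}
LEAVES = [1, 2, 3, 4]

def d(i, j):
    return D[(min(i, j), max(i, j))]

def r(i):
    # r_i = sum_k d(i, k) / (L - 2)
    return sum(d(i, k) for k in LEAVES if k != i) / (len(LEAVES) - 2)

pairs = list(combinations(LEAVES, 2))
by_distance = min(pairs, key=lambda p: d(*p))                       # smallest d_ij
by_m = min(pairs, key=lambda p: d(*p) - (r(p[0]) + r(p[1])))        # smallest M_ij
print(by_distance, by_m)  # (1, 2) (1, 3)
```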

Neighbor Joining (Not assuming equal divergence)

Step by step summary:

1. Calculate all pairwise distances.

2. Pick the two nodes (i and j) for which the relative distance is minimal (lowest).

3. Define a new node (X) and re-calculate the distances from the free nodes to the new node.

4. Calculate D_iX and D_jX, the distances from the chosen nodes i and j to the new node X, as well as the distance from X to all other nodes.

5. Repeat until only two nodes remain, then connect them with an edge.
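Here is a compact sketch of the full loop. It assumes a symmetric dict-of-dicts matrix and gives new nodes the hypothetical names X1, X2, and so on. Ties in M are broken by taking the first pair in sorted label order; that is a choice of this sketch, not part of the method. The sketch is run on the six-taxon matrix from the example.

```python
from itertools import combinations

def neighbor_joining(dist):
    """Return joins (i, j, X, d_iX, d_jX) and the final edge (a, b, d_ab)."""
    dist = {k: dict(v) for k, v in dist.items()}
    joins = []
    while len(dist) > 2:
        L = len(dist)
        r = {n: sum(dist[n].values()) / (L - 2) for n in dist}
        # Step 2: minimise the relative distance M_ij = d_ij - (r_i + r_j)
        i, j = min(combinations(sorted(dist), 2),
                   key=lambda p: dist[p[0]][p[1]] - (r[p[0]] + r[p[1]]))
        d_ij = dist[i][j]
        # Step 4: branch lengths from i and j to the new node X
        d_ix = (d_ij + r[i] - r[j]) / 2
        d_jx = d_ij - d_ix
        # Step 3: distances from X to the remaining nodes
        x = f"X{len(joins) + 1}"
        row = {m: (dist[i][m] + dist[j][m] - d_ij) / 2
               for m in dist if m not in (i, j)}
        del dist[i], dist[j]
        for m in dist:
            del dist[m][i], dist[m][j]
            dist[m][x] = row[m]
        dist[x] = row
        joins.append((i, j, x, d_ix, d_jx))
    # Step 5: connect the last two nodes with an edge
    a, b = dist
    return joins, (a, b, dist[a][b])

def from_lower(labels, lower):
    """Build a symmetric dict-of-dicts from a lower-triangular matrix."""
    dist = {n: {} for n in labels}
    for row_idx, row in enumerate(lower, start=1):
        for col_idx, value in enumerate(row):
            a, b = labels[row_idx], labels[col_idx]
            dist[a][b] = dist[b][a] = value
    return dist

EXAMPLE = from_lower("ABCDEF", [[5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]])
joins, final = neighbor_joining(EXAMPLE)
for step in joins:
    print(step)
print(final)
```

On this matrix the first join is A and B, with branch lengths 1 and 4.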

Step 2. Pick the two nodes (i and j) for which the relative distance is minimal (lowest).

$M_{i,j} = d_{i,j} - (r_i + r_j)$

$r_i = \frac{\sum_k d_{i,k}}{L-2}, \qquad r_j = \frac{\sum_k d_{j,k}}{L-2}$

$M_{i,j} = d_{i,j} - (r_i + r_j)$

• The values of $M_{i,j}$ are negative.

• As the average distance from the common ancestor of i and j to the rest of the nodes increases, $M_{i,j}$ gets lower.

• Select the pair that produces the lowest value.

• Re-evaluate M in every iteration.

[Figure: nodes I and J joined through the new node X]
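Computing the full M matrix for the four-leaf star matrix shows how M can change the choice. The raw distances favour only (b, c), while M ties (a, d) and (b, c) at -16. The star matrix is used here only as illustrative data.

```python
from itertools import combinations

def relative_distances(dist):
    """M_ij = d_ij - (r_i + r_j), with r_i = sum_k d_ik / (L - 2)."""
    L = len(dist)  # number of nodes currently in the matrix
    r = {i: sum(dist[i].values()) / (L - 2) for i in dist}
    return {(i, j): dist[i][j] - (r[i] + r[j])
            for i, j in combinations(sorted(dist), 2)}

# Star matrix from the basic algorithm (symmetric, no diagonal entries)
STAR = {
    "a": {"b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8},
}

M = relative_distances(STAR)
best = min(M, key=M.get)   # first pair with the lowest M
print(best, M[best])       # ('a', 'd') -16.0
```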


Re-calculate the distances from the new node, and D_iX and D_jX, with the same formulas as in the simplified version: $d_{X,m} = \frac{d_{i,m} + d_{j,m} - d_{i,j}}{2}$, $d_{X,i} = \frac{d_{i,j} + r_i - r_j}{2}$ and $d_{X,j} = d_{i,j} - d_{X,i}$.


EXAMPLE

Original distance matrix:

     A   B   C   D   E
  B  5
  C  4   7
  D  7  10   7
  E  6   9   6   5
  F  8  11   8   9   8

Relative distance matrix ($M_{i,j}$), L = 6:

        A      B      C      D      E
  B   -13
  C   -11.5  -11.5
  D   -10    -10    -10.5
  E   -10    -10    -10.5  -13
  F   -10.5  -10.5  -11    -11.5  -11.5

The $M_{i,j}$ table is used only to choose the closest pair, not to calculate the distances.
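The relative distance matrix can be recomputed from the original matrix with r_i = Σ_k d_ik / (L - 2) and L = 6. The sketch prints it row by row and then lists the pairs with the lowest M.

```python
from itertools import combinations

LABELS = "ABCDEF"
# Lower triangle of the original distance matrix (rows B..F, columns A..E)
LOWER = [[5], [4, 7], [7, 10, 7], [6, 9, 6, 5], [8, 11, 8, 9, 8]]

def d(i, j):
    a, b = sorted((LABELS.index(i), LABELS.index(j)), reverse=True)
    return LOWER[a - 1][b]

L = len(LABELS)
r = {i: sum(d(i, k) for k in LABELS if k != i) / (L - 2) for i in LABELS}
M = {(i, j): d(i, j) - (r[i] + r[j]) for i, j in combinations(LABELS, 2)}

# Print the lower triangle of M in the same layout as the table
for j in LABELS[1:]:
    print(j, *[f"{M[(i, j)]:g}" for i in LABELS[:LABELS.index(j)]])

lowest = min(M.values())
print("lowest M:", lowest, [p for p, m in M.items() if m == lowest])
```

Both (A, B) and (D, E) reach the lowest value, -13, so either pair can be joined first.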

[Figure: four trees relating Bacillus, E.coli, Pseudomonas, Salmonella, Aeromonas, Lechevalieria and Burkholderia, with nodes numbered 1 to 7 and a 0.2 scale bar]

Problems with phylogenetic trees

Software

• PHYLIP: http://evolution.gs.washington.edu/phylip.html

• PAUP: http://paup.csit.fsu.edu/

• MEGA3: http://www.megasoftware.net/

• More software: http://evolution.genetics.washington.edu/phylip/software.html