Advanced programming 236512 Algorithms for reconstructing phylogenetic trees spring 2006


Lecturer: Shlomo Moran, Taub 639, tel 4363
TA: Ilan Gronau, Taub 700, tel 4894
Website: http://webcourse.cs.technion.ac.il/236512/

Evolution

Evolution of new organisms is driven by:

Diversity: different individuals carry different variants of the same basic blueprint.

Mutations: the DNA sequence can change through single-base changes, deletion/insertion of DNA segments, etc.

Selection bias

The Tree of Life

[Figure: the tree of life. Source: Alberts et al.]

Primate evolution

A phylogeny, also called a phylogenetic tree, is a tree that describes the sequence of speciation events that led to the formation of a set of current-day species.

Theory of Evolution

Basic idea: speciation events lead to the creation of different species.

Speciation is caused by physical separation into groups in which different genetic variants become dominant.

Any two species share a (possibly distant) common ancestor.

Phylogenetic trees

Leaves: current-day species (or taxa, the plural of taxon)
Internal vertices: hypothetical common ancestors
Edge lengths: “time” from one speciation to the next

[Figure: a phylogenetic tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]

Types of Trees

A natural model to consider is that of rooted trees.

[Figure: a rooted tree whose root is labeled “Common Ancestor”]

Types of trees

An unrooted tree represents the same phylogeny without the root node.

Usually, data from current-day species does not distinguish between different placements of the root.

Rooted versus unrooted trees

[Figure: three rooted trees (Tree a, Tree b, Tree c) next to one unrooted tree carrying the labels a, b, c]

The unrooted tree represents the three rooted trees.

Positioning Roots in Unrooted Trees

We can estimate the position of the root by introducing an outgroup: a set of species that are definitely distant from all the species of interest.

[Figure: unrooted tree with leaves Aardvark, Bison, Chimp, Dog, Elephant and the outgroup Falcon; the proposed root is marked]

Morphological vs. Molecular

Classical phylogenetic analysis used morphological features: number of legs, lengths of legs, etc.

Modern biological methods make it possible to use molecular features:
Gene sequences
Protein sequences

Analysis is based on homologous sequences (e.g., globins) in different species.

From sequences to a phylogenetic tree

Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG

There are many possible types of sequences to use (e.g., mitochondrial vs. nuclear proteins).

Type of Data

Distance-based (the project focuses on this method)
Input is a matrix of distances between species.
A distance can be the fraction of residues on which two species disagree, or the alignment score between them, or …

Character-based
Examine each character (e.g., residue) separately.
Not covered in this project.

Constructing trees from distances

Transform differences between species into numerical distances.

Find a weighted tree that realizes/approximates the distances between the species.

The task: given a set of species (leaves in a supposed tree) and distances between them, construct a phylogeny which best “fits” the distances.

[Author's note: Before the construction, the four points theorem should be inserted (from a separate file), replacing its previous proof in lecture 12. It may also be worth dropping UPGMA. This note of course also affects lecture 12. Shlomo, 12.3.03]

Exact solution: Additive sets

Given a set S of n objects with an n×n distance matrix d such that:
d(i,i) = 0, and for i ≠ j, d(i,j) > 0;
d(i,j) = d(j,i);
for all i, j, k, d(i,k) ≤ d(i,j) + d(j,k).

Can we construct a weighted tree which realizes these distances?

There is always a tree for 3 objects

For n=3: There is always a (unique) tree with one internal node v, joined to i, j, k by edges of lengths a, b, c:

d(i,j) = a + b
d(i,k) = a + c
d(j,k) = b + c

     i    j     k
i    0    a+b   a+c
j         0     b+c
k               0

Distance metrics on 4 objects may not have a tree.
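
For the three-object case, the edge lengths a, b, c follow by solving the three equations d(i,j) = a+b, d(i,k) = a+c, d(j,k) = b+c. A minimal Python sketch (the function name `three_point_tree` and the sample distances are illustrative choices, not taken from the slides):

```python
def three_point_tree(d_ij, d_ik, d_jk):
    """Edge lengths (a, b, c) of the star tree joining i, j, k to one internal node v."""
    a = (d_ij + d_ik - d_jk) / 2  # length of edge (i, v)
    b = (d_ij + d_jk - d_ik) / 2  # length of edge (j, v)
    c = (d_ik + d_jk - d_ij) / 2  # length of edge (k, v)
    return a, b, c


a, b, c = three_point_tree(5, 6, 7)
print(a, b, c)  # the three pairwise sums a+b, a+c, b+c give back 5, 6, 7
```

The triangle inequality guarantees that a, b, c are nonnegative, which is why a tree always exists for n=3.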

The Four Points Condition

Definition: A distance metric on n objects satisfies the four points condition iff any subset of four objects can be labeled i, j, k, l so that:

d(i,k) + d(j,l) = d(i,l) + d(k,j) ≥ d(i,j) + d(k,l)

[Figure: a tree on the four leaves i, j, k, l]

Theorem: A distance metric is additive if and only if it satisfies the four points condition.

Note: The four points condition implies an O(n⁴) algorithm, which is not very efficient.
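
The O(n⁴) test mentioned in the note can be sketched directly: for every quadruple, the two largest of the three pairwise sums must coincide. The sketch below assumes the matrix is given as a list of lists; the name `satisfies_four_points` and the tolerance parameter are illustrative choices, not part of the slides.

```python
from itertools import combinations


def satisfies_four_points(d, tol=1e-9):
    """Return True iff every 4 objects satisfy the four points condition."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # The two largest sums must be equal (up to tol).
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distances realized by the tree A-x:1, B-x:2, x-y:3, C-y:4, D-y:5.
additive = [[0, 3, 8, 9],
            [3, 0, 9, 10],
            [8, 9, 0, 9],
            [9, 10, 9, 0]]
# Shortest-path distances on a 4-cycle: a metric, but not additive.
cycle = [[0, 1, 2, 1],
         [1, 0, 1, 2],
         [2, 1, 0, 1],
         [1, 2, 1, 0]]
print(satisfies_four_points(additive), satisfies_four_points(cycle))
```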

Constructing additive trees: The neighbor joining problem

Let i, j be neighboring leaves in a tree, let v be their parent, and let k be any other vertex.

The formula

d(k,v) = ½ [d(k,i) + d(k,j) − d(i,j)]

shows that we can compute the distances of v to all other leaves.

[Figure: neighboring leaves i and j with parent v, and the path of length d(k,v) from v to k]
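
The formula for d(k,v) follows from additivity along the three paths that pass through v. A short derivation, written out here as a worked step (it is not spelled out on the slide):

```latex
\begin{align*}
d(k,i) &= d(k,v) + d(v,i), \\
d(k,j) &= d(k,v) + d(v,j), \\
d(i,j) &= d(i,v) + d(v,j), \\
d(k,i) + d(k,j) - d(i,j) &= 2\,d(k,v)
\quad\Longrightarrow\quad
d(k,v) = \tfrac{1}{2}\bigl[d(k,i) + d(k,j) - d(i,j)\bigr].
\end{align*}
```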

Constructing additive trees: The neighbor joining problem

This suggests the following method to construct a tree from a distance matrix:

1. Find neighboring leaves i,j in the tree.

2. Replace i,j by their parent v and recursively construct a tree T for the smaller set.

3. Add i,j as children of v in T.

Neighbor Finding

How can we find, from distances alone, a pair of neighboring leaves (also called cherries)?

Closest vertices aren’t necessarily neighboring leaves.

[Figure: a tree with leaves A, B, C, D illustrating this]

Neighbor Finding: Saitou&Nei method

Definitions:

For a leaf i, let r_i = Σ_{u is a leaf} d(i,u).

For leaves i, j: Q(i,j) = (n−2)·d(i,j) − r_i − r_j.

Theorem (Saitou&Nei): Assume all internal edge weights are positive. If Q(i,j) is minimal (among all pairs of leaves), then i and j are neighboring leaves in the tree.

S&N Neighbor Joining Algorithm

If n = 3, return the tree of three vertices.
Compute Q(i,j) for all i,j.
Choose i,j such that Q(i,j) is minimal.
Create a new vertex v, and set:

d(i,v) = ½ [d(i,j) + d(i,r) − d(j,r)]   (for some r)
d(j,v) = d(i,j) − d(i,v)   // or could be d(i,v) = d(j,v) = 0
for each vertex k: d(v,k) = ½ [d(i,k) + d(j,k) − d(i,j)]

Remove i,j, and add v to the set of objects.
Recursively construct a tree on the smaller set, then add i,j as children of v, at distances d(i,v) and d(j,v).

[Figure: leaves i and j joined at v, with the path of length d(k,v) from v to k]
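
The steps of the S&N algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: the input format (a list of labels plus a dict of dicts), the function name `neighbor_joining`, and the generated vertex names `v0`, `v1`, ... and `center` are all hypothetical, and the "some r" in the formula for d(i,v) is simply taken to be the first remaining object.

```python
def neighbor_joining(labels, d):
    """Saitou&Nei neighbor joining on a distance table.

    labels: list of at least 3 leaf names (assumed input format).
    d: dict of dicts with d[x][y] == d[y][x] for all distinct x, y.
    Returns a list of weighted edges (u, v, length).
    """
    d = {x: dict(d[x]) for x in labels}      # work on a copy
    nodes = list(labels)
    edges = []
    new_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        pairs = [(x, y) for a, x in enumerate(nodes) for y in nodes[a + 1:]]
        # Choose the pair minimising Q(i,j) = (n-2) d(i,j) - r_i - r_j.
        i, j = min(pairs,
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        others = [k for k in nodes if k != i and k != j]
        ref = others[0]                       # the "some r" of the formula
        d_iv = 0.5 * (d[i][j] + d[i][ref] - d[j][ref])
        d_jv = d[i][j] - d_iv
        v = f"v{new_id}"                      # assumed not to clash with leaf names
        new_id += 1
        d[v] = {}
        for k in others:
            d[v][k] = d[k][v] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        edges += [(i, v, d_iv), (j, v, d_jv)]
        nodes = others + [v]
    # Base case n = 3: a star with one internal node.
    i, j, k = nodes
    c = "center"
    edges.append((i, c, 0.5 * (d[i][j] + d[i][k] - d[j][k])))
    edges.append((j, c, 0.5 * (d[i][j] + d[j][k] - d[i][k])))
    edges.append((k, c, 0.5 * (d[i][k] + d[j][k] - d[i][j])))
    return edges


# Additive example: the tree with edges A-x:1, B-x:2, x-y:3, C-y:4, D-y:5.
D = {
    "A": {"B": 3, "C": 8, "D": 9},
    "B": {"A": 3, "C": 9, "D": 10},
    "C": {"A": 8, "B": 9, "D": 9},
    "D": {"A": 9, "B": 10, "C": 9},
}
for u, v, w in neighbor_joining(["A", "B", "C", "D"], D):
    print(u, v, w)
```

Recomputing every r_i and scanning all pairs in each round gives the O(n³) total running time; the sketch makes no attempt to be faster than that.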

Complexity of S&N Neighbor Joining Algorithm

Initialization: Θ(n²) to compute r(i) and Q(i,j) for all i,j ∈ L.

Each iteration:
O(n²) to find the minimal Q(i,j).
O(n) to compute {D(v,k) : k ∈ L} for the new node v, and to update the matrix.
O(n²) to update the values Q(i,j).

Total of O(n³).

Some remarks on S&N Neighbor Joining Algorithm

Applicable to matrices which are not additive.

Known to work well in practice.

The algorithm and its variants are the most widely used distance-based algorithms today.

Next we present a more efficient Neighbor Joining algorithm, which is based on LCA distances.

Least Common Ancestor distances

Definition: Given a weighted tree T and a specific vertex r in it:

dT(r;i,j) = distance in T from r to path(i,j).

dT(r;i,i) = distance in T from r to i.

[Figure: a weighted tree with vertex r and leaves A, B, C, D, E, annotated with its edge weights and its LCA distances]

Example: dT(r;A,D) = 3 and dT(r;A,A) = 7.

Least Common Ancestor distances

The distances dT(r;i,j) can be presented by a matrix:

     A    B    C    D    E
A    7    0    0    3    5
B         8    5    0    0
C              7    0    0
D                   5    3
E                        6

[Figure: the tree with vertex r and leaves A, B, C, D, E, annotated with its LCA distances]
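
When only pairwise distances are available, LCA distances with respect to a chosen vertex r can be obtained from the identity dT(r;i,j) = ½[d(r,i) + d(r,j) − d(i,j)], which holds for any tree metric. The slides do not state this identity; it is added here as a known fact, and the function name `lca_matrix` and the input format are illustrative assumptions.

```python
def lca_matrix(d, labels, r):
    """LCA distances w.r.t. r, computed from pairwise distances d (dict of dicts)."""
    leaves = [x for x in labels if x != r]
    L = {}
    for i in leaves:
        L[i] = {}
        for j in leaves:
            d_ij = 0 if i == j else d[i][j]
            # Distance from r to the path between i and j.
            L[i][j] = 0.5 * (d[r][i] + d[r][j] - d_ij)
    return L


# Distances realized by the tree A-x:1, B-x:2, x-y:3, C-y:4, D-y:5.
D = {
    "A": {"B": 3, "C": 8, "D": 9},
    "B": {"A": 3, "C": 9, "D": 10},
    "C": {"A": 8, "B": 9, "D": 9},
    "D": {"A": 9, "B": 10, "C": 9},
}
L = lca_matrix(D, ["A", "B", "C", "D"], r="D")
for row in ("A", "B", "C"):
    print(row, L[row])
```

With r = D, the diagonal holds the distances from D to each leaf, and L(A,B) = 8 is the distance from D to the vertex x where the paths to A and B split.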

LCA Matrices

Definition: A symmetric nonnegative matrix L is an LCA matrix iff

1. For each i: L(i,i) = max_j {L(i,j)}

2. It satisfies the “3 points condition”: for each 3 distinct indices i, j, k,

L(i,j) ≥ min {L(i,k), L(j,k)}

(“the smallest value appears twice”)

j k

i 11 9 6

j 8 6
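
Both conditions of the definition can be checked mechanically. The sketch below assumes a full symmetric matrix given as a list of lists; the name `is_lca_matrix` and the tolerance are illustrative choices. It is tried on the 5×5 LCA distance matrix over A to E shown earlier, filled in symmetrically.

```python
from itertools import combinations


def is_lca_matrix(L, tol=1e-9):
    """Check symmetry, nonnegativity, and the two LCA-matrix conditions."""
    n = len(L)
    for i in range(n):
        for j in range(n):
            if L[i][j] < -tol or abs(L[i][j] - L[j][i]) > tol:
                return False
    # Condition 1: each diagonal entry is the maximum of its row.
    for i in range(n):
        if abs(L[i][i] - max(L[i])) > tol:
            return False
    # Condition 2 (3 points condition): the smallest value appears twice.
    for i, j, k in combinations(range(n), 3):
        smallest, middle, _ = sorted([L[i][j], L[i][k], L[j][k]])
        if abs(smallest - middle) > tol:
            return False
    return True


# Rows and columns in the order A, B, C, D, E.
example = [[7, 0, 0, 3, 5],
           [0, 8, 5, 0, 0],
           [0, 5, 7, 0, 0],
           [3, 0, 0, 5, 3],
           [5, 0, 0, 3, 6]]
print(is_lca_matrix(example))
```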

LCA Matrices

     j    k
i    9    6
j         6

Theorem: The following conditions are equivalent for an (n−1)×(n−1) matrix L:

1. L is an LCA matrix.

2. There is a weighted tree T with n leaves and a leaf r in T such that for each pair of leaves i,j ≠ r:

L(i,j) = dT(r;i,j)

LCA distances ⇒ LCA matrix

Suppose there is a weighted tree T s.t. L(i,j) = dT(r;i,j).

Then L is an LCA matrix, by properties of least common ancestors in trees:

L(k,i) = L(j,i) ≤ L(k,j)

[Figure: a tree with vertex r and leaves i, j, k]

30

LCA matrix LCA distances

Now we are given an LCA matrix L and need to construct a tree. The construction uses “maximal off diagonal” entries:

L(i,j) is a “maximal off-diagonal” in entry in row i if L(i,j)=maxk{L(i,k):k i}

1 2 k

1 18 9 8 3 7

Example: L(1,2) is maximal off diagonal entry in row 1

Maximal off diagonal entries

Lemma: If L(i,j) is the maximal “off-diagonal” entry in both rows i and j in L, then for all k ≠ i,j: L(i,k) = L(j,k).

Proof: By the 3 points condition on {i,j,k}.

      i    j    k
i    18    9    8    3    7
j     9   14    8    3    7

Example for i=1, j=2

LCA matrix ⇒ LCA distances: Proof by induction

We now prove by induction on n: if L is an (n−1)×(n−1) LCA matrix, then there is a weighted tree T with a root r as in the theorem.

Basis: n = 2. L = [w]. T is a tree with a single edge of weight w.

[Figure: example with L = [4]: a single edge of weight 4 between r and i]

Induction step

Induction step: n ≥ 3. Let L be an LCA matrix of dimension n−1. We describe an algorithm for constructing the corresponding tree:

1. Find i,j s.t. L(i,j) is the maximal off-diagonal entry in L.

L:
      i    j    k
i    11    9
j     9   14

(In the example i=1 and j=2)

Induction step

2. Let L′ be the matrix obtained by removing rows/columns i and j, and inserting row/column v s.t. L′(v,v) = L(i,j), and for k ≠ i,j,

L′(v,k) = L(i,k) (= L(j,k))

L:
      1    2    k
1    11    9    8    3    7
2     9   14    8    3    7

L′:
      v    k
v     9    8    3    7

Induction Step

To show that L′ is an LCA matrix we need a definition and a simple observation:

Definition: Let L be an n×n matrix, and let S ⊆ {1,...,n}. L[S] is the submatrix of L consisting of the rows and columns with indices from S.

Observation 1: If L is an LCA matrix then for every S ⊆ {1,...,n}, L[S] is also an LCA matrix.

Induction step

Claim: L′ is an LCA matrix of dimension n−2.

Proof: Let S be all leaves except j. Then L′ is obtained from L[S] as follows:
1. Change the index i to v.
2. Set L′(v,v) ← L(i,j).
By Observation 1 and the maximality of L(i,j), L′ is also an LCA matrix.

L:
      1    2    k
1    11    9    8    3    7
2     9   14    8    3    7

L′:
      v    k
v     9    8    3    7

Induction step

3. Construct a tree T′ for L′ (with n−1 leaves).

L′:
      v    k
v     9    8    3    7

[Figure: the tree T′, containing the leaf v]

Induction step

4. Add to v two children, i and j, with appropriate edge lengths.

L:
      i    j    k
i    11    9    8    3    7
j     9   14    8    3    7

[Figure: T′ with i and j attached below v by edges of lengths 2 and 5, i.e. L(i,i) − L(i,j) = 11 − 9 and L(j,j) − L(i,j) = 14 − 9]

Deepest LCA neighbor joining

If n ≤ 3, return a tree of n vertices.
Prepare a list MAX of size n, s.t. MAX(i) = maximal off-diagonal element in row i.

Recursion:
Find i,j s.t. L(i,j) is the maximal off-diagonal entry of L.
Make the reduction to L′ as described.
Update the list MAX (only MAX(v) needs an update!).
Construct T′ for L′.
Add i and j as children of v.

[Figure: T′ with i and j attached as children of v]
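
The deepest-LCA procedure can be sketched iteratively in Python. This is an illustration under stated assumptions rather than the course's implementation: the input is assumed to be an exact LCA matrix given as a dict of dicts over the non-root leaves, the names `dlca_tree`, `v0`, `v1`, ... and `root` are hypothetical, and the loop runs down to a single remaining object instead of stopping at n ≤ 3.

```python
def dlca_tree(labels, L, root="root"):
    """Build a rooted weighted tree from an exact LCA matrix L.

    labels: non-empty list of leaf names other than the root.
    L: dict of dicts; L[x][x] is the distance from the root to x.
    Returns edges as (parent, child, length).
    """
    L = {x: dict(L[x]) for x in labels}       # work on a copy
    nodes = list(labels)
    edges = []
    new_id = 0
    # MAX[x] = maximal off-diagonal value in row x.
    MAX = {x: max((L[x][y] for y in nodes if y != x), default=None)
           for x in nodes}
    while len(nodes) > 1:
        i = max(nodes, key=lambda x: MAX[x])                      # O(n)
        j = max((y for y in nodes if y != i), key=lambda y: L[i][y])
        v = f"v{new_id}"                      # assumed not to clash with leaf names
        new_id += 1
        edges.append((v, i, L[i][i] - L[i][j]))
        edges.append((v, j, L[j][j] - L[i][j]))
        others = [k for k in nodes if k != i and k != j]
        # Reduction to L': v replaces i and j.
        L[v] = {v: L[i][j]}
        for k in others:
            L[v][k] = L[k][v] = L[i][k]
        # Only MAX(v) needs an update: other row maxima keep their value.
        MAX[v] = max((L[v][k] for k in others), default=None)
        nodes = others + [v]
    (last,) = nodes
    edges.append((root, last, L[last][last]))
    return edges


# LCA matrix of the tree A-x:1, B-x:2, x-y:3, C-y:4, D-y:5 with D as the root.
L = {
    "A": {"A": 9, "B": 8, "C": 5},
    "B": {"A": 8, "B": 10, "C": 5},
    "C": {"A": 5, "B": 5, "C": 9},
}
for parent, child, w in dlca_tree(["A", "B", "C"], L):
    print(parent, child, w)
```

Each round costs O(n), since both the row with the largest MAX value and its maximal entry are found by a linear scan. On noisy (non-LCA) input the stored MAX values can become stale, so this sketch should only be trusted on exact data.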

Complexity Analysis

Initialization: constructing MAX takes O(n²).

Let Time(n) be the complexity of the algorithm, given the input matrix L and the list MAX. Time(n) is given by:

Reducing L to L′: O(n)
Updating MAX: O(n)
Constructing T′ from L′: Time(n−1)
Constructing T from T′: O(1)

Time(n) ≤ Time(n−1) + O(n)

Hence Time(n) = O(n²).

Saitou&Nei vs. DLCA methods

DLCA, like S&N, can be implemented on noisy data (in many ways).

On exact data, the DLCA and S&N methods have the same (correct) output. They differ on noisy data (which occurs in practice).

One basic difference: unlike the S&N method, the DLCA algorithm depends on selecting a root. Hence DLCA may produce many different trees on the same input.

Some of the projects will concentrate on this difference.

Incremental Reconstruction via Local Queries

Incrementally reconstructing the tree:

[Figure: two drawings of a topology with leaves a to h and internal vertices 1 to 6]

When inserting a new taxon x into a given topology T, we need to find out to which edge x should be attached.

We are allowed to ask the ‘oracle’ local queries LQ(x,v) (x: a taxon, v: an internal vertex).

Local Queries - Motivation

Asking LQ(x,v) is equivalent to asking for the topology of {x, a, b, c}, where v is the center-point of a, b, c in T.

[Figure: the topology with leaves a to h and internal vertices 1 to 6, and a smaller topology with leaves a to f and internal vertices 1 to 4]

Such questions can be asked directly (using likelihood) or through a pairwise distance matrix (which will be discussed later).

Balancing Vertices

We’d like to minimize the number of queries required for inserting a new taxon.

Lower bound: log₃(|E_T|) (simple adversarial argument).

Upper bound: log₂(|E_T|).

The algorithm which achieves the upper bound uses ‘balancing vertices’:

A balancing vertex in T is an internal vertex which splits T into 3 subtrees of size at most ceil(|T|/2).

Using balancing vertices in the local queries, the edge to which a new taxon should be attached can be found in ~log₂(|E_T|) queries.

Balancing Vertices

Every tree contains either a single balancing vertex or two adjacent balancing vertices.

Finding a balancing vertex:

Start at some arbitrary vertex v. If v is balancing, stop.

Otherwise, continue to the vertex u, adjacent to v in the ‘heaviest’ subtree.

The algorithm traverses each edge at most once.

Time complexity: O(|T|).

[Figure: a tree T with 13 edges; the heaviest subtrees met along the search have 11, 9 and 7 edges]
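
The search for a balancing vertex can be sketched as follows. The sketch assumes that subtree size is measured in edges (suggested by the “13 edges in T” annotation, but not stated explicitly), that the tree is given as an adjacency dict, and that the search starts at an internal vertex; the function name and the 8-leaf caterpillar used as input are hypothetical.

```python
from math import ceil


def find_balancing_vertex(adj, start):
    """adj: dict vertex -> list of neighbours of an unrooted tree whose
    internal vertices have degree 3; start: any internal vertex."""
    m = sum(len(nb) for nb in adj.values()) // 2      # number of edges in T
    # Root the tree at `start`; `order` lists parents before children.
    parent, order = {start: None}, [start]
    for u in order:                                   # list grows while we scan it
        for w in adj[u]:
            if w != parent[u]:
                parent[w] = u
                order.append(w)
    # down[u] = edges hanging below u, including the edge (parent[u], u).
    down = {}
    for u in reversed(order):
        down[u] = 1 + sum(down[w] for w in adj[u] if w != parent[u])

    def branch(v, u):
        """Edges on u's side when T is split at v (edge (v, u) included)."""
        return down[u] if parent[u] == v else m - down[v] + 1

    v = start
    while True:
        u = max(adj[v], key=lambda w: branch(v, w))   # heaviest subtree at v
        if branch(v, u) <= ceil(m / 2):
            return v
        v = u


# Hypothetical caterpillar with leaves a..h and internal vertices 1..6 (13 edges).
adj = {
    1: ["a", "b", 2], 2: [1, "c", 3], 3: [2, "d", 4],
    4: [3, "e", 5], 5: [4, "f", 6], 6: [5, "g", "h"],
    "a": [1], "b": [1], "c": [2], "d": [3],
    "e": [4], "f": [5], "g": [6], "h": [6],
}
print(find_balancing_vertex(adj, 1))
```

Starting at vertex 1, the heaviest subtrees along the walk have 11, 9 and 7 edges, so the walk stops at vertex 3, where no subtree exceeds ceil(13/2) = 7 edges. Vertex 4 is balancing as well, matching the claim that two adjacent balancing vertices may exist.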

A Simple and Efficient Algorithm

Iteratively add taxa 1,2,…,n to the topology.

When adding taxon x to topology T:

If T is trivial (consists of a single edge), attach x to that edge.

Otherwise:
Find a balancing vertex v of T.
Ask query LQ(x,v).
Continue recursively on T′, the subtree corresponding to the answer of the query.

Complexity:

Adding taxon 1≤x≤n to T takes O(log(x)) queries and O(x) time.

Total query complexity: O(n·log(n))

Total time complexity: O(n²)

Interesting Issues

Two major issues are raised in this area:

Queries do not always have reliable answers
- Use confidence level for answers
- Verify the answers

Reduce running time to O(n·log(n))
- Finding balancing vertices leads to high overhead
- Maybe we don’t have to re-compute the balancing vertices in every stage

Robustness to Noise in Data

Answering local queries using a distance matrix D:

We wish to assess the topology spanned by four taxa: x, a, b, c.

Observe the 4×4 submatrix of D over x, a, b, c.

[Figure: the possible quartet topologies on x, a, b, c]

If D is additive then there is a labeling of the taxa by i, j, k, l s.t.:

D(i,j) + D(k,l) ≤ D(i,k) + D(j,l) = D(i,l) + D(j,k)

The configuration of the quartet is (ij ; kl), and the path separating the two pairs is of length ½(D(i,k) + D(j,l) − D(i,j) − D(k,l)).

If D is not additive we set the configuration of the quartet to (ij ; kl), where D(i,j) + D(k,l) is the minimal of the three sums.

The confidence of the prediction can be estimated by the difference between the maximal and minimal sums.
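
The quartet rule above translates into a few lines of Python. The sketch assumes D is a dict of dicts; the function name `quartet_split` and the return format (the chosen pairing plus the max-minus-min confidence value) are illustrative choices.

```python
def quartet_split(D, x, a, b, c):
    """Return (configuration, confidence) for the quartet {x, a, b, c}.

    The configuration is the pairing with the minimal sum of distances;
    the confidence is the difference between the maximal and minimal sums.
    """
    sums = [
        (D[x][a] + D[b][c], ((x, a), (b, c))),
        (D[x][b] + D[a][c], ((x, b), (a, c))),
        (D[x][c] + D[a][b], ((x, c), (a, b))),
    ]
    sums.sort(key=lambda t: t[0])
    smallest, split = sums[0]
    largest = sums[2][0]
    return split, largest - smallest


# Distances realized by the tree A-x:1, B-x:2, x-y:3, C-y:4, D-y:5.
D = {
    "A": {"B": 3, "C": 8, "D": 9},
    "B": {"A": 3, "C": 9, "D": 10},
    "C": {"A": 8, "B": 9, "D": 9},
    "D": {"A": 9, "B": 10, "C": 9},
}
print(quartet_split(D, "A", "B", "C", "D"))
```

On additive data the two larger sums coincide, so half of the returned confidence value equals the length of the path separating the two pairs (here 6/2 = 3, the weight of the internal edge).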

Robustness to Noise in Data

Answering local queries using a distance matrix D:

We can check several quartets of type x, a, b, c to answer a single local query.

Example: To answer LQ(g,1) we can check all quartets in {g} × {a} × {c,f} × {b,d,e}.

We can choose a representative set of quartets, and answer the local query according to a (weighted) majority.

If the answer is still inconclusive, we can choose to ask another local query.

[Figure: a topology with leaves a to f and internal vertices 1 to 4; the position of the new taxon g is unknown]

Improving Running Time

Separator Trees:

A deterministic algorithm which inserts a new taxon x into a given topology T can be viewed as a rooted decision tree.

• Each internal node represents a local query (internal vertex in T).

• Each internal node has three outgoing edges corresponding to the three possible answers to the query.

• Each leaf corresponds to an edge in T.

A special case of decision trees are separator trees.

The time complexity of the algorithm is the depth of the separator tree.

[Figure: a topology T with leaves a to m and internal vertices 1 to 6, and a separator tree S over T]

Improving Running Time

Balanced Separator Trees:

A balanced separator tree uses balancing vertices (of the appropriate subtrees of T).

Can be constructed in O(n·log(n)) time.

Inserting a taxon does not drastically harm the balance.

If we allow some imbalance, we can guarantee that the costly balancing procedure is executed only a few times during construction of the whole topology.

Amortized analysis of total time complexity: O(n·log2(n))

[Figure: the topology T with leaves a to m and internal vertices 1 to 6, and its balanced separator tree S]

Improving Running Time

Bottom-up approach (simple separator trees):

Start with the edge-set of T.

Choose disjoint edge triplets, such that each triplet contains at least one leaf.

Contract each triplet to a single edge.

Recursively continue on the reduced topology.

[Figure: the topology T, its successive contractions, and the resulting simple separator tree S]

Improving Running Time

Bottom-up approach (simple separator trees):

By a simple linear traversal of T you can find Θ(|T|) edge-triplets.

The topology size is reduced by a constant factor at each stage.

• Depth of the simple separator tree is O(log(n)).

• Time complexity is O(n).

Insertion of a taxon induces modifications propagating bottom-up through the layers of the separator tree.

[Figure: the topology T, its successive contractions, and the simple separator tree S, with layers labeled IS: {1,2,4,6}, IS: {3}, IS: {5}]

Testing Reconstruction Methods on Noisy Data

We’d like to test reconstruction algorithms on actual phylogenetic data.

Problem: Confirmed phylogenetic trees are scarce and small.

Solution: Simulate the data.

1. Generate an edge-weighted tree T under some probabilistic model (Yule-Harding).

2. Choose a random DNA string for the root and simulate evolution on the tree to obtain sequences for all leaves (SeqGen), e.g. ATTCG…, ATACG…, ACTGG…, ...

3. Obtain pairwise distances from the sequences, giving a distance matrix D (DNAdist from PHYLIP).

4. Run the reconstruction algorithm on D to obtain a tree T′.

5. Compare the topologies of T and T′.

The Projects

Project I: The DLCA algorithm

Implement algorithms:
Saitou&Nei's neighbor joining
DLCA neighbor-joining
- mid-point reduction
- maximal-value reduction

Simulate data:
Use pre-generated trees to simulate the process of evolution (using the SeqGen program).
For each tree generate several sequence-sets.

Experiments: Test the various algorithms on the generated data:
Use the DNADIST program (part of the Phylip package) to get a distance matrix corresponding to the sequence-set of the leaves.
Execute the algorithms on the distance matrix.
Check topological accuracy using the RF-score.
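
Assuming the RF-score refers to the Robinson-Foulds distance (the number of nontrivial leaf bipartitions present in exactly one of the two trees), a topological comparison could be sketched as below. The input format (unweighted edge lists plus a set of string leaf names) and the names `splits` and `rf_distance` are illustrative assumptions; the projects may require a normalized variant.

```python
def splits(edges, leaves):
    """Nontrivial bipartitions of an unrooted tree, each given as the
    frozenset of leaves on the side that does not contain a fixed
    reference leaf."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    ref = min(leaves)                         # fixed reference leaf
    parent, order = {ref: None}, [ref]
    for u in order:                           # list grows while we scan it
        for w in adj[u]:
            if w != parent[u]:
                parent[w] = u
                order.append(w)
    below, result = {}, set()
    for u in reversed(order):                 # children before parents
        s = frozenset([u]) if u in leaves else frozenset()
        for w in adj[u]:
            if w != parent[u]:
                s |= below[w]
        below[u] = s
        if 1 < len(s) < len(leaves) - 1:      # skip trivial splits
            result.add(s)
    return result


def rf_distance(edges1, edges2, leaves):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(edges1, leaves) ^ splits(edges2, leaves))


leaves = {"A", "B", "C", "D"}
t1 = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]  # AB | CD
t2 = [("A", "x"), ("C", "x"), ("x", "y"), ("B", "y"), ("D", "y")]  # AC | BD
print(rf_distance(t1, t1, leaves), rf_distance(t1, t2, leaves))
```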

The Projects

Project II: Fast Algorithms Using Local Queries

Implement algorithms: Implement advanced data structures which support the various algorithms:
Algorithm using semi-balanced separator trees
Algorithm using simple separator trees

Simulate data:
Use pre-generated trees and/or a uniform random model.

Experiments: Test the various algorithms on the generated trees:

o Use the generated trees to answer the local queries asked by the algorithms.

o Compare the performance of the different algorithms on this data.

The Projects

Project III: Robust Algorithms Using Local Queries

Implement algorithms:
Implement the O(n²) algorithm using O(n·log(n)) queries.

Simulate data:
Use pre-generated trees and distance matrices.

Experiments: Test various approaches on the generated data:

o Use the distance-matrices to answer the local queries asked by the algorithms.

o Suggest some method of estimating the confidence level of an answer to a query.

o Check for errors in the reconstructed topology. Compare several approaches.

Grading Scheme

10% - work plan

60% - final report + submitted code
Rough distribution of grade:
40% - meeting project requirements
10% - code organization and documentation
10% - innovation and creativeness

30% - final presentation

Schedule

21/3 – Introductory meeting

28/3 – Deadline for choosing a project

26-30/3 – Individual 30 minute meetings with each team to discuss the specification of the project

23-27/4 – Individual 60 minute meetings with each team to discuss the work plan and design of the project

2/5 – Deadline for submitting work plan

21-25/5 – Individual progress meetings

18-22/6 – Concluding 60 minute meetings with each team

27/6, 4/7 – Project presentations and submission of final draft

Final submission deadline – To be announced

Homework

Team up in pairs

Choose project

Send me e-mail containing:
The names, id numbers, e-mails of all students in the group
Preferred project + 2nd priority project
Two optional dates for first project meeting (next week)

Go over references of your chosen project

Good Luck !