Fast and Accurate Reconstruction of Evolutionary Trees: a Model-based Study, Ming-Yang Kao

1

Fast and Accurate Reconstruction of Evolutionary Trees: a Model-based Study

Ming-Yang Kao

Department of Computer Science, Northwestern University

Evanston, Illinois

U. S. A.

2

Perspectives

Use biology ideas to solve computer science problems

Use computer science tools to solve biology problems

biology, computer science

this talk

3

Use Biology to Solve CS Problems

• DNA Computing

• DNA Self-Assembly

• Genetic Algorithms

• Neural Network

• Others

4

Use CS to Solve Biology Problems

• Bioinformatics or Computational Biology

data mining

(this talk)

• Related fields: computational neuroscience, computational ecology, medical informatics, … many more ...

5

Example Research Areas of Bioinformatics

• DNA sequencing

• DNA microarray analysis

• DNA self-assembly for nano-structures

• DNA word design

• RNA secondary structure prediction

• Protein sequencing (my talk #4)

• Proteomics

• Protein database search

• Protein sequence design (my talk #3)

• Protein landscape analysis

• Phylogeny reconstruction (this talk)

• Phylogeny comparison (my talk #1)

6

Evolutionary Trees

definition: a tree with distinct labels at leaves

leaf labels: species, organisms, DNAs, RNAs, proteins, features, etc.

ancestral species

bird plum peach

rice wheat

present-day species

(Just a joke!)

7

Evolutionary Trees

leaf labels: DNA sequences

bird plum peach

rice wheat

AAGT CCAG CCAT

CGGG CGGC

(Just a joke!)

8

Problem Formulation

bird plum peach

rice wheat

AAGT CCAG CCAT

CGGG CGGC

Input: DNA sequences of present-day species

Output: the true evolutionary tree

Question: What is “true”? Need a model!

(Just a joke!)

9

A Fundamental Problem of Biology

Since the time of Charles Darwin,

Problem: reconstruct the evolutionary history of all known species.

Importance:

• intellectually fascinating

• practical benefits – medicine, food …

• Charles Robert Darwin --- 1809-1882

• Origin of Species --- 1859

10

Main Difficulties

• Availability of data

Hundreds of millions of species --- unlikely to be all available any time soon or ever.

But DNA sequences of more and more species are becoming available.

• Extracting information from data

focus of this talk

11

Today’s Technical Focus

bird plum peach

rice wheat

AAGT CCAG CCAT

CGGG CGGC

Input: DNA sequences of present-day species

Output: the true evolutionary tree

Question: What is “true”? Need a model!

Collaborators: Csuros & Kim

12

Main Result

An algorithm that constructs an evolutionary tree from biomolecular sequences

• Provable high accuracy

• Short sequence length

• Optimal running time

• Optimal memory space

13

Outline of Technical Discussion

1. Define the model of evolution.

2. Formulate the computational problem.

3. Discuss the theoretical performance of our algorithm.

4. Discuss the empirical performance.

5. Describe and analyze the algorithm.

6. Further research.

14

Outline of Technical Discussion (1)

1. Define the model of evolution.

2. Formulate the computational problem.

3. Discuss the theoretical performance of our algorithm.

4. Discuss the empirical performance.

5. Describe and analyze the algorithm.

6. Further research.

15

Model of Evolution

Intuitions

ACGTACT

AGGAGAA

CAGGAGTTTTAA

Mutation occurs probabilistically.

1. edge length ~ time

2. edge length ~ mutation probability

3. edge length ~ dissimilarity (or distance)

AGTTCCT

16

Jukes-Cantor Model of Evolution (1)

Edge Mutation Probability

$p_e = 0.6$ on the edge e from A to X

$0 < f \le p_e \le g < \frac{3}{4}$

• No insertion or deletion.

• X = A with probability 1 - 0.6 = 0.4

• X = C, G, or T with probability 0.6/3 = 0.2
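The single-site rule above can be sketched in a few lines. This is a minimal illustration, assuming the Jukes-Cantor rule exactly as stated on this slide; the function names `mutate_site` and `mutate_sequence`, the seed, and the test sequence are hypothetical choices, not taken from the talk.

```python
import random

BASES = "ACGT"

def mutate_site(base, p_e, rng):
    """Return the child character for one site on an edge with mutation probability p_e."""
    if rng.random() < p_e:
        # Mutation: each of the three other bases is chosen with probability p_e / 3.
        return rng.choice([b for b in BASES if b != base])
    return base  # no mutation, probability 1 - p_e

def mutate_sequence(seq, p_e, rng):
    """Mutate every site independently; no insertions or deletions."""
    return "".join(mutate_site(b, p_e, rng) for b in seq)

rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
child = mutate_sequence("A" * 10000, 0.6, rng)
frac_unchanged = child.count("A") / len(child)
print(round(frac_unchanged, 2))  # expected to be near 1 - 0.6 = 0.4
```

With a long all-A parent, the fraction of unchanged sites should approach 0.4 and each of C, G, T should approach 0.2.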

17

Jukes-Cantor Model of Evolution (2)

Independent Mutations along All Edges

[Figure: a tree whose nodes carry the characters A, A, C, G, G, with edge mutation probabilities 0.2, 0.7, 0.65, 0.6]

18

Jukes-Cantor Model of Evolution (3)

i.i.d. mutations at every character

[Figure: the same tree with sequences AAGT, AGTT, CAGG, GGTG, GTTG at its nodes and edge mutation probabilities 0.2, 0.7, 0.65, 0.6]

19

Outline of Technical Discussion (2)

1. Define the model of evolution.

2. Formulate the computational problem.

3. Discuss the theoretical performance of our algorithm.

4. Discuss the empirical performance.

5. Describe and analyze the algorithm.

6. Further research.

20

Problem Formulation

[Figure: the true tree (not known to algorithm), with node sequences AGTGT, GGTAC, CGTTT, CAGGT, GTACT, TGGAC, CAGGT, CGTGT, ATCGT, edge mutation probabilities 0.2, 0.6, 0.7, 0.3, 0.2, 0.5, 0.7, 0.1, and leaves $S_1, \ldots, S_5$]

• Pick any sequence for the root (also unknown to algorithm).

• Generate the other sequences.

Input: $S_1, S_2, \ldots, S_5$, but not the other sequences, nor the tree.

Output: the unrooted tree with leaves $S_1, \ldots, S_5$.

21

Computational Objectives

Input: DNA sequences $S_1, S_2, \ldots, S_5$

Output: [Figure: tree with leaves $S_1, \ldots, S_5$]

Minimize:

• running time

• memory space

• probability of incorrect output

• sample size, i.e., length of the input sequences

22

Outline of Technical Discussion (3)

1. Define the model of evolution.

2. Formulate the computational problem.

3. Discuss the theoretical performance of our algorithm.

4. Discuss the empirical performance.

5. Describe and analyze the algorithm.

6. Further research.

23

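Triplets

• A triplet XYZ is formed by three leaves X, Y, Z.

• P, the internal node where the paths between X, Y, and Z meet, is the center of XYZ.

[Figure: leaves X, Y, Z joined at the center P]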

24

G-depth of Triplet

$d_{XY}$ = # of edges between X and Y

$d_{XYZ} = \max\{d_{XY}, d_{YZ}, d_{ZX}\}$

[Figure: a triplet X, Y, Z with example values 5, 8, 7]
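The g-depth of a triplet can be computed directly from edge counts. The sketch below assumes the tree is given as an adjacency dictionary; the 5-leaf tree, its node names, and the helper names `edge_count` and `triplet_g_depth` are hypothetical and do not come from the slide figure.

```python
from collections import deque

def edge_count(adj, src, dst):
    """Number of edges on the path from src to dst (BFS on an unweighted tree)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("dst is not reachable from src")

def triplet_g_depth(adj, x, y, z):
    """g-depth of triplet xyz: the largest of the three pairwise edge counts."""
    return max(edge_count(adj, x, y), edge_count(adj, y, z), edge_count(adj, z, x))

# Hypothetical tree: leaves A..E, internal nodes P, Q, R.
adj = {
    "A": ["P"], "B": ["P"], "P": ["A", "B", "Q"],
    "C": ["Q"], "Q": ["P", "C", "R"],
    "D": ["R"], "E": ["R"], "R": ["Q", "D", "E"],
}
print(triplet_g_depth(adj, "A", "B", "C"))  # 3
print(triplet_g_depth(adj, "A", "D", "E"))  # 4
```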

25

G-depth of a Tree

the smallest d such that the triplets of g-depth at most d cover the entire tree

g-depth = 4

the best case

26

G-depth of a Tree

the smallest d such that the triplets of g-depth at most d cover the entire tree

g-depth = 2 log n

the worst case

27

G-depth of a Tree

the smallest d such that the triplets of g-depth at most d cover the entire tree

• at most 2 log n

• can be O(1)

28

Our New Result (1)

29

Our New Result (2)

polynomial sample size

30

Our New Result (3)

polynomial sample size

provable high accuracy

31

Our New Result (4)

polynomial sample size

provable high accuracy

optimal time & space

32

Comparison with Previous Results

this talk

33

Outline of Technical Discussion (4)

1. Define the model of evolution.

2. Formulate the computational problem.

3. Discuss the theoretical performance of our algorithm.

4. Discuss the empirical performance.

5. Describe and analyze the algorithm.

6. Further research.

34

Experimental Study Design

• Step 1 -- Pick a model tree T.

• Step 2 -- Use T to generate sequences.

• Step 3 -- Use an algorithm to reconstruct a tree T’ from the sequences (without knowing T).

• Step 4 -- Compare T’ and T.

35

Wrong and Right Edges

[Figure: a true tree and a reconstructed tree on leaves X1, X2, X3, X4, X5; edges of the reconstructed tree are marked "good" or "bad"]
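One common way to make "wrong edge" precise is to compare the leaf bipartitions (splits) induced by internal edges. The talk does not state its exact definition, so the sketch below is an assumption: an internal edge of the reconstructed tree counts as wrong when its split does not occur in the true tree. The two 5-leaf trees and the internal node names P, Q, R are hypothetical.

```python
def build(edges):
    """Adjacency dictionary from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def leaf_splits(adj):
    """Set of nontrivial leaf bipartitions induced by the internal edges."""
    leaves = frozenset(n for n in adj if len(adj[n]) == 1)
    found = set()
    for u in adj:
        for v in adj[u]:
            # Collect the leaves on v's side of the edge (u, v).
            side, stack, seen = set(), [v], {u, v}
            while stack:
                w = stack.pop()
                if w in leaves:
                    side.add(w)
                for nxt in adj[w]:
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            if 1 < len(side) < len(leaves) - 1:
                found.add(frozenset([frozenset(side), leaves - side]))
    return found

true_tree = build([("X1", "P"), ("X2", "P"), ("P", "Q"), ("X3", "Q"),
                   ("Q", "R"), ("X4", "R"), ("X5", "R")])
rec_tree = build([("X3", "P"), ("X2", "P"), ("P", "Q"), ("X1", "Q"),
                  ("Q", "R"), ("X4", "R"), ("X5", "R")])

rec_splits = leaf_splits(rec_tree)
wrong = rec_splits - leaf_splits(true_tree)
wrong_fraction = len(wrong) / len(rec_splits)
print(wrong_fraction)  # 0.5: one of the two internal edges is wrong
```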

36

Experiment #1

• the 135-taxon African-Eve tree (courtesy of Huson and Maddison)

• algorithms compared: HGT and bioNJ (Olivier Gascuel)

• parameters: sequence length and percentage of wrong edges

• edge mutation probabilities: between 0.47 and 0.088

• # of simulations = 20 per sequence length

• more experiments in progress

37

135-taxon African Eve Tree

38

Results of Experiment #1

39

Experiment #2

• a 1892-taxon tree of eukaryotes

• algorithms compared: HGT and bioNJ

• parameters: sequence length and percentage of wrong edges

• edge mutation probabilities: between 0.47 and 0.088

• # of simulations = 20 per sequence length

• more experiments in progress

• several variants of the basic HGT

40

Results of Experiment #2

41

Results of Experiment #2

42

Results of Experiment #2

43

Outline of Technical Discussion (5)

1. Define the model of evolution.

2. Formulate the computational problem.

3. Discuss the theoretical performance of our algorithm.

4. Discuss the empirical performance.

5. Describe and analyze the algorithm.

6. Further research.

44

Our New Result (4)

polynomial sample size

provable high accuracy

optimal time & space

45

Outline of Technical Discussion (5)

1. Describe the HGT algorithm.

2. Prove the sample size bound (and high probability for accuracy).

3. Prove the optimal time & space.

46

Outline of Technical Discussion (5/1)

1. Describe the HGT algorithm.

2. Prove the sample size bound (and high probability for accuracy).

3. Prove the optimal time & space.

47

Closeness and Distance of Two Leaves

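[Figure: tree with sequences AAGT, AGTT (leaf X), CAGG, GGTG (leaf Y), GTTG and edge mutation probabilities $p_e = 0.6$, 0.2, 0.7, 0.65]

Closeness: $\sigma_{XY} = \Pr\{X = Y\} - \frac{1}{3}\Pr\{X \neq Y\}$; for X = AGTT and Y = GGTG, $\frac{2}{4} - \frac{1}{3} \cdot \frac{2}{4} = \frac{1}{3}$.

Distance: $D_{XY} = -\ln \sigma_{XY} = -\ln \frac{1}{3} = \ln 3$.

The larger the closeness, the more accurately we can estimate the distance.

Closeness is multiplicative. Distance is additive!!!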
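A small sketch of how closeness and distance can be estimated from two aligned sequences. It assumes the Jukes-Cantor forms: closeness is Pr{X = Y} minus one third of Pr{X ≠ Y}, and distance is the negative natural log of closeness. The function names are hypothetical; the example pair AGTT and GGTG is the one on this slide.

```python
import math

def closeness(x, y):
    """Empirical closeness: fraction of equal sites minus one third of the unequal fraction."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(x, y)) / len(x)
    return same - (1 - same) / 3

def distance(x, y):
    """Empirical distance -ln(closeness); infinite when closeness is not positive."""
    s = closeness(x, y)
    return -math.log(s) if s > 0 else math.inf

sigma = closeness("AGTT", "GGTG")  # 2 of 4 sites agree
d = distance("AGTT", "GGTG")
print(round(sigma, 4), round(d, 4))  # 0.3333 1.0986
```

Because closeness multiplies along a path, taking the log turns it into a quantity that adds along the path, which is what the later triplet-assembly steps rely on.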

48

Closeness = Cubic Root of Determinant

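[Figure: an edge e with $p_e = 0.6$ between sequences AAGT and CAGG]

$$M_e = \begin{pmatrix} 1 - p_e & \frac{p_e}{3} & \frac{p_e}{3} & \frac{p_e}{3} \\ \frac{p_e}{3} & 1 - p_e & \frac{p_e}{3} & \frac{p_e}{3} \\ \frac{p_e}{3} & \frac{p_e}{3} & 1 - p_e & \frac{p_e}{3} \\ \frac{p_e}{3} & \frac{p_e}{3} & \frac{p_e}{3} & 1 - p_e \end{pmatrix}$$

Rows and columns are indexed by A, C, G, T.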
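The slide title can be checked numerically. For the Jukes-Cantor matrix with $1 - p_e$ on the diagonal and $p_e/3$ elsewhere, the determinant is $(1 - \frac{4}{3} p_e)^3$, so its cube root equals the edge closeness $1 - \frac{4}{3} p_e$. The sketch below uses a hand-written determinant to stay dependency-free; the helper names are hypothetical.

```python
def jc_matrix(p_e):
    """4x4 Jukes-Cantor transition matrix over A, C, G, T."""
    return [[1 - p_e if i == j else p_e / 3 for j in range(4)] for i in range(4)]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            d = -d  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

p_e = 0.6
cube_root = det(jc_matrix(p_e)) ** (1 / 3)  # valid for p_e < 3/4, where det > 0
print(round(cube_root, 6), round(1 - 4 * p_e / 3, 6))  # both approximately 0.2
```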

49

Closeness of Triplet

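[Figure: tree with sequences AAGT, AGTT (leaf X), CAGG, GGTG (leaf Y), GTTG (leaf Z) and edge mutation probabilities $p_e = 0.6$, 0.2, 0.7, 0.65]

$\frac{1}{\sigma_{XYZ}} = \frac{1}{\sigma_{XY}} + \frac{1}{\sigma_{YZ}} + \frac{1}{\sigma_{ZX}}$

The larger the closeness, the more accurately we can estimate the three pairwise distances.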

50

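Assemble Triplets Into Tree via Distance Additivity (I)

[Figure: triplet of leaves X, A, Y with center P; edge lengths a (X to P), b (A to P), c (Y to P)]

$D_{XY} = 31$, $D_{XA} = 28$, $D_{YA} = 9$

$D_{XY} = a + c = 31$, $D_{XA} = a + b = 28$, $D_{YA} = b + c = 9$

Solution: $a = 25$, $b = 3$, $c = 6$.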
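Distance additivity gives three linear equations in the three edge lengths of a triplet, which can be solved in closed form. A minimal sketch with the numbers from this slide; the function name `triplet_edges` is hypothetical.

```python
def triplet_edges(d_xy, d_xa, d_ya):
    """Edge lengths (a, b, c) from the center P to X, A, Y, given pairwise distances.

    Solves a + c = d_xy, a + b = d_xa, b + c = d_ya.
    """
    a = (d_xy + d_xa - d_ya) / 2
    b = (d_xa + d_ya - d_xy) / 2
    c = (d_xy + d_ya - d_xa) / 2
    return a, b, c

print(triplet_edges(31, 28, 9))  # (25.0, 3.0, 6.0)
```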

51

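Assemble Triplets Into Tree via Distance Additivity (II)

[Figure: a new leaf B is attached by combining triplet XAY (center P) with triplet XBY (center Q)]

Triplet XAY: $D_{XY} = 31$, $D_{XA} = 28$, $D_{YA} = 9$, giving X to P: 25, A to P: 3, Y to P: 6.

Triplet XBY: $D_{XY} = 31$, $D_{XB} = 17$, $D_{YB} = 18$, giving X to Q: 15, B to Q: 2, Y to Q: 16.

Combined tree: X to Q: 15, Q to P: 10, P to Y: 6, Q to B: 2, P to A: 3.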
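The merge step can be sketched as placing each triplet center on the shared X-to-Y path. This is an illustration under the additivity assumption only, not the talk's full procedure; `place_on_path` and its input format are hypothetical.

```python
def place_on_path(d_xy, leaves):
    """Locate attachment points on the X-to-Y path.

    leaves maps a leaf name to (distance from X, distance from Y).
    Returns (offset from X, leaf name, pendant edge length), sorted by offset.
    """
    points = []
    for name, (d_x, d_y) in leaves.items():
        offset = (d_xy + d_x - d_y) / 2   # distance from X to the triplet center
        pendant = (d_x + d_y - d_xy) / 2  # length of the edge from the center to the leaf
        points.append((offset, name, pendant))
    return sorted(points)

points = place_on_path(31, {"A": (28, 9), "B": (17, 18)})
print(points)  # [(15.0, 'B', 2.0), (25.0, 'A', 3.0)]
print(points[1][0] - points[0][0])  # 10.0, the length of the edge between Q and P
```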

52

How to Choose Triplets to Minimize Errors?

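[Figure: triplet of leaves X, Z, Y with center P; edge lengths 25 (X to P), 3 (Z to P), 6 (Y to P)]

$D_{XY} = 31$, $D_{XZ} = 28$, $D_{YZ} = 9$

$\frac{1}{\sigma_{XYZ}} = \frac{1}{\sigma_{XY}} + \frac{1}{\sigma_{YZ}} + \frac{1}{\sigma_{ZX}}$

The larger the closeness, the more accurately we can estimate the three pairwise distances.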

Greedy Strategy!

Harmonic Greedy Triplet (HGT)

53

Over-Simplified Outline of HGT

• Stage 1: T’ ← the triplet ABC with the largest closeness.

• Stage 2: Repeat the following steps until T’ contains all the leaves.

Step 2(1): Pick a triplet XYZ with the largest closeness where X, Y are in T’ but Z is not.

Step 2(2): Incorporate XYZ into T’ to add Z into T’.
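The greedy selection order in this outline can be sketched as follows. This is only the triplet-picking skeleton: it records which triplets would be used and omits the step that actually inserts Z into T’. It assumes triplet closeness is the reciprocal of the sum of reciprocal pairwise closenesses (a constant factor would not change the ranking). The leaf names, the pairwise distances, and `greedy_triplet_order` are hypothetical.

```python
import math
from itertools import combinations

def triplet_closeness(sigma, x, y, z):
    """Harmonic-style closeness of triplet xyz from pairwise closeness values."""
    vals = [sigma[frozenset(p)] for p in ((x, y), (y, z), (z, x))]
    if min(vals) <= 0:
        return 0.0
    return 1.0 / sum(1.0 / v for v in vals)

def greedy_triplet_order(leaves, sigma):
    """Return the triplets chosen by the two-stage greedy rule."""
    # Stage 1: start from the triplet with the largest closeness.
    first = max(combinations(leaves, 3), key=lambda t: triplet_closeness(sigma, *t))
    in_tree = list(first)
    remaining = [leaf for leaf in leaves if leaf not in first]
    used = [first]
    # Stage 2: repeatedly add the outside leaf Z via the closest triplet XYZ.
    while remaining:
        best = max(
            ((x, y, z) for x, y in combinations(in_tree, 2) for z in remaining),
            key=lambda t: triplet_closeness(sigma, *t),
        )
        used.append(best)
        in_tree.append(best[2])
        remaining.remove(best[2])
    return used

# Hypothetical additive distances on four leaves; closeness = exp(-distance).
dist = {("A", "B"): 0.3, ("A", "C"): 0.7, ("B", "C"): 0.8,
        ("A", "D"): 1.6, ("B", "D"): 1.7, ("C", "D"): 1.1}
sigma = {frozenset(pair): math.exp(-d) for pair, d in dist.items()}
used = greedy_triplet_order(["A", "B", "C", "D"], sigma)
print(used)  # [('A', 'B', 'C'), ('A', 'C', 'D')]
```

This brute-force version examines every candidate triplet in each round; the later slides on optimal time and space describe how the candidates can be restricted.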

54

Outline of Technical Discussion (5/2)

1. Describe the HGT algorithm.

2. Prove the sample size bound (and high probability for accuracy).

3. Prove the optimal time & space.

55

Our New Result (4/1)

polynomial sample size

provable high accuracy

56

Over-Simplified Outline of HGT

• Stage 1: T’ ← the triplet ABC with the largest closeness.

• Stage 2: Repeat the following steps until T’ contains all the leaves.

Step 2(1): Pick a triplet XYZ with the largest closeness where X, Y are in T’ but Z is not.

Step 2(2): Incorporate XYZ into T’ to add Z into T’.

57

Polynomial Sequence Length (1)

$\sigma_{XYZ}$ and $(1 - \frac{4}{3} g)^{d_{XYZ}}$: larger $\sigma_{XYZ}$, smaller $d_{XYZ}$

Lemma 1: The g-depth of the last triplet used in HGT is the g-depth of the true tree T.

Proof:

• The largest closeness such that the triplets with same or larger closeness cover the true tree T.

• The smallest g-depth such that the triplets with same or smaller g-depths cover the true tree T.

58

Polynomial Sequence Length (2)

$\sigma_{XYZ}$ and $(1 - \frac{4}{3} g)^{d_{XYZ}}$, where $d_{XYZ}$ is the g-depth of the tree

Lemma 1: The g-depth of the last triplet used in HGT is the g-depth of the true tree T.

Lemma 2:

)(2

XYZsequence length needed

where XYZ is the last triplet used.

59

Outline of Technical Discussion (5/3)

1. Describe the HGT algorithm.

2. Prove the sample size bound (and high probability for accuracy).

3. Prove the optimal time & space.

60

Our New Result (4/2)

optimal time & space

61

Over-Simplified Outline of HGT

• Stage 1: T’ ← the triplet ABC with the largest closeness.

• Stage 2: Repeat the following steps until T’ contains all the leaves.

Step 2(1): Pick a triplet XYZ with the largest closeness where X, Y are in T’ but Z is not.

Step 2(2): Incorporate XYZ into T’ to add Z into T’.

62

Optimal Time/Space for the First Triplet

• Stage 1:

Fix an arbitrary leaf A.

T’ ← the triplet ABC with the largest closeness.

• Stage 2:

Repeat the following steps until T’ contains all the leaves.

Step 2(1): Pick a triplet XYZ with the largest closeness where X, Y are in T’ but Z is not.

Step 2(2): Incorporate XYZ into T’.

63

Optimal Time/Space for the Other Leaves

[Figure: the partially reconstructed tree, containing triplets XYZ and ABC and internal nodes P and Q, next to the part of the tree not yet recovered]

only need to consider the triplets formed by one of X, Y, one of B, C, and one of

64

Outline of Technical Discussion (6)

1. Define the model of evolution.

2. Formulate the computational problem.

3. Discuss the theoretical performance of our algorithm.

4. Discuss the empirical performance.

5. Describe and analyze the algorithm.

6. Further research.

65

Further Research

• more general models of evolution

• practical implementations

66

Main Difficulties

• Availability of data

Hundreds of millions of species --- unlikely to be all available any time soon or ever.

But DNA sequences of more and more species are becoming available.

• Extracting information from data

focus of this talk

67

Do the genomes of all green plants contain enough information for the reconstruction of their evolutionary tree?

• genome size of eukaryotes: base pairs

• # of green plant species: several

If so, does this impose any necessary structure on the information or the tree? If so, how do we determine and use that structure?

Beyond All Computational Considerations

1010116

~

What do you think?

The End.

Thank You!

108

68

Data Mining Flowchart

[Flowchart]

Boxes: true tree (unknown); evolution models; collect & process individual sequences; compare & align multiple sequences; tree reconstruction algorithms; tree verification (compare & refine)

Arrow labels: generate sequences; further process; parameters; distance or characters; trees; information; refine; infer; parameters

Highlighted: today’s focus