DCJUC: A Maximum Parsimony Simulator for Constructing Phylogenetic Tree of Genomes with Unequal Contents

Zhaoming Yin
Bader-Polo Joint Group Meeting, Nov 11, 2013

Contribution

• Research Aspect

-A framework for finding the maximum parsimony tree from input genomes with unequal gene contents.

-A proof that adequate subgraph theory applies to unequal-content data, which reduces the search space.

-A benchmark for the HPC community.

• Engineering Aspect

-Implemented software with many state-of-the-art features, such as the supertree method, the GAS initialization method, and spectral partitioning.

-The software produces a tree that gives not only the topology but also the type and number of the different evolution events (visualization!).

Why Is the Phylogenetic Tree Problem Hard?

• For N genomes, there are (2N-5)!! possible unrooted binary tree topologies.

• For each topology, we need to compute at least one median; the number of possible median gene orders is (g-2)!!, where g is the number of genes.

• Validating each candidate median is NP-hard if the gene content has duplications.

• So the complexity of computing the MP tree for genomes with unequal contents is:

NP hard over NP hard over NP hard!
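To give a feel for how quickly the topology count grows, here is a minimal sketch. It assumes the standard count of (2N-5)!! unrooted binary topologies on N labelled leaves; the function names are illustrative and not part of DCJUC.

```python
def double_factorial(n: int) -> int:
    """Return n!! = n * (n - 2) * (n - 4) * ... down to 1 or 2 (1 for n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def unrooted_binary_topologies(num_leaves: int) -> int:
    """Number of unrooted binary tree topologies on num_leaves labelled leaves."""
    if num_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * num_leaves - 5)


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, unrooted_binary_topologies(n))
```

Even before any median is computed, exhaustive enumeration is already impractical for a few dozen genomes.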

Phylogenetic Tree

This picture presents the phylogeny of the “12 Drosophila.”

From http://insects.eugenes.org/species

Maximum Parsimony Concept

[Figure: alternative tree topologies over genomes 1 to 6]

Of all possible topologies, the maximum parsimony tree is the one with the minimum total tree length.
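The selection rule can be sketched as follows, assuming each candidate topology already has a length assigned to every edge (for example, the number of rearrangement events along it). The edge lengths and topology names below are made up for illustration only.

```python
from typing import Dict, List, Tuple

# An edge is (node, node, length); the lengths here are hypothetical.
Edge = Tuple[str, str, int]


def tree_length(edges: List[Edge]) -> int:
    """Total tree length: the sum of all edge lengths."""
    return sum(length for _, _, length in edges)


def most_parsimonious(candidates: Dict[str, List[Edge]]) -> Tuple[str, int]:
    """Return the name and length of the candidate with the smallest tree length."""
    name = min(candidates, key=lambda k: tree_length(candidates[k]))
    return name, tree_length(candidates[name])


# The three unrooted topologies on four leaves, with made-up edge lengths.
CANDIDATES: Dict[str, List[Edge]] = {
    "((1,2),(3,4))": [("1", "a", 2), ("2", "a", 1), ("a", "b", 3), ("3", "b", 1), ("4", "b", 2)],
    "((1,3),(2,4))": [("1", "a", 3), ("3", "a", 2), ("a", "b", 4), ("2", "b", 2), ("4", "b", 3)],
    "((1,4),(2,3))": [("1", "a", 2), ("4", "a", 3), ("a", "b", 2), ("2", "b", 2), ("3", "b", 2)],
}

if __name__ == "__main__":
    print(most_parsimonious(CANDIDATES))
```

In practice the hard part is obtaining the edge lengths, which requires inferring the internal (median) genomes for each topology.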

Genome Rearrangement

http://ai.stanford.edu/~serafim/CS374_2006/presentations/lecture17.ppt

Genome Rearrangement

In the 1980s, Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip. The genes were 99% similar, yet these surprisingly identical gene sequences differed in gene order. This study helped pave the way to analyzing genome rearrangements in molecular evolution.

Original: 1 2 3 4 5 6 7 8 9 10

Inversion: 1 2 -6 -5 -4 -3 7 8 9 10

Transposition: 1 2 7 8 3 4 5 6 9 10

Inverted transposition: 1 2 7 8 -6 -5 -4 -3 9 10
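The three operations can be written as small functions on signed gene orders. This is a minimal sketch; the function names and the half-open index convention are choices made here, not taken from the slides.

```python
from typing import List


def inversion(genome: List[int], i: int, j: int) -> List[int]:
    """Reverse the segment genome[i:j] and flip the sign of every gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome: List[int], i: int, j: int, k: int) -> List[int]:
    """Move the segment genome[i:j] so that it follows genome[j:k]."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome: List[int], i: int, j: int, k: int) -> List[int]:
    """Move the segment genome[i:j] after genome[j:k], inverting it on the way."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


if __name__ == "__main__":
    original = list(range(1, 11))
    print(inversion(original, 2, 6))
    print(transposition(original, 2, 6, 8))
    print(inverted_transposition(original, 2, 6, 8))
```

Applied to the gene order 1 to 10 with the segment 3 4 5 6, these calls reproduce the three rearranged gene orders listed on the slide.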

Genome Median Computation

[Figure: trees over nodes 1 to 6 illustrating median computation]

Genome Median Computation

[Figure: tree over nodes 1 to 6]

Genomes:

1, 2, 3

1, -3, -2

-2, -1, 3

Candidate medians:

1, 2, 3 = 2 moves

2, -1, 3 = 5 moves

…
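For genomes this small, the median can be found by exhaustive search. The sketch below assumes a simplified model in which the only allowed move is a signed inversion, so its move counts need not match every figure on the slide (which may use a different distance). It is a brute-force illustration, not the branch-and-bound or heuristic search used by DCJUC.

```python
from collections import deque
from itertools import permutations, product
from typing import Dict, Iterator, Sequence, Tuple

Genome = Tuple[int, ...]


def inversions_of(g: Genome) -> Iterator[Genome]:
    """Yield every genome reachable from g by one signed inversion."""
    for i in range(len(g)):
        for j in range(i + 1, len(g) + 1):
            yield g[:i] + tuple(-x for x in reversed(g[i:j])) + g[j:]


def distances_from(src: Genome) -> Dict[Genome, int]:
    """Breadth-first search: inversion distance from src to every signed order."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        for nxt in inversions_of(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def best_median(genomes: Sequence[Genome]) -> Tuple[int, Genome]:
    """Return (score, median) minimising the summed distance to all genomes."""
    tables = [distances_from(g) for g in genomes]
    n = len(genomes[0])
    candidates = (
        tuple(s * x for s, x in zip(signs, perm))
        for perm in permutations(range(1, n + 1))
        for signs in product((1, -1), repeat=n)
    )
    return min((sum(t[c] for t in tables), c) for c in candidates)


if __name__ == "__main__":
    print(best_median([(1, 2, 3), (1, -3, -2), (-2, -1, 3)]))
```

Under this inversion-only assumption, the candidate 1, 2, 3 scores 2 moves, matching the slide. The search space grows as 2^g * g!, which is why exhaustive enumeration is only feasible for toy inputs.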

Step 1: Spectral Partition

Step 2: Compute MP Tree for Each Sub-Disk

Step 2-1: How to Compute Median (BNB)

[Figure: eight panels of a graph on vertices 1 to 8, illustrating branch-and-bound steps]

Step 2-2: How to Compute Median (LK)

[Figure: sequence of LK search steps ending at a stop condition]

Step 2-2: How to Evaluate Median

[Figure: median node "med" joined to genomes 1, 2 and 3]

Candidate medians:

1, 2, 3, 3, 4, 6, 5

1, 2, 3, 4, 3, 6, 5

1, 2, 3, 4, 6, 3, 5

1, 2, 5, 4, 6, 3, 3

Score of a candidate m: Dis(m,1) + Dis(m,2) + Dis(m,3)

Step 2-2: How to Evaluate Median

1, 2, 3, 3, 4, 6, 5

1, 2, 3, 4, 3, 5

Find a mapping first (NP-hard): dis = 1

1, 2, 3, 3, 4, 6, 5

-2, -1, 3, 3, 4, 5

Complete the loss (polynomial): dis = 2

1, 2, 3, 4, 6, 5

-2, -1, 3, 4, 6, 5

Compute DCJ (polynomial): dis = 3

1, 2, 3, 4, 6, 5

1, 2, 3, 4, 6, 5
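Only the final, polynomial step lends itself to a short example. The sketch below computes the DCJ distance for two genomes that already have equal content and no duplicates, under the added assumption that each genome is a single circular chromosome. In that case the distance is N - C, where N is the number of genes and C is the number of cycles in the adjacency graph. The mapping and loss-completion steps are not covered, and the function names are illustrative.

```python
from typing import Dict, Sequence, Tuple

Extremity = Tuple[int, str]  # (gene, "t" for tail or "h" for head)


def ends(gene: int) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of a signed gene as it is read."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(genome: Sequence[int]) -> Dict[Extremity, Extremity]:
    """Map each extremity to its neighbour in a circular chromosome."""
    partner: Dict[Extremity, Extremity] = {}
    n = len(genome)
    for idx in range(n):
        _, right = ends(genome[idx])
        left, _ = ends(genome[(idx + 1) % n])
        partner[right] = left
        partner[left] = right
    return partner


def dcj_distance_circular(a: Sequence[int], b: Sequence[int]) -> int:
    """DCJ distance N - C for equal-content circular genomes without duplicates."""
    if sorted(map(abs, a)) != sorted(map(abs, b)) or len(set(map(abs, a))) != len(a):
        raise ValueError("genomes must have equal content and no duplicate genes")
    pa, pb = adjacencies(a), adjacencies(b)
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:  # alternate one edge of A, then one edge of B
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(a) - cycles


if __name__ == "__main__":
    print(dcj_distance_circular([1, 2, 3, 4, 6, 5], [-2, -1, 3, 4, 6, 5]))
```

For the equal-content pair on the slide, 1, 2, 3, 4, 6, 5 and -2, -1, 3, 4, 6, 5, this gives a distance of 1, which is consistent with the cumulative count rising from 2 to 3 in the last step.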

Step 3: Merge Disks

Decomposition of the disks

Construct a tree for each disk

Merge the trees using a specific consensus method: strict, majority, etc.

Disambiguation
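The two consensus rules named above can be sketched on a simplified representation, assuming every input tree is over the same leaf set and is given as a set of clades (each clade a frozenset of leaf names). Merging trees from overlapping disks with different leaf sets, as a supertree method must, is not covered here.

```python
from collections import Counter
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]


def strict_consensus(trees: List[Set[Clade]]) -> Set[Clade]:
    """Keep only the clades that appear in every input tree."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade for clade, k in counts.items() if k == len(trees)}


def majority_consensus(trees: List[Set[Clade]]) -> Set[Clade]:
    """Keep the clades that appear in more than half of the input trees."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade for clade, k in counts.items() if 2 * k > len(trees)}


if __name__ == "__main__":
    trees = [
        {frozenset("AB"), frozenset("ABC")},
        {frozenset("AB"), frozenset("ABD")},
        {frozenset("AB"), frozenset("ABC")},
    ]
    print(strict_consensus(trees))
    print(majority_consensus(trees))
```

Strict consensus is the more conservative choice and tends to leave unresolved nodes, which is presumably what the later disambiguation step addresses.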

Step 4: Initialization

[Figure: two initialization schemes illustrated on small trees]

Init by insertion, which is local.

Init by prospection, which is global.

Step 5: Iterative Refinement

[Figure: tree with leaves 1 to 4 and internal nodes a and b]

Review

• Step 1: Spectral partition

• Step 2: Subtree construction

• Step 3: Supertree merge

• Step 4: Initialization of the complete tree using the General Adequate Subgraph (GAS) method.

• Step 5: Iterative refinement until the complete tree converges.

Result—Simulated Data

seed

#Theta + #gamma + #phi operations

We know the total number of evolution events in the model tree

We grow our own tree
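One way to picture this kind of simulation is to start from a seed genome and apply a known number of random events along a branch. The sketch below is an assumption-laden illustration: the event model (single-gene duplication and loss, uniformly random inversions), the parameter names, and the function `evolve` are all hypothetical, and no mapping between these counts and the slide's Theta, gamma and phi symbols is implied.

```python
import random
from typing import List, Sequence


def evolve(genome: Sequence[int], n_inversions: int, n_duplications: int,
           n_losses: int, seed: int) -> List[int]:
    """Apply the requested numbers of random events, in random order."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    g = list(genome)
    events = ["inv"] * n_inversions + ["dup"] * n_duplications + ["loss"] * n_losses
    rng.shuffle(events)
    for event in events:
        if event == "inv":
            i = rng.randrange(len(g))
            j = rng.randrange(i + 1, len(g) + 1)
            g[i:j] = [-x for x in reversed(g[i:j])]
        elif event == "dup":
            gene = g[rng.randrange(len(g))]
            g.insert(rng.randrange(len(g) + 1), gene)  # copy lands anywhere
        elif len(g) > 1:  # loss: never delete the last remaining gene
            del g[rng.randrange(len(g))]
    return g


if __name__ == "__main__":
    print(evolve(range(1, 11), n_inversions=3, n_duplications=2, n_losses=1, seed=7))
```

Because every applied event is counted, the true number of events on each branch of the model tree is known and can be compared with what the reconstruction reports.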

Result--Accuracy

% of duplication: 0.1

% of loss: 0.1

Theta is the % of inversion

There are 8 species, so 2*8-3 = 13 edges.

So the average accuracy is ~90%.

Result – Real Data

SCRaMbLE Matrix

• We can represent a SCRaMbLEd strain by its vector.

• The sign gives the orientation.

• The color encodes the position in the synthetic chromosome.

Result – Real Data

#inversion:#insertion/deletion:#duplication

Parallel Method [Bader 05]

Parallel search

Load Balancing

Experimental Results (Parallel)

Why Many-core BnB?

• There are already many distributed-memory MIP BnB frameworks (PICO, PEBBL, ALPS, COIN-OR).

• Load balance in distributed BnB relies heavily on ramp-up; run-time load balancing is not efficient.

• But today's petaflop machines are mostly hybrid systems (distributed memory plus many-core processors or accelerators).

Experimental Results (Intel Phi knapsack)
