Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Post on 01-Jan-2016


Figs, Wasps, Gophers, and Lice: A Computational Exploration of Coevolution

Ran Libeskind-Hadas
Department of Computer Science
Harvey Mudd College

The Cophylogeny Problem

From Hafner MS and Nadler SA, Phylogenetic trees support the coevolution of parasites and their hosts. Nature 1988, 332:258-259

Obligate Mutualism of Figs and Fig Wasps

From Cophylogeny of the Ficus Microcosm, A. Jackson, 2004

ovipositor

Indigobirds and Finches

www.indigobirds.com

• High level of host specificity (e.g. eggs and mouth markings)

Cophylogeny Reconstruction

Problem Instance

[Figure: a host tree, a parasite tree, and the tip associations between them; nodes are labeled a, b, c, d, e]

Possible Solutions

[Figure: two possible mappings of the parasite tree onto the host tree]

Event Cost Model

Each solution is built from four types of events:

• cospeciation
• duplication
• host switch
• loss

[Figure: the two possible solutions, each annotated with the events it uses]

Cost = duplication + cospeciation + 3 * loss

Cost = cospeciation + host switch + loss

Some typical costs

Event          Cost
cospeciation   0
duplication    2
host switch    3
loss           2

First solution:  duplication + cospeciation + 3 * loss = 2 + 0 + 3 * 2 = 8
Second solution: cospeciation + host switch + loss = 0 + 3 + 2 = 5
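The two totals can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the name `solution_cost` are illustrative assumptions, while the event costs (cospeciation 0, duplication 2, host switch 3, loss 2) and the event counts are the ones shown on the slide.

```python
# Event costs as read off the slide; the data layout is an assumption.
COSTS = {"cospeciation": 0, "duplication": 2, "host_switch": 3, "loss": 2}

def solution_cost(event_counts, costs=COSTS):
    """Total cost of a solution: sum of (cost of event) * (number of times it occurs)."""
    return sum(costs[event] * count for event, count in event_counts.items())

# First solution: duplication + cospeciation + 3 * loss
first = {"duplication": 1, "cospeciation": 1, "loss": 3}
# Second solution: cospeciation + host switch + loss
second = {"cospeciation": 1, "host_switch": 1, "loss": 1}

print(solution_cost(first), solution_cost(second))  # prints: 8 5
```

Changing the entries of `COSTS` changes which solution is cheaper, which is why the event costs are part of the problem input.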

How hard is this problem?

• If host switches are not permitted, we can find optimal solutions in “next-to-no-time” (time proportional to the number of nodes in the trees)…

• … but host switches shouldn’t be ignored – they are quite common…

• … and with host switches, this problem is computationally hard. How hard?

• Let’s take a short aside on “hardness”…

A Short Aside on “Hard” Problems

Snowplows of Northern Minnesota

[Figure: road map connecting Burrsburg, Frostbite City, Shiversville, Tundratown, and Freezeapolis]

Greed? Brute Force?

“Greed” isn’t always good!

[Figure: road map of Temptingville with locations A through F]

“Hard” Problems

The Travelling Salesperson Problem

[Figure: New York, Moscow, Paris, San Francisco, and Claremont, connected by routes labeled with travel costs]

Brute Force? Greed?

“Hard” Problems

The Travelling Salesperson Problem

[Figure: four cities, Claremont, Montclare, Clearmont, and Montclear, joined by routes of cost 1, 1, 1, 2, 2, and 1042]
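A small sketch makes the contrast between greed and brute force concrete. The six route costs (1, 1, 1, 2, 2, 1042) are the ones on the slide, but which pair of cities gets which cost is an assumption made here for illustration, as are all function names. Greedy always drives to the nearest unvisited city; brute force tries every ordering.

```python
from itertools import permutations

# ASSUMPTION: the pairing of costs to routes is a guess; only the six
# cost values themselves come from the slide.
ROADS = {
    ("Claremont", "Montclare"): 1,
    ("Montclare", "Clearmont"): 1,
    ("Clearmont", "Montclear"): 1,
    ("Claremont", "Clearmont"): 2,
    ("Montclare", "Montclear"): 2,
    ("Claremont", "Montclear"): 1042,
}
DIST = {frozenset(pair): cost for pair, cost in ROADS.items()}
CITIES = ["Claremont", "Montclare", "Clearmont", "Montclear"]

def road_cost(a, b):
    return DIST[frozenset((a, b))]

def tour_cost(tour):
    """Cost of visiting the cities in order and returning to the start."""
    return sum(road_cost(tour[i], tour[(i + 1) % len(tour)])
               for i in range(len(tour)))

def greedy_tour(start):
    """Nearest-neighbour heuristic: always go to the closest unvisited city."""
    tour = [start]
    unvisited = [c for c in CITIES if c != start]
    while unvisited:
        nearest = min(unvisited, key=lambda c: road_cost(tour[-1], c))
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour

def brute_force_tour(start):
    """Try every ordering of the remaining cities: (n - 1)! tours."""
    others = [c for c in CITIES if c != start]
    return min(([start, *p] for p in permutations(others)), key=tour_cost)

greedy = greedy_tour("Claremont")
best = brute_force_tour("Claremont")
print(tour_cost(greedy), tour_cost(best))  # prints: 1045 6
```

Under this assignment greedy is lured along the three cheap routes and is then forced to close the tour with the 1042 route. Brute force finds the tour of cost 6, but the number of tours it examines grows factorially with the number of cities.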

n^2 versus 2^n

The Fast-O-Matic performs 10^9 operations/sec

         n = 10       n = 30       n = 50       n = 70
n^2      100          900          2500         4900
         < 1 sec      < 1 sec      < 1 sec      < 1 sec
2^n      1024         10^9         10^15        10^21
         < 1 sec      1 sec        13 days      about 37,000 years

Computers double in speed every 2 years. Let’s just wait 10 years!

About 37,000 years -> still more than 1,000 years!
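The running times in this comparison come from dividing the number of operations by the machine speed. The sketch below redoes that arithmetic; the function names are illustrative assumptions, and the only figures taken from the slides are the 10^9 operations/sec speed and the doubling of speed every 2 years.

```python
OPS_PER_SEC = 10**9                 # Fast-O-Matic speed from the slide
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def seconds_needed(operations, ops_per_sec=OPS_PER_SEC):
    """Running time in seconds for a given number of operations."""
    return operations / ops_per_sec

def speedup_after(years, doubling_period=2):
    """Speed-up factor if machines double in speed every `doubling_period` years."""
    return 2 ** (years / doubling_period)

for n in (10, 30, 50, 70):
    quadratic = seconds_needed(n**2)
    exponential = seconds_needed(2**n)
    print(f"n = {n}: n^2 takes {quadratic:.1e} s, "
          f"2^n takes {exponential:.1e} s "
          f"({exponential / SECONDS_PER_DAY:.1e} days, "
          f"{exponential / SECONDS_PER_YEAR:.1e} years)")

# Waiting 10 years only buys a constant factor.
print("speed-up after 10 years:", speedup_after(10))
```

A constant-factor speed-up shifts the exponential curve by only a few values of n, so faster hardware alone does not rescue an exponential-time algorithm.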

Snowplows and Travelling Salesperson Revisited!

Travelling Salesperson Problem

Snowplow Problem

Protein Folding

NP-complete problems

Tens of thousands of other known problems go in this cloud!!

Cophylogeny Problem!

“I can’t find an efficient algorithm. I guess I’m too dumb.”

“I can’t find an efficient algorithm because no such algorithm is possible!”

“I can’t find an efficient algorithm, but neither can all these famous people.”

Cartoons from “Computers and Intractability: A Guide to the Theory of NP-completeness” by M. Garey and D. Johnson

$1 million

Coping with NP-completeness…

• Brute force
• Ad hoc heuristics
• Meta-heuristics
• Approximation algorithms

A Meta-heuristic Approach

• Fix a timing for the host tree: a relative ordering of the speciation events
• All host switches occur “horizontally” in time
• We can solve the problem optimally for a given timing using Dynamic Programming

Genetic Algorithm

• Host tree and three different possible orderings of the speciation events.
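To make the idea of a timing concrete, the sketch below enumerates every ordering of a host tree's speciation events (its internal nodes) in which each ancestor comes before its descendants. The tree, its node names, and the function `timings` are hypothetical illustrations, not taken from the slides or from Jane; the tree was chosen so that it happens to have exactly three timings.

```python
def timings(tree, root):
    """Yield every ordering of the internal nodes of `tree` in which each
    parent appears before its children.

    `tree` maps each internal node to a tuple of its children; tips do not
    appear as keys.
    """
    internal = set(tree)

    def extend(order, available):
        # `available` holds internal nodes whose parent is already placed.
        if not available:
            yield list(order)
            return
        for node in sorted(available):
            newly_available = {c for c in tree[node] if c in internal}
            yield from extend(order + [node],
                              (available - {node}) | newly_available)

    yield from extend([], {root})

# Hypothetical host tree with internal nodes r, x, y, z and tips a..e.
HOST_TREE = {
    "r": ("x", "y"),
    "x": ("a", "b"),
    "y": ("c", "z"),
    "z": ("d", "e"),
}

for order in timings(HOST_TREE, "r"):
    print(" < ".join(order))
```

The number of timings can grow very quickly with the size of the tree, which is why a search procedure such as a genetic algorithm is used to explore them rather than trying them all.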

Jane 2.0 (available at www.cs.hmc.edu/~hadas/jane)

What Jane does…

Gopher/Louse pair… 8 tips on gopher tree, 10 tips on louse tree

Best solutions found are listed here… along with total cost

But perhaps those “seemingly good” solutions of cost 11 are no better than random…

In “Stats” mode, we can generate random tip mappings or entirely random parasite trees.

Here, we ran 50 trials with random tip mappings.

The red dashed line shows the best solution found to our original dataset and the blue histogram shows the costs for the 50 random trials. In this case, none of the random trials resulted in solutions of cost 11 or less!
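The comparison against random trials can be sketched as follows. This is not Jane's code: the function names are hypothetical, and assigning each parasite tip an independently chosen host tip is only one plausible way to randomize a tip mapping. Solving each randomized instance is left to the solver and is not shown.

```python
import random

def random_tip_mapping(parasite_tips, host_tips, rng):
    """ASSUMPTION: one simple randomization, in which each parasite tip is
    assigned a host tip chosen uniformly at random."""
    return {p: rng.choice(host_tips) for p in parasite_tips}

def fraction_at_most(observed_cost, random_costs):
    """Fraction of random trials whose best cost is no worse than the
    cost found for the original dataset."""
    hits = sum(cost <= observed_cost for cost in random_costs)
    return hits / len(random_costs)

# Usage: randomize an instance with 10 parasite tips and 8 host tips.
rng = random.Random(0)
parasites = [f"louse{i}" for i in range(10)]
hosts = [f"gopher{i}" for i in range(8)]
mapping = random_tip_mapping(parasites, hosts, rng)
print(mapping)
```

If none of the 50 random trials reaches cost 11 or less, `fraction_at_most(11, costs)` is 0.0, which is the situation the histogram shows.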

Jane Demo!