Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Figs, Wasps, Gophers, and Lice: A Computational Exploration of Coevolution Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College


Page 1: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Figs, Wasps, Gophers, and Lice: A Computational Exploration of Coevolution

Ran Libeskind-Hadas

Department of Computer Science

Harvey Mudd College

Page 2: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

The Cophylogeny Problem

From Hafner MS and Nadler SA, Phylogenetic trees support the coevolution of parasites and their hosts. Nature 1988, 332:258-259

Page 3: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Obligate Mutualism of Figs and Fig Wasps

From Cophylogeny of the Ficus Microcosm, A. Jackson, 2004

ovipositor

Page 4: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Indigobirds and Finches

www.indigobirds.com

• High level of host specificity (e.g. eggs and mouth markings)

Page 5: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Cophylogeny Reconstruction

Host tree

Page 6: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Problem Instance

Host tree

a b c

Parasite tree

d

e

Page 7: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Problem Instance

Host tree

Tip associations

a b c

Parasite tree

d

e

Page 8: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Possible Solutions

a b c

d

e

a b c

d

e

Input

Page 9: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Event Cost Model: cospeciation

[Figure: two panels showing the host tree (tips a, b, c) and the parasite tree (tips d, e), with the cospeciation events labeled]

Page 10: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Event Cost Model: duplication

[Figure: two panels showing the host tree (tips a, b, c) and the parasite tree (tips d, e), with the duplication event labeled]

Page 11: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Event Cost Model: host switch

[Figure: two panels showing the host tree (tips a, b, c) and the parasite tree (tips d, e), with the host switch events labeled]

Page 12: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Event Cost Model: loss

[Figure: two panels showing the host tree (tips a, b, c) and the parasite tree (tips d, e), with the loss events labeled]

Page 13: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Event Cost Model

[Figure: two reconciliations of the parasite tree (tips d, e) with the host tree (tips a, b, c), with each event labeled as a cospeciation, duplication, host switch, or loss]

Cost = duplication + cospeciation + 3 * loss

Cost = cospeciation + host switch + loss

Page 14: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Some typical costs

cospeciation: + 0

duplication: + 2

loss: + 2

host switch: + 3

First solution: duplication + cospeciation + 3 * loss = 2 + 0 + 3 * 2, so Cost = 8

Second solution: cospeciation + host switch + loss = 0 + 3 + 2, so Cost = 5
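The cost arithmetic on this slide can be checked with a few lines of code. This is only a sketch: it assumes the per-event costs the slide appears to use (cospeciation 0, duplication 2, loss 2, host switch 3), and the names `EVENT_COSTS` and `reconciliation_cost` are made up for illustration.

```python
from collections import Counter

# Per-event costs as they appear to be used on the slide (an assumption
# reconstructed from the slide's "+ 0", "+ 2", "+ 3" labels).
EVENT_COSTS = {"cospeciation": 0, "duplication": 2, "loss": 2, "host switch": 3}


def reconciliation_cost(events, costs=EVENT_COSTS):
    """Add up the cost of every event used by one candidate solution."""
    counts = Counter(events)
    return sum(costs[name] * count for name, count in counts.items())


# The two candidate solutions, described only by the events they use.
solution_1 = ["duplication", "cospeciation", "loss", "loss", "loss"]
solution_2 = ["cospeciation", "host switch", "loss"]

print(reconciliation_cost(solution_1))  # 8
print(reconciliation_cost(solution_2))  # 5
```

Changing the values in `EVENT_COSTS` can change which solution is cheaper, which is why the choice of event costs matters.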

Page 15: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

How hard is this problem?

• If host switches are not permitted, we can find optimal solutions in “next-to-no-time” (time proportional to the number of nodes in the trees)…

• … but host switches shouldn’t be ignored – they are quite common…

• … and with host switches, this problem is computationally hard. How hard?

• Let’s take a short aside on “hardness”…

Page 16: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Snowplows of Northern Minnesota

Burrsburg

Frostbite City

Shiversville

Tundratown

Freezeapolis

A Short Aside on “Hard” Problems

Page 17: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

“Hard” Problems

Snowplows of Northern Minnesota

Burrsburg

Frostbite City

Shiversville

Tundratown

Freezeapolis

Greed? Brute Force?

Page 18: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

“Greed” isn’t always good!

Temptingville

A

B

C

D

E

F

Page 19: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

“Hard” Problems

The Travelling Salesperson Problem

New York

Moscow

Paris

San Francisco

Claremont

[Figure: map with edges between the cities labeled with distances 242, 1942, 742, 1342, 2142, 442, and 2642]

Brute Force? Greed?

Page 20: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

“Hard” Problems

The Travelling Salesperson Problem

[Figure: four cities (Claremont, Montclare, Clearmont, Montclear) joined by edges of length 1, 1, 1, 2, and 2]

Page 21: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

“Hard” Problems

The Travelling Salesperson Problem

[Figure: four cities (Claremont, Montclare, Clearmont, Montclear) joined by edges of length 1, 1, 1, 2, 2, and 1042]
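A small sketch of why greed can fail on this kind of instance. The slide gives the edge lengths (1, 1, 1, 2, 2, and 1042) but the transcript does not say which pair of cities gets which length, so the layout in `DIST` is an assumption, and the function names are illustrative only. The greedy rule used here is "always go to the nearest unvisited city".

```python
from itertools import permutations

# Assumed assignment of the slide's edge lengths to city pairs (hypothetical layout).
DIST = {
    frozenset({"Claremont", "Montclare"}): 1,
    frozenset({"Montclare", "Montclear"}): 1,
    frozenset({"Montclear", "Clearmont"}): 1,
    frozenset({"Claremont", "Montclear"}): 2,
    frozenset({"Montclare", "Clearmont"}): 2,
    frozenset({"Clearmont", "Claremont"}): 1042,
}
CITIES = ["Claremont", "Montclare", "Montclear", "Clearmont"]


def dist(a, b):
    return DIST[frozenset({a, b})]


def tour_length(tour):
    """Length of the closed tour, including the trip back to the start."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def greedy_tour(start):
    """Repeatedly move to the nearest city that has not been visited yet."""
    tour = [start]
    unvisited = set(CITIES) - {start}
    while unvisited:
        nearest = min(unvisited, key=lambda city: dist(tour[-1], city))
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour


def brute_force_tour(start):
    """Try every ordering of the remaining cities and keep the shortest tour."""
    others = [city for city in CITIES if city != start]
    return min(([start, *order] for order in permutations(others)), key=tour_length)


print(tour_length(greedy_tour("Claremont")))       # 1045
print(tour_length(brute_force_tour("Claremont")))  # 6
```

Under this layout greed takes the three edges of length 1 and is then forced to return along the edge of length 1042. Brute force finds the best tour, but it examines every ordering, which grows factorially with the number of cities.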

Page 22: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

n^2 versus 2^n

The Fast-O-Matic performs 10^9 operations/sec

          n = 10          n = 30          n = 50           n = 70
n^2       100, < 1 sec    900, < 1 sec    2500, < 1 sec    4900, < 1 sec
2^n       1024, < 1 sec   10^9, 1 sec

Page 23: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

n^2 versus 2^n

The Fast-O-Matic performs 10^9 operations/sec

          n = 10          n = 30          n = 50           n = 70
n^2       100, < 1 sec    900, < 1 sec    2500, < 1 sec    4900, < 1 sec
2^n       1024, < 1 sec   10^9, 1 sec     10^15, 13 days

Page 24: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

n^2 versus 2^n

The Fast-O-Matic performs 10^9 operations/sec

          n = 10          n = 30          n = 50           n = 70
n^2       100, < 1 sec    900, < 1 sec    2500, < 1 sec    4900, < 1 sec
2^n       1024, < 1 sec   10^9, 1 sec     10^15, 13 days   10^21, about 37 thousand years

Page 25: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

n^2 versus 2^n

The Fast-O-Matic performs 10^9 operations/sec

          n = 10          n = 30          n = 50           n = 70
n^2       100, < 1 sec    900, < 1 sec    2500, < 1 sec    4900, < 1 sec
2^n       1024, < 1 sec   10^9, 1 sec     10^15, 13 days   10^21, about 37 thousand years

Computers double in speed every 2 years. Let’s just wait 10 years! About 37 thousand years ->

Page 26: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

n^2 versus 2^n

The Fast-O-Matic performs 10^9 operations/sec

          n = 10          n = 30          n = 50           n = 70
n^2       100, < 1 sec    900, < 1 sec    2500, < 1 sec    4900, < 1 sec
2^n       1024, < 1 sec   10^9, 1 sec     10^15, 13 days   10^21, about 37 thousand years

Computers double in speed every 2 years. Let’s just wait 10 years! About 37 thousand years ->

about 1,200 years!
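The running times in this comparison can be recomputed directly. The sketch below assumes the slide's machine speed of 10^9 operations per second, a year of 365.25 days, and a speed doubling every 2 years. The function names are illustrative.

```python
OPS_PER_SEC = 10**9                    # Fast-O-Matic speed, from the slide
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def seconds_needed(operations, ops_per_sec=OPS_PER_SEC):
    """Time to perform the given number of operations at a fixed speed."""
    return operations / ops_per_sec


def speedup_after(years, doubling_period=2):
    """Speed-up factor if machine speed doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


for n in (10, 30, 50, 70):
    poly = seconds_needed(n**2)
    expo = seconds_needed(2**n)
    print(f"n = {n}: n^2 takes {poly:.1e} s; 2^n takes {expo:.3g} s "
          f"= {expo / SECONDS_PER_DAY:.3g} days = {expo / SECONDS_PER_YEAR:.3g} years")

# Waiting 10 years buys five doublings, that is, a factor of 32.
years_now = seconds_needed(2**70) / SECONDS_PER_YEAR
print(f"n = 70 after a 10-year wait: {years_now / speedup_after(10):.3g} years")
```

Under these assumptions 2^70 operations take on the order of 10^12 seconds, and a 32-fold faster machine still needs more than a thousand years, while n^2 stays far below one second throughout.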

Page 27: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Snowplows and Travelling Salesperson Revisited!

Travelling Salesperson Problem

Snowplow Problem

Protein Folding

NP-complete problems

Tens of thousands of other known problems go in this cloud!!

Cophylogeny Problem!

Page 28: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

“I can’t find an efficient algorithm. I guess I’m too dumb.”

Cartoon from “Computers and Intractability: A Guide to the Theory of NP-completeness” by M. Garey and D. Johnson

Page 29: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Cartoon from “Computers and Intractability: A Guide to the Theory of NP-completeness” by M. Garey and D. Johnson

“I can’t find an efficient algorithm because no such algorithm is possible!”

Page 30: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Cartoon from “Computers and Intractability: A Guide to the Theory of NP-completeness” by M. Garey and D. Johnson

“I can’t find an efficient algorithm, but neither can all these famous people.”

Page 31: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

$1 million

Page 32: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Coping with NP-completeness…

• Brute force

• Ad hoc heuristics

• Meta-heuristics

• Approximation algorithms

Page 33: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

A Meta-heuristic Approach

• Fix a timing for the host tree: a relative ordering of the speciation events

• All host switches occur “horizontally” in time

• We can solve the problem optimally for a given timing using Dynamic Programming

Page 34: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Genetic Algorithm

• Host tree and three different possible orderings of the speciation events.
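To make the idea of a "timing" concrete, the sketch below lists every ordering of a host tree's speciation events in which each event comes before the events beneath it. The tree `HOST_TREE` is a made-up example (not the one pictured on the slide), chosen so that it has exactly three valid orderings, and the function name `timings` is illustrative. This is not Jane's implementation.

```python
from itertools import permutations

# Hypothetical host tree, written as internal node -> children.
# r, u, v, w are speciation events; a, b, c, d, e are tips.
HOST_TREE = {
    "r": ["u", "v"],
    "u": ["w", "c"],
    "w": ["a", "b"],
    "v": ["d", "e"],
}


def timings(tree):
    """Return every ordering of the internal nodes in which each parent
    precedes its internal children (tips carry no speciation event)."""
    internal = list(tree)
    valid = []
    for order in permutations(internal):
        position = {node: i for i, node in enumerate(order)}
        if all(
            position[parent] < position[child]
            for parent, children in tree.items()
            for child in children
            if child in tree
        ):
            valid.append(order)
    return valid


for order in timings(HOST_TREE):
    print(" < ".join(order))
```

Trying all permutations is only practical for tiny trees. The number of valid timings grows quickly with tree size, which is one reason to search over timings with a meta-heuristic such as a genetic algorithm rather than enumerate them.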

Page 35: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Jane 2.0 (available at www.cs.hmc.edu/~hadas/jane)

Page 36: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

What Jane does…

Gopher/Louse pair… 8 tips on gopher tree, 10 tips on louse tree

Best solutions found are listed here… along with total cost

Page 37: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

But perhaps those “seemingly good” solutions of cost 11 are no better than random…

In “Stats” mode, we can generate random tip mappings or entirely random parasite trees.

Here, we ran 50 trials with random tip mappings.

The red dashed line shows the best solution found for our original dataset and the blue histogram shows the costs for the 50 random trials. In this case, none of the random trials resulted in solutions of cost 11 or less!
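The randomization idea behind this kind of test can be sketched as follows. This is not Jane's code: `solve` stands for whatever solver returns the best cost for a given tip mapping, `toy_solver` is a placeholder used only so the example runs, and assigning each parasite tip to an independently chosen host tip is an assumption about how the random mappings are drawn.

```python
import random


def random_tip_mapping(parasite_tips, host_tips, rng):
    """Assign each parasite tip to a host tip chosen uniformly at random (assumed scheme)."""
    return {parasite: rng.choice(host_tips) for parasite in parasite_tips}


def randomization_test(observed_cost, solve, parasite_tips, host_tips, trials=50, seed=0):
    """Solve `trials` randomized instances and report the fraction whose
    best cost is at least as good as (no greater than) the observed cost."""
    rng = random.Random(seed)
    costs = [
        solve(random_tip_mapping(parasite_tips, host_tips, rng))
        for _ in range(trials)
    ]
    as_good = sum(1 for cost in costs if cost <= observed_cost)
    return costs, as_good / trials


def toy_solver(mapping):
    """Placeholder cost function, NOT a cophylogeny solver: it only lets the sketch run."""
    return 12 + len(set(mapping.values()))


hosts = [f"gopher{i}" for i in range(8)]      # 8 host tips, as in the example
parasites = [f"louse{i}" for i in range(10)]  # 10 parasite tips, as in the example
costs, fraction = randomization_test(11, toy_solver, parasites, hosts)
print(len(costs), fraction)
```

A small fraction suggests that the cost found for the real data is unlikely to arise from random associations. With only 50 trials the estimate is coarse, so more trials give a finer-grained answer.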

Page 38: Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College.

Jane Demo!