Beiko cms final

Robert Beiko When trees can’t agree

Transcript of Beiko cms final

Page 1: Beiko cms final

Robert Beiko

When trees can’t agree

Page 2: Beiko cms final

- The human microbiome: an ecosystem unlike any other

Page 3: Beiko cms final

Human gut microbiome: 2-3 million genes

Typically > 160 “species” at any given time

Human: ~25,000 genes

Qin et al., Nature (2010)

Page 4: Beiko cms final

Microbial communities

http://upload.wikimedia.org/wikipedia/commons/2/2d/Bacteria_%28251_31%29_Airborne_microbes.jpg

Page 5: Beiko cms final

Photo courtesy of Emma Allen-Vercoe, University of Guelph

Lachnospiraceae bacterium 3-1-57 CT1 “Lachnozilla”

Page 6: Beiko cms final

Meehan and Beiko (2014) Genome Biol Evol

Lachno

Lachnospiraceae – commonly thought of as “Good bacteria”

Page 7: Beiko cms final

[Figure: Sizes of Assembly and Draft Genomes of Class Clostridia. Axis: Number of Protein-Coding Genes, 0 to 8000. “Zilla” is labeled.]

Page 8: Beiko cms final

?

Page 9: Beiko cms final

50

33

4

?

Page 10: Beiko cms final

W. Ford Doolittle, Sci Am (1999)

Page 11: Beiko cms final

PNAS, 2012

“…pathogen-driven inflammatory responses in the gut can generate transient enterobacterial blooms in which conjugative transfer occurs at unprecedented rates.”

PLoS Biol, 2007

“…lateral gene transfer, mobile elements, and gene amplification have played important roles in affecting the ability of gut-dwelling Bacteroidetes to vary their cell surface, sense their environment, and harvest nutrient resources present in the distal intestine.”

Gene transfer matters

Page 12: Beiko cms final

The genomics toolkit: Gene profiles

Gene 1 Gene 2 Gene 3 Gene 4 Gene 5

Page 13: Beiko cms final

The genomics toolkit: “Species” trees

Page 14: Beiko cms final

The genomics toolkit: Gene trees

Do this for ALL genes

Page 15: Beiko cms final

Representing and understanding microbial relationships

1. Matrix-based approaches

2. Phylogenetic reconciliation

3. Gene distributions and “microbial identity”

Page 16: Beiko cms final

1. The tyranny of distance

Page 17: Beiko cms final

From profile to distance matrix

Gene 1 Gene 2 Gene 3 Gene 4 Gene n

A

B

C

D

E

F

Per-gene similarities S_g between A and B: 0.91, 0.82, 0.72, 0.89

d_{A,B} = 1.0 - \frac{1}{n} \sum_{g=1}^{n} S_g

     A      B      C
A    0      0.165  0.252
B    0.165  0      0.297
C    0.252  0.297  0
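
As a quick check of the formula on this slide, here is a minimal sketch that turns a list of per-gene similarity scores into a distance. The function name `profile_distance` is made up for illustration; the four scores are the ones shown for the A/B pair.

```python
def profile_distance(similarities):
    """Distance between two genomes: 1 minus the mean per-gene similarity."""
    if not similarities:
        raise ValueError("need at least one per-gene similarity score")
    return 1.0 - sum(similarities) / len(similarities)


# Per-gene similarities between genomes A and B, as listed on the slide.
s_ab = [0.91, 0.82, 0.72, 0.89]
d_ab = profile_distance(s_ab)
print(round(d_ab, 3))
```

The rounded result agrees with the A/B entry of the distance matrix above (0.165).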

Page 18: Beiko cms final

Neighbor-joining

Start with a ‘star’ tree

At each iteration, split off the pair of taxa that minimizes the total sum of branch lengths in the tree

Choose groups x and y to minimize the Q-criterion, which combines the distance matrix entry for (x, y) with the weighted distance from x and from y to all leaves

Page 19: Beiko cms final

Continue until binary tree is obtained

Saitou and Nei (1987)
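
To make the pair-selection step concrete, the sketch below uses the commonly cited form of the Q-criterion, Q(x, y) = (n - 2) d(x, y) - Σ_k d(x, k) - Σ_k d(y, k); the slide's own notation may have differed. The five-taxon distance matrix and the function names are illustrative assumptions, not data from the talk.

```python
from itertools import combinations


def q_criterion(labels, dist):
    """Return Q(x, y) for every pair, using the common neighbor-joining form."""
    n = len(labels)
    # Sum of distances from each taxon to all other taxa.
    totals = {
        x: sum(dist[frozenset((x, k))] for k in labels if k != x)
        for x in labels
    }
    return {
        (x, y): (n - 2) * dist[frozenset((x, y))] - totals[x] - totals[y]
        for x, y in combinations(labels, 2)
    }


def pick_neighbors(labels, dist):
    """Pick the pair with the smallest Q value (the pair to join next)."""
    q = q_criterion(labels, dist)
    return min(q, key=q.get)


# Illustrative distances only (not from the slides).
labels = ["a", "b", "c", "d", "e"]
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(p): v for p, v in pairs.items()}

print(pick_neighbors(labels, dist))
```

Note that the pair chosen here is not the pair with the smallest raw distance (d and e): the correction terms account for how far each taxon is from everything else. A full neighbor-joining run would then replace the joined pair with a new node, recompute distances, and repeat.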

Page 20: Beiko cms final

Neighbor-net: Building a splits graph

Bryant and Moulton, Mol Biol Evol (2003)

Page 21: Beiko cms final

Neighbor-net is guaranteed to produce a circular set of splits

This will produce a planar graph

Page 22: Beiko cms final

Neighbor-net of 298 microbial genomes

Beiko, Biol Direct (2011)

Page 23: Beiko cms final

Limitations of neighbor-net

• Neighbor-net still imposes a constraint on the relationships among genomes: “long-distance” connections cannot be shown

?

Page 24: Beiko cms final

Explicit connections between genomes

• Make each genome a vertex in a graph G

V = {A,B,C,D,E,F,…}

E = {{A,B},…}

For some threshold t: {A,B} ∈ E iff d_{A,B} ≤ t, or if some other condition is satisfied

[Diagram: vertices A and B joined by an edge with weight w_{A,B}]
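
A minimal sketch of the threshold rule, assuming the three-genome distance matrix from the earlier slide and an arbitrary threshold t = 0.26 chosen for illustration. Using the distance itself as the edge weight is also an assumption; the slide does not say how w_{A,B} is defined.

```python
from itertools import combinations


def threshold_graph(dist, t):
    """Build (vertices, weighted edges) keeping pairs with distance <= t."""
    vertices = sorted(dist)
    edges = {}
    for a, b in combinations(vertices, 2):
        if dist[a][b] <= t:
            # Assumption: the edge weight is simply the pairwise distance.
            edges[(a, b)] = dist[a][b]
    return vertices, edges


dist = {
    "A": {"A": 0.0, "B": 0.165, "C": 0.252},
    "B": {"A": 0.165, "B": 0.0, "C": 0.297},
    "C": {"A": 0.252, "B": 0.297, "C": 0.0},
}

vertices, edges = threshold_graph(dist, t=0.26)
print(vertices, edges)
```

With this threshold the B/C pair (0.297) is left unconnected, while A keeps edges to both B and C.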

Page 25: Beiko cms final

Linear programming

• Weighting networks based on straight genome-genome similarity highlights close relatives, redundancy

• LP introduces weighting scheme that constrains connections and promotes distinct relationships

Page 26: Beiko cms final

P. aeruginosa, P. fluorescens, P. lePewtida, P. syringae, P. entomophila, P. stutzeri, P. mendocina

Holloway and Beiko, BMC Evol Biol (2010)

“Plume”

Page 27: Beiko cms final

Some like it hot

Pyrococcus furiosus optimal growth temperature: 100°C

Page 28: Beiko cms final

Kunin et al. (2005) Genome Res

Networks

Page 29: Beiko cms final

Networks!!!!

Dagan et al. (2008) PNAS

Page 30: Beiko cms final

2. Inferring and comparing trees

Page 31: Beiko cms final

Phylogenetic tree reconciliation

Species tree S, Gene tree G

Lateral gene transfer

Subtree prune and regraft

Whidden et al., Syst Biol (2014)

Page 32: Beiko cms final

For two rooted trees, dSPR (the SPR distance) is equal to the number of components in a maximum agreement forest (MAF), minus 1

So building a MAF is equivalent to inferring the minimum number of SPR events needed to reconcile a species tree with a gene tree

Problem is NP-hard

dSPR = 1

MAF components = 2

Bordewich and Semple, Ann Combinatorics (2005)

Page 33: Beiko cms final

T1 T2

Case 1 (separate components)

Case 2 (one pendant node)

Case 3 (several pendant nodes)

Chris’s algorithm

Page 34: Beiko cms final

Fixed-parameter tractability

• Problem is dominated by Case 3 (3 alternatives)

• Cutting all candidate edges at each step gives a linear-time 3-approximation

• Decision problem: to decide if SPR distance ≤ k

• Problem is exponential in SPR distance, NOT number of leaves

therefore FPT

Chris Whidden + Norbert Zeh
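
One hedged way to read the bullets above is as a bounded search tree: if every step branches into at most three alternatives (Case 3) and each branch spends one unit of the budget k, the search tree has depth at most k. The polynomial p(n) below is an assumed placeholder for the per-node work on trees with n leaves; it is not a figure from the slides.

```latex
\[
  \#\,\text{search-tree leaves} \;\le\; 3^{k},
  \qquad
  \text{running time} \;\le\; 3^{k} \cdot p(n)
\]
```

The exponential factor depends only on the SPR distance k, which is what makes the decision problem fixed-parameter tractable.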

Page 35: Beiko cms final

In practice

Page 36: Beiko cms final

SPR Supertrees

Supertree: a tree that satisfies some optimality criterion with respect to a set of input trees

SPR supertree: given a set of gene trees, find a tree that minimizes the total number of SPR operations vs. all gene trees

Building an SPR supertree: assemble an initial tree, then repeatedly propose SPR rearrangements and evaluate each candidate's total SPR distance from the input trees

Whidden et al., 2014

Page 37: Beiko cms final

Why SPR supertrees?

1. Explicit representation of LGT events

2. Branches broken in MAF → implied LGT events. Can build graph of connections

Page 38: Beiko cms final

244 bacterial genomes

40,631 gene trees

= Bacterial SPR supertree

LGT patterns for Clostridium

Whidden et al., 2014

Page 39: Beiko cms final

(taming in progress) http://en.wikipedia.org/wiki/File:Godzilla_%2754_design.jpg

3. Taming Lachnozilla

Page 40: Beiko cms final

What makes LachnoZilla LachnoZilla?

Page 41: Beiko cms final

C. difficile….

“Virulence-associated protein”

Mobile DNA

Phylogenetic profile based on extremely good matches to other genomes (> 95% ID, > 95% coverage)

= “recent” LGT events
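
The filtering step behind such a profile can be sketched as follows. The hit records, field names, and genome labels are hypothetical; only the two thresholds (> 95% identity, > 95% coverage) come from the slide.

```python
def recent_lgt_profile(hits, min_identity=95.0, min_coverage=95.0):
    """Map each gene to the set of other genomes with a near-identical match."""
    profile = {}
    for hit in hits:
        # Keep only matches strictly above both thresholds, as on the slide.
        if hit["identity"] > min_identity and hit["coverage"] > min_coverage:
            profile.setdefault(hit["gene"], set()).add(hit["genome"])
    return profile


# Hypothetical sequence-search hits for two genes (made-up values).
hits = [
    {"gene": "gene_1", "genome": "genome_X", "identity": 99.1, "coverage": 98.0},
    {"gene": "gene_1", "genome": "genome_Y", "identity": 88.0, "coverage": 99.0},
    {"gene": "gene_2", "genome": "genome_Y", "identity": 97.5, "coverage": 96.2},
    {"gene": "gene_2", "genome": "genome_Z", "identity": 96.0, "coverage": 80.0},
]

print(recent_lgt_profile(hits))
```

Each surviving gene/genome pair is a candidate “recent” sharing event; hits that pass only one of the two thresholds are dropped.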

Page 42: Beiko cms final

LZ & friends

279 genomes

Conserved marker-gene tree

Ben Wright

Page 43: Beiko cms final

LachnoZilla (and friends) genome graph

!

Page 44: Beiko cms final

Close relative (expected)

Page 45: Beiko cms final

Distant relative (not so expected) (big genome though!)

Page 46: Beiko cms final

Selective sharing

Page 47: Beiko cms final

Gene-centric graphs

LZ   Genome 1   Genome 2   Genome 3   Genome 4   Genome 5   Genome 6

Gene 1

× ×

Gene 2

×

Gene 3

× ×

Gene 4

× × ×

Edge weights are proportional to similarity of distribution

Use graph clustering to divide up completely connected, weighted graph

[Graph nodes: Gene 1, Gene 2, Gene 3, Gene 4]
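
A minimal sketch of how such edge weights might be computed, assuming Jaccard similarity between the sets of genomes in which each gene occurs. The slide does not name the similarity measure, and the genome assignments below are invented because the original column positions are not recoverable; only the number of marks per gene follows the table.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_graph(distribution):
    """Weight every gene pair by the similarity of their genome distributions."""
    return {
        (g, h): jaccard(distribution[g], distribution[h])
        for g, h in combinations(sorted(distribution), 2)
    }


# Hypothetical presence sets: which other genomes share each LZ gene.
distribution = {
    "Gene 1": {"Genome 1", "Genome 2"},
    "Gene 2": {"Genome 3"},
    "Gene 3": {"Genome 1", "Genome 2"},
    "Gene 4": {"Genome 1", "Genome 2", "Genome 4"},
}

weights = gene_graph(distribution)
for pair, w in sorted(weights.items()):
    print(pair, round(w, 2))
```

In this toy case Gene 1 and Gene 3 have identical distributions and get the heaviest edge, so a graph-clustering step run on these weights would be expected to group them together.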

Page 48: Beiko cms final

Legionaminic acid

Acetylneuraminic acid

(pathogen associated)

Bacteroides pectinophilus

Butyrivibrio proteoclasticus

Eubacterium plexicaudatum

Roseburia

Neighbors

Weirdly named isolates

Lachnozilla in graph form (it all makes sense now)

Page 49: Beiko cms final

Mystery isolate #1 (made-up example)

Page 50: Beiko cms final

Mystery isolate #2 (made-up example)

Page 51: Beiko cms final

Questions

Representations

Clear inference

From pattern to understanding

Page 52: Beiko cms final

FIN