Inferring microbial ecosystem function from community structure



Jeff S. Bowman and Hugh W. Ducklow

Lamont-Doherty Earth Observatory at Columbia University | [email protected] | www.polarmicrobes.org

Introduction and Motivation

Marine microbes play a central role in the sustainability of the global ocean by mediating the flow of carbon and nutrients through the marine system. Ecologists commonly study the structure and composition of marine microbial communities by analyzing the 16S rRNA gene. Although these data are well suited to evaluating differences between communities, and to correlating community structure with environmental parameters (e.g. chlorophyll concentration, temperature, salinity), they are less well suited to describing the ecosystem functions (i.e. metabolic functions) of these communities. Metagenomics and other techniques can bridge the gap between microbial community structure and ecosystem function, but these techniques are costly, data intensive, and low throughput.

Our goal was to develop a high-throughput method for inferring community metabolism from community taxonomy. By evaluating metabolic structure in place of community structure, we capture key inter-sample relationships and their impact on microbial ecosystem function. Our method produces pathway genome databases (PGDBs) that describe the metabolic pathways likely to be present in the sample. These PGDBs are amenable to flux-based metabolic modeling. Future work will focus on predicting the flow of elements and energy through these pathways, providing a way to model the impact of changing community structure on biogeochemical cycles.

Here we apply our method to a seasonally variable, depth-stratified microbial community from the West Antarctic Peninsula, a region undergoing unprecedented environmental change.

16S sequence library, the bigger the better!

Obtain all completed genomes

Build 16S rRNA reference tree

Find consensus genome for each tree node

Place reads on reference tree

Extract pathways for each placement

Generate confidence score for sample

Predict metabolic pathways

Calculate confidence for each node

Evaluate genomic plasticity for terminal nodes

Evaluate relative core genome size

Fig. 1. Methods. Our metabolic inference pipeline, PAPRICA [1], uses a phylogenetic placement program (pplacer) [2] to place query reads on a reference tree of 16S rRNA genes from all completed genomes. We determine a consensus genome for each point of placement on the tree, and determine the metabolic pathways represented in these genomes. Separately, we determine a confidence score for each point of placement on the reference tree from a novel indicator of genomic stability.
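The pathway-extraction step in Fig. 1 can be illustrated with a minimal Python sketch. It is not the PAPRICA implementation: the function names, the toy genome and pathway names, and the rule that a consensus genome holds only the pathways shared by every member genome are all assumptions made for illustration. Each read placement simply contributes its node's pathways to the sample total.

```python
from collections import Counter


def consensus_pathways(member_pathways):
    """Return pathways shared by every genome in a clade.

    Assumption: "consensus" is taken here as a strict intersection.
    """
    sets = [set(p) for p in member_pathways]
    if not sets:
        return set()
    return set.intersection(*sets)


def sample_pathway_abundance(placements, node_pathways):
    """Add each node's read count to every pathway assigned to that node."""
    abundance = Counter()
    for node, n_reads in placements.items():
        for pathway in node_pathways.get(node, set()):
            abundance[pathway] += n_reads
    return abundance


# Hypothetical toy data: two terminal nodes and their parent internal node.
terminal = {
    "genome_A": {"glycolysis", "urea degradation I", "nitrate reduction I"},
    "genome_B": {"glycolysis", "urea degradation I"},
}
node_pathways = dict(terminal)
node_pathways["internal_AB"] = consensus_pathways(terminal.values())

# Hypothetical read counts per point of placement on the reference tree.
placements = {"genome_A": 10, "internal_AB": 5}
abundance = sample_pathway_abundance(placements, node_pathways)
```

In this toy case, reads placed on the internal node count only toward the pathways both descendant genomes share, which is why placements deeper in the tree carry less metabolic detail.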

[Fig. 2 diagram labels: terminal node, internal node, core genome, accessory genome.]

$$c = \left(\frac{S_{core}}{S_{clade}}\right) \times (1 - \phi)$$

Fig. 2. Confidence score. Placements can be made to terminal and internal nodes. To determine the confidence ($c$) of a metabolic inference for a given placement, we consider the core genome size ($S_{core}$), the mean genome size of the clade ($S_{clade}$), and the mean index of plasticity for the clade ($\phi$; Fig. 3).
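As a worked illustration of the confidence score, the sketch below assumes the score is the ratio of core genome size to mean clade genome size, scaled by one minus the mean plasticity, i.e. c = (S_core / S_clade) × (1 − φ). The function name, argument names, and example values are hypothetical, and genome sizes may be in any unit as long as it is used consistently.

```python
def placement_confidence(core_size, clade_genome_sizes, clade_plasticity):
    """Confidence for one placement, assuming c = (S_core / S_clade) * (1 - phi).

    core_size: size of the core genome shared by the clade.
    clade_genome_sizes: genome sizes of the clade members (same unit as core_size).
    clade_plasticity: plasticity index of each clade member, each in [0, 1].
    """
    if not clade_genome_sizes or not clade_plasticity:
        raise ValueError("clade must contain at least one genome")
    mean_size = sum(clade_genome_sizes) / len(clade_genome_sizes)
    mean_phi = sum(clade_plasticity) / len(clade_plasticity)
    return (core_size / mean_size) * (1.0 - mean_phi)


# Hypothetical clade: half of the average genome is core, mean plasticity is 0.2.
c = placement_confidence(2000, [4000, 4000], [0.1, 0.3])
```

Under this form, confidence falls both when the core genome is a small fraction of the typical genome and when the clade's genomes are unusually plastic.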

Fig. 3. Genomic plasticity of genomes in our database. A major impediment to accurate metabolic inference is the genetic diversity that can exist within even a narrow taxonomic clade. We developed a confidence metric for our inferred metabolisms that is based on the degree of genomic plasticity inherent to each genome. The X-axis gives the position of each genome on our reference tree; the Y-axis gives the degree of plasticity. Unusually plastic genomes are indicated by Roman numerals: I) Nanoarchaeum equitans, II) the Mycobacteria, III) a butyrate-producing bacterium within Clostridium, IV) Candidatus Hodgkinia cicadicola, V) the Mycoplasma, VI) Sulcia muelleri, VII) Portiera aleyrodidarum, VIII) Buchnera aphidicola, IX) the Oxalobacteraceae.

[Fig. 3 plot. X-axis: terminal node (0 to 2500). Y-axis: relative plasticity (0.0 to 1.0). Labels I to IX mark the unusually plastic genomes.]

Fig. 4. Sample locations within the Palmer LTER off the WAP (left) and inter-sample similarity (right). The location of Palmer Station is given by the star. Summer surface and deep samples along with winter surface samples were analyzed [3]. A) Hierarchical clustering of samples by metabolic structure. B) Hierarchical clustering of samples by taxonomic structure. Note duplicate samples in both A and B. C) Distances between samples by metabolic structure and by taxonomic structure are in good agreement (R² = 0.70). D) Distances are also correlated (R² = 0.40), albeit less well, for the alternative metabolic inference approach PICRUSt [4].

[Fig. 4 panels. Map: sampling quadrants NW, NE, SW, and SE off the WAP. A: clustering by pathway abundance, this method. B: clustering by edge abundance, this method. Dendrogram leaves in A and B are duplicate samples (b.1, b.2) named by season, quadrant, and depth (e.g. summer_sw_deep_b.1, winter_ne_shallow_b.2), with height on the vertical axis. C: distance by edge abundance against distance by pathway abundance. D: distance by OTU abundance against distance by pathway abundance. Annotations: this method, R² = 0.70; PICRUSt, R² = 0.40. Legend: surface, deep, winter surface.]
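The comparison in Fig. 4C can be sketched as follows: compute pairwise distances between samples once from pathway abundance and once from edge abundance, then take the squared Pearson correlation between the two sets of distances. The poster does not name the distance measure, so Bray-Curtis dissimilarity is an assumption here, as are the function names and the three toy samples.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance dictionaries."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)
    den = sum(a.get(k, 0) + b.get(k, 0) for k in keys)
    return num / den if den else 0.0


def pairwise_distances(samples):
    """Distances for every sample pair, in sorted-name order."""
    names = sorted(samples)
    return [bray_curtis(samples[x], samples[y]) for x, y in combinations(names, 2)]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((i - mx) * (j - my) for i, j in zip(x, y))
    sxx = sum((i - mx) ** 2 for i in x)
    syy = sum((j - my) ** 2 for j in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy ** 2 / (sxx * syy)


# Hypothetical toy abundances for three samples.
pathway = {"s1": {"p1": 10, "p2": 0}, "s2": {"p1": 8, "p2": 2}, "s3": {"p1": 0, "p2": 10}}
edge = {"s1": {"e1": 6, "e2": 4}, "s2": {"e1": 4, "e2": 6}, "s3": {"e1": 1, "e2": 9}}

pathway_dist = pairwise_distances(pathway)
edge_dist = pairwise_distances(edge)
r2 = r_squared(pathway_dist, edge_dist)
```

Because pairwise distances are not independent observations, an R² computed this way describes agreement between the two representations rather than supporting a standard significance test.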

Key Points

• Microbial communities can be described by their metabolic structure.
• Metabolic structure provides information on potential microbial ecosystem functions.
• Representing a microbial community by metabolic structure may provide a way to model the flow of elements and energy through the community.

1. Bowman, J. S., and H. W. Ducklow. 2015. Microbial Communities Can Be Described by Metabolic Structure: A General Framework and Application to a Seasonally Variable, Depth-Stratified Microbial Community from the Coastal West Antarctic Peninsula. PLoS ONE, 10(8): e0135868.
2. Matsen, F., R. Kodner, and E. Armbrust. 2010. pplacer: Linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11: 538.
3. Luria, C., H. Ducklow, and L. Amaral-Zettler. 2014. Marine bacterial, archaeal and eukaryotic diversity and community structure on the continental shelf of the western Antarctic Peninsula. Aquatic Microbial Ecology, 73(2): 107-121.
4. Langille, M. G. I., et al. 2013. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology, 31(9): 814-821.

[Fig. 5 plot. Pathways shown (y-axis labels): pyruvate fermentation to lactate; phosphonoacetate degradation; adenosine nucleotides degradation III; creatinine degradation II; D-galacturonate degradation I; triacylglycerol degradation; allantoin degradation to ureidoglycolate I (urea producing); nitrate reduction I (denitrification); oxalate degradation II; sucrose degradation IV (sucrose phosphorylase); galactose degradation I (Leloir pathway); threonine degradation I; S-methyl-5-thio-alpha-D-ribose 1-phosphate degradation; nitrate reduction IV (dissimilatory); taurine degradation IV; cholesterol degradation to androstenedione II (cholesterol dehydrogenase); sitosterol degradation to androstenedione; reactive oxygen species degradation (mammalian); alkylnitronates degradation; reductive monocarboxylic acid cycle; trehalose degradation VI (periplasmic); arginine degradation III (arginine decarboxylase/agmatinase pathway); propionyl CoA degradation; phenylmercury acetate degradation; thymine degradation; glutamate degradation I; uracil degradation I (reductive); ethanol degradation IV; threonine degradation III (to methylglyoxal); formaldehyde oxidation II (glutathione-dependent); ethanol degradation II; valine degradation II; S-methyl-5'-thioadenosine degradation II; guanosine nucleotides degradation III; formate oxidation to CO2; pyrimidine deoxyribonucleosides degradation; 2'-deoxy-alpha-D-ribose 1-phosphate degradation; methylglyoxal degradation II; glutamate degradation X; glucose and glucose-1-phosphate degradation; glycogen degradation I; urate biosynthesis/inosine 5'-phosphate degradation; pseudouridine degradation; phenylacetate degradation I (aerobic); D-mannose degradation; urea degradation I; methionine degradation I (to homocysteine); aspartate degradation I; citrulline degradation; glutamine degradation I.
X-axis: anomaly, from -0.6 to 0.6, labeled "Enriched in surface | Enriched in deep and winter". Color scale: p-value, from 0.05 to 4.57 x 10^-5. Pathway categories: key intracellular metabolism, anaerobic metabolism, nitrogen degradation, carbon degradation, C1 metabolism, autotrophy, mercury degradation.]

Columbia / Kiel University Sustainable Oceans Symposium

Fig. 5. Which metabolic pathways are differentially present between summer surface samples and winter and deep samples? Having determined that the relationship between samples can be accurately represented by metabolic structure, we can begin to ask ecologically relevant questions. A question frequently posed of community structure data is how metabolisms are partitioned between niches. In the figure at left, color gives the p-value for a Mann-Whitney test between sample groups (summer surface vs. summer deep and winter surface). The X-axis gives the anomaly, calculated as the difference in sample group means divided by the sum of the sample group means.
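The two quantities plotted in Fig. 5 can be sketched for a single pathway as below. This is an illustration under stated assumptions, not the analysis code behind the figure: the function names and abundance values are hypothetical, the sign convention (first group minus second) is assumed because the caption does not give the direction, and the p-value uses a normal approximation without tie or continuity correction, which is rough for small groups.

```python
from math import erfc, sqrt


def anomaly(group_a, group_b):
    """(mean_a - mean_b) / (mean_a + mean_b); the sign convention is an assumption."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    total = mean_a + mean_b
    return (mean_a - mean_b) / total if total else 0.0


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided p-value from a normal approximation).

    Ties count as 0.5; no tie or continuity correction is applied.
    """
    n_a, n_b = len(group_a), len(group_b)
    u = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical abundances of one pathway in two sample groups.
surface = [0.9, 0.8, 0.85, 0.95]
deep_and_winter = [0.1, 0.2, 0.15, 0.05]

a = anomaly(surface, deep_and_winter)
u, p = mann_whitney_u(surface, deep_and_winter)
```

Dividing by the sum of the group means bounds the anomaly between -1 and 1, so pathways with very different overall abundances can share one axis. For real data, an exact or tie-corrected test such as `scipy.stats.mannwhitneyu` would be the safer choice.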