SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 ·...

56
SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities in microbial community-wide metabolic networks Hugo Roume a,1* , Anna Heintz-Buschart a* , Emilie EL Muller a , Patrick May a , Venkata P 5 Satagopam a , Cédric C Laczny a , Shaman Narayanasamy a , Laura A Lebrun a , Michael R Hoopmann b , James M Schupp c , John D Gillece c , Nathan D Hicks c , David M Engelthaler c , Thomas Sauter d , Paul S Keim c , Robert L Moritz b , and Paul Wilmes a,2 a Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg. b Institute for Systems Biology, Seattle, WA, USA. 10 c The Translational Genomic Research Institute-North, Flagstaff, AZ, USA. d Life Science Research Unit, University of Luxembourg, Luxembourg, Luxembourg. 1 Current address: Laboratory of Microbial Ecology and Technology, Ghent University, Belgium 2 Corresponding author: P Wilmes, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 15 L-4362 Esch-sur-Alzette, Luxembourg. E-mail: [email protected]. * These authors contributed equally to this work.

Transcript of SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 ·...

Page 1: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

SUPPLEMENTARY INFORMATION

Comparative integrated omics: identification of key

functionalities in microbial community-wide metabolic networks

Hugo Roumea,1*, Anna Heintz-Buscharta*, Emilie EL Mullera, Patrick Maya, Venkata P 5

Satagopama, Cédric C Lacznya, Shaman Narayanasamya, Laura A Lebruna, Michael R

Hoopmannb, James M Schuppc, John D Gillecec, Nathan D Hicksc, David M Engelthalerc,

Thomas Sauterd, Paul S Keimc, Robert L Moritzb, and Paul Wilmesa,2

aLuxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.

bInstitute for Systems Biology, Seattle, WA, USA. 10 cThe Translational Genomic Research Institute-North, Flagstaff, AZ, USA.

dLife Science Research Unit, University of Luxembourg, Luxembourg, Luxembourg.

1Current address: Laboratory of Microbial Ecology and Technology, Ghent University, Belgium

2Corresponding author: P Wilmes, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 15

L-4362 Esch-sur-Alzette, Luxembourg. E-mail: [email protected].

* These authors contributed equally to this work.

Page 2: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

2

Supplementary Materials and Methods 20

Biomolecular extraction and quality assessment

For each sampling date, three 200 mg sub-samples derived from one OMMC islet (defined,

herein1, as technical replicates) were used for biomolecular extraction. Each of the individual

biomolecular fractions isolated from the technical replicates were combined into a pool in

order to yield sufficient biomolecular quantities for subsequent high-throughput analysis and 25

to reduce the influence of fine-scale within-sample heterogeneity (as demonstrated by Roume

et al.2).

For the quality assessment of the isolated genomic DNA, fractions were separated by

electrophoresis on a 1 % agarose gel containing 4 ‰ ethidium bromide (PlusOne Ethidium

Bromide, GE Healthcare). For size estimation, the MassRuler DNA ladder mix (Fermentas) 30

was loaded onto the gels. Agarose gels were visualized on an InGenius gel imaging and

analysis system (Syngene, Cambridge, UK; as described in Roume et al.1). The DNA pool

sample was snap-frozen in liquid nitrogen in the elution buffer and stored at -80 °C until

library preparation and sequencing.

RNA quality assessment and quantification was carried out using an Agilent 2100 35

Bioanalyzer (Agilent Technologies, Diegem, Belgium; as described in Roume et al.1).

The quality of protein extracts was assessed following 1D-SDS-PAGE separation.

Preparation of RNA for shipment and sequencing

The volume of the RNA pool sample was adjusted to 180 µl using RNase free-water. A

mixture of 18 µl of a 3 M (w/v) sodium acetate solution, 2 µl of a glycogen solution 40

Page 3: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

3

(10.mg/µl) and 600 µl of ice-cold 96 % (v/v) ethanol was added to the RNA solution. The

RNA solution was gently mixed by inversion and precipitated at -20 °C for at least 1.h.

Following centrifugation at 10 000 x g for 30 min at 4 °C, the RNA pellet was washed three

times with ice-cold 70 % (v/v) ethanol. The pellet was then air dried for 5 min at room

temperature overlaid with 100 µl of Ambion® RNAlater (Life Technologies, Gent, Belgium) 45

and placed at -20 °C until shipment. For reclamation of the RNA, the solution was

centrifuged at 14 000 x g for 10 min at 8 °C. After removal of the supernatant, the pellet was

washed three times with 100 µl of 80 % (v/v) ethanol, followed by 25 000 x g centrifugation

for 3 min at 8 °C. The pellet was then air dried until all visible ethanol had evaporated. The

dried RNA pellet was then re-suspended in 1 mM sodium citrate buffer solution (pH 6.4). 50

RNA-Sequencing

In order to enrich the RNA fraction in mRNA, the rRNA was subtracted from the total RNA

fraction using the Ribo-Zero rRNA Removal Kit (Meta-Bacteria; Epicentre, Madison, WI,

USA). This was followed by cDNA synthesis and amplification using the ScriptSeqTM v2

RNA-Seq library preparation kit (Epicentre). 55

rRNA removal

The procedure consisted of an initial washing and re-suspension of the Ribo-Zero

microspheres with dedicated solutions. Following treatment of the total RNA sample with

Ribo-Zero removal solution, the RNA sample solution was added to the re-suspended Ribo-

Zero microspheres and selected by hybridisation. The RNA-microsphere solution was then 60

removed and the rRNA-depleted sample was purified by ethanol precipitation.

Page 4: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

4

Library preparation

The cDNA synthesis and amplification were performed using the ScriptSeqTM v2 RNA-Seq

library preparation kit (Epicentre). The procedure consisted of initial fragmentation of the

RNA, followed by the annealing of the cDNA synthesis primer. Briefly, addition of 4 µl

cDNA Synthesis Master Mix to the fragmented RNA solution was followed by incubation 65

first at 25 °C for 5 min and then 42 °C for 20 min. cDNA Synthesis Master Mix was prepared

by combining 3 µl cDNA Synthesis PreMix (ScriptSeq v2 RNA-Seq library preparation kit),

0.5 µl of DTT (100 mM) and 0.5 µl of StartScript Reverse Transcriptase solution (Epicentre).

After cooling to 37 °C, 1 µl of finishing solution (Epicentre) was added to the cDNA

synthesis solution and incubated at 37 °C for 10 min, before an incubation at 95 °C for 3 min. 70

8 µl of terminal tagging master mix was then added to each solution and incubated at 25 °C

for a further 15 min, following incubation at 95 °C for 3 min. Terminal tagging master mix

was prepared using 7.5 µl of terminal tagging premix (Epicentre) and 0.5 µl of DNA

polymerase. The 3’-terminal tagged cDNA was then purified using the AMPure XP system

(Beckman Coulter, Brea, CA, USA). The purified cDNA strand was then amplified by PCR, 75

resulting in the generation of the second strand of cDNA, addition of the Illumina adaptor

sequences and incorporation of specific barcodes, as well as amplification. Finally, the RNA-

Seq library was purified using AMPure XP system (Beckman Coulter). The size distribution

of the RNA-Seq library was assessed using an Agilent 2100 Bioanalyzer.

80

DNA/cDNA sequencing

DNA/cDNA was prepared according to the modified instructions from The Wellcome Trust

Sanger Institute3. The metagenomic sequencing protocol used a 96-well library preparation

and the molecular barcoding method for Illumina library construction. The barcodes were

Page 5: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

5

designed using Hamming codes, which allow single nucleotide sequencing errors to be

corrected and single indels (insertions/deletions) to be detected without ambiguity. Several 85

optimisations were employed in the indexing protocol. The tags used were 8 bp long, which

allowed the design of larger number of barcodes with error-correcting capability. The bar

codes were introduced in a regular PCR, which simplified the PCR step and allowed for use

of as few as six cycles. Before pooling, the relative concentration of each sample library was

measured by quantitative PCR (qPCR), which then allowed the accurate pooling of libraries 90

together and improved the uniformity of their representation.

DNA fragmentation

1-5 µg of DNA was used for the fragmentation step, as determined following gel analysis,

and re-suspended in 75 µl of 10 mM Tris-HCl, pH 8.5. The sample was then sheared for

150 s in 100 µl Covaris microtubes (Woburn, MA, USA). The following programme was 95

used: duty cycle 20 %, intensity 5, cycle burst 200, power 37 W, temperature 7 °C and mode

‘Freq sweeping’. The sheared DNA samples were transferred into a MicroAmp optical 96-

well plate (Life technologies, Grand Island, NY, USA). Six samples, chosen at random, were

run on an Agilent Bioanalyzer DNA 1000 chip to check the quality of the fragmentation. A

150-250 bp smear was detected and 70-90 % of the initial DNA amount was recovered. 100

Size selection

A large fragment plate was prepared by dispensing 150 µl of beads into wells of a round-

bottom Costar plate (Corning, Glendale, AZ, USA) for each sample. Additionally, a small-

fragment plate was prepared by dispensing 60 µl of beads into wells of a separated Costar

plate for every sample. The large-fragment Costar plate was placed on a magnetic stand, 105

Page 6: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

6

allowing the collection of beads. 45 µl of 10 mM Tris-HCl buffer were removed from the

large-fragment wells, leaving all beads in the well. The plate was removed from the magnetic

stand and 190 µl of sample from sonication tubes was added to the large fragment wells on

the Costar plate. Following 8 min of incubation at room temperature, using two magnetic

stands, the large-fragment Costar plate was placed on a magnetic stand to collect beads. 110

Following 5 min incubation at room temperature, 58 µl of bead buffer were removed from

the small fragment wells, leaving behind beads and approximately 2 µl of bead buffer. Small-

fragment wells were then removed from the magnetic stand. 300 µl of supernatant were

transferred from large-fragment wells into small-fragment wells. After transfer, the large-

fragment plate was discarded. Samples were then incubated in small-fragment wells for 115

5 min and the plate was placed on a magnetic plate to collect beads. 300 µl of supernatant

were removed and replaced by 300 µl of 80 % ethanol without disturbing beads. Following

30 s of incubation, ethanol was removed. This last step was repeated two times. Following

ethanol removal by air-drying for 5 min, the small fragment plate was removed from the

magnetic stand and 45 µl of pre-warmed 10 mM Tris-HCl (pH 8.5) were dispensed into the 120

wells. After re-suspension of beads by vigorous mixing, the solution was incubated for 2 min

and placed on a magnetic plate to collect beads. Following a further 3 min incubation, 42.5 µl

of supernatant containing the size-selected products were transferred to a new plate.

All enzymes and reaction buffers used for the end-repair, dA-tailing, ligation and PCR

amplification were provided from the KAPA Library Preparation Kits with Standard PCR 125

Library Amplification/Illumina series (KapaBiosystems, Woburn, MA, USA).

Page 7: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

7

End-repair

To the supernatant containing the sheared DNA, 10 µl of end-repair buffer and 5 µl of end-

repair enzyme mix were added. 100 µl of this solution were added to each well. The plate

was then covered, vortexed briefly and spun down before being incubated for 30 min at 130

20 °C in a thermocycler. The samples were then cleaned using Agencourt AMPure SPRI

beads (Beckman Coulter). 90 µl of SPRI beads were added into each well, the plate was

covered and vortexed for 30 s. The reaction plate was placed on the magnetic SPIRPlate for

10 min and beads separated from the solution. Following incubation for 10 min, the

completely clear solution was discarded. 200 µl of 70 % (v/v) ethanol were added to each 135

well and incubated for 30 s at room temperature. The ethanol washes were repeated two

times. The reaction plate was dried for 15 min in a thermocycler and each sample was eluted

with 43.5 µl of 10 mM Tris buffer (pH 8.0).

A-tailing

To 30 µl of end-repaired DNA, 5 µl of 10x A-tailing buffer, 3 µl of A-tailing enzyme and 140

12.µl of water were added. 50 µl of A-tailing master mix were added to each well. The plate

was then covered, vortexed briefly and spun down before being incubated for 30 min at

30 °C in a thermocycler. The sample was then cleaned up using the Agencourt AMPure SPRI

bead method with 90 µl of SPRI beads placed in each well. Each sample was then eluted with

36 µl of 10 mM Tris buffer (pH 8.0). The same six samples, randomly chosen for the DNA 145

fragmentation step, were run on an Agilent Bioanalyzer DNA 1000 chip and the average

sample concentration was calculated using the “integrated peak” function.

Page 8: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

8

Adapter preparation and ligation

To 30 µl of A-tailed DNA, 10 µl of 5x ligation buffer, 5 µl of DNA ligase and 5 µl of DNA

adaptor (30 µM) were added. 50 µl of this reaction mixture, were added to each well. The 150

plate was then incubated at 20 °C (room temperature) for 15 min. The sample was cleaned up

using Agencourt AMPure SPRI beads by addition of 40 µl of SPRI beads to each well. Each

sample was then eluted with 30 µl of 10 mM Tris buffer (pH 8.0) / 0.05 % (v/v) Tween 20.

The same six samples were randomly chosen for the DNA fragmentation step and run on an

Agilent DNA 1000 chip to check the success of the ligation; the smear obtained had an 155

average molecular size of 50 to 200 bp larger than before ligation.

PCR amplification

Library amplification was carried out according to a modified reaction setup as defined in the

KAPA Library Preparation kit instructions (Illumina). 25 µl of 2x Kapa HIFI Hotstart Mix

were spiked with 1 M betaine, aiding in amplification of high-GC-content regions and 160

reducing biases. 1 µl of the supplied PCR primers were added to each tube and a sufficient

quantity of water was added to reach 50 µl of solution volume per PCR reaction. The tubes

were then transferred into a PCR thermocycler and run with the recommended KAPA library

preparation cycling programme. The samples were then cleaned using the SPRI beads

method with 40 µl of SPRI beads. Each sample was eluted using 50 µl of 10 mM Tris-HCl / 165

0.05 % (v/v) Tween 20. The same six samples as those randomly chosen at the DNA

fragmentation step, were run on an Agilent Bioanalyzer DNA 1000 chip to check the success

of the indexing enrichment PCR.

Page 9: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

9

Final library quantification by qPCR

The KAPA Library Quant kit (KapaBiosystems) for final library quantification was used 170

(Illumina). Briefly, after an initial 1:1 000 dilution in 10 mM Tris-HCl, pH 8.0 + 0.05 % (v/v)

Tween 20, the 2x KAPA SYBR FAST qPCR master mix was used to amplify the DNA

library with six other standards on an ABI 7900 thermocycler. The qPCR step was conducted

using the following cycling conditions: 95 °C for 5 min followed by 35 cycles at 95 °C for

30.s and at 60 °C for 45 s. The concentration of the library was established using the 175

standards according to the manufacturer’s instructions. Both the standards and the libraries

with unknown concentration were run in triplicate. Prior to sequencing, the libraries were

diluted to the required concentration (e.g. 4.5 pM) by following the Illumina cluster

generation protocol.

180

Sequence assembly

For each of the two sampling dates, the raw paired-end 100 nt read metagenomic and

metatranscriptomic sequences were processed separately first using the PAired-eND

Assembler4 (PANDAseq, Supplementary Figure 1, step 1) to assemble overlapping read

pairs. PANDAseq was run with a score threshold of 0.9 and 25 nt minimum overlap

requirement to determine the location of the amplification primers, identify the optimal 185

overlap between reads, correct sequencing errors and check length and base quality. The

reads selected by the PANDAseq assembler were extracted from the raw sequence files using

in-house Perl scripts. The remaining non-redundant paired-end reads were trimmed using the

trim-fastq.pl script from the PoPoolation package5 using a quality-threshold of 20 (1 %

probability of miscall) and a minimum length of 40 nt resulting in two quality trimmed read 190

sets, one including still paired-end and one containing only single-end reads, where the other

Page 10: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

10

read pair was discarded during quality trimming. Metagenome and metatranscriptome

FASTQ files were then combined into a combined FASTQ file. All PANDAseq and single-

end reads were then combined into a single FASTQ file. Paired-end and single-end reads

were made non-redundant using CD-HIT-dup6 (Supplementary Figure 1, step 2). None-195

redundant reads were then used as input for the MOCAT assembly pipeline7 (Supplementary

Figure 1, step 3), using default parameters. The assembled contigs were filtered with

minimum length threshold of 150 nt. To enhance the final assembly by reads that were

assembled by PANDAseq, but not used by the MOCAT assembly pipeline, all PANDAseq

reads were mapped onto the MOCAT contigs using SOAPaligner8, with the following 200

parameter settings: -r 2 -M 4 -l 30 -v 10 -p 8 (Supplementary Figure 1, step 4). The

unmapped PANDAseq assembled reads with a minimum length of 150 bp, were extracted and

added to the contigs. The final contig files were made non-redundant using CD-HIT6 (-c 1.0)

by clustering identical sequences.

205

Extraction and classification of reads mapping to rRNA genes in the metagenomic data

EMIRGE9 was run using metagenomic reads quality filtered to minimum average QV = 30

and a minimum of 40 bp with the trim-fastq.pl script from the PoPoolation package5. The

reference database used was the truncated (non-redundant) small ribosomal subunit SILVA10

database release 111. The consensus sequences at 100 iterations were extracted and named

based on their identification in the reference database, without imposing any normalized 210

posterior probability.

Generation of additional metagenomic data for contig extension and analysis

To obtain an additional high-depth metagenomic dataset, a total of four additional floating

Page 11: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

11

sludge samples islets were collected on 23rd February 2011, each representing a biological

replicate. Biomolecular extraction was performed on 200 mg of starting material as for the 215

other two dates, using the protocol described by Roume et al.1. The resulting DNA fractions

were sequenced under sample names I, II, III and IV using the sequencing protocol described

above. Four technical replicates from sample I were generated, resulting in four libraries I-1,

I-2, I-3 and I-4. While the remaining biological replicates II, III and IV were sequenced once

each, generating a total of 12.6 gigabases metagenomic sequence. The reads were then 220

assembled using AMOS11 and MetaVelvet12.

Gene annotation

Non-redundant contig files were split into two distinct files, one file with contigs of a length

below 500 bp and another file with contigs lengths above or equal to 500 bp. The contigs 225

with lengths below 500 bp were annotated using FragGeneScan13 (Supplementary Figure 1,

step 5), using settings for short sequence reads with sequencing error (-complete 0 –train

illumine_5). The contigs with a length equal or above 500 bp were annotated with Prodigal

gene finder14 (v2.60, Supplementary Figure 1, step 5) using the MOCAT gene prediction

processing steps. The resulting amino acids sequence files were then combined into a single 230

file and made non-redundant using CD-HIT with a sequence identity threshold of 1.0 and a

description string length within the cluster file of 5 000 (-c 1.0 –d 5000; Supplementary

Figure 1, step 6).

All sequences were mapped to the KEGG database version 64.0 using BLAT15 and sequences

were annotated with KOs (KEGG orthologous groups; Supplementary Figure 1, step 7). 235

Page 12: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

12

Pre-processing of protein fraction for high-throughput analysis

Following 1D-SDS-PAGE electrophoresis and staining with Imperial protein stain (Thermo

Scientific, Erembodegem, Belgium; as described in Roume et al.1), the protein gel was

conserved at 4 °C in the dark and under vacuum in sealing foil D0316L-20 (DOMO

ELEKTRO, Herentals, Belgium). Prior to further analysis, entire lanes were cut into 1 mm-240

slices using a grid cutter (MEE-1x5, Gel Company, San Francisco, CA, USA), yielding

approximately 70 slices per lane. Two 1 mm-slices were combined in a single well of a 96-

well V-bottom plate with a hole introduced into the bottom using a 30 gauge lancet needle

(Becton Dickinson, Franklin Lakes, NJ, USA). The wells contained size 11 black hexagonal

glass beads (SB3656, Fusion Beads) to prevent the gel pieces from clogging the hole. For in-245

gel digestion, an automated liquid handling system (Tecan EVO, Männedorf, Switzerland)

was used for reduction, alkylation, tryptic digestion and peptide extraction from the gel

pieces. After extraction, the peptide solution was dried and reconstituted in 20 µl of a

solution of 0.1 % (v/v) [trifluoroacetic acid (TFA, 5 %; Sigma) / acetonitrile (ACN, 95 %;

BioSolve, Valkenwaard, Netherlands)] in MilliQ H2O in a round bottom polypropylene 96-250

well plate (Greiner Bio-One, Monroe, NC, USA) and placed into an Eksigent Nano 2D plus

system autosampler (ABSciex, Framingham, MA, USA) for analysis.

Liquid chromatography

Peptides obtained from the 1 mm-gel bands were separated using an Eksigent Nano 2D LC

plus system employing splitless nanoflow. Reverse phase high performance liquid 255

chromatography (RP-HPLC) and separation columns were prepared in-house by packing a

Kasil fritted capillary [360 µm outer diameter (OD), 75 µm inner diameter (ID)] with a 1 cm

bed of ReproSil Pur C18-AQ 3 µm 120 Å stationary phase (Dr. Maisch GmbH, Ammerbuch,

Page 13: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

13

Germany) for the sample trap and desalting column. A Kasil fritted capillary (360 µm OD,

75 µm ID) was packed with a 15 cm bed of the same stationary phase as the separation 260

column and this was connected to a PicoTip emmiter (360 µm OD x 20 µm ID, Tip 10 µm,

FS360-20-10-N-20) for nano-electrospray ionisation. For each LC run, the sample was

injected for 10 minutes at 2.5 µl/min with loading buffer (2 % v/v acetonitrile and 0.1 % v/v

formic acid). The sample was separated by a linear gradient changing from 98 % solvent A

(0.1 % v/v formic acid in water) and 2 % solvent B (0.1 % v/v formic acid in acetonitrile) to 265

40 % A and 60 % B in 60 min at 0.3 µl/min.

Mass spectrometry

Following LC separation, the peptides were analysed on a LTQ-Velos Orbitrap (Thermo-

Fisher, San Jose, CA, USA). MS1 data were collected over the range of 300 – 2 000 m/z in

the Orbitrap at a resolution of 30 000. Fourier-transform mass spectrometry (FTMS) preview 270

scan and predictive automatic gain control (pAGC) were enabled. The full scan FTMS target

ion volume was 1 x 106 with a maximum fill time of 500 ms. MS2 data were collected in the

LTQ-Velos with a target ion volume of 1 x 104 and a maximum fill time of 100 ms. The 10

most intense peaks were selected (within a window of 2.0 Da) for higher-energy collisional

dissociation at 15 000 resolution in the Orbitrap. Dynamic exclusion was enabled in order to 275

exclude an observed precursor for 180 s after two observations. The dynamic exclusion list

size was set at the maximum 500 and the exclusion width was set at ±5 ppm based on

precursor mass. Monoisotopic precursor selection and charge state rejection were enabled to

reject precursors with z = +1 or unassigned charge state.

280

Page 14: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

14

Protein identification

For MS analysis, Thermo .RAW files were converted to mzXML format using MSConvert

(ProteoWizard16) and searched with X!Tandem17 version 2011.12.01.1. Spectra were

searched against the metagenomic and metatranscriptomic data, common lab protein

contaminants, and decoys. Redundancy was removed from these three data sets using

BlastClust. The contaminant database was a modified version of the common Repository of 285

Adventitious Proteins (cRAP, www.thegpm.org/crap) with the Sigma Universal Standard

Proteins removed and human angiotensin II and [Glu-1] fibrinopeptide B (MS test peptides)

added, for a total of 66 entries. Decoys were generated with Mimic (www.kaell.org), which

randomly shuffles peptide sequences between tryptic residues, but retains peptide sequence

homology in decoy entries. 290

Search criteria used for X!Tandem included a precursor mass tolerance of 15 ppm and a

fragment mass tolerance of 15 ppm for higher-energy collisional dissociation spectra.

Peptides were assumed to be semi-tryptic (cleavage after K or R except when followed by P),

but semi-tryptic peptides with up to 2 missed cleavages were allowed. The search parameters

included a static modification of +57.021464 Da at C for carbamidomethylation by 295

iodoacetamide and potential modifications of +.15.994915 Da at M for oxidation, -

17.026549 Da at N-terminal Q for deamidation, and -18.010565 Da at N-terminal E for loss

of water from formation of pyro-Glu. Additionally, -17.026549 Da at the N-terminal

carbamidomethylated C for deamidation from formation of S-carbamoylmethylcysteine and

N-terminal acetylation were searched. Peptide spectrum matches (PSMs) obtained from 300

X!Tandem were validated using the Trans Proteomic Pipeline18 version 4.6 Rev.1. The PSMs

were analysed with PeptideProphet19 to assign each PSM a probability of being correct.

Accurate mass binning was employed to promote PSMs whose theoretical mass closely

matched the observed mass of the precursor ion, and to correct for any systematic mass

Page 15: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

15

errors. Decoys and the non-parametric model option were used to improve PSM scoring. 305

Protein identifications were inferred with ProteinProphet. The ProteinProphet scores were

then analysed in iProphet20, which combines results from multiple fractions and multiple

database searches (although here, only X!Tandem was used) and assigns a probability for

each unique protein and its corresponding peptide sequences. The false discovery rate for a

given iProphet probability was calculated using the number of decoy protein inferences at 310

that probability. Only proteins identified at iProphet probabilities corresponding to a false

discovery rate (FDR) less than 1.0 % were further considered.

KO annotations and protein quantification

The corresponding sequences of the identified proteins were collected in FASTA format from

the non-redundant single peptides/protein sequences database previously generated from the 315

combined metagenome and metatranscriptome assemblies. Relative protein quantitation was

performed using the normalized spectral index (NSI) measure using an in-house software tool

called NSICalc as previously described21. The amino acid sequences of identified proteins

were then mapped to the KO library (KEGG database version 64.0) using BLAT15 (e-

value<10-5, %identity>50, score>50). From the resulting list of KOs, the frequency of each 320

KO was determined at the protein level, using an in-house developed Perl script.

Gene copy and transcript abundances

To account for differences in read depth and sampling, the number of raw sequence reads

from autumn and winter metagenome and metatranscriptome libraries were equilibrated by

randomly selecting reads in the larger libraries from the autumn sample to mirror the number 325

of reads in the smaller library (winter) using an in-house Perl script based on the ‘shuffle’

Page 16: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

16

method from the ‘List::Util’ CPAN package. This resulted in 14 546 374 reads and

16 443 761 reads being used from the metagenomic and metatranscriptomic libraries,

respectively. The four balanced raw read sequence libraries were then mapped separately to

the combined assembly for both sampling dates using SOAPaligner8 with the following 330

parameter settings: -r 2 -M 4 -l 30 -v 10 -p 8.

For each library, reads were mapped to genes and counted, except for reads mapping to

multiple genes, for which weighted proportions were used. Next, to obtain the abundances of

genes and transcripts, read counts were normalized by the length of the respective gene

sequences22, to obtain normalized gene copy abundances and normalized transcript 335

abundances, respectively. Normalized gene copy abundances per KO were obtained by

calculating the sum of normalized gene copy abundances from all genes belonging to the

same KO group. Similarly, the KO-wise transcript abundances were calculated as the sum of

normalized transcript abundances over all genes within the same KO.

340

Relative gene expression

Relative expression of KOs was determined by dividing the normalized transcript abundance

of each KO by the inferred gene copy abundance of the same KO23.

The relative expression of a KO is greatly dependent on the normalized gene copy

abundance, if this value is close to zero. Similarly, KOs with normalized gene copy

abundances close to zero are prone to be falsely identified as highly expressed due to their 345

very low gene copy abundances. Therefore, highly expressed KOs were selected based on

normalized gene copy and transcript abundances, in addition to relative expression. KOs

were considered highly expressed, if their relative expression was above the 90th percentile.

In addition, to avoid false positive identification of highly expressed KOs, the normalized

Page 17: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

17

transcript abundances of highly expressed KOs had to be above the 3rd quartile of the 350

normalized transcript abundances of all KOs or normalized gene copy abundances had to be

above an empirically determined threshold. This threshold was determined by sorting KOs by

their normalized gene copy abundances and applying a sliding window approach to

determine the lowest normalized gene copy abundance with an average relative expression

robustly within the interquartile range (see Supplementary Figure 2). 355

Sensitivity of gene expression analysis to imposed cut-offs

Results of the analysis of KOs exhibiting high relative expression are dependent on the

quantile of KOs considered highly expressed (we considered KOs with a relative expression

above the 90th percentile), as well as the lower cut-off which was set for gene copy

abundances to avoid false positive identification of KOs exhibiting very low gene copy 360

abundances and only slightly higher transcript abundances as highly expressed. Therefore, we

first analysed, if the numbers and identities of KOs identified based on their relative

expression were robust to different levels of noise added to the data. In addition, we analysed

whether the conclusions would also be robust to small to moderate changes in the selected

cut-off values. 365

To address the first point (robustness to noise), we changed the gene copy and transcript

abundances by a random number following a uniform distribution within different limits. The

lowest of these limits corresponded to +/- 1 read mapped per kilobase of metagenomic

sequence and the highest to +/- 50 reads mapped per kilobase. We then analysed the numbers

and identities of the highly expressed genes in 100 repetitions of each test, given the chosen 370

cut-offs and the identities of enriched pathways within the selected sets of KOs. We

compared the results to the same analysis carried out using a single numerical cut-off for

Page 18: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

18

relative expression (minimal relative transcript abundance = 10 times relative gene copy

abundance).

To address the second point (robustness to changes in cut-offs), we changed each cut-off 375

within small to moderate limits. For the inclusion of KOs irrespective of their gene copy

abundances, cut-offs at steps between the 55th to 95th percentile of transcript abundances were

used. For the exclusion of KOs with low gene copy abundances, different cut-offs of robust

relative expression were set between the 20th and 95th percentile. We then analysed the

numbers and identities of the KOs found to be highly expressed, as well as the pathways 380

enriched in these KOs.

Comparison of the metagenomic dataset with the metatranscriptomic and metaproteomic

datasets

The congruency of metagenomic and metatranscriptomic datasets was determined by

calculating the proportion of KOs with at least one gene having at least 10 metagenomic

reads per kilobase mapping to it and also at least one gene resulting in the mapping of 10 385

metatranscriptomic reads per kilobase. The congruency of the metagenomic and

metaproteomic datasets was calculated analogously as the proportion of KOs with at least one

gene mapped by at least 10 metagenomic reads per kilobase that also had at least one gene

identified at the protein level.

390

Analysis of pathway membership

Assignment of KOs to KEGG pathways was done by using the KO to pathway link

(http://rest.kegg.jp/link/Ko/pathway) in the KEGG database version 67.1. Enrichment of KOs

Page 19: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

19

in specific pathways was tested using a hypergeometric test and p-values were adjusted using

FDR-control24. Test results with adjusted p-values below 0.05 were considered significant.

395

Community-wide metabolic network reconstructions

The KO to R (reaction) link (http://rest.kegg.jp/link/Ko/reaction) was used to associate each

individual KO to the corresponding reactions following the R to RP (reaction pair) link

(http://rest.kegg.jp/link/reaction/RP), thereby, associating individual reactions to their

corresponding main reaction pairs. The RPAIR annotation was specifically chosen to ignore

unspecific compounds of reactions (water, energy carriers and cofactors), thereby, only 400

taking into account the main compounds of a reaction. The reaction pairs (RP) were then

further selected by using only RPAIRs with assigned reaction classes

(http://rest.kegg.jp/link/rn/rc). Finally, the reaction pair - compounds link

(http://rest.kegg.jp/list/RP) was used to associate individual RPs to corresponding pair(s) of

compounds. As some KOs have identical compounds (e.g. subunits of the same enzyme or 405

enzymes that catalyse each other’s reverse reaction), KOs with identical compounds (as

annotated in the KEGG database version 67.1) were grouped. A KO network graph was built

by using all KOs as nodes. Edges between two KOs were introduced if a product metabolite

of one KO was found as a substrate metabolite of the other KO. Multiple edges between the

same KOs were reduced into a single edge. For topological analysis of the reconstructed 410

metabolic network, nodes sharing the exact same edges were regrouped to be represented as a

single node. Multiple edges connecting the same nodes were likewise combined into a single

edge. The undirected network graph was visualized and analysed using Cytoscape25,

employing a spring embedded layout. Singleton nodes not connected to any other node were

removed. 415

Page 20: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

20

Calculation of gene copy and transcript number and relative expression of regrouped nodes

Gene copy numbers and transcript numbers of nodes which represented several KOs due to

their sharing of the same edges were calculated by summing all normalized gene copy and

transcript numbers, respectively. Relative expression of a node was calculated by dividing the 420

node-wise sum of normalized transcript abundances by the node-wise sum of normalized

gene copy abundances, thereby levelling relative expression of the regrouped KOs.

Identification of choke points

Choke points as defined by Rahman and Schomburg26 are enzymes which consume or

produce unique metabolites and possess a high load score in the metabolic network 425

reconstruction. To assess whether a node within the metabolic network reconstructions could

qualify as choke point, the number of edges representing every metabolite was counted,

yielding the number of occurrence. In the cases where an edge represented more than one

metabolite, the lowest number of occurrence was assigned to this edge. Every node was also

assigned the lowest number of occurrence of all its edges. Nodes with an assigned number of 430

occurrence of 1 were considered potential choke points. The number of occurrence for every

node is listed and potential choke points are highlighted in Supplementary Dataset 7.

Weighted load score

Weighting of compounds in a bi-partite RPAIR-based metabolic network reconstruction has

been shown to increase pathfinding accuracy27. The number of occurrence described above 435

was assigned as edge weight within the metabolic network reconstructions. A weighted

betweenness centrality was calculated using the R-package igraph28. Alternative weighted

load scores were calculated for each node from the weighted betweenness centralities and the

Page 21: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

21

degree as defined in the manuscript, and potential key functionalities were determined using

this measure and expression as described in the main manuscript. The resulting weighted 440

load scores and the identities of the key nodes were compared to the unweighted results

discussed in the manuscript.

Matching of genes to Candidatus Microthrix parvicella Bio17 genome

Amino acid sequences were aligned using BLAT15 with cut-offs chosen as follows: e-value <

10-5, %-identity > 50, score > 50. 445

Alignment of contigs encoding key functionalities

Contigs were selected based on the following criteria: (1) they encoded a gene annotated with

a KO representing a key functionality, and (2) the expression of this gene was corroborated

by at least one mapped metatranscriptomic read. Selected contigs were aligned to the NCBI

non-redundant nucleotide database using BLASTn with default parameters29. The best hit with 450

a query coverage above 50 % and a percentage identity above 80 % for each contig was

documented. In addition, contigs containing genes annotated as K03921 were aligned to 85

isolate genomes from the same biological wastewater treatment (BWWT) plant using BLAST

and selecting only results with percentage identity above 80 %.

455

Quantification of isolate sequences in combined metagenomic and metatranscriptomic

assemblies

Reads from the balanced libraries (see section Expression analysis and contextualization of

omic datasets - gene copy and transcript abundances) were mapped against the genome of

Page 22: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

22

Nitrosomonas sp. Is79 (Ref. 30) and Isolate LCSB065 using SOAPaligner8 with the

following parameter settings: -r 2 -M 4 -l 30 -v 10 -p 8 and mapped reads were counted.

460

Ammonia monooxygenase (AMO) contig extension

Genes encoding subunits A and B of ammonia monooxygenase (amoA and amoB) are

established phylogenetic markers31. However, none of the contigs from the combined autumn

and winter assembly that contained an open reading frame annotated as encoding for a

subunit of AMO (K10944, K10945, or K10946) harboured a complete gene. In order to

recover a full gene sequence and determine the position of the amoA genes recovered from 465

the combined metagenomic and metatranscriptomic data relative to other known amoA

sequences within a phylogenetic tree, we employed a contig extension protocol in order to

increase the length of the contigs via extension and merging of these contigs.

The contig extension protocol was carried out in a step-wise fashion including contig

extension, gene calling and gene annotation. Contig alignment, merging and extension was 470

performed using minimus2 (Ref. 32) from the AMOS suite33 with a minimum overlap of 60

bases at 98 % identity. The contig extension protocol was performed in three steps: i) contigs

were extended by aligning contigs annotated with the same KO IDs; ii) contigs were

extended by aligning contigs annotated with KOs K10944, K10945 or K10946; iii) contigs

were extended using a high-depth metagenomic assembly (see section Generation of 475

additional metagenomic data for contig extension and analysis) from a different sample,

using the AMO contigs from the previous step as a reference. This procedure was performed,

because the genes encoding the subunits of AMO are known to exist in a cluster/operon34.

Following the run on minimus2, gene calling was performed on the resultant contigs using

FragGeneScan13 and Prodigal14 using default parameters as previously used. Predicted 480

Page 23: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

23

amino acid sequences were then merged and made non-redundant based on 100 % sequence

identity using CD-HIT35 and were re-annotated with KO IDs using our annotation pipeline.

Gene calling and re-annotation steps were performed to ensure that the extended contigs

retained their original annotation reference. The extended contigs were then used for

downstream analysis in order to associate these AMO genes to a bacterial species. 485

AmoA phylogenetic analysis

Nearly complete amino acid sequences (201 – 274 amino acids) of AmoA and/or MmoA

from representative organisms belonging to the beta-Proteobacteria, gamma-Proteobacteria

and archaea were retrieved from the Refseq protein database (see Supplementary Table 3)

and aligned with ClustalOmega (using default parameters). The alignment file was submitted 490

to a phylogenetic analysis using the Phylogeny.fr customized workflow service36 including

alignment curation with Gblocks37 (using default parameters), tree construction with PhyML38

(bootstrap of 100), and visualization by TreeDyn39.

Isolate LCSB065 isolation

Isolate LCSB065 was obtained from an OMMC biomass sample diluted by a factor of 104. 495

The biomass was first cultivated on Petri dishes of wastewater-agar medium (1.5 % agar;

w/v) in 800 ml filtered (0.2 µm, Sartorius, Göttingen, Germany) wastewater from the

Schifflange BWWT plant. A single colony was then transferred to a Petri dish with R2A

medium40 and cultivated at 20 °C under aerobic conditions. Isolates were grown on different

growth media recommended for the culture of bacteria from water and wastewater, 500

particularly Microthrix parvicella, such as R2A40, wastewater agar medium, MSV + peptone

and MSV A + B41 or Slijkhuis A and F42 under different growth conditions.

Page 24: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

24

Nile red staining

Lipid inclusions were visualized using a protocol modified from Fowler & Greenspan43. A

stock solution of Nile red (Sigma-Aldrich, Diegem, Belgium) in acetone (Sigma-Aldrich) 505

was prepared at a concentration of 500 µg/ml and preserved at 4 °C protected from light.

50 µL of a working solution, containing 2.5 µl of the stock solution in 1 ml of 75 % (v/v)

glycerol, were deposited onto a microscopy glass slide with heat fixed bacterial cells. After

5 min incubation, epifluoresence and bright field microscopic observations of the same fields

of view were carried out on an inverted microscope (Nikon Ti) equipped with a 60 × oil 510

immersion Nikon Apo-Plan lambda objective (1.4 N.A). Intermediate magnification 1.5 ×

was used in order to better resolve images. Excitation light was from a Xenon arc lamp, and

the beam was passed through an Optoscan monochromator (Cairn Research, Kent, UK) with

550/20 nm selected band pass. Emitted light was reflected through a 620/60 nm bandpass

filter with a 565 dichroic connected to a cooled CCD camera (QImaging, Exi Blue). All 515

imaging data were collected and analysed using the OptoMorph (Cairn Research, Kent, UK)

and ImageJ44.

Isolate genome sequencing and genome assembly

Following DNA extraction from isolate cultures using the Power Soil DNA isolation kit (MO

BIO, Carlsbad, CA), a paired-end sequencing library with a theoretical insert size of 300 bp 520

was prepared with the AMPure XP/Size Select Buffer Protocol as previously described by

Kozarewa & Turner3, modified to allow for size-selection of fragments using the double solid

phase reversible immobilisation procedure described earlier by Rodrigue et al.45

and sequenced on an Illumina HiSeq with a read length of 100 bp. The resulting 854 683

Page 25: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

25

paired raw reads were de-duplicated with FastUniq46, and quality filtered to a minimum 525

average QV = 30 and a minimum length of 60 bp with the trim-fastq.pl script from the

PoPoolation suite5, leading to 551 103 read pairs and 145 500 single-end reads (high quality

data yield of 73 %). Two separate preliminary assemblies were obtained with IDBA-UD47;

v1.1.0, with parameters –mink 30 –maxk 90 –step 5 –similar 98 --pre_correction); and

SPAdes 2.5.0, using the hammer read error correction module which filtered for contigs 530

shorter than 200 bp. The resulting assemblies were merged with phrap (minimum overlap of

50 bp). The merged assembly was manually inspected with Consed48 and contigs broken

where merge conflicts were detected.

The resulting contigs were uploaded to and analysed using RAST49. According to this

analysis, the isolate was a Rhodococcus sp. Similarity of contigs to bacterial genomes (NCBI 535

database, accessed 6th March 2014) were assessed by BLAT15. The bitscore of each hit was

recorded and only contigs with a hit to Rhodococcus spp. with a bitscore within 60 % of the

best hit’s bitscore were selected for the final set. This led to the removal of 120 contigs most

of which had low coverage of reads.

Filtered reads were mapped onto the assembled contigs using BWA50 with default parameters. 540

Reads mapping with mapping quality scores at or above five were used to assess contig

coverage (Supplementary Dataset 8). The final set of contigs was submitted to AmphoraNet

server to determine the isolate species searching for 31 phylogenetic marker genes51, as well

as to RAST49 (accession number 6666666.64457) and annotated.

The COG protein profile of Isolate LCSB065 was determined as previously described by 545

Muller et al.21. Annotation of KOs was carried out on protein predictions from RAST as

described for metagenomic proteins.

Page 26: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

26

Page 27: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

27

Supplementary Results and Discussion

Sensitivity of gene expression analysis to imposed cut-offs

We found that the cut-offs imposed to avoid false-positives within the highly expressed genes 550

and genes encoding key functionalities led to a greater robustness against noise than would

be observed if simple numerical cut-offs had been chosen. This was obvious from the

simulation of noise, in which the number and variance of highly expressed genes grew with

the noise, when a numerical cut-off was chosen, whereas the choice of cut-offs, as defined

herein, resulted in mostly stable numbers (Supplementary Figure 3a&b). 555

The identities and numbers of KOs identified as exhibiting high relative expression in our

datasets were not overly sensitive to the cut-offs imposed to avoid false-positive results. The

numbers of highly expressed KOs decreased by less than 20 % by the exclusion of KOs with

low gene copy abundances (Supplementary Figure 3c). As most genes with very high

transcript abundances did not have low gene copy numbers (Figure 3), the cut-off selected 560

for inclusion of KOs with high transcript abundances irrespective of their gene copy numbers

changed the total number of highly expressed KOs by less than 5 %. If the transcript

abundance cut-off for genes with low gene copy abundances was set between the 55th and

95th percentile (our default value was the 75th percentile), an enrichment with the KOs above

the 90th percentile of the highly expressed KOs of 4 or 5 out of the 5 pathways in autumn, and 565

5 out of the 6 pathways in winter was consistently detected (Supplementary Figure 3d).

Only few additional pathways were enriched after variation of this cut-off (Supplementary

Figure 3e). In particular, the findings of pathways ko00910 “Nitrogen metabolism” and

ko00190 “Oxidative phosphorylation” as exhibiting overall high levels of gene expression

were resilient to changes in cut-offs. 570

Page 28: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

28

In conclusion, the described gene expression analysis is robust to noise, as well as too small

to moderate changes in the chosen cut-offs.

Effect of regrouping redundant KOs into single nodes

To carry out a topological analysis of the reconstructed metabolic network, nodes and edges

were rendered non-redundant, by representing multiple KOs with identical substrate and 575

product metabolites as a single node. Due to this step, 229 and 220 nodes representing more

than one KO were part of the autumn and winter metabolic network reconstructions,

respectively. The calculated load scores were overall only mildly affected by the regrouping

(Spearman correlation of load scores in the redundant and non-redundant autumn or winter

network reconstructions: 0.98). 70 % of key functionalities identified in the non-redundant 580

network were shared between both autumn network reconstructions (40 % for the winter

network reconstructions), as reported in Supplementary Dataset 7. The rationale behind

regrouping redundant KOs into single nodes was based on the fact that most KOs represent

subunits of enzyme complexes, which do not work in parallel, but rather cooperatively in

metabolizing substrates. Consequently, some of the additional nodes found in the networks 585

with regrouped redundant KOs represent multi-subunit complexes, such as AMO, and we

therefore believe that the practice of regrouping redundant KOs into single nodes is

warranted.

Genomic analysis of Isolate LCSB065

Mapping of filtered reads onto Isolate LCSB065’s contigs revealed a mean/standard 590

deviation empirical sequencing insert size of 244±43 bp, and a mean read depth per mapped

position (coverage) of 27±11x (median 25x).

Page 29: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

29

As a first approach to analyse Isolate LCSB065’s genetic potential, protein coding genes

were annotated with KOs as before for the metagenomic and metatranscriptomic sequences.

Of utmost interest, out of 3 373 protein coding genes that could be annotated with KOs, 420 595

genes (12.5 %) were annotated as KOs belonging to “Lipid metabolism”, according to KEGG

Orthology. This relates to rank 4 behind “Amino acid metabolism” (19.1 %), “Carbohydrate

metabolism” (16.2 %) and “Xenobiotics biodegradation and metabolism” (14.0 %). Similar

results were obtained from COG categories52 and SEED subsystems categories of the

predicted proteins53. The assembled genome also encodes genes for the synthesis and 600

polymerisation of poly-hydroxybutyrate (PHB, Supplementary Dataset 8 indicated in

Figure 5b) and for the synthesis of TAGs (Supplementary Dataset 8 indicated in Figure

5b), inclusions of which are visible following Nile Red staining (see Supplementary Figure

8). Nine other genes beside the gene matching to the three metagenomic contigs encoding

acyl-[acyl-carrier protein] desaturases are annotated as desaturases in the isolate genome. 605

The genomic region of the gene matching to the three metagenomic contigs encoding acyl-

[acyl-carrier protein] desaturases was assessed. The gene is the first gene of a syntenous

block of four genes present in Rhodococcus jostii RHA1, Nocardia farcinica IF, Godronia

bronchialis and Tsukamurella paurometabola DSM 20162. This block encodes the

homologous fatty acid desaturase (peg.6927), followed by a tRNA dihydrouridine 610

synthase B, a cell envelope-associated transcriptional attenuator LytR-CpsA-Psr of the

subfamily A1 and a phosphate transport system regulatory protein PhoU. The genome of

Isolate LCSB065 furthermore contains six genes coding for lipases with an export signal

peptide, thereby reinforcing its potential keystone role in the community (Supplementary

Dataset 8). The genomic enrichment of genes involved in lipid metabolism suggests that the 615

isolated Rhodococcus sp. may occupy a function as keystone species within the sampled

OMMCs. Additional work is required to elucidate the exact role of this organismal group

Page 30: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

30

within the OMMC community.

Page 31: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

31

620

List of Supplementary Figures

Supplementary Figure 1 Overview of the assembly and annotation pipeline.

Supplementary Figure 2 Determination of lower thresholds for gene

abundances for the selection of highly expressed

KOs within the reconstructed community-wide

metabolic networks.

Supplementary Figure 3 Results of the sensitivity analyses.

Supplementary Figure 4 Expression of KOs in metabolic pathways at the

protein level.

Supplementary Figure 5 Generalized OMMC-wide metabolic network

reconstructed from the combined metagenomic and

metatranscriptomic data of OMMCs sampled in

autumn and winter.

Supplementary Figure 6 Representation of the fatty acid metabolic pathway

(ko01212).

Supplementary Figure 7 OMMC-wide metabolic networks reconstructed

from metagenomic and metatranscriptomic data of

OMMCs sampled in autumn and winter.

Supplementary Figure 8 Micrographs of Isolate LCSB065.

Page 32: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

32

List of Supplementary Tables

Supplementary Table 1 Physicochemical characteristics of the wastewater at

the time of sampling in the anoxic tank of the

Schifflange BWWT plant.

Supplementary Table 2 Quantitative and qualitative characteristics of

biomacromolecular fractions sequentially isolated

from the OMMC samples.

Supplementary Table 3 Statistics of the combined assembly of the

metagenomic and metatranscriptomic sequence

datasets.

Supplementary Table 4 Ammonia-oxidizing organisms with corresponding

accession numbers of amoA genes used for the

reconstruction of the phylogenetic tree.

Supplementary Table 5 Numbers of metagenomic and metatranscriptomic

reads from autumn and winter dates mapping to the

isolate genomes of Nitrosomonas sp. Is79 and

Isolate LCSB065.

Page 33: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

33

List of Supplementary Datasets

Supplementary Dataset 1 Dominant genera in the autumn and winter samples

determined based on reconstruction of 16S rRNA

gene sequences from the metagenomic data.

Supplementary Dataset 2 KO gene copy (KOGA), transcript (KOTA) and

protein (NSI) abundances in datasets from autumn

(04 October 2010) and winter (25 January 2011)

samples.

Supplementary Dataset 3 Highly expressed metabolic KOs in the (a) autumn

and (b) winter sample and pathways overrepresented

by metabolic KOs highly expressed in the (c)

autumn or (d) winter sample.

Supplementary Dataset 4 Generalized microbial community-level

reconstructed metabolic network reconstructed

using the combined metagenomic and

metatranscriptomic datasets in simple interaction

format.

Supplementary Dataset 5 Autumn-specific reconstructed microbial

community-level metabolic network in simple

interaction format.

Supplementary Dataset 6 Winter-specific reconstructed microbial community-

level metabolic network in simple interaction

Page 34: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

34

format.

Supplementary Dataset 7 Results of topological analyses. List of all nodes in

the metabolic networks reconstructed from OMMCs

sampled in (a) autumn and (b) winter with

topological measures and expression values. (c)

KOs with betweenness centrality values different

between metabolic networks reconstructed from

OMMCs sampled in autumn and winter, and (d)

pathways enriched in these KOs. KOs with key

functionality in OMMCs sampled in (e) autumn or

(f) winter including topological analysis results,

gene abundances and expression, as well as protein

abundances and pathway membership. (g) Results of

BLAST searches of all genes encoding key

functionalities against publicly available bacterial

genomes.

Supplementary Dataset 8 Summary of characteristics of Isolate LCSB065’s

genome, results of the phylogenetic analysis, and

coverage of Isolate LCSB065 contigs by reads from

the isolate genome sequencing.

Page 35: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

35

Supplementary Figures

Supplementary Figure 1 Overview of the assembly and annotation pipeline. Annotated KOs

were used for the subsequent metabolic network reconstructions. QC: quality control, 1-7:

steps in the pipeline.

1

2

3

4

5

6

metagenomic reads metatranscriptomic reads

PANDAseq + trim-fastq.pl: joverlaps joined and trimmed, QC

MOCAT assembly pipeline: combined assembly of metagenomic and metatranscriptomic reads

SOAP: reads joined by PANDAseq mapped to MOCAT contigs to recruit unassembled reads

autumn sample winter sample autumn sample winter sample

FragGeneScan: genes called on contigs < 500 nt

Prodigal: genes called on contigs ≥ 500 nt

CD-HIT: processed reads made non-redundant

7

CD-HIT: predicted amino acid sequences made non-redundant

BLAST-X and KEGG: enzyme annotation (e-value < 10-5, %identity > 50, score > 50)

K00001 K00002 K00003 K00004 K00005 …!

nucleotide sequence amino acid sequence legend: KXXXXX KO identifier!

contigs > 150 nt

Page 36: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

36

Supplementary Figure 2 Determination of lower thresholds for gene abundances for the

selection of highly expressed KOs within the reconstructed community-wide metabolic

networks. (a) and (b) Moving median of KO relative expression (relExp) versus gene copy

abundance (KOGA) for (a) autumn and (b) winter. (c) and (d) KO relative expression

(relExp) versus gene copy abundance (KOGA) for (a) autumn and (b) winter. Vertical lines

indicate lowest gene abundances with robust gene expression values within the interquartile

range.

a b

c d

●●

●●

●●

●●●● ●●●●●●●●●●●●

●●

●● ●●●●●●●●●●●●●●●

●●●●

●●●●●●

●●●●●

●●●●●●●●●●●●

●●●●●

●●

●●

●●●

●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●

●●

●●●

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●

●●●●

●●

●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●

●●●●●●●●●●●

●●●●●●

●●●●●●●●●●

●●

●●

●●●●

●●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●

●●●●●●

●●●●

●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●

●●

●●

●●●●●●●●●●●●●●●●●

●●●●●

●●●●●●●●●●●●

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●

●●

●●

●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●

●●●●●●

1e−03 1e−01 1e+010.1

0.51.0

5.010.0

50.0100.0

mov

ing

med

ian

wint

er re

lExp

rwinter KOGA

●●

●●●

●●●●●

●●● ●●

●●●●●●●●●●●●●

●●●

●●●●●●●●

●●●

●●

●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●

●●●●

●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●

●●

●●

●●●

●●●●●●●●●●●●●●

●●●●●●●●●●

●●●

●●

●●●

●●●●●●●●●●

●●

●●●●●●

●●●●

●●

●●

●●●●●

●●

●●●●●●●●●

●●●●●●●

●●●●

●●●●

●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●

●●●●●

●●●●●●●●●●●●

●●●●

●●●

●●●●●

●●●●●●●●●●●●●

●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●

●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●

●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●

1e−03 1e−01 1e+01

0.2

0.51.02.0

5.010.020.0

50.0100.0

mov

ing

med

ian

autu

mn

relE

xpr

autumn KOGA

●●●

●●

●●

●●

●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●●

●●

●●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●

●●

●●●

●●●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●

●●●●

●●

●●●

●●

●●●●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●●●

●●●

●●

●●

●●●

●●●

●●

●●●●●●

●●

●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●●

●●

●●

●●

●●

●●●●●

●●

●●

●●●●

●●

●●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●●

●●●

●●

●●●

●●●

1e−03 1e−01 1e+01

1e−02

1e−01

1e+00

1e+01

1e+02

autu

mn

relE

xpr

autumn KOGA

●●

●●

●●

●●●

●●●

●●●

●●

●●●●●●●●

●●●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●●

●●

●●●

●●

●●

●●●●●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●●●

●●

●●

●●●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●●

●●

●●●

●●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●●

●●

●●●

●●

●●●

●●●

●●

●●●

●●●

●●●

●●

1e−03 1e−01 1e+01

1e−02

1e−01

1e+00

1e+01

1e+02

wint

er re

lExp

r

winter KOGA

Page 37: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

37

Supplementary Figure 3 Results of the sensitivity analyses. (a) and (b) Effect of noise on

the number and identity of genes with high relative expression determined by comparing a

simple numerical cut-off and our method for the reduction of false-positives by excluding

genes with very low gene copy abundances. Filled boxes indicate total number of genes with

high relative expression, boxes with blue or brown lines indicate the sizes of the intersect of

genes with high relative expression without noise and after addition of noise; (a) autumn

dataset, (b) winter dataset. (c) Effect of changing the cut-offs for exclusion of genes with low

gene copy abundances to reduce false-positives on numbers of genes with a high relative

expression. (d) and (e) Effect of changing the cut-offs for exclusion of genes with low gene

copy abundances to reduce false-positives on the identities of pathways enriched with highly

expressed genes. (d) Number of pathways enriched with highly expressed genes that are

found at the chosen cut-off (0.75) and after variation of the cut-off. (e) Number of additional

a b

c d e

0.5

0.55 0.

60.

65 0.7

0.75 0.

80.

85 0.9

0.95 1

addi

tiona

l enr

iche

d pa

thwa

ys0

1

2

3

4

cut−off for exclusion of geneswith low gene copy abundance

autumn winter

0.5

0.55 0.

60.

65 0.7

0.75 0.

80.

85 0.9

0.95 1

cons

erve

d en

riche

d pa

thwa

ys

0

2

4

6

8

cut−off for exclusion of geneswith low gene copy abundance

autumn winter

0.5

0.55 0.

60.

65 0.7

0.75 0.

80.

85 0.9

0.95 1

high

ly e

xpre

ssed

gen

es

0

50

100

150

200

250

cut−off for exclusion of geneswith low gene copy abundance

autumn winter

●●

●●

1 2 3 4 5 7 10 15 20 50

0

100

200

300

400

num

ber o

f gen

es

●●

noise [reads per kb]

simple cut−off

●● ● ●

●●●●●●● ● ●● ●

1 2 3 4 5 7 10 15 20 50

0

100

200

300

400

num

ber o

f gen

es

●● ●●

●●

noise [reads per kb]

exclusion of genes withlow copy abundance

●●● ●● ●

● ●

●●

1 2 3 4 5 7 10 15 20 50

0

100

200

300

400

num

ber o

f gen

es ●

●●

● ●●

noise [reads per kb]

simple cut−off

●●

1 2 3 4 5 7 10 15 20 50

0

100

200

300

400

num

ber o

f gen

es

● ● ● ●

noise [reads per kb]

exclusion of genes withlow copy abundance

Page 38: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

38

pathways enriched in highly expressed genes, which are not found at the chosen cut-off but

after relaxation of the cut-off.

625

Page 39: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

39

Supplementary Figure 4 Expression of KOs in metabolic pathways at the protein level. (a)

and (b) Comparison of gene copy abundances (KOGA) and protein abundances (NSI) in the

(a) autumn and (b) winter samples. (c) and (d) Comparison of gene transcript abundances

(KOTA) and protein abundances (NSI) in the (c) autumn and (d) winter samples. (e) and (f)

Comparison of expression values relative to gene copy numbers (KOGA), transcript

expression levels (KOTA) and protein abundances (rel. protein exp.) in the (e) autumn and (f)

winter samples. (c to f) Highly expressed KOs are highlighted in red.

a

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●●

●●

●●

● ●

●●

● ●

●●

●●

● ●●

● ●

●●

● ●

●●

●●

●● ●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●●

1e−03 1e−01 1e+01 1e+031e−07

1e−05

1e−03

1e−01

autu

mn

prot

eom

ics N

SI

autumn KOGA

●●

●● ●

● ●

●●

● ●

● ●

● ●

●●

●●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●

● ●

●●

●●●

● ●

●●

●●

●●

●●●●●

1e−03 1e−01 1e+01 1e+031e−07

1e−05

1e−03

1e−01

wint

er p

rote

omics

NSI

winter KOGA

b

c

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●●

●●

●●

● ●

●●

● ●

●●

●●

● ●●

● ●

●●

●●

●●

●●

●● ●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●●●

1e−03 1e−01 1e+01 1e+031e−07

1e−05

1e−03

1e−01

autu

mn

prot

eom

ics N

SI

autumn KOTA

●●

●●

●●

● ●

●●

●●●●●●

e

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●●

● ●

●●

●●

●●

●●

●●●●●

1e−03 1e−01 1e+01 1e+031e−07

1e−05

1e−03

1e−01

autu

mn

rel.

prot

ein

expr

.

autumn relExp

●●

●●

●●

●●

●●●

●●●●

●●

●●●●●●

d

f

●●

● ●●

●●

● ●

●●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●●●

● ●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

1e−03 1e−01 1e+01 1e+031e−07

1e−05

1e−03

1e−01

win

ter r

el. p

rote

in e

xpr.

winter relExp

● ●

●●

●●

●●

●● ●●

●●

●●

●●●

●●

●●

●● ●

● ●

● ●

● ●

● ●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●●

● ●

●●

●●●

● ●

●●

●●

●●

●●

●●●●●

1e−03 1e−01 1e+01 1e+031e−07

1e−05

1e−03

1e−01

wint

er p

rote

omics

NSI

winter KOTA

●●

● ●

●●

●●

●●

● ●

● ●

●●

●●●●●

Page 40: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

40

Supplementary Figure 5 Generalized OMMC-wide metabolic network reconstructed from

the combined metagenomic and metatranscriptomic data of OMMCs sampled in autumn and

winter. Coloured nodes indicate pathways enriched in highly expressed KOs; blue – oxidative

phosphorylation; yellow – nitrogen metabolism; red – TCA cycle; purple – glycerolipid

metabolism. Large grey nodes are highly expressed during at least one season. Opacity of

nodes indicates shortest average path length (the more transparent, the longer the path

length).

Node 4

"K12253"

"K14269"

"K04128"

"K10674"

"K08357-K08358"

"K01698"

"K02083"

"K09880"

"K06720"

"K03688"

"K03852""K00797"

"K01922"

"K01874"

"K11180-K11181"

"K00549"

"K07173"

"K13038"

"K05827"

"K01778"

"K00275"

"K01712"

"K13747"

"K01423"

"K05275"

"K08353"

"K00198-K03518-K03519-K03520" "K00192-K00195"

"K01480""K12251""K09065"

"K01932"

"K01883-K01884"

"K01799"

"K01594""K02536"

"K03343"

"K01581"

"K12254""K10764"

"K12743"

"K03119"

"K00677"

"K01761"

"K01427-K01428-K01429-K01430-K14048"

"K01740""K01046-K14675" "K01738"

"K12339"

"K00290"

"K13746"

"K13034"

"K00471"

"K11263"

"K03474"

"K01672-K01674"

"K01457"

"K14541"

"K05888"

"K06281-K06282"

"K02492"

"K08688"

"K00640"

"K00194-K00197"

"K08295""K00169-K00170-K00171-K00172"

"K00631-K03621-K08591"

"K09011"

"K14138"

"K00122-K00123-K00124-K00127-K05299-K08348-K15022"

"K00620""K00174-K00175-K00176-K00177"

"K10702""K00382"

"K01026""K04042"

"K01655-K02594" "K14681"

"K00619-K14682"

"K01062"

"K01049""K02535"

"K00467"

"K01739""K00031"

"K06193""K00166-K00167-K11381"

"K01676-K01677-K01678-K01679" "K00137"

"K13929""K05823"

"K01060""K01438"

"K00316"

"K01039-K01040"

"K00655"

"K01034-K01035" "K01640"

"K00657""K00635""K01067""K01905"

"K00997-K06133"

"K00651"

"K00641""K00164""K00658"

"K04020"

"K01647-K01648-K15230-K15231" "K15741"

"K00625-K13788-K15024"

"K01659""K10977"

"K01714"

"K00161-K00162-K00163" "K02437"

"K03779""K00158""K00681"

"K00830"

"K11717"

"K01760-K14155" "K14085"

"K00823""K00156""K01758"

"K01512""K14260" "K12256"

"K11258""K01636"

"K16164-K16165"

"K01652-K01653" "K00101"

"K00822""K00016"

"K02510""K00129""K00027-K00028-K00029"

"K00835""K01958-K01959-K01960"

"K00149""K03417""K08691"

"K01255-K01256-K01270-K07751-K11140-K11142"

"K03737""K00128""K03416""K00656"

"K00827-K14272" "K00814"

"K08967" "K00824""K03851"

"K14264""K00260-K00261-K00262-K15371" "K01952""K00264-K00265-K00266"

"K00831"

"K01437""K07806"

"K01953""K00859""K01479""K09482"

"K05597""K00292-K00293" "K01469"

"K01923"

"K01937"

"K01656""K15849"

"K00287""K00289"

"K13501"

"K00603""K01915""K01425"

"K01930""K01443""K11754"

"K00764""K02433-K02434-K02435"

"K09758" "K13990"

"K01657-K01658-K13503" "K13497"

"K03342""K00812-K00813-K11358-K14454-K14455"

"K00817""K00820"

"K01458""K00832""K01664-K01665-K13950" "K00815"

"K00604"

"K09698"

"K02224"

"K01846""K02232" "K01885"

"K12234""K01663-K02500-K02501"

"K09251""K14268"

"K13821"

"K09470"

"K14163""K00826"

"K10206""K00819"

"K01776""K00284""K01580""K01744""K00294""K00836-K15785" "K00816""K14267"

"K00840""K00606""K15372""K13566"

"K00024-K00025-K00026-K00051-K00116" "K07250-K13524" "K00841"

"K01940""K01044"

"K03918""K01697""K00548"

"K14157""K00030""K01895"

"K01452"

"K01643-K01644"

"K13559"

"K13877"

"K00109""K01908"

"K04487""K13017""K01919-K11204"

"K00821""K08969"

"K00818""K01924""K05830"

"K00252"

"K07511"

"K01715"

"K01692"

"K07516""K00255-K07512" "K15016"

"K10527"

"K00023"

"K09479"

"K07515"

"K00665""K01964-K15036" "K11533"

"K00648"

"K01961-K01962-K01963-K02160-K11262" "K00668"

"K14471-K14472"

"K10852"

"K02372"

"K00253""K11410"

"K00208"

"K01968-K01969"

"K00209"

"K02371"

"K15019"

"K01847-K01848-K01849-K11942"

"K01687"

"K05939"

"K01027-K01028-K01029" "K01965-K01966-K15052"

"K00491-K13242" "K00232"

"K07514"

"K01613"

"K01825"

"K00248-K09478"

"K00074"

"K01782""K06445"

"K00249"

"K00234-K00235-K00237-K00239-K00240-K00241-K00242-K00244-K00245-K00246-K00247"

"K00271"

"K05606"

"K01755""K01468"

"K00052"

"K02517"

"K00263"

"K01439"

"K01555-K16171"

"K01682"

"K00647-K09458" "K00667"

"K01681"

"K03801"

"K03644"

"K00215""K00611" "K01478"

"K01921"

"K00059"

"K06447"

Node 5

"K12355""K01254"

"K00507"

"K00227"

"K00083"

"K07534"

"K07535"

"K13239" "K07753"

"K09829"

"K04118"

"K02609-K02610-K02611-K02612-K02613" "K10978"

"K10258"

"K00130"

"K13779"

"K05824"

"K12658"

"K01500"

"K02549"

Node 6

"K12405"

"K10214"

"K01467""K01800"

"K01796"

"K01705"

"K02618"

"K13019"

"K01661"

"K11731"

"K07536"

"K16173"

"K01484"

"K00213"

"K01434"

"K00454"

"K06015"

"K07425"

"K00508"

"K00300"

"K00257"

"K00490"

"K08262"

"K11987"

"K05607-K13766"

"K00108-K00499"

"K03921"

"K08683"

"K01615"

"K06446"

"K13767"

"K14446"

"K14534"

"K11538"

"K04126""K14732"

"K01708"

"K04127"

"K01720"

"K01702"

"K03383"

"K00286"

"K01703-K01704"

"K01881"

"K10793-K10794"

"K00097"

"K01706"

"K01777"

"K00472"

"K01466"

"K01870"

"K10536"

"K01869"

"K14259"

"K01477"

"K01707"

"K01941"

"K03182-K03186" "K01845"

"K01716""K13777-K13778" "K10780"

"K01583-K01584-K01585-K02626"

"K01887""K00019"

"K01873"

"K01476"

"K01673"

"K00318""K01750"

"K05551""K13018"

"K00634"

"K15232"

"K04110""K00140"

"K00627""K00613"

"K03821""K14469""K15018""K01911-K14760" "K14467""K07823"

"K08764""K01578""K08692-K14067-K14451" "K15742"

"K05822""K10817""K01902-K01903"

"K06718""K01899-K01900" "K00674"

"K16007""K15913"

"K02615""K01641""K00626"

"K01897-K15013"

"K15866""K09699""K01907""K00673"

"K14371"

"K00645""K01649""K00186-K00187" "K15880"

"K00652""K14245"

"K14674""K15038""K08748""K01913"

"K01041""K01912-K02614"

"K13356""K01896""K00021-K00054" "K04116""K03823"

"K12298""K00643""K11992"

"K13745""K06130"

"K14676""K00043"

"K05287" "K15359"

"K00100""K01075"

"K00132-K04073" "K00639""K00193"

"K13922""K01638"

"K00654"

"K13923"

"K04105""K00680"

"K01442"

"K00273"

"K04072""K04021"

"K01505"

"K01637""K00157"

"K14621""K01058""K00020"

"K01031-K01032"

"K00022""K00632"

"K07508-K07509" "K07513"

"K00135-K00139"

"K01076"

"K05605"

"K13776"

"K01904"

"K01068"

"K02428"

"K03426"

"K01515-K08312"

"K01791"

"K01241""K00698"

"K02527"

"K16330""K00876"

"K00925""K01950"

"K09759"

"K01914"

"K01886""K13035"

"K00602-K01492"

"K00278"

"K01424""K03465"

"K13998""K00926"

"K11541""K00931"

"K01006-K01007"

"K11540""K12657"

"K01954-K01955-K01956" "K00901"

"K04112-K04113-K04114-K04115"

"K00949""K12659""K13941""K12524-K12525"

"K00929""K00932""K00928"

"K01522""K00939"

"K00867-K03525-K09680" "K01008"

"K00951"

"K00868"

"K00930""K00950""K00940""K14154"

"K02231""K00874"

"K10807-K10808"

"K00857""K00942"

"K00955-K13811"

"K00765-K02502" "K06164"

"K15519""K00856""K00899"

"K13930""K00943""K00860"

"K00944""K00985"

"K06982""K03338""K16328""K00948""K10353"

"K01431""K00133"

"K00394-K00395" "K00003""K00364"

"K01942-K03524"

"K15728"

"K00088"

"K00609-K00610" "K15786" "K13547"

"K12526"

"K01951""K01579"

"K01779""K03800"

"K01918"

"K00143" "K01756"

"K05957""K01939""K01876"

"K15894"

"K00145"

"K05829"

"K00599"

"K00679"

"K00772"

"K13294"

"K00759"

"K01490"

"K05907""K11212"

"K01081-K03787-K11751"

"K01120-K13298"

"K00760"

"K01928"

"K01844"

"K16054"

"K01749"

"K01929"

"K08964"

"K03340"

"K00440-K00441-K00443"

"K00329-K00330-K00331-K00332-K00333-K00334-K00335-K00336-K00337-K00338-K00339-K00340-K00341-K00342-K00343-K03878-K03879-K03880-K03881-K03882-K03883-K03884"

"K01719"

"K06042""K05895"

"K01935"

"K08963"

"K04835"

"K08351""K01012"

"K03184"

"K06126""K06134"

"K02827-K02828"

"K02230-K09882-K09883"

"K00355"

"K02304""K01599"

"K03795"

"K03185"

"K01470"

"K03365"

"K03403-K03404-K03405"

"K13016-K13020"

"K00231"

"K01772""K01473-K01474"

"K00979"

"K11185-K11186"

"K12725"

"K11207"

"K00228-K02495"

Node 3

"K12728"

"K01627"

"K03270"

"K05934""K00558"

"K05928"

"K00563-K00564-K06970-K08316-K12297"

"K03183"

"K05936" "K01843"

"K13623""K02188"

"K03428""K01611"

"K13542-K13543"

"K00833"

"K06127"

"K09187-K09188-K11419-K11420-K11423-K11429"

"K00589-K02303-K02496" "K02302"

"K03394"

"K15792"

"K00595"

"K00568"

"K01485"

"K11752"

"K00790"

"K13015"

"K01498"

"K00956-K00957-K00958"

"K12711" "K00077"

"K15998"

"K15989"

"K00773"

"K01591"

"K00788"

"K01465"

"K11121""K03707"

"K00969-K06210-K13522"

"K13421"

"K01488""K15938""K00106"

"K08693""K01486""K02473"

"K00972-K11528"

"K01489"

"K00758"

"K00757"

"K00087-K11177-K11178-K13480-K13481-K13482-K13483"

"K03783-K03784" "K01487"

"K00756""K03815"

"K01815"

"K02259"

"K02474"

"K03274"

"K01119"

"K03337"

"K00065"

"K01514-K01524" "K09018-K09024"

"K00763"

"K00767"

"K00010"

"K03462"

"K08281""K03517"

"K14447"

"K00469"

"K00207"

Node 2

"K00768"

"K00226-K00254" "K02472"

"K00762"

"K13763"

"K01493"

"K01000"

"K01239"

"K01494"

"K03269"

"K00769-K03816" "K15780"

"K01520""K02563"

"K02233"

"K07264"

"K01129""K00761-K02825"

"K02319-K02320-K02321-K02322-K02323-K02324-K02325-K02327-K02335-K02337-K02338-K02339-K02340-K02341-K02342-K02343-K02684-K03504-K03763-K14159"

"K16329"

"K01814""K11755"

"K01523"

"K03335"

"K01195"

"K01686"

"K16044"

"K01464"

"K04034-K04035"

"K01685"

"K00543"

"K00789"

"K01243""K13621"

"K00390""K00547"

"K13622"

"K01251"

"K00013"

"K01745"

"K01089""K01598"

"K01892"

"K01014"

"K01082"

"K01590"

"K00954-K02201"

"K01693"

"K01586""K00380-K00381-K00392"

"K02169"

"K00551-K00570"

"K05362"

"K00802"

"K03897"

"K00559"

"K04075"

"K04566-K04567-K04568"

"K02301"

"K00040"

"K00082""K00041"

"K01812""K03336"

"K01496"

"K14152"

"K00798"

"K03927"

"K00075"

"K02225-K02227"

"K04486-K05602" "K13388"

"K12726"

"K00387-K05301-K07147"

"K12502"

"K00588"

"K00569"

"K13384"

"K05831"

"K01925"

"K01582"

"K08968""K02228"

"K03785-K03786"

"K00973"

"K01858"

"K01819""K10708""K00066"

"K10046"

"K01711"

"K00966-K00971" "K09461"

"K01092"

"K00322-K00323-K00324-K00325"

"K01690"

"K06118"

"K00800"

"K01820"

"K00483"

"K01916"

"K00218"

"K04040"

"K11333-K11334-K11335"

"K04037-K04038-K04039"

"K00039"

"K14448"

"K02377"

"K00975"

"K01057-K07404" "K12450"

"K00354"

"K13832""K00014"

"K00012""K01784"

"K00965""K00963"

"K05358"

"K00036"

"K01792"

"K01735"

"K15917"

"K01085"

"K01596"

"K00560""K01610""K02586-K02588-K02591"

"K01948""K02203"

"K00927""K00865-K15788" "K10760"

"K00864""K00863-K05878-K05879"

"K11529""K00855"

"K00852""K07031""K13829"

"K00917""K00962""K00919""K00524-K00525-K00526" "K00848""K00888"

"K00921""K02999-K03002-K03004-K03005-K03006-K03007-K03010-K03011-K03013-K03016-K03018-K03021-K03023-K03040-K03041-K03042-K03043-K03044-K03046-K03048-K03060-K13797-K13798" "K13940"

"K00869""K00872-K02204"

"K01768-K08043-K08048-K08049-K11029" "K01525"

"K01509-K01553" "K00923"

"K00946"

"K00882"

"K12446""K00885"

"K00884""K00853"

"K00922"

"K00877-K00941" "K00878""K00527""K00912"

"K14153""K00887"

"K13799"

"K16146""K00854"

"K00847"

"K00850""K00849"

"K00900"

"K13057"

"K11753"

"K13830""K00851"

"K03272""K10710""K00858""K00945-K13800" "K00703-K16148" "K00891""K04765"

"K09903"

"K01623-K01624-K11645"

"K01621"

"K00134-K00150-K05298-K10705"

"K01783"

"K00131""K01622"

"K01635-K08302"

"K01803"

"K01689" "K13831"

"K01601-K01602"

"K06131-K06132"

"K00616"

"K00615"

"K13810"

"K00058"

"K02446-K03841-K04041" "K01837"

"K01834-K15633-K15634-K15635"

"K01628"

"K01841"

"K01629""K01662"

"K00096"

"K03431""K09483"

"K08097"

"K03635-K03636"

"K04713"

"K16019"

"K03473" "K01186"

"K00983"

"K00968"

"K00523-K12452"

"K00794"

"K16020"

"K10960"

"K00995"

"K03637-K03639"

"K01808"

"K01807"

"K00384"

"K00796" "K16306"

"K00048""K03472"

"K02548"

"K01077-K01113"

"K01654"

"K01183-K13381" "K06120-K06121-K06122"

"K01053"

"K00493-K14338"

"K01095-K01096"

"K00005"

"K00011"

"K01632"

"K01633"

"K02564-K12410"

"K15669"

"K01101"

"K06920"

"K03476"

"K00423-K00434"

Node 1

"K01709"

"K01818"

"K00103"

"K06879-K09457"

"K12451"

"K01790"

"K01218"

"K14470"

"K01179-K01225"

"K00067"

"K03082"

"K13085"

"K03587-K05364-K05365" "K14394"

"K03526"

"K00793"

"K09709""K04712"

"K01198""K01078-K09474"

"K00090-K06152"

"K02821-K03475" "K01207"

"K00299""K02771" "K03273""K01117"

"K02786-K02787-K02788" "K02815""K07026"

"K14449" "K02798-K02799-K02800" "K02775"

"K00099"

"K01710"

"K02768-K02769-K02770"

"K02802-K02803-K02804"

"K01786-K03077-K03080"

"K02763-K02764-K02765" "K01804""K02782-K02783"

"K01189-K07406-K07407"

"K04719"

"K01222"

"K02777" "K01787"

"K01626-K03856"

"K00068""K00009"

"K00033"

"K11532"

"K00895"

"K01770"

"K16055""K05947"

"K00720""K00697"

"K01810-K06859"

"K01835"

"K01112""K12506"

"K03271"

"K16011"

"K15779"

"K15778"

"K01809"

"K06153"

"K15916"

"K00721""K00754"

"K01839"

"K00845"

"K00696"

"K10012"

"K01139" "K03429"

"K00748"

"K00693-K00694-K16150-K16153"

"K01769-K12319-K12320-K12323-K12324" "K15914"

"K13656"

"K01805"

"K01187-K12317-K15922"

"K01201"

"K00008"

"K01182"

"K01193"

"K02778-K02779"

"K00886""K01232"

"K05351""K00691-K00705" "K01087"

"K01226"

"K00978"

"K01838""K01194-K05342"

"K01223""K12309"

"K01199"

"K01190""K05343"

"K01188-K05349-K05350" "K01229-K12111"

"K00034"

"K02793-K02794-K02795-K02796" "K02790-K02791"

"K02817-K02818-K02819"

"K12308"

"K02808-K02809-K02810"

"K08679"

"K00045""K00115-K00117"

"K00689-K00690-K05341"

"K01220""K08678""K01785"

"K00991"

"K01178""K01840"

"K00702"

"K15982-K15983"

"K00004"

"K15817""K15812"

"K06167"

"K15813""K01822"

"K13370" "K15054"

"K00511"

"K00251"

"K12343-K12344"

"K16051"

"K09472"

"K05898"

"K01728""K01253"

"K10219"

"K03391"

"K00151"

"K10221"

"K10027"

"K02294"

Node 7

"K15745" "K14605""K09835""K02293"

"K06013"

"K00514"

"K01042"

"K01634"

"K00119""K00376"

"K09459"

"K00967"

"K04708""K05884"

"K15888"

"K15887"

"K00805"

Node 8

"K05356"

"K03831-K15376"

"K15857"

"K12503"

"K00801"

"K05954"

"K10187"

"K11778"

"K02291""K14066"

"K01597""K05355"

"K13787-K13789"

"K00787-K00795"

"K06443"

"K06045"

"K00044"

"K02292"

"K01499"

"K09879"

"K00806"

"K01823"

"K01130"

"K00791"

"K03379"

"K03527"

"K10026"

"K05979"

"K02523"

"K03392"

"K15767"

"K14520"

"K01061"

"K03381"

"K01801"

"K00450"

"K01857"

"K01588"

"K01055"

"K01821"

"K13774-K13775"

"K03333-K16045"

"K14727"

"K10217"

"K13043" "K15067"

"K00217"

"K16242-K16243-K16244-K16245-K16246"

"K00452"

"K09516"

"K00368"

"K01502"

"K00258"

"K10700"

"K01852""K00485"

"K11693"

"K01853"

"K03366"

"K10619-K16303"

"K10618"

"K07538"

"K06163"

"K14083"

"K14519"

"K01576"

"K05797"

"K09471"

"K15758""K00399-K00401-K00402"

"K05921"

"K03388-K03389-K03390"

"K01575"

"K01725"

"K00446"

"K01617"

"K12468"

"K00451"

"K01131"

"K05783"

"K01589"

"K07539"

"K01607"

"K03464"

"K00224"

"K15981-K16046"

"K10676"

"K01856"

"K00222"

"K05917"

"K00480""K00517"

"K10797""K11147-K11153"

"K05915"

"K01868"

"K01933"

"K00053"

"K00060-K15789"

"K11808""K00155"

"K03367-K14188"

"K05549-K05550-K05784"

"K07249"

"K02551"

"K01721" "K10220"

"K00360-K00367-K00370-K00371-K00372-K00373-K00374-K02567-K02568"

"K00448-K00449"

"K02636"

"K01051"

"K04480-K14081"

"K02164-K02305-K02448-K04561-K04748"

"K01826"

"K01002"

"K00466""K00453""K07106"

"K01609""K00486"

"K00055"

"K01564"

"K04097"

"K00319-K13942"

"K04099""K00383"

"K00432"

"K04100-K04101"

"K02170""K00481""K00799"

"K10535""K00320""K00577-K00581-K00582-K00583-K00584"

"K01817""K01560-K01561"

"K15228" "K13498""K01069"

"K01759""K15762"

"K04091"

"K01618"

"K01816"

"K00315"

"K10713"

"K00430-K11188"

"K03781""K03396"

"K01699-K13919-K13920"

"K06116-K06117" "K00980"

"K00502"

"K00086"

"K11780-K11781"

"K01091"

"K08093"

"K01867"

"K00463"

"K01734"

"K01866""K15226"

"K00501""K00500"

"K04518-K05359"

"K01889-K01890" "K01746"

"K01934" "K00766""K03782"

"K13951-K13952-K13980"

"K00492"

"K00459"

"K00001"

"K00529-K05710"

"K00114""K13954"

"K13953""K00121"

"K12732"

"K10775""K10944-K10945-K10946"

"K14028"

"K00285"

"K00104-K11472-K11473-K11517" "K00274"

"K03418""K14170"

"K01079"

"K00015-K00018" "K00200-K00203-K00205-K11260-K11261"

"K05603"

"K00148""K01592"

"K00457""K00270"

"K00276""K01569"

"K01851-K02361-K02552"

"K01875"

"K01070"

"K01563""K02554"

"K00141"

"K01066"

"K15739"

"K04103"

"K03179"

"K00146"

"K01920"

"K01608""K01737"

"K01917"

"K00301-K00302-K00303-K00314"

"K10774"

"K01126"

"K01726""K01620"

"K01945"

"K14422"

"K00102-K03777-K03778"

"K01668""K00600""K03181""K01666"

"K03430""K01667"

"K01872"

"K01011""K02619"

"K07246"

"K00605"

"K01752"

"K00259""K01571-K01572-K01573"

"K10218""K14759"

"K01754""K01775"

"K01501""K01568"

"K04782"

"K05396""K11787"

"K15864""K00362-K00363-K00366-K03385-K04016-K15876"

"K01556""K03735-K03736-K04019"

"K10216""K04022""K15066"

"K01733"

"K01451"

"K11788"

"K09019-K16066"

"K03153""K01426"

"K14419-K14420"

"K01878-K01879-K01880-K14164"

"K00281-K00282-K00283"

"K00050-K15893"

"K00210-K00211-K04517"

"K00042""K00220"

"K00002"

"K13812"

"K00998"

"K00873-K12406"

"K00981"

"K00468"

"K06215-K08681"

"K01495-K09007"

"K10011"

"K13036"

"K14652"

"K01893"

"K01497"

"K00505"

"K14187"

"K01619""K01736"

"K04093-K04516-K06208-K06209" "K00049-K12972"

"K02858"

"K03738"

"K01694""K01695-K01696-K06001"

"K03339"

"K00006-K00057-K00105-K00111-K00113"

"K16305"

"K00147""K13853"

"K01433-K01938-K13402"

"K01455"

"K05601""K01447-K01448"

"K07248"

"K00601-K08289-K11175" "K00288"

"K01639"

"K00297"

"K01114"

"K01625""K01595"

"K01593""K01491-K13403"

"K01631"

Page 41: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

41

Supplementary Figure 6 Representation of the fatty acid metabolic pathway (ko01212).

KOs with a betweenness centrality that is much higher in the metabolic network

reconstructed from the OMMC sampled in winter compared to the OMMC sampled in

autumn are highlighted in red. The key functionality of KO K03921 (acyl-[acyl-carrier

protein] desaturase) is highlighted in pink. Winter KOs with a high relative gene expression

are represented in blue. The dotted lines represent the continuity of the pathway and the

circles represent metabolites.

Palmi&c(acid(Hexadecanoyl0CoA(

K01897(

Hexadecanoyl(

K00665(

Stearoyl0CoA( Oleoyl0CoA( Linoleoyl0CoA(K00059(K10258(

K00647(K00208((K02372(K00647(K09458(K00208(K02371(K00059(K10258(

K03921' K10256(

Icosanoyl0CoA( Icosenoyl0CoA( Icosadienoyl0CoA(

Octadecatrienoyl0CoA(K10257(

Icosatrienoyl0CoA(

…..(

…..(

…..( …..(

K00059(

Acetyl0CoA(

…..(

K01716(K00665(K02371(K02372(K09458(K00208(K00647(

Malonyl0ACP(

Page 42: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

42

Supplementary Figure 7 OMMC-wide metabolic networks reconstructed from

metagenomic and metatranscriptomic data of OMMCs sampled in (a) autumn and (b) winter.

Large nodes indicate KOs encoding key functionalities whereas colours represent pathway

membership: yellow, nitrogen metabolism; orange, fatty acid biosynthesis; light green,

benzoate degradation; dark green, porphyrin and chlorophyll metabolism; pink, cystein and

methionine metabolism and red, other pathways.

630

a b

Page 43: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

43

Supplementary Figure 8 Micrographs of Isolate LCSB065. (a) Non-polar granules observed

in Isolate LCSB065 following Nile Red staining (λex 550/20 nm, λem 620/60 nm); (b)

Bright field micrograph; (c) Overlay of (a) and (b). The scale bar is equivalent to 10 µm.

a b c

Page 44: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

44

Supplementary Tables

Supplementary Table 1 Physicochemical characteristics of the wastewater at the time of

sampling in the anoxic tank of the Schifflange BWWT plant.

Sampling dates Suspended

solids (g/l)

pH NO3-

(mg/l)

O2

(mg/l)

NH4

(mg/l)

PO4

(mg/l)

Air

temperature

(°C)

Water

temperature

(°C)

4 October 2010 2.78 6.94 2.77 0.64 0.6 2.06 15 20.7

25 January 2011 3.24 7.01 3.37 1.13 1.72 1.02 0 14.5

635

Page 45: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

45

Supplementary Table 2 Quantitative and qualitative analyses of biomacromolecular

fractions sequentially isolated from the OMMC samples.

Sampling dates DNA RNA Protein

A260/A280 Quantity

[µg]

RIN Quantity

[µg]

Quantity per

gel lane [µg]

4 October 2010 2.20 15.44 8.9 81.97 20.1

25 January 2011 2.03 6.11 9.2 75 21.8

Page 46: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

46

Supplementary Table 3 Statistics of the combined assembly of the metagenomic and

metatranscriptomic sequence datasets. 640

Statistic Value Unit

Total contig length 1 617 735 059 nt

Average contig length 239 nt

N50 258 nt

Maximal contig length 12 591 nt

Number of contigs 6 761 781

Page 47: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

47

Supplementary Table 4 Ammonia-oxidizing organisms with corresponding accession

numbers of amoA genes used for the reconstruction of the phylogenetic tree.

645

Organism name Accession number

Candidatus Nitrososphaera gargensis gi|166007511

Methylobacter tundripaludum gi|493947876

Methylococcus capsulatus Bath gi|53803062

Methylomicrobium alcaliphilum 20Z gi|357403888

Nitrosococcus halophilus Nc4 gi|292490804

Nitrosococcus oceani ATCC19707 gi|77165960

Nitrosococcus watsonii C-113 gi|300113334

Nitrosomonas cryotolerans ATCC49181 gi|12620331

Nitrosomonas europaea ATCC19718 gi|30248947

Nitrosomonas eutropha C91 gi|114332043

Nitrosomonas sp. Is79A3 gi|339481967

Nitrosomonas sp. JL21 gi|19310210

Nitrosomonas sp. AL212 gi|325981491

Nitrosospira briensis C-128 gi|1732262

Nitrosospira multiformis ATCC25196 gi|82701932

Nitrosospira sp. APG3A gi|490283770

Nitrosospira sp. En13 gi|121483569

Nitrosospira sp. NpAV gi|2062746

Nitrosospira sp. Np39-19 gi|2425028

Nitrosovibrio tenuis gi|1732264

Page 48: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

48

Supplementary Table 5 Numbers of metagenomic and metatranscriptomic reads from

autumn and winter dates mapping to the isolate genomes of Nitrosomonas sp. Is79 (ref. 30)

and Isolate LCSB065.

Dataset Sampling dates Target organism

Nitrosomonas sp. IS79 Isolate LCSB065

metagenomic reads 4 October 2010 7 282 5 665

metagenomic reads 25 January 2011 13 058 7 941

metatranscriptomic reads 4 October 2010 4 004 1 923

metatranscriptomic reads 25 January 2011 28 631 20 784

Page 49: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

49

Supplementary References

1. Roume H, Muller EE, Cordes T, Renaut J, Hiller K, Wilmes P. A biomolecular isolation

framework for eco-systems biology. ISME J 2013; 7: 110-121. 650

2. Roume H, Heintz-Buschart A, Muller EE, Wilmes P. Sequential isolation of metabolites,

RNA, DNA, and proteins from the same unique sample. Microbial Metagenomics,

Metatranscriptomics, and Metaproteomics. Method Enzymol 2013; 531: 219-236.

655

3. Kozarewa I, Turner DJ. 96-plex molecular barcoding for the Illumina Genome Analyzer.

High-Throughput Next Generation Sequencing 2011; 279-298.

4. Masella AP, Bartram AK, Truszkowski JM, Brown DG, Neufeld JD. PANDAseq: paired-

end assembler for illumina sequences. BMC Bioinformatics 2012; 13: 31. 660

5. Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V, Futschik A et al.

PoPoolation: a toolbox for population genetic analysis of next generation sequencing data

from pooled individuals. PloS one 2011; 6: e15925.

665

6. Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and

comparing biological sequences. Bioinformatics 2010; 26: 680-682.

7. Kultima JR, Sunagawa S, Li J, Chen W, Chen H, Mende DR et al. MOCAT: a

metagenomics assembly and gene prediction toolkit. PloS one 2012; 7: e47656. 670

8. Li R, Li Y, Kristiansen K, Wang J. SOAP: short oligonucleotide alignment program.

Page 50: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

50

Bioinformatics 2008; 24: 713-714.

9. Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF. EMIRGE: reconstruction of 675

full-length ribosomal genes from microbial community short read sequencing data. Genome

Biol 2011; 12: R44.

10. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P et al. The SILVA ribosomal

RNA gene database project: improved data processing and web-based tools. Nucleic Acids 680

Res 2013; 41: D590-D596.

11. Treangen TJ, Koren S, Sommer DD, Liu B, Astrovskaya I, Ondov B et al. MetAMOS: a

modular and open source metagenomic assembly and analysis pipeline. Genome Biol 2013;

14: R2. 685

12. Namiki T, Hachiya T, Tanaka H, Sakakibara Y. MetaVelvet: an extension of Velvet

assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res

2012; 40: e155-e155.

690

13. Rho M, Tang H, Ye Y. FragGeneScan: predicting genes in short and error-prone reads.

Nucleic Acids Res 2010; 38: e191-e191.

14. Hyatt D, Chen G-L, LoCascio P, Land M, Larimer F, Hauser L. Prodigal: prokaryotic

gene recognition and translation initiation site identification. BMC Bioinformatics 2010; 11: 695

119.

Page 51: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

51

15. Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res 2002; 12: 656-664.

16. Kessner D, Chambers M, Burke R, Agus D, Mallick P. ProteoWizard: open source 700

software for rapid proteomics tools development. Bioinformatics 2008; 24: 2534-2536.

17. Craig R, Cortens JP, Beavis RC. Open source system for analyzing, validating, and

storing protein identification data. J Proteome Res 2004; 3: 1234-1242.

705

18. Deutsch EW, Mendoza L, Shteynberg D, Farrah T, Lam H, Tasman N et al. A guided

tour of the Trans-­‐Proteomic Pipeline. Proteomics 2010; 10: 1150-1159.

19. Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate

the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 710

2002; 74: 5383-5392.

20. Shteynberg D, Deutsch EW, Lam H, Eng JK, Sun Z, Tasman N et al. iProphet: multi-

level integrative analysis of shotgun proteomic data improves peptide and protein

identification rates and error estimates. Mol Cell Proteomics 2011; 10: M111.007690. 715

21. Muller EE, Pinel N, Laczny CC, Hoopmann MR, Narayanasamy S, Lebrun LA et al.

Community-integrated omics links dominance of a microbial generalist to fine-tuned

resource usage. Nat Commun 2014; 5: 5603; 1-10.

720

22. Lee S, Seo CH, Lim B, Yang JO, Oh J, Kim M et al. Accurate quantification of

transcriptome from RNA-Seq data by effective length normalization. Nucleic Acids Res 2011;

Page 52: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

52

39: e9-e9.

23. Tsementzi D, Poretsky R, Rodriguez-­‐R LM, Luo C, Konstantinidis KT. Evaluation of 725

metatranscriptomic protocols and application to the study of freshwater microbial

communities. Environ Microbiol Rep 2014; 6: 640-655.

24. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful

approach to multiple testing. Journal of the Royal Statistical Society. Series B 730

(Methodological) 1995; 289-300.

25. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D et al. Cytoscape: a

software environment for integrated models of biomolecular interaction networks. Genome

Res 2003; 13: 2498-2504. 735

26. Rahman SA, Schomburg D. Observing local and global properties of metabolic

pathways:‘load points’ and ‘choke points’ in the metabolic networks. Bioinformatics 2006;

22: 1767-1774.

740

27. Faust K, Croes D, van Helden J. Metabolic pathfinding using RPAIR annotation. J Mol

Biol 2009; 388: 390-414.

28. Csardi G, Nepusz T. The igraph software package for complex network research.

InterJournal, Complex Systems 2006; 1695: 1-9. 745

29. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search

Page 53: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

53

tool. J Mol Biol 1990; 215: 403-410.

30. Bollmann A, Sedlacek CJ, Norton J, Laanbroek HJ, Suwa Y, Stein LY et al. Complete 750

genome sequence of Nitrosomonas sp. Is79, an ammonia oxidizing bacterium adapted to low

ammonium concentrations. Stand Genomic Sci 2013; 7: 469.

31. Liu W, Li L, Khan MA, Zhu F. Popular molecular markers in bacteria. Mol Genet

Microbiol Virol 2012; 27: 103-107. 755

32. Sommer DD, Delcher AL, Salzberg SL, Pop M. Minimus: a fast, lightweight genome

assembler. BMC Bioinformatics 2007; 8: 64.

33. Treangen TJ, Sommer DD, Angly FE, Koren S, Pop M. Next generation sequence 760

assembly with AMOS. Curr Protoc Bioinformatics 2011; 11.8. 1-11.8. 18.

34. Sayavedra-­‐Soto L, Hommes N, Alzerreca J, Arp D, Norton JM, Klotz M. Transcription of

the amoC, amoA and amoB genes in Nitrosomonas europaea and Nitrosospira sp. NpAV.

FEMS Microbiol Lett 1998; 167: 81-88. 765

35. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation

sequencing data. Bioinformatics 2012; 28: 3150-3152.

36. Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F et al. Phylogeny. fr: 770

robust phylogenetic analysis for the non-specialist. Nucleic Acids Res 2008; 36: W465-W469.

Page 54: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

54

37. Castresana J. Selection of conserved blocks from multiple alignments for their use in

phylogenetic analysis. Mol Biol Evol 2000; 17: 540-552.

775

38. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. New

algorithms and methods to estimate maximum-likelihood phylogenies: assessing the

performance of PhyML 3.0. Syst Biol 2010; 59: 307-321.

39. Chevenet F, Brun C, Bañuls A-L, Jacq B, Christen R. TreeDyn: towards dynamic 780

graphics and annotations for analyses of trees. BMC Bioinformatics 2006; 7: 439.

40. Reasoner D, Geldreich E. A new medium for the enumeration and subculture of bacteria

from potable water. App Environ Microbiol 1985; 49: 1-7.

785

41. Levantesi C, Rossetti S, Thelen K, Kragelund C, Krooneman J, Eikelboom D et al.

Phylogeny, physiology and distribution of ‘Candidatus Microthrix calida’, a new Microthrix

species isolated from industrial activated sludge wastewater treatment plants. Environ

Microbiol 2006; 8: 1552-1563.

790

42. Slijkhuis H. Microthrix parvicella, a filamentous bacterium isolated from activated sludge:

cultivation in a chemically defined medium. App Environ Microbiol 1983; 46: 832-839.

43. Fowler SD, Greenspan P. Application of Nile red, a fluorescent hydrophobic probe, for

the detection of neutral lipid deposits in tissue sections: comparison with oil red O. J 795

Histochem Cytochem 1985; 33: 833-836.

Page 55: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

55

44. Schneider CA, Rasband WS, Eliceiri KW. NIH Image to ImageJ: 25 years of image

analysis. Nat Methods 2012; 9: 671-675.

800

45. Rodrigue S, Materna AC, Timberlake SC, Blackburn MC, Malmstrom RR, Alm EJ,

Chisholm SW. Unlocking short read sequencing for metagenomics. PLoS one 2010; 5:

e11840.

46. Xu H, Luo X, Qian J, Pang X, Song J, Qian G et al. FastUniq: a fast de novo duplicates 805

removal tool for paired short reads. PloS one 2012; 7: e52249.

47. Peng Y, Leung HC, Yiu S-M, Chin FY. IDBA-UD: a de novo assembler for single-cell

and metagenomic sequencing data with highly uneven depth. Bioinformatics 2012; 28: 1420-

1428. 810

48. Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome

Res 1998; 8: 195-202.

49. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA et al. The RAST Server: 815

rapid annotations using subsystems technology. BMC Genomics 2008; 9: 75.

50. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform.

Bioinformatics 2009; 25: 1754-1760.

820

51. Kerepesi C, Bánky D, Grolmusz V. AmphoraNet: The webserver implementation of the

AMPHORA2 metagenomic workflow suite. Gene 2014; 533: 538-540.

Page 56: SUPPLEMENTARY INFORMATION Comparative integrated omics: identification … · 2015-06-17 · SUPPLEMENTARY INFORMATION Comparative integrated omics: identification of key functionalities

56

52. Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for

genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000; 28: 33-36. 825

53. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang H-Y, Cohoon M et al. The

subsystems approach to genome annotation and its use in the project to annotate 1000

genomes. Nucleic Acids Res 2005; 33: 5691-5702.

830