Comprehensive ecosystem specific 16S rRNA database illuminates the microbial dark matter in anaerobic digesters

M.S. Dueholm*, S. Knutson*, V. Rudkjøbing*, M. Nierychlo*, J. Kristensen*, F. Petriglieri*, E. Yashiro*, S.M. Karst*, M. Albertsen*, P.H. Nielsen*

*Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, 9220 Aalborg, Denmark

Abstract

We have recently developed a method that can produce millions of high-quality, full-length 16S rRNA sequences from any environment. Here, we used this method to generate 412,174 such sequences from anaerobic digesters at Danish wastewater treatment plants. From these sequences, we created the first comprehensive anaerobic digester-specific 16S rRNA reference database, containing 9,174 unique full-length 16S rRNA sequences. Phylogenetic analyses of the database revealed many novel microbes, highlighting large gaps in our current knowledge of the microorganisms involved in the anaerobic digestion (AD) process. The reference database now allows us to assign provisional names to these organisms and to begin linking them to their ecological roles in the AD process.

Keywords: Full-length 16S rRNA; Diversity; Microbial Dark Matter

Session: Microbiology of anaerobic digestion/ (meta)genomic research

Introduction

Anaerobic digestion (AD) enables near-complete microbial degradation of complex organic waste into methane and carbon dioxide under anaerobic conditions. It therefore serves two important roles in the future circular economy: it provides a means for sustainable disposal of organic waste, and it is a source of renewable energy. The AD process requires the concerted action of many different specialized bacteria and archaea organized into complex microbial communities. Although AD is a well-established technology, we know surprisingly little about the microbes involved, and as a consequence, process improvements are often based on empirical knowledge and trial and error rather than derived from scientific understanding. Molecular methods, such as 16S rRNA amplicon sequencing and fluorescence in situ hybridization (FISH) microscopy, can provide detailed information about how individual microbes (beneficial or detrimental) respond to specific environmental cues, as well as about their physiology and roles in the anaerobic food web. Both methods, however, depend on reference databases that cover the organisms present, and no comprehensive database specific to anaerobic digesters has so far been available.

Materials and Methods

DNA and RNA were extracted from biomass sampled from anaerobic digesters at 16 Danish wastewater treatment plants, all of which are part of the MiDAS project (McIlroy et al., 2017). Full-length 16S rRNA sequencing libraries were prepared using both the primer-free and the primer-based protocols described previously (Karst et al., 2018). Both libraries were sequenced together on an Illumina HiSeq rapid run and assembled using the bioinformatic pipeline described in the same study (Karst et al., 2018). Unique, exact full-length 16S rRNA sequence variants (ESVs) were identified using the USEARCH fastx_uniques command with singletons discarded, on the assumption that random sequencing errors in independently amplified molecules would rarely produce the same erroneous sequence twice. ESVs were aligned to sequences in the SILVA v. 132 SSU Ref NR 99 database (Quast et al., 2013) using the SINA aligner (Pruesse et al., 2012). Each ESV was then given a comprehensive taxonomy: the SILVA taxonomy of its closest neighbor was trimmed to the ranks supported by the sequence identity between the two, according to the thresholds proposed by Yarza et al. (2014). Gaps in the taxonomy were filled with a de novo taxonomy constructed using Uclust and the same thresholds. Phylogenetic trees were calculated using FastTree v2.1.7 SSE3 (Price et al., 2010) with Gamma20-based likelihoods and imported into ARB (Ludwig et al., 2004) for visualization. Microbial community analysis was performed on all samples based on 16S rRNA amplicon sequencing with primers targeting the V4 region of the 16S rRNA gene (Caporaso et al., 2012), as previously described (Karst et al., 2016), except that amplicon sequence variants (ASVs) were used instead of clustered operational taxonomic units (OTUs).
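
The dereplication step can be illustrated with a minimal Python sketch. It is not USEARCH itself; it only mimics the idea of collapsing identical full-length sequences and discarding variants observed once. The function name `dereplicate`, its `min_count` parameter, the in-memory list of reads, and the output header format are hypothetical choices for illustration.

```python
from collections import Counter


def dereplicate(sequences, min_count=2):
    """Collapse identical sequences into exact sequence variants (ESVs).

    Returns (sequence, count) pairs for variants seen at least `min_count`
    times, most abundant first. With min_count=2, singletons are dropped.
    """
    counts = Counter(seq.upper() for seq in sequences)
    kept = [(seq, n) for seq, n in counts.items() if n >= min_count]
    # Sort by abundance, breaking ties alphabetically for a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


if __name__ == "__main__":
    # Toy reads; real input would be full-length 16S rRNA sequences.
    reads = ["ACGT", "ACGT", "ACGA", "TTGC", "TTGC", "TTGC"]
    for i, (seq, n) in enumerate(dereplicate(reads), start=1):
        print(f">ESV{i} count={n}")
        print(seq)
```

Requiring at least two observations is the whole error filter here: a variant seen in two independently amplified molecules is unlikely to be a shared random error, whereas a singleton might be.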

Results and Conclusions

Phylogenetic analysis of the ESVs revealed several new taxonomic groups within the domain Bacteria, including four ESVs belonging to novel orders (Fig. 1). These novel groups were distributed throughout the bacterial tree of life. In stark contrast, all archaeal ESVs corresponded to known species.

Figure 1. Overview of microbial diversity in the anaerobic digesters. Approximately maximum-likelihood phylogenetic trees showing the archaeal and bacterial diversity observed in the full-length 16S rRNA ESV database. The trees include all ESVs generated in this study and were calculated using FastTree. Sequences are color-coded according to their sequence identity to their closest neighbor in the SILVA database. The table shows the number of ESVs predicted to belong to novel taxa, based on sequence identity and the thresholds proposed by Yarza et al. (2014).
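
The rank trimming described in the Methods and the novelty counts in the Figure 1 table can be sketched as follows, assuming the percent identity of each ESV to its closest SILVA neighbor has already been computed. The threshold values are the median identities reported by Yarza et al. (2014) for each rank. Treating identity at or above a threshold as sufficient, always keeping the domain, and the function names `trim_taxonomy` and `novelty_rank` are assumptions of this sketch, not the authors' code; the taxon names and identities in the demo are hypothetical.

```python
from collections import Counter

# Median 16S rRNA gene identity thresholds (%) per rank, from Yarza et al. (2014).
YARZA_THRESHOLDS = {
    "phylum": 75.0,
    "class": 78.5,
    "order": 82.0,
    "family": 86.5,
    "genus": 94.5,
    "species": 98.7,
}
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def trim_taxonomy(neighbor_taxonomy, identity):
    """Keep the ranks of the closest neighbor's taxonomy supported by `identity` (%).

    Thresholds increase from phylum to species, so trimming stops at the
    first rank whose threshold is not reached.
    """
    kept = []
    for rank, name in zip(RANKS, neighbor_taxonomy):
        # Assumption: the domain is always kept.
        if rank != "domain" and identity < YARZA_THRESHOLDS[rank]:
            break
        kept.append(name)
    return kept


def novelty_rank(identity):
    """Return the highest rank at which an ESV is predicted to be novel, or None."""
    for rank in RANKS[1:]:
        if identity < YARZA_THRESHOLDS[rank]:
            return rank
    return None


if __name__ == "__main__":
    # Hypothetical closest-neighbor taxonomy and identities, for illustration only.
    hit = ["Bacteria", "Phylum_A", "Class_A", "Order_A", "Family_A", "Genus_A", "Species_A"]
    print(trim_taxonomy(hit, 85.0))  # ['Bacteria', 'Phylum_A', 'Class_A', 'Order_A']

    identities = {"ESV1": 99.5, "ESV2": 96.0, "ESV3": 85.0, "ESV4": 80.0}
    novel = Counter(r for r in map(novelty_rank, identities.values()) if r)
    print(dict(novel))
```

An ESV at 85% identity, for example, clears the order threshold but not the family threshold, so it keeps its neighbor's names down to order and is counted as a novel family.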

Figure 2. Microbial community composition in the studied anaerobic digesters. The heatmap shows the 25 most abundant genera in the anaerobic digesters. Results are based on amplicon sequencing targeting the V4 region of the 16S rRNA gene, with taxonomy assigned using our comprehensive anaerobic digester-specific 16S rRNA reference database. Previously unknown genera are highlighted in red, and amplicons that could not be assigned to a specific genus are highlighted in blue.
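
One way to choose the genera for a heatmap like Figure 2 is to rank them by mean relative abundance across digesters. The caption does not state the ranking criterion, so the use of the mean, the input layout (sample name mapped to per-genus read counts), and the function names below are assumptions; the demo counts are made up.

```python
def relative_abundance(counts):
    """Convert one sample's read counts {genus: reads} to percentages."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: 100.0 * n / total for genus, n in counts.items()}


def top_genera(samples, n=25):
    """Return the n genera with the highest mean relative abundance.

    `samples` maps a sample name to {genus: read count}. A genus absent
    from a sample counts as 0% in that sample. Ties are broken by name.
    """
    rel = [relative_abundance(counts) for counts in samples.values()]
    genera = set().union(*rel)
    mean = {g: sum(r.get(g, 0.0) for r in rel) / len(rel) for g in genera}
    return sorted(mean, key=lambda g: (-mean[g], g))[:n]


if __name__ == "__main__":
    # Made-up read counts for two hypothetical digesters.
    demo = {
        "digester_1": {"Genus_A": 50, "Genus_B": 30, "Genus_C": 20},
        "digester_2": {"Genus_A": 10, "Genus_C": 90},
    }
    print(top_genera(demo, n=2))  # ['Genus_C', 'Genus_A']
```

Normalizing each sample to percentages before averaging keeps deeply sequenced samples from dominating the ranking.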

Amplicon sequencing was performed on all samples, and taxonomy was assigned to the amplicons using our ESV database (Fig. 2). Approximately 80-90% of all amplicons could be assigned to specific genera, indicating good coverage of the reference database. Interestingly, 9 of the 25 most abundant genera were novel genera not found or named in the SILVA v. 132 SSU Ref NR 99 database. Most of these were highly abundant in some digesters, suggesting a role in the AD process. With our ecosystem-specific database, we can now put names on these genera and begin to assign functions to them, thereby shedding light on the microbial dark matter in anaerobic digesters.
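
The genus-level assignment rate can be computed per sample from an ASV table once each ASV has been given a genus (or none). This sketch weights ASVs by read count; whether the reported 80-90% is read-weighted is not stated, and the function name and input dictionaries are hypothetical. The classifier that assigns genera from the ESV database is not shown.

```python
def genus_assignment_rate(asv_counts, asv_genus):
    """Percentage of reads in one sample whose ASV was classified to genus.

    asv_counts: {asv_id: read count in the sample}
    asv_genus:  {asv_id: genus name, or None if unassigned at genus level}
    ASVs missing from asv_genus are treated as unassigned.
    """
    total = sum(asv_counts.values())
    if total == 0:
        return 0.0
    assigned = sum(n for asv, n in asv_counts.items() if asv_genus.get(asv))
    return 100.0 * assigned / total


if __name__ == "__main__":
    # Toy numbers for illustration only.
    sample = {"ASV1": 600, "ASV2": 250, "ASV3": 150}
    genus_of = {"ASV1": "Genus_A", "ASV2": "Genus_B", "ASV3": None}
    print(f"{genus_assignment_rate(sample, genus_of):.1f}%")  # 85.0%
```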

References:

Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, et al. (2012). Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J 6: 1621–1624.

Karst SM, Albertsen M, Kirkegaard RH, Dueholm MS, Nielsen PH. (2016). Molecular Methods. In: van Loosdrecht MCM, Nielsen PH, Lopez-Vazquez CM, Brdjanovic D (eds). Experimental Methods in Wastewater Treatment. IWA Publishing, pp 301–339.

Karst SM, Dueholm MS, McIlroy SJ, Kirkegaard RH, Nielsen PH, Albertsen M. (2018). Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences without primer bias. Nat Biotechnol 36: 190–195.

Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, et al. (2004). ARB: A software environment for sequence data. Nucleic Acids Res 32: 1363–1371.

McIlroy SJ, Kirkegaard RH, McIlroy B, Nierychlo M, Kristensen JM, Karst SM, et al. (2017). MiDAS 2.0: An ecosystem-specific taxonomy and online database for the organisms of wastewater treatment systems expanded for anaerobic digester groups. Database 2017: 1–9.

Price MN, Dehal PS, Arkin AP. (2010). FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One 5: e9490.

Pruesse E, Peplies J, Glöckner FO. (2012). SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28: 1823–1829.

Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. (2013). The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res 41: D590–D596.

Yarza P, Yilmaz P, Pruesse E, Glöckner FO, Ludwig W, Schleifer K-H, et al. (2014). Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol 12: 635–645.