Metaproteomics reveals insights into microbial structure ...
Metaproteomics of the Human Intestinal Tract to Assess … · 2017. 3. 11. · I ABSTRACT The human...
Transcript of Metaproteomics of the Human Intestinal Tract to Assess … · 2017. 3. 11. · I ABSTRACT The human...
Department of Veterinary Biosciences
Faculty of Veterinary Medicine
University of Helsinki
Helsinki, Finland
Metaproteomics of the Human Intestinal Tract to Assess
Microbial Functionality and Interactions with the Host
Carolin Adriane Kolmeder
ACADEMIC DISSERTATION
To be presented, with the permission of the Faculty of Veterinary Medicine of the University of
Helsinki, for public examination in Walter auditorium, EE-building, Agnes Sjöberginkatu 2, Helsinki, on
May 28th, at 12 noon.
Helsinki 2015
Director of Studies:
Professor Airi Palva
Department of Veterinary Biosciences
University of Helsinki
Helsinki, Finland
Supervisors:
Professor Willem M. de Vos
Department of Veterinary Biosciences and RPU Immunobiology
Faculty of Medicine, University of Helsinki,
Helsinki, Finland and
Laboratory of Microbiology
Wageningen University
Wageningen, The Netherlands
Docent Anne Salonen
Department of Immunology and Bacteriology
University of Helsinki
Helsinki, Finland
Professor Airi Palva
Department of Veterinary Biosciences
University of Helsinki
Helsinki, Finland
Pre-examiners:
Docent Kirsi Savijoki
Department of Food and Environmental
Sciences
University of Helsinki
Helsinki, Finland
Associate Professor Koen Venema
Department of Human Biology
Maastricht University – Campus Venlo
Maastricht, The Netherlands
Opponent:
Docent Petri Auvinen
Institute of Biotechnology
University of Helsinki
Helsinki, Finland
Published in Dissertationes Schola Doctoralis Scientiae Circumiectalis, Alimentarie, Biologicae
ISBN 978-951-51-1169-2 (paperback) and 978-951-51-1170-8 (PDF)
ISSN 2342-5423 (print) and 2342-5431 (online)
http://ethesis.helsinki.fi
Cover: Talvinen auringonlasku Kolilla – Jochen Müller
Layout: Jochen Müller
Hansaprint
Helsinki, Finland 2015
Non nobis solum nati sumus. Cicero/Plato
ABSTRACT ...................................................................................................................................................... I
LIST OF ORIGINAL PUBLICATIONS ...................................................................................................... II
ABBREVIATIONS ........................................................................................................................................ III
GLOSSARY ................................................................................................................................................... IV
1. INTRODUCTION ........................................................................................................................................ 1
2. REVIEW OF THE LITERATURE ............................................................................................................ 3
2.1 Human intestinal microbiota ................................................................................................................ 3
2.1.1 Development from birth to old age ................................................................................................................ 3
2.1.2 Variation and modulation in adulthood ....................................................................................................... 6
2.2 Methods to study the human intestinal microbiota ............................................................................ 7
2.2.1 Faecal material as the study object ............................................................................................................... 7
2.2.2 Phylogenetic analysis .................................................................................................................................... 8
2.2.3 Metagenomics ................................................................................................................................................ 9
2.2.4 Metatranscriptomics .................................................................................................................................... 13
2.2.5 Metaproteomics ............................................................................................................................................ 13
2.2.6 Data integration ........................................................................................................................................... 19
3. AIMS OF THE STUDY ............................................................................................................................. 20
4. MATERIALS & METHODS .................................................................................................................... 21
4.1 Study cohorts and samples ................................................................................................................. 21
4.2 Methods ................................................................................................................................................ 22
4.2.1 Metaproteomics ............................................................................................................................................ 22
4.2.2 Phylogenetic microarray analysis ............................................................................................................... 23
4.2.3 Statistical analysis ........................................................................................................................................ 24
4.2.4 Additional data-sets used in this thesis ....................................................................................................... 24
5. RESULTS AND DISCUSSION ................................................................................................................. 25
5.1 Method development to study the human intestinal metaproteome (I, II and V) ......................... 25
5.1.1 Analysis of the abundant metaproteome (II) .............................................................................................. 25
5.1.2 Reproducibility testing of the pipeline (II) .................................................................................................. 27
5.1.3 Development of in-house databases (I-IV) ................................................................................................. 28
5.1.4 Functional annotation with COG, KEGG and eggNOG (I-IV) ................................................................. 29
5.1.5 Reflections on the metaproteomic approach ............................................................................................... 30
5.2 Intestinal metaproteomic baseline of healthy individuals and its variation over time .................. 31
5.2.1 Individuality of the faecal metaproteome (II and III) ................................................................................ 33
5.2.2 Main microbial functionalities (II and III) ................................................................................................ 33
5.3 Host-microbiota interactions (II-IV) ................................................................................................. 34
5.4 Comparing the metaproteome of lean and obese individuals (IV).................................................. 35
5.5 Phylogeny-specific metabolic activity – 16S rRNA genes versus peptides analysis (II-IV) .......... 36
5.6 Reflections on study cohorts ............................................................................................................... 37
6. CONCLUSIONS AND FUTURE PERSPECTIVES ............................................................................... 39
7. ACKNOWLEDGEMENTS ....................................................................................................................... 42
8. REFERENCES ........................................................................................................................................... 45
I
ABSTRACT
The human body is complemented by its microbes. Consequently, various intimate reactions between host
and microbes have been recognized and determining the role of the microbiota in many diseases has started.
Until the present time, most attention has been focused on the intestinal tract that contains the majority of the
microbiota. Extensive studies on the phylogenetic composition of the microbes present in faecal material
revealed species richness and the individuality of the intestinal microbiota, unique to each human being.
The work described in this thesis started when the first large scale metagenome analysis of the intestinal
microbiota for global exploitation of the potential biological functions of this ecosystem was being initiated.
Today, approximately ten million unique genes have been identified from the human intestinal microbiota,
but little is known about which of these genes can be expressed into proteins and the conditions under which
the protein synthesis occurs. Therefore, metaproteomics has emerged as a discipline in order to delve into the
biological functions of an ecosystem in a holistic manner by targeting all the proteins present.
This thesis work describes the development of a metaproteomic approach for analysing proteins from
the human intestinal tract. The basis of this approach is to use crude faecal material to determine bacterial
and human proteins simultaneously. Separating the proteins by one dimensional gel-electrophoresis and
focusing on the molecular weight region defined by the 37 and 75 kDa bands of a protein marker was found
to be both reproducible and feasible for the comparison of several dozens of samples. In a case study on the
metaproteome and metagenome of two individuals the benefit of utilizing matched metagenomes on the
number of identified spectra could be demonstrated. Furthermore, a bioinformatic workflow was devised to
increase the ratio of peptides identified per protein class.
Applying the developed metaproteomic approach to faecal samples derived from healthy adults
identified ecosystem specific components including surface proteins like flagellins and proteins involved in
vitamin synthesis. The monitoring of the metaproteome over six weeks in a probiotic intervention study
revealed time-dependent and inter-individual variations in these functions. The major finding emerging from
this work was that the metaproteome is personalized, both over the long-term as well as in short-term.
Therefore, the phylogenetic individuality of the intestinal microbiota was also functionally reflected. A
different set of microbes in their specific host in combination with environmental factors will express
specific functionalities.
Analysing both microbial and human proteins, which represented approximately 85% and 15% of the
total proteins respectively, allowed identifying indications of host-microbe interactions. Enzymes both for
degrading host and microbial components were identified. Effector and response molecules of the host-
microbe dialogue were found.
In addition, the metaproteome of a cohort of obese individuals was compared to that of non-obese
individuals. Based on the spectral counts of the eggNOG assignments of the identified peptides the faecal
metaproteomes of obese and non-obese subjects were significantly different from each other. In addition,
insulin-sensitivity correlated significantly with bacterial proteins.
In most cases, metaproteomic data was complemented by phylogenetic microarray data. This allowed
looking at the composition and its function at the same time. Data integration revealed for example a
correlation between Eubacterium rectale and flagellins. In the obesity study, the proportion of Bacteroidetes
peptides in the faecal samples from the obese subjects compared to the non-obese subjects were relatively
higher than the presence of this phylum detected by 16S rRNA analysis. This result implies biological
activity of Bacteroidetes also in obese individuals.
Metagenomics is still ahead of metaproteomics, both in numbers of analysed samples and available
tools. However, a refinement of metaproteomic methodologies has started and the number of samples
analysed is rising. Metaproteomics is one of the central disciplines to identify the key-molecules involved in
host-microbes interactions.
II
LIST OF ORIGINAL PUBLICATIONS
This thesis is based on the following original publications referred to in the text by their Roman numerals (I-
V):
I Rooijers, K.*, Kolmeder, C.*, Juste, C., Doré, J., de Been, M., Boeren, S., Galan, P., Beauvallet, C.,
de Vos, W.M. and Schaap, P.J., 2011. An iterative workflow for mining the human intestinal
metaproteome. BMC Genomics, 12(6).
*Equal contribution
II Kolmeder, C.A., de Been, M., Nikkilä, J., Ritamo, I., Mättö, J., Valmu, L., Salojärvi, J., Palva, A.,
Salonen, A. and de Vos, W.M., 2012. Comparative metaproteomics and diversity analysis of human
intestinal microbiota testifies for its temporal stability and expression of core functions. PLoS ONE,
7(1), e29913.
III Kolmeder, C.A., Salojärvi, J., Ritari, J., de Been, M., Raes, J., Falony, G., Vieira-Silva, S.,
Kekkonen, R.A., Corthals, G., Palva, A., Salonen, A. and de Vos, W.M., 2015. Faecal
metaproteomic analysis reveals a personalized and stable functional microbiome and limited effects
of a probiotic intervention in adults. Submitted.
IV Kolmeder, C.A., Ritari, J., Verdam, F.J., Muth, T., Keskitalo, S., Varjosalo, M., Fuentes, S., Greve,
J.W., Buurman, W.A., Reichl, U, Rapp, E., Martens, L., Palva, A., Salonen, A., Rensen, S. and de
Vos, W.M., 2015. Colonic metaproteomic signatures of active bacteria and the host in obesity.
Submitted.
V Kolmeder, C.A. and de Vos, W.M., 2014. Metaproteomics of our microbiome – developing insight
in function and activity in man and model systems. Journal of Proteomics, 97, 3-16.
The original articles are reprinted with the kind permission of the publishers.
In addition, some unpublished results are presented.
III
ABBREVIATIONS
1-DE one dimensional gel electrophoresis
2-DE two dimensional gel electrophoresis
AMP abundant metaproteome
BMI body mass index
CD Crohn’s disease
COG cluster of orthologous groups
DB database
EggNOG evolutionary genealogy of genes: non-supervised orthologous groups
FDR false discovery rate
geLC-MS/MS protein fractionation by 1-DE and peptide analysis by LC-MS/MS
GIT gastro-intestinal tract
GMM microbial gastro-intestinal metabolic modules
HPLC high performance liquid chromatography
HMP human microbiome project
IBD inflammatory bowel disease
IBS irritable bowel syndrome
IMG/JGI integrated microbial genomes/joint genome institute
KEGG Kyoto encyclopedia of genes and genomes
KO KEGG orthology
LC liquid chromatography
LCA lowest common ancestor
MALDI-ToF matrix assisted laser desorption/ionization time-of flight
MRM multiple reaction monitoring
MS mass spectrometry
MS1 peptide ions
MS/MS tandem mass spectrometry
NGS next generation sequencing
NSP non-starch polysaccharides
NOG non-supervised orthologous group
OTU operational taxonomic unit
PSM peptide spectral matching
qPCR quantitative polymerase chain reaction
RS resistant starch
SCFA short chain fatty acids
SCX strong cation exchange
SRM selected reaction monitoring
UniProtKB universal protein resource knowledgebase
IV
GLOSSARY
Activity Collection of expressed functions of a cell; in contrast to genetic material and
potential
Ecosystem Living environment and organisms within
Functionality Collection of functions a cell is capable of performing
Matched data Metaproteomic and metagenomic data originating from the same biological sample;
matched metaproteome and matched metagenome
Microbiota Collection of microbes living in an ecosystem
Microbiome Collection of genes of a microbiota; also used as a synonym for microbiota
Meta-Omics Global analysis of molecules contained in an ecosystem or microbiota with omics
techniques that aim at the collective characterization and quantification of pools of
biological molecules; metagenomics, metatranscriptomics, metaproteomics
Phylogeny Evolutionary relation of organisms; nowadays mostly based on genetic comparison
Taxonomy Hierarchical grouping of organisms according to phylogeny
Introduction
1
1. INTRODUCTION
Making new biological discoveries requires two components: a special interest as well as technological
possibilities. The discovery of microbial life by Antonie van Leeuwenhoek (1632-1723) is a good example
from the early days of microbiology as it illustrates his keen interest in understanding a biological
phenomenon and advancing technical developments. Van Leeuwenhoek was a careful observer and
associated his diarrhoea to the presence of more and different little animals “animalcules”, as he called the
microbes, in his faecal excrements. From his drawings one may conclude that he suffered from an infection
by the protozoan Giardia. In addition, he developed lenses that later advanced into microscopes, instruments
that are still widely used in microbiology today. Ever since, most attention for microbes living in or around
the human body, has been focused on pathogenic microbes. However, in the late 19th century studying
microbes being possibly beneficial to humans, such as the lactic acid bacteria, started. Microbiology has
come a long way from these observations. Today, not only single microbial organisms are studied in
isolation but also complete microbial communities, termed microbiota, are analysed, preferably in their
original environment. The analysis of soil, oceans, eons-old ice, and the bodies of a variety of animals, has
revealed huge microbial species variations and has provided indications of microbial functions. For instance,
termites have specialized hindgut microbes helping them with degrading wood material and cows require a
rumen microbiota to supply them with energy out of grass cellulose. By now, from all these microbial
reservoirs, one has received most attention, as evidenced by the numbers of papers: the human being. Skin,
the oral cavity and gut, among other body sites, are populated by microbes. In particular, the intestinal
microbiota has become the focus of many studies that address their associations with health and disease.
Modern microbiology has access to a large set of sophisticated molecular tools, each of which provides
important information on the studied ecosystem (Table 1; Figure 1). The dominant phylogenetic composition
of a microbiota can be assessed by analysing the 16S rRNA gene; this answers the question ‘Who is there’.
The development of a variety of high-throughput gene sequencing methods has permitted the analysis of the
entire gene repertoire of a microbial community, also known as the microbiome. The resulting metagenomic
information describes the functional potential of the intestinal microbes, answering the question ’What can
they do’. At present, phylogenetic and metagenomic approaches targeting the human microbiota are
dominating the literature. However, other meta-omic approaches that analyse microbial consortia within their
ecosystem are required to expand our knowledge of their in vivo functions by addressing the question ‘What
are they doing’. The metabolic net-outcome (metabolomics) has been determined in a variety of studies. To
some extent, the transcriptional activity of the microbial ecosystems (metatranscriptomics) has been
addressed as well. Similarly, less studied than the phylogenetic composition and metagenome is the
metaproteome, which includes the entire collection of proteins that are present within an ecosystem.
Studying the proteins is required to answer the question ‘What has happened’. The term metaproteome was
first proposed in 2004 (Rodríguez-Valera, 2004). In the same year, the first metaproteomic study, which
analysed the metaproteome of activated sludge, was published (Wilmes and Bond, 2004). It was not before
2006, that the possibilities and challenges of metaproteomics were described in full (Wilmes and Bond,
2006). At the start of this thesis work, only one paper had been published describing the qualitative changes
of the intestinal metaproteome occurring during early infancy (Klaassens et al. 2007). Due to the absence of
reference genomes that are instrumental for protein identification, less than ten proteins could be identified.
By now, metaproteomics has found application in many different environments from contaminated soil to the
Atlantic Ocean, from termite hind-gut to human cerumen (Benndorf et al. 2007; Warnecke et al. 2007; Feig
et al. 2013; Georges et al. 2014). The main aims of this thesis project were to develop an approach to analyse
the human intestinal metaproteome and then to utilize this approach to establish a baseline of a healthy
metaproteome.
Introduction
2
Table 1. Overview of state-of-the-art methods to study the human intestinal microbiota. The analytical target is named
as well as the expected deliverables (Readout).
Method Target Readout
Phylogenetic analysis 16S rRNA sequences Microbiota composition and diversity
Metagenomics All DNA present Catalogue of microbial coding capacity and
diversity on the functional level
Metatranscriptomics All mRNA Transcription of microbial genes as a proxy
for their function – some host transcripts
may be discovered as well
Metaproteomics All proteins Expression of microbial and host genes into
proteins as a mark for their intestinal
function
Metabolomics All metabolites and other small molecules Net metabolic output of the intestine derived
from host-microbe interactions
Culturing Single microbes Real functional analysis
Figure 1. Tools with which to study the intestinal ecosystem. The intestinal ecosystem is directly influenced by the host
and the environment surrounding the host.
Review of the Literature
3
2. REVIEW OF THE LITERATURE
2.1 Human intestinal microbiota
2.1.1 Development from birth to old age
We are born into a world full of microorganisms, including Bacteria, Archaea and Fungi as well as their
viruses. Consequently, these microbes also colonize us. The first cohabitants on our inner and outer surfaces
usually derive from our mothers and there seem to be routes that expose already the foetus to microbes,
challenging the earlier view that babies are borne completely sterile (Tissier, 1900; Jiménez et al. 2005; Jost
et al. 2014). Recent phylogenetic analysis has revealed that a great variety of microbial communities are
hosted by the human body, distinct to each body site and niche (Human Microbiome Project Consortium,
2012). Each individual harbours a unique microbiota, this being influenced by both the host’s genes and
environmental factors. However, for example the skin microbiota of two individuals is more similar than the
skin microbiota and the intestinal microbiota within one individual, highlighting the site-specificity of the
host-associated microbial communities. In addition, even a single body site, such as the mouth, can contain
many micro-environments and hence different microbial communities exist in different niches (Kort et al.
2014).
The intestine is not only the most complex but also the most studied of these ecosystems i.e. the
environment and the organisms living within it (Qin et al. 2010; Karlsson et al. 2014; Li et al. 2014). The
start-up community of the intestine may be shaped by the mode of delivery, infant nutrition – breast milk
versus formula – and use of antibiotics, and may thereby also have an impact on health later in life (Penders
et al. 2006; Scholtens et al. 2012; Nylund et al. 2014). Probably the most long-lasting effect of the intestinal
community on our well-being is the stimulation of the immune-system. The intestinal microbiota plays an
essential part in the priming of the immune system, reviewed by Weng and Walker (2013). This priming is
needed in order to acquire the delicate balance between tolerance to antigens derived from food and
commensals, and effective reactivity against pathogens later in life. As the human body and its physiology
develop from infancy to adulthood, so does our intestinal microbiota. There are drastic changes taking place
in infancy and childhood evidently affecting the microbiota. One early event is the change from nursing on
breast milk, containing growth stimulating components for microbes (Wang et al. 2015), or formula to eating
solid food (Bergström et al. 2014). The constant body growth and other developmental changes cause a
constantly changing environment for the intestinal microbiota. The present studies have mostly focused on
analysing faecal samples from infants and adults. From toddler age to adolescence less data is available and
therefore the constant succession of the microbiota from infancy to adulthood is largely unknown. However,
from early life until adulthood the diversity of the microbiota increases and an ecosystem rich in several
hundreds to thousands of microbial species becomes established (Nielsen et al. 2014). At the level of
operational taxonomic units (OTUs), unique organismal genetic units, the variation across individuals is
tremendous, even though the main phyla in the gut are limited, including Firmicutes, Bacteroidetes,
Actinobacteria, Proteobacteria and Verrucomicrobia. Collecting and curating 16S rRNA gene sequences at
97% similarity derived from the human intestine as deposited in public databases resulted in the definition of
2,473 OTUs (Ritari et al. 2015). These represent the present estimate of the number of bacterial and archaeal
species possibly found in the human intestine, meaning that there is an almost infinite number of
combinatorial community alternatives. In addition, more genera and species are constantly being isolated
(Rajilić-Stojanović and de Vos, 2014). As discovered over a decade ago, each individual harbours one’s own
personalized microbiota determined by genetic, environmental and dietary factors. As long as no drastic
changes occur (see below), the microbiota is relatively stable and the temporal fluxes are smaller than the
between-individual differences (Jalanka-Tuovinen et al. 2011). Therefore, the individual microbiota can be
used for subject-identification. A challenging down-side of the large microbiota differences between
Review of the Literature
4
individuals is evident when setting up group comparisons e.g. when attempting to study the effect of a
special dietary regime. The between-group differences are hard to detect when the within-group differences
are already large. However, unifying patterns across a collection of individuals have been identified and a set
of enterotypes, each dominated by a certain group of bacteria, have been proposed (Arumugam et al. 2011).
These patterns suggest that there is a limited set of module solutions providing the integral features in the
intestine, provided by dominating bacteria. Such patterns like the enterotypes may be useful in study design
since stratifying for a certain enterotype may make it easier to detect group differences. Recently, also the
microbiota diversity (Salonen et al. 2014) and the abundance of individual bacteria (Korpela et al. 2014)
have been reported to associate with the individual heterogeneity in dietary responses. This suggests that
different features of the microbiota can be utilized in the stratification of the study subjects and in this way
decrease the within-group variability.
The whole digestive tract, from the oral cavity down to the colon, provides several different niches for
microbial colonization, with the colon exhibiting the highest microbial density (Figure 2). As nutrient
availability and human components change along the gastro-intestinal tract (GIT), the composition of the
microbiota changes from one end to the other. The living environment of the intestinal microbes is not static
at all. Among other factors, the amount of oxygen, the pH that varies from acidic in the stomach to more
neutral in the intestine, and notably bile, which is a natural detergent, affect the types and amount of
microbes present. In addition, pH and bile can vary due to changes in the diet and these can lead to changes
in microbial communities (Duncan et al. 2009). The microbiota is encompassed by varying human enzymes
having their origin in the saliva, gastric and pancreatic juice, bile and, in early life, breast milk. For example,
the human intestinal alkaline phosphatase and the salivary scavenger and agglutinin (SALSA) have been
shown to have an effect on the growth of intestinal microbes (Lallès, 2014; Reichhardt et al. 2014). The
microbiota is also in contact with mucus, immune components and immune cells. Inevitably, evolution has
produced an individual host-microbiota mutualism (Quercia et al. 2014).
The human host provides the home for the microbiota and the intestinal microbes make a recognizable
contribution to human physiology. The most obvious factor that links men and microbes is the diet and its
major components; carbohydrates, lipids and proteins. Out of the large variety of carbohydrates, mono- and
disaccharides and the less complex polysaccharide starch are digested mostly by host enzymes in the upper
GI tract and immediately taken up as they are the major energy source. However, resistant starch (RS) and
non-starch polysaccharides (NSP), which include celluloses, pectin, and hemicellulose (xylose, mannose or
glucose), mostly reach the colon in an undigested form. Microbes, equipped with a large variety of
hydrolases, degrade, to different extents, these plant-derived carbohydrates, which consist of a variety of
sugars that are linked with various glycosylic bonds (Martens et al. 2014). For example, the genome of
Bacteroides cellulosilyticus encodes over 400 enzymes for carbohydrate degradation. In general, the
microbial gene repertoire for hydrolysing carbohydrates is much wider than that of the host. This means that
humans depend on microbes to degrade certain food components; this may provide flexibility to adapt to
short or long-term changes in the diet (Sonnenburg and Sonnenburg, 2014). Certain intestinal microbes not
only rely on carbohydrates of dietary origin, but they also have enzymatic capacity to degrade host glycans
derived from mucus, the glycoprotein structure secreted by goblet cells that covers the epithelial cells. For
example Akkermansia muciniphila possesses this characteristic (Derrien et al. 2004), reviewed by
Ouwerkerk et al. (2013); in this way, they contribute to a turn-over of this complex glycoprotein structure.
Review of the Literature
5
Figure 2. Schematic representation of the human digestive tract. Left side: concentration of microbes in cfu/ml. Right
side: human components shaping the ecosystem; body fluids are secreted to the digestive tract along the digestive
process; the intestine is lined by epithelial cells covered with a mucus layer produced by goblet cells, in yellow; various
immune cells are present near the epithelial cells, dendritic cell as representative (Wang et al. 2005; Lawley and
Walker, 2013; Schuijt et al. 2013)
While processing dietary and endogenous glycans, intestinal bacteria produce a variety of molecules that
contain energy and/or act as bioactive molecules. The short chain fatty acids (SCFA) represent one group of
microbial metabolites that is receiving considerable attention. Acetate, propionate, and butyrate are the major
SCFA being formed in the intestine (Cummings and Macfarlane, 1991). In particular, butyrate is of special
interest as it is both energy source and a signalling molecule for the epithelial cells. Several metabolic
pathways for forming butyrate have been described (Louis et al. 2010). Similarly, propionate can be
produced by different enzymatic routes (Reichardt et al. 2014). The main butyrate producers are members of
Ruminococcacea and Lachnospiraceae, the key species being Faecalibacterium prausnitzii and Roseburia
intestinalis, Anaerostipes caccae and Eubacterium hallii. Microbial networks exist for cross-feeding as
certain species depend on the presence of other microbes and their contribution of certain substrates (Scott et
al. 2013). For example Bifidobacterium ssp. produce lactate and acetate during growth on the polysaccharide
inulin. These metabolites are further utilized by other bacteria, e.g. E. hallii, to produce butyrate.
Review of the Literature
6
Similarly to the hydrolases for degrading glycans, intestinal microbes possess genes with coding
capacity to transform bile salts to a diverse range of metabolites. These metabolites have been shown to play
a role in modulating the host’s energy metabolism, reviewed by Li and Chiang (2015).
In addition, the GIT microbiota produces a set of proteinases. Microbial proteinases derived from
pathogens have been shown not only to digest proteins but also to have non-proteolytic properties such as
signalling to the host (Jarocki et al. 2014). However, as proteinases have been proposed to play a role in
inflammatory bowel disease (IBD), it is likely that also commensal proteinases can have other properties
than proteolytic ones (Steck et al. 2012).
Intestinal microbes also contribute to the micronutrient supply and synthesize vitamins, for example B
vitamins (LeBlanc et al. 2013). However, absorption by the host is not always proven. Thiamine has been
shown to be synthesized by the intestinal microbiota and also human transporters exist implying the
absorption by the host (Nabokina et al. 2014). Bacterial vitamin synthesis capacity seems to vary according
to geographic region and depend on food supply and age (Yatsunenko et al. 2012). This indicates that
vitamins of microbial origin contribute to the host’s vitamin supply. In contrast to vitamins, our GIT
microbiota for example depends on the trace element iron from the host’s diet, as reviewed by Kortman et al.
(2014).
Changes in the metabolic activities of the intestinal microbiota possibly occur with old age. When the
hormone set-up, the levels of physical activity, and nutrition habits are changing, the microbiota changes as
well. In centenarians, it has been found that an increase in the levels of inflammation, so called
inflammaging, was associated with a reduction in the numbers of butyrate-producing F. prausnitzii and an
increase in the levels of potential pathogens (Biagi et al. 2010; Rampelli et al. 2013).
2.1.2 Variation and modulation in adulthood
The first component in a human’s life affecting the composition of an individual’s intestinal microbiota is the
personal gene make-up. Twin studies have revealed that host genetics to some extent explain the
composition of the intestinal microbiota (Zoetendal et al. 2001; Turnbaugh et al. 2009; Tims et al. 2012). In
addition, single nucleotide polymorphisms have been shown to affect the intestinal community structure; this
has been documented for the gene encoding fucosyl transferase (fut2) that determines the secretor-status of
the host (Wacklin et al. 2014). The secretor-status profoundly affects the intestinal ecosystem as only in
secretors the intestinal mucosa is decorated with fucose, a carbohydrate abundantly utilized by intestinal
bacteria. Secretors and non-secretors have been shown to exhibit compositional differences in their
microbiota. This provides one of the first examples with a known mechanism to explain how the host
genotype shapes the intestinal microbiota (Wacklin et al. 2014). However, these are still early days of
integrating the two extremely complex systems and trying to link variations in the gut microbiome to those in
the human genome.
Second, diet, as introduced above, is inevitably influencing the intestinal community. However, only a
few human studies have been conducted that describe the daily differences in diet and the different macro-
and micro-nutrient content. For example, in a cross-over study in which study participants consumed either a
control diet, a diet high in RS or high in NSPs, or a weight loss-inducing diet, it was found that changes had
occurred in microbial composition upon diet-switching (Walker et al. 2011). The effect of diet, attributable
to food components, on the human intestinal microbiota has been reviewed recently (Salonen and de Vos,
2014; Graf et al. 2015). The microbial metabolomics profiles in vegans and omnivores have been compared,
the effect of energy-content of diet on microbial composition has been studied, and metagenomic,
metatranscriptomic and phylogenetic composition changes in response to a meat-fat based diet analysed
(Jumpertz et al. 2011; Wu et al. 2014; David et al. 2014). In addition, differences in microbial composition
and metagenomes in different ethnic groups have been detected (Yatsunenko et al. 2012; Tyakht et al. 2013;
Schnorr et al. 2014; O’Keefe et al. 2015). These studies possibly reflect the different diets of the ethnic
groups but it should be noted that these studies also compare populations with profound differences in
Review of the Literature
7
hygiene level, general lifestyle, and host genetics. Flexibility in metabolic activities similar to that of the host
is to be expected, reviewed by van Ommen et al. (2014). Therefore, setting aside extreme diets, most dietary
interventions in healthy adults only lead to subtle changes in the microbiota (Salonen and de Vos, 2014). A
recent meta-analysis including three different intervention studies on obese patients on different dietary
interventions showed that some individuals’ microbiota may be responsive and others not upon dietary
changes (Korpela et al. 2014). In a study, where healthy individuals consumed a mixture of seven milk-
fermenting bacterial strains only subtle changes were found, as observed at the transcriptional level. The
microbial composition remained unchanged (Mahowald et al. 2009). According to the definition provided by
the World Health Organization (WHO), probiotics are "live micro-organisms which, when administered in
adequate amounts, confer a health benefit on the host". They encompass a wide range of different species and
have been widely used for example to modulate gut transit time (Dimidi et al. 2014). On microbial
composition in healthy individuals, only subtle effects, if any at all, have been observed in response to the
consumption of probiotics (Gerritsen et al. 2011).
Third, the health status of a person can impact on the intestinal microbiota. Many associations between
microbiota and disease have been reported. Moreover, indirect effects are well studied, such as by the usage
of antibiotics. Also direct impact is likely but often it is not known whether the changes in the microbiota are
causal or a consequence of the disease. Dramatic changes in an individual’s microbiota can occur upon
antibiotic treatment as reviewed recently (Panda et al. 2014). It has been proposed that if no resilience is
achieved, i.e. the changes to the intestinal microbial composition upon antibiotic usage are not balanced,
infections by Clostridium difficile can occur (Song et al. 2013). These infections can become life-threatening
in the case when C. difficile produces toxins in the intestine. Infections by C. difficile have been efficiently
treated by introducing the microbiota from a healthy donor (Bookstaver et al. 2013; van Nood et al. 2013).
This is evidence that the microbiota is causal both for the reoccurrence of the disease and its cure, one rare
example of causality in a highly complex system.
There is a large number of diseases that may involve the intestinal microbiota (de Vos and de Vos,
2012) ranging from intestinal diseases, such as IBD (Cantarel et al. 2011; Machiels et al. 2014) and irritable
bowel syndrome (IBS) (Kassinen et al. 2007), to systemic conditions or diseases like obesity (Everard and
Cani, 2013), non-alcoholic fatty liver disease, and diabetes type II (Vrieze et al. 2012), and even neuronal
disorders such as Parkinson’s disease (Scheperjans et al. 2014). In most cases, differences in composition
have been detected but mechanistic explanations are missing. In addition, it needs to be demonstrated
whether these associations are a consequence or cause of the disease. Obesity and related metabolic
aberrations represent one of the most actively studied diseases in relation to the intestinal microbiota. Studies
addressing obesity-related phylogenetic profiles have yielded controversy results (Finucane et al. 2014).
Those might be due to the applied methodology but also to the study cohort as the severity of obesity
differed in the compared studies (Walters et al. 2014). On the other hand, several mechanisms that link gut
microbiota to obesity and metabolic diseases have been proposed. These include increased energy harvest
from the diet via SCFA production, enhanced fat storage as well as promotion of low-grade inflammation
that contributes to development of insulin resistance (Moreno-Indias et al. 2014). This clearly shows that the
functionality of the microbiota has to be studied in addition to the composition in order to understand the
host-microbe cross-talk in obesity.
2.2 Methods to study the human intestinal microbiota
2.2.1 Faecal material as the study object
The easiest access to the human intestinal microbiota is faeces, which contains microbes and molecules that
reflect as closely as possible the activities in the colon (Cummings, 1975). Since faeces can be obtained in a
non-invasive manner, human studies targeting the intestinal microbiota have been mostly based on the
analysis of faeces. In contrast, very few studies have involved samples taken from the ileum, intestinal
Review of the Literature
8
mucosa (biopsies) or were conducted from other types of specimens such as luminal washes or biopsies
(Booijink et al. 2010; Li et al. 2011). These studies clearly show that each compartment has its distinct
microbiota with its own distinct functions.
The components of faeces, which originate from host, food and microbes, reflect the whole digestive
process. Faeces has been used as a cure for over thousand years and its composition has interested scientists
for more than 100 years (Joslin, 1901). The bacterial mass represents more than half of the faecal wet weight,
half of which were found to be intact microbial cells although many of these cells were not alive (Stephen
and Cummings, 1980; Ben-Amor et al. 2005). Research has moved from describing the composition of
faeces to using it for discriminating between different body conditions. Individual human proteins have been
studied in faeces and used as disease markers e.g. calprotectin, a calcium binding protein derived from
human neutrophils, is utilized as an inflammatory marker in inflammatory bowel disease (Foell et al. 2009).
Elevated levels of calprotectin were also measured in a cohort of obese individuals indicating an inflamed
state of the intestinal ecosystem in individuals with a high BMI (Verdam et al. 2013).
The liquid phase of faeces, termed faecal water, has been widely used to test for toxic compounds, as
reviewed by Pearson et al. (2009). Recently, faecal water has also been used to study the net outcome of host
and microbiota metabolism and turn-over by metabolomics, both by nuclear magnetic resonance (NMR) and
mass spectrometry (MS) techniques (Gao et al. 2009; Wu et al. 2010; O’Keefe et al. 2015). Since the
metabolites present in faeces do not reflect the proportion of absorbed molecules and metabolites, it may be
complemented with blood analysis to measure systemic metabolites. For example, the levels of plasma lipids
have been determined and correlated to the intestinal microbial composition (Lahti et al. 2013a). Recently,
several studies assessing the metabolites in faeces appeared. For example differences in microbial
metabolites between chocolate-eaters and non-chocolate eaters were detected; another study described the
effect of moderate wine consumption (Rezzi et al. 2007; Dewulf et al. 2013; Jiménez-Girón et al. 2014).
2.2.2 Phylogenetic analysis
The intestinal microbiota is phylogenetically rich, spreading over the three domains of life Bacteria, Archaea
and Eukaryotes. There is a constant up-dating of phylogenetic grouping and classification (Rajilić-Stojanović
and de Vos, 2014). Fungi and Archaea are more genetically deeply branched but also less researched than
Bacteria, which represent the majority of intestinal microbes. However, Archaea, like Methanobrevibacter
smithii and Methanosphaera stadtmanae, are of special interest due to their role in hydrogen disposal and in
the production of methane that regulates the fermentation rate of some species (Gaci et al. 2014). Viruses add
an additional “phylogenetic” complexity to the intestinal ecosystem. Similarly to the mycome, the virome
has just started to be studied, but it should not be neglected when aiming to understand the entire intestinal
ecosystem (Norman et al. 2014).
The traditional way of identifying the phylogeny of a microorganism has been its isolation by culturing
and studying its structural, physiological and biochemical characteristics such as cell-wall structure with
Gram-staining and substrate preferences via feeding tests. There is the assumption that the majority of the
world’s microbial biodiversity has not been cultured yet (Rappé and Giovannoni, 2003). At present, 1,057
intestinal microbial species have been cultured; 957 Bacteria, 92 Fungi, and 8 Archaea (Rajilić-Stojanović
and de Vos, 2014). These represent less than half of the almost 2,500 known OTUs. Many of the intestinal
microbes are strict anaerobes or are otherwise very demanding regarding their preferred culture and growth
conditions. Moreover, some microbes may be obligate syntrophs and cannot be grown as pure cultures, or
may grow very slowly demanding a patient researcher and long term funding.
Molecular tools have been instrumental in exploring the microbial species variety. One specific gene has
been established to categorize the phylogeny of an organism i.e. the gene encoding the 16S rRNA, an
essential part of the prokaryotic ribosome. The 16S rRNA is an approximately 1,500 nt molecule, consisting
of conserved gene regions and nine hypervariable parts. These characteristics make it an ideal phylogenetic
marker and its sequence reflects evolutionary distance. The conserved regions are used to design nucleic acid
Review of the Literature
9
primers for amplification of a broad range of species whereas the hypervariable part can be used for species
identification or at least clustering it to its nearest neighbour.
Quantitative polymerase chain reaction (qPCR) assays have been used to quantify certain microbial
genera or species (Rinttilä et al. 2004). For community-wide microbiota studies 16S rRNA gene-based
microarrays have been developed, such as the human intestinal tract chip HITChip (Rajilić-Stojanović et al.
2009). Microbial fingerprints have been used to compare the stability of the microbiota over time and
relative differences of individual taxa between individuals and study groups. For example, phylogenetic
fingerprints have shown clustering according to an individual’s health status such as the presence of IBD and
obesity (Erickson et al. 2012; Verdam et al. 2013) and also served as predictors of the effect of diet (Korpela
et al. 2014).
As sequencing technologies have advanced and costs are sinking, phylogenetic analysis has reached a
next level of sophistication. Instead of analysing a predefined set of phylotypes, the full diversity of specific
microbiota can be explored. Next generation sequencing (NGS) techniques, sequencing techniques that
followed the Sanger sequencing protocol and outperform it in terms of throughput and costs, made the global
analysis of 16S rRNA possible. Sequencing platforms and strategies for gene assembly are reviewed in
(Nagarajan and Pop, 2013). Accumulation of 16S rRNA sequences from various environments necessitate
selection of publically available 16S rRNA for an annotation of the sequence reads as specific as possible.
This will ease to find differences between individuals as those are more likely to occur at the species or even
strain level rather than at the genus or higher levels. Such a database has been created recently for the
intestine containing 2,473 OTUs, as indicated above, compared to over 100,000 contained in the 16S rRNA
sequence repositories SILVA and Greengenes (Ritari et al. 2015). Accumulation of genomic and
metagenomic data has supported the development of a software tool called Picrust that infers functions from
the phylogenetic composition based on 16S rRNA gene information (Segata et al. 2012).
Although technical developments have greatly facilitated phylogenetic analysis, one main factor still
remains to complicate between-study comparisons i.e. DNA extraction. Large differences in the abundance
of the major phyla have been found across studies that cannot be explained only by the sample origin
(Lozupone et al. 2013). For simultaneous lysis of a diverse mixture of cells ranging from Gram-positive
Firmicutes to Gram-negative Bacteroidetes, mechanical lysis by bead beating has been shown to be the less
biased method (Salonen et al. 2010). In addition to DNA-extraction, sequencing specific biases such as the
choice of primers, especially which variable region they are targeting, and the analytical platform used,
affect the result of phylogenetic composition analyses (Lozupone et al. 2013).
2.2.3 Metagenomics
A number of technological developments including advanced PCR (Williams et al. 2006), next generation
sequencing technologies, and new gene assembly strategies (see above) have made it possible to target the
complete genetic content of microbial communities. Several studies have taken advantage of metagenomics
to examine the genetic variety of the intestinal microbiota (Table 2). Sophisticated data processing strategies
are required if one wishes to efficiently analyse the complex data generated. Hence, the recent years have
seen the construction of several pipelines for processing raw sequence reads up to gene annotation, such as
Medusa (Karlsson et al. 2014), Metaphlan (Segata et al. 2012), MEGAN (Huson and Weber, 2013), as well
as metaProx that has been designed for faster annotation (Vey and Charles, 2014).
In the late 2000s, two large projects focusing on the human microbiota were launched: the European
MetaHIT with about 20 million Euro funding from the European Union focusing on the intestinal microbiota
and the U.S. Human Microbiome Project HMP funded with over 100 million US $ from the National
Institutes of Health, targeting microbiota at various body sites. Following the initial discovery of a baseline
of 3 M genes, at the end of the MetaHit project, a total of 511 Europeans and 368 Chinese individuals had
had their intestinal microbiota sequenced (Qin et al. 2010; Li et al. 2014). The HMP has sequenced over 500
human intestinal genomes as a reference data set to facilitate the assignment of metagenomic reads (Human
Review of the Literature
10
Microbiome Jumpstart Reference Strains Consortium et al. 2010). In addition, the microbiota from all body
cavities of more than 100 subjects has been determined (Morgan et al. 2012). In both projects also data
analysis tools were developed and the data has been made available to the scientific community (MetaHIT:
http://meta.genomics.cn; HMP: hmpccad.com). The HMP has continued as integrated HMP (iHMP) which
will focus more on the functionalities of the microbiota (Integrative HMP (iHMP) Research Network
Consortium, 2014).
Having started from sequencing clone libraries based on Sanger technology (Gill et al. 2006), the
technological advances have been reflected in the rapid increase of the number of samples analysed in a
single study, constantly increasing the number of sequenced nucleotides and the lengths of the reads while
reducing the amount of time and money needed. Therefore, the number of analysed human metagenomes
went up from two in 2006 to over 1,000 in 2014. Most studies have focused on identifying the genetic
richness and coding capacity of the human microbiome with the focus on adults and European and U.S.
inhabitants (Table 2). Mostly, the sequencing platforms HiSeq (Illumina) and 454 (Roche) have been used in
the various projects. The fact that the next generation sequencing technologies is developing very rapidly is
illustrated by the fact that less than ten years after its launch, the 454 sequencing platform is no longer
supported by its manufacturer.
The intestinal metagenomic sequencing efforts were recently combined into two separate gene atlases
containing close to and over 10 Million microbial genes (Li et al. 2014; Karlsson et al. 2014; Table 2). Based
on the data of both atlases, it was found that a core microbiome of close to 300,000 genes was shared by 50%
of the study population. In the dataset collected by Karlsson and co-authors (2014), 1.3 million genes were
reportedly shared by at least 20% of the population. The functional richness was assessed in the dataset
collected by Li and co-authors (2014) and a total of 6,980 KEGG orthologous groups (KOs) (42.1% of
genes) and 36,489 eggNOG orthologous groups (60.4% of genes) were identified (gene orthology databases
like KEGG and eggNOG are detailed in 2.2.5). In the atlas reported by Karlsson and co-authors (2014) it was
found that the core genes, i.e. genes found in at least 50% of the subjects, derived mainly from Bacteroides
and Clostridium spp., which was to be expected as the majority of the intestinal microbiota belongs to the
phyla Bacteroidetes and Firmicutes. More than 67% of the genes were without KO or known function but
when looking at core genes, this number was reduced to 55% - this could indicate a bias in the annotation
level of the shared house-keeping functions or just reflect the fact that more attention has been given to
annotate the genomes of core bacteria. The fact that only half of the genes have been annotated indicates
that, to a large extent, the functional individuality still has to be discovered. The explanation why individuals
are either more or less prone to develop a certain disease or react in a specific way to a food component may
lie in these unknown functionalities.
In addition to metagenomes, sequencing of single microbial genomes is relevant for genome annotation
and protein identification (see below). During recent years, there has also been a tremendous effort expended
in these analyses. In 2007, there were only 306 bacterial genomes with 28 originating from the digestive
system from Homo sapiens at the IMG/JGI system. The most recent release (04.02.2015) contained 23,908
bacterial genomes, out of which 1,847 had Homo sapiens as the host and in these, 398 digestive system as
origin (1,034 genomes unclassified). The enormous increase of the number of sequenced bacterial genomes
during the last years has been reviewed recently (Land et al. 2015).
Review of the Literature
11
Ta
ble
2.
Ov
erv
iew
of
the
mo
st s
ign
ific
ant
hu
man
in
test
inal
16
S r
RN
A a
nd
met
agen
om
ic s
tud
ies.
Da
ta r
eso
urc
e
NA
Eu
rop
ean
Nu
cleo
tid
e
Arc
hiv
e
PR
JNA
28
11
7
DD
BJ/
EM
BL
/
Gen
Ban
k A
N
32
089
;
Gen
Ban
k A
N
FJ3
62
60
4–
FJ3
72
38
2
htt
p:/
/gu
tmet
a.
gen
om
ics.
org
.cn
/
NC
BI
AN
SR
A0
45
64
6 a
nd
SR
A0
50
23
0
Ma
in f
ind
ing
s
En
rich
ed m
etab
oli
sm o
f g
lyca
ns,
am
ino
aci
ds,
xen
ob
ioti
cs;
24
07
un
iqu
e cl
ust
er o
f o
rtho
log
ou
s g
rou
ps
(CO
G)
Fu
nct
ion
al d
iffe
ren
ces
bet
wee
n i
nfa
nts
and
adu
lts
and
bet
wee
n h
um
an i
nte
stin
al m
icro
bio
ta
and
oth
er m
icro
bio
ta
Ov
erre
pre
sen
tati
on
: ca
rboh
ydra
te m
etab
oli
sm;
un
der
rep
rese
nta
tio
n:
lip
id t
ransp
ort
&
met
abo
lism
Co
re m
etag
eno
me;
dif
fere
nce
s in
ob
esit
y
3,3
00,0
00
Mio
un
iqu
e g
enes
Up
dat
e o
f Q
in e
t al
. 2
01
0 g
ene
atla
s to
4,2
67,9
85
gen
es
Seq
uen
cin
g m
eth
od
,
dim
ensi
on
of
gen
etic
info
rma
tio
n
San
ger
seq
uen
cin
g
Bac
teri
al 1
6S
rR
NA
gen
e an
d
dis
tal
gu
t m
etag
eno
me
50
,164
pre
dic
ted
op
en r
ead
ing
fram
es;
40
% u
n-a
ssem
ble
d
San
ger
seq
uen
cin
g
1,0
57,4
81
of
met
agen
om
ic
seq
uen
ce r
ead
s
San
ger
seq
uen
cin
g,
NG
S
1,9
37,4
61
par
tial
bac
teri
al 1
6S
rRN
A g
ene
seq
uen
ces
2.1
4 G
b o
f m
etag
enom
ic
seq
uen
ce r
ead
s
NG
S
57
6.7
Gb
of
met
agen
om
ic
seq
uen
ce r
ead
s
NG
S
37
8.4
Gb
of
met
agen
om
ic
seq
uen
ce r
ead
s
Stu
dy
co
ho
rt
Co
un
try
of
sam
ple
co
llec
tio
n
Sa
mp
le n
um
ber
Gen
der
, a
ge,
BM
I, d
iet
(wh
ere
ava
ila
ble
)
U.S
.
2
1 M
ale
age
37
, on
e fe
mal
e ag
e 2
8
on
e o
n u
nre
stri
cted
die
t, o
ne
veg
etar
ian
Jap
an
13
7 m
ale,
5 f
emal
e
3-7
mo
nth
s: 4
; >
1<
3 y
ears
: 2
; 2
4-4
5 y
ears
:
7
U.S
.
15
4
31
mon
ozy
go
tic
and
23 d
izy
go
tic
twin
pai
rs (
age
21
-32 y
ears
) an
d 4
6 m
oth
ers
Eu
rop
ean
or
Afr
ican
an
cest
ry
con
cord
ant
for
ob
esit
y o
r le
ann
ess,
lean
/ov
erw
eig
ht
or
ov
erw
eig
ht/
ob
ese
two
sam
pli
ng
po
ints
wit
h 5
7±
4 d
ays
inte
rval
s
Den
mar
k,
Sp
ain
12
4 (
56
% f
emal
e, 4
4%
mal
e)
Av
erag
e ag
e 5
6 y
ears
(ra
ng
e 18
– 6
9)
Av
erag
e B
MI
27
kg
/m2 (
17
.9 –
40
.2)
Ch
ina
36
8 (
dia
bet
ic p
atie
nts
an
d c
on
tro
ls)
43
% f
emal
e, 5
7%
mal
e
Av
erag
e ag
e 4
8 y
ears
(ra
ng
e 13
- 8
6)
Ref
eren
ce
Gil
l, 2
00
6
Ku
rok
awa,
20
07
Tu
rnb
aug
h,
20
09
Qin
, 2
010
a, b
Qin
, 2
012
a, b
Review of the Literature
12
htt
p:/
/ww
w.h
mp
dac
c.o
rg
/HM
GC
/
MG
-RA
ST
ser
ver
Illu
min
a 1
6S
rR
NA
AN
“qii
me:
85
0”;
mic
rob
iom
e sh
otg
un
seq
uen
ces
AN
“qii
me:
62
1”
Seq
uen
ce R
ead
Arc
hiv
e
ER
P0
02
46
9
Eu
rop
ean
Nu
cleo
tid
e
Arc
hiv
e
ER
P0
03
61
2
ER
P0
02
06
1
AN
: ac
cess
ion
nu
mb
er,
NG
S:
nex
t g
ener
atio
n s
equ
enci
ng
, H
MP
: hu
man
mic
rob
iom
e p
roje
ct
a In
teg
rate
d i
n t
he
gen
e co
llec
tio
n b
y L
i et
al.
201
4;
12
68
met
agen
om
es a
t E
uro
pea
n N
ucl
eoti
deA
rch
ive
PR
JEB
52
24
bIn
teg
rate
d i
n t
he
gen
e co
llec
tion
by
Kar
lsso
n e
t al
. 2
01
4;
78
2 m
etag
eno
mes
; 40
bil
lio
n m
etag
eno
mic
rea
ds
fro
m s
equ
enci
ng o
n I
llu
min
a p
latf
orm
;
htt
p:/
/web
.stu
den
t.ch
alm
ers.
se/g
rou
ps/
sysb
io-m
mg
/ME
DU
SA
/
Mic
rob
iom
es a
t 1
8 f
emal
e an
d 1
5 m
ale
bo
dy
sit
es;
Dis
trib
uti
on
of
pat
hw
ays
acro
ss
sub
ject
s m
ore
ho
mo
gen
eou
s th
an
dis
trib
uti
on
of
tax
a;
Fu
nct
ion
alit
y o
f m
ajo
r ce
llu
lar
pro
cess
es a
pp
eare
d t
o b
e si
mil
ar a
cro
ss
ind
ivid
ual
s
11
0 m
etag
eno
mes
;
Ag
e an
d g
eog
rap
hic
ori
gin
dep
end
ent
dif
fere
nce
s;
In a
ll a
ge-
gro
up
s p
ron
oun
ced
dif
fere
nce
s b
etw
een U
.S.
and
Am
erin
dia
ns
Dia
bet
ic a
t-ri
sk i
nd
ivid
ual
s p
red
icti
on
by
mic
rob
iom
e an
aly
sis
Av
erag
e 57
8,5
12
gen
es/m
icro
bio
me
(ran
ge:
59
,14
7–8
78
,81
6)
Lo
w a
nd
hig
h r
ich
div
ersi
ty
Gen
om
e as
sem
bly
fro
m m
etag
eno
mic
dat
a w
ith
ou
t u
se o
f re
fere
nce
gen
om
es
NG
S
16
rR
NA
gen
e se
qu
enci
ng
and
m
etag
eno
mic
s fo
r a
sub
set
of
sam
ple
s
NG
S
1,0
93,7
40
,274
met
agen
om
ic s
equ
ence
read
s
NG
S
45
3 G
b o
f m
etag
enom
ic
seq
uen
ce
read
s
NG
S
10
,067
,99
6,4
12
met
agen
om
ic s
equ
ence
read
s
NG
S
4.5
Gb
met
agen
om
ic
seq
uen
ce r
ead
s p
er s
amp
le
U.S
.
24
2
mea
n a
ge
of
27
±5
(ra
ng
e 1
8-4
0 y
ears
)
mea
n B
MI
24
± 4
U.S
.
Am
azo
nas
of
Ven
ezu
ela,
rura
l M
alaw
ian
com
mu
nit
ies,
US
A m
etro
po
lita
n a
reas
53
1
11
5 i
nd
ivid
ual
s (3
4 f
amil
ies)
fro
m M
alaw
i; 1
00
ind
ivid
ual
s (1
9 f
amil
ies)
fro
m V
enez
uel
a; a
nd
31
6 i
nd
ivid
ual
s (9
8 f
amil
ies)
fro
m t
he
US
A
32
6 i
nd
ivid
ual
s ag
ed 0
–1
7 y
ears
(8
3 M
alaw
ian
,
65
Am
erin
dia
n a
nd
178
res
iden
ts o
f th
e U
SA
)
plu
s 2
02
ad
ult
s ag
ed 1
8–7
0 y
ears
Eu
rop
e
14
5
70
yea
rs o
ld f
emal
es
no
rmal
, im
pai
red
or
dia
bet
ic g
luco
se c
on
tro
l
Den
mar
k
29
2 (
53
% f
emal
e, 4
7%
mal
e)
Av
erag
e ag
e o
f 5
6 y
ears
(ra
nge
50
- 6
1)
BM
I 1
8.1
– 4
6.6
kg
/m2
33
% l
ean
, 9
% o
ver
wei
gh
t, 5
8%
ob
ese
Den
mar
k,
Sp
ain
31
8 (
incl
ud
ing
sam
ple
s fr
om
Qin
et
al.
20
10
)
Incl
ud
ing
UC
an
d C
D p
atie
nts
Ta
ble
2 c
on
tin
ued
HM
P c
on
sort
ium
,
20
12
a, b
Yat
sun
enk
o, 2
012
Kar
lsso
n,
20
13
b
Le
Ch
atel
ier,
20
13
a
Nie
lsen
20
14
a
Review of the Literature
13
2.2.4 Metatranscriptomics
One step down the cellular machinery to gene expression is the transcription into mRNA. However, since
transcripts of bacteria and archaea have a short half-life, the design of the experimental approach and the
preparation of the samples should be appropriate. In babies, where the metabolic turn-over is high and the
intestinal transit usually very fast, transcripts may provide crucial information about intestinal activity. In
adults mucosal or ileal samples are very appropriate for mRNA analysis and have provided essential insight
into the metabolic conversions occurring in the upper intestinal tract where the microbes compete with the
hosts for carbohydrates and form various trophic chains (Zoetendal et al. 2012; Zoetendal and de Vos, 2014).
Moreover, this approach has revealed the functional activity of ingested Lactobacillus strains (Marco et al.
2010). In contrast, faecal samples from adults have been studied but the functional information which can be
obtained from these specimens may be limited to the stable messengers (Booijink et al. 2010). It has been
described that the digestive process in the human intestinal tract can last for up to six days (Cummings et al.
1978).
Since the mRNA should be processed immediately, transcriptional analysis of faecal samples may be
not become routine practice. Similar to 16S rRNA analysis, transcriptome analysis has moved from mere
profiling and analysing a set of pre-defined mRNAs on a microarray to targeting each microbiota
individually by sequencing mRNAs in an approach called RNAseq (Hampton-Marcell et al. 2013). Although
faecal metatranscriptome data has to be interpreted with caution due to the above mentioned technical
challenges, the present studies have provided evidence of the differences between potential function and
regulated functions in vivo. In a recent study conducted in healthy men, the metatranscriptomes differed
more than the metagenomes, reflecting the importance of functional studies as gene expression depends on
many different factors (Franzosa et al. 2014). When individuals were challenged with a meat-fat based diet,
samples clustered with KEGG assigned RNAseq data according to the diet but the taxonomic clustering
remained subject-specific implying a strong effect of diet on microbial transcriptional activity (David et al.
2014).
2.2.5 Metaproteomics
Proteins are the real key-players of any physiological process. Among others, they can process substrates
providing metabolic energy, have signalling and control functions within and outside the cells, have
structural functions, and make a cell mobile. It is important to address the actual function out of the many
possible ones encoded by the microbiome in order to obtain a more profound understanding of the activities
of the intestinal ecosystem and the host-microbe interactions. Proteins are the ultimate targets for this as they
determine the function of a cell. Moreover, protein synthesis is an energy-demanding process. For example
in E. coli it was found that two thirds of the total energy used for growth was spent on protein biosynthesis
(Neijssel and Mattos, 1994; Jewett et al. 2009). Hence, proteins are proxies of activity and since they are
rather stable (Taverna and Goldstein, 2002) their analyses both represent information of on-going activities
and those having taken place in the recent past. The term proteome describes the unity of proteins expressed
at a certain time under a certain condition in an organelle or organism. This has to be expanded in the context
of faecal metaproteomics as first reported by Klaassens and co-authors (2007) when they conducted a study
in human babies. As detailed above, faecal material may contain proteins, or their fragments, derived from
all steps of digestion. Therefore, faecal metaproteomics aims at analysing proteins expressed over a period of
time. The proteins present in saliva (Harden et al. 2012), gastric fluid (Kam et al. 2011), bile acid (Barbhuiya
et al. 2011), pancreas (Schrimpe-Rutledge et al. 2012; Paulo et al. 2013), and amniotic fluid (Michaels et al.
2007) have been characterized.
Large-scale analysis of proteins, i.e. measuring hundreds to thousands of different proteins in one run
has become possible due to the progress in two techniques i.e. liquid chromatography (LC), enabling the
separation of several thousands of peptides based on their differences in hydrophobicity within a few hours,
Review of the Literature
14
and mass spectrometry, making accurate mass measurement possible. Metaproteomics has advanced from
the two dimension gel-electrophoresis (2-DE) approach followed by matrix assisted laser
desorption/ionization time-of flight (MALDI-ToF) MS analysis to exploit mainly the 1-DE approach
followed by LC tandem mass spectrometry (MS/MS), called GeLC-MS/MS. Instead of separating proteins
according to their isoelectric point and molecular weight followed by the analysis of a limited set of proteins
(2-DE), complex protein mixtures are fractionated by their molecular weight only (1-DE). The GeLC-
MS/MS approach allows the analysis of a large mixture of peptides which are analysed in a single step by
nano-high-performance LC-MS/MS. Reverse phase (RP) columns consisting of C18 molecules are used and
a gradient increasing in hydrophobicity is applied to supply the mass spectrometer for peptide measurements.
The various aspects of faecal metaproteomics have been reviewed recently (Study V of this thesis); the
salient features of this discipline are summarized below (Figure 3).
Figure 3. Overview of a metaproteomics workflow with proposed specifications based on this thesis work.
Sample preparation
Prior to LC-MS/MS analysis, the proteins need to be liberated from the sample matrix. As in all meta-omics
techniques, a large collection of organisms can be the molecule-source for these proteins. Similarly, the
proteins may be extracellular or intracellular. The available options for metaproteomics include the analysis
of faecal water, chemical or mechanical lysis of either microbial cells, or all of the cells present in faeces. All
of these options have found application in faecal metaproteomics but a systematic comparison such as for
other metaproteomic samples like soil and activated sludge has yet to come (Keiblinger et al. 2012; Hansen
et al. 2014). Due to limitations of the nano-HPLC and the mass spectrometers, complex protein mixtures
need to be fractionated. This can be achieved by fractionation with 1-DE, followed by dividing the gel into
several fractions that are then separately analysed by LC-MS/MS. Peptide mixtures can also be separated by
strong cation exchange (SCX) chromatography and the resulting fractions analysed by reverse phase LC-MS.
Another possibility is to inject the complex mixture into the LC and run a shallow gradient for a long period
of time (see below).
Review of the Literature
15
LC-MS/MS
Proteins are relatively stable molecules (see above) and this eases their analysis. To increase throughput, the
proteins are digested into peptides prior to being subjected to MS. In order to reach the mass spectrometer
after elution from the chromatographic columns, the peptides need to be charged by ionization. The mass
analyser sorts the peptide ions according to their mass to charge (m/z) ratio and a detector measures their
intensities. Modern mass spectrometers are usually hybrids containing two mass analysers; one high accurate
mass analyser measuring the mass of intact peptide ions i.e. parent ions and, for easing down-stream peptide
identification, one less accurate but fast mass analyser for fragmenting peptide ions and analysing the masses
of the fragment ions. Several solutions are available for each component of a mass spectrometer, i.e.
ionization source, mass analyser, and detector. These are compatible with each other and many different set-
ups exist (Schneider and Riedel, 2010; Graham et al. 2011). There are also different methods available for
fragmenting peptide ions (Guthals and Bandeira, 2012) resulting into different ion types. Typically, in
metaproteomics studies, the peptides are ionized after nano-LC by electro-spray ionization and their m/z
value analysed by hybrid mass spectrometers. The fragmentation is carried out by collision-induced
dissociation leading to the production of mostly b- and y-ions, a fragmentation pattern which is appropriate
for peptide-identification. Since its invention in the late 1980s, mass spectrometers have undergone rapid
developments, both in terms of accuracy of the m/z determination and of speed of molecules analysed within
a certain time window. In this thesis, this is illustrated by the fact that with the mass spectrometer used in
Study IV five times more fragments were measured in the same time window (duty cycle) compared to
Study I. These developments allow the analysis of complex protein mixtures, such as those derived from
environmental systems, including the human intestinal tract. The invention of nano-HPLC allows the
injection of few l and therefore sample availability is mostly not the bottle neck. This is a clear advantage to
2-DE.
Despite the fact that only a limited number of human intestinal metaproteomic studies have been
reported, various options for sample preparation and data analysis have been described, reflecting the
multitude of options the proteomics tool box is offering. Least variation is seen in the LC-MS/MS setup
reflecting the 1D gel/gel-free approaches and nano-LC plus Orbitrap as state of the art. In the faecal
metaproteomics approaches, three major pipeline solutions have been applied: 2-DE followed by LC-
MS/MS, 1-DE followed by LC/MSMS, and SCX chromatography coupled to LC-MS/MS (SCX RP LC-
MS/MS) (Table 3). The sample complexity of environmental samples is generally higher than in classic
proteomics studies where the proteins originate from one organism and are usually present in a homogenous
sample matrix. In addition, one dogma for proteomics i.e. that all (or at least the majority) of expected
proteins is contained in the sequence database (Nesvizhskii, 2014), is hardly fulfilled in metaproteomics. The
discrepancy between the large interest in metaproteomics and its analytical challenge is expressed by the fact
that the number of review articles exceeds the number of experimental articles. An overview of the human
intestinal metaproteomic studies highlighting the differences in analytical approaches, used databases and
outcome in protein numbers and identified functionalities has been recently published (Study V of this
thesis). Since its publication, three additional studies have been reported which are focusing on method
development and those are detailed in the Results and Discussion section of this thesis (Lichtman et al. 2013;
Xiong et al. 2014; Tanca et al. 2015). One additional study described proteomic differences between healthy
controls and patients with Crohn’s Disease (CD) based on 2-DE profiles of faecal proteins (Juste et al. 2014).
Of note, the 16S rRNA analysis did not reveal any grouping according to the health status among the studied
samples, highlighting the added value of proteome compared to conventional microbiota analysis based on
phylogeny.
Review of the Literature
16
Table 3. Features of pre-fractionation techniques applied in human intestinal metaproteomics.
Feature 2-DE 1-DE SCX
Required protein amount > 10 µg 1-10 µg 0.01 -0.001 µg
Throughput sample
preparation
Low
Medium
High
Throughput LC-MS/MS Low Medium Low
Identification efficiency 10 – 100 proteins from one
sample
>500 proteins from one
sample
>500 proteins from one
sample
2-DE: two dimensional gel electrophoresis; 1-DE one dimension gel electrophoresis; SCX: strong cation exchange
Peptide identification
There are three main routes for inferring a peptide sequence to an m/z value. Those are peptide spectral
matching (PSM), “de novo” sequencing, and the use of spectral libraries (Lam et al. 2008; Allmer, 2011;
Vaudel et al. 2012). Most frequently applied is PSM which relies on matching parent and fragment ions
against an in-silico digest of a protein sequence database. Many different algorithms have been developed for
performing this task; these are available via open source tools or commercial software (Shteynberg et al.
2013). The choice of the protein sequence database is a very crucial one due to the complexity of the
analysed samples in faecal metaproteomics. Hence, this is an essential step in the whole metaproteomic
pipeline which is elaborated in the Results and Discussion part of this thesis. De novo sequencing i.e. direct
inference of a peptide sequence from the m/z value is not yet routinely used in metaproteomics (Cantarel et
al. 2011; Muth et al. 2015) as statistical methods to control for false positive identifications need to be
developed. Spectral libraries, i.e. high quality reference m/z repositories, do not yet exist for metaproteomics.
Since PSM is a probability-based approach, the number of false positive identifications has to be
controlled and kept on a low level (Jeong et al. 2012). One common method is to use the results from PSM
against a decoy database, which for example can be the sequences from the target database in reverse or a
shuffling of the amino acids (Yadav et al. 2012). This approach has been introduced for classical proteomics
and it has been shown that there is still improvement regarding sensitivity and accuracy of PSM in
metaproteomics (Muth et al. 2015). False discovery rate estimations at the protein level are impaired by the
problem of assigning peptides to proteins in metaproteomics. As the same peptide can be part of proteins
originating from various species, phyla or even domains of life, protein inference is put to its extreme in
metaproteomics. Therefore, analytical strategies are required to increase the statistical validity of PSM in
metaproteomics.
Specific post-translational modifications (PTMs) have not been considered in any human intestinal
metaproteome study, as the complexity is already extremely high. A large variety of bacterial PTMs is
known and includes phosphorylation, methylation and glycosylation, reviewed by Cain et al. (2014). In
general, targeted proteomics approaches are required to identify for example phosphopeptides, reviewed by
Huang et al. (2014). A proteomics approach has shown that some of their very abundant glycolytic enzymes
of Lactobacillus rhamnosus GG are phosphorylated (Koponen et al. 2012). A large variety of PTMs are to be
expected to be identified from intestinal metaproteomes.
Quantification
In addition to identify the different proteins in the metaproteome, quantifying their amounts is required for
sample comparison. Absolute quantification cannot be achieved in high-throughput proteomics but basic
approximations for assessing protein quantity exist and are very useful. Two major approaches can be taken:
labels can be introduced to compare between different samples based on the ratio of the differently labelled
peptides, but also label-free quantification has been made possible during the last decade due to technical and
software improvements (Seifert et al. 2012; Otto et al. 2014; Arsène-Ploetze et al. 2014). The latter approach
has been mostly applied in faecal metaproteomics. Label-free analysis is either based on the chromatographic
Review of the Literature
17
peaks of MS1 data, i.e. is the unity of all measured peptide ions, to compare peak intensities across
experiments. This requires sophisticated software to adjust for differences in chromatography and peak
intensities (aligning and normalization). Another, more simplistic but effective approach is counting the
spectral hits for a certain unit (that may be proteins, peptides, taxonomic or functional unit). Various
normalization methods have been described, reviewed by Neilson et al. (2011).
During the last decade, selected reaction monitoring (SRM) methods, also called multiple reaction
monitoring (MRM) methods, have been developed to quantify targeted proteins, reviewed by Liebler and
Zimmerman (2013). Proof-of-principle studies identify proteins of interest and then a method is established
to quantify these proteins. In a study on the metaproteome of the mucosal samples of CD patients a subset of
proteins that were found to be significantly differently expressed in the 2-DE experiments, were targeted and
verified with SRM/MRM (Juste et al. 2014). All of the SRM/MRM experiments of the thirteen proteins
verified the earlier 2-DE results. Recently, SRM methods for 10,000 human proteins have been published
(Rosenberger et al. 2014). In future, such methods may also be developed in large scale to target specific
microbial proteins derived from the human intestine.
In general, the evaluation of an approach is crucial for developing a method for multi-sample
comparisons, especially when a multitude of options is available for each single analytical step. Already for
the 1D separation and consequential protein in-gel digestion, a variety of choices have to be made and these
will influence the subsequent results (Speicher et al. 2000; Glatter et al. 2012). Hence, the evaluation of a
metaproteomic pipeline needs to be carefully planned and executed.
The technical development in MS-based proteomics poses a challenge for data analysis and data storage
(data repositories) as well as handling the constantly increasing size of raw files. A file containing the
MS/MS spectra of a single sample or gel slice can be as large as 0.5 GB. Improvements have been achieved
by standardisation of files for across-pipeline usage and public data repositories to enable reanalysis with
constantly improving data analysis tools. For example, analysis solutions have been developed, with cloud
computing to maximize compute resources (Muth et al. 2013). To handle these big data, software packages
for data visualization are required; these have been reviewed recently (Oveland et al. 2015).
Proteomics applied in microbiology
Proteomics has been widely used to study microbial activity but there are only very few studies on
commensal intestinal microbes, in-vitro as well as in animal model (Schneider and Riedel, 2010; Otto et al.
2014). Performing a literature search on PubMed on 15/02/23 with the term “proteomics” in combination
with either “pathogen”, “probiotic”, or “commensal” resulted in 1657, 81 and 55 hits, respectively. For
example, the pathogen Staphylococcus aureus has been studied for its response to glucose-starvation using a
proteome approach (Michalik et al. 2012). A reference proteome was reported for L. rhamnosus GG which is
widely marketed as a probiotic and consumed world-wide, and this paradigm probiotic strain was analysed
by proteomics in several culture conditions also resembling the intestinal environment (Koskenniemi et al.
2009; Koskenniemi et al. 2011; Koponen et al. 2012). Recently, the same group of researchers analysed the
surface proteome of L. rhamnosus GG as this possibly contains the interacting molecules in the host upon
consumption and identified expected secreted proteins as well as a set of moonlighting proteins (Espino et al
2015). Other probiotic or model strains that have been studied by proteomics are L. plantarum WCFS1
(Cohen et al. 2006), Bifidobacterium longum subsp. infantis (Kim et al. 2013), Propionibacterium
freudenreichii (Le Maréchal 2015), and Weissella confusa (Lee et al. 2008). One example of a commensal of
which its proteins were studied is Enterococcus faecalis (Lindenstrauß et al. 2013). In addition to measuring
activity, proteomics has recently been used to identify microbial strains by MALDI in the case of
Bifidobacterium animalis (Ruiz-Moyano et al. 2012).
Similarly, the proteome of several human body compartments have been determined as detailed above,
but only rarely the data is also searched against bacterial sequence entries.
Review of the Literature
18
Inferring functions to proteins
Once a new genome is sequenced, the function of its genes or proteins is inferred from homology analysis
with previously annotated genes or proteins; in the best case scenario the functions of selected genes are
discovered by knock out, overproduction and phenotypic analysis. Due to the exponential growth of
sequenced genes and therefore hypothetical proteins, experimental proof of the functions of these genes is
lagging far behind. In the most recent release of the Universal Protein Resource Knowledgebase
(UniProtKB) (2015_02 of 04-Feb-2015), the curated part Swissprot database contained 547,599 entries
(http://web.expasy.org/docs/relnotes/relstat.html), whereas the non-curated part Trembl contained more than
100-fold more sequences, with 90,860,905 sequences (http://www.ebi.ac.uk/uniprot/TrEMBLstats/). It
appears that the Swissprot database has reached a plateau whereas in Trembl the number of sequences is
doubling approximately every year. Evidently, this generates the problem that annotation of new genes
mostly relies on non-curated sequences and therefore introduces biases.
There are several approaches linked to extensive databases that can be used to assign proteins to
functional groups (Table 4). They differ in terms of the number and type of genomes on which the
classification has been based on. In addition, the amount of levels for functional grouping varies and can be
shallow like for the Cluster of Orthologous Genes (COG) (Tatusov et al. 1997) or more branched like the
Kyoto Encyclopedia of Genes and Genomes orthology (KO) (Kanehisa et al. 2004). The Cluster of
Orthologous Genes was originally built for bacterial proteins but extended for human functions as well
(Koonin et al. 2004). However, there has been a break of more than ten years in updating this classifier as a
result of funding issues. Considering the tremendous increase in sequenced genomes, this clearly created
insufficient coverage but the recent 2014 update provides a reasonable starting position (Galperin et al.
2015). The database eggNOG built on the COG database but has a higher sequence coverage and therefore
provides a good candidate for the functional annotation of especially bacterial genomes (Jensen et al. 2008).
Recently, the Metaproteome Analyser (MPA), a pipeline which ranges from PSM to functional
annotation, was released (Muth et al. 2015). One sample can be analysed with a collection of up to four non-
commercial search algorithms within one analysis process. The interface allows representing the taxonomic
and functional results in overview graphs. However, for this feature uniprot identifiers are required to infer
taxonomy and function and cannot be used in all metaproteomic approaches yet, as many protein candidates
for metagenomic protein sequence databases do not contain uniprot identifiers.
Table 4. Overview of gene orthology databases.
Database Description Source
COG Cluster of orthologous
genes
(Galperin et al. 2015)
25 functional families
4,631 different COGs
711 full length eukaryotic, archaeal and
bacterial genomes
http://www.ncbi.nlm.nih.gov/COG/
eggNOG non-supervised
orthologous groups (Powell et
al. 2014)
25 functional families
1,700,000 orthologous groups
Full length genomes
3,686 organisms (eukaryotic, archaeal and
bacterial)
http://eggnog.embl.de
KEGG
(Kanehisa et al. 2012)
18,449 KEGG orthologies (KO)
(2015/2/18; inferred from KEGG GENES
16,055,760), 469 pathway maps as one
part of the KEGG database
one KO may belong to a variety of higher
levels
last free version 2011
Review of the Literature
19
2.2.6 Data integration
Studying the genes, transcripts, proteins or metabolites of the human intestinal microbiota all have their
individual challenges. However, for a systemic understanding, different perspectives have to be taken at the
same time and different data sets need to be integrated. Recent integrated approaches of the intestinal
metaproteome followed a simple path and mainly described the study cohort (with only a limited set of
subjects) from different angles and reported the similarities or differences with different data sets, like
metagenomics and metatranscriptomics (Erickson et al. 2012; Ferrer et al. 2013; Daniel et al. 2014). Real
integration still needs to be further enforced, as it is exemplified in this thesis, by using different types of
data, like correlating the abundance of present community members with proteins to explain certain health
conditions. Models of metabolic fluxes and network-modelling can be integrated as well (Faust et al. 2012).
The term systems biology has been largely used for single organism studies but as analytical methods as well
as bioinformatics are advancing, it is now possible to examine the human intestinal microbiota in a systems
biological approach (Dos Santos et al. 2010; Siggins et al. 2012; dos Santos et al. 2013). Especially for
generating hypotheses, intestinal models, as reviewed in (Venema and Van den Abbeele, 2013), as well as
animal models help in reducing the complexity of the intestinal ecosystem. These hypotheses can then be
tested in human trials. The differences between mice and humans have been reviewed recently (Nguyen et al.
2015). When testing the effect of diet on a consortium of 12 human bacterial strains in a mouse model,
transcriptomics, composition analysis and proteomics revealed different changes at each analytical level
(McNulty et al. 2013). The share of unique peptides per strains in the in vivo community was different to the
predicted contribution based on genome comparison. These findings evidence the need for studying the
intestinal community with complementary omics methodologies, not only to increase the insight, but also to
reduce biases of each of the approaches. A new methodology is the use of stable isotope probing (SIP) that
has been used to identify the bacteria with active metabolic conversions in an intestinal model and could also
be used in humans to study protein fluxes (Kovatcheva‐Datchary et al. 2009). Recently, SIP has been used to
study the active community found in mouse intestinal biopsies (Berry et al. 2013).
Review of the Literature
20
3. AIMS OF THE STUDY
The main aim of the thesis work was to characterize the intestinal ecosystem based on the proteins identified
from faecal material. In detail, there were four main aims of this study:
I) To develop methods for assessing the metaproteome from the human intestinal tract and implement these
in a proof-of-principle study;
II) To determine the intestinal functionalities and their short-term variance in healthy adults in the context of
a probiotic intervention;
III) To conduct a comparison of the intestinal functionality of non-obese and obese individuals based on
microbial and host proteins;
IV) To evaluate the methodologies which can be used to study the metaproteome of the human intestinal
microbiota.
Materials & Methods
21
4. MATERIALS & METHODS
4.1 Study cohorts and samples
During the course of this work faecal samples from four different study cohorts were analysed (Table 5).
The individuals in Study I took part in the Micro-Obese cohort recruited from the large SU.VI.MAX2
cohort, in which individuals had been followed for eight years for their healthy nutritional and lifestyle habits
(Hercberg et al. 2004).
For Study II, which was a proof-of-principle study to assess the reproducibility of the developed
approach and to determine the variation over time, samples were collected in-house from adults who
provided informed consent.
The study participants in Study III were a sub-set of a larger cohort of initially 68 individuals taking part
in a probiotic trial testing three different probiotic strains in a double-blind setting (Lactobacillus rhamnosus
GG, 1010 cfu per day; Bifidobacterium animalis ssp. lactis Bb12, 108 cfu per day; and Propionibacterium
freudenreichii ssp. shermanii JS, 1010 cfu per day). The study product, to be consumed daily for three weeks,
was a 250 ml fruit-based milk-drink without or with the probiotic strains. The set-up of this trial, the
determined immune-markers and lipids, and the correlation between levels of lipids and the microbiota had
been reported previously (Kekkonen et al. 2008a; Kekkonen et al. 2008b; Lahti et al. 2013a) (Figure 4).
Metaproteomic analyses were carried out on samples from eight individuals from the placebo group and
from eight individuals from the L. rhamnosus GG group.
In Study IV, the intestinal metaproteomes were analysed in faecal samples from nine lean (BMI > 18.5 <
25 kg/m2), four overweight (BMI > 25 < 30 kg/m2), and 16 obese subjects (BMI > 30 kg/m2; N=4); of these
seven were morbidly obese (BMI > 45 kg/m2). The microbial composition and clinical host variables had
been determined previously (Verdam et al. 2013).
Table 5. Overview of study cohorts analysed in this thesis
Study Type Country of
sample
collection
Subjects No.
(Female:Male)
Timepoints
(intervals)
Sample
storage
Age
range
BMI
Study I Observational
study
France 2 (0:2) 1 Fresh 61-63 < 30
Study II Proof of
principle study
Finland 3 (3:0) 2 (1y/
1/2 y)
-70° C 23-35 < 30
Study III Placebo
controlled
probiotic
intervention
Finland 16 (11:5) 3 (3 weeks) -20° C o/n
-70° C
27-54 18.4–
30.4
Study IV Comparative
study
The Netherlands 29 (20:9) 1 +4° C
-20° C
within one
day
18-54 18.6–
60.3
Figure 4. Study design of Study III with respect to the analyses described in this thesis.
Materials & Methods
22
4.2 Methods
4.2.1 Metaproteomics
Study I was conducted within a collaboration in which the proteins had already been extracted from faecal
samples by first isolating microbial cells by Nycodenz gradient centrifugation (Murayama et al. 2001),
followed by disruption of the cells by chemical lysis, as described in Study I in this thesis. In all other studies
an in-house developed sample preparation approach was applied that included the following key features: i)
no fractionation of faecal material, ii) extraction of proteins via mechanical lysis with bead beating (0.1 mm
zirconium silicate beads) under anaerobic condition (gaseous nitrogen), and iii) protein fractionation into
three parts according to molecular weight by one dimensional gel-electrophoresis with focusing on the
approximately 40-80 kDa fraction. The proteins contained therein were termed the Abundant MetaProteome
AMP (see Results and Discussion).
The details of the applied metaproteomic methods have been described in the original articles (Studies I-
IV). Table 6 presents an overview of the applied metaproteomic techniques from sample preparation to data
analysis as applied in the four sub-studies.
Table 6. Overview of proteomics methods applied in this thesis
Study
I II III IV
Sample
Preparation
Protein Extraction From microbes isolated from faecal material
(=bacterial pellet)
X
From unfractionated faecal material X X X
Detergent buffer (urea, thiourea, CHAPS) X
Phosphate buffered saline (PBS) X X X
FastPrep FP120 (Savant) @ 0.1 mm zirconium
silicate beads
X
FastPrep 24 (MP Biomedicals) @ 0.1 mm
zirconium silicate beads
X X
1D PAGE NUPAGE 4-12% Bis-tris (Invitrogen); reducing
agent, sample buffer, and MES running buffer
(Biorad)
X X X X
Fractionation 20 gel pieces per gel lane X
3 gel pieces per gel lane, analysis focus on AMP X X X
In-gel digestion (Shevchenko et al. 2006) X X X X
Peptide clean-up GF/C filtration and OMIX tips X
C18 microspin columns X
HPLC-ESI-
MS/MS
Nano-HPLC Gradient run length (min): 50, 85, 100, and 120,
respectively
X X X X
Mass spectrometers LTQ-Orbitrap X
LTQ-Orbitrap XL X
Orbitrap Velos X
Velos Pro-Orbitrap Elite X
Materials & Methods
23
Table 6 continued
Data-analysis Peptide spectral
matching – search
algorithm
OMSSA (Geer et al. 2004) X X X X
X!Tandem (Craig and Beavis, 2004) X
q-value correction (Käll et al. 2008); filtering of
ambiguous hits
X
Peptide spectral
matching – database
Matched metagenomes, collection of bacterial
genomes (=synthetic metagenome) + unmatched
metagenome
X
Gut dedicated database (db) X
Extension of db in Study II X X
Separate search against subsets of db X
Search against complete set of sequences in db X X
Peptide grouping X
False discovery
rate(FDR) estimation
Concatenated target and decoy db
X X X
Separate target and decoy db X
FDR stringency 1 % X
5 % X X X
Functional
assignment
COG BLAST (Tatusov et al. 1997; Altschul et
al. 1997)
X X X
KEGG assignments through NCBInr BLAST
and subsequent MEGAN 5 analysis (Huson and
Weber, 2013)
X
EggNOG (Powell et al. 2014); Ublast (Edgar,
2010)
X
Functional grouping iPath (Letunic et al. 2008) X
microbial gastro-intestinal metabolic modules
GMM (Le Chatelier et al. 2013)
X
Taxonomic
assignment - Lowest
common ancestora
(LCA)
In-house python scripts – UniprotKB X
Unipept (Mesuere et al. 2012) X X
Quantification Progenesis LCMS (MS1 based) X
Spectral counting (MS/MS based)b X X X
a Peptide matching against UniprotKB entries with 100% identity; assignment of lowest taxonomic unit to which the
peptide was uniquely mapped; in each case all the hierarchical levels were reported allowing the sub-setting of the
data e.g. for bacterial peptides b Counting of the number of spectra identified per comparative unit e.g. KO or phylum; relative percentages in
Studies I - III; absolute numbers in Study IV
4.2.2 Phylogenetic microarray analysis
In addition to the metaproteomic data, phylogenetic data was available. The experiments for nucleic acid
based analysis were carried out by our research team (Study II and III) and collaborators (Study III, Study
IV). In Study II and IV, faecal DNA was extracted by mechanical lysis as described in (Salonen et al. 2010)
and in Study III with a modified Promega method as described in (Ahlroos and Tynkkynen, 2009).
Of all samples in Studies II-IV, with the exception of four samples in Study III, the Human Intestinal
Tract Chip (HITChip), a phylogenetic microarray described previously (Rajilić-Stojanović et al. 2009), was
used to identify the bacterial composition of the faecal samples. The results of the microbiota composition in
Materials & Methods
24
studies III and IV had been reported previously (Verdam et al. 2013; Lahti et al. 2013a). The HITChip
targets the V1 and V6 hypervariable regions of the 16S rRNA gene of over 1000 bacterial phylotypes.
Phylogenetic organization of the microarray probes and data pre-processing have been explained in detail
elsewhere (Rajilić-Stojanović et al. 2009; Salonen et al. 2010; Jalanka-Tuovinen et al. 2011). In Studies III
and IV, the probe signals were summarized with Robust Probabilistic Averaging algorithm (fRPA) (Lahti et
al. 2013b) per phylotype and further to 131 genus-like levels, which are groups of bacteria sharing ≥90%
sequence identity, and referred to with species name and relatives.
4.2.3 Statistical analysis
In studies II-IV, the statistical programming language R was used for statistical data analysis (R-core team).
Unsupervised methods
For the correlation of various data-sets either Pearson (Studies II and III) or Spearman correlation was used
(Study IV). The Jaccard Index (Levandowsky and Winter, 1971), in this case “number of shared
peptide/peptide union”, was used to compare the metaproteomes at the level of identified peptide sequences
(Study III). Complete linkage hierarchical clustering was applied to visualize sample relationships (Studies
II-IV). Principal component analysis (PCA) (Venables and Ripley, 2002) was used as the ordination method
to identify feature based clusters (Study II). In addition, principle coordinate analysis PCoA was used to
identify sample clusters based on distances (Paradis et al. 2004). Jackknife iteration was used to test the
statistical significance of the separation. Random forest analysis of ANOVA results was applied in Study II
(Breiman, 2011; Faraway. 2002). Q-value corrections were applied. Wilcoxon test was used to test for group
differences (Study IV).
Supervised method
Boots-trap aggregated redundancy analysis (bagged RDA), applied and described in (Jalanka-Tuovinen et al.
2014), was used to identify the NOGs explaining most of the clustering of the study groups in Study IV.
4.2.4 Additional data-sets used in this thesis
Study I Study II Study III Study IV
Metagenomic reads,
NCBI BioProject 33303,
samples NO1 and NO3
Phylogenetic microarray
data
Phylogenetic microarray
data;
Symptom diary
Phylogenetic microarray
data;
Clinical host variables
Results and Discussion
25
5. RESULTS AND DISCUSSION
The general objective of the work described in this thesis was to gain a functional insight into the intestinal
microbiota based on a metaproteomic analysis of human faecal material. For this purpose, appropriate
sample preparation methodologies and a data analysis pipeline had first to be developed. These analytical
strategies were applied to faecal samples collected from a total of 50 adults across four different studies. The
functional individuality of the microbiome and changes in it occurring over time were addressed. In addition,
possible modulations of these functions due to both a probiotic intervention and to obesity were followed.
While the emphasis was placed on addressing the microbial functionalities, the developed approach also
enabled the simultaneous analysis of the host functions.
5.1 Method development to study the human intestinal metaproteome (I, II and V)
At the start of this project only one faecal metaproteomic study had been published, describing the
metaproteome of baby faeces in the first 100 days of life (Klaassens et al. 2007). However, this first study
was based on the classical 2-DE approach, before the appearance of the high throughput LC-MS/MS
analytical technologies, and furthermore, it targeted baby microbiota which has a low complexity as
compared to the adult microbiome. Moreover, no reference databases were available for the intestinal
proteins. Hence, here a novel methodological pipeline had to be developed for analysing faeces obtained
from adults; i.e. starting from refining the methodologies for preparing faecal samples for LC-MS/MS
analysis and through different stages to finally devising a strategy for complex data-analysis. First, protein
extraction and fractionation methods needed to be developed, applicable for the analysis of several dozens to
hundreds of samples (Study II). Second, in the peptide spectral matching, as large a collection as possible of
target sequences was needed with which to construct an intestine-specific database. As the sequence analysis
of the intestinal metagenome has made very rapid progress, this database had to be updated (Studies I-IV).
Third, a set of different gene orthology systems were applied to functionally annotate peptides and proteins,
as no established system for the human intestinal proteins existed (Studies I-IV). Fourth, since previously the
phylogenetic origin of a peptide had not been inferred from intestinal proteins, this was undertaken for the
first time (Studies II-IV). Moreover, statistical approaches, like the Jaccard index and the integration of
metaproteomic and phylogenetic data, originating from different disciplines were adopted for
metaproteomics (Studies II-IV). Finally, a critical assessment of the methodologies used in human
metaproteomics studies was performed and future needs and directions of human metaproteomics were
proposed (Study V).
5.1.1 Analysis of the abundant metaproteome (II)
Mechanical lysis of microbial cells from adult faecal samples using a bead beating procedure had been
shown to generate high yields of DNA that were suitable for determining the phylogenetic composition of a
microbial ecosystem (Yu and Morrison, 2004). Hence, this method was applied to extract proteins from adult
faecal samples. Similarly to the amount used for DNA extraction applied in Studies II-IV, a total of 125 mg
faecal material was used as starting material (Salonen et al. 2010). Bead beating of the sample after three
fold dilution in PBS with 500 mg zirconium silicate beads, for cell disruption, and three glass beads, for
homogenization during the lysis process, under a gaseous nitrogen atmosphere resulted in protein
concentrations high enough to produce highly visible Coomassie stained protein bands appropriate for
consecutive nano-LC-MS/MS analysis.
Proteins were fractionated according to their molecular weight by 1-DE. In addition to the size
separation, this also provided a significant purification of the protein extracts and no further cleaning of the
sample was needed anymore as these often results in dilution and the subsequent necessity to concentrate the
samples. The 37 and 75 kDa bands of a pre-stained marker were used as molecular weight controls in the
Results and Discussion
26
production of three gel fractions: one containing proteins larger than 80 kDa, one with a molecular weight
range of approximately 40-80 kDa and one with proteins smaller than 40 kDa. The proteins contained in
these fractions were called High molecular weight MetaProteome HMP, Abundant MetaProteome (AMP),
and Low molecular weight MetaProteome (LMP), respectively. The proteins contained in the ~40-80 kDa
region were termed AMP as the majority of proteins were covered in this region. By comparing the three
molecular weight fractions, it was observed that 81.4% of the different proteins across all fractions were also
identified in the AMP (Study II, 6 biological samples). A further motivation to focus on this area was that it
may be difficult to identify high molecular weight proteins and the lower molecular weight fraction is likely
to contain ribosomal proteins and partially digested proteins. In Study II, where the approach was published
for the first time, AMP was referred to proteins present in the 35-80 kDa region. This terming was based on
few individuals’ samples. Later, when analysing dozens of different samples, it was noticed that cutting the
gels below the 35 kDa marker band may not always be convenient and a cutting above is required in order to
have sample intrinsic marker bands for cutting which might be slightly above the 37 kDa band of the pre-
stained marker. On average, in the AMP, almost 5500 unique peptides and 460 NOGs, as indicator for
functional depth, were identified in one LC-MS/MS experiment with the experimental specifications as
provided for Study IV (Table 6).
Due to the complexity of the sample matrix and the proteins present in faecal material one sample
preparation approach will always allow identification of only a subset of proteins present in the sample.
Therefore, recently, various methods have been described that could increase the number of proteins
identified in a metaproteomic experiment. Two main approaches are available for the identification of low-
abundance proteins in the intestinal metaproteomes, i) separating the faecal fractions and ii) fractionation of
proteins. Two recent studies described approaches to separately analyse bacterial and human proteins
(Lichtman et al. 2013; Xiong et al. 2014). Analysing the proteins present in faecal water identified 53 human
core proteins in a sample size of three individuals (Lichtman et al. 2013). This number corresponds to the 23
human KOs - which are a higher abstraction level than proteins - found to be the human functional core in
Study III (data analysis this thesis; core defined as number of KOs identified in at least one sample of each
individual). Therefore, the approach described here that is based on taking crude faecal material covers the
same range of human proteins as assessed in the so-called targeted approach. The tedious separation
described by (Xiong et al. 2014), which included several sample-preparation steps to separate human and
microbial cells from each other, appeared to double the number of identified microbial proteins by two-fold
compared to an approach analysing both fractions at once. Therefore, as found in other proteomics studies,
increasing the level of fractionation increases the level of identifications.
In addition to sample separation strategies, increasing the analysis time for one sample might increase
protein coverage. This can be achieved by undertaking an extensive fractionation of proteins or peptides. For
mouse faeces, filter assisted protein digestion (FASP), a protein digestion method which circumvents the in-
gel digestion, in combination with a gradient running for over eight hours, produced over 16,000 peptides in
four runs (Tanca et al. 2014). The gain in functional information compared to other approaches was not
detailed. The same group compared the effect of differential centrifugation and achieved an enrichment of
microbial peptides when separating bacterial cells from other faecal components by applying different
centrifugation speeds. The preparation of a microbial pellet resulted into the identification of approximately
6,000 microbial peptides, whereas when the unseparated material was taken only around 4,000 peptides
could be identified (Tanca et al. 2015). However, in the approach analysing the bacterial fraction, it appeared
that the Bacteroidaceae were underrepresented. When comparing these other studies to the approach
developed in this thesis (Study II), differences could be observed also at other steps of the pipeline, e.g. in
the method of lysing the cells, the analytical platform used, as well as in the databases used (Table 7). These
differences impair any direct comparison between the various studies. In general, the quantity and quality of
fractionation and time spent per sample differs depending on which combination of techniques is selected. At
Results and Discussion
27
the functional level, no major discrepancies appeared when comparing the different studies based on the
identified proteins (Study V).
Table 7. Brief overview of examples for published method combinations in faecal metaproteomic experiments.
Sample preparation Kolmeder et
al. 2012
Tanca et al. 2014 Erickson et al.
2011
Juste et al.
2014
Crude faecal material X
Bacterial pellet
X X X
Digestion
In-gel digestion X X
FASEP X
In-solution
X
Analytical platform
2-DE X
SCX X
LC-MSMS moderate gradient < 3h X
LC-MSMS Long gradient > 3h
X
Protein sequence database
Bacterial genomes X X X X
Metagenomes X X
5.1.2 Reproducibility testing of the pipeline (II)
The effect of sample preparation on the reproducibility of the LC-MS/MS results had to be determined in
order to allow its application for robust sample comparisons. For this purpose, MS1 profiles were used, since
with respect to the analytical outputs, they represent best the uniqueness of each in-gel digest. Replicates of
the protein extraction, in-gel digestion, and LC-MS/MS were taken for comparison. In the analyses, samples
from six biological samples were included representing three different individuals and two sampling time-
points.
Altogether, the comparison was performed with 26,000 MS1 events across 22 AMPs. Based on Pearson
correlation coefficients, the technical variation was compared to the variation of different samples from the
same individual and in samples from different individuals (Figure 4). The technical variation, including the
variation introduced by protein extraction, 1-DE/in-gel-digestion and LC-MS/MS analysis, was significantly
lower than the variation of samples from the same individual collected on two different days with at least
half a year time in-between. The variation observed in different samples from the same individual was
significantly lower than the variation observed in samples taken from different individuals. This provided
assurance that the developed approach could be applied in consecutive studies for comparing the
functionalities of the intestinal microbiota across samples.
Results and Discussion
28
Figure 4. Boxplots of the distribution of the Pearson correlation coefficients (y-axis) to assess the effect of sample
preparation, temporal and subject variation.
5.1.3 Development of in-house databases (I-IV)
One specific challenge in metaproteomics is the need for an adequate protein sequence database being a fair
representation of the vast variety of different proteins present in the sample. The intestinal microbiota with
the hundreds to thousands of different microbial species is an extreme version of this challenge. The
individuality of the intestinal metaproteome and the consecutive required targeting of the database were
demonstrated with metaproteome analysis of two samples for which the metagenomes had been analysed and
were available for PSM i.e. matched metagenomes (Study I). Protein sequence databases were constructed
by naïve translation of the metagenomes and these were used for PSM. By using the matched metagenome as
the protein sequence database for PSM, it was possible to obtain double as many identified peptides than if
one used the unmatched metagenome. In the same study, the following search strategy to increase the
number of identified peptides was proposed; i) to use a collection of intestinal bacterial genomes as a
database, called a synthetic metagenome, ii) perform a BLAST search with the hits from this search against a
large metagenome database, iii) construct from these BLAST hits a protein sequence database and finally iv)
perform PSM against this database and apply a conserved FDR of 1%. The aim was to increase per protein
the sequence coverage by identified peptides, not to identify a large variety of functions. This approach of
using an iterative workflow for PSM targeted the problem of impairing peptide identification on the basis of
established validation strategies when the search sequence space becomes extremely large. In a classical
proteomics experiment the database size is usually not larger than 10,000 sequences. However, the collection
of metagenomic sequences used here contained 3,300,000 sequences (Qin et al. 2010). The result of doubling
the number of identified peptides compared to the matched metagenome demonstrated the usefulness of the
approach that could be applied in cases where extensive protein coverage is needed. Recently, other
strategies for decreasing the size of the protein sequence database to aim for more specificity have been
proposed. In one of these approaches a first search was conducted in a complex protein sequence database
and a second search in a database constructed from the unfiltered results from the first search. The peptide
identifications from the second search were subjected to strict FDR filtering (Jagtap et al. 2013). In another
study, in a first search all the UniprotKB entries were used and then in a second search only the genomes of
those microbial strains for which initially a hit had been found (Tanca et al. 2015). A systemic comparison of
the available possibilities to increase the sensitivity of PSM in metaproteomics has yet to come.
Results and Discussion
29
Especially at the beginning of this work, having a matched metagenome of each analysed sample was
not feasible due to the high costs. Therefore, it was decided to construct a database based on publically
available sequences which would cover microbial, human and food derived proteins. During the course of
this thesis work, the first comprehensive metagenome study was published (Kurokawa et al., 2007) followed
by a human intestinal gene catalogue containing 3.3 Mio genes (Qin et al. 2010); the data of both studies was
included into the protein sequence database constructed in-house. In addition, the protein sequence database
contained a collection of intestinal microbial genomes; their number was initially 298 (Study II) but
increased to 594 as more genomes became available (Studies III & IV). The human genome was included as
well and extended for identifying alternative proteins in Studies III & IV. Finally, seven plant genomes were
included in the protein sequence database that could be beneficial for identifying proteins present in food.
The initial database contained 5,132,694 protein sequences (Study II) and the updated database 6,153,068
(Studies III & IV). The extension of the protein sequence database plays a possible role in the increased
identification ratios during the course of this thesis work (Table 8).
The largest collection of genes derived from the intestinal microbiota covers now more than 10,000,000
genes which may increase the ratio of identified peptides out of the metaproteomes (Karlsson et al. 2014).
However, as exemplified in a recent study where the effect was tested of the size of the protein sequence
database, it appeared that the optimum way of using this large sequence space still has to be determined
(Muth et al. 2015). The more unique proteins that are present in a protein sequence database, the higher is the
probability that peptides present in a sample will be identified due to the nature of PSM that allows only
100% similar identifications. Conversely, the recent study by Muth and co-authors revealed that by using the
present peptide validation strategy of FDR filtering, the sensitivity will suffer in the case of large databases,
hampering utilizing its full potential.
Table 8. Overview of MS/MS spectra and peptide identifications across studies
Study I *
50 min gradient
Study II**
85 min gradient
Study III **
Step-wise gradient
70min + 30min
Study IV**
120 min gradient
Number of MS/MS
spectra per sample
~45,000 ~4000 ~20,000 ~40,000
Identification ratio 7.9 - 11.9% ~17.0% ~24.0% ~27.0%
*Different approach than used in Studies II-IV, 20 gel pieces covering full molecular weight range, each analysed with
a 50 min gradient
** AMP; see Table 6 in Material and Methods for differences in mass spectrometers used and other parameters between
the studies
5.1.4 Functional annotation with COG, KEGG and eggNOG (I-IV)
Simply because of their extreme complexity and since the appropriate methods are still under development,
most metagenomic sequences have not been well annotated. Hence, a strategy of annotating the identified
proteins by orthologies was used. This orthology-based annotation facilitated the comparison between
samples. Due to the problem of protein inference, which is particularly pronounced in metaproteomics, it is
impossible to assign one protein solution for one peptide, the common unit in classical proteomics for
making comparison between samples. In addition, default settings of search algorithms only provide a
certain amount of proteins solutions for one peptide (e.g. 30 by OMSSA). However, as in the case of
glutamate dehydrogenase, over 500 proteins from different bacteria share at least one peptide and often more
peptides. Therefore, the initial search hits need to be remapped to the original protein sequence database with
all options per tryptic peptide recorded, and then proteins should be grouped according to their peptide
dependencies. In Study II proteins have been grouped based on having at least one peptide in common and
one unique peptide. This is computationally demanding and as no ready applicable software solution is
available for this task, it was only applied in Study II, with in-house scripts. Therefore, otherwise peptides
Results and Discussion
30
were functionally characterized based on the proteins they belonged to based on orthology and then the
spectra per orthological functional category summed for subsequent quantitative comparisons.
Out of the three orthological databases i.e. COG, KEGG and eggNOG, used for assigning a function to a
peptide with eggNOG (Study IV) the highest functional assignation rate was obtained. For more than 90% of
the spectra the respective proteins had a high confidence hit in the eggNOG database when applying a
UBlast search (e-value of 1E-30 or lower); reflecting the suitability of this classification database of
orthologous genes for annotating the human intestinal metaproteomes. The other databases COG and KEGG,
which were used in studies II and III, covered around 70% of the identified spectra. All these orthological
classification systems are based on a large collection of genomes from various sources and therefore they
span a very broad range of functions. The description of the mapped orthologies is not always readily
interpretable in the context of the human intestinal microbiota and may be viewed as rather abstract.
Therefore, GMMs have been developed (Le Chatelier et al. 2013) that organize KOs into intestinal specific
functional groups at three levels, such as mucin degradation or the degradation of carbohydrates, like
arabinose and xylose (Study IV). These are meaningful categories and assist in the interpretation of the
results. Characterization of protein functions can also be done based on the protein families database
(PFAM) (Finn et al. 2014). This domain based functional annotation was not applied in this thesis work but
has been previously used in metagenomic studies (Ellrott et al. 2010) and might be useful also in
metaproteomics for functional description of proteins and sub sequential quantitative analysis for
functionally targeted results.
5.1.5 Reflections on the metaproteomic approach
Taking into account that possibly 1000 microbial phylotypes are present in one individual intestine and that
each of these phylotypes has a gene capacity of at least 2000 proteins, several millions of proteins could be
present in one faecal sample. The proteins identified here obviously only reflect a very small fraction of the
total proteins present in the samples.
There are a few critical points that affect the success of a metaproteomic analysis. Sample preparation
and protein or peptide fractionation strategies cannot be complete. However, the reproducibility can be
assessed; based on a validated approach comparisons between samples can be made. In this work, the
proteins were fractionated by molecular size which was then utilized in their subsequent selection and
recovery. This process may have an effect on the identified proteins as only a subset of the proteins was
analysed. Moreover, the recovery of the trypsin-digested peptides from the gel may be biased and affected by
the amino acid sequences. In general, there exists no absolute bias-free proteomic technique, also not for
metaproteomics and there is still potential for identifying more functions. To be kept in mind is that both the
functional and taxonomic comparisons are based on identified spectra. This might introduce a further bias as
it is of course not known which functions the not-identified proteins may have. They can be similar to those
identified but they may also be different. Furthermore, a number of other parameters may introduce a bias,
such as differences between search engines and their capacity to differentiate between similar peptides: with
respect to the search results OMSSA for example is more conservative, tending to underrepresent, whereas
X!Tandem has a tendency to over-represent. In addition, the more information is demanded for a peptide
such as function and taxonomy, the smaller will become the remaining fraction for comparison; not all
peptides have a taxonomy assigned and of these not all have an assigned function and vice versa. Therefore,
improvements still can be made at all the various steps. The protein coverage needs to be increased by an
optimized sample preparation method and analytical platform. However, careful evaluation is required that
an accurate comparison between a large amount of samples is still possible. In addition, a cost-benefit
analysis for the database searches is required with regard to the choice of search parameters and database
size, all of which define the compute time and time spent on interpretation of the results, and peptide
identification sensitivity (Muth et al. 2015). Similarly, peptide and protein validation methods for
metaproteomics are required. Because of all these possible improvements, the here described approach
Results and Discussion
31
captures a fraction of the faecal proteins. However, it represents a feasible solution for comparing dozens of
samples including the functional and taxonomic characterization of the identified peptides. Although there
was a constant development of the metaproteomic pipeline throughout the project no major discrepancies
were observed. This indicates that the developed approach provides a reliable access to the major proteins
contained in faeces.
In Study II, protein quantification was based on MS1 data, in all other studies quantification was
performed by counting the identified spectra per comparative group e.g. specific phylum. Quantification
based on MS1 peaks might be more robust for identifying differences in protein abundances between
samples compared to spectral counting. However, applying the MS1 based approach in metaproteomics
generates two challenges. The large protein diversity across metaproteomes poses a challenge to the aligning
and normalizing algorithms of software for quantifying MS1 based data. Furthermore, the MS1 based
approach relies on assigning peptides to proteins; some proteins like glutamate-dehydrogenase are
conserved. It is very challenging to resolve several hundred proteins forming one protein cluster at the single
protein level. Today’s increased computer power make it more realistic that the standard application of MS1
based comparison in metaproteomics will come in the near future.
5.2 Intestinal metaproteomic baseline of healthy individuals and its variation over time
Having developed methods and strategies for faecal metaproteome analysis, the focus of this work was to
describe the proteins found in healthy individuals and to create a healthy metaproteomic baseline. Health is a
complex state defined by the WHO “as state of complete physical, mental and social well-being and not
merely the absence of disease or infirmity”. In Study III, people were asked to grade their health (“good”,
“average”, “poor”). Two individuals reported that they had average health but all others reported good
health. In the other studies, the participants were not asked to classify their health.
In Study III, in which samples of the largest healthy cohort studied for their metaproteome until now
were analysed, 103 microbial KOs were found in each of the 16 individuals at least once. This can be
considered as the functional core microbiome. The majority (62%) of these could be mapped on the KEGG
metabolic pathways (Figure 5). These core metabolic functions, which were mainly involved in carbohydrate
and amino acid, and energy metabolism, represent the backbone of the intestinal ecosystem. The coverage
appears less compared to what was found in Study II. This can be explained by the fact that in Study II only
three individuals were studied, whereas in Study III 16 individuals were studied. Similarly as observed for
phylogenetic analysis and metagenomes, the larger the cohort is, the smaller the number of shared species or
genes/functions might be (Jalanka-Tuovinen et al. 2011; Karlsson et al. 2014).
Results and Discussion
32
Fig
ure
5.
Met
abo
lic
pat
hw
ays
rep
rese
nti
ng
th
e m
etap
rote
om
ic c
ore
bas
ed o
n K
O a
ssig
nm
ents
as
asse
ssed
in
16
ad
ult
s (S
tud
y I
II)
(Let
un
ic e
t al
. 2
008
).
Results and Discussion
33
5.2.1 Individuality of the faecal metaproteome (II and III)
One of the main findings of this work was that the intestinal metaproteome of a human being is highly
individual. This extends previous findings on the personalized composition of the intestinal microbiota that
has been reported extensively (Li et al. 2014). It was observed that the individuality of the metaproteomes
remained both over the long term (one half to one year; based on MS1 level, Study II) and the short term (6
weeks with probiotic intervention; Study III). Moreover, the personalized functional microbiome could be
distinguished at both the peptide and KO level. This is an important finding as theoretically the individuality
at the peptide level could be explained by genetic differences leading to amino acid differences. However,
the observation that it was also found at the level of KOs, an abstract functional level, implies that there is a
functional significance of the individuality. The individual variation exists in the type and extent of the
cellular- and extracellular activities taking place in the intestinal ecosystem. These findings clearly go
beyond metagenomic findings. Classifying genes to high functional level categories like COG families has
revealed a rather homogenous distribution across individuals (Turnbaugh et al. 2009; Human Microbiome
Project Consortium, 2012). However, most metagenomic studies have functionally not delved very deep into
functional richness due to its obvious complexity.
5.2.2 Main microbial functionalities (II and III)
The main functionalities and special functions of the human intestinal microbiota explored by
metaproteomics in this work clearly represent specific features of the ecosystem. Many proteins being
involved in energy conversion were identified; these ranged from degrading both the initial food substrates,
like arabinose isomerase for plant material degradation, and host derived, like fucosyl-isomerase to degrade
fucose, up to the formation of intermediates such as butyrate and propionate by enzymes like butyryl-CoA
dehydrogenase and methylmalonly CoA mutase. Clearly, generating energy for the microbial metabolism
especially from carbohydrate sources appeared to be a characteristic of the intestinal microbiota, confirming
earlier (meta-)genome-based observations.
The maintenance of a redox-equilibrium is a very important task in an anaerobic environment.
Rubrerythin was detected in the metaproteomes analysed in Study II and this protein could be involved in
reducing oxidative damage by reducing peroxide to water. In addition, glutamate dehydrogenase was
identified from several hundred microbes (Studies II & III). This protein may act as electron sink (Study II).
The fact that high amounts of glutamate dehydrogenase were found highlights its functional importance as
determined in a computational approach in which the functions found in a human derived metagenome were
compared to metagenomes of other ecosystems (Ellrott et al. 2010). Furthermore, this demonstrates the
capability of this metaproteomic approach to capture functional information from several hundreds of
different species.
The synthesis of vitamins like folate and vitamin B12 could be identified since it was possible to detect
enzymes like formyltetrahydrofolate synthetase and cobalamin biosynthesis protein, respectively, in the
metaproteomes analysed in Studies II and III. This does not imply that these vitamins are necessarily
available to the host. However, it indicates that these vitamins may be required by the intestinal ecosystem.
Interestingly, patients with IBD had a higher prevalence of deficiencies of folate and vitamin B12 (Bermejo
et al. 2013). It is not to be excluded that the microbiota has a role in this phenomenon. In general, vitamin
B12 was identified earlier as important factor in the intestine as both the host and the majority of intestinal
microbes are requiring this vitamin, recently reviewed (Degnan et al. 2014).
In general, in Study III it was possible to identify temporal and individual differences in all of the
described functions. These identified microbial functions are evidence for the various metabolic activities
taking place in the intestine and the changes occurring upon daily variations. Those proteins might be used as
marker for the integrity of the microbial community.
Results and Discussion
34
In Study III, study participants had consumed a milk-based fruit drink with or without L. rhamnosus
GG. Although, immunological experiments had revealed an effect by the consumption of L. rhamnosus GG
(Kekkonen et al. 2008a), no changes to the functionalities were observed. A possible explanation for this
might be that there were more pronounced changes in the ileum which were diluted in faecal material and
could not be detected with the applied approach.
5.3 Host-microbiota interactions (II-IV)
The developed approach allowed identifying also human proteins. These constituted approximately 15% of
the identified spectra. Identification of bacterial and host proteins in parallel resulted in the identification of
various indications for host-microbiota interactions. Flagellins were detected in most of the intestinal
metaproteomes (Studies I-IV). They are microbial proteins that make the cell motile. Moreover, flagellins are
also known ligands of human toll-like receptor 5 and trigger the NFkB pathway (Vijay-Kumar et al. 2008).
In Study III with the pro-inflammatory serine/threonine protein kinase 4 a human protein could be found
significantly positively correlating with flagellins. This may be an example of a host-microbe interaction i.e.
the flagellins might be triggering the expression of serine/threonine kinase 4. Flagellins as component of the
intestinal metaproteome were also found in a recent case study identifying the faecal proteins of one lean and
one obese adolescent (Ferrer et al. 2013). In Study II, Eubacterium rectale and Roseburia intestinalis were
identified as the possible sources of the flagellins. New intestinal species possessing the ability to produce
flagellins have been recently identified, such as Lactobacillus ruminis (Neville et al. 2012). Elevated levels
of flagellin-antigens have been found in patients with Crohn’s disease and post-infective IBS (Schoepfer et
al. 2008). However, since they were detected also in healthy individuals, the presence of flagellins does not
necessarily need to lead to inflammation. On the contrary, these microbial proteins could also stimulate the
host in a beneficial way (Cullender et al. 2013).
Human intestinal alkaline phosphatase is another protein that is involved in the inflammatory response.
This protein was identified in the human metaproteome of several subjects (Studies II-IV). It was found in
high amounts in obese and was one of the NOGs explaining the separation of samples between the obese and
the non-obese group. It contributed up to 50% of the identified spectra with human origin. However, also in
non-obese individuals, the levels of this enzyme could be high. The human IAP can bind bacterial
lipopolysaccharides (LPS) and may therefore be controlling LPS-induced inflammation. Intestinal alkaline
phosphatase has been used as a therapeutic agent for several conditions such as sepsis and is a well-studied
enzyme. Several factors have been identified that affect its expression levels and enzyme activities, e.g. the
type of blood group or pH (Estaki et al. 2014). Intestinal alkaline phosphatase might represent a good marker
for regulative host processes allowing the gut to cope with changing environmental conditions.
The intestinal tract is an ecosystem rich in food derived proteins and host and microbial derived
proteins. Regulation of protein degradation is demonstrated by the identification of several proteinases but
also proteinase inhibitors. Several decades ago, tryptic activity was detected in faecal samples derived from
infants and therefore proteinases and their inhibitors were proposed to have a role in balancing intestinal
functions (Norin et al. 1985). As discussed in the literature summary of this thesis, proteinases are also
supposed to have another role in addition to proteolysis and it may be via these secondary roles that they
contribute to a certain disease phenotype.
Taken together, a combination of these proteins might be used as markers to monitor the health state of
the intestinal ecosystem i.e. they could be used as indicators of an intact or perturbed system.
Bacterial proteins degrading host derived glycans have been identified in many subjects (Study III), such
as fucose isomerase which degrades mammalian derived glycans. In addition, also the host enzyme trehalase
was identified; this is an enzyme that degrades trehalose, a disaccharide that is formed in bacteria upon stress
to dehydration and acts as a compatible solute. These findings further illustrate the mutualistic relationship
between the host and its microbes.
Results and Discussion
35
One example of indirect host-microbe interaction was observed in Study III in which two individuals
exhibited relatively lower levels of alpha amylase and high levels of Prevotella ssp. peptide counts in
comparison to the other individuals. This finding could indicate that these two individuals have consumed a
diet not dominated by starch but containing more complex carbohydrates enforcing the activity of Prevotella.
5.4 Comparing the metaproteome of lean and obese individuals (IV)
Within this work, the microbial composition and metaproteome of a cohort of 29 non-obese and obese were
analysed. In the samples deriving from the non-obese individuals, statistically significant higher levels of
Bacteroidetes 16S rRNA were found compared to the obese individuals. However, relatively more
Bacteroidetes peptides than 16S rRNA signals were obtained in the obese group compared to the non-obese
group.
It is unlikely that this is due to the greater faecal mass in obese individuals i.e. the relative share should
not be affected as protein and DNA amounts should be affected similarly in that case. One explanation could
be that there is an uncoupling between growth and activity, a well-known physiological phenomenon
recognized in industrially-utilized bacteria such as Lactococcus lactis cultured at low pH conditions (Goel et
al. 2015). It has been reported that the amount of SCFA were significantly higher in obese than in non-obese
subjects (Fernandes et al. 2014), and as a result the colonic pH values of the obese subjects are predicted to
be markedly lower. In addition, it was found that the growth rate of Bacteroides spp. was tightly controlled
and reduced at low pH and by the amount of SCFA (Duncan et al 2009). Hence, although the Bacteroides
spp. may simply be not growing to high abundance due to the low pH in the obese subjects, these bacteria
may still have very high activities.
Yet another explanation could be a technical bias due to the phylogenetic analysis approach used or an
incomplete coverage of the protein sequence database. Among the 957 cultured bacterial intestinal species
47% belong to the phylum Firmicutes and 10% to Bacteroidetes (Rajilić-Stojanović and de Vos, 2014). Out
of 398 strains at IMG/JGI for which there is an assigned origin in the digestive tract, 19% belong to the
phylum Bacteroidetes (Figure 6). It is possible that the 16S rRNA sequences of the “obese” Bacteroidetes
species are not fully covered by the oligonucleotide probes on the microarray. Alternatively, these “obese”
Bacteroidetes species may be overrepresented in the protein databases used. The effect of ethnicity on the
Bacteroidetes levels was revealed in a study on the microbiota and metagenome of obese compared to lean
individuals (Turnbaugh et al. 2009). Lean individuals with European ancestry had 1.1 times more
Bacteroidetes than their obese counterparts whereas the ratio was 1.3 for individuals of African ancestry.
This might be due to the fact that the lean individuals of African ancestry had significantly more
Bacteroidetes and less Firmicutes than the individuals of European ancestry. These findings exemplify the
difficulties which may arise in comparing groups e.g. according their BMI values.
The metaproteomes of the non-obese and obese group differed in a statistically significant way as shown
by a principle coordinate analysis of the identified NOGs. This demonstrated that there are considerable
differences in functionalities between the non-obese and obese group. Among the 25 NOGs which were
found to explain the separation between samples from the obese and the non-obese group, based on
baggedRDA analysis, several NOGs were involved in carbohydrate degradation.
Furthermore, the insulin sensitivity of the subjects, as measured with the HOMA index, correlated
negatively with a specific bacterial two component regulator. This is a further example of the link between
microbiota and the host, and it may represent a possible starting point to clarify the reasons of the metabolic
changes occurring in obesity.
Results and Discussion
36
Figure 6. Microbial phyla distribution of 1,847 microbial genomes at IMG/JGI with origin Homo sapiens (IMG/JGI
HS), 398 of those are from the sub-system digestive tract (IMG/JGI HS DT), and of 1057 cultured intestinal phylotypes
(1057). (EA = Euryarchaeota)
5.5 Phylogeny-specific metabolic activity – 16S rRNA genes versus peptides analysis
(II-IV)
The recent expansion of the protein databases as consequence of the efforts ongoing in sequencing microbial
genomes has provided a critical amount of microbial proteins sequences originating from a diverse range of
species which made referring a LCA to a peptide possible. Furthermore, phylogenetic microarray data and
metaproteomic data were compared to obtain a better understanding of which microbes were present and
what was their metabolic activity.
When comparing the phylum distributions measured by the phylogenetic microarray to the spectral
count data of phylum-specific peptides both agreeing and disagreeing results were obtained (Figure 7).
Differences in the averages per phylum were observed between Studies II, III and IV. Biological differences
in the study cohorts, the storage of the faecal material (Table 5; Bahl et al. 2012) as well as differences in the
analytical steps (Table 6) might have contributed to the observed differences.
Figure 7. Barplot of the distribution of the main bacterial phyla across three studies assessed by phylogenetic
microarray (16S) and metaproteomics analyses (pep).
0%
20%
40%
60%
80%
100%
16S pep 16S pep 16S pep 16S pep
Non-obese Obese
Study II Study III Study IV
Verrucomicrobia
Proteobacteria
Actinobacteria
Bacteroidetes
Firmicutes
Results and Discussion
37
The main functionalities were determined for the three major phyla found in the human intestine,
Firmicutes, Bacteroidetes, and Actinobacteria. Firmicutes appeared to be dominantly active in carbohydrate
metabolism and butyrate production, and Actinobacteria in sugar metabolism. The proteins produced by
Bacteroidetes showed a more even distribution of several functions. These findings were in agreement with
previous theoretical knowledge. Both in Study II and III, a large fraction of the identified peptides could not
be assigned to proteins with any known function. This implies that the functionalities of Bacteroidetes are
underrepresented in terms of cultured and sequenced species and hence need to be analysed more closely.
In Study III, peptides were filtered according their phylogenetic origin and the KOs of these specific
peptides were further studied. Specific focus was on the functionalities of Faecalibacterium prausnitzii,
Prevotella ssp. and Methanobrevibacter as these organisms vary greatly in their abundance and encode
distinct functions within the gut ecosystem. For F. prausnitzii for example glucuronate-isomerase was
identified implying F. prausnitzii’s involvement in carbohydrate metabolism. According to the KOs that
were identified based on Prevotella peptides these microbes contributed to the propionate synthesis in the
analysed metaproteomes as methylmalonyl-CoA mutase was identified. In addition, proteins involved in the
methane production derived from Methanobrevibacter were identified.
The bias which remains in finding a LCA for a peptide and drawing a conclusion on these findings is
that obviously not all peptides can be assigned. In this work the rate of assigning a LCA to a peptide was
around 70%. An absolute peptide matching is required for assigning a phylogenetic origin to a peptide.
However, this is impaired by the fact that many protein sequences still remain to be identified. Therefore, the
results have to be seen in the light of present knowledge both in the form of oligonucleotides represented on
the microarray as well as protein sequence entries in UniprotKB.
In Study II, where aligned and normalized MS1 data for proteins were available the proteins could be
correlated with the microarray data. Correlations for example between flagellin and Roseburia intestinalis
and Eubacterium rectale could be identified. These findings could be verified by a BLAST search of the
flagellin sequences. This highlights the fact that the integration of microarray and metaproteomic data makes
it possible to identify the organisms responsible for the production of a specific protein.
In Study I, it was noted that one of the individuals exhibited a high number of Akkermansia reads and
similarly a high number of Akkermansia derived peptides could be identified. In the other individual, hardly
any Akkermansia peptides were obtained. Akkermansia is so far the only member of the deeply rooted
phylum Verrocumicrobia found in human GIT and therefore a good example for comparing 16S rRNA and
peptide data.
5.6 Reflections on study cohorts
All the study participants but one (diabetes type II) were individuals without (chronic) disease. They had not
taken any antibiotics at least three months prior to sampling and did not suffer from any gastro-intestinal
diseases. Due to sample availability issues, some skewed age distribution was observed in the probiotic
intervention trial (Study III). In the obesity study (Study IV) there was a significant age difference between
the non-obese and the obese group. Generally, due to the small size of the cohorts, there was no even
distribution between male and female study participants.
With the exception of Study II which was a proof of principle study, all samples were derived from
collaborations for which a number of meta-data was already available. Therefore, there was no influence on
the availability of meta-data such as health-questionnaires or food diaries which were not collected. The
absence of food diaries impaired controlling for the impact of diet that can be significant. For instance,
without dietary information, it is difficult to account for the individual variation in some of the results
obtained here, such as why flagellins are detected in some individuals and not in others. Similarly, in Study
IV it remains unclear whether there is a per se difference in metabolism or if a difference in diet could
explain the separation of the intestinal metaproteome from obese and non-obese subjects.
Results and Discussion
38
In Study III, study participants filled a symptom diary that indicated showing normal health
perturbations such as loose stools or coughing.
There is a learning process encountered when arranging this kind of study. Studies with human subjects
depend very much on their willingness to participate and this may be lower in healthy subjects than in
individuals with a disease who may hope that their participation will confer a health benefit. Since scientific
interest in intestinal microbiota is still increasing, one can predict that larger studies also collecting faecal
material or other digestive tract compartments will be conducted in the future.
Conclusions and Future Perspectives
39
6. CONCLUSIONS AND FUTURE PERSPECTIVES
Microbiology has largely moved from light microscopy to what can be defined as “molecular microscopy”.
By determining the sequences of 16S rRNA genes or metagenomes, a much larger variety of microbial
communities has been discovered than ever would have been expected by looking at a microscopic picture or
analysing colonies on a plate. The continuous development of both proteomics and the human microbiome
field is reflected in the analytical capabilities of the different mass spectrometers used in this work, the
development of available algorithms and other tools, and the constant increase in the numbers of microbial
genes identified from the human intestine.
The immense increase in genomic information of the intestinal microbiota has greatly facilitated studies
at the protein level. However, the tremendous growth of sequence information has also hampered the
potential identifications because of the great number of variations. Hence, it underlined the need for the
development of new data analysis solutions. Proteomic software, such as is available for de novo sequencing,
may further be developed to identify peptides in complex mixtures without the need for gene knowledge.
Moreover, almost each year of this thesis work, an updated and faster or better mass spectrometer was
introduced.
The metaproteomic approach devised here is a detailed description of one solution for assessing the
proteins contained in faecal material; as discussed, several other approaches are feasible. With regard to the
sample preparation, the development in metaproteomics has been parallel to that encountered in
metagenomics. First, several different approaches have been proposed and now a systematic comparison is
being undertaken (Salonen et al. 2010; Lozupone et al. 2013; Franzosa et al. 2014). An important point to
keep in mind is that we are at the beginning of understanding the intestinal microbiome in its entirety by
applying methods producing complex data-sets. These present studies highlighted the variety and complexity
of the intestinal microbiome. After these initial studies, a refinement and homogenization of the analytical
tools has started. In the field of metaproteomics, a systematic comparison of sample preparation
methodologies has still to emerge and there are many important questions needing to be addressed. What is
the effect of sampling? What is the effect of freezing and is there an effect of storage time? Which is better:
using a bacterial pellet or utilizing crude faecal material? How should the lysis be performed: by mechanical
disruption, chemical lysis, or a combination of these techniques? What is the best protein-fractionation
method? In one study, the question of appropriate study material has now been addressed and the results
indicate that by using a bacterial pellet one can recover more microbial proteins but with a bias in the
recovered phyla (Tanca et al. 2015). It is to be expected that further methodological harmonization based on
a comprehensive evaluation will accelerate the speed in acquiring new knowledge about the intestinal
microbiota and its involvement in health and disease.
Another approach that may be important in the future is the use of stable isotope probing. This approach
can be coupled to dietary interventions with food components labelled with stable isotopes; this would make
it possible to analyse trophic chains. Similarly, as it has been done for 16S rRNA analysis in an intestinal
model and proteins in aquatic model systems, this technique could be applied to follow the human-microbial
interactions in humans based by measuring the incorporation rates of these substrates into proteins.
Two solutions for coping with protein inference that complicates quantitative comparisons between
samples were applied. In Study II grouping proteins according their relatedness based on at least one shared
peptide was used to resolve PSM results on the protein level. Another solution was devised by assigning
peptides to functional groups and performing quantitative comparisons on that level. However, what is still
needed is to define the functional-depth at which meaningful differences between treatment/health-state
groups are found; if this is the individual protein deriving from a certain strain or species, a group of proteins
sharing same peptides or proteins grouped to a specific function. Functional grouping of proteins as provided
by the orthologous gene databases such as COG or eggNOG or the recently developed GMM are very useful
in providing an overview of the functions taking place in a very complex ecosystem. One step further to the
Conclusions and Future Perspectives
40
defining of GMMs, grouping KOs to intestinal relevant functional categories, would be environment-specific
annotation of human intestinal metagenomes. This requires experimental proofing of the predicted functions
in combination with network analyses to test whether the predicted functions fit into the theoretical model.
There is an overall awareness of the gene richness encoding for a large variety of functions of our
microbes; and their importance for a multitude of body functions. However, there is still much to be learned
of our microbes and us, in a world with usually a surplus of every food item. The mechanisms of interactions
between host and microbiota are poorly understood. In the case of energy metabolism, recent research has
opened up the possibly involved molecules and even genes. Bile acids are depending on the microbiota
differently modified and these regulate the expression of the fxr gene, responsible for regulating host bile
acid production. We need to concentrate again more on microbial physiology. Future efforts are to be taken
to get to know yet un-cultured microbes and learn about their functions. Another possibility, closer to the in
vivo situation, is to fractionate the intestinal microbes into distinct subsets, e.g. according to surface
structure, and then to analyse their proteins.
There is a large unknown gap between the potential and real functionalities of the intestinal microbiota;
only half of the sequenced genes in metagenomic studies can be mapped to a gene homologue with a known
function. The numbers of proteins detected in one faecal metaproteomics experiment are several factors of
magnitude less than the theoretical number that could be detected. Some methods for fractionating the
metaproteome have been described above. A further option for better capturing the full variety of proteins
would be to utilize peptide ligand libraries as suggested recently (Righetti et al. 2015). Peptide ligand
libraries could help to identify the less abundant proteins.
Due to the multitude of factors affecting human metabolism and the composition and functions of our
microbes, careful metadata collection, as well as phylogenetic and functional measurements, are required if
we are to make meaningful associations. Data generated in such a way can be used to make predictions based
on profiling data and it can be applied for diagnosis of an individual’s health state or the decision what would
be the most successful treatment. As shown in this work, we have a personalized metaproteome and there is
temporal variation of its functions. By conducting longitudinal studies, well-controlled for possible
influencing factors, we may be able to determine which microbial set-ups are associated with which and
which effectors change the property and if so, to which extent.
Although the main purpose of metaproteomics is not to identify which organisms are present in the
microbial community but instead to study its function, in this work the method of assigning a phylogenetic
origin to a peptide was found useful in the handling of such complex data. Sub-setting the identifications into
host and bacterial proteins revealed component specific proteins.
Phylogenetic analyses complemented the metaproteomic data in this work. Although both
sequencing and proteomics techniques developed tremendously during the course of this work, biases from
both sides remain, complicating the one-to-one comparison. Until these biases are further tackled, any
comparisons have to be viewed as approximations. A major divergence was observed between phylogenetic
microarray analysis and metaproteomics for members of the genus Bacteroides. Whereas on the 16S rRNA
level Bacteroides spp. were apparently significantly less abundant in obese individuals, relatively more
Bacteroides peptides were identified from the obese individuals. While the simple explanation is that the
Bacteroides spp. are more active in the obese subjects, the question arises why they are then not simply
outgrowing and become more abundant. Consequential studies may provide further evidence that there is an
uncoupling between growth and activity.
There is a perceived need for studying the proteins in the upper regions of the GIT in addition to the
studies so far mostly focused on faecal material. One study used lavage to sample proteins along the intestine
(Li et al. 2011). However, this study focused on extracellular proteins and no attempts were made to lyse the
microbes that could have been present. Sampling in a way comparably to the one carried out in a rat study, in
which the metaproteome of different intestinal compartments were analysed (Haange et al. 2012), would
Conclusions and Future Perspectives
41
contribute to describe in humans the spatial functional differences along – from proximal to distal, and across
the intestinal tract – from epithelium to lumen.
We have learned by using techniques of modern molecular biology and keen interest in the ongoing
activities inside of our body that we require our microbes and our microbes require us for keeping a healthy
body condition. Careful study of the vast information on the intestinal microbiota and expanding it with
targeted new studies in a team of clinicians, microbiologists, nutrition scientists and bioinformaticians, will
sharpen our insights of the complex mechanisms ongoing inside us.
Acknowledgements
42
7. ACKNOWLEDGEMENTS
This thesis work was carried out at the Department of Veterinary Biosciences, University of Helsinki and the
Finnish Centre of Excellence in Microbial Food Safety Research. It was part of the TEKES project
“Innovations for Intestinal Health” implemented in the Faculty of Veterinary Medicine. This work was also
supported by the Academy of Finland and the Doctoral Programme in Food Chain and Health. A special
thank you to Dean Professor Antti Sukura and the Head of the Department Professor Airi Palva for providing
state-of-the-art facilities; everything needed was available.
I want to close this work by answering one question I was asked several times, especially in the beginning of
this work: Why Finland? The reason is simple - I got this project here which I felt fitted me, and which
privileges me to thank all of you to whom I am grateful for the best imaginable support:
My team of three supervisors: You displayed trust in me by giving me this project. You also believed in me
that I would finish it. Airi, I am grateful that I always could enjoy your support. Kiitos kovasti ja anna
anteeksi että en puhu suomea hyvin! Willem, your ideas were a very important driver of the project and your
enthusiasm helped through difficult times. Your positive challenging attitude allowed me to grow. Anne, you
have had many roles in this project: supervisor, office companion, friend. You have supported me with help
on a very broad scale. Thank you!
Koen Venema and Kirsi Savijoki are thanked for their critical comments on my thesis. You gave me the
chance to think through some aspects one more time.
I want to thank Ewen MacDonald for his kind suggestions to improve the English of this thesis.
This project produced some external hard-drives full of data which could only be handled with the help of
our bioinformatics J-forces. Janne and Jarkko – both of you always had a patient explanation of R and
statistics, now thanks to your pedagogical skills I’m happily using basic R-commands routinely. Jarkko - you
got me started with problem-solving of R-scripts and the knowledge that it is usually the way of reading the
data which causes the problem not the functions. Jarkko and Jarmo - you have been handling my huge tables,
solved several software issues and contributed substantially to the data analysis in the second half of the
project. Thank you!
This project allowed me to collaborate with many people both abroad and in Finland whom I want to thank
for their contributions, especially for the provision of such fine facilities (state of the art mass-spectrometers)
and the analysis tools so essential in this work: Sjef - for substantial help in getting the project started, Koos
and Peter - for the joint writing of the first manuscript of this thesis. Ilja - you not only conducted the LC-
MS/MS measurement, you even provided a courier service relating to the work for the second article. Thank
you also for the various discussions on technical aspects of LC-MS/MS.
I want to thank Jaana Mättö for making possible the collaboration with the Red Cross and for being a
member of my PhD follow-up group.
After mostly travelling to get my samples into the LC-MSMS, I finally obtained a very local collaborator
thanks to you Salla and Markku. Markku - thank you also very much for supporting me in your role as a
follow-up group member and posing some new proteomics challenges.
Acknowledgements
43
Mark - thank you very much for giving the project a Bioinformatics boost; thank you also for the numerous
(non-)proteomics chats.
I want to thank all the numerous co-authors not named individually. Thank you very much for your input.
I’m grateful for the travel financing that I received, both to communicate with people from the proteomics
and microbiology community.
Many people of the Finnish proteomics community are thanked for basic practical support.
This thesis is here also due to the efforts of many people, efforts made before I even started this project: The
supervisors of my Bachelor’s and Master’s thesis, especially Professor Hannelore Daniel and Professor Karl-
Heinz Engel, guided my first scientific steps; your inspiration and knowledge transfer planted the seed that I
should proceed to study for a PhD; the time at the NRC and people there were very important in giving me a
foundation in proteomics allowing me to start this thesis work. Thank you!
All the peer PhD-students are thanked for sharing the burdens and joys of doing a PhD. Especially Lotta for
always having a very good reason to come to Turku and making me familiar with Mölkky, Jing for all the
chats about science and life in general and your enormous scientific enthusiasm, and Katri for your very
positive spirit.
Outi - you have been an un-replaceable lab-assistance and you taught me how to give instructions in the right
way. Thank you for the Mökki experience and socialising me in the first part of the project.
The Mikro people are thanked for the chats during lunch and coffee breaks as well as the day-to-day support.
I want to thank the FiDiPro steering group for giving me the opportunity for periodical reflection of my
work.
This thesis required a lot of computer resources. Timo, vielen Dank für die stets prompte Hilfe in all den
Computerschwierigkeiten und dafür, dass du mir eine Software nach der anderen installiert hast. Vielen
Dank für die deutschen Sprüche!
Despite a language barrier once in a while there weren’t too many bureaucratic hurdles; Maija, Ritva and
Tarja - thank you for the smooth handling of these matters and the language support. Laila, thank you for the
help related to graduate school matters. Tiina - thank you very much for your kind help with guiding me
through the dissertation process.
I want to thank the graduated doctors for proving to me that graduation is possible. Especially Tanja and
Jonna are thanked for answering all my questions about how to get the thesis ready and the numerous
simultaneous translations during all these years.
All the late-night and weekend-workers are thanked for giving some background noise during the more
lonely times in the lab.
Katrin und Philipp mit Felix und Isabella: Aus unseren wöchentlichen Lauftreffs sind (Halb-)Jahres
Familientreffen geworden. Danke für eure Freundschaft!
Acknowledgements
44
Gabi und Robert: Danke für eure Unterstützung aus der Ferne und dass wir uns in München immer
willkommen fühlen.
Mama, Papa und Christoph: Der Ort meines Doktorvorhabens ist nicht der naheste zu meiner
niederbayerischen Heimat. Trotzdem habt ihr mich in meinem Vorhaben stets unterstützt wie in all meinen
Wissens-Bestrebungen. DANKE!
Unaussprechlicher Dank an Jochen und Iida: Mit euch bin ich da, wo ich sein mag und die, die ich sein mag.
Helsinki, April 2015
Carolin Kolmeder
8. REFERENCES
Ahlroos, T., Tynkkynen, S., 2009. Quantitative
strain-specific detection of Lactobacillus
rhamnosus GG in human faecal samples by
real-time PCR. J. Appl. Microbiol. 106, 506-
514.
Allmer, J., 2011. Algorithms for the de novo
sequencing of peptides from tandem mass
spectra. Expert. Rev. Proteomics 8, 645-657
Arsène-Ploetze, F., Bertin, P.N., Carapito, C.,
2014. Proteomic tools to decipher microbial
community structure and functioning.
Environ. Sci. Pollut. Res. Int.
Altschul, S.F., Madden, T.L., Schäffer, A.A.,
Zhang, J., Zhang, Z., Miller, W., Lipman,
D.J., 1997. Gapped BLAST and PSI-
BLAST: a new generation of protein
database search programs. Nucleic Acids
Res. 25, 3389-3402.
Arumugam, M., Raes, J., Pelletier, E., Le Paslier,
D., Yamada, T., Mende, D.R., Fernandes,
G.R., Tap, J., Bruls, T., Batto, J.M., et al.,
2011. Enterotypes of the human gut
microbiome. Nature 473, 174-180.
Bahl, M.I., Bergström, A., Licht, T.R., 2012.
Freezing fecal samples prior to DNA
extraction affects the Firmicutes to
Bacteroidetes ratio determined by
downstream quantitative PCR analysis.
FEMS Microbiol. Lett. 329, 193-197.
Barbhuiya, M.A., Sahasrabuddhe, N.A., Pinto,
S.M., Muthusamy, B., Singh, T.D.,
Nanjappa, V., Keerthikumar, S., Delanghe,
B., Harsha, H., Chaerkady, R., et al., 2011.
Comprehensive proteomic analysis of human
bile. Proteomics 11, 4443-4453.
Ben-Amor, K., Heilig, H., Smidt, H., Vaughan,
E.E., Abee, T., de Vos, W.M., 2005. Genetic
diversity of viable, injured, and dead fecal
bacteria assessed by fluorescence-activated
cell sorting and 16S rRNA gene analysis.
Appl. Environ. Microbiol. 71, 4679-4689.
Benndorf, D., Balcke, G.U., Harms, H., Von
Bergen, M., 2007. Functional metaproteome
analysis of protein extracts from
contaminated soil and groundwater. ISME J.
1, 224-234.
Bergström, A., Skov, T.H., Bahl, M.I., Roager,
H.M., Christensen, L.B., Ejlerskov, K.T.,
Mølgaard, C., Michaelsen, K.F., Licht, T.R.,
2014. Establishment of intestinal microbiota
during early life: a longitudinal, explorative
study of a large cohort of Danish infants.
Appl. Environ. Microbiol. 80, 2889-2900.
Bermejo, F., Algaba, A., Guerra, I., Chaparro, M.,
De-La-Poza, G., Valer, P., Piqueras, B.,
Bermejo, A., García-Alonso, J., Pérez, M.J.,
et al., 2013. Should we monitor vitamin B12
and folate levels in Crohn's disease patients?
Scand. J. Gastroenterol. 48, 1272-1277.
Berry, D., Stecher, B., Schintlmeister, A.,
Reichert, J., Brugiroux, S., Wild, B., Wanek,
W., Richter, A., Rauch, I., Decker, T., et al.,
2013. Host-compound foraging by intestinal
microbiota revealed by single-cell stable
isotope probing. Proc. Natl. Acad. Sci. U S A
110, 4720-4725.
Biagi, E., Nylund, L., Candela, M., Ostan, R.,
Bucci, L., Pini, E., Nikkilä, J., Monti, D.,
Satokari, R., Franceschi, C., et al., 2010.
Through ageing, and beyond: gut microbiota
and inflammatory status in seniors and
centenarians. PLoS One 5, e10667.
Booijink, C.C., El-Aidy, S., Rajilić-Stojanović,
M., Heilig, H.G., Troost, F.J., Smidt, H.,
Kleerebezem, M., de Vos, W.M., Zoetendal,
E.G., 2010. High temporal and inter-
individual variation detected in the human
ileal microbiota. Environ. Microbiol. 12,
3213-3227.
Bookstaver, P.B., Ahmed, Y., Millisor, V.E.,
Siddiqui, W., Albrecht, H., 2013.
Clostridium difficile: case report and concise
review of fecal microbiota transplantation. J.
S. C. Med. Assoc. 109, 62-66.
Breiman, L., 2001. Random forests. Machine
Learning 45, 5-32.
Cain, J.A., Solis, N., Cordwell, S.J., 2014. Beyond
gene expression: The impact of protein post-
translational modifications in bacteria. J.
Proteomics 97, 265-286.
Cantarel, B.L., Erickson, A.R., Verberkmoes,
N.C., Erickson, B.K., Carey, P.A., Pan, C.,
Shah, M., Mongodin, E.F., Jansson, J.K.,
Fraser-Liggett, C.M., et al., 2011. Strategies
for metagenomic-guided whole-community
proteomics of complex microbial
environments. PLoS One 6, e27173.
Cohen, D.P., Renes, J., Bouwman, F.G.,
Zoetendal, E.G., Mariman, E., de Vos, W.M.,
Vaughan, E.E., 2006. Proteomic analysis of
log to stationary growth phase Lactobacillus
plantarum cells and a 2-DE database.
Proteomics 6, 6485-6493.
Craig, R., Beavis, R.C., 2004. TANDEM:
matching proteins with tandem mass spectra.
Bioinformatics 20, 1466-1467.
Cullender, T.C., Chassaing, B., Janzon, A.,
Kumar, K., Muller, C.E., Werner, J.J.,
Angenent, L.T., Bell, M.E., Hay, A.G.,
Peterson, D.A., et al., 2013. Innate and
adaptive immunity interact to quench
microbiome flagellar motility in the gut. Cell
Host Microbe 14, 571-581.
Cummings, J. H., 1975. The colon: absorptive,
secretory and metabolic functions. Digestion
13, 232-240.
Cummings, J.H., Macfarlane, G.T., 1991. The
control and consequences of bacterial
fermentation in the human colon. J. Appl.
Bacteriol. 70, 443-459.
Cummings, J.H., Wiggins, H.S., Jenkins, D.J.,
Houston, H., Jivraj, T., Drasar, B.S., Hill,
M.J., 1978. Influence of diets high and low
in animal fat on bowel habit, gastrointestinal
transit time, fecal microflora, bile acid, and
fat excretion. J. Clin. Invest. 61, 953-963.
Daniel, H., Gholami, A.M., Berry, D.,
Desmarchelier, C., Hahne, H., Loh, G.,
Mondot, S., Lepage, P., Rothballer, M.,
Walker, A., et al., 2014. High-fat diet alters
gut microbiota physiology in mice. ISME J.
8, 295-308.
David, L.A., Maurice, C.F., Carmody, R.N.,
Gootenberg, D.B., Button, J.E., Wolfe, B.E.,
Ling, A.V., Devlin, A.S., Varma, Y.,
Fischbach, M.A., et al., 2014. Diet rapidly
and reproducibly alters the human gut
microbiome. Nature 505, 559-563.
de Vos, W.M., de Vos, E.A.J., 2012. Role of the
intestinal microbiome in health and disease:
from correlation to causation. Nutr. Rev. 70,
S45-S56.
Degnan, P.H., Taga, M.E., Goodman, A.L., 2014.
Vitamin B 12 as a Modulator of Gut
Microbial Ecology. Cell Metab. 20, 769-778.
Derrien, M., Vaughan, E.E., Plugge, C.M., de
Vos, W.M., 2004. Akkermansia muciniphila
gen. nov., sp. nov., a human intestinal mucin-
degrading bacterium. Int. J. Syst. Evol.
Microbiol. 54, 1469-1476.
Dewulf, E.M., Cani, P.D., Claus, S.P., Fuentes, S.,
Puylaert, P.G., Neyrinck, A.M., Bindels,
L.B., de Vos, W.M., Gibson, G.R., Thissen,
J.P., et al., 2013. Insight into the prebiotic
concept: lessons from an exploratory, double
blind intervention study with inulin-type
fructans in obese women. Gut 62, 1112-
1121.
Dimidi, E., Christodoulides, S., Fragkos, K.C.,
Scott, S.M., Whelan, K., 2014. The effect of
probiotics on functional constipation in
adults: a systematic review and meta-analysis
of randomized controlled trials. Am. J. Clin.
Nutr. 100, 1075-1084.
dos Santos, F.B., de Vos, W.M., Teusink, B.,
2013. Towards metagenome-scale models for
industrial applications—the case of Lactic
Acid Bacteria. Curr. Opin. Biotechnol. 24,
200-206.
dos Santos, V.M., Müller, M., De Vos, W.M.,
2010. Systems biology of the gut: the
interplay of food, microbiota and host at the
mucosal interface. Curr. Opin. Biotechnol.
21, 539-550.
Duncan, S.H., Louis, P., Thomson, J.M., Flint,
H.J., 2009. The role of pH in determining the
species composition of the human colonic
microbiota. Environ. Microbiol. 11, 2112-
2122.
Edgar, R.C., 2010. Search and clustering orders of
magnitude faster than BLAST.
Bioinformatics 26, 2460-2461.
Ellrott, K., Jaroszewski, L., Li, W., Wooley, J.C.,
Godzik, A., 2010. Expansion of the protein
repertoire in newly explored environments:
human gut microbiome specific protein
families. PLoS Comput. Biol. 6, e1000798.
Erickson, A.R., Cantarel, B.L., Lamendella, R.,
Darzi, Y., Mongodin, E.F., Pan, C., Shah,
M., Halfvarson, J., Tysk, C., Henrissat, B., et
al., 2012. Integrated metagenomics/
metaproteomics reveals human host-
microbiota signatures of Crohn's disease.
PLoS One 7, e49138.
Estaki, M., DeCoffe, D., Gibson, D.L., 2014.
Interplay between intestinal alkaline
phosphatase, diet, gut microbes and
immunity. World J. Gastroenterol. 20,
15650-15656.
Everard, A., Cani, P.D., 2013. Diabetes, obesity
and gut microbiota. Best. Pract. Res. Clin.
Gastroenterol. 27, 73-83.
Faraway, J.J., 2002. Practical Regression and
ANOVA using R. Published on the URL:
http://www.cran.rproject.org/doc/contrib/Far
away-PRA.pdf.
Faust, K., Sathirapongsasuti, J.F., Izard, J.,
Segata, N., Gevers, D., Raes, J.,
Huttenhower, C., 2012. Microbial co-
occurrence relationships in the human
microbiome. PLoS Computat. Biol. 8,
e1002606.
Feig, M.A., Hammer, E., Völker, U., Jehmlich,
N., 2013. In-depth proteomic analysis of the
human cerumen—a potential novel
diagnostically relevant biofluid. J.
Proteomics 83, 119-129.
Fernandes, J., Su, W., Rahat-Rozenbloom, S.,
Wolever, T., Comelli, E., 2014. Adiposity,
gut microbiota and faecal short chain fatty
acids are linked in adult humans. Nutr
Diabetes 4, e121.
Ferrer, M., Ruiz, A., Lanza, F., Haange, S.,
Oberbach, A., Till, H., Bargiela, R., Campoy,
C., Segura, M.T., Richter, M., et al., 2013.
Microbiota from the distal guts of lean and
obese adolescents exhibit partial functional
redundancy besides clear differences in
community structure. Environ. Microbiol.
15, 211-226.
Finn, R.D., Bateman, A., Clements, J., Coggill, P.,
Eberhardt, R.Y., Eddy, S.R., Heger, A.,
Hetherington, K., Holm, L., Mistry, J., et al.,
2014. Pfam: the protein families database.
Nucleic Acids Res. 42, D222-30.
Finucane, M.M., Sharpton, T.J., Laurent, T.J.,
Pollard, K.S., 2014. A taxonomic signature
of obesity in the microbiome? Getting to the
guts of the matter. PloS One 9, e84689.
Foell, D., Wittkowski, H., Roth, J., 2009.
Monitoring disease activity by stool
analyses: from occult blood to molecular
markers of intestinal inflammation and
damage. Gut 58, 859-868.
Franzosa, E.A., Morgan, X.C., Segata, N.,
Waldron, L., Reyes, J., Earl, A.M.,
Giannoukos, G., Boylan, M.R., Ciulla, D.,
Gevers, D., et al., 2014. Relating the
metatranscriptome and metagenome of the
human gut. Proc. Natl. Acad. Sci. U.S.A.
111, E2329-38.
Gaci, N., Borrel, G., Tottey, W., O’Toole, P.W.,
Brugère, J., 2014. Archaea and the human
gut: New beginning of an old story. World J.
Gastroenterol: WJG 20, 16062.
Galperin, M.Y., Makarova, K.S., Wolf, Y.I.,
Koonin, E.V., 2015. Expanded microbial
genome coverage and improved protein
family annotation in the COG database.
Nucleic Acids Res. 43, D261-9.
Gao, X., Pujos-Guillot, E., Martin, J., Galan, P.,
Juste, C., Jia, W., Sebedio, J., 2009.
Metabolite analysis of human fecal water by
gas chromatography/mass spectrometry with
ethyl chloroformate derivatization. Anal.
Biochem. 393, 163-175.
Geer, L.Y., Markey, S.P., Kowalak, J.A., Wagner,
L., Xu, M., Maynard, D.M., Yang, X., Shi,
W., Bryant, S.H., 2004. Open mass
spectrometry search algorithm. J. Proteome
Res 3, 958-964.
Georges, A.A., El-Swais, H., Craig, S.E., Li,
W.K., Walsh, D.A., 2014. Metaproteomic
analysis of a winter to spring succession in
coastal northwest Atlantic Ocean microbial
plankton. ISME J. 8, 1301-1313.
Gerritsen, J., Smidt, H., Rijkers, G.T., de Vos,
W.M., 2011. Intestinal microbiota in human
health and disease: the impact of probiotics.
Genes Nutr. 6, 209-240.
Gill, S.R., Pop, M., Deboy, R.T., Eckburg, P.B.,
Turnbaugh, P.J., Samuel, B.S., Gordon, J.I.,
Relman, D.A., Fraser-Liggett, C.M., Nelson,
K.E., 2006. Metagenomic analysis of the
human distal gut microbiome. Science 312,
1355-1359.
Glatter, T., Ludwig, C., Ahrne, E., Aebersold, R.,
Heck, A.J., Schmidt, A., 2012. Large-scale
quantitative assessment of different in-
solution protein digestion protocols reveals
superior cleavage efficiency of tandem Lys-
C/trypsin proteolysis over trypsin digestion.
J. Proteome Res. 11, 5145-5156.
Goel, A., Eckhardt, T.H., Puri, P., Jong, A.,
Santos, F.B., Giera, M., Fusetti, F., de Vos,
W.M., Kok, J., Poolman, B., et al., 2015.
Protein costs do not explain evolution of
metabolic strategies and regulation of
ribosomal content. Mol. Microbiol.
Graf, D., Di Cagno, R., Fåk, F., Flint, H.J.,
Nyman, M., Saarela, M., Watzl, B., 2015.
Contribution of diet to the composition of the
human gut microbiota. Microb. Ecol. Health
Dis. 26.
Graham, C., McMullan, G., Graham, R.L.J., 2011.
Proteomics in the microbial sciences.
Bioeng. Bugs, 17-30.
Guthals, A., Bandeira, N., 2012. Peptide
identification by tandem mass spectrometry
with alternate fragmentation modes. Mol.
Cell. Proteomics 11, 550-557.
Haange, S., Oberbach, A., Schlichting, N.,
Hugenholtz, F., Smidt, H., von Bergen, M.,
Till, H., Seifert, J., 2012. Metaproteome
analysis and molecular genetics of rat
intestinal microbiota reveals section and
localization resolved species distribution and
enzymatic functionalities. J. Proteome Res.
11, 5406-5417.
Hampton-Marcell, J.T., Moormann, S.M., Owens,
S.M., Gilbert, J.A., 2013. Preparation and
metatranscriptomic analyses of host-microbe
systems. Methods Enzymol. 531, 169-185.
Hansen, S.H., Stensballe, A., Nielsen, P.H.,
Herbst, F., 2014. Metaproteomics:
Evaluation of protein extraction from
activated sludge. Proteomics 14, 2535-2539.
Harden, C.J., Perez-Carrion, K., Babakordi, Z.,
Plummer, S.F., Hepburn, N., Barker, M.E.,
Wright, P.C., Evans, C.A., Corfe, B.M.,
2012. Evaluation of the salivary proteome as
a surrogate tissue for systems biology
approaches to understanding appetite. J.
Proteomics 75, 2916-2923.
Hercberg, S., Galan, P., Preziosi, P., Bertrais, S.,
Mennen, L., Malvy, D., Roussel, A., Favier,
A., Briançon, S., 2004. The SU. VI. MAX
Study: a randomized, placebo-controlled trial
of the health effects of antioxidant vitamins
and minerals. Arch. Intern. Med. 164, 2335-
2342.
Huang, J., Wang, F., Ye, M., Zou, H., 2014.
Enrichment and separation techniques for
large-scale proteomics analysis of the protein
post-translational modifications. J.
Chromatogr. A. 1372, 1-17.
Human Microbiome Jumpstart Reference Strains
Consortium, Nelson, K.E., Weinstock, G.M.,
Highlander, S.K., Worley, K.C., Creasy,
H.H., Wortman, J.R., Rusch, D.B., Mitreva,
M., Sodergren, E., Chinwalla, A.T.,
Feldgarden, M., Gevers, D., Haas, B.J.,
Madupu, R., Ward, D.V., Birren, B.W.,
Gibbs, R.A., Methe, B., Petrosino, J.F.,
Strausberg, R.L., Sutton, G.G., White, O.R.,
Wilson, R.K., Durkin, S., Giglio, M.G.,
Gujja, S., Howarth, C., Kodira, C.D.,
Kyrpides, N., Mehta, T., Muzny, D.M.,
Pearson, M., Pepin, K., Pati, A., Qin, X.,
Yandava, C., Zeng, Q., Zhang, L., Berlin,
A.M., Chen, L., Hepburn, T.A., Johnson, J.,
McCorrison, J., Miller, J., Minx, P.,
Nusbaum, C., Russ, C., Sykes, S.M.,
Tomlinson, C.M., Young, S., Warren, W.C.,
Badger, J., Crabtree, J., Markowitz, V.M.,
Orvis, J., Cree, A., Ferriera, S., Fulton, L.L.,
Fulton, R.S., Gillis, M., Hemphill, L.D.,
Joshi, V., Kovar, C., Torralba, M.,
Wetterstrand, K.A., Abouellleil, A., Wollam,
A.M., Buhay, C.J., Ding, Y., Dugan, S.,
FitzGerald, M.G., Holder, M., Hostetler, J.,
Clifton, S.W., Allen-Vercoe, E., Earl, A.M.,
Farmer, C.N., Liolios, K., Surette, M.G., Xu,
Q., Pohl, C., Wilczek-Boney, K., Zhu, D.,
2010. A catalog of reference genomes from
the human microbiome. Science 328, 994-
999.
Human Microbiome Project Consortium, 2012.
Structure, function and diversity of the
healthy human microbiome. Nature 486,
207-214.
Huson, D.H., Weber, N., 2013. Microbial
community analysis using MEGAN.
Methods Enzymol. 531, 465-485.
Integrative HMP (iHMP) Research Network
Consortium, 2014. The Integrative Human
Microbiome Project: dynamic analysis of
microbiome-host omics profiles during
periods of human health and disease. Cell
Host Microbe 16, 276-289.
Jagtap, P., Goslinga, J., Kooren, J.A., McGowan,
T., Wroblewski, M.S., Seymour, S.L.,
Griffin, T.J., 2013. A two‐step database
search method improves sensitivity in
peptide sequence matches for
metaproteomics and proteogenomics studies.
Proteomics 13, 1352-1357.
Jalanka-Tuovinen, J., Salojärvi, J., Salonen, A.,
Immonen, O., Garsed, K., Kelly, F.M.,
Zaitoun, A., Palva, A., Spiller, R.C., de Vos,
W.M., 2014. Faecal microbiota composition
and host-microbe cross-talk following
gastroenteritis and in postinfectious irritable
bowel syndrome. Gut 63, 1737-1745.
Jalanka-Tuovinen, J., Salonen, A., Nikkilä, J.,
Immonen, O., Kekkonen, R., Lahti, L.,
Palva, A., de Vos, W.M., 2011. Intestinal
microbiota in healthy adults: temporal
analysis reveals individual and common core
and relation to intestinal symptoms. PLoS
One 6, e23035.
Jarocki, V.M., Tacchi, J.L., Djordjevic, S.P.,
2014. Non‐proteolytic functions of microbial
proteases increases pathological complexity.
Proteomics 15, 1075-1088.
Jensen, L.J., Julien, P., Kuhn, M., von Mering, C.,
Muller, J., Doerks, T., Bork, P., 2008.
eggNOG: automated construction and
annotation of orthologous groups of genes.
Nucleic Acids Res. 36, D250-4.
Jeong, K., Kim, S., Bandeira, N., 2012. False
discovery rates in spectral identification.
BMC Bioinformatics 13 Suppl 16, S2.
Jewett, M.C., Miller, M.L., Chen, Y., Swartz, J.R.,
2009. Continued protein synthesis at low
[ATP] and [GTP] enables cell adaptation
during energy limitation. J. Bacteriol. 191,
1083-1091.
Jiménez, E., Fernández, L., Marín, M.L., Martín,
R., Odriozola, J.M., Nueno-Palop, C.,
Narbad, A., Olivares, M., Xaus, J.,
Rodríguez, J.M., 2005. Isolation of
commensal bacteria from umbilical cord
blood of healthy neonates born by cesarean
section. Curr. Microbiol. 51, 270-274.
Jiménez-Girón, A., Ibáñez, C., Cifuentes, A.,
Simó, C., Muñoz-González, I., Martín-
Álvarez, P.J., Bartolome, B., Moreno
Arribas, M.V., 2014. Faecal metabolomic
fingerprint after moderate consumption of
red wine by healthy subjects. J. Proteome
Res. 14, 897-905.
Joslin, E.P., 1901. The Influence of Bile on
Metabolism. J. Exp. Med. 5, 513-525.
Jost, T., Lacroix, C., Braegger, C., Chassard, C.,
2014. Stability of the maternal gut
microbiota during late pregnancy and early
lactation. Curr. Microbiol. 68, 419-427.
Jumpertz, R., Le, D.S., Turnbaugh, P.J., Trinidad,
C., Bogardus, C., Gordon, J.I., Krakoff, J.,
2011. Energy-balance studies reveal
associations between gut microbes, caloric
load, and nutrient absorption in humans. Am.
J. Clin. Nutr. 94, 58-65.
Juste, C., Kreil, D.P., Beauvallet, C., Guillot, A.,
Vaca, S., Carapito, C., Mondot, S., Sykacek,
P., Sokol, H., Blon, F., et al., 2014. Bacterial
protein signals are associated with Crohn's
disease. Gut 63, 1566-1577.
Käll, L., Storey, J.D., Noble, W.S., 2008. Non-
parametric estimation of posterior error
probabilities associated with peptides
identified by tandem mass spectrometry.
Bioinformatics 24, i42-8.
Kam, S.Y., Hennessy, T., Chua, S.C., Gan, C.S.,
Philp, R., Hon, K.K., Lai, L., Chan, W.H.,
Ong, H.S., Wong, W.K., et al., 2011.
Characterization of the human gastric fluid
proteome reveals distinct pH-dependent
protein profiles: implications for biomarker
studies. J. Proteome Res. 10, 4535-4546.
Kanehisa, M., Goto, S., Kawashima, S., Okuno,
Y., Hattori, M., 2004. The KEGG resource
for deciphering the genome. Nucleic Acids
Res. 32, D277-80.
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M.,
Tanabe, M., 2012. KEGG for integration and
interpretation of large-scale molecular data
sets. Nucleic Acids Res. 40, D109-D114.
Karlsson, F.H., Tremaroli, V., Nookaew, I.,
Bergström, G., Behre, C.J., Fagerberg, B.,
Nielsen, J., Bäckhed, F., 2013. Gut
metagenome in European women with
normal, impaired and diabetic glucose
control. Nature 498, 99-103.
Karlsson, F.H., Nookaew, I., Nielsen, J., 2014.
Metagenomic Data Utilization and Analysis
(MEDUSA) and Construction of a Global
Gut Microbial Gene Catalogue. PLoS
Comput. Biol. 10, e1003706.
Kassinen, A., Krogius-Kurikka, L., Mäkivuokko,
H., Rinttila, T., Paulin, L., Corander, J.,
Malinen, E., Apajalahti, J., Palva, A., 2007.
The fecal microbiota of irritable bowel
syndrome patients differs significantly from
that of healthy subjects. Gastroenterology
133, 24-33.
Keiblinger, K.M., Wilhartitz, I.C., Schneider, T.,
Roschitzki, B., Schmid, E., Eberl, L., Riedel,
K., Zechmeister-Boltenstern, S., 2012. Soil
metaproteomics - Comparative evaluation of
protein extraction protocols. Soil Biol.
Biochem. 54, 14-24.
Kekkonen, R.A., Lummela, N., Karjalainen, H.,
Latvala, S., Tynkkynen, S., Jarvenpaa, S.,
Kautiainen, H., Julkunen, I., Vapaatalo, H.,
Korpela, R., 2008a. Probiotic intervention
has strain-specific anti-inflammatory effects
in healthy adults. World J. Gastroenterol. 14,
2029-2036.
Kekkonen, R.A., Sysi-Aho, M., Seppanen-Laakso,
T., Julkunen, I., Vapaatalo, H., Oresic, M.,
Korpela, R., 2008b. Effect of probiotic
Lactobacillus rhamnosus GG intervention on
global serum lipidomic profiles in healthy
adults. World J. Gastroenterol. 14, 3188-
3194.
Kim, J., An, H.J., Garrido, D., German, J.B.,
Lebrilla, C.B., Mills, D.A., 2013. Proteomic
analysis of Bifidobacterium longum subsp.
infantis reveals the metabolic insight on
consumption of prebiotics and host glycans.
PloS One 8, e57535.
Klaassens, E.S., de Vos, W.M., Vaughan, E.E.,
2007. Metaproteomics approach to study the
functionality of the microbiota in the human
infant gastrointestinal tract. Appl. Environ.
Microbiol. 73, 1388-1392.
Koonin, E.V., Fedorova, N.D., Jackson, J.D.,
Jacobs, A.R., Krylov, D.M., Makarova, K.S.,
Mazumder, R., Mekhedov, S.L., Nikolskaya,
A.N., Rao, B.S., et al., 2004. A
comprehensive evolutionary classification of
proteins encoded in complete eukaryotic
genomes. Genome Biol. 5, R7.
Koponen, J., Laakso, K., Koskenniemi, K.,
Kankainen, M., Savijoki, K., Nyman, T.A.,
de Vos, W.M., Tynkkynen, S., Kalkkinen,
N., Varmanen, P., 2012. Effect of acid stress
on protein expression and phosphorylation in
Lactobacillus rhamnosus GG. J. Proteomics
75, 1357-1374.
Korpela, K., Flint, H.J., Johnstone, A.M., Lappi,
J., Poutanen, K., Dewulf, E., Delzenne, N.,
de Vos, W.M., Salonen, A., 2014. Gut
microbiota signatures predict host and
microbiota responses to dietary interventions
in obese individuals. PloS One 9, e90702.
Kort, R., Caspers, M., van de Graaf, A., van
Egmond, W., Keijser, B., Roeselers, G.,
2014. Shaping the oral microbiota through
intimate kissing. Microbiome 2, 41.
Kortman, G.A., Raffatellu, M., Swinkels, D.W.,
Tjalsma, H., 2014. Nutritional iron turned
inside out: intestinal stress from a gut
microbial perspective. FEMS Microbiol.
Rev. 38, 1202-1234.
Koskenniemi, K., Koponen, J., Kankainen, M.,
Savijoki, K., Tynkkynen, S., de Vos, W.M.,
Kalkkinen, N., Varmanen, P., 2009.
Proteome analysis of Lactobacillus
rhamnosus GG using 2-D DIGE and mass
spectrometry shows differential protein
production in laboratory and industrial-type
growth media. J. Proteome Res. 8, 4993-
5007.
Koskenniemi, K., Laakso, K., Koponen, J.,
Kankainen, M., Greco, D., Auvinen, P.,
Savijoki, K., Nyman, T.A., Surakka, A.,
Salusjarvi, T., de Vos, W.M., Tynkkynen, S.,
Kalkkinen, N., Varmanen, P., 2011.
Proteomics and transcriptomics
characterization of bile stress response in
probiotic Lactobacillus rhamnosus GG. Mol.
Cell. Proteomics 10, M110.002741.
Kovatcheva‐Datchary, P., Egert, M., Maathuis,
A., Rajilić‐Stojanović, M., De Graaf, A.A.,
Smidt, H., De Vos, W.M., Venema, K.,
2009. Linking phylogenetic identities of
bacteria to starch fermentation in an in vitro
model of the large intestine by RNA‐based
stable isotope probing. Environ. Microbiol.
11, 914-926.
Kurokawa, K., Itoh, T., Kuwahara, T., Oshima,
K., Toh, H., Toyoda, A., Takami, H., Morita,
H., Sharma, V.K., Srivastava, T.P., et al.,
2007. Comparative metagenomics revealed
commonly enriched gene sets in human gut
microbiomes. DNA Res. 14, 169-181.
Lahti, L., Salonen, A., Kekkonen, R.A., Salojärvi,
J., Jalanka-Tuovinen, J., Palva, A., Orešič,
M., de Vos, W.M., 2013a. Associations
between the human intestinal microbiota,
Lactobacillus rhamnosus GG and serum
lipids indicated by integrated analysis of
high-throughput profiling data. PeerJ 1, e32.
Lahti, L., Torrente, A., Elo, L.L., Brazma, A.,
Rung, J., 2013b. A fully scalable online pre-
processing algorithm for short
oligonucleotide microarray atlases. Nucleic
Acids Res. 41, e110.
Lallès, J., 2014. Intestinal alkaline phosphatase:
novel functions and protective effects. Nutr.
Rev. 72, 82-94.
Lam, H., Deutsch, E.W., Eddes, J.S., Eng, J.K.,
Stein, S.E., Aebersold, R., 2008. Building
consensus spectral libraries for peptide
identification in proteomics. Nat. Methods 5,
873-875.
Land, M., Hauser, L., Jun, S., Nookaew, I., Leuze,
M.R., Ahn, T., Karpinets, T., Lund, O., Kora,
G., Wassenaar, T., et al., 2015. Insights from
20 years of bacterial genome sequencing.
Funct. Integr. Genomics, 1-21.
Lawley, T.D., Walker, A.W., 2013. Intestinal
colonization resistance. Immunology 138, 1-
11.
Le Chatelier, E., Nielsen, T., Qin, J., Prifti, E.,
Hildebrand, F., Falony, G., Almeida, M.,
Arumugam, M., Batto, J., Kennedy, S., et al.,
2013. Richness of human gut microbiome
correlates with metabolic markers. Nature
500, 541-546.
LeBlanc, J.G., Milani, C., de Giori, G.S., Sesma,
F., van Sinderen, D., Ventura, M., 2013.
Bacteria as vitamin suppliers to their host: a
gut microbiota perspective. Curr. Opin.
Biotechnol. 24, 160-168.
Lee, J.H., Karamychev, V.N., Kozyavkin, S.A.,
Mills, D., Pavlov, A.R., Pavlova, N.V.,
Polouchine, N.N., Richardson, P.M.,
Shakhova, V.V., Slesarev, A.I., et al., 2008.
Comparative genomic analysis of the gut
bacterium Bifidobacterium longum reveals
loci susceptible to deletion during pure
culture growth. BMC Genomics 9, 247.
Le Maréchal, C., Peton, V., Plé, C., Vroland, C.,
Jardin, J., Briard-Bion, V., Durant, G.,
Chuat, V., Loux, V., Foligné, B., et al., 2015.
Surface proteins of Propionibacterium
freudenreichii are involved in its anti-
inflammatory properties. Journal of
proteomics 113, 447-461.
Letunic, I., Yamada, T., Kanehisa, M., Bork, P.,
2008. iPath: interactive exploration of
biochemical pathways and networks. Trends
Biochem. Sci. 33, 101-103.
Levandowsky, M., Winter, D., 1971. Distance
between sets. Nature 234, 34-35.
Li, J., Jia, H., Cai, X., Zhong, H., Feng, Q.,
Sunagawa, S., Arumugam, M., Kultima, J.R.,
Prifti, E., Nielsen, T., et al., 2014. An
integrated catalog of reference genes in the
human gut microbiome. Nat. Biotechnol. 32,
834-841.
Li, T., Chiang, J.Y., 2015. Bile acids as metabolic
regulators. Curr. Opin. Gastroenterol. 31,
159-165.
Li, X., LeBlanc, J., Truong, A., Vuthoori, R.,
Chen, S.S., Lustgarten, J.L., Roth, B., Allard,
J., Ippoliti, A., Presley, L.L., et al., 2011. A
metaproteomic approach to study human-
microbial ecosystems at the mucosal luminal
interface. PLoS One 6, e26542.
Lichtman, J.S., Marcobal, A., Sonnenburg, J.L.,
Elias, J.E., 2013. Host-centric proteomics of
stool: a novel strategy focused on intestinal
responses to the gut microbiota. Mol. Cell.
Proteomics 12, 3310-3318.
Liebler, D.C., Zimmerman, L.J., 2013. Targeted
quantitation of proteins by mass
spectrometry. Biochemistry (N.Y.) 52, 3797-
3806.
Lindenstrauß, A.G., Behr, J., Ehrmann, M.A.,
Haller, D., Vogel, R.F., 2013. Identification
of fitness determinants in Enterococcus
faecalis by differential proteomics. Arch.
Microbiol. 195, 121-130.
Louis, P., Young, P., Holtrop, G., Flint, H.J.,
2010. Diversity of human colonic butyrate‐producing bacteria revealed by analysis of
the butyryl‐CoA: acetate CoA‐transferase
gene. Environ. Microbiol. 12, 304-314.
Lozupone, C.A., Stombaugh, J., Gonzalez, A.,
Ackermann, G., Wendel, D., Vazquez-Baeza,
Y., Jansson, J.K., Gordon, J.I., Knight, R.,
2013. Meta-analyses of studies of the human
microbiota. Genome Res. 23, 1704-1714.
Machiels, K., Joossens, M., Sabino, J., De Preter,
V., Arijs, I., Eeckhaut, V., Ballet, V., Claes,
K., Van Immerseel, F., Verbeke, K., et al.,
2014. A decrease of the butyrate-producing
species Roseburia hominis and
Faecalibacterium prausnitzii defines
dysbiosis in patients with ulcerative colitis.
Gut 63, 1275-1283.
Mahowald, M.A., Rey, F.E., Seedorf, H.,
Turnbaugh, P.J., Fulton, R.S., Wollam, A.,
Shah, N., Wang, C., Magrini, V., Wilson,
R.K., et al., 2009. Characterizing a model
human gut microbiota composed of members
of its two dominant bacterial phyla. Proc.
Natl. Acad. Sci. U.S.A. 106, 5859-5864.
Marco, M.L., de Vries, M.C., Wels, M.,
Molenaar, D., Mangell, P., Ahrne, S., de
Vos, W.M., Vaughan, E.E., Kleerebezem,
M., 2010. Convergence in probiotic
Lactobacillus gut-adaptive responses in
humans and mice. ISME J. 4, 1481-1484.
Martens, E.C., Kelly, A.G., Tauzin, A.S., Brumer,
H., 2014. The devil lies in the details: how
variations in polysaccharide fine-structure
impact the physiology and evolution of gut
microbes. J. Mol. Biol. 426, 3851-3865.
McNulty, N.P., Wu, M., Erickson, A.R., Pan, C.,
Erickson, B.K., Martens, E.C., Pudlo, N.A.,
Muegge, B.D., Henrissat, B., Hettich, R.L.,
et al., 2013. Effects of diet on resource
utilization by a model human gut microbiota
containing Bacteroides cellulosilyticus WH2,
a symbiont with an extensive glycobiome.
PLoS Biol. 11, e1001637.
Mesuere, B., Devreese, B., Debyser, G., Aerts,
M., Vandamme, P., Dawyndt, P., 2012.
Unipept: Tryptic Peptide-Based Biodiversity
Analysis of Metaproteome Samples. J.
Proteome Res. 11, 5773-5780.
Michaels, J.A., Dasari, S., Pereira, L., Reddy,
A.P., Lapidus, J.A., Lu, X., Jacob, T.,
Thomas, A., Rodland, M., Roberts, C.T., et
al., 2007. Comprehensive proteomic analysis
of the human amniotic fluid proteome:
gestational age-dependent changes. J.
Proteome Res. 6, 1277-1285.
Michalik, S., Bernhardt, J., Otto, A., Moche, M.,
Becher, D., Meyer, H., Lalk, M., Schurmann,
C., Schlüter, R., Kock, H., et al., 2012. Life
and death of proteins: a case study of
glucose-starved Staphylococcus aureus. Mol.
Cell. Proteomics 11, 558-570.
Moreno-Indias, I., Cardona, F., Tinahones, F.J.,
Queipo-Ortuño, M.I., 2014. Impact of the gut
microbiota on the development of obesity
and type 2 diabetes mellitus. Front.
Microbiol. 5.
Morgan, X.C., Segata, N., Huttenhower, C., 2012.
Biodiversity and functional genomics in the
human microbiome. Trends Genet. 29, 51-
58.
Murayama, K., Fujimura, T., Morita, M., Shindo,
N., 2001. One‐step subcellular fractionation
of rat liver tissue using a Nycodenz density
gradient prepared by freezing‐thawing and
two‐dimensional sodium dodecyl sulfate
electrophoresis profiles of the main fraction
of organelles. Electrophoresis 22, 2872-2880.
Muth, T., Kolmeder, C.A., Salojärvi, J., Keskitalo,
S., Varjosalo, M., Verdam, F.J., Rensen,
S.S., Reichl, U., Vos, W.M., Rapp, E., 2015.
Navigating through metaproteomics data–a
logbook of database searching. Proteomics.
Muth, T., Behne, A., Heyer, R., Kohrs, F.,
Benndorf, D., Hoffmann, M., Lehtevä, M.,
Reichl, U., Martens, L., Rapp, E., 2015. The
MetaProteomeAnalyzer: a powerful open-
source software suite for metaproteomics
data analysis and interpretation. J. Proteome
Res. 14, 1557-1565.
Muth, T., Peters, J., Blackburn, J., Rapp, E.,
Martens, L., 2013. ProteoCloud: A full-
featured open source proteomics cloud
computing pipeline. J. Proteomics 88, 104-
108.
Nabokina, S.M., Inoue, K., Subramanian, V.S.,
Valle, J.E., Yuasa, H., Said, H.M., 2014.
Molecular identification and functional
characterization of the human colonic
thiamine pyrophosphate transporter. J. Biol.
Chem. 289, 4405-4416.
Nagarajan, N., Pop, M., 2013. Sequence assembly
demystified. Nat. Rev. Genet. 14, 157-167.
Neijssel, O.M., Mattos, M., 1994. The energetics
of bacterial growth: a reassessment. Mol.
Microbiol. 13, 179-182.
Neilson, K.A., Ali, N.A., Muralidharan, S.,
Mirzaei, M., Mariani, M., Assadourian, G.,
Lee, A., Van Sluyter, S.C., Haynes, P.A.,
2011. Less label, more free: approaches in
label‐free quantitative mass spectrometry.
Proteomics 11, 535-553.
Nesvizhskii, A.I., 2014. Proteogenomics:
concepts, applications and computational
strategies. Nat. Methods 11, 1114-1125.
Neville, B.A., Forde, B.M., Claesson, M.J.,
Darby, T., Coghlan, A., Nally, K., Ross,
R.P., O’Toole, P.W., 2012. Characterization
of pro-inflammatory flagellin proteins
produced by Lactobacillus ruminis and
related motile Lactobacilli. PloS One 7,
e40592.
Nguyen, T.L., Vieira-Silva, S., Liston, A., Raes,
J., 2015. How informative is the mouse for
human gut microbiota research? Dis. Model.
Mech. 8, 1-16.
Nielsen, H.B., Almeida, M., Juncker, A.S.,
Rasmussen, S., Li, J., Sunagawa, S., Plichta,
D.R., Gautier, L., Pedersen, A.G., Le
Chatelier, E., 2014. Identification and
assembly of genomes and genetic elements
in complex metagenomic samples without
using reference genomes. Nat. Biotechnol.
32, 822-828.
Norin, K., Gustafsson, B., Lindblad, B., Midtvedt,
T., 1985. The establishment of some
microflora associated biochemical
characteristics in feces from children during
the first years of life. Acta Paediatrica 74,
207-212.
Norman, J.M., Handley, S.A., Virgin, H.W., 2014.
Kingdom-agnostic metagenomics and the
importance of complete characterization of
enteric microbial communities.
Gastroenterology 146, 1459-1469.
Nylund, L., Satokari, R., Salminen, S., de Vos,
W.M., 2014. Intestinal microbiota during
early life–impact on health and disease. Proc.
Nutr. Soc. 73, 457-469.
O’Keefe, S.J.D, Li, J.V., Lahti, L., Ou, J.,
Carbonero, F., Mohammed, K., Posma, P.M.,
Kinross, J., Wahl, E., Ruder, E., et al., 2015.
Fat, Fiber and Cancer Risk in African
Americans and Rural Africans. Nature
Comm., 6.
Otto, A., Becher, D., Schmidt, F., 2014.
Quantitative proteomics in the field of
microbiology. Proteomics 14, 547-565.
Ouwerkerk, J.P., de Vos, W.M., Belzer, C., 2013.
Glycobiome: bacteria and mucus at the
epithelial interface. Best Pract. Res. Clin.
Gastroenterol. 27, 25-38.
Oveland, E., Muth, T., Rapp, E., Martens, L.,
Berven, F.S., Barsnes, H., 2015. Viewing the
proteome: How to visualize proteomics data?
Proteomics.
Panda, S., Casellas, F., Vivancos, J.L., Cors,
M.G., Santiago, A., Cuenca, S., Guarner, F.,
Manichanh, C., 2014. Short-Term Effect of
Antibiotics on Human Gut Microbiota. PloS
One 9, e95476.
Paradis, E., Claude, J., Strimmer, K., 2004. APE:
Analyses of Phylogenetics and Evolution in
R language. Bioinformatics 20, 289-290.
Paulo, J.A., Kadiyala, V., Banks, P.A., Conwell,
D.L., Steen, H., 2013. Mass spectrometry-
based quantitative proteomic profiling of
human pancreatic and hepatic stellate cell
lines. Genomics Proteomics Bioinformatics
11, 105-113.
Pearson, J.R., Gill, C.I., Rowland, I.R., 2009.
Diet, fecal water, and colon cancer–
development of a biomarker. Nutr. Rev. 67,
509-526.
Penders, J., Thijs, C., Vink, C., Stelma, F.F.,
Snijders, B., Kummeling, I., van den Brandt,
P.A., Stobberingh, E.E., 2006. Factors
influencing the composition of the intestinal
microbiota in early infancy. Pediatrics 118,
511-521.
Powell, S., Forslund, K., Szklarczyk, D.,
Trachana, K., Roth, A., Huerta-Cepas, J.,
Gabaldon, T., Rattei, T., Creevey, C., Kuhn,
M., et al., 2014. eggNOG v4.0: nested
orthology inference across 3686 organisms.
Nucleic Acids Res. 42, D231-9.
Qin, J., Li, R., Raes, J., Arumugam, M., Burgdorf,
K.S., Manichanh, C., Nielsen, T., Pons, N.,
Levenez, F., Yamada, T., et al., 2010. A
human gut microbial gene catalogue
established by metagenomic sequencing.
Nature 464, 59-65.
Qin, J., Li, Y., Cai, Z., Li, S., Zhu, J., Zhang, F.,
Liang, S., Zhang, W., Guan, Y., Shen, D., et
al., 2012. A metagenome-wide association
study of gut microbiota in type 2 diabetes.
Nature 490, 55-60.
Quercia, S., Candela, M., Giuliani, C., Turroni, S.,
Luiselli, D., Rampelli, S., Brigidi, P.,
Franceschi, C., Bacalini, M.G., Garagnani,
P., et al., 2014. From lifetime to evolution:
timescales of human gut microbiota
adaptation. Front. Microbiol. 5.
Rajilić-Stojanović, M., de Vos, W.M., 2014. The
first 1000 cultured species of the human
gastrointestinal microbiota. FEMS
Microbiol. Rev. 38, 996-1047.
Rajilić-Stojanović, M., Heilig, H.G., Molenaar,
D., Kajander, K., Surakka, A., Smidt, H., de
Vos, W.M., 2009. Development and
application of the human intestinal tract chip,
a phylogenetic microarray: analysis of
universally conserved phylotypes in the
abundant microbiota of young and elderly
adults. Environ. Microbiol. 11, 1736-1751.
Rampelli, S., Candela, M., Turroni, S., Biagi, E.,
Collino, S., Franceschi, C., O'Toole, P.W.,
Brigidi, P., 2013. Functional metagenomic
profiling of intestinal microbiome in extreme
ageing. Aging (Albany NY) 5, 902.
Rappé, M.S., Giovannoni, S.J., 2003. The
uncultured microbial majority. Annu. Rev.
Microbiol. 57, 369-394.
R-core team, R: A language and environment for
statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. URL
http://www.R-project.org/.
Reichardt, N., Duncan, S.H., Young, P.,
Belenguer, A., Leitch, C.M., Scott, K.P.,
Flint, H.J., Louis, P., 2014. Phylogenetic
distribution of three pathways for propionate
production within the human gut microbiota.
ISME J. 8, 1323-1335.
Reichhardt, M.P., Jarva, H., de Been, M., Miguel
Rodriguez, J., Quintana Jimenez, E.,
Loimaranta, V., de Vos, W.M., Meri, S.,
2014. The salivary scavenger and agglutinin
in early life: diverse roles in amniotic fluid
and in the infant intestine. J. Immunol. 61,
252-252.
Rezzi, S., Ramadan, Z., Martin, F.P., Fay, L.B.,
van Bladeren, P., Lindon, J.C., Nicholson,
J.K., Kochhar, S., 2007. Human metabolic
phenotypes link directly to specific dietary
preferences in healthy individuals. J.
Proteome Res. 6, 4469-4477.
Righetti, P.G., Candiano, G., Citterio, A.,
Boschetti, E., 2015. Combinatorial Peptide
Ligand Libraries as a “Trojan Horse” in
Deep Discovery Proteomics. Anal. Chem.
87, 293-305.
Rinttilä, T., Kassinen, A., Malinen, E., Krogius,
L., Palva, A., 2004. Development of an
extensive set of 16S rDNA‐targeted primers
for quantification of pathogenic and
indigenous bacteria in faecal samples by
real‐time PCR. J. Appl. Microbiol. 97, 1166-
1177.
Ritari, J., Lahti, L., de Vos, W.M., 2015.
Improved Taxonomic Assignment of Human
Intestinal 16s rRNA Sequence Reads by a
Dedicated Reference Database. Under
review.
Rodríguez-Valera, F., 2004. Environmental
genomics, the big picture? FEMS Microbiol.
Lett. 231, 153-158.
Rosenberger, G., Koh, C.C., Guo, T., Röst, H.L.,
Kouvonen, P., Collins, B.C., Heusel, M.,
Liu, Y., Caron, E., Vichalkovski, A., et al.,
2014. A repository of assays to quantify
10,000 human proteins by SWATH-MS.
Scientific Data 1, 140031.
Ruiz-Moyano, S., Tao, N., Underwood, M.A.,
Mills, D.A., 2012. Rapid discrimination of
Bifidobacterium animalis subspecies by
matrix-assisted laser desorption ionization-
time of flight mass spectrometry. Food
Microbiol. 30, 432-437.
Salonen, A., de Vos, W.M., 2014. Impact of diet
on human intestinal microbiota and health.
Annu. Rev. Food Sci. Technol. 5, 239-262.
Salonen, A., Lahti, L., Salojärvi, J., Holtrop, G.,
Korpela, K., Duncan, S.H., Date, P.,
Farquharson, F., Johnstone, A.M., Lobley,
G.E., et al., 2014. Impact of diet and
individual variation on intestinal microbiota
composition and fermentation products in
obese men. ISME J. 8, 2218-2230.
Salonen, A., Nikkilä, J., Jalanka-Tuovinen, J.,
Immonen, O., Rajilić-Stojanović, M.,
Kekkonen, R.A., Palva, A., de Vos, W.M.,
2010. Comparative analysis of fecal DNA
extraction methods with phylogenetic
microarray: effective recovery of bacterial
and archaeal DNA using mechanical cell
lysis. J. Microbiol. Methods 81, 127-134.
Scheperjans, F., Aho, V., Pereira, P.A., Koskinen,
K., Paulin, L., Pekkonen, E., Haapaniemi, E.,
Kaakkola, S., Eerola‐Rautio, J., Pohja, M., et
al., 2015. Gut microbiota are related to
Parkinson's disease and clinical phenotype.
Mov. Disord. 30, 350-358.
Schneider, T., Riedel, K., 2010. Environmental
proteomics: analysis of structure and
function of microbial communities.
Proteomics 10, 785-798.
Schnorr, S.L., Candela, M., Rampelli, S.,
Centanni, M., Consolandi, C., Basaglia, G.,
Turroni, S., Biagi, E., Peano, C., Severgnini,
M., et al., 2014. Gut microbiome of the
Hadza hunter-gatherers. Nat. Commun. 5,
3654.
Schoepfer, A.M., Schaffer, T., Seibold-Schmid,
B., Müller, S., Seibold, F., 2008. Antibodies
to flagellin indicate reactivity to bacterial
antigens in IBS patients. Neurogastroenterol.
Motil. 20, 1110-1118.
Scholtens, P.A., Oozeer, R., Martin, R., Amor,
K.B., Knol, J., 2012. The early settlers:
intestinal microbiology in early life. Annu.
Rev. Food Sci. Technol. 3, 425-447.
Schrimpe-Rutledge, A.C., Fontes, G., Gritsenko,
M.A., Norbeck, A.D., Anderson, D.J.,
Waters, K.M., Adkins, J.N., Smith, R.D.,
Poitout, V., Metz, T.O., 2012. Discovery of
novel glucose-regulated proteins in isolated
human pancreatic islets using LC–MS/MS-
based proteomics. J. Proteome Res. 11,
3520-3532.
Schuijt, T.J., van der Poll, T., de Vos, W.M.,
Wiersinga, W.J., 2013. The intestinal
microbiota and host immune interactions in
the critically ill. Trends Microbiol. 21, 221-
229.
Scott, K.P., Gratz, S.W., Sheridan, P.O., Flint,
H.J., Duncan, S.H., 2013. The influence of
diet on the gut microbiota. Pharmacol. Res.
69, 52-60.
Segata, N., Waldron, L., Ballarini, A.,
Narasimhan, V., Jousson, O., Huttenhower,
C., 2012. Metagenomic microbial
community profiling using unique clade-
specific marker genes. Nat. Methods 9, 811-
814.
Seifert, J., Taubert, M., Jehmlich, N., Schmidt, F.,
Völker, U., Vogt, C., Richnow, H., von
Bergen, M., 2012. Protein‐based stable
isotope probing (protein‐SIP) in functional
metaproteomics. Mass Spectrom. Rev. 31,
683-697.
Shevchenko, A., Tomas, H., Havlis, J., Olsen,
J.V., Mann, M., 2006. In-gel digestion for
mass spectrometric characterization of
proteins and proteomes. Nat. Protoc. 1, 2856-
2860.
Shteynberg, D., Nesvizhskii, A.I., Moritz, R.L.,
Deutsch, E.W., 2013. Combining results of
multiple search engines in proteomics. Mol.
Cell. Proteomics 12, 2383-2393.
Siggins, A., Gunnigle, E., Abram, F., 2012.
Exploring mixed microbial community
functioning: recent advances in
metaproteomics. FEMS Microbiol. Ecol. 80,
265-280.
Song, Y., Garg, S., Girotra, M., Maddox, C., von
Rosenvinge, E.C., Dutta, A., Dutta, S.,
Fricke, W.F., 2013. Microbiota dynamics in
patients treated with fecal microbiota
transplantation for recurrent Clostridium
difficile infection. PloS One 8, e81330.
Sonnenburg, E.D., Sonnenburg, J.L., 2014.
Starving our Microbial Self: The Deleterious
Consequences of a Diet Deficient in
Microbiota-Accessible Carbohydrates. Cell
Metab. 20, 779-786.
Speicher, K.D., Kolbas, O., Harper, S., Speicher,
D.W., 2000. Systematic analysis of peptide
recoveries from in-gel digestions for protein
identifications in proteome studies. J.
Biomol. Tech. 11, 74-86.
Steck, N., Mueller, K., Schemann, M., Haller, D.,
2012. Bacterial proteases in IBD and IBS.
Gut 61, 1610-1618.
Stephen, A.M., Cummings, J.H., 1980. The
microbial contribution to human faecal mass.
J. Med. Microbiol. 13, 45-56.
Tanca, A., Palomba, A., Pisanu, S., Addis, M.F.,
Uzzau, S., 2015. Enrichment or depletion?
The impact of stool pretreatment on
metaproteomic characterization of the human
gut microbiota. Proteomics.
Tanca, A., Palomba, A., Pisanu, S., Deligios, M.,
Fraumene, C., Manghina, V., Pagnozzi, D.,
Addis, M.F., Uzzau, S., 2014. A
straightforward and efficient analytical
pipeline for metaproteome characterization.
Microbiome 2, 1-16.
Tatusov, R.L., Koonin, E.V., Lipman, D.J., 1997.
A genomic perspective on protein families.
Science 278, 631-637.
Taverna, D.M., Goldstein, R.A., 2002. Why are
proteins marginally stable? Proteins 46, 105-
109.
Tims, S., Derom, C., Jonkers, D.M., Vlietinck, R.,
Saris, W.H., Kleerebezem, M., de Vos,
W.M., Zoetendal, E.G., 2012. Microbiota
conservation and BMI signatures in adult
monozygotic twins. ISME J. 7, 707-717.
Tissier, H., 1900. Recherches sur la flore
intestinale des nourrissons: état normal et
pathologique. G. Carré et C. Naud.
Turnbaugh, P.J., Hamady, M., Yatsunenko, T.,
Cantarel, B.L., Duncan, A., Ley, R.E., Sogin,
M.L., Jones, W.J., Roe, B.A., Affourtit, J.P.,
et al., 2009. A core gut microbiome in obese
and lean twins. Nature 457, 480-484.
Tyakht, A.V., Kostryukova, E.S., Popenko, A.S.,
Belenikin, M.S., Pavlenko, A.V., Larin,
A.K., Karpova, I.Y., Selezneva, O.V.,
Semashko, T.A., Ospanova, E.A., et al.,
2013. Human gut microbiota community
structures in urban and rural populations in
Russia. Nat. Comm. 4.
van Nood, E., Vrieze, A., Nieuwdorp, M.,
Fuentes, S., Zoetendal, E.G., de Vos, W.M.,
Visser, C.E., Kuijper, E.J., Bartelsman, J.F.,
Tijssen, J.G., et al., 2013. Duodenal infusion
of donor feces for recurrent Clostridium
difficile. N. Engl. J. Med. 368, 407-415.
van Ommen, B., van der Greef, J., Ordovas, J.M.,
Daniel, H., 2014. Phenotypic flexibility as
key factor in the human nutrition and health
relationship. Genes Nutr. 9, 1-9.
Vaudel, M., Sickmann, A., Martens, L., 2012.
Current methods for global proteome
identification. Expert Rev. Proteomics 9,
519-532.
Venables, W.N., Ripley, B.D., 2002. Modern
applied statistics with S. Springer.
Venema, K., Van den Abbeele, P., 2013.
Experimental models of the gut microbiome.
Best Pract. Res. Clin. Gastroenterol. 27, 115-
126.
Verdam, F.J., Fuentes, S., de Jonge, C., Zoetendal,
E.G., Erbil, R., Greve, J.W., Buurman, W.A.,
de Vos, W.M., Rensen, S.S., 2013. Human
intestinal microbiota composition is
associated with local and systemic
inflammation in obesity. Obesity 21, E607-
E615.
Vey, G., Charles, T.C., 2014. MetaProx: the
database of metagenomic proximons.
Database (Oxford) 2014,
10.1093/database/bau097.
Vijay-Kumar, M., Aitken, J.D., Gewirtz, A.T.,
2008. Toll like receptor-5: protecting the gut
from enteric microbes. Semin.
Immunopathol. 30, 11-21.
Vrieze, A., Van Nood, E., Holleman, F., Salojärvi,
J., Kootte, R.S., Bartelsman, J.F.W.M.,
Dallinga-Thie, G.M., Ackermans, M.T.,
Serlie, M.J., Oozeer, R., et al., 2012.
Transfer of intestinal microbiota from lean
donors increases insulin sensitivity in
subjects with metabolic syndrome.
Gastroenterology 143, 913-916.
Wacklin, P., Tuimala, J., Nikkilä, J., Tims, S.,
Mäkivuokko, H., Alakulppi, N., Laine, P.,
Rajilić-Stojanović, M., Paulin, L., de Vos,
W.M., et al., 2014. Faecal Microbiota
Composition in Adults Is Associated with the
FUT2 Gene Determining the Secretor Status.
PloS One 9, e94863.
Walker, A.W., Ince, J., Duncan, S.H., Webster,
L.M., Holtrop, G., Ze, X., Brown, D., Stares,
M.D., Scott, P., Bergerat, A., et al., 2011.
Dominant and diet-responsive groups of
bacteria within the human colonic
microbiota. ISME J. 5, 220-230.
Walters, W.A., Xu, Z., Knight, R., 2014. Meta-
analyses of human gut microbes associated
with obesity and IBD. FEBS Lett. 588, 4223-
4233.
Wang, M., Ahrne, S., Jeppsson, B., Molin, G.,
2005. Comparison of bacterial diversity
along the human intestinal tract by direct
cloning and sequencing of 16S rRNA genes.
FEMS Microbiol. Ecol. 54, 219-231.
Wang, M., Li, M., Wu, S., Lebrilla, C.B.,
Chapkin, R.S., Ivanov, I., Donovan, S.M.,
2015. Fecal Microbiota Composition of
Breast-fed Infants is Correlated with Human
Milk Oligosaccharides Consumed. J. Pediatr.
Gastroenterol. Nutr.
Warnecke, F., Luginbühl, P., Ivanova, N.,
Ghassemian, M., Richardson, T.H., Stege,
J.T., Cayouette, M., McHardy, A.C.,
Djordjevic, G., Aboushadi, N., et al., 2007.
Metagenomic and functional analysis of
hindgut microbiota of a wood-feeding higher
termite. Nature 450, 560-565.
Weng, M., Walker, W., 2013. The role of gut
microbiota in programming the immune
phenotype. J. Dev. Orig. Health Dis. 4, 203-
214.
Williams, R., Peisajovich, S.G., Miller, O.J.,
Magdassi, S., Tawfik, D.S., Griffiths, A.D.,
2006. Amplification of complex gene
libraries by emulsion PCR. Na. Methods 3,
545-550.
Wilmes, P., Bond, P.L., 2004. The application of
two-dimensional polyacrylamide gel
electrophoresis and downstream analyses to a
mixed community of prokaryotic
microorganisms. Environ. Microbiol. 6, 911-
920.
Wilmes, P., Bond, P.L., 2006. Metaproteomics:
studying functional gene expression in
microbial ecosystems. Trends Microbiol. 14,
92-97.
Wu, G.D., Compher, C., Chen, E.Z., Smith, S.A.,
Shah, R.D., Bittinger, K., Chehoud, C.,
Albenberg, L.G., Nessel, L., Gilroy, E., et
al., 2014. Comparative metabolomics in
vegans and omnivores reveal constraints on
diet-dependent gut microbiota metabolite
production. Gut , gutjnl-2014-308209.
Wu, J., An, Y., Yao, J., Wang, Y., Tang, H., 2010.
An optimised sample preparation method for
NMR-based faecal metabonomic analysis.
Analyst 135, 1023-1030.
Xiong, W., Giannone, R.J., Morowitz, M.J.,
Banfield, J.F., Hettich, R.L., 2014.
Development of an Enhanced Metaproteomic
Approach for Deepening the Microbiome
Characterization of the Human Infant Gut. J.
Proteome Res. 14, 133-141.
Yadav, A.K., Kumar, D., Dash, D., 2012.
Learning from decoys to improve the
sensitivity and specificity of proteomics
database search results. PloS One 7, e50651.
Yatsunenko, T., Rey, F.E., Manary, M.J., Trehan,
I., Dominguez-Bello, M.G., Contreras, M.,
Magris, M., Hidalgo, G., Baldassano, R.N.,
Anokhin, A.P., et al., 2012. Human gut
microbiome viewed across age and
geography. Nature 486, 222-227.
Yu, Z., Morrison, M., 2004. Improved extraction
of PCR-quality community DNA from
digesta and fecal samples. BioTechniques 36,
808-812.
Zoetendal, E.G., Akkermans, A.D., Akkermans-
van Vliet, W.M., de Visser, J., Arjan, G.M.,
de Vos, W.M., 2001. The host genotype
affects the bacterial community in the human
gastronintestinal tract. Microb. Ecol. Health
Dis. 13, 129-134.
Zoetendal, E.G., Raes, J., van den Bogert, B.,
Arumugam, M., Booijink, C.C., Troost, F.J.,
Bork, P., Wels, M., de Vos, W.M.,
Kleerebezem, M., 2012. The human small
intestinal microbiota is driven by rapid
uptake and conversion of simple
carbohydrates. ISME J. 6, 1415-1426.
Zoetendal, E.G., de Vos, W.M., 2014. Effect of
diet on the intestinal microbiota and its
activity. Curr. Opin. Gastroenterol. 30, 189-
195.