Metaproteomics of the Human Intestinal Tract to Assess … · 2017. 3. 11. · I ABSTRACT The human...

Department of Veterinary Biosciences

Faculty of Veterinary Medicine

University of Helsinki

Helsinki, Finland

Metaproteomics of the Human Intestinal Tract to Assess

Microbial Functionality and Interactions with the Host

Carolin Adriane Kolmeder

ACADEMIC DISSERTATION

To be presented, with the permission of the Faculty of Veterinary Medicine of the University of

Helsinki, for public examination in Walter auditorium, EE-building, Agnes Sjöberginkatu 2, Helsinki, on

May 28th, at 12 noon.

Helsinki 2015

Director of Studies:

Professor Airi Palva



Helsinki, Finland

Supervisors:

Professor Willem M. de Vos

Department of Veterinary Biosciences and RPU Immunobiology

Faculty of Medicine, University of Helsinki,

Helsinki, Finland and

Laboratory of Microbiology

Wageningen University

Wageningen, The Netherlands

Docent Anne Salonen

Department of Immunology and Bacteriology


Helsinki, Finland

Professor Airi Palva



Helsinki, Finland

Pre-examiners:

Docent Kirsi Savijoki

Department of Food and Environmental

Sciences


Helsinki, Finland

Associate Professor Koen Venema

Department of Human Biology

Maastricht University – Campus Venlo

Maastricht, The Netherlands

Opponent:

Docent Petri Auvinen

Institute of Biotechnology


Helsinki, Finland

Published in Dissertationes Schola Doctoralis Scientiae Circumiectalis, Alimentarie, Biologicae

ISBN 978-951-51-1169-2 (paperback) and 978-951-51-1170-8 (PDF)

ISSN 2342-5423 (print) and 2342-5431 (online)

http://ethesis.helsinki.fi

Cover: Talvinen auringonlasku Kolilla – Jochen Müller

Layout: Jochen Müller

Hansaprint

Helsinki, Finland 2015

Non nobis solum nati sumus. Cicero/Plato

ABSTRACT ...................................................................................................................................................... I

LIST OF ORIGINAL PUBLICATIONS ...................................................................................................... II

ABBREVIATIONS ........................................................................................................................................ III

GLOSSARY ................................................................................................................................................... IV

1. INTRODUCTION ........................................................................................................................................ 1

2. REVIEW OF THE LITERATURE ............................................................................................................ 3

2.1 Human intestinal microbiota ................................................................................................................ 3

2.1.1 Development from birth to old age ................................................................................................................ 3

2.1.2 Variation and modulation in adulthood ....................................................................................................... 6

2.2 Methods to study the human intestinal microbiota ............................................................................ 7

2.2.1 Faecal material as the study object ............................................................................................................... 7

2.2.2 Phylogenetic analysis .................................................................................................................................... 8

2.2.3 Metagenomics ................................................................................................................................................ 9

2.2.4 Metatranscriptomics .................................................................................................................................... 13

2.2.5 Metaproteomics ............................................................................................................................................ 13

2.2.6 Data integration ........................................................................................................................................... 19

3. AIMS OF THE STUDY ............................................................................................................................. 20

4. MATERIALS & METHODS .................................................................................................................... 21

4.1 Study cohorts and samples ................................................................................................................. 21

4.2 Methods ................................................................................................................................................ 22

4.2.1 Metaproteomics ............................................................................................................................................ 22

4.2.2 Phylogenetic microarray analysis ............................................................................................................... 23

4.2.3 Statistical analysis ........................................................................................................................................ 24

4.2.4 Additional data-sets used in this thesis ....................................................................................................... 24

5. RESULTS AND DISCUSSION ................................................................................................................. 25

5.1 Method development to study the human intestinal metaproteome (I, II and V) ......................... 25

5.1.1 Analysis of the abundant metaproteome (II) .............................................................................................. 25

5.1.2 Reproducibility testing of the pipeline (II) .................................................................................................. 27

5.1.3 Development of in-house databases (I-IV) ................................................................................................. 28

5.1.4 Functional annotation with COG, KEGG and eggNOG (I-IV) ................................................................. 29

5.1.5 Reflections on the metaproteomic approach ............................................................................................... 30

5.2 Intestinal metaproteomic baseline of healthy individuals and its variation over time .................. 31

5.2.1 Individuality of the faecal metaproteome (II and III) ................................................................................ 33

5.2.2 Main microbial functionalities (II and III) ................................................................................................ 33

5.3 Host-microbiota interactions (II-IV) ................................................................................................. 34

5.4 Comparing the metaproteome of lean and obese individuals (IV).................................................. 35

5.5 Phylogeny-specific metabolic activity – 16S rRNA genes versus peptides analysis (II-IV) .......... 36

5.6 Reflections on study cohorts ............................................................................................................... 37

6. CONCLUSIONS AND FUTURE PERSPECTIVES ............................................................................... 39

7. ACKNOWLEDGEMENTS ....................................................................................................................... 42

8. REFERENCES ........................................................................................................................................... 45

I

ABSTRACT

The human body is complemented by its microbes. Consequently, various intimate reactions between host

and microbes have been recognized and determining the role of the microbiota in many diseases has started.

Until the present time, most attention has been focused on the intestinal tract that contains the majority of the

microbiota. Extensive studies on the phylogenetic composition of the microbes present in faecal material

revealed species richness and the individuality of the intestinal microbiota, unique to each human being.

The work described in this thesis started when the first large scale metagenome analysis of the intestinal

microbiota for global exploitation of the potential biological functions of this ecosystem was being initiated.

Today, approximately ten million unique genes have been identified from the human intestinal microbiota,

but little is known about which of these genes can be expressed into proteins and the conditions under which

the protein synthesis occurs. Therefore, metaproteomics has emerged as a discipline in order to delve into the

biological functions of an ecosystem in a holistic manner by targeting all the proteins present.

This thesis work describes the development of a metaproteomic approach for analysing proteins from

the human intestinal tract. The basis of this approach is to use crude faecal material to determine bacterial

and human proteins simultaneously. Separating the proteins by one dimensional gel-electrophoresis and

focusing on the molecular weight region defined by the 37 and 75 kDa bands of a protein marker was found

to be both reproducible and feasible for the comparison of several dozens of samples. In a case study on the

metaproteome and metagenome of two individuals the benefit of utilizing matched metagenomes on the

number of identified spectra could be demonstrated. Furthermore, a bioinformatic workflow was devised to

increase the ratio of peptides identified per protein class.

Applying the developed metaproteomic approach to faecal samples derived from healthy adults

identified ecosystem specific components including surface proteins like flagellins and proteins involved in

vitamin synthesis. The monitoring of the metaproteome over six weeks in a probiotic intervention study

revealed time-dependent and inter-individual variations in these functions. The major finding emerging from

this work was that the metaproteome is personalized, both over the long-term as well as in short-term.

Therefore, the phylogenetic individuality of the intestinal microbiota was also functionally reflected. A

different set of microbes in their specific host in combination with environmental factors will express

specific functionalities.

Analysing both microbial and human proteins, which represented approximately 85% and 15% of the

total proteins respectively, allowed identifying indications of host-microbe interactions. Enzymes both for

degrading host and microbial components were identified. Effector and response molecules of the host-

microbe dialogue were found.

In addition, the metaproteome of a cohort of obese individuals was compared to that of non-obese

individuals. Based on the spectral counts of the eggNOG assignments of the identified peptides the faecal

metaproteomes of obese and non-obese subjects were significantly different from each other. In addition,

insulin-sensitivity correlated significantly with bacterial proteins.

In most cases, metaproteomic data was complemented by phylogenetic microarray data. This allowed

looking at the composition and its function at the same time. Data integration revealed for example a

correlation between Eubacterium rectale and flagellins. In the obesity study, the proportion of Bacteroidetes

peptides in the faecal samples from the obese subjects compared to the non-obese subjects were relatively

higher than the presence of this phylum detected by 16S rRNA analysis. This result implies biological

activity of Bacteroidetes also in obese individuals.

Metagenomics is still ahead of metaproteomics, both in numbers of analysed samples and available

tools. However, a refinement of metaproteomic methodologies has started and the number of samples

analysed is rising. Metaproteomics is one of the central disciplines to identify the key-molecules involved in

host-microbes interactions.

II

LIST OF ORIGINAL PUBLICATIONS

This thesis is based on the following original publications referred to in the text by their Roman numerals (I-

V):

I Rooijers, K.*, Kolmeder, C.*, Juste, C., Doré, J., de Been, M., Boeren, S., Galan, P., Beauvallet, C.,

de Vos, W.M. and Schaap, P.J., 2011. An iterative workflow for mining the human intestinal

metaproteome. BMC Genomics, 12(6).

*Equal contribution

II Kolmeder, C.A., de Been, M., Nikkilä, J., Ritamo, I., Mättö, J., Valmu, L., Salojärvi, J., Palva, A.,

Salonen, A. and de Vos, W.M., 2012. Comparative metaproteomics and diversity analysis of human

intestinal microbiota testifies for its temporal stability and expression of core functions. PLoS ONE,

7(1), e29913.

III Kolmeder, C.A., Salojärvi, J., Ritari, J., de Been, M., Raes, J., Falony, G., Vieira-Silva, S.,

Kekkonen, R.A., Corthals, G., Palva, A., Salonen, A. and de Vos, W.M., 2015. Faecal

metaproteomic analysis reveals a personalized and stable functional microbiome and limited effects

of a probiotic intervention in adults. Submitted.

IV Kolmeder, C.A., Ritari, J., Verdam, F.J., Muth, T., Keskitalo, S., Varjosalo, M., Fuentes, S., Greve,

J.W., Buurman, W.A., Reichl, U, Rapp, E., Martens, L., Palva, A., Salonen, A., Rensen, S. and de

Vos, W.M., 2015. Colonic metaproteomic signatures of active bacteria and the host in obesity.

Submitted.

V Kolmeder, C.A. and de Vos, W.M., 2014. Metaproteomics of our microbiome – developing insight

in function and activity in man and model systems. Journal of Proteomics, 97, 3-16.

The original articles are reprinted with the kind permission of the publishers.

In addition, some unpublished results are presented.

III

ABBREVIATIONS

1-DE one dimensional gel electrophoresis

2-DE two dimensional gel electrophoresis

AMP abundant metaproteome

BMI body mass index

CD Crohn’s disease

COG cluster of orthologous groups

DB database

EggNOG evolutionary genealogy of genes: non-supervised orthologous groups

FDR false discovery rate

geLC-MS/MS protein fractionation by 1-DE and peptide analysis by LC-MS/MS

GIT gastro-intestinal tract

GMM microbial gastro-intestinal metabolic modules

HPLC high performance liquid chromatography

HMP human microbiome project

IBD inflammatory bowel disease

IBS irritable bowel syndrome

IMG/JGI integrated microbial genomes/joint genome institute

KEGG Kyoto encyclopedia of genes and genomes

KO KEGG orthology

LC liquid chromatography

LCA lowest common ancestor

MALDI-ToF matrix assisted laser desorption/ionization time-of flight

MRM multiple reaction monitoring

MS mass spectrometry

MS1 peptide ions

MS/MS tandem mass spectrometry

NGS next generation sequencing

NSP non-starch polysaccharides

NOG non-supervised orthologous group

OTU operational taxonomic unit

PSM peptide spectral matching

qPCR quantitative polymerase chain reaction

RS resistant starch

SCFA short chain fatty acids

SCX strong cation exchange

SRM selected reaction monitoring

UniProtKB universal protein resource knowledgebase

IV

GLOSSARY

Activity Collection of expressed functions of a cell; in contrast to genetic material and

potential

Ecosystem Living environment and organisms within

Functionality Collection of functions a cell is capable of performing

Matched data Metaproteomic and metagenomic data originating from the same biological sample;

matched metaproteome and matched metagenome

Microbiota Collection of microbes living in an ecosystem

Microbiome Collection of genes of a microbiota; also used as a synonym for microbiota

Meta-Omics Global analysis of molecules contained in an ecosystem or microbiota with omics

techniques that aim at the collective characterization and quantification of pools of

biological molecules; metagenomics, metatranscriptomics, metaproteomics

Phylogeny Evolutionary relation of organisms; nowadays mostly based on genetic comparison

Taxonomy Hierarchical grouping of organisms according to phylogeny

Introduction

1

1. INTRODUCTION

Making new biological discoveries requires two components: a special interest as well as technological

possibilities. The discovery of microbial life by Antonie van Leeuwenhoek (1632-1723) is a good example

from the early days of microbiology as it illustrates his keen interest in understanding a biological

phenomenon and advancing technical developments. Van Leeuwenhoek was a careful observer and

associated his diarrhoea to the presence of more and different little animals “animalcules”, as he called the

microbes, in his faecal excrements. From his drawings one may conclude that he suffered from an infection

by the protozoan Giardia. In addition, he developed lenses that later advanced into microscopes, instruments

that are still widely used in microbiology today. Ever since, most attention for microbes living in or around

the human body, has been focused on pathogenic microbes. However, in the late 19th century studying

microbes being possibly beneficial to humans, such as the lactic acid bacteria, started. Microbiology has

come a long way from these observations. Today, not only single microbial organisms are studied in

isolation but also complete microbial communities, termed microbiota, are analysed, preferably in their

original environment. The analysis of soil, oceans, eons-old ice, and the bodies of a variety of animals, has

revealed huge microbial species variations and has provided indications of microbial functions. For instance,

termites have specialized hindgut microbes helping them with degrading wood material and cows require a

rumen microbiota to supply them with energy out of grass cellulose. By now, from all these microbial

reservoirs, one has received most attention, as evidenced by the numbers of papers: the human being. Skin,

the oral cavity and gut, among other body sites, are populated by microbes. In particular, the intestinal

microbiota has become the focus of many studies that address their associations with health and disease.

Modern microbiology has access to a large set of sophisticated molecular tools, each of which provides

important information on the studied ecosystem (Table 1; Figure 1). The dominant phylogenetic composition

of a microbiota can be assessed by analysing the 16S rRNA gene; this answers the question ‘Who is there’.

The development of a variety of high-throughput gene sequencing methods has permitted the analysis of the

entire gene repertoire of a microbial community, also known as the microbiome. The resulting metagenomic

information describes the functional potential of the intestinal microbes, answering the question ’What can

they do’. At present, phylogenetic and metagenomic approaches targeting the human microbiota are

dominating the literature. However, other meta-omic approaches that analyse microbial consortia within their

ecosystem are required to expand our knowledge of their in vivo functions by addressing the question ‘What

are they doing’. The metabolic net-outcome (metabolomics) has been determined in a variety of studies. To

some extent, the transcriptional activity of the microbial ecosystems (metatranscriptomics) has been

addressed as well. Similarly, less studied than the phylogenetic composition and metagenome is the

metaproteome, which includes the entire collection of proteins that are present within an ecosystem.

Studying the proteins is required to answer the question ‘What has happened’. The term metaproteome was

first proposed in 2004 (Rodríguez-Valera, 2004). In the same year, the first metaproteomic study, which

analysed the metaproteome of activated sludge, was published (Wilmes and Bond, 2004). It was not before

2006, that the possibilities and challenges of metaproteomics were described in full (Wilmes and Bond,

2006). At the start of this thesis work, only one paper had been published describing the qualitative changes

of the intestinal metaproteome occurring during early infancy (Klaassens et al. 2007). Due to the absence of

reference genomes that are instrumental for protein identification, less than ten proteins could be identified.

By now, metaproteomics has found application in many different environments from contaminated soil to the

Atlantic Ocean, from termite hind-gut to human cerumen (Benndorf et al. 2007; Warnecke et al. 2007; Feig

et al. 2013; Georges et al. 2014). The main aims of this thesis project were to develop an approach to analyse

the human intestinal metaproteome and then to utilize this approach to establish a baseline of a healthy

metaproteome.

Introduction

2

Table 1. Overview of state-of-the-art methods to study the human intestinal microbiota. The analytical target is named

as well as the expected deliverables (Readout).

Method Target Readout

Phylogenetic analysis 16S rRNA sequences Microbiota composition and diversity

Metagenomics All DNA present Catalogue of microbial coding capacity and

diversity on the functional level

Metatranscriptomics All mRNA Transcription of microbial genes as a proxy

for their function – some host transcripts

may be discovered as well

Metaproteomics All proteins Expression of microbial and host genes into

proteins as a mark for their intestinal

function

Metabolomics All metabolites and other small molecules Net metabolic output of the intestine derived

from host-microbe interactions

Culturing Single microbes Real functional analysis

Figure 1. Tools with which to study the intestinal ecosystem. The intestinal ecosystem is directly influenced by the host

and the environment surrounding the host.

Review of the Literature

3

2. REVIEW OF THE LITERATURE

2.1 Human intestinal microbiota

2.1.1 Development from birth to old age

We are born into a world full of microorganisms, including Bacteria, Archaea and Fungi as well as their

viruses. Consequently, these microbes also colonize us. The first cohabitants on our inner and outer surfaces

usually derive from our mothers and there seem to be routes that expose already the foetus to microbes,

challenging the earlier view that babies are borne completely sterile (Tissier, 1900; Jiménez et al. 2005; Jost

et al. 2014). Recent phylogenetic analysis has revealed that a great variety of microbial communities are

hosted by the human body, distinct to each body site and niche (Human Microbiome Project Consortium,

2012). Each individual harbours a unique microbiota, this being influenced by both the host’s genes and

environmental factors. However, for example the skin microbiota of two individuals is more similar than the

skin microbiota and the intestinal microbiota within one individual, highlighting the site-specificity of the

host-associated microbial communities. In addition, even a single body site, such as the mouth, can contain

many micro-environments and hence different microbial communities exist in different niches (Kort et al.

2014).

The intestine is not only the most complex but also the most studied of these ecosystems i.e. the

environment and the organisms living within it (Qin et al. 2010; Karlsson et al. 2014; Li et al. 2014). The

start-up community of the intestine may be shaped by the mode of delivery, infant nutrition – breast milk

versus formula – and use of antibiotics, and may thereby also have an impact on health later in life (Penders

et al. 2006; Scholtens et al. 2012; Nylund et al. 2014). Probably the most long-lasting effect of the intestinal

community on our well-being is the stimulation of the immune-system. The intestinal microbiota plays an

essential part in the priming of the immune system, reviewed by Weng and Walker (2013). This priming is

needed in order to acquire the delicate balance between tolerance to antigens derived from food and

commensals, and effective reactivity against pathogens later in life. As the human body and its physiology

develop from infancy to adulthood, so does our intestinal microbiota. There are drastic changes taking place

in infancy and childhood evidently affecting the microbiota. One early event is the change from nursing on

breast milk, containing growth stimulating components for microbes (Wang et al. 2015), or formula to eating

solid food (Bergström et al. 2014). The constant body growth and other developmental changes cause a

constantly changing environment for the intestinal microbiota. The present studies have mostly focused on

analysing faecal samples from infants and adults. From toddler age to adolescence less data is available and

therefore the constant succession of the microbiota from infancy to adulthood is largely unknown. However,

from early life until adulthood the diversity of the microbiota increases and an ecosystem rich in several

hundreds to thousands of microbial species becomes established (Nielsen et al. 2014). At the level of

operational taxonomic units (OTUs), unique organismal genetic units, the variation across individuals is

tremendous, even though the main phyla in the gut are limited, including Firmicutes, Bacteroidetes,

Actinobacteria, Proteobacteria and Verrucomicrobia. Collecting and curating 16S rRNA gene sequences at

97% similarity derived from the human intestine as deposited in public databases resulted in the definition of

2,473 OTUs (Ritari et al. 2015). These represent the present estimate of the number of bacterial and archaeal

species possibly found in the human intestine, meaning that there is an almost infinite number of

combinatorial community alternatives. In addition, more genera and species are constantly being isolated

(Rajilić-Stojanović and de Vos, 2014). As discovered over a decade ago, each individual harbours one’s own

personalized microbiota determined by genetic, environmental and dietary factors. As long as no drastic

changes occur (see below), the microbiota is relatively stable and the temporal fluxes are smaller than the

between-individual differences (Jalanka-Tuovinen et al. 2011). Therefore, the individual microbiota can be

used for subject-identification. A challenging down-side of the large microbiota differences between


4

individuals is evident when setting up group comparisons e.g. when attempting to study the effect of a

special dietary regime. The between-group differences are hard to detect when the within-group differences

are already large. However, unifying patterns across a collection of individuals have been identified and a set

of enterotypes, each dominated by a certain group of bacteria, have been proposed (Arumugam et al. 2011).

These patterns suggest that there is a limited set of module solutions providing the integral features in the

intestine, provided by dominating bacteria. Such patterns like the enterotypes may be useful in study design

since stratifying for a certain enterotype may make it easier to detect group differences. Recently, also the

microbiota diversity (Salonen et al. 2014) and the abundance of individual bacteria (Korpela et al. 2014)

have been reported to associate with the individual heterogeneity in dietary responses. This suggests that

different features of the microbiota can be utilized in the stratification of the study subjects and in this way

decrease the within-group variability.

The whole digestive tract, from the oral cavity down to the colon, provides several different niches for

microbial colonization, with the colon exhibiting the highest microbial density (Figure 2). As nutrient

availability and human components change along the gastro-intestinal tract (GIT), the composition of the

microbiota changes from one end to the other. The living environment of the intestinal microbes is not static

at all. Among other factors, the amount of oxygen, the pH that varies from acidic in the stomach to more

neutral in the intestine, and notably bile, which is a natural detergent, affect the types and amount of

microbes present. In addition, pH and bile can vary due to changes in the diet and these can lead to changes

in microbial communities (Duncan et al. 2009). The microbiota is encompassed by varying human enzymes

having their origin in the saliva, gastric and pancreatic juice, bile and, in early life, breast milk. For example,

the human intestinal alkaline phosphatase and the salivary scavenger and agglutinin (SALSA) have been

shown to have an effect on the growth of intestinal microbes (Lallès, 2014; Reichhardt et al. 2014). The

microbiota is also in contact with mucus, immune components and immune cells. Inevitably, evolution has

produced an individual host-microbiota mutualism (Quercia et al. 2014).

The human host provides the home for the microbiota and the intestinal microbes make a recognizable

contribution to human physiology. The most obvious factor that links men and microbes is the diet and its

major components; carbohydrates, lipids and proteins. Out of the large variety of carbohydrates, mono- and

disaccharides and the less complex polysaccharide starch are digested mostly by host enzymes in the upper

GI tract and immediately taken up as they are the major energy source. However, resistant starch (RS) and

non-starch polysaccharides (NSP), which include celluloses, pectin, and hemicellulose (xylose, mannose or

glucose), mostly reach the colon in an undigested form. Microbes, equipped with a large variety of

hydrolases, degrade, to different extents, these plant-derived carbohydrates, which consist of a variety of

sugars that are linked with various glycosylic bonds (Martens et al. 2014). For example, the genome of

Bacteroides cellulosilyticus encodes over 400 enzymes for carbohydrate degradation. In general, the

microbial gene repertoire for hydrolysing carbohydrates is much wider than that of the host. This means that

humans depend on microbes to degrade certain food components; this may provide flexibility to adapt to

short or long-term changes in the diet (Sonnenburg and Sonnenburg, 2014). Certain intestinal microbes not

only rely on carbohydrates of dietary origin, but they also have enzymatic capacity to degrade host glycans

derived from mucus, the glycoprotein structure secreted by goblet cells that covers the epithelial cells. For

example Akkermansia muciniphila possesses this characteristic (Derrien et al. 2004), reviewed by

Ouwerkerk et al. (2013); in this way, they contribute to a turn-over of this complex glycoprotein structure.


5

Figure 2. Schematic representation of the human digestive tract. Left side: concentration of microbes in cfu/ml. Right

side: human components shaping the ecosystem; body fluids are secreted to the digestive tract along the digestive

process; the intestine is lined by epithelial cells covered with a mucus layer produced by goblet cells, in yellow; various

immune cells are present near the epithelial cells, dendritic cell as representative (Wang et al. 2005; Lawley and

Walker, 2013; Schuijt et al. 2013)

While processing dietary and endogenous glycans, intestinal bacteria produce a variety of molecules that

contain energy and/or act as bioactive molecules. The short chain fatty acids (SCFA) represent one group of

microbial metabolites that is receiving considerable attention. Acetate, propionate, and butyrate are the major

SCFA being formed in the intestine (Cummings and Macfarlane, 1991). In particular, butyrate is of special

interest as it is both energy source and a signalling molecule for the epithelial cells. Several metabolic

pathways for forming butyrate have been described (Louis et al. 2010). Similarly, propionate can be

produced by different enzymatic routes (Reichardt et al. 2014). The main butyrate producers are members of

Ruminococcacea and Lachnospiraceae, the key species being Faecalibacterium prausnitzii and Roseburia

intestinalis, Anaerostipes caccae and Eubacterium hallii. Microbial networks exist for cross-feeding as

certain species depend on the presence of other microbes and their contribution of certain substrates (Scott et

al. 2013). For example Bifidobacterium ssp. produce lactate and acetate during growth on the polysaccharide

inulin. These metabolites are further utilized by other bacteria, e.g. E. hallii, to produce butyrate.


6

Similarly to the hydrolases for degrading glycans, intestinal microbes possess genes with coding

capacity to transform bile salts to a diverse range of metabolites. These metabolites have been shown to play

a role in modulating the host’s energy metabolism, reviewed by Li and Chiang (2015).

In addition, the GIT microbiota produces a set of proteinases. Microbial proteinases derived from

pathogens have been shown not only to digest proteins but also to have non-proteolytic properties such as

signalling to the host (Jarocki et al. 2014). However, as proteinases have been proposed to play a role in

inflammatory bowel disease (IBD), it is likely that also commensal proteinases can have other properties

than proteolytic ones (Steck et al. 2012).

Intestinal microbes also contribute to the micronutrient supply and synthesize vitamins, for example B

vitamins (LeBlanc et al. 2013). However, absorption by the host is not always proven. Thiamine has been

shown to be synthesized by the intestinal microbiota and also human transporters exist implying the

absorption by the host (Nabokina et al. 2014). Bacterial vitamin synthesis capacity seems to vary according

to geographic region and depend on food supply and age (Yatsunenko et al. 2012). This indicates that

vitamins of microbial origin contribute to the host’s vitamin supply. In contrast to vitamins, our GIT

microbiota for example depends on the trace element iron from the host’s diet, as reviewed by Kortman et al.

(2014).

Changes in the metabolic activities of the intestinal microbiota possibly occur with old age. When the

hormone set-up, the levels of physical activity, and nutrition habits are changing, the microbiota changes as

well. In centenarians, it has been found that an increase in the levels of inflammation, so called

inflammaging, was associated with a reduction in the numbers of butyrate-producing F. prausnitzii and an

increase in the levels of potential pathogens (Biagi et al. 2010; Rampelli et al. 2013).

2.1.2 Variation and modulation in adulthood

The first component in a human’s life affecting the composition of an individual’s intestinal microbiota is the

personal gene make-up. Twin studies have revealed that host genetics to some extent explain the

composition of the intestinal microbiota (Zoetendal et al. 2001; Turnbaugh et al. 2009; Tims et al. 2012). In

addition, single nucleotide polymorphisms have been shown to affect the intestinal community structure; this

has been documented for the gene encoding fucosyl transferase (fut2) that determines the secretor-status of

the host (Wacklin et al. 2014). The secretor-status profoundly affects the intestinal ecosystem as only in

secretors the intestinal mucosa is decorated with fucose, a carbohydrate abundantly utilized by intestinal

bacteria. Secretors and non-secretors have been shown to exhibit compositional differences in their

microbiota. This provides one of the first examples with a known mechanism to explain how the host

genotype shapes the intestinal microbiota (Wacklin et al. 2014). However, these are still early days of

integrating the two extremely complex systems and trying to link variations in the gut microbiome to those in

the human genome.

Second, diet, as introduced above, is inevitably influencing the intestinal community. However, only a

few human studies have been conducted that describe the daily differences in diet and the different macro-

and micro-nutrient content. For example, in a cross-over study in which study participants consumed either a

control diet, a diet high in RS or high in NSPs, or a weight loss-inducing diet, it was found that changes had

occurred in microbial composition upon diet-switching (Walker et al. 2011). The effect of diet, attributable

to food components, on the human intestinal microbiota has been reviewed recently (Salonen and de Vos,

2014; Graf et al. 2015). The microbial metabolomics profiles in vegans and omnivores have been compared,

the effect of energy-content of diet on microbial composition has been studied, and metagenomic,

metatranscriptomic and phylogenetic composition changes in response to a meat-fat based diet analysed

(Jumpertz et al. 2011; Wu et al. 2014; David et al. 2014). In addition, differences in microbial composition

and metagenomes in different ethnic groups have been detected (Yatsunenko et al. 2012; Tyakht et al. 2013;

Schnorr et al. 2014; O’Keefe et al. 2015). These studies possibly reflect the different diets of the ethnic

groups but it should be noted that these studies also compare populations with profound differences in


7

hygiene level, general lifestyle, and host genetics. Flexibility in metabolic activities similar to that of the host

is to be expected, reviewed by van Ommen et al. (2014). Therefore, setting aside extreme diets, most dietary

interventions in healthy adults only lead to subtle changes in the microbiota (Salonen and de Vos, 2014). A

recent meta-analysis including three different intervention studies on obese patients on different dietary

interventions showed that some individuals’ microbiota may be responsive and others not upon dietary

changes (Korpela et al. 2014). In a study, where healthy individuals consumed a mixture of seven milk-

fermenting bacterial strains only subtle changes were found, as observed at the transcriptional level. The

microbial composition remained unchanged (Mahowald et al. 2009). According to the definition provided by

the World Health Organization (WHO), probiotics are "live micro-organisms which, when administered in

adequate amounts, confer a health benefit on the host". They encompass a wide range of different species and

have been widely used for example to modulate gut transit time (Dimidi et al. 2014). On microbial

composition in healthy individuals, only subtle effects, if any at all, have been observed in response to the

consumption of probiotics (Gerritsen et al. 2011).

Third, the health status of a person can impact on the intestinal microbiota. Many associations between

microbiota and disease have been reported. Moreover, indirect effects are well studied, such as by the usage

of antibiotics. Also direct impact is likely but often it is not known whether the changes in the microbiota are

causal or a consequence of the disease. Dramatic changes in an individual’s microbiota can occur upon

antibiotic treatment as reviewed recently (Panda et al. 2014). It has been proposed that if no resilience is

achieved, i.e. the changes to the intestinal microbial composition upon antibiotic usage are not balanced,

infections by Clostridium difficile can occur (Song et al. 2013). These infections can become life-threatening

in the case when C. difficile produces toxins in the intestine. Infections by C. difficile have been efficiently

treated by introducing the microbiota from a healthy donor (Bookstaver et al. 2013; van Nood et al. 2013).

This is evidence that the microbiota is causal both for the reoccurrence of the disease and its cure, one rare

example of causality in a highly complex system.

There is a large number of diseases that may involve the intestinal microbiota (de Vos and de Vos,

2012) ranging from intestinal diseases, such as IBD (Cantarel et al. 2011; Machiels et al. 2014) and irritable

bowel syndrome (IBS) (Kassinen et al. 2007), to systemic conditions or diseases like obesity (Everard and

Cani, 2013), non-alcoholic fatty liver disease, and diabetes type II (Vrieze et al. 2012), and even neuronal

disorders such as Parkinson’s disease (Scheperjans et al. 2014). In most cases, differences in composition

have been detected but mechanistic explanations are missing. In addition, it needs to be demonstrated

whether these associations are a consequence or cause of the disease. Obesity and related metabolic

aberrations represent one of the most actively studied diseases in relation to the intestinal microbiota. Studies

addressing obesity-related phylogenetic profiles have yielded controversy results (Finucane et al. 2014).

Those might be due to the applied methodology but also to the study cohort as the severity of obesity

differed in the compared studies (Walters et al. 2014). On the other hand, several mechanisms that link gut

microbiota to obesity and metabolic diseases have been proposed. These include increased energy harvest

from the diet via SCFA production, enhanced fat storage as well as promotion of low-grade inflammation

that contributes to development of insulin resistance (Moreno-Indias et al. 2014). This clearly shows that the

functionality of the microbiota has to be studied in addition to the composition in order to understand the

host-microbe cross-talk in obesity.

2.2 Methods to study the human intestinal microbiota

2.2.1 Faecal material as the study object

The easiest access to the human intestinal microbiota is faeces, which contains microbes and molecules that

reflect as closely as possible the activities in the colon (Cummings, 1975). Since faeces can be obtained in a

non-invasive manner, human studies targeting the intestinal microbiota have been mostly based on the

analysis of faeces. In contrast, very few studies have involved samples taken from the ileum, intestinal


8

mucosa (biopsies) or were conducted from other types of specimens such as luminal washes or biopsies

(Booijink et al. 2010; Li et al. 2011). These studies clearly show that each compartment has its distinct

microbiota with its own distinct functions.

The components of faeces, which originate from host, food and microbes, reflect the whole digestive

process. Faeces has been used as a cure for over thousand years and its composition has interested scientists

for more than 100 years (Joslin, 1901). The bacterial mass represents more than half of the faecal wet weight,

half of which were found to be intact microbial cells although many of these cells were not alive (Stephen

and Cummings, 1980; Ben-Amor et al. 2005). Research has moved from describing the composition of

faeces to using it for discriminating between different body conditions. Individual human proteins have been

studied in faeces and used as disease markers e.g. calprotectin, a calcium binding protein derived from

human neutrophils, is utilized as an inflammatory marker in inflammatory bowel disease (Foell et al. 2009).

Elevated levels of calprotectin were also measured in a cohort of obese individuals indicating an inflamed

state of the intestinal ecosystem in individuals with a high BMI (Verdam et al. 2013).

The liquid phase of faeces, termed faecal water, has been widely used to test for toxic compounds, as

reviewed by Pearson et al. (2009). Recently, faecal water has also been used to study the net outcome of host

and microbiota metabolism and turn-over by metabolomics, both by nuclear magnetic resonance (NMR) and

mass spectrometry (MS) techniques (Gao et al. 2009; Wu et al. 2010; O’Keefe et al. 2015). Since the

metabolites present in faeces do not reflect the proportion of absorbed molecules and metabolites, it may be

complemented with blood analysis to measure systemic metabolites. For example, the levels of plasma lipids

have been determined and correlated to the intestinal microbial composition (Lahti et al. 2013a). Recently,

several studies assessing the metabolites in faeces appeared. For example differences in microbial

metabolites between chocolate-eaters and non-chocolate eaters were detected; another study described the

effect of moderate wine consumption (Rezzi et al. 2007; Dewulf et al. 2013; Jiménez-Girón et al. 2014).

2.2.2 Phylogenetic analysis

The intestinal microbiota is phylogenetically rich, spreading over the three domains of life Bacteria, Archaea

and Eukaryotes. There is a constant up-dating of phylogenetic grouping and classification (Rajilić-Stojanović

and de Vos, 2014). Fungi and Archaea are more genetically deeply branched but also less researched than

Bacteria, which represent the majority of intestinal microbes. However, Archaea, like Methanobrevibacter

smithii and Methanosphaera stadtmanae, are of special interest due to their role in hydrogen disposal and in

the production of methane that regulates the fermentation rate of some species (Gaci et al. 2014). Viruses add

an additional “phylogenetic” complexity to the intestinal ecosystem. Similarly to the mycome, the virome

has just started to be studied, but it should not be neglected when aiming to understand the entire intestinal

ecosystem (Norman et al. 2014).

The traditional way of identifying the phylogeny of a microorganism has been its isolation by culturing

and studying its structural, physiological and biochemical characteristics such as cell-wall structure with

Gram-staining and substrate preferences via feeding tests. There is the assumption that the majority of the

world’s microbial biodiversity has not been cultured yet (Rappé and Giovannoni, 2003). At present, 1,057

intestinal microbial species have been cultured; 957 Bacteria, 92 Fungi, and 8 Archaea (Rajilić-Stojanović

and de Vos, 2014). These represent less than half of the almost 2,500 known OTUs. Many of the intestinal

microbes are strict anaerobes or are otherwise very demanding regarding their preferred culture and growth

conditions. Moreover, some microbes may be obligate syntrophs and cannot be grown as pure cultures, or

may grow very slowly demanding a patient researcher and long term funding.

Molecular tools have been instrumental in exploring the microbial species variety. One specific gene has

been established to categorize the phylogeny of an organism i.e. the gene encoding the 16S rRNA, an

essential part of the prokaryotic ribosome. The 16S rRNA is an approximately 1,500 nt molecule, consisting

of conserved gene regions and nine hypervariable parts. These characteristics make it an ideal phylogenetic

marker and its sequence reflects evolutionary distance. The conserved regions are used to design nucleic acid


9

primers for amplification of a broad range of species whereas the hypervariable part can be used for species

identification or at least clustering it to its nearest neighbour.

Quantitative polymerase chain reaction (qPCR) assays have been used to quantify certain microbial

genera or species (Rinttilä et al. 2004). For community-wide microbiota studies 16S rRNA gene-based

microarrays have been developed, such as the human intestinal tract chip HITChip (Rajilić-Stojanović et al.

2009). Microbial fingerprints have been used to compare the stability of the microbiota over time and

relative differences of individual taxa between individuals and study groups. For example, phylogenetic

fingerprints have shown clustering according to an individual’s health status such as the presence of IBD and

obesity (Erickson et al. 2012; Verdam et al. 2013) and also served as predictors of the effect of diet (Korpela

et al. 2014).

As sequencing technologies have advanced and costs are sinking, phylogenetic analysis has reached a

next level of sophistication. Instead of analysing a predefined set of phylotypes, the full diversity of specific

microbiota can be explored. Next generation sequencing (NGS) techniques, sequencing techniques that

followed the Sanger sequencing protocol and outperform it in terms of throughput and costs, made the global

analysis of 16S rRNA possible. Sequencing platforms and strategies for gene assembly are reviewed in

(Nagarajan and Pop, 2013). Accumulation of 16S rRNA sequences from various environments necessitate

selection of publically available 16S rRNA for an annotation of the sequence reads as specific as possible.

This will ease to find differences between individuals as those are more likely to occur at the species or even

strain level rather than at the genus or higher levels. Such a database has been created recently for the

intestine containing 2,473 OTUs, as indicated above, compared to over 100,000 contained in the 16S rRNA

sequence repositories SILVA and Greengenes (Ritari et al. 2015). Accumulation of genomic and

metagenomic data has supported the development of a software tool called Picrust that infers functions from

the phylogenetic composition based on 16S rRNA gene information (Segata et al. 2012).

Although technical developments have greatly facilitated phylogenetic analysis, one main factor still

remains to complicate between-study comparisons i.e. DNA extraction. Large differences in the abundance

of the major phyla have been found across studies that cannot be explained only by the sample origin

(Lozupone et al. 2013). For simultaneous lysis of a diverse mixture of cells ranging from Gram-positive

Firmicutes to Gram-negative Bacteroidetes, mechanical lysis by bead beating has been shown to be the less

biased method (Salonen et al. 2010). In addition to DNA-extraction, sequencing specific biases such as the

choice of primers, especially which variable region they are targeting, and the analytical platform used,

affect the result of phylogenetic composition analyses (Lozupone et al. 2013).

2.2.3 Metagenomics

A number of technological developments including advanced PCR (Williams et al. 2006), next generation

sequencing technologies, and new gene assembly strategies (see above) have made it possible to target the

complete genetic content of microbial communities. Several studies have taken advantage of metagenomics

to examine the genetic variety of the intestinal microbiota (Table 2). Sophisticated data processing strategies

are required if one wishes to efficiently analyse the complex data generated. Hence, the recent years have

seen the construction of several pipelines for processing raw sequence reads up to gene annotation, such as

Medusa (Karlsson et al. 2014), Metaphlan (Segata et al. 2012), MEGAN (Huson and Weber, 2013), as well

as metaProx that has been designed for faster annotation (Vey and Charles, 2014).

In the late 2000s, two large projects focusing on the human microbiota were launched: the European

MetaHIT with about 20 million Euro funding from the European Union focusing on the intestinal microbiota

and the U.S. Human Microbiome Project HMP funded with over 100 million US $ from the National

Institutes of Health, targeting microbiota at various body sites. Following the initial discovery of a baseline

of 3 M genes, at the end of the MetaHit project, a total of 511 Europeans and 368 Chinese individuals had

had their intestinal microbiota sequenced (Qin et al. 2010; Li et al. 2014). The HMP has sequenced over 500

human intestinal genomes as a reference data set to facilitate the assignment of metagenomic reads (Human


10

Microbiome Jumpstart Reference Strains Consortium et al. 2010). In addition, the microbiota from all body

cavities of more than 100 subjects has been determined (Morgan et al. 2012). In both projects also data

analysis tools were developed and the data has been made available to the scientific community (MetaHIT:

http://meta.genomics.cn; HMP: hmpccad.com). The HMP has continued as integrated HMP (iHMP) which

will focus more on the functionalities of the microbiota (Integrative HMP (iHMP) Research Network

Consortium, 2014).

Having started from sequencing clone libraries based on Sanger technology (Gill et al. 2006), the

technological advances have been reflected in the rapid increase of the number of samples analysed in a

single study, constantly increasing the number of sequenced nucleotides and the lengths of the reads while

reducing the amount of time and money needed. Therefore, the number of analysed human metagenomes

went up from two in 2006 to over 1,000 in 2014. Most studies have focused on identifying the genetic

richness and coding capacity of the human microbiome with the focus on adults and European and U.S.

inhabitants (Table 2). Mostly, the sequencing platforms HiSeq (Illumina) and 454 (Roche) have been used in

the various projects. The fact that the next generation sequencing technologies is developing very rapidly is

illustrated by the fact that less than ten years after its launch, the 454 sequencing platform is no longer

supported by its manufacturer.

The intestinal metagenomic sequencing efforts were recently combined into two separate gene atlases

containing close to and over 10 Million microbial genes (Li et al. 2014; Karlsson et al. 2014; Table 2). Based

on the data of both atlases, it was found that a core microbiome of close to 300,000 genes was shared by 50%

of the study population. In the dataset collected by Karlsson and co-authors (2014), 1.3 million genes were

reportedly shared by at least 20% of the population. The functional richness was assessed in the dataset

collected by Li and co-authors (2014) and a total of 6,980 KEGG orthologous groups (KOs) (42.1% of

genes) and 36,489 eggNOG orthologous groups (60.4% of genes) were identified (gene orthology databases

like KEGG and eggNOG are detailed in 2.2.5). In the atlas reported by Karlsson and co-authors (2014) it was

found that the core genes, i.e. genes found in at least 50% of the subjects, derived mainly from Bacteroides

and Clostridium spp., which was to be expected as the majority of the intestinal microbiota belongs to the

phyla Bacteroidetes and Firmicutes. More than 67% of the genes were without KO or known function but

when looking at core genes, this number was reduced to 55% - this could indicate a bias in the annotation

level of the shared house-keeping functions or just reflect the fact that more attention has been given to

annotate the genomes of core bacteria. The fact that only half of the genes have been annotated indicates

that, to a large extent, the functional individuality still has to be discovered. The explanation why individuals

are either more or less prone to develop a certain disease or react in a specific way to a food component may

lie in these unknown functionalities.

In addition to metagenomes, sequencing of single microbial genomes is relevant for genome annotation

and protein identification (see below). During recent years, there has also been a tremendous effort expended

in these analyses. In 2007, there were only 306 bacterial genomes with 28 originating from the digestive

system from Homo sapiens at the IMG/JGI system. The most recent release (04.02.2015) contained 23,908

bacterial genomes, out of which 1,847 had Homo sapiens as the host and in these, 398 digestive system as

origin (1,034 genomes unclassified). The enormous increase of the number of sequenced bacterial genomes

during the last years has been reviewed recently (Land et al. 2015).


11

Ta

ble

2.

Ov

erv

iew

of

the

mo

st s

ign

ific

ant

hu

man

in

test

inal

16

S r

RN

A a

nd

met

agen

om

ic s

tud

ies.

Da

ta r

eso

urc

e

NA

Eu

rop

ean

Nu

cleo

tid

e

Arc

hiv

e

PR

JNA

28

11

7

DD

BJ/

EM

BL

/

Gen

Ban

k A

N

32

089

;

Gen

Ban

k A

N

FJ3

62

60

4–

FJ3

72

38

2

htt

p:/

/gu

tmet

a.

gen

om

ics.

org

.cn

/

NC

BI

AN

SR

A0

45

64

6 a

nd

SR

A0

50

23

0

Ma

in f

ind

ing

s

En

rich

ed m

etab

oli

sm o

f g

lyca

ns,

am

ino

aci

ds,

xen

ob

ioti

cs;

24

07

un

iqu

e cl

ust

er o

f o

rtho

log

ou

s g

rou

ps

(CO

G)

Fu

nct

ion

al d

iffe

ren

ces

bet

wee

n i

nfa

nts

and

adu

lts

and

bet

wee

n h

um

an i

nte

stin

al m

icro

bio

ta

and

oth

er m

icro

bio

ta

Ov

erre

pre

sen

tati

on

: ca

rboh

ydra

te m

etab

oli

sm;

un

der

rep

rese

nta

tio

n:

lip

id t

ransp

ort

&

met

abo

lism

Co

re m

etag

eno

me;

dif

fere

nce

s in

ob

esit

y

3,3

00,0

00

Mio

un

iqu

e g

enes

Up

dat

e o

f Q

in e

t al

. 2

01

0 g

ene

atla

s to

4,2

67,9

85

gen

es

Seq

uen

cin

g m

eth

od

,

dim

ensi

on

of

gen

etic

info

rma

tio

n

San

ger

seq

uen

cin

g

Bac

teri

al 1

6S

rR

NA

gen

e an

d

dis

tal

gu

t m

etag

eno

me

50

,164

pre

dic

ted

op

en r

ead

ing

fram

es;

40

% u

n-a

ssem

ble

d

San

ger

seq

uen

cin

g

1,0

57,4

81

of

met

agen

om

ic

seq

uen

ce r

ead

s

San

ger

seq

uen

cin

g,

NG

S

1,9

37,4

61

par

tial

bac

teri

al 1

6S

rRN

A g

ene

seq

uen

ces

2.1

4 G

b o

f m

etag

enom

ic

seq

uen

ce r

ead

s

NG

S

57

6.7

Gb

of

met

agen

om

ic

seq

uen

ce r

ead

s

NG

S

37

8.4

Gb

of

met

agen

om

ic

seq

uen

ce r

ead

s

Stu

dy

co

ho

rt

Co

un

try

of

sam

ple

co

llec

tio

n

Sa

mp

le n

um

ber

Gen

der

, a

ge,

BM

I, d

iet

(wh

ere

ava

ila

ble

)

U.S

.

2

1 M

ale

age

37

, on

e fe

mal

e ag

e 2

8

on

e o

n u

nre

stri

cted

die

t, o

ne

veg

etar

ian

Jap

an

13

7 m

ale,

5 f

emal

e

3-7

mo

nth

s: 4

; >

1<

3 y

ears

: 2

; 2

4-4

5 y

ears

:

7

U.S

.

15

4

31

mon

ozy

go

tic

and

23 d

izy

go

tic

twin

pai

rs (

age

21

-32 y

ears

) an

d 4

6 m

oth

ers

Eu

rop

ean

or

Afr

ican

an

cest

ry

con

cord

ant

for

ob

esit

y o

r le

ann

ess,

lean

/ov

erw

eig

ht

or

ov

erw

eig

ht/

ob

ese

two

sam

pli

ng

po

ints

wit

h 5

7±

4 d

ays

inte

rval

s

Den

mar

k,

Sp

ain

12

4 (

56

% f

emal

e, 4

4%

mal

e)

Av

erag

e ag

e 5

6 y

ears

(ra

ng

e 18

– 6

9)

Av

erag

e B

MI

27

kg

/m2 (

17

.9 –

40

.2)

Ch

ina

36

8 (

dia

bet

ic p

atie

nts

an

d c

on

tro

ls)

43

% f

emal

e, 5

7%

mal

e

Av

erag

e ag

e 4

8 y

ears

(ra

ng

e 13

- 8

6)

Ref

eren

ce

Gil

l, 2

00

6

Ku

rok

awa,

20

07

Tu

rnb

aug

h,

20

09

Qin

, 2

010

a, b

Qin

, 2

012

a, b

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=32089

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=FJ362604

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=FJ372382


12

htt

p:/

/ww

w.h

mp

dac

c.o

rg

/HM

GC

/

MG

-RA

ST

ser

ver

Illu

min

a 1

6S

rR

NA

AN

“qii

me:

85

0”;

mic

rob

iom

e sh

otg

un

seq

uen

ces

AN

“qii

me:

62

1”

Seq

uen

ce R

ead

Arc

hiv

e

ER

P0

02

46

9

Eu

rop

ean

Nu

cleo

tid

e

Arc

hiv

e

ER

P0

03

61

2

ER

P0

02

06

1

AN

: ac

cess

ion

nu

mb

er,

NG

S:

nex

t g

ener

atio

n s

equ

enci

ng

, H

MP

: hu

man

mic

rob

iom

e p

roje

ct

a In

teg

rate

d i

n t

he

gen

e co

llec

tio

n b

y L

i et

al.

201

4;

12

68

met

agen

om

es a

t E

uro

pea

n N

ucl

eoti

deA

rch

ive

PR

JEB

52

24

bIn

teg

rate

d i

n t

he

gen

e co

llec

tion

by

Kar

lsso

n e

t al

. 2

01

4;

78

2 m

etag

eno

mes

; 40

bil

lio

n m

etag

eno

mic

rea

ds

fro

m s

equ

enci

ng o

n I

llu

min

a p

latf

orm

;

htt

p:/

/web

.stu

den

t.ch

alm

ers.

se/g

rou

ps/

sysb

io-m

mg

/ME

DU

SA

/

Mic

rob

iom

es a

t 1

8 f

emal

e an

d 1

5 m

ale

bo

dy

sit

es;

Dis

trib

uti

on

of

pat

hw

ays

acro

ss

sub

ject

s m

ore

ho

mo

gen

eou

s th

an

dis

trib

uti

on

of

tax

a;

Fu

nct

ion

alit

y o

f m

ajo

r ce

llu

lar

pro

cess

es a

pp

eare

d t

o b

e si

mil

ar a

cro

ss

ind

ivid

ual

s

11

0 m

etag

eno

mes

;

Ag

e an

d g

eog

rap

hic

ori

gin

dep

end

ent

dif

fere

nce

s;

In a

ll a

ge-

gro

up

s p

ron

oun

ced

dif

fere

nce

s b

etw

een U

.S.

and

Am

erin

dia

ns

Dia

bet

ic a

t-ri

sk i

nd

ivid

ual

s p

red

icti

on

by

mic

rob

iom

e an

aly

sis

Av

erag

e 57

8,5

12

gen

es/m

icro

bio

me

(ran

ge:

59

,14

7–8

78

,81

6)

Lo

w a

nd

hig

h r

ich

div

ersi

ty

Gen

om

e as

sem

bly

fro

m m

etag

eno

mic

dat

a w

ith

ou

t u

se o

f re

fere

nce

gen

om

es

NG

S

16

rR

NA

gen

e se

qu

enci

ng

and

m

etag

eno

mic

s fo

r a

sub

set

of

sam

ple

s

NG

S

1,0

93,7

40

,274

met

agen

om

ic s

equ

ence

read

s

NG

S

45

3 G

b o

f m

etag

enom

ic

seq

uen

ce

read

s

NG

S

10

,067

,99

6,4

12

met

agen

om

ic s

equ

ence

read

s

NG

S

4.5

Gb

met

agen

om

ic

seq

uen

ce r

ead

s p

er s

amp

le

U.S

.

24

2

mea

n a

ge

of

27

±5

(ra

ng

e 1

8-4

0 y

ears

)

mea

n B

MI

24

± 4

U.S

.

Am

azo

nas

of

Ven

ezu

ela,

rura

l M

alaw

ian

com

mu

nit

ies,

US

A m

etro

po

lita

n a

reas

53

1

11

5 i

nd

ivid

ual

s (3

4 f

amil

ies)

fro

m M

alaw

i; 1

00

ind

ivid

ual

s (1

9 f

amil

ies)

fro

m V

enez

uel

a; a

nd

31

6 i

nd

ivid

ual

s (9

8 f

amil

ies)

fro

m t

he

US

A

32

6 i

nd

ivid

ual

s ag

ed 0

–1

7 y

ears

(8

3 M

alaw

ian

,

65

Am

erin

dia

n a

nd

178

res

iden

ts o

f th

e U

SA

)

plu

s 2

02

ad

ult

s ag

ed 1

8–7

0 y

ears

Eu

rop

e

14

5

70

yea

rs o

ld f

emal

es

no

rmal

, im

pai

red

or

dia

bet

ic g

luco

se c

on

tro

l

Den

mar

k

29

2 (

53

% f

emal

e, 4

7%

mal

e)

Av

erag

e ag

e o

f 5

6 y

ears

(ra

nge

50

- 6

1)

BM

I 1

8.1

– 4

6.6

kg

/m2

33

% l

ean

, 9

% o

ver

wei

gh

t, 5

8%

ob

ese

Den

mar

k,

Sp

ain

31

8 (

incl

ud

ing

sam

ple

s fr

om

Qin

et

al.

20

10

)

Incl

ud

ing

UC

an

d C

D p

atie

nts

Ta

ble

2 c

on

tin

ued

HM

P c

on

sort

ium

,

20

12

a, b

Yat

sun

enk

o, 2

012

Kar

lsso

n,

20

13

b

Le

Ch

atel

ier,

20

13

a

Nie

lsen

20

14

a


13

2.2.4 Metatranscriptomics

One step down the cellular machinery to gene expression is the transcription into mRNA. However, since

transcripts of bacteria and archaea have a short half-life, the design of the experimental approach and the

preparation of the samples should be appropriate. In babies, where the metabolic turn-over is high and the

intestinal transit usually very fast, transcripts may provide crucial information about intestinal activity. In

adults mucosal or ileal samples are very appropriate for mRNA analysis and have provided essential insight

into the metabolic conversions occurring in the upper intestinal tract where the microbes compete with the

hosts for carbohydrates and form various trophic chains (Zoetendal et al. 2012; Zoetendal and de Vos, 2014).

Moreover, this approach has revealed the functional activity of ingested Lactobacillus strains (Marco et al.

2010). In contrast, faecal samples from adults have been studied but the functional information which can be

obtained from these specimens may be limited to the stable messengers (Booijink et al. 2010). It has been

described that the digestive process in the human intestinal tract can last for up to six days (Cummings et al.

1978).

Since the mRNA should be processed immediately, transcriptional analysis of faecal samples may be

not become routine practice. Similar to 16S rRNA analysis, transcriptome analysis has moved from mere

profiling and analysing a set of pre-defined mRNAs on a microarray to targeting each microbiota

individually by sequencing mRNAs in an approach called RNAseq (Hampton-Marcell et al. 2013). Although

faecal metatranscriptome data has to be interpreted with caution due to the above mentioned technical

challenges, the present studies have provided evidence of the differences between potential function and

regulated functions in vivo. In a recent study conducted in healthy men, the metatranscriptomes differed

more than the metagenomes, reflecting the importance of functional studies as gene expression depends on

many different factors (Franzosa et al. 2014). When individuals were challenged with a meat-fat based diet,

samples clustered with KEGG assigned RNAseq data according to the diet but the taxonomic clustering

remained subject-specific implying a strong effect of diet on microbial transcriptional activity (David et al.

2014).

2.2.5 Metaproteomics

Proteins are the real key-players of any physiological process. Among others, they can process substrates

providing metabolic energy, have signalling and control functions within and outside the cells, have

structural functions, and make a cell mobile. It is important to address the actual function out of the many

possible ones encoded by the microbiome in order to obtain a more profound understanding of the activities

of the intestinal ecosystem and the host-microbe interactions. Proteins are the ultimate targets for this as they

determine the function of a cell. Moreover, protein synthesis is an energy-demanding process. For example

in E. coli it was found that two thirds of the total energy used for growth was spent on protein biosynthesis

(Neijssel and Mattos, 1994; Jewett et al. 2009). Hence, proteins are proxies of activity and since they are

rather stable (Taverna and Goldstein, 2002) their analyses both represent information of on-going activities

and those having taken place in the recent past. The term proteome describes the unity of proteins expressed

at a certain time under a certain condition in an organelle or organism. This has to be expanded in the context

of faecal metaproteomics as first reported by Klaassens and co-authors (2007) when they conducted a study

in human babies. As detailed above, faecal material may contain proteins, or their fragments, derived from

all steps of digestion. Therefore, faecal metaproteomics aims at analysing proteins expressed over a period of

time. The proteins present in saliva (Harden et al. 2012), gastric fluid (Kam et al. 2011), bile acid (Barbhuiya

et al. 2011), pancreas (Schrimpe-Rutledge et al. 2012; Paulo et al. 2013), and amniotic fluid (Michaels et al.

2007) have been characterized.

Large-scale analysis of proteins, i.e. measuring hundreds to thousands of different proteins in one run

has become possible due to the progress in two techniques i.e. liquid chromatography (LC), enabling the

separation of several thousands of peptides based on their differences in hydrophobicity within a few hours,


14

and mass spectrometry, making accurate mass measurement possible. Metaproteomics has advanced from

the two dimension gel-electrophoresis (2-DE) approach followed by matrix assisted laser

desorption/ionization time-of flight (MALDI-ToF) MS analysis to exploit mainly the 1-DE approach

followed by LC tandem mass spectrometry (MS/MS), called GeLC-MS/MS. Instead of separating proteins

according to their isoelectric point and molecular weight followed by the analysis of a limited set of proteins

(2-DE), complex protein mixtures are fractionated by their molecular weight only (1-DE). The GeLC-

MS/MS approach allows the analysis of a large mixture of peptides which are analysed in a single step by

nano-high-performance LC-MS/MS. Reverse phase (RP) columns consisting of C18 molecules are used and

a gradient increasing in hydrophobicity is applied to supply the mass spectrometer for peptide measurements.

The various aspects of faecal metaproteomics have been reviewed recently (Study V of this thesis); the

salient features of this discipline are summarized below (Figure 3).

Figure 3. Overview of a metaproteomics workflow with proposed specifications based on this thesis work.

Sample preparation

Prior to LC-MS/MS analysis, the proteins need to be liberated from the sample matrix. As in all meta-omics

techniques, a large collection of organisms can be the molecule-source for these proteins. Similarly, the

proteins may be extracellular or intracellular. The available options for metaproteomics include the analysis

of faecal water, chemical or mechanical lysis of either microbial cells, or all of the cells present in faeces. All

of these options have found application in faecal metaproteomics but a systematic comparison such as for

other metaproteomic samples like soil and activated sludge has yet to come (Keiblinger et al. 2012; Hansen

et al. 2014). Due to limitations of the nano-HPLC and the mass spectrometers, complex protein mixtures

need to be fractionated. This can be achieved by fractionation with 1-DE, followed by dividing the gel into

several fractions that are then separately analysed by LC-MS/MS. Peptide mixtures can also be separated by

strong cation exchange (SCX) chromatography and the resulting fractions analysed by reverse phase LC-MS.

Another possibility is to inject the complex mixture into the LC and run a shallow gradient for a long period

of time (see below).


15

LC-MS/MS

Proteins are relatively stable molecules (see above) and this eases their analysis. To increase throughput, the

proteins are digested into peptides prior to being subjected to MS. In order to reach the mass spectrometer

after elution from the chromatographic columns, the peptides need to be charged by ionization. The mass

analyser sorts the peptide ions according to their mass to charge (m/z) ratio and a detector measures their

intensities. Modern mass spectrometers are usually hybrids containing two mass analysers; one high accurate

mass analyser measuring the mass of intact peptide ions i.e. parent ions and, for easing down-stream peptide

identification, one less accurate but fast mass analyser for fragmenting peptide ions and analysing the masses

of the fragment ions. Several solutions are available for each component of a mass spectrometer, i.e.

ionization source, mass analyser, and detector. These are compatible with each other and many different set-

ups exist (Schneider and Riedel, 2010; Graham et al. 2011). There are also different methods available for

fragmenting peptide ions (Guthals and Bandeira, 2012) resulting into different ion types. Typically, in

metaproteomics studies, the peptides are ionized after nano-LC by electro-spray ionization and their m/z

value analysed by hybrid mass spectrometers. The fragmentation is carried out by collision-induced

dissociation leading to the production of mostly b- and y-ions, a fragmentation pattern which is appropriate

for peptide-identification. Since its invention in the late 1980s, mass spectrometers have undergone rapid

developments, both in terms of accuracy of the m/z determination and of speed of molecules analysed within

a certain time window. In this thesis, this is illustrated by the fact that with the mass spectrometer used in

Study IV five times more fragments were measured in the same time window (duty cycle) compared to

Study I. These developments allow the analysis of complex protein mixtures, such as those derived from

environmental systems, including the human intestinal tract. The invention of nano-HPLC allows the

injection of few l and therefore sample availability is mostly not the bottle neck. This is a clear advantage to

2-DE.

Despite the fact that only a limited number of human intestinal metaproteomic studies have been

reported, various options for sample preparation and data analysis have been described, reflecting the

multitude of options the proteomics tool box is offering. Least variation is seen in the LC-MS/MS setup

reflecting the 1D gel/gel-free approaches and nano-LC plus Orbitrap as state of the art. In the faecal

metaproteomics approaches, three major pipeline solutions have been applied: 2-DE followed by LC-

MS/MS, 1-DE followed by LC/MSMS, and SCX chromatography coupled to LC-MS/MS (SCX RP LC-

MS/MS) (Table 3). The sample complexity of environmental samples is generally higher than in classic

proteomics studies where the proteins originate from one organism and are usually present in a homogenous

sample matrix. In addition, one dogma for proteomics i.e. that all (or at least the majority) of expected

proteins is contained in the sequence database (Nesvizhskii, 2014), is hardly fulfilled in metaproteomics. The

discrepancy between the large interest in metaproteomics and its analytical challenge is expressed by the fact

that the number of review articles exceeds the number of experimental articles. An overview of the human

intestinal metaproteomic studies highlighting the differences in analytical approaches, used databases and

outcome in protein numbers and identified functionalities has been recently published (Study V of this

thesis). Since its publication, three additional studies have been reported which are focusing on method

development and those are detailed in the Results and Discussion section of this thesis (Lichtman et al. 2013;

Xiong et al. 2014; Tanca et al. 2015). One additional study described proteomic differences between healthy

controls and patients with Crohn’s Disease (CD) based on 2-DE profiles of faecal proteins (Juste et al. 2014).

Of note, the 16S rRNA analysis did not reveal any grouping according to the health status among the studied

samples, highlighting the added value of proteome compared to conventional microbiota analysis based on

phylogeny.


16

Table 3. Features of pre-fractionation techniques applied in human intestinal metaproteomics.

Feature 2-DE 1-DE SCX

Required protein amount > 10 µg 1-10 µg 0.01 -0.001 µg

Throughput sample

preparation

Low

Medium

High

Throughput LC-MS/MS Low Medium Low

Identification efficiency 10 – 100 proteins from one

sample

>500 proteins from one

sample

>500 proteins from one

sample

2-DE: two dimensional gel electrophoresis; 1-DE one dimension gel electrophoresis; SCX: strong cation exchange

Peptide identification

There are three main routes for inferring a peptide sequence to an m/z value. Those are peptide spectral

matching (PSM), “de novo” sequencing, and the use of spectral libraries (Lam et al. 2008; Allmer, 2011;

Vaudel et al. 2012). Most frequently applied is PSM which relies on matching parent and fragment ions

against an in-silico digest of a protein sequence database. Many different algorithms have been developed for

performing this task; these are available via open source tools or commercial software (Shteynberg et al.

2013). The choice of the protein sequence database is a very crucial one due to the complexity of the

analysed samples in faecal metaproteomics. Hence, this is an essential step in the whole metaproteomic

pipeline which is elaborated in the Results and Discussion part of this thesis. De novo sequencing i.e. direct

inference of a peptide sequence from the m/z value is not yet routinely used in metaproteomics (Cantarel et

al. 2011; Muth et al. 2015) as statistical methods to control for false positive identifications need to be

developed. Spectral libraries, i.e. high quality reference m/z repositories, do not yet exist for metaproteomics.

Since PSM is a probability-based approach, the number of false positive identifications has to be

controlled and kept on a low level (Jeong et al. 2012). One common method is to use the results from PSM

against a decoy database, which for example can be the sequences from the target database in reverse or a

shuffling of the amino acids (Yadav et al. 2012). This approach has been introduced for classical proteomics

and it has been shown that there is still improvement regarding sensitivity and accuracy of PSM in

metaproteomics (Muth et al. 2015). False discovery rate estimations at the protein level are impaired by the

problem of assigning peptides to proteins in metaproteomics. As the same peptide can be part of proteins

originating from various species, phyla or even domains of life, protein inference is put to its extreme in

metaproteomics. Therefore, analytical strategies are required to increase the statistical validity of PSM in

metaproteomics.

Specific post-translational modifications (PTMs) have not been considered in any human intestinal

metaproteome study, as the complexity is already extremely high. A large variety of bacterial PTMs is

known and includes phosphorylation, methylation and glycosylation, reviewed by Cain et al. (2014). In

general, targeted proteomics approaches are required to identify for example phosphopeptides, reviewed by

Huang et al. (2014). A proteomics approach has shown that some of their very abundant glycolytic enzymes

of Lactobacillus rhamnosus GG are phosphorylated (Koponen et al. 2012). A large variety of PTMs are to be

expected to be identified from intestinal metaproteomes.

Quantification

In addition to identify the different proteins in the metaproteome, quantifying their amounts is required for

sample comparison. Absolute quantification cannot be achieved in high-throughput proteomics but basic

approximations for assessing protein quantity exist and are very useful. Two major approaches can be taken:

labels can be introduced to compare between different samples based on the ratio of the differently labelled

peptides, but also label-free quantification has been made possible during the last decade due to technical and

software improvements (Seifert et al. 2012; Otto et al. 2014; Arsène-Ploetze et al. 2014). The latter approach

has been mostly applied in faecal metaproteomics. Label-free analysis is either based on the chromatographic


17

peaks of MS1 data, i.e. is the unity of all measured peptide ions, to compare peak intensities across

experiments. This requires sophisticated software to adjust for differences in chromatography and peak

intensities (aligning and normalization). Another, more simplistic but effective approach is counting the

spectral hits for a certain unit (that may be proteins, peptides, taxonomic or functional unit). Various

normalization methods have been described, reviewed by Neilson et al. (2011).

During the last decade, selected reaction monitoring (SRM) methods, also called multiple reaction

monitoring (MRM) methods, have been developed to quantify targeted proteins, reviewed by Liebler and

Zimmerman (2013). Proof-of-principle studies identify proteins of interest and then a method is established

to quantify these proteins. In a study on the metaproteome of the mucosal samples of CD patients a subset of

proteins that were found to be significantly differently expressed in the 2-DE experiments, were targeted and

verified with SRM/MRM (Juste et al. 2014). All of the SRM/MRM experiments of the thirteen proteins

verified the earlier 2-DE results. Recently, SRM methods for 10,000 human proteins have been published

(Rosenberger et al. 2014). In future, such methods may also be developed in large scale to target specific

microbial proteins derived from the human intestine.

In general, the evaluation of an approach is crucial for developing a method for multi-sample

comparisons, especially when a multitude of options is available for each single analytical step. Already for

the 1D separation and consequential protein in-gel digestion, a variety of choices have to be made and these

will influence the subsequent results (Speicher et al. 2000; Glatter et al. 2012). Hence, the evaluation of a

metaproteomic pipeline needs to be carefully planned and executed.

The technical development in MS-based proteomics poses a challenge for data analysis and data storage

(data repositories) as well as handling the constantly increasing size of raw files. A file containing the

MS/MS spectra of a single sample or gel slice can be as large as 0.5 GB. Improvements have been achieved

by standardisation of files for across-pipeline usage and public data repositories to enable reanalysis with

constantly improving data analysis tools. For example, analysis solutions have been developed, with cloud

computing to maximize compute resources (Muth et al. 2013). To handle these big data, software packages

for data visualization are required; these have been reviewed recently (Oveland et al. 2015).

Proteomics applied in microbiology

Proteomics has been widely used to study microbial activity but there are only very few studies on

commensal intestinal microbes, in-vitro as well as in animal model (Schneider and Riedel, 2010; Otto et al.

2014). Performing a literature search on PubMed on 15/02/23 with the term “proteomics” in combination

with either “pathogen”, “probiotic”, or “commensal” resulted in 1657, 81 and 55 hits, respectively. For

example, the pathogen Staphylococcus aureus has been studied for its response to glucose-starvation using a

proteome approach (Michalik et al. 2012). A reference proteome was reported for L. rhamnosus GG which is

widely marketed as a probiotic and consumed world-wide, and this paradigm probiotic strain was analysed

by proteomics in several culture conditions also resembling the intestinal environment (Koskenniemi et al.

2009; Koskenniemi et al. 2011; Koponen et al. 2012). Recently, the same group of researchers analysed the

surface proteome of L. rhamnosus GG as this possibly contains the interacting molecules in the host upon

consumption and identified expected secreted proteins as well as a set of moonlighting proteins (Espino et al

2015). Other probiotic or model strains that have been studied by proteomics are L. plantarum WCFS1

(Cohen et al. 2006), Bifidobacterium longum subsp. infantis (Kim et al. 2013), Propionibacterium

freudenreichii (Le Maréchal 2015), and Weissella confusa (Lee et al. 2008). One example of a commensal of

which its proteins were studied is Enterococcus faecalis (Lindenstrauß et al. 2013). In addition to measuring

activity, proteomics has recently been used to identify microbial strains by MALDI in the case of

Bifidobacterium animalis (Ruiz-Moyano et al. 2012).

Similarly, the proteome of several human body compartments have been determined as detailed above,

but only rarely the data is also searched against bacterial sequence entries.


18

Inferring functions to proteins

Once a new genome is sequenced, the function of its genes or proteins is inferred from homology analysis

with previously annotated genes or proteins; in the best case scenario the functions of selected genes are

discovered by knock out, overproduction and phenotypic analysis. Due to the exponential growth of

sequenced genes and therefore hypothetical proteins, experimental proof of the functions of these genes is

lagging far behind. In the most recent release of the Universal Protein Resource Knowledgebase

(UniProtKB) (2015_02 of 04-Feb-2015), the curated part Swissprot database contained 547,599 entries

(http://web.expasy.org/docs/relnotes/relstat.html), whereas the non-curated part Trembl contained more than

100-fold more sequences, with 90,860,905 sequences (http://www.ebi.ac.uk/uniprot/TrEMBLstats/). It

appears that the Swissprot database has reached a plateau whereas in Trembl the number of sequences is

doubling approximately every year. Evidently, this generates the problem that annotation of new genes

mostly relies on non-curated sequences and therefore introduces biases.

There are several approaches linked to extensive databases that can be used to assign proteins to

functional groups (Table 4). They differ in terms of the number and type of genomes on which the

classification has been based on. In addition, the amount of levels for functional grouping varies and can be

shallow like for the Cluster of Orthologous Genes (COG) (Tatusov et al. 1997) or more branched like the

Kyoto Encyclopedia of Genes and Genomes orthology (KO) (Kanehisa et al. 2004). The Cluster of

Orthologous Genes was originally built for bacterial proteins but extended for human functions as well

(Koonin et al. 2004). However, there has been a break of more than ten years in updating this classifier as a

result of funding issues. Considering the tremendous increase in sequenced genomes, this clearly created

insufficient coverage but the recent 2014 update provides a reasonable starting position (Galperin et al.

2015). The database eggNOG built on the COG database but has a higher sequence coverage and therefore

provides a good candidate for the functional annotation of especially bacterial genomes (Jensen et al. 2008).

Recently, the Metaproteome Analyser (MPA), a pipeline which ranges from PSM to functional

annotation, was released (Muth et al. 2015). One sample can be analysed with a collection of up to four non-

commercial search algorithms within one analysis process. The interface allows representing the taxonomic

and functional results in overview graphs. However, for this feature uniprot identifiers are required to infer

taxonomy and function and cannot be used in all metaproteomic approaches yet, as many protein candidates

for metagenomic protein sequence databases do not contain uniprot identifiers.

Table 4. Overview of gene orthology databases.

Database Description Source

COG Cluster of orthologous

genes

(Galperin et al. 2015)

25 functional families

4,631 different COGs

711 full length eukaryotic, archaeal and

bacterial genomes

http://www.ncbi.nlm.nih.gov/COG/

eggNOG non-supervised

orthologous groups (Powell et

al. 2014)

25 functional families

1,700,000 orthologous groups

Full length genomes

3,686 organisms (eukaryotic, archaeal and

bacterial)

http://eggnog.embl.de

KEGG

(Kanehisa et al. 2012)

18,449 KEGG orthologies (KO)

(2015/2/18; inferred from KEGG GENES

16,055,760), 469 pathway maps as one

part of the KEGG database

one KO may belong to a variety of higher

levels

last free version 2011

http://web.expasy.org/docs/relnotes/relstat.html


19

2.2.6 Data integration

Studying the genes, transcripts, proteins or metabolites of the human intestinal microbiota all have their

individual challenges. However, for a systemic understanding, different perspectives have to be taken at the

same time and different data sets need to be integrated. Recent integrated approaches of the intestinal

metaproteome followed a simple path and mainly described the study cohort (with only a limited set of

subjects) from different angles and reported the similarities or differences with different data sets, like

metagenomics and metatranscriptomics (Erickson et al. 2012; Ferrer et al. 2013; Daniel et al. 2014). Real

integration still needs to be further enforced, as it is exemplified in this thesis, by using different types of

data, like correlating the abundance of present community members with proteins to explain certain health

conditions. Models of metabolic fluxes and network-modelling can be integrated as well (Faust et al. 2012).

The term systems biology has been largely used for single organism studies but as analytical methods as well

as bioinformatics are advancing, it is now possible to examine the human intestinal microbiota in a systems

biological approach (Dos Santos et al. 2010; Siggins et al. 2012; dos Santos et al. 2013). Especially for

generating hypotheses, intestinal models, as reviewed in (Venema and Van den Abbeele, 2013), as well as

animal models help in reducing the complexity of the intestinal ecosystem. These hypotheses can then be

tested in human trials. The differences between mice and humans have been reviewed recently (Nguyen et al.

2015). When testing the effect of diet on a consortium of 12 human bacterial strains in a mouse model,

transcriptomics, composition analysis and proteomics revealed different changes at each analytical level

(McNulty et al. 2013). The share of unique peptides per strains in the in vivo community was different to the

predicted contribution based on genome comparison. These findings evidence the need for studying the

intestinal community with complementary omics methodologies, not only to increase the insight, but also to

reduce biases of each of the approaches. A new methodology is the use of stable isotope probing (SIP) that

has been used to identify the bacteria with active metabolic conversions in an intestinal model and could also

be used in humans to study protein fluxes (Kovatcheva‐Datchary et al. 2009). Recently, SIP has been used to

study the active community found in mouse intestinal biopsies (Berry et al. 2013).


20

3. AIMS OF THE STUDY

The main aim of the thesis work was to characterize the intestinal ecosystem based on the proteins identified

from faecal material. In detail, there were four main aims of this study:

I) To develop methods for assessing the metaproteome from the human intestinal tract and implement these

in a proof-of-principle study;

II) To determine the intestinal functionalities and their short-term variance in healthy adults in the context of

a probiotic intervention;

III) To conduct a comparison of the intestinal functionality of non-obese and obese individuals based on

microbial and host proteins;

IV) To evaluate the methodologies which can be used to study the metaproteome of the human intestinal

microbiota.

Materials & Methods

21

4. MATERIALS & METHODS

4.1 Study cohorts and samples

During the course of this work faecal samples from four different study cohorts were analysed (Table 5).

The individuals in Study I took part in the Micro-Obese cohort recruited from the large SU.VI.MAX2

cohort, in which individuals had been followed for eight years for their healthy nutritional and lifestyle habits

(Hercberg et al. 2004).

For Study II, which was a proof-of-principle study to assess the reproducibility of the developed

approach and to determine the variation over time, samples were collected in-house from adults who

provided informed consent.

The study participants in Study III were a sub-set of a larger cohort of initially 68 individuals taking part

in a probiotic trial testing three different probiotic strains in a double-blind setting (Lactobacillus rhamnosus

GG, 1010 cfu per day; Bifidobacterium animalis ssp. lactis Bb12, 108 cfu per day; and Propionibacterium

freudenreichii ssp. shermanii JS, 1010 cfu per day). The study product, to be consumed daily for three weeks,

was a 250 ml fruit-based milk-drink without or with the probiotic strains. The set-up of this trial, the

determined immune-markers and lipids, and the correlation between levels of lipids and the microbiota had

been reported previously (Kekkonen et al. 2008a; Kekkonen et al. 2008b; Lahti et al. 2013a) (Figure 4).

Metaproteomic analyses were carried out on samples from eight individuals from the placebo group and

from eight individuals from the L. rhamnosus GG group.

In Study IV, the intestinal metaproteomes were analysed in faecal samples from nine lean (BMI > 18.5 <

25 kg/m2), four overweight (BMI > 25 < 30 kg/m2), and 16 obese subjects (BMI > 30 kg/m2; N=4); of these

seven were morbidly obese (BMI > 45 kg/m2). The microbial composition and clinical host variables had

been determined previously (Verdam et al. 2013).

Table 5. Overview of study cohorts analysed in this thesis

Study Type Country of

sample

collection

Subjects No.

(Female:Male)

Timepoints

(intervals)

Sample

storage

Age

range

BMI

Study I Observational

study

France 2 (0:2) 1 Fresh 61-63 < 30

Study II Proof of

principle study

Finland 3 (3:0) 2 (1y/

1/2 y)

-70° C 23-35 < 30

Study III Placebo

controlled

probiotic

intervention

Finland 16 (11:5) 3 (3 weeks) -20° C o/n

-70° C

27-54 18.4–

30.4

Study IV Comparative

study

The Netherlands 29 (20:9) 1 +4° C

-20° C

within one

day

18-54 18.6–

60.3

Figure 4. Study design of Study III with respect to the analyses described in this thesis.

Materials & Methods

22

4.2 Methods

4.2.1 Metaproteomics

Study I was conducted within a collaboration in which the proteins had already been extracted from faecal

samples by first isolating microbial cells by Nycodenz gradient centrifugation (Murayama et al. 2001),

followed by disruption of the cells by chemical lysis, as described in Study I in this thesis. In all other studies

an in-house developed sample preparation approach was applied that included the following key features: i)

no fractionation of faecal material, ii) extraction of proteins via mechanical lysis with bead beating (0.1 mm

zirconium silicate beads) under anaerobic condition (gaseous nitrogen), and iii) protein fractionation into

three parts according to molecular weight by one dimensional gel-electrophoresis with focusing on the

approximately 40-80 kDa fraction. The proteins contained therein were termed the Abundant MetaProteome

AMP (see Results and Discussion).

The details of the applied metaproteomic methods have been described in the original articles (Studies I-

IV). Table 6 presents an overview of the applied metaproteomic techniques from sample preparation to data

analysis as applied in the four sub-studies.

Table 6. Overview of proteomics methods applied in this thesis

Study

I II III IV

Sample

Preparation

Protein Extraction From microbes isolated from faecal material

(=bacterial pellet)

X

From unfractionated faecal material X X X

Detergent buffer (urea, thiourea, CHAPS) X

Phosphate buffered saline (PBS) X X X

FastPrep FP120 (Savant) @ 0.1 mm zirconium

silicate beads

X

FastPrep 24 (MP Biomedicals) @ 0.1 mm

zirconium silicate beads

X X

1D PAGE NUPAGE 4-12% Bis-tris (Invitrogen); reducing

agent, sample buffer, and MES running buffer

(Biorad)

X X X X

Fractionation 20 gel pieces per gel lane X

3 gel pieces per gel lane, analysis focus on AMP X X X

In-gel digestion (Shevchenko et al. 2006) X X X X

Peptide clean-up GF/C filtration and OMIX tips X

C18 microspin columns X

HPLC-ESI-

MS/MS

Nano-HPLC Gradient run length (min): 50, 85, 100, and 120,

respectively

X X X X

Mass spectrometers LTQ-Orbitrap X

LTQ-Orbitrap XL X

Orbitrap Velos X

Velos Pro-Orbitrap Elite X

Materials & Methods

23

Table 6 continued

Data-analysis Peptide spectral

matching – search

algorithm

OMSSA (Geer et al. 2004) X X X X

X!Tandem (Craig and Beavis, 2004) X

q-value correction (Käll et al. 2008); filtering of

ambiguous hits

X

Peptide spectral

matching – database

Matched metagenomes, collection of bacterial

genomes (=synthetic metagenome) + unmatched

metagenome

X

Gut dedicated database (db) X

Extension of db in Study II X X

Separate search against subsets of db X

Search against complete set of sequences in db X X

Peptide grouping X

False discovery

rate(FDR) estimation

Concatenated target and decoy db

X X X

Separate target and decoy db X

FDR stringency 1 % X

5 % X X X

Functional

assignment

COG BLAST (Tatusov et al. 1997; Altschul et

al. 1997)

X X X

KEGG assignments through NCBInr BLAST

and subsequent MEGAN 5 analysis (Huson and

Weber, 2013)

X

EggNOG (Powell et al. 2014); Ublast (Edgar,

2010)

X

Functional grouping iPath (Letunic et al. 2008) X

microbial gastro-intestinal metabolic modules

GMM (Le Chatelier et al. 2013)

X

Taxonomic

assignment - Lowest

common ancestora

(LCA)

In-house python scripts – UniprotKB X

Unipept (Mesuere et al. 2012) X X

Quantification Progenesis LCMS (MS1 based) X

Spectral counting (MS/MS based)b X X X

a Peptide matching against UniprotKB entries with 100% identity; assignment of lowest taxonomic unit to which the

peptide was uniquely mapped; in each case all the hierarchical levels were reported allowing the sub-setting of the

data e.g. for bacterial peptides b Counting of the number of spectra identified per comparative unit e.g. KO or phylum; relative percentages in

Studies I - III; absolute numbers in Study IV

4.2.2 Phylogenetic microarray analysis

In addition to the metaproteomic data, phylogenetic data was available. The experiments for nucleic acid

based analysis were carried out by our research team (Study II and III) and collaborators (Study III, Study

IV). In Study II and IV, faecal DNA was extracted by mechanical lysis as described in (Salonen et al. 2010)

and in Study III with a modified Promega method as described in (Ahlroos and Tynkkynen, 2009).

Of all samples in Studies II-IV, with the exception of four samples in Study III, the Human Intestinal

Tract Chip (HITChip), a phylogenetic microarray described previously (Rajilić-Stojanović et al. 2009), was

used to identify the bacterial composition of the faecal samples. The results of the microbiota composition in

Materials & Methods

24

studies III and IV had been reported previously (Verdam et al. 2013; Lahti et al. 2013a). The HITChip

targets the V1 and V6 hypervariable regions of the 16S rRNA gene of over 1000 bacterial phylotypes.

Phylogenetic organization of the microarray probes and data pre-processing have been explained in detail

elsewhere (Rajilić-Stojanović et al. 2009; Salonen et al. 2010; Jalanka-Tuovinen et al. 2011). In Studies III

and IV, the probe signals were summarized with Robust Probabilistic Averaging algorithm (fRPA) (Lahti et

al. 2013b) per phylotype and further to 131 genus-like levels, which are groups of bacteria sharing ≥90%

sequence identity, and referred to with species name and relatives.

4.2.3 Statistical analysis

In studies II-IV, the statistical programming language R was used for statistical data analysis (R-core team).

Unsupervised methods

For the correlation of various data-sets either Pearson (Studies II and III) or Spearman correlation was used

(Study IV). The Jaccard Index (Levandowsky and Winter, 1971), in this case “number of shared

peptide/peptide union”, was used to compare the metaproteomes at the level of identified peptide sequences

(Study III). Complete linkage hierarchical clustering was applied to visualize sample relationships (Studies

II-IV). Principal component analysis (PCA) (Venables and Ripley, 2002) was used as the ordination method

to identify feature based clusters (Study II). In addition, principle coordinate analysis PCoA was used to

identify sample clusters based on distances (Paradis et al. 2004). Jackknife iteration was used to test the

statistical significance of the separation. Random forest analysis of ANOVA results was applied in Study II

(Breiman, 2011; Faraway. 2002). Q-value corrections were applied. Wilcoxon test was used to test for group

differences (Study IV).

Supervised method

Boots-trap aggregated redundancy analysis (bagged RDA), applied and described in (Jalanka-Tuovinen et al.

2014), was used to identify the NOGs explaining most of the clustering of the study groups in Study IV.

4.2.4 Additional data-sets used in this thesis

Study I Study II Study III Study IV

Metagenomic reads,

NCBI BioProject 33303,

samples NO1 and NO3

Phylogenetic microarray

data


data;

Symptom diary


data;

Clinical host variables

Results and Discussion

25

5. RESULTS AND DISCUSSION

The general objective of the work described in this thesis was to gain a functional insight into the intestinal

microbiota based on a metaproteomic analysis of human faecal material. For this purpose, appropriate

sample preparation methodologies and a data analysis pipeline had first to be developed. These analytical

strategies were applied to faecal samples collected from a total of 50 adults across four different studies. The

functional individuality of the microbiome and changes in it occurring over time were addressed. In addition,

possible modulations of these functions due to both a probiotic intervention and to obesity were followed.

While the emphasis was placed on addressing the microbial functionalities, the developed approach also

enabled the simultaneous analysis of the host functions.

5.1 Method development to study the human intestinal metaproteome (I, II and V)

At the start of this project only one faecal metaproteomic study had been published, describing the

metaproteome of baby faeces in the first 100 days of life (Klaassens et al. 2007). However, this first study

was based on the classical 2-DE approach, before the appearance of the high throughput LC-MS/MS

analytical technologies, and furthermore, it targeted baby microbiota which has a low complexity as

compared to the adult microbiome. Moreover, no reference databases were available for the intestinal

proteins. Hence, here a novel methodological pipeline had to be developed for analysing faeces obtained

from adults; i.e. starting from refining the methodologies for preparing faecal samples for LC-MS/MS

analysis and through different stages to finally devising a strategy for complex data-analysis. First, protein

extraction and fractionation methods needed to be developed, applicable for the analysis of several dozens to

hundreds of samples (Study II). Second, in the peptide spectral matching, as large a collection as possible of

target sequences was needed with which to construct an intestine-specific database. As the sequence analysis

of the intestinal metagenome has made very rapid progress, this database had to be updated (Studies I-IV).

Third, a set of different gene orthology systems were applied to functionally annotate peptides and proteins,

as no established system for the human intestinal proteins existed (Studies I-IV). Fourth, since previously the

phylogenetic origin of a peptide had not been inferred from intestinal proteins, this was undertaken for the

first time (Studies II-IV). Moreover, statistical approaches, like the Jaccard index and the integration of

metaproteomic and phylogenetic data, originating from different disciplines were adopted for

metaproteomics (Studies II-IV). Finally, a critical assessment of the methodologies used in human

metaproteomics studies was performed and future needs and directions of human metaproteomics were

proposed (Study V).

5.1.1 Analysis of the abundant metaproteome (II)

Mechanical lysis of microbial cells from adult faecal samples using a bead beating procedure had been

shown to generate high yields of DNA that were suitable for determining the phylogenetic composition of a

microbial ecosystem (Yu and Morrison, 2004). Hence, this method was applied to extract proteins from adult

faecal samples. Similarly to the amount used for DNA extraction applied in Studies II-IV, a total of 125 mg

faecal material was used as starting material (Salonen et al. 2010). Bead beating of the sample after three

fold dilution in PBS with 500 mg zirconium silicate beads, for cell disruption, and three glass beads, for

homogenization during the lysis process, under a gaseous nitrogen atmosphere resulted in protein

concentrations high enough to produce highly visible Coomassie stained protein bands appropriate for

consecutive nano-LC-MS/MS analysis.

Proteins were fractionated according to their molecular weight by 1-DE. In addition to the size

separation, this also provided a significant purification of the protein extracts and no further cleaning of the

sample was needed anymore as these often results in dilution and the subsequent necessity to concentrate the

samples. The 37 and 75 kDa bands of a pre-stained marker were used as molecular weight controls in the


26

production of three gel fractions: one containing proteins larger than 80 kDa, one with a molecular weight

range of approximately 40-80 kDa and one with proteins smaller than 40 kDa. The proteins contained in

these fractions were called High molecular weight MetaProteome HMP, Abundant MetaProteome (AMP),

and Low molecular weight MetaProteome (LMP), respectively. The proteins contained in the ~40-80 kDa

region were termed AMP as the majority of proteins were covered in this region. By comparing the three

molecular weight fractions, it was observed that 81.4% of the different proteins across all fractions were also

identified in the AMP (Study II, 6 biological samples). A further motivation to focus on this area was that it

may be difficult to identify high molecular weight proteins and the lower molecular weight fraction is likely

to contain ribosomal proteins and partially digested proteins. In Study II, where the approach was published

for the first time, AMP was referred to proteins present in the 35-80 kDa region. This terming was based on

few individuals’ samples. Later, when analysing dozens of different samples, it was noticed that cutting the

gels below the 35 kDa marker band may not always be convenient and a cutting above is required in order to

have sample intrinsic marker bands for cutting which might be slightly above the 37 kDa band of the pre-

stained marker. On average, in the AMP, almost 5500 unique peptides and 460 NOGs, as indicator for

functional depth, were identified in one LC-MS/MS experiment with the experimental specifications as

provided for Study IV (Table 6).

Due to the complexity of the sample matrix and the proteins present in faecal material one sample

preparation approach will always allow identification of only a subset of proteins present in the sample.

Therefore, recently, various methods have been described that could increase the number of proteins

identified in a metaproteomic experiment. Two main approaches are available for the identification of low-

abundance proteins in the intestinal metaproteomes, i) separating the faecal fractions and ii) fractionation of

proteins. Two recent studies described approaches to separately analyse bacterial and human proteins

(Lichtman et al. 2013; Xiong et al. 2014). Analysing the proteins present in faecal water identified 53 human

core proteins in a sample size of three individuals (Lichtman et al. 2013). This number corresponds to the 23

human KOs - which are a higher abstraction level than proteins - found to be the human functional core in

Study III (data analysis this thesis; core defined as number of KOs identified in at least one sample of each

individual). Therefore, the approach described here that is based on taking crude faecal material covers the

same range of human proteins as assessed in the so-called targeted approach. The tedious separation

described by (Xiong et al. 2014), which included several sample-preparation steps to separate human and

microbial cells from each other, appeared to double the number of identified microbial proteins by two-fold

compared to an approach analysing both fractions at once. Therefore, as found in other proteomics studies,

increasing the level of fractionation increases the level of identifications.

In addition to sample separation strategies, increasing the analysis time for one sample might increase

protein coverage. This can be achieved by undertaking an extensive fractionation of proteins or peptides. For

mouse faeces, filter assisted protein digestion (FASP), a protein digestion method which circumvents the in-

gel digestion, in combination with a gradient running for over eight hours, produced over 16,000 peptides in

four runs (Tanca et al. 2014). The gain in functional information compared to other approaches was not

detailed. The same group compared the effect of differential centrifugation and achieved an enrichment of

microbial peptides when separating bacterial cells from other faecal components by applying different

centrifugation speeds. The preparation of a microbial pellet resulted into the identification of approximately

6,000 microbial peptides, whereas when the unseparated material was taken only around 4,000 peptides

could be identified (Tanca et al. 2015). However, in the approach analysing the bacterial fraction, it appeared

that the Bacteroidaceae were underrepresented. When comparing these other studies to the approach

developed in this thesis (Study II), differences could be observed also at other steps of the pipeline, e.g. in

the method of lysing the cells, the analytical platform used, as well as in the databases used (Table 7). These

differences impair any direct comparison between the various studies. In general, the quantity and quality of

fractionation and time spent per sample differs depending on which combination of techniques is selected. At


27

the functional level, no major discrepancies appeared when comparing the different studies based on the

identified proteins (Study V).

Table 7. Brief overview of examples for published method combinations in faecal metaproteomic experiments.

Sample preparation Kolmeder et

al. 2012

Tanca et al. 2014 Erickson et al.

2011

Juste et al.

2014

Crude faecal material X

Bacterial pellet

X X X

Digestion

In-gel digestion X X

FASEP X

In-solution

X

Analytical platform

2-DE X

SCX X

LC-MSMS moderate gradient < 3h X

LC-MSMS Long gradient > 3h

X

Protein sequence database

Bacterial genomes X X X X

Metagenomes X X

5.1.2 Reproducibility testing of the pipeline (II)

The effect of sample preparation on the reproducibility of the LC-MS/MS results had to be determined in

order to allow its application for robust sample comparisons. For this purpose, MS1 profiles were used, since

with respect to the analytical outputs, they represent best the uniqueness of each in-gel digest. Replicates of

the protein extraction, in-gel digestion, and LC-MS/MS were taken for comparison. In the analyses, samples

from six biological samples were included representing three different individuals and two sampling time-

points.

Altogether, the comparison was performed with 26,000 MS1 events across 22 AMPs. Based on Pearson

correlation coefficients, the technical variation was compared to the variation of different samples from the

same individual and in samples from different individuals (Figure 4). The technical variation, including the

variation introduced by protein extraction, 1-DE/in-gel-digestion and LC-MS/MS analysis, was significantly

lower than the variation of samples from the same individual collected on two different days with at least

half a year time in-between. The variation observed in different samples from the same individual was

significantly lower than the variation observed in samples taken from different individuals. This provided

assurance that the developed approach could be applied in consecutive studies for comparing the

functionalities of the intestinal microbiota across samples.


28

Figure 4. Boxplots of the distribution of the Pearson correlation coefficients (y-axis) to assess the effect of sample

preparation, temporal and subject variation.

5.1.3 Development of in-house databases (I-IV)

One specific challenge in metaproteomics is the need for an adequate protein sequence database being a fair

representation of the vast variety of different proteins present in the sample. The intestinal microbiota with

the hundreds to thousands of different microbial species is an extreme version of this challenge. The

individuality of the intestinal metaproteome and the consecutive required targeting of the database were

demonstrated with metaproteome analysis of two samples for which the metagenomes had been analysed and

were available for PSM i.e. matched metagenomes (Study I). Protein sequence databases were constructed

by naïve translation of the metagenomes and these were used for PSM. By using the matched metagenome as

the protein sequence database for PSM, it was possible to obtain double as many identified peptides than if

one used the unmatched metagenome. In the same study, the following search strategy to increase the

number of identified peptides was proposed; i) to use a collection of intestinal bacterial genomes as a

database, called a synthetic metagenome, ii) perform a BLAST search with the hits from this search against a

large metagenome database, iii) construct from these BLAST hits a protein sequence database and finally iv)

perform PSM against this database and apply a conserved FDR of 1%. The aim was to increase per protein

the sequence coverage by identified peptides, not to identify a large variety of functions. This approach of

using an iterative workflow for PSM targeted the problem of impairing peptide identification on the basis of

established validation strategies when the search sequence space becomes extremely large. In a classical

proteomics experiment the database size is usually not larger than 10,000 sequences. However, the collection

of metagenomic sequences used here contained 3,300,000 sequences (Qin et al. 2010). The result of doubling

the number of identified peptides compared to the matched metagenome demonstrated the usefulness of the

approach that could be applied in cases where extensive protein coverage is needed. Recently, other

strategies for decreasing the size of the protein sequence database to aim for more specificity have been

proposed. In one of these approaches a first search was conducted in a complex protein sequence database

and a second search in a database constructed from the unfiltered results from the first search. The peptide

identifications from the second search were subjected to strict FDR filtering (Jagtap et al. 2013). In another

study, in a first search all the UniprotKB entries were used and then in a second search only the genomes of

those microbial strains for which initially a hit had been found (Tanca et al. 2015). A systemic comparison of

the available possibilities to increase the sensitivity of PSM in metaproteomics has yet to come.


29

Especially at the beginning of this work, having a matched metagenome of each analysed sample was

not feasible due to the high costs. Therefore, it was decided to construct a database based on publically

available sequences which would cover microbial, human and food derived proteins. During the course of

this thesis work, the first comprehensive metagenome study was published (Kurokawa et al., 2007) followed

by a human intestinal gene catalogue containing 3.3 Mio genes (Qin et al. 2010); the data of both studies was

included into the protein sequence database constructed in-house. In addition, the protein sequence database

contained a collection of intestinal microbial genomes; their number was initially 298 (Study II) but

increased to 594 as more genomes became available (Studies III & IV). The human genome was included as

well and extended for identifying alternative proteins in Studies III & IV. Finally, seven plant genomes were

included in the protein sequence database that could be beneficial for identifying proteins present in food.

The initial database contained 5,132,694 protein sequences (Study II) and the updated database 6,153,068

(Studies III & IV). The extension of the protein sequence database plays a possible role in the increased

identification ratios during the course of this thesis work (Table 8).

The largest collection of genes derived from the intestinal microbiota covers now more than 10,000,000

genes which may increase the ratio of identified peptides out of the metaproteomes (Karlsson et al. 2014).

However, as exemplified in a recent study where the effect was tested of the size of the protein sequence

database, it appeared that the optimum way of using this large sequence space still has to be determined

(Muth et al. 2015). The more unique proteins that are present in a protein sequence database, the higher is the

probability that peptides present in a sample will be identified due to the nature of PSM that allows only

100% similar identifications. Conversely, the recent study by Muth and co-authors revealed that by using the

present peptide validation strategy of FDR filtering, the sensitivity will suffer in the case of large databases,

hampering utilizing its full potential.

Table 8. Overview of MS/MS spectra and peptide identifications across studies

Study I *

50 min gradient

Study II**

85 min gradient

Study III **

Step-wise gradient

70min + 30min

Study IV**

120 min gradient

Number of MS/MS

spectra per sample

~45,000 ~4000 ~20,000 ~40,000

Identification ratio 7.9 - 11.9% ~17.0% ~24.0% ~27.0%

*Different approach than used in Studies II-IV, 20 gel pieces covering full molecular weight range, each analysed with

a 50 min gradient

** AMP; see Table 6 in Material and Methods for differences in mass spectrometers used and other parameters between

the studies

5.1.4 Functional annotation with COG, KEGG and eggNOG (I-IV)

Simply because of their extreme complexity and since the appropriate methods are still under development,

most metagenomic sequences have not been well annotated. Hence, a strategy of annotating the identified

proteins by orthologies was used. This orthology-based annotation facilitated the comparison between

samples. Due to the problem of protein inference, which is particularly pronounced in metaproteomics, it is

impossible to assign one protein solution for one peptide, the common unit in classical proteomics for

making comparison between samples. In addition, default settings of search algorithms only provide a

certain amount of proteins solutions for one peptide (e.g. 30 by OMSSA). However, as in the case of

glutamate dehydrogenase, over 500 proteins from different bacteria share at least one peptide and often more

peptides. Therefore, the initial search hits need to be remapped to the original protein sequence database with

all options per tryptic peptide recorded, and then proteins should be grouped according to their peptide

dependencies. In Study II proteins have been grouped based on having at least one peptide in common and

one unique peptide. This is computationally demanding and as no ready applicable software solution is

available for this task, it was only applied in Study II, with in-house scripts. Therefore, otherwise peptides


30

were functionally characterized based on the proteins they belonged to based on orthology and then the

spectra per orthological functional category summed for subsequent quantitative comparisons.

Out of the three orthological databases i.e. COG, KEGG and eggNOG, used for assigning a function to a

peptide with eggNOG (Study IV) the highest functional assignation rate was obtained. For more than 90% of

the spectra the respective proteins had a high confidence hit in the eggNOG database when applying a

UBlast search (e-value of 1E-30 or lower); reflecting the suitability of this classification database of

orthologous genes for annotating the human intestinal metaproteomes. The other databases COG and KEGG,

which were used in studies II and III, covered around 70% of the identified spectra. All these orthological

classification systems are based on a large collection of genomes from various sources and therefore they

span a very broad range of functions. The description of the mapped orthologies is not always readily

interpretable in the context of the human intestinal microbiota and may be viewed as rather abstract.

Therefore, GMMs have been developed (Le Chatelier et al. 2013) that organize KOs into intestinal specific

functional groups at three levels, such as mucin degradation or the degradation of carbohydrates, like

arabinose and xylose (Study IV). These are meaningful categories and assist in the interpretation of the

results. Characterization of protein functions can also be done based on the protein families database

(PFAM) (Finn et al. 2014). This domain based functional annotation was not applied in this thesis work but

has been previously used in metagenomic studies (Ellrott et al. 2010) and might be useful also in

metaproteomics for functional description of proteins and sub sequential quantitative analysis for

functionally targeted results.

5.1.5 Reflections on the metaproteomic approach

Taking into account that possibly 1000 microbial phylotypes are present in one individual intestine and that

each of these phylotypes has a gene capacity of at least 2000 proteins, several millions of proteins could be

present in one faecal sample. The proteins identified here obviously only reflect a very small fraction of the

total proteins present in the samples.

There are a few critical points that affect the success of a metaproteomic analysis. Sample preparation

and protein or peptide fractionation strategies cannot be complete. However, the reproducibility can be

assessed; based on a validated approach comparisons between samples can be made. In this work, the

proteins were fractionated by molecular size which was then utilized in their subsequent selection and

recovery. This process may have an effect on the identified proteins as only a subset of the proteins was

analysed. Moreover, the recovery of the trypsin-digested peptides from the gel may be biased and affected by

the amino acid sequences. In general, there exists no absolute bias-free proteomic technique, also not for

metaproteomics and there is still potential for identifying more functions. To be kept in mind is that both the

functional and taxonomic comparisons are based on identified spectra. This might introduce a further bias as

it is of course not known which functions the not-identified proteins may have. They can be similar to those

identified but they may also be different. Furthermore, a number of other parameters may introduce a bias,

such as differences between search engines and their capacity to differentiate between similar peptides: with

respect to the search results OMSSA for example is more conservative, tending to underrepresent, whereas

X!Tandem has a tendency to over-represent. In addition, the more information is demanded for a peptide

such as function and taxonomy, the smaller will become the remaining fraction for comparison; not all

peptides have a taxonomy assigned and of these not all have an assigned function and vice versa. Therefore,

improvements still can be made at all the various steps. The protein coverage needs to be increased by an

optimized sample preparation method and analytical platform. However, careful evaluation is required that

an accurate comparison between a large amount of samples is still possible. In addition, a cost-benefit

analysis for the database searches is required with regard to the choice of search parameters and database

size, all of which define the compute time and time spent on interpretation of the results, and peptide

identification sensitivity (Muth et al. 2015). Similarly, peptide and protein validation methods for

metaproteomics are required. Because of all these possible improvements, the here described approach


31

captures a fraction of the faecal proteins. However, it represents a feasible solution for comparing dozens of

samples including the functional and taxonomic characterization of the identified peptides. Although there

was a constant development of the metaproteomic pipeline throughout the project no major discrepancies

were observed. This indicates that the developed approach provides a reliable access to the major proteins

contained in faeces.

In Study II, protein quantification was based on MS1 data, in all other studies quantification was

performed by counting the identified spectra per comparative group e.g. specific phylum. Quantification

based on MS1 peaks might be more robust for identifying differences in protein abundances between

samples compared to spectral counting. However, applying the MS1 based approach in metaproteomics

generates two challenges. The large protein diversity across metaproteomes poses a challenge to the aligning

and normalizing algorithms of software for quantifying MS1 based data. Furthermore, the MS1 based

approach relies on assigning peptides to proteins; some proteins like glutamate-dehydrogenase are

conserved. It is very challenging to resolve several hundred proteins forming one protein cluster at the single

protein level. Today’s increased computer power make it more realistic that the standard application of MS1

based comparison in metaproteomics will come in the near future.

5.2 Intestinal metaproteomic baseline of healthy individuals and its variation over time

Having developed methods and strategies for faecal metaproteome analysis, the focus of this work was to

describe the proteins found in healthy individuals and to create a healthy metaproteomic baseline. Health is a

complex state defined by the WHO “as state of complete physical, mental and social well-being and not

merely the absence of disease or infirmity”. In Study III, people were asked to grade their health (“good”,

“average”, “poor”). Two individuals reported that they had average health but all others reported good

health. In the other studies, the participants were not asked to classify their health.

In Study III, in which samples of the largest healthy cohort studied for their metaproteome until now

were analysed, 103 microbial KOs were found in each of the 16 individuals at least once. This can be

considered as the functional core microbiome. The majority (62%) of these could be mapped on the KEGG

metabolic pathways (Figure 5). These core metabolic functions, which were mainly involved in carbohydrate

and amino acid, and energy metabolism, represent the backbone of the intestinal ecosystem. The coverage

appears less compared to what was found in Study II. This can be explained by the fact that in Study II only

three individuals were studied, whereas in Study III 16 individuals were studied. Similarly as observed for

phylogenetic analysis and metagenomes, the larger the cohort is, the smaller the number of shared species or

genes/functions might be (Jalanka-Tuovinen et al. 2011; Karlsson et al. 2014).


32

Fig

ure

5.

Met

abo

lic

pat

hw

ays

rep

rese

nti

ng

th

e m

etap

rote

om

ic c

ore

bas

ed o

n K

O a

ssig

nm

ents

as

asse

ssed

in

16

ad

ult

s (S

tud

y I

II)

(Let

un

ic e

t al

. 2

008

).


33

5.2.1 Individuality of the faecal metaproteome (II and III)

One of the main findings of this work was that the intestinal metaproteome of a human being is highly

individual. This extends previous findings on the personalized composition of the intestinal microbiota that

has been reported extensively (Li et al. 2014). It was observed that the individuality of the metaproteomes

remained both over the long term (one half to one year; based on MS1 level, Study II) and the short term (6

weeks with probiotic intervention; Study III). Moreover, the personalized functional microbiome could be

distinguished at both the peptide and KO level. This is an important finding as theoretically the individuality

at the peptide level could be explained by genetic differences leading to amino acid differences. However,

the observation that it was also found at the level of KOs, an abstract functional level, implies that there is a

functional significance of the individuality. The individual variation exists in the type and extent of the

cellular- and extracellular activities taking place in the intestinal ecosystem. These findings clearly go

beyond metagenomic findings. Classifying genes to high functional level categories like COG families has

revealed a rather homogenous distribution across individuals (Turnbaugh et al. 2009; Human Microbiome

Project Consortium, 2012). However, most metagenomic studies have functionally not delved very deep into

functional richness due to its obvious complexity.

5.2.2 Main microbial functionalities (II and III)

The main functionalities and special functions of the human intestinal microbiota explored by

metaproteomics in this work clearly represent specific features of the ecosystem. Many proteins being

involved in energy conversion were identified; these ranged from degrading both the initial food substrates,

like arabinose isomerase for plant material degradation, and host derived, like fucosyl-isomerase to degrade

fucose, up to the formation of intermediates such as butyrate and propionate by enzymes like butyryl-CoA

dehydrogenase and methylmalonly CoA mutase. Clearly, generating energy for the microbial metabolism

especially from carbohydrate sources appeared to be a characteristic of the intestinal microbiota, confirming

earlier (meta-)genome-based observations.

The maintenance of a redox-equilibrium is a very important task in an anaerobic environment.

Rubrerythin was detected in the metaproteomes analysed in Study II and this protein could be involved in

reducing oxidative damage by reducing peroxide to water. In addition, glutamate dehydrogenase was

identified from several hundred microbes (Studies II & III). This protein may act as electron sink (Study II).

The fact that high amounts of glutamate dehydrogenase were found highlights its functional importance as

determined in a computational approach in which the functions found in a human derived metagenome were

compared to metagenomes of other ecosystems (Ellrott et al. 2010). Furthermore, this demonstrates the

capability of this metaproteomic approach to capture functional information from several hundreds of

different species.

The synthesis of vitamins like folate and vitamin B12 could be identified since it was possible to detect

enzymes like formyltetrahydrofolate synthetase and cobalamin biosynthesis protein, respectively, in the

metaproteomes analysed in Studies II and III. This does not imply that these vitamins are necessarily

available to the host. However, it indicates that these vitamins may be required by the intestinal ecosystem.

Interestingly, patients with IBD had a higher prevalence of deficiencies of folate and vitamin B12 (Bermejo

et al. 2013). It is not to be excluded that the microbiota has a role in this phenomenon. In general, vitamin

B12 was identified earlier as important factor in the intestine as both the host and the majority of intestinal

microbes are requiring this vitamin, recently reviewed (Degnan et al. 2014).

In general, in Study III it was possible to identify temporal and individual differences in all of the

described functions. These identified microbial functions are evidence for the various metabolic activities

taking place in the intestine and the changes occurring upon daily variations. Those proteins might be used as

marker for the integrity of the microbial community.


34

In Study III, study participants had consumed a milk-based fruit drink with or without L. rhamnosus

GG. Although, immunological experiments had revealed an effect by the consumption of L. rhamnosus GG

(Kekkonen et al. 2008a), no changes to the functionalities were observed. A possible explanation for this

might be that there were more pronounced changes in the ileum which were diluted in faecal material and

could not be detected with the applied approach.

5.3 Host-microbiota interactions (II-IV)

The developed approach allowed identifying also human proteins. These constituted approximately 15% of

the identified spectra. Identification of bacterial and host proteins in parallel resulted in the identification of

various indications for host-microbiota interactions. Flagellins were detected in most of the intestinal

metaproteomes (Studies I-IV). They are microbial proteins that make the cell motile. Moreover, flagellins are

also known ligands of human toll-like receptor 5 and trigger the NFkB pathway (Vijay-Kumar et al. 2008).

In Study III with the pro-inflammatory serine/threonine protein kinase 4 a human protein could be found

significantly positively correlating with flagellins. This may be an example of a host-microbe interaction i.e.

the flagellins might be triggering the expression of serine/threonine kinase 4. Flagellins as component of the

intestinal metaproteome were also found in a recent case study identifying the faecal proteins of one lean and

one obese adolescent (Ferrer et al. 2013). In Study II, Eubacterium rectale and Roseburia intestinalis were

identified as the possible sources of the flagellins. New intestinal species possessing the ability to produce

flagellins have been recently identified, such as Lactobacillus ruminis (Neville et al. 2012). Elevated levels

of flagellin-antigens have been found in patients with Crohn’s disease and post-infective IBS (Schoepfer et

al. 2008). However, since they were detected also in healthy individuals, the presence of flagellins does not

necessarily need to lead to inflammation. On the contrary, these microbial proteins could also stimulate the

host in a beneficial way (Cullender et al. 2013).

Human intestinal alkaline phosphatase is another protein that is involved in the inflammatory response.

This protein was identified in the human metaproteome of several subjects (Studies II-IV). It was found in

high amounts in obese and was one of the NOGs explaining the separation of samples between the obese and

the non-obese group. It contributed up to 50% of the identified spectra with human origin. However, also in

non-obese individuals, the levels of this enzyme could be high. The human IAP can bind bacterial

lipopolysaccharides (LPS) and may therefore be controlling LPS-induced inflammation. Intestinal alkaline

phosphatase has been used as a therapeutic agent for several conditions such as sepsis and is a well-studied

enzyme. Several factors have been identified that affect its expression levels and enzyme activities, e.g. the

type of blood group or pH (Estaki et al. 2014). Intestinal alkaline phosphatase might represent a good marker

for regulative host processes allowing the gut to cope with changing environmental conditions.

The intestinal tract is an ecosystem rich in food derived proteins and host and microbial derived

proteins. Regulation of protein degradation is demonstrated by the identification of several proteinases but

also proteinase inhibitors. Several decades ago, tryptic activity was detected in faecal samples derived from

infants and therefore proteinases and their inhibitors were proposed to have a role in balancing intestinal

functions (Norin et al. 1985). As discussed in the literature summary of this thesis, proteinases are also

supposed to have another role in addition to proteolysis and it may be via these secondary roles that they

contribute to a certain disease phenotype.

Taken together, a combination of these proteins might be used as markers to monitor the health state of

the intestinal ecosystem i.e. they could be used as indicators of an intact or perturbed system.

Bacterial proteins degrading host derived glycans have been identified in many subjects (Study III), such

as fucose isomerase which degrades mammalian derived glycans. In addition, also the host enzyme trehalase

was identified; this is an enzyme that degrades trehalose, a disaccharide that is formed in bacteria upon stress

to dehydration and acts as a compatible solute. These findings further illustrate the mutualistic relationship

between the host and its microbes.


35

One example of indirect host-microbe interaction was observed in Study III in which two individuals

exhibited relatively lower levels of alpha amylase and high levels of Prevotella ssp. peptide counts in

comparison to the other individuals. This finding could indicate that these two individuals have consumed a

diet not dominated by starch but containing more complex carbohydrates enforcing the activity of Prevotella.

5.4 Comparing the metaproteome of lean and obese individuals (IV)

Within this work, the microbial composition and metaproteome of a cohort of 29 non-obese and obese were

analysed. In the samples deriving from the non-obese individuals, statistically significant higher levels of

Bacteroidetes 16S rRNA were found compared to the obese individuals. However, relatively more

Bacteroidetes peptides than 16S rRNA signals were obtained in the obese group compared to the non-obese

group.

It is unlikely that this is due to the greater faecal mass in obese individuals i.e. the relative share should

not be affected as protein and DNA amounts should be affected similarly in that case. One explanation could

be that there is an uncoupling between growth and activity, a well-known physiological phenomenon

recognized in industrially-utilized bacteria such as Lactococcus lactis cultured at low pH conditions (Goel et

al. 2015). It has been reported that the amount of SCFA were significantly higher in obese than in non-obese

subjects (Fernandes et al. 2014), and as a result the colonic pH values of the obese subjects are predicted to

be markedly lower. In addition, it was found that the growth rate of Bacteroides spp. was tightly controlled

and reduced at low pH and by the amount of SCFA (Duncan et al 2009). Hence, although the Bacteroides

spp. may simply be not growing to high abundance due to the low pH in the obese subjects, these bacteria

may still have very high activities.

Yet another explanation could be a technical bias due to the phylogenetic analysis approach used or an

incomplete coverage of the protein sequence database. Among the 957 cultured bacterial intestinal species

47% belong to the phylum Firmicutes and 10% to Bacteroidetes (Rajilić-Stojanović and de Vos, 2014). Out

of 398 strains at IMG/JGI for which there is an assigned origin in the digestive tract, 19% belong to the

phylum Bacteroidetes (Figure 6). It is possible that the 16S rRNA sequences of the “obese” Bacteroidetes

species are not fully covered by the oligonucleotide probes on the microarray. Alternatively, these “obese”

Bacteroidetes species may be overrepresented in the protein databases used. The effect of ethnicity on the

Bacteroidetes levels was revealed in a study on the microbiota and metagenome of obese compared to lean

individuals (Turnbaugh et al. 2009). Lean individuals with European ancestry had 1.1 times more

Bacteroidetes than their obese counterparts whereas the ratio was 1.3 for individuals of African ancestry.

This might be due to the fact that the lean individuals of African ancestry had significantly more

Bacteroidetes and less Firmicutes than the individuals of European ancestry. These findings exemplify the

difficulties which may arise in comparing groups e.g. according their BMI values.

The metaproteomes of the non-obese and obese group differed in a statistically significant way as shown

by a principle coordinate analysis of the identified NOGs. This demonstrated that there are considerable

differences in functionalities between the non-obese and obese group. Among the 25 NOGs which were

found to explain the separation between samples from the obese and the non-obese group, based on

baggedRDA analysis, several NOGs were involved in carbohydrate degradation.

Furthermore, the insulin sensitivity of the subjects, as measured with the HOMA index, correlated

negatively with a specific bacterial two component regulator. This is a further example of the link between

microbiota and the host, and it may represent a possible starting point to clarify the reasons of the metabolic

changes occurring in obesity.


36

Figure 6. Microbial phyla distribution of 1,847 microbial genomes at IMG/JGI with origin Homo sapiens (IMG/JGI

HS), 398 of those are from the sub-system digestive tract (IMG/JGI HS DT), and of 1057 cultured intestinal phylotypes

(1057). (EA = Euryarchaeota)

5.5 Phylogeny-specific metabolic activity – 16S rRNA genes versus peptides analysis

(II-IV)

The recent expansion of the protein databases as consequence of the efforts ongoing in sequencing microbial

genomes has provided a critical amount of microbial proteins sequences originating from a diverse range of

species which made referring a LCA to a peptide possible. Furthermore, phylogenetic microarray data and

metaproteomic data were compared to obtain a better understanding of which microbes were present and

what was their metabolic activity.

When comparing the phylum distributions measured by the phylogenetic microarray to the spectral

count data of phylum-specific peptides both agreeing and disagreeing results were obtained (Figure 7).

Differences in the averages per phylum were observed between Studies II, III and IV. Biological differences

in the study cohorts, the storage of the faecal material (Table 5; Bahl et al. 2012) as well as differences in the

analytical steps (Table 6) might have contributed to the observed differences.

Figure 7. Barplot of the distribution of the main bacterial phyla across three studies assessed by phylogenetic

microarray (16S) and metaproteomics analyses (pep).

0%

20%

40%

60%

80%

100%

16S pep 16S pep 16S pep 16S pep

Non-obese Obese

Study II Study III Study IV

Verrucomicrobia

Proteobacteria

Actinobacteria

Bacteroidetes

Firmicutes


37

The main functionalities were determined for the three major phyla found in the human intestine,

Firmicutes, Bacteroidetes, and Actinobacteria. Firmicutes appeared to be dominantly active in carbohydrate

metabolism and butyrate production, and Actinobacteria in sugar metabolism. The proteins produced by

Bacteroidetes showed a more even distribution of several functions. These findings were in agreement with

previous theoretical knowledge. Both in Study II and III, a large fraction of the identified peptides could not

be assigned to proteins with any known function. This implies that the functionalities of Bacteroidetes are

underrepresented in terms of cultured and sequenced species and hence need to be analysed more closely.

In Study III, peptides were filtered according their phylogenetic origin and the KOs of these specific

peptides were further studied. Specific focus was on the functionalities of Faecalibacterium prausnitzii,

Prevotella ssp. and Methanobrevibacter as these organisms vary greatly in their abundance and encode

distinct functions within the gut ecosystem. For F. prausnitzii for example glucuronate-isomerase was

identified implying F. prausnitzii’s involvement in carbohydrate metabolism. According to the KOs that

were identified based on Prevotella peptides these microbes contributed to the propionate synthesis in the

analysed metaproteomes as methylmalonyl-CoA mutase was identified. In addition, proteins involved in the

methane production derived from Methanobrevibacter were identified.

The bias which remains in finding a LCA for a peptide and drawing a conclusion on these findings is

that obviously not all peptides can be assigned. In this work the rate of assigning a LCA to a peptide was

around 70%. An absolute peptide matching is required for assigning a phylogenetic origin to a peptide.

However, this is impaired by the fact that many protein sequences still remain to be identified. Therefore, the

results have to be seen in the light of present knowledge both in the form of oligonucleotides represented on

the microarray as well as protein sequence entries in UniprotKB.

In Study II, where aligned and normalized MS1 data for proteins were available the proteins could be

correlated with the microarray data. Correlations for example between flagellin and Roseburia intestinalis

and Eubacterium rectale could be identified. These findings could be verified by a BLAST search of the

flagellin sequences. This highlights the fact that the integration of microarray and metaproteomic data makes

it possible to identify the organisms responsible for the production of a specific protein.

In Study I, it was noted that one of the individuals exhibited a high number of Akkermansia reads and

similarly a high number of Akkermansia derived peptides could be identified. In the other individual, hardly

any Akkermansia peptides were obtained. Akkermansia is so far the only member of the deeply rooted

phylum Verrocumicrobia found in human GIT and therefore a good example for comparing 16S rRNA and

peptide data.

5.6 Reflections on study cohorts

All the study participants but one (diabetes type II) were individuals without (chronic) disease. They had not

taken any antibiotics at least three months prior to sampling and did not suffer from any gastro-intestinal

diseases. Due to sample availability issues, some skewed age distribution was observed in the probiotic

intervention trial (Study III). In the obesity study (Study IV) there was a significant age difference between

the non-obese and the obese group. Generally, due to the small size of the cohorts, there was no even

distribution between male and female study participants.

With the exception of Study II which was a proof of principle study, all samples were derived from

collaborations for which a number of meta-data was already available. Therefore, there was no influence on

the availability of meta-data such as health-questionnaires or food diaries which were not collected. The

absence of food diaries impaired controlling for the impact of diet that can be significant. For instance,

without dietary information, it is difficult to account for the individual variation in some of the results

obtained here, such as why flagellins are detected in some individuals and not in others. Similarly, in Study

IV it remains unclear whether there is a per se difference in metabolism or if a difference in diet could

explain the separation of the intestinal metaproteome from obese and non-obese subjects.


38

In Study III, study participants filled a symptom diary that indicated showing normal health

perturbations such as loose stools or coughing.

There is a learning process encountered when arranging this kind of study. Studies with human subjects

depend very much on their willingness to participate and this may be lower in healthy subjects than in

individuals with a disease who may hope that their participation will confer a health benefit. Since scientific

interest in intestinal microbiota is still increasing, one can predict that larger studies also collecting faecal

material or other digestive tract compartments will be conducted in the future.

Conclusions and Future Perspectives

39

6. CONCLUSIONS AND FUTURE PERSPECTIVES

Microbiology has largely moved from light microscopy to what can be defined as “molecular microscopy”.

By determining the sequences of 16S rRNA genes or metagenomes, a much larger variety of microbial

communities has been discovered than ever would have been expected by looking at a microscopic picture or

analysing colonies on a plate. The continuous development of both proteomics and the human microbiome

field is reflected in the analytical capabilities of the different mass spectrometers used in this work, the

development of available algorithms and other tools, and the constant increase in the numbers of microbial

genes identified from the human intestine.

The immense increase in genomic information of the intestinal microbiota has greatly facilitated studies

at the protein level. However, the tremendous growth of sequence information has also hampered the

potential identifications because of the great number of variations. Hence, it underlined the need for the

development of new data analysis solutions. Proteomic software, such as is available for de novo sequencing,

may further be developed to identify peptides in complex mixtures without the need for gene knowledge.

Moreover, almost each year of this thesis work, an updated and faster or better mass spectrometer was

introduced.

The metaproteomic approach devised here is a detailed description of one solution for assessing the

proteins contained in faecal material; as discussed, several other approaches are feasible. With regard to the

sample preparation, the development in metaproteomics has been parallel to that encountered in

metagenomics. First, several different approaches have been proposed and now a systematic comparison is

being undertaken (Salonen et al. 2010; Lozupone et al. 2013; Franzosa et al. 2014). An important point to

keep in mind is that we are at the beginning of understanding the intestinal microbiome in its entirety by

applying methods producing complex data-sets. These present studies highlighted the variety and complexity

of the intestinal microbiome. After these initial studies, a refinement and homogenization of the analytical

tools has started. In the field of metaproteomics, a systematic comparison of sample preparation

methodologies has still to emerge and there are many important questions needing to be addressed. What is

the effect of sampling? What is the effect of freezing and is there an effect of storage time? Which is better:

using a bacterial pellet or utilizing crude faecal material? How should the lysis be performed: by mechanical

disruption, chemical lysis, or a combination of these techniques? What is the best protein-fractionation

method? In one study, the question of appropriate study material has now been addressed and the results

indicate that by using a bacterial pellet one can recover more microbial proteins but with a bias in the

recovered phyla (Tanca et al. 2015). It is to be expected that further methodological harmonization based on

a comprehensive evaluation will accelerate the speed in acquiring new knowledge about the intestinal

microbiota and its involvement in health and disease.

Another approach that may be important in the future is the use of stable isotope probing. This approach

can be coupled to dietary interventions with food components labelled with stable isotopes; this would make

it possible to analyse trophic chains. Similarly, as it has been done for 16S rRNA analysis in an intestinal

model and proteins in aquatic model systems, this technique could be applied to follow the human-microbial

interactions in humans based by measuring the incorporation rates of these substrates into proteins.

Two solutions for coping with protein inference that complicates quantitative comparisons between

samples were applied. In Study II grouping proteins according their relatedness based on at least one shared

peptide was used to resolve PSM results on the protein level. Another solution was devised by assigning

peptides to functional groups and performing quantitative comparisons on that level. However, what is still

needed is to define the functional-depth at which meaningful differences between treatment/health-state

groups are found; if this is the individual protein deriving from a certain strain or species, a group of proteins

sharing same peptides or proteins grouped to a specific function. Functional grouping of proteins as provided

by the orthologous gene databases such as COG or eggNOG or the recently developed GMM are very useful

in providing an overview of the functions taking place in a very complex ecosystem. One step further to the


40

defining of GMMs, grouping KOs to intestinal relevant functional categories, would be environment-specific

annotation of human intestinal metagenomes. This requires experimental proofing of the predicted functions

in combination with network analyses to test whether the predicted functions fit into the theoretical model.

There is an overall awareness of the gene richness encoding for a large variety of functions of our

microbes; and their importance for a multitude of body functions. However, there is still much to be learned

of our microbes and us, in a world with usually a surplus of every food item. The mechanisms of interactions

between host and microbiota are poorly understood. In the case of energy metabolism, recent research has

opened up the possibly involved molecules and even genes. Bile acids are depending on the microbiota

differently modified and these regulate the expression of the fxr gene, responsible for regulating host bile

acid production. We need to concentrate again more on microbial physiology. Future efforts are to be taken

to get to know yet un-cultured microbes and learn about their functions. Another possibility, closer to the in

vivo situation, is to fractionate the intestinal microbes into distinct subsets, e.g. according to surface

structure, and then to analyse their proteins.

There is a large unknown gap between the potential and real functionalities of the intestinal microbiota;

only half of the sequenced genes in metagenomic studies can be mapped to a gene homologue with a known

function. The numbers of proteins detected in one faecal metaproteomics experiment are several factors of

magnitude less than the theoretical number that could be detected. Some methods for fractionating the

metaproteome have been described above. A further option for better capturing the full variety of proteins

would be to utilize peptide ligand libraries as suggested recently (Righetti et al. 2015). Peptide ligand

libraries could help to identify the less abundant proteins.

Due to the multitude of factors affecting human metabolism and the composition and functions of our

microbes, careful metadata collection, as well as phylogenetic and functional measurements, are required if

we are to make meaningful associations. Data generated in such a way can be used to make predictions based

on profiling data and it can be applied for diagnosis of an individual’s health state or the decision what would

be the most successful treatment. As shown in this work, we have a personalized metaproteome and there is

temporal variation of its functions. By conducting longitudinal studies, well-controlled for possible

influencing factors, we may be able to determine which microbial set-ups are associated with which and

which effectors change the property and if so, to which extent.

Although the main purpose of metaproteomics is not to identify which organisms are present in the

microbial community but instead to study its function, in this work the method of assigning a phylogenetic

origin to a peptide was found useful in the handling of such complex data. Sub-setting the identifications into

host and bacterial proteins revealed component specific proteins.

Phylogenetic analyses complemented the metaproteomic data in this work. Although both

sequencing and proteomics techniques developed tremendously during the course of this work, biases from

both sides remain, complicating the one-to-one comparison. Until these biases are further tackled, any

comparisons have to be viewed as approximations. A major divergence was observed between phylogenetic

microarray analysis and metaproteomics for members of the genus Bacteroides. Whereas on the 16S rRNA

level Bacteroides spp. were apparently significantly less abundant in obese individuals, relatively more

Bacteroides peptides were identified from the obese individuals. While the simple explanation is that the

Bacteroides spp. are more active in the obese subjects, the question arises why they are then not simply

outgrowing and become more abundant. Consequential studies may provide further evidence that there is an

uncoupling between growth and activity.

There is a perceived need for studying the proteins in the upper regions of the GIT in addition to the

studies so far mostly focused on faecal material. One study used lavage to sample proteins along the intestine

(Li et al. 2011). However, this study focused on extracellular proteins and no attempts were made to lyse the

microbes that could have been present. Sampling in a way comparably to the one carried out in a rat study, in

which the metaproteome of different intestinal compartments were analysed (Haange et al. 2012), would


41

contribute to describe in humans the spatial functional differences along – from proximal to distal, and across

the intestinal tract – from epithelium to lumen.

We have learned by using techniques of modern molecular biology and keen interest in the ongoing

activities inside of our body that we require our microbes and our microbes require us for keeping a healthy

body condition. Careful study of the vast information on the intestinal microbiota and expanding it with

targeted new studies in a team of clinicians, microbiologists, nutrition scientists and bioinformaticians, will

sharpen our insights of the complex mechanisms ongoing inside us.

Acknowledgements

42

7. ACKNOWLEDGEMENTS

This thesis work was carried out at the Department of Veterinary Biosciences, University of Helsinki and the

Finnish Centre of Excellence in Microbial Food Safety Research. It was part of the TEKES project

“Innovations for Intestinal Health” implemented in the Faculty of Veterinary Medicine. This work was also

supported by the Academy of Finland and the Doctoral Programme in Food Chain and Health. A special

thank you to Dean Professor Antti Sukura and the Head of the Department Professor Airi Palva for providing

state-of-the-art facilities; everything needed was available.

I want to close this work by answering one question I was asked several times, especially in the beginning of

this work: Why Finland? The reason is simple - I got this project here which I felt fitted me, and which

privileges me to thank all of you to whom I am grateful for the best imaginable support:

My team of three supervisors: You displayed trust in me by giving me this project. You also believed in me

that I would finish it. Airi, I am grateful that I always could enjoy your support. Kiitos kovasti ja anna

anteeksi että en puhu suomea hyvin! Willem, your ideas were a very important driver of the project and your

enthusiasm helped through difficult times. Your positive challenging attitude allowed me to grow. Anne, you

have had many roles in this project: supervisor, office companion, friend. You have supported me with help

on a very broad scale. Thank you!

Koen Venema and Kirsi Savijoki are thanked for their critical comments on my thesis. You gave me the

chance to think through some aspects one more time.

I want to thank Ewen MacDonald for his kind suggestions to improve the English of this thesis.

This project produced some external hard-drives full of data which could only be handled with the help of

our bioinformatics J-forces. Janne and Jarkko – both of you always had a patient explanation of R and

statistics, now thanks to your pedagogical skills I’m happily using basic R-commands routinely. Jarkko - you

got me started with problem-solving of R-scripts and the knowledge that it is usually the way of reading the

data which causes the problem not the functions. Jarkko and Jarmo - you have been handling my huge tables,

solved several software issues and contributed substantially to the data analysis in the second half of the

project. Thank you!

This project allowed me to collaborate with many people both abroad and in Finland whom I want to thank

for their contributions, especially for the provision of such fine facilities (state of the art mass-spectrometers)

and the analysis tools so essential in this work: Sjef - for substantial help in getting the project started, Koos

and Peter - for the joint writing of the first manuscript of this thesis. Ilja - you not only conducted the LC-

MS/MS measurement, you even provided a courier service relating to the work for the second article. Thank

you also for the various discussions on technical aspects of LC-MS/MS.

I want to thank Jaana Mättö for making possible the collaboration with the Red Cross and for being a

member of my PhD follow-up group.

After mostly travelling to get my samples into the LC-MSMS, I finally obtained a very local collaborator

thanks to you Salla and Markku. Markku - thank you also very much for supporting me in your role as a

follow-up group member and posing some new proteomics challenges.

Acknowledgements

43

Mark - thank you very much for giving the project a Bioinformatics boost; thank you also for the numerous

(non-)proteomics chats.

I want to thank all the numerous co-authors not named individually. Thank you very much for your input.

I’m grateful for the travel financing that I received, both to communicate with people from the proteomics

and microbiology community.

Many people of the Finnish proteomics community are thanked for basic practical support.

This thesis is here also due to the efforts of many people, efforts made before I even started this project: The

supervisors of my Bachelor’s and Master’s thesis, especially Professor Hannelore Daniel and Professor Karl-

Heinz Engel, guided my first scientific steps; your inspiration and knowledge transfer planted the seed that I

should proceed to study for a PhD; the time at the NRC and people there were very important in giving me a

foundation in proteomics allowing me to start this thesis work. Thank you!

All the peer PhD-students are thanked for sharing the burdens and joys of doing a PhD. Especially Lotta for

always having a very good reason to come to Turku and making me familiar with Mölkky, Jing for all the

chats about science and life in general and your enormous scientific enthusiasm, and Katri for your very

positive spirit.

Outi - you have been an un-replaceable lab-assistance and you taught me how to give instructions in the right

way. Thank you for the Mökki experience and socialising me in the first part of the project.

The Mikro people are thanked for the chats during lunch and coffee breaks as well as the day-to-day support.

I want to thank the FiDiPro steering group for giving me the opportunity for periodical reflection of my

work.

This thesis required a lot of computer resources. Timo, vielen Dank für die stets prompte Hilfe in all den

Computerschwierigkeiten und dafür, dass du mir eine Software nach der anderen installiert hast. Vielen

Dank für die deutschen Sprüche!

Despite a language barrier once in a while there weren’t too many bureaucratic hurdles; Maija, Ritva and

Tarja - thank you for the smooth handling of these matters and the language support. Laila, thank you for the

help related to graduate school matters. Tiina - thank you very much for your kind help with guiding me

through the dissertation process.

I want to thank the graduated doctors for proving to me that graduation is possible. Especially Tanja and

Jonna are thanked for answering all my questions about how to get the thesis ready and the numerous

simultaneous translations during all these years.

All the late-night and weekend-workers are thanked for giving some background noise during the more

lonely times in the lab.

Katrin und Philipp mit Felix und Isabella: Aus unseren wöchentlichen Lauftreffs sind (Halb-)Jahres

Familientreffen geworden. Danke für eure Freundschaft!

Acknowledgements

44

Gabi und Robert: Danke für eure Unterstützung aus der Ferne und dass wir uns in München immer

willkommen fühlen.

Mama, Papa und Christoph: Der Ort meines Doktorvorhabens ist nicht der naheste zu meiner

niederbayerischen Heimat. Trotzdem habt ihr mich in meinem Vorhaben stets unterstützt wie in all meinen

Wissens-Bestrebungen. DANKE!

Unaussprechlicher Dank an Jochen und Iida: Mit euch bin ich da, wo ich sein mag und die, die ich sein mag.

Helsinki, April 2015

Carolin Kolmeder

8. REFERENCES

Ahlroos, T., Tynkkynen, S., 2009. Quantitative

strain-specific detection of Lactobacillus

rhamnosus GG in human faecal samples by

real-time PCR. J. Appl. Microbiol. 106, 506-

514.

Allmer, J., 2011. Algorithms for the de novo

sequencing of peptides from tandem mass

spectra. Expert. Rev. Proteomics 8, 645-657

Arsène-Ploetze, F., Bertin, P.N., Carapito, C.,

2014. Proteomic tools to decipher microbial

community structure and functioning.

Environ. Sci. Pollut. Res. Int.

Altschul, S.F., Madden, T.L., Schäffer, A.A.,

Zhang, J., Zhang, Z., Miller, W., Lipman,

D.J., 1997. Gapped BLAST and PSI-

BLAST: a new generation of protein

database search programs. Nucleic Acids

Res. 25, 3389-3402.

Arumugam, M., Raes, J., Pelletier, E., Le Paslier,

D., Yamada, T., Mende, D.R., Fernandes,

G.R., Tap, J., Bruls, T., Batto, J.M., et al.,

2011. Enterotypes of the human gut

microbiome. Nature 473, 174-180.

Bahl, M.I., Bergström, A., Licht, T.R., 2012.

Freezing fecal samples prior to DNA

extraction affects the Firmicutes to

Bacteroidetes ratio determined by

downstream quantitative PCR analysis.

FEMS Microbiol. Lett. 329, 193-197.

Barbhuiya, M.A., Sahasrabuddhe, N.A., Pinto,

S.M., Muthusamy, B., Singh, T.D.,

Nanjappa, V., Keerthikumar, S., Delanghe,

B., Harsha, H., Chaerkady, R., et al., 2011.

Comprehensive proteomic analysis of human

bile. Proteomics 11, 4443-4453.

Ben-Amor, K., Heilig, H., Smidt, H., Vaughan,

E.E., Abee, T., de Vos, W.M., 2005. Genetic

diversity of viable, injured, and dead fecal

bacteria assessed by fluorescence-activated

cell sorting and 16S rRNA gene analysis.

Appl. Environ. Microbiol. 71, 4679-4689.

Benndorf, D., Balcke, G.U., Harms, H., Von

Bergen, M., 2007. Functional metaproteome

analysis of protein extracts from

contaminated soil and groundwater. ISME J.

1, 224-234.

Bergström, A., Skov, T.H., Bahl, M.I., Roager,

H.M., Christensen, L.B., Ejlerskov, K.T.,

Mølgaard, C., Michaelsen, K.F., Licht, T.R.,

2014. Establishment of intestinal microbiota

during early life: a longitudinal, explorative

study of a large cohort of Danish infants.

Appl. Environ. Microbiol. 80, 2889-2900.

Bermejo, F., Algaba, A., Guerra, I., Chaparro, M.,

De-La-Poza, G., Valer, P., Piqueras, B.,

Bermejo, A., García-Alonso, J., Pérez, M.J.,

et al., 2013. Should we monitor vitamin B12

and folate levels in Crohn's disease patients?

Scand. J. Gastroenterol. 48, 1272-1277.

Berry, D., Stecher, B., Schintlmeister, A.,

Reichert, J., Brugiroux, S., Wild, B., Wanek,

W., Richter, A., Rauch, I., Decker, T., et al.,

2013. Host-compound foraging by intestinal

microbiota revealed by single-cell stable

isotope probing. Proc. Natl. Acad. Sci. U S A

110, 4720-4725.

Biagi, E., Nylund, L., Candela, M., Ostan, R.,

Bucci, L., Pini, E., Nikkilä, J., Monti, D.,

Satokari, R., Franceschi, C., et al., 2010.

Through ageing, and beyond: gut microbiota

and inflammatory status in seniors and

centenarians. PLoS One 5, e10667.

Booijink, C.C., El-Aidy, S., Rajilić-Stojanović,

M., Heilig, H.G., Troost, F.J., Smidt, H.,

Kleerebezem, M., de Vos, W.M., Zoetendal,

E.G., 2010. High temporal and inter-

individual variation detected in the human

ileal microbiota. Environ. Microbiol. 12,

3213-3227.

Bookstaver, P.B., Ahmed, Y., Millisor, V.E.,

Siddiqui, W., Albrecht, H., 2013.

Clostridium difficile: case report and concise

review of fecal microbiota transplantation. J.

S. C. Med. Assoc. 109, 62-66.

Breiman, L., 2001. Random forests. Machine

Learning 45, 5-32.

Cain, J.A., Solis, N., Cordwell, S.J., 2014. Beyond

gene expression: The impact of protein post-

translational modifications in bacteria. J.

Proteomics 97, 265-286.

Cantarel, B.L., Erickson, A.R., Verberkmoes,

N.C., Erickson, B.K., Carey, P.A., Pan, C.,

Shah, M., Mongodin, E.F., Jansson, J.K.,

Fraser-Liggett, C.M., et al., 2011. Strategies

for metagenomic-guided whole-community

proteomics of complex microbial

environments. PLoS One 6, e27173.

Cohen, D.P., Renes, J., Bouwman, F.G.,

Zoetendal, E.G., Mariman, E., de Vos, W.M.,

Vaughan, E.E., 2006. Proteomic analysis of

log to stationary growth phase Lactobacillus

plantarum cells and a 2-DE database.


Craig, R., Beavis, R.C., 2004. TANDEM:

matching proteins with tandem mass spectra.

Bioinformatics 20, 1466-1467.

Cullender, T.C., Chassaing, B., Janzon, A.,

Kumar, K., Muller, C.E., Werner, J.J.,

Angenent, L.T., Bell, M.E., Hay, A.G.,

Peterson, D.A., et al., 2013. Innate and

adaptive immunity interact to quench

microbiome flagellar motility in the gut. Cell

Host Microbe 14, 571-581.

Cummings, J. H., 1975. The colon: absorptive,

secretory and metabolic functions. Digestion

13, 232-240.

Cummings, J.H., Macfarlane, G.T., 1991. The

control and consequences of bacterial

fermentation in the human colon. J. Appl.

Bacteriol. 70, 443-459.

Cummings, J.H., Wiggins, H.S., Jenkins, D.J.,

Houston, H., Jivraj, T., Drasar, B.S., Hill,

M.J., 1978. Influence of diets high and low

in animal fat on bowel habit, gastrointestinal

transit time, fecal microflora, bile acid, and

fat excretion. J. Clin. Invest. 61, 953-963.

Daniel, H., Gholami, A.M., Berry, D.,

Desmarchelier, C., Hahne, H., Loh, G.,

Mondot, S., Lepage, P., Rothballer, M.,

Walker, A., et al., 2014. High-fat diet alters

gut microbiota physiology in mice. ISME J.

8, 295-308.

David, L.A., Maurice, C.F., Carmody, R.N.,

Gootenberg, D.B., Button, J.E., Wolfe, B.E.,

Ling, A.V., Devlin, A.S., Varma, Y.,

Fischbach, M.A., et al., 2014. Diet rapidly

and reproducibly alters the human gut

microbiome. Nature 505, 559-563.

de Vos, W.M., de Vos, E.A.J., 2012. Role of the

intestinal microbiome in health and disease:

from correlation to causation. Nutr. Rev. 70,

S45-S56.

Degnan, P.H., Taga, M.E., Goodman, A.L., 2014.

Vitamin B 12 as a Modulator of Gut

Microbial Ecology. Cell Metab. 20, 769-778.

Derrien, M., Vaughan, E.E., Plugge, C.M., de

Vos, W.M., 2004. Akkermansia muciniphila

gen. nov., sp. nov., a human intestinal mucin-

degrading bacterium. Int. J. Syst. Evol.

Microbiol. 54, 1469-1476.

Dewulf, E.M., Cani, P.D., Claus, S.P., Fuentes, S.,

Puylaert, P.G., Neyrinck, A.M., Bindels,

L.B., de Vos, W.M., Gibson, G.R., Thissen,

J.P., et al., 2013. Insight into the prebiotic

concept: lessons from an exploratory, double

blind intervention study with inulin-type

fructans in obese women. Gut 62, 1112-

1121.

Dimidi, E., Christodoulides, S., Fragkos, K.C.,

Scott, S.M., Whelan, K., 2014. The effect of

probiotics on functional constipation in

adults: a systematic review and meta-analysis

of randomized controlled trials. Am. J. Clin.

Nutr. 100, 1075-1084.

dos Santos, F.B., de Vos, W.M., Teusink, B.,

2013. Towards metagenome-scale models for

industrial applications—the case of Lactic

Acid Bacteria. Curr. Opin. Biotechnol. 24,

200-206.

dos Santos, V.M., Müller, M., De Vos, W.M.,

2010. Systems biology of the gut: the

interplay of food, microbiota and host at the

mucosal interface. Curr. Opin. Biotechnol.

21, 539-550.

Duncan, S.H., Louis, P., Thomson, J.M., Flint,

H.J., 2009. The role of pH in determining the

species composition of the human colonic

microbiota. Environ. Microbiol. 11, 2112-

2122.

Edgar, R.C., 2010. Search and clustering orders of

magnitude faster than BLAST.

Bioinformatics 26, 2460-2461.

Ellrott, K., Jaroszewski, L., Li, W., Wooley, J.C.,

Godzik, A., 2010. Expansion of the protein

repertoire in newly explored environments:

human gut microbiome specific protein

families. PLoS Comput. Biol. 6, e1000798.

Erickson, A.R., Cantarel, B.L., Lamendella, R.,

Darzi, Y., Mongodin, E.F., Pan, C., Shah,

M., Halfvarson, J., Tysk, C., Henrissat, B., et

al., 2012. Integrated metagenomics/

metaproteomics reveals human host-

microbiota signatures of Crohn's disease.

PLoS One 7, e49138.

Estaki, M., DeCoffe, D., Gibson, D.L., 2014.

Interplay between intestinal alkaline

phosphatase, diet, gut microbes and

immunity. World J. Gastroenterol. 20,

15650-15656.

Everard, A., Cani, P.D., 2013. Diabetes, obesity

and gut microbiota. Best. Pract. Res. Clin.

Gastroenterol. 27, 73-83.

Faraway, J.J., 2002. Practical Regression and

ANOVA using R. Published on the URL:

http://www.cran.rproject.org/doc/contrib/Far

away-PRA.pdf.

Faust, K., Sathirapongsasuti, J.F., Izard, J.,

Segata, N., Gevers, D., Raes, J.,

Huttenhower, C., 2012. Microbial co-

occurrence relationships in the human

microbiome. PLoS Computat. Biol. 8,

e1002606.

Feig, M.A., Hammer, E., Völker, U., Jehmlich,

N., 2013. In-depth proteomic analysis of the

human cerumen—a potential novel

diagnostically relevant biofluid. J.


Fernandes, J., Su, W., Rahat-Rozenbloom, S.,

Wolever, T., Comelli, E., 2014. Adiposity,

gut microbiota and faecal short chain fatty

acids are linked in adult humans. Nutr

Diabetes 4, e121.

Ferrer, M., Ruiz, A., Lanza, F., Haange, S.,

Oberbach, A., Till, H., Bargiela, R., Campoy,

C., Segura, M.T., Richter, M., et al., 2013.

Microbiota from the distal guts of lean and

obese adolescents exhibit partial functional

redundancy besides clear differences in

community structure. Environ. Microbiol.

15, 211-226.

Finn, R.D., Bateman, A., Clements, J., Coggill, P.,

Eberhardt, R.Y., Eddy, S.R., Heger, A.,

Hetherington, K., Holm, L., Mistry, J., et al.,

2014. Pfam: the protein families database.

Nucleic Acids Res. 42, D222-30.

Finucane, M.M., Sharpton, T.J., Laurent, T.J.,

Pollard, K.S., 2014. A taxonomic signature

of obesity in the microbiome? Getting to the

guts of the matter. PloS One 9, e84689.

Foell, D., Wittkowski, H., Roth, J., 2009.

Monitoring disease activity by stool

analyses: from occult blood to molecular

markers of intestinal inflammation and

damage. Gut 58, 859-868.

Franzosa, E.A., Morgan, X.C., Segata, N.,

Waldron, L., Reyes, J., Earl, A.M.,

Giannoukos, G., Boylan, M.R., Ciulla, D.,

Gevers, D., et al., 2014. Relating the

metatranscriptome and metagenome of the

human gut. Proc. Natl. Acad. Sci. U.S.A.

111, E2329-38.

Gaci, N., Borrel, G., Tottey, W., O’Toole, P.W.,

Brugère, J., 2014. Archaea and the human

gut: New beginning of an old story. World J.

Gastroenterol: WJG 20, 16062.

Galperin, M.Y., Makarova, K.S., Wolf, Y.I.,

Koonin, E.V., 2015. Expanded microbial

genome coverage and improved protein

family annotation in the COG database.


Gao, X., Pujos-Guillot, E., Martin, J., Galan, P.,

Juste, C., Jia, W., Sebedio, J., 2009.

Metabolite analysis of human fecal water by

gas chromatography/mass spectrometry with

ethyl chloroformate derivatization. Anal.

Biochem. 393, 163-175.

Geer, L.Y., Markey, S.P., Kowalak, J.A., Wagner,

L., Xu, M., Maynard, D.M., Yang, X., Shi,

W., Bryant, S.H., 2004. Open mass

spectrometry search algorithm. J. Proteome

Res 3, 958-964.

Georges, A.A., El-Swais, H., Craig, S.E., Li,

W.K., Walsh, D.A., 2014. Metaproteomic

analysis of a winter to spring succession in

coastal northwest Atlantic Ocean microbial

plankton. ISME J. 8, 1301-1313.

Gerritsen, J., Smidt, H., Rijkers, G.T., de Vos,

W.M., 2011. Intestinal microbiota in human

health and disease: the impact of probiotics.

Genes Nutr. 6, 209-240.

Gill, S.R., Pop, M., Deboy, R.T., Eckburg, P.B.,

Turnbaugh, P.J., Samuel, B.S., Gordon, J.I.,

Relman, D.A., Fraser-Liggett, C.M., Nelson,

K.E., 2006. Metagenomic analysis of the

human distal gut microbiome. Science 312,

1355-1359.

Glatter, T., Ludwig, C., Ahrne, E., Aebersold, R.,

Heck, A.J., Schmidt, A., 2012. Large-scale

quantitative assessment of different in-

solution protein digestion protocols reveals

superior cleavage efficiency of tandem Lys-

C/trypsin proteolysis over trypsin digestion.

J. Proteome Res. 11, 5145-5156.

Goel, A., Eckhardt, T.H., Puri, P., Jong, A.,

Santos, F.B., Giera, M., Fusetti, F., de Vos,

W.M., Kok, J., Poolman, B., et al., 2015.

Protein costs do not explain evolution of

metabolic strategies and regulation of

ribosomal content. Mol. Microbiol.

Graf, D., Di Cagno, R., Fåk, F., Flint, H.J.,

Nyman, M., Saarela, M., Watzl, B., 2015.

Contribution of diet to the composition of the

human gut microbiota. Microb. Ecol. Health

Dis. 26.

Graham, C., McMullan, G., Graham, R.L.J., 2011.

Proteomics in the microbial sciences.

Bioeng. Bugs, 17-30.

Guthals, A., Bandeira, N., 2012. Peptide

identification by tandem mass spectrometry

with alternate fragmentation modes. Mol.

Cell. Proteomics 11, 550-557.

Haange, S., Oberbach, A., Schlichting, N.,

Hugenholtz, F., Smidt, H., von Bergen, M.,

Till, H., Seifert, J., 2012. Metaproteome

analysis and molecular genetics of rat

intestinal microbiota reveals section and

localization resolved species distribution and

enzymatic functionalities. J. Proteome Res.

11, 5406-5417.

Hampton-Marcell, J.T., Moormann, S.M., Owens,

S.M., Gilbert, J.A., 2013. Preparation and

metatranscriptomic analyses of host-microbe

systems. Methods Enzymol. 531, 169-185.

Hansen, S.H., Stensballe, A., Nielsen, P.H.,

Herbst, F., 2014. Metaproteomics:

Evaluation of protein extraction from

activated sludge. Proteomics 14, 2535-2539.

Harden, C.J., Perez-Carrion, K., Babakordi, Z.,

Plummer, S.F., Hepburn, N., Barker, M.E.,

Wright, P.C., Evans, C.A., Corfe, B.M.,

2012. Evaluation of the salivary proteome as

a surrogate tissue for systems biology

approaches to understanding appetite. J.


Hercberg, S., Galan, P., Preziosi, P., Bertrais, S.,

Mennen, L., Malvy, D., Roussel, A., Favier,

A., Briançon, S., 2004. The SU. VI. MAX

Study: a randomized, placebo-controlled trial

of the health effects of antioxidant vitamins

and minerals. Arch. Intern. Med. 164, 2335-

2342.

Huang, J., Wang, F., Ye, M., Zou, H., 2014.

Enrichment and separation techniques for

large-scale proteomics analysis of the protein

post-translational modifications. J.

Chromatogr. A. 1372, 1-17.

Human Microbiome Jumpstart Reference Strains

Consortium, Nelson, K.E., Weinstock, G.M.,

Highlander, S.K., Worley, K.C., Creasy,

H.H., Wortman, J.R., Rusch, D.B., Mitreva,

M., Sodergren, E., Chinwalla, A.T.,

Feldgarden, M., Gevers, D., Haas, B.J.,

Madupu, R., Ward, D.V., Birren, B.W.,

Gibbs, R.A., Methe, B., Petrosino, J.F.,

Strausberg, R.L., Sutton, G.G., White, O.R.,

Wilson, R.K., Durkin, S., Giglio, M.G.,

Gujja, S., Howarth, C., Kodira, C.D.,

Kyrpides, N., Mehta, T., Muzny, D.M.,

Pearson, M., Pepin, K., Pati, A., Qin, X.,

Yandava, C., Zeng, Q., Zhang, L., Berlin,

A.M., Chen, L., Hepburn, T.A., Johnson, J.,

McCorrison, J., Miller, J., Minx, P.,

Nusbaum, C., Russ, C., Sykes, S.M.,

Tomlinson, C.M., Young, S., Warren, W.C.,

Badger, J., Crabtree, J., Markowitz, V.M.,

Orvis, J., Cree, A., Ferriera, S., Fulton, L.L.,

Fulton, R.S., Gillis, M., Hemphill, L.D.,

Joshi, V., Kovar, C., Torralba, M.,

Wetterstrand, K.A., Abouellleil, A., Wollam,

A.M., Buhay, C.J., Ding, Y., Dugan, S.,

FitzGerald, M.G., Holder, M., Hostetler, J.,

Clifton, S.W., Allen-Vercoe, E., Earl, A.M.,

Farmer, C.N., Liolios, K., Surette, M.G., Xu,

Q., Pohl, C., Wilczek-Boney, K., Zhu, D.,

2010. A catalog of reference genomes from

the human microbiome. Science 328, 994-

999.

Human Microbiome Project Consortium, 2012.

Structure, function and diversity of the

healthy human microbiome. Nature 486,

207-214.

Huson, D.H., Weber, N., 2013. Microbial

community analysis using MEGAN.

Methods Enzymol. 531, 465-485.

Integrative HMP (iHMP) Research Network

Consortium, 2014. The Integrative Human

Microbiome Project: dynamic analysis of

microbiome-host omics profiles during

periods of human health and disease. Cell

Host Microbe 16, 276-289.

Jagtap, P., Goslinga, J., Kooren, J.A., McGowan,

T., Wroblewski, M.S., Seymour, S.L.,

Griffin, T.J., 2013. A two‐step database

search method improves sensitivity in

peptide sequence matches for

metaproteomics and proteogenomics studies.


Jalanka-Tuovinen, J., Salojärvi, J., Salonen, A.,

Immonen, O., Garsed, K., Kelly, F.M.,

Zaitoun, A., Palva, A., Spiller, R.C., de Vos,

W.M., 2014. Faecal microbiota composition

and host-microbe cross-talk following

gastroenteritis and in postinfectious irritable

bowel syndrome. Gut 63, 1737-1745.

Jalanka-Tuovinen, J., Salonen, A., Nikkilä, J.,

Immonen, O., Kekkonen, R., Lahti, L.,

Palva, A., de Vos, W.M., 2011. Intestinal

microbiota in healthy adults: temporal

analysis reveals individual and common core

and relation to intestinal symptoms. PLoS

One 6, e23035.

Jarocki, V.M., Tacchi, J.L., Djordjevic, S.P.,

2014. Non‐proteolytic functions of microbial

proteases increases pathological complexity.


Jensen, L.J., Julien, P., Kuhn, M., von Mering, C.,

Muller, J., Doerks, T., Bork, P., 2008.

eggNOG: automated construction and

annotation of orthologous groups of genes.


Jeong, K., Kim, S., Bandeira, N., 2012. False

discovery rates in spectral identification.

BMC Bioinformatics 13 Suppl 16, S2.

Jewett, M.C., Miller, M.L., Chen, Y., Swartz, J.R.,

2009. Continued protein synthesis at low

[ATP] and [GTP] enables cell adaptation

during energy limitation. J. Bacteriol. 191,

1083-1091.

Jiménez, E., Fernández, L., Marín, M.L., Martín,

R., Odriozola, J.M., Nueno-Palop, C.,

Narbad, A., Olivares, M., Xaus, J.,

Rodríguez, J.M., 2005. Isolation of

commensal bacteria from umbilical cord

blood of healthy neonates born by cesarean

section. Curr. Microbiol. 51, 270-274.

Jiménez-Girón, A., Ibáñez, C., Cifuentes, A.,

Simó, C., Muñoz-González, I., Martín-

Álvarez, P.J., Bartolome, B., Moreno

Arribas, M.V., 2014. Faecal metabolomic

fingerprint after moderate consumption of

red wine by healthy subjects. J. Proteome

Res. 14, 897-905.

Joslin, E.P., 1901. The Influence of Bile on

Metabolism. J. Exp. Med. 5, 513-525.

Jost, T., Lacroix, C., Braegger, C., Chassard, C.,

2014. Stability of the maternal gut

microbiota during late pregnancy and early

lactation. Curr. Microbiol. 68, 419-427.

Jumpertz, R., Le, D.S., Turnbaugh, P.J., Trinidad,

C., Bogardus, C., Gordon, J.I., Krakoff, J.,

2011. Energy-balance studies reveal

associations between gut microbes, caloric

load, and nutrient absorption in humans. Am.

J. Clin. Nutr. 94, 58-65.

Juste, C., Kreil, D.P., Beauvallet, C., Guillot, A.,

Vaca, S., Carapito, C., Mondot, S., Sykacek,

P., Sokol, H., Blon, F., et al., 2014. Bacterial

protein signals are associated with Crohn's

disease. Gut 63, 1566-1577.

Käll, L., Storey, J.D., Noble, W.S., 2008. Non-

parametric estimation of posterior error

probabilities associated with peptides

identified by tandem mass spectrometry.

Bioinformatics 24, i42-8.

Kam, S.Y., Hennessy, T., Chua, S.C., Gan, C.S.,

Philp, R., Hon, K.K., Lai, L., Chan, W.H.,

Ong, H.S., Wong, W.K., et al., 2011.

Characterization of the human gastric fluid

proteome reveals distinct pH-dependent

protein profiles: implications for biomarker

studies. J. Proteome Res. 10, 4535-4546.

Kanehisa, M., Goto, S., Kawashima, S., Okuno,

Y., Hattori, M., 2004. The KEGG resource

for deciphering the genome. Nucleic Acids

Res. 32, D277-80.

Kanehisa, M., Goto, S., Sato, Y., Furumichi, M.,

Tanabe, M., 2012. KEGG for integration and

interpretation of large-scale molecular data

sets. Nucleic Acids Res. 40, D109-D114.

Karlsson, F.H., Tremaroli, V., Nookaew, I.,

Bergström, G., Behre, C.J., Fagerberg, B.,

Nielsen, J., Bäckhed, F., 2013. Gut

metagenome in European women with

normal, impaired and diabetic glucose

control. Nature 498, 99-103.

Karlsson, F.H., Nookaew, I., Nielsen, J., 2014.

Metagenomic Data Utilization and Analysis

(MEDUSA) and Construction of a Global

Gut Microbial Gene Catalogue. PLoS

Comput. Biol. 10, e1003706.

Kassinen, A., Krogius-Kurikka, L., Mäkivuokko,

H., Rinttila, T., Paulin, L., Corander, J.,

Malinen, E., Apajalahti, J., Palva, A., 2007.

The fecal microbiota of irritable bowel

syndrome patients differs significantly from

that of healthy subjects. Gastroenterology

133, 24-33.

Keiblinger, K.M., Wilhartitz, I.C., Schneider, T.,

Roschitzki, B., Schmid, E., Eberl, L., Riedel,

K., Zechmeister-Boltenstern, S., 2012. Soil

metaproteomics - Comparative evaluation of

protein extraction protocols. Soil Biol.

Biochem. 54, 14-24.

Kekkonen, R.A., Lummela, N., Karjalainen, H.,

Latvala, S., Tynkkynen, S., Jarvenpaa, S.,

Kautiainen, H., Julkunen, I., Vapaatalo, H.,

Korpela, R., 2008a. Probiotic intervention

has strain-specific anti-inflammatory effects

in healthy adults. World J. Gastroenterol. 14,

2029-2036.

Kekkonen, R.A., Sysi-Aho, M., Seppanen-Laakso,

T., Julkunen, I., Vapaatalo, H., Oresic, M.,

Korpela, R., 2008b. Effect of probiotic

Lactobacillus rhamnosus GG intervention on

global serum lipidomic profiles in healthy

adults. World J. Gastroenterol. 14, 3188-

3194.

Kim, J., An, H.J., Garrido, D., German, J.B.,

Lebrilla, C.B., Mills, D.A., 2013. Proteomic

analysis of Bifidobacterium longum subsp.

infantis reveals the metabolic insight on

consumption of prebiotics and host glycans.

PloS One 8, e57535.

Klaassens, E.S., de Vos, W.M., Vaughan, E.E.,

2007. Metaproteomics approach to study the

functionality of the microbiota in the human

infant gastrointestinal tract. Appl. Environ.

Microbiol. 73, 1388-1392.

Koonin, E.V., Fedorova, N.D., Jackson, J.D.,

Jacobs, A.R., Krylov, D.M., Makarova, K.S.,

Mazumder, R., Mekhedov, S.L., Nikolskaya,

A.N., Rao, B.S., et al., 2004. A

comprehensive evolutionary classification of

proteins encoded in complete eukaryotic

genomes. Genome Biol. 5, R7.

Koponen, J., Laakso, K., Koskenniemi, K.,

Kankainen, M., Savijoki, K., Nyman, T.A.,

de Vos, W.M., Tynkkynen, S., Kalkkinen,

N., Varmanen, P., 2012. Effect of acid stress

on protein expression and phosphorylation in

Lactobacillus rhamnosus GG. J. Proteomics

75, 1357-1374.

Korpela, K., Flint, H.J., Johnstone, A.M., Lappi,

J., Poutanen, K., Dewulf, E., Delzenne, N.,

de Vos, W.M., Salonen, A., 2014. Gut

microbiota signatures predict host and

microbiota responses to dietary interventions

in obese individuals. PloS One 9, e90702.

Kort, R., Caspers, M., van de Graaf, A., van

Egmond, W., Keijser, B., Roeselers, G.,

2014. Shaping the oral microbiota through

intimate kissing. Microbiome 2, 41.

Kortman, G.A., Raffatellu, M., Swinkels, D.W.,

Tjalsma, H., 2014. Nutritional iron turned

inside out: intestinal stress from a gut

microbial perspective. FEMS Microbiol.

Rev. 38, 1202-1234.

Koskenniemi, K., Koponen, J., Kankainen, M.,

Savijoki, K., Tynkkynen, S., de Vos, W.M.,

Kalkkinen, N., Varmanen, P., 2009.

Proteome analysis of Lactobacillus

rhamnosus GG using 2-D DIGE and mass

spectrometry shows differential protein

production in laboratory and industrial-type

growth media. J. Proteome Res. 8, 4993-

5007.

Koskenniemi, K., Laakso, K., Koponen, J.,

Kankainen, M., Greco, D., Auvinen, P.,

Savijoki, K., Nyman, T.A., Surakka, A.,

Salusjarvi, T., de Vos, W.M., Tynkkynen, S.,

Kalkkinen, N., Varmanen, P., 2011.

Proteomics and transcriptomics

characterization of bile stress response in

probiotic Lactobacillus rhamnosus GG. Mol.

Cell. Proteomics 10, M110.002741.

Kovatcheva‐Datchary, P., Egert, M., Maathuis,

A., Rajilić‐Stojanović, M., De Graaf, A.A.,

Smidt, H., De Vos, W.M., Venema, K.,

2009. Linking phylogenetic identities of

bacteria to starch fermentation in an in vitro

model of the large intestine by RNA‐based

stable isotope probing. Environ. Microbiol.

11, 914-926.

Kurokawa, K., Itoh, T., Kuwahara, T., Oshima,

K., Toh, H., Toyoda, A., Takami, H., Morita,

H., Sharma, V.K., Srivastava, T.P., et al.,

2007. Comparative metagenomics revealed

commonly enriched gene sets in human gut

microbiomes. DNA Res. 14, 169-181.

Lahti, L., Salonen, A., Kekkonen, R.A., Salojärvi,

J., Jalanka-Tuovinen, J., Palva, A., Orešič,

M., de Vos, W.M., 2013a. Associations

between the human intestinal microbiota,

Lactobacillus rhamnosus GG and serum

lipids indicated by integrated analysis of

high-throughput profiling data. PeerJ 1, e32.

Lahti, L., Torrente, A., Elo, L.L., Brazma, A.,

Rung, J., 2013b. A fully scalable online pre-

processing algorithm for short

oligonucleotide microarray atlases. Nucleic

Acids Res. 41, e110.

Lallès, J., 2014. Intestinal alkaline phosphatase:

novel functions and protective effects. Nutr.

Rev. 72, 82-94.

Lam, H., Deutsch, E.W., Eddes, J.S., Eng, J.K.,

Stein, S.E., Aebersold, R., 2008. Building

consensus spectral libraries for peptide

identification in proteomics. Nat. Methods 5,

873-875.

Land, M., Hauser, L., Jun, S., Nookaew, I., Leuze,

M.R., Ahn, T., Karpinets, T., Lund, O., Kora,

G., Wassenaar, T., et al., 2015. Insights from

20 years of bacterial genome sequencing.

Funct. Integr. Genomics, 1-21.

Lawley, T.D., Walker, A.W., 2013. Intestinal

colonization resistance. Immunology 138, 1-

11.

Le Chatelier, E., Nielsen, T., Qin, J., Prifti, E.,

Hildebrand, F., Falony, G., Almeida, M.,

Arumugam, M., Batto, J., Kennedy, S., et al.,

2013. Richness of human gut microbiome

correlates with metabolic markers. Nature

500, 541-546.

LeBlanc, J.G., Milani, C., de Giori, G.S., Sesma,

F., van Sinderen, D., Ventura, M., 2013.

Bacteria as vitamin suppliers to their host: a

gut microbiota perspective. Curr. Opin.

Biotechnol. 24, 160-168.

Lee, J.H., Karamychev, V.N., Kozyavkin, S.A.,

Mills, D., Pavlov, A.R., Pavlova, N.V.,

Polouchine, N.N., Richardson, P.M.,

Shakhova, V.V., Slesarev, A.I., et al., 2008.

Comparative genomic analysis of the gut

bacterium Bifidobacterium longum reveals

loci susceptible to deletion during pure

culture growth. BMC Genomics 9, 247.

Le Maréchal, C., Peton, V., Plé, C., Vroland, C.,

Jardin, J., Briard-Bion, V., Durant, G.,

Chuat, V., Loux, V., Foligné, B., et al., 2015.

Surface proteins of Propionibacterium

freudenreichii are involved in its anti-

inflammatory properties. Journal of

proteomics 113, 447-461.

Letunic, I., Yamada, T., Kanehisa, M., Bork, P.,

2008. iPath: interactive exploration of

biochemical pathways and networks. Trends

Biochem. Sci. 33, 101-103.

Levandowsky, M., Winter, D., 1971. Distance

between sets. Nature 234, 34-35.

Li, J., Jia, H., Cai, X., Zhong, H., Feng, Q.,

Sunagawa, S., Arumugam, M., Kultima, J.R.,

Prifti, E., Nielsen, T., et al., 2014. An

integrated catalog of reference genes in the

human gut microbiome. Nat. Biotechnol. 32,

834-841.

Li, T., Chiang, J.Y., 2015. Bile acids as metabolic

regulators. Curr. Opin. Gastroenterol. 31,

159-165.

Li, X., LeBlanc, J., Truong, A., Vuthoori, R.,

Chen, S.S., Lustgarten, J.L., Roth, B., Allard,

J., Ippoliti, A., Presley, L.L., et al., 2011. A

metaproteomic approach to study human-

microbial ecosystems at the mucosal luminal

interface. PLoS One 6, e26542.

Lichtman, J.S., Marcobal, A., Sonnenburg, J.L.,

Elias, J.E., 2013. Host-centric proteomics of

stool: a novel strategy focused on intestinal

responses to the gut microbiota. Mol. Cell.


Liebler, D.C., Zimmerman, L.J., 2013. Targeted

quantitation of proteins by mass

spectrometry. Biochemistry (N.Y.) 52, 3797-

3806.

Lindenstrauß, A.G., Behr, J., Ehrmann, M.A.,

Haller, D., Vogel, R.F., 2013. Identification

of fitness determinants in Enterococcus

faecalis by differential proteomics. Arch.

Microbiol. 195, 121-130.

Louis, P., Young, P., Holtrop, G., Flint, H.J.,

2010. Diversity of human colonic butyrate‐producing bacteria revealed by analysis of

the butyryl‐CoA: acetate CoA‐transferase

gene. Environ. Microbiol. 12, 304-314.

Lozupone, C.A., Stombaugh, J., Gonzalez, A.,

Ackermann, G., Wendel, D., Vazquez-Baeza,

Y., Jansson, J.K., Gordon, J.I., Knight, R.,

2013. Meta-analyses of studies of the human

microbiota. Genome Res. 23, 1704-1714.

Machiels, K., Joossens, M., Sabino, J., De Preter,

V., Arijs, I., Eeckhaut, V., Ballet, V., Claes,

K., Van Immerseel, F., Verbeke, K., et al.,

2014. A decrease of the butyrate-producing

species Roseburia hominis and

Faecalibacterium prausnitzii defines

dysbiosis in patients with ulcerative colitis.

Gut 63, 1275-1283.

Mahowald, M.A., Rey, F.E., Seedorf, H.,

Turnbaugh, P.J., Fulton, R.S., Wollam, A.,

Shah, N., Wang, C., Magrini, V., Wilson,

R.K., et al., 2009. Characterizing a model

human gut microbiota composed of members

of its two dominant bacterial phyla. Proc.

Natl. Acad. Sci. U.S.A. 106, 5859-5864.

Marco, M.L., de Vries, M.C., Wels, M.,

Molenaar, D., Mangell, P., Ahrne, S., de

Vos, W.M., Vaughan, E.E., Kleerebezem,

M., 2010. Convergence in probiotic

Lactobacillus gut-adaptive responses in

humans and mice. ISME J. 4, 1481-1484.

Martens, E.C., Kelly, A.G., Tauzin, A.S., Brumer,

H., 2014. The devil lies in the details: how

variations in polysaccharide fine-structure

impact the physiology and evolution of gut

microbes. J. Mol. Biol. 426, 3851-3865.

McNulty, N.P., Wu, M., Erickson, A.R., Pan, C.,

Erickson, B.K., Martens, E.C., Pudlo, N.A.,

Muegge, B.D., Henrissat, B., Hettich, R.L.,

et al., 2013. Effects of diet on resource

utilization by a model human gut microbiota

containing Bacteroides cellulosilyticus WH2,

a symbiont with an extensive glycobiome.

PLoS Biol. 11, e1001637.

Mesuere, B., Devreese, B., Debyser, G., Aerts,

M., Vandamme, P., Dawyndt, P., 2012.

Unipept: Tryptic Peptide-Based Biodiversity

Analysis of Metaproteome Samples. J.

Proteome Res. 11, 5773-5780.

Michaels, J.A., Dasari, S., Pereira, L., Reddy,

A.P., Lapidus, J.A., Lu, X., Jacob, T.,

Thomas, A., Rodland, M., Roberts, C.T., et

al., 2007. Comprehensive proteomic analysis

of the human amniotic fluid proteome:

gestational age-dependent changes. J.

Proteome Res. 6, 1277-1285.

Michalik, S., Bernhardt, J., Otto, A., Moche, M.,

Becher, D., Meyer, H., Lalk, M., Schurmann,

C., Schlüter, R., Kock, H., et al., 2012. Life

and death of proteins: a case study of

glucose-starved Staphylococcus aureus. Mol.


Moreno-Indias, I., Cardona, F., Tinahones, F.J.,

Queipo-Ortuño, M.I., 2014. Impact of the gut

microbiota on the development of obesity

and type 2 diabetes mellitus. Front.

Microbiol. 5.

Morgan, X.C., Segata, N., Huttenhower, C., 2012.

Biodiversity and functional genomics in the

human microbiome. Trends Genet. 29, 51-

58.

Murayama, K., Fujimura, T., Morita, M., Shindo,

N., 2001. One‐step subcellular fractionation

of rat liver tissue using a Nycodenz density

gradient prepared by freezing‐thawing and

two‐dimensional sodium dodecyl sulfate

electrophoresis profiles of the main fraction

of organelles. Electrophoresis 22, 2872-2880.

Muth, T., Kolmeder, C.A., Salojärvi, J., Keskitalo,

S., Varjosalo, M., Verdam, F.J., Rensen,

S.S., Reichl, U., Vos, W.M., Rapp, E., 2015.

Navigating through metaproteomics data–a

logbook of database searching. Proteomics.

Muth, T., Behne, A., Heyer, R., Kohrs, F.,

Benndorf, D., Hoffmann, M., Lehtevä, M.,

Reichl, U., Martens, L., Rapp, E., 2015. The

MetaProteomeAnalyzer: a powerful open-

source software suite for metaproteomics

data analysis and interpretation. J. Proteome

Res. 14, 1557-1565.

Muth, T., Peters, J., Blackburn, J., Rapp, E.,

Martens, L., 2013. ProteoCloud: A full-

featured open source proteomics cloud

computing pipeline. J. Proteomics 88, 104-

108.

Nabokina, S.M., Inoue, K., Subramanian, V.S.,

Valle, J.E., Yuasa, H., Said, H.M., 2014.

Molecular identification and functional

characterization of the human colonic

thiamine pyrophosphate transporter. J. Biol.

Chem. 289, 4405-4416.

Nagarajan, N., Pop, M., 2013. Sequence assembly

demystified. Nat. Rev. Genet. 14, 157-167.

Neijssel, O.M., Mattos, M., 1994. The energetics

of bacterial growth: a reassessment. Mol.

Microbiol. 13, 179-182.

Neilson, K.A., Ali, N.A., Muralidharan, S.,

Mirzaei, M., Mariani, M., Assadourian, G.,

Lee, A., Van Sluyter, S.C., Haynes, P.A.,

2011. Less label, more free: approaches in

label‐free quantitative mass spectrometry.


Nesvizhskii, A.I., 2014. Proteogenomics:

concepts, applications and computational

strategies. Nat. Methods 11, 1114-1125.

Neville, B.A., Forde, B.M., Claesson, M.J.,

Darby, T., Coghlan, A., Nally, K., Ross,

R.P., O’Toole, P.W., 2012. Characterization

of pro-inflammatory flagellin proteins

produced by Lactobacillus ruminis and

related motile Lactobacilli. PloS One 7,

e40592.

Nguyen, T.L., Vieira-Silva, S., Liston, A., Raes,

J., 2015. How informative is the mouse for

human gut microbiota research? Dis. Model.

Mech. 8, 1-16.

Nielsen, H.B., Almeida, M., Juncker, A.S.,

Rasmussen, S., Li, J., Sunagawa, S., Plichta,

D.R., Gautier, L., Pedersen, A.G., Le

Chatelier, E., 2014. Identification and

assembly of genomes and genetic elements

in complex metagenomic samples without

using reference genomes. Nat. Biotechnol.

32, 822-828.

Norin, K., Gustafsson, B., Lindblad, B., Midtvedt,

T., 1985. The establishment of some

microflora associated biochemical

characteristics in feces from children during

the first years of life. Acta Paediatrica 74,

207-212.

Norman, J.M., Handley, S.A., Virgin, H.W., 2014.

Kingdom-agnostic metagenomics and the

importance of complete characterization of

enteric microbial communities.

Gastroenterology 146, 1459-1469.

Nylund, L., Satokari, R., Salminen, S., de Vos,

W.M., 2014. Intestinal microbiota during

early life–impact on health and disease. Proc.

Nutr. Soc. 73, 457-469.

O’Keefe, S.J.D, Li, J.V., Lahti, L., Ou, J.,

Carbonero, F., Mohammed, K., Posma, P.M.,

Kinross, J., Wahl, E., Ruder, E., et al., 2015.

Fat, Fiber and Cancer Risk in African

Americans and Rural Africans. Nature

Comm., 6.

Otto, A., Becher, D., Schmidt, F., 2014.

Quantitative proteomics in the field of

microbiology. Proteomics 14, 547-565.

Ouwerkerk, J.P., de Vos, W.M., Belzer, C., 2013.

Glycobiome: bacteria and mucus at the

epithelial interface. Best Pract. Res. Clin.

Gastroenterol. 27, 25-38.

Oveland, E., Muth, T., Rapp, E., Martens, L.,

Berven, F.S., Barsnes, H., 2015. Viewing the

proteome: How to visualize proteomics data?

Proteomics.

Panda, S., Casellas, F., Vivancos, J.L., Cors,

M.G., Santiago, A., Cuenca, S., Guarner, F.,

Manichanh, C., 2014. Short-Term Effect of

Antibiotics on Human Gut Microbiota. PloS

One 9, e95476.

Paradis, E., Claude, J., Strimmer, K., 2004. APE:

Analyses of Phylogenetics and Evolution in

R language. Bioinformatics 20, 289-290.

Paulo, J.A., Kadiyala, V., Banks, P.A., Conwell,

D.L., Steen, H., 2013. Mass spectrometry-

based quantitative proteomic profiling of

human pancreatic and hepatic stellate cell

lines. Genomics Proteomics Bioinformatics

11, 105-113.

Pearson, J.R., Gill, C.I., Rowland, I.R., 2009.

Diet, fecal water, and colon cancer–

development of a biomarker. Nutr. Rev. 67,

509-526.

Penders, J., Thijs, C., Vink, C., Stelma, F.F.,

Snijders, B., Kummeling, I., van den Brandt,

P.A., Stobberingh, E.E., 2006. Factors

influencing the composition of the intestinal

microbiota in early infancy. Pediatrics 118,

511-521.

Powell, S., Forslund, K., Szklarczyk, D.,

Trachana, K., Roth, A., Huerta-Cepas, J.,

Gabaldon, T., Rattei, T., Creevey, C., Kuhn,

M., et al., 2014. eggNOG v4.0: nested

orthology inference across 3686 organisms.


Qin, J., Li, R., Raes, J., Arumugam, M., Burgdorf,

K.S., Manichanh, C., Nielsen, T., Pons, N.,

Levenez, F., Yamada, T., et al., 2010. A

human gut microbial gene catalogue

established by metagenomic sequencing.

Nature 464, 59-65.

Qin, J., Li, Y., Cai, Z., Li, S., Zhu, J., Zhang, F.,

Liang, S., Zhang, W., Guan, Y., Shen, D., et

al., 2012. A metagenome-wide association

study of gut microbiota in type 2 diabetes.

Nature 490, 55-60.

Quercia, S., Candela, M., Giuliani, C., Turroni, S.,

Luiselli, D., Rampelli, S., Brigidi, P.,

Franceschi, C., Bacalini, M.G., Garagnani,

P., et al., 2014. From lifetime to evolution:

timescales of human gut microbiota

adaptation. Front. Microbiol. 5.

Rajilić-Stojanović, M., de Vos, W.M., 2014. The

first 1000 cultured species of the human

gastrointestinal microbiota. FEMS

Microbiol. Rev. 38, 996-1047.

Rajilić-Stojanović, M., Heilig, H.G., Molenaar,

D., Kajander, K., Surakka, A., Smidt, H., de

Vos, W.M., 2009. Development and

application of the human intestinal tract chip,

a phylogenetic microarray: analysis of

universally conserved phylotypes in the

abundant microbiota of young and elderly

adults. Environ. Microbiol. 11, 1736-1751.

Rampelli, S., Candela, M., Turroni, S., Biagi, E.,

Collino, S., Franceschi, C., O'Toole, P.W.,

Brigidi, P., 2013. Functional metagenomic

profiling of intestinal microbiome in extreme

ageing. Aging (Albany NY) 5, 902.

Rappé, M.S., Giovannoni, S.J., 2003. The

uncultured microbial majority. Annu. Rev.

Microbiol. 57, 369-394.

R-core team, R: A language and environment for

statistical computing. R Foundation for

Statistical Computing, Vienna, Austria. URL

http://www.R-project.org/.

Reichardt, N., Duncan, S.H., Young, P.,

Belenguer, A., Leitch, C.M., Scott, K.P.,

Flint, H.J., Louis, P., 2014. Phylogenetic

distribution of three pathways for propionate

production within the human gut microbiota.

ISME J. 8, 1323-1335.

Reichhardt, M.P., Jarva, H., de Been, M., Miguel

Rodriguez, J., Quintana Jimenez, E.,

Loimaranta, V., de Vos, W.M., Meri, S.,

2014. The salivary scavenger and agglutinin

in early life: diverse roles in amniotic fluid

and in the infant intestine. J. Immunol. 61,

252-252.

Rezzi, S., Ramadan, Z., Martin, F.P., Fay, L.B.,

van Bladeren, P., Lindon, J.C., Nicholson,

J.K., Kochhar, S., 2007. Human metabolic

phenotypes link directly to specific dietary

preferences in healthy individuals. J.


Righetti, P.G., Candiano, G., Citterio, A.,

Boschetti, E., 2015. Combinatorial Peptide

Ligand Libraries as a “Trojan Horse” in

Deep Discovery Proteomics. Anal. Chem.

87, 293-305.

Rinttilä, T., Kassinen, A., Malinen, E., Krogius,

L., Palva, A., 2004. Development of an

extensive set of 16S rDNA‐targeted primers

for quantification of pathogenic and

indigenous bacteria in faecal samples by

real‐time PCR. J. Appl. Microbiol. 97, 1166-

1177.

Ritari, J., Lahti, L., de Vos, W.M., 2015.

Improved Taxonomic Assignment of Human

Intestinal 16s rRNA Sequence Reads by a

Dedicated Reference Database. Under

review.

Rodríguez-Valera, F., 2004. Environmental

genomics, the big picture? FEMS Microbiol.

Lett. 231, 153-158.

Rosenberger, G., Koh, C.C., Guo, T., Röst, H.L.,

Kouvonen, P., Collins, B.C., Heusel, M.,

Liu, Y., Caron, E., Vichalkovski, A., et al.,

2014. A repository of assays to quantify

10,000 human proteins by SWATH-MS.

Scientific Data 1, 140031.

Ruiz-Moyano, S., Tao, N., Underwood, M.A.,

Mills, D.A., 2012. Rapid discrimination of

Bifidobacterium animalis subspecies by

matrix-assisted laser desorption ionization-

time of flight mass spectrometry. Food

Microbiol. 30, 432-437.

Salonen, A., de Vos, W.M., 2014. Impact of diet

on human intestinal microbiota and health.

Annu. Rev. Food Sci. Technol. 5, 239-262.

Salonen, A., Lahti, L., Salojärvi, J., Holtrop, G.,

Korpela, K., Duncan, S.H., Date, P.,

Farquharson, F., Johnstone, A.M., Lobley,

G.E., et al., 2014. Impact of diet and

individual variation on intestinal microbiota

composition and fermentation products in

obese men. ISME J. 8, 2218-2230.

Salonen, A., Nikkilä, J., Jalanka-Tuovinen, J.,

Immonen, O., Rajilić-Stojanović, M.,

Kekkonen, R.A., Palva, A., de Vos, W.M.,

2010. Comparative analysis of fecal DNA

extraction methods with phylogenetic

microarray: effective recovery of bacterial

and archaeal DNA using mechanical cell

lysis. J. Microbiol. Methods 81, 127-134.

Scheperjans, F., Aho, V., Pereira, P.A., Koskinen,

K., Paulin, L., Pekkonen, E., Haapaniemi, E.,

Kaakkola, S., Eerola‐Rautio, J., Pohja, M., et

al., 2015. Gut microbiota are related to

Parkinson's disease and clinical phenotype.

Mov. Disord. 30, 350-358.

Schneider, T., Riedel, K., 2010. Environmental

proteomics: analysis of structure and

function of microbial communities.


Schnorr, S.L., Candela, M., Rampelli, S.,

Centanni, M., Consolandi, C., Basaglia, G.,

Turroni, S., Biagi, E., Peano, C., Severgnini,

M., et al., 2014. Gut microbiome of the

Hadza hunter-gatherers. Nat. Commun. 5,

3654.

Schoepfer, A.M., Schaffer, T., Seibold-Schmid,

B., Müller, S., Seibold, F., 2008. Antibodies

to flagellin indicate reactivity to bacterial

antigens in IBS patients. Neurogastroenterol.

Motil. 20, 1110-1118.

Scholtens, P.A., Oozeer, R., Martin, R., Amor,

K.B., Knol, J., 2012. The early settlers:

intestinal microbiology in early life. Annu.

Rev. Food Sci. Technol. 3, 425-447.

Schrimpe-Rutledge, A.C., Fontes, G., Gritsenko,

M.A., Norbeck, A.D., Anderson, D.J.,

Waters, K.M., Adkins, J.N., Smith, R.D.,

Poitout, V., Metz, T.O., 2012. Discovery of

novel glucose-regulated proteins in isolated

human pancreatic islets using LC–MS/MS-

based proteomics. J. Proteome Res. 11,

3520-3532.

Schuijt, T.J., van der Poll, T., de Vos, W.M.,

Wiersinga, W.J., 2013. The intestinal

microbiota and host immune interactions in

the critically ill. Trends Microbiol. 21, 221-

229.

Scott, K.P., Gratz, S.W., Sheridan, P.O., Flint,

H.J., Duncan, S.H., 2013. The influence of

diet on the gut microbiota. Pharmacol. Res.

69, 52-60.

Segata, N., Waldron, L., Ballarini, A.,

Narasimhan, V., Jousson, O., Huttenhower,

C., 2012. Metagenomic microbial

community profiling using unique clade-

specific marker genes. Nat. Methods 9, 811-

814.

Seifert, J., Taubert, M., Jehmlich, N., Schmidt, F.,

Völker, U., Vogt, C., Richnow, H., von

Bergen, M., 2012. Protein‐based stable

isotope probing (protein‐SIP) in functional

metaproteomics. Mass Spectrom. Rev. 31,

683-697.

Shevchenko, A., Tomas, H., Havlis, J., Olsen,

J.V., Mann, M., 2006. In-gel digestion for

mass spectrometric characterization of

proteins and proteomes. Nat. Protoc. 1, 2856-

2860.

Shteynberg, D., Nesvizhskii, A.I., Moritz, R.L.,

Deutsch, E.W., 2013. Combining results of

multiple search engines in proteomics. Mol.


Siggins, A., Gunnigle, E., Abram, F., 2012.

Exploring mixed microbial community

functioning: recent advances in

metaproteomics. FEMS Microbiol. Ecol. 80,

265-280.

Song, Y., Garg, S., Girotra, M., Maddox, C., von

Rosenvinge, E.C., Dutta, A., Dutta, S.,

Fricke, W.F., 2013. Microbiota dynamics in

patients treated with fecal microbiota

transplantation for recurrent Clostridium

difficile infection. PloS One 8, e81330.

Sonnenburg, E.D., Sonnenburg, J.L., 2014.

Starving our Microbial Self: The Deleterious

Consequences of a Diet Deficient in

Microbiota-Accessible Carbohydrates. Cell

Metab. 20, 779-786.

Speicher, K.D., Kolbas, O., Harper, S., Speicher,

D.W., 2000. Systematic analysis of peptide

recoveries from in-gel digestions for protein

identifications in proteome studies. J.

Biomol. Tech. 11, 74-86.

Steck, N., Mueller, K., Schemann, M., Haller, D.,

2012. Bacterial proteases in IBD and IBS.

Gut 61, 1610-1618.

Stephen, A.M., Cummings, J.H., 1980. The

microbial contribution to human faecal mass.

J. Med. Microbiol. 13, 45-56.

Tanca, A., Palomba, A., Pisanu, S., Addis, M.F.,

Uzzau, S., 2015. Enrichment or depletion?

The impact of stool pretreatment on

metaproteomic characterization of the human

gut microbiota. Proteomics.

Tanca, A., Palomba, A., Pisanu, S., Deligios, M.,

Fraumene, C., Manghina, V., Pagnozzi, D.,

Addis, M.F., Uzzau, S., 2014. A

straightforward and efficient analytical

pipeline for metaproteome characterization.

Microbiome 2, 1-16.

Tatusov, R.L., Koonin, E.V., Lipman, D.J., 1997.

A genomic perspective on protein families.

Science 278, 631-637.

Taverna, D.M., Goldstein, R.A., 2002. Why are

proteins marginally stable? Proteins 46, 105-

109.

Tims, S., Derom, C., Jonkers, D.M., Vlietinck, R.,

Saris, W.H., Kleerebezem, M., de Vos,

W.M., Zoetendal, E.G., 2012. Microbiota

conservation and BMI signatures in adult

monozygotic twins. ISME J. 7, 707-717.

Tissier, H., 1900. Recherches sur la flore

intestinale des nourrissons: état normal et

pathologique. G. Carré et C. Naud.

Turnbaugh, P.J., Hamady, M., Yatsunenko, T.,

Cantarel, B.L., Duncan, A., Ley, R.E., Sogin,

M.L., Jones, W.J., Roe, B.A., Affourtit, J.P.,

et al., 2009. A core gut microbiome in obese

and lean twins. Nature 457, 480-484.

Tyakht, A.V., Kostryukova, E.S., Popenko, A.S.,

Belenikin, M.S., Pavlenko, A.V., Larin,

A.K., Karpova, I.Y., Selezneva, O.V.,

Semashko, T.A., Ospanova, E.A., et al.,

2013. Human gut microbiota community

structures in urban and rural populations in

Russia. Nat. Comm. 4.

van Nood, E., Vrieze, A., Nieuwdorp, M.,

Fuentes, S., Zoetendal, E.G., de Vos, W.M.,

Visser, C.E., Kuijper, E.J., Bartelsman, J.F.,

Tijssen, J.G., et al., 2013. Duodenal infusion

of donor feces for recurrent Clostridium

difficile. N. Engl. J. Med. 368, 407-415.

van Ommen, B., van der Greef, J., Ordovas, J.M.,

Daniel, H., 2014. Phenotypic flexibility as

key factor in the human nutrition and health

relationship. Genes Nutr. 9, 1-9.

Vaudel, M., Sickmann, A., Martens, L., 2012.

Current methods for global proteome

identification. Expert Rev. Proteomics 9,

519-532.

Venables, W.N., Ripley, B.D., 2002. Modern

applied statistics with S. Springer.

Venema, K., Van den Abbeele, P., 2013.

Experimental models of the gut microbiome.

Best Pract. Res. Clin. Gastroenterol. 27, 115-

126.

Verdam, F.J., Fuentes, S., de Jonge, C., Zoetendal,

E.G., Erbil, R., Greve, J.W., Buurman, W.A.,

de Vos, W.M., Rensen, S.S., 2013. Human

intestinal microbiota composition is

associated with local and systemic

inflammation in obesity. Obesity 21, E607-

E615.

Vey, G., Charles, T.C., 2014. MetaProx: the

database of metagenomic proximons.

Database (Oxford) 2014,

10.1093/database/bau097.

Vijay-Kumar, M., Aitken, J.D., Gewirtz, A.T.,

2008. Toll like receptor-5: protecting the gut

from enteric microbes. Semin.

Immunopathol. 30, 11-21.

Vrieze, A., Van Nood, E., Holleman, F., Salojärvi,

J., Kootte, R.S., Bartelsman, J.F.W.M.,

Dallinga-Thie, G.M., Ackermans, M.T.,

Serlie, M.J., Oozeer, R., et al., 2012.

Transfer of intestinal microbiota from lean

donors increases insulin sensitivity in

subjects with metabolic syndrome.

Gastroenterology 143, 913-916.

Wacklin, P., Tuimala, J., Nikkilä, J., Tims, S.,

Mäkivuokko, H., Alakulppi, N., Laine, P.,

Rajilić-Stojanović, M., Paulin, L., de Vos,

W.M., et al., 2014. Faecal Microbiota

Composition in Adults Is Associated with the

FUT2 Gene Determining the Secretor Status.

PloS One 9, e94863.

Walker, A.W., Ince, J., Duncan, S.H., Webster,

L.M., Holtrop, G., Ze, X., Brown, D., Stares,

M.D., Scott, P., Bergerat, A., et al., 2011.

Dominant and diet-responsive groups of

bacteria within the human colonic

microbiota. ISME J. 5, 220-230.

Walters, W.A., Xu, Z., Knight, R., 2014. Meta-

analyses of human gut microbes associated

with obesity and IBD. FEBS Lett. 588, 4223-

4233.

Wang, M., Ahrne, S., Jeppsson, B., Molin, G.,

2005. Comparison of bacterial diversity

along the human intestinal tract by direct

cloning and sequencing of 16S rRNA genes.

FEMS Microbiol. Ecol. 54, 219-231.

Wang, M., Li, M., Wu, S., Lebrilla, C.B.,

Chapkin, R.S., Ivanov, I., Donovan, S.M.,

2015. Fecal Microbiota Composition of

Breast-fed Infants is Correlated with Human

Milk Oligosaccharides Consumed. J. Pediatr.

Gastroenterol. Nutr.

Warnecke, F., Luginbühl, P., Ivanova, N.,

Ghassemian, M., Richardson, T.H., Stege,

J.T., Cayouette, M., McHardy, A.C.,

Djordjevic, G., Aboushadi, N., et al., 2007.

Metagenomic and functional analysis of

hindgut microbiota of a wood-feeding higher

termite. Nature 450, 560-565.

Weng, M., Walker, W., 2013. The role of gut

microbiota in programming the immune

phenotype. J. Dev. Orig. Health Dis. 4, 203-

214.

Williams, R., Peisajovich, S.G., Miller, O.J.,

Magdassi, S., Tawfik, D.S., Griffiths, A.D.,

2006. Amplification of complex gene

libraries by emulsion PCR. Na. Methods 3,

545-550.

Wilmes, P., Bond, P.L., 2004. The application of

two-dimensional polyacrylamide gel

electrophoresis and downstream analyses to a

mixed community of prokaryotic

microorganisms. Environ. Microbiol. 6, 911-

920.

Wilmes, P., Bond, P.L., 2006. Metaproteomics:

studying functional gene expression in

microbial ecosystems. Trends Microbiol. 14,

92-97.

Wu, G.D., Compher, C., Chen, E.Z., Smith, S.A.,

Shah, R.D., Bittinger, K., Chehoud, C.,

Albenberg, L.G., Nessel, L., Gilroy, E., et

al., 2014. Comparative metabolomics in

vegans and omnivores reveal constraints on

diet-dependent gut microbiota metabolite

production. Gut , gutjnl-2014-308209.

Wu, J., An, Y., Yao, J., Wang, Y., Tang, H., 2010.

An optimised sample preparation method for

NMR-based faecal metabonomic analysis.

Analyst 135, 1023-1030.

Xiong, W., Giannone, R.J., Morowitz, M.J.,

Banfield, J.F., Hettich, R.L., 2014.

Development of an Enhanced Metaproteomic

Approach for Deepening the Microbiome

Characterization of the Human Infant Gut. J.


Yadav, A.K., Kumar, D., Dash, D., 2012.

Learning from decoys to improve the

sensitivity and specificity of proteomics

database search results. PloS One 7, e50651.

Yatsunenko, T., Rey, F.E., Manary, M.J., Trehan,

I., Dominguez-Bello, M.G., Contreras, M.,

Magris, M., Hidalgo, G., Baldassano, R.N.,

Anokhin, A.P., et al., 2012. Human gut

microbiome viewed across age and

geography. Nature 486, 222-227.

Yu, Z., Morrison, M., 2004. Improved extraction

of PCR-quality community DNA from

digesta and fecal samples. BioTechniques 36,

808-812.

Zoetendal, E.G., Akkermans, A.D., Akkermans-

van Vliet, W.M., de Visser, J., Arjan, G.M.,

de Vos, W.M., 2001. The host genotype

affects the bacterial community in the human

gastronintestinal tract. Microb. Ecol. Health

Dis. 13, 129-134.

Zoetendal, E.G., Raes, J., van den Bogert, B.,

Arumugam, M., Booijink, C.C., Troost, F.J.,

Bork, P., Wels, M., de Vos, W.M.,

Kleerebezem, M., 2012. The human small

intestinal microbiota is driven by rapid

uptake and conversion of simple

carbohydrates. ISME J. 6, 1415-1426.

Zoetendal, E.G., de Vos, W.M., 2014. Effect of

diet on the intestinal microbiota and its

activity. Curr. Opin. Gastroenterol. 30, 189-

195.

Metaproteomics of the Human Intestinal Tract to Assess … · 2017. 3. 11. · I ABSTRACT The human...

Documents

Transcript of Metaproteomics of the Human Intestinal Tract to Assess … · 2017. 3. 11. · I ABSTRACT The human...