Poster #21 Development of pH -insensitive FRET sensors for … · 2016. 9. 27. · Development of...
Transcript of Poster #21 Development of pH -insensitive FRET sensors for … · 2016. 9. 27. · Development of...
Development of pH-insensitive FRET sensors for single-cell metabolite measurements. Dennis Botman1, Joachim Goedhart2, Bas Teusink1
1 Systems Bioinformatics/AIMMS, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV Amsterdam, The
Netherlands. 2 Section of Molecular Cytology, van Leeuwenhoek Centre for Advanced Microscopy, Swammerdam Institute for
Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
Introduction: Recent studies show that cell
populations are heterogeneous also at the metabolic
level, which can result in metabolically distinct
subpopulations. Yet, few methods provide the
possibility to study metabolism on single-cell level
in time. However, the development of biosensors
based on Förster Resonance Energy Transfer
(FRET) gives new opportunities to measure
metabolism at the single-cell level. FRET biosensors
consist of a donor fluorescent protein (FP) and an
acceptor FP, called a FRET pair, with a linker in
between. The linker changes conformation in
appearance of a stimulus (e.g. metabolites), and the
resultant change in energy transfer allows single-cell
measurement of certain metabolites such as glucose,
ATP, Pi and more. However, it is known that the FPs
in FRET sensors are pH-sensitive, affecting the
energy transfer (i.e. the FRET ratio), and this
compromises their use in systems in which pH is a
dynamic variable. Therefore, we want to determine
the effect of pH changes on the FRET sensors and
reduce this effect when possible.
Materials & Methods: We expressed metabolic
FRET biosensors in yeast and tested the effect of
various pH conditions on the FRET ratios.
Furthermore, we characterized the pH sensitivity of
various FPs to determine which FPs in the sensors
can be optimized for pH sensitivity.
Results: We found that the current available FRET
sensors are highly affected by pH lower than 7.
Mostly acceptor FPs are pH-sensitive and pH
dynamic environments affect their fluorescent
intensity. This affects the FRET ratios and therefore
the accuracy of FRET sensors. Finally, we
characterized the pH sensitivity for several FPs and
identified potentially suitable pH-insensitive FRET
pairs to decrease the effect of pH on the biosensors.
Discussion: We show that the current FRET sensors
cannot be used in dynamic or acid pH environments
such as the golgi apparatus, mitochondria,
subcellular vesicles, membranes and yeast cytosol.
Yet, we found potential pH-insensitive FRET pairs
and our aim is to develop pH-insensitive FRET
sensor with these suitable pH-insensitive FRET
pairs.
Poster #21
Lack of unity in the workforce – How single cell heterogeneity may impact product formation at near-zero specific growth rates
Phillipp Schmidt1, Anne Doerr2, Johan van Heerden1, Frank Bruggeman1, Bas Teusink1 1Systems Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
2Bionanoscience, Delft University of Technology, Delft, The Netherlands E-mail: [email protected], [email protected], [email protected], [email protected], [email protected]
1. Motivation
Product formation is often uncoupled from growth. In fedbatch cultivations, or under zero-growth-rate conditions in retentostats, product formation is carried out when cells hardly grow. The heterogeneity of such cultures is however not understood. Perhaps high stress levels cause cells to diversify and prepare for future conditions, at the expense of product formation.
2. Aim
We aim to characterize the heterogeneity of product
formation in Saccharomyces cerevisiae cultures, using single-molecule RNA FISH (Fluorescent in situ hybridization). The advantage of this method is that it does not require genetic engineering and that it allows for quantification of heterogeneity in gene expression by single cells.
3. Results
We have established the protocol for smRNA FISH in
Saccharomyces cerevisiae. The protocol involves cell fixation, fluorescent labeling of single transcript molecules, the computational framework for fluorescence microscopy image analysis, and the statistical analysis of transcript counts. We illustrate this method for a nutrient transition experiment where yeast switches from glucose to ethanol growth.
Figure 1: smRNA FISH pipeline
4. Conclustion smRNA FISH is a powerful method for quantification of
the heterogeneity in single-cell gene expression. In the future, we will use this method to assess the heterogeneity of the expression of product formation enzymes under low growth-rate conditions.
Poster #22
1
BioSB 2016 Abstract
Molecular pathway analysis in human trabecular meshwork cells after treatment with corticosteroids
Ilona Liesenborghs1, Jan S.A.G. Schouten1, Theo G.M.F. Gorgels1, Chris T. Evelo2, Lars M.T. Eijssen2,Martina Kutmon2,
Henny J.M. Beckers1, Carroll A.B. Webers1
1 Department of Ophthalmology, University Eye Clinic Maastricht, The Netherlands 2Department of Bioinformatics, BiGCaT, Maastricht University, The Netherlands
E-mail: [email protected], [email protected]
1. Introduction
For decades corticosteroids have been used for the treatment of many different diseases in a variety of specialisms, including ophthalmology. However, the use of these drugs causes an elevation of the intraocular pressure (IOP) in 18-36% of the patients. In patients with primary open angle glaucoma this incidence is reported to be as high as 92%.[1, 2] The risk is highest when corticosteroids are applied in or close to the eyes. Continued corticosteroid use causes permanent damage of the optic nerve resulting in loss of visual field and eventually blindness. If such damage occurs, it is called corticosteroid-induced glaucoma.
The precise pathogenic mechanism of corticosteroid-induced glaucoma is not known, although the trabecular meshwork, where the outflow of fluid in the eye is regulated, seems to play an important role.[2] In order to get a better insight in the pathogenic mechanism we ran pathway analysis of microarray datasets in which the gene expression of human trabecular meshwork cells treated with and without dexamethasone (a corticosteroid) were compared. 2. Materials & Methods
A search for microarray datasets was conducted in Gene Expression Omnibus (GEO) and ArrayExpress. Datasets GSE6298, GSE37474, GSE16643, and GSE65240 were selected for further analysis. Quality control and pre-processing was performed with ArrayAnalysis.org. Datasets were subjected to quality control using diagnostic plots and pre-processed using normalization methods. Then statistical analysis was conducted using the Limma package for R/Bioconductor (linear regression models).
Thereafter pathway overrepresentation analysis and visualization were performed with PathVisio, using the human pathway collection of WikiPathways (now containing 811 pathways). The overrepresentation analysis was executed using differentially expressed genes. Pathways with a Z-score > 1.96, a permuted p-value < 0.05, and ≥ 3 changed genes were considered significantly changed. All significantly changed pathways were compared and the pathways that were significant in at least three studies were further investigated.
3. Results Pathway analysis of datasets GSE6298, GSE37474, GSE65240, and GSE16643 showed respectively 20, 19, 38 and 18 significantly altered pathways. None of the pathways were significantly altered in all the datasets. Seven pathways were significantly affected in three datasets. Some of these are already known to be associated with glaucoma or corticosteroid induced effects. For example, the complement activation pathway (WP545) is upregulated in all three the studies and has been associated with primary open angle glaucoma.[3] Complement activation can be triggered by immunoglobulins. However, studies showed that other triggers are able to activate the pathway.[4, 5] 4. Discussion
The method described above can be used to identify pathogenic pathways that play a role in the development of corticosteroid-induced glaucoma, which can contribute to help understand the mechanism. References 1. Tripathi, R.C., et al., Corticosteroids and glaucoma risk.
Drugs Aging, 1999. 15(6): p. 439-50. 2. Jones, R., 3rd and D.J. Rhee, Corticosteroid-induced
ocular hypertension and glaucoma: a brief review and update of the literature. Curr Opin Ophthalmol, 2006. 17(2): p. 163-7.
3. Tezel, G., et al., Oxidative stress and the regulation of complement activation in human glaucoma. Invest Ophthalmol Vis Sci, 2010. 51(10): p. 5071-82.
4. Ding, Q.J., et al., Lack of immunoglobulins does not prevent C1q binding to RGC and does not alter the progression of experimental glaucoma. Invest Ophthalmol Vis Sci, 2012. 53(10): p. 6370-7.
5. Sjoberg, A.P., L.A. Trouw, and A.M. Blom, Complement activation and inhibition: a delicate balance. Trends Immunol, 2009. 30(2): p. 83-90.
Poster #23
Epigenetic analysis of patients with FTD/MND indicate a role for pathways involved in
neurological development and calcium ion binding
Erdogan Taskesen1,2, Danielle Posthuma1,2 and Yolande Pijnenburg2
1VU University Amsterdam, Complex Trait Genetics (CTG), Amsterdam, the Netherlands. 2VU University Medical Center (VUMC), Alzheimercentrum, Amsterdam, the Netherlands
E-mail: [email protected], [email protected], [email protected]
1. Introduction
Frontotemporal dementia (FTD) is the second most
common form of dementia characterized by progressive
degeneration of the temporal and frontal lobes of the brain.
The use of Genome-wide association studies (GWAS) have
become a standard approach to identify genetic risk variants.
However, only a handful of highly penetrant genetic variants
have so far been identified in the largest GWAS study
containing 3526 FTD patients. It becomes clear that a
substantial part of the genetic risk variants has a small effect.
A currently important open question is whether, besides
genetic factors, epigenetic factors may converge on
biological processes, and as such cause degeneration of the
temporal and frontal lobes of the brain developmental
processes.
2. Materials & Methods
In this study we analysed the DNA-methylation profiles
of 128 FTD patients, and created a stepwise integration
approach to analyse the association between GWAS
associated SNPs and epigenetic regulation.
3. Results
Analysis of all FTD patients (n=128) revealed a
heterogeneous DNA-methylation profile whereas patients
with Motor Neuron Disease (FTD/MND) showed a more
homogeneous profile. We therefore followed up the latter
group and detected in total 224 unique genes with
significantly differential cytosine DNA-methylated levels
(PFDR<0.05) compared to healthy controls. Although DNA-
methylation profiles are derived from whole blood, we can
demonstrate that our detected genes are most significantly
associated with the brain tissue-type (GTEx), more specific
with the Medial Prefrontal Cortex, and the GO-term
Forebrain development (all with P<0.05). Interestingly, out
of the 224 DNA-methylated genes, 37 genes also contained a
GWAS associated SNP in FTD/MDN.
To address the affected biological mechanisms, we
subsequently performed a pathway analysis and detected
associations with various neurological functions, but also
with epigenetic brain marks (PBY<0.05). Further network
analysis revealed calcium ion binding, and intrinsic
component of plasma membrane being affected (P<0.001).
Although some of these processes are already associated
with dementia, the relationship with epigenetics is yet
unknown.
For other neuropsychiatric diseases, such as Bipolar
disorder, abnormal development of temporal and frontal
lobes is also known. Interestingly, our detected differential
DNA-methylated genes also significantly overlap with genes
associated with Bipolar Disorder. This may indicate a
common neurological mechanism that may have been
affected.
4. Conclusion
We show for the first time detailed epigenetic biological
processes involved in FTD/MND, and that understanding of
both genetic and epigenetic factors are critical for
unravelling the road to abnormal neurological development.
Poster #24
Defining the Threshold and Boundaries of the Keratinocytes Response to Blue Light Exposure
Zandra C. Félix Garza1, Joerg Liebmann2, Matthias Born2, Peter A.J. Hilbers1 and Natal A.W. van Riel1
1 Dept. of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands 2 Philips GmbH, Innovative Technologies, Aachen, Germany
E-mail: [email protected], [email protected], [email protected], [email protected], [email protected]
1. Introduction
Blue light (BL) irradiation of human keratinocytes is known to decrease their proliferation and increase their differentiation without inducing apoptosis in a wavelength and fluence dependent manner 1. It has been suggested that these BL effects are mediated by nitric oxide (NO) 2 and reactive oxygen species (ROS) 3. However, little is known of the threshold and boundaries of the keratinocytes response to blue light. From in vitro studies at higher wavelengths, it is known that a cell’s redox potential is inversely correlated with the magnitude of the irradiation’s effect 4. Characterizing the cellular response of these epidermal cells to BL may contribute to the design of effective therapies for inflammatory skin conditions. In order to define the keratinocytes response to blue light exposure and provide a holistic view of the underlying mechanism of BL irradiation we propose a computational model-based method.
2. Materials & Methods
We used a computational model to describe the
variations induced by blue light exposure on the cellular concentrations of NO and ROS, and the associated changes in the cellular processes of keratinocytes.
Data from previous studies was included in the model. The data comprised fluence and wavelength dependent concentrations of NO and ROS, and keratinocytes kinetics. 3. Results
Here we present preliminary results obtained with the
model, and examine whether it accurately incorporates the observed changes in the proliferation and differentiation capacity of keratinocytes and the concentration of NO and ROS after BL exposure. The model is then used to analyse the effect of blue light on keratinocytes with an initial low and high redox potential. Moreover, it is employed to define the maximum and minimum points of the cellular response to blue light, considering different energy levels. 4. Discussion
Although the proposed model constitutes an ongoing
work, its preliminary results agree with the reported
observations of the BL induced changes on the cellular processes of keratinocytes. References
1. Liebmann, J., Born, M. & Kolb-Bachofen, V. Blue-Light Irradiation Regulates Proliferation and Differentiation in Human Skin Cells. J. Invest. Dermatol. 130, 259–269 (2010). 2. Opländer, C. et al. Mechanism and biological relevance of blue-light (420–453nm)-induced nonenzymatic nitric oxide generation from photolabile nitric oxide derivates in human skin in vitro and in vivo. Free Radic. Biol. Med. 65, 1363–1377 (2013). 3. Liebel, F., Kaur, S., Ruvolo, E., Kollias, N. & Southall, M. D. Irradiation of skin with visible light induces reactive oxygen species and matrix-degrading enzymes. J. Invest. Dermatol. 132, 1901–1907 (2012). 4. Karu, T. Ten Lectures on Basic Science of Laser Phototherapy. Prima Books.(2007)
Poster #25
Visualising gene expression in the brain
Sjoerd M.H. Huisman1,2, Baldur van Lew2,3, Ahmed Mahfouz1,2,Marcel J.T. Reinders1, Boudewijn P.F. Lelieveldt1,2
1Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands2Image Processing, Leiden University Medical Center, Leiden, The Netherlands
3Computer Graphics and Visualisation, Delft University of Technology, Delft, The NetherlandsE-mail: [email protected]
1. Introduction
The scientific endeavour to understand the developmentand molecular characteristics of the human brain partly re-lies on gene expression data of brain samples. The highanatomical complexity of the brain, and the large numberof genes that are expressed in it, mean that it is challengingto analyse this data. Gene expression analyses have giveninsight in for instance the main transcriptional networks,but a strong visual way of exploring this data is lacking.
Gene expression data of the brain can be studied in anumber of distinct ways. One can focus on relationshipsbetween genes, to identify networks or modules of func-tionally related genes that are relevant to areas of the brain.On the other hand, one can focus on the relationships be-tween areas in the brain, for instance to complement con-nectivity data with molecular information. When data ofbrains in different stages of development are available, onecan also look at the changes that take place in the brain,both in relationships between genes and between anatomi-cal brain areas. Combined, these analyses provide a wealthof information on the molecular basis of brain function.
We introduce a methodology and tool, called Brain-Lens, to analyse spatial or spatio-temporal gene expressiondata of the brain. It uses linked co-expression maps andan anatomical explorer, both to show patterns in gene-generelationships and in sample-sample relationships. In the in-teractive tool, users can make selections in gene and samplemaps to discover gene expression patterns, or upload genesets of interest to find their brain-specific transcriptional re-lationships and explore the organisation of expression inthe brain.
2. Materials & Methods
We made use of the human brain transcriptome data fromthe Allen Human Brain Atlas, which contains microarraydata collected from the postmortem brains of six healthydonors. Between 300 and 1000 samples were dissected ineach brain. We used a computationally efficient implemen-tation of t-SNE to embed the human brain transcriptome ofeach donor in two separate 2D maps simultaneously: (1)gene t-SNE and (2) sample t-SNE.
As an example, to investigate the advantage of thesimultaneous embeddings, we analysed a set of 74 post-synaptic density (PSD)-related genes. We clustered the
PSD-related genes based on their distance in the gene basedt-SNE map, using DBSCAN. We used DAVID to find thegene ontology (GO) terms associated with each cluster ofgenes.
3. Results
For our example, the viewer shows that the PSD-relatedgenes cluster into three distinct clusters, cluster 1 of 16genes, cluster 2 of 28 genes, and cluster 3 of 8 genes.Genes in cluster 1 are all preferentially expressed in thecerebral cortex, and are enriched for GO-term synapse (p= 1.01 · 10−7). Genes in cluster 2 are similar (albeit withstronger expression in cerebellar cortex), and are also en-riched for synapse (p = 7.30 · 10−22). On the other hand,genes in cluster 3 have low expression in the cerebral cortexand cerebellum and high expression in subcortical regions,such as the thalamus and brainstem. Cluster 3 is enrichedfor GO-term myelin sheath (p = 6.86 · 10−9).
Figure 1: A: Gene t-SNE map with PSD gene clusters; B:Sample t-SNE maps for the three clusters; C: Average ex-pression (blue-red) of these clusters in brain slices.
4. Discussion
The presented analysis shows that our simultaneous em-bedding approach can be very helpful in identifying func-tionally related genes as well as their anatomical organiza-tion. Moreover, the anatomical localization of each samplerenders integration with imaging data straightforward, re-vealing relationships between groups of genes and specificimaging observations.
Poster #26
Integrated omics strategy identifies a potential new resistance pathway in
colorectal cancer Martin Fitzpatrick1,2*, Anna Ressa1,2*, Joep de Ligt1,3*, Evert Bosdriesz1,4*, Anirudh Prahallad4, Myrthe Jager3, Gianluca Maddalo2,
Lisanne de la Fonteijne3, Maarten Altelaar2, Rene Bernards4, Edwin Cuppen3, Lodewijk Wessels4 & Albert J. Heck2
*Equal contribution | 1. Cancer Genomics Center, The Netherlands. | 2. Biomolecular Mass Spectrometry and Proteomics, Utrecht
Institute for Pharmaceutical Sciences, Utrecht University & Netherlands Proteomics Centre, Utrecht, The Netherlands | 3. Centre for
Molecular Medicine, Department of Genetics, University Medical Centre Utrecht, Utrecht, The Netherlands | 4. The Netherlands
Cancer Institute, Amsterdam, The Netherlands
E-mails: [email protected], [email protected], [email protected], [email protected]
1. Introduction
Activating mutations of oncogenic proteins are a common
feature of many cancer and an obvious target for drug
intervention. BRAF V600E mutant skin cancers respond
well to BRAF drug inhibition. BRAF-mutant colorectal
cancer (CRC) however is insensitive to treatment due to
feedback activation of the epidermal growth factor receptor
(EGFR) [1]. Treatment of CRC with a combination
treatment of BRAF-inhibitor (BRAFi) and EGFR-inhibitor
(EGFRi) is currently undergoing clinical trials
[NCT01750918 in clinicaltrials.gov]. In this project we
aim to identify the mechanisms underlying BRAFi
resistance in CRC and identify both alternative treatment
targets and possible escape pathways using an integrative
multi-omics analysis approach.
2. Materials & Methods
Colorectal tumour cell line WiDr (WT and PTPN11 -/-) [2]
was used to study resistance of CRC to drug treatment.
Following EGFR stimulation cells were monitored under 6
conditions (control, BRAFi, EGFRi and a combination of
the 2 inhibitors, PTPN11-/- and PTPN11-/- with BRAFi).
Gene expression, protein levels and protein
phosphorylation were quantified in biological triplicates at
5 subsequent timepoints (0, 2, 6, 24 and 48 hours).
Expression levels were measured using ribosomal depleted
RNA Sequencing, protein levels were measured using
Mass Spectrometry and phosphorylation was measured by
Ti4+- IMAC Phospho enriched Mass Spectrometry. Fold
changes in abundance or occupancy at each timepoint were
calculated relative to the untreated controls or between
treatments for treatment-specific effects. Gene ontology
function, pathway and process enrichments were
performed at all omics levels.
3. Results
The resulting proteomic and phosphoproteomic data were
beyond the scale of established tools to process and
analyse. To remove this limitation an open source Python
package (PaDuA) was developed to provide a complete
robust workflow for both scripted and interactive analysis
of proteomic and phosphoproteomics data (Figure 1).
Quality control (QC) indicated strong concordance of
technical and biological replicates across the proteomic,
phosphoproteomics and transcriptomic datasets. Principal
component analysis (PCA) showed strong clustering at
both the protein abundance and expression level with
coordinated changes at later timepoints following both
BRAFi and combination BRAFi plus EGFRi treatment.
Figure 1. (Phospho)proteomic summary QC figures
automatically generated by the PaDuA workflow.
Our initial findings indicated strong up-regulation of
vesicle and endocytosis associated proteins at the 24 and
48 hour timepoints following both BRAFi and combination
treatments. Further study has highlighted the potential for
receptor endocytosis, degradation or recycling to function
as possible resistance mechanisms (Figure 2).
Figure 2. Schematic of the endocytosis pathway coloured
by proteomic abundance at 48h following combination
treatment (orange up-regulated, purple down-regulated).
4. Discussion
The complete success of combination treatment in
colorectal cancer requires further investigation. The initial
findings from our data demonstrate the potential for multi-
omics approaches to identify putative mechanisms of
resistance in a model system. The results from this study
will help direct future efforts to minimise side effects and
potential for resistance in colorectal cancer. The identified
pathways and genes of interest are to be followed up with
targeted validation. As our results were obtained in a
gnomically disrupted cell line [2] results will be confirmed
in a stable background. Proteomic analysis tools developed
to support this project have been made publicly available.
References 1. Prahallad, A., et al. Unresponsiveness of colon cancer to BRAF(V600E)
inhibition through feedback activation of EGFR. Nature 483, 100–3 (2012).
2. Chen, T. R., et al. WiDr is a derivative of another colon adenocarcinoma cell line,
HT-29. Cancer Genet. Cytogenet. 27, 125–134 (1987).
Poster #27
A systems approach to explore triacylglycerol production in
Neochloris oleoabundans
Benoit M. Carreres 1, Anne J. Klok 2, Maria Suarez-Diez 1, L. de Jaeger 2, Mark H.J. Sturme 2, Packo P. Lamers 2, René
H. Wijffels 2, Vitor A.P. Martins dos Santos1, Peter J. Schaap 1, Dirk E. Martens 1
1. Laboratory of systems and synthetic biology, Wageningen University, Wageningen, The Netherlands
2. Bioprocess engineering, Wageningen University, Wageningen, The Netherlands
E-mail: [email protected]
Introduction
Biofuels present a promising alternative to meet current
energy demands in an environmentally friendly and
sustainable way. However, plant derived biofuels present
disadvantages such as low photosynthetic efficiency, and
competition with food production for arable land.
Microalgae have the potential to overcome these
disadvantages. Among the lipids they produce, the triacyl-
glycerides (TAG) are easily converted into an effective and
clean fuel [1]. When exposed to nitrogen limitation, the
oleaginous microalgae Neochloris oleoabundans
accumulates up to 40% of its dry weight in TAG. However,
a feasible production requires a decrease of production
costs, which can be partially reached by increasing TAG
yield [2].
Materials and methods
N. oleoabundans was grown in a turbidostat operated
flat panel photobioreactor under controlled conditions.
Two light absorption rates were combined with several
nitrate supply rates. Samples were taken daily during
steady state at the same hour of the day and parameters
needed for constraint based modeling of metabolism such
as growth, nutrient uptake, and starch and TAG production
rates, were measured.
We built a constraint based model describing primary
metabolism of N. oleoabundans. Fluxes were calculated
through this network by flux balance analysis using
measured parameters as constraints.
cDNA samples of 16 experimental conditions, 8 from this
experiment and 8 from another experiment, were
sequenced on an Illumina platform using short paired-end
reads. Contigs were assembled and functionally annotated.
After mapping reads to CDS, analysis of GO-terms over-
representation in differentially expressed transcripts was
performed. Relative expression changes and relative flux
changes for all reactions in the model were compared.
Results
The metabolic model predicts a maximum TAG yield
of N. oleoabundans on light of 1.07 g (mol photons)-1, more
than 3 times current yield under optimal conditions.
Furthermore, from optimization scenarios we concluded
that increasing light efficiency has much higher potential
to increase TAG yield than blocking entire pathways (i.e.
starch biosynthesis).
Our analysis shows that nitrogen limitation directly affects
gene expression of nitrogen dependent reactions. On the
other hand, high light generates more energy which has to
be distributed over all possible pathways, thereby
indirectly propagating into lower metabolism. Starch
biosynthesis is up-regulated by high light and nitrogen
limitation, while its degradation reactions are down-
regulated by nitrogen limitation only. Lipids and TAG
synthesis pathways did not display a uniform pattern and
some reactions are strongly down-regulated by high light
and nitrogen limitation. Our analysis also suggests that
Phytyl diphosphate synthase acts as a central regulator for
pigment synthesis and breakdown.
Discussion
This project is a successful collaborative work between
laboratories of two different disciplines, namely Systems
and Synthetic Biology and Bioprocess Engineering,
performed by a combination of dry and wet lab
experiments. With the newly acquired knowledge, the
collaboration loop can be closed by designing new
experiments based on the in silico analysis.
Integration of predicted metabolic fluxes and transcript
expression data increased our understanding of the
microalgal adaptation upon nitrogen limitation and high
light exposure. Furthermore, we found few key reactions
that show a different behavior when high light induction is
applied on nitrogen sufficient or nitrogen limited
conditions. This behavior reflects the interdependence of
the response to nitrogen and light supply. Some other
reactions showed unexpected regulatory patterns thereby
providing prime choice targets for further studies. This
study is focused on central metabolism and more can be
learnt from a wider analysis. However, challenges
associated to microalgae annotation still prevent such in-
depth analysis [3].
References
1. Dellomonaco C, Fava F, Gonzalez R. The path to next
generation biofuels: successes and challenges in the era of
synthetic biology. Microb Cell Fact. 2010;9(3):1-15.
2. Schenk PM, Thomas-Hall SR, Stephens E, Marx UC,
Mussgnug JH, Posten C et al. Second generation biofuels:
high-efficiency microalgae for biodiesel production.
Bioenergy research. 2008;1(1):20-43.
3. Reijnders MJ, Carreres BM, Schaap PJ. Algal Omics:
The Functional Annotation Challenge. Current
Biotechnology. 2015;5.
Poster #28
Sensitivity analysis of models with tipping points
G.A. ten Broeke1, G.A.K. van Voorn1, B.W. Kooi2 and J. Molenaar11Biometris, Wageningen University, Wageningen, The Netherlands
2Faculty of Earth and Life Sciences, Free University, Amsterdam, The NetherlandsE-mail: [email protected]
1. Introduction
Tipping points are best revealed by applying bifurcationanalysis. Bifurcation analysis, however, is limited to de-terministic models, whereas many ecological models arenondeterministic. For those models one would like to relyon sensitivity analysis, but traditional methods of sensitiv-ity analysis do not take into account the effects of tippingpoints. Here we examine how much information on tip-ping points can still be otained using sensitivity analysis(see also [1]).
2. Materials & Methods
As an example, we consider the Bazykin-Berezovskayamodel, a predator-prey model with a strong Allee effect:
dX
dt= X (X − ζ) (κ−X)−XY , (1)
dY
dt= γ (X − h)Y . (2)
Bifurcation analysis leads to the bifurcation diagram in the(ζ, h) plane (Fig. 1). Note, however, that the presented ap-proach is meant for models for which such a diagram is notavailable. Here we use it as a consistency check.
• One-factor-at-a-time (OFAT) sensitivity analysis [2]shows how the output changes as a function of an in-dividual parameter, by varying the parameter over alarge range with respect to a default parameter setting.
• Global sensitivity analysis quantifies which parame-ters are influential over a region of parameter space,based on sample points within this region.
Figure 1: Bifurcation diagram. In the green region, the modelconverges to a positive steady state, or to extinction, dependingon the initial conditions. The curve H indicates a Hopf bifur-cation where the stable steady state turns into stable limit cycles(blue region). The curve G indicates a global bifurcation beyondwhich the model always converges to extinction (white region).The dotted line indicates a transcritical bifurcation.
3. Results
OFAT indeed reveals the transcritical bifurcation where thepopulation goes extinct. For h this is shown in Fig. 2 (leftpanel). Since all other parameters are fixed, the bifurcationoccurs at a single critical value.
For the global sensitivity analysis we plot the output asfunction of a single parameter, while letting other parame-ters vary freely (Fig. 2, right panel). Thus, for each value ofh we now have a number of sample points that correspondto different values of other parameters (in this case ζ andY (0)). For decreasing h, the transcritical bifurcation is in-dicated by an increasing number of samples that evolve toextinction. Furthermore, one might suspect different typesof model behaviour based on the existence of two sepa-rate clouds, namely the green and blue one. So, also with-out having a bifurcation diagram available, we may detectthese kind of dramatic transitions.
0 0.2 0.4 0.6 0.8 10
0.05
0.1
0.15
h
Y(500)
0 0.2 0.4 0.6 0.8 10
0.1
0.2
0.3
0.4
h
Y(500)
Figure 2: Left: OFAT sensitivity analysis shows a transcriticalbifurcation for decreasing values of h. Right: Global sensitivityanalysis shows that it depends on other parameters at what valueof h this bifurcation occurs. The red squares display the meanoutput per value of h, and may be used as sensitivity measure.
4. Discussion
Our results [1] show that OFAT can serve as an appro-priate starting point for analysing models with tippingpoints. OFAT is computationally cheap, but examines onlychanges in individual parameters, and thus does not con-sider interaction effects. Global sensitivity is more compu-tationally expensive, but considers interaction effects . Ifcombined with a good sampling design and graphical pre-sentation, global sensitivity analysis can yield important in-formation about tipping points.
References
1. GA Ten Broeke, GAK van Voorn, BW Kooi, J Molenaar. Sensi-tivity analysis of models with tipping points. Math. Meth. Nat.Phenom., IN PRESS.
2. GA Ten Broeke, GAK van Voorn, A Ligtenberg. Which sensitivityanalysis method should I use for my Agent-based Model? JASSS,19 (2016).
Poster #29
Retrieval of genome-based information from sequence databases using hybrid homology-annotation searches: case of complete RNA virus genomes
Igor Sidorov1, Andrey Leontovich2, Dmitry Samborskiy2, and Alexander Gorbalenya1,2 1Department of Medical Microbiology, Leiden University Medical Center, Leiden, The Netherlands
2Belozersky Inst. of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
E-mails: [email protected], [email protected], [email protected], [email protected]
1. Introduction
Retrieval and use of biological information associated with nucleotide sequences is commonly accomplished by scanning databases for either matches between query and sequence annotations or statistically significant similarity (homology) between query and target sequences. The former approach relies on accuracy of annotation of target sequences, which varies considerably in databases and may compromise both sensitivity and selectivity of searches. The latter approach is free of this limitations due to high accuracy of genome sequencing. Yet, establishing biologically meaningful thresholds to separate hits with genuine and fortuitous similarity remains unmet challenge. Practically, similarity thresholds are set high enough (“S” in Figure) to eliminate false positives at the expense of sensitivity. Here we describe an approach with improved sensitivity and selectivity of retrieval of biological information which combines analyses of annotation and sequence similarities between query and targets within a statistical framework and automatically establishes data-driven thresholds (“H” in Figure) for sequence similarity. 2. Materials & Methods
The core of our approach is the use of isotonic regression for simultaneous analysis of distributions of similarity values and annotation matches to calculate probability for each database sequence that it belongs to the query group. As proof of the principle, the approach was realized in a computational engine (dubbed HAYGENS, Homology-Annotation hYbrid retrieval of complete RNA virus GENome Sequences). We used HMMER score values as a measure of similarity between query profile and database sequences, and matches between query and a field of annotation of database entries for assigning each sequence to a group with matching, mismatching or lack of annotations, respectively. Decision for sequence assignment was based on commonly used confidence interval values.
HAYGENS was applied to 13 RNA viral groups of different ranks (genus, family, and order) that include major pathogens of animals and plants and many poorly characterized viruses. RNA viruses are highly diverse, both within and between families, but each family has a unique domain layout in genome whose size varies within a limited range. Sequence alignment profiles of family-specific RNA-dependent RNA polymerase (RdRp), and taxa names were used to query GenBank. Additionally, to retain only complete or nearly complete genomic
sequences, the results of hybrid sequence-annotation searches were filtered by length of the entire sequence, RdRp and its flanking regions. 3. Results
Automatically run HAYGENS daily updates are available at http://veb.lumc.nl/HAYGENS. In comparison with annotation-based searches, HAYGENS’ gains of sensitivity and selectivity exceed 5% for about 25000 genomes. The gain of sensitivity is most for patented sequences of GenBank (PAT division), while the gain of selectivity is evenly spread among viral, patented and synthetic sequences (VRL, PAT and SYN divisions). Estimation of expected gains of accuracy of HAYGENS compared to homology-based only searches is not feasible due to the lack of established S-thresholds for genomes retrieval. Still, values of H-threshold varied considerably between the analyzed groups in apparently proper reflection of the virus sampling size and diversity.
Figure. Gain of accuracy of the hybrid genome retrieval. 4. Discussion
With the observed and likely gains in accuracy, hybrid approach can be used for quality assessment of sequence annotations. It may be also useful in procedures that are used to transfer annotation in annotation-based databases (GO) and genomic projects, as well as for calculating data-driven thresholds for structuring sequence profile databases (Pfam).
Mat
ched
Sequence similarity
An
nota
tion
S
Mis
mat
ched
/Em
pty
H
gain of selectivity
gain of sensitivity
Annotation retrieval
Hom
ology retrieval
Hybrid retrieval
Poster #30
Assembly and annotation of the fungi Fusarium poae Sven Warris1, Theo van der Lee1, Andriaan Vanheule2, Henri van de Geest1
1 Plant Research International, Wageningen UR, Wageningen, The Netherlands 2Department of Applied Biosciences, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
E-mail: [email protected], [email protected], [email protected], [email protected]
1. Introduction
The fungal species Fusarium poae is a prominent wheat pathogen and is, like other fungal pathogens, known for its genome plasticity and rapid evolution and adaptation. Key factors in these evolutionary processes are the presents of supernumerary chromosomes and a high number of transposable elements. Here we show that for such a highly repetitive genome, long read technology enables us to create a de novo assembly of the complete genome sequence. By using high volume HiSeq RNASeq data an automated annotation of this genome delivers a high quality annotation, comparable to the quality of the manually curated Fusarium graminearum genome. 2. Materials & Methods
The genome of Fusarium poae was sequenced using single molecule real time (SMRT) sequencing on the PacBio RSII using 16 SMRT cells. The data were assembled using HGAP2[1] and Celera[2]. Polishing of the genome assembly was performed by using Quiver[2].
For comparison, a de novo assembly was created using HiSeq 200bp paired-end data in CLCBio. These reads were also used for quality assessment of the long read assembly.
To generate a machine annotation, RNA was extracted from mycelium grown in six different culture conditions. A total of 562.136.710 quality trimmed HiSeq paired-end RNA-Seq reads were used in the BRAKER1[3] pipeline to annotate the genome.
RepeatModeler was used to predict repetitive elements in the genome. 3. Results
The new long read assembly contains an additional 7.28Mb of sequence compared to the HiSeq assembly, with a significantly reduced number of contigs and it has a much larger representation of bases in large contigs. In the long-read assembly four complete chromosomes were built from 9 contigs comprising 38.13Mb of sequence (Table 1). Compared to the HiSeq reads obtained from the same isolate, 223 variants were detected in homopolymeric stretches of nucleotides and low complexity (low GC%) regions (219 and 4 respectively). For these variants read mapping was inconclusive for both HiSeq and SMRT reads. In addition one single nucleotide polymorphism (SNP) was detected between the HiSeq and SMRT reads.
Using BRAKER1 and the RNASeq data 14,817 genes were predicted. The BUSCO (Benchmarking Universal Single-Copy Orthologs)[4] data set for fungi was used to assess the quality of this machine annotation (Table 2). This
analysis suggests that the annotation of F. poae did not miss any conserved genes, and within the conserved genes, <0.5% is miss-annotated. Based on the RepeatModeler output, the core chromosomes contain 2% transposable elements (TEs) while the supernumerary genome consists of 25% TEs.
Total Core Supernumerary F. graminearumGenome size (bp) 46.309.701 38.129.297 8.180.404 37.958.956GC% 46,30% 46% 47,60% 48,20%# of genes 14.817 12.097 2.720 14.160Mean gene density (per Mb) 320 317 332 373Median gene length (bp) 1.391 1.406 1.309 1.257Avg introns/gene 1,82 1,88 1,57 1,72Median intron length (bp) 54 54 57 55
F. poae
Table 1. Statistics on the genome assemblies[5]
Organism Complete Fragmented Missing TotalF. poae 1431 7 0 1438F. graminearum 1432 6 0 1438
Table 2. BUSCO statistics of F. poae and F. graminearum[5]
4. Discussion The statistics of the long-read de novo assembly were
much better compared with a de novo assembly of HiSeq paired-end reads for the same isolate. This shows the power of long read technology for the de novo assembly of repetitive fungal genomes.
For the gene modelling a large RNASeq data set was used, which is not always available. However, the statistics of the gene predictions are nearly as good as for the manually curated F. graminearum set, indicating that this approach can generate high quality genome annotations.
References 1. Chin C-S, Alexander DH, Marks P, Klammer AA, Drake J,
Heiner C, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.; 2013;10: 563–9. doi:10.1038/nmeth.2474
2. Denisov G, Walenz B, Halpern AL, Miller J, Axelrod N, Levy S, et al. Consensus generation and variant detection by Celera Assembler. Bioinformatics. 2008;24: 1035–40. doi:10.1093/bioinformatics/btn074
3. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 2015;32: 767–769. doi:10.1093/bioinformatics/btv661
4. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva E V, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31: 3210–3212. doi:10.1093/bioinformatics/btv351
5. Vanheule A, Audenaert K, Warris S, Geest H van de, Schijlen E, Höfte M, et al. Living apart together: crosstalk between the core and supernumerary genomes in a fungal plant pathogen. submitted. 2016;
Poster #31
Modelling Nutrient Dynamics: Donor versus Top-Down Control
Sanne De Smet1, Wim H. van der Putten
2,3 Elly Morriën
4 and Peter C. de Ruiter
1,4
1Biometris, Wageningen University, Wageningen, The Netherlands
2 Department of Terrestrial Ecology, NIOO, Wageningen, The Netherlands
3 Laboratory of Nematology, Wageningen University, Wageningen, The Netherlands
4IBED, University of Amsterdam, Amsterdam, The Netherlands
E-mail: [email protected], [email protected], [email protected], [email protected]
1. Introduction
In order to meet the rising demand for food, crop
production systems will need to become more efficient; i.e.
we need to develop crops that grow fast and are insensitive
to pests and diseases. However, it is generally assumed that
growth is a trade-off with defence. We are interested in how
crops can escape this trade-off. However, knowledge of the
system-level processes of crop production systems is limited
as plants interact with a myriad of soil dwelling species
affecting their performance. This complex soil community
affects plant performance through different mechanisms
which can be positive or negative, potentially cancelling one
another out. We will focus on one type of interaction: the
effect of the decomposer subsystem on plant growth.
Decomposers drive nutrient cycling and hence may
control plant growth as they affect storage and mineralization
rates of nutrients. In models describing the decomposition of
organic material however, this biotic components is not well
incorporated. The decomposition process in existing models
is modelled either in a biological top-down or a
physicochemical donor-controlled way. However, both
approaches fail to tighten this plant-decomposer interaction.
We will take a modelling approach in which we start
from a donor controlled model to which we add the top down
controlled biology. By comparing these model versions we
will identify how decomposition dynamics influence plant
growth. We expect that decomposer properties will override
substrate properties in the control of plant growth by taking
over the control of the decomposition process.
2. Materials & Methods
We developed a compartmental model representing the
recycling of carbon and nitrogen in terrestrial ecosystems
(see Fig. 1).
Figure 1. Compartmental model (1: donor control; 2 and 3:
top-down control)
We derived a set of alternative versions modelling the
decomposition process through donor or top-down control.
In the donor-controlled version (1), we represented the
decomposition process as a first-order decay rate. In the top-
down controlled version (2), we added a decomposer
compartment (microbes). In this model the decomposition
rate depended on the biomass and physiology of the
microbes. This top-down controlled model was then further
developed by adding a microbial grazer compartment (3).
We analysed the system of differential equations by
solving for the steady states.
3. Results
Equilibrium plant biomass differed between model
versions. In the donor controlled model it depended on the
organic matter density. In the top-down controlled model,
the dependence shifted to microbial biomass density and
physiology; to which grazer properties are added if the
model is further extended. This implies that decomposers
properties override the donor controlled substrate
characteristics in defining plant growth.
This is further demonstrated in the equilibrium amount
of organic matter. We found that the equilibrium depended
on the plant biomass density in the donor controlled model,
but in the top-down controlled model it is fully defined by
decomposer physiology.
4. Discussion
Our results illustrate the close interaction between plants
and decomposers. We assumed that decomposer properties
would override substrate properties in the control of plant
growth. Both organic matter and plant density are indeed
functions of decomposer properties. Furthermore, under the
assumption of donor control addition of nutrients would
always increase decomposition rates. Whereas the maximum
decomposition rate is determined by decomposer properties
only in the top down controlled model.
In this article we have focused on the effects of the soil
decomposer subsystem. Our present theory does not address
the effects of other soil organisms. With appropriate
modifications they can be included in our theoretical
framework.
As decomposer physiology regulates plant productivity,
efficient crop production systems should take into account
functionality rather than biomass of the decomposers.
Poster #32
Comparative Proteomics Reveal Different Levels of Protein Conservation in the Polarization Network across 200 Fungi.
Eveline T. Diepeveen1, Valérie Pourquié1,2, Thies Gehrmann2, Thomas Abeel2 & Liedewij Laan1
1 Department of Bionanoscience, Delft University of Technology, Delft, the Netherlands 2 Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands
E-mail: [email protected], [email protected], [email protected], [email protected], [email protected]
1. Introduction
An essential step in cell growth and division is the proper break up of cell symmetry, resulting in polarized cells. Protein conservation of the loci involved in this polarization process is therefore expected to be high across taxa, due to conserved functions. The proteins and mechanisms involved in polarity establishment have been disentangled in a handful of fungal species, with most work done in the budding yeast Saccharomyces cerevisiae. Here we investigate how conserved polarity establishment is, by comparative proteomics of the bulk of proteins responsible for polarity establishment in budding yeast.
2. Materials & Methods
We selected 29 polarization proteins based on described
functions in the polarization process. We constructed a protein database (PD), consisting of 200 proteomes from strains and species of four fungal phyla: Microsporidia, Cryptomycota, Basidiomycota and Ascomycota (see Fig. 1). Next we screened the PD for the presence/absence of the 29 focal proteins through local Blastp searches. The obtained Blast hits were used to calculate the absolute number of occurrences per amino acid to determine the level of conservation along the amino acid sequence per protein. The results were plotted in conservation plots.
Phylogenetic relationships of the 200 examined strains and species were determined by using 86 conserved loci obtained from the DP. We performed phylogenetic inference (i.e., approximately maximum likelihood method) with FastTree and used the JTT model of amino acid evolution.
0.2
Saccharomycotina (33)
Pezizomycotina (113)
Taphrinomycotina (4)
Basi
diom
ycot
aAs
com
ycot
a
Ustilaginomycotina (4)
Agaricomycotina (2)
Microsporidia (7)
Agaricomycotina (30)
Pucciniomycotina (6)
Cryptomycota (1)
1
1
1
1
1
0.98
1
1
1
1
1
1
1
1
1
SaccharomycotinaPezizomycotina
Taphrinomycotina
AXL2
BEM
1BE
M2
BEM
3
MSB
3M
SB2
MSB
1LT
E1IQ
G1
GIC
2G
IC1
CLA4
CDC4
2CD
C24
BOI1
BNI1
BEM
4
SKM
1SE
C15
RSR1
RHO
3RG
A1RD
I1RA
S2
SEC4
SEC3
UBI4
SWI4
STE2
0
100%75%
50%1%0%
Ustilaginomycotina
Wallemiomycetes
Microsporidia
AgaricomycotinaPucciniomycotina
Cryptomycota
Figure 2. Polarization protein conservation levels. The greyscale represents the level of presence/absence of the 29 proteins for each clade clade. Core proteins are highlighted in red. Note the lineage-specific loss of nearly all polarization proteins in the Microsporidia (bottom row) and the lineage-specific gain of GIC1 and GIC2 in the true yeast group (within the Saccharomycotina; top row). 3. Results & Discussion
Our results indicate that polarization loci are conserved
at different levels across the fungal clade. There is a core (5/29) of highly conserved proteins, which are present in all fungal clades besides the Microsporidia (Fig. 2) and are found at various subfunctions of the polarization machinery.
However, we also observe lineage-specific losses and gains of polarization proteins in several clades (Fig. 2), even though some of these loci (such as CDC24) are essential in budding yeast and are commonly found in non-fungal eukaryotes. Some of these events seem to co-occur with e.g., strong genome reduction in the Microsporidia clade, and yeast-specific gain of loci, partially fuelled by the yeast-specific genome duplication.
Together our results show that the polarization network consists of a highly conserved part, but at the same time there are various proteins that only occur in specific species or clades, which might have evolved or been maintained for specific roles or functions. Further research will focus on the functional implications of these strong differences of protein conservation and lineage-specific patterns.
< Figure 1. The fungal phylogeny based on 86 loci. The number of examined strains/species per clade is depicted in brackets. Support values are specified above the branches.
Poster #33
GENOBOX: A GENOmics Tool-BOX for the Prediction of Functional Properties of Bacterial Strains
Karima Hajo1, Leonardo Lenoci1 and Sacha A.F.T. van Hijum1,2 1Center for Molecular and Biomolecular Informatics, Radboudumc, Nijmegen, The Netherlands
2NIZO Food Research, Ede, The NetherlandsE-mail: [email protected]
1. IntroductionFermented foods and beverages such as cheese, yogurt,
bread, beer and wine have been in the human diet for centuries. They are the final product of the fermentation of yeast and lactic acid bacteria on substrates such as milk, cereals and fruits.
In the context of the European research programme FP7, the GENOmics tool-BOX (GENOBOX)1 project proposes a novel approach to the prediction of functional properties of micro-organisms. GENOBOX allows fast and cost-effective prediction of specific strain functionalities based on genomic information. Genomic information of bacterial strains are stored in the GENOBOX database to allow comparison, analysis and visualisation of annotated genome and phenotype data providing prediction of functional properties characterisation of the strains.
2. Materials & MethodsGENOBOX is a database-driven web application
developed in the python web framework, Django2. The genomics toolbox is hosted on a virtual machine with Ubuntu 12.04 and Apache as the web server. The underlying database, used to store genomics data (genome sequences, functional annotations, other genomic features, functionality to screen for, orthology groups and phenotypic data) of the strains is a PostgreSQL database based on Chado3, which is a generic and extensible relational database schema for biological data. The software used in the user interface of the GENOBOX application is (i) Javascript for interactivity, (ii) CSS, (iii) DataTables and (iv) Jquery to visualize, filter and paginate web-page contents.
The genomics toolbox includes (annotated) genome data of public bacterial strains from the NCBI and private industrial strains sequenced using Illumina HiSeq2000 paired-end sequencing, which are strictly accessible to the owners. The phenotypic data in the GENOBOX application have been experimentally determined in the lab.
3. ResultsThe genomics toolbox provides an analysis framework
to easily browse through the experimental and biological data in the database to study detailed results on strains of interest. Particularly useful in GENOBOX are the functionality screening options, which uses genomic targets that are associated with specific functionalities. A list of genomic targets have been defined from literature and expert
knowledge. They include genes involved in flavour formation, proteolysis, antibiotic resistance, virulence factors and probiotic functionalities.
4. DiscussionGENOBOX
can be used to predict functional properties of microbial strains based on their genome sequences and function prediction models that have been implemented in the application. GENOBOX has broad applicability for research and industrial purposes in the improvement of production, flavour, safety and robustness of processes where microbes play an important role. In the future we aim to expand the function prediction models based on genotype-phenotype matching results and provide options to store and query sequence data for complex fermentations consisting of multiple microbes as well as sequence based predictive models for functions of these consortia of microbes.
References1. A genomics toolbox to enhance business from SMEs in the market of starter cultures and probiotics. (2015, March 11). Retrieved from http://cordis.europa.eu/project/rcn/110148_en.html.2. Django (Version 1.4) [Computer Software]. (2012). Available from https://djangoproject.com.3. Mungall CJ, Emmert DB. A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics. 2007 Jul 1;23::i337–46. Available from http://gmod.org/wiki/Chado.
Figure 1. A flowchart detailing the use of the GENOBOX application for different purposes (property). The flow to identify genes in strains responsible for Flavour(green), Safety(purple), Robustness(brown) and improved biomass production(blue) are displayed.
Poster #34
Titel van abstract: Replacing MLPA CNV detection with XHMM on targeted panel NGS data Auteurs: Daphne van Beek, Ivo Clemens, Marjan Weiss, Alessandra Maugeri, Ingrid Bakker, Quinten Waisfisz, Erik Sistermans Alle instituten: VUmc Klinische Genetica Keywords: XHMM optimization, validation, Copy Number Variation (CNV) detection, NGS read-depth analysis, targeted and Whole Exome Sequencing, MLPA, array, diagnostics Abstract (Maximum 2500 characters): Currently MLPA and arrays are routinely used for the clinical diagnosis of large DNA duplications and deletions (copy number variations, CNVs). Recently several tools for the detection of Copy Number Variations (CNVs) on NGS read-depth data have been developed. CNV detection by read-depth methods - such as XHMM (eXome-Hidden Markov Model) - is economically advantageous, as this type of analysis uses the NGS data already available from the SNP/INDEL detection without any additional laboratory costs. In this study we validated the use of XHMM for the detection of CNVs in a NGS based targeted gene panel. Method XHMM was optimized on 1210 samples sequenced with the Connective Tissue Disorder (CTD) panel version 1; 1176 exons captured in solution by a custom Mycroarray MYbaits library. The default parameter setting proved a good choice, except for single exon CNVs. A separate optimal parameter setting for detecting single exon CNVs was validated and implemented next to the default run. The performance of XHMM was tested on 318 samples sequenced with CTD panel version 2; 1977 exons captured in solution by a custom Roche Nimblegen SeqCap EZ Choice Library (IRN4000018830, Nimblegen). In this set were 35 previously MLPA or array confirmed CNVs: 12 duplications, 24 deletions – all heterozygous – and 1 homozygous deletion. Both the duplications and deletions range in size from single exon to over 60 exons. After exploratory runs with all 191 samples at both parameter settings, the panel was cleaned by excluding 30 samples where XHMM found CNVs and 17 samples with outlier average read-depths. 12 exons in the TNXB gene, a known pseudogenic region, were excluded from further analyses. Next, all excluded samples were each in turn analyzed against the cleaned panel at both parameter settings. Results and conclusion XHMM correctly called all 35 known CNVs, including 8 single exon deletions and 4 single exon duplications. XHMM detected an additional 68 CNVs that remain unconfirmed. Roughly half of these CNVs appear false positives, the other half plausible. The unconfirmed CNVs were all filtered out when results were filtered on frequently observed CNVs and requested gene-panel. CNV detection by XHMM was demonstrated to be at least as sensitive as the current methods. After filtering, the specificity proved sufficient. VUmc clinical genetics has implemented XHMM analysis has replaced MLPA for the clinical diagnosis of exon DNA duplications and deletions in CTD samples.
Poster #35
Bringing Zen to zinc sites
Bart van Beusekom1, Wouter Touw1,2, Gert Vriend2 and Robbie P. Joosten1
1Dept. of Biochemistry, Netherlands Cancer Institute, Amsterdam, the Netherlands2CMBI, Radboud University, Nijmegen, the Netherlands
E-mail: [email protected], [email protected]
1. Introduction
Structural zinc sites are a prevalent motif in proteins thatoften play an essential role in DNA recognition [1]. How-ever, the 3D description of many Zn2+ sites in the ProteinData Bank (PDB) is poor, which hampers their biologicaland biomedical interpretation.
2. Materials & Methods
A computer program called Zen was designed to help im-prove Zn2+ sites (Figure 1). Zen was incorporated in thePDB REDO pipeline, which optimises new and publishedcrystallographic protein structures by rebuilding and refine-ment [2]. Afterwards, all published Zn2+ site-containingstructures were renewed by PDB REDO to try and improvethe Zn2+ site description. The site quality was analysedusing a root-mean-square Z-score comprised of many dis-tances and angles in the site.
Figure 1: Schematic representation of Zen methodology.
3. Results
6935 structural Zn2+ sites were detected in 2635 PDBstructures which were updated in PDB REDO. Zn2+ sitesbecame much more realistic using the restraints from Zen:the average root-mean-square (rms) Z-score of Zn2+ sitesimproved from 1.70 ± 1.23 to 0.89 ± 0.29. Visual in-spection confirmed that Zen is capable of correcting manyhighly unrealistic Zn2+ sites (examples in Figure 2) whilerarely deteriorating sites that already have a normal geom-etry.
Figure 2: Three pairs of Zn2+ sites before (top) and after(bottom) re-refinement with restraints from Zen. Zn2+ ionsare displayed as grey spheres. Side-chains are shown withCys(Sγ) in yellow and His(Nδ/Nε) in blue.
4. Discussion
We developed a new method to improve the geometryof structural Zn2+ sites. The method has been appliedon a PDB-wide scale and structure models with updatedsites are available through the PDB REDO databank. ThePDB REDO webserver now automatically runs Zen toavoid poor Zn2+ sites in to-be-published protein structures.
References
1. MacPherson, S., Larochelle, S. and Turcotte, B. (2006). MicrobiolMol Biol Rev, 70(3):583-604.
2. Joosten, R. P., Joosten, K., Murshudov, G. and Perrakis, A. (2012).Acta Cryst. D, 48(4):484-496.
Poster #36
Disease specific network with application in network based outcome prediction
Amin Allahyar and Jeroen de Ridder
Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands E-mail: [email protected], [email protected]
1. Introduction In cancer outcome prediction, biological networks (e.g.
pathways) are used to aggregate functionally related genes with added discriminative power and biological relevance [1]. However, recent studies revealed that comparable performance might be achieved using many different biological networks [2, 3]. Perhaps even more striking is that utilizing random networks [3] or integrating random genes as markers [4] performs on par to the case when a real biological network is employed.
2. Materials & Methods
We aimed to investigate this issue by effectively turning the problem around. Initially, we constructed a candidate synergistic network in which two genes are connected if their integration yields prediction power beyond what is attainable individually. This is done by evaluating all pairwise combination of genes. Our hypothesis is that a biological network may perhaps be useful in breast cancer outcome prediction if it signifies more connections between identified synergistic pairs compare to random pairs.
In the next phase, we questioned the possibility of generalizing the synergistic pair selection strategy in NOPs. Concretely, we constructed a new classification problem in which topological properties of two genes (e.g. shortest path, jaccard, cluster coefficients etc.) are considered as evidences (i.e. features) and synergistic status between these genes which is determined by our candidate network are considered as corresponding labels. Using this framework, apart from being able to combine evidences from multiple topological measures, we can exploit any arbitrary number of biological networks that are available. 3. Results
We observed that none of considered biological networks sufficiently resemble the synergistic pairs which explains the comparable performance of utilizing a random network in NOPs. We conclude that “connectivity” property of a gene (as often used in NOPs) is not adequate to determine its synergistic mates.
Next, we aimed to predict synergistic pairs using topological measures of several biological networks. Our extensive experiments showed that the synergistic pairs can be accurately identified (~85% AUC) using combination of topological measures. Remarkably, our framework identified efficacy of known topological measures (e.g. degree and page rank) as well as several new ones (e.g. eigenvector centrality and closeness) that were not recognized before.
We also showed that our disease specific network which is specialized in breast cancer outcome prediction can be further employed in existing NOPs to predict the outcome of
unseen patients in an independent dataset. We observed that using our synergic network, most of the NOPs under study performed significantly better when compared to cases where existing biological networks are employed.
Figure 1. Overview of SPADE.
4. Discussion
Using biological networks in outcome prediction
problem is quite common. However, recent studies showed the inefficiency of these networks in aforementioned problem. By pairwise evaluation of all genes we aimed to see if any beneficial network for outcome prediction can be constructed that actually improve the performance. We could substantial similarity between existing networks and our optimal network. Further we investigated the ability of other topological measures that could be used to improve performance of this problem. We could identify several known topological measures as well as new measures that can be effectively used to identify synergistic pairs.
References [1] D. Hanahan, and Robert A. Weinberg, “Hallmarks of
Cancer: The Next Generation,” Cell, vol. 144, no. 5, pp. 646-674, 3/4/, 2011.
[2] C. Staiger, S. Cadot, B. Györffy, L. F. Wessels, and G. W. Klau, “Current composite-feature classification methods do not outperform simple single-genes classifiers in breast cancer prognosis,” Frontiers in Genetics, vol. 4, 2013.
[3] C. Staiger, S. Cadot, R. Kooter, M. Dittrich, T. Müller, G. W. Klau, and L. F. Wessels, “A critical evaluation of network and pathway-based classifiers for outcome prediction in breast cancer,” PloS one, vol. 7, no. 4, pp. e34796, 2012.
[4] D. Venet, J. E. Dumont, and V. Detours, “Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome,” PLoS Comput Biol, vol. 7, no. 10, pp. e1002240, 2011.
Poster #37
A set of 40 SNPs for DNA or RNA-based identification of Dutch individuals
S. Yousefi1, T. Abbassi-Daloii1, M. Vermaat1, D. Zhernakova2, P. de Knijff1, L. Franke2, P. A.C. ’t Hoen1 1Dept. Human Genetics, University of Leiden University Medical Center, Leiden, Netherlands
2Department of Genetics, UMC Groningen, Groningen, Netherlands
E-mail: [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected], [email protected]
1. Introduction
The use of single nucleotide polymorphisms
(SNPs) for forensic purposes presents several
advantages over use of other genetic markers,
including their abundance in the genome, low
mutation rate and straightforward detection using
high-throughput technologies. The aim of the
present study was to find a small set of SNPs that
can be used for unequivocal identification of Dutch
individuals based on DNA or RNA material.
2. Materials & Methods
DNA-based genotype calls were taken from
the 500 parents of the Genome of the Netherlands
(GoNL) project. RNA-based genotype calls from
whole blood RNA-seq were taken from the
Biobank-based Integrative Omics Project (BIOS)
[1]. In the first step, we filtered data based on
number of expected genotypes and alleles for SNPs
(we expect 3 genotypes and 2 alleles for each
SNPs). Only SNPs for which genotypes were
called in >90% of samples (more than 90%
detection rate in both DNA and RNA) were kept.
Then the following SNP selection criteria were
used to select a set of SNPs for individual
identification: (1) minor allele frequency > 0.2, (2)
heterozygosity rate > 0.4,(3) linkage disequilibrium
~ zero, and (4) Hardy–Weinberg equilibrium p-
value > 0.01. All filtering steps have been
performed using R scripts.
3. Results
We detected 532 common SNPs between
DNA-seq and RNA-seq data after primary filtering.
Subsequently, we selected a set of SNPs for
individual identification based on several criteria
(Table 1, Figure 1). We ended up with a selection
of 40 SNPs.
Table 1. Filtering steps with number of remained SNPs
Filter Steps No. of SNPs Number of genotypes and alleles 1 1697154 Genotype calling with detection rate > 90% 18045 Common SNPs between DNA and RNA
with MAF > 0.2
532
HW_ p value > 0.01 79 LD based on R2 < 0.01 44 Ignore SNPs located in HLA loci 40 1 Ignore SNPs with > 3 genotypes or > 2 alleles
The uniqueness of these SNPs was checked and
results showed that this set is still big enough to
identify an individual uniquely in our population.
The selected SNPs were compared with previous
SNPs panels and also 1000 Genomes phase_3
populations. The Minor Allele Frequency of our
SNPs were close to 1000 Genomes European
populations and overlapped with American,
European and South Asian populations.
Figure 1.Distribution of alternative allele frequency in
532 SNPs and selected 40 SNPs in DNA (GoNL project)
and RNA (BIOS project).
4. Discussion
Several strategies and SNP panels have been
published for individual identification. However,
how to choose the best panel of SNPs is still under
discussion. In the present study we focused on a set
of SNPs for individual identification that could be
used in DNA and RNA material. With this selection
of SNPs, individuals can be identified in
biomedical studies. DNA and RNA samples from
the same individual can be matched and possible
sample mix ups can be identified and corrected.
The SNP set may serve forensic casework for
individual identification. Our SNP set has been
identified and evaluated in the Dutch population,
but the high concordance in MAF between the
Dutch and other European populations suggest that
it can more generally be used for individuals of
European descent. The set may be suboptimal for
non-European populations.
References
1. D. Zhernakova, P. Deelen, M. Vermaat, et al
.2015. “Hypothesis-free identification of
modulators of genetic risk factors”. http://dx.doi.org/10.1101/033217.
Poster #38
GruS: A novel scaffolding and gap filling tool for
Oxford Nanopore Technologies reads
Coolen J.P.M.1, Ridder D. de
1, Smit S.
1, Valle-Inclan J.E.
2 and Faino L.
2
1Department of Bioinformatics, Wageningen University and Research center, Wageningen, The Netherlands
2Department of Phytopathology, Wageningen University and Research center, Wageningen, The Netherlands
E-mail: [email protected]
1. Introduction
The introduction of next generation sequencing
technologies (NGS) resulted in an enormous growth of
available genomic datasets and publication of complete and
draft genomes. The introduction of third generation
sequencing technologies by Pacific Biosciences (PacBio) [1]
and Oxford Nanopore Technologies (ONT) [2], made it
possible to sequence large repeat regions and thereby
completing draft genomes by performing scaffolding and gap
filling. Currently no tool is able to use high error-rate ONT
data to improve draft genomes. We present the first novel
scaffolding and gap filling tool, Gru gap filling and
Scaffolder (GruS), that is compatible with PacBio and ONT
data.
2. Materials & Methods
GruS is based on PBJelly (PB) [3] and introduces the
ONT specific aligner Graphmap [4], a validation/evaluation
stage and an iteration stage. Thereby making it compatible
with all third generation sequencing technologies.
Contigs, 2D and 1D ONT reads were simulated and used
to evaluate the performance of GruS compared to PBJelly.
Two modes of GruS were used, one identical to PB
(GruS B) and one using Graphmap as aligner (GruS G).
3. Results
Using sets consisting of 1x – 50x coverage 2D reads to
improve simulated contigs, GruS B performs similar to PB
and GruS G outperforms both PB and GruS B in terms of
number of contig reduction and amount of errors (Figure 1).
With GruS G gaps were filled with higher accuracy
when using 2D reads to improve draft genomes (data not
shown).
Using 1D reads GruS G is able to reduce contigs, but
introduces errors where PB is unable to reduce contigs (data
not shown).
4. Discussion
GruS outperforms PB because it uses Graphmap, which
is highly optimized for using ONT reads with high error-rate,
allowing it to map more reads to the scaffolds, resulting in an
increase in number of reads identified to span or extend a
gap.
The contribution of more aligned reads improves the
consensus calling and thereby decreasing the amount of
errors introduced.
Improvements to the consensus calling could improve
the accuracy of gap filling when using 1D reads thereby
reducing the number of errors introduced.
References
1. Eid, J., et al., Real-Time DNA Sequencing from
Single Polymerase Molecules. Science, 2009.
323(5910): p. 133-138.
2. Quick, J., A.R. Quinlan, and N.J. Loman, A
reference bacterial genome dataset generated on
the MinION portable single-molecule nanopore
sequencer. Gigascience, 2014. 3(22): p. 10.1186.
3. English, A.C., et al., Mind the gap: upgrading
genomes with Pacific Biosciences RS long-read
sequencing technology. PLoS One, 2012. 7(11): p.
e47768.
4. Sovic, I., et al., Fast and sensitive mapping of
error-prone nanopore sequencing reads with
GraphMap. 2015.
Figure 1. Benchmark results show that using ONT data (2D reads)
GruS G outperforms both PB and GruS B in terms of contig reduction
and amount of errors introduced. X-axis: number of contigs. Y-axis
number of errors
Poster #39
Pathway and network analysis of transcriptomics data from the omental adipose tissue of morbidly obese diabetic individuals
Marianthi Kalafati1,2, Martina Kutmon1,2, Lars MT Eijssen1, Ilja CW Arts2,3, Chris T Evelo1,2
1Department of Bioinformatics, Maastricht University, Maastricht, The Netherlands 2Maastricht Centre for Systems Biology, Maastricht University, Maastricht,The Netherlands
3Department of Epidemiology, Maastricht University,Maastricht, The Netherlands
Email: [email protected], [email protected],
[email protected], [email protected], [email protected]
1. Introduction The prevalence of type 2 diabetes has significantly
increased along with that of obesity worldwide. The
adipose tissue has been recognized as a major endocrine
organ that is associated with obesity related diseases such
as type 2 diabetes. Nowadays every scientist has access to a
vast collection of transcriptomics data that are available in
online repositories. There are plenty of methods for
analyzing these types of data, which can provide insights
into the influence of gene expression on the molecular
level. Pathway and network analysis enables the
visualization of genomics data in the context of known
biological processes. This also allows to capture the
interactions of gene products and metabolites, information
that is important for the interpretation of transcriptomic
results.
2. Materials and Methods A publicly available dataset from Doumatey et al. (2015),
comparing the omental adipose tissue of 20 morbidly obese
subjects (GEO accession: GSE71416), including 14
morbidly obese individuals with T2D and 6 morbidly obese
individuals without T2D, was chosen based on relevance
and data quality.
The raw data of the 20 obese subjects were reanalyzed
with ArrayAnalysis.org, our online microarray quality
control and preprocessing pipeline. Pathway analysis was
performed in PathVisio 3.2.1 (www.pathvisio.org) to
interpret and visualize the molecular changes on a pathway
level. Network analysis was performed with Cytoscape
3.3.0 where all the significantly altered pathways were
merged and visualized.
3. Results In the selected human omental adipose tissue dataset,
statistical analysis showed that 310 genes were differentially
expressed, of these 225 were downregulated and 85
upregulated, in the obese diabetic subjects compared to the
obese nondiabetics. Pathway analysis revealed 8
significantly altered pathways in the WikiPathways human
pathway collection. These significantly altered pathways
were then merged into one combined network. Connection
between pathways was also possible, by 25 nodes present in
more than 2 pathways. The extension with transcription
factorgene interactions from the ENCODE project revealed
new connections between the pathways on a regulatory level.
4. Discussion This approach is an ongoing work that aims to describe
pathway and network analysis methods that will advance the
study of the diabetic omental adipose tissue. The
incorporation of this information in our approach is useful in
understanding the mechanisms affected by disease while
investigating the interaction between pathways involved in
the human omental adipose tissue by connecting them in a
biological network and extending them with additional
knowledge on transcription factor regulation and disease
association.
References: 1. Kutmon, Martina, Chris T. Evelo, and Susan L. Coort. "A
network biology workflow to study transcriptomics data of the
diabetic liver." BMC genomics 15.1 (2014): 971. 2. Doumatey, Ayo P., et al. "Global Gene Expression Profiling in
Omental Adipose Tissue of Morbidly Obese Diabetic African
Americans." Journal of endocrinology and metabolism 5.3
(2015): 199.
Poster #40