Poster #21 Development of pH -insensitive FRET sensors for … · 2016. 9. 27. · Development of...

Development of pH-insensitive FRET sensors for single-cell metabolite measurements. Dennis Botman1, Joachim Goedhart2, Bas Teusink1

1 Systems Bioinformatics/AIMMS, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV Amsterdam, The

Netherlands. 2 Section of Molecular Cytology, van Leeuwenhoek Centre for Advanced Microscopy, Swammerdam Institute for

Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands

Introduction: Recent studies show that cell

populations are heterogeneous also at the metabolic

level, which can result in metabolically distinct

subpopulations. Yet, few methods provide the

possibility to study metabolism on single-cell level

in time. However, the development of biosensors

based on Förster Resonance Energy Transfer

(FRET) gives new opportunities to measure

metabolism at the single-cell level. FRET biosensors

consist of a donor fluorescent protein (FP) and an

acceptor FP, called a FRET pair, with a linker in

between. The linker changes conformation in

appearance of a stimulus (e.g. metabolites), and the

resultant change in energy transfer allows single-cell

measurement of certain metabolites such as glucose,

ATP, Pi and more. However, it is known that the FPs

in FRET sensors are pH-sensitive, affecting the

energy transfer (i.e. the FRET ratio), and this

compromises their use in systems in which pH is a

dynamic variable. Therefore, we want to determine

the effect of pH changes on the FRET sensors and

reduce this effect when possible.

Materials & Methods: We expressed metabolic

FRET biosensors in yeast and tested the effect of

various pH conditions on the FRET ratios.

Furthermore, we characterized the pH sensitivity of

various FPs to determine which FPs in the sensors

can be optimized for pH sensitivity.

Results: We found that the current available FRET

sensors are highly affected by pH lower than 7.

Mostly acceptor FPs are pH-sensitive and pH

dynamic environments affect their fluorescent

intensity. This affects the FRET ratios and therefore

the accuracy of FRET sensors. Finally, we

characterized the pH sensitivity for several FPs and

identified potentially suitable pH-insensitive FRET

pairs to decrease the effect of pH on the biosensors.

Discussion: We show that the current FRET sensors

cannot be used in dynamic or acid pH environments

such as the golgi apparatus, mitochondria,

subcellular vesicles, membranes and yeast cytosol.

Yet, we found potential pH-insensitive FRET pairs

and our aim is to develop pH-insensitive FRET

sensor with these suitable pH-insensitive FRET

pairs.

Poster #21

Lack of unity in the workforce – How single cell heterogeneity may impact product formation at near-zero specific growth rates

Phillipp Schmidt1, Anne Doerr2, Johan van Heerden1, Frank Bruggeman1, Bas Teusink1 1Systems Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

2Bionanoscience, Delft University of Technology, Delft, The Netherlands E-mail: [email protected], [email protected], [email protected], [email protected], [email protected]

1. Motivation

Product formation is often uncoupled from growth. In fedbatch cultivations, or under zero-growth-rate conditions in retentostats, product formation is carried out when cells hardly grow. The heterogeneity of such cultures is however not understood. Perhaps high stress levels cause cells to diversify and prepare for future conditions, at the expense of product formation.

2. Aim

We aim to characterize the heterogeneity of product

formation in Saccharomyces cerevisiae cultures, using single-molecule RNA FISH (Fluorescent in situ hybridization). The advantage of this method is that it does not require genetic engineering and that it allows for quantification of heterogeneity in gene expression by single cells.

3. Results

We have established the protocol for smRNA FISH in

Saccharomyces cerevisiae. The protocol involves cell fixation, fluorescent labeling of single transcript molecules, the computational framework for fluorescence microscopy image analysis, and the statistical analysis of transcript counts. We illustrate this method for a nutrient transition experiment where yeast switches from glucose to ethanol growth.

Figure 1: smRNA FISH pipeline

4. Conclustion smRNA FISH is a powerful method for quantification of

the heterogeneity in single-cell gene expression. In the future, we will use this method to assess the heterogeneity of the expression of product formation enzymes under low growth-rate conditions.

Poster #22

1

BioSB 2016 Abstract

Molecular pathway analysis in human trabecular meshwork cells after treatment with corticosteroids

Ilona Liesenborghs1, Jan S.A.G. Schouten1, Theo G.M.F. Gorgels1, Chris T. Evelo2, Lars M.T. Eijssen2,Martina Kutmon2,

Henny J.M. Beckers1, Carroll A.B. Webers1

1 Department of Ophthalmology, University Eye Clinic Maastricht, The Netherlands 2Department of Bioinformatics, BiGCaT, Maastricht University, The Netherlands

E-mail: [email protected], [email protected]

1. Introduction

For decades corticosteroids have been used for the treatment of many different diseases in a variety of specialisms, including ophthalmology. However, the use of these drugs causes an elevation of the intraocular pressure (IOP) in 18-36% of the patients. In patients with primary open angle glaucoma this incidence is reported to be as high as 92%.[1, 2] The risk is highest when corticosteroids are applied in or close to the eyes. Continued corticosteroid use causes permanent damage of the optic nerve resulting in loss of visual field and eventually blindness. If such damage occurs, it is called corticosteroid-induced glaucoma.

The precise pathogenic mechanism of corticosteroid-induced glaucoma is not known, although the trabecular meshwork, where the outflow of fluid in the eye is regulated, seems to play an important role.[2] In order to get a better insight in the pathogenic mechanism we ran pathway analysis of microarray datasets in which the gene expression of human trabecular meshwork cells treated with and without dexamethasone (a corticosteroid) were compared. 2. Materials & Methods

A search for microarray datasets was conducted in Gene Expression Omnibus (GEO) and ArrayExpress. Datasets GSE6298, GSE37474, GSE16643, and GSE65240 were selected for further analysis. Quality control and pre-processing was performed with ArrayAnalysis.org. Datasets were subjected to quality control using diagnostic plots and pre-processed using normalization methods. Then statistical analysis was conducted using the Limma package for R/Bioconductor (linear regression models).

Thereafter pathway overrepresentation analysis and visualization were performed with PathVisio, using the human pathway collection of WikiPathways (now containing 811 pathways). The overrepresentation analysis was executed using differentially expressed genes. Pathways with a Z-score > 1.96, a permuted p-value < 0.05, and ≥ 3 changed genes were considered significantly changed. All significantly changed pathways were compared and the pathways that were significant in at least three studies were further investigated.

3. Results Pathway analysis of datasets GSE6298, GSE37474, GSE65240, and GSE16643 showed respectively 20, 19, 38 and 18 significantly altered pathways. None of the pathways were significantly altered in all the datasets. Seven pathways were significantly affected in three datasets. Some of these are already known to be associated with glaucoma or corticosteroid induced effects. For example, the complement activation pathway (WP545) is upregulated in all three the studies and has been associated with primary open angle glaucoma.[3] Complement activation can be triggered by immunoglobulins. However, studies showed that other triggers are able to activate the pathway.[4, 5] 4. Discussion

The method described above can be used to identify pathogenic pathways that play a role in the development of corticosteroid-induced glaucoma, which can contribute to help understand the mechanism. References 1. Tripathi, R.C., et al., Corticosteroids and glaucoma risk.

Drugs Aging, 1999. 15(6): p. 439-50. 2. Jones, R., 3rd and D.J. Rhee, Corticosteroid-induced

ocular hypertension and glaucoma: a brief review and update of the literature. Curr Opin Ophthalmol, 2006. 17(2): p. 163-7.

3. Tezel, G., et al., Oxidative stress and the regulation of complement activation in human glaucoma. Invest Ophthalmol Vis Sci, 2010. 51(10): p. 5071-82.

4. Ding, Q.J., et al., Lack of immunoglobulins does not prevent C1q binding to RGC and does not alter the progression of experimental glaucoma. Invest Ophthalmol Vis Sci, 2012. 53(10): p. 6370-7.

5. Sjoberg, A.P., L.A. Trouw, and A.M. Blom, Complement activation and inhibition: a delicate balance. Trends Immunol, 2009. 30(2): p. 83-90.

Poster #23

Epigenetic analysis of patients with FTD/MND indicate a role for pathways involved in

neurological development and calcium ion binding

Erdogan Taskesen1,2, Danielle Posthuma1,2 and Yolande Pijnenburg2

1VU University Amsterdam, Complex Trait Genetics (CTG), Amsterdam, the Netherlands. 2VU University Medical Center (VUMC), Alzheimercentrum, Amsterdam, the Netherlands

E-mail: [email protected], [email protected], [email protected]

1. Introduction

Frontotemporal dementia (FTD) is the second most

common form of dementia characterized by progressive

degeneration of the temporal and frontal lobes of the brain.

The use of Genome-wide association studies (GWAS) have

become a standard approach to identify genetic risk variants.

However, only a handful of highly penetrant genetic variants

have so far been identified in the largest GWAS study

containing 3526 FTD patients. It becomes clear that a

substantial part of the genetic risk variants has a small effect.

A currently important open question is whether, besides

genetic factors, epigenetic factors may converge on

biological processes, and as such cause degeneration of the

temporal and frontal lobes of the brain developmental

processes.

2. Materials & Methods

In this study we analysed the DNA-methylation profiles

of 128 FTD patients, and created a stepwise integration

approach to analyse the association between GWAS

associated SNPs and epigenetic regulation.

3. Results

Analysis of all FTD patients (n=128) revealed a

heterogeneous DNA-methylation profile whereas patients

with Motor Neuron Disease (FTD/MND) showed a more

homogeneous profile. We therefore followed up the latter

group and detected in total 224 unique genes with

significantly differential cytosine DNA-methylated levels

(PFDR<0.05) compared to healthy controls. Although DNA-

methylation profiles are derived from whole blood, we can

demonstrate that our detected genes are most significantly

associated with the brain tissue-type (GTEx), more specific

with the Medial Prefrontal Cortex, and the GO-term

Forebrain development (all with P<0.05). Interestingly, out

of the 224 DNA-methylated genes, 37 genes also contained a

GWAS associated SNP in FTD/MDN.

To address the affected biological mechanisms, we

subsequently performed a pathway analysis and detected

associations with various neurological functions, but also

with epigenetic brain marks (PBY<0.05). Further network

analysis revealed calcium ion binding, and intrinsic

component of plasma membrane being affected (P<0.001).

Although some of these processes are already associated

with dementia, the relationship with epigenetics is yet

unknown.

For other neuropsychiatric diseases, such as Bipolar

disorder, abnormal development of temporal and frontal

lobes is also known. Interestingly, our detected differential

DNA-methylated genes also significantly overlap with genes

associated with Bipolar Disorder. This may indicate a

common neurological mechanism that may have been

affected.

4. Conclusion

We show for the first time detailed epigenetic biological

processes involved in FTD/MND, and that understanding of

both genetic and epigenetic factors are critical for

unravelling the road to abnormal neurological development.

Poster #24

Defining the Threshold and Boundaries of the Keratinocytes Response to Blue Light Exposure

Zandra C. Félix Garza1, Joerg Liebmann2, Matthias Born2, Peter A.J. Hilbers1 and Natal A.W. van Riel1

1 Dept. of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands 2 Philips GmbH, Innovative Technologies, Aachen, Germany

E-mail: [email protected], [email protected], [email protected], [email protected], [email protected]

1. Introduction

Blue light (BL) irradiation of human keratinocytes is known to decrease their proliferation and increase their differentiation without inducing apoptosis in a wavelength and fluence dependent manner 1. It has been suggested that these BL effects are mediated by nitric oxide (NO) 2 and reactive oxygen species (ROS) 3. However, little is known of the threshold and boundaries of the keratinocytes response to blue light. From in vitro studies at higher wavelengths, it is known that a cell’s redox potential is inversely correlated with the magnitude of the irradiation’s effect 4. Characterizing the cellular response of these epidermal cells to BL may contribute to the design of effective therapies for inflammatory skin conditions. In order to define the keratinocytes response to blue light exposure and provide a holistic view of the underlying mechanism of BL irradiation we propose a computational model-based method.


We used a computational model to describe the

variations induced by blue light exposure on the cellular concentrations of NO and ROS, and the associated changes in the cellular processes of keratinocytes.

Data from previous studies was included in the model. The data comprised fluence and wavelength dependent concentrations of NO and ROS, and keratinocytes kinetics. 3. Results

Here we present preliminary results obtained with the

model, and examine whether it accurately incorporates the observed changes in the proliferation and differentiation capacity of keratinocytes and the concentration of NO and ROS after BL exposure. The model is then used to analyse the effect of blue light on keratinocytes with an initial low and high redox potential. Moreover, it is employed to define the maximum and minimum points of the cellular response to blue light, considering different energy levels. 4. Discussion

Although the proposed model constitutes an ongoing

work, its preliminary results agree with the reported

observations of the BL induced changes on the cellular processes of keratinocytes. References

1. Liebmann, J., Born, M. & Kolb-Bachofen, V. Blue-Light Irradiation Regulates Proliferation and Differentiation in Human Skin Cells. J. Invest. Dermatol. 130, 259–269 (2010). 2. Opländer, C. et al. Mechanism and biological relevance of blue-light (420–453nm)-induced nonenzymatic nitric oxide generation from photolabile nitric oxide derivates in human skin in vitro and in vivo. Free Radic. Biol. Med. 65, 1363–1377 (2013). 3. Liebel, F., Kaur, S., Ruvolo, E., Kollias, N. & Southall, M. D. Irradiation of skin with visible light induces reactive oxygen species and matrix-degrading enzymes. J. Invest. Dermatol. 132, 1901–1907 (2012). 4. Karu, T. Ten Lectures on Basic Science of Laser Phototherapy. Prima Books.(2007)

Poster #25

Visualising gene expression in the brain

Sjoerd M.H. Huisman1,2, Baldur van Lew2,3, Ahmed Mahfouz1,2,Marcel J.T. Reinders1, Boudewijn P.F. Lelieveldt1,2

1Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands2Image Processing, Leiden University Medical Center, Leiden, The Netherlands

3Computer Graphics and Visualisation, Delft University of Technology, Delft, The NetherlandsE-mail: [email protected]

1. Introduction

The scientific endeavour to understand the developmentand molecular characteristics of the human brain partly re-lies on gene expression data of brain samples. The highanatomical complexity of the brain, and the large numberof genes that are expressed in it, mean that it is challengingto analyse this data. Gene expression analyses have giveninsight in for instance the main transcriptional networks,but a strong visual way of exploring this data is lacking.

Gene expression data of the brain can be studied in anumber of distinct ways. One can focus on relationshipsbetween genes, to identify networks or modules of func-tionally related genes that are relevant to areas of the brain.On the other hand, one can focus on the relationships be-tween areas in the brain, for instance to complement con-nectivity data with molecular information. When data ofbrains in different stages of development are available, onecan also look at the changes that take place in the brain,both in relationships between genes and between anatomi-cal brain areas. Combined, these analyses provide a wealthof information on the molecular basis of brain function.

We introduce a methodology and tool, called Brain-Lens, to analyse spatial or spatio-temporal gene expressiondata of the brain. It uses linked co-expression maps andan anatomical explorer, both to show patterns in gene-generelationships and in sample-sample relationships. In the in-teractive tool, users can make selections in gene and samplemaps to discover gene expression patterns, or upload genesets of interest to find their brain-specific transcriptional re-lationships and explore the organisation of expression inthe brain.


We made use of the human brain transcriptome data fromthe Allen Human Brain Atlas, which contains microarraydata collected from the postmortem brains of six healthydonors. Between 300 and 1000 samples were dissected ineach brain. We used a computationally efficient implemen-tation of t-SNE to embed the human brain transcriptome ofeach donor in two separate 2D maps simultaneously: (1)gene t-SNE and (2) sample t-SNE.

As an example, to investigate the advantage of thesimultaneous embeddings, we analysed a set of 74 post-synaptic density (PSD)-related genes. We clustered the

PSD-related genes based on their distance in the gene basedt-SNE map, using DBSCAN. We used DAVID to find thegene ontology (GO) terms associated with each cluster ofgenes.

3. Results

For our example, the viewer shows that the PSD-relatedgenes cluster into three distinct clusters, cluster 1 of 16genes, cluster 2 of 28 genes, and cluster 3 of 8 genes.Genes in cluster 1 are all preferentially expressed in thecerebral cortex, and are enriched for GO-term synapse (p= 1.01 · 10−7). Genes in cluster 2 are similar (albeit withstronger expression in cerebellar cortex), and are also en-riched for synapse (p = 7.30 · 10−22). On the other hand,genes in cluster 3 have low expression in the cerebral cortexand cerebellum and high expression in subcortical regions,such as the thalamus and brainstem. Cluster 3 is enrichedfor GO-term myelin sheath (p = 6.86 · 10−9).

Figure 1: A: Gene t-SNE map with PSD gene clusters; B:Sample t-SNE maps for the three clusters; C: Average ex-pression (blue-red) of these clusters in brain slices.

4. Discussion

The presented analysis shows that our simultaneous em-bedding approach can be very helpful in identifying func-tionally related genes as well as their anatomical organiza-tion. Moreover, the anatomical localization of each samplerenders integration with imaging data straightforward, re-vealing relationships between groups of genes and specificimaging observations.

Poster #26

Integrated omics strategy identifies a potential new resistance pathway in

colorectal cancer Martin Fitzpatrick1,2*, Anna Ressa1,2*, Joep de Ligt1,3*, Evert Bosdriesz1,4*, Anirudh Prahallad4, Myrthe Jager3, Gianluca Maddalo2,

Lisanne de la Fonteijne3, Maarten Altelaar2, Rene Bernards4, Edwin Cuppen3, Lodewijk Wessels4 & Albert J. Heck2

*Equal contribution | 1. Cancer Genomics Center, The Netherlands. | 2. Biomolecular Mass Spectrometry and Proteomics, Utrecht

Institute for Pharmaceutical Sciences, Utrecht University & Netherlands Proteomics Centre, Utrecht, The Netherlands | 3. Centre for

Molecular Medicine, Department of Genetics, University Medical Centre Utrecht, Utrecht, The Netherlands | 4. The Netherlands

Cancer Institute, Amsterdam, The Netherlands

E-mails: [email protected], [email protected], [email protected], [email protected]

1. Introduction

Activating mutations of oncogenic proteins are a common

feature of many cancer and an obvious target for drug

intervention. BRAF V600E mutant skin cancers respond

well to BRAF drug inhibition. BRAF-mutant colorectal

cancer (CRC) however is insensitive to treatment due to

feedback activation of the epidermal growth factor receptor

(EGFR) [1]. Treatment of CRC with a combination

treatment of BRAF-inhibitor (BRAFi) and EGFR-inhibitor

(EGFRi) is currently undergoing clinical trials

[NCT01750918 in clinicaltrials.gov]. In this project we

aim to identify the mechanisms underlying BRAFi

resistance in CRC and identify both alternative treatment

targets and possible escape pathways using an integrative

multi-omics analysis approach.


Colorectal tumour cell line WiDr (WT and PTPN11 -/-) [2]

was used to study resistance of CRC to drug treatment.

Following EGFR stimulation cells were monitored under 6

conditions (control, BRAFi, EGFRi and a combination of

the 2 inhibitors, PTPN11-/- and PTPN11-/- with BRAFi).

Gene expression, protein levels and protein

phosphorylation were quantified in biological triplicates at

5 subsequent timepoints (0, 2, 6, 24 and 48 hours).

Expression levels were measured using ribosomal depleted

RNA Sequencing, protein levels were measured using

Mass Spectrometry and phosphorylation was measured by

Ti4+- IMAC Phospho enriched Mass Spectrometry. Fold

changes in abundance or occupancy at each timepoint were

calculated relative to the untreated controls or between

treatments for treatment-specific effects. Gene ontology

function, pathway and process enrichments were

performed at all omics levels.

3. Results

The resulting proteomic and phosphoproteomic data were

beyond the scale of established tools to process and

analyse. To remove this limitation an open source Python

package (PaDuA) was developed to provide a complete

robust workflow for both scripted and interactive analysis

of proteomic and phosphoproteomics data (Figure 1).

Quality control (QC) indicated strong concordance of

technical and biological replicates across the proteomic,

phosphoproteomics and transcriptomic datasets. Principal

component analysis (PCA) showed strong clustering at

both the protein abundance and expression level with

coordinated changes at later timepoints following both

BRAFi and combination BRAFi plus EGFRi treatment.

Figure 1. (Phospho)proteomic summary QC figures

automatically generated by the PaDuA workflow.

Our initial findings indicated strong up-regulation of

vesicle and endocytosis associated proteins at the 24 and

48 hour timepoints following both BRAFi and combination

treatments. Further study has highlighted the potential for

receptor endocytosis, degradation or recycling to function

as possible resistance mechanisms (Figure 2).

Figure 2. Schematic of the endocytosis pathway coloured

by proteomic abundance at 48h following combination

treatment (orange up-regulated, purple down-regulated).

4. Discussion

The complete success of combination treatment in

colorectal cancer requires further investigation. The initial

findings from our data demonstrate the potential for multi-

omics approaches to identify putative mechanisms of

resistance in a model system. The results from this study

will help direct future efforts to minimise side effects and

potential for resistance in colorectal cancer. The identified

pathways and genes of interest are to be followed up with

targeted validation. As our results were obtained in a

gnomically disrupted cell line [2] results will be confirmed

in a stable background. Proteomic analysis tools developed

to support this project have been made publicly available.

References 1. Prahallad, A., et al. Unresponsiveness of colon cancer to BRAF(V600E)

inhibition through feedback activation of EGFR. Nature 483, 100–3 (2012).

2. Chen, T. R., et al. WiDr is a derivative of another colon adenocarcinoma cell line,

HT-29. Cancer Genet. Cytogenet. 27, 125–134 (1987).

Poster #27

A systems approach to explore triacylglycerol production in

Neochloris oleoabundans

Benoit M. Carreres 1, Anne J. Klok 2, Maria Suarez-Diez 1, L. de Jaeger 2, Mark H.J. Sturme 2, Packo P. Lamers 2, René

H. Wijffels 2, Vitor A.P. Martins dos Santos1, Peter J. Schaap 1, Dirk E. Martens 1

1. Laboratory of systems and synthetic biology, Wageningen University, Wageningen, The Netherlands

2. Bioprocess engineering, Wageningen University, Wageningen, The Netherlands

E-mail: [email protected]

Introduction

Biofuels present a promising alternative to meet current

energy demands in an environmentally friendly and

sustainable way. However, plant derived biofuels present

disadvantages such as low photosynthetic efficiency, and

competition with food production for arable land.

Microalgae have the potential to overcome these

disadvantages. Among the lipids they produce, the triacyl-

glycerides (TAG) are easily converted into an effective and

clean fuel [1]. When exposed to nitrogen limitation, the

oleaginous microalgae Neochloris oleoabundans

accumulates up to 40% of its dry weight in TAG. However,

a feasible production requires a decrease of production

costs, which can be partially reached by increasing TAG

yield [2].

Materials and methods

N. oleoabundans was grown in a turbidostat operated

flat panel photobioreactor under controlled conditions.

Two light absorption rates were combined with several

nitrate supply rates. Samples were taken daily during

steady state at the same hour of the day and parameters

needed for constraint based modeling of metabolism such

as growth, nutrient uptake, and starch and TAG production

rates, were measured.

We built a constraint based model describing primary

metabolism of N. oleoabundans. Fluxes were calculated

through this network by flux balance analysis using

measured parameters as constraints.

cDNA samples of 16 experimental conditions, 8 from this

experiment and 8 from another experiment, were

sequenced on an Illumina platform using short paired-end

reads. Contigs were assembled and functionally annotated.

After mapping reads to CDS, analysis of GO-terms over-

representation in differentially expressed transcripts was

performed. Relative expression changes and relative flux

changes for all reactions in the model were compared.

Results

The metabolic model predicts a maximum TAG yield

of N. oleoabundans on light of 1.07 g (mol photons)-1, more

than 3 times current yield under optimal conditions.

Furthermore, from optimization scenarios we concluded

that increasing light efficiency has much higher potential

to increase TAG yield than blocking entire pathways (i.e.

starch biosynthesis).

Our analysis shows that nitrogen limitation directly affects

gene expression of nitrogen dependent reactions. On the

other hand, high light generates more energy which has to

be distributed over all possible pathways, thereby

indirectly propagating into lower metabolism. Starch

biosynthesis is up-regulated by high light and nitrogen

limitation, while its degradation reactions are down-

regulated by nitrogen limitation only. Lipids and TAG

synthesis pathways did not display a uniform pattern and

some reactions are strongly down-regulated by high light

and nitrogen limitation. Our analysis also suggests that

Phytyl diphosphate synthase acts as a central regulator for

pigment synthesis and breakdown.

Discussion

This project is a successful collaborative work between

laboratories of two different disciplines, namely Systems

and Synthetic Biology and Bioprocess Engineering,

performed by a combination of dry and wet lab

experiments. With the newly acquired knowledge, the

collaboration loop can be closed by designing new

experiments based on the in silico analysis.

Integration of predicted metabolic fluxes and transcript

expression data increased our understanding of the

microalgal adaptation upon nitrogen limitation and high

light exposure. Furthermore, we found few key reactions

that show a different behavior when high light induction is

applied on nitrogen sufficient or nitrogen limited

conditions. This behavior reflects the interdependence of

the response to nitrogen and light supply. Some other

reactions showed unexpected regulatory patterns thereby

providing prime choice targets for further studies. This

study is focused on central metabolism and more can be

learnt from a wider analysis. However, challenges

associated to microalgae annotation still prevent such in-

depth analysis [3].

References

1. Dellomonaco C, Fava F, Gonzalez R. The path to next

generation biofuels: successes and challenges in the era of

synthetic biology. Microb Cell Fact. 2010;9(3):1-15.

2. Schenk PM, Thomas-Hall SR, Stephens E, Marx UC,

Mussgnug JH, Posten C et al. Second generation biofuels:

high-efficiency microalgae for biodiesel production.

Bioenergy research. 2008;1(1):20-43.

3. Reijnders MJ, Carreres BM, Schaap PJ. Algal Omics:

The Functional Annotation Challenge. Current

Biotechnology. 2015;5.

Poster #28

Sensitivity analysis of models with tipping points

G.A. ten Broeke1, G.A.K. van Voorn1, B.W. Kooi2 and J. Molenaar11Biometris, Wageningen University, Wageningen, The Netherlands

2Faculty of Earth and Life Sciences, Free University, Amsterdam, The NetherlandsE-mail: [email protected]

1. Introduction

Tipping points are best revealed by applying bifurcationanalysis. Bifurcation analysis, however, is limited to de-terministic models, whereas many ecological models arenondeterministic. For those models one would like to relyon sensitivity analysis, but traditional methods of sensitiv-ity analysis do not take into account the effects of tippingpoints. Here we examine how much information on tip-ping points can still be otained using sensitivity analysis(see also [1]).


As an example, we consider the Bazykin-Berezovskayamodel, a predator-prey model with a strong Allee effect:

dX

dt= X (X − ζ) (κ−X)−XY , (1)

dY

dt= γ (X − h)Y . (2)

Bifurcation analysis leads to the bifurcation diagram in the(ζ, h) plane (Fig. 1). Note, however, that the presented ap-proach is meant for models for which such a diagram is notavailable. Here we use it as a consistency check.

• One-factor-at-a-time (OFAT) sensitivity analysis [2]shows how the output changes as a function of an in-dividual parameter, by varying the parameter over alarge range with respect to a default parameter setting.

• Global sensitivity analysis quantifies which parame-ters are influential over a region of parameter space,based on sample points within this region.

Figure 1: Bifurcation diagram. In the green region, the modelconverges to a positive steady state, or to extinction, dependingon the initial conditions. The curve H indicates a Hopf bifur-cation where the stable steady state turns into stable limit cycles(blue region). The curve G indicates a global bifurcation beyondwhich the model always converges to extinction (white region).The dotted line indicates a transcritical bifurcation.

3. Results

OFAT indeed reveals the transcritical bifurcation where thepopulation goes extinct. For h this is shown in Fig. 2 (leftpanel). Since all other parameters are fixed, the bifurcationoccurs at a single critical value.

For the global sensitivity analysis we plot the output asfunction of a single parameter, while letting other parame-ters vary freely (Fig. 2, right panel). Thus, for each value ofh we now have a number of sample points that correspondto different values of other parameters (in this case ζ andY (0)). For decreasing h, the transcritical bifurcation is in-dicated by an increasing number of samples that evolve toextinction. Furthermore, one might suspect different typesof model behaviour based on the existence of two sepa-rate clouds, namely the green and blue one. So, also with-out having a bifurcation diagram available, we may detectthese kind of dramatic transitions.

0 0.2 0.4 0.6 0.8 10

0.05

0.1

0.15

h

Y(500)

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

h

Y(500)

Figure 2: Left: OFAT sensitivity analysis shows a transcriticalbifurcation for decreasing values of h. Right: Global sensitivityanalysis shows that it depends on other parameters at what valueof h this bifurcation occurs. The red squares display the meanoutput per value of h, and may be used as sensitivity measure.

4. Discussion

Our results [1] show that OFAT can serve as an appro-priate starting point for analysing models with tippingpoints. OFAT is computationally cheap, but examines onlychanges in individual parameters, and thus does not con-sider interaction effects. Global sensitivity is more compu-tationally expensive, but considers interaction effects . Ifcombined with a good sampling design and graphical pre-sentation, global sensitivity analysis can yield important in-formation about tipping points.

References

1. GA Ten Broeke, GAK van Voorn, BW Kooi, J Molenaar. Sensi-tivity analysis of models with tipping points. Math. Meth. Nat.Phenom., IN PRESS.

2. GA Ten Broeke, GAK van Voorn, A Ligtenberg. Which sensitivityanalysis method should I use for my Agent-based Model? JASSS,19 (2016).

Poster #29

Retrieval of genome-based information from sequence databases using hybrid homology-annotation searches: case of complete RNA virus genomes

Igor Sidorov1, Andrey Leontovich2, Dmitry Samborskiy2, and Alexander Gorbalenya1,2 1Department of Medical Microbiology, Leiden University Medical Center, Leiden, The Netherlands

2Belozersky Inst. of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia

E-mails: [email protected], [email protected], [email protected], [email protected]

1. Introduction

Retrieval and use of biological information associated with nucleotide sequences is commonly accomplished by scanning databases for either matches between query and sequence annotations or statistically significant similarity (homology) between query and target sequences. The former approach relies on accuracy of annotation of target sequences, which varies considerably in databases and may compromise both sensitivity and selectivity of searches. The latter approach is free of this limitations due to high accuracy of genome sequencing. Yet, establishing biologically meaningful thresholds to separate hits with genuine and fortuitous similarity remains unmet challenge. Practically, similarity thresholds are set high enough (“S” in Figure) to eliminate false positives at the expense of sensitivity. Here we describe an approach with improved sensitivity and selectivity of retrieval of biological information which combines analyses of annotation and sequence similarities between query and targets within a statistical framework and automatically establishes data-driven thresholds (“H” in Figure) for sequence similarity. 2. Materials & Methods

The core of our approach is the use of isotonic regression for simultaneous analysis of distributions of similarity values and annotation matches to calculate probability for each database sequence that it belongs to the query group. As proof of the principle, the approach was realized in a computational engine (dubbed HAYGENS, Homology-Annotation hYbrid retrieval of complete RNA virus GENome Sequences). We used HMMER score values as a measure of similarity between query profile and database sequences, and matches between query and a field of annotation of database entries for assigning each sequence to a group with matching, mismatching or lack of annotations, respectively. Decision for sequence assignment was based on commonly used confidence interval values.

HAYGENS was applied to 13 RNA viral groups of different ranks (genus, family, and order) that include major pathogens of animals and plants and many poorly characterized viruses. RNA viruses are highly diverse, both within and between families, but each family has a unique domain layout in genome whose size varies within a limited range. Sequence alignment profiles of family-specific RNA-dependent RNA polymerase (RdRp), and taxa names were used to query GenBank. Additionally, to retain only complete or nearly complete genomic

sequences, the results of hybrid sequence-annotation searches were filtered by length of the entire sequence, RdRp and its flanking regions. 3. Results

Automatically run HAYGENS daily updates are available at http://veb.lumc.nl/HAYGENS. In comparison with annotation-based searches, HAYGENS’ gains of sensitivity and selectivity exceed 5% for about 25000 genomes. The gain of sensitivity is most for patented sequences of GenBank (PAT division), while the gain of selectivity is evenly spread among viral, patented and synthetic sequences (VRL, PAT and SYN divisions). Estimation of expected gains of accuracy of HAYGENS compared to homology-based only searches is not feasible due to the lack of established S-thresholds for genomes retrieval. Still, values of H-threshold varied considerably between the analyzed groups in apparently proper reflection of the virus sampling size and diversity.

Figure. Gain of accuracy of the hybrid genome retrieval. 4. Discussion

With the observed and likely gains in accuracy, hybrid approach can be used for quality assessment of sequence annotations. It may be also useful in procedures that are used to transfer annotation in annotation-based databases (GO) and genomic projects, as well as for calculating data-driven thresholds for structuring sequence profile databases (Pfam).

Mat

ched

Sequence similarity

An

nota

tion

S

Mis

mat

ched

/Em

pty

H

gain of selectivity

gain of sensitivity

Annotation retrieval

Hom

ology retrieval

Hybrid retrieval

Poster #30

Assembly and annotation of the fungi Fusarium poae Sven Warris1, Theo van der Lee1, Andriaan Vanheule2, Henri van de Geest1

1 Plant Research International, Wageningen UR, Wageningen, The Netherlands 2Department of Applied Biosciences, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium

E-mail: [email protected], [email protected], [email protected], [email protected]

1. Introduction

The fungal species Fusarium poae is a prominent wheat pathogen and is, like other fungal pathogens, known for its genome plasticity and rapid evolution and adaptation. Key factors in these evolutionary processes are the presents of supernumerary chromosomes and a high number of transposable elements. Here we show that for such a highly repetitive genome, long read technology enables us to create a de novo assembly of the complete genome sequence. By using high volume HiSeq RNASeq data an automated annotation of this genome delivers a high quality annotation, comparable to the quality of the manually curated Fusarium graminearum genome. 2. Materials & Methods

The genome of Fusarium poae was sequenced using single molecule real time (SMRT) sequencing on the PacBio RSII using 16 SMRT cells. The data were assembled using HGAP2[1] and Celera[2]. Polishing of the genome assembly was performed by using Quiver[2].

For comparison, a de novo assembly was created using HiSeq 200bp paired-end data in CLCBio. These reads were also used for quality assessment of the long read assembly.

To generate a machine annotation, RNA was extracted from mycelium grown in six different culture conditions. A total of 562.136.710 quality trimmed HiSeq paired-end RNA-Seq reads were used in the BRAKER1[3] pipeline to annotate the genome.

RepeatModeler was used to predict repetitive elements in the genome. 3. Results

The new long read assembly contains an additional 7.28Mb of sequence compared to the HiSeq assembly, with a significantly reduced number of contigs and it has a much larger representation of bases in large contigs. In the long-read assembly four complete chromosomes were built from 9 contigs comprising 38.13Mb of sequence (Table 1). Compared to the HiSeq reads obtained from the same isolate, 223 variants were detected in homopolymeric stretches of nucleotides and low complexity (low GC%) regions (219 and 4 respectively). For these variants read mapping was inconclusive for both HiSeq and SMRT reads. In addition one single nucleotide polymorphism (SNP) was detected between the HiSeq and SMRT reads.

Using BRAKER1 and the RNASeq data 14,817 genes were predicted. The BUSCO (Benchmarking Universal Single-Copy Orthologs)[4] data set for fungi was used to assess the quality of this machine annotation (Table 2). This

analysis suggests that the annotation of F. poae did not miss any conserved genes, and within the conserved genes, <0.5% is miss-annotated. Based on the RepeatModeler output, the core chromosomes contain 2% transposable elements (TEs) while the supernumerary genome consists of 25% TEs.

Total Core Supernumerary F. graminearumGenome size (bp) 46.309.701 38.129.297 8.180.404 37.958.956GC% 46,30% 46% 47,60% 48,20%# of genes 14.817 12.097 2.720 14.160Mean gene density (per Mb) 320 317 332 373Median gene length (bp) 1.391 1.406 1.309 1.257Avg introns/gene 1,82 1,88 1,57 1,72Median intron length (bp) 54 54 57 55

F. poae

Table 1. Statistics on the genome assemblies[5]

Organism Complete Fragmented Missing TotalF. poae 1431 7 0 1438F. graminearum 1432 6 0 1438

Table 2. BUSCO statistics of F. poae and F. graminearum[5]

4. Discussion The statistics of the long-read de novo assembly were

much better compared with a de novo assembly of HiSeq paired-end reads for the same isolate. This shows the power of long read technology for the de novo assembly of repetitive fungal genomes.

For the gene modelling a large RNASeq data set was used, which is not always available. However, the statistics of the gene predictions are nearly as good as for the manually curated F. graminearum set, indicating that this approach can generate high quality genome annotations.

References 1. Chin C-S, Alexander DH, Marks P, Klammer AA, Drake J,

Heiner C, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.; 2013;10: 563–9. doi:10.1038/nmeth.2474

2. Denisov G, Walenz B, Halpern AL, Miller J, Axelrod N, Levy S, et al. Consensus generation and variant detection by Celera Assembler. Bioinformatics. 2008;24: 1035–40. doi:10.1093/bioinformatics/btn074

3. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 2015;32: 767–769. doi:10.1093/bioinformatics/btv661

4. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva E V, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31: 3210–3212. doi:10.1093/bioinformatics/btv351

5. Vanheule A, Audenaert K, Warris S, Geest H van de, Schijlen E, Höfte M, et al. Living apart together: crosstalk between the core and supernumerary genomes in a fungal plant pathogen. submitted. 2016;

Poster #31

Modelling Nutrient Dynamics: Donor versus Top-Down Control

Sanne De Smet1, Wim H. van der Putten

2,3 Elly Morriën

4 and Peter C. de Ruiter

1,4

1Biometris, Wageningen University, Wageningen, The Netherlands

2 Department of Terrestrial Ecology, NIOO, Wageningen, The Netherlands

3 Laboratory of Nematology, Wageningen University, Wageningen, The Netherlands

4IBED, University of Amsterdam, Amsterdam, The Netherlands

E-mail: [email protected], [email protected], [email protected], [email protected]

1. Introduction

In order to meet the rising demand for food, crop

production systems will need to become more efficient; i.e.

we need to develop crops that grow fast and are insensitive

to pests and diseases. However, it is generally assumed that

growth is a trade-off with defence. We are interested in how

crops can escape this trade-off. However, knowledge of the

system-level processes of crop production systems is limited

as plants interact with a myriad of soil dwelling species

affecting their performance. This complex soil community

affects plant performance through different mechanisms

which can be positive or negative, potentially cancelling one

another out. We will focus on one type of interaction: the

effect of the decomposer subsystem on plant growth.

Decomposers drive nutrient cycling and hence may

control plant growth as they affect storage and mineralization

rates of nutrients. In models describing the decomposition of

organic material however, this biotic components is not well

incorporated. The decomposition process in existing models

is modelled either in a biological top-down or a

physicochemical donor-controlled way. However, both

approaches fail to tighten this plant-decomposer interaction.

We will take a modelling approach in which we start

from a donor controlled model to which we add the top down

controlled biology. By comparing these model versions we

will identify how decomposition dynamics influence plant

growth. We expect that decomposer properties will override

substrate properties in the control of plant growth by taking

over the control of the decomposition process.


We developed a compartmental model representing the

recycling of carbon and nitrogen in terrestrial ecosystems

(see Fig. 1).

Figure 1. Compartmental model (1: donor control; 2 and 3:

top-down control)

We derived a set of alternative versions modelling the

decomposition process through donor or top-down control.

In the donor-controlled version (1), we represented the

decomposition process as a first-order decay rate. In the top-

down controlled version (2), we added a decomposer

compartment (microbes). In this model the decomposition

rate depended on the biomass and physiology of the

microbes. This top-down controlled model was then further

developed by adding a microbial grazer compartment (3).

We analysed the system of differential equations by

solving for the steady states.

3. Results

Equilibrium plant biomass differed between model

versions. In the donor controlled model it depended on the

organic matter density. In the top-down controlled model,

the dependence shifted to microbial biomass density and

physiology; to which grazer properties are added if the

model is further extended. This implies that decomposers

properties override the donor controlled substrate

characteristics in defining plant growth.

This is further demonstrated in the equilibrium amount

of organic matter. We found that the equilibrium depended

on the plant biomass density in the donor controlled model,

but in the top-down controlled model it is fully defined by

decomposer physiology.

4. Discussion

Our results illustrate the close interaction between plants

and decomposers. We assumed that decomposer properties

would override substrate properties in the control of plant

growth. Both organic matter and plant density are indeed

functions of decomposer properties. Furthermore, under the

assumption of donor control addition of nutrients would

always increase decomposition rates. Whereas the maximum

decomposition rate is determined by decomposer properties

only in the top down controlled model.

In this article we have focused on the effects of the soil

decomposer subsystem. Our present theory does not address

the effects of other soil organisms. With appropriate

modifications they can be included in our theoretical

framework.

As decomposer physiology regulates plant productivity,

efficient crop production systems should take into account

functionality rather than biomass of the decomposers.

Poster #32

Comparative Proteomics Reveal Different Levels of Protein Conservation in the Polarization Network across 200 Fungi.

Eveline T. Diepeveen1, Valérie Pourquié1,2, Thies Gehrmann2, Thomas Abeel2 & Liedewij Laan1

1 Department of Bionanoscience, Delft University of Technology, Delft, the Netherlands 2 Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands

E-mail: [email protected], [email protected], [email protected], [email protected], [email protected]

1. Introduction

An essential step in cell growth and division is the proper break up of cell symmetry, resulting in polarized cells. Protein conservation of the loci involved in this polarization process is therefore expected to be high across taxa, due to conserved functions. The proteins and mechanisms involved in polarity establishment have been disentangled in a handful of fungal species, with most work done in the budding yeast Saccharomyces cerevisiae. Here we investigate how conserved polarity establishment is, by comparative proteomics of the bulk of proteins responsible for polarity establishment in budding yeast.


We selected 29 polarization proteins based on described

functions in the polarization process. We constructed a protein database (PD), consisting of 200 proteomes from strains and species of four fungal phyla: Microsporidia, Cryptomycota, Basidiomycota and Ascomycota (see Fig. 1). Next we screened the PD for the presence/absence of the 29 focal proteins through local Blastp searches. The obtained Blast hits were used to calculate the absolute number of occurrences per amino acid to determine the level of conservation along the amino acid sequence per protein. The results were plotted in conservation plots.

Phylogenetic relationships of the 200 examined strains and species were determined by using 86 conserved loci obtained from the DP. We performed phylogenetic inference (i.e., approximately maximum likelihood method) with FastTree and used the JTT model of amino acid evolution.

0.2

Saccharomycotina (33)

Pezizomycotina (113)

Taphrinomycotina (4)

Basi

diom

ycot

aAs

com

ycot

a

Ustilaginomycotina (4)

Agaricomycotina (2)

Microsporidia (7)

Agaricomycotina (30)

Pucciniomycotina (6)

Cryptomycota (1)

1

1

1

1

1

0.98

1

1

1

1

1

1

1

1

1

SaccharomycotinaPezizomycotina

Taphrinomycotina

AXL2

BEM

1BE

M2

BEM

3

MSB

3M

SB2

MSB

1LT

E1IQ

G1

GIC

2G

IC1

CLA4

CDC4

2CD

C24

BOI1

BNI1

BEM

4

SKM

1SE

C15

RSR1

RHO

3RG

A1RD

I1RA

S2

SEC4

SEC3

UBI4

SWI4

STE2

0

100%75%

50%1%0%

Ustilaginomycotina

Wallemiomycetes

Microsporidia

AgaricomycotinaPucciniomycotina

Cryptomycota

Figure 2. Polarization protein conservation levels. The greyscale represents the level of presence/absence of the 29 proteins for each clade clade. Core proteins are highlighted in red. Note the lineage-specific loss of nearly all polarization proteins in the Microsporidia (bottom row) and the lineage-specific gain of GIC1 and GIC2 in the true yeast group (within the Saccharomycotina; top row). 3. Results & Discussion

Our results indicate that polarization loci are conserved

at different levels across the fungal clade. There is a core (5/29) of highly conserved proteins, which are present in all fungal clades besides the Microsporidia (Fig. 2) and are found at various subfunctions of the polarization machinery.

However, we also observe lineage-specific losses and gains of polarization proteins in several clades (Fig. 2), even though some of these loci (such as CDC24) are essential in budding yeast and are commonly found in non-fungal eukaryotes. Some of these events seem to co-occur with e.g., strong genome reduction in the Microsporidia clade, and yeast-specific gain of loci, partially fuelled by the yeast-specific genome duplication.

Together our results show that the polarization network consists of a highly conserved part, but at the same time there are various proteins that only occur in specific species or clades, which might have evolved or been maintained for specific roles or functions. Further research will focus on the functional implications of these strong differences of protein conservation and lineage-specific patterns.

< Figure 1. The fungal phylogeny based on 86 loci. The number of examined strains/species per clade is depicted in brackets. Support values are specified above the branches.

Poster #33

GENOBOX: A GENOmics Tool-BOX for the Prediction of Functional Properties of Bacterial Strains

Karima Hajo1, Leonardo Lenoci1 and Sacha A.F.T. van Hijum1,2 1Center for Molecular and Biomolecular Informatics, Radboudumc, Nijmegen, The Netherlands

2NIZO Food Research, Ede, The NetherlandsE-mail: [email protected]

1. IntroductionFermented foods and beverages such as cheese, yogurt,

bread, beer and wine have been in the human diet for centuries. They are the final product of the fermentation of yeast and lactic acid bacteria on substrates such as milk, cereals and fruits.

In the context of the European research programme FP7, the GENOmics tool-BOX (GENOBOX)1 project proposes a novel approach to the prediction of functional properties of micro-organisms. GENOBOX allows fast and cost-effective prediction of specific strain functionalities based on genomic information. Genomic information of bacterial strains are stored in the GENOBOX database to allow comparison, analysis and visualisation of annotated genome and phenotype data providing prediction of functional properties characterisation of the strains.

2. Materials & MethodsGENOBOX is a database-driven web application

developed in the python web framework, Django2. The genomics toolbox is hosted on a virtual machine with Ubuntu 12.04 and Apache as the web server. The underlying database, used to store genomics data (genome sequences, functional annotations, other genomic features, functionality to screen for, orthology groups and phenotypic data) of the strains is a PostgreSQL database based on Chado3, which is a generic and extensible relational database schema for biological data. The software used in the user interface of the GENOBOX application is (i) Javascript for interactivity, (ii) CSS, (iii) DataTables and (iv) Jquery to visualize, filter and paginate web-page contents.

The genomics toolbox includes (annotated) genome data of public bacterial strains from the NCBI and private industrial strains sequenced using Illumina HiSeq2000 paired-end sequencing, which are strictly accessible to the owners. The phenotypic data in the GENOBOX application have been experimentally determined in the lab.

3. ResultsThe genomics toolbox provides an analysis framework

to easily browse through the experimental and biological data in the database to study detailed results on strains of interest. Particularly useful in GENOBOX are the functionality screening options, which uses genomic targets that are associated with specific functionalities. A list of genomic targets have been defined from literature and expert

knowledge. They include genes involved in flavour formation, proteolysis, antibiotic resistance, virulence factors and probiotic functionalities.

4. DiscussionGENOBOX

can be used to predict functional properties of microbial strains based on their genome sequences and function prediction models that have been implemented in the application. GENOBOX has broad applicability for research and industrial purposes in the improvement of production, flavour, safety and robustness of processes where microbes play an important role. In the future we aim to expand the function prediction models based on genotype-phenotype matching results and provide options to store and query sequence data for complex fermentations consisting of multiple microbes as well as sequence based predictive models for functions of these consortia of microbes.

References1. A genomics toolbox to enhance business from SMEs in the market of starter cultures and probiotics. (2015, March 11). Retrieved from http://cordis.europa.eu/project/rcn/110148_en.html.2. Django (Version 1.4) [Computer Software]. (2012). Available from https://djangoproject.com.3. Mungall CJ, Emmert DB. A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics. 2007 Jul 1;23::i337–46. Available from http://gmod.org/wiki/Chado.

Figure 1. A flowchart detailing the use of the GENOBOX application for different purposes (property). The flow to identify genes in strains responsible for Flavour(green), Safety(purple), Robustness(brown) and improved biomass production(blue) are displayed.

Poster #34

Titel van abstract: Replacing MLPA CNV detection with XHMM on targeted panel NGS data Auteurs: Daphne van Beek, Ivo Clemens, Marjan Weiss, Alessandra Maugeri, Ingrid Bakker, Quinten Waisfisz, Erik Sistermans Alle instituten: VUmc Klinische Genetica Keywords: XHMM optimization, validation, Copy Number Variation (CNV) detection, NGS read-depth analysis, targeted and Whole Exome Sequencing, MLPA, array, diagnostics Abstract (Maximum 2500 characters): Currently MLPA and arrays are routinely used for the clinical diagnosis of large DNA duplications and deletions (copy number variations, CNVs). Recently several tools for the detection of Copy Number Variations (CNVs) on NGS read-depth data have been developed. CNV detection by read-depth methods - such as XHMM (eXome-Hidden Markov Model) - is economically advantageous, as this type of analysis uses the NGS data already available from the SNP/INDEL detection without any additional laboratory costs. In this study we validated the use of XHMM for the detection of CNVs in a NGS based targeted gene panel. Method XHMM was optimized on 1210 samples sequenced with the Connective Tissue Disorder (CTD) panel version 1; 1176 exons captured in solution by a custom Mycroarray MYbaits library. The default parameter setting proved a good choice, except for single exon CNVs. A separate optimal parameter setting for detecting single exon CNVs was validated and implemented next to the default run. The performance of XHMM was tested on 318 samples sequenced with CTD panel version 2; 1977 exons captured in solution by a custom Roche Nimblegen SeqCap EZ Choice Library (IRN4000018830, Nimblegen). In this set were 35 previously MLPA or array confirmed CNVs: 12 duplications, 24 deletions – all heterozygous – and 1 homozygous deletion. Both the duplications and deletions range in size from single exon to over 60 exons. After exploratory runs with all 191 samples at both parameter settings, the panel was cleaned by excluding 30 samples where XHMM found CNVs and 17 samples with outlier average read-depths. 12 exons in the TNXB gene, a known pseudogenic region, were excluded from further analyses. Next, all excluded samples were each in turn analyzed against the cleaned panel at both parameter settings. Results and conclusion XHMM correctly called all 35 known CNVs, including 8 single exon deletions and 4 single exon duplications. XHMM detected an additional 68 CNVs that remain unconfirmed. Roughly half of these CNVs appear false positives, the other half plausible. The unconfirmed CNVs were all filtered out when results were filtered on frequently observed CNVs and requested gene-panel. CNV detection by XHMM was demonstrated to be at least as sensitive as the current methods. After filtering, the specificity proved sufficient. VUmc clinical genetics has implemented XHMM analysis has replaced MLPA for the clinical diagnosis of exon DNA duplications and deletions in CTD samples.

Poster #35

Bringing Zen to zinc sites

Bart van Beusekom1, Wouter Touw1,2, Gert Vriend2 and Robbie P. Joosten1

1Dept. of Biochemistry, Netherlands Cancer Institute, Amsterdam, the Netherlands2CMBI, Radboud University, Nijmegen, the Netherlands

E-mail: [email protected], [email protected]

1. Introduction

Structural zinc sites are a prevalent motif in proteins thatoften play an essential role in DNA recognition [1]. How-ever, the 3D description of many Zn2+ sites in the ProteinData Bank (PDB) is poor, which hampers their biologicaland biomedical interpretation.


A computer program called Zen was designed to help im-prove Zn2+ sites (Figure 1). Zen was incorporated in thePDB REDO pipeline, which optimises new and publishedcrystallographic protein structures by rebuilding and refine-ment [2]. Afterwards, all published Zn2+ site-containingstructures were renewed by PDB REDO to try and improvethe Zn2+ site description. The site quality was analysedusing a root-mean-square Z-score comprised of many dis-tances and angles in the site.

Figure 1: Schematic representation of Zen methodology.

3. Results

6935 structural Zn2+ sites were detected in 2635 PDBstructures which were updated in PDB REDO. Zn2+ sitesbecame much more realistic using the restraints from Zen:the average root-mean-square (rms) Z-score of Zn2+ sitesimproved from 1.70 ± 1.23 to 0.89 ± 0.29. Visual in-spection confirmed that Zen is capable of correcting manyhighly unrealistic Zn2+ sites (examples in Figure 2) whilerarely deteriorating sites that already have a normal geom-etry.

Figure 2: Three pairs of Zn2+ sites before (top) and after(bottom) re-refinement with restraints from Zen. Zn2+ ionsare displayed as grey spheres. Side-chains are shown withCys(Sγ) in yellow and His(Nδ/Nε) in blue.

4. Discussion

We developed a new method to improve the geometryof structural Zn2+ sites. The method has been appliedon a PDB-wide scale and structure models with updatedsites are available through the PDB REDO databank. ThePDB REDO webserver now automatically runs Zen toavoid poor Zn2+ sites in to-be-published protein structures.

References

1. MacPherson, S., Larochelle, S. and Turcotte, B. (2006). MicrobiolMol Biol Rev, 70(3):583-604.

2. Joosten, R. P., Joosten, K., Murshudov, G. and Perrakis, A. (2012).Acta Cryst. D, 48(4):484-496.

Poster #36

Disease specific network with application in network based outcome prediction

Amin Allahyar and Jeroen de Ridder

Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands E-mail: [email protected], [email protected]

1. Introduction In cancer outcome prediction, biological networks (e.g.

pathways) are used to aggregate functionally related genes with added discriminative power and biological relevance [1]. However, recent studies revealed that comparable performance might be achieved using many different biological networks [2, 3]. Perhaps even more striking is that utilizing random networks [3] or integrating random genes as markers [4] performs on par to the case when a real biological network is employed.


We aimed to investigate this issue by effectively turning the problem around. Initially, we constructed a candidate synergistic network in which two genes are connected if their integration yields prediction power beyond what is attainable individually. This is done by evaluating all pairwise combination of genes. Our hypothesis is that a biological network may perhaps be useful in breast cancer outcome prediction if it signifies more connections between identified synergistic pairs compare to random pairs.

In the next phase, we questioned the possibility of generalizing the synergistic pair selection strategy in NOPs. Concretely, we constructed a new classification problem in which topological properties of two genes (e.g. shortest path, jaccard, cluster coefficients etc.) are considered as evidences (i.e. features) and synergistic status between these genes which is determined by our candidate network are considered as corresponding labels. Using this framework, apart from being able to combine evidences from multiple topological measures, we can exploit any arbitrary number of biological networks that are available. 3. Results

We observed that none of considered biological networks sufficiently resemble the synergistic pairs which explains the comparable performance of utilizing a random network in NOPs. We conclude that “connectivity” property of a gene (as often used in NOPs) is not adequate to determine its synergistic mates.

Next, we aimed to predict synergistic pairs using topological measures of several biological networks. Our extensive experiments showed that the synergistic pairs can be accurately identified (~85% AUC) using combination of topological measures. Remarkably, our framework identified efficacy of known topological measures (e.g. degree and page rank) as well as several new ones (e.g. eigenvector centrality and closeness) that were not recognized before.

We also showed that our disease specific network which is specialized in breast cancer outcome prediction can be further employed in existing NOPs to predict the outcome of

unseen patients in an independent dataset. We observed that using our synergic network, most of the NOPs under study performed significantly better when compared to cases where existing biological networks are employed.

Figure 1. Overview of SPADE.

4. Discussion

Using biological networks in outcome prediction

problem is quite common. However, recent studies showed the inefficiency of these networks in aforementioned problem. By pairwise evaluation of all genes we aimed to see if any beneficial network for outcome prediction can be constructed that actually improve the performance. We could substantial similarity between existing networks and our optimal network. Further we investigated the ability of other topological measures that could be used to improve performance of this problem. We could identify several known topological measures as well as new measures that can be effectively used to identify synergistic pairs.

References [1] D. Hanahan, and Robert A. Weinberg, “Hallmarks of

Cancer: The Next Generation,” Cell, vol. 144, no. 5, pp. 646-674, 3/4/, 2011.

[2] C. Staiger, S. Cadot, B. Györffy, L. F. Wessels, and G. W. Klau, “Current composite-feature classification methods do not outperform simple single-genes classifiers in breast cancer prognosis,” Frontiers in Genetics, vol. 4, 2013.

[3] C. Staiger, S. Cadot, R. Kooter, M. Dittrich, T. Müller, G. W. Klau, and L. F. Wessels, “A critical evaluation of network and pathway-based classifiers for outcome prediction in breast cancer,” PloS one, vol. 7, no. 4, pp. e34796, 2012.

[4] D. Venet, J. E. Dumont, and V. Detours, “Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome,” PLoS Comput Biol, vol. 7, no. 10, pp. e1002240, 2011.

Poster #37

A set of 40 SNPs for DNA or RNA-based identification of Dutch individuals

S. Yousefi1, T. Abbassi-Daloii1, M. Vermaat1, D. Zhernakova2, P. de Knijff1, L. Franke2, P. A.C. ’t Hoen1 1Dept. Human Genetics, University of Leiden University Medical Center, Leiden, Netherlands

2Department of Genetics, UMC Groningen, Groningen, Netherlands

E-mail: [email protected], [email protected], [email protected],

[email protected], [email protected], [email protected], [email protected]

1. Introduction

The use of single nucleotide polymorphisms

(SNPs) for forensic purposes presents several

advantages over use of other genetic markers,

including their abundance in the genome, low

mutation rate and straightforward detection using

high-throughput technologies. The aim of the

present study was to find a small set of SNPs that

can be used for unequivocal identification of Dutch

individuals based on DNA or RNA material.


DNA-based genotype calls were taken from

the 500 parents of the Genome of the Netherlands

(GoNL) project. RNA-based genotype calls from

whole blood RNA-seq were taken from the

Biobank-based Integrative Omics Project (BIOS)

[1]. In the first step, we filtered data based on

number of expected genotypes and alleles for SNPs

(we expect 3 genotypes and 2 alleles for each

SNPs). Only SNPs for which genotypes were

called in >90% of samples (more than 90%

detection rate in both DNA and RNA) were kept.

Then the following SNP selection criteria were

used to select a set of SNPs for individual

identification: (1) minor allele frequency > 0.2, (2)

heterozygosity rate > 0.4,(3) linkage disequilibrium

~ zero, and (4) Hardy–Weinberg equilibrium p-

value > 0.01. All filtering steps have been

performed using R scripts.

3. Results

We detected 532 common SNPs between

DNA-seq and RNA-seq data after primary filtering.

Subsequently, we selected a set of SNPs for

individual identification based on several criteria

(Table 1, Figure 1). We ended up with a selection

of 40 SNPs.

Table 1. Filtering steps with number of remained SNPs

Filter Steps No. of SNPs Number of genotypes and alleles 1 1697154 Genotype calling with detection rate > 90% 18045 Common SNPs between DNA and RNA

with MAF > 0.2

532

HW_ p value > 0.01 79 LD based on R2 < 0.01 44 Ignore SNPs located in HLA loci 40 1 Ignore SNPs with > 3 genotypes or > 2 alleles

The uniqueness of these SNPs was checked and

results showed that this set is still big enough to

identify an individual uniquely in our population.

The selected SNPs were compared with previous

SNPs panels and also 1000 Genomes phase_3

populations. The Minor Allele Frequency of our

SNPs were close to 1000 Genomes European

populations and overlapped with American,

European and South Asian populations.

Figure 1.Distribution of alternative allele frequency in

532 SNPs and selected 40 SNPs in DNA (GoNL project)

and RNA (BIOS project).

4. Discussion

Several strategies and SNP panels have been

published for individual identification. However,

how to choose the best panel of SNPs is still under

discussion. In the present study we focused on a set

of SNPs for individual identification that could be

used in DNA and RNA material. With this selection

of SNPs, individuals can be identified in

biomedical studies. DNA and RNA samples from

the same individual can be matched and possible

sample mix ups can be identified and corrected.

The SNP set may serve forensic casework for

individual identification. Our SNP set has been

identified and evaluated in the Dutch population,

but the high concordance in MAF between the

Dutch and other European populations suggest that

it can more generally be used for individuals of

European descent. The set may be suboptimal for

non-European populations.

References

1. D. Zhernakova, P. Deelen, M. Vermaat, et al

.2015. “Hypothesis-free identification of

modulators of genetic risk factors”. http://dx.doi.org/10.1101/033217.

Poster #38

GruS: A novel scaffolding and gap filling tool for

Oxford Nanopore Technologies reads

Coolen J.P.M.1, Ridder D. de

1, Smit S.

1, Valle-Inclan J.E.

2 and Faino L.

2

1Department of Bioinformatics, Wageningen University and Research center, Wageningen, The Netherlands

2Department of Phytopathology, Wageningen University and Research center, Wageningen, The Netherlands

E-mail: [email protected]

1. Introduction

The introduction of next generation sequencing

technologies (NGS) resulted in an enormous growth of

available genomic datasets and publication of complete and

draft genomes. The introduction of third generation

sequencing technologies by Pacific Biosciences (PacBio) [1]

and Oxford Nanopore Technologies (ONT) [2], made it

possible to sequence large repeat regions and thereby

completing draft genomes by performing scaffolding and gap

filling. Currently no tool is able to use high error-rate ONT

data to improve draft genomes. We present the first novel

scaffolding and gap filling tool, Gru gap filling and

Scaffolder (GruS), that is compatible with PacBio and ONT

data.


GruS is based on PBJelly (PB) [3] and introduces the

ONT specific aligner Graphmap [4], a validation/evaluation

stage and an iteration stage. Thereby making it compatible

with all third generation sequencing technologies.

Contigs, 2D and 1D ONT reads were simulated and used

to evaluate the performance of GruS compared to PBJelly.

Two modes of GruS were used, one identical to PB

(GruS B) and one using Graphmap as aligner (GruS G).

3. Results

Using sets consisting of 1x – 50x coverage 2D reads to

improve simulated contigs, GruS B performs similar to PB

and GruS G outperforms both PB and GruS B in terms of

number of contig reduction and amount of errors (Figure 1).

With GruS G gaps were filled with higher accuracy

when using 2D reads to improve draft genomes (data not

shown).

Using 1D reads GruS G is able to reduce contigs, but

introduces errors where PB is unable to reduce contigs (data

not shown).

4. Discussion

GruS outperforms PB because it uses Graphmap, which

is highly optimized for using ONT reads with high error-rate,

allowing it to map more reads to the scaffolds, resulting in an

increase in number of reads identified to span or extend a

gap.

The contribution of more aligned reads improves the

consensus calling and thereby decreasing the amount of

errors introduced.

Improvements to the consensus calling could improve

the accuracy of gap filling when using 1D reads thereby

reducing the number of errors introduced.

References

1. Eid, J., et al., Real-Time DNA Sequencing from

Single Polymerase Molecules. Science, 2009.

323(5910): p. 133-138.

2. Quick, J., A.R. Quinlan, and N.J. Loman, A

reference bacterial genome dataset generated on

the MinION portable single-molecule nanopore

sequencer. Gigascience, 2014. 3(22): p. 10.1186.

3. English, A.C., et al., Mind the gap: upgrading

genomes with Pacific Biosciences RS long-read

sequencing technology. PLoS One, 2012. 7(11): p.

e47768.

4. Sovic, I., et al., Fast and sensitive mapping of

error-prone nanopore sequencing reads with

GraphMap. 2015.

Figure 1. Benchmark results show that using ONT data (2D reads)

GruS G outperforms both PB and GruS B in terms of contig reduction

and amount of errors introduced. X-axis: number of contigs. Y-axis

number of errors

Poster #39

Pathway and network analysis of transcriptomics data from the omental adipose tissue of morbidly obese diabetic individuals

Marianthi Kalafati1,2, Martina Kutmon1,2, Lars MT Eijssen1, Ilja CW Arts2,3, Chris T Evelo1,2

1Department of Bioinformatics, Maastricht University, Maastricht, The Netherlands 2Maastricht Centre for Systems Biology, Maastricht University, Maastricht,The Netherlands

3Department of Epidemiology, Maastricht University,Maastricht, The Netherlands

Email: [email protected], [email protected],

[email protected], [email protected], [email protected]

1. Introduction The prevalence of type 2 diabetes has significantly

increased along with that of obesity worldwide. The

adipose tissue has been recognized as a major endocrine

organ that is associated with obesity related diseases such

as type 2 diabetes. Nowadays every scientist has access to a

vast collection of transcriptomics data that are available in

online repositories. There are plenty of methods for

analyzing these types of data, which can provide insights

into the influence of gene expression on the molecular

level. Pathway and network analysis enables the

visualization of genomics data in the context of known

biological processes. This also allows to capture the

interactions of gene products and metabolites, information

that is important for the interpretation of transcriptomic

results.

2. Materials and Methods A publicly available dataset from Doumatey et al. (2015),

comparing the omental adipose tissue of 20 morbidly obese

subjects (GEO accession: GSE71416), including 14

morbidly obese individuals with T2D and 6 morbidly obese

individuals without T2D, was chosen based on relevance

and data quality.

The raw data of the 20 obese subjects were reanalyzed

with ArrayAnalysis.org, our online microarray quality

control and preprocessing pipeline. Pathway analysis was

performed in PathVisio 3.2.1 (www.pathvisio.org) to

interpret and visualize the molecular changes on a pathway

level. Network analysis was performed with Cytoscape

3.3.0 where all the significantly altered pathways were

merged and visualized.

3. Results In the selected human omental adipose tissue dataset,

statistical analysis showed that 310 genes were differentially

expressed, of these 225 were downregulated and 85

upregulated, in the obese diabetic subjects compared to the

obese nondiabetics. Pathway analysis revealed 8

significantly altered pathways in the WikiPathways human

pathway collection. These significantly altered pathways

were then merged into one combined network. Connection

between pathways was also possible, by 25 nodes present in

more than 2 pathways. The extension with transcription

factorgene interactions from the ENCODE project revealed

new connections between the pathways on a regulatory level.

4. Discussion This approach is an ongoing work that aims to describe

pathway and network analysis methods that will advance the

study of the diabetic omental adipose tissue. The

incorporation of this information in our approach is useful in

understanding the mechanisms affected by disease while

investigating the interaction between pathways involved in

the human omental adipose tissue by connecting them in a

biological network and extending them with additional

knowledge on transcription factor regulation and disease

association.

References: 1. Kutmon, Martina, Chris T. Evelo, and Susan L. Coort. "A

network biology workflow to study transcriptomics data of the

diabetic liver." BMC genomics 15.1 (2014): 971. 2. Doumatey, Ayo P., et al. "Global Gene Expression Profiling in

Omental Adipose Tissue of Morbidly Obese Diabetic African

Americans." Journal of endocrinology and metabolism 5.3

(2015): 199.

Poster #40

Poster #21 Development of pH -insensitive FRET sensors for … · 2016. 9. 27. · Development of...

Documents

Transcript of Poster #21 Development of pH -insensitive FRET sensors for … · 2016. 9. 27. · Development of...