BBCC2015 Bioinformatica e Biologia Computazionale in Campania Off the record Abstract Book Consiglio Nazionale delle Ricerche Istituto di Scienze dell’Alimentazione Avellino 4 Dicembre 2015


Page 1: BBCC2015 Bioinformatica e Biologia Computazionale in …bioinformatica.isa.cnr.it/BBCC/BBCC2015/BBCC2015-abstracts-book.pdf · BBCC2015 Bioinformatica e Biologia Computazionale in

BBCC2015

Bioinformatica e Biologia

Computazionale in Campania

Off the record

Abstract Book

Consiglio Nazionale delle Ricerche

Istituto di Scienze dell’Alimentazione Avellino

4 Dicembre 2015


Meeting Chair:
Angelo Facchiano, CNR-ISA - National Research Council, Institute of Food Science, Avellino, Italy

Program Committee:
• Angelo Facchiano - National Research Council, Institute of Food Science, Avellino, Italy
• Graham Ball - School of Science and Technology, Nottingham Trent University, UK
• Olivier Dameron - University of Rennes, France
• Pedro José Navarro Álvarez - Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
• Steffen B. Petersen - Medical Photonics Group, Department of Health Science and Technology, Aalborg University, Denmark
• Vasilis Promponas - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus

Organizing Committee:
• Angelo Facchiano (Responsible) - CNR-ISA, Avellino, Italy
• Anna Marabotti - Università di Salerno and CNR-ISA, Avellino, Italy
• Eugenio Del Prete - CNR-ISA, Avellino, Italy
• Serena Dotolo - CNR-ISA, Avellino, Italy

Technical support:
• Clemente Meccariello - CNR-ISA, Avellino, Italy
• Antonio Ottombrino - CNR-ISA, Avellino, Italy

BBCC2015 is organized with the support of Progetto Bandiera InterOmics

with the patronage of BITS – Società Italiana di Bioinformatica


Meeting Program

December 4th, 2015 – Avellino

9.30-10.00 Registration and poster setup

10.00-10.30 Opening and Introduction to the meeting Angelo Facchiano

10.30-10.45 Comparison of gene expression signature using rank based statistical inference Kumar Parijat Tripathi, Sonali Gopichand Chavan, Marina Piccirillo, Sara Magliocca, Seetharaman Parashuraman, Mario R. Guarracino

10.45-11.00 Identification of a regulatory signature in mouse ES cells by reverse engineering of gene regulatory network Teresa M. R. Noviello, Daniela Tagliaferri, Giovanna M. Ventola, Geppino Falco, Luigi Cerulo and Michele Ceccarelli

11.00-11.15 Shared genetic susceptibility to neuroblastoma and congenital heart defects Andrea Cirino, V.A. Lasorsa, P. Pignataro, D. Formicola, A. Iolascon, M. Capasso

11.15-11.30 Strategies and difficulties in assembling highly recombinogenic plant organelle genomes: a case study Concita Cantarella, Rachele Tamburino, Nunzia Scotti, Teodoro Cardi, Nunzio D'Agostino

11.30-11.45 Unraveling zucchini transcriptome response to aphids Alessia Vitiello, D. Scarano, N. D'Agostino, M. C. Digilio, F. Pennacchio, G. Corrado, R. Rao

11.45-12.00 A new GRASP metaheuristic for biclustering of gene expression data Daniele Ferone, Angelo Facchiano, Anna Marabotti, and Paola Festa

12.00-12.15 miRNA and LncRNA genomic association with ATRA treatment: an integrated analysis for non-coding gene expression and H3K9-14 histone acetylation Gianluigi Franci, Monica Franzese, Joost Martens, Claudia Angelini, Lucia Altucci

12.15-12.30 A macroscopic mathematical model for cell migration assay using a real-time technology Claudia Angelini, Francesca Carfora, Maria Vincenza Carriero, Ezio Di Costanzo, Vincenzo Ingangi, Roberto Natalini

12.30-13.00 Discussion: present and future activities in Campania


13.00-14.00 Lunch

14.00-14.30 Poster discussion

14.30-14.45 GALACTOSEMIA Web DB: A Web-accessible database of Galactosemia-related proteins Antonio d’Acierno, Angelo Facchiano, Anna Marabotti

14.45-15.00 Feature selection on a dataset of protein families: from exploratory data analysis to statistical variable importance Eugenio Del Prete, Serena Dotolo, Anna Marabotti, Angelo Facchiano

15.00-15.15 Structural characterization of the Hepatitis C Virus E2 glycoprotein: computational and experimental approaches. Daniela Barone, Nicole Balasco and Luigi Vitagliano

15.15-15.30 Nutraceutical search through the pipeline of pharmacophore-based virtual screening Amit Dubey, Eugenio Del Prete, Serena Dotolo, Angelo Gaeta, Anna Marabotti, Pramod W. Ramteke and Angelo Facchiano

15.30-15.45 Whole transcriptome investigation of tomato root response to the interaction with the beneficial rhizosphere fungus Trichoderma harzianum Maria Salzano, M. De Palma, N. D’Agostino, M. Lorito, M. Ruocco, M. Tucci

15.45-16.00 Making a genome reference a reference in the fast evolving genomics era Maria Luisa Chiusano

16.00-16.15 Transcription, non-coding, transposons and the evolution of organismal complexity Remo Sanges

16.15-16.30 Discussion and Conclusions


Preface – Introduction to the meeting

BBCC2015 - Tenth edition of “Bioinformatics and Computational Biology in Campania”

Angelo Facchiano
Institute of Food Sciences, National Research Council, Avellino, Italy - [email protected]

BBCC is the meeting point for researchers in the Italian region of Campania who share an interest in bioinformatics, either because they work in this field or because of its potential application to their own “wet biology” research. I have organized it yearly since 2006, so it has now reached its tenth edition. The motivation for starting this meeting was the high number of research institutes in this region, with many teams involved in bioinformatics studies and many others working in the biomedical area, as well as in mathematics and informatics, who may be interested in approaching the bioinformatics community. At the end of 2006, I was involved with other colleagues in organizing the annual meeting (April 2007, Naples) of the Bioinformatics Italian Society, BITS (www.bioinformatics.it). While the Italian national community of bioinformaticians had the opportunity to meet at the annual meeting, no regularly organized regional event gave researchers in Campania the same opportunity to establish a community. The benefit of a local event is the broader opportunity for participation. Therefore, I decided to organize this meeting for researchers in the Campania region, also to stimulate the local community in view of the approaching annual meeting in Naples. In December 2006, the meeting registered the participation of about 100 researchers and students, an unexpected success and a strong motivation to repeat the meeting as an annual appointment. Since that first edition, the meeting has been held in the last months of each year, with about 100 participants every year.

The formula of the meeting is very simple: a single-day event, with speakers from different teams and with different expertise, open to the participation of colleagues from other regions and countries, who have brought other ideas and suggestions to the community. The tenth edition has received 27 scientific contributions, with authors from all the main research institutions of Regione Campania, and from other regions and countries too. This confirms the high level of participation and the inclusion of different research teams over the years.

This is the origin, and these are the first ten years of life, of the “Bioinformatica e Biologia Computazionale in Campania - BBCC” meeting series. I hope this initiative has contributed to the growth of the research activity of teams in our region, and to their integration and collaboration. The future development of this initiative will depend on the community's interest in continuing to meet and work together in this field. Over these years, many initiatives have been organized in the region, involving different research teams that participate in the BBCC meetings. I hope this means that the community is really working (and growing) together, and that the BBCC meeting series is contributing to this result. This gives a strong motivation to continue this initiative, looking for new and interesting developments and opportunities.


ORAL PRESENTATION

Comparison of gene expression signature using rank based statistical inference

Kumar Parijat Tripathi 1, Sonali Gopichand Chavan 2, Marina Piccirillo 1, Sara Magliocca 1, Seetharaman Parashuraman 2, Mario R. Guarracino 1

1 Genomic, Proteomic and Transcriptomic Laboratory, National Research Council of Italy (CNR), Institute for High-Performance Computing and Networking (ICAR), Via Pietro Castellino, 111, 80131, Napoli, Italy
2 Institute of Protein Biochemistry, National Research Council of Italy (CNR), Via Pietro Castellino, 111, 80131, Napoli, Italy

To understand the unique characteristics of a biological state or phenotype, it is vital to understand the behaviour of global gene expression. In transcriptomics, gene expression patterns under a given phenotypic state can be used as a proxy for the physiological and chemical responses that allow the cellular system of an organism to survive and propagate. The biological implications of these gene expression patterns are still an open question. In our work, we apply a rank-based statistical approach to study the gene expression signatures of 22 knocked-down (perturbed) genes involved in secretory pathways in more than 12 different cancer cell lines. We built prototype rank lists (PRLs) of differentially expressed transcripts for these gene perturbation experiments in 12 cancer cell lines. By comparing the gene expression signatures of each perturbation across cell lines, we are able to cluster the knocked-down (perturbed) genes according to their expression signatures and so understand the combined effect of these perturbations. This also helps to understand the cellular mechanism behind the macromolecular transport system within the cell.

We also used rank-rank hypergeometric overlap (RRHO) maps to identify statistically significant overlapping genes between the gene expression signatures of the 22 gene perturbation experiments. Our results show that the transcriptional responses to the individual perturbations are not independent; rather, the perturbations exert a combinatorial effect on transcriptional regulation. On the basis of their expression signatures, the 22 knocked-down genes fall into 4 clusters, and sister perturbations within each cluster play a cumulative role in shaping the behaviour of the cellular system.

References:
Fei Li, Yang Cao, Lu Han, Xiuliang Cui, Dafei Xie, Shengqi Wang, and Xiaochen Bo. "GeneExpressionSignature: an R package for discovering functional connections using gene expression signatures". OMICS: A Journal of Integrative Biology, 17(2): 116-118, 2013.
Seema B. Plaisier, Richard Taschereau, Justin A. Wong, and Thomas G. Graeber. "Rank-rank Hypergeometric Overlap: Identification of Statistically Significant Overlap Between Gene-expression Signatures". Nucleic Acids Research, 38, no. 17, 2010.
Subramanian et al. "Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles". PNAS, vol. 102, no. 43, 15545-15550, 2005.
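The rank-rank hypergeometric overlap idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy gene identifiers and the grid step are assumptions made for illustration, and only the upper-tail (enrichment) p-value is computed, without multiple-testing correction.

```python
from math import comb


def hypergeom_upper_tail(overlap, population, size_a, size_b):
    """P(X >= overlap) when size_b genes are drawn from `population` genes,
    size_a of which are marked (hypergeometric upper tail)."""
    total = comb(population, size_b)
    favourable = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, min(size_a, size_b) + 1)
    )
    return favourable / total


def rank_rank_overlap(ranked_a, ranked_b, step):
    """Return {(i, j): p-value} for the overlap between the top-i genes of
    ranked_a and the top-j genes of ranked_b, with i and j on a grid of `step`."""
    if set(ranked_a) != set(ranked_b):
        raise ValueError("both lists must rank the same genes")
    n = len(ranked_a)
    p_values = {}
    for i in range(step, n + 1, step):
        top_a = set(ranked_a[:i])
        for j in range(step, n + 1, step):
            overlap = len(top_a.intersection(ranked_b[:j]))
            p_values[(i, j)] = hypergeom_upper_tail(overlap, n, i, j)
    return p_values


# Toy signatures (hypothetical gene names), ranked from most up- to most down-regulated.
signature_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
signature_b = ["g2", "g1", "g4", "g3", "g8", "g7", "g6", "g5"]
overlap_map = rank_rank_overlap(signature_a, signature_b, step=2)
print(overlap_map[(2, 2)])  # p-value for the two top-2 sets sharing both genes
```

Small p-values in the map mark rank thresholds at which two signatures share more genes than expected by chance; a full RRHO map would typically be drawn as a heat map over all (i, j) pairs.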


ORAL PRESENTATION

Identification of a regulatory signature in mouse ES cells by Reverse Engineering of Gene Regulatory Network

Teresa M. R. Noviello, Daniela Tagliaferri, Giovanna M. Ventola, Geppino Falco, Luigi Cerulo and Michele Ceccarelli

Biogem scarl; Università degli Studi del Sannio; Università degli Studi di Napoli, Federico II

Many computational methods have been developed to infer Gene Regulatory Networks (GRNs) from gene expression data, applying Reverse Engineering (RE) to find Transcription Factor (TF)-target regulatory interactions. The basic assumption of these algorithms is that the causality of transcriptional regulation can be inferred from changes in mRNA expression profiles. Since the performance of RE methods depends heavily on the biological context, it is important to set specific parameters in order to infer a regulatory network that better describes the biological conditions under study. In this work, we propose a novel pipeline to discover a regulatory signature in mouse ESCs using an ensemble of three unsupervised RE GRN algorithms (ARACNE, GENIE3 and ZSCORE) and one supervised algorithm (SIRENE), based on different statistical models. Identifying all the interactions leads to a better understanding of the ESC regulatory circuits underlying properties such as pluripotency and self-renewal.

Our pipeline:
Step 1. We tested each inference method considering two important parameters: an Oracle, i.e. a manually curated gold standard of validated interactions, and a specific collection of microarray datasets from GEO related to the selected Oracle.
Step 2. The GRN is obtained by keeping only the edges predicted by at least two of the tested algorithms, and then the performance is evaluated.
Step 3. A Gene Ontology (GO) enrichment analysis is performed to verify whether the inferred network reflects the regulatory pathways of the chosen biological context.
Step 4. A Master Regulator analysis (MRa) is applied in order to identify putative key genes (Master Regulators, MRs) involved in the selected biological context whose targets are enriched for a particular gene signature. The enrichment is evaluated using a statistical test (Fisher’s exact test), which returns a p-value for each MR.
Step 5. A promoter analysis is performed to find MRs that are also regulators of driver genes for the chosen biological context.

The final inferred network is composed of 1589 TFs and 24987 genes. From the GO enrichment analysis, we are confident that the proposed network is suitable for conducting studies on ESCs. From the MRa, 432 MRs with an FDR<0.05 are obtained and, among them, 28 putative MRs have important regulatory roles in the chosen biological context. The in vitro validation of the regulatory signature is still in progress.

References:
Hache, Hendrik, Hans Lehrach, and Ralf Herwig. "Reverse engineering of gene regulatory networks: a comparative study." EURASIP Journal on Bioinformatics and Systems Biology 2009 (2009): 8.
Margolin, Adam A., et al. "ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context." BMC Bioinformatics 7 Suppl 1 (2006): S7.
Huynh-Thu, Vân Anh, et al. "Inferring regulatory networks from expression data using tree-based methods." PloS ONE 5 (2010): e12776.
Mordelet, Fantine, and Jean-Philippe Vert. "SIRENE: supervised inference of regulatory networks." Bioinformatics 24 (2008): i76-i82.
Prill, Robert J., et al. "Towards a rigorous assessment of systems biology models: the DREAM3 challenges." PloS ONE 5 (2010): e9202.
Lefebvre, Celine, et al. "A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers." Molecular Systems Biology 6 (2010): 377.
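Step 2 of the pipeline in this abstract (keep only the edges predicted by at least two algorithms, then evaluate against the Oracle) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the edge lists, the TF and gene labels and the use of precision and recall as performance measures are hypothetical, since the abstract does not specify the evaluation metric.

```python
from collections import Counter


def consensus_edges(predictions, min_votes=2):
    """predictions maps a method name to an iterable of (tf, target) edges.
    Returns the set of edges predicted by at least `min_votes` methods."""
    votes = Counter()
    for edges in predictions.values():
        votes.update(set(edges))  # each method votes at most once per edge
    return {edge for edge, count in votes.items() if count >= min_votes}


def precision_recall(predicted, gold_standard):
    """Precision and recall of a predicted edge set against a gold standard."""
    true_positives = len(predicted & gold_standard)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_standard) if gold_standard else 0.0
    return precision, recall


# Toy predictions; the TF and gene labels are placeholders, not real results.
predictions = {
    "ARACNE": [("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")],
    "GENIE3": [("TF1", "geneA"), ("TF2", "geneC"), ("TF3", "geneD")],
    "ZSCORE": [("TF1", "geneB"), ("TF3", "geneE")],
    "SIRENE": [("TF1", "geneA"), ("TF2", "geneF")],
}
oracle = {("TF1", "geneA"), ("TF2", "geneC"), ("TF3", "geneD")}

network = consensus_edges(predictions, min_votes=2)
precision, recall = precision_recall(network, oracle)
print(sorted(network), precision, recall)
```

Raising `min_votes` trades recall for precision, which is one reason an ensemble threshold is usually tuned against the gold standard rather than fixed in advance.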


ORAL PRESENTATION

Shared genetic susceptibility to neuroblastoma and congenital heart defects

A. Cirino, V. A. Lasorsa, P. Pignataro, D. Formicola, A. Iolascon, M. Capasso

Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Università degli Studi di Napoli Federico II, Napoli, Italy and CEINGE Biotecnologie Avanzate, Napoli, Italy

Neural crest migration abnormalities can cause the onset of Congenital Heart Defects (CHD) and Neuroblastoma (NB). NB is the most frequent cancer diagnosed during the first year of life and accounts for 13% of all deaths due to childhood malignancies. CHD is the most frequent congenital disorder in newborns, affecting 7 of 1000 live births; it is a major cause of childhood death and long-term morbidity. The aim of this work is to explain the potential association between these two phenotypes by discovering shared genetic risk variants. We performed all analyses using GWAS (genome-wide association study) summary statistics: 480279 single nucleotide polymorphisms (SNPs) from 1627 NB patients and 3254 controls, and 514950 SNPs from 1759 CHD patients and 5159 controls. We also considered different CHD subgroups: Ventricular Septal Defect (VSD, 191 cases), Atrial Septal Defect_Patent Foramen Ovale (ASD_PFO, 340 cases), Transposition of Great Arteries (TGA, 207 cases), Conotruncal (151 cases), Left Heart (387 cases). On these datasets, we performed cross-analysis and meta-analysis (using the Plink and Metal tools) after filtering all SNPs for P<0.01. We found risk variants in RSRC1 (rs1414518, P=2.46x10^-8), which is expressed in the adrenal gland and left heart tissues; in band 4p16.2 (rs4689963, P=1.09x10^-8), previously associated with ASD_PFO (Cordell et al., 2013); and in BARD1 (rs7557557, P=5.15x10^-9), which is expressed in the adrenal gland and was also associated with NB (Capasso et al., 2009).

Although the literature offers no conclusive evidence of a correlation between NB and CHD, our analysis suggests that, at the genetic level, the correct migration of neural crest cells is necessary both for heart development and for the differentiation of peripheral nervous system cells. We hypothesize that there is a shared genetic susceptibility to NB and CHD.

References:
Capasso M, Devoto M, Hou C et al., 2009. Common variations in BARD1 influence susceptibility to high-risk neuroblastoma. Nat Genet 41(6) 718-23.
Cordell HJ, Bentham J, Topf A et al., 2013. Genome-wide association study of multiple congenital heart disease phenotypes identifies a susceptibility locus for atrial septal defect at chromosome 4p16. Nat Genet 45(7) 822-4.
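To make the meta-analysis step in this abstract more concrete, the sketch below combines per-study p-values for one SNP with a sample-size-weighted Z-score, a common meta-analysis scheme. The abstract does not state which scheme or settings were used, so the weighting, the effective sample size formula for case-control designs and the example p-values are all assumptions; only the case and control counts come from the text.

```python
from math import erfc, sqrt
from statistics import NormalDist


def effective_n(cases, controls):
    """Effective sample size of a case-control study (a common convention)."""
    return 4.0 / (1.0 / cases + 1.0 / controls)


def p_to_z(p_value, effect_sign):
    """Convert a two-sided p-value and an effect direction into a signed Z-score."""
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return z if effect_sign >= 0 else -z


def combine(studies):
    """studies: list of (p_value, effect_sign, n_eff) for a single SNP.
    Returns the combined Z-score and its two-sided p-value."""
    numerator = sum(sqrt(n) * p_to_z(p, sign) for p, sign, n in studies)
    denominator = sqrt(sum(n for _, _, n in studies))
    z = numerator / denominator
    return z, erfc(abs(z) / sqrt(2.0))


# Sample sizes from the abstract; the p-values and directions are invented.
n_nb = effective_n(1627, 3254)
n_chd = effective_n(1759, 5159)
z_meta, p_meta = combine([(1e-4, +1, n_nb), (1e-4, +1, n_chd)])
print(z_meta, p_meta)
```

When both studies point in the same direction the combined p-value is smaller than either input, whereas opposite directions partly cancel; this is why the effect allele and sign must be harmonised across datasets before combining.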


ORAL PRESENTATION

Strategies and difficulties in assembling highly recombinogenic plant organelle genomes: a case study

Concita Cantarella 1, Rachele Tamburino 2, Nunzia Scotti 2, Teodoro Cardi 1, Nunzio D'Agostino 1

1 Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria, Centro di ricerca per l'orticoltura (CREA-ORT)
2 Consiglio Nazionale delle Ricerche, Istituto di Bioscienze e BioRisorse (CNR-IBBR)

Mitochondrial genomes in plants are larger and more complex than in other eukaryotes owing to their recombinogenic nature, as has been widely demonstrated. The mitochondrial DNA (mtDNA) is usually represented as a single circular map, the so-called master molecule. This molecule includes repeated sequences, some of which are able to recombine, generating sub-genomic molecules in various amounts, depending on the balance between their recombination and replication rates. Recent advances in DNA sequencing technology have given a huge boost to plant mitochondrial genome projects. Conventional approaches to mitochondrial genome sequencing involve extraction and enrichment of mitochondrial DNA, cloning, and sequencing. Large repeats and the dynamic organization of the mitochondrial genome complicate de novo sequence assembly from short reads. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage, and thus the potential to produce genome sequence data of finished quality (fewer gaps and longer contigs). However, recently published articles have revealed that PacBio sequencing alone is still not sufficient to resolve mtDNA assembly issues. Here we present a preliminary hybrid assembly of a potato mtDNA based on both PacBio and Illumina reads, and discuss the strategies and obstacles in assembling genomes containing repeated sequences that are recombinationally active and serve as a constant source of rearrangements.

This work was funded by the Ministry of Education, University and Research, PON Project GenHORT (PON02_00395_3215002), and is supported by the PON R&C 2007-2013 grant funded by the Italian Ministry of Education, University and Research in cooperation with the European Funds for the Regional Development (FESR).


ORAL PRESENTATION

Unraveling zucchini transcriptome response to aphids

A. Vitiello 1, D. Scarano 1, N. D'Agostino 2, M. C. Digilio 1, F. Pennacchio 1, G. Corrado 1, R. Rao 1

1 Department of Agriculture, University of Naples Federico II, Via Universita' 100, 80055 Portici (Italy)
2 Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria, Centro di ricerca per l'orticoltura (CREA-ORT), Via Cavalleggeri 25, 84098 Pontecagnano (Italy)

Cucurbita pepo belongs to the Cucurbitaceae, the second-largest horticultural family of economic importance after the Solanaceae. One major issue in zucchini cultivation is the damage caused by aphids such as Aphis gossypii (Homoptera: Aphididae). The aim of this study is to identify candidate genes involved in the response of zucchini plants to A. gossypii. In order to monitor the effect of the zucchini-aphid interaction at the transcriptomic level, zucchini plants (cv “San Pasquale”) were grown under controlled conditions in the presence or absence of A. gossypii. Leaf material was collected 24, 48 and 96 hours after aphid infestation. The extracted RNA was sequenced using the Illumina HiSeq 2500 platform. Sequencing generated ~34 million paired-end reads of 100 nucleotides per sample. High-quality reads were de novo assembled into 71,648 transcripts (average length of 1331 nt). About 94% of the assembled transcripts contain coding sequences that could be translated into proteins. Over 60% of the transcripts were functionally annotated and assigned to one or more InterPro domains and Gene Ontology terms. A subset of 42,517 sequences of the C. pepo transcriptome was used for read mapping and for the identification of differentially expressed genes (DEGs). The largest number of DEGs was observed 48 hours after aphid infestation. The transcriptome represents a high-quality reference for read alignment and DEG calling. Understanding the molecular response of infested plants will be essential for developing new tools for A. gossypii control.


ORAL PRESENTATION

A new GRASP metaheuristic for biclustering of gene expression data

Daniele Ferone 1, Angelo Facchiano 2, Anna Marabotti 2,3, and Paola Festa 1

1 University of Napoli Federico II, Dept. of Mathematics and Applications, 80126 - Napoli, Italy
2 CNR, Institute of Food Science, 83100 - Avellino, Italy
3 University of Salerno, Dept. of Chemistry and Biology, 84084 - Fisciano (SA), Italy

The term biclustering stands for the simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly in relation to the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we have recently designed and implemented a GRASP metaheuristic [2, 3, 4]. The greedy criterion used in the construction phase uses the Euclidean distance to build spanning trees of the graph representing the input data matrix. Once a complete solution has been obtained, the local search procedure tries both to enlarge the current solution and to improve its H-score by exchanging rows and columns. The proposed approach has been tested on 5 synthetic datasets [5]: 1) constant biclusters; 2) constant, upregulated biclusters; 3) shift-scale biclusters; 4) shift biclusters; and 5) scale biclusters. Compared with state-of-the-art competitors, its behaviour is excellent on shift datasets and very good on all other datasets except for scaled ones. In order to improve its behaviour on scaled data as well, and to reduce running times, we have designed and preliminarily tested a variant of the existing GRASP whose local search phase returns an approximate locally optimal solution. The resulting algorithm promises to be a more efficient, general, and robust method for the biclustering of all kinds of biological data.

References
1. Y. Cheng and G. Church. Biclustering of Expression Data, Proc. Int. Conf. Intell. Syst. Mod. Biol., 8, 93-103 (2000).
2. T. A. Feo and M. G. C. Resende. Greedy Randomized Adaptive Search Procedures, J. Global Optim., 6, 109-134 (1995).
3. P. Festa and M. G. C. Resende. An annotated bibliography of GRASP - Part I: Algorithms, Int. Trans. Oper. Res., 16(1), 1–24 (2009).
4. P. Festa and M. G. C. Resende. An annotated bibliography of GRASP - Part II: Applications, Int. Trans. Oper. Res., 16(2), 131–172 (2009).
5. K. Eren, M. Deveci, O. Küçüktunç, and Ü. V. Çatalyürek. A comparative analysis of biclustering algorithms for gene expression data, Brief. Bioinform., 1–14 (2012).
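As an illustration of the H-score that the local search in this abstract tries to improve, the sketch below computes the mean squared residue of a bicluster, assuming that "H-score" refers to the score defined by Cheng and Church [1]. The function name and the toy matrices are illustrative; this is not the authors' GRASP code.

```python
def h_score(matrix, rows, cols):
    """Mean squared residue of the bicluster given by the selected row and
    column indices of `matrix` (a list of equal-length lists of numbers)."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall_mean) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# A shift pattern (each row is another row plus a constant) has zero residue.
shift = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [11.0, 12.0, 13.0]]
# A scale pattern (each row is a multiple of another row) does not.
scale = [[1.0, 2.0], [2.0, 4.0]]

print(h_score(shift, rows=[0, 1, 2], cols=[0, 1, 2]))  # 0.0
print(h_score(scale, rows=[0, 1], cols=[0, 1]))        # 0.0625
```

The second value shows that a perfectly coherent scale pattern is still penalised by this score, which is a known property of the mean squared residue and worth keeping in mind when reading results on scale datasets.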


ORAL PRESENTATION

miRNA and LncRNA genomic association with ATRA treatment: an integrated analysis for non-coding gene expression and H3K9-14 histone acetylation

Gianluigi Franci 1, Monica Franzese 2, Joost Martens 3, Claudia Angelini 2, Lucia Altucci 1

1 Dipartimento di Biochimica, Biofisica e Patologia Generale, Seconda Università di Napoli, Italy
2 Istituto per le Applicazioni del Calcolo, CNR, Napoli, Italy
3 Department of Molecular Biology, Faculty of Science, Nijmegen Centre for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands

Epigenetic modifications influence gene expression and are involved in several abnormal cellular events leading to oncogenesis. In particular, histone acetylation and methylation result in the silencing or activation of specific patterns in malignant blasts. Moreover, treatment with HDAC inhibitors (HDACi), such as ATRA, has been shown to induce re-expression of previously silenced genes and to restore de-regulated patterns. In this work, we consider acute promyelocytic leukemia (APL), a disease that is largely curable with all-trans retinoic acid (ATRA)-based induction therapy followed by two or three courses of consolidation chemotherapy. Currently, approximately 90% of newly diagnosed patients with APL achieve complete remission and over 70% of them are curable [1]. We aim to understand the biological basis of why some patients do not respond to ATRA treatment. To answer this question, we use genome-wide epigenetic studies. We investigated the effects of ATRA on the modulation of histone acetylation, to determine whether de-regulated genes are restored, and subsequently investigated other biological processes involved in ATRA treatment at the genomic level. Data are generated from ChIP-seq and RNA-seq experiments in the human promyelocytic leukemia cell line NB4.

The two data types are analyzed together and integrated, in order to correlate patterns of epigenetic modifications with changes in gene expression and also to evaluate specific cellular processes of interest through a functional enrichment analysis. For the ChIP-seq data, we analyzed several epigenetic modulators in the NB4 cell line at the H3K9-14ac level, in comparison with ATRA treatment at two time points: 4 h and 24 h. ChIP-seq analysis identified specific regions regulated in a time-dependent manner. From the integration with RNA-seq data, we associated these differentially enriched regions, increased or decreased under ATRA treatment, with the presence of miRNA and LncRNA. Our final hypothesis centres on the involvement of non-coding regions in non-responsive patients, with a view to restoring de-regulated patterns. Additional data on other epigenetic drugs (SAHA and MS) at 4 hours are also considered. They are included in the analysis to compare how the responses to these epi-drugs change the chromatin state over time.

References
1. R. Ohno, N. Asou, K. Ohnishi. Leukemia (2003) 17, 1454–1463. doi: 10.1038/sj.leu.2403031.
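One simple way to associate differentially enriched ChIP-seq regions with annotated miRNA and LncRNA loci, as described in this abstract, is by genomic interval overlap. The Python sketch below shows that idea only; the coordinates, region labels and feature names are invented, intervals are assumed to be half-open, and the abstract does not say which tool or criterion the authors used.

```python
def annotate_regions(regions, features):
    """Map each region name to the sorted names of the features it overlaps.

    regions and features are lists of (chrom, start, end, name) tuples with
    half-open coordinates [start, end)."""
    hits = {}
    for r_chrom, r_start, r_end, r_name in regions:
        hits[r_name] = sorted(
            f_name
            for f_chrom, f_start, f_end, f_name in features
            if f_chrom == r_chrom and r_start < f_end and f_start < r_end
        )
    return hits


# Hypothetical differentially acetylated regions and non-coding annotations.
regions = [
    ("chr1", 100, 500, "gained_4h"),
    ("chr1", 900, 1200, "lost_24h"),
    ("chr2", 100, 500, "gained_24h"),
]
features = [
    ("chr1", 450, 520, "miRNA_toy_1"),
    ("chr1", 1200, 1500, "lncRNA_toy_1"),
    ("chr2", 50, 120, "lncRNA_toy_2"),
]

associations = annotate_regions(regions, features)
print(associations)
```

The nested scan is quadratic and only suitable for small inputs; for genome-wide peak sets an interval tree or a sorted sweep would be the usual choice.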


ORAL PRESENTATION

A macroscopic mathematical model for cell migration assay using a real-time technology

Claudia Angelini 1, Francesca Carfora 1, Maria Vincenza Carriero 2, Ezio Di Costanzo 1, Vincenzo Ingangi 2, Roberto Natalini 3

1 Istituto per le Applicazioni del Calcolo "M. Picone", Consiglio Nazionale delle Ricerche, Via Pietro Castellino, 111, 80131 - Napoli
2 Department of Experimental Oncology Unit, IRCCS Istituto Nazionale Tumori “Fondazione G. Pascale”, via M. Semmola, 80131 - Napoli
3 Istituto per le Applicazioni del Calcolo "M. Picone", Consiglio Nazionale delle Ricerche, Via dei Taurini, 19, 00185 - Roma

Classical cell migration and chemotaxis assays are performed in the so-called Boyden chamber. In this assay, cell motility can be estimated at the end of the experiment by measuring the fraction of cells that pass through a porous membrane interposed between two vertical chambers: an upper chamber containing the cells, and a lower chamber containing a serum or chemoattractant. A recent technology, the xCELLigence Cell Analysis System, makes it possible to monitor cell migration in real time in a similar experiment, relying on a micro-electronic biosensor built under the membrane in the bottom well. The sensor, which measures variations in electrical impedance, gives quantitative real-time information about the status of the cells, including cell number, viability and morphology [1, 2]. Mathematical models can be very useful for describing a wide variety of biological systems, including cell dynamics. In relation to the above-mentioned experimental assay, we present a macroscopic mathematical model of cellular transport through a porous membrane [3]. The model relies on convection-reaction-diffusion partial differential equations for both a cell density and a chemical species.

We show numerical simulations illustrating the dynamics of the model, depending on the initial conditions of the experiment, and we compare the behaviour of the simulations with real-time experimental data on different cell lines.

References:
[1] xCELLigence -- Real Time Cell Analysis System (Acea Biosciences, distributed by Roche Diagnostics), www.roche-applied-science.com
[2] R. Limame, A. Wouters, et al. (2012), Comparative Analysis of Dynamic Cell Viability, Migration and Invasion Assessments by Novel Real-Time Technology and Classic Endpoint Assays. PLoS ONE 7(10), e46536.
[3] C. Angelini, F. Carfora, M. V. Carriero, E. Di Costanzo, V. Ingangi, R. Natalini (2015), A macroscopic mathematical model for cell migration assay using xCELLigence Real Time Cell Analysis, in preparation.
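The abstract does not give the equations of the model, so the following Python sketch only illustrates the general class it names: a one-dimensional convection-reaction-diffusion system in which a cell density diffuses and drifts up the gradient of a chemical that diffuses and decays. The specific equations, the parameter values, the absence of an explicit membrane term and the explicit finite-volume scheme are all assumptions made for illustration.

```python
def step(cells, chem, dx, dt, d_cell, d_chem, chi, decay):
    """Advance cell density and chemoattractant by one explicit time step
    with zero-flux boundaries (conservative finite-volume form for the cells)."""
    n = len(cells)
    flux = [0.0] * (n + 1)  # flux[k] is the cell flux through the left face of cell k
    for i in range(n - 1):
        grad_chem = (chem[i + 1] - chem[i]) / dx
        velocity = chi * grad_chem  # chemotactic drift towards higher concentration
        upwind = cells[i] if velocity > 0 else cells[i + 1]
        flux[i + 1] = -d_cell * (cells[i + 1] - cells[i]) / dx + velocity * upwind
    new_cells = [cells[i] - dt / dx * (flux[i + 1] - flux[i]) for i in range(n)]
    new_chem = []
    for i in range(n):
        left = chem[i - 1] if i > 0 else chem[i]
        right = chem[i + 1] if i < n - 1 else chem[i]
        laplacian = (left - 2.0 * chem[i] + right) / dx ** 2
        new_chem.append(chem[i] + dt * (d_chem * laplacian - decay * chem[i]))
    return new_cells, new_chem


def migrated_fraction(chi, n=50, steps=400, dx=0.02, dt=0.005):
    """Cells start in the first half of the domain (upper chamber) and the
    chemical in the second half (lower chamber). Returns the fraction of
    cells found in the second half at the end, and the total cell mass."""
    half = n // 2
    cells = [1.0] * half + [0.0] * (n - half)
    chem = [0.0] * half + [1.0] * (n - half)
    for _ in range(steps):
        cells, chem = step(cells, chem, dx, dt, 1e-3, 1e-2, chi, 0.1)
    return sum(cells[half:]) / sum(cells), sum(cells) * dx


fraction_chemotaxis, mass = migrated_fraction(chi=5e-3)
fraction_diffusion_only, _ = migrated_fraction(chi=0.0)
print(fraction_chemotaxis, fraction_diffusion_only, mass)
```

The time step is chosen small enough for the explicit scheme to remain stable with these parameters. Comparing the two runs shows how a chemotactic term increases the migrated fraction relative to pure diffusion, which is the kind of quantity an endpoint Boyden assay measures.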


ORAL PRESENTATION

GALACTOSEMIA Web DB: A Web-accessible Database of Galactosemia-related proteins

Antonio d’Acierno 1, Angelo Facchiano 1, Anna Marabotti 1,2

1 Institute of Food Science, CNR, Avellino 83100, Italy
2 Department of Chemistry and Biology, University of Salerno, Fisciano (SA) 84084, Italy

Galactose is a monosaccharide present in several foods. Once introduced into the body, it is metabolised by a biochemical pathway involving three enzymes: galactokinase (GALK), galactose-1-phosphate uridylyltransferase (GALT), and UDP-galactose-4'-epimerase (GALE). Hereditary deficiencies of these three enzymes in humans are related to three different forms of the genetic disease collectively called "galactosemia". The impairment of GALK causes Galactosemia Type II, GALT deficiency causes Classic Galactosemia, and GALE deficiency is linked to Galactosemia Type III, or Galactose Epimerase deficiency. The clinical manifestations of each enzyme deficiency differ markedly: patients with GALK deficiency, for example, have the mildest clinical consequences, while Classic Galactosemia is potentially lethal in infancy, if undiagnosed and/or untreated, and is also associated with long-term, organ-specific complications. The impairment of these enzymes is linked to the presence of mutations in their genes. The most common ones are missense mutations, which replace one residue of the protein sequence with another. This kind of mutation can have different effects depending on whether the original residue is replaced with a very similar or a very different one, and on where the original residue is located in the protein structure. It has been shown elsewhere that it is possible to infer the severity of a mutation by using computational approaches that predict its impact on protein structure and function, provided that the structure is known.
This kind of knowledge can thus help to correlate the severity of symptoms with the effect at the protein level, and to better understand and, possibly, predict the outcome of a mutation in the individuals carrying it. The aim of the proposed web-accessible database is to collect and provide information about the predicted structural and functional effects of mutations of the GALK, GALT and GALE enzymes linked to the different forms of galactosemia, in order to help researchers reach a deeper understanding of these genetic diseases. At the time of writing, 257 variants are stored in the DB; data can be searched in many ways, since several kinds of combinable filters (basic, structural, by interactors and by type of interactions) have been implemented. Information on the wild-type enzymes is also available.


ORAL PRESENTATION

Feature selection on a dataset of protein families: from exploratory data analysis to statistical variable importance

Eugenio Del Prete 1, Serena Dotolo 1, Anna Marabotti 1,2, Angelo Facchiano 1

1 Istituto di Scienze dell’Alimentazione, CNR, Via Roma 64, 83100 Avellino
2 Dip. di Chimica e Biologia, Università degli Studi di Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA)

INTRODUCTION. Proteins are characterized by several types of features, especially structural, geometrical and energetic ones. Within a protein family, most of these features are expected to be similar. We are interested in identifying which features are useful to recognize proteins that belong to a given family, as well as to define the boundaries between families. Some of these features (or variables) are redundant, that is, their information is not necessary for the characterization of a protein family. Moreover, they can introduce noise when identifying which variables are essential as a fingerprint and, consequently, whether or not they are related to a function of the protein family. In this work, we defined an original approach to analyze protein features with a view to defining their relationships and peculiarities within protein families.

METHODS. This work follows a multi-step approach, mainly carried out in the R environment.

a) Getting and Cleaning Data. Ten protein families were chosen according to their CATH classification, in order to cover different architectures, with rules on the number of structures, the length of the sequence and the choice of the chain. The protein properties investigated are secondary structures, hydrogen bonds, accessible surface areas, torsion angles, packing defects, number of charged residues, free energy of folding, volume and salt bridges.

b) Exploratory Data Analysis. Kernel density estimation overcomes the discreteness of the feature histograms, helping to visualize their distributions and to discover possible unusual multimodal profiles. Pearson’s correlation highlights statistical links between pairs of variables, and Pearson’s distance makes it possible to obtain a dendrogram with a clustering of the features.
Principal component analysis groups the proteins by family and detects possible outliers, whereas sparse principal component analysis performs a sort of feature selection.

c) Predictive modeling for classification. Several classification algorithms were used: decision trees (classical, boosting and bagging), support vector machines (flexible discriminant analysis), centroid (nearest shrunken). The interest is in their estimation of variable importance. The data were split into a training set (70% of the data) and a testing set (30% of the data), with 10-fold cross-validation applied to the training set and repeated ten times. Accuracy, kappa coefficient, sensitivity and specificity were calculated for each method.

RESULTS. In the density plots, the percentage of mostly buried residues is significantly different for each family. The dissimilarity dendrogram shows separate clusters for secondary structures, torsion angles, defects and geometrical features. In the feature network, torsion angles and surface variables turn out to be peripheral (i.e. redundant) with respect to the core of the graph. The principal component analysis biplot gives a good clustering of the protein families, and sparse principal component analysis confirms the dendrogram results. Combining the classification results with the previous ones, the following features are typical for our dataset: helix, strand, coil, turn, hydrogen bond, polar and charged accessible surface area, volume and mostly buried residues. For completeness, the random forest algorithm has the best performance values, in accordance with the features and the dataset.
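The four performance measures named in the methods can all be derived from a confusion matrix. The sketch below is a plain-Python illustration and not the authors' R code; the function name, the one-vs-rest definition of per-class sensitivity and specificity, and the toy matrix are assumptions introduced for this example.

```python
def classification_metrics(confusion):
    """Compute accuracy, Cohen's kappa and per-class sensitivity/specificity.

    confusion[i][j] is the number of items whose true class is i and whose
    predicted class is j. Per-class values use a one-vs-rest reading.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(confusion[i]) for i in range(k)]            # true counts
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    correct = sum(confusion[i][i] for i in range(k))

    accuracy = correct / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    sensitivity = [confusion[i][i] / row_totals[i] for i in range(k)]
    specificity = [
        # True negatives of class i divided by all items not in class i.
        (total - row_totals[i] - col_totals[i] + confusion[i][i])
        / (total - row_totals[i])
        for i in range(k)
    ]
    return accuracy, kappa, sensitivity, specificity


if __name__ == "__main__":
    toy = [[40, 10],   # hypothetical counts, not data from the study
           [5, 45]]
    print(classification_metrics(toy))
```

Kappa corrects accuracy for chance agreement, which matters when the families in a dataset have very different sizes.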


CONCLUSIONS. Graphical multivariate procedures are good tools for characterizing protein families and investigating their possible fingerprints. Predictive models for classification help in performing feature selection, also by means of variable importance estimation. In the future, the use of multivariate regression models and an increase in the number of protein families could improve our work.

ACKNOWLEDGMENTS. This work is partially supported by the Flagship InterOmics Project (PB.P05, funded and supported by the Italian Ministry of Education, University and Research and Italian National Research Council organizations).

References
Del Prete E, Dotolo S, Marabotti A, Facchiano A, “Statistical analysis of protein structural features: relationships and PCA grouping”, Lecture Notes in Computer Science, 8623, 33-43, 2015.
Grömping U, “Variable Importance Assessment in Regression: Linear Regression versus Random Forest”, The American Statistician, 63, 308-319, 2009.
Kuhn M, “Building Predictive Models in R Using the caret Package”, Journal of Statistical Software, 28 (5), 2008, URL http://topepo.github.io/caret/index.html
Sillitoe I, Cuff AL, Dessailly BH, Dawson NL, Furnham N, Lee D, Lees JG, Lewis TE, Studer RA, Rentzsch R, Yeats C, Thornton JM, Orengo CA, “New functional families (Fun-Fams) in CATH to improve the mapping of conserved functional sites to 3D structures”, Nucleic Acids Research, 41 (Database issue), D490-D498, 2013.
Zou H, Hastie T, Tibshirani R, “Sparse Principal Component Analysis”, Journal of Computational and Graphical Statistics, 15, 265-286, 2006.


ORAL PRESENTATION

Structural characterization of the Hepatitis C Virus E2 glycoprotein: computational and experimental approaches.

Daniela Barone 1,2, Nicole Balasco 1,2 and Luigi Vitagliano 1

1 Istituto di Biostrutture e Bioimmagini, Via Mezzocannone 16, Napoli
2 Dipartimento di Scienze e Tecnologie Ambientali, Biologiche e Farmaceutiche, Seconda Università degli studi di Napoli, Via Vivaldi 43, Caserta

Hepatitis C virus (HCV) infection is a major cause of chronic liver disease worldwide. Although effective therapeutic approaches, based on specific inhibitors of the HCV proteins NS3/4A and NS5B, have recently been discovered, their use is limited by the high cost of these drugs. Currently, there is neither an effective immune globulin for prophylaxis nor a vaccine for the prevention of hepatitis C. A particularly attractive target is the immunogenic E2 glycoprotein, a key factor for HCV entry into host cells. We have recently undertaken studies aimed at evaluating the potential of some regions of the protein as vaccine candidates (1). In this framework, we investigated the structural and dynamic features of the E2 protein, whose structure has recently been solved by two independent groups in complex with antibodies (2,3). Molecular dynamics simulations carried out on the protein core provided interesting information on both the global dynamics of the protein and the local features of important regions. In particular, our study highlights a remarkable structural plasticity of the epitope II region, a target of several neutralizing antibodies. In a parallel investigation, we evaluated the dynamic properties of the peptide corresponding to epitope I (residues 412-422). A combined experimental/computational analysis shows that this region is endowed with high structural versatility.
Interestingly, our Replica Exchange Molecular Dynamics simulations were able to capture, among others, all the states (β-hairpin, single turn, double turn) that this region adopts when complexed with different antibodies. Collectively, these findings provide useful information for future studies aimed at designing anti-HCV vaccines.

References
1. Sandomenico et al., J. of Virology, under revision.
2. Kong et al., Science, 2013.
3. Khan et al., Nature, 2014.
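For readers unfamiliar with replica exchange, the step that distinguishes it from plain molecular dynamics is the periodic attempt to swap configurations between replicas at neighbouring temperatures. The sketch below shows the standard Metropolis acceptance rule for temperature replica exchange, P = min(1, exp[(β_i - β_j)(E_i - E_j)]) with β = 1/(k_B T). It is a generic illustration, not the authors' simulation setup; the function names, energies and temperatures are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance probability for exchanging two replicas.

    energy_i, energy_j: potential energies (kcal/mol) of the configurations
    currently held at temperatures temp_i and temp_j (K).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the exchange is accepted for this random draw."""
    return rng.random() < swap_probability(energy_i, energy_j, temp_i, temp_j)


if __name__ == "__main__":
    # Hypothetical energies for replicas at 300 K and 310 K.
    print(swap_probability(-105.0, -100.0, 300.0, 310.0))
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the reverse move is accepted only with a probability that shrinks as the energy gap or the temperature gap grows.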


ORAL PRESENTATION

Nutraceuticals search through the pipeline of pharmacophore-based virtual screening

Amit Dubey 1,2,3, Eugenio Del Prete 2, Serena Dotolo 2, Angelo Gaeta 2, Anna Marabotti 2,4, Pramod W. Ramteke 1 and Angelo Facchiano 2

1 Jacob School of Biotechnology and Bioengineering, Sam Higginbottom Institute of Agriculture, Technology and Sciences, Allahabad-211007 (India).
2 Istituto di Scienze dell'Alimentazione – CNR, via Roma 64, Avellino-83100 (Italy).
3 International Centre for Genetic Engineering and Biotechnology, AREA Science Park Padriciano 99, Trieste-34149 (Italy).
4 Dipartimento di Chimica e Biologia, Università degli Studi di Salerno, Via Giovanni Paolo II 132, Fisciano-84084 (SA) (Italy).

Nutraceuticals are foods, or parts of foods, in conventional or non-conventional form, with verified safety and health benefits beyond their nutritional value. In this work, we describe a novel pipeline for research on nutraceutical compounds based on pharmacophore screening, providing a new idea for drug discovery. In the first step, to identify novel nutraceuticals potentially active as inhibitors of a given enzyme, a pharmacophore model with its key chemical features is generated starting from the experimental structures of the protein in complex with known inhibitors; candidate pharmacophores are ranked by their statistical values of sensitivity and specificity. After the validation step, the pharmacophore model is used for 3D structural screening and mapping against a subset of known nutraceutical compounds generated from DrugBank, or against special subsets from ZINC (ZINC Drug Database - Zdd and ZINC In Man - Zim). Molecular docking is then performed to verify the binding affinity of the compounds. The hits with a good binding energy are investigated in more detail, compared with their pharmacophore features and analysed for their interacting residues.
Then, to obtain an in silico interpretation of the potential activity of the compounds, an integrated investigation is performed by mining literature reports about the effects of the specific compound (or of foods containing it) on human diseases and by extracting expression data from omics repositories, with a view to integrating this information with molecular pathways and networks. The output of our pipeline is a set of candidates for in vitro and in vivo experiments, to test the hypothesis and verify whether they could become novel potential drugs.


ORAL PRESENTATION

Whole transcriptome investigation of tomato root response to the interaction with the beneficial rhizosphere fungus Trichoderma harzianum

Salzano M. 1, De Palma M. 1, D’Agostino N. 2, Lorito M. 3, Ruocco M. 4, Tucci M. 1

1 CNR–Consiglio Nazionale delle Ricerche, Istituto di Bioscienze e BioRisorse, via Università 133, 80055 Portici, Italia
2 Consiglio per la ricerca in agricoltura e l’analisi dell’economia agraria - Centro di ricerca per l'orticoltura (CREA-ORT), via Dei Cavalleggeri 25, 84098 Pontecagnano, Salerno, Italia
3 Università degli Studi di Napoli “Federico II” - Dipartimento di Agraria, Via Università 100, 80055 Portici, Napoli, Italia
4 CNR-Istituto per la Protezione delle Piante sez. Portici, Via Università 130, 80055 Portici, Napoli, Italia

The globally pressing need to increase agricultural yield while reducing environmental impact has promoted the use of sustainable strategies based on natural biocontrol agents. Among beneficial microbes, rhizosphere-competent Plant-Growth-Promoting Fungi have attracted great interest. In this scenario, beneficial strains of Trichoderma have proved to be very effective. Moreover, data obtained on this model through different “omics” approaches can be effectively exploited in agricultural practice through “translational research”. Our work investigated the transcriptional response of tomato roots (Solanum lycopersicum cv. ‘Crovarese’) to T. harzianum strain T22 at early stages of the interaction (24, 48 and 72 h), using a Next Generation Sequencing approach (RNA-seq). We found an intense remodelling of the transcriptome already after 24 h, with more than 75% (938/1243) of the detected Differentially Expressed Genes (DEGs). By contrast, more limited effects on gene expression were observed at 48 and 72 h (80 and 376 DEGs, respectively).
Enrichment analysis of the 24 h DEGs based on Gene Ontology (GO) showed down-regulation of “defence response” and “cell wall organization” activities, as well as induction of macromolecule “metabolism”, “transport” and “localization”. Data inspection via the MapMan tool highlighted considerable enrichment of bin 35 (i.e. “not assigned”) over the three time points. Within this category, a notable number (155) of proteins with unknown function was found. Phylogenetic analysis showed, in particular, the clustering of 9 of them, which share high sequence identity and a γ-thionin domain. Moreover, MapMan annotation assigned 151 DEGs to bin 27 (“RNA-related”), highlighting the activation of RNA-directed DNA methylation (RdDM) mechanisms in tomato roots in response to the fungus. Along with plant transcripts, RNA sequencing of roots also identified 448 Trichoderma genes, which were searched for differential expression between 24 and 48 h and between 48 and 72 h. GO enrichment analysis of the differentially expressed genes revealed that, at 24-48 h, macromolecule metabolic processes were affected; on the other hand, “response to stimulus”-related processes, as well as cellular processes (“communication”, “amino acid metabolism”) and their regulation, were involved at 48-72 h. Taken together, our data were used to develop a model of the tomato root response (within 72 h) to stimulation by the beneficial fungus T. harzianum.
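GO enrichment of a DEG list is commonly assessed with a one-sided hypergeometric test (equivalent to Fisher's exact test), asking whether a term is annotated to more DEGs than expected by chance. The abstract does not state which test or software was used, so the sketch below only illustrates that common approach; the function name and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     number of differentially expressed genes
    hits:       DEGs carrying the GO term
    """
    upper = min(annotated, sample)
    # math.comb returns 0 when asked to choose more items than available,
    # so impossible combinations contribute nothing to the sum.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


if __name__ == "__main__":
    # Toy numbers, not taken from the study: 20 of 200 DEGs carry a term
    # that annotates 300 of 20000 background genes.
    print(enrichment_p_value(20000, 300, 200, 20))
```

In practice the p-values obtained for all tested terms would also be corrected for multiple testing before any term is called enriched.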


ORAL PRESENTATION

Making a genome reference a reference in the fast evolving genomics era

Maria Luisa Chiusano

Università di Napoli Federico II

The spread of “omics” efforts, further pushed by the introduction of high-throughput sequencing technologies (NGS), has strongly impacted biology and associated research, enabling unexpected levels of resolution through cost-affordable and fast experimental technologies. This has increased the need for suitable strategies for centralized data maintenance and for specialized, user-friendly, stable tools and resources for data analysis, integration and interpretation. Major trends in bioinformatics are therefore becoming evident. Scientific challenges, mainly aiming to resolve the primary structure of genomes from different species, genotypes and single cells, in diverse biological fields including medicine, agriculture and ecology, are led by international consortia and/or fully equipped sequencing centers, supported by high-throughput bioinformatics facilities. They produce new genome sequences endowed with a preliminary gene annotation that, though still drafts, are often published in high-impact journals without provision for further curation. As a result, the high production rate does not allow adequate quality standards to be reached and, often, the data are not immediately usable to support associated analyses and/or users who are not bioinformatics experts. On the other hand, the accessibility of sequencing is favoring community-specific efforts, giving rise to multifaceted, non-integrated results. Moreover, bioinformatics tools and omics resources are being produced too fast, which does not always favor the establishment of reliable standards. This affects the availability of stable, comparable, integrated reference results and highlights the importance of bioinformatics dedicated to the establishment of end-user resources. We will discuss the efforts we are making to turn the public reference of the tomato genome into a suitable reference for post-genomics analyses.


ORAL PRESENTATION

Transcription, non-coding, transposons and the evolution of organismal complexity

Remo Sanges

Stazione Zoologica Anton Dohrn, Napoli, Italy

The genomes of living organisms contain a significant fraction of non-coding RNAs (ncRNAs) and transposable elements (TEs). Their amount can vary substantially among species, but researchers are beginning to demonstrate that the more complex an organism is (that is, the more different kinds of cells it has), the higher the percentage of such elements in its genome. The most complex organ known in nature is the brain, which contains the highest number of different cell types. These elements have been shown to be mainly active in the brain and have therefore been proposed to be important for cognitive abilities. TEs and ncRNAs include subgroups of recently evolved elements, and their expansion correlates with cellular complexity. Their importance lies in the fact that, although the total number of protein-coding genes is relatively stable across animals, the number of ncRNAs and TEs appears to be directly proportional to cellular complexity. We are studying the evolution of these elements to better understand how complexity has evolved, developing specific bioinformatics pipelines to evaluate how these features have shaped eukaryotic genomes.


POSTER

Comparing the fluctuations of the intrinsically disordered C-terminal domain in SELK in water or in lipid membrane

Andrea Polo 1, Stefano Guariniello 2, Giovanni Colonna 1, Gennaro Ciliberto 3, Susan Costantini 4

1 Servizio di Informatica Medica, Azienda Ospedaliera Universitaria, Seconda Università di Napoli, Napoli, Italy
2 Dottorato in Biologia Computazionale, Dipartimento di Biochimica, Biofisica e Patologia generale, Seconda Università degli Studi di Napoli, Napoli, Italy
3 Direttore Scientifico, Istituto Nazionale per lo studio e la cura dei tumori “Fondazione G. Pascale” - IRCCS, Napoli, Italia
4 CROM, Istituto Nazionale Tumori “Fondazione G. Pascale” - IRCCS, Napoli, Italia

SELK is a single-pass trans-membrane protein that resides in the endoplasmic reticulum (ER) membrane, with a C-terminal domain exposed to the cytoplasm that is known to interact with different components of the endoplasmic reticulum-associated protein degradation (ERAD) pathway. This protein has been found to be overexpressed in hepatocellular carcinoma and in other cancers. In this work we performed a detailed analysis of the C-terminal domain sequence of SELK, modeled its three-dimensional structure and analyzed its conformational changes by Molecular Dynamics (MD) simulations. Our analysis showed that the C-terminal domain of SELK is a weak polyelectrolyte, specifically a polycation, and has the characteristic molecular signature of natively disordered segments. Since a BLAST search did not identify possible templates with an acceptable percentage of sequence identity with the C-terminal sequence of SELK, its three-dimensional structure was modeled ab initio. The model is characterized by one short helix, while most residues do not adopt regular secondary structure elements. This model was subjected to MD simulation at neutral pH in water.
To deepen the structural analysis of the C-terminal domain, we also studied the organization of the whole protein in the membrane, using a procedure combining comparative modeling, fold recognition and ab initio folding. The complete structure of SELK was then subjected to MD simulations in a system composed of a lipid bilayer and water molecules. Analysis of the resulting trajectories shows that the C-terminal domain of SELK moves much more during the MD simulation in lipid bilayer and water, with a decrease in structural compactness, fewer H-bonds, and higher values of the total void volume and of the total solvent-accessible area. However, in both simulations this region is stabilized by a marked number of H-bonds, pi-cation and IAC interactions. Overall, these data suggest that water molecules tend to cluster around the protein, facilitating its compactness.
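Structural compactness along an MD trajectory is often tracked with the radius of gyration, the mass-weighted root-mean-square distance of the atoms from their centre of mass. The abstract does not say which compactness measure was used, so the sketch below is only one common choice; the function name and the toy coordinates are hypothetical.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D points.

    coords: list of (x, y, z) tuples, e.g. atom positions in one frame
    masses: optional list of masses; equal masses are assumed if omitted
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Centre of mass, one coordinate at a time.
    centre = [
        sum(m * p[axis] for m, p in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    weighted_sq = sum(
        m * sum((p[axis] - centre[axis]) ** 2 for axis in range(3))
        for m, p in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


if __name__ == "__main__":
    # Two toy frames (arbitrary units): the second is more extended.
    compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    extended = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
    print(radius_of_gyration(compact), radius_of_gyration(extended))
```

Computed frame by frame for the C-terminal residues only, a larger value in one simulation than in the other would correspond to the loss of compactness described above.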


POSTER

Toward the identification of genetic determinants of breast cancer immune responsiveness

I. Simeone 2,3,4, W. Hendrickx 1, S. Anjum 3, L. D. Miller 5, H. Bensmail 3, E. Wang 1, F. M. Marincola 1, M. Ceccarelli 2,4, D. Bedognetti 1

1 Division of Translational Medicine, Sidra Medical and Research Center, Doha, Qatar
2 Department of Science and Technology, University of Sannio, Benevento, Italy
3 Qatar Computing Research Institute (QCRI), Qatar Foundation, Doha, Qatar
4 Bioinformatics Laboratory, BIOGEM, Ariano Irpino, Avellino, Italy
5 Wake Forest School of Medicine, Winston Salem, NC, USA

Overlapping immune signatures are observed among cancers with a better prognostic connotation and those with an increased likelihood of responding to immunotherapeutic approaches. Such signatures qualitatively overlap with those detected during other conditions of immune-mediated tissue destruction, such as flares of autoimmunity or allograft rejection. These pathways reflect a process characterized by the coordinated activation of interferon-stimulated genes (ISGs), the recruitment of cytotoxic cells through the production of specific chemokine ligands (CXCR3 and CCR5 ligands), and the activation of immune effector function (IEF) genes. We refer to these genes as the Immunologic Constant of Rejection (ICR). Here, we tested up-front the prognostic role of 20 ICR genes using a multilevel bioinformatics approach based on the analysis of 1097 breast cancer (BRCA) samples retrieved from The Cancer Genome Atlas (TCGA) database. The samples were filtered to exclude male patients and those with no RNA-Seq data, no clinical data, unclear histology, neo-adjuvant therapy or a history of malignancy (N = 12, 3, 9, 1, 13, 66 respectively), resulting in a dataset of 1002 patients, 904 and 995 of whom also have mutation and copy number (CN) data, respectively.
By mining copy number variation, gene expression and exome sequencing (somatic mutation) data, we show that the 20 ICR genes can segregate breast cancers into different immune phenotypes characterized by distinctive prognostic connotations and specific somatic alterations.


POSTER

Preliminary computational analysis of the influence of Type 2 Diabetes on myocardial post-infarction transcriptome.

Antonio Federico 1,2, Carla Pollastro 1,2, Carmela Ziviello 1, Marianna Aprile 1, Maria Luisa Balestrieri 3, Valerio Costa 1, Giuseppe Paolisso 4, Raffaele Marfella 4 and Alfredo Ciccodicola 1,2

1 CNR, Institute of Genetics and Biophysics “Adriano Buzzati-Traverso”, Naples, Italy
2 Department of Science and Technology, University of Naples “Parthenope”, Naples, Italy
3 Department of Biochemistry, Biophysics and General Pathology, Second University of Naples, Naples, Italy
4 Department of Medical, Surgical, Neurological, Aging and Metabolic Sciences, Second University of Naples, Naples, Italy

Type 2 diabetes is one of the main causes of mortality worldwide. Since diabetes is a degenerative chronic disease, it can strongly affect an individual’s life. Indeed, comorbidity with heart failure is an important clinical variable that can profoundly affect the response to myocardial infarction. In this study, we evaluated the effects of type 2 diabetes on the transcriptome of post-infarction myocardial tissue. In detail, we performed a pilot transcriptome study by RNA-Sequencing on human biopsies of infarcted myocardial tissue from diabetic and euglycemic patients. Computational analysis allowed us to detect differentially expressed genes, particularly genes belonging to the cytokine-cytokine receptor interaction and chemokine signaling pathways, which are known to be associated with myocardial inflammatory and reparative processes. Differential expression was also evaluated by RT-qPCR. Moreover, we analyzed novel unannotated transcripts in all myocardial specimens, searching for new alternative splicing events occurring in genes already associated with both diabetic and post-infarction conditions.
Using a customized computational pipeline, we identified potentially novel lncRNAs and analyzed alternative 3’ UTRs containing putative novel miRNA binding sites. This pilot study therefore represents a good starting point for the identification of new potential markers of myocardial post-infarction rescue in the presence of a severe and very widespread metabolic comorbidity.


POSTER

Comparative analysis of a set of antibody sequences recognizing the celiac autoantigen type 2 transglutaminase

Bianca Rocco 1, Daniele Sblattero 2 and Romina Oliva 1

1 Department of Sciences and Technologies, University “Parthenope” of Naples, Italy
2 Department of Life Sciences, University of Trieste, Italy

We analysed a set of over 100 antibody sequences shown by ELISA assays to bind to type 2 transglutaminase (TG2), the main celiac autoantigen. In this set of antibodies, different VH chains, all belonging to the IGHV5 gene family, were selected upon pairing with a limited set of four VL chains, three of the kappa type and one of the lambda type. With the aim of highlighting possible peculiarities of these anti-TG2 antibody sequences, we analysed them in terms of sequence identity and, especially, of the length and conformation of the six hypervariable loops. Since we observed limited variability in the length, conformation and composition of loops L1-L3 (except for the lambda sequence) and of loops H1-H2, we focused in particular on the length and composition of loop H3, which usually plays a key role in antigen recognition. We could thus identify a “consensus” H3 sequence, present in 40% of the sequences, which is 12 residues long and features specific amino acids at four loop positions. Furthermore, we compared the results of these analyses with those of analogous analyses performed on a recently reported extended repertoire of about 6500 antibody sequences (with VH belonging to the IGHV5 family) from three healthy donors [1], taken as a negative set. Although the average H3 length is similar in the two sets, our analyses revealed a clear bias in terms of preferred length and composition of the loop. Using a Hidden Markov Model (HMM) approach, we then generated ideal H3 “positive” sequences corresponding to the above consensus sequence, which have been inserted into a scaffold celiac antibody and are now under experimental testing for binding to TG2.
References [1] DeKosky BJ, Kojima T, Rodin A, Charab W, Ippolito GC, Ellington AD, Georgiou G. (2015) Nature Med 21:86-91.
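The two loop statistics discussed in this abstract, the length distribution and a per-position consensus among loops of the same length, can be sketched in a few lines. This is a simplified illustration and not the authors' procedure (which also involved an HMM); the function names, the conservation threshold and the toy sequences are hypothetical and are not real H3 loops.

```python
from collections import Counter


def length_distribution(loops):
    """Count how many loop sequences have each length."""
    return Counter(len(seq) for seq in loops)


def consensus(loops, length, min_fraction=0.5):
    """Per-position consensus among loops of a given length.

    A position is reported as its most frequent residue when that residue
    reaches min_fraction of the sequences, otherwise as 'x' (variable).
    """
    same_length = [seq for seq in loops if len(seq) == length]
    result = []
    for column in zip(*same_length):
        residue, count = Counter(column).most_common(1)[0]
        conserved = count / len(same_length) >= min_fraction
        result.append(residue if conserved else "x")
    return "".join(result)


if __name__ == "__main__":
    toy_loops = ["ARDGY", "ARDAY", "ARSGY", "AKW"]  # made-up sequences
    print(length_distribution(toy_loops))
    print(consensus(toy_loops, 5, min_fraction=0.9))
```

Running the same two functions on a positive and a negative set gives a direct way to compare preferred lengths and conserved positions between them.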


POSTER

Novel Bioinformatic Tools for NGS data Analysis and Integration

Righelli D., Franzese M., Russo F., Di Filippo L., Angelini C.

Università degli Studi di Salerno
CNR - Istituto per le Applicazioni del Calcolo "M. Picone"
Università degli Studi di Napoli "Federico II"

The widespread diffusion of NGS techniques requires the analysis and integration of large amounts of omics data, such as RNA-Seq, ChIP-Seq, BS-Seq and Hi-C data. To satisfy this need, several tools have been developed, each addressing one or a few steps of the analysis. Our group is developing novel statistical methods and user-friendly tools, like RNASeqGUI, to support both expert and non-expert users in the analysis of these kinds of data. Our most recent tool is called IntegrHO (Integration of High-throughput Omics data). IntegrHO is a work-in-progress tool written in R and Shiny, aimed at analysing a large variety of specific omic data and at integrating heterogeneous multi-omic data. The current implementation includes a ChIP-Seq data analysis pipeline and its integration with RNA-Seq data. In the near future IntegrHO will include other pipelines, each one designed to analyse a specific type of omic data, and it will further exploit machine-learning approaches for data integration. Another tool under development is Hi-CeekR (which in the future will be encapsulated as a specific pipeline in IntegrHO), aimed at analysing Hi-C data in order to study long-range interactions. Finally, we stress that all our tools support Reproducible (computational) Research: all actions and steps are automatically recorded in a report, and all the data produced are stored as individual database files.

References
- Russo F., Righelli D. and Angelini C. - Advancements in RNASeqGUI towards a Reproducible Analysis of RNA-Seq Experiments - under review
- Russo F., Angelini C. - RNASeqGUI: A GUI for analyzing RNA-seq data. Bioinformatics. 2014. 30(17): 2514-2516.

POSTER

Var2GO: a web-based tool for gene variants selection

Ilaria Granata, Mara Sangiovanni, Francesco Maiorano, Marco Miele and Mario R Guarracino

High Performance Computing and Networking Institute, Laboratory for Genomics, Transcriptomics and Proteomics, Via P. Castellino 111, Napoli, IT

Background: One of the most challenging issues in the variant calling process is handling the data produced by the variant annotation pipeline and filtering the resulting genes so as to retain only those strictly related to the topic of interest. Several tools make it possible to gather annotations at different levels of complexity for the detected genes and to group them according to the pathways and/or processes they belong to. However, this can be a time-consuming and frustrating task, partly because of the size of the file, which may contain many thousands of genes, and partly because the search for associated variants requires a gene-by-gene investigation and annotation approach. As a consequence, the initial gene list is often reduced by exploiting knowledge of variant effect, novelty and genotype, with the risk of losing meaningful information.

Results: Here we present Var2GO, a new web-based tool to support the annotation and filtering of genes coming from variant calling of high-throughput sequencing data. Var2GO allows the user to upload the unprocessed variants table into a database generated on the fly. Genes associated with the variants are automatically annotated with the corresponding Gene Ontology terms covering the three GO domains: Molecular function, Cellular component and Biological process. Through the web interface it is then possible to extract from the whole list the genes with annotations in the domain of interest, simply by specifying one or more keywords. The relevance of this tool is demonstrated on NGS exome sequencing data.

Conclusions: Var2GO is an effective tool that implements a topic-based approach, expressly designed to help biologists narrow the search for relevant genes coming from variant calling analysis.
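
The keyword-based selection described in the Results can be illustrated with a minimal Python sketch. This is not Var2GO's implementation, which the abstract does not describe: the function name `select_genes`, the toy annotation table and the case-insensitive substring matching are all assumptions made for illustration, and the gene names and GO terms are placeholders rather than curated annotations.

```python
from typing import Dict, Iterable, List, Optional

# Toy annotation table (illustrative only, not curated GO data):
# gene symbol -> GO domain -> list of GO term names.
Annotations = Dict[str, Dict[str, List[str]]]

TOY_ANNOTATIONS: Annotations = {
    "GENE_A": {
        "biological_process": ["DNA repair", "cell cycle checkpoint"],
        "molecular_function": ["ATP binding"],
    },
    "GENE_B": {
        "biological_process": ["lipid metabolic process"],
        "cellular_component": ["mitochondrion"],
    },
    "GENE_C": {
        "biological_process": ["regulation of cell cycle"],
        "cellular_component": ["nucleus"],
    },
}


def select_genes(annotations: Annotations,
                 keywords: Iterable[str],
                 domain: Optional[str] = None) -> List[str]:
    """Return genes with at least one GO term containing any keyword.

    Matching is a case-insensitive substring test (an assumption).
    If `domain` is given, only terms from that GO domain are searched.
    """
    wanted = [k.lower() for k in keywords]
    selected = []
    for gene, by_domain in annotations.items():
        domains = [domain] if domain else list(by_domain)
        terms = [t.lower() for d in domains for t in by_domain.get(d, [])]
        if any(k in t for k in wanted for t in terms):
            selected.append(gene)
    return sorted(selected)


if __name__ == "__main__":
    print(select_genes(TOY_ANNOTATIONS, ["cell cycle"]))
    print(select_genes(TOY_ANNOTATIONS, ["mitochondri"],
                       domain="cellular_component"))
```

Restricting the search to one domain mirrors the idea of extracting genes "having annotations in the domain of interest"; a real tool would more likely query GO identifiers in a database than match free text.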

POSTER

The linkage of human circadian rhythms and hepatocellular carcinoma (HCC) through network biology

Sakshi Singh 1, Giovanni Colonna 2, Susan Costantini 3

1 Dottorato in Biologia Computazionale, Dipartimento di Biochimica, Biofisica e Patologia generale, Seconda Università degli Studi di Napoli, Napoli, Italy
2 Servizio di Informatica Medica, Azienda Ospedaliera Universitaria, Seconda Università di Napoli, Napoli, Italy
3 CROM, Istituto Nazionale Tumori "Fondazione G. Pascale" - IRCCS, Napoli, Italy

Human circadian rhythms link the internal biological clock with the external environment and the earth's day and night cycle. They are controlled by a pacemaker located in the suprachiasmatic nuclei (SCN) of the hypothalamus, which is synchronized every day to the photoperiod. They are involved in many diseases, such as diabetes, obesity, depression and bipolar disease, and in many types of cancer, such as breast cancer, colon cancer and hepatocellular carcinoma, which is the third deadliest cancer worldwide. In this work, protein-protein interaction networks were analyzed using Cytoscape software. The human circadian network consists of 2151 nodes and 75821 interactions, making it a very large network. It is highly centralized, with a centralization value of 0.235. The network density is 0.033 and the heterogeneity is 1.012. The characteristic path length is 2.373, while the average number of neighbors is 70.5. Recently we performed a network analysis on gene expression data obtained in our group from HepG2 cells, a liver cancer cell line that lacks viral infection, and identified 26 hub genes [12]. Among these genes, 20 were present in the human circadian rhythm network: CSNK2A1, SRC, UBD, AURKB, CKAP5, RFC4, CDC20, SFN, MCM6, CHEK1, CENPA, HLA-B, BIRC5, MCM3, MAD2L1, MCM4, ZWINT, KIF2C, INCENP and SPC24. All 20 genes had high degree values in the circadian network, ranging from 287 to 77, indicating that they control a large number of functional interactions and the information flow through the circadian network. Furthermore, the network of genes involved in human circadian rhythms and the HepG2 network have 83 hub nodes in common, which establishes a strong relationship between liver cancer and circadian rhythms.

References:
Singh S, Colonna G, Di Bernardo G, Bergantino F, Cammarota M, Castello G, et al. The gene expression profiling of hepatocellular carcinoma by a network analysis approach shows a dominance of intrinsically disordered proteins (IDPs) between hub nodes. Mol Biosyst. 2015; doi:10.1039/c5mb00434a
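
Two of the topological parameters quoted in the circadian network abstract above, the density and the average number of neighbors, follow directly from the node and edge counts. The short Python sketch below reproduces them under the assumption that the network is treated as undirected, with no self-loops or duplicate edges; the function names are illustrative and are not part of Cytoscape.

```python
def average_neighbors(n_nodes: int, n_edges: int) -> float:
    """Mean degree of an undirected network: each edge adds 2 to the degree sum."""
    return 2 * n_edges / n_nodes


def density(n_nodes: int, n_edges: int) -> float:
    """Fraction of possible undirected node pairs that are actually connected."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


if __name__ == "__main__":
    nodes, edges = 2151, 75821  # counts reported in the abstract
    print(round(average_neighbors(nodes, edges), 1))  # 70.5
    print(round(density(nodes, edges), 3))            # 0.033
```

Both values agree with those reported in the abstract (70.5 and 0.033). The centralization, heterogeneity and characteristic path length depend on the full degree distribution and on shortest paths, so they cannot be recovered from the counts alone.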

POSTER

Current Molecular Dynamics force fields do not accurately reproduce the interplay between peptide bond geometry and local conformation

Nicole Balasco 1,2, Luciana Esposito 1, Luigi Vitagliano 1

1 Institute of Biostructures and Bioimaging, C.N.R., Naples I-80134, Italy
2 DiSTABiF, Second University of Naples, Caserta 81100, Italy

Several statistical and quantum mechanics investigations performed in the last two decades have unveiled a strong correlation between protein backbone geometry (bond angles/lengths, dihedral angles and pyramidalization) and the local conformation. This finding has important implications for protein structure prediction, determination, refinement and validation. Predictive protein modeling with ROSETTA has shown improved convergence when these effects are considered. Therefore, force fields currently available for modeling and molecular dynamics should be able to reproduce these geometric properties. We have recently shown that quantum mechanics calculations on small peptide systems reproduce the dependence of bond distances/angles on the conformation and the interplay between peptide bond distortions from planarity and dihedral angles, thus demonstrating that the peptide bond geometry of proteins is essentially ruled by local effects. Here we evaluate the ability of several commonly used force fields to reproduce subtle structural details related to the peptide bond.

References:
Esposito L, Vitagliano L, Zagari A, Mazzarella L. Protein Sci. 2000; 9:2038-42.
Esposito L, De Simone A, Zagari A, Vitagliano L. J Mol Biol. 2005; 347:483-7.
Improta R, Vitagliano L, Esposito L. PLoS One. 2011; 6:e24533.
Berkholz DS, Driggers CM, Shapovalov MV, Dunbrack RL Jr, Karplus PA. Proc Natl Acad Sci U S A. 2012; 109:449-53.
Berkholz DS, Shapovalov MV, Dunbrack RL Jr, Karplus PA. Structure. 2009; 17:1316-25.
Improta R, Vitagliano L, Esposito L. Proteins. 2015; 83:1973-86.
Caballero D, Maatta J, Zhou AQ, Sammalkorpi M, O'Hern CS, Regan L. Protein Sci. 2014 Jul; 23:970-80.
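
The "peptide bond distortions from planarity" discussed in the abstract above are usually expressed through the ω dihedral angle (Cα-C-N-Cα). The sketch below shows one common way to compute a dihedral angle from four atomic positions, and a signed deviation of ω from the planar trans value of 180°. It is a generic geometric calculation, not the authors' analysis code; the function names, the deviation convention and the test coordinates are assumptions chosen for illustration.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def omega_deviation(omega_deg: float) -> float:
    """Signed deviation of omega from the planar trans value (180 degrees)."""
    return (omega_deg + 360.0) % 360.0 - 180.0


if __name__ == "__main__":
    # Idealized, non-physical coordinates: Ca(i), C(i), N(i+1), Ca(i+1).
    ca_i, c_i, n_next = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    trans_ca = (1.0, -1.0, 0.0)
    omega = dihedral(ca_i, c_i, n_next, trans_ca)
    print(abs(omega), omega_deviation(omega))
```

Applied to every residue pair in a set of simulated structures, such a function would give the distribution of ω deviations that can be compared with the trends seen in high-resolution crystal structures and quantum mechanics calculations.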

POSTER

Transcriptional response of dendritic cells to two different antigen delivery systems: a comparative reproducible analysis with RNASeqGUI

Francesco Russo, Valerio Costa, Luciana D'Apice, Claudia Angelini

CNR - IAC, CNR - IGB, CNR - IBP, CNR - IAC

Vaccination is the most successful method to protect an organism from infectious diseases. In recent years, transcriptome analysis of gene expression variations has become a powerful approach to systematically assess the effects of vaccination on cells involved in the immune response. The addition of specific adjuvants to a vaccine is extremely important, since they can modulate the strength and the quality of the immune response. In this work, using previously produced RNA-Seq datasets, we systematically compared the changes in the transcriptome of bone marrow-derived dendritic cells (DCs) exposed to two distinct antigen delivery systems: E2 and fd-scaDEC. The aim of this study is twofold: on the one hand, it allowed us to define, from a biological point of view, the specific immune responses to two distinct antigen delivery systems; on the other hand, it served as a case study in which we applied a typical computational protocol for transcriptome analysis. Here we show an analysis workflow executed in a fully reproducible way using RNASeqGUI, a graphical user interface for studying RNA-Seq data. This open-source R package, freely available at http://bioinfo.na.iac.cnr.it/RNASeqGUI/Download.html, promotes Reproducible Research as a crucial aspect of transcriptomic studies, enhancing the transparency of the code used for RNA-Seq analysis and improving knowledge transfer. Moreover, thanks to RNASeqGUI's main feature, called Caching, a reader can try alternative analyses using the cached objects that store the intermediate results of the analysis conducted with the package. Cached objects are automatically produced and saved when RNASeqGUI functionalities are executed, and can also be used as starting points for further investigations in future work.

POSTER

Ab initio reconstruction of PTC transcriptome identifies a novel long non-coding RNA: MET-AS

Daniela Esposito, Roberta Esposito, Alfredo Ciccodicola, Valerio Costa

Institute of Genetics and Biophysics "Adriano Buzzati-Traverso", National Research Council, Via Pietro Castellino 111, 80131 Naples, Italy; Computational & Biology Open Laboratory, Naples, Italy.

Long non-coding RNAs (lncRNAs) are defined as transcripts longer than 200 nucleotides that lack significant open reading frames (Kapranov et al., 2007; Carninci et al., 2005). LncRNAs play important roles in a variety of biological processes; in particular, they regulate gene expression at the transcriptional, co-transcriptional and post-transcriptional levels (Sun et al., 2013; Guttman et al., 2011). In recent years, accumulating reports of deregulated lncRNA expression in numerous cancer types have highlighted that they may act as potential oncogenes or tumor suppressors (Matjašič and Glavač, 2015; Huarte and Rinn, 2010), but little is known about their involvement in thyroid neoplasia. Thyroid cancer is the most common endocrine-related cancer, and its incidence is increasing by 4% per year. The most common type of thyroid cancer is papillary thyroid carcinoma (PTC), representing about 80% of all thyroid malignancies. Thus, to assess whether lncRNAs can exert a tumorigenic role in thyroid cells, we first systematically quantified their expression in PTC vs non-cancerous thyroid biopsies using 22 RNA-Sequencing datasets recently published by our research group (Costa et al., 2015). Our principal aim was to identify novel lncRNAs (i.e. not yet annotated in public databases) whose expression is significantly altered in patients with PTC. For this purpose, we combined ab initio reconstruction of expressed transcripts in the entire cohort of patients (starting from RNA-Seq raw reads) with a custom computational pipeline to associate newly identified lncRNAs with known cancer driver genes. We then selected only lncRNA/mRNA pairs that displayed significant differential expression between PTCs and control samples. Using this approach, we identified a new lncRNA, transcribed antisense to the MET oncogene, that we named MET-AS. Interestingly, both MET and its associated lncRNA MET-AS are up-regulated in PTC patients carrying the BRAF V600E somatic mutation and RET gene rearrangements compared to patients with somatic mutations in RAS (or with a similar transcriptional profile) and control thyroids. Preliminary data indicate that MET-AS knockdown induces a down-regulation of MET, suggesting that this novel lncRNA might be a new regulator of MET gene expression. Ongoing functional studies will help to shed light on the mechanisms by which MET-AS regulates the oncogene MET, and thus on its involvement in the pathogenesis of papillary thyroid carcinoma.

References:
Carninci P, Kasukawa T, Katayama S, et al. The transcriptional landscape of the mammalian genome. Science 2005; 309:1559-1563.
Costa V, Esposito R, Ziviello C, et al. New somatic mutations and WNK1-B4GALNT3 gene fusion in papillary thyroid carcinoma. Oncotarget 2015; 6(13):11242-51.
Guttman M, Donaghey J, Carey BW, et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 2011.
Huarte M and Rinn JL. Large non-coding RNAs: missing links in cancer? Human Molecular Genetics 2010; 19(R2):R152–R161. DOI: 10.1093/hmg/ddq353.
Kapranov P, Cheng J, Dike S, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 2007; 316:1484–1488.
Matjašič A and Glavač D. Long Noncoding RNAs and Tumorigenesis. In: eLS. John Wiley & Sons, Ltd: Chichester, 2015. DOI: 10.1002/9780470015902.a0025688.
Sun J, Lin Y, Wu J. Long non-coding RNA expression profiling of mouse testis during postnatal development. PLoS One 2013; 8:e75750.
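
The abstract above does not describe how its custom pipeline associates reconstructed transcripts with known genes. One plausible building block, shown here purely as an illustrative assumption, is a test for antisense overlap: a transcript on the same chromosome as a gene, on the opposite strand, with overlapping coordinates. The `Feature` class, the function names and all coordinates below are hypothetical toy values, not data from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-" (unstranded features are not handled here)


def is_antisense_overlap(a: Feature, b: Feature) -> bool:
    """True if a and b overlap on the same chromosome but opposite strands."""
    return (a.chrom == b.chrom
            and a.strand != b.strand
            and a.start < b.end
            and b.start < a.end)


def antisense_pairs(transcripts: Iterable[Feature],
                    genes: Iterable[Feature]) -> List[Tuple[str, str]]:
    """List (transcript, gene) name pairs that are in antisense overlap."""
    genes = list(genes)
    return [(t.name, g.name)
            for t in transcripts
            for g in genes
            if is_antisense_overlap(t, g)]


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    genes = [Feature("GENE_X", "chr1", 1000, 5000, "+"),
             Feature("GENE_Y", "chr1", 8000, 9000, "-")]
    transcripts = [Feature("TX_1", "chr1", 4500, 6000, "-"),
                   Feature("TX_2", "chr1", 4500, 6000, "+"),
                   Feature("TX_3", "chr2", 1000, 5000, "-"),
                   Feature("TX_4", "chr1", 9000, 9500, "+")]
    print(antisense_pairs(transcripts, genes))
```

In a real analysis the candidate pairs would then be filtered by differential expression, as the abstract describes, and an interval index would replace the quadratic loop for genome-scale annotation.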

POSTER

Integration of multi-omics data from public resources for the functional analysis of biological networks: molecular-genetic pathways involving aryl hydrocarbon receptor

Serena Dotolo and Angelo Facchiano

Institute of Food Science (ISA-CNR), Via Roma 64, Avellino (Italy)

Omics approaches are widely applied to investigate physiological processes and pathological conditions. Many public data repositories make it possible to extract data for analysis, comparison and integration, provided that suitable tools are available. Our interest is in the integration of data from different experimental approaches and fields of investigation, covering transcriptomics, proteomics, interactomics, variation data and drug discovery, in order to highlight hidden information and to mine new knowledge from available experimental data. We therefore look at specific gene and protein functions of recognized interest and search for a complete view of their relationships in physiological processes. Moreover, focusing on specific pathologies, we extract from public databases the largest possible amount of experimental results and analyze them with meta-analysis approaches, to gain novel insights into molecular aspects that are useful for defining diagnostics or therapy. In this work, our attention is focused on the integrative functional analysis of molecular pathways that involve AHR (aryl hydrocarbon receptor), a cytosolic transcription factor consisting of several protein domains with distinct functions, including hydrocarbon binding as well as DNA-protein and protein-protein interactions. Previous studies from our lab on this protein have given us specific interest in, and knowledge of, its involvement in many pathologies (1). Therefore, we investigate it from the physiological point of view, as well as for its role in specific pathologies, also in view of the molecular network that includes other proteins of interest for the pathology (2-6). The functional analysis is carried out by means of different open-source bioinformatics platforms, including GeneCards, DSYSMAP and, in particular, the Cytoscape platform for building and visualizing molecular networks at different levels, in order to improve the knowledge of molecular mechanisms. Furthermore, as an example for a specific pathology, we use the BioGPS platform to extract from Gene Atlas the gene expression profiles of our biological targets involved in melanoma, together with MelGene DB (a database for melanoma genetic studies and for the analysis of some important melanoma biomarkers). The poster presents the molecular networks and discusses the potential roles of specific nodes highlighted by the analysis, also in consideration of the role of disease-related mutations.

References:
1. Salzano M, Marabotti A, Milanesi L, Facchiano A. Human aryl-hydrocarbon receptor and its interaction with dioxin and physiological ligands investigated by molecular modelling and docking simulations. Biochem Biophys Res Commun. 2011 Sep 23;413(2):176-81. doi: 10.1016/j.bbrc.2011.08.039.
2. Faraone D, Aguzzi MS, Toietta G, Facchiano AM, Facchiano F, Magenta A, Martelli F, Truffa S, Cesareo E, Ribatti D, Capogrossi MC, Facchiano A. Platelet-derived growth factor-receptor alpha strongly inhibits melanoma growth in vitro and in vivo. Neoplasia. 2009 Aug;11(8):732-42. doi: 10.1593/neo.09408.
3. Facchiano F, D'Arcangelo D, Lentini A, Rossi S, Senatore C, Pannellini T, Tabolacci C, Facchiano AM, Facchiano A, Beninati S. Tissue transglutaminase activity protects from cutaneous melanoma metastatic dissemination: an in vivo study. Amino Acids. 2013 Jan;44(1):53-61. doi: 10.1007/s00726-012-1351-6.

POSTER

Genetic Characterization of Colorectal Cancer using a Next Generation Sequencing approach to identify a specific pattern of somatic alterations in tumors arising from different anatomic sites

Carmelo Laudanna 1,2, Gianluca Santamaria 1, Simona Migliozzi 1,2, Duarte Oliveira 1,2, Donatella Malanga 1,2, Rosario Sacco 3, Antonia Rizzuto 3, Giuseppe Viglietto 1,2

1 Department of Experimental and Clinical Medicine, Magna Græcia University of Catanzaro, Italy.
2 CIS, Centro Interdipartimentale dei Servizi, Magna Græcia University of Catanzaro, Italy.
3 Department of Medical and Surgical Sciences, Magna Græcia University of Catanzaro, Italy.

Colorectal cancer (CRC) is the third leading cause of cancer-related deaths worldwide, with nearly 1.4 million new cases diagnosed in 2012. CRC results from the accumulation of multiple genetic and epigenetic aberrations. Tumor localization along the large intestine determines different surgical approaches and treatment options. Considering the heterogeneous nature of these tumors, we hypothesized that different patterns of molecular alterations could be associated with specific anatomical locations. To identify distinct genomic alterations (e.g., copy number variations and mutations) associated with different CRC anatomical sites, we sequenced 32 CRC samples from different locations (right-sided, left-sided, etc.) using the Ion AmpliSeq™ Comprehensive Cancer Panel, which covers the whole coding sequence of 409 tumor suppressor genes and oncogenes frequently altered in cancer. Interestingly, left-sided tumors were generally more altered than right-sided ones. Cluster analysis of all samples identified a core of 21 genes that were significantly mutated in all sample groups. As expected, KRAS and APC mutations were frequent in tumors resected from different anatomical locations. Unsupervised analysis of copy number variations revealed a core of 160 significantly altered genes. In addition to the expected SRC, MYC and CEBPA, we found interesting genes that are currently being validated. Although the number of cases is limited, the gene panel provides a solid alternative to whole-exome sequencing (WES) for characterizing a signature of alterations correlated with CRC tumors and for identifying novel biomarkers in colorectal carcinoma that could be used as potential clinical targets.
