CORRELA REPORT finalearturo.imati.cnr.it/boxnewsite/techrepPV/14/8PV14-2-0.pdf · terabyte each. It...


Consiglio Nazionale delle Ricerche

Istituto di

Matematica Applicata e Tecnologie Informatiche “Enrico Magenes”

PUBBLICAZIONI

F. Calvi, F. Bonini, A. Lisa AN AUTOMATED PIPELINE FOR CORRELAGENES, A NEW TOOL TO INTERPRETATE THE HUMAN TRANSCRIPTOME

8PV14/2/0


C.N.R. - Istituto di Matematica Applicata e Tecnologie Informatiche “Enrico Magenes”

Pavia office: Via Ferrata, 1 - 27100 PAVIA (Italy), Tel. +39 0382 548211, Fax +39 0382 548300
Genova section: Via De Marini, 6 (Torre di Francia) - 16149 GENOVA (Italy), Tel. +39 010 6475671, Fax +39 010 6475660
Milano section: Via E. Bassini, 15 - 20133 MILANO (Italy), Tel. +39 02 23699521, Fax +39 02 23699538

URL: http://www.imati.cnr.it


An automated pipeline for CorrelaGenes, a new tool to interpretate the human transcriptome

Calvi F¹, Bonini F², Lisa A³

¹ Istituto di Matematica Applicata e Tecnologie Informatiche "Enrico Magenes" - CNR, Pavia, Italy
² CINECA, Segrate (MI), Italy
³ Istituto di Genetica Molecolare - CNR, Pavia, Italy


ABSTRACT

This report presents the automated pipeline for CorrelaGenes, a new bioinformatics tool for the analysis of gene co-expression.

The distributed platform, accessible from the Internet, provides both a straightforward web interface for setting the parameters of the analysis and the back-end procedures that tabulate and process the input data and send the results back to the user.

The hardware consists of two Linux servers and an area shared between them over NFS, on which procedures written as HTML, PHP and Linux shell scripts are implemented.


INTRODUCTION

The amount of gene expression data available in public repositories has grown exponentially in recent years, and new data mining tools are needed to turn these text files into information easily accessible to biologists. CorrelaGenes (Cremaschi et al., 2014) is a new bioinformatics tool that analyzes gene co-expression to identify the functional pathways involved in the biological processes of human cells. Using the expression data from microarray experiments publicly available in the Gene Expression Omnibus (GEO) database, it identifies groups of genes that show similar expression profiles across different experimental conditions. CorrelaGenes uses an association rule mining (ARM) algorithm to identify genes that are frequently co-expressed in the dataset extracted from GEO. To improve the biological significance of the results, the algorithm was modified to search only for association rules involving two genes, one of which, called the Target, is given as a parameter of the search. This modification adds a target-driven approach to the standard ARM technique: it returns the list of genes frequently co-expressed with the Target gene, suggesting their coordinated action in the same biological processes.
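For a two-gene rule involving the Target T and another gene G, the textbook ARM indexes can be written as below. This is a generic sketch that assumes each comparison in the GEO-derived dataset is one transaction in which a gene is either modulated or not; the exact indexes and thresholds used by CorrelaGenes, including its chi-square test, are defined in Cremaschi et al. (2014). N is the number of transactions, n(T) and n(G) count the transactions in which T or G is modulated, and n(T,G) counts those in which both are:

```latex
\mathrm{supp}(T \Rightarrow G) = \frac{n(T,G)}{N}, \qquad
\mathrm{conf}(T \Rightarrow G) = \frac{n(T,G)}{n(T)}, \qquad
\mathrm{lift}(T \Rightarrow G) = \frac{\mathrm{conf}(T \Rightarrow G)}{n(G)/N} = \frac{N\,n(T,G)}{n(T)\,n(G)}
```

With illustrative counts N = 100, n(T) = 20, n(G) = 25 and n(T,G) = 15, support is 0.15, confidence is 0.75 and lift is 0.75 / 0.25 = 3: G is modulated together with T three times as often as expected if the two genes were independent.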

In this report we present the distributed platform, accessible from the Internet, that provides:

1) a user-friendly web interface that allows users to submit an analysis to CorrelaGenes (HTML code);

2) a back-end procedure that creates and tabulates the input data required by the analysis programs (PHP code);

3) batch procedures that process the data and produce and format the resulting output;

4) a final procedure that sends the results back to the user by e-mail.

A schematic representation of this procedure is shown in Figure 1.


Figure 1 Schematic representation of the CorrelaGenes procedure


System Architecture: hardware and software

Two servers are used to implement the pipeline that, starting from the input data, processes them and sends the results to the user. The web interface was developed on a Dell PowerEdge 1900 server (Intel Xeon 2.00 GHz, 4 GB RAM, 400 GB hard disk), SERVER_1, running Debian Linux 4.0. The website was built with TYPO3 version 4.5, an open-source content management system (CMS); Apache 1.3.34 and PHP 5.2.6-1 are installed on this server. A high-performance computing system, SERVER_2, is used to process the data. Its hardware is an IBM cluster of six nodes, each with two 4-core Intel Xeon processors, 42 GB of memory and two 50 GB solid-state drives. The cluster is managed by an IBM x3650 M3 front-end with two 4-core Intel Xeon CPUs, 36 GB of memory and two 250 GB hard drives, which communicates with the cluster through two switches with 10 Gigabit connections. For data storage, the cluster is equipped with eight 1 TB disks in RAID 5 and a backup unit. The cluster runs the CentOS 5.5 Linux distribution. Jobs are submitted through PBS (version 10.4), which lets the user choose how many cores to allocate to each job. The gcc Fortran compiler (gcc version 4.1.2), R (version 2.14.1) and PostgreSQL (version 8.1.23) are also installed and used on this system. Finally, an area shared by NFS between SERVER_1 and SERVER_2 was set up (Figure 2).
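The capacity figures quoted above can be totalled with simple arithmetic (RAID 5 devotes the equivalent of one disk to parity; filesystem overhead is ignored). Each node offers 2 × 4 = 8 cores, which is exactly what the directive `#PBS -l nodes=node02:ppn=8` in Appendix D requests, i.e. all eight cores of node02.

```latex
\begin{aligned}
\text{cluster cores} &= 6\ \text{nodes} \times 2\ \tfrac{\text{CPUs}}{\text{node}} \times 4\ \tfrac{\text{cores}}{\text{CPU}} = 48\ \text{cores}\\
\text{node memory} &= 6 \times 42\ \text{GB} = 252\ \text{GB}\\
\text{RAID 5 usable space} &= (8 - 1) \times 1\ \text{TB} = 7\ \text{TB}
\end{aligned}
```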

Figure 2 Schematic illustration of the system architecture


Web interface

1. Form input data: HTML script

A web interface was created on SERVER_1 to give users easy and efficient access to the tool. Access to CorrelaGenes (www.igm.cnr.it/cabgen) requires a username and a password, managed entirely by the CMS; the login credentials are sent by e-mail to users who request them at the address [email protected] (Figure 3).

Figure 3 CorrelaGenes Login Page

After login, users reach the CorrelaGenes home page (Appendix A), where they define the set of parameters needed to analyze the gene of interest (Cremaschi et al., 2014). Each input field is flanked by a link to a pop-up window that defines the parameter. The page also links to the user guide, sample reports, gene list, support and staff pages (Figure 4).


Figure 4 CorrelaGenes Input Page


When the job is submitted, a window with a summary of the chosen parameters is displayed (Figure 5).

Figure 5 Example of a web browser input request
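The form in Appendix A sends its fields to www/start_process.php by HTTP POST. As a hypothetical illustration of that interface, the POSIX shell function below reproduces a submission from the command line with curl, using the field names of the HTML form. The URL is a placeholder, the function name is invented for this sketch, and any CMS login (for instance a session cookie) that the real site may require is not handled.

```shell
#!/bin/sh
# Hypothetical command-line equivalent of submitting the form in Appendix A.
# Field names come from the HTML form; FORM_URL is a placeholder.
FORM_URL=${FORM_URL:-http://SERVER_1/www/start_process.php}

# submit_job SYMBOL SIGN TARGET_LFC LFC LFC_P COPRES LIFT CHI_P EMAIL
submit_job() {
    # --data-urlencode percent-encodes each value and makes curl send a POST
    curl -s \
        --data-urlencode "MySymbol=$1" \
        --data-urlencode "mysign=$2" \
        --data-urlencode "boundTarget=$3" \
        --data-urlencode "boundLFC=$4" \
        --data-urlencode "boundpVal=$5" \
        --data-urlencode "perComp=$6" \
        --data-urlencode "boundLift=$7" \
        --data-urlencode "perCoCo=$8" \
        --data-urlencode "email=$9" \
        "$FORM_URL"
}
```

For example, `submit_job TP53 1 1 1 0.05 40 2 0.05 jane.doe@example.org` would post the form defaults with an illustrative gene symbol and address.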


To implement the web procedure, we developed an HTML script that displays the input form in the web browser and sends the collected information to a PHP script. The PHP script is executed on SERVER_1 each time a new job is submitted and contains the instructions:


1) to save the formatted input file in the area shared by SERVER_1 and SERVER_2, where a batch procedure will process the data;

2) to send the user an e-mail confirming the job submission.

2. PHP script for formatting and sending the input data

The PHP script (Appendix B), invoked by the HTML form on SERVER_1, reads the input parameters submitted by the user and stores them in local variables. It then checks the format of each parameter, rejects empty fields and verifies that each value lies within its allowed range. If a value is unexpected, the script displays a warning message that highlights the wrong parameter and suggests how to correct it. When all parameters are acceptable, the submission is assigned an incremental number (job-number) that identifies the job in every subsequent step of the analysis. At this stage, the PHP script composes and sends the job-submission notification e-mail, which contains the job number and a summary of the analysis parameters (Figure 6).
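The same checks can be restated compactly. The POSIX shell functions below are a hypothetical sketch, not part of the CorrelaGenes code: they mirror the rules of Appendix B (sign in {-1, 0, 1}, numeric parameters within their ranges, a gene symbol of at most 15 characters starting with an ASCII letter or digit) and could, for example, let the batch side re-check a parameter file before submitting it. Floating-point comparisons are delegated to awk because the shell only does integer arithmetic.

```shell
#!/bin/sh
# Hypothetical re-statement of the parameter checks of Appendix B.

# in_range VALUE MIN MAX: succeed if VALUE is a decimal number in [MIN, MAX].
in_range() {
    awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        if (v !~ /^-?[0-9]+(\.[0-9]+)?$/) exit 1    # not a plain number
        if (v + 0 >= lo + 0 && v + 0 <= hi + 0) exit 0
        exit 1
    }'
}

# valid_sign VALUE: the target sign must be -1, 0 or 1.
valid_sign() {
    case $1 in
        -1|0|1) return 0 ;;
        *) return 1 ;;
    esac
}

# valid_symbol SYMBOL: non-empty, at most 15 characters, first character alphanumeric.
valid_symbol() {
    [ -n "$1" ] && [ ${#1} -le 15 ] || return 1
    case $1 in
        [0-9A-Za-z]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, `in_range "$LFC_P" 0 1` accepts 0.05 and rejects 1.5 or a non-numeric string.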

Figure 6 Example of the text in the notification e-mail for job submission

The PHP script also writes the input data to a tab-delimited text file named “job-number_e-mail.txt”, which contains all the information needed by the subsequent steps of the CorrelaGenes programs (Figure 7). These files are saved in the NFS shared area SHARED_AREA/INPUT, where they remain until the batch procedure runs.
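The two PHP steps that produce this file, incrementing the job counter and writing “job-number_e-mail.txt”, are sketched below as hypothetical POSIX shell functions. Assumptions: the function names, the lock directory, and the tab between each value and its label, which follows the tab-delimited description above (Appendix B itself writes the value and the label separated by a space). Unlike the PHP code, the sketch serializes access to the counter: mkdir either creates the lock directory or fails, atomically, so two simultaneous submissions cannot receive the same job number.

```shell
#!/bin/sh
# Hypothetical sketch: assign a job number and write the input file.

# next_job_number COUNTER_FILE: increment the counter under a lock and print it.
next_job_number() {
    counter=$1
    lock="$counter.lock"
    until mkdir "$lock" 2>/dev/null; do sleep 1; done   # acquire the lock
    n=$(cat "$counter" 2>/dev/null)
    n=$(( ${n:-0} + 1 ))                                # a missing file counts as 0
    printf '%s\n' "$n" > "$counter"
    rmdir "$lock"                                       # release the lock
    printf '%s\n' "$n"
}

# write_input_file DIR JOB EMAIL SYMBOL SIGN TARGET_LFC LFC LFC_P COPRES LIFT CHI_P
# Writes DIR/JOB_EMAIL.txt with CRLF line endings, as Appendix B does,
# and prints the path of the file.
write_input_file() {
    dir=$1 job=$2 email=$3
    shift 3
    file="$dir/${job}_${email}.txt"
    {
        printf '%s\r\n' "$1"                        # target gene symbol
        printf '%s\t%s\r\n' "$2" 'Target Sign' \
                            "$3" 'Target Gene LFC' \
                            "$4" 'Gene LFC' \
                            "$5" 'LFC P Value' \
                            "$6" '% of CO-PRES' \
                            "$7" 'Lift' \
                            "$8" 'Chi Square P Value'
    } > "$file"
    printf '%s\n' "$file"
}
```

A stale lock directory left by a crashed process would block later calls until it is removed by hand, which is the usual trade-off of this simple mutex.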


Figure 7 Example of the formatted input text file


Batch procedure

The batch procedure runs mainly on SERVER_2. It consists of several scripts that run periodically, on a schedule defined in the server crontab, and that in successive steps launch the integrated Fortran and R programs performing the analysis of the gene of interest. The first script, Run.sh (Appendix C), executes the following steps:

1. the paths required by the different analysis programs are assigned and exported;

2. input files with the “.txt” extension in the SHARED_AREA/INPUT directory are detected. If no input file is found, the script exits; otherwise, for each input file it executes the following commands:

a. the job number is extracted from the input file name (job-number_e-mail.txt) and stored in a variable called NUM, while a variable N stores the rest of the name (the e-mail address followed by the .txt extension);

b. a new directory named “job_NUM” is created inside SHARED_AREA/INPUT; the input file is moved into it and renamed with the fixed name expected by the Fortran programs. For the same reason, the Fortran executables and the PBS script are copied into this directory;

3. symbolic links to the gene databases are created in each new directory SHARED_AREA/INPUT/job_NUM;

4. the PBS script (Appendix D), which runs the Fortran programs on a cluster node, is submitted to the PBS queue. At the end of the PBS script, the draft output file with the statistical results is copied to the SERVER_2/PBS directory under the name “NUM_N”.
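Step 2a relies on awk to split the file name at underscores. The functions below are a hypothetical equivalent based on POSIX parameter expansion (the names job_num and job_rest are invented for this sketch). Besides avoiding two awk processes per file, they keep an e-mail address that itself contains an underscore intact, whereas awk's second field stops at the first underscore inside the address; the same change would be needed wherever the name is split again later in the pipeline. As in Run.sh, the rest of the name keeps the .txt extension, so the NUM_N output file still ends in .txt.

```shell
#!/bin/sh
# Hypothetical replacement for the two awk calls of Run.sh (step 2a),
# for names such as 42_jane.doe@example.org.txt.

# job_num NAME: text before the first underscore (the job number, NUM).
job_num() {
    name=${1##*/}                  # drop any leading directory
    printf '%s\n' "${name%%_*}"
}

# job_rest NAME: text after the first underscore (e-mail + ".txt", i.e. N).
job_rest() {
    name=${1##*/}
    printf '%s\n' "${name#*_}"
}
```

Inside the loop of Run.sh this would read `NUM=$(job_num "$F")` and `N=$(job_rest "$F")`.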

A second script, RunR.sh (Appendix E), then processes each result file produced by the Fortran programs and stored in the SERVER_2/PBS directory: it adds the gene descriptions, formats the text file and saves the final output file in the NFS shared area SHARED_AREA/OUTPUT. All these steps are performed by an R procedure. The results are stored as a tab-delimited text file that is displayed in a web page. The file begins with an 8-row header summarizing all the analysis parameters, followed by two rows reporting the number of comparisons in which the target gene was found present or modulated. Below the header, a table lists the related genes with their annotation details (gene symbol, gene description, chromosome, cytogenetic band, strand, start position, end position and Ensembl ID) and all the ARM indexes calculated during the analysis.

The last script, Sent_Mail.sh, runs on SERVER_1. It scans the SHARED_AREA/OUTPUT directory and, for each file, composes and sends an e-mail containing the link from which the user can download the results (Figure 8). Finally, it moves the output file to the SHARED_AREA/SENT directory.
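Sent_Mail.sh itself is not listed in this report. The following is a hypothetical POSIX shell sketch of the loop described above, assuming that result files keep the NUM_N name produced by the PBS script (job number, underscore, e-mail address, .txt), that results are published under a base URL passed as an argument, and that a mail(1)-compatible command is available on SERVER_1. The function names and the message text are illustrative; send_mail is isolated in its own function so it can be swapped for another mailer.

```shell
#!/bin/sh
# Hypothetical sketch of the Sent_Mail.sh loop; the real script is not listed here.

# send_mail RECIPIENT SUBJECT: the message body is read from standard input.
send_mail() {
    mail -s "$2" "$1"
}

# deliver_results OUTPUT_DIR SENT_DIR BASE_URL
deliver_results() {
    out_dir=$1 sent_dir=$2 base_url=$3
    for f in "$out_dir"/*_*.txt; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=${f##*/}                    # e.g. 42_jane.doe@example.org.txt
        num=${name%%_*}                  # job number
        email=${name#*_}
        email=${email%.txt}              # e-mail address without extension
        printf 'Your CorrelaGenes job %s has been completed.\nResults: %s/%s\n' \
            "$num" "$base_url" "$name" |
            send_mail "$email" "Correlagenes - Job number $num results"
        mv "$f" "$sent_dir/"             # never send the same result twice
    done
}
```

Moving each file only after its message has been handed to the mailer means that an interrupted run leaves the unsent results in place for the next cron execution.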

Figure 8 Output result display
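Given the result file layout described above (an 8-row parameter header, two rows of comparison counts, then one tab-delimited row per related gene), the gene symbols can be pulled out with standard tools. This hypothetical helper assumes that the gene symbol is the first column and that the table has no extra column-title row; the report does not say, so the number of rows to skip is a parameter with a default of 10.

```shell
#!/bin/sh
# Hypothetical helper: list the gene symbols of a CorrelaGenes result file.

# list_genes FILE [SKIP]: skip SKIP header rows (default 8 + 2 = 10) and
# print the first tab-delimited column of every remaining row.
list_genes() {
    file=$1
    skip=${2:-10}
    tail -n +"$((skip + 1))" "$file" | cut -f1
}
```

If the table turns out to start with a row of column titles, `list_genes FILE 11` skips it as well.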

All the scripts of the batch procedure are synchronized through their cron schedules. A schematic illustration of the complete batch procedure is shown in Figure 9.
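The report does not list the crontab entries. As a hypothetical illustration, the schedule could stagger the stages as in the comments below (times and paths are assumptions), and each stage could be wrapped in a small guard so that a run still in progress is not started again by the next cron tick. The guard, run_once, is an invented helper that stores the PID of the running shell in a file and uses kill -0, which only checks whether that process still exists.

```shell
#!/bin/sh
# Hypothetical crontab entries staggering the stages (paths and times assumed):
#   */5 * * * *     /path/to/Run.sh         # on SERVER_2
#   2-59/5 * * * *  /path/to/RunR.sh        # on SERVER_2
#   4-59/5 * * * *  /path/to/Sent_Mail.sh   # on SERVER_1

# run_once PIDFILE COMMAND [ARGS...]
# Run COMMAND unless the process recorded in PIDFILE is still alive.
# Returns 0 when the run is skipped, otherwise the exit status of COMMAND.
run_once() {
    pidfile=$1
    shift
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0                 # previous run still active: skip this tick
    fi
    echo $$ > "$pidfile"         # record this shell as the running instance
    "$@"
    rc=$?
    rm -f "$pidfile"
    return $rc
}
```

A check-then-write guard like this is not fully atomic, and PID reuse can cause an occasional false positive, but unlike a plain lock directory it tolerates a stale file left behind by a crashed run.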


Figure 9 Graphic diagram of the batch procedure

Legend: 1) Run.sh (Appendix C); 2) pbs.sh (Appendix D); 3) RunR.sh (Appendix E); 4) Sent_Mail.sh


Bibliography

Cremaschi P, Rovida S, Sacchi L, Lisa A, Calvi F, Montecucco A, Biamonti G, Bione S, Sacchi G. CorrelaGenes: a new tool for the interpretation of the human transcriptome. BMC Bioinformatics. 2014;15 Suppl 1:S6.


Appendix A

####################
# html script      #
####################

<br><center><font size="6"><b>CorrelaGenes</b></font><br></center>

<!-- Links to the help files -->
<br>
<center>
<a href="www/guide.html" target="_blank"><font size="4"><b>User Guide</b></font></a>
&nbsp; &nbsp; &nbsp; &nbsp;
<a href="www/sample.html" target="_blank"><font size="4"><b>Sample Reports</b></font></a>
&nbsp; &nbsp; &nbsp; &nbsp;
<a href="www/list.html" target="_blank"><font size="4"><b>Gene List</b></font></a>
&nbsp; &nbsp; &nbsp; &nbsp;
<a href="www/contacts.html" target="_blank"><font size="4"><b>Contacts</b></font></a>
&nbsp; &nbsp; &nbsp; &nbsp;
<a href="www/staff.html" target="_blank"><font size="4"><b>Staff</b></font></a>
<br><br><p>Last update: April 04, 2013</p>
</center>

<br><center><font size="6"><b>CorrelaGenes</b></font><br></center>

<!-- Opens the help page of a parameter in a small pop-up window -->
<script type="text/javascript">
function openHelp(page)
{
    window.open(page, 'popup', 'toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,copyhistory=no,scrollbars=yes,width=480,height=320');
}
</script>

<!-- Instruction for HTTP post transaction to send form-data to the script start_process.php -->

<form method="post" action="www/start_process.php">

<!-- Instructions for data input: each field is followed by a "?" link to its help page -->

<font size="2">
<b>TARGET GENE SYMBOL</b><br>
<input type="text" name="MySymbol">
<a href="javascript:openHelp('SERVER_1/MySymbol.html')" onMouseOver="window.status='Status Bar Message'; return true" onMouseOut="window.status=''; return true"><font size="3">?</font></a>

<br><br>
<b>TARGET SIGN</b><br>
<input type="range" name="mysign" value="0" min="-1" max="1" step="1">
<a href="javascript:openHelp('SERVER_1/mysign.html')" onMouseOver="window.status='Status Bar Message'; return true" onMouseOut="window.status=''; return true"><font size="3">?</font></a>

<br><br>
<b>TARGET GENE LFC</b><br>
<input type="range" name="boundTarget" value="1" min="0" max="10" step="0.00001">
<a href="javascript:openHelp('SERVER_1/boundTarget.html')" onMouseOver="window.status='Status Bar Message'; return true" onMouseOut="window.status=''; return true"><font size="3">?</font></a>

<br><br>
<b>GENE LFC</b><br>
<input type="range" name="boundLFC" value="1" min="0" max="10" step="0.00001">
<a href="javascript:openHelp('SERVER_1/bound_LFC.html')" onMouseOver="window.status='Status Bar Message'; return true" onMouseOut="window.status=''; return true"><font size="3">?</font></a>

<br><br>
<b>LFC P VALUE</b><br>
<input type="range" name="boundpVal" value="0.05" min="0" max="1" step="0.00001">
<a href="javascript:openHelp('SERVER_1/boundpVal.html')" onMouseOver="window.status='Status Bar Message'; return true" onMouseOut="window.status=''; return true"><font size="3">?</font></a>

<br><br>
<b>% CO-PRES</b><br>
<input type="range" name="perComp" value="40.0" min="0" max="100" step="0.00001">
<a href="javascript:openHelp('SERVER_1/perComp.html')" onMouseOver="window.status='Status Bar Message'; return true" onMouseOut="window.status=''; return true"><font size="3">?</font></a>

<br><br>
<b>LIFT</b><br>
<input type="range" name="boundLift" value="2" min="0" max="99999999999" step="0.00001">
<a href="javascript:openHelp('SERVER_1/boundLift.html')" onMouseOver="window.status='Status Bar Message'; return true" onMouseOut="window.status=''; return true"><font size="3">?</font></a>

<br><br>
<b>CHI SQUARE P VALUE</b><br>
<input type="range" name="perCoCo" value="0.05" min="0" max="1" step="0.00001">
<a href="javascript:openHelp('SERVER_1/perCoCo.html')" onMouseOver="window.status='Status Bar Message'; return true" onMouseOut="window.status=''; return true"><font size="3">?</font></a>

<br><br>
<b>EMAIL ADDRESS</b><br>
<input type="email" name="email">
<a href="javascript:openHelp('SERVER_1/mail.html')" onMouseOver="window.status='Status Bar Message'; return true" onMouseOut="window.status=''; return true"><font size="3">?</font></a>

<br><br>

<!-- form-data submission -->

<input type="submit" name="send" value="SEND">
<input type="reset" name="reset" value="RESET"><br>
</font>

</form>


Appendix B

#######################
# start_process.php   #
#######################

<?php

// Print an error message with a link back to the form and stop the script
function fail($message)
{
    echo "$message<br />";
    echo '<a href="javascript:history.go(-1)">Go Back</a>';
    exit;
}

// Retrieving variables from the form

$Gene_Name = trim($_POST['MySymbol']);
$E_Mail = trim($_POST['email']);
$Value1 = trim($_POST['boundLFC']);
$Value2 = trim($_POST['boundpVal']);
$Value3 = trim($_POST['perComp']);
$Value4 = trim($_POST['perCoCo']);
$Value5 = trim($_POST['mysign']);
$Value7 = trim($_POST['boundLift']);
$Value8 = trim($_POST['boundTarget']);

$Gene_Name = strtoupper($Gene_Name);

// Accept the comma as decimal separator
$Value1 = str_replace(",", ".", $Value1);
$Value2 = str_replace(",", ".", $Value2);
$Value3 = str_replace(",", ".", $Value3);
$Value4 = str_replace(",", ".", $Value4);
$Value7 = str_replace(",", ".", $Value7);
$Value8 = str_replace(",", ".", $Value8);

// Control of mail format: exactly one '@', no separators or spaces, plausible domain

if (substr_count($E_Mail, '@') != 1
    || strpos($E_Mail, ';') !== false
    || strpos($E_Mail, ',') !== false
    || strpos($E_Mail, ' ') !== false
    || !preg_match('/^[\w.\-]+@\w+[\w.\-]*?\.\w{1,4}$/', $E_Mail)) {
    fail("Email format unknown");
}

// Empty field control

if (empty($Gene_Name)) fail("Target Gene Symbol - Empty field not accepted");
if ($Value5 === '') fail("Target Sign - Empty field not accepted");
if ($Value8 === '') fail("Target Gene LFC - Empty field not accepted");
if ($Value1 === '') fail("Gene LFC - Empty field not accepted");
if ($Value2 === '') fail("LFC P Value - Empty field not accepted");
if ($Value3 === '') fail("% OF CO-PRES - Empty field not accepted");
if ($Value7 === '') fail("Lift - Empty field not accepted");
if ($Value4 === '') fail("Chi Square P Value - Empty field not accepted");

// Input data integrity control: the first character of the symbol must not be
// an ASCII control or punctuation character (codes <48, 58-64, 91-96, 123-127)

$first = ord($Gene_Name);
if ($first < 48
    || ($first >= 58 && $first <= 64)
    || ($first >= 91 && $first <= 96)
    || ($first >= 123 && $first <= 127)) {
    fail("Characters not allowed in Target Gene Symbol");
}
if (strlen($Gene_Name) > 15) {
    fail("Target Gene Symbol value not accepted, more than 15 characters inserted");
}

// Input data range control

if ($Value5 != (int)$Value5) fail("Target Sign value must be an integer");
if ($Value5 != '-1' && $Value5 != '0' && $Value5 != '1') fail("Target Sign value not accepted");
if ($Value8 > '10' || $Value8 < '0') fail("Target Gene LFC value not accepted");
if ($Value1 > '10' || $Value1 < '0') fail("Gene LFC value not accepted");
if ($Value2 > '1' || $Value2 < '0') fail("LFC P value not accepted");
if ($Value3 > '100' || $Value3 < '0') fail("% of CO-PRES value not accepted");
if ($Value7 < '0') fail("Lift value not accepted");
if ($Value4 > '1' || $Value4 < '0') fail("Chi Square P value not accepted");

// Job number assignment: the counter file holds the last number used

$file = "/www/counter";
$visit = file($file);
$visit[0] = (int) $visit[0] + 1;
$fp = fopen($file, "w");
fputs($fp, "$visit[0]");
fclose($fp);
echo "Job $visit[0] <br />";

// Writing e-mail content

$toclient = $E_Mail;
$sbjclient = "Correlagenes - Job number $visit[0]";
$headersclient = 'Content-type: text/html; charset=iso-8859-1';
$message = "
Your job (number $visit[0]) has been successfully submitted to the server.<br><br>
You selected the following parameters: <br /><br />
Target Gene Symbol: $Gene_Name <br />
Target Sign: $Value5 <br />
Target Gene LFC: $Value8 <br />
Gene LFC: $Value1 <br />
LFC P Value: $Value2 <br />
% of CO-PRES: $Value3 <br />
Lift: $Value7 <br />
Chi Square P Value: $Value4 <br />
<br /> You will receive your results by mail.
<br />
<br />For any question, please contact Correlagenes staff ([email protected])";

// Send mail

mail($toclient, $sbjclient, $message, $headersclient);

// Displaying the input data in the web browser

echo "On: ";
echo date("d-F-Y");
echo " at: ";
echo date("G-i");
echo " Data sent as follows: <br />";
echo "Target Gene Symbol $Gene_Name <br />";
echo "Target Sign $Value5 <br />";
echo "Target Gene LFC $Value8 <br />";
echo "Gene LFC $Value1 <br />";
echo "LFC P Value $Value2 <br />";
echo "% of CO-PRES $Value3 <br />";
echo "Lift $Value7 <br />";
echo "Chi Square P Value $Value4 <br />";

echo "<br><a href=\"http://www.igm.cnr.it/cabgen/web-correlagenes0\">Back to Correlagenes form</a>";

// Batch process input file
// The file is written to the NFS shared area

$punt = fopen("/SHARED_AREA/INPUT/{$visit[0]}_{$E_Mail}.txt", "w");
fwrite($punt, "$Gene_Name\r\n");
fwrite($punt, "$Value5 Target Sign\r\n");
fwrite($punt, "$Value8 Target Gene LFC\r\n");
fwrite($punt, "$Value1 Gene LFC\r\n");
fwrite($punt, "$Value2 LFC P Value\r\n");
fwrite($punt, "$Value3 % of CO-PRES\r\n");
fwrite($punt, "$Value7 Lift\r\n");
fwrite($punt, "$Value4 Chi Square P Value\r\n");
fclose($punt);
exit;
?>


Appendix C

#!/bin/sh
####################
# Script Run.sh #
####################

# STEP_1: paths used by this script and exported to the PBS job

MYHOME=SHARED_AREA
INPUT=${MYHOME}/INPUT
OUTPUT=${MYHOME}/OUTPUT
MYDIR=SHARED_AREA/INPUT
PBS=SERVER_2/PBS
export MYHOME
export INPUT
export OUTPUT
export PBS
FILES=*.txt

# STEP_2: stop if no input file is waiting, otherwise process each one

A=`ls -al ${INPUT}/*.txt 2>/dev/null | wc -l`

if [ $A -lt 1 ]
then
    exit
else
    cd ${INPUT}

    for F in ${FILES}
    do
        NUM=`echo ${F} | awk -F"_" '{print $1}'`
        N=`echo ${F} | awk -F"_" '{print $2}'`
        MYDIR=${INPUT}/job_${NUM}
        export MYDIR
        export N
        export NUM

        mkdir ${INPUT}/job_${NUM}

        # STEP_3: job directory with input file, executables and PBS script

        mv ${INPUT}/${F} ${MYDIR}/CGv2-Input
        cp ${MYHOME}/SR-CGv2-PBS/CGv2Step1 ${MYDIR}/CGv2Step1
        cp ${MYHOME}/SR-CGv2-PBS/CGv2Step2 ${MYDIR}/CGv2Step2
        cp ${MYHOME}/SR-CGv2-PBS/CGv2Step3 ${MYDIR}/CGv2Step3
        cp ${MYHOME}/SR-CGv2-PBS/pbs123 ${MYDIR}/pbs123
        cd ${MYDIR}

        # link to gene database

        ln -s SHARED_AREA/SR-DATA DATA
        ln -s SHARED_AREA/SR-DATA/dataGxC-u dataGxC-u

        # Job submission to PBS queue

        /opt/pbs/default/bin/qsub pbs123
    done
fi
exit
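Run.sh obtains `NUM` and `N` by splitting the file name on `_` with awk, so `N` is only the second field. An address such as `mario_rossi@example.org` would therefore be cut to `mario`, and the name passed on to the PBS job would lose its `.txt` suffix. A minimal alternative, assuming file names of the form `<NUM>_<address>.txt` as written by the PHP form, splits at the first underscore only with POSIX parameter expansion. The helper names `job_num`, `job_rest` and `job_addr` are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers, not part of the original scripts.
# They split a job file name of the form <NUM>_<address>.txt at the
# FIRST underscore only, so underscores inside the address survive.

# job_num FILE: text before the first underscore (Run.sh's NUM)
job_num() {
    f=$(basename "$1")
    printf '%s\n' "${f%%_*}"
}

# job_rest FILE: text after the first underscore, ".txt" kept (Run.sh's N)
job_rest() {
    f=$(basename "$1")
    printf '%s\n' "${f#*_}"
}

# job_addr FILE: as job_rest but without ".txt" (SendResult.sh's M)
job_addr() {
    rest=$(job_rest "$1")
    printf '%s\n' "${rest%.txt}"
}
```

Inside the Run.sh loop the two assignments could then read `NUM=$(job_num "$F")` and `N=$(job_rest "$F")`.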

Appendix D

#!/bin/sh
####################
# PBS script #
####################
#
#PBS -l walltime=00:05:00
#PBS -l nodes=node02:ppn=8
#PBS -V
#PBS -N pbs123
#PBS -r n
#PBS -e ERR123
#PBS -o OUT123
set -x
echo =========================
echo PBS_NODEFILE is:
cat ${PBS_NODEFILE}
echo =========================
cd ${MYDIR}
# run the three CorrelaGenes steps copied into the job directory by Run.sh
# (./ so that they are found even if the current directory is not in PATH)
time ./CGv2Step1
time ./CGv2Step2
time ./CGv2Step3

# pass the result to RunR.sh, which scans ${PBS}/*.txt,
# and move the input file to SENT_MAIL
cp ${MYDIR}/CGv2-Output ${PBS}/${NUM}"_"${N}
mv ${MYDIR}/CGv2-Input SHARED_AREA/SENT_MAIL/${NUM}"_input_"${N}

# clean up the job directory
rm CGv2Step*
rm pbs123
rm DATA
rm dataGxC-u
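The job script relies on `#PBS -V` to receive `MYDIR`, `NUM`, `N` and `PBS` from Run.sh. If `MYDIR` arrived empty, `cd ${MYDIR}` would fall back to the home directory and the final `rm` commands would run there. A defensive check placed before `cd` can be sketched as follows; `require_vars` is a hypothetical helper, not part of the original script.

```sh
#!/bin/sh
# Hypothetical guard, not part of the original PBS script.
# require_vars NAME...: report every listed variable that is unset or
# empty and return non-zero if at least one is missing.
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            echo "required variable $name is not set" >&2
            missing=1
        fi
    done
    return $missing
}

# Intended use near the top of the job script, before "cd ${MYDIR}":
#   require_vars MYDIR NUM N PBS || exit 1
#   cd "${MYDIR}" || exit 1
```

The check reports all missing names at once rather than stopping at the first, which makes a misconfigured submission easier to diagnose from the `ERR123` file.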

Appendix E

#!/bin/sh
####################
# Script RunR.sh #
####################
#
PBS=SERVER_2/PBS
FILES=*.txt

A=`ls -al ${PBS}/*.txt 2>/dev/null | wc -l`
export PBS
if [ $A -lt 1 ]
then
    exit
else
    cd ${PBS}
    for F in ${FILES}
    do
        Z=`echo ${F} | awk -F"_" '{print $1}'`
        Y=`echo ${F} | awk -F"_" '{print $2}'`
        mv ${Z}"_"${Y} CGv2-Output
        date
        /usr/bin/Rscript --vanilla --verbose CreateReport.R
        date
        sleep 2m
        mv ${PBS}/CGv2-Output.txt SHARED_AREA/OUTPUT/${Z}"_"${Y}
        rm ${PBS}/CGv2-Output*
    done
fi
exit
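RunR.sh pauses for a fixed two minutes (`sleep 2m`) after Rscript returns and before moving `CGv2-Output.txt`. The listing does not say why. If the pause is there to let the report become visible on the shared area (an assumption), a bounded poll returns as soon as the file is present and non-empty, and fails cleanly if it never appears. The function name `wait_for_file` and the one-second step are hypothetical choices.

```sh
#!/bin/sh
# Hypothetical helper, not part of the original RunR.sh.
# wait_for_file FILE SECONDS: succeed as soon as FILE exists and is
# non-empty; give up with status 1 after about SECONDS seconds.
wait_for_file() {
    file=$1
    left=$2
    while [ ! -s "$file" ]; do
        if [ "$left" -le 0 ]; then
            return 1
        fi
        sleep 1
        left=$((left - 1))
    done
    return 0
}

# Possible use in place of "sleep 2m":
#   if wait_for_file ${PBS}/CGv2-Output.txt 120; then
#       mv ${PBS}/CGv2-Output.txt SHARED_AREA/OUTPUT/${Z}"_"${Y}
#   fi
```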

Appendix F

#!/bin/sh
###########################
# Script SendResult.sh #
###########################

OU=SHARED_AREA/OUTPUT
BODY_MAIL=SHARED_AREA/OUTPUT/body_mail
SENT=SHARED_AREA/SENT_MAIL
DIR_WEB=SERVER_1/correla_result
DESTINATION="http://www.igm.cnr.it/correla_result/"
cd ${OU}
FILES=*.txt

A=`ls -al ${OU}/*.txt 2>/dev/null | wc -l`
if [ $A -lt 1 ]
then
    exit
else
    for F in ${FILES}
    do
        # file name <N>_<M>.txt: N is the job number, M the e-mail address
        M=`echo ${F} | awk -F"_" '{print $2}' | awk -F".txt" '{print $1}'`
        N=`echo ${F} | awk -F"_" '{print $1}'`

        X=`/usr/bin/uuidgen`

        DESTINATION="http://www.igm.cnr.it/correla_result/"

        # publish the result on the web server under a random (uuidgen) name
        cp ${OU}/${F} ${DIR_WEB}/CG_Output_${N}_${X}.txt

        DESTINATION=${DESTINATION}CG_Output_${N}_${X}.txt

        # mail text: template in body_mail followed by the result URL
        /bin/cat ${BODY_MAIL} > fine

        echo ${DESTINATION} >> fine

        (/bin/cat fine) | /usr/bin/mail -s 'Correlagenes Results job='${N} ${M} -- -f [email protected]

        # archive the result file and the mail text
        mv ${OU}/${F} ${SENT}
        mv fine ${SENT}

    done
fi
exit
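SendResult.sh publishes each result as `CG_Output_<N>_<UUID>.txt` and mails the template in `body_mail` followed by that link. The same two steps can be written as small functions that are easy to test without a mail server. The names `result_url` and `compose_notice` are hypothetical, and the UUID is passed in rather than generated so that the output is predictable.

```sh
#!/bin/sh
# Hypothetical helpers mirroring two steps of SendResult.sh.

# result_url BASE_URL JOB_NUM UUID: public link of a published result
result_url() {
    printf '%sCG_Output_%s_%s.txt\n' "$1" "$2" "$3"
}

# compose_notice BODY_FILE URL: mail text = template followed by the link
compose_notice() {
    cat "$1" || return 1
    printf '%s\n' "$2"
}

# Possible use inside the loop (X still comes from /usr/bin/uuidgen):
#   URL=$(result_url "http://www.igm.cnr.it/correla_result/" "${N}" "${X}")
#   compose_notice "${BODY_MAIL}" "${URL}" > fine
```

Keeping URL construction and message assembly separate from the `mail` call means the only untested part of the loop is the actual delivery.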
