

A New Tool for Mapping Microarray Data onto the Gene Ontology Structure

(http://nova.idi.ntnu.no/)

Abstract

eGOn (explore Gene Ontology) is a bioinformatics tool that facilitates the use of biological background knowledge in the analysis of genes selected by high-throughput analysis. The list of identifiers for the selected genes is submitted to the server, and eGOn searches publicly available gene databases to retrieve the GO terms annotated to these genes. The results are visualized in the hierarchical GO structure and can be exported in various text formats. An essential feature of eGOn is that several gene lists may be analyzed simultaneously to compare the distribution of the annotated genes over the whole GO hierarchy. Statistical tests are computed to assess the level of dissimilarity between different gene lists.

Figure 1. Data Flow in eGOn

Figure 4. Table view with annotated genes and results from McNemar's test

Figure 2. Clone set view

Vidar Beisvåg (1), Lars Jølsum (1), Wacek Kuśnierczyk (4), Mette Langaas (2), Bjørn Alsberg (3), Hallgeir Bergum (1), Jan Komorowski (4,5), Arne K. Sandvik (1) and Astrid Lægreid (1)

Norwegian University of Science and Technology: (1) Department of Clinical and Molecular Medicine, (2) Department of Mathematical Sciences, (3) Department of Chemistry, (4) Department of Computer and Information Science; Uppsala University: (5) The Linnaeus Centre for Bioinformatics.

Future directions

To be useful for a wider research community, the database will support GO information from other publicly available databases that have additional GO information, e.g., the Rat Genome Database.

In the future, it will be possible to query eGOn with gene lists containing UniGene cluster numbers, IMAGE clone IDs and GO numbers.

Another statistical test, called the target-target test, is under development. The aim of this test is to compare two target gene lists at each GO-level, where each list may be based on its own master list. This test is not paired and will be performed using a hypergeometric distribution. It is suitable, for example, for comparing the proportion of genes that are up-regulated with the proportion of genes that are down-regulated at each GO-level.
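One possible reading of the planned target-target test is sketched below in Python, for illustration only. At a single GO node it counts how many genes of each target list are annotated to the node and compares the two proportions with a two-sided test based on the hypergeometric distribution (a Fisher-type exact test). The function names, the 2x2 layout, the made-up counts, and the assumption that the two lists are disjoint (as up- and down-regulated genes would be) all belong to this sketch and are not given in the text. How each list's own master list enters the test is not specified, so it is left out.

```python
from math import comb


def hypergeom_pmf(x, total, marked, draws):
    """P(X = x) when drawing `draws` items from `total`, of which `marked` are marked."""
    return comb(marked, x) * comb(total - marked, draws - x) / comb(total, draws)


def target_target_p(a_node, a_total, b_node, b_total):
    """Two-sided hypergeometric p-value for one GO node (hypothetical helper).

    a_node / a_total: genes of list A at the node / in list A overall.
    b_node / b_total: the same for list B. The lists are assumed disjoint.
    """
    total = a_total + b_total
    marked = a_node + b_node  # genes at the node in either list
    lowest = max(0, marked - b_total)
    highest = min(marked, a_total)
    observed = hypergeom_pmf(a_node, total, marked, a_total)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    p = sum(
        prob
        for x in range(lowest, highest + 1)
        if (prob := hypergeom_pmf(x, total, marked, a_total)) <= observed * (1 + 1e-7)
    )
    return min(1.0, p)


# Made-up counts: 30 of 200 up-regulated vs 10 of 180 down-regulated genes at a node.
print(target_target_p(30, 200, 10, 180))
```

Summing the outcomes that are no more likely than the observed one is one common convention for a two-sided exact test; other conventions give slightly different p-values.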

Contact information:
Vidar Beisvåg
Department of Clinical and Molecular Medicine
Norwegian University of Science and Technology
Centre of Medicine and Technology
NO-7489 Trondheim, NORWAY
Telephone work: +47 73 59 86 15 / 88 88
Telefax work: +47 73 59 86 13
Email: [email protected] or [email protected]

Figure 3. Modified ’biological process’ tree view

Introduction

Tools are needed that enable efficient examination of biological background information, such as Gene Ontology annotations, for large numbers of genes. Direct comparison of annotated gene lists from different experiments is not straightforward. Typical questions, such as “Does treatment A induce more genes related to process P than treatments B and C do?”, require manual inspection of several distinct gene lists.

With eGOn, a web-based gene annotation tool, researchers are given a convenient solution to this problem.

Data Flow

The annotation data is downloaded from publicly available databases, such as LocusLink and UniGene, and stored as a local copy. This enables quick access to the data in response to the user query. The data flow in eGOn, as shown in Fig. 1, is as follows:

o a list of GenBank accession numbers is uploaded to the server;

o a list of corresponding UniGene clone index numbers is extracted;

o this list is used to extract all the respective GO annotations from LocusLink.
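The three steps above can be sketched in Python as two dictionary lookups against a local copy of the mappings. This is only an illustration of the data flow, not eGOn's actual code (which is written in PHP): the dictionary names, the `annotate` function and every identifier in the example are invented.

```python
# Hypothetical local copies of the public mappings; every identifier is invented.
ACCESSION_TO_UNIGENE = {"ACC1": "UG.1", "ACC2": "UG.2", "ACC3": "UG.2"}
# Stands in for the GO annotations that the text says come from LocusLink.
UNIGENE_TO_GO = {"UG.1": {"GO:a"}, "UG.2": {"GO:a", "GO:b"}}


def annotate(accessions):
    """Return (GO term -> set of accessions, list of accessions with no mapping)."""
    go_to_genes = {}
    unmapped = []
    for acc in accessions:
        unigene = ACCESSION_TO_UNIGENE.get(acc)
        if unigene is None:
            # Keep track of identifiers that cannot be resolved.
            unmapped.append(acc)
            continue
        for term in UNIGENE_TO_GO.get(unigene, ()):
            go_to_genes.setdefault(term, set()).add(acc)
    return go_to_genes, unmapped


terms, missing = annotate(["ACC1", "ACC2", "ACC9"])
print(terms)
print(missing)
```

Returning the unmapped identifiers separately is a design choice of this sketch; it makes it visible which submitted genes contribute nothing to the GO view.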

The retrieved GO terms with the associated genes are mapped into a GO tree structure. The information covers all three top-level branches of the Ontology. At any node, the number of genes associated with the node is shown separately for each input list.
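A minimal Python sketch of such per-node counts follows. It assumes that a gene annotated to a node also counts towards every ancestor of that node, which is a common GO convention but is not stated in the text, and it simplifies the hierarchy to a single parent per node. The node names, gene identifiers and function names are hypothetical.

```python
# Hypothetical miniature hierarchy: child -> parent (single parent for simplicity).
PARENT = {"process_x": "biological_process", "process_y": "process_x"}


def with_ancestors(node):
    """Yield the node itself, then each ancestor up to the root."""
    yield node
    while node in PARENT:
        node = PARENT[node]
        yield node


def node_counts(annotations, gene_lists):
    """Return node -> {list name: number of that list's genes at or below the node}."""
    genes_at = {}
    for gene, nodes in annotations.items():
        for node in nodes:
            for n in with_ancestors(node):
                genes_at.setdefault(n, set()).add(gene)
    return {
        n: {name: len(genes & members) for name, members in gene_lists.items()}
        for n, genes in genes_at.items()
    }


annotations = {"g1": {"process_y"}, "g2": {"process_x"}, "g3": {"process_x"}}
gene_lists = {"A": {"g1", "g2"}, "B": {"g2", "g3"}}
counts = node_counts(annotations, gene_lists)
print(counts["process_x"])
```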

The nodes may be collapsed or expanded, and the resulting structure saved as a template for future use. In the table view, it is possible to examine the gene identifiers associated with nodes in the current modification of the GO tree.

Two statistical tests are available to quantify the difference between the “biological profiles” of two gene lists.

Implementation: the eGOn source code is written in PHP, and the underlying database uses MySQL.

Comparison of gene lists in eGOn

Fig. 2: Clone sets. Different gene lists can be imported and stored in eGOn.

Fig. 3: Tree structure. The displayed GO structure may be manually edited, and the current view may be stored as a template, or a predefined template may be used to modify the current view.

Fig. 4: Multiple gene lists may be investigated in the table view.

Two statistical tests aim to identify significant differences within and between gene lists. To use the target-master test, one selects a master gene list; if no master list is selected, the paired-target test is performed. For both tests the p-values are shown, and the user may define a colour-coded cut-off.

Target-master test. The target gene list contains identifiers of the genes of interest (e.g., differentially expressed genes). The master gene list contains identifiers of all the genes printed on the slide (where inference was possible, i.e., the p-values could be calculated). The total number of genes in the target list divided by the total number of genes in the master list is called the overall proportion. This test pinpoints levels in the GO tree where the number of genes in the target list divided by the number of genes in the master list differs from the overall proportion. A two-sided one-sample binomial test is implemented.
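The target-master comparison at a single GO node can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not eGOn's implementation: the function names and the example counts are invented, and the two-sided p-value uses one common convention (summing the probabilities of all outcomes no more likely than the observed one), which the text does not specify.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def target_master_p(target_at_node, master_at_node, target_total, master_total):
    """Two-sided one-sample binomial p-value for one GO node (hypothetical helper)."""
    p0 = target_total / master_total  # the overall proportion
    observed = binom_pmf(target_at_node, master_at_node, p0)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    p = sum(
        prob
        for k in range(master_at_node + 1)
        if (prob := binom_pmf(k, master_at_node, p0)) <= observed * (1 + 1e-7)
    )
    return min(1.0, p)


# Made-up counts: 100 target genes out of 1000 master genes overall;
# at this node, 12 of the 40 master genes are in the target list.
print(target_master_p(12, 40, 100, 1000))
```

Here the number of master genes at the node plays the role of the number of trials, and the overall proportion is the success probability under the null hypothesis.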

Paired-target test (Fig. 4). Two gene lists, A and B (based on the same master list), are chosen and compared at each GO-level. To test whether the proportion of genes present in gene list A differs from the proportion of genes present in gene list B, eGOn compares the number of genes present in gene list A but not in gene list B with the number of genes present in gene list B but not in gene list A. This is done using McNemar's test.
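A minimal Python sketch of this comparison at one GO node is given below. The text does not say whether eGOn uses the exact (binomial) or the chi-square form of McNemar's test; the exact form is assumed here. The function names and gene identifiers are hypothetical.

```python
from math import comb


def mcnemar_exact_p(only_a, only_b):
    """Exact two-sided McNemar p-value from the two discordant counts."""
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant genes, so no evidence of a difference
    # Under the null hypothesis, each discordant gene falls on either side with p = 0.5.
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2**n
    return min(1.0, 2 * tail)


def paired_target_p(node_genes, list_a, list_b):
    """Compare lists A and B at one GO node, given the genes annotated to that node."""
    only_a = len((list_a - list_b) & node_genes)  # in A but not in B
    only_b = len((list_b - list_a) & node_genes)  # in B but not in A
    return mcnemar_exact_p(only_a, only_b)


node_genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
list_a = {"g1", "g2", "g3", "g4"}
list_b = {"g4", "g5"}
print(paired_target_p(node_genes, list_a, list_b))
```

Genes present in both lists, or in neither, do not affect the result; only the discordant genes at the node enter the test.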

[Figure 1 diagram. Input: GenBank accession. Database: eGOn, with GO, LocusLink and UniGene. Output: table view, editable GO tree, file export.]