
A New Tool for Mapping Microarray Data onto the Gene Ontology Structure

(http://nova.idi.ntnu.no/)

Abstract

eGOn (explore Gene Ontology) is a bioinformatics tool that facilitates the use of biological background knowledge in the analysis of genes selected by high-throughput experiments. The list of identifiers for the selected genes is submitted to the server, and eGOn searches publicly available gene databases to retrieve the GO terms annotated to these genes. The results are visualized in the GO hierarchical structure and can be exported in various text formats. An essential feature of eGOn is that several gene lists can be analyzed simultaneously to compare the distribution of the annotated genes over the whole GO hierarchy. Statistical tests are computed to assess the level of dissimilarity between different gene lists.

Figure 1. Data Flow in eGOn

Figure 4. Table view with annotated genes and results from McNemar's test

Figure 2. Clone set view

Vidar Beisvåg1, Lars Jølsum1, Wacek Kuśnierczyk4, Mette Langaas2, Bjørn Alsberg3, Hallgeir Bergum1, Jan Komorowski4,5, Arne K. Sandvik1 and Astrid Lægreid1

Norwegian University of Science and Technology: 1 Department of Clinical and Molecular Medicine, 2 Department of Mathematical Sciences, 3 Department of Chemistry, 4 Department of Computer and Information Science; Uppsala University: 5 The Linnaeus Centre for Bioinformatics.

Future directions

To be useful to a wider research community, the database will support GO information from other publicly available databases that provide additional GO annotations, e.g., the Rat Genome Database.

In the future, it will be possible to query eGOn with gene lists containing UniGene cluster numbers, IMAGE clone IDs and GO numbers.

Another statistical test, called the target-target test, is under development. Its aim is to compare two target gene lists at each GO-level, where each list may be based on its own master list. This test is not paired and will be performed using a hypergeometric distribution. It is suitable, e.g., for comparing the proportion of up-regulated genes with the proportion of down-regulated genes at each GO-level.
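The source only states that the planned target-target test will use a hypergeometric distribution. One standard unpaired test of that kind is Fisher's exact test on a 2×2 table (list A or list B, annotated at the node or not); the Python sketch below assumes that reading. The function names `hypergeom_pmf` and `target_target_pvalue` are hypothetical and not part of eGOn.

```python
from math import comb


def hypergeom_pmf(x: int, total: int, successes: int, draws: int) -> float:
    """P(X = x) for a hypergeometric X: `draws` items taken without replacement
    from `total` items, of which `successes` are marked."""
    return comb(successes, x) * comb(total - successes, draws - x) / comb(total, draws)


def target_target_pvalue(a_in: int, a_size: int, b_in: int, b_size: int) -> float:
    """Two-sided Fisher exact p-value comparing a_in / a_size (list A genes
    annotated at a GO node) with b_in / b_size (list B genes at the same node).
    Assumption: this is how the planned hypergeometric test would be set up."""
    total = a_size + b_size      # all genes in both lists
    annotated = a_in + b_in      # genes annotated at the node, from either list
    lo, hi = max(0, annotated - b_size), min(annotated, a_size)
    # Probability of every table with the same margins, indexed by list A's count.
    probs = {x: hypergeom_pmf(x, total, annotated, a_size) for x in range(lo, hi + 1)}
    p_obs = probs[a_in]
    # Sum over tables no more likely than the observed one; the small relative
    # tolerance keeps floating-point ties from being dropped.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))
```

For example, `target_target_pvalue(3, 4, 1, 4)` compares 3 of 4 up-regulated genes with 1 of 4 down-regulated genes at a node and returns 34/70 ≈ 0.486. Conditioning on the table margins is what makes the hypergeometric distribution appear, and it does not require the two lists to share a master list.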

Contact information:
Vidar Beisvåg
Department of Clinical and Molecular Medicine
Norwegian University of Science and Technology
Centre of Medicine and Technology
NO-7489 Trondheim, NORWAY
Telephone work: +47 73 59 86 15 / 88 88
Telefax work: +47 73 59 86 13
Email: [email protected] or [email protected]

Figure 3. Modified ‘biological process’ tree view

Introduction

Tools are needed that enable efficient examination of biological background information, such as Gene Ontology annotations, for large numbers of genes. Direct comparison of annotated gene lists from different experiments is not straightforward. Typical questions, such as “Does treatment A induce more genes related to process P than treatments B and C do?”, require manual inspection of several distinct gene lists.

eGOn, a web-based gene annotation tool, gives researchers a convenient solution to this problem.

Data Flow

The annotation data is downloaded from publicly available databases, such as LocusLink and UniGene, and stored as a local copy, which allows quick responses to user queries. The data flow in eGOn, shown in Fig. 1, is as follows:

o a list of GenBank accession numbers is uploaded to the server;
o the corresponding list of UniGene clone index numbers is extracted;
o this list is used to retrieve all the respective GO annotations from LocusLink.
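The three lookup steps above can be sketched as chained dictionary lookups over the local database copy. The tables `ACCESSION_TO_UNIGENE`, `UNIGENE_TO_LOCUS` and `LOCUS_TO_GO`, the function `annotate`, and every identifier in them are made-up placeholders; eGOn itself is written in PHP on MySQL, so this Python version only illustrates the logic.

```python
from typing import Dict, List, Set

# Hypothetical, tiny stand-ins for the local copies of UniGene and LocusLink.
ACCESSION_TO_UNIGENE: Dict[str, str] = {"ACC_1": "UG_1", "ACC_2": "UG_2", "ACC_3": "UG_1"}
UNIGENE_TO_LOCUS: Dict[str, str] = {"UG_1": "LOCUS_10", "UG_2": "LOCUS_20"}
LOCUS_TO_GO: Dict[str, Set[str]] = {"LOCUS_10": {"GO:T1", "GO:T2"}, "LOCUS_20": {"GO:T2"}}


def annotate(accessions: List[str]) -> Dict[str, Set[str]]:
    """Map each uploaded GenBank accession to its GO terms via UniGene and LocusLink.
    Accessions that cannot be resolved at any step map to an empty set."""
    result: Dict[str, Set[str]] = {}
    for acc in accessions:
        cluster = ACCESSION_TO_UNIGENE.get(acc)                      # step 2: accession -> UniGene
        locus = UNIGENE_TO_LOCUS.get(cluster) if cluster else None   # UniGene -> LocusLink entry
        result[acc] = set(LOCUS_TO_GO.get(locus, set())) if locus else set()  # step 3: GO terms
    return result
```

Unresolved accessions map to an empty set rather than raising an error, so they can still be reported back to the user; `ACC_1` and `ACC_3` share a UniGene cluster and therefore receive the same GO terms.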

The retrieved GO terms, together with the associated genes, are mapped onto the GO tree structure. The information covers all three top-level branches of the ontology (biological process, molecular function and cellular component). At each node, the number of genes from each input list associated with that node is shown as a separate count.
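A minimal sketch of the per-node counting follows. It assumes, which the source does not state, that a gene annotated to a term is also counted at every ancestor of that term, as in the GO true-path rule. `PARENTS` is a hypothetical graph fragment, and `ancestors` and `node_counts` are illustrative names.

```python
from typing import Dict, Set

# Hypothetical GO fragment: term -> parent terms (GO is a DAG, so "C" has two parents).
PARENTS: Dict[str, Set[str]] = {"root": set(), "A": {"root"}, "B": {"root"}, "C": {"A", "B"}}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, ()))
    return seen


def node_counts(gene_lists: Dict[str, Dict[str, Set[str]]]) -> Dict[str, Dict[str, int]]:
    """gene_lists: list name -> {gene -> directly annotated GO terms}.
    Returns GO node -> {list name -> number of distinct genes at or below the node}."""
    members: Dict[str, Dict[str, Set[str]]] = {}
    for list_name, annotations in gene_lists.items():
        for gene, terms in annotations.items():
            closure: Set[str] = set().union(*(ancestors(t) for t in terms))
            for node in closure:
                members.setdefault(node, {}).setdefault(list_name, set()).add(gene)
    return {node: {name: len(genes) for name, genes in per_list.items()}
            for node, per_list in members.items()}
```

Collecting gene identifiers in sets, rather than incrementing counters, ensures that a gene reachable through two paths of the DAG (here a gene annotated to `C`, which lies below both `A` and `B`) is counted only once at `root`.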

Nodes may be collapsed or expanded, and the resulting structure can be saved as a template for future use. The table view shows the gene identifiers associated with the nodes of the GO tree as currently modified.
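One way to model a saved template, assumed here purely for illustration, is the set of collapsed node IDs serialized as JSON. `CHILDREN`, `visible_nodes`, `save_template` and `load_template` are hypothetical names and do not describe eGOn's actual storage format.

```python
import json
from typing import Dict, List, Set

# Hypothetical GO fragment: term -> child terms.
CHILDREN: Dict[str, List[str]] = {"root": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}


def visible_nodes(root: str, collapsed: Set[str]) -> Set[str]:
    """Nodes shown when each node in `collapsed` hides its subtree. A node stays
    visible if some path from the root reaches it without passing through a
    collapsed node (the collapsed node itself is still shown)."""
    shown: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in shown:
            continue
        shown.add(node)
        if node not in collapsed:
            stack.extend(CHILDREN.get(node, []))
    return shown


def save_template(collapsed: Set[str]) -> str:
    """Serialize the collapsed-node set; sorting makes the output deterministic."""
    return json.dumps(sorted(collapsed))


def load_template(text: str) -> Set[str]:
    """Restore a collapsed-node set saved by save_template."""
    return set(json.loads(text))
```

Because GO is a DAG, collapsing `A` alone still leaves `C` visible through `B`; whether eGOn handles shared children this way is not stated in the source.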

Two statistical tests are available to quantify the difference between the “biological profiles” of two gene lists.

Implementation: the eGOn source code is written in PHP, and the underlying database uses MySQL.

Comparison of gene lists in eGOn

Fig. 2: Clone sets. Different gene lists can be imported and stored in eGOn.

Fig. 3: Tree structure. The displayed GO structure may be edited manually, and the current view may be stored as a template, or a predefined template may be applied to modify the current view.

Fig. 4: Multiple gene lists may be investigated in the table view.

Two statistical tests are provided to identify significant differences within and between gene lists. The target-master test is performed when a master gene list is selected; if no master list is selected, the paired-target test is performed instead. For both tests the p-values are shown, and the user may define a colour-coded cut-off.
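A minimal sketch of a colour-coded cut-off, assuming the user supplies one or more p-value thresholds, each paired with a colour. The thresholds and colour names in `DEFAULT_CUTOFFS`, and the function `colour_for`, are illustrative only and are not eGOn's defaults.

```python
from typing import List, Tuple

# Hypothetical user-defined cut-offs: (p-value threshold, colour).
DEFAULT_CUTOFFS: List[Tuple[float, str]] = [(0.001, "red"), (0.01, "orange"), (0.05, "yellow")]


def colour_for(p_value: float, cutoffs: List[Tuple[float, str]] = DEFAULT_CUTOFFS) -> str:
    """Return the colour of the most stringent threshold the p-value falls strictly
    below, or "none" if it exceeds every threshold."""
    for threshold, colour in sorted(cutoffs):  # smallest threshold first
        if p_value < threshold:
            return colour
    return "none"
```

With a single user-defined cut-off, the list simply holds one pair, e.g. `[(0.05, "red")]`.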

Target-master test. The target gene list contains identifiers of the genes of interest (e.g., differentially expressed genes). The master gene list contains identifiers of all the genes printed on the slide for which inference was possible, i.e., for which p-values could be calculated. The total number of genes in the target list divided by the total number of genes in the master list is called the overall proportion. The test pinpoints levels in the GO tree where the number of target-list genes divided by the number of master-list genes differs from the overall proportion. A two-sided one-sample binomial test is used.
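The target-master test can be sketched as follows. Under the null hypothesis, the number of target genes among the m master genes at a node is binomial with success probability equal to the overall proportion. The source does not say which two-sided p-value definition eGOn uses; this sketch assumes the common one that sums the probabilities of all outcomes no more likely than the observed one (as R's `binom.test` does). The names `binom_pmf` and `target_master_pvalue` are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def target_master_pvalue(target_at_node: int, master_at_node: int,
                         target_total: int, master_total: int) -> float:
    """Two-sided one-sample binomial test of whether target_at_node / master_at_node
    differs from the overall proportion target_total / master_total."""
    p0 = target_total / master_total  # overall proportion
    probs = [binom_pmf(k, master_at_node, p0) for k in range(master_at_node + 1)]
    p_obs = probs[target_at_node]
    # Sum over outcomes no more likely than the observed count; the small relative
    # tolerance keeps floating-point ties from being dropped.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))
```

For example, with an overall proportion of 0.5 (500 target genes out of 1000 master genes), observing 8 target genes among 10 master genes at a node gives p = 112/1024 ≈ 0.109.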

Paired-target test (Fig. 4). Two gene lists, A and B, based on the same master list, are chosen and compared at each GO-level. To test whether the proportion of genes present in list A differs from the proportion present in list B, eGOn compares the number of genes present in list A but not in list B with the number of genes present in list B but not in list A. This is done using McNemar's test.
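A sketch of McNemar's test at one GO node, with gene lists represented as Python sets. The source does not specify the variant; this sketch assumes the chi-square approximation with one degree of freedom and an optional continuity correction (an exact binomial test on the discordant counts is also common). The function name `mcnemar_pvalue` and its arguments are hypothetical.

```python
from math import erfc, sqrt
from typing import Set


def mcnemar_pvalue(node_genes: Set[str], list_a: Set[str], list_b: Set[str],
                   continuity: bool = True) -> float:
    """McNemar's test at one GO node.
    node_genes: master-list genes associated with the node;
    list_a, list_b: two target lists drawn from the same master list."""
    b = len((node_genes & list_a) - list_b)  # at the node, in A but not in B
    c = len((node_genes & list_b) - list_a)  # at the node, in B but not in A
    if b + c == 0:
        return 1.0  # no discordant genes: no evidence of a difference
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Upper tail of the chi-square distribution with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2))
```

Only the discordant genes (present in one list but not the other) enter the statistic; genes present in both lists, or in neither, carry no information about a difference between A and B.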

Figure 1 (diagram): Input: GenBank accession lists. Database: eGOn, built from GO, LocusLink and UniGene. Output: table view, editable GO tree, file export.