A New Tool for Mapping Microarray Data onto the Gene Ontology Structure
(http://nova.idi.ntnu.no/)
Abstract
eGOn (explore Gene Ontology) is a bioinformatics tool that facilitates
the use of biological background knowledge in analysis of genes
selected by high throughput analysis. The list of identifiers for the
selected genes is submitted to the server, and eGOn performs a search
over the publicly available gene databases to retrieve GO terms
annotated to these genes. The results are visualized in the GO
hierarchical structure, and can be exported in various text formats. An
essential feature of eGOn is that several gene lists may be analyzed
simultaneously to compare the distribution of the annotated genes over
the whole GO hierarchy. Statistical tests are computed to assess the
level of dissimilarity between different gene lists.
Figure 1. Data Flow in eGOn
Figure 4. Table view with annotated genes and results from McNemar's test
Figure 2. Clone set view
Vidar Beisvåg (1), Lars Jølsum (1), Wacek Kuśnierczyk (4), Mette Langaas (2), Bjørn Alsberg (3), Hallgeir Bergum (1), Jan Komorowski (4,5), Arne K. Sandvik (1) and Astrid Lægreid (1)
Norwegian University of Science and Technology: (1) Department of Clinical and Molecular Medicine, (2) Department of Mathematical Sciences, (3) Department of Chemistry, (4) Department of Computer and Information Science; Uppsala University: (5) The Linnaeus Centre for Bioinformatics.
Future directions
To be useful for a wider research community, the database will incorporate GO
annotations from other publicly available databases that provide additional GO
information, e.g., the Rat Genome Database.
In the future, it will be possible to query eGOn with gene lists containing UniGene
cluster numbers, IMAGE clone ids and GO numbers.
Another statistical test, called target-target test, is under development. The aim of this
test is to compare two target gene lists at each GO-level, where each list may be based on
its own master list. This test is not paired, and will be performed using a hypergeometric
distribution. It is suitable, for example, for comparing the proportion of genes that are
up-regulated with the proportion of genes that are down-regulated at each GO-level.
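Since the target-target test is still under development, the following is only a rough sketch of how an unpaired comparison based on the hypergeometric distribution could look at a single GO-level. It assumes the two lists are disjoint (e.g. up-regulated versus down-regulated genes) and, as a simplification, ignores the separate master lists mentioned above. The function names and all counts are invented for illustration and are not taken from eGOn.

```python
from math import comb

def hypergeom_pmf(x, total, marked, draws):
    """P(X = x) when `draws` items are drawn without replacement from
    `total` items, of which `marked` are of interest."""
    return comb(marked, x) * comb(total - marked, draws - x) / comb(total, draws)

def two_list_p(a_node, a_total, b_node, b_total):
    """Two-sided p-value for 'list A and list B have the same proportion of
    genes at this GO-level' (assumed setup, equivalent to Fisher's exact test)."""
    total = a_total + b_total
    marked = a_node + b_node          # genes at the node across both lists
    lo = max(0, marked - b_total)     # smallest possible count for list A
    hi = min(marked, a_total)         # largest possible count for list A
    pmf = {x: hypergeom_pmf(x, total, marked, a_total) for x in range(lo, hi + 1)}
    threshold = pmf[a_node] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf.values() if p <= threshold))

# Invented example: 9 of 120 up-regulated genes and 2 of 150 down-regulated
# genes are annotated at the GO-level under consideration.
p_value = two_list_p(a_node=9, a_total=120, b_node=2, b_total=150)
```

The p-value sums the probabilities of all tables that are no more likely than the observed one, which is one common convention for a two-sided exact test.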
Contact information:
Vidar Beisvåg
Department of Clinical and Molecular Medicine
Norwegian University of Science and Technology
Centre of Medicine and Technology
NO-7489 Trondheim, NORWAY
Telephone work: +47 73 59 86 15 / 88 88
Telefax work: +47 73 59 86 13
Email: [email protected] or [email protected]
Figure 3. Modified ’biological process’ tree view
Introduction
Tools are needed that enable efficient examination of biological background
information, such as Gene Ontology annotations, for large numbers of genes.
Direct comparison of annotated gene lists from different experiments is not
straightforward. Typical questions, such as “Does treatment A induce more genes
related to process P than treatments B and C do?”, require manual inspection of
several distinct gene lists.
eGOn, a web-based gene annotation tool, offers researchers a convenient
solution to this problem.
Data Flow
The annotation data is downloaded from publicly available databases, such as
LocusLink and UniGene, and stored as a local copy. This enables quick access to the
data in response to the user query. The data flow in eGOn, as shown in Fig. 1, is as
follows:
o a list of GenBank accession numbers is uploaded to the server;
o a list of corresponding UniGene clone index numbers is extracted;
o this UniGene list is used to extract all the respective GO annotations from LocusLink.
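The three steps above can be sketched as two chained lookups. This is a minimal illustration only: eGOn itself is written in PHP on top of a database, and the dictionaries, identifiers and the `annotate` function below are invented placeholders standing in for the local copies of UniGene and LocusLink.

```python
# Hypothetical in-memory stand-ins for the locally stored annotation data.
# All identifiers are made up for illustration.
ACCESSION_TO_UNIGENE = {
    "AA000001": "Hs.1001",
    "AA000002": "Hs.1002",
    "AA000003": "Hs.1001",   # two accessions may map to the same cluster
}
UNIGENE_TO_GO = {
    "Hs.1001": {"GO:0006915", "GO:0008283"},
    "Hs.1002": {"GO:0006915"},
}

def annotate(accessions):
    """Return {GO term: set of UniGene clusters} for a list of accessions."""
    go_to_genes = {}
    for acc in accessions:
        cluster = ACCESSION_TO_UNIGENE.get(acc)
        if cluster is None:
            continue  # accession without a known cluster is skipped
        for term in UNIGENE_TO_GO.get(cluster, ()):
            go_to_genes.setdefault(term, set()).add(cluster)
    return go_to_genes

result = annotate(["AA000001", "AA000002", "AA000003", "ZZ999999"])
```

Collecting clusters in sets means that several accession numbers pointing to the same cluster are counted once per GO term in this sketch.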
The retrieved GO terms with the associated genes are mapped onto a GO tree structure.
The information covers all three top-level branches of the ontology (biological
process, molecular function and cellular component).
At any node, the number of genes from each input list associated with the node is
shown as a separate number.
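As a small illustration of such per-node counts, the sketch below counts the genes of two lists at every node of a toy hierarchy. It assumes that a gene annotated to a term also counts towards all ancestors of that term; whether eGOn counts this way is not stated here. The hierarchy, gene names and function names are hypothetical.

```python
# Hypothetical toy hierarchy: term -> list of parent terms. GO is a directed
# acyclic graph, so a term may have more than one parent.
PARENTS = {
    "process": [],
    "cell death": ["process"],
    "apoptosis": ["cell death"],
    "cell growth": ["process"],
}

def ancestors_and_self(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen

def node_counts(annotations):
    """annotations: {gene: set of directly annotated terms} -> {term: gene count}."""
    genes_at = {term: set() for term in PARENTS}
    for gene, terms in annotations.items():
        for term in terms:
            for node in ancestors_and_self(term):
                genes_at[node].add(gene)
    return {term: len(genes) for term, genes in genes_at.items()}

list_a = {"g1": {"apoptosis"}, "g2": {"cell growth"}, "g3": {"apoptosis", "cell growth"}}
list_b = {"g1": {"apoptosis"}}
counts = {name: node_counts(genes) for name, genes in (("A", list_a), ("B", list_b))}
```

Each node then carries one number per input list, e.g. `counts["A"]["cell death"]` and `counts["B"]["cell death"]`.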
The nodes may be collapsed or expanded, and the resulting structure saved as a
template for future use. In the table view, it is possible to examine the gene identifiers
associated with nodes in the current modification of the GO tree.
Two statistical tests are available to quantify the difference between “biological
profiles” of two gene lists.
Implementation: the eGOn source code is written in PHP, and the underlying database
uses MySQL.
Comparison of gene lists in eGOn
Fig. 2: Clone sets. Different gene lists can be imported and stored in eGOn.
Fig. 3: Tree structure: the displayed GO structure may be manually edited, and the
current view may be stored as a template, or a predefined template may be used to
modify the current view.
Fig. 4: Multiple gene lists may be investigated in the table view.
Two statistical tests aim at identification of significant differences within and between
gene lists. To use the target-master test, one selects a master gene list; if no master list
is selected, the paired-target test is performed. For both tests the p-values are shown,
and the user may define a colour-coded cut-off.
Target-master test. The target gene list contains identifiers of the genes of interest
(e.g., genes differentially expressed). The master gene list contains identifiers of all
the genes printed on the slide (where inference was possible, i.e., the p-values
could be calculated). The total number of genes in the target list divided by the
total number of genes in the master list is called the overall proportion. This test
pinpoints levels in the GO tree where the number of genes in the target list divided
by the number of genes in the master list differs from the overall proportion. A
two-sided one-sample binomial test is implemented.
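A two-sided one-sample binomial test of this kind can be sketched as follows for a single GO-level. The sketch assumes that the number of target genes at the level is compared against a binomial distribution whose size is the number of master genes at that level and whose success probability is the overall proportion. The two-sided convention used (summing the probabilities of all outcomes no more likely than the observed one) is one common choice and may differ from the one eGOn uses. All counts are invented.

```python
from math import comb

def binom_two_sided_p(k, n, p0):
    """Two-sided exact binomial p-value: sum of P(X = i) over all outcomes i
    that are no more probable than the observed k, with X ~ Binomial(n, p0)."""
    pmf = [comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= threshold))

# Invented counts: 200 of 4000 master genes are in the target list overall;
# at one GO-level, 12 of the 60 master genes are in the target list.
overall_proportion = 200 / 4000
p_value = binom_two_sided_p(k=12, n=60, p0=overall_proportion)
```

In this invented example about 3 target genes would be expected at the level, so observing 12 yields a small p-value.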
Paired-target test (Fig. 4). Two gene lists, A and B (based on the same master
list), are chosen and compared at each GO-level. To test if the proportion of genes
present in gene list A is different from the proportion of genes present in gene list
B, eGOn compares the number of genes present in gene list A but not in gene list
B with the number of genes present in gene list B but not in gene list A. This is
done using McNemar's test.
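The comparison of the two discordant counts can be illustrated with the exact (binomial) form of McNemar's test. Which variant eGOn computes (exact or chi-square approximation) is not stated here, so the choice below is an assumption, as are the gene names and the function name.

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Exact two-sided McNemar p-value from the two discordant counts:
    b = genes in list A only, c = genes in list B only."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant genes, no evidence of a difference
    # Under the null hypothesis each discordant gene falls in A-only or
    # B-only with probability 1/2.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)

# Invented gene sets at one GO-level.
list_a = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
list_b = {"g1", "g9"}
only_a = len(list_a - list_b)   # genes in A but not in B
only_b = len(list_b - list_a)   # genes in B but not in A
p_value = mcnemar_exact_p(only_a, only_b)
```

Genes present in both lists (here `g1`) do not enter the test; only the discordant genes carry information about a difference between A and B.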
[Figure 1 diagram: Input (GenBank accession numbers) -> Database (eGOn: GO, LocusLink, UniGene) -> Output (table view, editable GO tree, file export)]