    DOI: 10.1002/ijch.201200094

    An Overview of Synergistic Data Tools for Biological Scrutiny

    Tsviya Olender,*[a] Marilyn Safran,[a] Ron Edgar,[b] Gil Stelzer,[a] Noam Nativ,[a] Naomi Rosen,[a] Ronit Shtrichman,[b] Yaron Mazor,[b] Michael D. West,[b] Ifat Keydar,[a] Noa Rappaport,[a] Frida Belinky,[a] David Warshawsky,[b] and Doron Lancet[a]

    Abstract: A network of biological databases is reviewed, supplying a framework for studies of human genes and the association of their genomic variations with human phenotypes. The network is composed of GeneCards, the human gene compendium, which provides comprehensive information on all known and predicted human genes, along with its suite members GeneDecks and GeneLoc. Two databases are shown that address genes and variations focusing on olfactory reception (HORDE) and transduction (GOSdb). In the realm of disease scrutiny, we portray MalaCards, a novel comprehensive database of human diseases and their annotations. Also shown is GeneKid, a tool aimed at generating novel kidney disease biomarkers using systems biology, as well as Xome, a database of whole-exome next-generation DNA sequences for human diseases in the Israeli population. Finally, we show LifeMap Discovery, a database of embryonic development, stem cell research and regenerative medicine, which links to both GeneCards and MalaCards.

    Keywords: bioinformatics · databases · gene sequencing · genetic disease · olfaction

    [a] T. Olender, M. Safran, G. Stelzer, N. Nativ, N. Rosen, I. Keydar, N. Rappaport, F. Belinky, D. Lancet
    Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100 (Israel)
    e-mail: [email protected]

    [b] R. Edgar, R. Shtrichman, Y. Mazor, M. D. West, D. Warshawsky
    LifeMap Sciences, HaNehoshet 10, Tel Aviv 69710 (Israel)

    Isr. J. Chem. 2013, 53, 185 – 198 © 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim 185

    Review

    1 Introduction

    A main focus of our research is the use of genomics and bioinformatics tools to obtain novel biological insights. These approaches apply computational strategies to the management and analysis of biological information. We develop data compendia that help clarify how genes and their variations underlie different human phenotypes. We pursue several parallel threads, including human olfaction, Mendelian and complex diseases, embryonic development and stem cell research. In the spirit of systems biology, we help generate genome-wide views that facilitate gene-based inquiry. One of our long-standing strengths is a gene-centric view of the human genome universe, as embodied in GeneCards. More recently, we developed in parallel the human disease compendium MalaCards. Both are richly annotated from numerous web resources and supply focused information to external data structures (Figure 1). A special liaison links the Weizmann data structures with the LifeMap Discovery database, which provides a unique angle on embryonic development and stem cell research. Such a network of databases helps our own teams, as well as numerous external users worldwide, carry out genome and bioinformatics research, creating a live digital "ecosystem" that stores and expands knowledge. This paper presents an overview of each of these databases, with examples of their interconnections and synergistic benefits.

    2 GeneCards

    The individual scientist seeking research knowledge about a gene of interest can be overwhelmed by the deluge of data from worldwide genome projects. The laborious task of sifting through hundreds of thousands of records can be eased by integrated, flexible and user-friendly databases. For over 15 years, we have developed and expanded GeneCards (www.genecards.org), a comprehensive, authoritative compendium of annotative information about human genes, accessed yearly by more than two million unique users. Its gene-centric content is automatically mined and integrated from over 100 electronic sources, including HGNC, NCBI, Ensembl, UniProt, UCSC, REACTOME, and BioGPS, resulting in web-based cards with sections covering a variety of topics (e.g., genomic location, gene function, transcription, disorders, literature, and pathways) for each of more than 122,000 human gene entries.[1] GeneCards also features comprehensive searches of all mined data, allowing users to uncover data relations that would otherwise remain concealed. Also shown are links to gene-related research reagents such as antibodies, recombinant proteins, DNA clones and inhibitory RNAs, as well as human embryonic stem and progenitor cell lines. The GeneCards suite member GeneALaCart provides batch query support, whereby users submit a gene list (e.g., from a microarray experiment) along with the desired GeneCards annotation fields and receive tabulated output. Recent development foci include revamped gene expression visualization, incorporating RNA-seq data from the Illumina BodyMap project and an expanded 76-tissue repertoire from BioGPS[2] and CGAP Serial Analysis of Gene Expression (SAGE)[3] data.
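    To illustrate the kind of batch tabulation GeneALaCart performs, the following Python sketch takes a gene list and a list of annotation fields and returns a tab-separated table. It is not GeneALaCart's implementation or API: the in-memory `ANNOTATIONS` dictionary, its field names, and the `batch_query` function are hypothetical stand-ins for the GeneCards database.

```python
import csv
import io

# Hypothetical stand-in for GeneCards annotations; GeneALaCart itself
# queries the GeneCards database, which is not modeled here.
ANNOTATIONS = {
    "TP53": {"location": "17p13.1", "category": "protein-coding"},
    "MALAT1": {"location": "11q13.1", "category": "ncRNA"},
}


def batch_query(genes, fields):
    """Return a tab-separated table with one row per query gene.

    Unknown genes, or fields missing for a gene, yield empty cells
    instead of errors.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["symbol", *fields])
    for gene in genes:
        symbol = gene.strip().upper()  # tolerate case/whitespace variants
        record = ANNOTATIONS.get(symbol, {})
        writer.writerow([symbol, *(record.get(f, "") for f in fields)])
    return out.getvalue()


if __name__ == "__main__":
    print(batch_query(["TP53", "MALAT1", "NOT_A_GENE"], ["location", "category"]), end="")
```

    Unknown genes produce empty cells rather than errors, so a long list from a microarray experiment is not aborted by a single unresolved symbol.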

    2.1 ncRNA Genes

    Non-coding RNA (ncRNA) genes in the human genome have been increasingly studied in the past few years, and new ncRNA classes have been discovered.[4] However, until recently, no comprehensive non-redundant database of all human ncRNA genes was available. We have recently completed a major unification effort for ncRNA gene entries within GeneCards, integrating 15 different data sources by mining four secondary sources[5] and several additional primary sources. An integration algorithm based on mapping to genomic coordinates employed, among others, another GeneCards suite member, GeneLoc[6] (Section 2.3). It was used to cluster overlapping entries at unified locations, thereby merging parallel versions of the same gene, as well as precursors with their mature transcripts.[1a] A total of ~64,000 new human ncRNA genes were added to GeneCards, resulting in almost 80,000 entries belonging to 14 RNA classes, including 22,000 piwi-interacting RNAs (piRNAs) and 17,000 long non-coding RNAs (lncRNAs). GeneCards V3.09 now contains ~122,500 gene entries, with a more than fivefold increase in the ncRNA count compared to the pre-unification version 3.07 (Figure 2A). Among the annotations provided is a quality score reflecting the degree of confidence that the entry is indeed a bona fide gene, based on functional annotations as well as expression. Current work in progress includes a probabilistic categorization indicating source-related ambiguities in gene classification (Figure 2B).
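    The clustering step described above, in which overlapping entries are merged at unified genomic locations, can be sketched as a sweep over coordinate-sorted intervals. This is an illustrative sketch, not the GeneLoc algorithm itself: the `Entry` and `Cluster` records, the requirement that members share a strand, and the inclusive coordinate convention are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    source: str
    name: str
    chrom: str
    strand: str
    start: int  # inclusive
    end: int    # inclusive


@dataclass
class Cluster:
    chrom: str
    strand: str
    start: int
    end: int
    members: list = field(default_factory=list)


def cluster_overlaps(entries):
    """Merge entries whose intervals overlap on the same chromosome and strand.

    Entries are sorted by position; each one either extends the current
    cluster (if it starts before the cluster ends) or opens a new one.
    """
    clusters = []
    for e in sorted(entries, key=lambda x: (x.chrom, x.strand, x.start, x.end)):
        last = clusters[-1] if clusters else None
        if last and (last.chrom, last.strand) == (e.chrom, e.strand) and e.start <= last.end:
            last.members.append(e)
            last.end = max(last.end, e.end)
        else:
            clusters.append(Cluster(e.chrom, e.strand, e.start, e.end, [e]))
    return clusters


if __name__ == "__main__":
    demo = [
        Entry("source_A", "precursor-1", "chr1", "+", 1000, 1080),
        Entry("source_B", "mature-1", "chr1", "+", 1010, 1031),
        Entry("source_C", "lnc-1", "chr1", "+", 5000, 9000),
        Entry("source_A", "lnc-1-antisense", "chr1", "-", 5000, 9000),
    ]
    for c in cluster_overlaps(demo):
        print(c.chrom, c.strand, c.start, c.end, [m.name for m in c.members])
```

    With the hypothetical entries in the usage block, the precursor and its mature product collapse into one cluster, while the two long transcripts stay apart because they lie on opposite strands.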

    Dr. Tsviya Olender has a PhD in physical chemistry from the Weizmann Institute. In 1999, she joined the group of Prof. Lancet as a postdoc and continued on as a research associate. Her research focuses on the genetics of human olfaction and the genetics of rare diseases using computational tools. Together with Prof. Lancet, she has played a key role in discovering olfactory receptor genes from humans and other mammals, and developed the human OR data explorer (HORDE) database. She is also involved in developing tools for deciphering disease-causing mutations, including the Xome database described here.

    Marilyn Safran is the head of development for the multidisciplinary GeneCards/MalaCards team in the Lancet lab at the Weizmann Institute of Science in Israel. Prior to joining in 1997, she did software development and management at Ubique Inc., the CS department at Weizmann, Bell Laboratories, MIT's Lincoln Laboratory, and the Memorial Sloan Kettering Cancer Center, in the fields of bioinformatics, web applications, compilers, databases, and medicine. Marilyn received a BA in Math with Honors from Queens College of CUNY in 1974, and an MS in Computer Science from Boston University in 1978.

    Prof. Lancet, head of the Weizmann Institute's Crown Genome Center, has a PhD from Weizmann and postdoctoral training from Harvard and Yale. He pioneered research on the biochemistry, genetics and evolution of olfaction, and investigates human rare disease genes. Lancet and his team developed GeneCards, a world-renowned web compendium of human genes, and more recently initiated a companion database, MalaCards, a comprehensive web tool for human diseases. Lancet was awarded the Takasago and Wright Awards in the USA and the Landau Prize in Israel. He is a member of EMBO and (until 2012) of the HUGO Council.

    2.2 GeneDecks

    Enrichment analysis of high-throughput data is a key method used to glean new biological insights from experimental results.[7] GeneDecks, a GeneCards suite member (www.genecards.org/GeneDecks), is an annotation-based enrichment analysis tool for human genes and gene sets. It acts as a systems biology facilitator, leveraging GeneCards' unique wealth of combinatorial annotations. In its Set Distiller mode, GeneDecks exposes commonalities within a list of genes, shedding functional light on the constituent individual genes (Figure 3A). Descriptors that are enriched in the query gene set are sorted first by their statistical significance and then by their shared gene count. GeneDecks' strength lies in the broad range of attributes available for its analyses, drawn from eight categories such as disorders, compounds, pathways, Gene Ontology (GO) terms and expression. It produces superior results compared to parallel analysis systems, such as DAVID.[8] The new disease-centric database MalaCards (Section 4.1) relies strongly on GeneDecks for many of its annotations, harnessing Set Distiller to find shared descriptors for the genes associated with a disease.
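    A minimal sketch of Set Distiller-style ranking follows, assuming a one-sided hypergeometric test for over-representation (a common choice for annotation enrichment; the exact GeneDecks statistic is not specified here). Descriptors are sorted by p-value and then by shared gene count, as described above. The `distill` function, its input layout and the two-gene minimum are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def distill(query, annotations, population_size, min_shared=2):
    """Rank descriptors over-represented in a query gene set.

    annotations maps descriptor -> set of all genes carrying it.
    Results are (descriptor, p_value, shared_genes), sorted by p-value
    and then by shared gene count (largest first).
    """
    query = set(query)
    results = []
    for descriptor, genes in annotations.items():
        shared = query & genes
        if len(shared) < min_shared:
            continue  # too few genes to call it a commonality
        p = hypergeom_sf(len(shared), population_size, len(genes), len(query))
        results.append((descriptor, p, sorted(shared)))
    results.sort(key=lambda r: (r[1], -len(r[2])))
    return results


if __name__ == "__main__":
    demo_annotations = {  # hypothetical descriptors over a 100-gene universe
        "descriptor_1": {"A", "B", "C"},
        "descriptor_2": {"A", "B"} | {f"G{i}" for i in range(18)},
        "descriptor_3": {"A"},
    }
    for d, p, genes in distill({"A", "B", "C", "D"}, demo_annotations, 100):
        print(d, f"{p:.2e}", genes)
```

    A rare descriptor shared by three of four query genes outranks a common descriptor shared by two, which is the behavior the significance-first ordering is meant to capture.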

    Figure 1. Data integration and information flow. One hundred varied external sources provide detailed information to the GeneCards integrated database of human genes, along with its suite members, the GeneLoc exon-based genomic map and GeneDecks set analysis. GeneCards has recently greatly enhanced its ncRNA gene coverage. GeneCards empowers specific databases as shown, covering tools for olfactory genes as well as disease-specific genes and biomarkers. Two novel medically focused databases, MalaCards and LifeMap Discovery, together with GeneCards, form a powerful trio of mutually enriching bioinformatics tools. Double-lined arrows depict data sharing, single-lined arrows depict links, and the number of links between each pair of sources is also indicated.

    In its Partner Hunter mode, one can "GeneDecks" a given gene with respect to a selected combinatorial annotation in order to obtain a set of similar genes, i.e., potential functional paralogs. GeneDecks portrays functional partners in descending order of similarity (Figure 3B), often discovering relations that might be overlooked based on sequence similarity alone.[8b]
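    Partner Hunter ranks genes by the annotations they share with a query gene. As a hedged illustration, the sketch below scores partners with the Jaccard index of their descriptor sets. This similarity measure is our assumption, not necessarily the one GeneDecks uses, and the gene and descriptor names in the usage example are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hunt_partners(query, gene_descriptors, top=5):
    """Rank other genes by descriptor overlap with the query gene.

    gene_descriptors maps gene -> set of annotation descriptors.
    Genes sharing no descriptors are dropped; ties sort alphabetically.
    """
    q = gene_descriptors[query]
    scored = [
        (gene, jaccard(q, descs))
        for gene, descs in gene_descriptors.items()
        if gene != query
    ]
    scored = [(g, s) for g, s in scored if s > 0]
    scored.sort(key=lambda gs: (-gs[1], gs[0]))
    return scored[:top]


if __name__ == "__main__":
    demo = {  # hypothetical genes and descriptors
        "QUERY": {"pathway_1", "pathway_2", "go_term_1", "disorder_1"},
        "GENE_1": {"pathway_1", "pathway_2", "go_term_1"},
        "GENE_2": {"pathway_1", "compound_1"},
        "GENE_3": {"expression_1"},
    }
    print(hunt_partners("QUERY", demo))
```

    Because the score depends only on shared annotations, two genes with unrelated sequences can still rank as close partners, which is the point made above about relations missed by sequence similarity.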

    2.3 GeneLoc

    The GeneLoc algorithm (genecards.weizmann.ac.il/geneloc/) creates an integrated location map of genes and markers across the entire human genome.[6] It eliminates redundancies and assigns each gene a meaningful chromosomal megabase location, which is used in turn to create a gene identifier that serves as a GeneCards ID. GeneLoc currently uses gene sets from NCBI, Ensembl, and newly added ncRNA sources. It compares these collections, deciding which entries should be consolidated and which are to remain discrete. The resulting GeneLoc "gene territory" reflects the range of the unified genomic coordinates of a given gene, taking every exon into account. Additionally, DNA segments classified by category (such as EST clusters) are presented alongside the genes on a single megabase-scale map, with further information and links to relevant databases. An important advantage of GeneLoc is that it generates a simple tabulated list of genes and markers in a genomic interval of choice, or flanking a gene of interest, with ready links to GeneCards and other databases such as Ensembl, Genethon and NCBI Gene (Figure 4). In this way, it complements graphic maps such as those provided by UCSC.
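    The gene-territory and interval-listing ideas can be sketched in a few lines. In this hypothetical example, each gene is stored with its chromosome and exon coordinates; its territory spans from the first exon start to the last exon end, and a query returns all genes whose territories overlap an interval or flank a gene of interest. The data layout and function names are assumptions, not GeneLoc's code.

```python
def territory(exons):
    """Span of a gene: first exon start to last exon end (inclusive)."""
    return min(s for s, _ in exons), max(e for _, e in exons)


def genes_in_interval(genes, chrom, start, end):
    """Genes whose territory overlaps [start, end], ordered by position.

    genes maps symbol -> (chromosome, [(exon_start, exon_end), ...]).
    """
    hits = []
    for symbol, (g_chrom, exons) in genes.items():
        g_start, g_end = territory(exons)
        if g_chrom == chrom and g_start <= end and g_end >= start:
            hits.append((symbol, g_start, g_end))
    return sorted(hits, key=lambda h: (h[1], h[2], h[0]))


def flanking(genes, symbol, flank):
    """Genes within `flank` bases on either side of a gene of interest."""
    chrom, exons = genes[symbol]
    g_start, g_end = territory(exons)
    return genes_in_interval(genes, chrom, max(1, g_start - flank), g_end + flank)


if __name__ == "__main__":
    demo = {  # hypothetical genes and coordinates
        "GENE_A": ("chr1", [(100, 200), (300, 400)]),
        "GENE_B": ("chr1", [(1000, 1500)]),
        "GENE_C": ("chr1", [(5000, 6000)]),
        "GENE_D": ("chr2", [(100, 400)]),
    }
    for row in flanking(demo, "GENE_A", 700):
        print(*row, sep="\t")
```

    The tab-separated rows mirror the tabulated-list style of output described above rather than a graphical map.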

    Figure 2. GeneCards ncRNA genes. A) A unification procedure that integrated numerous data sources within GeneCards increased the number of RNA genes from ~15,000 to almost 80,000, as part of a general enhancement of the gene entry count from ~68,000 to ~122,500. The uncategorized genes segment in GeneCards has been greatly reduced (from ~25 % to ~3 %), mainly feeding the ncRNA category. B) Scrutiny within the GeneCards compendium revealed prevalent annotation ambiguities among different sources, and among different transcripts of the same gene. These involve a trichotomy among the protein-coding, ncRNA and pseudogene categories. The diagram portrays a prototypic probabilistic categorization, quantifying where a gene lies within the spectrum of the three types, potentially allowing a more balanced view of a gene's functional category.

    Figure 3. GeneDecks analysis modes. A) Set Distiller: ten bronchiectasis-related genes (circles on the circumference) were analyzed, resulting in several shared descriptors (inside the central circle). Unexpected descriptors, which may give new insight into the disease, appear in bold. B) Partner Hunter: the LEP gene was used as a query gene in an analysis aimed at finding potential functional paralogous genes via shared attributes. Two of the best results appear to the right, indicating why IL6 scored higher than LEPR.


    3 Olfactory Databases

    Olfaction, the sense of smell, is a molecularly complex sensory processing system capable of producing accurate odor perception.[9] Olfactory receptors (ORs), which detect and recognize a multitude of odorants, constitute the largest multigene family in the mammalian genome. They exemplify both paralog and interindividual diversity, and include a very high proportion of pseudogenes, with the unusual phenomenon that intact and inactive forms of the same gene may segregate in the population.[10] Downstream of the ORs are auxiliary genes that mediate sensory transduction.

    3.1 HORDE

    HORDE, the human olfactory data explorer (genome.weizmann.ac.il/horde/), presents a complete compendium of human OR genes, pseudogenes and segregating pseudogenes, as well as ORs from four other vertebrate species (chimpanzee, dog, opossum and platypus). This database is generated by an automated, OR-specialized computational pipeline that mines OR gene and pseudogene sequences out of complete genomes.[11] The pipeline can annotate OR pseudogenes, which are usually not identified by standard whole-genome annotation pipelines, making HORDE a unique resource for all OR loci. Human location information produced by this pipeline is supplied to HGNC and to GeneCards. Importantly, HORDE's automatic pipeline generates gene symbols based on a sequence-similarity classification of each gene.[12] For example, OR1A2 indicates family 1, subfamily A, member 2. This nomenclature system was subsequently applied to dog,[13] opossum,[14] and platypus.[15] Its potential as a tool for the study of mammalian ecological adaptation was recently demonstrated.[16]
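    The family/subfamily/member scheme illustrated by OR1A2 lends itself to simple parsing. The regular expression below is an assumption derived from that example (digits, then letters, then digits); symbols with extra suffixes are rejected rather than guessed at.

```python
import re

# Assumed pattern, from the OR1A2 example: "OR" + family number +
# subfamily letter(s) + member number.
OR_SYMBOL = re.compile(r"OR(\d+)([A-Z]+)(\d+)")


def parse_or_symbol(symbol):
    """Split an OR symbol into (family, subfamily, member)."""
    m = OR_SYMBOL.fullmatch(symbol)
    if m is None:
        raise ValueError(f"not a recognized OR symbol: {symbol!r}")
    family, subfamily, member = m.groups()
    return int(family), subfamily, int(member)


if __name__ == "__main__":
    print(parse_or_symbol("OR1A2"))
    # Sorting on the parsed tuple orders by family number, not by string.
    print(sorted(["OR10A1", "OR2A4", "OR1A2"], key=parse_or_symbol))
```

    Sorting on the parsed tuple places OR2A4 before OR10A1, whereas plain string sorting would not; this is one practical benefit of a structured, similarity-based nomenclature shared across species.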

    Figure 4. GeneLoc tabulation results for the TECPR2 genomic region. GeneLoc presents genes along the chromosome ordered by their coordinates, hyperlinked to their GeneCards source database(s), and with a description, when available. Gene identifiers from outside databases also link to a page that displays their exons, as reported by those databases.


    In an attempt to provide insights into the evolution, structure and function of the complete human OR repertoire, HORDE integrates diverse bioinformatic analyses and additional resources into the database, and provides links to and from HGNC, GeneCards, ORDB and others. Thus, information is given on the genomic organization of the ORs into clusters, expression data (obtained from ESTs and microarrays),[17] gene models,[18] annotation of putative functional amino acid residues,[19] identification of syntenic clusters,[14] and more (Figure 5A). This information is pertinent to several open questions, such as the control of gene expression in the olfactory system, including epithelial zone-specific expression as well as expression with locus and allele exclusion,[20] ectopic expression in non-olfactory tissues[17,21] such as sperm,[22] and biased expression of certain OR genes and pseudogenes (e.g., reference [23]).

    Figure 5. An example of a HORDE card. A) Partial view of the information provided for the olfactory receptor gene OR7C2. B) The haplotype section for OR7C2. Only segregating positions are shown, with the frequency of each haplotype in the population (%Freq) and its predicted functionality score (CORP[43]).

    Genetic variations in OR genes underlie differences in odorant sensitivity, including specific anosmia (diminished sensitivity) and specific hyperosmia (enhanced sensitivity).[24] To address the universe of human interindividual variation, we recently integrated a comprehensive catalog of genomic variations in human ORs, collected from eleven sources,[10b] including the 1000 Genomes Project. This encompasses single-nucleotide polymorphisms (SNPs) and copy-number variations (CNVs), as well as deleterious mutational events. Using this catalog, HORDE now presents all inferred protein variants (haplotypes) for all intact OR genes, potentially related to differences in odorant binding (Figure 5B). The catalog provides a realistic and up-to-date view of the personal OR repertoire, in which about two-thirds of the loci segregate between intact and inactivated alleles, and every individual genome contains a different OR combination.
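    Haplotype frequencies such as those in the %Freq column can be computed by counting distinct residue strings over the observed chromosomes. The sketch below assumes phased input (one string per chromosome, listing residues at the segregating positions) and also computes the fraction of loci that segregate between intact and inactivated alleles. The input formats and the "intact"/"inactive" labels are hypothetical, and the CORP functionality score is not modeled.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Percent frequency of each distinct haplotype, most common first.

    Each item is one chromosome's residues at the segregating positions
    (phased input assumed). Ties keep first-seen order.
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return [(h, 100.0 * c / total) for h, c in counts.most_common()]


def segregating_fraction(loci):
    """Fraction of loci observed with both intact and inactive alleles.

    loci maps locus -> iterable of allele states ("intact" or "inactive").
    """
    both = {"intact", "inactive"}
    seg = sum(1 for states in loci.values() if both <= set(states))
    return seg / len(loci)


if __name__ == "__main__":
    print(haplotype_frequencies(["RKV", "RKV", "RKV", "QKV"]))  # hypothetical residues
    print(segregating_fraction({"L1": ["intact", "inactive"], "L2": ["intact"]}))
```

    Applied to a real catalog, the second function is the kind of calculation behind a statement such as "about two-thirds of the loci segregate", although the actual analysis also has to handle genotyping uncertainty and population sampling.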

    Future plans include the expansion of HORDE to additional OR repertoires of terrestrial vertebrates. The availability of such a collection, with HORDE's unified standard nomenclature across species, will allow researchers to make facile cross-species transitions in the complex OR gene family.

    3.2 GOSdb

    In parallel to the study of odorant-specific sensitivity differences, we also study general olfactory sensitivity (GOS): scantly charted interindividual differences in overall (average) smell sensitivity.[24a] An extreme case of the GOS phenotype is isolated congenital general anosmia (CGA), a non-syndromic, inborn, complete inability to perceive odorants. Our working hypothesis is that genetic variations in auxiliary olfactory genes, including those mediating transduction and sensory neuronal structure and development, constitute the genetic basis for GOS and CGA. To better study such chemosensory phenotypes, we performed a systematic exploration and created GOSdb, an online resource (genome.weizmann.ac.il/GOSdb) derived from eleven data sources, which integrates auxiliary olfactory genes and their variations.[25] As a primary source, GeneCards was searched for words like "anosmia". The resulting gene set was fed into GeneALaCart to extract annotations, including aliases, articles, and genomic locations. In parallel, the literature was surveyed for relevant functional in vitro studies, mouse gene knockouts, and human disorders with olfactory phenotypes. We also incorporated published transcriptome and proteome data for genes expressed in olfactory tissues, and genes identified in olfactory-related linkage peaks. Finally, we performed in-house next-generation transcriptome sequencing of human olfactory epithelium and of mouse olfactory epithelium and bulb, aiming to identify olfactory sensory-enriched transcripts. The information is presented on a web card for each gene, with the main symbol (e.g., FGFR1) linked prominently to GeneCards.

    Employing a global scoring system based on the attributes of the eleven assembled data sources, we identified 1680 candidate auxiliary olfactory genes. To assess the potential of our differential expression data sources to detect olfactory functional genes, we examined which genes receive high scores. Of the 20 genes with the top accumulated scores, eleven were previously annotated by other data sources as core olfactory genes, and the rest are candidates for new scrutiny (Figure 6). For a shortlist of the 136 top-scoring genes, we identified genomic variants (probably damaging single-nucleotide polymorphisms, indels and copy-number deletions) gleaned from public repositories. Our GOSdb database of genes and their variants should assist in rationalizing the great interindividual variation in human overall olfactory sensitivity. The database and its variations support an ongoing whole-exome sequencing study of 66 Jewish families with CGA. In addition, GOSdb may aid in scrutinizing undeciphered genetic diseases accompanied by olfactory disorders, as exemplified by Kallmann syndrome, for which, despite substantial progress, most of the genetic basis remains uncharted.[26] FGFR1 is an example of a GOSdb gene: it is implicated in anosmia via in vivo and mouse knockout studies, is linked to GeneCards, and is associated in MalaCards with Kallmann syndrome. Insights into this disease were shared with the MalaCards project, and it was featured as the sample malady on the MalaCards V1.0.1 homepage. Ultimately, GOSdb may illuminate other sensory systems, using olfaction as a model, and may contribute to a broader understanding of neurogenetics.
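    The global scoring idea, accumulating evidence for each gene across several data sources and ranking the result, can be sketched as a weighted sum. The per-source scores, optional weights and source names below are hypothetical; the actual GOSdb attributes and weighting are not reproduced here.

```python
from collections import defaultdict


def global_scores(source_hits, weights=None):
    """Accumulate per-source evidence into one score per gene.

    source_hits maps a data-source name to {gene: score within that source};
    weights optionally rescales each source (default weight 1.0).
    """
    weights = weights or {}
    totals = defaultdict(float)
    for source, hits in source_hits.items():
        w = weights.get(source, 1.0)
        for gene, score in hits.items():
            totals[gene] += w * score
    return dict(totals)


def top_genes(totals, n):
    """Highest-scoring genes, ties broken alphabetically."""
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n]]


if __name__ == "__main__":
    demo = {  # hypothetical sources and scores
        "rnaseq_human": {"G1": 2.0, "G2": 1.0},
        "mouse_knockout": {"G2": 1.0, "G3": 1.0},
        "literature": {"G1": 1.0},
    }
    totals = global_scores(demo)
    print(top_genes(totals, 2))
```

    Taking the top of such a ranking (for example, a fixed-size shortlist) is how a large candidate list can be narrowed to a set small enough for per-gene variant curation; changing the weights changes which genes make the cut.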

    4 Disease Databases

    One of the greatest challenges of biomedical research is deciphering the underlying mechanisms of human diseases, which requires accurate classification and annotation. Most human diseases arise from complex interactions between multiple genetic variants and environmental risk factors.[27] The study of diseases could thus shed light on basic biological mechanisms. In parallel, diagnosis and treatment are facilitated by the huge amount of information coming from genomics and proteomics research, allowing molecular-level support for medical decisions.

    Integrating the massive amount of existing information under a single disease nomenclature is an enormous challenge. At present, disease compilation, spread over more than 60 existing data sources, is incomplete, heterogeneous and often lacks systematic inquiry mechanisms. Each data source focuses on different aspects of disease annotation, and/or contains a partial, specialized list.

    Promising attempts to reconcile the varied disease nomenclature rely on knowledge representation through standardized vocabularies, to ensure both effective information sharing and interoperability among information systems.[28] Several vocabularies exist, ranging from class-specific ones such as the Infectious Disease Ontology (IDO, infectiousdiseaseontology.org/page/Main_Page) to more broadly scoped ones such as the International Classification of Diseases (ICD),[29] the Unified Medical Language System (UMLS),[30] the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED-CT) (16770974), the Medical Subject Headings (MeSH) (www.nlm.nih.gov/mesh/) and the Disease Ontology (DO).[31] Such data structures range from flat lists, such as Online Mendelian Inheritance in Man (OMIM),[32] to hierarchies, as exemplified by the Disease Ontology. However, significant inconsistencies prevail in basic terms pertaining to diseases. Existing vocabularies are only partially cross-connected to each other, and do not define disease concepts uniformly. Moreover, most existing disease databases are only partially associated with any ontology, which greatly limits the effectiveness of formalization and definition unification.

    4.1 MalaCards

    It is clear to us that the realm of disease databasing requires the same "one-stop shop" solution that GeneCards provides for the gene universe. We have recently introduced MalaCards (www.malacards.org),[44] an integrated database of human maladies and their annotations, modeled on the GeneCards strategy, architecture and information richness. MalaCards mines and merges 44 data sources to generate a computerized web card for each of 16,919 human disease entries, with disease-specific prioritized annotations, as well as interdisease connections. It leverages the GeneCards relational database, searches, and GeneDecks set analyses (Figure 7A).
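    Merging many sources into a single card per disease requires deciding when differently worded entries denote the same disease. A minimal sketch is shown below, assuming sources are processed in priority order and that entries are merged when any case- and punctuation-normalized name or synonym matches. The normalization rule and the `unify` function are illustrative assumptions, not the MalaCards heuristics.

```python
import re


def normalize(name):
    """Case-fold and collapse punctuation/whitespace so trivial variants match."""
    return " ".join(re.sub(r"[\W_]+", " ", name.casefold()).split())


def unify(ranked_sources):
    """Merge disease entries from sources given in priority order.

    Each source is a list of (name, synonyms) pairs. The first source to
    mention a disease supplies its display name; later entries sharing any
    normalized name or synonym are merged into that disease.
    """
    diseases = []  # dicts with "name" and "aliases" (normalized strings)
    index = {}     # normalized alias -> position in `diseases`
    for source in ranked_sources:
        for name, synonyms in source:
            keys = {normalize(n) for n in (name, *synonyms)}
            # Deterministic choice if keys point at different diseases.
            hit = next((index[k] for k in sorted(keys) if k in index), None)
            if hit is None:
                hit = len(diseases)
                diseases.append({"name": name, "aliases": set()})
            diseases[hit]["aliases"] |= keys
            for k in keys:
                index.setdefault(k, hit)
    return diseases


if __name__ == "__main__":
    top_source = [("Kallmann Syndrome", ["Hypogonadotropic hypogonadism with anosmia"])]
    lower_source = [("kallmann-syndrome", []), ("Bronchiectasis", [])]
    for d in unify([top_source, lower_source]):
        print(d["name"], sorted(d["aliases"]))
```

    Processing sources in rank order lets the most trusted source supply the display name while lower-ranked sources contribute synonyms. Real disease names need far richer rules (abbreviations, subtypes, word order), which this sketch does not attempt.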

    The MalaCards disease list is built from 15 ranked sources, using disease name unification heuristics. Four schemes populate MalaCards sections: (i) direct interrogation of disease resources, to establish integrated disease names and synonyms, as well as additional annotations such as summaries, drugs/therapeutics, clinical features, genetic tests, and anatomical context; (ii) searches of GeneCards for related publications and for associated genes, with corresponding relevance scores; (iii) analyses of disease-associated gene sets using GeneDecks,[8b] to calculate statistically significant descriptors enriched in each set (e.g., in the “Anaplastic Ependymoma” MalaCard, “tumorigenesis” is entered into the phenotypes section, while “tumor metastasis” is entered into the pathways section); this process also assigns a relevance score to every hit, is employed to populate the related diseases, phenotypes, pathways, compounds and GO terms sections, and provides relevant GeneCards deep links; and (iv) searches in MalaCards itself, e.g., for additional related diseases. The latter form the basis for the construction of a disease network, based on shared MalaCards annotations (Figure 7B). Such networks embody associations based on etiology, clinical features and clinical conditions. This broad network has a power-law degree distribution, as previously indicated for smaller disease networks, implying that such networks capture some inherent properties. Our current work on MalaCards includes a more in-depth classification of diseases and disease set analyses, aimed at making MalaCards a better tool for biological as well as medical studies.

    Figure 6. Twenty top-scoring GOSdb genes. The genes are sorted by their score accumulated across differential expression data sources. These sources include RNA-seq of human olfactory epithelium and of mouse olfactory epithelium and bulb, as compared to eight control tissues. Of the 20 genes shown, eleven were previously annotated by other data sources as core olfactory genes. The other nine genes are novel in the olfactory context (red) and are interesting candidates for further scrutiny.

    Figure 7. MalaCards: a database of human maladies. A) MalaCards architectural pipeline, showing the integration of heterogeneous disease databases and the leveraging of GeneCards, GeneDecks set distillation, and internal searches for creating MalaCards annotations. B) MalaCards disease network. Nodes are connected by co-appearance in MalaCards search results. Edges are colored by their cluster association and nodes are sized by degree.
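    The disease network and its degree distribution can be illustrated with a minimal sketch. It assumes a simplified linking rule (two diseases are connected when they share at least one annotation term) and uses hypothetical disease names and terms; it is not the MalaCards construction procedure, which Figure 7B describes as co-appearance in search results.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations: disease -> set of annotation terms (genes,
# phenotypes, pathways). Placeholders only, not MalaCards content.
ANNOTATIONS = {
    "disease_A": {"GENE1", "GENE2", "seizures"},
    "disease_B": {"GENE2", "seizures"},
    "disease_C": {"GENE1", "GENE3"},
    "disease_D": {"GENE4"},
}


def build_network(annotations, min_shared=1):
    """Link two diseases when they share at least `min_shared` annotation terms."""
    edges = set()
    for a, b in combinations(sorted(annotations), 2):
        if len(annotations[a] & annotations[b]) >= min_shared:
            edges.add((a, b))
    return edges


def degree_distribution(nodes, edges):
    """Return {degree k: number of diseases with degree k}; isolated nodes have k = 0."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())
```

    For the toy data, `build_network(ANNOTATIONS)` links disease_A to disease_B and disease_C, leaving disease_D isolated. On a real network one would plot the counts against k on log-log axes; an approximately straight line is the usual informal sign of a power law, although a rigorous fit needs more care.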

    4.2 GeneKid

    A powerful way to integrate heterogeneous omics data is to use genes as their common denominator, exploiting the genes’ annotation network. The expected increase in the number of patients afflicted with chronic kidney disease (CKD),[33] which can result in end-stage renal failure, has inspired SysKid (www.syskid.eu/), an EU consortium involving 25 research groups from 16 countries. The aim of the consortium is to associate genes with CKD, so as to assist in the development of new diagnosis and treatment methods. The crucial step in establishing a unified omics network is the connection of each datum, such as RNA expression and metabolites, to a unique approved human gene symbol, a step termed “symbolization”. The robustness of GeneCards’ gene annotation is the basis for overcoming the diversity of gene identifiers supplied by consortium experimentalists. This challenge is significant for microarray probe set and SNP identifiers, and is even more accentuated by the need to associate metabolites with genes; to this end, compound-gene associations were fortified using both The Human Metabolome Database (HMDB)[34] and DrugBank,[35] cheminformatics sources that link drugs and compounds to their gene targets.
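    Symbolization amounts to resolving many identifier types to one approved symbol. The sketch below assumes a hypothetical alias table (the probe IDs and "SHARED1" are made up; a real table would be derived from GeneCards/HGNC alias and cross-reference data) and shows one reasonable policy: case-insensitive lookup, with ambiguous or unknown identifiers left unresolved rather than guessed.

```python
from collections import defaultdict

# Hypothetical alias table: approved symbol -> identifiers that resolve to it.
# "probe_0001", "probe_0002", GENE_X, GENE_Y and "SHARED1" are placeholders.
ALIASES = {
    "TP53": ["p53", "probe_0001"],
    "ERBB2": ["HER2", "probe_0002"],
    "GENE_X": ["SHARED1"],
    "GENE_Y": ["SHARED1"],
}


def build_index(aliases):
    """Invert the table: upper-cased identifier -> set of approved symbols."""
    index = defaultdict(set)
    for symbol, identifiers in aliases.items():
        index[symbol.upper()].add(symbol)  # a symbol always resolves to itself
        for ident in identifiers:
            index[ident.upper()].add(symbol)
    return index


def symbolize(identifier, index):
    """Return the single approved symbol for `identifier`.

    Returns None when the identifier is unknown or maps to several symbols,
    so that such cases can be routed to manual curation instead of guessed.
    """
    hits = index.get(identifier.strip().upper(), set())
    return next(iter(hits)) if len(hits) == 1 else None
```

    Refusing to resolve ambiguous identifiers keeps a wrong gene from silently entering the unified network; the cost is that some data points wait for curation.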

    GeneKid is a resource for storing and managing data such as those generated by SysKid. Its database consists of 18 tables storing omics data as the main entity, along with study and sample information. Data are inserted into GeneKid after symbolization. Quantitative measurements detected for each experimental feature are stored in GeneKid, and a combinatorial/compound score is produced for each of them, thus prioritizing their importance in the overall SysKid results (Figure 8). The GeneKid user interface enables basic lookup services, allowing collaborating groups to access intermediate results. When testing potential biomarkers, it is highly beneficial to use the research reagents posted in GeneCards. As an example, siRNAs for seven specific genes of interest were extracted from GeneCards; these are already being used to test candidate CKD genes. This knowledge could be the basis for the development of proprietary diagnostic tools.[36]
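    The idea of a compound score that links results from different experiments through the gene can be sketched as follows. The scoring rule here is an assumption for illustration, not the GeneKid formula: each study's |log2 fold change| is scaled to [0, 1] by that study's maximum, and a gene's compound score is the sum over studies divided by the number of studies, so genes supported by several studies rank higher. Study names, genes and values are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical symbolized results: study -> {gene symbol: fold change (> 0)}.
STUDIES = {
    "mrna_study": {"GENE_A": 4.0, "GENE_B": 0.5, "GENE_C": 1.1},
    "protein_study": {"GENE_A": 2.0, "GENE_C": 1.0},
}


def study_scores(fold_changes):
    """Scale |log2 fold change| within one study to [0, 1] by the study maximum."""
    mags = {g: abs(math.log2(fc)) for g, fc in fold_changes.items()}
    if not mags:
        return {}
    top = max(mags.values()) or 1.0  # avoid dividing by zero if nothing changed
    return {g: m / top for g, m in mags.items()}


def compound_scores(studies):
    """Sum per-study scores and divide by the number of studies.

    A gene absent from a study contributes 0 for that study, which rewards
    genes that are supported by several independent experiments.
    """
    totals = defaultdict(float)
    for fold_changes in studies.values():
        for gene, score in study_scores(fold_changes).items():
            totals[gene] += score
    n = len(studies)
    return {g: t / n for g, t in totals.items()}


def prioritize(studies):
    """Genes ordered by descending compound score (ties broken by name)."""
    scores = compound_scores(studies)
    return sorted(scores, key=lambda g: (-scores[g], g))
```

    With the toy data, GENE_A ranks first because it changes strongly in both studies, while GENE_B's strong mRNA change is diluted by its absence from the protein study.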

    4.3 Xome

    Current advances in next-generation sequencing, such as whole-genome and whole-exome sequencing,[37] have dramatically increased the probability of identifying causal variants for genetic disorders. Indeed, during the last two years, more than 100 genes have been identified as participating in rare Mendelian disorders,[38] including our own discovery of deleterious mutations in five diseases, such as the TECPR2 mutation that causes an autosomal recessive form of hereditary spastic paraplegia.[39] However, the tremendous and increasing volume of sequencing data generated by these technologies poses a great bioinformatics challenge in terms of data processing, storage, management and interpretation.

    While typical whole-exome sequencing yields ~25,000 variants per individual, identification of the causal variant involves the application of various filtering strategies.[40] Commonly adopted procedures include filtering out common variants, filtering by variant type, predicting pathogenicity, and selecting an inheritance mode. A key component in these analyses is a comparison to an appropriate set of control variants, incorporating exome-sequencing data of individuals from the same population and generated on the same sequencing platform. This reduces the number of false positive variants and filters out population-specific variants. In this realm, we developed Xome, a database for whole-exome sequencing data management. The database is currently populated with data from 105 Israeli individuals, which give rise to 474,781 variants. Access to Xome is currently password protected, because the database contains detailed patient information, which is difficult to disguise for this small population.
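    The filtering cascade named above can be expressed as a short sketch. It assumes hypothetical per-variant fields (population allele frequency, a consequence label, a precomputed pathogenicity call and the patient genotype), an assumed 1% frequency cutoff, and a set of variant keys observed in in-house controls standing in for a Xome-style control cohort; it does not reproduce the Xome implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    key: str          # hypothetical "chrom:pos:ref:alt" identifier
    consequence: str  # e.g. "missense", "synonymous"
    pop_af: float     # allele frequency in a public reference population
    damaging: bool    # precomputed pathogenicity prediction (assumed available)
    genotype: str     # "het" or "hom_alt" in the patient


# Consequence classes retained as potentially causal (a simplified, assumed list).
KEEP_CONSEQUENCES = {"missense", "nonsense", "frameshift", "splice_site"}


def filter_variants(variants, control_keys, mode="recessive", max_af=0.01):
    """Apply the filters named in the text, then an inheritance-mode filter."""
    kept = [
        v for v in variants
        if v.pop_af <= max_af                   # 1. drop common variants
        and v.consequence in KEEP_CONSEQUENCES  # 2. filter by variant type
        and v.damaging                          # 3. pathogenicity prediction
        and v.key not in control_keys           # 4. drop variants seen in local controls
    ]
    if mode == "recessive":
        # 5a. homozygous hits, or genes with >= 2 heterozygous hits
        #     (possible compound heterozygotes; phase is not checked here)
        het_per_gene = Counter(v.gene for v in kept if v.genotype == "het")
        kept = [v for v in kept
                if v.genotype == "hom_alt" or het_per_gene[v.gene] >= 2]
    elif mode == "dominant":
        # 5b. heterozygous hits only
        kept = [v for v in kept if v.genotype == "het"]
    return kept
```

    Step 4 is where a population- and platform-matched control set pays off: a rare-looking variant that recurs in local controls is more likely a population-specific polymorphism or a platform artifact than a causal mutation.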

    Figure 8. Overview of the GeneKid workflow. Individual omics studies are uploaded into the GeneKid database and connected to genes by the symbolizer mechanism. Thereupon, scores (fold-change, etc.) resulting from mRNA, protein or other experiments can be interlinked through a single entity, namely a gene, and prioritized by an overall score. This process can be used to create a CKD-relevant gene set, which may undergo further scrutiny, implicating new pathways, compounds, etc. This could ultimately reveal non-obvious chronic kidney disease biomarker candidates.


    Figure 9. An example card for an in vivo cell in the LifeMap Discovery website. A) Quick view of the cell’s anatomical location (kidney → metanephric mesenchyme (compartment) → cell), with summary, notes and development time. B) List of available data on this cell, including gene expression, high-throughput data, signals, related diseases and more. C) An interactive graphical view of the cell’s development tree. Cells lower down are descendants, and cells can have one or more ancestors. D) Inset of the Gene Expression section, with lists of genes curated from the literature, in situ hybridization and microarrays.


    5 LifeMap Discovery

    Understanding how stem and progenitor cells differentiate into mature functional cells during embryonic development is of fundamental interest, and clearly one of the most mystifying areas of multicellular life. Being able to generate functional cells, and grow tissues from these stem and progenitor cells in a dish, is perhaps the greatest challenge for medicine in the coming decades.[41] The LifeMap Discovery database has been created to bridge these separate areas of research, and to bring knowledge from the in vivo into the in vitro realm, and from there to clinical utility, advancing scientific knowledge and future applications of regenerative medicine. The underlying postulate of LifeMap Discovery is that understanding the genes expressed in every developing cell, and the signaling that drives its differentiation, will provide invaluable information for (i) identifying and classifying differentiated stem and progenitor cells, and (ii) suggesting mechanisms for deriving protocols that differentiate these cells into the desired, more mature cell types.[42]

    The LifeMap Discovery database (discovery.lifemapsc.com) helps users trace the cellular differentiation that occurs during mammalian embryonic development. The annotated cell development tree stems from the zygote, branches to progenitor cells, and terminates at mature cells (Figure 9). The LifeMap Discovery database is based on developmental paths to specific fates, such as blood, endothelium, motor neurons, bone or cartilage. Developmental data are available at cellular and anatomical resolutions. In concert with cellular differentiation, complex organs and tissues are formed; hence each developing cell is also a member of specific anatomical compartments and is associated with the tissues that compose the developing organs.
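    Because a cell can have more than one ancestor (Figure 9C), the development "tree" behaves as a directed acyclic graph rooted at the zygote. The sketch below stores a hypothetical toy lineage (placeholder cell names, not LifeMap Discovery content) as a cell-to-parents map and answers the basic lineage queries with breadth-first search.

```python
from collections import deque

# Hypothetical toy lineage (placeholders, not LifeMap Discovery content):
# cell -> list of parent cells. A cell may have more than one parent, so the
# structure is a directed acyclic graph rather than a strict tree.
PARENTS = {
    "zygote": [],
    "progenitor_A": ["zygote"],
    "progenitor_B": ["zygote"],
    "mature_X": ["progenitor_A", "progenitor_B"],
    "mature_Y": ["progenitor_B"],
}


def children_map(parents):
    """Invert cell -> parents into cell -> children."""
    children = {cell: [] for cell in parents}
    for cell, ps in parents.items():
        for p in ps:
            children[p].append(cell)
    return children


def reachable(start, edges):
    """Breadth-first search: every node reachable from `start`, excluding `start`."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def ancestors(cell, parents):
    """All cells upstream of `cell`, back to the zygote."""
    return reachable(cell, parents)


def descendants(cell, parents):
    """All cells downstream of `cell`."""
    return reachable(cell, children_map(parents))


def mature_cells(parents):
    """Terminal cells: those with no children in the lineage."""
    return {cell for cell, kids in children_map(parents).items() if not kids}
```

    Keeping a `seen` set matters here: in a strict tree each cell is reached once, but with multiple parents (mature_X) the same cell can be reached along several paths.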

    LifeMap Discovery is based on systematic gathering and de novo assimilation of various scientific data describing mostly mouse and human development. Due to the inherent difficulties in studying human development, mouse information is far more abundant, and is therefore included in the database. The mouse data serve both as a foundation for further studies and as a model for human development. The resulting annotated cell development ontology tree is thus based on mouse development, but with cross-referenced human development information wherever available. The information is collected and described for each cell in specific lineages. Example lineages include the endothelium, blood, muscle, bone, cartilage, heart myocardium, kidney, neuronal cells and more. The anatomy content includes the organ development path supplemented with relevant images, in situ hybridizations, and high-throughput experimental gene-expression data. Cellular information includes the cell development path, qualitative gene expression, signaling that affects the differentiation process, high-throughput gene-expression data, related diseases, and relevant references.

    In addition to the in vivo developmental part, the database includes in vitro differentiation protocols, as well as characteristics and cell therapy applications of the various types of human in vitro cultured cells. These include embryonic stem and progenitor cells, adult stem cells, induced pluripotent stem cells, and primary cells. The differentiation protocols are mapped, where possible, to the closest matching in vivo developing cells, anatomical compartments and tissues.

    The database is divided into the following main parts: (i) in vivo development – the first complete assembly and reconstruction of cell lineages developed in the mammalian body; (ii) stem cell differentiation – cultured cells, their differentiation protocols and cell therapy-related applications; and (iii) regenerative medicine – development of stem and progenitor cells into therapeutic products. These different parts are connected and interlaced by computational and hand-curated methods. Most noteworthy, the in vivo entities are linked to their closest in vitro entities whenever data are available. Matching is mostly based on gene-expression analysis and also on other cell characteristics, such as functional and morphological similarity.
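    Expression-based matching of an in vitro cell to its closest in vivo counterparts can be illustrated with a deliberately simple similarity measure. The sketch swaps in Jaccard similarity between sets of expressed genes and an assumed cutoff of 0.2; both are illustrative choices, not the LifeMap Discovery matching method, and the gene and cell names are placeholders.

```python
# Hypothetical expressed-gene sets (placeholders, not LifeMap Discovery content).
IN_VIVO = {
    "in_vivo_cell_1": {"G1", "G2", "G3", "G4"},
    "in_vivo_cell_2": {"G3", "G5", "G6"},
    "in_vivo_cell_3": {"G7", "G8"},
}


def jaccard(a, b):
    """|A intersect B| / |A union B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_in_vivo(in_vitro_genes, in_vivo, min_score=0.2):
    """Rank in vivo cells by similarity to an in vitro cell's expressed genes.

    Returns (score, cell) pairs, best first, dropping matches below `min_score`
    (an assumed cutoff) so that weak matches are not reported as links.
    """
    scored = [(jaccard(in_vitro_genes, genes), cell) for cell, genes in in_vivo.items()]
    scored = [(s, c) for s, c in scored if s >= min_score]
    return sorted(scored, key=lambda sc: (-sc[0], sc[1]))
```

    For an in vitro cell expressing G1, G2, G3 and G9, only in_vivo_cell_1 passes the cutoff (3 shared genes out of 5 in the union, a score of 0.6). In practice, functional and morphological criteria, as mentioned above, would complement such an expression score.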

    The value provided by LifeMap Discovery and its projected effect on stem cell research and therapeutics originate from the combined power of these data, which enable or facilitate identifying, predicting and indicating possible differentiation paths and future regenerative medicine applications. LifeMap Discovery integrates with GeneCards, where the rich information available on each gene is directly linked, and with MalaCards, to which it provides the relationships of cells, tissues and organs to the diseases potentially targeted by regenerative medicine and cell therapy applications.

    Acknowledgements

    Research at the Weizmann Institute of Science is supported by grants from the Israel Science Foundation Heritage Fund, the SysKid EU FP7 project (No. 241544) and LifeMap Sciences Inc., California (USA). Research support also comes from the Crown Human Genome Center, the Weizmann-Sourasky collaborative grant and the Nella and Leon Benoziyo Center for Neurological Diseases at the Weizmann Institute of Science. We are grateful to Michal Twik, Tsippi Iny Stein, Shahar Zimmerman, Iris Bahir, Danit Oz-Levi, Anna Alkelai and Edna Ben-Asher from the Weizmann Institute and Yaron Guan-Golan from LifeMap Sciences. LifeMap Sciences Inc. is a subsidiary of BioTime Inc., Alameda, CA (USA).


    References

    [1] a) F. Belinky, I. Bahir, G. Stelzer, S. Zimmerman, N. Rosen, N. Nativ, I. Dalah, T. Iny Stein, N. Rappaport, T. Mituyama, M. Safran, D. Lancet, Bioinformatics 2013, 29, 255–261; b) G. Stelzer, I. Dalah, T. I. Stein, Y. Satanower, N. Rosen, N. Nativ, D. Oz-Levi, T. Olender, F. Belinky, I. Bahir, H. Krug, P. Perco, B. Mayer, E. Kolker, M. Safran, D. Lancet, Hum. Genomics 2011, 5, 709–717; c) M. Safran, I. Dalah, J. Alexander, N. Rosen, T. Iny Stein, M. Shmoish, N. Nativ, I. Bahir, T. Doniger, H. Krug, A. Sirota-Madi, T. Olender, Y. Golan, G. Stelzer, A. Harel, D. Lancet, Database 2010. DOI: 10.1093/database/baq020.

    [2] C. Wu, C. Orozco, J. Boyer, M. Leglise, J. Goodale, S. Batalov, C. L. Hodge, J. Haase, J. Janes, J. W. Huss 3rd, A. I. Su, Genome Biol. 2009, 10, R130.

    [3] V. E. Velculescu, L. Zhang, B. Vogelstein, K. W. Kinzler, Science 1995, 270, 484–487.

    [4] J. S. Mattick, I. V. Makunin, Hum. Mol. Genet. 2006, 15 (suppl 1), R17–R29.

    [5] a) P. Flicek, M. R. Amode, D. Barrell, K. Beal, S. Brent, D. Carvalho-Silva, P. Clapham, G. Coates, S. Fairley, S. Fitzgerald, L. Gil, L. Gordon, M. Hendrix, T. Hourlier, N. Johnson, A. K. Kahari, D. Keefe, S. Keenan, R. Kinsella, M. Komorowska, G. Koscielny, E. Kulesha, P. Larsson, I. Longden, W. McLaren, M. Muffato, B. Overduin, M. Pignatelli, B. Pritchard, H. S. Riat, G. R. Ritchie, M. Ruffier, M. Schuster, D. Sobral, Y. A. Tang, K. Taylor, S. Trevanion, J. Vandrovcova, S. White, M. Wilson, S. P. Wilder, B. L. Aken, E. Birney, F. Cunningham, I. Dunham, R. Durbin, X. M. Fernandez-Suarez, J. Harrow, J. Herrero, T. J. Hubbard, A. Parker, G. Proctor, G. Spudich, J. Vogel, A. Yates, A. Zadissa, S. M. Searle, Nucleic Acids Res. 2012, 40, D84–D90; b) D. Maglott, J. Ostell, K. D. Pruitt, T. Tatusova, Nucleic Acids Res. 2011, 39, D52–D57; c) T. Mituyama, K. Yamada, E. Hattori, H. Okida, Y. Ono, G. Terai, A. Yoshizawa, T. Komori, K. Asai, Nucleic Acids Res. 2009, 37, D89–D92; d) R. L. Seal, S. M. Gordon, M. J. Lush, M. W. Wright, E. A. Bruford, Nucleic Acids Res. 2011, 39, D514–D519.

    [6] N. Rosen, V. Chalifa-Caspi, O. Shmueli, A. Adato, M. Lapidot, J. Stampnitzky, M. Safran, D. Lancet, Bioinformatics 2003, 19 (suppl 1), i222–i224.

    [7] D. Groth, H. Lehrach, S. Hennig, Nucleic Acids Res. 2004, 32, W313–W317.

    [8] a) G. Dennis Jr., B. T. Sherman, D. A. Hosack, J. Yang, W. Gao, H. C. Lane, R. A. Lempicki, Genome Biol. 2003, 4, P3; b) G. Stelzer, A. Inger, T. Olender, T. Iny-Stein, I. Dalah, A. Harel, M. Safran, D. Lancet, OMICS 2009, 13, 477–487.

    [9] A. Kato, K. Touhara, Cell. Mol. Life Sci. 2009, 66, 3743–3753.

    [10] a) I. Menashe, O. Man, D. Lancet, Y. Gilad, Nat. Genet. 2003, 34, 143–144; b) T. Olender, S. M. Waszak, M. Viavant, M. Khen, E. Ben-Asher, A. Reyes, N. Nativ, C. J. Wysocki, D. Ge, D. Lancet, BMC Genomics 2012, 13, 414.

    [11] a) M. Safran, V. Chalifa-Caspi, O. Shmueli, T. Olender, M. Lapidot, N. Rosen, M. Shmoish, Y. Peter, G. Glusman, E. Feldmesser, A. Adato, I. Peter, M. Khen, T. Atarot, Y. Groner, D. Lancet, Nucleic Acids Res. 2003, 31, 142–146; b) T. Olender, E. Feldmesser, T. Atarot, M. Eisenstein, D. Lancet, GMR, Genet. Mol. Res. 2004, 3, 545–553.

    [12] G. Glusman, A. Bahar, D. Sharon, Y. Pilpel, J. White, D. Lancet, Mamm. Genome 2000, 11, 1016–1023.

    [13] T. Olender, T. Fuchs, C. Linhart, R. Shamir, M. Adams, F. Kalush, M. Khen, D. Lancet, Genomics 2004, 83, 361–372.

    [14] R. Aloni, T. Olender, D. Lancet, Genome Biol. 2006, 7, R88.

    [15] W. C. Warren, L. W. Hillier, J. A. Marshall Graves, E. Birney, C. P. Ponting, F. Grützner, K. Belov, W. Miller, L. Clarke, A. T. Chinwalla, S. P. Yang, A. Heger, D. P. Locke, P. Miethke, P. D. Waters, F. Veyrunes, L. Fulton, B. Fulton, T. Graves, J. Wallis, X. S. Puente, C. López-Otín, G. R. Ordóñez, E. E. Eichler, L. Chen, Z. Cheng, J. E. Deakin, A. Alsop, K. Thompson, P. Kirby, A. T. Papenfuss, M. J. Wakefield, T. Olender, D. Lancet, G. A. Huttley, A. F. Smit, A. Pask, P. Temple-Smith, M. A. Batzer, J. A. Walker, M. K. Konkel, R. S. Harris, C. M. Whittington, E. S. Wong, N. J. Gemmell, E. Buschiazzo, I. M. Vargas Jentzsch, A. Merkel, J. Schmitz, A. Zemann, G. Churakov, J. O. Kriegs, J. Brosius, E. P. Murchison, R. Sachidanandam, C. Smith, G. J. Hannon, E. Tsend-Ayush, D. McMillan, R. Attenborough, W. Rens, M. Ferguson-Smith, C. M. Lefèvre, J. A. Sharp, K. R. Nicholas, D. A. Ray, M. Kube, R. Reinhardt, T. H. Pringle, J. Taylor, R. C. Jones, B. Nixon, J.-L. Dacheux, H. Niwa, Y. Sekita, X. Huang, A. Stark, P. Kheradpour, M. Kellis, P. Flicek, Y. Chen, C. Webber, R. Hardison, J. Nelson, K. Hallsworth-Pepin, K. Delehaunty, C. Markovic, P. Minx, Y. Feng, C. Kremitzki, M. Mitreva, J. Glasscock, T. Wylie, P. Wohldmann, P. Thiru, M. N. Nhan, C. S. Pohl, S. M. Smith, S. Hou, M. Nefedov, P. J. de Jong, M. B. Renfree, E. R. Mardis, R. K. Wilson, Nature 2008, 453, 175–183.

    [16] S. Hayden, M. Bekaert, T. A. Crider, S. Mariani, W. J. Murphy, E. C. Teeling, Genome Res. 2010, 20, 1–9.

    [17] E. Feldmesser, T. Olender, M. Khen, I. Yanai, R. Ophir, D. Lancet, BMC Genomics 2006, 7, 121.

    [18] D. Thierry-Mieg, J. Thierry-Mieg, Genome Biol. 2006, 7 (suppl 1), S12.

    [19] O. Man, Y. Gilad, D. Lancet, Protein Sci. 2004, 13, 240–254.

    [20] a) M. Q. Nguyen, Z. Zhou, C. A. Marks, N. J. Ryba, L. Belluscio, Cell 2007, 131, 1009–1017; b) T. Imai, H. Sakano, Results Probl. Cell Differ. 2009, 47, 57–75.

    [21] O. De la Cruz, R. Blekhman, X. Zhang, D. Nicolae, S. Firestein, Y. Gilad, Mol. Biol. Evol. 2009, 26, 491–494.

    [22] N. Fukuda, K. Yomogida, M. Okabe, K. Touhara, J. Cell Sci. 2004, 117, 5835–5845.

    [23] L. L. Xu, B. G. Stackhouse, K. Florence, W. Zhang, N. Shanmugam, I. A. Sesterhenn, Z. Zou, V. Srikantan, M. Augustus, V. Roschke, K. Carter, D. G. McLeod, J. W. Moul, D. Soppett, S. Srivastava, Cancer Res. 2000, 60, 6568–6572.

    [24] a) I. Menashe, T. Abaffy, Y. Hasin, S. Goshen, V. Yahalom, C. W. Luetje, D. Lancet, PLoS Biol. 2007, 5, e284; b) A. Keller, H. Zhuang, Q. Chi, L. B. Vosshall, H. Matsunami, Nature 2007, 449, 468–472; c) J. F. McRae, J. D. Mainland, S. R. Jaeger, K. A. Adipietro, H. Matsunami, R. D. Newcomb, Chem. Senses 2012, 37, 585–593.

    [25] I. Keydar, E. Ben-Asher, E. Feldmesser, N. Nativ, A. Oshimoto, D. Restrepo, H. Matsunami, M.-S. Chien, J. M. Pinto, Y. Gilad, T. Olender, D. Lancet, Hum. Mutat. 2013, 34, 32–41.

    [26] G. P. Sykiotis, N. Pitteloud, S. B. Seminara, U. B. Kaiser, W. F. Crowley Jr., Sci. Transl. Med. 2010, 2, 32rv2.

    [27] J. N. Hirschhorn, M. J. Daly, Nat. Rev. Genet. 2005, 6, 95–108.

    [28] a) R. H. Scheuermann, W. Ceusters, B. Smith, Summit on Translat Bioinforma 2009, 116–120; b) O. Bodenreider, A. Burgun, in Proceedings of the First International Conference on Biomedical Ontology (ICBO 2009), July 24–26, 2009, Buffalo, NY, 39–42.

    [29] a) A. A. o. H. D. Systems, A. H. Association, A. o. H. Records, C. o. C. Classifications, W. H. O. C. f. C. o. D. f. N. America, The International Classification of Diseases, 9th Revision, Clinical Modification, 1991, Ann Arbor: Common Profess & Hosp Act, 1991; b) W. H. Organization, International Statistical Classification of Diseases and Related Health Problems, Tenth Revision: Introduction; list of three-character categories; tabular list of inclusions and four-character subcategories; morphology of neoplasms; special tabulation lists for mortality and morbidity; definitions; regulations, World Health Organization, 1992. See: http://apps.who.int/classifications/icd10/browse/2010/en

    [30] D. A. Lindberg, B. L. Humphreys, A. T. McCray, Methods Inf. Med. 1993, 32, 281–291.

    [31] L. M. Schriml, C. Arze, S. Nadendla, Y. W. Chang, M. Mazaitis, V. Felix, G. Feng, W. A. Kibbe, Nucleic Acids Res. 2012, 40, D940–D946.

    [32] a) A. Hamosh, A. F. Scott, J. S. Amberger, C. A. Bocchini, V. A. McKusick, Nucleic Acids Res. 2005, 33, D514–D517; b) A. D. Baxevanis, Curr. Protoc. Hum. Genet. 2012, 73, Unit 9.13.

    [33] S. Wild, G. Roglic, A. Green, R. Sicree, H. King, Diabetes Care 2004, 27, 1047–1053.

    [34] D. S. Wishart, C. Knox, A. C. Guo, R. Eisner, N. Young, B. Gautam, D. D. Hau, N. Psychogios, E. Dong, S. Bouatra, R. Mandal, I. Sinelnikov, J. Xia, L. Jia, J. A. Cruz, E. Lim, C. A. Sobsey, S. Shrivastava, P. Huang, P. Liu, L. Fang, J. Peng, R. Fradette, D. Cheng, D. Tzur, M. Clements, A. Lewis, A. De Souza, A. Zuniga, M. Dawe, Y. Xiong, D. Clive, R. Greiner, A. Nazyrova, R. Shaykhutdinov, L. Li, H. J. Vogel, I. Forsythe, Nucleic Acids Res. 2009, 37, D603–D610.

    [35] C. Knox, V. Law, T. Jewison, P. Liu, S. Ly, A. Frolkis, A. Pon, K. Banco, C. Mak, V. Neveu, Y. Djoumbou, R. Eisner, A. C. Guo, D. S. Wishart, Nucleic Acids Res. 2011, 39, D1035–D1041.

    [36] R. Fechete, A. Heinzel, P. Perco, K. Monks, J. Sollner, G. Stelzer, S. Eder, D. Lancet, R. Oberbauer, G. Mayer, B. Mayer, Proteomics: Clin. Appl. 2011, 5, 354–366.

    [37] J. M. Rizzo, M. J. Buck, Cancer Prev. Res. 2012, 5, 887–900.

    [38] B. Rabbani, N. Mahdieh, K. Hosomichi, H. Nakaoka, I. Inoue, J. Hum. Genet. 2012, 57, 621–632.

    [39] D. Oz-Levi, B. Ben-Zeev, E. K. Ruzzo, Y. Hitomi, A. Gelman, K. Pelak, Y. Anikster, H. Reznik-Wolf, I. Bar-Joseph, T. Olender, A. Alkelai, M. Weiss, E. Ben-Asher, D. Ge, K. V. Shianna, Z. Elazar, D. B. Goldstein, E. Pras, D. Lancet, Am. J. Hum. Genet. 2012, 91, 1065–1072.

    [40] a) N. O. Stitziel, A. Kiezun, S. Sunyaev, Genome Biol. 2011, 12, 227; b) S. Coutant, C. Cabot, A. Lefebvre, M. Léonard, E. Prieur-Gaston, D. Campion, T. Lecroq, H. Dauchel, BMC Bioinformatics 2012, 13 (suppl 14), S9.

    [41] F. M. Watt, R. R. Driskell, Philos. Trans. R. Soc., B 2010, 365, 155–163.

    [42] M. D. West, C. Mason, Regener. Med. 2007, 2, 329–333.

    [43] I. Menashe, R. Aloni, D. Lancet, BMC Bioinformatics 2006, 7, 393.

    [44] N. Rappaport, N. Nativ, G. Stelzer, M. Twik, Y. Guan-Golan, T. I. Stein, I. Bahir, F. Belinky, C. P. Morrey, M. Safran, D. Lancet, Database 2013, in press; DOI: 10.1093/database/bat018.

    Received: December 10, 2012; Accepted: February 22, 2013
