Posted on 03-Jun-2020
What ontologies exist, who builds them and what are they used for?
An overview of the ontology landscape
Robert Stevens, James Malone
robert.stevens@manchester.ac.uk, malone@ebi.ac.uk
Outline
• What do we need to describe?
• What exists to describe it?
• Are they any good?
• Ontology organisations
Dimensions of description
• The entities themselves – genes, proteins, processes, cells, properties
• The investigations that produced the entities
• The informational origins and history of those entities and their descriptions (data and provenance)
What entities exist to be described
• The actual “concrete” biological entities themselves: proteins, genes, small molecules, cells, gross anatomy, etc.
• The devices used to produce and measure them
• Properties of those entities: size, shape, colour, function, role, etc.
• The biological processes in which those biological entities take part.
• The measuring and analytical processes used on those biological entities.
• Sites on those biological entities: the shoulder region, a bit of the environment, the dorsal region of a mouse, etc.
• Information artefacts about all of the above: sequences, database records, lab protocols, and the who, what, when, where and how of their production, etc.
Dividing things up from the top
Dividing things up from the top - process
Gene Ontology (GO) biological process, Gene Ontology (GO) molecular process
Dividing things up from the top - information
Information Artifact Ontology (IAO), Software Ontology (SWO), Unit Ontology (UO)
Dividing things up from the top - material
ChEBI, Protein Ontology (PR), Sequence Ontology (SO), Cell Ontology (CL), Uberon, Foundational Model of Anatomy (FMA), NCBI Taxonomy
Dividing things up from the top - property
GO Molecular Function, Phenotypic Quality Ontology (PATO), Human Disease Ontology (HDO)
Dividing things up from the top - site
Gazetteer Ontology (GAZ)
We’ve covered most of what there is…
• We’ve chosen bits from a simple upper level ontology
• These are domain neutral descriptions of the entities in any domain of interest
• Top-level or upper ontologies give a common view on what discriminations to make…
• … and what relationships to use between them
• Examples: BFO (Basic Formal Ontology), Simple Top Bio
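To illustrate how an upper level gives every domain term a common top-level category, here is a minimal sketch. It assumes a toy is_a hierarchy whose top categories mirror the "dividing things up from the top" slides; the domain labels and the `upper_category` helper are hypothetical and not taken from BFO or any released ontology.

```python
# Upper-level categories, mirroring the "dividing things up" slides.
UPPER = {"process", "information", "material", "property", "site"}

# Hypothetical is_a links (child -> parent); labels are illustrative only.
IS_A = {
    "biological process": "process",
    "cell division": "biological process",
    "software": "information",
    "protein": "material",
    "cell": "material",
    "colour": "property",
    "geographic location": "site",
}


def upper_category(term: str) -> str:
    """Follow is_a links upwards until an upper-level category is reached."""
    seen = set()
    while term not in UPPER:
        if term in seen or term not in IS_A:
            raise ValueError(f"no upper-level category for {term!r}")
        seen.add(term)
        term = IS_A[term]
    return term


print(upper_category("cell division"))  # process
print(upper_category("colour"))         # property
```

Because every domain ontology hangs its classes under the same few categories, terms from different ontologies can at least be compared at that top level.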
Ontologies in these dimensions
• Here we want a “space” spanning these dimensions, with ontologies scattered across it
• Dimension 1: genotype to phenotype
• Dimension 2: investigations
• Dimension 3: information – IAO, PROV, etc.
Reference vs Application Ontologies
• Ontologies developed for different uses
• Reference ontologies are built with the aim of becoming the authority on a given domain
• Application ontologies built towards specific application use cases, such as for tooling or database needs
• Application ontologies often consume reference ontologies
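One common way an application ontology consumes a reference ontology is to import just the terms it needs along with their superclass chain, loosely in the spirit of MIREOT-style term import. The sketch below assumes a toy reference ontology stored as a dictionary; the `REF:` identifiers and the `extract_module` helper are placeholders, not a real ontology or tool.

```python
# Hypothetical reference ontology: term ID -> (label, parent IDs).
# The "REF:" identifiers are placeholders, not real ontology IDs.
REFERENCE = {
    "REF:0001": ("biological entity", []),
    "REF:0002": ("cell", ["REF:0001"]),
    "REF:0003": ("neuron", ["REF:0002"]),
    "REF:0004": ("hepatocyte", ["REF:0002"]),
    "REF:0005": ("tissue", ["REF:0001"]),
}


def extract_module(reference, seeds):
    """Copy the seed terms and all of their ancestors out of a reference ontology."""
    module = {}
    stack = list(seeds)
    while stack:
        term = stack.pop()
        if term in module:
            continue  # already copied (shared ancestor)
        label, parents = reference[term]
        module[term] = (label, list(parents))
        stack.extend(parents)
    return module


# An application that only needs "neuron" imports it with its superclass chain.
app_terms = extract_module(REFERENCE, ["REF:0003"])
print(sorted(app_terms))  # ['REF:0001', 'REF:0002', 'REF:0003']
```

The application keeps the reference identifiers unchanged, so its data stays interoperable with anything else annotated against the same reference ontology.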
Things we describe in Biology - Genes
• Gene Ontology - describes the biological processes, cellular components and molecular functions of gene products
• Seen as benchmark of success in bio-ontology
• Many ‘best practices’ have fallen out of the GO’s development, such as evidence codes, an obsolescence policy and community development
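Evidence codes and the obsolescence policy both matter when consuming GO annotations. As a minimal sketch, assuming annotations held as simple records and a hypothetical replacement table (the `TERM:` identifiers and the `clean_annotations` helper are invented), a consumer might drop electronically inferred annotations (evidence code IEA) and update terms that have been made obsolete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    evidence: str  # GO evidence code, e.g. "IDA" (direct assay) or "IEA" (electronic)


# Hypothetical obsolescence table: obsolete term -> replacement, or None when
# the term was made obsolete without a direct replacement.
REPLACED_BY = {"TERM:old": "TERM:new", "TERM:retired": None}


def clean_annotations(annotations, exclude_evidence=frozenset({"IEA"})):
    """Drop annotations with excluded evidence codes and update obsolete terms."""
    cleaned = []
    for ann in annotations:
        if ann.evidence in exclude_evidence:
            continue
        term = ann.term
        if term in REPLACED_BY:
            term = REPLACED_BY[term]
            if term is None:
                continue  # no direct replacement: dropped here (would need manual review)
        cleaned.append(Annotation(ann.gene, term, ann.evidence))
    return cleaned


raw = [
    Annotation("geneA", "TERM:old", "IDA"),
    Annotation("geneB", "TERM:other", "IEA"),
    Annotation("geneC", "TERM:retired", "IMP"),
]
print(clean_annotations(raw))
```

Whether to exclude IEA annotations is an analysis choice, which is why it is a parameter rather than hard-coded.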
Things we describe in Biology - Phenotypes
• PATO – ‘phenotypic qualities’, i.e. physical properties of organisms
• Extremely wide range of classes, examples include colour, size, shape, odour, behaviour
• Phenotypes are important for understanding how genes and the environment interact to produce observable traits
Image: Matzke MA, Matzke AJM (2004) Planting the Seeds of a New Paradigm. PLoS Biol 2(5): e133
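PATO qualities are typically combined with a bearer entity (for example an anatomical part) in an entity-quality (EQ) pattern. The sketch below assumes that pattern with plain string labels; the `Phenotype` record and `entities_with_quality` helper are hypothetical and use no real ontology identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    entity: str   # the bearer, e.g. an anatomy term
    quality: str  # a PATO-style quality label, e.g. "decreased size"


def entities_with_quality(phenotypes, quality):
    """Return the entities that bear a given quality, in input order."""
    return [p.entity for p in phenotypes if p.quality == quality]


observed = [
    Phenotype("eye", "red"),
    Phenotype("wing", "decreased size"),
    Phenotype("tail", "decreased size"),
]
print(entities_with_quality(observed, "decreased size"))  # ['wing', 'tail']
```

Sharing one vocabulary of qualities is what makes phenotypes on different structures, or in different species, queryable together.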
Things we describe in Biology - Disease
• Majority of biomedical studies consider disease in some way
• Multiple terminologies for disease exist in biomedicine
• SNOMED CT – Medical (clinical) terminology
• ICD-10 – Classification of disease and health problems
• NCI Thesaurus (not an ontology) - large, with many textual definitions but less axiomatisation; includes a disease subpart
• UMLS – a very large set of controlled vocabularies describing medical concepts (>1 million biomedical concepts)
• Human Disease Ontology – based on subset of UMLS, enriched with relationships and new concepts
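With several disease terminologies in use, a disease ontology that carries cross-references to the others can act as a translation hub. This is a minimal sketch under that assumption; every identifier below is invented, and `build_index` and `translate` are hypothetical helpers rather than part of any released tool.

```python
from collections import defaultdict

# Placeholder cross-references from a hub disease ontology to other
# vocabularies; every identifier here is invented for illustration.
XREFS = {
    "DO:example_1": ["ICD10:code_a", "UMLS:cui_a"],
    "DO:example_2": ["ICD10:code_b", "UMLS:cui_b", "SNOMEDCT:concept_b"],
}


def build_index(xrefs):
    """Invert the cross-references: external code -> hub terms that cite it."""
    index = defaultdict(set)
    for hub_id, targets in xrefs.items():
        for target in targets:
            index[target].add(hub_id)
    return index


def translate(code, target_prefix, xrefs, index):
    """Map a code in one vocabulary to codes in another, via the hub ontology."""
    result = set()
    for hub_id in index.get(code, ()):
        result.update(t for t in xrefs[hub_id] if t.split(":", 1)[0] == target_prefix)
    return result


INDEX = build_index(XREFS)
print(translate("ICD10:code_b", "UMLS", XREFS, INDEX))  # {'UMLS:cui_b'}
```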
Things we describe in Biology - Anatomy
• Anatomy is important for many reasons including:
• Understanding how genes relate to anatomical regions
• Understanding how disease affects anatomical systems
• Comparative anatomy, i.e. comparing how structures in different species are related
• Model organism anatomies, e.g.
• Mouse adult gross anatomy
• Human anatomy – FMA
• Drosophila Anatomy
• Arabidopsis thaliana
• Zebrafish
• C. elegans
Genes at work in the anatomy of different species
• Dll gene orthologs are implicated in the development of different anatomical parts in multiple species
Mungall, C. et al. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biology 13:R5
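Comparing gene activity across species requires linking species-specific anatomy terms to shared classes, which is the role a multi-species ontology such as Uberon plays. The sketch below assumes a hypothetical lookup table from (species, structure label) to a shared class label; the table, gene names and `group_by_shared_class` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from (species, species-specific anatomy label) to a
# shared multi-species class label, in the spirit of Uberon.
TO_SHARED = {
    ("mouse", "liver"): "liver",
    ("zebrafish", "liver"): "liver",
    ("human", "heart"): "heart",
    ("mouse", "heart"): "heart",
}


def group_by_shared_class(observations):
    """Group (gene, species, structure) observations by their shared anatomy class.

    Observations whose structure has no mapping are skipped.
    """
    grouped = defaultdict(set)
    for gene, species, structure in observations:
        shared = TO_SHARED.get((species, structure))
        if shared is not None:
            grouped[shared].add((species, gene))
    return dict(grouped)


expression = [
    ("geneX", "mouse", "liver"),
    ("geneX", "zebrafish", "liver"),
    ("geneY", "human", "heart"),
    ("geneZ", "fly", "wing"),  # unmapped in this toy table, so dropped
]
print(group_by_shared_class(expression))
```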
Things we describe in Biology – Chemical Entities
• ChEBI - molecular entities focused on ‘small’ chemical compounds
• Janna will talk about this tomorrow
Things we describe in Biology – Cells
• Cell Ontology is an ontology of cell types
• CL merges information contained in species-specific anatomical ontologies, and references other ontologies such as:
• the Protein Ontology (PR) for uniquely expressed biomarkers
• Gene Ontology (GO) for the biological processes a cell type participates in.
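Because CL classes point at terms in other ontologies, a cell type can be queried through those references. A minimal sketch, assuming each cell type simply stores sets of PR-style marker references and GO-style process references; all labels and identifiers below (`PR:protein_1`, `GO:process_1`, etc.) are placeholders, and `cell_types_in_process` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class CellType:
    label: str
    markers: set = field(default_factory=set)    # references to PR-style protein terms
    processes: set = field(default_factory=set)  # references to GO-style process terms


def cell_types_in_process(cell_types, process):
    """Return labels of cell types recorded as taking part in a process."""
    return sorted(c.label for c in cell_types if process in c.processes)


cells = [
    CellType("cell type A", markers={"PR:protein_1"}, processes={"GO:process_1"}),
    CellType("cell type B", markers={"PR:protein_2"}, processes={"GO:process_1", "GO:process_2"}),
    CellType("cell type C", markers={"PR:protein_1"}, processes={"GO:process_2"}),
]
print(cell_types_in_process(cells, "GO:process_2"))  # ['cell type B', 'cell type C']
```

In CL itself these links are expressed as logical axioms rather than Python sets; the sketch only shows why referencing shared identifiers makes such queries possible.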
Things we describe in Biology – Pathways
• Reactome is a database of pathways
• Exports to the BioPAX ontology to describe pathway elements
• Connects many biological concepts including nucleic acids, genes, disease and GO terms
OBO Foundry
• OBO = Open Biomedical Ontology
• The OBO Foundry seeks to organise expertly curated ontologies in biomedicine
• Provides a set of principles for best practice
• Six OBO Foundry ontologies
• The OBO library is much bigger, and there are many candidate Foundry ontologies
• Intrinsically, biology is interconnected, yet many ontologies are not formally linked
• Ontology development is expensive – reducing overlap and improving collaboration would decrease this
• Modularity of domains would increase reusability
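Shared identifier conventions are part of what lets OBO ontologies link to one another: a term written as a compact identifier such as `GO:0008150` corresponds to a persistent URL under `http://purl.obolibrary.org/obo/`, with the colon replaced by an underscore. The sketch below converts between the two forms; it assumes numeric local IDs and prefixes without underscores, and the helper names are hypothetical.

```python
import re

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"
# Assumes an alphanumeric prefix and a numeric local ID, e.g. GO:0008150.
CURIE_RE = re.compile(r"^([A-Za-z][A-Za-z0-9]*):([0-9]+)$")


def curie_to_purl(curie: str) -> str:
    """Expand PREFIX:digits into an OBO PURL of the form .../obo/PREFIX_digits."""
    match = CURIE_RE.match(curie)
    if match is None:
        raise ValueError(f"not a PREFIX:digits identifier: {curie!r}")
    prefix, local_id = match.groups()
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def purl_to_curie(purl: str) -> str:
    """Contract an OBO PURL back into PREFIX:digits."""
    if not purl.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO PURL: {purl!r}")
    prefix, sep, local_id = purl[len(OBO_PURL_BASE):].rpartition("_")
    if not sep or not prefix or not local_id.isdigit():
        raise ValueError(f"unexpected OBO PURL shape: {purl!r}")
    return f"{prefix}:{local_id}"


print(curie_to_purl("GO:0008150"))  # http://purl.obolibrary.org/obo/GO_0008150
```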
Let 100 flowers bloom vs Centralised collaboration
• 100 flowers bloom:
• Competition driven
• Application and data driven (often to local use cases)
• Requires no commitment to upper ontology framework
• Mapping between efforts can be costly (the number of pairwise mappings grows quadratically with the number of ontologies)
• Duplication of effort
• Centralised collaboration:
• Encourages collaboration and openness
• Aims to produce a consensus model of domain knowledge
• Reducing overlap reduces duplicated effort
• Interoperability part of methodology
• Requires upper ontology commitment
• Development by committee can be inhibiting
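The mapping-cost argument can be put in rough numbers. Under the simplifying assumption that every pair of independently developed ontologies needs its own mapping, n ontologies need n(n-1)/2 mappings; if instead each ontology is aligned once to a shared reference, only n are needed. The helper names below are illustrative.

```python
def pairwise_mappings(n: int) -> int:
    """Mappings needed if every pair of n ontologies is mapped directly."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Mappings needed if each of n ontologies maps once to a shared reference."""
    return n


for n in (5, 10, 50):
    print(n, pairwise_mappings(n), hub_mappings(n))
# 5 10 5
# 10 45 10
# 50 1225 50
```

Real mappings are rarely all-or-nothing, so this only indicates how the effort scales, not its actual size.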