An overview of the ontology landscape - BioMedBridges

Post on 03-Jun-2020

What ontologies exist, who builds them and what are they used for?

An overview of the ontology landscape

Robert Stevens, James Malone

robert.stevens@manchester.ac.uk, malone@ebi.ac.uk

Outline

• What do we need to describe?

• What exists to describe it?

• Are they any good?

• Ontology organisations

Dimensions of description

• The entities themselves – genes, proteins, processes, cells, properties

• The investigations that produced the entities

• The informational origins and history of those entities and their descriptions (data and provenance)

What entities exist to be described

• The actual “concrete” biological entities themselves: Proteins, genes, small molecules, cells, gross anatomy, etc etc

• The devices used to produce and measure them

• Properties of those entities: Size, Shape, colour, function, role, etc etc.

• The biological processes in which those biological entities take part.

• The measuring and analytical processes used on those biological entities.

• Sites on those biological entities: Shoulder region, a bit of the environment, the dorsal region of a mouse, etc etc.

• Information artefacts about all of the above: sequences, database records, who, what, when, where and how… lab protocols, etc etc.

Dividing things up from the top

Dividing things up from the top - process

• Gene Ontology (GO) biological process

• Gene Ontology (GO) molecular process

Dividing things up from the top - information

• Information Artifact Ontology (IAO)

• Software Ontology (SWO)

• Unit Ontology (UO)

Dividing things up from the top - material

• ChEBI

• Protein Ontology (PrO)

• Sequence Ontology (SO)

• Cell Type Ontology (CL)

• Uberon

• Foundational Model of Anatomy (FMA)

• NCBI Taxonomy

Dividing things up from the top - property

• GO Molecular Function

• Phenotypic Quality (PATO)

• Human Disease Ontology (HDO)

Dividing things up from the top - site

Gazetteer Ontology (GAZ)

We’ve covered most of what there is…

• We’ve chosen bits from a simple upper level ontology

• These are domain neutral descriptions of the entities in any domain of interest

• Top-level or upper ontologies give a common view on what discriminations to make…

• … and what relationships to use between them

• BFO, Simple top Bio

Ontologies in these dimensions

• Here we want a “space” covering these dimensions with ontologies splattered about

• Dimension 1: genotype to phenotype

• Dimension 2: investigations

• Dimension 3: information – IAO, prov, etc.

Reference vs Application Ontologies

• Ontologies developed for different uses

• Reference ontologies built with aim of becoming authority on given domain

• Application ontologies built towards specific application use cases, such as for tooling or database needs

• Application ontologies often consume reference ontologies
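The last point can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `REFERENCE` dictionaries stand in for real reference ontologies, and the `ANAT:` and `DIS:` identifiers are made up for illustration. The idea shown is only that an application ontology reuses terms by their original identifier and remembers where each came from, rather than minting its own copies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    term_id: str  # hypothetical identifier, not a real ontology ID
    label: str
    source: str   # name of the reference ontology that owns the term


# Toy stand-ins for reference ontologies (assumed content, illustration only).
REFERENCE = {
    "anatomy": {
        "ANAT:1": Term("ANAT:1", "liver", "anatomy"),
        "ANAT:2": Term("ANAT:2", "kidney", "anatomy"),
    },
    "disease": {
        "DIS:1": Term("DIS:1", "hepatitis", "disease"),
    },
}


@dataclass
class ApplicationOntology:
    name: str
    terms: dict = field(default_factory=dict)

    def import_term(self, ontology: str, term_id: str) -> Term:
        """Reuse a reference term, keeping its original ID and source."""
        term = REFERENCE[ontology][term_id]
        self.terms[term.term_id] = term
        return term

    def sources(self) -> list:
        """List the reference ontologies this application ontology consumes."""
        return sorted({t.source for t in self.terms.values()})


app = ApplicationOntology("expression-database")
app.import_term("anatomy", "ANAT:1")
app.import_term("disease", "DIS:1")
print(app.sources())
```

Only the terms the application needs are pulled in; the reference ontologies remain the authority for their labels and identifiers.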

Things we describe in Biology - Genes

• Gene Ontology - Gene biological processes, cellular components and molecular functions

• Seen as benchmark of success in bio-ontology

• Many ‘best practices’ have come out of the GO’s development, such as evidence codes, an obsolescence policy and community development
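Two of these practices can be sketched in a few lines of Python. This is an illustration under assumptions, not the GO's own data model: the `EX:` term identifiers and the gene name are made up, only three of the GO evidence codes are listed, and the `replaced_by` field mirrors the tag of that name used for obsolete terms in OBO-format files.

```python
from dataclasses import dataclass
from typing import Optional

# A small subset of GO evidence codes, enough for the sketch.
EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IEA": "Inferred from Electronic Annotation",
}


@dataclass
class Term:
    term_id: str  # hypothetical identifier
    label: str
    obsolete: bool = False
    replaced_by: Optional[str] = None  # set when an obsolete term has a successor


@dataclass
class Annotation:
    gene: str
    term_id: str
    evidence: str  # says how the annotation is supported

    def __post_init__(self):
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence}")


def resolve(term_id: str, terms: dict) -> str:
    """Follow replaced_by links from obsolete terms to a current term."""
    seen = set()
    while terms[term_id].obsolete and terms[term_id].replaced_by:
        if term_id in seen:
            raise ValueError("cycle in replaced_by links")
        seen.add(term_id)
        term_id = terms[term_id].replaced_by
    return term_id


terms = {
    "EX:1": Term("EX:1", "old process", obsolete=True, replaced_by="EX:2"),
    "EX:2": Term("EX:2", "current process"),
}
annotation = Annotation("geneA", resolve("EX:1", terms), "IDA")
print(annotation)
```

Keeping obsolete terms and pointing them at a replacement, rather than deleting them, means old annotations can still be interpreted.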

Things we describe in Biology - Phenotypes

• PATO – ‘phenotypic qualities’, i.e. physical properties of organisms

• Extremely wide range of classes, examples include colour, size, shape, odour, behaviour

• Phenotypes are important in understanding how genes interact with the environment (in producing phenotypes)

Image: Matzke MA, Matzke AJM (2004) Planting the Seeds of a New Paradigm. PLoS Biol 2(5): e133

Things we describe in Biology - Disease

• Majority of biomedical studies consider disease in some way

• Multiple terminologies for disease in biology

• SNOMED CT – Medical (clinical) terminology

• ICD-10 – Classification of disease and health problems

• NCI Thesaurus (not an ontology) - large, lots of textual definitions but less axiomatisation, disease subpart

• UMLS – set of controlled vocabularies describing medical concepts; very large, at >1 million biomedical concepts

• Human Disease Ontology – based on subset of UMLS, enriched with relationships and new concepts

Things we describe in Biology - Anatomy

• Anatomy is important for many reasons including:

• Understanding how genes relate to anatomical regions

• Understanding how disease affects anatomical systems

• Comparative anatomy, i.e. comparing how structures in different species are related

• Model organism anatomies, e.g.

• Mouse adult gross anatomy

• Human anatomy – FMA

• Drosophila Anatomy

• Arabidopsis thaliana

• Zebrafish

• C. elegans

Genes at work in different species’ anatomy

• DII gene orthologs implicated in development of different anatomical parts in multiple species

Mungall, C. et al (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biology 2012, 13:R5

Things we describe in Biology – Chemical Entities

• ChEBI - molecular entities focused on ‘small’ chemical compounds

• Janna will talk about this tomorrow

Things we describe in Biology – Cells

• Cell Ontology is an ontology of cell types

• CL merges information contained in species-specific anatomical ontologies as well as referencing ontologies such as:

• the Protein Ontology (PR) for uniquely expressed biomarkers

• Gene Ontology (GO) for the biological processes a cell type participates in.

Things we describe in Biology – Pathways

• Reactome is a database of pathways

• Has export to BioPax ontology to describe pathway elements

• Connects many biological concepts including nucleic acids, genes, disease and GO terms

OBO Foundry

• OBO = Open Biomedical Ontology

• The OBO Foundry seeks to organise ontologies in biomedicine that are curated by human experts

• Provides a set of principles for best practice

• Six OBO Foundry ontologies

• OBO library much bigger and there are many Foundry candidate ontologies

• Intrinsically, biology is interconnected yet many ontologies are not formally linked

• Ontology development is expensive – reducing overlap and improving collaboration would decrease this

• Modularity of domains would increase reusability

Let 100 flowers bloom vs Centralised collaboration

• 100 flowers bloom:

• Competition driven

• Application and data driven (often to local use cases)

• Requires no commitment to upper ontology framework

• Mapping between efforts can be costly (potentially exponential)

• Duplication of effort

• Centralised collaboration:

• Encourages collaboration and openness

• Aim to produce consensus model of domain knowledge

• Reducing overlap reduces duplicated effort

• Interoperability part of methodology

• Requires upper ontology commitment

• Development by committee can be inhibiting
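A rough count shows why mapping between independent efforts gets costly. The sketch below is a back-of-envelope illustration, not a cost model: it assumes n overlapping ontologies and counts only how many sets of mappings are needed, ignoring how large each set is or how often it must be maintained. On this count alone, mapping every ontology to every other grows quadratically, while mapping each to one shared reference grows linearly. The function names are hypothetical.

```python
def pairwise_mappings(n: int) -> int:
    """Mapping sets needed if every ontology is mapped to every other: n(n-1)/2."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Mapping sets needed if every ontology is mapped to one shared reference."""
    return n


for n in (3, 10, 50):
    print(n, pairwise_mappings(n), hub_mappings(n))
```

For 10 ontologies this is 45 pairwise mapping sets against 10 mappings to a shared reference, which is the trade-off behind the two approaches compared above.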