Ontology and the Future of Biomedical Research Barry Smith .
Date posted: 19-Dec-2015
Ontology and the Future of Biomedical Research
Barry Smith
http://ifomis.org
Institute for Formal Ontology and Medical Information Science
Saarland University
From chromosome
to disease
Problem: how to reason with data deriving from different sources, each of which uses its own system of classification?
Solution: Ontology!
Examples of current needs for ontologies in biomedicine
– to enforce semantic consistency within a database
– to enable data sharing and re-use
– to enable data integration (bridging across data at multiple granularities)
– to allow querying
What is needed
– strong general-purpose classification hierarchies created by domain specialists
– clear, rigorous definitions
– thoroughly tested in real use cases
– updated in light of scientific advance
The actuality (too often)
myriad special purpose ‘light’ ontologies, prepared by ontology engineers and deposited in internet ‘repositories’ or ‘registries’
ontologies for ‘agent’
General trend
on the part of NIH, FDA and other bodies to consolidate ontology-based standards for the communication and processing of biomedical data.
Responses to this trend
Old: UMLS (Unified Medical Language System) – rooted in the faithfulness to the ways language is used by different medical communities
SNOMED
DEMONS
U M L S
– congenital absent nipple is_a nipple
– cancer documentation is_a cancer
– disease prevention is_a disease
– repair and maintenance of wheelchair is_a disease
– water is_a nursing phenomenon
– part-whole =def. a nursing phenomenon with topology part-whole
U M L S
MeSH
MeSH Descriptors
  Index Medicus Descriptor
    Anthropology, Education, Sociology and Social Phenomena (MeSH Category)
      Social Sciences
        Political Systems
          National Socialism
MeSH
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...
National Socialism is_a Social Sciences
National Socialism is_a MeSH Descriptors
New: Semantic Web deposits
Pet Profile Ontology
Review Vocabulary
Band Description Vocabulary
Musical Baton Vocabulary
MusicBrainz Metadata Vocabulary
Kissology
http://www.w3.org/
Beer Ontology
all instances of hops that have ever existed are necessarily ingredients of beer.
some nice computational resources, but low expressivity and few genuinely scientific demonstration cases
OWL-based ontologies …
OWL’s syntactic regimentation is not enough to ensure high-quality
ontologies
– the use of a common syntax and logical machinery and the careful separating out of ontologies into namespaces does not solve the problem of ontology integration
Both UMLS- and OWL-type responses involve ad hoc creation of new terminologies by each community
Many of these terminologies remain as torsos, gather dust, poison the wells, ...
How to do better?
How to create the conditions for a step-by-step evolution towards high-quality ontologies in the biomedical domain, which will serve as stable attractors for clinical and biomedical researchers in the future?
A basic distinction
type vs. instance
science text vs. clinical document
dog vs. Fido
Instances are not represented in an ontology built for scientific purposes
It is the generalizations that are important
(but instances must still be taken into account)
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt
Catalog vs. inventory
Ontology Types Instances
Ontology = A Representation of Types
Each node of an ontology consists of:
• preferred term (aka term)
• term identifier (TUI, aka CUI)
• synonyms
• definition, glosses, comments
Each term in an ontology represents exactly one type
hence ontology terms should be singular nouns
National Socialism is_a Political Systems
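The node structure listed above can be sketched as a small record type. This is an illustrative Python sketch only, not a format used by any OBO tool: the class name `OntologyNode`, the identifier `EX:0000001`, the synonym, and the `looks_plural` heuristic are all assumptions made for the example. The heuristic is deliberately crude; it only flags candidates such as 'Political Systems' for a human curator to review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyNode:
    """One node of an ontology (hypothetical layout, for illustration)."""
    preferred_term: str
    identifier: str
    synonyms: tuple = ()
    definition: str = ""


def looks_plural(term: str) -> bool:
    """Crude check: does the last word of the term look like an English plural?"""
    last_word = term.split()[-1].lower()
    # Words such as 'process', 'nucleus', 'mitosis' end in s but are singular.
    if last_word.endswith(("ss", "us", "is")):
        return False
    return last_word.endswith("s")


node = OntologyNode(
    preferred_term="political system",
    identifier="EX:0000001",  # made-up identifier
    synonyms=("system of government",),  # made-up synonym
    definition="(textual definition would go here)",
)
```

A term such as 'Political Systems' is flagged by `looks_plural`, while 'political system' and 'biological process' are not.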
An ontology is a representation of types
We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories – which describe not what is particular in reality but rather what is general
Ontologies need to exploit the evolutionary path to convergence created by science
High quality shared ontologies build communities
NIH, FDA trend to consolidate ontology-based standards for the communication and processing of biomedical data.
caBIG / NECTAR / BIRN / BRIDG ...
http://obo.sourceforge.net
http://www.geneontology.org/
The Methodology of Annotations
GO employs scientific curators, who use experimental observations reported in the biomedical literature to link gene products with GO terms in annotations.
This gene product exercises this function, in this part of the cell, leading to these biological processes
The Methodology of Annotations
This process of annotating literature leads to improvements and extensions of the ontology, which in turn leads to better annotations
This institutes a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself.
Annotations + ontology taken together yield a slowly growing computer-interpretable map of biological reality.
The OBO Foundry
A subset of OBO ontologies, whose developers have agreed in advance to accept a common set of principles designed to ensure
– intelligibility to biologists (curators, annotators, users)
– formal robustness
– stability
– compatibility
– interoperability
– support for logic-based reasoning
Custodians
• Michael Ashburner (Cambridge)
• Suzanna Lewis (Berkeley)
• Barry Smith (Buffalo/Saarbrücken)
A collaborative experiment
participants have agreed in advance to a growing set of principles specifying best practices in ontology development, designed to guarantee interoperability of ontologies from the very start
The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement. They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single reference ontology.
Initial Candidate Members of the OBO Foundry
– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– ChEBI Chemical Ontology
– PATO Phenotype Ontology
– FuGO Functional Genomics Investigation Ontology
– FMA Foundational Model of Anatomy
– RO Relation Ontology
Under development
– Disease Ontology
– NCI Thesaurus
– Mammalian Phenotype Ontology
– OBO-UBO / Ontology of Biomedical Reality
– Organism (Species) Ontology
– Plant Trait Ontology
– Protein Ontology
– RnaO RNA Ontology
Considered for development
– Environment Ontology
– Behavior Ontology
– Biomedical Image Ontology
– Clinical Trial Ontology
CRITERIA
The ontology is open and available to be used by all.
The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
The ontology is in, or can be instantiated in, a common formal language.
The ontology possesses a unique identifier space within OBO.
The ontology provider has procedures for identifying distinct successive versions.
The ontology includes textual definitions for all terms.
The ontology has a clearly specified and clearly delineated content.
The ontology is well-documented.
The ontology has a plurality of independent users.
The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*
*Genome Biology 2005, 6:R46
Further criteria will be added over time in order to bring about a gradual improvement in the quality of the ontologies in the Foundry
A reference ontology
is analogous to a scientific theory; it seeks to optimize representational adequacy to its subject matter to the maximal degree that is compatible with the constraints of computational usefulness.
An application ontology
is comparable to an engineering artifact such as a software tool. It is constructed for a specific practical purpose.
Examples:
– National Cancer Institute Thesaurus
– FuGO Functional Genomics Investigation Ontology
Reference Ontology vs. Application Ontology
Currently, application ontologies are often built afresh for each new task, commonly introducing not only idiosyncrasies of format or logic but also simplifications or distortions of their subject matter. To solve this problem, application ontology development should always take place against the background of a formally robust reference ontology framework.
Advantages of the methodology of shared coherently defined ontologies
• promotes quality assurance (better coding)
• guarantees automatic reasoning across ontologies and across data at different granularities
• yields direct connection to temporally indexed instance data
We know that high-quality ontologies can help in creating better mappings e.g. between human and model organism phenotypes
S Zhang, O Bodenreider, “Alignment of Multiple Ontologies of Anatomy: Deriving Indirect Mappings from Direct Mappings to a Reference Ontology”, AMIA 2005
Once the interoperable gold standard reference ontologies are there, it will make sense to reformulate parts of existing incompatible terminologies (e.g. in UMLS) in terms of the standard ontologies in order to achieve greater domain coverage and alignment of different but veridical views. Thus not everything that was done in the past turns out to be a waste.
Goal: to create a family of gold standard reference ontologies upon which terminologies developed for specific applications can draw
Goal: to introduce the scientific method into ontology development:
– all Foundry ontologies must be constantly updated in light of scientific advance
– all Foundry ontology developers must work with all other Foundry ontology developers in a spirit of scientific collaboration
Goal: to replace the current policy of ad hoc creation of new database schemas by each clinical research group by providing reference ontologies in terms of which database schemas can be defined
Goal: to introduce some of the features of scientific peer review into biomedical ontology development
Goal: to create controlled vocabularies for use by clinical trial banks, clinical guidelines bodies, scientific journals, ...
Goal: to create an evolving map-like representation of the entire domain of biological reality
GO’s three ontologies
molecular function
cellular component
biological process
[Figure, built up over successive slides: grid of biomedical ontology domains. Labels: cell (types); molecular function (GO); species; molecular process; cellular anatomy (GO); anatomy (fly, fish, human ...); cellular physiology; organism-level physiology; ChEBI, Sequence, RNA ...; normal (functionings) vs. pathological (malfunctionings); pathophysiology (disease); pathoanatomy (fly, fish, human ...); phenotype; investigation (FuGO)]
End
First step
Alignment of OBO Foundry ontologies through a common system of formally defined relations in the OBO Relation Ontology
See “Relations in Biomedical Ontologies”, Genome Biology Apr. 2005
Judith Blake:
“The use of bio-ontologies … ensures consistency of data curation, supports extensive data integration, and enables robust exchange of information between heterogeneous informatics systems. … ontologies … formally define relationships between the concepts.”
"Gene Ontology: Tool for the Unification of Biology"
an ontology "comprises a set of well-defined terms with well-defined relationships" (Ashburner et al., 2000, p. 27)
is_a (sensu UMLS)
A is_a B =def ‘A’ is narrower in meaning than ‘B’
grows out of the heritage of dictionaries
(which ignore the basic distinction between types and instances)
is_a
– congenital absent nipple is_a nipple
– cancer documentation is_a cancer
– disease prevention is_a disease
– Nazism is_a social science
is_a (sensu logic)
A is_a B =def For all x, if x instance_of A then x instance_of B
cell division is_a biological process
adult is_a child ???
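The logical reading of is_a can be illustrated over a finite set of recorded instances. This is a minimal Python sketch under stated assumptions: the instance names and the dictionary `instances_of` are invented toy data, and a check over recorded instances can only refute an is_a assertion, never establish it, since the definition quantifies over all instances.

```python
def is_a(instances_of: dict, a: str, b: str) -> bool:
    """A is_a B: every recorded instance of A is also an instance of B."""
    return instances_of[a] <= instances_of[b]


# Toy instance data; the instance names are made up for illustration.
instances_of = {
    "cell division": {"division_1", "division_2"},
    "biological process": {"division_1", "division_2", "ovulation_1"},
}

holds = is_a(instances_of, "cell division", "biological process")
converse = is_a(instances_of, "biological process", "cell division")
```

On this toy data the assertion holds in one direction only, because `ovulation_1` is a biological process that is not a cell division.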
Two kinds of entities
occurrents (processes, events, happenings)
  cell division, ovulation, death
continuants (objects, qualities, ...)
  cell, ovum, organism, temperature of organism, ...
is_a (for occurrents)
A is_a B =def For all x, if x instance_of A then x instance_of B
cell division is_a biological process
is_a (for continuants)
A is_a B =def For all x, t if x instance_of A at t then x instance_of B at t
abnormal cell is_a cell
adult human is_a human
but not: adult is_a child
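Why the time index matters for continuants can be shown with a small Python sketch. Everything here is an assumption made for illustration: the triples in `facts` (instance, type, time), the instance name `mary`, and the two helper functions. The untimed check wrongly accepts 'adult is_a child' because Mary instantiates both types at some time or other; the time-indexed check rejects it.

```python
# Toy data: (instance, type, time) triples; names and times are made up.
facts = {
    ("mary", "child", 1), ("mary", "human", 1),
    ("mary", "adult", 2), ("mary", "human", 2),
}


def is_a_timed(facts, a, b):
    """For all x, t: if x instance_of A at t then x instance_of B at t."""
    return all((x, b, t) in facts for (x, ty, t) in facts if ty == a)


def is_a_untimed(facts, a, b):
    """Same check with the time index dropped (not adequate for continuants)."""
    def instances(type_name):
        return {x for (x, ty, _) in facts if ty == type_name}
    return instances(a) <= instances(b)
```

Here `is_a_timed(facts, "adult", "human")` holds, `is_a_timed(facts, "adult", "child")` fails, and `is_a_untimed(facts, "adult", "child")` holds, which is the error the time-indexed definition is meant to exclude.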
Part_of as a relation between types is more problematic than is standardly supposed
heart part_of human being ?
human heart part_of human being ?
human being has_part human testis ?
human testis part_of human being ?
two kinds of parthood
1. between instances:
   Mary’s heart part_of Mary
   this nucleus part_of this cell
2. between types:
   human heart part_of human
   cell nucleus part_of cell
Definition of part_of as a relation between types
A part_of B =Def all instances of A are instance-level parts of some instance of B
ALL–SOME STRUCTURE
part_of (for occurrents)
A part_of B =Def For all x, if x instance_of A then there is some y, y instance_of B and x part_of y
where ‘part_of’ is the instance-level part relation
part_of (for continuants)
A part_of B =Def For all x, t if x instance_of A at t then there is some y, y instance_of B at t and x part_of y
where ‘part_of’ is the instance-level part relation
ALL-SOME STRUCTURE
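The all-some pattern can be sketched in Python as follows. This is an illustration on invented data, not the Relation Ontology's own implementation: the instance names, the `instance_part_of` set of pairs, and the mirrored `type_has_part` helper are assumptions (time indices are omitted for brevity). Note that a type with no recorded instances satisfies the check vacuously.

```python
def type_part_of(instances_of, instance_part_of, a, b):
    """A part_of B: every instance of A is part of some instance of B."""
    return all(
        any((x, y) in instance_part_of for y in instances_of[b])
        for x in instances_of[a]
    )


def type_has_part(instances_of, instance_part_of, a, b):
    """A has_part B (assumed mirror form): every instance of A has some instance of B as part."""
    return all(
        any((y, x) in instance_part_of for y in instances_of[b])
        for x in instances_of[a]
    )


# Toy data; instance names are made up.
instances_of = {
    "human testis": {"johns_testis"},
    "human being": {"john", "mary"},
}
instance_part_of = {("johns_testis", "john")}  # (part, whole) pairs
```

On this data 'human testis part_of human being' holds, while 'human being has_part human testis' fails because `mary` has no such part. This is why the two type-level assertions are not interchangeable.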
How to use the OBO Relation Ontology
Ontologies are representations of types and of the relations between types
The definitions of these relations involve reference to times and instances, but these references are washed out when we get to the assertions (edges) in the ontology
But curators should still be aware of the underlying definitions when formulating such assertions
A part_of B, B part_of C ...
The all-some structure of such definitions allows cascading of inferences (true path rule)
(i) within ontologies
(ii) between ontologies
(iii) between ontologies and repositories of instance data
Strengthened true path rule
Whichever A you choose, the instance of B of which it is a part will be included in some C, which will include as part also the A with which you began
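Cascading of this kind can be sketched as a transitive closure over type-level part_of edges. The Python below is a minimal sketch under assumptions: the function name `part_of_closure` is hypothetical, and the edge 'nucleolus part_of cell nucleus' is added for illustration alongside the 'cell nucleus part_of cell' example used earlier.

```python
def part_of_closure(edges):
    """Return all part_of assertions derivable by chaining A part_of B, B part_of C."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # Chain two edges that meet in the middle.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


asserted = {
    ("nucleolus", "cell nucleus"),  # illustrative assumption
    ("cell nucleus", "cell"),
}
inferred = part_of_closure(asserted)
```

The closure contains the derived edge `("nucleolus", "cell")` in addition to the two asserted edges. The inference is only sound because each edge has the all-some form.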
The same principle applies to the other relations in the OBO-RO:
located_in, transformation_of, derives_from, adjacent_to, etc.
Kinds of relations
Between types:
– is_a, part_of, ...
Between an instance and a type:
– this explosion instance_of the type explosion
Between instances:
– Mary’s heart part_of Mary
In every ontology some terms and some relations are primitive = they cannot be defined (on pain of infinite regress)
Examples of primitive relations:
– identity
– instantiation
– (instance-level) part_of
– (instance-level) continuous_with
Fiat and bona fide boundaries
Continuity
Attachment
Adjacency
everything here is an independent continuant
structures vs. formations = bona fide vs. fiat boundaries
Modes of Connection
The body is a highly connected entity.
Exceptions: cells floating free in blood.
Modes of Connection
Modes of connection:
– attached_to (muscle to bone)
– synapsed_with (nerve to nerve, nerve to muscle)
– continuous_with (= share a fiat boundary)
[Figure labels: articular eminence; articular (glenoid) fossa; ANTERIOR]
Attachment, location, containment
Containment involves relation to a hole or cavity
1: cavity
2: tunnel, conduit (artery)
3: mouth; a snail’s shell
Fiat vs. Bona Fide Boundaries
Fiat boundary Physical boundary
Double Hole Structure
Medium (filling the environing hole)
Tenant (occupying the central hole)
Retainer (a boundary of some surrounding structure)
[Figure: the temporomandibular joint. Labels: head of condyle; neck of condyle; fossa; fiat boundary]
continuous_with (a relation between instances which share a fiat boundary)
is always symmetric:
if x continuous_with y, then y continuous_with x
continuous_with (relation between types)
A continuous_with B =Def. for all x, if x instance_of A then there is some y such that y instance_of B and x continuous_with y
continuous_with is not always symmetric
Consider lymph node and lymphatic vessel:
Each lymph node is continuous with some lymphatic vessel, but there are lymphatic vessels (e.g. lymphs and lymphatic trunks) which are not continuous with any lymph nodes
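The lymph node case can be made concrete with a short Python sketch. The instance names (`node_1`, `vessel_1`, `trunk_1`) and helper names are invented for the example. The instance-level relation is stored symmetrically, yet the type-level assertion built on the all-some pattern holds in one direction only.

```python
def symmetric(pairs):
    """Close a set of instance-level pairs under symmetry."""
    return set(pairs) | {(y, x) for (x, y) in pairs}


def holds_all_some(instances_of, relation, a, b):
    """Type-level A rel B: every instance of A is related to some instance of B."""
    return all(
        any((x, y) in relation for y in instances_of[b])
        for x in instances_of[a]
    )


# Toy data; instance names are made up.
instances_of = {
    "lymph node": {"node_1"},
    "lymphatic vessel": {"vessel_1", "trunk_1"},
}
continuous_with = symmetric({("node_1", "vessel_1")})
```

Here `holds_all_some(instances_of, continuous_with, "lymph node", "lymphatic vessel")` is true, but the converse is false because `trunk_1` is not continuous with any lymph node.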
Adjacent_to as a relation between types is not symmetric
Consider: seminal vesicle adjacent_to urinary bladder
Not: urinary bladder adjacent_to seminal vesicle
instance level: this nucleus is adjacent to this cytoplasm
implies: this cytoplasm is adjacent to this nucleus
type level: nucleus adjacent_to cytoplasm
Not: cytoplasm adjacent_to nucleus
Applications
Expectations of symmetry, e.g. for protein-protein interactions, may hold only at the instance level
if A interacts with B, it does not follow that B interacts with A
if A is expressed simultaneously with B, it does not follow that B is expressed simultaneously with A
[Figure: transformation_of. The same instance c is shown under type C1 at time t and under type C at time t1, along a time axis. Examples: pre-RNA → mature RNA; child → adult]
transformation_of
A transformation_of B =Def. Every instance of A was at some earlier time an instance of B
adult transformation_of child
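The definition can be checked against time-indexed instance data, as in the following Python sketch. The triples in `facts`, the instance names, and the helper `transforms_into` (the forward-looking counterpart, which is not an OBO relation defined here) are assumptions for illustration.

```python
# Toy data: (instance, type, time) triples; names and times are made up.
facts = {
    ("mary", "child", 1), ("mary", "adult", 2),
    ("tom", "child", 1),  # tom has no recorded adult stage
}


def transformation_of(facts, a, b):
    """Every instance of A was at some earlier time an instance of B."""
    return all(
        any(x2 == x and ty2 == b and t2 < t for (x2, ty2, t2) in facts)
        for (x, ty, t) in facts if ty == a
    )


def transforms_into(facts, a, b):
    """Hypothetical converse: every instance of A is at some later time an instance of B."""
    return all(
        any(x2 == x and ty2 == b and t2 > t for (x2, ty2, t2) in facts)
        for (x, ty, t) in facts if ty == a
    )
```

On this data `transformation_of(facts, "adult", "child")` holds, while `transforms_into(facts, "child", "adult")` fails because not every child is recorded as becoming an adult. Only the first assertion has the universal form.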
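[Figure: transformation_of along a time axis, with instance c at t and at t1 under types C and C1. Example: tumor development]
derives_from
[Figure: instance c of type C and instance c' of type C' at time t; instance c1 of type C1 at time t1. Example: zygote derives_from ovum, sperm]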
fusion: two continuants fuse to form a new continuant
[Figure: c of type C and c' of type C' at t; c1 of type C1 at t1]
fission: one initial continuant is replaced by two successor continuants
[Figure: c of type C at t; two successors, of types C1 and C2, at t1]
budding: one continuant detaches itself from an initial continuant, which itself continues to exist
[Figure: c of type C at t and at t1; c1 of type C1 detaches]
capture: one continuant absorbs a second continuant while itself continuing to exist
[Figure: c of type C at t and at t1; c' of type C' at t is absorbed]
A suite of defined relations between types
Foundational:  is_a, part_of
Spatial:       located_in, contained_in, adjacent_to
Temporal:      transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
To be added to the Relation Ontology
lacks (between an instance and a type, e.g. this fly lacks wings)
dependent_on (between a dependent entity and its carrier or bearer)
quality_of (between a dependent and an independent continuant)
functioning_of (between a process and an independent continuant)
Low Hanging Fruit
Ontologies should include only those relational assertions which hold universally (= have the ALL-SOME form)
Often, order will matter here:
We can include
  adult transformation_of child
but not
  child transforms_into adult
The Gene Ontology
GO’s three ontologies
molecular functions
cellular components
biological processes
When a gene is identified
three types of questions need to be addressed:
1. Where is it located in the cell?
2. What functions does it have on the molecular level?
3. To what biological processes do these functions contribute?
Three granularities:
– Cellular (for components)
– Molecular (for functions)
– Organ + organism (for processes)
GO has cells
but it does not include terms for molecules or organisms within any of its three ontologies
except e.g. GO:0018995 host =Def. Any organism in which another organism spends part or all of its life cycle
Are the relations between functions and processes a matter of granularity?
Molecular activities are the ‘building blocks’ of biological processes ?
But they are not allowed to be represented in GO as parts of biological processes
What does “function” mean?
an entity has a biological function if and only if it is part of an organism and has a disposition to act reliably in such a way as to contribute to the organism’s survival
the function is this disposition
Improved version
an entity has a biological function if and only if it is part of an organism and has a disposition to act reliably in such a way as to contribute to the organism’s realization of the canonical life plan for an organism of that type
This canonical life plan might include
– canonical embryological development
– canonical growth
– canonical reproduction
– canonical aging
– canonical death
The function of the heart is to pump blood
Not every activity (process) in an organism is the exercise of a function. There are:
– malfunctionings
– side-effects (heart beating)
– accidents (external interference)
– background stochastic activity
Kidney
Nephron
Functional Segments
Functions
This is a screwdriver
This is a good screwdriver
This is a broken screwdriver
This is a heart
This is a healthy heart
This is an unhealthy heart
Functions are associated with certain characteristic process shapes
Screwdriver: rotates and simultaneously moves forward, transferring torque from hand and arm to screw
Heart: performs a contracting movement inwards and an expanding movement outwards
Not functioning at all
leads to death, modulo
internal factors:
– plasticity
– redundancy (2 kidneys)
– criticality of the system involved
external factors:
– prosthesis (dialysis machines, oxygen tent)
– special environments
– assistance from other organisms
What clinical medicine is for
to eliminate malfunctioning by fixing broken body parts
(or to prevent the appearance of malfunctioning by intervening e.g. at the molecular level)
Hypothesis: there are no ‘bad’ functions
It is not the function of an oncogene to cause cancer
Oncogenes were in every case proto-oncogenes with functions of their own
They become oncogenes because of bad (non-prototypical) environments
Is there an exception for molecular functions?
Does this apply only to functions on biological levels of granularity
(= levels of granularity coarser than the molecule) ?
If pathology is the deviation from (normal) functioning, does it make sense to talk of a pathological molecule?
(Pathologically functioning molecule vs. pathologically structured molecule)
A molecular function is a propensity of a gene product instance to perform actions on the molecular level of granularity.
Hypothesis 1: these actions must be reliably such as to contribute to biological processes.
Hypothesis 2: these actions must be reliably such as to contribute to the organism’s realization of the canonical life plan for an organism of that type.
The Gene Ontology
is a canonical ontology – it represents only what is normal in the realm of molecular functioning
The GO is a canonical representation
“The Gene Ontology is a computational representation of the ways in which gene products normally function in the biological realm”
Nucl. Acids Res. 2006: 34.
The FMA is a canonical representation
It is a computational representation of types and relations between types deduced from the qualitative observations of the normal human body, which have been refined and sanctioned by successive generations of anatomists and presented in textbooks and atlases of structural anatomy.
The importance of pathways (successive causality)
Each stage in the history of a disease presupposes the earlier stages
Therefore need to reason across time, tracking the order of events in time, using relations such as derives_from, transformation_of ...
Need pathway ontologies on every level of granularity
The importance of granularity (simultaneous causality)
Networks are continuants
At any given time there are networks existing in the organism at different levels of granularity
Changes in one cause simultaneous changes in all the others
(Compare Gay-Lussac’s law: at constant volume, a rise in temperature causes a simultaneous increase in pressure)
The Granularity Gulf
most existing data-sources are of fixed, single granularity
many (all?) clinical phenomena cross granularities
Therefore need to reason across time, tracking the order of events in time
Good ontologies require:
consistent use of terms, supported by logically coherent (non-circular) definitions, in equivalent human-readable and computable formats
coherent shared treatment of relations to allow cascading inference both within and between ontologies
Three fundamental dichotomies
• continuants vs. occurrents
• dependent vs. independent
• types vs. instances
ONTOLOGIES ARE REPRESENTATIONS OF TYPES
aka kinds, universals, categories, species, genera, ...
Continuants (aka endurants)
– have continuous existence in time
– preserve their identity through change
– exist in toto whenever they exist at all
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases
– exist only in their phases
You are a continuant
Your life is an occurrent
You are 3-dimensional
Your life is 4-dimensional
Dependent entities
require independent continuants as their bearers
There is no run without a runner
There is no grin without a cat
Dependent vs. independent continuants
Independent continuants (organisms, cells, molecules, environments)
Dependent continuants (qualities, shapes, roles, propensities, functions)
All occurrents are dependent entities
They are dependent on those independent continuants which are their participants (agents, patients, media ...)
Top-Level Ontology
= A representation of top-level types
[Figure, built up over successive slides. Continuant divides into Independent Continuant and Dependent Continuant; Occurrent is always dependent on one or more independent continuants. GO labels placed in the diagram: cell component, molecular function, biological process. Further labels: Quality, Function, Spatial Region; Functioning; Side-Effect, Stochastic Process, ...; instances (in space and time)]
Smith B, Ceusters W, Kumar A, Rosse C. On Carcinomas and Other Pathological Entities, Comp Functional Genomics, Apr. 2006
everything here is an independent continuant
Functions, etc.
Some dependent continuants are realizable
– expression of a gene
– application of a therapy
– course of a disease
– execution of an algorithm
– realization of a protocol
Functions vs Functionings
the function of your heart = to pump blood in your body
this function is realized in processes of pumping blood
not all functions are realized (consider the function of this sperm ...)
Concepts
Biomedical ontology integration will never be achieved through integration of meanings or concepts
The problem is precisely that different user communities use different concepts
Concepts are in your head and will change as your understanding changes
Concepts
Ontologies represent types: not concepts, meanings, ideas ...
Types exist, with their instances, in objective reality
– including types of image, of imaging process, of brain region, of clinical procedure, etc.
Rules on types
Don’t confuse types with words
Don’t confuse types with concepts
Don’t confuse types with ways of getting to know types
Don’t confuse types with ways of talking about types
Don’t confuse types with data about types
Some other simple rules for high quality ontologies
Univocity
Terms should have the same meanings on every occasion of use.
They should refer to the same kinds of entities in reality.
Basic ontological relations such as is_a and part_of should be used in the same way by all ontologies.
Positivity
Complements of types are not themselves types. Hence terms such as
– non-mammal
– non-membrane
– other metalworker in New Zealand
do not designate types in reality
Ontology of types ≠ logic of terms
There are no conjunctive and disjunctive types:
– anatomic structure, system, or substance
– musculoskeletal and connective tissue disorder
– rheumatism, excluding the back
Objectivity
Which types exist in reality is not a function of our knowledge.
Terms such as
– unknown
– unclassified
– unlocalized
– arthropathies not otherwise specified
do not designate types in reality.
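Rules such as positivity and objectivity lend themselves to simple automated screening of term labels. The Python sketch below is a rough heuristic only, with hypothetical rule names and regular expressions chosen to match the examples given on these slides; it flags candidate terms for a curator to inspect and will produce false positives on legitimate terms that happen to contain 'and' or 'or'.

```python
import re

# Hypothetical screening rules; the patterns are assumptions for illustration.
RULES = {
    "positivity": re.compile(r"^(non-|other\b)", re.IGNORECASE),
    "objectivity": re.compile(
        r"\b(unknown|unclassified|unlocalized|not otherwise specified)\b",
        re.IGNORECASE,
    ),
    "no conjunctive/disjunctive types": re.compile(
        r"\b(and|or|excluding)\b", re.IGNORECASE
    ),
}


def lint_term(term: str) -> list:
    """Return the names of the rules that a term label appears to violate."""
    return [name for name, pattern in RULES.items() if pattern.search(term)]
```

For example, `lint_term("non-mammal")` reports the positivity rule, `lint_term("arthropathies not otherwise specified")` reports the objectivity rule, and `lint_term("cell nucleus")` reports nothing.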
Keep Epistemology Separate from Ontology
If you want to say that
  We do not know where A’s are located
do not invent a new class of A’s with unknown locations
(A well-constructed ontology should grow linearly; it should not need to delete classes or relations because of increases in knowledge)
Syntactic Separateness
Do not confuse sentences with terms
If you want to say
  I surmise that this is a case of pneumonia
do not invent a new class of surmised pneumonias
Single Inheritance
No kind in a classificatory hierarchy should have more than one is_a parent on the immediate higher level
Multiple Inheritance
[Figure: nodes thing, car, blue thing, blue car; blue car has two is_a parents, car and blue thing]
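The single inheritance rule can be checked mechanically over a list of is_a edges. The Python below is a minimal sketch; the function name `multiple_parents` and the edge list (which encodes the blue car example, with the two upper edges to thing assumed from the diagram) are illustrative.

```python
from collections import defaultdict


def multiple_parents(is_a_edges):
    """Map each term with more than one direct is_a parent to its parents."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {child: ps for child, ps in parents.items() if len(ps) > 1}


# (child, parent) pairs for the blue car example.
edges = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]
violations = multiple_parents(edges)
```

On this edge list the only violation reported is `blue car`, with the parents `car` and `blue thing`.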
Multiple Inheritance
– is a source of errors
– encourages laziness
– serves as obstacle to integration with neighboring ontologies
– hampers use of Aristotelian methodology for defining terms
Multiple Inheritance
[Figure: nodes thing, car, blue thing, blue car; the two is_a links from blue car are relabeled is_a1 and is_a2]
is_a Overloading
The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.
Example: is_a is pressed into service by the GO to express location
is-located-at and similar relations are expressed by creating special compound terms using:
– site of …
– … within …
– … in …
– extrinsic to …
yielding associated errors
e.g. errors with ‘within’
lytic vacuole within a protein storage vacuole
lytic vacuole within a protein storage vacuole is-a protein storage vacuole
Compare: embryo within a uterus is-a uterus
similar problems with part_of
extrinsic to membrane part_of membrane
Compositionality
The meanings of compound terms should be determined
1. by the meanings of component terms together with
2. the rules governing syntax
Why do we need rules/standards for good ontology?
Ontologies must be intelligible both to humans (for annotation and curation) and to machines (for reasoning and error-checking): the lack of rules for classification leads to human error and blocks automatic reasoning and error-checking
Intuitive rules facilitate training of curators and annotators
Common rules allow alignment with other ontologies
When we annotate the record of an experiment
we use terms representing types to capture what we learn about:
– this experiment (instance), performed here and now, in this laboratory
– the instances experimented upon
These instances are typical = they are representatives of types
– of experiment (described in FuGO)
– of gene product molecules, molecular functions, cellular components, biological processes (described in GO)
Experimental records
document a variety of instances (particular real-world examples or cases), ranging from instances of gene products (including individual molecules) to instances of biochemical processes, molecular functions, and cellular locations
Experimental records
provide evidence that gene products of given types have molecular functions of given types by documenting occurrences in the real world that involve corresponding instances of functioning.
They document the existence of real-world molecules that have the potential to execute (carry out, realize, perform) the types of molecular functions that are involved in these occurrences.
Motivation: To capture reality
Inferences and decisions we make are based upon what we know of reality.
An ontology is a computable representation of biological reality, which is designed to enable a computer to reason over the data we collect about this reality in (some of) the ways that we do.