OBO Foundry Principles, BFO, RO
Barry Smith
OBO Foundry Principles
• open
• common formal language (OBO Format, OWL DL, CL)
• commitment to collaboration
• maintenance in light of scientific advance
• unique identifier space (Alan)
• naming conventions (Susanna / EBI) – metadata for changes
• versioning
OBO Foundry Principles
• common architecture (= RO + BFO)
• clearly delineated content (redundant – overlaps with orthogonality)
• the ontology is well-documented (overlaps with rules for definitions; needs expanding: for developers, for users, minimal metadata)
• plurality of independent users
• single locus of authority, trackers, help desk
OBO Foundry Principles
• textual definitions plus formal definitions
• all definitions should be of the genus-species form, utilizing cross-products
• therefore: single is_a inheritance (= each ontology should be conceived as consisting of a core of asserted single inheritance with further is_a relations inferred)
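A minimal sketch of how a core of asserted single inheritance can coexist with inferred is_a links. The `Term` class, the toy terms and the relation names are illustrative assumptions, not taken from any Foundry ontology; a real reasoner would work on OWL or OBO files rather than on a Python dictionary.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Term:
    name: str
    genus: Optional[str]  # the single asserted is_a parent (None for the root)
    differentia: FrozenSet[Tuple[str, str]] = frozenset()  # (relation, filler)


# Hypothetical toy terms, each defined as "a <genus> that <differentia>".
TERMS = {t.name: t for t in [
    Term("cell", None),
    Term("blood cell", "cell", frozenset({("part_of", "blood")})),
    Term("nucleated cell", "cell", frozenset({("has_part", "nucleus")})),
    Term("nucleated blood cell", "blood cell",
         frozenset({("has_part", "nucleus")})),
]}


def asserted_ancestors(name: str) -> List[str]:
    """Follow the single asserted is_a chain up to the root."""
    chain = []
    parent = TERMS[name].genus
    while parent is not None:
        chain.append(parent)
        parent = TERMS[parent].genus
    return chain


def accumulated_differentia(name: str) -> FrozenSet[Tuple[str, str]]:
    """Differentiae of the term plus those inherited along the asserted chain."""
    result = set(TERMS[name].differentia)
    for ancestor in asserted_ancestors(name):
        result |= TERMS[ancestor].differentia
    return frozenset(result)


def inferred_parents(name: str) -> List[str]:
    """Terms that subsume `name` by definition but are not asserted ancestors."""
    asserted = asserted_ancestors(name)
    own = accumulated_differentia(name)
    found = []
    for other in TERMS.values():
        if other.name == name or other.name in asserted:
            continue
        # `other` subsumes `name` if its genus is an asserted ancestor of
        # `name` and all of its differentiae also hold for `name`.
        if other.genus in asserted and accumulated_differentia(other.name) <= own:
            found.append(other.name)
    return sorted(found)
```

Here 'nucleated blood cell' has one asserted parent ('blood cell'), while the link to 'nucleated cell' is computed from the definitions rather than asserted by hand.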
Orthogonality
• For each domain, there should be convergence upon a single ontology that is recommended for use by those who wish to become involved with the Foundry initiative
• Compare what happens in other parts of science: for each domain, there should be convergence upon a single theory
Preventing silos on the side of annotated data = preventing forking of the ontologies used for annotation
Strategy to ensure orthogonality
• If the Foundry already has an ontology O1 covering a domain D, and an outside group creates a second ontology O2 covering D (or part of D), we need to ask:
– is it in every respect better? (then replace O1 with O2)
– is it in some respects better? (then negotiate an improved synthesis, O3)
ASSUMPTION: ontologies are always comparable
PROBLEM: need better measures of ontology quality
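The decision rule above can be sketched as a comparison of per-criterion quality scores. The criteria names and the numeric scores are hypothetical; the slide itself notes that better measures of ontology quality are still needed, so this only illustrates the shape of the rule under the assumption that both ontologies can be scored on the same criteria.

```python
from typing import Dict


def compare(o1: Dict[str, float], o2: Dict[str, float]) -> str:
    """Compare incumbent O1 with challenger O2; higher scores are better."""
    if o1.keys() != o2.keys():
        # The strategy assumes the two ontologies are comparable.
        raise ValueError("ontologies must be scored on the same criteria")
    better = [c for c in o1 if o2[c] > o1[c]]
    if len(better) == len(o1):
        return "replace O1 with O2"            # better in every respect
    if better:
        return "negotiate improved synthesis O3"  # better in some respects
    return "keep O1"


# Hypothetical scores on made-up criteria.
incumbent = {"coverage": 0.6, "definitions": 0.5}
challenger = {"coverage": 0.8, "definitions": 0.4}
print(compare(incumbent, challenger))
```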
Benefits of orthogonality
• Offers a solution to the problem of silos that is
– modular
– incremental
– empirically based
– incorporates a strategy for motivating potential developers and users
Orthogonality = non-redundancy for the reference ontologies inside the Foundry
• CARO-Mammal will not be orthogonal to CARO
• IDO-Malaria will not be orthogonal to IDO
• IDO will not be orthogonal to DO
• DO will be orthogonal to CL
Absolute redundancy for application ontologies
= all terms in application ontologies should be taken from orthogonal reference ontologies within the Foundry
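As a rough illustration of this rule, the following sketch flags terms in an application ontology that do not come from a reference ontology. It assumes that terms are identified by `PREFIX:local_id` strings and that the prefix names the source ontology; the prefix set and the identifiers are placeholders chosen for the example, not a statement about actual term IDs.

```python
from typing import Iterable, List, Set

# Assumed prefixes of reference ontologies named on these slides.
REFERENCE_PREFIXES = {"GO", "CL", "PATO", "CHEBI", "SO", "PRO", "FMA", "CARO"}


def foreign_terms(term_ids: Iterable[str],
                  reference_prefixes: Set[str] = REFERENCE_PREFIXES) -> List[str]:
    """Return the term IDs whose prefix is not a known reference ontology."""
    offenders = []
    for term_id in term_ids:
        prefix, _, local_id = term_id.partition(":")
        if not local_id or prefix.upper() not in reference_prefixes:
            offenders.append(term_id)
    return offenders


# Placeholder identifiers; they are not meant to denote actual terms.
application_ontology = ["GO:0000001", "CL:0000002", "MYLAB:0000003"]
print(foreign_terms(application_ontology))
```

An empty result would mean that every term is drawn from a reference ontology, which is what the slide calls absolute redundancy.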
Benefits of orthogonality
• Modularity brings benefits of division of labor, division of authority, minimizes redundancy
Benefits of orthogonality
• Scientists become motivated to commit themselves to developing an ontology falling within their domain of expertise because they themselves will need to use this ontology in their own work in the future.
• Forking would erode this motivation
Benefits of orthogonality
• Incrementality means that the strategy will still work even if ontologies are still only partial
• this allows adoption and application at early stages
Benefits of orthogonality
• Empirically based means that we can always go back and start again if some ontology module does not work (compare the problem of non-modular approaches like SNOMED CT, where it is all or nothing)
Benefits of orthogonality
• Modularity brings ownership, motivates scientist-developers to commit themselves long term to developing the ontology
• This in turn motivates users to commit themselves to adoption
– they see strong positive network effects from use of the ontology
– they gain reassurance from long-term commitment
Benefits of orthogonality
• It helps those new to ontology who need to know where to look in finding an ontology relating to their subject-matter
• it obviates the need for ‘mappings’ between ontologies, which are
– difficult to create and use
– error-prone
– hard to keep up-to-date when mapped ontologies change
Benefits of orthogonality
• modularity (orthogonality) ensures the mutual consistency of ontologies, and thereby also the additivity of the annotations created with their aid by different groups of annotators describing common bodies of data.
• thereby contributes to the cumulativity of science and allows new forms of unmanaged collaboration.
Benefits of orthogonality
• brings grave responsibilities to those in charge of ensuring for each domain that the Foundry includes an ontology for that domain
• they must commit to perpetual striving for scientific accuracy and domain-completeness in their work
• orthogonality rules out the sorts of simplification and partiality which may be acceptable under more pluralistic regimes
Benefits of orthogonality
• it supports the strategy of utilizing cross-products in composing terms and definitions
• this strategy will work only if we can
– minimize the degree of arbitrariness involved in selecting the terms to be composed
– and thereby maximize the degree to which the Foundry ontologies are networked together through the cross-product links
Misunderstandings of Orthogonality
• Orthogonality does not mean that all ontologies must be developed within the Foundry framework
• We welcome the development of competing approaches to open-access ontology development – which can only make the Foundry stronger
Problems with Orthogonality
• what if researchers need purpose-built ontologies to meet their own specific needs?
• OBO Foundry provides orthogonal reference ontologies, so that they can as far as possible build their application ontologies using terms composed as cross-products
• thereby avoiding silos
• and contributing new terms back to the Foundry in case of need
Problems with Orthogonality
• For each domain, there should be convergence upon a single ontology that is recommended for use by those who wish to become involved with the Foundry initiative
Q: WHAT DOES ORTHOGONALITY MEAN?
minimally: two ontologies are not orthogonal if they share a single term with the same meaning
Q: WHAT DOES DOMAIN MEAN?
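The minimal criterion (no shared term with the same meaning) can be sketched as follows. This assumes, purely for illustration, that the meaning of a term is captured by its textual definition and that two definitions are the same when they match after whitespace and case normalisation; the toy ontologies are made up. Deciding sameness of meaning in practice is much harder than string comparison.

```python
from typing import Dict, List, Tuple


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivially different texts match."""
    return " ".join(text.lower().split())


def shared_meanings(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pairs (label in a, label in b) whose definitions coincide."""
    by_definition = {normalise(d): label for label, d in a.items()}
    pairs = []
    for label_b, definition in b.items():
        key = normalise(definition)
        if key in by_definition:
            pairs.append((by_definition[key], label_b))
    return sorted(pairs)


def orthogonal(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """True if the two ontologies share no term with the same meaning."""
    return not shared_meanings(a, b)


# Made-up labels and definitions.
onto_a = {"cell": "A basic unit of living organisms.",
          "nucleus": "A membrane-bounded organelle containing chromosomes."}
onto_b = {"Cell": "a basic unit of  living organisms.",
          "vector": "An organism that transmits a pathogen."}
print(shared_meanings(onto_a, onto_b), orthogonal(onto_a, onto_b))
```

Note that under this reading two ontologies may use the same label for different meanings and still count as orthogonal; only shared meanings matter.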
Initial OBO Foundry Reference Ontologies (jigsaw)
RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
GRANULARITY (rows):
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
Homesteading
Recommendation: Ontology developers should register their claim on territory not yet occupied as soon as possible, because the Foundry is designed to serve as an attractor for collaboration
[Figure: jigsaw grid of reference ontologies, same as on the ‘Initial OBO Foundry Reference Ontologies’ slide]
Orthogonality = Westphalian principles of national sovereignty for reference ontologies: no shared territory
Varieties of application ontology
• cross-border national parks
• Slims
• Fractal ontologies
• Cross-product ontologies
– Template ontologies (CARO, IDO, GDO …)
[Figure: jigsaw grid of reference ontologies, same as on the ‘Initial OBO Foundry Reference Ontologies’ slide]
cross-border national parks: an ontology for studying the effects of viral infection on cell function in shrimp
[Figure: jigsaw grid of reference ontologies, same as on the ‘Initial OBO Foundry Reference Ontologies’ slide]
Slims = an ontology of dendritic cells
[Figure: jigsaw grid of reference ontologies, same as on the ‘Initial OBO Foundry Reference Ontologies’ slide]
Slims = an ontology of dendritic cells, with definitions composed using terms from other ontologies
[Figure: jigsaw grid of reference ontologies, same as on the ‘Initial OBO Foundry Reference Ontologies’ slide]
fractal ontologies, employing small portions of many ontologies (e.g. MSO Multiple Sclerosis Ontology)
rationale of OBO Foundry coverage + BFO
RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
GRANULARITY (rows):
ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
– occurrent: Cellular Process (GO)
MOLECULE
– independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
[Figure: jigsaw grid of reference ontologies, same as on the ‘Initial OBO Foundry Reference Ontologies’ slide]
types plus instances
Fundamental Dichotomy
Continuants (aka endurants)
– have continuous existence in time
– preserve their identity through change
– exist in toto whenever they exist at all
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases
– exist only in their phases
Functions are continuants
Functionings are occurrents
[Figure: reference-ontology grid]
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE (ChEBI, SO, RNAO, PRO): Molecular Function (GO); Molecular Process (GO)
Biomedical Investigations (OBI)
[Figure: reference-ontology grid]
ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE (ChEBI, SO, RNAO, PRO): Molecular Function (GO); Molecular Process (GO)
[Figure: reference-ontology grid]
ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO); Cellular Pathology ????
MOLECULE (ChEBI, SO, RNAO, PRO): Molecular Function (GO); Molecular Process (GO)
[Figure: reference-ontology grid]
ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function ???? (GO???); Cellular Pathology ????
MOLECULE (ChEBI, SO, RNAO, PRO): Molecular Function (GO); Molecular Process (GO)
[Figure: reference-ontology grid]
ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE (ChEBI, SO, RNAO, PRO): Molecular Function (GO); Molecular Process (GO)
[Figure: reference-ontology grid with the MOLECULE row subdivided]
ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: 2- and 3-D Structure (RNAO) (PRO); Small Molecule (ChEBI); 1-D Sequence (SO); Molecular Function (GO); Molecular Process (GO)
[Figure: reference-ontology grid with the MOLECULE row subdivided]
ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: 2- and 3-D Structure (RNAO) (PRO); Small Molecule (ChEBI); 1-D Sequence (SO); Molecular Function (GO); Molecular Process (GO) ?????; Molecular Pathway
[Figure: reference-ontology grid with the MOLECULE row subdivided]
ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: 2- and 3-D Structure (RNAO) (PRO); Small Molecule (ChEBI); 1-D Sequence (SO); Molecular Function (GO); Phenotypic Quality of Molecule ????; Molecular Process (GO) ?????; Reactome
Orthogonality can be preserved by expanding the territory (land reclamation)
[Figure: jigsaw grid of reference ontologies, same as on the ‘Initial OBO Foundry Reference Ontologies’ slide]
GO already started to deal with biological processes involving multiple organisms
[Figure: reference-ontology grid with a population level added]
RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
GRANULARITY (rows):
Family, Community, Deme, Population
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)
http://obofoundry.org
[Figure: reference-ontology grid with a COMPLEX OF ORGANISMS row]
RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
GRANULARITY (rows):
COMPLEX OF ORGANISMS: Family, Community, Deme, Population; Organ Function (FMP, CPRO); Population Phenotype; Population Process
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)
http://obofoundry.org
[Figure: reference-ontology grid with a COMPLEX OF ORGANISMS row and an ENVIRONMENT band]
RELATION TO TIME (columns): CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT
GRANULARITY (rows):
COMPLEX OF ORGANISMS: Family, Community, Deme, Population; Organ Function (FMP, CPRO); Population Phenotype; Population Process
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cell Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)
ENVIRONMENT
http://obofoundry.org
[Figure: independent continuants and their environments, by granularity]
RELATION TO TIME: CONTINUANT, INDEPENDENT
GRANULARITY (rows):
COMPLEX OF ORGANISMS: Family, Community, Deme, Population; ENVIRONMENT: Environment of population
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); ENVIRONMENT: Environment of single organism
CELL AND CELLULAR COMPONENT: Cell (CL); Cell Component (FMA, GO); ENVIRONMENT: Environment of cell
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); ENVIRONMENT: Molecular environment
http://obofoundry.org
[Figure: independent continuants and their environments, by granularity]
RELATION TO TIME: CONTINUANT, INDEPENDENT
GRANULARITY (rows):
COMPLEX OF ORGANISMS: Family, Community, Deme, Population; ENVIRONMENT: Environment of population
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); ENVIRONMENT: Environment of single organism*
CELL AND CELLULAR COMPONENT: Cell (CL); Cell Component (FMA, GO); ENVIRONMENT: Environment of cell
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); ENVIRONMENT: Molecular environment
* The sum total of the conditions and elements that make up the surroundings and influence the development and actions of an individual.
[Figure: examples of environments, by granularity]
RELATION TO TIME: CONTINUANT, INDEPENDENT
GRANULARITY (rows), with ENVIRONMENT examples:
COMPLEX OF ORGANISMS: biome / biotope, territory, habitat, neighborhood, ...
ORGAN AND ORGANISM: work environment, home environment; host/symbiont environment; ...
CELL AND CELLULAR COMPONENT: extracellular matrix; chemokine gradient; ...
MOLECULE: hydrophobic surface; virus localized to cellular substructure; active site on protein; pharmacophore ...
http://obofoundry.org
[Figure: grid of reference ontologies, same entries as on the ‘Initial OBO Foundry Reference Ontologies’ slide, shown without row labels]
Template ontologies (CARO, IDO, CL?) X Organism Taxonomy
The case of IDO
Human Disease Ontology
– unitary hierarchy with root node: human disease
– refers only to dependent realizable continuants
Infectious Disease Ontology
– draws terms from all BFO categories
– template exists in many copies: specializing to different hosts, pathogens, vectors, etc.
We have data
TBDB: Tuberculosis Database, including Microarray data
VFDB: Virulence Factor DB
TropNetEurop Dengue Case Data
ISD: Influenza Sequence Database at LANL
PathPort: Pathogen Portal Project
...
We need common controlled vocabularies to describe these data in ways that will assure comparability and cumulation
What content is needed to adequately cover the infectious domain?
– Host-related terms (e.g. carrier, susceptibility)
– Pathogen-related terms (e.g. virulence)
– Vector-related terms (e.g. reservoir, …)
– Terms for the biology of disease pathogenesis (e.g. evasion of host defense)
– Population-level terms (e.g. epidemic, endemic, pandemic, …)
We need to annotate this data
to allow retrieval and integration of
– sequence and protein data for pathogens
– case report data for patients
– clinical trial data for drugs, vaccines
– epidemiological data for surveillance, prevention
– ...
Goal: to make data deriving from different sources comparable and computable
IDO needs to work with
Disease Ontology (DO) + SNOMED CT
Gene Ontology Immunology Branch
Phenotypic Quality Ontology (PATO)
Protein Ontology (PRO)
Sequence Ontology (SO)
...
IDO provides a common template
IDO works like CARO.
It contains terms (like ‘pathogen’, ‘vector’, ‘host’) which apply to organisms of all species involved in infectious disease and its transmission
Disease- and organism-specific ontologies are then built as specifications of the IDO core
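A minimal sketch of the template idea: the core terms stay fixed and each disease-specific ontology fills them in for particular organisms. The function name `specialise`, the naming pattern for the derived terms and the dictionary layout are assumptions made for illustration; they are not how IDO or its extensions are actually encoded.

```python
from typing import Dict, Tuple

# Core template terms named on the slide.
IDO_CORE: Tuple[str, ...] = ("pathogen", "vector", "host")


def specialise(disease: str, fillers: Dict[str, str],
               core: Tuple[str, ...] = IDO_CORE) -> Dict[str, Dict[str, str]]:
    """Build disease-specific terms as children of the core template terms."""
    unknown = set(fillers) - set(core)
    if unknown:
        raise ValueError(f"not core template terms: {sorted(unknown)}")
    return {
        f"{disease} {core_term}": {"is_a": core_term, "organism": organism}
        for core_term, organism in fillers.items()
    }


# One copy of the template, specialised to malaria.
malaria = specialise("malaria", {
    "pathogen": "Plasmodium falciparum",
    "vector": "Anopheles mosquito",
    "host": "human",
})
for label, entry in malaria.items():
    print(label, "is_a", entry["is_a"], "/", entry["organism"])
```

Because every derived term keeps an is_a link to a shared core term, data annotated with different disease-specific copies remain comparable at the level of the core.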
Proposed additions to list of OBO Foundry Principles
• INSTANTIABILITY: Terms in an ontology should correspond to instances in reality
Even disposition terms correspond to instances in reality
There are no absent nipples
There are no cancelled studies
Proposed additions to list of OBO Foundry Principles
INSTANTIABILITY: Terms in an ontology should represent types all of which have instances in reality
types = what are described in textbooks
instances = (roughly) what are described in data
Proposed additions to list of OBO Foundry Principles
Ontologies consist of representations of types in reality – therefore, their terms should consist entirely of singular nouns (preferred terms, etc.)
Ontologies should use singular nouns and noun phrases belonging to ordinary English as extended by technical terms already established in the relevant discipline – they should not use phrases like ‘EV-EXP-IGI’, no lab slang, no ellipses
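Part of this naming rule can be checked mechanically. The sketch below is a heuristic of our own, not an existing Foundry tool: it flags labels that contain hyphenated upper-case codes of the ‘EV-EXP-IGI’ kind and labels that contain an ellipsis. It does not try to decide whether a label is a singular noun phrase or lab slang, which would need linguistic resources.

```python
import re
from typing import List

# Two or more upper-case segments joined by hyphens, e.g. EV-EXP-IGI.
CODE_LIKE = re.compile(r"\b[A-Z]{2,}(?:-[A-Z]{2,})+\b")


def label_problems(label: str) -> List[str]:
    """Return the naming problems this heuristic detects in a term label."""
    problems = []
    if CODE_LIKE.search(label):
        problems.append("code-like abbreviation")
    if "..." in label or "\u2026" in label:
        problems.append("ellipsis")
    return problems


for candidate in ["dendritic cell", "EV-EXP-IGI", "cell ..."]:
    print(candidate, "->", label_problems(candidate))
```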
Proposed additions to list of OBO Foundry Principles
EVALUATION
• each ontology should be subject to evaluation (as far as possible quantitative):
• software (conversion between OBO format and OWL)
• specialist review (OWL, natural language)
• when one version is used for a given purpose, later versions should be applied to the same purpose and results compared
Proposed additions to list of OBO Foundry Principles
each ontology should be built on the basis of BFO top-level distinctions (common top level):
• continuants vs. occurrents
• independent continuants (molecules, cells, organisms …)
• specifically dependent continuants (qualities, functions, roles …)
• generically dependent continuants (information artifacts, sequences …)
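The top-level distinctions listed above can be written down as a small classification table. This is only a sketch: the enum and the lookup table are our own construction, with the example entries taken from the bullets above (plus ‘process’ for occurrents, following the earlier slide on the fundamental dichotomy); it is not the BFO specification itself.

```python
from enum import Enum


class BFOCategory(Enum):
    INDEPENDENT_CONTINUANT = "independent continuant"
    SPECIFICALLY_DEPENDENT_CONTINUANT = "specifically dependent continuant"
    GENERICALLY_DEPENDENT_CONTINUANT = "generically dependent continuant"
    OCCURRENT = "occurrent"


def is_continuant(category: BFOCategory) -> bool:
    """Everything that is not an occurrent is some kind of continuant."""
    return category is not BFOCategory.OCCURRENT


# Examples from the slides, keyed by the kind of entity.
EXAMPLES = {
    "molecule": BFOCategory.INDEPENDENT_CONTINUANT,
    "cell": BFOCategory.INDEPENDENT_CONTINUANT,
    "organism": BFOCategory.INDEPENDENT_CONTINUANT,
    "quality": BFOCategory.SPECIFICALLY_DEPENDENT_CONTINUANT,
    "function": BFOCategory.SPECIFICALLY_DEPENDENT_CONTINUANT,
    "role": BFOCategory.SPECIFICALLY_DEPENDENT_CONTINUANT,
    "information artifact": BFOCategory.GENERICALLY_DEPENDENT_CONTINUANT,
    "sequence": BFOCategory.GENERICALLY_DEPENDENT_CONTINUANT,
    "process": BFOCategory.OCCURRENT,
}

for kind, category in EXAMPLES.items():
    side = "continuant" if is_continuant(category) else "occurrent"
    print(f"{kind}: {category.value} ({side})")
```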