1
The OBO Foundry
Barry Smith
Center of Excellence in Bioinformatics & Life Sciences, University at Buffalo
IFOMIS, Saarland University
http://ontology.buffalo.edu/smith
Standards and Ontology
2
we are accumulating huge amounts of sequence data, image data, pharma data, ...
how do we know what data we have?
how do I know what data you have?
how do we know what data we don’t have?
how do we make different sorts of data combinable, as we need to do in large domains such as neurodevelopment, immunology, cancer ...?
3
genomic medicine, molecular medicine, translational medicine, personalized medicine ...
need
methods for data integration to enable reasoning across data at multiple granularities
to identify biomedically relevant relations on the side of the entities themselves
5
where in the body ?
what kind of disease process ?
= we need ontologies
we need semantic annotation of data
6
how to create broad-coverage semantic annotation systems for biomedicine?
Semantic Web, Moby, wikis, etc.
let a million flowers (and weeds) bloom
to create integration, rely on (automatically generated?) post hoc mappings
7
most successful, thus far: UMLS
built by trained experts
massively useful for information retrieval and information integration
UMLS Metathesaurus: a system of post hoc mappings between separately built source vocabularies
9
UMLS-based mappings fall short of creating interoperability
because local usage is respected
regimentation frowned upon, no concern for cross-framework consistency
UMLS terminologies have different grades of formal rigor, different degrees of completeness, different update policies
10
with UMLS-based annotations
we can know what data we have (via term searches), but it is noisy
we can map between data at single granularities (via ‘synonyms’), but synonymy information is noisy
how do we know what data we don’t have?
how do we reason with data (as at the molecular level), when there is no common logical backbone?
11
for science
what is to be done to develop high-quality annotation resources in a collaborative, community effort?
create an evolutionary path towards improvement of terminologies, of the sort we find elsewhere in science
find ways to reward early adopters of the results
12
for science
science works out from a consensus core, and strives to isolate and resolve inconsistencies as it extends at the fringes
we need to create a consensus core
start with what for human beings are trivialities (low hanging fruit) and work out from there
for science, consistency is a sine qua non
13
[Figure: Foundational Model of Anatomy (FMA). Graph of is_a and part_of relations among: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Organ Cavity Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura.]
14
for science
include ontologies corresponding to the basic biomedical sciences in the core
clinical medicine relies on anatomy and molecular biology to provide integration across medical specialisms
15
for science
where do we find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development, histology in different model organisms?
but we need more
18
The methodology of annotations
science basis of the GO: trained experts curating peer-reviewed literature
different model organism databases employ scientific curators who use the experimental observations reported in the biomedical literature to associate GO terms with gene products in a coordinated way
19
A set of standardized textual descriptions of
cellular locations
molecular functions
biological processes
used to annotate the entities represented in the major biochemical databases
thereby creating integration across these databases and making them available to semantic search
21
This process leads to improvements and extensions of the ontology
which in turn leads to better annotations
a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself
RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form
22
Five bangs for your GO buck
science base
cross-species database integration
cross-granularity database integration
through links to the things which are of biomedical relevance
semantic searchability links people to software
23
but now
need to improve the quality of GO to support more rigorous logic-based reasoning across the data annotated in its terms
need to extend the GO by engaging ever broader community support for the addition of new terms and for the correction of errors
24
but also
need to extend the methodology to other domains, including clinical domains
need for
disease ontology
immunology ontology
symptom (phenotype) ontology
clinical trial ontology ...
25
the problem
existing clinical vocabularies are of variable quality and low mutual consistency
need for prospective standards to ensure mutual consistency and high quality of clinical counterparts of GO
need to ensure consistency of the new clinical ontologies with the basic biomedical sciences
if we do not start now, the problem will only get worse
26
the solution
establish common rules governing best practices for creating ontologies and for using these in annotations
apply these rules to create a complete suite of orthogonal interoperable biomedical reference ontologies
this solution is already being implemented
27
First step (2003)
a shared portal for (so far) 58 ontologies (low regimentation)
http://obo.sourceforge.net NCBO BioPortal
29
Second step (2004)
reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
GO + Cell type = New Definition
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
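The "GO + Cell type = New Definition" step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parser handles just the simple "tag: value" lines shown in the osteoblast stanza, the helper names (parse_stanza, develops_from, compose_differentiation_definition) are hypothetical, and the precursor labels are passed in by hand because the stanza carries only precursor IDs, not their names.

```python
# Sketch only: a tiny reader for one OBO-style stanza, not a full OBO parser.
STANZA = """\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""


def parse_stanza(text):
    """Collect 'tag: value' lines into a dict mapping each tag to a list of values."""
    term = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value)
    return term


def develops_from(term):
    """Return the target IDs of all develops_from relationships."""
    targets = []
    for rel in term.get("relationship", []):
        rel_type, _, target = rel.partition(" ")
        if rel_type == "develops_from":
            targets.append(target)
    return targets


def compose_differentiation_definition(term, precursor_labels):
    """Build a GO-style definition from a cell term (hypothetical helper).

    precursor_labels are supplied by the caller, because the stanza holds
    only the precursor IDs, not their names.
    """
    name = term["name"][0]
    article = "an" if name[0].lower() in "aeiou" else "a"
    precursors = " or ".join(precursor_labels)
    return (
        f"{name.capitalize()} differentiation: Processes whereby {precursors} "
        f"acquires the specialized features of {article} {name}."
    )


term = parse_stanza(STANZA)
definition = compose_differentiation_definition(
    term, ["an osteoprogenitor cell", "a cranial neural crest cell"]
)
print(develops_from(term))
print(definition)
```

The point of the sketch is that the process definition is assembled from the cell-type term rather than written independently, so the two ontologies cannot drift apart on what an osteoblast is.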
30
Third step (2006)
The OBO Foundry
http://obofoundry.org/
31
The OBO Foundry
a family of interoperable gold standard biomedical reference ontologies to serve the annotation of, inter alia,
scientific literature
model organism databases
clinical trial data
32
A prospective standard
designed to guarantee interoperability of ontologies from the very start (contrast to: post hoc mapping)
established March 2006
12 initial candidate OBO ontologies – focused primarily on basic science domains
several being constructed ab initio
by influential consortia who have the authority to impose their use on large parts of the relevant communities.
33
undergoing rigorous reform:
GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology
new:
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology
34
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
35
GRANULARITY \ RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Annotations plus ontologies yield an ever-growing computer-interpretable map of biological reality.
36
GRANULARITY \ RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Building out from the original GO
37
Under consideration:
Disease Ontology (DO)
Biomedical Image and Image Process Ontology (BiiO)
Upper Biomedical Ontology (OBO UBO)
Ontology of Biomedical Investigations (OBI)
Clinical Trial Ontology (CTO)
38
OBO Foundry = a subset of OBO ontologies, whose developers have agreed in advance to accept a common set of principles reflecting best practice in ontology development designed to ensure
tight connection to the biomedical basic sciences
compatibility
interoperability, common relations
formal robustness
support for logic-based reasoning
39
CRITERIA
The ontology is OPEN and available to be used by all.
The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE.
The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.
40
CRITERIA
UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
41
for science
if we annotate a database or body of literature with one high-quality biomedical ontology, we should be able to add annotations from a second such ontology without conflicts
orthogonality of ontologies implies additivity of annotations
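The additivity claim can be illustrated with a small Python sketch. It rests on a strong simplifying assumption: two ontologies are treated as orthogonal when their term identifiers fall in disjoint ID spaces (the prefix before the colon), which is only a proxy for non-overlapping domains. The function names, sample names, and GO identifiers below are hypothetical placeholders chosen for their prefixes.

```python
def id_space(term_id):
    """Return the ID-space prefix of a term identifier, e.g. 'CL' for 'CL:0000062'."""
    return term_id.split(":", 1)[0]


def merge_annotations(first, second):
    """Union two sets of (entity, term_id) annotations.

    Raises ValueError if the two sets draw on a shared ID space, which this
    sketch takes as a sign that the source ontologies are not orthogonal.
    """
    shared = {id_space(t) for _, t in first} & {id_space(t) for _, t in second}
    if shared:
        raise ValueError(f"shared ID spaces: {sorted(shared)}")
    return first | second


# Placeholder annotations: entity names and GO IDs are illustrative only.
cell_annotations = {("sample_1", "CL:0000062")}
process_annotations = {("sample_1", "GO:0000001"), ("sample_2", "GO:0000002")}

merged = merge_annotations(cell_annotations, process_annotations)
print(sorted(merged))
```

When the ID spaces are disjoint, merging is a plain set union: no annotation from one ontology has to be reconciled against a competing term from the other.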
42
CRITERIA
IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use.
The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.
43
CRITERIA
CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
DOCUMENTATION: The ontology is well-documented.
USERS: The ontology has a plurality of independent users.
44
CRITERIA
COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*
* Smith et al., Genome Biology 2005, 6:R46
45
OBO Relation Ontology
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
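To show what unambiguously defined relations buy in practice, here is a minimal Python sketch of reasoning over the two foundational relations, is_a and part_of. It assumes the usual class-level reading in which both relations are transitive and in which is_a followed by part_of, or part_of followed by is_a, yields part_of. The function name infer and the edges are illustrative assumptions that borrow labels from the FMA slide; they are not a transcription of FMA.

```python
# Composition table: (first relation, second relation) -> inferred relation.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("part_of", "part_of"): "part_of",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
}


def infer(edges):
    """Return the closure of (subject, relation, object) triples under COMPOSE."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        snapshot = list(closure)
        for a, r1, b in snapshot:
            for b2, r2, c in snapshot:
                if b == b2 and (r1, r2) in COMPOSE:
                    new_edge = (a, COMPOSE[(r1, r2)], c)
                    if new_edge not in closure:
                        closure.add(new_edge)
                        changed = True
    return closure


# Illustrative edges only (assumed for this sketch).
asserted = {
    ("visceral pleura", "part_of", "pleura"),
    ("pleura", "part_of", "pleural sac"),
    ("pleural sac", "is_a", "serous sac"),
    ("serous sac", "is_a", "organ"),
}

closure = infer(asserted)
for edge in sorted(closure - asserted):
    print(edge)
```

The loop is a naive fixed-point computation, adequate for a handful of terms; a real reasoner would index edges by subject rather than scan all pairs.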
46
Further criteria will be added over time in light of lessons learned in order to bring about a gradual improvement in the quality of Foundry ontologies
ALL FOUNDRY ONTOLOGIES WILL BE SUBJECT TO CONSTANT UPDATE IN LIGHT OF SCIENTIFIC ADVANCE
IT WILL GET HARDER
47
But not everyone needs to join
The Foundry is not seeking to serve as a check on flexibility or creativity
ALL FOUNDRY ONTOLOGIES WILL ENCOURAGE COMMUNITY CRITICISM, CORRECTION AND EXTENSION WITH NEW TERMS
IT WILL GET HARDER
48
GOALS
to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development
CREDIT for high quality ontology development work
KUDOS for early adopters of high quality ontologies / terminologies e.g. in reporting clinical trial results
49
GOALS
to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation of new annotation schemas by each clinical research group
REUSABILITY: if data schemas are formulated using a single well-integrated framework ontology system in widespread use, then the data will to that degree themselves become more widely accessible and usable
50
GOALS
to serve as BENCHMARK FOR IMPROVEMENTS in discipline-focused terminology resources
once a system of interoperable reference ontologies is there, it will make sense to calibrate existing terminologies in its terms in order to achieve more robust alignment and greater domain coverage
exploit the avenue of EVIDENCE-BASED MEDICINE (NIH CLINICAL RESEARCH NETWORKS) to foster their use by clinicians
51
the vision is spreading
June 2006: establishment of MICheck:
reflects growing need for prescriptive checklists specifying the key information to include when reporting experimental results (concerning methods, data, analyses and results).
52
MICheck Foundry
MICheck: ‘a common resource for minimum information checklists’ analogous to OBO / NCBO BioPortal
MICheck Foundry: will create ‘a suite of self-consistent, clearly bounded, orthogonal, integrable checklist modules’ *
* Taylor CF, et al. Nature Biotech, in press
53
MICheck/Foundry communities
Transcriptomics (MIAME Working Group)
Proteomics (Proteomics Standards Initiative)
Metabolomics (Metabolomics Standards Initiative)
Genomics and Metagenomics (Genomic Standards Consortium)
In Situ Hybridization and Immunohistochemistry (MISFISHIE Working Group)
Phylogenetics (Phylogenetics Community)
RNA Interference (RNAi Community)
Toxicogenomics (Toxicogenomics WG)
Environmental Genomics (Environmental Genomics WG)
Nutrigenomics (Nutrigenomics WG)
Flow Cytometry (Flow Cytometry Community)
54
Fourth Step (the future)
how to replicate the successes of the GO in clinical medicine?
choose two or three representative disease domains
work out reasoning challenges for those domains
work with specialists to create ontologies interoperable with OBO Foundry basic science ontologies to address these reasoning challenges
work with leaders of professional associations and of clinical trial initiatives to foster the collection of clinical data annotated in their terms
55
OBO Foundry coverage (canonical ontologies)
GRANULARITY \ RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
56
INDEPENDENT CONTINUANTS
organism
system
organ
organ part
tissue
cell
acellular anatomical structure
biological molecule
genome
DEPENDENT CONTINUANTS
physiology (functions)
pathology
acute stage
progressive stage
resolution stage
58
Draft Ontology for Muscular Sclerosis
what data do we have?
what data do the others have?
what data do we not have?
59
Draft Ontology for Muscular Sclerosis
to apprehend what is unknown requires a complete demarcation of the relevant space of alternatives