http://img.cs.man.ac.uk/stevens 1
Building and Using Ontologies
Robert Stevens
Department of Computer Science
University of Manchester
Manchester, UK
Introduction
• The nature of bioinformatics resources
• What is knowledge?
• What is an ontology?
• What are the uses of ontologies?
• Components of an ontology
• Building an ontology (in brief)
The Nature of Bioinformatics Resources
• Over 500 databanks, plus analysis tools that work over these resources
• Repositories of knowledge and data, and sources of new knowledge
• Knowledge often held as free text; some use made of controlled vocabularies
• Enormous amount of semantic heterogeneity and poor query facilities
• Knowledge about services not always apparent
What is Knowledge?
• Knowledge – all information, plus the understanding needed to carry out tasks and to infer new information
• Information -- data equipped with meaning
• Data -- un-interpreted signals that reach our senses
Example 1
– Data: PATRICIAGRACEKENNEDYSAIDMINEISAPINT
– Information: Patricia Grace Kennedy said mine is a pint (tagged: name, noun, verb)
– Knowledge: Pat Baker is a Manchester bioinformatician who drinks beer.
Example 2
– Data: …CEKENN…
– Information: single letter amino acid codes (C – cysteine, K – lysine)
– Knowledge: a protein that acts as a tyrosine kinase in the liver of primates.
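The step from data to information in the amino acid example amounts to applying an agreed meaning to raw symbols. A minimal Python sketch, assuming the standard IUPAC one-letter codes (only a small subset is included); the step to knowledge, recognising what the protein does, is beyond what a lookup table can capture.

```python
# A deliberately partial table of IUPAC one-letter amino acid codes: the shared
# "meaning" that turns raw characters (data) into residues (information).
ONE_LETTER_CODES = {
    "C": "cysteine",
    "E": "glutamate",
    "K": "lysine",
    "N": "asparagine",
}


def interpret(sequence: str) -> list[str]:
    """Map each one-letter code to its amino acid name.

    Raises KeyError for letters outside this partial table.
    """
    return [ONE_LETTER_CODES[letter] for letter in sequence.upper()]


if __name__ == "__main__":
    print(interpret("CEKENN"))
```

Run on the fragment CEKENN, this prints ['cysteine', 'glutamate', 'lysine', 'glutamate', 'asparagine', 'asparagine'].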
Capturing Knowledge
• Capturing knowledge for both humans and computer applications
• A set of vocabulary definitions that capture a community’s knowledge of a domain
• ‘An ontology may take a variety of forms, but necessarily it will include a vocabulary of terms, and some specification of their meaning. This includes definitions and an indication of how concepts are inter-related which collectively impose a structure on the domain and constrain the possible interpretations of terms.’
What Does an Ontology Do?
• Captures knowledge
• Creates a shared understanding – between humans and for computers
• Makes knowledge machine processable
• Makes meaning explicit – by definition and context
What is an Ontology?
[Figure: the ontology spectrum, from lightweight to formal: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; General logical constraints; Disjointness, Inverse, part-of, …]
Roles of Ontologies in Bioinformatics
• We can divide ontologies, by their role, into three types:
– Domain-oriented, which are either domain specific (e.g. E. coli) or domain generalisations (e.g. gene function or ribosomes);
– Task-oriented, which are either task specific (e.g. annotation analysis) or task generalisations (e.g. problem solving);
– Generic, which capture common high-level concepts, such as Physical, Abstract and Substance. These are important in ontology management and language applications.
Uses of Ontology
• Community reference -- neutral authoring.
• Either defining a database schema or defining a common vocabulary for database annotation -- ontology as specification.
• Providing common access to information: ontology-based search, forming queries over databases.
• Understanding database annotation and technical literature.
• Guiding and interpreting analyses and hypothesis generation.
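Ontology-based search can be illustrated by query expansion: a query for a concept also retrieves entries annotated with any of its more specific concepts. A minimal Python sketch; the small is-a hierarchy and the database entries are hypothetical, with concept names borrowed from the molecule examples later in these slides.

```python
from collections import defaultdict

# Hypothetical is-a links: child concept -> parent concept.
IS_A = {
    "enzyme": "protein",
    "protein": "macromolecule",
    "nucleic-acid": "macromolecule",
    "dna": "nucleic-acid",
    "rna": "nucleic-acid",
}

# Hypothetical database entries, each annotated with one ontology concept.
ENTRIES = {
    "entry-1": "enzyme",
    "entry-2": "dna",
    "entry-3": "protein",
    "entry-4": "rna",
}


def subconcepts(concept: str) -> set[str]:
    """Return the concept together with all its direct and indirect subconcepts."""
    children = defaultdict(set)
    for child, parent in IS_A.items():
        children[parent].add(child)
    result, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children[current])
    return result


def search(concept: str) -> list[str]:
    """Return IDs of entries annotated with the concept or any of its subconcepts."""
    wanted = subconcepts(concept)
    return sorted(entry for entry, term in ENTRIES.items() if term in wanted)


if __name__ == "__main__":
    print(search("protein"))       # ['entry-1', 'entry-3']
    print(search("nucleic-acid"))  # ['entry-2', 'entry-4']
```

A plain keyword match on "protein" would miss entry-1, whose annotation uses the more specific term enzyme; the ontology supplies the link.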
Components of an Ontology
• Concepts: classes of individuals – e.g. the concept Protein and the individual ‘human cytochrome C’
• Relationships between concepts
– The ‘is a kind of’ relationship forms a taxonomy
– Other relationships, such as ‘is a part of’, give further structure
• Axioms – disjointness, covering, equivalence, …
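These components can be sketched as plain data structures: concepts linked by ‘is a kind of’ (the taxonomy), a separate ‘is a part of’ relation, individuals, and a disjointness axiom. A minimal Python sketch; the hierarchy, the part-of fact about a hypothetical active-site concept, and the disjointness axiom are illustrative assumptions, not taken from a published ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """A toy ontology: concepts, two relationships, individuals, disjointness axioms."""

    # concept -> direct parents under "is a kind of"; these links form the taxonomy
    is_a: dict[str, set[str]] = field(default_factory=dict)
    # concept -> wholes it is part of; deliberately kept separate from is_a
    part_of: dict[str, set[str]] = field(default_factory=dict)
    # individual -> the concept it is asserted to belong to
    instance_of: dict[str, str] = field(default_factory=dict)
    # axioms: pairs of concepts that can share no individuals
    disjoint: set[frozenset[str]] = field(default_factory=set)

    def ancestors(self, concept: str) -> set[str]:
        """Every concept reachable from `concept` by following is-a links."""
        seen: set[str] = set()
        stack = list(self.is_a.get(concept, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(self.is_a.get(parent, ()))
        return seen

    def is_instance(self, individual: str, concept: str) -> bool:
        """An individual belongs to its asserted concept and all of that concept's ancestors."""
        asserted = self.instance_of[individual]
        return concept == asserted or concept in self.ancestors(asserted)

    def are_disjoint(self, a: str, b: str) -> bool:
        """True if a disjointness axiom covers a and b, directly or via their ancestors."""
        above_a = {a} | self.ancestors(a)
        above_b = {b} | self.ancestors(b)
        return any(frozenset((x, y)) in self.disjoint for x in above_a for y in above_b)


if __name__ == "__main__":
    onto = Ontology(
        is_a={
            "protein": {"macromolecule"},
            "enzyme": {"protein"},
            "nucleic-acid": {"macromolecule"},
        },
        part_of={"active-site": {"enzyme"}},  # hypothetical part-of fact
        instance_of={"human cytochrome C": "protein"},
        disjoint={frozenset(("protein", "nucleic-acid"))},  # hypothetical axiom
    )
    print(onto.is_instance("human cytochrome C", "macromolecule"))  # True
    print(onto.are_disjoint("enzyme", "nucleic-acid"))              # True
```

Keeping part-of separate from is-a matters: an active site is part of an enzyme, not a kind of enzyme, so it inherits nothing from the enzyme concept.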
Knowledge Representation
• Ontologies are best delivered in some computable representation
• There is a variety of choices, differing in:
– Expressiveness
• The range of constructs that can be used to formally, flexibly, explicitly and accurately describe the ontology
– Ease of use
– Computational complexity
• Is the language computable in real time?
– Rigour -- satisfiability and consistency of the representation
• Systematic enforcement mechanisms
– Unambiguous, clear and well-defined semantics
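Rigour can be partly mechanised. A minimal Python sketch of one consistency check, assuming a hierarchy of asserted subclass links plus disjointness axioms: a class that inherits from both members of a disjoint pair can have no instances, so it is unsatisfiable. A Description Logic reasoner performs far more complete checks; this catches only this one pattern. The classes echo the chemical sketch later in these slides, but the links and the axiom are hypothetical.

```python
# Hypothetical asserted subclass links: class -> direct superclasses.
SUBCLASS_OF = {
    "element": {"chemical"},
    "metal": {"element"},
    "non-metal": {"element"},
    # Modelling slip: metalloid asserted under both metal and non-metal.
    "metalloid": {"metal", "non-metal"},
}

# Hypothetical disjointness axiom: nothing is both a metal and a non-metal.
DISJOINT = [frozenset({"metal", "non-metal"})]


def superclasses(cls: str) -> set[str]:
    """The class itself plus everything above it in the asserted hierarchy."""
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return result


def unsatisfiable_classes() -> list[str]:
    """Classes (with asserted superclasses) that inherit from both members of a disjoint pair."""
    return sorted(
        cls
        for cls in SUBCLASS_OF
        if any(pair <= superclasses(cls) for pair in DISJOINT)
    )


if __name__ == "__main__":
    print(unsatisfiable_classes())  # ['metalloid']
```

Removing one of metalloid's two superclasses, or revisiting the disjointness axiom, empties the list; the check points the modeller at the conflict rather than deciding which assertion is wrong.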
Languages
• Vocabularies using natural language
– Hand crafted; flexible, but difficult to evolve, maintain and keep consistent; weak semantics
– e.g. Gene Ontology
• Object-based KR: frames
– Extensively used, good structuring, intuitive; semantics defined by the OKBC standard
– e.g. EcoCyc (uses Ocelot) and RiboWeb (uses Ontolingua)
• Logic-based: Description Logics
– Very expressive; the model is a set of theories; well-defined semantics
– Automatically derived classification taxonomies
– Concepts can be defined or primitive
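Automatic classification with defined and primitive concepts can be illustrated with a toy classifier. The Python sketch below assumes a language with nothing but conjunctions of atoms, far weaker than a real Description Logic. A primitive concept gives only necessary conditions (modelled by adding its own name as an extra atom), while a defined concept's conditions are also sufficient, so concepts can be placed under a defined concept without being told. All concept and property names are hypothetical.

```python
from functools import cache

# Primitive concepts: necessary conditions only (name -> conditions).
PRIMITIVE = {
    "substance": set(),
    "chemical": {"substance"},
    "protein": {"chemical", "has-covalent-bond"},
}
# Defined concepts: necessary AND sufficient conditions.
DEFINED = {
    "molecule": {"chemical", "has-covalent-bond"},
    "compound": {"chemical", "several-atom-types"},
    "molecular-compound": {"chemical", "has-covalent-bond", "several-atom-types"},
}


@cache
def expand(name: str) -> frozenset[str]:
    """Unfold a concept into the atoms it entails.

    A primitive concept contributes its own name as an extra atom, so nothing
    is classified under it unless it (transitively) names it. Names that are
    neither primitive nor defined are treated as atomic properties.
    """
    if name in PRIMITIVE:
        parts, atoms = PRIMITIVE[name], {name}
    elif name in DEFINED:
        parts, atoms = DEFINED[name], set()
    else:
        return frozenset({name})
    for part in parts:
        atoms |= expand(part)
    return frozenset(atoms)


def subsumes(general: str, specific: str) -> bool:
    """`general` subsumes `specific` iff every atom of `general` is entailed by `specific`."""
    return expand(general) <= expand(specific)


def classify() -> dict[str, set[str]]:
    """Compute direct superconcepts (assumes no two concepts are equivalent)."""
    concepts = [*PRIMITIVE, *DEFINED]
    parents = {}
    for c in concepts:
        above = {d for d in concepts if d != c and subsumes(d, c)}
        # Keep only the most specific subsumers.
        parents[c] = {d for d in above if not any(e != d and subsumes(d, e) for e in above)}
    return parents


if __name__ == "__main__":
    for concept, direct in classify().items():
        print(concept, sorted(direct))
```

Nobody asserted that protein is a molecule or that molecular-compound sits under both molecule and compound; the classifier derives both from the definitions, which is the point of the "automatically derived classification" bullet above.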
Building Ontologies
• No field of Ontological Engineering equivalent to Knowledge or Software Engineering;
• No standard methodologies for building ontologies;
• Such a methodology would include:
– a set of stages that occur when building ontologies;
– guidelines and principles to assist in the different stages;
– an ontology life-cycle which indicates the relationships among stages.
The Development Lifecycle
• Two kinds of complementary methodologies have emerged:
– Stage-based, e.g. TOVE [Uschold96]
– Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94]
• Most have TWO stages:
1. Informal stage
• the ontology is sketched out using either natural language descriptions or some diagram technique
2. Formal stage
• the ontology is encoded in a formal knowledge representation language that is machine computable
– the informal representation helps human understanding; the formal representation helps machine processing.
A Provisional Methodology
• A skeletal methodology and life-cycle for building ontologies;
• Inspired by the software engineering V-process model;
• The overall process moves through a life-cycle:
– The left side charts the processes in building an ontology
– The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology
The V-model Methodology
[Figure: the V-model]
Left side (building):
– Identify purpose and scope
– Knowledge acquisition
– Conceptualisation
– Integrating existing ontologies
– Encoding
– Representation
Right side (quality assurance):
– Evaluation: coverage, verification, granularity
– Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency
– Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation
Also shown: User Model, Conceptualisation Model, Implementation Model; Ontology in Use
The ontology building life-cycle
[Figure: life-cycle stages: Identify purpose and scope; Knowledge acquisition; Evaluation; Language and representation; Available development tools; Conceptualisation; Integrating existing ontologies; Encoding; Building]
Starting Concept List
• Chemicals – atom, ion, molecule, compound, element;
• Molecular-compound, ionic-compound, ionic-molecular-compound, …;
• Ionic-macromolecular-compound and ionic-small-macromolecular-compound;
• Protein, peptide, polyprotein, enzyme, holoprotein, apoprotein, …
• Nucleic acid – DNA, RNA, tRNA, mRNA, snRNA, …
Conceptualisation Sketch
[Figure: sketch rooted at Chemical, with the concepts Atom, Element, Compound, Molecule, Ion; Metal, Non-Metal, Metalloid; Molecular Compound, Molecular Element, Ionic Compound, Ionic Molecule, Ionic Molecular Compound]
Molecule Conceptualisation Sketch
[Figure: sketch of molecule concepts: Macromolecule, Small Molecule, Nucleic Acid, DNA, RNA, mRNA, tRNA, rRNA, snRNA, Protein, Enzyme, Peptide, Polysaccharide, Starch, Glycogen, Ionic Macromolecular Compound]
Initial Encoding
class-def chemical
    subclass-of substance
class-def molecule
    subclass-of chemical
class-def compound
    subclass-of chemical
class-def molecular-compound
    subclass-of molecule and compound
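The initial encoding only asserts subclass links, so it behaves much like an ordinary class hierarchy with multiple inheritance. A minimal Python sketch of that analogy (Python classes standing in for OIL class-defs, an assumption made for illustration only): molecular-compound inherits from both of its parents, so the taxonomy is a lattice rather than a strict tree.

```python
# Each Python class mirrors one class-def / subclass-of statement above.


class Substance:
    """Top concept referred to by the encoding (subclass-of substance)."""


class Chemical(Substance):
    """class-def chemical, subclass-of substance."""


class Molecule(Chemical):
    """class-def molecule, subclass-of chemical."""


class Compound(Chemical):
    """class-def compound, subclass-of chemical."""


class MolecularCompound(Molecule, Compound):
    """class-def molecular-compound, subclass-of molecule and compound."""


if __name__ == "__main__":
    # Linearised ancestry, most specific first.
    print([cls.__name__ for cls in MolecularCompound.__mro__])
```

Like these asserted-only class-defs, Python classes cannot recognise that something is a molecule from its properties; membership must be stated. Defined classes with slot constraints are what add that recognition.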
Molecules Revisited
[Figure: the molecule sketch revised: Non-Ionic Macromolecular Compound is added alongside Ionic Macromolecular Compound; the remaining concepts are as in the previous sketch]
More Encoding
class-def chemical
    subclass-of substance
class-def defined molecule
    subclass-of chemical
    slot-constraint contains-bond min-cardinality 1 has-value covalent-bond
class-def defined compound
    subclass-of chemical
    slot-constraint has-atom-types greater-than 1
class-def defined molecular-compound
    subclass-of molecule and compound
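Because molecule, compound and molecular-compound are declared defined, their conditions are sufficient as well as necessary: anything that meets them is recognised as a member. A minimal Python sketch of that recognition step for individuals, assuming every sample is already a chemical, reading the contains-bond constraint as "has at least one covalent bond" and has-atom-types greater-than 1 as "contains more than one distinct element". The Sample type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A hypothetical chemical individual, described by its bonds and elements."""

    name: str
    bond_types: frozenset[str]
    atom_types: frozenset[str]


def is_molecule(s: Sample) -> bool:
    # Defined: a chemical containing at least one covalent bond.
    return "covalent" in s.bond_types


def is_compound(s: Sample) -> bool:
    # Defined: a chemical with more than one atom type.
    return len(s.atom_types) > 1


def is_molecular_compound(s: Sample) -> bool:
    # Defined as molecule and compound, so membership follows from both tests.
    return is_molecule(s) and is_compound(s)


if __name__ == "__main__":
    samples = [
        Sample("water", frozenset({"covalent"}), frozenset({"H", "O"})),
        Sample("oxygen gas", frozenset({"covalent"}), frozenset({"O"})),
        Sample("sodium chloride", frozenset({"ionic"}), frozenset({"Na", "Cl"})),
    ]
    for s in samples:
        print(s.name, is_molecule(s), is_compound(s), is_molecular_compound(s))
```

Water is recognised as a molecular compound, oxygen gas only as a molecule (a molecular element in the sketch), and sodium chloride only as a compound (an ionic compound), without any of them being asserted into those classes.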
Expansion
• Sketch and encode in cycles
• Build a taxonomy of a small portion
• Then build links to other portions
• Add more detail
• Document sources, author, date and argumentation.
Summary
• An ontology captures knowledge for a shared understanding
• The important question is not whether an artefact is an ontology, but whether it does any good
• Making our understanding of a domain explicit, consistent and processable
• Bioinformatics resources are knowledge resources – they need to be both human and machine understandable