Knowing what we’re talking about
Robert Stevens
Bio-health Informatics Group
School of Computer Science
University of Manchester
Oxford Road
Manchester
United Kingdom
M13 9PL
We have an item of data
• 27
• 27 what?
• What units? With what is 27 associated?
• Even if I told you, would we interpret what I said in the same way?
27
27 mm
tail of 27 mm
Mouse tail of 27 mm
• … and we can carry on: mouse strain, where it was raised, on what it was fed, times, dates, etc.
• All this data is necessary to interpret my original number
• Even if that metadata exists, we have to agree on the things the numbers describe
mouse tail of 27mm
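The step from a bare "27" to "mouse tail of 27 mm" can be sketched in code. This is only an illustration: the `Measurement` class and its field names are assumptions made for this example, not part of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A number plus the minimum metadata needed to interpret it (hypothetical model)."""
    value: float
    unit: str       # e.g. "mm"
    part: str       # the thing measured, e.g. "tail"
    organism: str   # whose part it is, e.g. "mouse"

    def describe(self) -> str:
        # Render the measurement the way the slide spells it out.
        return f"{self.organism} {self.part} of {self.value:g} {self.unit}"


bare_number = 27  # uninterpretable on its own
annotated = Measurement(value=27, unit="mm", part="tail", organism="mouse")
print(annotated.describe())
```

Even with every field filled in, the values "tail" and "mouse" are still just strings; two groups only mean the same thing by them if they agree on what those labels refer to.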
What is knowledge?
Heterogeneity is rife
• We agree on units (more or less)…
• We don’t agree on much else when it comes to labels for the entities in our domain
• If we don’t know what we’re talking about….
• It’s difficult to interpret and exchange data and the results from data
Categories and Category Labels
GO:0000368
U2-type nuclear mRNA 5' splice site recognition
spliceosomal E complex formation
spliceosomal E complex biosynthesis
spliceosomal CC complex formation
U2-type nuclear mRNA 5'-splice site recognition
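One category, many labels: a minimal sketch of resolving the labels listed above to the single identifier GO:0000368. The `normalise` and `lookup` helpers are hypothetical names for this example, and the normalisation rule (ignore case, hyphens and extra spaces) is an assumption, not how any particular GO tool works.

```python
# Labels taken from the slide; all refer to the same category.
LABELS_BY_ID = {
    "GO:0000368": [
        "U2-type nuclear mRNA 5' splice site recognition",
        "spliceosomal E complex formation",
        "spliceosomal E complex biosynthesis",
        "spliceosomal CC complex formation",
        "U2-type nuclear mRNA 5'-splice site recognition",
    ],
}


def normalise(label: str) -> str:
    """Lower-case, treat hyphens as spaces, collapse whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


# Reverse index: normalised label -> category identifier.
ID_BY_LABEL = {
    normalise(label): term_id
    for term_id, labels in LABELS_BY_ID.items()
    for label in labels
}


def lookup(label: str):
    """Return the identifier for a label, or None if the label is unknown."""
    return ID_BY_LABEL.get(normalise(label))
```

With this index, `lookup("Spliceosomal E complex formation")` and `lookup("U2-type nuclear mRNA 5'-splice site recognition")` both resolve to the same identifier, which is what lets data annotated with different labels be compared.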
The Ogden Triangle
“Roast Beef”
Concept
[Ogden, Richards, 1923]
• Humans require words (or at least symbols) to communicate efficiently. The mapping of words to things is only indirectly possible. We do it by creating concepts that refer to things.
• The relation between symbols and things has been described in the form of the meaning triangle:
We need to know what we’re talking about…
• … if we don’t, our data are useless
• If we are to interpret our data then we need to know what entities they describe
• We need to share data and re-use it
• We need to find data; compare data; analyse data
• We need to know what we know….
Manchester Mercury, January 1st 1754
List of diseases & casualties this year
Aged 1456
Consumption 3915
Convulsion 5977
Dropsy 794
Fevers 2292
Smallpox 774
Teeth 961
Bit by mad dogs 3
Broken Limbs 5
Bruised 5
Burnt 9
Drowned 86
Excessive Drinking 15
Executed 18
Found Dead 34
Frighted 2
Kill'd by falls and other accidents 55
Kill'd themselves 36
Murdered 3
Overlaid 40
Poisoned 1
Scalded 5
Smothered 1
Stabbed 1
Starved 7
Suffocated 5
19276 burials
15444 christenings
Deaths by centile
A World of Instances
• The world (of information) is made up of things and lots of them
• Instances, individuals, objects, tokens, particulars.
• The Earth is an instance of Planet
• Robert Stevens (NE 67 41 58 A) is a Person
• All the individual Alpha Haemoglobins in my many Instances of Red Blood Cell
• Each cell instance in my Body has copies of some 30,000 Genes
• A Word, language, idea, etc.
• This Table, those Chairs,
• Any Thing with “A”, “The”, “That”, etc. before it….
We Put things into Categories
• All these instances hang about making our world
• Putting these things into categories is a fundamental part of human cognition
• Psychologists study this as concept formation
• Similar instances are put into the same category
We have Labels for the Categories and their Instances
• We label categories with symbols: Words
• “Lion” is a category of big cat with big teeth
• Gene, Protein, Cell, Person, Hydrolase Activity, etc.
• …and, as we’ve already seen, each category can have many labels and any particular label can refer to more than one category
• Semantic Heterogeneity
• “A lion” is an instance in that category
• Does the category “Lion” exist?
• Lions exist, but the category could just be a human way of talking about lions
• … we like putting things into categories
A Controlled Vocabulary
• A specified set of words and phrases for the categories in which we place instances
• Natural language definitions for those words and phrases
• A glossary defines, but doesn’t control
• The UniProt keywords define and control
• Control is placed upon which labels are used to represent the categories (concepts) we’ve used to describe the instances in the world
• …, but there is nothing about how things in these categories are related
Biopolymer
DNA
Enzyme
Nucleic acid
mRNA
Polypeptide
snRNA
tRNA
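"Control" can be read as a simple membership check: an annotation is accepted only if it uses one of the agreed labels. The sketch below uses the eight terms listed above; `check_annotations` is a hypothetical helper written for this example.

```python
# The flat controlled vocabulary from the slide: agreed labels, no relationships.
CONTROLLED_TERMS = frozenset({
    "Biopolymer", "DNA", "Enzyme", "Nucleic acid",
    "mRNA", "Polypeptide", "snRNA", "tRNA",
})


def check_annotations(annotations):
    """Return the annotations that are NOT in the controlled vocabulary."""
    return [term for term in annotations if term not in CONTROLLED_TERMS]


rejected = check_annotations(["DNA", "messenger RNA", "tRNA"])
print(rejected)
```

The check rejects "messenger RNA" even though a biologist would read it as mRNA. A flat list like this controls labels but, as the slide says, records nothing about how the categories relate to one another.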
We also like to Relate Things Together
• Categories have subcategories
• Instances in one category can be related in some way to instances in another
• Can relate instances to each other in many different ways
• Is-a, part-of, develops-from, etc. axes
• We can use these relationships to classify categories
• Things in category A are part of things in category B
• If all instances in category A are also in category B then As are kinds of Bs
Biopolymer
  Nucleic Acid
    DNA
    RNA
      tRNA
      mRNA
      snRNA
  Polypeptide
    Enzyme
Categories and sub-categories
biopolymer
polypeptide
Nucleic acid
enzyme
DNA
RNA
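A category hierarchy like this one can be held as a child-to-parent map and queried by walking upwards. This is a sketch only: the parent assignments (enzyme under polypeptide, DNA and RNA under nucleic acid) are an assumed reading of the slide's outline, and `ancestors` and `is_kind_of` are names invented for the example.

```python
# Child -> parent ("is-a") links, as read off the outline above (assumed layout).
IS_A = {
    "polypeptide": "biopolymer",
    "nucleic acid": "biopolymer",
    "enzyme": "polypeptide",
    "DNA": "nucleic acid",
    "RNA": "nucleic acid",
}


def ancestors(category):
    """List every super-category, nearest first, by following is-a links."""
    found = []
    while category in IS_A:
        category = IS_A[category]
        found.append(category)
    return found


def is_kind_of(child, parent):
    """True if every instance of `child` is also an instance of `parent`."""
    return child == parent or parent in ancestors(child)


print(ancestors("enzyme"))
print(is_kind_of("DNA", "biopolymer"))
```

Because is-a is transitive, a query for all biopolymers can also return data annotated only as "enzyme" or "DNA", which a flat vocabulary cannot do.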
Describing Category Membership
• We can make conditions that any instance must fulfil in order to be a member of a particular category
• A Phosphatase must have a phosphatase catalytic domain
• A Receptor must have a transmembrane domain
• A codon has three nucleotide residues
• A limb has part that is a joint
• A man has a Y chromosome and an X chromosome
• A woman has only an X chromosome
Relationships
• These conditions are made from a property and a successor relationship
• isPartOf, hasPart
• isDerivedFrom
• DevelopsFrom
• isHomologousTo
• …and many, many more
A Structured Controlled Vocabulary
• Not only can we agree on the labels we give categories
• Can also agree on how the instances of categories are related
• And agree on the labels we give the relations
• Structure aids querying and captures knowledge with greater fidelity
Biopolymer
  Nucleic Acid
    DNA
    RNA
      tRNA
      mRNA
      snRNA
  Polypeptide
    Enzyme
Gene
Relations: regionOf, transcribedFrom, translatedFrom
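Once the relations themselves have agreed labels, the vocabulary can be stored as (subject, relation, object) triples and queried. In the sketch below the relation names come from the slide, but which categories each relation connects is an assumption about the original diagram, and `objects_of` is a hypothetical helper.

```python
# (subject, relation, object) triples; the endpoints are an assumed reading
# of the diagram, only the relation labels are taken from the slide.
TRIPLES = [
    ("Gene", "regionOf", "DNA"),
    ("mRNA", "transcribedFrom", "Gene"),
    ("Polypeptide", "translatedFrom", "mRNA"),
]


def objects_of(subject, relation):
    """Return everything that `subject` is linked to by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def trace_back(category):
    """Follow any relation from `category` until nothing more is linked."""
    path = [category]
    while True:
        nxt = [o for s, _, o in TRIPLES if s == path[-1]]
        if not nxt:
            return path
        path.append(nxt[0])


print(objects_of("mRNA", "transcribedFrom"))
print(trace_back("Polypeptide"))
```

Chaining the relations answers a question no single label can: what a polypeptide ultimately derives from.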
A Stronger Definition
• a set of logical axioms designed to account for the intended meaning of a formal vocabulary used to describe a certain (conceptualisation of) reality (described in an information system) [Guarino 1998]
• “conceptualisation of” inserted by me
• “Logical axioms” means a formal definition of meaning of terms in a formal language
• Formal language: something a computer can reason with
• Use symbols to make inferences
• Symbols represent things and their relationships
• Making inferences about things computationally
So what is an ontology?
Catalog/ID
Thesauri
Terms/glossary
Informal is-a
Formal is-a
Formal instance
Frames (properties)
General logical constraints
Value restrictions
Disjointness, inverse, part-of
Gene Ontology
Mouse Anatomy
EcoCyc
PharmGKB
TAMBIS
Arom
After Chris Welty et al.
What does it all mean anyway
• To interpret our data we need to know what it is we’re talking about
• We need to decide the things that we’re talking about and agree upon them
• We need to agree on how to recognise those entities
• We need to know how they are related to one another
• Ontologies are a mechanism for describing those entities and their definitions
• There’s more to knowledge representation than ontologies…
All this knowledge needs representing
• We want this knowledge in a computational form
• To make the knowledge available for software (and humans)
• To help us develop and manage the (often) complex artefacts
Building ontologies is hard (getting all those relationships in the right place)
The Web Ontology Language (OWL) is a W3C recommendation for ontologies on the Semantic Web and in semantically enabled applications
A knowledge representation language with a strict semantics that is amenable to automated reasoning
Web Ontology Language (OWL)
• W3C recommendation for ontologies for the Semantic Web
• OWL-DL mapped to a decidable fragment of first order logic
• Classes, properties and instances
• Boolean operators, plus existential and universal quantification
• Rich class expressions used in restrictions on properties, e.g. hasDomain some (ImmunoglobulinDomain or FibronectinDomain)
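The restriction "hasDomain some (… or …)" says that at least one hasDomain filler must belong to one of the listed classes. The plain-Python sketch below mimics that check over data already in hand; it is not an OWL reasoner (OWL reasons under an open-world assumption, whereas this is a closed-world lookup), the `has_some` helper and the example protein are hypothetical, and the first class name is spelled ImmunoglobulinDomain here.

```python
# Hypothetical individual: property name -> classes of its fillers.
protein = {"hasDomain": ["FibronectinDomain", "PhosphataseCatalyticDomain"]}


def has_some(individual, prop, allowed_classes):
    """Existential restriction: at least one filler of `prop` is in `allowed_classes`."""
    return any(cls in allowed_classes for cls in individual.get(prop, []))


# hasDomain some (ImmunoglobulinDomain or FibronectinDomain)
restriction = {"ImmunoglobulinDomain", "FibronectinDomain"}
matches = has_some(protein, "hasDomain", restriction)
print(matches)
```

The "or" becomes a set of acceptable classes and "some" becomes `any`; a universal restriction ("only") would use `all` over the same fillers instead.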
What are we saying?
Person
Woman is-a Person
Man is-a Person
• Are all instances of Man instances of Person?
• Can an instance of Person be both an instance of Man and an instance of Woman?
• Can there be any more kinds of Person?
What are we saying?
• What kinds of class can fill “has chromosome”?
• How many “Y chromosome” are present?
• Does there have to be a “Y chromosome”?
• What properties are sufficient to be a Man and which are simply necessary?
Man has-chromosome Y chromosome
Man has-chromosome Y chromosome (1)
Man has-chromosome X chromosome (1)
Man has-chromosome autosome (44)
OWL represents classes of instances
A
B
C
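Treating classes as sets of instances makes the earlier Person / Man / Woman questions concrete: subclass becomes subset, "cannot be both" becomes disjointness, and "any more kinds?" asks whether the subclasses cover the parent. The individuals below are invented for illustration.

```python
# Classes as sets of (hypothetical) instances.
person = {"alice", "bob", "carol"}
man = {"bob"}
woman = {"alice"}

# Are all instances of Man instances of Person?  (subclass = subset)
all_men_are_people = man <= person

# Can an instance be both a Man and a Woman?  (disjoint = empty intersection)
man_and_woman_disjoint = man.isdisjoint(woman)

# Can there be any more kinds of Person?  (covering = union equals the parent)
man_and_woman_cover_person = (man | woman) == person

print(all_men_are_people, man_and_woman_disjoint, man_and_woman_cover_person)
```

In this toy data "carol" is a Person who is in neither subclass, so the covering test fails. The is-a arrows on their own say nothing about disjointness or covering, which is why an ontology language needs ways to state them explicitly.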
Necessity and Sufficiency
• An R2A phosphatase must have a fibronectin domain
• Having a fibronectin domain does not a phosphatase make
• Necessity -- what must a class instance have?
• Any protein that has a phosphatase catalytic domain is a phosphatase enzyme
• All phosphatase enzymes have a catalytic domain
• Sufficiency – how is an instance recognised to be a member of a class?
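The difference matters when classifying: a sufficient condition lets us conclude membership, while a necessary condition can only rule membership out. A minimal sketch using the slide's two examples follows; the function names and the string encoding of domains are assumptions made for the illustration.

```python
def is_phosphatase(domains):
    """Sufficient condition: a phosphatase catalytic domain makes a protein a phosphatase."""
    return "phosphatase catalytic domain" in domains


def r2a_phosphatase_status(domains):
    """Necessary-only condition: a fibronectin domain is required but proves nothing."""
    if "fibronectin domain" not in domains:
        return "not a member"   # a necessary condition fails, so membership is ruled out
    return "unknown"            # the condition holds, but that alone is not enough


print(is_phosphatase({"phosphatase catalytic domain"}))
print(r2a_phosphatase_status({"fibronectin domain"}))
print(r2a_phosphatase_status({"transmembrane domain"}))
```

A class defined only by necessary conditions can never be recognised automatically; it takes at least one sufficient condition before a reasoner can place an instance in it.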
Uses of ontologies
Ontologies in software
Problems Ontologies in Biology Try To Solve
• Provenance – where did it come from, who did it?
• Reproducibility – can I repeat and find results reported?
• Sharing – can others understand your data?
• Integration – can I readily take multiple (thousands of) data sets and use them without preparation?
• New knowledge – can we infer new knowledge as a sum of current knowledge (computationally)?
The rise and rise of ontologies
What are the prospects for ontologies