Construction And Evaluation Of OWL-DL Ontologies
Mark Wilkinson
Assistant Professor
Department of Medical Genetics
University of British Columbia
iCAPTURE Centre, St. Paul’s Hospital
Presenting the work of
Benjamin Good, M.Sc.
Wilkinson Laboratory
Bioinformatics Doctoral Programme, UBC
Our Perspective
“We believe that [centralized ontology building] efforts are unsustainable and that the Semantic Web will eventually be built in the same way as the WWW was – by its users”
Good and Wilkinson, “The Life Sciences Semantic Web is Full of Creeps!”, Briefings in Bioinformatics (in press)
Why Do We Think This Way?
BioMoby: Mass collaborative ontology building to support Web Services Interoperability
What Does BioMoby Do?
The MOBY Plan
Create an ontology of bioinformatics data-types
Define an XML representation of this ontology
Create an ontology of bioinformatics operations
Open these ontologies to public input
Define Web interfaces vis-à-vis these two ontologies
Register Interfaces in an ontology-aware Registry
A Machine can find an appropriate service
A Machine can execute that service unattended
Ontology is community-extensible
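As a rough illustration of how an ontology-aware registry lets a machine find an appropriate service, the sketch below matches a piece of data against services that accept its type or any ancestor of that type. The data-type names, service names and functions are all hypothetical and are not taken from the BioMoby registry or its API.

```python
from dataclasses import dataclass

# Hypothetical data-type ontology: child -> parent ("is a") links.
DATATYPE_PARENT = {
    "DNASequence": "GenericSequence",
    "AminoAcidSequence": "GenericSequence",
    "GenericSequence": "Object",
}


def ancestors(datatype):
    """Return the datatype followed by all of its ancestors, nearest first."""
    chain = [datatype]
    while chain[-1] in DATATYPE_PARENT:
        chain.append(DATATYPE_PARENT[chain[-1]])
    return chain


@dataclass(frozen=True)
class Service:
    name: str          # hypothetical service name
    input_type: str    # term from the data-type ontology
    operation: str     # term from the operations ontology


# Hypothetical registry contents.
REGISTRY = [
    Service("getReverseComplement", "DNASequence", "Transformation"),
    Service("runBlast", "GenericSequence", "Alignment"),
    Service("predictDomains", "AminoAcidSequence", "Annotation"),
]


def find_services(datatype, registry=REGISTRY):
    """Names of services whose input is the datatype or one of its ancestors."""
    accepted = set(ancestors(datatype))
    return [s.name for s in registry if s.input_type in accepted]


print(find_services("DNASequence"))
```

Because matching walks up the "is a" links, a newly registered data-type becomes usable by every service that accepts one of its ancestors, which is one way to read "community-extensible".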
Take home message…this was built by a community of non-expert ontologists!
Open Kimono Time
The BioMoby ontology is quite messy…
…communal brains can build useful ontologies, but we will need better tooling
How are ontologies usually constructed?
By A Few People With Lots Of Moola!
Gene Ontology
Curated: ~5 full-time staff
$25 Million (Lewis, S., personal communication)
National Cancer Institute Metathesaurus
Curated: ~12 full-time staff
$75 Million (personal estimate)
Health Level 7 (HL7)
Curated – staffing unknown
$15 Billion(?) (Smith, Barry, KBB Workshop, and Montreal, 2005)
Why does it cost so much??
To build the Semantic Web for Life Sciences we need to encode knowledge from EVERY domain of biology – from barley root apex structure and function, to HIV clinical-trials outcomes… and this knowledge is constantly changing!
At >>$25M a pop, can we afford the Semantic Web???
The iCAPTURer Method
Template-Assisted Ontology Construction
Pre-iCAPTURer
Extract the brain of one or a very few experts – expensive and time-consuming…
iCAPTURer
Consume as many brains as possible
The iCAPTURer Experiment
Hypotheses
With a starting thesaurus of concepts
With a clear, simple interface for linking them
“wet” researchers can create a robust ontology themselves
Using carefully-defined templates, a Knowledge Engineer can control the structure of an ontology
without controlling, nor even understanding, the content
Knowledge Capture Parameters
Domain: Cardiovascular and Pulmonary disease, both clinical and molecular
Capture Scope
Thesaurus construction
Definitions (unevaluated)
Synonymy (same as) relations
Hyponymy (is a) relations
Ontology Task: Ontological classification of conference abstracts to aid in semantic searching
Interface
Chatterbot
“I’ve heard that a cardiac myocyte is a type of cardiac cell. Is this true?”
“I’ve heard that STEMI means the same thing as ST Elevated Myocardial Infarction. Is that nonsense, or is it correct?”
“How do you feel about your mother?”
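The chatterbot questions above follow fixed templates, one per relation type, so the knowledge engineer fixes the structure while volunteers supply the content. A minimal sketch of that idea follows. The template wording is taken from the example questions; the function names and the yes/no tallying are assumptions made for illustration, not the iCAPTURer implementation.

```python
# One question template per relation type (wording from the slide examples).
TEMPLATES = {
    "is_a": "I've heard that a {subject} is a type of {object}. Is this true?",
    "same_as": (
        "I've heard that {subject} means the same thing as {object}. "
        "Is that nonsense, or is it correct?"
    ),
}


def ask(relation, subject, obj):
    """Render the question for one candidate statement."""
    return TEMPLATES[relation].format(subject=subject, object=obj)


def record_answer(store, relation, subject, obj, answer):
    """Tally a yes/no answer for a candidate statement (assumed bookkeeping)."""
    key = (subject, relation, obj)
    votes = store.setdefault(key, {"yes": 0, "no": 0})
    votes["yes" if answer else "no"] += 1
    return votes


store = {}
print(ask("is_a", "cardiac myocyte", "cardiac cell"))
record_answer(store, "is_a", "cardiac myocyte", "cardiac cell", True)
```

Each recorded answer corresponds to one knowledge capture event. How conflicting answers are reconciled is left open here.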
Results Over 5 days
Concepts accepted and expert-validated: 661
Text-mined concepts rejected: 232
Relationships captured: 547
Number of distinct expert knowledge capture events in 5 days: >12,000!!
This is approximately the size of the GO
Cost: 4 pints of beer, 4 coffee mugs, 3 T-shirts, 1 chocolate Moose
Was built entirely by volunteers
Full details of this experiment are available in: Proceedings of the Pacific Symposium on Biocomputing, 2006
Subjective iCAPTURer Observations
Humans had an extremely difficult time classifying things into pre-existing categories
Humans had an extremely difficult time defining new categories and placing them into the existing classification system
How Do We Know If It Is Any Good?
Templates control structure, but not content
Structurally sound, logically valid ontologies can still be nonsensical!
How do we measure the quality of an ontology?
Possible Quality Metrics
Domain independent
Philosophical desiderata: Slow, subjective
Graphical structure: Fast, questionable value
Satisfiability: Fast, useful, not enough
Instance-based: Fast in theory, useful…
Domain specific
“Fit” to text: Fast, dependent on NLP
Similarity to a gold standard: Fast to run, extremely slow to set up
Task-based: Real, but not generalizable
Problem: Evaluating the metrics
No clear winner has yet emerged from the morass of metrics
A “global” winner is unlikely to be found
Each seems to have some benefits and some disadvantages
Each may be useful for one ontology but not another
How do we evaluate which metrics are useful for evaluating our ontologies?
Ontology Permutation As A Metrics-Evaluation Tool
Take an ontology that everyone agrees is “good”
Make it worse by systematically adding random changes (noise)
Quality metric should correlate with the amount of noise added
An Objective Comparison Of Ontology Quality Metrics
[Figure: Measured Ontology Quality versus Amount of noise added (ontology quality decreasing), with curves for Quality Metric 1 and Quality Metric 2]
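One way to make "should correlate with the amount of noise added" concrete is a rank correlation between the noise level and each metric's score. The sketch below uses Spearman's coefficient, which is a choice made here and is not named in the slides. The scores are invented purely for illustration and are not results from this work.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented illustrative numbers, NOT measurements from the presentation.
noise = [0.0, 0.1, 0.2, 0.3, 0.4]
metric_1 = [0.95, 0.80, 0.62, 0.41, 0.20]  # falls steadily as noise rises
metric_2 = [0.70, 0.72, 0.69, 0.71, 0.70]  # barely responds to noise

print(spearman(noise, metric_1), spearman(noise, metric_2))
```

A coefficient near -1 (or +1, depending on the metric's direction) suggests the metric tracks the degradation. A coefficient near 0 suggests it does not.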
Adding Noise To Ontologies
Maintain same number of classes and relationships as well as satisfiability
Add noise by swapping relationships attached to pairs of classes
Sub/superclass
Domain/range, etc.
Validate with Pellet reasoner
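The swapping step can be sketched on a toy taxonomy stored as child -> superclass links. This is a simplified stand-in for the procedure. It handles only sub/superclass swaps, and a plain cycle check takes the place of validation with a reasoner (Pellet is not invoked here). The class names and function names are illustrative assumptions.

```python
import random

# Toy taxonomy: class -> superclass. The root ("Aquatic things") has no entry.
PARENT = {
    "dolphins": "Air breathing",
    "fishermen": "Air breathing",
    "fish": "Water breathing",
    "sharks": "Water breathing",
    "Air breathing": "Aquatic things",
    "Water breathing": "Aquatic things",
}


def has_cycle(parent):
    """True if following superclass links from any class loops back on itself."""
    for start in parent:
        seen = {start}
        node = parent.get(start)
        while node is not None:
            if node in seen:
                return True
            seen.add(node)
            node = parent.get(node)
    return False


def add_noise(parent, swaps, seed=0):
    """Swap the superclasses of randomly chosen class pairs, `swaps` times.

    The number of classes and of relationships is unchanged by construction.
    Swaps that would create a cycle are undone (a crude validity check).
    """
    rng = random.Random(seed)
    noisy = dict(parent)
    classes = sorted(noisy)
    done = attempts = 0
    while done < swaps and attempts < 1000 * max(swaps, 1):
        attempts += 1
        a, b = rng.sample(classes, 2)
        if noisy[a] == noisy[b]:
            continue  # swapping identical superclasses changes nothing
        noisy[a], noisy[b] = noisy[b], noisy[a]
        if has_cycle(noisy):
            noisy[a], noisy[b] = noisy[b], noisy[a]  # undo the invalid swap
            continue
        done += 1
    return noisy


print(add_noise(PARENT, swaps=1, seed=1))
```

Passing a seed keeps each permuted ontology reproducible, which helps when comparing metrics across the same series of noise levels.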
Quantifying Noise
Simple number of changes is misleading, and not a good measure of “noise”
Noise better quantified by the degree of (dis)similarity between the permuted ontology and the source ontology
Maedche, A. and S. Staab, Measuring Similarity between Ontologies. Lecture Notes in Computer Science. 2002. 251
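To show what measuring (dis)similarity between a permuted and a source ontology can look like, the sketch below compares each class's set of superclasses in the two taxonomies and averages the overlap. It is a deliberately simplified overlap score, assumed here for illustration, and is not the measure defined by Maedche and Staab. The toy class names are also assumptions.

```python
def ancestors(parent, cls):
    """All superclasses of cls, following child -> superclass links upward."""
    found = set()
    node = parent.get(cls)
    while node is not None and node not in found:
        found.add(node)
        node = parent.get(node)
    return found


def similarity(source, permuted):
    """Mean Jaccard overlap of each class's superclass set in both ontologies."""
    classes = sorted(set(source) | set(permuted))
    total = 0.0
    for cls in classes:
        a, b = ancestors(source, cls), ancestors(permuted, cls)
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(classes)


SOURCE = {
    "dolphins": "Air breathing",
    "fishermen": "Air breathing",
    "fish": "Water breathing",
    "sharks": "Water breathing",
    "Air breathing": "Aquatic things",
    "Water breathing": "Aquatic things",
}
# One swap: dolphins and fish exchange superclasses.
PERMUTED = dict(SOURCE, dolphins="Water breathing", fish="Air breathing")

noise = 1.0 - similarity(SOURCE, PERMUTED)
print(round(noise, 3))
```

Unlike a raw count of swaps, a score of this kind depends on where a change lands. The same comparison applied to a swap near the root would alter the superclass sets of every class beneath it.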
Example Of Similarity Measurement
Semantic distance
[Figure: air-centric ontology. Root “Aquatic things” with branches “Air breathing”, “Water breathing” and “non breathing”; leaves: fishermen, dolphins, fish, seaweed, anchovies, tuna, sharks, ships, sand, water; edges numbered 1 to 4]
Air-centric Ontology Semantic Distance
Dolphins, Fishermen: 0
Dolphins, Fish: 4
Example Of Similarity Measurement
Semantic distance
[Figure: leg-centric ontology. Root “Aquatic things” with branches “Has legs” and “No legs”; leaves: fishermen, fish, seaweed, anchovies, tuna, sharks, dolphins, ships, sand, water; edges numbered 1 to 4]
Leg-centric Ontology Semantic Distance
Dolphins, Fishermen: 4
Dolphins, Fish: 0
Conclusions
Communities can build useful ontologies
Better tools make better ontologies
Chatterbot templates seem to work well
Could easily be incorporated into existing software tools for dynamic, organization-wide knowledge capture!
Ontology evaluation is hard!
Some non-task-based evaluation metrics are showing promise
Genome Canada
Genome Alberta
Genome British Columbia
GA: A Bioinformatics Platform for Genome Canada
GBC: Better Biomarkers in Transplantation
Canadian Institutes For Health Research
Bioinformatics Training Program