How to build an ontology 2


Page 1: How to build an ontology 2

1

How to build an ontology 2

Barry Smith

http://ontology.buffalo.edu/smith

Page 2: How to build an ontology 2

2

Page 3: How to build an ontology 2

The 3-level Distinction

Level 1: everything that exists (things, processes, data …);

Level 2: ideas in people’s minds (diagnoses, thoughts, images in your head, expectations, beliefs, fears …)

Level 3: publicly available (published, written down, drawn, recorded, saved) versions of level 2 entities (ontologies, databases, journal articles, newspaper reports, diaries …)

Page 4: How to build an ontology 2

The 3-level Distinction

Level 1:

#120: an incident that happened;

Level 2:

#213: the interpretation by some cognitive agent that #120 is a security breach;

#31: the expectation by some cognitive agent that similar incidents might happen in the future;

Level 3:

#402: an entry in an information system concerning #120;

#1503: an entry in some other information system about #31 for mitigation or prevention purposes.

Page 5: How to build an ontology 2

5

How do we know which general terms designate universals?

Roughly: terms used by scientists to designate entities about which we have a plurality of different kinds of testable proposition

(cell, electron ...)

Page 6: How to build an ontology 2

More precisely: terms which designate universals are:

1. General

2. Used in current scientific textbooks to express laws of nature

3. Logically non-compound (‘non-rabbit’, ‘rabbit or violin’ do not designate universals)

4. Contain no parts designating particulars (‘cat in Leipzig’, ‘Finnish spy’ do not designate universals)

6

Page 7: How to build an ontology 2

7

Class =def. a maximal collection of particulars determined by a general term (‘cell’, ‘electron’, but also: ‘restaurant in Palo Alto’, ‘Italian’)

the class A = the collection of all particulars x for which ‘x is A’ is true

Page 8: How to build an ontology 2

8

universals vs. their extensions

universals

{a,b,c,...} collections of particulars

Page 9: How to build an ontology 2

9

Extension =def

The extension of a universal A is the class: instance of the universal A

(it is the class of A’s instances)

(the class of all entities to which the term ‘A’ applies)

Page 10: How to build an ontology 2

10

Problem

The same general term can be used to refer both to universals and to collections of particulars. Consider:

HIV is an infectious retrovirus

HIV is spreading very rapidly through Asia

Page 11: How to build an ontology 2

11

universals vs. classes

universals

{c,d,e,...} classes

Page 12: How to build an ontology 2

12

universals vs. classes

universals

defined classes

Page 13: How to build an ontology 2

13

universals vs. classes

universals

populations, ...

Page 14: How to build an ontology 2

14

Defined class =def

a class defined by a general term which does not designate a universal

the class of all diabetic patients in Leipzig on 4 June 1952

Page 15: How to build an ontology 2

15

OWL is a good representation of defined classes

• sibling of Finnish spy

• member of Abba aged > 50 years

Page 16: How to build an ontology 2

16

Terminology =def.

a representational artifact whose representational units are natural language terms (with IDs, synonyms, comments, etc.) which are intended to designate universals together with defined classes.

Page 17: How to build an ontology 2

17

universals, classes, concepts

universals

defined classes

‘concepts’ ?

Page 18: How to build an ontology 2

18

universals < defined classes < ‘concepts’

‘concepts’ which do not correspond to defined classes:

‘Surgical or other procedure not carried out because of patient's decision’

‘Congenital absent nipple’

because they do not correspond to anything

Page 19: How to build an ontology 2

19

(Scientific) Ontology =def.

a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent

1. universals in reality

2. those relations between these universals which obtain universally (= for all instances)

lung is_a anatomical structure

lobe of lung part_of lung

Page 20: How to build an ontology 2

20

Part II: How to Build an Ontology

Page 21: How to build an ontology 2

21

How to build an ontology

work with scientists to create an initial top-level classification

find ~50 most commonly used terms corresponding to universals in reality

arrange these terms into an informal is_a hierarchy according to this Universality principle

A is_a B =def. every instance of A is an instance of B

fill in missing terms to give a complete hierarchy

(leave it to domain scientists to populate the lower levels of the hierarchy)
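The Universality principle can be spot-checked against instance data. The following is a minimal sketch, assuming each term comes with a set of known instances; the terms and instance names are hypothetical and not taken from the slides. Finite data can only refute an is_a assertion, never prove that it holds for all instances.

```python
# Hypothetical toy data: known instances per term (not from the slides).
instances = {
    "virus": {"v1", "v2", "v3"},
    "retrovirus": {"v1", "v2"},
    "cell": {"c1"},
}


def is_a_holds(child: str, parent: str) -> bool:
    """A is_a B is consistent with the data if every known instance of A is an instance of B."""
    return instances[child] <= instances[parent]


def check_hierarchy(edges):
    """Return the asserted (child, parent) is_a edges that the instance data contradict."""
    return [(c, p) for c, p in edges if not is_a_holds(c, p)]


asserted = [("retrovirus", "virus"), ("cell", "virus")]
violations = check_hierarchy(asserted)
print(violations)
```

An edge that survives this check is only "not yet refuted"; whether it holds universally remains a question for the domain scientists.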

Page 22: How to build an ontology 2

22

Principle of Low Hanging Fruit

Include even absolutely trivial assertions (assertions you know to be universally true)

pneumococcal virus is_a virus

Computers need to be led by the hand

Page 23: How to build an ontology 2

23

Goal: Each term in an ontology represents exactly one universal

there are universals also of collectivities:

population

complex of cells

Page 24: How to build an ontology 2

24

the use-mention confusion

swimming is healthy and has eight letters

Page 25: How to build an ontology 2

25

Principle

Avoid confusing words with things

Avoid confusing concepts in our minds with entities in reality

Recommendation: avoid the word ‘concept’ entirely

Page 26: How to build an ontology 2

26

Principle

For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings

(Don’t use ‘cell’ when you mean ‘plant cell’)

Page 27: How to build an ontology 2

27

Principle

Supply definitions wherever possible

(both human-understandable natural language definitions, and equivalent formal definitions)

Page 28: How to build an ontology 2

28

Principle

Each term should have at most one definition

which may have both natural-language and formal versions

Page 29: How to build an ontology 2

29

The Problem of Circularity

A Person = def. A person with an identity document

cell = def. plant cell, consisting of protoplast and cell wall; ...

Page 30: How to build an ontology 2

30

Principle

Avoid circular definitions

(The term defined should not appear in its own definition)
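Direct circularity of this kind can be caught mechanically. A minimal sketch, assuming definitions are stored as plain strings keyed by term; the third entry is a hypothetical non-circular definition added for contrast. It only detects a term occurring in its own definition, not circles that run through several definitions.

```python
import re

# The first two entries are the circular examples from the slides;
# the third is a hypothetical non-circular definition.
definitions = {
    "person": "a person with an identity document",
    "cell": "plant cell, consisting of protoplast and cell wall",
    "lobe of lung": "an anatomical structure which is part of a lung",
}


def is_circular(term: str, definition: str) -> bool:
    """True if the term occurs as a whole word or phrase in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


circular = sorted(t for t, d in definitions.items() if is_circular(t, d))
print(circular)
```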

Page 31: How to build an ontology 2

31

Principle

A definition should use terms which are easier to understand than the term defined

Page 32: How to build an ontology 2

32

Principle

Use Aristotelian definitions

An A is a B which C’s.

A human being is an animal which is rational
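One way to keep definitions in this form is to store the genus (the B) and the differentia (the C) as separate fields. The class and field names below are assumptions made for illustration, not part of any ontology tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AristotelianDefinition:
    term: str         # the A being defined
    genus: str        # the B: the immediate is_a parent
    differentia: str  # the C: what marks out the As among the Bs

    def render(self) -> str:
        """Render the definition in the 'A =def. B which C' pattern."""
        return f"{self.term} =def. {self.genus} which {self.differentia}"


human = AristotelianDefinition("human being", "animal", "is rational")
print(human.render())
```

Because the genus is an explicit field, the definition and the is_a hierarchy can be compared: the genus of a term should be its is_a parent.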

Page 33: How to build an ontology 2

33

Principle

Do not seek to define everything

Page 34: How to build an ontology 2

34

In every ontology

some terms and some relations are primitive = they cannot be defined (on pain of infinite regress)

Examples of primitive relations:

identity

instance_of

Page 35: How to build an ontology 2

35

Rules for formatting terms

• Avoid abbreviations even when it is clear in context what they mean (‘breast’ for ‘breast tumor’)

• Avoid acronyms

• Avoid mass terms (‘tissue’, ‘brain mapping’, ‘clinical research’ ...)

• Treat each term ‘A’ in an ontology as shorthand for a term of the form ‘the universal A’

Page 36: How to build an ontology 2

36

Univocity

Terms should have the same meanings on every occasion of use.

(= They should refer to the same universals)

Basic ontological relations such as is_a and part_of should be used in the same way by all ontologies

Page 37: How to build an ontology 2

37

Universality

Ontologies are made of relational assertions

They should include only those which hold universally

pneumococcal virus causes pneumonia

Page 38: How to build an ontology 2

38

Universality

Often, order will matter:

We can assert

adult transformation_of child

but not

child transforms_into adult

Page 39: How to build an ontology 2

39

Universality

viral pneumonia caused by virus

but not

virus causes pneumonia

pneumococcal virus causes pneumonia

Page 40: How to build an ontology 2

40

Universality

results analysis later_than protocol-design

BUT NOT

protocol-design earlier_than results analysis

Page 41: How to build an ontology 2

41

Positivity

Complements of universals are not themselves universals.

Terms such as

non-mammal

non-membrane

other metalworker in New Zealand

do not designate universals in reality

Page 42: How to build an ontology 2

42

Positivity

What about non-smoker?

Page 43: How to build an ontology 2

43

Objectivity

Which universals exist in reality is not a function of our knowledge.

Terms such as

unknown

unclassified

unlocalized

arthropathies not otherwise specified

do not designate universals in reality.
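Term names that violate Positivity or Objectivity often follow recognizable patterns, so a simple linter can flag candidates for human review. This is a sketch only: the two patterns are assumptions built from the examples on these slides, and a flagged term still needs a curator's judgment.

```python
import re
from typing import List

# Illustrative patterns only; a real project would curate its own lists.
NEGATIVE = re.compile(r"^(non-|other\b)", re.IGNORECASE)
EPISTEMIC = re.compile(
    r"\b(unknown|unclassified|unlocalized|not otherwise specified)\b",
    re.IGNORECASE,
)


def lint_term(term: str) -> List[str]:
    """Return the names of the principles a term name appears to violate."""
    problems = []
    if NEGATIVE.search(term):
        problems.append("positivity")
    if EPISTEMIC.search(term):
        problems.append("objectivity")
    return problems


for t in ["non-mammal", "arthropathies not otherwise specified", "mammal"]:
    print(t, lint_term(t))
```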

Page 44: How to build an ontology 2

44

Keep Epistemology Separate from Ontology

If you want to say that

We do not know where A’s are located

do not invent a new class of

A’s with unknown locations

(A well-constructed ontology should grow linearly; it should not need to delete classes or relations because of increases in knowledge)

Page 45: How to build an ontology 2

45

If you want to say

I surmise that this is a case of pneumonia

do not invent a new class of surmised pneumonias

Confusion of ‘findings’ in medical terminologies

Keep Sentences Separate from Terms

Page 46: How to build an ontology 2

46

Single Inheritance

No kind in a classificatory hierarchy should have more than one is_a parent on the immediate higher level
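Single inheritance is easy to verify once the is_a assertions are listed as pairs. A minimal sketch, assuming edges are stored as (child, parent) tuples; the terms are borrowed from the deck's blue car illustration.

```python
from collections import defaultdict

# is_a edges as (child, parent) pairs.
is_a_edges = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]


def multiple_parents(edges):
    """Map each term with more than one is_a parent to its sorted parents."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


offenders = multiple_parents(is_a_edges)
print(offenders)
```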

Page 47: How to build an ontology 2

47

Multiple Inheritance

thing

car    blue thing

blue car

is_a is_a

Page 48: How to build an ontology 2

48

Multiple Inheritance

is a source of errors

encourages laziness

serves as obstacle to integration with neighboring ontologies

hampers use of Aristotelian methodology for defining terms

hampers use of statistical search tools

Page 49: How to build an ontology 2

49

Multiple Inheritance

thing

car    blue thing

blue car

is_a1 is_a2

Page 50: How to build an ontology 2

50

is_a Overloading

The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.

Page 51: How to build an ontology 2

51

Multiple Inheritance

thing

car    blue thing

blue car

is_a1 is_a2

Page 52: How to build an ontology 2

52

How to solve this problem

Create two ontologies:

of cars

of colors

Link the two together via cross-products

(= factoring, normalization, modularization)
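The cross-product idea can be sketched as follows, assuming each ontology is stored as a child-to-parent map with a single parent per term. The terms and the relation name `has_color` are hypothetical. Each compound term keeps one is_a parent in the car ontology and points into the color ontology through a different relation.

```python
from itertools import product

# Two single-inheritance ontologies as child -> parent maps (hypothetical terms).
cars = {"car": None, "sedan": "car", "truck": "car"}
colors = {"color": None, "blue": "color", "red": "color"}


def leaves(ontology):
    """Terms that are not the parent of any other term."""
    parents = set(ontology.values())
    return sorted(t for t in ontology if t not in parents)


# Each cross-product term has exactly one is_a parent; the color is linked
# through a separate (hypothetical) relation, not a second is_a.
cross_products = {
    f"{color} {car}": {"is_a": car, "has_color": color}
    for car, color in product(leaves(cars), leaves(colors))
}
print(cross_products["blue sedan"])
```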

Page 53: How to build an ontology 2

53

Compositionality

The meanings of compound terms should be determined

1. by the meanings of component terms

together with

2. the rules governing syntax

Page 54: How to build an ontology 2

54

Why do we need rules/standards for good ontology?

Ontologies must be intelligible both to humans (for annotation and curation) and to machines (for reasoning and error-checking): the lack of rules for classification leads to human error and blocks automatic reasoning and error-checking

Intuitive rules facilitate training of curators and annotators

Common rules allow alignment with other ontologies

Page 55: How to build an ontology 2

think of ontologies as legends for cartoons

Page 56: How to build an ontology 2

56

cartoons, like maps, always have a certain threshold of granularity

but they can be veridical representations of reality nonetheless

Goal: use logically well-structured ontologies to create algorithmic, dynamic cartoons

Page 57: How to build an ontology 2

57

Randomized controlled trials

http://rctbank.ucsf.edu/ontology/outline/index.htm

Page 58: How to build an ontology 2

58

Basic Formal Ontology

What the top level should look like

Page 59: How to build an ontology 2

59

Two kinds of entities

occurrents (processes, events, happenings)

continuants (objects, qualities, states...)

Page 60: How to build an ontology 2

60

Continuants (aka endurants)
have continuous existence in time
preserve their identity through change
exist in toto whenever they exist at all

Occurrents (aka processes)
have temporal parts
unfold themselves in successive phases
exist only in their phases

Page 61: How to build an ontology 2

61

You are a continuant

Your life is an occurrent

You are 3-dimensional

Your life is 4-dimensional

Page 62: How to build an ontology 2

62

Dependent entities

require independent continuants as their bearers

There is no run without a runner

There is no grin without a cat

Page 63: How to build an ontology 2

63

Dependent vs. independent continuants

Independent continuants (organisms, buildings, environments)

Dependent continuants (quality, shape, role, propensity, function, status, power, right)

Page 64: How to build an ontology 2

64

All occurrents are dependent entities

They are dependent on those independent continuants which are their participants (agents, patients, media ...)
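The dependence relations described so far can be mirrored in a small class hierarchy. This is a sketch under stated assumptions: the Python class names follow the slide labels, while the fields `bearer` and `participants` and the method `is_well_formed` are invented for illustration and are not BFO's formal definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str


@dataclass
class Continuant(Entity):
    pass


@dataclass
class IndependentContinuant(Continuant):
    pass


@dataclass
class DependentContinuant(Continuant):
    bearer: IndependentContinuant  # no grin without a cat


@dataclass
class Occurrent(Entity):
    participants: List[IndependentContinuant] = field(default_factory=list)

    def is_well_formed(self) -> bool:
        """An occurrent depends on at least one independent continuant."""
        return len(self.participants) > 0


cat = IndependentContinuant("cat")
grin = DependentContinuant("grin", bearer=cat)
runner = IndependentContinuant("runner")
run = Occurrent("run", participants=[runner])
orphan = Occurrent("run without a runner")
print(run.is_well_formed(), orphan.is_well_formed())
```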

Page 65: How to build an ontology 2

65

BFO Top-Level Ontology

Continuant

    IndependentContinuant

    DependentContinuant

Occurrent (always dependent on one or more independent continuants)

Page 66: How to build an ontology 2

66

= A representation of top-level types

Continuant Occurrent

IndependentContinuant

DependentContinuant

cell component

biological process

molecular function

Page 67: How to build an ontology 2

67

Top-Level Ontology

Continuant Occurrent

IndependentContinuant

DependentContinuant

Functioning

Side-Effect, Stochastic Process, ...

Function

Page 68: How to build an ontology 2

68

Top-Level Ontology

Continuant Occurrent

IndependentContinuant

DependentContinuant

Functioning Side-Effect, Stochastic Process, ...

Function

Page 69: How to build an ontology 2

69

Top-Level Ontology

Continuant Occurrent

IndependentContinuant

DependentContinuant

Quality Function Spatial Region

Functioning Side-Effect, Stochastic Process, ...

instances (in space and time)

Page 70: How to build an ontology 2

70

Page 71: How to build an ontology 2

71

Page 72: How to build an ontology 2

72

Towards a Clinical Trial Ontology

To serve merger of data schemas

To serve flexibility of collaborative clinical trial research

To serve management of clinical trial research

To serve data access and reuse

Page 73: How to build an ontology 2

73

CTO will be part of OBI

Ontology of Biomedical Investigations

http://obi.sourceforge.net

which is in turn part of the OBO Foundry

http://obofoundry.org

Page 74: How to build an ontology 2

74

Overview of the Ontology of Biomedical Investigations

with thanks to Trish Whetzel on behalf of the FuGO Working Group

Page 75: How to build an ontology 2

75

OBI

Purpose

Provide a resource for the unambiguous description of the components of biomedical investigations such as the design, protocols and instrumentation, material, data and types of analysis on the data

NOT designed to model biology

Enables

Allow consistent annotation of data across different technological and biological domains

Enable powerful queries

Facilitate semantically-driven data integration

Page 76: How to build an ontology 2

76

 

Motivation for OBI

Standardization efforts in biological and technological domains

Standard syntax - Data exchange formats

To provide a mechanism for software interoperability, e.g. FuGE Object Model

Standard semantics - Controlled vocabularies or ontology

Centralize commonalities for annotation term needs across domains to describe an investigation/study/experiment, e.g. FuGO

Page 77: How to build an ontology 2

77

Biomedical Investigation Components

Investigation Design: Describe the design and purpose or general aim of the investigation.

Material and Its Characteristics: Describe the material and characteristics.

Treatments: Describe the manipulations or perturbations or observations performed on the material to meet the general aim of the investigation.

Sample Analysis Preparation: Describe how the material was prepared for analysis - e.g. labeling, protein digest, etc.

Instrumental Analysis: Describe the instrument and settings that were used.

Data Pre-Processing: Describe the results from the instrument, e.g. what units are represented.

Computational/Higher Level Analysis: Describe the type of analysis performed to confirm/deny the hypothesis, e.g. clustering.

Page 78: How to build an ontology 2

78

FuGO Development Strategy Decisions

Unified Development

Pros

Overlap of terms is identified early in development

Universal/Common terms are defined by all those collaborating

Additional technological or biological terms can be added as needed by collaborators

Cons

Time needed to develop the ontology

Independent Development

Pros

Develop ‘Ontology’ in a time frame limited only by the community

Cons

Development of different working policies?

Use of different top level classes?

Overlap of terms at lower levels of the ontology tree

Page 79: How to build an ontology 2

79

FuGO Development Process

Collect Use Cases - within community activity

Collect examples of investigations as performed within a community and present Use Cases to developers group

Bottom up approach - within community activity

Identify concepts to describe using controlled terms

Collect terms and their definitions

Bin terms in the top level ontology structure

Top down approach - collaborative activity

Build a top level ontology structure, is_a (vertical) relationships

Make a list of other foreseen (horizontal) relationships

Review how Top Level Nodes fit in with the Upper Level Ontologies

Page 80: How to build an ontology 2

80

FuGO - Top Level Classes

Continuant: an entity that endures/remains the same through time

Dependent Continuant: depends on another entity

E.g. Environment (depends on the set of ranges of conditions, e.g. geographic location)

E.g. Characteristics (entity that can be measured, e.g. temperature, unit)

- Realizable: an entity that is realizable through a process (executed/run)

E.g. Software (a set of machine instructions)

E.g. Design (the plan that can be realized in a process)

E.g. Role (the part played by an entity within the context of a process)

Independent Continuant: stands on its own

E.g. All physical entities (instrument, technology platform, document etc.)

E.g. Biological material (organism, population etc.)

Occurrent: an entity that occurs/unfolds in time

E.g. Temporal Regions, Spatio-Temporal Regions (single actions or Event)

Process

E.g. Investigation (the entire ‘experimental’ process)

E.g. Study (process of acquiring and treating the biological material)

E.g. Assay (process of performing some tests and recording the results)

Page 81: How to build an ontology 2

81

Emerging FuGO Design Principles

OBO Foundry ontology, utilize ontology best practices

Inherit top level classes from an Upper Level ontology

Use of the Relation Ontology

Follow additional OBO Foundry principles

Facilitates interoperability with other OBO Foundry ontologies

Develop recommendations for naming conventions and metadata

Format for term names, e.g. underscore vs. camel case, no plurals

Use of alphanumeric identifier for terms, i.e. something that does not have semantic meaning

Mechanisms for adding synonyms, etc.

Open source approach

Protégé/OWL

Weekly conference calls

Shared environment using Sourceforge (SF) and SF mailing lists

Page 82: How to build an ontology 2

82

Future Plans

Binning process - ongoing

Reconciliations into one canonical version

Iterative process

Common working practices - established

Each class consists of: unique alphanumeric identifier, human readable string name, definition and comments

Sourceforge tracker in place to collect comments on terms, definitions, relationships

Review ontology so that top level classes meet the needs of all involved ‘communities’
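The working practice that each class carries an identifier, a name, a definition and comments can be enforced with a small record type. A minimal sketch: the identifier pattern and the `EX_` prefix are hypothetical, since the slides require only that identifiers be unique and alphanumeric.

```python
import re
from dataclasses import dataclass, field

# Hypothetical identifier format, e.g. "EX_0000001"; not specified in the slides.
ID_PATTERN = re.compile(r"^[A-Z]+_\d{7}$")


@dataclass
class TermRecord:
    identifier: str
    name: str
    definition: str
    comments: list = field(default_factory=list)

    def problems(self):
        """List the ways this record falls short of the working practices."""
        found = []
        if not ID_PATTERN.match(self.identifier):
            found.append("identifier is not in the assumed PREFIX_0000000 form")
        if not self.name.strip():
            found.append("missing name")
        if not self.definition.strip():
            found.append("missing definition")
        return found


ok = TermRecord(
    "EX_0000001",
    "assay",
    "process of performing some tests and recording the results",
)
bad = TermRecord("assay", "assay", "")
print(ok.problems(), bad.problems())
```

Keeping the identifier free of semantic content means a term can be renamed without breaking existing annotations.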

Page 83: How to build an ontology 2

83

OBI Collaborating Communities

Crop sciences Generation Challenge Programme (GCP), www.generationcp.org

Environmental genomics MGED RSBI Group, www.mged.org/Workgroups/rsbi

Genomic Standards Consortium (GSC), www.genomics.ceh.ac.uk/genomecatalogue

HUPO Proteomics Standards Initiative (PSI), psidev.sourceforge.net

Immunology Database and Analysis Portal, www.immport.org

Immune Epitope Database and Analysis Resource (IEDB), http://www.immuneepitope.org/home.do

International Society for Analytical Cytology, http://www.isac-net.org/

Metabolomics Standards Initiative (MSI), msi.workgroups.sourceforge.net

Neurogenetics, Biomedical Informatics Research Network (BIRN), www.nbirn.net

Nutrigenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi

Polymorphism

Toxicogenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi

Transcriptomics MGED Ontology Group, mged.sourceforge.net/ontologies

Page 84: How to build an ontology 2

84

http://fugo.sourceforge.net

Page 85: How to build an ontology 2

85

http://obi.sourceforge.net

Page 93: How to build an ontology 2

93

Top-Level Class Hierarchy for RCT

Root

Secondary-study

Trial-details

Trial

Concept
• Generic-concept
• Population-concept
• Protocol-concept
• Design-concept
• Outcome-concept
• Administrative-concept
• Intervention-concept

Page 94: How to build an ontology 2

94

Amended Top-Level Class Hierarchy for RCT

Entity

Continuant
    Population
    Protocol
    Design

Occurrent
    Trial
    Secondary-study
    Intervention

?? Trial-details
?? Outcome-concept
?? Administrative-concept

Page 95: How to build an ontology 2

95

Concept
• Generic-concept
    – Term-information
    – Time-entity
    – Rule-concept
        » Clinical-rule
            Exclusion-rule
            Inclusion-rule
        » Rule-entity
            Recursive-rule
            Base-rule
        » Ethnicity-language-rule
        » Age-gender-rule
        » Situation

Page 96: How to build an ontology 2

96

Page 97: How to build an ontology 2

97

Page 98: How to build an ontology 2

98

Concept
• Protocol-concept
    – Follow-up-compliance
    – Follow-up-activity
    – Follow-up
    – Protocol-change
    – Treatment-assignment
    – Protocol
    – Reason
    – Outcomes-followup
    – Secondary-study-protocol

Page 99: How to build an ontology 2

99

Amended Top-Level Class Hierarchy for RCT

Entity

Continuant
    Protocol
        • Secondary-study-protocol
    Reason

Occurrent
    • Treatment-assignment
    • Follow-up
        – Follow-up-activity
        – Outcomes-follow-up
    • Protocol-change

Page 100: How to build an ontology 2

100

Concept
• Population-concept
    – Subgroup
    – Recruitment-flowchart
    – Population
    – Recruitment
    – Site-enrollment

Page 101: How to build an ontology 2

101

Amended Top-Level Class Hierarchy for RCT

Entity

Continuant
    Protocol
        • Secondary-study-protocol
    Recruitment-flowchart
    Reason
    Population
        • Subgroup

Occurrent
    • Priors
        – Recruitment
        – Site-enrollment
        – Treatment-assignment
    • Follow-up
        – Follow-up-activity
        – Outcomes-follow-up
    • Protocol-change

Page 102: How to build an ontology 2

102

Concept
• Administrative-concept
    – Publication-concept
    – Study-site
    – Person
    – Ethics
    – Study-committee
    – Funder
    – Institution
    – Registry-ID

Page 103: How to build an ontology 2

103

Continuant
• Information object
    – Publication
    – Registry-ID
• Study-site
• Person
• Institution
    – Study-committee
    – Funder

??? Ethics

Page 104: How to build an ontology 2

104

Concept
• Intervention-concept
    – Blinding-concept
    – Compliance-details
    – Intervention-step
    – Intervention-arm
    – Co-intervention
    – Intervention
    – Compliance-result
    – Intervention-logic

Page 105: How to build an ontology 2

105

Occurrent
• Intervention
    – Blinding
    – Intervention-step
    – Intervention-arm
    – Co-intervention
• ??? Intervention-logic
• ??? Compliance-result
• ??? Compliance-details

Page 106: How to build an ontology 2

106

Page 107: How to build an ontology 2

107

Test Case: Clinical Trial Ontology

primary outcome
secondary outcome
timepoint
clinical trial
intervention group
control group
assignment of populations to groups
complex experimental design
randomization
placebo
response
efficacy
control
protocol
null hypothesis
confidence interval

Page 108: How to build an ontology 2

108

FuGE idea: use OBI to design datatables

how to solve this problem of converting the ontology to a database schema

what are ‘instances’

annotating images (image repositories)

annotation = shared understanding of a body of knowledge

I run a trial, I stick my data in Excel and create a dataset

design database, design tables – that’s it – no more annotations

metadata is added regarding provenance: this data was added by A and corrected by McB

do rare disease people share their data: here’s my data, here’s my data key, 1 is for males, 0 is for females

sharing is local

but UCSF (Clinical Data Repository) neurodegenerative people: can MS talk to Alzheimer’s? They can’t, because (a) of HIPAA, (b) dataschemas are so different, (c) response to NIH: they put their Excel spreadsheet out there, well gee whizz, (d) PharmGKB faced problems because of this, (e) more obtuse the better: I can get another paper out of this data

no possibility of meta-analysis – opposite of biologists’ view

Page 109: How to build an ontology 2

109