Ontology (Science)

Barry Smith, University at Buffalo

Posted on 22-Dec-2015



http://ontology.buffalo.edu/smith


International Conference on Biomedical Ontology

Buffalo, NY

Tutorials and Classes: July 20-23, 2009

Conference: July 24-26, 2009

http://icbo.buffalo.edu


MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV

How to do biology across the genome?

MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE


To successfully navigate through such data, biomedicine needs help from ontologies


See Smith et al., “The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration”, Nature Biotechnology, 25 (11), November 2007. http://www.nature.com/nbt/journal/v25/n11/pdf/nbt1346.pdf

Uses of ‘ontology’ in PubMed abstracts


what cellular component?

what molecular function?

what biological process?

Gene Ontology


what cellular component?

what molecular function?

what biological process?

GO aids information retrieval via curation of data and literature


GO as Common Controlled Vocabulary

Figure: databases MouseEcotope, GlyProt, DiabetInGene and GluChem; shared GO term: Holliday junction helicase complex


GO promotes integration of data

Figure: databases MouseEcotope, GlyProt, DiabetInGene and GluChem; shared GO term: sphingolipid transporter activity
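A minimal sketch of why a shared controlled vocabulary promotes integration, assuming each database annotates its records with GO term labels (standing in for GO IDs). The database names come from the slides; the record IDs, the annotations and the `integrate_by_term` helper are hypothetical, invented only for illustration.

```python
from collections import defaultdict

# Hypothetical example data: database names are from the slides, but the
# record IDs and annotations are invented. Each database keeps its own local
# IDs, yet all annotate with terms from one shared vocabulary (GO labels here).
DATABASES = {
    "MouseEcotope": [("me:001", "sphingolipid transporter activity")],
    "GlyProt": [
        ("gp:17", "sphingolipid transporter activity"),
        ("gp:42", "Holliday junction helicase complex"),
    ],
    "DiabetInGene": [("dg:9", "sphingolipid transporter activity")],
    "GluChem": [("gc:3", "Holliday junction helicase complex")],
}


def integrate_by_term(databases):
    """Index every (database, record ID) pair under the GO term it is annotated with."""
    index = defaultdict(list)
    for db_name, records in databases.items():
        for record_id, go_term in records:
            index[go_term].append((db_name, record_id))
    return dict(index)


if __name__ == "__main__":
    merged = integrate_by_term(DATABASES)
    # A single lookup on the shared term retrieves hits from every database.
    for db_name, record_id in merged["sphingolipid transporter activity"]:
        print(db_name, record_id)
```

Because no database invents its own label for the same thing, no translation step is needed before the records can be pooled; the shared term is the join key.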


Ontology engineers:

“LET’S GENERALIZE THESE BENEFITS BY BUILDING MANY MANY TINY ONTOLOGIES IN OTHER AREAS”


The standard engineering methodology

• Pragmatics (‘usefulness’) is everything

• Usefulness = we get to write software which runs on our machines

The standard engineering methodology

• It’s easier to write useful software if we work with a simplified model

• (“…we can’t know what reality is like in any case; we only have our ‘concepts’…”)

• Engineer A: This looks like a useful model to me

• (One week later:) Engineer B: This other thing looks like a useful model to me

The standard engineering methodology

Result:

Data in Pittsburgh does not interoperate with data in Vancouver

Science is siloed


Scientific theories must be common resources

1. they cannot be bought or sold

2. they must use open publishing venues

3. they must constantly evolve to reflect results of scientific experiments (“evidence-based”)

4. they must be synchronized: they must use a common system of units and common terminologies


Why build scientific ontologies?

Multiple ontologies only make our data silo problems worse

Just as bad scientific theories must die, so also bad ontologies must die

Ontologies should be relatively independent of tools, implementations and applications*

*Need to clearly separate the Science Domain Knowledge from the Software Programming Knowledge


Scientific ontologies must be constrained so that they converge

Q: What is to serve as a constraint in order to avoid silo creation?

A: Reality, as revealed, incrementally, by experimentally-based science


Ontological realism

• Find out what the world is like (= by doing science)

• Build representations adequate to this world, not to some simplified model in your laptop

• … this strategy is being realized by the Gene Ontology and an expanding community of biomedical scientists


The Open Biomedical Ontologies (OBO) Foundry

• Goal: to provide a suite of controlled structured vocabularies for the calibrated annotation of data to support integration and reasoning across the entire domain of biomedicine

• as biomedical science advances, these ontologies must be evolved in tandem


Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

Orthogonality

• one ontology for each domain

• no need for ‘mappings’ (too expensive, too fragile, too difficult to keep up-to-date as mapped ontologies change)

http://obofoundry.org
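A minimal sketch of what "one ontology for each domain" means operationally, assuming a registry that records which domain(s) each ontology claims. GO, CL and ChEBI follow the Foundry table above; `MyCellOntology`, the registry itself and both helper functions are hypothetical. The second helper shows the standard combinatorial point behind "no need for mappings": aligning n overlapping ontologies pairwise takes n(n-1)/2 mappings, each of which must be redone whenever either side changes.

```python
from collections import defaultdict

# Hypothetical registry of the domain(s) each ontology claims to cover.
# "MyCellOntology" is an invented, non-orthogonal ontology duplicating CL.
REGISTRY = {
    "GO": {"cellular component", "molecular function", "biological process"},
    "CL": {"cell type"},
    "ChEBI": {"molecular entity"},
    "MyCellOntology": {"cell type"},
}


def overlapping_domains(registry):
    """Return each domain claimed by more than one ontology, with its claimants."""
    claims = defaultdict(set)
    for ontology, domains in registry.items():
        for domain in domains:
            claims[domain].add(ontology)
    return {d: sorted(o) for d, o in claims.items() if len(o) > 1}


def mappings_needed(n):
    """Pairwise mappings needed to align n ontologies of one domain: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for domain, ontologies in overlapping_domains(REGISTRY).items():
        print(domain, ontologies, "mappings needed:", mappings_needed(len(ontologies)))
```

In an orthogonal suite `overlapping_domains` returns an empty dict, and the mapping count for every domain stays at zero.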

Orthogonality

• is our best (perhaps our only) hope of solving the data silo problem

• ontologists need to be trained to seek orthogonality

• to seek reuse with a vengeance


Ontologies like the GO are part of science

True, they must be associated with computer implementations (with engineering artifacts)

But the ontologies are not themselves engineering artifacts

The same ontology can be associated with multiple engineering artifacts


Benefits of orthogonality

• ensures that those new to ontology can find the common, tested resources they need

• and can find exemplars of good practice

• ensures mutual consistency of ontologies (trivially)

• thereby ensures additivity of annotations


More benefits of orthogonality

• it rules out simplification and partiality

• brings an obligation on the part of ontology developers to commit to scientific accuracy and domain-completeness


More benefits of orthogonality

• helps to eliminate redundancy

• serves the division of ontological labor: allows experts to focus on their own domains of expertise

• makes possible the establishment of clear lines of authority


The goal of orthogonality is a basic goal of science

it is a pillar of the scientific method that scientists should strive always to resolve conflicts between competing theories


Is there a problem with orthogonality?

• what if I need my own ontology of cellular membranes to meet my own special purposes?

• strategy: application ontologies should be developed from the start using terms whose definitions employ the resources of orthogonal ontologies like those within the Foundry

• any other approach creates silos
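The strategy for application ontologies can be sketched as a simple check, assuming each local term is defined genus-differentia style and that its genus must come from a reference ontology. The `AppTerm` class, the term lists and `unanchored_terms` are hypothetical illustrations, not an OBO Foundry tool; the labels stand in for real identifiers, and only the genus is checked here.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for Foundry reference ontologies (labels, not IDs).
REFERENCE_TERMS = {
    "GO": {"plasma membrane", "mitochondrial membrane"},
    "CL": {"neuron", "hepatocyte"},
}


@dataclass(frozen=True)
class AppTerm:
    """A term in a hypothetical application ontology of cellular membranes."""
    label: str
    genus: str        # parent term the definition builds on
    differentia: str  # what distinguishes this term within its genus


APP_TERMS = [
    AppTerm("neuronal plasma membrane", "plasma membrane", "part of some neuron"),
    AppTerm("hepatocyte mitochondrial membrane", "mitochondrial membrane",
            "part of some hepatocyte"),
    AppTerm("lab-specific membrane", "special membrane", "as used in our lab"),
]


def unanchored_terms(app_terms, reference):
    """List application terms whose genus is not drawn from any reference ontology."""
    known = set().union(*reference.values())
    return [t.label for t in app_terms if t.genus not in known]


if __name__ == "__main__":
    print(unanchored_terms(APP_TERMS, REFERENCE_TERMS))
```

Terms that pass the check inherit their meaning from the shared resources, so data annotated with them remains interoperable; a flagged term such as the invented "lab-specific membrane" is the seed of a new silo.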

Better to have one consensus ontology serving multiple purposes imperfectly, because multiple ontologies addressing the same domain, whether they are good ones or bad ones, create silos


For engineers, ontologies

1. can be bought and sold

2. need have no well-demarcated scientific domains

3. need not be subject to further maintenance

4. can be stand-alone products

5. are typically tied to one specific implementation

Ontology (engineering) thereby makes the silo problem worse


Ontologies created to serve scientific purposes

1. are developed to be common resources (thus they cannot be bought or sold)

2. are developed to represent well-demarcated scientific domains

3. are subject to constant maintenance by domain experts

4. are designed to be used in tandem with other, complementary ontologies

5. are maximally independent of format and implementation


Some obvious truths

• Scientific hypotheses should be formulated by scientists

• Scientific experiments should be carried out by scientists

• Scientific databases should be developed and maintained by scientists

• Scientific textbooks and journal articles should be written by scientists


An obvious conclusion:

• Scientific ontologies should be built by scientists


Problems to be addressed

• How should ontologist-scientists be trained?

• How do we create a career path for scientific ontologists?

• How do we assign credit to those who contribute to ontology creation and maintenance?


Ontologies like the GO are comparable to

– scientific theories

– scientific databases

– scientific journal publications


Ontologies like the GO are being used experimentally by scientific journal publishers

– to provide more useful access to data and other sorts of content via controlled structured keyword lists

– to provide a basis for creating formally structured versions of journal articles


The OBO Foundry is working with journal publishers to create a methodology for expert peer review of ontologies:

as articles are peer reviewed,

so keyword lists are peer reviewed,

and so an author’s use of keyword lists is peer reviewed


Benefits of peer review

1. provides a gigantic impetus to the improvement of scientific knowledge over time

2. brings benefits to readers, since they need only absorb and collate vetted results

(contrast what happens where vetting is not allowed, e.g. on the Semantic Web)


Scientific ontology is analogous to open source software

S. Weber, The Success of Open Source, Cambridge, MA: Harvard University Press, 2004.

Ontologies should be more like Linux and less like the Semantic Web


Weber’s six criteria for success

1. Disaggregated contributions can be derived from knowledge that is not proprietary.

2. The product is perceived as valuable to a critical mass of users.

3. The product benefits from widespread peer attention and review, and can improve through error correction.

4. There are strong positive network effects.

5. An individual or a small group can take the lead and generate a substantive core that promises to evolve into something truly useful.

6. A voluntary community of iterated interaction can develop around the process of building the product.


OBO Foundry peer review creates incentives for investment of effort in ontology work

• It gives career-related credit to both authors and reviewers (university promotions and funding are based on peer review credit)

• It supports the creation of a professional career path for ontologists

• It gives credit to scientific experts for investment of scientific expertise in ontology development

• It allows measurement of citations of ontologies

• It magnifies the motivating potential of the factor of influence: scientists help to determine what ontology resources exist in their discipline


THE END
