1 Ontology (Science) Barry Smith University at Buffalo .
Date posted: 22-Dec-2015
Ontology (Science)
Barry Smith
University at Buffalo
http://ontology.buffalo.edu/smith
International Conference on Biomedical Ontology
Buffalo, NY
Tutorials and Classes: July 20-23, 2009
Conference: July 24-26, 2009
http://icbo.buffalo.edu
2
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV
How to do biology across the genome?
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE
4
To successfully navigate through such data,
biomedicine needs help from ontologies
5
See Smith, et al. “The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration”, Nature Biotechnology, 25 (11), November 2007.
http://www.nature.com/nbt/journal/v25/n11/pdf/nbt1346.pdf
Uses of ‘ontology’ in PubMed abstracts
6
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE
7
8
what cellular component?
what molecular function?
what biological process?
Gene Ontology
9
what cellular component?
what molecular function?
what biological process?
GO aids information retrieval via curation of data and literature
10
GO as Common Controlled Vocabulary
MouseEcotope GlyProt
DiabetInGene
GluChem
Holliday junction helicase complex
11
GO promotes integration of data
MouseEcotope GlyProt
DiabetInGene
GluChem
sphingolipid transporter activity
12
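The integration idea on the two slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record identifiers and the pairing of databases with terms are invented for the example, and only the database names and the two term labels come from the slides.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, record id, controlled-vocabulary term).
# The record ids and the database/term pairings are made up for illustration.
annotations = [
    ("MouseEcotope", "rec-001", "sphingolipid transporter activity"),
    ("GlyProt", "rec-17", "sphingolipid transporter activity"),
    ("DiabetInGene", "rec-42", "Holliday junction helicase complex"),
    ("GluChem", "rec-9", "sphingolipid transporter activity"),
]


def index_by_term(records):
    """Group (database, record id) pairs under the shared term that annotates them."""
    index = defaultdict(list)
    for database, record_id, term in records:
        index[term].append((database, record_id))
    return dict(index)


index = index_by_term(annotations)
hits = index["sphingolipid transporter activity"]
databases = sorted(database for database, _ in hits)
print(databases)
```

Because every database uses the same term label, a single lookup retrieves records from all of them; no per-database translation table is needed.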
Ontology engineers:
“LET’S GENERALIZE THESE BENEFITS BY BUILDING MANY MANY TINY ONTOLOGIES IN OTHER AREAS”
13
The standard engineering methodology
• Pragmatics (‘usefulness’) is everything
• Usefulness = we get to write software which runs on our machines
14
The standard engineering methodology
• It’s easier to write useful software if we work with a simplified model
• (“…we can’t know what reality is like in any case; we only have our ‘concepts’…”)
• Engineer A: This looks like a useful model to me
• (One week later:) Engineer B: This other thing looks like a useful model to me
The standard engineering methodology
Result:
Data in Pittsburgh does not interoperate with data in Vancouver
Science is siloed
16
Scientific theories must be common resources
1. they cannot be bought or sold
2. they must use open publishing venues
3. they must constantly evolve to reflect results of scientific experiments (“evidence-based”)
4. they must be synchronized: use a common system of units and common terminologies
17
Why build scientific ontologies
Multiple ontologies only make our data silo problems worse
Just as bad scientific theories must die, so also bad ontologies must die
Ontologies should be relatively independent of tools, implementations and applications*
*Need to clearly separate the Science Domain Knowledge from the Software Programming Knowledge
18
Scientific ontologies must be constrained so that they converge
Q: What is to serve as a constraint in order to avoid silo creation?
A: Reality, as revealed, incrementally, by experimentally-based science
19
Ontological realism
• Find out what the world is like (= by doing science)
• Build representations adequate to this world, not to some simplified model in your laptop
• … this strategy is being realized by the Gene Ontology and an expanding community of biomedical scientists
20
The Open Biomedical Ontologies (OBO) Foundry
• Goal: to provide a suite of controlled structured vocabularies for the calibrated annotation of data to support integration and reasoning across the entire domain of biomedicine
• as biomedical science advances, these ontologies must be evolved in tandem
21
22
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
Orthogonality
• one ontology for each domain
• no need for ‘mappings’ (too expensive, too fragile, too difficult to keep up-to-date as mapped ontologies change)
http://obofoundry.org
23
Orthogonality
• is our best (perhaps our only) hope of solving the data silo problem
• ontologists need to be trained to seek orthogonality
• to seek reuse with a vengeance
24
Ontologies like the GO are part of science
True, they must be associated with computer implementations (with engineering artifacts)
But the ontologies are not themselves engineering artifacts
The same ontology can be associated with multiple engineering artifacts
25
Benefits of orthogonality
• helps those new to ontology to find the common, tested resources they need
• and to find exemplars of good practice
• ensures mutual consistency of ontologies (trivially)
• thereby ensures additivity of annotations
26
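A minimal sketch of why orthogonality makes annotations additive, assuming each ontology is reduced to a set of term labels. The term sets, the gene identifier and the two annotation sources are hypothetical, chosen only to illustrate the point; real ontologies carry far more structure than a set of labels.

```python
from itertools import combinations

# Hypothetical miniature ontologies, one per domain, each a set of term labels.
ontologies = {
    "GO": {"sphingolipid transporter activity", "Holliday junction helicase complex"},
    "CL": {"neuron", "hepatocyte"},
    "ChEBI": {"glucose", "sphingolipid"},
}


def overlapping_terms(onts):
    """Return {(name_a, name_b): shared terms} for every pair of ontologies that overlap."""
    overlaps = {}
    for (name_a, terms_a), (name_b, terms_b) in combinations(sorted(onts.items()), 2):
        shared = terms_a & terms_b
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


def merge_annotations(*annotation_sets):
    """Combine annotations from independent sources by plain set union."""
    merged = set()
    for annotations in annotation_sets:
        merged |= annotations
    return merged


# Two hypothetical sources annotating the same gene with (gene, ontology, term) triples.
source_a = {("gene-1", "GO", "sphingolipid transporter activity")}
source_b = {
    ("gene-1", "CL", "hepatocyte"),
    ("gene-1", "GO", "sphingolipid transporter activity"),
}

overlaps = overlapping_terms(ontologies)
combined = merge_annotations(source_a, source_b)
print(overlaps, len(combined))
```

When no two ontologies claim the same term, the union needs no mapping step: identical annotations collapse and the rest simply accumulate.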
More benefits of orthogonality
• it rules out simplification and partiality
• brings an obligation on the part of ontology developers to commit to scientific accuracy and domain-completeness
27
More benefits of orthogonality
• helps to eliminate redundancy
• serves the division of ontological labor: allows experts to focus on their own domains of expertise
• makes possible the establishment of clear lines of authority
28
The goal of orthogonality is a basic goal of science
it is a pillar of the scientific method that scientists should strive always to resolve conflicts between competing theories
29
Is there a problem with orthogonality?
• what if I need my own ontology of cellular membranes to meet my own special purposes?
• strategy: application ontologies should be developed from the start using terms whose definitions employ the resources of orthogonal ontologies like those within the Foundry
• any other approach creates silos
30
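The strategy on this slide can be illustrated with a small check, assuming an application ontology is represented as a mapping from each local term to the reference-ontology terms its definition cites. All names below, including the "LOCAL" namespace and the application terms, are hypothetical; the sketch is not a description of any Foundry tool.

```python
# Hypothetical reference ontologies, reduced to sets of term labels.
reference_terms = {
    "GO": {"membrane", "plasma membrane"},
    "CL": {"hepatocyte"},
}

# Hypothetical application-ontology terms; each definition cites (ontology, term) pairs.
application_terms = {
    "hepatocyte plasma membrane": [("GO", "plasma membrane"), ("CL", "hepatocyte")],
    "my special membrane": [("GO", "membrane"), ("LOCAL", "special coating")],
}


def unresolved_references(app_terms, refs):
    """Return, per application term, the cited terms not found in any reference ontology."""
    problems = {}
    for label, cited in app_terms.items():
        missing = [(ont, term) for ont, term in cited if term not in refs.get(ont, set())]
        if missing:
            problems[label] = missing
    return problems


problems = unresolved_references(application_terms, reference_terms)
print(problems)
```

A term whose definition resolves entirely to shared reference terms stays connected to everyone else's data; a term that cites a private vocabulary is flagged as the start of a silo.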
Better to have one consensus ontology serving multiple purposes imperfectly
because multiple ontologies addressing the same domain, whether they are good ones or bad ones, create silos
31
For engineers, ontologies
1. can be bought and sold
2. need have no well-demarcated scientific domains
3. need not be subject to further maintenance
4. can be stand-alone products
5. are typically tied to one specific implementation
Ontology (engineering) thereby makes the silo problem worse
32
Ontologies created to serve scientific purposes
1. are developed to be common resources (thus they cannot be bought or sold)
2. for representation of well-demarcated scientific domains
3. subject to constant maintenance by domain experts
4. designed to be used in tandem with other, complementary ontologies
5. maximally independent of format and implementation
33
Some obvious truths
• Scientific hypotheses should be formulated by scientists
• Scientific experiments should be carried out by scientists
• Scientific databases should be developed and maintained by scientists
• Scientific textbooks and journal articles should be written by scientists
34
An obvious conclusion:
• Scientific ontologies should be built by scientists
35
Problems to be addressed
• How should ontologist-scientists be trained?
• How do we create a career path for scientific ontologists?
• How do we assign credit to those who contribute to ontology creation and maintenance?
36
Ontologies like the GO are comparable to
– scientific theories
– scientific databases
– scientific journal publications
37
Ontologies like the GO are being used experimentally by scientific journal publishers
– to provide more useful access to data and other sorts of content via controlled structured keyword lists
– to provide a basis for creating formally structured versions of journal articles
38
The OBO Foundry is working with journal publishers
to create a methodology for expert peer review of ontologies
as articles are peer reviewed
so keyword lists are peer reviewed
so an author’s use of keyword lists is peer reviewed
39
Benefits of peer review
1. provides a gigantic impetus to the improvement of scientific knowledge over time
2. brings benefits to readers, since they need only absorb and collate vetted results
(contrast what happens where vetting is not allowed e.g. on the Semantic Web)
40
Scientific ontology analogous to open source software
S. Weber, The Success of Open Source, Cambridge, MA: Harvard University Press, 2004.
Ontologies should be more like Linux and less like the Semantic Web
41
Weber’s six criteria for success
1. Disaggregated contributions can be derived from knowledge that is not proprietary.
2. The product is perceived as valuable to a critical mass of users.
3. The product benefits from widespread peer attention and review, and can improve through error correction.
4. There are strong positive network effects.
5. An individual or a small group can take the lead and generate a substantive core that promises to evolve into something truly useful.
6. A voluntary community of iterated interaction can develop around the process of building the product.
42
OBO Foundry peer review creates incentives for investment of effort in ontology work
• It gives career-related credit to both authors and reviewers (university promotions and funding are based on peer review credit)
• Supports creation of a professional career path for ontologists
• It gives credit to scientific experts for investment of scientific expertise in ontology development
• It allows measurement of citations of ontologies
• It magnifies the motivating potential of the factor of influence: scientists help to determine what ontology resources exist in their discipline
43
THE END
44