HyQue: Evaluating scientific Hypotheses using semantic web technologies
![Page 1: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/1.jpg)
HYQUE: EVALUATING SCIENTIFIC HYPOTHESES USING SEMANTIC WEB
TECHNOLOGIES
MICHEL DUMONTIER, PHD
ASSOCIATE PROFESSOR OF BIOINFORMATICS, DEPARTMENT OF BIOLOGY, INSTITUTE OF BIOCHEMISTRY AND SCHOOL OF COMPUTER SCIENCE @ CARLETON UNIVERSITY
ADJUNCT PROFESSOR (PROFESSEUR ASSOCIÉ), DEPARTMENT OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING, UNIVERSITÉ LAVAL
![Page 2: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/2.jpg)
HYQUE IS A COLLABORATIVE WORK
Work performed by Alison Callahan, a PhD student under my supervision @ Carleton University
Partnership with Dr. Nigam Shah, Assistant Professor at Stanford University
![Page 3: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/3.jpg)
![Page 4: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/4.jpg)
Source: http://kentsimmons.uwinnipeg.ca/cm1504/introscience.htm
![Page 5: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/5.jpg)
WITH UNPARALLELED GROWTH IN RESEARCH OUTPUTS, UNCOVERING ALL THE EVIDENCE TO SUPPORT/REFUTE A HYPOTHESIS IS BECOMING INCREASINGLY DIFFICULT
Citations added to Medline 1995-2009
Source: http://www.nlm.nih.gov/bsd/stats/cit_added.html
![Page 6: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/6.jpg)
HYBROW
Computationally augmented method for hypothesis evaluation
• developed by Racunas et al. [1]
• minimum event-based vocabulary
• uses consistency checking to evaluate hypotheses
• constraints to ensure valid claims
• rules to evaluate evidence
• compares hypotheses using neighborhood functions
• incremental hypothesis improvement
[1] Racunas S. A., Shah N. H., Albert I. and Fedoroff N. V. (2004). HyBrow: A prototype system for computer-aided hypothesis evaluation. Bioinformatics 20(S. 1): i1-i8.
![Page 7: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/7.jpg)
THE GAL GENE NETWORK IN YEAST
• Genes that encode proteins that transport and metabolize galactose
• permease – gal2p – transports galactose into cells
• galactokinase – gal1p
• uridylyltransferase – gal7p
• epimerase – gal10p
• phosphoglucomutase – gal5p
• Regulation – whether the pathway is on or off
• gal3p
• gal4p
• gal80p
![Page 8: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/8.jpg)
Source: Ostergaard et al. (2000). Nature Biotechnology 18: 1283 - 1286
![Page 9: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/9.jpg)
HYPOTHESIS
h1: simple event-based expression
e1 (Gal4p induces expression of GAL1)
h2: conjunctive hypothesis, must satisfy two expressions
e2 (Gal3p induces expression of GAL2
e3 AND Gal4p induces expression of GAL7)
h3: conjunctive hypothesis with conditional expression
e4 (Gal4p induces expression of GAL7
e5 AND Gal80p inhibits production of Gal4p
when GAL3 is over-expressed
e6 AND Gal80p induces expression of GAL7)
![Page 10: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/10.jpg)
HYBROW
• small, manually generated knowledge base
• hard coded Perl rules
• challenging to apply to a new domain
• needs access to a greater KB
![Page 11: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/11.jpg)
SEMANTIC WEB TECHNOLOGIES FOR KNOWLEDGE MANAGEMENT?
Semantic Web technologies are promising for automating hypothesis evaluation
• Languages for formal knowledge representation
• Automated reasoning
• Querying over distributed resources
• Growing number of biological resources available in SW formats
• Ontologies
• Data
Bio2RDF is one of the largest resources of linked life data on the Web
• ~40 data sets available
• Globally distributed
• Dataset-specific SPARQL endpoints
![Page 12: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/12.jpg)
BIO2RDF IS PART OF A GROWING WEB OF LINKED DATA
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
![Page 13: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/13.jpg)
The Semantic Web is a web of knowledge
It is about standards for publishing, sharing and querying knowledge drawn from diverse sources
It enables the answering of sophisticated questions
![Page 14: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/14.jpg)
ontology as a strategy to
formally represent knowledge
![Page 15: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/15.jpg)
The Web Ontology Language (OWL) Has Explicit Semantics
Can therefore be used to capture knowledge in a machine understandable way
![Page 16: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/16.jpg)
HYBROW → HYQUE
• Hypothesis query and evaluation system
• Built on Semantic Web technologies
• Background knowledge encoded as OWL ontologies
• Queries against SPARQL endpoints
• Context-specific rules that consider experimental conditions
• Consumes and produces RDF
• Can be accessed via web or semantic web services
![Page 17: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/17.jpg)
HYQUE IS COMPOSED OF …
• HyQue hypothesis ontology
• Describes generic input hypothesis and output hypothesis evaluation classes
• Uses upper level classes e.g. ‘proposition’, ‘measurement value’, ‘event’
• HyQue Data
• Experimentally determined interactions between the GAL proteins (GAL knowledge base from HyBrow project)
• Literature-based evidence (citations)
• Knowledge about cellular localization and biological processes (GO)
• Types of evidence supporting these interactions (ECO)
• Yeast gene/protein/function data (SGD)
![Page 18: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/18.jpg)
A HYQUE HYPOTHESIS IS A COLLECTION OF PROPOSITIONS
• HyQue hypotheses are composed of one or more propositions connected using logical operators (AND, OR, XOR…)
• proposition: “a statement expressing something true or false”
• HyQue propositions only specify events
HyQue hypothesis ≡ ‘proposition’
that ‘specifies’ only ‘event’
HyQue hypothesis ≡ ‘proposition’
that ‘has component part’ only
(‘proposition’ that ‘specifies’ only ‘event’)
![Page 19: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/19.jpg)
HYQUE EVENTS
1. protein-protein binding
2. protein-nucleic acid binding
3. molecular activation
4. molecular inhibition
5. gene induction
6. gene repression
7. transport
![Page 20: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/20.jpg)
HYQUE EVENTS
Events are composed of conditional assertions on a relation between ‘agent’ and ‘target’
induces(agent, target, context, location)
For decidable logic (OWL), an n-ary object is used
Event ‘has agent’ agent, ‘has target’ target, ‘has context’ context, ‘is located in’ location
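The n-ary pattern above can be sketched in Python: the event becomes a node of its own, and each argument of induces(agent, target, context, location) becomes one binary statement about that node. The class name `Event`, the identifier `:e1`, and the plain-string predicate labels are illustrative assumptions, not part of the HyQue ontology.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Event:
    """Hypothetical n-ary event node; predicate labels follow the slide text."""
    event_id: str
    event_type: str
    agent: str
    target: str
    context: Optional[str] = None
    location: Optional[str] = None

    def to_triples(self) -> List[Triple]:
        # One binary statement per argument of the n-ary relation.
        triples = [
            (self.event_id, "rdf:type", self.event_type),
            (self.event_id, "has agent", self.agent),
            (self.event_id, "has target", self.target),
        ]
        # Optional arguments are emitted only when they are known.
        if self.context is not None:
            triples.append((self.event_id, "has context", self.context))
        if self.location is not None:
            triples.append((self.event_id, "is located in", self.location))
        return triples


e1 = Event(":e1", "induces", "Gal4p", "GAL1", location="nucleus")
for triple in e1.to_triples():
    print(triple)
```

Keeping the event as its own node is what lets optional arguments such as context be left out without changing the arity of the relation.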
![Page 21: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/21.jpg)
ALL DATA ARE REPRESENTED USING RDF
[Diagram: hypothesis ‘has component part’ proposition; proposition ‘specifies’ event: Gal4p positively regulates the expression of GAL1]
RDF’s basic representation unit is the “triple”
<subject> <predicate> <object>
:h rdf:type hyque:Hypothesis .
:h hyque:has-component-part :p1 .
:p1 rdf:type hyque:Proposition .
![Page 22: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/22.jpg)
ALL DATA ARE REPRESENTED USING RDF
[Diagram: hypothesis ‘specifies’ event: Gal4p positively regulates the expression of GAL1]
:h a hyque:Hypothesis ;
    hyque:specifies :e1 .
:e1 a <http://bio2rdf.org/go:0010628> ;  # positive regulation of gene expression
    hyque:is_negated "0" ;
    hyque:agent <http://bio2rdf.org/sgd:Gal4p> ;
    hyque:target <http://bio2rdf.org/sgd:GAL1> ;
    …
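A minimal sketch of how an event description like the one above could be written out as Turtle text from Python, using only the standard library. The function name `event_to_turtle` is hypothetical, only the four properties visible on the slide are emitted, and the `hyque:` prefix is assumed to be declared elsewhere because its namespace is not given in the slides.

```python
def event_to_turtle(event_id: str, event_type: str, agent: str,
                    target: str, negated: bool = False) -> str:
    """Serialize one event as a Turtle block (hypothetical helper)."""
    lines = [
        f"{event_id} a <{event_type}> ;",
        # The slide encodes negation as the string "0" or "1".
        f'    hyque:is_negated "{int(negated)}" ;',
        f"    hyque:agent <{agent}> ;",
        f"    hyque:target <{target}> .",
    ]
    return "\n".join(lines)


turtle = event_to_turtle(
    ":e1",
    "http://bio2rdf.org/go:0010628",  # positive regulation of gene expression
    "http://bio2rdf.org/sgd:Gal4p",
    "http://bio2rdf.org/sgd:GAL1",
)
print(turtle)
```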
![Page 23: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/23.jpg)
USER INTERFACE FACILITATES DESIGNING THE HYPOTHESIS
![Page 24: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/24.jpg)
TEMPLATE SPARQL QUERIES COMPLETED BASED ON EVENT PROPERTIES
:e1 a go:0010628 ;
    hyque:is_negated "0" ;
    hyque:agent sgd:Gal4p ;
    hyque:target sgd:GAL1 .
construct { … }
where {
    ?event hyque:is_negated ?negated .
    ?event hyque:logical_operator ?logical_operator .
    ?event hyque:agent <http://bio2rdf.org/sgd:Gal4p> .
    ?event hyque:target <http://bio2rdf.org/sgd:GAL1> .
    …
}
binding
Hypothesis + SPARQL Template => SPARQL query
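The binding step can be sketched with Python's `string.Template`. This is a simplified stand-in for the HyQue template, whose CONSTRUCT and WHERE clauses are elided on the slide: the CONSTRUCT pattern, the `?p ?o` catch-all, and the name `build_query` are assumptions, and the PREFIX declaration for `hyque:` is left out because the namespace is not given.

```python
from string import Template

# SPARQL variables start with "?", so they do not clash with the
# "$name" placeholders that string.Template substitutes.
QUERY_TEMPLATE = Template("""\
CONSTRUCT { ?event ?p ?o }
WHERE {
  ?event hyque:is_negated ?negated .
  ?event hyque:logical_operator ?logical_operator .
  ?event hyque:agent <$agent> .
  ?event hyque:target <$target> .
  ?event ?p ?o .
}""")


def build_query(event: dict) -> str:
    """Fill the template with the agent and target of one hypothesis event."""
    return QUERY_TEMPLATE.substitute(agent=event["agent"], target=event["target"])


e1 = {
    "agent": "http://bio2rdf.org/sgd:Gal4p",
    "target": "http://bio2rdf.org/sgd:GAL1",
}
query = build_query(e1)
print(query)
```

`substitute` raises `KeyError` when a placeholder has no value, so an event with no agent or target fails early instead of producing a malformed query.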
![Page 25: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/25.jpg)
SPARQL QUERY RESULTS RETRIEVED
hybrow_data:f0957524deecae38945736737cc07d45
    hyque:logical_operator <http://bio2rdf.org/go:0010628> ;  # positive regulation of gene expression
    hyque:is_negated "0" ;
    hyque:agent <http://bio2rdf.org/sgd:Gal4p> ;
    hyque:target <http://bio2rdf.org/sgd:GAL1> ;
    hyque:agent_type <http://bio2rdf.org/chebi:36080> ;  # Protein
    hyque:target_type <http://bio2rdf.org/so:0000236> ;  # Gene
    hyque:location <http://bio2rdf.org/go:0005634> ;  # Nucleus
    hyque:agent_function_type <http://bio2rdf.org/go:0003700> .  # Transcription factor activity
![Page 26: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/26.jpg)
QUERY RESULTS EVALUATED BASED ON RULE SETS
‘induce’ rule (maximum score: 5):
• Is event negated? If yes, subtract 2
• Is logical operator ‘induce’ (GO:0010628)? If yes, add 1; if no, subtract 1
• Is agent of type ‘protein’ (CHEBI:36080) or ‘RNA’? If yes, add 1; if of type ‘gene’, subtract 1
• Is target of type ‘gene’ (SO:0000236)? If yes, add 1; if no, subtract 1
• Does agent have known ‘transcription factor activity’ (GO:0003700)? If yes, add 1
• Is event located in the ‘nucleus’ (GO:0005634)? If yes, add 1; if no, subtract 1
![Page 27: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/27.jpg)
EVALUATING HYPOTHESES
e1 (Gal4p induces expression of GAL1)
e1 describes the induction of GAL1 gene expression by Gal4p and is therefore an event of type ‘induce’.
Evaluation:
• Agent of type ‘protein’: yes -> +1
• Target of type ‘gene’: yes -> +1
• Agent has function ‘transcription factor activity’: no -> 0
• Event location is ‘nucleus’: yes -> +1
• Logical operator is ‘induce’: yes -> +1
• Event negated in published literature: no -> 0
Thus, the e1 event obtains 4 out of a maximum of 5 points, and receives a score of 0.8.
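The ‘induce’ rule set and the e1 evaluation can be reproduced with a short Python sketch. The function name, its parameters, and the use of plain labels such as "protein" in place of ontology identifiers are assumptions made for illustration; the point values are the ones listed in the rule set, and the e1 inputs are the answers given in the evaluation above.

```python
MAX_INDUCE_SCORE = 5  # maximum stated for the 'induce' rule


def score_induce_event(negated: bool, operator: str, agent_type: str,
                       target_type: str, agent_is_tf: bool,
                       location: str) -> float:
    """Apply the 'induce' rule set and return a score normalized by the maximum."""
    score = 0
    if negated:
        score -= 2
    score += 1 if operator == "induce" else -1
    if agent_type in ("protein", "RNA"):
        score += 1
    elif agent_type == "gene":
        score -= 1
    score += 1 if target_type == "gene" else -1
    if agent_is_tf:  # no penalty when the function is not known
        score += 1
    score += 1 if location == "nucleus" else -1
    return score / MAX_INDUCE_SCORE


# Inputs taken from the e1 evaluation: 4 of 5 points.
e1_score = score_induce_event(
    negated=False,
    operator="induce",
    agent_type="protein",
    target_type="gene",
    agent_is_tf=False,
    location="nucleus",
)
print(e1_score)
```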
![Page 28: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/28.jpg)
EVALUATING HYPOTHESES
Events e2, e3, and e4 are also ‘induce’ events and are evaluated using the ‘induce’ rule set, each obtaining a score of 0.8.
e5 is undecidable: the HKB contains no data to support that Gal80p inhibits Gal4p when GAL3 is over-expressed
-> the entire third event set is deemed undecidable.
Overall hypothesis score is selected from e1 (0.8) and e2 + e3 (0.8 + 0.8 = 1.6)
Final hypothesis score is 1.6: events e2 + e3 have the strongest experimental support.
e1 (Gal4p induces expression of GAL1)
OR
e2 (Gal3p induces expression of GAL2
e3 AND Gal4p induces expression of GAL7)
OR
e4 (Gal4p induces expression of GAL7
e5 AND Gal80p inhibits production of Gal4p
when GAL3 is over-expressed
e6 AND Gal80p induces expression of GAL7)
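The way the slide combines event scores can be sketched as follows. This assumes that events joined by AND have their scores summed, that one undecidable event makes its whole set undecidable, and that the OR across sets selects the highest decidable score; the slide says only that the score is "selected", so taking the maximum is an interpretation. `None` stands for an undecidable event, and the value used for e6 is a placeholder because the slides do not report it.

```python
from typing import List, Optional


def score_event_set(scores: List[Optional[float]]) -> Optional[float]:
    """Score an AND-connected event set; None marks an undecidable event."""
    if any(s is None for s in scores):
        return None  # one undecidable event makes the whole set undecidable
    return sum(scores)


def score_hypothesis(event_sets: List[List[Optional[float]]]) -> Optional[float]:
    """Score an OR of event sets as the best decidable set score (assumed)."""
    decidable = [s for s in map(score_event_set, event_sets) if s is not None]
    return max(decidable) if decidable else None


event_sets = [
    [0.8],             # e1
    [0.8, 0.8],        # e2 AND e3
    [0.8, None, 0.0],  # e4 AND e5 (undecidable) AND e6 (placeholder value)
]
final_score = score_hypothesis(event_sets)
print(final_score)
```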
![Page 29: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/29.jpg)
HYPOTHESIS EVALUATION REPRESENTED AS RDF
![Page 30: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/30.jpg)
BROWSE HYPOTHESIS AND EVALUATION AS LINKED DATA
![Page 31: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/31.jpg)
http://sadiframework.org
Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB
The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web services using OWL classes as service inputs and outputs
Users can post a hypothesis in RDF and receive the hypothesis evaluation RDF
HyQue can become part of a workflow for investigations
![Page 32: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/32.jpg)
FUTURE DIRECTIONS
• Investigate alternative, finer grained scoring systems
• Expand beyond the GAL network with network reconstructions and NLP facilitated data curation
• Collaborative social environment to engineer, share, compare and evaluate hypotheses, and format the results
![Page 33: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/33.jpg)
CONCLUSION
HyQue is a new system to construct and evaluate (automatically obtain support for) hypotheses using formalized background knowledge and data on the Semantic Web
![Page 34: HyQue: Evaluating scientific Hypotheses using semantic web technologies](https://reader035.fdocuments.us/reader035/viewer/2022081507/554e8e50b4c90526358b4c72/html5/thumbnails/34.jpg)
Acknowledgements
Alison Callahan (developing HyQue)
Nigam Shah (key collaborator)
Stephen Racunas and Amar Das for helpful discussions
Bio2RDF: Peter Ansell, Francois Belleau, Alison Callahan, Jacques Corbeil, Jose Cruz-Toledo, Alex De Leon, Steve Etlinger, James Hogan, Nichealla Keith, Jean Morissette, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault and Paul Roe
SADI: Christopher Baker, Melanie Courtot, Jose Cruz-Toledo, Steve Etlinger, Nichealla Keith, Artjom Klein, Luke McCarthy, Silvane Paixao, Ben Vandervalk, Natalia Villanueva-Rosales, Mark Wilkinson