Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical Informatics and...
Formalising Uncertainty: An Ontology of Reasoning,
Certainty and Attribution (ORCA)
Anita de Waard, Disruptive Technologies Director, Elsevier Labs, Jericho, VT, USA
Jodi Schneider, PhD Researcher, DERI, Galway, Ireland
Outline
• Background:
  – Metadiscourse, epistemic modality, and knowledge attribution, oh my!
  – Some related work: genre studies, linguistics, NLP
• Our model:
  – What it models
  – The ontology
  – How can we find this in text?
• Possible applications:
  – Possible uses
  – Next steps
Background
Scientists make uncertain claims
Uncertainty:
These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
But uncertainty gets lost while citing
Uncertainty:
These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
Certainty:
Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006)
Uncertainty in action:
• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”
• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
“[Y]ou can transform .. fiction into fact just by adding or subtracting references”, Bruno Latour [1]
Uncertainty = Hedging:
• Why do authors hedge?
  – Make a claim ‘pending […] acceptance in the community’ [2]
  – ‘Create A Research Space’: hedging allows authors to insert themselves into the discourse in a community [3]
  – ‘the strongest claim a careful researcher can make’ [4]
• Hedging cues, speculative language, modality/negation:
  – Light et al. [5]: finding speculative language
  – Wilbur et al. [6]: focus, polarity, certainty, evidence, and directionality
  – Thompson et al. [7]: level of speculation, type/source of the evidence and level of certainty
• Sentiment detection (e.g. Kim and Hovy [8], among many others):
  – Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
  – Wide applications in product reviews; but not (yet) in science!
Our Model
Our model for epistemic evaluations:
For a Proposition P, an epistemically marked clause E is an evaluation of P, written E_{V,B,S}(P), with:
– V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
– B = Basis: Reasoning, Data
– S = Source:
  A = speaker is author A, explicit
  IA = speaker is author A, implicit
  N = other author N, explicit
  NN = other author NN, implicit
Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
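One way to picture an evaluation with a Value, Basis and Source is as a small record type. The sketch below is illustrative only: the class and field names are hypothetical, and the `UNKNOWN` basis member is an assumption for clauses where no basis is stated.

```python
from dataclasses import dataclass
from enum import Enum

# Value scale as listed on the slide.
VALUE_LABELS = {
    3: "assumed true", 2: "probable", 1: "possible", 0: "unknown",
    -1: "possibly untrue", -2: "probably untrue", -3: "assumed untrue",
}


class Basis(Enum):
    REASONING = "Reasoning"
    DATA = "Data"
    UNKNOWN = "Unknown"  # assumption: used when no basis is stated


class Source(Enum):
    A = "speaker is author, explicit"
    IA = "speaker is author, implicit"
    N = "other author, explicit"
    NN = "other author, implicit"


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical container for one epistemic evaluation of a proposition."""
    proposition: str
    value: int
    basis: Basis
    source: Source

    def __post_init__(self) -> None:
        # Reject values outside the -3..3 scale.
        if self.value not in VALUE_LABELS:
            raise ValueError(f"value must be in -3..3, got {self.value}")

    @property
    def label(self) -> str:
        return VALUE_LABELS[self.value]


example = Evaluation(
    proposition="only hMOB1A and hMOB1B interact with both LATS1 and LATS2",
    value=3, basis=Basis.DATA, source=Source.N,
)
print(example.label, example.basis.value, example.source.name)
```

Keeping Value as a plain integer preserves the ordering of the scale, so two evaluations of the same proposition can be compared directly.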
Adding Epistemic Evaluation
Together, Lats2 and ASPP1 shunt p53 to proapoptotic promoters and promote the death of polyploid cells [1]. (…)
Value = 3, Source = N, Basis = 0
Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39]. (…)
Value = 3, Source = N, Basis = Data
Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression.
Value = 1, Source = Author, Basis = Data
Furthermore, we demonstrated that the direct inhibition of LATS2 protein was mediated by miR-373 and manipulated the expression of miR-373 to affect esophageal cancer cells growth.
Value = 2 (3?), Source = Author, Basis = Data
Finding hedges in text [9]:
• Modal auxiliary verbs (e.g. can, could, might)
• Qualifying adverbs and adjectives (e.g. interestingly, possibly, likely, potential, somewhat, slightly, powerful, unknown, undefined)
• References, either external (e.g. ‘[Voorhoeve et al., 2006]’) or internal (e.g. ‘See fig. 2a’).
• Reporting/epistemic verbs (e.g. suggest, imply, indicate, show)
  – either within the clause: ‘These results suggest that...’
  – or in a subordinate clause governed by a reporting-verb matrix clause: ‘{These results suggest that} indeed, this represents the true endogenous activity.’
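The four cue types above can be approximated with simple word lists and patterns. The following is a minimal sketch, not the method used in [9]: the word lists are just the slide's examples, and the reference patterns and the naive inflection helper are assumptions made for illustration.

```python
import re

# Cue lists copied from the examples above; illustrative, not exhaustive.
MODAL_AUX = {"can", "could", "might"}
QUALIFIERS = {"interestingly", "possibly", "likely", "potential", "somewhat",
              "slightly", "powerful", "unknown", "undefined"}
REPORTING_VERBS = ["suggest", "imply", "indicate", "show"]


def _inflections(verb):
    """Naive regular inflections (assumption; irregular forms are not covered)."""
    if verb.endswith("e"):
        return {verb, verb + "s", verb + "d", verb[:-1] + "ing"}
    if verb.endswith("y"):
        return {verb, verb[:-1] + "ies", verb[:-1] + "ied", verb + "ing"}
    return {verb, verb + "s", verb + "ed", verb + "ing"}


# Map every inflected form back to its base verb.
VERB_FORMS = {form: verb for verb in REPORTING_VERBS for form in _inflections(verb)}

# Assumed surface patterns for external and internal references.
EXTERNAL_REF = re.compile(r"\[[^\]]+\]|\([A-Z][^()]*\d{4}\)")
INTERNAL_REF = re.compile(r"\b(?:see\s+)?fig(?:ure|\.)?\s*\d+[a-z]?", re.IGNORECASE)


def find_hedge_cues(sentence):
    """Return the cue types found in a sentence, keyed by cue type."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    cues = {
        "modal_aux": [t for t in tokens if t in MODAL_AUX],
        "qualifier": [t for t in tokens if t in QUALIFIERS],
        "reporting_verb": [VERB_FORMS[t] for t in tokens if t in VERB_FORMS],
        "reference": EXTERNAL_REF.findall(sentence) + INTERNAL_REF.findall(sentence),
    }
    return {kind: found for kind, found in cues.items() if found}


print(find_hedge_cues(
    "These miRNAs neutralize p53-mediated CDK inhibition, possibly through "
    "direct inhibition of the expression of the tumor-suppressor LATS2."
))  # {'qualifier': ['possibly']}
```

A lexical matcher like this only flags candidate cues; deciding whether a cue actually governs the proposition (the "ruled by reporting verb" case) needs syntactic analysis.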
Manual identification:
Value | Modal Aux | Reporting Verb | Ruled by RV | Adverbs/Adjectives | References | None | Total
Total Value = 3 | 1 (0.5%) | 81 (40%) | 24 (12%) | 7 (4%) | 41 (20%) | 47 (24%) | 201 (100%)
Total Value = 2 | - | 29 (51%) | 23 (40%) | 1 (2%) | 4 (7%) | - | 57 (100%)
Total Value = 1 | 9 (27%) | 11 (33%) | 11 (33%) | 1 (3%) | 1 (3%) | - | 33 (100%)
Total Value = 0 | - | 9 (64%) | 3 (21%) | 1 (7%) | 1 (7%) | - | 14 (100%)
Total No Modality | - | 16 (37%) | 3 (7%) | 0 | 3 (7%) | 22 (50%) | 44 (100%)
Overall Total | 10 (2%) | 146 (23%) | 64 (10%) | 10 (2%) | 50 (8%) | 69 (11%) | 640 (100%)
Most prevalent clause type: “These results suggest that...”
Adverb/Connective: thus, therefore, together, recently, in summary
Determiner/Pronoun: it, this, these, we/our
Adjective: previous, future, better
Noun phrase: data, report, study, result(s); method or reference
Modal: form of ‘to be’, may, remain
Adverb: often, recently, generally
Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
Preposition: that, to
Reporting verbs vs. epistemic value:
Value = 0 (unknown):
establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
Value = 1 (hypothetical):
be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think
Value = 2 (probable):
appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)
Value = 3 (presumed true):
be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
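The lists above can be read as a lookup from reporting verb to candidate epistemic values. The sketch below encodes a subset of them; the function name is hypothetical, and verbs listed without a count are assumed to have occurred once.

```python
# Subset of the verb lists above. Counts follow the slide where given;
# verbs listed without a count are ASSUMED to have occurred once.
OBSERVED = {
    0: {"report": 1, "describe": 1, "establish": 1},
    1: {"hypothesize": 5, "consider": 1, "expect": 1, "suspect": 1, "think": 1},
    2: {"suggest": 18, "indicate": 12, "implicate": 2, "validate": 2, "imply": 1},
    3: {"show": 24, "demonstrate": 15, "detect": 5, "identify": 4, "report": 2},
}


def candidate_values(verb):
    """Return (value, count) pairs for a verb, most frequent value first."""
    hits = [(value, counts[verb]) for value, counts in OBSERVED.items() if verb in counts]
    return sorted(hits, key=lambda pair: -pair[1])


print(candidate_values("suggest"))  # [(2, 18)]
print(candidate_values("report"))   # [(3, 2), (0, 1)]
```

Because some verbs ("report", "be important") appear under more than one value, the verb alone suggests a value rather than determining it; other cues in the clause still matter.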
Finding Claimed Knowledge Updates [10]:
Definition:
1) A CKU expresses a proposition about biological entities
2) A CKU is a new proposition
3) The authors present the CKU as factual => Strength = Certainty
4) A CKU is derived from experimental work described in the article => Basis = Data
5) The ownership is attributed to the author(s) of the article => Source = Author, Explicit
3), 4) and 5) are either explicitly expressed or structurally conveyed:
Here we used mass spectrometry to identify HuD as a novel SMN-interacting partner
Our analysis of known HuD-associated mRNAs identified cpg15 mRNA as a highly abundant mRNA in HuD IPs
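The five criteria can be expressed as a single predicate over an annotated claim. This is a sketch under stated assumptions: the `Claim` type and its field names are hypothetical, and "Strength = Certainty" is taken to mean Value = 3 on the scale of the model.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Hypothetical annotated clause; field names are illustrative."""
    text: str
    about_biological_entities: bool  # criterion 1
    is_new: bool                     # criterion 2
    value: int                       # criterion 3 (assumed: certainty means 3)
    basis: str                       # criterion 4
    source: str                      # criterion 5 ("A" = author, explicit)


def is_cku(claim: Claim) -> bool:
    """True only when all five criteria of the definition hold."""
    return (
        claim.about_biological_entities
        and claim.is_new
        and claim.value == 3
        and claim.basis == "Data"
        and claim.source == "A"
    )


cku = Claim(
    "Here we used mass spectrometry to identify HuD as a novel "
    "SMN-interacting partner",
    about_biological_entities=True, is_new=True,
    value=3, basis="Data", source="A",
)
print(is_cku(cku))  # True
```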
Automatic hedge detection with The Xerox Incremental Parser:
Concept-matching:
Match concept patterns with rules
Assign features to keywords, dependencies and sentences
General linguistic analysis of running texts:
Extract syntactic dependencies between words
Chunking
Part-of-speech disambiguation
Segment the sentences into words
Segment the text into sentences
Result: CKUs appear throughout the paper
Title: Interaction of survival of motor neuron (SMN) and HuD proteins [with mRNA cpg15 rescues motor neuron axonal deficits]
Abstract: Here we used mass spectrometry to identify HuD as a novel neuronal SMN-interacting partner.
Intro.: Here we identify HuD as a novel interacting partner of SMN,
Results: Together with our co-IP data, these results indicate that SMN associates with HuD in motor neurons.
Figures: SMN interacts with HuD.
Discussion: Our MS and co-IP data demonstrate a strong interaction between SMN and HuD in spinal motor neuron axons.
Citation: Furthermore, these findings are consistent with recent studies demonstrating that the interaction of HuD with the spinal muscular atrophy (SMA) protein SMN …
Bio-event:
event name: interaction
entity 1: HuD
entity 2: SMN
location: motor neurons
The formal model
© Jodi Schneider, with thanks to Siggi Handschuh
orca [11] vocab.deri.ie/orca
Example Usage
<claim> orca:hasBasis orca:Data .
Basis
Source
ConfidenceLevel
How to represent the hierarchy?
lack of knowledge < hypothetical knowledge < dubitative knowledge < doxastic knowledge
• skos:broader – not appropriate
• SKOS Collections add an unwanted layer of complexity.
• Our approach: transitive properties “lessCertain” and “moreCertain”
Transitive properties used for ConfidenceLevel
ConfidenceLevel & its Relationships
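To see what declaring “lessCertain” transitive buys, the sketch below asserts it only between adjacent levels and derives the rest. The level identifiers are hypothetical names for the four levels of the hierarchy, and the reading "X lessCertain Y means X is less certain than Y" is an assumption; the closure loop stands in for what a reasoner would infer.

```python
# Hypothetical identifiers for the four levels, ordered least to most certain.
LEVELS = [
    "LackOfKnowledge",
    "HypotheticalKnowledge",
    "DubitativeKnowledge",
    "DoxasticKnowledge",
]

# Assert lessCertain only between neighbours (assumed reading: X is less certain than Y).
asserted = {(LEVELS[i], LEVELS[i + 1]) for i in range(len(LEVELS) - 1)}


def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


less_certain = transitive_closure(asserted)
# moreCertain is modelled here as the inverse relation.
more_certain = {(b, a) for (a, b) in less_certain}

print(("LackOfKnowledge", "DoxasticKnowledge") in less_certain)  # True
print(len(less_certain))  # 6
```

Three asserted statements yield all six ordered pairs, which is the practical appeal of a transitive property over listing every comparison by hand.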
Possible Applications
Add knowledge value/basis/source to a bio-event
Biological statement with epistemic markup | Epistemic evaluation
Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression.
Value = Probable, Source = Author, Basis = Data
Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39].
Value = Presumed true, Source = Reference, Basis = Data
Moreover, the mechanisms by which tumor suppressor genes are inhibited may vary between tumors.
Value = Possible, Source = Unknown, Basis = Unknown
E.g. to augment MedScan [12]
Biological statement with MedScan/epistemic markup
Statement: Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).
MedScan Analysis: IL-6 → NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1
Epistemic evaluation: Value = Probable, Source = Author, Basis = Data
Or Biological Expression Language [13]:
Biological statement with BEL/epistemic markup
Statement: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
BEL representation:
Increased abundance of miR-372 decreases: Increased activity of TP53 decreases activity of CDK protein family
r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))
Increased abundance of miR-372 decreases abundance of LATS2
r(MIR:miR-372) -| r(HUGO:LATS2)
Epistemic evaluation: Value = Possible, Source = Unknown, Basis = Unknown
Using ORCA for Nanopublications [14]:
• Use to indicate Strength, Basis, Source of Assertions:
Knowledge Strength, Basis, Source Methods Authors, DOIs
Next steps:
• Continuing experiments with automated detection
• Can be used in Claim-Evidence network projects, e.g. Data2Semantics or DIKB
• Could replace more complicated models of argumentation
• Ontology is available for all to use!
Thank you!
• Funding:
  – Elsevier Labs
  – NWO Casimir programme
• Collaborators:
  – Henk Pander Maat, UU
  – Agnes Sandor, XRCE
  – Siegfried Handschuh, DERI
  – Rinke Hoekstra & co, VU
  – Richard Boyce & co, UPitt
  – Maria Liakata, EBI
  – Sophia Ananiadou & co, NaCTeM
• Discussion partners:
  – Phil Bourne, UCSD
  – Ed Hovy
  – Gully Burns, ISI
  – Joanne Luciano, RPI
  – Tim Clark et al., Harvard
Questions?
Anita de Waard [email protected]
http://elsatglabs.com/labs/anita/
Jodi Schneider [email protected]
http://jodischneider.com/jodi.html
References
[1] Latour, B. and Woolgar, S. (1979). Laboratory Life: the Social Construction of Scientific Facts. Sage.
[2] Myers, G. (1992). ‘In this paper we report’: Speech acts and scientific facts. Journal of Pragmatics 17, 295-313.
[3] Swales, J. (1990). Genre Analysis: English in Academic and Research Settings. Cambridge University Press.
[4] Salager-Meyer, F. (1994). Hedges and Textual Communicative Function in Medical English Written Discourse. English for Specific Purposes 13(2), 149-170.
[5] Light, M., Qiu, X.Y. and Srinivasan, P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases, 17-24.
[6] Wilbur, W.J., Rzhetsky, A. and Shatkay, H. (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 7:356.
[7] Thompson, P., Venturi, G. et al. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Workshop on Building and Evaluating Resources for Biomedical Text Mining.
[8] Kim, S.-M. and Hovy, E.H. (2004). Determining the Sentiment of Opinions. COLING conference, Geneva.
[9] de Waard, A. and Pander Maat, H. (2012). Epistemic Modality and Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and Overview of Features. Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
[10] Sándor, Á. and de Waard, A. (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles. Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
[11] de Waard, A. and Schneider, J. (2012). Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA). SATBI+SWIM, ISWC 2012.
[12] Medscan
[13] Biological Expression Language – http://www.openbel.org
[14] Groth et al. (2010). ‘The anatomy of a nanopublication’. Information Services & Use 30:51-6.