Reasoning with the RNA Ontology
Chris Mungall
Lawrence Berkeley National Laboratory
Reasoning with the RNA Ontology – p.1/28
What is a reasoner?
A reasoner implements a generalized decision procedure which takes a collection of logical axioms and finds the entailments of these axioms and whether or not the axioms are satisfiable
An ontology can be considered as a collection of axioms (in contrast to a terminology)
  1. Relationships: is a (SubClass), partOf, ...
  2. Definitions
  3. Constraints
We can also treat data as collections of axioms
Examples of Ontology Axioms
GNRATetraloop is a Tetraloop
Tetraloop is a RNAStructure
A translation to first-order predicate logic:
  GNRATetraloop(x) → Tetraloop(x)
  Tetraloop(x) → RNAStructure(x)
Set theoretic: GNRATetraloop ⊆ Tetraloop ⊆ RNAStructure
Entailment: GNRATetraloop is a RNAStructure
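As a minimal sketch of how such an entailment can be computed, the two asserted is a axioms can be treated as edges in a graph and the unstated superclasses found by following them transitively. The function name and data layout here are illustrative assumptions, not part of any particular reasoner.

```python
from collections import defaultdict

# Asserted is_a axioms from the slide: (subclass, superclass).
ASSERTED = [
    ("GNRATetraloop", "Tetraloop"),
    ("Tetraloop", "RNAStructure"),
]

def entailed_superclasses(asserted):
    """Return {class: set of every superclass reachable via is_a}."""
    parents = defaultdict(set)
    for sub, sup in asserted:
        parents[sub].add(sup)
    closure = {}
    for cls in list(parents):
        seen, stack = set(), [cls]
        while stack:
            # Follow asserted edges upward, collecting each new superclass once.
            for sup in parents.get(stack.pop(), ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        closure[cls] = seen
    return closure

closure = entailed_superclasses(ASSERTED)
print(sorted(closure["GNRATetraloop"]))
```

The closure for GNRATetraloop contains RNAStructure even though that relationship was never stated directly.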
The reasoning square
            Classifying                        Validation
Ontology    Inference of unstated              Finding inconsistent
            relationships in the ontology      axioms in the ontology
Data        Inference of unstated              Determining if a dataset
            facts in data                      is valid
[Figure: the reasoning square filled in with examples. Labels: Tetraloop, GNRA Tetraloop, T therm 23S RNA region, Purine and Pyrimidine (disjoint).]
Ontology Languages
First Order Logic (Common Logic ISO standard)
  Highly expressive
  Undecidable: no decision procedure that is guaranteed to terminate
OWL and Description Logics
  Restricted subset of FOL with highly convenient constructs for describing classes
  Reasoners are heavily tested on existing ontologies
OBO
  Initially an ad-hoc format for the Gene Ontology
  Now an alternate syntax for Common Logic
  Reasoners based on rule application
Common Logic
Common Logic is an ISO specification for First Order Logic (FOL)
Syntaxes
  CLIF - Lisp-like (derived from KIF)
  XCL - XML
  CGIF - Conceptual Graphs
A CL text consists of CL sentences (axioms)
Sentences can be atomic, boolean or logically quantified
  Atomic sentence: a predicate followed by zero or more arguments
  Boolean sentence: and, or, if (→), iff (↔)
  Quantified sentence: forall (∀), exists (∃)
Common Logic Examples
Textbook syntax, each followed by its CLIF form:

∀x : GNRATetraloop(x) → Tetraloop(x)
  (forall (x)
    (if (GNRATetraloop x)
        (Tetraloop x)))

∀x : Purine(x) → ¬Pyrimidine(x)
  (forall (x)
    (if (Purine x)
        (not (Pyrimidine x))))

∀x : Intron(x) → ∃y : Exon(y) ∧ adjacentTo(x, y)
  (forall (x)
    (if (Intron x)
        (exists (y)
          (and (Exon y)
               (adjacentTo x y)))))
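To make the quantifiers concrete, the Intron sentence can be checked against a small finite model, reading forall as all(...) and exists as any(...). This is only a sketch: the domain, the class extensions and the function name are made up for illustration, and checking one finite model is far weaker than what a theorem prover does.

```python
# A small, made-up finite model.
DOMAIN = {"i1", "e1", "e2"}
EXT = {"Intron": {"i1"}, "Exon": {"e1", "e2"}}
ADJACENT_TO = {("i1", "e1"), ("i1", "e2")}

def intron_axiom_holds(domain, ext, adjacent_to):
    """forall x: Intron(x) -> exists y: Exon(y) and adjacentTo(x, y)"""
    return all(
        (x not in ext["Intron"])  # the implication is true when x is not an Intron
        or any(y in ext["Exon"] and (x, y) in adjacent_to for y in domain)
        for x in domain
    )

print(intron_axiom_holds(DOMAIN, EXT, ADJACENT_TO))
```

Dropping every adjacentTo fact from the model makes the sentence false, because i1 then has no adjacent Exon.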
Reasoning with FOL
Undecidable: FOL theorem provers are not guaranteed to terminate
The Horn logic subset has desirable computational properties
  Head ← Body
  Logic Programming
  SWRL
  Datalog
  Relational Model, Relational Algebra
  non-monotonic and probabilistic extensions
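A rough sketch of why Horn rules are computationally friendly: with a finite set of constants and no function symbols (the Datalog case), rules of the form Head ← Body can simply be applied until nothing new is derived. The tuple encoding of atoms and the "?x" variable convention below are assumptions made for this example; real engines use far better evaluation strategies than this naive loop.

```python
def is_var(term):
    return term.startswith("?")

def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`; return None on failure."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def forward_chain(facts, rules):
    """Apply (head, body) rules until no new facts appear (naive evaluation)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            bindings = [{}]
            for atom in body:
                # Join the current partial bindings against every known fact.
                bindings = [b2 for b in bindings for f in facts
                            if (b2 := match(atom, f, b)) is not None]
            for b in bindings:
                new_fact = tuple(b.get(t, t) for t in head)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

RULES = [
    (("Tetraloop", "?x"), [("GNRATetraloop", "?x")]),
    (("RNAStructure", "?x"), [("Tetraloop", "?x")]),
]
derived = forward_chain({("GNRATetraloop", "loop1")}, RULES)
print(sorted(derived))
```

The loop is guaranteed to stop here because only finitely many facts can be built from the given constants. Every variable in a rule head is assumed to also appear in its body.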
OWL-DL
OWL belongs to a family of logics known as Description Logics, circumscribed subsets of FOL that are guaranteed to be decidable
Variety of notations (syntaxes):
  RDF-XML - Default, but it’s a mess
  OWL-XML - Easier to manipulate computationally
  Manchester Syntax - Easy on the eye
Constructs
  Property (relation) unary predicates: Functional, Transitive, Symmetric, ...
  Class Axioms: SubClass, EquivalentClass, DisjointWith, ...
  Descriptions
OWL2 has lots of tools and reasoners to choose from
Descriptions in OWL
A Description is a (possibly recursive) tree structure that formally identifies membership criteria for a class.
Can be combined using logical connectives: AND, OR, NOT
  AND : intersectionOf
  OR : unionOf
  NOT : complementOf
Restrictions
  Restrict class membership based on some property
  ONLY : example (paired_with_CWW ONLY Guanine)
  SOME :
  Quantified cardinality restrictions
Example: CWWAGBasePair = hasPart only (A and pairsWithCWW some G)
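One way to read these constructs is set-theoretically: over a fixed finite interpretation, AND, OR and NOT become set intersection, union and complement, and the SOME and ONLY restrictions become filters over property pairs. The sketch below evaluates the CWWAGBasePair example on a tiny made-up interpretation. The individuals and helper names are hypothetical, and evaluating one interpretation is not the same as DL reasoning, which considers all models.

```python
# A tiny, made-up interpretation: a domain, class extensions and two properties.
DOMAIN = {"a1", "g1", "c1", "bp1"}
CLASSES = {"A": {"a1"}, "G": {"g1"}, "C": {"c1"}}
PROPS = {
    "pairsWithCWW": {("a1", "g1")},
    "hasPart": {("bp1", "a1")},
}

def some(prop, filler):
    """Individuals with at least one `prop` successor in `filler`."""
    return {x for x, y in PROPS[prop] if y in filler}

def only(prop, filler):
    """Individuals all of whose `prop` successors are in `filler`."""
    return {x for x in DOMAIN
            if all(y in filler for s, y in PROPS[prop] if s == x)}

def complement(cls):
    """NOT: everything in the domain outside `cls`."""
    return DOMAIN - cls

# hasPart only (A and pairsWithCWW some G)
inner = CLASSES["A"] & some("pairsWithCWW", CLASSES["G"])
cwwag_only = only("hasPart", inner)
cwwag_some = some("hasPart", inner)
print(sorted(cwwag_only), sorted(cwwag_some))
```

Note that the ONLY form is satisfied vacuously by every individual with no hasPart successor at all, so in this interpretation it covers the whole domain, while the SOME form picks out just bp1. This is a common reason to pair an ONLY restriction with a SOME restriction.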
OWL Reasoners
Decision procedure based on tableau calculus
  Refutation-based, repeated applications of De Morgan's laws
Widely used and tested on ontologies
  Many reasoners can now classify the larger biological ontologies in acceptable time
Less widely used on data
  RDF triplestores are commonly used but these lack key OWL constructs.
  OWLGRES is a promising technology here.
No Unique Name Assumption
Classes and instances are potentially equivalent unless declared otherwise. Given the ontology axiom:
  Functional(fivePrimeTo)
And instance axioms:
  A(b1)
  A(b2)
  A(b3)
  b1 fivePrimeTo b2
  b1 fivePrimeTo b3
A reasoner will not say this is inconsistent. It will infer that b2=b3. To get a reasoner to detect the inconsistency we must explicitly declare all base instances to be distinct:
  b1 differentFrom b2
  b1 differentFrom b3
  b2 differentFrom b3
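The behaviour described above can be imitated with a small union-find sketch: two objects of a functional property that share a subject are merged, and a clash is reported only if a merged pair was declared differentFrom. The function name and argument layout are assumptions for illustration. Unlike a real OWL reasoner, this sketch does not propagate merges between subjects.

```python
def check_functional(assertions, different=frozenset()):
    """Merge objects forced equal by a functional property; report clashes.

    assertions: iterable of (subject, object) pairs for one functional property.
    different:  set of frozenset({x, y}) pairs declared differentFrom.
    Returns (inferred_equalities, is_consistent).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    first_object = {}
    equalities = set()
    for subj, obj in assertions:
        if subj in first_object and find(first_object[subj]) != find(obj):
            # Functional property: both objects must denote the same individual.
            equalities.add(frozenset({first_object[subj], obj}))
            parent[find(obj)] = find(first_object[subj])
        first_object.setdefault(subj, obj)

    consistent = all(find(x) != find(y) for x, y in map(tuple, different))
    return equalities, consistent

FACTS = [("b1", "b2"), ("b1", "b3")]
DISTINCT = {frozenset(p) for p in [("b1", "b2"), ("b1", "b3"), ("b2", "b3")]}

eq_open, ok_open = check_functional(FACTS)
eq_distinct, ok_distinct = check_functional(FACTS, DISTINCT)
print(ok_open, ok_distinct)
```

Without the differentFrom declarations the data is consistent and b2=b3 is inferred. With them, the same merge produces a clash.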
The Open World Assumption
Unstated facts are not assumed to be false. Given the ontology axioms:
  A SubClassOf Base
  UnpairedBase equivalentTo (Base that pairedWith exactly 0 Base)
And instance axioms:
  A(b1)
  A(b2)
  A(b3)
  b1 fivePrimeTo b2
  b2 fivePrimeTo b3
A reasoner will not infer b1, b2 or b3 to be UnpairedBases. We need to explicitly declare this:
  UnpairedBase(b1)
  UnpairedBase(b2)
  UnpairedBase(b3)
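The difference between open-world and closed-world reading can be sketched as a three-valued check: under the open world, the absence of a pairedWith fact leaves the answer unknown, while a closed-world system (such as a typical database query) would treat absence as falsity. The variable and function names are hypothetical.

```python
# Stated facts only: nothing says whether b1..b3 are paired.
BASES = {"b1", "b2", "b3"}
PAIRED_WITH = set()        # asserted pairedWith(x, y) facts
KNOWN_UNPAIRED = set()     # individuals explicitly asserted to be UnpairedBase

def is_unpaired(base, closed_world):
    """Return True, False, or None (unknown) for UnpairedBase(base)."""
    if base in KNOWN_UNPAIRED:
        return True
    if any(base in pair for pair in PAIRED_WITH):
        return False
    # No stated pairing: this counts as "unpaired" only under a closed world.
    return True if closed_world else None

print([is_unpaired(b, closed_world=False) for b in sorted(BASES)])
print([is_unpaired(b, closed_world=True) for b in sorted(BASES)])
```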
OBO
Initially an ad-hoc format for the Gene Ontology
  Graph-centric
  Terminological features
Formal Semantics
  Initially lacked formal semantics. Formal definition written in natural language in the Relations Ontology.
  Translation to OWL-DL (Horrocks et al.)
With OBO 1.3, every OBO document is a Common Logic Text
  OBO-Core consists only of atomic sentences
  OBO-CL allows arbitrary logical formulae
  OBO-H is OBO-Core plus Horn rules
Reasoning over OBO ontologies
Strategies
Convert to OWL and use an OWL reasoner
Convert to CL and use a FOL theorem prover
Use a rule-based reasoner
  Java implementation: OBO-Edit
  Prolog implementation: easy to extend
  SQL implementation: slow but scales over massive ontologies and datasets
  Limitations: limited support for negation
Are Description Logics enough?
Some things that cannot be done in OWL-2:
Define relations using arithmetic
Define relations using intersection, union and negation
Declare relations with more than 2 arguments
  Their absence makes reasoning about change harder
Model cyclic structures
  Any structure with a cyclic path through some combination of relations (carbon rings, RNA molecules)
Arithmetic in relations
We cannot express this in OWL:
upstreamOf(x, y) ← end(x) < start(y)
In OWL we must:
explicitly name all the bases, and declare a 5’ to 3’ connection relation between them
declare < as the transitive version of the 5’ to 3’ relation
This is feasible with RNA, but not DNA
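Outside OWL the arithmetic rule is a one-line comparison. A minimal sketch, assuming regions carry integer start and end coordinates (the Region type and the example coordinates are hypothetical):

```python
from typing import NamedTuple

class Region(NamedTuple):
    name: str
    start: int  # hypothetical 1-based inclusive coordinates
    end: int

def upstream_of(x: Region, y: Region) -> bool:
    """upstreamOf(x, y) <- end(x) < start(y)"""
    return x.end < y.start

helix = Region("helix", 1, 8)
loop = Region("loop", 9, 12)
print(upstream_of(helix, loop), upstream_of(loop, helix))
```

No per-base individuals are needed, which is why this style scales to DNA-sized sequences where naming every base would not.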
Relation Boolean Constructs
We cannot express this in OWL:
overlaps = ends.after.startOf ∩ starts.before.endOf
disconnected = ¬overlaps
This severely limits OWL when applied to instance data involving intervals
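For comparison, when relations are represented as explicit sets of pairs, the intersection and negation above are ordinary set operations. The sketch below assumes strict comparisons for "after" and "before" and uses made-up interval coordinates. Both choices are assumptions, not part of the original definitions.

```python
from itertools import product

# Hypothetical intervals: name -> (start, end).
INTERVALS = {"x": (1, 10), "y": (5, 15), "z": (20, 30)}
PAIRS = set(product(INTERVALS, repeat=2))

def relation(test):
    """Extension of a binary relation as a set of (a, b) name pairs."""
    return {(a, b) for a, b in PAIRS if test(INTERVALS[a], INTERVALS[b])}

ends_after_start_of = relation(lambda a, b: a[1] > b[0])
starts_before_end_of = relation(lambda a, b: a[0] < b[1])

overlaps = ends_after_start_of & starts_before_end_of  # relation intersection
disconnected = PAIRS - overlaps                         # relation negation

print(("x", "y") in overlaps, ("x", "z") in disconnected)
```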
N-ary relations and time
In OWL, all relations must be binary. N-ary relations are useful for reasoning about change.
As the RNA molecule folds, unpaired bases become paired:
  ¬paired_with_CWW(b1, b5, t0)
  paired_with_CWW(b1, b5, t1)
  instance_of(b1, UnpairedBase, t0)
  instance_of(b1, PairedBase, t1)
There are a variety of (awkward) techniques for translating N-ary relations to binary
  ¬paired_with_CWW(b1@t0, b5@t0)
  paired_with_CWW(b1@t1, b5@t1)
Cyclic descriptions
OWL Descriptions are tree-like. Cyclic descriptions are required for RNA Structures. Proposed def of GNRATetraloop:
GNRATetraloopMotif =
  hasPart some
    (Nucleobase and fivePrimeTo some
      (G and fivePrimeTo some
        (Nucleobase and fivePrimeTo some
          (Purine and fivePrimeTo some
            (A and fivePrimeTo some
              (Nucleobase and pairsWithCWW some Nucleobase)
             and pairsWithTHS some G)))
       and pairsWithTSH some A)
     and pairsWithCWW some Nucleobase)
Tree-like classification structure
[Figure: the GNRATetraloopMotif description drawn as a tree of bases (N, G, N, R, A, N), shown alongside example base sequences.]
Labeled sub-descriptions
We would like to do something like this, if it were possible in OWL:
GNRATetraloopMotif =
  hasPart some
    (Nucleobase[1] and fivePrimeTo some
      (G[2] and fivePrimeTo some
        (Nucleobase[3] and fivePrimeTo some
          (Purine[4] and fivePrimeTo some
            (A[5] and fivePrimeTo some
              (Nucleobase[6] and pairsWithCWW some Nucleobase[1])
             and pairsWithTHS some G[2])))
       and pairsWithTSH some A[5])
     and pairsWithCWW some Nucleobase[6])
Rules
SWRL (Semantic Web Rule Language) extends OWL with rules. We can add this to the ontology:
nucleotide(?b0),
g(?b1),
nucleotide(?b2),
purine(?b3),
a(?b4),
nucleotide(?b5),
followedBy(?b0, ?b1),
followedBy(?b1, ?b2),
followedBy(?b2, ?b3),
followedBy(?b3, ?b4),
followedBy(?b4, ?b5),
pairedWithTHS(?b4, ?b1),
pairedWithCWW(?b5, ?b0)
--> partOfGNRATetraloop(?b0)
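To show what the rule body asks for, here is a hand-written matcher over instance data. It is a sketch under several assumptions: followedBy is treated as functional (one successor per base), the instance data is invented, and the function and variable names are hypothetical. A SWRL engine would evaluate the rule generically rather than with code specific to this pattern.

```python
# Hypothetical instance data for a six-base stretch b0..b5.
TYPES = {
    "b0": {"nucleotide"},
    "b1": {"nucleotide", "g"},
    "b2": {"nucleotide"},
    "b3": {"nucleotide", "purine"},
    "b4": {"nucleotide", "a"},
    "b5": {"nucleotide"},
}
FOLLOWED_BY = {"b0": "b1", "b1": "b2", "b2": "b3", "b3": "b4", "b4": "b5"}
PAIRED_THS = {("b4", "b1")}
PAIRED_CWW = {("b5", "b0")}

# Class atoms of the rule body, in order ?b0 .. ?b5.
BODY_TYPES = ["nucleotide", "g", "nucleotide", "purine", "a", "nucleotide"]

def part_of_gnra_tetraloop(types, followed_by, ths, cww):
    """Return every binding of ?b0 for which the rule body is satisfied."""
    hits = set()
    for b0 in types:
        # Bind ?b1..?b5 by walking the followedBy chain from ?b0.
        chain = [b0]
        while len(chain) < 6 and chain[-1] in followed_by:
            chain.append(followed_by[chain[-1]])
        if len(chain) < 6:
            continue
        if (all(t in types.get(b, ()) for b, t in zip(chain, BODY_TYPES))
                and (chain[4], chain[1]) in ths    # pairedWithTHS(?b4, ?b1)
                and (chain[5], chain[0]) in cww):  # pairedWithCWW(?b5, ?b0)
            hits.add(b0)
    return hits

matches = part_of_gnra_tetraloop(TYPES, FOLLOWED_BY, PAIRED_THS, PAIRED_CWW)
print(matches)
```

The two pairing atoms close the cycles that a tree-shaped OWL description cannot express: both refer back to variables already bound earlier in the chain.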
Is SWRL the answer?
Bonus: can be extended with arithmetic operators (to define upstreamOf)
Negative: only binary relations
Negative: only instance classification
  We cannot use the previous definition for ontology classification
Negative: we cannot infer the existence of undeclared entities
  We can tell a base is part of a tetraloop motif, but we can’t infer the tetraloop motif instance
Description Graphs
An extension of OWL to allow representation of cyclic structures [?].
Possibly part of OWL3?
Implemented in HermiT reasoner
Largely new and untested
OBO Graphs
Cyclic structures can be described in OBO; the graph is translated to simple rules. These rules can be executed using logic programming (LP) or even SQL.
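As a rough illustration of the SQL route, each edge of a motif graph can become one join, with the final join closing the cycle. The sketch below uses Python's built-in sqlite3 module. The table names, columns and data are invented for this example and do not reflect how any OBO tool actually stores or translates graphs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE five_prime_to  (src TEXT, dst TEXT);
    CREATE TABLE pairs_with_cww (src TEXT, dst TEXT);
""")
conn.executemany(
    "INSERT INTO five_prime_to VALUES (?, ?)",
    [("b0", "b1"), ("b1", "b2"), ("b2", "b3"), ("b3", "b4"), ("b4", "b5")],
)
conn.execute("INSERT INTO pairs_with_cww VALUES ('b5', 'b0')")

# One join per edge in the motif graph; the last join closes the cycle back to b0.
QUERY = """
    SELECT e1.src
    FROM five_prime_to e1
    JOIN five_prime_to e2 ON e2.src = e1.dst
    JOIN five_prime_to e3 ON e3.src = e2.dst
    JOIN five_prime_to e4 ON e4.src = e3.dst
    JOIN five_prime_to e5 ON e5.src = e4.dst
    JOIN pairs_with_cww p ON p.src = e5.dst AND p.dst = e1.src
"""
hairpin_starts = [row[0] for row in conn.execute(QUERY)]
print(hairpin_starts)
```

Base-type conditions (G, purine, A) would be added as further joins or WHERE clauses against a type table.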
Conclusions
There is no one single ideal subset of FOL for reasoning
  All subsets have limitations.
  DLs cannot express a lot of what we need for primary and secondary sequence structures
The RNA Ontology should employ as expressive a logic as it needs
  An incorrect formally specified definition is worse than a correct informally specified definition
  Hybrid reasoning approaches are feasible
  The basic instance classification problem is just not that hard (compared to RNA bioinformatics as a whole)
  Special purpose algorithms will probably beat general purpose reasoners
But first the RNAO must exist
  Perhaps it's too early to worry too much about reasoning
  Priority: simple term lists, basic is a hierarchy, with definitions written for humans, plus motif definitions in some compact notation