Reasoning with RNA


Transcript of Reasoning with RNA

Page 1: Reasoning with RNA

Reasoning with the RNA Ontology

Chris Mungall

Lawrence Berkeley National Laboratory

Reasoning with the RNA Ontology – p.1/28

Page 2: Reasoning with RNA

What is a reasoner?

A reasoner implements a generalized decision procedure: it takes a collection of logical axioms, finds the entailments of those axioms, and determines whether or not the axioms are satisfiable

An ontology can be considered as a collection of axioms (in contrast to a terminology)
    1. Relationships: is a (SubClass), partOf, ...
    2. Definitions
    3. Constraints

We can also treat data as collections of axioms


Page 6: Reasoning with RNA

Examples of Ontology Axioms

GNRATetraloop is a Tetraloop
Tetraloop is a RNAStructure

A translation to first-order predicate logic:

GNRATetraloop(x) → Tetraloop(x)
Tetraloop(x) → RNAStructure(x)

Set theoretic: GNRATetraloop ⊆ Tetraloop ⊆ RNAStructure

Entailment: GNRATetraloop is a RNAStructure
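The entailment on this slide follows from transitivity of is a alone. A minimal Python sketch, assuming the axioms are stored as (subclass, superclass) pairs; the function name is illustrative, and a real reasoner handles far more than transitivity:

```python
# Stated is_a (SubClass) axioms from the slide, as (subclass, superclass) pairs.
STATED = {
    ("GNRATetraloop", "Tetraloop"),
    ("Tetraloop", "RNAStructure"),
}

def entailed_subclass_axioms(stated):
    """Return the transitive closure of the stated is_a pairs."""
    closure = set(stated)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Axioms that are entailed but were never stated.
inferred = entailed_subclass_axioms(STATED) - STATED
print(sorted(inferred))  # [('GNRATetraloop', 'RNAStructure')]
```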


Page 7: Reasoning with RNA

The reasoning square

Classifying, ontology: inference of unstated relationships in the ontology
Classifying, data: inference of unstated facts in data
Validation, ontology: finding inconsistent axioms in the ontology
Validation, data: determining if a dataset is valid

Page 8: Reasoning with RNA

The reasoning square

[Figure: the reasoning square (Data and Ontology against Classifying and Validation) filled in with examples. Labels in the figure: Tetraloop, GNRA Tetraloop, T. therm 23S RNA region, Purine and Pyrimidine (disjoint), hairpin sequence diagrams, and X marks.]

Page 9: Reasoning with RNA

Ontology Languages

First Order Logic (Common Logic ISO standard)
    Highly expressive
    Undecidable: no general decision procedure exists

OWL and Description Logics
    Restricted subset of FOL with highly convenient constructs for describing classes
    Reasoners are heavily tested on existing ontologies

OBO
    Initially an ad-hoc format for the Gene Ontology
    Now an alternate syntax for Common Logic
    Reasoners based on rule application

Page 10: Reasoning with RNA

Common Logic

Common Logic is an ISO specification for First Order Logic (FOL)

Syntaxes
    CLIF - Lisp-like (derived from KIF)
    XCL - XML
    CG - Conceptual Graphs

A CL text consists of CL sentences (axioms)

Sentences can be atomic, boolean or logically quantified
    Atomic sentence: a predicate followed by zero or more arguments
    Boolean sentence: and, or, if (→), iff (↔)
    Quantified sentence: forall (∀), exists (∃)

Page 11: Reasoning with RNA

Common Logic Examples

Each example is given in textbook syntax, followed by its CLIF equivalent.

Textbook: ∀x : GNRATetraloop(x) → Tetraloop(x)

CLIF:
(forall (x)
  (if (GNRATetraloop x)
      (Tetraloop x)))

Textbook: ∀x : Purine(x) → ¬Pyrimidine(x)

CLIF:
(forall (x)
  (if (Purine x)
      (not (Pyrimidine x))))

Textbook: ∀x : Intron(x) → ∃y (Exon(y) ∧ adjacent_to(x, y))

CLIF:
(forall (x)
  (if (Intron x)
      (exists (y)
        (and (Exon y)
             (adjacent_to x y)))))

Page 12: Reasoning with RNA

Reasoning with FOL

Undecidable.
    FOL theorem provers are not guaranteed to terminate

The Horn logic subset has desirable computational properties

    Head ← Body

    Logic Programming
    SWRL
    Datalog
    Relational Model, Relational Algebra
    Non-monotonic and probabilistic extensions
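Rules of the Head ← Body form can be evaluated by naive forward chaining, which always terminates when there are no function symbols. A minimal Python sketch, assuming facts are tuples of a predicate name followed by constants and that variables are strings starting with "?"; the encoding, the function names and the instance `loop1` are illustrative assumptions, not any particular engine's API:

```python
def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`; return None if impossible."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):          # variable
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:               # constant mismatch
            return None
    return extended

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new fact is derived."""
    facts = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            bindings = [{}]
            for atom in body:
                next_bindings = []
                for binding in bindings:
                    for fact in facts:
                        extended = match(atom, fact, binding)
                        if extended is not None:
                            next_bindings.append(extended)
                bindings = next_bindings
            for binding in bindings:
                derived.add(tuple(binding.get(term, term) for term in head))
        if derived <= facts:
            return facts
        facts |= derived

# Each rule is (head, body): Tetraloop(?x) <- GNRATetraloop(?x), and so on.
rules = [
    (("Tetraloop", "?x"), [("GNRATetraloop", "?x")]),
    (("RNAStructure", "?x"), [("Tetraloop", "?x")]),
]
facts = forward_chain({("GNRATetraloop", "loop1")}, rules)
```

The result contains `("RNAStructure", "loop1")` even though only the GNRATetraloop fact was stated.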


Page 13: Reasoning with RNA

OWL-DL

OWL belongs to a family of logics known as Description Logics, circumscribed subsets of FOL that are guaranteed to be decidable

Variety of notations (syntaxes):
    RDF-XML - Default, but it’s a mess
    OWL-XML - Easier to manipulate computationally
    Manchester Syntax - Easy on the eye

Constructs
    Property (relation) unary predicates: Functional, Transitive, Symmetric, ...
    Class Axioms: SubClass, EquivalentClass, DisjointWith, ...
    Descriptions

OWL2 has lots of tools and reasoners to choose from

Page 14: Reasoning with RNA

Descriptions in OWL

A Description is a (possibly recursive) tree structure that formally identifies membership criteria for a class.

Can be combined using logical connectives: AND, OR, NOT
    AND : intersectionOf
    OR : unionOf
    NOT : complementOf

Restrictions
    Restrict class membership based on some property
    ONLY : example (paired_with_CWW ONLY Guanine)
    SOME :
    Quantified cardinality restrictions

Example: CWWAGBasePair = hasPart only (A and pairsWithCWW some G)
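The connectives and restrictions can be read as set operations over a fixed, finite set of individuals. A minimal Python sketch under that assumption; the individuals and property assertions are hypothetical, and this closed evaluation is not what an OWL reasoner does (a reasoner works under the open world assumption):

```python
# A tiny finite interpretation (hypothetical instance data).
domain = {"a1", "g1", "u1", "bp1", "bp2"}
classes = {"A": {"a1"}, "G": {"g1"}}
properties = {
    "pairsWithCWW": {("a1", "g1"), ("g1", "a1")},
    "hasPart": {("bp1", "a1"), ("bp2", "a1"), ("bp2", "u1")},
}

def AND(c, d):
    return c & d            # intersectionOf

def OR(c, d):
    return c | d            # unionOf

def NOT(c):
    return domain - c       # complementOf

def some(prop, filler):
    """Individuals with at least one `prop` edge into `filler`."""
    return {s for (s, o) in properties[prop] if o in filler}

def only(prop, filler):
    """Individuals all of whose `prop` edges lead into `filler`."""
    return {x for x in domain
            if all(o in filler for (s, o) in properties[prop] if s == x)}

# CWWAGBasePair = hasPart only (A and pairsWithCWW some G)
filler = AND(classes["A"], some("pairsWithCWW", classes["G"]))
cww_ag_base_pair = only("hasPart", filler)
```

Here `bp1` qualifies and `bp2` does not, because `bp2` also has the part `u1`. Individuals with no `hasPart` edge at all also satisfy the `only` restriction vacuously, which is a common reason to combine `only` with `some`.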

Page 17: Reasoning with RNA

OWL Reasoners

Decision procedure based on tableau calculus
    Refutation-based, repeated applications of De Morgan’s laws

Widely used and tested on ontologies
    Many reasoners can now classify the larger biological ontologies in acceptable time

Less widely used on data
    RDF triplestores are commonly used but these lack key OWL constructs.
    OWLGRES is a promising technology here.

Page 18: Reasoning with RNA

No Unique Name Assumption

Classes and instances are potentially equivalent unless declared otherwise. Given the ontology axiom:

Functional(fivePrimeTo)

And the instance axioms:

A(b1)
A(b2)
A(b3)
b1 fivePrimeTo b2
b1 fivePrimeTo b3

A reasoner will not say this is inconsistent. It will infer that b2=b3. To get a reasoner to detect the inconsistency we must explicitly declare all base instances to be distinct:

b1 differentFrom b2
b1 differentFrom b3
b2 differentFrom b3
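This behaviour can be mimicked with a small union-find: a functional property forces the objects of one subject to be merged, and an inconsistency only appears once differentFrom statements forbid the merge. A minimal Python sketch, assuming edges and differentFrom statements are given as pairs of names; the function and its return shape are illustrative, not a reasoner API:

```python
def check(functional_edges, different_from):
    """Merge objects forced equal by a functional property, then test distinctness.

    Returns (consistent, same) where same(a, b) reports inferred equality.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    changed = True
    while changed:                      # repeat until no further merges occur
        changed = False
        target = {}                     # canonical subject -> an object seen for it
        for subj, obj in functional_edges:
            s, o = find(subj), find(obj)
            if s in target and find(target[s]) != o:
                parent[o] = find(target[s])   # infer the two objects are the same
                changed = True
            else:
                target.setdefault(s, o)

    consistent = all(find(a) != find(b) for a, b in different_from)
    return consistent, lambda a, b: find(a) == find(b)

edges = [("b1", "b2"), ("b1", "b3")]    # b1 fivePrimeTo b2, b1 fivePrimeTo b3

# Without distinctness axioms: consistent, and b2 = b3 is inferred.
ok_without, same = check(edges, [])

# With all bases declared distinct: the merge is forbidden, so inconsistent.
all_distinct = [("b1", "b2"), ("b1", "b3"), ("b2", "b3")]
ok_with, _ = check(edges, all_distinct)
```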

Page 19: Reasoning with RNA

The Open World Assumption

Unstated facts are not assumed to be false. Given the ontology axioms:

A SubClassOf Base
UnpairedBase equivalentTo (Base that pairedWith exactly 0 Base)

And the instance axioms:

A(b1)
A(b2)
A(b3)
b1 fivePrimeTo b2
b2 fivePrimeTo b3

A reasoner will not infer b1, b2 or b3 to be UnpairedBases. We need to explicitly declare this:

UnpairedBase(b1)
UnpairedBase(b2)
UnpairedBase(b3)
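The difference from a database-style query can be sketched with a three-valued answer. A minimal Python illustration, assuming facts are held in plain sets; the function names and the `Answer` type are hypothetical and only contrast the two assumptions, they do not reproduce OWL semantics in full:

```python
from enum import Enum

class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"

paired_with = set()          # no pairedWith assertions were made
declared_unpaired = set()    # no UnpairedBase(b) assertions yet

def is_unpaired_closed_world(base):
    """Closed world: the absence of a pairedWith fact counts as 'not paired'."""
    has_pair = any(base in pair for pair in paired_with)
    return Answer.FALSE if has_pair else Answer.TRUE

def is_unpaired_open_world(base):
    """Open world: only stated facts give a definite answer."""
    if base in declared_unpaired:
        return Answer.TRUE
    if any(base in pair for pair in paired_with):
        return Answer.FALSE
    return Answer.UNKNOWN

before = is_unpaired_open_world("b1")      # Answer.UNKNOWN
declared_unpaired.add("b1")                # the explicit UnpairedBase(b1) axiom
after = is_unpaired_open_world("b1")       # Answer.TRUE
```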

Page 20: Reasoning with RNA

OBO

Initially an ad-hoc format for the Gene Ontology
    Graph-centric
    Terminological features

Formal Semantics
    Initially lacked formal semantics. Formal definition written in natural language in Relations Ontology.
    Translation to OWL-DL (Horrocks et al)

With OBO 1.3, every OBO document is a Common Logic Text
    OBO-Core consists only of atomic sentences
    OBO-CL allows arbitrary logical formulae
    OBO-H: OBO-Core plus Horn rules

Page 21: Reasoning with RNA

Reasoning over OBO ontologies

Strategies

Convert to OWL and use an OWL reasoner

Convert to CL and use a FOL theorem prover

Use a rule-based reasoner
    Java implementation: OBO-Edit
    Prolog implementation: easy to extend
    SQL implementation: slow but scales over massive ontologies and datasets
    Limitations: limited support for negation

Page 22: Reasoning with RNA

Are Description Logics enough?

Some things that cannot be done in OWL-2:

Define relations using arithmetic:

Define relations using intersection, union and negation

Declare relations with > 2 arguments
    Makes reasoning about change harder

Model cyclic structures
    Any structure with a cyclic path through some combination of relations (carbon rings, RNA molecules)

Page 23: Reasoning with RNA

Arithmetic in relations

We cannot express this in OWL:

upstreamOf(x, y) ← end(x) < start(y)

In OWL we must:

explicitly name all the bases, and declare a 5’ to 3’ connection relation between them

declare < as the transitive version of the 5’ to 3’ relation

This is feasible with RNA, but not DNA
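For comparison, the arithmetic rule is a one-line comparison once coordinates are available. A minimal Python sketch, assuming each region carries integer start and end coordinates; the `Region` type and the example coordinates are hypothetical:

```python
from typing import NamedTuple

class Region(NamedTuple):
    name: str
    start: int   # hypothetical coordinate of the first base
    end: int     # hypothetical coordinate of the last base

def upstream_of(x: Region, y: Region) -> bool:
    """upstreamOf(x, y) <- end(x) < start(y)"""
    return x.end < y.start

stem = Region("stem", 1, 4)
loop = Region("loop", 5, 8)
```

With these values `upstream_of(stem, loop)` holds and `upstream_of(loop, stem)` does not, with no need to name the individual bases in between.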


Page 24: Reasoning with RNA

Relation Boolean Constructs

We cannot express this in OWL:

overlaps = ends_after_start_of ∩ starts_before_end_of

disconnected = ¬overlaps

This severely limits OWL when applied to instance data involving intervals
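Over a finite set of intervals, these relation-level boolean constructs are ordinary set operations on pairs. A minimal Python sketch, assuming each region is a (start, end) tuple and that the comparisons are strict; the region names and coordinates are hypothetical:

```python
from itertools import product

# Hypothetical regions as (start, end) coordinates.
regions = {"r1": (1, 10), "r2": (5, 20), "r3": (30, 40)}
pairs = set(product(regions, repeat=2))

# Each relation is the set of (x, y) pairs for which it holds.
ends_after_start_of = {(x, y) for x, y in pairs
                       if regions[x][1] > regions[y][0]}
starts_before_end_of = {(x, y) for x, y in pairs
                        if regions[x][0] < regions[y][1]}

overlaps = ends_after_start_of & starts_before_end_of   # intersection of relations
disconnected = pairs - overlaps                          # negation of a relation
```

Here `("r1", "r2")` is in `overlaps`, while `("r1", "r3")` and `("r3", "r1")` are in `disconnected`.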


Page 25: Reasoning with RNA

N-ary relations and time

In OWL, all relations must be binary. N-ary relations are useful for reasoning about change.

As the RNA molecule folds, unpaired bases become paired:

¬paired_with_CWW(b1, b5, t0)
paired_with_CWW(b1, b5, t1)
instance_of(b1, UnpairedBase, t0)
instance_of(b1, PairedBase, t1)

There are a variety of (awkward) techniques for translating N-ary relations to binary:

¬paired_with_CWW(b1@t0, b5@t0)
paired_with_CWW(b1@t1, b5@t1)
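The time-slice translation can be sketched as a rewrite of each time-indexed fact into binary facts. A minimal Python illustration, assuming facts are (predicate, x, y, t) tuples; the `timeSliceOf` relation that links a slice back to its base is a hypothetical addition, and negated facts are not handled:

```python
def to_time_slices(facts):
    """Rewrite each (predicate, x, y, t) fact as binary facts over time slices."""
    binary = set()
    for predicate, x, y, t in facts:
        x_slice, y_slice = f"{x}@{t}", f"{y}@{t}"
        binary.add((predicate, x_slice, y_slice))
        # `timeSliceOf` is a hypothetical linking relation, not from the slides.
        binary.add(("timeSliceOf", x_slice, x))
        binary.add(("timeSliceOf", y_slice, y))
    return binary

binary_facts = to_time_slices({("paired_with_CWW", "b1", "b5", "t1")})
```

The extra linking facts hint at why the technique is awkward: every base now needs one individual per time point.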

Page 26: Reasoning with RNA

Cyclic descriptions

OWL Descriptions are tree-like. Cyclic descriptions are required for RNA Structures. Proposed def of GNRATetraloop:

GNRATetraloopMotif =
  hasPart some
    (Nucleobase and
     fivePrimeTo some
       (G and fivePrimeTo some
         (Nucleobase and fivePrimeTo some
           (Purine and fivePrimeTo some
             (A and fivePrimeTo some
               (Nucleobase and pairsWithCWW some Nucleobase)
              and pairsWithTHS some G)))
        and pairsWithTSH some A)
     and pairsWithCWW some Nucleobase)

Page 29: Reasoning with RNA

Tree-like classification structure

[Figure: the GNRATetraloopMotif definition from the previous slide shown alongside nucleotide diagrams (bases labelled G, N, R, A) of the tetraloop and of further structures with bases labelled A, G and C.]

Page 30: Reasoning with RNA

Labeled sub-descriptions

We would like to do something like this, if it were possible in OWL:

GNRATetraloopMotif =
  hasPart some
    (Nucleobase[1] and fivePrimeTo some
      (G[2] and fivePrimeTo some
        (Nucleobase[3] and fivePrimeTo some
          (Nucleobase[4] and fivePrimeTo some
            (A[5] and fivePrimeTo some
              (Nucleobase[6] and pairsWithCWW some
             and pairsWithTHS some G[2])))
       and pairsWithTSH some A[5])
     and pairsWithCWW some Nucleobase[6])

Page 31: Reasoning with RNA

Rules

SWRL (Semantic Web Rule Language) extends OWL with rules. We can add this to the ontology:

nucleotide(?b0),
g(?b1),
nucleotide(?b2),
purine(?b3),
a(?b4),
nucleotide(?b5),
followedBy(?b0, ?b1),
followedBy(?b1, ?b2),
followedBy(?b2, ?b3),
followedBy(?b3, ?b4),
followedBy(?b4, ?b5),
pairedWithTHS(?b4, ?b1),
pairedWithCWW(?b5, ?b0)
--> partOfGNRATetraloop(?b0)
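The body of the SWRL rule can be checked directly against instance data. A minimal Python sketch, assuming the bases form a linear chain so that followedBy(i, i+1) is implied by position in a string, and that base pairs are given as sets of index pairs; the function name and input format are hypothetical:

```python
PURINES = {"A", "G"}

def gnra_tetraloop_starts(sequence, paired_ths, paired_cww):
    """Return every index b0 that satisfies the body of the rule.

    sequence: string of bases; consecutive positions stand for followedBy.
    paired_ths, paired_cww: sets of (i, j) index pairs.
    """
    hits = []
    for b0 in range(len(sequence) - 5):
        b1, b2, b3, b4, b5 = (b0 + k for k in range(1, 6))
        if (sequence[b1] == "G"                 # g(?b1)
                and sequence[b3] in PURINES     # purine(?b3)
                and sequence[b4] == "A"         # a(?b4)
                and (b4, b1) in paired_ths      # pairedWithTHS(?b4, ?b1)
                and (b5, b0) in paired_cww):    # pairedWithCWW(?b5, ?b0)
            hits.append(b0)
    return hits

# Hypothetical hairpin: C-G closing pair around a GAAA loop.
hits = gnra_tetraloop_starts("CGAAAG", {(4, 1)}, {(5, 0)})
misses = gnra_tetraloop_starts("CGAUAG", {(4, 1)}, {(5, 0)})
```

The first call returns `[0]`; the second returns `[]` because position 3 is not a purine. As with the SWRL rule, this classifies a base as part of a tetraloop but does not create an instance for the tetraloop itself.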

Page 32: Reasoning with RNA

Is SWRL the answer?

Bonus: can be extended with arithmetic operators (to define upstreamOf)

Negative: only binary relations

Negative: only instance classification
    We cannot use the previous definition for ontology classification

Negative: we cannot infer the existence of undeclared entities
    We can tell a base is part of a tetraloop motif, but we can’t infer the tetraloop motif instance

Page 33: Reasoning with RNA

Description Graphs

An extension of OWL to allow representation of cyclic structures.

Possibly part of OWL3?

Implemented in HermiT reasoner

Largely new and untested


Page 34: Reasoning with RNA

OBO Graphs

Cyclic structures can be described in OBO; the graph is translated to simple rules. These rules can be executed using LP or even SQL.
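As an illustration of the SQL route, a cyclic pattern becomes a self-join over a table of links. A minimal sketch using Python's standard `sqlite3` module; the `link` table, the three-base cycle and the rule it encodes are hypothetical and are not the actual OBO translation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (rel TEXT, subj TEXT, obj TEXT)")
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    ("followedBy", "b0", "b1"),
    ("followedBy", "b1", "b2"),
    ("pairedWithCWW", "b2", "b0"),
])

# Rule: closesLoop(x) <- followedBy(x, y), followedBy(y, z), pairedWithCWW(z, x)
# The last join condition closes the cycle back to the starting base.
rows = conn.execute("""
    SELECT l1.subj
    FROM link AS l1
    JOIN link AS l2 ON l2.subj = l1.obj
    JOIN link AS l3 ON l3.subj = l2.obj AND l3.obj = l1.subj
    WHERE l1.rel = 'followedBy'
      AND l2.rel = 'followedBy'
      AND l3.rel = 'pairedWithCWW'
""").fetchall()
```

Because the join can equate the first and last variables, the cycle that a tree-like OWL description cannot express is straightforward here.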


Page 36: Reasoning with RNA

Conclusions

There is no one single ideal subset of FOL for reasoning
    All subsets have limitations.
    DLs cannot express a lot of what we need for primary and secondary sequence structures

The RNA Ontology should employ as expressive a logic as it needs
    An incorrect formally specified definition is worse than a correct informally specified definition
    Hybrid reasoning approaches are feasible
    The basic instance classification problem is just not that hard (compared to RNA bioinformatics as a whole)
    Special purpose algorithms will probably beat general purpose reasoners

But first the RNAO must exist
    Perhaps it’s too early to worry too much about reasoning
    Priority: simple term lists, basic isa hierarchy, with definitions written for humans, plus motif definitions in some compact notation