Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell, P. Reiser and R.D. King Presenter: Mark H. Rich 2/7/2003 University of Wisconsin - Madison CS 838 Learning and Modeling Biological Networks

Page 1: Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B.

Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes

by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell, P. Reiser and R.D. King

Presenter: Mark H. Rich

2/7/2003

University of Wisconsin - Madison

CS 838 Learning and Modeling Biological Networks

Discovering Gene Function

Yeast (S. cerevisiae) has 6,000 protein-encoding genes

Only 60% can be assigned function with confidence

The cell is a biochemical machine

Logic can help us discover these metabolic functions and networks

ASE-Progol

[Figure: Robot Scientist cycle linking Background Knowledge, Analysis, Learning Engine, Results, Experiment Selection and New Knowledge]

Outline

Introduction

Abduction and Active Learning

Functional Genomics

Metabolism in Logic

Experiments

Results

Logic in AI

Deduction: given facts and a sound and complete proof theory, show that other facts can be proven

Induction: given positive and negative examples of facts and background knowledge, find a hypothesis that explains the difference between the positives and the negatives
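As a minimal sketch of deduction, the Python below forward-chains over ground facts until nothing new can be derived. It assumes every rule has a single body literal sharing one variable, and it reuses the bird/beak rules from the Hypothesis Generation slide. `RULES` and `deduce` are illustrative names, not part of any ILP system.

```python
# Each rule (head, body) stands for "head(X) :- body(X)."  (single-literal bodies only)
RULES = [("hasbeak", "bird"), ("bird", "vulture")]


def deduce(facts):
    """Forward chaining: apply RULES to ground (predicate, constant) facts until a fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in RULES:
            for pred, const in list(known):  # copy, since `known` grows inside the loop
                if pred == body and (head, const) not in known:
                    known.add((head, const))
                    changed = True
    return known


print(sorted(deduce({("vulture", "tweety")})))
# [('bird', 'tweety'), ('hasbeak', 'tweety'), ('vulture', 'tweety')]
```

Induction runs the other way. Given `hasbeak(tweety)` as an example, it looks for clauses that would make a derivation like this one succeed.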

Abduction and TCIE (Theory Completion by Inverse Entailment)

Given a theory and partial facts, discover what facts are missing to form one consistent hypothesis

Lateral Thinking Puzzles: presented with a confusing situation

There is an Oracle that knows what happened

You can only ask yes or no questions

The Mysterious Package

One day a man received a parcel in the post. Carefully packed was a human arm. He examined it, repacked it and then sent it on to another man. The second man also carefully examined the arm before taking it to the woods and burying it. Why did they do this?

The Mysterious Package

Was the arm cut off intentionally?

Is the arm's owner still alive?

Is he a doctor?

Did the three men know each other?

Are the other men also missing an arm?

Were they once stranded on a desert island with no food, where they made a pact that each would cut off an arm to eat so they could survive, but were rescued before the doctor could cut off his own arm, and the doctor later fulfilled his commitment? YES!

Lateral Thinking Lessons

Certain questions are valuable and lead to large leaps of information . . .

How do we form hypotheses?

How can we pick good questions? Consider:

the probability that a question leads to consistent hypotheses

the cost of asking the question

We want to find the quickest, cheapest path to consistent hypotheses

Hypothesis Generation

Use contrapositives for inverse entailment: B ∧ H ⊨ E holds exactly when B ∧ ¬E ⊨ ¬H

Background Knowledge:
hasbeak(X) :- bird(X).
bird(X) :- vulture(X).

Example:
hasbeak(tweety).

Hypotheses:
bird(tweety).
bird(X).
vulture(tweety).
vulture(X).
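A small backward-chaining sketch reproduces the hypothesis list. Starting from the observed fact, each background rule whose head matches the current goal proposes its body as a missing fact, in both a ground form (`tweety`) and a generalized form (`X`). This is a simplification that handles only single-literal rules, not Progol's actual inverse-entailment procedure. `BACKGROUND` and `abduce` are hypothetical names.

```python
# Background knowledge as (head, body) pairs: "head(X) :- body(X)."
BACKGROUND = [("hasbeak", "bird"), ("bird", "vulture")]


def abduce(predicate, constant):
    """Backward-chain from an observed fact, collecting atoms that would explain it."""
    hypotheses = []
    frontier = [predicate]
    while frontier:
        goal = frontier.pop(0)
        for head, body in BACKGROUND:
            if head == goal:
                hypotheses.append(f"{body}({constant}).")  # ground hypothesis
                hypotheses.append(f"{body}(X).")           # generalized hypothesis
                frontier.append(body)                      # keep explaining the body
    return hypotheses


print(abduce("hasbeak", "tweety"))
# ['bird(tweety).', 'bird(X).', 'vulture(tweety).', 'vulture(X).']
```

Each ground hypothesis, added to the background knowledge, lets the example `hasbeak(tweety)` be deduced. The generalized forms explain it too, but they also commit to claims about every other individual.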

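Trial Selection Theory

Outcome (1 = true, 0 = false) that each hypothesis predicts for trials e1 to e4:

      e1  e2  e3  e4
H1     0   1   1   1
H2     1   1   0   1
H3     1   0   1   1

One possible trial path:
e1 = t: ask e2
    e2 = t: H2
    e2 = f: H3
e1 = f: H1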
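The trial path above amounts to repeatedly partitioning the surviving hypotheses by their predicted outcome. The sketch below encodes the table directly. The function names and the simulated oracle, which answers as the hidden true hypothesis predicts, are illustrative assumptions.

```python
# Predicted outcome (1 = true, 0 = false) of each trial under each hypothesis (table above).
PREDICTIONS = {
    "H1": {"e1": 0, "e2": 1, "e3": 1, "e4": 1},
    "H2": {"e1": 1, "e2": 1, "e3": 0, "e4": 1},
    "H3": {"e1": 1, "e2": 0, "e3": 1, "e4": 1},
}


def partition(hypotheses, trial):
    """Split hypotheses into (those predicting true, those predicting false) for a trial."""
    true_part = [h for h in hypotheses if PREDICTIONS[h][trial] == 1]
    false_part = [h for h in hypotheses if PREDICTIONS[h][trial] == 0]
    return true_part, false_part


def run_path(path, oracle):
    """Ask trials in order, keeping only hypotheses consistent with each answer."""
    remaining = sorted(PREDICTIONS)
    for trial in path:
        if len(remaining) <= 1:
            break  # already identified: no further trials needed
        t_part, f_part = partition(remaining, trial)
        remaining = t_part if oracle(trial) else f_part
    return remaining


def oracle_h3(trial):
    """Simulated oracle: answers as H3 predicts."""
    return PREDICTIONS["H3"][trial] == 1


print(run_path(["e1", "e2"], oracle_h3))  # ['H3']
```

Trial e4 is useless here. All three hypotheses predict it is true, so `partition(..., "e4")` leaves everything on one side.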

Hypothesis Probability

Each trial t partitions H into $\{H[t], H[\bar{t}]\}$: the hypotheses that predict t is true and those that predict it is false

Assuming an optimal encoding scheme…

Prior probability of each hypothesis

Compression is the rounded f measure

$$p(h_i \mid E) = \frac{2^{\mathrm{Compression}(h_i \mid E)}}{\sum_{h \in H} 2^{\mathrm{Compression}(h \mid E)}}$$

$$f = |E^+| - \frac{|E^+|}{p}\,(l_1 + l_2 + n)$$
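Turning compression scores into hypothesis probabilities is a weighted normalization: each hypothesis gets weight $2^{\mathrm{Compression}}$. The sketch also sums the probability mass of one side of a trial's partition. The compression values are made up for illustration, and the helper names are hypothetical.

```python
def posterior(compression):
    """p(h|E) = 2**Compression(h|E) / sum of 2**Compression(h'|E) over all h' in H."""
    weights = {h: 2 ** c for h, c in compression.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def trial_mass(probs, h_t):
    """Total probability of the hypotheses in H[t], i.e. those predicting trial t is true."""
    return sum(probs[h] for h in h_t)


probs = posterior({"H1": 3, "H2": 2, "H3": 2})  # hypothetical integer compression values
print(probs)                            # {'H1': 0.5, 'H2': 0.25, 'H3': 0.25}
print(trial_mass(probs, ["H2", "H3"]))  # 0.5
```

Because the weights are powers of two, one extra unit of compression doubles a hypothesis's relative probability.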

Experiment Cost

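$C_t$ is the cost of trial $t$

$$EC(H,T) \approx \min_{t \in T} \left[ C_t + p(t)\left(\operatorname{mean}_{t' \in (T - t)} C_{t'}\right) J_{H[t]} + \bigl(1 - p(t)\bigr)\left(\operatorname{mean}_{t' \in (T - t)} C_{t'}\right) J_{H[\bar{t}]} \right]$$

$$p(t) = \sum_{h \in H[t]} p(h)$$

$$J_H = \sum_{h \in H} -p(h) \left\lfloor \log_2 p(h) \right\rfloor$$

The cost remaining after trial t is approximated by the mean cost of the other trials times J of the hypotheses left, so J acts roughly as the expected number of further yes/no trials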
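The sketch below evaluates the one-step EC approximation for each trial in the Trial Selection table and picks the minimum. Several inputs are assumptions, not values from the paper: the trial costs are invented, p(h) is uniform, and p(h) is renormalized over each subset when computing J.

```python
import math

# Predicted outcomes from the Trial Selection Theory table (1 = true, 0 = false).
PREDICTIONS = {
    "H1": {"e1": 0, "e2": 1, "e3": 1, "e4": 1},
    "H2": {"e1": 1, "e2": 1, "e3": 0, "e4": 1},
    "H3": {"e1": 1, "e2": 0, "e3": 1, "e4": 1},
}
COST = {"e1": 3.0, "e2": 1.0, "e3": 2.0, "e4": 0.5}  # hypothetical trial costs
PRIOR = {"H1": 1 / 3, "H2": 1 / 3, "H3": 1 / 3}     # hypothetical uniform p(h)


def j_measure(hyps):
    """Sum of -p(h) * floor(log2 p(h)), with p(h) renormalized over `hyps` (assumption)."""
    total = sum(PRIOR[h] for h in hyps)
    return sum(-(PRIOR[h] / total) * math.floor(math.log2(PRIOR[h] / total)) for h in hyps)


def expected_cost(t, hyps, trials):
    """One-step EC approximation for performing trial t first."""
    h_t = [h for h in hyps if PREDICTIONS[h][t] == 1]      # H[t]
    h_not_t = [h for h in hyps if PREDICTIONS[h][t] == 0]  # H[t-bar]
    p_t = sum(PRIOR[h] for h in h_t) / sum(PRIOR[h] for h in hyps)
    others = [COST[u] for u in trials if u != t]
    mean_other = sum(others) / len(others) if others else 0.0
    return (COST[t]
            + p_t * mean_other * j_measure(h_t)
            + (1 - p_t) * mean_other * j_measure(h_not_t))


hyps = list(PREDICTIONS)
ec = {t: expected_cost(t, hyps, list(COST)) for t in COST}
best = min(ec, key=ec.get)
print(best, round(ec[best], 3))  # e2 2.222
```

The cheapest trial, e4, comes out worst (EC = 4.5). Every hypothesis agrees on its outcome, so it costs something without reducing J. Cheapness alone is not enough, which is the intuition behind comparing ASE-Progol against a naïve cheapest-first strategy.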

Functional Genomics

Want to learn the gene-enzyme mapping: genes encode enzymes, which catalyze reactions between metabolites, eventually producing amino acid products

Perform auxotrophic growth experiments to determine the phenotype

Functional Genomics: Simple

A, B and C are enzymes; X is a ubiquitous metabolite; Y and Z are optional nutrients

If we knock out gene2, we need to add nutrient Z to produce Trp

We want to learn codes(gene2, B, [Y], [Z]) but can only ask:

pheno_effect(gene2,[Y]) is false

pheno_effect(gene2,[Z]) is true

pheno_effect(gene2,[Y,Z]) is true

[Figure: linear pathway X → Y → Z → Trp; gene1, gene2 and gene3 encode enzymes A, B and C, which catalyze the successive steps]
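To see why Z rescues the gene2 knockout, the toy pathway can be simulated as reachability over reactions. A strain grows if Trp can be produced from the ubiquitous metabolite plus the medium. The dictionary encoding and the assignment of enzymes A and C to the first and last steps are read off the figure and are assumptions, as are the function names.

```python
# Hypothetical encoding of the toy pathway X -> Y -> Z -> Trp.
# gene -> (enzyme, reactants, products)
PATHWAY = {
    "gene1": ("A", {"X"}, {"Y"}),
    "gene2": ("B", {"Y"}, {"Z"}),
    "gene3": ("C", {"Z"}, {"Trp"}),
}
UBIQUITOUS = {"X"}


def reachable(start, knocked_out=None):
    """Metabolites producible from `start` when the gene `knocked_out` is removed."""
    produced = set(start)
    changed = True
    while changed:
        changed = False
        for gene, (_enzyme, reactants, products) in PATHWAY.items():
            if gene != knocked_out and reactants <= produced and not products <= produced:
                produced |= products
                changed = True
    return produced


def grows(knocked_out, medium):
    """True if the strain lacking `knocked_out` can still make Trp on `medium`."""
    return "Trp" in reachable(UBIQUITOUS | set(medium), knocked_out)


for medium in (["Y"], ["Z"], ["Y", "Z"]):
    print(medium, grows("gene2", medium))
# ['Y'] False
# ['Z'] True
# ['Y', 'Z'] True
```

For the three media the growth outcomes come out false, true, true. This is the same pattern as the three answers listed above, and it is what pins gene2 to the Y → Z step.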

Aromatic amino acid pathway

[Figure: aromatic amino acid pathway, labeling the aromatic amino acids, enzymes and metabolites]

Metabolism in Logic

Hypotheses:

codes('YDR254W', '4.2.1.11', ['C00631'], ['C00074']).

codes('YDR254W', '5.3.1.24', ['C04302'], ['C01302']).

etc ...

Background Knowledge:

enzyme('4.2.1.11', ['C00631'], ['C00074']).

enzyme('5.3.1.24', ['C04302'], ['C01302']).

etc ...

generated_by_other_pathways(['C00002', 'C00005', 'C00006', ... , 'C03356']).

ends(['C00078', 'C00079', 'C00082']).

Metabolism in Logic

What the Oracle answers:

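phenotypic_effect(ORF, Growth_medium) :-
    % Starting metabolites: the ubiquitous metabolites plus the growth medium
    generated_by_other_pathways(Ubiquitous_metabolites),
    union(Ubiquitous_metabolites, Growth_medium, Starts),
    % Products reachable in the wild type
    connected(Starts, Wild_products),
    ends(Ends),
    subset(Wild_products, Ends),
    % An enzyme step that this ORF codes for (same predicates as the facts above)
    enzyme(Enzyme, Reactants, Products),
    codes(ORF, Enzyme, Reactants, Products),
    % Products reachable in the mutant, i.e. without that step
    connected_without_this_step(Starts, Mutant_products,
                                Enzyme, Reactants, Products),
    not(subset(Mutant_products, Ends)).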

Experiments: learn the function of 17 genes by removing (knocking out) each ORF

Growth media: 13 optional nutrients, at most 3 at a time, giving 378 possible experiments for each ORF

Cost of optional nutrients: determined from the www.sigmaaldrich.com catalog

Strategies for comparison: Random, Naïve Cheapest, ASE-Progol
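The figure of 378 media per ORF follows if the unsupplemented medium (no optional nutrients) is counted too: 1 + 13 + 78 + 286 = 378. This quick check enumerates the combinations; the constant names are illustrative.

```python
from itertools import combinations
from math import comb

N_OPTIONAL = 13  # optional nutrients
MAX_ADDED = 3    # at most 3 added to one medium

# Enumerate every medium with 0, 1, 2 or 3 optional nutrients (0 = no supplement).
media = [m for k in range(MAX_ADDED + 1) for m in combinations(range(N_OPTIONAL), k)]

print(len(media))                                           # 378
print([comb(N_OPTIONAL, k) for k in range(MAX_ADDED + 1)])  # [1, 13, 78, 286]
```

Leaving out the empty medium would give 377. The count therefore implies that "no supplement" is one of the candidate experiments.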

Experiments

Remove all codes(…) facts

Loop:

Generate a random sample of trials

Generate hypotheses using Theory Completion by Inverse Entailment

Find the trial with minimum EC(H,T) and perform it

Add the results to the known examples

until the hypotheses are consistent with the trials
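The following is a compact sketch of this loop, run against the H1 to H3 table from the Trial Selection Theory slide rather than real growth experiments. It fixes the hypothesis set in advance instead of regenerating hypotheses by TCIE each round. The other assumptions, all labeled in the code, are:

* trial costs are invented and p(h) is uniform;
* a hypothesis is dropped once it contradicts an observed result;
* trials on which all remaining hypotheses agree are skipped;
* the loop stops when at most one hypothesis is left.

The EC computation follows the one-step approximation on the Experiment Cost slide.

```python
import math
import random

# Outcome (1 = true, 0 = false) each hypothesis predicts per trial (Trial Selection table).
PREDICTIONS = {
    "H1": {"e1": 0, "e2": 1, "e3": 1, "e4": 1},
    "H2": {"e1": 1, "e2": 1, "e3": 0, "e4": 1},
    "H3": {"e1": 1, "e2": 0, "e3": 1, "e4": 1},
}
COST = {"e1": 3.0, "e2": 1.0, "e3": 2.0, "e4": 0.5}      # hypothetical trial costs
PRIOR = {h: 1 / len(PREDICTIONS) for h in PREDICTIONS}  # hypothetical uniform p(h)


def j_measure(hyps):
    """Sum of -p(h) * floor(log2 p(h)), with p(h) renormalized over `hyps` (assumption)."""
    total = sum(PRIOR[h] for h in hyps)
    return sum(-(PRIOR[h] / total) * math.floor(math.log2(PRIOR[h] / total)) for h in hyps)


def expected_cost(t, hyps, trials):
    """One-step EC approximation for trial t, given the sampled trial set."""
    h_t = [h for h in hyps if PREDICTIONS[h][t] == 1]
    h_not_t = [h for h in hyps if PREDICTIONS[h][t] == 0]
    p_t = sum(PRIOR[h] for h in h_t) / sum(PRIOR[h] for h in hyps)
    others = [COST[u] for u in trials if u != t]
    mean_other = sum(others) / len(others) if others else 0.0
    return COST[t] + mean_other * (p_t * j_measure(h_t) + (1 - p_t) * j_measure(h_not_t))


def consistent(examples):
    """Hypotheses that agree with every observed (trial -> outcome) result."""
    return [h for h in PREDICTIONS
            if all(PREDICTIONS[h][t] == o for t, o in examples.items())]


def learn(oracle, sample_size=3, seed=0, max_rounds=20):
    """Active-learning loop; `oracle(trial)` returns the observed outcome (1 or 0)."""
    rng = random.Random(seed)
    examples = {}
    for _ in range(max_rounds):
        hyps = consistent(examples)
        if len(hyps) <= 1:  # assumption: stop once the hypothesis is unique
            break
        pool = [t for t in COST if t not in examples]
        sample = rng.sample(pool, min(sample_size, len(pool)))  # random sample of trials
        # Assumption: ignore trials whose outcome every remaining hypothesis agrees on.
        informative = [t for t in sample if len({PREDICTIONS[h][t] for h in hyps}) > 1]
        if not informative:
            continue
        best = min(informative, key=lambda t: expected_cost(t, hyps, sample))
        examples[best] = oracle(best)  # "perform" the trial and record the result
    return consistent(examples), examples


truth = "H3"
remaining, performed = learn(lambda t: PREDICTIONS[truth][t])
print(remaining, performed)
```

Without the informative-trial filter, the one-step approximation can occasionally prefer a cheap trial that cannot discriminate between the remaining hypotheses. The filter is a pragmatic guard in this sketch, not part of the slide's procedure.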

Results: Cost

Results: Time

Conclusions and Future Work

ASE-Progol finds hypotheses inexpensively and quickly

5 of 17 genes had only negative examples… why? Look into inhibitors and nonmonotonic logics.

Answers are limited to yes/no. Probabilities?

Can this be applied to gene regulatory networks, using microarray technology?

What other networks have similar frameworks?