Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of...

Post on 16-Jan-2016

224 views 0 download

Tags:

Transcript of Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of...

Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes

by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell, P. Reiser and R.D. King

Presenter: Mark H. Rich

2/7/2003

University of Wisconsin - Madison

CS 838 Learning and Modeling Biological Networks

Discovering Gene Function

Yeast (S. cerevisiae) has 6,000 protein-encoding genes

Only 60% can be assigned function with confidence

The cell is a bio-chemical machine

Logic can help us discover these metabolic functions and networks

ASE-Progol

Robot Scientist

BackgroundKnowledge

AnalysisLearningEngine

ResultsExperimentSelection

NewKnowledge

Outline

Introduction

Abduction and Active Learning

Functional Genomics

Metabolism in Logic

Experiments

Results

Logic in AI

DeductionGiven facts with sound and complete proof theory, show that other facts can be proven

InductionGiven positive and negative examples of facts and background knowledge, find hypothesis that explains difference between positives and negatives

Abduction and TCIE

Given a theory and partial facts, discover what facts are missing to form one consistent hypothesis

Lateral Thinking PuzzlesPresented with a confusing situation

There is an Oracle that knows what happened

You can only ask yes or no questions

The Mysterious Package

One day a man received a parcel in the post. Carefully packed was a human arm. He examined it, repacked it and then sent it on to another man. The second man also carefully examined the arm before taking it to the woods and burying it. Why did they do this?

The Mysterious PackageWas the arm cut off intentionally?Is the arm’s person still alive?Is he a doctor?Did the three men know each other?Are the other men also missing an arm?Were they ever stuck on a desert island with no food, make a pact to each cut off an arm to eat and survive, but were rescued before the doctor could cut off his own arm, and the doctor later fulfilled his commitment? YES!

Lateral Thinking Lessons

Certain questions are valuable and lead to large leaps of information . . .

How do we form hypotheses?

How can we pick good questions?probability that question leads to consistent hypotheses

cost of asking question

We want to find quickest cheapest path to consistent hypotheses

Hypothesis Generation

Use contra-positives for inverse entailment

Background Knowledgehasbeak(X) :- bird(X).bird(X) :- vulture(X).

Examplehasbeak(tweety).

Hypothesesbird(tweety).bird(X).vulture(tweety).vulture(X).

Trial Selection Theorye1 e2 e3 e4

H1 0 1 1 1

H2 1 1 0 1

H3 1 0 1 1

e1

e2H1

H2 H3

One possible trial path

t f

t f

Hypothesis Probability

Each trial partitions H into {H[t],H[t’]}

Assuming optimal encoding scheme…

Prior probability of each hypothesis

Compression is rounded f measure

p(hi | E) =2Compression(hi |E )

2Compression(h |E )

h∈H

f = E + −E +

p(l1 + l2 + n)

Experiment Cost

Ct is the cost of a trial t

EC(H,T) ≈ mint∈T

Ct + p(t)(meant '∈(T −t )Ct ')JH t[ ]

+(1− p(t))(meant '∈(T −t )Ct ' )JH t [ ]

⎣ ⎢ ⎢

⎦ ⎥ ⎥

p(t) = p(h)h∈H t[ ]

JH = − p(h) log2(p(h))⎣ ⎦h∈H

Functional Genomics

Want to learn gene-enzyme mappingGenes encode for

Enzymes that catalyze reactions between

Metabolites to eventually create

Amino Acid Products

Perform auxotrophic growth experiments to determine phenotype

Functional Genomics: Simple

A, B and C are EnzymesX is ubiquitous metabolite, Y and Z optionalIf we knock out gene2, we need to add nutrient Z to produce Trpwant to learn codes(gene2, B, [Y], [Z]) but only ask:

pheno_effect(gene2,[Y]) is false

pheno_effect(gene2,[Z]) is true

pheno_effect(gene2,[Y,Z]) is true

X Y Z Trp

gene1 gene2 gene3

A B C

Aromatic amino acid pathway

aromatic amino acidsenzymesmetabolites

Metabolism in Logic

Hypotheses:codes(‘YDR254W’, ‘4.2.1.11’, [‘C00631’],[‘C00074’]).

codes(‘YDR254W’, ‘5.3.1.24’, [‘C04302’],[‘C01302’]).

etc ...

Background Knowledge:enzyme(‘4.2.1.11’,[‘C00631’],[‘C00074’]).

enzyme(‘5.3.1.24’,[‘C04302’],[‘C01302’]).

etc ...

generated_by_other_pathways([‘C00002’, ‘C00005’, ‘C00006’, ... , ‘C03356’]).

ends([‘C00078’, ‘C00079’, ‘C00082’]).

Metabolism in Logic

What the Oracle answers:

phenotypic_effect(ORF, Growth_medium):-

generated_by_other_pathways(Ubiquitous_metabolites),

union(Ubiquitous_metabolites, Growth_medium, Starts),

connected(Starts, Wild_products),

ends(Ends),

subset(Wild_products, Ends),

enz(Enzyme, Reactants, Products),

encodes(ORF, Enzyme, Reactants, Products),

connected_without_this_step(Starts, Mutant_products,

Enzyme, Reactants, Products),

not(subset(Mutant_products, Ends)).

ExperimentsLearn function of 17 genes by removing ORFGrowth Media

13 optional nutrients, at most 3 at a time378 possible experiments for each ORF

Cost of Optional NutrientsDetermined from www.sigmaaldrich.com catalog

Strategies for ComparisonRandomNaïve CheapestASE-Progol

Experiments

Remove all codes(…) factsLoop

Generate random sample of trialsGenerate hypotheses using Theory Completion by Inverse EntailmentFind minimum EC(H,T) trial and performAdd results to known examples

until hypotheses consistent with trials

Results:Cost

Results: Time

Conclusions and Future WorkASE-Progol finds hypotheses inexpensively and quickly5 of 17 genes had only negative examples… why? Look into inhibitors and nonmonotonic logics.Limited answers to yes/no. Probabilities?Can this be applied to gene regulatory networks, using microarray technology?What other networks have similar frameworks?