ILP Slides

8/6/2019 ILP Slides

1/44

Inductive Logic Programming(for Dummies)

Anoop & Hector

8/6/2019 ILP Slides

2/44

EDAM Reading Group 2004 2

KnowledgeD

iscovery inDB

(KDD

)automatic extraction of novel, useful, and valid

knowledge from large sets of data

Different Kinds of: Knowledge

Rules

Decision trees

Cluster hierarchies Association rules

Statistically unusual subgroups

Data

8/6/2019 ILP Slides

3/44


Relational AnalysisWould it not be nice to have analysis methods and

data mining systems capable of directly working

with multiple relations as they are available in

relational database systems?

8/6/2019 ILP Slides

4/44


Single Table vs RelationalD

MThe same problems still remain.

But solutions?

More problems:

Extending the key notations

Efficiency concerns

8/6/2019 ILP Slides

5/44


RelationalD

ata MiningData Mining : ML :: Relational Data Mining : ILP

Initially

Binary Classification

Now

Classification, Regression, Clustering, AssociationAnalysis

8/6/2019 ILP Slides

6/44


ILP? India Literacy Project?

International Language Program?

Individualized Learning Program?

Instruction Level Parallelism?

International Lithosphere Program?

8/6/2019 ILP Slides

7/44

8/6/2019 ILP Slides

8/44


D

eductive Vs Inductive ReasoningT B p E(deduce)

parent(X,Y) :- mother(X,Y).

parent(X,Y) :- father(X,Y).

mother(mary,vinni).

mother(mary,andre).

father(carrey,vinni).

father(carry,andre).

parent(mary,vinni).

parent(mary,andre).

parent(carrey,vinni).parent(carrey,andre).

parent(mary,vinni).

parent(mary,andre).

parent(carrey,vinni).

parent(carrey,andre).

mother(mary,vinni).

mother(mary,andre).

father(carrey,vinni).

father(carry,andre).

parent(X,Y) :- mother(X,Y).

parent(X,Y) :- father(X,Y).

E B p T(induce)

7

7

8/6/2019 ILP Slides

9/44


ILP: ObjectiveGiven a dataset: Positive examples (E+) and optionally negative examples (E-)

Additional knowledge about the problem/application domain (Background

KnowledgeB

) Set of constraints to make the learning process more efficient (C)

Goal of an ILP system is to find a set of hypothesis that: Explains (covers) the positive examples - Completeness

Are consistent with the negative examples - Consistency

).,covers(:),covers(:: nhNnphPpHh 0

8/6/2019 ILP Slides

10/44


DB

vs. Logic ProgrammingDB Terminology

Relation namep

Attribute of relationp Tuple

Relationp

a set of tuples

Relation q

defined as a view

LP Terminology Predicate symbolp

Argument of predicatep Ground factp(a1,,an)

Predicatep

defined extensionally by a

set of ground facts Predicate q

defined intentionally by aset of rules (clauses)

8/6/2019 ILP Slides

11/44


Relational PatternIF Customer(C1,Age1,Income1,TotSpent1,BigSpender1)

AND MarriedTo(C1,C2)

AND Customer(C2,Age2,Income2,TotSpent2,BigSpender2)

AND Income2 10000

THEN BigSpender1 = Yes

big_spender(C1,Age1,Income1,TotSpent1)

married_to(C1,C2)customer(C2,Age2,Income2,TotSpent2,BigSpender2)

Income2 10000

u

n

u

8/6/2019 ILP Slides

12/44


A Generic ILP Algorithmprocedure ILP (Examples)

INITIALIZE (Theories, Examples)

repeatT = SELECT (Theories, Examples)

{Ti}ni=1 = REFINE (T, Examples)

Theories = REDUCE (Theories Ti, Examples)

until STOPPINGCRITERION (Theories, Examples)

return (Theories)

77n

i 1!

8/6/2019 ILP Slides

13/44


Procedures for a Generic ILP Algo.

INITIALIZE: initialize a set of theories(e.g. Theories = {true} orTheories = Examples)

SELECT: select the most promising candidate theory

REFINE: apply refine operators that guarantee new theories(specialization, generalization,).

REDUCE: discard unpromising theories

STOPPINGCRITERION: determine whether the current set of theories

is already good enough(e.g. when it contains a complete and consistent theory)

SELECT and REDUCE together implement the search strategy.

(e.g. hill-climbing: REDUCE = only keep the best theory.)

8/6/2019 ILP Slides

14/44


Search Algorithms

Search Methods Systematic Search

Depth-first search

Breadth-first search

Best-first search

Heuristic Search

Hill-climbing

Beam-search

Search Direction Top-down search: Generic to specific

Bottom-up search: Specific to general

Bi-directional search

8/6/2019 ILP Slides

15/44


Example

8/6/2019 ILP Slides

16/44


Search Space

8/6/2019 ILP Slides

17/44


Search Space

8/6/2019 ILP Slides

18/44


Search Space as Lattice

Search space is a lattice underU-subsumption There exists a lub andglb for every pair of

clauses

lub is least general generalization

Bottom-up approaches find the lggof the

positive examples

8/6/2019 ILP Slides

19/44


Basics of ILP contd

Bottom-up approach of finding clauses leads to longclauses through lgg.

Thus, prefer top-down approach since shorter and moregeneral clauses are learned

Two ways of doing top-down search FOIL: greedy search using information gain to score

PROGOL: branch-and-bound, usingP-N-lto score, uses saturationto restrict search space

Usually, refinement operator is to

Apply substitution

Add literal to body of a clause

8/6/2019 ILP Slides

20/44


FOIL Greedy search, score each clause using information gain:

Let c1be a specialization ofc2

Then WIG(c1) (weighted information gain) is

Wherep is the number of possible bindings that make the clause

cover positive examples,p is the number of positive examples

covered and n is the number of negative examples covered.

Background knowledge (B) is limited to ground facts.

WI (c1

,c2

) ! p2

I c1

I c

2

I(c) ! log2

p

p n

8/6/2019 ILP Slides

21/44


PROGOL Branch and bound top-down search

UsesP-N-las scoring function: Pis number of positive examples covered

Nis number of negative examples covered

lis the number of literals in the clause

Preprocessing step: build a bottom clause using a positive example and B to

restrict search space. Uses mode declarations to restrict language

B not limited to ground facts

While doing branch and bound top-down search:

Only use literals appearing in bottom clause to refine clauses.

Learned literal is a generalization of this bottom clause.

Can set depth bound on variable chaining and theorem proving

8/6/2019 ILP Slides

22/44


Example ofBottom Clause

E

! p(a),p(b)_ aE

! p(c),p(d)_ aB ! {r(a,b),r(a,c),r(c,d),

q(X,Y) n r(Y,X)} var)(var,

)(var,

var)(var,(var):modes

r

constr

qp

Select a seed from positive examples,p(a), randomly or by order (first

uncovered positive example)

Gather all relevant literals (by forward chaining add anything from B

that is allowable)

Introduce variables as required by modes

p(a)n r(a,b),r(a,c),r(c,d),q(b,a),

q(c,a),q(d,c),r(a,b),r(a,c),r(c,d)

p(A)n r(A,b),r(A,c),r(C,d),q(B,A),

q(C,A),q(D,C),r(A,B),r(A,C),r(C,D)

8/6/2019 ILP Slides

23/44


Iterate to Learn Multiple Rules Select seed from positive examples to build

bottom clause.

Get some rule If A B then P. Now throw

away all positive examples that were covered bythis rule

Repeat until there are no more positive examples.

+++

+ + + ++

++-

---

--

-

-

-

First seed

selected

First rule

learned

Second seed

selected

8/6/2019 ILP Slides

24/44


From Last Time

Why ILP is not just Decision Trees.

Language is First-Order Logic

Natural representation for multi-relational settings

Thus, a natural representation forfulldatabases

Not restricted to the classification task.

So then, what is ILP?

8/6/2019 ILP Slides

25/44


What is ILP?(An obscene generalization)

A way to search the space of First-Order

clauses.

With restrictions of course

U-subsumption and search space ordering

Refinement operators:

Applying substitutions

Adding literals

Chaining variables

8/6/2019 ILP Slides

26/44


More From Last Time

Evaluation of hypothesis requires finding

substitutions for each example.

This requires a call to PROLOG foreach

example

For PROGOL only one substitution required

For FOIL allsubstitutions are required (recallthep in scoring function)

8/6/2019 ILP Slides

27/44


An Example: MIDOS

A system for finding statistically-interestingsubgroups

Given a population and its distribution oversome class

Find subsets of the population for which thedistribution over this class is significantly

different (an example)Uses ILP-like search for definitions of

subgroups

8/6/2019 ILP Slides

28/44


MIDOS

Example:

A shop-club database

customer(ID, Zip, Sex, Age, Club) order(C_ID, O_ID, S_ID, Pay_mode)

store(S_ID, Location)

Say want to findgood_customer(ID)

FOIL or PROGOL might find: good_customer(C) :- customer(C,_,female,_,_),

order(C,_,_,credit_card).

8/6/2019 ILP Slides

29/44


MID

OS Instead, lets find a subset of customers that

contain an interesting amount of members

Target: [member, non_member]

ReferenceDistribution: 1371 total

customer(C,_,female,_,_), order(C,_,_,credit_card).

Distribution: 478 total 1.54 %%

8/6/2019 ILP Slides

30/44


MID

OS How-to:

Build a clause in an ILP fashion

Refinement operators:

Apply a substitution (to a constant, for example)

Add a literal (uses given foreign link relationships)

Use a score for statistical interestDo a beam search

8/6/2019 ILP Slides

31/44


MIDOS

Statistically-interesting scoring

g: relative size of subgroup

p0i: relative frequency of value i in entire populationpi: relative frequency of value i in subgroup

g

1gy p0i pi 2

i!1..n

8/6/2019 ILP Slides

32/44


MID

OS Example 1378 total population, 478 female credit card

buyers

906 members in total population, 334 members

among female credit card buyers:

g

1gy p0i pi

2

i!1..n

4781378

1 4781378

y 9061378 334

478 2

4721378

144478

2

! 0.00154

8/6/2019 ILP Slides

33/44


Scaling in Data Mining

Scaling to large datasets

Increasing number of training examples

Scaling to size of the examples

Increasing number of ground facts in

background knowledge

[Tang, Mooney, Melville, UT Austin, MRDM (SIGKDD) 2003.]

8/6/2019 ILP Slides

34/44


BETHE

Addresses scaling to number of ground facts in

background knowledge

Problem: PROGOL bottom-clause is too big This makes search space too big

Solution: dont build a bottom clause

Use FOIL-like search, BUT

Guide search using some seed example

8/6/2019 ILP Slides

35/44


Efficiency Issues

Representational Aspects

Search

Evaluation

Sharing computations

Memory-wise scalability

8/6/2019 ILP Slides

36/44



Example: Student(string sname, string major, string minor)

Course(string cname, string prof, string cred)

Enrolled(string sname, string cname)

In a natural join of these tables there is a one-to-onecorrespondance between join result and the Enrolled table

Data mining tasks on the Enrolled table are reallypropositional

MRDM is overkill

8/6/2019 ILP Slides

37/44



Three settings for data mining:

Find patterns within individuals represented as tuples(single table, propositional)

eg. Which minor is chosen with what major

Find patterns within individuals represented as sets oftuples (each individual induces a sub-database)

Multiple tables, restricted to some individual

eg. Student X taking course A, usually takes course

B

Find patters within whole database Mutliple tables

eg. Course taken by student A are also taken by student B

8/6/2019 ILP Slides

38/44


Search

Space restriction

Bottom clauses as seen above

Syntactical biases and typed logic Modes as seen above.

Can add types to variables to further restrict language

Search biases and pruning rules

PROGOLs bound (relies on anti-monotonicity ofcoverage)

Stochastic search

8/6/2019 ILP Slides

39/44


Evaluation

Evaluating a clause: get some measure ofcoverage

Match each example to the clause: Run multiple logical queries.

Query optimization methods from DBcommunity

Rel. Algebra operator reordering BUT: queries forDB are set oriented (bottom-up),

queries in PROLOG find a single solution (top-down).

8/6/2019 ILP Slides

40/44


Evaluation

More options

k-locality, given some bindings, literals in a clause

become independent: eg. ?-p( ,Y),q(Y,Z),r(Y,U).

Given a binding forY, proofs ofq and rare independent

So, find only one solution forq, if no solution found forrno

need to backtrack.

RelaxU-subsumption using stochastic estimates Sample space of substitutions and decide on subsumption

based on this sample

8/6/2019 ILP Slides

41/44


Sharing Computations

Materialization of features

Propositionalization

Pre-compute some statistics

Joint distribution over attributes of a table

Query selectivity

Store proofs, reuse when evaluating newclauses

8/6/2019 ILP Slides

42/44


Memory-wise scalability

All full ILP systems work on memory

databases

Exception: TILDE: learns multi-relational

decision trees

The trick: make example loop the outer loop

Current solution:Encode data compactly

8/6/2019 ILP Slides

43/44


References

Dzeroski and Lavrac, RelationalData Mining,Springer, 2001.

David Page, ILP: Just Do It, ILP 2000. Tang, Mooney, Melville, Scaling Up ILP to LargeExamples: Results on LinkDiscovery for CounterTerrorism, MRDM 2003.

Blockeel, Sebag: Scalability and efficiency in multi-relational data mining. SIGKDDExplorations 5(1) July2003

8/6/2019 ILP Slides

44/44


Overview

Theory 1

Theory 2

Theory 3

Performance

Results

Conclusion

Future

Work

ILP Slides

Documents

Transcript of ILP Slides