ILP Slides
-
Upload
annada-priyadarshinee -
Category
Documents
-
view
237 -
download
0
Transcript of ILP Slides
-
8/6/2019 ILP Slides
1/44
Inductive Logic Programming(for Dummies)
Anoop & Hector
-
8/6/2019 ILP Slides
2/44
EDAM Reading Group 2004 2
KnowledgeD
iscovery inDB
(KDD
)automatic extraction of novel, useful, and valid
knowledge from large sets of data
Different Kinds of: Knowledge
Rules
Decision trees
Cluster hierarchies Association rules
Statistically unusual subgroups
Data
-
8/6/2019 ILP Slides
3/44
EDAM Reading Group 2004 3
Relational AnalysisWould it not be nice to have analysis methods and
data mining systems capable of directly working
with multiple relations as they are available in
relational database systems?
-
8/6/2019 ILP Slides
4/44
EDAM Reading Group 2004 4
Single Table vs RelationalD
MThe same problems still remain.
But solutions?
More problems:
Extending the key notations
Efficiency concerns
-
8/6/2019 ILP Slides
5/44
EDAM Reading Group 2004 5
RelationalD
ata MiningData Mining : ML :: Relational Data Mining : ILP
Initially
Binary Classification
Now
Classification, Regression, Clustering, AssociationAnalysis
-
8/6/2019 ILP Slides
6/44
EDAM Reading Group 2004 6
ILP? India Literacy Project?
International Language Program?
Individualized Learning Program?
Instruction Level Parallelism?
International Lithosphere Program?
-
8/6/2019 ILP Slides
7/44
-
8/6/2019 ILP Slides
8/44
EDAM Reading Group 2004 8
D
eductive Vs Inductive ReasoningT B p E(deduce)
parent(X,Y) :- mother(X,Y).
parent(X,Y) :- father(X,Y).
mother(mary,vinni).
mother(mary,andre).
father(carrey,vinni).
father(carry,andre).
parent(mary,vinni).
parent(mary,andre).
parent(carrey,vinni).parent(carrey,andre).
parent(mary,vinni).
parent(mary,andre).
parent(carrey,vinni).
parent(carrey,andre).
mother(mary,vinni).
mother(mary,andre).
father(carrey,vinni).
father(carry,andre).
parent(X,Y) :- mother(X,Y).
parent(X,Y) :- father(X,Y).
E B p T(induce)
7
7
-
8/6/2019 ILP Slides
9/44
EDAM Reading Group 2004 9
ILP: ObjectiveGiven a dataset: Positive examples (E+) and optionally negative examples (E-)
Additional knowledge about the problem/application domain (Background
KnowledgeB
) Set of constraints to make the learning process more efficient (C)
Goal of an ILP system is to find a set of hypothesis that: Explains (covers) the positive examples - Completeness
Are consistent with the negative examples - Consistency
).,covers(:),covers(:: nhNnphPpHh 0
-
8/6/2019 ILP Slides
10/44
EDAM Reading Group 2004 10
DB
vs. Logic ProgrammingDB Terminology
Relation namep
Attribute of relationp Tuple
Relationp
a set of tuples
Relation q
defined as a view
LP Terminology Predicate symbolp
Argument of predicatep Ground factp(a1,,an)
Predicatep
defined extensionally by a
set of ground facts Predicate q
defined intentionally by aset of rules (clauses)
-
8/6/2019 ILP Slides
11/44
EDAM Reading Group 2004 11
Relational PatternIF Customer(C1,Age1,Income1,TotSpent1,BigSpender1)
AND MarriedTo(C1,C2)
AND Customer(C2,Age2,Income2,TotSpent2,BigSpender2)
AND Income2 10000
THEN BigSpender1 = Yes
big_spender(C1,Age1,Income1,TotSpent1)
married_to(C1,C2)customer(C2,Age2,Income2,TotSpent2,BigSpender2)
Income2 10000
u
n
u
-
8/6/2019 ILP Slides
12/44
EDAM Reading Group 2004 12
A Generic ILP Algorithmprocedure ILP (Examples)
INITIALIZE (Theories, Examples)
repeatT = SELECT (Theories, Examples)
{Ti}ni=1 = REFINE (T, Examples)
Theories = REDUCE (Theories Ti, Examples)
until STOPPINGCRITERION (Theories, Examples)
return (Theories)
77n
i 1!
-
8/6/2019 ILP Slides
13/44
EDAM Reading Group 2004 13
Procedures for a Generic ILP Algo.
INITIALIZE: initialize a set of theories(e.g. Theories = {true} orTheories = Examples)
SELECT: select the most promising candidate theory
REFINE: apply refine operators that guarantee new theories(specialization, generalization,).
REDUCE: discard unpromising theories
STOPPINGCRITERION: determine whether the current set of theories
is already good enough(e.g. when it contains a complete and consistent theory)
SELECT and REDUCE together implement the search strategy.
(e.g. hill-climbing: REDUCE = only keep the best theory.)
-
8/6/2019 ILP Slides
14/44
EDAM Reading Group 2004 14
Search Algorithms
Search Methods Systematic Search
Depth-first search
Breadth-first search
Best-first search
Heuristic Search
Hill-climbing
Beam-search
Search Direction Top-down search: Generic to specific
Bottom-up search: Specific to general
Bi-directional search
-
8/6/2019 ILP Slides
15/44
EDAM Reading Group 2004 15
Example
-
8/6/2019 ILP Slides
16/44
EDAM Reading Group 2004 16
Search Space
-
8/6/2019 ILP Slides
17/44
EDAM Reading Group 2004 17
Search Space
-
8/6/2019 ILP Slides
18/44
EDAM Reading Group 2004 18
Search Space as Lattice
Search space is a lattice underU-subsumption There exists a lub andglb for every pair of
clauses
lub is least general generalization
Bottom-up approaches find the lggof the
positive examples
-
8/6/2019 ILP Slides
19/44
EDAM Reading Group 2004 19
Basics of ILP contd
Bottom-up approach of finding clauses leads to longclauses through lgg.
Thus, prefer top-down approach since shorter and moregeneral clauses are learned
Two ways of doing top-down search FOIL: greedy search using information gain to score
PROGOL: branch-and-bound, usingP-N-lto score, uses saturationto restrict search space
Usually, refinement operator is to
Apply substitution
Add literal to body of a clause
-
8/6/2019 ILP Slides
20/44
EDAM Reading Group 2004 20
FOIL Greedy search, score each clause using information gain:
Let c1be a specialization ofc2
Then WIG(c1) (weighted information gain) is
Wherep is the number of possible bindings that make the clause
cover positive examples,p is the number of positive examples
covered and n is the number of negative examples covered.
Background knowledge (B) is limited to ground facts.
WI (c1
,c2
) ! p2
I c1
I c
2
I(c) ! log2
p
p n
-
8/6/2019 ILP Slides
21/44
EDAM Reading Group 2004 21
PROGOL Branch and bound top-down search
UsesP-N-las scoring function: Pis number of positive examples covered
Nis number of negative examples covered
lis the number of literals in the clause
Preprocessing step: build a bottom clause using a positive example and B to
restrict search space. Uses mode declarations to restrict language
B not limited to ground facts
While doing branch and bound top-down search:
Only use literals appearing in bottom clause to refine clauses.
Learned literal is a generalization of this bottom clause.
Can set depth bound on variable chaining and theorem proving
-
8/6/2019 ILP Slides
22/44
EDAM Reading Group 2004 22
Example ofBottom Clause
E
! p(a),p(b)_ aE
! p(c),p(d)_ aB ! {r(a,b),r(a,c),r(c,d),
q(X,Y) n r(Y,X)} var)(var,
)(var,
var)(var,(var):modes
r
constr
qp
Select a seed from positive examples,p(a), randomly or by order (first
uncovered positive example)
Gather all relevant literals (by forward chaining add anything from B
that is allowable)
Introduce variables as required by modes
p(a)n r(a,b),r(a,c),r(c,d),q(b,a),
q(c,a),q(d,c),r(a,b),r(a,c),r(c,d)
p(A)n r(A,b),r(A,c),r(C,d),q(B,A),
q(C,A),q(D,C),r(A,B),r(A,C),r(C,D)
-
8/6/2019 ILP Slides
23/44
EDAM Reading Group 2004 23
Iterate to Learn Multiple Rules Select seed from positive examples to build
bottom clause.
Get some rule If A B then P. Now throw
away all positive examples that were covered bythis rule
Repeat until there are no more positive examples.
+++
+ + + ++
++-
---
--
-
-
-
First seed
selected
First rule
learned
Second seed
selected
-
8/6/2019 ILP Slides
24/44
EDAM Reading Group 2004 24
From Last Time
Why ILP is not just Decision Trees.
Language is First-Order Logic
Natural representation for multi-relational settings
Thus, a natural representation forfulldatabases
Not restricted to the classification task.
So then, what is ILP?
-
8/6/2019 ILP Slides
25/44
EDAM Reading Group 2004 25
What is ILP?(An obscene generalization)
A way to search the space of First-Order
clauses.
With restrictions of course
U-subsumption and search space ordering
Refinement operators:
Applying substitutions
Adding literals
Chaining variables
-
8/6/2019 ILP Slides
26/44
EDAM Reading Group 2004 26
More From Last Time
Evaluation of hypothesis requires finding
substitutions for each example.
This requires a call to PROLOG foreach
example
For PROGOL only one substitution required
For FOIL allsubstitutions are required (recallthep in scoring function)
-
8/6/2019 ILP Slides
27/44
EDAM Reading Group 2004 27
An Example: MIDOS
A system for finding statistically-interestingsubgroups
Given a population and its distribution oversome class
Find subsets of the population for which thedistribution over this class is significantly
different (an example)Uses ILP-like search for definitions of
subgroups
-
8/6/2019 ILP Slides
28/44
EDAM Reading Group 2004 28
MIDOS
Example:
A shop-club database
customer(ID, Zip, Sex, Age, Club) order(C_ID, O_ID, S_ID, Pay_mode)
store(S_ID, Location)
Say want to findgood_customer(ID)
FOIL or PROGOL might find: good_customer(C) :- customer(C,_,female,_,_),
order(C,_,_,credit_card).
-
8/6/2019 ILP Slides
29/44
EDAM Reading Group 2004 29
MID
OS Instead, lets find a subset of customers that
contain an interesting amount of members
Target: [member, non_member]
ReferenceDistribution: 1371 total
customer(C,_,female,_,_), order(C,_,_,credit_card).
Distribution: 478 total 1.54 %%
-
8/6/2019 ILP Slides
30/44
EDAM Reading Group 2004 30
MID
OS How-to:
Build a clause in an ILP fashion
Refinement operators:
Apply a substitution (to a constant, for example)
Add a literal (uses given foreign link relationships)
Use a score for statistical interestDo a beam search
-
8/6/2019 ILP Slides
31/44
EDAM Reading Group 2004 31
MIDOS
Statistically-interesting scoring
g: relative size of subgroup
p0i: relative frequency of value i in entire populationpi: relative frequency of value i in subgroup
g
1gy p0i pi 2
i!1..n
-
8/6/2019 ILP Slides
32/44
EDAM Reading Group 2004 32
MID
OS Example 1378 total population, 478 female credit card
buyers
906 members in total population, 334 members
among female credit card buyers:
g
1gy p0i pi
2
i!1..n
4781378
1 4781378
y 9061378 334
478 2
4721378
144478
2
! 0.00154
-
8/6/2019 ILP Slides
33/44
EDAM Reading Group 2004 33
Scaling in Data Mining
Scaling to large datasets
Increasing number of training examples
Scaling to size of the examples
Increasing number of ground facts in
background knowledge
[Tang, Mooney, Melville, UT Austin, MRDM (SIGKDD) 2003.]
-
8/6/2019 ILP Slides
34/44
EDAM Reading Group 2004 34
BETHE
Addresses scaling to number of ground facts in
background knowledge
Problem: PROGOL bottom-clause is too big This makes search space too big
Solution: dont build a bottom clause
Use FOIL-like search, BUT
Guide search using some seed example
-
8/6/2019 ILP Slides
35/44
EDAM Reading Group 2004 35
Efficiency Issues
Representational Aspects
Search
Evaluation
Sharing computations
Memory-wise scalability
-
8/6/2019 ILP Slides
36/44
EDAM Reading Group 2004 36
Representational Aspects
Example: Student(string sname, string major, string minor)
Course(string cname, string prof, string cred)
Enrolled(string sname, string cname)
In a natural join of these tables there is a one-to-onecorrespondance between join result and the Enrolled table
Data mining tasks on the Enrolled table are reallypropositional
MRDM is overkill
-
8/6/2019 ILP Slides
37/44
EDAM Reading Group 2004 37
Representational Aspects
Three settings for data mining:
Find patterns within individuals represented as tuples(single table, propositional)
eg. Which minor is chosen with what major
Find patterns within individuals represented as sets oftuples (each individual induces a sub-database)
Multiple tables, restricted to some individual
eg. Student X taking course A, usually takes course
B
Find patters within whole database Mutliple tables
eg. Course taken by student A are also taken by student B
-
8/6/2019 ILP Slides
38/44
EDAM Reading Group 2004 38
Search
Space restriction
Bottom clauses as seen above
Syntactical biases and typed logic Modes as seen above.
Can add types to variables to further restrict language
Search biases and pruning rules
PROGOLs bound (relies on anti-monotonicity ofcoverage)
Stochastic search
-
8/6/2019 ILP Slides
39/44
EDAM Reading Group 2004 39
Evaluation
Evaluating a clause: get some measure ofcoverage
Match each example to the clause: Run multiple logical queries.
Query optimization methods from DBcommunity
Rel. Algebra operator reordering BUT: queries forDB are set oriented (bottom-up),
queries in PROLOG find a single solution (top-down).
-
8/6/2019 ILP Slides
40/44
EDAM Reading Group 2004 40
Evaluation
More options
k-locality, given some bindings, literals in a clause
become independent: eg. ?-p( ,Y),q(Y,Z),r(Y,U).
Given a binding forY, proofs ofq and rare independent
So, find only one solution forq, if no solution found forrno
need to backtrack.
RelaxU-subsumption using stochastic estimates Sample space of substitutions and decide on subsumption
based on this sample
-
8/6/2019 ILP Slides
41/44
EDAM Reading Group 2004 41
Sharing Computations
Materialization of features
Propositionalization
Pre-compute some statistics
Joint distribution over attributes of a table
Query selectivity
Store proofs, reuse when evaluating newclauses
-
8/6/2019 ILP Slides
42/44
EDAM Reading Group 2004 42
Memory-wise scalability
All full ILP systems work on memory
databases
Exception: TILDE: learns multi-relational
decision trees
The trick: make example loop the outer loop
Current solution:Encode data compactly
-
8/6/2019 ILP Slides
43/44
EDAM Reading Group 2004 43
References
Dzeroski and Lavrac, RelationalData Mining,Springer, 2001.
David Page, ILP: Just Do It, ILP 2000. Tang, Mooney, Melville, Scaling Up ILP to LargeExamples: Results on LinkDiscovery for CounterTerrorism, MRDM 2003.
Blockeel, Sebag: Scalability and efficiency in multi-relational data mining. SIGKDDExplorations 5(1) July2003
-
8/6/2019 ILP Slides
44/44
EDAM Reading Group 2004 44
Overview
Theory 1
Theory 2
Theory 3
Performance
Results
Conclusion
Future
Work