Discovering Robust Knowledge from Databases that Change

30
1 Discovering Robust Knowledge from Databases that Change Chun-Nan Hsu Craig A. Knoblock Arizona State University University of Southern California Journal of Data Mining and Knowledge Discovery, Volume 2, 1998 Presented & Edited by: Danielle Steimke CS332/295 Data Mining – Spring 2014

description

Chun-Nan HsuCraig A. Knoblock Arizona State UniversityUniversity of Southern California Journal of Data Mining and Knowledge Discovery , Volume 2, 1998. Discovering Robust Knowledge from Databases that Change. Presented & Edited by: Danielle Steimke CS332/295 Data Mining – Spring 2014. - PowerPoint PPT Presentation

Transcript of Discovering Robust Knowledge from Databases that Change

Page 1: Discovering Robust Knowledge from Databases that Change

1

Discovering Robust Knowledge from Databases that Change

Chun-Nan Hsu Craig A. KnoblockArizona State University University of Southern

California

Journal of Data Mining and Knowledge Discovery, Volume 2, 1998

Presented & Edited by: Danielle SteimkeCS332/295 Data Mining – Spring 2014

Page 2: Discovering Robust Knowledge from Databases that Change

2

Outline

Introduction Preliminaries Robustness of Knowledge Applying Robustness in Knowledge

Discovery Conclusions

Page 3: Discovering Robust Knowledge from Databases that Change

3

Introduction: Importance

Real world databases are dynamic. Changing information (updates, deletions) may

make current rules incorrect How can we detect, check & update these

inconsistent rules without high maintenance costs? Use robustness as a measure of how likely

knowledge is to be consistent after database changes

Page 4: Discovering Robust Knowledge from Databases that Change

4

Introduction: BASIL

BASIL rule discovery system – discovers semantic rules that optimize performance Many rules may be invalid after DB changes Use robustness estimation to guide data

mining using a rule pruning approach Search for pruning pattern that yields highly

robust rules

Page 5: Discovering Robust Knowledge from Databases that Change

5

Introduction: Applications of Robustness Estimation

Minimize the maintenance cost of inconsistent rules in the presence of database changes.

When an inconsistent rule is detected, a system may: remove the rule: repeatedly invoke discovery

system repair the rule: repaired frequently after data

changes Apply to guide the data mining and evaluation of

semantic rules: In data mining stage prune antecedents of a

discovered rule to increase its robustness. In evaluation stage, eliminate rules with robustness values

below a threshold.

Page 6: Discovering Robust Knowledge from Databases that Change

6

Introduction: Robustness Comparison

Robustness vs. Support: Support - the probability that a data instance satisfies

a rule. Robustness - probability that an entire database state

is consistent with a rule.

Robustness vs. Predictive Accuracy: Predictive - the probability that knowledge is

consistent with randomly selected unseen data. The difference is significant in databases that are

interpreted using the closed-world assumption (CWA).

Page 7: Discovering Robust Knowledge from Databases that Change

7

Introduction: Closed-world Assumption

Information that is not explicitly present in the database is taken to be false.

Closed-world databases are widely used in relational db’s, deductive db’s and rule-based information systems because of

The characteristics of application domains. The limitation of the representation systems.

Databases change by deletions, insertions and updates, and in a close-world database, they may affect the validity of a rule.

Closed-world data tends to be dynamic, important for knowledge discovery systems to handle dynamic and closed world data.

Page 8: Discovering Robust Knowledge from Databases that Change

8

Important Definitions

Database state at time t: the collection of the instances presents in the database at time t.

Database Literals vs. Built-In Literals Range Rule vs. Relational Rule Consistent rule: given a database state, if all

variable instantiations that satisfy the antecedents of the rule also satisfy the consequent of the rule.

Page 9: Discovering Robust Knowledge from Databases that Change

9

Outline

Introduction Preliminaries Robustness of Knowledge Applying Robustness in Knowledge

Discovery Conclusions

Page 10: Discovering Robust Knowledge from Databases that Change

10

First Definition of Robustness

Robust rule: does not become inconsistent (invalid) after database changes

Definition 1: (Robustness for all states) Given a rule r, let D be the event that a database is in a state that is consistent with r.

Robust 1 (r )=# of database states consistent with r

# of all possible database states

Robust 1(r )=Pr( D)

Page 11: Discovering Robust Knowledge from Databases that Change

11

First Definition Continued

Two problems with estimate Treats all database states as if they are

equally probable Number of database states is large, even in

small databases

Page 12: Discovering Robust Knowledge from Databases that Change

12

Second Definition of Robustness

Robustness for accessible states: Given a rule r, and a database in a state d, in which

r is consistent. Let t denote the event of performing a transaction on d that results in new database states inconsistent with r. The robustness of r in accessible states from the current state d is defined as

Robust (r∣d )=Pr (¬t∣d )=1 - Pr ( t∣d )

Page 13: Discovering Robust Knowledge from Databases that Change

13

Definitions of Robustness (cont.)

If all transactions are equally probable, then

The robustness of a rule could be different in different database states. E.g.: Suppose there are two db states d1 and d2 of a given db. To reach a state inconsistent with r, we need to delete 10 tuples in d1 and only 1 tuple in d2.

Robust(r|d1) > Robust(r|d2)

Robust 1(r )≡Robust( r∣d )

Page 14: Discovering Robust Knowledge from Databases that Change

14

Estimate Robustness Estimate the probabilities of data changes,

rather than the number of possible database states.

Estimate the robustness of a rule based on the probability of transactions that may invalidate the rule.

Decompose data changing transactions and estimate their probabilities using the Laplace law of succession.

Page 15: Discovering Robust Knowledge from Databases that Change

15

Estimate Robustness (cont.)

Laplace law of Succession: Given a repeatable experiment with an outcome of one of any of k classes. Suppose we have conducted this experiment n times, r of which have resulted in some outcome C, in which we are interested. The probability that the outcome of the next experiment will be C can be estimated as r+1

n+k

Page 16: Discovering Robust Knowledge from Databases that Change

16

Decomposition of Transactions

To estimate Pr(t|d) Use Bayesian network model to

decompose the transaction and estimate local probabilities first

X1:type of transaction?

X2: on which relation?

X3: on which tuples?

X4:on which attributes?

X5:what new attribute

value?

Page 17: Discovering Robust Knowledge from Databases that Change

17

Estimating Robustness Estimation accuracy depends on available

information. However, even given only database schemas, the method can still come up with some estimates.

How to derive transactions that invalidate an arbitrary logic statement?

Most knowledge discovery systems have strong restrictions on the syntax of discovered knowledge.

Therefore the invalidating transactions can be manually generalized into a small sets of transaction templates, as well as templates of probability estimates for robustness estimation.

Page 18: Discovering Robust Knowledge from Databases that Change

18

Outline

Introduction Preliminaries Robustness of Knowledge Applying Robustness in

Knowledge Discovery Conclusions

Page 19: Discovering Robust Knowledge from Databases that Change

19

Applying Robustness in Knowledge Discovery

A rule pruning algorithm which can increase the robustness and applicability of machine discovered rules by pruning their antecedent literals.

Specification of rule pruning problem:Take a machine-generated rule as input, which is consistent with a database but potentially overly-specific, and remove antecedent literals of the rule so that it remains consistent but is short and robust.

Basic idea of pruning algorithm: To search for a subset of antecedent literals to remove until any

further removal will yield an inconsistent rule.

However the search space can be exponentially large with respect to the number of literals in rule.

A beam-search algorithm was presented to trim the search space.

Page 20: Discovering Robust Knowledge from Databases that Change

20

Antecedent pruning algorithm

Pruning rule antecedents

1. INPUT R = rules (initially the rule to be pruned), B = beam size;

2. LET O = results;( initially empty); 3. WHILE (R is not empty) DO4. Move the first rule r in R to O; 5. Prune r, LET R’ = resulting rules; 6. Remove visited dangling or inconsistent rules in R’; 7. Estimate and sort on the robustness of rules in R’;

8. Retain top B rules in R’ and remove the rest; 9. Merge sorted R’ into R in sorted order of the

robustness; 10. RETURN O ;

Page 21: Discovering Robust Knowledge from Databases that Change

21

Pruning Algorithm

Beam-search algorithm Apply the robustness estimation approach to

estimate the robustness of a partially pruned rule and guide the pruning search.

Two properties to optimize: robustness and length.

For each set of equally short rules, the algorithm searches for the rule that is as robust as possible while still being consistent.

Page 22: Discovering Robust Knowledge from Databases that Change

22

Experimental Results

Experiment setups: Use the rule discovery system BASIL to derive rules

from two large ORACLE relations databases BASIL part of SIMS information mediator developed by

authors Synthesize 123 sample transactions to represent

possible transactions of the experimental databases (27 updates, 29 deletions and 67 insertions)

Steps of the experiments: Train BASIL to discover a set of rules and estimate their

robustness, generates 355 rules Use another set of 202 sample transactions to assist the

robustness estimation. Apply the set of 123 transactions to the two relational

database and check consistency of all 355 rules

Page 23: Discovering Robust Knowledge from Databases that Change

23

Conclusions Formalize the notion of robustness of a rule r in a

given database state d

Estimate probabilities of rule invalidating transactions

Decomposes the probability of a transactions into local probabilities that can be estimated using Laplace law.

No need to provide additional information for the estimation.

Apply in a rule discovery system Pruning antecedents of a discovered rule so that the rule

will be highly robust and widely applicable. Beam-search algorithm

Conduct empirical experiments to demonstrate the algorithm.

Page 24: Discovering Robust Knowledge from Databases that Change

Real-world Applications

Face recognition software Envirionment is the “data set” – constantly

changing Rules must resist variability in lighting and

pose Systems prognostics

Data set continuously updated with diagnostic or calibration information

Predictive failure rules used to dictate system resource planning

24

Page 25: Discovering Robust Knowledge from Databases that Change

First Exam Question

Why is finding robust rules significant?

25

Page 26: Discovering Robust Knowledge from Databases that Change

First Question - Answer

Real world databases tend to be dynamic Changing information – updates and deletions,

rather than just additions – could invalidate current rules

Continually checking and updating rules may incur high maintenance costs, especially for large databases

Robustness measures how likely the knowledge found will be consistent after changes to the database

26

Page 27: Discovering Robust Knowledge from Databases that Change

Second Exam Question

Compare and Contrast Robustness estimations with support and predictive accuracy

27

Page 28: Discovering Robust Knowledge from Databases that Change

Second Question - Answer

Robustness is the probability that an entire database state is consistent with a rule, while support is the probability that a specific data instance satisfies a rule.

Predictive Accuracy is the probability that knowledge is consistent with randomly selected unseen data. This is significant in closed world databases, where data tends to be dynamic.

28

Page 29: Discovering Robust Knowledge from Databases that Change

Third Exam Question

Why are transaction classes mutually exclusive when calculating robustness?

29

Page 30: Discovering Robust Knowledge from Databases that Change

Third Question – Answer

Transaction classes are mutually exclusive so that no transaction class covers another because for any two classes of transactions t_a and t_b, if t_a covers t_b, then Pr(t_a⋀t_b)=Pr(t_a) and it is redundant to consider t_b.

30