
Page 1: Meta-Association Rules for Fusing Regular Association · 9th July 2014. Motivation I Exponential growth of available data in Data Mining

Meta-Association Rules for Fusing Regular Association Rules from Different Databases

M.D. Ruiz, J. Gomez-Romero, M.J. Martin-Bautista, D. Sanchez, M. Delgado

9th July 2014

Motivation


▸ Exponential growth of available data in the Data Mining area.

▸ Datasets are often distributed.

▸ Datasets are processed separately: several mining processes are carried out over data with similar meaning coming from different sources.

⇒ The extracted information should be fused in order to provide a unified and not overwhelming view to the user.




Several problems arise when using association rule algorithms in distributed databases:

1. Obtaining rules from very large datasets can be difficult and time-consuming.

• Parallel versions of rule mining algorithms, e.g. based on MapReduce

2. Handling distributed databases with similar meaning but different descriptions, which cannot be directly merged.


Data Mining + Information Fusion




1. Example in Crime Data Analysis

2. Proposal
   • Brief Introduction to Association Rules
   • Meta-Association Rules

3. Algorithm and Implementation Issues

4. Experimental Evaluation

5. Discussion and Future Research

6. References



Example in Crime Data Analysis

▸ We want to study the crime incidents that happened in the city of Chicago.

▸ Each district of Chicago has its own dataset: D1, D2, ..., Dk, some of them sharing some of their attributes.

▸ Association rule mining algorithms are executed separately in each district, obtaining different sets of rules: R1, R2, ..., Rk.

▸ There are several attributes describing some aspects of the districts: at1, at2, ..., atm.


Fusing this information by means of Meta-Association Rules




[Figure: rule sets R1, R2, ..., Rk-1, Rk]






Brief Introduction to Association Rules

▸ Data is usually stored in datasets D composed of transactions ti (rows) and attributes (columns).

▸ An item is a pair 〈attribute, value〉 or 〈attribute, interval〉.

D     i1   i2   ...  ij   ij+1  ...  im
t1    1    0    ...  0    1     ...  0
t2    0    1    ...  1    1     ...  1
...   ...  ...  ...  ...  ...   ...  ...
tn    1    1    ...  0    1     ...  1

▸ Association rules are expressions of the form A → B, where A and B are non-empty, disjoint sets of items.

▸ An association rule represents a relation based on the joint co-occurrence of A and B.



Brief Introduction to Association Rules

▸ The support of an itemset A is defined as the probability that a transaction contains the itemset:

$$\mathrm{supp}(A) = \frac{|\{t \in D : A \subseteq t\}|}{|D|}$$

▸ For assessing the validity of association rules (ARs), the most common measures are support (joint probability P(A ∪ B)) and confidence (conditional probability P(B|A)):

$$\mathrm{Supp}(A \to B) = \mathrm{supp}(A \cup B); \qquad \mathrm{Conf}(A \to B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}$$

These must be ≥ minsupp and ≥ minconf respectively (thresholds imposed by the user); a rule meeting both is called frequent and confident.
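The two measures can be sketched in a few lines of Python. This is only an illustration, assuming support is taken as a relative frequency over the transactions; the function names `supp` and `conf` and the toy crime items are hypothetical and do not come from the authors' implementation.

```python
def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def conf(antecedent, consequent, transactions):
    """Conf(A -> B) = supp(A u B) / supp(A); returns 0.0 if A never occurs."""
    a = frozenset(antecedent)
    b = frozenset(consequent)
    supp_a = supp(a, transactions)
    if supp_a == 0:
        return 0.0
    return supp(a | b, transactions) / supp_a


# Toy transactions (invented for illustration): each is a set of items.
transactions = [
    frozenset({"period=night", "location=street", "arrest=false"}),
    frozenset({"period=night", "location=street", "arrest=true"}),
    frozenset({"period=morning", "location=residence", "arrest=false"}),
    frozenset({"period=night", "location=residence", "arrest=false"}),
]

# Rule: period=night -> arrest=false
rule_support = supp({"period=night", "arrest=false"}, transactions)
rule_confidence = conf({"period=night"}, {"arrest=false"}, transactions)
```

A rule would then be kept when `rule_support >= minsupp` and `rule_confidence >= minconf`, with both thresholds chosen by the user.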



Brief Introduction to Association Rules

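▸ An alternative framework is to measure the accuracy by means of the certainty factor:

$$CF(A \to B) = \begin{cases} \dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{1 - \mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) > \mathrm{supp}(B) \\[2ex] \dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{\mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) < \mathrm{supp}(B) \\[2ex] 0 & \text{otherwise.} \end{cases}$$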

▸ CF measures how our belief that B is in a transaction changes when we are told that A is in that transaction.

▸ The certainty factor has better properties than confidence and other quality measures. In particular, it helps to reduce the number of rules obtained by filtering out those rules corresponding to statistical independence or negative dependence.

▸ When CF(A → B) ≥ minCF, the rule is called certain.
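The piecewise definition of the certainty factor translates directly into code. The sketch below assumes `confidence` and `supp_b` are probabilities in [0, 1]; the function name `certainty_factor` is an illustrative choice, not the authors' API.

```python
def certainty_factor(confidence, supp_b):
    """Certainty factor of A -> B from Conf(A -> B) and supp(B).

    Positive values mean that knowing A raises the belief in B,
    negative values mean it lowers it, and 0 means independence.
    """
    if confidence > supp_b:
        # supp_b < 1 here, because confidence cannot exceed 1.
        return (confidence - supp_b) / (1.0 - supp_b)
    if confidence < supp_b:
        # supp_b > 0 here, because confidence cannot be negative.
        return (confidence - supp_b) / supp_b
    return 0.0


positive = certainty_factor(0.9, 0.5)      # A makes B more likely
negative = certainty_factor(0.25, 0.5)     # A makes B less likely
independent = certainty_factor(0.5, 0.5)   # A tells us nothing about B
```

Filtering with a positive `minCF` threshold discards both the independent and the negatively dependent rules, which is the reduction effect described above.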



Meta-Association Rules

Meta-association rules are association rules where the antecedent or the consequent can contain regular rules that have been previously extracted with high reliability in a high percentage of the source databases.




[Figure: rule sets R1, R2, ..., Rk-1, Rk]






Algorithm and Implementation Issues

1. From each database a set of rules Ri is obtained.

2. These rules are compiled into a new database D, together with the attributes at1, ..., atm. Each row is a source database; a 1 marks that the rule was extracted from it (or that the attribute holds for it).

D     r1   r2   ...  rn   at1  ...  atm
D1    1    1    ...  0    1    ...  1
D2    0    1    ...  0    0    ...  1
...   ...  ...  ...  ...  ...  ...  ...
Dk    1    0    ...  1    1    ...  0

3. This information is fused by finding meta-association rules (involving the previously extracted rules r1, ..., rn and the attributes at1, ..., atm).
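Step 2 can be sketched as follows. This is a minimal illustration, assuming each rule is identified by a string and each district attribute by an item string; `build_meta_database`, `rule_sets` and `district_attrs` are hypothetical names introduced here, and the toy values are invented.

```python
def build_meta_database(rule_sets, district_attrs):
    """Build the binary database D: one row per source database,
    one column per distinct rule followed by one per attribute item."""
    all_rules = sorted(set().union(*rule_sets.values()))
    all_attrs = sorted(set().union(*district_attrs.values()))
    columns = all_rules + all_attrs
    rows = {}
    for name, rules in rule_sets.items():
        present = rules | district_attrs.get(name, set())
        # 1 if the rule was mined from this database (or the attribute holds).
        rows[name] = [1 if col in present else 0 for col in columns]
    return columns, rows


# Toy input: rules mined per district and district-level attribute items.
rule_sets = {
    "D1": {"r1", "r2"},
    "D2": {"r2"},
    "D3": {"r1", "r3"},
}
district_attrs = {
    "D1": {"safety=high"},
    "D2": {"safety=medium"},
    "D3": {"safety=high"},
}

columns, rows = build_meta_database(rule_sets, district_attrs)
```

Because D is again a boolean transaction table, the same mining procedure used on each Di can be applied to it unchanged, which is what step 3 relies on.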



Meta-Association Rules

Formally, we will obtain three types of meta-association rules:

▸ ri → rj, where ri and rj can be rules or conjunctions of rules. For example: ri = ri1 ∧ ... ∧ ris.

▸ ati → atj, where ati and atj can be attributes or conjunctions of attributes.

▸ ri → atj or atj → ri, where ri and atj can be a conjunction of rules and a conjunction of attributes respectively, and rules and attributes can be mixed on either side.



Meta-Association Rule Mining Algorithm

Input: D1, ..., Dk, minsupp, minCF
Output: MR (set of meta-association rules)
 1: for all Di such that 1 ≤ i ≤ k do
 2:   # Di preprocessing
 3:   Read Di and store the items I
 4:   Transform Di into a boolean database
 5:   Store database into a vector of BitSets
 6:   # Mine very strong rules
 7:   Compute the candidate set C of frequent itemsets Supp(X) ≥ minsupp
 8:   Store the BitSet vector indexes of X ∈ C and Supp(X)
 9:   Compose the rule with X, Y ∈ C
10:   if Supp(X ⇒ Y) ≥ minsupp and CF(X ⇒ Y) ≥ minCF then
11:     The rule is a very strong rule
12:   end if
13: end for
14: # D creation
15: Compile all different rules from R1, ..., Rk
16: Create D using compiled rules and additional attributes
17: # Mining meta-association rules
18: Repeat steps 1-13 to mine meta-association rules from D
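Steps 3-12 can be approximated with Python integers used as bitsets. This is a simplified sketch and not the authors' implementation: it only composes rules X ⇒ Y between single items, whereas the algorithm works with general itemsets, and all function names are assumptions made for illustration.

```python
from itertools import permutations


def to_bitsets(transactions):
    """Map each item to an int whose bit i is set if transaction i contains it."""
    bitsets = {}
    for idx, items in enumerate(transactions):
        for item in items:
            bitsets[item] = bitsets.get(item, 0) | (1 << idx)
    return bitsets


def supp(itemset, bitsets, n):
    """Relative support: AND the item bitsets together and count the set bits."""
    mask = (1 << n) - 1
    for item in itemset:
        mask &= bitsets.get(item, 0)
    return bin(mask).count("1") / n


def certainty_factor(confidence, supp_b):
    if confidence > supp_b:
        return (confidence - supp_b) / (1.0 - supp_b)
    if confidence < supp_b:
        return (confidence - supp_b) / supp_b
    return 0.0


def mine_rules(transactions, minsupp, min_cf):
    """Return (x, y, support, cf) for single-item rules x => y passing both thresholds."""
    n = len(transactions)
    bitsets = to_bitsets(transactions)
    frequent = [i for i in sorted(bitsets) if supp([i], bitsets, n) >= minsupp]
    rules = []
    for x, y in permutations(frequent, 2):
        s_xy = supp([x, y], bitsets, n)
        if s_xy < minsupp:
            continue
        confidence = s_xy / supp([x], bitsets, n)
        cf = certainty_factor(confidence, supp([y], bitsets, n))
        if cf >= min_cf:
            rules.append((x, y, s_xy, cf))
    return rules


transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
rules = mine_rules(transactions, minsupp=0.5, min_cf=0.5)
```

In the toy data, `a => b` is frequent but its certainty factor falls below the threshold, while `b => a` passes both tests. Running the same function on the rows of D (with rules and attributes as items) would correspond to step 18.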



Experimental Evaluation: DataSet Description

▸ 22 databases about crime, corresponding to districts of the city of Chicago.

▸ Number of transactions per database: min = 5694, max = 22493.

▸ 6 types of attributes (around 300 items) in each database:

• Quarter of the year in which the incident happened.
• Day period: morning, afternoon, evening, night.
• Crime description according to police standard protocols.
• Location description: street, residence, etc.
• Arrest: whether there is an arrest associated with the crime.
• Domestic: whether the crime happened in a domestic environment.

▸ Additional attributes about the districts:

• Number of students in the district: low, medium, high, very high.

• Number of misconducts reported in the district: very low, low, medium, high, very high.

• Perceived safety index, obtained by means of surveys: low, medium, high.



Experimental Evaluation: Some Results

Example of obtained meta-association rule:

“IF (Crime-Description=$500 under → Arrest=false) AND

(Location-Description=RESIDENCE → Arrest=false)

THEN Safety-Index=High”

with Supp = 0.136 and CF = 1.

That is, a high perception of safety is frequent in districts where crimes of minor relevance and crimes in residential areas are associated with no arrest.
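As a rough reading of these values, assume the meta-database has one row per district database (22 rows, an assumption based on the dataset description) and that CF follows the certainty factor definition given earlier, with B the consequent Safety-Index=High:

$$\mathrm{Supp} = \frac{3}{22} \approx 0.136, \qquad CF = \frac{\mathrm{Conf} - \mathrm{supp}(B)}{1 - \mathrm{supp}(B)} = 1 \iff \mathrm{Conf} = 1$$

Under that assumption, the meta-rule would hold in 3 of the 22 districts, and every district in which both regular rules were extracted has a high safety index.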



Experimental Evaluation: Some Results

Another example of obtained meta-association rule:

“IF Safety-Index=Medium

THEN (Location-Description=STREET → Domestic=false)

AND Number-of-Students=Very High”

with Supp = 0.136 and CF = 0.511.

Interpretation: in some districts (13.6%), a higher (medium) safety perception is frequently associated with the fact that crimes happen in the streets and that the number of students in the district is very high.



Discussion and Future Research

We have identified several problems or deficiencies in our approach that can be improved.

▸ We have only taken into account the presence/absence of a rule in D.

• It would be convenient to consider the degree of importance of the rule.

Future: improvement by taking into account fuzzy association rules.

▸ The databases considered have the same structure.

• It would be convenient to address the problem of datasets with different structures or different attribute descriptions but very similar meaning.

Future: using a knowledge repository to assist the algorithm in matching items with the same meaning.




[Sanchez et al.] D. Sanchez, M.A. Vila, L. Cerda, and J.M. Serrano. Association rules applied to credit card fraud detection. Expert Systems with Applications, 36:3630-3640, 2009.

[Delgado et al.] M. Delgado, M.D. Ruiz, and D. Sanchez. Studying interest measures for association rules through a logical model. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 18(1):87-106, 2010.

[Ruiz et al.] M.D. Ruiz, M.J. Martin-Bautista, D. Sanchez, M.A. Vila, and M. Delgado. Anomaly detection using fuzzy association rules. Int. J. Electronic Security and Digital Forensics, 6(1):25-37, 2014.



Thank you. Any questions?