Association Rule Mining
The Task: Two Ways of Defining It

General
- Input: a collection of instances
- Output: rules to predict the values of any attribute(s) (not just the class attribute) from the values of other attributes
- E.g. if temperature = cool then humidity = normal
- If the right-hand side of a rule has only the class attribute, then the rule is a classification rule
- Distinction: classification rules are applied together, as sets of rules

Specific: market-basket analysis
- Input: a collection of transactions
- Output: rules to predict the occurrence of any item(s) from the occurrence of other items in a transaction
- E.g. {Milk, Diaper} -> {Beer}

General rule structure: Antecedents -> Consequents
Source: Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.
Market-basket transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Example of association rules:
{Diaper} -> {Beer}
{Milk, Bread} -> {Eggs, Coke}
{Beer, Bread} -> {Milk}

Implication means co-occurrence, not causality!
Definition: Frequent Itemset
Itemset
- A collection of one or more items. Example: {Milk, Bread, Diaper}

k-itemset
- An itemset that contains k items

Support count (σ)
- Frequency of occurrence of an itemset
- E.g. σ({Milk, Bread, Diaper}) = 2

Support (s)
- Fraction of transactions that contain an itemset
- E.g. s({Milk, Bread, Diaper}) = 2/5

Frequent itemset
- An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Definition: Association Rule
Association Rule
- An implication expression of the form X -> Y, where X and Y are itemsets
- Example: {Milk, Diaper} -> {Beer}

Rule Evaluation Metrics
- Support (s): fraction of transactions that contain both X and Y
- Confidence (c): measures how often items in Y appear in transactions that contain X

Example: for {Milk, Diaper} -> {Beer} over the transactions below,
  s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
  c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
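To make the two metrics concrete, here is a minimal Python sketch that reproduces the numbers above (the helper name support_count is ours, not from the slides):

```python
# Minimal sketch: support and confidence for {Milk, Diaper} -> {Beer}
# over the five transactions in the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions that contain the itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = support_count(X | Y, transactions) / len(transactions)               # 2/5 = 0.4
c = support_count(X | Y, transactions) / support_count(X, transactions)  # 2/3
print(f"s = {s:.2f}, c = {c:.2f}")  # s = 0.40, c = 0.67
```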
Association Rule Mining Task
Given a set of transactions T, the goal of association rule mining is to find all rules having
- support ≥ minsup threshold
- confidence ≥ minconf threshold

Brute-force approach:
- List all possible association rules
- Compute the support and confidence for each rule
- Prune rules that fail the minsup and minconf thresholds
- Computationally prohibitive! (See the sketch below.)
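To see why, a rough sketch (our own helper, not from the slides) that enumerates every candidate rule X -> Y over the six items of the running example; disjoint, non-empty antecedents and consequents alone already yield 602 rules, before any counting is done:

```python
# Rough sketch of "list all possible association rules": every pair of
# disjoint, non-empty itemsets (X, Y) drawn from the item universe.
from itertools import combinations

items = ["Bread", "Milk", "Diaper", "Beer", "Coke", "Eggs"]

def all_candidate_rules(items):
    for k in range(1, len(items) + 1):
        for antecedent in combinations(items, k):
            rest = [i for i in items if i not in antecedent]
            for j in range(1, len(rest) + 1):
                for consequent in combinations(rest, j):
                    yield set(antecedent), set(consequent)

print(sum(1 for _ in all_candidate_rules(items)))  # 602 rules for just 6 items
```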
Mining Association Rules
Example of rules from the transactions below:
{Milk, Diaper} -> {Beer} (s=0.4, c=0.67)
{Milk, Beer} -> {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} -> {Milk} (s=0.4, c=0.67)
{Beer} -> {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} -> {Milk, Beer} (s=0.4, c=0.5)
{Milk} -> {Diaper, Beer} (s=0.4, c=0.5)

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Observations:
- All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
- Rules originating from the same itemset have identical support but can have different confidence
- Thus, we may decouple the support and confidence requirements
Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation
   - Generate all itemsets whose support ≥ minsup
2. Rule Generation
   - Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive.
Power Set

Given a set S, the power set P is the set of all subsets of S.

Known property of power sets: if S has n elements, P has N = 2^n elements.

Examples:
- For S = {}, P = {{}}, N = 2^0 = 1
- For S = {Milk}, P = {{}, {Milk}}, N = 2^1 = 2
- For S = {Milk, Diaper}, P = {{}, {Milk}, {Diaper}, {Milk, Diaper}}, N = 2^2 = 4
- For S = {Milk, Diaper, Beer}, P = {{}, {Milk}, {Diaper}, {Beer}, {Milk, Diaper}, {Diaper, Beer}, {Beer, Milk}, {Milk, Diaper, Beer}}, N = 2^3 = 8
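In Python, the power set can be generated with the standard itertools recipe; a small sketch:

```python
# Power set via the standard itertools recipe: 2**n subsets for n items.
from itertools import chain, combinations

def powerset(s):
    s = list(s)
    return chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))

print(list(powerset(["Milk", "Diaper"])))
# [(), ('Milk',), ('Diaper',), ('Milk', 'Diaper')]  -> N = 2**2 = 4
```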
Brute-Force Approach to Frequent Itemset Generation

For an itemset with 3 elements, we have 8 subsets. Each subset is a candidate frequent itemset which needs to be matched against each transaction.

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

1-itemsets:
Itemset   Count
{Milk}    4
{Diaper}  4
{Beer}    3

2-itemsets:
Itemset         Count
{Milk, Diaper}  3
{Diaper, Beer}  3
{Beer, Milk}    2

3-itemsets:
Itemset               Count
{Milk, Diaper, Beer}  2
Important observation: counts of subsets can't be smaller than the count of the itemset itself! (See the sketch below.)
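A short sketch of this brute-force matching, reusing the transactions from the table above, confirms both the counts and the observation:

```python
# Brute force: match every non-empty subset of {Milk, Diaper, Beer}
# against each transaction and tally its support count.
from itertools import chain, combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

items = ["Milk", "Diaper", "Beer"]
for sub in chain.from_iterable(combinations(items, k) for k in range(1, 4)):
    count = sum(1 for t in transactions if set(sub) <= t)
    print(set(sub), count)
# No subset's count is smaller than the count of {Milk, Diaper, Beer} (2).
```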
Reducing Number of Candidates
Apriori principle:
- If an itemset is frequent, then all of its subsets must also be frequent.

The Apriori principle holds due to the following property of the support measure:
- The support of an itemset never exceeds the support of its subsets:

  ∀X, Y: (X ⊆ Y) ⇒ s(X) ≥ s(Y)

- This is known as the anti-monotone property of support.
Illustrating Apriori Principle
Minimum support count = 3

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets):
(No need to generate candidates involving Coke or Eggs)
Itemset          Count
{Bread, Milk}    3
{Bread, Beer}    2
{Bread, Diaper}  3
{Milk, Beer}     2
{Milk, Diaper}   3
{Beer, Diaper}   3

Triplets (3-itemsets):
Write all possible 3-itemsets and prune the list based on infrequent 2-itemsets.
Itemset                Count
{Bread, Milk, Diaper}  3

If every subset is considered: 6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41 candidates.
With support-based pruning: 6 + 6 + 1 = 13 candidates.
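A tiny sketch of the pruning at the 1-itemset to 2-itemset step, with counts taken from the table above:

```python
# Support-based pruning of candidate pairs: only items that are frequent
# on their own can appear in a candidate 2-itemset.
from itertools import combinations

counts = {"Bread": 4, "Coke": 2, "Milk": 4, "Beer": 3, "Diaper": 4, "Eggs": 1}
minsup_count = 3

frequent_items = [i for i, c in counts.items() if c >= minsup_count]
candidates = list(combinations(frequent_items, 2))
print(len(candidates))  # 6 candidate pairs instead of 6C2 = 15
```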
Apriori Algorithm
Method:
1. Let k = 1.
2. Generate frequent itemsets of length 1.
3. Repeat until no new frequent itemsets are identified:
   - Generate length-(k+1) candidate itemsets from length-k frequent itemsets.
   - Prune candidate itemsets containing subsets of length k that are infrequent.
   - Count the support of each candidate by scanning the DB.
   - Eliminate candidates that are infrequent, leaving only those that are frequent.

Note: this algorithm makes several passes over the transaction list.
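A compact Python sketch of this method (our own rendering of the steps above, not the textbook's exact pseudocode):

```python
# Apriori frequent itemset generation: grow candidates level by level,
# prune those with an infrequent k-subset, count survivors with a DB scan.
from itertools import combinations

def apriori(transactions, minsup_count):
    items = {i for t in transactions for i in t}
    freq = {frozenset([i]) for i in items
            if sum(1 for t in transactions if i in t) >= minsup_count}
    all_frequent, k = set(freq), 1
    while freq:
        # Join frequent k-itemsets to form (k+1)-candidates
        candidates = {a | b for a in freq for b in freq if len(a | b) == k + 1}
        # Prune candidates that have any infrequent k-subset (Apriori principle)
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k))}
        # One pass over the transaction list to count the survivors
        freq = {c for c in candidates
                if sum(1 for t in transactions if c <= t) >= minsup_count}
        all_frequent |= freq
        k += 1
    return all_frequent
```

With the five transactions above and minsup_count = 3, this returns the nine frequent itemsets implied by the previous slide: four 1-itemsets, four 2-itemsets, and {Bread, Milk, Diaper}.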
Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f -> L - f satisfies the minimum confidence requirement.

If {A,B,C,D} is a frequent itemset, the candidate rules are:
ABC -> D, ABD -> C, ACD -> B, BCD -> A,
A -> BCD, B -> ACD, C -> ABD, D -> ABC,
AB -> CD, AC -> BD, AD -> BC, BC -> AD,
BD -> AC, CD -> AB

If |L| = k, then there are 2^k - 2 candidate association rules (ignoring L -> {} and {} -> L).

Because rules are generated from frequent itemsets, they automatically satisfy the minimum support threshold; rule generation only needs to ensure that the produced rules satisfy the minimum confidence threshold.
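A sketch of the candidate enumeration (every non-empty proper subset f of L becomes a rule f -> L - f):

```python
# Enumerate the 2**k - 2 candidate rules of a frequent itemset L.
from itertools import chain, combinations

L = frozenset({"A", "B", "C", "D"})
proper_subsets = chain.from_iterable(
    combinations(sorted(L), k) for k in range(1, len(L)))
for f in map(frozenset, proper_subsets):
    print(sorted(f), "->", sorted(L - f))
# 2**4 - 2 = 14 candidate rules, matching the list above.
```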
Rule Generation
How can we efficiently generate rules from frequent itemsets?
- In general, confidence does not have an anti-monotone property: c(ABC -> D) can be larger or smaller than c(AB -> D).
- But confidence of rules generated from the same itemset does have an anti-monotone property.
- E.g., for L = {A,B,C,D}: c(ABC -> D) ≥ c(AB -> CD) ≥ c(A -> BCD)
- Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule.

Example: consider the following two rules:
- {Milk} -> {Diaper, Beer} has c_m = 0.5
- {Milk, Diaper} -> {Beer} has c_md = 0.67 > c_m
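A quick check of this property on the running example, with support counts σ taken from the earlier brute-force slide:

```python
# Anti-monotonicity of confidence w.r.t. the RHS: moving items from the
# consequent into the antecedent can only raise (or keep) the confidence.
sigma = {("Milk",): 4, ("Milk", "Diaper"): 3, ("Milk", "Diaper", "Beer"): 2}

c_m  = sigma[("Milk", "Diaper", "Beer")] / sigma[("Milk",)]           # 0.5
c_md = sigma[("Milk", "Diaper", "Beer")] / sigma[("Milk", "Diaper")]  # 0.67
assert c_m <= c_md
```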
Computing Confidence for Rules
Unlike computing support, computing confidence does not require several passes over the transaction list: the supports computed during frequent itemset generation can be reused. The tables below show support values for all the (non-null) subsets of the itemset {Bread, Milk, Diaper}, followed by confidence values for the (2^3 - 2) = 6 rules generated from this frequent itemset.
Itemset   Support
{Milk}    4/5 = 0.8
{Diaper}  4/5 = 0.8
{Bread}   4/5 = 0.8

Itemset          Support
{Milk, Diaper}   3/5 = 0.6
{Diaper, Bread}  3/5 = 0.6
{Bread, Milk}    3/5 = 0.6

Itemset                Support
{Bread, Milk, Diaper}  3/5 = 0.6

Example of rules:
{Bread, Milk} -> {Diaper} (s=0.6, c=1.0)
{Milk, Diaper} -> {Bread} (s=0.6, c=1.0)
{Diaper, Bread} -> {Milk} (s=0.6, c=1.0)
{Bread} -> {Milk, Diaper} (s=0.6, c=0.75)
{Diaper} -> {Milk, Bread} (s=0.6, c=0.75)
{Milk} -> {Diaper, Bread} (s=0.6, c=0.75)
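A small sketch of this reuse (the cached counts below come straight from the tables above):

```python
# Confidence from cached support counts: no extra pass over the data,
# c(X -> Y) is just a ratio of two dictionary lookups.
support_count = {  # counts cached during frequent itemset generation
    frozenset({"Bread"}): 4, frozenset({"Milk"}): 4, frozenset({"Diaper"}): 4,
    frozenset({"Bread", "Milk"}): 3, frozenset({"Milk", "Diaper"}): 3,
    frozenset({"Diaper", "Bread"}): 3,
    frozenset({"Bread", "Milk", "Diaper"}): 3,
}

def confidence(X, Y):
    return support_count[frozenset(X | Y)] / support_count[frozenset(X)]

print(confidence({"Bread", "Milk"}, {"Diaper"}))  # 3/3 = 1.0
print(confidence({"Bread"}, {"Milk", "Diaper"}))  # 3/4 = 0.75
```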
Rule Generation in the Apriori Algorithm

For each frequent k-itemset where k ≥ 2:
- Generate high-confidence rules with one item in the consequent.
- Using these rules, iteratively generate high-confidence rules with more than one item in the consequent.
- If any rule has low confidence, then all the other rules containing its consequent can be pruned (not generated); see the sketch after the example.

Example for {Bread, Milk, Diaper}:

1-item rules:
{Bread, Milk} -> {Diaper}
{Milk, Diaper} -> {Bread}
{Diaper, Bread} -> {Milk}

2-item rules:
{Bread} -> {Milk, Diaper}
{Diaper} -> {Milk, Bread}
{Milk} -> {Diaper, Bread}
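As promised above, a sketch of this pruned generation (helper names are ours; a full implementation would also check that every m-subset of a merged consequent survived, though the anti-monotone property guarantees such rules would fail the confidence test anyway):

```python
# Rule generation with confidence-based pruning: grow consequents level
# by level; a pruned consequent is never used to build a larger one.
def rules_from_itemset(L, support_count, minconf):
    L = frozenset(L)
    rules, consequents = [], [frozenset([i]) for i in L]
    while consequents and len(consequents[0]) < len(L):
        kept = []
        for Y in consequents:
            X = L - Y
            c = support_count[L] / support_count[X]
            if c >= minconf:
                rules.append((set(X), set(Y), c))
                kept.append(Y)  # only surviving consequents are extended
        m = len(consequents[0])
        consequents = list({a | b for a in kept for b in kept
                            if len(a | b) == m + 1})
    return rules
```

Called with the cached counts from the previous sketch and minconf = 0.8, it returns exactly the three c = 1.0 rules; the three 2-item consequents are generated, scored at 0.75, and dropped.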
Evaluation
Support and confidence, as used by Apriori, admit many rules which are not necessarily interesting. There are two options for extracting interesting rules:
- Using subjective knowledge
- Using objective measures (measures better than confidence)

Subjective approaches:
- Visualization: users are allowed to interactively verify the discovered rules.
- Template-based approach: filter out rules that do not fit user-specified templates.
- Subjective interestingness measures: filter out rules that are obvious (bread -> butter) and rules that are non-actionable (do not lead to profits).
Drawback of Confidence
        Coffee  ¬Coffee  Total
Tea     15      5        20
¬Tea    75      5        80
Total   90      10       100
Association rule: Tea -> Coffee

Confidence = P(Coffee|Tea) = 15/20 = 0.75, but P(Coffee) = 0.9.

Although confidence is high, the rule is misleading: P(Coffee|¬Tea) = 75/80 = 0.9375.
Objective Measures
Confidence estimates rule quality relative to the antecedent's support but ignores the consequent's support. As seen on the previous slide, the support of the consequent (P(Coffee)) is higher than the rule's confidence (P(Coffee|Tea)).

Weka uses other objective measures:
- Lift(A -> B) = confidence(A -> B) / support(B) = support(A -> B) / (support(A) * support(B))
- Leverage(A -> B) = support(A -> B) - support(A) * support(B)
- Conviction(A -> B) = support(A) * support(¬B) / support(A -> ¬B); conviction inverts the lift ratio and also computes support for the RHS not being true.
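A quick sketch computing these measures for Tea -> Coffee from the contingency table above (supports given as fractions of the 100 transactions):

```python
# Lift, leverage, and conviction for Tea -> Coffee.
s_tea, s_coffee, s_both = 0.20, 0.90, 0.15  # from the contingency table

confidence = s_both / s_tea                              # 0.75
lift       = s_both / (s_tea * s_coffee)                 # ~0.83 (< 1)
leverage   = s_both - s_tea * s_coffee                   # -0.03 (< 0)
conviction = s_tea * (1 - s_coffee) / (s_tea - s_both)   # 0.40 (< 1)
print(confidence, lift, leverage, conviction)
```

All three measures flag the rule as uninteresting (lift < 1, leverage < 0, conviction < 1) despite its 0.75 confidence.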