Mining Multiple-level Association Rules in Large Databases
Authors:
JIAWEI HAN, Simon Fraser University, British Columbia.
YONGJIAN FU, University of Missouri-Rolla, Missouri.
Presenter : Zhenyu Lu
(based on Mohammed’s previous slides, with some changes)
IEEE Transactions on Knowledge and Data Engineering, 1999
Outline
Introduction
Algorithm
Performance studies
Cross-level association
Filtering of uninteresting association rules
Conclusions
Introduction: Why Multiple-Level Association Rules?
TID items
T1 {m1, b2}
T2 {m2, b1}
T3 {b2}
Frequent itemset: {b2}
Association rules: none
Is this database useless?
Introduction: Why Multiple-Level Association Rules?
What if we have this abstraction tree?
food
  milk: m1, m2
  bread: b1, b2
Replacing every item with its parent in the tree gives:
TID items
T1 {milk, bread}
T2 {milk, bread}
T3 {bread}
minisup = 50%, miniconf = 50%
Frequent itemset: {milk, bread}
Association rules: milk <=> bread
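The effect of the abstraction tree can be checked with a few lines of Python. This is an illustrative sketch only: the `PARENT` map and the `generalize` and `support` helpers are names assumed here, not part of the slides.

```python
# Hypothetical parent map for the abstraction tree on the slide.
PARENT = {"m1": "milk", "m2": "milk", "b1": "bread", "b2": "bread"}

# The three primitive-level transactions from the slide.
transactions = {"T1": {"m1", "b2"}, "T2": {"m2", "b1"}, "T3": {"b2"}}


def generalize(items):
    """Replace every primitive item with its parent in the tree."""
    return {PARENT[i] for i in items}


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db.values()) / len(db)


# Generalized (level-1) view of the same database.
level1 = {tid: generalize(items) for tid, items in transactions.items()}

print(support({"m1", "b2"}, transactions))  # a primitive-level pair
print(support({"milk", "bread"}, level1))   # the generalized pair
```

At the primitive level no pair of items reaches the 50% threshold, while the generalized pair {milk, bread} does.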
Introduction: Why Multiple-Level Association Rules?
• Sometimes data shows no significant pattern at the primitive level, yet useful information is hidden behind it.
• The goal of multiple-level association analysis is to find this hidden information within or between levels of abstraction.
Introduction: Requirements in Multiple-Level Association Analysis
Two general requirements for multiple-level association rule mining:
1) Provide data at multiple levels of abstraction. (a common practice now)
2) Find efficient methods for multiple-level rule mining. (our focus)
Algorithm: observation
Abstraction tree (Level 1: milk, bread; Level 2: m1, m2, b1, b2):
food
  milk: m1, m2
  bread: b1, b2
minisup = 50%, miniconf = 50%
Level 2:
TID items
T1 {m1, b2}
T2 {m2, b1}
T3 {b2}
T4 {m2, b1}
T5 {m2}
Frequent itemset: {m2}
Association rules: none
Level 1:
TID items
T1 {milk, bread}
T2 {milk, bread}
T3 {bread}
T4 {milk, bread}
T5 {milk}
Frequent itemset: {milk, bread}
Association rule: milk <=> bread
One minisup for all levels?
What about {m2, b1}?
Algorithm: observation
miniconf = 50%
food
  milk: m1, m2
  bread: b1, b2
Level 1: minisup = 50%
Frequent itemset: {milk, bread}
Association rule: milk <=> bread
Level 2: minisup = 40%
Frequent items: m2, b1, b2; frequent 2-itemset: {m2, b1}
Association rule: m2 <=> b1
This makes more sense now.
Algorithm: observation
Drawbacks of using only one minisup:
• If the minisup is too high, we lose information from the lower levels.
• If the minisup is too low, we gain too many rules from the higher levels, many of which are useless.
Approach: use a separate minisup for each level, ascending from the lowest level of the tree to the highest.
food
  milk: m1, m2
  bread: b1, b2
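The difference between one shared threshold and level-specific thresholds can be reproduced on the five-transaction example. The sketch below is a brute-force illustration, not the paper's algorithm; `frequent_itemsets`, `PARENT`, and `DB` are names assumed here.

```python
from itertools import combinations

# Hypothetical parent map and the five transactions from the observation slide.
PARENT = {"m1": "milk", "m2": "milk", "b1": "bread", "b2": "bread"}
DB = [{"m1", "b2"}, {"m2", "b1"}, {"b2"}, {"m2", "b1"}, {"m2"}]


def frequent_itemsets(db, minsup, max_size=2):
    """Brute-force frequent itemsets up to max_size (fine for toy data only)."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            sup = sum(set(combo) <= t for t in db) / len(db)
            if sup >= minsup:
                result[combo] = sup
    return result


# Level 1: generalize each item to its parent, then use minisup = 50%.
level1_db = [{PARENT[i] for i in t} for t in DB]
level1 = frequent_itemsets(level1_db, 0.5)

# Level 2 with the same 50% threshold, and with the reduced 40% threshold.
level2_strict = frequent_itemsets(DB, 0.5)
level2_relaxed = frequent_itemsets(DB, 0.4)

print(sorted(level2_strict))   # only m2 survives at 50%
print(sorted(level2_relaxed))  # b1, b2, m2 and the pair (b1, m2) survive at 40%
```

With the lower threshold applied only at level 2, the pair {m2, b1} becomes frequent without flooding level 1 with extra itemsets.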
Algorithm: An Example
An entry of the sales_transaction table:
Transaction_id  Bar_code_set
351428          {17325, 92108, 55349, 88157, …}
A sales_item description relation:
Bar_code  category  Brand     Content  Size   Storage_pd  price
17325     Milk      Foremost  2%       1 ga.  14 (days)   $3.89
Algorithm: An Example
Encode the database with layer information:
GID  bar_code  category  content  brand
112  17325     Milk      2%       Foremost
food
  milk
    2%: Dairyland, Foremost
    chocolate
  bread
    white
    wheat
Reading GID 112:
First digit 1: implies milk
Second digit 1: implies 2% content
Third digit 2: implies Foremost brand
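The digit-by-digit encoding makes generalization a simple prefix operation. The following sketch assumes three-digit string codes as in the example; the `ancestor` and `decode` helpers and the lookup tables are hypothetical and contain only the codes explained on the slide.

```python
def ancestor(gid: str, level: int) -> str:
    """Generalize an encoded item to the given level, e.g. ('112', 2) -> '11*'."""
    return gid[:level] + "*" * (len(gid) - level)


# Hypothetical lookup tables; only the codes explained on the slide are filled in.
LEVEL1 = {"1": "milk"}        # first digit: category
LEVEL2 = {"11": "2%"}         # first two digits: content within the category
LEVEL3 = {"112": "Foremost"}  # all three digits: brand


def decode(gid: str):
    """Return the (category, content, brand) names for an encoded item."""
    return LEVEL1[gid[0]], LEVEL2[gid[:2]], LEVEL3[gid]


print(ancestor("112", 1))  # level-1 ancestor
print(ancestor("112", 2))  # level-2 ancestor
print(decode("112"))
```

Because an item's ancestors are just its prefixes, a single encoded table can serve every level of the hierarchy.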
Algorithm: An Example
Encoded Transaction Table: T[1]
TID Items
T1 {111,121,211,221}
T2 {111,211,222,323}
T3 {112,122,221,411}
T4 {111,121}
T5 {111,122,211,221,413}
T6 {211,323,524}
T7 {323,411,524,713}
Algorithm: An Example
Use Apriori on each level.
Level-1 minsup = 4
L[1,1] (the frequent 1-itemsets on level 1):
Itemset Support
{1**} 5
{2**} 5
L[1,2]:
Itemset Support
{1**,2**} 4
T[2] (only keep items in L[1,1] from T[1]):
TID Items
T1 {111,121,211,221}
T2 {111,211,222}
T3 {112,122,221}
T4 {111,121}
T5 {111,122,211,221}
T6 {211}
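The filtering step that turns T[1] into T[2] can be sketched as follows. The helper names `level1_counts` and `filter_table` are assumptions made for this illustration; the data is the encoded table from the example.

```python
from collections import Counter

# Encoded transaction table T[1] from the example.
T1_TABLE = {
    "T1": ["111", "121", "211", "221"],
    "T2": ["111", "211", "222", "323"],
    "T3": ["112", "122", "221", "411"],
    "T4": ["111", "121"],
    "T5": ["111", "122", "211", "221", "413"],
    "T6": ["211", "323", "524"],
    "T7": ["323", "411", "524", "713"],
}


def level1_counts(table):
    """Count, per level-1 category (first digit), the transactions containing it."""
    counts = Counter()
    for items in table.values():
        counts.update({item[0] for item in items})  # once per transaction
    return counts


def filter_table(table, frequent_level1):
    """Keep only items whose level-1 ancestor is frequent; drop emptied transactions."""
    filtered = {}
    for tid, items in table.items():
        kept = [i for i in items if i[0] in frequent_level1]
        if kept:
            filtered[tid] = kept
    return filtered


counts = level1_counts(T1_TABLE)
frequent = {digit for digit, c in counts.items() if c >= 4}  # level-1 minsup = 4
T2_TABLE = filter_table(T1_TABLE, frequent)
print(T2_TABLE)
```

Transaction T7 disappears entirely because none of its items belongs to a frequent level-1 category.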
Algorithm: An Example
Level-2 minsup = 3
L[2,1]:
Itemset Support
{11*} 5
{12*} 4
{21*} 4
{22*} 4
L[2,2]:
Itemset Support
{11*,12*} 4
{11*,21*} 3
{11*,22*} 4
{12*,22*} 3
{21*,22*} 3
L[2,3]:
Itemset Support
{11*,12*,22*} 3
{11*,21*,22*} 3
Frequent Item Sets at Level 3
Level-3 minsup = 3
L[3,1]:
Itemset Support
{111} 4
{211} 4
{221} 3
L[3,2]:
Itemset Support
{111,211} 3
E.g.
Level 1: 80% of customers who purchase milk also purchase bread: milk => bread with confidence = 80%.
Level 2: 75% of people who buy 2% milk also buy wheat bread: 2% milk => wheat bread with confidence = 75%.
Only T[1] and T[2] are generated; all frequent itemsets from level 2 onward are derived from T[2].
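Confidence figures such as those quoted above are ratios of support counts. A small sketch using the T[2] table from the running example; the rule 111 => 211 is chosen here purely for illustration, and the helper names are assumptions.

```python
# Filtered transaction table T[2] from the example.
T2_TABLE = [
    {"111", "121", "211", "221"},
    {"111", "211", "222"},
    {"112", "122", "221"},
    {"111", "121"},
    {"111", "122", "211", "221"},
    {"211"},
]


def support_count(itemset, table):
    """Number of transactions containing every item of itemset."""
    return sum(itemset <= t for t in table)


def confidence(antecedent, consequent, table):
    """confidence(A => B) = support(A and B together) / support(A)."""
    return support_count(antecedent | consequent, table) / support_count(antecedent, table)


conf = confidence({"111"}, {"211"}, T2_TABLE)
print(conf)
```

A rule is reported as strong only if this ratio reaches miniconf and the itemset behind it is frequent at its level.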
Algorithm ML_T2L1
Purpose: To find multiple-level frequent item sets for mining strong association rules in a transaction database.
Input:
• T[1]: a hierarchy-information-encoded transaction table of the form <TID, Item-set>
• The minisup threshold for each level L, in the form minsup[L]
Output: Multiple-level frequent item sets
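The progressive deepening described above can be sketched in Python. This is a simplified illustration written for these notes, not the authors' implementation: the function names, the dictionary layout of the result, and the use of absolute support counts are all assumptions. It filters T[1] into T[2] after L[1,1] is found, runs an Apriori-style loop per level, and at each deeper level examines only children of frequent parents.

```python
from itertools import combinations


def generalize(item, level):
    """'112' at level 2 -> '11*'; also maps '11*' at level 1 -> '1**'."""
    return item[:level] + "*" * (len(item) - level)


def count(itemset, table, level):
    """Transactions whose level-`level` generalization contains itemset."""
    return sum(itemset <= {generalize(i, level) for i in t} for t in table)


def ml_t2l1(t1, minsup, max_level=3):
    """Return {(level, k): {itemset: support}}; minsup maps level -> count."""
    frequent = {}
    table = t1
    for level in range(1, max_level + 1):
        items = {generalize(i, level) for t in table for i in t}
        if level > 1:
            # Only children of frequent items at the previous level are examined.
            parents = {next(iter(s)) for s in frequent.get((level - 1, 1), {})}
            items = {i for i in items if generalize(i, level - 1) in parents}
        current = {frozenset([i]): count({i}, table, level) for i in items}
        current = {s: n for s, n in current.items() if n >= minsup[level]}
        k = 1
        while current:
            frequent[(level, k)] = current
            if level == 1 and k == 1:
                # Build T[2]: drop items whose level-1 ancestor is infrequent.
                keep = {next(iter(s)) for s in current}
                table = [[i for i in t if generalize(i, 1) in keep] for t in t1]
                table = [t for t in table if t]
            k += 1
            # Apriori candidates: every (k-1)-subset must already be frequent.
            pool = sorted(set().union(*current))
            candidates = [frozenset(c) for c in combinations(pool, k)
                          if all(frozenset(s) in current
                                 for s in combinations(c, k - 1))]
            current = {c: count(c, table, level) for c in candidates}
            current = {s: n for s, n in current.items() if n >= minsup[level]}
    return frequent


T1 = [
    ["111", "121", "211", "221"],
    ["111", "211", "222", "323"],
    ["112", "122", "221", "411"],
    ["111", "121"],
    ["111", "122", "211", "221", "413"],
    ["211", "323", "524"],
    ["323", "411", "524", "713"],
]
F = ml_t2l1(T1, {1: 4, 2: 3, 3: 3})
for key in sorted(F):
    print(key, {tuple(sorted(s)): n for s, n in F[key].items()})
```

Run on the example table with the thresholds 4, 3, 3, the sketch yields the same L[l,k] tables as the worked example.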
Algorithm variations
Algorithm ML_T1LA
• Uses only the first encoded transaction table T[1]. Support for the candidate sets at all levels is computed at the same time.
• Pros: Only one table and at most k scans.
• Cons: T[1] may contain infrequent items, and the approach requires large space.
Algorithm ML_TML1
• Generates multiple encoded transaction tables T[1], …, T[max_l+1].
• Pros: May save a substantial amount of processing.
• Cons: Can be inefficient if only a few items are filtered out at each level processed.
Algorithm ML_T2LA
• Uses two encoded transaction tables, as in the ML_T2L1 algorithm. Support for the candidate sets at all levels is computed at the same time.
• Pros: Potentially efficient if T[2] contains far fewer items than T[1].
• Cons: ?
Performance Study
Assumptions:
• The maximal level in the concept hierarchy is 3.
• Two data sets are used: DB1 (average frequent item length = 4, average transaction size = 10) and DB2 (average frequent item length = 6, average transaction size = 20).
Conclusions:
• The relative performance of the four algorithms depends heavily on the threshold setting (i.e., the power of the filter at each level).
• Parallel derivation of L[l,k] is useful, and deriving a transaction table T[2] is usually beneficial.
• ML_T1LA is found to be the best or the second-best algorithm.
Performance Study
[Figure: two plots. First: average frequent item length = 4, average transaction size = 10, with minisup[2] = 2% and minisup[3] = 0.75%. Second: average frequent item length = 6, average transaction size = 20, with minisup[2] = 3% and minisup[3] = 1%.]
Performance Study
[Figure: two plots. First: average frequent item length = 4, average transaction size = 10, with minisup[1] = 60% and minisup[3] = 0.75%. Second: average frequent item length = 6, average transaction size = 20, with minisup[1] = 55% and minisup[3] = 1%.]
Performance Study
[Figure: two plots. First: minisup[1] = 60% and minisup[2] = 2%. Second: minisup[1] = 55% and minisup[2] = 3%.]
Performance Study
Two interesting performance features:
• The performance of the algorithms depends heavily on minisup, especially minisup[1] and minisup[2].
• Deriving T[2] is beneficial.
Cross-level association
food
  milk: m1, m2
  bread: b1, b2
Level by level: mine rules like milk => bread and m2 => b1.
Expanded across levels: mine rules like milk => b1.
Cross-level association
Two adjustments:
• A single minisup is used at all levels.
• When the frequent k-itemsets are generated, items at all levels are considered; itemsets that contain both an item and its ancestor are excluded.
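The ancestor exclusion in the second adjustment can be sketched with the prefix encoding from the earlier example. The helper names `is_ancestor` and `cross_level_ok` are hypothetical, and items are assumed to be three-character codes padded with `*`.

```python
from itertools import combinations


def is_ancestor(a, b):
    """True if encoded item a is a proper ancestor of b, e.g. '1**' of '112'."""
    prefix = a.rstrip("*")
    return a != b and "*" in a and b.startswith(prefix)


def cross_level_ok(itemset):
    """Reject candidate itemsets that mix an item with one of its ancestors."""
    return not any(is_ancestor(x, y) or is_ancestor(y, x)
                   for x, y in combinations(itemset, 2))


print(cross_level_ok({"1**", "211"}))  # items from different branches
print(cross_level_ok({"1**", "111"}))  # an item together with its own ancestor
```

An itemset such as {1**, 111} carries no extra information, since every transaction containing 111 also contains its ancestor 1**.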
Filtering of uninteresting association rules
Removal of redundant rules:
• To remove redundant rules, when a rule R passes the minimum confidence test, it is checked against every strong rule R' of which R is a descendant. If the confidence of R falls within the given variation of its expected confidence, R is removed.
• Example:
  • milk => bread (12% support, 85% confidence)
  • chocolate milk => bread (1% support, 84% confidence)
  • The second rule is not interesting if 8% of milk is chocolate milk.
• Can reduce the number of rules by 30% to 60%.
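The milk example can be worked through numerically. The sketch below rests on two assumptions that the slides do not spell out: that the expected confidence of a descendant rule equals its ancestor's confidence, and that its expected support is the ancestor's support scaled by the descendant item's share. The variation of 0.05 is a hypothetical value chosen for illustration.

```python
def expected_support(ancestor_support, share):
    """Assumed model: descendant support scales with the item's share of its parent."""
    return ancestor_support * share


def is_redundant(conf, ancestor_conf, variation):
    """Assumed model: expected confidence equals the ancestor rule's confidence."""
    return abs(conf - ancestor_conf) <= variation


# milk => bread: 12% support, 85% confidence; 8% of milk is chocolate milk.
exp_sup = expected_support(0.12, 0.08)
redundant = is_redundant(0.84, 0.85, variation=0.05)  # hypothetical variation

print(round(exp_sup, 4))  # expected support of chocolate milk => bread
print(redundant)
```

The observed 1% support and 84% confidence are what the ancestor rule already predicts, so the descendant rule adds nothing new.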
Filtering of uninteresting association rules (continued)
Removal of unnecessary rules:
• To filter out unnecessary association rules, for each strong rule R': A => B, we test every rule R: A - C => B, where C is a subset of A. If the confidence of R is not significantly different from that of R', the more specific rule (the one with the extra items C in its antecedent) is removed.
• Example:
  • 80% of customers who buy milk also buy bread: milk => bread
  • 80% of customers who buy milk and butter also buy bread: milk + butter => bread
• Reduces the number of rules by 50% to 80%.
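A minimal sketch of this pruning step follows, dropping the rule with the extra antecedent items as in the milk + butter example. The rule representation, the `prune_unnecessary` name, and the tolerance of 0.02 are assumptions made for illustration.

```python
def prune_unnecessary(rules, tolerance):
    """Drop rules whose antecedent can be shortened without changing confidence.

    rules maps (antecedent, consequent) pairs of frozensets to a confidence.
    """
    kept = dict(rules)
    for (ante, cons), conf in rules.items():
        for (ante2, cons2), conf2 in rules.items():
            # ante2 < ante: a strictly smaller antecedent with the same consequent.
            if cons2 == cons and ante2 < ante and abs(conf - conf2) <= tolerance:
                kept.pop((ante, cons), None)
                break
    return kept


rules = {
    (frozenset({"milk"}), frozenset({"bread"})): 0.80,
    (frozenset({"milk", "butter"}), frozenset({"bread"})): 0.80,
}
pruned = prune_unnecessary(rules, tolerance=0.02)  # hypothetical tolerance
print(pruned)
```

Only milk => bread remains, because adding butter to the antecedent does not change the confidence.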
Conclusions
Extended the association rules from single-level to multiple-level.
A top-down progressive deepening technique is developed for mining multiple-level association rules.
Filtering of uninteresting association rules.
Exam Questions
Q1: Give an example of multilevel association rules.
A: Besides finding that 80% of customers who purchase milk may also purchase bread, it is interesting to let users drill down and see that 75% of people who buy 2% milk also buy wheat bread.
Exam Questions
Q2: What are the problems in using the normal Apriori method?
A: One may apply the Apriori algorithm to examine data items at multiple levels of abstraction under the same minimum support and minimum confidence thresholds. This approach is simple, but it may lead to some undesirable results.
First, large support is more likely to exist at high levels of abstraction. If one wants to find strong associations at relatively low levels of abstraction, the minimum support threshold must be reduced substantially; this may lead to the generation of many uninteresting associations at high or intermediate levels.
Second, since it is unlikely that many strong association rules will be found at a primitive concept level, mining strong associations should be performed at a rather high concept level, which is actually the case in many studies. However, mining association rules at high concept levels often yields rules that correspond to prior knowledge and expectations, such as “milk => bread” (which could be common sense), or yields uninteresting attribute combinations if the minimum support is allowed to be rather small, such as “toy => milk” (which may just happen together by chance).
Exam Questions
Q3: What are the two general requirements for multiple-level association rule mining?
A: To explore multiple-level association rule mining, one needs to provide:
1) Data at multiple levels of abstraction, and
2) Efficient methods for multiple-level rule mining.