A Parameterised Algorithm for Mining Association Rules
![Page 1: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/1.jpg)
1
A Parameterised Algorithm for Mining Association Rules
Department of Information & Computer Education, NTNU
Nuansri Denwattana and Janusz R. Getta, Proceedings of the 12th Australasian Database Conference (ADC 2001), 29 Jan.-2 Feb. 2001, pp. 45-51.
Advisor: Jia-Ling Koh
Speaker: Chen-Yi Lin
![Page 2: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/2.jpg)
2
Outline
Introduction
Problem Definition
Finding Frequent Itemsets
Experimental Results
Conclusion
![Page 3: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/3.jpg)
3
Introduction (1/2)
Most algorithms for finding frequent itemsets, e.g. the Apriori algorithm, count only one category of itemsets.
The quality of an association rule mining algorithm is determined by:
– the number of passes through the input data set
– the number of candidate itemsets
![Page 4: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/4.jpg)
4
Introduction (2/2)
One of the objectives is to construct an algorithm that makes a good guess about which itemsets are frequent:
– the parameterised (n, p) algorithm finds all frequent itemsets from a range of n levels of the itemset lattice in p passes (n >= p) through the input data set.
![Page 5: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/5.jpg)
5
Problem Definition
Positive candidate itemset (C+) – an itemset assumed (guessed) to be frequent.
Negative candidate itemset (C-) – an itemset assumed (guessed) to be not frequent.
Remaining candidate itemset (RC) – a candidate itemset that is verified in another scan.
![Page 6: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/6.jpg)
6
Finding Frequent Itemsets (Guessing Candidate Itemsets)
TID Items
1 ABC
2 ABE
3 BCF
4 BDE
5 ACE
6 ABCD
7 ABCE
8 ABCEF
9 ABCEF
10 BCDEF
Statistics table T (item frequency according to transaction length):

| Item | 3-element trs. | 4-element trs. | 5-element trs. | Total freq. |
|---|---|---|---|---|
| A | 3 | 2 | 2 | 7 |
| B | 4 | 2 | 3 | 9 |
| C | 3 | 2 | 3 | 9 |
| D | 1 | 1 | 1 | 3 |
| E | 3 | 1 | 3 | 7 |
| F | 1 | 0 | 3 | 4 |
| No. of m-element trs. | 5 | 2 | 3 | 10 |

Initial DB scan (1 scan): L1 = {A, B, C, D, E, F}
![Page 7: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/7.jpg)
7
Input: L1 = {A, B, C, D, E, F} and statistics table T
Parameters: item frequency threshold = 80%; m-element transaction threshold = 5; number of levels to traverse (n) = 3; number of passes through the input data set (p) = 2
apriori_gen: L1 → C2 = {AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF}
Items guessed to be frequent per transaction length (number of m-element transactions * 80%, rounded up):
– 3-element transactions: 5*80% = 4 → {B}
– 4-element transactions: 2*80% = 1.6 → 2 → {A, B, C}
– 5-element transactions: 3*80% = 2.4 → 3 → {B, C, E, F}
Positive candidates: C2+ = {AB, AC, AE, AF, BC, BE, BF, CE, CF, EF}
Negative candidates: C2- = {AD, BD, CD, DE, DF}
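The guessing step can be sketched in Python as below. The slide shows the per-length thresholds and the resulting C2+ and C2-, but not the exact rule linking them; this sketch assumes, as a labelled guess that happens to reproduce the slide's sets, that a 2-itemset is positive when all of its items were guessed frequent for some transaction length. The "m-element transaction threshold = 5" parameter is not modelled, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

DB = ["ABC", "ABE", "BCF", "BDE", "ACE",
      "ABCD", "ABCE", "ABCEF", "ABCEF", "BCDEF"]
ITEM_FREQ_PCT = 80  # item frequency threshold from the slide


def statistics_table(db):
    """table[item][m] = number of m-element transactions containing item."""
    table, n_by_len = defaultdict(Counter), Counter()
    for t in db:
        n_by_len[len(t)] += 1
        for item in set(t):
            table[item][len(t)] += 1
    return table, n_by_len


def guessed_items(table, n_by_len, pct):
    """Per length m: items in at least ceil(n_m * pct / 100) m-element transactions."""
    guesses = {}
    for m, n_m in sorted(n_by_len.items()):
        cutoff = -(-n_m * pct // 100)  # integer ceiling, avoids float rounding
        guesses[m] = {i for i in table if table[i][m] >= cutoff}
    return guesses


def split_candidates(candidates, positive_items):
    """ASSUMPTION (not stated on the slide): a candidate is positive when every
    item in it was guessed frequent for some transaction length."""
    positive = [c for c in candidates if set(c) <= positive_items]
    negative = [c for c in candidates if not set(c) <= positive_items]
    return positive, negative


T, n_by_len = statistics_table(DB)
guesses = guessed_items(T, n_by_len, ITEM_FREQ_PCT)
positive_items = set().union(*guesses.values())
L1 = [i for i in sorted(T) if sum(T[i].values()) * 100 >= 20 * len(DB)]
C2 = ["".join(p) for p in combinations(L1, 2)]  # apriori_gen on L1 = all pairs
C2_pos, C2_neg = split_candidates(C2, positive_items)
print(guesses)
print("C2+ =", C2_pos)
print("C2- =", C2_neg)
```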
![Page 8: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/8.jpg)
8
apriori_gen: C2+ = {AB, AC, AE, AF, BC, BE, BF, CE, CF, EF} → C3+ = {ABC, ABE, ABF, ACE, ACF, AEF, BCE, BCF, BEF, CEF}
apriori_gen: C3+ → C4+ = {ABCE, ABCF, ABEF, ACEF, BCEF}
Pruning all subsets of positive supersets: every itemset in C2+ and C3+ is a subset of an itemset in C4+, so both become empty.
Candidates left: C2+ = C3+ = ∅, C4+ = {ABCE, ABCF, ABEF, ACEF, BCEF}; C2- = {AD, BD, CD, DE, DF}, C3- = C4- = ∅
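The generation and pruning of positive candidates can be sketched as follows, assuming `apriori_gen` is the standard Apriori candidate generation (join, then subset check) and writing itemsets as sorted strings. The helper `prune_subsets_of_positive` is a hypothetical name for the "pruning all subsets of positive superset" step.

```python
from itertools import combinations


def apriori_gen(prev):
    """Apriori candidate generation over itemsets written as sorted strings:
    join two (k-1)-itemsets sharing their first k-2 items, then keep the
    candidate only if all of its (k-1)-subsets are in prev."""
    prev = sorted(set(prev))
    prev_set = set(prev)
    out = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                cand = a + b[-1]
                subsets = ("".join(s) for s in combinations(cand, len(cand) - 1))
                if all(s in prev_set for s in subsets):
                    out.append(cand)
    return out


def prune_subsets_of_positive(levels, top):
    """Drop every itemset that is a subset of some itemset in top; such itemsets
    are implied frequent if the positive guesses in top are confirmed."""
    top_sets = [set(t) for t in top]
    return {k: [c for c in cands if not any(set(c) <= t for t in top_sets)]
            for k, cands in levels.items()}


C2_pos = ["AB", "AC", "AE", "AF", "BC", "BE", "BF", "CE", "CF", "EF"]
C3_pos = apriori_gen(C2_pos)
C4_pos = apriori_gen(C3_pos)
print("C3+ =", C3_pos)
print("C4+ =", C4_pos)
print("after pruning:", prune_subsets_of_positive({2: C2_pos, 3: C3_pos}, C4_pos))
```

Only the five 4-itemsets then need counting, instead of the 10 pairs and 10 triples they cover.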
![Page 9: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/9.jpg)
9
Finding Frequent Itemsets (Verification of Candidate Itemsets)
Minimum support = 20%
Candidates: C2+ = C3+ = ∅, C4+ = {ABCE, ABCF, ABEF, ACEF, BCEF}; C2- = {AD, BD, CD, DE, DF}, C3- = C4- = ∅
scan DB (1): L2 = {BD, CD, DE}, L3 = ∅, L4 = {ABCE, ABCF, ABEF, ACEF, BCEF}
generate remaining candidate itemsets: RC2 = ∅, RC3 = {BCD, BDE, CDE}, RC4 = {BCDE, BCDF}
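The first verification scan can be sketched in Python as below, with minimum support 20% of 10 transactions taken as a count of 2. The slide does not spell out how remaining candidates are built; this sketch assumes RC3 consists of the 3-itemsets whose 2-subsets are all known to be frequent but whose own frequency is not yet known. That reproduces RC3 on the slide; the slide's RC4 = {BCDE, BCDF} follows a rule the transcript does not detail, so the sketch stops at RC3. Helper names are hypothetical.

```python
from itertools import combinations

DB = ["ABC", "ABE", "BCF", "BDE", "ACE",
      "ABCD", "ABCE", "ABCEF", "ABCEF", "BCDEF"]
MIN_COUNT = 2  # minimum support 20% of 10 transactions


def support(itemset, db):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in db if set(itemset) <= set(t))


def subsets_of(itemsets, k):
    """All k-item subsets of the given itemsets, as sorted strings."""
    return {"".join(s) for t in itemsets for s in combinations(t, k)}


# scan DB (1): verify the surviving positive and negative candidates together
C4_pos = ["ABCE", "ABCF", "ABEF", "ACEF", "BCEF"]
C2_neg = ["AD", "BD", "CD", "DE", "DF"]
L4 = [c for c in C4_pos if support(c, DB) >= MIN_COUNT]
L2_neg = [c for c in C2_neg if support(c, DB) >= MIN_COUNT]

# Frequent itemsets known so far: subsets of confirmed positives plus the
# negatives that turned out to be frequent.
known_L2 = subsets_of(L4, 2) | set(L2_neg)
known_L3 = subsets_of(L4, 3)

# ASSUMPTION: RC3 = 3-itemsets whose 2-subsets are all known frequent but
# whose own frequency is not yet known.
items = sorted(set("".join(known_L2)))
RC3 = ["".join(c) for c in combinations(items, 3)
       if all("".join(p) in known_L2 for p in combinations(c, 2))
       and "".join(c) not in known_L3]
print("L4 =", L4)
print("L2 (from negatives) =", L2_neg)
print("RC3 =", RC3)
```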
![Page 10: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/10.jpg)
10
scan DB (2):
L2 = {AB, AC, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, EF}
L3 = {ABC, ABE, ABF, ACE, ACF, AEF, BCD, BCE, BCF, BDE, BEF, CEF}
L4 = {ABCE, ABCF, ABEF, ACEF, BCEF}
apriori_gen: L4 → C5 = {ABCEF}
scan DB: L5 = {ABCEF}
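The second scan and the final level can be checked with the sketch below. It copies RC3 and RC4 from the slide, assumes the standard Apriori `apriori_gen`, and assembles each complete level as the implied subsets of confirmed itemsets plus the verified candidates; the helper names are hypothetical.

```python
from itertools import combinations

DB = ["ABC", "ABE", "BCF", "BDE", "ACE",
      "ABCD", "ABCE", "ABCEF", "ABCEF", "BCDEF"]
MIN_COUNT = 2  # minimum support 20% of 10 transactions


def support(itemset, db):
    return sum(1 for t in db if set(itemset) <= set(t))


def subsets_of(itemsets, k):
    return {"".join(s) for t in itemsets for s in combinations(t, k)}


def apriori_gen(prev):
    """Standard Apriori join + subset check on sorted-string itemsets."""
    prev = sorted(set(prev))
    prev_set = set(prev)
    out = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                cand = a + b[-1]
                if all("".join(s) in prev_set
                       for s in combinations(cand, len(cand) - 1)):
                    out.append(cand)
    return out


L4 = ["ABCE", "ABCF", "ABEF", "ACEF", "BCEF"]
L2_neg = ["BD", "CD", "DE"]
RC3, RC4 = ["BCD", "BDE", "CDE"], ["BCDE", "BCDF"]  # copied from the slide

# scan DB (2): count the remaining candidates
L3_rc = [c for c in RC3 if support(c, DB) >= MIN_COUNT]
L4 += [c for c in RC4 if support(c, DB) >= MIN_COUNT]

# Complete levels: implied subsets of confirmed itemsets plus verified ones.
L2 = sorted(subsets_of(L4, 2) | subsets_of(L3_rc, 2) | set(L2_neg))
L3 = sorted(subsets_of(L4, 3) | set(L3_rc))

# apriori_gen on L4, then one more scan for C5
C5 = apriori_gen(L4)
L5 = [c for c in C5 if support(c, DB) >= MIN_COUNT]
print("L2 =", L2)
print("L3 =", L3)
print("C5 =", C5, "L5 =", L5)
print("C6 =", apriori_gen(L5))  # empty, so the search ends
```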
![Page 11: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/11.jpg)
11
Finding Frequent Itemsets
![Page 12: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/12.jpg)
12
Experimental Results (1/6)
Parameters:
– ntrans: number of transactions in a database
– tl: average transaction length
– np: number of patterns
– sup: minimum support
![Page 13: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/13.jpg)
13
Experimental Results (2/6)
A comparison of the number of database scans between Apriori and the (n, p) algorithm
![Page 14: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/14.jpg)
14
Experimental Results (3/6)
Performance of Apriori and the (n, p) algorithm with tl=10, np=10, sup=20%
![Page 15: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/15.jpg)
15
Experimental Results (4/6)
Performance of Apriori and the (n, p) algorithm with tl=14, np=10, sup=20%
Performance of Apriori and the (n, p) algorithm with tl=20, np=100, sup=10%
![Page 16: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/16.jpg)
16
Experimental Results (5/6)
Performance of the (n, 3) algorithm with an increasing ratio n/p
![Page 17: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/17.jpg)
17
Experimental Results (6/6)
Performance of the (8, p) algorithm with an increasing parameter p
![Page 18: A Parameterised Algorithm for Mining Association Rules](https://reader035.fdocuments.us/reader035/viewer/2022062807/568150f7550346895dbf12c1/html5/thumbnails/18.jpg)
18
Conclusion
The main contribution is the reduction in the number of scans through a data set.