Data Mining: Association Rules · informatique.umons.ac.be/ssi/teaching/dwdm/ass.pdf · 2018. 11....
![Page 1: Data Mining: Association Rulesinformatique.umons.ac.be/ssi/teaching/dwdm/ass.pdf · 2018. 11. 26. · Data Mining: Association Rules Author: Jef Wijsen Created Date: 11/26/2018 2:24:06](https://reader036.fdocuments.us/reader036/viewer/2022071419/61180a64c540cd36ca730dbe/html5/thumbnails/1.jpg)
Data Mining: Association Rules
Jef Wijsen
Université de Mons (UMONS)
J. Wijsen Association Rules 1 / 21
The database and the rules
1. Items {A, B, C, D, ...}.
2. Any finite set of items is called an itemset or a transaction.
3. The database is a set (or rather a bag) of transactions.
For example:
{B, C, D, E}
{A, B, C}
{A, C, D}
{A, B, C, D}
An association rule is an expression of the form X ⇒ Y, where X and Y are disjoint itemsets. For example, {B, C} ⇒ {D}, also written B, C ⇒ D.
The problem
The support of a rule X ⇒ Y is s if s% of the transactions contain all the items in X ∪ Y.
The confidence of a rule X ⇒ Y is c if c% of the transactions that contain X also contain Y.
Normally, a fraction between 0 and 1 is used instead of a percentage.
In the example, the rule B, C ⇒ D has a support of 2/4 and a confidence of 2/3.
Given a support threshold (often ≤ 0.1) and a confidence threshold (rather close to 1.0), find all the rules that exceed these thresholds.
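As a minimal sketch of these two definitions, the following Python computes the support and confidence of B, C ⇒ D on the four example transactions. The function names `support_count`, `support` and `confidence` are illustrative choices, not taken from the slides.

```python
from fractions import Fraction

# The four example transactions of the database.
DB = [{"B", "C", "D", "E"}, {"A", "B", "C"}, {"A", "C", "D"}, {"A", "B", "C", "D"}]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if set(itemset) <= t)

def support(x, y, db):
    """Fraction of all transactions that contain X ∪ Y."""
    return Fraction(support_count(set(x) | set(y), db), len(db))

def confidence(x, y, db):
    """Fraction of the transactions containing X that also contain Y."""
    return Fraction(support_count(set(x) | set(y), db), support_count(x, db))

print(support({"B", "C"}, {"D"}, DB))     # 1/2
print(confidence({"B", "C"}, {"D"}, DB))  # 2/3
```

Exact fractions are used so that the results can be compared directly with the 2/4 and 2/3 quoted above. The confidence is undefined when no transaction contains X; this sketch does not handle that case.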
File → transaction database
outlook  temperature  humidity  windy  play
sunny    hot          high      false  no
sunny    hot          high      true   no
...
→
{outlook=sunny, temperature=hot, humidity=high, windy=false, play=no}
{outlook=sunny, temperature=hot, humidity=high, windy=true, play=no}
“outlook=sunny” and “temperature=hot” play the role of items.
There is no target attribute!
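The conversion from a tabular file to transactions can be sketched as follows. This is only an illustration: the helper name `row_to_transaction` is hypothetical, and the two rows are the ones shown above.

```python
HEADER = ["outlook", "temperature", "humidity", "windy", "play"]
ROWS = [
    ["sunny", "hot", "high", "false", "no"],
    ["sunny", "hot", "high", "true", "no"],
]

def row_to_transaction(header, row):
    """Turn one table row into a set of 'attribute=value' items."""
    return {f"{attribute}={value}" for attribute, value in zip(header, row)}

transactions = [row_to_transaction(HEADER, row) for row in ROWS]
for t in transactions:
    print(sorted(t))
```

Every column, including play, is treated the same way, which reflects the remark that there is no target attribute.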
Exercise
Create a file paniers.arff to store the contents of the “shopping basket” of customers passing through the checkout.
Discover association rules in paniers.arff.
Understand messages such as “Size of set of large itemsets L(2): 26”.
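One possible way to produce such a file is sketched below. It assumes one common ARFF encoding for baskets, in which each item is a nominal attribute with the single value t and an absent item is written as the missing value ?. The baskets and the function name `baskets_to_arff` are hypothetical, and the encoding expected by the mining tool used in the course may differ.

```python
def baskets_to_arff(baskets, relation="paniers"):
    """Render a list of baskets (sets of item names) as ARFF text."""
    items = sorted({item for basket in baskets for item in basket})
    lines = [f"@relation {relation}", ""]
    for item in items:
        # One nominal attribute per item; 't' means "present in the basket".
        lines.append(f"@attribute {item} {{t}}")
    lines += ["", "@data"]
    for basket in baskets:
        # '?' marks a missing value, used here for "item not in the basket".
        lines.append(",".join("t" if item in basket else "?" for item in items))
    return "\n".join(lines) + "\n"

# Hypothetical baskets, for illustration only.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"milk"}]
arff_text = baskets_to_arff(baskets)
print(arff_text)
```

Writing `arff_text` to paniers.arff then gives a file on which the exercise can be attempted.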
Warning: confidence ≠ correlation
Consider “basket ⇒ corn flakes” and suppose there are 5000 students as follows:
3000 students play basketball;
3750 students eat corn flakes;
2000 students both play basketball and eat corn flakes.
The rule “basket ⇒ corn flakes” has a confidence of 2000/3000 = 2/3, but note:
$$\frac{\Pr(\text{basket} \wedge \text{corn flakes})}{\Pr(\text{basket})\,\Pr(\text{corn flakes})} = \frac{2000/5000}{(3000/5000)(3750/5000)} \approx 0.889 < 1.$$
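The arithmetic can be checked with a few lines of Python. The ratio above is commonly called lift (or interest factor); that name and the function names below are not used in the slides and are introduced here only for illustration.

```python
def confidence(n_xy, n_x):
    """Confidence of X => Y from raw counts."""
    return n_xy / n_x

def lift(n_xy, n_x, n_y, n):
    """Pr(X and Y) / (Pr(X) * Pr(Y)), estimated from raw counts."""
    return (n_xy / n) / ((n_x / n) * (n_y / n))

N, BASKET, FLAKES, BOTH = 5000, 3000, 3750, 2000

print(round(confidence(BOTH, BASKET), 3))        # 0.667
print(round(lift(BOTH, BASKET, FLAKES, N), 3))   # 0.889
print(FLAKES / N)                                # 0.75
```

The last line shows why the rule is misleading: 75% of all students eat corn flakes, which is more than the 66.7% observed among basketball players, so the two properties are negatively associated despite the seemingly high confidence.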
Exercise
For the set
{AB, ACDE, BCDF, ABCD, ABCF},
find every subset of ABCDEF with a support count ≥ 2.
The support count of X, denoted σ(X), is the number of transactions that include X. For example, σ(BC) = 3.
Note:
- the support of X ⇒ Y is equal to σ(XY)/N, where N is the number of transactions (multiply by 100 to obtain a percentage);
- the confidence of X ⇒ Y is equal to σ(XY)/σ(X).
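A brute-force way to check an answer to this exercise is to enumerate every non-empty subset of the items and count it. The sketch below does exactly that; the names `sigma` and `frequent_itemsets` are illustrative, and the empty itemset (contained in every transaction) is deliberately skipped.

```python
from itertools import combinations

# The five transactions of the exercise, one character per item.
DB = [set("AB"), set("ACDE"), set("BCDF"), set("ABCD"), set("ABCF")]

def sigma(itemset, db):
    """Support count: number of transactions that include `itemset`."""
    return sum(1 for t in db if set(itemset) <= t)

def frequent_itemsets(db, min_count):
    """Map each non-empty itemset with sigma >= min_count to its support count."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            n = sigma(combo, db)
            if n >= min_count:
                result["".join(combo)] = n
    return result

print(sigma("BC", DB))  # 3
for name, n in frequent_itemsets(DB, 2).items():
    print(name, n)
```

This approach counts all 63 non-empty subsets of the six items, which is fine here but grows exponentially with the number of items.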
Apriori: pruning
Source: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar: Introduction to Data Mining. Addison Wesley, 2006
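The figure for this slide is not reproduced here. As a rough sketch of the principle behind the pruning (every subset of a frequent itemset must itself be frequent), the following level-wise search discards a candidate k-itemset as soon as one of its (k-1)-subsets is not frequent. The function name `apriori` and the plain scan used for counting are simplifications chosen for this sketch, not the implementation described in the cited book.

```python
from itertools import combinations

def apriori(db, min_count):
    """Return {frozenset: support count} for all itemsets with count >= min_count."""
    db = [frozenset(t) for t in db]

    def count(candidate):
        return sum(1 for t in db if candidate <= t)

    # Level 1: frequent single items.
    level = {}
    for item in {i for t in db for i in t}:
        c = frozenset([item])
        n = count(c)
        if n >= min_count:
            level[c] = n

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count the surviving candidates against the database.
        level = {}
        for c in candidates:
            n = count(c)
            if n >= min_count:
                level[c] = n
    return frequent

# The transactions of the preceding exercise.
result = apriori(["AB", "ACDE", "BCDF", "ABCD", "ABCF"], 2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), n)
```

On this data no 4-itemset candidate survives the prune step, so none of them is counted against the database.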
Apriori: k-itemsets contained in a transaction
Source: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar: Introduction to Data Mining. Addison Wesley, 2006
Apriori: “hashed” candidates and counting
Source: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar: Introduction to Data Mining. Addison Wesley, 2006
Apriori: “hashed” candidates and counting
Source: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar: Introduction to Data Mining. Addison Wesley, 2006
Apriori: experimental validation
Source: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar: Introduction to Data Mining. Addison Wesley, 2006
Apriori: building the rules
Maximal frequent itemsets
Closed frequent itemsets
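The slides give only the titles for maximal and closed frequent itemsets. Under the usual definitions (a frequent itemset is maximal if no proper superset is frequent, and closed if no proper superset has the same support count), both notions can be sketched on the transactions of the earlier exercise. The function names are illustrative, and the frequent itemsets are found by brute force to keep the example self-contained.

```python
from itertools import combinations

DB = [frozenset(t) for t in ["AB", "ACDE", "BCDF", "ABCD", "ABCF"]]

def frequent_itemsets(db, min_count):
    """Brute-force map from each frequent non-empty itemset to its support count."""
    items = sorted(set().union(*db))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            c = frozenset(combo)
            n = sum(1 for t in db if c <= t)
            if n >= min_count:
                freq[c] = n
    return freq

def maximal(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {x for x in freq if not any(x < y for y in freq)}

def closed(freq):
    """Frequent itemsets with no proper superset of equal support count.

    A superset with the same count as a frequent itemset is itself frequent,
    so looking only inside `freq` is sufficient.
    """
    return {x for x in freq if not any(x < y and freq[y] == freq[x] for y in freq)}

freq = frequent_itemsets(DB, 2)
print(sorted("".join(sorted(x)) for x in maximal(freq)))
print(sorted("".join(sorted(x)) for x in closed(freq)))
```

Every maximal frequent itemset is also closed, so the first list printed is a subset of the second.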
Breadth-first or depth-first traversal
Source: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar: Introduction to Data Mining. Addison Wesley, 2006
Depth-first traversal
Source: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar: Introduction to Data Mining. Addison Wesley, 2006
The so-called “horizontal” and “vertical” representations
Source: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar: Introduction to Data Mining. Addison Wesley, 2006
Recursive computation of cover
For each child x · s of s, cover(x · s) is the intersection of cover(s) with the cover of the preceding sibling of s that contains x.
cover(a · d) = cover(d) ∩ cover(a)
cover(b · d) = cover(d) ∩ cover(b)
cover(c · d) = cover(d) ∩ cover(c)
cover(a · bd) = cover(b · d) ∩ cover(a · d)
cover(a · cd) = cover(c · d) ∩ cover(a · d)
cover(b · cd) = cover(c · d) ∩ cover(b · d)
cover(a · bcd) = cover(b · cd) ∩ cover(a · cd)
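The recursion can be sketched as a depth-first search over the vertical representation, where the cover of an itemset is the set of ids of the transactions that contain it. The names `vertical`, `expand` and `mine` are hypothetical, items are assumed to be single characters so that an itemset can be written as a string, and the data are the transactions of the earlier exercise.

```python
def vertical(db):
    """Vertical representation: map each item to its cover (set of transaction ids)."""
    covers = {}
    for tid, transaction in enumerate(db):
        for item in transaction:
            covers.setdefault(item, set()).add(tid)
    return covers

def expand(siblings, min_count, out):
    """Depth-first search over an ordered list of (itemset, cover) siblings."""
    for i, (s, cover_s) in enumerate(siblings):
        out[s] = len(cover_s)
        children = []
        for p, cover_p in siblings[:i]:
            # p = x · suffix is a preceding sibling of s, so the child is x · s.
            x = p[0]
            cover_child = cover_s & cover_p
            if len(cover_child) >= min_count:
                children.append((x + s, cover_child))
        expand(children, min_count, out)

def mine(db, min_count):
    """Map each frequent itemset (as a sorted string) to its support count."""
    covers = vertical(db)
    roots = [(item, covers[item]) for item in sorted(covers)
             if len(covers[item]) >= min_count]
    out = {}
    expand(roots, min_count, out)
    return out

result = mine(["AB", "ACDE", "BCDF", "ABCD", "ABCF"], 2)
for itemset, n in result.items():
    print(itemset, n)
```

Infrequent siblings are dropped before recursing, so each intersection involves only covers that have already been computed and found frequent.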
FP-tree: a data structure for computing the intersections in RAM (Random Access Memory)
Among the 8 transactions that contain a, 5 also contain b. Among these 5 transactions that include ab, 3 also contain c. And so on.
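The counts quoted above can be read off a prefix tree in which each node stores how many transactions pass through it. The sketch below builds such a tree; it is a simplification of an FP-tree (no header table, no node links, no support threshold), and the ten transactions are hypothetical data chosen so that the counts 8, 5 and 3 are reproduced.

```python
from collections import Counter

class Node:
    """One tree node: an item, a count, and child nodes keyed by item."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

def build_tree(db):
    """Insert every transaction as a path, items ordered by decreasing frequency."""
    freq = Counter(item for t in db for item in set(t))
    root = Node()
    for t in db:
        node = root
        # Ties in frequency are broken alphabetically to keep the order deterministic.
        for item in sorted(set(t), key=lambda i: (-freq[i], i)):
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1
    return root

# Hypothetical transactions, one character per item.
DB = ["ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce"]
root = build_tree(DB)
a = root.children["a"]
ab = a.children["b"]
abc = ab.children["c"]
print(a.count, ab.count, abc.count)  # 8 5 3
```

Transactions that share a prefix share a path, which is what lets the structure stay compact enough to be held in memory.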
Example: the transactions with suffix e