Chapter 10 Association Rule
Chapter 10
ASSOCIATION RULE
By:
Aris D. (13406054)
Ricky A. (13406058)
Nadia FR. (13406069)
Amirah K. (13406070)
Paramita AW. (13406091)
Bahana W. (13406102)
Introduction
• Affinity Analysis
The study of attributes or characteristics that "go together."
• Market Basket Analysis
A method that uncovers rules for quantifying the relationship between two or more attributes, of the form "If antecedent, then consequent."
Affinity Analysis & Market Basket Analysis
• Example: A supermarket may find that of the 1000 customers shopping on a Thursday night, 200 bought diapers, and of the 200 who bought diapers, 50 also bought beer.
The association rule: "If buy diapers, then buy beer," with support of 50/1000 = 5% and confidence of 50/200 = 25%.
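The support and confidence figures above follow from simple arithmetic on the counts given in the example; a minimal sketch:

```python
# Counts from the supermarket example above.
total_transactions = 1000
bought_diapers = 200   # transactions containing the antecedent (diapers)
bought_both = 50       # transactions containing both diapers and beer

support = bought_both / total_transactions   # P(diapers and beer)
confidence = bought_both / bought_diapers    # P(beer | diapers)
print(support)      # 0.05  (5%)
print(confidence)   # 0.25  (25%)
```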
Affinity Analysis & Market Basket Analysis (2)
Examples in business & research:
• Investigating the proportion of subscribers to your company's cell phone plan that respond positively to an offer of a service upgrade
• Examining the proportion of children whose parents read to them who are themselves good readers
• Predicting degradation in telecommunications networks
• Finding out which items in a supermarket are purchased together & which are never purchased together
• Determining the proportion of cases in which a new drug will exhibit dangerous side effects
Affinity Analysis & Market Basket Analysis (3)
• The number of possible association rules grows exponentially in the number of attributes.
• With k binary (yes/no) attributes, there are k · 2^(k−1) possible association rules.
• Example: a convenience store that sells 100 items. Possible association rules = 100 · 2^99 ≈ 6.3 × 10^31.
• The a priori algorithm reduces the search problem to a more manageable size.
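The exponential growth can be seen directly from the k · 2^(k−1) formula; a small sketch (`possible_rules` is a helper name of my own):

```python
def possible_rules(k):
    """Number of possible association rules over k binary attributes: k * 2^(k-1)."""
    return k * 2 ** (k - 1)

print(possible_rules(3))    # 12 rules from just 3 attributes
print(possible_rules(100))  # 100 * 2**99, roughly 6.3e31
```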
Notation for Data Representation in Market Basket Analysis
• A farmer sells the item set I = {asparagus, beans, broccoli, corn, green peppers, squash, tomatoes}.
• A customer puts a subset of I in a basket, e.g., {broccoli, corn}.
• The subset does not keep track of how much of each item is purchased, just the names of the items.
Transactional Data Format
Tabular Data Format
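The two formats can be illustrated with a small sketch using the farmer's market items (the sample transactions and the `to_tabular` helper name are my own):

```python
# Transactional format: one record per transaction, listing the items purchased.
transactions = [
    {"broccoli", "corn"},
    {"asparagus", "beans", "squash"},
]

# Tabular format: one row per transaction, one 0/1 flag column per item in I.
items = sorted({"asparagus", "beans", "broccoli", "corn",
                "green peppers", "squash", "tomatoes"})

def to_tabular(transactions, items):
    """Convert transactional records into 0/1 flag rows (one column per item)."""
    return [[1 if item in t else 0 for item in items] for t in transactions]

print(items)
for row in to_tabular(transactions, items):
    print(row)
```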
Support, Confidence, Frequent Itemsets, & the Apriori Property
• Example:
D : the set of transactions represented in Table 10.1
T : a transaction in D, representing a set of items
I : the set of items
Set of items A : {beans, squash}
Set of items B : {asparagus}
THEN: an association rule takes the form "if A, then B" (A ⇒ B), where A and B are PROPER subsets of I and are mutually exclusive.
Table of Transactions Made
Support and Confidence
• Support, s, is the proportion of transactions in D that contain both A and B:
support = P(A ∩ B) = (number of transactions containing both A and B) / (total number of transactions)
• Confidence, c, is a measure of the accuracy of the rule:
confidence = P(B|A) = P(A ∩ B) / P(A) = (number of transactions containing both A and B) / (number of transactions containing A)
• Analysts prefer RULES with high support AND high confidence.
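The two definitions translate directly into functions over a list of transactions; a sketch (the sample baskets are my own, not Table 10.1):

```python
def support(transactions, A, B):
    """Support of "if A then B": fraction of transactions containing all of A and B."""
    both = A | B
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(transactions, A, B):
    """Confidence of "if A then B": P(B | A) over the transactions."""
    n_a = sum(1 for t in transactions if A <= t)
    n_ab = sum(1 for t in transactions if (A | B) <= t)
    return n_ab / n_a

# Small made-up basket data for illustration.
transactions = [{"beans", "squash", "asparagus"}, {"beans", "squash"},
                {"asparagus", "corn"}, {"beans", "tomatoes"}]
print(support(transactions, {"beans"}, {"squash"}))     # 2/4 = 0.5
print(confidence(transactions, {"beans"}, {"squash"}))  # 2/3 = 0.666...
```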
Frequent Itemset Definition
• An itemset is a set of items contained in I; a k-itemset contains k items. E.g., {beans, squash} is a 2-itemset.
• The itemset frequency is the number of transactions that contain the particular itemset.
• A frequent itemset is an itemset that occurs at least a certain minimum number of times, i.e., has itemset frequency ≥ φ.
• Example: set φ = 4; then itemsets that occur four or more times are said to be frequent.
Mining Association Rules
• It is a two-step process:
1. Find all frequent itemsets (all itemsets with frequency ≥ φ).
2. From the frequent itemsets, generate association rules satisfying the minimum support and confidence conditions.
The Apriori Property
• The apriori property states that if an itemset Z is not frequent, then adding another item A to the itemset Z will not make Z more frequent. This helpful property significantly reduces the search space for the a priori algorithm.
How does the Apriori Algorithm Work?
• Part 1: Generating Frequent Itemsets
• Part 2: Generating Association Rules
Generating Frequent Itemsets
• Example: let φ = 4, so that an itemset is frequent if it occurs four or more times in D.
• F1 = {asparagus, beans, broccoli, corn, green peppers, squash, tomatoes}
• F2: first, construct a set Ck of candidate k-itemsets by joining Fk−1 with itself; then prune Ck using the a priori property. Ck for k = 2 consists of all the combinations of vegetables in Table 10.4.
• F3: the steps are much the same as for F2, but with k = 3.
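The join-and-prune loop described above can be sketched as follows (a minimal illustrative implementation; the transactions and the `apriori_frequent` name are my own, not Table 10.1):

```python
from itertools import combinations

def apriori_frequent(transactions, phi):
    """Return all itemsets whose frequency is at least phi, built level by level."""
    def freq(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    frequent = [frozenset([i]) for i in items if freq(frozenset([i])) >= phi]
    all_frequent = list(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step (apriori property): drop any candidate with an infrequent subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        frequent = [c for c in candidates if freq(c) >= phi]
        all_frequent.extend(frequent)
        k += 1
    return all_frequent

transactions = [{"beans", "squash"}, {"beans", "squash", "corn"},
                {"beans", "corn"}, {"squash", "tomatoes"}]
for itemset in apriori_frequent(transactions, phi=2):
    print(set(itemset))
```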
Table 10.3 (pg. 183)
Table 10.4 (pg. 185)
• However, consider s = {beans, corn, squash}: the subset {corn, squash} has frequency 3 < 4 = φ, so {corn, squash} is not frequent.
• By the apriori property, {beans, corn, squash} therefore cannot be frequent; it is pruned and does not appear in F3.
• The same holds for s = {beans, squash, tomatoes}, where a subset has frequency < 4.
Generating Association Rules
For each frequent itemset s:
1. Generate all subsets ss of s.
2. Form the association rule R : ss ⇒ (s − ss), where (s − ss) is the set s without ss, and report R if it fulfills the minimum confidence requirement.
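The two steps above can be sketched as a function that enumerates every nonempty proper subset ss of a frequent itemset s (the helper name and sample transactions are my own):

```python
from itertools import combinations

def rules_from_itemset(s, transactions, min_conf):
    """Generate every rule ss => (s - ss) from frequent itemset s meeting min_conf."""
    def freq(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for r in range(1, len(s)):                  # nonempty proper subsets only
        for combo in combinations(sorted(s), r):
            ss = frozenset(combo)
            conf = freq(s) / freq(ss)           # confidence of ss => (s - ss)
            if conf >= min_conf:
                rules.append((set(ss), set(s - ss), conf))
    return rules

transactions = [{"beans", "squash"}, {"beans", "squash", "corn"},
                {"beans", "corn"}, {"squash", "tomatoes"}]
for antecedent, consequent, conf in rules_from_itemset(
        frozenset({"beans", "squash"}), transactions, min_conf=0.5):
    print(antecedent, "=>", consequent, round(conf, 3))
```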
Example: Two Antecedents
• Total transactions = 14
• Transactions including asparagus and beans = 5
• Transactions including asparagus and squash = 5
• Transactions including beans and squash = 6
Ranked by Support × Confidence
• Minimum confidence = 80%
Clementine Generating Association Rules
Clementine Generating Association Rules (2)
• In Clementine, "support" means the number of occurrences of the antecedent, different from what we defined before.
• The first column indicates the number of times the antecedent occurs.
• To find the actual "support" using Clementine, multiply support and confidence.
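Assuming Clementine reports support as the antecedent proportion, the conversion is a single multiplication; the numbers below are hypothetical, chosen only to illustrate it:

```python
# Hypothetical Clementine output for one rule (illustrative values only).
reported_support = 0.25      # Clementine's "support": P(antecedent)
reported_confidence = 0.80   # P(consequent | antecedent)

# Actual support as defined earlier in the chapter: P(antecedent and consequent).
actual_support = reported_support * reported_confidence
print(actual_support)
```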
Extension from Flag Data to General Categorical Data
• Association rules are not only for flag (Boolean) data.
• The a priori algorithm can also be applied to categorical data.
Example Using Clementine
• Recall the normalized adult data set from Chapters 6 and 7.
Information-Theoretic Approach: Generalized Rule Induction Method
Why GRI?
• The a priori algorithm is not well equipped to handle numerical attributes, which need discretization.
• Discretization can lead to loss of information.
• GRI can handle both categorical and numerical variables as inputs, but still requires categorical variables as output.
Generalized Rule Induction Method (2)
J-Measure
• p(x): probability of the value of x (antecedent)
• p(y): probability of the value of y (consequent)
• p(y|x): conditional probability of y given that x has occurred

J = p(x) · [ p(y|x) · ln( p(y|x) / p(y) ) + (1 − p(y|x)) · ln( (1 − p(y|x)) / (1 − p(y)) ) ]
Generalized Rule Induction Method (3)
• The J-measure indicates "interestingness."
• In GRI, the user specifies how many association rules should be reported.
• If the "interestingness" of a new rule exceeds the current minimum J in the rule table, the new rule is inserted and the rule with the minimum J is eliminated.
Application of GRI
p(x) : female, never married
p(x) = 0.1463
Application of GRI (2)
p(y) : work class = private
p(y) = 0.6958
Application of GRI (3)
p(y|x) : work class = private, given female, never married
p(y|x) = 0.763
Application of GRI (4)
Calculation:
J = p(x) · [ p(y|x) · ln( p(y|x) / p(y) ) + (1 − p(y|x)) · ln( (1 − p(y|x)) / (1 − p(y)) ) ]
  = 0.1463 · [ 0.763 · ln( 0.763 / 0.6958 ) + 0.237 · ln( 0.237 / 0.3042 ) ]
  = 0.1463 · [ 0.763 · ln(1.0966) + 0.237 · ln(0.7791) ]
  ≈ 0.001637
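The calculation above can be checked with a short function implementing the J-measure formula (`j_measure` is a helper name of my own):

```python
from math import log

def j_measure(p_x, p_y, p_y_given_x):
    """J-measure of a rule x => y, using natural logarithms."""
    term1 = p_y_given_x * log(p_y_given_x / p_y)
    term2 = (1 - p_y_given_x) * log((1 - p_y_given_x) / (1 - p_y))
    return p_x * (term1 + term2)

# Values from the GRI application above; matches the slide's 0.001637 to rounding.
print(j_measure(p_x=0.1463, p_y=0.6958, p_y_given_x=0.763))
```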
When Not to Use Association Rules
• Association rules chosen a priori may be evaluated based on:
▫ Confidence
▫ Confidence difference
▫ Confidence ratio
• Association rules need to be applied with care, because the results are sometimes unreliable.
When Not to Use Association Rules (2)
Association rules chosen a priori, based on confidence
• Applying this association rule reduces the probability of randomly selecting the desired data.
• Even though the rule is useless, the software still reported it, probably because the default ranking mechanism for the a priori algorithm is confidence.
• We should never simply believe the computer output without making the effort to understand the models and mechanisms underlying the results.
When Not to Use Association Rules (3)
Association rules chosen a priori, based on confidence
When Not to Use Association Rules (4)
Association rules chosen a priori, based on confidence difference
• A random selection from the database would have provided more effective results (with no useless rules reported) than applying the association rule.
• This rule provides the greatest increase in confidence from the prior to the posterior.
• The evaluation measures the absolute difference between the prior and posterior confidences.
When Not to Use Association Rules (5)
Association rules chosen a priori, based on confidence difference
When Not to Use Association Rules (6)
Association rules chosen a priori, based on confidence ratio
• Some analysts prefer to use the confidence ratio to evaluate potential rules.
• Here, the confidence difference criterion yielded the very same rules as the confidence ratio criterion.
When Not to Use Association Rules (7)
Association rules chosen a priori, based on confidence ratio
• Example: If Marital_Status = Divorced, then Sex = Female, with p(y) = 0.3317 and p(y|x) = 0.60.
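Using the slide's numbers, the two evaluation measures can be sketched as follows. The ratio-based measure shown, 1 − min(ratio, 1/ratio), is one plausible form; treat the exact definition used by the software as an assumption here:

```python
p_y = 0.3317         # prior confidence: P(Sex = Female)
p_y_given_x = 0.60   # posterior confidence: P(Female | Divorced)

# Confidence difference: absolute change from prior to posterior.
conf_difference = abs(p_y_given_x - p_y)

# Confidence ratio measure (assumed form, not necessarily Clementine's exact one).
ratio = p_y_given_x / p_y
conf_ratio_measure = 1 - min(ratio, 1 / ratio)

print(conf_difference)       # 0.2683
print(conf_ratio_measure)
```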
Do Association Rules Represent Supervised or Unsupervised Learning?
• Supervised learning:
▫ The target variable is prespecified.
▫ The algorithm is provided with a rich collection of examples where possible associations between the target variable and the predictor variables may be uncovered.
• Unsupervised learning:
▫ No target variable is identified explicitly.
▫ The algorithm searches for patterns and structure among all the variables.
• Association rules are generally used for unsupervised learning, but can also be applied to supervised learning for classification tasks.
Local Patterns Versus Global Models
• Model: a global description or explanation of a data set.
• Patterns: essential local features of the data.
• Association rules are well suited to uncovering local patterns in data.
• Applying an "if" clause drills down deep into the data set, uncovering a hidden local pattern that might be relevant.
• Finding local patterns is one of the most important goals in data mining; it can lead to new profitable initiatives.