Page 1:
CISC 4631 Data Mining

Lecture 09: Association Rule Mining

These slides are based on the slides by
• Tan, Steinbach and Kumar (textbook authors)
• Prof. F. Provost (Stern, NYU)
• Prof. B. Liu, UIC

Page 2:

What Is Association Mining?

• Association rule mining:
– Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.

• Applications:
– Market basket analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.


Page 3:

Association Mining?

• Examples.
– Rule form: “Body → Head [support, confidence]”.
– buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
– buys(x, "bread") → buys(x, "milk") [0.6%, 65%]
– major(x, "CS") ∧ takes(x, "DB") → grade(x, "A") [1%, 75%]
– age(X, 30-45) ∧ income(X, 50K-75K) → buys(X, SUV car)
– age = “30-45”, income = “50K-75K” → car = “SUV”

Page 4:

Market-basket analysis and finding associations

• Do items occur together? (more than I might expect)
• Proposed by Agrawal et al. in 1993.
• It is an important data mining model studied extensively by the database and data mining community.
• Assume all data are categorical.
• No good algorithm for numeric data.
• Initially used for market basket analysis to find how items purchased by customers are related.

Bread → Milk [sup = 5%, conf = 100%]

Page 5:

Association Rule: Basic Concepts

• Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)

• Find: all rules that correlate the presence of one set of items with that of another set of items
– E.g., 98% of people who purchase tires and auto accessories also get automotive services done

• Applications
– * → Maintenance Agreement (what should the store do to boost Maintenance Agreement sales?)
– Home Electronics → * (what other products should the store stock up on?)
– Detecting “ping-pong”ing of patients, faulty “collisions”

Page 6:

Association Rule Mining

• Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction


Market-Basket transactions

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

Example of Association Rules

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!

An itemset is simply a set of items

Page 7:

Association Rule Mining

– We are interested in rules that are
• non-trivial (and possibly unexpected)
• actionable
• easily explainable


Page 8:

Examples from a Supermarket

• Can you think of association rules from a supermarket?

• Let’s say you identify association rules from a supermarket; how might you exploit them?
– That is, if you are the store manager, how might you make money?

• Assume you have a rule of the form X → Y


Page 9:

Supermarket examples

• If you have a rule X → Y, you could:
– Run a sale on X if you want to increase sales of Y
– Locate the two items near each other
– Locate the two items far from each other to make the shopper walk through the store
– Print out a coupon on checkout for Y if the shopper bought X but not Y


Page 10:

Association “rules” – standard format

Rule format: (a set can consist of just a single item)

If {set of items} Then {set of items}
   Condition           Results

If {Diapers, Baby Food} Then {Beer, Chips}
       Condition                 Results

[Venn diagram: customers who buy diapers, customers who buy beer, and their overlap, customers who buy both]

Page 11:

What is an interesting association?

• Requires domain-knowledge validation
– actionable vs. trivial vs. inexplicable

• Algorithms provide first-pass based on statistics on how “unexpected” an association is

• Some standard statistics used for a rule C → R:

– support ≈ p(R & C)
• percent of “baskets” where the rule holds

– confidence ≈ p(R | C)
• percent of times R holds when C holds

Page 12:

Support and Confidence

• Find all the rules X → Y with minimum confidence and support
– Support = probability that a transaction contains {X, Y}
• i.e., the ratio of transactions in which X, Y occur together to all transactions in the database.
– Confidence = conditional probability that a transaction having X also contains Y
• i.e., the ratio of transactions in which X, Y occur together to those in which X occurs.

In general, the confidence of a rule LHS => RHS can be computed as the support of the whole itemset divided by the support of the LHS:

Confidence(LHS => RHS) = Support(LHS ∪ RHS) / Support(LHS)

[Venn diagram: customers who buy diapers, customers who buy beer, and their overlap, customers who buy both]
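To make these two measures concrete, here is a minimal Python sketch (not part of the original slides) that computes support and confidence directly from the five market-basket transactions used throughout this lecture; it reproduces s = 0.4 and c ≈ 0.67 for {Milk, Diaper} → {Beer}.

# Minimal sketch (not from the slides): support and confidence
# computed directly from the market-basket transactions.

TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions=TRANSACTIONS):
    # Fraction of transactions that contain every item in `itemset`.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions=TRANSACTIONS):
    # Support of LHS ∪ RHS divided by support of LHS.
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"Milk", "Diaper", "Beer"}))        # 0.4  (= 2/5)
print(confidence({"Milk", "Diaper"}, {"Beer"}))   # 0.666... (= 2/3)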

Page 13:

Definition: Frequent Itemset

• Itemset
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset
• An itemset that contains k items

• Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2

• Support
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5

• Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke


Page 14:

Definition: Association Rule

• Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}

• Rule Evaluation Metrics
– Support (s)
• Fraction of transactions that contain both X and Y
– Confidence (c)
• Measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}

s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4

c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

Page 15:

Support and Confidence - Example

Transaction ID Items Bought

1001 A, B, C

1002 A, C

1003 A, D

1004 B, E, F

1005 A, D, F

Itemset {A, C} has a support of 2/5 = 40%

Rule {A} ==> {C} has confidence of 50%

Rule {C} ==> {A} has confidence of 100%

Support for {A, C, E}? Support for {A, D, F}?

Confidence for {A, D} ==> {F}? Confidence for {A} ==> {D, F}?


Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).

Page 16:


Example

• Transaction data
• Assume: minsup = 30%, minconf = 80%
• An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7]
• Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
… …
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]

t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes

Page 17:

Mining Association Rules

Example of Rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

Observations:

• All the above rules are binary partitions of the same itemset:

{Milk, Diaper, Beer}

• Rules originating from the same itemset have identical support but can have different confidence

• Thus, we may decouple the support and confidence requirements

Page 18:

Drawback of Confidence

           Coffee   ¬Coffee   Total
Tea          15        5        20
¬Tea         75        5        80
Total        90       10       100

Association Rule: Tea → Coffee

Confidence = P(Coffee | Tea) = 15/20 = 0.75

but P(Coffee) = 0.9

Although confidence is high, the rule is misleading: knowing that a customer buys tea actually lowers the chance that they buy coffee, since

P(Coffee | ¬Tea) = 75/80 = 0.9375

Page 19:

Mining Association Rules

• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

• Frequent itemset generation is still computationally expensive

Page 20:


Transaction data representation

• A simplistic view of shopping baskets
• Some important information is not considered, e.g.:
– the quantity of each item purchased, and
– the price paid.

Page 21:


Many mining algorithms

• There are a large number of them!
• They use different strategies and data structures.
• Their resulting sets of rules are all the same.

– Given a transaction data set T, a minimum support and a minimum confidence, the set of association rules existing in T is uniquely determined.

• Any algorithm should find the same set of rules although their computational efficiencies and memory requirements may be different.

• We study only one: the Apriori Algorithm

Page 22:


The Apriori algorithm

• The best-known algorithm
• Two steps:
– Find all itemsets that have minimum support (frequent itemsets, also called large itemsets).
– Use frequent itemsets to generate rules.

• E.g., a frequent itemset
{Chicken, Clothes, Milk} [sup = 3/7]
and one rule from the frequent itemset
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]

Page 23:


Step 1: Mining all frequent itemsets

• A frequent itemset is an itemset whose support is ≥ minsup.

• Key idea: the apriori property (downward closure property): every subset of a frequent itemset is also a frequent itemset

[Itemset lattice over {A, B, C, D}, level by level:
A  B  C  D
AB AC AD BC BD CD
ABC ABD ACD BCD]

Page 24:

Steps in Association Rule Discovery

• Find the frequent itemsets
– Frequent itemsets are the sets of items that have minimum support
– Support is “downward closed”, so a subset of a frequent itemset must also be a frequent itemset (see the sketch after this list)
• if {A, B} is a frequent itemset, both {A} and {B} are frequent itemsets
• this also means that if an itemset doesn’t satisfy minimum support, none of its supersets will either (this is essential for pruning the search space)
– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)

• Use the frequent itemsets to generate association rules
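A quick numeric check of downward closure (a sketch on toy data, not from the slides): adding items to an itemset can never increase its support, which is exactly what licenses the pruning.

# Minimal sketch (not from the slides): support is monotone non-increasing
# as items are added to an itemset.

T = [{"A","B"}, {"A","B","C"}, {"A","C"}, {"B","C"}, {"A","B","D"}]  # toy data

def sup(s):
    # Fraction of transactions containing every item of s.
    return sum(set(s) <= t for t in T) / len(T)

assert sup({"A", "B"}) <= min(sup({"A"}), sup({"B"}))
print(sup({"A"}), sup({"B"}), sup({"A", "B"}))   # 0.8 0.8 0.6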

Page 25:

Frequent Itemset Generation

[Itemset lattice over {A, B, C, D, E}, from the empty set down to ABCDE:
null
A  B  C  D  E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE]

Given d items, there are 2^d possible candidate itemsets

Page 26:

Mining Association Rules – An Example

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Min. support 50%, min. confidence 50% (the user specifies these)

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

For rule A → C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%

The Apriori principle: any subset of a frequent itemset must be frequent

Page 27:

Mining Frequent Itemsets: the Key Step

• Find the frequent itemsets: the sets of items that have minimum support
– A subset of a frequent itemset must also be a frequent itemset
• i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets. Why? Make sure you can explain this.
– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)

• Use the frequent itemsets to generate association rules
– This step is more straightforward and requires less computation, so we focus on the first step


Page 28:


Illustrating Apriori Principle

[Figure: the itemset lattice over {A, B, C, D, E}, shown twice. In the first copy one itemset is found to be infrequent; in the second, all of its supersets are crossed out as pruned.]

Page 29:

The Apriori Algorithm

• Terminology:
– Ck is the set of candidate k-itemsets
– Lk is the set of frequent k-itemsets

• Join Step: Ck is generated by joining the set Lk-1 with itself
• Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
– This is a bit confusing since we want to use it the other way: we prune a candidate k-itemset if any of its (k-1)-itemsets are not in our list of frequent (k-1)-itemsets
• To utilize this you simply start with k = 1, which is single-item itemsets, and then you work your way up from there!


Page 30:


The Algorithm

• Iterative algorithm (also called level-wise search): find all 1-item frequent itemsets; then all 2-item frequent itemsets, and so on.
– In each iteration k, only consider itemsets that contain some frequent (k-1)-itemset.

• Find frequent itemsets of size 1: F1
• For k = 2, 3, …:
– Ck = candidates of size k: those itemsets of size k that could be frequent, given Fk-1
– Fk = those itemsets that are actually frequent, Fk ⊆ Ck (need to scan the database once).

Page 31:


Apriori candidate generation

• The candidate-gen function takes Lk-1 and returns a superset (called the candidates) of the set of all frequent k-itemsets. It has two steps:
– join step: generate all possible candidate itemsets Ck of length k
– prune step: remove those candidates in Ck that cannot be frequent.

Page 32:

How to Generate Candidates?

• Suppose the items in Lk-1 are listed in an order
• Step 1: self-joining Lk-1
– The description below is a bit confusing; all we do is splice two sets together so that only one new item is added (see example)

insert into Ck
select p.item1, p.item2, …, p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

• Step 2: pruning

forall itemsets c in Ck do
    forall (k-1)-subsets s of c do
        if (s is not in Lk-1) then delete c from Ck


Page 33:

Example of Generating Candidates

• L3={abc, abd, acd, ace, bcd}

• Self-joining: L3*L3

– abcd from abc and abd

– acde from acd and ace

• Pruning:

– acde is removed because ade is not in L3

• C4={abcd}
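The join-and-prune procedure is easy to express directly. Below is a minimal Python sketch (not part of the original slides) that reproduces this example: joining L3 with itself yields abcd and acde, and pruning removes acde.

# Minimal sketch (not from the slides) of Apriori candidate generation:
# the join and prune steps, applied to the L3 example above.

from itertools import combinations

def candidate_gen(freq_prev, k):
    # Generate candidate k-itemsets from the frequent (k-1)-itemsets.
    prev = sorted(tuple(sorted(s)) for s in freq_prev)
    prev_set = set(prev)
    candidates = set()
    # Join step: merge two (k-1)-itemsets that agree on their first k-2 items.
    for p, q in combinations(prev, 2):
        if p[:-1] == q[:-1] and p[-1] < q[-1]:
            candidates.add(p + (q[-1],))
    # Prune step: drop any candidate with an infrequent (k-1)-subset.
    return {c for c in candidates
            if all(sub in prev_set for sub in combinations(c, k - 1))}

L3 = [("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")]
print(candidate_gen(L3, 4))   # {('a','b','c','d')} -- acde is pruned (ade not in L3)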


Page 34:


Example – Finding frequent itemsets (minsup = 0.5)

Dataset T:
TID    Items
T100   1, 3, 4
T200   2, 3, 5
T300   1, 2, 3, 5
T400   2, 5

(notation below: itemset:count)

1. scan T → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
→ F1: {1}:2, {2}:3, {3}:3, {5}:3
→ C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}

2. scan T → C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
→ F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
→ C3: {2,3,5}

3. scan T → C3: {2,3,5}:2 → F3: {2,3,5}
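Putting the pieces together, here is a compact Python sketch of the level-wise search (not part of the original slides); running it on dataset T reproduces F1, F2, and F3 above.

# Minimal sketch (not from the slides) of level-wise Apriori frequent-itemset
# mining, reproducing the dataset-T walk-through above (minsup = 0.5).

from itertools import combinations

T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def apriori(transactions, minsup):
    n = len(transactions)
    # F1: frequent 1-itemsets, stored as sorted tuples -> counts.
    items = sorted({i for t in transactions for i in t})
    freq = {(i,): sum(i in t for t in transactions) for i in items}
    freq = {c: cnt for c, cnt in freq.items() if cnt / n >= minsup}
    all_freq = dict(freq)
    k = 2
    while freq:
        prev = sorted(freq)
        prev_set = set(prev)
        # Join step: merge itemsets agreeing on their first k-2 items,
        # then prune candidates with an infrequent (k-1)-subset.
        cands = {p + (q[-1],) for a, p in enumerate(prev) for q in prev[a+1:]
                 if p[:-1] == q[:-1]}
        cands = {c for c in cands
                 if all(s in prev_set for s in combinations(c, k - 1))}
        # One scan of the database counts every surviving candidate.
        freq = {c: sum(set(c) <= t for t in transactions) for c in cands}
        freq = {c: cnt for c, cnt in freq.items() if cnt / n >= minsup}
        all_freq.update(freq)
        k += 1
    return all_freq

print(apriori(T, 0.5))
# {(1,): 2, (2,): 3, (3,): 3, (5,): 3,
#  (1,3): 2, (2,3): 2, (2,5): 3, (3,5): 2, (2,3,5): 2}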

Page 35:

The Apriori Algorithm – Example (minsup = 30%)

Database D:
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D → C1:            L1:
itemset  sup.           itemset  sup.
{1}      2              {1}      2
{2}      3              {2}      3
{3}      3              {3}      3
{4}      1              {5}      3
{5}      3

C2 (candidates): {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Scan D → C2:            L2:
itemset  sup            itemset  sup
{1 2}    1              {1 3}    2
{1 3}    2              {2 3}    2
{1 5}    1              {2 5}    3
{2 3}    2              {3 5}    2
{2 5}    3
{3 5}    2

C3: {2 3 5}    Scan D → L3: itemset {2 3 5}, sup 2

Page 36:


Step 2: Generating rules from frequent itemsets

• Frequent itemsets ≠ association rules
• One more step is needed to generate association rules
• For each frequent itemset X,
for each proper nonempty subset A of X:
– Let B = X − A
– A → B is an association rule if
• Confidence(A → B) ≥ minconf

support(A → B) = support(A ∪ B) = support(X)

confidence(A → B) = support(A ∪ B) / support(A)

Page 37:


Generating rules: an example

• Suppose {2, 3, 4} is frequent, with sup = 50%
– Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup = 50%, 50%, 75%, 75%, 75%, 75% respectively
– These generate the following association rules:
• 2,3 → 4, confidence = 100%
• 2,4 → 3, confidence = 100%
• 3,4 → 2, confidence = 67%
• 2 → 3,4, confidence = 67%
• 3 → 2,4, confidence = 67%
• 4 → 2,3, confidence = 67%
• All rules have support = 50%
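As a sketch of this rule-generation step (not from the slides), the following Python function enumerates every proper nonempty subset A of a frequent itemset and keeps A → B when the confidence clears minconf; the support table is taken from the {2, 3, 4} example above.

# Minimal sketch (not from the slides) of rule generation from one frequent
# itemset. The supports were recorded during itemset mining, so the data
# need not be scanned again.

from itertools import combinations

support = {                      # from the {2, 3, 4} example above
    (2, 3, 4): 0.50,
    (2, 3): 0.50, (2, 4): 0.50, (3, 4): 0.75,
    (2,): 0.75, (3,): 0.75, (4,): 0.75,
}

def rules_from_itemset(itemset, minconf):
    # Yield (A, B, confidence) for every rule A -> B with conf >= minconf.
    items = tuple(sorted(itemset))
    for r in range(1, len(items)):            # every proper nonempty subset
        for A in combinations(items, r):
            B = tuple(i for i in items if i not in A)
            conf = support[items] / support[A]
            if conf >= minconf:
                yield A, B, conf

for A, B, conf in rules_from_itemset((2, 3, 4), minconf=0.0):
    print(f"{A} -> {B}  conf = {conf:.0%}  sup = {support[(2, 3, 4)]:.0%}")
# Matches the slide: the two 2-item antecedents give 100%, the rest 67%.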

Page 38:


Generating rules: summary

• To recap, in order to obtain A → B, we need to have support(A ∪ B) and support(A)
• All the information required for the confidence computation has already been recorded during itemset generation, so there is no need to see the data T any more.
• This step is not as time-consuming as frequent itemset generation.

Page 39:


On Apriori Algorithm

Seems to be very expensive, but:
• Level-wise search
• K = the size of the largest itemset
• It makes at most K passes over the data
• In practice, K is bounded (10)
• The algorithm is very fast. Under some conditions, all rules can be found in linear time.
• Scales up to large data sets

Page 40:

Granularity of items

• One exception to the “ease” of applying association rules is selecting the granularity of the items.

• Should you choose:
– diet coke?
– coke product?
– soft drink?
– beverage?

• Should you include more than one level of granularity? Be careful!

• (Some association finding techniques allow you to represent hierarchies explicitly)

Page 41:

Multiple-Level Association Rules

• Items often form a hierarchy
– Items at the lower level are expected to have lower support
– Rules regarding itemsets at appropriate levels could be quite useful
– The transaction database can be encoded based on dimensions and levels

Item hierarchy:
Food
  Milk: Skim, 2%
  Bread: Wheat, White
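As an illustration (an assumption, not from the slides), one simple way to encode transactions at a higher level is to map each item to its ancestor in the hierarchy before mining:

# Minimal sketch (assumption, not from the slides): lifting items to a
# higher hierarchy level. The mapping mirrors the Food tree above; the
# exact item names are hypothetical.

PARENT = {                       # hypothetical child -> parent mapping
    "Skim": "Milk", "2% Milk": "Milk",
    "Wheat Bread": "Bread", "White Bread": "Bread",
    "Milk": "Food", "Bread": "Food",
}

def lift_to_level(transaction, levels=1):
    # Replace each item by its ancestor `levels` steps up the hierarchy.
    out = set()
    for item in transaction:
        for _ in range(levels):
            item = PARENT.get(item, item)   # stop at the root
        out.add(item)
    return out

basket = {"Skim", "Wheat Bread"}
print(lift_to_level(basket))     # {'Milk', 'Bread'} -- mine level-1 rules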

Page 42:

Mining Multi-Level Associations

• A top-down, progressive deepening approach
– First find high-level strong rules:
• milk → bread [20%, 60%]
– Then find their lower-level “weaker” rules:
• 2% milk → wheat bread [6%, 50%]
– When one threshold is set for all levels: if the support threshold is too high, it is possible to miss meaningful associations at low levels; if it is too low, uninteresting rules may be generated
• different minimum support thresholds across the levels lead to different algorithms (e.g., decrease min-support at lower levels)

• Variations on mining multiple-level association rules
– Level-crossed association rules:
• milk → wonder wheat bread
– Association rules with multiple, alternative hierarchies:
• 2% milk → wonder bread

Page 43:

Rule Generation

• Now that you have the frequent itemsets, you can generate association rules
– Split the frequent itemsets in all possible ways and prune if the confidence is below the min_confidence threshold
• Rules that are left are called strong rules
– You may be given a rule template that constrains the rules
• Rules with only one item on the right side
• Rules with two items on the left and one on the right
– Rules of the form {X, Y} → Z


Page 44:

Rules from Previous Example

• What are the strong rules of the form {X, Y} → Z if the confidence threshold is 75%?
– We start with {2,3,5}
• {2,3} → 5 (confidence = 2/2 = 100%): STRONG
• {3,5} → 2 (confidence = 2/2 = 100%): STRONG
• {2,5} → 3 (confidence = 2/3 = 66%): PRUNE!

• Note that in general you don’t just look at the frequent itemsets of maximum length. If we wanted strong rules of the form X → Y, we would look at the frequent 2-itemsets (L2).


Page 45:

Interestingness Measurements

• Objective measures
– Two popular measurements:
• Support
• Confidence

• Subjective measures (Silberschatz & Tuzhilin, KDD95)
A rule (pattern) is interesting if
– it is unexpected (surprising to the user); and/or
– actionable (the user can do something with it)


Page 46:

Criticism of Support and Confidence

• Example 1:
– Among 5000 students
• 3000 play basketball
• 3750 eat cereal
• 2000 both play basketball and eat cereal

– play basketball → eat cereal [40%, 66.7%] is misleading because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
– play basketball → not eat cereal [20%, 33.3%] is far more interesting, although it has lower support and confidence

             basketball   not basketball   sum (row)
cereal          2000          1750           3750
not cereal      1000           250           1250
sum (col.)      3000          2000           5000

Lift of A → B = P(B|A) / P(B), and a rule is interesting if its lift is not near 1.0.

What is the lift of play basketball → not eat cereal?

(1/3) / (1250/5000) = 1.33
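A quick Python check (not from the slides) of both rules from the 2×2 table above:

# Minimal sketch (not from the slides): confidence and lift for both rules
# from the basketball/cereal contingency table.

N = 5000
basketball = 3000
cereal = 3750
both = 2000                                        # basketball AND cereal

def lift(conf, p_consequent):
    return conf / p_consequent

conf_cereal = both / basketball                    # P(cereal | basketball)
conf_no_cereal = (basketball - both) / basketball  # P(no cereal | basketball)

print(lift(conf_cereal, cereal / N))               # 0.89 -> below 1: negative association
print(lift(conf_no_cereal, (N - cereal) / N))      # 1.33 -> above 1: positive association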

Page 47:

Customer Number vs. Transaction ID

• In the homework you may have a problem where there is a customer id for each transaction
– You can be asked to do association analysis based on the customer id
• If this is so, you need to aggregate the transactions to the customer level
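For example, a minimal Python sketch of this aggregation (the ids and data below are hypothetical, not from the slides):

# Minimal sketch (assumption, not from the slides): rolling transactions
# up to the customer level before mining.

from collections import defaultdict

# (customer_id, items) pairs -- one row per store visit (hypothetical data)
visits = [
    ("c1", {"Bread", "Milk"}),
    ("c1", {"Diaper", "Beer"}),
    ("c2", {"Milk", "Coke"}),
]

baskets = defaultdict(set)
for cust, items in visits:
    baskets[cust] |= items       # union of all items the customer ever bought

customer_level = list(baskets.values())
print(customer_level)            # [{'Bread','Milk','Diaper','Beer'}, {'Milk','Coke'}]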


Page 48:

Market-basket analysis and finding associations

• Do items occur together? (more than I might expect)

• Why might I care?
– merchandising
• e.g., placing products in a retail space (physical or electronic), catalog design
• packaging optional services
– recommendations
• cross-selling and up-selling opportunities
• mining credit-card data
• developing customer loyalty and self-investment
– fraud detection
• e.g., in insurance data, a doctor very often works on the cases of a particular lawyer
– simply understanding my business
• are there “investment profiles” of my clients?
• customer segmentation based on buying behavior
• is anything strange going on?

Page 49:

Virtual items

• If you’re interested in including other possible variables, you can create “virtual items”
• gift-wrap, used-coupon, new-store, winter-holidays, bought-nothing, …

Page 50:

Associations: Pros and Cons

• Pros
– can quickly mine patterns describing business/customers/etc. without major effort in problem formulation
– virtual items allow much flexibility
– unparalleled tool for hypothesis generation

• Cons
– unfocused
• not clear exactly how to apply mined “knowledge”
• only hypothesis generation
– can produce many, many rules!
• there may only be a few nuggets among them (or none)

Page 51:

Association Rules

• Association rule types:
– Actionable Rules – contain high-quality, actionable information
– Trivial Rules – information already well known by those familiar with the business
– Inexplicable Rules – have no explanation and do not suggest action

• Trivial and Inexplicable Rules occur most often