Page 1:
CISC 4631 Data Mining

Lecture 09: Association Rule Mining

These slides are based on the slides by
• Tan, Steinbach and Kumar (textbook authors)
• Prof. F. Provost (Stern, NYU)
• Prof. B. Liu, UIC

Page 2:

What Is Association Mining?

• Association rule mining:
– Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.

• Applications:
– Market basket analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.


Page 3:

Association Mining?

• Examples.
– Rule form: “Body → Head [support, confidence]”.
– buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
– buys(x, "bread") → buys(x, "milk") [0.6%, 65%]
– major(x, "CS") ∧ takes(x, "DB") → grade(x, "A") [1%, 75%]
– age(X, 30-45) ∧ income(X, 50K-75K) → buys(X, SUV car)
– age = “30-45”, income = “50K-75K” → car = “SUV”

Page 4:

Market-basket analysis and finding associations

• Do items occur together? (more than I might expect)
• Proposed by Agrawal et al. in 1993.
• It is an important data mining model studied extensively by the database and data mining community.
• Assume all data are categorical.
• No good algorithm for numeric data.
• Initially used for market basket analysis to find how items purchased by customers are related.

Bread → Milk [sup = 5%, conf = 100%]

Page 5:

Association Rule: Basic Concepts

• Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)

• Find: all rules that correlate the presence of one set of items with that of another set of items
– E.g., 98% of people who purchase tires and auto accessories also get automotive services done

• Applications
– * → Maintenance Agreement (what should the store do to boost Maintenance Agreement sales?)
– Home Electronics → * (what other products should the store stock up on?)
– Detecting “ping-pong”ing of patients, faulty “collisions”

Page 6:

Association Rule Mining

• Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction


Market-Basket transactions

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

Example of Association Rules

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!

An itemset is simply a set of items

Page 7:

Association Rule Mining

– We are interested in rules that are
• non-trivial (and possibly unexpected)
• actionable
• easily explainable


Page 8:

Examples from a Supermarket

• Can you think of association rules from a supermarket?

• Let’s say you identify association rules from a supermarket; how might you exploit them?
– That is, if you are the store manager, how might you make money?

• Assume you have a rule of the form X → Y


Page 9:

Supermarket examples

• If you have a rule X → Y, you could:
– Run a sale on X if you want to increase sales of Y
– Locate the two items near each other
– Locate the two items far from each other to make the shopper walk through the store
– Print out a coupon on checkout for Y if the shopper bought X but not Y


Page 10:

Association “rules” – standard format

Rule format: (a set can consist of just a single item)

If {set of items} Then {set of items}
   Condition           Results

If {Diapers, Baby Food} Then {Beer, Chips}
       Condition                 Results

[Venn diagram: customers who buy diapers, customers who buy beer, and their overlap, customers who buy both]

Page 11:

What is an interesting association?

• Requires domain-knowledge validation
– actionable vs. trivial vs. inexplicable

• Algorithms provide first-pass based on statistics on how “unexpected” an association is

• Some standard statistics used for a rule C → R:

– support ≈ p(R & C)
• percent of “baskets” where the rule holds

– confidence ≈ p(R | C)
• percent of times R holds when C holds

Page 12:

Support and Confidence

• Find all the rules X → Y with minimum confidence and support
– Support = probability that a transaction contains {X, Y}
• i.e., the ratio of transactions in which X, Y occur together to all transactions in the database.
– Confidence = conditional probability that a transaction having X also contains Y
• i.e., the ratio of transactions in which X, Y occur together to those in which X occurs.

In general, the confidence of a rule LHS => RHS can be computed as the support of the whole itemset divided by the support of the LHS:

Confidence(LHS => RHS) = Support(LHS ∪ RHS) / Support(LHS)

[Venn diagram: customers who buy diapers, customers who buy beer, and their overlap, customers who buy both]
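To make these two measures concrete, here is a minimal Python sketch (not part of the original slides) that computes support and confidence directly from the five market-basket transactions used throughout this lecture; it reproduces s = 0.4 and c ≈ 0.67 for {Milk, Diaper} → {Beer}.

# Minimal sketch (not from the slides): support and confidence
# computed directly from the market-basket transactions.

TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions=TRANSACTIONS):
    # Fraction of transactions that contain every item in `itemset`.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions=TRANSACTIONS):
    # Support of LHS ∪ RHS divided by support of LHS.
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"Milk", "Diaper", "Beer"}))        # 0.4  (= 2/5)
print(confidence({"Milk", "Diaper"}, {"Beer"}))   # 0.666... (= 2/3)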

Page 13:

Definition: Frequent Itemset

• Itemset
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset
• An itemset that contains k items

• Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2

• Support
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5

• Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke


Page 14:

Definition: Association Rule

• Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}

• Rule Evaluation Metrics
– Support (s)
• Fraction of transactions that contain both X and Y
– Confidence (c)
• Measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}

s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4

c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

Page 15:

Support and Confidence - Example

Transaction ID Items Bought

1001 A, B, C

1002 A, C

1003 A, D

1004 B, E, F

1005 A, D, F

Itemset {A, C} has a support of 2/5 = 40%

Rule {A} ==> {C} has confidence of 50%

Rule {C} ==> {A} has confidence of 100%

Support for {A, C, E}? Support for {A, D, F}?

Confidence for {A, D} ==> {F}? Confidence for {A} ==> {D, F}?


Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).

Page 16:


Example

• Transaction data
• Assume: minsup = 30%, minconf = 80%
• An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7]
• Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
… …
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]

t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes

Page 17:

Mining Association Rules

Example of Rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

Observations:

• All the above rules are binary partitions of the same itemset:

{Milk, Diaper, Beer}

• Rules originating from the same itemset have identical support but can have different confidence

• Thus, we may decouple the support and confidence requirements

Page 18:

Drawback of Confidence

           Coffee   ¬Coffee   Total
Tea          15        5        20
¬Tea         75        5        80
Total        90       10       100

Association Rule: Tea → Coffee

Confidence = P(Coffee | Tea) = 15/20 = 0.75

but P(Coffee) = 0.9

Although confidence is high, the rule is misleading: knowing that a customer buys tea actually lowers the chance that they buy coffee, since

P(Coffee | ¬Tea) = 75/80 = 0.9375

Page 19:

Mining Association Rules

• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

• Frequent itemset generation is still computationally expensive

Page 20:


Transaction data representation

• A simplistic view of shopping baskets
• Some important information is not considered, e.g.:
– the quantity of each item purchased, and
– the price paid.

Page 21:


Many mining algorithms

• There are a large number of them!
• They use different strategies and data structures.
• Their resulting sets of rules are all the same.

– Given a transaction data set T, a minimum support and a minimum confidence, the set of association rules existing in T is uniquely determined.

• Any algorithm should find the same set of rules although their computational efficiencies and memory requirements may be different.

• We study only one: the Apriori Algorithm

Page 22:


The Apriori algorithm

• The best-known algorithm
• Two steps:
– Find all itemsets that have minimum support (frequent itemsets, also called large itemsets).
– Use frequent itemsets to generate rules.

• E.g., a frequent itemset
{Chicken, Clothes, Milk} [sup = 3/7]
and one rule from the frequent itemset
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]

Page 23:


Step 1: Mining all frequent itemsets

• A frequent itemset is an itemset whose support is ≥ minsup.

• Key idea: the apriori property (downward closure property): every subset of a frequent itemset is also a frequent itemset

[Itemset lattice over {A, B, C, D}, level by level:
A  B  C  D
AB AC AD BC BD CD
ABC ABD ACD BCD]

Page 24:

Steps in Association Rule Discovery

• Find the frequent itemsets
– Frequent itemsets are the sets of items that have minimum support
– Support is “downward closed”, so a subset of a frequent itemset must also be a frequent itemset (see the sketch after this list)
• if {A, B} is a frequent itemset, both {A} and {B} are frequent itemsets
• this also means that if an itemset doesn’t satisfy minimum support, none of its supersets will either (this is essential for pruning the search space)
– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)

• Use the frequent itemsets to generate association rules
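A quick numeric check of downward closure (a sketch on toy data, not from the slides): adding items to an itemset can never increase its support, which is exactly what licenses the pruning.

# Minimal sketch (not from the slides): support is monotone non-increasing
# as items are added to an itemset.

T = [{"A","B"}, {"A","B","C"}, {"A","C"}, {"B","C"}, {"A","B","D"}]  # toy data

def sup(s):
    # Fraction of transactions containing every item of s.
    return sum(set(s) <= t for t in T) / len(T)

assert sup({"A", "B"}) <= min(sup({"A"}), sup({"B"}))
print(sup({"A"}), sup({"B"}), sup({"A", "B"}))   # 0.8 0.8 0.6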

Page 25:

Frequent Itemset Generation

[Itemset lattice over {A, B, C, D, E}, from the empty set down to ABCDE:
null
A  B  C  D  E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE]

Given d items, there are 2^d possible candidate itemsets

Page 26:

Mining Association Rules – An Example

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Min. support 50%, min. confidence 50% (the user specifies these)

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

For rule A → C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%

The Apriori principle: any subset of a frequent itemset must be frequent

Page 27:

Mining Frequent Itemsets: the Key Step

• Find the frequent itemsets: the sets of items that have minimum support
– A subset of a frequent itemset must also be a frequent itemset
• i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets. Why? Make sure you can explain this.
– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)

• Use the frequent itemsets to generate association rules
– This step is more straightforward and requires less computation, so we focus on the first step


Page 28:


Illustrating Apriori Principle

[Figure: the itemset lattice over {A, B, C, D, E}, shown twice. In the first copy one itemset is found to be infrequent; in the second, all of its supersets are crossed out as pruned.]

Page 29:

The Apriori Algorithm

• Terminology:
– Ck is the set of candidate k-itemsets
– Lk is the set of frequent k-itemsets

• Join Step: Ck is generated by joining the set Lk-1 with itself
• Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
– This is a bit confusing since we want to use it the other way: we prune a candidate k-itemset if any of its (k-1)-itemsets are not in our list of frequent (k-1)-itemsets
• To utilize this you simply start with k = 1, which is single-item itemsets, and then you work your way up from there!


Page 30:


The Algorithm

• Iterative algorithm (also called level-wise search): find all 1-item frequent itemsets; then all 2-item frequent itemsets, and so on.
– In each iteration k, only consider itemsets that contain some frequent (k-1)-itemset.

• Find frequent itemsets of size 1: F1
• For k = 2, 3, …:
– Ck = candidates of size k: those itemsets of size k that could be frequent, given Fk-1
– Fk = those itemsets that are actually frequent, Fk ⊆ Ck (need to scan the database once).

Page 31:


Apriori candidate generation

• The candidate-gen function takes Lk-1 and returns a superset (called the candidates) of the set of all frequent k-itemsets. It has two steps:
– join step: generate all possible candidate itemsets Ck of length k
– prune step: remove those candidates in Ck that cannot be frequent.

Page 32:

How to Generate Candidates?

• Suppose the items in Lk-1 are listed in an order
• Step 1: self-joining Lk-1
– The description below is a bit confusing; all we do is splice two sets together so that only one new item is added (see example)

insert into Ck
select p.item1, p.item2, …, p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

• Step 2: pruning

forall itemsets c in Ck do
    forall (k-1)-subsets s of c do
        if (s is not in Lk-1) then delete c from Ck


Page 33:

Example of Generating Candidates

• L3={abc, abd, acd, ace, bcd}

• Self-joining: L3*L3

– abcd from abc and abd

– acde from acd and ace

• Pruning:

– acde is removed because ade is not in L3

• C4={abcd}
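The join-and-prune procedure is easy to express directly. Below is a minimal Python sketch (not part of the original slides) that reproduces this example: joining L3 with itself yields abcd and acde, and pruning removes acde.

# Minimal sketch (not from the slides) of Apriori candidate generation:
# the join and prune steps, applied to the L3 example above.

from itertools import combinations

def candidate_gen(freq_prev, k):
    # Generate candidate k-itemsets from the frequent (k-1)-itemsets.
    prev = sorted(tuple(sorted(s)) for s in freq_prev)
    prev_set = set(prev)
    candidates = set()
    # Join step: merge two (k-1)-itemsets that agree on their first k-2 items.
    for p, q in combinations(prev, 2):
        if p[:-1] == q[:-1] and p[-1] < q[-1]:
            candidates.add(p + (q[-1],))
    # Prune step: drop any candidate with an infrequent (k-1)-subset.
    return {c for c in candidates
            if all(sub in prev_set for sub in combinations(c, k - 1))}

L3 = [("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")]
print(candidate_gen(L3, 4))   # {('a','b','c','d')} -- acde is pruned (ade not in L3)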


Page 34:


Example – Finding frequent itemsets (minsup = 0.5)

Dataset T:
TID    Items
T100   1, 3, 4
T200   2, 3, 5
T300   1, 2, 3, 5
T400   2, 5

(notation below: itemset:count)

1. scan T → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
→ F1: {1}:2, {2}:3, {3}:3, {5}:3
→ C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}

2. scan T → C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
→ F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
→ C3: {2,3,5}

3. scan T → C3: {2,3,5}:2 → F3: {2,3,5}
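Putting the pieces together, here is a compact Python sketch of the level-wise search (not part of the original slides); running it on dataset T reproduces F1, F2, and F3 above.

# Minimal sketch (not from the slides) of level-wise Apriori frequent-itemset
# mining, reproducing the dataset-T walk-through above (minsup = 0.5).

from itertools import combinations

T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def apriori(transactions, minsup):
    n = len(transactions)
    # F1: frequent 1-itemsets, stored as sorted tuples -> counts.
    items = sorted({i for t in transactions for i in t})
    freq = {(i,): sum(i in t for t in transactions) for i in items}
    freq = {c: cnt for c, cnt in freq.items() if cnt / n >= minsup}
    all_freq = dict(freq)
    k = 2
    while freq:
        prev = sorted(freq)
        prev_set = set(prev)
        # Join step: merge itemsets agreeing on their first k-2 items,
        # then prune candidates with an infrequent (k-1)-subset.
        cands = {p + (q[-1],) for a, p in enumerate(prev) for q in prev[a+1:]
                 if p[:-1] == q[:-1]}
        cands = {c for c in cands
                 if all(s in prev_set for s in combinations(c, k - 1))}
        # One scan of the database counts every surviving candidate.
        freq = {c: sum(set(c) <= t for t in transactions) for c in cands}
        freq = {c: cnt for c, cnt in freq.items() if cnt / n >= minsup}
        all_freq.update(freq)
        k += 1
    return all_freq

print(apriori(T, 0.5))
# {(1,): 2, (2,): 3, (3,): 3, (5,): 3,
#  (1,3): 2, (2,3): 2, (2,5): 3, (3,5): 2, (2,3,5): 2}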

Page 35:

The Apriori Algorithm – Example (minsup = 30%)

Database D:
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D → C1:            L1:
itemset  sup.           itemset  sup.
{1}      2              {1}      2
{2}      3              {2}      3
{3}      3              {3}      3
{4}      1              {5}      3
{5}      3

C2 (candidates): {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Scan D → C2:            L2:
itemset  sup            itemset  sup
{1 2}    1              {1 3}    2
{1 3}    2              {2 3}    2
{1 5}    1              {2 5}    3
{2 3}    2              {3 5}    2
{2 5}    3
{3 5}    2

C3: {2 3 5}    Scan D → L3: itemset {2 3 5}, sup 2

Page 36:


Step 2: Generating rules from frequent itemsets

• Frequent itemsets ≠ association rules
• One more step is needed to generate association rules
• For each frequent itemset X,
for each proper nonempty subset A of X:
– Let B = X − A
– A → B is an association rule if
• Confidence(A → B) ≥ minconf

support(A → B) = support(A ∪ B) = support(X)

confidence(A → B) = support(A ∪ B) / support(A)

Page 37:


Generating rules: an example

• Suppose {2, 3, 4} is frequent, with sup = 50%
– Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup = 50%, 50%, 75%, 75%, 75%, 75% respectively
– These generate the following association rules:
• 2,3 → 4, confidence = 100%
• 2,4 → 3, confidence = 100%
• 3,4 → 2, confidence = 67%
• 2 → 3,4, confidence = 67%
• 3 → 2,4, confidence = 67%
• 4 → 2,3, confidence = 67%
• All rules have support = 50%
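As a sketch of this rule-generation step (not from the slides), the following Python function enumerates every proper nonempty subset A of a frequent itemset and keeps A → B when the confidence clears minconf; the support table is taken from the {2, 3, 4} example above.

# Minimal sketch (not from the slides) of rule generation from one frequent
# itemset. The supports were recorded during itemset mining, so the data
# need not be scanned again.

from itertools import combinations

support = {                      # from the {2, 3, 4} example above
    (2, 3, 4): 0.50,
    (2, 3): 0.50, (2, 4): 0.50, (3, 4): 0.75,
    (2,): 0.75, (3,): 0.75, (4,): 0.75,
}

def rules_from_itemset(itemset, minconf):
    # Yield (A, B, confidence) for every rule A -> B with conf >= minconf.
    items = tuple(sorted(itemset))
    for r in range(1, len(items)):            # every proper nonempty subset
        for A in combinations(items, r):
            B = tuple(i for i in items if i not in A)
            conf = support[items] / support[A]
            if conf >= minconf:
                yield A, B, conf

for A, B, conf in rules_from_itemset((2, 3, 4), minconf=0.0):
    print(f"{A} -> {B}  conf = {conf:.0%}  sup = {support[(2, 3, 4)]:.0%}")
# Matches the slide: the two 2-item antecedents give 100%, the rest 67%.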

Page 38:


Generating rules: summary

• To recap, in order to obtain A → B, we need to have support(A ∪ B) and support(A)
• All the information required for the confidence computation has already been recorded during itemset generation, so there is no need to see the data T any more.
• This step is not as time-consuming as frequent itemset generation.

Page 39:


On Apriori Algorithm

Seems to be very expensive, but:
• Level-wise search
• K = the size of the largest itemset
• It makes at most K passes over the data
• In practice, K is bounded (10)
• The algorithm is very fast. Under some conditions, all rules can be found in linear time.
• Scales up to large data sets

Page 40:

Granularity of items

• One exception to the “ease” of applying association rules is selecting the granularity of the items.

• Should you choose:
– diet coke?
– coke product?
– soft drink?
– beverage?

• Should you include more than one level of granularity? Be careful!

• (Some association finding techniques allow you to represent hierarchies explicitly)

Page 41:

Multiple-Level Association Rules

• Items often form a hierarchy
– Items at the lower level are expected to have lower support
– Rules regarding itemsets at appropriate levels could be quite useful
– The transaction database can be encoded based on dimensions and levels

Item hierarchy:
Food
  Milk: Skim, 2%
  Bread: Wheat, White
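As an illustration (an assumption, not from the slides), one simple way to encode transactions at a higher level is to map each item to its ancestor in the hierarchy before mining:

# Minimal sketch (assumption, not from the slides): lifting items to a
# higher hierarchy level. The mapping mirrors the Food tree above; the
# exact item names are hypothetical.

PARENT = {                       # hypothetical child -> parent mapping
    "Skim": "Milk", "2% Milk": "Milk",
    "Wheat Bread": "Bread", "White Bread": "Bread",
    "Milk": "Food", "Bread": "Food",
}

def lift_to_level(transaction, levels=1):
    # Replace each item by its ancestor `levels` steps up the hierarchy.
    out = set()
    for item in transaction:
        for _ in range(levels):
            item = PARENT.get(item, item)   # stop at the root
        out.add(item)
    return out

basket = {"Skim", "Wheat Bread"}
print(lift_to_level(basket))     # {'Milk', 'Bread'} -- mine level-1 rules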

Page 42:

Mining Multi-Level Associations

• A top-down, progressive deepening approach
– First find high-level strong rules:
• milk → bread [20%, 60%]
– Then find their lower-level “weaker” rules:
• 2% milk → wheat bread [6%, 50%]
– When one threshold is set for all levels: if the support threshold is too high, it is possible to miss meaningful associations at low levels; if it is too low, uninteresting rules may be generated
• different minimum support thresholds across the levels lead to different algorithms (e.g., decrease min-support at lower levels)

• Variations on mining multiple-level association rules
– Level-crossed association rules:
• milk → wonder wheat bread
– Association rules with multiple, alternative hierarchies:
• 2% milk → wonder bread

Page 43:

Rule Generation

• Now that you have the frequent itemsets, you can generate association rules
– Split the frequent itemsets in all possible ways and prune if the confidence is below the min_confidence threshold
• Rules that are left are called strong rules
– You may be given a rule template that constrains the rules
• Rules with only one item on the right side
• Rules with two items on the left and one on the right
– Rules of the form {X, Y} → Z


Page 44:

Rules from Previous Example

• What are the strong rules of the form {X, Y} → Z if the confidence threshold is 75%?
– We start with {2,3,5}
• {2,3} → 5 (confidence = 2/2 = 100%): STRONG
• {3,5} → 2 (confidence = 2/2 = 100%): STRONG
• {2,5} → 3 (confidence = 2/3 = 66%): PRUNE!

• Note that in general you don’t just look at the frequent itemsets of maximum length. If we wanted strong rules of the form X → Y, we would look at the frequent 2-itemsets (L2).


Page 45:

Interestingness Measurements

• Objective measures
– Two popular measurements:
• Support
• Confidence

• Subjective measures (Silberschatz & Tuzhilin, KDD95)
A rule (pattern) is interesting if
– it is unexpected (surprising to the user); and/or
– actionable (the user can do something with it)


Page 46:

Criticism of Support and Confidence

• Example 1:
– Among 5000 students
• 3000 play basketball
• 3750 eat cereal
• 2000 both play basketball and eat cereal

– play basketball → eat cereal [40%, 66.7%] is misleading because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
– play basketball → not eat cereal [20%, 33.3%] is far more interesting, although it has lower support and confidence

             basketball   not basketball   sum (row)
cereal          2000          1750           3750
not cereal      1000           250           1250
sum (col.)      3000          2000           5000

Lift of A → B = P(B|A) / P(B), and a rule is interesting if its lift is not near 1.0.

What is the lift of play basketball → not eat cereal?

(1/3) / (1250/5000) = 1.33
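A quick Python check (not from the slides) of both rules from the 2×2 table above:

# Minimal sketch (not from the slides): confidence and lift for both rules
# from the basketball/cereal contingency table.

N = 5000
basketball = 3000
cereal = 3750
both = 2000                                        # basketball AND cereal

def lift(conf, p_consequent):
    return conf / p_consequent

conf_cereal = both / basketball                    # P(cereal | basketball)
conf_no_cereal = (basketball - both) / basketball  # P(no cereal | basketball)

print(lift(conf_cereal, cereal / N))               # 0.89 -> below 1: negative association
print(lift(conf_no_cereal, (N - cereal) / N))      # 1.33 -> above 1: positive association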

Page 47:

Customer Number vs. Transaction ID

• In the homework you may have a problem where there is a customer id for each transaction
– You can be asked to do association analysis based on the customer id
• If this is so, you need to aggregate the transactions to the customer level
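For example, a minimal Python sketch of this aggregation (the ids and data below are hypothetical, not from the slides):

# Minimal sketch (assumption, not from the slides): rolling transactions
# up to the customer level before mining.

from collections import defaultdict

# (customer_id, items) pairs -- one row per store visit (hypothetical data)
visits = [
    ("c1", {"Bread", "Milk"}),
    ("c1", {"Diaper", "Beer"}),
    ("c2", {"Milk", "Coke"}),
]

baskets = defaultdict(set)
for cust, items in visits:
    baskets[cust] |= items       # union of all items the customer ever bought

customer_level = list(baskets.values())
print(customer_level)            # [{'Bread','Milk','Diaper','Beer'}, {'Milk','Coke'}]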


Page 48:

Market-basket analysis and finding associations

• Do items occur together? (more than I might expect)

• Why might I care?
– merchandising
• e.g., placing products in a retail space (physical or electronic), catalog design
• packaging optional services
– recommendations
• cross-selling and up-selling opportunities
• mining credit-card data
• developing customer loyalty and self-investment
– fraud detection
• e.g., in insurance data, a doctor very often works on the cases of a particular lawyer
– simply understanding my business
• are there “investment profiles” of my clients?
• customer segmentation based on buying behavior
• is anything strange going on?

Page 49:

Virtual items

• If you’re interested in including other possible variables, you can create “virtual items”
• gift-wrap, used-coupon, new-store, winter-holidays, bought-nothing, …

Page 50:

Associations: Pros and Cons

• Pros
– can quickly mine patterns describing business/customers/etc. without major effort in problem formulation
– virtual items allow much flexibility
– unparalleled tool for hypothesis generation

• Cons
– unfocused
• not clear exactly how to apply mined “knowledge”
• only hypothesis generation
– can produce many, many rules!
• there may only be a few nuggets among them (or none)

Page 51:

Association Rules

• Association rule types:
– Actionable Rules – contain high-quality, actionable information
– Trivial Rules – information already well known by those familiar with the business
– Inexplicable Rules – have no explanation and do not suggest action

• Trivial and Inexplicable Rules occur most often