Introduction of Data Mining and Association Rules

Introduction of Data Mining and Association Rules cs157 Spring 2009 Instructor: Dr. Sin-Min Lee Student: Dongyi Jia


Transcript of Introduction of Data Mining and Association Rules

Page 1: Introduction  of Data Mining and Association Rules

Introduction of Data Mining and Association Rules

cs157 Spring 2009 Instructor: Dr. Sin-Min Lee

Student: Dongyi Jia

Page 2: Introduction  of Data Mining and Association Rules

What is data mining?

The automated extraction of hidden predictive information from databases.

Allows users to analyze large databases to solve business decision problems.

An extension of statistics, with a few artificial intelligence and machine learning twists thrown in.

Attempts to discover rules and patterns from data.

Page 3: Introduction  of Data Mining and Association Rules

Data Mining - On What Kind of Data

In principle, data mining should be applicable to any kind of information repository:

● relational databases

● data warehouses

● transactional and advanced databases

● flat files

● World Wide Web

Page 4: Introduction  of Data Mining and Association Rules

Data Mining Functionalities: What Kinds of Patterns Can Be Mined?

● Association Analysis

● Classification and Prediction

● Cluster Analysis

● Evolution Analysis

Page 5: Introduction  of Data Mining and Association Rules

Applications of data mining

Some applications require some sort of prediction: for example, when a person applies for a credit card, the credit-card company wants to predict whether the person is a good credit risk.

Others look for associations: for example, if a customer buys a book, an online bookstore may suggest other associated books.

Page 6: Introduction  of Data Mining and Association Rules

Association Rule Discovery

Task: discovering association rules among items in a transaction database.

How are association rules mined from a large database?

1. Find all frequent itemsets: each of these itemsets must occur at least as frequently as a pre-determined minimum support count.

2. Generate strong association rules from the frequent itemsets: these rules must satisfy minimum support and minimum confidence.
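The two steps above can be sketched in Python. This is a minimal brute-force illustration, not the algorithm used in the lecture: the function names `frequent_itemsets` and `strong_rules`, the grocery transactions, and the threshold values are all assumptions made for the example. It enumerates candidate itemsets by size and stops once no itemset of a given size is frequent, since no larger itemset can be frequent either.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support_count):
    """Step 1: map every itemset that meets the minimum support count to its count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support_count:
                frequent[candidate] = count
                found_any = True
        if not found_any:
            break  # no frequent itemset of this size, so none of a larger size
    return frequent

def strong_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent => consequent."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so the lookup succeeds.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Hypothetical data and thresholds, chosen only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk", "eggs"},
]
frequent = frequent_itemsets(transactions, min_support_count=2)
rules = strong_rules(frequent, min_confidence=0.7)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "=>", sorted(consequent), round(confidence, 2))
```

Because the rules are built only from itemsets that already passed step 1, each reported rule meets the minimum support as well as the minimum confidence. Brute-force enumeration is only practical for a handful of items; it is used here to keep the two steps visible.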

Page 7: Introduction  of Data Mining and Association Rules

Association Rules (cont.)

Retail shops are often interested in associations between items that people buy.

Someone who buys bread is quite likely also to buy milk.

association rule: bread => milk

A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts.

association rule: DSC => OSC

Page 8: Introduction  of Data Mining and Association Rules

Association Rules (cont.)

Two numbers:

Support: a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.

Confidence: a measure of how often the consequent is true when the antecedent is true.

Page 9: Introduction  of Data Mining and Association Rules

Association Rules (cont.)

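Let $I = \{i_1, i_2, \ldots, i_m\}$ be the total set of items.

$D$ is a set of transactions.

$d$ is one transaction, consisting of a set of items: $d \subseteq I$.

Association rule: $X \Rightarrow Y$, where $X \subseteq I$, $Y \subseteq I$, and $X \cap Y = \emptyset$.

$\text{support} = \dfrac{\#\text{ of transactions containing } X \cup Y}{|D|}$

$\text{confidence} = \dfrac{\#\text{ of transactions containing } X \cup Y}{\#\text{ of transactions containing } X}$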
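The two ratios defined above translate directly into code. The sketch below assumes each transaction is a Python set; the helper names `count_containing`, `support`, and `confidence`, and the sample data, are hypothetical and chosen only for illustration.

```python
def count_containing(transactions, itemset):
    """Number of transactions d in D that contain every item of itemset."""
    return sum(1 for d in transactions if itemset <= d)

def support(transactions, X, Y):
    """Fraction of all transactions that contain X union Y."""
    if X & Y:
        raise ValueError("X and Y must be disjoint")
    return count_containing(transactions, X | Y) / len(transactions)

def confidence(transactions, X, Y):
    """Among the transactions that contain X, the fraction that also contain Y."""
    if X & Y:
        raise ValueError("X and Y must be disjoint")
    n_x = count_containing(transactions, X)
    if n_x == 0:
        raise ValueError("X never occurs, so confidence is undefined")
    return count_containing(transactions, X | Y) / n_x

# Hypothetical transaction database D with four transactions.
D = [
    {"bread", "milk"},
    {"bread"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
X, Y = {"bread"}, {"milk"}
print(support(D, X, Y), confidence(D, X, Y))
```

Both measures share the same numerator. Only the denominator differs: all of D for support, and the transactions containing X for confidence.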

Page 10: Introduction  of Data Mining and Association Rules

Example

Example of transaction data:

1. CD player, music CD, music book

2. CD player, music CD

3. music CD, music book

4. CD player

I = {CD player, music CD, music book}

|D| = 4

Number of transactions containing both CD player and music CD = 2

Number of transactions containing CD player = 3

CD player => music CD (sup = 2/4, conf = 2/3)
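The arithmetic in this example can be checked with a short script. The sketch below uses the four transactions from the slide, with the items spelled "CD player", "music CD", and "music book", and computes support and confidence for every rule between two single items. The variable names are illustrative. Exact fractions are used, so 2/4 is shown in lowest terms as 1/2.

```python
from fractions import Fraction
from itertools import permutations

# The four transactions from the example.
transactions = [
    {"CD player", "music CD", "music book"},
    {"CD player", "music CD"},
    {"music CD", "music book"},
    {"CD player"},
]
items = sorted(set().union(*transactions))

def count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

table = {}
for x, y in permutations(items, 2):
    sup = Fraction(count({x, y}), len(transactions))
    conf = Fraction(count({x, y}), count({x}))
    table[(x, y)] = (sup, conf)
    print(f"{x} => {y}: sup = {sup}, conf = {conf}")
```

The entry for CD player => music CD reproduces the figures on the slide. The reverse rule, music CD => CD player, has the same support and happens to have the same confidence here, because both items occur in three transactions. In general the two directions of a rule share support but not confidence, as music CD => music book and music book => music CD show.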

Page 11: Introduction  of Data Mining and Association Rules

Association Rules (cont.)

Rule support and confidence reflect the usefulness and certainty of discovered rules.

A support of 50% for the association rule CD player => music CD means that CD player and music CD are purchased together in 50% of all the transactions under analysis.

A confidence of 67% means that 67% of the customers who purchased a CD player also bought a music CD.

Page 12: Introduction  of Data Mining and Association Rules

Strong Association Rule

User sets support and confidence thresholds.

Rules above support threshold have LARGE support.

Rules above confidence threshold have HIGH confidence.

Rules satisfying both are said to be STRONG.
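The threshold test can be sketched as a small function. The name `classify_rule` and the threshold values are assumptions for illustration. The sketch also assumes that a rule exactly at a threshold counts as meeting it (`>=`), which the slide does not specify.

```python
def classify_rule(sup, conf, min_sup, min_conf):
    """Label a rule as LARGE support, HIGH confidence, and STRONG (both)."""
    large = sup >= min_sup    # assumption: meeting the threshold exactly counts
    high = conf >= min_conf
    return {"large_support": large, "high_confidence": high, "strong": large and high}

# A rule with support 2/4 and confidence 2/3, under two hypothetical threshold settings.
lenient = classify_rule(2 / 4, 2 / 3, min_sup=0.4, min_conf=0.6)
strict = classify_rule(2 / 4, 2 / 3, min_sup=0.4, min_conf=0.7)
print(lenient)
print(strict)
```

Under the lenient setting the rule is strong. Under the stricter confidence threshold it still has large support but no longer has high confidence, so it is not strong.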

Page 13: Introduction  of Data Mining and Association Rules

References

Professor Lee’s lectures: http://www.cs.sjsu.edu/~lee/cs157b/cs157b.html

Rui Zhao, SJSU: http://www.cs.sjsu.edu/~lee/cs157b/cs157b.html

Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers

Page 14: Introduction  of Data Mining and Association Rules

Thank you!