Final Exam Review

• The following is a list of items that you should review in preparation for the exam. Note that not every item in the following slides will necessarily be on the exam, and there may be items on the exam that are not on these slides.

Overview of three techniques

• Decision Tree
• Clustering
• Association Rule

What is classification?

• Determining to what group a data element belongs
  – Or “attributes” of that “entity”
• Examples
  – Determining whether a customer should be given a loan
  – Flagging a credit card transaction as a fraudulent charge
  – Categorizing a news story as finance, entertainment, or sports

What is Cluster Analysis?

Grouping data so that elements in a group will be
• Similar (or related) to one another
• Different from (or unrelated to) elements in other groups

Image source: http://www.baseball.bornbybits.com/blog/uploaded_images/Takashi_Saito-703616.gif

Distance within clusters is minimized

Distance between clusters is maximized
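
The two goals above can be made concrete with a small sketch. It assumes Euclidean distance and made-up 2-D points that are already assigned to clusters; "distance within" is measured as the average distance to the cluster's own centroid and "distance between" as the distance between centroids, which is one common choice among several.

```python
import math

# Hypothetical 2-D points already assigned to two clusters (made-up data).
clusters = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
}

def centroid(points):
    """Mean of each coordinate across the points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def mean_within_distance(points):
    """Average Euclidean distance from each point to its own cluster's centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

within = {name: mean_within_distance(pts) for name, pts in clusters.items()}
between = math.dist(centroid(clusters["A"]), centroid(clusters["B"]))

print("within-cluster distance:", {k: round(v, 3) for k, v in within.items()})
print("between-cluster distance:", round(between, 3))
```

A good clustering shows small within-cluster values next to a large between-cluster value, as these two well-separated groups do.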

Association Mining

Find out which items predict the occurrence of other items

Also known as “affinity analysis” or “market basket” analysis

Uses
• What products are bought together?
• Amazon’s recommendation engine
• Telephone calling patterns

Match Scenario with Data Mining Technique

• Which data mining technique (Decision Trees, Clustering, or Association Rules) would be most appropriate to answer each question below?

– What products are bought at the same time as coke?

– What is the probability that a 57-year-old female in a low-income family will die of cancer?

– How many types of customers visit fresh grocery?

Interpret your model

• You should be able to interpret your model from two aspects:
  – First, whether it is a good model
  – Second, how you can use your model to help you answer questions or make decisions

Basic Statistical Information

• Be able to understand the basics of your data by looking at the descriptive statistics in the Explore window
  – Distribution, average, range, etc.
  – What those numbers can tell you
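
The same kinds of descriptive statistics can be reproduced with Python's standard `statistics` module. This is a minimal sketch on made-up spending data, not the course data set; comparing the mean with the median is one quick way to see what the numbers say about the shape of the distribution.

```python
import statistics

# Hypothetical spending amounts for ten customers (made-up data).
spend = [12.0, 15.5, 9.0, 20.0, 14.0, 11.5, 250.0, 13.0, 10.0, 16.0]

summary = {
    "count": len(spend),
    "mean": statistics.mean(spend),
    "median": statistics.median(spend),
    "std_dev": statistics.stdev(spend),  # sample standard deviation
    "min": min(spend),
    "max": max(spend),
    "range": max(spend) - min(spend),
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")

# A mean far above the median hints that a few large values pull the average up.
```

Here one very large purchase lifts the mean well above the median, while the median stays close to what a typical customer spends.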

What can you tell from this histogram? Do most people spend a lot or not?
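
Reading a histogram comes down to seeing where the counts pile up. The sketch below bins made-up spending values into bins 25 units wide (an arbitrary choice) and prints a text histogram. When most counts sit in the lowest bins and a few bins stretch far to the right, the distribution is right-skewed: most people spend relatively little, and a few big spenders pull the average up.

```python
# Hypothetical spending amounts in whole dollars (made-up data).
spend = [5, 8, 12, 15, 18, 22, 25, 27, 33, 41, 48, 65, 90, 140]
BIN_WIDTH = 25

def histogram(values, width):
    """Count values per bin [start, start + width); empty bins are left out."""
    counts = {}
    for v in values:
        start = (v // width) * width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

bins = histogram(spend, BIN_WIDTH)
for start, count in bins.items():
    print(f"[{start}, {start + BIN_WIDTH}) {'#' * count}")

# Share of customers in the two lowest bins, i.e. spending under 50.
share_low = (bins.get(0, 0) + bins.get(25, 0)) / len(spend)
print(f"share spending under 50: {share_low:.0%}")
```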

Decision Tree

• Whether it is a good model
  – Use the Subtree Assessment Plot to find the average squared error and/or misclassification rate. A lower average squared error and a lower misclassification rate suggest a better model.
  – Think about why these numbers can tell you the optimal number of leaves.
• How to use your model
  – Follow the tree path that matches the description in your question.
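
The two assessment measures can be computed by hand for a binary target. This sketch assumes made-up validation records: an actual outcome (1 or 0) and the model's predicted probability of 1. A case is classified as 1 when its probability is at least 0.5 (the cutoff is an assumption), and the average squared error is the mean of (actual − predicted probability)². A modeling tool may report these figures with its own conventions.

```python
# Hypothetical validation data: actual outcome (1 = buys organic, 0 = does not)
# and the model's predicted probability of buying. Made-up values.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]

def misclassification_rate(y, p, cutoff=0.5):
    """Share of cases whose predicted class (p >= cutoff) differs from the actual class."""
    wrong = sum(1 for yi, pi in zip(y, p) if (pi >= cutoff) != (yi == 1))
    return wrong / len(y)

def average_squared_error(y, p):
    """Mean of (actual - predicted probability) squared."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

print("misclassification rate:", misclassification_rate(actual, prob))
print("average squared error:", round(average_squared_error(actual, prob), 4))
```

Two of the eight cases land on the wrong side of the cutoff (rate 0.25). The squared error also penalizes confident wrong guesses more than hesitant ones.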

Why is the optimal number of leaves 13?
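
The reasoning behind picking a leaf count can be sketched as follows. Error on the training data typically keeps falling as leaves are added, while error on validation data bottoms out and then rises once the tree starts to overfit; the subtree at the bottom of the validation curve is chosen. The numbers below are invented and do not reproduce the plot on the slide, and breaking ties toward the smaller tree is an assumption of this sketch.

```python
# Hypothetical subtree assessment results: number of leaves -> validation average squared error.
# Made-up values for illustration; real ones would be read off the Subtree Assessment Plot.
validation_ase = {
    1: 0.180, 2: 0.162, 3: 0.151, 4: 0.146, 5: 0.143,
    6: 0.143, 7: 0.144, 8: 0.146, 9: 0.149, 10: 0.152,
}

def optimal_leaves(errors):
    """Return the leaf count with the lowest validation error; ties go to the smaller tree."""
    return min(errors, key=lambda leaves: (errors[leaves], leaves))

best = optimal_leaves(validation_ase)
print(f"optimal number of leaves: {best} (validation ASE {validation_ase[best]})")
```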

What is the likelihood that a 52-year-old man with an affluence of 5 buys an organic product?
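
"Following the tree path" means checking one split at a time until a leaf is reached, then reading off that leaf's probability. In the sketch below the tree is hypothetical: the split variables, thresholds, and leaf probabilities are invented and are not the tree from the slide.

```python
# A hypothetical decision tree as nested dictionaries. Split variables, thresholds,
# and leaf probabilities are invented for illustration.
tree = {
    "split": ("affluence", 8),      # go "left" if affluence < 8, else "right"
    "left": {
        "split": ("age", 45),       # go "left" if age < 45, else "right"
        "left": {"leaf": 0.35},     # estimated P(buys organic) in this leaf
        "right": {"leaf": 0.12},
    },
    "right": {"leaf": 0.60},
}

def predict(node, record):
    """Follow the branch that matches the record until a leaf is reached."""
    path = []
    while "leaf" not in node:
        variable, threshold = node["split"]
        go_left = record[variable] < threshold
        path.append(f"{variable} {'<' if go_left else '>='} {threshold}")
        node = node["left"] if go_left else node["right"]
    return node["leaf"], path

probability, path = predict(tree, {"age": 52, "affluence": 5})
print(" -> ".join(path), "=>", probability)
```

With this made-up tree, the record goes left on affluence and right on age, landing in the leaf with probability 0.12.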

Cluster and Segment

• Whether it is a good model
  – You want higher cohesion within each cluster and higher separation between clusters.
  – A higher Root Mean Square Standard Deviation suggests lower cohesion. A greater distance to the nearest cluster suggests higher separation.
• How to use your model
  – Be able to tell how each cluster differs from your overall result.
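
Both measures can be sketched for made-up clusters. Here the root mean square standard deviation is taken as the root mean square, across variables, of each variable's within-cluster standard deviation, and the distance to the nearest cluster as the Euclidean distance between centroids. These follow common definitions; the software's output may compute them slightly differently (for example, with a different divisor).

```python
import math

# Hypothetical clusters of customers on two standardized variables (made-up data).
clusters = {
    1: [(0.1, 0.2), (0.3, 0.1), (0.2, 0.3)],
    2: [(2.0, 2.2), (2.4, 1.8), (1.6, 2.0)],
    3: [(0.0, 3.0), (0.5, 3.5), (1.0, 2.5)],
}

def centroid(points):
    """Mean of each variable within the cluster."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def rms_std(points):
    """Root mean square, across variables, of each variable's within-cluster standard deviation."""
    variances = []
    for coord in zip(*points):
        mean = sum(coord) / len(coord)
        variances.append(sum((x - mean) ** 2 for x in coord) / (len(coord) - 1))
    return math.sqrt(sum(variances) / len(variances))

centers = {k: centroid(pts) for k, pts in clusters.items()}
nearest = {
    k: min(math.dist(centers[k], centers[j]) for j in clusters if j != k)
    for k in clusters
}

for k, pts in clusters.items():
    print(f"cluster {k}: RMS std {rms_std(pts):.3f} (lower = more cohesive), "
          f"distance to nearest cluster {nearest[k]:.3f} (higher = better separated)")
```

In this toy data, cluster 1 is the most cohesive (RMS std 0.1), and cluster 2 sits closest to a neighbor.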

Which model is better in terms of cluster cohesion?

For each model, which cluster has the highest cohesion?

How might the maximum number of clusters in your model affect cohesion and separation?

Are the sales of stretch jeans in cluster 2 higher than the average sales of stretch jeans across the entire population?
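
Comparing a cluster against the overall result amounts to putting the cluster's average next to the population average. This minimal sketch uses invented records of (cluster id, stretch-jeans sales). A ratio above 1 means the cluster buys more than the population as a whole.

```python
from statistics import mean

# Hypothetical records: (cluster id, stretch-jeans sales). Made-up values.
records = [(1, 20), (1, 35), (2, 60), (2, 75), (2, 55), (3, 10), (3, 25)]

overall = mean(sales for _, sales in records)

by_cluster = {}
for cluster, sales in records:
    by_cluster.setdefault(cluster, []).append(sales)

for cluster, values in sorted(by_cluster.items()):
    cluster_mean = mean(values)
    ratio = cluster_mean / overall
    print(f"cluster {cluster}: mean {cluster_mean:.1f} vs overall {overall:.1f} ({ratio:.2f}x)")
```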

Association Rule

• Whether it is a good model
  – Confidence: the chance that Y is bought when X has been bought
  – Support: the chance that X and Y are bought together
  – Lift: the ratio of confidence to the overall chance that Y is bought; that is, how much more often X and Y are bought together than they would be by coincidence
• How to use your model
  – Be able to give suggestions based on your analysis
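
The three measures can be computed directly from transaction data. This sketch uses made-up baskets and the standard definitions for a rule X → Y: support = P(X and Y), confidence = P(Y | X) = support / P(X), and lift = confidence / P(Y).

```python
# Hypothetical shopping baskets (made-up data).
baskets = [
    {"chips", "salsa"},
    {"chips", "salsa", "soda"},
    {"chips", "salsa"},
    {"soda", "nuts"},
    {"bread", "milk"},
    {"chips", "soda"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support(items):
    """Share of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)

def rule_metrics(x, y):
    """Support, confidence, and lift of the rule X -> Y."""
    both = support({x, y})
    confidence = both / support({x})   # P(Y | X)
    lift = confidence / support({y})   # P(Y | X) / P(Y)
    return both, confidence, lift

s, c, l = rule_metrics("chips", "salsa")
print(f"chips -> salsa: support={s:.3f}, confidence={c:.3f}, lift={l:.3f}")
```

A lift of 2 means that, in these baskets, buyers of chips pick up salsa twice as often as shoppers in general do.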

Is coke more often bought with beer or with Pepsi? Why?

Can you suggest two products that should be placed close to each other? Can you suggest two products that should not be placed together? Why?
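
One way to turn lift into placement suggestions is to rank all product pairs by lift. The sketch below uses invented baskets and the symmetric form lift = P(A and B) / (P(A) · P(B)). Pairs above 1 co-occur more often than chance predicts and are candidates for nearby placement. Pairs below 1 co-occur less often, which can point to substitutes, though with small counts it may just be noise.

```python
from itertools import combinations

# Hypothetical baskets (made-up data).
baskets = [
    {"bread", "butter"}, {"bread", "butter", "jam"}, {"bread", "butter"},
    {"tea", "biscuits"}, {"coffee", "biscuits"}, {"tea", "jam"},
    {"coffee", "bread"}, {"tea", "biscuits"},
]

def share(*items):
    """Fraction of baskets that contain all of the given items."""
    return sum(1 for b in baskets if set(items) <= b) / len(baskets)

def lift(a, b):
    """P(a and b) / (P(a) * P(b)); 1.0 means the pair co-occurs exactly as often as chance predicts."""
    return share(a, b) / (share(a) * share(b))

products = sorted(set().union(*baskets))
ranked = sorted(((lift(a, b), a, b) for a, b in combinations(products, 2)), reverse=True)

for value, a, b in ranked:
    if value > 1:
        hint = "more than chance: candidates to place near each other"
    elif value < 1:
        hint = "less than chance: little reason to place together"
    else:
        hint = "no association"
    print(f"{a} + {b}: lift {value:.2f} ({hint})")
```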