An Excel-based Data Mining Tool


smiertsc/4397cis/Roiger_Chapter04.pdf · Microsoft PowerPoint - Roiger_Chapter04.pptx · Author: rcte2 · Created Date: 9/27/2011


Chapter 4

4.1 The iData Analyzer

Figure 4.1 The iDA system architecture (diagram components: Data Interface, PreProcessor, Heuristic Agent, Large Dataset decision, Mining Technique: ESX or Neural Networks, Generate Rules decision, RuleMaker, Rules, Explanation decision, Report Generator, Excel Sheets)

Figure 4.2 A successful installation

4.2 ESX: A Multipurpose Tool for Data Mining

Figure 4.3 An ESX concept hierarchy (three levels: a Root Level; a Concept Level with concepts C1, C2, …, Cn; and an Instance Level, where C1 holds instances I11, I12, …, I1j, C2 holds I21, I22, …, I2k, and Cn holds In1, In2, …, Inl)
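The figure shows only the shape of the hierarchy. There is a root, a concept level of classes C1 … Cn, and an instance level beneath each concept. The Python below is a minimal sketch that models this shape as a plain three-level tree. The class names `RootNode` and `ConceptNode` and the method `add_instance` are hypothetical. The sketch also leaves out whatever summary information ESX actually stores at each node, because the slide does not describe it.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Instance = Dict[str, str]


@dataclass
class ConceptNode:
    """Concept level: one class or cluster (C1 ... Cn in Figure 4.3)."""
    name: str
    instances: List[Instance] = field(default_factory=list)  # instance level


@dataclass
class RootNode:
    """Root level: holds every concept node of the domain (hypothetical model)."""
    concepts: Dict[str, ConceptNode] = field(default_factory=dict)

    def add_instance(self, concept: str, instance: Instance) -> None:
        # Create the concept node on first use, then attach the instance beneath it.
        node = self.concepts.setdefault(concept, ConceptNode(concept))
        node.instances.append(instance)

    def all_instances(self) -> List[Instance]:
        # Walking every concept node recovers the full set of domain instances.
        return [inst for node in self.concepts.values() for inst in node.instances]


root = RootNode()
root.add_instance("C1", {"Sex": "Male", "Watch Promotion": "No"})
root.add_instance("C1", {"Sex": "Male", "Watch Promotion": "Yes"})
root.add_instance("C2", {"Sex": "Female", "Watch Promotion": "Yes"})
print(len(root.concepts), len(root.all_instances()))  # 2 3
```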

4.3 iDAV Format for Data Mining

Table 4.1 • Credit Card Promotion Database: iDAV Format

| Income Range | Magazine Promotion | Watch Promotion | Life Insurance Promotion | Credit Card Insurance | Sex | Age |
|---|---|---|---|---|---|---|
| C | C | C | C | C | C | R |
| I | I | I | I | I | I | I |
| 40–50K | Yes | No | No | No | Male | 45 |
| 30–40K | Yes | Yes | Yes | No | Female | 40 |
| 40–50K | No | No | No | No | Male | 42 |
| 30–40K | Yes | Yes | Yes | Yes | Male | 43 |
| 50–60K | Yes | No | Yes | No | Female | 38 |
| 20–30K | No | No | No | No | Female | 55 |
| 30–40K | Yes | No | Yes | Yes | Male | 35 |
| 20–30K | No | Yes | No | No | Male | 27 |
| 30–40K | Yes | No | No | No | Male | 43 |
| 30–40K | Yes | Yes | Yes | No | Female | 41 |
| 40–50K | No | Yes | Yes | No | Female | 43 |
| 20–30K | No | Yes | Yes | No | Male | 29 |
| 50–60K | Yes | Yes | Yes | No | Female | 39 |
| 40–50K | No | Yes | No | No | Male | 55 |
| 20–30K | No | No | Yes | Yes | Female | 19 |

Table 4.2 • Values for Attribute Usage

| Character | Usage |
|---|---|
| I | The attribute is used as an input attribute. |
| U | The attribute is not used. |
| D | The attribute is not used for classification or clustering, but attribute value summary information is displayed in all output reports. |
| O | The attribute is used as an output attribute. For supervised learning with ESX, exactly one categorical attribute is selected as the output attribute. |
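Tables 4.1 and 4.2 together describe the three header rows of an iDAV sheet:

- attribute names;
- a type row (Table 4.1 uses C for categorical and R for real-valued);
- a usage row holding one of the codes above.

The Python below is a minimal sketch of reading such a sheet from comma-separated text. Several parts are assumptions for illustration, not iDA's own loader: the function name `parse_idav`, the CSV encoding, and the restriction to the C and R type codes. The output check encodes only the ESX rule stated in Table 4.2.

```python
import csv
import io
from typing import Any, Dict, List, Tuple

TYPE_CODES = {"C", "R"}             # categorical, real (the codes used in Table 4.1)
USAGE_CODES = {"I", "U", "D", "O"}  # the codes listed in Table 4.2


def parse_idav(text: str) -> Tuple[List[Dict[str, str]], List[Dict[str, Any]]]:
    """Split iDAV-style CSV text into attribute metadata and typed instances."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    names, types, usages, data = rows[0], rows[1], rows[2], rows[3:]
    if not (len(names) == len(types) == len(usages)):
        raise ValueError("header rows must have the same number of columns")

    attributes = []
    for name, kind, usage in zip(names, types, usages):
        if kind not in TYPE_CODES:
            raise ValueError(f"{name}: unknown type code {kind!r}")
        if usage not in USAGE_CODES:
            raise ValueError(f"{name}: unknown usage code {usage!r}")
        attributes.append({"name": name, "type": kind, "usage": usage})

    # ESX supervised learning allows one categorical output attribute.
    # Zero outputs is accepted here (assumed to mean unsupervised clustering).
    outputs = [a for a in attributes if a["usage"] == "O"]
    if len(outputs) > 1 or any(a["type"] != "C" for a in outputs):
        raise ValueError("at most one output attribute, and it must be categorical")

    instances = []
    for row in data:
        if len(row) != len(names):
            raise ValueError(f"row has {len(row)} values, expected {len(names)}")
        # Real-valued (R) columns become floats; categorical (C) values stay strings.
        instances.append({a["name"]: float(v) if a["type"] == "R" else v
                          for a, v in zip(attributes, row)})
    return attributes, instances


sample = """Income Range,Life Insurance Promotion,Sex,Age
C,C,C,R
I,O,D,I
40–50K,No,Male,45
30–40K,Yes,Female,40"""

attrs, insts = parse_idav(sample)
print(insts[0])
# {'Income Range': '40–50K', 'Life Insurance Promotion': 'No', 'Sex': 'Male', 'Age': 45.0}
```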

4.4 A Five-step Approach for Unsupervised Clustering

Step 1: Enter the Data to be Mined

Step 2: Perform a Data Mining Session

Step 3: Read and Interpret Summary Results

Step 4: Read and Interpret Individual Class Results

Step 5: Visualize Individual Class Rules

Step 1: Enter the Data to be Mined

Figure 4.4 The Credit Card Promotion Database

Step 2: Perform a Data Mining Session

Figure 4.5 Unsupervised settings for ESX

Figure 4.6 RuleMaker options

Step 3: Read and Interpret Summary Results

• Class Resemblance Scores

• Domain Resemblance Score

• Domain Predictability
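The slide lists these summary scores without defining them. The sketch below rests on three assumptions:

- Resemblance is the average pairwise similarity of a set of instances. Computed over one class, it gives class resemblance; computed over the whole dataset, it gives domain resemblance.
- Domain predictability is the fraction of all instances that hold a given categorical value.
- Two instances' similarity is the share of attributes on which they agree. This is an illustrative stand-in. ESX's own measure may differ, including how it treats numeric attributes such as Age.

The four instances are rows 1–4 of Table 4.1 (Magazine, Watch and Life Insurance Promotion only), split into two hypothetical clusters. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

Instance = Dict[str, str]


def similarity(a: Instance, b: Instance) -> float:
    # Stand-in measure: fraction of attributes on which the two instances agree.
    return sum(a[k] == b[k] for k in a) / len(a)


def resemblance(instances: List[Instance]) -> float:
    # Average pairwise similarity. Over one class -> class resemblance;
    # over every instance in the data -> domain resemblance.
    pairs = list(combinations(instances, 2))
    if not pairs:
        raise ValueError("need at least two instances")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def domain_predictability(instances: List[Instance], attr: str, value: str) -> float:
    # Fraction of all instances whose categorical attribute `attr` equals `value`.
    return sum(inst[attr] == value for inst in instances) / len(instances)


def row(mag: str, watch: str, life: str) -> Instance:
    return {"Magazine": mag, "Watch": watch, "Life Insurance": life}


cluster_1 = [row("Yes", "No", "No"), row("No", "No", "No")]     # rows 1 and 3
cluster_2 = [row("Yes", "Yes", "Yes"), row("Yes", "Yes", "Yes")]  # rows 2 and 4
domain = cluster_1 + cluster_2

print(round(resemblance(cluster_1), 3))  # 0.667
print(round(resemblance(cluster_2), 3))  # 1.0
print(round(resemblance(domain), 3))     # 0.389
print(domain_predictability(domain, "Life Insurance", "Yes"))  # 0.5
```

Under these assumptions, both clusters score above the domain resemblance of about 0.389. Each cluster is therefore more internally alike than the four instances taken together.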

Figure 4.8 Summary statistics for the Acme credit card promotion database

Figure 4.9 Statistics for numerical attributes and common categorical attribute values

Step 4: Read and Interpret Individual Class Results

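• Class Predictability is a within-class measure: the fraction of a class's instances that have a given categorical attribute value.

• Class Predictiveness is a between-class measure: the probability that an instance belongs to a given class, given that it has a particular categorical attribute value.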
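Read as conditional probabilities, the two measures differ only in what they condition on:

- Class predictability is P(value | class), counted within one class.
- Class predictiveness is P(class | value), counted across all classes.

The Python below is a minimal sketch under that reading. It uses Life Insurance Promotion from Table 4.1 as the class and Sex as the attribute. The function names are hypothetical.

```python
from typing import List, Tuple

# (Sex, Life Insurance Promotion) for the 15 instances of Table 4.1;
# Life Insurance Promotion plays the role of the class here.
DATA: List[Tuple[str, str]] = [
    ("Male", "No"), ("Female", "Yes"), ("Male", "No"), ("Male", "Yes"),
    ("Female", "Yes"), ("Female", "No"), ("Male", "Yes"), ("Male", "No"),
    ("Male", "No"), ("Female", "Yes"), ("Female", "Yes"), ("Male", "Yes"),
    ("Female", "Yes"), ("Male", "No"), ("Female", "Yes"),
]


def class_predictability(data: List[Tuple[str, str]], value: str, cls: str) -> float:
    # Within-class: of the instances in `cls`, the fraction having `value`, P(value | cls).
    in_class = [v for v, c in data if c == cls]
    if not in_class:
        raise ValueError(f"no instances in class {cls!r}")
    return sum(v == value for v in in_class) / len(in_class)


def class_predictiveness(data: List[Tuple[str, str]], value: str, cls: str) -> float:
    # Between-class: of the instances having `value`, the fraction in `cls`, P(cls | value).
    with_value = [c for v, c in data if v == value]
    if not with_value:
        raise ValueError(f"no instances with value {value!r}")
    return sum(c == cls for c in with_value) / len(with_value)


print(round(class_predictability(DATA, "Female", "Yes"), 3))   # 0.667
print(round(class_predictiveness(DATA, "Female", "Yes"), 3))   # 0.857
```

Under this reading, two-thirds of the Life Insurance Promotion = Yes instances are female (predictability 0.667). Six of the seven female customers fall in that class (predictiveness 0.857). So Female is more predictive of the class than it is predictable within it.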

Figure 4.10 Class 3 summary results

Figure 4.11 Necessary and sufficient attribute values for Class 3
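One way to read Figure 4.11 assumes that class predictability and predictiveness are the within-class and between-class proportions described under Step 4. Under that reading:

- an attribute value is necessary for a class when every class member has it (predictability 1.0);
- it is sufficient when every instance that has it belongs to the class (predictiveness 1.0).

Figure 4.11 reports these values for the Class 3 cluster, but that cluster's membership is not listed here. The sketch therefore applies the test to the categorical columns of Table 4.1, with Life Insurance Promotion as the class. The function name is hypothetical.

```python
from typing import List, Tuple

ATTRS = ["Income Range", "Magazine Promotion", "Watch Promotion",
         "Credit Card Insurance", "Sex"]

# Categorical columns of Table 4.1, one tuple per instance; the last field
# is Life Insurance Promotion, used here as the class.
ROWS: List[Tuple[str, ...]] = [
    ("40–50K", "Yes", "No", "No", "Male", "No"),
    ("30–40K", "Yes", "Yes", "No", "Female", "Yes"),
    ("40–50K", "No", "No", "No", "Male", "No"),
    ("30–40K", "Yes", "Yes", "Yes", "Male", "Yes"),
    ("50–60K", "Yes", "No", "No", "Female", "Yes"),
    ("20–30K", "No", "No", "No", "Female", "No"),
    ("30–40K", "Yes", "No", "Yes", "Male", "Yes"),
    ("20–30K", "No", "Yes", "No", "Male", "No"),
    ("30–40K", "Yes", "No", "No", "Male", "No"),
    ("30–40K", "Yes", "Yes", "No", "Female", "Yes"),
    ("40–50K", "No", "Yes", "No", "Female", "Yes"),
    ("20–30K", "No", "Yes", "No", "Male", "Yes"),
    ("50–60K", "Yes", "Yes", "No", "Female", "Yes"),
    ("40–50K", "No", "Yes", "No", "Male", "No"),
    ("20–30K", "No", "No", "Yes", "Female", "Yes"),
]


def necessary_and_sufficient(rows, attrs, cls):
    """Return ([necessary (attr, value)], [sufficient (attr, value)]) for class `cls`."""
    in_class = [r for r in rows if r[-1] == cls]
    if not in_class:
        raise ValueError(f"no instances in class {cls!r}")
    necessary, sufficient = [], []
    for j, name in enumerate(attrs):
        for value in sorted({r[j] for r in rows}):
            with_value = [r for r in rows if r[j] == value]
            both = sum(r[-1] == cls for r in with_value)
            if both == len(in_class):    # predictability 1.0: every class member has it
                necessary.append((name, value))
            if both == len(with_value):  # predictiveness 1.0: everyone with it is in the class
                sufficient.append((name, value))
    return necessary, sufficient


print(necessary_and_sufficient(ROWS, ATTRS, "Yes"))
# ([], [('Income Range', '50–60K'), ('Credit Card Insurance', 'Yes')])
print(necessary_and_sufficient(ROWS, ATTRS, "No"))
# ([('Credit Card Insurance', 'No')], [])
```

With this data, Income Range = 50–60K and Credit Card Insurance = Yes are each sufficient, but not necessary, for Life Insurance Promotion = Yes. Credit Card Insurance = No is necessary, but not sufficient, for Life Insurance Promotion = No.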

Step 5: Visualize Individual Class Rules

Figure 4.7 Rules for the credit card promotion database

4.5 A Six-Step Approach for Supervised Learning

Step 1: Choose an Output Attribute

Step 2: Perform the Mining Session

Step 3: Read and Interpret Summary Results

Step 4: Read and Interpret Test Set Results

Step 5: Read and Interpret Class Results

Step 6: Visualize and Interpret Class Rules

Read and Interpret Test Set Results

Figure 4.12 Test set instance classification

4.6 Techniques for Generating Rules

1. Define the scope of the rules.

2. Choose the instances.

3. Set the minimum rule correctness.

4. Define the minimum rule coverage.

5. Choose an attribute significance value.
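Steps 3 and 4 act as two filters on candidate rules. The sketch below assumes two common definitions:

- rule correctness is the fraction of instances matching a rule's conditions that belong to the rule's class;
- rule coverage is the fraction of the class's instances that the rule's conditions match.

RuleMaker's exact definitions may differ, and step 5 (attribute significance) is not modeled. The data are the Sex, Credit Card Insurance and Life Insurance Promotion columns of Table 4.1. The names `rule_stats` and `keep_rule` and the thresholds 0.8 and 0.5 are hypothetical.

```python
from typing import Dict, List, Tuple

Instance = Dict[str, str]
LIFE = "Life Insurance Promotion"

# (Sex, Credit Card Insurance, Life Insurance Promotion) for the 15 rows of Table 4.1.
TRIPLES = [
    ("Male", "No", "No"), ("Female", "No", "Yes"), ("Male", "No", "No"),
    ("Male", "Yes", "Yes"), ("Female", "No", "Yes"), ("Female", "No", "No"),
    ("Male", "Yes", "Yes"), ("Male", "No", "No"), ("Male", "No", "No"),
    ("Female", "No", "Yes"), ("Female", "No", "Yes"), ("Male", "No", "Yes"),
    ("Female", "No", "Yes"), ("Male", "No", "No"), ("Female", "Yes", "Yes"),
]
DATA: List[Instance] = [
    {"Sex": s, "Credit Card Insurance": c, LIFE: l} for s, c, l in TRIPLES
]


def rule_stats(data: List[Instance], conditions: Dict[str, str],
               target: str, cls: str) -> Tuple[float, float]:
    """(correctness, coverage) of the rule IF conditions THEN target = cls."""
    matched = [i for i in data if all(i[a] == v for a, v in conditions.items())]
    in_class = [i for i in data if i[target] == cls]
    hits = sum(i[target] == cls for i in matched)
    correctness = hits / len(matched) if matched else 0.0
    coverage = hits / len(in_class) if in_class else 0.0
    return correctness, coverage


def keep_rule(data, conditions, target, cls, min_correctness, min_coverage) -> bool:
    # A rule survives only if it clears both user-set minimums.
    correctness, coverage = rule_stats(data, conditions, target, cls)
    return correctness >= min_correctness and coverage >= min_coverage


c, v = rule_stats(DATA, {"Credit Card Insurance": "Yes"}, LIFE, "Yes")
print(round(c, 3), round(v, 3))  # 1.0 0.333
c, v = rule_stats(DATA, {"Sex": "Female"}, LIFE, "Yes")
print(round(c, 3), round(v, 3))  # 0.857 0.667
print(keep_rule(DATA, {"Credit Card Insurance": "Yes"}, LIFE, "Yes", 0.8, 0.5))  # False
print(keep_rule(DATA, {"Sex": "Female"}, LIFE, "Yes", 0.8, 0.5))  # True
```

The first rule is always right but covers only a third of the class. The minimum-coverage setting discards it, while the less accurate but broader Sex = Female rule survives.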

4.7 Instance Typicality

Typicality Scores

• Identify prototypical and outlier instances.

• Select the best set of training instances.

• Compute individual instance classification confidence scores.
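The sketch below assumes a typicality score is an instance's average similarity to the other members of its class. High scores then mark prototypical instances, and low scores mark candidate outliers. Two further assumptions apply:

- the similarity function (the share of matching attribute values) is an illustrative stand-in, not ESX's measure;
- the four-instance class is made-up data, not taken from Table 4.1.

The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def similarity(x: Sequence[str], y: Sequence[str]) -> float:
    # Stand-in measure: fraction of attribute positions where x and y agree.
    return sum(a == b for a, b in zip(x, y)) / len(x)


def typicality_scores(members: List[Tuple[str, ...]]) -> List[float]:
    # Typicality of member i: mean similarity to every other member of the class.
    n = len(members)
    if n < 2:
        raise ValueError("need at least two class members")
    return [
        sum(similarity(members[i], members[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Hypothetical class of four instances (Magazine, Watch, Life Insurance Promotion).
members = [("Yes", "No", "No"), ("Yes", "No", "No"),
           ("Yes", "Yes", "No"), ("No", "No", "Yes")]
scores = typicality_scores(members)
print([round(s, 3) for s in scores])  # [0.667, 0.667, 0.444, 0.222]

prototype = max(range(len(members)), key=scores.__getitem__)  # most typical member
outlier = min(range(len(members)), key=scores.__getitem__)    # least typical member
print(prototype, outlier)  # 0 3
```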

Figure 4.13 Instance typicality

4.8 Special Considerations and Features

• Avoid Mining Delays

• The Quick Mine Feature

• Erroneous and Missing Data