An Excel-based Data Mining Tool
Chapter 4
4.1 The iData Analyzer
[Flowchart: Data, Interface, PreProcessor, Heuristic Agent, Large Dataset? (Yes/No), Mining Technique (Neural Networks or ESX), Generate Rules? (Yes/No), RuleMaker, Rules, Explanation? (Yes/No), Report Generator, Excel Sheets]
Figure 4.1 The iDA system architecture
Figure 4.2 A successful installation
4.2 ESX: A Multipurpose Tool for Data Mining
[Tree diagram with three levels. Root Level: Root. Concept Level: C1, C2, ..., Cn. Instance Level: I11, I12, ..., I1j (under C1); I21, I22, ..., I2k (under C2); ...; In1, In2, ..., Inl (under Cn).]
Figure 4.3 An ESX concept hierarchy
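The three-level hierarchy in Figure 4.3 can be sketched as a small data structure. This is only an illustration of the root / concept / instance layering; the class and method names (Concept, ConceptHierarchy, add_instance) are hypothetical and do not describe how ESX is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept-level node (C1 ... Cn) holding its instance-level children."""
    name: str
    instances: list = field(default_factory=list)


@dataclass
class ConceptHierarchy:
    """The root node; its children are the concept-level nodes."""
    concepts: list = field(default_factory=list)

    def add_instance(self, concept_name, instance):
        # Attach the instance to an existing concept, or create the concept.
        for concept in self.concepts:
            if concept.name == concept_name:
                concept.instances.append(instance)
                return concept
        concept = Concept(concept_name, [instance])
        self.concepts.append(concept)
        return concept

    def level_sizes(self):
        # Number of nodes at each of the three levels.
        return {
            "root": 1,
            "concept": len(self.concepts),
            "instance": sum(len(c.instances) for c in self.concepts),
        }


hierarchy = ConceptHierarchy()
hierarchy.add_instance("C1", {"Sex": "Male", "Age": 45})
hierarchy.add_instance("C1", {"Sex": "Male", "Age": 42})
hierarchy.add_instance("C2", {"Sex": "Female", "Age": 40})
sizes = hierarchy.level_sizes()
```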
4.3 iDAV Format for Data Mining
Table 4.1 • Credit Card Promotion Database: iDAV Format
Income   Magazine   Watch      Life Insurance  Credit Card
Range    Promotion  Promotion  Promotion       Insurance    Sex     Age
C        C          C          C               C            C       R
I        I          I          I               I            I       I
40–50K   Yes        No         No              No           Male    45
30–40K   Yes        Yes        Yes             No           Female  40
40–50K   No         No         No              No           Male    42
30–40K   Yes        Yes        Yes             Yes          Male    43
50–60K   Yes        No         Yes             No           Female  38
20–30K   No         No         No              No           Female  55
30–40K   Yes        No         Yes             Yes          Male    35
20–30K   No         Yes        No              No           Male    27
30–40K   Yes        No         No              No           Male    43
30–40K   Yes        Yes        Yes             No           Female  41
40–50K   No         Yes        Yes             No           Female  43
20–30K   No         Yes        Yes             No           Male    29
50–60K   Yes        Yes        Yes             No           Female  39
40–50K   No         Yes        No              No           Male    55
20–30K   No         No         Yes             Yes          Female  19
Table 4.2 • Values for Attribute Usage
Character Usage
I The attribute is used as an input attribute.
U The attribute is not used.
D The attribute is not used for classification or clustering, but attribute value summary information is displayed in all output reports.
O The attribute is used as an output attribute. For supervised learning with ESX, exactly one categorical attribute is selected as the output attribute.
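The iDAV layout (a row of attribute names, a row of type codes, a row of usage codes, then the data) can be read with a few lines of code. The sketch below is not part of iDA; parse_idav is a hypothetical helper, and it assumes that the type code C marks a categorical attribute and R a real-valued (numeric) one.

```python
USAGE_CODES = {"I", "U", "D", "O"}  # usage characters from Table 4.2
TYPE_CODES = {"C", "R"}             # assumed: C = categorical, R = real-valued


def parse_idav(rows):
    """Split iDAV-style rows into attribute metadata and instances."""
    names, types, usages, *data = rows
    if not (len(names) == len(types) == len(usages)):
        raise ValueError("name, type, and usage rows must have equal length")
    for code in types:
        if code not in TYPE_CODES:
            raise ValueError(f"unknown type code: {code}")
    for code in usages:
        if code not in USAGE_CODES:
            raise ValueError(f"unknown usage code: {code}")

    attributes = [
        {"name": n, "type": t, "usage": u}
        for n, t, u in zip(names, types, usages)
    ]
    instances = []
    for row in data:
        if len(row) != len(names):
            raise ValueError("data row has the wrong number of values")
        # Convert real-valued columns to float; keep categorical values as text.
        instances.append({
            attr["name"]: float(value) if attr["type"] == "R" else value
            for attr, value in zip(attributes, row)
        })
    return attributes, instances


rows = [
    ["Income Range", "Sex", "Age"],
    ["C", "C", "R"],
    ["I", "I", "I"],
    ["40-50K", "Male", "45"],
    ["30-40K", "Female", "40"],
]
attributes, instances = parse_idav(rows)
input_names = [a["name"] for a in attributes if a["usage"] == "I"]
```

Selecting attributes by usage code, as input_names does, mirrors the way a mining session would ignore U attributes and treat an O attribute as the output.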
4.4 A Five-Step Approach for Unsupervised Clustering
Step 1: Enter the Data to be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Individual Class Results
Step 5: Visualize Individual Class Rules
Step 1: Enter the Data to be Mined
Figure 4.4 The Credit Card Promotion Database
Step 2: Perform a Data Mining Session
Figure 4.5 Unsupervised settings for ESX
Figure 4.6 RuleMaker options
Step 3: Read and Interpret Summary Results
• Class Resemblance Scores
• Domain Resemblance Score
• Domain Predictability
Figure 4.8 Summary statistics for the Acme credit card promotion database
Figure 4.9 Statistics for numerical attributes and common categorical attribute values
Step 4: Read and Interpret Individual Class Results
• Class Predictability is a within-class measure.
• Class Predictiveness is a between-class measure.
Figure 4.10 Class 3 summary results
Figure 4.11 Necessary and sufficient attribute values for Class 3
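The two class-level measures can be illustrated with a short calculation. The sketch below assumes the usual reading of these terms: class predictability is the fraction of instances within a class that have a given attribute value, and class predictiveness is the fraction of all instances with that value that fall in the class. Under that reading, a predictability of 1.0 marks a value as necessary for class membership and a predictiveness of 1.0 marks it as sufficient. The function names and the tiny data set are made up for illustration.

```python
def class_predictability(instances, classes, attribute, value, target):
    """Within-class measure: share of the target class having attribute == value."""
    members = [inst for inst, c in zip(instances, classes) if c == target]
    if not members:
        return 0.0
    return sum(1 for inst in members if inst[attribute] == value) / len(members)


def class_predictiveness(instances, classes, attribute, value, target):
    """Between-class measure: share of instances with the value that are in the target class."""
    holders = [c for inst, c in zip(instances, classes) if inst[attribute] == value]
    if not holders:
        return 0.0
    return sum(1 for c in holders if c == target) / len(holders)


# Hypothetical four-instance data set with two classes, A and B.
instances = [{"Sex": "Male"}, {"Sex": "Male"}, {"Sex": "Female"}, {"Sex": "Male"}]
classes = ["A", "A", "B", "B"]

male_in_a = class_predictability(instances, classes, "Sex", "Male", "A")
a_given_male = class_predictiveness(instances, classes, "Sex", "Male", "A")
female_in_b = class_predictability(instances, classes, "Sex", "Female", "B")
b_given_female = class_predictiveness(instances, classes, "Sex", "Female", "B")
```

In this toy data every member of class A is male, so Sex = Male is necessary for A but not sufficient, because one male belongs to class B. Sex = Female is sufficient for B but not necessary.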
Step 5: Visualize Individual Class Rules
Figure 4.7 Rules for the credit card promotion database
4.5 A Six-Step Approach for Supervised Learning
Step 1: Choose an Output Attribute
Step 2: Perform the Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Test Set Results
Step 5: Read and Interpret Class Results
Step 6: Visualize and Interpret Class Rules
Read and Interpret Test Set Results
Figure 4.12 Test set instance classification
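Test set classifications are commonly summarized by counting how often each actual class was assigned each predicted class. The sketch below shows that tally and the resulting accuracy; it is a generic illustration with made-up labels, not a reproduction of the iDA report, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual class, predicted class) pairs over the test set."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of test set instances whose predicted class matches the actual class."""
    if not actual:
        return 0.0
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical test set of five instances with a Yes/No output attribute.
actual = ["Yes", "Yes", "No", "No", "No"]
predicted = ["Yes", "No", "No", "No", "Yes"]

matrix = confusion_matrix(actual, predicted)
test_accuracy = accuracy(actual, predicted)
```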
4.6 Techniques for Generating Rules
1. Define the scope of the rules.
2. Choose the instances.
3. Set the minimum rule correctness.
4. Define the minimum rule coverage.
5. Choose an attribute significance value.
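The correctness and coverage thresholds in the list above can be illustrated by scoring a candidate rule against a set of instances. The sketch assumes that correctness is the fraction of instances matching the rule's conditions that belong to the rule's class, and that coverage is the fraction of the class's instances that the rule matches. The function names, the fractional thresholds, and the data are illustrative assumptions, not RuleMaker's actual interface.

```python
def rule_stats(conditions, target, instances, classes):
    """Return (correctness, coverage) for the rule: IF conditions THEN class = target."""
    matched = [
        c for inst, c in zip(instances, classes)
        if all(inst.get(attr) == val for attr, val in conditions.items())
    ]
    correct = sum(1 for c in matched if c == target)
    class_size = sum(1 for c in classes if c == target)
    correctness = correct / len(matched) if matched else 0.0
    coverage = correct / class_size if class_size else 0.0
    return correctness, coverage


def keep_rule(conditions, target, instances, classes,
              min_correctness=0.8, min_coverage=0.5):
    """Accept a rule only if it meets both minimum thresholds (assumed defaults)."""
    correctness, coverage = rule_stats(conditions, target, instances, classes)
    return correctness >= min_correctness and coverage >= min_coverage


# Hypothetical four-instance data set with two classes, A and B.
instances = [{"Sex": "Male"}, {"Sex": "Male"}, {"Sex": "Female"}, {"Sex": "Male"}]
classes = ["A", "A", "B", "B"]

male_rule = rule_stats({"Sex": "Male"}, "A", instances, classes)
female_rule = rule_stats({"Sex": "Female"}, "B", instances, classes)
male_kept = keep_rule({"Sex": "Male"}, "A", instances, classes)
female_kept = keep_rule({"Sex": "Female"}, "B", instances, classes)
```

Raising the minimum correctness tends to yield fewer, more reliable rules, while raising the minimum coverage discards rules that describe only a small part of a class.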
4.7 Instance Typicality
Typicality Scores
• Identify prototypical and outlier instances.
• Select a best set of training instances.
• Used to compute individual instance classification confidence scores.
Figure 4.13 Instance typicality
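A typicality score can be sketched as the average similarity of an instance to the other members of its class. That definition is an assumption here, as is the similarity measure: the code uses the fraction of matching attribute values as a stand-in, since the slides do not state how ESX computes similarity. All names and data are hypothetical.

```python
def similarity(a, b):
    """Stand-in similarity: fraction of attributes on which two instances agree."""
    return sum(1 for key in a if a[key] == b[key]) / len(a)


def typicality_scores(class_instances):
    """Average similarity of each instance to every other instance in the same class."""
    scores = []
    for i, inst in enumerate(class_instances):
        others = [o for j, o in enumerate(class_instances) if j != i]
        if not others:
            scores.append(1.0)  # a lone instance is trivially typical of its class
            continue
        scores.append(sum(similarity(inst, o) for o in others) / len(others))
    return scores


# Hypothetical class with three members.
members = [
    {"Sex": "Male", "Watch Promotion": "Yes"},
    {"Sex": "Male", "Watch Promotion": "Yes"},
    {"Sex": "Female", "Watch Promotion": "No"},
]
scores = typicality_scores(members)
outlier_index = scores.index(min(scores))
```

The instance with the lowest score is the outlier candidate, and the highest-scoring instances are the prototypical ones that would be preferred when selecting training data.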
4.8 Special Considerations and Features
• Avoid Mining Delays
• The Quick Mine Feature
• Erroneous and Missing Data