
WEKA

CS 595 Knowledge Discovery and Data Mining

Assignment #1: Evaluation Report for WEKA

(Waikato Environment for Knowledge Analysis)

Presented by: Manoj Wartikar, Sameer Sagade

Date: 14th March, 2000



Weka Machine Learning Project.

Machine Learning:
An exciting and potentially far-reaching development in contemporary computer science is the invention and application of methods of Machine Learning. These enable a computer program to automatically analyze a large body of data and decide what information is most relevant. This crystallized information can then be used to help people make decisions faster and more accurately.

One of the central problems of the information age is dealing with the enormous explosion in the amount of raw information that is available. Machine learning (ML) has the potential to sift through this mass of information and convert it into knowledge that people can use. So far, however, it has been used mainly on small problems under well-controlled conditions.

The aim of the Weka Project is to bring the technology out of the laboratory and provide solutions that can make a difference to people. The overall goal of this research programme is to build a state-of-the-art facility for the development of ML techniques.

Objectives:
The team at Waikato has incorporated several standard ML techniques into a software “workbench” called WEKA (Waikato Environment for Knowledge Analysis). With WEKA, a specialist in a particular field is able to use ML to derive useful knowledge from databases that are far too large to be analyzed by hand. The main objectives of WEKA are to:

1. Make machine learning (ML) techniques generally available;
2. Apply them to practical problems such as agriculture;
3. Develop new machine learning algorithms;
4. Design a theoretical framework for the field.

Documented Features:
WEKA presents a collection of algorithms for solving real-world data mining problems. The software is written in Java 2 and includes a uniform interface to the standard techniques in machine learning. The following data mining techniques are implemented in WEKA:

1. Attribute Selection
2. Clustering
3. Classifiers (both numeric and non-numeric)
4. Association Rules
5. Filters
6. Estimators



Of these options, only the classifiers, association rules, and filters are available as direct executables; all the remaining functions are available as APIs. The data required by the software is in the “.arff” format, and sample databases are provided with the software.
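For reference, an ARFF file is plain text: a header names the relation and declares each attribute (numeric, or nominal with its allowed values), followed by a @data section listing one comma-separated instance per line. A small sketch in the style of the bundled iris.arff (attribute names taken from the outputs later in this report; the two data rows are illustrative):

@relation iris

@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa, Iris-versicolor, Iris-virginica}

@data
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor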

Features:
The WEKA package comprises a number of classes arranged in an inheritance hierarchy; to execute any function, we create an instance of the corresponding class. The functionality of WEKA is organized around the steps of the machine learning process.
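Since most functionality is exposed through classes rather than executables, a program can drive WEKA directly from Java. The following is a minimal sketch of ours (not taken from the WEKA documentation), assuming the WEKA 3 API; it loads an ARFF file, builds a J48 tree, and classifies the first instance:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
import weka.classifiers.j48.J48;

public class WekaApiDemo {
    public static void main(String[] args) throws Exception {
        // Load the dataset and declare the last attribute as the class.
        Instances data = new Instances(
                new BufferedReader(new FileReader("data/iris.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // Create an instance of the classifier class and build the model.
        J48 tree = new J48();
        tree.buildClassifier(data);

        // Predict the class of the first instance and print its label.
        double pred = tree.classifyInstance(data.instance(0));
        System.out.println(data.classAttribute().value((int) pred));
    }
}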

Classifiers:
A classifier class prints out the classifier it builds (for example, a decision tree) for the dataset given as input; a ten-fold cross-validation estimate of its performance is also calculated. The classifiers package implements the most common techniques, separately for categorical and numerical class values.

a) Classifiers for categorical prediction:

1. weka.classifiers.IBk              K-nearest neighbor learner
2. weka.classifiers.j48.J48          C4.5 decision trees
3. weka.classifiers.j48.PART         Rule learner
4. weka.classifiers.NaiveBayes       Naive Bayes with/without kernels
5. weka.classifiers.OneR             Holte's OneR
6. weka.classifiers.KernelDensity    Kernel density classifier
7. weka.classifiers.SMO              Support vector machines
8. weka.classifiers.Logistic         Logistic regression
9. weka.classifiers.AdaBoostM1       AdaBoost
10. weka.classifiers.LogitBoost      LogitBoost
11. weka.classifiers.DecisionStump   Decision stumps (for boosting)



Sample Executions of the Various Categorical Classifier Algorithms:

K Nearest Neighbour Algorithm:

>java weka.classifiers.IBk -t data/iris.arff

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification

=== Error on training data ===

Correctly Classified Instances        150              100      %
Incorrectly Classified Instances        0                0      %
Mean absolute error                     0.0085
Root mean squared error                 0.0091
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 50  0 |  b = Iris-versicolor
  0  0 50 |  c = Iris-virginica
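(In these confusion matrices, each row is the actual class and each column the predicted class, so off-diagonal entries count misclassifications; on the training data above, all 150 instances lie on the diagonal.)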

=== Stratified cross-validation ===

Correctly Classified Instances        144               96      %
Incorrectly Classified Instances        6                4      %
Mean absolute error                     0.0356
Root mean squared error                 0.1618
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 47  3 |  b = Iris-versicolor
  0  3 47 |  c = Iris-virginica



J48 Pruned Tree Algorithm:

>java weka.classifiers.j48.J48 -t data/iris.arff

J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)
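The numbers in parentheses at each leaf give how many training instances reach that leaf and, after the slash, how many of those are misclassified; (48.0/1.0), for example, means 48 instances reach the leaf and 1 of them is mislabeled.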

Number of Leaves : 5

Size of the tree : 9

=== Error on training data ===

Correctly Classified Instances        147               98      %
Incorrectly Classified Instances        3                2      %
Mean absolute error                     0.0233
Root mean squared error                 0.108
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 49  1 |  b = Iris-versicolor
  0  2 48 |  c = Iris-virginica

=== Stratified cross-validation ===

Correctly Classified Instances        143               95.3333 %
Incorrectly Classified Instances        7                4.6667 %
Mean absolute error                     0.0391
Root mean squared error                 0.1707
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 49  1  0 |  a = Iris-setosa
  0 47  3 |  b = Iris-versicolor



=== Error on training data ===

Correctly Classified Instances        144               96      %
Incorrectly Classified Instances        6                4      %
Mean absolute error                     0.0324
Root mean squared error                 0.1495

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 48  2 |  b = Iris-versicolor
  0  4 46 |  c = Iris-virginica

SMO (support vector machines) and Logistic (logistic regression) can handle only two-class data sets, so they are not evaluated here.

AdaBoostM1 and LogitBoost are meta-algorithms that boost the performance of a base classifier (such as DecisionStump, which exists mainly to serve as their weak learner). The base algorithm is run repeatedly inside the booster, which monitors its performance and reweights the training instances so that later rounds concentrate on the mistakes of earlier ones.
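For example, a boosted run could be invoked as follows (a hypothetical command line, extrapolating the -W sub-classifier convention used with RegressionByDiscretization later in this report; verify the flags against the installed version):

> java weka.classifiers.AdaBoostM1 -t data/iris.arff -W weka.classifiers.DecisionStump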



b) Classifiers for numerical prediction:

1. weka.classifiers.LinearRegression             Linear regression
2. weka.classifiers.m5.M5Prime                   Model trees
3. weka.classifiers.IBk                          K-nearest neighbor learner
4. weka.classifiers.LWR                          Locally weighted regression
5. weka.classifiers.RegressionByDiscretization   Uses categorical classifiers

Sample Executions of the Various Numeric Classifier Algorithms:

Linear Regression Model:

> java weka.classifiers.LinearRegression -t data/cpu.arff

Linear Regression Model

class =

-152.7641 * vendor=microdata,formation,prime,harris,dec,wang,perkin-elmer,nixdorf,bti,sratus,dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
141.8644 * vendor=formation,prime,harris,dec,wang,perkin-elmer,nixdorf,bti,sratus,dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
-38.2268 * vendor=burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
39.4748 * vendor=cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
-39.5986 * vendor=honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
21.4119 * vendor=ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
-41.2396 * vendor=gould,siemens,nas,adviser,sperry,amdahl +
32.0545 * vendor=siemens,nas,adviser,sperry,amdahl +
-113.6927 * vendor=adviser,sperry,amdahl +
176.5204 * vendor=sperry,amdahl +
-51.2583 * vendor=amdahl +
0.0616 * MYCT +
0.0171 * MMIN +
0.0054 * MMAX +
0.6654 * CACH +
-1.4159 * CHMIN +
1.5538 * CHMAX +


-41.4854
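Each vendor=... term is a binary indicator arising from WEKA's conversion of the nominal vendor attribute: it contributes its coefficient only when the instance's vendor appears in the listed set. A prediction is then the sum of the active indicator terms, the weighted numeric attributes (MYCT through CHMAX), and the intercept of -41.4854.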

=== Error on training data ===

Correlation coefficient                 0.963
Mean absolute error                    28.4042
Root mean squared error                41.6084
Relative absolute error                32.5055 %
Root relative squared error            26.9508 %
Total Number of Instances             209

=== Cross-validation ===

Correlation coefficient                 0.9328
Mean absolute error                    35.014
Root mean squared error                55.6291
Relative absolute error                39.9885 %
Root relative squared error            35.9513 %
Total Number of Instances             209


Pruned Training Model Tree:

> java weka.classifiers.m5.M5Prime -t data/cpu.arff

Pruned training model tree:

MMAX <= 14000 : LM1 (141/4.18%)
MMAX > 14000 : LM2 (68/51.8%)

Models at the leaves:

Smoothed (complex):

LM1: class = 4.15
    - 2.05vendor=honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl
    + 5.43vendor=adviser,sperry,amdahl
    - 5.78vendor=amdahl
    + 0.00638MYCT
    + 0.00158MMIN
    + 0.00345MMAX
    + 0.552CACH
    + 1.14CHMIN
    + 0.0945CHMAX

LM2: class = -113
    - 56.1vendor=honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl
    + 10.2vendor=adviser,sperry,amdahl
    - 10.9vendor=amdahl
    + 0.012MYCT
    + 0.0145MMIN
    + 0.0089MMAX
    + 0.808CACH
    + 1.29CHMAX

Number of Leaves : 2
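To make a prediction, an instance is first routed down the tree (here by comparing its MMAX value against 14000) and the linear model at the leaf it reaches, LM1 or LM2, is then evaluated; different regions of the input space thus get different regression equations.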

=== Error on training data ===

Correlation coefficient                 0.9853
Mean absolute error                    13.4072
Root mean squared error                26.3977
Relative absolute error                15.3431 %
Root relative squared error            17.0985 %
Total Number of Instances             209

=== Cross-validation ===

Correlation coefficient                 0.9767
Mean absolute error                    13.1239
Root mean squared error                33.4455
Relative absolute error                14.9884 %


Root relative squared error            21.6147 %
Total Number of Instances             209


K Nearest Neighbour classifier Algorithm:

> java weka.classifiers.IBk -t data/cpu.arff

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification

=== Error on training data ===

Correlation coefficient                 1
Mean absolute error                     0
Root mean squared error                 0
Relative absolute error                 0      %
Root relative squared error             0      %
Total Number of Instances             209
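The perfect training-set figures are expected rather than impressive: with one nearest neighbour, every training instance is its own closest neighbour, so only the cross-validation results below say anything about generalization.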

=== Cross-validation ===

Correlation coefficient                 0.9475
Mean absolute error                    20.8589
Root mean squared error                53.8162
Relative absolute error                23.8223 %
Root relative squared error            34.7797 %
Total Number of Instances             209


Locally Weighted Regression:

> java weka.classifiers.LWR -t data/cpu.arff

Locally weighted regression
===========================
Using linear weighting kernels
Using all neighbours

=== Error on training data ===

Correlation coefficient                 0.9967
Mean absolute error                     8.9683
Root mean squared error                12.6133
Relative absolute error                10.2633 %
Root relative squared error             8.1699 %
Total Number of Instances             209

=== Cross-validation ===

Correlation coefficient                 0.9808
Mean absolute error                    14.9006
Root mean squared error                31.0836
Relative absolute error                17.0176 %
Root relative squared error            20.0884 %
Total Number of Instances             209


Regression by Discretization:

> java weka.classifiers.RegressionByDiscretization -t data/cpu.arff -W weka.classifiers.IBk

// The sub-classifier is selected from the categorical classifiers

Regression by discretization

Class attribute discretized into 10 values

Subclassifier: weka.classifiers.IBk

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification

=== Error on training data ===

Correlation coefficient                 0.9783
Mean absolute error                    32.0353
Root mean squared error                35.6977
Relative absolute error                36.6609 %
Root relative squared error            23.1223 %
Total Number of Instances             209

=== Cross-validation ===

Correlation coefficient                 0.9244
Mean absolute error                    41.5572
Root mean squared error                64.7253
Relative absolute error                47.4612 %
Root relative squared error            41.8299 %
Total Number of Instances             209
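The idea behind this method is to reduce regression to classification: the numeric class attribute is discretized into 10 intervals (as reported above), the chosen categorical classifier predicts an interval for each instance, and a numeric value derived from that interval is returned as the prediction.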


Association Rules:
Association rule mining finds interesting association or correlation relationships among a large set of data items. With massive amounts of data continuously being collected and stored in databases, many industries are becoming interested in mining association rules from their databases. For example, the discovery of interesting association relationships among huge amounts of business transaction records can help catalog design, cross marketing, loss-leader analysis, and other business decision-making processes.

A typical example of association rule mining is market basket analysis. This process analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. The discovery of such associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers. For instance, if customers are buying milk, how likely are they to also buy bread (and what kind of bread) on the same trip to the supermarket? Such information can lead to increased sales.
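Two numbers govern which rules are reported. The support of a rule is the fraction of transactions containing all of its items, and the confidence is that support divided by the fraction containing just the antecedent. With illustrative figures: if 10 of 100 baskets contain both milk and bread while 40 contain milk, the rule milk ==> bread has support 10/100 = 0.1 and confidence 10/40 = 0.25.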

The WEKA software efficiently produces association rules for a given data set. The Apriori algorithm is used as the foundation of the package. It outputs all the large (frequent) itemsets for the specified minimum support, together with the rules that meet the specified minimum confidence.

A typical output of the associations package is:

Apriori Algorithm:

> java weka.associations.Apriori -t data/weather.nominal.arff -I yes

Apriori
=======

Minimum support: 0.2
Minimum confidence: 0.9
Number of cycles performed: 17

Generated sets of large itemsets:

Size of set of large itemsets L(1): 12

Large Itemsets L(1):
outlook=sunny 5
outlook=overcast 4
outlook=rainy 5
temperature=hot 4
temperature=mild 6
temperature=cool 4


humidity=high 7
humidity=normal 7
windy=TRUE 6
windy=FALSE 8
play=yes 9
play=no 5

Size of set of large itemsets L(2): 47

Large Itemsets L(2):
outlook=sunny temperature=hot 2
outlook=sunny temperature=mild 2
outlook=sunny humidity=high 3
outlook=sunny humidity=normal 2
outlook=sunny windy=TRUE 2
outlook=sunny windy=FALSE 3
outlook=sunny play=yes 2
outlook=sunny play=no 3
outlook=overcast temperature=hot 2
outlook=overcast humidity=high 2
outlook=overcast humidity=normal 2
outlook=overcast windy=TRUE 2
outlook=overcast windy=FALSE 2
outlook=overcast play=yes 4
outlook=rainy temperature=mild 3
outlook=rainy temperature=cool 2
outlook=rainy humidity=high 2
outlook=rainy humidity=normal 3
outlook=rainy windy=TRUE 2
outlook=rainy windy=FALSE 3
outlook=rainy play=yes 3
outlook=rainy play=no 2
temperature=hot humidity=high 3
temperature=hot windy=FALSE 3
temperature=hot play=yes 2
temperature=hot play=no 2
temperature=mild humidity=high 4
temperature=mild humidity=normal 2
temperature=mild windy=TRUE 3
temperature=mild windy=FALSE 3
temperature=mild play=yes 4
temperature=mild play=no 2
temperature=cool humidity=normal 4
temperature=cool windy=TRUE 2
temperature=cool windy=FALSE 2
temperature=cool play=yes 3


humidity=high windy=TRUE 3
humidity=high windy=FALSE 4
humidity=high play=yes 3
humidity=high play=no 4
humidity=normal windy=TRUE 3
humidity=normal windy=FALSE 4
humidity=normal play=yes 6
windy=TRUE play=yes 3
windy=TRUE play=no 3
windy=FALSE play=yes 6
windy=FALSE play=no 2

Size of set of large itemsets L(3): 39

Large Itemsets L(3):
outlook=sunny temperature=hot humidity=high 2
outlook=sunny temperature=hot play=no 2
outlook=sunny humidity=high windy=FALSE 2
outlook=sunny humidity=high play=no 3
outlook=sunny humidity=normal play=yes 2
outlook=sunny windy=FALSE play=no 2
outlook=overcast temperature=hot windy=FALSE 2
outlook=overcast temperature=hot play=yes 2
outlook=overcast humidity=high play=yes 2
outlook=overcast humidity=normal play=yes 2
outlook=overcast windy=TRUE play=yes 2
outlook=overcast windy=FALSE play=yes 2
outlook=rainy temperature=mild humidity=high 2
outlook=rainy temperature=mild windy=FALSE 2
outlook=rainy temperature=mild play=yes 2
outlook=rainy temperature=cool humidity=normal 2
outlook=rainy humidity=normal windy=FALSE 2
outlook=rainy humidity=normal play=yes 2
outlook=rainy windy=TRUE play=no 2
outlook=rainy windy=FALSE play=yes 3
temperature=hot humidity=high windy=FALSE 2
temperature=hot humidity=high play=no 2
temperature=hot windy=FALSE play=yes 2
temperature=mild humidity=high windy=TRUE 2
temperature=mild humidity=high windy=FALSE 2
temperature=mild humidity=high play=yes 2
temperature=mild humidity=high play=no 2
temperature=mild humidity=normal play=yes 2
temperature=mild windy=TRUE play=yes 2
temperature=mild windy=FALSE play=yes 2
temperature=cool humidity=normal windy=TRUE 2


temperature=cool humidity=normal windy=FALSE 2
temperature=cool humidity=normal play=yes 3
temperature=cool windy=FALSE play=yes 2
humidity=high windy=TRUE play=no 2
humidity=high windy=FALSE play=yes 2
humidity=high windy=FALSE play=no 2
humidity=normal windy=TRUE play=yes 2
humidity=normal windy=FALSE play=yes 4

Size of set of large itemsets L(4): 6

Large Itemsets L(4):
outlook=sunny temperature=hot humidity=high play=no 2
outlook=sunny humidity=high windy=FALSE play=no 2
outlook=overcast temperature=hot windy=FALSE play=yes 2
outlook=rainy temperature=mild windy=FALSE play=yes 2
outlook=rainy humidity=normal windy=FALSE play=yes 2
temperature=cool humidity=normal windy=FALSE play=yes 2

Best rules found:

1. humidity=normal windy=FALSE 4 ==> play=yes 4 (1)
2. temperature=cool 4 ==> humidity=normal 4 (1)
3. outlook=overcast 4 ==> play=yes 4 (1)
4. temperature=cool play=yes 3 ==> humidity=normal 3 (1)
5. outlook=rainy windy=FALSE 3 ==> play=yes 3 (1)
6. outlook=rainy play=yes 3 ==> windy=FALSE 3 (1)
7. outlook=sunny humidity=high 3 ==> play=no 3 (1)
8. outlook=sunny play=no 3 ==> humidity=high 3 (1)
9. temperature=cool windy=FALSE 2 ==> humidity=normal play=yes 2 (1)
10. temperature=cool humidity=normal windy=FALSE 2 ==> play=yes 2 (1)
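Reading the rules: the count after the antecedent is the number of instances it covers, the count after the consequent is how many of those instances also satisfy it, and the figure in parentheses is the resulting confidence. Rule 1, for example, covers 4 instances with humidity=normal and windy=FALSE, all 4 of which have play=yes, giving a confidence of 1.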


Advantages, Disadvantages and Future Upgrades:

The WEKA system covers the entire machine learning (knowledge discovery) process. Although a research project, the WEKA system has been able to implement and evaluate a number of different algorithms for the different steps in the machine learning process.

The output and the information provided by the package are sufficient for an expert in machine learning and related topics. The results displayed by the system give a detailed description of the flow and the steps involved in the entire machine learning process. The outputs produced by different algorithms are easy to compare, which makes analysis easier.

ARFF is one of the most widely used data storage formats for research databases, which makes the system well suited to research-oriented projects.

This package provides a number of application program interfaces (APIs) which help novice data miners build their own systems on top of the “core WEKA system”.

Since the system provides a number of switches and options, we can customize the output of the system to suit our needs.

The first major disadvantage is that the system is Java based and requires a Java Virtual Machine to be installed for its execution. Since the system is driven entirely by command-line parameters and switches, it is difficult for an amateur to use efficiently. The textual interface and output make the results all the more difficult to interpret and visualize.

Important results such as pruned trees and other hierarchical outputs cannot be displayed graphically, making them difficult to visualize.

Although a commonly used format, ARFF is the only data format that the WEKA system supports.

While the current version, 3.0.1, has these bugs and disadvantages, the developers are working on a better system and have come up with a new version with a graphical user interface, making the system more complete.


Appendix

(Sample executions for other algorithms covered)


PART Decision List Algorithm

>java weka.classifiers.j48.PART -t data/iris.arff

PART decision list
------------------

petalwidth <= 0.6: Iris-setosa (50.0)

petalwidth <= 1.7 AND
petallength <= 4.9: Iris-versicolor (48.0/1.0)

: Iris-virginica (52.0/3.0)

Number of Rules : 3

=== Error on training data ===

Correctly Classified Instances        146               97.3333 %
Incorrectly Classified Instances        4                2.6667 %
Mean absolute error                     0.0338
Root mean squared error                 0.1301
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 47  3 |  b = Iris-versicolor
  0  1 49 |  c = Iris-virginica

=== Stratified cross-validation ===

Correctly Classified Instances        142               94.6667 %
Incorrectly Classified Instances        8                5.3333 %
Mean absolute error                     0.0454
Root mean squared error                 0.1805
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 49  1  0 |  a = Iris-setosa
  0 47  3 |  b = Iris-versicolor
  0  4 46 |  c = Iris-virginica


Naïve Bayes Classifier Algorithm:

> java weka.classifiers.NaiveBayes -t data/iris.arff

Naive Bayes Classifier

Class Iris-setosa: Prior probability = 0.33

sepallength: Normal Distribution. Mean = 4.9913 StandardDev = 0.355 WeightSum = 50 Precision = 0.10588235294117648
sepalwidth: Normal Distribution. Mean = 3.4015 StandardDev = 0.3925 WeightSum = 50 Precision = 0.10909090909090911
petallength: Normal Distribution. Mean = 1.4694 StandardDev = 0.1782 WeightSum = 50 Precision = 0.14047619047619048
petalwidth: Normal Distribution. Mean = 0.2743 StandardDev = 0.1096 WeightSum = 50 Precision = 0.11428571428571428

Class Iris-versicolor: Prior probability = 0.33

sepallength: Normal Distribution. Mean = 5.9379 StandardDev = 0.5042 WeightSum = 50 Precision = 0.10588235294117648
sepalwidth: Normal Distribution. Mean = 2.7687 StandardDev = 0.3038 WeightSum = 50 Precision = 0.10909090909090911
petallength: Normal Distribution. Mean = 4.2452 StandardDev = 0.4712 WeightSum = 50 Precision = 0.14047619047619048
petalwidth: Normal Distribution. Mean = 1.3097 StandardDev = 0.1915 WeightSum = 50 Precision = 0.11428571428571428

Class Iris-virginica: Prior probability = 0.33

sepallength: Normal Distribution. Mean = 6.5795 StandardDev = 0.6353 WeightSum = 50 Precision = 0.10588235294117648
sepalwidth: Normal Distribution. Mean = 2.9629 StandardDev = 0.3088 WeightSum = 50 Precision = 0.10909090909090911
petallength: Normal Distribution. Mean = 5.5516 StandardDev = 0.5529 WeightSum = 50 Precision = 0.14047619047619048
petalwidth: Normal Distribution. Mean = 2.0343 StandardDev = 0.2646 WeightSum = 50 Precision = 0.11428571428571428
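To classify a new instance, the classifier multiplies each class's prior probability by the Gaussian density of every attribute value under that class's distributions above, and predicts the class with the largest product.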

OneR Classifier Algorithm:


> java weka.classifiers.OneR -t data/iris.arff

petallength:
< 2.45  -> Iris-setosa
< 4.75  -> Iris-versicolor
>= 4.75 -> Iris-virginica

(143/150 instances correct)
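OneR builds its rule set from the single attribute whose discretized rules make the fewest errors on the training data; here that attribute is petallength, which alone classifies 143 of the 150 instances correctly.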

=== Error on training data ===

Correctly Classified Instances        143               95.3333 %
Incorrectly Classified Instances        7                4.6667 %
Mean absolute error                     0.0311
Root mean squared error                 0.1764
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 44  6 |  b = Iris-versicolor
  0  1 49 |  c = Iris-virginica

=== Stratified cross-validation ===

Correctly Classified Instances        142               94.6667 %
Incorrectly Classified Instances        8                5.3333 %
Mean absolute error                     0.0356
Root mean squared error                 0.1886
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 44  6 |  b = Iris-versicolor
  0  2 48 |  c = Iris-virginica

Kernel Density Algorithm:

> java weka.classifiers.KernelDensity -t data/iris.arff

Kernel Density Estimator


=== Error on training data ===

Correctly Classified Instances        148               98.6667 %
Incorrectly Classified Instances        2                1.3333 %
Mean absolute error                     0.0313
Root mean squared error                 0.0944
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 49  1 |  b = Iris-versicolor
  0  1 49 |  c = Iris-virginica

=== Stratified cross-validation ===

Correctly Classified Instances        144               96      %
Incorrectly Classified Instances        6                4      %
Mean absolute error                     0.0466
Root mean squared error                 0.1389
Total Number of Instances             150

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 48  2 |  b = Iris-versicolor
  0  4 46 |  c = Iris-virginica
