
1

Weka: Practical Machine Learning Tools and Techniques with Java Implementations

Proceedings of the ICONIP/ANZIIS/ANNES'99 Workshop on Emerging Knowledge Engineering and Connectionist-

Based Information Systems, pages 192-196, 1999. Dunedin, New Zealand.

Ian H. Witten, Eibe Frank, Len Trigg, Mark Hall, Geoffrey Holmes, and Sally Jo Cunningham.

Reporter: Jin-huei Dai

2

OUTLINE

1. Introduction

2. The command-line interface

3. The Explorer

4. The Knowledge Flow interface

5. The Experimenter

6. Conclusions

7. References

3

1. Introduction

Data mining is an experimental science. Machine learning provides the technical basis of data mining.

The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It is designed so that users can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning.

Weka was developed at the University of Waikato in New Zealand, and the name stands for Waikato Environment for Knowledge Analysis.

4

1. Introduction (cont.)

Weka is freely available on the World-Wide Web and accompanies a new text on data mining that documents and fully explains all the algorithms it contains. The Weka software is written entirely in Java, so applications built on the Weka class libraries can run on any computer with Web-browsing capability; this lets users apply machine learning techniques to their own data regardless of computer platform.

The primary learning methods in Weka are “classifiers”, which induce a rule set or decision tree that models the data. Weka also includes algorithms for learning association rules and for clustering data.

5

6

7

8

9

2. The command-line interface

10

11

3. The Explorer (p. 375)

=== Run information ===
Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     weather
Instances:    14
Attributes:   5
              outlook temperature humidity windy play
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

Number of Leaves  : 5
Size of the tree  : 8
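The pruned tree above reads as nested conditionals. As an illustration only (a hypothetical transcription, not Weka's own code), it could be written directly in Java:

```java
// Hypothetical hand-transcription of the J48 weather tree above
// (illustrative only; NOT Weka source code).
public class WeatherTree {
    // Returns the predicted class for one instance.
    static String classify(String outlook, double humidity, boolean windy) {
        switch (outlook) {
            case "sunny":    return humidity <= 75 ? "yes" : "no";
            case "overcast": return "yes";
            case "rainy":    return windy ? "no" : "yes";
            default: throw new IllegalArgumentException("unknown outlook: " + outlook);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("sunny", 70, false)); // yes
        System.out.println(classify("rainy", 80, true));  // no
    }
}
```

The counts in parentheses, e.g. (2.0), are the number of training instances reaching each leaf.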

12

Time taken to build model: 0.08 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         5               35.7143 %
Kappa statistic                          0.186
Mean absolute error                      0.2857
Root mean squared error                  0.4818
Relative absolute error                 60      %
Root relative squared error             97.6586 %
Total Number of Instances               14

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
 0.778     0.6       0.7         0.778    0.737      yes
 0.4       0.222     0.5         0.4      0.444      no

=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 3 2 | b = no
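The per-class statistics above follow directly from the confusion matrix. A minimal Java sketch recomputing them (the matrix values 7, 2, 3, 2 are taken from the output above):

```java
// Recomputes the per-class statistics from the weather confusion matrix above.
public class ClassStats {
    static double tpRate(int tp, int fn)    { return (double) tp / (tp + fn); }
    static double fpRate(int fp, int tn)    { return (double) fp / (fp + tn); }
    static double precision(int tp, int fp) { return (double) tp / (tp + fp); }
    static double fMeasure(double p, double r) { return 2 * p * r / (p + r); }

    public static void main(String[] args) {
        // For class "yes": 7 yes classified yes, 2 yes classified no,
        // 3 no classified yes, 2 no classified no.
        int tp = 7, fn = 2, fp = 3, tn = 2;
        double p = precision(tp, fp), r = tpRate(tp, fn);
        System.out.printf("yes: TPRate=%.3f FPRate=%.3f Precision=%.3f F=%.3f%n",
                r, fpRate(fp, tn), p, fMeasure(p, r));
    }
}
```

Running this reproduces the "yes" row of the table: 0.778, 0.6, 0.7, 0.737.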

13

14

15

TP Rate = TP / (TP + FN)
FP Rate = FP / (FP + TN)
Precision = TP / (TP + FP)
F-measure = 2PR / (P + R)

For class "yes": TP Rate = 7/(7+2) = 0.778, FP Rate = 3/(3+2) = 0.6,
Precision = 7/(7+3) = 0.7, F-measure = 2(0.7)(0.778)/(0.7+0.778) = 0.737

For class "no": TP Rate = 2/(2+3) = 0.4, FP Rate = 2/(2+7) = 0.222,
Precision = 2/(2+2) = 0.5, F-measure = 2(0.5)(0.4)/(0.5+0.4) = 0.444

16

17

18

19

=== Run information ===
Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0
Relation:     weather.symbolic
Instances:    14
Attributes:   5

=== Associator model (full training set) ===

Size of set of large itemsets L(1): 12
Size of set of large itemsets L(2): 47
Size of set of large itemsets L(3): 39
Size of set of large itemsets L(4): 6

Best rules found:

 1. humidity=normal windy=FALSE 4 ==> play=yes 4    conf:(1)
 2. temperature=cool 4 ==> humidity=normal 4    conf:(1)
 3. outlook=overcast 4 ==> play=yes 4    conf:(1)
 4. temperature=cool play=yes 3 ==> humidity=normal 3    conf:(1)
 5. outlook=rainy windy=FALSE 3 ==> play=yes 3    conf:(1)
 6. outlook=rainy play=yes 3 ==> windy=FALSE 3    conf:(1)
 7. outlook=sunny humidity=high 3 ==> play=no 3    conf:(1)
 8. outlook=sunny play=no 3 ==> humidity=high 3    conf:(1)
 9. temperature=cool windy=FALSE 2 ==> humidity=normal play=yes 2    conf:(1)
10. temperature=cool humidity=normal windy=FALSE 2 ==> play=yes 2    conf:(1)

20

=== Run information ===
Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     soybean
Instances:    683
Attributes:   36
              date plant-stand precip temp hail crop-hist
              area-damaged severity seed-tmt germination plant-growth leaves
              leafspots-halo leafspots-marg leafspot-size leaf-shread leaf-malf leaf-mild
              stem lodging stem-cankers canker-lesion fruiting-bodies external-decay
              mycelium int-discolor sclerotia fruit-pods fruit-spots seed
              mold-growth seed-discolor seed-size shriveling roots class
Test mode:    evaluate on training data

J48 pruned tree
------------------

leafspot-size = lt-1/8
|   canker-lesion = dna
|   |   leafspots-marg = w-s-marg
|   |   |   seed-size = norm: bacterial-blight (21.0/1.0)
|   |   |   seed-size = lt-norm: bacterial-pustule (3.23/1.23)
|   |   leafspots-marg = no-w-s-marg: bacterial-pustule (17.91/0.91)
|   |   leafspots-marg = dna: bacterial-blight (0.0)
|   canker-lesion = brown: bacterial-blight (0.0)
|   canker-lesion = dk-brown-blk: phytophthora-rot (4.78/0.1)
|   canker-lesion = tan: purple-seed-stain (11.23/0.23)
leafspot-size = gt-1/8

21

|   roots = norm
|   |   mold-growth = absent
|   |   |   fruit-spots = absent
|   |   |   |   leaf-malf = absent
|   |   |   |   |   fruiting-bodies = absent
|   |   |   |   |   |   date = april: brown-spot (5.0)
|   |   |   |   |   |   date = may: brown-spot (24.0/1.0)
|   |   |   |   |   |   date = june
|   |   |   |   |   |   |   precip = lt-norm: phyllosticta-leaf-spot (4.0)
|   |   |   |   |   |   |   precip = norm: brown-spot (5.0/2.0)
|   |   |   |   |   |   |   precip = gt-norm: brown-spot (21.0)
|   |   |   |   |   |   date = july
|   |   |   |   |   |   |   precip = lt-norm: phyllosticta-leaf-spot (1.0)
|   |   |   |   |   |   |   precip = norm: phyllosticta-leaf-spot (2.0)
|   |   |   |   |   |   |   precip = gt-norm: frog-eye-leaf-spot (11.0/5.0)
|   |   |   |   |   |   date = august
|   |   |   |   |   |   |   leaf-shread = absent
|   |   |   |   |   |   |   |   seed-tmt = none: alternarialeaf-spot (16.0/4.0)
|   |   |   |   |   |   |   |   seed-tmt = fungicide
|   |   |   |   |   |   |   |   |   plant-stand = normal: frog-eye-leaf-spot (6.0)
|   |   |   |   |   |   |   |   |   plant-stand = lt-normal: alternarialeaf-spot (5.0/1.0)
|   |   |   |   |   |   |   |   seed-tmt = other: frog-eye-leaf-spot (3.0)
|   |   |   |   |   |   |   leaf-shread = present: alternarialeaf-spot (2.0)
|   |   |   |   |   |   date = september
|   |   |   |   |   |   |   stem = norm: alternarialeaf-spot (44.0/4.0)
|   |   |   |   |   |   |   stem = abnorm: frog-eye-leaf-spot (2.0)
|   |   |   |   |   |   date = october: alternarialeaf-spot (31.0/1.0)
|   |   |   |   |   fruiting-bodies = present: brown-spot (34.0)

22

|   |   |   |   leaf-malf = present: phyllosticta-leaf-spot (10.0)
|   |   |   fruit-spots = colored
|   |   |   |   fruit-pods = norm: brown-spot (2.0)
|   |   |   |   fruit-pods = diseased: frog-eye-leaf-spot (62.0)
|   |   |   |   fruit-pods = few-present: frog-eye-leaf-spot (0.0)
|   |   |   |   fruit-pods = dna: frog-eye-leaf-spot (0.0)
|   |   |   fruit-spots = brown-w/blk-specks
|   |   |   |   crop-hist = diff-lst-year: brown-spot (0.0)
|   |   |   |   crop-hist = same-lst-yr: brown-spot (2.0)
|   |   |   |   crop-hist = same-lst-two-yrs: brown-spot (0.0)
|   |   |   |   crop-hist = same-lst-sev-yrs: frog-eye-leaf-spot (2.0)
|   |   |   fruit-spots = distort: brown-spot (0.0)
|   |   |   fruit-spots = dna: brown-stem-rot (9.0)
|   |   mold-growth = present
|   |   |   leaves = norm: diaporthe-pod-&-stem-blight (7.25)
|   |   |   leaves = abnorm: downy-mildew (20.0)
|   roots = rotted
|   |   area-damaged = scattered: herbicide-injury (1.1/0.1)
|   |   area-damaged = low-areas: phytophthora-rot (30.03)
|   |   area-damaged = upper-areas: phytophthora-rot (0.0)
|   |   area-damaged = whole-field: herbicide-injury (3.66/0.66)
|   roots = galls-cysts: cyst-nematode (7.81/0.17)
leafspot-size = dna
|   int-discolor = none
|   |   leaves = norm
|   |   |   stem-cankers = absent
|   |   |   |   canker-lesion = dna: diaporthe-pod-&-stem-blight (5.53)

23

|   |   |   |   canker-lesion = brown: purple-seed-stain (0.0)
|   |   |   |   canker-lesion = dk-brown-blk: purple-seed-stain (0.0)
|   |   |   |   canker-lesion = tan: purple-seed-stain (9.0)
|   |   |   stem-cankers = below-soil: rhizoctonia-root-rot (19.0)
|   |   |   stem-cankers = above-soil: anthracnose (0.0)
|   |   |   stem-cankers = above-sec-nde: anthracnose (24.0)
|   |   leaves = abnorm
|   |   |   stem = norm
|   |   |   |   plant-growth = norm: powdery-mildew (22.0/2.0)
|   |   |   |   plant-growth = abnorm: cyst-nematode (4.3/0.39)
|   |   |   stem = abnorm
|   |   |   |   plant-stand = normal
|   |   |   |   |   leaf-malf = absent
|   |   |   |   |   |   seed = norm: diaporthe-stem-canker (21.0/1.0)
|   |   |   |   |   |   seed = abnorm: anthracnose (9.0)
|   |   |   |   |   leaf-malf = present: 2-4-d-injury (3.0)
|   |   |   |   plant-stand = lt-normal
|   |   |   |   |   fruiting-bodies = absent: phytophthora-rot (50.16/7.61)
|   |   |   |   |   fruiting-bodies = present
|   |   |   |   |   |   roots = norm: anthracnose (11.0/1.0)
|   |   |   |   |   |   roots = rotted: phytophthora-rot (12.89/2.15)
|   |   |   |   |   |   roots = galls-cysts: phytophthora-rot (0.0)
|   int-discolor = brown
|   |   leaf-malf = absent: brown-stem-rot (35.73/0.73)
|   |   leaf-malf = present: 2-4-d-injury (3.15/0.68)
|   int-discolor = black: charcoal-rot (22.22/2.22)

24

Number of Leaves  : 61
Size of the tree  : 93

Time taken to build model: 0.05 seconds

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances         658               96.3397 %
Incorrectly Classified Instances        25                3.6603 %
Kappa statistic                          0.9598
Mean absolute error                      0.0104
Root mean squared error                  0.0625
Relative absolute error                 10.7981 %
Root relative squared error             28.5358 %
Total Number of Instances              683

25

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
1         0.002     0.952       1        0.976       diaporthe-stem-canker
1         0         1           1        1           charcoal-rot
0.95      0         1           0.95     0.974       rhizoctonia-root-rot
1         0.008     0.946       1        0.972       phytophthora-rot
1         0         1           1        1           brown-stem-rot
1         0         1           1        1           powdery-mildew
1         0         1           1        1           downy-mildew
0.978     0.005     0.968       0.978    0.973       brown-spot
1         0.002     0.952       1        0.976       bacterial-blight
0.95      0         1           0.95     0.974       bacterial-pustule
1         0         1           1        1           purple-seed-stain
0.977     0         1           0.977    0.989       anthracnose
0.85      0         1           0.85     0.919       phyllosticta-leaf-spot
0.967     0.017     0.898       0.967    0.931       alternarialeaf-spot
0.89      0.008     0.942       0.89     0.915       frog-eye-leaf-spot
1         0         1           1        1           diaporthe-pod-&-stem-blight
1         0         1           1        1           cyst-nematode
1         0         1           1        1           2-4-d-injury
0.5       0         1           0.5      0.667       herbicide-injury

26

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s   <-- classified as
 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 | a = diaporthe-stem-canker
  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 | b = charcoal-rot
  1  0 19  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 | c = rhizoctonia-root-rot
  0  0  0 88  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 | d = phytophthora-rot
  0  0  0  0 44  0  0  0  0  0  0  0  0  0  0  0  0  0  0 | e = brown-stem-rot
  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0 | f = powdery-mildew
  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0 | g = downy-mildew
  0  0  0  0  0  0  0 90  0  0  0  0  0  0  2  0  0  0  0 | h = brown-spot
  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0 | i = bacterial-blight
  0  0  0  0  0  0  0  0  1 19  0  0  0  0  0  0  0  0  0 | j = bacterial-pustule
  0  0  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0 | k = purple-seed-stain
  0  0  0  1  0  0  0  0  0  0  0 43  0  0  0  0  0  0  0 | l = anthracnose
  0  0  0  0  0  0  0  3  0  0  0  0 17  0  0  0  0  0  0 | m = phyllosticta-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0 88  3  0  0  0  0 | n = alternarialeaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0 10 81  0  0  0  0 | o = frog-eye-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 15  0  0  0 | p = diaporthe-pod-&-stem-blight
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0 | q = cyst-nematode
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 16  0 | r = 2-4-d-injury
  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0  4 | s = herbicide-injury

27

Association rules
Weka contains an implementation of the Apriori learner for generating association rules, a technique commonly used in market basket analysis. This algorithm does not seek rules that predict a particular class attribute, but rather looks for any rules that capture strong associations between different attributes.
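Apriori ranks rules by their support (how many instances match the whole rule) and confidence (how often the consequent holds when the antecedent does). A minimal sketch of the confidence computation for one rule, using the standard 14-instance weather data (illustrative only, not Weka's Apriori code):

```java
// Computes confidence of an association rule outlook=? ==> play=?
// over the standard 14-instance weather data (illustrative sketch).
public class RuleConfidence {
    // Each instance: {outlook, play}.
    static final String[][] DATA = {
        {"sunny","no"}, {"sunny","no"}, {"overcast","yes"}, {"rainy","yes"},
        {"rainy","yes"}, {"rainy","no"}, {"overcast","yes"}, {"sunny","no"},
        {"sunny","yes"}, {"rainy","yes"}, {"sunny","yes"}, {"overcast","yes"},
        {"overcast","yes"}, {"rainy","no"}
    };

    // confidence(antecedent ==> consequent) = count(both) / count(antecedent)
    static double confidence(String outlook, String play) {
        int ante = 0, both = 0;
        for (String[] row : DATA) {
            if (row[0].equals(outlook)) {
                ante++;
                if (row[1].equals(play)) both++;
            }
        }
        return (double) both / ante;
    }

    public static void main(String[] args) {
        // Rule 3 above: outlook=overcast 4 ==> play=yes 4  conf:(1)
        System.out.println(confidence("overcast", "yes")); // 1.0
    }
}
```

All four overcast instances have play=yes, which is why the rule's confidence is printed as conf:(1) in the output above.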

Clustering
Clustering methods likewise do not seek rules that predict a particular class, but instead try to divide the data into natural groups or “clusters.” Weka includes an implementation of the EM algorithm, which can be used for unsupervised learning; it makes the assumption that all attributes are independent random variables.
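To illustrate the EM idea (alternating "expectation" and "maximization" steps), here is a deliberately bare-bones one-dimensional EM for a two-component Gaussian mixture with unit variances and equal priors. It is a simplified sketch, not weka.clusterers.EM:

```java
import java.util.Random;

// Bare-bones 1-D EM for two Gaussian components (unit variance, equal
// priors) -- a sketch of the idea only, NOT Weka's implementation.
public class SimpleEM {
    static double pdf(double x, double mu) {
        return Math.exp(-0.5 * (x - mu) * (x - mu)); // unnormalized, unit variance
    }

    // Returns the two estimated means, smaller first.
    static double[] fit(double[] data, double m0, double m1, int iters) {
        for (int it = 0; it < iters; it++) {
            double w0 = 0, w1 = 0, s0 = 0, s1 = 0;
            for (double x : data) {
                double p0 = pdf(x, m0), p1 = pdf(x, m1);
                double r0 = p0 / (p0 + p1);   // E step: responsibility of cluster 0
                double r1 = 1 - r0;
                w0 += r0; s0 += r0 * x;       // M step accumulators
                w1 += r1; s1 += r1 * x;
            }
            m0 = s0 / w0;                      // M step: weighted means
            m1 = s1 / w1;
        }
        return new double[]{Math.min(m0, m1), Math.max(m0, m1)};
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[] data = new double[200];
        for (int i = 0; i < 200; i++)          // two clusters around 0 and 5
            data[i] = (i % 2 == 0 ? 0.0 : 5.0) + rnd.nextGaussian();
        double[] m = fit(data, 1.0, 4.0, 50);
        System.out.printf("estimated means ~ %.2f, %.2f%n", m[0], m[1]);
    }
}
```

Weka's EM additionally estimates variances, mixing weights, and (optionally) the number of clusters by cross-validation.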

28

PredictiveApriori

Best rules found:

 1. outlook=overcast 4 ==> play=yes 4    acc:(0.95323)
 2. temperature=cool 4 ==> humidity=normal 4    acc:(0.95323)
 3. humidity=normal windy=FALSE 4 ==> play=yes 4    acc:(0.95323)
 4. outlook=sunny humidity=high 3 ==> play=no 3    acc:(0.92093)
 5. outlook=sunny play=no 3 ==> humidity=high 3    acc:(0.92093)
 6. outlook=rainy windy=FALSE 3 ==> play=yes 3    acc:(0.92093)
 7. outlook=rainy play=yes 3 ==> windy=FALSE 3    acc:(0.92093)
 8. outlook=sunny temperature=hot 2 ==> humidity=high play=no 2    acc:(0.86233)
 9. outlook=sunny humidity=normal 2 ==> play=yes 2    acc:(0.86233)
10. outlook=sunny play=yes 2 ==> humidity=normal 2    acc:(0.86233)
11. outlook=overcast temperature=hot 2 ==> windy=FALSE play=yes 2    acc:(0.86233)
12. outlook=overcast windy=FALSE 2 ==> temperature=hot play=yes 2    acc:(0.86233)
13. outlook=rainy humidity=high 2 ==> temperature=mild 2    acc:(0.86233)
14. outlook=rainy windy=TRUE 2 ==> play=no 2    acc:(0.86233)
15. outlook=rainy play=no 2 ==> windy=TRUE 2    acc:(0.86233)
16. temperature=hot play=yes 2 ==> outlook=overcast windy=FALSE 2    acc:(0.86233)
17. temperature=hot play=no 2 ==> outlook=sunny humidity=high 2    acc:(0.86233)
18. temperature=mild humidity=normal 2 ==> play=yes 2    acc:(0.86233)
19. temperature=mild play=no 2 ==> humidity=high 2    acc:(0.86233)
20. temperature=cool windy=FALSE 2 ==> humidity=normal play=yes 2    acc:(0.86233)

29

30

Scheme: weka.clusterers.Cobweb -A 1.0 -C 0.0028209479177387815
Relation: weather

Number of merges: 1
Number of splits: 0
Number of clusters: 21

node 0 [14]
|   node 1 [5]
|   |   leaf 2 [1]
|   node 1 [5]
|   |   leaf 3 [1]
|   node 1 [5]
|   |   node 4 [2]
|   |   |   leaf 5 [1]
|   |   node 4 [2]
|   |   |   leaf 6 [1]
|   node 1 [5]
|   |   leaf 7 [1]
node 0 [14]

31

node 0 [14]
|   node 8 [6]
|   |   node 9 [2]
|   |   |   leaf 10 [1]
|   |   node 9 [2]
|   |   |   leaf 11 [1]
|   node 8 [6]
|   |   leaf 12 [1]
|   node 8 [6]
|   |   node 13 [3]
|   |   |   leaf 14 [1]
|   |   node 13 [3]
|   |   |   leaf 15 [1]
|   |   node 13 [3]
|   |   |   leaf 16 [1]
node 0 [14]
|   node 17 [3]
|   |   leaf 18 [1]
|   node 17 [3]
|   |   leaf 19 [1]
|   node 17 [3]
|   |   leaf 20 [1]

32

33

Select Attributes

=== Run information ===
Evaluator:    weka.attributeSelection.PrincipalComponents -R 0.95 -A 5
Search:       weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
Relation:     weather
Instances:    14
Attributes:   5
              outlook temperature humidity windy play
Evaluation mode: evaluate on all training data

Search Method: Attribute ranking.
Attribute Evaluator (unsupervised): Principal Components Attribute Transformer

Correlation matrix
  1     -0.47  -0.56   0.31   0.03   0.04
 -0.47   1     -0.47   0.14  -0.17  -0.09
 -0.56  -0.47   1     -0.44   0.13   0.04
  0.31   0.14  -0.44   1      0.32   0.33
  0.03  -0.17   0.13   0.32   1      0.2
  0.04  -0.09   0.04   0.33   0.2    1

eigenvalue  proportion  cumulative
1.94405     0.32401     0.32401    0.578temperature-0.571outlook=rainy+0.506outlook=sunny+0.227windy+0.164humidity...
1.58814     0.26469     0.5887     -0.68outlook=overcast+0.443humidity+0.424outlook=rainy+0.334windy+0.217outlook=sunny...
1.29207     0.21534     0.80404    0.567outlook=sunny-0.443windy-0.432outlook=overcast-0.414humidity-0.312temperature...
0.79269     0.13212     0.93616    0.738windy-0.667humidity-0.077temperature-0.052outlook=overcast+0.033outlook=rainy...
0.38305     0.06384     1          0.748temperature-0.4humidity+0.348outlook=rainy-0.308windy-0.191outlook=overcast...
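The leading eigenvalue 1.94405 reported above can be approximated directly from the printed correlation matrix with a simple power iteration. This is a sketch of the underlying linear algebra, not what Weka actually runs, and the matrix below is the rounded one from the output:

```java
// Approximates the dominant eigenvalue of the correlation matrix printed
// above via power iteration (a sketch; Weka uses a full eigendecomposition).
public class PowerIteration {
    // Attributes: outlook=sunny, outlook=overcast, outlook=rainy,
    // temperature, humidity, windy (values rounded to two decimals).
    static final double[][] C = {
        { 1,    -0.47, -0.56,  0.31,  0.03,  0.04},
        {-0.47,  1,    -0.47,  0.14, -0.17, -0.09},
        {-0.56, -0.47,  1,    -0.44,  0.13,  0.04},
        { 0.31,  0.14, -0.44,  1,     0.32,  0.33},
        { 0.03, -0.17,  0.13,  0.32,  1,     0.2 },
        { 0.04, -0.09,  0.04,  0.33,  0.2,   1   }};

    static double dominantEigenvalue(double[][] a, int iters) {
        int n = a.length;
        double[] v = new double[n];
        java.util.Arrays.fill(v, 1.0);
        double lambda = 0;
        for (int it = 0; it < iters; it++) {
            double[] w = new double[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) w[i] += a[i][j] * v[j];
            double norm = 0;
            for (double x : w) norm += x * x;
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) v[i] = w[i] / norm; // re-normalize
            lambda = norm; // ||A v|| converges to the dominant eigenvalue
        }
        return lambda;
    }

    public static void main(String[] args) {
        System.out.printf("%.3f%n", dominantEigenvalue(C, 200));
    }
}
```

Because the matrix entries are rounded, the result agrees with 1.94405 only approximately.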

34

Eigenvectors
 V1       V2       V3       V4       V5
 0.5064   0.2166   0.5674   0.0167  -0.1683  outlook=sunny
 0.0684  -0.6798  -0.4317  -0.0522  -0.1906  outlook=overcast
-0.5709   0.4244  -0.1603   0.0325   0.348   outlook=rainy
 0.5785   0.053   -0.3125  -0.0772   0.7476  temperature
 0.1639   0.4432  -0.4145  -0.6669  -0.4003  humidity
 0.227    0.3341  -0.4433   0.7384  -0.3083  windy

Ranked attributes:
 0.675990846445273472  1  0.578temperature-0.571outlook=rainy+0.506outlook=sunny+0.227windy+0.164humidity...
 0.411301353536642624  2  -0.68outlook=overcast+0.443humidity+0.424outlook=rainy+0.334windy+0.217outlook=sunny...
 0.195956514975330624  3  0.567outlook=sunny-0.443windy-0.432outlook=overcast-0.414humidity-0.312temperature...
 0.063841150150769192  4  0.738windy-0.667humidity-0.077temperature-0.052outlook=overcast+0.033outlook=rainy...
 0.000000000000000111  5  0.748temperature-0.4humidity+0.348outlook=rainy-0.308windy-0.191outlook=overcast...

Selected attributes: 1,2,3,4,5 : 5

35

Search Method: Attribute ranking.
Attribute Evaluator (supervised, Class (nominal): 5 play):
    Symmetrical Uncertainty Ranking Filter

Ranked attributes:
 0.196   1 outlook
 0.05    4 windy
 0       3 humidity
 0       2 temperature

Selected attributes: 1,4,3,2 : 4
========================
Search Method: Attribute ranking.
    OneR feature evaluator.
    Using 10 fold cross validation for evaluating attributes.
    Minimum bucket size for OneR: 6

Ranked attributes:
57.143   3 humidity
50       1 outlook
50       2 temperature
42.857   4 windy

Selected attributes: 3,1,2,4 : 4
=========================
Search Method: Attribute ranking.
    Information Gain Ranking Filter

Ranked attributes:
 0.2467  1 outlook
 0.0481  4 windy
 0       3 humidity
 0       2 temperature

Selected attributes: 1,4,3,2 : 4

36

Search Method: Best first, Exhaustive Search.
Selected attributes: 1,4 : 2
    outlook
    windy
============================
Search Method: Genetic search.
Initial population
merit     scaled    subset
 0        0.03362   2
 0        0.03362   2
 0.04999  0.0548    4
 0.06572  0.06147   1 2 3 4
 ...
 0.17354  0.10716   1 4
 0        0.03362   3
 0        0.03362   2
 0        0.03362   2

Generation: 20
merit     scaled    subset
 0.19601  0.2076    1
 0.19601  0.2076    1
 0.19601  0.2076    1
 0.19601  0.2076    1
 0.17354  0.16236   1 4
 0.09292  0         1 3 4
 ...
 0.19601  0.2076    1

Attribute Subset Evaluator (supervised, Class (nominal): 5 play):
    CFS Subset Evaluator
    Including locally predictive attributes

Selected attributes: 1,4 : 2
    outlook
    windy

37

4. The Knowledge Flow Interface

38

39

40

41

42

43

44

45

46

47

48

5. The Experimenter

49

50

51

Dataset      (1) r.ZeroR | (2) r.OneR  (3) trees.J48
----------------------------------------------------
iris (100)   33.33       | 93.53 v     94.73 v
----------------------------------------------------
(v/ /*)                  | (1/0/0)     (1/0/0)

Dataset      (1) trees.J48 | (2) r.OneR  (3) r.ZeroR
----------------------------------------------------
iris (100)   94.73         | 93.53       33.33 *
----------------------------------------------------
(v/ /*)                    | (0/1/0)     (0/0/1)
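The v (significantly better) and * (significantly worse) markers in these tables come from paired significance tests over repeated cross-validation runs. As a simplified sketch, a plain paired t-test on per-run accuracies looks like this (Weka's Experimenter actually uses a corrected resampled t-test; the accuracy arrays below are made-up values for illustration):

```java
// Plain paired t-test on per-run accuracy differences (simplified sketch;
// Weka's Experimenter uses the corrected resampled t-test).
public class PairedTTest {
    // t = mean(d) / (sd(d) / sqrt(n)), where d_i = a_i - b_i.
    static double tStatistic(double[] a, double[] b) {
        int n = a.length;
        double mean = 0;
        for (int i = 0; i < n; i++) mean += a[i] - b[i];
        mean /= n;
        double var = 0;
        for (int i = 0; i < n; i++) {
            double d = a[i] - b[i] - mean;
            var += d * d;
        }
        var /= (n - 1); // sample variance of the differences
        return mean / Math.sqrt(var / n);
    }

    public static void main(String[] args) {
        double[] schemeA = {94, 95, 96, 95, 95}; // hypothetical accuracies
        double[] schemeB = {93, 93, 93, 93, 93};
        System.out.printf("t = %.3f%n", tStatistic(schemeA, schemeB));
    }
}
```

A large |t| relative to the t-distribution's critical value at the chosen significance level is what earns a v or * in the tables.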

52

Dataset     (1) r.DecisionTable | (2) r.ConjunctiveRule  (3) r.NNge
-------------------------------------------------------------------
iris (20)   93.33               | 66.67 *                95.67
-------------------------------------------------------------------
(v/ /*)                         | (0/0/1)                (0/1/0)

Dataset      (1) rules.On | (2) rules  (3) rules  (4) rules  (5) rules  (6) rules  (7) rules
---------------------------------------------------------------------------------------------
Iris (100)   93.53        | 66.67 *    93.27      93.93      96.00      94.20      94.60
---------------------------------------------------------------------------------------------
(v/ /*)                   | (0/0/1)    (0/1/0)    (0/1/0)    (0/1/0)    (0/1/0)    (0/1/0)

Skipped:

Key:
(1) rules.OneR '-B 6' 3010129309850089072
(2) rules.ConjunctiveRule '-N 3 -M 2.0 -P -1 -S 1' -5938309903225087198
(3) rules.DecisionTable '-X 1 -S 5' 2788557078165701326
(4) rules.JRip '-F 3 -N 2.0 -O 2 -S 1' -6589312996832147161
(5) rules.NNge '-G 5 -I 5' 4084742275553788972
(6) rules.PART '-M 2 -C 0.25 -Q 1' 8121455039782598361
(7) rules.Ridor '-F 3 -S 1 -N 2.0' -7261533075088314436

53

Dataset          (1) rules.Co  (2) rules  (3) rules  (4) trees  (5) trees  (6) trees  (7) trees  (8) trees  (9) trees
----------------------------------------------------------------------------------------------------------------------
contact-lenses   63.17         82.50      80.67      72.17      73.17      83.50      77.00      76.17      75.67
----------------------------------------------------------------------------------------------------------------------
(v/ /*)                        (0/1/0)    (0/1/0)    (0/1/0)    (0/1/0)    (1/0/0)    (0/1/0)    (0/1/0)    (0/1/0)

Key:
(1) rules.ConjunctiveRule '-N 3 -M 2.0 -P -1 -S 1' -5938309903225087198
(2) rules.DecisionTable '-X 1 -S 5' 2788557078165701326
(3) rules.JRip '-F 3 -N 2.0 -O 2 -S 1' -6589312996832147161
(4) trees.DecisionStump '' -7265551604329079943
(5) trees.Id3 '' -2693678647096322561
(6) trees.J48 '-C 0.25 -M 2' -217733168393644444
(7) trees.LMT '-I -1 -M 15' -1113212459618104943
(8) trees.NBTree '' -4716005707058256086
(9) trees.RandomForest '-I 10 -K 0 -S 1' 4216839470751428698

54

Dataset               (1) r.ConjunctiveRule | (2) r.DecisionTable  (3) r.JRip  (4) r.NNge  (5) r.OneR  (6) r.PART  (7) rules.Ridor  (8) rules.ZeroR
----------------------------------------------------------------------------------------------------------------------------------------------------
labor-neg-data (100)  77.60                 | 83.80                83.70       86.23       72.77       77.73       82.70            64.67 *
----------------------------------------------------------------------------------------------------------------------------------------------------
(v/ /*)                                     | (0/1/0)              (0/1/0)     (0/1/0)     (0/1/0)     (0/1/0)     (0/1/0)          (0/0/1)

Dataset               (1) r.Ridor | (2) r.ConjunctiveRule  (3) r.NNge  (4) t.DecisionStump  (5) t.LMT  (6) t.RandomTree  (7) bayes.BayesNet
---------------------------------------------------------------------------------------------------------------------------------------------
labor-neg-data (100)  82.70       | 77.60                  86.23       78.77                91.37      83.90             90.60
---------------------------------------------------------------------------------------------------------------------------------------------
(v/ /*)                           | (0/1/0)                (0/1/0)     (0/1/0)              (0/1/0)    (0/1/0)           (0/1/0)

55

LIBSVM -- A Library for Support Vector Machines

• LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It supports multi-class classification.

• Since version 2.8, it implements an SMO-type algorithm proposed in this paper: R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research 6, 1889-1918, 2005.

56

SVM, the Support Vector Machine, has roots similar to those of neural networks, but recently it has been widely used for classification. That is, given some sets of things that have already been classified (but knowing nothing about how they were classified, i.e., the rules used for the classification), when a new data point arrives, SVM can predict which set it should belong to.
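For a linear SVM, the prediction step is just the sign of a decision function f(x) = w · x + b. A hypothetical sketch of that step alone (the weights w and bias b below are made-up stand-ins for a trained model, which is what svmtrain would produce):

```java
// Prediction step of a linear SVM: sign(w . x + b).
// w and b are hypothetical values standing in for a trained model.
public class LinearSVMPredict {
    static int predict(double[] w, double b, double[] x) {
        double f = b;
        for (int i = 0; i < w.length; i++) f += w[i] * x[i];
        return f >= 0 ? +1 : -1;
    }

    public static void main(String[] args) {
        double[] w = {1.0, -1.0};
        double b = -0.5;
        System.out.println(predict(w, b, new double[]{2.0, 0.0})); // class +1
        System.out.println(predict(w, b, new double[]{0.0, 2.0})); // class -1
    }
}
```

Training chooses w and b to maximize the margin between the classes; nonlinear kernels replace the dot product, which is what LIBSVM handles in general.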

57

The syntax of svmtrain is basically:

svmtrain [options] training_set_file [model_file]

The syntax of svmpredict is:

svmpredict test_file model_file output_file

58

59

6. Conclusions

As the technology of machine learning continues to develop and mature, learning algorithms need to be brought to the desktops of people who work with data and understand the application domain from which it arises. It is necessary to get the algorithms out of the laboratory and into the work environment of those who can use them. Weka is a significant step in the transfer of machine learning technology into the workplace.

60

6. Conclusions(cont.)

The primary one of Weka's three interactive interfaces is the Explorer, which gives access to all of Weka's facilities using menu selection and form filling.

The Knowledge Flow interface allows users to design configurations for streamed data processing, and the Experimenter lets users set up automated experiments that run selected machine learning algorithms with different parameter settings on a corpus of datasets, collect performance statistics, and perform significance tests on the results.

61

7. References

1. Ian H. Witten & Eibe Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann, San Francisco.
2. Zdravko Markov & Ingrid Russell. 2006. An Introduction to the WEKA Data Mining System. ITiCSE '06: Proceedings of the 11th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education.
3. Chih-Jen Lin (cjlin), Dept. of Computer Science & Information Engineering, National Taiwan University: LIBSVM.