Source: University of Manchester, studentnet.cs.manchester.ac.uk/.../tutorials/WEKA.pdf
Data Mining with WEKA
What is WEKA?
- WEKA: Waikato Environment for Knowledge Analysis.
- A collection of machine learning algorithms for data mining tasks.
- WEKA contains tools for data pre-processing, classification, regression, clustering, and association rules.
Start with WEKA
1) Get the WEKA program from the web: http://www.cs.waikato.ac.nz/ml/weka/
2) Set the CLASSPATH system environment variable:
   variable name: CLASSPATH
   variable value: the path to weka.jar (e.g. C:\Program Files\Weka-3-4\weka.jar)
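A plain directory entry on a Java classpath does not pick up the JAR files inside it, so the CLASSPATH value generally has to name weka.jar itself. The helper below is a hypothetical Python check (not part of WEKA) that reports whether any CLASSPATH entry is a file called weka.jar; it only inspects the string and does not verify that the file exists.

```python
import os


def weka_on_classpath(classpath=None):
    """Return True if some entry of `classpath` is a file named weka.jar.

    Defaults to the CLASSPATH environment variable. Entries are separated
    by os.pathsep (';' on Windows, ':' elsewhere), and paths are read with
    the current operating system's rules. Only the string is inspected.
    """
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    entries = [e.strip() for e in classpath.split(os.pathsep) if e.strip()]
    return any(os.path.basename(e).lower() == "weka.jar" for e in entries)


print("weka.jar on CLASSPATH:", weka_on_classpath())
```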
Prepare the Data Set
The data need to be converted to ARFF format:
1. Load the data into an Excel spreadsheet.
2. Save the data in comma-separated values (CSV) format.
3. Load this file into Microsoft Word.
4. Write the beginning (header) of the ARFF file:
   - @relation (title)
   - @attribute (name and data type), one line per attribute
   - @data, followed by the data rows
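Steps 1 to 4 can also be scripted. The sketch below is a hypothetical helper (not a WEKA tool) that turns CSV text with a header row into ARFF text, guessing numeric versus nominal types from the values. It assumes names and values contain no spaces, quotes or commas; real ARFF needs quoting for those.

```python
import csv
import io


def _is_number(text):
    """True if text parses as a float (so the column can be numeric)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text whose first row holds attribute names into ARFF text.

    A column is declared numeric when every non-missing value parses as a
    number; otherwise it is nominal, listing its values in first-seen order.
    '?' marks a missing value, as in ARFF. Assumes no spaces, quotes or
    commas inside names or values (those would need ARFF quoting).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = [name.strip() for name in rows[0]]
    data = [row for row in rows[1:] if row]  # drop blank lines
    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        values = [row[col].strip() for row in data if row[col].strip() != "?"]
        if values and all(_is_number(v) for v in values):
            lines.append(f"@attribute {name} numeric")
        else:
            distinct = list(dict.fromkeys(values))  # unique, order preserved
            lines.append(f"@attribute {name} {{{','.join(distinct)}}}")
    lines += ["", "@data"]
    lines += [",".join(cell.strip() for cell in row) for row in data]
    return "\n".join(lines) + "\n"


print(csv_to_arff("age,sex,cholesterol\n63,male,233\n38,female,?\n", "demo"))
```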
Load into Excel
Save in CSV file format
Load into MS Word
Add the other parts of the ARFF file
WEKA's native data format is ARFF
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
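To check a hand-made header like the one above, a minimal reader helps. This is an assumed, simplified parser written for illustration (WEKA reads ARFF through its own loader classes): it handles @relation, numeric and nominal @attribute lines, and comma-separated @data rows with '?' for missing values, but not quoting, sparse rows, dates or strings.

```python
def parse_arff(text):
    """Parse a simple ARFF file into (relation, attributes, rows).

    attributes is a list of (name, type) pairs, where type is "numeric"
    or a list of nominal values. Missing values ('?') become None.
    """
    relation, attributes, rows, in_data = None, [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # '%' starts an ARFF comment
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, rest = line.split(None, 1)
            name, spec = rest.split(None, 1)
            spec = spec.strip()
            if spec.startswith("{"):
                values = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, values))
            else:
                attributes.append((name, spec.lower()))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            cells = [c.strip() for c in line.split(",")]
            row = []
            for (name, kind), cell in zip(attributes, cells):
                if cell == "?":
                    row.append(None)
                elif kind in ("numeric", "real", "integer"):
                    row.append(float(cell))
                else:
                    row.append(cell)
            rows.append(row)
    return relation, attributes, rows


HEART_ARFF = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(HEART_ARFF)
print(relation, len(attributes), "attributes,", len(rows), "rows")
```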
Preprocessing the data
- Integration from different sources: the data must be assembled, integrated, and cleaned up.
- Pre-processing tools in WEKA are called "filters".
- WEKA contains filters for:
  - discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
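Normalization, one of the filters listed above, usually means min-max scaling. A minimal sketch, assuming the behaviour of WEKA's unsupervised Normalize filter with default settings (numeric values rescaled to [0, 1]); this is plain Python, not WEKA code, and the handling of constant columns is an assumption.

```python
def min_max_normalize(values):
    """Rescale numbers so the minimum becomes 0.0 and the maximum 1.0.

    None marks a missing value and is passed through unchanged. A constant
    column has no range; mapping it to 0.0 is an assumption of this sketch.
    """
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in values]


print(min_max_normalize([2.0, 4.0, None, 6.0]))  # [0.0, 0.5, None, 1.0]
```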
With numeric data (Iris.arff)
Select the Discretize filter
The numeric attributes are changed to nominal data
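The Discretize filter turns a numeric attribute into a nominal one by splitting its range into bins. The sketch below assumes equal-width binning, which is the default of WEKA's unsupervised Discretize filter (10 bins); it returns bin indices rather than WEKA's interval labels and is an illustration, not WEKA's code.

```python
def equal_width_bins(values, num_bins=10):
    """Discretize numbers into num_bins equal-width intervals.

    Returns (bin index per value, cut points). Intervals are (a, b], and
    the first and last bins are open-ended, so a value outside the range
    seen here still gets a bin. None (missing) stays None.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / num_bins
    cuts = [lo + width * i for i in range(1, num_bins)]

    def index(v):
        # number of cut points strictly below v = index of its (a, b] bin
        return sum(1 for c in cuts if v > c)

    return [None if v is None else index(v) for v in values], cuts


print(equal_width_bins([4.3, 5.0, 6.1, 7.9], num_bins=3))
```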
Filtering using CLI (Iris.data)
-i <source file> -o <output file>
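On the command line a filter is a Java class that reads the file given with -i and writes its result to the file given with -o. The Python sketch below only assembles such a command; the jar path, the choice of the Discretize filter, and the file names are hypothetical placeholders, and Java plus WEKA must be installed before actually running it.

```python
def weka_filter_command(filter_class, source, output, weka_jar="weka.jar", options=()):
    """Assemble a WEKA filter command line as a list of arguments.

    -i names the source (input) file and -o the output file. weka_jar and
    options are placeholders to adapt to the local installation.
    """
    return ["java", "-cp", weka_jar, filter_class, *options, "-i", source, "-o", output]


cmd = weka_filter_command("weka.filters.unsupervised.attribute.Discretize",
                          "iris.arff", "iris-discretized.arff")
print(" ".join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```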
Association (weather.nominal.arff)
Association - result
Best rules found:
1. humidity=normal windy=FALSE 4 ==> play=yes 4 conf:(1)
2. temperature=cool 4 ==> humidity=normal 4 conf:(1)
3. outlook=overcast 4 ==> play=yes 4 conf:(1)
4. humidity=normal 7 ==> play=yes 6 conf:(0.86)
5. play=no 5 ==> humidity=high 4 conf:(0.8)
6. windy=FALSE 8 ==> play=yes 6 conf:(0.75)
7. play=yes 9 ==> windy=FALSE 6 conf:(0.67)
8. play=yes 9 ==> humidity=normal 6 conf:(0.67)
9. humidity=normal play=yes 6 ==> windy=FALSE 4 conf:(0.67)
10. windy=FALSE play=yes 6 ==> humidity=normal 4 conf:(0.67)
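In WEKA's Apriori output each rule reads `premise count ==> conclusion count conf:(c)`: the first number counts instances matching the left-hand side, the second counts those matching both sides, and conf is their ratio. The sketch recomputes these counts in plain Python on the 14-instance weather.nominal data distributed with WEKA (reproduced below); for example, humidity=normal matches 7 instances, 6 of them with play=yes, which gives rule 4's confidence of 0.86.

```python
# weather.nominal data as distributed with WEKA: outlook, temperature, humidity, windy, play
ATTRS = ("outlook", "temperature", "humidity", "windy", "play")
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def count(rows, conditions):
    """Number of rows matching every attribute=value pair in `conditions`."""
    return sum(all(row[ATTRS.index(a)] == v for a, v in conditions.items())
               for row in rows)


def rule_stats(rows, premise, conclusion):
    """(premise count, rule count, confidence) for premise ==> conclusion."""
    n_premise = count(rows, premise)
    n_rule = count(rows, {**premise, **conclusion})
    return n_premise, n_rule, n_rule / n_premise


print(rule_stats(WEATHER, {"humidity": "normal"}, {"play": "yes"}))
```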
Classification – voting records
Classification - ZeroR
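ZeroR is the simplest baseline: it ignores all attributes and always predicts the most frequent class in the training data (for a numeric class WEKA's ZeroR predicts the mean instead). A minimal sketch for a nominal class; the democrat/republican labels stand in for the voting-records classes and are illustrative only, and the tie-breaking rule is this sketch's own.

```python
from collections import Counter


class ZeroR:
    """Baseline classifier: always predicts the majority training label."""

    def fit(self, labels):
        # most_common keeps first-seen order among equal counts, so a tie
        # goes to the label that appears first in the training data
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instances):
        # the attributes of each instance are ignored entirely
        return [self.prediction for _ in instances]


model = ZeroR().fit(["democrat", "republican", "democrat"])
print(model.predict([{}, {}]))  # ['democrat', 'democrat']
```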
Classification - OneR
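OneR builds one rule per attribute (each attribute value predicts its most frequent class) and keeps the attribute whose rule makes the fewest errors on the training data. The sketch below handles nominal attributes only; WEKA's OneR additionally discretizes numeric attributes, with its -B option (the '-B 6' in the Experimenter key) setting the minimum bucket size. The toy data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Learn a OneR rule from nominal data.

    rows: list of tuples of attribute values; labels: class of each row.
    Returns (attribute_index, {value: predicted class}, training errors).
    """
    best = None
    for a in range(len(rows[0])):
        by_value = defaultdict(Counter)  # attribute value -> class counts
        for row, label in zip(rows, labels):
            by_value[row[a]][label] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(sum(counts.values()) - counts[rule[v]]
                     for v, counts in by_value.items())
        if best is None or errors < best[2]:  # strict <: earlier attribute wins ties
            best = (a, rule, errors)
    return best


rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
print(one_r(rows, labels))  # (0, {'sunny': 'no', 'rainy': 'yes'}, 0)
```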
Classification – J48
Decision Tree from J48 result
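J48 is WEKA's implementation of the C4.5 decision-tree learner, which chooses each split by gain ratio: the information gain of the split divided by the entropy of the branch sizes. A sketch of that calculation, using as an assumed example the outlook attribute of the weather data (9 yes / 5 no overall; sunny 2/3, overcast 4/0, rainy 3/2); it covers only the split criterion, not tree growing or pruning (the -C 0.25 -M 2 options).

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def gain_and_ratio(parent, children):
    """Information gain and gain ratio of splitting `parent` into `children`.

    parent: class counts before the split, e.g. [9, 5]
    children: class counts in each branch, e.g. [[2, 3], [4, 0], [3, 2]]
    """
    n = sum(parent)
    remainder = sum(sum(ch) / n * entropy(ch) for ch in children)
    gain = entropy(parent) - remainder
    split_info = entropy([sum(ch) for ch in children])  # entropy of branch sizes
    return gain, (gain / split_info if split_info else 0.0)


gain, ratio = gain_and_ratio([9, 5], [[2, 3], [4, 0], [3, 2]])
print(f"gain={gain:.3f} ratio={ratio:.3f}")  # gain=0.247 ratio=0.156
```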
Cluster (Iris.ARFF data)
Cluster – k-means
k-means: numClusters set to 3
k-means: clustered into 3 groups
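With numClusters = 3, k-means alternates between assigning each instance to its nearest centroid and moving each centroid to the mean of its instances. The sketch below is plain Lloyd's algorithm with squared Euclidean distance and seeded random initial centroids; WEKA's SimpleKMeans may differ in details such as initialization and attribute normalization, so its cluster assignments need not match. The sample points are hypothetical.

```python
import random


def k_means(points, k, iterations=100, seed=1):
    """Cluster points (equal-length tuples of numbers) into k groups.

    Assign each point to its nearest centroid (squared Euclidean distance),
    move each centroid to the mean of its points, and repeat until the
    assignment stops changing. Initial centroids are k of the points chosen
    by a seeded random generator. Returns (centroids, assignment).
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (20.0, 1.0)]
print(k_means(pts, 3))
```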
Visualization of clustering
Cluster – CobWeb
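Cobweb builds its cluster hierarchy incrementally, placing each instance where it most increases category utility: a measure of how much better the clusters predict attribute values than the data as a whole. The function below computes category utility for nominal attributes only (WEKA's Cobweb also handles numeric attributes through its acuity parameter); it illustrates the measure, not Cobweb's full search, and the toy data are hypothetical.

```python
from collections import Counter


def category_utility(rows, clusters):
    """Category utility of a partition of nominal data.

    rows: list of tuples of nominal values; clusters: non-empty lists of row
    indices. CU = (1/K) * sum_k P(C_k) * sum_i sum_j
                  [P(A_i = v_ij | C_k)^2 - P(A_i = v_ij)^2]
    """
    n, n_attrs = len(rows), len(rows[0])

    def sum_sq(indices):
        # sum over attributes and their values of P(A_i = v)^2 within these rows
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(rows[r][i] for r in indices)
            total += sum((c / len(indices)) ** 2 for c in counts.values())
        return total

    base = sum_sq(range(n))
    return sum(len(c) / n * (sum_sq(c) - base) for c in clusters) / len(clusters)


data = [("a",), ("a",), ("b",), ("b",)]
print(category_utility(data, [[0, 1], [2, 3]]))  # 0.25
print(category_utility(data, [[0, 2], [1, 3]]))  # 0.0
```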
Experiment – add DataSet
Experiment - destination
Experiment – classification algorithm
Experiment – multiple schemes
Experiment - run
Experiment - analysis
Experiment – better or worse
Analysing:  Percent_correct
Datasets:   1
Resultsets: 3
Confidence: 0.05 (two tailed)
Date:       04. 5. 16. 3:2

Dataset            (1) rules.On | (2) trees   (3) rules
------------------------------------
iris          (10)   94.51      |   94.9       33.33 *
------------------------------------
                     (v/ /*)    |  (0/1/0)    (0/0/1)
Skipped:

Key:
(1) rules.OneR '-B 6' -2459427002147861445
(2) trees.J48 '-C 0.25 -M 2' -217733168393644444
(3) rules.ZeroR '' 48055541465867954
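In this table `*` marks a result significantly worse than scheme (1) at the 0.05 level and `v` one significantly better; the (v/ /*) row counts wins/ties/losses, so (0/0/1) means ZeroR lost on the single dataset. WEKA's Experimenter makes this decision with a paired t-test over per-run results, typically the corrected resampled variant. A minimal sketch of that statistic, assuming per-run accuracies for two schemes and the ratio of test-set to training-set size; the accuracies and the 66/34 split below are hypothetical, and no p-value is computed (the standard library has no t-distribution).

```python
from math import sqrt
from statistics import mean, variance


def corrected_t(scores_a, scores_b, test_train_ratio):
    """Corrected resampled t statistic for paired per-run scores of two schemes.

    test_train_ratio is n_test / n_train for each run; with 0 this reduces to
    the ordinary paired t-test. |t| is then compared with the two-tailed
    critical value of Student's t with len(scores_a) - 1 degrees of freedom.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    var = variance(diffs)  # sample variance (k - 1 in the denominator)
    if var == 0:
        raise ValueError("all differences are equal; t is undefined")
    return mean(diffs) / sqrt((1 / k + test_train_ratio) * var)


a = [94.0, 95.5, 93.0, 96.0, 94.5]  # hypothetical per-run accuracies, scheme A
b = [93.5, 95.0, 94.0, 95.0, 94.0]  # hypothetical per-run accuracies, scheme B
print(corrected_t(a, b, test_train_ratio=34 / 66))  # assumes a 66%/34% split
```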
Experiment - summary
Analysing:  Percent_correct
Datasets:   1
Resultsets: 3
Confidence: 0.05 (two tailed)
Date:       04. 5. 16. 3:20

a b c   (No. of datasets where [col] >> [row])
- 0 0 | a = rules.OneR '-B 6' -2459427002147861445
0 - 0 | b = trees.J48 '-C 0.25 -M 2' -217733168393644444
1 1 - | c = rules.ZeroR '' 48055541465867954
Experiment - ranking
Analysing:  Percent_correct
Datasets:   1
Resultsets: 3
Confidence: 0.05 (two tailed)
Date:       04. 5. 16. 3:23

>-<  >  <  Resultset
 1   1  0  trees.J48 '-C 0.25 -M 2' -217733168393644444
 1   1  0  rules.OneR '-B 6' -2459427002147861445
-2   0  2  rules.ZeroR '' 48055541465867954
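The summary matrix and the ranking are both derived from the same pairwise significance results. The sketch below rebuilds them for a single dataset from a set of (winner, loser) pairs; fed the outcome shown in these results (OneR and J48 each significantly better than ZeroR, nothing else significant), it gives wins minus losses of 1, 1 and -2, as in the >-< column. Ties keep input order here, which may differ from WEKA's ordering.

```python
def summary_and_ranking(schemes, significantly_better):
    """Rebuild the Experimenter's summary matrix and ranking for one dataset.

    significantly_better: set of (winner, loser) pairs that passed the t-test.
    summary[row][col] is 1 when col is significantly better than row.
    ranking holds (wins - losses, wins, losses, scheme), best first.
    """
    summary = {r: {c: int((c, r) in significantly_better) for c in schemes if c != r}
               for r in schemes}
    ranking = []
    for s in schemes:
        wins = sum(1 for w, _ in significantly_better if w == s)
        losses = sum(1 for _, l in significantly_better if l == s)
        ranking.append((wins - losses, wins, losses, s))
    ranking.sort(key=lambda t: -t[0])  # stable sort: ties keep input order
    return summary, ranking


schemes = ["rules.OneR", "trees.J48", "rules.ZeroR"]
better = {("rules.OneR", "rules.ZeroR"), ("trees.J48", "rules.ZeroR")}
summary, ranking = summary_and_ranking(schemes, better)
for diff, wins, losses, name in ranking:
    print(diff, wins, losses, name)
```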