An Introduction to
WEKAECLT 5810/ SEEM 5750
The Chinese University of Hong Kong
Content
What is WEKA?
The Explorer:
Preprocess data
Classification
Clustering
Association Rules
Attribute Selection
Data Visualization
References and Resources
4/8/2019 2
What is WEKA?
Waikato Environment for
Knowledge Analysis It’s a data mining/machine learning tool developed by
Department of Computer Science, University of Waikato,
New Zealand.
Weka is also a bird found only on the islands of New
Zealand.
3
Download and Install WEKA
Website:
http://www.cs.waikato.ac.nz/~ml/weka/index.html
Support multiple platforms (written in java):
Windows, Mac OS X and Linux
4
Main GUI
Three graphical user interfaces
“The Explorer” (exploratory data analysis)
“The Experimenter” (experimental environment)
“The KnowledgeFlow” (new process model inspired interface)
5
Content
What is WEKA?
The Explorer:
Preprocess data
Classification
Clustering
Association Rules
Attribute Selection
Data Visualization
References and Resources
4/8/2019 6
Explorer: pre-processing the
data
Data can be imported from a file in various formats:
ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL
database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
Discretization, normalization, resampling, attribute
selection, transforming and combining attributes, …
4/8/2019 7
ARFF Format
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...4/8/2019 8
4/8/2019University of Waikato 9
4/8/2019University of Waikato 10
4/8/2019University of Waikato 11
4/8/2019University of Waikato 12
4/8/2019University of Waikato 13
4/8/2019University of Waikato 14
4/8/2019University of Waikato 15
4/8/2019University of Waikato 16
4/8/2019University of Waikato 17
4/8/2019University of Waikato 18
4/8/2019University of Waikato 19
4/8/2019University of Waikato 20
4/8/2019University of Waikato 21
4/8/2019University of Waikato 22
4/8/2019University of Waikato 23
4/8/2019University of Waikato 24
4/8/2019University of Waikato 25
4/8/2019University of Waikato 26
4/8/2019University of Waikato 27
4/8/2019University of Waikato 28
4/8/2019University of Waikato 29
Explorer: building classifiers
Classifiers in WEKA are models for predicting nominal or
numeric quantities
Implemented learning schemes include:
Decision trees and lists, instance-based classifiers, support
vector machines, multi-layer perceptrons, logistic
regression, Bayes’ nets, …
4/8/2019 30
4/8/2019University of Waikato 31
4/8/2019University of Waikato 32
4/8/2019University of Waikato 33
4/8/2019University of Waikato 34
4/8/2019University of Waikato 35
4/8/2019University of Waikato 36
4/8/2019University of Waikato 37
4/8/2019University of Waikato 38
4/8/2019University of Waikato 39
4/8/2019University of Waikato 40
4/8/2019University of Waikato 41
4/8/2019University of Waikato 42
4/8/2019University of Waikato 43
4/8/2019University of Waikato 44
4/8/2019University of Waikato 45
4/8/2019University of Waikato 46
4/8/2019University of Waikato 47
4/8/2019University of Waikato 48
4/8/2019University of Waikato 49
4/8/2019University of Waikato 50
4/8/2019University of Waikato 51
4/8/2019University of Waikato 52
Explorer: finding associations
WEKA contains an implementation of the Apriori
algorithm for learning association rules
Works only with discrete data
Can identify statistical dependencies between groups of
attributes:
milk, butter bread, eggs (with confidence 0.9 and
support 2000)
Apriori can compute all rules that have a given
minimum support and exceed a given confidence
4/8/2019 55
4/8/2019University of Waikato 56
4/8/2019University of Waikato 57
4/8/2019University of Waikato 58
4/8/2019University of Waikato 59
4/8/2019University of Waikato 60
Explorer: attribute selection
Panel that can be used to investigate which (subsets of)
attributes are the most predictive ones
Attribute selection methods contain two parts:
A search method: best-first, forward selection, random,
exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper,
information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary
combinations of these two
4/8/2019 61
4/8/2019University of Waikato 62
4/8/2019University of Waikato 63
4/8/2019University of Waikato 64
4/8/2019University of Waikato 65
4/8/2019University of Waikato 66
4/8/2019University of Waikato 67
4/8/2019University of Waikato 68
4/8/2019University of Waikato 69
Explorer: data visualization
Visualization very useful in practice: e.g. helps to
determine difficulty of the learning problem
WEKA can visualize single attributes (1-d) and pairs of
attributes (2-d)
To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes (and to
detect “hidden” data points)
“Zoom-in” function
4/8/2019 70
4/8/2019University of Waikato 71
4/8/2019University of Waikato 72
4/8/2019University of Waikato 73
4/8/2019University of Waikato 74
4/8/2019University of Waikato 75
4/8/2019University of Waikato 76
4/8/2019University of Waikato 77
4/8/2019University of Waikato 78
4/8/2019University of Waikato 79
4/8/2019University of Waikato 80
References and Resources
References:
WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
WEKA Tutorial: Machine Learning with WEKA: A presentation demonstrating
all graphical user interfaces (GUI) in Weka.
WEKA Data Mining Book: Ian H. Witten and Eibe Frank, Data Mining: Practical
Machine Learning Tools and Techniques (Second Edition)
WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page
Others: Jiawei Han and Micheline Kamber, Data Mining:
Concepts and Techniques, 2nd ed.
Top Related