An Introduction to WEKA

Post on 09-Jan-2022

ECLT 5810 / SEEM 5750

The Chinese University of Hong Kong

Content

What is WEKA?

The Explorer:

Preprocess data

Classification

Clustering

Association Rules

Attribute Selection

Data Visualization

References and Resources

4/8/2019 2

What is WEKA?

Waikato Environment for Knowledge Analysis

It’s a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.

Weka is also a bird found only on the islands of New Zealand.

Download and Install WEKA

Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html

Supports multiple platforms (written in Java): Windows, Mac OS X and Linux

Main GUI

Three graphical user interfaces

“The Explorer” (exploratory data analysis)

“The Experimenter” (experimental environment)

“The KnowledgeFlow” (new process-model-inspired interface)

Explorer: pre-processing the data

Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary

Data can also be read from a URL or from an SQL database (using JDBC)

Pre-processing tools in WEKA are called “filters”

WEKA contains filters for: discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
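
To make two of the filter types above concrete, here is a minimal Python sketch of min-max normalization and equal-width discretization. It is not WEKA's own implementation; the function names `normalize` and `discretize` and the choice of three bins are assumptions made for illustration, and the sample ages are the four values from the ARFF example in these slides.

```python
def normalize(values):
    """Min-max scale numeric values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Equal-width binning: map each value to a bin index in 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


ages = [63, 67, 67, 38]          # the "age" column of the ARFF example
scaled = normalize(ages)
binned = discretize(ages, 3)     # 3 bins is an arbitrary illustrative choice
print(scaled)
print(binned)
```

In the Explorer the same kind of transformation is applied by choosing a filter in the Preprocess panel rather than by writing code.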

ARFF Format

@relation heart-disease-simplified

@attribute age numeric

@attribute sex { female, male}

@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}

@attribute cholesterol numeric

@attribute exercise_induced_angina { no, yes}

@attribute class { present, not_present}

@data

63,male,typ_angina,233,no,not_present

67,male,asympt,286,yes,present

67,male,asympt,229,yes,present

38,female,non_anginal,?,no,not_present

...
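
The structure of the ARFF file above (a header of `@relation` and `@attribute` lines, then comma-separated rows after `@data`, with `?` marking a missing value) can be read with a few lines of Python. This is a rough sketch that handles only the subset of ARFF shown on this slide; the function name `parse_arff` is hypothetical, and quoted values, sparse rows, and date or string attributes are deliberately ignored.

```python
def parse_arff(text):
    """Parse a small ARFF subset: @relation, @attribute, @data, '?' = missing."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and % comments
            continue
        lower = line.lower()
        if in_data:
            cells = [c.strip() for c in line.split(",")]
            row = []
            for (name, kind), cell in zip(attributes, cells):
                if cell == "?":
                    row.append(None)           # missing value
                elif kind == "numeric":
                    row.append(float(cell))
                else:
                    row.append(cell)           # nominal value kept as text
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: store the list of allowed values.
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), rows[1])
```

The second data row shows how the `?` in the cholesterol column becomes `None`, while numeric columns are converted to floats.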

4/8/2019 University of Waikato 9

Explorer: building classifiers

Classifiers in WEKA are models for predicting nominal or numeric quantities

Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

Explorer: finding associations

WEKA contains an implementation of the Apriori algorithm for learning association rules

Works only with discrete data

Can identify statistical dependencies between groups of attributes:

milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)

Apriori can compute all rules that have a given minimum support and exceed a given confidence
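
The two numbers attached to a rule can be computed directly from a list of transactions. The sketch below is not the Apriori search itself, only the support and confidence calculation for one candidate rule; the five transactions are made-up toy data, the function names are hypothetical, and support is taken here to be an absolute count of transactions, which is one reading of the "support 2000" figure above.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = support_count(transactions, antecedent | consequent)
    ante = support_count(transactions, antecedent)
    return both / ante if ante else 0.0


# Toy data, invented for illustration only.
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]

antecedent = {"milk", "butter"}
consequent = {"bread", "eggs"}

rule_support = support_count(transactions, antecedent | consequent)
rule_confidence = confidence(transactions, antecedent, consequent)
print(rule_support, round(rule_confidence, 3))
```

Three transactions contain both milk and butter, and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3. A rule miner would keep the rule only if both values clear the thresholds the user has set.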

Explorer: attribute selection

Panel that can be used to investigate which (subsets of) attributes are the most predictive ones

Attribute selection methods contain two parts:

A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking

An evaluation method: correlation-based, wrapper, information gain, chi-squared, …

Very flexible: WEKA allows (almost) arbitrary combinations of these two
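
As one example of pairing a search method with an evaluation method, the following sketch ranks attributes by information gain. It is a simplified illustration, not WEKA's code: it handles nominal attributes only, the function names are hypothetical, and the four rows are the sex, exercise_induced_angina, and class values from the ARFF example in these slides.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Entropy of the class minus the weighted entropy after splitting on one attribute."""
    labels = [r[class_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# (sex, exercise_induced_angina, class) taken from the four ARFF data rows.
rows = [
    ("male", "no", "not_present"),
    ("male", "yes", "present"),
    ("male", "yes", "present"),
    ("female", "no", "not_present"),
]
names = ["sex", "exercise_induced_angina"]

# "Ranking" search: score every attribute on its own and sort by score.
ranking = sorted(
    ((information_gain(rows, i), name) for i, name in enumerate(names)),
    reverse=True,
)
for gain, name in ranking:
    print(f"{name}: {gain:.4f}")
```

On these four rows exercise_induced_angina separates the classes perfectly and so ranks first. Swapping the evaluation function or replacing the sort with a subset search gives the other combinations the slide mentions.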

Explorer: data visualization

Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem

WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)

To do: rotating 3-d visualizations (Xgobi-style)

Color-coded class values

“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)

“Zoom-in” function
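
The idea behind the “Jitter” option can be shown in a few lines: points that share the same nominal value are drawn at exactly the same coordinate, so adding a small random offset spreads them apart and reveals how many there are. This is a sketch of the general technique, assuming a uniform offset and a fixed seed for repeatability; it does not claim to match how WEKA computes jitter, and the function name `jitter` is hypothetical.

```python
import random


def jitter(values, amount, seed=0):
    """Add a uniform random offset in [-amount, amount] to each plot coordinate."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    return [v + rng.uniform(-amount, amount) for v in values]


# Nominal values encoded as plot positions, e.g. 0 = "no", 1 = "yes".
codes = [0, 0, 0, 1, 1]
jittered = jitter(codes, 0.2)
print(jittered)
```

Without jitter the five points collapse into two visible dots; with it, each point gets its own position while staying close to its original value.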

References and Resources

References:

WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html

WEKA Tutorial: Machine Learning with WEKA: a presentation demonstrating all graphical user interfaces (GUI) in Weka.

WEKA Data Mining Book: Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)

WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page

Others: Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed.