In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

101
In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer

Transcript of In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Page 1: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

In part from: Yizhou Sun2008

An Introduction to WEKA Explorer

Page 2: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

What is WEKA?Waikato Environment for Knowledge

AnalysisIt’s a data mining/machine learning tool

developed by Department of Computer Science, University of Waikato, New Zealand.

Weka is also a bird found only on the islands of New Zealand.

2 04/19/23

Page 3: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Download and Install WEKAWebsite:

http://www.cs.waikato.ac.nz/~ml/weka/index.html

Support multiple platforms (written in java):Windows, Mac OS X and Linux

3 04/19/23

Page 4: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Main Features49 data preprocessing tools76 classification/regression algorithms8 clustering algorithms3 algorithms for finding association rules15 attribute/subset evaluators + 10

search algorithms for feature selection

4 04/19/23

Page 5: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Main GUIThree graphical user interfaces

“The Explorer” (exploratory data analysis)

“The Experimenter” (experimental environment)

“The KnowledgeFlow” (new process model inspired interface)

Simple CLI- provides users without a graphic interface option the ability to execute commands from a terminal window

5 04/19/23

Page 6: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

ExplorerThe Explorer:

Preprocess dataClassificationClusteringAssociation RulesAttribute SelectionData Visualization

References and Resources

6 04/19/23

Page 7: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/237

Explorer: pre-processing the dataData can be imported from a file in various

formats: ARFF, CSV, C4.5, binaryData can also be read from a URL or from

an SQL database (using JDBC)Pre-processing tools in WEKA are called

“filters”WEKA contains filters for:

Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

Page 8: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/238

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal,

atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

Page 9: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/239

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal,

atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

Page 10: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato10

Page 11: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato11

Page 12: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.
Page 13: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.
Page 14: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

IRIS dataset5 attributes, one is the classification3 classes: setosa, versicolor, virginica

Page 15: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato15

Page 16: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Attribute dataMin, max and average value of attributesdistribution of values :number of items for

which:

class: distribution of attribute values in the classes

Page 17: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.
Page 18: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato18

Page 19: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato19

Page 20: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato20

Page 21: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato21

Page 22: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Filtering attributesOnce the initial data has been selected and loaded the

user can select options for refining the experimental data.

The options in the preprocess window include selection of optional filters to apply and the user can select or remove different attributes of the data set as necessary to identify specific information.

The user can modify the attribute selection and change the relationship among the different attributes by deselecting different choices from the original data set.

There are many different filtering options available within the preprocessing window and the user can select the different options based on need and type of data present.

Page 23: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato23

Page 24: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato24

Page 25: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato25

Page 26: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato26

Page 27: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato27

Page 28: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato28

Page 29: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato29

Page 30: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato30

Page 31: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato31

Discretizes in 10 bins of equal frequency

Page 32: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato32

Discretizes in 10 bins of equal frequency

Page 33: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato33

Discretizes in 10 bins of equal frequency

Page 34: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato34

Page 35: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato35

Page 36: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato36

Page 37: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/2337

Explorer: building “classifiers”“Classifiers” in WEKA are machine learning

algorithmsfor predicting nominal or numeric quantities

Implemented learning algorithms include:Conjunctive rules, decision trees and lists,

instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

Page 38: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Explore Conjunctive Rules learner

Need a simple dataset with few attributes , let’s select the weather dataset

Page 39: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Select a Classifier

Page 40: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Select training method

Page 41: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Right-click to select parameters

numAntds= number of antecedents, -1= empty rule

Page 42: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Select numAntds=10

Page 43: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Results are shown in the right window (can be scrolled)

Page 44: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Can change the right hand side variable

Page 45: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Performance data

Page 46: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Decision Trees with WEKA

Page 47: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato47

Page 48: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato48

Page 49: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato49

Page 50: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato50

Page 51: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato51

Page 52: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato52

Page 53: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato53

Page 54: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato54

Page 55: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato55

Page 56: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato56

Page 57: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato57

Page 58: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato58

Page 59: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato59

Page 60: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato60

Page 61: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato61

Page 62: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato62

Page 63: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato63

Page 64: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato64

Page 65: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato65

Page 66: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato66

Page 67: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato67

Page 68: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato68

Page 69: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

right click: visualize cluster assignement

Page 70: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/2373

Explorer: finding associationsWEKA contains an implementation of the

Apriori algorithm for learning association rulesWorks only with discrete data

Can identify statistical dependencies between groups of attributes:milk, butter bread, eggs (with confidence

0.9 and support 2000)Apriori can compute all rules that have a

given minimum support and exceed a given confidence

Page 71: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

April 19, 202374

Basic Concepts: Frequent Patterns

itemset: A set of one or more items

k-itemset X = {x1, …, xk}(absolute) support, or,

support count of X: Frequency or occurrence of an itemset X

(relative) support, s, is the fraction of transactions that contains X (i.e., the probability that a transaction contains X)

An itemset X is frequent if X’s support is no less than a minsup threshold

Customerbuys diaper

Customerbuys both

Customerbuys beer

Tid Items bought

10 Beer, Nuts, Diaper

20 Beer, Coffee, Diaper

30 Beer, Diaper, Eggs

40 Nuts, Eggs, Milk

50 Nuts, Coffee, Diaper, Eggs, Milk

Page 72: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

April 19, 202375

Basic Concepts: Association Rules

Find all the rules X Y with minimum support and confidence support, s, probability that

a transaction contains X Y

confidence, c, conditional probability that a transaction having X also contains Y

Let minsup = 50%, minconf = 50%

Freq. Pat.: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3

Customerbuys diaper

Customerbuys both

Customerbuys beer

Nuts, Eggs, Milk40Nuts, Coffee, Diaper, Eggs,

Milk50

Beer, Diaper, Eggs30

Beer, Coffee, Diaper20

Beer, Nuts, Diaper10

Items boughtTid

Association rules: (many more!) Beer Diaper (60%, 100%) Diaper Beer (60%, 75%)

Page 73: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.
Page 74: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato77

Page 75: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato78

Page 76: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato79

Page 77: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato80

Page 78: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato81

Page 79: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> Class=democrat 219 conf:(1)

2. adoption-of-the-budget-resolution=y physician-fee-freeze=n aid-to-nicaraguan-contras=y 198 ==> Class=democrat 198 conf:(1)

3. physician-fee-freeze=n aid-to-nicaraguan-contras=y 211 ==> Class=democrat 210 conf:(1)

ecc.

Page 80: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/2383

Explorer: attribute selectionPanel that can be used to investigate which

(subsets of) attributes are the most predictive ones

Attribute selection methods contain two parts:A search method: best-first, forward selection,

random, exhaustive, genetic algorithm, rankingAn evaluation method: correlation-based,

wrapper, information gain, chi-squared, …Very flexible: WEKA allows (almost) arbitrary

combinations of these two

Page 81: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato84

Page 82: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato85

Page 83: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato86

Page 84: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato87

Page 85: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato88

Page 86: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato89

Page 87: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato90

Page 88: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato91

Page 89: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/2392

Explorer: data visualizationVisualization very useful in practice: e.g. helps

to determine difficulty of the learning problemWEKA can visualize single attributes (1-d) and

pairs of attributes (2-d)To do: rotating 3-d visualizations (Xgobi-style)

Color-coded class values“Jitter” option to deal with nominal attributes

(and to detect “hidden” data points)“Zoom-in” function

Page 90: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.
Page 91: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato94

Page 92: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato95

Page 93: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato96

Page 94: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato97

Page 95: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato98

click on a cell

Page 96: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato99

Page 97: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato100

Page 98: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato101

Page 99: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato102

Page 100: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

04/19/23University of Waikato103

Page 101: In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

References and ResourcesReferences:

WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html

WEKA Tutorial: Machine Learning with WEKA: A presentation demonstrating all

graphical user interfaces (GUI) in Weka. A presentation which explains how to use Weka for

exploratory data mining. WEKA Data Mining Book:

Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)

WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page

Others: Jiawei Han and Micheline Kamber, Data Mining: Concepts

and Techniques, 2nd ed.