WEKA Tutorial Summer Institute 2012

86
SAN DIEGO SUPERCOMPUTER CENTER An Introduction to WEKA As presented by PACE 8/9/2012

Transcript of WEKA Tutorial Summer Institute 2012

Page 1: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

An Introduction to WEKA

As presented by PACE8/9/2012

Page 2: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Content

• What is WEKA?• The Explorer Application

• Preprocess• Classify• Cluster• Associate• Select Attributes• Visualize

• Weka on Trestles• References and Resources

2 04/11/23

Page 3: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

What is WEKA?• Weka is a bird found only in New Zealand. • Waikato Environment for Knowledge Analysis

• Weka is a data mining/machine learning tool developed by Department of Computer Science, University of Waikato, New Zealand.

• Weka is a collection of machine learning algorithms for data mining tasks.

• The algorithms can either be applied directly to a dataset or called from Java code.

• Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

• Weka is open source software in JAVA issued under the GNU General Public License

3 04/11/23

Page 4: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Download and Install WEKA

• Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html

• Support multiple platforms (written in java):• Windows, Mac OS X and Linux

• Datasets(iris.arff, weather.arff)• Available on Trestles at: /home/diag/opt/weka/data• Available with Download: …../weka/data/

4 04/11/23

Page 5: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Main Features

• 49 data preprocessing tools• 76 classification/regression algorithms• 8 clustering algorithms• 3 algorithms for finding association rules• 15 attribute/subset evaluators + 10 search

algorithms for feature selection

5 04/11/23

Page 6: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Main GUI• Three graphical user interfaces

• “The Explorer” (exploratory data analysis)• pre-process data• build “classifiers” • cluster data• find associations• attribute selection• data visualization

• “The Experimenter” (experimental environment)• used to compare performance of different learning

schemes

• “The KnowledgeFlow” (new process model inspired interface)

• Java-Beans-based interface for setting up and running machine learning experiments.

• Command line Interface (“Simple CLI”)

6 04/11/23More at: http://www.cs.waikato.ac.nz/ml/weka/index_documentation.html

Page 7: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Content

• What is WEKA?• The Explorer:

• Preprocess• Classify• Cluster• Associate• Select Attributes• Visualize

• Weka on Trestles• References and Resources

7 04/11/23

Page 8: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER8University of Waikato 04/11/23

Page 9: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

WEKA:: Explorer: Preprocess

• Data format• Uses flat text files to describe the data• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary• Data can also be read from a URL or from an SQL

database (using JDBC)

Page 10: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

WEKA:: ARFF file format@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

A more thorough description is available here http://www.cs.waikato.ac.nz/~ml/weka/arff.html

Page 11: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER11

University of Waikato 04/11/23

Page 12: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER12

University of Waikato 04/11/23

Page 13: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER13

University of Waikato 04/11/23

Page 14: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER14

University of Waikato 04/11/23

Page 15: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER15

University of Waikato 04/11/23

Page 16: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER16

University of Waikato 04/11/23

Page 17: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER17

University of Waikato 04/11/23

Page 18: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

WEKA:: Explorer: Preprocess

• Used to define filters to transform Data.

• WEKA contains filters for:• Discretization, normalization, resampling,

attribute selection, transforming, combining attributes, etc

Page 19: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER19

University of Waikato 04/11/23

Page 20: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER20

University of Waikato 04/11/23

Page 21: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER21

University of Waikato 04/11/23

Page 22: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER22

University of Waikato 04/11/23

Page 23: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER23

University of Waikato 04/11/23

Page 24: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER24

University of Waikato 04/11/23

Page 25: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER25

University of Waikato 04/11/23

Page 26: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER26

University of Waikato 04/11/23

Page 27: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER27

University of Waikato 04/11/23

Page 28: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER28

University of Waikato 04/11/23

Page 29: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER29

University of Waikato 04/11/23

Page 30: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER30

University of Waikato 04/11/23

Page 31: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER31

University of Waikato 04/11/23

Page 32: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

WEKA:: Explorer: building “classifiers”

• Classifiers in WEKA are models for predicting nominal or numeric quantities

• Implemented learning schemes include:• Decision trees and lists, instance-based

classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

• “Meta”-classifiers include:• Bagging, boosting, stacking, error-correcting

output codes, locally weighted learning, …

Page 33: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

age income student credit_rating buys_computer<=30 high no fair no<=30 high no excellent no31…40 high no fair yes>40 medium no fair yes>40 low yes fair yes>40 low yes excellent no31…40 low yes excellent yes<=30 medium no fair no<=30 low yes fair yes>40 medium yes fair yes<=30 medium yes excellent yes31…40 medium no excellent yes31…40 high yes fair yes>40 medium no excellent no

33April 11, 2023

This follows an example of Quinlan’s ID3

Decision Tree Induction: Training Dataset

Page 34: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER34April 11, 2023

age?

overcast

student? credit rating?

<=30 >40

no yes yes

yes

31..40

no

fairexcellentyesno

Output: A Decision Tree for “buys_computer”

Page 35: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER36

University of Waikato 04/11/23

Page 36: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER37

University of Waikato 04/11/23

Page 37: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER38

University of Waikato 04/11/23

Page 38: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER39

University of Waikato 04/11/23

Page 39: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER40

University of Waikato 04/11/23

Page 40: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER41

University of Waikato 04/11/23

Page 41: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER42

University of Waikato 04/11/23

Page 42: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER43

University of Waikato 04/11/23

Page 43: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER44

University of Waikato 04/11/23

Page 44: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER45

University of Waikato 04/11/23

Page 45: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER46

University of Waikato 04/11/23

Page 46: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER47

University of Waikato 04/11/23

Page 47: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER48

University of Waikato 04/11/23

Page 48: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER49

University of Waikato 04/11/23

Page 49: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER50

University of Waikato 04/11/23

Page 50: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER51

University of Waikato 04/11/23

Page 51: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER52

University of Waikato 04/11/23

Page 52: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER53

University of Waikato 04/11/23

Page 53: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER54

University of Waikato 04/11/23

Page 54: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER55

University of Waikato 04/11/23

Page 55: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER56

University of Waikato 04/11/23

Page 56: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Explorer: Select Attributes

• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones

• Attribute selection methods contain two parts:• A search method: best-first, forward selection, random,

exhaustive, genetic algorithm, ranking• An evaluation method: correlation-based, wrapper,

information gain, chi-squared, …

• Very flexible: WEKA allows (almost) arbitrary combinations of these two

57

04/11/23

Page 57: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER58

University of Waikato 04/11/23

Page 58: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER59

University of Waikato 04/11/23

Page 59: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER60

University of Waikato 04/11/23

Page 60: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER61

University of Waikato 04/11/23

Page 61: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER62

University of Waikato 04/11/23

Page 62: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER63

University of Waikato 04/11/23

Page 63: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER64

University of Waikato 04/11/23

Page 64: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER65

University of Waikato 04/11/23

Page 65: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Explorer: Visualize

• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem

• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)• To do: rotating 3-d visualizations (Xgobi-style)

• Color-coded class values• “Jitter” option to deal with nominal attributes

(and to detect “hidden” data points)• “Zoom-in” function

6604/11/23

Page 66: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER67

University of Waikato 04/11/23

Page 67: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER68

University of Waikato 04/11/23

Page 68: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER69

University of Waikato 04/11/23

Page 69: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER70

University of Waikato 04/11/23

Page 70: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER71

University of Waikato 04/11/23

Page 71: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER72

University of Waikato 04/11/23

Page 72: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER73

University of Waikato 04/11/23

Page 73: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER74

University of Waikato 04/11/23

Page 74: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER75

University of Waikato 04/11/23

Page 75: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER76

University of Waikato 04/11/23

Page 76: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Using Weka On Trestles

Page 77: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Using Weka on Trestles

• Shared Resources• Batch and Interactive• Use GUI and Command Line• Use GUI on login nodes to create command line• Use command line to run interactive or batch

jobs on production nodes

Page 78: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Weka Gui

• To launch Weka Gui on a:• Windows machine

• to run software on remote machine with GUI requires a secure shell with x forwarding enabled to establish a remote connection and an X Server to handle the local display.

– Suggested software putty and Xming» http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html» http://www.straightrunning.com/XmingNotes/

• Linux and MAC OS X support X Forwarding• Mac users need to run Applications > Utilities > Xterm• ssh –Y [email protected]

• Load weka module• Weka installation available at: /home/diag/opt/weka• At command prompt > weka

Page 79: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

PBS Script

Page 80: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Output file

Page 81: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Hands On with Weka

Page 82: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

The Weather Data Set.arff file

Weather.arff file • Available on Trestles at: /home/diag/opt/weka/data• On line: http://www.hakank.org/weka/• With Weka download

Data Set:

@relation PlayTennis

@attribute day numeric@attribute outlook {Sunny, Overcast, Rain} @attribute temperature {Hot, Mild, Cool} @attribute humidity {High, Normal}@attribute wind {Weak, Strong} @attribute playTennis {Yes, No}

@data 1,Sunny,Hot,High,Weak,No,? 2,Sunny,Hot,High,Strong,No,?3,Overcast,Hot,High,Weak,Yes,? 4,Rain,Mild,High,Weak,Yes,? 5,Rain,Cool,Normal,Weak,Yes,? 6,Rain,Cool,Normal,Strong,No,?7,Overcast,Cool,Normal,Strong,Yes,?8,Sunny,Mild,High,Weak,No,?.

Page 83: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

The Problem

• Each instance describes the facts of the day and the action of the observed person (played or no play).

• The Data Set• 14 Instances• 6 attributes (day, outlook, temp, humidity, wind, play

tennis)

• Based on the given records we can assess which factors affected the person's decision about playing tennis.

Page 84: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

The Question

• Use j48 decision tree learner to model for class attribute play tennis

• Make prediction for “play”.• Make predictions for the ‘temperature’ attribute. Do you

need to do any additional data preparation?

Page 85: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

Result

Page 86: WEKA Tutorial Summer Institute 2012

SAN DIEGO SUPERCOMPUTER CENTER

References and Resources• References:

• WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html

• WEKA Tutorial:• Machine Learning with WEKA: A presentation demonstrating all graphical

user interfaces (GUI) in Weka. • A presentation which explains how to use Weka for exploratory data

mining. • WEKA Data Mining Book:

• Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)

• WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page