WekaTutorial Knowledge Flow With Notes


Transcript of WekaTutorial Knowledge Flow With Notes

Page 1: WekaTutorial Knowledge Flow With Notes


Department of Computer Science, University of Waikato, New Zealand

Adapted from Eibe Frank

WEKA: A Machine Learning Toolkit

The Explorer

• Classification and Regression
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization

The Experimenter

The Knowledge Flow GUI

Conclusions

Machine Learning with WEKA

Page 2: WekaTutorial Knowledge Flow With Notes


2007 Adapted from Eibe Frank 2

WEKA: the bird

Copyright: Martin Kramer

Page 3: WekaTutorial Knowledge Flow With Notes


WEKA: the software

• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank

Main features:

• Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
• Graphical user interfaces (incl. data visualization)
• Environment for comparing learning algorithms


Page 6: WekaTutorial Knowledge Flow With Notes


The Knowledge Flow GUI

• New graphical user interface for WEKA
• Java-Beans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components, e.g. “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later
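The idea of data flowing through connected components can be illustrated outside WEKA with a small Python sketch. It does not use the WEKA API or Java Beans. The function names (`data_source`, `drop_missing`, `threshold_classifier`, `evaluator`, `run_flow`), the four-row dataset and the toy classification rule are all assumptions made up for this illustration.

```python
from typing import Dict, List

Row = Dict[str, object]

def data_source() -> List[Row]:
    """Hypothetical data source: returns a tiny in-memory dataset."""
    return [
        {"petal_length": 1.4, "label": "setosa"},
        {"petal_length": 4.7, "label": "versicolor"},
        {"petal_length": None, "label": "setosa"},   # row with a missing value
        {"petal_length": 5.1, "label": "virginica"},
    ]

def drop_missing(rows: List[Row]) -> List[Row]:
    """Filter: remove rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def threshold_classifier(rows: List[Row]) -> List[Row]:
    """Classifier: toy rule, petal_length < 2.5 means 'setosa', else 'other'."""
    return [
        dict(r, predicted="setosa" if r["petal_length"] < 2.5 else "other")
        for r in rows
    ]

def evaluator(rows: List[Row]) -> float:
    """Evaluator: accuracy on the two-way question 'setosa or not'."""
    correct = sum(
        (r["label"] == "setosa") == (r["predicted"] == "setosa") for r in rows
    )
    return correct / len(rows)

def run_flow(source, *stages):
    """Pass the output of each component to the next one, in order."""
    data = source()
    for stage in stages:
        data = stage(data)
    return data

# "data source" -> "filter" -> "classifier" -> "evaluator"
accuracy = run_flow(data_source, drop_missing, threshold_classifier, evaluator)
print(accuracy)
```

Each stage only needs to accept what the previous stage emits. That is roughly why components in a flow layout can be rearranged or swapped, although the real Knowledge Flow connects beans through typed events rather than plain function calls.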

Page 7: WekaTutorial Knowledge Flow With Notes


This is the main window. In the current version it opens on the ‘DataSources’ tab.

Page 8: WekaTutorial Knowledge Flow With Notes


If the ‘DataSources’ tab is not selected, click on it.

Page 9: WekaTutorial Knowledge Flow With Notes


The top area of the window shows the available data source objects, which in WEKA are:
- ‘Arff Loader’, which loads an ARFF file.
- ‘C45 Loader’, which loads data in the C4.5 format (a .names file describing the attributes plus a .data file holding the instances, as used by the C4.5 decision tree learner).
- ‘CSV Loader’, which loads a comma-separated values file.
- ‘Database Loader’, which loads data from a database.
- ‘Serialized Instances Loader’, which loads data from a serialized Java file.

Click on the ‘Arff Loader’ and insert it in the knowledge flow layout.
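To make the ARFF format less abstract, here is a minimal Python sketch of what an ARFF loader has to do: read the `@relation` name, collect the `@attribute` declarations, and then read comma-separated instances after `@data`. This is not WEKA's own loader. The function `parse_arff` and the three-instance sample are assumptions for illustration, and the sketch ignores quoted names, sparse instances and type checking.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attribute names, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()                   # ARFF keywords are case-insensitive
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

# Illustrative excerpt in the style of iris.arff (not the full file).
SAMPLE = """\
% comment lines start with a percent sign
@relation iris
@attribute sepallength numeric
@attribute petallength numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
5.1,1.4,Iris-setosa
4.9,1.4,Iris-setosa
7.0,4.7,Iris-versicolor
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, attributes, len(rows))
```

The header is what distinguishes ARFF from plain CSV: attribute names and types are declared up front, so a loader knows which columns are numeric and which are nominal before it reads any instance.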

Page 10: WekaTutorial Knowledge Flow With Notes


Click on the ‘Visualization’ tab.

Page 11: WekaTutorial Knowledge Flow With Notes


Select a ‘DataVisualizer’ object and drag it into the layout.

Page 12: WekaTutorial Knowledge Flow With Notes


Right-click on the ‘Arff Loader’.

Page 13: WekaTutorial Knowledge Flow With Notes


Select ‘dataset’ and connect it to the ‘Data Visualizer’ object.


Page 15: WekaTutorial Knowledge Flow With Notes


The data loaded by the ‘Arff Loader’ is passed to the ‘Data Visualizer’, which can then display it.

In the ‘Configure’ option of the ‘Arff Loader’, select a dataset (‘iris.arff’, for example).

Then select the ‘Start loading’ option of the ‘Arff Loader’, which sends the loaded data to the ‘Data Visualizer’.

Now you can select the ‘Show plot’ option of the ‘Data Visualizer’ to see the data in the selected dataset.

Page 16: WekaTutorial Knowledge Flow With Notes


Build the data flow illustrated in the figure. You can find the required elements in these tabs:
- ‘Data Visualizer’, ‘Attribute Summarizer’, ‘Scatter Plot Matrix’ and ‘Text Viewer’ in the ‘Visualization’ tab.
- ‘Cross Validation Fold Maker’ and ‘Classifier Performance Evaluator’ in the ‘Evaluation’ tab.
- ‘Attribute Selection’ in the ‘Filters’ tab.
- ‘SMO’ in the ‘Classifiers’ tab.

Connect the objects by opening the context menu of each source object and choosing the connection type shown as the label on the link to the object you want to connect to.

Page 17: WekaTutorial Knowledge Flow With Notes


Right-click on the ‘Arff Loader’.

Page 18: WekaTutorial Knowledge Flow With Notes


Select ‘Configure’.

Page 19: WekaTutorial Knowledge Flow With Notes


Choose the ‘iris.arff’ dataset.

Page 20: WekaTutorial Knowledge Flow With Notes


Select ‘Start loading’ from the ‘Arff Loader’ menu to run the data flow.

Page 21: WekaTutorial Knowledge Flow With Notes


Go to the ‘Text Viewer’ and select ‘Show results’.

Page 22: WekaTutorial Knowledge Flow With Notes


You can now see the results of the classification of the ‘iris.arff’ dataset using the SMO classifier.

Page 23: WekaTutorial Knowledge Flow With Notes


Try this new data flow.

Page 24: WekaTutorial Knowledge Flow With Notes


Another Example: setting up a flow to load an ARFF file (batch mode) and perform a cross-validation using J48 (WEKA's C4.5 implementation).

First start the KnowledgeFlow.

Next click on the DataSources tab and choose "ArffLoader" from the toolbar (the mouse pointer will change to "cross hairs").

Next place the ArffLoader component on the layout area by clicking somewhere on the layout (a copy of the ArffLoader icon will appear on the layout area).

Next specify an ARFF file to load by first right-clicking over the ArffLoader icon on the layout. A pop-up menu will appear. Select "Configure" under "Edit" in this menu and browse to the location of your ARFF file.

Next click the "Evaluation" tab at the top of the window and choose the "ClassAssigner" component from the toolbar (it lets you choose which column is the class). Place this on the layout.

Page 25: WekaTutorial Knowledge Flow With Notes


Another Example:

Now connect the ArffLoader to the ClassAssigner: first right-click over the ArffLoader and select "dataSet" under "Connections" in the menu. A "rubber band" line will appear. Move the mouse over the ClassAssigner component and left-click; a red line labeled "dataSet" will connect the two components.

Next right click over the ClassAssigner and choose "Configure" from the menu. This will pop up a window from which you can specify which column is the class in your data (last is the default).

Next grab a "CrossValidationFoldMaker" component from the Evaluation toolbar and place it on the layout. Connect the ClassAssigner to the CrossValidationFoldMaker by right clicking over "ClassAssigner" and selecting "dataSet" from under "Connections" in the menu.

Next click on the "Classifiers" tab at the top of the window and scroll along the toolbar until you reach the "J48" component in the "trees" section. Place a J48 component on the layout.

Connect the CrossValidationFoldMaker to J48 TWICE by first choosing "trainingSet" and then "testSet" from the pop-up menu for the CrossValidationFoldMaker.

Next go back to the "Evaluation" tab and place a "ClassifierPerformanceEvaluator" component on the layout. Connect J48 to this component by selecting the "batchClassifier" entry from the pop-up menu for J48.

Next go to the "Visualization" toolbar and place a "TextViewer" component on the layout. Connect the ClassifierPerformanceEvaluator to the TextViewer by selecting the "text" entry from the pop-up menu for ClassifierPerformanceEvaluator.

Now start the flow executing by selecting "Start loading" from the pop-up menu for ArffLoader. Depending on how big the data set is and how long cross validation takes you will see some animation from some of the icons in the layout (J48's tree will "grow" in the icon and the ticks will animate on the ClassifierPerformanceEvaluator). You will also see some progress information in the "Status" bar and "Log" at the bottom of the window.

When finished, you can view the results by choosing "Show results" from the pop-up menu for the TextViewer component.

Other cool things to add to this flow: connect a TextViewer and/or a GraphViewer to J48 in order to view the textual or graphical representations of the trees produced for each fold of the cross validation (this is something that is not possible in the Explorer).
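The flow built in this example (fold maker -> classifier -> performance evaluator) follows the usual cross-validation recipe, which can be sketched in a few lines of Python. This is not WEKA code and makes no claim about how CrossValidationFoldMaker splits the data internally: `make_folds`, `train_majority` and `cross_validate` are hypothetical names, the folds are not stratified, and a majority-class learner stands in for J48 to keep the sketch short.

```python
import random
from collections import Counter

def make_folds(instances, num_folds, seed=1):
    """Shuffle the instances and yield one (train, test) pair per fold."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    for k in range(num_folds):
        # Every num_folds-th instance, starting at k, forms the test set.
        test = shuffled[k::num_folds]
        train = [x for i, x in enumerate(shuffled) if i % num_folds != k]
        yield train, test

def train_majority(train):
    """Stand-in learner (NOT J48): always predicts the most common training class."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: majority

def cross_validate(instances, num_folds):
    """Train on each training set, test on the matching test set, pool the results."""
    correct = total = 0
    for train, test in make_folds(instances, num_folds):
        model = train_majority(train)
        correct += sum(model(features) == label for features, label in test)
        total += len(test)
    return correct / total

# Made-up dataset: 8 instances of class "a" and 2 of class "b".
data = [((i,), "a") for i in range(8)] + [((i,), "b") for i in range(2)]
accuracy = cross_validate(data, num_folds=5)
print(accuracy)
```

The two connections drawn from the fold maker to the classifier ("trainingSet" and "testSet") correspond to the `train` and `test` halves of each pair here, and the evaluator's role is the pooling done in `cross_validate`.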

Page 26: WekaTutorial Knowledge Flow With Notes


Conclusion: try it yourself!

• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
• Also has a list of projects based on WEKA
• WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang