Weka: a useful tool in data mining and machine learning
Team 5
Noha Elsherbiny, Huijun Xiong, and Bhanu Peddi
What does it really mean to mine data?
• Data mining is an experimental science.
• Data mining finds valuable information hidden in large volumes of data.
• Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data.
• Data mining encompasses varied fields, including:
– Databases
– Statistics
– Machine Learning
– High Performance Computing
– Visualization
– Mathematics
How does WEKA come into play?
• "Drowning in Data yet Starving for Knowledge"
• No single machine learning scheme is suitable for all data mining problems.
• WEKA (Waikato Environment for Knowledge Analysis)
What is in WEKA that makes it special?
• It provides many different algorithms for data mining and machine learning.
• It is open source and freely available.
• It is platform-independent.
• It is easily usable by people who are not data mining specialists.
• It provides flexible facilities for scripting experiments.
• It has been kept up to date, with new algorithms added as they appear in the research literature.
How does one apply WEKA, then?
• Apply a learning method to a dataset and analyze its output to learn more about the data.
• Use learned models to generate predictions on new instances.
• Apply several different learners and compare their performance in order to choose the best one for prediction.
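The three steps above (fit a learner, predict on new instances, compare learners) can be sketched in a few lines of plain Python. This is only an illustration of the workflow, not WEKA code: the two learners (a majority-class baseline and a 1-nearest-neighbour rule), the function names, and the toy data are all assumptions made up for this example.

```python
from collections import Counter

def train_majority(rows):
    """Baseline learner: always predict the most frequent training label."""
    label = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda features: label

def train_1nn(rows):
    """1-nearest-neighbour learner using squared Euclidean distance."""
    def predict(features):
        def dist(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], features))
        return min(rows, key=dist)[1]
    return predict

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# Hypothetical toy data: (features, label) pairs, invented for illustration.
train = [((1.0, 1.1), "a"), ((0.9, 1.0), "a"), ((1.2, 0.8), "a"),
         ((3.0, 3.2), "b"), ((3.1, 2.9), "b")]
test = [((1.1, 0.9), "a"), ((2.9, 3.0), "b"), ((3.2, 3.1), "b")]

# Step 1: apply each learning method to the training data.
learners = {"majority": train_majority, "1-nn": train_1nn}
models = {name: fit(train) for name, fit in learners.items()}

# Step 3: compare the learners on held-out data and keep the best one.
scores = {name: accuracy(model, test) for name, model in models.items()}
best = max(scores, key=scores.get)

# Step 2: use the chosen model to predict a new, unseen instance.
print(scores)
print("best:", best, "-> prediction:", models[best]((1.0, 1.0)))
```

A single held-out test set is the simplest possible comparison; with real data, a more careful estimate such as cross-validation would usually be preferable.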
How do you actually use it?
• All algorithms take their input in the form of a single relational table in the ARFF format.
• The learning methods are called classifiers.
– weka.classifiers.lazy.IBk: k-nearest neighbour learner
– weka.classifiers.trees.J48: decision trees
– weka.classifiers.bayes.NaiveBayes: Naive Bayes with/without kernels
– weka.classifiers.functions.SMO: support vector machines
• There are also pre-processing tools, called filters.
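To make the ARFF table and the idea of a filter concrete, here is a minimal sketch in plain Python. It is not WEKA's own parser or filter code: the dataset, the function names `parse_arff` and `normalize`, and the decision to support only dense rows with numeric and nominal attributes (no quoting, missing values, dates, or sparse data) are assumptions made for illustration.

```python
# Hypothetical ARFF file: a header declaring the relation and its attributes,
# followed by one comma-separated row per instance.
ARFF_TEXT = """\
% Toy dataset, invented for illustration
@relation weather_toy

@attribute temperature numeric
@attribute humidity numeric
@attribute play {yes,no}

@data
85,85,no
80,90,no
83,86,yes
70,96,yes
"""

NUMERIC = ("numeric", "real", "integer")

def parse_arff(text):
    """Parse a small dense ARFF text into (relation, attributes, rows)."""
    relation, attributes, rows, in_data = None, [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            rows.append([float(v) if kind.lower() in NUMERIC else v
                         for v, (_, kind) in zip(values, attributes)])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

def normalize(attributes, rows):
    """Filter sketch: rescale every numeric column to the range [0, 1]."""
    out = [list(r) for r in rows]
    for j, (_, kind) in enumerate(attributes):
        if kind.lower() not in NUMERIC:
            continue  # leave nominal columns such as the class untouched
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        for r in out:
            r[j] = (r[j] - lo) / (hi - lo) if hi > lo else 0.0
    return out

relation, attributes, rows = parse_arff(ARFF_TEXT)
scaled = normalize(attributes, rows)
print(relation, [name for name, _ in attributes])
for row in scaled:
    print(row)
```

The filter takes a table and returns a new table of the same shape, which is why pre-processing steps of this kind can be chained in front of any classifier.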