Intro to Machine Learning - UZH IfI
Today’s Schedule
• Introduction to Machine Learning
• Possible project topics
• Paper discussion
• Find team mates (if desired)
Introduction to Machine Learning
Topics
• Basic concepts and process
• Algorithms
• Example (WEKA)
Unsupervised Learning
• Find patterns / clusters
• Evaluation: similarity value, classes to clusters, …
Supervised Learning
• Predict the correct label (e.g. green, blue, red)
• Evaluation: correctly classified instances, false positive rate, …
Classification: distinct categories (e.g. blue, red, green)
Regression: no distinct categories, but a real value
Features / Attributes, Instances
#   R     G     B     Color
1   227   25    59    Red
2   17    184   56    Green
3   113   125   222   Blue
4   230   67    175   Red
• Each row is an instance
• R, G, B: features / attributes
• Color: label / class attribute
Training Set – Test Set
[Figure: a dataset of handwritten digits with their labels, divided into a training set and a test set]
Validation Methods
• Split: 2/3 training set, 1/3 test set
• 10-fold cross-validation
• Leave-one-out cross-validation
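The following is a minimal sketch of how k-fold cross-validation partitions a dataset, assuming instances are addressed by their index. The class name `KFoldSplit`, the method `folds`, and the seed are illustrative and not part of the original slides. Each fold serves once as the test set while the remaining folds form the training set. Setting k equal to the number of instances gives leave-one-out cross-validation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class KFoldSplit {
    /** Shuffles the indices 0..n-1 and deals them into k folds whose sizes differ by at most one. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            result.get(i % k).add(indices.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        int n = 768;  // illustrative dataset size
        int k = 10;   // 10-fold cross-validation
        List<List<Integer>> folds = folds(n, k, 1L);
        for (int f = 0; f < k; f++) {
            int testSize = folds.get(f).size();
            // Fold f is the test set; all other folds together are the training set.
            System.out.println("fold " + f + ": test=" + testSize + ", train=" + (n - testSize));
        }
    }
}
```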
Validation Metrics
Confusion Matrix:
                              Classifier outcome: Positive   Classifier outcome: Negative
Condition (label): Positive   True positive                  False negative
Condition (label): Negative   False positive                 True negative
Accuracy: (Σ True positive + Σ True negative) / total
True positive rate = Recall: Σ True positive / Σ Condition positive
True negative rate: Σ True negative / Σ Condition negative
Precision: Σ True positive / Σ Classifier outcome positive
Compare to: base accuracy = percentage share of most likely category
Source and more information: http://en.wikipedia.org/wiki/Confusion_matrix
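As a small sketch, the metrics above can be computed directly from the four cells of a confusion matrix. The class `ConfusionMetrics` and the counts in `main` are hypothetical and chosen only for illustration.

```java
class ConfusionMetrics {
    final double tp, fn, fp, tn;

    ConfusionMetrics(double tp, double fn, double fp, double tn) {
        this.tp = tp;
        this.fn = fn;
        this.fp = fp;
        this.tn = tn;
    }

    double total() { return tp + fn + fp + tn; }

    /** (true positives + true negatives) / total */
    double accuracy() { return (tp + tn) / total(); }

    /** True positive rate: true positives / condition positive */
    double recall() { return tp / (tp + fn); }

    /** True negatives / condition negative */
    double trueNegativeRate() { return tn / (tn + fp); }

    /** True positives / classifier outcome positive */
    double precision() { return tp / (tp + fp); }

    /** Accuracy of always predicting the more frequent label. */
    double baseAccuracy() { return Math.max(tp + fn, fp + tn) / total(); }

    public static void main(String[] args) {
        // Hypothetical counts: 40 TP, 10 FN, 5 FP, 45 TN.
        ConfusionMetrics m = new ConfusionMetrics(40, 10, 5, 45);
        System.out.println("accuracy           = " + m.accuracy());
        System.out.println("recall             = " + m.recall());
        System.out.println("true negative rate = " + m.trueNegativeRate());
        System.out.println("precision          = " + m.precision());
        System.out.println("base accuracy      = " + m.baseAccuracy());
    }
}
```

A classifier is only useful if its accuracy is clearly above the base accuracy, which is why the two are reported side by side.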
Underfitting - Overfitting
Basic Process
1. Data collection
2. Feature calculation
3. Feature selection
4. Classification
Algorithms
• Naive Bayes
• Support Vector Machine
• Decision Trees
(There are many more: neural networks, k-nearest neighbour, …)
Naïve Bayes
• Fast and high performance
• Based on Bayes' theorem
• Assumes independence of features
Example: e-mail classification into spam and not spam. Features: words
Bayes' theorem: P(class | features) = P(features | class) · P(class) / P(features)
Source: http://de.wikipedia.org/wiki/Bayes-Klassifikator
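To make the spam example concrete, here is a minimal sketch of a naive Bayes posterior for two classes. All probabilities and the class name `NaiveBayesSpam` are hypothetical. In practice they would be estimated from word counts in a labelled training set. The independence assumption shows up as the plain product over the words.

```java
import java.util.List;
import java.util.Map;

class NaiveBayesSpam {
    // Hypothetical prior and per-word likelihoods, for illustration only.
    static final double P_SPAM = 0.4;
    static final Map<String, Double> P_WORD_GIVEN_SPAM = Map.of("offer", 0.30, "meeting", 0.02);
    static final Map<String, Double> P_WORD_GIVEN_NOT_SPAM = Map.of("offer", 0.03, "meeting", 0.20);

    /** Returns P(spam | words), treating the words as independent given the class. */
    static double posteriorSpam(List<String> words) {
        double spam = P_SPAM;
        double notSpam = 1.0 - P_SPAM;
        for (String word : words) {
            if (!P_WORD_GIVEN_SPAM.containsKey(word)) {
                continue; // words without an estimate are ignored in this sketch
            }
            spam *= P_WORD_GIVEN_SPAM.get(word);
            notSpam *= P_WORD_GIVEN_NOT_SPAM.get(word);
        }
        // Dividing by the sum plays the role of P(features) in Bayes' theorem.
        return spam / (spam + notSpam);
    }

    public static void main(String[] args) {
        System.out.println("P(spam | offer)   = " + posteriorSpam(List.of("offer")));
        System.out.println("P(spam | meeting) = " + posteriorSpam(List.of("meeting")));
    }
}
```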
Support Vector Machine
• Divides objects into classes while keeping the margin between the classes as large as possible (large margin classifier)
• Can be used for classification and regression
Source: http://de.wikipedia.org/wiki/Support_Vector_Machine
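The sketch below illustrates the large-margin idea for the linear case only. It assumes a separating hyperplane w·x + b = 0 that is already given. The weights, the bias, and the class name `LinearSvmSketch` are hypothetical. A real SVM (such as Weka's SMO) learns w and b from the training data so that the margin 2 / ||w|| is as large as possible.

```java
class LinearSvmSketch {
    // Hypothetical hyperplane parameters, not learned from any data.
    static final double[] W = {3.0, 4.0};
    static final double B = -5.0;

    /** Signed score w·x + b; its sign decides the class. */
    static double decisionValue(double[] x) {
        double sum = B;
        for (int i = 0; i < W.length; i++) {
            sum += W[i] * x[i];
        }
        return sum;
    }

    /** Returns +1 or -1 depending on which side of the hyperplane x lies. */
    static int classify(double[] x) {
        return decisionValue(x) >= 0 ? 1 : -1;
    }

    /** Distance between the hyperplanes w·x + b = +1 and w·x + b = -1, i.e. 2 / ||w||. */
    static double marginWidth() {
        double squaredNorm = 0.0;
        for (double w : W) {
            squaredNorm += w * w;
        }
        return 2.0 / Math.sqrt(squaredNorm);
    }

    public static void main(String[] args) {
        System.out.println("class of (3, 4): " + classify(new double[] {3.0, 4.0}));
        System.out.println("class of (0, 0): " + classify(new double[] {0.0, 0.0}));
        System.out.println("margin width:    " + marginWidth());
    }
}
```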
Decision Tree
• Builds a tree to classify objects
• Leaves = class labels
• Branches = conjunctions of features that lead to those class labels
• Can be used for classification and regression
Source: http://en.wikipedia.org/wiki/Decision_tree_learning
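A learned decision tree is equivalent to nested if/else tests. The sketch below is a hand-written tree with made-up thresholds. The feature names, the thresholds, and the class `DecisionTreeSketch` are assumptions for illustration and were not learned from any dataset. Each `return` is a leaf, and the conditions on the path to it form the conjunction of feature tests.

```java
class DecisionTreeSketch {
    /** Classifies one instance by walking a fixed, hypothetical two-level tree. */
    static String classify(double plasmaGlucose, double bodyMassIndex) {
        if (plasmaGlucose <= 127.0) {
            // Leaf reached by: glucose <= 127
            return "tested_negative";
        }
        if (bodyMassIndex <= 29.9) {
            // Leaf reached by: glucose > 127 AND BMI <= 29.9
            return "tested_negative";
        }
        // Leaf reached by: glucose > 127 AND BMI > 29.9
        return "tested_positive";
    }

    public static void main(String[] args) {
        System.out.println(classify(150.0, 35.0));
        System.out.println(classify(100.0, 35.0));
        System.out.println(classify(150.0, 25.0));
    }
}
```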
WEKA
• Java machine learning framework
• Provides a Java library and a graphical user interface
• Implements many preprocessing algorithms (filters) and classifiers
• Filters: attribute selection, transforming and combining attributes, discretization, normalization, …
• Classifiers: Support Vector Machine (SMO), Decision Tree (J48), Naive Bayes, …
Source: http://www.cs.waikato.ac.nz/ml/weka/
Example Dataset: diabetes.arff
Attributes:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
General Info:
• Number of Instances: 768
• Number of Attributes: 8 plus class
• Number of instances with label tested_negative: 500
• Number of instances with label tested_positive: 268
6,148,72,35,0,33.6,0.627,50,tested_positive
1,85,66,29,0,26.6,0.351,31,tested_negative
8,183,64,0,0,23.3,0.672,32,tested_positive
1,89,66,23,94,28.1,0.167,21,tested_negative
0,137,40,35,168,43.1,2.288,33,tested_positive
This and more datasets here: http://storm.cis.fordham.edu/~gweiss/data-mining/datasets.html
Example Weka Code Part 1
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

// Read the data file
DataSource source = new DataSource("C:/Users/Manuela/OneDrive/Work/Teaching/HASE/diabetes.arff");
Instances data = source.getDataSet();

// Set the class variable if the file does not already define one
if (data.classIndex() == -1) {
    data.setClassIndex(data.attribute("class").index());
}

// Attribute selection: CFS subset evaluation with a backward greedy stepwise search
AttributeSelection filter = new AttributeSelection();
CfsSubsetEval eval = new CfsSubsetEval();
GreedyStepwise search = new GreedyStepwise();
search.setSearchBackwards(true);
filter.setEvaluator(eval);
filter.setSearch(search);
filter.setInputFormat(data);

// Attribute reduction: keep only the selected attributes
Instances filteredData = Filter.useFilter(data, filter);
Example Weka Code Part 2
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

// Classifier template; not defined in the original snippet, assumed here to be a default J48
J48 tree = new J48();

// We do a 10 times 10-fold cross-validation
for (int i = 0; i < 10; i++) {
    // Randomize the data
    int seed = i + 1;
    Random rand = new Random(seed);
    Instances randData = new Instances(data);
    randData.randomize(rand);
    if (randData.classAttribute().isNominal())
        randData.stratify(10);

    Evaluation evalJ48 = new Evaluation(randData);
    for (int n = 0; n < 10; n++) {
        // Set training set and test set
        Instances train = randData.trainCV(10, n);
        Instances test = randData.testCV(10, n);

        // Build and evaluate the classifier
        J48 newTree = (J48) J48.makeCopy(tree);
        newTree.buildClassifier(train);
        evalJ48.evaluateModel(newTree, test);
    }
}
Interpretation of Results
Classifier    Features   Accuracy (%)
J48           All        74.49
J48           Selected   74.38
SMO           All        76.81
SMO           Selected   76.95
Naïve Bayes   All        75.76
Naïve Bayes   Selected   77.06
Selected Features:
• Plasma glucose concentration at 2 hours in an oral glucose tolerance test
• Body mass index
• Diabetes pedigree function (synthesis of family history concerning diabetes)
• Age
Base accuracy: 65.1 %
Interpretation of Results
                              Classifier outcome: Positive   Classifier outcome: Negative
Condition (label): Positive   436.3                          63.7
Condition (label): Negative   112.5                          155.5
Confusion Matrix: Naïve Bayes, selected features
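As a quick consistency check, the reported accuracy and base accuracy can be recomputed from this confusion matrix. The sketch below uses only the four cell values from the slide. The class and method names are illustrative. Note that the "positive" row sums to 500, the number of tested_negative instances, so "positive" here presumably denotes the first class value rather than a positive diabetes test. The fractional counts are presumably averages over the 10 repetitions.

```java
import java.util.Locale;

class DiabetesResultCheck {
    /** (true positives + true negatives) / total */
    static double accuracy(double tp, double fn, double fp, double tn) {
        return (tp + tn) / (tp + fn + fp + tn);
    }

    /** Share of the more frequent label, i.e. the accuracy of always predicting it. */
    static double baseAccuracy(double conditionPositive, double conditionNegative) {
        return Math.max(conditionPositive, conditionNegative) / (conditionPositive + conditionNegative);
    }

    public static void main(String[] args) {
        // Cell values taken from the confusion matrix above.
        double tp = 436.3, fn = 63.7, fp = 112.5, tn = 155.5;
        System.out.printf(Locale.ROOT, "accuracy:      %.2f %%%n", 100 * accuracy(tp, fn, fp, tn));
        System.out.printf(Locale.ROOT, "base accuracy: %.1f %%%n", 100 * baseAccuracy(tp + fn, fp + tn));
    }
}
```

The accuracy works out to 591.8 / 768, about 77.06 %, and the base accuracy to 500 / 768, about 65.1 %, which agrees with the values reported for Naïve Bayes with selected features.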
Summary
• Basic concepts of Machine Learning: Classification, Test Set, Cross-Validation, Confusion Matrix, Overfitting
• Machine Learning algorithms: Naïve Bayes, Support Vector Machine, Decision Tree
• Example
Further Readings / Links to Machine Learning
• Weka Download: http://www.cs.waikato.ac.nz/ml/weka/downloading.html
• Weka Wiki: http://weka.wikispaces.com/
• Sample Datasets: http://storm.cis.fordham.edu/~gweiss/data-mining/datasets.html
• Book about Machine Learning and Weka: http://www.cs.waikato.ac.nz/ml/weka/book.html
• Book about Artificial Intelligence: http://aima.cs.berkeley.edu/
Datasets and Projects
Sensing and Visualizing Heart Rate Variability
• Develop a visualization for heart rate related data
• Find your research question concerning collected data or the visualization and do a small evaluation
Measuring Interruptions
• Develop an approach to detect external interruptions using a microphone and voice recognition libraries
• Evaluate the accuracy of your approach
Sleep / Activity and Productivity
• Develop an approach to collect interesting data (sleep, physical activity, hand movements) using an activity tracker
• Collect productivity ratings and activity data, and evaluate them
Fostering Focused Time Through Self-Awareness
• Determine a concept to be visualized using an LED indicator
• Find your research question and evaluate it
Exploring Program Switches
• Use a given data set or track your own program switches
• Find an interesting research question and evaluate it (e.g. Is the number of program switches related to attention? How does the switching behavior change over time?)
Scheduling Meetings for Software Developers
• Analyze how software developers schedule their meetings, using a survey or by analyzing Outlook data
Dataset: Developer Activity and Biometric Data
• Raw data recorded by the sensors
• User input: keystrokes, clicks, movement, scrolling
• Active window title and activity category
Possible RQ: Are developers more productive when they are happy, and are there patterns of productivity / happiness?
Dataset: Software Repository Histories and Metrics
This large-scale dataset contains code metrics which have been computed for ~1.3 Million revisions of 300 Java, C# and JavaScript projects. As such, it describes the evolutionary history of these projects at a very fine-grained level.
Possible RQ: Is it possible to identify evolutionary patterns? (i.e. could time series clustering analyses reveal "typical" kinds of project evolutions?)
Image Sources
Title Page: http://www.enterprisetech.com/2014/02/11/netflix-speeds-machine-learning-amazon-gpus/
Regression: http://www.digplanet.com/wiki/Linear_regression
Handwritten Letters: http://yann.lecun.com/exdb/mnist/
Overfitting: http://pingax.com/regularization-implementation-r/
Naïve Bayes Formulas: http://de.wikipedia.org/wiki/Bayes-Klassifikator
Support Vector Machine: http://de.wikipedia.org/wiki/Support_Vector_Machine
Decision Tree: http://en.wikipedia.org/wiki/Decision_tree_learning
Weka Logo: http://www.cs.waikato.ac.nz/ml/weka/
Weka Screenshot: http://commons.wikimedia.org/wiki/File:Weka-3.5.5.png
Empatica: https://www.empatica.com/products.php
Sensecore: https://www.senseyourcore.com
Conversation: http://www.ravishly.com/sites/default/files/field/image/ThinkstockPhotos-122554224.jpg
Blink Light: http://cdn.shopify.com/s/files/1/0543/2969/products/HAD000031_-_blink_Product_4_c8a69a5a-131c-4ce3-be1e-d84b2c3b2d0c.jpg?v=1448393350
Open Windows: https://u.osu.edu/5226sp15/files/2015/02/27a76377-ce9b-4837-8e4a-dd283f1ecaf1_0-1pfyu10.jpg
Meetings: http://www.toronto.ca/legdocs/news/assets/images/2012-calendar.jpg
Github History: http://4.bp.blogspot.com/_jUrEaqvFttU/TKcCHi-QXHI/AAAAAAAAA8M/OY8Shjfl23s/s1600/bikesoup-history.png