Class 4 – More Classifiers
Transcript of Class 4 – More Classifiers
Ramoza Ahsan, Yun Lu, Dongyun Zhang, Zhongfang Zhuang, Xiao Qin, Salah Uddin Ahmed
Lesson 4.1: Classification Boundaries
Classification Boundaries
• Visualizing the data during the training stage of building a classifier can provide guidance in parameter selection
• Weka provides a boundary visualization tool
• It works on 2-dimensional data sets
Boundary Representation With OneR
• The color diagram shows the decision boundaries together with the training data
• It is a spatial representation of the decision boundary of the OneR algorithm
Boundary Representation With IBk
• Lazy classifier (instance-based learner)
• Chooses the nearest instance(s) to classify
• Piecewise linear boundary
• Increasing k blurs the boundaries
Boundary Representation With Naïve Bayes
• Naïve Bayes treats each of the two attributes as contributing equally and independently to the decision
• Multiplying the probabilities along the two dimensions yields a checkerboard pattern of probabilities
Boundary Representation With J48
• Increasing the minNumObj parameter (the minimum number of instances per leaf) results in a simpler tree
Classification Boundaries
• Different classifiers have different capabilities for carving up instance space (their "bias")
• Usefulness:
• An important visualization tool
• Provides insight into how an algorithm works on the data (see the sketch below)
• Limitations:
• Restricted to numeric attributes and 2-dimensional plots
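To make the idea concrete, here is a minimal sketch of how such a boundary plot can be produced outside Weka (scikit-learn and matplotlib on a synthetic 2-D dataset; IBk's role is played by a 1-nearest-neighbor classifier): classify every point of a fine grid over the attribute space and color the resulting regions.

```python
# Minimal decision-boundary plot: color each grid cell by the class the
# classifier predicts there, then overlay the training data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=1)
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)  # akin to IBk with k=1

xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)        # colored decision regions
plt.scatter(X[:, 0], X[:, 1], c=y, s=15)  # training data on top
plt.xlabel("attribute 1")
plt.ylabel("attribute 2")
plt.show()
```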
Lesson 4.2: Linear Regression
What Is Linear Regression?
• In statistics, linear regression is an approach to modeling the relationship between a dependent variable y and one or more explanatory variables denoted X.
• Straight-line regression analysis: one explanatory variable.
• Multiple linear regression: more than one explanatory variable.
• In data mining, we use this method to make predictions for numeric classes based on numeric attributes. Nominal attributes can first be converted to numeric ones with the NominalToBinary filter.
Why Linear Regression?
• A regression models the past relationship between variables in order to predict their future behavior.
• Businesses use regression to predict things such as future sales, stock prices, currency exchange rates, and the productivity gains resulting from a training program.
• Example: a person's salary is related to years of experience. The dependent variable here is salary, and the explanatory variable (also called the independent variable) is experience.
Mathematics Of Simple Linear Regression
• The simplest form of regression function is:

$$y = b + wx$$

where $y$ is the dependent variable, $x$ is the explanatory variable, and $b$ and $w$ are regression coefficients. Thinking of the regression coefficients as weights, we can equivalently write:

$$y = w_0 + w_1 x$$

where the weights are obtained by the method of least squares:

$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$$
Previous Example: Salary Dataset
• From the given dataset we get $\bar{x} = 9.1$ and $\bar{y} = 55.4$, and hence $w_1 \approx 3.5$ and $w_0 = 55.4 - 3.5 \times 9.1 \approx 23.6$.
• Thus we get the fitted line $y = 23.6 + 3.5x$.
• Using it, we can predict that a person with 10 years of experience will earn a salary of about $58,600 per year.
| X: Years of Experience | Y: Salary (in $1000s) |
|---:|---:|
| 3 | 30 |
| 8 | 57 |
| 9 | 64 |
| 13 | 72 |
| 3 | 36 |
| 6 | 43 |
| 11 | 59 |
| 21 | 90 |
| 1 | 20 |
| 16 | 83 |
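The coefficients above can be checked with a few lines of NumPy (an illustration using the least-squares formulas from the previous slide; note the slide rounds $w_1$ to 3.5 before computing $w_0 \approx 23.6$):

```python
# Least-squares fit of the salary data; reproduces w1 ≈ 3.5 and a
# prediction of about 58.6 (i.e., $58,600) for x = 10.
import numpy as np

x = np.array([3, 8, 9, 13, 3, 6, 11, 21, 1, 16])        # years of experience
y = np.array([30, 57, 64, 72, 36, 43, 59, 90, 20, 83])  # salary in $1000s

w1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
w0 = y.mean() - w1 * x.mean()
print(w1, w0)        # ~3.54 and ~23.2 at full precision
print(w0 + w1 * 10)  # ~58.6
```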
Run This Dataset On Weka
[Four slides of Weka screenshots showing this dataset being loaded and run through a linear regression classifier.]
Multiple Linear Regression
• Multiple linear regression extends straight-line regression to involve more than one predictor variable. It allows the dependent variable $y$ to be modeled as a linear function of $n$ predictor variables described by the tuple $(x_1, x_2, \dots, x_n)$:

$$y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$

• The weights are then adjusted to minimize the squared error on the training data:

$$\sum_{i=1}^{|D|} \big(y_i - (w_0 + w_1 x_{i1} + \dots + w_n x_{in})\big)^2$$

• This system is hard to solve by hand, so we need a tool like Weka to do it (a NumPy illustration follows below).
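As an illustration (synthetic data and made-up coefficients, standing in for what Weka's LinearRegression computes), NumPy's least-squares solver recovers the weights directly:

```python
# Multiple linear regression via least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # 50 instances, 3 predictor variables
y = 4.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

A = np.column_stack([np.ones(len(X)), X])  # prepend a column of 1s for w0
w, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes the squared error
print(w)  # close to [4.0, 2.0, -1.0, 0.5]
```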
Non-linear Regression
• Often there is no linear relationship between the dependent variable (the class attribute) and the explanatory variables.
• Such a relationship can often be approximated by a patchwork of several linear regression models.
• In Weka, the "model tree" method M5P does this. A model tree is a tree in which each leaf holds a linear regression model; the coefficients of each leaf's linear function are calculated, and a prediction is made by routing an instance to a leaf and applying that leaf's model. The patchwork idea is sketched below.
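Here is the patchwork idea in miniature (a hand-rolled sketch, not Weka's M5P): split the input range at a chosen point and fit one least-squares line per segment, as a model tree does at its leaves.

```python
# Two linear models stitched together approximate a non-linear relationship.
import numpy as np

x = np.linspace(0, 10, 200)
y = np.where(x < 5, 2 * x, 7.5 + 0.5 * x)  # piecewise-linear target

def fit_line(xs, ys):
    # Ordinary least squares for one segment; returns (w0, w1).
    w1 = ((xs - xs.mean()) * (ys - ys.mean())).sum() / ((xs - xs.mean()) ** 2).sum()
    return ys.mean() - w1 * xs.mean(), w1

for name, mask in [("x < 5", x < 5), ("x >= 5", x >= 5)]:  # the tree's split
    w0, w1 = fit_line(x[mask], y[mask])
    print(f"leaf {name}: y = {w0:.2f} + {w1:.2f} x")
```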
Lesson 4.3: Classification By Regression
Review: Linear Regression
• Several numeric attributes: $a_1, a_2, \dots, a_k$
• Weights for each attribute, plus a constant: $w_0, w_1, \dots, w_k$
• Weighted sum of the attributes: $\hat{y} = w_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k$
• Minimize the squared error: $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Using Regression In Classification
• Convert the class values to numeric values (usually binary)
• Decide the class according to the regression result
• Note: the regression output is NOT a probability!
• Set a threshold on the output to separate the classes
2-Class Problems
• Assign binary values (e.g., 0 and 1) to the two classes
• Training: linear regression on the binary-coded class values
• Output prediction: threshold the regression output (see the sketch below)
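A minimal sketch of the scheme (scikit-learn stands in for Weka here, on a synthetic two-class dataset):

```python
# Classification by regression: code the classes as 0/1, fit ordinary
# linear regression, then threshold the raw output at 0.5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=1)

reg = LinearRegression().fit(X, y)  # y is already coded 0/1
scores = reg.predict(X)             # raw outputs: NOT probabilities;
                                    # they can fall below 0 or above 1
pred = (scores >= 0.5).astype(int)  # threshold recovers class labels
print("training accuracy:", (pred == y).mean())
```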
Multi-Class Problems
• Multi-response linear regression divides an n-class problem into n regression problems
• Build a different model for each problem, regressing to 1 for instances of that class and 0 for all others
• Select the class whose model gives the largest output (see the sketch below)
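A sketch of multi-response linear regression on the three-class Iris data (scikit-learn used for illustration):

```python
# One 0/1 regression problem per class; predict the class whose model
# gives the largest output.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Build one membership-regression model per class (one vs. rest).
models = [LinearRegression().fit(X, (y == c).astype(float)) for c in classes]

scores = np.column_stack([m.predict(X) for m in models])
pred = classes[scores.argmax(axis=1)]  # largest output wins
print("training accuracy:", (pred == y).mean())
```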
More Investigations
• Cool stuff: this leads to the foundation of logistic regression
• Convert the class value to binary
• Add the linear regression result as a new attribute
• Detect the split using OneR
Lesson 4.4: Logistic Regression
Logistic Regression
• In linear regression, we calculate the weights from the training data by minimizing the squared error

$$\sum_{i=1}^{n} \Big(y_i - \big(w_0 + w_1 a_1^{(i)} + \dots + w_k a_k^{(i)}\big)\Big)^2$$

• In logistic regression, we instead choose the weights to maximize the log-likelihood

$$\sum_{i=1}^{n} \Big[(1 - y_i)\,\log\big(1 - \Pr[1 \mid \mathbf{a}^{(i)}]\big) + y_i \log \Pr[1 \mid \mathbf{a}^{(i)}]\Big]$$

which estimates the class probabilities directly.
Classification
• Email: spam / not spam?
• Online transactions: fraudulent (yes / no)?
• Tumor: malignant / benign?

y = 0: "negative class" (e.g., benign tumor)
y = 1: "positive class" (e.g., malignant tumor)

(The slides in this lesson are adapted from Coursera – Machine Learning – Prof. Andrew Ng, Stanford University.)
[Figure: tumor size on the x-axis vs. malignant? (No) 0 / (Yes) 1 on the y-axis, with the fitted hypothesis $h_w(x)$ drawn through the data.]

Threshold the classifier output $h_w(x)$ at 0.5:
• If $h_w(x) \ge 0.5$, predict $y = 1$
• If $h_w(x) < 0.5$, predict $y = 0$
[Figure: the same tumor-size plot with additional training data; the refitted line shifts the point where $h_w(x)$ crosses 0.5, so the same thresholding rule now draws the class boundary in a different place.]
Classification: $y = 0$ or $1$
• In linear regression, $h_w(x)$ can be $> 1$ or $< 0$
• In logistic regression, $0 \le h_w(x) \le 1$ by construction
Logistic Regression Model
• We want $0 \le h_w(x) \le 1$
• The sigmoid (logistic) function achieves this:

$$h_w(x) = \frac{1}{1 + e^{-(w_0 + w_1 x)}}$$
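A tiny sketch of the sigmoid in action (the weights are hypothetical, chosen only to show the shape of the function):

```python
# The sigmoid squashes any weighted sum into (0, 1), so its output can be
# read as a class probability.
import numpy as np

def h(x, w0, w1):
    """Logistic hypothesis h_w(x) = 1 / (1 + exp(-(w0 + w1 * x)))."""
    return 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

w0, w1 = -4.0, 1.0  # hypothetical weights for illustration
for x in [0, 2, 4, 6, 8]:
    print(x, round(h(x, w0, w1), 3))  # rises from ~0.018 through 0.5 to ~0.982
```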
Interpretation Of Hypothesis Output
• $h_w(x)$ = the estimated probability that $y = 1$ on input $x$
• Example: if $h_w(x) = 0.7$, tell the patient there is a 70% chance of the tumor being malignant ($y = 1$)
• Formally, $h_w(x) = P(y = 1 \mid x; w)$: "the probability that $y = 1$, given $x$, parameterized by $w$"
Lesson 4.5: Support Vector Machine
Things About SVM
• Works well on small datasets and on data that is not linearly separable
• Maps data from low dimensions into higher dimensions, where it may become separable
• Support vectors
• Maximum Marginal Hyperplane (MMH)
Overview
• The support vectors are the most difficult tuples to classify and give the most information regarding classification.
SVM searches for the hyperplane with the largest margin, that is, the Maximum Marginal Hyperplane (MMH).
SVM Demo
• CMsoft SVM Demo Tool
• Question(s)
More
• Very resilient to overfitting: the boundary depends on only a few points (the support vectors), and a regularization parameter controls the fit
• Weka: functions>SMO. Restricted to two classes, so use multi-response linear regression … or pairwise linear regression
• Weka: functions>LibSVM. An external library for support vector machines; faster than SMO, with more sophisticated options
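A small illustration of the maximum-margin idea (scikit-learn's SVC standing in for Weka's SMO, on a synthetic two-class dataset):

```python
# A linear SVM picks the separating hyperplane with the largest margin;
# only the support vectors determine where that boundary lies.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=6)

clf = SVC(kernel="linear", C=1000).fit(X, y)  # large C ~ hard margin
print("support vectors per class:", clf.n_support_)
print(clf.support_vectors_)  # the few points the boundary depends on
```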
Lesson 4.6: Ensemble Learning
Ensemble Learning
• Take the training data
• Derive several different training sets from it
• Learn a model from each training set
• Combine the models to produce an ensemble of learned models
• In brief, instead of depending on a single classifier, we take a vote among several classifiers to reach a verdict.
Ensemble Learning
We will discuss four types of ensemble methods:
• Bagging
• Randomization (random forests)
• Boosting
• Stacking
Ensemble Learning -- Bagging
• Several training datasets of the same size are chosen at random from the problem domain, or produced by sampling the original data with replacement
• A particular machine learning technique is used to build a model from each dataset
• For each new test instance we get a prediction from each model
• The class with the largest support (vote) across the models becomes the resulting class (see the sketch below)
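A bagging sketch (scikit-learn's BaggingClassifier over decision trees, standing in for Weka's meta>Bagging; the dataset ships with scikit-learn):

```python
# Bagging: bootstrap-sampled training sets, one tree per sample, and a
# majority vote over the trees' predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        random_state=1)
print(cross_val_score(bag, X, y, cv=10).mean())  # 10-fold CV accuracy
```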
Ensemble Learning -- Bagging In Weka
[Three slides of Weka screenshots demonstrating the Bagging meta-classifier.]
Ensemble Learning -- Randomization
• One training dataset
• Randomize the algorithm's internal choices to build several models from the same dataset
• For each new test instance we get a prediction from each model
• Very similar to bagging: the class with the largest support (vote) across the models becomes the resulting class
• Example: random forest builds randomized decision trees: instead of choosing the best attribute at each node, it randomly picks from the k best options (see the sketch below)
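A random-forest sketch (scikit-learn's RandomForestClassifier as a stand-in for Weka's trees>RandomForest):

```python
# Random forest: bagged trees whose node splits consider only a random
# subset of the attributes.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=1)  # "sqrt": per-split randomization
print(cross_val_score(rf, X, y, cv=10).mean())
```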
Ensemble Learning -- Random Forest In Weka
[Three slides of Weka screenshots demonstrating the RandomForest classifier.]
Ensemble Learning -- Boosting
• One training dataset
• A particular classifier is applied iteratively to produce several models
• The output of one iteration becomes the input of the next
• Extra weight is assigned to misclassified instances, encouraging the next iteration to classify them correctly
• For a test instance, the class with the largest vote across the models becomes the resulting class
• Example: AdaBoostM1 with C4.5 or J48 (see the sketch below)
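A boosting sketch (scikit-learn's AdaBoostClassifier playing the role of Weka's AdaBoostM1, with decision stumps as the base classifier):

```python
# AdaBoost: reweight misclassified instances after each round so the next
# model concentrates on them; combine the models by weighted vote.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                           n_estimators=50, random_state=1)
print(cross_val_score(boost, X, y, cv=10).mean())
```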
Ensemble Learning -- Boosting In Weka
[Three slides of Weka screenshots demonstrating the AdaBoostM1 meta-classifier.]
Ensemble Learning -- Stacking
• One training dataset is the input to several base learners, or level-0 learners
• The base learners are different classifiers
• The output predictions of the base learners become the input of a meta learner, or level-1 learner
• An instance is first fed into the level-0 models; their guesses are then fed into the level-1 model, which combines them into the final prediction
• Example: StackingC with LinearRegression (see the sketch below)
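A stacking sketch (scikit-learn's StackingClassifier standing in for Weka's StackingC; a logistic-regression meta learner is used here, since scikit-learn expects a classifier at level 1):

```python
# Stacking: heterogeneous level-0 classifiers; their predictions become
# the inputs of a level-1 learner that produces the final decision.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=1)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=5000))  # level-1 learner
print(cross_val_score(stack, X, y, cv=10).mean())
```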
Ensemble Learning -- Stacking In Weka
[Four slides of Weka screenshots demonstrating the Stacking meta-classifier.]
Ensemble Learning
• Usefulness:
• Diversity helps, especially with "unstable" learners whose models change markedly with small changes in the training data
• Disadvantages:
• Hard to analyze: it is not easy to understand which factors are contributing to the improved decisions