Summary
Uploaded by chanpreet-singh
Technique: Ordinary linear regression
Application: Predicting a numeric quantity. The independent variables can be numeric as well as categorical.
Alternative: Support vector regression and neural network
Assumption: Normality, independence (iid) and homoscedasticity of the residuals. If an assumption is violated, the Box-Cox method can be used to transform the variable.
Limitations: The response should depend linearly on the predictors; otherwise SVM or a neural network can be used. Multicollinearity among the independent variables needs to be checked.
Accuracy measure: R-square and adjusted R-square can be used
Key terms: Ordinary linear regression, residuals, transformation of variables, multicollinearity, outliers and influential points, Box-Cox, normal distribution, PCA
Sources to study: NPTEL notes by Prof. Shalabh from IIT Kanpur (theory); Regression Models course on Coursera by Johns Hopkins University (implementation in R)
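The two accuracy measures named for linear regression can be computed by hand for a single predictor. The following is a minimal Python sketch, not the method from the cited courses (which use R); the function names and the data values are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y, n_predictors=1):
    """Return (R-square, adjusted R-square) for the fitted line."""
    intercept, slope = fit_simple_ols(x, y)
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-square penalizes the number of predictors in the model.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj_r2


# Hypothetical data, roughly y = 2x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2, adj_r2 = r_squared(x, y)
print(r2, adj_r2)
```

Adjusted R-square is never larger than R-square, which is why it is preferred when comparing models with different numbers of predictors.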
Technique: Logistic regression
Application: Predicting a categorical variable. The independent variables can be numeric as well as categorical.
Alternative: Support vector machine, neural network, decision tree, naïve Bayes
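Assumption: Independent (iid) observations and a linear relationship between the predictors and the log-odds of the outcome. Unlike linear regression, normality and homoscedasticity of the residuals are not required. If linearity is violated, the predictors can be transformed.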
Limitations: The log-odds should depend linearly on the predictors; otherwise SVM or a neural network can be used. Multicollinearity among the independent variables needs to be checked.
Accuracy measure: Confusion matrix can be used when the model is used as a classifier
Key terms: Logistic regression as a classifier, residuals, transformation of variables, multicollinearity, outliers and influential points, Box-Cox, normal distribution, PCA
Sources to study: NPTEL notes by Prof. Shalabh from IIT Kanpur (theory); Regression Models course on Coursera by Johns Hopkins University (implementation in R)
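When logistic regression is used as a classifier, its predicted probabilities are cut at a threshold and compared with the true labels in a confusion matrix. A minimal Python sketch follows; the probabilities, the 0.5 threshold and the function names are assumptions for illustration, not output of a fitted model.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(cm):
    """Share of all cases that were classified correctly."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Hypothetical true labels and predicted probabilities of class 1
actual = [1, 0, 1, 1, 0, 0, 1, 0]
probabilities = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]

cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))
```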
Technique: Poisson regression
Application: Predicting the number of events in a given time period (an event can be, for example, the failure of a machine)
Alternative: -
Assumption: The probability of an event occurring should be very low
Limitations: Cannot predict when the event will occur
Accuracy measure: AIC, BIC
Key terms: Generalized linear models, Poisson distribution, link function
Sources to study: NPTEL notes by Prof. Shalabh from IIT Kanpur (theory); Regression Models course on Coursera by Johns Hopkins University (implementation in R)
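The key terms "Poisson distribution" and "link function" fit together as follows: with the usual log link, the expected count is the exponential of the linear predictor, and the Poisson distribution gives the probability of each count. A minimal Python sketch, in which the coefficients are hypothetical and not estimated from data:

```python
import math


def expected_count(intercept, slope, x):
    """Log link: log(mean) = intercept + slope * x, so mean = exp(...)."""
    return math.exp(intercept + slope * x)


def poisson_pmf(k, rate):
    """Probability of observing exactly k events when the mean is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


# Hypothetical coefficients: failures per month as a function of machine age
rate = expected_count(intercept=-1.0, slope=0.2, x=5)  # exp(0) = 1 failure
for k in range(4):
    print(k, poisson_pmf(k, rate))
```

The model yields the distribution of the number of failures in the period, not the time at which a failure happens, which matches the limitation stated above.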
Technique: Analysis of variance (ANOVA)
Application: To test whether a categorical independent variable has a significant effect on a numeric output.
Alternative: Neural network
Assumption: Normality, independence (iid) and homoscedasticity within the groups. The Kruskal-Wallis test can be used if the assumptions are violated.
Limitations: The independent variable should only be categorical.
Accuracy measure: F statistic
Key terms: Design of experiments, levels in an experiment, Tukey HSD
Sources to study: Kutner, Applied Linear Statistical Models (book); R code from http://mgmt.iisc.ernet.in/CM/MG221/Handouts.html
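The F statistic for one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. A minimal Python sketch with made-up group data (the function name is hypothetical):

```python
def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical output measured at three levels of one factor
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_f(groups)
print(f_value)
```

A large F value indicates that the group means differ by more than the within-group noise would explain; the p-value comes from the F distribution with (k - 1, n - k) degrees of freedom.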
Technique: Association rules and association sequence
Application: Used in the retail industry to find associations between products.
Alternative: Decision tree, naïve Bayes
Assumption: No assumptions
Limitations: All variables should be categorical. If not, they have to be divided into categories.
Accuracy measure: Support, confidence and lift
Key terms: Association sequence, support, confidence, lift, market basket analysis
Sources to study: http://www-users.cs.umn.edu/~kumar/dmbook/index.php (theory); R code is provided during training
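Support, confidence and lift can be computed directly from a list of transactions. A minimal Python sketch on a made-up set of baskets (the item names and function names are hypothetical):

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to how common the consequent is on its own."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
print(lift(baskets, {"bread"}, {"milk"}))
```

A lift above 1 suggests the two products are bought together more often than chance would predict; a lift below 1, as in this toy data, suggests the opposite.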
Technique: Clustering
Application: Used to group similar data points together.
Alternative: -
Assumption: No assumptions
Limitations: If the dataset is large, hierarchical clustering cannot be used and finding the number of clusters is difficult. Due to redundant variables, clear clusters are not visible in large data.
Accuracy measure: Tuning parameter and within-cluster variation
Key terms: K-means, k-medoids, hierarchical, sparse clustering, PCA, feature selection, knee point, sum of squares
Sources to study: Machine Learning course by Stanford University, Prof. Andrew Ng; http://www-users.cs.umn.edu/~kumar/dmbook/index.php
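The "knee point" among the key terms refers to plotting the within-cluster sum of squares against the number of clusters and picking the value where the curve flattens. A minimal one-dimensional k-means sketch in Python shows the idea; the deterministic choice of starting centres and the data values are assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=20):
    """Run k-means on 1-D data; return (centres, within-cluster sum of squares)."""
    data = sorted(values)
    # Deterministic starting centres spread evenly over the sorted data
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: (v - centres[j]) ** 2)
            clusters[nearest].append(v)
        # Update step: each centre moves to the mean of its cluster
        centres = [
            sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)
        ]
    wss = sum((v - centres[j]) ** 2 for j, c in enumerate(clusters) for v in c)
    return centres, wss


# Hypothetical data with three visible groups
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
wss_by_k = {k: kmeans_1d(values, k)[1] for k in (1, 2, 3, 4)}
print(wss_by_k)
```

In this toy data the sum of squares falls sharply up to k = 3 and barely changes afterwards, so the knee is at three clusters.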
Technique: Decision tree (classifier)
Application: Classifier
Alternative: Naïve Bayes, logistic regression, SVM, neural network
Assumption: -
Limitations: Each split uses a single variable, so classes are separated well only when the boundaries align with the variables. Interpretation is difficult when the tree is big. Splits are categorical, so numeric data is divided into categories by the decision tree, with some loss of information.
Accuracy measure: Confusion matrix, accuracy, KS statistic, area under the ROC curve
Key terms: Entropy, split info, gain ratio
Sources to study: http://www-users.cs.umn.edu/~kumar/dmbook/index.php (theory)
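The three key terms for decision trees are connected: the information gain of a split is the drop in entropy, and the gain ratio divides that gain by the split info (the entropy of the split itself). A minimal Python sketch with made-up data; it assumes the feature takes at least two distinct values, since otherwise the split info is zero.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Return (information gain, split info, gain ratio) for one feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy remaining after the split, weighted by branch size
    remaining = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remaining
    split_info = entropy(feature_values)
    return gain, split_info, gain / split_info


# Hypothetical data: does the outlook predict whether a game is played?
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
print(gain_ratio(outlook, play))
```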
Technique: Naïve Bayes (classifier)
Application: Classifier
Alternative: Decision tree, logistic regression, SVM, neural network
Assumption: All variables must be categorical
Limitations: The data should be linearly separable. All variables must be independent of each other given the class. Only categorical variables can be used.
Accuracy measure: Confusion matrix, accuracy, KS statistic, area under the ROC curve
Key terms: Bayes theorem
Sources to study: http://www-users.cs.umn.edu/~kumar/dmbook/index.php (theory)
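Naïve Bayes applies Bayes theorem under the independence assumption: the class prior is multiplied by the per-variable conditional probabilities and the result is normalized. A minimal Python sketch for categorical variables follows; the data and function names are hypothetical, and no smoothing is applied, so it assumes every value in the row to be classified was seen with at least one class.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count classes and, per (variable index, class), the observed values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def predict(model, row):
    """Return the posterior probability of each class for one row."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            score *= value_counts[(i, label)][value] / count  # P(value | class)
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Hypothetical mails described by (first word, contains a link?)
rows = [("offer", "link"), ("offer", "nolink"), ("hello", "nolink"),
        ("hello", "link"), ("hello", "nolink"), ("offer", "nolink")]
labels = ["spam", "spam", "ham", "ham", "ham", "ham"]
model = train(rows, labels)
print(predict(model, ("offer", "link")))
```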
Technique: SVM and neural network (classifier)
Application: Classifier (non-linear)
Alternative: -
Assumption: -
Limitations: Interpretation is very difficult. Training is computationally expensive and time consuming.
Accuracy measure: Confusion matrix, accuracy, KS statistic, area under the ROC curve
Key terms: Kernel trick, hidden layers, non-linear classifier, pattern recognition
Sources to study: Machine Learning course by Stanford University, Prof. Andrew Ng; http://www-users.cs.umn.edu/~kumar/dmbook/index.php
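The "kernel trick" listed among the key terms means that an SVM never has to build the high-dimensional features explicitly: a kernel function returns the same dot product directly from the original inputs. A minimal Python sketch for the degree-2 polynomial kernel in two dimensions (the function names and input values are chosen for illustration):

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x, y = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, y))                    # kernel in the input space
print(dot(feature_map(x), feature_map(y)))  # same value via explicit features
```

Both lines give the same number up to rounding, which is why a linear separator in the mapped space corresponds to a non-linear boundary in the original space.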
Technique: Principal component analysis (PCA)
Application: Dimensionality reduction for clustering as well as regression
Alternative: Feature selection
Assumption: -
Limitations: Interpretation is difficult.
Accuracy measure: -
Key terms: PCA, multicollinearity, dimensionality reduction
Sources to study: Machine Learning course by Stanford University, Prof. Andrew Ng; http://www-users.cs.umn.edu/~kumar/dmbook/index.php
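For two variables, the principal components can be obtained in closed form from the covariance matrix, which makes the idea of dimensionality reduction easy to see. A minimal Python sketch follows; the function name and data are hypothetical, and it assumes the points are not all identical.

```python
import math


def pca_2d(points):
    """Return (first principal direction, share of variance it explains)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]]
    centre = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    largest, smallest = centre + spread, centre - spread
    # Eigenvector that belongs to the largest eigenvalue
    if cov_xy != 0:
        direction = (largest - var_y, cov_xy)
    else:
        direction = (1.0, 0.0) if var_x >= var_y else (0.0, 1.0)
    length = math.hypot(direction[0], direction[1])
    direction = (direction[0] / length, direction[1] / length)
    return direction, largest / (largest + smallest)


# Hypothetical, strongly correlated pair of variables
points = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction, explained = pca_2d(points)
print(direction, explained)
```

When one component explains almost all of the variance, the two correlated variables can be replaced by that single component, which is also how PCA helps with multicollinearity.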
Technique: Time series
Application: Autoregression. Used only for time series data.
Alternative: Markov model if sudden jumps are present
Assumption: Stationarity, and the regression assumptions for the residuals
Limitations: Cyclic components and sudden jumps need to be taken into account
Accuracy measure: AIC and BIC
Key terms: ARIMA, forecasting
Sources to study: NPTEL videos by an IISc professor from the civil engineering department; website otext.com
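Autoregression means regressing the series on its own past values. The simplest case, an AR(1) model, can be fitted by least squares on the mean-centred series. A minimal Python sketch, assuming a stationary series; the function names and data are made up for illustration, and a full ARIMA fit would normally be done with a statistics package.

```python
def fit_ar1(series):
    """Return (mean, phi) for the model x_t - mean = phi * (x_{t-1} - mean) + noise."""
    mean = sum(series) / len(series)
    centred = [v - mean for v in series]
    # Least-squares slope of each value on the value before it
    numerator = sum(centred[t] * centred[t - 1] for t in range(1, len(centred)))
    denominator = sum(v * v for v in centred[:-1])
    return mean, numerator / denominator


def forecast_next(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    mean, phi = fit_ar1(series)
    return mean + phi * (series[-1] - mean)


# Hypothetical series that alternates around zero
series = [1.0, -1.0, 1.0, -1.0]
print(fit_ar1(series), forecast_next(series))
```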
Markov models: for predicting sudden jumps in stock market data or in time series data from other domains.
Survival analysis: for predicting when a machine is going to fail. Sometimes the data is censored, for example failure data in which some machines had not yet failed when the data was collected.
Data envelopment analysis: to measure the performance difference between various units or teams based on multiple factors.
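The censoring mentioned under survival analysis is commonly handled with the Kaplan-Meier estimator, which keeps censored machines in the risk set until the time they were last observed. The source does not name this estimator, so the following Python sketch is an illustrative choice; the durations and failure flags are made up.

```python
def kaplan_meier(durations, failed):
    """Return [(time, survival probability)] at each observed failure time.

    `failed[i]` is True if machine i failed at `durations[i]` and False if
    it was still running (censored) when observation stopped.
    """
    records = sorted(zip(durations, failed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        time = records[i][0]
        same_time = [f for d, f in records if d == time]
        failures = sum(same_time)
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((time, survival))
        # Failed and censored machines both leave the risk set after this time
        at_risk -= len(same_time)
        i += len(same_time)
    return curve


# Hypothetical months in service; False marks machines that had not failed yet
durations = [2, 3, 3, 5, 8]
failed = [True, True, False, True, False]
curve = kaplan_meier(durations, failed)
print(curve)
```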
Further readings suggested
• Transformation of variables in linear modeling (Box-Cox method)
• What measures to take when assumptions are violated (Box-Cox)
• Adding non-linear terms to your model
• Association sequence rules
• Sparse clustering for feature selection in clustering (a special method for clustering)
• Naïve Bayes classifier (used in text analytics to classify tweets, mails, documents, etc.)
• K-NN classifier, which is usually used when both clustering and classification are required
• How to include interaction terms to improve model performance
• Poisson regression for predicting the failure of machines
• Generalized linear regression modelling
• Linear discriminant analysis
• Boosting, bagging and other methods for improving classifiers
• Random forests for classification
• Topic modelling
• Sentiment analysis
• Support vector regression for non-linear regression
Thank You