Data Science and Big Data Analytics Chap 6: Adv Analytical Theory and Methods: Regression Charles...
![Page 1: Data Science and Big Data Analytics Chap 6: Adv Analytical Theory and Methods: Regression Charles Tappert Seidenberg School of CSIS, Pace University.](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/1.jpg)
Data Science and Big Data Analytics
Chap 6: Adv Analytical Theory and Methods: Regression
Charles Tappert, Seidenberg School of CSIS, Pace University
![Page 2](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/2.jpg)
Chapter Sections
- 6.1 Linear Regression
- 6.2 Logistic Regression
- 6.3 Reasons to Choose and Cautions
- 6.4 Additional Regression Models
- Summary
![Page 3](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/3.jpg)
6 Regression
Regression analysis attempts to explain the influence that input (independent) variables have on the outcome (dependent) variable.
Questions regression might answer:
- What is a person's expected income?
- What is the probability that an applicant will default on a loan?
Regression can find the input variables having the greatest statistical influence on the outcome. One can then try to produce better values of those input variables. E.g., if the reading level at age 10 predicts students' later success, then try to improve reading levels at an early age.
![Page 4](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/4.jpg)
6.1 Linear Regression
Models the relationship between several input variables and a continuous outcome variable.
The assumption is that the relationship is linear; various transformations can be used to achieve a linear relationship.
Linear regression models are probabilistic: they involve randomness and uncertainty, unlike deterministic relationships such as Ohm's Law (V = IR).
![Page 5](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/5.jpg)
6.1.1 Use Cases
- Real estate example: predict residential home prices. Possible inputs: living area, number of bathrooms, number of bedrooms, lot size, property taxes.
- Demand forecasting example: a restaurant predicts the quantity of food needed. Possible inputs: weather, day of week, etc.
- Medical example: analyze the effect of a proposed radiation treatment. Possible inputs: radiation treatment duration, frequency.
![Page 6](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/6.jpg)
6.1.2 Model Description
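The model on this slide appears only in the slide image. Written out in the usual notation (our reconstruction, assuming the standard form with p − 1 input variables), the linear regression model is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{p-1} x_{p-1} + \varepsilon$$

where y is the outcome variable, x_1, ..., x_{p-1} are the input variables, β_0 is the intercept, β_1, ..., β_{p-1} are the unknown parameters to be estimated, and ε is a random error term.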
![Page 7](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/7.jpg)
6.1.2 Model Description: Example
Predict a person's annual income as a function of age and education.
Ordinary Least Squares (OLS) is a common technique to estimate the parameters.
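For the income example, a sketch of the model and of the quantity OLS minimizes, in standard notation and assuming n observations (Age_i, Education_i, Income_i):

$$\text{Income} = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Education} + \varepsilon$$

$$\min_{\beta_0,\,\beta_1,\,\beta_2}\; \sum_{i=1}^{n} \left[\text{Income}_i - \left(\beta_0 + \beta_1\,\text{Age}_i + \beta_2\,\text{Education}_i\right)\right]^2$$

That is, OLS picks the parameter values that make the sum of squared residuals as small as possible.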
![Page 8](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/8.jpg)
6.1.2 Model Description: Example
OLS
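In matrix form, the OLS estimates solve the normal equations XᵀX b = Xᵀy. A minimal sketch on simulated income-like data (the ages, education levels, and "true" coefficients are hypothetical, not the book's income.csv) solves them directly and compares the result with lm():

```r
# Simulated (hypothetical) data; coefficients chosen only for illustration
set.seed(42)
n <- 200
age    <- runif(n, 18, 70)
educ   <- sample(10:20, n, replace = TRUE)
income <- 7 + 1.0 * age + 1.8 * educ + rnorm(n, sd = 10)

# Design matrix with a column of 1s for the intercept
X <- cbind(Intercept = 1, age, educ)

# OLS: solve the normal equations (X'X) b = X'y
b_ols <- solve(crossprod(X), crossprod(X, income))

# lm() computes the same OLS estimates
fit <- lm(income ~ age + educ)
print(cbind(normal_eq = drop(b_ols), lm = coef(fit)))
```

lm() itself uses a QR decomposition rather than forming XᵀX explicitly, which is numerically more stable; on well-conditioned data like this the two agree.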
![Page 9](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/9.jpg)
6.1.2 Model Description: Example
![Page 10](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/10.jpg)
6.1.2 Model Description: With Normally Distributed Errors
Making additional assumptions about the error term provides further capabilities.
It is common to assume the error term is a normally distributed random variable with mean zero and constant variance. That is,
$$\varepsilon \sim N(0, \sigma^2)$$
![Page 11](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/11.jpg)
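6.1.2 Model Description: With Normally Distributed Errors
With this assumption, and treating the input values as fixed, the expected value of y is
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{p-1} x_{p-1}$$
and the variance is
$$V(y) = \sigma^2$$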
![Page 12](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/12.jpg)
6.1.2 Model Description: With Normally Distributed Errors
Normality assumption with one input variable:
E.g., for x = 8, E(y) ≈ 20, but y varies roughly between 15 and 25.
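A quick simulation of this picture, assuming hypothetical values b0 = 4, b1 = 2, and σ = 2.5 (chosen so that E(y) = 20 at x = 8 and about 95% of outcomes fall within ±2σ, i.e., between 15 and 25):

```r
# Hypothetical one-input model: y = b0 + b1 * x + e, with e ~ N(0, sigma^2)
b0    <- 4
b1    <- 2
sigma <- 2.5

set.seed(1)
x <- 8
# Many draws of y at the same input value x = 8
y <- b0 + b1 * x + rnorm(10000, mean = 0, sd = sigma)

mean(y)                  # close to E(y) = 20
mean(y >= 15 & y <= 25)  # close to 0.95 (within +/- 2 sigma)
```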
![Page 13](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/13.jpg)
6.1.2 Model Description: Example in R
Be sure to get the publisher's R downloads: http://www.wiley.com/WileyCDA/WileyTitle/productCd-111887613X.html
> income_input = as.data.frame(read.csv("c:/data/income.csv"))
> income_input[1:10,]
> summary(income_input)
> library(lattice)
> splom(~income_input[c(2:5)], groups=NULL, data=income_input, axis.line.tck=0, axis.text.alpha=0)
![Page 14](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/14.jpg)
6.1.2 Model Description: Example in R
Scatterplot matrix: examine the bottom row
- Income ~ Age: strong positive trend
- Income ~ Education: slight positive trend
- Income ~ Gender: no trend
![Page 15](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/15.jpg)
6.1.2 Model Description: Example in R
Quantify the linear relationship trends:
> results <- lm(Income~Age+Education+Gender, income_input)
> summary(results)
- Intercept: income of about $7,263 for a newborn female with no education
- Age coefficient: ≈ 1, so each additional year of age increases income by about $1,000
- Education coefficient: ≈ 1.76, so each additional year of education increases income by about $1,760
- Gender coefficient: ≈ −0.93, so being male decreases income by about $930
Residuals, assumed to be normally distributed, vary from about −37 to +37 (more on residuals follows).
![Page 16](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/16.jpg)
6.1.2 Model Description: Example in R
Examine the residuals: they reflect uncertainty or sampling error.
Small p-values indicate statistically significant results. Age and Education are highly significant (p < 2e-16). Gender, with p = 0.13, is not significant at the 90% confidence level. Therefore, drop the variable Gender from the linear model:
> results2 <- lm(Income~Age+Education, income_input)
> summary(results2) # results about the same as before
- Residual standard error: the standard deviation of the residuals
- R-squared (R²): the fraction of the variation in the data explained by the model, here about 64% (R² = 1 means the model explains the data perfectly)
- F-statistic: tests the entire model; here the p-value is small
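To make the summary() quantities concrete, the sketch below recomputes R² = 1 − RSS/TSS and the residual standard error sqrt(RSS/(n − p)), with p the number of estimated coefficients, by hand on simulated data (the data and coefficients are hypothetical) and compares them with what summary() reports:

```r
set.seed(7)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + 2 * x1 + 0.5 * x2 + rnorm(n)   # hypothetical data
fit <- lm(y ~ x1 + x2)

rss <- sum(residuals(fit)^2)     # residual sum of squares
tss <- sum((y - mean(y))^2)      # total sum of squares
r2  <- 1 - rss / tss             # fraction of variation explained
# residual standard error: sqrt(RSS / (n - p)); df.residual() gives n - p
rse <- sqrt(rss / df.residual(fit))

c(manual_r2 = r2, summary_r2 = summary(fit)$r.squared,
  manual_rse = rse, summary_sigma = summary(fit)$sigma)
```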
![Page 17](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/17.jpg)
6.1.2 Model Description: Categorical Variables
In the example in R, Gender is a binary variable. Variables like Gender are categorical variables, in contrast to numeric variables, where numeric differences are meaningful.
The book section discusses how income by state could be modeled.
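A minimal sketch of how R typically represents a categorical input such as state: under R's default treatment contrasts, lm() expands a factor with k levels into k − 1 indicator (0/1) columns, and the first level serves as the reference absorbed into the intercept. The data below are made up for illustration (income in $1000s):

```r
# Hypothetical data: income for people in three states
df <- data.frame(
  income = c(52, 61, 48, 70, 66, 55),
  state  = factor(c("NY", "CA", "TX", "CA", "NY", "TX"))
)

# Factor levels sort alphabetically, so "CA" is the reference level;
# the design matrix gets indicator columns stateNY and stateTX only
mm <- model.matrix(income ~ state, data = df)
print(mm)

fit <- lm(income ~ state, data = df)
coef(fit)  # intercept = mean for CA; stateNY, stateTX = differences from CA
```

With 50 states, the same mechanism would produce 49 indicator columns.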
![Page 18](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/18.jpg)
6.1.2 Model Description: Confidence Intervals on the Parameters
Once an acceptable linear model is developed, it is often useful to draw some inferences. R provides confidence intervals using the confint() function:
> confint(results2, level = .95)
For example, the Education coefficient was 1.76, and the corresponding 95% confidence interval is (1.53, 1.99).
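For an lm fit, each confidence interval is the estimate ± a t quantile × its standard error. The sketch below reproduces that by hand on simulated data (hypothetical coefficients, not the book's income.csv) and compares the result with confint():

```r
set.seed(3)
n <- 150
age    <- runif(n, 20, 65)
educ   <- sample(8:20, n, replace = TRUE)
income <- 5 + 1 * age + 1.7 * educ + rnorm(n, sd = 12)  # hypothetical
fit <- lm(income ~ age + educ)

est   <- coef(fit)
se    <- sqrt(diag(vcov(fit)))               # standard errors of the estimates
tcrit <- qt(0.975, df = df.residual(fit))    # two-sided 95% t quantile
manual_ci <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)

manual_ci
confint(fit, level = 0.95)
```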
![Page 19](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/19.jpg)
6.1.2 Model Description: Confidence Interval on Expected Outcome
In the income example, the regression line provides the expected income for a given Age and Education.
Using the predict() function in R, a confidence interval on the expected outcome can be obtained:
> Age <- 41
> Education <- 12
> new_pt <- data.frame(Age, Education)
> conf_int_pt <- predict(results2, new_pt, level=.95, interval="confidence")
> conf_int_pt
Expected income = $68,699, confidence interval ($67,831, $69,567)
![Page 20](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/20.jpg)
6.1.2 Model Description: Prediction Interval on a Particular Outcome
The predict() function in R also provides upper and lower bounds on a particular outcome, known as prediction intervals:
> pred_int_pt <- predict(results2, new_pt, level=.95, interval="prediction")
> pred_int_pt
Expected income = $68,699, prediction interval ($44,988, $92,409)
This is a much wider interval because the confidence interval applies to the expected outcome, which falls on the regression line, whereas the prediction interval applies to an individual outcome, which may appear anywhere within the normal distribution of outcomes around that expected value.
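Put in formulas: the confidence interval reflects only the uncertainty in the estimated line (the standard error of the fit), while the prediction interval also adds the error term's variance σ². A sketch on simulated data (hypothetical coefficients; the lowercase column names age and educ are ours) checks predict()'s half-widths against those two expressions:

```r
set.seed(11)
n <- 200
age    <- runif(n, 20, 65)
educ   <- sample(8:20, n, replace = TRUE)
income <- 5 + 1 * age + 1.7 * educ + rnorm(n, sd = 12)  # hypothetical
fit <- lm(income ~ age + educ)

new_pt  <- data.frame(age = 41, educ = 12)   # hypothetical new person
conf_pt <- predict(fit, new_pt, interval = "confidence", level = 0.95)
pred_pt <- predict(fit, new_pt, interval = "prediction", level = 0.95)

# Half-widths from the underlying formulas
p     <- predict(fit, new_pt, se.fit = TRUE)
tcrit <- qt(0.975, df = df.residual(fit))
conf_half <- tcrit * p$se.fit                                 # line uncertainty only
pred_half <- tcrit * sqrt(p$se.fit^2 + p$residual.scale^2)    # plus error variance

rbind(confidence = conf_pt[1, ], prediction = pred_pt[1, ])
```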
![Page 21](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/21.jpg)
6.1.3 Diagnostics: Evaluating the Linearity Assumption
A major assumption in linear regression modeling is that the relationship between the input and output variables is linear.
The most fundamental way to evaluate this is to plot the outcome variable against each input variable.
In the following figure, a linear model would not apply. In such cases, a transformation might allow a linear model to apply.
![Page 22](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/22.jpg)
6.1.3 Diagnostics: Evaluating the Linearity Assumption
Income as a quadratic function of Age
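Adding a squared term keeps the model linear in its parameters, so lm() can still fit it. A sketch on simulated data with a hypothetical curved relationship (income peaking in middle age), using I(age^2) so that ^ is treated as arithmetic rather than formula syntax:

```r
set.seed(5)
age <- runif(300, 20, 70)
# Hypothetical curved relationship: income rises, then falls with age
income <- -30 + 4 * age - 0.04 * age^2 + rnorm(300, sd = 5)

fit_lin  <- lm(income ~ age)
fit_quad <- lm(income ~ age + I(age^2))   # still linear in the parameters

c(linear    = summary(fit_lin)$r.squared,
  quadratic = summary(fit_quad)$r.squared)
```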
![Page 23](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/23.jpg)
6.1.3 Diagnostics: Evaluating the Residuals
The error term was assumed to be normally distributed with zero mean and constant variance. Plot the residuals against the fitted values to check this:
> with(results2, { plot(fitted.values, residuals, ylim=c(-40,40)) })
![Page 24](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/24.jpg)
6.1.3 Diagnostics: Evaluating the Residuals
The next four figures do not fit the zero-mean, constant-variance assumption:
- Nonlinear trend in residuals
- Residuals not centered on zero
![Page 25](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/25.jpg)
6.1.3 Diagnostics: Evaluating the Residuals
- Variance not constant
- Residuals not centered on zero
![Page 26](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/26.jpg)
6.1.3 Diagnostics: Evaluating the Normality Assumption
The normality assumption still has to be validated:
> hist(results2$residuals)
The residuals are centered on zero and appear to be normally distributed.
![Page 27](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/27.jpg)
6.1.3 Diagnostics: Evaluating the Normality Assumption
Another option is to examine a Q-Q plot, comparing the observed data against the quantiles (Q) of the assumed distribution:
> qqnorm(results2$residuals)
> qqline(results2$residuals)
![Page 28](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/28.jpg)
6.1.3 Diagnostics: Evaluating the Normality Assumption
- Normally distributed residuals
- Non-normally distributed residuals
![Page 29](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/29.jpg)
6.1.3 Diagnostics: N-Fold Cross-Validation
To prevent overfitting, a common practice is to split the dataset into training and test sets, develop the model on the training set, and evaluate it on the test set.
If the dataset is too small for this, an N-fold cross-validation technique can be used:
- The dataset is randomly split into N subsets of equal size.
- The model is trained on N-1 of the subsets and tested on the remaining one.
- The process is repeated N times, so that each subset serves once as the test set.
- The N model errors are averaged over the N folds.
Note: if N equals the size of the dataset, this is the leave-one-out procedure.
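The steps above can be sketched in base R. cv_lm() is a hypothetical helper, not a library function, and it uses mean squared error as the per-fold error, which is one common choice (an assumption, since the slide does not name an error measure):

```r
# Hypothetical helper: N-fold cross-validation for an lm() formula
cv_lm <- function(formula, data, N = 5, seed = 1) {
  set.seed(seed)  # note: resets the global RNG state
  # Randomly assign each row to one of N folds of (nearly) equal size
  folds <- sample(rep(seq_len(N), length.out = nrow(data)))
  errs  <- numeric(N)
  for (k in seq_len(N)) {
    train <- data[folds != k, , drop = FALSE]   # N-1 folds for training
    test  <- data[folds == k, , drop = FALSE]   # remaining fold for testing
    fit   <- lm(formula, data = train)
    y_hat <- predict(fit, newdata = test)
    y_obs <- test[[all.vars(formula)[1]]]       # outcome column
    errs[k] <- mean((y_obs - y_hat)^2)          # test MSE on this fold
  }
  list(fold_mse = errs, cv_mse = mean(errs))    # average over the N folds
}

set.seed(9)
d <- data.frame(x = runif(100))
d$y <- 1 + 2 * d$x + rnorm(100, sd = 0.3)       # hypothetical data

res <- cv_lm(y ~ x, d, N = 5)
res$cv_mse

# Leave-one-out: N equal to the number of rows
res_loo <- cv_lm(y ~ x, d, N = nrow(d))
res_loo$cv_mse
```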
![Page 30](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/30.jpg)
6.1.3 Diagnostics: Other Diagnostic Considerations
The model might be improved by including additional input variables. However, adding inputs never decreases R², so the adjusted R², which applies a penalty as the number of parameters increases, is a fairer basis for comparison.
Residual plots should be examined for outliers: points markedly different from the majority of points. They may result from bad data, data processing errors, or actual rare occurrences.
Finally, the magnitude and signs of the estimated parameters should be examined to see if they make sense.
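The usual adjusted R² formula is 1 − (1 − R²)(n − 1)/(n − p − 1), with n observations and p input variables. A sketch on simulated data, with five pure-noise inputs added deliberately, checks that formula against summary()$adj.r.squared; adj_r2() is a hypothetical helper:

```r
set.seed(2)
n  <- 60
x1 <- rnorm(n)
noise_vars <- matrix(rnorm(n * 5), n, 5)   # 5 inputs unrelated to y
y  <- 1 + 2 * x1 + rnorm(n)                # hypothetical data

fit_small <- lm(y ~ x1)
fit_big   <- lm(y ~ x1 + noise_vars)

# Hypothetical helper: adjusted R^2 from its textbook formula
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- length(residuals(fit))
  p  <- length(coef(fit)) - 1    # number of inputs, excluding the intercept
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

rbind(small = c(r2 = summary(fit_small)$r.squared, adj = adj_r2(fit_small)),
      big   = c(r2 = summary(fit_big)$r.squared,   adj = adj_r2(fit_big)))
```

Plain R² can only go up when the noise inputs are added; the adjusted version charges for each extra parameter.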
![Page 31](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/31.jpg)
6.2 Logistic Regression: Introduction
In linear regression modeling, the outcome variable is continuous, e.g., income ~ age and education.
In logistic regression, the outcome variable is categorical; this chapter focuses on two-valued outcomes such as true/false, pass/fail, or yes/no.
![Page 32](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/32.jpg)
6.2.1 Logistic Regression: Use Cases
- Medical: probability of a patient's successful response to a specific medical treatment; inputs could include age, weight, etc.
- Finance: probability that an applicant defaults on a loan
- Marketing: probability that a wireless customer switches carriers (churns)
- Engineering: probability that a mechanical part malfunctions or fails
![Page 33](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/33.jpg)
6.2.2 Logistic Regression: Model Description
Logistic regression is based on the logistic function
$$f(y) = \frac{e^{y}}{1 + e^{y}}, \qquad -\infty < y < \infty$$
As y → ∞, f(y) → 1; and as y → −∞, f(y) → 0.
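The logistic function is easy to write directly in R. The sketch below uses the algebraically equivalent form 1/(1 + e^(−y)), which avoids the Inf/Inf overflow that e^y/(1 + e^y) hits for large y; base R's plogis() (the standard logistic distribution function) computes the same values:

```r
# Logistic function, written as 1 / (1 + exp(-y)) for numerical stability;
# algebraically equal to exp(y) / (1 + exp(y))
logistic <- function(y) 1 / (1 + exp(-y))

y <- c(-10, -2, 0, 2, 10)
logistic(y)
plogis(y)    # base R's built-in version gives the same values

logistic(c(-1000, 1000))   # limits: 0 and 1
```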
![Page 34](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/34.jpg)
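6.2.2 Logistic Regression: Model Description
With the range of f(y) being (0, 1), the logistic function is well suited to modeling the probability of an outcome occurring. With p = f(y) and y a linear function of the inputs,
$$p = f(y) = \frac{e^{y}}{1 + e^{y}}, \qquad y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{p-1} x_{p-1}$$
In contrast to linear regression, the values of y are not directly observed; only the values of f(y), in terms of success or failure, are observed.
Solving for y gives
$$\ln\left(\frac{p}{1-p}\right) = y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{p-1} x_{p-1}$$
The quantity ln(p/(1 − p)) is called the log odds ratio, or the logit of p. Maximum Likelihood Estimation (MLE) is used to estimate the model parameters; MLE is beyond the scope of this book.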
![Page 35](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/35.jpg)
6.2.2 Logistic Regression: Model Description (Customer Churn Example)
A wireless telecom company estimates the probability of a customer churning (switching companies). Variables collected for each customer: age (years), married (y/n), duration as a customer (years), churned contacts (count), churned (true/false).
After analyzing the data and fitting a logistic regression model, age and churned contacts were selected as the best predictor variables.
![Page 36](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/36.jpg)
6.2.2 Logistic Regression: Model Description (Customer Churn Example)
![Page 37](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/37.jpg)
6.2.3 Diagnostics: Model Description (Customer Churn Example)
> head(churn_input) # Churned = 1 if the customer churned
> sum(churn_input$Churned) # 1743 of 8000 customers churned
Use the Generalized Linear Model function glm():
> Churn_logistic1 <- glm(Churned~Age+Married+Cust_years+Churned_contacts, data=churn_input, family=binomial(link="logit"))
> summary(Churn_logistic1) # Age and Churned_contacts are the best predictors
> Churn_logistic3 <- glm(Churned~Age+Churned_contacts, data=churn_input, family=binomial(link="logit"))
> summary(Churn_logistic3) # model with Age + Churned_contacts only
![Page 38](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/38.jpg)
6.2.3 Diagnostics: Deviance and the Pseudo-R²
In logistic regression, deviance = −2 log L, where L is the maximized value of the likelihood function used to obtain the parameter estimates. Two deviance values are provided:
- Null deviance: deviance based on only the y-intercept term
- Residual deviance: deviance based on all parameters
The pseudo-R² measures how well the fitted model explains the data; a value near 1 indicates a good fit over the null model.
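A common definition (McFadden's) is pseudo-R² = 1 − (residual deviance)/(null deviance). The sketch below simulates hypothetical churn-like data (the coefficients and the variable names age and contacts are made up, not the book's dataset), fits glm(), and checks that for a 0/1 outcome the residual deviance equals −2 log L:

```r
set.seed(4)
n <- 1000
age      <- runif(n, 18, 80)
contacts <- rpois(n, 2)
# Hypothetical true model on the log-odds scale
p <- plogis(3 - 0.1 * age + 0.4 * contacts)
churned <- rbinom(n, 1, p)

fit <- glm(churned ~ age + contacts, family = binomial(link = "logit"))

# For a 0/1 response the saturated log-likelihood is 0, so deviance = -2 log L
dev_manual <- -2 * as.numeric(logLik(fit))
pseudo_r2  <- 1 - fit$deviance / fit$null.deviance

c(residual_deviance = fit$deviance, minus_2_logL = dev_manual,
  null_deviance = fit$null.deviance, pseudo_R2 = pseudo_r2)
```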
![Page 39](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/39.jpg)
6.2.3 Diagnostics: Deviance and the Log-Likelihood Ratio Test
Skip this section
![Page 40](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/40.jpg)
6.2.3 Diagnostics: Receiver Operating Characteristic (ROC) Curve
Logistic regression is often used to classify. In the churn example, a customer can be classified as Churn if the model predicts a high probability of churning. Although 0.5 is often used as the probability threshold, other values can be used based on the desired error tradeoff.
For two classes, C and nC, we have:
- True Positive: predict C, when actually C
- True Negative: predict nC, when actually nC
- False Positive: predict C, when actually nC
- False Negative: predict nC, when actually C
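The four outcomes can be counted for any threshold. A sketch with made-up predicted probabilities and actual outcomes (hypothetical data, not the churn dataset); classify_counts() is a hypothetical helper:

```r
# Hypothetical predicted churn probabilities and actual outcomes (1 = churned)
prob   <- c(0.9, 0.8, 0.65, 0.4, 0.3, 0.55, 0.2, 0.1)
actual <- c(1,   1,   0,    1,   0,   0,    0,   0)

classify_counts <- function(prob, actual, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)   # predict class C when prob >= threshold
  c(TP = sum(pred == 1 & actual == 1),
    TN = sum(pred == 0 & actual == 0),
    FP = sum(pred == 1 & actual == 0),
    FN = sum(pred == 0 & actual == 1))
}

classify_counts(prob, actual, 0.5)  # TP 2, TN 3, FP 2, FN 1
classify_counts(prob, actual, 0.7)  # TP 2, TN 5, FP 0, FN 1
```

Raising the threshold here trades false positives for no extra false negatives; on real data the tradeoff usually cuts both ways.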
![Page 41](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/41.jpg)
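6.2.3 Diagnostics: Receiver Operating Characteristic (ROC) Curve
The Receiver Operating Characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold varies, where TPR = TP/(TP + FN) and FPR = FP/(FP + TN).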
![Page 42](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/42.jpg)
6.2.3 Diagnostics: Receiver Operating Characteristic (ROC) Curve
> library(ROCR)
> Pred = predict(Churn_logistic3, type="response")
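ROCR's prediction() and performance() functions build the curve from these predicted probabilities. As a dependency-free illustration of what the curve contains, the sketch below computes ROC points and the area under the curve (AUC) in base R on a tiny made-up example; roc_points() and auc_trapezoid() are hypothetical helpers, not ROCR functions:

```r
# ROC points: sweep the threshold over each distinct score, highest first
roc_points <- function(score, label) {
  thr <- sort(unique(score), decreasing = TRUE)
  P <- sum(label == 1)
  N <- sum(label == 0)
  tpr <- sapply(thr, function(t) sum(score >= t & label == 1) / P)
  fpr <- sapply(thr, function(t) sum(score >= t & label == 0) / N)
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))   # curve starts at (0, 0)
}

# Area under the curve by the trapezoidal rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

score <- c(0.1, 0.4, 0.35, 0.8)   # hypothetical predicted probabilities
label <- c(0, 0, 1, 1)            # hypothetical actual classes
roc <- roc_points(score, label)
roc
auc_trapezoid(roc)                # 0.75
plot(roc$fpr, roc$tpr, type = "s", xlab = "FPR", ylab = "TPR")
```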
![Page 43](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/43.jpg)
6.2.3 Diagnostics: Receiver Operating Characteristic (ROC) Curve
![Page 44](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/44.jpg)
6.2.3 Diagnostics: Histogram of the Probabilities
It is interesting to visualize the counts of the customers who churned and who didn't churn against the estimated churn probability.
![Page 45](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/45.jpg)
6.3 Reasons to Choose and Cautions
Linear regression: the outcome variable is continuous. Logistic regression: the outcome variable is categorical.
Both models assume a linear additive function of the input variables. If this is not true, the models perform poorly.
In linear regression, the further assumption of normally distributed error terms is important for many statistical inferences.
Although a set of input variables may be a good predictor of an output variable, "correlation does not imply causation."
![Page 46](https://reader033.fdocuments.us/reader033/viewer/2022061410/5697bfbe1a28abf838ca2701/html5/thumbnails/46.jpg)
6.4 Additional Regression Models
Multicollinearity is the condition in which several input variables are highly correlated. This can lead to inappropriately large coefficients. To mitigate this problem:
- Ridge regression applies a penalty based on the size of the coefficients (proportional to the sum of their squares).
- Lasso regression applies a penalty proportional to the sum of the absolute values of the coefficients.
Multinomial logistic regression is used for a categorical outcome variable with more than two states.
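Ridge regression, described above, has a closed form, (XᵀX + λI)⁻¹Xᵀy, when the inputs and outcome are centered so that the intercept is left unpenalized. The sketch below builds two nearly identical inputs (hypothetical data) and compares the ridge coefficients with OLS (λ = 0); ridge() is a hypothetical helper. Lasso has no closed form and is usually fit iteratively, e.g., with the glmnet package (alpha = 1).

```r
set.seed(8)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # x2 nearly duplicates x1: multicollinearity
y  <- 3 * x1 + rnorm(n)          # hypothetical data

# Center inputs and outcome so the intercept can be dropped from the penalty
X  <- scale(cbind(x1, x2), center = TRUE, scale = FALSE)
yc <- y - mean(y)

# Hypothetical helper: closed-form ridge estimate (X'X + lambda * I)^(-1) X'y
ridge <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

b_ols   <- ridge(X, yc, 0)   # lambda = 0 reduces to ordinary least squares
b_ridge <- ridge(X, yc, 1)   # penalized: coefficients shrink toward zero

rbind(ols = b_ols, ridge = b_ridge)
```

With nearly collinear inputs, the OLS coefficients can be large and of opposite sign; the ridge penalty pulls them toward a more stable split of the shared effect.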