Recep maz msb 701 quantitative analysis for managers
Quantitative Analysis for Managers
Regression analysis application
Instructor: Prof. MINE AYSEN DOYRAN
Student: Recep Maz
Regression analysis
Regression analysis includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.
Regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed.
Regression analysis estimates the conditional expectation of the dependent variable given the independent variables — that is, the average value of the dependent variable when the independent variables are held fixed.
In some cases, the focus is instead on a quantile or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function.
In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.
Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.
Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships.
In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.
Simple linear regression models have only two variables: one dependent and one independent
Multiple regression models have more than one independent variable
Regression models involve the following variables
The variable to be predicted is called the dependent variable, Y
Sometimes called the response variable
The value of this variable depends on the value of the independent variable, X
Sometimes called the explanatory, predictor, or control variable
A regression model relates Y to a function of X
Introduction to regression models
Dependent variable, Y; independent variable, X
A regression model relates Y to a function of X
Dependent variable = Independent variable + Independent variable
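The statement that a regression model relates Y to a function of X can be sketched for the simplest case, a straight line fitted by least squares. The symbols b0 (intercept) and b1 (slope), the function name, and the data below are assumptions for illustration; none of them come from the slides.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line Y = b0 + b1 * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of X.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data (not from the presentation), chosen so that Y = 1 + 2X exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

b0, b1 = fit_simple_linear_regression(x, y)
print(f"Y = {b0:.2f} + {b1:.2f} * X")
```

With real data the points would not lie exactly on the line, and the differences between Y and the fitted values are the residuals.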
Testing the Model for Significance
If the F-statistic is large, the significance level (P-value) will be low, indicating it is unlikely this would have occurred by chance
If the P-value of the F-statistic (Significance F) is smaller than 0.05 (5%), the regression model is statistically significant.
The best model is a statistically significant model with a high r² and few variables
As more variables are added to the model, the r²-value usually increases
For this reason, the adjusted r² value is often used to determine the usefulness of an additional variable
The adjusted r² takes into account the number of independent variables in the model
As the number of variables increases, the adjusted r² gets smaller unless the increase due to the new variable is large enough to offset the change in k (number of independent variables)
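The trade-off between r² and k can be read off the usual definition of the adjusted r². The slides do not show the formula, so it is stated here as the standard textbook form. The symbol n (number of observations) is an assumption that the slides do not name.

```latex
r^2_{\text{adj}} = 1 - \frac{(1 - r^2)\,(n - 1)}{n - k - 1}
```

Raising k shrinks the denominator n - k - 1, so the adjusted value falls unless r² rises enough to compensate.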
In general, if a new variable increases the adjusted r², it should probably be included in the model
In some cases, variables contain duplicate information
When two independent variables are correlated, they are said to be collinear
When more than two independent variables are correlated, multicollinearity exists
When multicollinearity is present, hypothesis tests for the individual coefficients are not valid but the model may still be useful
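One simple way to look for collinearity is to compute the correlation coefficient between two independent variables before fitting the model. The sketch below assumes hypothetical data and an illustrative cutoff of 0.8. Neither comes from the slides, and the cutoff is a rule of thumb rather than a fixed standard.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical independent variables; x2 is roughly twice x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x1, x2)
threshold = 0.8  # assumed rule-of-thumb cutoff, not from the slides
collinear = abs(r) > threshold
print(f"r = {r:.3f}, collinear: {collinear}")
```

When the correlation is this strong, the two variables carry largely duplicate information, and dropping one of them is often considered.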
Hypothesis statement, dependent variable and independent variable
Dependent variable: Total number of white people between 18 and 64 years
Independent variable: Number of white people below the poverty level between 18 and 64 years
Hypothesis statement: As the population of white adults (18 to 64 years) increases, the number of white people between 18 and 64 years living below the poverty level decreases over the years.
INTERPRETATION OF REGRESSION OUTPUTS
R Square
R square = 0.024884311 = 2.5%: about 2.5% of the variation in the total number of white people between 18 and 64 years is explained by the number of white people below the poverty level. This value indicates a weak fit.
A very high R square (0.8, 0.9, …) can be a sign of a multicollinearity problem, which means the independent variables are correlated with each other. Our R square value is not high, and it lies between 0 and 1 as expected.
Adjusted R Square
Adjusted R Square = -0.0834618768434626 = -8.3%. This value indicates a weak fit.
If the number of observations is small, we may obtain a higher value of R square. This can be a very misleading indicator of goodness of fit, which is why many researchers use the adjusted R square value instead.
The adjusted R square can never be higher than the R square, and here Adjusted R Square = -8.3% < R square = 2.5%. Because the model has only one independent variable, we don't have a multicollinearity problem.
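The two reported values can be cross-checked against the standard adjusted R square formula. The slides do not state the sample size, so n = 11 below is an assumption, inferred because it is the value that makes the two reported numbers agree. The function name is also illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Standard adjusted R square for n observations and k independent variables."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


r_squared = 0.024884311  # R square reported in the regression output
k = 1                    # one independent variable
n = 11                   # ASSUMPTION: inferred, not stated in the slides

adj = adjusted_r_squared(r_squared, n, k)
print(f"Adjusted R square = {adj:.4f}")  # approximately -0.0835
```

A negative result is possible because the penalty for the extra variable is larger than the small amount of variation the model explains.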
Significance F
The most important indicator for analyzing regression output is Significance F. This value refers to the statistical significance of the regression model and provides evidence of whether a linear relationship exists between our two variables. The underlying F statistic measures the variation explained by the regression relative to the unexplained variation. The higher the F statistic, and therefore the lower the Significance F, the better the overall fit of the regression line. Significance F values of 5% (0.05) or less are generally considered statistically significant. As with P values, the lower the Significance F, the more confident we can be of the overall significance of the regression equation.
Significance F is the probability of obtaining a fit at least this strong purely by chance when no linear relationship exists. Here that probability is about 64%.
Significance F = 0.643195730271619 = 64% > 5%, which means there is no statistically significant relationship between our two variables.
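The decision above can be sketched in a few lines. The F statistic is recovered from the reported R square using the standard identity for a model with an intercept. The sample size n = 11 is an assumption inferred from the reported R square and adjusted R square, not a figure given in the slides, and the function names are illustrative.

```python
def f_statistic(r_squared, n, k):
    """F statistic: explained variation per variable over unexplained per residual df."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))


def is_significant(significance_f, alpha=0.05):
    """True when Significance F falls below the chosen significance level."""
    return significance_f < alpha


r_squared = 0.024884311               # reported R square
significance_f = 0.643195730271619    # reported Significance F
n, k = 11, 1                          # n is an ASSUMPTION; k = one independent variable

f_value = f_statistic(r_squared, n, k)
print(f"F = {f_value:.3f}")           # a small F, well below 1
print("Model significant at 5%:", is_significant(significance_f))
```

An F statistic this small means the regression explains little relative to the unexplained variation, which matches the large Significance F.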
P value
P value = 0.000253490931854696 = 0.025%. It indicates high statistical significance of our independent variables individually. It shows how confident we are in our analysis. For a P value to be statistically significant, it has to be at or below the chosen significance level:
P value = 5% = 0.05
P value = 1% = 0.01
P value = 10% = 0.10
[Figure: Normal Probability Plot. X axis: Sample Percentile (0 to 120); Y axis: Y (120000 to 145000)]
[Figure: X Variable 1 Residual Plot. X axis: X Variable 1 (11000 to 14000); Y axis: Residuals (-8000 to 8000)]
[Figure: X Variable 1 Line Fit Plot. X axis: X Variable 1 (11000 to 14000); Y axis: Y (120000 to 145000); series: Y, Predicted Y]