Recep Maz: MSB 701 Quantitative Analysis for Managers

Quantitative Analysis for Managers

Regression analysis application

Instructor: Prof. MINE AYSEN DOYRAN

Student: Recep Maz

Regression analysis

Regression analysis includes any techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.

Regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed.

Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are held fixed.

Less commonly, the focus is on a quantile, or other location parameter, of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function.

In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.

Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships.

In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.

Simple linear regression models have only two variables: one dependent and one independent variable

Multiple regression models have more than one independent variable

Regression models involve the following variables

The variable to be predicted is called the dependent variable, Y

Sometimes called the response variable

The value of this variable depends on the value of the independent variable, X

Sometimes called the explanatory, predictor, or control variable

A regression model relates Y to a function of X

Introduction to regression models

[Figure: a regression model relates the dependent variable, Y, to a function of the independent variable, X]
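The relation in the figure can be written as Y = b0 + b1·X, where b0 is the intercept and b1 the slope. The sketch below estimates them by ordinary least squares, the method used by spreadsheet regression tools such as the one that appears to have produced the outputs later in this deck. The function names and the small data set are hypothetical and only illustrate the calculation.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line Y = b0 + b1 * X."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xy: sum of cross-deviations; S_xx: sum of squared deviations of X
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("X must vary to estimate a slope")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted Y for each value of X."""
    return [intercept + slope * xi for xi in x]


# Hypothetical illustrative data, not the data analysed in this deck
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_linear_regression(x, y)
print(f"Y = {b0:.2f} + {b1:.2f} X")
```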

Testing the Model for Significance

If the F-statistic is large, its significance level (p-value) will be low, indicating that a fit this strong is unlikely to have occurred by chance.

If the p-value of the F-statistic (Significance F) is smaller than 0.05 (5%), the regression model is statistically significant at the 5% level.
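As a sketch of where the F-statistic comes from: it divides the variation explained by the regression (SSR) by the unexplained variation (SSE), each scaled by its degrees of freedom, F = (SSR / k) / (SSE / (n - k - 1)). The helper below is a hypothetical illustration; turning F into Significance F additionally needs the upper tail of the F distribution with (k, n - k - 1) degrees of freedom, for example via `scipy.stats.f.sf`, which this sketch leaves out.

```python
def anova_f_statistic(y, y_hat, k):
    """F-statistic of a least-squares regression with an intercept and k independent variables.

    y     -- observed values of the dependent variable
    y_hat -- predicted values from the fitted regression
    k     -- number of independent variables
    """
    n = len(y)
    if len(y_hat) != n or n <= k + 1:
        raise ValueError("need len(y) == len(y_hat) > k + 1")
    y_bar = sum(y) / n
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    msr = ssr / k             # regression mean square, k degrees of freedom
    mse = sse / (n - k - 1)   # residual mean square, n - k - 1 degrees of freedom
    return msr / mse


# Hypothetical data: y = [1, 3, 2, 4] fitted by least squares on x = [1, 2, 3, 4]
# gives the line 0.5 + 0.8 x and these predicted values.
y = [1.0, 3.0, 2.0, 4.0]
y_hat = [1.3, 2.1, 2.9, 3.7]
print(f"F = {anova_f_statistic(y, y_hat, k=1):.4f}")
```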

The best model is a statistically significant model with a high r² and few variables.

As more variables are added to the model, the r² value never decreases and usually increases.

For this reason, the adjusted r² value is often used to determine the usefulness of an additional variable.

The adjusted r² takes into account the number of independent variables in the model.

As the number of variables increases, the adjusted r² gets smaller unless the increase in r² due to the new variable is large enough to offset the change in k (the number of independent variables).
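The adjustment is usually written adjusted r² = 1 - (1 - r²)(n - 1)/(n - k - 1), with n observations and k independent variables. A minimal sketch with made-up r² values shows how a small gain in r² from an extra variable can still lower the adjusted r²:

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: 20 observations; a third variable lifts r^2 only from 0.60 to 0.61
before = adjusted_r2(0.60, n=20, k=2)
after = adjusted_r2(0.61, n=20, k=3)
print(f"adjusted r^2 with 2 variables: {before:.4f}")
print(f"adjusted r^2 with 3 variables: {after:.4f}")
```

Because the adjusted value falls here, this rule would suggest leaving the third variable out.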

In general, if a new variable increases the adjusted r², it should probably be included in the model.

In some cases, variables contain duplicate information.

When two independent variables are correlated, they are said to be collinear.

When more than two independent variables are correlated, multicollinearity exists.

When multicollinearity is present, hypothesis tests for the individual coefficients are not valid, but the model may still be useful.
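One common way to check two independent variables for collinearity is their correlation coefficient r and the variance inflation factor VIF = 1 / (1 - r²), which is the VIF of each variable when the model has exactly two independent variables. The sketch below is a hypothetical illustration; rules of thumb often treat a VIF above 5 or 10 as a warning sign, but the slides do not specify a threshold.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF of each variable in a model with exactly two independent variables."""
    r = pearson_r(x1, x2)
    if abs(r) >= 1.0:
        return math.inf  # perfectly collinear: the coefficients cannot be separated
    return 1.0 / (1.0 - r * r)


# Hypothetical independent variables that move almost together
x1 = [10.0, 12.0, 15.0, 18.0, 20.0]
x2 = [21.0, 25.0, 29.0, 37.0, 41.0]
print(f"r = {pearson_r(x1, x2):.3f}, VIF = {vif_two_predictors(x1, x2):.1f}")
```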

Hypothesis statement, dependent variable and independent variable

Dependent variable: Total number of white people aged 18 to 64 years

Independent variable: Number of white people aged 18 to 64 years below the poverty level

Hypothesis statement: As the population of white adults (18 to 64 years) increases over the years, the number of white people aged 18 to 64 years living below the poverty level decreases.
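Stated as the test that the regression output actually performs, and assuming a simple linear model (the symbols β0, β1 and ε are introduced here for illustration and do not appear in the slides), the hypotheses for the slope are:

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i \\
H_0 &: \beta_1 = 0 \quad \text{(no linear relationship)} \\
H_1 &: \beta_1 \neq 0
\end{aligned}
```

Here Y_i is the total number of white people aged 18 to 64 and X_i the number of them below the poverty level. The directional statement above (a negative relationship) corresponds to the one-sided alternative β1 < 0; spreadsheet regression output reports two-sided p-values.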

INTERPRETATION OF REGRESSION OUTPUTS

R Square

R square = 0.024884311 ≈ 2.5%: only 2.5% of the variation in the total number of white people between 18 and 64 years is explained by the number of white people below the poverty level. This value indicates a weak fit.

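A very high R square (0.8 or 0.9) combined with individually insignificant coefficients can be a symptom of multicollinearity, which means that the independent variables are correlated with each other. With a single independent variable, as in this model, multicollinearity cannot arise. As it must, our R square lies between 0 and 1.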

Adjusted R square

Adjusted R Square = -0.0834618768434626 ≈ -8.3%. This value also indicates a weak fit; a negative adjusted R square means the independent variable explains less of the variation than would be expected by chance alone. When the number of observations is small relative to the number of independent variables, R square can be inflated and give a very misleading indication of goodness of fit. That is why many researchers use the adjusted R square instead.

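Adjusted R square can never exceed R square when the model has at least one independent variable, so comparing the two cannot reveal a multicollinearity problem. Here Adjusted R Square = -8.3% < R square = 2.5%, as expected. With only one independent variable, this model cannot have a multicollinearity problem.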
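The two reported values can be cross-checked with the adjusted R square formula, 1 - (1 - R²)(n - 1)/(n - k - 1). The slides do not state the number of observations; n = 11 with k = 1 is an inference, chosen because it is the sample size that reproduces the reported adjusted R square. The check also shows the adjusted value falling below R square, as it always does when k ≥ 1.

```python
R2_REPORTED = 0.024884311
ADJ_R2_REPORTED = -0.0834618768434626
K = 1            # one independent variable
N_INFERRED = 11  # assumption: not stated in the slides, inferred from the two reported values


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


adj = adjusted_r2(R2_REPORTED, N_INFERRED, K)
print(f"recomputed adjusted R^2 = {adj:.6f} (reported {ADJ_R2_REPORTED:.6f})")
```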

Significance F

Significance F is one of the most important indicators when analyzing regression output. It is the p-value of the F-statistic and measures the statistical significance of the regression model as a whole; a low value provides evidence of a linear relationship between our two variables. The F-statistic itself compares the variation explained by the regression with the unexplained (residual) variation, each divided by its degrees of freedom. The higher the F-statistic, the lower Significance F and the stronger the evidence for the overall fit of the regression line. Significance F values of 5% (0.05) or less are generally considered statistically significant. As with p-values, the lower Significance F, the more confident we can be in the overall significance of the regression equation.

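Significance F is interpreted as follows: if there were no linear relationship between the variables, there would be a 64% probability of obtaining a fit at least as strong as ours purely by chance. Only a low value would indicate that the fit is unlikely to be accidental.

Significance F = 0.643195730271619 ≈ 64% > 5%, which means there is no statistically significant relationship between our two variables.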
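Significance F can be recomputed from R square alone, because F = (R² / k) / ((1 - R²) / (n - k - 1)). The sketch below does this in plain Python, evaluating the upper tail of the F distribution through the regularized incomplete beta function (a standard continued-fraction method). As in the adjusted R square check, n = 11 observations is an inference, not a value stated in the slides; in practice a library routine such as `scipy.stats.f.sf` would be used instead.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the continued fraction for this m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function, for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly, else the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f, d1, d2):
    """P(F >= f) for an F distribution with d1 and d2 degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return regularized_incomplete_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def significance_f_from_r2(r2, n, k):
    """Return (F, Significance F) for a regression with n observations and k variables."""
    df_model, df_resid = k, n - k - 1
    f_stat = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return f_stat, f_upper_tail(f_stat, df_model, df_resid)


R2_REPORTED = 0.024884311
N_INFERRED = 11  # assumption: not stated in the slides, inferred from the adjusted R square
f_stat, sig_f = significance_f_from_r2(R2_REPORTED, N_INFERRED, k=1)
print(f"F = {f_stat:.4f}, Significance F = {sig_f:.4f}")
```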

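P value

P value = 0.000253490931854696 ≈ 0.025%. A p-value indicates how likely a coefficient estimate at least this far from zero would be if the true coefficient were zero; a small value means the individual coefficient is statistically significant. In a simple regression with one independent variable, however, the p-value of the slope equals Significance F (0.643 here), so this very small p-value most likely belongs to the intercept rather than to the independent variable, which is not individually significant. For a p-value to be considered statistically significant, it must be below the chosen significance level, commonly:

P value < 5% = 0.05
P value < 1% = 0.01
P value < 10% = 0.10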
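In a one-variable regression the slope's t-statistic squared equals the F-statistic, which is why the slope's p-value and Significance F coincide. The sketch below, on a small hypothetical data set, computes both test statistics for the slope and also the intercept's t-statistic, which has its own, separate p-value.

```python
import math


def simple_regression_tests(x, y):
    """t-statistics of intercept and slope, and the F-statistic, for Y = b0 + b1 X."""
    n = len(x)
    if len(y) != n or n < 3:
        raise ValueError("need equally long x and y with at least 3 points")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                                      # residual mean square, n - 2 df
    se_b1 = math.sqrt(mse / s_xx)                            # standard error of the slope
    se_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / s_xx))   # standard error of the intercept
    ssr = b1 * s_xy                                          # explained sum of squares (= b1^2 * S_xx)
    f_stat = ssr / mse                                       # k = 1, so MSR = SSR
    return b0 / se_b0, b1 / se_b1, f_stat


# Hypothetical data, not the data analysed in this deck
t_intercept, t_slope, f_stat = simple_regression_tests([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(f"t(intercept) = {t_intercept:.3f}, t(slope) = {t_slope:.3f}")
print(f"t(slope)^2 = {t_slope ** 2:.4f}, F = {f_stat:.4f}")
```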

[Figure: Normal Probability Plot, Y against sample percentile]

[Figure: X Variable 1 Residual Plot, residuals against X Variable 1]

[Figure: X Variable 1 Line Fit Plot, Y and predicted Y against X Variable 1]
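The three charts are built from simple quantities: predicted Y from the fitted line (line fit plot), residuals Y minus predicted Y (residual plot), and the sorted Y values paired with sample percentiles (normal probability plot). The sketch below computes them on hypothetical data; the plotting positions 100(i - 0.5)/n are an assumption about how the spreadsheet tool places its percentiles. A residual plot without a visible pattern supports the linear model.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for Y = b0 + b1 X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def line_fit_points(x, y):
    """(X, Y, predicted Y) triples for the line fit plot."""
    b0, b1 = fit_line(x, y)
    return [(xi, yi, b0 + b1 * xi) for xi, yi in zip(x, y)]


def residual_plot_points(x, y):
    """(X, residual) pairs, where residual = Y - predicted Y."""
    return [(xi, yi - y_hat) for xi, yi, y_hat in line_fit_points(x, y)]


def normal_probability_points(y):
    """(sample percentile, sorted Y) pairs, assuming plotting positions 100 * (i - 0.5) / n."""
    n = len(y)
    return [(100.0 * (i - 0.5) / n, yi) for i, yi in enumerate(sorted(y), start=1)]


# Hypothetical data, not the data analysed in this deck
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
for xi, e in residual_plot_points(x, y):
    print(f"X = {xi:.0f}: residual = {e:+.2f}")
print(normal_probability_points(y))
```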
