Advanced Financial Accounting II Åbo Akademi School of...

Post on 18-Jul-2020


Regression analysis

Advanced Financial Accounting II

Åbo Akademi School of Business

Regression analysis

A statistical process for estimating the relationships

among variables

Includes many techniques for modeling and analyzing

several variables, when the focus is on the relationship

between a dependent variable and one or more

independent variables

Helps one understand how the typical value of the

dependent variable (or 'Criterion Variable') changes

when any one of the independent variables is varied,

while the other independent variables are held fixed

Regression models and regression analysis


Regression models involve the following variables:

– The unknown parameters, b, which may represent a

scalar or a vector.

– The independent variables, X.

– The dependent variable, Y.

A regression model relates Y to a function of X and b

Y = f(X,b)

Regression model and regression analysis


Regression analysis estimates the conditional

expectation of the dependent variable given the

independent variables

E(Y | X) = f(X,b)

The estimation target is the regression function Y = f(X,b);

it is also of interest to characterize the variation of the

dependent variable around the regression function,

which can be described by a probability distribution

Linear regression

In linear regression, the model specification is that the

dependent variable is a linear combination of the

parameters b

– need not be linear in the independent variables X

For example, in simple linear regression for modeling n

data points there is one independent variable X, and

two parameters, b0 and b1 giving the straight line

yi = b0 + b1xi + ei

ei is an error term and the subscript i indexes a

particular observation
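The straight-line model above can be estimated in a few lines. The following is a minimal sketch that assumes ordinary least squares as the estimation method (the slides do not name one); the function name `fit_simple_ols` and the five data points are made up for illustration and are not taken from the course data.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) that minimise the sum of squared errors for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1


# Illustrative data only (hypothetical values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)

# The error terms e_i are estimated by the residuals y_i - (b0 + b1 * x_i).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)
```

With an intercept in the model, the least-squares residuals sum to zero up to rounding, which is a quick sanity check on any fit.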

Simple linear regression

Example of simple linear regression, which has one

independent variable


Once a regression model has been constructed, it

may be important to confirm the goodness of fit of the

model and the statistical significance of the estimated

parameters
Commonly used checks of goodness of fit include

– the coefficient of determination R2

– analyses of the pattern of residuals

– hypothesis testing

Statistical significance can be checked by

– F-test of the overall fit

– t-tests of individual parameters
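The two significance checks listed above can be sketched for the one-predictor case. This is an illustrative calculation under the usual OLS assumptions, not the SPSS procedure itself; the function name `simple_ols_tests` and the data are hypothetical. The returned t-values would be compared with a critical value such as 1.96, which is the large-sample approximation for the 5 % level.

```python
import math


def simple_ols_tests(x, y):
    """Return t-values for b0 and b1 and the overall F-statistic of y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x

    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)

    # Residual variance with n - 2 degrees of freedom (two estimated parameters).
    s2 = ss_res / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))

    # Overall F: explained variation (1 degree of freedom) over residual variance.
    f_stat = (ss_tot - ss_res) / s2
    return {"t_b0": b0 / se_b0, "t_b1": b1 / se_b1, "F": f_stat}


# Illustrative data only (hypothetical values).
result = simple_ols_tests([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
print(result)
```

In simple regression the F-statistic equals the square of the slope's t-value, so the two tests agree; they differ only once several predictors are included.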

Goodness of fit – Coefficient of

determination R2

The coefficient of determination, R2 indicates how well

data points fit a line or curve

Provides a measure of how well observed outcomes

are replicated by the model, as the proportion of total

variation of outcomes explained by the model

The better the linear regression fits the data, the

closer the value of R2 is to one

$R^2 = 1 - SS_{res} / SS_{tot}$

$SS_{tot}$, the total sum of squares

$SS_{res}$, the residual sum of squares

Goodness of fit – Adjusted R2

R2 automatically increases when extra explanatory

variables are added to the model

Some of the increase may be due to spurious effects

A modification of R2 adjusts for the number of

explanatory terms in a model relative to the number

of data points

Unlike R2, the adjusted R2 increases when a new

explanator is included only if the new explanator

improves the R2 more than would be expected in the

absence of any explanatory value being added by the

new explanator
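Both measures can be computed directly from observed and fitted values. The sketch below uses the standard definitions, R2 = 1 - SS_res / SS_tot and adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of data points and p the number of explanatory variables; the function names and the example numbers are assumptions for illustration and do not come from the SPSS output.

```python
def r_squared(y, y_hat):
    """Proportion of the total variation in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)             # total sum of squares
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalise R2 for the number of explanatory variables p given n data points."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical example: R2 of 0.40 from 100 observations and 3 predictors.
print(adjusted_r_squared(0.40, n=100, p=3))
```

Because the penalty factor (n - 1)/(n - p - 1) grows with p, an added variable raises the adjusted value only if it reduces the residual sum of squares by enough to offset the lost degree of freedom.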

Simple linear regression analysis – an

example
Research question: Does the amount of money spent

on advertising affect the yearly sales of a company?

Data: File: AFAII_Regression_Excercise.xlsx

– Yearly sales (Sales)

– Amount spent on advertising (AdvTotal)

for 100 companies

Regression equation to estimate:

Salesi = b0 + b1AdvTotali + ei

Simple regression analysis with SPSS




Move Sales to Dependent

Move AdvTotal to Independent(s)


Simple Linear Regression Analysis

with SPSS – Interpretation – Model fit

Adjusted R2 = 0.375

37.5 % of the variation in

the yearly sales is explained

by the amount spent on

advertising – all other

factors fixed

Simple Linear Regression Analysis with

SPSS – Significance of total model

The F-statistic for

the total model is

significant at the 5 % level

Simple Linear Regression Analysis with

SPSS – Interpretation – Coefficients

t-values for both Constant and the

independent variable AdvTotal >

1.96, so the parameter estimates are

significant at the 5 % level

Estimated regression equation

Salesi = 11 890.599 + 4.914 AdvTotali + ei

Multiple linear regression analysis

In the more general multiple regression model, there are p independent variables:

yi = b0 + b1xi1 + b2xi2 + … + bpxip + ei

The predictor variables have to be linearly independent, i.e. it is not possible to express any predictor as a linear combination of the others

Highly correlated predictor variables lead to multicollinearity problems where the coefficient estimates may change erratically in response to small changes in the model or the data

– Multicollinearity does not reduce the predictive power

or reliability of the model as a whole but it may not give valid results about any individual predictor
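The linear-independence requirement can be seen in how the coefficients are computed. The following is a minimal sketch, assuming ordinary least squares solved through the normal equations (X'X)b = X'y; the helper names `solve_linear_system` and `fit_multiple_ols` and the data are hypothetical. When one predictor is an exact linear combination of the others, the system has no unique solution and the sketch raises an error.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # crude absolute tolerance, for illustration
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_ols(rows, y):
    """rows holds one list of predictor values per observation; returns [b0, b1, ..., bp]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Illustrative data generated exactly from y = 1 + 2*x1 + 3*x2 (hypothetical values).
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [9.0, 8.0, 19.0, 18.0, 29.0]
coefficients = fit_multiple_ols(rows, y)
print(coefficients)
```

Highly but not perfectly correlated predictors pass this check; the system is then solvable but badly conditioned, which is where the erratic coefficient estimates described above come from.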

Multiple linear regression analysis –

an example

Research question: Do the amounts of money spent on advertising in TV, web, and press affect the yearly sales of a company?

Data: File: AFAII_Regression_Excercise.xlsx

– Yearly sales (Sales)

– Amount spent on advertising in TV (AdvTV)

– Amount spent on advertising in web (AdvWeb)

– Amount spent on advertising in press (AdvPress)

for 100 companies

Regression equation to estimate:

Salesi = b0 + b1AdvTVi + b2AdvWebi + b3AdvPressi + ei

Multiple linear regression analysis

with SPSS




Move Sales to Dependent

Move AdvTV, AdvWeb, and AdvPress to

Independent(s)
Method: Enter


MLR with SPSS – Interpretation

Coefficients for all three

independent variables

are estimated

MLR with SPSS – Interpretation –

Goodness of fit

Adjusted R2 = 0.398

39.8 % of the variation in

the yearly sales is explained

by the amount spent on

advertising in TV, web and

press
MLR with SPSS – Interpretation –

Significance of total model

The F-statistic for

the total model is

significant at the 5 % level

MLR with SPSS – Interpretation –

Coefficients
Coefficients for AdvTV and

AdvWeb significant at the 5 % level

(t-value > 1.96, significance <

0.05). Constant and coefficient

for AdvPress insignificant

Stepwise regression models

The method Enter estimates a model simultaneously

including all the suggested variables that pass some

predefined criteria

The insignificance of one of the suggested predictor

variables, AdvPress, suggests that a more suitable

model could be found by eliminating this variable

In order to find a suitable variable combination, a

stepwise estimation process may be selected

In SPSS: Method: Stepwise
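The idea of building a model one variable at a time can be sketched as a simple forward selection. This is a simplified stand-in, not the SPSS algorithm: it adds the variable that most improves the adjusted R2 and stops when no candidate improves it, whereas SPSS's Stepwise method applies its own F-test based entry and removal criteria. All names and the synthetic data below are assumptions for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (no singularity check in this sketch)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(columns, y):
    """Fit y on the given predictor columns plus an intercept by OLS; return adjusted R2."""
    n, p = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bj * dj for bj, dj in zip(b, d)) for d in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_select(predictors, y, min_gain=1e-9):
    """Return predictor indices in the order they were added to the model."""
    selected, best = [], 0.0  # the intercept-only model has adjusted R2 of 0
    remaining = list(range(len(predictors)))
    while remaining:
        scores = {j: adjusted_r2([predictors[i] for i in selected + [j]], y)
                  for j in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] - best <= min_gain:
            break  # no remaining variable improves the fit
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected


# Synthetic data: sales depends on tv and web only; press is unrelated.
tv = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
web = [5.0, 3.0, 8.0, 1.0, 7.0, 2.0, 6.0, 4.0]
press = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
sales = [1.0 + 2.0 * t + 3.0 * w for t, w in zip(tv, web)]

chosen = forward_select([tv, web, press], sales)
print(chosen)
```

On this synthetic data the unrelated third predictor is left out, mirroring what the slides report for AdvPress, although the selection rule used here differs from the one SPSS applies.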

Stepwise MLR with SPSS

The variables AdvTV and

AdvWeb were entered in the

regression model in the

order they improve the

total model significance (F-

statistics). AdvPress was left

outside the model.

Stepwise MLR with SPSS –

Development of Goodness of fit

Entering the second

independent variable

AdvWeb increases the

explanatory power of the

model from 34.9 % to

39.4 %

Stepwise MLR with SPSS –

Coefficients
t-values for both Constant and the

independent variables AdvTV and

AdvWeb > 1.96, so the parameter

estimates are significant at the 5 % level

Estimated regression equation

Salesi = 8 450.755 + 4.549 AdvTVi + 21.532 AdvWebi + ei
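The estimated equation can be used to predict sales for given advertising spend, with the error term set to its expected value of zero. The coefficients below are the ones reported on the slide; the function name `predict_sales` and the input amounts are hypothetical, and the monetary unit of the data is not stated in the slides.

```python
# Coefficients from the estimated stepwise regression equation on the slide.
B0 = 8450.755      # constant
B_ADV_TV = 4.549   # coefficient for AdvTV
B_ADV_WEB = 21.532  # coefficient for AdvWeb


def predict_sales(adv_tv, adv_web):
    """Predicted yearly sales for the given TV and web advertising spend."""
    return B0 + B_ADV_TV * adv_tv + B_ADV_WEB * adv_web


# Hypothetical spend levels, chosen only to illustrate the calculation.
prediction = predict_sales(adv_tv=1000.0, adv_web=100.0)
print(prediction)
```

Each coefficient is the predicted change in yearly sales per additional unit of spend in that channel, with the other channel held fixed.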