Transcript of My regression lecture mk3 (uploaded to web ct)

Page 1: My regression lecture   mk3 (uploaded to web ct)

SIMPLE AND MULTIPLE REGRESSION

Chris Stiff

[email protected]

1

Page 2: My regression lecture   mk3 (uploaded to web ct)

LEARNING OBJECTIVES

In this lecture you will learn:
- What simple and multiple regression mean
- The rationale behind these forms of analysis
- How to conduct simple bivariate and multiple regression analyses using SPSS
- How to interpret the results of a regression analysis

2

Page 3: My regression lecture   mk3 (uploaded to web ct)

REGRESSION

What is regression?

Regression is similar to correlation in the sense that both assess the relationship between two variables

Regression is used to predict values of an outcome variable (y) from one or more predictor variables (x)

Predictors must either be continuous or categorical with ONLY two categories

3

Page 4: My regression lecture   mk3 (uploaded to web ct)

SIMPLE REGRESSION

Simple regression involves a single predictor variable and an outcome variable

Examines changes in an outcome variable from a predictor variable

Other names:
Outcome = dependent, endogenous or criterion variable
Predictor = independent, exogenous or explanatory variable

4

Page 5: My regression lecture   mk3 (uploaded to web ct)

SIMPLE REGRESSION

The relationship between two variables can be expressed mathematically by the slope of the line of best fit.

Usually expressed as

Y = a + b X

Outcome Intercept + (Coefficient x Predictor)

5

Page 6: My regression lecture   mk3 (uploaded to web ct)

SIMPLE REGRESSION

Where:

Y = Outcome (e.g., amount of stupid behaviour)

a = Intercept/constant (average amount of stupid behaviour if no alcohol is drunk)

b = Unit increment in the outcome that is explained by a unit increase in the predictor – line gradient

X = Predictor (e.g., amount of alcohol drunk)

6
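To make a and b concrete, here is a minimal Python sketch using numpy and made-up alcohol/behaviour numbers (the data and variable names are illustrative assumptions, not figures from the slides):

```python
import numpy as np

# Made-up data: units of alcohol drunk (X) and a stupid-behaviour score (Y).
alcohol = np.array([2, 5, 8, 10, 12, 15, 18, 20, 24, 28], dtype=float)
behaviour = np.array([20, 25, 35, 38, 45, 50, 58, 60, 70, 80], dtype=float)

# Least-squares estimates for Y = a + bX:
#   b = covariance(X, Y) / variance(X)   (the line gradient)
#   a = mean(Y) - b * mean(X)            (the intercept/constant)
b = np.cov(alcohol, behaviour)[0, 1] / np.var(alcohol, ddof=1)
a = behaviour.mean() - b * alcohol.mean()

print(f"a (intercept) = {a:.2f}, b (slope) = {b:.2f}")
print(f"predicted behaviour after 10 units: {a + b * 10:.1f}")
```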

Page 7: My regression lecture   mk3 (uploaded to web ct)

LINE OF BEST FIT

[Scatterplot: stupid behaviour (y-axis, 0–100) against amount of alcohol (x-axis, 0–30), with the line of best fit drawn through the points]

7

Page 8: My regression lecture   mk3 (uploaded to web ct)

LINE OF BEST FIT – POOR EXAMPLE

[Scatterplot: stupid behaviour (y-axis, 0–100) against number of pairs of socks (x-axis, 0–30), where no sensible line of best fit can be drawn]

8

Page 9: My regression lecture   mk3 (uploaded to web ct)

SIMPLE REGRESSION USING SPSS

Analyze → Regression → Linear

9
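The same analysis can also be run outside SPSS. Purely for comparison, here is a sketch using Python's statsmodels library on made-up "amount" and "behaviour" data (the variable names mirror the SPSS output on the following slides; the numbers do not):

```python
import numpy as np
import statsmodels.api as sm

# Made-up stand-ins for the 'amount' and 'behaviour' columns in the SPSS file.
rng = np.random.default_rng(1)
amount = rng.uniform(0, 30, size=20)
behaviour = 16 + 2.2 * amount + rng.normal(0, 20, size=20)

X = sm.add_constant(amount)          # adds the intercept ("Constant") term
model = sm.OLS(behaviour, X).fit()   # ordinary least squares, like the SPSS "Enter" method

print(model.summary())               # Model Summary, ANOVA and Coefficients in one printout
```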

Page 10: My regression lecture   mk3 (uploaded to web ct)

10

Page 11: My regression lecture   mk3 (uploaded to web ct)

SPSS OUTPUT

11

Variables Entered/Removed(b)

Model 1: Variables Entered = amount(a); Variables Removed = (none); Method = Enter

a. All requested variables entered.

b. Dependent Variable: behaviour

Page 12: My regression lecture   mk3 (uploaded to web ct)

SPSS OUTPUT

12

R = correlation between amount drunk and stupid behaviour
R square = proportion of variance in outcome (behaviour) accounted for by the predictor (amount drunk)
Adjusted R square = takes into account the sample size and the number of predictor variables

Model Summary

Model 1: R = .746(a); R Square = .556; Adjusted R Square = .531; Std. Error of the Estimate = 20.44929

a. Predictors: (Constant), amount

Page 13: My regression lecture   mk3 (uploaded to web ct)

THE R2

The R2 increases with the inclusion of more predictor variables in a regression model, and is the value most commonly reported.

The adjusted R2, however, only increases when the new predictor(s) improve the model more than would be expected by chance. The adjusted R2 will always be equal to, or less than, R2.

It is particularly useful during the variable-selection stage of model building.

13
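The adjustment is a simple formula. As a small illustrative sketch (assuming the usual definition of adjusted R2), the figures from the Model Summary above reproduce the .531:

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictor variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Values from the Model Summary slide: R Square = .556, n = 20 cases, 1 predictor.
print(round(adjusted_r_squared(0.556, 20, 1), 3))  # ~0.531, matching "Adjusted R Square"
```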

Page 14: My regression lecture   mk3 (uploaded to web ct)

SPSS OUTPUT

14

ANOVA(b)

Regression: Sum of Squares = 9421.425, df = 1, Mean Square = 9421.425, F = 22.530, Sig. = .000(a)
Residual: Sum of Squares = 7527.125, df = 18, Mean Square = 418.174
Total: Sum of Squares = 16948.550, df = 19

a. Predictors: (Constant), amount

b. Dependent Variable: behaviour

Page 15: My regression lecture   mk3 (uploaded to web ct)

SPSS OUTPUT

15

Beta = standardised regression coefficient; it shows how many standard deviations the outcome variable changes for a one standard deviation increase in the predictor variable, with all other things held constant

Coefficients(a)

(Constant): B = 16.250, Std. Error = 8.042, t = 2.021, Sig. = .058
amount: B = 2.227, Std. Error = .469, Beta = .746, t = 4.747, Sig. = .000

a. Dependent Variable: behaviour
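The standardised Beta can be obtained directly from the unstandardised slope. A minimal sketch of that conversion (the formula is standard; the function name is just illustrative):

```python
import numpy as np

def standardised_beta(b, x, y):
    """Beta = b * SD(x) / SD(y): SDs of change in the outcome per SD increase in the predictor."""
    return b * np.std(x, ddof=1) / np.std(y, ddof=1)

# Equivalently, regressing z-scored y on z-scored x gives the same coefficient; with a single
# predictor this equals the Pearson correlation (hence Beta = R = .746 in the table above).
```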

Page 16: My regression lecture   mk3 (uploaded to web ct)

REPORTING THE RESULTS OF SIMPLE REGRESSION

β = .74, t(18) = 4.74, p < .001, R2 = .56

Beta value; t value with its associated df and p; R square

16
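Assuming the model, X and behaviour objects from the earlier statsmodels sketch, the quantities needed for this report can be pulled out as follows (illustrative only):

```python
# Extract the values reported as: beta, t(df) = ..., p = ..., R2 = ...
beta = model.params[1] * X[:, 1].std(ddof=1) / behaviour.std(ddof=1)  # standardised slope
t = model.tvalues[1]
df = int(model.df_resid)   # n - p - 1
p = model.pvalues[1]
r2 = model.rsquared

print(f"beta = {beta:.2f}, t({df}) = {t:.2f}, p = {p:.3f}, R2 = {r2:.2f}")
```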

Page 17: My regression lecture   mk3 (uploaded to web ct)

GENERATING DF AND T

df = n – p – 1, where n is the number of observations and p is the number of predictor variables (the constant is accounted for by the – 1). For the example above: n = 20 and one predictor, so df = 20 – 1 – 1 = 18.

NB This is for regression; df is calculated differently for other tests!

17

Page 18: My regression lecture   mk3 (uploaded to web ct)

ASSUMPTIONS OF SIMPLE REGRESSION

Outcome variable should be measured at interval level

When plotted the data should have a linear trend

18

Page 19: My regression lecture   mk3 (uploaded to web ct)

SUMMARY OF SIMPLE REGRESSION

Used to predict the outcome variable from a predictor variable

Used when there is one predictor variable and one outcome variable

The relationship must be linear

19

Page 20: My regression lecture   mk3 (uploaded to web ct)

MULTIPLE REGRESSION

Multiple regression is used when there is more than one predictor variable

Two major uses of multiple regression:
- Prediction
- Causal analysis

20

Page 21: My regression lecture   mk3 (uploaded to web ct)

USES OF MULTIPLE REGRESSION

Multiple regression can be used to examine the following:
- How well a set of variables predicts an outcome
- Which variable in a set of variables is the best predictor of the outcome
- Whether a predictor variable still predicts the outcome when another variable is controlled for

21

Page 22: My regression lecture   mk3 (uploaded to web ct)

MULTIPLE REGRESSION - EXAMPLE

22

[Diagram: Attendance at lectures, Books read, and Motivation as predictors of Exam Performance (Grade)]

What might predict exam performance?

Page 23: My regression lecture   mk3 (uploaded to web ct)

MULTIPLE REGRESSION USING SPSS

Analyze → Regression → Linear

23
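For comparison with the SPSS menus, a minimal Python sketch of the same "Enter" analysis using statsmodels and a made-up data frame (the column names echo the output on the next slides; the values are invented for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up stand-in for the SPSS data file: grade, lectures attended, books read.
df = pd.DataFrame({
    "grade":    [55, 62, 48, 70, 66, 51, 74, 58, 63, 69, 45, 72, 60, 57, 68],
    "lectures": [12, 15,  8, 18, 16, 10, 19, 11, 14, 17,  7, 18, 13, 12, 16],
    "books":    [ 2,  4,  1,  5,  4,  2,  6,  3,  3,  5,  1,  5,  3,  2,  4],
})

# "Enter" method: both predictors go into the model simultaneously.
model = smf.ols("grade ~ lectures + books", data=df).fit()
print(model.summary())   # Model Summary, ANOVA and Coefficients in one printout
```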

Page 24: My regression lecture   mk3 (uploaded to web ct)

24

Page 25: My regression lecture   mk3 (uploaded to web ct)

MULTIPLE REGRESSION: SPSS OUTPUT

25

Variables Entered/Removedb

Lecturesattended,Number ofbooksread

a

. Enter

Model1

VariablesEntered

VariablesRemoved Method

All requested variables entered.a.

Dependent Variable: Grade achievedb.

Page 26: My regression lecture   mk3 (uploaded to web ct)

MULTIPLE REGRESSION: SPSS OUTPUT

26

Model Summary

Model 1: R = .605(a); R Square = .367; Adjusted R Square = .336; Std. Error of the Estimate = 13.711

a. Predictors: (Constant), Lectures attended, Number of books read

Page 27: My regression lecture   mk3 (uploaded to web ct)

MULTIPLE REGRESSION: SPSS OUTPUT

27

ANOVA(b)

Regression: Sum of Squares = 4569.053, df = 2, Mean Square = 2284.526, F = 12.153, Sig. = .000(a)
Residual: Sum of Squares = 7895.258, df = 42, Mean Square = 187.982
Total: Sum of Squares = 12464.311, df = 44

a. Predictors: (Constant), Lectures attended, Number of books read

b. Dependent Variable: Grade achieved

For overall model: F(2, 42) = 12.153, p<.001

Page 28: My regression lecture   mk3 (uploaded to web ct)

MULTIPLE REGRESSION: SPSS OUTPUT

28

Coefficients(a)

(Constant): B = 39.173, Std. Error = 6.625, t = 5.913, Sig. = .000
Number of books read: B = 3.832, Std. Error = 1.712, Beta = .331, t = 2.238, Sig. = .031
Lectures attended: B = 1.290, Std. Error = .536, Beta = .356, t = 2.407, Sig. = .021

a. Dependent Variable: Grade achieved

Number of books read is a significant predictor: β = .33, t(42) = 2.24, p < .05

Lectures attended is a significant predictor: β = .36, t(42) = 2.41, p < .05

Page 29: My regression lecture   mk3 (uploaded to web ct)

MAJOR TYPES OF MULTIPLE REGRESSION

There are different types of multiple regression:

Theory-based model building:
- Standard multiple regression (Enter)
- Hierarchical multiple regression (Block entry)

Statistical model building:
- Sequential multiple regression (Forward, Backward, Stepwise)

29

Page 30: My regression lecture   mk3 (uploaded to web ct)

STANDARD MULTIPLE REGRESSION

Most common method. All the predictor variables are entered into the analysis simultaneously (i.e., Enter).

Used to examine how much:
- An outcome variable is explained by a set of predictor variables as a group
- Variance in the outcome variable is explained by a single predictor (its unique contribution)

30

Page 31: My regression lecture   mk3 (uploaded to web ct)

EXAMPLE

The different methods of regression and their associated outputs will be illustrated using:

Outcome variable:
- Essay mark

Predictor variables:
- Number of lectures attended (out of 20)
- Motivation of student (on a scale from 0–100)
- Number of course books read (from 0–10)

31

[Diagram: Attendance at lectures, Books read, and Motivation as predictors of Exam Performance (Grade)]

Page 32: My regression lecture   mk3 (uploaded to web ct)

ENTER OUTPUT

32

Variables Entered/Removed(b)

Model 1: Variables Entered = books, lectures, motivation(a); Variables Removed = (none); Method = Enter

a. All requested variables entered.

b. Dependent Variable: essay

Page 33: My regression lecture   mk3 (uploaded to web ct)

ENTER OUTPUT

33

R square = proportion of variance in outcome accounted for by the predictor variables
Adjusted R square = takes into account the sample size and the number of predictor variables

Model Summary

Model 1: R = .918(a); R Square = .842; Adjusted R Square = .812; Std. Error of the Estimate = 6.84522

a. Predictors: (Constant), books, lectures, motivation

Page 34: My regression lecture   mk3 (uploaded to web ct)

ENTER OUTPUT

34

ANOVA(b)

Regression: Sum of Squares = 95293.006, df = 3, Mean Square = 31764.335, F = 17.030, Sig. = .000(a)
Residual: Sum of Squares = 382376.0, df = 205, Mean Square = 1865.249
Total: Sum of Squares = 477669.0, df = 208

a. Predictors: (Constant), Gender identification, Negative impressions males hold about females, Positive impressions males hold about females

b. Dependent Variable: Negative impression about males

Page 35: My regression lecture   mk3 (uploaded to web ct)

ENTER OUTPUT

35

Beta = standardised regression coefficient; it shows the degree to which the predictor variable predicts the outcome variable, with all other things held constant

Coefficients(a)

(Constant): B = 19.738, Std. Error = 5.399, t = 3.656, Sig. = .002
lectures: B = 1.217, Std. Error = .469, Beta = .490, t = 2.595, Sig. = .020
motivation: B = .352, Std. Error = .144, Beta = .466, t = 2.450, Sig. = .026
books: B = .509, Std. Error = .504, Beta = .103, t = 1.010, Sig. = .327

a. Dependent Variable: essay

Page 36: My regression lecture   mk3 (uploaded to web ct)

HIERARCHICAL MULTIPLE REGRESSION

aka sequential regression

Predictor variables are entered in a prearranged order of steps (i.e., block entry)

Can examine how much variance is accounted for by a predictor when the others are already in the model

36

Page 37: My regression lecture   mk3 (uploaded to web ct)

37

Page 38: My regression lecture   mk3 (uploaded to web ct)

38

Don’t forget to choose the r-square change option from the Statistics menu
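The R-square change statistic can also be computed by hand from two nested models. A minimal sketch, assuming a pandas DataFrame df with essay, lectures, books and motivation columns (hypothetical data, as in the earlier sketches):

```python
import statsmodels.formula.api as smf

# Block 1: lectures only; Block 2: add books and motivation.
m1 = smf.ols("essay ~ lectures", data=df).fit()
m2 = smf.ols("essay ~ lectures + books + motivation", data=df).fit()

r2_change = m2.rsquared - m1.rsquared
df1 = m2.df_model - m1.df_model          # number of predictors added in block 2 (here 2)
df2 = m2.df_resid                        # n - p - 1 for the larger model
f_change = (r2_change / df1) / ((1 - m2.rsquared) / df2)

print(f"R2 change = {r2_change:.3f}, F change({int(df1)}, {int(df2)}) = {f_change:.3f}")
```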

Page 39: My regression lecture   mk3 (uploaded to web ct)

BLOCK ENTRY OUTPUT

39

Variables Entered/Removed(b)

Model 1: Variables Entered = lectures(a); Variables Removed = (none); Method = Enter
Model 2: Variables Entered = books, motivation(a); Variables Removed = (none); Method = Enter

a. All requested variables entered.

b. Dependent Variable: essay

Page 40: My regression lecture   mk3 (uploaded to web ct)

BLOCK ENTRY OUTPUT

40

Model Summary

Model 1: R = .884(a); R Square = .781; Adjusted R Square = .768; Std. Error of the Estimate = 7.60374
Model 2: R = .918(b); R Square = .842; Adjusted R Square = .812; Std. Error of the Estimate = 6.84522

a. Predictors: (Constant), lectures

b. Predictors: (Constant), lectures, books, motivation

Change Statistics
Model 1: R Square Change = .781; F Change = 64.069; df1 = 1; df2 = 18; Sig. F Change = .000
Model 2: R Square Change = .061; F Change = 3.105; df1 = 2; df2 = 16; Sig. F Change = .073

NB – this will be in one long line in the output!

Page 41: My regression lecture   mk3 (uploaded to web ct)

BLOCK ENTRY OUTPUT

41

ANOVA(c)

Model 1:
Regression: Sum of Squares = 3704.295, df = 1, Mean Square = 3704.295, F = 64.069, Sig. = .000(a)
Residual: Sum of Squares = 1040.705, df = 18, Mean Square = 57.817
Total: Sum of Squares = 4745.000, df = 19

Model 2:
Regression: Sum of Squares = 3995.288, df = 3, Mean Square = 1331.763, F = 28.422, Sig. = .000(b)
Residual: Sum of Squares = 749.712, df = 16, Mean Square = 46.857
Total: Sum of Squares = 4745.000, df = 19

a. Predictors: (Constant), lectures

b. Predictors: (Constant), lectures, books, motivation

c. Dependent Variable: essay

Page 42: My regression lecture   mk3 (uploaded to web ct)

BLOCK ENTRY OUTPUT

42

Coefficients(a)

Model 1:
(Constant): B = 30.311, Std. Error = 3.042, t = 9.965, Sig. = .000
lectures: B = 2.194, Std. Error = .274, Beta = .884, t = 8.004, Sig. = .000

Model 2:
(Constant): B = 19.738, Std. Error = 5.399, t = 3.656, Sig. = .002
lectures: B = 1.217, Std. Error = .469, Beta = .490, t = 2.595, Sig. = .020
motivation: B = .352, Std. Error = .144, Beta = .466, t = 2.450, Sig. = .026
books: B = .509, Std. Error = .504, Beta = .103, t = 1.010, Sig. = .327

a. Dependent Variable: essay

Page 43: My regression lecture   mk3 (uploaded to web ct)

STATISTICAL MULTIPLE REGRESSION

aka sequential techniques

43

Page 44: My regression lecture   mk3 (uploaded to web ct)

STATISTICAL MULTIPLE REGRESSION aka sequential techniques

Relies on SPSS selecting which predictor variables to include in a model

Three types: Forward selection Backward selection Stepwise selection

44

Page 45: My regression lecture   mk3 (uploaded to web ct)

Forward: starts with no variables in the model, tries them all, includes the best predictor, then repeats

Backward: starts with ALL variables in the model, removes the lowest contributor, then repeats

Stepwise: a combination. Starts as Forward, then after each iteration checks that all included variables are still making a contribution (like Backward)

45
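As a rough illustration of the logic (not SPSS's exact algorithm, which uses F-to-enter/F-to-remove criteria), a minimal Python sketch of forward selection over a set of candidate predictors in a pandas DataFrame:

```python
import statsmodels.api as sm

def forward_selection(outcome, candidates, data, alpha=0.05):
    """Crude forward selection: repeatedly add the candidate predictor with the
    smallest p-value, stopping when no remaining candidate reaches alpha."""
    selected = []
    remaining = list(candidates)
    while remaining:
        pvals = {}
        for var in remaining:
            X = sm.add_constant(data[selected + [var]])
            fit = sm.OLS(data[outcome], X).fit()
            pvals[var] = fit.pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# e.g. forward_selection("essay", ["lectures", "books", "motivation"], df)
```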

Page 46: My regression lecture   mk3 (uploaded to web ct)

SUMMARY OF MODEL SELECTION TECHNIQUES

Theory based:
- Enter – all predictors entered together (standard)
- Block entry – predictors entered in groups (hierarchical)

Statistical based:
- Forward – variables are entered into the model based on their statistical significance
- Backward – variables are removed from the model based on their statistical significance
- Stepwise – variables are moved in and out of the model based on their statistical significance

46

Page 47: My regression lecture   mk3 (uploaded to web ct)

ASSUMPTIONS OF REGRESSION

Linearity
- The relationship between the outcome and the predictors must be linear
- Check: violations can be assessed using a scatterplot

Independence
- Values on the outcome variable must be independent, i.e., each value comes from a different participant

Homoscedasticity
- At each level of the predictor variable the variance of the residual terms should be equal (i.e., all data points should be about equally close to the line of best fit)
- Can indicate whether all the data are drawn from the same sample

Normality
- Residuals/errors should be normally distributed
- Check: violations can be assessed using histograms (e.g., outliers)

Multicollinearity
- Predictor variables should not be highly correlated

47
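Most of these checks can be run on a fitted model. A minimal sketch, assuming the model object and design matrix X (a numpy array with a constant column) from the earlier statsmodels examples:

```python
import matplotlib.pyplot as plt
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Linearity / homoscedasticity: residuals vs fitted values should show no pattern
# and a roughly constant spread around zero.
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0)
plt.xlabel("Fitted values"); plt.ylabel("Residuals")
plt.show()

# Normality of residuals: the histogram should look roughly bell-shaped.
plt.hist(model.resid, bins=10)
plt.show()

# Multicollinearity: variance inflation factor for each predictor column
# (values well above ~10 are commonly taken as a warning sign).
for i in range(1, X.shape[1]):          # skip the constant column
    print(i, variance_inflation_factor(X, i))
```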

Page 48: My regression lecture   mk3 (uploaded to web ct)

OTHER IMPORTANT ISSUES

Regression in this case is for continuous/interval predictors, or categorical predictors with ONLY two categories
- More than two categories are possible (via dummy coding)

The outcome must be continuous/interval

Sample size
- Multiple regression needs a relatively large sample size
- Some authors suggest using between 10 and 20 participants per predictor variable
- Others argue there should be 50 cases more than the number of predictors, to be sure that one is not capitalising on chance effects

48
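A tiny sketch of these two rules of thumb from the slide (they are guidelines only, not hard requirements):

```python
def sample_size_guidelines(n_predictors: int) -> dict:
    """Two common sample-size rules of thumb for multiple regression."""
    return {
        "10-20 cases per predictor": (10 * n_predictors, 20 * n_predictors),
        "50 cases more than predictors": 50 + n_predictors,
    }

print(sample_size_guidelines(3))  # e.g. for lectures, books and motivation
```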

Page 49: My regression lecture   mk3 (uploaded to web ct)

OUTCOMES

So – what is regression?

This lecture has:
- introduced the different types of regression
- detailed how to conduct and interpret regression using SPSS
- described the underlying assumptions of regression
- outlined the data types and sample sizes needed for regression
- outlined the major limitation of a regression analysis

49

Page 50: My regression lecture   mk3 (uploaded to web ct)

REFERENCES

Allison, P. D. (1999). Multiple regression: A primer. Thousand Oaks: Pine Forge Press.

Clark-Carter, D. (2004). Quantitative psychological research: A student's handbook. Hove: Psychology Press.

Coolican, H. (2004). Research methods and statistics in psychology (4th ed.). Oxon: Hodder Arnold.

George, D., & Mallery, P. (2005). SPSS for Windows step by step (5th ed.). Boston: Pearson.

Field, A. (2002). Discovering statistics using SPSS for Windows. London: Sage Publications.

Pallant, J. (2002). SPSS survival manual. Buckingham: Open University Press.

http://www.statsoft.com/textbook/stmulreg.html#aassumption

50