My regression lecture mk3 (uploaded to web ct)

Post on 30-Nov-2014


SIMPLE AND MULTIPLE REGRESSION

Chris Stiff

c.stiff@psy.keele.ac.uk


LEARNING OBJECTIVES

In this lecture you will learn:
  What simple and multiple regression mean
  The rationale behind these forms of analysis
  How to conduct simple (bivariate) and multiple regression analyses using SPSS
  How to interpret the results of a regression analysis

REGRESSION

What is regression?

Regression is similar to correlation in the sense that both assess the relationship between two variables

Regression is used to predict values of an outcome variable (y) from one or more predictor variables (x)

Predictors must be either continuous or categorical with ONLY two categories

SIMPLE REGRESSION

Simple regression involves a single predictor variable and an outcome variable

Examines how an outcome variable changes as a predictor variable changes

Other names:
  Outcome = dependent, endogenous or criterion variable
  Predictor = independent, exogenous or explanatory variable

SIMPLE REGRESSION

The relationship between two variables can be expressed mathematically by the equation of the line of best fit.

Usually expressed as

Y = a + bX

Outcome = Intercept + (Coefficient × Predictor)

SIMPLE REGRESSION

Where:

Y = Outcome (e.g., amount of stupid behaviour)

a = Intercept/constant (average amount of stupid behaviour if nothing is drunk)

b = Change in the outcome associated with a one-unit increase in the predictor (the gradient of the line)

X = Predictor (e.g., amount of alcohol drunk)
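As a minimal sketch of where a and b in Y = a + bX come from, the least-squares line of best fit can be computed directly from the data. The data values and the function name `fit_line` are hypothetical and chosen only for illustration; they are not the lecture's data set.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Gradient: sum of cross-products divided by the predictor's sum of squares
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # The fitted line passes through the point of means, which fixes the intercept
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: amount of alcohol drunk and stupid behaviour scores
amount = [0, 5, 10, 15, 20]
behaviour = [18, 24, 40, 44, 59]

a, b = fit_line(amount, behaviour)
print(f"behaviour = {a:.2f} + {b:.2f} * amount")  # behaviour = 16.60 + 2.04 * amount
```

Here a is the predicted behaviour score when nothing is drunk, and b is the predicted rise in the score for each extra unit of alcohol.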

LINE OF BEST FIT

[Figure: scatterplot with line of best fit. x-axis: Amount of alcohol (0 to 30); y-axis: Stupid behaviour (0 to 100)]

LINE OF BEST FIT – POOR EXAMPLE

[Figure: scatterplot with the line of best fit marked "?". x-axis: Number of pairs of socks (0 to 30); y-axis: Stupid behaviour (0 to 100)]

SIMPLE REGRESSION USING SPSS

Analyze > Regression > Linear

SPSS OUTPUT

Variables Entered/Removed (b)

Model  Variables Entered  Variables Removed  Method
1      amount (a)         .                  Enter

a. All requested variables entered.

b. Dependent Variable: behaviour

SPSS OUTPUT

R = correlation between amount drunk and stupid behaviour

R square = proportion of variance in outcome (behaviour) accounted for by the predictor (amount drunk)

Adjusted R square = takes into account the sample size and the number of predictor variables

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .746 (a)  .556      .531               20.44929

a. Predictors: (Constant), amount

THE R²

R²
  Never decreases when more predictor variables are included in a regression model
  Commonly reported

Adjusted R²
  Only increases when the new predictor(s) improve the model more than would be expected by chance
  Will always be equal to, or less than, R²
  Particularly useful during the variable selection stage of model building
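The link between R² and adjusted R² can be sketched with the usual adjustment formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors. This assumes SPSS applies that standard formula. The sample size n = 20 is inferred from the total df of 19 in the ANOVA tables, and the function name is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R squared for a model with p predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# R Square values taken from the Model Summary tables in this lecture
simple = adjusted_r2(0.556, n=20, p=1)    # alcohol example, one predictor
multiple = adjusted_r2(0.842, n=20, p=3)  # essay example, three predictors

print(round(simple, 3), round(multiple, 3))  # 0.531 0.812
```

Both results agree with the Adjusted R Square column that SPSS reports for those models.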

SPSS OUTPUT

ANOVA (b)

Model          Sum of Squares  df  Mean Square  F       Sig.
1  Regression  9421.425        1   9421.425     22.530  .000 (a)
   Residual    7527.125        18  418.174
   Total       16948.550       19

a. Predictors: (Constant), amount

b. Dependent Variable: behaviour
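The figures in the ANOVA table are tied together by simple arithmetic. The sketch below recomputes the mean squares, F and R² from the sums of squares and df reported above; the variable names are chosen for illustration.

```python
# Sums of squares and degrees of freedom from the ANOVA table above
ss_regression, df_regression = 9421.425, 1
ss_residual, df_residual = 7527.125, 18

# Mean square = sum of squares divided by its degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of explained to unexplained variance per degree of freedom
f_ratio = ms_regression / ms_residual

# R squared is the share of the total sum of squares explained by the model
r_squared = ss_regression / (ss_regression + ss_residual)

print(round(ms_residual, 3), round(f_ratio, 2), round(r_squared, 3))  # 418.174 22.53 0.556
```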

SPSS OUTPUT

Beta = standardised regression coefficient; shows how many standard deviations the outcome variable changes for a one standard deviation increase in the predictor variable, with all other things constant

Coefficients (a)

Model          B       Std. Error  Beta  t      Sig.
1  (Constant)  16.250  8.042             2.021  .058
   amount      2.227   .469        .746  4.747  .000

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient)

a. Dependent Variable: behaviour
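The B column gives the values of a and b in Y = a + bX, so the table can be turned into a prediction rule. The sketch below uses the coefficients reported above; the function name and the input value of 10 are hypothetical.

```python
INTERCEPT = 16.250  # B for (Constant)
SLOPE = 2.227       # B for amount
SLOPE_SE = 0.469    # Std. Error for amount


def predict_behaviour(amount):
    """Predicted stupid behaviour score for a given amount of alcohol."""
    return INTERCEPT + SLOPE * amount


# t is the coefficient divided by its standard error
t_amount = SLOPE / SLOPE_SE

print(round(predict_behaviour(10), 2))  # 38.52
print(round(t_amount, 2))               # 4.75
```

The t value recomputed from the rounded table entries differs slightly in the third decimal from the 4.747 in the table, which is most likely because SPSS works from unrounded coefficients.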

REPORTING THE RESULTS OF SIMPLE REGRESSION

β = .75, t(18) = 4.75, p < .001, R² = .56

Beta value; t value with associated df and p; R square

GENERATING DF AND T

df = n – p – 1

Where n is the number of observations and p is the number of predictors (the extra 1 accounts for the constant).

t = B / Std. Error (the unstandardised coefficient divided by its standard error)

NB This is for regression; df can be calculated differently for other tests!
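A small sketch of the df calculation, taking p as the number of predictors so that the extra 1 covers the constant. The function name is hypothetical. The sample sizes are inferred from the total df in the lecture's ANOVA tables (total df = n − 1).

```python
def anova_df(n, p):
    """Return (regression df, residual df, total df) for n cases and p predictors."""
    return p, n - p - 1, n - 1


print(anova_df(20, 1))  # alcohol example: (1, 18, 19), so t(18)
print(anova_df(45, 2))  # grade example: (2, 42, 44), so t(42) and F(2, 42)
print(anova_df(20, 3))  # essay example with three predictors: (3, 16, 19)
```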

ASSUMPTIONS OF SIMPLE REGRESSION

Outcome variable should be measured at interval level

When plotted the data should have a linear trend


SUMMARY OF SIMPLE REGRESSION

Used to predict the outcome variable from a predictor variable

Used when there is one predictor variable and one outcome variable

The relationship must be linear

MULTIPLE REGRESSION

Multiple regression is used when there is more than one predictor variable

Two major uses of multiple regression:
  Prediction
  Causal analysis

USES OF MULTIPLE REGRESSION

Multiple regression can be used to examine the following:
  How well a set of variables predicts an outcome
  Which variable in a set of variables is the best predictor of the outcome
  Whether a predictor variable still predicts the outcome when another variable is controlled for

MULTIPLE REGRESSION - EXAMPLE

What might predict exam performance?

[Diagram: Attendance at lectures, Books read and Motivation as possible predictors of Exam Performance (Grade)]

MULTIPLE REGRESSION USING SPSS

Analyze > Regression > Linear

MULTIPLE REGRESSION: SPSS OUTPUT

Variables Entered/Removed (b)

Model  Variables Entered                            Variables Removed  Method
1      Lectures attended, Number of books read (a)  .                  Enter

a. All requested variables entered.
b. Dependent Variable: Grade achieved

MULTIPLE REGRESSION: SPSS OUTPUT

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .605 (a)  .367      .336               13.711

a. Predictors: (Constant), Lectures attended, Number of books read

MULTIPLE REGRESSION: SPSS OUTPUT

ANOVA (b)

Model          Sum of Squares  df  Mean Square  F       Sig.
1  Regression  4569.053        2   2284.526     12.153  .000 (a)
   Residual    7895.258        42  187.982
   Total       12464.311       44

a. Predictors: (Constant), Lectures attended, Number of books read
b. Dependent Variable: Grade achieved

For overall model: F(2, 42) = 12.153, p < .001

MULTIPLE REGRESSION: SPSS OUTPUT

Coefficients (a)

Model                    B       Std. Error  Beta  t      Sig.
1  (Constant)            39.173  6.625             5.913  .000
   Number of books read  3.832   1.712       .331  2.238  .031
   Lectures attended     1.290   .536        .356  2.407  .021

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient)

a. Dependent Variable: Grade achieved

Number of books read is a significant predictor: β = .33, t(42) = 2.24, p < .05

Lectures attended is a significant predictor: β = .36, t(42) = 2.41, p < .05
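With more than one predictor the equation simply gains one term per predictor: Grade = constant + (B × books) + (B × lectures). The sketch below plugs in the B values from the coefficients table; the function name and the example student are hypothetical.

```python
CONSTANT = 39.173
COEFFICIENTS = {"books": 3.832, "lectures": 1.290}  # B values from the table


def predict_grade(**predictors):
    """Predicted grade: the constant plus each predictor multiplied by its B."""
    return CONSTANT + sum(COEFFICIENTS[name] * value
                          for name, value in predictors.items())


# Hypothetical student who read 5 books and attended 10 lectures
print(round(predict_grade(books=5, lectures=10), 2))  # 71.23

# Holding lectures constant, one extra book adds B for books to the prediction
extra_book = predict_grade(books=6, lectures=10) - predict_grade(books=5, lectures=10)
print(round(extra_book, 3))  # 3.832
```

The second calculation is what "with all other things constant" means in practice: each B describes the effect of its own predictor while the other predictors stay fixed.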

MAJOR TYPES OF MULTIPLE REGRESSION

There are different types of multiple regression:

Theory-based model building
  Standard multiple regression (Enter)
  Hierarchical multiple regression (Block entry)

Statistical model building
  Sequential multiple regression (Forward, Backward, Stepwise)

STANDARD MULTIPLE REGRESSION

Most common method. All the predictor variables are entered into the analysis simultaneously (i.e., enter)

Used to examine how much:
  An outcome variable is explained by a set of predictor variables as a group
  Variance in the outcome variable is explained by a single predictor (unique contribution)

EXAMPLE

The different methods of regression and their associated outputs will be illustrated using:
  Outcome variable
    Essay mark
  Predictor variables
    Number of lectures attended (out of 20)
    Motivation of student (on a scale from 0 – 100)
    Number of course books read (from 0 – 10)

[Diagram: Attendance at lectures, Books read and Motivation as predictors of Exam Performance (Grade)]

ENTER OUTPUT

Variables Entered/Removed (b)

Model  Variables Entered                Variables Removed  Method
1      books, lectures, motivation (a)  .                  Enter

a. All requested variables entered.

b. Dependent Variable: essay

ENTER OUTPUT

R square = proportion of variance in outcome accounted for by the predictor variables

Adjusted R square = takes into account the sample size and the number of predictor variables

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .918 (a)  .842      .812               6.84522

a. Predictors: (Constant), books, lectures, motivation

ENTER OUTPUT

ANOVA (b)

Model          Sum of Squares  df  Mean Square  F       Sig.
1  Regression  3995.288        3   1331.763     28.422  .000 (a)
   Residual    749.712         16  46.857
   Total       4745.000        19

a. Predictors: (Constant), books, lectures, motivation
b. Dependent Variable: essay

ENTER OUTPUT

Beta = standardised regression coefficient and shows the degree to which the predictor variable predicts the outcome variable with all other things constant

Coefficients (a)

Model          B       Std. Error  Beta  t      Sig.
1  (Constant)  19.738  5.399             3.656  .002
   lectures    1.217   .469        .490  2.595  .020
   motivation  .352    .144        .466  2.450  .026
   books       .509    .504        .103  1.010  .327

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient)

a. Dependent Variable: essay

HIERARCHICAL MULTIPLE REGRESSION

aka sequential regression

Predictor variables are entered in a prearranged order of steps (i.e., block entry)

Can examine how much variance is accounted for by a predictor when others are already in the model

Don’t forget to choose the R squared change option from the Statistics menu

BLOCK ENTRY OUTPUT

Variables Entered/Removed (b)

Model  Variables Entered      Variables Removed  Method
1      lectures (a)           .                  Enter
2      books, motivation (a)  .                  Enter

a. All requested variables entered.

b. Dependent Variable: essay

BLOCK ENTRY OUTPUT

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .884 (a)  .781      .768               7.60374
2      .918 (b)  .842      .812               6.84522

Change Statistics

Model  R Square Change  F Change  df1  df2  Sig. F Change
1      .781             64.069    1    18   .000
2      .061             3.105     2    16   .073

a. Predictors: (Constant), lectures
b. Predictors: (Constant), lectures, books, motivation

NB – this will be in one long line in the output!

BLOCK ENTRY OUTPUT

ANOVA (c)

Model          Sum of Squares  df  Mean Square  F       Sig.
1  Regression  3704.295        1   3704.295     64.069  .000 (a)
   Residual    1040.705        18  57.817
   Total       4745.000        19
2  Regression  3995.288        3   1331.763     28.422  .000 (b)
   Residual    749.712         16  46.857
   Total       4745.000        19

a. Predictors: (Constant), lectures

b. Predictors: (Constant), lectures, books, motivation

c. Dependent Variable: essay
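The change statistics can be recovered from the two ANOVA models above. As a sketch, assuming the usual definition of the F change test (the extra explained variance per added predictor, divided by the residual mean square of the larger model):

```python
# Values from the block entry ANOVA table above
ss_total = 4745.000
ss_regression_1 = 3704.295                    # Model 1: lectures only
ss_regression_2 = 3995.288                    # Model 2: lectures, books, motivation
ss_residual_2, df_residual_2 = 749.712, 16
predictors_added = 2                          # books and motivation (df1)

# Extra variance explained by the second block
extra_ss = ss_regression_2 - ss_regression_1
r2_change = extra_ss / ss_total

# F change: extra explained variance per added predictor over Model 2's residual mean square
f_change = (extra_ss / predictors_added) / (ss_residual_2 / df_residual_2)

print(round(r2_change, 3), round(f_change, 3))  # 0.061 3.105
```

Both values match the Change Statistics for Model 2. With Sig. F Change = .073, adding books and motivation as a block does not significantly improve on lectures alone at the .05 level.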

BLOCK ENTRY OUTPUT

Coefficients (a)

Model          B       Std. Error  Beta  t      Sig.
1  (Constant)  30.311  3.042             9.965  .000
   lectures    2.194   .274        .884  8.004  .000
2  (Constant)  19.738  5.399             3.656  .002
   lectures    1.217   .469        .490  2.595  .020
   motivation  .352    .144        .466  2.450  .026
   books       .509    .504        .103  1.010  .327

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient)

a. Dependent Variable: essay


STATISTICAL MULTIPLE REGRESSION

aka sequential techniques

Relies on SPSS selecting which predictor variables to include in a model

Three types:
  Forward selection
  Backward selection
  Stepwise selection

Forward: starts with no variables in the model, tries them all, includes the best predictor, repeats

Backward: starts with ALL variables, removes the lowest contributor, repeats

Stepwise: combination. Starts as Forward, then checks after each iteration that all variables are still making a contribution (like Backward)
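The forward procedure described above can be sketched in a few lines. This is an illustration only and not how SPSS is implemented: SPSS decides entry on statistical significance, whereas this sketch swaps in a simpler rule (a predictor is added only if it raises R² by at least `min_gain`). All function names, the threshold and the data are hypothetical. The essay marks are constructed to depend on lectures and motivation only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R squared of a least-squares fit of y on the given predictor columns plus a constant."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) coefficients = X'y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    coefs = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_residual = sum((y[i] - sum(c * x for c, x in zip(coefs, design[i]))) ** 2
                      for i in range(n))
    return 1 - ss_residual / ss_total


def forward_selection(predictors, y, min_gain=0.01):
    """Add the best remaining predictor each round until the R squared gain is too small."""
    selected, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {name: r_squared([predictors[s] for s in selected + [name]], y)
                  for name in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - best_r2 < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        best_r2 = scores[best]
    return selected, best_r2


# Hypothetical data: essay = 30 + 2 * lectures + 0.1 * motivation; books is unrelated
predictors = {
    "lectures": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "motivation": [50, 30, 60, 40, 70, 20, 80, 45, 55, 65],
    "books": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
}
essay = [39, 41, 48, 50, 57, 56, 66, 66.5, 71.5, 76.5]

selected, final_r2 = forward_selection(predictors, essay)
print(selected)
```

Backward selection would run the same loop in reverse, starting from all predictors and dropping the one whose removal costs the least R².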

SUMMARY OF MODEL SELECTION TECHNIQUES

Theory based
  Enter – all predictors entered together (standard)
  Block entry – predictors entered in groups (hierarchical)

Statistical based
  Forward – variables entered into the model based on their statistical significance
  Backward – variables are removed from the model based on their statistical significance
  Stepwise – variables are moved in and out of the model based on their statistical significance

ASSUMPTIONS OF REGRESSION

Linearity
  Relationship between the outcome and the predictors must be linear
  Check: violations assessed using a scatter-plot

Independence
  Values on the outcome variable must be independent, i.e., each value comes from a different participant

Homoscedasticity
  At each level of the predictor variable the variance of the residual terms should be equal (i.e. all data points should be about as close to the line of best fit)
  Can indicate if all data is drawn from the same sample

Normality
  Residuals/errors should be normally distributed
  Check: violations assessed using histograms (e.g., outliers)

Multicollinearity
  Predictor variables should not be highly correlated
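One simple way to screen for multicollinearity is to correlate every pair of predictors and flag any pair that is very strongly related. The sketch below does this with Pearson's r. The cut-off of .8 is an assumed rule of thumb and not a value from the lecture, and the data and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated(predictors, threshold=0.8):
    """List predictor pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for (name_1, x), (name_2, y) in combinations(predictors.items(), 2):
        r = pearson_r(x, y)
        if abs(r) > threshold:
            flagged.append((name_1, name_2, round(r, 2)))
    return flagged


# Hypothetical data; lecture_hours is just lectures * 2, so the two are redundant
predictors = {
    "lectures": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "motivation": [50, 30, 60, 40, 70, 20, 80, 45, 55, 65],
    "books": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
    "lecture_hours": [4, 8, 12, 16, 20, 24, 28, 32, 36, 40],
}

flagged = highly_correlated(predictors)
print(flagged)  # [('lectures', 'lecture_hours', 1.0)]
```

A flagged pair suggests the two predictors carry largely the same information, so one of them is a candidate for removal before running the regression.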

OTHER IMPORTANT ISSUES

Regression in this case is for continuous/interval predictors or categorical predictors with ONLY two categories
  More than two are possible (dummy coding)

Outcome must be continuous/interval

Sample Size
  Multiple regression needs a relatively large sample size
  Some authors suggest using between 10 and 20 participants per predictor variable
  Others argue the sample should be 50 cases more than the number of predictors, to be sure that one is not capitalising on chance effects
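Dummy coding, mentioned above for categorical predictors with more than two categories, replaces one categorical variable with a set of 0/1 variables, one per category except a chosen reference category. A minimal sketch, with a hypothetical variable (year of study) and a hypothetical function name:

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per category, leaving out the reference category."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical three-category predictor
year = ["first", "second", "third", "first", "third"]

dummies = dummy_code(year, reference="first")
print(dummies)  # {'second': [0, 1, 0, 0, 0], 'third': [0, 0, 1, 0, 1]}
```

Each resulting 0/1 variable has only two categories, so it can be entered as a predictor; its coefficient compares that category with the reference category.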

OUTCOMES

So – what is regression?

This lecture has:
  introduced the different types of regression
  detailed how to conduct and interpret regression using SPSS
  described the underlying assumptions of regression
  outlined the data types and sample sizes needed for regression
  outlined the major limitation of a regression analysis
