
2SLS HATCO SPSS, STATA and SHAZAM

Example by Eddie Oczkowski

August 2001

This example illustrates how to use SPSS to estimate and evaluate a 2SLS latent variable model. The bulk of the example relates to SPSS; the STATA and SHAZAM code is provided on the final pages. We employ data from Hair et al. (Multivariate Data Analysis, 1998). The data pertain to a company called HATCO and relate to purchase outcomes from, and perceptions of, the company. The models presented are not necessarily good models; we use them simply for presentation purposes. Consider a model which has a single dependent variable (usage) and two latent independent variables (strategy and image).

Dependent variable

X9: Usage Level (how much of the firm’s total product is purchased from HATCO).

Latent Independent Variables

Strategy

X1: Delivery Speed (assume this is the scaling variable)
X2: Price Level

X3: Price Flexibility

X7: Product Quality

Image

X4: Manufacturer’s Image (assume this is the scaling variable)
X6: Salesforce Image


2SLS Estimation

The 2SLS option is accessed via:

Analyze → Regression → 2-Stage Least Squares

For our basic model (usage against strategy and image) the variable boxes are filled by:

Dependent Variable: X9
Explanatory Variables: X1 and X4 (these are our scaling variables)

Instrumental Variables: X2, X3, X7 and X6 (these are our non-scaling variables)


For the diagnostic testing of the model it is useful to save the residuals and predictions from this model using Options.
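For readers who prefer syntax, the same model and saved series can be produced in Stata; the following is a sketch drawn from the Stata code on the final pages:

* Basic 2SLS model: X9 on the scaling variables X1 and X4,
* instrumented by the non-scaling indicators X2, X3, X7 and X6.
ivregress 2sls X9 (X1 X4 = X2 X3 X7 X6)
* Save the IV fitted values and the IV residuals.
predict FIT_1
predict ERR_1, r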

Part of the output from this 2SLS model is:

Two-stage Least Squares

Equation number: 1

Dependent variable.. X9

Multiple R .58798

R Square .34573

Adjusted R Square .33224

Standard Error 6.61991

Analysis of Variance:

DF Sum of Squares Mean Square

Regression 2 2246.2000 1123.1000

Residuals 97 4250.8496 43.8232

F = 25.62798 Signif F = .0000


------------------ Variables in the Equation ------------------

Variable B SE B Beta T Sig T

X1 5.362919 .834134 .787978 6.429 .0000

X4 2.284282 .735917 .287522 3.104 .0025

(Constant) 15.261425 4.877526 3.129 .0023

The following new variables are being created:

Name Label

FIT_1 Fit for X9 from 2SLS, MOD_2 Equation 1

ERR_1 Error for X9 from 2SLS, MOD_2 Equation 1

Comments: The R-square is 0.34 and the significant F-statistic indicates reasonable overall fit. The two independent variables are both statistically significant with the expected positive signs. Two variables have been created: FIT_1 is the IV ‘fitted value’ variable while ERR_1 is the IV residual.

2SLS as two OLS Regressions

Consider now the two-step method for calculating the estimates. This should be employed to obtain the 2SLS forecasts and residuals for later diagnostic testing.

The first step is to run a regression for each scaling variable against all the instruments and save the predictions.

OLS Regression: X1 against X2, X3, X6, X7, save predictions.
OLS Regression: X4 against X2, X3, X6, X7, save predictions.

Recall that the R-square values from these runs can be examined to ascertain the possible usefulness of the instruments.
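In Stata these two first-stage regressions and their saved predictions correspond to the following lines from the code on the final pages:

* First-stage regressions: each scaling variable on all the instruments.
regress X1 X2 X3 X6 X7
predict PRE_1
regress X4 X2 X3 X6 X7
predict PRE_2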


The standard OLS option is accessed via:

Analyze → Regression → Linear

The first regression is:

OLS Regression: X1 against X2, X3, X6, X7, save predictions.


Save the predictions in the Save box.

Part of the output from the regression is:

Regression

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .604a   .365       .338                1.075

a. Predictors: (Constant), Product Quality, Salesforce Image, Price Flexibility, Price Level
b. Dependent Variable: Delivery Speed


Coefficients

Variable             B           Std. Error   Beta     t        Sig.
(Constant)           2.335       1.117                 2.091    .039
Price Level          -6.44E-02   .110         -.058    -.583    .561
Price Flexibility    .322        .094          .338    3.438    .001
Salesforce Image     .271        .144          .158    1.884    .063
Product Quality      -.277       .081         -.332    -3.409   .001

a. Dependent Variable: Delivery Speed

Comments: The R-square exceeds 0.10 and some variables are significant; this indicates some instrument acceptability. Note, however, that Price Level appears not to be a good instrument. A new variable with the predictions has been saved here: pre_1.

The same approach is used for the other scaling variable.

OLS Regression: X4 against X2, X3, X6, X7, save predictions.

Part of the output from this regression is:

Regression

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .799a   .639       .623                .694

a. Predictors: (Constant), Product Quality, Salesforce Image, Price Flexibility, Price Level
b. Dependent Variable: Manufacturer Image


Coefficients

Variable             B           Std. Error   Beta     t        Sig.
(Constant)           2.261       .721                  3.134    .002
Price Level          .108        .071          .114    1.516    .133
Price Flexibility    -3.01E-02   .060         -.037    -.498    .620
Salesforce Image     1.125       .093          .767    12.087   .000
Product Quality      -4.41E-03   .052         -.006    -.084    .933

a. Dependent Variable: Manufacturer Image

Comments: The R-square is much better here, so the instruments appear to be better for image than for strategy. Clearly Salesforce Image is the key instrument for the image scaling variable. A new variable with the predictions has been saved here: pre_2.

The final step in the process is to OLS regress the dependent variable (X9) on the two new prediction variables (pre_1 and pre_2).


To produce the 2SLS forecasts and residuals we need to use the Save option.
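In Stata the second-stage regression and the saved 2SLS forecasts and residuals are:

* Second-stage regression of X9 on the two first-stage predictions.
regress X9 PRE_1 PRE_2
* PRE_3 holds the 2SLS forecasts and RES_1 the 2SLS residuals.
predict PRE_3
predict RES_1, r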

Part of the output from the 2nd stage regression is:

Regression

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .530a   .281       .266                7.701

a. Predictors: (Constant), Unstandardized Predicted Value, Unstandardized Predicted Value
b. Dependent Variable: Usage Level


ANOVA

             Sum of Squares   df   Mean Square   F        Sig.
Regression   2246.200         2    1123.100      18.937   .000a
Residual     5752.800         97   59.307
Total        7999.000         99

a. Predictors: (Constant), Unstandardized Predicted Value, Unstandardized Predicted Value
b. Dependent Variable: Usage Level

Coefficients

Variable                          B        Std. Error   Beta    t       Sig.
(Constant)                        15.261   5.674                2.690   .008
Unstandardized Predicted Value    5.363    .970          .476   5.527   .000
Unstandardized Predicted Value    2.284    .856          .230   2.668   .009

a. Dependent Variable: Usage Level

Comments: Note how the parameter estimates are the same in this regression as in the initial 2SLS model. Also note how the standard errors (and hence the t-ratios and significance levels) are different. The reported R-square is the generalized R-square (GR^2) referred to in the notes, and it indicates that 28.1% of the variation in the data is explained. This differs from the R-square of 34.6% initially reported by the 2SLS procedure. Two new variables have been saved: pre_3, the 2SLS forecasts, and res_1, the 2SLS residuals.
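Because the fitted values from this second-stage regression are exactly the 2SLS forecasts (pre_3), the generalized R-square can also be recovered as the squared correlation between X9 and pre_3. A small Stata sketch of this check:

* GR^2 as the squared correlation between actual usage and the 2SLS forecasts;
* this should reproduce the R-square of about 0.281 reported above.
correlate X9 PRE_3
display r(rho)^2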

Over-identifying Restrictions Test

To perform this test we run a regression of the IV residuals (err_1) against all the instruments: X2, X3, X6, X7. The R-square from this regression is multiplied by the sample size (N = 100) to obtain the test statistic. The degrees of freedom (number of instruments less the number of right-hand-side variables) are 4 - 2 = 2. At the 5% level of significance the critical value for a chi-square with d.f. = 2 is 5.99.
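In Stata the test statistic, the 5% critical value and the p-value can be computed as follows; the last two lines use Stata's chi-square functions and go slightly beyond the code on the final pages:

* Regress the IV residuals on all the instruments and form N * R-square.
regress ERR_1 X2 X3 X6 X7
scalar OIR = e(N)*e(r2)
display OIR
* 5% critical value and p-value for a chi-square with 2 degrees of freedom.
display invchi2tail(2, 0.05)
display chi2tail(2, OIR)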


The regression is run via Analyze → Regression → Linear, with err_1 as the dependent variable and the four instruments as the independent variables.

Part of the output from this regression is:

Regression

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .680a   .462       .440                4.9052771

a. Predictors: (Constant), Product Quality, Salesforce Image, Price Flexibility, Price Level


Coefficients

Variable             B          Std. Error   Beta     t        Sig.
(Constant)           -34.839    5.097                 -6.836   .000
Price Level          3.511      .504          .641    6.964    .000
Price Flexibility    3.119      .427          .660    7.308    .000
Salesforce Image     -1.496     .658         -.176    -2.274   .025
Product Quality      .847       .370          .205    2.285    .025

a. Dependent Variable: Error for X9 from 2SLS, MOD_2 Equation 1

Comments: The R-square is 0.462 and so the test statistic is N * R-square = 100(0.462) = 46.2. This far exceeds the critical value of 5.99, and therefore we conclude that there is a model specification problem or the instruments are invalid. There is a major problem here. Note that all the instruments are significant in this equation, illustrating that the instruments explain a significant amount of the variation in the residuals.

RESET (Specification Error Test)

To perform this test we first need to compute the square of the 2SLS forecasts, that is, pre_3 * pre_3. We can call the new variable whatever we want, say pre_32.


To do this we use the option:

Transform → Compute

The new variable pre_32 is then added to the original 2SLS model. That is, we employ the original dependent, explanatory and instrumental variables, but add pre_32 to both the explanatory variables and the instrumental variables.
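In Stata the squared forecasts and the augmented 2SLS model correspond to:

* RESET: add the squared 2SLS forecasts to the original model.
gen PRE_32 = PRE_3*PRE_3
* PRE_32 enters as an included exogenous regressor, so ivregress also uses it as an instrument.
ivregress 2sls X9 PRE_32 (X1 X4 = X2 X3 X7 X6)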

Part of the output from this 2SLS regression is:

Two-stage Least Squares

Dependent variable.. X9

Multiple R .48849

R Square .23863

Adjusted R Square .21483

Standard Error 8.68198


------------------ Variables in the Equation ------------------

Variable B SE B Beta T Sig T

X1 8.950208 8.336701 1.315060 1.074 .2857

X4 3.877008 3.794486 .487998 1.022 .3095

PRE_32 -.007123 .016422 -.360003 -.434 .6655

(Constant) 9.590647 14.521680 .660 .5106

Comments: The test statistic is the t-ratio for pre_32. In this case the t-ratio is -0.434 with a p-value of 0.6655, which is highly insignificant. This implies that there are no omitted variables and that the functional form can be trusted. Taken together with the previous test, this may imply that the problems with the model relate to inadequate instruments.

Heteroscedasticity Test

To perform this test we initially have to square the IV residuals using the compute option:

err_12 = err_1 * err_1

This new variable (err_12) is then regressed against the squared 2SLS forecasts (pre_32), and the t-ratio on the forecast variable is the test statistic.
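The corresponding Stata commands are:

* Heteroscedasticity check: squared IV residuals on the squared 2SLS forecasts.
gen ERR_12 = ERR_1*ERR_1
regress ERR_12 PRE_32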

The output from this regression is:

Regression

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .069a   .005       -.005               51.4851

a. Predictors: (Constant), PRE_32


Coefficients

Variable      B           Std. Error   Beta    t       Sig.
(Constant)    25.777      24.997               1.031   .305
PRE_32        7.790E-03   .011          .069   .684    .496

a. Dependent Variable: ERR_12

The t-ratio on pre_32 is 0.684 with a p-value of 0.496; this is highly insignificant, indicating the absence of heteroscedasticity.

Interaction Effects

To illustrate interaction effects, assume that strategy and image interact to create a new interaction latent independent variable. This variable is in addition to the original two independent variables. To create the new variables we employ the Transform → Compute option. For the new independent variable we multiply the scaling variables by each other: say X1X4 = X1*X4.


The instruments for this new variable are the products of all the remaining non-scaling variables across the two constructs. Since there is only one non-scaling variable for image, we simply multiply it with the non-scaling variables for strategy to obtain our instruments:

X2X6 = X2*X6
X3X6 = X3*X6
X7X6 = X7*X6

Thus the original 2SLS model is run again with one new explanatory variable (X1X4) and three new instrumental variables (X2X6, X3X6 and X7X6).
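In Stata the interaction variable, its instruments and the augmented model are:

* Interaction of the two scaling variables and the corresponding product instruments.
gen X1X4 = X1*X4
gen X2X6 = X2*X6
gen X3X6 = X3*X6
gen X7X6 = X7*X6
ivregress 2sls X9 (X1 X4 X1X4 = X2 X3 X7 X6 X2X6 X3X6 X7X6)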

Part of the output from this 2SLS regression is:

Two-stage Least Squares

Dependent variable.. X9

Multiple R .59043

R Square .34861

Adjusted R Square .32826

Standard Error 6.67686


------------------ Variables in the Equation ------------------

Variable B SE B Beta T Sig T

X1 8.295013 4.662882 1.218792 1.779 .0784
X4 4.506761 3.536422 .567265 1.274 .2056
X1X4 -.555352 .859773 -.519850 -.646 .5199

(Constant) 3.577392 19.010684 .188 .8511

Comments: Note that this model appears to be inferior to the original specification. All the variables are now insignificant, including the new interaction term X1X4.

Non-nested Testing

To illustrate these tests consider two models:

Model A: Usage against Strategy

Model B: Usage against Image

Assume we wish to ascertain which variable better explains usage. We will conduct a paired test, alternating the roles of Models A and B.

Case 1

H0: Null model: Usage against Strategy

H1: Alternative model: Usage against Image

In terms of our notation, our x’s are the strategy indicators while the w’s are the image indicators. The three steps are:

1. Regression: X4 on X6 and save the predictions (pre_4).

2. 2SLS regression: X9 on X1 and pre_4 (instruments: X2, X3, X7 and pre_4).
3. The t-ratio on the pre_4 variable is the test statistic.
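In Stata these steps are:

* Step 1: auxiliary regression of X4 on X6; PRE_4 holds the predictions.
regress X4 X6
predict PRE_4
* Step 2: 2SLS with PRE_4 added as an included exogenous regressor
* (ivregress then uses it as an instrument automatically).
ivregress 2sls X9 PRE_4 (X1 = X2 X3 X7)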

The output from this 2SLS regression is:

Two-stage Least Squares

Dependent variable.. X9

Multiple R .58664

R Square .34415

Adjusted R Square .33062

Standard Error 6.47420


------------------ Variables in the Equation ------------------

Variable B SE B Beta T Sig T

X1

5.095873

.822486

.748740

6.196

.0000

PRE_4 1.998917 .735642 .198320 2.717 .0078

(Constant) 17.697687 4.564165 3.878 .0002

Comments: The t-ratio for pre_4 is 2.717 with a p-value of 0.0078; this is highly significant. This implies that the alternative model (H1: image) rejects the null model (H0: strategy).

Case 2

H0: Null model: Usage against Image

H1: Alternative model: Usage against Strategy

In terms of our notation, our x’s are the image indicators while the w’s are the strategy indicators. The three steps are:

1. Regression: X1 on X2, X3, X7 and save the predictions (pre_5).
2. 2SLS regression: X9 on X4 and pre_5 (instruments: X6 and pre_5).
3. The t-ratio on the pre_5 variable is the test statistic.
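The Stata equivalent is:

* Step 1: auxiliary regression of X1 on the strategy instruments.
regress X1 X2 X3 X7
predict PRE_5
* Step 2: 2SLS with PRE_5 as an included exogenous regressor and X6 as the excluded instrument.
ivregress 2sls X9 PRE_5 (X4 = X6)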

The output from this 2SLS regression is:

Two-stage Least Squares

Dependent variable.. X9

Multiple R .53666

R Square .28800

Adjusted R Square .27332

Standard Error 7.68902

------------------ Variables in the Equation ------------------

Variable B SE B Beta T Sig T

X4 3.227772 .886499 .406279 3.641 .0004

PRE_5 6.010515 1.032240 .515718 5.823 .0000

(Constant) 8.033696 6.596728 1.218 .2262

Comments: The t-ratio for pre_5 is 5.823 with a p-value of 0.0000; this is highly significant. This implies that the alternative model (H1: strategy) rejects the null model (H0: image).

In summary, these results imply that the two models reject each other, and therefore it is erroneous to use either in isolation.


2SLS HATCO STATA EXAMPLE

This section presents the STATA code corresponding to the SPSS example.

* Original 2SLS model

ivregress 2sls X9 (X1 X4 = X2 X3 X7 X6)

predict FIT_1

predict ERR_1, r

* 2 step OLS version to get 2SLS predictions, residuals and GR^2

regress X1 X2 X3 X6 X7

predict PRE_1

regress X4 X2 X3 X6 X7

predict PRE_2

regress X9 PRE_1 PRE_2

predict PRE_3

predict RES_1, r

* Over-identifying restrictions test

regress ERR_1 X2 X3 X6 X7

gen OIR=e(N)*e(r2)

display OIR

*RESET test

gen PRE_32=PRE_3*PRE_3

* PRE_32 is an included exogenous regressor, so ivregress uses it as an instrument automatically
ivregress 2sls X9 PRE_32 (X1 X4 = X2 X3 X7 X6)

* Heteroscedasticity Test

gen ERR_12=ERR_1*ERR_1

regress ERR_12 PRE_32

* Interactions Model Specification

gen X1X4=X1*X4

gen X2X6=X2*X6

gen X3X6=X3*X6

gen X7X6=X7*X6

ivregress 2sls X9 (X1 X4 X1X4 = X2 X3 X7 X6 X2X6 X3X6 X7X6)

* Non-nested Test Case 1

regress X4 X6

predict PRE_4

* PRE_4 is an included exogenous regressor, so it is used as an instrument automatically
ivregress 2sls X9 PRE_4 (X1 = X2 X3 X7)

* Non-nested Test Case 2

regress X1 X2 X3 X7

predict PRE_5

* PRE_5 is an included exogenous regressor, so it is used as an instrument automatically
ivregress 2sls X9 PRE_5 (X4 = X6)


2SLS HATCO SHAZAM EXAMPLE

This section presents the SHAZAM code corresponding to the SPSS example.

* Original 2SLS model

2SLS X9 X1 X4 (X2 X3 X7 X6) / PREDICT=FIT_1 RESID=ERR_1
* 2 step OLS version to get 2SLS predictions, residuals and GR^2

OLS X1 X2 X3 X6 X7 / PREDICT=PRE_1
OLS X4 X2 X3 X6 X7 / PREDICT=PRE_2

OLS X9 PRE_1 PRE_2 / PREDICT=PRE_3 RESID=RES_1

* Over-identifying restrictions test
OLS ERR_1 X2 X3 X6 X7

*RESET test
GENR PRE_32=PRE_3*PRE_3

2SLS X9 X1 X4 PRE_32 (X2 X3 X7 X6 PRE_32)
* Heteroscedasticity Test

GENR ERR_12=ERR_1*ERR_1

OLS ERR_12 PRE_32
* Interactions Model Specification

GENR X1X4=X1*X4
GENR X2X6=X2*X6

GENR X3X6=X3*X6

GENR X7X6=X7*X6

2SLS X9 X1 X4 X1X4 (X2 X3 X7 X6 X2X6 X3X6 X7X6)

* Non-nested Test Case 1
OLS X4 X6 / PREDICT=PRE_4

2SLS X9 X1 PRE_4 (X2 X3 X7 PRE_4)

* Non-nested Test Case 2

OLS X1 X2 X3 X7 / PREDICT=PRE_5

2SLS X9 X4 PRE_5 (X6 PRE_5)