Analysis of Data Using MR and GLM

This lecture: ANOVA via MR and GLM
Dimensions of Research Problems
We’ll start at the top of this box and work our way through it.
We’ll illustrate analysis with MR procedure and with the GLM procedure.
Ideally, at the end of this, you’ll be comfortable using either procedure to perform the same analyses.
ANOVA via MR and GLM 1 5/14/2023
Research Involving a mixture of qualitative and quantitative factors
Research involving only comparison of Group means – Qualitative Factors only
Research involving only quantitative factors
[Diagrams of example research problems:
Qualitative factors only: Training Program, Gender -> Performance after training
Mixture of qualitative and quantitative: Training Program, Gender, Cognitive Ability -> Performance
Quantitative factors only: Cognitive Ability, Conscientiousness, Emotional Stability -> GPA]
One Qualitative Factor: One Way ANOVA

Manipulation Check on Faking Data. This example is taken from the Rosetta project data. It is from Raven Worthy’s thesis, completed in 2012. Raven compared three conditions defined by different instructions regarding faking.
In each condition, participants were given a personality test (the IPIP 50-item Sample Big 5 questionnaire). Then, they were split into 3 groups.
Condition 1: Given instructions to respond honestly. Given a 2nd Big 5 questionnaire (IPIP Other 50 items).
Condition 2: Given instructions to fake good. Given the same 2nd Big 5 questionnaire.
Condition 3: Given instructions to respond honestly but told that the highest score would receive a gift certificate.
Dependent Variable: The dependent variable is a difference score – mean score on the 2nd questionnaire (all 50 items) minus mean score on the 1st questionnaire.
Expectation for DV values:
0 means the person did not fake on the 2nd administration relative to the first.
Positive means the person faked “good” on the 2nd administration relative to the first.
Negative means the person faked “bad” on the 2nd administration relative to the first.
Expectation for Conditions:
Condition 1: DV ≈ 0 – honest response
Condition 2: DV >> 0 – instructed to fake good
Condition 3: DV > 0 – incentive to fake good
A snippet of the data (Dependent variable is boldfaced) . . .
b5origmean b5othmean b5meandiff ncondit
4.28  4.02  -.26  1
5.20  5.34   .14  1
4.16  3.70  -.46  1
4.68  4.16  -.52  1
5.76  5.62  -.14  1
4.54  4.40  -.14  1
4.96  4.34  -.62  1
5.16  5.06  -.10  1
4.22  3.98  -.24  1
5.36  5.16  -.20  1
5.42  5.38  -.04  1
5.06  4.65  -.41  1
4.82  5.04   .22  1
5.12  5.34   .22  1
4.58  4.46  -.12  1
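The DV can be reproduced directly from the two questionnaire means. A minimal Python sketch, not part of the SPSS workflow, using the first three rows of the snippet above:

```python
# First three (b5origmean, b5othmean) pairs from the data snippet
rows = [(4.28, 4.02), (5.20, 5.34), (4.16, 3.70)]

# b5meandiff = 2nd administration minus 1st administration
diffs = [round(post - pre, 2) for pre, post in rows]
print(diffs)  # [-0.26, 0.14, -0.46]
```

Negative values here mean the participant scored lower on the 2nd administration – “faking bad” in the terms above.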
Preliminary examination of data using dot plots . . .
It looks as if the manipulation worked. It appears that the participants in Condition 2 had more positive difference scores (more faking good) than did those in the other two conditions.
It also appears that the Incentive condition 3 resulted in more positive scores than Condition 1.
But the devil is in the p-values.
Note that the ncondit=2 scores seem more variable than the others!!!????
Various pieces of information that could/should be presented . . .
1. Means and SD’s.
2. Plots of means.
3. Tests of assumptions.
4. Tests of significance.
5. Effect sizes and observed power.
6. Post hoc tests, if appropriate.
The data that were analyzed
Note that the dependent variable is a difference score – post instruction minus pre-instruction.
The analysis: Analyze -> General Linear Model -> Univariate
We will always put the name of the nominal / categorical variable defining group membership into the Fixed Factor(s) field.
(No group coding variables needed for GLM.)
We won’t be using the Random Factor(s) field in this class.
Descriptive Statistics.
This output was obtained by clicking on the [Options] button on the GLM dialog box and then checking Descriptive Statistics.
Note this is a quick-and-dirty way to get group means and SD’s.
The Output
Univariate Analysis of Variance

Between-Subjects Factors
ncondit  N
1        110
2        108
3        110
Descriptive Statistics
Dependent Variable: b5meandiff
ncondit Mean Std. Deviation N
1 -.0861 .27205 110
2 .6228 .84490 108
3 .1975 .47792 110
Total .2424 .64744 328
Levene's Test of Equality of Error Variances(a)
Dependent Variable: b5meandiff
F       df1  df2  Sig.
76.376  2    325  .000
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + ncondit
Homogeneity tests.
The variances were significantly different across groups. We’ll have to interpret the mean comparisons cautiously.
The following is the default output of the GLM procedure.
Tests of Between-Subjects Effects
Dependent Variable: b5meandiff
Source  Type III Sum of Squares  df  Mean Square  F  Sig.  Partial Eta Squared  Noncent. Parameter  Observed Power(b)
Corrected Model 27.727a 2 13.863 41.205 .000 .202 82.409 1.000
Intercept 19.643 1 19.643 58.382 .000 .152 58.382 1.000
ncondit 27.727 2 13.863 41.205 .000 .202 82.409 1.000
Error 109.346 325 .336
Total 156.349 328
Corrected Total 137.073 327
a. R Squared = .202 (Adjusted R Squared = .197)
b. Computed using alpha = .05
How did GLM analyze the data?
GLM formed two group-coding variables and performed a regression of b5meandiff onto those variables. The corrected model F is simply the F testing the relationship of b5meandiff to all the predictors – just two in this case. The ncondit F tests the relationship of b5meandiff to just the two group-coding variables.
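The equivalence can be sketched in a few lines of Python (illustrative data, not the thesis data): the one-way ANOVA F computed from sums of squares is identical to the F computed from the R² of a regression on group-coding variables, because R² = SS-between / SS-total.

```python
from statistics import mean

# Illustrative data: three hypothetical groups of scores
groups = [[3, 4, 5], [6, 7, 8], [1, 2, 3]]

grand = mean(x for g in groups for x in g)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
ss_total = ss_between + ss_within
df_b = len(groups) - 1
df_w = sum(len(g) for g in groups) - len(groups)

# ANOVA F from mean squares
f_anova = (ss_between / df_b) / (ss_within / df_w)

# Same F from the regression R-squared (R^2 = SS-between / SS-total)
r2 = ss_between / ss_total
f_reg = (r2 / df_b) / ((1 - r2) / df_w)

print(round(f_anova, 3), round(f_reg, 3))  # both 19.0
```

The two routes to F always agree, which is why GLM can do ANOVA by running a regression behind the scenes.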
Observed Power: Probability of a significant F if population mean differences were identical to the observed sample mean differences.
If we redid the study drawing another sample from the population and the population means were identical to the sample means obtained here, the likelihood of a significant difference would be 1.000.
Eta squared. The most common estimate of effect size for analysis of variance.
Small: .01
Medium: .059
Large: .138
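Eta squared can be recovered directly from the ANOVA table: it is the effect sum of squares divided by the corrected total sum of squares. Using the values from the table above:

```python
# Sums of squares from the Tests of Between-Subjects Effects table
ss_ncondit = 27.727
ss_corrected_total = 137.073

eta_squared = ss_ncondit / ss_corrected_total
print(round(eta_squared, 3))  # 0.202 -- matching the Partial Eta Squared column
```

By the benchmarks listed above, .202 is a large effect.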
Test of significance of relationship of DV to all predictors - MR ANOVA
p values. Note that a 2nd p-value is a test that the intercept = 0. Don’t mistake that test for the ones you’re interested in.
GLM created the two group-coding variables to represent differences in ncondit. In this example, those are the only two group-coding variables that were created. In most analyses, there will be many more than just two.
What should we conclude?
The Oneway ANOVA F is completely valid only when the population variances are all equal. The Levene test comparing the variances was significant, indicating that the variances are not equal. This suggests that we should interpret the differences cautiously. We could perform an additional nonparametric test – the Kruskal-Wallis nonparametric ANOVA, for example. Or we could consider transformations of the data to equalize the variances.
In this case, however, the effect size is so big that I would feel comfortable arguing that there are significant differences between the means. If the effect size had been small (close to .01) and the p-value large (nearly .05), then I would have less confidence arguing that the means are different.
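If we went the nonparametric route, the statistic could be sketched as follows – a bare-bones Python version of the Kruskal-Wallis H. Assumptions: this simplified version omits the tie correction SPSS applies, and the data are illustrative, not the thesis data.

```python
from itertools import chain

def kruskal_h(*groups):
    """Kruskal-Wallis H (no tie correction): rank all scores together,
    then compare the rank sums of the groups."""
    pooled = sorted(chain.from_iterable(groups))
    # assign average (mid) ranks; ranks are 1-based
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(pooled)
    return 12 / (n * (n + 1)) * sum(
        sum(rank[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

# Three clearly separated groups: H should be large
print(round(kruskal_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 2))  # 7.2
```

Because H depends only on ranks, it is unaffected by the unequal variances that worried us above.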
Post Hoc Tests
Knowing that the variances were not equal, I went back to the analysis specification and asked for the Scheffe test, which is a quite conservative post hoc test, and four other post hoc tests designed specifically for situations in which variances are not assumed to be equal.
I got the following output . . .
Post Hoc Tests
The p-values of the nonredundant comparisons are highlighted in red.
Multiple Comparisons
Dependent Variable: b5meandiff
(I) ncondit  (J) ncondit  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
Scheffe 1 2 -.7090* .07857 .000 -.9022 -.5158
3 -.2836* .07821 .002 -.4759 -.0913
2 1 .7090* .07857 .000 .5158 .9022
3 .4254* .07857 .000 .2322 .6186
3 1 .2836* .07821 .002 .0913 .4759
2 -.4254* .07857 .000 -.6186 -.2322
Tamhane 1 2 -.7090* .08534 .000 -.9154 -.5025
3 -.2836* .05243 .000 -.4100 -.1572
2 1 .7090* .08534 .000 .5025 .9154
3 .4254* .09320 .000 .2006 .6502
3 1 .2836* .05243 .000 .1572 .4100
2 -.4254* .09320 .000 -.6502 -.2006
Dunnett T3 1 2 -.7090* .08534 .000 -.9153 -.5026
3 -.2836* .05243 .000 -.4100 -.1572
2 1 .7090* .08534 .000 .5026 .9153
3 .4254* .09320 .000 .2007 .6501
3 1 .2836* .05243 .000 .1572 .4100
2 -.4254* .09320 .000 -.6501 -.2007
Games-Howell 1 2 -.7090* .08534 .000 -.9113 -.5066
3 -.2836* .05243 .000 -.4075 -.1596
2 1 .7090* .08534 .000 .5066 .9113
3 .4254* .09320 .000 .2050 .6458
3 1 .2836* .05243 .000 .1596 .4075
2 -.4254* .09320 .000 -.6458 -.2050
Dunnett C 1 2 -.7090* .08534 -.9118 -.5062
3 -.2836* .05243 -.4082 -.1590
2 1 .7090* .08534 .5062 .9118
3 .4254* .09320 .2039 .6469
3 1 .2836* .05243 .1590 .4082
2 -.4254* .09320 -.6469 -.2039
Based on observed means.
The error term is Mean Square(Error) = .336.
*. The mean difference is significant at the .05 level.
The Dunnett C test gives only lower and upper bounds of a confidence interval for each pair.
Homogeneous Subsets

b5meandiff
                       Subset
ncondit  N       1        2        3
1        110     -.0861
3        110              .1975
2        108                       .6228
Sig.             1.000    1.000    1.000
Means for groups in homogeneous subsets are displayed.
Based on observed means.
The error term is Mean Square(Error) = .336.
a. Uses Harmonic Mean Sample Size = 109.325.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
c. Alpha = .05.
All of the post-hoc tests agreed that all three means were different from each other.
1 vs 2: Significant
1 vs 3: Significant
2 vs 3: Significant
So, it appears that the manipulation was successful.
When participants were told to respond honestly, they changed their responses the least from pre-instruction to post-instruction (mean diff = -.09).
When they were instructed to fake good, they did (mean diff = 0.62).
When they were given a mild incentive to fake good, they did, a little (mean diff = 0.20).
Wonderful. The manipulation worked!!
Profile Plots
Note: SPSS’s plotting algorithm adjusts the scale values so the graph fills as much of the rectangular space as possible.
This can make nonsignificant differences look bigger than they really are.
It’s important to note that the means that are plotted are estimated means – means estimated assuming all participants had the same value on any covariates.
In this case, there were no covariates, so the estimated means are the same as the observed means.
But in analyses involving covariates (quantitative predictors), the estimated means and observed means are not equal.
GLM’s Group Coding Variables in Oneway ANOVA

I said above that GLM forms Group Coding Variables to conduct the ANOVA. What information on them is available? For example, what coding scheme does SPSS use?
This example was taken from Aron and Aron, p. 318. Persons in 3 groups rated guilt of a defendant after having read a background sheet on the defendant. For those in Group 1, the background sheet mentioned a criminal record. For those in Group 2, the sheet mentioned a clean record. For those in Group 3, the background sheet gave no information on the defendant's background. Question: Are the differences in guilt ratings significant? Each case is a jury, created just for this experiment.
Getting GLM to display Regression information.
Information on the regression analyses performed in GLM, including the group coding variables, is displayed by requesting the “Parameter Estimates” table, as shown in the dialog box on the right.
Case Summaries(a)

      RATING  GROUP                        DC1  DC2  EC1  EC2
1     10.00   1.00 Criminal Record Group    1    0    1    0
2      7.00   1.00 Criminal Record Group    1    0    1    0
3      5.00   1.00 Criminal Record Group    1    0    1    0
4     10.00   1.00 Criminal Record Group    1    0    1    0
5      8.00   1.00 Criminal Record Group    1    0    1    0
6      5.00   2.00 Clean Record Group       0    1    0    1
7      1.00   2.00 Clean Record Group       0    1    0    1
8      3.00   2.00 Clean Record Group       0    1    0    1
9      7.00   2.00 Clean Record Group       0    1    0    1
10     4.00   2.00 Clean Record Group       0    1    0    1
11     4.00   3.00 No Information Group     0    0   -1   -1
12     6.00   3.00 No Information Group     0    0   -1   -1
13     9.00   3.00 No Information Group     0    0   -1   -1
14     3.00   3.00 No Information Group     0    0   -1   -1
15     3.00   3.00 No Information Group     0    0   -1   -1
N Total  15   15                           15   15   15   15
RATING: Rating of Guilt. 10 = max.
a. Limited to first 100 cases.
Analysis Illustrating Default GLM group coding variables.
Univariate Analysis of Variance
Output from the REGRESSION procedure using the default dummy codes
So, the default group coding variables estimated in GLM are dummy codes.
I used the REGRESSION procedure to determine what the GLM procedure’s Parameter Estimates table was giving.
Note that the output of the Parameter Estimates box is the same as the regression analysis of dummy codes.
This is new information – the Parameter Estimates Table requested above. The codes are Dummy Codes with the last group as the comparison group. I figured this out by doing a regression analysis with various codes until I found a match.
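With dummy codes and the last group as the comparison group, the regression weights have a simple interpretation: the intercept is the comparison group’s mean, and each B is a group mean minus the comparison group’s mean. A quick Python check (outside SPSS) using the guilt-rating data above:

```python
from statistics import mean

# Guilt ratings from the Case Summaries table
criminal = [10, 7, 5, 10, 8]   # GROUP 1
clean    = [5, 1, 3, 7, 4]     # GROUP 2
no_info  = [4, 6, 9, 3, 3]     # GROUP 3 (comparison group)

intercept = mean(no_info)            # 5.000 in the Parameter Estimates table
b1 = mean(criminal) - mean(no_info)  # 3.000 = [GROUP=1.00]
b2 = mean(clean) - mean(no_info)     # -1.000 = [GROUP=2.00]
print(intercept, b1, b2)
```

These are exactly the B values in the Parameter Estimates box, which is what confirms that SPSS used dummy coding with Group 3 as the reference.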
Since I did not request Eta-squared and Power for this analysis, they are not displayed here.
Between-Subjects Factors
GROUP  Value Label             N
1.00   Criminal Record Group   5
2.00   Clean Record Group      5
3.00   No Information Group    5

Descriptive Statistics
Dependent Variable: RATING Rating of Guilt. 10 = max.
GROUP                         Mean    Std. Deviation  N
1.00  Criminal Record Group   8.0000  2.1213          5
2.00  Clean Record Group      4.0000  2.2361          5
3.00  No Information Group    5.0000  2.5495          5
Total                         5.6667  2.7689          15
Tests of Between-Subjects Effects
Dependent Variable: RATING Rating of Guilt. 10 = max.
Source           Type III Sum of Squares  df  Mean Square  F       Sig.
Corrected Model  43.333(a)                2   21.667       4.063   .045
Intercept        481.667                  1   481.667      90.312  .000
GROUP            43.333                   2   21.667       4.063   .045
Error            64.000                   12  5.333
Total            589.000                  15
Corrected Total  107.333                  14
a. R Squared = .404 (Adjusted R Squared = .304)
Parameter Estimates
Dependent Variable: RATING Rating of Guilt. 10 = max.
Parameter     B       Std. Error  t      Sig.  95% CI Lower  95% CI Upper
Intercept     5.000   1.033       4.841  .000  2.750         7.250
[GROUP=1.00]  3.000   1.461       2.054  .062  -.182         6.182
[GROUP=2.00]  -1.000  1.461       -.685  .507  -4.182        2.182
[GROUP=3.00]  0(a)    .           .      .     .             .
a. This parameter is set to zero because it is redundant.
Coefficients(a)
Model 1     B       Std. Error  Beta   t      Sig.
(Constant)  5.000   1.033              4.841  .000
DC1         3.000   1.461       .529   2.054  .062
DC2         -1.000  1.461       -.176  -.685  .507
a. Dependent Variable: RATING Rating of Guilt. 10 = max.
Analysis Illustrating Requesting your own GLM Contrasts
UNIANOVA rating BY group
  /CONTRAST (group)=Deviation
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /PRINT = DESCRIPTIVE PARAMETER
  /CRITERIA = ALPHA(.05)
  /DESIGN = group .
Univariate Analysis of Variance
So GLM always displays dummy variable codes in the Parameter estimates box.
Same data analyzed. This time, I clicked on the [Contrasts...] button and requested Deviation Contrasts. Deviation contrasts are the same as Effects contrasts.
What is displayed in the Parameter Estimates Box is unaffected by the fact that Deviation Contrasts were chosen. What is (always) displayed here is Dummy Codes with the last group as the comparison group.
Syntax, if you’re interested.
Same output as in the above two examples.
No effect sizes or power values displayed because I did not ask for them. I did ask for Parameter Estimates.
Between-Subjects Factors
GROUP  Value Label             N
1.00   Criminal Record Group   5
2.00   Clean Record Group      5
3.00   No Information Group    5
Descriptive Statistics
Dependent Variable: RATING Rating of Guilt. 10 = max.
GROUP                         Mean    Std. Deviation  N
1.00  Criminal Record Group   8.0000  2.1213          5
2.00  Clean Record Group      4.0000  2.2361          5
3.00  No Information Group    5.0000  2.5495          5
Total                         5.6667  2.7689          15
Tests of Between-Subjects Effects
Dependent Variable: RATING Rating of Guilt. 10 = max.
Source           Type III Sum of Squares  df  Mean Square  F       Sig.
Corrected Model  43.333(a)                2   21.667       4.063   .045
Intercept        481.667                  1   481.667      90.312  .000
GROUP            43.333                   2   21.667       4.063   .045
Error            64.000                   12  5.333
Total            589.000                  15
Corrected Total  107.333                  14
a. R Squared = .404 (Adjusted R Squared = .304)
Parameter Estimates
Dependent Variable: RATING Rating of Guilt. 10 = max.
Parameter     B       Std. Error  t      Sig.  95% CI Lower  95% CI Upper
Intercept     5.000   1.033       4.841  .000  2.750         7.250
[GROUP=1.00]  3.000   1.461       2.054  .062  -.182         6.182
[GROUP=2.00]  -1.000  1.461       -.685  .507  -4.182        2.182
[GROUP=3.00]  0(a)    .           .      .     .             .
a. This parameter is set to zero because it is redundant.
Analysis Illustrating Requesting your own contrasts continued.
Results of user-requested contrasts are presented in the Custom Hypothesis Section.
Results from REGRESSION, for comparison.
The bottom line is that the Parameter Estimates box is pretty useless for ANOVA applications unless you're interested in dummy coding. It is most useful for displaying REGRESSION-like information on quantitative variables.
On the other hand, the Contrast Results box gives you a lot of information about your contrasts, although it omits the t-statistic values (???!!!). (But we only look at the p-values anyway, right?)
The Regression Procedure Coefficients box using Effects Coding.
Note that the significance values for EC1 and EC2 are identical to the significance values for the two deviation contrasts above (.017 and .072).
This box is the result of the specification of a contrast other than the “None” default.
“Deviation” coding was specified. This is the same as what we have called Effects coding.
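Under effects (deviation) coding, the weights estimate deviations from the unweighted grand mean: the intercept is the mean of the three group means, and each B is a group mean minus that grand mean. A quick Python check against the SPSS output:

```python
from statistics import mean

# Group means of the guilt ratings (Criminal, Clean, No Information)
group_means = [8.0, 4.0, 5.0]

grand = mean(group_means)     # intercept under effects coding
ec1 = group_means[0] - grand  # Level 1 vs. Mean
ec2 = group_means[1] - grand  # Level 2 vs. Mean
print(round(grand, 3), round(ec1, 3), round(ec2, 3))  # 5.667 2.333 -1.667
```

These match the Contrast Estimates (2.333 and -1.667) and the effects-coded regression coefficients in the output.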
Contrast Results (K Matrix)(a)
Dependent Variable: RATING Rating of Guilt. 10 = max.
GROUP Deviation Contrast

Level 1 vs. Mean
  Contrast Estimate                      2.333
  Hypothesized Value                     0
  Difference (Estimate - Hypothesized)   2.333
  Std. Error                             .843
  Sig.                                   .017
  95% CI for Difference                  .496 to 4.171

Level 2 vs. Mean
  Contrast Estimate                      -1.667
  Hypothesized Value                     0
  Difference (Estimate - Hypothesized)   -1.667
  Std. Error                             .843
  Sig.                                   .072
  95% CI for Difference                  -3.504 to .171

a. Omitted category = 3
Test Results
Dependent Variable: RATING Rating of Guilt. 10 = max.
Source    Sum of Squares  df  Mean Square  F      Sig.
Contrast  43.333          2   21.667       4.063  .045
Error     64.000          12  5.333
Coefficients(a)
Model 1     B       Std. Error  Beta   t       Sig.
(Constant)  5.667   .596               9.503   .000
EC1         2.333   .843        .712   2.767   .017
EC2         -1.667  .843        -.509  -1.976  .072
a. Dependent Variable: RATING Rating of Guilt. 10 = max.
Analysis of Between-Subjects Factorial Designs – Issues
Factorial Designs – a review
Definition
Research with 2 or more factors in which data have been gathered at all combinations of levels of all factors.
Typical representation is as a Two Way Table.
Rows of the table represent one factor – i.e., the Row Factor.
Columns of the table represent the Column Factor.
Cells represent individual groups of persons observed at each combination of factor levels.
                    Factor 1
                    Level 1  Level 2
Factor 2  Level 1
          Level 2
Note - each factor varies completely within each level of the other.
All levels of each factor appear at all levels of the other(s).
Called a completely crossed design because the variation in each variable completely crosses the other variable.
Three Way factorial designs are often represented by separate layers of two way tables
Example: Factor 1 = Type of Training Program, say Lecture vs. Computerized; Factor 2 = Gender; Factor 3 = Job level – 1st-line managers vs. middle managers.
1st-line Managers
         Lecture  Comp
Male
Female

Middle Managers
         Lecture  Comp
Male
Female
Analyses of 2x2 Factorial Designs using GLM
The data
The data are from a study by a surgeon at Erlanger on the effect of helmet use on injuries from ATV accidents. The factors investigated here are:
HELMET: Whether the driver was wearing a helmet or not, with 2 levels.
ROLLOVER: Whether the ATV rolled over or not, with 2 levels.
The dependent variable is log of the Injury Severity Score (ISS). The larger the ISS value, the more severe the injury. The logarithm was used to make the distribution of the dependent variable more nearly symmetric and less positively skewed.
FYI - Here’s a comparison of the distribution of raw ISS vs. log ISS scores.
Skewness values are:

Statistics
                        iss    lgiss
N  Valid                500    500
   Missing              0      0
Skewness                2.263  -.624
Std. Error of Skewness  .109   .109
So, the ISS values are clearly more positively skewed than are the log ISS values which are actually slightly negatively skewed.
We’ll analyze the logs of the ISS values.
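The effect of the log transform on skewness can be sketched in Python. Assumption: the “severity” scores below are made up for illustration – they are not the ATV data – and the skewness function is the simple moment-based estimate, not SPSS’s adjusted version.

```python
from math import log10

def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Made-up, right-skewed "severity" scores (illustrative only, not the ATV data)
raw = [1, 2, 2, 3, 3, 3, 4, 4, 6, 30]
logs = [log10(x) for x in raw]

print(skewness(raw) > skewness(logs))  # True: the log transform reduces skew
```

Pulling in the long right tail is exactly why the log was applied to the ISS scores.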
[Histograms: Original ISS (left) vs. Log10 ISS (right)]
Expectations:
I would expect higher log ISS scores for those not wearing helmets.
I would expect higher ISS scores for those who did not roll over, assuming that no rollover represents a collision. That is, they’re in the hospital for a reason. If they didn’t roll over, they must have hit something.
I would expect the effect of helmet use to be greater among those who did roll (bigger difference between no helmet and helmet) and less among those who did not (smaller difference between no helmet and helmet) – that is, I would expect an interaction of HELMET and ROLLOVER. I could be wrong.
Here’s a small part of the data matrix . . .
GLM Analysis
The syntax, if you’re interested . . .
GLM lgiss BY helmet rollover
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /PLOT = PROFILE( helmet*rollover rollover*helmet )
  /PRINT = DESCRIPTIVE ETASQ OPOWER HOMOGENEITY
  /EMMEANS = TABLES(helmet*rollover)
  /CRITERIA = ALPHA(.05)
  /DESIGN = helmet rollover helmet*rollover
Univariate Analysis of Variance [DataSet3] G:\MdbT\InClassDatasets\ATVDataForClass050906.sav
Manually produced Two-way Table
Only the main effect of helmet usage was significant.
There was no main effect of rollover – persons who rolled over did not have significantly different mean log ISS scores.
There was no interaction of HELMET and ROLLOVER. The effect of wearing a helmet was not significantly different among those who rolled over vs. among those who crashed into something. Note, however, that there was a large numeric difference – .08 vs. .20 – between the two pairs of means. It was large numerically, but not large enough to be statistically significant.
             No Helmet  Helmet  Margin   Helmet effect
Rollover     .90        .82     .89      .08
No Rollover  .98        .78     .95      .20
Margin       .93        .80     .91

Tests significance of interaction: Difference between the effect of Helmet in the Rollover group vs. the effect of Helmet in the No Rollover group: .08 vs. .20.
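The interaction test can be read straight off the cell means: it asks whether the helmet effect differs across the rollover conditions. A small Python check (outside SPSS) using the cell means from the table above:

```python
# Cell means of log ISS from the two-way table
means = {
    ("rollover", "no_helmet"): .90, ("rollover", "helmet"): .82,
    ("no_rollover", "no_helmet"): .98, ("no_rollover", "helmet"): .78,
}

# Helmet effect within each rollover condition
helmet_effect_roll = round(
    means[("rollover", "no_helmet")] - means[("rollover", "helmet")], 2)
helmet_effect_noroll = round(
    means[("no_rollover", "no_helmet")] - means[("no_rollover", "helmet")], 2)

print(helmet_effect_roll, helmet_effect_noroll)  # 0.08 0.2
# The interaction F tests whether .08 and .20 differ;
# here F = 1.474, p = .225, so the difference is not significant.
```

A nonzero difference between .08 and .20 in the sample does not by itself imply an interaction in the population, which is the point made above.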
Between-Subjects Factors
                    Value Label  N
helmet helmet use   0  no        320
                    1  yes       61
rollover rollover   0  no        137
                    1  yes       244
Descriptive Statistics
Dependent Variable: lgiss
helmet use  rollover  Mean   Std. Deviation  N
0 no        0 no      .9835  .36049          115
            1 yes     .9033  .35490          205
            Total     .9321  .35843          320
1 yes       0 no      .7763  .27251          22
            1 yes     .8194  .32152          39
            Total     .8039  .30315          61
Total       0 no      .9503  .35529          137
            1 yes     .8899  .35050          244
            Total     .9116  .35296          381
Levene's Test of Equality of Error Variances(a)
Dependent Variable: lgiss
F     df1  df2  Sig.
.820  3    377  .483
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + helmet + rollover + helmet * rollover
Tests of Between-Subjects Effects
Dependent Variable: lgiss
Source             Type III SS  df   Mean Square  F         Sig.  Partial Eta Sq.  Noncent. Param.  Observed Power(a)
Corrected Model    1.343(b)     3    .448         3.670     .012  .028             11.009           .800
Intercept          143.244      1    143.244      1174.063  .000  .757             1174.063         1.000
helmet             1.001        1    1.001        8.202     .004  .021             8.202            .815
rollover           .016         1    .016         .133      .715  .000             .133             .065
helmet * rollover  .180         1    .180         1.474     .225  .004             1.474            .228
Error              45.997       377  .122
Total              363.955      381
Corrected Total    47.340       380
a. Computed using alpha = .05
b. R Squared = .028 (Adjusted R Squared = .021)
Profile Plots
Which factor should be the horizontal axis? Both.
Both plots give the same information in different ways. One might be more useful than the other.
Plot 1: Horizontal axis is defined by helmet use.
Different heights of the lines (white ellipses are the marginal means) represent the Rollover effect. You can see that they’re not terribly or consistently different.
Comparison of left side points with right side points (red ellipses are the marginal means) represent the Helmet effect. Larger mean log ISS scores for no helmet group.
Lack of parallelism of the lines represents the interaction. The lines are crossed, but not so nonparallel as to represent a significant interaction.
helmet use * rollover
Dependent Variable: lgiss
helmet use  rollover  Mean  Std. Error  95% CI Lower  95% CI Upper
0 no        0 no      .984  .033        .919          1.048
0 no        1 yes     .903  .024        .855          .951
1 yes       0 no      .776  .074        .630          .923
1 yes       1 yes     .819  .056        .709          .929

The marginal means are identical to the observed means because there are no covariates or other factors to control for.
Plot 2: Horizontal axis defined by Rollover
This plot is probably easier to understand.
Different heights of the lines (red ellipses are the marginal means) represent the helmet effect. They’re quite different in height, reflecting the significant effect.
Heights of the left-side points vs. the right-side points (white ellipses are the marginal means) represent the rollover effect. Average height above “No” is about the same as average height above “Yes”.
Lack of parallelism represents the interaction. The lines are not parallel, but not so different in slope as to represent a significant interaction, although the difference between helmet use and nonuse is numerically (but not significantly) greater for nonrollover accidents. (The opposite of my expectation. OK – I really expected rollover accidents to show a greater helmet effect than nonrollover accidents. An example of post hoc hypothesizing – Monday-morning hypotheses.)
Analysis of a 2 x 3 Factorial Design using GLM Myers & Well, p. 127
Table 5.1 presents the data for 48 subjects run in a text recall experiment. The scores are percentages of idea units recalled. The text was presented at one of three rates – 300, 450, or 600 words per minute – and was either intact or scrambled.
        Rate
Text    G1  G2  G3
        G4  G5  G6

Cell  scram  rate
G1    1      1
G2    1      2
G3    1      3
G4    2      1
G5    2      2
G6    2      3
How the data should be entered into SPSS

Recall  rate  scram
72      1     1
63      1     1
57      1     1
52      1     1
69      1     1
75      1     1
68      1     1
74      1     1
49      2     1
71      2     1
63      2     1
48      2     1
68      2     1
65      2     1
52      2     1
63      2     1
40      3     1
49      3     1
36      3     1
50      3     1
54      3     1
46      3     1
46      3     1
26      3     1
65      1     2
45      1     2
Analysis of the 2x3 Equal-N Meyers/Well data using GLM

GET FILE='E:\MdbT\P595\ANOVAviaMR\Meyers_well p.127.sav'.
UNIANOVA dv BY row col
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /PLOT = PROFILE( col*row )
  /EMMEANS = TABLES(row)
  /EMMEANS = TABLES(col)
  /EMMEANS = TABLES(row*col)
  /PRINT = DESCRIPTIVE ETASQ OPOWER HOMOGENEITY
  /PLOT = SPREADLEVEL
  /CRITERIA = ALPHA(.05)
  /DESIGN = row col row*col .
Menu Sequence: Analyze -> GLM -> Univariate
Univariate Analysis of Variance
Manually created two-way table of means:

         Col 1  Col 2  Col 3  Margin
Row 1    66.25  59.87  43.38  56.50
Row 2    54.38  49.75  45.88  50.00
Margin   60.31  54.81  44.63  53.25
Between-Subjects Factors
        N
ROW  1  24
     2  24
COL  1  16
     2  16
     3  16
Descriptive Statistics
Dependent Variable: DV
ROW    COL    Mean   Std. Deviation  N
1      1      66.25  8.28            8
1      2      59.87  8.92            8
1      3      43.38  9.02            8
1      Total  56.50  12.91           24
2      1      54.38  5.83            8
2      2      49.75  7.23            8
2      3      45.88  7.43            8
2      Total  50.00  7.46            24
Total  1      60.31  9.24            16
Total  2      54.81  9.42            16
Total  3      44.63  8.09            16
Total  Total  53.25  10.94           48
Levene's Test of Equality of Error Variances(a)
Dependent Variable: DV
F     df1  df2  Sig.
.774  5    42   .574
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + ROW + COL + ROW * COL
The F associated with the "Corrected Model" above is the same as the F in the REGRESSION ANOVA box when all variables are in the equation. It's simply the significance of the relationship of Y to all of the variables coding main effects and interactions.
The F in the “Intercept” row is the square of the t for the Intercept in the MR. Don’t interpret it.
Observed Power: If the population means were equal to this sample’s means, observed power is the probability you would reject if you took a new sample.
Estimated Marginal Means
Estimated marginal mean: An estimate of what the mean would be if all scores were equal on all the other factors.
Equal to observed marginal means if 1) cell sizes are equal or 2) if there are no other factors.
These are "estimated" means for each row, column, and cell. Since sample sizes are equal, and there are no other factors, in this case the marginal means are equal to the observed means.
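With equal cell sizes and no covariates, each estimated marginal mean is just the ordinary average of the corresponding cell means. A quick Python check using the ROW * COL cell means:

```python
from statistics import mean

# Cell means from the ROW * COL table (rows 1-2, columns 1-3)
cells = {
    (1, 1): 66.250, (1, 2): 59.875, (1, 3): 43.375,
    (2, 1): 54.375, (2, 2): 49.750, (2, 3): 45.875,
}

row1_marginal = mean(cells[(1, c)] for c in (1, 2, 3))  # ROW 1 marginal mean
col1_marginal = mean(cells[(r, 1)] for r in (1, 2))     # COL 1 marginal mean
print(row1_marginal, col1_marginal)  # 56.5 60.3125
```

These reproduce the 56.500 and 60.313 shown in the Estimated Marginal Means tables. With unequal cell sizes, the estimated marginal means would instead be unweighted averages of cell means, which is where they diverge from the observed marginal means.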
When the interaction is significant, you must use caution in reporting the main effects – because they’re not “main” anymore – each is conditional on the value of the other factor.
Tests of Between-Subjects Effects
Dependent Variable: DV
Source           Type III SS  df  Mean Square  F         Sig.  Eta Squared  Noncent. Param.  Observed Power(a)
Corrected Model  3026.500(b)  5   605.300      9.791     .000  .538         48.956           1.000
Intercept        136107.000   1   136107.000   2201.615  .000  .981         2201.615         1.000
ROW              507.000      1   507.000      8.201     .007  .163         8.201            .799
COL              2027.375     2   1013.688     16.397    .000  .438         32.794           .999
ROW * COL        492.125      2   246.062      3.980     .026  .159         7.960            .682
Error            2596.500     42  61.821
Total            141730.000   48
Corrected Total  5623.000     47
a. Computed using alpha = .05
b. R Squared = .538 (Adjusted R Squared = .483)
1. ROW
Dependent Variable: DV
ROW  Mean    Std. Error  95% CI Lower  95% CI Upper
1    56.500  1.605       53.261        59.739
2    50.000  1.605       46.761        53.239
2. COL
Dependent Variable: DV
COL  Mean    Std. Error  95% CI Lower  95% CI Upper
1    60.313  1.966       56.346        64.279
2    54.813  1.966       50.846        58.779
3    44.625  1.966       40.658        48.592
3. ROW * COL
Dependent Variable: DV
ROW  COL  Mean    Std. Error  95% CI Lower  95% CI Upper
1    1    66.250  2.780       60.640        71.860
1    2    59.875  2.780       54.265        65.485
1    3    43.375  2.780       37.765        48.985
2    1    54.375  2.780       48.765        59.985
2    2    49.750  2.780       44.140        55.360
2    3    45.875  2.780       40.265        51.485
Spread-versus-Level Plots
Profile Plots
So, words forming text (Row 1) were recalled at a higher rate than words which were scrambled (Row 2) until the rate got so high that neither was recalled well.
Used to help decide whether variability within each cell increases as mean of a cell increases. If so, a transformation would be recommended.
I see no trend here, so no transformation will be made.
This is a plot of the "estimated" cell means printed in the above table. Since there are no covariates, they're equal to the observed means displayed in the table at the beginning of the output for this analysis.
This plot is the plot recommended for 2-way factorial designs to show in graphical fashion the form of main effects and interactions.
It shows that there was an interaction of the row and col effects: the difference between rows (red minus green) changed across columns - with a large difference favoring Row 1 (red) in columns 1 and 2, and a small or perhaps insignificant difference favoring Row 2 (green) in column 3.
These differences of differences are the interaction. If the differences were all equal, i.e., there were no differences of differences, the interaction would not be significant.
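That arithmetic can be verified directly; a minimal Python sketch using the estimated cell means transcribed from the ROW * COL table above:

```python
# Estimated cell means transcribed from the ROW * COL table above.
cell_means = {
    (1, 1): 66.250, (1, 2): 59.875, (1, 3): 43.375,
    (2, 1): 54.375, (2, 2): 49.750, (2, 3): 45.875,
}

# Row 1 minus Row 2 within each column: the "differences".
diffs = [cell_means[(1, c)] - cell_means[(2, c)] for c in (1, 2, 3)]
print(diffs)  # [11.875, 10.125, -2.5]
```

Because the three differences are not equal (they even change sign in column 3), the differences of differences are nonzero, which is what the significant interaction term reflects.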
[Figure: Spread vs. Level Plot of DV (Spread = Standard Deviation), Groups: ROW * COL]

[Figure: Spread vs. Level Plot of DV (Spread = Variance), Groups: ROW * COL]

[Figure: Profile plot of Estimated Marginal Means of DV across COL, with separate lines for ROW 1 and ROW 2]
Rcmdr Analysis of Myers/Well 2 x 3 Factorial Data – Start 10/2/17

Data → Import Data → From SPSS Dataset . . .
Change independent variables to factors in Rcmdr

Before performing an ANOVA using Rcmdr, make sure that all factors (qualitative independent variables) are recognized as “factors” by Rcmdr.
If you’re not sure, simply do the following.
Data → Manage Variables in Active Dataset → Convert Numeric Variables to Factor . . .
If a variable is already a factor, it will either have (Factor) after its name or it will not appear in the list of variables.
If you see a variable that you wish to be a factor in an ANOVA in the list, select it. Then click on [OK].
If you check [Supply level names], you’ll be asked for a value label for each level. Otherwise, Rcmdr will simply use the values of the variable as the labels.
Performing the ANOVA using Rcmdr
Statistics → Fit Model → Linear Model . . .
I double-clicked on the names in the scrolling field to put them into the Model formula line.
You can simply type them in, if you wish.
R uses A:B to represent the interaction term.
The Rcmdr output, after clicking OK . . .

> LinearModel.6 <- lm(dv ~ row + col + row:col, data=MyersWell)
> summary(LinearModel.6)
Call:lm(formula = dv ~ row + col + row:col, data = MyersWell)
Residuals:
    Min      1Q  Median      3Q     Max
-17.375  -4.781   2.188   5.844  11.125
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         66.250      2.780  23.832  < 2e-16 ***
row[T.2]           -11.875      3.931  -3.021  0.00428 **
col[T.2]            -6.375      3.931  -1.622  0.11238
col[T.3]           -22.875      3.931  -5.819 7.24e-07 ***
row[T.2]:col[T.2]    1.750      5.560   0.315  0.75450
row[T.2]:col[T.3]   14.375      5.560   2.586  0.01328 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.863 on 42 degrees of freedom
Multiple R-squared: 0.5382, Adjusted R-squared: 0.4833
F-statistic: 9.791 on 5 and 42 DF, p-value: 2.963e-06
Argh!! This is not like the GLM “Tests of Between Subjects Effects” output.
Getting Rcmdr to print an ANOVA summary table . . .
The output of the anova command . . .> anova(LinearModel.6)
Analysis of Variance Table
Response: dv
          Df  Sum Sq Mean Sq F value    Pr(>F)
row        1  507.00  507.00  8.2010  0.006508 **
col        2 2027.38 1013.69 16.3970 5.458e-06 ***
row:col    2  492.13  246.06  3.9802  0.026127 *
Residuals 42 2596.50   61.82
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Bingo!! – Same values as the GLM table. (page 24)
I typed anova(LinearModel.6) and then pressed the [Submit] button.
This table is like the Parameters table in SPSS. Each line represents a group coding variable.
Using REGRESSION to perform analyses of Factorial Designs - Skip

To assess each factor, Fs with the following form must be computed. The numerator difference is the increase in R2 uniquely attributable to the factor of interest, i.e., R2 for just the factor of interest.

                        (R2 for All Factors – R2 for All except factor of interest) / (Number of GCVs for factor of interest)
F(Factor of interest) = -----------------------------------------------------------------------------------------------------
                        (1 – R2 for All Factors) / (N – Number of GCVs for all factors – 1)
These analyses can be conducted using only the REGRESSION procedure.
To get SPSS REGRESSION to create this F,
0) Request SPSS to print F for R2 change.
1) Enter ALL GCVs, but ignore the output associated with this step.
2) Remove the GCVs for the 1st factor being tested, again ignoring the output associated with this step.
3) Re-enter them. The significance of F change assesses significance of the 1st factor.
4) Remove the GCVs for the 2nd factor.
5) Re-enter them. The significance of F change assesses significance of the 2nd factor.
6) Remove the GCVs for the interaction.
7) Re-enter them. The significance of F change assesses significance of the interaction.
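The F-for-R2-change formula can be checked numerically. A minimal Python sketch, using the Myers/Well 2 x 3 sums of squares reported later in this handout (SS total = 5623, SS for all five GCVs = 3026.5, SS without the row GCV = 2519.5; GLM reports F = 8.201 for ROW):

```python
def f_change(r2_full, r2_reduced, n, k_full, k_tested):
    """F for the R-squared change when k_tested GCVs are re-entered
    into an equation already containing the other GCVs."""
    numerator = (r2_full - r2_reduced) / k_tested
    denominator = (1.0 - r2_full) / (n - k_full - 1)
    return numerator / denominator

# Myers/Well example: convert sums of squares to R-squared values.
f_row = f_change(3026.5 / 5623, 2519.5 / 5623, n=48, k_full=5, k_tested=1)
print(round(f_row, 3))  # 8.201
```

This reproduces the ROW F from the GLM table, which is the point of the sequential enter/remove/re-enter procedure.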
Problem – how to get REGRESSION to use this as the denominator
Analysis of the 2x2 Factorial using REGRESSION -skip
The 2x2 coding.

                      Factor 2: Helmet
Factor 1: Rollover    G1   G2
                      G3   G4

Factor 1 (Rollover) compares G1+G2 with G3+G4.
Factor 2 (Helmet) compares G1+G3 with G2+G4.
Interaction compares G1-G2 with G3-G4, i.e., the contrast is (G1-G2)-(G3-G4).
To make the REGRESSION output identical to the GLM output, you must use contrast codes - no dummy variables or effect coding variables.
        Factor 1   Factor 2   Interaction
        GCV1       GCV2       GCV3
G1      +.5        +.5        +.25
G2      +.5        -.5        -.25
G3      -.5        +.5        -.25
G4      -.5        -.5        +.25
Creating the Contrast coding variables using syntax
recode rollover (0=-.5)(1=+.5) into gcv1.
recode helmet (0=-.5)(1=+.5) into gcv2.
compute gcv3 = gcv1*gcv2.
variable labels gcv1 "GCV representing ROLLOVER"
                gcv2 "GCV representing whether HELMET was used"
                gcv3 "GCV representing interaction of ROLLOVER and HELMET".
Specifying the regression using syntax
The increase in R2 associated with each factor must be tested controlling for all other factors.
This means that each factor must be added to the equation containing all the other factors.Usually this means that a sequential procedure described below must be followed.
The red’d terms below represent the addition of each comparison to the equation containing all the other comparisons.
regression variables = lgiss gcv1 gcv2 gcv3
  /descriptives = default corr
  /statistics = default cha
  /dep=lgiss
  /enter gcv1 gcv2 gcv3
  /remove gcv3 /enter gcv3
  /remove gcv2 /enter gcv2
  /remove gcv1 /enter gcv1.
Note – each effect – the Row Main Effect, the Col Main Effect, and the Interaction Effect – is represented by a collection of GCVs. In this example, however, each collection has just 1 contrast code.
Note that I entered gcv3 first. Some analysts will enter the interaction comparison first. If it’s not significant, they’ll leave the gcv for that comparison out of the equation. I chose to leave it in.
The regression output -skip
In the following, the red’d lines represent the entry of each factor into the equation containing all the other factors.
So the significance of the R2 change is assessed for Model 3, 5, and 7.
Variables Entered/Removeda
Model Variables Entered Variables Removed Method
1 gcv3, gcv1, gcv2b . Enter
2 .b gcv3c Remove
3 gcv3b . Enter
4 .b gcv2c Remove
5 gcv2b . Enter
6 .b gcv1c Remove
7 gcv1b . Enter
a. Dependent Variable: lgiss
b. All requested variables entered.
c. All requested variables removed.
Descriptive Statistics

                                                            Mean     Std. Deviation   N
lgiss                                                       .9116    .35296           381
gcv1 GCV representing HELMET usage                         -.3399    .36719           381
gcv2 GCV representing whether accident involved rollover    .1404    .48051           381
gcv3 GCV representing interaction of HELMET and ROLLOVER   -.0479    .24569           381
Correlations (Pearson Correlation)

         lgiss    gcv1     gcv2     gcv3
lgiss    1.000    -.133    -.082     .071
gcv1     -.133    1.000    -.001     .209
gcv2     -.082    -.001    1.000    -.665
gcv3      .071     .209    -.665    1.000

gcv1 = GCV representing HELMET usage; gcv2 = GCV representing whether accident involved rollover; gcv3 = GCV representing interaction of HELMET and ROLLOVER.
-skip

Model Summary

                                         Std. Error of    Change Statistics
Model   R       R Square   Adj. R Sq.   the Estimate     R Sq. Change   F Change   df1   df2   Sig. F Change
1       .168a   .028       .021         .34930            .028           3.670      3     377   .012
2       .157b   .025       .019         .34951           -.004           1.474      1     377   .225
3       .168c   .028       .021         .34930            .004           1.474      1     377   .225
4       .167d   .028       .023         .34890            .000            .133      1     377   .715
5       .168e   .028       .021         .34930            .000            .133      1     377   .715
6       .085f   .007       .002         .35261           -.021           8.202      1     377   .004
7       .168g   .028       .021         .34930            .021           8.202      1     377   .004
a. Predictors: (Constant), gcv3, gcv1, gcv2
b. Predictors: (Constant), gcv1, gcv2
c. Predictors: (Constant), gcv1, gcv2, gcv3
d. Predictors: (Constant), gcv1, gcv3
e. Predictors: (Constant), gcv1, gcv3, gcv2
f. Predictors: (Constant), gcv3, gcv2
g. Predictors: (Constant), gcv3, gcv2, gcv1
ANOVAa
Model Sum of Squares df Mean Square F Sig.
Note that each red p-value is the same as the corresponding p-value in the GLM output above.
Model 1   Regression    1.343     3   .448   3.670   .012b
          Residual     45.997   377   .122
          Total        47.340   380
Model 2   Regression    1.163     2   .582   4.762   .009c
          Residual     46.177   378   .122
          Total        47.340   380
Model 3   Regression    1.343     3   .448   3.670   .012d
          Residual     45.997   377   .122
          Total        47.340   380
Model 4   Regression    1.327     2   .663   5.451   .005e
          Residual     46.013   378   .122
          Total        47.340   380
Model 5   Regression    1.343     3   .448   3.670   .012f
          Residual     45.997   377   .122
          Total        47.340   380
Model 6   Regression     .343     2   .171   1.378   .253g
          Residual     46.998   378   .124
          Total        47.340   380
Model 7   Regression    1.343     3   .448   3.670   .012h
          Residual     45.997   377   .122
          Total        47.340   380
a. Dependent Variable: lgiss
b. Predictors: (Constant), gcv3, gcv1, gcv2
c. Predictors: (Constant), gcv1, gcv2
d. Predictors: (Constant), gcv1, gcv2, gcv3
e. Predictors: (Constant), gcv1, gcv3
f. Predictors: (Constant), gcv1, gcv3, gcv2
g. Predictors: (Constant), gcv3, gcv2
h. Predictors: (Constant), gcv3, gcv2, gcv1
-skip

If we’re only interested in the significance of the effects of the two factors and their interaction, then the coefficients box is of little interest to us.
Since each comparison involved only 1 group-coding variable, this analysis could have been done with one equation: an equation with all three GCVs in it, as illustrated by the circled p-values above.
But the use of only one equation works only for 2 x 2 factorial designs. Any design with more than 2 levels in a factor would require the sequential process that I followed above.
The example below illustrates the need to perform the analysis of more complex designs sequentially.
Note that each time all three GCVs were in the equation, the collection of p-values at the right was the same as the collection of p-values obtained with GLM.
This is because in this particular example, a regression analysis with all 3 GCVs includes the same tests – row main effect, column main effect, and interaction effect - as the GLM analysis.
Analysis of a 2 x 3 Factorial design using REGRESSION -skip
Factorial Designs in which one or more factors has more than 2 levels.
When one or more of the main effects has more than 2 levels, the analysis using MR gets a little more complicated. This is because a factor with more than 2 levels must be represented by 2 or more group-coding variables. And that means that the interaction will also be represented by more than 1 group-coding variable. The result is that the coefficients box will generally NOT give information on the significance of factors in such an analysis.
Example of a 2x3 Factorial
The 2x3 Table

            Factor 2
Factor 1    G1   G2   G3
            G4   G5   G6

The Data Editor

          Factor 1   Factor 2              Interaction
Group     F1GCV      F2GCV1     F2GCV2     IntGCV1    IntGCV2
G1         .5         .6667      0          .3333      0
G2         .5        -.3333      .5        -.1667      .25
G3         .5        -.3333     -.5        -.1667     -.25
G4        -.5         .6667      0         -.3333      0
G5        -.5        -.3333      .5         .1667     -.25
G6        -.5        -.3333     -.5         .1667      .25
Main Effect of Factor 1: Average of G1,G2,G3 vs. Average of G4,G5,G6

Main Effect of Factor 2
- 1st Contrast: Average of G1,G4 vs. Average of G2,G3,G5,G6
- 2nd Contrast: Average of G2,G5 vs. Average of G3,G6
Interaction: 1st Contrast: (G1 – Av of G2,G3) vs. (G4 – Av of G5,G6)
(G1-G2,G3) - (G4-G5,G6)
Is the difference between Col 1 and Col’s 2&3 the same across rows?
2nd Contrast: G2 – G3 vs. G5 – G6
(G2 – G3) – (G5 – G6)
Is the difference between Col 2 and Col3 the same across rows?
Coefficients can be easily gotten by multiplying the main effect coefficients.
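As a check on that claim, here is a small Python sketch (numpy assumed available) that builds the interaction codes as cell-by-cell products of the main-effect codes from the table above and verifies that all five contrast-coding variables are mutually orthogonal:

```python
import numpy as np

# Main-effect contrast codes for the six cells G1..G6 (from the table above).
f1gcv  = np.array([ .5,   .5,   .5,  -.5,  -.5,  -.5])      # Factor 1
f2gcv1 = np.array([2/3, -1/3, -1/3,  2/3, -1/3, -1/3])      # Factor 2, 1st contrast
f2gcv2 = np.array([ 0.,   .5,  -.5,   0.,   .5,  -.5])      # Factor 2, 2nd contrast

# Interaction codes are products of the main-effect codes.
intgcv1 = f1gcv * f2gcv1   # .3333 -.1667 -.1667 -.3333 .1667 .1667
intgcv2 = f1gcv * f2gcv2   # 0 .25 -.25 0 -.25 .25

codes = np.column_stack([f1gcv, f2gcv1, f2gcv2, intgcv1, intgcv2])
gram = codes.T @ codes
# Off-diagonal entries of the Gram matrix are all (numerically) zero,
# so the five contrast codes are orthogonal.
assert np.allclose(gram - np.diag(np.diag(gram)), 0.0)
```

The products reproduce the IntGCV1 and IntGCV2 columns of the Data Editor table, to the two-decimal rounding used there.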
The Myers/Wells data Matrix for Regression analysis with contrast coding -skip
DV ROW COL ROWGCV COLGCV1 COLGCV2 INTGCV1 INTGCV2
72 1 1  .50  .67  .00  .33  .00
63 1 1  .50  .67  .00  .33  .00
57 1 1  .50  .67  .00  .33  .00
52 1 1  .50  .67  .00  .33  .00
69 1 1  .50  .67  .00  .33  .00
75 1 1  .50  .67  .00  .33  .00
68 1 1  .50  .67  .00  .33  .00
74 1 1  .50  .67  .00  .33  .00
49 1 2  .50 -.33  .50 -.17  .25
71 1 2  .50 -.33  .50 -.17  .25
63 1 2  .50 -.33  .50 -.17  .25
48 1 2  .50 -.33  .50 -.17  .25
68 1 2  .50 -.33  .50 -.17  .25
65 1 2  .50 -.33  .50 -.17  .25
52 1 2  .50 -.33  .50 -.17  .25
63 1 2  .50 -.33  .50 -.17  .25
40 1 3  .50 -.33 -.50 -.17 -.25
49 1 3  .50 -.33 -.50 -.17 -.25
36 1 3  .50 -.33 -.50 -.17 -.25
50 1 3  .50 -.33 -.50 -.17 -.25
54 1 3  .50 -.33 -.50 -.17 -.25
46 1 3  .50 -.33 -.50 -.17 -.25
46 1 3  .50 -.33 -.50 -.17 -.25
26 1 3  .50 -.33 -.50 -.17 -.25
65 2 1 -.50  .67  .00 -.33  .00
45 2 1 -.50  .67  .00 -.33  .00
53 2 1 -.50  .67  .00 -.33  .00
53 2 1 -.50  .67  .00 -.33  .00
51 2 1 -.50  .67  .00 -.33  .00
58 2 1 -.50  .67  .00 -.33  .00
53 2 1 -.50  .67  .00 -.33  .00
57 2 1 -.50  .67  .00 -.33  .00
56 2 2 -.50 -.33  .50  .17 -.25
55 2 2 -.50 -.33  .50  .17 -.25
49 2 2 -.50 -.33  .50  .17 -.25
52 2 2 -.50 -.33  .50  .17 -.25
35 2 2 -.50 -.33  .50  .17 -.25
57 2 2 -.50 -.33  .50  .17 -.25
45 2 2 -.50 -.33  .50  .17 -.25
49 2 2 -.50 -.33  .50  .17 -.25
41 2 3 -.50 -.33 -.50  .17  .25
42 2 3 -.50 -.33 -.50  .17  .25
57 2 3 -.50 -.33 -.50  .17  .25
39 2 3 -.50 -.33 -.50  .17  .25
36 2 3 -.50 -.33 -.50  .17  .25
52 2 3 -.50 -.33 -.50  .17  .25
52 2 3 -.50 -.33 -.50  .17  .25
48 2 3 -.50 -.33 -.50  .17  .25

Number of cases read: 48   Number of cases listed: 48
The ROW and COL columns were used for the GLM analysis; the GCV columns were used for the regression analysis.
The regression analysis of the Myers/Well data -skip
regression variables = dv rowgcv colgcv1 colgcv2 intgcv1 intgcv2
  /descriptives = default
  /statistics = default cha
  /dep=dv
  /enter rowgcv colgcv1 colgcv2 intgcv1 intgcv2
  /remove rowgcv /enter rowgcv
  /remove colgcv1 colgcv2 /enter colgcv1 colgcv2
  /remove intgcv1 intgcv2 /enter intgcv1 intgcv2.
Regression
Enters all the variables.
The "/descriptives = " subcommand above causes these two boxes of output to be printed.
Descriptive Statistics

           Mean          Std. Deviation   N
DV         53.25         10.94            48
ROWGCV     .0000         .5053            48
COLGCV1    6.939E-18     .4764            48
COLGCV2    .0000         .4126            48
INTGCV1    -3.4694E-18   .2382            48
INTGCV2    .0000         .2063            48

Correlations (Pearson Correlation)

           DV       ROWGCV   COLGCV1   COLGCV2   INTGCV1   INTGCV2
DV         1.000    .300     .461      .384      .176      .238
ROWGCV     .300     1.000    .000      .000      .000      .000
COLGCV1    .461     .000     1.000     .000      .000      .000
COLGCV2    .384     .000     .000      1.000     .000      .000
INTGCV1    .176     .000     .000      .000      1.000     .000
INTGCV2    .238     .000     .000      .000      .000      1.000
Variables Entered/Removed(c)

Model   Variables Entered                              Variables Removed      Method
1       INTGCV2, INTGCV1, COLGCV2, COLGCV1, ROWGCV(a)  .                      Enter
2       .                                              ROWGCV(b)              Remove
3       ROWGCV(a)                                      .                      Enter
4       .                                              COLGCV2, COLGCV1(b)    Remove
5       COLGCV2, COLGCV1(a)                            .                      Enter
6       .                                              INTGCV2, INTGCV1(b)    Remove
7       INTGCV2, INTGCV1(a)                            .                      Enter

a. All requested variables entered.
b. All requested variables removed.
c. Dependent Variable: DV
-skip
In the ANOVA table below, Model 1 is the overall ANOVA, with all variables in the equation. The R2 change from Model 2 to Model 3 tests the Row main effect, from Model 4 to Model 5 the Col main effect, and from Model 6 to Model 7 the interaction effect. The intermediate "removed" models themselves are not of any use in this analysis.
ANOVA(g)

Model                Sum of Squares   df   Mean Square    F        Sig.
1      Regression    3026.500          5    605.300       9.791    .000a
       Residual      2596.500         42     61.821
       Total         5623.000         47
2      Regression    2519.500          4    629.875       8.727    .000b
       Residual      3103.500         43     72.174
       Total         5623.000         47
3      Regression    3026.500          5    605.300       9.791    .000a
       Residual      2596.500         42     61.821
       Total         5623.000         47
4      Regression     999.125          3    333.042       3.169    .034c
       Residual      4623.875         44    105.088
       Total         5623.000         47
5      Regression    3026.500          5    605.300       9.791    .000d
       Residual      2596.500         42     61.821
       Total         5623.000         47
6      Regression    2534.375          3    844.792      12.035    .000e
       Residual      3088.625         44     70.196
       Total         5623.000         47
7      Regression    3026.500          5    605.300       9.791    .000f
       Residual      2596.500         42     61.821
       Total         5623.000         47

a. Predictors: (Constant), INTGCV2, INTGCV1, COLGCV2, COLGCV1, ROWGCV
b. Predictors: (Constant), INTGCV2, INTGCV1, COLGCV2, COLGCV1
c. Predictors: (Constant), INTGCV2, INTGCV1, ROWGCV
d. Predictors: (Constant), INTGCV2, INTGCV1, ROWGCV, COLGCV2, COLGCV1
e. Predictors: (Constant), ROWGCV, COLGCV2, COLGCV1
f. Predictors: (Constant), ROWGCV, COLGCV2, COLGCV1, INTGCV2, INTGCV1
g. Dependent Variable: DV
-skip
Models 3,5, and 7 are all identical. Each has all variables in the equation. The change in R2 when going from Model 2 to 3, or Model 4 to 5, or Model 6 to 7, tests the significance of one of the effects in the factorial design - either a main effect or the interaction effect. If your only interest is in those effects, then the coefficients box won't be of interest to you. You might be interested in the significance of one or more of the individual GCV's presented here, though.
WHEW!!
Not of much use in this analysis.
Coefficients(a)

                      Unstandardized Coefficients   Standardized
Model                 B         Std. Error          Beta      t        Sig.
1    (Constant)       53.250    1.135                         46.921   .000
     ROWGCV            6.500    2.270               .300       2.864   .007
     COLGCV1          14.125    3.210               .461       4.400   .000
     COLGCV2          10.188    2.780               .384       3.665   .001
     INTGCV1          10.750    6.420               .176       1.674   .101
     INTGCV2          12.625    5.560               .238       2.271   .028
2    (Constant)       53.250    1.226                         43.426   .000
     COLGCV1          14.125    3.468               .461       4.073   .000
     COLGCV2          10.188    3.004               .384       3.392   .002
     INTGCV1          10.750    6.937               .176       1.550   .129
     INTGCV2          12.625    6.007               .238       2.102   .041
3    (Constant)       53.250    1.135                         46.921   .000
     ROWGCV            6.500    2.270               .300       2.864   .007
     COLGCV1          14.125    3.210               .461       4.400   .000
     COLGCV2          10.188    2.780               .384       3.665   .001
     INTGCV1          10.750    6.420               .176       1.674   .101
     INTGCV2          12.625    5.560               .238       2.271   .028
4    (Constant)       53.250    1.480                         35.988   .000
     ROWGCV            6.500    2.959               .300       2.196   .033
     INTGCV1          10.750    8.370               .176       1.284   .206
     INTGCV2          12.625    7.249               .238       1.742   .089
5    (Constant)       53.250    1.135                         46.921   .000
     ROWGCV            6.500    2.270               .300       2.864   .007
     COLGCV1          14.125    3.210               .461       4.400   .000
     COLGCV2          10.188    2.780               .384       3.665   .001
     INTGCV1          10.750    6.420               .176       1.674   .101
     INTGCV2          12.625    5.560               .238       2.271   .028
6    (Constant)       53.250    1.209                         44.034   .000
     ROWGCV            6.500    2.419               .300       2.687   .010
     COLGCV1          14.125    3.420               .461       4.130   .000
     COLGCV2          10.188    2.962               .384       3.439   .001
7    (Constant)       53.250    1.135                         46.921   .000
     ROWGCV            6.500    2.270               .300       2.864   .007
     COLGCV1          14.125    3.210               .461       4.400   .000
     COLGCV2          10.188    2.780               .384       3.665   .001
     INTGCV1          10.750    6.420               .176       1.674   .101
     INTGCV2          12.625    5.560               .238       2.271   .028

a. Dependent Variable: DV
[Figure: Distributions (boxplots) of part 1 scores, 175–300, by year, 1996–2001]
The MEDRES Dataset – A 2x6 Factorial Design Example

Included because it illustrates use of built-in orthogonal polynomials.

This example concerns the issue of the quality of surgery residents over the years. There was talk that the quality of surgical residents had been decreasing. To address this issue, a doctor at a local hospital conducted a survey of resident programs throughout the nation. The survey requested information on residents’ academic credentials. Of interest here are PART1 scores, scores on the major part of a GRE-like exam taken by all residents, and AOA membership qualification, whether the resident was in the top 10% of his/her class.
Six years’ worth of data were collected from the survey responses. The distributions of PART1 scores for all six years are given below. Be sure to understand that each year represents a different group of residents.
The distribution of AOA scores (1 = In top 10%, 0 = Not) is below.
The interest of the investigators was in changes, if any, across years. There was also an interest in any differences that might exist between small programs and large programs.
Thus, this is a 6 (Year) x 2 (Progsize) factorial design problem with two dependent variables – PART1 scores and AOA.
aoa * year Crosstabulation
% within year

                 1996     1997     1998     1999     2000     2001     Total
aoa   0 no       70.1%    71.4%    71.6%    77.5%    84.4%    84.4%    76.7%
      1 yes      29.9%    28.6%    28.4%    22.5%    15.6%    15.6%    23.3%
Total            100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%
Specifying the GLM Analysis of PART1 scores from the MEDRES Data
The main dialog box.
Tells GLM that PART1 is the dependent variable.
YEAR and PROGSIZE are the two independent variables, i.e., factors.
These are fixed factors. That is, all of their values are included in the data.
The Options dialog box.
Have GLM compute estimated marginal means for each factor and for each cell if possible.
I checked the “Compare main effects” box to see what GLM output will result.
Have GLM display descriptive statistics and effect size and power estimates.
The GLM Output
First, the syntax resulting from all the above pulldowns.
UNIANOVA
  part1 BY year progsize
  /CONTRAST (year)=Polynomial
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /POSTHOC = year ( BTUKEY )
  /PLOT = PROFILE( year*progsize )
  /EMMEANS = TABLES(year) COMPARE ADJ(LSD)
  /EMMEANS = TABLES(progsize) COMPARE ADJ(LSD)
  /EMMEANS = TABLES(year*progsize)
  /PRINT = DESCRIPTIVE ETASQ OPOWER
  /CRITERIA = ALPHA(.05)
  /DESIGN = year progsize year*progsize .
The Post Hoc dialog box.
Since PROGSIZE is a dichotomy, it makes no sense to ask for post hoc comparisons for it.
But YEAR is 6-valued, so we can, although we’ll be focusing on the components of trend requested below.
The Contrasts dialog box.
I’ve requested that GLM test for linear, quadratic, cubic, etc. trends across the six years. The method will be that of orthogonal polynomials. Thankfully, we don’t have to look up and enter the coefficients – they’re built into the program. All we do is ask for Polynomial contrasts.
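SPSS supplies the polynomial coefficients internally, but it may help to see where they come from. A sketch, not the SPSS implementation, of one way to generate them for six equally spaced levels: orthogonalize the columns 1, x, x², . . . by Gram-Schmidt (here via a QR decomposition, with numpy assumed available):

```python
import numpy as np

levels = np.arange(1.0, 7.0)                # the metric 1, 2, ..., 6 used by SPSS
V = np.vander(levels, 6, increasing=True)   # columns: 1, x, x^2, ..., x^5
Q, _ = np.linalg.qr(V)                      # orthogonalize the polynomial columns
linear = Q[:, 1] / np.abs(Q[:, 1]).max()    # linear-trend column, rescaled
# Up to sign and scale this matches the textbook linear coefficients
# -5, -3, -1, 1, 3, 5 for six equally spaced levels; later columns of Q
# give the quadratic, cubic, 4th- and 5th-order contrasts.
```

Each column of Q beyond the constant is orthogonal to all the others, which is why the trend contrasts partition the YEAR sum of squares.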
Univariate Analysis of Variance
The descriptive statistics requested by clicking on Options and checking the Descriptive Statistics box.
Examination of means doesn’t suggest a clear trend over years. There is, however, a large numeric difference between small and large programs.
Between-Subjects Factors

                     Value Label              N
year        1996                              196
            1997                              206
            1998                              204
            1999                              212
            2000                              212
            2001                              211
progsize    0        4 or fewer residents     502
            1        5 or more residents      739
Descriptive Statistics
Dependent Variable: part1 part 1 scores

year    progsize                   Mean     Std. Deviation    N
1996    0 4 or fewer residents     209.35   17.515             80
        1 5 or more residents      222.02   14.435            116
        Total                      216.85   16.915            196
1997    0 4 or fewer residents     211.37   13.107             83
        1 5 or more residents      220.59   17.330            123
        Total                      216.88   16.371            206
1998    0 4 or fewer residents     214.38   16.420             85
        1 5 or more residents      222.39   17.567            119
        Total                      219.05   17.512            204
1999    0 4 or fewer residents     212.58   15.282             84
        1 5 or more residents      221.88   15.804            128
        Total                      218.20   16.217            212
2000    0 4 or fewer residents     219.26   16.777             88
        1 5 or more residents      225.80   16.415            124
        Total                      223.08   16.839            212
2001    0 4 or fewer residents     214.79   17.689             82
        1 5 or more residents      224.41   17.542            129
        Total                      220.67   18.175            211
Total   0 4 or fewer residents     213.70   16.422            502
        1 5 or more residents      222.87   16.606            739
        Total                      219.16   17.127           1241
What follows is the basic output of GLM – the tests of differences associated with the between-subjects factors.
Starting at the top . . .
Corrected Model: This is the overall ANOVA that we’ve seen in REGRESSION output. It’s a test of the significance of the relationship of the DV to ALL the group-coding variables created internally by GLM to represent the two factors and their interaction – all 5+1+5 = 11 of them. The conclusion is that PART1 scores are related to the factors.
Intercept. This is a test of the null hypothesis that in the population the intercept of the regression of the DV onto the group-coding variables representing the factors is 0.
Year. The Year line tests the significance of the relationship of the DV to the 5 group-coding variables created to represent the YEAR factor. There are significant differences in mean PART1 scores across years. (Use Tests of Contrasts below to see in which direction.)
Progsize. The Progsize line tests the significance of the relationship of the DV to the 1 group-coding variable created to represent the PROGSIZE factor. There is a significant difference in mean PART1 scores between small and large programs. (Looking at the graphs below shows that larger programs have a higher mean.)
Year * Progsize This line tests the significance of the relationship of the DV to the 5 product variables created to represent the interaction of YEAR and PROGSIZE. Its nonsignificance means that the change across years is the same for small and large programs.
Error. The Error line represents the denominator of the F ratio used to test all the hypotheses.
Partial Eta squared column. The entries in this column are estimates of effect size. See 510/511 notes for explanations. It can be reported if a journal wants effect sizes. .020 is small; .071 is medium to large.
Noncent Parameter The noncentrality parameter is a quantity that is used to compute power. It is not reported.
Observed Power. The entries in this column give the probability of rejecting the null with a new sample if the population values were equal to those found in this sample.
Tests of Between-Subjects Effects
Dependent Variable: part1 part 1 scores

Source             Type III SS       df     Mean Square      F            Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Corrected Model    32289.485(b)      11     2935.408         10.884       .000   .089                  119.724              1.000
Intercept          56887237.743       1     56887237.743     210928.054   .000   .994                  210928.054           1.000
year               6615.688           5     1323.138         4.906        .000   .020                  24.530               .983
progsize           25421.258          1     25421.258        94.258       .000   .071                  94.258               1.000
year * progsize    1012.640           5     202.528          .751         .585   .003                  3.755                .273
Error              331460.960      1229     269.700
Total              59971422.000    1241
Corrected Total    363750.445      1240

a. Computed using alpha = .05
b. R Squared = .089 (Adjusted R Squared = .081)
Custom Hypothesis Tests – Results for the orthogonal polynomials
The Custom Hypothesis Tests box gives the results of contrasts you have specified by clicking on the Contrasts button in the main dialog box.
Here the linear trend is significant and the contrast estimate is positive, suggesting that mean PART1 scores increased over the 6-year period.
The 5th order contrast is also marginally significant. I would ignore it as a chance result.
Contrast Results (K Matrix)
Dependent Variable: part1 part 1 scores
year Polynomial Contrast(a)

                                                                               95% CI for Difference
Contrast    Estimate   Hypothesized Value   Difference   Std. Error   Sig.    Lower Bound   Upper Bound
Linear      4.551      0                    4.551        1.171        .000    2.254         6.848
Quadratic   -.498      0                    -.498        1.170        .671    -2.792        1.797
Cubic       -1.612     0                    -1.612       1.162        .166    -3.892        .668
Order 4     -1.704     0                    -1.704       1.158        .141    -3.976        .568
Order 5     -2.541     0                    -2.541       1.159        .029    -4.815        -.268

a. Metric = 1.000, 2.000, 3.000, 4.000, 5.000, 6.000
Test Results
Dependent Variable: part1 part 1 scores

Source     Sum of Squares   df     Mean Square   F       Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Contrast   6615.688          5     1323.138      4.906   .000   .020                  24.530               .983
Error      331460.960     1229      269.700

a. Computed using alpha = .05
Estimated Marginal Means 1. year
The estimated marginal means are computed controlling for the other factor, PROGSIZE.
They’re computed assuming that each YEAR has the same mix of small and large programs.
So these Estimated Marginal Means will be different than the Obtained means printed in the Descriptives output section.
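The difference can be verified by hand from the 1996 row of the Descriptives output; a small Python check with the cell means and Ns transcribed from that table (GLM reports an estimated marginal mean of 215.684 for 1996, versus an observed mean of 216.85):

```python
# 1996 cell statistics transcribed from the Descriptive Statistics output.
small_mean, n_small = 209.350, 80    # programs with 4 or fewer residents
large_mean, n_large = 222.017, 116   # programs with 5 or more residents

# Estimated marginal mean: unweighted average of the two cell means.
emm = (small_mean + large_mean) / 2
# Observed 1996 mean: cell means weighted by their cell sizes.
observed = (small_mean * n_small + large_mean * n_large) / (n_small + n_large)
print(round(emm, 2), round(observed, 2))  # 215.68 216.85
```

The two values differ because the large programs, which have higher PART1 means, contribute more cases to the observed mean than to the unweighted marginal mean.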
This output is the result of checking the “Compare Main Effects” box.
Mean at each level is compared with the mean at every other level.
Don’t request this output unless you know what you’re doing.
Obligatory aggregation of the above specific comparisons into a single overall test.
Estimates
Dependent Variable: part1 part 1 scores

                               95% Confidence Interval
year   Mean      Std. Error    Lower Bound   Upper Bound
1996   215.684   1.193         213.342       218.025
1997   215.983   1.166         213.695       218.272
1998   218.386   1.166         216.098       220.674
1999   217.233   1.153         214.971       219.495
2000   222.530   1.145         220.284       224.775
2001   219.602   1.160         217.327       221.877
Pairwise Comparisons
Dependent Variable: part1 part 1 scores

                                                            95% CI for Difference(a)
(I) year   (J) year   Mean Difference (I-J)   Std. Error   Sig.(a)   Lower Bound   Upper Bound
1996       1997       -.300                   1.669        .857      -3.574        2.974
           1998       -2.702                  1.668        .106      -5.976        .571
           1999       -1.549                  1.659        .351      -4.805        1.706
           2000       -6.846*                 1.653        .000      -10.090       -3.602
           2001       -3.918*                 1.664        .019      -7.183        -.653
1997       1996       .300                    1.669        .857      -2.974        3.574
           1998       -2.402                  1.649        .146      -5.638        .834
           1999       -1.250                  1.640        .446      -4.467        1.968
           2000       -6.546*                 1.634        .000      -9.752        -3.340
           2001       -3.618*                 1.645        .028      -6.845        -.391
1998       1996       2.702                   1.668        .106      -.571         5.976
           1997       2.402                   1.649        .146      -.834         5.638
           1999       1.153                   1.640        .482      -2.065        4.370
           2000       -4.144*                 1.634        .011      -7.350        -.939
           2001       -1.216                  1.645        .460      -4.443        2.011
1999       1996       1.549                   1.659        .351      -1.706        4.805
           1997       1.250                   1.640        .446      -1.968        4.467
           1998       -1.153                  1.640        .482      -4.370        2.065
           2000       -5.297*                 1.625        .001      -8.484        -2.109
           2001       -2.369                  1.635        .148      -5.577        .840
2000       1996       6.846*                  1.653        .000      3.602         10.090
           1997       6.546*                  1.634        .000      3.340         9.752
           1998       4.144*                  1.634        .011      .939          7.350
           1999       5.297*                  1.625        .001      2.109         8.484
           2001       2.928                   1.629        .073      -.269         6.125
2001       1996       3.918*                  1.664        .019      .653          7.183
           1997       3.618*                  1.645        .028      .391          6.845
           1998       1.216                   1.645        .460      -2.011        4.443
           1999       2.369                   1.635        .148      -.840         5.577
           2000       -2.928                  1.629        .073      -6.125        .269

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).
Univariate Tests
Dependent Variable: part1 part 1 scores

Source     Sum of Squares   df     Mean Square   F       Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Contrast   6615.688          5     1323.138      4.906   .000   .020                  24.530               .983
Error      331460.960     1229      269.700

The F tests the effect of year. This test is based on the linearly independent pairwise comparisons among the estimated marginal means.
a. Computed using alpha = .05
Estimated Marginal Means 2. progsize
This output is the result of checking the “Compare Main Effects” box.
Mean at each level is compared with the mean at every other level.
Don’t request this output unless you know what you’re doing.
Estimates
Dependent Variable: part1 part 1 scores

progsize                 Mean     Std. Error  95% CI Lower Bound  95% CI Upper Bound
0  4 or fewer residents  213.623  .733        212.184             215.062
1  5 or more residents   222.850  .605        221.664             224.036
Pairwise Comparisons
Dependent Variable: part1 part 1 scores

(I) progsize             (J) progsize             Mean Difference (I-J)  Std. Error  Sig.(a)  95% CI Lower(a)  95% CI Upper(a)
0  4 or fewer residents  1  5 or more residents   -9.227*                .950        .000     -11.091          -7.362
1  5 or more residents   0  4 or fewer residents   9.227*                .950        .000       7.362          11.091

Based on estimated marginal means.
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).
Univariate Tests
Dependent Variable: part1 part 1 scores

Source    Sum of Squares  df    Mean Square  F       Sig.  Partial Eta Squared  Noncent. Parameter  Observed Power(a)
Contrast  25421.258       1     25421.258    94.258  .000  .071                 94.258              1.000
Error     331460.960      1229  269.700

The F tests the effect of progsize. This test is based on the linearly independent pairwise comparisons among the estimated marginal means.
a. Computed using alpha = .05
3. year * progsize
Dependent Variable: part1 part 1 scores

year  progsize                 Mean     Std. Error  95% CI Lower Bound  95% CI Upper Bound
1996  0  4 or fewer residents  209.350  1.836       205.748             212.952
1996  1  5 or more residents   222.017  1.525       219.026             225.009
1997  0  4 or fewer residents  211.373  1.803       207.837             214.910
1997  1  5 or more residents   220.593  1.481       217.688             223.499
1998  0  4 or fewer residents  214.376  1.781       210.882             217.871
1998  1  5 or more residents   222.395  1.505       219.441             225.348
1999  0  4 or fewer residents  212.583  1.792       209.068             216.099
1999  1  5 or more residents   221.883  1.452       219.035             224.731
2000  0  4 or fewer residents  219.261  1.751       215.827             222.696
2000  1  5 or more residents   225.798  1.475       222.905             228.692
2001  0  4 or fewer residents  214.793  1.814       211.235             218.351
2001  1  5 or more residents   224.411  1.446       221.574             227.248
Post Hoc Tests
year
Homogeneous Subsets
In view of the significant linear trend found above, the post hoc tests are kind of meaningless.
These are identical to the observed cell means.
If there had been quantitative covariates, they would not have been identical to the observed means.
part1 part 1 scores
Tukey B(a,b,c)

                   Subset
year  N      1        2
1996  196    216.85
1997  206    216.88
1999  212    218.20
1998  204    219.05   219.05
2001  211    220.67   220.67
2000  212             223.08

Means for groups in homogeneous subsets are displayed. Based on Type III Sum of Squares. The error term is Mean Square(Error) = 269.700.
a. Uses Harmonic Mean Sample Size = 206.671.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
c. Alpha = .05.
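Footnote a above can be checked directly: the harmonic mean of the six yearly group sizes reproduces SPSS's value of 206.671. A quick Python check (mine, not part of the original handout):

```python
# Group sizes (N) for the six years, as shown in the Tukey B table
ns = [196, 206, 212, 204, 211, 212]

# Harmonic mean: number of groups divided by the sum of reciprocal sizes
harmonic_mean = len(ns) / sum(1 / n for n in ns)

print(round(harmonic_mean, 3))  # 206.671, matching the SPSS footnote
```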
Analysis of AOA scores

Because the AOA scores are dichotomies, they should be analyzed using logistic regression. The results of the analysis using GLM will be presented here for comparison with logistic regression results when we cover it in a few weeks.

UNIANOVA aoa BY year progsize
  /CONTRAST (year)=Polynomial
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /POSTHOC = year ( BTUKEY )
  /PLOT = PROFILE( year*progsize )
  /EMMEANS = TABLES(year) COMPARE ADJ(LSD)
  /EMMEANS = TABLES(progsize) COMPARE ADJ(LSD)
  /EMMEANS = TABLES(year*progsize)
  /PRINT = DESCRIPTIVE ETASQ OPOWER
  /CRITERIA = ALPHA(.05)
  /DESIGN = year progsize year*progsize .
Univariate Analysis of Variance
Skip in Fall 10, Fall 11, Fall 13, Fall 14, Fall 15, Fall 16, Fall 17, Fall 18
Between-Subjects Factors

          Value  Value Label           N
year      1996                         201
          1997                         206
          1998                         204
          1999                         213
          2000                         211
          2001                         212
progsize  0      4 or fewer residents  504
          1      5 or more residents   743
Descriptive Statistics
Dependent Variable: aoa aoa

year   progsize                 Mean  Std. Deviation  N
1996   0  4 or fewer residents  .13   .343            82
       1  5 or more residents   .41   .494            119
       Total                    .30   .459            201
1997   0  4 or fewer residents  .10   .297            83
       1  5 or more residents   .41   .495            123
       Total                    .29   .453            206
1998   0  4 or fewer residents  .13   .338            85
       1  5 or more residents   .39   .491            119
       Total                    .28   .452            204
1999   0  4 or fewer residents  .11   .311            84
       1  5 or more residents   .30   .461            129
       Total                    .23   .419            213
2000   0  4 or fewer residents  .08   .272            88
       1  5 or more residents   .21   .410            123
       Total                    .16   .364            211
2001   0  4 or fewer residents  .09   .281            82
       1  5 or more residents   .20   .402            130
       Total                    .16   .363            212
Total  0  4 or fewer residents  .11   .307            504
       1  5 or more residents   .32   .467            743
       Total                    .23   .423            1247
Note that the YEAR and PROGSIZE factors are both significant and that the YEAR*PROGSIZE interaction is nearly significant (p = .063).
This "almost" interaction will appear in the plots below.
Tests of Between-Subjects Effects
Dependent Variable: aoa aoa

Source           Type III Sum of Squares  df    Mean Square  F        Sig.  Partial Eta Squared  Noncent. Parameter  Observed Power(a)
Corrected Model  20.254(b)                11    1.841        11.211   .000  .091                 123.316             1.000
Intercept        54.923                   1     54.923       334.403  .000  .213                 334.403             1.000
year             3.480                    5     .696         4.238    .001  .017                 21.191              .963
progsize         14.152                   1     14.152       86.165   .000  .065                 86.165              1.000
year * progsize  1.726                    5     .345         2.102    .063  .008                 10.510              .701
Error            202.839                  1235  .164
Total            291.000                  1247
Corrected Total  223.092                  1246

a. Computed using alpha = .05
b. R Squared = .091 (Adjusted R Squared = .083)
Custom Hypothesis Tests
Contrast Results (K Matrix)
year Polynomial Contrast(a), Dependent Variable: aoa aoa

Contrast   Contrast Estimate  Hypothesized Value  Difference (Estimate - Hypothesized)  Std. Error  Sig.  95% CI Lower  95% CI Upper
Linear     -.124              0                   -.124                                 .029        .000  -.181         -.068
Quadratic  -.021              0                   -.021                                 .029        .470  -.077          .036
Cubic       .026              0                    .026                                 .029        .364  -.030          .082
Order 4     .028              0                    .028                                 .029        .333  -.028          .084
Order 5    -.010              0                   -.010                                 .029        .733  -.066          .046

a. Metric = 1.000, 2.000, 3.000, 4.000, 5.000, 6.000
Test Results
Dependent Variable: aoa aoa

Source    Sum of Squares  df    Mean Square  F      Sig.  Partial Eta Squared  Noncent. Parameter  Observed Power(a)
Contrast  3.480           5     .696         4.238  .001  .017                 21.191              .963
Error     202.839         1235  .164

a. Computed using alpha = .05
Profile Plots
We won’t try this one using REGRESSION. Although the procedure is straightforward, it’s quite tedious.
Review of main tests conducted in Factorial Designs – skipped in 2017, 2018
The data below represent a 2 (row) by 3 (column) factorial design.
Participants were shown lists of words. Some were intact words; others were scrambled. So the row factor is Intactness. The words were presented at three different rates (300, 450, or 600 words per minute), so the column factor is Rate. The scores are percentages of idea units recalled.
Effects tested in factorial designs
When data have been gathered in a 2 way factorial design, the following questions are usually asked. (But remember, the fact that the data were gathered factorially doesn’t mean that you have to analyze them factorially.)
1. Is there a main effect of the first factor? Is the DV, averaged across levels of the 2nd factor, related to changes in levels of the 1st factor? This involves comparing the row means, 50.0 vs. 56.5, in the above example.
2. Is there a main effect of the 2nd factor? Is the DV, averaged across levels of the 1st factor, related to changes in levels of the 2nd factor? E.g., is the mean of Column 1 significantly different from the mean of Column 2? This involves comparing the column means 60.3125, 54.8125, and 44.6250 in the above.
3. Is there an interaction of the effects of the 1st and 2nd factors? Does the relationship of the DV to the 1st factor change as levels of the 2nd factor change? Do differences between row means change from one column to the next? If so, there is an interaction. (Alternatively, do differences between column means change from one row to the next?) This involves comparing the difference 66.250 - 54.375 with 59.875 - 49.750 and with 43.375 - 45.875. If the differences are consistent, there is no interaction. If the differences are different, then there IS an interaction.
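To make these three comparisons concrete, here is a small Python sketch (mine, not part of the original handout) that computes the marginal means and the row differences within each column from the 2 x 3 cell means quoted above:

```python
# 2 x 3 cell means from the word-recall example:
# rows = Intactness (intact, scrambled), columns = Rate (300, 450, 600 wpm)
cells = [
    [66.250, 59.875, 43.375],   # row 1 (intact)
    [54.375, 49.750, 45.875],   # row 2 (scrambled)
]

# Main effect of the row factor: compare the row marginal means
row_means = [sum(row) / len(row) for row in cells]            # [56.5, 50.0]

# Main effect of the column factor: compare the column marginal means
col_means = [sum(col) / len(cells) for col in zip(*cells)]    # [60.3125, 54.8125, 44.625]

# Interaction: do the row differences change from column to column?
row_diffs = [r1 - r2 for r1, r2 in zip(cells[0], cells[1])]   # [11.875, 10.125, -2.5]

print(row_means, col_means, row_diffs)
```

Because the row differences change sign across columns (11.875, 10.125, -2.5), these data show an interaction.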
Is number of words recalled affected by Rate of presentation and Intactness of the words?
Higher order factorial designs.
With higher order factorial designs, even more questions involving interactions can be asked . . .
For a 3 way factorial design:
1. We test the main effect of Factor A.
2. We test the main effect of Factor B.
3. We test the main effect of Factor C.
4. We test the interaction of factors A & B. Is the effect of A the same across levels of B, or vice versa?
5. We test the interaction of factors A & C. Is the effect of A the same across levels of C, or vice versa?
6. We test the interaction of factors B & C. Is the effect of B the same across levels of C, or vice versa?
7. We test the interaction of all factors: ABC. Is the interaction of A and B different at different levels of C?
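The seven tests follow a pattern: every nonempty subset of the factors gets its own test. A short Python sketch (illustrative, not from the handout) enumerates them:

```python
from itertools import combinations

factors = ["A", "B", "C"]

# Every nonempty subset of factors is tested: singletons are main effects,
# pairs are two-way interactions, and the full set is the three-way interaction.
effects = ["*".join(subset)
           for size in range(1, len(factors) + 1)
           for subset in combinations(factors, size)]

print(effects)  # ['A', 'B', 'C', 'A*B', 'A*C', 'B*C', 'A*B*C']
```

With k factors there are 2^k - 1 such tests, which is why higher order designs generate so many questions.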
Significance Testing – a review
The general form of a significance test in MR
         R²(factor being tested + factors controlled for) - R²(factors controlled for)
         -----------------------------------------------------------------------------
              number of variables representing the factor being tested
    F = -------------------------------------------------------------------------------
                          1 - R²(all variables)
         -----------------------------------------------------------------------------
                          N - (number of all variables) - 1
The numerator
The numerator of the numerator: R²(factor being tested + factors controlled for) - R²(factors controlled for)
It’s the change (i.e., increase) in R2 due to the addition of the variable or variables representing the factor being tested. It’s the increase over and above R2 for another set of variables – those representing the factors being controlled for.
So, for example, if I have two factors, A and B, and I want to test the significance of Factor A, controlling for Factor B, the numerator of the F statistics will be
R²(A + B) - R²(B)
If I have 3 factors and want to test the significance of A controlling for both B and C, then the numerator will be R²(A + B + C) - R²(B + C).
The significance of a variable or factor is always assessed by determining if it adds to R2 over and above a base. If we’re controlling for other variables, the R2 obtained when they’re in the equation is the base. If we’re not controlling for anything, then 0 is the base.
Note that the numerator is an increase in R2, from R2 associated with the base to R2 associated with the base PLUS the variable being tested.
The denominator
The denominator of the F statistic is 1 - R²(all variables), divided by its degrees of freedom, N - (number of all variables) - 1.
This quantity should represent random variability in the dependent variable. It’s the proportion of variance left over when we take out all the predictability associated with the variables we’re studying. The smaller it is, the greater the chance that the F will be significant.
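The general form above can be written as a small Python function (a sketch; the function and variable names are mine, not the handout's):

```python
def r2_change_F(r2_with, r2_without, k_tested, r2_all, k_all, n):
    """F test for the increase in R-squared when the tested factor's
    variables are added over and above the controlled-for variables.

    r2_with    : R² with tested + controlled-for variables in the equation
    r2_without : R² with only the controlled-for variables (0 if none)
    k_tested   : number of variables representing the tested factor
    r2_all     : R² with all variables in the equation
    k_all      : total number of variables
    n          : sample size
    """
    numerator = (r2_with - r2_without) / k_tested       # gain in R² per tested variable
    denominator = (1.0 - r2_all) / (n - k_all - 1)      # residual variance per error df
    return numerator / denominator
```

For example, with R²(A + B) = .30, R²(B) = .20, 2 variables representing A, 5 variables in all, and N = 106, F = (.10/2) / (.70/100), which is about 7.14 on 2 and 100 degrees of freedom.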