SADC Course in Statistics Comparing Regressions (Session 14)



Learning Objectives

At the end of this session, you will be able to

• understand and interpret the components of a linear model with one quantitative variable and one categorical factor

• interpret output from such models

• write regression equations for each level of the categorical variable using the model estimates


Return to the Paddy example

In the paddy example, consider the possible effects of fertiliser and variety together.

The objective is to explore whether fertiliser, variety, or both affect paddy yields.

Note that the two explanatory variables (we will call them factors) being considered here are of different types: fertiliser is a quantitative variable, while variety is a categorical variable.


Models with each factor in turn

Previously we have fitted each variable one at a time. Thus the model with fertiliser alone is:

$$y_i = \beta_0 + \beta_1 (\text{fert})_i + \varepsilon_i$$

while the model with variety alone is:

$$y_{ij} = \beta'_0 + v_i + \varepsilon_{ij}$$

In the models above, $\beta_0$ and $\beta'_0$ represent constants, $\beta_1$ is the slope of the line in the first model, and $v_i$ ($i = 1, 2, 3$) represents the variety effect in the second model.


One model with both factors

We can put the two factors together into a single model as:

$$y_{ij} = \beta_0 + \beta_1 (\text{fert})_{ij} + v_i + \varepsilon_{ij}$$

This model fits regression lines with a common slope, one line for each variety, i.e. it represents three parallel lines.

The intercepts of the lines are:

$(\beta_0 + v_1)$, $(\beta_0 + v_2)$ and $(\beta_0 + v_3)$.
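The parallel-lines model can be fitted by ordinary least squares once variety is coded with dummy (0/1) columns. The sketch below is only an illustration: the data are hypothetical and noise-free, the generating values (5.0, 0.5, -1.0, -2.0) are made up, and the first variety is taken as the base level with its effect set to zero.

```python
import numpy as np

# Hypothetical, noise-free data: 3 varieties, each at fertiliser levels 0..3.
fert = np.array([0, 1, 2, 3] * 3, dtype=float)
variety = np.repeat([0, 1, 2], 4)        # 0 = base level, 1 and 2 = other levels

# Made-up generating values: constant, common slope, variety effects (v1 = 0).
true_const, true_slope = 5.0, 0.5
true_v = np.array([0.0, -1.0, -2.0])
y = true_const + true_slope * fert + true_v[variety]

# Design matrix: constant, fertiliser, dummy for level 2, dummy for level 3.
X = np.column_stack([
    np.ones_like(fert),
    fert,
    (variety == 1).astype(float),
    (variety == 2).astype(float),
])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, v2, v3 = coef

# One intercept per variety, all sharing the slope b1.
intercepts = [b0, b0 + v2, b0 + v3]
print("common slope:", round(b1, 3))
print("intercepts:", [round(a, 3) for a in intercepts])
```

Because the base level's effect is fixed at zero, the constant is the intercept of the base variety and each dummy coefficient is a difference from that base.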


Anova results (sequential)

Source       d.f.   S.S.    M.S.     F       Prob.
Fertiliser     1    29.94   29.94    130.8   0.000
Variety        2    12.29    6.14     26.9   0.000
Residual      32     7.32    0.2288
Total         35    49.55

The Residual M.S. ($s^2$) = 0.2288. It describes the variation not explained by fertiliser and variety.

How may the above results be interpreted?
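As an aid to reading the sequential table, the mean squares and F ratios can be recomputed from the sums of squares and degrees of freedom it reports. This is a small arithmetic check using only the table's numbers; the variable names are illustrative.

```python
# Sequential sums of squares and degrees of freedom from the anova table.
ss = {"Fertiliser": 29.94, "Variety": 12.29, "Residual": 7.32}
df = {"Fertiliser": 1, "Variety": 2, "Residual": 32}

# Mean square = sum of squares / degrees of freedom.
ms = {term: ss[term] / df[term] for term in ss}
s2 = ms["Residual"]

# Each F ratio compares a term's mean square with the residual mean square.
f_ratio = {term: ms[term] / s2 for term in ("Fertiliser", "Variety")}

# Sequential sums of squares add up to the total.
total_ss = sum(ss.values())

print("residual M.S.:", round(s2, 4))
print("F ratios:", {term: round(f, 1) for term, f in f_ratio.items()})
print("total S.S.:", round(total_ss, 2))
```

The recomputed values agree with the table up to rounding of the printed sums of squares.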


Anova results (adjusted)

Source       d.f.   Adj.SS.   Adj.MS.   F      Prob.
Fertiliser     1     6.95      6.95     30.4   0.000
Variety        2    12.29      6.14     26.9   0.000
Residual      32     7.32      0.2288
Total         35    49.55

In the anova above, each term has been adjusted for the other, so the S.S. for fertiliser, variety and residual do not add up to the total S.S.

What conclusions may be drawn from above?
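The difference between sequential and adjusted sums of squares can be seen by fitting nested models and differencing their residual sums of squares. The sketch below uses hypothetical simulated data (not the paddy data) in which fertiliser use differs between varieties; the helper `rss` and all generating values are assumptions made for illustration.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Hypothetical unbalanced data: fertiliser level depends on variety.
rng = np.random.default_rng(0)
variety = np.repeat([0, 1, 2], [4, 17, 15])
fert = np.array([3.0, 2.0, 0.5])[variety] + rng.uniform(-0.5, 0.5, variety.size)
d_old = (variety == 1).astype(float)
d_trad = (variety == 2).astype(float)
y = (5.0 + 0.5 * fert - 1.0 * d_old - 2.0 * d_trad
     + rng.normal(0, 0.5, variety.size))

one = np.ones_like(fert)
rss_null = rss(one[:, None], y)                            # total S.S.
rss_fert = rss(np.column_stack([one, fert]), y)            # fertiliser only
rss_var = rss(np.column_stack([one, d_old, d_trad]), y)    # variety only
rss_full = rss(np.column_stack([one, fert, d_old, d_trad]), y)

seq_fert = rss_null - rss_fert   # fertiliser fitted first, ignoring variety
seq_var = rss_fert - rss_full    # variety added after fertiliser
adj_fert = rss_var - rss_full    # fertiliser adjusted for variety
adj_var = rss_fert - rss_full    # variety adjusted for fertiliser

print("sequential:", round(seq_fert, 2), round(seq_var, 2))
print("adjusted:  ", round(adj_fert, 2), round(adj_var, 2))
```

The term fitted last has the same sequential and adjusted S.S., which is why the Variety row is identical in the two anova tables. Only the sequential S.S. are guaranteed to add up to the total.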


Model estimates

Parameter                Coeff.    Std.error   t       t prob
$\beta_0$ : constant      4.776     0.322      14.9    0.000
$\beta_1$ : fertiliser    0.526     0.096       5.51   0.000
g1 (new)                  0         -           -      -
g2 (old)                 -1.207     0.269      -4.49   0.000
g3 (trad)                -2.179     0.304      -7.16   0.000

What do these results tell us?
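One way to read the estimates table is to recompute each t value as the coefficient divided by its standard error, and to form an approximate 95% confidence interval for the fertiliser slope. The sketch uses the table's numbers; the multiplier 2.037 is the two-sided 5% point of the t distribution with 32 residual d.f., taken from standard tables rather than from the slides.

```python
# (coefficient, standard error) pairs from the model estimates table.
estimates = {
    "constant": (4.776, 0.322),
    "fertiliser": (0.526, 0.096),
    "g2 (old)": (-1.207, 0.269),
    "g3 (trad)": (-2.179, 0.304),
}

# t value = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

# Approximate 95% confidence interval for the fertiliser slope.
T_CRIT_32_DF = 2.037   # from t tables, 32 residual d.f. (not given in the slides)
slope, slope_se = estimates["fertiliser"]
ci_low = slope - T_CRIT_32_DF * slope_se
ci_high = slope + T_CRIT_32_DF * slope_se

for name, t in t_stats.items():
    print(f"{name:12s} t = {t:6.2f}")
print(f"slope 95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

The recomputed t values match the table up to rounding of the printed coefficients and standard errors, and the interval for the slope lies entirely above zero.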


Comparing variety means

As before, comparisons with the base level (the new variety, for which g1 = 0) can be made using the model estimates. Thus:

Old - New = -1.207 = Estimate of g2

Trad - New = -2.179 = Estimate of g3

In addition, because the results need to be adjusted for the effect of fertiliser, the variety means need to be reported as adjusted means.

These are usually calculated at the overall mean of the fertiliser variable, which here is 1.444.
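An adjusted mean is the fitted value for a variety at the overall mean fertiliser level. The sketch below works this out from the model estimates given earlier (constant 4.776, slope 0.526, variety effects 0, -1.207 and -2.179); the variable names are illustrative.

```python
# Model estimates from the parallel-lines fit.
constant, slope = 4.776, 0.526
variety_effect = {
    "New improved": 0.0,      # base level
    "Old improved": -1.207,
    "Traditional": -2.179,
}
mean_fert = 1.444             # overall mean of the fertiliser variable

# Adjusted mean = constant + variety effect + slope * mean fertiliser.
adjusted_mean = {
    name: constant + effect + slope * mean_fert
    for name, effect in variety_effect.items()
}

for name, value in adjusted_mean.items():
    print(f"{name:13s} {value:.2f}")
```

The differences between adjusted means are exactly the variety effects, since every variety is evaluated at the same fertiliser level.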


Raw means and adjusted means

Variety         Sample size (n)   Raw mean   Std.error (s.d./√n)
New improved          4             5.96        0.128
Old improved         17             4.54        0.173
Traditional          15             3.00        0.168

Variety means adjusted for fertiliser effect:

Variety         Adjusted mean   Std.error
New improved        5.54          0.251
Old improved        4.33          0.122
Traditional         3.36          0.139


Parallel lines for each variety

Equations describing the regression of yield on fertiliser for each variety are:

$$y = \beta_0 + \beta_1 (\text{fert}) + v_i$$

$$y = (\beta_0 + v_i) + \beta_1 (\text{fert})$$

Thus for the new improved variety,

$$y = (4.776 + 0) + 0.526 (\text{fert})$$

$$y = 4.776 + 0.526 (\text{fert})$$

Similarly, equations can be found for the remaining two varieties.
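The equations for the remaining varieties follow by adding each variety effect to the constant. A minimal sketch, using the model estimates and two hypothetical helper functions (`line_for` and `predict`):

```python
# Model estimates from the parallel-lines fit.
constant, slope = 4.776, 0.526
variety_effect = {"new": 0.0, "old": -1.207, "trad": -2.179}

def line_for(variety):
    """Return (intercept, slope) of the fitted line for one variety."""
    return constant + variety_effect[variety], slope

def predict(variety, fert):
    """Fitted yield for a variety at a given fertiliser level."""
    intercept, common_slope = line_for(variety)
    return intercept + common_slope * fert

for name in variety_effect:
    intercept, common_slope = line_for(name)
    print(f"{name:5s} y = {intercept:.3f} + {common_slope:.3f} (fert)")
```

Because the slope is shared, the vertical gap between any two lines is the same at every fertiliser level, for example `predict("old", f) - predict("new", f)` is -1.207 for any `f`.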


Model with different slopes

We can put the two factors together into a single model as:

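$$y_{ij} = \beta_0 + \beta_1 (\text{fert})_{ij} + v_i + \delta_i (\text{fert})_{ij} + \varepsilon_{ij}$$

This model fits regression lines with different intercepts $(\beta_0 + v_i)$ and different slopes $(\beta_1 + \delta_i)$.

The separate slopes are:

$(\beta_1 + \delta_1)$, $(\beta_1 + \delta_2)$ and $(\beta_1 + \delta_3)$.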
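Allowing different slopes amounts to adding fertiliser-by-variety product columns to the design matrix. The sketch below is an illustration on hypothetical, noise-free data with made-up intercepts and slopes; the slope adjustments are called `d2` and `d3` in the code, and the base variety's adjustment is fixed at zero.

```python
import numpy as np

# Hypothetical, noise-free data: 3 varieties, each at fertiliser levels 0..3.
fert = np.array([0, 1, 2, 3] * 3, dtype=float)
variety = np.repeat([0, 1, 2], 4)

# Made-up lines with a different intercept and slope for each variety.
true_intercepts = np.array([5.0, 4.0, 3.0])
true_slopes = np.array([0.5, 0.7, 0.2])
y = true_intercepts[variety] + true_slopes[variety] * fert

d_old = (variety == 1).astype(float)
d_trad = (variety == 2).astype(float)

# Columns: constant, fert, variety dummies, then fert-by-variety interactions.
X = np.column_stack([
    np.ones_like(fert), fert,
    d_old, d_trad,
    d_old * fert, d_trad * fert,
])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, v2, v3, d2, d3 = coef

intercepts = [b0, b0 + v2, b0 + v3]
slopes = [b1, b1 + d2, b1 + d3]
print("intercepts:", [round(a, 3) for a in intercepts])
print("slopes:    ", [round(s, 3) for s in slopes])
```

The two interaction columns are what the Fert*Var line of an anova table tests: if their coefficients are both zero, the model reduces to parallel lines.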


Anova with different slopes

Source       d.f.   Adj.SS.   Adj.MS.   F     Prob.
Fertiliser     1     0.391     0.391    1.6   0.211
Variety        2     1.610     0.805    3.4   0.048
Fert*Var       2     0.143     0.071    0.3   0.745
Residual      30     7.180     0.239
Total         35    49.55

Fitting separate lines involves fitting an interaction term (the Fert*Var row above).

What are your conclusions?
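The F test for the interaction can also be seen as a comparison of the two fitted models: the drop in residual S.S. when moving from parallel lines to separate slopes, divided by the residual M.S. of the larger model. The sketch uses the Residual rows of the two anova tables; variable names are illustrative.

```python
# Residual S.S. and d.f. from the two anova tables.
rss_parallel, df_parallel = 7.32, 32    # parallel-lines model
rss_separate, df_separate = 7.180, 30   # separate-slopes model

# Extra sum of squares explained by allowing different slopes.
extra_ss = rss_parallel - rss_separate
extra_df = df_parallel - df_separate

# F ratio: extra mean square over the residual mean square of the larger model.
f_interaction = (extra_ss / extra_df) / (rss_separate / df_separate)

print("extra S.S.:", round(extra_ss, 3), "on", extra_df, "d.f.")
print("F for Fert*Var:", round(f_interaction, 2))
```

Up to rounding, this reproduces the Fert*Var line (S.S. 0.143 on 2 d.f., F about 0.3), which shows how little the separate slopes add to the parallel-lines fit.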


Final model….

It is clear from the above that the term added to the model to allow for different slopes is non-significant (Fert*Var, Prob. = 0.745).

Hence return to the parallel lines model, i.e.

$y = 4.776 + 0.526 (\text{fert})$, for the new variety

$y = 3.569 + 0.526 (\text{fert})$, for the old variety

$y = 2.597 + 0.526 (\text{fert})$, for the traditional variety


Practical work follows to ensure learning objectives are achieved…
