Post on 31-Jan-2016
Always be contented, be grateful, be understanding and be compassionate.
Blocking
• We add a factor, even when it is not of interest in itself, so that the primary factors are studied under more homogeneous conditions. This factor is called a “block”. Most of the time, the block does not interact with the primary factors.
• Popular block factors are “location”, “gender” and so on.
• A two-factor design with one block factor is called a “randomized block design”.
RBD Model (Section 15.2)
• A randomized (complete) block design is an experimental design for comparing t treatments (or levels) in b blocks. Within each block, the treatments are randomly assigned to units, with no replication.
• The probability model of an RBD is the same as the two-way ANOVA model with no interaction term (so multiple comparisons can be conducted for each factor separately).
For example, suppose that we are studying worker absenteeism as a function of the worker’s age, with three age levels: 25-30, 40-55, and 55-60. However, a worker’s gender may also affect the amount of absenteeism. Even though we are not particularly concerned with the impact of gender, we want to ensure that it does not contaminate our conclusions about the effect of age. Moreover, it seems unlikely that gender interacts with age, so we include gender as a block factor.
O/L: Example 15.1
• Goal: To compare the effects of 3 different insecticides on a variety of string beans.
• Condition: It was necessary to use 4 different plots of land.
• Response of interest: the number of seedlings that emerged per row.
Data:
insecticide  plot  seedlings
1            1     56
1            2     48
1            3     66
1            4     62
2            1     83
2            2     78
2            3     94
2            4     93
3            1     80
3            2     72
3            3     83
3            4     85
Minitab>>General Linear Model, response seedlings, model insecticide & plot
General Linear Model: seedlings versus insecticide, plot
Analysis of Variance for seedlings, using Adjusted SS for Tests
Source       DF   Seq SS   Adj SS   Adj MS       F      P
insecticide   2  1832.00  1832.00   916.00  211.38  0.000
plot          3   438.00   438.00   146.00   33.69  0.000
Error         6    26.00    26.00     4.33
Total        11  2296.00
S = 2.08167   R-Sq = 98.87%   R-Sq(adj) = 97.92%
Unusual Observations for seedlings
Obs  seedlings      Fit  SE Fit  Residual  St Resid
 11    83.0000  86.0000  1.4720   -3.0000   -2.04 R
R denotes an observation with a large standardized residual.
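The sums of squares and F ratios in this output can be checked by hand. The following is a minimal sketch in plain Python, not Minitab's own procedure; the function name `rbd_anova` and the dictionary keys are illustrative assumptions, and the data are the 12 observations of Example 15.1.

```python
# Randomized block design ANOVA (additive model, no interaction term).
# seedlings[i][j] = response for insecticide i+1 in plot (block) j+1.
seedlings = [
    [56, 48, 66, 62],
    [83, 78, 94, 93],
    [80, 72, 83, 85],
]

def rbd_anova(y):
    """Return sums of squares, degrees of freedom and F ratios for an RBD."""
    t, b = len(y), len(y[0])                      # treatments, blocks
    grand = sum(sum(row) for row in y) / (t * b)
    trt_means = [sum(row) / b for row in y]
    blk_means = [sum(y[i][j] for i in range(t)) / t for j in range(b)]

    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((v - grand) ** 2 for row in y for v in row)
    ss_err = ss_tot - ss_trt - ss_blk             # error by subtraction

    df_trt, df_blk = t - 1, b - 1
    df_err = df_trt * df_blk
    mse = ss_err / df_err
    return {
        "ss_treatment": ss_trt, "ss_block": ss_blk,
        "ss_error": ss_err, "ss_total": ss_tot,
        "df_error": df_err,
        "f_treatment": (ss_trt / df_trt) / mse,
        "f_block": (ss_blk / df_blk) / mse,
    }

result = rbd_anova(seedlings)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Because the error term is obtained by subtraction, the sketch only applies to a complete design with one observation per treatment and block combination.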
[Figure: Residual Plots for seedlings. Four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]
RBD with random blocks
• We would like our conclusions to apply to a large population of blocks
• We are able to sample blocks randomly
• Example: Minitab unit 5
– Goal: to study the differences among 3 appraisers in their appraised values
– Blocks: 5 randomly selected properties
Latin Square Design (Section 15.3)
Example:
Three factors, A (block factor), B (block factor), and C (treatment factor), each at three levels. A possible arrangement:
      B1   B2   B3
A1    C1   C1   C1
A2    C2   C2   C2
A3    C3   C3   C3
Notice, first, that these designs are squares: all factors have the same number of levels, though there is no restriction on the nature of the levels themselves. Notice also that these squares are balanced: each letter (level) appears the same number of times, which ensures unbiased estimates of main effects.
How to do it in a square? Each treatment appears exactly once in every row and every column.
Notice that these designs are incomplete: of the 27 possible combinations of three factors each at three levels, we use only 9.
Example:
Three factors, A (block factor), B (block factor), and C (treatment factor), each at three levels, in a Latin Square design; nine combinations.
      B1   B2   B3
A1    C1   C2   C3
A2    C2   C3   C1
A3    C3   C1   C2
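The arrangement above is a cyclic shift of the treatment labels. As a hedged illustration, the sketch below builds such a square and checks the "once in every row and column" property; the function names are illustrative assumptions, and a real experiment would normally also randomize the rows, columns, and labels.

```python
def cyclic_latin_square(m):
    """Row i, column j (0-based) receives treatment ((i + j) mod m) + 1."""
    return [[(i + j) % m + 1 for j in range(m)] for i in range(m)]

def is_latin_square(square):
    """True if every level 1..m appears exactly once in each row and column."""
    m = len(square)
    levels = set(range(1, m + 1))
    rows_ok = all(len(row) == m and set(row) == levels for row in square)
    cols_ok = all({square[i][j] for i in range(m)} == levels for j in range(m))
    return rows_ok and cols_ok

square = cyclic_latin_square(3)
for row in square:
    print(" ".join(f"C{c}" for c in row))

# An arrangement that repeats one treatment across a whole row uses the
# same number of runs but is not a Latin square.
repeated_rows = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(is_latin_square(square), is_latin_square(repeated_rows))
```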
Example with 4 Levels per Factor
FACTORS
Automobiles      A   four levels
Tire positions   B   four levels
Tire treatments  C   four levels
VARIABLE
Lifetime of a tire (days)
        B1        B2        B3        B4
A1    C4 855    C3 877    C2 890    C1 997
A2    C1 962    C2 817    C3 845    C4 776
A3    C3 848    C4 841    C1 784    C2 776
A4    C2 831    C1 952    C4 806    C3 871
The Model for (Unreplicated) Latin Squares
Example:
Three factors, $\alpha$, $\beta$, and $\gamma$, each at $m$ levels,
\[ y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk}, \qquad i = 1, \dots, m; \quad j = 1, \dots, m; \quad k = 1, \dots, m \]
Note that interaction is not present in the model.
Same three assumptions: normality, constant variance, and randomness.
Y = A + B + C + e, where e absorbs AB, AC, BC, ABC
Putting in Estimates:
\[ y_{ijk} = \bar{y}_{...} + (\bar{y}_{i..} - \bar{y}_{...}) + (\bar{y}_{.j.} - \bar{y}_{...}) + (\bar{y}_{..k} - \bar{y}_{...}) + R \]
or, bringing $\bar{y}_{...}$ to the left-hand side,
\[ (y_{ijk} - \bar{y}_{...}) = (\bar{y}_{i..} - \bar{y}_{...}) + (\bar{y}_{.j.} - \bar{y}_{...}) + (\bar{y}_{..k} - \bar{y}_{...}) + R, \]
that is,
Total variability among yields = Variability among yields associated with Rows + Variability among yields associated with Columns + Variability among yields associated with Inside Factor + R,
where
\[ R = y_{ijk} - \bar{y}_{i..} - \bar{y}_{.j.} - \bar{y}_{..k} + 2\bar{y}_{...} \]
Actually,
\[ R = y_{ijk} - \bar{y}_{i..} - \bar{y}_{.j.} - \bar{y}_{..k} + 2\bar{y}_{...} = (y_{ijk} - \bar{y}_{...}) - (\bar{y}_{i..} - \bar{y}_{...}) - (\bar{y}_{.j.} - \bar{y}_{...}) - (\bar{y}_{..k} - \bar{y}_{...}), \]
an “interaction-like” term. (After all, there’s no replication!)
The analysis of variance (omitting the mean squares, which are the ratios of second to third entries), and expectations of mean squares:
Source of variation | Sum of squares | Degrees of freedom | Expected value of mean square
Rows | $m \sum_{i=1}^{m} (\bar{y}_{i..} - \bar{y}_{...})^2$ | $m - 1$ | $\sigma^2 + V_{\text{Rows}}$
Columns | $m \sum_{j=1}^{m} (\bar{y}_{.j.} - \bar{y}_{...})^2$ | $m - 1$ | $\sigma^2 + V_{\text{Col}}$
Inside factor | $m \sum_{k=1}^{m} (\bar{y}_{..k} - \bar{y}_{...})^2$ | $m - 1$ | $\sigma^2 + V_{\text{Inside factor}}$
Error | by subtraction | $(m - 1)(m - 2)$ | $\sigma^2$
Total | $\sum_{i} \sum_{j} \sum_{k} (y_{ijk} - \bar{y}_{...})^2$ (over the $m^2$ observed cells) | $m^2 - 1$ |
The expected values of the mean squares immediately suggest the F ratios appropriate for testing null hypotheses on rows, columns and inside factor.
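Written out, and assuming the usual setting in which each V term equals zero under its null hypothesis, every test divides a factor mean square by the error mean square:

\[ F_{\text{Rows}} = \frac{MS_{\text{Rows}}}{MS_{\text{Error}}}, \qquad F_{\text{Col}} = \frac{MS_{\text{Col}}}{MS_{\text{Error}}}, \qquad F_{\text{Inside factor}} = \frac{MS_{\text{Inside factor}}}{MS_{\text{Error}}} \]

Each ratio is compared with an F distribution having $m - 1$ numerator and $(m - 1)(m - 2)$ denominator degrees of freedom.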
Our Example:
                  Tire Position
Auto.     B1        B2        B3        B4
A1      4  855    3  877    2  890    1  997
A2      1  962    2  817    3  845    4  776
A3      3  848    4  841    1  784    2  776
A4      2  831    1  952    4  806    3  871
(Inside factor = Tire Treatment, shown as the leading digit in each cell)
General Linear Model: Lifetime versus Auto, Postn, Trtmnt
Factor  Type   Levels  Values
Auto    fixed       4  1 2 3 4
Postn   fixed       4  1 2 3 4
Trtmnt  fixed       4  1 2 3 4
Analysis of Variance for Lifetime, using Adjusted SS for Tests
Source  DF  Seq SS  Adj SS  Adj MS     F      P
Auto     3   17567   17567    5856  2.17  0.192
Postn    3    4679    4679    1560  0.58  0.650
Trtmnt   3   26722   26722    8907  3.31  0.099
Error    6   16165   16165    2694
Total   15   65132
Unusual Observations for Lifetime
Obs  Lifetime      Fit  SE Fit  Residual  St Resid
 11   784.000  851.250  41.034   -67.250   -2.12 R
Minitab DATA ENTRY
VAR1  VAR2  VAR3  VAR4
855   1     1     4
962   2     1     1
848   3     1     3
831   4     1     2
877   1     2     3
817   2     2     2
.     .     .     .
.     .     .     .
.     .     .     .
871   4     4     3
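The Latin square sums of squares for the tire data can be reproduced from the formulas given earlier. This is a sketch in plain Python rather than Minitab's algorithm; the names `cells`, `level_means`, and `residual` are illustrative assumptions, and the 16 values are read from the 4 x 4 table of the example.

```python
# (auto, position) -> (treatment, lifetime in days)
cells = {
    (1, 1): (4, 855), (1, 2): (3, 877), (1, 3): (2, 890), (1, 4): (1, 997),
    (2, 1): (1, 962), (2, 2): (2, 817), (2, 3): (3, 845), (2, 4): (4, 776),
    (3, 1): (3, 848), (3, 2): (4, 841), (3, 3): (1, 784), (3, 4): (2, 776),
    (4, 1): (2, 831), (4, 2): (1, 952), (4, 3): (4, 806), (4, 4): (3, 871),
}
m = 4
grand = sum(y for _, y in cells.values()) / m ** 2

def level_means(index):
    """Mean lifetime per level; index 0 = auto, 1 = position, 2 = treatment."""
    totals = [0.0] * m
    for (auto, pos), (trt, y) in cells.items():
        totals[(auto, pos, trt)[index] - 1] += y
    return [total / m for total in totals]

def sum_of_squares(means):
    return m * sum((mean - grand) ** 2 for mean in means)

auto_means, pos_means, trt_means = (level_means(i) for i in range(3))
ss_auto = sum_of_squares(auto_means)
ss_pos = sum_of_squares(pos_means)
ss_trt = sum_of_squares(trt_means)
ss_total = sum((y - grand) ** 2 for _, y in cells.values())
ss_error = ss_total - ss_auto - ss_pos - ss_trt   # by subtraction
mse = ss_error / ((m - 1) * (m - 2))
f_trt = (ss_trt / (m - 1)) / mse

def residual(auto, pos):
    """R = y - row mean - column mean - treatment mean + 2 * grand mean."""
    trt, y = cells[(auto, pos)]
    return (y - auto_means[auto - 1] - pos_means[pos - 1]
            - trt_means[trt - 1] + 2 * grand)

print(ss_auto, ss_pos, ss_trt, ss_error, ss_total)
print(round(f_trt, 2), residual(3, 3))
```

The cell for automobile 3 at position 3 is observation 11 in the data-entry order, which is the one Minitab flags as unusual.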
Latin Square with REPLICATION
• Case One: using the same rows and columns for all Latin squares.
• Case Two: using different rows and columns for different Latin squares.
• Case Three: using the same rows but different columns for different Latin squares.
Treatment Assignments for n Replications
• Case One: repeat the same Latin square n times.
• Case Two: randomly select one Latin square for each replication.
• Case Three: randomly select one Latin square for each replication.
Example: n = 2, m = 4, trtmnt = A,B,C,D
Case One:
column
row 1 2 3 4
1 A B C D
2 B C D A
3 C D A B
4 D A B C
column
row 1 2 3 4
1 A B C D
2 B C D A
3 C D A B
4 D A B C
• Row = 4 tire positions; column = 4 cars
Case Two:
column
row 1 2 3 4
1 A B C D
2 B C D A
3 C D A B
4 D A B C
column
row 5 6 7 8
5 B C D A
6 A D C B
7 D B A C
8 C A B D
• Row = clinics; column = patients; letter = drugs for flu
Case Three:
column
row 1 2 3 4 5 6 7 8
1 A B C D B C D A
2 B C D A A D C B
3 C D A B D B A C
4 D A B C C A B D
• Row = 4 tire positions; column = 8 cars
ANOVA for Case 1
SSBR, SSBC, SSBIF are computed the same way as before, except that the multiplier changes. For rows, say,
\[ m \sum_{i} (\bar{y}_{i..} - \bar{y}_{...})^2 \quad \text{becomes} \quad mn \sum_{i} (\bar{y}_{i..} - \bar{y}_{...})^2 \]
and the degrees of freedom for error become
\[ (nm^2 - 1) - 3(m - 1) = nm^2 - 3m + 2 \]
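A quick arithmetic check of these degrees of freedom, as a sketch only: the helper name `case_one_degrees_of_freedom` is an illustrative assumption, and it follows the model stated here, with rows, columns, and the inside factor as the only terms.

```python
def case_one_degrees_of_freedom(m, n):
    """Degrees of freedom for n replications of one m x m Latin square."""
    total = n * m * m - 1
    factor = m - 1                     # same for rows, columns, inside factor
    error = total - 3 * factor
    return {"rows": factor, "columns": factor, "inside factor": factor,
            "error": error, "total": total}

# n = 2 squares with m = 4 levels, as in the Case One example
print(case_one_degrees_of_freedom(4, 2))
# n = 1 reduces to the unreplicated square, where error df = (m - 1)(m - 2)
print(case_one_degrees_of_freedom(4, 1))
```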
ANOVA for other cases:
Using Minitab in the same way gives ANOVA tables for all cases.
1. SS: please refer to the book Statistical Principles of Research Design and Analysis by R. Kuehl.
2. DF: (# of levels – 1) for all terms except error. DF of error = total DF – the sum of the other DFs.
Three or More Factors
Notation:
• Y = response; A, B, C, … = input factors
• AB = interaction between A and B
• ABC = interaction between A, B, and C
• A term involving k factors has order k: e.g., AB is an order 2 term and ABC is an order 3 term.
• Full model = the model that includes all factors and all of their interactions, denoted as
(1) Two factors
A|B (= A+B+AB)
(2) Three factors
A|B|C (= A+B+C+AB+AC+BC+ABC)
(3) And so on.
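The expansion of the A|B|C notation can be generated mechanically. This is a small sketch, assuming single-letter factor names; `full_model` is an illustrative name, not a Minitab command.

```python
from itertools import combinations

def full_model(factors):
    """List every main effect and interaction, ordered by term order."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append("".join(combo))
    return terms

print(" + ".join(full_model("AB")))    # expansion of A|B
print(" + ".join(full_model("ABC")))   # expansion of A|B|C
print(len(full_model("ABCD")))         # a full model with k factors has 2**k - 1 terms
```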
Backward Model Selection
1. Fit the full model and delete the least significant (largest p-value) highest-order term, provided it is insignificant.
2. Fit the reduced model from step 1 and again delete the least significant highest-order term, provided it is insignificant.
3. Repeat step 2 until all remaining highest-order terms are significant.
4. Repeat the same procedure (deleting the least significant term each time, until no insignificant terms remain) for the 2nd highest order, then the 3rd highest order, …, and finally the order 1 terms.
5. Determine the final model and check its assumptions.
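The procedure can be sketched as a loop. Everything here is an assumption for illustration: `backward_select` is a hypothetical helper, `fit_pvalues` stands for whatever refits the model (for example, a Minitab run) and returns one p-value per term, factor names are single letters, and the p-values in the usage example are made up. Terms contained in a higher-order term that is still in the model are never candidates for deletion.

```python
def backward_select(terms, fit_pvalues, alpha=0.05):
    """Backward selection, working from the highest order down to order 1."""
    model = list(terms)
    for order in range(max(len(t) for t in model), 0, -1):
        while True:
            pvals = fit_pvalues(model)            # refit the current model
            candidates = [
                t for t in model
                if len(t) == order
                and not any(set(t) < set(u) for u in model)  # keep lower-order parts
                and pvals[t] > alpha                          # insignificant
            ]
            if not candidates:
                break
            # delete only the least significant term, then refit
            model.remove(max(candidates, key=lambda t: pvals[t]))
    return model

# Hypothetical p-values, held fixed for the demonstration; in a real
# analysis they change every time the model is refitted.
toy = {"A": 0.01, "B": 0.30, "C": 0.02, "AB": 0.60,
       "AC": 0.03, "BC": 0.40, "ABC": 0.70}
final = backward_select(list(toy), lambda model: {t: toy[t] for t in model})
print(final)
```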
Note.
If a term is in the current model, then all lower-order terms involving the factors in that term must not be deleted, even if they are insignificant.
e.g., if ABC is significant (so it is in the model), then A, B, C, AB, AC, BC cannot be deleted.
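This rule amounts to a subset check on the factors of each term. A minimal sketch, assuming single-letter factor names; `deletable_terms` is an illustrative name.

```python
def deletable_terms(model):
    """Terms not contained in any other term of the model.

    Only these may be considered for deletion; every other term is a
    lower-order part of some term that is still in the model.
    """
    factor_sets = [frozenset(term) for term in model]
    return [term for term, factors in zip(model, factor_sets)
            if not any(factors < other for other in factor_sets)]

full = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
print(deletable_terms(full))                       # while ABC is in the model
print(deletable_terms(full[:-1]))                  # after ABC has been deleted
print(deletable_terms(["A", "B", "C", "AB"]))      # C is not part of AB
```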
Note.
Backward model selection can be very time-consuming if the number of factors, k, is large. In such cases, when processing terms of order 4 or higher, we delete all insignificant terms together rather than one at a time.
• Examples are in Minitab unit 11.