Post on 20-Dec-2015
Stat 470-7
• Today: Transformation of the response; Latin-squares
Transformations (Section 2.5)
• Often one will perform a residual analysis to verify modeling assumptions…and at least one assumption fails
• A defect that can frequently arise is non-constant variance
• This can occur, for example, when the data follow a non-normal, skewed distribution
• The F-test in ANOVA is only slightly affected by mild violations of this assumption
• In such cases, a variance-stabilizing transformation may be applied
Transformations
• Several transformations may be attempted:
– Y*=
– Y*=
– Y*=
Transformations
• Analyze the data on the Y* scale, choosing the transformation where:
– The simplest model results,
– There are no patterns in the residuals
– One can interpret the transformation
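A minimal sketch of what "variance stabilizing" means in practice. Everything here is an assumption for illustration: the data are simulated Poisson counts (a skewed distribution whose variance grows with its mean), and the candidate transformations are common choices, not necessarily the ones listed on the slide. The sketch compares how unequal the group standard deviations are on each scale.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical skewed data: Poisson counts with three different group means.
group_means = [4, 16, 64]
groups = [rng.poisson(lam=m, size=2000) for m in group_means]

# Assumed candidate transformations; the +1 offsets guard against log(0) and 1/0.
transforms = {
    "Y": lambda y: y.astype(float),
    "sqrt(Y)": lambda y: np.sqrt(y),
    "log(Y + 1)": lambda y: np.log(y + 1.0),
    "1 / (Y + 1)": lambda y: 1.0 / (y + 1.0),
}

def sd_ratio(f):
    """Largest group SD divided by smallest group SD on the transformed scale."""
    sds = [np.std(f(g), ddof=1) for g in groups]
    return max(sds) / min(sds)

ratios = {name: sd_ratio(f) for name, f in transforms.items()}
for name, r in ratios.items():
    print(f"{name:12s} largest SD / smallest SD = {r:.2f}")
```

A ratio near 1 indicates roughly constant variance across groups. For count-like data of this kind the square root is the scale on which the ratio comes closest to 1; other data may call for a different member of the family.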
Example
• An engineer wishes to study the impact of 4 factors on the rate of advance of a drill. Each of the 4 factors (labeled A-D) was studied at 2 levels
 A    B    C    D    Y
-1   -1   -1   -1    1.68
+1   -1   -1   -1    1.98
-1   +1   -1   -1    3.28
+1   +1   -1   -1    3.44
-1   -1   +1   -1    4.98
+1   -1   +1   -1    5.70
-1   +1   +1   -1    9.97
+1   +1   +1   -1    9.07
-1   -1   -1   +1    2.07
+1   -1   -1   +1    2.44
-1   +1   -1   +1    4.09
+1   +1   -1   +1    4.53
-1   -1   +1   +1    7.77
+1   -1   +1   +1    9.43
-1   +1   +1   +1   11.75
+1   +1   +1   +1   16.30
Example
• Would like to fit an N-way ANOVA to these data (main effects and 2-factor interactions only)
• Model:
Example
Tests of Between-Subjects Effects
Dependent Variable: Y
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   257.614 (a)               10   25.761        25.406    .001
Intercept         606.144                   1    606.144       597.781   .000
A                 3.331                     1    3.331         3.285     .130
B                 43.494                    1    43.494        42.894    .001
C                 165.508                   1    165.508       163.225   .000
D                 20.885                    1    20.885        20.597    .006
A * B             9.000E-02                 1    9.000E-02     .089      .778
A * C             1.416                     1    1.416         1.397     .290
A * D             2.839                     1    2.839         2.800     .155
B * C             9.060                     1    9.060         8.935     .030
B * D             .783                      1    .783          .772      .420
C * D             10.208                    1    10.208        10.067    .025
Error             5.070                     5    1.014
Total             868.829                   16
Corrected Total   262.684                   15
a. R Squared = .981 (Adjusted R Squared = .942)
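The sums of squares in this table can be checked by hand. The sketch below assumes only numpy and the 16 responses from the data table, taken in the order listed (A changing fastest). It relies on the fact that in a balanced two-level design every main effect and two-factor interaction has 1 df and a sum of squares equal to contrast² / n. Variable names are illustrative.

```python
import itertools
import numpy as np

# Drill-advance responses in the order of the data table (A changes fastest).
y = np.array([1.68, 1.98, 3.28, 3.44, 4.98, 5.70, 9.97, 9.07,
              2.07, 2.44, 4.09, 4.53, 7.77, 9.43, 11.75, 16.30])

# Rebuild the -1/+1 design columns: bit 0 of the run index is A, bit 1 is B, ...
runs = np.arange(16)
cols = {name: np.where((runs >> bit) & 1, 1, -1)
        for bit, name in enumerate("ABCD")}

# Add the six two-factor interaction columns as elementwise products.
for f, g in itertools.combinations("ABCD", 2):
    cols[f + "*" + g] = cols[f] * cols[g]

n = len(y)
# Each effect: SS = (contrast)^2 / n, with 1 degree of freedom.
ss = {name: float(x @ y) ** 2 / n for name, x in cols.items()}

ss_total = float(np.sum((y - y.mean()) ** 2))  # corrected total, 15 df
ss_error = ss_total - sum(ss.values())         # remainder, 15 - 10 = 5 df
ms_error = ss_error / 5

for name, value in ss.items():
    print(f"{name:4s} SS = {value:8.3f}   F = {value / ms_error:8.3f}")
print(f"Error SS = {ss_error:.3f} on 5 df")
```

Because the design is orthogonal, the Type III sums of squares reported by the software coincide with these contrast-based values.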
Example
[Figure: Residuals vs. Predicted. x-axis: Predicted Value for Y (0 to 16); y-axis: Residual for Y (-1.5 to 1.5)]
Example
Tests of Between-Subjects Effects
Dependent Variable: SQRTY
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   9.876 (a)                 10   .988          53.540     .000
Intercept         88.512                    1    88.512        4798.513   .000
A                 .103                      1    .103          5.610      .064
B                 1.735                     1    1.735         94.084     .000
C                 7.011                     1    7.011         380.070    .000
D                 .688                      1    .688          37.296     .002
A * B             2.269E-04                 1    2.269E-04     .012       .916
A * C             1.683E-02                 1    1.683E-02     .912       .383
A * D             5.731E-02                 1    5.731E-02     3.107      .138
B * C             6.818E-02                 1    6.818E-02     3.696      .113
B * D             3.726E-03                 1    3.726E-03     .202       .672
C * D             .192                      1    .192          10.409     .023
Error             9.223E-02                 5    1.845E-02
Total             98.480                    16
Corrected Total   9.968                     15
a. R Squared = .991 (Adjusted R Squared = .972)
Example
[Figure: Residuals vs. Predicted. x-axis: Predicted Value for SQRTY (1.0 to 4.0); y-axis: Residual for SQRTY (-1.0 to 1.0)]
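To compare the two scales side by side, the same model (main effects plus two-factor interactions) can be fitted to Y and to √Y by least squares. This is a rough sketch assuming numpy only. The helper `fit` and the split of residuals into "low fitted" and "high fitted" halves are illustrative devices for spotting a funnel shape numerically, not part of the original analysis.

```python
import itertools
import numpy as np

# Drill-advance responses in the order of the data table (A changes fastest).
y = np.array([1.68, 1.98, 3.28, 3.44, 4.98, 5.70, 9.97, 9.07,
              2.07, 2.44, 4.09, 4.53, 7.77, 9.43, 11.75, 16.30])

runs = np.arange(16)
main = [np.where((runs >> bit) & 1, 1.0, -1.0) for bit in range(4)]  # A, B, C, D
inter = [a * b for a, b in itertools.combinations(main, 2)]          # six 2-factor terms
X = np.column_stack([np.ones(16)] + main + inter)                    # 16 x 11 model matrix

def fit(response):
    """Least-squares fit; returns fitted values, residuals and R squared."""
    beta, *_ = np.linalg.lstsq(X, response, rcond=None)
    fitted = X @ beta
    resid = response - fitted
    r2 = 1.0 - (resid @ resid) / np.sum((response - response.mean()) ** 2)
    return fitted, resid, r2

results = {}
for label, response in [("Y", y), ("sqrt(Y)", np.sqrt(y))]:
    fitted, resid, r2 = fit(response)
    results[label] = r2
    # Crude funnel check: residual size for the 8 smallest vs 8 largest fitted values.
    order = np.argsort(fitted)
    low, high = resid[order[:8]], resid[order[8:]]
    print(f"{label:8s} R^2 = {r2:.3f}   "
          f"RMS resid (low fitted) = {np.sqrt(np.mean(low ** 2)):.3f}   "
          f"RMS resid (high fitted) = {np.sqrt(np.mean(high ** 2)):.3f}")
```

If the two RMS values are far apart on one scale and close on the other, that is numerical evidence for the pattern the residual plots are meant to reveal.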
A New Example
• A scientist wishes to investigate the effect of 5 different ingredients (A-E) on the reaction time of a chemical process
• The scientist has enough resources to perform 25 trials
• Each batch of raw material is only large enough to permit 5 runs to be made
• Each run takes about 1.5 hours, so only 5 runs can be performed in a day
Example
• How can we run the experiment?
Day/Batch   1   2   3   4   5
1
2
3
4
5
Two Blocking Variables; 1 Factor
• Can set up an experiment to remove the effect of 2 blocking variables (e.g., season and time of day)
• Experiment is an example of a 5x5 Latin Squares Design
Latin Squares Design
• Situation:
– Have 2 blocking factors - one for rows and one for columns
– Have 1 experimental factor
– Each factor has k levels
– Design is arranged so that each level of the experimental factor appears exactly once in each row and each column
– The levels of the two blocking factors are assigned at random to the columns and rows
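• Model: $y_{ijl} = \eta + \alpha_i + \beta_j + \tau_l + \epsilon_{ijl}$ (overall mean, row effect, column effect, treatment effect, error)
• i, j = 1, 2, …, k
• l indicates the index for the Latin letter in the (i,j)th cell
• The triplet (i,j,l) takes on only the k² values determined by the particular Latin square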
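One way to produce such a design is sketched below, using only the Python standard library. It starts from a cyclic square and shuffles rows, columns and letter labels. The function names are hypothetical, and this construction does not sample uniformly from all possible Latin squares. It only guarantees that the result is a valid one.

```python
import random

def random_latin_square(k, seed=None):
    """Return a k x k Latin square of letters, randomised by shuffling
    the rows, columns and letter labels of a cyclic square."""
    rng = random.Random(seed)
    letters = [chr(ord("A") + i) for i in range(k)]
    rng.shuffle(letters)      # random assignment of treatments to letters
    rows = list(range(k))
    rng.shuffle(rows)         # random row order
    cols = list(range(k))
    rng.shuffle(cols)         # random column order
    # Cell (r, c) of the cyclic square holds letter index (r + c) mod k.
    return [[letters[(r + c) % k] for c in cols] for r in rows]

def is_latin(square):
    """True if every symbol appears exactly once in each row and each column."""
    k = len(square)
    symbols = set(square[0])
    return (len(symbols) == k
            and all(set(row) == symbols for row in square)
            and all({row[j] for row in square} == symbols for j in range(k)))

square = random_latin_square(5, seed=470)
for row in square:
    print(" ".join(row))
```

Rows and columns of the printed square correspond to the levels of the two blocking factors, and the letters to the levels of the experimental factor.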
Notes
• No interaction, since interactions cannot be estimated in an un-replicated experiment
• Usual assumptions apply
ANOVA Decomposition
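For reference, the usual decomposition for a k×k Latin square, written with the indices i (row), j (column) and l (Latin letter) used in the model. The dot-and-bar notation for averages and the symbol r for the residual are assumptions of this sketch.

```latex
y_{ijl} - \bar{y}_{\cdot\cdot\cdot}
  = (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})
  + (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})
  + (\bar{y}_{\cdot\cdot l} - \bar{y}_{\cdot\cdot\cdot})
  + r_{ijl},
\qquad
r_{ijl} = y_{ijl} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot}
          - \bar{y}_{\cdot\cdot l} + 2\,\bar{y}_{\cdot\cdot\cdot}

\sum_{(i,j,l)} (y_{ijl} - \bar{y}_{\cdot\cdot\cdot})^2
  = k \sum_{i=1}^{k} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2
  + k \sum_{j=1}^{k} (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2
  + k \sum_{l=1}^{k} (\bar{y}_{\cdot\cdot l} - \bar{y}_{\cdot\cdot\cdot})^2
  + \sum_{(i,j,l)} r_{ijl}^2
```

The four terms on the right are the row, column, treatment and residual sums of squares, with k-1, k-1, k-1 and (k-1)(k-2) degrees of freedom respectively.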
ANOVA Table
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares   F
Row                   k-1
Column                k-1
Treatment             k-1
Residual              (k-1)(k-2)
Total                 k²-1
Hypotheses
Multiple Comparisons
Example
Day/Batch   1        2        3        4        5
1           (A) 8    (B) 7    (D) 1    (C) 7    (E) 3
2           (C) 11   (E) 2    (A) 7    (D) 3    (B) 8
3           (B) 4    (A) 9    (C) 10   (E) 1    (D) 5
4           (D) 6    (C) 8    (E) 6    (B) 6    (A) 10
5           (E) 4    (D) 2    (B) 3    (A) 8    (C) 8
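The Latin-square ANOVA can be computed directly from a square like this one. The sketch below assumes numpy and takes the square exactly as printed above, one array row per line of the table. It labels the blocking factors neutrally as "Row" and "Column", since the printed header does not settle which one is Day and which is Batch. All names are illustrative.

```python
import numpy as np

# Reaction times and ingredient letters, one array row per line of the square.
times = np.array([[ 8,  7,  1,  7,  3],
                  [11,  2,  7,  3,  8],
                  [ 4,  9, 10,  1,  5],
                  [ 6,  8,  6,  6, 10],
                  [ 4,  2,  3,  8,  8]], dtype=float)
letters = np.array([list("ABDCE"),
                    list("CEADB"),
                    list("BACED"),
                    list("DCEBA"),
                    list("EDBAC")])

k = times.shape[0]
grand = times.mean()

def ss_of_means(means):
    """k times the sum of squared deviations of the level means from the grand mean."""
    return k * float(np.sum((np.asarray(means) - grand) ** 2))

ss_row = ss_of_means(times.mean(axis=1))
ss_col = ss_of_means(times.mean(axis=0))
ss_trt = ss_of_means([times[letters == t].mean() for t in "ABCDE"])
ss_total = float(np.sum((times - grand) ** 2))
ss_resid = ss_total - ss_row - ss_col - ss_trt

df_effect, df_resid = k - 1, (k - 1) * (k - 2)
ms_resid = ss_resid / df_resid
for name, ss in [("Row", ss_row), ("Column", ss_col), ("Treatment", ss_trt)]:
    print(f"{name:10s} df={df_effect:2d}  SS={ss:8.2f}  F={(ss / df_effect) / ms_resid:6.2f}")
print(f"{'Residual':10s} df={df_resid:2d}  SS={ss_resid:8.2f}")
print(f"{'Total':10s} df={k * k - 1:2d}  SS={ss_total:8.2f}")
```

Run on the square as printed, this gives a corrected total sum of squares of 206.64 on 24 df and a column sum of squares of 12.24. The row and treatment sums of squares depend on which cell carries which letter, so a mismatch with software output is worth reading as a prompt to re-check how the data were entered.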
Useful Plots
[Figure: TIME (0 to 12) vs. BATCH]
[Figure: TIME (0 to 12) vs. DAY]
[Figure: TIME (0 to 12) vs. Ingredient]
Example
Tests of Between-Subjects Effects
Dependent Variable: TIME
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   125.120 (a)               12   10.427        1.535     .235
Intercept         864.360                   1    864.360       127.237   .000
BATCH             11.440                    4    2.860         .421      .791
DAY               12.240                    4    3.060         .450      .770
INGRED            101.440                   4    25.360        3.733     .034
Error             81.520                    12   6.793
Total             1071.000                  25
Corrected Total   206.640                   24
a. R Squared = .605 (Adjusted R Squared = .211)
Example
Multiple Comparisons
Dependent Variable: TIME
Tukey HSD
                                                                     95% Confidence Interval
(I) INGRED   (J) INGRED   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
A            B            1.8000                  1.64843      .807   -3.4543       7.0543
A            C            -1.4000                 1.64843      .910   -6.6543       3.8543
A            D            3.0000                  1.64843      .407   -2.2543       8.2543
A            E            4.2000                  1.64843      .144   -1.0543       9.4543
B            A            -1.8000                 1.64843      .807   -7.0543       3.4543
B            C            -3.2000                 1.64843      .348   -8.4543       2.0543
B            D            1.2000                  1.64843      .946   -4.0543       6.4543
B            E            2.4000                  1.64843      .607   -2.8543       7.6543
C            A            1.4000                  1.64843      .910   -3.8543       6.6543
C            B            3.2000                  1.64843      .348   -2.0543       8.4543
C            D            4.4000                  1.64843      .118   -.8543        9.6543
C            E            5.6000*                 1.64843      .035   .3457         10.8543
D            A            -3.0000                 1.64843      .407   -8.2543       2.2543
D            B            -1.2000                 1.64843      .946   -6.4543       4.0543
D            C            -4.4000                 1.64843      .118   -9.6543       .8543
D            E            1.2000                  1.64843      .946   -4.0543       6.4543
E            A            -4.2000                 1.64843      .144   -9.4543       1.0543
E            B            -2.4000                 1.64843      .607   -7.6543       2.8543
E            C            -5.6000*                1.64843      .035   -10.8543      -.3457
E            D            -1.2000                 1.64843      .946   -6.4543       4.0543
Based on observed means.
*. The mean difference is significant at the .05 level.
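The standard error and interval widths in the Tukey table follow from the error mean square of the TIME ANOVA. A small sketch using only the standard library. The critical value `q_crit = 4.51` is an assumption taken from a standard studentized-range table for 5 means and 12 error df at the .05 level, so the computed margin agrees with the software only to about two decimals.

```python
import math

k = 5                    # observations per ingredient in the 5x5 square
ms_error = 81.520 / 12   # error mean square from the TIME ANOVA table (12 df)
q_crit = 4.51            # assumed tabulated value q(0.05; 5 means, 12 df)

se_diff = math.sqrt(2 * ms_error / k)         # std. error of a difference of two means
half_width = q_crit / math.sqrt(2) * se_diff  # Tukey HSD margin for each interval

# Mean differences (I - J) copied from the table, one entry per unordered pair.
diffs = {("A", "B"): 1.8, ("A", "C"): -1.4, ("A", "D"): 3.0, ("A", "E"): 4.2,
         ("B", "C"): -3.2, ("B", "D"): 1.2, ("B", "E"): 2.4,
         ("C", "D"): 4.4, ("C", "E"): 5.6, ("D", "E"): 1.2}

# A pair differs significantly when |difference| exceeds the margin.
significant = [pair for pair, d in diffs.items() if abs(d) > half_width]

print(f"Std. error of a difference: {se_diff:.5f}")
print(f"Tukey margin: {half_width:.4f}")
print("Significant pairs:", significant)
```

Each interval in the table is the mean difference plus or minus this margin, which is why only the C versus E comparison, the one difference larger than the margin, is flagged.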