Continuation of inference testing 9E.1 : Inference Testing for Linear Regression
An alternative approach to testing for a linear association
![Page 1: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/1.jpg)
An alternative approach to testing for a linear association
The Analysis of Variance (ANOVA) Table
![Page 2: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/2.jpg)
Translating a research question into a statistical procedure
• Is there a (linear) relationship between skin cancer mortality and latitude?
– How ??
– Also, the (analysis of variance) F-test
• Is there a (linear) relationship between height and grade point average?
– How ??
– Also, the (analysis of variance) F-test
![Page 3: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/3.jpg)
Where does this topic fit in?
• Model formulation
• Model estimation
• Model evaluation
• Model use
![Page 4: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/4.jpg)
Example: Skin cancer mortality and latitude
[Scatterplot: Skin cancer mortality versus State latitude. Horizontal axis: Latitude (at center of state). Vertical axis: Mortality (Deaths per 10 million).]
![Page 5: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/5.jpg)
Example: Skin cancer mortality and latitude
The regression equation is Mort = 389 - 5.98 Lat
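
| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 389.19 | 23.81 | 16.34 | 0.000 |
| Lat | -5.9776 | 0.5984 | -9.99 | 0.000 |

S = 19.12   R-Sq = 68.0%   R-Sq(adj) = 67.3%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 36464 | 36464 | 99.80 | 0.000 |
| Residual Error | 47 | 17173 | 365 | | |
| Total | 48 | 53637 | | | |
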
![Page 6: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/6.jpg)
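Example: Skin cancer mortality and latitude
[Scatterplot of Mortality (Deaths per 10 million) versus Latitude (at center of state), annotated with the fitted line, the sample mean, an observed response $y_i$, and the three sums of squares below.]

$$\bar{y} = 152.88 \qquad \hat{y}_i = 389.19 - 5.98\,x_i$$

$$\sum_{i=1}^{n}(y_i-\bar{y})^2 = 53637 \qquad \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 = 36464 \qquad \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = 17173$$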
![Page 7: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/7.jpg)
Example: Skin cancer mortality and latitude
The regression equation is Mort = 389 - 5.98 Lat
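
| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 389.19 | 23.81 | 16.34 | 0.000 |
| Lat | -5.9776 | 0.5984 | -9.99 | 0.000 |

S = 19.12   R-Sq = 68.0%   R-Sq(adj) = 67.3%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 36464 | 36464 | 99.80 | 0.000 |
| Residual Error | 47 | 17173 | 365 | | |
| Total | 48 | 53637 | | | |
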
![Page 8: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/8.jpg)
Example: Height and GPA
[Scatterplot of G.P.A. versus Height (inches).]
![Page 9: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/9.jpg)
Example: Height and GPA
The regression equation is gpa = 3.41 - 0.0066 height
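
| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 3.410 | 1.435 | 2.38 | 0.023 |
| height | -0.00656 | 0.02143 | -0.31 | 0.761 |

S = 0.5423   R-Sq = 0.3%   R-Sq(adj) = 0.0%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 0.0276 | 0.0276 | 0.09 | 0.761 |
| Residual Error | 33 | 9.7055 | 0.2941 | | |
| Total | 34 | 9.7331 | | | |
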
![Page 10: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/10.jpg)
Example: Height and GPA
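[Scatterplot of gpa versus height, annotated with the fitted line, the sample mean, an observed response $y_i$, and the three sums of squares below.]

$$\bar{y} = 2.97 \qquad \hat{y} = 3.41 - 0.0066\,x$$

$$\sum_{i=1}^{n}(y_i-\bar{y})^2 = 9.7331 \qquad \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 = 0.0276 \qquad \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = 9.7055$$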
![Page 11: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/11.jpg)
Example: Height and GPA
The regression equation is gpa = 3.41 - 0.0066 height
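
| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 3.410 | 1.435 | 2.38 | 0.023 |
| height | -0.00656 | 0.02143 | -0.31 | 0.761 |

S = 0.5423   R-Sq = 0.3%   R-Sq(adj) = 0.0%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 0.0276 | 0.0276 | 0.09 | 0.761 |
| Residual Error | 33 | 9.7055 | 0.2941 | | |
| Total | 34 | 9.7331 | | | |
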
![Page 12: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/12.jpg)
The basic idea
• Break down the total variation in y (“total sum of squares”) into two components:
– a component that is “due to” the change in x (“regression sum of squares”)
– a component that is just due to random error (“error sum of squares”)
• If the regression sum of squares is a large component of the total sum of squares, it suggests that there is a linear association between x and y.
![Page 13: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/13.jpg)
A geometric decomposition
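[Plot of y versus x showing the fitted line $\hat{y} = b_0 + b_1 x$, the sample mean $\bar{y}$, and an observed response $y_i$.]

$$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$$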
![Page 14: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/14.jpg)
The decomposition holds for the sum of the squared deviations, too:
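
$$\sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$

• Total sum of squares (SSTO): $\sum_{i=1}^{n}(y_i-\bar{y})^2$
• Regression sum of squares (SSR): $\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2$
• Error sum of squares (SSE): $\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$

$$SSTO = SSR + SSE$$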
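
The slide states the squared decomposition without showing why the cross-product term disappears. A short sketch of that step follows, assuming $b_0$ and $b_1$ are the least-squares estimates, so that the residuals $e_i = y_i - \hat{y}_i$ satisfy $\sum e_i = 0$ and $\sum x_i e_i = 0$ (the normal equations; the symbol $e_i$ is introduced here for illustration and does not appear on the slides).

```latex
\begin{align*}
\sum_{i=1}^{n}(y_i-\bar{y})^2
  &= \sum_{i=1}^{n}\bigl[(\hat{y}_i-\bar{y}) + (y_i-\hat{y}_i)\bigr]^2 \\
  &= \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2
   + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2
   + 2\sum_{i=1}^{n}(\hat{y}_i-\bar{y})\,e_i , \\[4pt]
\sum_{i=1}^{n}(\hat{y}_i-\bar{y})\,e_i
  &= b_0\sum_{i=1}^{n} e_i + b_1\sum_{i=1}^{n} x_i e_i - \bar{y}\sum_{i=1}^{n} e_i = 0 .
\end{align*}
```

With the cross term equal to zero, what remains is exactly SSTO = SSR + SSE.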
![Page 15: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/15.jpg)
Breakdown of degrees of freedom

$$n - 1 = 1 + (n - 2)$$

• Degrees of freedom associated with SSTO: $n-1$
• Degrees of freedom associated with SSR: $1$
• Degrees of freedom associated with SSE: $n-2$
![Page 16: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/16.jpg)
Example: Skin cancer mortality and latitude
The regression equation is Mort = 389 - 5.98 Lat
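
| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 389.19 | 23.81 | 16.34 | 0.000 |
| Lat | -5.9776 | 0.5984 | -9.99 | 0.000 |

S = 19.12   R-Sq = 68.0%   R-Sq(adj) = 67.3%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 36464 | 36464 | 99.80 | 0.000 |
| Residual Error | 47 | 17173 | 365 | | |
| Total | 48 | 53637 | | | |
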
![Page 17: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/17.jpg)
Definitions of Mean Squares
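We already know the mean square error (MSE) is defined as:

$$MSE = \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-2} = \frac{SSE}{n-2}$$

Similarly, the regression mean square (MSR) is defined as:

$$MSR = \frac{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2}{1} = \frac{SSR}{1}$$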
![Page 18: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/18.jpg)
Expected Mean Squares

$$E(MSR) = \sigma^2 + \beta_1^2\sum_{i=1}^{n}(X_i-\bar{X})^2$$

$$E(MSE) = \sigma^2$$

• If β1 = 0, we’d expect the ratio MSR/MSE to be …
• If β1 ≠ 0, we’d expect the ratio MSR/MSE to be …
• Use the ratio MSR/MSE to decide whether or not to reject β1 = 0.
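
One way to explore the two questions above before reading on is a small simulation. The sketch below is illustrative only: the predictor values, intercept, error standard deviation, number of simulated data sets and seed are all hypothetical choices, not taken from the slides. It averages MSR/MSE over many simulated data sets, once with a true slope of 0 and once with a nonzero slope.

```python
import random

def f_ratio(x, y):
    """Return MSR/MSE for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return (ssr / 1) / (sse / (n - 2))

def mean_f_ratio(beta1, n_sims=2000, seed=0):
    """Average MSR/MSE over simulated data sets whose true slope is beta1."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 21)]    # hypothetical fixed predictor values
    total = 0.0
    for _ in range(n_sims):
        # hypothetical model: intercept 5, normal errors with sigma = 2
        y = [5.0 + beta1 * xi + rng.gauss(0.0, 2.0) for xi in x]
        total += f_ratio(x, y)
    return total / n_sims

mean_null = mean_f_ratio(beta1=0.0)
mean_alt = mean_f_ratio(beta1=0.5)
print(f"average MSR/MSE when beta1 = 0:   {mean_null:.2f}")
print(f"average MSR/MSE when beta1 = 0.5: {mean_alt:.2f}")
```

Comparing the two printed averages with the expected mean squares E(MSR) and E(MSE) shows how the size of the ratio carries information about β1.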
![Page 19: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/19.jpg)
Analysis of Variance (ANOVA) Table
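
| Source of variation | DF | SS | MS | F |
|---|---|---|---|---|
| Regression | $1$ | $SSR = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2$ | $MSR = \dfrac{SSR}{1}$ | $F = \dfrac{MSR}{MSE}$ |
| Residual error | $n-2$ | $SSE = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$ | $MSE = \dfrac{SSE}{n-2}$ | |
| Total | $n-1$ | $SSTO = \sum_{i=1}^{n}(y_i-\bar{y})^2$ | | |
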
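
As a minimal sketch of how the table entries follow from one another, the hypothetical helper `anova_table` below (not part of Minitab or of the slides) fills in the DF, MS and F columns from SSR, SSE and the sample size. It is applied to the skin cancer example, taking n = 49 from the total degrees of freedom of 48.

```python
def anova_table(ssr, sse, n):
    """Build the simple-linear-regression ANOVA quantities from SSR, SSE and n."""
    df_regression = 1
    df_error = n - 2
    msr = ssr / df_regression       # regression mean square
    mse = sse / df_error            # mean square error
    return {
        "df": (df_regression, df_error, n - 1),
        "ss": (ssr, sse, ssr + sse),    # SSTO = SSR + SSE
        "msr": msr,
        "mse": mse,
        "f": msr / mse,
    }

# Skin cancer mortality example: SSR = 36464, SSE = 17173, n = 49
table = anova_table(ssr=36464, sse=17173, n=49)
print(table["df"])              # (1, 47, 48)
print(table["ss"][2])           # 53637
print(round(table["f"], 1))     # 99.8
```

The results agree with the DF, Total SS and F entries in the Minitab output for this example.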
![Page 20: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/20.jpg)
The formal F-test for slope parameter β1
Null hypothesis H0: β1 = 0
Alternative hypothesis HA: β1 ≠ 0

Test statistic:

$$F^* = \frac{MSR}{MSE}$$

P-value = the probability of getting an F* statistic at least as large as the one observed, if the null hypothesis is true.
The P-value is determined by comparing F* to an F distribution with 1 numerator degree of freedom and n-2 denominator degrees of freedom.
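
The P-value lookup can be sketched in code. In practice a statistics library would be the usual route (for example `scipy.stats.f.sf`); the dependency-free version below is an illustration only. It relies on the standard fact that an F variable with 1 and d degrees of freedom is the square of a t variable with d degrees of freedom, and approximates the tail area by numerical integration. The function names and the number of integration steps are hypothetical choices.

```python
import math

def t_density(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

def f_test_p_value(f_star, df_error, steps=20000):
    """Approximate P(F(1, df_error) >= f_star).

    The tail area equals 1 - 2 * (integral of the t density over
    [0, sqrt(f_star)]), computed here with composite Simpson's rule
    (steps must be even).
    """
    upper = math.sqrt(f_star)
    h = upper / steps
    total = t_density(0.0, df_error) + t_density(upper, df_error)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * t_density(k * h, df_error)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

# Height and GPA example: F* = MSR/MSE = 0.0276/0.2941 with n - 2 = 33
p_gpa = f_test_p_value(0.0276 / 0.2941, df_error=33)
# Skin cancer example: F* = 99.80 with n - 2 = 47
p_skin = f_test_p_value(99.80, df_error=47)
print(f"height/GPA P-value:  {p_gpa:.3f}")
print(f"skin cancer P-value: {p_skin:.3f}")
```

The two printed values can be compared with the P column of the corresponding Analysis of Variance tables.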
![Page 21: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/21.jpg)
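
| Row | Year | Men200m |
|---|---|---|
| 1 | 1900 | 22.20 |
| 2 | 1904 | 21.60 |
| 3 | 1908 | 22.60 |
| 4 | 1912 | 21.70 |
| 5 | 1920 | 22.00 |
| 6 | 1924 | 21.60 |
| 7 | 1928 | 21.80 |
| 8 | 1932 | 21.20 |
| 9 | 1936 | 20.70 |
| 10 | 1948 | 21.10 |
| 11 | 1952 | 20.70 |
| 12 | 1956 | 20.60 |
| 13 | 1960 | 20.50 |
| 14 | 1964 | 20.30 |
| 15 | 1968 | 19.83 |
| 16 | 1972 | 20.00 |
| 17 | 1976 | 20.23 |
| 18 | 1980 | 20.19 |
| 19 | 1984 | 19.80 |
| 20 | 1988 | 19.75 |
| 21 | 1992 | 20.01 |
| 22 | 1996 | 19.32 |
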
Winning times (in seconds) in Men’s 200 meter Olympic sprints, 1900-1996.
Are men getting faster?
![Page 22: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/22.jpg)
[Regression Plot: Men200m versus Year, with fitted line.]
Men200m = 76.1534 - 0.0283833 Year
S = 0.298134   R-Sq = 89.9 %   R-Sq(adj) = 89.4 %
![Page 23: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/23.jpg)
Analysis of Variance Table
Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 15.8 | 15.8 | 177.7 | 0.000 |
| Residual Error | 20 | 1.8 | 0.09 | | |
| Total | 21 | 17.6 | | | |

DFE = n-2 = 22-2 = 20
DFTO = n-1 = 22-1 = 21
MSR = SSR/1 = 15.8
MSE = SSE/(n-2) = 1.8/20 = 0.09
F* = MSR/MSE = 15.796/0.089 = 177.7
P = Probability that an F(1,20) random variable is greater than 177.7 = 0.000…
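
The calculations above start from the sums of squares that Minitab reports. As a sketch of where those numbers come from, the following plain-Python code fits the least-squares line to the 22 winning times listed earlier and rebuilds the ANOVA quantities from the raw data. The variable names are illustrative.

```python
years = [1900, 1904, 1908, 1912, 1920, 1924, 1928, 1932, 1936, 1948, 1952,
         1956, 1960, 1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992, 1996]
times = [22.20, 21.60, 22.60, 21.70, 22.00, 21.60, 21.80, 21.20, 20.70,
         21.10, 20.70, 20.60, 20.50, 20.30, 19.83, 20.00, 20.23, 20.19,
         19.80, 19.75, 20.01, 19.32]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(times) / n

# Least-squares slope and intercept
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, times))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in years]

# Sums of squares
ssto = sum((y - y_bar) ** 2 for y in times)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(times, fitted))

# Mean squares and the F statistic
msr = ssr / 1
mse = sse / (n - 2)
f_star = msr / mse

print(f"Men200m = {b0:.4f} + ({b1:.7f}) Year")
print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, SSTO = {ssto:.3f}")
print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F* = {f_star:.1f}")
```

Carrying the unrounded sums of squares through the calculation is what yields the reported F* of 177.7; the one-decimal values in the table are rounded for display.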
![Page 24: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/24.jpg)
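For the simple linear regression model, the F-test and t-test are equivalent.

$$(-13.33)^2 = 177.7$$

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 76.153 | 4.152 | 18.34 | 0.000 |
| Year | -0.0284 | 0.00213 | -13.33 | 0.000 |

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 15.796 | 15.796 | 177.7 | 0.000 |
| Residual Error | 20 | 1.778 | 0.089 | | |
| Total | 21 | 17.574 | | | |

$$\left(t^*_{(n-2)}\right)^2 = F^*_{(1,\,n-2)}$$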
![Page 25: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/25.jpg)
Equivalence of F-test to t-test
• For a given α level, the F-test of β1 = 0 versus β1 ≠ 0 is algebraically equivalent to the two-tailed t-test.
• The two tests give exactly the same P-values, so:
– If one test rejects H0, then so will the other.
– If one test does not reject H0, then neither will the other.
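
The algebra behind this equivalence is short. The sketch below assumes the usual simple linear regression results that the t statistic is $t^* = b_1 / se(b_1)$ with $se(b_1)^2 = MSE / \sum (x_i-\bar{x})^2$, and that $\hat{y}_i - \bar{y} = b_1 (x_i - \bar{x})$; these formulas are standard but are not stated on the slides.

```latex
\begin{align*}
SSR &= \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 = b_1^2\sum_{i=1}^{n}(x_i-\bar{x})^2 , \\[4pt]
(t^*)^2 &= \frac{b_1^2}{se(b_1)^2}
         = \frac{b_1^2\sum_{i=1}^{n}(x_i-\bar{x})^2}{MSE}
         = \frac{SSR/1}{MSE}
         = \frac{MSR}{MSE}
         = F^* .
\end{align*}
```

The reported outputs bear this out up to rounding: $(-9.99)^2 \approx 99.8$ for the skin cancer data and $(-13.33)^2 \approx 177.7$ for the Olympic sprint data.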
![Page 26: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/26.jpg)
Should I use the F-test or the t-test?
• The F-test is only appropriate for testing that the slope differs from 0 (β1 ≠ 0).
• Use the t-test to test that the slope is positive (β1 > 0) or negative (β1 < 0).
• The F-test is more useful for the multiple regression model, when we want to test that more than one slope parameter is 0.
![Page 27: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/27.jpg)
Getting ANOVA table in Minitab
• The Analysis of Variance (ANOVA) Table is default output for either command:
– Stat >> Regression >> Regression …
– Stat >> Regression >> Fitted line plot …
![Page 28: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/28.jpg)
Stat >> Regression >> Regression
![Page 29: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/29.jpg)
Stat >> Regression >> Fitted line plot
![Page 30: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/30.jpg)
Example: Is number of stories linearly related to building height?
[Fitted Line Plot: HEIGHT versus STORIES.]
HEIGHT = 90.31 + 11.29 STORIES
S = 58.3259   R-Sq = 90.4%   R-Sq(adj) = 90.2%
![Page 31: An alternative approach to testing for a linear association](https://reader036.fdocuments.us/reader036/viewer/2022081603/56815adc550346895dc8a83e/html5/thumbnails/31.jpg)
Example: Is number of stories linearly related to building height?
The regression equation is HEIGHT = 90.3 + 11.3 STORIES
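
| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 90.31 | 20.96 | 4.31 | 0.000 |
| STORIES | 11.2924 | 0.4844 | 23.31 | 0.000 |

S = 58.3259   R-Sq = 90.4%   R-Sq(adj) = 90.2%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 1848520 | 1848520 | 543.38 | 0.000 |
| Residual Error | 58 | 197311 | 3402 | | |
| Total | 59 | 2045831 | | | |
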
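
As a closing check, the summary line of this output can be tied back to the ANOVA table. The sketch below assumes the standard relationships S = √MSE and R-Sq = SSR/SSTO, which the slides do not state explicitly, and takes n = 60 from the total degrees of freedom of 59. It also compares F with the square of the reported t statistic for STORIES.

```python
import math

# Values read from the Analysis of Variance table for HEIGHT versus STORIES
ssr, sse, ssto = 1848520, 197311, 2045831
n = 60                      # total DF of 59 implies 60 observations

mse = sse / (n - 2)
s = math.sqrt(mse)          # reported as S (assumed relationship: S = sqrt(MSE))
r_sq = ssr / ssto           # reported as R-Sq (assumed relationship: SSR/SSTO)
f_star = (ssr / 1) / mse
t_star = 23.31              # reported T for the STORIES coefficient

print(f"S    = {s:.4f}")
print(f"R-Sq = {100 * r_sq:.1f}%")
print(f"F*   = {f_star:.2f}")
print(f"t*^2 = {t_star ** 2:.2f}")
```

Any small gap between F* and t*² here comes from the t statistic being reported to only two decimals.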