Last time:
One-way Analysis of Variance
Example
• List of 50 spoken words
• 3 x 10 Subjects (split among I=3 groups)
• Group 1: (Fast sound) Person in movie reads list, but sounds precede lip movement slightly
• Group 2: (Slow sound) Person in movie reads list, but sounds lag behind lip movement slightly
• Group 3: (Synchrony) Person in movie reads list with auditory and visual stimuli in synchrony
• Memory Task: Subjects are asked to recall as many items as possible.
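One-way Analysis of Variance Model Assumptions:
I many Independent Groups
Population:   $N(\mu_1,\sigma)$, $N(\mu_2,\sigma)$, …, $N(\mu_i,\sigma)$, …, $N(\mu_I,\sigma)$
Data:         $X_{11}, X_{12}, \ldots, X_{1n_1}$;  $X_{21}, X_{22}, \ldots, X_{2n_2}$;  …;  $X_{i1}, X_{i2}, \ldots, X_{in_i}$;  …;  $X_{I1}, X_{I2}, \ldots, X_{In_I}$
Sample Size:  $n_1$, $n_2$, …, $n_i$, …, $n_I$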
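One-way Analysis of Variance
$X_{ij} \sim N(\mu_i, \sigma)$
$X_{ij} = \mu_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma)$
$H_0: \mu_1 = \mu_2 = \ldots = \mu_I$
$H_a:$ at least one $\mu_i$ differs from the others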
Similar recipe as in Linear Regression!
$\sum_{i,j}(X_{ij}-\bar{X})^2 = \sum_{i,j}(X_{ij}-\bar{X}_i)^2 + \sum_{i} n_i(\bar{X}_i-\bar{X})^2$
Sum Squares Total (SST) = Sum Squares Error (SSE) + Sum Squares Groups (SSG)
Degrees of Freedom: DFT = N-1, DFE = N-I, DFG = I-1
DFT = DFE + DFG
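The decomposition can be checked numerically. The following is a minimal sketch on made-up scores for three groups (the numbers are hypothetical and are not the lecture's data); it only illustrates that the total sum of squares splits into the error and group parts, and that the degrees of freedom add up the same way.

```python
from statistics import mean

# Hypothetical recall scores for I = 3 groups (NOT the lecture's data).
groups = [
    [22, 25, 19, 24],
    [18, 17, 21, 16],
    [27, 30, 26, 29],
]

all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)
N, I = len(all_values), len(groups)

# SST: every observation against the overall mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)
# SSE: every observation against its own group mean.
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
# SSG: every group mean against the overall mean, weighted by group size.
ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

dft, dfe, dfg = N - 1, N - I, I - 1

print(f"SST = {sst:.4f}, SSE + SSG = {sse + ssg:.4f}")
print(f"DFT = {dft}, DFE + DFG = {dfe + dfg}")
```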
Mean Squares:
$MSG = \frac{SSG}{I-1}, \quad MSE = \frac{SSE}{N-I}, \quad MST = \frac{SST}{N-1}$
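$MSE = \frac{\sum_{i,j}(X_{ij}-\bar{X}_i)^2}{N-I} = S_p^2$
MSE is an unbiased estimator of $\sigma^2$.
Under $H_0: \mu_1 = \mu_2 = \ldots = \mu_I$, MSG is an unbiased estimator of $\sigma^2$ as well.
Under $H_0$, $\frac{MSG}{MSE}$ tends to be close to 1.
Under $H_0$: $\frac{MSG}{MSE} \sim F_{I-1,\,N-I}$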
ANOVA
Source of Variation   SS         df   MS         F          P-value
Between Groups        233.8667   2    116.9333   5.894698   0.007513
Within Groups         535.6      27   19.83704
Total                 769.4667   29
Let’s grind it out for our example…
$\frac{MSG}{MSE} = \frac{116.9333}{19.83704} = 5.89$
Large MSG leads to significant F statistic.
Reject Null Hypothesis!
Conclusion: The population means are not identical across groups.
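The arithmetic behind the ANOVA table can be reproduced in a few lines. This is a sketch using only the sums of squares printed in the table. The p-value line relies on a closed form for the F distribution that holds only when the numerator has 2 degrees of freedom (as it does here with I = 3); that shortcut is an assumption of this sketch and was not on the slides.

```python
# Values copied from the ANOVA table (I = 3 groups, N = 30 subjects).
ssg, sse = 233.8667, 535.6
I, N = 3, 30

msg = ssg / (I - 1)   # mean square for groups
mse = sse / (N - I)   # mean square for error
f_stat = msg / mse

# Special case, valid ONLY for numerator df = 2:
#   P(F_{2,m} > f) = (1 + 2 f / m) ** (-m / 2)
m = N - I
p_value = (1 + 2 * f_stat / m) ** (-m / 2)

print(f"MSG = {msg:.4f}, MSE = {mse:.5f}")
print(f"F = {f_stat:.4f}, p-value = {p_value:.6f}")
```

For other numbers of groups a statistics library would be the usual way to get the tail probability.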
What if I=2?
Then: $\frac{MSG}{MSE} \sim F_{1,\,N-2}$
Remember: The square of a t random variable with n-2 degrees of freedom is an F random variable with 1 degree of freedom in the numerator and n-2 degrees of freedom in the denominator.
Thus, the one-way analysis of variance is a natural extension of the comparison of two means from independent samples (with equal population variances).
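To see the I=2 connection concretely, here is a small sketch that computes the pooled two-sample t statistic and the one-way ANOVA F statistic on the same two samples. The data and the helper names `pooled_t` and `anova_f` are hypothetical, chosen only for illustration; the point is that the squared t statistic and F agree.

```python
from statistics import mean

def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss_within / (na + nb - 2)            # pooled variance
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def anova_f(groups):
    """One-way ANOVA F statistic, MSG / MSE."""
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    grand = sum(sum(g) for g in groups) / n_total
    ssg = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssg / (n_groups - 1)) / (sse / (n_total - n_groups))

# Hypothetical samples (not from the lecture).
sample_a = [12, 15, 11, 14]
sample_b = [18, 17, 21, 16, 19]

t_stat = pooled_t(sample_a, sample_b)
f_stat = anova_f([sample_a, sample_b])
print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
```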
Robustness
• If the sample sizes are equal, then the assumption of equal variance (equal standard deviation) is not crucial.
• CLT helps with violations of normality, i.e., as long as sample sizes are large, we do not need normality of the X variables.
Today:
Wrap up “Loose Ends”
An Illustrating Example on Simple Regression
Typo Correction
One last quiz…
San Fernando Valley Real Estate Data
      RENT   FEET    YRS   DIST   OFFICE   POWER   CLEAR   LOAD   PARK   LOT    SPRINK
1     0.65   11000   0     12     1000     200     20      0      15     1.52   1
2     0.47   16544   0     12     2156     400     20      1      33     1.5    1
3     0.48   15004   0     12     2178     400     20      1      30     1.5    1
4     0.45   27960   1     16     5824     800     18      0      56     1.43   1
5     0.55   10665   1     13     1000     400     18      0      27     1.5    1
6     0.51   13700   1     7      1370     600     24      0      28     1.63   0
…
47    0.47   13440   23    20     2885     1600    14.5    0      17     1.79   0
48    0.56   14703   24    3      5500     1800    16      1      46     2.31   1
49    0.53   10000   30    9      800      200     14      1      30     3.19   0
50    0.5    10320   31    4      1000     400     14      0      22     1.84   1
51    0.58   27600   33    3      7600     2000    16      0      52     4.5    1
52    0.36   10360   33    20     730      200     12      0      0      0.8    0
etc.
RENT: rent per square foot
FEET: square footage
[Figure: scatter plot with Sq. Feet on the horizontal axis]
Is there significant evidence for a linear relationship?
• Test using the correlation
• Test using the slope
• Test using the ANOVA table
Dependent Variable: TotalRent (Realest.xls)
Independent Variables: SquareFeet

Descriptive Statistics
Variable     Mean       Std.Dev.   Std.Err.   Maximum   Minimum   Count
SquareFeet   1.98E+04   1.27E+04   1757.569   64570     5200      52
TotalRent    9885.505   6059.211   840.261    32285     3016      52

Correlation Matrix
Variable     SquareFeet   TotalRent
SquareFeet   1.000
TotalRent    0.944        1.000

Regression Statistics
R       R Square   Adj.RSqr   Std.Err.   # Cases   #Missing   Deg.Free   t(2.5%,50)
0.944   0.891      0.889      2017.147   52        0          50         2.009

Summary Table
Variable     Coeff.    Std.Err.   t Stat.   P-value   Lower95%   Upper95%
Intercept    960.779   521.951    1.841     0.072     -87.591    2009.149
SquareFeet   0.451     0.022      20.253    0.000     0.407      0.496

Analysis of Variance
Source       df   Sum Sqrs   Mean Sqr   F         P-value
Regression   1    1.67E+09   1.67E+09   410.179   0.000
Residual     50   2.03E+08   4.07E+06
Total        51   1.87E+09
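Callouts on the output: Y, sample correlation R, n, n-2, t-stat.
$H_0: \rho_{XY} = 0, \quad H_A: \rho_{XY} \neq 0$
Test: $\frac{r}{\sqrt{\frac{1-r^2}{n-2}}} \sim t_{n-2}$ under $H_0$
Reject if: $\left|\frac{r}{\sqrt{\frac{1-r^2}{n-2}}}\right| > t_{n-2,\,\alpha/2}$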
$H_0: \rho_{XY} = 0, \quad H_A: \rho_{XY} \neq 0$
Test: $\frac{r}{\sqrt{\frac{1-r^2}{n-2}}} \sim t_{n-2}$
We have: $\frac{.944}{\sqrt{\frac{1-(.944)^2}{n-2}}} = 20.253 > 2.009 = t_{n-2,\,\alpha/2}$
The correlation is significant at 5% significance level.
Yes, significant evidence for a linear relationship.
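The correlation test can be retraced with the rounded numbers from the output. This is a sketch only: because r = 0.944 is rounded to three decimals, the statistic comes out near 20.2 rather than the 20.253 that the software reports from the unrounded correlation. The conclusion is unaffected.

```python
import math

r, n = 0.944, 52   # rounded sample correlation and number of cases from the output
t_crit = 2.009     # t(2.5%, 50) as printed in the output

# t = r * sqrt(n - 2) / sqrt(1 - r^2), compared with the two-sided 5% critical value
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, reject H0: {reject}")
```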
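$H_0: \beta_i = 0, \quad H_A: \beta_i \neq 0$
Coeff. column: $\hat\beta_0$, $\hat\beta_1$. Std.Err. column: $SE_{\hat\beta_0}$, $SE_{\hat\beta_1}$.
Observed t-statistics for (*): $\frac{\hat\beta_i - 0}{SE_{\hat\beta_i}}$
p-value $= P(|t_{n-2}| > |t_{observed}|)$, corresponding to (*)
95% CIs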
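$H_0: \beta_1 = 0, \quad H_A: \beta_1 \neq 0$
Callouts on the output: $\hat\beta_1$, $SE_{\hat\beta_1}$
Observed t-statistic for (*): $\frac{\hat\beta_1 - 0}{SE_{\hat\beta_1}}$
p-value < .001
Yes, significant evidence for linear relationship
95% CI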
p-value < .001
Yes, significant evidence for linear relationship
$\frac{MSM}{MSE} = \frac{1.67E09}{4.07E06} = 410.179$
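The slope test and the ANOVA test are the same test in simple regression, since the square of a t variable with n-2 degrees of freedom is an F variable with 1 and n-2 degrees of freedom. A quick numerical cross-check with the printed output values (a sketch; small gaps are expected because the output is rounded):

```python
t_slope = 20.253            # t Stat. for SquareFeet in the Summary Table
f_anova = 410.179           # F in the Analysis of Variance table
msm, mse = 1.67e9, 4.07e6   # mean squares as printed (rounded to 3 digits)

t_squared = t_slope ** 2    # should be close to F
f_from_ms = msm / mse       # ratio of the rounded mean squares

print(f"t^2 = {t_squared:.3f}, F = {f_anova}, MSM/MSE = {f_from_ms:.3f}")
```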
What is the best fitting regression equation?
Callouts on the Coeff. column: $\hat\beta_0$ (Intercept), $\hat\beta_1$ (SquareFeet)
$\text{Rent} = 960.779 + .451 \cdot \text{Square Feet}$
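As a usage illustration, the fitted line can be wrapped in a small function. The function name `predicted_rent` and the example size of 10,000 square feet are hypothetical choices for this sketch; the coefficients are the rounded values from the Summary Table.

```python
INTERCEPT = 960.779   # estimated intercept from the Summary Table
SLOPE = 0.451         # estimated slope (rent per extra square foot)

def predicted_rent(square_feet: float) -> float:
    """Predicted total rent from the fitted regression line."""
    return INTERCEPT + SLOPE * square_feet

# 10,000 sq. ft. lies inside the observed range (5200 to 64570).
example = predicted_rent(10_000)
print(f"Predicted rent for 10,000 sq. ft.: {example:.2f}")
```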
“I bet the population intercept is more than 900”
This would mean that you pay a fixed minimum flat amount of $900, plus whatever rent you need to pay based on square footage.
$H_0: \text{Intercept} = 900, \quad H_A: \text{Intercept} > 900$
$\frac{960.779 - 900}{521.951} = .12 < 2.009 = t_{.025}$
No significant evidence for the claim.
I bet, for every additional 10 Square Feet,
you have to pay more than an extra $4 Rent!
That would mean more than $.4 extra rent per extra square foot.
That would mean the slope is > .4.
$H_0: \beta_1 = .4, \quad H_1: \beta_1 > .4$
Observe standardized statistic: $\frac{b_1 - .4}{SE_{b_1}} = \frac{.451 - .4}{.022} = 2.318 > 2.109 = t_{.02,\,50}$
Significant at 2% significance level.
Yes, significant evidence that we pay over $4 extra per 10 sq. ft. extra.
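Both one-sided tests, for the intercept against 900 and for the slope against .4, follow the same recipe: subtract the hypothesized value from the estimate and divide by the standard error. A minimal sketch with a hypothetical helper `t_statistic`, using the rounded Summary Table values and the critical values quoted on the slides:

```python
def t_statistic(estimate: float, null_value: float, std_err: float) -> float:
    """Standardized distance of an estimate from a hypothesized value."""
    return (estimate - null_value) / std_err

t_intercept = t_statistic(960.779, 900, 521.951)   # claim: intercept > 900
t_slope = t_statistic(0.451, 0.4, 0.022)           # claim: slope > .4

# Critical values as quoted on the slides (50 degrees of freedom).
intercept_significant = t_intercept > 2.009
slope_significant = t_slope > 2.109

print(f"intercept: t = {t_intercept:.3f}, significant: {intercept_significant}")
print(f"slope:     t = {t_slope:.3f}, significant: {slope_significant}")
```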
For every additional 1,000 Square Feet, how much extra Rent do you have to pay?
Give a 95% Confidence Interval
95% Confidence Interval for the slope: $b_1 \pm t^*_{n-2,\,\alpha/2}\,\frac{s}{\sqrt{\sum_i (X_i-\bar{X})^2}}$
In our case, we obtain: $.451 \pm 2.009 \times .022 = [.407, .496]$
This is our 95% CI for the extra Rent per extra Square Foot.
Thus: 95% CI for extra Rent per 1,000 Square Feet: [$407, $496]
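The interval and its rescaling to 1,000 square feet can be sketched as follows. The inputs are the rounded Summary Table values, so the endpoints differ slightly in the last digit from the software's [.407, .496], which was computed from unrounded estimates.

```python
b1, se_b1 = 0.451, 0.022   # slope estimate and its standard error (rounded)
t_crit = 2.009             # t(2.5%, 50) from the output

margin = t_crit * se_b1
ci_per_sqft = (b1 - margin, b1 + margin)

# Multiplying the slope by 1,000 multiplies both endpoints by 1,000.
ci_per_1000 = tuple(1000 * endpoint for endpoint in ci_per_sqft)

print(f"per sq. ft.:       [{ci_per_sqft[0]:.4f}, {ci_per_sqft[1]:.4f}]")
print(f"per 1,000 sq. ft.: [{ci_per_1000[0]:.1f}, {ci_per_1000[1]:.1f}]")
```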
What is our best guess at the standard deviation of
the Error Term?
What percentage of the variance are we able to explain with this
model?
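$s^2 = \frac{SSE}{n-2}, \quad s = \sqrt{\frac{SSE}{n-2}}$
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
SSR = SST-SSE
$SST = SS_{YY} = \sum_{i=1}^{n}(Y_i-\bar{Y})^2$
$SSE = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 \quad (\hat{Y} = \hat\beta_0 + \hat\beta_1 X)$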
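Both questions can be answered from the sums of squares in the Analysis of Variance output. This sketch uses the three-digit rounded values printed there, so s comes out near 2015 rather than the reported 2017.147, while R² agrees with the reported 0.891 to three decimals.

```python
import math

# Rounded sums of squares from the Analysis of Variance output.
sse = 2.03e8   # Residual
sst = 1.87e9   # Total
n = 52         # number of cases

ssr = sst - sse                   # SSR = SST - SSE
s = math.sqrt(sse / (n - 2))      # estimated std. dev. of the error term
r_squared = 1 - sse / sst         # share of variance explained

print(f"SSR = {ssr:.3e}")
print(f"s = {s:.1f}")
print(f"R^2 = {r_squared:.3f}")
```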
[Figure: Histogram of Residuals. Horizontal axis: Residual Range. Series: Residual, Theoretical.]
Do the residuals look like $N(0, \sigma^2)$?
[Figure: Line Fit Plot. Horizontal axis: SquareFeet. Series: Actual, Predicted, Upper 95%, Lower 95%.]
Prediction Region
Slide Typo Correction:
2x2 Contingency Tables
Special Case: 2x2 Tables
$H_0: p_{1(1)} = p_{1(2)}$
$X^2 = \sum_{i,j}\frac{(N_{ij}-E_{ij})^2}{E_{ij}} = \frac{n\,(N_{11}N_{22}-N_{12}N_{21})^2}{R_1 R_2 C_1 C_2}$
Compare to Chi-Square with 1 df
This typo occurred in several slides due to cutting and pasting.
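The two forms of the 2x2 statistic can be compared numerically. The counts below are hypothetical, and the p-value line uses the fact that a Chi-Square variable with 1 df is the square of a standard normal, so its upper tail equals `erfc(sqrt(x/2))`; that shortcut is an addition of this sketch, not something shown on the slides.

```python
import math

# Hypothetical 2x2 table of counts N_ij (rows = groups, columns = outcomes).
table = [[20, 30],
         [40, 10]]

rows = [sum(r) for r in table]          # row totals R_1, R_2
cols = [sum(c) for c in zip(*table)]    # column totals C_1, C_2
n = sum(rows)

# General form: sum over cells of (N_ij - E_ij)^2 / E_ij with E_ij = R_i C_j / n
chi2_general = sum(
    (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
    for i in range(2)
    for j in range(2)
)

# 2x2 shortcut: n (N11 N22 - N12 N21)^2 / (R1 R2 C1 C2)
(n11, n12), (n21, n22) = table
chi2_shortcut = n * (n11 * n22 - n12 * n21) ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])

# Upper tail of Chi-Square with 1 df.
p_value = math.erfc(math.sqrt(chi2_general / 2))

print(f"general = {chi2_general:.4f}, shortcut = {chi2_shortcut:.4f}")
print(f"p-value = {p_value:.2e}")
```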
Last (and special) Quiz
Counts as 5 Bonus Points in Grand Total
Regression