Prediction concerning the response Y
![Page 1: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/1.jpg)
Prediction concerning the response Y
![Page 2: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/2.jpg)
Where does this topic fit in?
• Model formulation
• Model estimation
• Model evaluation
• Model use
![Page 3: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/3.jpg)
Translating two research questions into two reasonable statistical answers
• What is the mean weight, μ, of all American women, aged 18-24?
  – If we want to estimate μ, what would be a good estimate?
• What is the weight, y, of a randomly selected American woman, aged 18-24?
  – If we want to predict y, what would be a good prediction?
![Page 4: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/4.jpg)
Could we do better by taking into account a person’s height?
[Figure: scatterplot of weight (110 to 210) versus height (62 to 74), showing the sample mean and the fitted line.]
$\bar{y} = 158.8$
$\hat{w} = -266.5 + 6.1\,h$
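To make the comparison concrete, here is a minimal sketch that evaluates both candidate predictions for one woman. It assumes the two garbled expressions on this slide read ȳ = 158.8 and ŵ = -266.5 + 6.1 h; the height of 66 is a hypothetical input, and `predict_weight` is an illustrative name.

```python
# ASSUMPTION: values reconstructed from the garbled slide text.
Y_BAR = 158.8            # sample mean weight, used when height is ignored
B0, B1 = -266.5, 6.1     # intercept and slope of the weight-on-height line


def predict_weight(height):
    """Predicted weight from the fitted line at the given height."""
    return B0 + B1 * height


height = 66              # hypothetical height, in the units of the plot
print(f"ignoring height: {Y_BAR:.1f}")
print(f"using height {height}: {predict_weight(height):.1f}")
```

The two numbers differ because the fitted line uses the extra information carried by height, whereas the sample mean gives the same prediction to everyone.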
![Page 5: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/5.jpg)
One thing to estimate (μy) and one thing to predict (y)
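[Figure: College entrance test score (6 to 22) versus High school gpa (1 to 5), showing the population regression line and individual observations.]
$\mu_Y = E(Y) = \beta_0 + \beta_1 x$
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$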
![Page 6: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/6.jpg)
Two different research questions
• What is the mean response μY when the predictor value is xh?
• What value will a new observation Ynew be when the predictor value is xh?
![Page 7: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/7.jpg)
Example: Skin cancer mortality and latitude
• What is the expected (mean) mortality rate for all locations at 40° N latitude?
• What is the predicted mortality rate for 1 new randomly selected location at 40° N?
![Page 8: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/8.jpg)
Example: Skin cancer mortality and latitude
[Figure: regression plot of Mortality (100 to 200) versus Latitude (30 to 50). Mortality = 389.189 - 5.97764 Latitude; S = 19.1150, R-Sq = 68.0%, R-Sq(adj) = 67.3%.]
$\hat{y} = 389.19 - 5.9776(40) = 150.1$
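The arithmetic behind this point estimate can be checked with a short sketch. The coefficients are the ones printed on the regression plot; the function name `fitted_mortality` is only an illustrative choice.

```python
B0, B1 = 389.189, -5.97764   # intercept and slope from the regression plot


def fitted_mortality(latitude):
    """Point estimate of mortality at the given latitude (degrees N)."""
    return B0 + B1 * latitude


print(round(fitted_mortality(40), 2))   # 150.08
```

The same number answers both research questions: it is the estimate of the mean mortality at 40° N and the prediction for one new location at 40° N.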
![Page 9: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/9.jpg)
“Point estimators”
$\hat{y}_h = b_0 + b_1 x_h$ is the best answer to each research question.
That is, it is:
• the best guess of the mean response at xh
• the best guess of a new observation at xh
But, as always, to be confident in the answer to our research question, we should put an interval around our best guess.
![Page 10: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/10.jpg)
It is dangerous to “extrapolate” beyond scope of model.
[Figure: regression plot of colonies (15 to 30) versus conc (0 to 6). colonies = 16.0667 + 1.61576 conc; S = 2.67546, R-Sq = 66.8%, R-Sq(adj) = 63.5%.]
![Page 11: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/11.jpg)
It is dangerous to “extrapolate” beyond scope of model.
[Figure: regression plot of colonies (10 to 30) versus conc (0 to 10). colonies = 15.0205 + 3.22113 conc - 0.276956 conc**2; S = 2.74819, R-Sq = 69.6%, R-Sq(adj) = 64.5%.]
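A small sketch shows why extrapolation is risky: the two fitted equations from these slides roughly agree inside the observed range but diverge sharply outside it. It assumes the data span concentrations of about 0 to 6 (the axis range of the first plot), so conc = 10 is treated as an extrapolation; the function names are illustrative.

```python
def linear_fit(conc):
    """Straight-line fit reported on the first regression plot."""
    return 16.0667 + 1.61576 * conc


def quadratic_fit(conc):
    """Quadratic fit reported on the second regression plot."""
    return 15.0205 + 3.22113 * conc - 0.276956 * conc ** 2


# ASSUMPTION: conc = 3 is inside the data range, conc = 10 is outside it.
for conc in (3, 10):
    gap = abs(linear_fit(conc) - quadratic_fit(conc))
    print(f"conc={conc:2d}  linear={linear_fit(conc):6.2f}  "
          f"quadratic={quadratic_fit(conc):6.2f}  gap={gap:5.2f}")
```

Both models describe the observed data about equally well, yet they give very different answers at conc = 10, and the data cannot tell us which, if either, is right there.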
![Page 12: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/12.jpg)
A confidence interval for the population mean response μY
… when the predictor value is xh
![Page 13: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/13.jpg)
Again, what are we estimating?
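[Figure: College entrance test score (6 to 22) versus High school gpa (1 to 5), showing the population regression line and individual observations.]
$\mu_Y = E(Y) = \beta_0 + \beta_1 x$
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$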
![Page 14: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/14.jpg)
(1-α)100% t-interval for mean response μY
Formula in notation:
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(\frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)}$$
Formula in words:
Sample estimate ± (t-multiplier × standard error)
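As a rough illustration of how the pieces of this formula fit together, the sketch below computes the interval for the mean response from raw data. The data are hypothetical, the function name `mean_response_ci` is illustrative, and the t-multiplier is passed in by hand (here 2.776, the tabled value for 4 degrees of freedom at 95% confidence) rather than looked up by the code.

```python
import math


def mean_response_ci(x, y, x_h, t_mult):
    """t-interval for the mean response at x_h in simple linear regression.

    t_mult is the t-multiplier t(alpha/2, n-2), supplied by the caller.
    Returns (y_hat, se_fit, (lower, upper)).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least squares slope
    b0 = y_bar - b1 * x_bar             # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    y_hat = b0 + b1 * x_h               # sample estimate
    se_fit = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / sxx))
    return y_hat, se_fit, (y_hat - t_mult * se_fit, y_hat + t_mult * se_fit)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
T_MULT = 2.776   # t(0.025, 4) from a t table: 95% confidence, n = 6

y_hat, se_fit, (lower, upper) = mean_response_ci(x, y, x_h=3.5, t_mult=T_MULT)
print(f"fit={y_hat:.3f}  se_fit={se_fit:.4f}  CI=({lower:.3f}, {upper:.3f})")
```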
![Page 15: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/15.jpg)
Example: Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
1        150.08    2.75  (144.56, 155.61)  (111.23, 188.93)

Values of Predictors for New Observations
New Obs   Lat
1        40.0
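The two intervals in this output can be approximately reproduced from the reported Fit, SE Fit and S. The sketch below is only a check under one assumption: the sample size is not stated in this transcript, so the t-multiplier 2.0117 (the 97.5th percentile of a t distribution with 47 degrees of freedom, that is n = 49) is assumed.

```python
import math

FIT = 150.08      # "Fit" from the Minitab output
SE_FIT = 2.75     # "SE Fit" from the Minitab output
S = 19.1150       # S = sqrt(MSE), from the regression plot
T_MULT = 2.0117   # ASSUMPTION: t(0.025, 47), i.e. n = 49 observations

# Confidence interval for the mean response: fit +/- t * SE(fit).
ci = (FIT - T_MULT * SE_FIT, FIT + T_MULT * SE_FIT)

# Prediction interval for a new response: the standard error adds MSE.
se_pred = math.sqrt(S ** 2 + SE_FIT ** 2)
pi = (FIT - T_MULT * se_pred, FIT + T_MULT * se_pred)

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

Small differences in the last digit from the Minitab output are expected, because Fit and SE Fit are rounded in the printout.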
![Page 16: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/16.jpg)
Factors affecting the length of the confidence interval for μY
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(\frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)}$$
• As the confidence level decreases, …
• As MSE decreases, …
• As the sample size increases, …
• The more spread out the predictor values, …
• The closer xh is to the sample mean, …
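One way to explore these factors is to plug numbers into the formula and change one input at a time. The sketch below does that with hypothetical baseline values; `ci_half_width` is an illustrative name, and the t-multiplier is held fixed when n changes, which is a simplification since in practice it also shrinks slightly as n grows.

```python
import math


def ci_half_width(t_mult, mse, n, sxx, dist):
    """Half-width of the t-interval for the mean response.

    sxx is the sum of squared deviations of the x values from their mean;
    dist is x_h minus the sample mean of x.
    """
    return t_mult * math.sqrt(mse * (1 / n + dist ** 2 / sxx))


# Hypothetical baseline values, for illustration only.
base = dict(t_mult=2.0, mse=100.0, n=20, sxx=50.0, dist=3.0)
variations = {
    "baseline": {},
    "lower confidence level": {"t_mult": 1.7},
    "smaller MSE": {"mse": 50.0},
    "larger sample size": {"n": 80},
    "more spread in x": {"sxx": 200.0},
    "x_h at the sample mean": {"dist": 0.0},
}
for label, change in variations.items():
    width = ci_half_width(**{**base, **change})
    print(f"{label:24s} half-width = {width:.2f}")
```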
![Page 17: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/17.jpg)
Does the estimate of μY when xh = 1 vary more here …?
[Figure: scatterplot of y (5 to 25) versus x (1 to 10).]
Variable     N  StDev
yhat(x=1)    5  0.320
![Page 18: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/18.jpg)
… or here?
[Figure: scatterplot of y (0 to 30) versus x (1 to 10).]
Variable     N  StDev
yhat(x=1)    5  2.127
![Page 19: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/19.jpg)
Does the estimate of μY vary more when xh = 1 or when xh = 5.5?
[Figure: scatterplot of y (0 to 30) versus x (1 to 10).]
Variable      N  StDev
yhat(x=1)     5  2.127
yhat(x=5.5)   5  0.512
![Page 20: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/20.jpg)
Example: Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs     Fit  SE Fit        95.0% CI         95.0% PI
1        150.08    2.75  (144.6, 155.6)  (111.2, 188.93)
2        221.82    7.42  (206.9, 236.8)  (180.6, 263.07) X
X denotes a row with X values away from the center

Values of Predictors for New Observations
New Obs  Latitude
1            40.0
2            28.0
Mean of Lat = 39.533
![Page 21: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/21.jpg)
When is it okay to use the confidence interval for μY formula?
• When xh is a value within the scope of the model
  – xh does not have to be one of the actual x values in the data set.
• When the “LINE” assumptions are met.
  – The formula works okay even if the error terms are only approximately normal.
  – If you have a large sample, the error terms can even deviate substantially from normality.
![Page 22: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/22.jpg)
Prediction interval for a new response Ynew
![Page 23: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/23.jpg)
Again, what are we predicting?
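[Figure: College entrance test score (6 to 22) versus High school gpa (1 to 5), showing the population regression line and individual observations.]
$\mu_Y = E(Y) = \beta_0 + \beta_1 x$
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$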
![Page 24: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/24.jpg)
(1-α)100% prediction interval for new response Ynew
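Formula in notation:
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)}$$
Formula in words:
Sample prediction ± (t-multiplier × standard error)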
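The prediction interval can be computed from raw data in the same way as any least squares summary; the only change from the mean-response interval is the extra "1 +" under the square root. The sketch below uses hypothetical data and an illustrative function name, and takes the t-multiplier (2.776 for 4 degrees of freedom at 95%) as an input from a t table.

```python
import math


def prediction_interval(x, y, x_h, t_mult):
    """Prediction interval for a new response at x_h (simple linear regression).

    t_mult is the t-multiplier t(alpha/2, n-2), supplied by the caller.
    Returns (y_hat, se_pred, (lower, upper)).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    y_hat = b0 + b1 * x_h
    # The leading 1 accounts for the variation of a single new observation.
    se_pred = math.sqrt(mse * (1 + 1 / n + (x_h - x_bar) ** 2 / sxx))
    return y_hat, se_pred, (y_hat - t_mult * se_pred, y_hat + t_mult * se_pred)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
T_MULT = 2.776   # t(0.025, 4) from a t table: 95% confidence, n = 6

y_hat, se_pred, (lower, upper) = prediction_interval(x, y, 3.5, T_MULT)
print(f"prediction={y_hat:.3f}  se_pred={se_pred:.4f}  "
      f"PI=({lower:.3f}, {upper:.3f})")
```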
![Page 25: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/25.jpg)
Example: Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
1        150.08    2.75  (144.56, 155.61)  (111.23, 188.93)

Values of Predictors for New Observations
New Obs   Lat
1        40.0
![Page 26: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/26.jpg)
When is it okay to use the prediction interval for Ynew formula?
• When xh is a value within the scope of the model
  – xh does not have to be one of the actual x values in the data set.
• When the “LINE” assumptions are met.
  – The formula for the prediction interval depends strongly on the assumption that the error terms are normally distributed.
![Page 27: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/27.jpg)
What’s the difference in the two formulas?
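Confidence interval for μY:
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(\frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)}$$
Prediction interval for Ynew:
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)}$$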
![Page 28: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/28.jpg)
Prediction of Ynew if the mean μY is known
[Figure: normal curve over Mortality (90 to 210) with a central area of 0.95 marked.]
Suppose it were known that the mean skin cancer mortality at xh = 40° N is 150 deaths per million (with variance 400).
What is the predicted skin cancer mortality in Columbus, Ohio?
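If the mean and variance really were known, the central 95% range would follow directly from the normal curve. A minimal sketch, assuming mortality at this latitude is normally distributed with the stated mean 150 and variance 400:

```python
from statistics import NormalDist

MEAN = 150.0       # known mean mortality at 40 degrees N (from the slide)
VARIANCE = 400.0   # known variance (from the slide)

# ASSUMPTION: mortality at this latitude follows a normal distribution.
mortality = NormalDist(mu=MEAN, sigma=VARIANCE ** 0.5)
lower = mortality.inv_cdf(0.025)
upper = mortality.inv_cdf(0.975)
print(f"central 95% range: {lower:.1f} to {upper:.1f}")
```

This is the familiar mean ± 1.96 standard deviations, here roughly 150 ± 39.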
![Page 29: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/29.jpg)
And then reality sets in
• The mean μY is not known.
  – Estimate it with the predicted response ŷ.
  – The cost of using ŷ to estimate μY is the variance of ŷ.
• The variance σ² is not known.
  – Estimate it with MSE.
![Page 30: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/30.jpg)
Variance of the prediction
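The variation in the prediction of a new response depends on two components:
1. the variation due to estimating the mean μY with $\hat{y}_h$
2. the variation in Y
$$\sigma^2 + \sigma^2(\hat{Y}_h)$$
which is estimated by:
$$MSE + MSE\left(\frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right) = MSE\left(1 + \frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)$$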
![Page 31: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/31.jpg)
What’s the effect of the difference in the two formulas?
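Confidence interval for μY:
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(\frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)}$$
Prediction interval for Ynew:
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)}$$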
![Page 32: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/32.jpg)
What’s the effect of the difference in the two formulas?
• A (1-α)100% confidence interval for μY at xh will always be narrower than a (1-α)100% prediction interval for Ynew at xh.
• The confidence interval’s standard error can approach 0, whereas the prediction interval’s standard error cannot get close to 0.
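Both points can be seen numerically by letting the sample size grow. In the sketch below, MSE is taken as roughly S² from the skin cancer example, while the sample sizes, the spread of the predictor and the distance of xh from the mean are hypothetical; `standard_errors` is an illustrative name.

```python
import math


def standard_errors(mse, n, sxx, dist):
    """Return (se_fit, se_pred) for simple linear regression at x_h.

    sxx is the sum of squared deviations of x; dist is x_h minus the mean of x.
    """
    shared = 1 / n + dist ** 2 / sxx
    se_fit = math.sqrt(mse * shared)          # used by the confidence interval
    se_pred = math.sqrt(mse * (1 + shared))   # used by the prediction interval
    return se_fit, se_pred


MSE = 365.38   # approximately S**2, with S = 19.1150 from the regression plot
for n in (10, 100, 10_000):
    # ASSUMPTION: the x values keep the same spread, so sxx grows with n.
    sxx = 20.0 * n
    se_fit, se_pred = standard_errors(MSE, n, sxx, dist=0.5)
    print(f"n={n:6d}  se_fit={se_fit:6.3f}  se_pred={se_pred:6.3f}")
```

As n grows, the standard error for the mean response shrinks toward 0, while the standard error for a new response levels off near √MSE, the variation of an individual observation.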
![Page 33: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/33.jpg)
Confidence intervals and prediction intervals for response in Minitab
• Stat >> Regression >> Regression …
• Specify response and predictor(s).
• Select Options…
  – In “Prediction intervals for new observations” box, specify either the X value or a column name containing multiple X values.
  – Specify confidence level (default is 95%).
• Click on OK. Click on OK.
• Results appear in session window.
![Page 34: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/34.jpg)
Confidence intervals and prediction intervals for response in Minitab
![Page 35: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/35.jpg)
Confidence intervals and prediction intervals for response in Minitab
![Page 36: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/36.jpg)
Example: Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs     Fit  SE Fit        95.0% CI         95.0% PI
1        150.08    2.75  (144.6, 155.6)  (111.2, 188.93)
2        221.82    7.42  (206.9, 236.8)  (180.6, 263.07) X
X denotes a row with X values away from the center

Values of Predictors for New Observations
New Obs  Latitude
1            40.0
2            28.0
Mean of Lat = 39.533
![Page 37: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/37.jpg)
A plot of the confidence interval and prediction interval in Minitab
• Stat >> Regression >> Fitted line plot …
• Specify predictor and response.
• Under Options …
  – Select Display confidence bands.
  – Select Display prediction bands.
  – Specify desired confidence level (95% default).
• Select OK. Select OK.
![Page 38: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/38.jpg)
A plot of the confidence interval and prediction interval in Minitab
![Page 39: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/39.jpg)
A plot of the confidence interval and prediction interval in Minitab
![Page 40: Prediction concerning the response Y](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813006550346895d957912/html5/thumbnails/40.jpg)
[Figure: regression plot of Mortality (50 to 250) versus Latitude (30 to 50), showing the fitted regression line with 95% CI and 95% PI bands. Mortality = 389.189 - 5.97764 Latitude; S = 19.1150, R-Sq = 68.0%, R-Sq(adj) = 67.3%.]