Inference for Simple Regression Social Research Methods 2109 & 6507 Spring 2006 March 15, 16, 2006.
Inference for Simple Regression
Social Research Methods 2109 & 6507
Spring 2006
March 15, 16, 2006
Regression Equation
Equation of a regression line:
y_hat = α + βx
y = α + βx + ε
y = dependent variable
x = independent variable
β = slope = predicted change in y with a one-unit change in x
α = intercept = predicted value of y when x is 0
y_hat = predicted value of the dependent variable
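A minimal sketch of how an intercept and slope can be computed from data by least squares, to make the roles of the two coefficients concrete. The function name `fit_line` and the five data points are hypothetical and are not taken from the slides.

```python
from statistics import mean

def fit_line(x, y):
    """Return (a, b): least-squares intercept and slope for y_hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope: predicted change in y per one-unit change in x
    a = y_bar - b * x_bar    # intercept: predicted y when x is 0
    return a, b

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]                    # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # y - y_hat
print(round(a, 3), round(b, 3))
```

The residuals of a least-squares line with an intercept always sum to zero, which is a quick sanity check on any fit.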
Supplement: Proportional Reduction of Error (PRE)
• PRE measures compare the prediction errors made under different prediction rules; they contrast a naïve rule with a sophisticated one
• R² is a PRE measure
• Naïve rule = predict y_bar
• Sophisticated rule = predict y_hat
• R² measures the reduction in predictive error from using the regression predictions rather than predicting the mean of y
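As a small worked check, R² can be computed as a proportional reduction of error from the sums of squares in the SPSS analysis-of-variance table for the infant mortality example in these slides. The variable names are illustrative; the numbers are copied from that table.

```python
# Sums of squares from the SPSS ANOVA table for the infant mortality example
ss_total = 123172.513     # error of the naive rule: sum of (y - y_bar)^2
ss_residual = 35554.673   # error of the regression rule: sum of (y - y_hat)^2

# Proportional reduction of error when moving from y_bar to y_hat
r_squared = (ss_total - ss_residual) / ss_total
print(round(r_squared, 3))  # 0.711
```

This agrees with the R Square of .711 reported in the SPSS model summary.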
Example: SPSS Regression Procedures and Output
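• To get a scatterplot:
Graphs (G) → Scatter (S) → Simple → Define (select x and y)
• To get a correlation coefficient:
Analyze (A) → Correlate (C) → Bivariate
• To perform simple regression:
Analyze (A) → Regression (R) → Linear (L) (select x and y; you can also choose to save the predicted values and residuals)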
SPSS Example: Infant mortality vs. Female Literacy, 1995 UN Data
[Figure: scatterplot, Infant Mortality vs. Female Literacy, 109 countries, 1995 UN Data. x-axis: Females who read (%), 0 to 120. y-axis: Infant mortality (deaths per 1000 live births), 0 to 200.]
Example: correlation between infant mortality and female literacy
Correlations
                                                                  BABYMORT   LIT_FEMA
BABYMORT Infant mortality            Pearson Correlation          1          -.843**
(deaths per 1000 live births)        Sig. (2-tailed)              .          .000
                                     N                            109        85
LIT_FEMA Females who read (%)        Pearson Correlation          -.843**    1.000
                                     Sig. (2-tailed)              .000       .
                                     N                            85         85
**. Correlation is significant at the 0.01 level (2-tailed).
Regression: infant mortality vs. female literacy, 1995 UN Data
Model Summary (b)
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .843a    .711       .708                20.6971
a. Predictors: (Constant), LIT_FEMA Females who read (%)
b. Dependent Variable: BABYMORT Infant mortality (deaths per 1000 live births)

Coefficients (a)
                                  Unstandardized Coefficients   Standardized Coefficients                        95% Confidence Interval for B
Model 1                           B          Std. Error         Beta                        t          Sig.      Lower Bound   Upper Bound
(Constant)                        127.203    5.764                                          22.067     .000      115.738       138.668
LIT_FEMA Females who read (%)     -1.129     .079               -.843                       -14.302    .000      -1.286        -.972
a. Dependent Variable: BABYMORT Infant mortality (deaths per 1000 live births)
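To make the coefficients concrete, the fitted line from the SPSS output (intercept 127.203, slope -1.129) can be used for prediction. This is a small sketch; the function name `predict_infant_mortality` and the literacy value of 50% are hypothetical choices for illustration.

```python
a = 127.203   # intercept (Constant) from the SPSS coefficients table
b = -1.129    # slope for LIT_FEMA, females who read (%)

def predict_infant_mortality(female_literacy_pct):
    """Predicted infant deaths per 1000 live births: y_hat = a + b*x."""
    return a + b * female_literacy_pct

# Hypothetical input: a country where 50% of females can read
print(round(predict_infant_mortality(50), 2))  # 70.75
```

Read this way, each additional percentage point of female literacy is associated with a predicted drop of about 1.13 infant deaths per 1000 live births.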
Diagnosis: a residual plot
[Figure: residual plot, Regression Residuals vs. Female Literacy, 109 countries, 1995 UN Data. x-axis: Females who read (%), 0 to 120. y-axis: Unstandardized Residual, -80 to 60.]
Global test (F test): tests whether the regression equation has any explanatory power (H0: β = 0)
The regression model
• Note: the slope and intercept of the regression line are statistics (i.e., from the sample data).
• To do inference, we have to think of the sample intercept and slope as estimates of the unknown parameters α and β.
Regression as conditional means
• Ways to think about regression:
1. Straight-line description of association
2. Prediction
3. Conditional means
Conditional mean: a mean computed conditional on the value of another variable
The regression line predicts the conditional mean of y given x
Assumptions for regression inference
Think of there being a population or “true” regression line
Assumptions:
• For any fixed value of x, the response (y) varies according to a normal distribution. Repeated responses y are independent of each other.
• μy = α + βx (the means of y conditional on x fall on a straight line)
• The standard deviation of y (call it σ) is the same for each value of x. The value of σ is unknown.
“True” regression line
Inference for regression
• Population regression line:
μy = α + βx
estimated from the sample:
y_hat = a + bx
b is an unbiased estimator of the true slope β, and a is an unbiased estimator of the true intercept α
Sampling distribution of a (intercept) and b (slope)
• Mean of the sampling distribution of a is α
• Mean of the sampling distribution of b is β
• The standard errors of a and b are related to the amount of spread about the regression line (σ)
• The sampling distributions are normal; when σ is estimated, use the t distribution for inference
The standard error of the least-squares line
• Estimate σ (the spread about the regression line) using the residuals from the regression
• Recall that residual = (y − y_hat)
• Estimate the population standard deviation about the regression line (σ) using the sample estimates
Estimate σ from sample data
s = sqrt( Σ(y − y_hat)² / (n − 2) )
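As a worked check, the usual estimator s = sqrt( Σ(y − y_hat)² / (n − 2) ) can be applied to the infant mortality example, using the residual sum of squares and the sample size (85 countries with both variables observed) from the SPSS output in these slides. The variable names are illustrative.

```python
import math

ss_residual = 35554.673  # sum of squared residuals, from the SPSS ANOVA table
n = 85                   # countries with both variables observed

# Divide by n - 2 because two parameters (intercept and slope) were estimated
s = math.sqrt(ss_residual / (n - 2))
print(round(s, 3))  # 20.697
```

This reproduces the "Std. Error of the Estimate" of 20.6971 in the SPSS model summary.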
Standard Error of Slope (b)
• The standard error of the slope is:
SEb = s / sqrt( Σ(x − x_bar)² )
• A small standard error of b means that b is a precise estimate of β
• SEb is directly related to s and inversely related to the sample size (n) and to the standard deviation of x (Sx)
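The dependence of SEb on s, n and Sx can be read off by rewriting the standard textbook formula for the slope's standard error in terms of the sample standard deviation of x. This is a worked identity under the usual definition of the sample variance of x, not a formula copied from the slide:

```latex
s_x^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}
\quad\Longrightarrow\quad
SE_b = \frac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}}
     = \frac{s}{s_x \sqrt{n - 1}}
```

A larger residual spread s increases SEb, while more observations or more spread in x decrease it.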
Confidence Interval for regression slope
A level C confidence interval for the slope β of the “true” regression line is
b ± t* × SEb
where t* is the upper (1 − C)/2 critical value of the t distribution with n − 2 degrees of freedom.
To test the hypothesis H0: β = 0, compute the t statistic:
t = b / SEb
The P-value is found from a random variable having the t distribution with n − 2 degrees of freedom.
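A short sketch applying the interval and the t statistic to the infant mortality example, using the slope and standard error from the SPSS coefficients table. The critical value t* = 1.989 is an assumption read from a t table for 83 degrees of freedom; it does not appear in the SPSS output.

```python
b = -1.129      # slope estimate from the SPSS output
se_b = 0.079    # standard error of the slope
n = 85          # number of countries used in the regression
df = n - 2      # 83 degrees of freedom

# Assumption: t* for 95% confidence with 83 df, taken from a t table
t_star = 1.989

margin = t_star * se_b
ci = (b - margin, b + margin)   # b +/- t* x SEb
t_stat = b / se_b               # test statistic for H0: beta = 0

print(tuple(round(v, 3) for v in ci))  # (-1.286, -0.972)
print(round(t_stat, 2))                # -14.29
```

The interval matches the SPSS bounds of -1.286 and -.972. SPSS reports t = -14.302 because it works from unrounded values of b and SEb.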
Significance Tests for the slope
Test hypotheses about the slope β. Usually:
H0: β= 0 (no linear relationship between the independent and dependent variable)
Alternatives:
HA: β > 0 or HA: β < 0
or HA: β ≠ 0
Statistical inference for intercept
We could also do statistical inference for the regression intercept, α
Possible hypotheses:
H0: α = 0
HA: α ≠ 0
The t-test is based on a and is very similar to the prior t-tests we have done
For most substantive applications, we are interested in the slope (β) and not usually in α
Regression: infant mortality vs. female literacy, 1995 UN Data
ANOVA (b)
Model 1       Sum of Squares   df    Mean Square   F          Sig.
Regression    87617.840        1     87617.840     204.538    .000a
Residual      35554.673        83    428.370
Total         123172.513       84
a. Predictors: (Constant), LIT_FEMA Females who read (%)
b. Dependent Variable: BABYMORT Infant mortality (deaths per 1000 live births)
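A brief check of how the F statistic in the analysis-of-variance table arises, and of its link to the slope's t statistic. In simple regression with one predictor, F equals t squared. The numbers are copied from the SPSS output in these slides; the variable names are illustrative.

```python
ms_regression = 87617.840  # mean square for regression (1 df)
ms_residual = 428.370      # mean square for residual (83 df)
t_slope = -14.302          # t statistic for the slope, from the coefficients table

f_stat = ms_regression / ms_residual
print(round(f_stat, 2))        # 204.54
print(round(t_slope ** 2, 2))  # 204.55
```

The two values differ only in the second decimal place because the printed inputs are rounded.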
Hypothesis test example
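Da-Hua is analyzing intergenerational differences in educational attainment. He has collected data on the education levels of 117 father-son pairs. The father's education level is the independent variable and the son's education level is the dependent variable. His regression equation is: y_hat = 0.2915*x + 10.25
The standard error of the regression slope is 0.10
1. At α = 0.05, can Da-Hua conclude that fathers' and sons' education levels are related?
2. For all boys whose fathers are university graduates, what is the predicted mean education level of these boys?
3. A boy's father is a university graduate. What is the predicted future education level of this boy?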
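One way the three questions might be approached is sketched here. It rests on two assumptions that are not stated in the exercise: the two-sided 5% critical value for 115 degrees of freedom is taken as about 1.98 from a t table, and "university graduate" is coded as 16 years of schooling.

```python
slope = 0.2915
intercept = 10.25
se_slope = 0.10
n = 117
df = n - 2  # 115 degrees of freedom

# Question 1: t test of H0: beta = 0 against HA: beta != 0
t_stat = slope / se_slope
t_crit = 1.98  # assumption: two-sided 5% critical value for 115 df, from a t table
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 3), reject_h0)  # 2.915 True

# Questions 2 and 3: the point prediction is the same in both cases
father_years = 16  # assumption: university graduate coded as 16 years of schooling
predicted = intercept + slope * father_years
print(round(predicted, 3))  # 14.914
```

Under these assumptions H0 is rejected, so the data support a relationship between fathers' and sons' education. Questions 2 and 3 share the same point prediction; they differ in uncertainty, since a prediction for one boy is less precise than an estimate of the mean for all such boys, and the exercise does not give enough information to compute either interval.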