Regression and Correlation
Jake Blanchard, Fall 2010
Introduction
We can use regression to find relationships between random variables.
A relationship found by regression does not necessarily imply causation.
Correlation can be used to measure how well one variable can be linearly predicted from the other.
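Regression with Constant Variance
Linear regression: $E(Y|X=x) = \alpha + \beta x$
In general, the conditional variance $\mathrm{Var}(Y|X=x)$ is a function of $x$.
If we assume the variance is constant, the analysis is simplified.
Define the total squared error, $\Delta^2$, as the sum of the squares of the errors; $\alpha$ and $\beta$ are chosen to minimize it.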
Linear Regression
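$$\Delta^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$
$$\frac{\partial \Delta^2}{\partial \alpha} = -2\sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0$$
$$\frac{\partial \Delta^2}{\partial \beta} = -2\sum_{i=1}^{n} x_i (y_i - \alpha - \beta x_i) = 0$$
Solve:
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$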
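A minimal Python sketch of the closed-form least-squares estimates above, using only the standard library. The function name `fit_line` and the sample points are illustrative assumptions, not part of the original slides.

```python
def fit_line(xs, ys):
    """Return (alpha_hat, beta_hat) minimizing sum((y - alpha - beta*x)**2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of beta_hat in deviation form
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    beta_hat = sxy / sxx
    # The fitted line passes through the point (x_bar, y_bar)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Points lying exactly on y = 1 + 2x recover the intercept and slope
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # → (1.0, 2.0)
```

The deviation form is used instead of $\sum x_i y_i - n\bar{x}\bar{y}$ because it loses less precision when $\bar{x}$ is large compared with the spread of the data; the two forms are algebraically identical.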
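Variance in Regression Analysis
The relevant variance is conditional: $\mathrm{Var}(Y|X=x)$. Under the constant-variance assumption it is estimated by dividing the residual sum of squares by $n-2$, since two parameters ($\hat{\alpha}$, $\hat{\beta}$) have been estimated:
$$s_{Y|x}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta}x_i\right)^2 = \frac{n-1}{n-2}\left(s_Y^2 - \hat{\beta}^2 s_X^2\right)$$
where
$$s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$$
$$r^2 = 1 - \frac{s_{Y|x}^2}{s_Y^2}$$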
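A short sketch that computes $s_{Y|x}^2$ both ways (from the residuals and from the closed form in terms of $s_X^2$, $s_Y^2$ and $\hat{\beta}$) and then $r^2$. The function name `regression_variance` and the sample data are hypothetical; the internal assertion is only there to show that the two expressions agree.

```python
import math


def regression_variance(xs, ys):
    """Return (alpha_hat, beta_hat, s2_cond, r2) for a constant-variance linear fit."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations (n - 2 > 0)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # Residual sum of squares divided by n - 2 (two estimated parameters)
    sse = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s2_cond = sse / (n - 2)
    # Sample variances with the usual n - 1 divisor
    s2_x = sxx / (n - 1)
    s2_y = syy / (n - 1)
    # Closed form (n-1)/(n-2) * (s_Y^2 - beta_hat^2 * s_X^2) must match
    s2_alt = (n - 1) / (n - 2) * (s2_y - beta_hat ** 2 * s2_x)
    assert math.isclose(s2_cond, s2_alt, rel_tol=1e-9, abs_tol=1e-12)
    r2 = 1 - s2_cond / s2_y
    return alpha_hat, beta_hat, s2_cond, r2


print(regression_variance([1, 2, 3, 4], [2, 3, 5, 4]))
```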
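Confidence Intervals
If the errors are normally distributed, the standardized regression estimates are t-distributed with $n-2$ degrees of freedom (dof). The statistic below is therefore t-distributed with $n-2$ dof:
$$T = \frac{y_i' - \mu_{Y|x_i}}{s_{Y|x}\sqrt{\dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}}}, \qquad y_i' = \hat{\alpha} + \hat{\beta}x_i$$
And the $(1-\alpha)$ confidence interval for the mean $\mu_{Y|x_i}$ is (here $\alpha$ is the significance level, not the intercept)
$$\langle \mu_{Y|x_i} \rangle_{1-\alpha} = \hat{\alpha} + \hat{\beta}x_i \pm t_{1-\alpha/2,\,n-2}\; s_{Y|x}\sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}}$$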
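A sketch of the interval for the mean response at a chosen $x_0$. The Python standard library has no t-distribution quantile, so the caller passes $t_{1-\alpha/2,\,n-2}$ as `t_crit` (for example from a t table, or from `scipy.stats.t.ppf(1 - alpha/2, n - 2)` if SciPy is available). The function name and the data are hypothetical.

```python
import math


def mean_response_ci(xs, ys, x0, t_crit):
    """Confidence interval for the mean response mu_{Y|x0}, constant variance.

    t_crit is the t quantile t_{1-alpha/2, n-2}, supplied by the caller.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    sse = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s_cond = math.sqrt(sse / (n - 2))  # s_{Y|x}
    center = alpha_hat + beta_hat * x0  # estimated mean response at x0
    half_width = t_crit * s_cond * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return center - half_width, center + half_width


# 95% interval with n = 4, so n - 2 = 2 dof and t_{0.975,2} = 4.303
print(mean_response_ci([1, 2, 3, 4], [2, 3, 5, 4], x0=2.5, t_crit=4.303))
```

The $(x_0 - \bar{x})^2$ term makes the interval narrowest at $x_0 = \bar{x}$ and wider as $x_0$ moves away from the center of the data.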
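Example 8.1: Data for the compressive strength (q) of stiff clay as a function of “blow counts” (N), with $n = 10$ observations
$$\bar{N} = 18.7, \qquad \bar{q} = 2.123, \qquad \sum_{i=1}^{n} N_i^2 = 4353$$
$$\hat{\beta} = \frac{\sum_{i=1}^{n} N_i q_i - n\bar{N}\bar{q}}{\sum_{i=1}^{n} N_i^2 - n\bar{N}^2} = 0.112, \qquad \hat{\alpha} = \bar{q} - \hat{\beta}\bar{N} = 2.123 - 0.112(18.7) = 0.029$$
$$s_{q|N}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(q_i - \hat{\alpha} - \hat{\beta}N_i\right)^2 = \frac{0.305}{8} = 0.038$$
95% confidence interval for the mean strength at $N = 4$, with $t_{0.975,8} = 2.306$:
$$\langle \mu_{q|4} \rangle_{0.95} = 0.029 + 0.112(4) \pm 2.306\sqrt{0.038}\,\sqrt{\frac{1}{10} + \frac{(4 - 18.7)^2}{4353 - 10(18.7)^2}} = 0.477 \pm 0.267 = (0.210,\ 0.744)$$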
Plot
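The Example 8.1 interval can be checked from the summary values quoted on the slide alone, since the raw (N, q) data are not reproduced here. This sketch uses the identity $\sum (N_i - \bar{N})^2 = \sum N_i^2 - n\bar{N}^2$; the function name `ci_from_summary` is hypothetical.

```python
import math


def ci_from_summary(n, alpha_hat, beta_hat, s2_cond, x_bar, sum_x2, x0, t_crit):
    """Mean-response confidence interval from summary statistics only."""
    sxx = sum_x2 - n * x_bar ** 2  # sum of (x_i - x_bar)^2
    center = alpha_hat + beta_hat * x0
    half_width = t_crit * math.sqrt(s2_cond) * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return center - half_width, center + half_width


# Values from Example 8.1: 95% interval for the mean of q at N = 4
lo, hi = ci_from_summary(n=10, alpha_hat=0.029, beta_hat=0.112, s2_cond=0.038,
                         x_bar=18.7, sum_x2=4353, x0=4, t_crit=2.306)
print(f"({lo:.3f}, {hi:.3f})")  # → (0.210, 0.744)
```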
Correlation Estimate
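$$\hat{\rho} = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(x_i - \bar{x})(y_i - \bar{y})}{s_X s_Y} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{(n-1)\, s_X s_Y} = \hat{\beta}\,\frac{s_X}{s_Y}$$
The estimate is also related to the conditional variance $s_{Y|x}^2$ of the regression:
$$\hat{\rho}^2 = 1 - \frac{n-2}{n-1}\,\frac{s_{Y|x}^2}{s_Y^2}$$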
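A sketch computing $\hat{\rho}$ from its definition and checking it against the two other expressions on this slide. The function name `correlation_summary` and the data are illustrative. Note that the last identity only gives $\hat{\rho}^2$; the sign of $\hat{\rho}$ is the sign of $\hat{\beta}$.

```python
import math


def correlation_summary(xs, ys):
    """Return (rho_hat, beta_hat, s_x, s_y, s2_cond) for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_x = math.sqrt(sxx / (n - 1))
    s_y = math.sqrt(syy / (n - 1))
    rho_hat = sxy / ((n - 1) * s_x * s_y)  # definition of the sample correlation
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    sse = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s2_cond = sse / (n - 2)
    return rho_hat, beta_hat, s_x, s_y, s2_cond


n = 4
rho, beta, s_x, s_y, s2_cond = correlation_summary([1, 2, 3, 4], [2, 3, 5, 4])
print(rho)                                                # approximately 0.8
print(beta * s_x / s_y)                                   # same value
print(math.sqrt(1 - (n - 2) / (n - 1) * s2_cond / s_y ** 2))  # same magnitude
```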
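Regression with Non-Constant Variance
Now relax the assumption of constant variance:
$$E(Y|X=x) = \alpha + \beta x, \qquad \mathrm{Var}(Y|X=x) = g^2(x)\,\sigma^2$$
Assume that regions with large conditional variance are weighted less, using the weights
$$w_i = \frac{1}{g^2(x_i)}$$
Minimizing the weighted squared error $\Delta^2 = \sum_{i=1}^{n} w_i (y_i - \alpha - \beta x_i)^2$ gives
$$\hat{\alpha} = \frac{\sum w_i y_i - \hat{\beta}\sum w_i x_i}{\sum w_i}, \qquad \hat{\beta} = \frac{\sum w_i \sum w_i x_i y_i - \sum w_i x_i \sum w_i y_i}{\sum w_i \sum w_i x_i^2 - \left(\sum w_i x_i\right)^2}$$
with all sums over $i = 1, \dots, n$. The conditional standard deviation is then estimated by
$$s^2 = \frac{1}{n-2}\sum_{i=1}^{n} w_i \left(y_i - \hat{\alpha} - \hat{\beta}x_i\right)^2, \qquad s_{Y|x} = s\,g(x)$$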
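A sketch of the weighted least-squares formulas above, with the variance shape $g(x)$ passed in as a function. The name `weighted_fit` and the test data are hypothetical; with $g(x) = 1$ it reduces to ordinary least squares, and Example 8.2 below corresponds to `g = lambda x: x`.

```python
import math


def weighted_fit(xs, ys, g):
    """Weighted least squares for E(Y|x) = alpha + beta*x, Var(Y|x) = g(x)^2 sigma^2.

    Returns (alpha_hat, beta_hat, s); the conditional std at x is s * g(x).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    w = [1.0 / g(x) ** 2 for x in xs]  # weights w_i = 1 / g(x_i)^2
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    swx2 = sum(wi * x * x for wi, x in zip(w, xs))
    beta_hat = (sw * swxy - swx * swy) / (sw * swx2 - swx ** 2)
    alpha_hat = (swy - beta_hat * swx) / sw
    # Weighted residual sum of squares over n - 2
    s2 = sum(wi * (y - alpha_hat - beta_hat * x) ** 2
             for wi, x, y in zip(w, xs, ys)) / (n - 2)
    return alpha_hat, beta_hat, math.sqrt(s2)


# Standard deviation assumed proportional to x, as in Example 8.2
a, b, s = weighted_fit([1.0, 2.0, 4.0], [3.0, 5.0, 9.0], g=lambda x: x)
print(a, b)  # approximately 1.0 2.0 (these points lie on y = 1 + 2x)
```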
Example (8.2): Data for the maximum settlement (x) of storage tanks and the maximum differential settlement (y)
From looking at the data, assume $g(x) = x$ (that is, the standard deviation of y increases linearly with x). Then
$$\mathrm{Var}(Y|X=x) = x^2\sigma^2, \qquad w_i = \frac{1}{x_i^2}$$
Example (8.2) continued
96.0
243.00589.0
65.0045.0
627.0923.011.165.1
|
2
xss
ssyx
xy
y
x
Multiple Regression
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki}$$
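A standard-library sketch of multiple regression: it forms the normal equations $(X^T X)\,b = X^T y$, where each row of $X$ is $(1, x_{1i}, \dots, x_{ki})$, and solves them by Gaussian elimination. The function name and data are hypothetical, and the normal-equation approach is chosen for brevity; a QR-based solver such as `numpy.linalg.lstsq` is numerically more robust when predictors are nearly collinear.

```python
def multiple_regression(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    rows[i] holds the k predictor values (x_1i, ..., x_ki) of observation i.
    """
    if len(rows) != len(ys):
        raise ValueError("rows and ys must have the same length")
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 for the intercept
    m, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y] of the normal equations
    aug = []
    for a in range(p):
        row = [sum(X[i][a] * X[i][c] for i in range(m)) for c in range(p)]
        row.append(sum(X[i][a] * ys[i] for i in range(m)))
        aug.append(row)
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (aug[r][p] - sum(aug[r][c] * b[c] for c in range(r + 1, p))) / aug[r][r]
    return b


# Points generated exactly from y = 1 + 2*x1 - 3*x2
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1, 3, -2, 0, 2]
print(multiple_regression(rows, ys))  # approximately [1.0, 2.0, -3.0]
```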
“Nonlinear” Regression
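$$E(Y|x) = \alpha + \beta\, g(x)$$
The model is still linear in $\alpha$ and $\beta$, so it can be fit as a linear regression of $y$ on $g(x)$.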
Use LINEST in Excel
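The slides fit this model with Excel's LINEST; in a spreadsheet one would pass a column of $g(x_i)$ values as the `known_x's` argument. The Python sketch below does the same thing by transforming $x$ to $u = g(x)$ and applying ordinary least squares. The function name and the choice $g(x) = \ln x$ in the usage are illustrative assumptions.

```python
import math


def fit_transformed(xs, ys, g):
    """Fit E(Y|x) = alpha + beta*g(x) by least squares on u = g(x)."""
    us = [g(x) for x in xs]  # transformed predictor
    n = len(us)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    u_bar = sum(us) / n
    y_bar = sum(ys) / n
    suu = sum((u - u_bar) ** 2 for u in us)
    suy = sum((u - u_bar) * (y - y_bar) for u, y in zip(us, ys))
    beta_hat = suy / suu
    alpha_hat = y_bar - beta_hat * u_bar
    return alpha_hat, beta_hat


# Data generated exactly from y = 2 + 3 ln(x)
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2 + 3 * math.log(x) for x in xs]
print(fit_transformed(xs, ys, math.log))  # approximately (2.0, 3.0)
```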