Regression analysis
Transcript of Regression analysis
Regression analysis
Linear regression Logistic regression
Relationship and association
Straight line
[Figure: BMI plotted against Hip (cm), with a straight line through the points; the annotation marks the slope as the change in BMI (-0.0008) per 1 cm change in hip.]

Y = b0 + b1·X   (e.g. BMI = 1000 - 0.0008·HIP)
b0 = intercept (the value of Y where the line crosses the Y axis)
b1 = (Y2 - Y1) / (X2 - X1)   (the slope)
BMI = b0 + b1·HIP
Best straight line?
Best straight line!
[Figure: scatter plot with a fitted straight line; for the point (X1, Y1) the residual e1 is the vertical distance from the point to the line.]

e1 = Y1 - Ŷ1

Least squares estimation: choose the line that minimizes the sum of squared residuals,
Σ (i = 1 to N) ei² = Σ (i = 1 to N) (Yi - Ŷi)²
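A minimal sketch of what the residuals and "least squares" mean in code, using made-up hip/BMI numbers; the data and the two candidate lines below are purely illustrative, not the course data:

```python
# Sketch (hypothetical data): the sum of squared residuals for a candidate line.
hip = [95.0, 96.0, 97.0, 98.0, 99.0]   # hypothetical X values
bmi = [20.1, 21.0, 21.4, 22.3, 22.8]   # hypothetical Y values

def sse(b0, b1, xs, ys):
    """Sum of squared residuals e_i = Y_i - Yhat_i for the line Yhat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Least squares picks the (b0, b1) that makes this quantity as small as possible.
print(sse(-43.0, 0.66, hip, bmi))
print(sse(-40.0, 0.60, hip, bmi))
```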
Simple linear regression
1. Is the association linear?
[Figure: scatter plot used to judge whether the association looks linear.]
Simple linear regression
1. Is the association linear?
2. Describe the association: what are b0 and b1?
BMI = -12.6 kg/m² + 0.35 kg/m³ · Hip
b1 = Σ (Xi - X̄)(Yi - Ȳ) / Σ (Xi - X̄)²
X̄ = Σ Xi / n
b0 = Ȳ - b1·X̄
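A small sketch applying these estimator formulas directly; the data are again hypothetical:

```python
# Sketch: the slide's least squares estimators computed from their formulas.
xs = [95.0, 96.0, 97.0, 98.0, 99.0]   # hypothetical hip values (X)
ys = [20.1, 21.0, 21.4, 22.3, 22.8]   # hypothetical BMI values (Y)

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
# b0 = Ybar - b1 * Xbar
b0 = y_bar - b1 * x_bar
print(b0, b1)
```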
Simple linear regression
1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0? Help, SPSS!!!
Coefficients^a

| Model        | B (unstandardized) | Std. Error | Beta (standardized) | t      | Sig. |
|--------------|--------------------|------------|---------------------|--------|------|
| 1 (Constant) | -12.581            | 2.331      |                     | -5.396 | .000 |
| Hip          | 0.345              | 0.023      | 0.565               | 15.266 | .000 |

a. Dependent Variable: BMI
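For reference, the t value SPSS reports is just B divided by its standard error. A hedged sketch using the numbers from the table; the sample size, and hence the residual degrees of freedom, is not shown on the slide, so the df value below is purely an assumption:

```python
# Sketch: how the t statistic for the slope is formed (values from the table above).
from scipy import stats

b1, se = 0.345, 0.023
t = b1 / se                    # t = B / Std.Error, roughly 15

# Two-sided p-value needs the residual degrees of freedom (n - 2).
# n is not given on the slide; 500 is an assumed value for illustration only.
df = 500 - 2
p = 2 * stats.t.sf(abs(t), df)
print(t, p)
```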
Simple linear regression
1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0?
4. How good is the fit?
How far are the data points from the line on average?
r = Σ (Xi - X̄)(Yi - Ȳ) / √( Σ (Xi - X̄)² · Σ (Yi - Ȳ)² )
-1 ≤ r ≤ 1
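A minimal sketch computing r from this definition (hypothetical data):

```python
# Sketch: Pearson's r from its definition.
import math

xs = [95.0, 96.0, 97.0, 98.0, 99.0]   # hypothetical data
ys = [20.1, 21.0, 21.4, 22.3, 22.8]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
den = math.sqrt(sum((x - x_bar) ** 2 for x in xs) *
                sum((y - y_bar) ** 2 for y in ys))
r = num / den                  # always between -1 and 1
print(r, r ** 2)               # r and the goodness of fit r^2
```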
The Correlation Coefficient, r
[Figure: example scatter plots with r = 0, r = 1, r = 0.7 and r = -0.5.]
r² – Goodness of fit
How much of the variation can be explained by the model?
[Figure: example scatter plots with r² = 0, r² = 1, r² = 0.5 and r² = 0.2.]
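As a rough illustration from the simple BMI-hip model above: in simple linear regression r equals the standardized coefficient, so r = 0.565 and r² ≈ 0.32, i.e. about a third of the variation in BMI is accounted for by hip circumference.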
Multiple linear regression
Could waist measurement describe some of the variation in BMI?
BMI = 1.3 kg/m² + 0.42 kg/m³ · Waist
Or even better:
BMI = b0 + b1·HIP + b2·WST
BMI = -12.2 + 0.25·HIP + 0.17·WST
Multiple linear regression
If Y is linearly dependent on more than one independent variable:
α is the intercept, the value of Y when X1 and X2 = 0.
β1 and β2 are termed partial regression coefficients.
β1 expresses the change of Y for one unit of X1 when X2 is kept constant.
Yj = α + β1·X1j + β2·X2j

[Figure: the fitted regression plane in three dimensions.]
Multiple linear regression – residual error and estimations
As the collected data are not expected to fall exactly in a plane, an error term must be added.
The error terms sum to zero.
Estimating the dependent factor and the population parameters:
Yj = α + β1·X1j + β2·X2j + εj

[Figure: data points scattered around the fitted regression plane.]

Ŷj = a + b1·X1j + b2·X2j
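A sketch of the estimation step using a design matrix and ordinary least squares; the data below are invented for illustration:

```python
# Sketch: estimating a, b1, b2 for Yhat = a + b1*X1 + b2*X2 (hypothetical data).
import numpy as np

hip   = np.array([95.0, 97.0, 99.0, 101.0, 103.0, 105.0])   # X1
waist = np.array([80.0, 82.0, 85.0, 88.0, 90.0, 94.0])      # X2
bmi   = np.array([20.5, 21.3, 22.4, 23.0, 24.1, 25.6])      # Y

X = np.column_stack([np.ones_like(hip), hip, waist])  # columns [1, X1, X2]
coef, *_ = np.linalg.lstsq(X, bmi, rcond=None)        # ordinary least squares
a, b1, b2 = coef
print(a, b1, b2)

# Fitted values and residuals; with an intercept the residuals sum to (numerically) zero.
resid = bmi - X @ coef
print(resid.sum())
```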
Multiple linear regression – general equations
In general, a finite number (m) of independent variables may be used to estimate the hyperplane.
The number of sample points must be at least two more than the number of variables.
Yj = α + Σ (i = 1 to m) βi·Xij + εj
Multiple linear regression – co-linearity
Adding age: adj R2 = 0.352
Adding thigh: adj R2 = 0.352?
Coefficients^a

| Model        | B      | Std. Error | Beta   | t      | Sig. | 95% CI for B: Lower | Upper  |
|--------------|--------|------------|--------|--------|------|---------------------|--------|
| 1 (Constant) | -9.001 | 2.449      |        | -3.676 | .000 | -13.813             | -4.190 |
| Waist        | 0.168  | 0.043      | 0.201  | 3.923  | .000 | 0.084               | 0.252  |
| Hip          | 0.252  | 0.031      | 0.411  | 8.012  | .000 | 0.190               | 0.313  |
| Age          | -0.064 | 0.018      | -0.126 | -3.492 | .001 | -0.101              | -0.028 |

a. Dependent Variable: BMI
Coefficients^a

| Model        | B      | Std. Error | Beta   | t      | Sig. | 95% CI for B: Lower | Upper  |
|--------------|--------|------------|--------|--------|------|---------------------|--------|
| 1 (Constant) | 3.581  | 1.784      |        | 2.007  | .045 | 0.075               | 7.086  |
| Waist        | 0.168  | 0.043      | 0.201  | 3.923  | .000 | 0.084               | 0.252  |
| Age          | -0.064 | 0.018      | -0.126 | -3.492 | .001 | -0.101              | -0.028 |
| Thigh        | 0.252  | 0.031      | 0.411  | 8.012  | .000 | 0.190               | 0.313  |

a. Dependent Variable: BMI
Assumptions
1. The dependent variable must be metric continuous
2. The independent variables must be continuous or ordinal
3. There must be a linear relationship between the dependent variable and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed
6. Independent variables are not perfectly correlated with each other
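A sketch of how the same kind of model, and a couple of these assumption checks (normality of residuals, correlation between predictors), could be run outside SPSS. The data are simulated here, using the two-predictor coefficients quoted earlier as the "true" values:

```python
# Sketch: multiple linear regression plus two quick assumption checks (simulated data).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
hip = rng.normal(100, 6, 200)
waist = 0.8 * hip + rng.normal(0, 4, 200)             # correlated, but not perfectly
bmi = -12.2 + 0.25 * hip + 0.17 * waist + rng.normal(0, 2, 200)

X = sm.add_constant(np.column_stack([hip, waist]))
fit = sm.OLS(bmi, X).fit()

print(fit.params)                        # b0, b1, b2
print(fit.rsquared_adj)                  # adjusted R^2
print(stats.shapiro(fit.resid))          # assumption 5: normally distributed residuals
print(np.corrcoef(hip, waist)[0, 1])     # assumption 6: predictors not perfectly correlated
```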
Multiple linear regression in SPSS
Non-parametric correlation
Ranked Correlation
Kendall’s τ and Spearman’s rs
Correlation lies between -1 and 1, where -1 indicates perfect inverse correlation, 0 indicates no correlation, and 1 indicates perfect correlation.
Pearson is the correlation method for normal data. Remember the assumptions:
1. The dependent variable must be metric continuous
2. The independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed
Kendall’s τ – An example
Kendall’s τ – An example
S = P - Q
τ = S / (½·n·(n - 1))
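A sketch of the computation behind Kendall's τ: count concordant pairs P and discordant pairs Q over all pairs of observations. The two rankings below are made up, since the slide's own example table is shown only as an image:

```python
# Sketch: Kendall's tau from concordant (P) and discordant (Q) pairs (hypothetical ranks).
from itertools import combinations

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # ranks of the first variable
b = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]     # ranks of the second variable

P = Q = 0
for i, j in combinations(range(len(a)), 2):
    s = (a[i] - a[j]) * (b[i] - b[j])
    if s > 0:
        P += 1          # concordant pair
    elif s < 0:
        Q += 1          # discordant pair

n = len(a)
S = P - Q
tau = S / (0.5 * n * (n - 1))
print(P, Q, S, tau)
```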
Spearman – the same example
Squared rank differences d²: 1, 4, 9, 1, 1, 1, 9, 9, 1, 16   (Σd² = 52)

rs = 1 - 6·Σd² / (n³ - n) = 1 - 6·52 / (10³ - 10) = 0.6848
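The slide's Spearman calculation can be checked directly from the listed d² values:

```python
# Sketch: verifying the Spearman coefficient from the squared rank differences above.
d2 = [1, 4, 9, 1, 1, 1, 9, 9, 1, 16]     # squared rank differences from the slide
n = 10
rs = 1 - 6 * sum(d2) / (n ** 3 - n)      # 1 - 6*52/990
print(rs)                                # approximately 0.6848
```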
Correlation in SPSS
Correlations

| Method              | Correlation (a, b) | Sig. (2-tailed) | N  |
|---------------------|--------------------|-----------------|----|
| Pearson Correlation | 0.685*             | .029            | 10 |
| Kendall's tau_b     | 0.511*             | .040            | 10 |
| Spearman's rho      | 0.685*             | .029            | 10 |

*. Correlation is significant at the 0.05 level (2-tailed).
Logistic regression
Logistic Regression
• What if the dependent variable is categorical, and especially binary?
• Use some interpolation method
• Linear regression cannot help us.
The sigmoidal curve

p = 1 / (1 + e^(-z)),   z = β0 + β1·x1 + ... + βn·xn

[Figure: the sigmoidal curve, p plotted against x, for β0 = 0 and β1 = 1.]
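A small sketch of the curve itself; the parameter pairs are the ones used on this and the following slides:

```python
# Sketch: the sigmoidal curve p = 1/(1 + e^(-z)) with z = beta0 + beta1*x.
import math

def p(x, beta0, beta1):
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))

# Parameter values shown on the slides: varying the intercept and the coefficient.
for beta0, beta1 in [(0, 1), (2, 1), (-2, 1), (0, 2), (0, 0.5), (0, -1)]:
    print(beta0, beta1, [round(p(x, beta0, beta1), 2) for x in (-4, 0, 4)])
```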
The sigmoidal curve

• The intercept basically just ‘scales’ the input variable

p = 1 / (1 + e^(-z)),   z = β0 + β1·x1 + ... + βn·xn

[Figure: sigmoidal curves, p plotted against x, for β1 = 1 and β0 = 0, 2 and -2.]
The sigmoidal curve

p = 1 / (1 + e^(-z)),   z = β0 + β1·x1 + ... + βn·xn

[Figure: sigmoidal curves, p plotted against x, for β0 = 0 and β1 = 1, 2 and 0.5.]
• The intercept basically just ‘scales’ the input variable
• Large regression coefficient → risk factor strongly influences the probability
The sigmoidal curve

p = 1 / (1 + e^(-z)),   z = β0 + β1·x1 + ... + βn·xn

[Figure: sigmoidal curves, p plotted against x, for β0 = 0 and β1 = 1 and -1.]
• The intercept basically just ‘scales’ the input variable
• Large regression coefficient → risk factor strongly influences the probability
• Positive regression coefficient → risk factor increases the probability
• Logistic regression uses maximum likelihood estimation, not least squares estimation
Does age influence the diagnosis? Continuous independent variable
Variables in the Equation

| Step 1^a | B      | S.E.  | Wald    | df | Sig. | Exp(B) | 95% C.I. for Exp(B): Lower | Upper |
|----------|--------|-------|---------|----|------|--------|----------------------------|-------|
| Age      | 0.109  | 0.010 | 108.745 | 1  | .000 | 1.115  | 1.092                      | 1.138 |
| Constant | -4.213 | 0.423 | 99.097  | 1  | .000 | 0.015  |                            |       |

a. Variable(s) entered on step 1: Age.
p = 1 / (1 + e^(-z)),   z = B0 + B1·age
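A sketch turning the fitted coefficients above into predicted probabilities for a few ages:

```python
# Sketch: predicted probability as a function of age, using the SPSS output above.
import math

B0, B1 = -4.213, 0.109          # Constant and Age from the table

def p_diagnosis(age):
    z = B0 + B1 * age
    return 1.0 / (1.0 + math.exp(-z))

for age in (20, 40, 60, 80):
    print(age, round(p_diagnosis(age), 3))
```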
Does previous intake of OCP influence the diagnosis? Categorical independent variable
Variables in the Equation

| Step 1^a | B      | S.E.  | Wald  | df | Sig. | Exp(B) | 95% C.I. for Exp(B): Lower | Upper |
|----------|--------|-------|-------|----|------|--------|----------------------------|-------|
| OCP(1)   | -0.311 | 0.180 | 2.979 | 1  | .084 | 0.733  | 0.515                      | 1.043 |
| Constant | 0.233  | 0.123 | 3.583 | 1  | .058 | 1.263  |                            |       |

a. Variable(s) entered on step 1: OCP.
p = 1 / (1 + e^(-z)),   z = B0 + B1·OCP

If OCP = 0:  p(Y = 1) = e^(B0) / (1 + e^(B0)) = e^(0.233) / (1 + e^(0.233)) = 0.5580
If OCP = 1:  p(Y = 1) = e^(B0 + B1) / (1 + e^(B0 + B1)) = e^(0.233 - 0.311) / (1 + e^(0.233 - 0.311)) = 0.4805
Odds ratio
odds = p / (1 - p) = e^z

odds ratio = e^(B0 + B1) / e^(B0) = e^(B1) = e^(-0.311) = 0.7327
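A sketch reproducing the two predicted probabilities and the odds ratio from the OCP coefficients above:

```python
# Sketch: probabilities and odds ratio for the OCP model (B0 = 0.233, B1 = -0.311).
import math

B0, B1 = 0.233, -0.311

def p(z):
    return math.exp(z) / (1.0 + math.exp(z))

p_no_ocp = p(B0)          # OCP = 0 -> about 0.558
p_ocp    = p(B0 + B1)     # OCP = 1 -> about 0.4805

odds_no_ocp = p_no_ocp / (1 - p_no_ocp)      # = e^B0
odds_ocp    = p_ocp / (1 - p_ocp)            # = e^(B0 + B1)
print(p_no_ocp, p_ocp)
print(odds_ocp / odds_no_ocp, math.exp(B1))  # odds ratio = e^B1, about 0.733
```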
Multiple logistic regression
Variables in the Equation

| Step 1^a | B      | S.E.  | Wald    | df | Sig. | Exp(B) | 95% C.I. for Exp(B): Lower | Upper |
|----------|--------|-------|---------|----|------|--------|----------------------------|-------|
| Age      | 0.123  | 0.011 | 115.343 | 1  | .000 | 1.131  | 1.106                      | 1.157 |
| BMI      | 0.083  | 0.019 | 18.732  | 1  | .000 | 1.087  | 1.046                      | 1.128 |
| OCP      | 0.528  | 0.219 | 5.808   | 1  | .016 | 1.695  | 1.104                      | 2.603 |
| Constant | -6.974 | 0.762 | 83.777  | 1  | .000 | 0.001  |                            |       |

a. Variable(s) entered on step 1: Age, BMI, OCP.
p = 1 / (1 + e^(-z)),   z = B0 + B1·OCP + B2·age + B3·BMI
Predicting the diagnosis by logistic regression
What is the probability that the tumor of a 50-year-old woman who has been using OCP and has a BMI of 26 is malignant?
z = -6.974 + 0.123·50 + 0.083·26 + 0.528·1 = 1.862
p = 1 / (1 + e^(-1.862)) = 0.866
Variables in the Equation

| Step 1^a | B      | S.E.  | Wald    | df | Sig. | Exp(B) | 95% C.I. for Exp(B): Lower | Upper |
|----------|--------|-------|---------|----|------|--------|----------------------------|-------|
| Age      | 0.123  | 0.011 | 115.343 | 1  | .000 | 1.131  | 1.106                      | 1.157 |
| BMI      | 0.083  | 0.019 | 18.732  | 1  | .000 | 1.087  | 1.046                      | 1.128 |
| OCP      | 0.528  | 0.219 | 5.808   | 1  | .016 | 1.695  | 1.104                      | 2.603 |
| Constant | -6.974 | 0.762 | 83.777  | 1  | .000 | 0.001  |                            |       |

a. Variable(s) entered on step 1: Age, BMI, OCP.
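The same prediction as a small sketch, computed directly from the coefficient table:

```python
# Sketch: predicted probability of malignancy for age 50, BMI 26, OCP user.
import math

B0, B_age, B_bmi, B_ocp = -6.974, 0.123, 0.083, 0.528   # from the table above
z = B0 + B_age * 50 + B_bmi * 26 + B_ocp * 1
p = 1.0 / (1.0 + math.exp(-z))
print(z, p)        # z about 1.86, p about 0.87
```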
Logistic regression in SPSS