• Correlation
• Regression
• Multiple Regression
• Curve fitting
Correlation / regression
Correlation
• Represents the relationship between two measurements
– Examples: height and weight, education level and income, BMI and skin fold thickness, wealth and fertility
– Correlation does not mean that one measurement causes the other; it is usually present when both measurements are influenced by a common factor
– The value of the correlation coefficient r ranges from -1 to 1
• 0 = no relationship
• 1 = perfect positive relationship
• -1 = perfect inverse relationship
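$$r = \frac{S_{xy}}{\sqrt{SSq_x \cdot SSq_y}}$$

$$SSq_x = \sum x^2 - \frac{(\sum x)^2}{n} \qquad SSq_y = \sum y^2 - \frac{(\sum y)^2}{n} \qquad S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

Where $\sum x$ = sum(x), $\sum x^2$ = sum(x^2), $\sum y$ = sum(y), $\sum y^2$ = sum(y^2), $\sum xy$ = sum(xy), and n = number of pairs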
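As a minimal sketch of the formulas above, the correlation coefficient can be computed directly from the raw sums. The function names and the toy data are hypothetical and only serve as illustration.

```python
import math

def sums_of_squares(x, y):
    """Return (SSqx, SSqy, Sxy) computed from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ssq_x = sum(v * v for v in x) - sum_x ** 2 / n
    ssq_y = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
    return ssq_x, ssq_y, s_xy

def correlation(x, y):
    """Correlation coefficient r = Sxy / sqrt(SSqx * SSqy)."""
    ssq_x, ssq_y, s_xy = sums_of_squares(x, y)
    return s_xy / math.sqrt(ssq_x * ssq_y)

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))
```

For this toy data SSqx = 10, SSqy = 6 and Sxy = 6, so r = 6 / sqrt(60), roughly 0.77.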
Confidence interval for r
• The correlation coefficient is not a linear measurement
– Its scale stretches near 0 and compresses near 1 or -1
– The CI is therefore calculated in steps
• Transform r into a normally distributed linear measurement (Fisher's z)
• Estimate the Standard Error of z
• Estimate the CI of z
• Transform the CI back to the original format
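$$z = 0.5 \, \mathrm{Log}\left(\frac{1+r}{1-r}\right) \qquad SE = \frac{1}{\sqrt{n-3}}$$

$$F = z - 1.96\,SE \qquad G = z + 1.96\,SE$$

$$95\%\ CI = \frac{\exp(2F) - 1}{\exp(2F) + 1} \ \text{to} \ \frac{\exp(2G) - 1}{\exp(2G) + 1}$$

Where Log is the natural logarithm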
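The transform, estimate, and back-transform steps can be sketched as follows. This is an illustrative outline only; the function name and the input values (r = 0.5, n = 28) are hypothetical.

```python
import math

def correlation_ci95(r, n):
    """95% CI for r via the z transformation (requires n > 3 and -1 < r < 1)."""
    z = 0.5 * math.log((1 + r) / (1 - r))   # transform r to z
    se = 1 / math.sqrt(n - 3)               # Standard Error of z
    f = z - 1.96 * se                       # lower limit on the z scale
    g = z + 1.96 * se                       # upper limit on the z scale

    def back(v):
        # transform a z value back to the r scale
        return (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)

    return back(f), back(g)

# Hypothetical inputs, for illustration only
low, high = correlation_ci95(0.5, 28)
print(round(low, 3), round(high, 3))
```

The resulting interval is not symmetric around r, which reflects the compression of the scale near 1 and -1.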
Sample size
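• Iterative procedure that satisfies two equations

$$n = \left(\frac{z_a + z_b}{m}\right)^2 + 3$$

$$m = 0.5 \, \mathrm{Log}\left(\frac{1+r}{1-r}\right) + \frac{r}{2(n-1)}$$

Where
$z_a$ = z value for Type I error
$z_b$ = z value for Type II error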
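One way to carry out the iteration is to start from a first guess of n and then alternate between the two equations until n stops changing. The sketch below assumes the usual form of the pair, n = ((za + zb) / m)^2 + 3 and m = 0.5 Log((1 + r) / (1 - r)) + r / (2(n - 1)). It also assumes a two-sided Type I error, which the slide does not specify, and the function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_r(r, alpha=0.05, power=0.8, max_iter=100):
    """Iterate n = ((za + zb) / m)^2 + 3 with m depending on n (r must not be 0)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # assumption: two-sided Type I error
    z_b = NormalDist().inv_cdf(power)          # power = 1 - Type II error
    z_r = 0.5 * math.log((1 + r) / (1 - r))
    # First guess ignores the r / (2(n - 1)) term
    n = ((z_a + z_b) / z_r) ** 2 + 3
    n_new = n
    for _ in range(max_iter):
        m = z_r + r / (2 * (n - 1))
        n_new = ((z_a + z_b) / m) ** 2 + 3
        if abs(n_new - n) < 1e-6:              # both equations now agree
            break
        n = n_new
    return math.ceil(n_new)

# Hypothetical expected correlation, for illustration only
print(sample_size_for_r(0.5))
```

The iteration usually settles within a few rounds because the r / (2(n - 1)) term is small compared with the Log term.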
Regression
• Draw a line which best fits the relationship between x and y
– The line takes the form y = a + bx
• Where a is the y value when x = 0
• Where b is the slope of the line, or how much y changes for one unit of change in x
– It assumes that y is dependent on x
– It explains how changes in y values are governed by changes in x values
– It allows x to predict y
• Note: x = a + by is not the mirror image of y = a + bx, because the best fit minimises distances in the x direction for one and in the y direction for the other, so the two lines generally differ
Regression - example
[Figure: regression example showing the best fit for y = a + bx and the best fit for x = a + by]
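$$b = \frac{S_{xy}}{SSq_x}$$

$$SSq_x = \sum x^2 - \frac{(\sum x)^2}{n} \qquad SSq_y = \sum y^2 - \frac{(\sum y)^2}{n} \qquad S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

Where $\sum x$ = sum(x), $\sum x^2$ = sum(x^2), $\sum y$ = sum(y), $\sum y^2$ = sum(y^2), $\sum xy$ = sum(xy)

$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$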
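A short sketch of the slope and constant calculation, using hypothetical toy data. Fitting in both directions also illustrates the earlier note that x = a + by is not the mirror image of y = a + bx.

```python
def linear_regression(x, y):
    """Return (a, b) of the best fit line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ssq_x = sum(v * v for v in x) - sum_x ** 2 / n
    s_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
    b = s_xy / ssq_x                    # slope
    a = sum_y / n - b * sum_x / n       # constant: mean(y) - b * mean(x)
    return a, b

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = linear_regression(x, y)   # y = a + b*x
c, d = linear_regression(y, x)   # x = c + d*y
print(a, b)
print(c, d)
```

For this data the first fit is approximately y = 2.2 + 0.6x, while the second is x = -1 + 1.0y, which rearranges to y = 1 + x. The two lines share the point of means but have different slopes.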
Confidence interval for b
$$SE_b = \sqrt{\frac{SSq_y - S_{xy}^2 / SSq_x}{(n-2) \cdot SSq_x}}$$

$$CI_b = b - t \cdot SE_b \ \text{to} \ b + t \cdot SE_b$$
t = Student’s t for sample size and Type I Error
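As an illustration, the interval for b can be sketched with the t value passed in by hand, which avoids depending on a statistics library. The standard error used is the usual one for a slope, the residual standard deviation (n - 2 degrees of freedom) divided by sqrt(SSqx). The function name and toy data are hypothetical. The value t = 3.182 is the two-sided 5% Student's t for 3 degrees of freedom (n - 2 with n = 5).

```python
import math

def slope_ci(x, y, t):
    """Return (b, lower, upper) for the regression slope, given Student's t."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ssq_x = sum(v * v for v in x) - sum_x ** 2 / n
    ssq_y = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
    b = s_xy / ssq_x
    # Residual variance (n - 2 degrees of freedom) divided by SSqx
    se_b = math.sqrt((ssq_y - s_xy ** 2 / ssq_x) / ((n - 2) * ssq_x))
    return b, b - t * se_b, b + t * se_b

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, lower, upper = slope_ci(x, y, t=3.182)
print(round(b, 3), round(lower, 3), round(upper, 3))
```

With only five points the interval is wide and includes 0, so this toy slope would not be considered statistically significant.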
Confidence interval for predicted y
• SE has 2 components and changes with the x value
– SE of regression slope b
– SE from the residual variation around the line
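$$SE_{y.x} = \sqrt{\frac{SSq_y - S_{xy}^2 / SSq_x}{n-2}} \cdot \sqrt{\frac{1}{n} + \frac{(x - mean_x)^2}{SSq_x}}$$

$$CI_{y.x} = (a + bx) - t \cdot SE_{y.x} \ \text{to} \ (a + bx) + t \cdot SE_{y.x}$$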
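A sketch of how the interval widens as x moves away from its mean, assuming the standard formula for the SE of the fitted line (residual SD multiplied by sqrt(1/n + (x - mean x)^2 / SSqx)). The function name, the toy data, and the hand-supplied t value (3.182 for 3 degrees of freedom) are illustrative assumptions.

```python
import math

def predicted_y_ci(x, y, x0, t):
    """Return (lower, upper) CI of the fitted line at x0, given Student's t."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ssq_x = sum(v * v for v in x) - sum_x ** 2 / n
    ssq_y = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
    b = s_xy / ssq_x
    a = sum_y / n - b * sum_x / n
    # Residual standard deviation around the line
    resid_sd = math.sqrt((ssq_y - s_xy ** 2 / ssq_x) / (n - 2))
    # SE grows with the distance of x0 from the mean of x
    se = resid_sd * math.sqrt(1 / n + (x0 - sum_x / n) ** 2 / ssq_x)
    y_hat = a + b * x0
    return y_hat - t * se, y_hat + t * se

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
lo3, hi3 = predicted_y_ci(x, y, x0=3, t=3.182)   # at the mean of x
lo5, hi5 = predicted_y_ci(x, y, x0=5, t=3.182)   # at the edge of the data
print(round(lo3, 3), round(hi3, 3))
print(round(lo5, 3), round(hi5, 3))
```

The interval is narrowest at the mean of x and widens towards the extremes, which is why the confidence band around a regression line is curved.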
Multiple Regression
• Outcomes, particularly clinical outcomes
– Are subject to multiple influences
– All of which are related to each other
• A multiple regression model is therefore commonly needed
• BMI is influenced by the BMI of the mother and grandparents, but
• People who marry each other tend to have comparable BMI
• Parents' BMI tends to be influenced by the grandparents' BMI
• Multiple regression: y = a + b1x1 + b2x2 + b3x3 … + bixi
Multiple Regression
• Starts with a k × k matrix of sums of squares and products [S] from k measurements (including y)
– Where S(i,j) is the Sxy between any pair i and j
– Where S(i,i) is the SSqx of variable i
• This matrix is inverted: [V] = [S]^-1
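• The Partial Regression Coefficient b_i

$$b_i = -\frac{V_{i,y}}{V_{y,y}}$$

(the related partial correlation coefficient is $-V_{i,y} / \sqrt{V_{i,i} \cdot V_{y,y}}$)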
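• The constant a

$$a = \frac{\sum y}{n} - \sum_{i} b_i \, \frac{\sum x_i}{n}$$

Where the sum is taken over the predictor variables x_i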
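The matrix procedure can be sketched in plain Python as below. This is a minimal illustration, not a production routine: it assumes the sum/products matrix includes y as one of its rows and columns, it uses the standard result that the coefficients are -V(i,y) / V(y,y), and the function names and toy data are hypothetical. The matrix cannot be inverted if y is an exact linear combination of the predictors.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [v - f * w for v, w in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

def multiple_regression(columns, y_index):
    """columns: k equal-length lists (predictors and y). Returns (a, {i: b_i})."""
    k, n = len(columns), len(columns[0])
    means = [sum(c) / n for c in columns]
    # S[i][j] = sum(xi*xj) - sum(xi)*sum(xj)/n: SSq on the diagonal, Sxy elsewhere
    s = [[sum(p * q for p, q in zip(columns[i], columns[j])) - n * means[i] * means[j]
          for j in range(k)] for i in range(k)]
    v = invert(s)
    b = {i: -v[i][y_index] / v[y_index][y_index] for i in range(k) if i != y_index}
    a = means[y_index] - sum(b[i] * means[i] for i in b)
    return a, b

# Hypothetical toy data, for illustration only
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [3, 4, 8, 8, 13]
a, b = multiple_regression([x1, x2, y], y_index=2)
print(a, b)
```

With a single predictor the same routine reduces to b = Sxy / SSqx, which is a convenient check on the calculation.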
Curve fit
• Used in cases where the relationship between x and y is not linear
• y = function(x)
– y = Log(x)
– y = sine(x)
• Polynomial curve fit
– A special case of multiple regression
– Can approximate most smooth shapes where y increases with x
– y = a + b1x + b2x^2 + b3x^3 … + bkx^k
– In most biological systems fitting to the power of 3 is sufficient
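Because a polynomial fit is a multiple regression on x, x^2, x^3 and so on, it can be sketched by building those powers as predictors and solving the least squares equations. The version below solves the normal equations directly, which gives the same least squares fit but is a different computational route from inverting the sum/products matrix. The function names and the test curve are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * c = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [v - f * w for v, w in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

def polyfit(x, y, degree):
    """Least squares [a, b1, ..., bk] for y = a + b1*x + ... + bk*x**k."""
    # Predictors are the powers of x; power 0 gives the constant a
    design = [[xi ** p for p in range(degree + 1)] for xi in x]
    size = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(size)]
           for i in range(size)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(size)]
    return solve(xtx, xty)

# Hypothetical cubic used to generate noise-free data, for illustration only
x = [0, 1, 2, 3, 4, 5]
y = [1 + 2 * v - 0.5 * v ** 2 + 0.1 * v ** 3 for v in x]
coeffs = polyfit(x, y, 3)
print([round(c, 4) for c in coeffs])
```

For higher powers or widely spread x values the normal equations become numerically unstable, so a library routine such as NumPy's `numpy.polyfit` is generally a safer choice in practice.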
Polynomial curve fit
[Figure: data points (x axis 0 to 6, y axis 0 to 10) with three fitted lines: y = a + bx, y = a + b1x + b2x^2, and y = a + b1x + b2x^2 + b3x^3]
CI of polynomial fit
• Complexity of calculating Standard Error
– Summing the error of each individual coefficient
– Residual
• Solution: 2 stage procedure
– Do polynomial curve fit
– Calculate error (distance of each data point from the regression line)
– Curve fit the error
[Figure: fitted polynomial curve; x axis 1 to 6, y axis -2 to 12]
y = 14.45 - 16.66x + 5.83x^2 - 0.45x^3
SD = 0.29 + 0.18x
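Once both stages are fitted, the two equations can be combined to give a range around the curve at any x. The sketch below simply evaluates the two fitted equations shown above. Treating mean ± 1.96 SD as a 95% range is an assumption (it requires roughly normally distributed errors), and the function names are hypothetical.

```python
def fitted_mean(x):
    """Stage 1: the fitted polynomial, y = 14.45 - 16.66x + 5.83x^2 - 0.45x^3."""
    return 14.45 - 16.66 * x + 5.83 * x ** 2 - 0.45 * x ** 3

def fitted_sd(x):
    """Stage 2: the fitted error, SD = 0.29 + 0.18x."""
    return 0.29 + 0.18 * x

def range_95(x, z=1.96):
    """Mean ± z*SD at x; assumes normally distributed errors."""
    m, s = fitted_mean(x), fitted_sd(x)
    return m - z * s, m + z * s

low, high = range_95(3)
print(round(fitted_mean(3), 2), round(fitted_sd(3), 2))
print(round(low, 2), round(high, 2))
```

Because the SD increases with x, the range around the curve widens from left to right rather than staying parallel to it.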
Curve fit: Femur length according to gestational age
[Figure: x axis Gestation (days), y axis Femur length (cms)]