Linear Regression and Correlation
Transcript of Linear Regression and Correlation
Fitted Regression Line
[Scatter plot: Length (cm) on the x-axis (54–70) vs. Y = Weight (g) on the y-axis (80–200), with the fitted regression line]
Equation of the Regression Line
Least squares regression line of Y on X:

$\hat{Y} = b_0 + b_1 X$

$b_1 = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$

$b_0 = \bar{y} - b_1 \bar{x}$
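These formulas can be checked directly. A minimal sketch with made-up length/weight data (the numbers are illustrative, not from the slides):

```python
# Least-squares slope and intercept from the formulas above (made-up data).
xs = [54, 58, 62, 66, 70]       # length (cm)
ys = [110, 130, 150, 175, 195]  # weight (g)

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
# b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

print(b1, b0)  # 5.375 -181.25
```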
Regression Calculations
Plotting the regression line
Residuals
Using the fitted line, it is possible to obtain an estimate of the y coordinate for any x. The "error" in the fit we term the "residual error":

$\hat{y}_i = b_0 + b_1 x_i$

$e_i = y_i - \hat{y}_i$
[Scatter plot with fitted line: Length (cm), 54–70, vs. Y = Weight (g), 80–200; the vertical offset from a data point to the line is labeled "Residual"]
Residual Standard Deviation
$s_{Y|X} = \sqrt{\dfrac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}}$

[Scatter plot with fitted line: Length (cm), 54–70, vs. Y = Weight (g), 80–200]
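A short sketch of the residual standard deviation computation, again on made-up data:

```python
import math

# Made-up data; fit the least-squares line first.
xs = [54, 58, 62, 66, 70]
ys = [110, 130, 150, 175, 195]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - y_hat_i, then s_{Y|X} = sqrt(SS_res / (n - 2)).
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
```

Note that least-squares residuals always sum to zero; only their squared sum carries information about scatter around the line.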
Residuals from example
Other ways to evaluate residuals
Lag plots: plot residuals vs. a time delay of the residuals; this looks for temporal structure.
Look for skew in the residuals.
Kurtosis in the residuals: error not distributed "normally".
[Histograms of model residuals (counts 0–40, residuals roughly −0.3 to 0.3) for the constrained and freely moving conditions, plus residual scatter plots, each comparing the pairwise model against the independent model]
Parametric Interpretation of regression: linear models
Conditional Populations and Conditional Distributions
A conditional population of Y values is associated with a fixed, or given, value of X. A conditional distribution is the distribution of values within the conditional population above.

$\mu_{Y|X}$: population mean Y value for a given X
$\sigma_{Y|X}$: population SD of Y values for a given X
The linear model
Assumptions: linearity; constant standard deviation.

$Y = \beta_0 + \beta_1 X + \varepsilon$

$\mu_{Y|X} = \beta_0 + \beta_1 X$

$\sigma_{Y|X} = \sigma$
Statistical inference concerning $\beta_1$

You can make statistical inferences on the model parameters themselves:

$b_1$ estimates $\beta_1$
$b_0$ estimates $\beta_0$
$s_{Y|X}$ estimates $\sigma_{Y|X}$
Standard error of slope
95% confidence interval for $\beta_1$:

$SE_{b_1} = \dfrac{s_{Y|X}}{\sqrt{\sum_i (x_i - \bar{x})^2}}$

$b_1 \pm t_{0.025}\, SE_{b_1}$, where $t_{0.025}$ is the critical value with $n - 2$ degrees of freedom
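A sketch of the interval computation on made-up data. The critical value 3.182 is an assumption for this example: it is $t_{0.025}$ for df = 3, read from a t table.

```python
import math

xs = [54, 58, 62, 66, 70]
ys = [110, 130, 150, 175, 195]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
s_yx = math.sqrt(sum((y - (b0 + b1 * x)) ** 2
                     for x, y in zip(xs, ys)) / (n - 2))

se_b1 = s_yx / math.sqrt(sxx)  # standard error of the slope
t_crit = 3.182                 # t_{0.025} with df = n - 2 = 3 (from a table)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```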
Hypothesis testing: is the slope significantly different from zero?
$H_0$: $\beta_1 = 0$. Using the test statistic:

$t_s = \dfrac{b_1}{SE_{b_1}}$

df = n − 2
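The same made-up data makes the test statistic concrete; compare the result against the t distribution with n − 2 degrees of freedom:

```python
import math

# Slope t-test on made-up data: t_s = b1 / SE(b1), df = n - 2.
xs = [54, 58, 62, 66, 70]
ys = [110, 130, 150, 175, 195]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
s_yx = math.sqrt(sum((y - (b0 + b1 * x)) ** 2
                     for x, y in zip(xs, ys)) / (n - 2))

t_s = b1 / (s_yx / math.sqrt(sxx))
df = n - 2  # look up t_s in a t table with this many degrees of freedom
```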
Coefficient of Determination
r², or coefficient of determination: how much of the variance in the data is accounted for by the linear model.
$r^2 = 1 - \dfrac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$
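The definition is one line once the fit is in hand; a sketch on made-up data:

```python
xs = [54, 58, 62, 66, 70]
ys = [110, 130, 150, 175, 195]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual SS
ss_tot = sum((y - y_bar) ** 2 for y in ys)                      # total SS
r2 = 1 - ss_res / ss_tot  # fraction of variance captured by the line
```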
Line “captures” most of the data variance.
Correlation Coefficient
r is symmetrical under exchange of x and y.

$b_1 = r \cdot \dfrac{s_Y}{s_X}$

$r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$
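Both properties, the symmetry and the link to the slope, can be verified numerically on made-up data:

```python
import math
import statistics

def corr(us, vs):
    """Pearson r from the definition above."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    num = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    den = math.sqrt(sum((u - u_bar) ** 2 for u in us)
                    * sum((v - v_bar) ** 2 for v in vs))
    return num / den

xs = [54, 58, 62, 66, 70]
ys = [110, 130, 150, 175, 195]

r = corr(xs, ys)
# b1 = r * s_Y / s_X recovers the regression slope of Y on X.
b1 = r * statistics.stdev(ys) / statistics.stdev(xs)
```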
What's this? It adjusts R to compensate for the fact that adding even uncorrelated variables to the regression improves R.
Statistical inference on correlations
Like the slope, one can define a t-statistic for correlation coefficients:
$t_s = r \sqrt{\dfrac{n-2}{1-r^2}}$
Consider the following "Spike Triggered Averages":
[Grid of spike-triggered average traces, each plotted from −5 to 15 ms, with amplitude scales ranging from ±500 µV to ±10 mV]
STA example
[Scatter plot with fitted regression line; x from −10 to 5, y from −6 to 6]

R² = 0.25. Is this correlation significant?

N = 446, $t_s = 0.25\sqrt{444/(1 - 0.25^2)} \approx 5.44$

$t_s = r \sqrt{\dfrac{n-2}{1-r^2}}$
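Plugging the slide's numbers into the formula is a quick sanity check; note that df = n − 2 = 444 here:

```python
import math

# Quick check of the correlation t-statistic with the slide's numbers.
r, n = 0.25, 446
t_s = r * math.sqrt((n - 2) / (1 - r ** 2))  # df = n - 2 = 444
```

With 444 degrees of freedom, any t above roughly 2 is significant at the 5% level, so this correlation clearly is.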
When is Linear Regression Inadequate?
Curvilinearity
Outliers
Influential points
Curvilinearity
[Scatter plot of curvilinear data: x from 60 to 70, y from −25 to 0, poorly described by a straight line]
Outliers
Can reduce correlations and unduly influence the regression line. You can "throw out" some clear outliers. There are a variety of tests to use. Example? Grubbs' test:

$Z = \dfrac{|\text{mean} - \text{value}|}{SD}$

Look up the critical Z value in a table. Is your Z value larger? The difference is significant and the data point can be discarded.
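A sketch of the Grubbs statistic on a made-up sample with one suspicious value; the critical value still has to come from a table for the given n:

```python
import statistics

# Made-up sample with one suspicious value (14.2).
data = [9.8, 10.1, 10.3, 9.9, 10.0, 14.2]

mean = statistics.mean(data)
sd = statistics.stdev(data)

# Z for the most extreme point: |mean - value| / SD
z = max(abs(x - mean) / sd for x in data)
# Compare z against the critical Grubbs value for n = 6 from a table.
```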
Influential points
Points that have a lot of influence on the regressed model. Not really an outlier, as the residual is small.

[Scatter plot: x from 50 to 105, y from 50 to 300, illustrating an influential point]
Conditions for inference
Design conditions
Random subsampling model: for each x observed, y is viewed as randomly chosen from the distribution of Y values for that X.
Bivariate random sampling: each observed (x, y) pair must be independent of the others. The experimental structure must not include pairing, blocking, or an internal hierarchy.

Conditions on parameters
$\mu_{Y|X} = \beta_0 + \beta_1 X$
$\sigma_{Y|X}$ is not a function of X

Conditions concerning population distributions
Same SD for all levels of X. Independent observations. Normal distribution of Y for each fixed X. Random samples.
Error Bars on Coefficients of Model
MANOVA and ANCOVA
MANOVA
Multivariate Analysis of Variance. Developed as a theoretical construct by S.S. Wilks in 1932. Key to assessing differences in groups across multiple metric dependent variables, based on a set of categorical (non-metric) variables acting as independent variables.
MANOVA vs ANOVA
ANOVA: Y1 = X1 + X2 + X3 + ... + Xn
(metric DV) (non-metric IVs)

MANOVA: Y1 + Y2 + ... + Yn = X1 + X2 + X3 + ... + Xn
(metric DVs) (non-metric IVs)
ANOVA Refresher
| Source  | SS            | df    | MS                  | F           |
|---------|---------------|-------|---------------------|-------------|
| Between | SS(B)         | k−1   | MS(B) = SS(B)/(k−1) | MS(B)/MS(W) |
| Within  | SS(W)         | N−k   | MS(W) = SS(W)/(N−k) |             |
| Total   | SS(B)+SS(W)   | N−1   |                     |             |

Reject the null hypothesis if the test statistic is greater than the critical F value with k−1 numerator and N−k denominator degrees of freedom. If you reject the null, at least one of the group means is different.
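The table arithmetic is simple enough to sketch with made-up sums of squares (k = 3 groups, N = 30 observations; the SS values are invented for illustration):

```python
# ANOVA table arithmetic with made-up sums of squares.
ss_between, ss_within = 120.0, 270.0
k, n_total = 3, 30

ms_between = ss_between / (k - 1)      # mean square between groups
ms_within = ss_within / (n_total - k)  # mean square within groups
f_stat = ms_between / ms_within

# Compare f_stat against the critical F with (k - 1, N - k) = (2, 27) df.
```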
MANOVA Guidelines
Assumptions are the same as ANOVA, with the additional condition of multivariate normality: all variables and all combinations of the variables are normally distributed. Also assumes equal covariance matrices (standard deviations between variables should be similar).
Example: the first group receives technical dietary information interactively from an on-line website. Group 2 receives the same information from a nurse practitioner, while group 3 receives the information from a video tape made by the same nurse practitioner.
Users rate the instruction based on usefulness, difficulty, and importance.
Note: one categorical independent variable (instruction group, three levels) and three metric dependent variables.
Hypotheses
H0: There is no difference among the treatment groups (online, oral, and visual learners).
HA: There is a difference.
Order of operations
MANOVA Output 2
Individual ANOVAs not significant
MANOVA output
Overall multivariate effect is significant
Post hoc tests to find the culprit
Post hoc tests to find the culprit!
Once more, with feeling: ANCOVA
Analysis of covariance: a hybrid of regression analysis and ANOVA-style methods.
Suppose you have pre-existing effect differences between subjects. Suppose two experimental conditions, A and B: you could test half your subjects with AB (A then B) and the other half with BA using a repeated measures design.
Why use it? Suppose there exists a particular variable that *explains* some of what's going on in the dependent variable in an ANOVA-style experiment. Removing the effects of that variable can help you determine whether the categorical difference is "real" or simply depends on this variable.
In a repeated measures design, suppose the following situation: sequencing effects, where performing A first impacts outcomes in B. Example: A and B represent different learning methodologies.
ANCOVA can compensate for systematic biases among samples (if sorting produces unintentional correlations in the data).
Example
Results
Second Example
How do the amount spent on groceries, and the amount one intends to spend, depend on a subject's sex?
H0: no dependence. Two analyses:
MANOVA to look at the dependence.
ANCOVA to determine whether there is significant covariance between intended spending and actual spending.
MANOVA
Results
ANCOVA
ANCOVA Results
So if you remove the amount the subjects intend to spend from the equation, there is no significant difference in spending. The spending difference is not a result of "impulse buys," it seems.
Principal Component Analysis
Say you have time series data, characterized by multiple channels or trials. Is there a set of factors underlying the data that explains it (is there a simpler explanation for the observed behavior)?
In other words, can you infer the quantities that are supplying variance to the observed data, rather than testing *whether* known factors supply the variance?
Example: 8 channels of recorded EMG activity

[Figure: the 8 EMG traces plotted over samples 0–2500]
PCA works by “rotating” the data (considering a time series as a spatial vector) to a “position” in the abstract space that minimizes covariance.
Don’t worry about what this means.
[Figures: the 8 EMG channels and their principal components, samples 0–2500]

Note how a single component explains almost all of the variance in the 8 EMGs recorded.
The next step would be to correlate these components with some other parameter in the experiment.
[Figure: the largest principal component plotted alongside neural firing rates, samples 0–2500]
Some additional uses: say you have a very large data set, but believe there are some common features uniting that data set. Use a PCA-type analysis to identify those common features. Retain only the most important components to describe a "reduced" data set.
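A tiny sketch of the idea in two dimensions, on made-up data: diagonalizing the covariance matrix gives the variance carried by each principal component, and the largest eigenvalue's share is the "variance explained" by the first component.

```python
import math

# Made-up 2-D data with strongly correlated coordinates.
pts = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
       (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
n = len(pts)
mx = sum(x for x, _ in pts) / n
my = sum(y for _, y in pts) / n

# Sample covariance matrix [[cxx, cxy], [cxy, cyy]].
cxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
cyy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
cxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)

# Eigenvalues of the symmetric 2x2 matrix: variance along each PC.
tr = cxx + cyy
det = cxx * cyy - cxy ** 2
root = math.sqrt(tr ** 2 - 4 * det)
lam1, lam2 = (tr + root) / 2, (tr - root) / 2

explained = lam1 / (lam1 + lam2)  # fraction of variance on the first PC
```

For more than two dimensions, the same diagonalization is done with an eigendecomposition (or SVD) of the full covariance matrix.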