Correlation


Page 1: Correlation

Xuhua Xia Slide 1

Correlation

• Simple correlation

– between two variables

• Multiple and Partial correlations

– between one variable and a set of other variables

• Canonical Correlation

– between two sets of variables each containing more than one variable.

• Simple and multiple correlations are special cases of canonical correlation.

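Simple correlation between X and Y:

$$ r_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} $$

Partial: between X and Y with Z being controlled for

$$ r_{XY.Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}} $$

If $r_{XZ} = r_{YZ} = 0$, then $r_{XY.Z} = r_{XY}$.

If $|r_{XY}| > |r_{XZ}\, r_{YZ}|$, then $\operatorname{sign}(r_{XY.Z}) = \operatorname{sign}(r_{XY})$.

Multiple: x1 on x2 and x3

$$ R^2_{1.23} = \frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2} $$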
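The partial and multiple correlation formulas on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `partial_r` and `multiple_r2` and the input correlations are made up here and do not come from the slides.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation r_XY.Z computed from the three simple correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def multiple_r2(r12, r13, r23):
    """Squared multiple correlation R^2_{1.23} of x1 on x2 and x3."""
    return (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)

# Made-up correlations, for illustration only.
print(partial_r(0.6, 0.0, 0.0))    # Z unrelated to X and Y: r_XY is unchanged
print(partial_r(0.6, 0.8, 0.9))    # r_XZ * r_YZ exceeds r_XY: the sign flips
print(multiple_r2(0.5, 0.4, 0.0))  # uncorrelated predictors: r12^2 + r13^2
```

The second call shows why controlling for Z matters: a positive simple correlation can become a negative partial correlation when both variables are strongly tied to Z.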

Page 2: Correlation

Xuhua Xia Slide 2

Review of correlation

X   Z   Y
1   4   14.0000
1   5   17.9087
1   6   16.3255
2   3   14.4441
2   4   15.2952
2   5   19.1587
2   6   16.0299
2   5   17.0000
3   3   14.7556
3   4   17.6823
3   5   20.5301
3   6   21.6408
4   3   15.0903
4   4   18.1603
4   5   22.2471
5   2   14.4450
5   3   16.5554
5   4   21.0047
5   5   22.0000
6   1   19.0000
6   2   18.0000
6   3   18.1863
6   4   21.0000

Compute Pearson correlation coefficients between X and Z, X and Y and Z and Y.

Compute partial correlation coefficient between X and Y, controlling for Z (i.e., the correlation coefficient between X and Y when Z is held constant), by using the equation in the previous slide.

Run SAS to verify your calculation:

proc corr pearson;
  var X Y;
  partial Z;
run;
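Before running SAS, the hand calculation can be sketched in plain Python. The block below is a cross-check under stated assumptions, not part of the original slides: the helper names (`pearson_r`, `residuals`) are hypothetical, and the rows are typed in from the review data on this slide. It computes the partial correlation twice, once from the formula and once as the correlation between what is left of X and Y after their linear dependence on Z is removed.

```python
import math

# (X, Z, Y) rows copied from the review data on this slide.
rows = [
    (1, 4, 14.0000), (1, 5, 17.9087), (1, 6, 16.3255), (2, 3, 14.4441),
    (2, 4, 15.2952), (2, 5, 19.1587), (2, 6, 16.0299), (2, 5, 17.0000),
    (3, 3, 14.7556), (3, 4, 17.6823), (3, 5, 20.5301), (3, 6, 21.6408),
    (4, 3, 15.0903), (4, 4, 18.1603), (4, 5, 22.2471), (5, 2, 14.4450),
    (5, 3, 16.5554), (5, 4, 21.0047), (5, 5, 22.0000), (6, 1, 19.0000),
    (6, 2, 18.0000), (6, 3, 18.1863), (6, 4, 21.0000),
]
X = [r[0] for r in rows]
Z = [r[1] for r in rows]
Y = [r[2] for r in rows]

def mean(v):
    return sum(v) / len(v)

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def residuals(dep, indep):
    """Residuals from the least-squares line of dep on indep."""
    md, mi = mean(dep), mean(indep)
    slope = (sum((i - mi) * (d - md) for i, d in zip(indep, dep))
             / sum((i - mi) ** 2 for i in indep))
    return [d - md - slope * (i - mi) for i, d in zip(indep, dep)]

r_xz, r_xy, r_zy = pearson_r(X, Z), pearson_r(X, Y), pearson_r(Z, Y)

# Partial correlation from the formula on the previous slide.
r_xy_z = (r_xy - r_xz * r_zy) / math.sqrt((1 - r_xz ** 2) * (1 - r_zy ** 2))

# Same quantity obtained by "holding Z constant" via residuals.
r_resid = pearson_r(residuals(X, Z), residuals(Y, Z))

print(f"r_XZ={r_xz:.4f}  r_XY={r_xy:.4f}  r_ZY={r_zy:.4f}")
print(f"r_XY.Z (formula)={r_xy_z:.4f}  (residuals)={r_resid:.4f}")
```

The two partial correlations agree up to floating-point error, which is what "the correlation between X and Y when Z is held constant" means in practice.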

Page 3: Correlation

Xuhua Xia Slide 3

Many Possible Correlations

• With multiple DV’s and IV’s, there could be many correlation patterns:

– Variable A in the DV set could be correlated to variables a, b, c in the IV set

– Variable B in the DV set could be correlated to variables c, d in the IV set

– Variable C in the DV set could be correlated to variables a, c, e in the IV set

• With this plethora of possible correlated relationships, what is the best way of summarizing them?

Page 4: Correlation

Xuhua Xia Slide 4

Dealing with Two Sets of Variables

• The simple correlation approach:

– For N DV’s and M IV’s, calculate the simple correlation coefficient between each of the N DV’s and each of the M IV’s, yielding a total of N*M correlation coefficients

• The multiple correlation approach:

– For N DV’s and M IV’s, calculate multiple or partial correlation coefficients between each of the N DV’s and the set of M IV’s, yielding a total of N correlation coefficients

• The canonical correlation

• Note: All these deal with linear correlations
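As a hedged sketch of what the third approach computes, in standard textbook notation rather than notation taken from the slides: let X be the vector of N DV's and Y the vector of M IV's, with covariance blocks Σ_XX, Σ_YY and Σ_XY. The weight vectors a and b and the symbols below are introduced here for illustration only.

```latex
% First canonical correlation: the largest correlation attainable between
% a linear combination of the DV's and a linear combination of the IV's.
\rho_1 = \max_{a,\,b}\ \operatorname{corr}\!\left(a^{\top}X,\; b^{\top}Y\right)
       = \max_{a,\,b}\ \frac{a^{\top}\Sigma_{XY}\,b}
                            {\sqrt{a^{\top}\Sigma_{XX}\,a}\;\sqrt{b^{\top}\Sigma_{YY}\,b}}

% Later pairs (a_k^T X, b_k^T Y) maximize the same ratio subject to being
% uncorrelated with all earlier pairs. The squared canonical correlations
% \rho_1^2 \ge \rho_2^2 \ge \dots are the \min(N, M) largest eigenvalues of
\Sigma_{XX}^{-1}\,\Sigma_{XY}\,\Sigma_{YY}^{-1}\,\Sigma_{YX}
```

Instead of N*M or N coefficients, this summarizes the two sets with at most min(N, M) coefficients, ordered from strongest to weakest.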

Page 5: Correlation

Xuhua Xia Slide 5

Fitness Data

/* First three variables: physical
   Last three variables: exercise
   Middle-aged men */
data fit;
  input weight waist pulse chins situps jumps @@;
  cards;
191 36 50  5 162  60   189 37 52  2 130  60
193 38 58 12 101 101   162 35 62 12 145  37
189 35 46 13 145  58   182 36 56  4 141  42
211 38 56  8 151  38   167 34 60  6 155  40
176 31 74 15 200  40   154 30 56 17 251 250
169 34 50 17 120  38   166 33 52 13 210 115
154 34 64 14 215 105   247 46 50  1  50  50
193 36 46  6 170  31   202 37 62 12 120 120
176 37 54  4 160  25   157 32 52 11 230  80
156 33 54 15 215  73   138 33 68  2 150  43
;

Page 6: Correlation

Xuhua Xia Slide 6

SAS Program

proc cancorr data=fit vdep wdep smc stb t probt
     vprefix=PHYS vname='Physical Measurements'
     wprefix=EXER wname='Exercises';
  var weight waist pulse;
  with chins situps jumps;
  title2 'Middle-aged Men in a Health Fitness Club';
  title3 'Data Courtesy of Dr. A. C. Linnerud, NC State Univ.';
run;

What do these cryptic options mean? See the next slide.

Page 7: Correlation

Xuhua Xia Slide 7

SAS Program

proc cancorr data=fit short vdep wdep smc stb t probt

• SHORT - suppresses all default output except the tables of canonical correlations and multivariate statistics.

• VDEP - requests multiple regression analyses with the VAR variables as dependent variables and the WITH variables as regressors. WDEP does the opposite.

• SMC - prints squared multiple correlations and F tests for the regression analyses.

• STB - requests standardized regression coefficients.

• T - prints t statistics for the regression coefficients. PROBT prints the probability levels for those t statistics.

• VPREFIX - specifies a prefix for naming the canonical variables from the VAR statement instead of the default V1, V2, and so on. WPREFIX does the same for the WITH statement (default W1, W2, and so on).

Page 8: Correlation

Xuhua Xia Slide 8

Multiple Correlations

DV: the Physical Measurements

IV: Exercises

Squared Multiple Correlations and F Tests
3 numerator df, 16 denominator df

                                95% CI for R2
          R2        R2.adj      Lower   Upper   F       Pr > F
weight    0.517798  0.427385    0.065   0.736   5.73    0.0074
waist     0.752679  0.706306    0.380   0.877   16.23   <.0001
pulse     0.037362  -.143132    0.000   0.177   0.21    0.8901

Weight and waist are significantly associated with the exercise variables (p = 0.0074 and p < 0.0001, respectively); pulse is not (p = 0.8901).
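The adjusted R2 and F columns follow directly from R2 and the degrees of freedom, which can be checked with a short Python sketch. The sample size n = 20 is an inference from the 20 records in the fitness data and from the 16 denominator df (n - 3 - 1); the function names are hypothetical.

```python
n, k = 20, 3  # 20 men, 3 regressors -> 3 numerator df and 16 denominator df

# name: (R2, R2.adj, F) as printed in the table above
table = {
    "weight": (0.517798, 0.427385, 5.73),
    "waist":  (0.752679, 0.706306, 16.23),
    "pulse":  (0.037362, -0.143132, 0.21),
}

def adjusted_r2(r2, n, k):
    """R2 penalized for the number of regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_stat(r2, n, k):
    """F test of the hypothesis that all k regression coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

for name, (r2, adj, f) in table.items():
    print(f"{name:6s} adj={adjusted_r2(r2, n, k):.6f} (table {adj})"
          f"  F={f_stat(r2, n, k):.2f} (table {f})")
```

The negative adjusted R2 for pulse is not an error: it arises whenever R2 is smaller than k / (n - 1), here 3/19.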

Page 9: Correlation

Xuhua Xia Slide 9

Regression of Phys. on Exer.

Standardized Regression Coefficients

          weight    waist     pulse
chins     -0.1059   -0.2791   0.1281
situps    -0.7273   -0.7640   0.1351
jumps     0.1619    0.1465    -0.0909

t Values for the Regression Coefficients

          weight    waist     pulse
chins     -0.4957   -1.8243   0.4244
situps    -3.4776   -5.1007   0.4571
jumps     0.7768    0.9809    -0.3087

Prob > |t| for the Regression Coefficients

          weight    waist     pulse
chins     0.6268    0.0868    0.6769
situps    0.0031    0.0001    0.6537
jumps     0.4486    0.3412    0.7615

Page 10: Correlation

Xuhua Xia Slide 10

Multiple Correlations

DV: Exercises

IV: the Physical Measurements

Squared Multiple Correlations and F Tests
3 numerator df, 16 denominator df

                                95% CI for R2
          R2        R2.adj      Lower   Upper   F       Pr > F
chins     0.408377  0.297448    0.000   0.657   3.68    0.0344
situps    0.716127  0.662901    0.316   0.857   13.45   0.0001
jumps     0.144544  -.015853    0.000   0.395   0.90    0.4622

Page 11: Correlation

Xuhua Xia Slide 11

Regression of Exer. on Phys.

Standardized Regression Coefficients

          chins     situps    jumps
weight    0.4994    0.0468    0.2802
waist     -1.0261   -0.9209   -0.6102
pulse     -0.0085   -0.1324   -0.0658

t Values for the Regression Coefficients

          chins     situps    jumps
weight    1.2653    0.1710    0.5904
waist     -2.6335   -3.4120   -1.3024
pulse     -0.0411   -0.9249   -0.2649

Prob > |t| for the Regression Coefficients

          chins     situps    jumps
weight    0.2239    0.8664    0.5632
waist     0.0181    0.0036    0.2112
pulse     0.9678    0.3688    0.7945

Page 12: Correlation

Xuhua Xia Slide 12

Canonical Correlation

     Canonical     Adjusted Canonical   Approx Standard   Squared Canonical
     Correlation   Correlation          Error             Correlation
1    0.878578      0.856195             0.052330          0.771899
2    0.264992      0.080853             0.213306          0.070221
3    0.062661      .                    0.228515          0.003926

     Eigenvalue   Difference   Proportion   Cumulative
1    3.3840       3.3085       0.9771       0.9771
2    0.0755       0.0716       0.0218       0.9989
3    0.0039                    0.0011       1.0000

Significance test:

Eigenvalue   Likelihood Ratio   Approximate F Value   Num DF   Den DF   Pr > F
1            0.21125051         3.40                  9        34.223   0.0044
2            0.92612863         0.29                  4        30       0.8799
3            0.99607358         0.06                  1        16       0.8049
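The eigenvalue and likelihood-ratio columns can both be recovered from the squared canonical correlations. The Python sketch below assumes, as SAS PROC CANCORR reports, that each eigenvalue equals CanRsq / (1 - CanRsq), and that the likelihood ratio in row k is the product of (1 - CanRsq) over canonical correlations k, k+1, and so on. The variable names are made up for illustration.

```python
import math

# Squared canonical correlations from the first table on this slide.
can_r2 = [0.771899, 0.070221, 0.003926]

# Eigenvalue for each canonical correlation: r^2 / (1 - r^2).
eigen = [r2 / (1 - r2) for r2 in can_r2]

# Share of the eigenvalue total carried by each canonical correlation.
proportion = [e / sum(eigen) for e in eigen]

# Likelihood ratio for testing that canonical correlation k and all
# smaller ones are zero: product of (1 - r^2) from k onward.
wilks = [math.prod(1 - r2 for r2 in can_r2[k:]) for k in range(len(can_r2))]

for k in range(len(can_r2)):
    print(f"{k + 1}: eigenvalue={eigen[k]:.4f}  proportion={proportion[k]:.4f}"
          f"  likelihood ratio={wilks[k]:.8f}")
```

Only the first row of the significance test is small (p = 0.0044), so only the first canonical correlation is distinguishable from zero in these data.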

Page 13: Correlation

Xuhua Xia Slide 13

Standardized Canonical Coefficients

for the Physical Measurements

          PHYS1     PHYS2     PHYS3
weight    -0.1899   2.0261    0.2691
waist     1.1929    -1.5800   -0.4314
pulse     0.1218    0.3245    -1.0176

for the exercises

          EXER1     EXER2     EXER3
chins     -0.3383   1.0114    -0.6139
situps    -0.8614   -0.8403   -0.0579
jumps     0.1512    0.2536    1.1640

Because the variables are not measured in the same units, the standardized coefficients rather than the raw coefficients should be interpreted.

Page 14: Correlation

Xuhua Xia Slide 14

Canonical Structure: correlations

Between Phys. and their canonical var.:

          PHYS1     PHYS2     PHYS3
weight    0.8028    0.5335    0.2662
waist     0.9872    0.0737    0.1416
pulse     -0.2061   0.1098    -0.9723

Between Exer. and their canonical var.:

          EXER1     EXER2     EXER3
chins     -0.6945   0.7165    -0.0658
situps    -0.9609   -0.2169   0.1721
jumps     -0.4141   0.3671    0.8329

Between Phys. and the canonical var. of Exer.:

          EXER1     EXER2     EXER3
weight    0.7054    0.1414    0.0167
waist     0.8673    0.0195    0.0089
pulse     -0.1811   0.0291    -0.0609

Between Exer. and the canonical var. of Phys.:

          PHYS1     PHYS2     PHYS3
chins     -0.6102   0.1899    -0.0041
situps    -0.8442   -0.0575   0.0108
jumps     -0.3638   0.0973    0.0522
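The four tables are not independent. A variable's correlation with the k-th canonical variable of the opposite set equals its correlation with the k-th canonical variable of its own set, multiplied by the k-th canonical correlation. The Python sketch below checks this against the printed values; the dictionary and function names are hypothetical, and the canonical correlations are the ones reported on Slide 12.

```python
# Canonical correlations reported on Slide 12.
can_corr = [0.878578, 0.264992, 0.062661]

# Correlations of each variable with its own set's canonical variables.
own = {
    "weight": [0.8028, 0.5335, 0.2662],
    "waist":  [0.9872, 0.0737, 0.1416],
    "pulse":  [-0.2061, 0.1098, -0.9723],
    "chins":  [-0.6945, 0.7165, -0.0658],
    "situps": [-0.9609, -0.2169, 0.1721],
    "jumps":  [-0.4141, 0.3671, 0.8329],
}

# Correlations with the opposite set's canonical variables, as printed.
cross = {
    "weight": [0.7054, 0.1414, 0.0167],
    "waist":  [0.8673, 0.0195, 0.0089],
    "pulse":  [-0.1811, 0.0291, -0.0609],
    "chins":  [-0.6102, 0.1899, -0.0041],
    "situps": [-0.8442, -0.0575, 0.0108],
    "jumps":  [-0.3638, 0.0973, 0.0522],
}

def predicted_cross(loadings):
    """Own-set loading times the matching canonical correlation."""
    return [load * r for load, r in zip(loadings, can_corr)]

# Largest disagreement between predicted and printed cross-correlations.
max_gap = max(
    abs(p - c)
    for name in own
    for p, c in zip(predicted_cross(own[name]), cross[name])
)
print(f"largest gap: {max_gap:.6f}")  # only rounding error is expected
```

Because the second and third canonical correlations are small, the cross-set correlations in the second and third columns are close to zero even where the own-set correlations are large (pulse with PHYS3, for example).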

Page 15: Correlation

Xuhua Xia Slide 15

Ecology data

data candata;
  input Sp1 Sp2 Sp3 Sp4 Chem1 Chem2 Chem3 Chem4;
  cards;
21.09 21.90  9.19  9.18 20.96 21.52  7.46  7.41
14.69 14.85 14.06 14.07 14.80 14.63 13.71 13.69
 2.11  2.17  3.13  3.06  3.17  2.43  2.10  1.96
 9.58  9.47  8.14  8.06  9.54  9.71  9.36  9.43
10.02 10.71  9.02  9.06 11.16 10.59 10.91 11.10
14.65 14.32 15.10 15.15 14.59 14.61 13.55 13.55
24.42 24.12  6.00  6.12 24.36 24.50  4.30  4.34
22.20 22.10  4.14  4.04 23.37 22.74  4.90  5.06
 8.34  8.88  9.16  9.06  8.75  8.19  7.59  7.58
10.49 10.12 11.08 11.13 10.09 10.73  9.55  9.56
25.72 25.91  1.12  1.16 25.94 26.01  1.98  1.99
 4.16  4.44  3.05  3.09  3.97  4.89  4.53  4.53
12.07 12.31 11.09 11.15 12.68 12.89 12.62 12.78
19.13 19.36 11.13 11.05 18.69 19.05  9.01  9.16
 5.80  5.15  4.11  4.18  6.07  6.33  5.10  4.96
 1.27  1.15  2.10  2.17  1.27  1.80  0.73  0.75
22.15 22.52  8.01  8.04 22.08 22.53  7.43  7.31
26.53 26.27  0.14  0.11 26.33 26.88  0.55  0.57
17.25 17.68 11.12 11.18 17.39 17.76  9.51  9.55
 7.94  7.46  6.13  6.03  7.53  7.67  7.51  7.47
 4.12  4.45  3.08  3.14  5.21  4.65  3.92  4.00
17.59 17.53 11.19 11.04 16.97 16.70 12.30 12.26
15.41 15.16 13.12 13.03 15.79 16.01 12.00 11.83
12.90 12.93 11.12 11.12 12.80 12.04 11.52 11.52
19.14 19.11  7.16  7.14 19.88 19.84  8.86  8.90
25.11 25.50  3.13  3.20 25.28 25.44  4.26  4.23
;

Page 16: Correlation

Xuhua Xia Slide 16

SAS Program (cont.)

proc cancorr vdep wdep smc stb t probt
     vprefix=BIO vname='Species'
     wprefix=ENV wname='Environment';
  var Sp1 Sp2 Sp3 Sp4;
  with Chem1 Chem2 Chem3 Chem4;
run;

Run and explain