Disc Rims As
Uploaded by sayan-chakraborty
7/31/2019 Disc Rims As
Discriminant analysis using SAS macro
Discriminant analysis
Data: Diabetic data
Step 1: Create the SAS data set.
Step 2: Produce exploratory scatter plots by group by running the SAS macro DISCRIM with YES in the data exploration field. Also run stepwise discriminant analysis to select significant variables.
Step 3: Check for multivariate normality and outliers by running the SAS macro DISCRIM with YES in the assumption check field.
Step 4: Run the canonical and parametric discriminant function analyses by running the SAS macro DISCRIM with the data exploration field left blank.
Step 5: Examine the canonical discriminant analysis results and the bi-plot display of discrimination.
Step 6: If the multivariate normality assumption is severely violated, perform non-parametric discriminant function analysis by entering YES for the non-parametric option.
Step 7: Check the classification results and compare the error rates from cross-validation.
Step 8: If you have an independent validation data set, confirm the discrimination results using the validation data.
Step 9: Summary and conclusions.
Step 1: Creating the SAS data set
If you are using the sample data, open the downloaded discrim-data.SAS file and submit it. If you are using other data, create a SAS data set.
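/* Value labels for the three clinical groups. The gp. format is defined
   for numeric codes; group is read as character below, so gp. is not
   attached anywhere in this listing. */
proc format;
   value gp 3='(3) Overt Diabetic '
            2='(2) Chem. Diabetic'
            1='(1) Normal';
run;

/* One record per patient: id, five measurements, clinical group */
data diabetic;
   input patient relwt glufast glutest instest sspg group $;
   label relwt   = 'Relative weight'
         glufast = 'Fasting Plasma Glucose'
         glutest = 'Test Plasma Glucose'
         instest = 'Plasma Insulin during Test'
         sspg    = 'Steady State Plasma Glucose'
         group   = 'Clinical Group';
   datalines;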
1 0.81 80 356 124 55 1
2 0.95 97 289 117 76 1
3 0.94 105 319 143 105 1
4 1.04 90 356 199 108 1
5 1.00 90 323 240 143 1
6 0.76 86 381 157 165 1
7 0.91 100 350 221 119 1
8 1.10 85 301 186 105 1
9 0.99 97 379 142 98 1
10 0.78 97 296 131 94 1
11 0.90 91 353 221 53 1
12 0.73 87 306 178 66 1
13 0.96 78 290 136 142 1
14 0.84 90 371 200 93 1
15 0.74 86 312 208 68 1
16 0.98 80 393 202 102 1
17 1.10 90 364 152 76 1
(complete data not shown)
141 1.05 353 1428 41 480 3
142 0.91 180 923 77 150 3
143 0.90 213 1025 29 209 3
144 1.11 328 1246 124 442 3
145 0.74 346 1568 15 253 3
;
PROC PRINT DATA=diabetic(obs=10) label;
TITLE ' Discriminant analysis example';
run;
Step 2: Produce exploratory scatter plots by running the SAS macro DISCRIM with YES in the data exploration field.
Open the macro-call window DISCRIM by running the downloaded macro-call file discrim.sas, enter the appropriate macro variable names, and submit.
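The macro's own parameters are not listed in this transcript, so the following is only a minimal sketch of comparable direct calls to PROC STEPDISC, assuming the diabetic data set created in Step 1. SLSTAY=0.15 mirrors the "Significance Level to Stay" reported in the backward-elimination listing; the forward and stepwise calls rely on the procedure defaults, which is an assumption about what the macro does internally.

```sas
/* Sketch only: direct PROC STEPDISC calls, not the DISCRIM macro itself */

/* Backward elimination; SLSTAY=0.15 is the value shown in the listing */
proc stepdisc data=diabetic method=backward slstay=0.15;
   class group;
   var relwt glufast glutest instest sspg;
run;

/* Forward selection with the procedure's default entry level (assumption) */
proc stepdisc data=diabetic method=forward;
   class group;
   var relwt glufast glutest instest sspg;
run;

/* Stepwise selection with default entry and stay levels (assumption) */
proc stepdisc data=diabetic method=stepwise;
   class group;
   var relwt glufast glutest instest sspg;
run;
```

Comparing the three summaries shows whether the same variables survive regardless of the selection method.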
The Method for Selecting Variables is BACKWARD

Observations   145    Variable(s) in the Analysis    5
Class Levels     3    Variable(s) will be Included   0
                      Significance Level to Stay     0.15

Statistics for Removal, DF = 2, 138

Variable  Label                   Partial R-Square  F Value  Pr > F
relwt     Relative weight         0.0749            5.58     0.0047
glufast   Fasting Plasma Glucose  0.2781            26.58
Stepwise Selection Summary

Number In  Entered  Removed  Label                Partial R-Square  F Value  Pr > F  Wilks' Lambda  Pr < Lambda  ASCC
1          glutest           Test Plasma Glucose  0.7738            242.84
Forward Selection Summary

Step  Number In  Entered  Label                Partial R-Square  F Value  Pr > F  Wilks' Lambda  Pr < Lambda  ASCC
1     1          glutest  Test Plasma Glucose  0.7738            242.84
Selected Scatter Plots:
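The plots themselves did not survive in this transcript. As a rough stand-in, a grouped scatter-plot matrix can be drawn with PROC SGSCATTER; this is a swapped-in technique rather than the macro's own graphics code, and it assumes a SAS release that includes the SG procedures and the diabetic data set from Step 1.

```sas
/* Sketch only: scatter-plot matrix of the five predictors, with points
   distinguished by clinical group. Not the DISCRIM macro's plotting code. */
proc sgscatter data=diabetic;
   matrix relwt glufast glutest instest sspg / group=group;
run;
```

Panels in which the three groups separate cleanly point to variables that are likely to carry discriminating power.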
Steps 3 and 4: Perform the canonical and parametric discriminant function analyses by running the SAS macro DISCRIM with the data exploration field left blank. Also check for multivariate normality and influential outliers.
Open the macro-call window DISCRIM by running the downloaded macro-call file discrim.sas, enter the appropriate macro variable names, and submit.
Clinical Group=1
Obs  s1         s2         s3          s4       s5        r1        r2        r3        r4       r5
1    0.11252    0.029294   -0.099238   1.71461  0.95699   -0.75959  0.14815   -0.71406  5.43254  0.55298

Clinical Group=2
Obs  s1         s2         s3          s4       s5        r1        r2        r3        r4       r5
2    -0.34755   -0.23682   0.70585     1.24705  -0.65609  -0.72631  -0.39443  0.037414  1.16893  -0.35500

Clinical Group=3
Obs  s1         s2         s3          s4       s5        r1        r2        r3        r4       r5
3    0.021434   0.37220    0.13407     2.11185  -0.11574  -0.88786  -1.25087  -1.16310  5.82761  -0.44997

Multivariate normality test statistics

Statistic  Value    Description
M_SKEW     8.433    Multivariate skewness
CHI_SKEW   203.798  Skewness chi-square
PVALSKEW   0.000    Skewness P-value
M_KURT     52.853   Multivariate kurtosis
Z_KURT     12.847   Kurtosis z-value
PVALKURT   0.000    Kurtosis P-value
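The transcript does not name the test behind these statistics. The reported values are, however, numerically consistent with Mardia's multivariate skewness and kurtosis tests for n = 145 observations and p = 5 variables, as the following check suggests. Treating the macro's test as Mardia's is an inference from the numbers, not something the source states.

```latex
% Mardia-type checks with n = 145, p = 5 (assumed interpretation)
\[
\chi^2_{\text{skew}} = \frac{n\, b_{1,p}}{6}
  = \frac{145 \times 8.433}{6} \approx 203.80,
\qquad
\text{df} = \frac{p(p+1)(p+2)}{6} = \frac{5 \cdot 6 \cdot 7}{6} = 35
\]
\[
z_{\text{kurt}} = \frac{b_{2,p} - p(p+2)}{\sqrt{8\,p(p+2)/n}}
  = \frac{52.853 - 35}{\sqrt{280/145}} \approx 12.85
\]
```

Both statistics are far into the rejection region, which agrees with the reported P-values of 0.000 and motivates the non-parametric analysis in Step 6.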
id rdsq chisq diff
86 46.5157 17.6296 28.8860
144 30.2647 15.0041 15.2606
131 25.7032 13.7552 11.9480
141 24.0769 12.9202 11.1567
145 23.2948 12.2891 11.0057
139 22.3210 11.7799 10.5411
134 19.3147 11.3522 7.9625
116 16.3652 10.9827 5.3825
93 15.8799 10.6570 5.2229
82 14.6057 10.3655 4.2402
133 13.9789 10.1014 3.8775
137 13.6007 9.8597 3.7410
69 12.9646 9.6367 3.3279
89 12.1243 9.2364 2.8880
113 12.2535 9.4297 2.8238
136 11.5431 8.8837 2.6593
124 11.3769 8.7218 2.6552
99 11.2212 8.5680 2.6532
114 11.6813 9.0548 2.6264
Observations  145    DF Total            144
Variables       5    DF Within Classes   142
Classes         3    DF Between Classes    2

Class Level Information

group  Variable Name  Frequency  Weight   Proportion  Prior Probability
1      _1             76         76.0000  0.524138    0.524138
2      _2             36         36.0000  0.248276    0.248276
3      _3             33         33.0000  0.227586    0.227586

Total-Sample

Variable  Label                        N    Sum        Mean       Variance  Standard Deviation
relwt     Relative weight              145  141.71000  0.97731    0.01670   0.1292
glufast   Fasting Plasma Glucose       145  17688      121.98621  4087      63.9304
glutest   Test Plasma Glucose          145  78824      543.61379  100458    316.9509
instest   Plasma Insulin during Test   145  26987      186.11724  14625     120.9352
sspg      Steady State Plasma Glucose  145  26710      184.20690  11242     106.0299
II. Canonical discriminant analysis
group = 1

Variable  Label                        N   Sum       Mean       Variance  Standard Deviation
relwt     Relative weight              76  71.23000  0.93724    0.01652   0.1285
glufast   Fasting Plasma Glucose       76  6930      91.18421   67.69895  8.2279
glutest   Test Plasma Glucose          76  26598     349.97368  1359      36.8706
instest   Plasma Insulin during Test   76  13121     172.64474  4741      68.8538
sspg      Steady State Plasma Glucose  76  8664      114.00000  3310      57.5328

group = 2

Variable  Label                        N   Sum       Mean       Variance  Standard Deviation
relwt     Relative weight              36  38.01000  1.05583    0.01021   0.1010
glufast   Fasting Plasma Glucose       36  3575      99.30556   90.04683  9.4893
glutest   Test Plasma Glucose          36  17782     493.94444  3070      55.4117
instest   Plasma Insulin during Test   36  10368     288.00000  24911     157.8317
sspg      Steady State Plasma Glucose  36  7523      208.97222  3595      59.9593

group = 3

Variable  Label                        N   Sum       Mean       Variance  Standard Deviation
relwt     Relative weight              33  32.47000  0.98394    0.01447   0.1203
glufast   Fasting Plasma Glucose       33  7183      217.66667  5862      76.5632
glutest   Test Plasma Glucose          33  34444     1044       95725     309.3953
instest   Plasma Insulin during Test   33  3498      106.00000  8728      93.4251
sspg      Steady State Plasma Glucose  33  10523     318.87879  7801      88.3221
The DISCRIM Procedure
Test of Homogeneity of Within Covariance Matrices

Chi-Square  DF  Pr > ChiSq
396.799635  30

Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function. Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.
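Because the homogeneity test rejects equal covariance matrices, the classification listings that follow use the quadratic discriminant function. The macro call is not reproduced in this transcript; a minimal sketch of a direct PROC DISCRIM call that would produce comparable output is given here, assuming the diabetic data set from Step 1. POOL=TEST with SLPOOL=0.10 matches the 0.1 level quoted in the note, and PRIORS PROPORTIONAL matches the priors shown in the Class Level Information table; the remaining options are assumptions chosen to mirror the listings.

```sas
/* Sketch only: parametric (normal-theory) discriminant analysis.
   POOL=TEST lets the homogeneity test decide between the pooled (linear)
   and within-group (quadratic) covariance matrices. */
proc discrim data=diabetic method=normal pool=test slpool=0.10
             simple canonical listerr crosslisterr crossvalidate;
   class group;
   var relwt glufast glutest instest sspg;
   priors proportional;   /* priors equal to the sample proportions */
run;
```

LISTERR and CROSSLISTERR restrict the observation listings to misclassified cases, which is the form the classification results take in this document.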
Eigenvalues of Inv(E)*H = CanRsq/(1-CanRsq)

Canonical    Adjusted Canonical  Approximate     Squared Canonical
Correlation  Correlation         Standard Error  Correlation        Eigenvalue  Difference  Proportion  Cumulative
0.909389     0.906426            0.014418        0.826988           4.7799      4.1355      0.8812      0.8812
0.625998     0.618252            0.050677        0.391874           0.6444                  0.1188      1.0000

Test of H0: The canonical correlations in the current row and all that follow are zero

     Likelihood Ratio  Approximate F Value  Num DF  Den DF  Pr > F
1    0.10521315        57.49                10      276
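The entries of the canonical correlation table are tied together by simple identities, which can be checked by hand from the reported squared canonical correlations. The symbols r_i, lambda_i, and Lambda below are introduced here for illustration and do not appear in the listing.

```latex
% Eigenvalues from squared canonical correlations
\[
\lambda_i = \frac{r_i^2}{1 - r_i^2}:
\qquad
\lambda_1 = \frac{0.826988}{0.173012} \approx 4.7799,
\qquad
\lambda_2 = \frac{0.391874}{0.608126} \approx 0.6444
\]
% Share of discrimination carried by the first canonical function
\[
\frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{4.7799}{5.4243} \approx 0.8812
\]
% Likelihood ratio (Wilks' lambda) for the first row of the test table
\[
\Lambda = \prod_{i=1}^{2} (1 - r_i^2) = 0.173012 \times 0.608126 \approx 0.10521
\]
```

The first canonical function therefore accounts for about 88% of the between-group discrimination, with the second contributing the remaining 12%.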
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.DIABETIC
Resubstitution Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into group

From group  1           2           3           Total
1           75 (98.68)  1 (1.32)    0 (0.00)    76 (100.00)
2           3 (8.33)    33 (91.67)  0 (0.00)    36 (100.00)
3           0 (0.00)    3 (9.09)    30 (90.91)  33 (100.00)
Total       78 (53.79)  37 (25.52)  30 (20.69)  145 (100.00)
Priors      0.52414     0.24828     0.22759

Error Count Estimates for group

        1       2       3       Total
Rate    0.0132  0.0833  0.0909  0.0483
Priors  0.5241  0.2483  0.2276
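The total error rate in these tables is the prior-weighted average of the per-group rates. A worked check for the resubstitution summary of the quadratic discriminant function follows; the symbols pi_g (prior) and e_g (group error rate) are notation introduced here.

```latex
\[
e_{\text{total}} = \sum_{g=1}^{3} \pi_g \, e_g
  = 0.5241 \times 0.0132 + 0.2483 \times 0.0833 + 0.2276 \times 0.0909
  \approx 0.0483
\]
% With priors proportional to group size this is the overall misclassified share
\[
e_{\text{total}} = \frac{1 + 3 + 3}{145} = \frac{7}{145} \approx 0.0483
\]
```

Because the priors equal the sample proportions, the total rate is simply the number of misclassified observations divided by 145.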
Parametric Discriminant Function analysis
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.DIABETIC
Cross-validation Results using Quadratic Discriminant Function

Posterior Probability of Membership in group

Obs  From group  Classified into group  1       2       3
26   1           2 *                    0.2237  0.5779  0.1984
63   1           2 *                    0.3550  0.6380  0.0070
64   1           2 *                    0.3708  0.6067  0.0225
75   1           2 *                    0.0000  0.9998  0.0002
79   2           1 *                    0.6305  0.3624  0.0071
83   2           1 *                    0.5895  0.4035  0.0070
95   2           3 *                    0.0000  0.4623  0.5377
96   2           1 *                    0.5005  0.4485  0.0510
107  2           3 *                    0.0000  0.1789  0.8211
110  2           1 *                    0.8438  0.1498  0.0063
111  2           3 *                    0.0000  0.0872  0.9128
131  3           2 *                    0.0000  1.0000  0.0000
134  3           2 *                    0.0000  0.9777  0.0223
136  3           2 *                    0.0000  0.9328  0.0672

* Misclassified observation
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.DIABETIC
Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into group

From group  1           2           3           Total
1           72 (94.74)  4 (5.26)    0 (0.00)    76 (100.00)
2           4 (11.11)   29 (80.56)  3 (8.33)    36 (100.00)
3           0 (0.00)    3 (9.09)    30 (90.91)  33 (100.00)
Total       76 (52.41)  36 (24.83)  33 (22.76)  145 (100.00)
Priors      0.52414     0.24828     0.22759

Error Count Estimates for group

        1       2       3       Total
Rate    0.0526  0.1944  0.0909  0.0966
Priors  0.5241  0.2483  0.2276
Step 6: Non-parametric discriminant function analysis
Perform non-parametric discriminant function analysis by running the SAS macro DISCRIM with YES in the Non-par field and the data exploration field left blank.
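The listings that follow report nearest-neighbour rules with 2 and 3 neighbours and a normal-kernel rule. A minimal sketch of comparable direct PROC DISCRIM calls is shown here, assuming the diabetic data set from Step 1. The macro's actual settings are not given in this transcript; in particular, the kernel radius R=0.5 is a hypothetical placeholder, not a value taken from the source.

```sas
/* Sketch only: k-nearest-neighbour rule using Mahalanobis distance from
   the pooled covariance matrix. Change K=2 to K=3 for the 3-neighbour rule. */
proc discrim data=diabetic method=npar k=2 pool=yes metric=full
             listerr crosslisterr crossvalidate;
   class group;
   var relwt glufast glutest instest sspg;
   priors proportional;
run;

/* Sketch only: normal-kernel density rule with within-group covariance
   matrices (POOL=NO). R=0.5 is a placeholder radius, not from the source. */
proc discrim data=diabetic method=npar kernel=normal r=0.5 pool=no
             listerr crosslisterr crossvalidate;
   class group;
   var relwt glufast glutest instest sspg;
   priors proportional;
run;
```

K= and R= are mutually exclusive in PROC DISCRIM, so the nearest-neighbour and kernel rules need separate calls.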
Non-parametric discriminant analysis: nearest-neighbour method, Mahalanobis distance based on pooled covariance
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.DIABETIC
Resubstitution Summary using 2 Nearest Neighbors

Number of Observations and Percent Classified into group

From group  1           2           3           Other       Total
1           72 (94.74)  0 (0.00)    0 (0.00)    4 (5.26)    76 (100.00)
2           0 (0.00)    29 (80.56)  0 (0.00)    7 (19.44)   36 (100.00)
3           0 (0.00)    0 (0.00)    28 (84.85)  5 (15.15)   33 (100.00)
Total       72 (49.66)  29 (20.00)  28 (19.31)  16 (11.03)  145 (100.00)
Priors      0.52414     0.24828     0.22759

Error Count Estimates for group

        1       2       3       Total
Rate    0.0526  0.1944  0.1515  0.1103
Priors  0.5241  0.2483  0.2276
Non-parametric discriminant analysis: nearest-neighbour method, Mahalanobis distance based on pooled covariance
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.DIABETIC
Cross-validation Results using 2 Nearest Neighbors

Posterior Probability of Membership in group

Obs  From group  Classified into group  1       2       3
64   1           2 *                    0.0000  1.0000  0.0000
124  3           2 *                    0.0000  1.0000  0.0000
131  3           2 *                    0.0000  1.0000  0.0000
134  3           2 *                    0.0000  1.0000  0.0000
136  3           2 *                    0.0000  1.0000  0.0000

* Misclassified observation

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.DIABETIC
Cross-validation Summary using 2 Nearest Neighbors

Number of Observations and Percent Classified into group

From group  1           2            3           Total
1           75 (98.68)  1 (1.32)     0 (0.00)    76 (100.00)
2           0 (0.00)    36 (100.00)  0 (0.00)    36 (100.00)
3           0 (0.00)    4 (12.12)    29 (87.88)  33 (100.00)
Total       75 (51.72)  41 (28.28)   29 (20.00)  145 (100.00)
Priors      0.52414     0.24828      0.22759

Error Count Estimates for group

        1       2       3       Total
Rate    0.0132  0.0000  0.1212  0.0345
Priors  0.5241  0.2483  0.2276
Non-parametric discriminant analysis: nearest-neighbour method, Mahalanobis distance based on pooled covariance
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.DIABETIC
Cross-validation Results using 3 Nearest Neighbors

Posterior Probability of Membership in group

Obs  From group  Classified into group  1       2       3
62   1           2 *                    0.3363  0.6637  0.0000
63   1           2 *                    0.3363  0.6637  0.0000
64   1           2 *                    0.0000  1.0000  0.0000
66   1           2 *                    0.3363  0.6637  0.0000
69   1           2 *                    0.3363  0.6637  0.0000
75   1           2 *                    0.3363  0.6637  0.0000
84   2           1 *                    0.6604  0.3396  0.0000
85   2           1 *                    0.6604  0.3396  0.0000
96   2           1 *                    0.6604  0.3396  0.0000
105  2           1 *                    0.6604  0.3396  0.0000
124  3           2 *                    0.0000  1.0000  0.0000
131  3           2 *                    0.0000  1.0000  0.0000
134  3           2 *                    0.0000  1.0000  0.0000
135  3           2 *                    0.0000  0.6598  0.3402
136  3           2 *                    0.0000  1.0000  0.0000

* Misclassified observation
Non-parametric discriminant analysis: nearest-neighbour method, Mahalanobis distance based on pooled covariance
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.DIABETIC
Cross-validation Summary using 3 Nearest Neighbors

Number of Observations and Percent Classified into group

From group  1           2           3           Total
1           70 (92.11)  6 (7.89)    0 (0.00)    76 (100.00)
2           4 (11.11)   32 (88.89)  0 (0.00)    36 (100.00)
3           0 (0.00)    5 (15.15)   28 (84.85)  33 (100.00)
Total       74 (51.03)  43 (29.66)  28 (19.31)  145 (100.00)
Priors      0.52414     0.24828     0.22759

Error Count Estimates for group

        1       2       3       Total
Rate    0.0789  0.1111  0.1515  0.1034
Priors  0.5241  0.2483  0.2276
Non-parametric discriminant analysis using kernel density estimates with unequal bandwidth
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.DIABETIC
Resubstitution Summary using Normal Kernel Density

Number of Observations and Percent Classified into group

From group  1            2            3            Total
1           76 (100.00)  0 (0.00)     0 (0.00)     76 (100.00)
2           0 (0.00)     36 (100.00)  0 (0.00)     36 (100.00)
3           0 (0.00)     0 (0.00)     33 (100.00)  33 (100.00)
Total       76 (52.41)   36 (24.83)   33 (22.76)   145 (100.00)
Priors      0.52414      0.24828      0.22759

Error Count Estimates for group

        1       2       3       Total
Rate    0.0000  0.0000  0.0000  0.0000
Priors  0.5241  0.2483  0.2276
Non-parametric discriminant analysis using kernel density estimates with unequal bandwidth
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.DIABETIC
Cross-validation Results using Normal Kernel Density

Posterior Probability of Membership in group

Obs  From group  Classified into group  1       2       3
26   1           3 *                    0.0286  0.2117  0.7597
62   1           2 *                    0.2529  0.7468  0.0003
63   1           2 *                    0.2854  0.7017  0.0129
70   1           2 *                    0.4493  0.5504  0.0003
75   1           2 *                    0.0000  0.6699  0.3301
76   1           2 *                    0.3220  0.6244  0.0536
79   2           1 *                    0.7210  0.2771  0.0019
96   2           1 *                    0.6601  0.3384  0.0015
104  2           3 *                    0.0000  0.0245  0.9755
107  2           3 *                    0.0000  0.0138  0.9862
111  2           3 *                    0.0000  0.0001  0.9999
112  2           3 *                    0.0000  0.0840  0.9160
131  3           2 *                    0.0000  0.9999  0.0001
134  3           2 *                    0.0000  0.9945  0.0055
136  3           2 *                    0.0000  0.9949  0.0051

* Misclassified observation
Non-parametric discriminant analysis using kernel density estimates with unequal bandwidth
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.DIABETIC
Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into group

From group  1           2           3           Total
1           70 (92.11)  5 (6.58)    1 (1.32)    76 (100.00)
2           2 (5.56)    30 (83.33)  4 (11.11)   36 (100.00)
3           0 (0.00)    3 (9.09)    30 (90.91)  33 (100.00)
Total       72 (49.66)  38 (26.21)  35 (24.14)  145 (100.00)
Priors      0.52414     0.24828     0.22759

Error Count Estimates for group

        1       2       3       Total
Rate    0.0789  0.1667  0.0909  0.1034
Priors  0.5241  0.2483  0.2276
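Step 7 asks for a comparison of the cross-validation error rates. One way to line them up is to key the four cross-validation "Error Count Estimates" tables from this document into a small data set and sort by total error. The sketch below does only that; the data set name cv_error and the method codes (QDA, KNN2, KNN3, KERNEL) are labels made up here for illustration.

```sas
/* Sketch only: cross-validation error rates copied from the listings.
   QDA    = quadratic discriminant function
   KNN2   = 2 nearest neighbours
   KNN3   = 3 nearest neighbours
   KERNEL = normal kernel density */
data cv_error;
   length method $ 8;
   input method $ rate1 rate2 rate3 total;
   label rate1 = 'Error rate, group 1'
         rate2 = 'Error rate, group 2'
         rate3 = 'Error rate, group 3'
         total = 'Total error rate';
   datalines;
QDA     0.0526 0.1944 0.0909 0.0966
KNN2    0.0132 0.0000 0.1212 0.0345
KNN3    0.0789 0.1111 0.1515 0.1034
KERNEL  0.0789 0.1667 0.0909 0.1034
;

/* Smallest total cross-validation error first */
proc sort data=cv_error;
   by total;
run;

proc print data=cv_error noobs label;
   title 'Cross-validation error rates by method';
run;
```

On these figures the 2-nearest-neighbour rule has the lowest cross-validation error, while the quadratic, 3-nearest-neighbour, and kernel rules all sit near 10%.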
Step 9: Conclusion
Exploratory scatter plots, stepwise discriminant analysis, checks for multivariate normality and multivariate outliers, and canonical, parametric, and non-parametric discriminant function analyses were performed using the DISCRIM macro (Fernandez, 2001). The macro was invoked with the parameters specified in the macro-call file discrim.sas.
References:
Fernandez, G.C.J. (2001). Discriminant analysis using SAS macro DISCRIM. In Free SAS STAT applications: IV Multivariate methods. http://www.ag.unr.edu/gf. Department of Applied Economics and Statistics, MS 204, UNR, Reno, NV 89557.
Suggested additional readings:
Statistics:
Sharma, Subhash (1996). Applied Multivariate Techniques. Wiley.
SAS publications:
SAS Institute Inc. (1988). Technical Report P-179, Additional SAS/STAT Procedures. Cary, NC: SAS Institute Inc.
SAS Institute Inc. (1990). SAS/GRAPH Software: Reference, Version 6, First Edition, Volumes 1 and 2. Cary, NC: SAS Institute Inc.
SAS Institute Inc. (1987). SAS Guide to Macro Processing, Version 6, First Edition. Cary, NC: SAS Institute Inc.
SAS Institute Inc. (1989). SAS/STAT User's Guide, Version 6, Fourth Edition, Volumes 1 and 2. Cary, NC: SAS Institute Inc.