Disc Rims As


  • 7/31/2019 Disc Rims As


    Discriminant analysis using SAS macro - 1

    Discriminant analysis

    Data: Diabetic data

    Step 1: Create the SAS data set.

    Step 2: Produce exploratory scatter plots by group by running the SAS macro DISCRIM and entering YES for data exploration. Also run stepwise discriminant analysis to select significant variables.

    Step 3: Check for multivariate normality and outliers by running the SAS macro DISCRIM and entering YES for the assumption check.

    Step 4: Run canonical and parametric discriminant function analyses by running the SAS macro DISCRIM and leaving the data exploration field blank.

    Step 5: Examine the canonical discriminant analysis results and the bi-plot display of discrimination.

    Step 6: If the multivariate normality assumption is severely violated, perform non-parametric discriminant function analysis by entering YES for the non-parametric option.

    Step 7: Check the classification results and compare the error rates under cross-validation.

    Step 8: If you have an independent validation data set, confirm the discrimination results using the validation data.

    Step 9: Summary and conclusions


    Creating SAS data set:

    If you are using the sample data, open the downloaded discrim-data.SAS file and submit it. If you are using other data, create a SAS data set of your own.

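    /* Numeric format for the three clinical groups. It is defined here but */
    /* not applied below, because group is read as a character variable.    */
    proc format;
    value gp 3='(3) Overt Diabetic ' 2='(2) Chem. Diabetic' 1='(1) Normal';
    run;

    /* One record per patient: id, five predictors, and the group code. */
    data diabetic;
    input patient relwt glufast glutest instest sspg group $;
    label relwt = 'Relative weight'
          glufast = 'Fasting Plasma Glucose'
          glutest = 'Test Plasma Glucose'
          instest = 'Plasma Insulin during Test'
          sspg = 'Steady State Plasma Glucose'
          group = 'Clinical Group';
    datalines;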

    1 0.81 80 356 124 55 1

    2 0.95 97 289 117 76 1

    3 0.94 105 319 143 105 1

    4 1.04 90 356 199 108 1

    5 1.00 90 323 240 143 1

    6 0.76 86 381 157 165 1

    7 0.91 100 350 221 119 1

    8 1.10 85 301 186 105 1

    9 0.99 97 379 142 98 1

    10 0.78 97 296 131 94 1

    11 0.90 91 353 221 53 1

    12 0.73 87 306 178 66 1

    13 0.96 78 290 136 142 1

    14 0.84 90 371 200 93 1

    15 0.74 86 312 208 68 1

    16 0.98 80 393 202 102 1

    17 1.10 90 364 152 76 1

    (complete data not shown)

    141 1.05 353 1428 41 480 3

    142 0.91 180 923 77 150 3

    143 0.90 213 1025 29 209 3

    144 1.11 328 1246 124 442 3

    145 0.74 346 1568 15 253 3

    ;

    PROC PRINT DATA=diabetic(obs=10) label;

    TITLE ' Discriminant analysis example';

    run;


    Step 2: Produce exploratory scatter plots by running the SAS macro DISCRIM and entering YES for data exploration.

    Open the macro-call window DISCRIM by running the downloaded macro-call file discrim.sas, enter the appropriate macro variable names, and submit.


    The Method for Selecting Variables is BACKWARD

    Observations    145    Variable(s) in the Analysis     5
    Class Levels      3    Variable(s) will be Included    0
                           Significance Level to Stay      0.15

    Statistics for Removal, DF = 2, 138

    Variable   Label                    Partial R-Square   F Value   Pr > F
    relwt      Relative weight          0.0749             5.58      0.0047
    glufast    Fasting Plasma Glucose   0.2781             26.58
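    The macro's internals are not shown in this transcript. As a rough sketch of the same variable-selection step done directly, the call below uses PROC STEPDISC with backward elimination and a stay level of 0.15, the settings reported in the output above. The data set and variable names come from Step 1; whether the macro issues exactly this call is an assumption.

```sas
/* Sketch only: a direct PROC STEPDISC call. The DISCRIM macro may set other options. */
proc stepdisc data=diabetic method=backward slstay=0.15;
   class group;                              /* three clinical groups */
   var relwt glufast glutest instest sspg;   /* five candidate predictors */
run;
```

    Changing METHOD= to STEPWISE or FORWARD (with SLENTRY= for the entry level) would give the other two kinds of selection summary.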


    Stepwise Selection Summary

    Number In   Entered   Removed   Label                 Partial R-Square   F Value   Pr > F   Wilks' Lambda   Pr   ASCC
    1           glutest             Test Plasma Glucose   0.7738             242.84


    Forward Selection Summary

    Step   Number In   Entered   Label                 Partial R-Square   F Value   Pr > F   Wilks' Lambda   Pr   ASCC
    1      1           glutest   Test Plasma Glucose   0.7738             242.84


    Selected Scatter Plots:


    Steps 3 and 4: Perform the canonical and parametric discriminant function analysis by running the SAS macro DISCRIM and keeping the data exploration field blank. Also check for multivariate normality and influential outliers.

    Open the macro-call window DISCRIM by running the downloaded macro-call file discrim.sas, enter the appropriate macro variable names, and submit.


    Clinical Group=1

    Obs   s1         s2         s3          s4        s5         r1         r2         r3         r4        r5
    1     0.11252    0.029294   -0.099238   1.71461   0.95699    -0.75959   0.14815    -0.71406   5.43254   0.55298

    Clinical Group=2

    Obs   s1         s2         s3          s4        s5         r1         r2         r3         r4        r5
    2     -0.34755   -0.23682   0.70585     1.24705   -0.65609   -0.72631   -0.39443   0.037414   1.16893   -0.35500

    Clinical Group=3

    Obs   s1         s2         s3          s4        s5         r1         r2         r3         r4        r5
    3     0.021434   0.37220    0.13407     2.11185   -0.11574   -0.88786   -1.25087   -1.16310   5.82761   -0.44997

    Multivariate normality test statistics

    Statistic   Value     Description
    M_SKEW      8.433     Multivariate skewness
    CHI_SKEW    203.798   Skewness chi-square
    PVALSKEW    0.000     Skewness p-value
    M_KURT      52.853    Multivariate kurtosis
    Z_KURT      12.847    Kurtosis z-value
    PVALKURT    0.000     Kurtosis p-value
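    The chi-square and z statistics above are consistent with Mardia's large-sample tests for n = 145 observations and p = 5 variables. A minimal sketch that recomputes them from the reported skewness and kurtosis follows. The formulas are the standard Mardia ones; that the macro uses exactly these is an assumption.

```sas
/* Sketch: recompute Mardia's test statistics from the reported values. */
data _null_;
   n = 145;  p = 5;              /* sample size and number of variables */
   m_skew = 8.433;               /* multivariate skewness (from the output) */
   m_kurt = 52.853;              /* multivariate kurtosis (from the output) */

   chi_skew = n * m_skew / 6;                /* about 203.80 */
   df_skew  = p * (p + 1) * (p + 2) / 6;     /* 35 degrees of freedom */
   pvalskew = 1 - probchi(chi_skew, df_skew);

   z_kurt   = (m_kurt - p * (p + 2)) / sqrt(8 * p * (p + 2) / n);   /* about 12.85 */
   pvalkurt = 2 * (1 - probnorm(abs(z_kurt)));

   put chi_skew= 8.3 pvalskew= 6.4 z_kurt= 8.3 pvalkurt= 6.4;
run;
```

    Both p-values are effectively zero, which is the situation Step 6 addresses with the non-parametric analysis.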


    id rdsq chisq diff

    86 46.5157 17.6296 28.8860

    144 30.2647 15.0041 15.2606

    131 25.7032 13.7552 11.9480

    141 24.0769 12.9202 11.1567

    145 23.2948 12.2891 11.0057

    139 22.3210 11.7799 10.5411

    134 19.3147 11.3522 7.9625

    116 16.3652 10.9827 5.3825

    93 15.8799 10.6570 5.2229

    82 14.6057 10.3655 4.2402

    133 13.9789 10.1014 3.8775

    137 13.6007 9.8597 3.7410

    69 12.9646 9.6367 3.3279

    89 12.1243 9.2364 2.8880

    113 12.2535 9.4297 2.8238

    136 11.5431 8.8837 2.6593

    124 11.3769 8.7218 2.6552

    99 11.2212 8.5680 2.6532

    114 11.6813 9.0548 2.6264
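    In every row listed, diff equals rdsq minus chisq (for example, 46.5157 - 17.6296 = 28.886 for observation 86, up to rounding). Reading rdsq as a robust squared distance and chisq as the matching chi-square quantile is an assumption based on the column names. A small sketch that recomputes the gap for the first three rows:

```sas
/* Sketch: recompute diff = rdsq - chisq for the first rows of the listing. */
data outlier_check;
   input id rdsq chisq;
   diff = rdsq - chisq;   /* assumed reading: large positive gaps flag candidate outliers */
   datalines;
86 46.5157 17.6296
144 30.2647 15.0041
131 25.7032 13.7552
;

proc print data=outlier_check noobs;
run;
```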


    II. Canonical discriminant analysis

    Observations   145    DF Total             144
    Variables        5    DF Within Classes    142
    Classes          3    DF Between Classes     2

    Class Level Information

    group   Variable Name   Frequency   Weight    Proportion   Prior Probability
    1       _1              76          76.0000   0.524138     0.524138
    2       _2              36          36.0000   0.248276     0.248276
    3       _3              33          33.0000   0.227586     0.227586

    Total-Sample

    Variable   Label                         N     Sum         Mean        Variance   Standard Deviation
    relwt      Relative weight               145   141.71000   0.97731     0.01670    0.1292
    glufast    Fasting Plasma Glucose        145   17688       121.98621   4087       63.9304
    glutest    Test Plasma Glucose           145   78824       543.61379   100458     316.9509
    instest    Plasma Insulin during Test    145   26987       186.11724   14625      120.9352
    sspg       Steady State Plasma Glucose   145   26710       184.20690   11242      106.0299


    group = 1

    Variable   Label                         N    Sum        Mean        Variance   Standard Deviation
    relwt      Relative weight               76   71.23000   0.93724     0.01652    0.1285
    glufast    Fasting Plasma Glucose        76   6930       91.18421    67.69895   8.2279
    glutest    Test Plasma Glucose           76   26598      349.97368   1359       36.8706
    instest    Plasma Insulin during Test    76   13121      172.64474   4741       68.8538
    sspg       Steady State Plasma Glucose   76   8664       114.00000   3310       57.5328

    group = 2

    Variable   Label                         N    Sum        Mean        Variance   Standard Deviation
    relwt      Relative weight               36   38.01000   1.05583     0.01021    0.1010
    glufast    Fasting Plasma Glucose        36   3575       99.30556    90.04683   9.4893
    glutest    Test Plasma Glucose           36   17782      493.94444   3070       55.4117
    instest    Plasma Insulin during Test    36   10368      288.00000   24911      157.8317
    sspg       Steady State Plasma Glucose   36   7523       208.97222   3595       59.9593

    group = 3

    Variable   Label                         N    Sum        Mean        Variance   Standard Deviation
    relwt      Relative weight               33   32.47000   0.98394     0.01447    0.1203
    glufast    Fasting Plasma Glucose        33   7183       217.66667   5862       76.5632
    glutest    Test Plasma Glucose           33   34444      1044        95725      309.3953
    instest    Plasma Insulin during Test    33   3498       106.00000   8728       93.4251
    sspg       Steady State Plasma Glucose   33   10523      318.87879   7801       88.3221


    The DISCRIM Procedure

    Test of Homogeneity of Within Covariance Matrices

    Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function. Reference: Morrison, D.F. (1976) Multivariate Statistical Methods, p. 252.

    Chi-Square   DF   Pr > ChiSq
    396.799635   30
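    The note above is what PROC DISCRIM prints when it is asked to test homogeneity before choosing between pooled and within-group covariance matrices. A hedged sketch of such a call, using the Step 1 data set, is shown below. The exact option list used by the macro is not given in this transcript, so this combination is an assumption.

```sas
/* Sketch only: parametric discriminant analysis with a homogeneity test. */
proc discrim data=diabetic
             method=normal     /* parametric (normal-theory) rule */
             pool=test         /* test decides pooled vs within-group covariance */
             slpool=0.10       /* significance level for that test */
             canonical         /* canonical discriminant analysis */
             crossvalidate     /* leave-one-out classification summary */
             crosslisterr;     /* list only misclassified observations */
   class group;
   var relwt glufast glutest instest sspg;
   priors proportional;        /* priors equal to the group proportions */
run;
```

    With a significant test, the procedure uses the within-group covariance matrices, which corresponds to the quadratic discriminant function named in the classification tables.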


    Eigenvalues of Inv(E)*H = CanRsq/(1-CanRsq)

    Canonical     Adjusted Canonical   Approximate      Squared Canonical
    Correlation   Correlation          Standard Error   Correlation         Eigenvalue   Difference   Proportion   Cumulative
    0.909389      0.906426             0.014418         0.826988            4.7799       4.1355       0.8812       0.8812
    0.625998      0.618252             0.050677         0.391874            0.6444                    0.1188       1.0000

    Test of H0: The canonical correlations in the current row and all that follow are zero

        Likelihood Ratio   Approximate F Value   Num DF   Den DF   Pr > F
    1   0.10521315         57.49                 10       276
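    As a check on the tables above, each eigenvalue follows from its squared canonical correlation, and the likelihood ratio in the first row is the product of the terms 1 - CanRsq. The symbols r_i^2 (squared canonical correlation), \lambda_i (eigenvalue), and \Lambda (likelihood ratio) are notation introduced here for illustration only.

```latex
\begin{aligned}
\lambda_1 &= \frac{r_1^2}{1-r_1^2} = \frac{0.826988}{0.173012} \approx 4.7799, \\
\lambda_2 &= \frac{r_2^2}{1-r_2^2} = \frac{0.391874}{0.608126} \approx 0.6444, \\
\text{Proportion}_1 &= \frac{\lambda_1}{\lambda_1+\lambda_2} = \frac{4.7799}{5.4243} \approx 0.8812, \\
\Lambda &= (1-r_1^2)(1-r_2^2) = 0.173012 \times 0.608126 \approx 0.1052 .
\end{aligned}
```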


    Parametric Discriminant Function analysis

    The DISCRIM Procedure

    Classification Summary for Calibration Data: WORK.DIABETIC

    Resubstitution Summary using Quadratic Discriminant Function

    Number of Observations and Percent Classified into group

    From group   1            2            3            Total
    1            75 (98.68)   1 (1.32)     0 (0.00)     76 (100.00)
    2            3 (8.33)     33 (91.67)   0 (0.00)     36 (100.00)
    3            0 (0.00)     3 (9.09)     30 (90.91)   33 (100.00)
    Total        78 (53.79)   37 (25.52)   30 (20.69)   145 (100.00)
    Priors       0.52414      0.24828      0.22759

    Error Count Estimates for group

             1        2        3        Total
    Rate     0.0132   0.0833   0.0909   0.0483
    Priors   0.5241   0.2483   0.2276
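    The Total entry in the error-count table is consistent with a prior-weighted sum of the per-group rates. A minimal sketch of that arithmetic, using the rounded values printed in the table (the variable names are illustrative):

```sas
/* Sketch: total error rate as the prior-weighted sum of per-group rates. */
data _null_;
   array rate[3]  _temporary_ (0.0132 0.0833 0.0909);   /* resubstitution rates */
   array prior[3] _temporary_ (0.5241 0.2483 0.2276);   /* proportional priors */
   total = 0;
   do k = 1 to 3;
      total + rate[k] * prior[k];
   end;
   put total= 6.4;   /* prints total=0.0483 */
run;
```

    The same calculation with the cross-validation rates (0.0526, 0.1944, 0.0909) gives about 0.0965, which agrees with the reported 0.0966 up to rounding of the inputs.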


    The DISCRIM Procedure

    Classification Results for Calibration Data: WORK.DIABETIC

    Cross-validation Results using Quadratic Discriminant Function

    Posterior Probability of Membership in group

    Obs   From group   Classified into group   1        2        3
    26    1            2 *                     0.2237   0.5779   0.1984
    63    1            2 *                     0.3550   0.6380   0.0070
    64    1            2 *                     0.3708   0.6067   0.0225
    75    1            2 *                     0.0000   0.9998   0.0002
    79    2            1 *                     0.6305   0.3624   0.0071
    83    2            1 *                     0.5895   0.4035   0.0070
    95    2            3 *                     0.0000   0.4623   0.5377
    96    2            1 *                     0.5005   0.4485   0.0510
    107   2            3 *                     0.0000   0.1789   0.8211
    110   2            1 *                     0.8438   0.1498   0.0063
    111   2            3 *                     0.0000   0.0872   0.9128
    131   3            2 *                     0.0000   1.0000   0.0000
    134   3            2 *                     0.0000   0.9777   0.0223
    136   3            2 *                     0.0000   0.9328   0.0672

    * Misclassified observation


    The DISCRIM Procedure

    Classification Summary for Calibration Data: WORK.DIABETIC

    Cross-validation Summary using Quadratic Discriminant Function

    Number of Observations and Percent Classified into group

    From group   1            2            3            Total
    1            72 (94.74)   4 (5.26)     0 (0.00)     76 (100.00)
    2            4 (11.11)    29 (80.56)   3 (8.33)     36 (100.00)
    3            0 (0.00)     3 (9.09)     30 (90.91)   33 (100.00)
    Total        76 (52.41)   36 (24.83)   33 (22.76)   145 (100.00)
    Priors       0.52414      0.24828      0.22759

    Error Count Estimates for group

             1        2        3        Total
    Rate     0.0526   0.1944   0.0909   0.0966
    Priors   0.5241   0.2483   0.2276


    Non-parametric Discriminant Function analysis

    Perform non-parametric discriminant function analysis by running the SAS macro DISCRIM, entering YES in the Non-par field, and keeping the data exploration field blank.
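    The output titles that follow mention a nearest-neighbour rule with Mahalanobis distance from the pooled covariance, and a normal-kernel rule with unequal bandwidth. The two PROC DISCRIM calls below are a sketch of how such analyses could be requested directly. The smoothing value R=0.5 is a placeholder not taken from the source, and whether the macro uses these exact options is an assumption.

```sas
/* Sketch only: k-nearest-neighbor rule, Mahalanobis distance from the pooled covariance. */
proc discrim data=diabetic method=npar k=2
             metric=full pool=yes
             crossvalidate crosslisterr;
   class group;
   var relwt glufast glutest instest sspg;
   priors proportional;
run;

/* Sketch only: normal-kernel rule with within-group covariance matrices. */
/* R=0.5 is a hypothetical smoothing value, not taken from the source.    */
proc discrim data=diabetic method=npar kernel=normal r=0.5
             pool=no
             crossvalidate crosslisterr;
   class group;
   var relwt glufast glutest instest sspg;
   priors proportional;
run;
```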


    Non-parametric Discriminant analysis: Nearest-neighbour method, Mahalanobis distance based on pooled covariance

    The DISCRIM Procedure

    Classification Summary for Calibration Data: WORK.DIABETIC

    Resubstitution Summary using 2 Nearest Neighbors

    Number of Observations and Percent Classified into group

    From group   1            2            3            Other        Total
    1            72 (94.74)   0 (0.00)     0 (0.00)     4 (5.26)     76 (100.00)
    2            0 (0.00)     29 (80.56)   0 (0.00)     7 (19.44)    36 (100.00)
    3            0 (0.00)     0 (0.00)     28 (84.85)   5 (15.15)    33 (100.00)
    Total        72 (49.66)   29 (20.00)   28 (19.31)   16 (11.03)   145 (100.00)
    Priors       0.52414      0.24828      0.22759

    Error Count Estimates for group

             1        2        3        Total
    Rate     0.0526   0.1944   0.1515   0.1103
    Priors   0.5241   0.2483   0.2276


    Non-parametric Discriminant analysis: Nearest-neighbour method, Mahalanobis distance based on pooled covariance

    The DISCRIM Procedure

    Classification Results for Calibration Data: WORK.DIABETIC

    Cross-validation Results using 2 Nearest Neighbors

    Posterior Probability of Membership in group

    Obs   From group   Classified into group   1        2        3
    64    1            2 *                     0.0000   1.0000   0.0000
    124   3            2 *                     0.0000   1.0000   0.0000
    131   3            2 *                     0.0000   1.0000   0.0000
    134   3            2 *                     0.0000   1.0000   0.0000
    136   3            2 *                     0.0000   1.0000   0.0000

    * Misclassified observation

    The DISCRIM Procedure

    Classification Summary for Calibration Data: WORK.DIABETIC

    Cross-validation Summary using 2 Nearest Neighbors

    Number of Observations and Percent Classified into group

    From group   1            2             3            Total
    1            75 (98.68)   1 (1.32)      0 (0.00)     76 (100.00)
    2            0 (0.00)     36 (100.00)   0 (0.00)     36 (100.00)
    3            0 (0.00)     4 (12.12)     29 (87.88)   33 (100.00)
    Total        75 (51.72)   41 (28.28)    29 (20.00)   145 (100.00)
    Priors       0.52414      0.24828       0.22759

    Error Count Estimates for group

             1        2        3        Total
    Rate     0.0132   0.0000   0.1212   0.0345
    Priors   0.5241   0.2483   0.2276


    Non-parametric Discriminant analysis: Nearest-neighbour method, Mahalanobis distance based on pooled covariance

    The DISCRIM Procedure

    Classification Results for Calibration Data: WORK.DIABETIC

    Cross-validation Results using 3 Nearest Neighbors

    Posterior Probability of Membership in group

    Obs   From group   Classified into group   1        2        3
    62    1            2 *                     0.3363   0.6637   0.0000
    63    1            2 *                     0.3363   0.6637   0.0000
    64    1            2 *                     0.0000   1.0000   0.0000
    66    1            2 *                     0.3363   0.6637   0.0000
    69    1            2 *                     0.3363   0.6637   0.0000
    75    1            2 *                     0.3363   0.6637   0.0000
    84    2            1 *                     0.6604   0.3396   0.0000
    85    2            1 *                     0.6604   0.3396   0.0000
    96    2            1 *                     0.6604   0.3396   0.0000
    105   2            1 *                     0.6604   0.3396   0.0000
    124   3            2 *                     0.0000   1.0000   0.0000
    131   3            2 *                     0.0000   1.0000   0.0000
    134   3            2 *                     0.0000   1.0000   0.0000
    135   3            2 *                     0.0000   0.6598   0.3402
    136   3            2 *                     0.0000   1.0000   0.0000

    * Misclassified observation


    Non-parametric Discriminant analysis: Nearest-neighbour method, Mahalanobis distance based on pooled covariance

    The DISCRIM Procedure

    Classification Summary for Calibration Data: WORK.DIABETIC

    Cross-validation Summary using 3 Nearest Neighbors

    Number of Observations and Percent Classified into group

    From group   1            2            3            Total
    1            70 (92.11)   6 (7.89)     0 (0.00)     76 (100.00)
    2            4 (11.11)    32 (88.89)   0 (0.00)     36 (100.00)
    3            0 (0.00)     5 (15.15)    28 (84.85)   33 (100.00)
    Total        74 (51.03)   43 (29.66)   28 (19.31)   145 (100.00)
    Priors       0.52414      0.24828      0.22759

    Error Count Estimates for group

             1        2        3        Total
    Rate     0.0789   0.1111   0.1515   0.1034
    Priors   0.5241   0.2483   0.2276


    Non-parametric Discriminant analysis: Using kernel density estimates with unequal bandwidth

    The DISCRIM Procedure

    Classification Summary for Calibration Data: WORK.DIABETIC

    Resubstitution Summary using Normal Kernel Density

    Number of Observations and Percent Classified into group

    From group   1             2             3             Total
    1            76 (100.00)   0 (0.00)      0 (0.00)      76 (100.00)
    2            0 (0.00)      36 (100.00)   0 (0.00)      36 (100.00)
    3            0 (0.00)      0 (0.00)      33 (100.00)   33 (100.00)
    Total        76 (52.41)    36 (24.83)    33 (22.76)    145 (100.00)
    Priors       0.52414       0.24828       0.22759

    Error Count Estimates for group

             1        2        3        Total
    Rate     0.0000   0.0000   0.0000   0.0000
    Priors   0.5241   0.2483   0.2276


    Non-parametric Discriminant analysis: Using kernel density estimates with unequal bandwidth

    The DISCRIM Procedure

    Classification Results for Calibration Data: WORK.DIABETIC

    Cross-validation Results using Normal Kernel Density

    Posterior Probability of Membership in group

    Obs From group Classified into group 1 2 3

    26 1 3 * 0.0286 0.2117 0.7597

    62 1 2 * 0.2529 0.7468 0.0003

    63 1 2 * 0.2854 0.7017 0.0129

    70 1 2 * 0.4493 0.5504 0.0003

    75 1 2 * 0.0000 0.6699 0.3301

    76 1 2 * 0.3220 0.6244 0.0536

    79 2 1 * 0.7210 0.2771 0.0019

    96 2 1 * 0.6601 0.3384 0.0015

    104 2 3 * 0.0000 0.0245 0.9755

    107 2 3 * 0.0000 0.0138 0.9862

    111 2 3 * 0.0000 0.0001 0.9999

    112 2 3 * 0.0000 0.0840 0.9160

    131 3 2 * 0.0000 0.9999 0.0001

    134 3 2 * 0.0000 0.9945 0.0055

    136 3 2 * 0.0000 0.9949 0.0051

    * Misclassified observation


    Non-parametric Discriminant analysis: Using kernel density estimates with unequal bandwidth

    The DISCRIM Procedure

    Classification Summary for Calibration Data: WORK.DIABETIC

    Cross-validation Summary using Normal Kernel Density

    Number of Observations and Percent Classified into group

    From group   1            2            3            Total
    1            70 (92.11)   5 (6.58)     1 (1.32)     76 (100.00)
    2            2 (5.56)     30 (83.33)   4 (11.11)    36 (100.00)
    3            0 (0.00)     3 (9.09)     30 (90.91)   33 (100.00)
    Total        72 (49.66)   38 (26.21)   35 (24.14)   145 (100.00)
    Priors       0.52414      0.24828      0.22759

    Error Count Estimates for group

             1        2        3        Total
    Rate     0.0789   0.1667   0.0909   0.1034
    Priors   0.5241   0.2483   0.2276


    Conclusion:

    Exploratory scatter plots, stepwise discriminant analysis, checks for multivariate normality and multivariate outliers, and canonical, parametric, and non-parametric discriminant function analyses were performed using the discrim macro, invoked with the parameters specified in the macro-call file discrim.sas (Fernandez, 2001).

    References:

    Fernandez, G.C.J. 2001. Discriminant analysis using SAS macro DISCRIM. In Free SAS STAT applications: IV Multivariate methods. http://www.ag.unr.edu/gf Department of Applied Economics and Statistics, MS 204, UNR, Reno, NV 89557.

    Suggested additional readings:

    Statistics:

    Subhash Sharma. Applied Multivariate Techniques. Wiley, 1996.

    SAS publications:

    SAS Institute Inc., Technical Report P-179, Additional SAS/STAT Procedures. Cary, NC: SAS Institute Inc., 1988.

    SAS Institute Inc., SAS/GRAPH Software: Reference, Version 6, First Edition, Volumes 1 and 2. Cary, NC: SAS Institute Inc., 1990.

    SAS Institute Inc., SAS Guide to Macro Processing, Version 6, First Edition. Cary, NC: SAS Institute Inc., 1987.

    SAS Institute Inc., SAS/STAT User's Guide, Version 6, Fourth Edition, Volumes 1 and 2. Cary, NC: SAS Institute Inc., 1989.
