Transcript of a handout from Cleveland State University (academic.csuohio.edu/kneuendorf/c53108/hand29.pdf).
Neuendorf
Discriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear combinations of the IVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1 (where c = # of categories on the DV) or k (where k = # of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure; we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests the assumption of homogeneity of variances/covariances of the DV groups. Based on the determinants of the group variance/covariance matrices, Box's M uses an F transformation. A significant F indicates substantial group differences, showing heterogeneity of variances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split-halves test, where a portion of the cases are randomly assigned to an analysis sample for purposes of deriving the discriminant function(s), and then the function(s) are validated by assessing their performance with the remaining cases in the hold-out sample.
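The random assignment described above can be sketched in a few lines. This is an illustrative helper, not part of the handout; the 50/50 fraction and the seed are arbitrary choices.

```python
import random

def split_holdout(cases, analysis_frac=0.5, seed=42):
    """Randomly split cases into an analysis sample (used to derive the
    discriminant functions) and a hold-out sample (used to validate them).
    The 50/50 fraction and the seed are illustrative, not prescribed."""
    rng = random.Random(seed)
    shuffled = cases[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * analysis_frac)
    return shuffled[:cut], shuffled[cut:]

# e.g., split 100 case IDs in half
analysis, holdout = split_holdout(list(range(100)))
```

The functions derived on `analysis` would then be scored against the cases in `holdout` to check how well classification holds up out of sample.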
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate the relative contribution of each IV to each DF (discriminant function) (in the Standardized Canonical Discriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . .   DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . .   etc.
2. Structure coefficients/discriminant "loadings" (in SPSS's Structure Matrix)--simple r's between each IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminant coefficients are partials and these loadings are not. [NOTE: The term "loading" may have a slightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in the Canonical Discriminant Function Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, in order to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . .   DF2 = a2 + b4X1 + b5X2 + b6X3 . . .   etc.
Strangely, Hair et al. call the calculated DF1, DF2, etc., scores "Discriminant Z scores," which seems to invite confusion with simple standardized scores (z-scores).
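Computing a discriminant score for one case is just the constant plus the weighted sum of that case's IV values. The coefficients and case values below are hypothetical, for illustration only.

```python
def discriminant_score(constant, coefs, x):
    """Unstandardized DF score for one case: a + sum(b_i * X_i)."""
    return constant + sum(b * xi for b, xi in zip(coefs, x))

# Hypothetical unstandardized coefficients and a hypothetical case
# (not taken from the handout):
a1 = -2.0
b1 = [0.5, 1.2, -0.3]
case = [4.0, 1.0, 2.0]

df1 = discriminant_score(a1, b1, case)  # -2 + 2.0 + 1.2 - 0.6 = 0.6
```

These per-case scores are what the classification analysis (item on the classification matrix, below) operates on.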
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factor analysis). As Klecka says, "they cannot be interpreted directly." Each eigenvalue is a relative measure of how much of the total discriminating power a DF has. Examining the eigenvalues tells us the relative strength of each DF. For example, from Klecka:
DF   Eigenvalue   Relative %
1       9.66        85.5
2       1.58        14.0
3        .05          .5
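The Relative % column is simply each eigenvalue's share of the eigenvalue total; a quick check on Klecka's figures:

```python
eigens = [9.66, 1.58, 0.05]  # eigenvalues from Klecka's example

total = sum(eigens)  # 11.29
relative_pct = [100 * e / total for e in eigens]
# approximately [85.6, 14.0, 0.4]
```

The small discrepancies from the published 85.5 / 14.0 / 0.5 arise because the printed eigenvalues are themselves rounded.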
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is a multivariate measure of group differences over several IVs. Rather than testing a DF itself, lambda examines the residual discrimination in the system prior to deriving that function (Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among the groups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambda typically starts small and gets bigger.
Range:   Λ = 1   no discrimination among groups
         Λ = 0   great discrimination among groups
One formula for lambda:

Λ = Π (from i = k+1 to q) [ 1 / (1 + eigen_i) ]

where Π is like Σ, only with multiplication instead of addition, and q = # of DFs total, k = # of DFs derived at that point
So, from the example in #4 above:

Λ = [1/(1+9.66)] × [1/(1+1.58)] × [1/(1+.05)] = .035   for NO DFs DERIVED YET
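The product formula can be checked numerically. A small sketch using the example eigenvalues, computing the residual lambda after 0, 1, and 2 DFs have been derived:

```python
from math import prod  # Python 3.8+

eigens = [9.66, 1.58, 0.05]  # eigenvalues from the example in #4

def wilks_lambda(eigens, k=0):
    """Residual Wilks' lambda after the first k DFs have been derived:
    the product of 1/(1 + eigen_i) over the remaining functions."""
    return prod(1.0 / (1.0 + e) for e in eigens[k:])

lam0 = wilks_lambda(eigens, 0)  # ~.035  (no DFs derived yet)
lam1 = wilks_lambda(eigens, 1)  # ~.369
lam2 = wilks_lambda(eigens, 2)  # ~.952
```

These residual lambdas are the values that appear in the SPSS "Test of Function(s)" table shown below (again, small rounding differences trace back to the rounded eigenvalues).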
Thus, with Λ = .035, there's a lot of discrimination left to be captured by deriving DF(s). If we continue, we find:
DFs derived   "Test of Function(s)"   Wilks' lambda (Λ)   Chi-square (χ²)   df   Sig. (p)
0             1 through 3                  .035               43.76         18     .001
1             2 through 3                  .368               13.00         10     .224
2             3                            .949                0.68          4     .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled "Test of Function(s)." Interpret this as "Test of the significance of aggregate group differences for the pool of IVs prior to deriving this/these DF(s)."
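The chi-square values in such tables are commonly obtained from the residual lambda via Bartlett's approximation, χ² = -[n - 1 - (k + c)/2] · ln(Λ), with df = (k - j)(c - 1 - j) for j DFs already derived. The df column above (18, 10, 4) is consistent with k = 6 IVs and c = 4 groups; the sample size n = 19 used below is a hypothetical value chosen because it reproduces the table closely, not a figure stated in the handout.

```python
from math import log, prod

eigens = [9.66, 1.58, 0.05]
n, k, c = 19, 6, 4  # n is hypothetical; k and c inferred from the df column

results = []
for j in range(3):  # j = number of DFs already derived
    lam = prod(1.0 / (1.0 + e) for e in eigens[j:])
    chi2 = -(n - 1 - (k + c) / 2.0) * log(lam)  # Bartlett's approximation
    df = (k - j) * (c - 1 - j)
    results.append((chi2, df))
# results is approximately [(43.7, 18), (13.0, 10), (0.63, 4)]
```

Each chi-square tests whether any significant discrimination remains before deriving the next function, matching the "1 through 3", "2 through 3", "3" rows.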
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DF has a CC with the DV (treated here as a collection of c-1 dummies). CC² = coefficient of determination (shared variance), as always. Here, the shared variance is between the individual DF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central to discriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ on the function(s) that have been derived for that very purpose. We can look at these centroids graphically in SPSS's territorial maps, which plot the centroids in the first two dimensions, i.e., the first two DFs.
8. The territorial map--the optimal cutting scores are shown visually for 2 DFs at a time in a territorial map. With this SPSS output component, you can plot the position of any given individual case for the 2 DFs, and see which group that individual is predicted to be in.
9. Classification matrix (found in SPSS's "Classification Results")--a chart showing predicted group membership against actual group membership. We hope for large values on the diagonal, and small values on the off-diagonal. We also hope for a high "percent. . . correctly classified." The pattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neither is provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
10. Tau--very much like a special form of a χ², it tests whether a given classification analysis improves one's prediction to groups over chance.
Tau = (ncor - Σ pi ni) / (n - Σ pi ni)

where:
ncor = # of cases correctly classified
n = # of cases
pi = chance probability of membership in each group (e.g., .25 for each of 4 groups)
ni = # of cases in that group
i = each group
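The tau formula above translates directly into code. The 100-case, 4-group example below is hypothetical, chosen so the arithmetic is easy to follow.

```python
def tau(ncor, group_sizes, priors=None):
    """Proportional improvement in classification over chance:
    tau = (ncor - sum(p_i * n_i)) / (n - sum(p_i * n_i)).
    priors default to equal chance probabilities across groups."""
    n = sum(group_sizes)
    g = len(group_sizes)
    if priors is None:
        priors = [1.0 / g] * g
    expected = sum(p * ni for p, ni in zip(priors, group_sizes))
    return (ncor - expected) / (n - expected)

# Hypothetical example: 100 cases in 4 equal groups, 70 classified correctly.
# Chance would classify sum(.25 * 25) * 4 = 25 correctly, so
# tau = (70 - 25) / (100 - 25) = 0.6
t = tau(70, [25, 25, 25, 25])
```

A tau of .6 would be read as 60% fewer classification errors than expected by chance.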
This test for "classification errors" is interpreted as the proportion fewer errors obtained by the classification analysis than what would be expected by chance (see Klecka p. 51 for more info.)
11. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using a chi-square table with 1 degree of freedom, one can actually get a significance test for the difference from chance.
Press's Q = [N - (nK)]² / [N(K - 1)]

where
N = total sample size
n = number of observations correctly classified
K = number of groups
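Press's Q is likewise a one-liner; the same hypothetical 100-case, 4-group, 70-correct example is used again for continuity.

```python
def press_q(N, n_correct, K):
    """Press's Q = [N - (n*K)]^2 / [N(K - 1)]; compare the result to a
    chi-square critical value with 1 degree of freedom."""
    return (N - n_correct * K) ** 2 / (N * (K - 1))

# Hypothetical example: 100 cases, 4 groups, 70 classified correctly:
# (100 - 280)^2 / (100 * 3) = 32400 / 300 = 108.0
q = press_q(100, 70, 4)
```

Since 108.0 far exceeds the χ²(1) critical value of 3.84 at α = .05, this hypothetical classification would be judged significantly better than chance.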
12. Fisher's linear discriminant functions (i.e., classification functions)--not to be confused with the DFs. These are contained in the "Classification Function Coefficients" table in SPSS, and provide a handy-dandy method of placing a "new" case in its predicted DV group without running data through SPSS. A new case's values for the IVs may be inserted in the functions and a score is calculated for each function for that case. The case is then classified into the group for which it has the highest classification score. Practical, rather than informative of relationships among variables.
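The "score each function, pick the highest" procedure can be sketched as follows. The two-group coefficients and the new case are hypothetical, standing in for values one would copy out of the SPSS Classification Function Coefficients table.

```python
def classify(case, class_functions):
    """Score a case on each Fisher classification function (a constant plus
    a coefficient-weighted sum of the IVs) and return the group with the
    highest score, along with all the scores."""
    scores = {}
    for group, (const, coefs) in class_functions.items():
        scores[group] = const + sum(b * x for b, x in zip(coefs, case))
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical classification-function coefficients for two groups
# (constant, [coefficients for the IVs]):
funcs = {"A": (-3.0, [1.0, 0.5]),
         "B": (-1.0, [0.2, 0.9])}

# A new case with IV values [2.0, 3.0]:
# A scores -3 + 2.0 + 1.5 = 0.5;  B scores -1 + 0.4 + 2.7 = 2.1
group, scores = classify([2.0, 3.0], funcs)
```

Here the new case lands in group "B", since its classification score on B's function exceeds its score on A's.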
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 5: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/5.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 6: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/6.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 7: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/7.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 8: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/8.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
11. Fisher’s linear discriminant functions (i.e., classification functions)–not to be confused with theDFs. These are contained in the “Classification Function Coefficients” table in SPSS, and providea handy-dandy method of placing a “new” case in its predicted DV group without running datathrough SPSS. A new case’s values for the IVs may be inserted in the functions and a score iscalculated for each function for that case. The case is then classified into the group for which ithas the highest classification score. Practical, rather than informative of relationships amongvariables.
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 9: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/9.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 10: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/10.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 11: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/11.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 12: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/12.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
11. Fisher’s linear discriminant functions (i.e., classification functions)–not to be confused with theDFs. These are contained in the “Classification Function Coefficients” table in SPSS, and providea handy-dandy method of placing a “new” case in its predicted DV group without running datathrough SPSS. A new case’s values for the IVs may be inserted in the functions and a score iscalculated for each function for that case. The case is then classified into the group for which ithas the highest classification score. Practical, rather than informative of relationships amongvariables.
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 13: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/13.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 14: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/14.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 15: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/15.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central to discriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ on the function(s) that have been derived for that very purpose. We can look at these centroids graphically in SPSS's territorial maps, which plot the centroids in the first two dimensions, i.e., the first two DFs.
8. The territorial map--the optimal cutting scores are shown visually for 2 DFs at a time in a territorial map. With this SPSS output component, you can plot the position of any given individual case on the 2 DFs, and see which group that individual is predicted to be in.
9. Classification matrix (found in SPSS's "Classification Results")--a chart showing predicted group membership against actual group membership. We hope for large values on the diagonal, and small values on the off-diagonal. We also hope for a high "percent . . . correctly classified." The pattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neither is provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
10. Tau--very much like a special form of χ², it tests whether a given classification analysis improves one's prediction to groups over chance.
Tau = (ncor - Σpini) / (n - Σpini)
where:
ncor = # of cases correctly classified
n = # of cases
pi = chance probability of membership in each group (e.g., .25 for each of 4 groups)
ni = # of cases in that group
i = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained by the classification analysis than what would be expected by chance (see Klecka p. 51 for more info).
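The hand calculation is short enough to sketch. The sample sizes and hit count below are purely hypothetical, chosen only to show the arithmetic:

```python
def tau(n_cor, group_sizes, probs=None):
    """Proportional reduction in classification errors over chance.
    n_cor = cases correctly classified; group_sizes = ni per group;
    probs = chance probability pi per group (equal if omitted)."""
    n = sum(group_sizes)
    if probs is None:
        probs = [1.0 / len(group_sizes)] * len(group_sizes)
    chance = sum(p * ni for p, ni in zip(probs, group_sizes))
    return (n_cor - chance) / (n - chance)

# hypothetical example: 4 equal groups of 25, 70 of 100 cases correct
print(round(tau(70, [25, 25, 25, 25]), 2))   # 0.6 -> 60% fewer errors than chance
```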
11. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using a chi-square table with 1 degree of freedom, one can actually get a significance test for the difference from chance.
Press's Q = [N - (nK)]² / [N(K - 1)]

where
N = total sample size
n = number of observations correctly classified
K = number of groups
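This, too, is a one-line hand calculation. A sketch with the same hypothetical numbers as the tau example (100 cases, 70 correct, 4 groups):

```python
def press_q(N, n, K):
    """Press's Q: tests classification accuracy against chance.
    N = total cases, n = correctly classified, K = groups."""
    return (N - n * K) ** 2 / (N * (K - 1.0))

# hypothetical example: N = 100, n = 70, K = 4
q = press_q(100, 70, 4)
print(round(q, 2))   # 108.0 -- well past the chi-square critical
                     # value at df = 1 (3.84 at p = .05)
```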
12. Fisher's linear discriminant functions (i.e., classification functions)--not to be confused with the DFs. These are contained in the "Classification Function Coefficients" table in SPSS, and provide a handy-dandy method of placing a "new" case in its predicted DV group without running data through SPSS. A new case's values for the IVs may be inserted in the functions and a score is calculated for each function for that case. The case is then classified into the group for which it has the highest classification score. Practical, rather than informative of relationships among variables.
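The "insert the values, keep the highest score" procedure can be sketched as follows. The coefficients here are entirely hypothetical stand-ins for a real "Classification Function Coefficients" table (constant listed first, then one b per IV):

```python
# hypothetical classification functions for 3 groups on 2 IVs
functions = {
    "group1": [1.2, 0.8, -0.3],   # constant, b1, b2
    "group2": [0.5, 0.2, 0.9],
    "group3": [-0.7, 1.1, 0.4],
}

def classify(case):
    """Score the case on each group's function; predict the max."""
    scores = {g: coefs[0] + sum(b * x for b, x in zip(coefs[1:], case))
              for g, coefs in functions.items()}
    return max(scores, key=scores.get), scores

new_case = [2.0, 3.0]             # the new case's values on the two IVs
group, scores = classify(new_case)
print(group)                      # group2 (score 3.6 beats 1.9 and 2.7)
```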
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in the Canonical Discriminant Function Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, in order to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 + . . .
DF2 = a2 + b4X1 + b5X2 + b6X3 + . . . etc.

Strangely, Hair et al. call the calculated DF1, DF2, etc., scores "Discriminant Z scores," which seems to invite confusion with simple standardized scores (z-scores).
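As a sketch of how these unstandardized weights get used, the snippet below computes a case's DF scores; the intercepts and b weights are hypothetical stand-ins, not values from any real SPSS run:

```python
# Discriminant scores for one case: DF_j = a_j + sum_i(b_ji * X_i).
# Hypothetical coefficients standing in for SPSS's "Canonical
# Discriminant Function Coefficients" table (3 IVs, 2 DFs).
intercepts = [0.41, -1.20]            # a1, a2
weights = [[0.55, -0.30, 0.12],       # b1..b3 (DF1)
           [0.08,  0.47, -0.61]]      # b4..b6 (DF2)

def discriminant_scores(x):
    """Return the DF scores for one case's raw IV values x."""
    return [a + sum(b * xi for b, xi in zip(bs, x))
            for a, bs in zip(intercepts, weights)]

print(discriminant_scores([3.0, 5.0, 2.0]))  # one case's scores on DF1, DF2
```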
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factor analysis). As Klecka says, "they cannot be interpreted directly." Each eigenvalue is a relative measure of how much of the total discriminating power a DF has. Examining the eigenvalues tells us the relative strength of each DF. For example, from Klecka:
DF    Eigenvalue    Relative %
 1      9.66          85.5
 2      1.58          14.0
 3       .05           0.5
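The Relative % column is simply each eigenvalue's share of the eigenvalue total; a minimal sketch (using the rounded eigenvalues above, so the shares match Klecka's 85.5/14.0/0.5 only approximately):

```python
# Each DF's share of total discriminating power, from its eigenvalue.
eigenvalues = [9.66, 1.58, 0.05]

total = sum(eigenvalues)
relative_pct = [100 * e / total for e in eigenvalues]
print([round(p, 1) for p in relative_pct])  # close to 85.5 / 14.0 / 0.5
```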
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is a multivariate measure of group differences over several IVs. Rather than testing a DF itself, lambda examines the residual discrimination in the system prior to deriving that function (Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among the groups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambda typically starts small and gets bigger.
RANGE:   Λ = 1   no discrimination among groups
         Λ = 0   great discrimination among groups
One formula for lambda:

           q
    Λ  =   Π    1 / (1 + eigen_i)
         i=k+1
where Π is like Σ, only with multiplication instead of addition, and q = # of DFs total, k = # of DFs derived at that point
So, from the example in #4 above:
Λ = [1/(1+9.66)] × [1/(1+1.58)] × [1/(1+.05)] = .035   for NO DFs DERIVED YET
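The same product can be computed for any number of DFs already derived; a short check using the rounded eigenvalues from #4 (so the results match the handout's values only to rounding):

```python
# Residual Wilks' lambda after k DFs have been derived:
#   lambda = product over i = k+1 .. q of 1 / (1 + eigen_i)
eigenvalues = [9.66, 1.58, 0.05]

def wilks_lambda(k):
    """Lambda remaining when k DFs have already been derived."""
    lam = 1.0
    for e in eigenvalues[k:]:
        lam /= 1.0 + e
    return lam

print([round(wilks_lambda(k), 3) for k in range(3)])  # ~ .035, .369, .952
```

The small gaps from the .368 and .949 shown in the SPSS table come from the eigenvalues here being rounded to two decimals.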
Thus, with Λ = .035, there's a lot of discrimination left to be captured by deriving DF(s). If we continue, we find:
DFs derived   "Test of Function(s)"   Wilks' lambda (Λ)   Chi-square (χ2)   df   Sig. (p)
     0             1 through 3              .035               43.76        18     .001
     1             2 through 3              .368               13.00        10     .224
     2                  3                   .949                0.68         4     .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled "Test of Function(s)." Interpret this as "Test of the significance of aggregate group differences for the pool of IVs prior to deriving this/these DF(s)."
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DF has a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient of determination (shared variance), as always. Here, the shared variance is between the individual DF and the set of dummies representing the DV groups.
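For a canonical discriminant function, the CC can also be recovered from the DF's eigenvalue via CC2 = eigen/(1 + eigen); a quick sketch with the first eigenvalue from Klecka's example in #4:

```python
import math

# Canonical correlation of a DF with the dummy-coded DV groups:
# CC^2 = eigen / (1 + eigen), i.e., CC^2 is the shared variance.
def canonical_corr(eigen):
    return math.sqrt(eigen / (1.0 + eigen))

print(round(canonical_corr(9.66), 3))  # ~ .952 for the strong first DF
```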
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central to discriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ on the function(s) that have been derived for that very purpose. We can look at these centroids graphically in SPSS's territorial maps, which plot the centroids in the first two dimensions, i.e., the first two DFs.
8. The territorial map--the optimal cutting scores are shown visually for 2 DFs at a time in a territorial map. With this SPSS output component, you can plot the position of any given individual case for the 2 DFs, and see which group that individual is predicted to be in.
9. Classification matrix (found in SPSS's "Classification Results")--a chart showing predicted group membership against actual group membership. We hope for large values on the diagonal, and small values on the off-diagonal. We also hope for a high "percent . . . correctly classified." The pattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neither is provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
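A minimal sketch of reading a classification matrix (the counts below are invented for illustration):

```python
# Rows = actual group, columns = predicted group (hypothetical counts).
matrix = [[20,  3,  2],
          [ 4, 18,  3],
          [ 1,  4, 20]]

n_correct = sum(matrix[i][i] for i in range(len(matrix)))  # the diagonal
n_total = sum(sum(row) for row in matrix)
print(f"{100 * n_correct / n_total:.1f}% correctly classified")  # 77.3%
```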
10. Tau--very much like a special form of a χ2, it tests whether a given classification analysis improves one's prediction to groups over chance.
Tau = (ncor - Σ pi ni) / (n - Σ pi ni)
where:
    ncor = # of cases correctly classified
    n = # of cases
    pi = chance probability of membership in each group (e.g., .25 for each of 4 groups)
    ni = # of cases in that group
    i = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained by the classification analysis than what would be expected by chance (see Klecka p. 51 for more info.)
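With the definitions above, tau takes only a few lines; the group sizes and hit count below are invented:

```python
# Tau = (ncor - sum(pi * ni)) / (n - sum(pi * ni)):
# the proportional reduction in classification errors over chance.
def tau(n_correct, group_sizes, priors):
    chance = sum(p * ni for p, ni in zip(priors, group_sizes))
    n = sum(group_sizes)
    return (n_correct - chance) / (n - chance)

# Hypothetical: 4 equal groups of 25 cases, 70 of 100 classified correctly.
print(tau(70, [25, 25, 25, 25], [0.25] * 4))  # 0.6
```

Read the result as: the classification analysis made 60% fewer errors than chance assignment would have.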
11. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using a chi-square table with 1 degree of freedom, one can actually get a significance test for the difference from chance.
Press's Q = [N - (nK)]² / [N(K - 1)]

where
    N = total sample size
    n = number of observations correctly classified
    K = number of groups
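Plugging into the formula above (hypothetical counts again):

```python
# Press's Q = [N - (n*K)]^2 / [N * (K - 1)], compared against the
# chi-square critical value with 1 df (3.84 at alpha = .05).
def press_q(N, n_correct, K):
    return (N - n_correct * K) ** 2 / (N * (K - 1))

q = press_q(100, 70, 4)  # 100 cases, 70 correct, 4 groups -> 108.0
print(q, "better than chance" if q > 3.84 else "not better than chance")
```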
12. Fisher's linear discriminant functions (i.e., classification functions)--not to be confused with the DFs. These are contained in the "Classification Function Coefficients" table in SPSS, and provide a handy-dandy method of placing a "new" case in its predicted DV group without running data through SPSS. A new case's values for the IVs may be inserted in the functions and a score is calculated for each function for that case. The case is then classified into the group for which it has the highest classification score. Practical, rather than informative of relationships among variables.
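A sketch of classification with Fisher's functions; the constants and coefficients below are hypothetical stand-ins for SPSS's "Classification Function Coefficients" table:

```python
# One linear classification function per group; a case is assigned to
# the group whose function gives it the highest score.
functions = {                      # group: (constant, [IV coefficients])
    "group A": (-4.2, [1.1, 0.9]),
    "group B": (-6.0, [0.4, 2.0]),
    "group C": (-2.5, [0.7, 0.6]),
}

def classify(x):
    scores = {g: c + sum(b * xi for b, xi in zip(bs, x))
              for g, (c, bs) in functions.items()}
    return max(scores, key=scores.get)

print(classify([2.0, 3.0]))  # predicted group for this "new" case
```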
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 33: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/33.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 34: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/34.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 35: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/35.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 36: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/36.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
11. Fisher’s linear discriminant functions (i.e., classification functions)–not to be confused with theDFs. These are contained in the “Classification Function Coefficients” table in SPSS, and providea handy-dandy method of placing a “new” case in its predicted DV group without running datathrough SPSS. A new case’s values for the IVs may be inserted in the functions and a score iscalculated for each function for that case. The case is then classified into the group for which ithas the highest classification score. Practical, rather than informative of relationships amongvariables.
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 37: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/37.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 38: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/38.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 39: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/39.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 40: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/40.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
11. Fisher’s linear discriminant functions (i.e., classification functions)–not to be confused with theDFs. These are contained in the “Classification Function Coefficients” table in SPSS, and providea handy-dandy method of placing a “new” case in its predicted DV group without running datathrough SPSS. A new case’s values for the IVs may be inserted in the functions and a score iscalculated for each function for that case. The case is then classified into the group for which ithas the highest classification score. Practical, rather than informative of relationships amongvariables.
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 41: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/41.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 42: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/42.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 43: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/43.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 44: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/44.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
12. Fisher’s linear discriminant functions (i.e., classification functions)--not to be confused with the DFs. These are contained in the “Classification Function Coefficients” table in SPSS, and provide a handy-dandy method of placing a “new” case in its predicted DV group without running data through SPSS. A new case’s values for the IVs are inserted in each function, yielding a classification score for that case on each function. The case is then classified into the group for which it has the highest classification score. Practical, rather than informative of relationships among variables.
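This classification step can be sketched directly. The coefficients below are made up for illustration (they come from no actual SPSS run); in practice they would be read off the "Classification Function Coefficients" table, one constant and one weight per IV for each DV group:

```python
# Hypothetical Fisher classification-function coefficients for 3 DV groups
# and 2 IVs: each group has (constant a, b1, b2). Made-up numbers.
functions = {
    "group1": (-4.0, 1.2, 0.5),
    "group2": (-6.5, 0.8, 1.9),
    "group3": (-2.1, 0.3, 0.4),
}

def classify(x1, x2):
    # Score the new case on every group's function; the highest score wins.
    scores = {g: a + b1 * x1 + b2 * x2 for g, (a, b1, b2) in functions.items()}
    return max(scores, key=scores.get)

print(classify(3.0, 2.0))  # group1
```

With IV values (3.0, 2.0), the three classification scores are roughly 0.6, -0.3, and -0.4, so the new case is placed in group 1.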
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 45: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/45.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 46: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/46.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 47: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/47.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 48: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/48.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
11. Fisher’s linear discriminant functions (i.e., classification functions)–not to be confused with theDFs. These are contained in the “Classification Function Coefficients” table in SPSS, and providea handy-dandy method of placing a “new” case in its predicted DV group without running datathrough SPSS. A new case’s values for the IVs may be inserted in the functions and a score iscalculated for each function for that case. The case is then classified into the group for which ithas the highest classification score. Practical, rather than informative of relationships amongvariables.
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 49: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/49.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 50: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/50.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 51: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/51.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 52: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/52.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
11. Fisher’s linear discriminant functions (i.e., classification functions)–not to be confused with theDFs. These are contained in the “Classification Function Coefficients” table in SPSS, and providea handy-dandy method of placing a “new” case in its predicted DV group without running datathrough SPSS. A new case’s values for the IVs may be inserted in the functions and a score iscalculated for each function for that case. The case is then classified into the group for which ithas the highest classification score. Practical, rather than informative of relationships amongvariables.
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 53: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/53.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 54: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/54.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 55: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/55.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 56: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/56.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
11. Fisher’s linear discriminant functions (i.e., classification functions)–not to be confused with theDFs. These are contained in the “Classification Function Coefficients” table in SPSS, and providea handy-dandy method of placing a “new” case in its predicted DV group without running datathrough SPSS. A new case’s values for the IVs may be inserted in the functions and a score iscalculated for each function for that case. The case is then classified into the group for which ithas the highest classification score. Practical, rather than informative of relationships amongvariables.
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 57: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/57.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 58: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/58.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
Thus, with Λ = .035, there's a lot of discrimination left to be captured by deriving the DF(s). If we continue, we find:
DFs derived   “Test of Function(s)”   Wilks’ lambda (Λ)   Chi-square (χ2)   df   Sig. (p)
     0           1 through 3                .035               43.76        18     .001
     1           2 through 3                .368               13.00        10     .224
     2           3                          .949                0.68         4     .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Test of Function(s).” Interpret this as “Test of the significance of aggregate group differences for the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DF has a CC with the DV (treated here as a collection of c-1 dummies). CC² = coefficient of determination (shared variance), as always. Here, the shared variance is between the individual DF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central to discriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ on the function(s) that have been derived for that very purpose. We can look at these centroids graphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e., the first two DFs.
8. The territorial map--the optimal cutting scores are shown visually for 2 DFs at a time in a territorial map. With this SPSS output component, you can plot the position of any given individual case for the 2 DFs, and see which group that individual is predicted to be in.
9. Classification matrix (found in SPSS’s “Classification Results”)--a chart showing predicted group membership against actual group membership. We hope for large values on the diagonal, and small values on the off-diagonal. We also hope for a high “percent . . . correctly classified.” The pattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neither is provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
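A hypothetical classification matrix, sketched in Python, shows how the “percent correctly classified” is obtained (all counts below are invented):

```python
# Hypothetical classification matrix (rows = actual group, cols = predicted).
# "Percent correctly classified" = diagonal total / grand total.

matrix = [
    [18, 4, 3],    # actual group 1 (25 cases)
    [5, 20, 5],    # actual group 2 (30 cases)
    [2, 3, 15],    # actual group 3 (20 cases)
]

n_total = sum(sum(row) for row in matrix)               # 75 cases
n_correct = sum(matrix[i][i] for i in range(len(matrix)))  # 53 on the diagonal
pct_correct = 100 * n_correct / n_total
print(n_correct, n_total, round(pct_correct, 1))
```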
10. Tau--very much like a special form of χ2, it tests whether a given classification analysis improves one's prediction to groups over chance.
          ncor - Σ pi·ni
    Tau = --------------
            n - Σ pi·ni

where:
    ncor = # of cases correctly classified
    n = # of cases
    pi = chance probability of membership in each group (e.g., .25 for each of 4 groups)
    ni = # of cases in that group
    i = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained by the classification analysis than what would be expected by chance (see Klecka p. 51 for more info).
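The tau formula can be sketched as follows, using invented counts (75 cases in 3 groups, 53 correctly classified, equal chance probability of 1/3 per group):

```python
# Tau = (ncor - sum(pi*ni)) / (n - sum(pi*ni)): proportional improvement in
# classification over chance. All counts below are hypothetical.

def tau(n_correct, group_sizes, chance_probs):
    expected = sum(p * n for p, n in zip(chance_probs, group_sizes))  # chance hits
    n = sum(group_sizes)
    return (n_correct - expected) / (n - expected)

t = tau(53, [25, 30, 20], [1/3, 1/3, 1/3])
print(round(t, 2))  # 0.56: 56% fewer classification errors than chance
```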
11. Press' Q--a counterpart to tau; its calculation is shown on pp. 303-304 of Hair, and below. Using a chi-square table with 1 degree of freedom, one can actually get a significance test for the difference from chance.
                 [N - (nK)]²
    Press’s Q = -------------
                  N(K - 1)

where:
    N = total sample size
    n = number of observations correctly classified
    K = number of groups
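Press's Q can be sketched the same way, with made-up counts; 3.84 is the chi-square critical value for df = 1 at α = .05, so a larger Q indicates classification significantly better than chance:

```python
# Press's Q = (N - n*K)^2 / (N * (K - 1)), compared against the chi-square
# critical value for 1 df. Counts below are hypothetical.

def press_q(N, n_correct, K):
    return (N - n_correct * K) ** 2 / (N * (K - 1))

q = press_q(75, 53, 3)            # (75 - 159)^2 / 150
print(round(q, 2), q > 3.84)      # 47.04 True -> better than chance
```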
12. Fisher’s linear discriminant functions (i.e., classification functions)--not to be confused with the DFs. These are contained in the “Classification Function Coefficients” table in SPSS, and provide a handy-dandy method of placing a “new” case in its predicted DV group without running data through SPSS. A new case’s values for the IVs may be inserted in the functions, and a score is calculated for each function for that case. The case is then classified into the group for which it has the highest classification score. Practical, rather than informative of relationships among variables.
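A minimal sketch of this classify-by-highest-score logic, with invented coefficients (an actual analysis would take the constants and weights from SPSS's Classification Function Coefficients table):

```python
# One classification function per group: score = constant + sum(b_i * X_i).
# The case is assigned to the group with the highest score.
# All coefficients and the case's values are hypothetical.

classification_functions = {          # group: (constant, IV weights)
    "group1": (-4.2, [1.1, 0.3]),
    "group2": (-6.8, [0.4, 1.7]),
    "group3": (-3.1, [0.9, 0.8]),
}

def classify(case):
    scores = {g: a + sum(b * x for b, x in zip(bs, case))
              for g, (a, bs) in classification_functions.items()}
    return max(scores, key=scores.get), scores

new_case = [3.0, 2.5]                 # the "new" case's values on X1, X2
group, scores = classify(new_case)
print(group)  # group3 has the highest score here
```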
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 75: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/75.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 76: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/76.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
11. Fisher’s linear discriminant functions (i.e., classification functions)–not to be confused with theDFs. These are contained in the “Classification Function Coefficients” table in SPSS, and providea handy-dandy method of placing a “new” case in its predicted DV group without running datathrough SPSS. A new case’s values for the IVs may be inserted in the functions and a score iscalculated for each function for that case. The case is then classified into the group for which ithas the highest classification score. Practical, rather than informative of relationships amongvariables.
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08
![Page 77: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/77.jpg)
1
NeuendorfDiscriminant Analysis
Assumptions:
1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
2. Linearity--in relationships among variables--discriminant functions are linear constructions of theIVs that best differentiate among the DV groups. The number of DFs that may be derived is c-1(where c=# of categories on the DV) or k (where k=# of IVs), whichever is smaller.
3. Univariate and multivariate normal distributions
4. Little or no multicollinearity. However, SPSS will not assess this in the Discriminant procedure;we can run Multiple Regression to at least get the tolerances. . .
5. Homogeneity of variances/covariances (for the different DV groups). . . Box's M tests theassumption of homogeneity of variances/covariances of the DV groups. Based on thedeterminants of the group variance/covariance matrices, Box’s M uses an F transformation. Asignificant F indicates substantial group differences, showing heterogeneity ofvariances/covariances, a type of heteroscedasticity (which we do not want).
Decisions to make:
1. Simultaneous/Forced entry (“Enter independents together,” in SPSS-ese) vs. stepwise entry of IVs
2. Use (or not) of a hold-out sample for validation of the discriminant function. This is a split halvestest, where a portion of the cases are randomly assigned to an analysis sample for purposes ofderiving the discriminant function(s), and then the function(s) are validated by assessing theirperformance with the remaining cases in the hold-out sample.
Statistics:
1. Standardized canonical discriminant coefficients/weights--like regression betas, they indicate therelative contribution of each IV to each DF (discriminant function) (in Standardized CanonicalDiscriminant Function Coefficients table in SPSS). Each "ß" below:
DF1 = ß1Xz1 + ß2Xz2 + ß3Xz3 . . . DF2 = ß4Xz1 + ß5Xz2 + ß6Xz3 . . . etc. :
2. Structure coefficients/discriminant “loadings” (in SPSS’s Structure Matrix)--simple r's betweeneach IV and a DF. Viewed by many as a better way to interpret a DF, since the discriminantcoefficients are partials and these loadings are not. [NOTE: The term “loading” may have aslightly different meaning across different statistical procedures and across stat books.]
3. Unstandardized discriminant coefficients/weights (in Canonical Discriminant Function
![Page 78: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/78.jpg)
2
Coefficients table in SPSS)--they allow the calculation of discriminant scores for individuals, inorder to conduct classification analysis. Each "b" below:
DF1 = a1 + b1X1 + b2X2 + b3X3 . . . DF2 = a2 + b4X1 + b5X2 + b6X3 . . . etc. ::Strangely, Hair et al. call the calculated DF1, DF2, etc., scores “Discriminant Z scores,” whichseems to invite confusion with simple standardized scores (z-scores).
4. An eigenvalue for each DF--the eigenvalue has no absolute meaning (much like in factoranalysis). As Klecka says, “they cannot be interpreted directly.” Each eigenvalue is a relativemeasure of how much of the total discriminating power a DF has. Examining the eigenvalues tellsus the relative strength of each DF. For example, from Klecka:
DF Eigenvalue Relative %1 9.66 85.5%2 1.58 14.03 .05 0.5
5. Wilks' lambda (Λ)--assesses the statistical significance of each DF, based on eigenvalues. It is amultivariate measure of group differences over several IVs. Rather than testing a DF itself,lambda examines the residual discrimination in the system prior to deriving that function(Klecka). Λ is interpretable as an inverse measure of how much discrimination there is among thegroups (i.e., how much the groups differ on the pool of IVs). As DFs are derived, the lambdatypically starts small and gets bigger.
R Λ = 1 no discrimination among groupsA N GE Λ = 0 great discrimination among groups
One formula for lambda: q
Λ = Π 1 i=k+1 1 + eigeni
where Π is like Σ, only with multiplication instead of addition, and q=# of DFstotal, k=# of DFs derived at that point
So, from the e.g., in #4 above:
1 1 1 Λ = 1+9.66 X 1+1.58 X 1+.05 = .035 for NO DFs DERIVED YET
![Page 79: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/79.jpg)
3
Thus, with Λ = .035, there's a lot of discrimination left to capture by the deriving of DF(s). If we continue, we find:
Wilks’ Chi-squareDFs derived “Test of Function(s)” lambda (Λ) (χ2) df Sig. (p)
0 1 through 3 .035 43.76 18 .0011 2 through 3 .368 13.00 10 .2242 3 .949 0.68 4 .954
How many DFs are significant? [Answer: 1] NOTE: SPSS uses the column titled “Testof Function(s).” Interpret this as “Test of the significance of aggregate group differencesfor the pool of IVs prior to deriving this/these DF(s).”
6. Canonical correlation coefficient--Another way to judge the substantive utility of a DF. Each DFhas a CC with the DV (treated here as a collection of c-1 dummies). CC2 = coefficient ofdetermination (shared variance), as always. Here, the shared variance is between the individualDF and the set of dummies representing the DV groups.
7. Group centroids--the means of the DFs are reported for each of the DV groups. This is central todiscriminant analysis, yet sometimes overlooked in writeups. It tells us how the groups differ onthe function(s) that have been derived for that very purpose. We can look at these centroidsgraphically in SPSS’s territorial maps, which plot the centroids in the first two dimensions, i.e.,the first two Dfs.
8. The territorial map–the optimal cutting scores are shown visually for 2 DFs at a time in aterritorial map. With this SPSS output component, you can plot the position of any givenindividual case for the 2 DFs, and see which group that individual is predicted to be in.
8. Classification matrix (found in SPSS’s “Classification Results”)--a chart shows predicted groupmembership against actual group membership. We hope for large values on the diagonal, andsmall values on the off-diagonal. We also hope for a high "percent. . . correctly classified." Thepattern shown in the matrix can be assessed with two different statistics--tau and Press' Q. Neitheris provided in the SPSS output; each must be calculated by hand (but neither is very difficult).
9. Tau--very much like a special form of a χ2, it tests whether a given classification analysisimproves one's prediction to groups over chance.
Tau = ncor - Σpini n - Σpini
where:ncor = # of cases correctly classifiedn = # of casespi = chance probability of membership in each group (e.g., .25 for each of 4 groups)ni = # of cases in that groupi = each group
This test for "classification errors" is interpreted as the proportion fewer errors obtained bythe classification analysis than what would be expected by chance (see Klecka p. 51 for
![Page 80: analysis sample hold-out sample - Cleveland State …academic.csuohio.edu/kneuendorf/c53108/hand29.pdf · test, where a portion of the cases are randomly assigned to an analysis sample](https://reader036.fdocuments.us/reader036/viewer/2022062909/5b0263867f8b9af1148f98cf/html5/thumbnails/80.jpg)
4
more info.)
10. Press' Q--a counterpart to tau, its calculation is shown on pp. 303-304 of Hair, and below. Using achi-square table, using 1 degree of freedom, one can actually get a significance test for thedifference from chance.
Press’s Q = [N - (nK)]2
N(K - 1)where
N = total sample sizen = number of observations correctly classifiedK = number of groups
11. Fisher’s linear discriminant functions (i.e., classification functions)–not to be confused with theDFs. These are contained in the “Classification Function Coefficients” table in SPSS, and providea handy-dandy method of placing a “new” case in its predicted DV group without running datathrough SPSS. A new case’s values for the IVs may be inserted in the functions and a score iscalculated for each function for that case. The case is then classified into the group for which ithas the highest classification score. Practical, rather than informative of relationships amongvariables.
References
Klecka, W. R. (1980). Discriminant analysis. Beverly Hills, CA: Sage Publications.
3/08