Multiple Discriminant Analysis
The dependent variable has more than two categories. For example, the amount spent on a family vacation can be high, medium or low, which makes this a three-group discriminant analysis.
The question of interest is whether households that spend high, medium or low amounts on their vacations can be differentiated in terms of:
Annual family income
Attitude towards travel
Importance attached to family vacation
Household size
Age of the head of the household
Group Means
Amount Income Travel Vacation Hsize Age
1 38.57 4.5 4.7 3.1 50.3
2 50.11 4 4.2 3.4 49.5
3 64.97 6.1 5.9 4.2 56
Total 51.22 4.87 4.93 3.57 51.93
Group means indicate that income appears to differentiate the 3 groups more widely than any other variable. There is some differentiation on travel and vacation, with group 3 being fairly high on both. Group 1 & 2 are very close on household size and age. Age has a large standard deviation relative to the separation between the groups.
Group Standard Deviations
Amount Income Travel Vacation Hsize Age
1 5.3 1.72 1.89 1.2 8.1
2 6 2.36 2.49 1.51 9.25
3 8.61 1.2 1.66 1.14 7.6
Total 12.8 1.98 2.1 1.33 8.57
Pooled within-groups correlation matrix
Income Travel Vacation Hsize Age
Income 1
Travel 0.0512 1
Vacation 0.3068 0.036 1
Hsize 0.3805 0.005 0.2208 1
Age -0.209 -0.34 -0.01326 -0.02512 1
There is some correlation between Hsize and Income, and between Vacation and Income. Age has some negative correlation with Travel. However, none of these correlations is high enough to be a concern.
Wilks' Lambda and Univariate F ratio with 2 & 27 degrees of freedom
Variable  Wilks' Lambda  F  Significance
Income 0.26 38 0
Travel 0.79 3.63 0.04
Vacation 0.88 1.83 0.18
Hsize 0.87 1.94 0.16
Age 0.88 1.8 0.18
The univariate F ratios indicate that, when the predictors are considered individually, only income and travel are significant in differentiating between the three groups.
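As a rough check on the table, each univariate F ratio can be recovered from its Wilks' Lambda, since for a single predictor Lambda is the within-groups share of the total sum of squares. The sketch below assumes n = 30 respondents (implied by the 2 and 27 degrees of freedom with 3 groups); the function name is illustrative only.

```python
def univariate_f(wilks_lambda, n_obs, n_groups):
    """F ratio for one predictor, given its univariate Wilks' Lambda."""
    df_between = n_groups - 1      # 2 in this example
    df_within = n_obs - n_groups   # 27 in this example
    return ((1 - wilks_lambda) / df_between) / (wilks_lambda / df_within)

# Wilks' Lambda values as reported (rounded) in the table.
lambdas = {"Income": 0.26, "Travel": 0.79, "Vacation": 0.88,
           "Hsize": 0.87, "Age": 0.88}

# n_obs = 30 is an assumption inferred from the reported degrees of freedom.
f_ratios = {name: univariate_f(lam, n_obs=30, n_groups=3)
            for name, lam in lambdas.items()}

for name, f in f_ratios.items():
    print(f"{name}: F = {f:.2f}")
```

Because the Lambda values are rounded to two decimals, the recomputed F ratios agree with the table only approximately.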
Number of discriminant functions
In multiple discriminant analysis with G groups, G-1 discriminant functions can be estimated, provided the number of predictors is at least this large.
Thus, with G groups and k predictors, it is possible to estimate up to the smaller of G-1 and k discriminant functions.
The first function has the highest ratio of between-groups to within-groups sum of squares.
The second function, uncorrelated with the first, has the second highest ratio, and so on.
Not all of the functions need be statistically significant.
Canonical Discriminant Functions
Function  Eigenvalue  Percent of Variance  Cumulative Percent  Canonical Correlation
1 3.82 93.93 93.93 0.89
2 0.25 6.07 100 0.45
Since there are G=3 groups & k=5 predictor variables, the number of discriminant functions will be min(G-1,k)=min(2,5)=2
The eigenvalue associated with the first function is 3.82, and it accounts for 93.93% of the explained variance. Because its eigenvalue is so much larger, function 1 is likely to be the superior discriminator.
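The percent of variance and the canonical correlation both follow from the eigenvalues. A minimal sketch, using the rounded eigenvalues from the table (variable names are illustrative):

```python
import math

eigenvalues = [3.82, 0.25]  # rounded values from the table
total = sum(eigenvalues)

# Share of the explained (between-groups) variance carried by each function.
percent = [100 * ev / total for ev in eigenvalues]

# Canonical correlation for each function: sqrt(eigenvalue / (1 + eigenvalue)).
canonical = [math.sqrt(ev / (1 + ev)) for ev in eigenvalues]

for i, (pct, r) in enumerate(zip(percent, canonical), start=1):
    print(f"Function {i}: {pct:.2f}% of variance, canonical correlation {r:.3f}")
```

With the rounded eigenvalues this gives roughly 93.9% and 6.1%, and canonical correlations of about 0.89 and 0.45, in line with the table up to rounding.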
After Function  Wilks' Lambda  Chi-square  DF  Sig.
0 0.17 44.83 10 0
1 0.8 5.52 4 0.24
The row "After Function 0" tests the significance of the two functions together, whereas the row "After Function 1" tests function 2 alone, after function 1 has been removed.
Thus, the two functions together significantly differentiate between the three groups. However, when the first function is removed, the second function is not significant at the 0.05 level. Therefore, the second function does not contribute significantly to the group differences
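The Wilks' Lambda and chi-square figures can be approximately reproduced from the eigenvalues. The sketch below uses Bartlett's chi-square approximation and assumes n = 30 respondents (inferred from the degrees of freedom reported earlier); whether the original output used exactly this approximation is an assumption, and the function names are illustrative.

```python
import math

def residual_wilks(eigenvalues, removed):
    """Wilks' Lambda for the functions left after removing the first `removed`."""
    lam = 1.0
    for ev in eigenvalues[removed:]:
        lam *= 1.0 / (1.0 + ev)
    return lam

def bartlett_chi_square(lam, n_obs, n_predictors, n_groups):
    """Bartlett's approximation: -(n - 1 - (k + G) / 2) * ln(Lambda)."""
    return -(n_obs - 1 - (n_predictors + n_groups) / 2) * math.log(lam)

def chi_square_df(n_predictors, n_groups, removed):
    """Degrees of freedom: (k - m) * (G - m - 1) after removing m functions."""
    return (n_predictors - removed) * (n_groups - removed - 1)

eigenvalues = [3.82, 0.25]   # rounded values from the table
n_obs, k, G = 30, 5, 3       # n_obs = 30 is an assumption

results = []
for m in range(len(eigenvalues)):
    lam = residual_wilks(eigenvalues, m)
    chi2 = bartlett_chi_square(lam, n_obs, k, G)
    results.append((m, lam, chi2, chi_square_df(k, G, m)))
    print(f"After function {m}: Lambda = {lam:.2f}, chi-square = {chi2:.2f}, "
          f"df = {chi_square_df(k, G, m)}")
```

With rounded eigenvalues the chi-square values come out near 44.9 and 5.58, close to the reported 44.83 and 5.52, and the degrees of freedom match (10 and 4).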
Standard Canonical Discriminant Function Coefficients
Func1 Func2
Income 1.0474 -0.42076
Travel 0.33991 0.76851
Vacation -0.14198 0.53354
Hsize -0.16317 0.12932
Age 0.49474 0.52447
Pooled within-groups correlations
Func1 Func2
Income 0.85556 -0.27833
Hsize 0.19319 0.07749
Vacation 0.21935 0.58829
Travel 0.14899 0.45362
Age 0.16576 0.34079
The standardised coefficients show a large coefficient for income on func1, whereas travel, vacation and age have relatively large coefficients on func2.
Similarly, the pooled within-groups correlations show that income and hsize correlate more strongly with func1 than with func2.
Vacation, travel and age correlate more strongly with func2 than with func1.
Group Centroids
Groups Func1 Func2
1 -2.041 0.41847
2 -0.40479 -0.65867
3 2.44578 0.2402
Group 3 has the highest value on function 1, and since function 1 is primarily associated with income and hsize, group 3 will tend to have people with higher incomes and larger households.
Group 1 is highest on function 2 and Group 2 is lowest. Thus, this function separates these two groups. Since the function is primarily associated with travel, vacation and age, group 1 will be higher than group 2 on these variables
Unstandardised Canonical Discriminant Function Coefficients
Func1 Func2
Income 0.15427 -0.06197
Travel 0.18680 0.42234
Vacation -0.06952 0.26127
Hsize -0.12653 0.10028
Age 0.05928 0.06284
Constant -11.09442 -3.79160
Thus the two equations are:
Funct1 = -11.09442 + 0.15427*Income + 0.18680*Travel - 0.06952*Vacation - 0.12653*Hsize + 0.05928*Age
Funct2 = -3.79160 - 0.06197*Income + 0.42234*Travel + 0.26127*Vacation + 0.10028*Hsize + 0.06284*Age
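Because the functions are linear, evaluating them at each group's mean profile should reproduce the group centroids. The sketch below does this with the unstandardised coefficients (constant taken from the coefficient table) and the group means reported earlier; the names are illustrative.

```python
# Unstandardised coefficients from the coefficient table.
FUNC1 = {"Constant": -11.09442, "Income": 0.15427, "Travel": 0.18680,
         "Vacation": -0.06952, "Hsize": -0.12653, "Age": 0.05928}
FUNC2 = {"Constant": -3.79160, "Income": -0.06197, "Travel": 0.42234,
         "Vacation": 0.26127, "Hsize": 0.10028, "Age": 0.06284}

# Group means from the Group Means table.
GROUP_MEANS = {
    1: {"Income": 38.57, "Travel": 4.5, "Vacation": 4.7, "Hsize": 3.1, "Age": 50.3},
    2: {"Income": 50.11, "Travel": 4.0, "Vacation": 4.2, "Hsize": 3.4, "Age": 49.5},
    3: {"Income": 64.97, "Travel": 6.1, "Vacation": 5.9, "Hsize": 4.2, "Age": 56.0},
}

def discriminant_score(coefficients, case):
    """Constant plus the weighted sum of the predictor values for one case."""
    score = coefficients["Constant"]
    for name, value in case.items():
        score += coefficients[name] * value
    return score

centroids = {group: (discriminant_score(FUNC1, means),
                     discriminant_score(FUNC2, means))
             for group, means in GROUP_MEANS.items()}

for group, (f1, f2) in centroids.items():
    print(f"Group {group}: Func1 = {f1:.3f}, Func2 = {f2:.3f}")
```

The same `discriminant_score` helper can be applied to any individual household's values to obtain its two discriminant scores.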
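Classification results: analysis sample
Predicted group membership (count)
Original amount   1    2    3    Total
1                 9    1    0    10
2                 1    9    0    10
3                 0    2    8    10
Total             10   12   8    30
Predicted group membership (%)
Original amount   1    2    3
1                 90   10   0
2                 10   90   0
3                 0    20   80
Hit Ratio 86.70%
Classification results: holdout sample
Predicted group membership (count)
Original amount   1    2    3    Total
1                 3    1    0    4
2                 0    3    1    4
3                 1    0    3    4
Total             4    4    4    12
Predicted group membership (%)
Original amount   1    2    3
1                 75   25   0
2                 0    75   25
3                 25   0    75
Hit Ratio 75%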
The three groups are of equal size, so by chance one would expect a hit ratio of 1/3 = 33.3%. Both hit ratios are a large improvement over chance, which supports the validity of the discriminant analysis.
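The hit ratios and the chance benchmark can be checked directly from the classification counts. A small sketch, with the counts copied from the analysis and holdout classification tables (rows are original groups, columns are predicted groups):

```python
analysis = [[9, 1, 0],
            [1, 9, 0],
            [0, 2, 8]]
holdout = [[3, 1, 0],
           [0, 3, 1],
           [1, 0, 3]]

def hit_ratio(matrix):
    """Share of cases on the diagonal, i.e. classified into their own group."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# With equal-sized groups, chance classification gives 1 / (number of groups).
chance = 1 / len(analysis)

analysis_hit = hit_ratio(analysis)
holdout_hit = hit_ratio(holdout)

print(f"Analysis sample hit ratio: {analysis_hit:.1%}")
print(f"Holdout sample hit ratio:  {holdout_hit:.1%}")
print(f"Chance hit ratio:          {chance:.1%}")
```

The holdout figure is the more conservative of the two, since those cases were not used to estimate the functions.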
Example…1
A recent survey asked business people about their concerns over hiring and retaining employees during the current harsh economic environment.
If an organisation wants to retain its employees, it must learn why people leave their jobs and why others stay and are satisfied with their jobs
Discriminant analysis was used to determine what factors explained the differences between salespeople who left a large computer manufacturing company and those who stayed
Example…2
Independent variables were:
Company rating
Job security
Seven job satisfaction dimensions
Four role-conflict dimensions
Four role-ambiguity dimensions
Nine measures of sales performance
Dependent variable was dichotomous – Those who stayed and those who left
The canonical correlation, an index of discrimination (R=0.4572), was significant (p =.0180)
Results indicated that the variables discriminated between those who left and those who stayed
Discriminant Analysis Results
Variable                     Coefficients  Standardised Coefficients  Structure Correlations
1 Work                       0.0903        0.391                      0.5446
2 Promotion                  0.0288        0.1515                     0.5044
3 Job Security               0.1567        0.1384                     0.4958
4 Customer Relations         0.0086        0.1751                     0.4906
5 Company Rating             0.4059        0.324                      0.4824
6 Working with others
7 Overall performance
8 Time-territory management
9 Sales produced
10 Presentation skill
11 Technical Information
12 Pay-benefits
13 Quota achieved
14 Management
15 Information collection
16 Family
17 Sales manager
18 Coworker
19 Customer
20 Family
21 Job
22 Job
23 Customer
24 Sales manager
25 Sales manager
26 Customer
Characteristic profile…1
In the example, based on the structure correlations, Promotion was identified as the second most important variable.
However, based on the standardised discriminant function coefficients, Promotion is not the second most important variable.
The anomaly arises because of multicollinearity.
In such cases, develop a characteristic profile for each group by describing each group in terms of the group means for the predictor variables.
Characteristic profile…2
Group             Promotion  Company Rating
Those who stayed  4.5        4
Those who left    2.3        3.83
Overall           3.42       3.92
Promotion clearly discriminates between the two groups more strongly than company rating does. Those who stayed with the company are satisfied with the promotions.
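The comparison can be made explicit by taking the gap between the two group means for each variable. A minimal sketch using the means from the characteristic profile (names are illustrative; this ignores within-group spread, which the slides do not report for this example):

```python
# Group means from the characteristic profile.
profile = {
    "Promotion":      {"stayed": 4.5, "left": 2.3},
    "Company Rating": {"stayed": 4.0, "left": 3.83},
}

# Gap between the mean for those who stayed and the mean for those who left.
gaps = {variable: means["stayed"] - means["left"]
        for variable, means in profile.items()}

for variable, gap in gaps.items():
    print(f"{variable}: stayed minus left = {gap:.2f}")
```

The gap on Promotion (about 2.2 scale points) is far larger than the gap on Company Rating (about 0.17).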
Discriminant Analysis using SPSS
Analyse>Classify>Discriminant
Select Analyse from the SPSS menu bar.
Click Classify and then Discriminant.
Move the criterion variable into the Grouping Variable box: ‘Taken Vacation’ in the 1st example; ‘Amt spent on vacation’ in the 2nd example.
Click Define Range. In the 1st example, enter 1 (taken vacation in last 2 years) and 2 (the rest). In the 2nd example, enter 1 (low spenders), 2 (medium spenders) and 3 (high spenders).
Move the predictor variables to the Independents box: ‘Income’, ‘Travel’, ‘Vacation’, ‘Hsize’ and ‘Age’.
Select Enter Independents Together (default option).
Click on Statistics. In the pop-up window, in the Descriptives box check Means and Univariate ANOVAS. In the Matrices box check Within Group Correlations. Click Continue.
Click Classify. In the Display box check Summary Table. In the Use Covariance Matrix box check Within Groups. Click Continue.
Click OK.
Classroom Problem…1
Data on Nike was obtained from 45 respondents. Which of the independent variables discriminate between the 2 types of users of Nike?
Dependent variable: 2 types of users of Nike (1 - Not so heavy users, 2 - Heavy users)
Independent variables:
Gender (1 - Females, 2 - Males)
Awareness, Attitude, Preference, Intention and Loyalty
All these are measured on a 7-point scale where 1 - very unfavorable and 7 - very favorable
Classroom Problem…2
Data on Nike was obtained from 45 respondents. Which of the independent variables discriminate between the 3 types of users of Nike?
Dependent variable: 3 types of users of Nike (1 - Light users, 2 - Medium users, 3 - Heavy users)
Independent variables:
Gender (1 - Females, 2 - Males)
Awareness, Attitude, Preference, Intention and Loyalty
All these are measured on a 7-point scale where 1 - very unfavorable and 7 - very favorable
Thank you