Discriminant analysis

Multiple Discriminant Analysis

In multiple discriminant analysis, the dependent variable has more than two categories. For example, the amount spent on a family vacation can be high, medium or low, so this is a three-group discriminant analysis.

The question of interest is whether households that spend high, medium or low amounts on their vacations can be differentiated in terms of annual family income (Income), attitude towards travel (Travel), importance attached to family vacation (Vacation), household size (Hsize) and age of the head of the household (Age).

Group Means

Amount spent   Income   Travel   Vacation   Hsize   Age
1 (Low)        38.57    4.5      4.7        3.1     50.3
2 (Medium)     50.11    4        4.2        3.4     49.5
3 (High)       64.97    6.1      5.9        4.2     56
Total          51.22    4.87     4.93       3.57    51.93

The group means indicate that income differentiates the three groups more widely than any other variable. There is some differentiation on travel and vacation, with group 3 fairly high on both. Groups 1 and 2 are very close on household size and age. Age also has a large standard deviation relative to the separation between the groups.

Group Standard Deviations

Amount spent   Income   Travel   Vacation   Hsize   Age
1 (Low)        5.3      1.72     1.89       1.2     8.1
2 (Medium)     6        2.36     2.49       1.51    9.25
3 (High)       8.61     1.2      1.66       1.14    7.6
Total          12.8     1.98     2.1        1.33    8.57
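As a check on how the Total rows relate to the group rows, here is a minimal Python sketch. It assumes 10 households per group (consistent with the 2 and 27 degrees of freedom used for the F ratios); `total_mean_sd` is a hypothetical helper, not SPSS output. The total mean is the size-weighted average of the group means, and the total variance adds the within-group and between-group sums of squares.

```python
import math

def total_mean_sd(means, sds, sizes):
    """Combine per-group means and SDs into an overall mean and SD.

    Uses SS_total = SS_within + SS_between, with each group SD
    assumed to use an (n_g - 1) denominator.
    """
    n = sum(sizes)
    grand = sum(m * k for m, k in zip(means, sizes)) / n
    ss_within = sum((k - 1) * s ** 2 for s, k in zip(sds, sizes))
    ss_between = sum(k * (m - grand) ** 2 for m, k in zip(means, sizes))
    return grand, math.sqrt((ss_within + ss_between) / (n - 1))

# Income column; 10 households per group is an assumption
mean, sd = total_mean_sd([38.57, 50.11, 64.97], [5.3, 6.0, 8.61], [10, 10, 10])
print(round(mean, 2), round(sd, 1))  # 51.22 12.8
```

The other columns can be checked the same way; for Travel the sketch gives an overall SD of about 1.98.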

Pooled within-groups correlation matrix

           Income    Travel    Vacation   Hsize      Age
Income     1
Travel     0.0512    1
Vacation   0.3068    0.036     1
Hsize      0.3805    0.005     0.2208     1
Age        -0.209    -0.34     -0.01326   -0.02512   1

There is some correlation between Hsize and Income, and between Vacation and Income. Age has some negative correlation with Travel. However, these correlations are not very high, so multicollinearity should not be a serious concern.
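The slides report this matrix without the raw data. As a hedged sketch of what "pooled within-groups" means, the hypothetical helper below takes deviations from each group's own mean before correlating, so differences between the group means do not inflate the correlations. It assumes NumPy and uses random placeholder data, not the vacation survey.

```python
import numpy as np

def pooled_within_corr(X, y):
    """Pooled within-groups correlation matrix of the columns of X.

    Deviations are taken from each group's own mean; the (N - G)
    denominator cancels when the SSCP matrix is rescaled to correlations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    k = X.shape[1]
    W = np.zeros((k, k))  # pooled within-groups SSCP matrix
    for label in np.unique(y):
        Xg = X[y == label]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev
    d = np.sqrt(np.diag(W))
    return W / np.outer(d, d)

# Hypothetical data: 30 cases, 5 predictors, 3 groups of 10
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = np.repeat([1, 2, 3], 10)
R = pooled_within_corr(X, y)
print(R.round(3))
```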

Wilks' Lambda and Univariate F ratio with 2 & 27 degrees of freedom

Variable   Wilks' Lambda   F      Significance
Income     0.26            38     0
Travel     0.79            3.63   0.04
Vacation   0.88            1.83   0.18
Hsize      0.87            1.94   0.16
Age        0.88            1.8    0.18

The univariate F ratios indicate that, when the predictors are considered individually, only income and travel are significant (at the 0.05 level) in differentiating between the three groups.
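Assuming 10 households per group (which matches the 2 and 27 degrees of freedom), the univariate statistics in this table can be recomputed from the group means and standard deviations alone. The sketch below is a consistency check, not SPSS's algorithm; `univariate_f` is a hypothetical helper, and the p-value uses the closed-form survival function of an F distribution with 2 numerator degrees of freedom, (1 + 2F/df2)^(-df2/2).

```python
def univariate_f(means, sds, n_per_group):
    """One-way ANOVA Wilks' lambda, F and p-value from group summaries.

    Assumes equal group sizes. The p-value formula is exact only for
    df1 = 2, i.e. exactly three groups.
    """
    g = len(means)
    n = g * n_per_group
    grand = sum(means) / g
    ss_between = n_per_group * sum((m - grand) ** 2 for m in means)
    ss_within = (n_per_group - 1) * sum(s ** 2 for s in sds)
    df1, df2 = g - 1, n - g
    if df1 != 2:
        raise ValueError("closed-form p-value below needs exactly 3 groups")
    f = (ss_between / df1) / (ss_within / df2)
    wilks = ss_within / (ss_within + ss_between)
    p = (1 + 2 * f / df2) ** (-df2 / 2)  # survival function of F(2, df2)
    return wilks, f, p

groups = {  # (group means, group SDs) from the tables above
    "Income":   ([38.57, 50.11, 64.97], [5.3, 6.0, 8.61]),
    "Travel":   ([4.5, 4.0, 6.1], [1.72, 2.36, 1.2]),
    "Vacation": ([4.7, 4.2, 5.9], [1.89, 2.49, 1.66]),
    "Hsize":    ([3.1, 3.4, 4.2], [1.2, 1.51, 1.14]),
    "Age":      ([50.3, 49.5, 56.0], [8.1, 9.25, 7.6]),
}
for name, (m, s) in groups.items():
    wilks, f, p = univariate_f(m, s, n_per_group=10)
    print(f"{name:9s} lambda={wilks:.2f} F={f:.2f} p={p:.3f}")
```

The printed values agree with the table to within the rounding of the reported means and SDs.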

Number of discriminant functions

In multiple discriminant analysis with G groups, G-1 discriminant functions can be estimated if the number of predictors is at least this large. Thus, with G groups and k predictors, it is possible to estimate up to the smaller of G-1 and k discriminant functions.

The first function has the highest ratio of between-groups to within-groups sum of squares. The second function, uncorrelated with the first, has the second highest ratio, and so on. Not all of the functions need be statistically significant.
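The "highest ratio of between-groups to within-groups sum of squares" criterion is an eigenvalue problem on W⁻¹B, where W and B are the within-groups and between-groups sums-of-squares-and-cross-products matrices. The sketch below assumes NumPy and raw data (random placeholders here, since the survey data are not given); `discriminant_functions` is a hypothetical helper. It keeps min(G-1, k) functions and scales each so that its pooled within-group variance is 1, the usual convention for canonical discriminant functions. The sign of each function is arbitrary, so it may be flipped relative to SPSS output.

```python
import numpy as np

def discriminant_functions(X, y):
    """Canonical discriminant functions from the eigenproblem of W^-1 B.

    X: (n, k) predictors; y: (n,) group labels. Returns eigenvalues
    (largest first), raw coefficients (one column per function) and
    constants, keeping min(G - 1, k) functions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    n, k = X.shape
    g = len(labels)
    grand = X.mean(axis=0)
    W = np.zeros((k, k))  # within-groups SSCP
    B = np.zeros((k, k))  # between-groups SSCP
    for label in labels:
        Xg = X[y == label]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev
        d = Xg.mean(axis=0) - grand
        B += len(Xg) * np.outer(d, d)
    # Whiten with the Cholesky factor W = L L^T; the symmetric matrix
    # L^-1 B L^-T has the same eigenvalues as W^-1 B.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    evals, evecs = np.linalg.eigh(L_inv @ B @ L_inv.T)
    order = np.argsort(evals)[::-1][: min(g - 1, k)]
    # Back-transform and scale so each score has pooled within-group variance 1
    coefs = (L_inv.T @ evecs[:, order]) * np.sqrt(n - g)
    constants = -grand @ coefs  # centres the scores at zero
    return evals[order], coefs, constants

# Hypothetical data: 3 groups of 10 cases on 5 predictors
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=mu, size=(10, 5)) for mu in (0.0, 1.0, 2.0)])
y = np.repeat([1, 2, 3], 10)
evals, coefs, constants = discriminant_functions(X, y)
scores = X @ coefs + constants
print(evals.round(3))                          # largest eigenvalue first
print(np.sqrt(evals / (1 + evals)).round(3))   # canonical correlations
```

The Cholesky whitening is used so that NumPy's symmetric solver `eigh` can be applied, which avoids the small complex parts a general eigen-solver can return for W⁻¹B.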

Canonical Discriminant Functions

Function   Eigenvalue   Percent of Variance   Cumulative Percent   Canonical Correlation
1          3.82         93.93                 93.93                0.89
2          0.25         6.07                  100                  0.45

Since there are G=3 groups & k=5 predictor variables, the number of discriminant functions will be min(G-1,k)=min(2,5)=2

The eigenvalue associated with the first function is 3.82, and it accounts for 93.93% of the explained variance. Because its eigenvalue is much larger, function 1 is far more important than function 2 in discriminating between the groups.
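The eigenvalues determine the other columns of the table: each function's percent of variance is its eigenvalue divided by the sum of the eigenvalues, and its canonical correlation is sqrt(λ / (1 + λ)). A small sketch using the rounded eigenvalues from the table; because of that rounding, the first share comes out near 93.86% rather than 93.93%.

```python
import math

eigenvalues = [3.82, 0.25]  # from the table above (rounded)
G, k = 3, 5                 # groups and predictors
n_functions = min(G - 1, k)

total = sum(eigenvalues)
cumulative = 0.0
for i, lam in enumerate(eigenvalues[:n_functions], start=1):
    share = 100 * lam / total          # percent of variance
    cumulative += share
    r = math.sqrt(lam / (1 + lam))     # canonical correlation
    print(f"Function {i}: {share:.2f}% (cumulative {cumulative:.2f}%), R = {r:.2f}")
```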

After Function   Wilks' Lambda   Chi-square   DF   Sig.
0                0.17            44.83        10   0
1                0.8             5.52         4    0.24

The "After Function 0" row tests the significance of the two functions together, whereas the "After Function 1" row tests function 2 alone, after function 1 has been removed.

Thus, the two functions together significantly differentiate between the three groups. However, when the first function is removed, the second function is not significant at the 0.05 level, so it does not contribute significantly to the group differences.
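The chi-square values appear to come from Bartlett's approximation: chi-square = -(N - 1 - (p + G)/2) ln Λ, with (p - r)(G - r - 1) degrees of freedom after removing r functions, where Λ is the product of 1/(1 + λ) over the remaining eigenvalues. The sketch assumes N = 30 analysis cases (the analysis sample in the classification table), p = 5 predictors and G = 3 groups. The tail probability uses a closed form valid only for even degrees of freedom, and `chi2_sf_even_df` is a hypothetical helper. With the rounded eigenvalues the statistics land close to, not exactly on, the SPSS values.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2:
        raise ValueError("this closed form needs an even df")
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (x / 2) / j  # builds (x/2)^j / j!
        total += term
    return math.exp(-x / 2) * total

eigenvalues = [3.82, 0.25]  # from the canonical functions table
N, p, G = 30, 5, 3          # analysis-sample size, predictors, groups
for r in range(len(eigenvalues)):  # r = number of functions removed
    wilks = math.prod(1 / (1 + lam) for lam in eigenvalues[r:])
    chi2 = -(N - 1 - (p + G) / 2) * math.log(wilks)
    df = (p - r) * (G - r - 1)
    print(f"After function {r}: lambda={wilks:.2f} chi2={chi2:.2f} df={df} "
          f"sig={chi2_sf_even_df(chi2, df):.2f}")
```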

Standardised Canonical Discriminant Function Coefficients

           Func1      Func2
Income     1.0474     -0.42076
Travel     0.33991    0.76851
Vacation   -0.14198   0.53354
Hsize      -0.16317   0.12932
Age        0.49474    0.52447

Pooled within-groups correlations between predictors and discriminant functions (structure matrix)

           Func1     Func2
Income     0.85556   -0.27833
Hsize      0.19319   0.07749
Vacation   0.21935   0.58829
Travel     0.14899   0.45362
Age        0.16576   0.34079

The standardised coefficients show a large coefficient for income on Func1, whereas travel, vacation and age have large coefficients on Func2.

Similarly, the structure correlations show that income and hsize are more highly correlated with Func1 than with Func2, while vacation, travel and age are more highly correlated with Func2 than with Func1.
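Structure correlations can be derived from two other tables: the structure matrix equals the pooled within-groups correlation matrix multiplied by the standardised coefficient matrix. The sketch below does that multiplication in plain Python. Computed this way, the Income, Hsize and Age rows match the structure table, but the values shown there for Travel and Vacation come out interchanged (about 0.219 and 0.589 for Travel, 0.149 and 0.454 for Vacation), which suggests those two row labels may be swapped in the reproduced output.

```python
# Pooled within-groups correlation matrix (order: Income, Travel, Vacation, Hsize, Age)
R = [
    [1.0,     0.0512,   0.3068,   0.3805,  -0.209],
    [0.0512,  1.0,      0.036,    0.005,   -0.34],
    [0.3068,  0.036,    1.0,      0.2208,  -0.01326],
    [0.3805,  0.005,    0.2208,   1.0,     -0.02512],
    [-0.209, -0.34,    -0.01326, -0.02512,  1.0],
]
# Standardised coefficients, columns = Func1, Func2 (same predictor order)
D = [
    [1.0474,  -0.42076],
    [0.33991,  0.76851],
    [-0.14198, 0.53354],
    [-0.16317, 0.12932],
    [0.49474,  0.52447],
]
names = ["Income", "Travel", "Vacation", "Hsize", "Age"]

# Structure correlation of predictor i with function j = sum over m of R[i][m] * D[m][j]
S = [[sum(R[i][m] * D[m][j] for m in range(5)) for j in range(2)] for i in range(5)]
for name, row in zip(names, S):
    print(f"{name:9s} {row[0]: .3f} {row[1]: .3f}")
```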

Group Centroids

Group   Func1      Func2
1       -2.041     0.41847
2       -0.40479   -0.65867
3       2.44578    0.2402

Group 3 has the highest value on function 1. Since function 1 is primarily associated with income and hsize, group 3 is expected to contain households with higher incomes and larger household sizes.

Group 1 is highest on function 2 and group 2 is lowest, so this function separates these two groups. Since function 2 is primarily associated with travel, vacation and age, group 1 is expected to be higher than group 2 on these variables, although function 2 was not statistically significant.

Unstandardised Canonical Discriminant Function Coefficients

           Func1       Func2
Income     0.15427     -0.06197
Travel     0.18680     0.42234
Vacation   -0.06952    0.26127
Hsize      -0.12653    0.10028
Age        0.05928     0.06284
Constant   -11.09442   -3.79160
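The standardised coefficients are the unstandardised ones multiplied by each predictor's pooled within-groups standard deviation. A sketch assuming 10 households per group, so that the pooled variance is the simple average of the three group variances; it reproduces the standardised table to within the rounding of the reported SDs.

```python
import math

# Group SDs per predictor (groups 1-3), from the standard deviations table
sds = {
    "Income":   [5.3, 6.0, 8.61],
    "Travel":   [1.72, 2.36, 1.2],
    "Vacation": [1.89, 2.49, 1.66],
    "Hsize":    [1.2, 1.51, 1.14],
    "Age":      [8.1, 9.25, 7.6],
}
# Unstandardised coefficients (Func1, Func2)
raw = {
    "Income":   (0.15427, -0.06197),
    "Travel":   (0.18680, 0.42234),
    "Vacation": (-0.06952, 0.26127),
    "Hsize":    (-0.12653, 0.10028),
    "Age":      (0.05928, 0.06284),
}
standardised = {}
for name, group_sds in sds.items():
    # Equal group sizes (assumed): pooled variance = mean of the group variances
    pooled_sd = math.sqrt(sum(s ** 2 for s in group_sds) / len(group_sds))
    standardised[name] = tuple(b * pooled_sd for b in raw[name])
    print(f"{name:9s} {standardised[name][0]: .4f} {standardised[name][1]: .4f}")
```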

Thus the two discriminant functions are:

Func1 = -11.09442 + 0.15427*Income + 0.18680*Travel - 0.06952*Vacation - 0.12653*Hsize + 0.05928*Age

Func2 = -3.79160 - 0.06197*Income + 0.42234*Travel + 0.26127*Vacation + 0.10028*Hsize + 0.06284*Age
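Plugging each group's mean profile into the two equations reproduces the group centroids to within rounding, which is a useful check on the coefficients. The sketch also classifies a hypothetical household by the nearest centroid in (Func1, Func2) space. With equal prior probabilities and both functions retained this is one common allocation rule, but SPSS's own classification depends on its classification functions and prior settings, so treat this as an illustration; `scores` and `classify` are hypothetical helpers and the household's values are made up.

```python
import math

CONST = (-11.09442, -3.79160)
COEF = {  # unstandardised coefficients (Func1, Func2)
    "Income": (0.15427, -0.06197),
    "Travel": (0.18680, 0.42234),
    "Vacation": (-0.06952, 0.26127),
    "Hsize": (-0.12653, 0.10028),
    "Age": (0.05928, 0.06284),
}

def scores(case):
    """Discriminant scores (Func1, Func2) for a dict of predictor values."""
    return tuple(CONST[j] + sum(COEF[v][j] * case[v] for v in COEF) for j in range(2))

group_means = {
    1: {"Income": 38.57, "Travel": 4.5, "Vacation": 4.7, "Hsize": 3.1, "Age": 50.3},
    2: {"Income": 50.11, "Travel": 4.0, "Vacation": 4.2, "Hsize": 3.4, "Age": 49.5},
    3: {"Income": 64.97, "Travel": 6.1, "Vacation": 5.9, "Hsize": 4.2, "Age": 56.0},
}
# Scoring the group means approximately reproduces the group centroids
centroids = {g: scores(m) for g, m in group_means.items()}

def classify(case):
    """Assign a case to the group whose centroid is nearest (equal priors assumed)."""
    s = scores(case)
    return min(centroids, key=lambda g: math.dist(s, centroids[g]))

# Hypothetical household, not from the study data
household = {"Income": 60.0, "Travel": 5.0, "Vacation": 5.0, "Hsize": 4.0, "Age": 52.0}
print(centroids)
print(classify(household))  # 3
```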

Analysis sample

                     Predicted Group Membership
Original   Amount    1      2      3      Total
Count      1         9      1      0      10
           2         1      9      0      10
           3         0      2      8      10
           Total     10     12     8      30
%          1         90     10     0      100
           2         10     90     0      100
           3         0      20     80     100

Hit Ratio 86.7%

Holdout sample

                     Predicted Group Membership
Original   Amount    1      2      3      Total
Count      1         3      1      0      4
           2         0      3      1      4
           3         1      0      3      4
           Total     4      4      4      12
%          1         75     25     0      100
           2         0      75     25     100
           3         25     0      75     100

Hit Ratio 75%

With three groups of equal size, chance alone would be expected to give a hit ratio of 1/3 = 33.3%. The hit ratios of 86.7% (analysis sample) and 75% (holdout sample) are a large improvement over chance, which validates the discriminant function.
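The hit ratios and the chance benchmark can be computed directly from the classification counts. The chance rate here is the sum of squared group proportions (the proportional chance criterion), which reduces to 1/3 for three equal groups; `hit_ratio` and `proportional_chance` are hypothetical helper names.

```python
def hit_ratio(confusion):
    """Share of cases on the diagonal of a square confusion matrix (rows = actual)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def proportional_chance(confusion):
    """Sum of squared actual-group proportions; equals 1/G for G equal groups."""
    sizes = [sum(row) for row in confusion]
    n = sum(sizes)
    return sum((s / n) ** 2 for s in sizes)

analysis = [[9, 1, 0], [1, 9, 0], [0, 2, 8]]  # counts, analysis sample
holdout = [[3, 1, 0], [0, 3, 1], [1, 0, 3]]   # counts, holdout sample
for label, table in (("Analysis", analysis), ("Holdout", holdout)):
    print(f"{label}: hit ratio {hit_ratio(table):.1%}, chance {proportional_chance(table):.1%}")
```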

Example…1

A recent survey asked business people about their concerns in hiring and retaining employees during the current harsh economic environment.

If an organisation wants to retain its employees, it must learn why people leave their jobs and why others stay and are satisfied with their jobs.

Discriminant analysis was used to determine which factors explained the differences between salespeople who left a large computer manufacturing company and those who stayed.

Example…2

The independent variables were company rating, job security, seven job satisfaction dimensions, four role-conflict dimensions, four role-ambiguity dimensions and nine measures of sales performance.

The dependent variable was dichotomous: those who stayed and those who left.

The canonical correlation, an index of discrimination (R = 0.4572), was significant (p = 0.0180).

The results indicated that the variables discriminated between those who left and those who stayed.
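The canonical correlation can be converted into the other summary statistics used earlier in this deck. In a two-group problem there is only one discriminant function, so Wilks' lambda = 1 - R² and the eigenvalue = R² / (1 - R²). A small sketch using the reported R; the p-value is not recomputed because the sample size is not given here.

```python
R = 0.4572  # canonical correlation reported for the salesperson study

r_squared = R ** 2                        # share of discriminant-score variance tied to group membership
wilks = 1 - r_squared                     # Wilks' lambda for a single discriminant function
eigenvalue = r_squared / (1 - r_squared)  # between/within ratio for that function
print(f"R^2 = {r_squared:.3f}, Wilks' lambda = {wilks:.3f}, eigenvalue = {eigenvalue:.3f}")
```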

Discriminant Analysis Results

No.  Variable                    Coefficients   Standardised Coefficients   Structure Correlations
1    Work                        0.0903         0.391                       0.5446
2    Promotion                   0.0288         0.1515                      0.5044
3    Job Security                0.1567         0.1384                      0.4958
4    Customer Relations          0.0086         0.1751                      0.4906
5    Company Rating              0.4059         0.324                       0.4824
6    Working with others
7    Overall performance
8    Time-territory management
9    Sales produced
10   Presentation skill
11   Technical Information
12   Pay-benefits
13   Quota achieved
14   Management
15   Information collection
16   Family
17   Sales manager
18   Coworker
19   Customer
20   Family
21   Job
22   Job
23   Customer
24   Sales manager
25   Sales manager
26   Customer

Characteristic profile…1

In this example, Promotion was identified as the second most important variable on the basis of the structure correlations.

However, on the basis of the standardised discriminant function coefficients, Promotion is not the second most important variable.

The anomaly arises because of multicollinearity. In such cases, develop a characteristic profile for each group by describing each group in terms of the group means for the predictor variables.

Characteristic profile…2

                   Promotion   Company Rating
Those who stayed   4.5         4
Those who left     2.3         3.83
Overall            3.42        3.92

Clearly, promotion discriminates between the two groups more than company rating does. Those who stayed with the company are satisfied with the promotions.
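The comparison amounts to looking at the gap between the two group means on each variable. A trivial sketch using the profile values above, assuming both variables are rated on the same scale so that the raw gaps are comparable.

```python
profile = {  # group means from the characteristic profile table
    "Promotion":      {"stayed": 4.5, "left": 2.3},
    "Company Rating": {"stayed": 4.0, "left": 3.83},
}
gaps = {}
for variable, m in profile.items():
    gaps[variable] = m["stayed"] - m["left"]  # larger gap = more discriminating
    print(f"{variable}: stayed - left = {gaps[variable]:.2f}")
```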

Discriminant Analysis using SPSS: Analyse > Classify > Discriminant

Select Analyse from the SPSS menu bar. Click Classify and then Discriminant. Move the criterion variable into the Grouping Variable box: ‘Taken Vacation’ in the 1st example; ‘Amt spent on vacation’ in the 2nd example. Click Define Range.

Enter the group codes: 1 = taken a vacation in the last 2 years and 2 = the rest (1st example); 1 = low spenders, 2 = medium spenders and 3 = high spenders (2nd example).

Move the predictor variables (‘Income’, ‘Travel’, ‘Vacation’, ‘Hsize’ and ‘Age’) into the Independents box.

Select Enter Independents Together (the default option). Click Statistics. In the pop-up window, in the Descriptives box check Means and Univariate ANOVAs. In the Matrices box check Within Group Correlations. Click Continue.

Click Classify. In the Display box check Summary Table. In the Use Covariance Matrix box check Within Groups. Click Continue.

Click OK.

Classroom Problem…1

Data on Nike were obtained from 45 respondents. Which of the independent variables discriminate between the 2 types of users of Nike?

Dependent variable: 2 types of users of Nike (1 = not-so-heavy users, 2 = heavy users)

Independent variables: Gender (1 = females, 2 = males), Awareness, Attitude, Preference, Intention and Loyalty

Awareness, Attitude, Preference, Intention and Loyalty are measured on a 7-point scale where 1 = very unfavorable and 7 = very favorable.

Classroom Problem…2

Data on Nike were obtained from 45 respondents. Which of the independent variables discriminate between the 3 types of users of Nike?

Dependent variable: 3 types of users of Nike (1 = light users, 2 = medium users, 3 = heavy users)

Independent variables: Gender (1 = females, 2 = males), Awareness, Attitude, Preference, Intention and Loyalty

Awareness, Attitude, Preference, Intention and Loyalty are measured on a 7-point scale where 1 = very unfavorable and 7 = very favorable.
