discriminant analysis

KRISHNA D K ROLL NO: 20510 DIVISION OF AGRICULTURAL EXTENSION Discriminant Analysis 1 Credit Seminar

Transcript of discriminant analysis

Page 1: discriminant analysis


KRISHNA D K

ROLL NO: 20510

DIVISION OF AGRICULTURAL EXTENSION

Discriminant Analysis

Credit Seminar

Page 2: discriminant analysis

Discriminant Analysis

Discriminant analysis (DA) is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature.

It is a technique to discriminate between two or more mutually exclusive and exhaustive groups on the basis of some explanatory variables.

Types of DA

Linear DA: used when the criterion or dependent variable has two categories, e.g. adopters and non-adopters.

Multiple DA: used when three or more categories are involved, e.g. SHG1, SHG2, SHG3.

Page 3: discriminant analysis

Similarities and Differences

ANALYSIS                                  ANOVA         REGRESSION   DISCRIMINANT

Similarities
1. Number of dependent variables          One           One          One
2. Number of independent variables        Multiple      Multiple     Multiple

Differences
1. Nature of the dependent variable       Metric        Metric       Categorical
2. Nature of the independent variables    Categorical   Metric       Metric

Page 4: discriminant analysis

Assumptions

1. Sample size (n): The group sizes of the dependent variable should not be grossly different (for example, an 80:20 split). The sample size should be at least five times the number of independent variables.

2. Normal distribution: Each of the independent variables is normally distributed.

3. Homogeneity of variances / covariances: All variables have linear and homoscedastic relationships.

4. Outliers: Outliers should not be present in the data. DA is highly sensitive to the inclusion of outliers.

Page 5: discriminant analysis

5. Non-multicollinearity: There should be no multicollinearity among the independent variables.

6. Mutually exclusive: The groups must be mutually exclusive, with every subject or case belonging to only one group.

7. Classification: Each case is correctly allocated to its category of the dependent variable in the initial classification.

Page 6: discriminant analysis

Discriminant Analysis Model

The discriminant analysis model involves linear combinations of the following form:

D = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + . . . + b_k X_k

where

D = discriminant score

b's = discriminant coefficients or weights

X's = predictors or independent variables

The coefficients, or weights (b), are estimated so that the groups differ as much as possible on the values of the discriminant function.

Discriminant analysis creates an equation that minimizes the possibility of misclassifying cases into their respective groups or categories.
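One common way to make the groups "differ as much as possible" in the two-group case is Fisher's criterion, where the weights are proportional to the inverse of the pooled within-group covariance matrix times the difference in group means. The sketch below is a minimal illustration under stated assumptions: it uses only two predictors (quality and accessibility, as read from the hypothetical adoption example in these slides), it omits the constant b_0, and the function names are illustrative. The scale of the weights is arbitrary, so they will not match the coefficients reported by a statistics package.

```python
# Two-group Fisher discriminant weights for two predictors (pure Python).
# Data: (quality, accessibility) per person, as read from the hypothetical
# example; the reading of the garbled slide table is an assumption.
group_a = [(8, 9), (6, 7), (10, 6), (9, 4), (4, 8)]   # would adopt
group_b = [(5, 4), (3, 7), (4, 5), (2, 4), (2, 2)]    # would not adopt


def mean_vector(rows):
    """Mean of each of the two predictors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, centre):
    """2x2 sums of squares and cross-products around the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - centre[0], r[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
sa, sb = scatter(group_a, mean_a), scatter(group_b, mean_b)

# Pooled within-group covariance matrix (degrees of freedom: n - 2 groups).
df = len(group_a) + len(group_b) - 2
pooled = [[(sa[i][j] + sb[i][j]) / df for j in range(2)] for i in range(2)]

# Invert the 2x2 matrix and multiply by the difference in group means.
det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
inv = [[pooled[1][1] / det, -pooled[0][1] / det],
       [-pooled[1][0] / det, pooled[0][0] / det]]
diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
weights = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]


def score(row):
    """Discriminant score without a constant term."""
    return weights[0] * row[0] + weights[1] * row[1]


centroid_a = sum(score(r) for r in group_a) / len(group_a)
centroid_b = sum(score(r) for r in group_b) / len(group_b)
print("weights:", weights)
print("centroids:", centroid_a, centroid_b)
```

Both weights come out positive here, so higher quality and higher accessibility both push a person's score toward the adopters' centroid.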

Page 7: discriminant analysis

Hypothesis

Discriminant analysis tests the following hypotheses:

H0: The group means of a set of independent variables for two or more groups are equal.

against

H1: The group means for two or more groups are not equal.

The set of group means is referred to as a centroid.

Page 8: discriminant analysis

Statistics Associated with Discriminant Analysis

Canonical correlation. Canonical correlation measures the extent of association between the discriminant scores and the groups.

It is a measure of association between the single discriminant function and the set of dummy variables that define the group membership.

The canonical correlation is the multiple correlation between the predictors and the discriminant function.

Centroid. The centroid is the mean value of the discriminant scores for a particular group. There are as many centroids as there are groups, one for each group. The means for a group on all the functions are the group centroids.

Page 9: discriminant analysis

Classification matrix. Sometimes also called the confusion or prediction matrix, the classification matrix contains the number of correctly classified and misclassified cases.
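A classification matrix can be tallied directly from the actual and predicted group labels. The sketch below is a minimal illustration; the ten labels are hypothetical (they are not taken from the slides), and the function names are illustrative only.

```python
def classification_matrix(actual, predicted, labels):
    """Count cases by actual group (rows) and predicted group (columns)."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def hit_ratio(matrix):
    """Share of cases on the diagonal, i.e. correctly classified."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return correct / total


# Hypothetical labels for ten cases (illustration only).
labels = ["adopter", "non-adopter"]
actual = ["adopter"] * 5 + ["non-adopter"] * 5
predicted = ["adopter", "adopter", "adopter", "adopter", "non-adopter",
             "non-adopter", "non-adopter", "non-adopter", "adopter",
             "non-adopter"]

matrix = classification_matrix(actual, predicted, labels)
for label in labels:
    print(label, matrix[label])
print("hit ratio:", hit_ratio(matrix))
```

The off-diagonal cells are the misclassified cases; the hit ratio summarizes the diagonal as a single proportion.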

Discriminant function coefficients. The discriminant function coefficients (unstandardized) are the multipliers of variables, when the variables are in the original units of measurement.

F values and their significance. These are calculated from a one-way ANOVA, with the grouping variable serving as the categorical independent variable. Each predictor, in turn, serves as the metric dependent variable in the ANOVA.
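The univariate F value for one predictor can be sketched as an ordinary one-way ANOVA with the groups as the factor. The example below assumes the quality (x1) scores as read from the hypothetical adoption example in these slides; the function name is illustrative.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Quality (x1) scores; the reading of the slide table is an assumption.
quality_adopt = [8, 6, 10, 9, 4]
quality_not_adopt = [5, 3, 4, 2, 2]

f_value, df1, df2 = one_way_anova_f([quality_adopt, quality_not_adopt])
print(f_value, df1, df2)
```

For these values the between-group sum of squares is 44.1 and the within-group sum of squares is 30.0, giving F = 44.1 / (30.0 / 8) = 11.76 on 1 and 8 degrees of freedom.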

Page 10: discriminant analysis

Discriminant scores. The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant scores.

Eigenvalue. For each discriminant function, the Eigenvalue is the ratio of between-group to within-group sums of squares. Large Eigenvalues imply superior functions.

Pooled within-group correlation matrix. The pooled within-group correlation matrix is computed by averaging the separate covariance matrices for all the groups.

Page 11: discriminant analysis

Standardized discriminant function coefficients. The standardized discriminant function coefficients are the discriminant function coefficients used as the multipliers when the variables have been standardized to a mean of 0 and a variance of 1.

Structure correlations. Also referred to as discriminant loadings, the structure correlations represent the simple correlations between the predictors and the discriminant function.

Group means and group standard deviations. These are computed for each predictor for each group.

Page 12: discriminant analysis

Wilks' lambda. Sometimes also called the U statistic, Wilks' λ for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1.

Large values of λ (near 1) indicate that group means do not seem to be different. Small values of λ (near 0) indicate that the group means seem to be different. For a single discriminant function, λ equals (1 - R²), where R is the canonical correlation.

It is used to measure how well each function separates cases into groups. It also indicates the significance of the discriminant function and provides the proportion of total variability not explained.
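The per-predictor definition (within-group sum of squares divided by total sum of squares) can be sketched in a few lines. The example assumes the quality (x1) and price (x3) scores as read from the hypothetical adoption example in these slides; the function name is illustrative.

```python
def wilks_lambda(groups):
    """Univariate Wilks' lambda: within-group SS divided by total SS."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return ss_within / ss_total


# [would adopt, would not adopt]; the table reading is an assumption.
quality = [[8, 6, 10, 9, 4], [5, 3, 4, 2, 2]]
price = [[6, 5, 3, 4, 2], [7, 2, 5, 3, 2]]

lambda_quality = wilks_lambda(quality)
lambda_price = wilks_lambda(price)
print(lambda_quality, lambda_price)
```

Quality gives λ = 30.0 / 74.1, roughly 0.40, so the group means differ clearly; price gives λ = 28.8 / 28.9, very close to 1, so the two groups are almost indistinguishable on price alone.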

Page 13: discriminant analysis

Linear discriminant analysis: Hypothetical example

Groups based on adoption intention   Quality (x1)   Accessibility (x2)   Price (x3)

Group A: would adopt
Person 1                             8              9                    6
Person 2                             6              7                    5
Person 3                             10             6                    3
Person 4                             9              4                    4
Person 5                             4              8                    2

Group B: would not adopt
Person 6                             5              4                    7
Person 7                             3              7                    2
Person 8                             4              5                    5
Person 9                             2              4                    3
Person 10                            2              2                    2

Page 14: discriminant analysis

[Figure: quality (x1) plotted by person, marking adopters, non-adopters and the mis-classification]

Page 15: discriminant analysis

[Figure: accessibility (x2) plotted by person, marking adopters, non-adopters and the mis-classification]

Page 16: discriminant analysis

[Figure: price (x3) plotted by person, marking adopters, non-adopters and the mis-classification]

Page 17: discriminant analysis

Output:

Function   Eigenvalue   % of variance   Cumulative %   Canonical correlation
1          3.315        100             100            0.877

Test of functions   Wilks' lambda   Chi-square   d.f.   Sig.
1                   0.232           9.504        3      0.023

Standardised canonical discriminant function coefficients

      Function 1
x1    1.110
x2    0.709
x3    -0.564

The discriminant function can be written as

Z_i = 1.110 x1 + 0.709 x2 - 0.564 x3

Note: a larger eigenvalue and a smaller Wilks' lambda are preferred.
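With a single discriminant function, the eigenvalue, the canonical correlation and Wilks' lambda are tied together by two standard identities: R = sqrt(eigenvalue / (1 + eigenvalue)) and lambda = 1 / (1 + eigenvalue). The sketch below checks the reported output against these identities; the function names are illustrative, and small differences are expected because the reported eigenvalue is itself rounded.

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by one function's eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_from_eigenvalue(eigenvalue):
    """Wilks' lambda when there is only one discriminant function."""
    return 1.0 / (1.0 + eigenvalue)


eigenvalue = 3.315  # value reported in the output table
r_canonical = canonical_correlation(eigenvalue)
wilks = wilks_from_eigenvalue(eigenvalue)

# Compare with the reported values 0.877 and 0.232 (agreement to rounding).
print(r_canonical, wilks, 1.0 - r_canonical ** 2)
```

The last printed value shows that 1 minus the squared canonical correlation reproduces Wilks' lambda, which is why a larger eigenvalue and a smaller lambda tell the same story.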

Page 18: discriminant analysis

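Predicting group membership: The group centroids, calculated by taking the mean of the discriminant scores within each group, are 10.77 and 4.52. The cut-off score is the average of the two centroids, 7.65.

Using the cut-off score, one can predict a person's category on the dependent variable, i.e. adopting or not adopting: a discriminant score above the cut-off places the person with the adopters, and a score below it with the non-adopters.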
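The centroid and cut-off calculation can be reproduced by applying the reported function Z = 1.110 x1 + 0.709 x2 - 0.564 x3 to each person's raw scores. The sketch below assumes the data values as read from the hypothetical example table; the helper names are illustrative. The simple average of the centroids is used as the cut-off because the two groups are of equal size.

```python
# Coefficients of the discriminant function reported in the output.
COEFFS = (1.110, 0.709, -0.564)

# (quality, accessibility, price); the table reading is an assumption.
adopters = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]
non_adopters = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]


def z_score(person):
    """Discriminant score: weighted sum of the three predictors."""
    return sum(b * x for b, x in zip(COEFFS, person))


def centroid(group):
    """Mean discriminant score of a group."""
    return sum(z_score(p) for p in group) / len(group)


centroid_a = centroid(adopters)
centroid_b = centroid(non_adopters)
cutoff = (centroid_a + centroid_b) / 2  # equal group sizes


def predict(person):
    """Assign a person to the group on their side of the cut-off."""
    return "adopt" if z_score(person) > cutoff else "not adopt"


print(centroid_a, centroid_b, cutoff)
print([predict(p) for p in adopters + non_adopters])
```

Under these assumptions the centroids come out at about 10.78 and 4.53 and the cut-off at about 7.65, in line with the reported 10.77, 4.52 and 7.65, and all ten people fall on the correct side of the cut-off.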

Page 19: discriminant analysis

Multiple discriminant analysis

When we need to discriminate among more than two groups, we use multiple discriminant analysis.

This technique requires fitting g - 1 discriminant functions, where g is the number of groups (or as many functions as there are predictors, if that number is smaller).

The assumptions remain the same for this type too. The best discriminant function (D) is judged by comparing the functions.

Page 20: discriminant analysis

Case study 1:

Title: A study on agri-entrepreneurship behaviour of farmers

Author: Dipika Hajong (2014)

Tool used for: discriminating between agripreneurs and non-agripreneurs on the variables that contribute to their behaviour

Sample size: 20 entrepreneurs and 30 conventional farmers

Variables: aspiration, information processing behaviour, proactiveness, information passing frequency, social network, resiliency, autonomy, total land and hope of success

Finding: confidential

Page 21: discriminant analysis

Case study 2:

Application of Discriminant Function Analysis in Agricultural Extension Research

Ayogu, Chiebonam Justina; Madukwe, Micheal C.; Yekinni, Oyedeji Taofeeq

A research study was carried out to select the variables which could best discriminate between two groups of extension agents (EAs): effective extension agents (Group 1) and ineffective extension agents (Group 2).

Note: join ResearchGate

Page 22: discriminant analysis

1. Analyze > Classify > Discriminant

Page 23: discriminant analysis

2. Click the Define Range button and enter the lowest and highest codes for your groups.

Page 24: discriminant analysis

3. Click the Statistics button and select Means, Univariate ANOVAs and Box's M.

Page 25: discriminant analysis

4. Click Save, select Predicted Group Membership and Discriminant Scores, then click Continue.

Page 26: discriminant analysis

Findings of case study 2:

EA group         Variable                              Mean      Std. Deviation   N (unweighted)   N (weighted)
Ineffective EA   Age                                   31.0588   8.20420          17               17.000
                 Years of experience                   19.1765   10.42374         17               17.000
                 Distance of residence to work place   3.1765    2.24264          17               17.000
                 Communication skills                  7.2312    4.28795          17               17.000
                 Positive attitude to work             1.4706    .71743           17               17.000
Effective EA     Age                                   31.8182   10.33941         33               33.000
                 Years of experience                   31.3030   8.47233          33               33.000
                 Distance of residence to work place   3.3636    2.54728          33               33.000
                 Communication skills                  16.8406   5.37169          33               33.000
                 Positive attitude to work             2.6364    .65279           33               33.000
Total            Age                                   31.5600   9.58775          50               50.000
                 Years of experience                   27.1800   10.77164         50               50.000
                 Distance of residence to work place   3.3000    2.42647          50               50.000
                 Communication skills                  13.5734   6.78175          50               50.000
                 Positive attitude to work             2.2400    .87037           50               50.000

Page 27: discriminant analysis

Variable                              Wilks' Lambda   F        df1   df2   Sig.
Age                                   .999            .069     1     48    .794
Years of experience                   .710            19.625   1     48    .000
Distance of residence to work place   .999            .065     1     48    .799
Communication skills                  .540            40.846   1     48    .000
Positive attitude to work             .589            33.464   1     48    .000

The table provides statistical evidence of significant differences between the means of the effective and ineffective EA groups for years of experience, communication skills and positive attitude to work (Sig. = .000), with communication skills and positive attitude to work producing the highest F values. Age and distance of residence to work place do not differ significantly between the groups (Sig. = .794 and .799).
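For a single predictor, the F value and Wilks' lambda in this table carry the same information, since F = ((1 - lambda) / lambda) × (df2 / df1). The sketch below recomputes F from the reported lambda values for the three significant predictors; the function name is illustrative, and small gaps from the reported F values are expected because lambda is reported to only three decimals.

```python
def f_from_wilks(wilks, df1, df2):
    """F statistic implied by a univariate Wilks' lambda."""
    return ((1.0 - wilks) / wilks) * (df2 / df1)


# predictor: (reported Wilks' lambda, reported F), with df1 = 1, df2 = 48
reported = {
    "Years of experience": (0.710, 19.625),
    "Communication skills": (0.540, 40.846),
    "Positive attitude to work": (0.589, 33.464),
}

recomputed = {}
for name, (wilks, f_reported) in reported.items():
    recomputed[name] = f_from_wilks(wilks, 1, 48)
    print(name, round(recomputed[name], 2), "reported:", f_reported)
```

The smaller the lambda, the larger the implied F, which is why communication skills (lambda = .540) has the highest F value in the table.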

Page 28: discriminant analysis

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .350            47.825       5    .000

The significance of the discriminant function is indicated by Wilks' lambda, which also gives the proportion of total variability not explained, i.e. it is the complement of (one minus) the squared canonical correlation.
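The chi-square value that accompanies Wilks' lambda is commonly obtained from Bartlett's approximation, chi-square = -[(n - 1) - (p + g) / 2] × ln(lambda), with p × (g - 1) degrees of freedom when all functions are tested together. The sketch below applies it to the case-study figures (n = 50 agents, p = 5 predictors, g = 2 groups); that the software used exactly this approximation is an assumption, and the function name is illustrative.

```python
import math


def bartlett_chi_square(wilks, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' lambda."""
    multiplier = (n_cases - 1) - (n_predictors + n_groups) / 2.0
    chi_square = -multiplier * math.log(wilks)
    df = n_predictors * (n_groups - 1)
    return chi_square, df


# Case study 2: lambda = .350, 50 agents, 5 predictors, 2 groups.
chi_square, df = bartlett_chi_square(0.350, 50, 5, 2)
print(chi_square, df)  # compare with the reported 47.825 on 5 df
```

The recomputed value agrees with the table to within rounding of lambda, and the degrees of freedom match the reported 5.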

Page 29: discriminant analysis

Pooled Within-Groups Matrices

Correlation                               (1)     (2)     (3)     (4)     (5)
(1) Age                                   1.000   .094    -.149   -.036   .243
(2) Years of experience                   .094    1.000   -.231   .139    .021
(3) Distance of residence to work place   -.149   -.231   1.000   -.198   -.303
(4) Communication skills                  -.036   .139    -.198   1.000   .214
(5) Positive attitude to work             .243    .021    -.303   .214    1.000

The within-groups correlation matrix shows the correlations between the predictors.

Page 30: discriminant analysis

An eigenvalue provides information on the proportion of variance explained. A canonical correlation of 0.807 suggests the model explains about 65.1% (i.e. 0.807² × 100) of the variation in the grouping variable, i.e. whether an extension agent is effective or ineffective.

Eigenvalues table

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.861        100.0           100.0          .807

Page 31: discriminant analysis

Structure matrix table

                                      Function 1
Communication skills                  .676
Positive attitude to work             .612
Years of experience                   .469
Age                                   .028
Distance of residence to work place   .027

The unstandardized coefficients (b), listed in the Canonical Discriminant Function Coefficients table, operate like unstandardized b coefficients in regression and are used to create the actual prediction equation, which is used to classify new cases.

Page 32: discriminant analysis

Canonical Discriminant Function Coefficients table

                                      Function 1
Age                                   -.009
Years of experience                   .053
Distance of residence to work place   .175
Communication skills                  .110
Positive attitude to work             .940
(Constant)                            -5.329

Unstandardized coefficients

Page 33: discriminant analysis

D = (-0.009 × age) + (0.053 × years of experience in extension work) + (0.175 × distance of residence to work place) + (0.110 × communication skill) + (0.940 × positive attitude to work) - 5.329
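Because D is linear in the predictors, plugging each group's mean values into the equation gives that group's centroid. The sketch below does this with the coefficients and the group means reported in the case-study tables; the dictionary keys and function name are illustrative shorthand, not names from the study.

```python
# Unstandardized coefficients and constant reported for case study 2.
COEFFICIENTS = {
    "age": -0.009,
    "experience": 0.053,
    "distance": 0.175,
    "communication": 0.110,
    "attitude": 0.940,
}
CONSTANT = -5.329


def discriminant_score(case):
    """D = constant + sum of coefficient times predictor value."""
    return CONSTANT + sum(COEFFICIENTS[k] * case[k] for k in COEFFICIENTS)


# Group means taken from the group statistics table.
ineffective_means = {"age": 31.0588, "experience": 19.1765,
                     "distance": 3.1765, "communication": 7.2312,
                     "attitude": 1.4706}
effective_means = {"age": 31.8182, "experience": 31.3030,
                   "distance": 3.3636, "communication": 16.8406,
                   "attitude": 2.6364}

d_ineffective = discriminant_score(ineffective_means)
d_effective = discriminant_score(effective_means)
print(d_ineffective, d_effective)
```

The two centroids come out at about -1.86 for ineffective agents and about 0.96 for effective agents, so a new agent's D score can be read against these values: the higher the score, the more the agent resembles the effective group. Weighted by group size (17 and 33), the centroids average to approximately zero, as expected for unstandardized discriminant scores.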

Page 34: discriminant analysis

Advantages

Discrimination of different groups.

Accuracy of classification of groups can be determined.

Helps in regression-type analysis when the dependent variable is categorical.

Visual graphics give a clear understanding of the two or more categories, along with the computational logic.

Page 35: discriminant analysis

Limitations

Linear discrimination cannot be used when subgroups are stronger.

The selection of the predictor variables is not strong until a strong classification exists.

It cannot be used when there is insufficient data to define sample means.

Page 36: discriminant analysis

Limitations (contd.)

If the number of observations is small, the discrimination method cannot be used: the sample should be at least 5 times the number of predictor variables (Meyers et al., 2006, Applied Multivariate Research).

If the overlap in the distribution is small, the discriminant function separates the groups well.

If the overlap is large, the function is a poor discriminator between the groups.

Page 37: discriminant analysis

Applications

Predictive and descriptive DA (Meyers et al., 2006, Applied Multivariate Research)

Agriculture: fisheries, crop studies, yield studies, geoinformatics, bioinformatics, social science research

Socio-economic and behavioural studies of rural communities

Hydrological and physico-chemical studies in different water resources

Bankruptcy prediction based on accounting ratios and other financial variables (LDA)

Face recognition (computerized)

Marketing: distinguishing different types of customers and products based on surveys

Page 38: discriminant analysis

References

1. Hajong, D. (2014). A study on agri-entrepreneurship behaviour of farmers. PhD thesis. IARI, New Delhi.

2. Kothari, C. R. (2004). Research methodology: Methods and techniques. New Age International.

3. Meyers, L. S., Gamst, G., & Guarino, A. J. (2006). Applied multivariate research: Design and interpretation. Sage.

4. Poulsen, J., & French, A. (2008). Discriminant function analysis. San Francisco State University: San Francisco, CA.

5. SPSS Chapter 25 Data File B. Retrieved from www.uk.sagepub.com/

6. www.youtube.com/watch?v=7zYcMZ-61c4

Page 39: discriminant analysis

Conclusion

All great men are gifted with intuition. Just an analysis and a reasoning will fructify their contribution.

Thank You