Discriminant Analysis.ppt

Transcript of Discriminant Analysis.ppt

Page 1: Discriminant Analysis.ppt

DISCRIMINANT ANALYSIS

Discriminant analysis is a technique for analysing data when the dependent variable (DV) is categorical (classification) and the predictor or independent variables (IVs) are of interval or ratio nature.

IMPORTANT

DV: Non-metric (nominal or ordinal scaled) classification/grouping variable

IVs: Metric variables (interval or ratio scaled variables)

Page 2: Discriminant Analysis.ppt

Examples:

DV is the choice of a brand of PC (A, B or C) and the IVs are ratings of attributes of PCs on a 7-point scale.

Classification of customers into buyers and non-buyers based on their demographic profiles such as age, income, sex and some factors related to shopping habits.

Families who go/don’t go for vacation holidays to holiday resorts as the criterion variable, and income, household size, attitude towards travel, importance given to family vacation, etc. as predictor variables.

Page 3: Discriminant Analysis.ppt

Discriminant Analysis

PURPOSE: to understand segmentation/classifications and to predict group membership

INPUT: dependent variable as an indicator of group membership and independent variables as classification criteria

KEY OUTPUT: classification matrix

Page 4: Discriminant Analysis.ppt

The objectives of this technique are

1. Development of a discriminant function, which is a linear combination of independent variables that best discriminates between the categories of the dependent variable (groups).

2. Examine whether significant differences exist among the groups in terms of the predictor variables.

Page 5: Discriminant Analysis.ppt

3. Determine which predictor variables contribute most to the inter-group differences.

4. Classification of cases into one of the groups based on the values of the predictor variables.

5. Evaluating the accuracy of the classification.
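Objectives 1 and 4 can be illustrated with a minimal sketch of Fisher's two-group linear discriminant on two predictors. This is one standard way to obtain such a function and is not necessarily the procedure the slides have in mind. The function name fit_two_group_lda, the toy data and the choice of placing the zero point midway between the group means (which assumes equal group priors) are all hypothetical.

```python
from statistics import mean


def fit_two_group_lda(group_a, group_b):
    """Return (b0, b1, b2) so that D = b0 + b1*x1 + b2*x2 is positive
    on group A's side of the midpoint between the two group means."""

    def means(rows):
        return [mean(r[j] for r in rows) for j in range(2)]

    def scatter(rows, m):
        # 2x2 within-group sums of squares and cross-products
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    ma, mb = means(group_a), means(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    dof = len(group_a) + len(group_b) - 2
    # Pooled within-group covariance matrix
    s = [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Coefficients: inverse pooled covariance times difference of means
    b1 = inv[0][0] * diff[0] + inv[0][1] * diff[1]
    b2 = inv[1][0] * diff[0] + inv[1][1] * diff[1]
    # Constant chosen so that D = 0 at the midpoint of the group means
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    b0 = -(b1 * mid[0] + b2 * mid[1])
    return b0, b1, b2


# Hypothetical toy data: (x1, x2) pairs for two groups
group_a = [(8.0, 5.0), (9.0, 6.0), (7.0, 4.0), (10.0, 5.0)]
group_b = [(3.0, 1.0), (4.0, 2.0), (2.0, 2.0), (5.0, 1.0)]

b0, b1, b2 = fit_two_group_lda(group_a, group_b)
scores_a = [b0 + b1 * x1 + b2 * x2 for x1, x2 in group_a]
scores_b = [b0 + b1 * x1 + b2 * x2 for x1, x2 in group_b]
```

With this toy data every case in group A receives a positive score and every case in group B a negative one, so the sign of D separates the two groups.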

  

Page 6: Discriminant Analysis.ppt

The linear discriminant analysis model, known as the discriminant function, is given by

D (or Y) = b_0 + b_1 X_1 + b_2 X_2 + ... + b_k X_k

Where

D = discriminant score

bs = discriminant coefficients

Xs = independent variables (k independent variables)

In discriminant analysis a score is assigned to individuals or objects. This forms a basis for classifying the item into the most likely class.
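As a minimal sketch of how the discriminant function is used, the score D can be computed for one case and compared against a cutoff. The coefficient values, the case values and the cutoff of 0 are hypothetical, and the two-group cutoff rule is an assumption made for illustration only.

```python
def discriminant_score(b0, coefficients, values):
    """Return D = b0 + b1*X1 + ... + bk*Xk for one case."""
    if len(coefficients) != len(values):
        raise ValueError("need one coefficient per independent variable")
    return b0 + sum(b * x for b, x in zip(coefficients, values))


def classify(score, cutoff=0.0):
    """Two-group rule (assumed): scores above the cutoff go to group 1."""
    return "group 1" if score > cutoff else "group 2"


# Hypothetical coefficients and one hypothetical case with k = 2 predictors
b0 = -4.0
bs = [0.5, 1.0]
case = [6.0, 2.0]

d = discriminant_score(b0, bs, case)
label = classify(d)
```

Here D = -4.0 + 0.5 * 6.0 + 1.0 * 2.0, so the case falls above the assumed cutoff and is assigned to group 1.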

Page 7: Discriminant Analysis.ppt

The linear discriminant function in standardised form is given by

D (or Y) = B_1 X_1 + B_2 X_2 + ... + B_k X_k

Page 8: Discriminant Analysis.ppt

If the DV has two groups, a single discriminant equation is needed for categorising.

If the DV has three groups, two discriminant equations are needed for categorising.

If the DV has n groups, (n-1) discriminant equations will be required for categorisation, provided there are at least n-1 predictor variables.
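The rule above can be written as a one-line helper. The slide states the count as one less than the number of groups; the cap at the number of predictors is the standard additional limit and is added here as a labelled assumption. The function name is hypothetical.

```python
def n_discriminant_functions(n_groups, n_predictors):
    """Number of discriminant functions: (groups - 1), capped by the
    number of predictor variables (the cap is not stated on the slide)."""
    if n_groups < 2 or n_predictors < 1:
        raise ValueError("need at least two groups and one predictor")
    return min(n_groups - 1, n_predictors)


two_groups = n_discriminant_functions(2, 5)
three_groups = n_discriminant_functions(3, 5)
few_predictors = n_discriminant_functions(4, 2)
```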

Page 9: Discriminant Analysis.ppt

Examples of applications of DA in business research

1. How do customers who exhibit store loyalty differ from those who do not, in terms of demographic characteristics?

2. Do market segments differ in their media consumption habits?

3. What are the distinguishing characteristics of consumers who prefer to shop on the net?

Page 10: Discriminant Analysis.ppt

Important statistics associated with the analysis

1. Discriminant scores (DS)

2. Discriminant function coefficients (Bs)

3. Canonical correlation: association between the discriminant scores (DS) and the groups

4. Centroid: mean value of DS for a particular group

5. Classification matrix (confusion matrix or prediction matrix): matrix of correctly classified and misclassified cases

Page 11: Discriminant Analysis.ppt

6. Hit ratio: proportion of cases correctly classified.

7. Eigenvalue: ratio of between-group to within-group sum of squares. The larger the eigenvalue, the better the function discriminates. When there is only one discriminant function, it accounts for 100% of the explained variance. (The square of the canonical correlation gives the proportion of variation in the dependent variable explained by the model.)

8. Wilks' lambda: indicates the significance of the model. A lower value indicates higher significance. (Wilks' lambda is converted to a chi-square value to test this.)
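For a single discriminant function (the two-group case), the statistics in this list can all be derived from the discriminant scores. The sketch below assumes that case and uses hypothetical scores. The chi-square conversion uses Bartlett's approximation, which is a common choice but is not named in the slides; the function name is also hypothetical.

```python
import math
from statistics import mean


def single_function_statistics(scores_by_group, n_predictors):
    """Centroids, eigenvalue, canonical correlation, Wilks' lambda and
    Bartlett's chi-square for ONE discriminant function."""
    all_scores = [s for g in scores_by_group.values() for s in g]
    grand_mean = mean(all_scores)
    # Centroid: mean discriminant score of each group
    centroids = {name: mean(g) for name, g in scores_by_group.items()}
    ss_between = sum(len(g) * (centroids[name] - grand_mean) ** 2
                     for name, g in scores_by_group.items())
    ss_within = sum((s - centroids[name]) ** 2
                    for name, g in scores_by_group.items() for s in g)
    eigenvalue = ss_between / ss_within
    canonical_r = math.sqrt(eigenvalue / (1 + eigenvalue))
    wilks_lambda = 1 / (1 + eigenvalue)
    n_cases, n_groups = len(all_scores), len(scores_by_group)
    # Bartlett's approximation, df = n_predictors * (n_groups - 1)
    chi_square = -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(wilks_lambda)
    return {
        "centroids": centroids,
        "eigenvalue": eigenvalue,
        "canonical_r": canonical_r,
        "wilks_lambda": wilks_lambda,
        "chi_square": chi_square,
        "df": n_predictors * (n_groups - 1),
    }


# Hypothetical discriminant scores for two groups
scores = {"A": [1.0, 2.0, 3.0], "B": [-1.0, -2.0, -3.0]}
stats = single_function_statistics(scores, n_predictors=2)
```

In this single-function case the squared canonical correlation and Wilks' lambda sum to 1, which is why a larger eigenvalue goes together with a smaller lambda.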

Page 12: Discriminant Analysis.ppt

Key Output of Discriminant Analysis: Classification Matrix

Classification matrix

                          True Group
Assigned Group     Good Credit    Bad Credit
Good Credit             40            10
Bad Credit              15            35

For the above matrix, the proportion of correct classification, i.e. the hit rate, is

(40+35)/(40+35+10+15) = 75/100 = 75%
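The hit rate calculation above can be sketched in a few lines. The matrix is stored with assigned groups as rows and true groups as columns, matching the slide; the function name is hypothetical.

```python
def hit_ratio(matrix):
    """Proportion of correctly classified cases: diagonal sum / total."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Rows: assigned group (good, bad); columns: true group (good, bad)
credit_matrix = [[40, 10],
                 [15, 35]]

rate = hit_ratio(credit_matrix)
print(f"hit rate = {rate:.0%}")
```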

Page 13: Discriminant Analysis.ppt

Examine the Quality of Discriminant Analysis: Which Result is Better?

                          True Group
Assigned Group     Good Credit    Bad Credit
Good Credit             30            10
Bad Credit              10            35

                          True Group
Assigned Group     Good Credit    Bad Credit
Good Credit             40            10
Bad Credit              10            30
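One way to compare the two results is to look at the overall hit ratio together with the share of each true group that was classified correctly. The sketch below does this for both matrices; the function name is hypothetical, and which result counts as "better" still depends on which kind of misclassification matters more.

```python
def summarize(matrix):
    """Return (overall hit ratio, accuracy within each true group).

    Rows are assigned groups and columns are true groups, as on the slide.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(n)) / total
    per_true_group = [
        matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)
    ]
    return overall, per_true_group


first = [[30, 10],
         [10, 35]]
second = [[40, 10],
          [10, 30]]

first_overall, first_by_group = summarize(first)
second_overall, second_by_group = summarize(second)

for name, (overall, by_group) in (("first", (first_overall, first_by_group)),
                                  ("second", (second_overall, second_by_group))):
    print(name, round(overall, 3), [round(a, 3) for a in by_group])
```

The second matrix has a slightly higher overall hit ratio (70/90 against 65/85) and classifies true good-credit cases better, while the first does slightly better on true bad-credit cases.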

Page 14: Discriminant Analysis.ppt

Application problem

A firm has developed a new industrial process which is a distinct improvement over the existing one. The firm wants to know which industrial units would be interested in buying the process. Units which are early adopters and innovators would go in for the new process. Net profit of industrial units and their membership of trade associations and technical societies are identified as two important determinants. Data on these two are available.

Data file: Discrim.Sav

Page 15: Discriminant Analysis.ppt

Logistic vs discriminant

Logistic: when too many assumptions would be needed and there is a chance of Type II error on the null hypothesis

Discriminant: all assumptions are met and you need to end the hypothesis

Page 16: Discriminant Analysis.ppt

Regression vs logistic vs discriminant

Regression: dependent variable is continuous

Logistic and discriminant: dependent variable is categorical