disriminanant Analysis .Final

8/9/2019 disriminanant Analysis .Final


Discriminant Analysis

Prepared by-

Sumit Jain



Introduction-

Discriminant analysis (DA) is a technique for analysing marketing research data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature. In other words, discriminant analysis is a statistical method that researchers use to understand the relationship between a "dependent variable" and one or more "independent variables." A dependent variable is the variable that a researcher is trying to explain or predict from the values of the independent variables. Discriminant analysis is similar to regression analysis and analysis of variance (ANOVA). The principal difference between discriminant analysis and the other two methods lies in the nature of the dependent variable: it is categorical in discriminant analysis and numerical in the other two.



Contd..

It is a statistical technique used to classify cases into two or more categories of the dependent variable. Discriminant analysis also works like a regression technique, in that it is used to predict the value of the categorical dependent variable.

F test (Wilks' lambda): The overall significance of the discriminant model is tested with Wilks' lambda. If the overall model is significant, an F test is then used for each independent variable to test whether its mean differs between the groups.
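As a rough illustration of the test described above, the sketch below computes Wilks' lambda for a single variable, where it reduces to the within-group sum of squares divided by the total sum of squares, and converts it to the matching F statistic. The function names and the data are hypothetical, and the univariate simplification is an assumption; with several variables the statistic is a ratio of determinants instead.

```python
from statistics import mean

def wilks_lambda_univariate(groups):
    """Wilks' lambda for one variable: within-group SS / total SS."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return ss_within / ss_total

def f_statistic(lam, n_cases, n_groups):
    """F statistic implied by a univariate lambda (df: g-1 and N-g)."""
    return ((1 - lam) / lam) * ((n_cases - n_groups) / (n_groups - 1))

# Hypothetical scores on one predictor for three groups.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
lam = wilks_lambda_univariate(groups)
f_value = f_statistic(lam, n_cases=9, n_groups=3)
print(lam, f_value)
```

A lambda close to 0 means the group means are far apart relative to the spread inside each group; a lambda close to 1 means the variable barely separates the groups.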



Examples-

For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college, (2) to attend a trade or professional school, or (3) to seek no further training or education. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.

As another example, a medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). A biologist could record different characteristics of similar types (groups) of flowers, and then perform a discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.



Purpose-

The main purpose of a discriminant function analysis is to predict group membership based on a linear combination of the interval variables. The procedure begins with a set of observations where both group membership and the values of the interval variables are known. The end result of the procedure is a model that allows prediction of group membership when only the interval variables are known. A second purpose of discriminant function analysis is an understanding of the data set, as a careful examination of the prediction model that results from the procedure can give insight into the relationship between group membership and the variables used to predict group membership.



Objectives-

To classify cases into groups using a discriminant prediction equation.

To test theory by observing whether cases are classified as predicted.

To investigate differences between or among groups.

To determine the most parsimonious way to distinguish among groups.

To determine the percent of variance in the dependent variable explained by the independents.

To determine the percent of variance in the dependent variable explained by the independents over and above the variance accounted for by control variables, using sequential discriminant analysis.

To assess the relative importance of the independent variables in classifying the dependent variable.

To discard variables which are little related to group distinctions.

To infer the meaning of MDA dimensions which distinguish groups, based on discriminant loadings.



Multiple discriminant analysis (MDA) is an extension of discriminant analysis and a cousin of multivariate analysis of variance (MANOVA), sharing many of the same assumptions and tests. MDA is used to classify a categorical dependent variable which has more than two categories, using as predictors a number of interval or dummy independent variables. MDA is sometimes also called discriminant factor analysis or canonical discriminant analysis.



Assumptions in Discriminant analysis-

1. Independence: Each case should be independent of every other case. Data with correlated observations cannot be used in discriminant analysis.

2. Adequate sample size: There must be at least two cases for each category of the dependent variable. However, it is recommended that there be at least four or five times as many cases as independent variables.

3. Interval data: The independent variables should be measured on an interval scale.

4. Variance: No independent variable may have a zero standard deviation in one or more of the groups formed by the dependent variable.



Contd..

5. Random error: Error terms are assumed to be randomly distributed.

6. Homogeneity of variances: The variance of each independent variable should be equal across the groups.

7. Absence of perfect multicollinearity: There should be no perfect multicollinearity between the independent variables.

8. Linearity: The discriminant functions are assumed to be linear combinations of the independent variables, and the independent variables are assumed to be linearly related to each other.

9. Normal distribution: The predictor variables should be normally distributed.
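Some of the assumptions listed above can be screened mechanically before fitting a model. The following is a minimal sketch, not a complete diagnostic: it checks only the sample-size guidelines (assumption 2) and the zero-variance condition (assumption 4). The function name, the data layout (a dictionary from group label to a list of cases) and the example values are all hypothetical.

```python
def check_basic_assumptions(groups, min_ratio=4):
    """Return a list of problems found; an empty list means none were found.

    groups maps a group label to a list of cases, each case being a list
    of predictor values in a fixed order.
    """
    problems = []
    n_cases = sum(len(cases) for cases in groups.values())
    n_predictors = len(next(iter(groups.values()))[0])
    # Guideline: at least four or five times as many cases as predictors.
    if n_cases < min_ratio * n_predictors:
        problems.append("fewer than %d cases per predictor" % min_ratio)
    for label, cases in groups.items():
        # Requirement: at least two cases in every category.
        if len(cases) < 2:
            problems.append("group %r has fewer than two cases" % label)
            continue
        # A predictor that is constant inside a group has zero standard deviation.
        for j in range(n_predictors):
            column = [case[j] for case in cases]
            if max(column) == min(column):
                problems.append("predictor %d is constant in group %r" % (j, label))
    return problems

# Hypothetical data: two predictors measured on two groups of four cases.
data = {
    "college": [[3.0, 10.0], [4.0, 10.0], [5.0, 10.0], [6.0, 10.0]],
    "none": [[1.0, 7.0], [2.0, 9.0], [2.5, 8.0], [3.0, 6.0]],
}
problems = check_basic_assumptions(data)
print(problems)
```

Assumptions such as normality and homogeneity of variances need proper statistical tests and are deliberately left out of this sketch.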



STEPS



Key Terms and Concepts-

Discriminating variables: The independent variables that are used to predict the dependent variable. These variables are also called the predictors.

The criterion variable: Dependent variables are also called the criterion variables.

Discriminant function: A linear combination of the discriminating (independent) variables is called the discriminant function. For example, $L = b_1 x_1 + b_2 x_2 + \dots + b_n x_n + c$, where $L$ is the discriminant function, $b_1, \dots, b_n$ are the discriminant coefficients, $x_1, \dots, x_n$ are the independent variables, and $c$ is a constant.
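To make the discriminant function concrete, here is a small sketch that evaluates it for one case. The coefficients, the constant and the case values are made up for illustration; in practice they come from the fitted model.

```python
def discriminant_score(x, coefficients, constant):
    """Return L = b1*x1 + b2*x2 + ... + bn*xn + c for one case."""
    if len(x) != len(coefficients):
        raise ValueError("x and coefficients must have the same length")
    return sum(b * xi for b, xi in zip(coefficients, x)) + constant

# Hypothetical coefficients and constant, for illustration only.
b = [0.5, -1.0, 2.0]
c = -1.5
score = discriminant_score([2.0, 1.0, 0.5], b, c)
print(score)
```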



Number of discriminant functions: For two groups, there is one discriminant function. For multiple discriminant analysis with g groups there are at most g-1 discriminant functions (or as many as there are independent variables, if that number is smaller).

The eigenvalues: Also called characteristic roots. Each discriminant function has one eigenvalue, the ratio of between-group to within-group variation in its scores, which indicates how much of the group separation that function explains.

The discriminant score: The value obtained by applying the discriminant function to a case is called the discriminant score. This score is used to assign the case to a group.
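The eigenvalue of a discriminant function can be recovered from its scores as the between-group sum of squares divided by the within-group sum of squares. The sketch below does this for hypothetical scores and also derives the canonical correlation, sqrt(eigenvalue / (1 + eigenvalue)); the function names and data are assumptions made for the example.

```python
from math import sqrt
from statistics import mean

def eigenvalue_from_scores(score_groups):
    """Between-group SS divided by within-group SS of the discriminant scores."""
    all_scores = [s for g in score_groups for s in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in score_groups)
    ss_within = sum((s - mean(g)) ** 2 for g in score_groups for s in g)
    return ss_between / ss_within

def canonical_correlation(eigenvalue):
    """Correlation between the discriminant scores and group membership."""
    return sqrt(eigenvalue / (1 + eigenvalue))

# Hypothetical discriminant scores for two groups of three cases each.
scores = [[-2.0, -1.0, 0.0], [0.0, 1.0, 2.0]]
eig = eigenvalue_from_scores(scores)
print(eig, canonical_correlation(eig))
```

A larger eigenvalue means the function separates the groups more strongly relative to the spread of scores inside each group.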



Contd

Cutoff: This is the value that divides the range of discriminant scores into two parts. When the discriminant score of a case is on the negative side of the cutoff point, the case is assigned to the lower category, and when it is on the positive side, the case is assigned to the higher category.
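One common way to pick the cutoff in the two-group case is the midpoint between the two group centroids (the mean discriminant score of each group). The sketch below assumes equal group sizes, which is what makes the plain midpoint reasonable; the scores and function names are hypothetical.

```python
from statistics import mean

def cutoff_score(scores_low, scores_high):
    """Midpoint between the two group centroids (assumes equal group sizes)."""
    return (mean(scores_low) + mean(scores_high)) / 2

def classify(score, cutoff):
    """Assign the higher group when the score is on the positive side of the cutoff."""
    return "higher" if score > cutoff else "lower"

# Hypothetical discriminant scores for cases whose group is already known.
low_group = [-1.2, -0.8, -1.0]
high_group = [0.9, 1.1, 1.0]
cut = cutoff_score(low_group, high_group)
print(classify(0.4, cut), classify(-0.3, cut))
```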

Unstandardized discriminant coefficients: Unstandardized discriminant coefficients are analogous to the unstandardized regression coefficients (b) and are used to compute the discriminant score. Standardized discriminant coefficients are used to compare the relative importance of the independent variables.



TYPES OF DISCRIMINANT ANALYSIS-

LINEAR DISCRIMINANT ANALYSIS

The linear discriminant model (LDA) is used when the groups are separable by linear combinations of the discriminating variables. With only two features, the separators between groups of objects are lines. With three features, the separator is a plane, and when the number of features (i.e. independent variables) is more than three, the separators become hyperplanes. The final value of the discriminant function determines the group to which a particular observation belongs. Appropriate threshold values and the relative significance of the individual discriminant functions lead to the final outcome/group.
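For two groups and two features, the separating line can be computed by hand. The sketch below uses Fisher's classical formulation, in which the weight vector is the inverse of the pooled within-group scatter matrix times the difference of the group means, and the threshold is the projected midpoint of the two means. This is one standard way to fit a two-group LDA, offered as an illustration; the data and function names are hypothetical.

```python
from statistics import mean

def fisher_lda_2d(group_a, group_b):
    """Return (w, cutoff) for two groups measured on two features."""
    def centroid(rows):
        return [mean([r[0] for r in rows]), mean([r[1] for r in rows])]

    def scatter(rows, m):
        sxx = sum((r[0] - m[0]) ** 2 for r in rows)
        syy = sum((r[1] - m[1]) ** 2 for r in rows)
        sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
        return sxx, sxy, syy

    ma, mb = centroid(group_a), centroid(group_b)
    axx, axy, ayy = scatter(group_a, ma)
    bxx, bxy, byy = scatter(group_b, mb)
    # Pooled within-group scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Multiply the 2x2 inverse of the scatter matrix by the mean difference.
    w = [(syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det]
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    cutoff = w[0] * mid[0] + w[1] * mid[1]
    return w, cutoff

def predict(x, w, cutoff):
    """Assign group B when the discriminant value exceeds the cutoff."""
    return "B" if w[0] * x[0] + w[1] * x[1] > cutoff else "A"

# Hypothetical observations for two groups.
group_a = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
group_b = [(6.0, 5.0), (7.0, 4.0), (7.0, 6.0), (8.0, 5.0)]
w, cutoff = fisher_lda_2d(group_a, group_b)
print(w, cutoff, predict((2.0, 2.0), w, cutoff), predict((7.0, 5.0), w, cutoff))
```

The line w[0]*x + w[1]*y = cutoff is the separator described in the text; with more features the same idea yields a plane or hyperplane.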



Contd..

LDA is closely related to ANOVA (analysis of variance) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. In the other two methods, however, the dependent variable is a numerical quantity, while for LDA it is a categorical variable (i.e. the class label).



Application-



Career Counsellors

Suppose we have two groups of high school graduates: those who choose to attend college after graduation and those who do not. We could have measured students' stated intention to continue on to college one year prior to graduation. If the means for the two groups (those who actually went to college and those who did not) are different, then we can say that intention to attend college as stated one year prior to graduation allows us to discriminate between those who are and are not college bound (and this information may be used by career counsellors to provide the appropriate guidance to the respective students).



Marketing-

In marketing, discriminant analysis was once often used to determine the factors which distinguish different types of customers and/or products on the basis of surveys or other forms of collected data. Logistic regression or other methods are now more commonly used. The use of discriminant analysis in marketing can be described by the following steps:



Formulate the problem and gather data - Identify the salient attributes consumers use to evaluate products in this category - Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes. The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product from one to five (or 1 to 7, or 1 to 10) on a range of chosen attributes. Anywhere from five to twenty attributes are chosen. They could include things like: ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products is codified and input into a statistical program such as R, SPSS or SAS. (This step is the



Estimate the discriminant function coefficients and determine the statistical significance and validity - Choose the appropriate discriminant analysis method. The direct method involves estimating the discriminant function so that all the predictors are assessed simultaneously. The stepwise method enters the predictors sequentially. The two-group method should be used when the dependent variable has two categories or states. The multiple discriminant method is used when the dependent variable has three or more categorical states. Use Wilks' lambda to test for significance in SPSS, or the F statistic in SAS.

The most common method used to test validity is to split the sample into an estimation or analysis sample, and a validation or holdout sample.



The estimation sample is used in constructing the discriminant function. The validation sample is used to construct a classification matrix which contains the number of correctly classified and incorrectly classified cases. The percentage of correctly classified cases is called the hit ratio.
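The classification matrix and hit ratio described above are simple counts over the validation sample. The sketch below shows one way to compute them; the group labels and predictions are invented for the example, and the function names are not taken from any statistics package.

```python
from collections import Counter

def classification_matrix(actual, predicted):
    """Count cases for each (actual group, predicted group) pair."""
    return Counter(zip(actual, predicted))

def hit_ratio(actual, predicted):
    """Share of validation cases whose predicted group matches the actual group."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

# Hypothetical holdout-sample labels, for illustration only.
actual = ["buyer", "buyer", "non-buyer", "non-buyer", "buyer"]
predicted = ["buyer", "non-buyer", "non-buyer", "non-buyer", "buyer"]
matrix = classification_matrix(actual, predicted)
ratio = hit_ratio(actual, predicted)
print(matrix[("buyer", "buyer")], matrix[("buyer", "non-buyer")], ratio)
```

Multiplying the ratio by 100 gives the percentage form used in the text. The hit ratio is only meaningful when compared with what chance alone would achieve for the same group sizes.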

Plot the results on a two-dimensional map, define the dimensions, and interpret the results. The statistical program (or a related module) will map the results.

The map will plot each product (usually in two-dimensional space). The distance between products indicates how different they are. The dimensions must be labelled by the researcher. This requires subjective judgement and is often very challenging.



SOCIAL SCIENCES-

Prediction of Elections:

In this case the variables can be various social and economic factors, coupled with party effort parameters. Some of these variables are as follows:

(1) No. of new projects implemented by the incumbent party

(2) No. of candidates in the fray

(3) National reach of the party (no. of states active in)

(4) SEC division of the electorate (in the form of ratios)

(5) Profession-wise division of the electorate

(6) Age-wise division of the electorate

The variables mentioned above are a few of the representative parameters that might have a bearing on the coming elections. Nowadays another important parameter is the result of exit polls, which are conducted by various media agencies. They provide the general expectations of the electorate in view.



Outcome of terrorist attacks with hostages:

With the increasing occurrence of terrorist attacks, it becomes very important for law enforcement bodies and governments to ensure minimal collateral damage during rescue operations. It can often be prudent to predict the possibility of such an operation going badly, i.e. casualties during the rescue. Research on this front has already been initiated. The basic hypothesis is that various variables may be good predictors of the safe release or execution of the hostages. Some of these variables are as follows:



Contd..

(1) Number of terrorists

(2) Strength of their support in the local population

(3) Number of weapons and amount of ammunition with the terrorists

(4) Type of weapons wielded by the attackers

(5) Ratio of terrorists to hostages

(6) Whether the terrorists are independent operators or belong to some large-scale terrorist outfit

(7) Time since the hostages were taken

(8) Female/male ratio among the hostages

(9) Children/adults ratio among the hostages

Careful training on past cases can help the government decide whether to use force or negotiation to neutralize the terrorist threat.



MEDICINE AND DIAGNOSTICS

The application of multivariate analysis, and especially discriminant analysis, to the study of trace elements in the food and environmental fields has been widespread. In the clinical field, discriminant analysis has been tentatively used to improve the predictive value of tomography images in the differential diagnosis between Alzheimer's disease (AD) and frontotemporal dementia. Similarly, the need for a non-invasive, specific and sensitive test led to studies of whether levels of some proteins considered markers of neuronal degeneration were useful to discriminate between patients and control groups.



Hepatitis Disease Detection

Research is ongoing in this domain. The basic diagnostic flowchart follows. Here LDA is useful in determining the most important features affecting the onset of the disease. Once the feature reduction is done, the actual classification is performed by a fuzzy-network-based classifier. Here LDA acts as a data conditioning function rather than as a predictor.

[Diagram: basic diagnostic flowchart]



Contd..

The study attained 94.16% accuracy in detecting hepatitis, which is very high. This would enable prompt medication and hence recovery for the patient.



INSURANCE COMPANIES

Insolvency prediction (Case study on Spanish Banks)

Unlike other financial problems, business failure affects a great number of agents, so research on this topic has been of growing interest in recent decades. Insolvency, early detection of financial distress, or conditions leading to insolvency of insurance companies have been a concern of parties such as insurance regulators, investors, management, financial analysts, banks, auditors, policy holders and consumers. This concern has arisen from the necessity of protecting the general public against the consequences of insurers' insolvencies, as well as minimizing the costs associated with this problem, such as the effects on state insurance guaranty funds or the responsibilities for management and auditors. It has long been recognized that there needs to be some form of supervision of such entities to attempt to minimize the risk of failure. Nowadays, the Solvency II project is intended to lead to the reform of the existing solvency rules in the European Union. Many insolvency cases appeared after the insurance cycles of the 1970s and 1980s in the United States and in the European Union.



Contd..

Several surveys have been devoted to identifying the main causes of insurers' insolvency. In particular, the Müller Group Report (1997) analyses the main identified causes of insurance insolvencies in the European Union. The main reasons can be summarized as follows: operational risks (operational failure related to inexperienced or incompetent management, fraud); underwriting risks (inadequate reinsurance programme and failure to recover from reinsurers, higher losses due to rapid growth, excessive operating costs, poor underwriting process); insufficient provisions and imprudent investments. On the other hand, many insurance companies, especially larger companies, have developed internal risk models for a number of purposes. There is an absence of such standardized systems in Spain, where most insurance companies have internal check mechanisms to predict insolvency.

A recent study by academics from Madrid performed an LDA to predict insolvency of Spanish banks using historical data from 72 banks. The data was collected 1, 2 and 3 years prior to the insolvency. Some of the results of the study are given below.


Here, Models 1, 2 and 3 are predictors with data from 1, 2 and 3 years prior to insolvency, respectively.



Table: List of financial ratios used as variables for the predictor model

Table: Final results of the LDA performed in the three models

From the above results we see that the LDA model was probably not the best model to apply here, as the accuracy was very low and only slightly better than 0.5 probability in the case of the test cases. Maybe some other high-level classification method would work better here.

CONCLUSIONS-

In short, discriminant analysis is a very useful tool (1) for detecting the variables that allow the researcher to discriminate between different (naturally occurring) groups, and (2) for classifying cases into different groups with better-than-chance accuracy.


