Discriminant Analysis (Final)

  • 8/9/2019 disriminanant Analysis .Final

    Discriminant Analysis

    Prepared by-

    Sumit Jain

    Introduction-

    Discriminant analysis (DA) is a technique for analysing marketing research data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature. In other words, discriminant analysis is a statistical method that researchers use to understand the relationship between a "dependent variable" and one or more "independent variables." A dependent variable is the variable that a researcher is trying to explain or predict from the values of the independent variables. Discriminant analysis is similar to regression analysis and analysis of variance (ANOVA). The principal difference between discriminant analysis and the other two methods lies in the nature of the dependent variable, which is categorical in discriminant analysis.

    Contd..

    It is a statistical technique used to classify cases into two or more categories of the dependent variable. Discriminant analysis also works like a regression technique, in that it is used to predict the value of the categorical dependent variable.

    F test (Wilks' lambda): The overall significance of the discriminant function is tested with the Wilks' lambda test. If the overall model is significant, the F test is then used to test, for each individual variable, whether its mean differs across the groups.
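
The per-variable test can be illustrated with a minimal sketch. For a single variable, Wilks' lambda reduces to the ratio of the within-group sum of squares to the total sum of squares, and the matching F statistic is the usual one-way ANOVA F. The function name and the sample scores below are hypothetical and are not taken from the slides.

```python
def wilks_lambda_univariate(groups):
    """Return (wilks_lambda, f_statistic) for one variable measured in several groups.

    groups: list of lists of floats, one inner list per group.
    """
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    n_groups = len(groups)
    grand_mean = sum(all_values) / n_total

    # Within-group sum of squares: spread of each case around its own group mean.
    ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_within += sum((x - mean_g) ** 2 for x in g)

    # Total sum of squares: spread of every case around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    lam = ss_within / ss_total
    # Equivalent to the one-way ANOVA F with (g - 1, N - g) degrees of freedom.
    f_stat = ((1 - lam) / lam) * ((n_total - n_groups) / (n_groups - 1))
    return lam, f_stat


# Hypothetical scores on one predictor for two groups.
college = [7.0, 8.0, 9.0]
no_college = [3.0, 4.0, 5.0]
lam, f_stat = wilks_lambda_univariate([college, no_college])
print(lam, f_stat)
```

A lambda close to 0 means the group means differ strongly on that variable; a lambda close to 1 means they barely differ.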

    Examples-

    For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college, (2) to attend a trade or professional school, or (3) to seek no further training or education. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.

    As another example, a medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). A biologist could record different characteristics of similar types (groups) of flowers, and then perform a discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.

    Purpose-

    The main purpose of a discriminant function analysis is to predict group membership based on a linear combination of the interval variables. The procedure begins with a set of observations where both group membership and the values of the interval variables are known. The end result of the procedure is a model that allows prediction of group membership when only the interval variables are known. A second purpose of discriminant function analysis is an understanding of the data set, as a careful examination of the prediction model that results from the procedure can give insight into the relationship between group membership and the variables used to predict group membership.
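
As a rough illustration of this procedure, the sketch below fits a two-group linear discriminant function from cases with known membership and then predicts membership for new cases. It assumes exactly two groups, two interval predictors, equal prior probabilities and a shared covariance matrix. All function names and data values are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) cases."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_covariance(group_a, group_b):
    """Pooled within-group covariance matrix (2x2) of the two predictors."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def fit_two_group_lda(group_a, group_b):
    """Return (weights, cutoff) of the linear discriminant function."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    s = pooled_covariance(group_a, group_b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Weights: inverse pooled covariance times the difference in group means.
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff: midpoint between the two group mean scores (equal priors assumed).
    score_a = w[0] * ma[0] + w[1] * ma[1]
    score_b = w[0] * mb[0] + w[1] * mb[1]
    return w, (score_a + score_b) / 2


def predict(w, cutoff, case):
    """Assign a new case to group 'A' or 'B' from its discriminant score."""
    score = w[0] * case[0] + w[1] * case[1]
    return "A" if score > cutoff else "B"


# Hypothetical training data: group membership and both predictors are known.
group_a = [(6, 7), (7, 9), (8, 8), (9, 10)]
group_b = [(2, 3), (3, 2), (4, 4), (3, 5)]
w, cutoff = fit_two_group_lda(group_a, group_b)

# New cases: only the predictors are known.
print(predict(w, cutoff, (8, 9)), predict(w, cutoff, (2, 4)))
```

Inspecting the fitted weights `w` serves the second purpose described above: larger weights (after accounting for the scale of each variable) point to the predictors that separate the groups most.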

    Objectives-

    To classify cases into groups using a discriminant prediction equation.

    To test theory by observing whether cases are classified as predicted.

    To investigate differences between or among groups.

    To determine the most parsimonious way to distinguish among groups.

    To determine the percent of variance in the dependent variable explained by the independents.

    To determine the percent of variance in the dependent variable explained by the independents over and above the variance accounted for by control variables, using sequential discriminant analysis.

    To assess the relative importance of the independent variables in classifying the dependent variable.

    To discard variables which are little related to group distinctions.

    To infer the meaning of MDA dimensions which distinguish groups, based on discriminant loadings.

    Multiple discriminant analysis (MDA) is an extension of discriminant analysis and a cousin of multivariate analysis of variance (MANOVA), sharing many of the same assumptions and tests. MDA is used to classify a categorical dependent variable which has more than two categories, using as predictors a number of interval or dummy independent variables. MDA is sometimes also called discriminant factor analysis or canonical discriminant analysis.

    Assumptions in Discriminant analysis-

    1. Independence: Each case should be independent of the others. Correlated data cannot be used in discriminant analysis.

    2. Adequate sample size: There must be at least two cases for each category of the dependent variable. However, it is recommended that there should be at least four or five times as many cases as independent variables.

    3. Interval data: In discriminant analysis, the independent variables should be interval data.

    4. Variance: No independent variable has a zero standard deviation in one or more of the groups formed by the dependent variable.

    Contd..

    5. Random error: Error terms are assumed to be randomly distributed.

    6. Homogeneity of variances: The variance of each independent variable should be equal across the groups.

    7. Absence of perfect multicollinearity: There should be no perfect multicollinearity between the independent variables.

    8. Linearity: The discriminant functions are linear combinations of the independent variables, so the relationships among the variables are assumed to be linear.

    9. Normally distributed: The predictor variables should be normally distributed.
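
Two of the assumptions listed above (adequate sample size and non-zero variance within every group) are easy to screen for before running the analysis. The following is only a minimal sketch of such a check. The function name, the data layout and the default of four cases per predictor are assumptions made for illustration.

```python
import statistics


def check_basic_assumptions(data, n_predictors, cases_per_predictor=4):
    """Return a list of human-readable problems found in the data.

    data: dict mapping a group label to a list of rows, where each row
          holds the predictor values of one case.
    """
    problems = []
    n_cases = sum(len(rows) for rows in data.values())

    for label, rows in data.items():
        # Assumption 2: at least two cases per category.
        if len(rows) < 2:
            problems.append(f"group {label!r} has fewer than two cases")
            continue
        # Assumption 4: no predictor may be constant within a group.
        for j in range(n_predictors):
            column = [row[j] for row in rows]
            if statistics.pstdev(column) == 0:
                problems.append(
                    f"predictor {j} has zero standard deviation in group {label!r}"
                )

    # Rule of thumb: four to five times as many cases as predictors.
    if n_cases < cases_per_predictor * n_predictors:
        problems.append(
            f"only {n_cases} cases for {n_predictors} predictors "
            f"(want at least {cases_per_predictor * n_predictors})"
        )
    return problems


# Hypothetical data: two groups, two predictors per case.
data = {
    "A": [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]],
    "B": [[4.0, 6.0], [5.0, 7.0], [6.0, 8.0]],
}
problems = check_basic_assumptions(data, n_predictors=2)
for p in problems:
    print(p)
```

The remaining assumptions (normality, homogeneity of variances, multicollinearity) need proper statistical tests and are not covered by this sketch.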

    STEPS

    Key Terms and Concepts-

    Discriminating variables: Discriminating variables are the independent variables that are used to predict the dependent variable. These variables are also called the predictors.

    The criterion variable: The dependent variable is also called the criterion variable.

    Discriminant function: A linear combination of the discriminating (independent) variables is called the discriminant function. For example, $L = b_1 x_1 + b_2 x_2 + \cdots + b_n x_n + c$, where $L$ = discriminant function, $b_1, \dots, b_n$ = discriminant coefficients, $x_1, \dots, x_n$ = independent variables, and $c$ = constant.
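
To make the discriminant function concrete, here is a small sketch that evaluates it for one case. The coefficients, the constant and the case values are invented for illustration. In practice they would come from the fitted model.

```python
def discriminant_score(coefficients, constant, values):
    """Evaluate L = b1*x1 + b2*x2 + ... + bn*xn + c for one case."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return sum(b * x for b, x in zip(coefficients, values)) + constant


# Hypothetical coefficients b1..b3, constant c, and one case x1..x3.
b = [0.5, -0.25, 1.0]
c = -2.0
x = [4.0, 2.0, 1.5]
score = discriminant_score(b, c, x)
print(score)
```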

    Number of discriminant functions: For two groups, there is one discriminant function. For multiple discriminant analysis with g groups there will be g-1 discriminant functions (or as many as there are predictors, if that number is smaller).

    The eigenvalues: An eigenvalue is also called a characteristic root. Each eigenvalue is the ratio of between-group to within-group variation for its discriminant function, so it tells us how much of the group separation that function explains.

    The discriminant score: The value obtained by applying the discriminant function to a case is called the discriminant score. This discriminant score helps us to classify the case into a group category.

    Contd..

    Cutoff: This is the value of the discriminant score that divides the cases into two groups. When the discriminant score of a case is on the negative side of the cutoff point, the case falls into the lower category, and when it is on the positive side, the case falls into the higher category.

    Unstandardized discriminant coefficients: Unstandardized discriminant coefficients work like unstandardized regression coefficients and are used to compute the discriminant score. Standardized discriminant coefficients are used to compare the relative importance of the independent variables.
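
The difference between the two kinds of coefficients can be sketched as follows. The sketch assumes the common convention that a standardized coefficient equals the unstandardized coefficient multiplied by the pooled within-group standard deviation of its variable. The variable names and numbers are hypothetical.

```python
import math


def pooled_within_sd(groups):
    """Pooled within-group standard deviation of one variable.

    groups: list of lists of values, one inner list per group.
    """
    ss_within = 0.0
    n_total = 0
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_within += sum((x - mean_g) ** 2 for x in g)
        n_total += len(g)
    return math.sqrt(ss_within / (n_total - len(groups)))


def standardize_coefficients(unstandardized, data_by_variable):
    """Rescale each raw coefficient by the pooled SD of its variable."""
    return [b * pooled_within_sd(groups)
            for b, groups in zip(unstandardized, data_by_variable)]


# Hypothetical data: income (in thousands) and an attitude rating, two groups each.
income = [[30.0, 40.0, 50.0], [60.0, 70.0, 80.0]]
attitude = [[2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
raw = [0.05, 0.3]  # unstandardized coefficients, invented for illustration

standardized = standardize_coefficients(raw, [income, attitude])
print(standardized)
```

In this made-up example the raw coefficient for income is the smaller of the two only because income is measured on a much larger scale. After standardizing, income turns out to carry more weight than attitude, which is why standardized coefficients are the ones compared for relative importance.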

    TYPES OF DISCRIMINANT ANALYSIS-

    LINEAR DISCRIMINANT ANALYSIS

    The linear discriminant analysis (LDA) model is used when the groups are separable by linear combinations of the discriminating variables. With only two features, the separators between the groups are lines. With three features, the separator is a plane, and when the number of features (i.e. independent variables) is more than three, the separators become hyperplanes. The final value of the discriminant function determines the group to which a particular observation belongs. Appropriate threshold values and the relative significance of the individual discriminant functions lead to the final outcome/group.

    Contd..

    LDA is closely related to ANOVA (analysis of variance) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. In the other two methods, however, the dependent variable is a numerical quantity, while for LDA it is a categorical variable (i.e. the class label).

    Application-

    Career Counsellors

    Suppose we have two groups of high school graduates: those who choose to attend college after graduation and those who do not. We could have measured students' stated intention to continue on to college one year prior to graduation. If the means for the two groups (those who actually went to college and those who did not) are different, then we can say that intention to attend college as stated one year prior to graduation allows us to discriminate between those who are and are not college bound (and this information may be used by career counsellors to provide the appropriate guidance to the respective students).

    Marketing-

    In marketing, discriminant analysis was once often used to determine the factors which distinguish different types of customers and/or products on the basis of surveys or other forms of collected data. Logistic regression or other methods are now more commonly used. The use of discriminant analysis in marketing can be described by the following steps:

    Formulate the problem and gather data: Identify the salient attributes consumers use to evaluate products in this category. Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes. The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product from one to five (or 1 to 7, or 1 to 10) on a range of chosen attributes.

    Anywhere from five to twenty attributes are chosen. They could include things like ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products is codified and input into a statistical program such as R, SPSS or SAS. (This step is the

    Estimate the discriminant function coefficients and determine the statistical significance and validity: Choose the appropriate discriminant analysis method. The direct method involves estimating the discriminant function so that all the predictors are assessed simultaneously. The stepwise method enters the predictors sequentially. The two-group method should be used when the dependent variable has two categories or states. The multiple discriminant method is used when the dependent variable has three or more categorical states. Use Wilks' lambda to test for significance in SPSS or the F statistic in SAS.

    The most common method used to test validity is to split the sample into an estimation or analysis sample, and a validation or holdout sample.

    The estimation sample is used in constructing the discriminant function. The validation sample is used to construct a classification matrix which contains the number of correctly classified and incorrectly classified cases. The percentage of correctly classified cases is called the hit ratio.
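
A classification matrix and hit ratio can be computed from the holdout sample along these lines. This is a minimal sketch. The group labels and the predicted values are hypothetical and stand in for the output of a fitted discriminant function.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count cases for every (actual group, predicted group) pair."""
    return Counter(zip(actual, predicted))


def hit_ratio(actual, predicted):
    """Share of holdout cases whose predicted group equals the actual group."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical holdout sample of eight cases.
actual = ["buyer", "buyer", "buyer", "non-buyer",
          "non-buyer", "non-buyer", "non-buyer", "buyer"]
predicted = ["buyer", "non-buyer", "buyer", "non-buyer",
             "non-buyer", "buyer", "non-buyer", "buyer"]

matrix = classification_matrix(actual, predicted)
ratio = hit_ratio(actual, predicted)
for (a, p), count in sorted(matrix.items()):
    print(f"actual={a:10s} predicted={p:10s} count={count}")
print("hit ratio:", ratio)
```

The hit ratio is only meaningful relative to chance: with two equally sized groups, guessing alone would classify about half of the cases correctly.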

    Plot the results on a two-dimensional map, define the dimensions, and interpret the results: The statistical program (or a related module) will map the results. The map will plot each product (usually in two-dimensional space). The distance between products indicates how different they are. The dimensions must be labelled by the researcher. This requires subjective judgement and is often very challenging.

    SOCIAL SCIENCES-

    Prediction of Elections:

    In this case the variables can be various social and economic factors, coupled with party effort parameters. Some of these variables are as follows:

    (1) No. of new projects implemented by the incumbent party

    (2) No. of candidates in the fray

    (3) National reach of the party (no. of states active in)

    (4) SEC division of the electorate (in the form of ratios)

    (5) Profession-wise division of the electorate

    (6) Age-wise division of the electorate

    The variables mentioned above are a few representative parameters that might have a bearing on the coming elections. Nowadays another important parameter is the result of exit polls, which are conducted by various media agencies. They provide the general expectations of the electorate in view.

    Outcome of terrorist attacks with hostages:

    With the increasing occurrence of terrorist attacks, it becomes very important for law enforcement bodies and governments to ensure minimal collateral damage during rescue operations. It can often be prudent to predict the possibility of such an operation going badly, i.e. casualties during the rescue. Research on this front has already been initiated. The basic hypothesis is that various variables may be good predictors of the safe release or execution of the hostages. Some of these variables are as follows:

    Contd..

    (1) Number of terrorists

    (2) Strength of their support in the local population

    (3) Number of weapons and amount of ammunition with the terrorists

    (4) Type of weapons wielded by the attackers

    (5) Ratio of terrorists to hostages

    (6) Whether the terrorists are independent operators or belong to some large-scale terrorist outfit

    (7) Time since the hostages were taken

    (8) Female/male ratio among the hostages

    (9) Children/adults ratio among the hostages

    Careful training of the model on past cases can help the government decide whether to use force or negotiation to neutralize the terrorist threat.

    MEDICINE AND DIAGNOSTICS

    Multivariate analysis, and especially discriminant analysis, has been widely applied to the study of trace elements in the food and environmental fields. In the clinical field, discriminant analysis has been tentatively used to improve the predictive value of tomography images in the differential diagnosis between Alzheimer's disease (AD) and frontotemporal dementia. Similarly, the need for a non-invasive, specific and sensitive test led to studies of whether levels of some proteins considered markers of neuronal degeneration were useful to discriminate between patients and control groups.

    Hepatitis Disease Detection

    Research has been ongoing in this domain. The basic diagnostic flowchart follows. Here LDA is useful in determining the most important features impacting the advent of the disease. Once the reduction is done, the actual classification is done through a fuzzy network based classifier. Here the LDA acts as a data conditioning function rather than as a predictor.

    [Diagram: diagnostic flowchart]

    Contd..

    The study attained 94.16% accuracy in the detection of hepatitis, which is very high. This would help enable quick medication and hence recovery for the patient.

    INSURANCE COMPANIES

    Insolvency prediction (Case study on Spanish Banks)

    Unlike other financial problems, there are a great number of agents facing business failure, so research on this topic has been of growing interest in recent decades. Insolvency, early detection of financial distress, or conditions leading to insolvency of insurance companies have been a concern of parties such as insurance regulators, investors, management, financial analysts, banks, auditors, policy holders and consumers. This concern has arisen from the necessity of protecting the general public against the consequences of insurers' insolvencies, as well as minimizing the costs associated with this problem, such as the effects on state insurance guaranty funds or the responsibilities for management and auditors. It has long been recognized that there needs to be some form of supervision of such entities to attempt to minimize the risk of failure. Nowadays, the Solvency II project is intended to lead to the reform of the existing solvency rules in the European Union. Many insolvency cases appeared after the insurance cycles of the 1970s and 1980s in the United States and in the European Union.

    Contd..

    Several surveys have been devoted to identifying the main causes of insurers' insolvency. In particular, the Müller Group Report (1997) analyses the main identified causes of insurance insolvencies in the European Union. The main reasons can be summarized as follows: operational risks (operational failure related to inexperienced or incompetent management, fraud); underwriting risks (inadequate reinsurance programme and failure to recover from reinsurers, higher losses due to rapid growth, excessive operating costs, poor underwriting process); insufficient provisions and imprudent investments. On the other hand, many insurance companies, especially larger companies, have developed internal risk models for a number of purposes. There is an absence of such standardized systems in Spain, where most insurance companies have internal check mechanisms to predict insolvency.

    A recent study by academics from Madrid performed an LDA to predict insolvency of Spanish banks using historical data from 72 banks. The data was collected 1, 2 and 3 years prior to the insolvency. Some of the results of the study are given below.

    Here Models 1, 2 and 3 are predictors with data from 1, 2 and 3 years prior to insolvency, respectively.

    Table: List of financial ratios used as variables for the predictor model

    Table: Final results of the LDA performed in the three models

    From the above results we see that the LDA model was probably not the best model to apply here, as the accuracy was very low and only slightly better than 0.5 probability in the case of the test cases. Maybe some other high-level classification method would work better here.

    CONCLUSIONS-

    In short, discriminant analysis is a very useful tool (1) for detecting the variables that allow the researcher to discriminate between different (naturally occurring) groups, and (2) for classifying cases into different groups with better-than-chance accuracy.
