Intro Multivariate Stats(Lecture10)

  • 8/13/2019 Intro Multivariate Stats(Lecture10)

    For example, a client may be interested in understanding the effect of price and promotional activity on a product's market share among both loyal and not-loyal customers.

    The technical result is a linear model of the form

    Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n

    The best visualizations of the results control all but one (or two) of the independent variables and examine how the value of the dependent variable changes with respect to the free independent variables.
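
A minimal sketch of this idea, assuming an ordinary least-squares fit and synthetic, noise-free data. The function names (`solve`, `fit_linear`, `predict`) and the price and promotion figures are hypothetical and do not come from the lecture. It fits the coefficients a_0 ... a_n, then holds price fixed while varying only the promotion index.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_linear(rows, y):
    """Least-squares fit via the normal equations; returns [a0, a1, ..., an]."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coef, row):
    """Evaluate Y = a0 + a1*X1 + ... + an*Xn for one case."""
    return coef[0] + sum(a * x for a, x in zip(coef[1:], row))


# Hypothetical cases: (price, promotion index) -> market share.
rows = [(10, 20), (10, 50), (12, 30), (12, 80), (14, 40), (14, 70)]
share = [30 - 1.5 * p + 0.4 * q for p, q in rows]  # synthetic, noise-free

coef = fit_linear(rows, share)

# Control price at 12 and let only the promotion index vary.
profile = [(q, predict(coef, (12, q))) for q in range(20, 81, 20)]
for q, s in profile:
    print(f"promotion={q:3d}  predicted share={s:.1f}")
```

Plotting `profile` would give one line of the kind described above. Repeating the fit separately for loyal and not-loyal customers would give one such line per group.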

    [Figure: two panels, "Market share for loyal customers" and "Market share for not-loyal customers". x-axis: Promotion Index; y-axis: Market Share.]

    Properties
    • Single interval-scale dependent variable
    • Multiple independent variables, preferably on an interval scale
    • Familiar and useful technique

    Issues
    • Assumes a linear relationship between the dependent and independent variables
    • Overused, and assumptions are often not fully checked
    • Often misapplied to classification problems

    Logistic regression is a dependence technique used to model the relationship between a single categorical dependent variable and a set of metric independent variables.
    • Typically the dependent variable takes one of two values: success/failure, buy/do not buy
    • Multinomial formulations also exist

    A logistic model gives the probability that the dependent variable takes a target value given the values of the independent variables.
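
As a rough illustration, the standard binary logistic form maps a linear combination of the independent variables to a probability between 0 and 1. The coefficients and input values below are hypothetical, chosen only to show the calculation; in practice they would be estimated from data.

```python
import math


def logistic_probability(coef, row):
    """P(target) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    z = coef[0] + sum(b * x for b, x in zip(coef[1:], row))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: intercept, then one weight per independent variable.
coef = [-1.0, 0.8, -0.05]

# Hypothetical case with two metric independent variables.
p_buy = logistic_probability(coef, (2.0, 10.0))
print(f"probability of 'buy': {p_buy:.3f}")
```

A probability like this is usually turned into a buy/do not buy classification by comparing it to a chosen cutoff, which is a separate decision from fitting the model.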

    For example, which credit and demographic factors best predict whether a customer will keep a loan current?
    • Dependent variable is taken as 60 days past due or worse
    • Independent variables are credit and employment history, and demographic descriptors

    Properties
    • Powerful technique for predicting group membership and identifying important independent variables
    • Becoming more widely used
    • Procedures and results similar to linear regression

    Issues
    • Adequate data
    • Model validation
    • Communicating probabilistic concepts

    Decision trees are a dependence technique used to develop a model to classify the value of a single dependent variable based on a set of independent variables.
    • Dependent and independent variables can be any data type

    The typical product of CART is a straightforward, easily interpretable set of segmentation rules.
    • For example, classify existing customers as high- or low-likelihood buyers of a new product based on demographics and historical purchasing behavior
    • The classification could be used to focus an advertising campaign
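
To make "segmentation rules" concrete, here is a small sketch of a single CART-style split: it searches every variable and threshold for the binary rule that minimizes weighted Gini impurity. A full tree would apply the same search recursively to each resulting group. The customer data and the names `gini` and `best_split` are hypothetical, and the sketch handles only numeric variables and a 0/1 dependent variable.

```python
def gini(labels):
    """Gini impurity of a group of 0/1 labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return (variable_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical customers: (age, past purchases); label 1 = high-likelihood buyer.
names = ["age", "past purchases"]
rows = [(25, 1), (32, 0), (41, 5), (38, 6), (52, 2), (29, 7)]
labels = [0, 0, 1, 1, 0, 1]

feature, threshold, impurity = best_split(rows, labels)
print(f"rule: {names[feature]} > {threshold} -> high-likelihood buyer")
```

The printed rule is the kind of readable output the slide describes. Real CART implementations add stopping criteria, pruning, and support for categorical variables.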

    Decision trees can also be used to examine profiles of different market segments with respect to underlying demographic and psychographic variables.

    For example, what are the most significant demographic variables determining whether the Internet is a person's most important information source?

    Properties
    • Single dependent variable of any scale
    • Multiple independent variables of any scale
    • Free of model assumptions typical in other dependence techniques
    • Powerful statistical learning algorithm able to identify complex variable interactions

    Issues
    • Not as familiar
    • Standard inferential statistics not applicable
    • Often leads to asymmetric relationships

    Factor analysis is an interdependence technique used to identify a set of underlying latent traits (factors) that explain the correlations between a large number of variables.
    • Data summarizing: derive a set of underlying concepts that summarize a larger set of variables
    • Data reduction: develop a set of factor variables that serves as a more parsimonious description of the data
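
For reference, the usual common factor model can be written as below. The notation is the standard textbook one and is an assumption here, not taken from the slides: x_j are the p observed (standardized) variables, f_1 ... f_m the latent factors with m < p, \lambda_{jl} the loadings, and \varepsilon_j the variable-specific terms.

```latex
x_j = \lambda_{j1} f_1 + \lambda_{j2} f_2 + \cdots + \lambda_{jm} f_m + \varepsilon_j,
\qquad j = 1, \dots, p, \quad m < p

% With uncorrelated, unit-variance factors and mutually uncorrelated
% specific terms, the correlation between two different variables is
\operatorname{corr}(x_j, x_k) = \sum_{l=1}^{m} \lambda_{jl} \lambda_{kl},
\qquad j \neq k
```

The second line is the sense in which a few factors "explain the correlations": under those assumptions, every pairwise correlation among the p variables is reproduced from the much smaller set of loadings.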

    Interested in defining the underlying dimensions influencing the perception of online destinations.
    • Survey respondents are asked to rate a set of destinations (including the client's) with respect to a number of traits
    • Factor analysis can be applied to develop a succinct set of perception dimensions
    • This manageable set of dimensions can be used to characterize a client's site and to develop a focused plan to reposition it

    Properties
    • Very useful in identifying structure and relationships in data
    • Provides tractable set of concepts for both managerial and analytical uses
    • Provides opportunities for visualizations

    Issues
    • Questionnaire design
    • Variable selection
    • Factor interpretation and validity

    Cluster analysis is an interdependence technique used to segment cases into homogeneous groups based on a specified set of variables.
    • Data reduction: develop a more parsimonious description of cases, which can then be used in analytical classification methods
    • Identify similarities between cases with respect to clustering variables
    • Characterize clusters with respect to other sets of variables

    Want to identify and then characterize similar groups of TV pilot shows based on survey responses rating shows on various traits.
    • For one or two traits it may be possible to do this subjectively. Cluster analysis provides an objective method for multiple traits
    • Clusters can be characterized with respect to variables not used in the analysis, such as show success, and cluster membership can be used as a dependent variable in a classification method
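
One way to sketch such an objective grouping is k-means, used here purely as an illustration; the lecture does not say which clustering method was applied. The trait ratings, the number of clusters, and the starting centers are all hypothetical.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return dists.index(min(dists))


def kmeans(points, centers, iterations=20):
    """Plain k-means: alternate between assigning cases and updating centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its members; keep it if it has none.
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Hypothetical shows rated on two traits (average survey scores).
ratings = [
    (20, 25), (22, 21), (25, 24),   # rated low on both traits
    (40, 42), (42, 38), (38, 41),   # middling ratings
    (58, 55), (55, 60), (60, 57),   # rated high on both traits
]

# Start from one show in each apparent group (an assumption of this sketch).
centers, labels = kmeans(ratings, [ratings[0], ratings[3], ratings[6]])
print(labels)
```

The resulting `labels` could then be cross-tabulated against a variable left out of the clustering, such as show success, or used as the dependent variable in a classification method.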

    [Figure: scatter plot of TV pilot shows grouped into three clusters. Cluster 1: low likelihood of success; Cluster 2: moderate likelihood of success; Cluster 3: high likelihood of success. Surviving axis label: CLEVER.]

    Properties
    • Many cluster techniques are available for data of all scales
    • Can identify structure in large data sets that may be difficult to discover in any other way
    • Provides objective segmentation method

    Issues
    • Selecting appropriate clustering method
    • Determining appropriate number of clusters
    • Validating clusters