Chapter_14 Advanced Regression Models
7/30/2019 Chapter_14 Advanced Regression Models
Chapter 14
ADVANCED REGRESSION MODELS
Raghuram Iyengar, University of Pennsylvania
Sunil Gupta, Columbia University
Introduction
The previous chapter covered the basics of the powerful and yet simple technique of
Ordinary Least Squares (OLS). It was noted that the mathematical relationship between the
dependent variable for an observation $y_t$ at time $t$ and a vector of independent variables $x_t$ can be
written in the following manner.
$y_t = x_t'\beta + \varepsilon_t$ (1)
Here, $x_t'$ is the transpose of vector $x_t$ and $\beta$ is a vector of parameters. Also, $y_t$ is continuous from $-\infty$
to $+\infty$ and $\varepsilon_t$ is the random error that is typically assumed to be normally distributed.
Several scenarios fit the assumption of a continuous dependent variable that ranges from
$-\infty$ to $+\infty$. In cases when $y_t$ is strictly positive (e.g., sales), we can transform it as $\ln(y_t)$ to make it
lie between $-\infty$ and $+\infty$ and continue to use OLS. But what happens if the dependent variable is
discrete (e.g. buy / no buy) or choice of a brand (e.g., Brand A, B, C or D) and we want to
analyze the effect of brand prices on these decisions? The purpose of this chapter is to show
methods that can be used in such scenarios.
We begin with Discriminant Analysis. This is followed by a discussion of logistic
regression and the multinomial logit model. Thereafter, we focus on the multinomial probit
model. The chapter ends with a discussion on Tobit models.
Discriminant Analysis
Consider the following example where the dependent variable is binary, i.e., a buy / no buy
decision. A company that has introduced a product in the market wishes to describe the people
that are buying its product. Figure 14.1 shows the demographic information that the company has
together with the purchasers (P) and non-purchasers (N). The figure suggests that purchasers of
this product are older and richer. Thus, age and income discriminate among the purchasers and
non-purchasers. However, it is not clear which of the two variables is more important and how
we can predict a new person to be a purchaser or non-purchaser based on his/her income and age.
Such questions can be answered by using discriminant analysis.
[Figure 14.1 about here]
Discriminant analysis is a method to analyze which independent variables discriminate
among groups and to classify observations into predetermined groups based on these variables.
These predetermined groups can be either two in number (e.g., buy or no buy) or more than two. In the
latter case, the analysis is termed multiple discriminant analysis. For the sake of simplicity, we
begin with a two-group discriminant analysis.
In a discriminant analysis, an index is built using the measured characteristics as the
independent variables. Thus, for an observation at time $t$,
$f_t = x_{1t}\beta_1 + x_{2t}\beta_2 + x_{3t}\beta_3 + \dots + x_{Kt}\beta_K = x_t'\beta$ (2)
Here, $f_t$ is the index. It is also called the discriminant function. There are $K$ measured
characteristics ($x_{1t}, x_{2t}, \dots, x_{Kt}$). The vector $x_t'$ is the transpose of vector $x_t$ that contains these $K$
variables. There are also $K$ parameters ($\beta_1, \beta_2, \dots, \beta_K$), which are the weights corresponding to
these variables. These weights are also termed the discriminant coefficients.
The goal of discriminant analysis is to estimate weights such that the index values for the
two groups are as far apart as possible. In other words, the weights are derived such that the variation
in f scores between the two groups is as large as possible, while the variation in the f scores
within the groups is as small as possible. That is, the weights are derived so that the following
ratio is maximized.
$\dfrac{\text{Between-Group Variation}}{\text{Within-Group Variation}}$ (3)
Maximizing the above ratio makes the two groups as distinct as possible with respect to the
index values. More mathematically oriented readers can see Chapter 11 of Johnson and Wichern
(2002) for a description of how the above quantity is maximized.
Discriminant analysis is related to and yet is distinct from linear regression. In both
methods, there is a weighted linear combination of independent variables that is used to predict a
dependent variable. Also, like linear regression, discriminant analysis suffers from
multicollinearity of the independent variables. The primary difference between the two methods
is that in a linear regression, the dependent variable is typically assumed to range from $-\infty$ to
$+\infty$, whereas in a discriminant analysis, the dependent variable is group membership, i.e., it is discrete.
For an application where the group membership is in two groups, a linear regression can be run
with a dummy variable representing group membership as the dependent variable. The estimates
from such a regression will be proportional to the weights that are obtained from a discriminant
analysis. When the number of groups is, however, greater than two, then a regression will not
yield the same results.
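The proportionality between two-group discriminant weights and dummy-variable regression slopes can be checked numerically. The sketch below is illustrative only: the (age, income) values are hypothetical and are not the data behind Figure 14.1, and the helper names (`mean_vec`, `scatter`, `solve2`) are assumptions introduced for this example. It computes the discriminant weights as the within-group scatter matrix inverted against the difference in group means, and compares them with the OLS slopes from regressing a 0/1 group dummy on the same two variables.

```python
def mean_vec(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def scatter(rows, center):
    """2x2 sum-of-squares-and-cross-products matrix around `center`."""
    sxx = sum((r[0] - center[0]) ** 2 for r in rows)
    sxy = sum((r[0] - center[0]) * (r[1] - center[1]) for r in rows)
    syy = sum((r[1] - center[1]) ** 2 for r in rows)
    return [[sxx, sxy], [sxy, syy]]

def solve2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

# Hypothetical (age, income in $000) observations, for illustration only.
buyers = [(52, 80), (47, 95), (60, 70), (55, 110), (44, 88)]
non_buyers = [(30, 40), (35, 62), (28, 55), (41, 45), (33, 70), (38, 50)]

# Discriminant weights: within-group scatter inverse times mean difference.
m1, m0 = mean_vec(buyers), mean_vec(non_buyers)
w1, w0 = scatter(buyers, m1), scatter(non_buyers, m0)
within = [[w1[i][j] + w0[i][j] for j in range(2)] for i in range(2)]
diff = [m1[0] - m0[0], m1[1] - m0[1]]
fisher = solve2(within, diff)

# OLS slopes from regressing a 0/1 group dummy on the same variables.
all_rows = buyers + non_buyers
y = [1] * len(buyers) + [0] * len(non_buyers)
m = mean_vec(all_rows)
ybar = sum(y) / len(y)
total = scatter(all_rows, m)
sxy = [sum((r[k] - m[k]) * (yi - ybar) for r, yi in zip(all_rows, y))
       for k in range(2)]
ols = solve2(total, sxy)

# If the two sets of weights are proportional, both ratios coincide.
ratio = [ols[0] / fisher[0], ols[1] / fisher[1]]
print(ratio)
```

The two printed ratios agree up to floating-point error, which is what proportionality means here. With more than two groups there is no single dummy to regress on, so this shortcut no longer applies.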
Discriminant analysis is also different from cluster analysis (see Chapter 18, this book).
In discriminant analysis, the groups are predetermined and the analysis is focused on which
variables best discriminate among these groups. In a cluster analysis, the group memberships are
unknown and the focus of the analysis is to form these groups.
Consider the following example of a two-group discriminant analysis. Table 14.1
contains data on the fifty US states, which are broken down into two groups: 15 states that
are South and 35 that are Non-South (Lehmann, Gupta, & Steckel, 1997). These groups are
compared on observable characteristics such as income, population and others. A univariate F-
test compares the differences in means across the two groups on each of the independent
variables. The big differences between the two groups appear to be in income, tax per capita and
mineral production.
A discriminant analysis was run and Table 14.1 contains the discriminant coefficients.
These are the weights of the independent variables. Another column shows standardized
discriminant coefficients. These coefficients are similar to the standardized regression
coefficients in an OLS regression. They correct for any scale issues associated with the
independent variables. We can calculate these coefficients by first standardizing the independent
variables and then running a discriminant analysis or by first running a discriminant analysis and
then multiplying each discriminant coefficient by the standard deviation of the respective
independent variable. Both methods yield standardized coefficients and these can be used to
ascertain how a change of one standard deviation in each independent variable will affect the
discriminant function.
[Table 14.1 about here]
From the estimated unstandardized coefficients, we find that population is the most important
variable for discrimination, followed by average income. Upon standardizing the variables, we
observe a different set of variables that are important. We find that while population is still the
most important, college enrollment and manufacturing output are clearly more relevant for
discrimination among the states than is average income. Thus, a failure to account for differences
in scale can lead to erroneous conclusions about the relative importance of variables.
Measures of Fit
There are several measures of fit that are used to assess how well the model
discriminates.
Chi-Squared Value
A Chi-Squared value tests whether, overall, the variables help discriminate between the two
groups. This is very similar to the F-test for overall significance in a regression setting. Here, the
Chi-Squared value is 42.71. For testing the significance, we look at the critical value for 11
degrees of freedom (the number of independent variables). This value is 31.3 at the 0.001 level.
Thus, the variables clearly help in discrimination.
Canonical Correlation
The canonical correlation is the correlation resulting from a regression of a dummy dependent
variable (group membership) on the independent variables. Its squared value is the $R^2$ from this regression. In this
example, the canonical correlation is 0.80. Thus, the $R^2$ is 0.64.
Wilks' Lambda
Wilks' Lambda is the ratio of within-group variance to the total variance. Here, it is
essentially $1 - R^2$. Thus, Wilks' Lambda is 1 - 0.64 = 0.36.
The Hit-Miss Table
The Hit-Miss table provides an indication of how well the discriminant function
classifies observations. Table 14.2 is such a hit-miss table. Here we find that 32 out of the 35
Non-South states and 14 out of the 15 South states are correctly classified. Thus, the overall
classification rate is (32 + 14)/50, i.e., 92%.
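The arithmetic behind these fit measures is simple enough to reproduce directly. The following sketch uses only the figures quoted in the text (a canonical correlation of 0.80 and the correct-classification counts from the hit-miss table); the function name `hit_rate` is an assumption made for illustration.

```python
canonical_correlation = 0.80            # value reported in the text
r_squared = canonical_correlation ** 2  # R-squared of the dummy regression
wilks_lambda = 1 - r_squared            # within-group share of total variance

def hit_rate(correct_by_group, size_by_group):
    """Share of observations classified into their true group."""
    return sum(correct_by_group) / sum(size_by_group)

# 32 of 35 Non-South and 14 of 15 South states are correctly classified.
two_group_rate = hit_rate([32, 14], [35, 15])

print(round(r_squared, 2), round(wilks_lambda, 2), two_group_rate)
```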
[Table 14.2 about here]
Multiple Discriminant Analysis
A multiple discriminant analysis is carried out when the observations are preclassified
into more than two groups. The basic idea is first to find a single function that spreads all groups
as far apart as possible. Then, a second function is found that best explains any differences
among groups and so on. If there are K groups, then K-1 discriminant functions are found.
To illustrate multiple discriminant analysis, we consider an example described in
Lehmann, Gupta, and Steckel (1997). In this example, there are five groups of consumers
defined by how much they spend, in dollars, on food each month. Table 14.3
shows the five groups together with the averages of the independent variables. The means appear
to indicate the larger spenders are more educated, are younger, have higher incomes, have larger
family sizes and shop more extensively. Table 14.3 also shows F tests for the variables for the
significance of differences among the five groups. These tests suggest that family size and
income are the most important (i.e. have the highest F value).
[Table 14.3 about here]
A discriminant analysis is run. In the analysis, a few variables are dropped as they do not
contribute to discrimination among the groups. We then obtain the standardized and
unstandardized discriminant coefficients. As there are 5 groups, we have 5-1=4 discriminant
functions. Table 14.4 shows the unstandardized and standardized coefficients. The discriminant
functions are ranked according to their usefulness for discrimination. In other words, the first
function is the most important for discriminating amongst the five groups; the second one is the
second most important and so on.
[Table 14.4 about here]
From the results on the standardized coefficients, we find that the most important
variables in the first function are family size, income and how often they shop. The second
function is related to age and family size. Table 14.5 gives the group means for the groups based
on the four discriminant functions. Figure 14.2 plots these means for the first and the second
functions. We can see that there is a big spread of the means of the five groups along the
horizontal axis (Function-1) and less so along the vertical axis (Function 2).
[Table 14.5 about here]
[Figure 14.2 about here]
Measure of Fit
As a measure of fit of the model, we use the hit-miss table. Table 14.6 is such a hit-miss
table for the five categories. From the results, we see that the overall classification rate is
(20+106+90+84+40)/(34+284+293+181+61) * 100 = 39.86%.
[Table 14.6 about here]
Discriminant analysis rests on two statistical assumptions. One, the independent variables
are assumed to be jointly normally distributed and two, the covariances are assumed to be the
same across all groups. When these assumptions are violated then the statistical interpretation of
the results becomes very difficult. For instance, while in practice, dummy variables are
frequently used as independent variables, in theory it is a problem. This is because if a dummy
independent variable is used then the independent variables are not normally distributed. To
alleviate such statistical difficulties, the method of logistic regression is used. We motivate this
method with a managerial problem that all direct marketers face.
Logistic Regression
Catalog companies regularly keep track of Recency, Frequency and Monetary (RFM)
variables. There is an interest in relating these RFM measures to purchase behavior, i.e., a buy / no
buy decision. These measures can then be used for predicting purchase and for making any
strategic intervention decisions to increase retention. Table 14.7 contains the summary statistics
of such a dataset, where Recency is measured in months since last purchase and Monetary is in
dollar amount. Choice is a variable which takes a value of 1 if a consumer made a purchase and
0 if she did not (see Footnote 1).
[Table 14.7 about here]
One strategy for estimating the relationship between choice and the RFM measures
would be to use an OLS with choice as the dependent variable (yt) and RFM measures as the
independent variables (xt). Table 14.8 shows the results for the OLS regression. The results
suggest that Recency and Monetary are significant whereas Frequency is not. Further, the $R^2$ is
about 0.61 and the adjusted $R^2$ is around 0.60. Despite the high $R^2$, OLS is not appropriate for
several reasons. Figure 14.3 plots the predictions of Choice and the true value of Choice for the
100 data points. We see that there are instances where the predictions for choice are either less
than zero or greater than one! While this is not surprising given that OLS assumes that the
dependent variable is continuous between $-\infty$ and $+\infty$, in the current context these predictions are
clearly inconsistent with the data. For instance, how do we interpret a prediction of 1.32 and
compare it with a prediction of 1.82? Are both indicating a purchase decision i.e. should we
assume both are just 1 (buy)? Similarly, it is not clear how to interpret a prediction of -0.18 when
a value of 0 reflects no purchase. This example shows that when an assumption of the OLS
technique (in this case, the continuous distribution of the dependent variable) is violated, its
results cannot be interpreted.
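The out-of-range predictions are easy to reproduce. The sketch below is a minimal illustration on synthetic data, not the RFM dataset from Table 14.7: a single hypothetical covariate `x` and a binary outcome that equals 1 whenever `x` is positive. Fitting a straight line by OLS to such a 0/1 outcome gives fitted values below 0 and above 1 at the extremes of `x`.

```python
def ols_fit(x, y):
    """Simple one-regressor OLS: returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Synthetic data: 100 evenly spaced covariate values, binary outcome.
x = [-3 + 6 * i / 99 for i in range(100)]
y = [1 if xi > 0 else 0 for xi in x]

intercept, slope = ols_fit(x, y)
predictions = [intercept + slope * xi for xi in x]

# The fitted line leaves the [0, 1] interval at both ends.
print(min(predictions), max(predictions))
```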
[Table 14.8 about here]
[Figure 14.3 about here]
The dependent variable in the above example is discrete. Such choice scenarios are
extremely common. For instance, pharmaceutical companies are interested in predicting whether
a physician would prescribe their drug or not and the factors that might increase the prescription
rate. Similarly, managers in industries with an online presence are interested in identifying
factors that can predict which consumers will purchase online (Bellman, Lohse, & Johnson,
1999). While these questions can also be addressed by discriminant analysis, there are other
scenarios such as when the dependent variable is market share (i.e., lies between 0 and 1) and we
want to quantify the effect of price and promotions on it, which needs a different method that can
accommodate such responses.
Model for Logistic Regression
A logistic regression analysis begins with a dependent variable, which is either discrete
(e.g., buy / no buy) or lies between 0 and 1 (e.g., market share). If we are modeling a discrete
decision such as buy / no buy then we specify the probabilities of the two possible events i.e.
P(Buy) and P(No Buy). As P(Buy) and P(No Buy) are probabilities, they are between 0 and 1
and they should sum up to 1. Next, we revisit the example with the discrete choice (buy / no buy)
and RFM measures that we discussed earlier. We then briefly discuss how the same framework
can be applied for analyzing market shares.
In the RFM example, the two events are purchase and no purchase. Using the measures
of P(Buy) and P(No Buy), we can specify the odds of buying as
$\text{Odds(Buy)} = \dfrac{P(\text{Buy})}{P(\text{No Buy})} = \dfrac{P(\text{Buy})}{1 - P(\text{Buy})}$ (4)
The odds of buying are constrained between 0 and $+\infty$ and take a value of 1 if both outcomes are
equally likely, i.e., P(Buy) = 0.5 and P(No Buy) = 0.5. We can make the odds lie between $-\infty$ and
$+\infty$ by taking the natural log transform. Thus,
$\text{Log(Odds(Buy))} = \text{Log}\left(\dfrac{P(\text{Buy})}{P(\text{No Buy})}\right)$ (5)
As the log odds lie between $-\infty$ and $+\infty$, we can relate them to any independent variables and
interpret the effects of the variables in a manner similar to that in OLS; only now the effect of the
variables would be on the log odds of the dependent variable. Thus, we can write the following
equation relating the log odds of purchase for an observation $t$ to the independent variables
($x_t$).
$\text{Log(Odds(Buy))}_t = x_t'\beta$ (6)
This can be rewritten as
$P(\text{Buy})_t = 1 / (1 + \exp(-x_t'\beta))$. (7)
Recall that P(Buy) is the probability of a purchase and hence should always be between 0
and 1. The above expression ensures that this will be the case irrespective of the values of the
covariates.
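The chain of transformations in Equations 4 to 7 can be written as three small functions. This is a minimal sketch; the function names are assumptions chosen for readability, and `xb` stands for the value of the linear index for one observation.

```python
import math

def odds(p):
    """Equation 4: odds of an event with probability p."""
    return p / (1.0 - p)

def log_odds(p):
    """Equation 5: natural log of the odds."""
    return math.log(odds(p))

def prob_from_index(xb):
    """Equation 7: probability implied by a linear index xb."""
    return 1.0 / (1.0 + math.exp(-xb))

# Equal chances give odds of 1 and log odds of 0.
print(odds(0.5), log_odds(0.5))

# Whatever the index value, the implied probability stays inside (0, 1).
for xb in (-10.0, 0.0, 10.0):
    print(xb, prob_from_index(xb))
```

Applying `prob_from_index` to `log_odds(p)` returns `p`, which confirms that Equation 7 is the inverse of the log-odds transform.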
We can now use the above model for our example. Table 14.9 shows the results of two
logistic regression models estimated using Maximum Likelihood Estimation (MLE): the intercept-only
model, where $x_t$ contains only the intercept, and the full model, where $x_t$ contains the intercept
and the RFM variables. The results of the full model show that the RFM variables are significant.
Further, an increase of 1 month in Recency is associated with an increase of 3.34 in the log odds of buying.
We can also calculate the effect on the odds of buying. This would be exp(3.34) or 28.28, i.e.,
increasing Recency by 1 month multiplies the odds of buying by a factor of about 28.28. A similar
analysis can be done for the other variables. Note that the RFM estimates are close to the true
values of the sensitivities (see Footnote 1). Also note that the frequency sensitivity is significant
in this analysis while it was not so using OLS. Thus, OLS can mask the true relationship between
variables and its results can lead to erroneous interpretations for cases when the dependent
variable is not continuous.
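To make the estimation step concrete, here is a minimal sketch of maximum likelihood estimation for a logistic regression with an intercept and a single covariate, using Newton-Raphson iterations. It is not the procedure used to produce Table 14.9: the data are simulated, and the "true" values `TRUE_B0` and `TRUE_B1`, the sample size and the function names are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, iterations=25):
    """Newton-Raphson MLE for P(y = 1) = sigmoid(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Information matrix (negative Hessian), a 2x2 symmetric matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) @ gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def score(b0, b1, x, y):
    """Gradient of the log-likelihood; it is zero at the MLE."""
    resid = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(resid), sum(r * xi for r, xi in zip(resid, x))

# Simulated data with assumed true parameters.
random.seed(0)
TRUE_B0, TRUE_B1 = -0.5, 1.0
x = [random.uniform(-2.0, 2.0) for _ in range(500)]
y = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * xi) else 0 for xi in x]

b0_hat, b1_hat = fit_logit(x, y)
# A one-unit increase in x multiplies the odds by exp(b1_hat).
odds_multiplier = math.exp(b1_hat)
print(b0_hat, b1_hat, odds_multiplier)
```

The last line mirrors the interpretation in the text: the exponential of a coefficient is the factor by which the odds change for a one-unit increase in that covariate.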
[Table 14.9 about here]
Figure 14.4 plots the predicted probabilities of Buying with the true value of Choice.
Notice that, in contrast to the predictions of the OLS regression (Figure 14.3), all predictions lie
between 0 and 1. Also, unlike the case of the OLS regression, a higher predicted value has the
interpretation of a higher probability of purchase. To see how this probability of purchase varies
with a change in one of the covariates, see Figure 14.5. In this figure, we plot the predicted
probability of purchase with change in frequency. For generating this figure, we fixed the
recency and monetary variables at their average values. The figure shows that probability of
purchase follows an S-shaped curve as the frequency increases.
[Figure 14.4 about here]
[Figure 14.5 about here]
In the above example, we modeled the purchase decision and then related it to RFM
measures. As the purchase variable was discrete, we specified the probability of purchase and no
purchase, measures that lie between 0 and 1. We then specified the odds of purchase and took a
log transform to make it lie between $-\infty$ and $+\infty$. We can apply the above framework to analyze
market shares (MS) as well. For instance, a brand manager might want to quantify the effect of
the region-specific prices and promotions on the market share in these regions. In this case, we
can begin the analysis by directly specifying the odds of market share since it already lies
between 0 and 1. Thus,
$\text{Log(Odds(MS))}_t = \text{Log}\left(\dfrac{MS_t}{1 - MS_t}\right) = x_t'\beta$ (8)
Here, for a region $t$, the vector $x_t$ will contain the prices and promotions for that region.
Measures of Fit
There are several measures of model fit that are used for testing the suitability of logistic
regression models. Most of these measures are based around the log-likelihood measure, which
is as follows.
$LL(\beta) = \sum_t \text{Ln}(L_t)$ (9)
Here, $L_t$ is the likelihood of observation $t$ and $\beta$ is the entire set of MLE parameters (the intercept and the coefficients of the explanatory variables).
Likelihood Ratio Test
The most commonly used likelihood ratio test has the following test statistic:
$-2\,(LL(C) - LL(\beta))$ (10)
Here, $LL(C)$ refers to the log-likelihood of the data when only an intercept model is run. Suppose
there are $K$ covariates in the model (including the intercept); then the above statistic is distributed
$\chi^2$ with $K-1$ degrees of freedom (Theil, 1971). Thus, the test statistic measures whether the
increase in the likelihood caused by the inclusion of the explanatory variables (over and above
the intercept) is significantly better than the likelihood from a model containing only the
intercept.
In our example, $-2\,LL(\beta)$ is 30.489 while $-2\,LL(C)$ is 137.628. Thus, the test statistic
takes a value of 107.139. The degrees of freedom are 4 - 1 = 3. The critical value of a $\chi^2$ with 3
degrees of freedom at the 0.001 level is 16.26. Thus, the likelihood of a model that has the RFM
measures is significantly better than a model with just the intercept.
Akaike Information Criterion (AIC)
AIC provides a way of adjusting the log-likelihood of a model for the number of
parameters in the model. This adjustment corrects for over fitting of the data. The expression for
this statistic is as follows.
$AIC = -2\,LL(\beta) + 2K$ (11)
Here, $K$ is the dimension of $\beta$. Lower values of AIC denote a better model. Thus, a model with a
very large number of variables might have a high likelihood, but it will also be penalized for the
number of variables.
In our example, we can calculate the AIC with the intercept-only model ($AIC_{int}$) and the
AIC associated with a model containing the intercept and RFM measures ($AIC_{full}$). These are as
follows.
$AIC_{int} = 137.628 + 2(1) = 139.628$, (12a)
$AIC_{full} = 30.489 + 2(4) = 38.489$. (12b)
Thus, the full model has a better (i.e. lower) AIC as compared to the intercept only model.
Likelihood Ratio Index ($\rho^2$)
The likelihood ratio index is similar to the $R^2$ in the regular regression models. It is
described as follows.
$\rho^2 = 1 - LL(\beta)/LL(C)$ (13)
Here, $LL(\beta)$ is -15.24 (= -30.489/2) and $LL(C)$ is -68.81 (= -137.628/2). Thus, the value of $\rho^2$ is
0.78.
Like the $R^2$, the $\rho^2$ of a model will always increase or at least stay the same when new
variables are added. There is another statistic, the adjusted likelihood ratio index ($\bar{\rho}^2$), that
penalizes for the increase in the number of parameters. This statistic is similar to the adjusted $R^2$.
$\bar{\rho}^2 = 1 - (LL(\beta) - K)/(LL(C) - 1)$ (14)
In our example, this statistic will be the following.
$\bar{\rho}^2 = 1 - (-15.24 - 4)/(-68.81 - 1) = 1 - 19.24/69.81 = 0.72$ (15)
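The likelihood-based fit measures in Equations 10 to 15 all derive from two numbers: the value of -2 times the log-likelihood for the full model and for the intercept-only model. The sketch below recomputes them from the figures quoted in the text; the function name `fit_measures` and the dictionary keys are assumptions made for illustration.

```python
def fit_measures(neg2ll_full, neg2ll_null, k):
    """Fit statistics from -2*log-likelihood values.

    neg2ll_full: -2 LL of the full model with k parameters
    neg2ll_null: -2 LL of the intercept-only model (1 parameter)
    """
    ll_full = -neg2ll_full / 2.0
    ll_null = -neg2ll_null / 2.0
    return {
        "lr_statistic": neg2ll_null - neg2ll_full,              # Equation 10
        "aic_null": neg2ll_null + 2 * 1,                        # Equation 12a
        "aic_full": neg2ll_full + 2 * k,                        # Equation 12b
        "rho2": 1.0 - ll_full / ll_null,                        # Equation 13
        "rho2_adjusted": 1.0 - (ll_full - k) / (ll_null - 1),   # Equation 14
    }

measures = fit_measures(neg2ll_full=30.489, neg2ll_null=137.628, k=4)
for name, value in measures.items():
    print(name, round(value, 3))

# Critical value quoted in the text for 3 degrees of freedom at the 0.001 level.
CRITICAL_VALUE = 16.26
print(measures["lr_statistic"] > CRITICAL_VALUE)
```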
Hit Rate
Another measure that is typically used to test the fit of a model is the hit rate. For
computing this measure, we take the predicted probabilities of the events from the logistic
regression and employ a cut off value for making discrete predictions for the occurrence of an
event. We then compare the predicted events with the actual events to determine the percentage
of times in the dataset the two are the same.
In our example, the two events are buy / no buy. The results from the logistic regression
estimation provide the probability of purchase. We put a cut-off at 0.5, i.e., if
the predicted probability of purchase for an observation is above 0.5, then we predict a purchase for that
observation; otherwise, we predict no purchase. We then compare these predictions with actual events.
We find that, using the full model, we correctly classify 94 out of the 100 observations. Thus, the
hit rate is 94 %.
The measure of hit rate as a statistic for model accuracy has a few limitations. First, the
cutoff is arbitrary. Here we took a cut off of 0.5. We could have chosen any other cutoff value as
well. Second, the hit rate is not very useful when the data is skewed. Suppose we have a dataset
where there are many observations with no purchase and few observations with purchase. Then a
model that predicts no purchase for all observations will do well on the hit rate.
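Both limitations can be seen in a few lines of code. The sketch below uses a hypothetical skewed sample of 100 observations with only 5 purchases and a deliberately uninformative model that assigns every observation the same purchase probability; the function name `hit_rate` and the numbers are assumptions for illustration.

```python
def hit_rate(probabilities, actual, cutoff=0.5):
    """Share of observations whose thresholded prediction matches the outcome."""
    predicted = [1 if p > cutoff else 0 for p in probabilities]
    hits = sum(1 for pred, act in zip(predicted, actual) if pred == act)
    return hits / len(actual)

# Hypothetical skewed sample: 5 purchases, 95 non-purchases.
actual = [1] * 5 + [0] * 95
# An uninformative model: the same low probability for everyone.
naive_probabilities = [0.05] * 100

# With a 0.5 cutoff the model predicts "no purchase" for all and looks accurate.
rate_default = hit_rate(naive_probabilities, actual)
# With a very low cutoff the same model predicts "purchase" for all.
rate_low_cutoff = hit_rate(naive_probabilities, actual, cutoff=0.01)
print(rate_default, rate_low_cutoff)
```

The same set of predicted probabilities yields a hit rate of 95% or 5% depending only on the cutoff, even though the model carries no information about who buys.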
In most applications, the data is also typically split into a calibration sample and a hold
out sample. The model is estimated on the calibration sample and then is used to predict the
observations in the hold out sample. Almost always, the hit rate within the hold out sample is
lower than the hit rate within the calibration sample.
Thus far, we have considered instances when the dependent variable is binary (or lies
between 0 and 1, e.g., market share) and logistic regression is readily applicable. There are also
scenarios where the dependent variable can take multiple values. For instance, in the
antihistamine category, there are 4 major drugs - Claritin, Zyrtec, Allegra and Clarinex. A doctor
might prescribe one of these drugs to a patient. It is of much interest to pharmaceutical
companies to quantify the factors which can predict when a doctor is most likely to prescribe
their drug. Analysis of situations that have a multinomial dependent variable is not possible with
a binary logistic regression. Next, we describe a method that can analyze such situations.
Multinomial Logit Model
Consider the case of a consumer packaged goods manufacturer in the grocery industry.
The company is interested in predicting which brands their customers will choose on a shopping
occasion and how prices and promotions might affect this choice. For example, Figure 14.6
shows the variation in market share of a brand with changes in promotion. In this figure, we find
that there is an increase in market share (shown in blue) whenever there is a dip in prices (shown
in red). Further, the presence of various promotional vehicles such as feature, display and
coupons affects these shares. A quantitative analysis of such a problem can help retailers
understand the effect of brand promotions (Gupta, 1988), aid in appropriately setting retail prices
and determine the product portfolio that they should carry (Draganska & Jain, 2005). A
multinomial logit model is the most popular model to analyze such scenarios. Next, we develop
this model within a random utility framework.
[Figure 14.6 about here]
Random Utility Theory
Assume that a consumer assigns a level of attractiveness to each discrete alternative in
her choice set. This attractiveness number for an alternative, a single index, conveys how much
the consumer likes that alternative. Thus, all the information present in the attributes of the
alternative is collapsed into this single index. This alternative-specific index is typically called
utility.
For an alternative $j$ and time $t$, we will specify the utility ($U_{jt}$) to be composed of two
components. One component is called the systematic component (denoted by $V_{jt}$). This is
deterministic and contains the effects of covariates on the utility. The second component is called
the random component (denoted by $\varepsilon_{jt}$). This contains any other random factors that affect the consumer's choice.
Thus,
$U_{jt} = V_{jt} + \varepsilon_{jt}$ (16a)
or,
$U_{jt} = x_{jt}'\beta + \varepsilon_{jt}$ (16b)
Here, for time $t$, $x_{jt}$ contains covariates associated with alternative $j$ and $\beta$ is a vector of
parameters.
We assume that decision makers choose the alternative that gives them the maximum
utility. Also, for all alternatives the random components are independent and identically Gumbel
distributed. This particular choice of the error distribution leads to the following expression for
the probability of choice of an alternative j out of the possible J alternatives in a choice set.
$P(\text{Choice} = j) = \dfrac{e^{x_{jt}'\beta}}{\sum_{k=1}^{J} e^{x_{kt}'\beta}}$ (17)
The above expression is intuitive to understand. The numerator can be interpreted as the strength
of alternative $j$ while the denominator is the sum of the strengths of all alternatives. Thus, the
probability expression essentially is the relative strength of alternative $j$. For a detailed
description of how this probability expression is obtained from the assumptions of the error
distributions, see Ben-Akiva and Lerman (1985) or Train (2003).
The above expression also shows that the logistic regression model is a special case of the
multinomial logit model with binary outcomes. Thus, we can also arrive at the expressions for
the probabilities of the logistic regression by beginning with a random utility specification for the
binary outcomes.
We can apply the above model to an example from the grocery industry. The data for this
example, made available by A.C. Nielsen, were collected between January 1993 and May 1995. We
use a sample of 300 people that purchased in the Breakfast Foods category. There are four major
brands in this category (see Footnote 2). For each brand, we have the price and promotion variation over time,
which enter the vector $x_{jt}$. In this application, promotion is a dummy variable created by
combining various promotional vehicles such as feature and display. Table 14.10 shows the
summary statistics for the Breakfast Foods data.
We estimate the parameters of the multinomial logit model using MLE on this data. Prior
to looking at the results, a set of identification conditions have to be discussed. These are
restriction conditions that must be imposed such that the model is identifiable i.e. only one set of
parameters will be maximizing the likelihood. The restriction corresponds to setting any one of
the brand intercepts to be zero. This is because only differences in utility matter in specifying
which brand a consumer will choose. This can be seen from the following illustration. Suppose
the four brands have the following utilities: U1t=10, U2t=20, U3t=25 and U4t=30. Then, a
consumer will choose Brand 4 as that has the maximum utility. Now, suppose we add 5 units to
each brand-specific utility. Then, the utilities for the four brands will be the following: U1t=15,
U2t=25, U3t=30 and U4t=35. This addition of 5 units will not change the chosen brand. A
consumer will still choose Brand 4. Thus, the absolute values of the utilities do not matter. It is
only the relative differences in the utilities among the brands that do. We will arbitrarily set the
intercept of Brand 4 to be zero. Thus, the intercepts of the other three brands will be interpreted
as being relative to Brand 4. This is similar to the interpretation of a dummy variable in a
regression.
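The identification argument can be verified with the multinomial logit probability expression itself. In the sketch below, the four illustrative utilities from the text are treated as systematic components (an assumption made for this example, since the text gives them as total utilities), and the function name `mnl_probabilities` is hypothetical. Adding the same constant to every alternative leaves all choice probabilities unchanged, which is why one intercept must be fixed.

```python
import math

def mnl_probabilities(v):
    """Multinomial logit probabilities from systematic utilities v.

    The maximum is subtracted before exponentiating for numerical
    stability; this does not change the result.
    """
    vmax = max(v)
    strengths = [math.exp(vj - vmax) for vj in v]
    total = sum(strengths)
    return [s / total for s in strengths]

utilities = [10.0, 20.0, 25.0, 30.0]          # Brands 1 to 4, from the text
shifted = [u + 5.0 for u in utilities]        # add 5 units to every brand

p = mnl_probabilities(utilities)
p_shifted = mnl_probabilities(shifted)

print(p)
print(p_shifted)
```

Both lines print the same probabilities, with Brand 4 the most likely choice in each case. Note that under the logit model the highest-utility brand is the most probable choice rather than a certain one, because of the random component.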
[Table 14.10 about here]
Table 14.11 contains the MLE estimates of two multinomial logit models. The brand-
specific intercepts only model contains the estimates for a model that contains only the
alternative specific intercepts. The full model contains both the intercepts and the price and
promotion covariates. The estimates of the full model show that the coefficient of price is
negative (as it should be) whereas the coefficient for promotion is positive (again as expected).
While these results are intuitive, a more managerially relevant goal is to estimate the impact of
changing the price (or promotion) of brand j on the probability of choice of brand j as well as
on the probability of choosing any other brand k. A variable used for quantifying such effects is
elasticity.
[Table 14.11 about here]
Elasticity from the Logit Model
The systematic component for a brand j contains price and promotion. Thus,
$V_{jt} = \alpha_j + \beta_{price} \cdot \text{Price}_{jt} + \beta_{prom} \cdot \text{Prom}_{jt}$. (18)
Here, $\alpha_j$ is the intercept for alternative $j$, $\beta_{price}$ is the price sensitivity and $\beta_{prom}$ is the promotion
sensitivity. The elasticity of any dependent variable with respect to an independent variable is the
percent change in the dependent variable following a 1% change in the independent variable. As
an example, suppose the price of brand j is changed, then the own-price elasticity can be
ascertained by estimating the percent change in the probability of purchasing brand j after a 1%
change in its price. Similarly, the cross-price elasticity on a brand k can be evaluated by
considering the percent change in the probability of purchasing brand k following a 1% change
in price of brand j.
For the multinomial logit model, the expressions for the own-price and cross-price
elasticity are closed-form and are determined by the multinomial logit probabilities. These
expressions are as follows.
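$\eta^{price}_{jj} = \frac{\partial P(j)}{\partial \mathrm{Price}_j} \cdot \frac{\mathrm{Price}_j}{P(j)} = \beta_{price}\,\mathrm{Price}_j\,(1 - P(j))$ (19a)
$\eta^{price}_{kj} = \frac{\partial P(k)}{\partial \mathrm{Price}_j} \cdot \frac{\mathrm{Price}_j}{P(k)} = -\beta_{price}\,\mathrm{Price}_j\,P(j)$ (19b)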
Here, $\eta^{price}_{jj}$ denotes the own-price elasticity of brand j, that is, the percentage change in the
probability of buying brand j following a 1% change in the price of brand j. Similarly, $\eta^{price}_{kj}$ is
the cross-price elasticity of brand k, that is, the percentage change in the probability of buying
brand k following a 1% change in the price of brand j. Notice that the cross-price elasticity for
brand k does not depend on the attributes of brand k. Thus, the cross-price elasticity arising
from a change in the price of brand j is the same for all other brands. This property, termed
uniform cross-elasticity, is a consequence of the form of the multinomial logit probabilities.
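As an illustration of Equations (19a) and (19b), the sketch below evaluates the elasticities at the average prices and promotion levels of Table 14.10, using the full-model estimates of Table 14.11. This evaluation point is our assumption. The chapter averages observation-level elasticities instead, so the numbers printed here will not match Table 14.12. The function names are hypothetical.

```python
import math

# Full-model estimates from Table 14.11 (Brand 4 intercept normalized to 0).
alpha = [0.06, -0.64, 1.91, 0.0]
beta_price, beta_prom = -3.03, 0.44

# Average price and promotion per brand from Table 14.10 (illustrative evaluation point).
price = [1.75, 1.58, 1.91, 1.94]
prom = [0.07, 0.04, 0.09, 0.01]

def choice_probs(alpha, price, prom):
    """Multinomial logit probabilities from the systematic utilities of Eq. (18)."""
    v = [a + beta_price * p + beta_prom * d for a, p, d in zip(alpha, price, prom)]
    m = max(v)
    ev = [math.exp(x - m) for x in v]
    s = sum(ev)
    return [e / s for e in ev]

def price_elasticities(probs, price):
    """elas[j][k]: percent change in P(k) for a 1% change in the price of brand j."""
    n = len(probs)
    elas = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j == k:
                elas[j][k] = beta_price * price[j] * (1.0 - probs[j])  # Eq. (19a)
            else:
                elas[j][k] = -beta_price * price[j] * probs[j]         # Eq. (19b)
    return elas

probs = choice_probs(alpha, price, prom)
elas = price_elasticities(probs, price)
for row in elas:
    print([round(e, 2) for e in row])
```

Within each row, every off-diagonal entry is identical, which is the uniform cross-elasticity property in numerical form.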
Table 14.12 contains the price elasticity measures for the full model. To estimate these
elasticity measures, we calculate the own- and cross-price elasticities for each brand and for every
observation. Then, we average these measures over all observations in the dataset. We can use
these numbers to interpret the impact of changing the price of a brand on its own share as well as
on the shares of other brands. We find that a 1% increase in the price of Brand 1 lowers the
probability of choosing Brand 1 by about 4.5%. Similarly, a 1% increase in the price of Brand 1
increases the probability of choosing each of the other brands by 0.85%. A similar analysis can be
conducted for the other brands.
[Table 14.12 about here]
The elasticity measures also show an interesting property. From the summary statistics,
we know that Brand 3 has the highest share. If we now consider the elasticity measures, we
notice that Brand 3 has the lowest own-price elasticity (in absolute value) and the highest
cross-price elasticity. This pattern follows from Equations (19a) and (19b): the own-price
elasticity is proportional to 1 - P(j), while the cross-price elasticity is proportional to P(j).
It is a limitation of the elasticity measures resulting from the multinomial logit model, i.e.,
high market share brands show low own-price elasticity and high cross-price elasticity.
Note that we showed elasticity measures for the multinomial logit model. Similar
measures can also be calculated for the logistic regression model.
Fit Measures
In this application, we can calculate all the fit measures that we specified in the section
on logistic regression.
Likelihood Ratio Test
A typical likelihood ratio test compares a model with only alternative-specific
intercepts with a model that contains the alternative-specific intercepts together with other
explanatory variables.
Let LL(C) refer to the log-likelihood of the data when only intercepts are included in the
model, while LL($\beta$) denotes the log-likelihood when the model contains intercepts together with the
price and promotion covariates. In our example, from Table 14.11, -2LL($\beta$) is 3033.92 while
-2LL(C) is 4321.88. Thus, the test statistic takes a value of 4321.88 - 3033.92 = 1287.96. The
degrees of freedom are 5-3=2. The critical value of a $\chi^2$ distribution with 2 degrees of freedom
at the 0.001 level is 13.82. Thus, the model that contains the price and promotion covariates fits
significantly better than the model without them.
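The arithmetic of this test can be reproduced with a short Python sketch. It uses the -2LL values from Table 14.11 and relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2). This shortcut is our simplification and does not carry over to other degrees of freedom.

```python
import math

neg2ll_intercepts = 4321.88  # -2 LL(C), intercepts-only model (Table 14.11)
neg2ll_full = 3033.92        # -2 LL(beta), full model (Table 14.11)
df = 5 - 3                   # full-model parameters minus intercepts-only parameters

lr_stat = neg2ll_intercepts - neg2ll_full

# Chi-square with 2 degrees of freedom: P(X > x) = exp(-x / 2),
# so the critical value at significance level a is -2 * ln(a).
assert df == 2, "the closed form below is valid only for 2 degrees of freedom"
significance = 0.001
critical_value = -2.0 * math.log(significance)
p_value = math.exp(-lr_stat / 2.0)

print(round(lr_stat, 2), round(critical_value, 2), lr_stat > critical_value)
```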
Likelihood Ratio Index ($\rho^2$)
The likelihood ratio index is defined as follows.
$\rho^2 = 1 - LL(\beta)/LL(C)$ (20)
In the current application, LL($\beta$) is -1516.96 and LL(C) is -2160.94. Thus, $\rho^2$ is 0.30.
As explained earlier, the adjusted $\bar{\rho}^2$ has the following expression.
$\bar{\rho}^2 = 1 - (LL(\beta) - K)/(LL(C) - P)$ (21)
Here, K is the total number of parameters including the intercepts and other covariates while P is
the number of intercepts. With K = 5 and P = 3, $\bar{\rho}^2$ is about 0.297, only slightly below the
unadjusted value of 0.298.
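Equations (20) and (21) are easy to verify. The sketch below plugs in the log-likelihoods quoted above, with K = 5 parameters in the full model and P = 3 intercepts. The variable names are ours.

```python
ll_full = -1516.96        # LL(beta), full model
ll_intercepts = -2160.94  # LL(C), intercepts-only model
K, P = 5, 3               # parameters in the full model, number of intercepts

rho_sq = 1.0 - ll_full / ll_intercepts                  # Eq. (20)
rho_sq_adj = 1.0 - (ll_full - K) / (ll_intercepts - P)  # Eq. (21)

print(round(rho_sq, 3), round(rho_sq_adj, 3))
```

Under Equation (21) as printed, the adjustment lowers the index only in the third decimal place, because five parameters are small relative to log-likelihoods of this magnitude.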
Hit Rate
In a multinomial logit model, the probabilities of choice of each alternative have a closed
form expression. The predicted probabilities for choosing each alternative can then be easily
calculated by inserting the MLE estimates in the probability expressions. We can then predict the
alternative that is most likely to be chosen (brand with the highest probability) and compare it
with the brand that is actually chosen. If the two are the same, we have a hit (i.e., a correct
prediction); otherwise the prediction is wrong. We calculate the hit rate for the intercepts-only
model and the full model. Table 14.11 reports these results. We find that the hit rate for the
intercepts-only model is around 48.3% while the hit rate for the full model is considerably higher
at 63.2%.
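A hit rate calculation can be sketched as follows. The predicted probabilities and observed choices below are hypothetical and serve only to show the mechanics. In practice the probabilities would come from the estimated model.

```python
def hit_rate(pred_probs, chosen):
    """Share of observations whose highest-probability alternative was actually chosen."""
    hits = 0
    for probs, actual in zip(pred_probs, chosen):
        predicted = max(range(len(probs)), key=lambda j: probs[j])
        hits += int(predicted == actual)
    return hits / len(chosen)

# Hypothetical predicted probabilities for four brands on five purchase occasions.
pred_probs = [
    [0.10, 0.20, 0.60, 0.10],
    [0.50, 0.20, 0.20, 0.10],
    [0.25, 0.15, 0.45, 0.15],
    [0.05, 0.70, 0.20, 0.05],
    [0.30, 0.10, 0.40, 0.20],
]
chosen = [2, 0, 0, 1, 3]  # zero-based index of the brand actually bought

rate = hit_rate(pred_probs, chosen)
print(rate)
```

An intercepts-only model predicts the same brand on every occasion, namely the one with the largest share. This is presumably why its hit rate of 48.3% is so close to the 48.36% market share of Brand 3 in Table 14.10.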
Independence of Irrelevant Alternatives (I.I.A.)
The multinomial logit model has several properties. One property that we discussed was
the uniform cross-elasticity. Another property that has been especially emphasized is that of
I.I.A. The property can be best illustrated by revisiting the expressions for the probabilities from
the logit model.
Suppose we consider the probability of choice of two alternatives, i and j, denoted by P(i)
and P(j) respectively, then,
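$\frac{P(i)}{P(j)} = \frac{e^{x_{it}'\beta} / \sum_{k=1}^{J} e^{x_{kt}'\beta}}{e^{x_{jt}'\beta} / \sum_{k=1}^{J} e^{x_{kt}'\beta}} = \frac{e^{x_{it}'\beta}}{e^{x_{jt}'\beta}}$ (22)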
Equation (22) shows that the ratio of the probabilities of choosing two alternatives, i and j, is
independent of the presence of other alternatives and depends only on the systematic utilities
of the two alternatives. Thus, even if a new alternative very similar to i enters the market,
it will not make a difference to the relative probabilities of choosing i and j. This result is a
direct consequence of the independence assumption among the errors of the alternative-specific
utilities. This assumption can be quite tenuous in many contexts. The following problem
illustrates one such context.
There is a famous problem, called the red bus/ blue bus problem, which illustrates the
I.I.A. issue. The problem is as follows. Suppose consumers are choosing between a car and a
blue bus as means of transportation and suppose they equally like both modes of transport. The
probability of choosing either a car or a blue bus is 0.5. In other words,
P(choose car) / P(choose blue bus) = 1. (23)
Recall, the I.I.A. property dictates that this ratio should remain the same irrespective of the
choice set. Now, suppose a red bus, similar to the blue bus in all respects except the color, is
introduced as a means of transport. Then, we would expect that consumers will be equally likely
to choose a red or a blue bus. This equality together with the above equality will imply the
following.
P(choose car) = P(choose blue bus) = P(choose red bus) = 1/3. (24)
This result is not appealing, as consumers will most likely consider both bus types as
one alternative. If this is the case, then the following probabilities are more reasonable.
P(choose car) = 1/2 ; P(choose red bus) = P(choose blue bus) = 1/4. (25)
Thus, the I.I.A. property can constrain the probabilities in such a way that in some
contexts, we can get results that are unrealistic. There are several ways of correcting this
problem. One alternative is to allow for a tree structure for consumer choice. We can achieve this
with a nested logit model (Ben-Akiva, 1973) that allows for correlation among the utilities of
alternatives only within a nest. A second alternative is to allow for heterogeneity in customers'
parameters; then, at the aggregate level, the IIA property disappears (see Chapter 19 of this book).
A third method is to allow the brand utilities to be correlated, as is done by the multinomial
probit model. We discuss this last method later.
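The red bus / blue bus result can be reproduced with the logit formula. In the sketch below every alternative is given a systematic utility of zero. That specific value is our assumption, since any common value gives the same probabilities.

```python
import math

def logit_probs(utilities):
    """Multinomial logit probabilities for a dict of systematic utilities."""
    ev = {name: math.exp(v) for name, v in utilities.items()}
    total = sum(ev.values())
    return {name: e / total for name, e in ev.items()}

before = logit_probs({"car": 0.0, "blue bus": 0.0})
after = logit_probs({"car": 0.0, "blue bus": 0.0, "red bus": 0.0})

ratio_before = before["car"] / before["blue bus"]
ratio_after = after["car"] / after["blue bus"]

print(before)
print(after)
print(ratio_before, ratio_after)
```

The ratio P(choose car) / P(choose blue bus) stays at 1 after the red bus is added, so the logit model is forced to Equation (24) and cannot produce the more plausible split of Equation (25).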
Sampling of Alternatives
In the above analysis, we just had four alternatives. There are many instances, however,
where the number of alternatives can be much larger. For example, if retail store managers want
to evaluate the effect of price and promotion at the UPC level rather than at the brand
level, then the number of alternatives can be in the hundreds. In that case, evaluating the
denominator (the sum of strengths of all alternatives) in the probability expression of choosing a
particular alternative becomes computationally burdensome. One method for circumventing this problem is to sample
a set of alternatives from the entire set of possible alternatives and then evaluate the probabilities.
The following example illustrates this method.
Suppose we wish to model consumers choosing a mutual fund from all available mutual
funds. There are many mutual funds that consumers can choose from - the latest figures suggest
that there are more than 8000 mutual funds in the US alone (Investment Company Institute,
2005). We definitely cannot use all 8000 or more of these funds while evaluating the probability
of choosing a specific one. What we can do is to randomly sample a small number of these
mutual funds, for example 10, to form the set of alternatives. While sampling, for each
observation we have to ensure that the mutual fund chosen for that observation is in the
constructed set of alternatives (else how could the consumer have chosen that mutual fund if it
were not in her set of alternatives?). To ensure this, we include the mutual fund that was chosen
on that observation and randomly sample 9 others from the remaining alternatives. We can then
estimate the parameters of the model in exactly the same manner as described above in the
Breakfast Foods example. For the multinomial logit model, the MLE parameter estimates obtained
from a set of alternatives constructed by such a random sampling scheme are consistent for the
same parameters as those estimated from all the alternatives, although the two sets of estimates
are not numerically identical in any given sample. For a more detailed description of sampling,
see Ben-Akiva and Lerman (1985).
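The construction of a sampled choice set can be sketched as follows. The fund identifiers and the function name are hypothetical. The only features taken from the text are that the chosen alternative is always included and that the remaining 9 are drawn at random from the rest.

```python
import random

def sample_choice_set(chosen, all_alternatives, set_size=10, rng=random):
    """Return the chosen alternative plus (set_size - 1) others drawn without replacement."""
    others = [a for a in all_alternatives if a != chosen]
    return [chosen] + rng.sample(others, set_size - 1)

funds = [f"fund_{i}" for i in range(8000)]  # hypothetical identifiers for 8000 funds
rng = random.Random(0)                       # seeded generator so the draw is reproducible

choice_set = sample_choice_set("fund_4242", funds, set_size=10, rng=rng)
print(choice_set)
```

A separate choice set would be drawn in this way for every observation before the likelihood is evaluated.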
There are other methods for sampling of alternatives, such as importance sampling. This
sort of sampling scheme is typically used when there is a need to oversample an alternative. For
example, in the previous example of mutual funds, suppose we find that many consumers choose
one of a few mutual funds. A sampling scheme should then take these skewed choices into
account when selecting samples, and an importance sampling scheme does exactly that. Here, while
estimating the model, a correction factor is included to account for the non-random sampling.
Train, Ben-Akiva, and Atherton (1989) show an application of such a sampling scheme in the
context of consumers choosing long distance plans and minutes of consumption.
Multinomial Probit Model
There are several instances when there is a need to allow the utility errors of the
alternatives to be correlated. For example, consumers typically choose between different modes
of transport such as bus, car, train and others. They can also be using a combination of these
alternatives for commuting e.g., a mix of car and train (Currim, 1982). In such a scenario, the
errors in the utility of choosing a car, train and the alternative representing a combination of car
and train can be correlated (i.e., cannot be assumed to be independent). Clearly, we need a model
that is flexible enough to capture any possible correlation.
A multinomial probit model allows the utility errors to be correlated and to have
different variances (i.e., different scales for different alternatives). It also requires several
identification restrictions (Keane, 1992). We show these restrictions in the simplest setting, a
choice model with three alternatives. The utilities for the three alternatives are given as follows.
$U_{1t} = \alpha_1 + x_{1t}'\beta_1 + \varepsilon_{1t}$
$U_{2t} = \alpha_2 + x_{2t}'\beta_2 + \varepsilon_{2t}$
$U_{3t} = \alpha_3 + x_{3t}'\beta_3 + \varepsilon_{3t}$ (26)
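Here, we have intentionally separated the intercepts from the other covariates to show the
identification conditions. Also note that we have assumed a general model where the parameters for
the covariates are alternative-specific. The errors are assumed to have the following
distributional specification.
$\begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\\ \varepsilon_{3t}\end{pmatrix} \sim N\left(0, \begin{pmatrix}\sigma_{11} & \sigma_{12} & \sigma_{13}\\ \sigma_{21} & \sigma_{22} & \sigma_{23}\\ \sigma_{31} & \sigma_{32} & \sigma_{33}\end{pmatrix}\right)$ (27)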
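As only differences in utilities matter, we can rewrite the above utilities in the following manner.
$Y_{1t} = U_{1t} - U_{3t} = (\alpha_1 - \alpha_3) + (x_{1t}'\beta_1 - x_{3t}'\beta_3) + (\varepsilon_{1t} - \varepsilon_{3t})$
$Y_{2t} = U_{2t} - U_{3t} = (\alpha_2 - \alpha_3) + (x_{2t}'\beta_2 - x_{3t}'\beta_3) + (\varepsilon_{2t} - \varepsilon_{3t})$ (28)
Let $\eta_{1t}$ be $(\varepsilon_{1t} - \varepsilon_{3t})$ and $\eta_{2t}$ be $(\varepsilon_{2t} - \varepsilon_{3t})$; then the joint distributional specification is as
follows.
$\begin{pmatrix}\eta_{1t}\\ \eta_{2t}\end{pmatrix} \sim N\left(0, \begin{pmatrix}\omega_{11} & \omega_{12}\\ \omega_{21} & \omega_{22}\end{pmatrix}\right)$ (29)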
We now state the identification conditions. First, note that only the differences in the
intercepts, $(\alpha_1 - \alpha_3)$ and $(\alpha_2 - \alpha_3)$, enter $Y_{1t}$ and $Y_{2t}$. Thus, it is only these differences among the
intercepts, and not their absolute values, that are estimable. We can, therefore, without loss of
generality set $\alpha_3$ to 0. Second, unlike a linear regression where the dependent variable is
observable, utilities are latent (i.e., unobservable) and we have to set their scale. We do this by
setting one of the variances of the differenced errors ($\omega_{11}$ or $\omega_{22}$) to 1. Let $\omega_{11}$ be set to 1;
then only $\omega_{12}$ and $\omega_{22}$ are estimable. Here, the parameter $\omega_{12}$ captures
any correlation between the differenced utilities and, therefore, the IIA problem is no longer a
concern. In empirical applications, the estimate of $\omega_{12}$ will suggest whether there is correlation
present among the utilities. If, in an application, the estimate is significantly different from zero,
then a multinomial logit model is inappropriate for that application as it does not
allow for utilities to be correlated.
Note that not all parameters of the original covariance matrix of the non-differenced
utilities are identified. In general, if there are J alternatives, then the original covariance matrix
contains J(J+1)/2 parameters. Of these, upon taking the difference of utilities and imposing the
identification conditions, only J(J-1)/2 - 1 parameters are identified (Train, 2003). In the above
formulation, we had 3 alternatives; thus the original covariance matrix has (3)(4)/2 = 6
parameters. Of these, only (3)(2)/2 - 1 = 2 parameters are identified.
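The differencing, the scale normalization and the parameter count can be illustrated together. In the sketch below the 3x3 covariance matrix is hypothetical, and the function names are ours. The differencing is taken relative to the third alternative, as in Equation (28).

```python
def difference_covariance(sigma):
    """Covariance of (e1 - e3, e2 - e3) given the 3x3 covariance sigma of (e1, e2, e3)."""
    m = [[1, 0, -1],
         [0, 1, -1]]  # differencing matrix relative to alternative 3
    ms = [[sum(m[i][k] * sigma[k][j] for k in range(3)) for j in range(3)]
          for i in range(2)]
    # omega = m * sigma * m'
    return [[sum(ms[i][k] * m[j][k] for k in range(3)) for j in range(2)]
            for i in range(2)]

def normalize_scale(omega):
    """Fix the scale of utility by dividing through so that the first variance equals 1."""
    s = omega[0][0]
    return [[v / s for v in row] for row in omega]

def count_parameters(J):
    """Covariance parameters before differencing, and identified after normalization."""
    return J * (J + 1) // 2, J * (J - 1) // 2 - 1

# Hypothetical covariance matrix of the three utility errors.
sigma = [[1.0, 0.5, 0.0],
         [0.5, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

omega = difference_covariance(sigma)
omega_normalized = normalize_scale(omega)
print(omega)
print(omega_normalized)
print(count_parameters(3))
```

After normalization only the off-diagonal element and the second variance remain free, which matches the count of 2 identified parameters for J = 3.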
We now consider an application where a trinomial probit choice model is applied. This
application is from Keane (1992). The application considers the employment choices of men and
models three choices: manufacturing (M), nonmanufacturing (NM) and unemployment. The
data for this model are from a national longitudinal survey of men. Table 14.13 lists the
independent variables together with the parameter estimates. In this application, the intercept
for unemployment is set to zero and the variance for the utility for M is set to 1. Note that
independent variables are
allowed to have different effects on M and NM. For example, the model allows education to
have different effects on manufacturing and nonmanufacturing. Finally, there are two sets of
parameters: one set is estimated when the correlation among the utilities ($\omega_{12}$) is
set to 0 and the variance of NM is set to 1 (Model-1). The other set is obtained
when both the correlation and the variance are estimated (Model-2).
[Table 14.13 about here]
The results show that Model-2 is only marginally better than Model-1 in terms of the log-likelihood
(-10,299.65 versus -10,300.71). Further, the estimated correlation is positive (0.64), although,
given the 0.37 reported in parentheses alongside it in Table 14.13, it is at best marginally
different from zero. Also note that there are differences in the estimated parameters of Model-1
and Model-2. This implies that allowing for a correlation among the utilities affects the
estimation of other parameters in the model. The positive correlation also suggests that a
multinomial logit model may not have been appropriate for this setting, as it would have failed to
capture this correlation.
The multinomial probit model provides a flexible way of capturing the correlations that
might be present among the utilities. This alleviates the IIA problem that is inherent in the
multinomial logit model.
Tobit Analysis
Tobit models are part of a general class of models for analyzing censored data. These
types of data are encountered when, for a large number of observations, the dependent variable is
clustered at a certain value (Tobin, 1958). For example, in a large-scale study of the number
of hours that married women work, it was found that about 66% of respondents reported zero
hours (Greene & Quester, 1982). We will show that analyzing such censored data without
accounting for censoring leads to biased estimates.
There are many other scenarios where such censoring is observed. For example, in
grocery settings consumers either do not purchase a brand or purchase a positive quantity (Jedidi,
Ramaswamy, & DeSarbo, 1993; Tellis, 1988). Technically then, any demand modeling must
employ a tobit modeling framework as quantity is inherently non-negative (i.e., is censored) and
has a cut-off at zero. There are several ways of modeling such a demand situation. If the focus is
on modeling the demand of a single alternative, then a censored regression is typically the chosen
method. We will show an example of this methodology. If, however, the focus is on modeling
both the choice of an alternative and the quantity demanded subsequent to the choice, then a
two-stage regression is usually adopted (Tellis, 1988). In this framework, the choice of alternative
and the quantity demanded are assumed to be interconnected, i.e., the errors in the utility of
alternatives are correlated with the error in the demand model. This correlation captures any
selectivity bias (Heckman, 1979). For example, consumers may buy more of their preferred
brand but less of a brand that is chosen on a promotion.
We now illustrate a censored regression analysis. In general, a censored regression can be
expressed as follows.
$q_t^* = x_t'\beta + \varepsilon_t$ (30)
Here, for an observation t, the random variable $q_t^*$ is only partially observable. The error $\varepsilon_t$
is normally distributed, $N(0, \sigma^2)$. The observed value is $q_t$, the quantity observed
for observation t, which equals $q_t^*$ when $q_t^*$ is greater than zero. The observed value is zero
if $q_t^*$ is less than or equal to zero. In other words,
$q_t = \begin{cases} q_t^* & \text{if } q_t^* > 0 \\ 0 & \text{if } q_t^* \le 0 \end{cases}$ (31)
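The expected value of $q_t$ is $E(q_t) = E(q_t \mid q_t^* > 0)\,P(q_t^* > 0)$. Thus,
$E(q_t \mid q_t^* > 0) = x_t'\beta + E(\varepsilon_t \mid \varepsilon_t > -x_t'\beta) = x_t'\beta + \sigma\,\frac{\phi(x_t'\beta/\sigma)}{\Phi(x_t'\beta/\sigma)}$ (32)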
Here, $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and the cumulative distribution function, respectively, of
the standard normal distribution. The above equation is similar to a standard OLS model with an
additional term that corrects for censoring. We can estimate the above model together with the
correction factor to yield consistent estimates. Notice that if we did not include the correction
factor, a regular OLS estimation would lead to biased estimates due to the omitted variable.
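To see how large the correction term in Equation (32) can be, the sketch below evaluates the conditional and unconditional means for a few values of $x_t'\beta$ with $\sigma = 1$. These values are our choice for illustration. The standard normal density and distribution function are built from Python's math module.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_conditional_mean(xb, sigma):
    """E(q | q* > 0) = xb + sigma * phi(xb / sigma) / Phi(xb / sigma), as in Eq. (32)."""
    z = xb / sigma
    return xb + sigma * norm_pdf(z) / norm_cdf(z)

def tobit_unconditional_mean(xb, sigma):
    """E(q) = P(q* > 0) * E(q | q* > 0), with P(q* > 0) = Phi(xb / sigma)."""
    return norm_cdf(xb / sigma) * tobit_conditional_mean(xb, sigma)

for xb in (-1.0, 0.0, 1.0, 3.0):
    print(xb,
          round(tobit_conditional_mean(xb, 1.0), 4),
          round(tobit_unconditional_mean(xb, 1.0), 4))
```

The correction term is largest when $x_t'\beta$ is small relative to $\sigma$, that is, when censoring is likely, and it shrinks toward zero as censoring becomes rare.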
Bomberger (1993) used the above censored regression model to estimate the impact of
income and wealth on household deposits. There were 4262 households in the dataset out of
which 290 households had no deposits. Therefore, for these households the dependent variable is
censored at zero.
Bomberger estimates the above model and compares the results with an OLS regression.
Table 14.14 shows these results. We can make several observations from these estimates. First,
the intercept takes very different values in the two equations. Second, the coefficient of wealth
is positive and marginally significant in the tobit model, while it is negative and non-significant
in the OLS regression. This implies that a failure to properly model the censoring can alter both
the sign and significance of the estimates.
[Table 14.14 about here]
Conclusions
In this chapter, we discussed several methods that are applicable in scenarios wherein the
dependent variable is either discrete (e.g. choice of brand) or constrained in such a manner (e.g.
market share) that a linear regression with OLS estimation fails to be the best alternative.
We began the chapter with a discussion on discriminant analysis. We showed that this
method is applicable for a discrete dependent variable (predetermined groups). In this context,
we also showed how to determine the independent variables that best discriminate among groups
and how to calculate their relative importance for discrimination.
Next, we discussed logistic regression. We showed that this method is suitable for both
binary discrete dependent variables (e.g., a buy / no buy situation) and dependent variables that
lie between 0 and 1 (e.g., market share). Thus, this method is applicable to a wider set of
situations than a two-group discriminant analysis.
We extended the logistic regression model to multinomial choice models that are suitable
for scenarios with a dependent variable that can take multiple values. In this context, we showed
the multinomial logit and probit models. The former is the most frequently used choice
model as it provides closed-form probability expressions. It has a few limitations: it suffers
from the IIA property, and the elasticity expressions are constrained to show a particular
substitution pattern among alternatives (e.g., own-price elasticity is smaller for higher share
brands). The multinomial probit model alleviates the IIA problem but at the expense of
closed-form probability expressions. We noted that for applications where the unobserved factors
affecting the available alternatives are correlated (e.g., the red bus / blue bus problem), a
multinomial probit model is more appropriate than a multinomial logit model.
We ended the chapter with a discussion of censored regression models or tobit models.
These models are a combination of a binary probit and a multiple regression and are applicable
in a wide range of scenarios where there is censoring of the data (e.g. demand of a good).
ENDNOTES
1. This is a synthetic dataset. For generating this data, we set the sensitivities to Recency,
Frequency and Monetary at 2.3, 0.3 and 0.1 respectively.
2. We use brand and alternative interchangeably in this example. Here the four alternatives
correspond to four different brands.
REFERENCES
Bellman, S., Lohse, G. L., & Johnson, E. J. (1999). Predictors of online buying behavior.
Communications of the ACM, 42(12), 32-38.
Ben-Akiva, M. (1973). Structure of passenger travel demand models. Ph.D. dissertation,
Department of Civil Engineering, MIT, Cambridge, MA.
Ben-Akiva, M., & Lerman, S. (1985). Discrete choice analysis: Theory and application to travel
demand. Cambridge, MA: MIT Press.
Bomberger, W. A. (1993). Income, wealth and household demand for deposits. The American
Economic Review, 84(4), 1034-1044.
Currim, I. S. (1982). Predictive testing of consumer choice models not subject to independence
of irrelevant alternatives. Journal of Marketing Research, 19, 208-222.
Draganska, M., & Jain, D. (2005). Product line length as a competitive tool. Journal of
Economics and Management Strategy, 14(1), 1-28.
Greene, W. H., & Quester, A. (1982). Divorce risk and wives' labor supply behavior. Social
Science Quarterly, 63, 16-27.
Gupta, S. (1988). Impact of sales promotions on when, what, and how much to buy. Journal of
Marketing Research, 25, 342-355.
Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153-161.
Investment Company Institute (2005). ICI Factbook. Retrieved October 16, 2005, from
http://www.ici.org/.
Jedidi, K., Ramaswamy, V., & DeSarbo, W. S. (1993). A maximum likelihood method for latent
class regression involving a censored dependent variable. Psychometrika, 58(3), 375-394.
Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis. Upper Saddle
River, NJ: Prentice Hall.
Keane, M. P. (1992). A note on identification in the multinomial probit model. Journal of
Business & Economic Statistics, 10(2), 193-200.
Lehmann, D. R., Gupta, S., & Steckel, J. H. (1998). Marketing research. New York: Addison-Wesley.
Tellis, G. J. (1988). Advertising exposure, loyalty and brand purchase: A two-stage model of
choice. Journal of Marketing Research, 25, 134-144.
Theil, H. (1971). Principles of econometrics. New York: Wiley.
Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26,
24-36.
Train, K. (2003). Discrete choice methods with simulation. Cambridge: Cambridge
University Press.
Train, K., Ben-Akiva, M., & Atherton, T. (1989). Consumption patterns and self-selecting tariffs.
Review of Economics and Statistics, 71(1), 62-73.
Table 14.1
Southern Versus Non-Southern States
                        Variable Means                     Discriminant Function
Variable              South    Non-South   One-Way F   Unstandardized   Standardized
Average Income         4.95       5.91       17.13         -0.43           -0.32
Population             4.45       4.19        0.05          0.92            4.15
Population Change      1.37       1.19        0.30         -0.37           -0.39
Percent Urban         57.00      58.37        0.03          0.02            0.33
Tax Per Capita       464.13     618.23       26.97         -0.01           -0.87
Government Expen.    286.13     281.54        0.00          0.00            0.75
College Enrollment   165.20     192.45        0.14         -0.01           -2.03
Mineral Production  2006.27     610.49        6.84          0.00            0.03
Forest Acres          15.93      14.70        0.05          0.01            0.15
Manuf. Output          6.77       8.66        0.40         -0.27           -2.65
Farm Receipts       1801.73    1943.37        0.06         -0.00           -0.40
Chi-Square: 42.71
Degrees of Freedom: 11
Canonical Correlation: 0.80
Wilks' Lambda: 0.36
Source: Lehmann, Gupta and Steckel, Marketing Research. Page 668 (Addison-Wesley Educational Publishers Inc., 1998)
Table 14.2
Hit Miss Table
                   Predicted Group
Actual Group     South    Non-South
South              14         1
Non-South           3        32
Source: Lehmann, Gupta and Steckel, Marketing Research. Page 668 (Addison-Wesley Educational Publishers Inc., 1998)
Table 14.3
Averages for the Five Food Expenditure Groups
                                                Group
                                  1        2         3         4        5
Variables                       < $15   $15-$29   $30-$44   $45-$59   > $60
Education of wife                3.32     4.11      4.29      4.47     4.49
Education of husband             2.79     3.75      4.08      4.57     4.69
Age                              4.09     3.46      3.06      2.50     2.72
Income                           1.62     2.06      2.75      3.47     3.75
Family size                      2.09     2.52      3.13      4.14     5.11
How often they shop              1.91     2.18      2.27      2.29     2.62
Number of brands shopped for     1.82     2.25      2.34      2.25     2.72
Information sought               1.91     1.91      1.81      1.84     1.87
Sample size                        34      284       293       181       61
Source: Lehmann, Gupta and Steckel, Marketing Research. Page 670 (Addison-Wesley Educational Publishers Inc., 1998)
Table 14.4
Discriminant Functions
                                   Unstandardized Coeff.             Standardized Coeff.
Variables                         1       2       3       4       1       2       3       4
Education of wife               0.02    0.01   -0.56    0.42    0.02    0.01   -0.70    0.52
Age                            -0.01    0.55    0.20   -0.13   -0.01    0.81    0.29   -0.20
Income                         -0.25   -0.29    0.21   -0.62   -0.41   -0.43    0.30   -0.89
Family size                    -0.58    0.40    0.21    0.38   -0.77    0.56    0.29    0.53
How often they shop            -0.29    0.35   -0.80   -0.25   -0.20    0.24   -0.58   -0.18
Number of brands shopped for   -0.01    0.26   -0.28   -0.30   -0.01    0.37   -0.37   -0.43
Constant                        3.19   -3.62    2.93    0.36     -       -       -       -
Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Educational Publishers Inc., 1998)
Table 14.5
Means of Groups
                  Functions
Groups      1       2       3       4
1         1.03    0.16    0.65   -0.06
2         0.59    0.06   -0.06    0.05
3         0.04   -0.06   -0.07   -0.07
4        -0.71   -0.19    0.09    0.05
5        -1.44    0.47   -0.01   -0.01
Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Educational Publishers Inc., 1998)
Table 14.6
Hit Miss Table (Multiple Discriminant Analysis)
                     Predicted Group
Actual Group     1      2      3      4      5
1               20     13      1      0      0
2               86    106     59     24      9
3               50     65     90     57     31
4                7      7     33     84     50
5                2      1      6     12     40
Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Educational Publishers Inc., 1998)
Table 14.7
Summary Statistics for the RFM Data
Variable       Mean    Std. Dev.
Recency        3.87      2.06
Frequency      7.80      2.74
Monetary      73.14     28.87
Choice         0.45       -
Table 14.8
Parameter Estimates for OLS Regression
Variable       Estimate   Std. Error
Intercept*      -0.92       0.13
Recency*         0.15       0.01
Frequency        0.02       0.01
Monetary*        0.01       0.001
R2               0.61
Adjusted R2      0.60
*Significant at the 0.05 significance level.
Table 14.9
Parameter Estimates for Logistic Regression
               Intercept Only Model         Full Model
Variable       Estimate   Std. Error   Estimate   Std. Error
Intercept       -0.20       0.21       -30.29*      8.55
Recency           -           -          3.34*      0.93
Frequency         -           -          0.59*      0.24
Monetary          -           -          0.17*      0.05
-2LL           137.628                  30.489
*Significant at the 0.05 level.
Table 14.10
Summary Statistics for Breakfast Foods Data
Brand      Average Price ($)   Promotion   Market Share (%)
Brand 1         1.75             0.07          22.19
Brand 2         1.58             0.04          17.18
Brand 3         1.91             0.09          48.36
Brand 4         1.94             0.01          12.28
Table 14.11
Parameter Estimates for Multinomial Logit Model
                     Intercepts Only Model         Full Model
Variable             Estimate   Std. Error   Estimate   Std. Error
Intercept_Brand1       0.59       0.08         0.06       0.11
Intercept_Brand2       0.34       0.09        -0.64       0.11
Intercept_Brand3       1.37       0.07         1.91       0.10
Price                   -          -          -3.03       0.11
Promotion               -          -           0.44       0.14
-2LL                4321.88                  3033.92
Hit Rate              48.3%                    63.2%
Table 14.12
Price Elasticities from the Full Multinomial Logit Model
                           Change in probability of k
Change in price of j    Brand 1   Brand 2   Brand 3   Brand 4
Brand 1                  -4.47      0.85      0.85      0.85
Brand 2                   0.67     -4.14      0.67      0.67
Brand 3                   2.64      2.64     -3.17      2.64
Brand 4                   0.53      0.53      0.53     -5.37
Table 14.13
Parameter Estimates for Multinomial Probit Model
                              Model 1                          Model 2
Variable                 M              NM                M              NM
Non labor income      0.01 (0.01)   -0.05 (0.01)      0.00 (0.01)   -0.03 (0.03)
Unemployment rate    -0.08 (0.01)   -0.05 (0.01)     -0.09 (0.02)   -0.08 (0.02)
Time trend           -0.02 (0.01)    0.05 (0.01)     -0.01 (0.01)    0.04 (0.03)
Years of education    0.01 (0.01)    0.11 (0.01)      0.03 (0.01)    0.10 (0.06)
Labor experience      0.02 (0.01)   -0.03 (0.01)      0.01 (0.01)   -0.02 (0.02)
Square of Exper.     -0.01 (0.00)    0.00 (0.01)      0.00 (0.01)    0.00 (0.01)
Dummy for race        0.10 (0.05)    0.09 (0.05)      0.15 (0.06)    0.14 (0.07)
Dummy for marriage    0.47 (0.04)    0.95 (0.09)      0.51 (0.07)    0.91 (0.39)
Number of kids        0.12 (0.02)   -0.18 (0.03)      0.09 (0.03)   -0.11 (0.12)
Intercept            -0.06 (0.14)   -0.13 (0.12)      0.46 (0.18)    0.31 (0.35)
Correlation           0.00 (fixed)                    0.64 (0.37)
Variance              1.00 (fixed)                    1.16 (0.58)
LL                   -10,300.71                      -10,299.65
Source: Keane, M. P. (1992). A Note on Identification in the Multinomial Probit Model. Journal of Business & Economic Statistics, 10, 2, Page 199
Table 14.14
Parameter Estimates for OLS and Tobit Analysis
                      OLS                        Tobit
Variable      Estimate    Std. Error     Estimate    Std. Error
Intercept     -698.6       1036.10       -9733        3033
Income           0.1015       0.0065         0.145       0.002
Wealth          -0.00001      0.0004         0.0002      0.0001
Source: Bomberger, W. A. (1993). Income, Wealth and Household Demand for Deposits. The American Economic Review. 84, 4, Page 1038.
Figure 14.1
Purchasers and Non-purchasers Versus Age and Income
[Figure: scatter plot of purchasers (P) and non-purchasers (N) against Age and Income, each on a scale from 1 to 6.]
Figure 14.2
Group Means on First Two Discriminant Functions
[Figure: means of groups 1 to 5 plotted against Function 1 (scale -2 to 2) and Function 2 (scale -0.7 to 0.7).]
Figure 14.3
OLS Predictions Versus Actual Choice
[Figure: two series, Predictions and Choice, plotted over observations 0 to 120; vertical scale from -0.5 to 2.]
Figure 14.4
Logistic Regression Predictions Versus Actual Choice
[Figure: two series, Predictions and Choice, plotted over observations 0 to 120; vertical scale from 0 to 1.2.]
Figure 14.5
Probability of Purchase Versus Frequency
[Figure: probability of purchase (scale 0 to 1.2) plotted against Frequency (1 to 30).]
Figure 14.6
Variation in Market Share with changes in Marketing Mix
[Figure: Market Share and Price plotted by Week (5 to 30), with marketing mix events marked on the plot.]
F = Feature, D = Display, C = Store Coupon