Driver analysis and product optimization using Bayesian networks
Tutorial on Driver Analysis and Product Optimization with BayesiaLab
Stefan Conrady, [email protected]
Dr. Lionel Jouffe, [email protected]
December 1, 2010
Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting
Table of Contents
Introduction
BayesiaLab
Conrady Applied Science
Acknowledgements
Abstract
Bayesian Networks
Structural Equation Models
Probabilistic Structural Equation Models
Tutorial
Model Development
Data Preparation
Consumer Research
Data Import
Unsupervised Learning
Preliminary Analysis
Variable Clustering
Multiple Clustering
Analysis of Factors
Completing the PSEM
Market Driver Analysis
Product Driver Analysis
Product Optimization
Conclusion
Contact Information
Conrady Applied Science, LLC
Bayesia SAS
Copyright
Conrady Applied Science, LLC - www.conradyscience.com
Driver Analysis and Product Optimization with BayesiaLab i
Introduction

This tutorial is intended for new or prospective users of BayesiaLab. The example in this tutorial is taken from the field of marketing science and is meant to illustrate the capabilities of BayesiaLab with a real-world case study and actual consumer data. Beyond market researchers, analysts and researchers in many fields will hopefully find the proposed methodology valuable and intuitive. In this context, many of the technical steps, such as data preparation and network learning, are outlined in great detail, as they are applicable to research with BayesiaLab in general, regardless of the domain.
BayesiaLab

Bayesia SAS, based in Laval, France, has been developing BayesiaLab since 1999, and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of market research, is highlighted by Bayesia's strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.
Conrady Applied Science

Conrady Applied Science, based in Franklin, TN, is a consulting firm specializing in knowledge discovery and probabilistic reasoning with Bayesian networks. In 2010, Conrady Applied Science was appointed Bayesia's authorized sales and consulting partner for North America.
Acknowledgements

We would like to express our gratitude to Ares Research (www.ares-etudes.com) for generously providing data from their consumer research for our case study.
Abstract

Market driver analysis and product optimization are among the central tasks in product marketing and thus relevant to virtually all types of businesses. BayesiaLab provides a unified software platform which, based on consumer data, can:
1. provide a deep understanding of the market preference structure;
2. directly generate recommendations for prioritized product actions.
The proposed approach utilizes Probabilistic Structural Equation Models (PSEMs), based on machine-learned Bayesian networks. PSEMs provide an efficient alternative to Structural Equation Models (SEMs), which have traditionally been used in market research.
Bayesian Networks

A Bayesian network, also known as a belief network, is a graphical model that represents the joint probability distribution over a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
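The disease/symptom example boils down to a single application of Bayes' rule on a two-node network. A minimal sketch in Python; all probabilities below are invented for illustration and are not taken from any real network:

```python
# Hypothetical two-node network: Disease -> Symptom, with made-up parameters
p_disease = 0.01                  # prior P(Disease)
p_symptom_given_d = 0.90          # P(Symptom | Disease)
p_symptom_given_not_d = 0.05      # P(Symptom | no Disease)

# Marginal probability of observing the symptom (law of total probability)
p_symptom = p_disease * p_symptom_given_d + (1 - p_disease) * p_symptom_given_not_d

# Bayes' rule: posterior probability of the disease given the observed symptom
p_disease_given_symptom = p_disease * p_symptom_given_d / p_symptom
```

Even with a 90% symptom rate among the diseased, the posterior stays modest (around 15%) because the disease is rare, which is exactly the kind of non-intuitive inference a Bayesian network automates.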
Structural Equation Models
Structural Equation Modeling (SEM) is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions. This definition of SEM was articulated by the geneticist Sewall Wright (1921), the economist Trygve Haavelmo (1943) and the cognitive scientist Herbert Simon (1953), and formally defined by Judea Pearl (2000).
Structural Equation Models allow both confirmatory and exploratory modeling, meaning they
are suited to both theory testing and theory
development.
Probabilistic Structural Equation Models
Traditionally, specifying and estimating an SEM required a multitude of manual steps, which are typically very time-consuming, often requiring weeks or even months of an analyst's time. PSEMs are based on the idea of leveraging machine learning to automatically generate a structural model. As a result, creating PSEMs with BayesiaLab is extremely fast and can thus form an immediate basis for much deeper analysis and optimization.
Tutorial

At the beginning of this tutorial, we want to emphasize the overarching objectives of this case study, so we do not lose sight of the "big picture" as we immerse ourselves in the technicalities of BayesiaLab and Bayesian networks.
In this study we want to examine how product attributes perceived by consumers relate to purchase intention for specific products. Put simply, we want to understand the key drivers of purchase intent. Given the large number of attributes in our study, we also want to identify common concepts among these attributes in order to make interpretation easier and communication with managerial decision makers more effective.
Secondly, we want to utilize the generated understanding of consumer dynamics so that product developers can optimize the characteristics of the products under study in order to increase purchase intent among consumers, which is our ultimate business objective.
Notation
In order to clearly distinguish between natural language, BayesiaLab-specific functions and study-specific variable names, the following notation is used:
• BayesiaLab functions, keywords, commands, etc., are shown in bold type.
• Variable names are capitalized and italicized.
Model Development
Data Preparation
Consumer Research
This study is based on a monadic1 consumer survey about perfumes, which was conducted in France. In this example we use survey responses from 1,320 women, who evaluated a total of 11 fragrances on a wide range of attributes:
• 27 ratings on fragrance-related attributes, such as "sweet", "flowery", "feminine", etc., measured on a 1-to-10 scale.
• 12 ratings on projected imagery related to someone who would be wearing the respective fragrance, e.g. "is sexy", "is modern", measured on a 1-to-10 scale.
• 1 variable for Intensity, a measure reflecting the level of intensity, measured on a 1-to-5 scale.2
• 1 variable for Purchase Intent, measured on a 1-to-6 scale.
• 1 nominal variable, Product, for product identification purposes.
Data Import

To start the analysis with BayesiaLab, we first import the data set, which is formatted as a CSV file.3 With Data>Open Data Source>Text File, we start the Data Import wizard, which immediately provides a preview of the data file.
1 A product test involving only one product, i.e. in our study each respondent evaluated only one perfume.
2 The variable Intensity is listed separately due to the a-priori knowledge of its non-linearity and the existence of a "just-about-right" level.
3 CSV stands for "comma-separated values", a common format for text-based data files.
The table displayed in the Data Import wizard shows the
individual variables as columns and the responses as rows. There are a number of options available, e.g. for
sampling. However, this is not necessary in our example
given the relatively small size of the database.
Clicking the Next button prompts a data type analysis, which provides BayesiaLab's best guess regarding the data type of each variable.
Furthermore, the Information box provides a brief summary regarding the number of records, the number of missing values, filtered states, etc.4
For this example, we will need to override the default data type for the Product variable, as each value is a nominal product identifier rather than a numerical scale value. We can change the data type by highlighting the Product variable and clicking the Discrete check box, which changes the color of the Product column to red. We will also define Purchase Intent and Intensity as discrete variables, as the default number of states of these variables is already adequate for our purposes.5
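The same type override can be reproduced outside BayesiaLab for readers who preprocess data in scripts. A small pandas sketch, assuming a hypothetical two-row miniature of the survey file (the real data has 49 columns and 1,320 rows); pandas is an assumption here, not part of the BayesiaLab workflow:

```python
import io
import pandas as pd

# Hypothetical miniature of the survey CSV file
csv = io.StringIO(
    "Product,Fresh,Flowery,Purchase Intent\n"
    "3,8,9,5\n"
    "7,2,4,1\n"
)
df = pd.read_csv(csv)

# Product is a nominal identifier, not a numeric scale - override the inferred type,
# just as clicking the Discrete check box does in the Data Import wizard
df["Product"] = df["Product"].astype("category")
df["Purchase Intent"] = df["Purchase Intent"].astype("category")
```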
The next screen provides options as to how to treat any
missing values. In our case, there are no missing values
so the corresponding panel is grayed-out.
Clicking the small upside-down triangle next to the
variable names brings up a window with key statistics of
the selected variable, in this case Fresh.
The next step is the Discretization and Aggregation dialogue, which allows the analyst to determine the type of discretization that must be performed on all
4 There are no missing values in our database, and filtered states are not applicable in this survey.
5 The desired number of variable states is largely a function of the analyst's judgment.
continuous variables.6 For this survey, and given the number of observations, it is appropriate to reduce the number of states from the original 10 (1 through 10) to a smaller number. One could, for instance, bin the 1-10 rating into low, mid and high, or apply any other method deemed appropriate by the analyst.
The screenshot shows the dialogue for the Manual selection of discretization steps, which permits selecting binning thresholds by point-and-click.
For this particular example, we select Equal Distances with 5 intervals for all continuous variables. This was the analyst's choice in order to be consistent with prior research.
Note
For choosing discretization algorithms beyond this example, the following rule of thumb may be helpful:
• For supervised learning, choose Decision Tree.
• For unsupervised learning, choose, in the order of priority, K-Means, Equal Distances or Equal Frequencies.
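Equal Distances binning itself is easy to reproduce outside BayesiaLab. A sketch with NumPy (an assumption, not part of the BayesiaLab workflow) that cuts the 1-to-10 scale into 5 equal-width intervals, yielding the same <=2.8 and >8.2 boundary states seen later in the Monitors:

```python
import numpy as np

ratings = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # the 1-to-10 rating scale

# Equal Distances with 5 intervals: cut points every (10 - 1) / 5 = 1.8 units
edges = np.linspace(1, 10, 6)  # [1.0, 2.8, 4.6, 6.4, 8.2, 10.0]

# Map each rating to a state index 0..4 (state 0 = "<=2.8", state 4 = "> 8.2")
states = np.digitize(ratings, edges[1:-1], right=True)
```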
Clicking Select All Continuous followed by Finish
completes the import process and the 49 variables (columns) from our database are now shown as blue
nodes in the Graph Panel, which is the main window for
network editing.
This initial view represents a fully unconnected Bayesian
network.
For reasons that will become clear later, we will initially exclude two variables, Product and Purchase Intent. We can do so by right-clicking the nodes and selecting Properties>Exclusion. Alternatively, holding "x" while double-clicking the nodes performs the same exclusion function.
6 BayesiaLab requires discrete distributions for all variables.
Unsupervised Learning

As the next step, we will perform the first unsupervised learning of a network by selecting Learning>Association Discovering>EQ.
The resulting view shows the learned network with all the nodes in their original position.
Needless to say, this view of the network is not very intuitive. BayesiaLab has numerous built-in layout algorithms, of which the Force Directed Layout is perhaps the most commonly used. It can be invoked via View>Automatic Layout>Force Directed Layout or through the keyboard shortcut "p". This shortcut is worth remembering, as it is one of the most commonly used functions.
The resulting network will look similar to the following
screenshot.
To optimize the use of the available screen, clicking the Best Fit button in the toolbar "zooms to fit" the graph to the screen. In addition, rotating the graph with the Rotate Left and Rotate Right buttons helps to create a suitable view.
The final graph should closely resemble the following screenshot and, in this view, the properties of this first learned Bayesian network become immediately apparent. This network is now a compact representation of the 47 dimensions of the joint probability distribution of the underlying database.
It is very important to note that, although this learned graph happens to have a tree structure, this is not the result of an imposed constraint.
Preliminary Analysis

The analyst can further examine this graph by switching into the Validation Mode, which immediately opens up the Monitor Panel on the right side of the screen. This panel is initially empty, but by clicking on any node or multiple nodes in the network, Monitors appear
inside the Monitor Panel and the corresponding nodes
are highlighted in yellow.
By default, the Monitors show the marginal distributions of all selected variables. This shows, for instance, that 9.7% of respondents rated their perfume at <=2.8 in terms of the Fresh attribute.
On this basis, one can start to experiment with the properties of this particular Bayesian network and query it. With BayesiaLab this can be done in an extremely intuitive way, i.e. by setting evidence (or observations) directly on the Monitors. For instance, we can compute the conditional probability distribution of Flowery, given that we have observed a specific value, i.e. a specific state, of Fresh. In formal notation, this would be
P(Flowery | Fresh)
We will now set Flowery to the state that represents the highest rating (>8.2) and we can immediately observe the conditional probability distribution of Fresh, i.e.
P(Fresh | Flowery = ">8.2")
The gray arrows inside the bars indicate how the distributions have changed compared to the previous distributions. This means that respondents who have rated the Flowery attribute of a perfume at the top level will have a 67% probability of also assigning a top rating to the Fresh attribute:
P(Fresh = ">8.2" | Flowery = ">8.2") = 66.9%
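The same conditioning can be reproduced from raw survey rows with a contingency table. A pandas sketch on five hypothetical, already-binned responses (the rows are invented for illustration; the real survey has 1,320 respondents):

```python
import pandas as pd

# Hypothetical toy responses for two attributes, binned into two states
df = pd.DataFrame({
    "Flowery": ["> 8.2", "> 8.2", "> 8.2", "<= 8.2", "<= 8.2"],
    "Fresh":   ["> 8.2", "> 8.2", "<= 8.2", "<= 8.2", "<= 8.2"],
})

# P(Fresh | Flowery): normalize the contingency table within each Flowery state
cond = pd.crosstab(df["Flowery"], df["Fresh"], normalize="index")

# P(Fresh = "> 8.2" | Flowery = "> 8.2")
p_top_given_top = cond.loc["> 8.2", "> 8.2"]
```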
Switching brie"y back into the Modeling Mode and by
clicking on the Flowery node, one can see the probabilistic relationship between Flowery and Fresh in
detail. By learning the network, BayesiaLab has
automatically created a contingency table for every single direct relationship between nodes.
Note
The structure of our Bayesian network may be directed, but the directions of the arcs do not necessarily have to be meaningful.
For observational inference, it is only necessary that the Bayesian network correctly represents the joint probability distribution of the underlying database.
All contingency tables, together with the graph structure,
thus encode the joint probability distribution of our original database.
Returning to the Validation Mode, we can further
examine the properties of our network. Of great interest is the strength of the probabilistic relationships between
the variables. In BayesiaLab this can be shown by
selecting Analysis>Graphic>Arcs’ Mutual Information.
The thickness of the arcs is now proportional to the
Mutual Information, i.e. the strength of the relationship
between the nodes.
Intuitively, Mutual Information measures the
information that X and Y share: it measures how much
knowing one of these variables reduces our uncertainty about the other. For example, if X and Y are
independent, then knowing X does not provide any
information about Y and vice versa, so their mutual
information is zero. At the other extreme, if X and Y are identical then all information conveyed by X is shared
with Y: knowing X determines the value of Y and vice
versa.
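This intuition can be made concrete with a small computation over a joint distribution. The 2x2 joint probabilities below are invented for illustration; the formula is the standard Mutual Information, here in bits (base-2 logarithm):

```python
import numpy as np

# Hypothetical 2x2 joint distribution p(x, y) for two binary attributes
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y)

# I(X;Y) = sum over x, y of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
mi = float((p_xy * np.log2(p_xy / (p_x * p_y))).sum())
```

For this joint distribution the result is about 0.278 bits; replacing p_xy with the product of its marginals would drive the value to zero, matching the independence case described above.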
We can also show the values of the Mutual Information
on the graph by clicking on Display Arc Comments.
In the top part of the comment box
attached to each arc the Mutual Information of the arc is shown. Below,
expressed as a percentage and highlighted
in blue, we see the relative Mutual Information in the direction of the arc (parent node ➔
child node). And, at the bottom, we have the relative
mutual information in the opposite direction of the arc
(child node ➔ parent node).
Variable Clustering

The information about the strength of the relationships between the manifest variables can also be utilized for purposes of Variable Clustering. More specifically, a concept closely related to the Mutual Information, namely the Kullback-Leibler Divergence (K-L Divergence), is utilized for clustering.
Formal Definition of Mutual Information

I(X;Y) = Σ_{y∈Y} Σ_{x∈X} p(x,y) log( p(x,y) / (p(x) p(y)) )
Such variable clusters will allow us to induce new latent
variables, which each represent a common concept among the manifest variables.7 From here on, we will
make a very clear distinction between manifest variables, which are directly observed, such as the survey responses, and latent variables, which are derived. In
traditional statistics, deriving such latent variables or
factors is typically performed by means of Factor Analysis, e.g. Principal Components Analysis (PCA).
In BayesiaLab, this “factor extraction” can be done very
easily via the Analysis>Graphics>Variable Clustering
function, which is also accessible through the keyboard shortcut “s”.
The speed with which this is performed is one of the strengths of BayesiaLab, as the resulting variable clusters are presented instantly.
For probability distributions P and Q of a discrete random variable, their K-L divergence is defined as

D_KL(P || Q) = Σ_i P(i) log( P(i) / Q(i) )

In words, it is the average of the logarithmic difference between the probabilities P(i) and Q(i), where the average is taken using the probabilities P(i).
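The definition translates directly into a few lines of Python. The two distributions below are invented for illustration:

```python
import math

# Two hypothetical distributions over the same three states
P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]

# D_KL(P || Q) = sum over i of P(i) * log( P(i) / Q(i) )
d_kl = sum(p * math.log(p / q) for p, q in zip(P, Q))
```

Note that D_KL is zero only when P and Q agree everywhere, and it is asymmetric: swapping P and Q generally changes the value.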
In this case, BayesiaLab has identified 15 variable clusters, and each node is color-coded according to its cluster membership. To interpret these newly-found clusters, we can zoom in and visually examine the structure on the graph panel.
To support the interpretation process, BayesiaLab can also display a Dendrogram, which allows the analyst to
review the linkage of nodes into variable clusters.
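The linkage behind such a dendrogram can be sketched with a naive agglomerative routine. Everything below is a simplified stand-in for BayesiaLab's actual algorithm: the variable names are real attributes from the study, but the distance matrix (imagine it derived from K-L divergences, small = strongly related) is invented:

```python
# Hypothetical symmetric distance matrix between four manifest variables
names = ["Fresh", "Flowery", "Sweet", "Bold"]
dist = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.2, 0.8],
    [0.2, 0.2, 0.0, 0.9],
    [0.9, 0.8, 0.9, 0.0],
]

def cluster(dist, k):
    """Naive single-linkage agglomeration until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # merge the pair of clusters with the smallest inter-variable distance
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: min(dist[x][y] for x in clusters[p[0]] for y in clusters[p[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

groups = [[names[i] for i in c] for c in cluster(dist, 2)]
```

Asking for two clusters groups the three closely related fragrance attributes together and leaves Bold on its own, which mirrors how a dendrogram's cut height determines the number of clusters.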
7 An alternative approach is to interpret the derived concept or factor as a hidden common cause.
The analyst may also choose a different number of clusters, based on his own judgment of the domain. A slider in the toolbar allows choosing various numbers of clusters, and the color association of the nodes will be updated instantly.
By clicking the Validate Clustering button in the
toolbar, the clusters are saved and the color codes will be formally associated with the nodes. A clustering report
provides us with a formal summary of the new factors
and their associated manifest variables.8
The analyst also has the option to use his domain knowledge to modify which manifest variables belong to specific factors. This can be done by right-clicking on the Graph Panel and selecting Class Editor.
Multiple Clustering

As our next step towards building the PSEM, we will introduce these newly-generated latent factors into our existing network and also estimate their probabilistic relationships with the manifest variables. This means we will create a new node for each latent factor, adding 15 new dimensions to our network. For this step, we need to return to the Modeling Mode, because the introduction of the factor nodes into the network requires the learning algorithms.
8 Variable cluster = derived concept = unobserved latent variable = hidden cause = extracted factor.
More specifically, we select Learning>Multiple Clustering, which brings up the Multiple Clustering dialogue. There is a range of settings, but we will focus here on only a subset. Firstly, we need to specify an output directory for the to-be-learned networks. Secondly, we need to set some parameters for the clustering process, such as the minimum and maximum number of states that can be created during the learning process.
In our example, we select Automatic Selection of the Number of Classes, which will allow the learning algorithm to find the optimum number of factor states up to a maximum of five. This means that each new factor will represent the corresponding manifest variables with up to five states.
The Multiple Clustering process concludes with a report,
which shows details regarding the generated clustering.
The top portion of the report is shown in the following screenshot.
The detail section for Factor_0, as it relates to the manifest variables, is worth highlighting. Here we can see the strength of the relationship between the manifest variables, such as Trust, Bold, etc., and Factor_0. In a traditional Factor Analysis, this would be the equivalent of the factor loadings.
After closing the report, we will now see a new
(unconnected) network, with 15 additional nodes, one for each factor, i.e. Factor_0 through Factor_14,
highlighted in yellow in the screenshot.
Analysis of Factors

We can also further examine how the new factors relate to the manifest variables and how well they represent them. In the case of Factor_0, we want to understand how it can summarize our five manifest variables.
By going into our previously-specified output directory, using the Windows Explorer or the Mac Finder, we can see that 15 new networks (in BayesiaLab's xbl format) were generated. We open the specific network for Factor_0, either by directly double-clicking the xbl file or by selecting Network>Open. The factor-specific networks are identified by a suffix/extension of the format "_[Factor_#].xbl", where "#" stands for the factor number. We then see a network consisting of the manifest variables, with the factor linked by arcs going from the factor to the manifest variables.
Returning to the Validation Mode, we can see five states for Factor_0, labeled C1 through C5, as well as their marginal distribution. As Factor_0 is a target node by default, it automatically appears highlighted in red in the Monitor Panel.
Here we can also study how the states of the manifest variables relate to the states of Factor_0. This can be done easily by setting observations on the monitors, e.g. setting C1 to 100%.
We now see that, given that Factor_0 is in state C1, the variable Active has a probability of approx. 75% of being in state <=2.8. Expressed more formally, we would state P(Active = "<=2.8" | Factor_0 = C1) = 74.57%. This means that respondents who have been assigned to C1 are likely to rate the Active attribute very low as well.
In the Monitor for Factor_0, in parentheses behind the cluster name, we find the expected mean value of the numeric equivalents of the states of the manifest variables, e.g. "C1 (2.08)". That means that, given the state C1 of Factor_0, we expect the mean value of Trust, Bold, Fulfilled, Active and Character to be 2.08.
To go into even greater detail, we can actually look at
every single respondent, i.e. every record in the database, and see what cluster they were assigned to. We select
Inference>Interactive Inference,
which will bring up a record selector in the toolbar.
With this record selector, we can now scroll through the entire database, review the actual ratings of the
respondents and then see the estimation to which cluster
each respondent belongs.
In our first case, record 0, we see the ratings of this respondent indicated by the manifest Monitors. In the highlighted Monitor for Factor_0 we read that this respondent, given her responses, has an 82% probability of belonging to Cluster 5 (C5) in Factor_0.
Moving to our second case, record 1, we see that the respondent belongs to Cluster 3 (C3) with a 96% probability.
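These per-respondent cluster probabilities are posterior distributions over the factor's states. In the simple case where the manifest variables are conditionally independent given the factor (a naive-Bayes-style view; BayesiaLab's actual inference is more general), the posterior follows from multiplying the prior by each observation's likelihood and normalizing. All numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical latent-class model with three factor states
prior = np.array([0.2, 0.5, 0.3])      # P(C1), P(C2), P(C3)
p_fresh = np.array([0.7, 0.2, 0.1])    # P(observed Fresh state | C)
p_flowery = np.array([0.6, 0.3, 0.2])  # P(observed Flowery state | C)

# Posterior over factor states for one respondent's observed ratings
unnorm = prior * p_fresh * p_flowery
posterior = unnorm / unnorm.sum()
```

Here the respondent's two ratings pull the posterior strongly toward the first state (70%), much like the 82% and 96% cluster assignments read off the Monitors above.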
We can also evaluate the performance of our new
network based on Factor_0 by selecting
Analysis>Network Performance>Global.
This will return the log-likelihood density function, as
shown in the following screenshot.
Completing the PSEM

We now return to our main task and our principal network, which has been augmented by the 15 new factors.
Before we re-learn our network with the new factors, we need to include Purchase Intent as a variable and also impose a number of constraints in the form of Forbidden Arcs.
In the Modeling Mode, we can include Purchase Intent by right-clicking the node and unchecking Exclusion. This makes the Purchase Intent variable available in the next stage of learning, which is reflected visually in the node color and the icon.
Our desired SEM-type network structure stipulates that
manifest variables be connected exclusively to the factors and that all the connections with Purchase Intent must
also go through the factors. We achieve such a structure
by imposing the following sets of forbidden arcs:
1. No arcs between manifest variables
2. No arcs from manifest variables to factors
3. No arcs between manifest variables and Purchase Intent
We can define these forbidden arcs by right-clicking anywhere on the graph panel, which brings up the following menu.
In BayesiaLab, all manifest variables and all factors are conveniently grouped into classes, so we can easily define which arcs are forbidden in the Forbidden Arc Editor.
Upon completing this step, we can proceed to learn our network again: Learning>Association Discovering>EQ
The initial result will resemble the following screenshot.
Using the Force Directed Layout algorithm (shortcut
“p”), as before, we can quickly transform this network
into a much more interpretable format.
Now we see manifest variables “laddering up” to the
factors and we also see how the factors are related to each other. Most importantly, we can observe where the
Purchase Intent node was attached to the network
during the learning process. The structure conveys that Purchase Intent has the strongest link with Factor_2.
Now that we can see the big picture, it is perhaps appropriate to give the factors more descriptive names. For obvious reasons, this task is the responsibility of the analyst. In this case study, Factor_0 was given the name "Self-Confident". We add this name into the node comments by double-clicking Factor_0 and scrolling to the right inside the Node Editor until we see the Comments tab.
We repeat this for all other nodes and we can
subsequently display the node comments for all factors by clicking the Display Node Comment icon in the
toolbar or by selecting View>Display Node Comments from the menu.
Market Driver Analysis

Our model, the PSEM, is complete and we can now use it to perform the actual analysis part of this exercise, namely to find out what "drives" Purchase Intent.
We return to the Validation Mode, right-click on Purchase Intent and then check Set As Target Node. Double-clicking the node while pressing "t" is a helpful shortcut.
This will also change the appearance of the node and literally give it the look of a target.
In order to understand the relationship between the
factors and Purchase Intent, we want to tune out all the
manifest variables for the time being. We can do so by right-clicking the Use of Classes icon in the bottom right
corner of the screen. This will bring up a list of all
classes. By default, all are checked and thus visible.
For our purposes, we want to deselect All and then only
check the Factor class.
The resulting view has all the manifest variables grayed-out, so the relationship between the factors becomes
more prominent. By deselecting the manifest variables,
we also exclude them from subsequent analysis.
We will now right-click inside the (currently empty)
Monitor Panel and select Monitors Sorted wrt Target Variable Correlations. The keyboard shortcut “x” will
do the same.
This brings up the monitor for the target node, Purchase Intent, plus all the monitors for the factors, in the order
of the strength of relationship with the Target Node.
This immediately highlights the order of importance of
the factors relative to the Target Node, Purchase Intent.
Another way of comprehensively displaying the importance is by selecting Reports>Target Analysis>Correlations With the Target Node.
"Correlations" is more of a metaphor here, as BayesiaLab actually orders the factors by their Mutual Information relative to the target node, Purchase Intent.
By clicking Quadrants, we can obtain a type of
opportunity graph, which shows the mean value of each
factor on the x-axis and the relative Mutual Information with Purchase Intent on the y-axis. Mutual Information
can be interpreted as importance in this context.
By right-clicking on the graph, we can switch between
the display of the formal factor names, e.g. Factor_0, Factor_1, etc., and the factor comments, such as
Adequacy, Seduction, which is much easier for
interpretation.
As in the previous views, it becomes very obvious that
the factor Adequacy is most important with regard to
Purchase Intent, followed by the factor Seduction. This is very helpful for understanding the overall market
dynamics and for communicating the key drivers to
managerial decision makers.
The lines dividing the graph into quadrants reflect the mean values for each axis. The upper-left quadrant highlights opportunities, as these particular factors are "above average" in importance, but "below average" in terms of their rating.
Product Driver Analysis

Although this insight is relevant for the whole market, it does not yet allow us to work on improving specific products. For this we need to look at product-specific graphs. In addition, we may need to introduce constraints where we do not have the ability to impact certain attributes. Such information must come from the domain expert, in our case the perfumer, who will determine if and how odoriferous compounds can affect the consumers' perception of the product attributes.
These constraints can be entered into BayesiaLab's Cost Editor, which is accessible by right-clicking anywhere in the Graph Panel. Those attributes which cannot be changed (as determined by the expert) will be set to Not Observable. As we proceed with our analysis, these constraints will be extremely important when searching for realistic product scenarios.
On a side note, an example from the presumably more tangible auto industry may better illustrate this kind of constraint. For instance, a vehicle platform may have an inherent wheelbase limitation, which sets a hard limit on the maximum amount of rear passenger legroom. Even if consumers perceived a need for improvement on this attribute, recommending it to the engineers would be futile. As we search for optimum product solutions with our Bayesian network, it is important to bear this in mind, and thus we formally encode these constraints of our domain through the Cost Editor.
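In code terms, a “Not Observable” setting simply removes an attribute from any later search. A minimal sketch, using hypothetical attribute names borrowed from the auto example:

```python
# Sketch: attributes marked "Not Observable" in the Cost Editor are excluded
# from the later optimization search. Attribute names are illustrative.
NOT_OBSERVABLE = {"Rear legroom"}  # capped by the platform's wheelbase

def actionable(attributes):
    """Return only the attributes a product team could actually change."""
    return [a for a in attributes if a not in NOT_OBSERVABLE]

print(actionable(["Rear legroom", "Seat comfort", "Trunk space"]))
```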
Product Optimization
We now return briefly to the Modeling Mode to include the Product variable, which has been excluded from our analysis thus far. Right-clicking the node and then unchecking Properties>Exclusion will achieve this.
At this time, we will also move beyond the analysis of
factors and actually look at the individual product
attributes, so we select Manifest from the Display Classes menu.
Back in the Validation Mode, we can perform a Multi Quadrant Analysis: Tools>Multi Quadrant Analysis
This tool allows us to look at the attribute ratings of
each product and their respective importance, as
expressed with the Mutual Information. Thus we pick
Product as the Selector Node and choose Mutual Information for Analysis. In this case, we also want to check Linearize Nodes’ Values and Regenerate Values, and specify an Output Directory, where the product-specific networks will be saved. In the process of generating the Multi Quadrant Analysis, BayesiaLab will generate one Bayesian network for each Product. For all Products, the network structure will be identical to the network for the entire market; however, the parameters, i.e. the contingency tables, will be specific to each Product.
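Mutual Information, the importance measure used here, can be computed directly from a contingency table of attribute ratings versus Purchase Intent. The sketch below uses toy counts, not the actual survey data:

```python
import math

def mutual_information(joint_counts):
    """Mutual information I(X;Y) in bits from a 2-D table of co-occurrence counts."""
    total = sum(sum(row) for row in joint_counts)
    px = [sum(row) / total for row in joint_counts]        # marginal of X (rows)
    py = [sum(col) / total for col in zip(*joint_counts)]  # marginal of Y (columns)
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, n in enumerate(row):
            if n:
                pxy = n / total
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Illustrative counts: rows = attribute rating (low/high),
# columns = Purchase Intent (low/high). Not from the actual survey.
table = [[40, 10],
         [10, 40]]
print(round(mutual_information(table), 3))
```

A strongly associated attribute yields a high value; an attribute whose ratings are independent of Purchase Intent yields zero.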
Before we proceed to the product-specific networks, we will first look at a Multi Quadrant Analysis by Product; we can select each product’s graph simply by right-clicking and choosing the appropriate product identification number.
Please note that only the observable variables are visible on the chart, i.e. those variables which were not previously defined as “Not Observable” in the Cost Editor.
For Product No. 5, Personality is at the very top of the importance scale. But how does the Personality attribute compare in the competitive context? If we Display Scales by right-clicking on the graph, it appears that Personality is already at the best level among the competitors, i.e. to the far right of the horizontal scale. On the other hand, on the Fresh attribute, Product No. 5 marks the bottom end of the competitive range.
9 Any similarities of identifiers with actual product names are purely coincidental.
For a perfumer it would thus be reasonable to assume that there is limited room for improvement with regard to Personality, and that Fresh offers a perhaps significant opportunity for Product No. 5.
To highlight the differences between products, we will also show Product No. 1 in comparison.
For Product No. 1 it becomes apparent that Intensity is highly important, but that its rating is towards the bottom end of the scale. The perfumer may thus conclude that a bolder version of the same fragrance will improve Purchase Intent.
Finally, by hovering over any data point in the opportunity chart, BayesiaLab can also display the
position of competitors compared to the reference
product for any attribute. The screenshot shows Product
No. 5 as the reference and the position of competitors on the Personality attribute.
BayesiaLab also allows us to measure and save the “gap to best level” (the variations) for each product and each variable through the Export Variations function. This formally captures the opportunity for improvement. Please note that these variations need to be saved individually by Product.
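The “gap to best level” computation can be sketched as the relative headroom of a product against the best-rated competitor on each attribute. The ratings below are made up for illustration:

```python
# Sketch of "gap to best level": for each attribute, the relative improvement a
# reference product has available before matching the best-rated competitor.
# All ratings below are illustrative, not taken from the study.
ratings = {
    # attribute: {product_id: mean rating}
    "Fresh":       {1: 3.10, 5: 3.00, 7: 3.32},
    "Personality": {1: 3.40, 5: 4.10, 7: 3.80},
}

def gap_to_best(attribute, product_id):
    """Relative gap between a product's rating and the best rating on an attribute."""
    scores = ratings[attribute]
    best = max(scores.values())
    own = scores[product_id]
    return (best - own) / own

print(f"{gap_to_best('Fresh', 5):.1%}")        # headroom on Fresh for Product No. 5
print(f"{gap_to_best('Personality', 5):.1%}")  # already the best level, so 0.0%
```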
By now we have all the components necessary for a
comprehensive optimization of product attributes:
1. Constraints on “non-actionable” attributes, i.e. excluding those variables that cannot be affected through product changes.
2. A Bayesian network for each Product.
3. The current attribute rating of each Product and each
attribute’s importance relative to Purchase Intent.
4. The “gap to best level” (variation) for each attribute
and Product.
With the above, we are now in a position to search for realistic product configurations, based on the existing product, that optimize Purchase Intent.
We proceed individually by Product and for illustration
purposes we use Product No. 5 again. We load the
product-speci!c network, which was previously saved
when the Multi Quadrant Analysis was performed.
One of the powerful features of BayesiaLab is Target Dynamic Profile, which we will apply here on this network to optimize Purchase Intent:
Analysis>Report>Target Analysis>Target Dynamic Profile
The Target Dynamic Profile provides a number of important options:
• Profile Search Criterion: we intend to optimize the mean of Purchase Intent.
• Criterion Optimization: maximization is the objective.
• Search Method: We select Mean and also click on Edit Variations, which allows us to manually stipulate the range of possible variations of each attribute. In our case, however, we had saved the actual variations of Product No. 5 versus the competition, so we load that data set, which subsequently displays the values in the Variation Editor. For example, Fresh could be improved by 10.7% before catching up to the highest-rated product in this attribute.
• Search Stop Criterion: We check Maximum Number of Evidence Reached and set this parameter to 4. This means that no more than the top four attributes will be suggested for improvement.
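The search these options configure can be approximated by a greedy loop: at each step, apply the single admissible attribute change that most increases the predicted target, stopping at the evidence limit. In the sketch below, predict() is a hypothetical linear stand-in for inference in the product-specific Bayesian network, and all names and weights are illustrative:

```python
# Greedy sketch of a Target Dynamic Profile-style search. Not BayesiaLab's
# actual algorithm -- a simplified illustration of the idea.
def greedy_profile(predict, baseline, variations, max_evidence=4):
    """Return the ordered list of (attribute, score) changes that maximize predict()."""
    state = dict(baseline)
    actions = []
    for _ in range(max_evidence):
        best = None
        for attr, delta in variations.items():
            if attr in dict(actions) or delta <= 0:
                continue  # skip already-set or non-actionable attributes
            trial = dict(state, **{attr: state[attr] * (1 + delta)})
            score = predict(trial)
            if best is None or score > best[1]:
                best = (attr, score, trial)
        if best is None or predict(state) >= best[1]:
            break  # no remaining change improves the target
        state = best[2]
        actions.append((best[0], best[1]))
    return actions

# Toy linear model standing in for Bayesian inference -- illustration only.
weights = {"Fresh": 0.5, "Fruity": 0.3, "Flowery": 0.2, "Wooded": 0.1, "Musk": 0.4}
predict = lambda s: sum(weights[a] * v for a, v in s.items())
baseline = {a: 3.0 for a in weights}
variations = {"Fresh": 0.107, "Fruity": 0.05, "Flowery": 0.04,
              "Wooded": 0.03, "Musk": 0.0}  # Musk: not actionable
plan = greedy_profile(predict, baseline, variations)
print([attr for attr, _ in plan])
```

With these toy inputs, the greedy loop recovers the same ordering the tutorial reports: Fresh, Fruity, Flowery, Wooded.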
Upon completion of all computations, we will obtain a
list of product action priorities: Fresh, Fruity, Flowery
and Wooded.
The highlighted Value/Mean column shows the successive improvement upon implementation of each action. From an initial 3.76, Purchase Intent improves to 3.92, which may seem like a fairly small step.
However, the importance lies in the fact that this improvement is not based on utopian thinking, but
rather on attainable product improvements within the
range of competitive performance.
Initially, we have the marginal distribution of the
attributes and the original mean value for Purchase Intent, i.e. 3.77.
To further illustrate the impact of our product actions,
we will simulate their implementation step-by-step, which is available through Inference>Interactive Inference.
With the selector in the toolbar, we can go through each
product action step-by-step in the order in which they
were recommended.
Upon implementation of the first product action, we obtain the following picture, and Purchase Intent grows to 3.9. Please note that this is not a sea change in terms of Purchase Intent, but rather a realistic consumer response to a product change.
The second change results in a further subtle improvement to Purchase Intent:
The third and fourth steps are analogous and bring us to the final value for Purchase Intent of 3.92.
Although BayesiaLab generates these recommendations very quickly and easily, they represent a major innovation in the field of marketing science. This particular optimization task has not been tractable with traditional methods.
Conclusion
The presented case study demonstrates how BayesiaLab can transform simple survey data into a deep understanding of consumers’ thinking and quickly provide previously inconceivable product recommendations. As such, BayesiaLab is a revolutionary tool, especially as the workflow shown here may take no more than a few hours for an analyst to implement. This kind of rapid and “actionable”10 insight is clearly a breakthrough and creates an entirely new level of relevance of research for business applications.
10 The authors cringe at the inflationary use of “actionable”, but here, for once, it actually seems appropriate.
Contact Information
Conrady Applied Science, LLC
312 Hamlet’s End Way
Franklin, TN 37067
USA
+1 888-386-8383
[email protected]
www.conradyscience.com
Bayesia SAS
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
+33(0)2 43 49 75 69
www.bayesia.com
Copyright
© Conrady Applied Science, LLC and Bayesia SAS 2010. All rights reserved.
Any redistribution or reproduction of part or all of the
contents in any form is prohibited other than the
following:
• You may print or download this document for your
personal and non-commercial use only.
• You may copy the content to individual third parties for their personal use, but only if you acknowledge
Conrady Applied Science as the source of the material.
• You may not, except with our express written permission, distribute or commercially exploit the
content. Nor may you transmit it or store it in any
other website or other form of electronic retrieval
system.