Driver analysis and product optimization using Bayesian networks
Tutorial on Driver Analysis and Product Optimization with BayesiaLab
Stefan Conrady, [email protected]
Dr. Lionel Jouffe, [email protected]
December 1, 2010
Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting
Table of Contents
Introduction
BayesiaLab
Conrady Applied Science
Acknowledgements
Abstract
Bayesian Networks
Structural Equation Models
Probabilistic Structural Equation Models
Tutorial
Model Development
Data Preparation
Consumer Research
Data Import
Unsupervised Learning
Preliminary Analysis
Variable Clustering
Multiple Clustering
Analysis of Factors
Completing the PSEM
Market Driver Analysis
Product Driver Analysis
Product Optimization
Conclusion
Contact Information
Conrady Applied Science, LLC
Bayesia SAS
Copyright
Conrady Applied Science, LLC - www.conradyscience.com
Driver Analysis and Product Optimization with BayesiaLab i
Introduction

This tutorial is intended for new or prospective users of BayesiaLab. The example in this tutorial is taken from the field of marketing science and is meant to illustrate the capabilities of BayesiaLab with a real-world case study and actual consumer data. Beyond market researchers, analysts and researchers in many fields will hopefully find the proposed methodology valuable and intuitive. In this context, many of the technical steps, such as data preparation and network learning, are outlined in great detail, as they are applicable to research with BayesiaLab in general, regardless of the domain.
BayesiaLab

Bayesia SAS, based in Laval, France, has been developing BayesiaLab since 1999, and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of market research, is highlighted by Bayesia's strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.
Conrady Applied Science

Conrady Applied Science, based in Franklin, TN, is a consulting firm specializing in knowledge discovery and probabilistic reasoning with Bayesian networks. In 2010, Conrady Applied Science was appointed Bayesia's authorized sales and consulting partner for North America.
Acknowledgements

We would like to express our gratitude to Ares Research (www.ares-etudes.com) for generously providing data from their consumer research for our case study.
Abstract

Market driver analysis and product optimization are among the central tasks in product marketing and thus relevant to virtually all types of businesses. BayesiaLab provides a unified software platform which, based on consumer data, can:
1. provide a deep understanding of the market preference structure;
2. directly generate recommendations for prioritized product actions.
The proposed approach utilizes Probabilistic Structural Equation Models (PSEMs), based on machine-learned Bayesian networks. PSEMs provide an efficient alternative to Structural Equation Models (SEMs), which have traditionally been used in market research.
Bayesian Networks

A Bayesian network, also known as a belief network, is a graphical model that represents the joint probability distribution over a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
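The disease/symptom example boils down to a single application of Bayes' rule on a two-node network. A minimal sketch in Python; all probabilities below are invented for illustration and are not taken from any real network:

```python
# Hypothetical two-node network: Disease -> Symptom, with made-up parameters
p_disease = 0.01                  # prior P(Disease)
p_symptom_given_d = 0.90          # P(Symptom | Disease)
p_symptom_given_not_d = 0.05      # P(Symptom | no Disease)

# Marginal probability of observing the symptom (law of total probability)
p_symptom = p_disease * p_symptom_given_d + (1 - p_disease) * p_symptom_given_not_d

# Bayes' rule: posterior probability of the disease given the observed symptom
p_disease_given_symptom = p_disease * p_symptom_given_d / p_symptom
```

Even with a 90% symptom rate among the diseased, the posterior stays modest (around 15%) because the disease is rare, which is exactly the kind of non-intuitive inference a Bayesian network automates.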
Structural Equation Models
Structural Equation Modeling (SEM) is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions. This definition of SEM was articulated by the geneticist Sewall Wright (1921), the economist Trygve Haavelmo (1943) and the cognitive scientist Herbert Simon (1953), and formally defined by Judea Pearl (2000).
Structural Equation Models allow both confirmatory and exploratory modeling, meaning they
are suited to both theory testing and theory
development.
Probabilistic Structural Equation Models
Traditionally, specifying and estimating an SEM required a multitude of manual steps, which are typically very time-consuming, often requiring weeks or even months of an analyst's time. PSEMs are based on the idea of leveraging machine learning to automatically generate a structural model. As a result, creating PSEMs with BayesiaLab is extremely fast and can thus form an immediate basis for much deeper analysis and optimization.
Tutorial

At the beginning of this tutorial, we want to emphasize the overarching objectives of this case study, so we do not lose sight of the "big picture" as we immerse ourselves in the technicalities of BayesiaLab and Bayesian networks.
In this study we want to examine how product attributes perceived by consumers relate to purchase intention for specific products. Put simply, we want to understand the key drivers of purchase intent. Given the large number of attributes in our study, we also want to identify common concepts among these attributes in order to make interpretation easier and communication with managerial decision makers more effective.
Secondly, we want to utilize the generated understanding of consumer dynamics so that product developers can optimize the characteristics of the products under study in order to increase purchase intent among consumers, which is our ultimate business objective.
Notation
In order to clearly distinguish between natural language, BayesiaLab-specific functions and study-specific variable names, the following notation is used:
• BayesiaLab functions, keywords, commands, etc., are shown in bold type.
• Variable names are capitalized and italicized.
Model Development
Data Preparation
Consumer Research
This study is based on a monadic1 consumer survey about perfumes, which was conducted in France. In this example we use survey responses from 1,320 women, who evaluated a total of 11 fragrances on a wide range of attributes:
• 27 ratings on fragrance-related attributes, such as "sweet", "flowery", "feminine", etc., measured on a 1-to-10 scale.
• 12 ratings on projected imagery related to someone who would be wearing the respective fragrance, e.g. "is sexy", "is modern", measured on a 1-to-10 scale.
• 1 variable for Intensity, a measure reflecting the level of intensity, measured on a 1-to-5 scale.2
• 1 variable for Purchase Intent, measured on a 1-to-6 scale.
• 1 nominal variable, Product, for product identification purposes.
Data Import

To start the analysis with BayesiaLab, we first import the data set, which is formatted as a CSV file.3 With Data>Open Data Source>Text File, we start the Data Import wizard, which immediately provides a preview of the data file.
1 A product test involving only one product, i.e. in our study each respondent evaluated only one perfume.
2 The variable Intensity is listed separately due to the a-priori knowledge of its non-linearity and the existence of a "just-about-right" level.
3 CSV stands for "comma-separated values", a common format for text-based data files.
The table displayed in the Data Import wizard shows the
individual variables as columns and the responses as rows. There are a number of options available, e.g. for
sampling. However, this is not necessary in our example
given the relatively small size of the database.
Clicking the Next button prompts a data type analysis, which provides BayesiaLab's best guess regarding the data type of each variable.
Furthermore, the Information box provides a brief summary regarding the number of records, the number of missing values, filtered states, etc.4
For this example, we will need to override the default data type for the Product variable, as each value is a nominal product identifier rather than a numerical scale value. We can change the data type by highlighting the Product variable and clicking the Discrete check box, which changes the color of the Product column to red. We will also define Purchase Intent and Intensity as discrete variables, as the default number of states of these variables is already adequate for our purposes.5
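The same type override can be reproduced outside BayesiaLab for readers who preprocess data in scripts. A small pandas sketch, assuming a hypothetical two-row miniature of the survey file (the real data has 49 columns and 1,320 rows); pandas is an assumption here, not part of the BayesiaLab workflow:

```python
import io
import pandas as pd

# Hypothetical miniature of the survey CSV file
csv = io.StringIO(
    "Product,Fresh,Flowery,Purchase Intent\n"
    "3,8,9,5\n"
    "7,2,4,1\n"
)
df = pd.read_csv(csv)

# Product is a nominal identifier, not a numeric scale - override the inferred type,
# just as clicking the Discrete check box does in the Data Import wizard
df["Product"] = df["Product"].astype("category")
df["Purchase Intent"] = df["Purchase Intent"].astype("category")
```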
The next screen provides options as to how to treat any
missing values. In our case, there are no missing values
so the corresponding panel is grayed-out.
Clicking the small upside-down triangle next to the
variable names brings up a window with key statistics of
the selected variable, in this case Fresh.
The next step is the Discretization and Aggregation dialogue, which allows the analyst to determine the type of discretization that must be performed on all
4 There are no missing values in our database, and filtered states are not applicable in this survey.
5 The desired number of variable states is largely a function of the analyst's judgment.
continuous variables.6 For this survey, and given the number of observations, it is appropriate to reduce the number of states from the original 10 (1 through 10) to a smaller number. One could, for instance, bin the 1-10 rating into low, mid and high, or apply any other method deemed appropriate by the analyst.
The screenshot shows the dialogue for the Manual selection of discretization steps, which permits selecting binning thresholds by point-and-click.
For this particular example, we select Equal Distances with 5 intervals for all continuous variables. This was the analyst's choice in order to be consistent with prior research.
Note
For choosing discretization algorithms beyond this example, the following rule of thumb may be helpful:
• For supervised learning, choose Decision Tree.
• For unsupervised learning, choose, in the order of priority, K-Means, Equal Distances or Equal Frequencies.
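Equal Distances binning itself is easy to reproduce outside BayesiaLab. A sketch with NumPy (an assumption, not part of the BayesiaLab workflow) that cuts the 1-to-10 scale into 5 equal-width intervals, yielding the same <=2.8 and >8.2 boundary states seen later in the Monitors:

```python
import numpy as np

ratings = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # the 1-to-10 rating scale

# Equal Distances with 5 intervals: cut points every (10 - 1) / 5 = 1.8 units
edges = np.linspace(1, 10, 6)  # [1.0, 2.8, 4.6, 6.4, 8.2, 10.0]

# Map each rating to a state index 0..4 (state 0 = "<=2.8", state 4 = "> 8.2")
states = np.digitize(ratings, edges[1:-1], right=True)
```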
Clicking Select All Continuous followed by Finish
completes the import process and the 49 variables (columns) from our database are now shown as blue
nodes in the Graph Panel, which is the main window for
network editing.
This initial view represents a fully unconnected Bayesian
network.
For reasons that will become clear later, we will initially exclude two variables, Product and Purchase Intent. We can do so by right-clicking the nodes and selecting Properties>Exclusion. Alternatively, holding "x" while double-clicking the nodes performs the same exclusion function.
6 BayesiaLab requires discrete distributions for all variables.
Unsupervised Learning

As the next step, we will perform the first unsupervised learning of a network by selecting Learning>Association Discovering>EQ.
The resulting view shows the learned network with all the nodes in their original position.
Needless to say, this view of the network is not very intuitive. BayesiaLab has numerous built-in layout algorithms, of which the Force Directed Layout is perhaps the most commonly used. It can be invoked via View>Automatic Layout>Force Directed Layout or through the keyboard shortcut "p". This shortcut is worth remembering, as it is one of the most commonly used functions.
The resulting network will look similar to the following
screenshot.
To optimize the use of the available screen, clicking the Best Fit button in the toolbar "zooms to fit" the graph to the screen. In addition, rotating the graph with the Rotate Left and Rotate Right buttons helps to create a suitable view.
The final graph should closely resemble the following screenshot and, in this view, the properties of this first learned Bayesian network become immediately apparent. This network is now a compact representation of the 47 dimensions of the joint probability distribution of the underlying database.
It is very important to note that, although this learned graph happens to have a tree structure, this is not the result of an imposed constraint.
Preliminary Analysis

The analyst can further examine this graph by switching into the Validation Mode, which immediately opens up the Monitor Panel on the right side of the screen. This panel is initially empty, but by clicking on any node or multiple nodes in the network, Monitors appear
inside the Monitor Panel and the corresponding nodes
are highlighted in yellow.
By default, the Monitors show the marginal distributions of all selected variables. This shows, for instance, that 9.7% of respondents rated their perfume at <=2.8 in terms of the Fresh attribute.
On this basis, one can start to experiment with the properties of this particular Bayesian network and query it. With BayesiaLab this can be done in an extremely intuitive way, i.e. by setting evidence (or observations) directly on the Monitors. For instance, we can compute the conditional probability distribution of Flowery, given that we have observed a specific value, i.e. a specific state, of Fresh. In formal notation, this would be
P(Flowery | Fresh)
We will now set Flowery to the state that represents the highest rating (>8.2) and we can immediately observe the conditional probability distribution of Fresh, i.e.
P(Fresh | Flowery = ">8.2")
The gray arrows inside the bars indicate how the distributions have changed compared to the previous distributions. This means that respondents who have rated the Flowery attribute of a perfume at the top level will have a 67% probability of also assigning a top rating to the Fresh attribute:
P(Fresh = ">8.2" | Flowery = ">8.2") = 66.9%
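The same conditioning can be reproduced from raw survey rows with a contingency table. A pandas sketch on five hypothetical, already-binned responses (the rows are invented for illustration; the real survey has 1,320 respondents):

```python
import pandas as pd

# Hypothetical toy responses for two attributes, binned into two states
df = pd.DataFrame({
    "Flowery": ["> 8.2", "> 8.2", "> 8.2", "<= 8.2", "<= 8.2"],
    "Fresh":   ["> 8.2", "> 8.2", "<= 8.2", "<= 8.2", "<= 8.2"],
})

# P(Fresh | Flowery): normalize the contingency table within each Flowery state
cond = pd.crosstab(df["Flowery"], df["Fresh"], normalize="index")

# P(Fresh = "> 8.2" | Flowery = "> 8.2")
p_top_given_top = cond.loc["> 8.2", "> 8.2"]
```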
Switching brie"y back into the Modeling Mode and by
clicking on the Flowery node, one can see the probabilistic relationship between Flowery and Fresh in
detail. By learning the network, BayesiaLab has
automatically created a contingency table for every single direct relationship between nodes.
Note
The structure of our Bayesian network may be directed, but the directions of the arcs do not necessarily have to be meaningful.
For observational inference, it is only necessary that the Bayesian network correctly represents the joint probability distribution of the underlying database.
All contingency tables, together with the graph structure,
thus encode the joint probability distribution of our original database.
Returning to the Validation Mode, we can further
examine the properties of our network. Of great interest is the strength of the probabilistic relationships between
the variables. In BayesiaLab this can be shown by
selecting Analysis>Graphic>Arcs’ Mutual Information.
The thickness of the arcs is now proportional to the
Mutual Information, i.e. the strength of the relationship
between the nodes.
Intuitively, Mutual Information measures the
information that X and Y share: it measures how much
knowing one of these variables reduces our uncertainty about the other. For example, if X and Y are
independent, then knowing X does not provide any
information about Y and vice versa, so their mutual
information is zero. At the other extreme, if X and Y are identical then all information conveyed by X is shared
with Y: knowing X determines the value of Y and vice
versa.
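This intuition can be made concrete with a small computation over a joint distribution. The 2x2 joint probabilities below are invented for illustration; the formula is the standard Mutual Information, here in bits (base-2 logarithm):

```python
import numpy as np

# Hypothetical 2x2 joint distribution p(x, y) for two binary attributes
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y)

# I(X;Y) = sum over x, y of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
mi = float((p_xy * np.log2(p_xy / (p_x * p_y))).sum())
```

For this joint distribution the result is about 0.278 bits; replacing p_xy with the product of its marginals would drive the value to zero, matching the independence case described above.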
We can also show the values of the Mutual Information
on the graph by clicking on Display Arc Comments.
In the top part of the comment box
attached to each arc the Mutual Information of the arc is shown. Below,
expressed as a percentage and highlighted
in blue, we see the relative Mutual Information in the direction of the arc (parent node ➔
child node). And, at the bottom, we have the relative
mutual information in the opposite direction of the arc
(child node ➔ parent node).
Variable Clustering

The information about the strength of the relationships between the manifest variables can also be utilized for purposes of Variable Clustering. More specifically, a concept closely related to the Mutual Information, namely the Kullback-Leibler Divergence (K-L Divergence), is utilized for clustering.
Formal Definition of Mutual Information

I(X;Y) = Σ_{y∈Y} Σ_{x∈X} p(x,y) log( p(x,y) / (p(x) p(y)) )
Such variable clusters will allow us to induce new latent
variables, which each represent a common concept among the manifest variables.7 From here on, we will
make a very clear distinction between manifest variables, which are directly observed, such as the survey responses, and latent variables, which are derived. In
traditional statistics, deriving such latent variables or
factors is typically performed by means of Factor Analysis, e.g. Principal Components Analysis (PCA).
In BayesiaLab, this “factor extraction” can be done very
easily via the Analysis>Graphics>Variable Clustering
function, which is also accessible through the keyboard shortcut “s”.
The speed with which this is performed is one of the strengths of BayesiaLab, as the resulting variable clusters are presented instantly.
For probability distributions P and Q of a discrete random variable, their K-L divergence is defined as

D_KL(P || Q) = Σ_i P(i) log( P(i) / Q(i) )

In words, it is the average of the logarithmic difference between the probabilities P(i) and Q(i), where the average is taken using the probabilities P(i).
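The definition translates directly into a few lines of Python. The two distributions below are invented for illustration:

```python
import math

# Two hypothetical distributions over the same three states
P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]

# D_KL(P || Q) = sum over i of P(i) * log( P(i) / Q(i) )
d_kl = sum(p * math.log(p / q) for p, q in zip(P, Q))
```

Note that D_KL is zero only when P and Q agree everywhere, and it is asymmetric: swapping P and Q generally changes the value.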
In this case, BayesiaLab has identified 15 variable clusters, and each node is color-coded according to its cluster membership. To interpret these newly-found clusters, we can zoom in and visually examine the structure on the graph panel.
To support the interpretation process, BayesiaLab can also display a Dendrogram, which allows the analyst to
review the linkage of nodes into variable clusters.
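The linkage behind such a dendrogram can be sketched with a naive agglomerative routine. Everything below is a simplified stand-in for BayesiaLab's actual algorithm: the variable names are real attributes from the study, but the distance matrix (imagine it derived from K-L divergences, small = strongly related) is invented:

```python
# Hypothetical symmetric distance matrix between four manifest variables
names = ["Fresh", "Flowery", "Sweet", "Bold"]
dist = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.2, 0.8],
    [0.2, 0.2, 0.0, 0.9],
    [0.9, 0.8, 0.9, 0.0],
]

def cluster(dist, k):
    """Naive single-linkage agglomeration until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # merge the pair of clusters with the smallest inter-variable distance
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: min(dist[x][y] for x in clusters[p[0]] for y in clusters[p[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

groups = [[names[i] for i in c] for c in cluster(dist, 2)]
```

Asking for two clusters groups the three closely related fragrance attributes together and leaves Bold on its own, which mirrors how a dendrogram's cut height determines the number of clusters.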
7 An alternative approach is to interpret the derived concept or factor as a hidden common cause.
The analyst may also choose a different number of clusters, based on his own judgment of the domain. A slider in the toolbar allows choosing various numbers of clusters, and the color association of the nodes will be updated instantly.
By clicking the Validate Clustering button in the
toolbar, the clusters are saved and the color codes will be formally associated with the nodes. A clustering report
provides us with a formal summary of the new factors
and their associated manifest variables.8
The analyst also has the option to use his domain knowledge to modify which manifest variables belong to specific factors. This can be done by right-clicking on the Graph Panel and selecting Class Editor.
Multiple Clustering

As our next step towards building the PSEM, we will introduce these newly-generated latent factors into our existing network and also estimate their probabilistic relationships with the manifest variables. This means we will create a new node for each latent factor, adding 15 new dimensions to our network. For this step, we need to return to the Modeling Mode, because the introduction of the factor nodes into the network requires the learning algorithms.
8 Variable cluster = derived concept = unobserved latent variable = hidden cause = extracted factor.
More specifically, we select Learning>Multiple Clustering, which brings up the Multiple Clustering dialogue. There is a range of settings, but we will focus here on only a subset. Firstly, we need to specify an output directory for the to-be-learned networks. Secondly, we need to set some parameters for the clustering process, such as the minimum and maximum number of states that can be created during the learning process.
In our example, we select Automatic Selection of the Number of Classes, which will allow the learning algorithm to find the optimum number of factor states up to a maximum of five. This means that each new factor will represent the corresponding manifest variables with up to five states.
The Multiple Clustering process concludes with a report,
which shows details regarding the generated clustering.
The top portion of the report is shown in the following screenshot.
The detail section for Factor_0, as it relates to the manifest variables, is worth highlighting. Here we can see the strength of the relationship between the manifest variables, such as Trust, Bold, etc., and Factor_0. In a traditional Factor Analysis, this would be the equivalent of the factor loadings.
After closing the report, we will now see a new
(unconnected) network, with 15 additional nodes, one for each factor, i.e. Factor_0 through Factor_14,
highlighted in yellow in the screenshot.
Analysis of Factors

We can also further examine how the new factors relate to the manifest variables and how well they represent them. In the case of Factor_0, we want to understand how it can summarize our five manifest variables.
By going into our previously-specified output directory, using the Windows Explorer or the Mac Finder, we can see that 15 new networks (in BayesiaLab's xbl format) were generated. We open the specific network for Factor_0, either by directly double-clicking the xbl file or by selecting Network>Open. The factor-specific networks are identified by a suffix/extension of the format "_[Factor_#].xbl", where "#" stands for the factor number. We then see a network consisting of the manifest variables, with the factor linked by arcs going from the factor to the manifest variables.
Returning to the Validation Mode, we can see five states for Factor_0, labeled C1 through C5, as well as their marginal distribution. As Factor_0 is a target node by default, it automatically appears highlighted in red in the Monitor Panel.
Here we can also study how the states of the manifest variables relate to the states of Factor_0. This can be done easily by setting observations on the monitors, e.g. setting C1 to 100%.
We now see that, given that Factor_0 is in state C1, the variable Active has a probability of approx. 75% of being in state <=2.8. Expressed more formally, we would state P(Active = "<=2.8" | Factor_0 = C1) = 74.57%. This means that respondents who have been assigned to C1 are likely to rate the Active attribute very low as well.
In the Monitor for Factor_0, in parentheses behind the cluster name, we find the expected mean value of the numeric equivalents of the states of the manifest variables, e.g. "C1 (2.08)". That means that, given the state C1 of Factor_0, we expect the mean value of Trust, Bold, Fulfilled, Active and Character to be 2.08.
To go into even greater detail, we can actually look at
every single respondent, i.e. every record in the database, and see what cluster they were assigned to. We select
Inference>Interactive Inference,
which will bring up a record selector in the toolbar.
With this record selector, we can now scroll through the entire database, review the actual ratings of the
respondents and then see the estimation to which cluster
each respondent belongs.
In our first case, record 0, we see the ratings of this respondent indicated by the manifest Monitors. In the highlighted Monitor for Factor_0 we read that this respondent, given her responses, has an 82% probability of belonging to Cluster 5 (C5) in Factor_0.
Moving to our second case, record 1, we see that the respondent belongs to Cluster 3 (C3) with a 96% probability.
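These per-respondent cluster probabilities are posterior distributions over the factor's states. In the simple case where the manifest variables are conditionally independent given the factor (a naive-Bayes-style view; BayesiaLab's actual inference is more general), the posterior follows from multiplying the prior by each observation's likelihood and normalizing. All numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical latent-class model with three factor states
prior = np.array([0.2, 0.5, 0.3])      # P(C1), P(C2), P(C3)
p_fresh = np.array([0.7, 0.2, 0.1])    # P(observed Fresh state | C)
p_flowery = np.array([0.6, 0.3, 0.2])  # P(observed Flowery state | C)

# Posterior over factor states for one respondent's observed ratings
unnorm = prior * p_fresh * p_flowery
posterior = unnorm / unnorm.sum()
```

Here the respondent's two ratings pull the posterior strongly toward the first state (70%), much like the 82% and 96% cluster assignments read off the Monitors above.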
We can also evaluate the performance of our new
network based on Factor_0 by selecting
Analysis>Network Performance>Global.
This will return the log-likelihood density function, as
shown in the following screenshot.
Completing the PSEM

We now return to our main task and our principal network, which has been augmented by the 15 new factors.
Before we re-learn our network with the new factors, we need to include Purchase Intent as a variable and also impose a number of constraints in the form of Forbidden Arcs.
In the Modeling Mode, we can include Purchase Intent by right-clicking the node and unchecking Exclusion. This makes the Purchase Intent variable available in the next stage of learning, which is reflected visually in the node color and the icon.
Our desired SEM-type network structure stipulates that
manifest variables be connected exclusively to the factors and that all the connections with Purchase Intent must
also go through the factors. We achieve such a structure
by imposing the following sets of forbidden arcs:
1. No arcs between manifest variables
2. No arcs from manifest variables to factors
3. No arcs between manifest variables and Purchase Intent
We can define these forbidden arcs by right-clicking anywhere on the graph panel, which brings up the following menu.
In BayesiaLab, all manifest variables and all factors are conveniently grouped into classes, so we can easily define which arcs are forbidden in the Forbidden Arc Editor.
Upon completing this step, we can proceed to learn our network again: Learning>Association Discovering>EQ
The initial result will resemble the following screenshot.
Using the Force Directed Layout algorithm (shortcut
“p”), as before, we can quickly transform this network
into a much more interpretable format.
Now we see manifest variables “laddering up” to the
factors and we also see how the factors are related to each other. Most importantly, we can observe where the
Purchase Intent node was attached to the network
during the learning process. The structure conveys that Purchase Intent has the strongest link with Factor_2.
Now that we can see the big picture, it is perhaps appropriate to give the factors more descriptive names. For obvious reasons, this task is the responsibility of the analyst. In this case study, Factor_0 was given the name "Self-Confident". We add this name into the node comments by double-clicking Factor_0 and scrolling to the right inside the Node Editor until we see the Comments tab.
We repeat this for all other nodes and we can
subsequently display the node comments for all factors by clicking the Display Node Comment icon in the
toolbar or by selecting View>Display Node Comments from the menu.
Market Driver Analysis

Our model, the PSEM, is complete and we can now use it to perform the actual analysis part of this exercise, namely to find out what "drives" Purchase Intent.
We return to the Validation Mode, right-click on Purchase Intent and then check Set As Target Node. Double-clicking the node while pressing "t" is a helpful shortcut.
This will also change the appearance of the node and literally give it the look of a target.
In order to understand the relationship between the
factors and Purchase Intent, we want to tune out all the
manifest variables for the time being. We can do so by right-clicking the Use of Classes icon in the bottom right
corner of the screen. This will bring up a list of all
classes. By default, all are checked and thus visible.
For our purposes, we want to deselect All and then only
check the Factor class.
The resulting view has all the manifest variables grayed-out, so the relationship between the factors becomes
more prominent. By deselecting the manifest variables,
we also exclude them from subsequent analysis.
We will now right-click inside the (currently empty)
Monitor Panel and select Monitors Sorted wrt Target Variable Correlations. The keyboard shortcut “x” will
do the same.
This brings up the monitor for the target node, Purchase Intent, plus all the monitors for the factors, in the order
of the strength of relationship with the Target Node.
This immediately highlights the order of importance of
the factors relative to the Target Node, Purchase Intent.
Another way of comprehensively displaying the importance is by selecting Reports>Target Analysis>Correlations With the Target Node.
"Correlations" is more of a metaphor here, as BayesiaLab actually orders the factors by their Mutual Information relative to the target node, Purchase Intent.
By clicking Quadrants, we can obtain a type of
opportunity graph, which shows the mean value of each
factor on the x-axis and the relative Mutual Information with Purchase Intent on the y-axis. Mutual Information
can be interpreted as importance in this context.
By right-clicking on the graph, we can switch between
the display of the formal factor names, e.g. Factor_0, Factor_1, etc., and the factor comments, such as
Adequacy, Seduction, which is much easier for
interpretation.
As in the previous views, it becomes very obvious that
the factor Adequacy is most important with regard to
Purchase Intent, followed by the factor Seduction. This is very helpful for understanding the overall market
dynamics and for communicating the key drivers to
managerial decision makers.
The lines dividing the graph into quadrants reflect the mean values for each axis. The upper-left quadrant highlights opportunities, as these particular factors are "above average" in importance, but "below average" in terms of their rating.
Product Driver Analysis

Although this insight is relevant for the whole market, it does not yet allow us to work on improving specific products. For this we need to look at product-specific graphs. In addition, we may need to introduce constraints where we do not have the ability to impact certain attributes. Such information must come from the domain expert, in our case the perfumer, who will determine if and how odoriferous compounds can affect the consumers' perception of the product attributes.
These constraints can be entered into BayesiaLab's Cost Editor, which is accessible by right-clicking anywhere in the Graph Panel. Those attributes which cannot be changed (as determined by the expert) will be set to Not Observable. As we proceed with our analysis, these constraints will be extremely important when searching for realistic product scenarios.
On a side note, an example from the presumably more tangible auto industry may better illustrate this kind of constraint. For instance, a vehicle platform may have an inherent wheelbase limitation, which sets a hard limit on the maximum amount of rear passenger legroom. Even if consumers perceived a need for improvement on this attribute, recommending it to the engineers would be futile. As we search for optimum product solutions with our Bayesian network, it is important to bear this in mind, and thus we formally encode these constraints of our domain through the Cost Editor.
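In code terms, a “Not Observable” setting simply removes an attribute from any later search. A minimal sketch, using hypothetical attribute names borrowed from the auto example:

```python
# Sketch: attributes marked "Not Observable" in the Cost Editor are excluded
# from the later optimization search. Attribute names are illustrative.
NOT_OBSERVABLE = {"Rear legroom"}  # capped by the platform's wheelbase

def actionable(attributes):
    """Return only the attributes a product team could actually change."""
    return [a for a in attributes if a not in NOT_OBSERVABLE]

print(actionable(["Rear legroom", "Seat comfort", "Trunk space"]))
```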
Product Optimization
We now return briefly to the Modeling Mode to include the Product variable, which has been excluded from our analysis thus far. Right-clicking the node and then unchecking Properties>Exclusion will achieve this.
At this time, we will also move beyond the analysis of
factors and actually look at the individual product
attributes, so we select Manifest from the Display Classes menu.
Back in the Validation Mode, we can perform a Multi Quadrant Analysis: Tools>Multi Quadrant Analysis
This tool allows us to look at the attribute ratings of
each product and their respective importance, as
expressed with the Mutual Information. Thus we pick
Product as the Selector Node and choose Mutual Information for Analysis. In this case, we also want to check Linearize Nodes’ Values and Regenerate Values, and specify an Output Directory, where the product-specific networks will be saved. In the process of generating the Multi Quadrant Analysis, BayesiaLab will generate one Bayesian network for each Product. For all Products, the network structure will be identical to the network for the entire market; however, the parameters, i.e. the contingency tables, will be specific to each Product.
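Mutual Information, the importance measure used here, can be computed directly from a contingency table of attribute ratings versus Purchase Intent. The sketch below uses toy counts, not the actual survey data:

```python
import math

def mutual_information(joint_counts):
    """Mutual information I(X;Y) in bits from a 2-D table of co-occurrence counts."""
    total = sum(sum(row) for row in joint_counts)
    px = [sum(row) / total for row in joint_counts]        # marginal of X (rows)
    py = [sum(col) / total for col in zip(*joint_counts)]  # marginal of Y (columns)
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, n in enumerate(row):
            if n:
                pxy = n / total
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Illustrative counts: rows = attribute rating (low/high),
# columns = Purchase Intent (low/high). Not from the actual survey.
table = [[40, 10],
         [10, 40]]
print(round(mutual_information(table), 3))
```

A strongly associated attribute yields a high value; an attribute whose ratings are independent of Purchase Intent yields zero.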
Before we proceed to the product-specific networks, we will first look at a Multi Quadrant Analysis by Product; we can select each product’s graph simply by right-clicking and choosing the appropriate product identification number.
Please note that only the observable variables are visible on the chart, i.e. those variables which were not previously defined as “Not Observable” in the Cost Editor.
For Product No. 5, Personality is at the very top of the importance scale. But how does the Personality attribute compare in the competitive context? If we Display Scales by right-clicking on the graph, it appears that Personality is already at the best level among the competitors, i.e. to the far right of the horizontal scale. On the other hand, on the Fresh attribute, Product No. 5 marks the bottom end of the competitive range.
9 Any similarities of identifiers with actual product names are purely coincidental.
For a perfumer it would thus be reasonable to assume that there is limited room for improvement with regard to Personality, and that Fresh offers a perhaps significant opportunity for Product No. 5.
To highlight the differences between products, we will also show Product No. 1 in comparison.
For Product No. 1 it becomes apparent that Intensity is highly important, but that its rating is towards the bottom end of the scale. The perfumer may thus conclude that a bolder version of the same fragrance will improve Purchase Intent.
Finally, by hovering over any data point in the opportunity chart, BayesiaLab can also display the
position of competitors compared to the reference
product for any attribute. The screenshot shows Product
No. 5 as the reference and the position of competitors on the Personality attribute.
BayesiaLab also allows us to measure and save the “gap to best level” (the variations) for each product and each variable through the Export Variations function. This formally captures the opportunity for improvement. Please note that these variations need to be saved individually by Product.
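The “gap to best level” computation can be sketched as the relative headroom of a product against the best-rated competitor on each attribute. The ratings below are made up for illustration:

```python
# Sketch of "gap to best level": for each attribute, the relative improvement a
# reference product has available before matching the best-rated competitor.
# All ratings below are illustrative, not taken from the study.
ratings = {
    # attribute: {product_id: mean rating}
    "Fresh":       {1: 3.10, 5: 3.00, 7: 3.32},
    "Personality": {1: 3.40, 5: 4.10, 7: 3.80},
}

def gap_to_best(attribute, product_id):
    """Relative gap between a product's rating and the best rating on an attribute."""
    scores = ratings[attribute]
    best = max(scores.values())
    own = scores[product_id]
    return (best - own) / own

print(f"{gap_to_best('Fresh', 5):.1%}")        # headroom on Fresh for Product No. 5
print(f"{gap_to_best('Personality', 5):.1%}")  # already the best level, so 0.0%
```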
By now we have all the components necessary for a
comprehensive optimization of product attributes:
1. Constraints on “non-actionable” attributes, i.e. excluding those variables that cannot be affected through product changes.
2. A Bayesian network for each Product.
3. The current attribute rating of each Product and each
attribute’s importance relative to Purchase Intent.
4. The “gap to best level” (variation) for each attribute
and Product.
With the above, we are now in a position to search for realistic product configurations, based on the existing product, that optimize Purchase Intent.
We proceed individually by Product and for illustration
purposes we use Product No. 5 again. We load the
product-speci!c network, which was previously saved
when the Multi Quadrant Analysis was performed.
One of the powerful features of BayesiaLab is Target Dynamic Profile, which we will apply here on this network to optimize Purchase Intent:
Analysis>Report>Target Analysis>Target Dynamic Profile
The Target Dynamic Profile provides a number of important options:
• Profile Search Criterion: we intend to optimize the mean of Purchase Intent.
• Criterion Optimization: maximization is the objective.
• Search Method: We select Mean and also click on Edit Variations, which allows us to manually stipulate the range of possible variations of each attribute. In our case, however, we had saved the actual variations of Product No. 5 versus the competition, so we load that data set, which subsequently displays the values in the Variation Editor. For example, Fresh could be improved by 10.7% before catching up to the highest-rated product in this attribute.
• Search Stop Criterion: We check Maximum Number of Evidence Reached and set this parameter to 4. This means that no more than the top four attributes will be suggested for improvement.
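The search these options configure can be approximated by a greedy loop: at each step, apply the single admissible attribute change that most increases the predicted target, stopping at the evidence limit. In the sketch below, predict() is a hypothetical linear stand-in for inference in the product-specific Bayesian network, and all names and weights are illustrative:

```python
# Greedy sketch of a Target Dynamic Profile-style search. Not BayesiaLab's
# actual algorithm -- a simplified illustration of the idea.
def greedy_profile(predict, baseline, variations, max_evidence=4):
    """Return the ordered list of (attribute, score) changes that maximize predict()."""
    state = dict(baseline)
    actions = []
    for _ in range(max_evidence):
        best = None
        for attr, delta in variations.items():
            if attr in dict(actions) or delta <= 0:
                continue  # skip already-set or non-actionable attributes
            trial = dict(state, **{attr: state[attr] * (1 + delta)})
            score = predict(trial)
            if best is None or score > best[1]:
                best = (attr, score, trial)
        if best is None or predict(state) >= best[1]:
            break  # no remaining change improves the target
        state = best[2]
        actions.append((best[0], best[1]))
    return actions

# Toy linear model standing in for Bayesian inference -- illustration only.
weights = {"Fresh": 0.5, "Fruity": 0.3, "Flowery": 0.2, "Wooded": 0.1, "Musk": 0.4}
predict = lambda s: sum(weights[a] * v for a, v in s.items())
baseline = {a: 3.0 for a in weights}
variations = {"Fresh": 0.107, "Fruity": 0.05, "Flowery": 0.04,
              "Wooded": 0.03, "Musk": 0.0}  # Musk: not actionable
plan = greedy_profile(predict, baseline, variations)
print([attr for attr, _ in plan])
```

With these toy inputs, the greedy loop recovers the same ordering the tutorial reports: Fresh, Fruity, Flowery, Wooded.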
Upon completion of all computations, we will obtain a
list of product action priorities: Fresh, Fruity, Flowery
and Wooded.
The highlighted Value/Mean column shows the successive improvement upon implementation of each action. From an initial 3.76, Purchase Intent improves to 3.92, which may seem like a fairly small step.
However, the importance lies in the fact that this improvement is not based on utopian thinking, but
rather on attainable product improvements within the
range of competitive performance.
Initially, we have the marginal distribution of the
attributes and the original mean value for Purchase Intent, i.e. 3.77.
To further illustrate the impact of our product actions,
we will simulate their implementation step-by-step, which is available through Inference>Interactive Inference.
With the selector in the toolbar, we can go through each
product action step-by-step in the order in which they
were recommended.
Upon implementation of the first product action, we obtain the following picture, and Purchase Intent grows to 3.9. Please note that this is not a sea change in terms of Purchase Intent, but rather a realistic consumer response to a product change.
The second change results in a further subtle improvement to Purchase Intent:
The third and fourth steps are analogous and bring us to the final value for Purchase Intent of 3.92.
Although BayesiaLab generates these recommendations very quickly and easily, they represent a major innovation in the field of marketing science. This particular optimization task has not been tractable with traditional methods.
Conclusion
The presented case study demonstrates how BayesiaLab can transform simple survey data into a deep understanding of consumers’ thinking and quickly provide previously inconceivable product recommendations. As such, BayesiaLab is a revolutionary tool, especially as the workflow shown here may take no more than a few hours for an analyst to implement. This kind of rapid and “actionable”10 insight is clearly a breakthrough and creates an entirely new level of relevance of research for business applications.
10 The authors cringe at the inflationary use of “actionable”, but here, for once, it actually seems appropriate.
Contact Information
Conrady Applied Science, LLC
312 Hamlet’s End Way
Franklin, TN 37067
USA
+1 888-386-8383
[email protected]
www.conradyscience.com
Bayesia SAS
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
+33(0)2 43 49 75 69
www.bayesia.com
Copyright
© Conrady Applied Science, LLC and Bayesia SAS 2010. All rights reserved.
Any redistribution or reproduction of part or all of the
contents in any form is prohibited other than the
following:
• You may print or download this document for your
personal and non-commercial use only.
• You may copy the content to individual third parties for their personal use, but only if you acknowledge
Conrady Applied Science as the source of the material.
• You may not, except with our express written permission, distribute or commercially exploit the
content. Nor may you transmit it or store it in any
other website or other form of electronic retrieval
system.