Knowledge Discovery in the Stock Market
Supervised and Unsupervised Learning with BayesiaLab
Stefan Conrady, [email protected]
Dr. Lionel Jouffe, [email protected]
June 29, 2011
Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting
Table of Contents
Tutorial
Highlights 1
Background & Objective 1
Notation 2
Dataset 3
Data Preparation and Transformation 4
Data Import 5
Determining Discretization Intervals 6
Modeling Mode 8
Unsupervised Learning 12
Bayesian Network versus Correlation Matrix 16
Inference with Bayesian Networks 16
Inference with Hard Evidence 18
Inference with Soft Evidence 22
Bayesian Network Metrics 25
Arc Force 25
Mutual Information 26
Correlation 27
Summary - Unsupervised Learning 27
Supervised Learning 29
Inference with Supervised Learning 32
Adaptive Questionnaire 34
Summary - Supervised Learning 38
Appendix
Appendix 39
Markov Blanket 39
Bayes’ Theorem 39
About the Authors 40
Knowledge Discovery in the Stock Market with Bayesian Networks
www.conradyscience.com | www.bayesia.com ii
Stefan Conrady 40
Lionel Jouffe 40
Contact Information 41
Conrady Applied Science, LLC 41
Bayesia S.A.S. 41
Copyright 41
Tutorial
Highlights
• Unsupervised Learning with BayesiaLab can rapidly generate plausible structures of unfamiliar problem domains, as illustrated in this paper with examples from the U.S. stock market.
• Supervised Learning with BayesiaLab delivers reliable models in high-dimensional domains, providing both powerful predictive performance and a platform for simulating domain dynamics.
• Knowledge representation with Bayesian networks is highly intuitive and effectively provides computable knowledge that allows inference and reasoning under uncertainty.
Background & Objective
Perhaps more than any other kind of time series data, financial markets have been scrutinized by countless mathematicians, economists, investors and speculators over hundreds of years. Even in modern times, despite all scientific advances, the effort of predicting future movements of the stock market sometimes still bears resemblance to the ancient alchemistic aspirations of turning base metals into gold. That is not to say that there is no genuine scientific effort in studying financial markets, but distinguishing serious research from charlatanism (or even fraud) remains remarkably difficult.
We neither aspire to develop a crystal ball for investors nor do we expect to contribute to the economic and econometric literature. However, we find the wealth of data in the financial markets to be fertile ground for experimenting with knowledge discovery algorithms and for generating knowledge representations in the form of Bayesian networks. This area can perhaps serve as a very practical proof of the powerful properties of Bayesian networks, as we can quickly compare machine-learned findings with our own understanding of market dynamics. For instance, the prevailing opinions among investors regarding the relationships between major stocks should be reflected in any structure that is to be discovered by our algorithms.
More specifically, we will utilize the unsupervised and supervised learning algorithms of the BayesiaLab software package to automatically generate Bayesian networks from daily stock returns over a six-year period. We will examine 459 stocks from the S&P 500 index, for which observations are available over the entire timeframe. We selected the S&P 500 as the basis for our study, as the companies listed on this index are presumably among the best-known corporations worldwide, so even a casual observer should be able to critically review the machine-learned findings. In other words, we are trying to machine-learn the obvious, as any mistakes in this process would automatically become self-evident.
Quite often experts’ reaction to such machine-learned findings is, “well, we already knew that.” That is the very point
we want to make, as machine-learning can — within seconds — catch up with human expertise accumulated over years,
and then rapidly expand beyond what is already known.
The power of such algorithmic learning will be still more apparent in entirely unknown domains. However, if we were
to machine-learn the structure of a foreign equity market for expository purposes in this paper, chances are that many
readers would not immediately be able to judge the resulting structure as plausible or not.
In addition to generating human-readable and interpretable structures, we want to illustrate how we can immediately use machine-learned Bayesian networks as “computable knowledge” for automated inference and prediction. Our objective is to gain both a qualitative and quantitative understanding of the stock market by using Bayesian networks. In the quantitative context, we will also show how BayesiaLab can reliably carry out inference with multiple pieces of uncertain and even conflicting evidence. The inherent ability of Bayesian networks to perform computations under uncertainty makes them highly suitable for a wide range of real-world applications.
Continuing the practice established in our previous white papers, we attempt to present the proposed approach in the style of a tutorial, so that each step can be immediately replicated (and scrutinized) by any reader equipped with the BayesiaLab software.1 This reflects our desire to establish a high degree of transparency regarding all proposed methods and to minimize the risk of Bayesian networks being perceived as a black-box technology.
Notation
To clearly distinguish between natural language, software-specific functions and example-specific variable names, the following notation is used:
• Bayesian network and BayesiaLab-specific functions, keywords, commands, etc., are capitalized and shown in bold type.
• Names of attributes, variables, nodes, etc., are italicized.
1 The preprocessed dataset with daily return data is available for download from our website:
www.conradyscience.com/white_papers/financial/SP500_v6_dlog_b.csv
Dataset
The S&P 500 is a free-float capitalization-weighted index of the prices of 500 large-cap common stocks actively traded in the United States, which has been published since 1957. The stocks included in the S&P 500 are those of large publicly held companies that trade on either of the two largest American stock exchanges: the New York Stock Exchange and the NASDAQ. For our case study we have tracked the daily closing prices of all stocks included in the S&P 500 index from January 3, 2005 through December 30, 2010, excluding only those stocks which were not traded continuously over the entire study period. This leaves a total of 459 stock prices with 1,510 observations each.
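The continuity filter described above can be sketched in a few lines of pandas. The data frame here is a tiny made-up stand-in for the actual price table (one column per ticker, one row per trading day); dropping columns with any missing value implements "exclude stocks not traded continuously":

```python
import numpy as np
import pandas as pd

# Toy stand-in for the price table; column "XYZ" has a gap in its
# trading history and should therefore be excluded.
prices = pd.DataFrame({
    "AA":   [28.1, 28.4, 27.9, 28.6],
    "AAPL": [64.8, 65.2, 66.0, 65.5],
    "XYZ":  [10.0, np.nan, 10.2, 10.1],  # not traded continuously
})

# Keep only stocks quoted on every trading day of the study period.
continuous = prices.dropna(axis=1, how="any")
print(list(continuous.columns))  # ['AA', 'AAPL']
```

Applied to the full table of 500 tickers over 1,510 trading days, this step leaves the 459 continuously traded stocks used in the study.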
[Figure: daily closing prices of the first 36 stocks in alphabetical order (A through APH), each plotted over the 1,510 trading days of the study period.]
Data Preparation and Transformation
Rather than treating the time series in levels, we will difference the stock prices and compute the daily returns. More specifically, we will take differences of the logarithms of the levels, which is a good approximation of the daily stock return in percentage terms. After this transformation, 1,509 observations remain and a selection of the first 36 stocks (in alphabetical order) is shown below.
[Figure: daily log returns of the first 36 stocks in alphabetical order (A through APH) after the transformation.]
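The level-to-return transformation can be reproduced with numpy and pandas; the price series below is purely illustrative:

```python
import numpy as np
import pandas as pd

# Illustrative closing prices for a single stock.
prices = pd.Series([100.0, 101.5, 100.8, 102.3])

# Daily return approximated by differencing the log levels:
# r_t = ln(p_t) - ln(p_{t-1}) ≈ (p_t - p_{t-1}) / p_{t-1}
log_returns = np.log(prices).diff().dropna()

# Differencing consumes one observation (1,510 prices -> 1,509 returns).
print(len(log_returns))  # 3
```

The approximation holds because ln(1 + x) ≈ x for small daily percentage moves.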
Data Import
We use BayesiaLab’s Data Import Wizard to load all 459 time series2 into memory from a comma-separated file. BayesiaLab automatically detects the column headers, which contain the ticker symbols3 as variable names.
The next step identifies the data types contained in the dataset and, as expected, BayesiaLab finds 459 continuous variables.
2 Although the dataset has a temporal ordering, for expository simplicity we will treat each time interval as an independent observation.
3 A ticker symbol is a short abbreviation used to uniquely identify publicly traded stocks.
There are no missing values in the dataset and we do not want to filter out any observations, so the next screen of the Data Import Wizard can be skipped entirely.
The next step, however, is critical. As part of every data import process into BayesiaLab we must discretize any continuous variables, which means all 459 variables in our particular case.
BayesiaLab offers a number of algorithms to automatically discretize the continuous variables and one of the most practical ones, for subsequent Unsupervised Learning, is the K-Means algorithm. It provides a very quick way to capture the salient characteristics of probability density curves and creates suitable thresholds for binning purposes.
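To make the K-Means discretization concrete, here is a minimal one-dimensional sketch on synthetic return data. This is plain Lloyd's algorithm with thresholds taken as midpoints between adjacent sorted cluster centers, a common convention; BayesiaLab's internal procedure may differ in initialization and threshold placement:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.015, size=1509)  # synthetic daily log returns

def kmeans_1d(x, k=5, iters=100):
    """Plain 1-D K-Means (Lloyd's algorithm) returning sorted centers."""
    centers = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread initial centers
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return np.sort(centers)

centers = kmeans_1d(x, k=5)
# Bin thresholds: midpoints between adjacent cluster centers.
thresholds = (centers[:-1] + centers[1:]) / 2
print(len(thresholds))  # 4 cut points define 5 intervals
```

Because the cluster centers track the density of the data, the resulting bins are narrower near the mode and wider in the tails, which is exactly the behavior noted later for the learned intervals.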
Determining Discretization Intervals
Analyst judgement is required, though, for choosing an appropriate number of intervals. A common heuristic found in the statistical literature is five observations per parameter. We adapt this as a guide for the minimum number of observations required for each cell in any of the yet-to-be-learned Conditional Probability Tables (CPT).
In our particular case we already know that we will initially perform Unsupervised Learning with the Maximum Weight Spanning Tree algorithm. This tree structure implies that each Node will have only one parent, which, in turn, means that each CPT will have a size determined by the number of parent states times the number of child states. Choosing five intervals for the discretization process would thus mean a CPT size of 25 cells.4
With a uniform distribution of the states this would suggest that we have approximately 60 observations per cell, which would clearly be more than enough. However, upon visual inspection of the actual distributions of the variables, the uniform distribution assumption definitely does not hold. The graph below shows the distribution of variable AA:
4 Other learning algorithms do not have this one-parent constraint and, for instance, a five-interval discretization with three parents per node would generate CPTs consisting of 625 cells. Even when assuming uniform distributions, the available observations would be insufficient for estimation purposes.
Rather, looking at this graph, it may be more appropriate to assume a normal distribution.5 Given that each Node will have one parent, we would perhaps further assume a bivariate normal distribution for the joint distribution of each pair of Nodes. We need to emphasize that we are not attempting to fit distributions per se, but that we are rather trying to find a heuristic that allows us to establish the minimum number of observations needed to characterize the tail ends of the distributions.
An assumed bivariate normal distribution would yield a discrete probability density function similar to what is shown in the table below. In other words, this is what we would expect the Conditional Probability Table (CPT) to approximately look like, once we have discretized the states and learned the CPT from the actual occurrences. However, we have not yet discretized the states, much less estimated the CPT. Actually, we have not really determined how many discretization levels are correct. So, it is a catch-22 and hence the need for a heuristic.
Our heuristic is that we use our qualitative understanding of the distributions to determine a reasonable number of intervals that provides a minimum number of samples for the tails. More formally, the “thinnest tail” is the minimal local joint probability (MLJP). Assuming 5 states for parent and child each, and with a total of 1,509 observations, this would translate into approximately 4 observations for the MLJP (highlighted in red).
!" !# $ # "!" !"#$% &"'&% #"&(% &"'&% !"#$%!# &"'&% (")(% $"*(% (")(% &"'&%$ #"&(% $"*(% &("$#% $"*(% #"&(%# &"'&% (")(% $"*(% (")(% &"'&%" !"#$% &"'&% #"&(% &"'&% !"#$%
&(!$ +,-./012345-
!" !# $ # "!" 6 #! '' #! 6!# #! )) &6* )) #!$ '' &6* #6! &6* ''# #! )) &6* )) #!" 6 #! '' #! 6
789 :212.-;4<;7=3>?;@4?.-
:212.-;4<;81/.52;@4?.-
:212.-;4<;7=3>?;@4?.-
:212.-;4<;81/.52;@4?.-
789
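The expected cell counts can be approximated by simulation. In this sketch the parent/child correlation of 0.5 and the equal-width bin edges are illustrative assumptions, not values from the paper's K-Means bins or the learned network; the point is only to show how thinly populated the corner cells of a 5 × 5 joint table become with 1,509 observations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a correlated (parent, child) pair of standardized returns.
n, rho = 1509, 0.5
parent = rng.standard_normal(n)
child = rho * parent + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Discretize each margin into 5 states and count samples per joint cell.
edges = np.array([-1.5, -0.5, 0.5, 1.5])  # 4 cut points -> 5 states
counts = np.zeros((5, 5), dtype=int)
np.add.at(counts, (np.digitize(parent, edges), np.digitize(child, edges)), 1)

# The "thinnest tail" (MLJP) corresponds to the most sparsely populated cell.
print(counts.sum(), counts.min())
```

With more states, or with additional parents, the sparsest cells quickly drop to zero observations, which is why the choice of five intervals sits at the edge of what the data can support.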
Although the number of expected samples for the MLJP appears to be below the recommended minimum, we will for now proceed on this basis and set the number of intervals to 5. Only upon completion of the discretization, and after learning the network including the CPTs, will we know for sure whether this was indeed a reasonable assumption or not.
5 We omit plotting the distributions of all variables, but all the variables’ distributions do indeed resemble the normal
distribution.
Clicking Finish will now perform the discretization. A progress bar will be shown to track the state of this process.
Modeling Mode
Upon conclusion, the variables are delivered as blue Nodes into the Graph Panel of BayesiaLab and by default we are now in the Modeling Mode. The original variable names, which were stored in the first line of the database, become our Node Names.
At this point it is practical to add Node Comments to the Node Names. Node Comments are typically used in BayesiaLab for longer and more descriptive titles, which can be turned on or off, depending on the desired view of the graph. Here, we associate a dictionary of the complete company names with the Node Comments, while the more compact ticker symbols remain as Node Names.6
The syntax for this association is rather straightforward: we simply define a text file which includes one Node Name per line. Each Node Name is followed by the equal sign (“=”), or alternatively TAB or SPACE, and then by the full company name, which will serve as the Node Comment.
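Such a dictionary file is easy to generate programmatically. The company names below are a small illustrative subset and the file name is arbitrary:

```python
# Ticker -> company name pairs (illustrative subset of the 459 stocks).
names = {
    "PG": "Procter & Gamble",
    "JNJ": "Johnson & Johnson",
    "KMB": "Kimberly-Clark",
}

# Write one "NodeName=Comment" pair per line, as BayesiaLab expects.
with open("node_comments.txt", "w") as f:
    for ticker, company in names.items():
        f.write(f"{ticker}={company}\n")
```

The resulting file can then be loaded via Data>Associate Dictionary>Node>Comments as described below.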
This !le can then be loaded into BayesiaLab via Data>Associate Dictionary>Node>Comments.
Once the comments are loaded, a small call-out symbol will appear next to each Node Name. This indicates that
Node Comments are available for display.
6 To maintain a compact presentation, we will typically use the ticker symbol when referencing a particular stock rather
than the full company name.
As the name implies, selecting View>Display Node Comments (or alternatively the keyboard shortcut “M”) will reveal
the company names.
Node Comments can be displayed for either all Nodes or only for selected ones.
Before proceeding with the first learning step, it is also recommended to briefly switch into the Validation Mode (F5) and to check the distributions of the states of the Nodes. The Monitors of the first nine Nodes are shown below. At first glance, the distributions appear to be plausible representations of the historical return distributions.
Unsupervised Learning
To perform the first Unsupervised Learning algorithm on our dataset, we switch back into Modeling Mode (F4) and select Learning>Association Discovering>Maximum Spanning Tree.7 This starts the Maximum Weight Spanning Tree algorithm, which is the fastest of the Unsupervised Learning algorithms and thus recommended at the beginning of most studies.8 As the name implies, this algorithm generates a tree structure, i.e. it permits only one parent per Node. This constraint is one of the reasons for the extreme learning speed of this algorithm.9 Performing the algorithm on a file of this size should only take a few seconds.
7 In BayesiaLab nomenclature, Unsupervised Learning is listed in the Learning menu as “Association Discovering.”
8 Several other Unsupervised Learning algorithms are available in BayesiaLab, including Taboo, EQ, SopLEQ and Taboo Order.
9 It goes beyond the scope of this tutorial to discuss the different types of learning algorithms and their specific properties.
At first glance, however, the resulting network does not appear simple and tree-like at all.
This can be quickly resolved with BayesiaLab’s built-in layout algorithms. Selecting View>Automatic Layout (shortcut
“P”) rearranges the network instantly to reveal a much more intuitive structure.
The resulting, reformatted Bayesian network representing the stock returns can now be read and interpreted
immediately:10 11
For instance, we can zoom into the branch of the Bayesian network which contains Procter & Gamble (PG).
BayesiaLab offers a search function (shortcut Ctrl-F or ⌘-F), which helps find individual nodes very easily.
10 A separate, high-resolution PDF of this Bayesian network can be downloaded here: www.conradyscience.com/white_papers/financial/SP500_V13.pdf. This allows those readers without an active BayesiaLab installation to explore the network graph in much greater detail.
11 For expositional clarity we have only learned contemporaneous relationships and, as a result, potential lag structures will not appear in this network. However, in BayesiaLab, Unsupervised Learning can be generalized to a temporal application. A white paper specifically focusing on learning temporal (or dynamic) Bayesian networks is planned for the near future.
The neighborhood of Procter & Gamble contains many familiar company names, mostly from the CPG industry.12 Perhaps these companies appear all-too-obvious and the reader may wonder what insight is gained at this point. Chances are that even a casual observer of the industry would have mentioned Kimberly-Clark, Colgate-Palmolive and Johnson & Johnson as businesses operating in the same field as Procter & Gamble, which would therefore presumably have somewhat related stock price movements.
The key point is that without any prior knowledge of this domain a computer algorithm automatically extracted this
structure, i.e. a Bayesian network, which intuitively matches the understanding that we have established over years as
consumers of these companies’ products.
Clearly, if this was an unfamiliar domain, the knowledge gain for the reader would be far greater. However, a lesser-known domain would presumably prevent the reader’s intuitive verification of the machine-discovered structure here.
12 CPG stands for Consumer Packaged Goods.
Bayesian Network versus Correlation Matrix
The benefit of the concise representation as a Bayesian network is further demonstrated by juxtaposing it to a correlation matrix, which would perhaps be the first step in a traditional statistical analysis of this domain. Even when using heat map-style color-coding, the sheer number of relationships13 makes an immediate visual interpretation of the correlation matrix very difficult (see the subset of 25 by 25 cells from the correlation matrix below).
[Table: a 25 × 25 subset of the correlation matrix of daily returns, covering the tickers A through AMZN; all pairwise coefficients are positive, ranging from roughly 0.23 to 0.76.]
Admittedly, there are a number of statistical techniques available which can help in this situation, but the point is that generating a Bayesian network (e.g. with the Maximum Weight Spanning Tree algorithm we used) takes the practitioner
about the same amount of time as computing a correlation matrix, yet the former yields a much richer picture.
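Computing the full correlation matrix is indeed a one-liner in pandas; here it runs on synthetic returns standing in for the real 1,509 × 459 return table, purely to show the scale of the object:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 1,509-day x 459-stock return table.
rng = np.random.default_rng(2)
returns = pd.DataFrame(rng.standard_normal((1509, 459)))
corr = returns.corr()

# The matrix holds (459^2 - 459) / 2 distinct pairwise coefficients.
n = len(corr)
print((n * n - n) // 2)  # 105111
```

Over one hundred thousand pairwise coefficients is what a reader would have to absorb visually, compared to the 458 arcs of the spanning tree.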
Beyond visual interpretability, there is another key distinction between these two representations. Whereas the correlation matrix is merely descriptive, the Bayesian network is actually computable. By its very nature, any Bayesian network is a functioning model. With the correlation matrix, on the other hand, one could not predict the value of one stock given the observation of several others. For this purpose, we would have to fit and estimate specific models, e.g. a regression. In a Bayesian network, however, we can use the graph of the Bayesian network itself for computing inference. For instance, given that we observe the values of JNJ and CL, we immediately obtain an updated value for PG and, at the same time, also updated values for all other Nodes in the network. We refer to this property as omnidirectional inference, which reflects the updating of beliefs given evidence according to Bayes’ Rule.14 We shall illustrate carrying out omnidirectional inference in the next section.
Inference with Bayesian Networks
We have shown that the Maximum Weight Spanning Tree algorithm can generate a readily-interpretable and fully-computable Bayesian network from daily stock return data. However, we have not yet explained in detail what this structure represents specifically.
Each Arc in this structure represents a probabilistic relationship between a pair of Nodes. The parameters15 of these relationships are encoded in Conditional Probability Tables. In the example of the PG and JNJ relationship shown below, the table defines the probabilities of the states of PG, given the states of JNJ. This table can be accessed in the Modeling Mode by simply double-clicking on the desired Node, which opens up the Node Editor.
13 (459² − 459) / 2 = 105,111
14 See appendix for a brief summary of Bayes’ Theorem.
15 We use the term “parameter” rather loosely in this context, as Bayesian networks are entirely nonparametric models
in BayesiaLab.
For clarity, we show the relevant portion of the network for JNJ and PG below plus an enlarged version of the conditional probability table from the Node Editor:
This says, among other things, that given that we observe a JNJ return greater than 1.2%, there is a 50.9% probability that we would observe a PG return of greater than 1.2% (see bottom right cell in the above table). More formally, we can also write P(PG > 0.012 | JNJ > 0.012) = 50.9%.
The upper left cell says, given that we observe a JNJ return smaller than -0.9%, there is a 46.5% probability that we will observe a PG return smaller than -1.3%, i.e. P(PG ≤ -0.013 | JNJ ≤ -0.009) = 46.5%.16
If we follow the network “downstream,” i.e. from PG to KMB, we see that their relationship is quantified in yet another conditional probability table.
16 As the discretization intervals were generated by the K-Means algorithm, the bins do not necessarily have the same
interval size, which we see in this example.
This can be interpreted in the same way: given that we observe a return of PG greater than 1.2%, there is a 42.4% probability that we would also observe a KMB return of higher than 1.2%. This kind of inference is perhaps the simplest type, as we can directly read the table, i.e. “given this, then that.”
Inference with Hard Evidence
Beyond reviewing the conditional probability tables directly in Modeling Mode in the Node Editor, as above, we can carry out inference conveniently in the Validation Mode (shortcut F5) of BayesiaLab.
This allows setting evidence and observing inference directly via the Monitors in the Monitor Panel (right side of screenshot). We will now highlight JNJ and PG and focus on their Monitors only. Prior to setting any evidence, we will simply see their marginal distributions in the Monitors. As we would expect, we see the returns distributed around 0 and the expected value of the returns is 0.
Observing a specific state of a Node is equivalent to setting evidence, and we can do that directly on the histograms inside the Monitors. For instance, we can double-click on the state JNJ > 0.012, which sets it to a 100% probability, as indicated by the green bar. Setting such evidence automatically propagates it throughout the network, and we can immediately observe the new distribution of PG. The gray arrows indicate how the distributions have changed compared to before setting evidence.
So far, this provides no more insight than what we could read from the Conditional Probability Table in the Node Editor of the PG Node. What is not readily accessible from the CPT is the inverse probability, obtained by carrying out inference in the opposite direction of the Arc, i.e. setting evidence on PG and computing JNJ. Bayes' Rule specifies the necessary computation in this case.17
17 See appendix for more details about Bayes' Rule. Although this calculation is straightforward, application errors are unfortunately commonplace. The error is so common that it is now widely known as the Prosecutor's Fallacy. In a recent white paper, Paradoxes and Fallacies, we dedicated a chapter to this problem: www.conradyscience.com/index.php/paradoxes
In BayesiaLab, the inference computation for JNJ is automatic once we set evidence on PG. To illustrate this, we arbitrarily set the PG return to <=-1.3%, and we can immediately see the updated distribution of JNJ.
So far, this could have been computed quite easily by directly applying Bayes’ Rule. It becomes a bit more challenging when we look at more than two Nodes at the same time. This time we will examine JNJ, PG and KMB (their relevant
subnetwork is shown for reference below).
Once again, prior to setting any evidence, the Monitors show the marginal distributions of JNJ, PG and KMB.
Upon setting JNJ > 0.012, we can now see how the evidence not only propagates to PG, but also further “downstream”
to KMB:
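Conceptually, this downstream propagation marginalizes out the intermediate Node: P(KMB | JNJ) = Σ_PG P(KMB | PG) P(PG | JNJ). The sketch below uses a hypothetical two-state simplification; only 0.509 and 0.424 come from the tables discussed earlier, and the remaining values are placeholders.

```python
# Sketch of evidence propagating down the chain JNJ -> PG -> KMB.
# Two-state simplification; only 0.509 and 0.424 are from the paper.
p_pg_given_jnj_up = {"PG>0.012": 0.509, "PG<=-0.013": 0.491}  # given JNJ>0.012
p_kmb_given_pg = {
    "PG>0.012":   {"KMB>0.012": 0.424, "KMB<=-0.013": 0.576},
    "PG<=-0.013": {"KMB>0.012": 0.15,  "KMB<=-0.013": 0.85},
}

def p_kmb_given_jnj(kmb_state):
    """Marginalize the intermediate Node PG out of the chain."""
    return sum(p_kmb_given_pg[pg][kmb_state] * w
               for pg, w in p_pg_given_jnj_up.items())

print(p_kmb_given_jnj("KMB>0.012"))   # downstream effect of JNJ > 0.012
```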
We can also invert the chain of inference by simply setting evidence at the other end of the network, e.g. KMB > 0.012:
Or, we can set evidence on both ends, i.e. on JNJ and KMB, and then read the inference in the middle, for PG.
This inference will probably not surprise us: we now have an 80% probability that PG will have a return greater than
1.2%, given that we set both JNJ and KMB to >0.012.
Inference with Soft Evidence
We are not limited to setting "hard evidence" only, as we did above. In the real world, observations often provide only "soft evidence." So, instead of setting any of these variables to a state with a 100% probability, thus making them "hard evidence," we can use BayesiaLab to set any evidence according to its nature, even when it is uncertain.
For illustration purposes, we will now generate two kinds of “soft evidence,” one for JNJ and one for KMB.
1. We set the evidence directly by right-clicking on the JNJ Monitor and selecting Enter Probabilities:
We can now adjust the histogram by dragging the bars to the desired probability levels, which reflect our subjective belief.
Clicking the light-green button confirms our choice of probabilities.
In addition, we right-click on the Monitor again to Fix Probabilities, meaning that we want to hold these values regardless of any subsequent evidence we enter.
2. Assuming that we have a more general expectation regarding the KMB return, without having any beliefs regarding the probabilities of specific states, we can set the expected mean of the entire KMB distribution. For instance, we set the expected mean of the states of KMB to -1% by right-clicking the KMB Monitor and selecting Distribution for Target Value/Mean.
We type "-0.01" into the dialog box,
which generates a new KMB distribution with the desired mean value of -0.01 or -1%.
It is obvious that an infinite number of combinations could generate a mean value of -1%. However, as an aid to the analyst, BayesiaLab computes which distribution with a mean value of -1% would be "closest" to the a-priori distribution.
Not only are these observations "soft"; in this example they are also of opposite sign, i.e. JNJ has a positive mean return and KMB has a negative mean return.
As a result, carrying out inference generates a more uniform probability distribution for PG (rather than a narrower
distribution), effectively increasing our uncertainty about the state of PG compared to the marginal distribution. The knowledge gain for the analyst is that greater volatility for PG must be expected.
We have limited our example to inference within a small subnetwork of only three Nodes, but we could have applied the same approach over the entire Bayesian network of 459 Nodes. With this, the analyst has complete freedom to set an unlimited number of different kinds of evidence, both hard and soft, and to carry out inference "backwards" and "forwards" within the network. For users of the BayesiaLab software, the automatic computation of inference and the instant visual updating of the Monitors are comparable to recalculating all cells in a large spreadsheet.
Bayesian Network Metrics

As shown in these examples, the Arcs represent the probabilistic relationships between Nodes. In addition to visually interpreting the network structure, and beyond carrying out inference, we can also review the "summary statistics" of the network and its components with several metrics.
It is important to point out that we use the information theory-based concepts of Entropy, Arc Force and Mutual Information as central metrics in generating and analyzing Bayesian networks. This is a clear departure from commonly used metrics in traditional statistics, such as covariance and correlation. While these information theory-based metrics may appear novel to end-users of research, they have many advantages. Most importantly, we can entirely discard the (often incorrect) assumptions of linearity and normal distributions. As a result, highly nonlinear dynamics can be easily captured in a Bayesian network.
Arc Force
For instance, the importance of each Arc can be highlighted by displaying the associated Arc Force and its contribution
with respect to the overall network. From within the Validation Mode, the Arc Force can be displayed by selecting
Analysis>Graphic>Arc Force (or with the shortcut “F”).
Mutual Information
A perhaps more accessible interpretation is possible by displaying the Mutual Information, which can be obtained by selecting Analysis>Graphic>Arcs’ Mutual Information.18
The Mutual Information I(X,Y) measures how much (on average) the observation of random variable Y tells us about the uncertainty of X, i.e. by how much the entropy of X is reduced if we have information on Y. Mutual Information is a symmetric metric, which reflects the uncertainty reduction of X by knowing Y as well as of Y by knowing X.
In our example, knowing the value of PG on average reduces the uncertainty of the value of KMB by 0.2843 bits, which means that it reduces its uncertainty by 13.27% (shown in blue, in the direction of the arc). Conversely, knowing KMB reduces the uncertainty of PG by 13.09% (shown in red, in the opposite direction of the arc).
18 Although interpreting Mutual Information is somewhat more intuitive, in the case of a network tree, Mutual Information is identical to Arc Force. For Bayesian networks that are not trees, this distinction becomes very important.
Correlation
While we emphasize the importance of Arc Force and Mutual Information as measures capable of capturing nonlinear relationships, BayesiaLab also allows displaying Pearson's R for the network (select Analysis>Graphic>Pearson's Correlation or use shortcut "G").
By displaying Pearson's correlation coefficient, we implicitly assume linear relationships between the connected Nodes, which may often not hold in practice. Special care must thus be taken when interpreting low values of R, as they may reflect nonlinearity rather than independence. On the other hand, R values close to 1 do indeed suggest the presence of a linear relationship. Furthermore, Pearson's R can be very helpful for determining the sign of the relationship between variables. BayesiaLab color-codes positive and negative correlations by highlighting the associated Arcs in blue and red, respectively. Finally, correlation is typically a much more familiar metric to audiences who are not acquainted with Mutual Information.
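A small sketch makes the caveat concrete: a deterministic but nonlinear relationship (y = x² on symmetric data) produces a Pearson's R of exactly zero, even though the variables are completely dependent.

```python
# Pearson's R can be zero despite full (nonlinear) dependence: y = x^2.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = sum((ai - ma) ** 2 for ai in a) ** 0.5
    sb = sum((bi - mb) ** 2 for bi in b) ** 0.5
    return cov / (sa * sb)

print(pearson_r(xs, ys))   # 0.0 despite a deterministic relationship
```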
Summary - Unsupervised Learning

In summary, Unsupervised Learning is an excellent approach to obtaining a general understanding of the simultaneous relationships between many variables in a dataset. The learned Bayesian network allows immediate visual interpretation plus immediate computation of omnidirectional inference based on any type of evidence, including uncertain and conflicting observations. Given these properties, Unsupervised Learning with Bayesian networks becomes a universal and robust tool for knowledge discovery and modeling in unknown problem domains.
Supervised Learning

Upon gaining a general understanding of a domain, questions typically arise regarding individual variables and how to predict them specifically. Even though we can use Unsupervised Learning to discover a network structure and use it for prediction, Supervised Learning is often a more appropriate method when studying a specific target variable. With a single target variable defined, BayesiaLab's learning algorithms concentrate on fitting a (generative) model to that target, rather than fitting a model that balances the fit across all variables.
To remain consistent with the example we started earlier, we will once again use PG for illustration purposes. More specifically, we will designate PG as the Target Node. We can do so by right-clicking on the Node and then selecting Set as Target Node from the contextual menu (or by double-clicking the Node while holding "T").
Now that we have defined a Target Node, we can apply a range of Supervised Learning algorithms implemented in BayesiaLab.19
The Markov Blanket20 algorithm is suitable for this kind of application, and its speed is particularly helpful when dealing with hundreds or even thousands of variables. Furthermore, BayesiaLab offers the Augmented Markov Blanket, which starts with the Markov Blanket structure and then uses an unsupervised search to find the probabilistic relations that hold between the variables belonging to the Markov Blanket.21 This unsupervised search requires additional computation time but generally results in improved predictive performance of the model.
The learning process can be started by selecting Learning>Target Node Characterization>Augmented Markov Blanket from the menu.22
19 For expositional clarity we will only learn contemporaneous relationships and, as a result, potential lag structures will
not appear in the resulting networks. However, in BayesiaLab, Supervised Learning can be generalized to a temporal application.
20 See appendix for a definition of the Markov Blanket.
21 Intuitively, the “augmented” part of the network plays the same role as the interaction terms between independent
variables in a regression.
22 In BayesiaLab nomenclature, Supervised Learning is listed in the Learning menu as "Target Node Characterization."
As we still have our previous network that was generated through Unsupervised Learning, we need to confirm the deletion of that original network before proceeding with Supervised Learning.
After a few seconds, we will see the result of the Supervised Learning process. Our Target Node PG is now connected to all variables in its Markov Blanket. This means that, given knowledge of the Nodes in the Markov Blanket, PG is independent of the remaining network. This effectively identifies the subset of variables that are most important for predicting the value of the Target Node, PG.
As stated in the introduction, it is not our intention to forecast stock prices per se, but rather to identify meaningful and relevant structures in the market. This Augmented Markov Blanket is such a structure, and a stock market analyst can use it to identify a relevant subset of stocks for in-depth analysis, perhaps with the objective of establishing a buy/sell recommendation or of trading directly on such knowledge.
Once we have this network, we can use it to analyze these Nodes’ relationships in a number of ways within BayesiaLab. For instance, we can select Analysis>Graphic>Target Mean Analysis, which graphs PG as a function of the other Nodes in the network.
Alternatively, by selecting Analysis>Report>Target Analysis>Correlation with the Target Node,
we obtain a table displaying the Mutual Information between the Nodes in the network and the Target Variable, PG:
By clicking Quadrants, these values can be displayed as a graph:
Inference with Supervised Learning

To illustrate potential applications of Supervised Learning, beyond interpretation, we have created a simple simulation of possible stock market conditions. Despite the hypothetical nature of these scenarios, the underlying Bayesian network was learned from actual market data (as is the case throughout this white paper) and, as a result, the inference computed under these assumed conditions is "real."
One could imagine this purely hypothetical scenario: Colgate-Palmolive and Johnson & Johnson are involved in a patent lawsuit, and an investment analyst speculates about the impact of the imminent verdict in this court case. It is fairly easy to imagine that a verdict in favor of Johnson & Johnson would boost its stock price and simultaneously cause a sharp drop in Colgate-Palmolive's stock. Conversely, a win for Colgate-Palmolive would result in just the opposite. However, our question is how either outcome would affect Procter & Gamble's return, PG. We can best answer this question by simulating either outcome within the Bayesian network we learned.
Prior to setting any evidence, our marginal distributions of returns would be as follows, i.e. this is what we would expect on any given day without any other knowledge:
If we were now to believe in a Johnson & Johnson win in combination with a Colgate-Palmolive loss, and the corresponding stock price movement for both of them, we could create the following scenario:
The gray arrows now highlight the impact on all other stocks in this model, including our target variable, PG. The
model suggests that the new distribution for PG would now be distinctly bimodal as opposed to the normal marginal
distribution.
Now considering the opposite verdict, i.e. a Colgate-Palmolive win and a Johnson & Johnson defeat, we can once again
assume their resulting stock price movements and then infer the impact on PG.
This time, a gain for PG would be much more probable.
So, if an analyst had a deep understanding of the subject matter (or insider knowledge23) and hence could anticipate the patent trial’s outcome, he should, everything else being equal, update his beliefs regarding the Procter & Gamble stock
return according to the computed inference of our model.
It is important to stress that this doesn't mean we have discovered a causal pathway, but rather that we are taking advantage of historically observed associations between returns, which have generated a model in the form of a Bayesian network. The Bayesian network simply allows us to exploit this learned knowledge consistently.
Adaptive Questionnaire

The Bayesian network from above can perhaps also serve to illustrate how evidence-gathering can be optimized in BayesiaLab. Once again, this is purely hypothetical, but let's assume that a stock trader seeks to predict tomorrow's return of PG. Tomorrow, as it turns out, earnings will also be released for numerous other stocks in the CPG industry, excluding PG. With limited time, our stock trader needs to prioritize his research resources on those stocks which will be most informative of the PG return. BayesiaLab has a convenient function, Adaptive Questionnaire, which allows the analyst to adapt his evidence-seeking process based on the most recent information obtained, given the previously learned Bayesian network (shown again below for reference).
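The idea behind the Adaptive Questionnaire can be sketched as a greedy loop: always query the not-yet-observed Node with the highest current mutual information with the Target, then re-rank once the evidence is in. The MI scores below are hypothetical placeholders, and the static ranking only approximates BayesiaLab's dynamic recomputation after each piece of evidence.

```python
# Greedy sketch of the Adaptive Questionnaire idea: at each step, query the
# unobserved Node with the highest mutual information with the Target (PG).
# MI scores are hypothetical placeholders, not values from the model.
mi_with_pg = {"CL": 0.31, "KMB": 0.28, "CLX": 0.21, "JNJ": 0.18}

def next_question(scores, observed):
    candidates = {n: s for n, s in scores.items() if n not in observed}
    return max(candidates, key=candidates.get)

observed = set()
order = []
while len(observed) < len(mi_with_pg):
    node = next_question(mi_with_pg, observed)
    order.append(node)
    observed.add(node)    # in BayesiaLab, MI would be recomputed here
print(order)              # ['CL', 'KMB', 'CLX', 'JNJ']
```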
23 It should be noted that insider trading can refer to both legal and illegal conduct. See
http://www.sec.gov/answers/insider.htm
The function can be called by selecting Inference>Adaptive Questionnaire. The following pop-up window then prompts us to select and confirm the Target.
Initially, the analyst’s research should begin with CL as the most informative Node, which is listed at the top of all
Monitors, right below the Target, PG.
Let's now assume he receives a tip suggesting that CL earnings are coming in much higher than expected. He translates these updated, subjective beliefs into "soft" evidence and thus sets P(CL>0.017)=60%, P(CL<=0.017)=30%, P(CL<=0.05)=10%, plus the remaining states to zero.
Upon entering this probability distribution, the Adaptive Questionnaire will move CL to the bottom (green bars with
gray background) and scroll up the next most important Node to study, in this case KMB.
Upon setting this evidence, the probabilities need to be fixed by right-clicking the Monitor and selecting Fix Probabilities. This is important as other simultaneous beliefs have yet to be set. Without fixing the probabilities of CL, subsequent evidence could inadvertently update the probabilities that were just defined.
Next, the analyst may obtain inconclusive views from his sources on KMB and thus cannot set any new evidence on this particular Node, although it would be the most informative evidence at this point. Rather, he moves on to CLX, which is widely believed to meet the expected earnings without any surprises. As a result, our analyst sets hard negative evidence on either end of the return distribution, meaning that he anticipates no major swings either way: P(CLX<=-0.11)=0 and P(CLX>0.13)=0. Upon setting this evidence, and once again fixing it, the Adaptive Questionnaire presents a new order of Nodes. Interestingly, given the evidence set on CLX, KMB has declined in importance with respect to PG.
In the new order, JNJ is next, and our analyst determines that the stock will definitely gain, based on insider rumors he has heard. He translates this insight into a certain JNJ return greater than 1.2% and sets it as "hard" evidence accordingly.
Given all the evidence he gathered, although some of it may be vague, the analyst concludes that there is now a 90%
probability of a PG return greater than 0.3%. Perhaps more importantly, the chance of a decline of -1.3% or below has
diminished to virtually zero. This translates into an expected mean return of 1.5% versus the a-priori expectation of 0%.
With the Bayesian network generated through Supervised Learning and the subsequent application of the Adaptive Questionnaire, the analyst has optimized his information-seeking process and thus spent the least amount of resources for a maximum reduction of uncertainty regarding the variable of interest.
Summary - Supervised Learning

In many ways, Supervised Learning with BayesiaLab resembles traditional modeling and can thus be benchmarked against a wide range of statistical techniques. In addition to its predictive performance, BayesiaLab offers an array of analysis tools, which can provide the analyst with a deeper understanding of the domain's underlying dynamics. The Bayesian network also provides the basis for a wide range of scenario simulation and optimization algorithms implemented in BayesiaLab. Beyond mere one-time predictions, BayesiaLab allows dealing with evidence interactively and incrementally, which makes it a highly adaptive tool for real-time inference.
Appendix
Appendix
Markov Blanket

In many cases, the Markov Blanket algorithm is a good starting point for any predictive model, whether used for scoring or classification. This algorithm is extremely fast and can even be applied to databases with thousands of variables and millions of records.
The Markov Blanket for a node A is the set of nodes composed of A's parents, its children, and its children's other parents (= spouses).
The Markov Blanket of the node A contains all the variables which, if we know their states, will shield the node A from the rest of the network. This means that the Markov Blanket of a node is the only knowledge needed to predict the behavior of that node. Learning a Markov Blanket selects relevant predictor variables, which is particularly helpful when there is a large number of variables in the database. (In fact, this can also serve as a highly efficient variable selection method in preparation for other types of modeling, outside the Bayesian network framework.)
Bayes' Theorem

Bayes' theorem relates the conditional and marginal probabilities of discrete events A and B, provided that the probability of B does not equal zero:

P(A|B) = P(B|A) P(A) / P(B)
In Bayes’ theorem, each probability has a conventional name:
• P(A) is the prior probability (or “unconditional” or “marginal” probability) of A. It is “prior” in the sense that it
does not take into account any information about B. The unconditional probability P(A) was called “a priori” by
Ronald A. Fisher.
• P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.
• P(B|A) is the conditional probability of B given A. It is also called the likelihood.
• P(B) is the prior or marginal probability of B.
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.
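A quick numeric sketch of the theorem, with arbitrarily chosen probabilities: the marginal P(B) is first assembled by total probability over A and not-A, and the posterior then follows directly.

```python
# Numeric illustration of Bayes' theorem with arbitrary probabilities.
p_a = 0.3                                   # prior P(A)
p_b_given_a, p_b_given_not_a = 0.8, 0.5     # likelihoods

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
posterior = p_b_given_a * p_a / p_b
print(posterior)
```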
About the Authors
Stefan Conrady
Stefan Conrady is the cofounder and managing partner of Conrady Applied Science, LLC, a privately held consulting firm specializing in knowledge discovery and probabilistic reasoning with Bayesian networks. In 2010, Conrady Applied Science was appointed the authorized sales and consulting partner of Bayesia S.A.S. for North America.
Stefan Conrady studied Electrical Engineering and has extensive management experience in the fields of product planning, marketing and analytics, working at Daimler and BMW Group in Europe, North America and Asia. Prior to establishing his own firm, he headed the Analytics & Forecasting group at Nissan North America.
Lionel Jouffe
Dr. Lionel Jouffe is cofounder and CEO of France-based Bayesia S.A.S. Lionel Jouffe holds a Ph.D. in Computer Science and has been working in the field of Artificial Intelligence since the early 1990s. He and his team have been developing BayesiaLab since 1999, and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of consumer research, is highlighted by Bayesia's strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.
Contact Information
Conrady Applied Science, LLC
312 Hamlet's End Way
Franklin, TN 37067
USA
+1 888-386-8383
[email protected]
www.conradyscience.com
Bayesia S.A.S.
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
+33(0)2 43 49 75 69
www.bayesia.com
Copyright

© 2011 Conrady Applied Science, LLC and Bayesia S.A.S. All rights reserved.
Any redistribution or reproduction of part or all of the contents in any form is prohibited other than the following:
• You may print or download this document for your personal and noncommercial use only.
• You may copy the content to individual third parties for their personal use, but only if you acknowledge Conrady
Applied Science, LLC and Bayesia S.A.S as the source of the material.
• You may not, except with our express written permission, distribute or commercially exploit the content. Nor may you transmit it or store it in any other website or other form of electronic retrieval system.