Full Bayesian Network Classifiers, by Jiang Su and Harry Zhang
Flemming Jensen
November 2008
Purpose
To introduce the full Bayesian network classifier (FBC).
Introduction
Bayesian networks are often used for the classification problem, where a learner attempts to construct a classifier from a given set of labeled training examples.
Since the number of possible network structures is extremely large, structure learning often has high computational complexity.
The idea behind the full Bayesian network classifier is to reduce the computational complexity of structure learning by using a full Bayesian network as the structure, and representing variable independence in the conditional probability tables instead of in the network structure.
We use decision trees to represent the conditional probability tables to keep the compact representation of the joint distribution.
Variable Independence
Definition - Conditional independence
Let X, Y, Z be subsets of the variable set W. The subsets X and Y are conditionally independent given Z if:
P(X | Y, Z) = P(X | Z)
Definition - Contextual independence
Let X, Y, Z, T be disjoint subsets of the variable set W. The subsets X and Y are contextually independent given Z and the context t, an assignment of the variables in T, if:
P(X | Y, Z, t) = P(X | Z, t)
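The conditional-independence definition can be checked numerically. The sketch below builds a hypothetical joint distribution over three binary variables that factorizes as P(z)P(x|z)P(y|z), so X and Y are conditionally independent given Z by construction, and then verifies P(X | Y, Z) = P(X | Z) from the joint table; all numbers here are made up for illustration.

```python
from itertools import product

# Hypothetical joint distribution over binary X, Y, Z, built so that
# X and Y are conditionally independent given Z:
# P(x, y, z) = P(z) * P(x | z) * P(y | z)
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in product([0, 1], repeat=3)}

def p_x_given_yz(x, y, z):
    """P(X=x | Y=y, Z=z) computed from the joint table."""
    return joint[(x, y, z)] / sum(joint[(xx, y, z)] for xx in [0, 1])

def p_x_given_z_marginal(x, z):
    """P(X=x | Z=z) computed from the joint table."""
    num = sum(joint[(x, yy, z)] for yy in [0, 1])
    den = sum(joint[(xx, yy, z)] for xx in [0, 1] for yy in [0, 1])
    return num / den

# The definition holds for every assignment: P(X | Y, Z) == P(X | Z).
for x, y, z in product([0, 1], repeat=3):
    assert abs(p_x_given_yz(x, y, z) - p_x_given_z_marginal(x, z)) < 1e-12
```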
Existence
Theorem - Existence
For any BN B, there exists an FBC FB, such that B and FB encode the same variable independencies.
Proof:
Since B is an acyclic graph, the nodes of B can be sorted on the basis of a topological ordering.
Go through each node X in the topological ordering, and add arcs to all the nodes ranked after X.
The resulting network FB is a full BN.
Build a CPT-tree for each node X in FB, such that any variable that is not in the parent set ΠX of X in B does not occur in the CPT-tree of X in FB.
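The structural part of this construction can be sketched in a few lines: topologically sort the DAG B and connect every node to all nodes ranked after it. The adjacency representation and function name below are illustrative choices, not from the paper.

```python
# A minimal sketch of the construction in the proof: given a DAG B as a
# map from each node to its set of parents, topologically sort the nodes
# and make every node a parent of all nodes ranked after it, yielding a
# full Bayesian network structure.
from graphlib import TopologicalSorter

def full_bn_from(b_parents):
    """b_parents maps each node to the set of its parents in B."""
    order = list(TopologicalSorter(b_parents).static_order())  # parents first
    # Arcs of the full network FB: every earlier node points at every later one.
    full_parents = {x: set(order[:i]) for i, x in enumerate(order)}
    return order, full_parents

# Example: a naive-Bayes-like DAG C -> X1, C -> X2.
order, fb = full_bn_from({"C": set(), "X1": {"C"}, "X2": {"C"}})
# Every pair of nodes is now connected by exactly one arc (3 arcs for 3 nodes).
```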
![Page 10: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/10.jpg)
Existence
Theorem - Existence
For any BN B, there exists an FBC FB, such that B and FB encodethe same variable independencies.
Proof:
Since B is an acyclic graph, the nodes of B can be sorted onthe basis of the topological ordering.
Go through each node X in the topological ordering, and addarcs to all the nodes ranked after X .
The resulting network FB is a full BN.
Build a CPT-tree for each node X in FB, such that anyvariable that is not in the parent set ΠX of X in B does notoccur in the CPT-tree of X in FB.
![Page 11: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/11.jpg)
Existence
Theorem - Existence
For any BN B, there exists an FBC FB, such that B and FB encodethe same variable independencies.
Proof:
Since B is an acyclic graph, the nodes of B can be sorted onthe basis of the topological ordering.
Go through each node X in the topological ordering, and addarcs to all the nodes ranked after X .
The resulting network FB is a full BN.
Build a CPT-tree for each node X in FB, such that anyvariable that is not in the parent set ΠX of X in B does notoccur in the CPT-tree of X in FB.
![Page 12: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/12.jpg)
Existence
Theorem - Existence
For any BN B, there exists an FBC FB, such that B and FB encodethe same variable independencies.
Proof:
Since B is an acyclic graph, the nodes of B can be sorted onthe basis of the topological ordering.
Go through each node X in the topological ordering, and addarcs to all the nodes ranked after X .
The resulting network FB is a full BN.
Build a CPT-tree for each node X in FB, such that anyvariable that is not in the parent set ΠX of X in B does notoccur in the CPT-tree of X in FB.
![Page 13: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/13.jpg)
Existence
Theorem - Existence
For any BN B, there exists an FBC FB, such that B and FB encodethe same variable independencies.
Proof:
Since B is an acyclic graph, the nodes of B can be sorted onthe basis of the topological ordering.
Go through each node X in the topological ordering, and addarcs to all the nodes ranked after X .
The resulting network FB is a full BN.
Build a CPT-tree for each node X in FB, such that anyvariable that is not in the parent set ΠX of X in B does notoccur in the CPT-tree of X in FB.
![Page 14: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/14.jpg)
Existence
Theorem - Existence
For any BN B, there exists an FBC FB, such that B and FB encodethe same variable independencies.
Proof:
Since B is an acyclic graph, the nodes of B can be sorted onthe basis of the topological ordering.
Go through each node X in the topological ordering, and addarcs to all the nodes ranked after X .
The resulting network FB is a full BN.
Build a CPT-tree for each node X in FB, such that anyvariable that is not in the parent set ΠX of X in B does notoccur in the CPT-tree of X in FB.
![Page 15: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/15.jpg)
Example - FBC for Naive Bayes
Example of a naive Bayes network: the class node C has an arc to each of the feature nodes X1, X2, X3, X4, and there are no arcs among the features.
Example - FBC for Naive Bayes
Example of an FBC for the naive Bayes: the network is full, with arcs among the features X1, X2, X3, X4 as well as from C, but the CPT-tree of each Xi tests only the class C, with leaf probabilities pi1, pi2, pi3, pi4. Since no feature occurs in another feature's CPT-tree, the extra arcs encode no dependence, and the FBC represents the same independencies as the naive Bayes.
Learning Full Bayesian Network Classifiers
Learning an FBC consists of two parts:
Construction of a full BN.
Learning of decision trees to represent the CPT of each variable.
The full BN is implemented using a Bayesian multinet.
Definition - Bayesian multinet
A Bayesian multinet is a set of Bayesian networks, each of which corresponds to a value c of the class variable C.
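The multinet idea can be sketched abstractly: one network per class value, with classification by argmax over c of P(c) times the likelihood under that class's network. The per-class likelihood functions below are toy placeholders, not real FBC likelihoods.

```python
# A minimal sketch of classification with a Bayesian multinet: the
# multinet maps each class value c to a likelihood function x -> P(x | B_c).
def classify(x, multinet, priors):
    """Return argmax over c of P(c) * P(x | B_c)."""
    return max(priors, key=lambda c: priors[c] * multinet[c](x))

# Toy example: two classes whose 'networks' are simple likelihoods over
# a single binary feature (hypothetical numbers).
multinet = {"c1": lambda x: 0.9 if x == 1 else 0.1,
            "c2": lambda x: 0.3 if x == 1 else 0.7}
priors = {"c1": 0.5, "c2": 0.5}
# classify(1, multinet, priors) -> "c1"
```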
Structure Learning
Learning the structure of a full BN actually means learning an order of the variables and then adding arcs from a variable to all the variables ranked after it.
A variable is ranked based on its total influence on other variables.
The influence (dependency) between two variables can be measured by mutual information.
Definition - Mutual information
Let X and Y be two variables in a Bayesian network. The mutual information is defined as:
M(X; Y) = Σ_{x∈X, y∈Y} P(x, y) log [ P(x, y) / (P(x) P(y)) ]
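The definition translates directly into code when the probabilities are estimated from observed pairs. This sketch uses plain relative-frequency estimates (natural log); the function name is an illustrative choice.

```python
from collections import Counter
from math import log

def mutual_information(pairs):
    """Estimate M(X; Y) from a list of observed (x, y) pairs,
    using relative frequencies as the probability estimates."""
    n = len(pairs)
    p_xy = {k: v / n for k, v in Counter(pairs).items()}
    p_x = {k: v / n for k, v in Counter(x for x, _ in pairs).items()}
    p_y = {k: v / n for k, v in Counter(y for _, y in pairs).items()}
    # Sum P(x, y) log [P(x, y) / (P(x) P(y))] over observed (x, y).
    return sum(p * log(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

# Perfectly dependent binary pairs give M = log 2; independent pairs give 0.
```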
![Page 23: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/23.jpg)
Structure Learning
Learning the structure of a full BN actually means learning anorder of variables and then adding arcs from a variable to all thevariables ranked after it.
A variable is ranked based on its total influence on other variables.
The influence (dependency) between two variables can bemeasured by mutual information.
Definition - Mutual information
Let X and Y be two variables in a Bayesian network. The mutualinformation is defined as:
M(X ; Y ) =∑
x∈X ,y∈Y
P(x , y)logP(x , y)
P(x)P(y)
![Page 24: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/24.jpg)
Structure Learning
Learning the structure of a full BN actually means learning anorder of variables and then adding arcs from a variable to all thevariables ranked after it.
A variable is ranked based on its total influence on other variables.
The influence (dependency) between two variables can bemeasured by mutual information.
Definition - Mutual information
Let X and Y be two variables in a Bayesian network. The mutualinformation is defined as:
M(X ; Y ) =∑
x∈X ,y∈Y
P(x , y)logP(x , y)
P(x)P(y)
![Page 25: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/25.jpg)
Structure Learning
Learning the structure of a full BN actually means learning anorder of variables and then adding arcs from a variable to all thevariables ranked after it.
A variable is ranked based on its total influence on other variables.
The influence (dependency) between two variables can bemeasured by mutual information.
Definition - Mutual information
Let X and Y be two variables in a Bayesian network. The mutualinformation is defined as:
M(X ; Y ) =∑
x∈X ,y∈Y
P(x , y)logP(x , y)
P(x)P(y)
![Page 26: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/26.jpg)
Structure Learning
It is possible that the dependency between two variables, measured by mutual information, is caused merely by noise.
Results by Friedman are used as a dependency threshold to filter out unreliable dependencies.
Definition - Dependency threshold
Let Xi and Xj be two variables in a Bayesian network, and let N be the number of training instances. The dependency threshold, denoted by φ, is defined as:
φ(Xi, Xj) = (log N / (2N)) × Tij, where Tij = |Xi| × |Xj|.
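The threshold is a one-liner once N and the variable cardinalities are known. Reading N as the number of training instances is an assumption here (the slides do not define N explicitly); note that the threshold shrinks as N grows, so more data lets weaker dependencies through.

```python
from math import log

def dependency_threshold(n_instances, card_i, card_j):
    """phi(Xi, Xj) = (log N / (2N)) * Tij, with Tij = |Xi| * |Xj|.
    N is assumed to be the number of training instances."""
    t_ij = card_i * card_j
    return log(n_instances) / (2 * n_instances) * t_ij

# e.g. two binary variables and 400 instances:
# dependency_threshold(400, 2, 2) == log(400) / 800 * 4
```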
![Page 27: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/27.jpg)
Structure Learning
It is possible that the dependency between two variables, measuredby mutual information, is caused merely by noise.
Results by Friedman are used as a dependency threshold to filterout unreliable dependencies.
Definition - Dependency threshold
Let Xi and Xj be two variables in a Bayesian network. Thedependency threshold, denoted by φ, is defined as:
φ(Xi ,Xj) = logN2N × Tij , where Tij = |Xi | × |Xj |.
![Page 28: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/28.jpg)
Structure Learning
It is possible that the dependency between two variables, measuredby mutual information, is caused merely by noise.
Results by Friedman are used as a dependency threshold to filterout unreliable dependencies.
Definition - Dependency threshold
Let Xi and Xj be two variables in a Bayesian network. Thedependency threshold, denoted by φ, is defined as:
φ(Xi ,Xj) = logN2N × Tij , where Tij = |Xi | × |Xj |.
![Page 29: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/29.jpg)
Structure Learning
The total influence of a variable on other variables can now be defined:
Definition - Total influence
Let Xi be a variable in a Bayesian network. The total influence of Xi on other variables, denoted by W(Xi), is defined as:
W(Xi) = Σ_{j ≠ i, M(Xi; Xj) > φ(Xi, Xj)} M(Xi; Xj)
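In code, W(Xi) is a filtered sum over precomputed mutual-information and threshold scores. The dictionary layout and the numbers in the toy example below are illustrative assumptions.

```python
# A minimal sketch: W(Xi) sums only those mutual-information scores that
# clear the dependency threshold, discarding noise-level dependencies.
def total_influence(i, variables, mi, phi):
    """W(Xi) = sum over j != i of M(Xi; Xj) where M(Xi; Xj) > phi(Xi, Xj)."""
    return sum(mi[(i, j)] for j in variables
               if j != i and mi[(i, j)] > phi[(i, j)])

# Toy scores for three variables; 0.01 falls below its threshold 0.05
# and is filtered out of W("A").
mi = {("A", "B"): 0.5, ("A", "D"): 0.01, ("B", "A"): 0.5,
      ("B", "D"): 0.2, ("D", "A"): 0.01, ("D", "B"): 0.2}
phi = {k: 0.05 for k in mi}
# total_influence("A", ["A", "B", "D"], mi, phi) -> 0.5
```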
![Page 30: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/30.jpg)
Structure Learning
The total influence of a variable on other variables can now bedefined:
Definition - Total influence
Let Xi be a variable in a Bayesian network. The total influence ofXi on other variables, denoted by W (Xi ), is defined as:
W (Xi ) =
M(Xi ;Xj )>φ(Xi ,Xj )∑j(j 6=i)
M(Xi ; Xj).
![Page 31: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/31.jpg)
Structure Learning Algorithm
Algorithm FBC-Structure(S, X)
1 B = empty.
2 Partition the training data S into |C| subsets Sc by the class value c.
3 For each training data set Sc:
   - Compute the mutual information M(Xi; Xj) and the dependency threshold φ(Xi, Xj) between each pair of variables Xi and Xj.
   - Compute W(Xi) for each variable Xi.
   - For each variable Xi in X:
     - Add all the variables Xj with W(Xj) > W(Xi) to the parent set ΠXi of Xi.
     - Add arcs from all the variables Xj in ΠXi to Xi.
   - Add the resulting network Bc to B.
4 Return B.
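The algorithm can be sketched end to end in Python. This is a simplified reading of the steps above, with relative-frequency estimates; the data layout (a list of (class, feature-dict) pairs) is an assumption, and ties in W leave both variables without an arc between them, so a tie-breaking rule would be needed for a strictly full network.

```python
from collections import Counter
from math import log

def fbc_structure(S, X):
    """Sketch of FBC-Structure. S: list of (class_value, features) pairs,
    features a dict variable -> value; X: the list of feature variables.
    Returns a dict mapping each class value c to the parent sets of Bc."""
    def mi_and_phi(rows, xi, xj):
        # Mutual information M(Xi; Xj) and threshold phi(Xi, Xj) on rows.
        n = len(rows)
        pairs = [(r[xi], r[xj]) for r in rows]
        p_xy = {k: v / n for k, v in Counter(pairs).items()}
        p_x = {k: v / n for k, v in Counter(a for a, _ in pairs).items()}
        p_y = {k: v / n for k, v in Counter(b for _, b in pairs).items()}
        m = sum(p * log(p / (p_x[a] * p_y[b])) for (a, b), p in p_xy.items())
        phi = log(n) / (2 * n) * len(p_x) * len(p_y)
        return m, phi

    B = {}
    # Step 2: partition S by class value.
    subsets = {}
    for c, features in S:
        subsets.setdefault(c, []).append(features)
    # Step 3: build one network per class.
    for c, rows in subsets.items():
        scores = {(xi, xj): mi_and_phi(rows, xi, xj)
                  for xi in X for xj in X if xi != xj}
        # W(Xi): total influence, counting only scores above the threshold.
        W = {xi: sum(m for (a, _), (m, phi) in scores.items()
                     if a == xi and m > phi) for xi in X}
        # Parents of Xi: all variables with strictly larger W.
        B[c] = {xi: {xj for xj in X if xj != xi and W[xj] > W[xi]}
                for xi in X}
    return B
```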
Example - Structure Learning Algorithm
Example using 1000 labeled instances, where C is the class variable and A, B, and D are feature variables. The # column gives the number of instances with each configuration.
The 400 instances with C = c1:        The 600 instances with C = c2:

C   A   B   D     #                   C   A   B   D     #
c1  a1  b1  d1   11                   c2  a1  b1  d1   36
c1  a1  b1  d2    5                   c2  a1  b1  d2   36
c1  a1  b2  d1    7                   c2  a1  b2  d1  259
c1  a1  b2  d2   17                   c2  a1  b2  d2   29
c1  a2  b1  d1  227                   c2  a2  b1  d1   96
c1  a2  b1  d2   97                   c2  a2  b1  d2   96
c1  a2  b2  d1   11                   c2  a2  b2  d1   43
c1  a2  b2  d2   25                   c2  a2  b2  d2    5
Example - Structure Learning Algorithm

The 400 data instances where C = c1 give the joint distribution over A and B:

P(A,B):
           b1               b2
a1     (11+5)/400       (7+17)/400
a2    (227+97)/400     (11+25)/400
Example - Structure Learning Algorithm

Evaluating the fractions gives:

P(A,B):
      b1     b2
a1   0.04   0.06
a2   0.81   0.09
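This joint distribution can be checked mechanically from the counts. A small Python sketch, with the count table copied from the slides:

```python
# Counts for the 400 instances with C = c1, keyed by (a, b, d).
counts = {
    ('a1', 'b1', 'd1'): 11,  ('a1', 'b1', 'd2'): 5,
    ('a1', 'b2', 'd1'): 7,   ('a1', 'b2', 'd2'): 17,
    ('a2', 'b1', 'd1'): 227, ('a2', 'b1', 'd2'): 97,
    ('a2', 'b2', 'd1'): 11,  ('a2', 'b2', 'd2'): 25,
}
total = sum(counts.values())  # 400

# Marginalise D away to obtain the joint distribution P(A, B).
joint = {}
for (a, b, d), n in counts.items():
    joint[(a, b)] = joint.get((a, b), 0) + n
p_ab = {ab: n / total for ab, n in joint.items()}
print(p_ab)
# {('a1', 'b1'): 0.04, ('a1', 'b2'): 0.06, ('a2', 'b1'): 0.81, ('a2', 'b2'): 0.09}
```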
Example - Structure Learning Algorithm

Multiplying the marginals gives the product distribution P(A)P(B):

            b1                            b2
a1   (0.04+0.06)·(0.04+0.81)    (0.04+0.06)·(0.06+0.09)
a2   (0.81+0.09)·(0.04+0.81)    (0.81+0.09)·(0.06+0.09)

The mutual information between two variables X and Y is

M(X;Y) = Σ_{x∈X, y∈Y} P(x,y) · log( P(x,y) / (P(x)P(y)) )
Example - Structure Learning Algorithm

P(A,B):                 P(A)P(B):
      b1     b2               b1      b2
a1   0.04   0.06        a1   0.085   0.015
a2   0.81   0.09        a2   0.765   0.135

M(A;B) = 0.04·log(0.04/0.085) + 0.81·log(0.81/0.765)
       + 0.06·log(0.06/0.015) + 0.09·log(0.09/0.135) = 0.027

(The value 0.027 corresponds to base-10 logarithms.)
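The result can be reproduced numerically. Note that the slides' value 0.027 is obtained with base-10 logarithms; natural logarithms would give about 0.063.

```python
import math

# Joint P(A,B) and the marginals P(A), P(B) for C = c1, from the slides.
p_ab = {('a1', 'b1'): 0.04, ('a1', 'b2'): 0.06,
        ('a2', 'b1'): 0.81, ('a2', 'b2'): 0.09}
p_a = {'a1': 0.10, 'a2': 0.90}
p_b = {'b1': 0.85, 'b2': 0.15}

# M(A;B) = sum over a, b of P(a,b) * log( P(a,b) / (P(a)P(b)) ).
m_ab = sum(p * math.log10(p / (p_a[a] * p_b[b]))
           for (a, b), p in p_ab.items())
print(round(m_ab, 3))  # 0.027
```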
Example - Structure Learning Algorithm

Mutual information:
M(A;B) = 0.027
M(A;D) = 0.004
M(B;D) = 0.018

Dependency threshold:
φ(Xi, Xj) = (log N / (2N)) · Tij

φ(A,B) = φ(A,D) = φ(B,D) = 4·log(400)/800 = 0.013

Total influence (summing only the pairs whose mutual information exceeds the threshold):

W(Xi) = Σ_{j≠i, M(Xi;Xj) > φ(Xi,Xj)} M(Xi; Xj)

W(A) = M(A;B) = 0.027
W(B) = M(A;B) + M(B;D) = 0.045
W(D) = M(B;D) = 0.018
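These threshold and total-influence values can be checked the same way. Here Tij = 4 is read off the slides' 4·log(400)/800 (presumably |Xi|·|Xj| for two binary variables); base-10 logarithms are used as before.

```python
import math

N = 400  # instances in the C = c1 subset
T = 4    # Tij, matching the slides' 4·log(400)/800
phi = T * math.log10(N) / (2 * N)  # dependency threshold

# Pairwise mutual information values from the slides.
m = {('A', 'B'): 0.027, ('A', 'D'): 0.004, ('B', 'D'): 0.018}

def w(x):
    # Total influence: sum the mutual information of the pairs involving x
    # whose value exceeds the dependency threshold.
    return sum(v for pair, v in m.items() if x in pair and v > phi)

print(round(phi, 3), round(w('A'), 3), round(w('B'), 3), round(w('D'), 3))
# 0.013 0.027 0.045 0.018
```

M(A; D) = 0.004 falls below the threshold, which is why it contributes to none of the W values.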
Example - Structure Learning Algorithm

We now construct a full Bayesian network with the variable order given by the total influence values:

W(A) = 0.027
W(B) = 0.045
W(D) = 0.018

W(B) > W(A) > W(D)

The resulting order is B, A, D, so the full network has the arcs B → A, B → D, and A → D.

We now have the full Bayesian network Bc1, which is the part of the multinet that corresponds to C = c1. We now repeat the process to construct Bc2 and thereby complete the FBC structure learning.
CPT-tree Learning

We now need to learn a CPT-tree for each variable in the full BN.

A traditional decision tree learning algorithm, such as C4.5, could be used to learn the CPT-trees. However, since its time complexity is typically O(n² · N), the resulting FBC learning algorithm would have a complexity of O(n³ · N).

Instead, a fast decision tree learning algorithm is proposed.

The algorithm uses the mutual information to determine a fixed ordering of the variables from root to leaves.

This predefined variable ordering makes the algorithm faster than traditional decision tree learning algorithms.
CPT-tree Learning Algorithm
Algorithm Fast-CPT-Tree(ΠXi, S)
1. Create an empty tree T.
2. If (S is pure or empty) or (ΠXi is empty), return T.
3. qualified = False.
4. While (qualified == False) and (ΠXi is not empty):
   - Choose the variable Xj with the highest M(Xj; Xi).
   - Remove Xj from ΠXi.
   - Compute the local mutual information MS(Xi; Xj) on S.
   - Compute the local dependency threshold φS(Xi, Xj) on S.
   - If MS(Xi; Xj) > φS(Xi, Xj), qualified = True.
5. If qualified == True:
   - Create a root Xj for T.
   - Partition S into disjoint subsets Sx, one for each value x of Xj.
   - For all values x of Xj:
     - Tx = Fast-CPT-Tree(ΠXi, Sx)
     - Add Tx as a child of Xj.
6. Return T.
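As a minimal sketch (not the authors' implementation), the pseudocode can be written in Python. The helpers local_mi and local_threshold stand in for MS and φS and are stubbed here with the numbers from the running example for D's CPT-tree; the purity check on S is omitted for brevity:

```python
def fast_cpt_tree(candidates, S, global_mi, local_mi, local_threshold, domains):
    """Sketch of Fast-CPT-Tree(Pi_Xi, S) from the slides.

    candidates: remaining parent variables Pi_Xi (list of names)
    S: an identifier for the current data subset
    global_mi[Xj]: M(Xj; Xi), fixing the root-to-leaf ordering
    local_mi(S, Xj) / local_threshold(S, Xj): M_S(Xi; Xj) and phi_S(Xi, Xj)
    Returns a nested dict {Xj: {value: subtree}}, or None for an empty tree.
    """
    remaining = list(candidates)            # steps 1-2 (purity test omitted)
    qualified = None
    while remaining and qualified is None:  # steps 3-4
        xj = max(remaining, key=lambda v: global_mi[v])
        remaining.remove(xj)
        if local_mi(S, xj) > local_threshold(S, xj):
            qualified = xj
    if qualified is None:                   # no variable passed the test
        return None
    # Step 5: make the qualified variable the root and recurse per value.
    return {qualified: {
        x: fast_cpt_tree(remaining, (S, qualified, x),
                         global_mi, local_mi, local_threshold, domains)
        for x in domains[qualified]}}

# Stub statistics from the running example (learning D's CPT-tree).
MI = {("S", "B"): 0.018, ("S", "A"): 0.004,
      (("S", "B", "b1"), "A"): 7e-6, (("S", "B", "b2"), "A"): 4e-5}
PHI = {("S", "B"): 0.013,
       (("S", "B", "b1"), "A"): 0.015, (("S", "B", "b2"), "A"): 0.059}
tree = fast_cpt_tree(["A", "B"], "S", {"A": 0.004, "B": 0.018},
                     lambda S, xj: MI[(S, xj)], lambda S, xj: PHI[(S, xj)],
                     {"A": ["a1", "a2"], "B": ["b1", "b2"]})
print(tree)  # {'B': {'b1': None, 'b2': None}} -- B splits, A never qualifies
```

The trace matches the worked example that follows: B qualifies on the full sample and becomes the root, while A fails the threshold test in both branches, so the children are empty trees.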
Example - CPT-tree Learning Algorithm
We construct the CPT-tree for the variable D first.
Fast-CPT-Tree(ΠD = {A,B}, S)
M(D; B) = 0.018 > M(D; A) = 0.004, so Xj = B.
MS(D; B) = M(D; B) = 0.018, φS(D, B) = φ(D, B) = 0.013.
MS(D; B) > φS(D, B), so qualified = True.
Since qualified == True, create a root for Xj = B and partition S into the subsets Sb1 and Sb2.
Recursively call Fast-CPT-Tree(ΠD = {A}, Sb1) and Fast-CPT-Tree(ΠD = {A}, Sb2), and add the resulting trees as children of Xj = B.

(Tree so far: root B with branches b1 and b2.)
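The root decision in this call can be checked numerically with the slide's values:

```python
# Candidate ranking and qualification test for D's CPT-tree root.
global_mi = {"B": 0.018, "A": 0.004}   # M(D; Xj) on the full sample S
best = max(global_mi, key=global_mi.get)
m_s, phi_s = 0.018, 0.013              # M_S(D; B) and phi_S(D, B)
print(best, m_s > phi_s)  # B True -> B becomes the root; S splits into Sb1, Sb2
```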
Example - CPT-tree Learning Algorithm
Fast-CPT-Tree(ΠD = {A}, Sb1)
Only one parent variable remains, so Xj = A.
MSb1(D; A) = 7 · 10⁻⁶, φSb1(D, A) = 0.015.
MSb1(D; A) ≯ φSb1(D, A), so qualified = False.
Since qualified == False, return the empty tree.
Fast-CPT-Tree(ΠD = {A}, Sb2)
Only one parent variable remains, so Xj = A.
MSb2(D; A) = 4 · 10⁻⁵, φSb2(D, A) = 0.059.
MSb2(D; A) ≯ φSb2(D, A), so qualified = False.
Since qualified == False, return the empty tree.
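Both recursive calls fail the same test; as a quick check with the slide's numbers:

```python
# Local MI vs. dependency threshold for the two recursive calls on A.
checks = {"S_b1": (7e-6, 0.015), "S_b2": (4e-5, 0.059)}
qualified = {s: mi > phi for s, (mi, phi) in checks.items()}
print(qualified)  # {'S_b1': False, 'S_b2': False} -> both return an empty tree
```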
![Page 123: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/123.jpg)
Example - CPT-tree Learning Algorithm
Fast-CPT-Tree(ΠD = {A}, Sb1)
Only one parent variable remains, so Xj = A.
MSb1 (D; A) = 7 · 10−6 , φSb1 (D,A) = 0.015MSb1 (D; A) ≯ φSb1 (D,A) so qualified = False.
Since qualified == False, return the empty tree.
Fast-CPT-Tree(ΠD = {A}, Sb2)
Only one parent variable remains, so Xj = A.
MSb2 (D; A) = 4 · 10−5 , φSb2 (D,A) = 0.059MSb2 (D; A) ≯ φSb2 (D,A) so qualified = False.
Since qualified == False, return the empty tree.
![Page 124: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/124.jpg)
Example - CPT-tree Learning Algorithm
Fast-CPT-Tree(ΠD = {A}, Sb1)
Only one parent variable remains, so Xj = A.
MSb1 (D; A) = 7 · 10−6
, φSb1 (D,A) = 0.015MSb1 (D; A) ≯ φSb1 (D,A) so qualified = False.
Since qualified == False, return the empty tree.
Fast-CPT-Tree(ΠD = {A}, Sb2)
Only one parent variable remains, so Xj = A.
MSb2 (D; A) = 4 · 10−5 , φSb2 (D,A) = 0.059MSb2 (D; A) ≯ φSb2 (D,A) so qualified = False.
Since qualified == False, return the empty tree.
![Page 125: Full Bayesian Network Classifiers by Jiang Su and Harry Zhangpeople.cs.aau.dk/~tdn/Teaching/MI2008/Slides/FJ_2.pdfBayesian network as the structure, and represent variable independence](https://reader034.fdocuments.us/reader034/viewer/2022051822/5fec590402f58b2ab30c60b3/html5/thumbnails/125.jpg)
Example - CPT-tree Learning Algorithm
Fast-CPT-Tree(ΠD = {A}, Sb1)
Only one parent variable remains, so Xj = A.
MSb1 (D; A) = 7 · 10−6 , φSb1 (D,A) = 0.015
MSb1 (D; A) ≯ φSb1 (D,A) so qualified = False.
Since qualified == False, return the empty tree.
Fast-CPT-Tree(ΠD = {A}, Sb2)
Only one parent variable remains, so Xj = A.
MSb2 (D; A) = 4 · 10−5 , φSb2 (D,A) = 0.059MSb2 (D; A) ≯ φSb2 (D,A) so qualified = False.
Since qualified == False, return the empty tree.
Example - CPT-tree Learning Algorithm

We now only need to add Xi = D as a child of B and specify the probabilities, which are trivial to calculate:

B = b1:  P(d1 | b1) = (11+227)/340 = 0.7,  P(d2 | b1) = (5+97)/340 = 0.3
B = b2:  P(d1 | b2) = (7+11)/60 = 0.3,  P(d2 | b2) = (17+25)/60 = 0.7

We should repeat this process for each variable in each network.
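As a sketch of how such leaf probabilities are obtained: each leaf of the CPT-tree pools the counts of the instances reaching it and normalizes. The counts below are hard-coded from the slide (each leaf pools two counts, reflecting the rejected split below B), and `leaf_probabilities` is an illustrative helper, not from the paper.

```python
# Counts of D values among instances reaching each leaf (B = b1 or b2).
counts = {
    "b1": {"d1": 11 + 227, "d2": 5 + 97},   # 340 instances with B = b1
    "b2": {"d1": 7 + 11,   "d2": 17 + 25},  # 60 instances with B = b2
}

def leaf_probabilities(counts):
    """Maximum-likelihood estimate P(D = d | B = b) at each leaf."""
    probs = {}
    for b, by_d in counts.items():
        total = sum(by_d.values())
        probs[b] = {d: c / total for d, c in by_d.items()}
    return probs

print(leaf_probabilities(counts))
# → {'b1': {'d1': 0.7, 'd2': 0.3}, 'b2': {'d1': 0.3, 'd2': 0.7}}
```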
Complexity

Let n be the number of variables and N the number of data instances.

FBC-Structure has time complexity O(n² · N).

Fast-CPT-Tree has time complexity O(n · N). Fast-CPT-Tree is called once for each variable in each of the |C| multinet parts. Hence the time complexity: O(|C| · n² · N/|C|) = O(n² · N).

Thus, the FBC learning algorithm has time complexity O(n² · N).
Experiments - Results

33 UCI data sets, available in Weka, are used for the experiments.
The performance of each algorithm on each data set is measured via 10 runs of 10-fold cross-validation.
A two-tailed paired t-test at the 95% confidence level is conducted to compare each pair of algorithms on each data set.

Results on accuracy - classification (data sets won/draw/lost):

        AODE    HGC     TAN     NBT     C4.5     SMO
FBC     8/22/3  4/27/2  6/27/0  6/27/0  11/19/3  6/24/2

Results on AUC - ranking (data sets won/draw/lost):

        AODE    HGC     TAN     NBT     C4.5L    SMO
FBC     7/22/4  6/25/2  9/24/0  8/24/1  25/7/1   10/20/3
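The per-data-set win/draw/loss decision can be sketched as follows. This is an illustrative, self-contained version of the comparison procedure: 10 runs of 10-fold cross-validation give 100 paired scores per algorithm, and a two-tailed paired t-test at the 95% level declares the outcome. The critical value for roughly 99 degrees of freedom and the sample scores are assumptions for the sketch, not values from the paper.

```python
from math import sqrt

T_CRIT = 1.984  # two-tailed 95% critical value of Student's t, ~99 d.o.f.

def compare(scores_a, scores_b):
    """Return 'win', 'loss', or 'draw' for algorithm A vs B (paired t-test)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:  # identical differences everywhere: no variability to test
        return "draw" if mean == 0 else ("win" if mean > 0 else "loss")
    t = mean / sqrt(var / n)
    if abs(t) <= T_CRIT:
        return "draw"
    return "win" if t > 0 else "loss"

# Synthetic example: FBC consistently 2 accuracy points better on 100 folds.
fbc = [0.85 + 0.001 * (i % 10) for i in range(100)]
other = [s - 0.02 for s in fbc]
print(compare(fbc, other))  # → win
```

Counting these outcomes over the 33 data sets yields one won/draw/lost triple per competing algorithm, as in the tables above.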
Experiments - Complexity

Complexity of tested algorithms:

        Training     Classification
FBC     O(n² · N)    O(n)
AODE    O(n² · N)    O(n²)
HGC     O(n⁴ · N)    O(n)
TAN     O(n² · N)    O(n)
NBT     O(n³ · N)    O(n)
C4.5    O(n² · N)    O(n)
SMO     O(n^2.3)     O(n)
Experiments - Conclusion

FBC demonstrates good performance in both classification and ranking.

FBC is among the most efficient algorithms in both training and classification time.

Overall, the performance of FBC is the best among the compared algorithms.