Artificial Intelligence Project #3: Diagnosis Using Bayesian Networks
May 19, 2005
(c) 2005 SNU CSE Biointelligence Lab
Goals of the Project
1. Analysis of the influence of network size and data size on structural learning of Bayesian networks
  Six Bayesian networks of various sizes are given.
  Generate data examples from each Bayesian network.
  Learn Bayesian network structures from the generated data.
  Analyze the learned results according to the network size and the data size.
2. Classification using Bayesian networks
  A microarray dataset consisting of two classes of samples is given.
  Learn Bayesian network classifiers from the dataset.
  Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.
Given Bayesian Networks
Randomly generated
Network structure: scale-free and modular
# of variables: 10, 30, and 45
All variables are binary
Network file format: *.dsc for MSBNx (http://research.microsoft.com/adapt/MSBNx/)
Example Bayesian Network Structure I
Example Bayesian Network Structure II
*.dsc Files
Annotated example file (figure not reproduced), labeling the parts of a *.dsc file: node name, possible states, parents, child, and conditional probability distribution.
Data Generation
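Example network (figure not reproduced): X1 → X3, X1 → X4, X2 → X4, X3 → X5, X4 → X6
Samples are generated in topological order, so every variable is drawn after its parents: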
1. Sample X1 from P(X1)
2. Sample X2 from P(X2)
3. Sample X3 from P(X3| X1)
4. Sample X4 from P(X4| X1, X2)
5. Sample X5 from P(X5| X3)
6. Sample X6 from P(X6| X4)
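The steps above are ancestral (forward) sampling. A minimal Python sketch for the six-node example follows. The slides give no probability values, so the CPT numbers below are invented for illustration, and the dict-based encoding (`NETWORK`, `ORDER`) is a hypothetical stand-in, not the *.dsc format or the internals of data_generator.

```python
import random

# Hypothetical encoding: variable -> (parents, table), where table maps a tuple
# of parent values to P(variable = 1 | parents). Numbers are made up.
NETWORK = {
    "X1": ((), {(): 0.6}),
    "X2": ((), {(): 0.3}),
    "X3": (("X1",), {(0,): 0.2, (1,): 0.7}),
    "X4": (("X1", "X2"), {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}),
    "X5": (("X3",), {(0,): 0.25, (1,): 0.8}),
    "X6": (("X4",), {(0,): 0.4, (1,): 0.95}),
}
# Topological order matching steps 1-6: parents always precede children.
ORDER = ["X1", "X2", "X3", "X4", "X5", "X6"]


def sample_one(network, order, rng):
    """Draw one joint sample by visiting variables in topological order."""
    sample = {}
    for var in order:
        parents, table = network[var]
        # The parents' values are already in `sample` because of the ordering.
        key = tuple(sample[p] for p in parents)
        sample[var] = 1 if rng.random() < table[key] else 0
    return sample


def generate(network, order, n, seed=0):
    """Generate n independent samples with a seeded generator."""
    rng = random.Random(seed)
    return [sample_one(network, order, rng) for _ in range(n)]


if __name__ == "__main__":
    for row in generate(NETWORK, ORDER, 5):
        print([row[v] for v in ORDER])
```

Varying `n` in `generate` is one way to produce the different data sizes the first task asks to compare.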
Data Generation Tool
data_generator
Usage: data_generator [network file style] [# of nodes] [# of data samples] [input file] [output file]...
Structural Learning of Bayesian Networks
Using WEKA software (http://www.cs.waikato.ac.nz/ml/weka/)
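WEKA's learners read data in its ARFF format. A minimal converter is sketched below, assuming the generated samples are held as rows of 0/1 values (the output format of data_generator is not described in the slides). Declaring every attribute nominal `{0,1}` makes WEKA treat the values as discrete states rather than numbers. The function name `to_arff` and the relation name are hypothetical.

```python
def to_arff(rows, names, relation="bn_samples"):
    """Return ARFF text for binary data; every attribute is nominal {0,1}."""
    lines = [f"@relation {relation}", ""]
    for name in names:
        # Double braces in the f-string produce a literal {0,1}.
        lines.append(f"@attribute {name} {{0,1}}")
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row length does not match number of attributes")
        lines.append(",".join(str(int(v)) for v in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff([[0, 1, 1], [1, 0, 1]], ["X1", "X2", "X3"]), end="")
```

The resulting file can then be opened in WEKA, where the BayesNet classifier (weka.classifiers.bayes.BayesNet) learns a structure with a selectable search algorithm such as K2 or hill climbing.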
Learning Example
Figure (not reproduced): the original network structure and the learned network structure
Materials for the First Task
Given:
  Bayesian networks: sf_10.dsc, sf_30.dsc, sf_45.dsc, md_10.dsc, md_30.dsc, md_45.dsc
  Data generation tool: data_generator.exe (for Windows), data_generator (for Linux)
Downloadable:
  MSBNx (http://research.microsoft.com/adapt/MSBNx/)
  WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
You should write your own code for comparing Bayesian network structures.
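One possible starting point for that comparison is sketched below, assuming both structures are given as sets of directed (parent, child) edges. It counts correct, reversed, missing, and extra edges and sums the errors into a structural Hamming distance; this is one common set of measures, not one prescribed by the assignment, and `compare_structures` is a hypothetical name.

```python
def compare_structures(true_edges, learned_edges):
    """Compare two DAGs given as iterables of (parent, child) pairs."""
    true_set = set(true_edges)
    learned_set = set(learned_edges)
    correct = true_set & learned_set
    # True edges that were learned with the opposite direction.
    reversed_edges = {(a, b) for (a, b) in true_set - learned_set
                      if (b, a) in learned_set}
    # True edges absent in either direction.
    missing = (true_set - learned_set) - reversed_edges
    # Learned edges with no counterpart in either direction.
    extra = {(a, b) for (a, b) in learned_set - true_set
             if (b, a) not in true_set}
    return {
        "correct": len(correct),
        "reversed": len(reversed_edges),
        "missing": len(missing),
        "extra": len(extra),
        # Structural Hamming distance: each missing, extra or reversed edge costs 1.
        "shd": len(missing) + len(extra) + len(reversed_edges),
    }


if __name__ == "__main__":
    true_net = [("X1", "X3"), ("X1", "X4"), ("X2", "X4"), ("X3", "X5"), ("X4", "X6")]
    learned = [("X1", "X3"), ("X4", "X1"), ("X2", "X4"), ("X3", "X6")]
    print(compare_structures(true_net, learned))
```

For the example, X4 → X1 counts as one reversed edge, X3 → X5 and X4 → X6 as missing, and X3 → X6 as extra, giving a distance of 4. Averaging such counts over several data samples per network size is one way to make the size effects visible.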
Study
Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35, 2003.
60 leukemia patients
Bone marrow samples
Affymetrix GeneChip arrays
Gene expression data
Gene Expression Data
# of data examples: 120 (60 before treatment, 60 after treatment)
# of genes measured: 12600 (Affymetrix HG-U95A array)
Task: classification between “before treatment” and “after treatment” samples based on gene expression patterns
Affymetrix GeneChip Arrays
Short oligonucleotide probes (oligos) are used to detect gene expression levels.
Each gene is probed by a set of short oligos.
Each gene's expression level is summarized by:
  Signal: a numerical value describing the abundance of the mRNA
  A/P call (Absent/Present): denotes the statistical significance of the signal
Preprocessing
Remove the genes having more than 60 ‘A’ (absent) calls: # of genes 12600 → 3190
Discretization of gene expression levels
  Criterion: the median gene expression value of each sample
  Values are mapped to 0 (low) and 1 (high)
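Both preprocessing steps are sketched below, assuming signals and A/P calls are stored as gene-by-sample lists (a layout the slides do not specify). Two choices here are assumptions rather than details from the slides: the median is taken over the genes that survive the filter, and a value exactly equal to the sample median is mapped to 0. The function names are hypothetical.

```python
from statistics import median


def filter_by_absent_calls(signals, calls, max_absent=60):
    """Keep genes (rows) with at most `max_absent` 'A' calls across all samples."""
    kept = [i for i, row in enumerate(calls) if row.count("A") <= max_absent]
    return [signals[i] for i in kept], kept


def discretize_by_sample_median(signals):
    """For each sample (column), map a signal above that sample's median to 1, else 0."""
    n_samples = len(signals[0])
    medians = [median([row[s] for row in signals]) for s in range(n_samples)]
    return [[1 if row[s] > medians[s] else 0 for s in range(n_samples)]
            for row in signals]


if __name__ == "__main__":
    signals = [[1.0, 5.0, 3.0], [2.0, 2.0, 2.0], [9.0, 1.0, 4.0]]
    calls = [["A", "A", "P"], ["P", "P", "P"], ["A", "A", "A"]]
    # With 3 samples, a threshold of 2 plays the role of 60 out of 120.
    kept_signals, kept = filter_by_absent_calls(signals, calls, max_absent=2)
    print(kept, discretize_by_sample_median(kept_signals))
```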
Gene Filtering
Genes are ranked by their mutual information I(G; C) with the class C.
  Probabilities in the formula are estimated from the data.
  # of genes: 3190 → 1000
Final dataset
  # of attributes: 1001 (1000 genes plus one for the class)
  Class: 0 (after treatment), 1 (before treatment)
  # of data examples: 120
$$I(G; C) = \sum_{G, C} P(G, C) \log \frac{P(G, C)}{P(G)\, P(C)}$$
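The filtering step can be sketched as below, assuming each discretized gene and the class labels are equal-length lists of 0/1 values. Probabilities are the empirical frequencies, which is one reading of "estimated probabilities"; the natural log is used, and since the log base rescales every score by the same factor, it does not change the ranking. `mutual_information` and `select_top_genes` are illustrative names.

```python
from collections import Counter
from math import log


def mutual_information(g, c):
    """I(G;C) in nats from empirical (maximum-likelihood) probability estimates."""
    n = len(g)
    joint = Counter(zip(g, c))
    count_g = Counter(g)
    count_c = Counter(c)
    mi = 0.0
    # Only observed (g, c) pairs are summed; unobserved pairs contribute
    # 0 * log 0 = 0 by convention.
    for (gv, cv), count in joint.items():
        p_joint = count / n
        mi += p_joint * log(p_joint / ((count_g[gv] / n) * (count_c[cv] / n)))
    return mi


def select_top_genes(genes, labels, k=1000):
    """Return indices of the k gene rows with the highest I(G;C)."""
    scores = [mutual_information(row, labels) for row in genes]
    return sorted(range(len(genes)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    genes = [[0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 1, 1]]
    print(select_top_genes(genes, labels, k=2))
```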
Final Dataset
Data matrix (figure not reproduced): 120 data examples × 1000 genes
Materials for the Second Task
Given:
  Preprocessed microarray data file: data2.txt
Downloadable:
  WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
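The simplest Bayesian network classifier is naive Bayes, in which the class node is the only parent of every gene node. The sketch below trains it on 0/1 features with Laplace smoothing and estimates accuracy by leave-one-out cross-validation. The smoothing constant, the evaluation protocol, and the assumption that data2.txt has already been parsed into lists `X` (rows of 0/1 genes) and `y` (class labels) are all choices made here, not taken from the slides.

```python
from math import log


def train_naive_bayes(X, y, alpha=1.0):
    """Fit naive Bayes on binary features with Laplace smoothing `alpha`."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        # P(feature j = 1 | class c), smoothed so no estimate is exactly 0 or 1.
        p_one = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[c] = (log(len(rows) / len(X)), p_one)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    def score(c):
        log_prior, p_one = model[c]
        return log_prior + sum(log(p if v == 1 else 1 - p) for v, p in zip(x, p_one))
    return max(model, key=score)


def leave_one_out_accuracy(X, y, alpha=1.0):
    """Train on all examples but one, test on the held-out one, and average."""
    correct = 0
    for i in range(len(X)):
        model = train_naive_bayes(X[:i] + X[i + 1:], y[:i] + y[i + 1:], alpha)
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
    y = [1, 1, 1, 0, 0, 0]
    print(leave_one_out_accuracy(X, y))
```

Within WEKA, comparable models are available as weka.classifiers.bayes.NaiveBayes and weka.classifiers.bayes.BayesNet, and a neural network baseline as weka.classifiers.functions.MultilayerPerceptron. Because the 1000 genes were selected using all 120 examples, accuracy measured on the same examples may be optimistic.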
Due: June 16, 2005