Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks May 19, 2005.


(c) 2005 SNU CSE Biointelligence Lab


Goals of the Project

Analysis of the influence of network size and data size on structural learning of Bayesian networks:

Six Bayesian networks of various sizes are given.

Generate data examples from each Bayesian network.

Learn Bayesian network structures from the generated data.

Analyze the learned results according to the network size and the data size.

Classification using Bayesian networks:

A microarray dataset consisting of two classes of samples is given.

Learn Bayesian network classifiers from the dataset.

Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.


Given Bayesian Networks

Randomly generated

Network structure: scale-free and modular

# of variables: 10, 30, and 45

All variables are binary

Network file format: *.dsc for MSBNX (http://research.microsoft.com/adapt/MSBNx/)


Example Bayesian Network Structure I


Example Bayesian Network Structure II


*.dsc Files

[Figure: annotated example of a *.dsc file, labeling the node name, possible states, parents, child, and conditional probability distribution]


Data Generation

[Figure: example network with edges X1 → X3, X1 → X4, X2 → X4, X3 → X5, and X4 → X6]

1. Sample X1 from P(X1)

2. Sample X2 from P(X2)

3. Sample X3 from P(X3| X1)

4. Sample X4 from P(X4| X1, X2)

5. Sample X5 from P(X5| X3)

6. Sample X6 from P(X6| X4)
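The six sampling steps above visit each node after its parents, so one joint example is drawn per pass. A minimal Python sketch of this procedure follows. The probability values in `CPT` are hypothetical placeholders (the slides do not give them), and the names `PARENTS`, `CPT`, `sample_once`, and `generate` are illustrative; this is not the implementation of the provided data_generator tool.

```python
import random

# Parents of each node, listed in a topological order (parents before children).
PARENTS = {
    "X1": [],
    "X2": [],
    "X3": ["X1"],
    "X4": ["X1", "X2"],
    "X5": ["X3"],
    "X6": ["X4"],
}

# Hypothetical CPTs: P(node = 1 | parent values), keyed by the tuple of parent values.
CPT = {
    "X1": {(): 0.3},
    "X2": {(): 0.6},
    "X3": {(0,): 0.2, (1,): 0.8},
    "X4": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "X5": {(0,): 0.3, (1,): 0.7},
    "X6": {(0,): 0.25, (1,): 0.75},
}


def sample_once(rng):
    """Draw one joint sample by visiting the nodes in topological order."""
    sample = {}
    for node, parents in PARENTS.items():
        key = tuple(sample[p] for p in parents)  # parent values are already sampled
        p_one = CPT[node][key]
        sample[node] = 1 if rng.random() < p_one else 0
    return sample


def generate(n_samples, seed=0):
    """Generate n_samples independent examples with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_once(rng) for _ in range(n_samples)]


if __name__ == "__main__":
    for row in generate(5):
        print(row)
```

Because all variables are binary, storing only P(node = 1 | parents) is enough; the probability of state 0 is its complement.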


Data Generation Tool

data_generator

Usage: data_generator [network file style] [# of nodes] [# of data samples] [input file] [output file]...


Structural Learning of Bayesian Networks

Using WEKA software (http://www.cs.waikato.ac.nz/ml/weka/)


Learning Example

The original network structure

Learned network structure


Materials for the First One

Given:

Bayesian networks: sf_10.dsc, sf_30.dsc, sf_45.dsc, md_10.dsc, md_30.dsc, md_45.dsc

Data generation tool: data_generator.exe [for Windows], data_generator [for Linux]

Downloadable:

MSBNX (http://research.microsoft.com/adapt/MSBNx/)

WEKA (http://www.cs.waikato.ac.nz/ml/weka/)

You should write your own code for comparing Bayesian network structures.
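One possible starting point for such a comparison, offered only as a sketch: represent each structure as a set of directed edges (parent, child) and count correct, reversed, extra, and missing edges. The function name `compare_structures`, the edge-list representation, and the learned edges in the usage example are assumptions for illustration; parsing edges out of the *.dsc files or WEKA output is not shown.

```python
def compare_structures(true_edges, learned_edges):
    """Compare two directed graphs given as iterables of (parent, child) pairs."""
    true_set = set(true_edges)
    learned_set = set(learned_edges)

    # Edges learned with the right endpoints and the right direction.
    correct = true_set & learned_set
    # Learned edges whose endpoints match a true edge but point the other way.
    reversed_edges = {(a, b) for (a, b) in learned_set - true_set
                      if (b, a) in true_set}
    # Learned edges with no counterpart in the true graph.
    extra = learned_set - true_set - reversed_edges
    # True edges that were not learned in either direction.
    missing = {(a, b) for (a, b) in true_set - learned_set
               if (b, a) not in learned_set}

    precision = len(correct) / len(learned_set) if learned_set else 0.0
    recall = len(correct) / len(true_set) if true_set else 0.0
    return {
        "correct": len(correct),
        "reversed": len(reversed_edges),
        "extra": len(extra),
        "missing": len(missing),
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    # True edges follow the six-node example; the learned edges are made up.
    true_edges = [("X1", "X3"), ("X1", "X4"), ("X2", "X4"),
                  ("X3", "X5"), ("X4", "X6")]
    learned_edges = [("X1", "X3"), ("X4", "X1"), ("X3", "X5"), ("X2", "X6")]
    print(compare_structures(true_edges, learned_edges))
```

Counting reversed edges separately is a design choice: score-based learners often cannot distinguish edge directions within an equivalence class, so a reversed edge may deserve a smaller penalty than a missing one.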


Study

Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35, 2003.

60 leukemia patients

Bone marrow samples

Affymetrix GeneChip arrays

Gene expression data


Gene Expression Data

# of data examples: 120 (60 before treatment, 60 after treatment)

# of genes measured: 12600 (Affymetrix HG-U95A array)

Task: Classification between “before treatment” and “after treatment” based on gene expression pattern


Affymetrix GeneChip Arrays

Use short oligos to detect gene expression level.

Each gene is probed by a set of short oligos.

Each gene expression level is summarized by:

Signal: numerical value describing the abundance of mRNA

A/P call: denotes the statistical significance of signal


Preprocessing

Remove the genes having more than 60 ‘A’ calls

# of genes: 12600 → 3190

Discretization of gene expression level

Criterion: median gene expression value of each sample

0 (low) and 1 (high)
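The two preprocessing steps above can be sketched in Python as follows. Several details are assumptions not stated on the slide: data is laid out as one row per gene and one column per sample, calls are the strings 'A' and 'P', a value exactly equal to the sample median is mapped to 0, and the median is taken over whatever genes are passed in. The function names `filter_genes` and `discretize` are illustrative.

```python
from statistics import median


def filter_genes(calls, max_absent=60):
    """Return indices of genes with at most max_absent 'A' calls.

    calls[g][s] is the call ('A' or 'P') for gene g in sample s.
    """
    return [g for g, row in enumerate(calls)
            if sum(1 for c in row if c == "A") <= max_absent]


def discretize(signal):
    """Binarize signal[g][s] against the median of each sample (column).

    Values strictly above the sample median become 1 (high), others 0 (low).
    """
    n_genes = len(signal)
    n_samples = len(signal[0])
    thresholds = [median(signal[g][s] for g in range(n_genes))
                  for s in range(n_samples)]
    return [[1 if signal[g][s] > thresholds[s] else 0
             for s in range(n_samples)]
            for g in range(n_genes)]


if __name__ == "__main__":
    # Tiny made-up example: 3 genes, 2 samples.
    signal = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
    print(discretize(signal))
```

Using a per-sample threshold makes the discretization insensitive to overall intensity differences between arrays, which is one plausible reason for this choice of criterion.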


Gene Filtering

Using mutual information:

$$I(G; C) = \sum_{G, C} P(G, C) \log \frac{P(G, C)}{P(G)\,P(C)}$$

Estimated probabilities were used.

# of genes: 3190 → 1000

Final dataset:

# of attributes: 1001 (one for the class)

Class: 0 (after treatment), 1 (before treatment)

# of data examples: 120
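The mutual information between a discretized gene G and the class C, with probabilities estimated from counts, can be computed as in the sketch below. The logarithm base is not given on the slide; base 2 is assumed here (the ranking of genes does not depend on the base). The names `mutual_information` and `top_k_genes`, and the one-row-per-gene layout, are illustrative assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(gene, labels):
    """I(G; C) in bits, with probabilities estimated as relative frequencies.

    gene and labels are equal-length sequences of discrete values.
    """
    n = len(labels)
    joint = Counter(zip(gene, labels))
    count_g = Counter(gene)
    count_c = Counter(labels)
    mi = 0.0
    # Only observed (g, c) pairs are visited, so zero-probability terms drop out.
    for (g, c), count in joint.items():
        p_gc = count / n
        mi += p_gc * log2(p_gc / ((count_g[g] / n) * (count_c[c] / n)))
    return mi


def top_k_genes(data, labels, k):
    """Indices of the k genes with the highest mutual information with the class.

    data[g] holds the discretized values of gene g across all examples.
    Ties are broken by the lower gene index.
    """
    scores = [(mutual_information(row, labels), idx)
              for idx, row in enumerate(data)]
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scores[:k]]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    data = [[0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
    print(top_k_genes(data, labels, 2))
```

For the filtering step described on this slide, `k` would be 1000 applied to the 3190 genes that survive preprocessing.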


Final Dataset

[Figure: final dataset matrix with dimensions 120 (data examples) and 1000 (genes)]


Materials for the Second One

Given:

Preprocessed microarray data file: data2.txt

Downloadable:

WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
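For orientation, the simplest Bayesian network classifier is naive Bayes, in which the class is the only parent of every attribute. The sketch below trains such a classifier on binary attributes and estimates accuracy by leave-one-out cross-validation. It is a hand-written illustration, not WEKA's implementation; the Laplace smoothing constant `alpha`, the leave-one-out protocol, and all function names are assumptions, and reading data2.txt is not shown because its format is not described here.

```python
from math import log


def train_naive_bayes(X, y, alpha=1.0):
    """Fit P(C) and P(attribute = 1 | C) for binary attributes with smoothing."""
    classes = sorted(set(y))
    n_features = len(X[0])
    model = {"classes": classes, "log_prior": {}, "log_p1": {}, "log_p0": {}}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        model["log_prior"][c] = log(len(rows) / len(y))
        # Smoothed estimate keeps every probability strictly between 0 and 1.
        p1 = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
              for j in range(n_features)]
        model["log_p1"][c] = [log(p) for p in p1]
        model["log_p0"][c] = [log(1.0 - p) for p in p1]
    return model


def predict(model, x):
    """Return the class with the highest posterior log-score for example x."""
    def score(c):
        return model["log_prior"][c] + sum(
            model["log_p1"][c][j] if v == 1 else model["log_p0"][c][j]
            for j, v in enumerate(x))
    return max(model["classes"], key=score)


def leave_one_out_accuracy(X, y):
    """Train on all examples but one, test on the held-out one, and average."""
    correct = 0
    for i in range(len(X)):
        model = train_naive_bayes(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    # Tiny made-up dataset: attribute 0 separates the two classes.
    X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
    y = [1, 1, 1, 0, 0, 0]
    print(leave_one_out_accuracy(X, y))
```

With only 120 examples, some form of cross-validation is a natural way to compare classifiers, but note that gene filtering done on the full dataset before cross-validation can make accuracy estimates optimistic.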


Due: June 16, 2005

Analysis of the influence of network size and data size on structural learning of Bayesian networks:

Six Bayesian networks of various sizes are given.

Generate data examples from each Bayesian network.

Learn Bayesian network structures from the generated data.

Analyze the learned results according to the network size and the data size.

Classification using Bayesian networks:

A microarray dataset consisting of two classes of samples is given.

Learn Bayesian network classifiers from the dataset.

Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.