THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY
CSIT 5220: Reasoning and Decision under Uncertainty
L10: Model-Based Classification and Clustering
Nevin L. Zhang
Room 3504, phone: 2358-7015
Email: [email protected]
L10: Model-Based Classification and Clustering
Probabilistic Models (PMs) for Classification
PMs for Clustering
Classification
The problem:
Given data:
Find mapping (A1, A2, …, An) → C
Possible solutions
ANN
Decision tree (Quinlan)
…
(SVM: Continuous data)
Probabilistic Approach to Classification
Will Boss Play Tennis?
Bayesian Networks for Classification
Naïve Bayes model often has good performance in practice
Drawback of Naïve Bayes: it assumes the attributes are mutually independent given the class variable
This assumption is often violated, leading to double counting of evidence.
Fixes:
General BN classifiers
Tree augmented Naïve Bayes (TAN) models
…
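As a minimal sketch of the double-counting problem, the Python below fits a Naïve Bayes model by counting and then feeds it the same attribute twice. The toy weather data, the function names `fit_naive_bayes` and `posterior`, and the add-one smoothing are assumptions made for illustration; they are not taken from the lecture.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Estimate P(C) and P(A_i | C) from categorical data by counting.

    alpha is an add-alpha smoothing constant (an assumption of this sketch).
    """
    n_attrs = len(rows[0])
    class_counts = Counter(labels)
    # values[i] = set of values observed for attribute i
    values = [{r[i] for r in rows} for i in range(n_attrs)]
    # counts[i][c][v] = number of rows with class c and attribute i equal to v
    counts = [defaultdict(Counter) for _ in range(n_attrs)]
    for r, c in zip(rows, labels):
        for i, v in enumerate(r):
            counts[i][c][v] += 1
    prior = {c: k / len(labels) for c, k in class_counts.items()}

    def cond(i, v, c):
        # Smoothed estimate of P(A_i = v | C = c)
        return (counts[i][c][v] + alpha) / (class_counts[c] + alpha * len(values[i]))

    return prior, cond


def posterior(prior, cond, instance):
    """P(C | instance) under the Naive Bayes independence assumption."""
    scores = {}
    for c, p in prior.items():
        for i, v in enumerate(instance):
            p *= cond(i, v, c)
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


# Hypothetical toy data: one attribute (weather) and a yes/no class.
weather = ["sunny", "sunny", "sunny", "rain", "sunny", "rain", "rain", "rain"]
play = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

single = [(w,) for w in weather]    # the attribute once
double = [(w, w) for w in weather]  # the same attribute listed twice

p_single = posterior(*fit_naive_bayes(single, play), ("sunny",))["yes"]
p_double = posterior(*fit_naive_bayes(double, play), ("sunny", "sunny"))["yes"]
print(round(p_single, 3), round(p_double, 3))  # 0.667 0.8
```

Listing the attribute twice moves P(yes | sunny) from 2/3 to 0.8 even though no new information was added, because the model multiplies in the same evidence twice.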
Bayesian Networks for Classification
General BN classifier
Treat the class variable as just another variable
Learn a BN.
Classify the next instance based on the values of the variables in the Markov blanket of the class variable.
Tends to perform poorly: variables outside the Markov boundary are ignored, so not all available information is used.
Bayesian Networks for Classification
Tree-Augmented Naïve Bayes (TAN) model
Capture dependence among attributes using a tree structure.
During learning,
First learn a tree among attributes: use Chow-Liu algorithm
Special structure learning problem, easy
Add class variable and estimate parameters
Classification
arg max_c P(C=c | A1=a1, …, An=an)
BN inference
Many other methods
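For reference, the posterior used in TAN classification factorizes along the attribute tree. The notation below is an assumption of this note rather than the slide's: π(i) denotes the tree parent of attribute A_i, and for the root attribute the factor reduces to P(a_i | c).

```latex
P(C = c \mid a_1, \dots, a_n)
  \;\propto\; P(c) \prod_{i=1}^{n} P\bigl(a_i \mid a_{\pi(i)},\, c\bigr)
```

Dropping the a_{π(i)} term from every factor recovers the Naïve Bayes model.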
Chow-Liu Trees
Task: Find a tree model over observed variables that has maximum likelihood given data.
Maximized log-likelihood
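The standard form of the maximized log-likelihood of a tree model is sketched below in notation assumed for this note: D is a data set of N cases over variables X_1, …, X_n, T is the set of tree edges, and I and H are mutual information and entropy under the empirical distribution \hat P of the data.

```latex
\max_{\theta}\, \log P(D \mid T, \theta)
  \;=\; N \sum_{(i,j) \in T} I_{\hat P}(X_i; X_j)
  \;-\; N \sum_{i=1}^{n} H_{\hat P}(X_i)
```

The entropy term does not depend on T, so only the sum of mutual information over the chosen edges matters when comparing trees.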
Chow-Liu Trees
Mutual Information
The task is equivalent to finding a maximum spanning tree of the weighted, undirected graph whose nodes are the observed variables and whose edge weights are the pairwise mutual information.
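The edge weights of this graph can be computed directly from counts. The sketch below estimates the empirical mutual information I(X; Y) = Σ p(x, y) log[p(x, y) / (p(x) p(y))] for every pair of columns; the helper names `mutual_information` and `mi_graph` and the three toy columns are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    n_xy = Counter(zip(xs, ys))
    n_x = Counter(xs)
    n_y = Counter(ys)
    # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written in terms of raw counts
    return sum(
        (c / n) * log(c * n / (n_x[x] * n_y[y]))
        for (x, y), c in n_xy.items()
    )


def mi_graph(columns):
    """Weighted undirected graph: {(i, j): I(X_i; X_j)} for all pairs i < j."""
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


# Hypothetical toy columns: b copies a, while c is independent of both.
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]
c = [0, 1, 0, 1]
weights = mi_graph([a, b, c])
print(weights)
```

Here the edge (0, 1) gets weight log 2 because the two columns are identical, and the edges involving column 2 get weight 0.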
Maximum Spanning Trees
Illustration of Kruskal’s Algorithm
http://www.cs.cmu.edu/~guestrin/Class/15781/recitations/r10/11152007chowliu.pdf
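Kruskal's algorithm adapts to maximum spanning trees by scanning the edges in decreasing order of weight and keeping an edge whenever it joins two different components. A minimal sketch follows; the edge weights are made up and the function name `maximum_spanning_tree` is hypothetical.

```python
def maximum_spanning_tree(n_nodes, weights):
    """Kruskal's algorithm with edges visited from heaviest to lightest.

    weights maps an undirected edge (u, v) to its weight.
    Returns the list of chosen edges in the order they were added.
    """
    parent = list(range(n_nodes))  # union-find forest, one root per component

    def find(u):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for (u, v), _w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge connects two components: keep it
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# Hypothetical mutual-information weights on four variables.
example_weights = {(0, 1): 0.5, (0, 2): 0.1, (1, 2): 0.3, (2, 3): 0.4, (1, 3): 0.2}
tree = maximum_spanning_tree(4, example_weights)
print(tree)  # [(0, 1), (2, 3), (1, 2)]
```

The edges (1, 3) and (0, 2) are rejected because their endpoints are already connected by heavier edges.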
L10: Probabilistic Models (PMs) for Classification and Clustering
Probabilistic Models (PMs) for Classification
PMs for Clustering
A Medical Application
In medical diagnosis, a gold standard sometimes exists
Example: Lung Cancer
Symptoms: persistent cough, hemoptysis (coughing up blood), constant chest pain, shortness of breath, fatigue, etc.
Information for diagnosis: symptoms, medical history, smoking history, X-ray, sputum.
Gold standard: biopsy, the removal of a small sample of tissue for examination under a microscope by a pathologist
A Medical Application
Sometimes a gold standard does not exist
Example: Rheumatoid Arthritis (RA)
Symptoms: back pain, neck pain, joint pain, joint swelling, morning joint stiffness, etc.
Information for diagnosis: symptoms, medical history, physical exam, lab tests including a test for rheumatoid factor.
(Rheumatoid factor is an antibody found in the blood of about 80 percent of adults with RA.)
No gold standard: none of the symptoms or their combinations are clear-cut indicators of RA
The presence or absence of rheumatoid factor does not by itself determine whether one has RA.
LC Analysis of Hannover Rheumatoid Arthritis Data
Class specific probabilities
Cluster 1: “disease” free
Cluster 2: “back-pain type”
Cluster 3: “Joint type”
Cluster 4: “Severe type”
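A latent class (LC) model treats the cluster label as an unobserved class variable, with the symptoms independent given that label, and its class-specific probabilities are the values P(symptom | cluster). As a rough sketch of how such a model can be fitted, the Python below runs EM on a tiny made-up binary data set. The data, the starting values, the two-cluster setting and the small smoothing constant are all assumptions for illustration and are unrelated to the Hannover data.

```python
from math import prod


def em_latent_class(data, theta, pi, n_iter=50, eps=1e-3):
    """EM for a latent class model with binary attributes.

    theta[k][j] = P(X_j = 1 | Z = k)   (class-specific probabilities)
    pi[k]       = P(Z = k)             (cluster sizes)
    eps is a small smoothing constant keeping probabilities away from 0 and 1.
    """
    n, d, k_max = len(data), len(data[0]), len(pi)
    resp = []
    for _ in range(n_iter):
        # E-step: resp[r][k] = P(Z = k | record r) under current parameters
        resp = []
        for x in data:
            joint = [
                pi[k] * prod(theta[k][j] if x[j] else 1 - theta[k][j] for j in range(d))
                for k in range(k_max)
            ]
            z = sum(joint)
            resp.append([p / z for p in joint])
        # M-step: re-estimate parameters from the soft assignments
        nk = [sum(r[k] for r in resp) for k in range(k_max)]
        pi = [nk[k] / n for k in range(k_max)]
        theta = [
            [
                (sum(r[k] * x[j] for r, x in zip(resp, data)) + eps) / (nk[k] + 2 * eps)
                for j in range(d)
            ]
            for k in range(k_max)
        ]
    return pi, theta, resp


# Hypothetical records over three binary symptoms.
data = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 0, 1), (0, 1, 1)]
theta0 = [[0.6, 0.6, 0.4], [0.4, 0.4, 0.6]]  # assumed starting values
pi0 = [0.5, 0.5]

pi, theta, resp = em_latent_class(data, theta0, pi0)
for k, row in enumerate(theta):
    print("cluster", k, "size", round(pi[k], 2), "P(symptom=1):", [round(t, 2) for t in row])
```

Each printed row plays the role of the class-specific probabilities for one cluster; interpreting and naming the clusters is then done by inspecting which symptoms are likely in each.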