Machine Learning II: Ensembles & Co-training. CSE 573: Representing Uncertainty.
Machine Learning II: Ensembles & Co-training
CSE 573
Representing Uncertainty
© Daniel S. Weld 2
Logistics
• Reading: Ch. 13 and Ch. 14 through 14.3
© Daniel S. Weld 3
DT Learning as Search
• Nodes: decision trees
• Operators: tree refinement (sprouting the tree)
• Initial node: the smallest tree possible (a single leaf)
• Heuristic: information gain
• Goal: the best tree possible (???)
• Type of search: hill climbing
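The information-gain heuristic named above can be sketched in a few lines. The `days` table below is a made-up miniature of the tennis example, not data from the lecture:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def info_gain(examples, attr, label="play"):
    """Entropy reduction from splitting `examples` on `attr`."""
    n = len(examples)
    base = entropy([e[label] for e in examples])
    remainder = 0.0
    for v in set(e[attr] for e in examples):
        subset = [e[label] for e in examples if e[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

# Made-up miniature of the "good day for tennis?" data (not from the lecture):
days = [
    {"outlook": "sunny",    "wind": "weak",   "play": "no"},
    {"outlook": "sunny",    "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "weak",   "play": "yes"},
    {"outlook": "rain",     "wind": "weak",   "play": "yes"},
    {"outlook": "rain",     "wind": "strong", "play": "no"},
]
print(info_gain(days, "outlook"))  # the hill climber would sprout on outlook first
print(info_gain(days, "wind"))     # since splitting on wind gains less information
```

Hill climbing then repeats this comparison inside each subtree until the leaves are pure.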
Bias
• Restriction
• Preference
© Daniel S. Weld 5
Decision Tree Representation
Good day for tennis?

Outlook?
- Sunny → Humidity? (High → No; Normal → Yes)
- Overcast → Yes
- Rain → Wind? (Strong → No; Weak → Yes)

Leaves = classification; arcs = choice of value for the parent attribute.

A decision tree is equivalent to logic in disjunctive normal form:
G-Day ⇔ (Sunny ∧ Normal) ∨ Overcast ∨ (Rain ∧ Weak)
© Daniel S. Weld 6
Overfitting
[Plot: accuracy (y-axis, 0.6 to 0.9) vs. model complexity (e.g., number of nodes in the decision tree). Accuracy keeps improving on the training data but peaks and then declines on the test data.]
© Daniel S. Weld 7
Machine Learning Outline
• Supervised learning review
• Ensembles of classifiers
  - Bagging
  - Cross-validated committees
  - Boosting
  - Stacking
• Co-training
© Daniel S. Weld 8
Voting
© Daniel S. Weld 9
Ensembles of Classifiers
• Assume errors are independent (suppose each classifier has 30% error) and take a majority vote over an ensemble of 21 classifiers
• Probability that the majority is wrong = area under the binomial distribution for 11 or more classifiers in error
• If the individual error rate is 0.3, that area is 0.026: an order of magnitude improvement!

[Plot: probability (y-axis, ticks at 0.1 and 0.2) vs. number of classifiers in error (x-axis).]
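A quick numerical check of the 0.026 figure, assuming 21 classifiers that each err independently with probability 0.3:

```python
from math import comb

def majority_error(n, p):
    """P(the majority of n independent classifiers are wrong),
    where each errs independently with probability p."""
    k_min = n // 2 + 1   # smallest number of wrong votes that loses the vote
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

print(majority_error(21, 0.3))  # ~0.026, an order of magnitude below 0.3
```

With a single classifier the "ensemble" error is just p, which is a handy sanity check on the formula.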
© Daniel S. Weld 10
Constructing Ensembles: Cross-Validated Committees
• Partition the examples into k disjoint equivalence classes (folds)
• Now create k training sets
  - Each set is the union of all folds except one held-out fold
  - So each set has (k−1)/k of the original training data
• Now train a classifier on each set
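A minimal sketch of the construction, assuming the examples arrive as a plain list:

```python
def cv_committee_sets(examples, k):
    """Split examples into k disjoint folds; training set i is the
    union of all folds except fold i, so each has (k-1)/k of the data."""
    folds = [examples[i::k] for i in range(k)]
    return [[x for j, fold in enumerate(folds) if j != i for x in fold]
            for i in range(k)]

train_sets = cv_committee_sets(list(range(100)), k=5)
print(len(train_sets), len(train_sets[0]))  # 5 sets, each with 80 examples
```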
© Daniel S. Weld 11
Ensemble Construction II: Bagging
• Generate k sets of training examples
• For each set, draw m examples randomly (with replacement) from the original set of m examples
• Each training set contains about 63.2% of the distinct original examples (plus duplicates)
• Now train a classifier on each set
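A sketch of the bootstrap draw, which also lets us check the 63.2% (= 1 − 1/e) figure empirically:

```python
import random

def bootstrap_sets(examples, k, seed=0):
    """Draw k bootstrap samples: each is m draws with replacement
    from the m original examples."""
    rng = random.Random(seed)
    m = len(examples)
    return [[rng.choice(examples) for _ in range(m)] for _ in range(k)]

sets_ = bootstrap_sets(list(range(1000)), k=10)
print(len(set(sets_[0])) / 1000)  # ≈ 0.632 of the originals appear at least once
```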
© Daniel S. Weld 12
Ensemble Creation III: Boosting
• Maintain a probability distribution over the set of training examples
• Create k sets of training data iteratively; on iteration i:
  - Draw m examples randomly (like bagging), but use the probability distribution to bias the selection
  - Train classifier number i on this training set
  - Test the partial ensemble (of i classifiers) on all training examples
  - Modify the distribution: increase P of each misclassified example
• Creates harder and harder learning problems...
• "Bagging with optimized choice of examples"
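A toy sketch of the reweighting loop on 1-D data with threshold stumps. The doubling of error weights is a simplification of my own for illustration; real boosters such as AdaBoost use an error-dependent factor and also weight each classifier's vote:

```python
def stump_predict(stump, x):
    """A threshold stump: predicts `sign` at or above `thresh`, else the opposite."""
    thresh, sign = stump
    return sign if x >= thresh else -sign

def weighted_error(stump, xs, ys, dist):
    return sum(d for d, x, y in zip(dist, xs, ys) if stump_predict(stump, x) != y)

def boost(xs, ys, k):
    """Toy boosting: keep a distribution over training examples and
    upweight the examples the latest stump got wrong."""
    m = len(xs)
    dist = [1.0 / m] * m
    ensemble = []
    for _ in range(k):
        # pick the stump minimizing weighted error under the current distribution
        stump = min(((t, s) for t in xs for s in (1, -1)),
                    key=lambda c: weighted_error(c, xs, ys, dist))
        ensemble.append(stump)
        # increase P of each example the current stump misclassifies
        dist = [d * (2.0 if stump_predict(stump, x) != y else 1.0)
                for d, x, y in zip(dist, xs, ys)]
        z = sum(dist)
        dist = [d / z for d in dist]
    return ensemble

xs, ys = [0, 1, 2, 3, 4, 5], [-1, -1, -1, 1, 1, 1]
ens = boost(xs, ys, k=3)

def vote(x):
    return 1 if sum(stump_predict(s, x) for s in ens) >= 0 else -1

print([vote(x) for x in xs])  # matches ys
```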
© Daniel S. Weld 13
Ensemble Creation IV: Stacking
• Train several base learners
• Next train a meta-learner
  - It learns when the base learners are right / wrong, then arbitrates among their predictions
• Train using cross-validated committees
  - Meta-learner inputs = base-learner predictions
  - Meta-learner training examples = the 'test set' from cross-validation
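A sketch of that recipe. The two base learners and the lookup-table meta-learner are deliberately simple toys of my own; the point is the structure: the meta-learner trains on predictions made for held-out folds, never on a base learner's own training data:

```python
from collections import Counter, defaultdict

def learn_stump(train):
    """Base learner 1: threshold halfway between the two classes."""
    t = (max(x for x, y in train if y == 0) +
         min(x for x, y in train if y == 1)) / 2
    return lambda x: 1 if x >= t else 0

def learn_majority(train):
    """Base learner 2 (deliberately weak): always the most common label."""
    maj = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: maj

def table_meta(meta_examples):
    """Meta-learner: for each tuple of base predictions, output the label
    most often correct when the bases predicted that tuple."""
    seen = defaultdict(Counter)
    for preds, y in meta_examples:
        seen[preds][y] += 1
    return lambda preds: (seen[preds].most_common(1)[0][0]
                          if preds in seen else preds[0])

def stack(train, base_learners, meta_learner, k=5):
    """Stacking: the meta-learner trains on base predictions for held-out folds."""
    folds = [train[i::k] for i in range(k)]
    meta_examples = []
    for i in range(k):
        rest = [ex for j, f in enumerate(folds) if j != i for ex in f]
        models = [learn(rest) for learn in base_learners]
        meta_examples += [(tuple(m(x) for m in models), y) for x, y in folds[i]]
    final = [learn(train) for learn in base_learners]  # retrain on all data
    arbiter = meta_learner(meta_examples)
    return lambda x: arbiter(tuple(m(x) for m in final))

data = [(x, 1 if x >= 4 else 0) for x in range(10)]
stacked = stack(data, [learn_stump, learn_majority], table_meta)
print([stacked(x) for x, _ in data] == [y for _, y in data])  # True
```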
© Daniel S. Weld 14
Machine Learning Outline
• Supervised learning review
• Ensembles of classifiers
  - Bagging
  - Cross-validated committees
  - Boosting
  - Stacking
• Co-training
© Daniel S. Weld 15
Co-Training Motivation
• Learning methods need labeled data
  - Lots of <x, f(x)> pairs
  - Hard to get... (who wants to label data?)
• But unlabeled data is usually plentiful... could we use it instead?
© Daniel S. Weld 16
Co-training

Suppose:
• We have a little labeled data and lots of unlabeled data
• Each instance has two parts: x = [x1, x2], with x1 and x2 conditionally independent given f(x)
• Each half can be used to classify the instance: there exist f1, f2 such that f1(x1) ≈ f2(x2) ≈ f(x)
• Both f1 and f2 are learnable: f1 ∈ H1, f2 ∈ H2, with learning algorithms A1, A2
© Daniel S. Weld 17
Without Co-training: f1(x1) ≈ f2(x2) ≈ f(x)

[Diagram: a few labeled instances <[x1, x2], f()> feed both learners; A1 learns f1 from x1 and A2 learns f2 from x2. Combining f1 and f2 into an ensemble f' is bad: it never uses the unlabeled instances!]
© Daniel S. Weld 18
Co-training: f1(x1) ≈ f2(x2) ≈ f(x)

[Diagram: A1 learns f1 from the x1 view of a few labeled instances <[x1, x2], f()>. f1 then labels the unlabeled instances, yielding lots of labeled instances <[x1, x2], f1(x1)>, from which A2 learns hypothesis f2 using the x2 view.]
© Daniel S. Weld 19
Observations
• Can apply A1 to generate as much training data as one wants
  - If x1 is conditionally independent of x2 given f(x), then the errors in the labels produced by A1 will look like random noise to A2!
• Thus there is no limit to the quality of the hypothesis A2 can make
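A toy sketch of the loop the diagrams describe, under assumed details (threshold learners on numeric views, fixed batch sizes) that are illustrative and not from the lecture:

```python
def learn_threshold(view_examples):
    """Toy one-view learner: threshold halfway between the two classes."""
    pos = [v for v, y in view_examples if y == 1]
    neg = [v for v, y in view_examples if y == 0]
    t = (min(pos) + max(neg)) / 2
    return lambda v: 1 if v >= t else 0

def co_train(labeled, unlabeled, rounds=3, batch=2):
    """Sketch of co-training: each classifier is trained on its own view,
    then labels a few unlabeled instances, which the *other* view's
    learner sees as training data in the next round."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        f1 = learn_threshold([(x[0], y) for x, y in labeled])
        f2 = learn_threshold([(x[1], y) for x, y in labeled])
        grab, unlabeled = unlabeled[:batch], unlabeled[batch:]
        labeled += [(x, f1(x[0])) for x in grab]  # f1 labels data for f2
        grab, unlabeled = unlabeled[:batch], unlabeled[batch:]
        labeled += [(x, f2(x[1])) for x in grab]  # f2 labels data for f1
    return f1, f2

# x = [x1, x2]: two redundant views, both positive exactly when f(x) = 1
labeled = [((-2.0, -3.0), 0), ((2.0, 3.0), 1)]
unlabeled = [(-1.0, -1.5), (1.0, 1.2), (-3.0, -2.0), (3.0, 4.0),
             (-0.5, -0.8), (0.5, 0.9), (-4.0, -5.0), (4.0, 5.0),
             (-1.2, -0.6), (1.5, 2.5), (-2.5, -3.5), (2.2, 1.8)]
f1, f2 = co_train(labeled, unlabeled)
print(f1(5.0), f1(-5.0), f2(5.0), f2(-5.0))  # 1 0 1 0
```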
© Daniel S. Weld 20
Co-training: f1(x1) ≈ f2(x2) ≈ f(x)

[Diagram, continued: the process iterates. f2 now labels lots of unlabeled instances for A1, and f1 labels lots for A2, so each learner keeps improving on data labeled by the other view.]
© Daniel S. Weld 21
It Really Works!
• Learning to classify web pages as course pages
  - x1 = bag of words on the page
  - x2 = bag of words from all anchors pointing to the page
• Naïve Bayes classifiers; 12 labeled pages, 1039 unlabeled

[Plot: percentage error.]
Representing Uncertainty
© Daniel S. Weld 23
Many Techniques Developed
• Fuzzy logic
• Certainty factors
• Non-monotonic logic
• Probability

• Only one has stood the test of time!
© Daniel S. Weld 24
Aspects of Uncertainty
• Suppose you have a flight at 12 noon
• When should you leave for SEATAC?
  - What are the traffic conditions?
  - How crowded is security?
• Leaving 18 hours early may get you there, but... ?
© Daniel S. Weld 25
Decision Theory = Probability + Utility Theory
| Min before noon | P(arrive-in-time) |
|---|---|
| 20 min | 0.05 |
| 30 min | 0.25 |
| 45 min | 0.50 |
| 60 min | 0.75 |
| 120 min | 0.98 |
| 1080 min | 0.99 |

Which is best? Depends on your preferences. Utility theory: representing & reasoning about preferences.
© Daniel S. Weld 26
What Is Probability?
• Probability: a calculus for dealing with nondeterminism and uncertainty (cf. logic)
• Probabilistic model: says how often we expect different things to occur
© Daniel S. Weld 27
What Is Statistics?
• Statistics 1: Describing data
• Statistics 2: Inferring probabilistic models from data
  - Structure
  - Parameters
© Daniel S. Weld 28
Why Should You Care?
• The world is full of uncertainty
  - Logic is not enough
  - Computers need to be able to handle uncertainty
• Probability: a new foundation for AI (& CS!)
• Massive amounts of data around today
  - Statistics and CS are both about data
  - Statistics lets us summarize and understand it
  - Statistics is the basis for most learning
• Statistics lets data do our work for us
© Daniel S. Weld 29
Outline
• Basic notions
  - Atomic events, probabilities, joint distribution
  - Inference by enumeration
  - Independence & conditional independence
  - Bayes' rule
• Bayesian networks
• Statistical learning
• Dynamic Bayesian networks (DBNs)
• Markov decision processes (MDPs)
© Daniel S. Weld 30
Prop. Logic vs. Probability
| Propositional logic | Probability |
|---|---|
| Symbol: Q, R, ... | Random variable: Q, ... |
| Boolean values: T, F | Domain: you specify, e.g. {heads, tails} or [1, 6] |
| State of the world: assignment to Q, R, ..., Z | Atomic event: complete specification of a world, Q, ..., Z (atomic events are mutually exclusive and exhaustive) |
| | Prior (aka unconditional) probability: P(Q) |
| | Joint distribution: probability of every atomic event |
Types of Random Variables
Axioms of Probability Theory
• Just 3 are enough to build the entire theory!
  1. All probabilities are between 0 and 1: 0 ≤ P(A) ≤ 1
  2. P(true) = 1 and P(false) = 0
  3. Probability of a disjunction of events: P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

[Venn diagram: P(A) + P(B) counts the overlap A ∧ B twice, so it is subtracted once.]
Prior and Joint Probability
We will see later how any question can be answered by the joint distribution
Conditional (or Posterior) Probability
• Conditional or posterior probabilities, e.g., P(cavity | toothache) = 0.8
  - i.e., given that Toothache is true (and that is all I know)
• Notation for conditional distributions: P(Cavity | Toothache) = a 2-element vector of 2-element vectors (2 P values when Toothache is true and 2 when it is false)
• If we know more, e.g., cavity is also given, then we have P(cavity | toothache, cavity) = 1
• New evidence may be irrelevant, allowing simplification: P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8
Conditional Probability
• P(A | B) is the probability of A given B
• Assumes that B is the only information known
• Defined as: P(A | B) = P(A ∧ B) / P(B)

[Venn diagram: A, B, and their intersection A ∧ B.]
Dilemma at the Dentist’s
What is the probability of a cavity given a toothache?
What is the probability of a cavity given that the probe catches?
© Daniel S. Weld 37
Inference by Enumeration
P(toothache) = .108 + .012 + .016 + .064 = .20, or 20%
This process is called "marginalization."
© Daniel S. Weld 38
Inference by Enumeration
P(toothache ∨ cavity) = .20 + .072 + .008 = .28
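These sums can be reproduced mechanically from the full joint. The table below uses the standard dentist-domain numbers that the slide's .108/.012/.016/.064 entries come from:

```python
# Full joint over (Toothache, Cavity, Catch)
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def p(event):
    """Marginalization: sum the joint over every world satisfying `event`."""
    return sum(pr for world, pr in joint.items() if event(*world))

print(p(lambda t, c, k: t))       # P(toothache) = 0.20
print(p(lambda t, c, k: t or c))  # P(toothache ∨ cavity) = 0.28
```

Conditional queries work the same way, e.g. P(cavity | toothache) = p(cavity ∧ toothache) / p(toothache).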
© Daniel S. Weld 39
Inference by Enumeration
Problems with Enumeration
• Worst-case time: O(d^n)
  - where d = max arity of the random variables (e.g., d = 2 for Boolean T/F)
  - and n = the number of random variables
• Space complexity is also O(d^n): the size of the joint distribution
• Problem: it is hard or impossible to estimate all O(d^n) entries for large problems
© Daniel S. Weld 41
Independence
• A and B are independent iff:
  P(A | B) = P(A)
  P(B | A) = P(B)
  (These two constraints are logically equivalent.)
• Therefore, if A and B are independent:
  P(A | B) = P(A ∧ B) / P(B) = P(A)
  P(A ∧ B) = P(A) P(B)
© Daniel S. Weld 42
Independence
[Venn diagram illustrating independent events A and B.]
© Daniel S. Weld 43
Independence
Complete independence is powerful but rare.
What to do if it doesn't hold?
© Daniel S. Weld 44
Conditional Independence

[Venn diagram: events A and B with a small overlap.]

A & B are not independent, since P(A | B) < P(A)
© Daniel S. Weld 45
Conditional Independence

[Venn diagram: a third event C overlapping both A and B.]

But: A & B are made independent by C: P(A | C) = P(A | B, C)
© Daniel S. Weld 46
Conditional Independence
Instead of the 7 independent entries of the full joint over three Boolean variables (2³ − 1), we only need 5: one for P(Cavity), two for P(Toothache | Cavity), and two for P(Catch | Cavity).
© Daniel S. Weld 47
Conditional Independence II
P(catch | toothache, cavity) = P(catch | cavity)
P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Why only 5 entries in the table?
© Daniel S. Weld 48
Power of Cond. Independence
• Often, using conditional independence reduces the storage complexity of the joint distribution from exponential to linear!!
• Conditional independence is the most basic & robust form of knowledge about uncertain environments.
Next Up…
• Bayes' rule
• Bayesian inference
• Bayesian networks

Bayes rules!
© Daniel S. Weld 50
Bayes Rule
Bayes rule: P(H | E) = P(E | H) · P(H) / P(E)

Simple proof from the definition of conditional probability:

1. P(H | E) = P(H ∧ E) / P(E)           (def. of cond. prob.)
2. P(E | H) = P(H ∧ E) / P(H)           (def. of cond. prob.)
3. P(H ∧ E) = P(E | H) · P(H)           (multiply line 2 by P(H))
4. P(H | E) = P(E | H) · P(H) / P(E)    (substitute line 3 into line 1)   QED
© Daniel S. Weld 51
Use to Compute Diagnostic Probability from Causal Probability
E.g., let M be meningitis and S be a stiff neck, with
P(M) = 0.0001, P(S) = 0.1, P(S | M) = 0.8

P(M | S) = P(S | M) · P(M) / P(S) = 0.8 × 0.0001 / 0.1 = 0.0008
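The diagnostic computation is a one-liner once Bayes rule is written down:

```python
def bayes(p_e_given_h, p_h, p_e):
    """P(H | E) = P(E | H) · P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

# Meningitis example from the slide: M = meningitis, S = stiff neck
p_m_given_s = bayes(p_e_given_h=0.8, p_h=0.0001, p_e=0.1)
print(p_m_given_s)  # 0.0008: a stiff neck is still very weak evidence for meningitis
```

Note how the tiny prior P(M) dominates the strong causal probability P(S | M).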
© Daniel S. Weld 52
Bayes’ Rule & Cond. Independence
© Daniel S. Weld 53
Bayes Nets
• In general, a joint distribution P over a set of variables (X1 × ... × Xn) requires exponential space for representation & inference
• BNs provide a graphical representation of the conditional independence relations in P
  - usually quite compact
  - requires assessment of fewer parameters, and those are quite natural (e.g., causal)
  - efficient (usually) inference: query answering and belief update
© Daniel S. Weld 55
An Example Bayes Net
[Network: Earthquake → Alarm ← Burglary; Earthquake → Radio; Alarm → Nbr1Calls and Nbr2Calls]

P(B=t) = 0.05, P(B=f) = 0.95

P(A=t | E, B) (complement in parentheses):
  e, b: 0.9 (0.1)
  e, ¬b: 0.2 (0.8)
  ¬e, b: 0.85 (0.15)
  ¬e, ¬b: 0.01 (0.99)
© Daniel S. Weld 56
Earthquake Example (con’t)
• If I know Alarm, no other evidence influences my degree of belief in Nbr1Calls
  - P(N1 | N2, A, E, B) = P(N1 | A)
  - also: P(N2 | N1, A, E, B) = P(N2 | A) and P(E | B) = P(E)
• By the chain rule we have
  P(N1, N2, A, E, B) = P(N1 | N2, A, E, B) · P(N2 | A, E, B) · P(A | E, B) · P(E | B) · P(B)
                     = P(N1 | A) · P(N2 | A) · P(A | E, B) · P(E) · P(B)
• The full joint requires only 10 parameters (cf. 32)

[Network: Earthquake → Alarm ← Burglary; Earthquake → Radio; Alarm → Nbr1Calls and Nbr2Calls]
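The factorization can be checked directly. P(B) and the alarm CPT are taken from the slides; P(E) and the neighbor-call probabilities below are assumed placeholders, since the lecture does not give them:

```python
from itertools import product

P_B_t = 0.05                      # P(Burglary = t), from the slide's CPT
P_E_t = 0.1                       # P(Earthquake = t): assumed placeholder
P_A_t = {(True, True): 0.9,       # P(Alarm = t | E, B), rows read as (e, b)
         (True, False): 0.2, (False, True): 0.85, (False, False): 0.01}
P_N_t = {True: 0.9, False: 0.05}  # P(NbrCalls = t | A): assumed, same for both

def bern(p_true, val):
    """Probability that a Boolean variable with P(true) = p_true takes `val`."""
    return p_true if val else 1 - p_true

def joint(n1, n2, a, e, b):
    """P(N1,N2,A,E,B) = P(N1|A) · P(N2|A) · P(A|E,B) · P(E) · P(B)."""
    return (bern(P_N_t[a], n1) * bern(P_N_t[a], n2) *
            bern(P_A_t[(e, b)], a) * bern(P_E_t, e) * bern(P_B_t, b))

# 1 + 1 + 4 + 2 + 2 = 10 parameters determine all 32 joint entries:
total = sum(joint(*world) for world in product([True, False], repeat=5))
print(total)  # 1.0 (up to rounding)
```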
© Daniel S. Weld 57
BNs: Qualitative Structure
• The graphical structure of a BN reflects conditional independence among its variables
• Each variable X is a node in the DAG
• Edges denote direct probabilistic influence
  - usually interpreted causally
  - the parents of X are denoted Par(X)
• X is conditionally independent of all non-descendants given its parents
  - a graphical test (the "Markov blanket") exists for more general independence
© Daniel S. Weld 58
Given Parents, X is Independent of Non-Descendants
© Daniel S. Weld 59
For Example
[Network: Earthquake → Alarm ← Burglary; Earthquake → Radio; Alarm → Nbr1Calls and Nbr2Calls]
© Daniel S. Weld 60
Given Markov Blanket, X is Independent of All Other Nodes
MB(X) = Par(X) ∪ Children(X) ∪ Par(Children(X))
© Daniel S. Weld 61
Conditional Probability Tables
[Network: Earthquake → Alarm ← Burglary; Earthquake → Radio; Alarm → Nbr1Calls and Nbr2Calls]

P(B=t) = 0.05, P(B=f) = 0.95

P(A=t | E, B) (complement in parentheses):
  e, b: 0.9 (0.1)
  e, ¬b: 0.2 (0.8)
  ¬e, b: 0.85 (0.15)
  ¬e, ¬b: 0.01 (0.99)
© Daniel S. Weld 62
Conditional Probability Tables
• For a complete specification of the joint distribution, quantify the BN
• For each variable X, specify the CPT: P(X | Par(X))
  - the number of parameters is locally exponential in |Par(X)|
• If X1, X2, ..., Xn is any topological sort of the network, then we are assured:
  P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, ..., X1) · ... · P(X2 | X1) · P(X1)
                       = P(Xn | Par(Xn)) · P(Xn-1 | Par(Xn-1)) · ... · P(X1)