Solving problems using propositional logic
• Need to write what you know as propositional formulas
• Theorem proving will then tell you whether a given new sentence will hold given what you know
• Three kinds of queries
  – Is my knowledge base consistent? (i.e., is there at least one world where everything I know is true?) Satisfiability
  – Is the sentence S entailed by my knowledge base? (i.e., is it true in every world where my knowledge base is true?)
  – Is the sentence S consistent with (possibly true given) my knowledge base? (i.e., is S true in at least one of the worlds where my knowledge base holds?) S is consistent if ~S is not entailed
• But cannot differentiate between degrees of likelihood among possible sentences
Example
• Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John & Mary to call him if they hear the alarm. This way he can come home if there is a burglary. Los Angeles is also earthquake-prone. The alarm goes off when there is an earthquake.
Burglary => Alarm
Earth-Quake => Alarm
Alarm => John-calls
Alarm => Mary-calls
If there is a burglary, will Mary call? Check whether KB & B |= M
If Mary didn't call, is it possible that Burglary occurred? Check that KB & ~M doesn't entail ~B
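To make these queries concrete, here is a minimal sketch (not from the lecture) that checks them by truth-table enumeration over the five propositions; the helper names are illustrative.

```python
from itertools import product

def kb(B, E, A, J, M):
    # Burglary => Alarm, Earth-Quake => Alarm, Alarm => John-calls, Alarm => Mary-calls
    return (not B or A) and (not E or A) and (not A or J) and (not A or M)

def entails(premise, conclusion):
    """KB & premise |= conclusion iff the conclusion holds in every world where both hold."""
    for world in product([True, False], repeat=5):
        if kb(*world) and premise(*world) and not conclusion(*world):
            return False
    return True

print(entails(lambda B, E, A, J, M: B, lambda B, E, A, J, M: M))          # True: KB & B |= M
print(entails(lambda B, E, A, J, M: not M, lambda B, E, A, J, M: not B))  # True: with these idealized rules, ~M entails ~B
```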
Example (Real)
• Pearl lives in Los Angeles. It is a high-crime area. Pearl installed a burglar alarm. He asked his neighbors John & Mary to call him if they hear the alarm. This way he can come home if there is a burglary. Los Angeles is also earthquake-prone. The alarm goes off when there is an earthquake.
• Pearl lives in the real world, where (1) burglars can sometimes disable alarms, (2) some earthquakes may be too slight to cause the alarm, (3) even in Los Angeles, burglaries are more likely than earthquakes, (4) John and Mary both have their own lives and may not always call when the alarm goes off, (5) between John and Mary, John is more of a slacker than Mary, and (6) John and Mary may call even without the alarm going off.
Burglary => Alarm
Earth-Quake => Alarm
Alarm => John-calls
Alarm => Mary-calls
If there is a burglary, will Mary call? Check whether KB & B |= M
If Mary didn't call, is it possible that Burglary occurred? Check that KB & ~M doesn't entail ~B
John already called. If Mary also calls, is it more likely that Burglary occurred?
You now also hear on the TV that there was an earthquake. Is Burglary more or less likely now?
How do we handle Real Pearl?
• Omniscient & Eager way:
  – Model everything!
  – E.g., model exactly the conditions under which John will call
    • He shouldn't be listening to loud music, he hasn't gone on an errand, he didn't recently have a tiff with Pearl, etc., etc.
    • A & c1 & c2 & c3 & ... & cn => J (also, the exceptions may have interactions, e.g., c1 & c5 => ~c9)
• Ignorant (non-omniscient) and Lazy (non-omnipotent) way:
  – Model the likelihood
  – In 85% of the worlds where there was an alarm, John will actually call
  – How do we do this?
    • Non-monotonic logics
    • "certainty factors"
    • "fuzzy logic"
    • "probability" theory?
Qualification and Ramification problems make the "model everything" approach an infeasible enterprise (cf. the "potato in the tail-pipe" example).
Non-monotonic (default) logic
• Propositional calculus (as well as the first-order logic we shall discuss later) is monotonic, in that once you prove a fact F to be true, no amount of additional knowledge can allow us to disprove F.
• But, in the real world, we jump to conclusions by default, and revise them on additional evidence
  – Consider the way the truth of the statement "F: Tweety Flies" is revised by us when we are given facts in sequence:
    1. Tweety is a bird (F)
    2. Tweety is an Ostrich (~F)
    3. Tweety is a magical Ostrich (F)
    4. Tweety was cursed recently (~F)
    5. Tweety was able to get rid of the curse (F)
• How can we make logic exhibit this sort of "defeasible" (aka defeatable) conclusions?
  – Many ideas, with one being negation as failure
  – Let the rule about birds be Bird & ~abnormal => Fly
    • The "abnormal" predicate is treated specially: if we can't prove abnormal, we can assume ~abnormal is true
    • (Note that in normal logic, failure to prove a fact F doesn't allow us to assume that ~F is true, since F may hold in some models and not in others.)
  – The non-monotonic logic enterprise involves (1) providing clean semantics for this type of reasoning and (2) making defeasible inference efficient
Certainty Factors
• Associate numbers with each of the facts/axioms
• When you do derivations, compute the c.f. of the results in terms of the c.f. of the constituents ("truth functional")
• Problem: circular reasoning because of mixed causal/diagnostic directions
  – Raining => Grass-wet 0.9
  – Grass-wet => Raining 0.7
  – If you know grass-wet with 0.4, then we know raining, which makes grass more wet, which ...
Fuzzy Logic vs. Prob. Prop. Logic
• Fuzzy Logic assumes that the world is made of statements that have different grades of truth
  – Recall the puppy example
• Fuzzy Logic is "truth functional", i.e., it assumes that the truth value of a sentence can be established in terms of the truth values only of the constituent elements of that sentence.
• PPL assumes that the world is made up of statements that are either true or false
• PPL is truth functional for "truth value in a given world" but not truth functional for entailment status.
Prop. Logic vs. Prob. Prop. Logic
• Model theory for logic
  – Set of all worlds where the formula/KB holds
  – Each world is either a model or not
• Proof theory/inference
  – Rather than enumerate all models, write KB sentences that constrain the models
• Model theory for prob. logic
  – For each world, give the probability p that the formula holds in that world
    • p(w) = 0 means that world is definitely not a model
    • Otherwise, 0 < p(w) <= 1
    • Sum of p(w) = 1
    • AKA Joint Probability Distribution: 2^n - 1 numbers
• Proof theory
  – Statements on subsets of propositions
    • E.g., P(A) = 0.5; P(B|A) = 0.7, etc.
    • (Conditional) independences ...
  – These constrain the joint probability distribution
Easy Special Cases
• If there are no relations between the propositions (i.e., they can take values independently of each other)
  – Then the joint probability distribution can be specified in terms of the probabilities of each proposition being true
  – Just n numbers instead of 2^n
• If, in addition, each proposition is equally likely to be true or false,
  – Then the joint probability distribution can be specified without giving any numbers!
    • All worlds are equally probable! If there are n propositions, each world will be 1/2^n probable
  – The probability of any propositional conjunction with m (< n) propositions will be 1/2^m
Will we always need 2^n numbers?
• If every pair of variables is independent of each other, then
  – P(x1, x2, ..., xn) = P(xn) * P(xn-1) * ... * P(x1)
  – Need just n numbers!
  – But if our world were that simple, it would also be very uninteresting & uncontrollable (nothing is correlated with anything else!)
• We need 2^n numbers if every subset of our n variables is correlated together
  – P(x1, x2, ..., xn) = P(xn|x1 ... xn-1) * P(xn-1|x1 ... xn-2) * ... * P(x1)
  – But that is too pessimistic an assumption about the world
    • If our world were so interconnected, we would have been dead long back ...
• A more realistic middle ground is that interactions between variables are contained to regions
  – e.g., the "school variables" and the "home variables" interact only loosely (are independent for most practical purposes)
  – We will wind up needing O(2^k) numbers (k << n)
Probabilistic Calculus to the Rescue
Suppose we know the likelihood of each of the (propositional) worlds (aka the Joint Probability Distribution).
Then we can use standard rules of probability to compute the likelihood of all queries (as I will remind you).
So, the Joint Probability Distribution is all that you ever need!
In the case of the Pearl example, we just need the joint probability distribution over B, E, A, J, M (32 numbers). In general, 2^n separate numbers (which should add up to 1).
If the Joint Distribution is sufficient for reasoning, what is domain knowledge supposed to help us with?
-- Answer: Indirectly, by helping us specify the joint probability distribution with fewer than 2^n numbers
-- The local relations between propositions can be seen as "constraining" the form the joint probability distribution can take!
Burglary => Alarm
Earth-Quake => Alarm
Alarm => John-calls
Alarm => Mary-calls
Only 10 (instead of 32) numbers to specify!
Topology encodes the conditional independence assertions
Propositional Probabilistic Logic
If you know the full joint, you can answer ANY query.

         TA      ~TA
CA       0.04    0.06
~CA      0.01    0.89
P(CA & TA) =
P(CA) =
P(TA) =
P(CA V TA) =
P(CA|~TA) =
P(~TA|CA) =
Check whether P(CA|~TA) = P(~TA|CA) * P(CA) / P(~TA)
Computing Prob. Queries Given a Joint Distribution
         TA      ~TA
CA       0.04    0.06
~CA      0.01    0.89
P(CA & TA) = 0.04
P(CA) = 0.04+0.06 = 0.1 (marginalizing over TA)
P(TA) = 0.04+0.01= 0.05
P(CA V TA) = P(CA) + P(TA) – P(CA&TA) = 0.1+0.05-0.04 = 0.11
P(CA|~TA) = P(CA&~TA)/P(~TA) = 0.06/(0.06+.89) = .06/.95=.0631
Think of this as analogous to entailment by truth-table enumeration!
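As an illustration, here is a small sketch (mine, not the lecture's) that reproduces the CA/TA computations above directly from the joint table:

```python
joint = {("CA", "TA"): 0.04, ("CA", "~TA"): 0.06,
         ("~CA", "TA"): 0.01, ("~CA", "~TA"): 0.89}

p_ca = sum(p for (ca, ta), p in joint.items() if ca == "CA")   # marginalize over TA -> 0.1
p_ta = sum(p for (ca, ta), p in joint.items() if ta == "TA")   # -> 0.05
p_ca_and_ta = joint[("CA", "TA")]                              # -> 0.04
p_ca_or_ta = p_ca + p_ta - p_ca_and_ta                         # -> 0.11
p_ca_given_not_ta = joint[("CA", "~TA")] / (joint[("CA", "~TA")] + joint[("~CA", "~TA")])  # -> 0.0631...
print(p_ca, p_ta, p_ca_and_ta, p_ca_or_ta, p_ca_given_not_ta)
```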
If B=>A then P(A|B) = ? P(B|~A) = ? P(B|A) = ?
The material in this slide was presented on the white board
Most useful probabilistic reasoning involves computing posterior distributions.

[Figure: probability plotted against variable values, showing the posterior sharpening from P(CA) to P(CA|TA) to P(CA|TA; Catch) as evidence accumulates]
Important: Computing posterior distribution is inference; not learning
P(CA|TA;Catch;Romney-won)?
CONDITIONAL PROBABILITIES
Non-monotonicity w.r.t. evidence: P(A|B) can be either higher than, lower than, or equal to P(A)
The material in this slide was presented on the white board
Let A be Anthrax and Rn be Runny Nose: P(A|Rn) = P(Rn|A) P(A) / P(Rn)
Get by with easier to assess numbers
Generalized Bayes rule:
P(A|B,e) = P(B|A,e) P(A|e) / P(B|e)
Think of this as analogous to inference rules (like modus-ponens)
The material in this slide was presented on the white board
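A tiny numerical sketch of the point (the numbers below are made up purely for illustration): assess the causal quantities and the priors, and use Bayes rule to get the diagnostic probability.

```python
p_rn_given_a = 0.9     # causal/likelihood: P(Rn | A), relatively easy to assess
p_a = 0.0001           # prior P(A)
p_rn = 0.2             # evidence probability P(Rn)

p_a_given_rn = p_rn_given_a * p_a / p_rn   # Bayes rule: P(A|Rn) = P(Rn|A) P(A) / P(Rn)
print(p_a_given_rn)                        # 0.00045: a runny nose barely raises the probability of anthrax
```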
Relative ease/utility of assessing various types of probabilities
• The joint distribution requires us to assess probabilities of the type P(x1, ~x2, x3, ..., ~xn)
  – This means we have to look at all entities in the world and see which fraction of them have x1, ~x2, x3, ..., ~xn true
  – A difficult experiment to set up ...
• Conditional probabilities of the type P(A|B) are relatively easier to assess
  – You just need to look at the set of entities having B true, and look at the fraction of them that also have A true
  – Eventually, they too can get baroque: P(x1, ~x2, ..., xm | y1 ... yn)
• Among the conditional probabilities, causal probabilities of the form P(effect|cause) are easier to assess than diagnostic probabilities of the form P(cause|effect)
  – Causal probabilities tend to be more stable compared to diagnostic probabilities
  – (For example, a textbook in dentistry can publish P(TA|Cavity) and hope that it will hold in a variety of places. In contrast, P(Cavity|TA) may depend on other fortuitous factors; e.g., in areas where people tend to eat a lot of ice cream, many toothaches may be prevalent, and few of them may actually be due to cavities.)
"Doc, Doc, I have flu. Can you tell if I have a runny nose?"
The material in this slide was presented on the white board
What happens if there are multiple symptoms…?
A patient walks in and complains of a toothache. You assess P(Cavity|Toothache).
Now you try to probe the patient's mouth with that steel thingie, and it catches ...
How do we update our belief in Cavity?
P(Cavity|TA, Catch) = P(TA, Catch|Cavity) * P(Cavity) / P(TA, Catch)
                    = α P(TA, Catch|Cavity) * P(Cavity)
Need to know P(TA, Catch|Cavity)! If there are n evidence variables, we will need 2^n probabilities!
Conditional independence to the rescue: suppose P(TA, Catch|Cavity) = P(TA|Cavity) * P(Catch|Cavity)
The material in this slide was presented on the white board
Local Semantics: a node is independent of its non-descendants given its parents. This gives the global semantics, i.e., the full joint.
Put variables in reverse topological sort; apply the chain rule:
P(J, M, A, ~B, ~E) = P(J|M,A,~B,~E) * P(M|A,~B,~E) * P(A|~B,~E) * P(~B|~E) * P(~E)
                   = P(J|A) * P(M|A) * P(A|~B,~E) * P(~B) * P(~E)   [by the conditional independence inherent in the Bayes net]
                   = 0.9 * 0.7 * 0.001 * 0.999 * 0.998
                   ≈ 0.000628
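For concreteness, a minimal sketch of the same computation as code, with the CPT entries above treated as given constants:

```python
# CPT lookups for the alarm network (values as read off the computation above)
p_j_given_a = 0.9
p_m_given_a = 0.7
p_a_given_not_b_not_e = 0.001
p_not_b = 0.999
p_not_e = 0.998

# Joint entry via the factored form: P(J, M, A, ~B, ~E)
p = p_j_given_a * p_m_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(p)  # ~0.000628
```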
Conditional Independence Assertions
• We write X || Y | Z to say that the set of variables X is conditionally independent of the set of variables Y given evidence on the set of variables Z (where X,Y,Z are subsets of the set of all random variables in the domain model)
• Inference can exploit conditional independence assertions. Specifically, X || Y | Z implies
  • P(X & Y|Z) = P(X|Z) * P(Y|Z)
  • P(X|Y,Z) = P(X|Z)
  • P(Y|X,Z) = P(Y|Z)
  – If A||B|C then P(A,B,C) = P(A|B,C)P(B,C) = P(A|B,C)P(B|C)P(C) = P(A|C)P(B|C)P(C) (can get by with 1+2+2 = 5 numbers instead of 8)
Why not write down all conditional independence assertions that hold in a domain?
Cond. Indep. Assertions (Contd)
• Idea: Why not write down all conditional independence assertions (CIAs) (X || Y | Z) that hold in a domain?
• Problem: There can be exponentially many conditional independence assertions that hold in a domain (recall that X, Y, and Z are all subsets of the domain variables).
• Brilliant idea: Maybe we should implicitly specify the CIAs by writing down the "local dependencies" between variables using a graphical model
  – A Bayes Network is a way of doing just this.
• A Bayes Net is a Directed Acyclic Graph whose nodes are random variables, and the immediate dependencies between variables are represented by directed arcs
  – The topology of a Bayes network shows the inter-variable dependencies. Given the topology, there is a way of checking whether any conditional independence assertion holds in the network (the Bayes Ball algorithm and the D-Sep idea)
Topological Semantics
• Independence from non-descendants holds given just the parents
• Independence from every node holds given the Markov blanket
These two conditions are equivalent. Many other conditional independence assertions follow from these.
Markov Blanket = Parents; Children; Children's other parents
CIA implicit in Bayes Nets
• So, what conditional independence assumptions are implicit in Bayes nets?
  – Local Markov Assumption:
    • A node N is independent of its non-descendants (including ancestors) given its immediate parents. (So if P are the immediate parents of N, and A is a subset of the ancestors and other non-descendants, then {N} || A | P.)
    • (Equivalently) A node N is independent of all other nodes given its Markov blanket (parents, children, children's parents)
  – Given this assumption, many other conditional independencies follow. For a full answer, we need to appeal to the D-Sep condition and/or Bayes Ball reachability
[Figure: the alarm network rebuilt by introducing the variables in a non-causal order (e.g., MaryCalls, JohnCalls, Alarm, Burglary, Earthquake); is P(A|J,M) = P(A)?]
How many probabilities are needed? 13 for the new ordering; 10 for the old. Is this the worst? No: in the worst ordering the CPT sizes are 1, 2, 4, 8, and 16, i.e., 31 numbers in total!
Introduce variables in the causal order. Easy when you know the causality of the domain; hard otherwise ...
Bayesian (Personal) Probabilities
• Continuing the "bad friends" question above: suppose a second friend comes along and says that he can give you the conditional probabilities that you want to complete the specification of your Bayes net. You ask him for a CPT entry, and pat comes a response: some number between 0 and 1. This friend is well meaning, but you are worried that the numbers he is giving may lead to some sort of inconsistent joint probability distribution. Is your worry justified (i.e., can your friend give you numbers that lead to an inconsistency)?
  (To understand "inconsistency", consider someone who insists on giving you P(A), P(B), P(A&B) as well as P(AVB), and the numbers wind up not satisfying P(AVB) = P(A) + P(B) - P(A&B) [or, alternately, they insist on giving you P(A|B), P(B|A), P(A) and P(B), and the four numbers don't satisfy Bayes rule].)
• Answer: No. As long as we only ask the friend to fill up the CPTs in the Bayes network, there is no way the numbers won't make up a consistent joint probability distribution
  – This should be seen as a feature ...
• Personal Probabilities
  – John may be an optimist and believe that P(burglary) = 0.01, and Tom may be a pessimist and believe that P(burglary) = 0.99
  – Bayesians consider both John and Tom to be fine (they don't insist on an objective frequentist interpretation for probabilities)
  – However, Bayesians do think that John and Tom should act consistently with their own beliefs
    • For example, it makes no sense for John to go about installing tons of burglar alarms given his belief, just as it makes no sense for Tom to put all his valuables on his lawn
Ideas for reducing the number of probabilities to be specified
• Problem 1: The joint distribution requires 2^n numbers to specify, and those numbers are harder to assess
  – Solution: Use Bayes Nets to reduce the numbers and specify them as CPTs
• Problem 2: But CPTs will be as big as the full joint if the network is dense
  – Solution: Introduce intermediate variables to induce sparsity into the network
• Problem 3: But CPTs can still be quite hard to specify if there are too many parents (or if the variables are continuous)
  – Solution: Parameterize the CPT (use Noisy-OR etc. for discrete variables; Gaussians etc. for continuous variables)
Making the network sparse by introducing intermediate variables
• Consider a network of boolean variables where n parent nodes are connected to m children nodes (with each parent influencing each child).
  – You will need n + m*2^n conditional probabilities
• Suppose you realize that what is really influencing the child nodes is some single aggregate function of the parents' values (e.g., the sum of the parents).
  – We can introduce a single intermediate node called "sum" which has links from all the n parent nodes, and which separately influences each of the m child nodes
  – Now you will wind up needing only n + 2^n + 2m conditional probabilities to specify this new network!
• Learning such hidden variables from data poses challenges ...
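To see the savings concretely, a hypothetical instantiation of the counts above: with n = 5 boolean parents and m = 10 children, the original network needs 5 + 10*2^5 = 325 numbers, while the version with the single "sum" node needs only 5 + 2^5 + 2*10 = 57.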
Compact/Parameterized distributions are pretty much the only way to go when continuous variables are involved!
We only consider the failure-to-cause probability of the causes that hold: the CPT entry involves a product of the form ∏ (i = j+1 to k) r_i (r_i: the probability associated with the ith parent, e.g., that X holds even though the ith parent doesn't).
How about a Noisy-AND? (Hint: A & B => ~(~A V ~B).)
Think of a firing squad with up to k gunners trying to shoot you. You will live only if everyone who shoots misses.
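Here is a minimal sketch (my own, assuming the standard Noisy-OR parameterization implied by the firing-squad reading) of how one CPT entry would be computed; the inhibition numbers are made up.

```python
def noisy_or(cause_values, inhibition):
    """P(X = True | causes). cause_values and inhibition are aligned lists;
    inhibition[i] is the probability the ith cause fails to produce X when it holds."""
    p_all_fail = 1.0
    for holds, q in zip(cause_values, inhibition):
        if holds:                 # only the causes that hold contribute failure terms
            p_all_fail *= q
    return 1.0 - p_all_fail

# Firing-squad reading: you die unless every gunner who shoots misses.
# Hypothetical miss probabilities for three gunners, two of whom shoot:
print(noisy_or([True, True, False], [0.1, 0.3, 0.2]))  # 1 - 0.1*0.3 = 0.97
```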
Constructing Belief Networks: Summary
• Decide on what sorts of queries you are interested in answering
  – This in turn dictates what factors to model in the network
• Decide on a vocabulary of the variables and their domains for the problem
  – Introduce "hidden" variables into the network as needed to make the network "sparse"
• Decide on an order of introduction of variables into the network
  – Introducing variables in the causal direction leads to fewer connections (sparse structure) AND easier-to-assess probabilities
• Try to use canonical distributions to specify the CPTs
  – Noisy-OR
  – Parameterized discrete/continuous distributions
    • Such as Poisson, Normal (Gaussian), etc.
Case Study: Pathfinder System
• Domain: Lymph node diseases
  – Deals with 60 diseases and 100 disease findings
• Versions:
  – Pathfinder I: A rule-based system with logical reasoning
  – Pathfinder II: Tried a variety of approaches for uncertainty
    • Simple Bayes reasoning outperformed the other approaches
  – Pathfinder III: Simple Bayes reasoning, but reassessed probabilities
  – Pathfinder IV: A Bayesian network was used to handle a variety of conditional dependencies
    • Deciding the vocabulary: 8 hours
    • Devising the topology of the network: 35 hours
    • Assessing the (14,000) probabilities: 40 hours
      – Physician experts liked assessing causal probabilities
• Evaluation: 53 "referral" cases
  – Pathfinder III: 7.9/10
  – Pathfinder IV: 8.9/10 [Saves one additional life in every 1000 cases!]
  – A more recent comparison shows that Pathfinder now outperforms the experts who helped design it!!
Independence in Bayes Networks: Causal Chains, Common Causes, Common Effects
• Causal chain (linear): X causes Y through Z; the path is blocked if Z is given
• Common cause (diverging): X and Y are caused by Z; the path is blocked if Z is given
• Common effect (converging): X and Y cause Z; the path is blocked only if neither Z nor its descendants are given
D-Sep (direction-dependent separation)
• X || Y | E if every undirected path from X to Y is blocked by E
  – A path is blocked if there is a node Z on the path s.t.
    1. Z is in E and Z has one arrow coming in and another going out
    2. Z is in E and Z has both arrows going out
    3. Neither Z nor any of its descendants is in E, and both path arrows lead into Z
Examples to check on the alarm network: B || M | A;  (J,M) || E | A;  B || E;  B || E | A;  B || E | M
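As an illustration, here is a small sketch (not from the slides) of the per-path blocking test behind D-separation; the graph encoding and helper names are my own.

```python
def descendants(graph, node):
    """Return all descendants of `node` in a DAG given as {parent: [children]}."""
    seen, stack = set(), list(graph.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(graph.get(n, []))
    return seen

def path_blocked(graph, path, evidence):
    """Check whether an undirected path (list of nodes) is blocked by the evidence set,
    using the three conditions listed above (chain, diverging, converging)."""
    evidence = set(evidence)
    for i in range(1, len(path) - 1):
        prev, z, nxt = path[i - 1], path[i], path[i + 1]
        into_z = (z in graph.get(prev, []), z in graph.get(nxt, []))  # do both arrows point at Z?
        if all(into_z):                        # converging (common effect) at Z
            if z not in evidence and not (descendants(graph, z) & evidence):
                return True                    # blocked: neither Z nor a descendant is observed
        else:                                  # chain or diverging (common cause) at Z
            if z in evidence:
                return True                    # blocked: Z is observed
    return False

# Alarm network: the single undirected path B-A-E is a collider at A.
alarm_net = {"B": ["A"], "E": ["A"], "A": ["J", "M"], "J": [], "M": []}
print(path_blocked(alarm_net, ["B", "A", "E"], evidence=[]))     # True  -> B || E
print(path_blocked(alarm_net, ["B", "A", "E"], evidence=["A"]))  # False -> B, E dependent given A
print(path_blocked(alarm_net, ["B", "A", "E"], evidence=["M"]))  # False -> M is a descendant of A
```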
• "Asia" network: Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D)
V||A | T
T ||L ?
T||L | D
X||D
X||D | A
(V,S) || X | A
(V,S)||(X,D)| A
How many probabilities? (assume all are boolean variables)
P(V,~T,S,~L,A,B,~X,~D) =
0th idea for Bayes Net Inference
• Given a Bayes net, we can compute all the entries of the joint distribution (by just multiplying entries in CPTs)
• Given the joint distribution, we can answer any probabilistic query.
• Ergo, we can do inference on Bayes networks
• Qn: Can we do better?
  – Ideas:
    • Implicitly enumerate only the part of the joint that is needed
    • Use sampling techniques to compute the probabilities
Factor Algebra ...
[Example: a factor P(A,B) given as a table over the values of A and B]
Variable Elimination (Summing Away)
P(B) = ? Eliminate A: sum the rows where A changes polarity (i.e., sum P(A,B) over the values of A).
Elimination is what we mostly do to compute probabilities.
P(A|B) = P(A,B)/P(B)   (normalization)
If the expression tree is evaluated in a depth first fashion, then the space requirement is linear..
f_A(a,b,e)*f_J(a)*f_M(a) + f_A(~a,b,e)*f_J(~a)*f_M(~a) + ...
Complexity depends on the size of the largest factor which in turn depends on the order in which variables are eliminated..
A join..
*Read the Worked-Out Example of Variable Elimination in the Lecture Notes*
A More Complex Example
• "Asia" network: Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D)
[From Lise Getoor’s notes]
P(v,s,t,l,a,b,x,d) = P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

• We want to compute P(d)
• Need to eliminate: v, s, x, t, l, a, b
• Initial factors: P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate v. Compute f_v(t) = Σ_v P(v) P(t|v)
  Remaining factors: f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
  Note: f_v(t) = P(t). In general, the result of elimination is not necessarily a probability term.

Eliminate s. Compute f_s(b,l) = Σ_s P(s) P(b|s) P(l|s)
  Remaining factors: f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
  Summing over s results in a factor with two arguments, f_s(b,l). In general, the result of elimination may be a function of several variables.

Eliminate x. Compute f_x(a) = Σ_x P(x|a)
  Remaining factors: f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
  Note: f_x(a) = 1 for all values of a!

Eliminate t. Compute f_t(a,l) = Σ_t f_v(t) P(a|t,l)
  Remaining factors: f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)

Eliminate l. Compute f_l(a,b) = Σ_l f_s(b,l) f_t(a,l)
  Remaining factors: f_l(a,b) f_x(a) P(d|a,b)

Eliminate a, then b. Compute f_a(b,d) = Σ_a f_l(a,b) f_x(a) P(d|a,b), and then f_b(d) = Σ_b f_a(b,d)
  Remaining factor: f_b(d) = P(d)
Variable Elimination
• We now understand variable elimination as a sequence of rewriting operations
• The actual computation is done in the elimination steps
• The computation depends on the order of elimination
  – An optimal elimination order can be computed, but it is NP-hard to do so
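A compact sketch of the factor operations behind variable elimination (my own illustration, with factors represented as dictionaries over boolean assignments; the two-node example and its numbers are hypothetical):

```python
from itertools import product

def make_factor(vars, table):
    """vars: tuple of variable names; table: {assignment tuple: number}."""
    return (tuple(vars), dict(table))

def multiply(f1, f2):
    """Pointwise product of two factors (a 'join' over their shared variables)."""
    v1, t1 = f1; v2, t2 = f2
    vars = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for assign in product([True, False], repeat=len(vars)):
        a = dict(zip(vars, assign))
        table[assign] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return (vars, table)

def sum_out(f, var):
    """Eliminate `var` by summing the rows that differ only in its value."""
    vars, table = f
    idx = vars.index(var)
    new_vars = vars[:idx] + vars[idx + 1:]
    new_table = {}
    for assign, p in table.items():
        key = assign[:idx] + assign[idx + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return (new_vars, new_table)

def eliminate(factors, order):
    """For each variable in `order`: multiply the factors that mention it, then sum it out."""
    for var in order:
        related = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(prod, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Tiny illustration on the chain B -> A with made-up numbers:
# P(B) = 0.001, P(A|B) = 0.95, P(A|~B) = 0.01; eliminating B yields P(A).
fB = make_factor(("B",), {(True,): 0.001, (False,): 0.999})
fAB = make_factor(("A", "B"), {(True, True): 0.95, (True, False): 0.01,
                               (False, True): 0.05, (False, False): 0.99})
print(eliminate([fB, fAB], ["B"]))  # factor over A: P(A) ≈ 0.01094
```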
In general, any leaf node that is not a query or evidence variable is irrelevant (and can be removed)(once it is removed, others may be seen to be irrelevant)
Can drop irrelevant variables from the network before starting the query off..
Sufficient Condition 1
Sufficient Condition 2
Note that condition 2 doesn’t subsume condition 1. In particular, it won’t allow us to say that M is irrelevant for the query P(J|B)
Notice that sampling methods could in general be used even when we don’t know the bayes net (and are just observing the world)! We should strive to make the sampling more efficient given that we know the bayes net
Generating a Sample from the Network
<C, ~S, R, W>
[Figure: samples drawn from the network approximate the joint distribution]
Note that the sample is generated in the causal order, so all the required probabilities can be read off from the CPTs! (To see how useful this is, consider generating the sample in the order WetGrass, Rain, Sprinkler, ...: to sample WG we will first need P(WG), then we need P(Rain|WG), then we need P(Sp|Rain,WG); NONE of these are directly given in the CPTs and they have to be computed. Note that in MCMC we do (re)sample a node given its Markov blanket, which is not in causal order, since the MB contains the children and their parents.)
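A minimal sketch of causal-order (prior) sampling on the Sprinkler network, with illustrative CPT numbers, followed by a rejection-sampling estimate of a posterior:

```python
import random

# CPTs: each entry maps a tuple of parent values to P(node = True | parents).
cpts = {
    "Cloudy":    {(): 0.5},
    "Sprinkler": {(True,): 0.1, (False,): 0.5},              # parent: Cloudy
    "Rain":      {(True,): 0.8, (False,): 0.2},              # parent: Cloudy
    "WetGrass":  {(True, True): 0.99, (True, False): 0.9,    # parents: Sprinkler, Rain
                  (False, True): 0.9, (False, False): 0.0},
}
parents = {"Cloudy": (), "Sprinkler": ("Cloudy",), "Rain": ("Cloudy",),
           "WetGrass": ("Sprinkler", "Rain")}
topo_order = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]

def prior_sample():
    """Sample each node after its parents, so every needed probability is a CPT lookup."""
    sample = {}
    for node in topo_order:
        p_true = cpts[node][tuple(sample[p] for p in parents[node])]
        sample[node] = random.random() < p_true
    return sample

# Estimate P(Rain | WetGrass) by rejection sampling over the prior samples.
N, kept, rain = 10000, 0, 0
for _ in range(N):
    s = prior_sample()
    if s["WetGrass"]:
        kept += 1
        rain += s["Rain"]
print(rain / kept)
```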
That is, the rejection sampling method doesn’t really use the bayes network that much…
Notice that to attach the likelihood to the evidence, we are using the CPTs in the bayes net. (Model-free empirical observation, in contrast, either gives you a sample or not; we can’t get fractional samples)
Note that the other parents of Zj are part of the Markov blanket
P(rain | cl, sp, wg) ∝ P(rain | cl) * P(wg | sp, rain)
Examples of singly connected networks include Markov Chains and Hidden Markov Models
Network Topology & Complexity of Inference
• Singly connected networks (poly-trees: at most one path between any pair of nodes): inference is polynomial
• Multiply connected networks (e.g., the Cloudy / Sprinklers / Rain / Wetgrass network): inference is NP-hard
  – Can be converted to a singly connected network by merging nodes: Cloudy, Sprinklers+Rain (takes 4 values, 2x2), Wetgrass
• The "size" of the merged network can be exponentially larger (so polynomial inference on that network isn't exactly god's gift)
Summary of BN Inference Algorithms
Exact Inference
• Algorithms
  – Enumeration
  – Variable elimination
    • Avoids the redundant computations of enumeration
  – [Many others, such as "message passing" algorithms, constraint-propagation based algorithms, etc.]
Approximate Inference
• Algorithms
  – Based on stochastic simulation
    • Sampling from empty networks
    • Rejection sampling
    • Likelihood weighting
    • MCMC [and many more]
• Complexity
  – NP-hard (actually #P-Complete, since we "count" models)
    • Polynomial for "singly connected" networks (at most one path between each pair of nodes)
  – NP-hard also for absolute and relative approximation
Conjunctive queries are essentially computing joint distributions on sets of query variables. A special case of computing the full joint on the query variables is finding just the query-variable configuration that is most likely given the evidence. There are two special cases here:
MPE (Most Probable Explanation): the most likely assignment to all other variables given the evidence. Mostly involves max/product operations.
MAP (Maximum a Posteriori): the most likely assignment to some of the variables given the evidence. Can involve max/product/sum operations.
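As a tiny illustration (reusing the CA/TA joint from earlier, with the evidence chosen arbitrarily), MPE picks the single most probable complete assignment consistent with the evidence rather than summing over the unobserved variables:

```python
joint = {("CA", "TA"): 0.04, ("CA", "~TA"): 0.06,
         ("~CA", "TA"): 0.01, ("~CA", "~TA"): 0.89}

# Evidence: toothache observed (TA). Keep only the complete assignments consistent with it.
consistent = {assign: p for assign, p in joint.items() if assign[1] == "TA"}

# MPE: the argmax over complete assignments (a max/product-style computation, no summing).
mpe = max(consistent, key=consistent.get)
print(mpe)   # ('CA', 'TA'): given the toothache, the most probable explanation includes a cavity
```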