Probabilistic Reasoning Copyright, 1996 © Dale Carnegie & Associates, Inc. Chapter 14 (14.1, 14.2,...

Probabilistic Reasoning

Copyright, 1996 © Dale Carnegie & Associates, Inc.

Chapter 14 (14.1, 14.2, 14.3, 14.4)

• Capturing uncertain knowledge

• Probabilistic inference

CSE 471/598 by H. Liu 2

Knowledge representation

• The joint probability distribution (JPD) can answer any question about the domain, but it
  - can become intractably large as the number of random variables grows
  - can be difficult to specify, since a probability is needed for every atomic event

• Conditional independence can simplify probabilistic assignment
• A data structure, the belief network or Bayesian network, represents the dependence between variables and gives a concise specification of the joint.


A Bayesian network is a graph:
• Its nodes are a set of random variables
• A set of directed links connects pairs of nodes
• Each node has a conditional probability table (CPT) that quantifies the effects that the parents have on the node
• The graph has no directed cycles (it is a DAG)

It is usually much easier for an expert to decide conditional dependence relationships than to specify probabilities. Sometimes, experts can have very different opinions.
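The graph definition above can be sketched in Python as a map from each node to its list of parents, plus a check that the links form a DAG. This is a minimal sketch: the names `PARENTS` and `topological_order` are illustrative, and the cycle check uses Kahn's algorithm (a standard topological sort, not something the slides prescribe).

```python
from collections import deque

# Parent lists for the burglary network of Fig 14.2: B, E -> A -> J, M.
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def topological_order(parents):
    """Order nodes so every parent precedes its children (Kahn's algorithm).

    Raises ValueError if the links contain a directed cycle, i.e. the graph
    is not a DAG and so cannot be a Bayesian network.
    """
    children = {node: [] for node in parents}
    indegree = {node: len(ps) for node, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    ready = deque(node for node, d in indegree.items() if d == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(parents):
        raise ValueError("directed cycle found: not a Bayesian network")
    return order

print(topological_order(PARENTS))
# ['Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls']
```

A topological order is also a natural node ordering for the construction procedure, since every node appears after its parents.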


Once the network is specified, we need only specify conditional probabilities for the nodes that participate in direct dependencies, and use those to compute any other probabilities.
• A simple Bayesian network (Fig 14.1)
• An example of burglary-alarm-call (Fig 14.2)

The topology of the network can be thought of as the general structure of the causal process. Many details (Mary listening to loud music, or the phone ringing and confusing John) are summarized in the uncertainty associated with the links from Alarm to JohnCalls and MaryCalls.


• The probabilities actually summarize a potentially infinite set of possible circumstances, overcoming both laziness and ignorance
• The degree of approximation can be improved if we have additional relevant information

Specifying the CPT for each node (Fig 14.2)
• A conditioning case is a possible combination of values for the parent nodes; a node with n Boolean parents has 2^n conditioning cases
• Each row in a CPT must sum to 1
• A node with no parents has only one row (its prior probabilities)
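The three CPT rules above can be checked mechanically. This is a minimal sketch, assuming Boolean variables and rows stored as (P(true), P(false)) pairs keyed by the conditioning case; the Alarm numbers are the standard values from the textbook's Fig 14.2 (assumed here, since the slides only cite the figure), and `check_cpt` is a hypothetical helper.

```python
import math
from itertools import product

# CPT for Alarm: conditioning case (Burglary, Earthquake) -> (P(a), P(!a)).
# Values assumed to match the textbook's Fig 14.2.
ALARM_PARENTS = ("Burglary", "Earthquake")
ALARM_CPT = {
    (True, True): (0.95, 0.05),
    (True, False): (0.94, 0.06),
    (False, True): (0.29, 0.71),
    (False, False): (0.001, 0.999),
}
# A node with no parents has one row (its prior), keyed by the empty case.
BURGLARY_CPT = {(): (0.001, 0.999)}

def check_cpt(parents, cpt):
    """Validate a Boolean node's CPT and return its number of rows."""
    # One conditioning case per combination of parent values: 2**n rows.
    expected = set(product([True, False], repeat=len(parents)))
    if set(cpt) != expected:
        raise ValueError(f"expected {len(expected)} rows, got {len(cpt)}")
    for case, row in cpt.items():
        if not math.isclose(sum(row), 1.0):
            raise ValueError(f"row {case} sums to {sum(row)}, not 1")
    return len(cpt)

print(check_cpt(ALARM_PARENTS, ALARM_CPT))  # 4
print(check_cpt((), BURGLARY_CPT))          # 1
```

Because each row sums to 1, Fig 14.2 lists only P(true) for each case; the P(false) column is redundant.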


The semantics of Bayesian networks

Two equivalent views of a Bayesian network:
• Representing the JPD: helpful in understanding how to construct networks
• Representing conditional independence relations: helpful in designing inference procedures


Representing JPD - constructing a BN

A Bayesian network provides a complete description of the domain: every entry in the JPD can be calculated from the information in the network. A generic entry in the joint is the probability of a conjunction of particular assignments to each variable:

$P(x_1,\ldots,x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{Parents}(X_i))$    (14.1)

What is the probability of the event $j \land m \land a \land \lnot b \land \lnot e$?

$P(j \land m \land a \land \lnot b \land \lnot e) = P(j \mid a)\,P(m \mid a)\,P(a \mid \lnot b \land \lnot e)\,P(\lnot b)\,P(\lnot e)$

Look up the values in Figure 14.2 and multiply.
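Eq. 14.1 and the worked example above can be sketched directly. The CPT numbers below are the standard values from the textbook's Fig 14.2, assumed here because the slides only point to the figure; the function names are illustrative.

```python
from itertools import product

# CPT entries P(X = true | parents); values assumed from the textbook's Fig 14.2.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (b, e)
P_J = {True: 0.90, False: 0.05}                       # keyed by a
P_M = {True: 0.70, False: 0.01}                       # keyed by a

def prob(p_true, value):
    """P(X = value) from P(X = true)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """One entry of the joint via Eq. 14.1: product of P(x_i | Parents(X_i))."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))

# P(j ^ m ^ a ^ !b ^ !e) = P(j|a) P(m|a) P(a|!b^!e) P(!b) P(!e)
print(f"{joint(b=False, e=False, a=True, j=True, m=True):.6f}")  # 0.000628

# Sanity check: the 2**5 entries defined by the network sum to 1.
total = sum(joint(*world) for world in product([True, False], repeat=5))
print(round(total, 9))  # 1.0
```

The network stores 10 numbers, yet `joint` reproduces all 32 entries of the full joint, which is the sense in which the BN is a complete description of the domain.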


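A method for constructing Bayesian networks

Eq. 14.1 defines what a given BN means, and it also implies certain conditional independence relationships that can be used to guide the construction.

Rewrite the joint with the product rule, $P(x_1,\ldots,x_n) = P(x_n \mid x_{n-1},\ldots,x_1)\,P(x_{n-1},\ldots,x_1)$, and continue in the same way for $P(x_{n-1},\ldots,x_1)$ to obtain the chain rule:

$P(x_1,\ldots,x_n) = \prod_{i=1}^{n} P(x_i \mid x_{i-1},\ldots,x_1)$

Comparing this with Eq. 14.1, the network is correct if, for every variable,

$P(X_i \mid X_{i-1},\ldots,X_1) = P(X_i \mid \mathit{Parents}(X_i))$    (14.2)

provided that $\mathit{Parents}(X_i) \subseteq \{X_{i-1},\ldots,X_1\}$.

The BN is a correct representation of the domain only if each node is conditionally independent of its predecessors in the node ordering, given its parents. E.g., $P(M \mid J, A, E, B) = P(M \mid A)$.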


Incremental network construction

• Choose the relevant variables describing the domain
• Choose an ordering for the variables
• While there are variables left:
  - Pick a variable and add a node for it to the network
  - Set its parents to some minimal set of nodes already in the net to satisfy Eq. 14.2
  - Define the CPT for the variable
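The loop above can be sketched when the true joint distribution is available to test Eq. 14.2. This is a minimal illustration, not how an expert would actually proceed: it builds a ground-truth joint from the burglary network (CPT values assumed from the textbook's Fig 14.2), then, for each variable in the given ordering, picks the smallest subset of its predecessors that leaves P(X_i | predecessors) unchanged. All names are illustrative.

```python
from itertools import combinations, product

VARS = ["B", "E", "A", "J", "M"]   # Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
INDEX = {v: i for i, v in enumerate(VARS)}

# Ground-truth joint, built from CPT values assumed from the textbook's Fig 14.2.
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def prob(p_true, value):
    return p_true if value else 1.0 - p_true

JOINT = {
    (b, e, a, j, m): prob(0.001, b) * prob(0.002, e) * prob(P_A[(b, e)], a)
    * prob(0.90 if a else 0.05, j) * prob(0.70 if a else 0.01, m)
    for b, e, a, j, m in product([True, False], repeat=5)
}

def marginal(assignment):
    """P(assignment) by summing the joint entries consistent with it."""
    return sum(p for world, p in JOINT.items()
               if all(world[INDEX[v]] == val for v, val in assignment.items()))

def cond_true(x, given):
    """P(x = true | given)."""
    return marginal({**given, x: True}) / marginal(given)

def screens_off(x, preds, subset):
    """True if P(x | all preds) == P(x | subset) for every assignment (Eq. 14.2)."""
    for vals in product([True, False], repeat=len(preds)):
        full = dict(zip(preds, vals))
        reduced = {v: full[v] for v in subset}
        if abs(cond_true(x, full) - cond_true(x, reduced)) > 1e-9:
            return False
    return True

def build_network(ordering):
    """Add variables in order; parents = smallest predecessor subset satisfying Eq. 14.2."""
    network = {}
    for i, x in enumerate(ordering):
        preds = ordering[:i]
        # Try subsets by increasing size; the full predecessor set always qualifies.
        network[x] = next(list(s) for size in range(len(preds) + 1)
                          for s in combinations(preds, size)
                          if screens_off(x, preds, s))
    return network

print(build_network(["B", "E", "A", "J", "M"]))
# {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
print(build_network(["M", "J", "A", "B", "E"]))
# {'M': [], 'J': ['M'], 'A': ['M', 'J'], 'B': ['A'], 'E': ['A', 'B']}
```

With the causal ordering the procedure recovers the links of Fig 14.2; with the ordering M, J, A, B, E it needs extra links, which appears to match the first network of Fig 14.3.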


Compactness
• A Bayesian network can often be far more compact than the full joint.
• In a locally structured system, each subcomponent interacts directly with only a bounded number of other components.
• A local structure is usually associated with linear rather than exponential growth in complexity.
• With n = 30 Boolean nodes, each directly influenced by at most k = 5 nodes, what is the difference between the BN and the joint? The BN needs at most n*2^k = 30*2^5 = 960 numbers, while the joint has 2^n = 2^30 entries (over a billion).
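The arithmetic in the last bullet, as a minimal sketch assuming Boolean variables and at most k parents per node (the function names are illustrative):

```python
def bn_numbers(n, k):
    """Upper bound on CPT entries: n Boolean nodes, each with at most k parents."""
    return n * 2 ** k

def joint_numbers(n):
    """Entries in the full joint distribution over n Boolean variables."""
    return 2 ** n

n, k = 30, 5
print(bn_numbers(n, k), joint_numbers(n))  # 960 1073741824
```

The BN cost grows linearly in n for fixed k, while the joint doubles with every added variable.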


Node ordering

The correct order to add nodes is to add the “root causes” first, then the variables they influence, and so on, until we reach the leaves, which have no direct causal influence on the other variables. Domain knowledge helps!

What if we happen to choose the wrong order? Fig 14.3 shows an example. If we stick to a true causal model, we end up having to specify fewer numbers, and the numbers will often be easier to come up with.
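The cost of an ordering can be compared by counting CPT rows: a Boolean node with k Boolean parents needs 2^k numbers. In this minimal sketch the causal parent sets come from Fig 14.2; the second set, for the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake, is an assumption intended to match the first network of Fig 14.3.

```python
# Parent sets for two node orderings of the burglary domain.
CAUSAL = {"Burglary": [], "Earthquake": [], "Alarm": ["Burglary", "Earthquake"],
          "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}
# Assumed parent sets for the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake.
NONCAUSAL = {"MaryCalls": [], "JohnCalls": ["MaryCalls"],
             "Alarm": ["MaryCalls", "JohnCalls"], "Burglary": ["Alarm"],
             "Earthquake": ["Alarm", "Burglary"]}

def numbers_to_specify(parents):
    """One number P(X = true | case) per conditioning case of each Boolean node."""
    return sum(2 ** len(ps) for ps in parents.values())

print(numbers_to_specify(CAUSAL), numbers_to_specify(NONCAUSAL))  # 10 13
```

Beyond the larger count, numbers such as P(Earthquake | Alarm, Burglary) are harder for an expert to estimate than the causal P(Alarm | Burglary, Earthquake).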


Conditional independence relations

To design inference algorithms, we need to know whether more general conditional independences hold. Given a network, can we tell whether a set of nodes X is independent of another set Y, given a set of evidence nodes E? The answer builds on the concept of non-descendants:
• A node is conditionally independent of its non-descendants, given its parents. As in Fig 14.2, JohnCalls is independent of Burglary and Earthquake, given Alarm.
• A node is conditionally independent of all other nodes in the network, given its parents, children, and children’s parents (its Markov blanket). Burglary is independent of JohnCalls and MaryCalls, given Alarm and Earthquake.
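Both independence properties above depend only on the graph, so they can be read off the parent lists. This is a minimal sketch for the network of Fig 14.2; the helper names are illustrative.

```python
PARENTS = {"Burglary": [], "Earthquake": [], "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}

def children(node, parents):
    return {c for c, ps in parents.items() if node in ps}

def descendants(node, parents):
    """All nodes reachable from node by following links downward."""
    found, stack = set(), [node]
    while stack:
        for child in children(stack.pop(), parents):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def non_descendants(node, parents):
    return set(parents) - descendants(node, parents) - {node}

def markov_blanket(node, parents):
    """Parents, children, and children's other parents of node."""
    kids = children(node, parents)
    blanket = set(parents[node]) | kids
    for kid in kids:
        blanket |= set(parents[kid])
    return blanket - {node}

# Given its parent Alarm, JohnCalls is independent of these non-descendants:
print(sorted(non_descendants("JohnCalls", PARENTS) - set(PARENTS["JohnCalls"])))
# ['Burglary', 'Earthquake', 'MaryCalls']
print(sorted(markov_blanket("Burglary", PARENTS)))
# ['Alarm', 'Earthquake']
```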


Representation of CPTs
• Given canonical distributions, the complete table can be specified by naming the distribution with some parameters.
• A deterministic node has its values specified exactly by the values of its parents.
• Uncertain relationships can often be characterized by “noisy” logical relationships.

Noisy-OR (page 500)

An example of determining the conditional probabilities (page 501), starting with P(!fever), given the individual inhibition probabilities for cold, flu, and malaria:

P(!fever|c,!f,!m) = 0.6, P(!fever|!c,f,!m) = 0.2, and P(!fever|!c,!f,m) = 0.1

Under noisy-OR, P(!fever) for any combination of present causes is the product of their inhibition probabilities, e.g., P(!fever|c,f,!m) = 0.6 × 0.2 = 0.12.
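The noisy-OR computation can be sketched as follows. It assumes the listed causes are the only ones (no leak node), so P(!fever) = 1 when no cause is present; the names are illustrative.

```python
from itertools import product

# Inhibition probabilities from the slide: P(!fever | only that cause is present).
INHIBIT = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def noisy_or_cpt(inhibit):
    """Map each combination of cause values to P(!fever) under noisy-OR (no leak)."""
    causes = list(inhibit)
    cpt = {}
    for present in product([False, True], repeat=len(causes)):
        p_not = 1.0
        for cause, on in zip(causes, present):
            if on:
                p_not *= inhibit[cause]  # each present cause is inhibited independently
        cpt[present] = p_not
    return cpt

cpt = noisy_or_cpt(INHIBIT)
for (c, f, m), p_not in cpt.items():
    print(f"cold={c!s:5} flu={f!s:5} malaria={m!s:5}  "
          f"P(!fever)={p_not:.3f}  P(fever)={1 - p_not:.3f}")
```

Three numbers determine all eight rows; with all three causes present, P(!fever) = 0.6 × 0.2 × 0.1 = 0.012.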


Inference in Bayesian networks

• Exact inference
  - Inference by enumeration
  - The variable elimination algorithm
  - The complexity of exact inference
  - Clustering algorithms
• Approximate inference
  - Direct sampling methods: rejection sampling, likelihood weighting
  - Inference by Markov chain simulation
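Inference by enumeration, the first exact method listed, can be sketched for the query P(Burglary | JohnCalls = true, MaryCalls = true): sum the joint (Eq. 14.1) over the hidden variables Earthquake and Alarm, then normalize. The CPT values are assumed from the textbook's Fig 14.2, and the names are illustrative.

```python
from itertools import product

# CPT entries P(X = true | parents); values assumed from the textbook's Fig 14.2.
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | b, e)

def prob(p_true, value):
    """P(X = value) from P(X = true)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Full-joint entry via Eq. 14.1."""
    return (prob(0.001, b) * prob(0.002, e) * prob(P_A[(b, e)], a)
            * prob(0.90 if a else 0.05, j) * prob(0.70 if a else 0.01, m))

def enumerate_burglary(j, m):
    """P(Burglary | JohnCalls = j, MaryCalls = m), summing out Earthquake and Alarm."""
    unnormalized = {
        b: sum(joint(b, e, a, j, m) for e, a in product([True, False], repeat=2))
        for b in (True, False)
    }
    z = sum(unnormalized.values())  # equals P(j, m)
    return {b: v / z for b, v in unnormalized.items()}

posterior = enumerate_burglary(j=True, m=True)
print({b: round(v, 3) for b, v in posterior.items()})  # {True: 0.284, False: 0.716}
```

This brute-force sum is exponential in the number of hidden variables; variable elimination avoids recomputing repeated subexpressions, and the sampling methods trade exactness for speed.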


Knowledge engineering for uncertain reasoning
• Decide what to talk about
• Decide on a vocabulary of random variables
• Encode general knowledge about the dependence
• Encode a description of the specific problem instance
• Pose queries to the inference procedure and get answers


Other approaches to uncertain reasoning

Different generations of expert systems:
• Strict logic reasoning (ignore uncertainty)
• Probabilistic techniques using the full joint
• Default reasoning: a conclusion is believed until a better reason is found to believe something else
• Rules with certainty factors
• Handling ignorance: Dempster-Shafer theory
• Vagueness: something is sort of true (fuzzy logic)

Probability makes the same ontological commitment as logic: an event is either true or false.


Default reasoning
• The conclusion that a car has four wheels is reached by default.
• New evidence can cause the conclusion to be retracted, whereas FOL is strictly monotonic.
• Representative formalisms are default logic, nonmonotonic logic, and circumscription.
• There are problematic issues (details in Chapter 10).


Rule-based methods

Logical reasoning systems have properties like:
• Monotonicity: additional facts won’t affect the existing ones
• Locality: each rule is considered independently
• Detachment: after it is derived, a conclusion can be detached from its justification
• Truth-functionality: the truth of complex sentences can be computed from the truth of the components

These properties bring obvious computational advantages, but they are inappropriate for uncertain reasoning.


Summary
• Reasoning properly
  - In FOL, it means conclusions follow from premises
  - In probability, it means having beliefs that allow an agent to act rationally
• Conditional independence information is vital
• A Bayesian network is a complete representation for the JPD, but often exponentially smaller in size
• Bayesian networks can reason causally, diagnostically, intercausally, or by combining two or more of the three
• For polytrees (singly connected networks), the computational time is linear in network size
