Uncertain KR&R, Chapter 9: Probability, Bayesian networks, Fuzzy logic

Post on 27-Dec-2015


Uncertain KR&R

Chapter 9

2

Outline

• Probability

• Bayesian networks

• Fuzzy logic

3

Probability

FOL fails for a domain due to:

Laziness: it is too much work to list the complete set of rules, and the enormous rule set that results is too hard to use

Theoretical ignorance: there is no complete theory for the domain

Practical ignorance: have not or cannot run all necessary tests

4

Probability

• Probability = a degree of belief

• Probability comes from:

Frequentist: experiments and statistical assessment

Objectivist: real aspects of the universe

Subjectivist: a way of characterizing an agent’s beliefs

• Decision theory = probability theory + utility theory

5

Probability

Prior probability: probability in the absence of any other information

P(Dice = 2) = 1/6

random variable: Dice

domain = <1, 2, 3, 4, 5, 6>

probability distribution: P(Dice) = <1/6, 1/6, 1/6, 1/6, 1/6, 1/6>

6

Probability

Conditional probability: probability in the presence of some evidence

P(Dice = 2 | Dice is even) = 1/3

P(Dice = 2 | Dice is odd) = 0

P(A | B) = P(A ∧ B)/P(B)

P(A ∧ B) = P(A | B).P(B)

7

Probability

Example:

S = stiff neck

M = meningitis

P(S | M) = 0.5

P(M) = 1/50000

P(S) = 1/20

P(M | S) = P(S | M).P(M)/P(S) = 1/5000
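The arithmetic of the meningitis example can be checked with a minimal Python sketch that uses exact fractions. The function name `bayes` is an illustrative choice, not something taken from the slides.

```python
from fractions import Fraction


def bayes(p_e_given_h, p_h, p_e):
    """Return P(H | E) = P(E | H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e


# Numbers from the example: S = stiff neck, M = meningitis.
p_s_given_m = Fraction(1, 2)
p_m = Fraction(1, 50000)
p_s = Fraction(1, 20)

p_m_given_s = bayes(p_s_given_m, p_m, p_s)
print(p_m_given_s)  # 1/5000
```

Even though half of the meningitis patients have a stiff neck, the posterior stays small because the prior P(M) is tiny compared with P(S).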

8

Probability

Joint probability distributions:

X: <x1, …, xm> Y: <y1, …, yn>

P(X = xi, Y = yj)

9

Probability

Axioms:

• 0 ≤ P(A) ≤ 1

• P(true) = 1 and P(false) = 0

• P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

10

Probability

Derived properties:

• P(¬A) = 1 − P(A)

• P(U) = P(A1) + P(A2) + ... + P(An)

U = A1 ∨ A2 ∨ ... ∨ An (collectively exhaustive)

Ai ∧ Aj = false for i ≠ j (mutually exclusive)

11

Probability

Bayes’ theorem:

P(Hi | E) = P(E | Hi).P(Hi) / Σj P(E | Hj).P(Hj)

Hi’s are collectively exhaustive & mutually exclusive
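As a small illustration of Bayes' theorem over a set of exhaustive, mutually exclusive hypotheses, the sketch below reuses the dice example from the earlier slides: the hypotheses are "Dice = i" and the evidence is "Dice is even". The function name `posterior` and the dictionary layout are assumptions made for this sketch.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(H | E) for every hypothesis H.

    priors[h] is P(h), likelihoods[h] is P(E | h); the hypotheses are
    assumed to be collectively exhaustive and mutually exclusive.
    """
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    total = sum(joint.values())  # the denominator: sum of P(E | h).P(h)
    return {h: p / total for h, p in joint.items()}


priors = {i: Fraction(1, 6) for i in range(1, 7)}
# P(Dice is even | Dice = i) is 1 for even i and 0 for odd i.
likelihood_even = {i: Fraction(1 if i % 2 == 0 else 0) for i in range(1, 7)}

post = posterior(priors, likelihood_even)
print(post[2], post[3])  # 1/3 0
```

The result agrees with P(Dice = 2 | Dice is even) = 1/3 from the conditional probability slide.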

12

Probability

Problem: a full joint probability distribution P(X1, X2, ..., Xn) is sufficient for computing any (conditional) probability on Xi's, but the number of joint probabilities is exponential.

13

Probability

• Independence:

P(A ∧ B) = P(A).P(B)

P(A) = P(A | B)

• Conditional independence:

P(A ∧ B | E) = P(A | E).P(B | E)

P(A | E) = P(A | E ∧ B)

Example:

P(Toothache | Cavity ∧ Catch) = P(Toothache | Cavity)

P(Catch | Cavity ∧ Toothache) = P(Catch | Cavity)

16

Probability

"In John's and Mary's house, an alarm is installed to sound in case of burglary or earthquake. When the alarm sounds, John and Mary may make a call for help or rescue."

Q1: If an earthquake happens, how likely is John to make a call?

Q2: If the alarm sounds, how likely is it that the house has been burglarized?

17

Bayesian Networks

• Pearl, J. (1982). Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach, presented at the Second National Conference on Artificial Intelligence (AAAI-82), Pittsburgh, Pennsylvania

• 2000 AAAI Classic Paper Award

18

Bayesian Networks

Network structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls

P(B) = .001

P(E) = .002

B  E  P(A)
T  T  .95
T  F  .94
F  T  .29
F  F  .001

A  P(J)
T  .90
F  .05

A  P(M)
T  .70
F  .01

19

Bayesian Networks

Syntax:

• A set of random variables makes up the nodes

• A set of directed links connects pairs of nodes

• Each node has a conditional probability table that quantifies the effects of its parent nodes

• The graph has no directed cycles

[Figure: two example graphs over the nodes A, B, C, D, one marked "yes" and the other marked "no"]

20

Bayesian Networks

Semantics:

• An ordering on the nodes: Xi is a predecessor of Xj ⇒ i < j

• P(X1, X2, …, Xn)

= P(Xn | Xn-1, …, X1).P(Xn-1 | Xn-2, …, X1). … .P(X2 | X1).P(X1)

= Πi P(Xi | Xi-1, …, X1) = Πi P(Xi | Parents(Xi))

P(Xi | Xi-1, …, X1) = P(Xi | Parents(Xi)), where Parents(Xi) ⊆ {Xi-1, …, X1}

Each node is conditionally independent of its predecessors given its parents

21

Bayesian Networks

Example:

P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)

= P(J | A).P(M | A).P(A | ¬B ∧ ¬E).P(¬B).P(¬E)

= 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.000628

[Figure: the network B → A ← E, A → J, A → M]
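The product of conditional probabilities in this example can be sketched in Python as follows. The conditional probability tables are copied from the burglary network slide; the names `prob` and `joint` are illustrative assumptions.

```python
# CPTs of the burglary network (values from the slides).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(J | A)
P_M = {True: 0.70, False: 0.01}                     # P(M | A)


def prob(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(j, m, a, b, e):
    """P(J=j, M=m, A=a, B=b, E=e) as the product of P(Xi | Parents(Xi))."""
    return (prob(P_J[a], j) * prob(P_M[a], m) * prob(P_A[(b, e)], a)
            * prob(P_B, b) * prob(P_E, e))


p = joint(j=True, m=True, a=True, b=False, e=False)
print(f"{p:.6f}")  # 0.000628
```

Five small tables with 10 numbers in total are enough here, whereas the full joint distribution over five Boolean variables has 32 entries.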

22

Bayesian Networks

• Why Bayesian Networks?

23

Uncertain Question Answering

P(Query | Evidence) = ?

Diagnostic (from effects to causes): P(B | J)

Causal (from causes to effects): P(J | B)

Intercausal (between causes of a common effect): P(B | A, E)

Mixed: P(A | J, E), P(B | J, E)

[Figure: the network B → A ← E, A → J, A → M]

24

Uncertain Question Answering

• The independence assumptions in a Bayesian network simplify the computation of conditional probabilities over its variables

25

Uncertain Question Answering

Q1: If an earthquake happens, how likely is John to make a call?

Q2: If the alarm sounds, how likely is it that the house has been burglarized?

Q3: If the alarm sounds, how likely is it that both John and Mary make calls?

26

Uncertain Question Answering

P(B | A)

= P(B ∧ A)/P(A)

= αP(B ∧ A)

P(¬B | A)

= αP(¬B ∧ A)

α = 1/(P(B ∧ A) + P(¬B ∧ A))
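One simple way to carry out this normalization is to sum the joint distribution over all assignments that agree with the evidence. The sketch below does that by brute-force enumeration for the burglary network and answers questions Q1 to Q3. The CPT values come from the slides; the function names and the dictionary-based representation are assumptions, and enumeration is only one of several possible inference methods.

```python
from itertools import product

VARS = ("B", "E", "A", "J", "M")
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(J | A)
P_M = {True: 0.70, False: 0.01}                     # P(M | A)


def prob(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(w):
    """Joint probability of one complete assignment w (a dict)."""
    return (prob(P_B, w["B"]) * prob(P_E, w["E"])
            * prob(P_A[(w["B"], w["E"])], w["A"])
            * prob(P_J[w["A"]], w["J"]) * prob(P_M[w["A"]], w["M"]))


def matches(w, assignment):
    return all(w[k] == v for k, v in assignment.items())


def query(target, evidence):
    """P(target | evidence) = alpha * P(target and evidence)."""
    p_evidence = 0.0
    p_both = 0.0
    for values in product((True, False), repeat=len(VARS)):
        w = dict(zip(VARS, values))
        if matches(w, evidence):
            p = joint(w)
            p_evidence += p
            if matches(w, target):
                p_both += p
    alpha = 1.0 / p_evidence  # normalization constant
    return alpha * p_both


q1 = query({"J": True}, {"E": True})             # Q1: P(J | E)
q2 = query({"B": True}, {"A": True})             # Q2: P(B | A)
q3 = query({"J": True, "M": True}, {"A": True})  # Q3: P(J and M | A)
print(round(q1, 4), round(q2, 4), round(q3, 4))
```

With the tables above this gives roughly 0.297 for Q1, 0.374 for Q2, and 0.63 for Q3. Enumeration visits all 32 assignments, so it only scales to small networks.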

27

General Conditional Independence

• A Bayesian network implies all the conditional independence relations among its variables

28

General Conditional Independence

[Figure: node X with parents U1, …, Um, children Y1, …, Yn, and nodes Z1j, …, Znj]

A node (X) is conditionally independent of its non-descendants (Zij's), given its parents (Ui's)

29

General Conditional Independence

[Figure: node X with parents U1, …, Um, children Y1, …, Yn, and nodes Z1j, …, Znj]

A node (X) is conditionally independent of all other nodes, given its parents (Ui's), children (Yi's), and children's parents (Zij's)

30

General Conditional Independence

X and Y are conditionally independent given E

[Figure: three configurations of a node Z on a path between X and Y, shown relative to the evidence set E]

32

General Conditional Independence

Example:

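Gas and Radio are independent in the first three cases below and dependent in the last two:

Gas - (Ignition) - Radio

Gas - (Battery) - Radio

Gas - (no evidence at all) - Radio

Gas and Radio are not independent given Starts

Gas and Radio are not independent given Moves

[Figure: car network: Battery → Radio, Battery → Ignition, Ignition → Starts, Gas → Starts, Starts → Moves]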

33

Vagueness

• The Oxford Companion to Philosophy (1995):

“Words like smart, tall, and fat are vague since in most contexts of use there is no bright line separating them from not smart, not tall, and not fat respectively …”

34

Vagueness

• Imprecision vs. Uncertainty:

The bottle is about half-full.

vs.

It is likely to a degree of 0.5 that the bottle is full.

35

Fuzzy Sets

• Zadeh, L.A. (1965). Fuzzy Sets. Information and Control, 8(3), 338-353

38

Fuzzy Set Definition

A fuzzy set is defined by a membership function that maps elements of a given domain (a crisp set) into values in [0, 1].

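A: U → [0, 1]

[Figure: membership function of the fuzzy set "young" over Age, with ticks at 25, 30, and 40 on the Age axis and at 0, 0.5, and 1 on the membership axis]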

39

Fuzzy Set Representation

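• Discrete domain:

high-dice score: {1:0, 2:0, 3:0.2, 4:0.5, 5:0.9, 6:1}

• Continuous domain:

A(u) = 1 for u ∈ [0, 25]

A(u) = (40 − u)/15 for u ∈ [25, 40]

A(u) = 0 for u ∈ [40, 150]

[Figure: plot of A over Age, with ticks at 25, 30, and 40 on the Age axis and at 0, 0.5, and 1 on the membership axis]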

41

Fuzzy Set Representation

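• α-cuts:

Aα = {u | A(u) ≥ α}

Aα+ = {u | A(u) > α} (strong α-cut)

A(u) = sup{α | u ∈ Aα}

A0.5 = [0, 30]

[Figure: plot of A over Age, with ticks at 25, 30, and 40 on the Age axis and at 0, 0.5, and 1 on the membership axis]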

42

Fuzzy Set Representation

• Support: supp(A) = {u | A(u) > 0} = A0+

• Core: core(A) = {u | A(u) = 1} = A1

• Height: h(A) = sup{A(u) | u ∈ U}
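For a discrete domain these notions are easy to compute. The sketch below reuses the "high-dice score" fuzzy set from the representation slide and stores it as a Python dictionary from elements to membership degrees; the function names are illustrative assumptions. On a finite domain the supremum is simply the maximum.

```python
high_dice = {1: 0.0, 2: 0.0, 3: 0.2, 4: 0.5, 5: 0.9, 6: 1.0}


def alpha_cut(A, alpha, strong=False):
    """Elements with A(u) >= alpha, or A(u) > alpha for the strong cut."""
    if strong:
        return {u for u, m in A.items() if m > alpha}
    return {u for u, m in A.items() if m >= alpha}


def support(A):
    return alpha_cut(A, 0.0, strong=True)  # supp(A) = A0+


def core(A):
    return alpha_cut(A, 1.0)               # core(A) = A1


def height(A):
    return max(A.values())                 # sup = max on a finite domain


def membership_from_cuts(A, u):
    """Recover A(u) as the largest level alpha whose alpha-cut contains u."""
    return max(a for a in set(A.values()) if u in alpha_cut(A, a))


print(alpha_cut(high_dice, 0.5))  # {4, 5, 6}
print(support(high_dice))         # {3, 4, 5, 6}
print(core(high_dice))            # {6}
print(height(high_dice))          # 1.0
```

Since the height is 1, this fuzzy set is normal.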

43

Fuzzy Set Representation

• Normal fuzzy set: h(A) = 1

• Sub-normal fuzzy set: h(A) < 1

45

Membership Degrees

• Subjective definition

• Voting model:

Each voter has a subset of U as his/her own crisp definition of the concept that A represents.

A(u) is the proportion of voters whose crisp definitions include u.

46

Membership Degrees

• Voting model:

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10

1

2

3 x x

4 x x x x x

5 x x x x x x x x x

6 x x x x x x x x x x
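The voting model can be sketched in a few lines of Python. The table above fixes only how many of the ten voters include each score (0, 0, 2, 5, 9, 10); which voter holds which crisp set is an assumption made here so that the row counts match.

```python
# Hypothetical crisp definitions of "high dice score" for ten voters,
# chosen only so that the per-score counts agree with the table.
voters = ([{3, 4, 5, 6}] * 2 + [{4, 5, 6}] * 3
          + [{5, 6}] * 4 + [{6}])


def vote_membership(domain, voters):
    """A(u) = proportion of voters whose crisp set includes u."""
    return {u: sum(u in s for s in voters) / len(voters) for u in domain}


A = vote_membership(range(1, 7), voters)
print(A)  # {1: 0.0, 2: 0.0, 3: 0.2, 4: 0.5, 5: 0.9, 6: 1.0}
```

The resulting degrees coincide with the "high-dice score" fuzzy set given earlier for the discrete domain.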

47

Fuzzy Subset Relations

A ⊆ B iff A(u) ≤ B(u) for every u ∈ U

A is more “specific” than B

“X is A” entails “X is B”

48

Fuzzy Set Operations

• Standard definitions:

Complement: (¬A)(u) = 1 − A(u)

Intersection: (A ∩ B)(u) = min[A(u), B(u)]

Union: (A ∪ B)(u) = max[A(u), B(u)]
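The standard operations can be sketched in Python by treating a fuzzy set as its membership function. The function `young` follows the piecewise definition given for the continuous domain; the function `old` is a purely hypothetical shape introduced for this sketch, since the slides do not define it.

```python
def young(u):
    """Membership in "young", as defined for the continuous domain."""
    if u <= 25:
        return 1.0
    if u <= 40:
        return (40 - u) / 15
    return 0.0


def old(u):
    """Hypothetical membership in "old" (not given in the slides)."""
    if u <= 50:
        return 0.0
    if u <= 65:
        return (u - 50) / 15
    return 1.0


def complement(A):
    return lambda u: 1.0 - A(u)


def intersection(A, B):
    return lambda u: min(A(u), B(u))


def union(A, B):
    return lambda u: max(A(u), B(u))


# middle-age = not young AND not old
middle_age = intersection(complement(young), complement(old))

for age in (20, 30, 45, 70):
    print(age, round(middle_age(age), 3))
```

Under these assumed shapes, an age of 45 is fully middle-aged, an age of 30 is middle-aged to degree 1/3, and ages 20 and 70 are not middle-aged at all.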

49

Fuzzy Set Operations

• Example:

not young = ¬young

not old = ¬old

middle-age = not young ∩ not old

old = young

51

Fuzzy Relations

• Crisp relation:

R(U1, ..., Un) ⊆ U1 × ... × Un

R(u1, ..., un) = 1 iff (u1, ..., un) ∈ R, and = 0 otherwise

• Fuzzy relation: a fuzzy set on U1 × ... × Un

52

Fuzzy Relations

• Fuzzy relation:

U1 = {New York, Paris}, U2 = {Beijing, New York, London}

R = “very far”

R = {(NY, Beijing): 1, ...}

          NY    Paris
Beijing   1     .9
NY        0     .7
London    .6    .3

53

Fuzzy Numbers

• A fuzzy number A is a fuzzy set on R (the real numbers):

A must be a normal fuzzy set

Aα must be a closed interval for every α ∈ (0, 1]

supp(A) = A0+ must be bounded

54

Basic Types of Fuzzy Numbers

[Figure: four membership-function plots, each with a vertical axis from 0 to 1]

55

Basic Types of Fuzzy Numbers

[Figure: two membership-function plots, each with a vertical axis from 0 to 1]

56

Operations of Fuzzy Numbers

• Interval-based operations:

(A ∘ B)α = Aα ∘ Bα

58

Operations of Fuzzy Numbers

• Arithmetic operations on intervals:

[a, b] ∘ [d, e] = {f ∘ g | a ≤ f ≤ b, d ≤ g ≤ e}

[a, b] + [d, e] = [a + d, b + e]

[a, b] − [d, e] = [a − e, b − d]

[a, b]*[d, e] = [min(ad, ae, bd, be), max(ad, ae, bd, be)]

[a, b]/[d, e] = [a, b]*[1/e, 1/d], provided 0 ∉ [d, e]
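The four interval operations translate directly into code. In the sketch below an interval is a Python tuple `(low, high)`; the function names and the sample intervals [1, 3] and [2, 4] are illustrative assumptions.

```python
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def sub(x, y):
    return (x[0] - y[1], x[1] - y[0])


def mul(x, y):
    products = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(products), max(products))


def div(x, y):
    d, e = y
    if d <= 0 <= e:
        raise ValueError("divisor interval must not contain 0")
    return mul(x, (1 / e, 1 / d))


x, y = (1, 3), (2, 4)
print(add(x, y))  # (3, 7)
print(sub(x, y))  # (-3, 1)
print(mul(x, y))  # (2, 12)
print(div(x, y))  # (0.25, 1.5)
```

Applying such an operation to the α-cuts of two fuzzy numbers, level by level, gives the α-cuts of the result.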

59

Operations of Fuzzy Numbers

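[Figure: membership functions of the fuzzy numbers "about 2" and "about 3" over a horizontal axis with ticks at 0, 2, and 3; the membership axis goes up to 1]

about 2 + about 3 = ?

about 2 about 3 = ?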

60

Operations of Fuzzy Numbers

• Discrete domains:

A = {xi: A(xi)}

B = {yi: B(yi)}

A ∘ B = ?

61

Operations of Fuzzy Numbers

• Extension principle:

f: U1 × U2 → V

induces

g: Ũ1 × Ũ2 → Ṽ (Ũ denotes the set of fuzzy sets on U)

[g(A, B)](v) = sup{min{A(u1), B(u2)} | v = f(u1, u2)}

62

Operations of Fuzzy Numbers

• Discrete domains:

A = {xi: A(xi)}

B = {yi: B(yi)}

(A ∘ B)(v) = sup{min{A(xi), B(yj)} | v = xi ∘ yj}
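For discrete fuzzy numbers the sup-min formula can be evaluated with two nested loops. The sketch below is one possible rendering; the function name `extend` and the membership degrees chosen for "about 2" and "about 3" are assumptions made for illustration, not values from the slides.

```python
import operator


def extend(op, A, B):
    """(A op B)(v) = max over all (x, y) with v = op(x, y) of min(A(x), B(y))."""
    result = {}
    for x, ax in A.items():
        for y, by in B.items():
            v = op(x, y)
            result[v] = max(result.get(v, 0.0), min(ax, by))
    return result


# Hypothetical discrete fuzzy numbers.
about_2 = {1: 0.5, 2: 1.0, 3: 0.5}
about_3 = {2: 0.5, 3: 1.0, 4: 0.5}

total = extend(operator.add, about_2, about_3)
print(total)  # {3: 0.5, 4: 0.5, 5: 1.0, 6: 0.5, 7: 0.5}
```

The sum peaks at 5 with degree 1, which matches the intuition that "about 2" plus "about 3" is "about 5". Passing `operator.mul` or `operator.sub` instead extends the other arithmetic operations in the same way.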

63

Fuzzy Logic

if x is A then y is B

x is A*

------------------------

y is B*

64

Fuzzy Logic

• View a fuzzy rule as a fuzzy relation

• Measure similarity of A and A*

65

Fuzzy Controller

• As special expert systems

• When difficult to construct mathematical models

• When acquired models are expensive to use

66

Fuzzy Controller

IF the temperature is very high

AND the pressure is slightly low

THEN the heat change should be slightly negative

67

Fuzzy Controller

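[Figure: FUZZY CONTROLLER block diagram. Components: fuzzification model, fuzzy inference engine, fuzzy rule base, defuzzification model, and the controlled process. The controlled process supplies conditions to the controller and receives actions from it.]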

68

Fuzzification

[Figure: membership function plotted around a measured value x0, with a vertical axis from 0 to 1]

69

Defuzzification

• Center of Area:

x = (Σz A(z).z)/(Σz A(z))

70

Defuzzification

• Center of Maxima:

M = {z | A(z) = h(A)}

x = (min M + max M)/2

71

Defuzzification

• Mean of Maxima:

M = {z | A(z) = h(A)}

x = (Σz∈M z)/|M|
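The three defuzzification methods can be compared on a small discrete example. In the sketch below the fuzzy set `A` is hypothetical and was chosen so that its maxima are not symmetric; the function names are assumptions as well.

```python
# Hypothetical output fuzzy set of a controller, on a discrete domain.
A = {0: 0.2, 1: 1.0, 2: 1.0, 4: 1.0, 5: 0.4}


def center_of_area(A):
    """Membership-weighted average of the domain values."""
    return sum(m * z for z, m in A.items()) / sum(A.values())


def maxima(A):
    """M = {z | A(z) = h(A)}."""
    h = max(A.values())
    return [z for z, m in A.items() if m == h]


def center_of_maxima(A):
    M = maxima(A)
    return (min(M) + max(M)) / 2


def mean_of_maxima(A):
    M = maxima(A)
    return sum(M) / len(M)


print(round(center_of_area(A), 3))  # 2.5
print(center_of_maxima(A))          # 2.5
print(round(mean_of_maxima(A), 3))  # 2.333
```

Center of maxima looks only at the two extreme maximizers, while mean of maxima averages all of them, so the two differ whenever the maximizers are unevenly spread.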

72

Exercises

• In Klir-Yuan’s textbook: 1.9, 1.10, 2.11, 2.12, 4.5