CPSC 533 - Artificial Intelligence Michael M. Richter Section 6: Impreciseness.



Forms of Impreciseness

Impreciseness occurs frequently:
• A number is only approximately correct (tolerance)
• A frequency says nothing about a specific case, or is not even exactly known (probability)
• The exact number is not of interest (abstraction)
• The expression itself has no exact meaning (informal semantics, e.g. "this is useful")
• Some forms are of objective and others of subjective character ("the tolerance is 0.2" versus "the weather is nice")


Forms of Uncertainty and Vagueness (2)

We distinguish vagueness which has an objective origin from vagueness which has a mainly subjective character.

In an objective situation there is an agreement which has a formal character and a model to which one can refer. The informal notion then has a formal original. E.g. "high fever" has as its original a certain precise temperature (which may not be known or may not be of interest).

In a subjective situation there is no exact original (like “this is stupid”), only a subjective impression is reflected.

In order to put subjective vagueness on solid foundations, the reasoning in the model is replaced by experiments with the individuals who express their subjective opinions.

From the expression itself it may not be evident whether it has an objective or a subjective meaning.


Subjective Uncertainty and a Turing Test (1)

Suppose there is a partial ordering "<" associated with the concept C. The partial ordering then again has two versions: formal and subjective. The Turing test refers to these two versions of "<":

[Diagram: the subjective human version of C is compared with the formal version of C]

The test is performed when variations of the arguments of < are presented. Goal: The human says "up" if and only if the formal system says "up". This has to be verified by experiments.


Subjective Uncertainty and a Turing Test (2)

Concept to grasp: Typical lion
Formal version uses the ordering: quotient of length/height
Human: subjective judgement

[Figure: lions ordered from less typical to more typical ("better") in both versions]

The partial ordering approximates the concept C in the sense that the semantics of y < z is: z is more typical for C than y is.

Example: Fuzzy values for "typical lion"


A Warning Example (1)

Reasoning with vague concepts may easily lead to consequences which are inconsistent.

Example: We consider two medications A and B which are applied to male and female patients in a hospital. The successes are recorded.

There are three predicates introduced with the following semantics:
• BetterM(X,Y): More successes with medication X applied to men
• BetterF(X,Y): More successes with X applied to women
• Better(X,Y): More successes with X in total

Question: Is BetterM(X,Y) ∧ BetterF(X,Y) → Better(X,Y) true? This sounds plausible but there is a counterexample.


A Warning Example (2)

Men:        +      -     Success
  A        20    180     10%
  B         4     96      4%
BetterM(A,B)

Women:      +      -     Success
  A        20     20     50%
  B        37     63     37%
BetterF(A,B)

Total:      +      -     Success
  A        40    200     16.7%
  B        41    159     20.5%
Better(B,A)
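The counterexample can be checked mechanically. The following is a minimal sketch that recomputes the success rates from the recorded (+, -) counts; the function names `rate` and `better` are illustrative assumptions, not part of the lecture.

```python
from fractions import Fraction

# (successes, failures) per medication, as recorded on the slide
men = {"A": (20, 180), "B": (4, 96)}
women = {"A": (20, 20), "B": (37, 63)}

def rate(successes, failures):
    """Success rate as an exact fraction."""
    return Fraction(successes, successes + failures)

def better(table, x, y):
    """True if medication x has a strictly higher success rate than y."""
    return rate(*table[x]) > rate(*table[y])

# Pool both groups to obtain the totals
total = {m: (men[m][0] + women[m][0], men[m][1] + women[m][1]) for m in men}

better_m = better(men, "A", "B")      # A is better for men
better_f = better(women, "A", "B")    # A is better for women
better_total = better(total, "A", "B")  # but A is not better in total
print(better_m, better_f, better_total)
```

The reversal arises because the two medications were applied to groups of very different sizes, so pooling weights the group rates differently for A and B.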


Rough Sets (1): A Basic Method

We consider a universe U; the uncertainty is given by a binary relation "≈" over U called the indiscernibility relation.
Assumption: ≈ is reflexive and symmetric.
Idea: We cannot distinguish two objects a and b with a ≈ b.
We consider some predicate P(x) over U (represented as a subset of U). The relation ≈ motivates the following definition:

Def.:
(i) The lower approximation of P is

Pu = { x ∈ U | for all y with x ≈ y : y ∈ P }

(ii) The upper approximation of P is

Po = { x ∈ U | there is y ∈ P with x ≈ y }

Elements of Pu are surely in P and elements of Po are possibly in P.
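The two approximations translate directly into set comprehensions. This is a small sketch under stated assumptions: the universe is the integers 0..9, and two numbers are taken as indiscernible when they differ by at most 1 (a reflexive, symmetric, non-transitive relation chosen only for illustration).

```python
def lower_approx(universe, p, indiscernible):
    """Elements all of whose indiscernible neighbours lie in p."""
    return {x for x in universe
            if all(y in p for y in universe if indiscernible(x, y))}

def upper_approx(universe, p, indiscernible):
    """Elements that are indiscernible from at least one element of p."""
    return {x for x in universe
            if any(indiscernible(x, y) for y in p)}

# Hypothetical example data (not from the lecture)
universe = set(range(10))
close = lambda a, b: abs(a - b) <= 1
p = {3, 4, 5, 6}

pu = lower_approx(universe, p, close)
po = upper_approx(universe, p, close)
uncertainty_area = po - pu
print(sorted(pu), sorted(po), sorted(uncertainty_area))
```

Because the relation is reflexive, the lower approximation is always contained in P and P is always contained in the upper approximation.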


Rough Sets (2)

[Figure: the set P together with its upper approximation Po and its lower approximation Pu; the points x, y and x1, y1 are marked]

Here we have x ≈ y and x1 ≈ y1. The set Po \ Pu is called the uncertainty area. Because decisions about elements in Pu and elements not in Po are certain, the rough set method can be regarded as "being on the safe side".


Rough Sets (3)

There are basically two types of indiscernibility relations:
• ≈ is transitive: Then ≈ is an equivalence relation. A typical example occurs in the attribute-value representation when two objects cannot be distinguished because the values of some attributes are missing.
• ≈ is not transitive: Then ≈ is not an equivalence relation. A typical example is when the domains of some attributes are real numbers which cannot be distinguished if their difference is smaller than some threshold (e.g. due to measurement errors).

Although the two types differ in many respects, the rough set method applies to both of them.


Fuzzy Sets (1)

Fuzzy sets are generalizations of ordinary ("crisp") sets.
Suppose U is some (ordinary) set. A fuzzy subset X of U is defined by a function

µX : U → [0,1]

Notation: X ⊆f U

For y in U, µX(y) is called the degree of membership of y in X, and µX is the membership function of X.

Example: X = Young_Customer, µX(Bill) = 0.5 if Bill is of age 32

This is easily generalized to n-ary relations.
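The Young_Customer example can be made concrete with a short sketch. The slide only fixes µX(Bill) = 0.5 at age 32; the piecewise linear shape and the breakpoints 25 and 39 below are assumptions chosen so that this value is reproduced.

```python
def mu_young_customer(age, full=25.0, zero=39.0):
    """Hypothetical membership function for Young_Customer.

    Membership is 1 up to `full`, 0 from `zero` on, and falls linearly
    in between. Both breakpoints are illustrative assumptions.
    """
    if age <= full:
        return 1.0
    if age >= zero:
        return 0.0
    return (zero - age) / (zero - full)

print(mu_young_customer(32))
```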


Fuzzy Sets (2)

In fuzzy logic and set theory many classical notions are generalized.
A fuzzy partition of U into n fuzzy subsets is given by membership functions µ1(x),...,µn(x) such that Σ(µi(x) | 1 ≤ i ≤ n) = 1.
In particular, disjoint now means disjoint to some degree.
A fuzzy classifier for such a partition is a mapping

cf : U → [0,1]^n

such that for cf(x) = (y1(x),...,yn(x)) we have Σ(yi(x) | 1 ≤ i ≤ n) = 1.


Fuzzy Sets (3)

Fuzzy equality E is a special fuzzy equivalence relation satisfying
(1) E(x,x) = 1,
(2) E(x,y) = E(y,x),
(3) E(x,y) + E(y,z) - 1 ≤ E(x,z)

A weakening of fuzzy equivalence is a similarity measure sim:

SIM ⊆f V := U x U, µSIM(x1, x2) := sim(x1, x2)

Similarity measures can be generalized to measures between two different sets A and B:

SIM ⊆f V := A x B, µSIM(a, b) := sim(a, b)

We call the pairs (a,b) partners and sim(a,b) the degree of partnership.


Example

[Figure: membership functions for "low fever" and "high fever" over the temperature axis from 35 to 42, with membership values between 0 and 1]

This is a fuzzy partition: There is an area where high and low fever overlap.

In the intersection area: What is the decision, high or low?
Rationality principle: Choose the one with the highest degree of membership.
There is one point where both decisions can be made.
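The rationality principle can be sketched as a comparison of two membership values. The exact curves are only given graphically in the lecture, so the overlap interval from 38 to 39 degrees and the linear transition below are assumptions; the two functions sum to 1 everywhere, so they form a fuzzy partition.

```python
def mu_high(t, lo=38.0, hi=39.0):
    """Hypothetical membership of temperature t in "high fever"."""
    if t <= lo:
        return 0.0
    if t >= hi:
        return 1.0
    return (t - lo) / (hi - lo)

def mu_low(t):
    """Complementary membership, so that mu_low + mu_high = 1."""
    return 1.0 - mu_high(t)

def decide(t):
    """Rationality principle: pick the predicate with the highest membership."""
    h, l = mu_high(t), mu_low(t)
    if h == l:
        return "high or low"  # the single point where both decisions are possible
    return "high" if h > l else "low"

print(decide(37.0), decide(38.5), decide(39.5))
```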


Regular Membership Functions

It is not useful to consider arbitrary membership functions.
Def.: For T ⊆ ℝ a function µ : T → ℝ is called regular if
(i) µ is piecewise continuous
(ii) If x, y ∈ T and λ ∈ [0,1] then µ(λx + (1-λ)y) ≥ min(µ(x), µ(y))
(iii) There is some x in T with µ(x) = 1

(i) is often specialized to "piecewise linear", which makes it computationally more feasible. Certain exceptions should be allowed in order to include the jump from 0 to 1 in classical logic.
(ii) excludes several separate maxima and (iii) postulates the existence of an ideal argument.


Norms and Co-Norms (1)

t-norms f(x, y) (intended to compute µA∩B):
Axioms:
(T1) f(x, y) = f(y, x)
(T2) f(x, f(y, z)) = f(f(x, y), z)
(T3) x ≤ x', y ≤ y' ⇒ f(x, y) ≤ f(x', y')
(T4) f(x, 1) = x

Typical t-norms are f(x,y) = min(x,y) or f(x,y) = xy.

co-t-norms f(x, y) (intended to compute µA∪B):
Axioms: (T1), (T2), (T3) and
(T4*) f(x, 0) = x

Typical co-t-norms are f(x,y) = max(x,y) or f(x,y) = x + y - xy.
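The axioms can be spot-checked numerically for the four typical functions. The sketch below tests them on a small grid only (a finite check, not a proof); the helper name `satisfies_axioms` and the grid are assumptions made for this illustration.

```python
from itertools import product

# Grid of quarters: all values and results below are exact in floating point.
GRID = [i / 4 for i in range(5)]

def satisfies_axioms(f, neutral):
    """Check (T1)-(T3) and the neutral-element axiom on GRID.

    neutral = 1 gives (T4) for t-norms, neutral = 0 gives (T4*) for co-t-norms.
    """
    t1 = all(f(x, y) == f(y, x) for x, y in product(GRID, repeat=2))
    t2 = all(f(x, f(y, z)) == f(f(x, y), z)
             for x, y, z in product(GRID, repeat=3))
    t3 = all(f(x, y) <= f(x2, y2)
             for x, y, x2, y2 in product(GRID, repeat=4)
             if x <= x2 and y <= y2)
    t4 = all(f(x, neutral) == x for x in GRID)
    return t1 and t2 and t3 and t4

T_NORMS = {"min": min, "product": lambda x, y: x * y}
CO_T_NORMS = {"max": max, "x + y - xy": lambda x, y: x + y - x * y}

results = {name: satisfies_axioms(f, 1.0) for name, f in T_NORMS.items()}
results.update({name: satisfies_axioms(f, 0.0) for name, f in CO_T_NORMS.items()})
print(results)
```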


Norms and Co-Norms (2)

Consequences:

f(x, 0) = 0 for t-norms

f(x, 1) = 1 for co-t-norms.

Norms and co-norms cannot reflect relative importance (e.g. using weights)

There are other fuzzy combination rules available which are fuzzy versions of general Boolean operators like different types of implications.


Negation and Complement

A negation is a function neg: [0,1] → [0,1] with the axioms
(N1) neg(0) = 1, neg(1) = 0
(N2) x ≤ y ⇒ neg(y) ≤ neg(x)

The negation function is intended to compute the membership function of the complement C(A) of A.

A typical example is neg(x) = 1 - x. For this function the de Morgan laws hold:

µC(A∩B)(x) = µC(A)∪C(B)(x), µC(A∪B)(x) = µC(A)∩C(B)(x)

if the t-norm min and the co-t-norm max are used.
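For neg(x) = 1 - x with min and max, the de Morgan laws reduce to two identities on membership values. A minimal sketch that checks them on a grid of sample values (the grid is an assumption; the identities themselves hold for all values in [0,1]):

```python
def neg(x):
    """Standard negation on membership degrees."""
    return 1.0 - x

GRID = [i / 10 for i in range(11)]

# Complement of the intersection equals the union of the complements
law_1 = all(neg(min(a, b)) == max(neg(a), neg(b)) for a in GRID for b in GRID)

# Complement of the union equals the intersection of the complements
law_2 = all(neg(max(a, b)) == min(neg(a), neg(b)) for a in GRID for b in GRID)

print(law_1, law_2)
```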


The Mamdani Implication

We consider implications between literals, i.e. of the form φ1 ∧ φ2 ∧ ... ∧ φn → ψ.
Because the literals often have names from natural language, such implications are called linguistic rules.

Predicate logic interpretation: Predicates, interpreted in a model.

Fuzzy logic interpretation: Membership functions over the universe. The membership function of the conjunction is computed using a t-norm. The implication is considered as a conjunction

φ1 ∧ φ2 ∧ ... ∧ φn ∧ ψ


Example

Linguistic rule: If temperature low and road narrow then speed slow.

[Figure: fuzzy representation of the rule, showing the actual temperature, the actual narrowness and the inferred membership function for "slow"]

The result of the implication is a membership function, not a single value. To obtain such a value an operation called defuzzification is needed.


Defuzzification

There are three major defuzzification methods which assign to a membership function µ with domain Y an element y ∈ Y. We put Max = {y ∈ Y | ∀y' ∈ Y: µ(y') ≤ µ(y)}

1) Maximum method: Choose an arbitrary y ∈ Max

2) Mean-of-Maximum method: Choose y as the mean of Max:

$$y = |Max|^{-1} \sum_{y' \in Max} y'$$

3) Center-of-Area method: Define y as the element below the center of gravity of the area bounded by the curve µ and the y-axis:

$$y = \left( \int_Y \mu(y')\,dy' \right)^{-1} \int_Y y' \cdot \mu(y')\,dy'$$
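The three methods can be sketched on a discretized domain, where the integrals of the Center-of-Area method become sums. The clipped triangle `mu_slow` and the speed range 0..60 are hypothetical stand-ins for an inferred membership function; they are not taken from the lecture.

```python
def defuzzify(ys, mu, method):
    """Defuzzify mu over the finite sample points ys."""
    values = [mu(y) for y in ys]
    top = max(values)
    maxima = [y for y, v in zip(ys, values) if v == top]  # the set Max
    if method == "maximum":
        return maxima[0]                      # any element of Max is allowed
    if method == "mean_of_maximum":
        return sum(maxima) / len(maxima)
    if method == "center_of_area":
        # discrete version of (integral of mu)^-1 * integral of y * mu(y)
        return sum(y * v for y, v in zip(ys, values)) / sum(values)
    raise ValueError(f"unknown method: {method}")

def mu_slow(y):
    """Hypothetical inferred function: triangle around 30, clipped at 0.5."""
    triangle = max(0.0, 1.0 - abs(y - 30.0) / 20.0)
    return min(triangle, 0.5)

YS = list(range(0, 61))
for method in ("maximum", "mean_of_maximum", "center_of_area"):
    print(method, defuzzify(YS, mu_slow, method))
```

For a symmetric function all three methods agree up to the choice within Max; they differ once the inferred function is skewed.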


Linguistic Expressions

Linguistic expressions are taken from natural language:
• Something is large, expensive, ..., nice, pale, ...
• IF the water is very dirty THEN add much soap

Ways of formalization:
• Classical logic: Logical predicates and rules with binary interpretation. A specific object satisfies a predicate or it does not.
• Fuzzy logic: Fuzzy set with membership function. An object satisfies a predicate to some degree (fuzzification).

Rules where the conclusion is an action or a decision:
• Classical logic: The action can be performed if the preconditions are satisfied.
• Fuzzy logic: The conclusion is a membership function, which cannot be performed directly. The defuzzification transforms this into some action, depending on the degrees of truth of the conditions.

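Abstract and Linguistic Predicates

[Diagram: the concrete level (real data) is connected to the abstract level in two ways: by abstraction to abstract classical predicates, and by abstraction to linguistic predicates with fuzzy interpretations. The way back from the abstract level to the concrete level is labeled instantiation / defuzzification.]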


Example (1)

Linguistic variables:
Variable: X_Temp; Values: {low, med, high}
Variable: X_Pressure; Values: {no, low, med, high}

[Figure: membership functions (MF) over Temp for low, med and high, with the marked values 0.7 and 0.49; membership functions (MF) over Pressure for no, low, med and high, with the marked values 0.3, 0.21 and 0.09]


Example (2)

New fuzzy variable for representing the decision class:
Variable: X_Class; Values: {K1, K2}

Rules:
R1: IF X_Temp is high OR X_Pressure is no THEN X_Class is K1
R2: IF X_Temp is low OR X_Pressure is high THEN X_Class is K1
R3: IF X_Temp is high OR X_Pressure is high THEN X_Class is K2

Actual query: X*_Temp is med, X*_Pressure is low

[Figure: membership functions (MF) over Class: singletons K1 and K2 with value 1]


Example (3)

- Application of rule R1:
max_x min { µTemp,high(x), µTemp,med(x) } = 0.49
max_x min { µPressure,no(x), µPressure,low(x) } = 0.21

- Fuzzy OR for the precondition of rule R1: min{1, 0.49 + 0.21} = 0.8

- The resulting membership function for the conclusion is a K1 singleton with value 0.8

Application of R2: the membership function for the conclusion is a K1 singleton with value 0.58

Application of R3: the membership function for the conclusion is a K2 singleton with value 0.58


Example (4)

Aggregation of all rules leads to the following fuzzy set:

[Figure: membership function (MF) over Class with the value 0.8 for K1 and the value 0.58 for K2]

The application of the "Mean-of-Max" defuzzification operator (selecting the value with the maximum membership value) leads to the crisp value K1.


Fuzzy Sets and Rough Sets

We assume a fuzzy set, represented by a membership function µ.
A crisp interval property P is defined by a real number α, 0 ≤ α ≤ 1, such that P(x) ⇔ µ(x) ≥ α (or µ(x) ≤ α), or it is a complement of such a set.

If the number α is not exactly known, this gives rise to a rough set by introducing two numbers β, γ, 0 ≤ β ≤ γ ≤ 1, which function as thresholds:

Pu(x) ⇔ µ(x) ≥ γ
Po(x) ⇔ µ(x) ≥ β

The smaller the difference between the thresholds is, the smaller the uncertainty area is.

[Figure: the range of µ divided into three regions: "not P(x) for sure", "uncertainty", and "P(x) for sure"]
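The two thresholds split the membership range into three regions. The sketch below assumes the reading in which the lower approximation uses the larger threshold γ and the upper approximation the smaller threshold β; the names `beta`, `gamma` and the returned labels are illustrative.

```python
def classify(mu_x, beta, gamma):
    """Three-way decision for a membership value mu_x.

    Assumes 0 <= beta <= gamma <= 1, with
    lower approximation: mu_x >= gamma, upper approximation: mu_x >= beta.
    """
    if not 0.0 <= beta <= gamma <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta <= gamma <= 1")
    if mu_x >= gamma:
        return "P for sure"
    if mu_x >= beta:
        return "uncertain"
    return "not P for sure"

for value in (0.2, 0.5, 0.9):
    print(value, classify(value, beta=0.4, gamma=0.6))
```

With beta equal to gamma the uncertainty area vanishes and the crisp property with a single threshold is recovered.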


Similarities

We consider a fixed set U (the universe). Similarity has a relational and a functional aspect.
• Relational aspect: R(x,y,u,v), intended as "y is more similar to x than v is to u".
  Special case: R(x,y,x,u), "y is more similar to x than u is to x".
• Functional aspect: A similarity measure on a set U is a mapping sim: U x U → [0,1] (real interval).

A dual notion is a distance measure.
A similarity measure quantifies the degree of similarity.
Fuzzy equalities are special cases of similarity measures.


From Fuzzy Sets to Similarities

In order to define a similarity measure one need not start from pairs of objects. If we simply have a fuzzy set K ⊆f U, then we need in addition a reference object x in order to define a measure. Such a reference object has to satisfy

µK(x) = 1.

In this case we can define a similarity measure by

sim(x, y) = µK(y)

and get a measure satisfying sim(x,x) = 1.

If there is a subset P ⊆ U such that for each x ∈ P we have some fuzzy subset Kx ⊆f U with membership function µx(y), for which µx(x) = 1 holds, then we can again define a measure on U x P by

sim(y, x) = µx(y), y ∈ U, x ∈ P.


From Similarities to Fuzzy Sets

One can also start with a similarity measure and associate some fuzzy subsets with it:

SIM ⊆f V := U x U, µSIM(x1, x2) := sim(x1, x2)

We associate with each x ∈ U a fuzzy subset Fx ⊆f U by

µx(y) = µFx(y) = sim(x,y).

Reflexivity of sim means sim(x,x) = 1.
If sim is symmetric then µx(y) = µy(x) holds.
So we obtain from sim, for each x, some fuzzy subset which describes how U is structured by sim from the viewpoint of x: x regards itself as the central element, and the degree of membership of the other elements depends on their similarity to x.
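The construction of Fx from sim is a partial application of the similarity measure. A minimal sketch, assuming U is the real line and using a hypothetical linear similarity with a scale parameter (neither is prescribed by the lecture):

```python
def sim(x, y, scale=10.0):
    """Hypothetical similarity on real numbers: 1 at distance 0, 0 from `scale` on."""
    return max(0.0, 1.0 - abs(x - y) / scale)

def fuzzy_set_around(x):
    """Membership function of Fx: the view of U from the element x."""
    return lambda y: sim(x, y)

mu_5 = fuzzy_set_around(5.0)
mu_7 = fuzzy_set_around(7.0)

print(mu_5(5.0))   # x is the central element of its own fuzzy set
print(mu_5(7.0), mu_7(5.0))  # symmetry of sim carries over
```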


Generalization: Partnerships

We consider two arbitrary sets A and B. A partner relation is a fuzzy subset PART ⊆f V := A x B.

Functional view: part: A x B → [0,1] (real interval)

Best partners are those with the highest degree of partnership.

Examples:
• A = set of women, B = set of men, partnership: marriage
• A = customer demands, B = products, partnership: customer satisfaction
• A = diseases, B = therapies, partnership: best for health


Objects, Fuzzy Sets and Similarities

We assume a universe U and a fuzzy subset X of U with a membership function µX.
Up to now we have only considered similarities and distances between objects of U, i.e. sim(object, object). There are three more possibilities:

1) sim(membership function, membership function)
2) sim(object, membership function)
3) The third is 2) with permuted arguments (this plays a role if the first argument is considered as a query and the second as an answer).

Because membership functions interpret linguistic expressions, this can also be interpreted as a similarity between linguistic predicates and between objects and linguistic predicates.


Distances between Membership Functions (1)

Distances now compare fuzzy membership functions!

a) Crisp method: Select ai for which µi(ai) is maximal, i = 1, 2; d(µ1, µ2) = |a1 - a2|
b) Integral method: Fi = area between µi and the x-axis; d(µ1, µ2) = Size(F1 Δ F2) (symmetric difference)

[Figure: two membership functions µ1 and µ2]

A corresponding similarity is e.g. sim(µ1, µ2) = 1 - d(µ1, µ2)


Distances between Membership Functions (2)

The disadvantage of the integral method is that two fuzzy functions with disjoint areas always have the same distance; the crisp method avoids this.

The disadvantage of the crisp method is that the shape of the curves does not play a role. The integral method avoids this.

A combined method is as follows:
• If the areas are not disjoint, apply the integral method.
• If the areas are disjoint, add to the distance obtained by the integral method the distance between the two points where both curves reach zero.

A generalization is obtained if the Euclidean distance |a1 - a2| is replaced by an arbitrary distance measure.
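The disadvantage of the integral method for disjoint areas can be seen in a small numerical sketch. The triangular membership functions and the sampling grid are assumptions; the size of the symmetric difference of the two areas is computed as the integral of |µ1 - µ2|, approximated by a sum over the grid.

```python
def triangle(peak, half_width):
    """Hypothetical triangular membership function (an assumption of this sketch)."""
    return lambda x: max(0.0, 1.0 - abs(x - peak) / half_width)

def crisp_distance(mu1, mu2, xs):
    """Distance between the arguments where each function is maximal."""
    a1 = max(xs, key=mu1)
    a2 = max(xs, key=mu2)
    return abs(a1 - a2)

def integral_distance(mu1, mu2, xs, step):
    """Size of the symmetric difference of the areas under mu1 and mu2."""
    return step * sum(abs(mu1(x) - mu2(x)) for x in xs)

STEP = 0.25
XS = [i * STEP for i in range(81)]  # sample points 0, 0.25, ..., 20

mu_a = triangle(4.0, 2.0)
mu_b = triangle(10.0, 2.0)   # disjoint from mu_a
mu_c = triangle(16.0, 2.0)   # disjoint from mu_a and farther away

print(integral_distance(mu_a, mu_b, XS, STEP), integral_distance(mu_a, mu_c, XS, STEP))
print(crisp_distance(mu_a, mu_b, XS), crisp_distance(mu_a, mu_c, XS))
```

The integral distances to mu_b and mu_c coincide although mu_c lies farther away, while the crisp distances differ; this is the gap that the combined method closes.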


Similarities between Objects and Fuzzy Functions

Distinction: For y ∈ U, sim(y, µX) should not be the same as µX(y)! E.g. for the fuzzy set high fever, 37°C and 36°C both have membership value 0, but we expect sim(36, high fever) < sim(37, high fever).

In order to combine fuzzy sets and similarity measures we need some simplifying assumptions:
• The universe U is an interval of real numbers (the approach generalizes to the situation where U is partially ordered).
• We consider n linguistic predicates A1, ..., An where each predicate Ai has a regular membership function which attains the maximum value 1 at exactly one argument, denoted by m(Ai).
• There is a similarity measure sim given on U.


Application: Case-Based Reasoning

• Case-based reasoning (CBR) is a method which has its origin in analogical reasoning.
• The main intention is to reuse previous experiences for actual problems.
• The difficulty arises when the actual situation is not identical to the previous one: There is an inexactness involved.
• Its main aspect is that CBR techniques allow inexact (approximate) reasoning in a controlled manner.
• CBR has many applications in Knowledge Management.


Case-Based Reasoning (CBR)

[Diagram: a new problem is compared by similarity with the problems of the cases (problem, solution) in the case base; the solution of the retrieved case-i is adapted into a new solution, and cases are stored in the case base]

• Cases = Experience = Problem/Solution pair.
• Cases are stored in a Case Base for further reuse.
• To solve a new problem:
  - retrieve a case with a similar problem from the case base
  - adapt the solution from the most similar case to the new problem
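The retrieve, adapt and store steps can be sketched in a few lines. Everything concrete here is a hypothetical illustration: problems are single numbers, the similarity is a simple linear measure, the adaptation step just copies the retrieved solution, and the case contents are placeholders.

```python
def sim(p, q, scale=10.0):
    """Hypothetical similarity between two numeric problem descriptions."""
    return max(0.0, 1.0 - abs(p - q) / scale)

# Case base of (problem, solution) pairs; contents are placeholders
case_base = [(36.5, "solution-1"), (38.0, "solution-2"), (40.0, "solution-3")]

def retrieve(problem):
    """Return the stored case whose problem is most similar to the new one."""
    return max(case_base, key=lambda case: sim(problem, case[0]))

def solve(problem):
    """Retrieve the most similar case, adapt its solution, store the new case."""
    _, old_solution = retrieve(problem)
    new_solution = old_solution          # trivial adaptation: reuse unchanged
    case_base.append((problem, new_solution))
    return new_solution

answer = solve(39.4)
print(answer, len(case_base))
```

In a realistic system the adaptation step would modify the retrieved solution according to the differences between the two problems.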


Probabilities and Diagnosis

In a diagnostic situation there are
• Certain observations which are regarded as an event E which has happened
• A number of hypothetical diagnoses H1, ..., Hn

Each hypothesis has a known a priori probability P(Hi), and for each Hi the conditional probability P(E | Hi) is assumed to be known.

If the a posteriori probabilities P(Hi | E) (the conditional probabilities after the event E has happened) are known, the maximum likelihood principle selects a hypothesis: Take the Hi where P(Hi | E) is maximal.

It remains to compute the a posteriori probabilities from the known ones.


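Bayes Formula

Bayes' formula allows us to compute the desired probabilities:

$$P(H_i \mid E) = \frac{P(H_i)\,P(E \mid H_i)}{\sum_{j=1}^{n} P(H_j)\,P(E \mid H_j)}$$

For the proof we consider:

$$P(H_i \mid E) = \frac{P(H_i \cap E)}{P(E)}, \qquad P(E \mid H_i) = \frac{P(E \cap H_i)}{P(H_i)},$$

and

$$P(E) = P\Big(\bigcup_i (E \cap H_i)\Big) = \sum_i P(E \mid H_i)\,P(H_i),$$

from which the formula follows directly.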
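A small numerical sketch of the formula, followed by the selection of the hypothesis with the maximal a posteriori probability. The priors and conditional probabilities are made-up illustration values for three hypotheses, not data from the lecture.

```python
def posteriors(priors, likelihoods):
    """Compute P(Hi | E) from P(Hi) and P(E | Hi) for all i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(Hi) * P(E | Hi)
    evidence = sum(joint)                                   # P(E)
    return [j / evidence for j in joint]

# Hypothetical values for H1, H2, H3
priors = [0.7, 0.2, 0.1]        # P(Hi), summing to 1
likelihoods = [0.1, 0.5, 0.9]   # P(E | Hi)

post = posteriors(priors, likelihoods)
best = max(range(len(post)), key=lambda i: post[i])
print(post, "selected hypothesis: H%d" % (best + 1))
```

In this example the hypothesis with the largest prior is not selected, because the observed event is much more probable under the other hypotheses.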


Bayesian Nets

A Bayesian net is a semantic net with labels:
• Nodes: Random variables
• Directed edges: Conditional probabilities

An edge from A to B exists if node A is a causal reason for node B, i. e. the conditional probability is non-trivial.

If the probability of some node is known then the probability of all linked nodes can be calculated.

If certain nodes are initialized (i.e. the event happened) this gives rise to probabilities of their son nodes: In this way the probabilities are propagated over the net.


Subjective Probabilities

• They express the opinion of people and are therefore of informal nature.
• They are qualitatively represented with the basic notion A ≺ B ("B is more likely than A"). The derived notion is A ≈ B ⇔ neither A ≺ B nor B ≺ A (see chapter on partial orderings).
• The partial ordering allows us to establish Turing tests.
• There are various attempts to formulate axioms for the partial ordering, with different aims:
  - To derive a probability measure which induces the partial ordering
  - To grasp people's behavior when dealing with probabilities
• In probability theory expectation is a derived concept; for subjective probabilities this may be questioned.


Intervals for Probabilities (1)

Often it is not reasonable to assume that we know the probabilities exactly:
• What is the probability that your left knee will be OK by next week? ("at least 50%")
• What is the probability that the sun will be shining next Christmas? ("no idea, any probability")
• What is the probability that you are available tomorrow evening? ("at least 98%")

Consequence: We need to introduce an uncertainty on the level of probabilities too!


Intervals for Probabilities (2)

Suppose Ω = {E1, ..., En} is a set of events with an unknown probability distribution P(E1), ..., P(En), Σi P(Ei) = 1.

Def.: A set of real intervals [Li, Ui], 0 ≤ Li ≤ Ui ≤ 1, 1 ≤ i ≤ n, is called an n-dimensional probability interval for Ω. Its purpose is to give estimates for the unknown probabilities.

Def.: An n-dimensional probability interval is called reasonable if there are numbers pi, 0 ≤ pi ≤ 1, Σi pi = 1, such that Li ≤ pi ≤ Ui for all i.

Reasonable means that it estimates at least one probability distribution.


Intervals for Probabilities (3)

Proposition: An n-dimensional probability interval is reasonable if and only if Σi Li ≤ 1 ≤ Σi Ui.

Proof: If the condition holds then there is some number p, 0 ≤ p ≤ 1, with p Σi Li + (1-p) Σi Ui = 1; the desired probabilities are P(Ei) = pLi + (1-p)Ui, each of which lies in [Li, Ui]. On the other hand the condition is necessary because otherwise the probabilities could not be normalized to 1.

A probability interval for an unknown distribution is a knowledge unit. If several such units are given then they can be combined, resulting in a sharper interval estimate.
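The proposition and the construction used in its proof can be sketched as follows. The function names and the example intervals are assumptions; the weight p is obtained by solving p Σi Li + (1-p) Σi Ui = 1 for p.

```python
def is_reasonable(intervals):
    """Check the condition sum of Li <= 1 <= sum of Ui."""
    lower = sum(l for l, _ in intervals)
    upper = sum(u for _, u in intervals)
    return lower <= 1.0 <= upper

def witness_distribution(intervals):
    """Return probabilities p*Li + (1-p)*Ui that sum to 1, or None."""
    if not is_reasonable(intervals):
        return None
    lower = sum(l for l, _ in intervals)
    upper = sum(u for _, u in intervals)
    # If lower == upper, both sums equal 1 and any p works.
    p = 1.0 if upper == lower else (upper - 1.0) / (upper - lower)
    return [p * l + (1.0 - p) * u for l, u in intervals]

# Hypothetical interval estimates [Li, Ui] for three events
intervals = [(0.5, 1.0), (0.0, 1.0), (0.1, 0.3)]
dist = witness_distribution(intervals)
print(is_reasonable(intervals), dist)
print(is_reasonable([(0.6, 0.7), (0.5, 0.6)]))  # lower bounds already exceed 1
```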
