Post on 19-Dec-2015
CPSC 533 - Artificial Intelligence Michael M. Richter
Section 6: Impreciseness
Forms of Impreciseness
Impreciseness occurs frequently:
• A number is only approximately correct (tolerance).
• A frequency says nothing about a specific case, or is not even exactly known (probability).
• The exact number is not of interest (abstraction).
• The expression itself has no exact meaning (informal semantics, e.g. "this is useful").
• Some forms are of objective and others of subjective character ("the tolerance is 0.2" versus "the weather is nice").
Forms of Uncertainty and Vagueness (2)
We distinguish vagueness which has an objective origin from vagueness which has a mainly subjective character.
In an objective situation there is an agreement which has a formal character and a model to which one can refer. The informal notion then has a formal original. E.g. "high fever" has as its original a certain precise temperature (which may not be known or may not be of interest).
In a subjective situation there is no exact original (like "this is stupid"); only a subjective impression is reflected.
In order to put subjective vagueness on solid foundations, reasoning in the model is replaced by experiments with the individuals who express their subjective opinions.
From the expression itself it may not be evident whether it has an objective or a subjective meaning.
Subjective Uncertainty and a Turing Test (1)
Suppose there is a partial ordering "<" associated with the concept C. The partial ordering then again has two versions: formal and subjective. The Turing test refers to these two versions of "<":
[Figure: the subjective human version of C is set against the formal version of C.]
The test is carried out by presenting variations of the arguments of "<". Goal: The human says "up" if and only if the formal system says "up". This is to be verified by experiments.
Subjective Uncertainty and a Turing Test (2)
Concept to grasp: Typical lion
Formal version uses the ordering: quotient of length/height
Human: subjective judgement
[Figure: examples ordered by "better" in both versions.]
The partial ordering approximates the concept C in the sense that the semantics of y < z is: z is more typical for C than y is.
Example: Fuzzy values for "typical lion"
A Warning Example (1)
Reasoning with vague concepts may easily lead to inconsistent consequences.
Example: We consider two medications A and B which are applied to male and female patients in a hospital. The successes are recorded.
Three predicates are introduced with the following semantics:
BetterM(X,Y): X has a higher success rate than Y when applied to men
BetterF(X,Y): X has a higher success rate than Y when applied to women
Better(X,Y): X has a higher success rate than Y in total
Question: Is BetterM(X,Y) ∧ BetterF(X,Y) → Better(X,Y) true? This sounds plausible, but there is a counterexample.
A Warning Example (2)
In the following data BetterM(A,B) and BetterF(A,B) hold, but also Better(B,A):
Group    Medication   Successes (+)   Failures (-)   Success rate
Men      A            20              180            10%
Men      B            4               96             4%
Women    A            20              20             50%
Women    B            37              63             37%
Total    A            40              200            16.7%
Total    B            41              159            20.5%
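The counterexample can be checked mechanically. The following is a minimal sketch that recomputes the success rates from the recorded counts; the assignment of the two sub-tables to "men" and "women" is an assumption taken from the order in which they appear, and it does not affect the outcome.

```python
from fractions import Fraction

# (successes, failures) per group and medication, copied from the slide.
# Which sub-table belongs to men and which to women is an assumption.
counts = {
    "men":   {"A": (20, 180), "B": (4, 96)},
    "women": {"A": (20, 20),  "B": (37, 63)},
}

def rate(successes, failures):
    """Success rate as an exact fraction."""
    return Fraction(successes, successes + failures)

def better(x, y, table):
    """True if medication x has a higher success rate than y in this table."""
    return rate(*table[x]) > rate(*table[y])

# Totals are obtained by adding the counts of both groups.
total = {
    med: tuple(sum(counts[group][med][k] for group in counts) for k in (0, 1))
    for med in ("A", "B")
}

better_m = better("A", "B", counts["men"])
better_f = better("A", "B", counts["women"])
better_total = better("A", "B", total)

print(better_m, better_f, better_total)
```

A is better in each group, yet B is better in total, because the two medications were applied to the groups in very different proportions.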
Rough Sets (1): A Basic Method
We consider a universe U; the uncertainty is given by a binary relation "∼" over U, called the indiscernibility relation.
Assumption: ∼ is reflexive and symmetric.
Idea: We cannot distinguish two objects a and b with a ∼ b.
We consider some predicate P(x) over U (predicates are represented as subsets of U). The relation motivates the following definition:
Def.:
(i) The lower approximation of P is
Pu = { x ∈ U | for all y with x ∼ y : y ∈ P }
(ii) The upper approximation of P is
Po = { x ∈ U | there is y ∈ P with x ∼ y }
Elements of Pu are surely in P, and elements of Po are possibly in P.
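The two approximations translate directly into set comprehensions. This is a minimal sketch over a finite universe; the sample values, the predicate and the threshold 0.5 in the indiscernibility relation are illustrative assumptions, not taken from the lecture.

```python
def lower_approximation(universe, P, indiscernible):
    """Elements all of whose indiscernible neighbours lie in P."""
    return {x for x in universe
            if all(y in P for y in universe if indiscernible(x, y))}

def upper_approximation(universe, P, indiscernible):
    """Elements that are indiscernible from at least one element of P."""
    return {x for x in universe
            if any(indiscernible(x, y) for y in P)}

# Hypothetical data: measurements that cannot be told apart
# when they differ by less than 0.5 (reflexive and symmetric).
universe = [1.0, 1.3, 1.6, 2.4, 3.0, 3.2]
P = {x for x in universe if x >= 1.5}

def close(a, b):
    return abs(a - b) < 0.5

P_u = lower_approximation(universe, P, close)
P_o = upper_approximation(universe, P, close)
uncertainty_area = P_o - P_u

print(sorted(P_u), sorted(P_o), sorted(uncertainty_area))
```

Because the relation is reflexive, the lower approximation is always contained in P and P is always contained in the upper approximation.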
Rough Sets (2)
[Figure: the nested regions Pu, P and Po with marked points x, y, x1 and y1.]
Here we have x ∼ y and x1 ∼ y1. The set Po \ Pu is called the uncertainty area. Because decisions about elements in Pu and about elements not in Po are certain, the rough set method can be regarded as being "on the safe side".
Rough Sets (3)
There are basically two types of indiscernibility relations:
∼ is transitive: Then ∼ is an equivalence relation. A typical example occurs in the attribute-value representation when two objects cannot be distinguished because the values of some attributes are missing.
∼ is not transitive: Then ∼ is not an equivalence relation. A typical example is when the domains of some attributes are real numbers which cannot be distinguished if their difference is smaller than some threshold (e.g. due to measurement errors).
Although the two types differ in many ways, the rough set method applies to both of them.
Fuzzy Sets (1)
Fuzzy sets are generalizations of ordinary ("crisp") sets.
Suppose U is some (ordinary) set. A fuzzy subset X of U is defined by a function
µX : U → [0,1]
Notation: X ⊆f U
For y in U, µX(y) is called the degree of membership of y in X, and µX is the membership function of X.
Example: X = Young_Customer, µX(Bill) = 0.5 if Bill is of age 32.
This is easily generalized to n-ary relations.
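A membership function is just an ordinary function into [0,1]. The following sketch gives one possible piecewise linear membership function for Young_Customer; the breakpoints 25 and 39 are assumptions chosen only so that age 32 receives the degree 0.5 from the example.

```python
def young_customer(age, full_until=25.0, zero_from=39.0):
    """Degree of membership in Young_Customer.

    The breakpoints are illustrative assumptions: everybody up to
    `full_until` is fully young, nobody from `zero_from` on is young,
    and the degree falls linearly in between.
    """
    if age <= full_until:
        return 1.0
    if age >= zero_from:
        return 0.0
    return (zero_from - age) / (zero_from - full_until)

print(young_customer(32))
```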
Fuzzy Sets (2)
In fuzzy logic and fuzzy set theory many classical notions are generalized.
A fuzzy partition of U into n fuzzy subsets is given by membership functions µ1(x),...,µn(x) such that Σ(µi(x) | 1 ≤ i ≤ n) = 1 for all x in U.
In particular, disjoint now means disjoint to some degree.
A fuzzy classifier for such a partition is a mapping
cf : U → [0,1]^n
such that for cf(x) = (y1(x),...,yn(x)) we have Σ(yi(x) | 1 ≤ i ≤ n) = 1.
Fuzzy Sets (3)
Fuzzy equality E is a special fuzzy equivalence relation satisfying
(1) E(x,x) = 1,
(2) E(x,y) = E(y,x),
(3) E(x,y) + E(y,z) − 1 ≤ E(x,z).
A weakening of fuzzy equivalence is a similarity measure sim:
SIM ⊆f V := U × U, µSIM(x1, x2) := sim(x1, x2)
Similarity measures can be generalized to pairs taken from two different sets A and B:
SIM ⊆f V := A × B, µSIM(a, b) := sim(a, b)
We call the pairs (a,b) partners and sim(a,b) the degree of partnership.
Example
[Figure: membership functions "low fever" and "high fever" over the temperature axis from 35 to 42, with membership values between 0 and 1.]
This is a fuzzy partition: There is an area where high and low fever overlap.
In the intersection area: What is the decision, high or low?
Rationality principle: Choose the one with the highest degree of membership.
There is one point where both decisions can be made.
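The rationality principle can be sketched in a few lines. The shapes below are assumptions (the slide only shows a picture): low fever is taken to fall linearly from 1 at 38 to 0 at 40, and high fever is its complement, so that the two degrees always sum to 1 as a fuzzy partition requires.

```python
def low_fever(t, full_until=38.0, zero_from=40.0):
    """Assumed membership function for 'low fever' (breakpoints are hypothetical)."""
    if t <= full_until:
        return 1.0
    if t >= zero_from:
        return 0.0
    return (zero_from - t) / (zero_from - full_until)

def high_fever(t):
    """Complementary membership, so both degrees sum to 1 (fuzzy partition)."""
    return 1.0 - low_fever(t)

def decide(t):
    """Return every label that attains the highest degree of membership."""
    degrees = {"low": low_fever(t), "high": high_fever(t)}
    best = max(degrees.values())
    return sorted(label for label, d in degrees.items() if d == best)

for t in (38.5, 39.0, 39.5):
    print(t, decide(t))
```

With these assumed shapes, 39.0 is the single point where both decisions can be made.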
Regular Membership Functions
It is not useful to consider arbitrary membership functions.
Def.: For T ⊆ ℝ a function µ : T → ℝ is called regular if
(i) µ is piecewise continuous,
(ii) if x, y ∈ T and α ∈ [0,1] then µ(αx + (1−α)y) ≥ min(µ(x), µ(y)),
(iii) there is some x in T with µ(x) = 1.
(i) is often specialized to "piecewise linear", which makes it computationally more feasible. Certain exceptions should be allowed in order to include the jump from 0 to 1 in classical logic. (ii) excludes several separate maxima, and (iii) postulates the existence of an ideal argument.
Norms and Co-Norms (1)
t-norms f(x, y) (intended to compute µA∩B):
Axioms:
(T1) f(x, y) = f(y, x)
(T2) f(x, f(y, z)) = f(f(x, y), z)
(T3) x ≤ x', y ≤ y' ⇒ f(x, y) ≤ f(x', y')
(T4) f(x, 1) = x
Typical t-norms are f(x,y) = min(x,y) or f(x,y) = xy.
co-t-norms f(x, y) (intended to compute µA∪B):
Axioms: (T1), (T2), (T3) and
(T4*) f(x, 0) = x
Typical co-t-norms are f(x,y) = max(x,y) or f(x,y) = x + y − xy.
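The axioms can be spot-checked on a finite grid. The sketch below is not a proof; it only tests the four typical functions on the grid 0, 1/4, ..., 1, using exact fractions so that associativity is not disturbed by rounding. The helper name `satisfies_axioms` is a hypothetical one introduced for this illustration.

```python
from fractions import Fraction
from itertools import product

t_norms = {"min": min, "product": lambda x, y: x * y}
co_norms = {"max": max, "x+y-xy": lambda x, y: x + y - x * y}

grid = [Fraction(k, 4) for k in range(5)]  # 0, 1/4, 1/2, 3/4, 1

def satisfies_axioms(f, neutral):
    """Check (T1)-(T3) and the neutral-element axiom on the grid.

    neutral = 1 gives (T4) for t-norms, neutral = 0 gives (T4*) for co-t-norms.
    """
    commutative = all(f(x, y) == f(y, x) for x, y in product(grid, repeat=2))
    associative = all(f(x, f(y, z)) == f(f(x, y), z)
                      for x, y, z in product(grid, repeat=3))
    monotone = all(f(x, y) <= f(x2, y2)
                   for x, y, x2, y2 in product(grid, repeat=4)
                   if x <= x2 and y <= y2)
    has_neutral = all(f(x, neutral) == x for x in grid)
    return commutative and associative and monotone and has_neutral

for name, f in t_norms.items():
    print(name, satisfies_axioms(f, Fraction(1)))
for name, f in co_norms.items():
    print(name, satisfies_axioms(f, Fraction(0)))
```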
Norms and Co-Norms (2)
Consequences:
f(x, 0) = 0 for t-norms
f(x, 1) = 1 for co-t-norms.
Norms and co-norms cannot reflect relative importance (e.g. using weights)
There are other fuzzy combination rules available which are fuzzy versions of general Boolean operators like different types of implications.
Negation and Complement
A negation is a function neg: [0,1] → [0,1] with the axioms
(N1) neg(0) = 1, neg(1) = 0
(N2) x ≤ y ⇒ neg(y) ≤ neg(x)
The negation function is intended to compute the membership function of the complement C(A) of A.
A typical example is neg(x) = 1 − x. For this function the de Morgan laws hold:
µC(A∩B)(x) = µC(A)∪C(B)(x), µC(A∪B)(x) = µC(A)∩C(B)(x)
if the t-norm min and the co-t-norm max are used.
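The de Morgan laws for neg(x) = 1 − x with min and max can be spot-checked on a grid of membership degrees. This is a small sketch using exact fractions; the last check illustrates that the pairing matters, since the product t-norm combined with max does not satisfy the first law.

```python
from fractions import Fraction
from itertools import product

def neg(x):
    """The typical negation from the slide."""
    return 1 - x

grid = [Fraction(k, 10) for k in range(11)]  # degrees 0, 0.1, ..., 1
pairs = list(product(grid, repeat=2))

# C(A ∩ B) = C(A) ∪ C(B) with t-norm min and co-t-norm max
de_morgan_intersection = all(neg(min(a, b)) == max(neg(a), neg(b)) for a, b in pairs)
# C(A ∪ B) = C(A) ∩ C(B)
de_morgan_union = all(neg(max(a, b)) == min(neg(a), neg(b)) for a, b in pairs)
# Mixing the product t-norm with the co-t-norm max breaks the law
mixed_pair_holds = all(neg(a * b) == max(neg(a), neg(b)) for a, b in pairs)

print(de_morgan_intersection, de_morgan_union, mixed_pair_holds)
```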
The Mamdani Implication
We consider implications between literals, i.e. of the form φ1 ∧ φ2 ∧ ... ∧ φn → ψ.
Because the literals often have names from natural language, such implications are called linguistic rules.
Predicate logic interpretation: Predicates, interpreted in a model.
Fuzzy logic interpretation: Membership functions over the universe.
The membership function of the conjunction is computed using a t-norm.
The implication is considered as a conjunction φ1 ∧ φ2 ∧ ... ∧ φn ∧ ψ.
Example
Linguistic rule: If temperature low and road narrow then speed slow.
Fuzzy representation:
[Figure: membership functions evaluated at the actual temperature and the actual narrowness, and the inferred membership function for "slow".]
The result of the implication is a membership function, not a single value. To obtain such a value, an operation called defuzzification is needed.
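The rule can be sketched with the t-norm min. All membership functions below are hypothetical triangles with assumed breakpoints and units; the slide only shows them as a picture. The point of the sketch is that the result of the inference is again a function, namely "slow" cut off at the degree to which the precondition holds.

```python
def triangle(a, b, c):
    """Triangular membership function rising from a to the peak b and falling to c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Illustrative assumptions, not taken from the lecture:
temp_low = triangle(-10.0, 0.0, 10.0)    # degrees Celsius
road_narrow = triangle(2.0, 3.0, 4.0)    # road width in metres
speed_slow = triangle(0.0, 20.0, 40.0)   # km/h

def inferred_slow(actual_temp, actual_width):
    """Membership function for 'slow' inferred from the actual inputs."""
    # Conjunction of the two literals via the t-norm min
    strength = min(temp_low(actual_temp), road_narrow(actual_width))
    # The implication is treated as a conjunction with the conclusion
    return lambda speed: min(strength, speed_slow(speed))

slow = inferred_slow(5.0, 3.25)
print(slow(5.0), slow(20.0), slow(45.0))
```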
Defuzzification
There are three major defuzzification methods which assign to a membership function µ with domain Y an element y ∈ Y. We put Max = {y ∈ Y | ∀y' ∈ Y: µ(y') ≤ µ(y)}.
1) Maximum method: Choose an arbitrary y ∈ Max.
2) Mean-of-Maximum method: Choose y as the mean of Max:
$y = |\mathrm{Max}|^{-1} \sum_{y' \in \mathrm{Max}} y'$
3) Center-of-Area method: Define y as the element below the center of gravity of the area bounded by the curve µ and the y-axis:
$y = \left( \int \mu(y')\, dy' \right)^{-1} \int y'\, \mu(y')\, dy'$
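On a sampled domain the three methods reduce to a few lines. This is a sketch under stated assumptions: Y is replaced by an equally spaced grid, the integrals become sums, and the membership function is a hypothetical triangle clipped at 0.5.

```python
def maximum_set(ys, mu, tol=1e-12):
    """All sample points where mu attains its maximum (up to a tolerance)."""
    top = max(mu(y) for y in ys)
    return [y for y in ys if mu(y) >= top - tol]

def maximum_method(ys, mu):
    """Any element of Max is admissible; here simply the first one."""
    return maximum_set(ys, mu)[0]

def mean_of_maximum(ys, mu):
    max_points = maximum_set(ys, mu)
    return sum(max_points) / len(max_points)

def center_of_area(ys, mu):
    """Sum approximation of the quotient of integrals; the grid spacing cancels."""
    weights = [mu(y) for y in ys]
    return sum(y * w for y, w in zip(ys, weights)) / sum(weights)

ys = [k / 2 for k in range(81)]  # 0.0, 0.5, ..., 40.0

def clipped_slow(y):
    """Hypothetical inferred function: triangle with peak at 20, clipped at 0.5."""
    return min(0.5, max(0.0, 1.0 - abs(y - 20.0) / 20.0))

print(maximum_method(ys, clipped_slow),
      mean_of_maximum(ys, clipped_slow),
      center_of_area(ys, clipped_slow))
```

For a symmetric function such as this one, mean-of-maximum and center-of-area coincide; for skewed functions they generally differ.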
Linguistic Expressions
Linguistic expressions are taken from natural language:
Something is large, expensive, ..., nice, pale, ...
IF the water is very dirty THEN add much soap
Ways of formalization:
Classical logic: Logical predicates and rules with binary interpretation. A specific object satisfies a predicate or it does not.
Fuzzy logic: Fuzzy set with membership function. An object satisfies a predicate to some degree (fuzzification).
Rules where the conclusion is an action or a decision:
Classical logic: The action can be performed if the preconditions are satisfied.
Fuzzy logic: The conclusion is a membership function, which cannot be performed directly. Defuzzification transforms it into some action, depending on the degrees of truth of the conditions.
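Abstract and Linguistic Predicates
[Figure: the concrete level (real data) is connected to the abstract level in two ways: by abstraction to abstract classical predicates, with instantiation leading back, and by abstraction to linguistic predicates with fuzzy interpretations, with defuzzification leading back.]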
Example (1)
Linguistic variables:
Variable: X_Temp; Values: {low, med, high}
Variable: X_Pressure; Values: {no, low, med, high}
[Figure: membership functions (MF) of low, med and high over Temp, with marked membership values 0.7 and 0.49, and of no, low, med and high over Pressure, with marked membership values 0.3, 0.21 and 0.09.]
Example (2)
New fuzzy variable for representing the decision class:
Variable: X_Class; Values: {K1, K2}
Rules:
R1: IF X_Temp is high OR X_Pressure is no THEN X_Class is K1
R2: IF X_Temp is low OR X_Pressure is high THEN X_Class is K1
R3: IF X_Temp is high OR X_Pressure is high THEN X_Class is K2
Actual query: X*_Temp is med, X*_Pressure is low
[Figure: membership function (MF) over Class with the singletons K1 and K2, each of height 1.]
Example (3)
Application of rule R1:
max_x min{ µTemp,high(x), µTemp,med(x) } = 0.49
max_x min{ µPressure,no(x), µPressure,low(x) } = 0.21
Fuzzy OR (bounded sum) for the precondition of rule R1: min{1, 0.49 + 0.21} = 0.7
The resulting membership function for the conclusion is a K1 singleton with value 0.7.
Application of R2: The membership function for the conclusion is a K1 singleton with value 0.58.
Application of R3: The membership function for the conclusion is a K2 singleton with value 0.58.
Example (4)
Aggregation of all rules leads to the following fuzzy set:
[Figure: membership function (MF) over Class; the K1 singleton has the larger value, the K2 singleton has value 0.58.]
The application of the "Mean-of-Max" defuzzification operator (selecting the value with the maximum membership value) leads to the crisp value K1.
Fuzzy Sets and Rough Sets
We assume a fuzzy set, represented by a membership function µ.
A crisp interval property P is defined by a real number α, 0 ≤ α ≤ 1, such that P(x) ⇔ µ(x) ≥ α (or µ(x) ≤ α), or it is the complement of such a set.
If the number α is not exactly known, this gives rise to a rough set by introducing two numbers β, γ with 0 ≤ β ≤ γ ≤ 1, which function as thresholds:
Pu(x) ⇔ µ(x) ≥ γ
Po(x) ⇔ µ(x) ≥ β
The smaller the difference between the thresholds, the smaller the uncertainty area.
[Figure: the range of µ divided into three regions: "not P(x) for sure", "uncertainty" and "P(x) for sure".]
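The two thresholds yield a three-way decision. The sketch below assumes the case P(x) ⇔ µ(x) ≥ α with the lower threshold β used for the upper approximation and the higher threshold γ used for the lower approximation; this orientation is an assumption, and the function name is hypothetical.

```python
def rough_decision(mu_x, beta, gamma):
    """Classify a membership degree using two thresholds beta <= gamma.

    Assumed orientation: degrees of at least gamma are surely in P,
    degrees below beta are surely not in P, the rest is uncertain.
    """
    if not 0.0 <= beta <= gamma <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta <= gamma <= 1")
    if mu_x >= gamma:
        return "P for sure"        # x is in the lower approximation
    if mu_x >= beta:
        return "uncertain"         # x is in the upper but not the lower approximation
    return "not P for sure"        # x is outside the upper approximation

for degree in (0.9, 0.5, 0.1):
    print(degree, rough_decision(degree, beta=0.4, gamma=0.7))
```

Setting beta equal to gamma removes the uncertainty area and gives back the crisp property.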
Similarities
We consider a fixed set U (the universe).
Similarity has a relational and a functional aspect.
Relational aspect: R(x,y,u,v), intended as "y is more similar to x than v is to u".
Special case: R(x,y,x,u), "y is more similar to x than u is to x".
Functional aspect: A similarity measure on a set U is a mapping sim: U × U → [0,1] (real interval).
A dual notion is a distance measure.
A similarity measure quantifies the degree of similarity.
Fuzzy equalities are special cases of similarity measures.
From Fuzzy Sets to Similarities
In order to define a similarity measure one need not start from pairs of objects. If we simply have a fuzzy set K ⊆f U, then we need in addition a reference object x in order to define a measure. Such a reference object has to satisfy
µK(x) = 1.
In this case we can define a similarity measure by
sim(x, y) = µK(y)
and get a measure satisfying sim(x,x) = 1.
If there is a subset P ⊆ U such that for each x ∈ P we have some fuzzy subset Kx ⊆f U with membership function µx(y), for which µx(x) = 1 holds, then we can again define a measure on U × P by
sim(y, x) = µx(y), y ∈ U, x ∈ P.
From Similarities to Fuzzy Sets
One can also start with a similarity measure and associate fuzzy subsets with it:
SIM ⊆f V := U × U, µSIM(x1, x2) := sim(x1, x2)
We associate to each x ∈ U a fuzzy subset Fx ⊆f U by
µx(y) = µFx(y) = sim(x,y).
Reflexivity of sim means sim(x,x) = 1. If sim is symmetric then µx(y) = µy(x) holds.
So we obtain from sim, for each x, a fuzzy subset which describes how U is structured by sim from the viewpoint of x: x regards itself as the central element, and the degree of membership of the other elements depends on their similarity to x.
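The construction of Fx from sim is a one-liner once sim is given. The similarity measure below is a hypothetical one on real numbers (linear decay with an assumed scale of 10), chosen only to make the sketch runnable.

```python
def sim(x, y, scale=10.0):
    """Hypothetical similarity on real numbers: 1 at distance 0, 0 from distance `scale` on."""
    return max(0.0, 1.0 - abs(x - y) / scale)

def fuzzy_subset_of(x):
    """Membership function of F_x: how U looks from the viewpoint of x."""
    return lambda y: sim(x, y)

mu_30 = fuzzy_subset_of(30.0)
mu_35 = fuzzy_subset_of(35.0)

print(mu_30(30.0), mu_30(35.0), mu_30(50.0))
```

Here x = 30 is the central element of its own fuzzy subset, and because this sim is symmetric, the degree of 35 in the subset of 30 equals the degree of 30 in the subset of 35.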
Generalization: Partnerships
We consider two arbitrary sets A and B. A partner relation is a fuzzy subset PART ⊆f V := A × B.
Functional view: part: A × B → [0,1] (real interval)
Best partners are those with the highest degree of partnership.
Examples:
A = set of women, B = set of men, partnership: marriage
A = customer demands, B = products, partnership: customer satisfaction
A = diseases, B = therapies, partnership: best for health
Objects, Fuzzy Sets and Similarities
We assume a universe U and a fuzzy set X of U with a membership function µX.
Up to now we have only considered similarities and distances between objects of U, i.e. sim(object, object). There are three more possibilities:
1) sim(membership function, membership function)
2) sim(object, membership function)
3) The third is 2) with permuted arguments (this plays a role if the first argument is considered as a query and the second as an answer).
Because membership functions interpret linguistic expressions, this can also be interpreted as a similarity between linguistic predicates and between objects and linguistic predicates.
Distances between Membership Functions (1)
[Figure: two membership functions µ1 and µ2.]
a) Crisp Method: Select ai for which µi(ai) is maximal, i = 1,2; d(µ1, µ2) = |a1 − a2|
b) Integral Method: Fi = area between µi and the x-axis; d(µ1, µ2) = Size(F1 Δ F2) (symmetric difference)
Distances now compare fuzzy membership functions! A corresponding similarity is e.g. sim(µ1, µ2) = 1 − d(µ1, µ2).
Distances between Membership Functions (2)
The disadvantage of the integral method is that two fuzzy functions with disjoint areas always have the same distance, no matter how far apart they are; the crisp method avoids this.
The disadvantage of the crisp method is that the shape of the curves does not play a role; the integral method avoids this.
A combined method is as follows:
If the areas are not disjoint, apply the integral method.
If the areas are disjoint, add to the distance obtained by the integral method the distance between the two points where both curves reach zero.
A generalization is obtained if the Euclidean distance |a1 − a2| is replaced by an arbitrary distance measure.
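Both basic methods can be sketched numerically on a sampled axis. The size of the symmetric difference of the two areas equals the integral of |µ1 − µ2|, which is approximated here by a sum. The triangular shapes and the grid are illustrative assumptions.

```python
def triangle(a, b, c):
    """Triangular membership function rising from a to the peak b and falling to c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def crisp_distance(mu1, mu2, xs):
    """Distance between the arguments where each function is maximal."""
    a1 = max(xs, key=mu1)
    a2 = max(xs, key=mu2)
    return abs(a1 - a2)

def integral_distance(mu1, mu2, xs):
    """Size of the symmetric difference of the two areas, as a sum over the grid."""
    step = xs[1] - xs[0]
    return sum(abs(mu1(x) - mu2(x)) for x in xs) * step

xs = [k / 10 for k in range(301)]  # 0.0, 0.1, ..., 30.0
mu1 = triangle(0.0, 2.0, 4.0)
mu2 = triangle(6.0, 8.0, 10.0)     # disjoint from mu1, nearby
mu3 = triangle(20.0, 22.0, 24.0)   # disjoint from mu1, far away

print(integral_distance(mu1, mu2, xs), integral_distance(mu1, mu3, xs))
print(crisp_distance(mu1, mu2, xs), crisp_distance(mu1, mu3, xs))
```

The integral method assigns the same distance to the nearby and the far-away function, while the crisp method separates them but ignores their shapes, which is the motivation for the combined method.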
Similarities between Objects and Fuzzy Functions
Distinction: For y ∈ U, sim(y, µX) should not be the same as µX(y)! E.g. for the fuzzy set "high fever", 37°C and 36°C both have membership value 0, but we expect sim(36, high fever) < sim(37, high fever).
In order to combine fuzzy sets and similarity measures we need some simplifying assumptions:
The universe U is an interval of real numbers (the approach generalizes to the situation where U is partially ordered).
We consider n linguistic predicates A1, ..., An, where each predicate Ai has a regular membership function which attains the maximum value 1 at exactly one argument, denoted by m(Ai).
There is a similarity measure sim given on U.
Application: Case-Based Reasoning
Case-based reasoning (CBR) is a method which has its origin in analogical reasoning.
The main intention is to reuse previous experiences for current problems.
The difficulty arises when the current situation is not identical to the previous one: There is an inexactness involved.
Its main aspect is that CBR techniques allow inexact (approximate) reasoning in a controlled manner.
CBR has many applications in knowledge management.
Case-Based Reasoning (CBR)
[Figure: the CBR cycle. A new problem is compared by similarity with the problems of the cases (problem, solution) stored in the case base; the solution of the retrieved case i is adapted into a new solution, and the result is stored in the case base.]
Cases = Experience = Problem/Solution pair.
Cases are stored in a case base for further reuse.
To solve a new problem:
retrieve a case with a similar problem from the case base,
adapt the solution from the most similar case to the new problem.
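The retrieve, adapt and store steps can be sketched as follows. Everything concrete here is a hypothetical illustration: the attribute-value representation of problems, the similarity measure (fraction of matching attributes), the sample cases, and the adaptation step, which simply reuses the retrieved solution unchanged.

```python
from dataclasses import dataclass

@dataclass
class Case:
    problem: dict   # attribute-value description of the problem
    solution: str

def sim(p, q):
    """Hypothetical similarity: fraction of attributes on which p and q agree."""
    keys = set(p) | set(q)
    if not keys:
        return 1.0
    return sum(1 for k in keys if p.get(k) == q.get(k)) / len(keys)

def retrieve(case_base, new_problem):
    """Return the stored case whose problem is most similar to the new problem."""
    return max(case_base, key=lambda case: sim(case.problem, new_problem))

def adapt(case, new_problem):
    """Placeholder adaptation: reuse the retrieved solution as it is."""
    return case.solution

case_base = [
    Case({"symptom": "no sound", "power_led": "off"}, "check power cable"),
    Case({"symptom": "no sound", "power_led": "on"}, "check speaker connection"),
    Case({"symptom": "distorted sound", "power_led": "on"}, "lower input gain"),
]

new_problem = {"symptom": "no sound", "power_led": "on", "age_years": 3}
best = retrieve(case_base, new_problem)
new_solution = adapt(best, new_problem)
case_base.append(Case(new_problem, new_solution))  # store for further reuse

print(best.solution, sim(best.problem, new_problem))
```

The new problem matches no stored problem exactly; the similarity measure controls how inexact the reuse is allowed to be.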
Probabilities and Diagnosis
In a diagnostic situation there are
certain observations which are regarded as an event E which has happened,
a number of hypothetical diagnoses H1, ..., Hn.
Each hypothesis has a known a priori probability P(Hi), and for each Hi the conditional probability P(E | Hi) is assumed to be known.
If the a posteriori probabilities P(Hi | E) (the conditional probabilities after the event E has happened) are known, the maximum likelihood principle selects a hypothesis: Take the Hi where P(Hi | E) is maximal.
It remains to compute the a posteriori probabilities from the known ones.
Bayes Formula
Bayes' formula allows us to compute the desired probabilities:
$$P(H_i \mid E) = \frac{P(H_i)\,P(E \mid H_i)}{\sum_{j=1}^{n} P(H_j)\,P(E \mid H_j)}$$
For the proof we consider
$$P(H_i \mid E) = \frac{P(H_i \cap E)}{P(E)}, \qquad P(E \mid H_i) = \frac{P(E \cap H_i)}{P(H_i)},$$
and, for mutually exclusive and exhaustive hypotheses, $P(E) = P\bigl(\bigcup_i (E \cap H_i)\bigr) = \sum_i P(E \mid H_i)\,P(H_i)$, from which the formula follows directly.
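The formula is easy to evaluate once the priors P(Hi) and the conditional probabilities P(E | Hi) are given. The numbers below are made up for illustration, and the hypotheses are assumed to be mutually exclusive and exhaustive.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Compute P(H_i | E) from P(H_i) and P(E | H_i) by Bayes' formula."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(H_i) * P(E | H_i)
    evidence = sum(joint)                                 # P(E)
    return [j / evidence for j in joint]

# Hypothetical numbers for three diagnoses H1, H2, H3
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)]

post = posteriors(priors, likelihoods)
best = max(range(len(post)), key=lambda i: post[i])  # index of the selected hypothesis

print(post, best)
```

In this made-up example the hypothesis with the smallest prior ends up with the largest a posteriori probability, because the observation is much more likely under it.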
Bayesian Nets
A Bayesian net is a semantic net with labels:
Nodes: Random variables
Directed edges: Conditional probabilities
An edge from A to B exists if node A is a causal reason for node B, i.e. the conditional probability is non-trivial.
If the probability of some node is known, then the probability of all linked nodes can be calculated.
If certain nodes are initialized (i.e. the event happened), this gives rise to probabilities of their child nodes: In this way the probabilities are propagated over the net.
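For a single edge from a Boolean node A to a Boolean node B, propagation is one application of the law of total probability. The sketch below covers only this simplest case; the node names and the numbers are hypothetical, and general nets need a proper inference algorithm.

```python
def propagate(p_parent, cpt):
    """P(child) from P(parent) and the edge label.

    cpt[True]  = P(child | parent)
    cpt[False] = P(child | not parent)
    """
    return cpt[True] * p_parent + cpt[False] * (1.0 - p_parent)

# Hypothetical edge: Rain -> WetGrass
wet_given_rain = {True: 0.9, False: 0.2}

p_wet_prior = propagate(0.3, wet_given_rain)     # P(Rain) = 0.3 is known
p_wet_observed = propagate(1.0, wet_given_rain)  # Rain is initialized: the event happened

print(p_wet_prior, p_wet_observed)
```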
Subjective Probabilities
They express the opinion of people and are therefore of an informal nature.
They are qualitatively represented with the basic notion
A ≺ B ("B is more likely than A").
The derived notion is A ∼ B ⇔ neither A ≺ B nor B ≺ A (see the chapter on partial orderings).
The partial ordering allows us to establish Turing tests.
There are various attempts to formulate axioms for the partial ordering, with different aims:
To derive a probability measure which induces the partial ordering.
To grasp people's behavior when dealing with probabilities.
In probability theory, expectation is a derived concept; for subjective probabilities this may be questioned.
Intervals for Probabilities (1)
Often it is not reasonable to assume that we know the probabilities exactly:
What is the probability that your left knee will be OK by next week? ("at least 50%")
What is the probability that the sun will be shining next Christmas? ("no idea, any probability")
What is the probability that you are available tomorrow evening? ("at least 98%")
Consequence: We need to introduce uncertainty on the level of probabilities too!
Intervals for Probabilities (2)
Suppose Ω = {E1, ..., En} is a set of events with an unknown probability distribution P(E1), ..., P(En), Σi P(Ei) = 1.
Def.: A set of real intervals [Li, Ui], 0 ≤ Li ≤ Ui ≤ 1, 1 ≤ i ≤ n, is called an n-dimensional probability interval for Ω. Its purpose is to give estimates for the unknown probabilities.
Def.: An n-dimensional probability interval is called reasonable if there are numbers pi, 0 ≤ pi ≤ 1, Σi pi = 1, such that Li ≤ pi ≤ Ui for all i.
Reasonable means that it estimates at least one probability distribution.
Intervals for Probabilities (3)
Proposition: An n-dimensional probability interval is reasonable if and only if Σi Li ≤ 1 ≤ Σi Ui.
Proof: If the condition holds, then there is some number p, 0 ≤ p ≤ 1, with p Σi Li + (1−p) Σi Ui = 1; the desired probabilities are P(Ei) = pLi + (1−p)Ui. On the other hand, the condition is necessary because otherwise the probabilities could not be normalized to 1.
A probability interval for an unknown distribution is a knowledge unit. If several such units are given, they can be combined, resulting in a sharper interval estimate.
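The proof of the proposition is constructive and can be turned into a small procedure. The sketch below checks the condition and, if it holds, computes the mixing number p and the resulting distribution; the function names and the sample intervals are hypothetical.

```python
from fractions import Fraction

def is_reasonable(intervals):
    """Condition of the proposition: sum of lower bounds <= 1 <= sum of upper bounds."""
    return sum(l for l, _ in intervals) <= 1 <= sum(u for _, u in intervals)

def witness_distribution(intervals):
    """Return a distribution inside the intervals as in the proof, or None."""
    lower_sum = sum(l for l, _ in intervals)
    upper_sum = sum(u for _, u in intervals)
    if not lower_sum <= 1 <= upper_sum:
        return None
    if upper_sum == lower_sum:
        p = Fraction(1)  # both sums equal 1, any p works
    else:
        # Solve p * lower_sum + (1 - p) * upper_sum = 1 for p
        p = (upper_sum - 1) / (upper_sum - lower_sum)
    return [p * l + (1 - p) * u for l, u in intervals]

F = Fraction
intervals = [(F(1, 2), F(1)), (F(0), F(1, 2)), (F(1, 10), F(3, 10))]
dist = witness_distribution(intervals)

too_high = [(F(3, 5), F(1)), (F(1, 2), F(1))]  # lower bounds already sum to more than 1

print(dist, is_reasonable(too_high))
```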