(Artificial Intelligence) UNIT-3 · Propositional Logic with First Order Logic (Predicate Calculus)

www.uptunotes.com · Mr. Anuj Khanna, Assistant Professor (KIOT, Kanpur)

UNIT-3 Knowledge Representation & Reasoning:

Propositional logic

Theory of first order logic

Inference in First order logic

Forward & Backward chaining,

Resolution.

Probabilistic reasoning

Utility theory

Hidden Markov Models (HMM)

Bayesian Networks


Short Question & Answers

Ques 1. Differentiate between declarative knowledge and procedural knowledge.

Ans : Declarative knowledge means the representation of facts or assertions. A declarative representation declares every piece of knowledge and permits the reasoning system to use the rules of inference to derive new facts and conclusions. Declarative knowledge consists of a database containing relevant information about some objects. E.g.: a relational database of company employees, or the records of students in a particular class.

Procedural knowledge represents actions or consequences and tells the HOW of a situation. This knowledge uses inference rules to manipulate procedures to arrive at a result. Example: an algorithm that solves the Travelling Salesman Problem sequentially in a systematic order.

Ques 2. Define the terms Belief, Hypothesis, Knowledge, and Epistemology.

Belief : Any meaningful and coherent expression that can be manipulated.

Hypothesis : A justified belief that is not known to be true; that is, a hypothesis is a belief which is backed up by some supporting evidence.

Knowledge: True justified belief is called knowledge.

Epistemology: Study of the nature of knowledge.

Ques 3. What is formal logic? Give an example.

Ans : Formal logic is a technique for interpreting some sort of reasoning process; it is a symbolic manipulation mechanism. Given a set of sentences taken to be true, the technique determines what other sentences can be shown to be true. The logical nature or validity of an argument depends on the form of the argument.

Example : Consider the following two sentences: 1. All men are mortal. 2. Socrates is a man. We can infer that Socrates is mortal.

Ques 4. What is CNF and DNF ?

Ans : CNF (Conjunctive Normal Form) : A formula P is said to be in CNF if it is of the form

P = P1 ˄ P2 ˄ … ˄ Pn ; n ≥ 1, where each Pi (i = 1 to n) is a disjunction of atoms (literals).

Example: (Q ˅ P) ˄ (T ˅ ~Q) ˄ (P ˅ ~T).

DNF (Disjunctive Normal Form) : A formula P is said to be in DNF if it is of the form

P = P1 ˅ P2 ˅ … ˅ Pn ; n ≥ 1, where each Pi (i = 1 to n) is a conjunction of atoms (literals).

Example: (Q ˄ P) ˅ (T ˄ ~Q) ˅ (P ˄ ~T).

www.uptunotes.com (Artificial Intelligence)UNIT-3

Mr. Anuj Khanna Assitant Professor(KIOT,Kanpur)

Ques 5. What are Horn clauses? What is their usefulness in logic programming?

Ans : A Horn clause is a clause (disjunction of literals) with at most one positive literal. A Horn clause with exactly one positive literal is called a definite clause. A Horn clause with no positive literals is sometimes called a goal clause. A dual-Horn clause is a clause with at most one negative literal.

Example : ~P ˅ ~Q ˅ … ˅ ~T ˅ U is a definite Horn clause (U is its only positive literal). The relevance of Horn clauses to theorem proving by predicate logic resolution is that the resolvent of two Horn clauses is a Horn clause, and the resolvent of a goal clause and a definite clause is again a goal clause. In automated reasoning this improves the efficiency of algorithms. Prolog is based on Horn clauses.

Ques 6. Determine whether the following PL formula is (a) Satisfiable (b) Contradictory
(c) Valid : ( p ˄ q ) → r ˅ q

Ans : Truth table for the above formula :

p | q | r | p ˄ q | r ˅ q | ( p ˄ q ) → r ˅ q
T | T | T |   T   |   T   |   T
T | T | F |   T   |   T   |   T
T | F | T |   F   |   T   |   T
T | F | F |   F   |   F   |   T
F | T | T |   F   |   T   |   T
F | T | F |   F   |   T   |   T
F | F | T |   F   |   T   |   T
F | F | F |   F   |   F   |   T

Since the last column is true in every row, the given formula is a tautology (valid, and hence also satisfiable).
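The same check can be automated by enumerating all truth assignments. Below is a minimal sketch (not part of the original notes) that brute-forces the truth table of (p ˄ q) → (r ˅ q); the helper names are illustrative:

```python
from itertools import product

def implies(a, b):
    # Material implication: a -> b is false only when a is true and b is false.
    return (not a) or b

def is_tautology(formula, num_vars):
    # Evaluate the formula under every assignment of True/False to its variables.
    return all(formula(*values) for values in product([True, False], repeat=num_vars))

# Formula from Ques 6: (p AND q) -> (r OR q)
f = lambda p, q, r: implies(p and q, r or q)
print(is_tautology(f, 3))  # True, so the formula is valid (a tautology)
```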

Ques 7. Convert the following sentences into wff of Predicate Logic ( First order logic).

(i) Ruma dislikes children who drink tea.

(ii) Any person who is respected by every person is a king.

Ans : (i) ∀x [ Child(x) ˄ DrinksTea(x) → Dislikes(Ruma, x) ]

(ii) ∀x ∀y [ Person(y) ˄ Respects(y, x) → King(x) ]



Long Question & Answers

Ques 8 : Define the term knowledge. What is the role of knowledge in Artificial Intelligence?

Explain various techniques of knowledge representation.

Ans : Knowledge : Knowledge is just another form of data. Data consists of raw facts. When these raw facts are organized systematically and are ready to be processed by a human brain or a machine, they become knowledge. From this knowledge we can draw the desired conclusions, which can be used to solve both simple and complex real-world problems.

Example : A doctor treating a patient requires both knowledge and data. The data is the patient's record (i.e. the patient's history, measurements of vital signs, diagnostic reports, response to medicines, etc.). Knowledge is the information the doctor gained in medical college during his studies.

The cycle from data to knowledge is as follows :

(a) Raw data, when refined, processed or analyzed, yields information which becomes useful in answering users' queries.

(b) With further refinement, analysis and the addition of heuristics, information may be converted into knowledge, which is useful in problem solving and from which additional knowledge may be inferred.

Role of Knowledge in AI : Knowledge is central to AI. The more knowledge an agent has, the better its chances of behaving intelligently. Knowledge also improves the search efficiency of the human brain. Knowledge is needed to support intelligence because :

(a) We can understand natural language with its help and use it when required.

(b) We can make decisions if we possess sufficient knowledge about a certain domain.

(c) We can recognize different objects with varying features quite easily.

(d) We can interpret various changing situations easily and logically.

(e) We can plan strategies to solve difficult problems.

(f) Knowledge is dynamic, whereas data is static.

An AI system must be capable of doing the following three things :

(a) Store the knowledge in a knowledge base (both static and dynamic KB).

(b) Apply the stored knowledge to solve problems.

(c) Acquire new knowledge through experience.

Three key components of an AI system.


1. Representation 2. Learning 3. Reasoning.

Various techniques of knowledge representation are: (A) simple relational knowledge, (B) inheritable knowledge, (C) inferential knowledge, and (D) procedural knowledge.

(A) Relational Knowledge : This is the simplest way to represent knowledge in static form; it is stored in a database as a set of records. Facts about a set of objects and the relationships between objects are set out systematically in columns. This technique offers very little opportunity for inference, but it provides a knowledge base for other, more powerful inference mechanisms. Example: the set of records of employees in an organization, or the records and related information of voters for elections.

(B) Inheritable Knowledge : One of the most useful forms of inference is property inheritance. In this method, elements of certain classes inherit attributes and values from the more general classes in which they are included. Features of inheritable knowledge are :

Property inheritance (objects inherit values from being members of a class; data must be organized into a hierarchy of classes).

Boxed nodes (contain objects and values of attributes of objects).

Values can themselves be objects with attributes, and so on.

Arrows (point from an object to its value).

This structure is known as a Slot and Filler architecture, semantic network, or collection of frames.

In semantic networks, nodes for classes or objects with some inherent meaning are connected in a network structure.



(C) Inferential Knowledge : Knowledge is useless unless there is some inference process that can exploit it. The required inference process implements the standard logical rules of inference, representing knowledge in a form of formal logic. Example : All dogs have tails : ∀x dog(x) → hastail(x)

This knowledge supports automated reasoning. Advantages of this approach are :

It has a set of strict rules.

It can be used to derive more facts.

The truth of new statements can be verified.

Correctness is guaranteed.

(D) Procedural Knowledge : This is knowledge encoded in the form of procedures. Example: small programs that know how to do specific things and how to proceed; e.g. a parser in a natural language system has the knowledge that a noun phrase may contain articles, adjectives and nouns. It is represented by calls to routines that know how to process articles, adjectives and nouns.

Advantages :

Heuristic or domain-specific knowledge can be represented.

Extended logical inferences, like default reasoning, can be incorporated.

Side effects of actions may be modeled.


Disadvantages :

Not all cases may be represented.

Not all deductions may be correct.

Modularity is sacrificed, and control information is tedious to specify.

Ques 9 : Define the term logic. What is the role of logic in Artificial Intelligence? Compare

Propositional logic with First order logic (Predicate Calculus).

Ans : Logic is defined as the scientific study of the process of reasoning and of the system of rules and procedures that help in the reasoning process. Logic-based representation uses expressions in formal logic to represent the required knowledge; inference rules and proof procedures can then apply this knowledge to solve specific problems.

We can derive a new piece of knowledge by proving that it is a consequence of knowledge that is already known. We generate logical statements to prove certain assertions.

Algorithm = logic + control

Role of Logic in AI

Computer scientists are familiar with the idea that logic provides techniques for analyzing the inferential properties of languages. Logic can provide a specification for a programming language by characterizing a mapping from programs to the computations that they implement.

A compiler that implements the language can be incomplete, as long as it approximates the logical requirements of the given problem. This makes it possible for the involvement of logic in AI applications to vary, from relatively weak uses in which logic merely informs the implementation process, to in-depth analysis.

Logical theories in AI are independent of implementations. They provide insights into the reasoning problem without directly informing the implementation.

Ideas from logic, theorem proving and model construction techniques are used in AI.

Logic works as an analysis tool and a knowledge representation technique for automated reasoning and for developing Expert Systems. It also provides the basis for programming languages like Prolog used to develop AI software.


George Boole (1815-1864) wrote a book in 1854 named “An Investigation of the Laws of Thought”, whose stated aim was: to investigate the fundamental laws of those operations of the mind by which reasoning is performed; to give expression to them in the symbolical language of a calculus, and upon this foundation to establish the science of logic and construct its method; and to make this method itself the basis of a general method for gathering, from the various elements of truth brought to view in the course of these inquiries, some probable intimations concerning the nature and constitution of the human mind.

Comparison b/w Propositional Logic & First Order Predicate Logic

S.No | Propositional Logic (PL) | First Order Predicate Logic (FOPL)
1. | Less declarative | More declarative
2. | Context-dependent semantics | Context-independent semantics
3. | Ambiguous and less expressive | Unambiguous and more expressive
4. | Propositions are used as components with logical connectives. | Uses predicates/relations between objects, functions, variables, logical connectives and quantifiers (existential and universal).
5. | Rules of inference are used for deduction, like Modus Ponens, Modus Tollens, disjunctive syllogism, etc. | Rules of inference are used along with the rules for quantifiers.
6. | Inference algorithms like inference rules, DPLL, GSAT are used. | Inference algorithms like unification, resolution, backward and forward chaining are used.
7. | NP-complete | Semi-decidable

Ques 10 (A) Convert the following sentences to wff in first order predicate logic.

(i) No coat is waterproof unless it has been specially treated.

(ii) A drunkard is enemy of himself.

(iii) Any teacher is better than a lawyer.

(iv) If x and y are both greater than zero, so is the product of x and y.

(v) Everyone in the purchasing department over 30 years is married.

(B) Determine whether each of the following sentences is satisfiable, contradictory or valid :

S1 : (p ˅ q) → (p ˅ ~q) ˄ p , S2 : (p → q) → ~p



Ans : (A) (i) No coat is waterproof unless it has been specially treated.

∀x [ C(x) → ( ~W(x) ˅ S(x) ) ] , where :

C(x) : x is a coat , W(x) : x is waterproof , S(x) : x is specially treated.

(ii) A drunkard is enemy of himself.

∀x [ D(x) → E(x, x) ] , where : D(x) : x is a drunkard , E(x, x) : x is enemy of x.

(iii) Any teacher is better than a lawyer.

∀x [ T(x) → ∃y ( L(y) ˄ B(x, y) ) ] , where :

T(x) : x is a teacher , L(y) : y is a lawyer , B(x, y) : x is better than y.

(iv) If x and y are both greater than zero, so is the product of x and y.

∀x ∀y [ GT(x, 0) ˄ GT(y, 0) → GT(times(x, y), 0) ]

where GT : greater than, and times(x, y) : the product of x and y (times is a function symbol; product_of(x, y) may equally be used).

(v) Everyone in the purchasing department over 30 years is married.

∀x ∀y [ works_in(x, purch_deptt) ˄ has_age(x, y) ˄ GT(y, 30) → Married(x) ]

(B) (i) Truth table for S1 : (p ˅ q) → (p ˅ ~q) ˄ p

p | q | p ˅ q | ~q | p ˅ ~q | (p ˅ q) → (p ˅ ~q) ˄ p
T | T |   T   | F  |   T    |   T
T | F |   T   | T  |   T    |   T
F | T |   T   | F  |   F    |   F
F | F |   F   | T  |   T    |   T

From the last column of the truth table, S1 is true under some interpretations but not all; hence S1 is satisfiable (but not valid).

(ii) Truth table for S2 : (p → q) → ~p

p | q | p → q | ~p | (p → q) → ~p
T | T |   T   | F  |   F
T | F |   F   | F  |   T
F | T |   T   | T  |   T
F | F |   T   | T  |   T

From the last column of the truth table, S2 is likewise satisfiable (but not valid).


Ques 11 : Using the inference rules of propositional logic, prove the validity of the following argument :

(i) If either algebra is required or geometry is required, then all students will study mathematics.

(ii) Algebra is required and trigonometry is required. Therefore, all students will study mathematics.

Ans : Converting the above sentences to propositional logic and applying inference rules :

(i) (A ˅ G) → S

(ii) (A ˄ T). To prove : S is true.

Where A : algebra is required , G : geometry is required , T : trigonometry is required , S : all students will study mathematics.

(iii) A is true (applying simplification to (ii)).

(iv) (A ˅ G) is true (applying addition to (iii)).

(v) Therefore, S is true (applying Modus Ponens between (i) and (iv)).

Hence the argument is valid, because the conclusion is proved from the premises.

Ques 12 : Determine whether the following argument is valid or not. “If I work the whole night on this problem, then I can solve it. If I solve the problem, then I will understand the topic. Therefore, if I work the whole night on this problem, then I will understand the topic.”

Ans : Converting the above sentences to propositional logic and applying inference rules :

(i) WN → S , where WN : I work the whole night on this problem , S : I can solve it.

(ii) S → U , where U : I will understand the topic.

To prove the validity of : WN → U.

(iii) Applying hypothetical syllogism (the chain rule of inference) between axioms (i) and (ii), we get WN → U. Hence the argument is valid.

Ques 13. Given the following sentences, prove their validity :

(i) Either Smith attended the meeting or Smith was not invited to the meeting.

(ii) If the directors wanted Smith in the meeting, then Smith was invited to the meeting.

(iii) Smith didn't attend the meeting.

(iv) If the directors didn't want Smith in the meeting and Smith was not invited to the meeting, then Smith is on his way out of the company.

Ans : Converting the above sentences to propositional logic and applying inference rules :

(i) (A ˅ ~I) , where A : Smith attended the meeting , ~I : Smith was not invited to the meeting.

(ii) (D → I) , where D : the directors wanted Smith in the meeting , I : Smith was invited to the meeting.

(iii) ~A : Smith did not attend the meeting.

(iv) (~D ˄ ~I) → W. To prove that W is true, where W : Smith is on his way out of the company.

(v) ~I (by applying disjunctive syllogism between axioms (i) and (iii)).

(vi) ~D (by applying Modus Tollens between axioms (ii) and (v)).

(vii) (~D ˄ ~I) (by applying conjunction between axioms (v) and (vi)).

(viii) W (by applying Modus Ponens between axioms (iv) and (vii)). (Hence proved.)

Ques 14 : What is the clause form of a wff (well-formed formula)? Convert the following formula into clause form : ∃x ∀y [ ∀z P(f(x), y, z) → { ∃u Q(x, u) ˄ ∃v R(y, v) } ].

Ans : Clause Form : In the theory of logic, whether propositional or predicate logic, while proving the validity of statements using the resolution principle it is required to convert well-formed formulas into clause form. Clause form is a set of axioms in which propositions or formulas are connected only through the OR (˅) connective.

Step 1 : Elimination of implication, applying P → Q ≡ ~P ˅ Q :

∃x ∀y ( ~∀z P(f(x), y, z) ˅ ( ∃u Q(x, u) ˄ ∃v R(y, v) ) )

Step 2 : Reducing the scope of negation, applying ~(∀x) F(x) ≡ ∃x ~F(x) :

∃x ∀y ( ∃z ~P(f(x), y, z) ˅ ( ∃u Q(x, u) ˄ ∃v R(y, v) ) )

Step 3 : Applying ( Qx F(x) ) ˅ G ≡ Qx [ F(x) ˅ G ] :

∃x ∀y ∃z ( ~P(f(x), y, z) ˅ ( ∃u Q(x, u) ˄ ∃v R(y, v) ) )

Step 4 : Conversion to prenex normal form :

∃x ∀y ∃z ∃u ∃v ( ~P(f(x), y, z) ˅ ( Q(x, u) ˄ R(y, v) ) )

Step 5 : Skolemization (conversion to Skolem standard form), replacing the existential x by the constant a, and z, u, v by the Skolem functions g(y), h(y), I(y) :

∀y ( ~P(f(a), y, g(y)) ˅ ( Q(a, h(y)) ˄ R(y, I(y)) ) )

Step 6 : Removal of universal quantifiers :

~P(f(a), y, g(y)) ˅ ( Q(a, h(y)) ˄ R(y, I(y)) )


Step 7 : Applying the distributive law for CNF, P ˅ ( Q ˄ R ) ≡ ( P ˅ Q ) ˄ ( P ˅ R ) :

( ~P(f(a), y, g(y)) ˅ Q(a, h(y)) ) ˄ ( ~P(f(a), y, g(y)) ˅ R(y, I(y)) )

Step 8 : On removing ˄ we get two clauses :

Clause 1 : ~P(f(a), y, g(y)) ˅ Q(a, h(y))

Clause 2 : ~P(f(a), y, g(y)) ˅ R(y, I(y))

Ques 15 : (A) What is the resolution principle in propositional logic? Explain.

(B) Let the following set of axioms be given as true : P , (P ˄ Q) → R , (S ˅ T) → Q , T. Prove that R is true.

Ans : Resolution Principle : This is also called proof by refutation. To prove that a statement is valid, resolution attempts to show that the negation of the statement produces a contradiction with the known statements. At each step two clauses, called PARENT CLAUSES, are compared (resolved), yielding a new clause that is inferred from them.

Example : Let two clauses C1 and C2 in PL be given as :

C1 : winter ˅ summer , C2 : ~winter ˅ cold. The assumption is that both C1 and C2 are true. From C1 and C2 we can infer summer ˅ cold; this is the RESOLVENT CLAUSE.

The resolvent clause is obtained by combining all of the literals of the two parent clauses except the ones that cancel. If the clause that is produced is the empty clause, then a contradiction has been found. E.g.: winter and ~winter resolve to produce the empty clause.

Algorithm for resolution in propositional logic :

Step 1 : Convert all the propositions of F to clause form, where F is the set of axioms.

Step 2 : Negate the proposition P (the goal) and convert the result to clause form. Add it to the set of clauses obtained in step 1.

Step 3 : Repeat until either a contradiction is found or no progress can be made :

(a) Select two clauses as parent clauses.

(b) Resolve them together. The resolvent clause will be the disjunction of all the literals of both parent clauses, with the following condition :

(i) If there is a pair of literals L and ~L such that one of the parent clauses contains L and the other contains ~L, then select one such pair and eliminate both L and ~L from the resolvent clause.

(c) If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.

Ans (B) : Assume ~R is true and add it to the set of clauses formed from the given axioms (as the set of support).

C1 : P , C2 : ~P ˅ ~Q ˅ R (by eliminating the implication in (P ˄ Q) → R),

C3 : ~S ˅ Q , C4 : ~T ˅ Q , C5 : T , C6 : ~R.

(Eliminating the implication from (S ˅ T) → Q :

~(S ˅ T) ˅ Q ≡ (~S ˄ ~T) ˅ Q (by De Morgan's law); now applying the distributive law we obtain (~S ˅ Q) ˄ (~T ˅ Q), which splits into the two clauses C3 and C4 after removing the AND connective.)

Clauses C1 to C5 are the base set and C6 is the set of support. The refutation proceeds as follows :

1. Resolve C2 (~P ˅ ~Q ˅ R) with C6 (~R) : resolvent ~P ˅ ~Q.

2. Resolve ~P ˅ ~Q with C1 (P) : resolvent ~Q.

3. Resolve C4 (~T ˅ Q) with ~Q : resolvent ~T.

4. Resolve ~T with C5 (T) : the empty clause. (Contradiction found.)

The assumption that ~R is true is therefore false, so R is true.
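The refutation above can be mechanized. Below is a minimal sketch (not part of the original notes) of propositional resolution, representing each clause as a frozenset of literals where a negated atom is written "~p":

```python
from itertools import combinations

def resolve(c1, c2):
    # Return all resolvents of two clauses (clauses are frozensets of literals).
    resolvents = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            resolvents.append((c1 - {lit}) | (c2 - {comp}))
    return resolvents

def resolution_refutation(clauses):
    # Returns True if the empty clause is derivable (the clause set is unsatisfiable).
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True  # empty clause: contradiction found
                new.add(frozenset(r))
        if new <= clauses:
            return False  # no new clauses: no contradiction derivable
        clauses |= new

# Ques 15(B): base set plus the negated goal ~R as the set of support.
kb = [{"P"}, {"~P", "~Q", "R"}, {"~S", "Q"}, {"~T", "Q"}, {"T"}, {"~R"}]
print(resolution_refutation(kb))  # True, so R follows from the axioms
```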


Ques 16 : How is resolution in first order predicate logic different from resolution in propositional logic? What is the unification algorithm, and why is it required?

Ans : In FOPL, while solving through resolution, the situation is more complicated, since we must consider all the possible ways of substituting values for variables. Due to the presence of existential and universal quantifiers in wffs, and of arguments in predicates, things become more complicated.

The theoretical basis of the resolution procedure in predicate logic is Herbrand's theorem, which is as follows :

(i) To show that a set of clauses S is unsatisfiable, it is necessary to consider only interpretations over a particular set, called the Herbrand universe of S.

(ii) A set of clauses S is unsatisfiable iff a finite subset of ground instances of S (instances in which all bound variables have a value substituted for them) is unsatisfiable.

Finding a contradiction amounts to trying the possible substitutions systematically and seeing whether each produces a contradiction. To apply resolution in predicate logic we first need to apply the unification technique, because in FOPL literals with arguments are to be resolved, so matching of the arguments is also required.

Unification Algorithm : The unification algorithm is used as a recursive procedure. Let two literals in FOPL be P(x, x) and P(y, z). Here the predicate name P matches in both literals, but the arguments do not match, so a substitution is required. The first arguments x and y do not match, so substitute y for x; then they will match.

So the substitution 𝝈 = y/x is required (𝝈 is called the UNIFIER). Note that applying 𝜎 = z/x as well would not be a consistent substitution, because we cannot substitute both y and z for x.

After applying 𝜎 = y/x we have P(y, y) and P(y, z). Now unify the arguments y and z by 𝜎 = z/y. So the resulting composition is (z/y)(y/x).

Some rules for the unification algorithm :

i. A variable can be unified with a constant.

ii. A variable can be unified with another variable.

iii. A variable can be unified with a function.

iv. A variable cannot be unified with a function that contains that same variable as an argument (the occurs check).

v. Two distinct constants cannot be unified.

vi. Predicates/literals with different numbers of arguments cannot be unified.
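A minimal sketch (not from the original notes) of this recursive unification procedure, under the illustrative convention that variables are single lowercase letters and compound terms are tuples like ("P", "x", "x"):

```python
def is_var(t):
    # Convention for this sketch: variables are lowercase single-letter strings.
    return isinstance(t, str) and t.islower() and len(t) == 1

def occurs(v, t):
    # Occurs check (rule iv): v must not appear anywhere inside t.
    return v == t or (isinstance(t, tuple) and any(occurs(v, a) for a in t[1:]))

def unify(t1, t2, s=None):
    # Returns a substitution dict (e.g. {'x': 'y'}) or None if unification fails.
    s = {} if s is None else s
    t1, t2 = s.get(t1, t1), s.get(t2, t2)  # apply existing bindings
    if t1 == t2:
        return s
    if is_var(t1):
        return None if occurs(t1, t2) else {**s, t1: t2}
    if is_var(t2):
        return unify(t2, t1, s)
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        if t1[0] != t2[0] or len(t1) != len(t2):  # rules v and vi
            return None
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None  # two distinct constants (rule v)

# The example from the notes: P(x, x) and P(y, z) unify via y/x composed with z/y.
print(unify(("P", "x", "x"), ("P", "y", "z")))  # {'x': 'y', 'y': 'z'}
```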

Ques 17 : Given the following set of facts, prove that “Someone who is intelligent cannot read”.

(i) Whoever can read is literate.

(ii) Dolphins are not literate.

(iii) Some dolphins are intelligent.

Ans : Solution : First form wffs of the given sentences.

S1 : ∀x [ R(x) → L(x) ] , R(x) : x can read , L(x) : x is literate.

S2 : ∀x [ D(x) → ~L(x) ] , D(x) : x is a dolphin (dolphins are not literate).

S3 : ∃x [ D(x) ˄ I(x) ] , I(x) : x is intelligent.

S1 to S3 form the base set. Assume that the negation of the statement to be proved is true.

To prove that ∃x [ I(x) ˄ ~R(x) ] is true, we assume ~∃x [ I(x) ˄ ~R(x) ] is true, and add it as the set of support to the base set.

~∃x [ I(x) ˄ ~R(x) ] ≡ ∀x [ ~I(x) ˅ R(x) ]

Convert all wffs into clause form :

C1 : ~R(x) ˅ L(x) , C2 : ~D(x) ˅ ~L(x)

In S3, apply existential instantiation to remove the ∃ quantifier :

C3 : D(c) ˄ I(c) (this is in CNF now), which splits into two clauses after eliminating the ˄ connective :

C3(a) : D(c) , C3(b) : I(c).

C4 : ~I(x) ˅ R(x) (this is the set of support).

Resolution now proceeds as follows : resolving C4 with C3(b), substituting c for x, gives R(c); resolving R(c) with C1 gives L(c); resolving C3(a) with C2 gives ~L(c); finally, resolving L(c) with ~L(c) yields the empty clause. The assumption is contradicted, hence ∃x [ I(x) ˄ ~R(x) ] is proved.


Ques 18 : Given the following set of facts :-

(i) John likes all kinds of food.

(ii) Apples are food.

(iii) Chicken is food.

(iv) Anything anyone eats and is not killed by is food.

(v) Bill eats peanuts and is still alive.

(vi) Sue eats everything Bill eats.

Translate the above into predicate logic and convert each wff so formed into clause form.

Prove that “John likes peanuts”, using resolution.


Ans : Converting the given statements into wffs of FOPL :

∀x : Food(x) → Likes(John, x)

Food(Apples)

Food(Chicken)

∀x ∀y : Eats(x, y) ˄ ~Killed(x) → Food(y)

Eats(Bill, Peanuts) ˄ Alive(Bill)

∀x : Eats(Bill, x) → Eats(Sue, x)

To prove : Likes(John, Peanuts).

Conversion of the above wffs into clause form :

C1 : ~Food(x) ˅ Likes(John, x)

C2 : Food(Apples)

C3 : Food(Chicken)

C4 : ~Eats(x, y) ˅ Killed(x) ˅ Food(y)
(from ∀x ∀y : Eats(x, y) ˄ ~Killed(x) → Food(y) ≡ ~[ Eats(x, y) ˄ ~Killed(x) ] ˅ Food(y))

C5(a) : Eats(Bill, Peanuts)

C5(b) : Alive(Bill), i.e. ~Killed(Bill)

C6 : ~Eats(Bill, x) ˅ Eats(Sue, x)

Assume that “John does not like peanuts” is true :

C7 : ~Likes(John, Peanuts) (this is the set of support).

The refutation proceeds as follows :

1. Resolve C7 (~Likes(John, Peanuts)) with C1, substituting Peanuts for x : resolvent ~Food(Peanuts).

2. Resolve ~Food(Peanuts) with C4, substituting Peanuts for y : resolvent ~Eats(x, Peanuts) ˅ Killed(x).

3. Resolve this with C5(a), substituting Bill for x : resolvent Killed(Bill).

4. But from sub-clause C5(b) we have ~Killed(Bill), i.e. Bill is alive. Resolving Killed(Bill) with ~Killed(Bill) yields the empty clause, so a contradiction has occurred.

Therefore our assumption that John does not like peanuts is false. Hence Likes(John, Peanuts) is true.

Ques 19 : Explain backward and forward chaining, with examples, in logic representation. Also mention the advantages and disadvantages of both algorithms.

Ans : The process in which the output of one rule activates another rule is called chaining. The chaining technique breaks a task into small procedures and then works through each procedure in the sequence. Two types of chaining techniques are known: forward chaining and backward chaining.

www.uptunotes.com (Artificial Intelligence)UNIT-3

Mr. Anuj Khanna Assitant Professor(KIOT,Kanpur)

(A) Forward chaining :

This is data-driven reasoning; it starts with the known facts and tries to match the rules with these facts.

Several rules may match the available information (conditions). In forward chaining, the rules are first tested against the matching facts, and then the action of a matching rule is executed.

In the next stage the working memory is updated with the new facts and the matching process starts all over again. This process runs until no more rules are left, or the goal is reached.

Forward chaining is useful when a lot of information is available. It is useful to implement when there is a very large number of potential solutions, as in configuration problems and planning.

Example : A rule-based KB is given, and the conclusion is to be proved :

Rule 1 : IF A OR B THEN C

Rule 2 : IF D AND E AND F THEN G

Rule 3 : IF C AND G THEN H

The following facts are presented: B, D, E, F. Goal: prove H. Forward chaining proceeds as follows: fact B fires Rule 1, deriving C; facts D, E, F fire Rule 2, deriving G; C and G now fire Rule 3, deriving H, so the goal is proved. A minimal sketch of this loop in code is given below.
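The sketch below (not from the original notes) runs the three rules above to a fixpoint; each rule is written as (set of premises, conclusion), and the OR in Rule 1 is split into two separate entries:

```python
# Each rule: (set of premises, conclusion). Rule 1 (IF A OR B THEN C)
# is split into two single-premise rules, one per disjunct.
rules = [
    ({"A"}, "C"), ({"B"}, "C"),          # Rule 1
    ({"D", "E", "F"}, "G"),              # Rule 2
    ({"C", "G"}, "H"),                   # Rule 3
]

def forward_chain(facts, rules, goal):
    facts = set(facts)
    changed = True
    while changed:                        # repeat until no rule adds a new fact
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)     # rule fires: add its conclusion
                changed = True
    return goal in facts

print(forward_chain({"B", "D", "E", "F"}, rules, "H"))  # True: H is proved
```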

(B) Backward chaining :

The opposite of forward chaining is backward chaining.

In contrast to forward chaining, backward chaining is a goal-driven reasoning method. Backward chaining starts from the goal (from the end), which is a hypothetical solution, and the inference engine tries to find matching evidence.

When a matching rule is found, its condition becomes a sub-goal, and rules are then searched to prove these sub-goals; the goal is matched against the RHS of the rules. This process continues until all the sub-goals are proved; if an individual sub-goal cannot be established by one rule, the process backtracks to the step where a rule was chosen and another rule is chosen.

Backward chaining reasoning is good for cases where there are not many facts and the information (facts) must be generated by the user. It is also effective for diagnostic tasks.

In many cases, logic programming languages are implemented using the backward chaining technique. The combination of backward chaining with forward chaining provides better results in many applications. A minimal sketch of backward chaining is given below.
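The sketch below (not from the original notes) reuses the rule format from the forward-chaining example: to prove a goal, either find it among the facts, or find a rule concluding it whose premises can all be proved recursively:

```python
rules = [
    ({"A"}, "C"), ({"B"}, "C"),          # Rule 1 split per disjunct
    ({"D", "E", "F"}, "G"),              # Rule 2
    ({"C", "G"}, "H"),                   # Rule 3
]

def backward_chain(goal, facts, rules):
    # Goal-driven: a goal holds if it is a known fact, or some rule concludes
    # it and every premise of that rule holds recursively. (No cycle detection;
    # this is fine for an acyclic rule base like the one above.)
    if goal in facts:
        return True
    for premises, conclusion in rules:
        if conclusion == goal and all(backward_chain(p, facts, rules) for p in premises):
            return True
    return False

print(backward_chain("H", {"B", "D", "E", "F"}, rules))  # True
```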

Decision criteria for forward or backward reasoning :

1. Are there more possible goal states or start states?

(a) Move from the smaller set of states to the larger.

(b) Is justification of the reasoning required?

2. Prefer the direction that corresponds more closely to the way users think.

3. What kind of event triggers problem-solving?

(a) If it is the arrival of a new fact, forward chaining makes sense.

(b) If it is a query to which a response is required, backward chaining is more natural.

4. In which direction is the branching factor greatest?

(a) Go in the direction with the lower branching factor.


Advantages of forward chaining :

1. It works well when a problem naturally begins by collecting data, from which information is derived for use in later steps.

2. Forward chaining can derive a lot of data from the few initial data or facts available.

3. Forward chaining is a very popular technique for implementing expert systems and systems using production rules in the knowledge base. For expert systems that need interruption, control, monitoring, and planning, forward chaining is the best choice.

4. Forward chaining is very useful to apply when there are few facts and initial states.

Disadvantages of forward chaining :

1. New information will be generated by the inference engine without any knowledge of which information will be used for reaching the goal.

2. The user might be asked to enter many inputs without knowing which input is relevant to the conclusion.

3. Several rules may fire that contribute nothing toward reaching the goal.

4. It might produce different conclusions, which causes a high cost for the chaining process.

Advantages of backward chaining :

1. The system stops processing once the goal variable has its value.

2. A system that uses backward chaining tries to establish goals in the order in which they arrive in the knowledge base.

3. The search in backward chaining is directed.

4. While searching, backward chaining considers only those parts of the knowledge base that are directly related to the considered problem; it never performs unnecessary inferences.

5. Backward chaining is an excellent tool for specific types of problems such as diagnosis and debugging.

6. Compared to forward chaining, less data is asked for, but many rules are searched.


Some disadvantages of backward chaining :

1. The goal must be known in order to perform the backward chaining process.

2. The implementation process of backward chaining is difficult.

Ques 20 : What is utility theory and what is its importance in AI? Explain with the help of suitable examples.

Ans : Utility theory is concerned with people's choices and decisions. It is also concerned with people's preferences and with judgments of preferability, worth, value, goodness, or any of a number of similar concepts. Utility means the quality of being useful; accordingly, each state of the environment has a degree of usefulness to an agent, and the agent will prefer states with higher utility.

Decision Theory = Probability theory + Utility Theory.

Interpretations of utility theory are often classified under two headings, prediction and prescription:

(i) The predictive approach is interested in the ability of a theory to predict actual choice behavior.

(ii) The prescriptive approach is interested in saying how a person ought to make a decision.

E.g.: Psychologists are primarily interested in prediction; economists in both prediction and prescription. In statistics the emphasis is on prescription in decision making under uncertainty, and the emphasis in management science is also prescriptive.

Sometimes it is useful to ignore uncertainty and focus on ultimate choices; at other times uncertainty must be modeled explicitly. Examples: insurance markets, financial markets, game theory. Rather than choosing an outcome directly, the decision-maker chooses an uncertain prospect (or lottery). A lottery is a probability distribution over outcomes.

Expected Utility : The expected utility of action A, given evidence E, is calculated as follows :

EU(A | E) = Σi P(Resulti(A) | Do(A), E) · U(Resulti(A))

where P(Resulti(A) | Do(A), E) is the probability the agent assigns to outcome Resulti(A) of executing action A, Do(A) is the proposition that A is executed in the current state, and U is the agent's utility function.

Utility theory has two basic components: consequences (or outcomes) and lotteries.

(a) Consequences : These are what the decision-maker ultimately cares about. Example: “I get pneumonia, my health insurance company covers most of the costs, but I have to pay a $500 deductible.” The consumer does not choose consequences directly; rather, the consumer chooses a lottery p.

(b) Lotteries : These are probability distributions over consequences: p : C → [0, 1], with Σc∈C p(c) = 1. The set of all lotteries is denoted by P. Example: “A gold-level health insurance plan, which covers all kinds of diseases, but has a $500 deductible.” This makes sense because the consumer is assumed to rank health insurance plans only insofar as they lead to different probability distributions over consequences.


Utility Function : U : P → R has an expected utility form if there exists a function u : C → R such that U(p) = Σc∈C p(c) u(c) for all p ∈ P. In this case the function U is called an expected utility function, and the function u is called a von Neumann-Morgenstern utility function. These functions are used to capture the agent's preferences between various world states. Such a function assigns a single number to express the desirability of a state. Utilities are combined with the outcome probabilities of actions to give an expected utility for each action. U(s) means the utility of state s for the agent's decision.

Maximum Expected Utility (MEU) : This principle states that a rational agent should select the action that maximizes the agent's expected utility. The MEU principle says: “If an agent maximizes a utility function that correctly reflects the performance measure by which its behavior is being judged, then it will achieve the highest possible performance score when averaged over the environments of the agent.”
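A minimal numeric sketch (not from the original notes) of these definitions: lotteries as dicts from consequences to probabilities, a hypothetical utility function u, and action selection by the MEU principle. All names and numbers here are illustrative assumptions:

```python
def expected_utility(lottery, u):
    # U(p) = sum over consequences c of p(c) * u(c)
    return sum(prob * u[c] for c, prob in lottery.items())

# Hypothetical consequences and von Neumann-Morgenstern utilities.
u = {"no_illness": 100, "illness_insured": 60, "illness_uninsured": 10}

# Each action leads to a lottery over consequences (probabilities sum to 1).
actions = {
    "buy_insurance":  {"no_illness": 0.9, "illness_insured": 0.1},
    "skip_insurance": {"no_illness": 0.9, "illness_uninsured": 0.1},
}

# MEU: choose the action whose lottery has maximum expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a], u))
print(best, {a: expected_utility(p, u) for a, p in actions.items()})
# buy_insurance {'buy_insurance': 96.0, 'skip_insurance': 91.0}
```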

Ques 21 : What are the constraint notations in utility theory? Define the term lottery. Also state the following axioms of utility theory :

(i) Orderability (ii) Substitutability (iii) Monotonicity (iv) Decomposability.

Ans : The constraint notations in utility theory for two outcomes/consequences A and B are as mentioned below :

A ≻ B : A is preferred over B.

A ~ B : the agent is indifferent between A and B.

A ≿ B : the agent prefers A to B or is indifferent between them.

A lottery L with possible outcomes C1, C2, C3, …, Cn occurring with probabilities p1, …, pn is written [ p1, C1 ; p2, C2 ; … ; pn, Cn ]. Each outcome of a lottery can be an atomic state or another lottery.

Axioms of Utility Theory :

(i) Orderability : Given any two states, a rational agent must prefer one to the other or else rate the two as equally preferable; the agent cannot avoid the decision.

(A ≻ B) ˅ (B ≻ A) ˅ (A ~ B)

(ii) Substitutability : If an agent is indifferent between two lotteries A and B, then the agent is indifferent between two more complex lotteries that are the same except that B is substituted for A in one of them.

(A ~ B) → [ p, A ; 1 − p, C ] ~ [ p, B ; 1 − p, C ]

(iii) Monotonicity : Suppose two lotteries have the same two outcomes A and B. If (A ≻ B), then the agent prefers the lottery with the higher probability for A.

(A ≻ B) → ( p ≥ q ⇔ [ p, A ; 1 − p, B ] ≿ [ q, A ; 1 − q, B ] )


(iv) Decomposability : Compound lotteries can be reduced (decomposed) to simpler ones :

[ p, A ; 1 − p, [ q, B ; 1 − q, C ] ] ~ [ p, A ; (1 − p)q, B ; (1 − p)(1 − q), C ]

Ques 22 : What is probabilistic reasoning? Why is it required in AI applications?

Ans : Probabilistic reasoning in intelligent systems gives a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty.

Intelligent agents almost never have access to the whole truth about their environment, so agents must act under uncertainty. The agent's knowledge can only provide a degree of belief, and the main tool for dealing with degrees of belief is PROBABILITY THEORY.

If the probability is 0, the belief is that the statement is false.

If the probability is 1, the belief is that the statement is true.

Percepts received from the environment form the evidence on which probability assertions are based. As the agent receives new percepts, its probability assessments are updated to reflect the new evidence.

Before the evidence is obtained, we talk about prior (unconditional) probability; after the evidence is given, we deal with posterior (conditional) probability.

The probability associated with a proposition (sentence) P is the degree of belief associated with it in the absence of any other information.

In AI applications, sample points are defined by a set of random variables, which may be boolean, discrete, or continuous.

Probability Distribution : With respect to some random variable, we talk about the probabilities of all possible outcomes of that variable. E.g., let Weather be a random variable, with :

P(Weather = sunny) = 0.7 , P(Weather = rainy) = 0.2 , P(Weather = cloudy) = 0.08 , P(Weather = snowy) = 0.02.

Joint Probability Distribution : The joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables (i.e., every sample point). In this case P(Weather, Cavity) can be given by a 4 × 2 table of values, known as the joint probability distribution of Weather and Cavity :

Weather =      | Sunny | Rainy | Cloudy | Snowy
Cavity = True  | 0.144 | 0.02  | 0.016  | 0.02
Cavity = False | 0.576 | 0.08  | 0.064  | 0.08

If a complete set of random variables is covered, it is called a “full joint probability distribution”.

Conditional Probability :

Definition of conditional probability : P(a | b) = P(a ∧ b) / P(b), if P(b) ≠ 0.

The product rule gives an alternative formulation : P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a).

A general version holds for whole distributions, e.g., P(Weather, Cavity) = P(Weather | Cavity) P(Cavity).

The chain rule is derived by successive application of the product rule :

P(X1, …, Xn) = P(X1, …, Xn−1) P(Xn | X1, …, Xn−1)
             = P(X1, …, Xn−2) P(Xn−1 | X1, …, Xn−2) P(Xn | X1, …, Xn−1)
             = …
             = ∏i=1..n P(Xi | X1, …, Xi−1).

Applications of probability theory in AI : uncertainty in medical diagnosis.

(i) Diseases produce symptoms.

(ii) In diagnosis, observed symptoms must lead to a disease identification.

(iii) Uncertainties :

• Symptoms may not occur.
• Symptoms may not be reported.
• Diagnostic tests are not perfect (false positives, false negatives).
• There is uncertainty in medical decision-making.

(iv) Physicians and patients must decide on treatments.

(v) Treatments may not be successful.

(vi) Treatments may have unpleasant side effects.

Ques 23 : Explain in detail the Markov model and its applications in Artificial Intelligence.

Ans : Markov Model :

A Markov model is a stochastic model used for systems that do not have any fixed pattern of occurrence, i.e. randomly changing systems.

A Markov model is based on a random probability distribution or pattern that may be analysed statistically but cannot be predicted precisely.


In a Markov model it is assumed that future states depend only upon the current state and not on previously occurring states. In a first-order Markov model, the current state depends only on the immediately preceding state, i.e. the conditional probability is : P(Xt | X0:t−1) = P(Xt | Xt−1).

Set of states : { S1, S2, S3, …, Sn }. The process moves from one state to another, generating a sequence of states.

An observable state sequence leads to a Markov chain model; non-observable states lead to hidden Markov models.

Transition Probability Matrix : Each time a new state is reached, the system is said to have advanced one step. Each step represents a time period which results in another possible state. Let Si be state i of the environment, for i = 1, 2, …, n.

The conditional probability of moving from state Si to Sj is P(Sj | Si) = Pij, where Si is the current state and Sj the next state. Pij = 0 if no transition takes place from Si to Sj.

Transition matrix :

P = [ P11 P12 … P1m
      P21 P22 … P2m
      …
      Pm1 Pm2 … Pmm ]

Markov chain property : the probability of each subsequent state depends only on the previous state : P(Sik | Si1, Si2, …, Sik−1) = P(Sik | Sik−1).

To define a Markov model, the following probabilities have to be specified :

Transition probabilities : aij = P(Sj | Si), i.e. the probability of a transition from state i to state j.

Initial probabilities : πi = P(Si).

The conditional probability of a state sequence is calculated as follows :

P(Si1, Si2, …, Sik−1, Sik) = P(Sik | Si1, Si2, …, Sik−1) · P(Si1, Si2, …, Sik−1)
= P(Sik | Sik−1) · P(Si1, Si2, …, Sik−1)
= P(Sik | Sik−1) · P(Sik−1 | Sik−2) · … · P(Si2 | Si1) · P(Si1).

There are four common Markov models :

(i) Markov decision processes (ii) Markov chains (iii) Hidden Markov models (iv) Partially observable Markov decision processes.

Example : Consider the problem of weather conditions. The transition diagram is described below :


Two states : { ‘Rain’ and ‘Dry’ }

Transition probabilities : P(‘Rain’|‘Rain’) = 0.3 , P(‘Dry’|‘Rain’) = 0.7 , P(‘Rain’|‘Dry’) = 0.2 , P(‘Dry’|‘Dry’) = 0.8

Initial probabilities : say P(‘Rain’) = 0.4 , P(‘Dry’) = 0.6.

Suppose we want to calculate the probability of a sequence of states, say {‘Dry’, ‘Dry’, ‘Rain’, ‘Rain’} :

P({‘Dry’, ‘Dry’, ‘Rain’, ‘Rain’}) = P(‘Rain’|‘Rain’) P(‘Rain’|‘Dry’) P(‘Dry’|‘Dry’) P(‘Dry’)
= 0.3 × 0.2 × 0.8 × 0.6 = 0.0288
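A minimal sketch (not from the original notes) of this chain-rule calculation for the first-order Markov chain above:

```python
# First-order Markov chain for the weather example.
# transition[(prev, cur)] = P(cur | prev)
transition = {("Rain", "Rain"): 0.3, ("Rain", "Dry"): 0.7,
              ("Dry", "Rain"): 0.2, ("Dry", "Dry"): 0.8}
initial = {"Rain": 0.4, "Dry": 0.6}

def sequence_probability(states):
    # P(s1..sk) = P(s1) * product of P(s_i | s_{i-1})
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[(prev, cur)]
    return p

print(sequence_probability(["Dry", "Dry", "Rain", "Rain"]))  # 0.0288
```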

Ques 24 : Explain the Hidden Markov Model and its applications in AI.

Ans : Hidden Markov Model (HMM) :

A hidden Markov model is a temporal probabilistic model in which a single discrete random variable determines the state of the system. An HMM is a stochastic model where the states of the model are hidden; each state can emit an output, which is observed. This model is used because a simple Markov chain is too restrictive for complex applications.

The possible values of the variable are the possible states of the system. For example, sunlight can be the variable and ‘sun’ the only possible state.

To make the Markov model more flexible, in an HMM it is assumed that the observations of the model are a probabilistic function of each state.

Concept of a Hidden Markov Model :

Imagine you were locked in a room for several days and were asked about the weather outside. The only piece of evidence you have is whether the person who comes into the room bringing your daily meal is carrying an umbrella or not.

What is hidden? Sunny, Rainy, Cloudy.

What can you observe? Umbrella or not.

In a hidden Markov model, every individual state has a limited number of transitions and emissions.

The state sequence is not directly observable; rather, it has to be recognized from the sequence of observations produced by the system.

A probability is assigned to each transition between states.

Given the present state, the past states are independent of the future states: the model is memoryless (the Markov property), and it is called hidden because the state sequence itself cannot be observed.

Inference in this model is carried out by two algorithms, called :

(i) the Forward algorithm, and (ii) the Backward algorithm.

Components of an HMM :

Set of states : { S1, S2, S3, …, Sn }.

Sequence of states generated by the system : { Si1, Si2, …, Sik−1, Sik }.

Joint probability distribution by the Markov chain property :

P(Sik | Si1, Si2, …, Sik−1) = P(Sik | Sik−1)

Observations / visible states : { V1, V2, …, Vm−1, Vm }.

For an HMM the following probabilities are to be specified :

(a) Transition probabilities : A = (aij), where aij = P(Sj | Si), the probability of a transition from state i to state j.

(b) Observation probability matrix : B = ( bi(Vm) ), where bi(Vm) = P(Vm | Si).

(c) Vector of initial probabilities : πi = P(Si).

The model is defined as M = (A, B, 𝝅).

Transient state : the process does not return to this state.

Recurrent state : the process eventually returns to it with probability 1.

Absorbing state : if the process enters this state, it is destined to remain there forever.

Applications Of Hidden Markov Model

Speech Recognition.

Gesture Recognition.

Language Recognition.


Motion Sensing and Analysis.

Protein Folding.

Ques 25 : Consider the following data for a weather forecasting scenario.

Two hidden states : ‘Low’ and ‘High’ atmospheric pressure.

Two observations (visible states) : ‘Rain’ and ‘Dry’.

Suppose we want to calculate the probability of a sequence of observations, say {‘Dry’, ‘Rain’}.

Ans : Solution :

Transition probabilities :

P(‘Low’|‘Low’) = 0.3 , P(‘High’|‘Low’) = 0.7 , P(‘Low’|‘High’) = 0.2 , P(‘High’|‘High’) = 0.8

Observation probabilities :

P(‘Rain’|‘Low’) = 0.6 , P(‘Dry’|‘Low’) = 0.4 , P(‘Rain’|‘High’) = 0.4 , P(‘Dry’|‘High’) = 0.6

Initial probabilities : say P(‘Low’) = 0.4 , P(‘High’) = 0.6.


Calculation of the observation sequence probability :

Consider all possible hidden state sequences :

P({‘Dry’, ‘Rain’}) = P({‘Dry’, ‘Rain’}, {‘Low’, ‘Low’}) + P({‘Dry’, ‘Rain’}, {‘Low’, ‘High’}) + P({‘Dry’, ‘Rain’}, {‘High’, ‘Low’}) + P({‘Dry’, ‘Rain’}, {‘High’, ‘High’})

where the first term is :

P({‘Dry’, ‘Rain’}, {‘Low’, ‘Low’}) = P({‘Dry’, ‘Rain’} | {‘Low’, ‘Low’}) P({‘Low’, ‘Low’})
= P(‘Dry’|‘Low’) P(‘Rain’|‘Low’) P(‘Low’) P(‘Low’|‘Low’)
= 0.4 × 0.6 × 0.4 × 0.3 = 0.0288

and the remaining three terms are computed in the same way.
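A minimal brute-force sketch (not from the original notes) that sums over all hidden state sequences, matching the hand calculation above:

```python
from itertools import product

states = ["Low", "High"]
initial = {"Low": 0.4, "High": 0.6}
trans = {("Low", "Low"): 0.3, ("Low", "High"): 0.7,
         ("High", "Low"): 0.2, ("High", "High"): 0.8}   # trans[(s, s')] = P(s' | s)
emit = {("Low", "Rain"): 0.6, ("Low", "Dry"): 0.4,
        ("High", "Rain"): 0.4, ("High", "Dry"): 0.6}    # emit[(s, o)] = P(o | s)

def observation_probability(obs):
    # Sum P(obs, hidden) over every possible hidden state sequence.
    total = 0.0
    for hidden in product(states, repeat=len(obs)):
        p = initial[hidden[0]] * emit[(hidden[0], obs[0])]
        for t in range(1, len(obs)):
            p *= trans[(hidden[t - 1], hidden[t])] * emit[(hidden[t], obs[t])]
        total += p
    return total

# 0.232 in total; the Low-Low term alone is the 0.0288 computed above.
print(observation_probability(["Dry", "Rain"]))
```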

Ques 26 : Explain in detail Bayesian theory and its use in AI. Define the likelihood ratio.

Ans : In probabilistic reasoning our conclusions are generally based on the available evidence and past experience, and this information is mostly incomplete. When outcomes are unpredictable we use probabilistic reasoning, e.g. weather forecasting systems, disease diagnosis, traffic congestion control systems.

A doctor examines a patient's history, symptoms, test results, and evidence of possible diseases.

In weather forecasting, tomorrow's cloud coverage, wind speed and direction, and heat intensity are predicted.

A business manager must make decisions based on uncertain predictions, e.g. when to launch a new product. Factors can include the target consumers' lifestyle, population growth in a specific city or state, the average income of consumers, and the economic scenario of the country. All of this can depend on past experience of the market.

From the product rule of probability theory we can write the following equations :

P(a ∧ b) = P(a | b) P(b) ………. Eq. 1

P(a ∧ b) = P(b | a) P(a) ………. Eq. 2

Equating the two equations : P(b | a) = P(a | b) P(b) / P(a)

Bayes' rule is used in modern AI systems for probabilistic inference. It uses the notion of conditional probability P(H | E), read as “the probability of hypothesis H given that we have observed evidence E”. For this we require the prior probability of H (if we have no evidence) and the extent to which E provides evidence of H.


Baye’s theorem states : P ( Hi | E ) = 𝑷( 𝑬 |𝑯𝒊).𝑷(𝑯𝒊)

∑ 𝑷 (𝑬 |𝑯𝒏).𝑷(𝑯𝒏)𝑲𝒏=𝟏

Where , P ( Hi | E) = Probability that hypothesis Hi is true given evidence E.

P( E | Hi) = Probability that we will observe evidence E given that hypothesis Hi is true.

P( Hi) = Priori probability that Hi is true in absence of E.

K = No. of possible hypothesis.

Example : (i) If we know the prior probabilities of finding each of the various minerals and we know the

probabilities that if mineral is present then certain physical characteristics will be observed. So Baye’s rule

can be used to find likelihood of minerals to be present.

(ii) Let for solving a medical diagnosis problem :

S : patient has spots , F : Patient has high fever , M : Patient has measles.

Without any additional evidence , presence of spots serves as evidence in favour of measles. It also

Serves as evidence of fever measles would cause fever. But if patient has measles is already known.

Alternatively either spots or fever alone would constitute evidence in favour of measles.
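A minimal sketch (not from the original notes) of Bayes' theorem over a set of hypotheses, with hypothetical numbers for the measles/spots example (the notes do not give these values):

```python
def posterior(priors, likelihoods):
    # priors[h] = P(h); likelihoods[h] = P(E | h). Returns P(h | E) for each h.
    norm = sum(likelihoods[h] * priors[h] for h in priors)  # P(E), by total probability
    return {h: likelihoods[h] * priors[h] / norm for h in priors}

# Hypothetical numbers, assumed for illustration only.
priors = {"measles": 0.01, "no_measles": 0.99}
p_spots_given = {"measles": 0.9, "no_measles": 0.05}

print(posterior(priors, p_spots_given))
# {'measles': 0.153..., 'no_measles': 0.846...}: observing spots raises belief in measles
```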

Likelihood Ratio : This is also a conditional probability expression obtained from Bayes' rule. If the probability P(E) is difficult to obtain, we can write :

P(~H | E) = P(E | ~H) P(~H) / P(E) ……. Eq. (i)

P(H | E) = P(E | H) P(H) / P(E) ………. Eq. (ii)

Dividing Eq. (ii) by Eq. (i) we get :

P(H | E) / P(~H | E) = P(E | H) P(H) / ( P(E | ~H) P(~H) ) …………. Eq. (iii)

The ratio of the probability of an event to the probability of its negation is known as the odds of the event, O(E). The ratio P(E | H) / P(E | ~H) is known as the likelihood ratio with respect to H, written L(E | H).

The odds-likelihood form of Bayes' rule, from Eq. (iii), is : O(H | E) = L(E | H) · O(H).

Disadvantages of Baye’s Theorem: For a complex problem , the size of joint probabilities that we

require to compute this function grows as 2 n if n different propositions are there.

Knowledge acquisition is difficult. Too many probabilities are needed.

Sapce for all probabilities is too large.

Computation terms of all probabilities are too large.


Ques 27 : What is a Bayesian network or belief network? Explain its importance with the help of an example.

Ans : To describe the real world, it is not necessary to use a huge joint probability table listing the probabilities of all possible outcomes. To represent the relationships between independent and conditionally independent variables, a systematic approach in the form of a data structure called a Bayesian network is used. It is also known as a causal network, belief network, probabilistic network, or knowledge map. An extension of it is the decision network or influence diagram.

“A Bayesian network is a directed graph in which each node is annotated with quantitative probability information.” The network is supported by a CPT, the conditional probability table. These networks are used for representing knowledge in an uncertain domain.

A belief network is used to encode the meaningful dependences between variables :

1. Nodes represent random variables. 2. Arcs represent direct influence. 3. Each node has a conditional probability table that gives that variable's probability for each combination of states of its parents.

The Semantics of Belief Networks :

1. To construct the network, think of it as representing the joint probability distribution.

2. To infer from the network, think of it as representing conditional independence statements.

3. Calculate an entry of the joint probability distribution by multiplying the individual conditional probabilities :

P(X1 = x1, …, Xn = xn) = P(X1 = x1 | parents(X1)) × … × P(Xn = xn | parents(Xn)), i.e.

P(X1, X2, …, Xn) = ∏i=1..n P(Xi | parents(Xi))

To incrementally construct a network :

1. Decide on the variables.

2. Decide on an ordering for them : direct influences must be added to the network first if they are to become parents of the nodes they influence. So the correct order in which to add nodes is to add the root causes first, then the variables they influence, and so on until we reach the leaves (which have no direct causal influence on other variables). A node is conditionally independent of its non-descendants given its parents, and conditionally independent of all other nodes in the network given its parents, children and children's parents.


3. Repeat until no variables are left :

(a) Pick a variable and make a node for it.

(b) Set its parents to the minimal set of pre-existing nodes.

(c) Define its conditional probability table.

Often the resulting conditional probability tables are much smaller than the exponential size of the full joint distribution. Different tables may encode the same probabilities.

Some canonical distributions that appear in conditional probability tables :

(a) deterministic logical relationships (e.g. AND, OR)

(b) deterministic numeric relationships (e.g. MIN)

(c) parametric relationships (e.g. weighted sum in a neural net)

(d) noisy logical relationships (e.g. noisy-OR, noisy-MAX)

Inference in Belief Networks : After constructing such a network, an inference engine can use it to maintain and propagate beliefs. When new information is received, its effects can be propagated throughout the network until equilibrium probabilities are reached. The kinds of inference are :

(a) Diagnostic inference : from symptoms to causes.

(b) Causal inference : from causes to symptoms.

(c) Intercausal inference.

(d) Mixed inference : a mixture of those above.

Inference in Multiply Connected Belief Networks :

(a) Multiply connected graphs have two nodes connected by more than one path.

(b) Techniques for handling them :

Clustering : group some of the intermediate nodes into one meganode.
Pro : perhaps the best way to get an exact evaluation.
Con : conditional probability tables may increase exponentially in size.

Cutset conditioning : obtain simpler polytrees by instantiating variables as constants.
Con : may obtain an exponential number of simpler polytrees.
Pro : it may be safe to ignore trees with low probability (bounded cutset conditioning).

Stochastic simulation : run through the net with randomly chosen values for each node (weighted by prior probabilities).

The probability of any atomic event (its joint probability) can be obtained from the network.

The correct order in which to add the nodes is the root causes first, then the variables they influence, until we reach the leaves, which have no direct causal influence on the other variables. If we don't follow this order, the network will have more links and will require probabilities that are less natural to specify.

Example : The scenario is a new burglar alarm installed at home. It also responds to minor earthquakes. Two neighbours, John and Mary, are always available in case of any emergency. John always calls when he hears the alarm, but sometimes confuses it with the telephone ringing. Mary likes loud music and sometimes fails to hear the alarm. The probabilities actually summarize a potentially infinite set of circumstances in which the alarm might fail to go off (e.g. high humidity, power failure, dead battery, cut wires, a dead mouse stuck inside the bell, etc.), or in which John or Mary might fail to call and report it (out for lunch, on vacation, temporarily deaf, an airplane passing near the house, etc.).

By the network's conditional independences : P(Burglary | Alarm, JohnCalls, MaryCalls) = P(Burglary | Alarm). So only Alarm is needed as a parent.
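A minimal sketch (not from the original notes) of the joint-probability product for this burglar-alarm network. The CPT numbers below are the classic textbook values, which the notes themselves do not list, so treat them as illustrative assumptions:

```python
# Classic burglar-alarm network: Burglary and Earthquake are parents of Alarm;
# Alarm is the sole parent of JohnCalls and MaryCalls. CPT values assumed
# here for illustration.
p_b = 0.001                     # P(Burglary)
p_e = 0.002                     # P(Earthquake)
p_a = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
p_j = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
p_m = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    # P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
    pj = p_j[a] if j else 1 - p_j[a]
    pm = p_m[a] if m else 1 - p_m[a]
    return pb * pe * pa * pj * pm

# Probability that both neighbours call and the alarm sounds, but there is
# neither a burglary nor an earthquake:
print(joint(False, False, True, True, True))  # ~0.000628
```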


[ END OF 3rd UNIT ]

Anuj Khanna, Assistant Professor (CSE Deptt.)