
CSc411 Artificial Intelligence

Chapter 5

STOCHASTIC METHODS

Contents
• The Elements of Counting
• Elements of Probability Theory
• Applications of the Stochastic Methodology
• Bayes’ Theorem

Application Areas

Diagnostic reasoning. In medical diagnosis, for example, there is not always an obvious cause/effect relationship between the set of symptoms presented by the patient and the causes of these symptoms. In fact, the same sets of symptoms often suggest multiple possible causes.

Natural language understanding. If a computer is to understand and use a human language, that computer must be able to characterize how humans themselves use that language. Words, expressions, and metaphors are learned, but also change and evolve as they are used over time.

Planning and scheduling. When an agent forms a plan, for example, a vacation trip by automobile, it is often the case that no deterministic sequence of operations is guaranteed to succeed. What happens if the car breaks down, if the car ferry is cancelled on a specific day, or if a hotel is fully booked even though a reservation was made?

Learning. The three previous areas mentioned for stochastic technology can also be seen as domains for automated learning. An important component of many stochastic systems is that they have the ability to sample situations and learn over time.

Set Operations

Let A and B be two sets, and U the universe:
– Cardinality |A|: the number of elements in A
– Complement Ā: all elements in U but not in A
– Subset: A ⊆ B
– Empty set: ∅
– Union: A ∪ B
– Intersection: A ∩ B
– Difference: A − B

Addition Rules

The addition rule for combining two sets:

|A ∪ B| = |A| + |B| − |A ∩ B|

The addition rule for combining three sets:

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|

This addition rule may be generalized to any finite number of sets.
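The addition (inclusion–exclusion) rule can be checked directly with Python sets; the sets below are made up for illustration:

```python
# Hypothetical sets to illustrate the addition rule
# |A ∪ B| = |A| + |B| - |A ∩ B|
A = {1, 2, 3, 4}
B = {3, 4, 5}

lhs = len(A | B)                      # |A ∪ B|
rhs = len(A) + len(B) - len(A & B)    # |A| + |B| - |A ∩ B|
assert lhs == rhs

# The three-set version adds back the triple intersection
C = {4, 5, 6}
lhs3 = len(A | B | C)
rhs3 = (len(A) + len(B) + len(C)
        - len(A & B) - len(A & C) - len(B & C)
        + len(A & B & C))
assert lhs3 == rhs3
```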

Multiplication Rules

• The Cartesian product of two sets A and B: A × B = {(a, b) | a ∈ A and b ∈ B}

• The multiplication principle of counting, for two sets: |A × B| = |A| × |B|
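A minimal sketch of the multiplication principle, using `itertools.product` to build the Cartesian product of two made-up sets:

```python
from itertools import product

# Hypothetical sets illustrating |A x B| = |A| * |B|
A = ['red', 'green']
B = [1, 2, 3]

pairs = list(product(A, B))              # the Cartesian product A x B
assert len(pairs) == len(A) * len(B)     # multiplication principle
```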

Permutations and Combinations

• The permutations of a set of n elements taken r at a time: nPr = n! / (n − r)!

• The combinations of a set of n elements taken r at a time: nCr = n! / (r! (n − r)!)
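Both counting formulas are available in Python's standard library, which makes the relationship between them easy to verify (n = 5, r = 2 chosen arbitrarily):

```python
import math

n, r = 5, 2
# Permutations: order matters -> n! / (n - r)!
assert math.perm(n, r) == math.factorial(n) // math.factorial(n - r)
# Combinations: order ignored -> n! / (r! * (n - r)!)
assert math.comb(n, r) == math.perm(n, r) // math.factorial(r)
```

Each combination of r elements gives rise to r! distinct permutations, which is why the two counts differ by exactly that factor.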

Events and Probability

Probability Properties

• The probability of any event E from the sample space S is: p(E) = |E| / |S|

• The sum of the probabilities of all possible outcomes is 1

• The probability of the complement of an event is: p(Ē) = 1 − p(E)

• The probability of the contradictory or false outcome of an event is: p(∅) = 0
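These properties can be checked on a small assumed example (a fair six-sided die, not from the slides), using exact fractions:

```python
from fractions import Fraction

# Sample space of a fair six-sided die (assumed example)
S = {1, 2, 3, 4, 5, 6}
E = {n for n in S if n % 2 == 0}     # event: the roll is even

p_E = Fraction(len(E), len(S))       # p(E) = |E| / |S|
p_not_E = 1 - p_E                    # complement rule: p(Ē) = 1 - p(E)
assert p_E + p_not_E == 1            # probabilities of an event and its
                                     # complement sum to 1
```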

Independent Events

The Kolmogorov Axioms

Three Kolmogorov axioms:

1. For any event E, 0 ≤ p(E) ≤ 1
2. p(S) = 1, where S is the sample space
3. For mutually exclusive events E1 and E2: p(E1 ∪ E2) = p(E1) + p(E2)

From these three Kolmogorov axioms, all of probability theory can be constructed.

Traffic Example

Problem description: a driver notices a gradual slowdown and searches for a possible explanation by means of a car-based download system.
– Road construction?
– Accident?

Three Boolean parameters:
– S: whether there is a slowdown
– A: whether there is an accident
– C: whether there is road construction

Download data – next page

The joint probability distribution for the traffic slowdown, S, accident, A, and construction, C, variables of the example.

A Venn diagram representation of the probability distributions: S is traffic slowdown, A is accident, C is construction.

• Download data:
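A joint distribution over the three Boolean variables can be represented as a table mapping each assignment to a probability. The numbers below are assumed for the sketch (the slide's actual table was a figure); marginalizing, e.g. for p(S = true), just sums the matching rows:

```python
# Assumed joint distribution over (S, A, C); NOT the slide's actual data.
joint = {
    (True,  True,  True):  0.01, (True,  True,  False): 0.09,
    (True,  False, True):  0.15, (True,  False, False): 0.05,
    (False, True,  True):  0.01, (False, True,  False): 0.02,
    (False, False, True):  0.03, (False, False, False): 0.64,
}
# A valid joint distribution sums to 1
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Marginalize out A and C to get p(S = true)
p_slowdown = sum(p for (s, a, c), p in joint.items() if s)
```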

Variables

Expectation
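The slide's formulas were figures; as a minimal sketch, the expectation of a discrete random variable X is ex(X) = Σ x·p(x), shown here for an assumed fair die:

```python
from fractions import Fraction

# Fair six-sided die (assumed example): p(x) = 1/6 for x = 1..6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Expectation: ex(X) = sum over x of x * p(x)
ex = sum(x * p for x, p in pmf.items())
assert ex == Fraction(7, 2)   # i.e. 3.5
```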

Prior and Posterior Probability

Conditional Probability

The conditional probability of d given s: p(d|s) = p(d ∩ s) / p(s), for p(s) > 0.

A Venn diagram illustrating the calculation of p(d|s) as a function of p(s|d).
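A one-line sketch of the definition; the probabilities for disease d and symptom s are assumed, not from the slides:

```python
# Assumed values: p(d ∩ s) = 0.08, p(s) = 0.10
p_d_and_s = 0.08   # probability of disease and symptom together
p_s = 0.10         # probability of the symptom

# Conditional probability: p(d|s) = p(d ∩ s) / p(s)
p_d_given_s = p_d_and_s / p_s
```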

Chain Rules

The chain rule for two sets:

p(A ∩ B) = p(A|B) p(B)

The generalization of the chain rule to multiple sets:

p(A1 ∩ A2 ∩ … ∩ An) = p(A1) p(A2|A1) p(A3|A1 ∩ A2) … p(An|A1 ∩ … ∩ An−1)

We make an inductive argument to prove the chain rule. Consider the nth case:

p(A1 ∩ A2 ∩ … ∩ An)

We apply the rule for the intersection of two sets to get:

p(A1 ∩ A2 ∩ … ∩ An) = p(An|A1 ∩ … ∩ An−1) p(A1 ∩ … ∩ An−1)

and then reduce again in the same way, until the base case, which we have already demonstrated, is reached.
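The three-variable chain rule can be verified numerically on an assumed joint distribution (the values below are illustrative only):

```python
# Assumed joint distribution over three binary variables (A, B, C)
joint = {
    (True,  True,  True):  0.10, (True,  True,  False): 0.10,
    (True,  False, True):  0.05, (True,  False, False): 0.25,
    (False, True,  True):  0.05, (False, True,  False): 0.15,
    (False, False, True):  0.10, (False, False, False): 0.20,
}

def p(pred):
    """Sum the probability of all assignments satisfying pred."""
    return sum(v for k, v in joint.items() if pred(k))

p_abc = p(lambda k: k == (True, True, True))   # p(A ∩ B ∩ C)
p_ab  = p(lambda k: k[0] and k[1])             # p(A ∩ B)
p_a   = p(lambda k: k[0])                      # p(A)

# Chain rule: p(A ∩ B ∩ C) = p(C|A ∩ B) * p(B|A) * p(A)
chain = (p_abc / p_ab) * (p_ab / p_a) * p_a
assert abs(chain - p_abc) < 1e-9
```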

Independent Events

Two events A and B are independent if and only if p(A ∩ B) = p(A) p(B); equivalently, p(A|B) = p(A) when p(B) > 0.

Probabilistic FSM

Probabilistic Finite State Acceptor

A probabilistic finite state acceptor for the pronunciation of “tomato”.
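In a probabilistic FSA, the score of a path is the product of its transition probabilities. The phone sequence and probabilities below are assumed for the sketch; the slide's actual values for “tomato” were in a figure:

```python
# One assumed path through a probabilistic acceptor: (phone, transition prob)
path = [('t', 1.0), ('ow', 0.35), ('m', 1.0),
        ('ey', 0.5), ('t', 0.8), ('ow', 1.0)]

prob = 1.0
for phone, p in path:
    prob *= p     # multiply transition probabilities along the path

# The path score is itself a probability
assert 0.0 < prob <= 1.0
```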

The “ni” words with their frequencies and probabilities from the Brown and Switchboard corpora of 2.5M words.

The “ni” phone/word probabilities from the Brown and Switchboard corpora.

Bayes’ Rules

• Given a set of evidence E and a set of hypotheses H = {hi}, the conditional probability of hi given E is:

p(hi|E) = p(E|hi) p(hi) / p(E)

• Maximum a posteriori hypothesis (the most probable hypothesis); since p(E) is a constant for all hypotheses:

arg max(hi) p(E|hi) p(hi)

• E is partitioned by all hypotheses, thus:

p(E) = Σi p(E|hi) p(hi)
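The rules above can be sketched in a few lines; the priors and likelihoods are assumed values for illustration:

```python
# Assumed values for three hypotheses h0, h1, h2
priors      = [0.5, 0.3, 0.2]    # p(h_i)
likelihoods = [0.2, 0.6, 0.1]    # p(E | h_i)

# p(E) = sum_i p(E|h_i) p(h_i), since the hypotheses partition E
p_E = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' rule: p(h_i|E) = p(E|h_i) p(h_i) / p(E)
posteriors = [l * p / p_E for l, p in zip(likelihoods, priors)]
assert abs(sum(posteriors) - 1.0) < 1e-9

# The MAP hypothesis maximizes p(E|h_i) p(h_i); p(E) can be ignored
map_index = max(range(3), key=lambda i: likelihoods[i] * priors[i])
```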

General Form of Bayes’ Theorem

The general form of Bayes’ theorem, where we assume the set of hypotheses H partitions the evidence set E:

p(hi|E) = p(E|hi) p(hi) / Σk p(E|hk) p(hk)

Applications of Bayes’ Theorem

• Used in PROSPECTOR

• A simple example: suppose you want to purchase an automobile:

Dealer    Probability of going to dealer    Probability of purchase at dealer
1         d1 = 0.2                          p1 = 0.2
2         d2 = 0.4                          p2 = 0.4
3         d3 = 0.4                          p3 = 0.3

• The application of Bayes’ rule to the car purchase problem:
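Using the table's numbers, the posterior probability that the purchase happened at dealer i is d_i·p_i normalized over all dealers:

```python
# The slide's dealer data: d[i] = p(go to dealer i+1),
# p[i] = p(purchase | dealer i+1)
d = [0.2, 0.4, 0.4]
p = [0.2, 0.4, 0.3]

# Total probability of a purchase: 0.04 + 0.16 + 0.12 = 0.32
p_purchase = sum(di * pi for di, pi in zip(d, p))

# Bayes' rule: p(dealer i | purchase) = d_i * p_i / p(purchase)
posterior = [di * pi / p_purchase for di, pi in zip(d, p)]
# Dealer 2 is the most likely source of the purchase
```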

Bayes Classifier

• Naïve Bayes, or the Bayes classifier, uses the partition assumption even when it is not justified.

• Assume all pieces of evidence are independent, given a particular hypothesis:

p(E|hi) = Πj p(ej|hi)
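A minimal naïve Bayes sketch: under the independence assumption, each hypothesis scores its prior times the product of per-evidence likelihoods. All numbers and hypothesis names below are assumed for illustration:

```python
# Assumed priors p(h) and likelihoods p(e|h) for two hypotheses
hypotheses = {
    'flu':  {'prior': 0.3, 'fever': 0.9, 'cough': 0.8},
    'cold': {'prior': 0.7, 'fever': 0.2, 'cough': 0.6},
}
evidence = ['fever', 'cough']

scores = {}
for h, params in hypotheses.items():
    score = params['prior']
    for e in evidence:
        score *= params[e]      # independence: multiply p(e|h)
    scores[h] = score

# Normalize to get posteriors, then pick the most probable hypothesis
total = sum(scores.values())
posteriors = {h: s / total for h, s in scores.items()}
best = max(posteriors, key=posteriors.get)
```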

The Traffic Problem

The Bayesian representation of the traffic problem with potential explanations.

The joint probability distribution for the traffic and construction variables.

Given bad traffic, what is the probability of road construction?

p(C|T) = p(C = t, T = t) / (p(C = t, T = t) + p(C = f, T = t)) = 0.3 / (0.3 + 0.1) = 0.75
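The slide's calculation, expressed directly in code using the two joint-probability entries it gives:

```python
# From the slide's joint distribution:
p_Ct_Tt = 0.3    # p(C = t, T = t)
p_Cf_Tt = 0.1    # p(C = f, T = t)

# Condition on T = t: divide by the sum over the remaining values of C
p_C_given_T = p_Ct_Tt / (p_Ct_Tt + p_Cf_Tt)
```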