Bayesian Decision Theory: Basic Concepts, Discriminant Functions, The Normal Density, ROC Curves

Page 1: Bayesian Decision Theory Basic Concepts Discriminant Functions The Normal Density ROC Curves.

Bayesian Decision Theory

Basic Concepts

Discriminant Functions

The Normal Density

ROC Curves

Page 2:

Figure 1.1

Page 3:

Basic Concepts

Definitions.

State of nature. Let c denote the state of nature; c is a random variable (e.g., c = c1 for sea bass or c = c2 for salmon).

A priori probabilities. Let P(c1) and P(c2) denote the a priori probabilities of c1 and c2, respectively.

We know P(c1) + P(c2) = 1.

Decision rule. Decide c1 if P(c1) > P(c2); otherwise decide c2.

But we know this rule will make many mistakes…
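As a quick sketch of the prior-only rule above (the prior values are made-up numbers for the sea bass/salmon example):

```python
# Prior-only decision rule sketch; the priors are illustrative values for
# the sea bass (c1) / salmon (c2) example, not data from the source.
P_c1, P_c2 = 0.7, 0.3

# Always pick the class with the larger prior.
decision = "c1" if P_c1 > P_c2 else "c2"

# With no observation, we err whenever the other class occurs,
# so the error rate equals the smaller prior.
error_rate = min(P_c1, P_c2)
```

This makes the "many mistakes" point concrete: here the rule is wrong 30% of the time, no matter which fish we actually see.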

Page 4:

Basic Concepts

Class-conditional probability density function.

Let x be a continuous random variable. p(x|c) is the probability density for x given the state of nature c.

For example, what is the probability of lightness given that the class is salmon?

p(lightness | salmon) ?

Or what is the probability of lightness given that the class is sea bass?

p(lightness | sea bass)?

Page 5:

Figure 2.1

Page 6:

Bayes Formula

How do we combine a priori and class-conditional probabilities to know the probability of a state of nature?

Bayes Formula.

P(cj|x) = p(x|cj) P(cj) / p(x)

where P(cj|x) is the posterior probability, p(x|cj) is the likelihood, P(cj) is the prior probability, and p(x) is the evidence.

Bayes Decision: Choose c1 if P(c1|x) > P(c2|x); otherwise choose c2.
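A minimal numeric sketch of Bayes' formula and the resulting decision; the likelihoods and priors below are assumed values for a single observed x, not values from the source:

```python
# Bayes formula for two classes (illustrative numbers).
p_x_c1, p_x_c2 = 0.2, 0.6  # p(x|c1), p(x|c2), assumed likelihoods
P_c1, P_c2 = 0.7, 0.3      # P(c1), P(c2), assumed priors

evidence = p_x_c1 * P_c1 + p_x_c2 * P_c2  # p(x)
post_c1 = p_x_c1 * P_c1 / evidence        # P(c1|x)
post_c2 = p_x_c2 * P_c2 / evidence        # P(c2|x)

decision = "c1" if post_c1 > post_c2 else "c2"
```

Note how a strong likelihood for c2 can overturn a larger prior for c1: the posteriors, not the priors alone, drive the decision.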

Page 7:

Figure 2.2

Page 8:

Simplifying Bayes Rule

Bayes Formula.

P(cj | x) = p(x|cj) P(cj) / p(x)

The evidence p(x) is the same for all states or classes, so we can dispense with it when making a decision.

Rule: Choose c1 if p(x|c1)P(c1) > p(x|c2)P(c2); otherwise decide c2.

Observations:
• If p(x|c1) = p(x|c2), then the decision depends on the prior probabilities only.
• If P(c1) = P(c2), then the decision depends on the likelihoods only.
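A small check that dropping the evidence leaves the decision unchanged, since p(x) scales both posteriors equally (numbers are illustrative):

```python
# Compare the full-posterior rule with the simplified likelihood-times-prior
# rule; the likelihoods and priors are made-up values.
p_x_c1, p_x_c2 = 0.2, 0.6
P_c1, P_c2 = 0.7, 0.3

evidence = p_x_c1 * P_c1 + p_x_c2 * P_c2
with_evidence = "c1" if p_x_c1 * P_c1 / evidence > p_x_c2 * P_c2 / evidence else "c2"
without_evidence = "c1" if p_x_c1 * P_c1 > p_x_c2 * P_c2 else "c2"
```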

Page 9:

Thomas Bayes

Born in London (1701). Studied logic and theology (Univ. of Edinburgh). Fellow of the Royal Society (1742).

Given white and black balls in an urn, what is the prob. of drawing one or the other?

Given one or more balls, what can be said about the number of balls in the urn?

Page 10:

Minimizing Error

What is the probability of error?

P(error|x) = P(c1|x) if we decide c2
P(error|x) = P(c2|x) if we decide c1

Does Bayes rule minimize the probability of error?

P(error) = ∫ P(error,x) dx = ∫ P(error|x) p(x) dx

and if for every x we choose the decision that minimizes P(error|x), then P(error) is as small as it can be.

The answer is "yes".
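The claim can be checked numerically. Below is a sketch assuming two unit-variance normal classes with equal priors (the means are made-up values): the Bayes threshold, which picks the larger posterior at every x, yields a lower P(error) than any shifted threshold.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Two unit-variance classes with equal priors; the means are illustrative.
mu1, mu2 = 0.0, 2.0

def p_error(threshold, lo=-10.0, hi=12.0, n=20000):
    # P(error) = integral of P(error|x) p(x) dx, by a midpoint Riemann sum.
    # Decide c1 when x < threshold, c2 otherwise.
    dx = (hi - lo) / n
    err = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        if x < threshold:
            err += 0.5 * normal_pdf(x, mu2) * dx  # decided c1, true class c2
        else:
            err += 0.5 * normal_pdf(x, mu1) * dx  # decided c2, true class c1
    return err

bayes_threshold = (mu1 + mu2) / 2  # Bayes boundary for equal priors and variances
e_bayes = p_error(bayes_threshold)
e_other = p_error(bayes_threshold + 0.5)  # a shifted threshold does worse
```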

Page 11:

Loss Function

Let {c1, c2, …} be the possible states of nature. Let {a1, a2, …} be the possible actions. The loss function λ(ai|cj) is the loss incurred for taking action ai when the state of nature is cj.

Suppose we observe a particular x and consider taking action ai. If the true state of nature is cj, the loss incurred will be λ(ai|cj).

Expected loss: R(ai|x) = Σj λ(ai|cj) P(cj|x) (also called the conditional risk).

Decision: Select the action that minimizes the conditional risk (this gives the best possible performance).
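A minimal sketch of computing conditional risks and picking the risk-minimizing action; the loss matrix and posteriors are illustrative numbers, not values from the source:

```python
# Conditional risk R(ai|x) = sum_j lambda(ai|cj) * P(cj|x).
loss = [[0.0, 2.0],   # lambda(a1|c1), lambda(a1|c2)  (made-up losses)
        [1.0, 0.0]]   # lambda(a2|c1), lambda(a2|c2)
posterior = [0.4, 0.6]  # P(c1|x), P(c2|x)  (made-up posteriors)

risk = [sum(loss[i][j] * posterior[j] for j in range(2)) for i in range(2)]
best_action = risk.index(min(risk))  # Bayes decision: minimize conditional risk
```

Here action a2 wins even though c2 is more probable only moderately, because mistaking c2 for c1 is penalized twice as heavily.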

Page 12:

Two Categories

How does this apply when we only have two categories? Assume λij = λ(ai|cj).

R(a1|x) = λ11 P(c1|x) + λ12 P(c2|x)
R(a2|x) = λ21 P(c1|x) + λ22 P(c2|x)

Decide a1 if

(λ21 - λ11) P(c1|x) > (λ12 - λ22) P(c2|x)

Page 13:

Two Categories

Another way of stating this is as follows.

Decide a1 if

p(x|c1) / p(x|c2) > [(λ12 - λ22) / (λ21 - λ11)] P(c2) / P(c1)

Stated as:

Choose a1 if the likelihood ratio exceeds a threshold value independent of the observation x.
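The equivalence of the risk form and the likelihood-ratio form can be verified numerically; all losses, priors, and likelihoods below are made-up numbers:

```python
# Check that the likelihood-ratio rule agrees with the risk-based rule.
lam = [[0.0, 2.0],
       [1.0, 0.0]]          # lam[i][j] = lambda(a_{i+1} | c_{j+1}), illustrative
P_c1, P_c2 = 0.6, 0.4       # priors (assumed)
p_x_c1, p_x_c2 = 0.5, 0.3   # likelihoods at the observed x (assumed)

# Risk form: decide a1 if (l21 - l11) p(x|c1) P(c1) > (l12 - l22) p(x|c2) P(c2)
lhs = (lam[1][0] - lam[0][0]) * p_x_c1 * P_c1
rhs = (lam[0][1] - lam[1][1]) * p_x_c2 * P_c2
risk_rule = "a1" if lhs > rhs else "a2"

# Likelihood-ratio form: the threshold theta does not depend on x.
ratio = p_x_c1 / p_x_c2
theta = (lam[0][1] - lam[1][1]) / (lam[1][0] - lam[0][0]) * P_c2 / P_c1
lr_rule = "a1" if ratio > theta else "a2"
```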

Page 14:

Figure 2.3

Page 15:

Zero-One Loss

We will normally be concerned with the symmetrical or zero-one loss function:

λ(ai|cj) = 0 if i = j
           1 if i ≠ j

In this case the conditional risk is:

R(ai|x) = Σj λ(ai|cj) P(cj|x) = Σj≠i P(cj|x)

= 1 - P(ci|x)
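A short sketch confirming that under zero-one loss the conditional risk equals 1 - P(ci|x), so minimizing risk is the same as maximizing the posterior (the posteriors below are illustrative and sum to 1):

```python
# Zero-one loss: conditional risk reduces to 1 - P(ci|x).
posterior = [0.2, 0.5, 0.3]  # illustrative P(cj|x)
n = len(posterior)
loss = [[0 if i == j else 1 for j in range(n)] for i in range(n)]

risk = [sum(loss[i][j] * posterior[j] for j in range(n)) for i in range(n)]
matches_formula = all(abs(risk[i] - (1.0 - posterior[i])) < 1e-12 for i in range(n))
best = risk.index(min(risk))  # minimizing risk = maximizing the posterior
```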

Page 16:

Exercise

This question refers to Bayes classifiers. Let {C1, C2, …, Cn} be the possible states of nature or classes. Let {a1, a2, …, ak} be the possible actions. We define a loss function λ(ai|Cj) as the loss incurred for taking action ai when the state of nature is Cj. Let's define a new loss function as follows:

λ(ai|Cj) = 0 if i = j and λ(ai|Cj) = 3 if i ≠ j

Assume a uniform distribution on the posterior probabilities P(Cj|x) (i.e., all posteriors are equal, so P(Cj|x) = 1/n).

What is the expected risk R(ai | x) ?

Page 17:

Exercise

R(ai|x) = Σj λ(ai|Cj) P(Cj|x) = Σj λ(ai|Cj) (1/n)

= (1/n) 3(n - 1) = 3 - 3/n = 3(1 - 1/n)
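The closed-form answer can be checked by summing the risk directly (assuming, as in the exercise, uniform posteriors 1/n and the 0/3 loss):

```python
# Numerical check of the exercise answer.
def expected_risk(n, penalty=3.0):
    # R(ai|x) = sum_j loss(ai|Cj) * P(Cj|x); by symmetry it is the same for
    # every action i, so we compute it for i = 0.
    return sum((0.0 if j == 0 else penalty) * (1.0 / n) for j in range(n))

def closed_form(n):
    return 3.0 * (1.0 - 1.0 / n)
```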

Page 18:

Discriminant Functions

How do we represent pattern classifiers?

The most common way is through discriminant functions. Remember we use {c1, c2, …} to denote the possible states of nature.

For each class we create a discriminant function gi(x).

The classifier assigns class ci if

gi(x) > gj(x) for all j ≠ i

Our classifier is a network or machine that computes c discriminant functions.
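A minimal discriminant-function classifier in code: assign class i when gi(x) exceeds every other gj(x). The two linear discriminants are made up for illustration:

```python
# Two illustrative linear discriminant functions (assumed, not from the source).
def g1(x):
    return 2.0 * x + 1.0

def g2(x):
    return -1.0 * x + 4.0

def classify(x):
    scores = [g1(x), g2(x)]
    # argmax over the discriminant functions (1-based class index)
    return scores.index(max(scores)) + 1
```

The decision boundary sits where g1(x) = g2(x), here at x = 1.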

Page 19:

Figure 2.5

Page 20:

Simplifying the Discriminant Functions

Notice the decision is the same if we replace every gi(x) with f(gi(x)), assuming f(·) is a monotonically increasing function.

Alternatives:

• gi(x) = P(ci|x)
• gi(x) = p(x|ci) P(ci)
• gi(x) = ln p(x|ci) + ln P(ci) (where ln is the natural logarithm)

The net effect is to divide the feature space into c regions (one for each class). We then have c decision regions separated by decision boundaries.
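A quick check that taking logarithms leaves the decision unchanged, since ln is monotonically increasing (the likelihoods and priors are illustrative numbers):

```python
import math

# Compare gi(x) = p(x|ci) P(ci) with its log form.
p_x_c = [0.20, 0.05]  # likelihoods p(x|ci) for one observed x (assumed)
P_c = [0.3, 0.7]      # priors (assumed)

g_prod = [p_x_c[i] * P_c[i] for i in range(2)]
g_log = [math.log(p_x_c[i]) + math.log(P_c[i]) for i in range(2)]

# The argmax is identical under the monotone transform.
same_argmax = g_prod.index(max(g_prod)) == g_log.index(max(g_log))
```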

Page 21:

Figure 2.6

Page 22:

Normal Density

Recall that using Bayes rule a discriminant function needs two probabilities:

gi(x) = p(x|ci) P(ci)

How do we model the likelihood or class-conditional probability?

The most common approach is to assume a multivariate normal density. In the univariate case:

p(x) = [1 / (σ √(2π))] e^(-1/2 [(x - u)/σ]²)

u = E[x] = ∫ x p(x) dx

σ² = E[(x - u)²] = ∫ (x - u)² p(x) dx
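The univariate density and its moments can be sketched and checked numerically (the mean and standard deviation below are arbitrary illustrative values):

```python
import math

def normal_pdf(x, u, sigma):
    # p(x) = 1 / (sigma * sqrt(2*pi)) * exp(-0.5 * ((x - u) / sigma)**2)
    return math.exp(-0.5 * ((x - u) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Check the density integrates to ~1, with mean ~u and variance ~sigma^2,
# using a midpoint Riemann sum over +/- 10 sigma.
u, sigma = 1.0, 2.0
lo, hi, n = u - 10 * sigma, u + 10 * sigma, 20000
dx = (hi - lo) / n
total = mean = var = 0.0
for i in range(n):
    x = lo + (i + 0.5) * dx
    p = normal_pdf(x, u, sigma)
    total += p * dx
    mean += x * p * dx
    var += (x - u) ** 2 * p * dx
```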

Page 23:

Figure 2.7

Page 24:

Normal Density

If the features are statistically independent and the variance is the same for all features, the discriminant functions are simple and linear.

A classifier that uses linear discriminant functions is called a linear machine.

The decision surfaces are pieces of hyperplanes defined by linear equations.

Page 25:

Figure 2.10

Page 26:

Figure 2.11

Page 27:

Signal Detection Theory

Suppose we want to detect a single pulse from a signal.

We assume the signal has some random noise.

When the signal is present we observe a normal distribution with mean u2.
When the signal is not present we observe a normal distribution with mean u1.
We assume the same standard deviation σ in both cases.

Can we measure the discriminability of the problem? Can we do this independently of the threshold x*?

Discriminability:

d’ = | u2 – u1 | / σ
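A tiny sketch of d' with made-up means and standard deviation, also showing that d' is unit-free (rescaling the measurement scale leaves it unchanged):

```python
# d' = |u2 - u1| / sigma, a unit-free separation measure (illustrative numbers).
u1, u2, sigma = 0.0, 3.0, 1.5
d_prime = abs(u2 - u1) / sigma

# Rescaling all measurements by a constant k leaves d' unchanged.
k = 10.0
d_scaled = abs(k * u2 - k * u1) / (k * sigma)
```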

Page 28:

Figure 2.19

Page 29:

Signal Detection Theory

How do we find d’ if we do not know u1, u2, or x*?

From the data we can compute:

1. P( x > x* | c2) a hit.

2. P( x > x* | c1) a false alarm.

3. P( x < x* | c2) a miss.

4. P( x < x* | c1) a correct rejection.

If we plot the hit rate against the false alarm rate as the threshold x* varies, we obtain the ROC (receiver operating characteristic) curve.

With it we can distinguish between discriminability and bias.
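The steps above can be sketched numerically. Assuming two equal-variance normal distributions (the means and σ below are made-up values), we can sweep the threshold to trace the ROC curve and recover d' from any single (hit, false alarm) operating point via d' = Φ⁻¹(hit) - Φ⁻¹(false alarm):

```python
import math

def Phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_Phi(p, lo=-8.0, hi=8.0):
    # Inverse CDF by bisection (a sketch; adequate for illustration).
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Illustrative noise (c1) and signal (c2) distributions, same sigma.
u1, u2, sigma = 0.0, 2.0, 1.0
d_true = abs(u2 - u1) / sigma

def rates(x_star):
    hit = 1.0 - Phi((x_star - u2) / sigma)  # P(x > x* | c2), hit
    fa = 1.0 - Phi((x_star - u1) / sigma)   # P(x > x* | c1), false alarm
    return hit, fa

# Sweeping the threshold traces the ROC curve.
roc = [rates(t / 10.0) for t in range(-30, 51)]

# d' recovered from a single operating point, independent of x*.
hit, fa = rates(0.7)
d_est = inv_Phi(hit) - inv_Phi(fa)
```

This is exactly the sense in which the ROC separates discriminability (d', a property of the two distributions) from bias (where the observer puts x*).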

Page 30:

Figure 2.20

Page 31:

Points to Remember

Bayes formula and Bayes decision theory

Loss functions (e.g., zero-one loss)

Discriminant functions

ROC curves