Last lecture summary Naïve Bayes Classifier


Page 1: Last lecture summary Naïve Bayes Classifier

Last lecture summary: Naïve Bayes Classifier

Page 2: Last lecture summary Naïve Bayes Classifier

Bayes Rule

$$\underbrace{P(Y \mid X)}_{\text{Posterior}} = \frac{\overbrace{P(X \mid Y)}^{\text{Likelihood}}\ \overbrace{P(Y)}^{\text{Prior}}}{\underbrace{P(X)}_{\text{Normalization Constant}}}$$

Prior and likelihood must be learnt (i.e. estimated from the data).

Page 3: Last lecture summary Naïve Bayes Classifier

• Learning the prior – a hundred independently drawn training examples will usually suffice to obtain a reasonable estimate of P(Y).

• Learning the likelihood – the Naïve Bayes Assumption: assume that all features are independent given the class label Y:

$$P(X_1, \ldots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$$

Page 4: Last lecture summary Naïve Bayes Classifier

Example – Play Tennis

Page 5: Last lecture summary Naïve Bayes Classifier

Example – Learning Phase

Outlook      Play=Yes   Play=No
Sunny        2/9        3/5
Overcast     4/9        0/5
Rain         3/9        2/5

Temperature  Play=Yes   Play=No
Hot          2/9        2/5
Mild         4/9        2/5
Cool         3/9        1/5

Humidity     Play=Yes   Play=No
High         3/9        4/5
Normal       6/9        1/5

Wind         Play=Yes   Play=No
Strong       3/9        3/5
Weak         6/9        2/5

P(Play=Yes) = 9/14 P(Play=No) = 5/14

P(Outlook=Sunny|Play=Yes) = 2/9
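
These tables are obtained simply by counting. As a minimal sketch (not part of the original slides), assuming the training data is available as a list of dicts with a "Play" label, the prior and the conditional probability tables could be estimated like this:

```python
from collections import Counter, defaultdict

def learn_naive_bayes(records, label_key="Play"):
    """Estimate P(label) and P(feature=value | label) by counting.

    `records` is assumed to be a list of dicts such as
    {"Outlook": "Sunny", "Temperature": "Hot", ..., "Play": "No"}.
    """
    label_counts = Counter(r[label_key] for r in records)
    priors = {y: n / len(records) for y, n in label_counts.items()}

    # counts[feature][(value, label)] = number of rows with that value and label
    counts = defaultdict(Counter)
    for r in records:
        y = r[label_key]
        for feat, val in r.items():
            if feat != label_key:
                counts[feat][(val, y)] += 1

    likelihoods = {
        feat: {(val, y): n / label_counts[y] for (val, y), n in c.items()}
        for feat, c in counts.items()
    }
    return priors, likelihoods
```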

Page 6: Last lecture summary Naïve Bayes Classifier

Example – Prediction
x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

Look up the tables:

P(Outlook=Sunny|Play=Yes) = 2/9
P(Temp=Cool|Play=Yes) = 3/9
P(Hum=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14

P(Outlook=Sunny|Play=No) = 3/5
P(Temp=Cool|Play=No) = 1/5
P(Hum=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14

P(Yes|x') ∝ P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes) P(Play=Yes) = 0.0053
P(No|x') ∝ P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No) P(Play=No) = 0.0206

Since P(Yes|x') < P(No|x'), we label x' as "No".
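
The same computation as a small sketch (illustrative, not from the slides), with the relevant table entries hard-coded:

```python
# Conditional probabilities taken from the learning-phase tables above.
p_yes = {"Sunny": 2/9, "Cool": 3/9, "High": 3/9, "Strong": 3/9}
p_no  = {"Sunny": 3/5, "Cool": 1/5, "High": 4/5, "Strong": 3/5}
prior = {"Yes": 9/14, "No": 5/14}

x = ["Sunny", "Cool", "High", "Strong"]   # the query pattern x'

score_yes, score_no = prior["Yes"], prior["No"]
for value in x:
    score_yes *= p_yes[value]             # multiply in P(value | Yes)
    score_no  *= p_no[value]              # multiply in P(value | No)

print(f"score(Yes|x') = {score_yes:.4f}")  # ~0.0053
print(f"score(No|x')  = {score_no:.4f}")   # ~0.0206
print("Prediction:", "Yes" if score_yes > score_no else "No")
```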

Page 7: Last lecture summary Naïve Bayes Classifier

Last lecture summary: Binary classifier performance

Page 8: Last lecture summary Naïve Bayes Classifier

TP, TN, FP, FN

Precision, Positive Predictive Value (PPV): TP / (TP + FP)

Recall, Sensitivity, True Positive Rate (TPR), Hit rate: TP / P = TP / (TP + FN)

False Positive Rate (FPR), Fall-out: FP / N = FP / (FP + TN)

Specificity, True Negative Rate (TNR): TN / (TN + FP) = 1 − FPR

Accuracy: (TP + TN) / (TP + TN + FP + FN)
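
A small sketch computing these measures from the raw counts (the function name is illustrative, not from the slides):

```python
def binary_classifier_metrics(tp, tn, fp, fn):
    """Compute the performance measures listed above from raw counts."""
    precision   = tp / (tp + fp)                    # PPV
    recall      = tp / (tp + fn)                    # sensitivity, TPR, hit rate
    fpr         = fp / (fp + tn)                    # fall-out
    specificity = tn / (tn + fp)                    # TNR = 1 - FPR
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "FPR": fpr,
            "specificity": specificity, "accuracy": accuracy}
```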

Page 9: Last lecture summary Naïve Bayes Classifier
Page 10: Last lecture summary Naïve Bayes Classifier

Neural networks (new stuff)

Page 11: Last lecture summary Naïve Bayes Classifier

Biological motivation

• The human brain has been estimated to contain ~10^11 brain cells (neurons).

• A neuron is an electrically excitable cell that processes and transmits information by electrochemical signaling.

• Each neuron is connected with other neurons through connections called synapses.

• A typical neuron possesses a cell body (often called the soma), dendrites (many, on the order of millimetres long), and an axon (one, 10 cm – 1 m long).

Page 12: Last lecture summary Naïve Bayes Classifier
Page 13: Last lecture summary Naïve Bayes Classifier

• A synapse permits a neuron to pass an electrical or chemical signal to another cell.

• A synapse can be either excitatory or inhibitory.

• Synapses are of different strengths (the stronger the synapse is, the more important it is).

• The effects of synapses accumulate inside the neuron.

• When the cumulative effect of synapses reaches a certain threshold, the neuron gets activated and a signal is sent to the axon, through which the neuron is connected to other neuron(s).

Page 14: Last lecture summary Naïve Bayes Classifier

• Simplistic view of the function of a neuron:
– The neuron accumulates positive/negative stimuli from other neurons.
– The accumulated stimulus is then processed further to produce an output, i.e. the neuron sends an output signal to the neurons connected to it.

Page 15: Last lecture summary Naïve Bayes Classifier

Neural Networks for Applied Sciences and Engineering, Samarasinghe

Page 16: Last lecture summary Naïve Bayes Classifier

Warren McCulloch (1899–1969) and Walter Pitts (1923–1969)

Threshold neuron

Page 17: Last lecture summary Naïve Bayes Classifier

• 1st mathematical model of a neuron – the McCulloch & Pitts binary (threshold) neuron
– only binary inputs and output
– the weights are pre-set, no learning

x1    x2    t
0.2   0.3   0
0.2   0.8   0
0.8   0.2   0
1.0   0.8   1

– inputs – weights – activation (transfer) function – output

Page 18: Last lecture summary Naïve Bayes Classifier

• In this exercise, both weights will be fixed.

• When is the target classified as 0 and when as 1?

• Set the threshold.
– If the weighted sum of the inputs reaches the threshold, the example is classified as 1.
– If it stays below the threshold, it is classified as 0.

• Which threshold would you use? – e.g. 1.3

$$\sum_{j=1}^{2} w_j x_j = w_1 x_1 + w_2 x_2 = x_1 + x_2 = \mathbf{w} \cdot \mathbf{x}$$

x1    x2    t
0.2   0.3   0
0.2   0.8   0
0.8   0.2   0
1.0   0.8   1

Page 19: Last lecture summary Naïve Bayes Classifier

Heaviside (threshold) activation function

Page 20: Last lecture summary Naïve Bayes Classifier

• The threshold is incorporated as the weight w_0 of one additional input with a fixed input value x_0 = 1.0.

• Such an input is called a bias.

$$\sum_{j=0}^{2} w_j x_j = w_0 + w_1 x_1 + w_2 x_2, \qquad x_0 = 1.0$$

Page 21: Last lecture summary Naïve Bayes Classifier

• Because the location of the threshold function defines the two categories, its value of 1.3 decides a classification boundary that can be formulated as x_1 + x_2 = 1.3.
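
As a minimal sketch (not from the slides), a threshold neuron with both weights fixed at 1, the threshold of 1.3 written as a bias weight w_0 = −1.3 on a constant input of 1.0, and the Heaviside activation reproduces the targets of the four example patterns:

```python
def heaviside(u):
    """Threshold (Heaviside) activation: 1 if the net input is non-negative."""
    return 1 if u >= 0 else 0

def threshold_neuron(x, weights=(1.0, 1.0), threshold=1.3):
    """Classify input x with fixed weights; the threshold acts as a bias
    weight w0 = -threshold on a constant input x0 = 1.0."""
    net = -threshold + sum(w * xi for w, xi in zip(weights, x))
    return heaviside(net)

# The four example patterns from the table above: (x1, x2) and target t.
examples = [((0.2, 0.3), 0), ((0.2, 0.8), 0), ((0.8, 0.2), 0), ((1.0, 0.8), 1)]
for x, t in examples:
    print(x, "->", threshold_neuron(x), "target:", t)
```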

Page 22: Last lecture summary Naïve Bayes Classifier

Perceptron (1957)

Frank Rosenblatt

Developed the learning algorithm.

Used his neuron (pattern recognizer = perceptron) for classification of letters.

Page 23: Last lecture summary Naïve Bayes Classifier

• A binary classifier: it maps its input x (a real-valued vector) to a binary value (0 or 1)
– output 1 if w · x > 0 (the weight vector w includes the bias)
– output 0 otherwise

• The perceptron can adjust its weights (i.e. it can learn) – the perceptron learning algorithm.
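
A minimal sketch of the perceptron learning rule (update the weights only on misclassified examples); the function name, learning rate, and epoch count are illustrative placeholders, not taken from the slides:

```python
def perceptron_train(examples, n_features, beta=0.1, epochs=20):
    """Perceptron learning: adjust weights whenever the prediction is wrong.

    `examples` is a list of (x, t) pairs where x is a feature tuple and t is
    the binary target (0 or 1). The bias is stored as w[0] with a constant
    input of 1.0.
    """
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for x, t in examples:
            xb = (1.0,) + tuple(x)                       # prepend the bias input
            y = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0
            if y != t:                                   # update only on error
                w = [wi + beta * (t - y) * xi for wi, xi in zip(w, xb)]
    return w
```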

Page 24: Last lecture summary Naïve Bayes Classifier

Multiple output perceptron
• for multicategory classification (i.e. more than 2 classes)
• one output neuron for each class

(Diagram: input layer and output layer; such a network is called single layer (one-layered) or double layer (two-layered), depending on whether the input layer is counted.)

Page 25: Last lecture summary Naïve Bayes Classifier

Learning

• Set the weights (including the threshold, i.e. the bias weight w_0).

• Supervised learning: we know the target values t.

• We want the outputs y to be as close as possible to the desired values t.

• We define an error (the Sum of Squares Error, which we already know):

$$E = \frac{1}{2}\sum_i (t_i - y_i)^2$$

Page 26: Last lecture summary Naïve Bayes Classifier

• "y to be as close as possible to t" means that E should be minimal.

• So we want to minimize E, which is a function of the weights w.
– E is also called the objective function or sometimes the energy.

Page 27: Last lecture summary Naïve Bayes Classifier

Requirements for the minimum:

$$\frac{\partial E}{\partial w_i} = 0, \qquad \frac{\partial^2 E}{\partial w_i\,\partial w_j} > 0$$

The gradient grad E is a vector pointing in the direction of the greatest rate of increase of the function:

$$\operatorname{grad} E = \left(\frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots\right)$$

We want E to decrease, so we move in the direction of −grad E.
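
Putting these two points together (an added bridging step, not on the original slides): moving against the gradient with a step size β (introduced below as the learning rate) gives the general weight update

$$\Delta w_i = -\beta\,\frac{\partial E}{\partial w_i}, \qquad w_i^{\text{new}} = w_i^{\text{old}} + \Delta w_i$$

which the delta rule on the following slides instantiates for a single linear neuron.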

Page 28: Last lecture summary Naïve Bayes Classifier

Delta rule

• gradient descent
• How to train a linear neuron using the delta rule?
• The demonstration will be given for one neuron with one input x, no bias, and one output y.

(Diagram: input x with weight w_1 feeding a summation Σ that produces the output y.)

Page 29: Last lecture summary Naïve Bayes Classifier

• The neuron is presented with an input pattern.
• It calculates the weighted sum, and its output as y = w_1 x (no threshold is used).
• The error E:

$$E = \frac{1}{2}(t - y)^2 = \frac{1}{2}(t - w_1 x)^2$$

• If you draw E against w_1, which curve do you get?

(Figure: the error curve plotted against w_1, with the error gradient indicated.)

Page 30: Last lecture summary Naïve Bayes Classifier

• To find the gradient, differentiate the error E with respect to w_1:

$$E = \frac{1}{2}(t - y)^2 = \frac{1}{2}(t - w_1 x)^2$$

$$\frac{dE}{dw_1} = \frac{2}{2}(t - w_1 x)(-x) = -(t - y)\,x$$

• According to the delta rule, the weight change is proportional to the negative of the error gradient:

$$\Delta w_1 = \beta\,(t - y)\,x$$

• New weight:

$$w_1^{\text{new}} = w_1^{\text{old}} + \Delta w_1 = w_1^{\text{old}} + \beta\,(t - y)\,x$$

Page 31: Last lecture summary Naïve Bayes Classifier

β is called a learning rate. It determines how far along the gradient it is necessary to move.

Page 32: Last lecture summary Naïve Bayes Classifier

$$w_1^{i+1} = w_1^{i} + \Delta w_1^{i} = w_1^{i} + \beta\,(t - y)\,x \qquad \text{(the new weight after the } i\text{-th iteration)}$$
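
A minimal sketch of this example-by-example update for the single-input linear neuron above (the data, learning rate, and epoch count are placeholders):

```python
def delta_rule_online(data, beta=0.1, epochs=50, w1=0.0):
    """Online (example-by-example) delta rule for a single linear neuron y = w1 * x.

    `data` is a list of (x, t) pairs; the weight is updated after every pattern.
    """
    for _ in range(epochs):
        for x, t in data:
            y = w1 * x                       # linear output, no threshold
            w1 += beta * (t - y) * x         # delta rule: -beta * dE/dw1
    return w1

# Example usage with made-up data generated by t = 2*x:
data = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0)]
print(delta_rule_online(data))               # converges towards w1 ≈ 2.0
```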

Page 33: Last lecture summary Naïve Bayes Classifier

• This is an iterative algorithm; one pass through the training set is not enough.

• One pass over the whole training data set is called an epoch.

• Adjusting the weights after each input pattern presentation (iteration) is called example-by-example (online) learning.
– For some problems this can cause the weights to oscillate – the adjustment required by one pattern may be cancelled by the next pattern.
– The next method is more popular.

Page 34: Last lecture summary Naïve Bayes Classifier

• Batch learning – wait until all input patterns (i.e. a whole epoch) have been processed and only then adjust the weights in the average sense.
– This gives a more stable solution.
– Obtain the error gradient for each input pattern.
– Average the gradients at the end of the epoch.
– Use this average value to adjust the weights using the delta rule:

$$\Delta w_1 = \frac{1}{n}\sum_{i=1}^{n} \beta\,(t_i - y_i)\,x_i$$
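
And a matching sketch of the batch variant, under the same illustrative assumptions as the online version above: the per-pattern updates are accumulated over the epoch and applied once, averaged.

```python
def delta_rule_batch(data, beta=0.1, epochs=50, w1=0.0):
    """Batch delta rule: average the per-pattern updates over one epoch,
    then apply a single averaged weight change."""
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            y = w1 * x
            total += beta * (t - y) * x      # per-pattern delta-rule update
        w1 += total / len(data)              # averaged update once per epoch
    return w1
```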