KNN Classifier (Instance Based Approach)

Description: handed an instance you wish to classify, look around the nearby region to see what other classes are around, and assign the most common class among the K-nearest neighbors (like a vote).

Transcript of the slide deck (26 slides).

Page 1:

INSTANCE BASED APPROACH

KNN Classifier

Page 2:

Simple classification technique

Handed an instance you wish to classify

Look around the nearby region to see what other classes are around

Whichever is most common—make that the prediction

[Figure: "Two Classes" scatter plot on X and Y axes]

Page 3:

K-nearest neighbor: assign the most common class among the K-nearest neighbors (like a vote).

KNN CLASSIFIER

Page 4:

How Train?


Don’t

Page 5:

Let's get specific

Train:
Load the training data.

Classify:
Read in the instance.
Find the K-nearest neighbors in the training data.
Assign the most common class among the K-nearest neighbors (like a vote).

Euclidean distance, where $a_r$ is an attribute (dimension):

$d(x_i, x_j) \equiv \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$
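A minimal sketch of this train/classify procedure in Python (the names `euclidean` and `knn_classify` are mine, not from the slides): "training" just keeps the data around, and all of the distance work happens at classification time.

```python
import numpy as np
from collections import Counter

def euclidean(xi, xj):
    # d(xi, xj) = sqrt(sum over attributes r of (a_r(xi) - a_r(xj))^2)
    return np.sqrt(np.sum((np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)) ** 2))

def knn_classify(query, train_X, train_y, k=3):
    # Visit every training sample and compute its distance to the query instance
    distances = [euclidean(query, x) for x in train_X]
    # Keep the K nearest and let them vote
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

train_X = [[1.0, 1.0], [1.2, 0.9], [8.0, 9.0], [7.5, 8.5]]
train_y = ["red", "red", "blue", "blue"]
print(knn_classify([1.1, 1.0], train_X, train_y, k=3))   # -> "red"
```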

Page 6:

Euclidean distance, where $a_r$ is an attribute (dimension):

$d(x_i, x_j) \equiv \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$

How to find the nearest neighbors

Naïve approach: exhaustive. For the instance to be classified:
Visit every training sample and calculate the distance.
Sort.
Take the first K in the list.


Voting Formula

$\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$

where $f(x_i)$ is $x_i$'s class, and $\delta(v, f(x_i))$ is 1 if $v = f(x_i)$; 0 otherwise.

Page 7:

Classifying is a lot of work

The work that must be performed:
Visit every training sample and calculate the distance.
Sort.
Lots of floating point calculations.
The classifier puts off the work until it is time to classify.

Euclidean distance, where $a_r$ is an attribute (dimension): $d(x_i, x_j) \equiv \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$

Page 8:

Lazy

This is known as a "lazy" learning method.
If you do most of the work during the training stage, it is known as "eager."
Our next classifier, Naïve Bayes, will be eager: training takes a while, but it can classify fast.
Which do you think is better?

Lazy vs. Eager

Training or classifying: where the work happens.

Page 9:

Book mentions the KD-tree

From Wikipedia: a space-partitioning data structure for organizing points in a k-dimensional space. kd-trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches). kd-trees are a special case of BSP trees.
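As a rough sketch of how the KD-tree idea plays out in practice, here is scipy's `cKDTree` (the data below is made up for illustration): build the tree once up front, and each nearest-neighbor query no longer has to visit every training sample.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
train_X = rng.random((1000, 3))      # 1000 training points in 3 dimensions
tree = cKDTree(train_X)              # build the space-partitioning tree once ("training")

query = rng.random(3)
dists, idx = tree.query(query, k=5)  # distances and indices of the 5 nearest training points
print(idx, dists)
```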


Page 10:

If we use some data structure …

Speeds up classification

Probably slows “training”


Page 11:

How to choose K?

Choosing K can be a bit of an art.
What if you could include all data points (K = n)? How might you do such a thing?

How to include all data points? What if we weighted the vote of each training sample by its distance from the point being classified?

Weighted Voting Formula

$\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$

where $w_i = \frac{1}{d(x_q, x_i)^2}$, and $\delta(v, f(x_i))$ is 1 if $x_i$ is a member of class $v$ (i.e. where $f(x_i)$ returns the class of $x_i$); 0 otherwise.
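A sketch of that weighted vote with the 1/d² weights (the function name is mine): far-away points contribute almost nothing, so every training point can participate, which is the K = n case.

```python
import numpy as np
from collections import defaultdict

def weighted_vote_classify(query, train_X, train_y, eps=1e-12):
    votes = defaultdict(float)
    q = np.asarray(query, dtype=float)
    for x, label in zip(train_X, train_y):
        d2 = np.sum((q - np.asarray(x, dtype=float)) ** 2)  # squared distance d^2
        votes[label] += 1.0 / (d2 + eps)    # w_i = 1 / d^2; eps guards against d = 0
    return max(votes, key=votes.get)        # class with the largest total weight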

Page 12:

Weight Curve

1 over distance squared

Could get less fancy and go linear, but then training data that is very far away would still have a strong influence.

[Figure: weight (vertical axis) vs. distance (horizontal axis) curves for the 1/d² weighting, shown at two scales]

Page 13:

Could go more fancy

Other radial basis functions, sometimes known as kernel functions.
One of the more common:

$K(d(x, x_t)) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / 2\sigma^2}$

[Figure: Gaussian weight (vertical axis) vs. distance (horizontal axis)]
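A small sketch comparing the 1/d² weight with a Gaussian (radial basis) kernel weight. The bandwidth σ below is an arbitrary value of mine, and the normalizing constant is dropped because it scales every vote equally and does not change which class wins.

```python
import numpy as np

def inverse_square_weight(d, eps=1e-12):
    return 1.0 / (d ** 2 + eps)

def gaussian_weight(d, sigma=1.0):
    # proportional to exp(-d^2 / (2 sigma^2)); decays smoothly instead of blowing up near d = 0
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))

for d in (0.1, 1.0, 5.0):
    print(f"d={d}: 1/d^2={inverse_square_weight(d):.3f}, gaussian={gaussian_weight(d):.3f}")
```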

Page 14:

Issues

Work is back-loaded.
The bigger the training data, the worse it gets. Can alleviate with data structures.
What else?

Other issues? What if only some dimensions contribute to the ability to classify? Differences in the other dimensions would put distance between that point and the target.

Page 15:

Curse of dimensionality

The book calls this the curse of dimensionality. More is not always better.
Two points might be identical in the important dimensions but distant in others.

From Wikipedia: In applied mathematics, curse of dimensionality (a term coined by Richard E. Bellman),[1][2] also known as the Hughes effect[3] or Hughes phenomenon[4] (named after Gordon F. Hughes),[5][6] refers to the problem caused by the exponential increase in volume associated with adding extra dimensions to a mathematical space.

For example, 100 evenly-spaced sample points suffice to sample a unit interval with no more than 0.01 distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice with a spacing of 0.01 between adjacent points would require 10^20 sample points: thus, in some sense, the 10-dimensional hypercube can be said to be a factor of 10^18 "larger" than the unit interval. (Adapted from an example by R. E. Bellman; see below.)
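A quick check of the arithmetic in that example:

```python
points_per_dim = 100                      # spacing of 0.01 across a unit interval
dims = 10
lattice_points = points_per_dim ** dims
print(lattice_points)                     # 10**20 sample points for the 10-D hypercube
print(lattice_points // points_per_dim)   # factor of 10**18 "larger" than the unit interval
```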

Page 16:

Gene expression data

Thousands of genes. Relatively few patients. Is there a curse?

Patients (rows) by genes (columns), with a disease label:

patient   g1      g2      g3      …   gn      disease
p1        x1,1    x1,2    x1,3    …   x1,n    Y
p2        x2,1    x2,2    x2,3    …   x2,n    N
…         …       …       …       …   …       …
pm        xm,1    xm,2    xm,3    …   xm,n    ?

Page 17:

Can it classify discrete data?

Bayesian could. Think of discrete data as being pre-binned.
Remember RNA classification: the data in each dimension was A, C, U, or G.

How to measure distance? A might be closer to G than to C or U (A and G are both purines while C and U are pyrimidines). Dimensional distance becomes domain specific.

Representation becomes all-important. If we could arrange it appropriately, we could use techniques like Hamming distance.
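For instance, a Hamming distance just counts the positions at which two equal-length sequences differ (the sequences below are made-up examples); a domain-specific variant could instead charge less for purine-to-purine substitutions.

```python
def hamming(seq_a, seq_b):
    # number of positions at which two equal-length sequences differ
    assert len(seq_a) == len(seq_b)
    return sum(a != b for a, b in zip(seq_a, seq_b))

print(hamming("ACUG", "ACGG"))  # 1: the sequences differ only at the third position
```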

Page 18:

Another issue

Redness    Yellowness  Mass      Volume    Class
4.816472   2.347954    125.5082  25.01441  apple
2.036318   4.879481    125.8775  18.2101   lemon
2.767383   3.353061    109.9687  33.53737  orange
4.327248   3.322961    118.4266  19.07535  peach
2.96197    4.124945    159.2573  29.00904  orange
5.655719   1.706671    147.0695  39.30565  apple

First few records in the training data. See any issues?
Hint: think of how Euclidean distance is calculated. Should really normalize the data,
for each entry in a dimension (see the sketch below).
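The slide cuts off before giving the per-dimension formula; one common choice, assumed here rather than taken from the slides, is min-max scaling, which maps every dimension to [0, 1] so that large-valued attributes such as Mass do not dominate the Euclidean distance.

```python
import numpy as np

def min_max_normalize(X):
    # Scale each column (dimension) to the range [0, 1]: x' = (x - min) / (max - min)
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    ranges[ranges == 0] = 1.0            # avoid division by zero for constant columns
    return (X - mins) / ranges

X = [[4.816472, 2.347954, 125.5082, 25.01441],
     [2.036318, 4.879481, 125.8775, 18.2101],
     [2.767383, 3.353061, 109.9687, 33.53737]]
print(min_max_normalize(X))
```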

Page 19:

Other uses of instance based approaches

Function approximation. Real-valued prediction: take the average of the nearest k neighbors.
If we don't know the function, and/or it is too complex to "learn," just plug in a new value; the KNN classifier can "learn" the predicted value on the fly by averaging the nearest neighbors.

$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}$

[Figure: scatter plot of sample points on X and Y axes]

Why average?
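A minimal sketch of that averaging rule as k-NN regression (the function name is mine; inputs are assumed to be NumPy arrays). One answer to "why average?" is that the mean is the single constant that minimizes squared error over the chosen neighbors.

```python
import numpy as np

def knn_regress(query, train_X, train_y, k=3):
    # train_X: (m, n) array of instances, train_y: (m,) array of real-valued targets
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    dists = np.linalg.norm(train_X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    # f_hat(x_q) = (sum of f(x_i) over the k nearest neighbors) / k
    return float(np.mean(train_y[nearest]))
```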

Page 20:

Regression

Choose an m and b that minimize the squared error.
But again, computationally, how?

[Figure: scatter plot with a fitted line on X and Y axes; the m and b that minimize the squared error]

Page 21:

Other things that can be learned

If you want to learn an instantaneous slope, you can do local regression: get the slope of a line that fits just the local data.

[Figure: nonlinear curve on X and Y axes]

Page 22:

The How: Big Picture

For each of the training data we know what Y should be.
If we have a randomly generated m and b, these, along with X, will tell us a predicted Y.
So we know whether the m and b yield too large or too small a prediction.
We can nudge "m" and "b" in an appropriate direction (+ or -).
Sum these proposed nudges across all training data.

[Figure: a line representing the output (predicted Y) through the training points, a point labeled "Target Y too low," and nudges Δm, Δb to the line]

Page 23:

Gradient Descent

Which way should m go to reduce the error?

[Figure: actual y values plotted against the guessed line $y_{pred} = m_{guess} x + b_{guess}$, with the intercept b, the rise, and the error $y_{pred} - y_{act}$ marked]

Could average. Then do the same for b. Then do it again.
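A sketch of that nudge-and-average loop written as plain gradient descent on m and b; the learning rate, step count, and toy data are my own choices, not values from the slides.

```python
import numpy as np

def fit_line_gradient_descent(x, y, lr=0.01, steps=1000):
    # x, y: 1-D NumPy arrays of training data
    m, b = 0.0, 0.0                      # starting guesses
    for _ in range(steps):
        y_pred = m * x + b               # predicted Y from the current m and b
        error = y_pred - y               # positive if the prediction is too high
        # Average the nudges over all training data (gradient of the mean squared error)
        grad_m = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        m -= lr * grad_m                 # nudge m opposite the gradient
        b -= lr * grad_b                 # then do the same for b, then do it again
    return m, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                        # a line the loop should approximately recover
print(fit_line_gradient_descent(x, y))
```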

Page 24:

Back to why we went down this road

Locally weighted linear regression: would still perform gradient descent; becomes a global function approximation.

[Figure: nonlinear curve on X and Y axes]

$\hat{f}(x) = w_0 + w_1 a_1(x) + \ldots + w_n a_n(x)$
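A sketch of locally weighted linear regression around a single query point, solved in closed form with NumPy's least-squares routine rather than the gradient descent loop the slides describe (a simplification on my part); training points are weighted by a Gaussian kernel on their distance from the query.

```python
import numpy as np

def locally_weighted_prediction(query, train_X, train_y, sigma=1.0):
    # Fit f_hat(x) = w0 + w1*a1(x) + ... + wn*an(x), with each training row weighted
    # by a Gaussian kernel on its distance from the query point.
    query = np.asarray(query, dtype=float)
    X = np.asarray(train_X, dtype=float)
    y = np.asarray(train_y, dtype=float)
    A = np.hstack([np.ones((len(X), 1)), X])      # design matrix with a bias column
    d2 = np.sum((X - query) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * sigma ** 2))          # kernel weights, larger near the query
    sk = np.sqrt(k)
    # Weighted least squares, posed as ordinary least squares on reweighted rows
    coeffs, *_ = np.linalg.lstsq(A * sk[:, None], y * sk, rcond=None)
    return coeffs[0] + coeffs[1:] @ query
```

Refitting this way for each query point of interest is what makes the linear model local; running gradient descent on the same weighted squared error would approach the same coefficients.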

Page 25:

Summary

KNN is highly effective for many practical problems, given sufficient training data.
It is robust to noisy training data.
The work is back-loaded.
It is susceptible to the curse of dimensionality.
