Transcript of a SEEM5470 lecture on K-Nearest-Neighbor Classifiers
K‐Nearest‐Neighbor Classifiers
Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer.
Framework
• K-nearest-neighbor classifiers are memory-based and require no model to be fit.
• Given a query point x0, find the k training points closest in distance to x0, and classify using majority vote among the k neighbors.
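The majority-vote rule above can be written in a few lines. A minimal sketch, assuming Euclidean distance; the toy data below is illustrative and not from the lecture:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point (Euclidean).
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points and vote on their labels.
    dists.sort(key=lambda d: d[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy two-class data: class 0 clusters near the origin, class 1 near (5, 5).
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # → 0
print(knn_predict(X, y, (5.5, 5.5), k=3))  # → 1
```

Note that "fitting" costs nothing here: the training set itself is the model, which is what memory-based means.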
• Successful in a large number of classification problems: handwritten digits, satellite image scenes, and EKG patterns.
• Each class has many possible prototypes, and the decision boundary is very irregular.
• The decision boundary of a 15-nearest-neighbor classifier applied to the three-class simulated data is fairly smooth compared to that of the 1-nearest-neighbor classifier in the lower panel.
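The smoothing effect of k can be seen numerically: with k = 1 every training point is its own nearest neighbor, so the training error is exactly zero, while a larger k averages over the neighborhood. A minimal sketch on toy overlapping Gaussian data (an illustrative assumption, not the book's simulation):

```python
import random
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two overlapping two-dimensional classes with shifted means.
random.seed(0)
train = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(100)] \
      + [((random.gauss(1.5, 1), random.gauss(1.5, 1)), 1) for _ in range(100)]

def training_error(k):
    wrong = sum(knn_predict(train, x, k) != y for x, y in train)
    return wrong / len(train)

# 1-NN memorizes the training set (each point is its own nearest neighbor),
# so its training error is exactly 0; 15-NN smooths the boundary and
# typically misclassifies some training points when classes overlap.
print(training_error(1))  # → 0.0
print(training_error(15))
```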
• There is a close relationship between nearest-neighbor and prototype methods: in 1-nearest-neighbor classification, each training point is a prototype.
Example
• The STATLOG project used part of a LANDSAT image as a benchmark for classification: four heat-map images, two in the visible spectrum and two in the infrared, for an area of agricultural land in Australia.
• Each pixel has a class label from a seven-element set of land-use types, determined manually by research assistants surveying the area.
• The lower middle panel shows the actual land usage, shaded by different colors to indicate the classes.
• Objective: classify the land usage at a pixel, based on the information in the four spectral bands.
• An 8-neighbor feature map was extracted: the pixel itself and its 8 immediate neighbors.
• Done separately in each of the four spectral bands, this gives 9 × 4 = 36 input features per pixel.
• Five-nearest-neighbor classification was carried out in this 36-dimensional feature space.
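The feature construction above can be sketched directly: for each pixel, take the 3×3 block around it in each of the four bands and concatenate. The `(4, H, W)` array layout is an illustrative assumption; the lecture does not specify the data format:

```python
import numpy as np

def neighborhood_features(bands, row, col):
    """36-dim feature vector for one pixel: the 3x3 intensity block
    around (row, col) in each of the four spectral bands."""
    patches = [b[row - 1:row + 2, col - 1:col + 2].ravel() for b in bands]
    return np.concatenate(patches)  # 4 bands x 9 pixels = 36 features

# Toy 4-band "image" of size 5x5 with random intensities.
rng = np.random.default_rng(0)
bands = rng.integers(0, 256, size=(4, 5, 5))
x = neighborhood_features(bands, 2, 2)
print(x.shape)  # → (36,)
```

Classification then proceeds exactly as in the earlier sketch, just with these 36-dimensional vectors as the training points.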
• This produced the predicted map shown in the bottom right panel.
• Among all the methods used in the STATLOG project, k-nearest-neighbors achieved the lowest test error rate; the decision boundaries in the 36-dimensional feature space are quite irregular.
Adaptive Nearest‐Neighbor
• In a high-dimensional space, nearest neighbors can be very far away.
• Consider Fig. 13.13, in which a nearest neighborhood is depicted by the circular region.
• Class probabilities vary only in the horizontal direction.
• If we knew this, we would stretch the neighborhood in the vertical direction.
• We therefore need an adaptive metric: stretch the neighborhood out in directions along which the class probabilities don't change much.
• At a query point, a neighborhood of, say, 50 points is formed, and the class distribution among these points is used to decide how to deform the metric; thus each query point has a different metric.
• The neighborhood should be stretched in the direction orthogonal to the line joining the two class centroids.
• This direction coincides with the linear discriminant boundary: it is the direction in which the class probabilities change the least.
• Neighborhoods stretch out orthogonally to the decision boundaries when both classes are present.
• This is the idea behind the Discriminant Adaptive Nearest-Neighbor (DANN) procedure.
DANN
• Assuming a local discriminant model, the information contained in the local within-class and between-class covariance matrices is all that is needed.
• The discriminant adaptive nearest-neighbor (DANN) metric at a query point $x_0$ is defined by

$$D(x, x_0) = (x - x_0)^T \Sigma (x - x_0),$$

where the metric $\Sigma$ is defined as

$$\Sigma = \mathbf{W}^{-1/2}\big[\mathbf{W}^{-1/2}\mathbf{B}\mathbf{W}^{-1/2} + \epsilon\mathbf{I}\big]\mathbf{W}^{-1/2} = \mathbf{W}^{-1/2}\big[\mathbf{B}^{*} + \epsilon\mathbf{I}\big]\mathbf{W}^{-1/2}.$$

Here $\mathbf{W} = \sum_k \pi_k \mathbf{W}_k$ is the pooled within-class covariance matrix and $\mathbf{B} = \sum_k \pi_k (\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})^T$ is the between-class covariance matrix, both computed using the 50 nearest neighbors around $x_0$.
• The parameter $\epsilon$ rounds the neighborhood from an infinite strip to an ellipsoid; $\epsilon = 1$ seems to work well.
• In pure regions with only one class, the between-class covariance vanishes ($\mathbf{B} = 0$), so the metric reduces to $\Sigma = \epsilon\,\mathbf{W}^{-1}$; when the local within-class covariance is spherical, $\Sigma$ is proportional to the identity matrix and the neighborhoods remain circular.
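The DANN metric above can be sketched in numpy. A minimal illustration, assuming the neighborhood points are already given as `X, y` (in practice these would be the ~50 nearest neighbors of the query point; the neighbor search itself is omitted):

```python
import numpy as np

def dann_metric(X, y, epsilon=1.0):
    """Sigma = W^{-1/2} [B* + eps I] W^{-1/2} from a local neighborhood."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)          # class proportions pi_k
    xbar = X.mean(axis=0)
    p = X.shape[1]

    # Pooled within-class covariance W = sum_k pi_k W_k.
    W = sum(pi * np.cov(X[y == k].T, bias=True)
            for k, pi in zip(classes, priors))
    # Between-class covariance B = sum_k pi_k (xbar_k - xbar)(xbar_k - xbar)^T.
    B = sum(pi * np.outer(X[y == k].mean(axis=0) - xbar,
                          X[y == k].mean(axis=0) - xbar)
            for k, pi in zip(classes, priors))

    # W^{-1/2} via the eigendecomposition of the symmetric matrix W.
    vals, vecs = np.linalg.eigh(W)
    W_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    B_star = W_inv_sqrt @ B @ W_inv_sqrt   # sphered between-class covariance
    return W_inv_sqrt @ (B_star + epsilon * np.eye(p)) @ W_inv_sqrt

# Two classes whose means differ only along the first coordinate.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal([3, 0], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
Sigma = dann_metric(X, y)
# Distances count more along axis 0 (where class probabilities change),
# so the neighborhood stretches out along axis 1, as described above.
print(Sigma[0, 0] > Sigma[1, 1])  # → True
```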
Computational Considerations
• One drawback of nearest-neighbor algorithms is the computational load.
• With N observations and p predictors (attributes), nearest-neighbor classification requires O(Np) operations to find the neighbors per query point.
• There are fast algorithms for finding nearest neighbors that reduce this load.
• Reducing the storage requirements is more difficult.
• Various editing and condensing procedures have been proposed.
• The idea is to isolate a subset of the training set that suffices for nearest-neighbor predictions, and to throw away the remaining training data.
• The multi-edit algorithm divides the data cyclically into training and test sets, computes a nearest-neighbor rule on the training set, and deletes test points that are misclassified.
• The condensing procedure goes further, trying to keep only important exterior points of these clusters.
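The condensing idea can be sketched as a Hart-style procedure: grow a stored subset until it classifies every training point correctly under the 1-NN rule. This is a sketch of the general idea, not the exact procedure the lecture references:

```python
import math

def condense(train):
    """Keep a subset of (point, label) pairs that 1-NN-classifies
    every training point correctly; discard the rest."""
    store = [train[0]]          # seed the condensed set with one point
    changed = True
    while changed:              # repeat until a full pass adds nothing
        changed = False
        for x, y in train:
            # Classify x with 1-NN against the current store.
            nearest = min(store, key=lambda p: math.dist(p[0], x))
            if nearest[1] != y:
                store.append((x, y))   # misclassified point must be kept
                changed = True
    return store

# Toy well-separated classes: most interior points can be discarded.
train = [((i, j), 0) for i in range(3) for j in range(3)] \
      + [((i + 10, j), 1) for i in range(3) for j in range(3)]
kept = condense(train)
print(len(kept) <= len(train))  # → True
```

By construction, the returned subset reproduces the 1-NN predictions on the full training set, which is exactly the storage-reduction goal described above.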