K‐Nearest‐Neighbor Classifiers (seem5470 lecture transcript)

Page 1:

K‐Nearest‐Neighbor Classifiers

Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer.


Page 2:

K‐Nearest‐Neighbor Classifiers: Framework

• Classifiers:
‐ memory‐based
‐ require no model to be fit

• Given a query point $x_0$, find the $k$ training points $x_{(1)}, \dots, x_{(k)}$ closest in distance to $x_0$, and classify using majority vote among the $k$ neighbors.
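A minimal sketch of this rule in Python (the function name, Euclidean metric, and tie‐breaking are illustrative assumptions; the slides prescribe no implementation):

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, x0, k=15):
        """Classify query point x0 by majority vote among its k nearest training points."""
        dists = np.linalg.norm(X_train - x0, axis=1)   # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]                # indices of the k closest training points
        votes = Counter(y_train[i] for i in nearest)   # tally class labels among the neighbors
        return votes.most_common(1)[0][0]              # majority class (ties broken arbitrarily)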


Page 3:

K‐Nearest‐Neighbor Classifiers: Framework

• Successful in a large number of classification problems: handwritten digits, satellite image scenes, and EKG patterns.

• Each class has many possible prototypes.

• The decision boundary is very irregular.


Page 4:

K‐Nearest‐Neighbor Classifiers: Framework

• The decision boundary of a 15‐nearest‐neighbor classifier applied to the three‐class simulated data.

• The decision boundary is fairly smooth compared to the lower panel (1‐nearest‐neighbor classifier).


Page 5:

K‐Nearest‐Neighbor Classifiers: Framework

• There is a close relationship between nearest‐neighbor and prototype methods: in 1‐nearest‐neighbor classification, each training point is a prototype.


Page 6:

K‐Nearest‐Neighbor Classifiers: Example

• STATLOG project: used part of a LANDSAT image as a benchmark for classification.

• Four heat‐map images, two in the visible spectrum and two in the infrared, for an area of agricultural land in Australia.

Page 7:

K‐Nearest‐Neighbor Classifiers: Example

• Each pixel has a class label from the 7‐element set G = {red soil, cotton, vegetation stubble, mixture, gray soil, damp gray soil, very damp gray soil}.

• The labels were determined manually by research assistants surveying the area.

• The lower middle panel shows the actual land usage, shaded by different colors to indicate the classes.


Page 8:

K‐Nearest‐Neighbor Classifiers: Example

• Objective: classify the land usage at a pixel, based on the information in the four spectral bands.

• An 8‐neighbor feature map was extracted: the pixel itself and its 8 immediate neighbors.

• This was done separately in each of the four spectral bands, giving 9 × 4 = 36 input features.

• Five‐nearest‐neighbor classification was carried out in this 36‐dimensional feature space (a sketch follows below).
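A sketch of the feature construction and classifier, assuming the four spectral bands arrive as a (4, H, W) array (the array layout, function name, and use of scikit‐learn are illustrative assumptions):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def pixel_features(bands):
        """bands: shape (4, H, W). For every interior pixel, stack the 3x3
        neighborhood (the pixel and its 8 neighbors) in each of the four
        bands, giving 9 * 4 = 36 features per pixel."""
        n_bands, H, W = bands.shape
        feats = []
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                patch = bands[:, i - 1:i + 2, j - 1:j + 2]  # 3x3 patch in each band
                feats.append(patch.ravel())                 # flatten to 36 values
        return np.asarray(feats)

    # Five-nearest-neighbor classification in the 36-dimensional feature space;
    # knn.fit(...) would be called with the labeled training pixels.
    knn = KNeighborsClassifier(n_neighbors=5)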

Page 9:

K‐Nearest‐Neighbor Classifiers: Example

• Produced the predicted map shown in the bottom‐right panel.

Page 10:

K‐Nearest‐Neighbor Classifiers: Example

• The resulting test error rate was about 9.5%.

• Among all the methods used in the STATLOG project, k‐nearest‐neighbors performed best, presumably because the decision boundaries in $\mathbb{R}^{36}$ are quite irregular.


Page 11:

Adaptive Nearest‐Neighbor

• In a high‐dimensional space, nearest neighbors can be very far away (illustrated in the sketch at the end of this page).

• Consider Fig. 13.13, where a nearest neighborhood is depicted by the circular region.


• The class probabilities vary only in the horizontal direction.

• If we knew this, we would stretch the neighborhood in the vertical direction.

• We need an adaptive metric: stretch the neighborhood out in directions for which the class probabilities don’t change much.
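A small simulation of the first bullet, assuming points uniform in the unit cube (the setup is an assumption for illustration, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    for p in (2, 10, 36):
        X = rng.random((1000, p))                  # 1000 points uniform in the unit cube
        x0 = np.full(p, 0.5)                       # query point at the center of the cube
        d = np.linalg.norm(X - x0, axis=1).min()   # distance to the nearest neighbor
        print(p, round(d, 3))                      # the distance grows with the dimension p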

Page 12:

Adaptive Nearest‐Neighbor

• At a query point, a neighborhood of say 50 points is formed, and the class distribution among these points is used to decide how to adapt the metric.

• Thus each query point has a different metric.

• The neighborhood should be stretched in the direction orthogonal to the line joining the class centroids.

• This direction coincides with the linear discriminant boundary; it is the direction in which the class probabilities change the least.


Page 13:

Adaptive Nearest‐Neighbor

• Neighborhoods stretch out orthogonally to the decision boundaries when both classes are present.

• This is the idea behind the Discriminant Adaptive Nearest‐Neighbor (DANN) procedure.

Page 14:

Adaptive Nearest‐Neighbor: DANN

• Assuming a local discriminant model, the information contained in the local within‐class and between‐class covariance matrices is all that is needed.

• The discriminant adaptive nearest‐neighbor (DANN) metric at a query point $x_0$ is defined by

$$D(x, x_0) = (x - x_0)^T \Sigma (x - x_0),$$

where the metric $\Sigma$ is defined as

$$\Sigma = W^{-1/2} \left[ W^{-1/2} B W^{-1/2} + \epsilon I \right] W^{-1/2} = W^{-1/2} \left[ B^{*} + \epsilon I \right] W^{-1/2}.$$

Here $W = \sum_k \pi_k W_k$ is the pooled within‐class covariance matrix, $B = \sum_k \pi_k (\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})^T$ is the between‐class covariance matrix, and both $W$ and $B$ are computed using the 50 nearest neighbors around $x_0$.
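A sketch of computing this metric from the local neighbors, directly following the formulas above (function and variable names are illustrative; assumes $W$ is positive definite):

    import numpy as np

    def dann_metric(X, y, epsilon=1.0):
        """X, y: the 50 nearest neighbors of the query point and their labels.
        Returns the DANN metric Sigma built from the local W and B."""
        xbar = X.mean(axis=0)
        p = X.shape[1]
        W = np.zeros((p, p))                       # pooled within-class covariance
        B = np.zeros((p, p))                       # between-class covariance
        for k in np.unique(y):
            Xk = X[y == k]
            pi_k = len(Xk) / len(X)                # class proportion among the neighbors
            W += pi_k * np.cov(Xk, rowvar=False)
            d = (Xk.mean(axis=0) - xbar)[:, None]
            B += pi_k * (d @ d.T)
        # W^(-1/2) via the eigendecomposition of the symmetric matrix W
        vals, vecs = np.linalg.eigh(W)
        W_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
        B_star = W_inv_half @ B @ W_inv_half       # B in the sphered coordinates
        return W_inv_half @ (B_star + epsilon * np.eye(p)) @ W_inv_half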


Page 15:

Adaptive Nearest‐Neighbor: DANN

• The $\epsilon$ parameter rounds the neighborhood, from an infinite strip to an ellipsoid.

• $\epsilon = 1$ seems to work well.

• In pure regions with only one class the neighborhoods remain circular: there $B = 0$, and we obtain

$$\Sigma = W^{-1/2} \left[ \epsilon I \right] W^{-1/2} = \epsilon W^{-1},$$

which is a multiple of the identity matrix once the data are sphered so that $W = I$, due to the property $W^{-1/2} W^{-1/2} = W^{-1}$.


Page 16:

Computational Considerations

• One drawback of nearest‐neighbor algorithms is the computational load.

• With $N$ observations and $p$ predictors (attributes), nearest‐neighbor classification requires $Np$ operations to find the neighbors per query point.

• There are fast algorithms for finding nearest neighbors (see the sketch below).
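For illustration, scikit‐learn exposes both the brute‐force search and tree‐based alternatives (the library choice is an assumption; the slides name no implementation):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    X = np.random.rand(10000, 4)          # N = 10000 observations, p = 4 predictors

    # Brute force: Np distance computations per query point
    brute = NearestNeighbors(n_neighbors=5, algorithm='brute').fit(X)

    # KD-tree: pays an up-front build cost for much faster low-dimensional lookups
    tree = NearestNeighbors(n_neighbors=5, algorithm='kd_tree').fit(X)

    dist, idx = tree.kneighbors(np.random.rand(1, 4))  # 5 nearest neighbors of a query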


Page 17:

Computational Considerations

• Reducing the storage requirements is more difficult.

• Various editing and condensing procedures have been proposed.

• The idea is to isolate a subset of the training set that suffices for nearest‐neighbor predictions, and throw away the remaining training data.

• The multi‐edit algorithm divides the data cyclically into training and test sets, computing a nearest‐neighbor rule on the training set and deleting test points that are misclassified.

• The condensing procedure goes further, trying to keep only important exterior points of these clusters (a rough sketch of editing follows).
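A rough sketch of one editing pass, assuming a 1‐nearest‐neighbor rule and a random cyclic split (an illustrative simplification of multi‐editing, not the exact algorithm):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def edit_once(X, y, n_folds=3, seed=0):
        """Split the data cyclically into folds, classify each fold with a
        1-NN rule fit on the remaining folds, and delete misclassified points.
        Repeating until nothing is deleted gives a multi-edit variant."""
        rng = np.random.default_rng(seed)
        folds = rng.permutation(len(X)) % n_folds     # cyclic fold assignment
        keep = np.ones(len(X), dtype=bool)
        for f in range(n_folds):
            train, test = folds != f, folds == f
            nn = KNeighborsClassifier(n_neighbors=1).fit(X[train], y[train])
            keep[test] = nn.predict(X[test]) == y[test]
        return X[keep], y[keep]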