Classification Using Nearest Neighbors 5

Examine Quality of KNN Classifier

This example shows how to examine the quality of a k-nearest neighbor classifier using resubstitution and cross validation.

Construct a KNN classifier for the Fisher iris data as in Construct KNN Classifier.

load fisheriris
X = meas;    
Y = species; 
rng(10); % For reproducibility
Mdl = fitcknn(X,Y,'NumNeighbors',4);

Examine the resubstitution loss, which, by default, is the fraction of misclassifications from the predictions of Mdl. (For nondefault cost, weights, or priors, see loss.).

rloss = resubLoss(Mdl)
rloss = 0.0400

The classifier predicts incorrectly for 4% of the training data.

Construct a cross-validated classifier from the model.

CVMdl = crossval(Mdl);

Examine the cross-validation loss, which is the average loss of each cross-validation model when predicting on data that is not used for training.

kloss = kfoldLoss(CVMdl)
kloss = 0.0333

The cross-validated classification accuracy resembles the resubstitution accuracy. Therefore, you can expect Mdl to misclassify approximately 4% of new data, assuming that the new data has about the same distribution as the training data.

Predict Classification Using KNN Classifier

This example shows how to predict classification for a k-nearest neighbor classifier.

Construct a KNN classifier for the Fisher iris data as in Construct KNN Classifier.

load fisheriris
X = meas;    
Y = species; 
Mdl = fitcknn(X,Y,'NumNeighbors',4);

Predict the classification of an average flower.

flwr = mean(X); % an average flower
flwrClass = predict(Mdl,flwr)
flwrClass = 1x1 cell array
    {'versicolor'}


Modify KNN Classifier

This example shows how to modify a k-nearest neighbor classifier.

Construct a KNN classifier for the Fisher iris data as in Construct KNN Classifier.

load fisheriris
X = meas;    
Y = species; 
Mdl = fitcknn(X,Y,'NumNeighbors',4);

Modify the model to use the three nearest neighbors, rather than the default one nearest neighbor.

Mdl.NumNeighbors = 3;

Compare the resubstitution predictions and cross-validation loss with the new number of neighbors.

loss = resubLoss(Mdl)
loss = 0.0400
rng(10); % For reproducibility
CVMdl = crossval(Mdl,'KFold',5);
kloss = kfoldLoss(CVMdl)
kloss = 0.0333

In this case, the model with three neighbors has the same cross-validated loss as the model with four neighbors (see Examine Quality of KNN Classifier).

Modify the model to use cosine distance instead of the default, and examine the loss. To use cosine distance, you must recreate the model using the exhaustive search method.

CMdl = fitcknn(X,Y,'NSMethod','exhaustive','Distance','cosine');
CMdl.NumNeighbors = 3;
closs = resubLoss(CMdl)
closs = 0.0200

The classifier now has lower resubstitution error than before.

Check the quality of a cross-validated version of the new model.

CVCMdl = crossval(CMdl);
kcloss = kfoldLoss(CVCMdl)
kcloss = 0.0200

CVCMdl has a better cross-validated loss than CVMdl. However, in general, improving the resubstitution error does not necessarily produce a model with better test-sample predictions.