This example shows how to examine the quality of a k-nearest neighbor classifier using resubstitution and cross validation.
Construct a KNN classifier for the Fisher iris data as in Construct KNN Classifier.
load fisheriris
X = meas;
Y = species;
rng(10); % For reproducibility
Mdl = fitcknn(X,Y,'NumNeighbors',4);
Examine the resubstitution loss, which, by default, is the fraction of misclassifications from the predictions of Mdl. (For nondefault cost, weights, or priors, see loss.)
rloss = resubLoss(Mdl)
rloss = 0.0400
The classifier predicts incorrectly for 4% of the training data.
Construct a cross-validated classifier from the model.
CVMdl = crossval(Mdl);
Examine the cross-validation loss, which is the average loss of each cross-validation model when predicting on data that is not used for training.
kloss = kfoldLoss(CVMdl)
kloss = 0.0333
The cross-validated loss (3.33%) is close to the resubstitution loss (4%). Therefore, you can expect Mdl to misclassify approximately 4% of new data, assuming that the new data has about the same distribution as the training data.
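To see how the loss varies from fold to fold rather than as a single average, you can request the individual fold losses with the 'Mode','individual' option of kfoldLoss. A quick sketch:

```matlab
% Loss of each of the 10 cross-validation folds in CVMdl;
% returns a 10-by-1 vector whose mean is the kfoldLoss value above
lossPerFold = kfoldLoss(CVMdl,'Mode','individual')
```

A large spread across folds can indicate that the estimate of generalization error is sensitive to the particular partition.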
This example shows how to predict classification for a k-nearest neighbor classifier.
Construct a KNN classifier for the Fisher iris data as in Construct KNN Classifier.
load fisheriris
X = meas;
Y = species;
Mdl = fitcknn(X,Y,'NumNeighbors',4);
Predict the classification of an average flower.
flwr = mean(X); % an average flower
flwrClass = predict(Mdl,flwr)
flwrClass = 1x1 cell array
    {'versicolor'}
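predict can also return a score for each class, which for a KNN classifier is the fraction of the nearest neighbors belonging to that class. A short sketch using the same average flower:

```matlab
% Second output of predict gives one score per class,
% i.e., the fraction of the 4 nearest neighbors in each class
[flwrClass,score] = predict(Mdl,mean(X))
```

The scores give a sense of how confident the prediction is: a score of 1 means all four neighbors agree on the label.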
This example shows how to modify a k-nearest neighbor classifier.
Construct a KNN classifier for the Fisher iris data as in Construct KNN Classifier.
load fisheriris
X = meas;
Y = species;
Mdl = fitcknn(X,Y,'NumNeighbors',4);
Modify the model to use the three nearest neighbors, rather than the four specified when the model was trained.
Mdl.NumNeighbors = 3;
Compare the resubstitution predictions and cross-validation loss with the new number of neighbors.
loss = resubLoss(Mdl)
loss = 0.0400
rng(10); % For reproducibility
CVMdl = crossval(Mdl,'KFold',5);
kloss = kfoldLoss(CVMdl)
kloss = 0.0333
In this case, the model with three neighbors has the same cross-validated loss as the model with four neighbors (see Examine Quality of KNN Classifier).
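Because NumNeighbors is a settable property, you can compare several neighborhood sizes without retraining. A minimal sketch (the loop range here is illustrative):

```matlab
% Compare resubstitution loss for several neighborhood sizes
for k = 1:2:9
    Mdl.NumNeighbors = k;  % modify the model in place
    fprintf('k = %d, resubstitution loss = %.4f\n',k,resubLoss(Mdl))
end
```

Pairing each candidate k with a cross-validated loss, as in the example above, gives a more reliable comparison than resubstitution loss alone.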
Modify the model to use cosine distance instead of the default Euclidean distance, and examine the loss. To use cosine distance, you must recreate the model using the exhaustive search method.
CMdl = fitcknn(X,Y,'NSMethod','exhaustive','Distance','cosine');
CMdl.NumNeighbors = 3;
closs = resubLoss(CMdl)
closs = 0.0200
The classifier now has lower resubstitution error than before.
Check the quality of a cross-validated version of the new model.
CVCMdl = crossval(CMdl);
kcloss = kfoldLoss(CVCMdl)
kcloss = 0.0200
CVCMdl has a lower cross-validated loss than CVMdl. However, in general, improving the resubstitution error does not necessarily produce a model with better test-sample predictions.