Classification Using Nearest Neighbors 4

Find Nearest Neighbors Using a Custom Distance Metric

This example shows how to find the indices of the three nearest observations in X to each observation in Y with respect to the chi-square distance. This distance metric is used in correspondence analysis, particularly in ecological applications.

Randomly generate normally distributed data into two matrices. The number of rows can vary, but the number of columns must be equal. This example uses 2-D data for plotting.

rng(1) % For reproducibility
X = randn(50,2);
Y = randn(4,2);

h = zeros(3,1);
figure
h(1) = plot(X(:,1),X(:,2),'bx');
hold on
h(2) = plot(Y(:,1),Y(:,2),'rs','MarkerSize',10);
title('Heterogeneous Data')

[Figure: scatter plot titled "Heterogeneous Data" showing the observations in X and Y.]

The rows of X and Y correspond to observations, and the columns are, in general, dimensions (for example, predictors).

The chi-square distance between J-dimensional points x and z is

$$\chi(x,z) = \sqrt{\sum_{j=1}^{J} w_j \left(x_j - z_j\right)^2},$$

where w_j is the weight associated with dimension j.

Choose weights for each dimension, and specify the chi-square distance function. The distance function must:

  • Take as input arguments one row of X, e.g., x, and the matrix Z.

  • Compare x to each row of Z.

  • Return a vector D of length nz, where nz is the number of rows of Z. Element i of D is the distance between the observation corresponding to x and the observation corresponding to row i of Z.

w = [0.4; 0.6];
chiSqrDist = @(x,Z)sqrt((bsxfun(@minus,x,Z).^2)*w);

This example uses arbitrary weights for illustration.
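As a quick sanity check of the three requirements listed above, the following sketch calls the distance function on one observation and a small matrix. The values of x and Z are made up for this check and are not part of the original example.

```matlab
w = [0.4; 0.6];                                      % same arbitrary weights as above
chiSqrDist = @(x,Z)sqrt((bsxfun(@minus,x,Z).^2)*w);  % same function handle as above

x = [0 0];               % one observation (1-by-2), hypothetical
Z = [1 0; 0 1; 1 1];     % three observations (3-by-2), hypothetical
d = chiSqrDist(x,Z);     % one distance per row of Z

% Hand computation of the same weighted distances, one row of Z at a time
dManual = [sqrt(0.4); sqrt(0.6); sqrt(0.4 + 0.6)];
sizeOK  = isequal(size(d),[size(Z,1) 1]);   % column vector of length nz
valueOK = all(abs(d - dManual) < 1e-12);    % agrees with the formula
```

If both sizeOK and valueOK are true, the handle satisfies the input and output contract described above for these inputs.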

Find the indices of the three nearest observations in X to each observation in Y.

k = 3;
[Idx,D] = knnsearch(X,Y,'Distance',chiSqrDist,'k',k);

Idx and D are 4-by-3 matrices.

  • Idx(j,1) is the row index of the closest observation in X to observation j of Y, and D(j,1) is their distance.

  • Idx(j,2) is the row index of the next closest observation in X to observation j of Y, and D(j,2) is their distance.

  • And so on.
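One way to see what these matrices contain is to reproduce them by brute force: compute the distance from each row of Y to every row of X, sort, and keep the first k entries. The sketch below is an illustration only, not how knnsearch is implemented; the names IdxBrute, DBrute, sameIdx, and sameDist are introduced here, and the comparison assumes there are no tied distances.

```matlab
rng(1)                   % same seed as the example
X = randn(50,2);
Y = randn(4,2);
w = [0.4; 0.6];
chiSqrDist = @(x,Z)sqrt((bsxfun(@minus,x,Z).^2)*w);
k = 3;

IdxBrute = zeros(size(Y,1),k);
DBrute   = zeros(size(Y,1),k);
for j = 1:size(Y,1)
    d = chiSqrDist(Y(j,:),X);       % distances from query j to every row of X
    [dSorted,order] = sort(d);      % ascending, so the nearest comes first
    IdxBrute(j,:) = order(1:k)';    % row indices of the k nearest rows of X
    DBrute(j,:)   = dSorted(1:k)';  % the corresponding distances
end

% Compare with the knnsearch result
[Idx,D] = knnsearch(X,Y,'Distance',chiSqrDist,'k',k);
sameIdx  = isequal(Idx,IdxBrute);
sameDist = max(abs(D(:) - DBrute(:))) < 1e-12;
```

Each row of DBrute is sorted in ascending order, which matches the column ordering described in the list above.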

Identify the nearest observations in the plot.

for j = 1:k
    h(3) = plot(X(Idx(:,j),1),X(Idx(:,j),2),'ko','MarkerSize',10);
end 
legend(h,{'\texttt{X}','\texttt{Y}','Nearest Neighbor'},'Interpreter','latex')
title('Heterogeneous Data and Nearest Neighbors')
hold off

[Figure: scatter plot titled "Heterogeneous Data and Nearest Neighbors" showing X, Y, and the circled nearest neighbors.]

Several observations of Y share nearest neighbors.

Verify that the chi-square distance metric is equivalent to the standardized Euclidean distance metric ('seuclidean') when the scaling parameter is set to 1./sqrt(w).

[IdxE,DE] = knnsearch(X,Y,'Distance','seuclidean','k',k, ...
    'Scale',1./(sqrt(w)));
AreDiffIdx = sum(sum(Idx ~= IdxE))
AreDiffIdx = 0
AreDiffDist = sum(sum(abs(D - DE) > eps))
AreDiffDist = 0

The indices and distances between the two implementations of three nearest neighbors are practically equivalent.
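The agreement follows from the definitions. As a short derivation, write s_j for the scale value of dimension j (a symbol introduced here for illustration) and recall that the standardized Euclidean distance divides each coordinate difference by its scale value. Substituting s_j = 1/\sqrt{w_j} gives:

```latex
\begin{aligned}
d_{\mathrm{se}}(x,z)
  &= \sqrt{\sum_{j=1}^{J} \left(\frac{x_j - z_j}{s_j}\right)^{2}}
   = \sqrt{\sum_{j=1}^{J} \frac{(x_j - z_j)^{2}}{s_j^{2}}} \\
  &= \sqrt{\sum_{j=1}^{J} w_j \,(x_j - z_j)^{2}}
   = \chi(x,z),
   \qquad \text{since } s_j^{2} = \frac{1}{w_j}.
\end{aligned}
```

Any remaining differences between D and DE are therefore due to floating-point rounding only.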

K-Nearest Neighbor Classification for Supervised Learning

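The ClassificationKNN classification model lets you construct a k-nearest neighbor classifier and then modify it, for example by changing the number of neighbors.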

Prepare your data for classification according to the procedure in Steps in Supervised Learning. Then, construct the classifier using fitcknn.

Construct KNN Classifier

 

This example shows how to construct a k-nearest neighbor classifier for the Fisher iris data.

Load the Fisher iris data.

 

load fisheriris
X = meas;    % Use all data for fitting
Y = species; % Response data

Construct the classifier using fitcknn.

Mdl = fitcknn(X,Y)
Mdl = 
  ClassificationKNN
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
                 Distance: 'euclidean'
             NumNeighbors: 1


  Properties, Methods

By default, a k-nearest neighbor classifier uses only a single nearest neighbor. A classifier is often more robust when it uses more neighbors.

Change the neighborhood size of Mdl to 4, meaning that Mdl classifies using the four nearest neighbors.

Mdl.NumNeighbors = 4;
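As an alternative sketch, the neighborhood size can also be set when the classifier is constructed, using the 'NumNeighbors' name-value argument of fitcknn. The variable names Mdl4, flwr, and label, and the choice of query point (the column-wise mean of the measurements), are made up for this illustration.

```matlab
load fisheriris
X = meas;    % Predictor data
Y = species; % Response data

% Set the neighborhood size at construction instead of editing the property later
Mdl4 = fitcknn(X,Y,'NumNeighbors',4);

% Classify a hypothetical query point: the average of all measurements
flwr  = mean(X);
label = predict(Mdl4,flwr);  % cell array containing the predicted class name
```

The predicted label is one of the names in Mdl4.ClassNames, chosen by a vote among the four nearest training observations.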

 
