KNN

Description

The K-Nearest Neighbor (KNN) Classifier is a simple classifier that works well on basic recognition problems; however, it can be slow for real-time prediction when there are a large number of training examples, and it is not robust to noisy data. The KNN algorithm is part of the GRT classification modules.

Advantages

The K-Nearest Neighbor (KNN) Classifier is a very simple classifier that works well on basic recognition problems.

Disadvantages

The main disadvantage of the KNN algorithm is that it is a lazy learner: it does not build a model from the training data, but instead stores the training data itself and uses it directly for classification. To predict the label of a new instance, the algorithm finds the K closest neighbors to that instance in the training data and sets the predicted class label to the most common label among those K neighbors. This means the algorithm must compute the distance to, and sort, all of the training data at each prediction, which can be slow when there are a large number of training examples. Because no model is learned from the training data, the algorithm may also generalize poorly and is not robust to noisy data. Further, changing K can change the resulting predicted class label.
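The prediction rule described above can be sketched in a few lines of standalone C++ (this is an illustrative sketch, not the GRT implementation): compute the distance from the input to every training sample, sort by distance, and take a majority vote among the K closest labels.

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

// Illustrative KNN prediction (not GRT's implementation): predict the class
// label of an input vector by majority vote among its K nearest training
// samples, using squared Euclidean distance.
unsigned int knnPredict( const std::vector< std::vector<double> > &samples,
                         const std::vector<unsigned int> &labels,
                         const std::vector<double> &input,
                         unsigned int K ){
    //Compute the squared distance from the input to every training sample
    std::vector< std::pair<double, unsigned int> > dist; //(distance, label)
    for(size_t i=0; i<samples.size(); i++){
        double d = 0;
        for(size_t j=0; j<input.size(); j++){
            double diff = samples[i][j] - input[j];
            d += diff * diff;
        }
        dist.push_back( std::make_pair(d, labels[i]) );
    }

    //Sort the training data by distance - this per-prediction cost is what
    //makes KNN slow when the training set is large
    std::sort( dist.begin(), dist.end() );

    //Take a majority vote among the K closest neighbors
    std::map<unsigned int, unsigned int> votes;
    for(unsigned int k=0; k<K && k<dist.size(); k++) votes[ dist[k].second ]++;

    unsigned int bestLabel = 0, bestCount = 0;
    for(std::map<unsigned int,unsigned int>::const_iterator it=votes.begin(); it!=votes.end(); ++it){
        if( it->second > bestCount ){ bestCount = it->second; bestLabel = it->first; }
    }
    return bestLabel;
}
```

Note that the entire training set is scanned on every call, which is exactly why KNN prediction cost grows with the number of training examples.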

Training Data Format

You should use the LabelledClassificationData data structure to train the KNN classifier.

Example Code

This example demonstrates how to initialize, train, and use the KNN algorithm for classification. The example loads the data shown in the image below and uses it to train the KNN algorithm. The data is a recording of a Wii-mote being held in 5 different orientations; the top graph shows the raw x, y, and z accelerometer data from the recording, while the bottom graph shows the label recorded for each sample (you can see the 5 different classes in the label data). You can download the actual dataset in the Code & Resources section below.

KNN Training Data (WiiAccelerometerData.jpg)
/*
 GRT KNN Example
 This example demonstrates how to initialize, train, and use the KNN algorithm for classification.

 The K-Nearest Neighbor (KNN) Classifier is a simple classifier that works well on basic recognition problems; however, it can be slow for real-time prediction when there are a large number of training examples, and it is not robust to noisy data.

 In this example we create an instance of a KNN algorithm and then train the algorithm using some pre-recorded training data.
 The trained KNN algorithm is then used to predict the class label of some test data.

 This example shows you how to:
 - Create and initialize the KNN algorithm and set the number of neighbors to use for classification
 - Load some LabelledClassificationData from a file and partition the training data into a training dataset and a test dataset
 - Train the KNN algorithm using the training dataset
 - Test the KNN algorithm using the test dataset
 - Manually compute the accuracy of the classifier
*/


#include "GRT.h"
using namespace GRT;

int main (int argc, const char * argv[])
{
    //Create a new KNN classifier with a K value of 10
    KNN knn(10);
    knn.setNullRejectionCoeff( 10 );
    knn.enableScaling( true );
    knn.enableNullRejection( true );

    //Train the classifier with some training data
    LabelledClassificationData trainingData;

    if( !trainingData.loadDatasetFromFile("KNNTrainingData.txt") ){
        cout << "Failed to load training data!\n";
        return EXIT_FAILURE;
    }

    //Use 20% of the training dataset to create a test dataset
    LabelledClassificationData testData = trainingData.partition( 80 );

    //Train the classifier
    if( !knn.train( trainingData ) ){
        cout << "Failed to train classifier!\n";
        return EXIT_FAILURE;
    }

    //Save the knn model to a file
    if( !knn.saveModelToFile("KNNModel.txt") ){
        cout << "Failed to save the classifier model!\n";
        return EXIT_FAILURE;
    }

    //Load the knn model from a file
    if( !knn.loadModelFromFile("KNNModel.txt") ){
        cout << "Failed to load the classifier model!\n";
        return EXIT_FAILURE;
    }

    //Use the test dataset to test the KNN model
    double accuracy = 0;
    for(UINT i=0; i<testData.getNumSamples(); i++){
        //Get the i'th test sample
        UINT classLabel = testData[i].getClassLabel();
        vector< double > inputVector = testData[i].getSample();

        //Perform a prediction using the classifier
        bool predictSuccess = knn.predict( inputVector );

        if( !predictSuccess ){
        cout << "Failed to perform prediction for test sample: " << i <<"\n";
            return EXIT_FAILURE;
        }

        //Get the predicted class label
        UINT predictedClassLabel = knn.getPredictedClassLabel();
        vector< double > classLikelihoods = knn.getClassLikelihoods();
        vector< double > classDistances = knn.getClassDistances();

        //Update the accuracy
        if( classLabel == predictedClassLabel ) accuracy++;

        cout << "TestSample: " << i <<  " ClassLabel: " << classLabel << " PredictedClassLabel: " << predictedClassLabel << endl;
    }

    cout << "Test Accuracy: " << accuracy/double(testData.getNumSamples())*100.0 << "%" << endl;

    return EXIT_SUCCESS;
}

Code & Resources

KNNExample.cpp KNNTrainingData.txt

Documentation

You can find the documentation for this class at KNN documentation.