Getting Started

This page provides a general overview of the GRT, describing the common steps required to create your own custom gesture-recognition system with it.

First, download and install the GRT. There are then generally six main steps to creating a gesture-recognition system using the GRT:

  1. Select a suitable pre-processing algorithm, feature-extraction algorithm, recognition algorithm, and post-processing algorithm that might work well for your gesture-recognition problem
  2. Set up a gesture-recognition pipeline using these algorithms
  3. Record some training data
  4. Train the pipeline
  5. Test the recognition accuracy of the pipeline
  6. Use the pipeline for the real-time prediction of your gestures

[1] Selecting suitable algorithms to solve your gesture-recognition problem

A common stumbling block for anyone new to creating a gesture-recognition system is simply knowing which machine-learning algorithm, and which supporting algorithms, might work best for the problem at hand.

A good place to start is to ask yourself what the output of your system should consist of, i.e. do you want the output of the system to:

  1. consist of a specific value indicating that a specific gesture has just been made, or
  2. consist of a continuous value or values?

If the first of these options sounds best then you are trying to solve a classification problem. Alternatively, if the second option sounds more suitable then you are trying to solve a regression problem. An example of a classification problem might involve giving your system an input image, captured from a webcam for instance, and having the system classify whether a specific user's face appears within this image. An example of a regression problem, by contrast, might involve giving your system an input image, perhaps captured from a camera on top of a moving car, and having the system estimate what angle the car's steering wheel should be turned to keep the car on the road.

If you are trying to solve a classification problem then you can further break this problem down into two sub-categories: recognizing (1) static postures and (2) temporal gestures. A static posture, for example, might be that a user is holding a device, equipped with a 3-axis accelerometer, in a certain orientation (think of how your smartphone automatically changes the layout of its display from portrait to landscape, or vice versa, when you hold it in a specific orientation). Alternatively, a temporal gesture might consist of a user making a left-handed swipe gesture in front of your interface. Static postures are generally much easier to recognize, as there is no temporal variability in the gesture to account for.

The GRT has a number of algorithms that are suitable for both static-posture recognition and temporal-gesture recognition; examples of these are:

  • Static Posture Recognition Algorithms
  1. K-Nearest Neighbor Classifier (KNN): A very simple classifier that works well on basic recognition problems, although it can be slow for real-time prediction and is not robust to noisy data
  2. Adaptive Naive Bayes Classifier (ANBC): A naive but powerful classifier that works very well on both basic and more complex recognition problems
  3. Support Vector Machine (SVM): A very powerful classifier that works very well on complex classification problems
  • Temporal Gesture Recognition Algorithms
  1. Dynamic Time Warping (DTW): A powerful classifier for temporal gestures, although it is not suitable for static postures

The KNN, ANBC, and SVM algorithms can also be used for temporal-gesture recognition if they are paired with a suitable feature-extraction algorithm that can take a temporal signal and compute some relevant features from it, which can then be input into the aforementioned classifiers, as sketched below.
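
For example, a minimal sketch of this pairing, assuming the KNN classifier's default constructor (the FFT window size is just an illustrative value):

//Pair a static-posture classifier (KNN) with a feature-extraction module (an FFT) so that the classifier receives frequency-domain features computed from the temporal signal
GestureRecognitionPipeline pipeline;
pipeline.addFeatureExtractionModule( FFT(512) );
pipeline.setClassifier( KNN() );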

The GRT also has a number of algorithms for regression and continuous mapping; these include the following (see the sketch after this list):

  • Regression Algorithms
  1. Linear Regression: A simple algorithm that learns a linear mapping between the input data and the output value(s)
  2. Multi Layer Perceptron (MLP) Artificial Neural Network: A more powerful algorithm that can also learn nonlinear mappings between the input data and the output value(s)
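
A regression pipeline is set up in the same way as a classification pipeline, except that a regression module is set at its core. A minimal sketch, assuming the GRT's LinearRegression class and the pipeline's setRegressifier(...) method:

//Set a regression module, rather than a classifier, at the core of the pipeline
GestureRecognitionPipeline pipeline;
pipeline.setRegressifier( LinearRegression() );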

[2] Setting up a gesture-recognition pipeline

After you have selected the algorithms that might be suitable for your gesture-recognition problem, you can then set up a new gesture-recognition system which you can train and use to recognize your gestures. The GRT represents a gesture-recognition system as a Gesture Recognition Pipeline. A Gesture Recognition Pipeline allows you to pass your sensor data into the pipeline and then retrieve the predicted class label of a gesture from the end of the pipeline. A pipeline serves as a container to which you can add the classification or regression algorithm that you think might work best for solving your gesture-recognition problem.

You can also add an unlimited number of pre-processing, feature-extraction, and post-processing algorithms to the pipeline. These supporting algorithms not only make the job of the classification algorithm easier, they can also make the classification results more accurate, for example by mitigating false-positive classification errors or by stopping one gesture from being recognized several times when you only performed it once. These supporting algorithms are represented as modules in the pipeline, allowing the output of one module to be used as the input to the next module.

The example code below demonstrates how to create a new gesture recognition pipeline, using an ANBC classification algorithm at the core of the pipeline, supported by a number of pre-processing, feature-extraction, and post-processing modules.

//Include the main GRT header and use the GRT namespace
#include "GRT.h"
using namespace GRT;

//Create a new GestureRecognitionPipeline
GestureRecognitionPipeline pipeline;

//Add a moving average filter (with a buffer size of 5 and for a 1 dimensional signal) as a pre-processing module to the start of the pipeline
pipeline.addPreProcessingModule( MovingAverageFilter(5,1) );

//Add an FFT (with a window size of 512) as a feature-extraction module to the pipeline; the input to this module will consist of the output of the moving average filter
pipeline.addFeatureExtractionModule( FFT(512) );

//Add a custom feature module to the pipeline; it's easy to integrate your own custom feature-extraction (and pre-processing or post-processing) modules into any pipeline
//This custom feature-extraction module (MyOwnFeatureMethod is just a placeholder for your own class) might, for example, take the output of the FFT module as its input and compute some features from the FFT signal, such as the top N frequency values
pipeline.addFeatureExtractionModule( MyOwnFeatureMethod() );

//Set the classifier at the core of the pipeline, in this case we are using an Adaptive Naive Bayes Classifier
pipeline.setClassifier( ANBC() );

//Add a class label timeout filter (with a timeout value of 1000 milliseconds, i.e. 1 second) as a post-processing module to the end of the pipeline; this will filter the predicted class output from the ANBC algorithm
pipeline.addPostProcessingModule( ClassLabelTimeoutFilter(1000) );

After setting up your custom gesture-recognition pipeline you can train it and then use the trained pipeline for real-time gesture recognition.


[3] Recording some training data

Before you can use a gesture-recognition pipeline to recognize your real-time gestures, you need to train the classification or regression algorithm at the core of the pipeline. To train the algorithm you need to record some examples of the gestures you want the pipeline to recognize and then use this training data to train the pipeline.

The GRT has a number of utilities and data structures to help you record, label, manage, save, and load training data.

For example, if you are using an ANBC, KNN, GMM, or SVM classifier at the core of the pipeline then you should record your training data using the ClassificationData data structure:

//Create a new instance of the ClassificationData
ClassificationData trainingData;

//Set the dimensionality of the data
trainingData.setNumDimensions( 3 );

//Here you would grab some data from your sensor and label it with the corresponding gesture it belongs to
UINT gestureLabel = 1;
vector< double > sample(3);
sample[0] = 0; //...Replace this placeholder with data from your sensor
sample[1] = 0; //...Replace this placeholder with data from your sensor
sample[2] = 0; //...Replace this placeholder with data from your sensor

//Add the sample to the training data
trainingData.addSample( gestureLabel, sample );

//After recording your training data you can then save it to a file
bool saveResult = trainingData.saveDatasetToFile( "TrainingData.txt" );

//This can then be loaded later
bool loadResult = trainingData.loadDatasetFromFile( "TrainingData.txt" );

where UINT is a GRT type representing an unsigned int.

You can also use the ClassificationData class to load CSV data directly from a file, allowing you to record, label, and edit your training data in another program (such as Excel or Matlab), as sketched below. You can find a complete example of how to record, label, manage, save, and load training data on the ClassificationData reference page. The reference page also contains a link to the full documentation for the ClassificationData class.
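
A minimal sketch of the CSV route, assuming the loading method is named loadDatasetFromCSVFile:

//Load training data that was recorded and labelled in another program and saved as a CSV file
ClassificationData trainingData;
bool loadResult = trainingData.loadDatasetFromCSVFile( "TrainingData.csv" );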


[4] Training the pipeline

After recording the training data you can then use this to train the classification or regression algorithm at the core of the pipeline. The pipeline can be trained using the training data:

//Train the pipeline
bool trainSuccess = pipeline.train( trainingData );

where trainingData is one of the GRT data structures containing the training data.

The code listed above for training the recognition algorithm at the core of your recognition pipeline is the same regardless of which machine-learning algorithm is being used. This means that no matter what machine-learning algorithm you are using, whether classification or regression, Support Vector Machine or Hidden Markov Model, training the algorithm requires exactly the same one line of code!
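
For instance, a minimal sketch of swapping in a different classifier, assuming the SVM's default constructor; the training call itself is unchanged:

//Swap the classifier at the core of the pipeline; the training call stays the same
pipeline.setClassifier( SVM() );
bool trainSuccess = pipeline.train( trainingData );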


[5] Testing the recognition accuracy of the pipeline

After training the pipeline you can quickly test it to validate how well the pipeline will work with new data:

//Test the classification accuracy of the trained pipeline
bool testSuccess = pipeline.test( testData );

//You can then get the accuracy of how well the pipeline performed with the test data
double accuracy = pipeline.getTestAccuracy();

//Along with some other results such as the F-Measure, Precision and Recall
double fMeasure = pipeline.getTestFMeasure();
double precision = pipeline.getTestPrecision();
double recall = pipeline.getTestRecall();

where testData is one of the GRT data structures containing the test data. Check out the ClassificationData reference page to see how you can easily partition one large dataset into a training dataset and a test dataset, as sketched below.
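
A minimal sketch of partitioning, assuming the partition method takes the percentage of samples to keep for training and returns the remaining samples as a new dataset:

//Keep 80% of the samples for training and move the remaining 20% into a test dataset
ClassificationData testData = trainingData.partition( 80 );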

Alternatively, if you don't have enough training data to create a separate test dataset you can train the pipeline using k-fold cross validation and the existing training data:

//Train the pipeline using 10-fold cross validation
bool trainSuccess = pipeline.train( trainingData, 10 );

//You can then get the accuracy of how well the pipeline performed during the k-fold cross-validation testing
double accuracy = pipeline.getCrossValidationAccuracy();

where trainingData is one of the GRT data structures containing your training data and 10 is the number of folds to use for the cross-validation training.


[6] Using the pipeline for real-time prediction

After training the pipeline you can then use it to predict the class label (i.e. gesture label) of new data:

//Perform the prediction
bool predictionSuccess = pipeline.predict( inputVector );

//You can then get the predicted class label from the pipeline
UINT predictedClassLabel = pipeline.getPredictedClassLabel();

//Along with some other results such as the likelihood of the most likely class or the likelihood of all the classes in the model
double bestLikelihood = pipeline.getMaximumLikelihood();
vector<double> classLikelihoods = pipeline.getClassLikelihoods();

//You can then use the predicted class label to trigger the action associated with that gesture
if( predictedClassLabel == 1 ){
    //Trigger the action associated with gesture 1
}
else if( predictedClassLabel == 2 ){
    //Trigger the action associated with gesture 2
}

where inputVector is a C++ vector containing the new data from your sensor and UINT is a GRT type representing an unsigned int.

Example Code

You can find a simple example that shows you how to create a gesture-recognition pipeline, train the pipeline and test it using some pre-recorded data in the Hello World tutorial on the tutorials page.