This example application shows how to use hidden Markov model (HMM) gesture models for offline classification of pre-segmented gestures. The gestures used here are full-body human gestures recorded with a pair of video cameras.
The example uses 12 different gestures, each with 40 recorded trials. Each gesture is modeled by a separate HMM, trained on 8 trials of that gesture.
A gesture trial is represented as a sequence of feature vectors, with individual
feature vectors representing a frame of the video recording of the trial.
The feature vectors are calculated from the video frames using an algorithm
developed by Bo Peng and Gang Qian (currently under review for publication).
The details of the feature vector calculation are not important for this
discussion; what matters is simply that each gesture trial is such a sequence
of feature vectors. To use the AME Patterns library, we therefore treat the
feature vector as an Observation used by an ObservationDistribution.
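Because of the nature of these particular feature vectors, the absolute value
of their dot product serves as a similarity measure between two feature
vectors. Hence, we can use the nearest_neighbor ObservationDistribution (or,
when USE_NEAREST_NEIGHBOR is not defined, a mixture_of_gaussians distribution
over a difference measure derived from that similarity) as follows: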
```cpp
#include <algorithm>  // std::max
#include <cmath>      // std::fabs
#include <numeric>    // std::inner_product
#include <vector>
#include <boost/array.hpp>

// we will store a feature vector corresponding to a video frame in an array
// (feature_vector_size is a constant defined elsewhere in the example)
typedef boost::array<double, feature_vector_size> observation_type;

// the similarity measure between two feature vectors is the absolute value
// of their dot product
struct similarity
{
    double operator()(const observation_type &example, const observation_type &a) const
    {
        return std::fabs(std::inner_product(a.begin(), a.end(), example.begin(), 0.0));
    }
};

#ifdef USE_NEAREST_NEIGHBOR
typedef ame::patterns::model_state::nearest_neighbor<observation_type, similarity> model_state_type;
#else
// alternative: a mixture of Gaussians over a difference measure derived
// from the similarity, clamped at zero
struct difference
{
    double operator()(const observation_type &example, const observation_type &a) const
    {
        return std::max(0.0, 1 - similarity()(example, a));
    }
};
typedef ame::patterns::model_state::mixture_of_gaussians<observation_type, difference> model_state_type;
#endif

// a gesture trial is a sequence of feature vectors, read from a file
typedef std::vector<observation_type> gesture_type;
gesture_type read_gesture(int num, int index);
```
The sequences of feature vectors corresponding to a gesture are read from a file by the function read_gesture(num, index) declared above, which returns trial index of gesture num as a vector of feature vectors (a gesture_type). Its body is not reproduced here.
To train an HMM gesture model, we read the first 8 trials of the gesture and use them to train the model:
```cpp
template<typename hmm_type>
void train(hmm_type &hmm, unsigned gesture)
{
    // the first 8 trials of the gesture are used for training
    std::vector<gesture_type> training;
    for(int i=0; i<8; i++)
        training.push_back(read_gesture(gesture, i));
    hmm.train_with_examples(training, 0.2, 0.7, 0.1, 12, 10, 0, model_state_type(0.1));
}
```
To test a model, we read in a test trial and match it against the model. The function returns the probability of the most likely state sequence explaining the trial; a higher probability indicates that the trial is a better match for the model:
```cpp
template<typename hmm_type>
double test(hmm_type &hmm, unsigned gesture, unsigned trial)
{
    return hmm.match_sequence(read_gesture(gesture, trial));
}
```
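We can now write the main function. It performs a simple recognition-rate test: it trains 12 HMM gesture models and then uses the remaining 32 trials of each gesture for testing. The run_test template is parameterized by Scenario, which selects one of four model types (three gesture_snm variants, using constant, zigzag, and individual parameters, and the standard hmm); main runs all four: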
```cpp
#include <boost/foreach.hpp>
#include <boost/mpl/at.hpp>
#include <boost/mpl/vector.hpp>
#include <boost/range/algorithm/max_element.hpp>
#include <iostream>
#include <vector>

// num_gestures, first_test_trial and last_test_trial are constants
// defined elsewhere in the example
template<unsigned Scenario>
void run_test()
{
    using namespace ame::patterns;
    using namespace ame;

    // the four model types being compared; Scenario selects one of them
    typedef boost::mpl::vector<
        patterns::gesture_snm<selectors::static_circular_buffer<1000>, model_state_type, patterns::ConstantParameters>,
        patterns::gesture_snm<selectors::static_circular_buffer<1000>, model_state_type, patterns::zigzag_parameters>,
        patterns::gesture_snm<selectors::static_circular_buffer<1000>, model_state_type, patterns::IndividualParameters>,
        patterns::hmm<selectors::static_circular_buffer<1000>, model_state_type>
    > gr_types;
    typedef typename boost::mpl::at_c<gr_types, Scenario>::type hmm_type;

    // train the 12 HMMs
    hmm_type hmm[num_gestures];
    for(unsigned i=0; i<num_gestures; i++)
        train(hmm[i], i);

    // the loop below visits trials [first_test_trial, last_test_trial),
    // so this is the number of trials actually tested per gesture
    const unsigned num_test_trials = last_test_trial - first_test_trial;

    // go through each gesture
    for(unsigned g=0; g<num_gestures; g++)
    {
        // this will keep track of how many we get right
        unsigned correct_count = 0;
        std::vector<unsigned> misclassifications;

        // go through each test trial
        for(unsigned i=first_test_trial; i<last_test_trial; i++)
        {
            // get the probability given by each of the 12 HMMs
            std::vector<double> probabilities;
            for(unsigned h=0; h<num_gestures; h++)
                probabilities.push_back(test(hmm[h], g, i));

            // find the best match
            unsigned best_match = boost::max_element(probabilities) - probabilities.begin();

            // if we got it right, record it
            if (best_match == g)
                correct_count++;
            else
                misclassifications.push_back(best_match);
        }

        // print the recognition rate
        std::cout << "Gesture " << g << " recognition rate: "
                  << 100.0 * correct_count / num_test_trials << "%"
                  << " (misclassifications:";
        BOOST_FOREACH(unsigned m, misclassifications)
            std::cout << " " << m;
        std::cout << ")" << std::endl;
    }
}

int main(int, char* [])
{
    std::cout << "CONSTANT" << std::endl;
    run_test<0>();
    std::cout << "ZIGZAG" << std::endl;
    run_test<1>();
    std::cout << "INDIVIDUAL" << std::endl;
    run_test<2>();
    std::cout << "STANDARD" << std::endl;
    run_test<3>();
    return 0;
}
```
The results on this particular dataset are:
```
Gesture 0 recognition rate: 96.875%
Gesture 1 recognition rate: 90.625%
Gesture 2 recognition rate: 96.875%
Gesture 3 recognition rate: 100%
Gesture 4 recognition rate: 96.875%
Gesture 5 recognition rate: 40.625%
Gesture 6 recognition rate: 90.625%
Gesture 7 recognition rate: 96.875%
Gesture 8 recognition rate: 96.875%
Gesture 9 recognition rate: 50%
Gesture 10 recognition rate: 90.625%
Gesture 11 recognition rate: 84.375%
```
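Each reported rate is a multiple of 1/32 = 3.125%, consistent with 32 test trials per gesture. Because every gesture has the same number of test trials, the overall recognition rate over these results is simply the mean of the twelve per-gesture rates $r_g$:

$$
\bar{r} = \frac{1}{12}\sum_{g=0}^{11} r_g = \frac{1031.25\%}{12} = 85.9375\% = \frac{330}{384},
$$

that is, 330 of the 384 test trials are classified correctly.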