This example application shows how to use the hidden Markov model (HMM) gesture model for offline classification of pre-segmented gestures. The gestures used here are full-body human gestures recorded using a pair of video cameras.

There are 12 different gestures used as input to the example, each with 40 recorded trials. Each gesture is modeled by a separate hidden Markov model, trained using the first 8 trials of that gesture; the remaining 32 trials are used for testing.

A gesture trial is represented as a sequence of feature vectors, with each feature vector representing one frame of the video recording of the trial. The feature vectors are calculated from the video frames using an algorithm developed by Bo Peng and Gang Qian (currently under review for publication). The details of the feature vector calculation are not important for this discussion - what is important is that a particular gesture trial is manifested as a sequence of feature vectors. To use the AME Patterns library, this means we need to treat the feature vector as an Observation used by an ObservationDistribution.

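Due to the nature of these particular feature vectors, the absolute value of the dot product serves as a similarity measure between two feature vectors. Hence, we can use the nearest_neighbor ObservationDistribution directly with this measure. Alternatively, when USE_NEAREST_NEIGHBOR is not defined, the similarity is converted to a difference (1 minus the similarity, clamped at zero) and used with the mixture_of_gaussians ObservationDistribution: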

#include <algorithm>        // std::max
#include <cmath>            // fabs
#include <boost/array.hpp>

// we will store a feature vector corresponding to a video frame in an array
typedef boost::array<double, feature_vector_size> observation_type;

// the similarity measure between two feature vectors is the absolute value
// of the dot product
struct similarity
{
    double operator()(const observation_type &example, const observation_type &a) const
    {
        double sum = 0;
        for(observation_type::const_iterator observation = a.begin(),
            example_observation = example.begin();
            observation != a.end();
            ++observation, ++example_observation)
        {
            sum += (*example_observation) * (*observation);
        }
        return fabs(sum);
    }
};

#ifdef USE_NEAREST_NEIGHBOR

typedef ame::patterns::model_state::nearest_neighbor<observation_type, similarity> model_state_type;

#else

struct difference
{
    double operator()(const observation_type &example, const observation_type &a) const
    {
        return std::max(0.0, 1 - similarity()(example, a));
    }
};

typedef ame::patterns::model_state::mixture_of_gaussians<observation_type, difference> model_state_type;

#endif
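
To make the two measures concrete, here is a small standalone sketch that does not depend on the AME Patterns library. It uses std::vector<double> in place of observation_type so that the sizes can be checked at run time, and the function names abs_dot_similarity and difference_from_similarity are illustrative only. It also assumes the feature vectors are normalized to unit length, so that the similarity lies between 0 and 1; the original text does not state this explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Absolute value of the dot product of two feature vectors.
// (Illustrative stand-in for the similarity functor above.)
double abs_dot_similarity(const std::vector<double> &a, const std::vector<double> &b)
{
    assert(a.size() == b.size());
    double sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return std::fabs(sum);
}

// Difference derived from the similarity, clamped at zero.
// Assumes unit-length vectors, so the similarity is at most 1.
double difference_from_similarity(const std::vector<double> &a, const std::vector<double> &b)
{
    return std::max(0.0, 1 - abs_dot_similarity(a, b));
}
```

With these definitions, the vectors (1, 0, 0) and (-1, 0, 0) have similarity 1 and difference 0, because the sign of the dot product is discarded, while (1, 0, 0) and (0, 1, 0) have similarity 0 and difference 1.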

The sequences of feature vectors corresponding to a gesture are read from a file. The following function reads a requested trial of a requested gesture and returns it as a vector of feature vectors:

typedef std::vector<observation_type> gesture_type;
gesture_type read_gesture(int num, int index);

To train an HMM gesture model, we just read in the first 8 trials of the gesture and initialize the model with them:

template<typename hmm_type>
void train(hmm_type &hmm, unsigned gesture)
{
    std::vector<gesture_type> training;
    
    for(int i=0; i<8; i++)
        training.push_back(read_gesture(gesture, i));
    hmm.train_with_examples(training, 0.2, 0.7, 0.1, 12, 10, 0, model_state_type(0.1));
}

To test the model, we read in a test trial, and match it to the model. The function will return the probability of the most likely state sequence that explains the matched trial - higher probabilities indicate that the trial is a better match for the model:

template<typename hmm_type>
double test(hmm_type &hmm, unsigned gesture, unsigned trial)
{
    return hmm.match_sequence(read_gesture(gesture, trial));
}
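
For readers unfamiliar with the "most likely state sequence" score, the following sketch shows the standard Viterbi recursion for a small discrete HMM, computed in log space to avoid underflow on long sequences. It is not the implementation behind match_sequence; the toy_hmm structure, its fields, and the function name best_path_log_probability are assumptions made for this illustration, and the library's models work with the continuous feature vectors described above rather than discrete symbols.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical discrete HMM used only for this sketch.
struct toy_hmm
{
    std::vector<double> initial;                   // initial[s] = P(first state is s)
    std::vector<std::vector<double> > transition;  // transition[r][s] = P(next is s | current is r)
    std::vector<std::vector<double> > emission;    // emission[s][o] = P(symbol o | state s)
};

// Returns the log-probability of the single most likely state sequence
// that explains the observations (Viterbi recursion in log space).
double best_path_log_probability(const toy_hmm &hmm, const std::vector<std::size_t> &observations)
{
    assert(!observations.empty());
    const std::size_t states = hmm.initial.size();
    const double minus_infinity = -std::numeric_limits<double>::infinity();

    // score[s] = best log-probability of any path that ends in state s
    std::vector<double> score(states);
    for (std::size_t s = 0; s < states; ++s)
        score[s] = std::log(hmm.initial[s]) + std::log(hmm.emission[s][observations[0]]);

    for (std::size_t t = 1; t < observations.size(); ++t)
    {
        std::vector<double> next(states, minus_infinity);
        for (std::size_t s = 0; s < states; ++s)
        {
            // best predecessor r for state s at time t
            for (std::size_t r = 0; r < states; ++r)
                next[s] = std::max(next[s], score[r] + std::log(hmm.transition[r][s]));
            next[s] += std::log(hmm.emission[s][observations[t]]);
        }
        score.swap(next);
    }
    return *std::max_element(score.begin(), score.end());
}
```

Because the logarithm is monotonic, comparing these log-probabilities across models ranks them in the same order as comparing the probabilities themselves.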

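We can now write our main function. This will be a simple recognition rate test, which will train 12 HMM gesture models, and then use the remaining 32 trials of each gesture for testing. The test is repeated for four model types, selected by the Scenario template parameter: three gesture_snm variants (constant, zigzag, and individual parameters) and the standard hmm: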

template<unsigned Scenario>
void run_test()
{    
    using namespace ame::patterns;
    using namespace ame;

    typedef boost::mpl::vector<
        patterns::gesture_snm<selectors::static_circular_buffer<1000>, model_state_type, patterns::ConstantParameters>,
        patterns::gesture_snm<selectors::static_circular_buffer<1000>, model_state_type, patterns::zigzag_parameters>,
        patterns::gesture_snm<selectors::static_circular_buffer<1000>, model_state_type, patterns::IndividualParameters>,
        patterns::hmm<selectors::static_circular_buffer<1000>, model_state_type>
    > gr_types;

    typedef typename boost::mpl::at_c<gr_types, Scenario>::type hmm_type;

    // train the 12 HMMs
    hmm_type hmm[num_gestures];
    for(unsigned i=0; i<num_gestures; i++)
        train(hmm[i], i);
    
    // go through each gesture
    for(unsigned g=0; g<num_gestures; g++)
    {
        // this will keep track of how many we get right
        unsigned correct_count = 0;
        std::vector<unsigned> missclassifications;
        
        // go through each test trial
        for(unsigned i=first_test_trial; i<last_test_trial; i++)
        {
            // get the probability given by each of the 12 HMMs
            std::vector<double> probabilities;
            for(unsigned h=0; h<num_gestures; h++)
                probabilities.push_back(test(hmm[h], g, i));
                
            // find the best match
            unsigned best_match = boost::max_element(probabilities) - probabilities.begin();
            // if we got it right, record it
            if (best_match == g)
                correct_count++;
            else
                missclassifications.push_back(best_match);
        }
        // print the recognition rate; the loop above visits
        // (last_test_trial - first_test_trial) trials
        std::cout << "Gesture " << g << " recognition rate: "
                  << 100.0 * correct_count / (last_test_trial - first_test_trial)
                  << "%" << " (misclassifications:";
        BOOST_FOREACH(unsigned m, missclassifications)
            std::cout << " " << m;
        std::cout << ")" << std::endl;
    }    
}

int main(int, char* [])
{
    std::cout << "CONSTANT" << std::endl;
    run_test<0>();
    std::cout << "ZIGZAG" << std::endl;
    run_test<1>();
    std::cout << "INDIVIDUAL" << std::endl;
    run_test<2>();
    std::cout << "STANDARD" << std::endl;
    run_test<3>();

    return 0;
} // int main(int, char* [])
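
The classification rule used in the test loop (pick the model with the highest score, then count how often that pick is correct) can be isolated into two small helpers. This is a standalone sketch using only the standard library; the names best_match and recognition_rate are chosen for this illustration and are not part of the AME Patterns library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the highest score; ties resolve to the lowest index,
// because std::max_element returns the first maximum.
std::size_t best_match(const std::vector<double> &scores)
{
    assert(!scores.empty());
    return static_cast<std::size_t>(
        std::max_element(scores.begin(), scores.end()) - scores.begin());
}

// Percentage of trials whose predicted gesture equals the expected gesture.
double recognition_rate(const std::vector<std::size_t> &predicted, std::size_t expected)
{
    assert(!predicted.empty());
    const std::size_t correct = static_cast<std::size_t>(
        std::count(predicted.begin(), predicted.end(), expected));
    return 100.0 * correct / predicted.size();
}
```

With 32 test trials per gesture, a single misclassified trial gives 100 * 31 / 32 = 96.875%, so every rate the test can report is a multiple of 3.125%.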

The results on this particular dataset are:

Gesture 0 recognition rate: 96.875%
Gesture 1 recognition rate: 90.625%
Gesture 2 recognition rate: 96.875%
Gesture 3 recognition rate: 100%
Gesture 4 recognition rate: 96.875%
Gesture 5 recognition rate: 40.625%
Gesture 6 recognition rate: 90.625%
Gesture 7 recognition rate: 96.875%
Gesture 8 recognition rate: 96.875%
Gesture 9 recognition rate: 50%
Gesture 10 recognition rate: 90.625%
Gesture 11 recognition rate: 84.375%