Incorporating Weighted Clustering in 3D Gesture Recognition

John Hiesey (jhiesey@cs.stanford.edu), Clayton Mellina (cmellina@cs.stanford.edu), Zavain Dar (zdar@cs.stanford.edu)
December 16, 2011

1 Introduction

We expand and improve upon the Hidden Markov Model algorithm (KHMM) presented in Klingmann [1] to improve identification of 3D gestures on mobile devices. Specifically, we extend the algorithm to include a weighted probability measure for clustering fit, and we expand the feature space to use gyroscope information, which is available on most mobile devices today. This gives nontrivial gains in both robustness and accuracy in 3D gesture recognition over Klingmann's initial implementation.

We alter KHMM's clustering assumptions. Specifically, we address the implicit assumption that clustering centroids remain uniform across gesture types. By assigning each gesture type its own set of centroids we make significant and immediate gains. We derive two alternative algorithms to test our hypotheses, denoting them Cluster-Matching-HMM and Cluster-Matching.

We use gyroscope information for two reasons. First, gyroscope data is additional information that is useful for identifying gestures, especially considering that slight variations in the device's orientation from gesture to gesture affect accelerometer readings. These variations can be corrected for with judicious use of gyroscope data. We hypothesize that a machine learning algorithm can use the data to make these corrections effectively, resulting in better orientation invariance. Second, if we wish to use 3D gestures as an input modality for mobile devices, including gyroscope information offers opportunities for greater expressiveness: gestures can now include, and be differentiated by, orientation changes.
On a pragmatic note, we develop and test on the iPhone platform, although our solutions are generally applicable and a MATLAB implementation is available. We chose the iPhone because two of our team members have previous iPhone development experience.

2 Data

As with any learning algorithm, we require data. We use Apple's iPhone and the xSensor application to record data and to train and test our algorithms. For each training example of any gesture type, xSensor returns an ordered series of vectors in $\mathbb{R}^n$ containing instantaneous acceleration and gyroscope data sampled at 32 Hz. For the most part we normalize each training example by scaling each vector in the training example by $\frac{n}{\sum_{i=1}^{n} \lVert x_i \rVert}$, where the training example consists of $n$ instantaneous feature vectors $x_1, \dots, x_n$.

3 Algorithms

We now discuss the three main algorithms that we examine and analyze, starting with Klingmann's baseline KHMM, followed by our Cluster-Matching-HMM and Cluster-Matching alterations.
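Before turning to the algorithms, the per-example normalization from Section 2 can be illustrated with a minimal sketch. It assumes the scale factor is the number of vectors in the example divided by the sum of their Euclidean norms, so that the mean vector norm becomes 1. The function name `normalize_example` and the zero-norm guard are illustrative assumptions, not taken from the released code.

```python
import numpy as np


def normalize_example(example):
    """Scale all vectors of one gesture example by len(example) / sum of norms.

    `example` is a (T, d) array of T instantaneous feature vectors.
    Assumption: Euclidean norm; the guard for an all-zero example is ours.
    """
    example = np.asarray(example, dtype=float)
    total_norm = np.linalg.norm(example, axis=1).sum()
    if total_norm == 0.0:
        # Nothing to scale; return an unchanged copy.
        return example.copy()
    return example * (len(example) / total_norm)


# Usage: after scaling, the average vector norm of the example is 1.
raw = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 10.0]])
scaled = normalize_example(raw)
```

Because a single factor is applied to the whole example, the relative magnitudes of the vectors within a gesture are preserved; only the overall intensity of the gesture is removed.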

3.1 KHMM

KHMM uses k-means clustering to discretize the space of real vectors, then trains one Hidden Markov Model for each gesture type. The intuition is as follows: clustering across all gesture types serves mainly to project each data vector, which lies in $\mathbb{R}^n$ (where there are $n$ features), to $\mathbb{Z}^+$. Concretely, this discretizes the data uniformly regardless of the gesture type it belongs to. Following the discretization, KHMM trains a Hidden Markov Model on the discretized training set of each gesture type.

3.1.1 Clustering

Running k-means on the training examples of all gesture types simultaneously defines a single clustering function $f : \mathbb{R}^n \to \mathbb{Z}^+$.

3.1.2 Hidden Markov Model

For each gesture type $g$, we train a Hidden Markov Model on that gesture type's training examples. Namely, in this stage we generate a function $h_g : (\mathbb{Z}^+)^k \to \mathbb{R}_{[0,1]}$ by the Hidden Markov Model process $\mathrm{HMM} : ((\mathbb{Z}^+)^k)^t \to h_g$, where each training example consists of $k$ instantaneous data points and there are $t$ training examples for gesture type $g$.

3.1.3 Classification Procedure

To classify an unknown gesture example, which lies in $(\mathbb{R}^n)^k$, we first transform it to its discretized version, namely $f(\mathbb{R}^n)^k$, and then for each gesture type $g$ compute $h_g((f(\mathbb{R}^n))^k)$. We classify the example as the $g$ for which $h_g$ returns the maximum probability.

3.2 Cluster-Matching-HMM

The motivation for Cluster-Matching-HMM derives from the realization that, in the clustering step, KHMM applies the same centroids to each training example regardless of the gesture type it belongs to. We hypothesize that different gesture types have vastly different feature vectors (i.e. lie in different subspaces of $\mathbb{R}^n$), and thus that we might gain information by clustering each gesture type independently and then training an HMM on each gesture type's own discretization.
In the classification step we then classify over each clustering and each corresponding HMM. Moreover, we weight the resulting HMM probabilities according to how well the example fits the assigned clustering.

3.2.1 Clustering

For each gesture type $g$, running k-means on the training data from that gesture type only defines a clustering function $f_g : \mathbb{R}^n \to \mathbb{Z}^+$.

3.2.2 Hidden Markov Model

For each gesture type $g$, we train a Hidden Markov Model on that gesture type's training examples. Namely, in this stage we generate a function $h_g : (\mathbb{Z}^+)^k \to \mathbb{R}_{[0,1]}$ by the Hidden Markov Model process $\mathrm{HMM} : ((\mathbb{Z}^+)^k)^t \to h_g$, where each training example consists of $k$ instantaneous data points and there are $t$ training examples for gesture type $g$.
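The per-gesture clustering step can be sketched as follows. This is a minimal illustration rather than the released implementation: it uses a plain Lloyd's-iteration k-means written in NumPy, and the names `kmeans`, `fit_codebooks` and `discretize`, the random initialization, and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct training vectors chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance from every point to every centroid, shape (N, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def fit_codebooks(training, k):
    """Fit one set of centroids per gesture type.

    `training` maps a gesture name to a list of (T_i, d) example arrays.
    """
    return {g: kmeans(np.vstack(examples), k) for g, examples in training.items()}


def discretize(example, centroids):
    """Map each vector of an example to the index of its nearest centroid.

    Indices are 0-based here, whereas the text maps to positive integers.
    """
    example = np.asarray(example, dtype=float)
    dists = np.linalg.norm(example[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

Each gesture type's HMM would then be trained on sequences produced by `discretize` with that gesture type's own centroids, so the same raw example yields a different symbol sequence under each gesture's clustering.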

3.2.3 Classification Procedure

To classify a new example, which lies in $(\mathbb{R}^n)^k$, we first transform it, for each gesture type $g$, to its discretized version, namely $f_g(\mathbb{R}^n)^k$, and then compute $h_g((f_g(\mathbb{R}^n))^k)$. Finally, we classify the example as the $g$ for which $h_g \cdot p((f_g)^k)$ returns the maximum value. Here $p$ is a fitness value measuring how well the example fits the given clustering; we use the aggregate sum of the distances of the example's vectors to their respective assigned centroids. Thus each gesture type receives not only its own HMM but also its own clustering.

3.3 Cluster-Matching

Motivated by the effectiveness of Cluster-Matching-HMM (footnote 1), we define Cluster-Matching, which simply clusters the training examples of each gesture type. When classifying, we assign an unknown gesture to the gesture clustering that it fits best.

3.3.1 Clustering

For each gesture type $g$, running k-means on that gesture type's training examples defines a clustering function $f_g : \mathbb{R}^n \to \mathbb{Z}^+$.

3.3.2 Classification Procedure

For all $g$, compute $p((f_g(\mathbb{R}^n))^k)$. Return the $g$ that gives the minimum value.

4 Results

We tested our algorithms (footnote 2) on six datasets: circles (with the phone held relatively flat), triangles (with the phone held similarly), bowling motions, flicking the phone upward, flicking up and to the right, and squares. The squares set was used in only a few tests. These datasets all include gestures of deliberately variable quality: some are carefully controlled, and others less so.
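The Cluster-Matching procedure of Section 3.3 can be sketched in a few lines. The sketch assumes Euclidean distance and takes the fitness value p to be the sum, over the vectors of an example, of the distance to the nearest centroid of a gesture type's clustering; the names `fitness` and `classify` and the dictionary layout are illustrative, not taken from the released code.

```python
import numpy as np


def fitness(example, centroids):
    """Sum of distances from each vector to its nearest centroid (lower is better)."""
    example = np.asarray(example, dtype=float)
    dists = np.linalg.norm(example[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1).sum()


def classify(example, codebooks):
    """Return the gesture whose centroids fit the example best, plus all scores.

    `codebooks` maps a gesture name to its (k, d) centroid array.
    """
    scores = {g: fitness(example, c) for g, c in codebooks.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Usage with two hand-made centroid sets (hypothetical values).
codebooks = {
    "circle": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "flick": np.array([[5.0, 5.0], [6.0, 5.0]]),
}
label, scores = classify(np.array([[0.1, 0.0], [0.9, 0.1]]), codebooks)
```

Note that the score ignores the order of the vectors within the example, which is why this classifier cannot distinguish gestures that visit the same regions of feature space in a different order.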
| Normalization | Features | Gesture Types | Algorithm (alphabet size) | Accuracy |
|---|---|---|---|---|
| Yes | Both | All | Cluster-Matching-HMM (10) | 94.6% |
| Yes | Both | All | KHMM | 89.6% |
| No | Both | All | KHMM | 88.4% |
| Yes | Accel | All | Cluster-Matching-HMM (10) | 92.0% |
| Yes | Accel | All | KHMM | 84.2% |
| Yes | Gyro | All | Cluster-Matching-HMM (10) | 92.3% |
| Yes | Gyro | All | KHMM | 89.7% |
| Yes | Both | All | Cluster-Matching (10) | 96.7% |
| Yes | Both | All | Cluster-Matching (20) | 97.9% |
| Yes | Both | All | Cluster-Matching (40) | 98.4% |
| No | Both | All | Cluster-Matching (40) | 98.4% |
| Yes | Both | Squares and Triangles | Cluster-Matching (10) | 100% |
| Yes | Both | Circles and Reversed-circles | Cluster-Matching (10) | Chance |
| Yes | Both | Circles and Reversed-circles | Cluster-Matching-HMM (10) | 72.0% |
| Yes | Both | Circles and Reversed-circles | Cluster-Matching-HMM (20) | 88.8% |

Footnote 1: See Section 4 (Results).
Footnote 2: All code is available at https://github.com/jhiesey/gesturerecognizer

4.1 Choice of features

In general, the highest accuracy is achieved when using both accelerometer and gyroscope data, followed by gyroscope data only, followed by accelerometer data only. For example, using Cluster-Matching-HMM, accelerometer and gyroscope data together achieve 94.6% on all gesture types, followed by gyroscope data only at 92.3% and accelerometer data only at 92.0%. The same general pattern is evident in the KHMM data as well. As we could not find any examples of using gyroscope data for gesture recognition in the literature, this seems to be a novel way of improving gesture recognition accuracy.

4.2 Clustering approach

Our initial approach of clustering all training data together worked reasonably well. When running on all 5 sample gestures with the HMM enabled, we obtained an accuracy of 89.6%. However, on the same data, our Cluster-Matching algorithm combined with an HMM reaches 94.6% accuracy. We observed similar improvements in other cases, as seen in the data.

4.3 HMM

Although we found a few examples where the Cluster-Matching algorithm benefits from a Hidden Markov Model, in most real-world examples there is little benefit. When running on all of the training data, we found that adding the HMM actually decreased accuracy slightly, from 96.7% to 94.6%, which may be due to random fluctuations. However, when comparing the data for circles to our contrived data set, in which the first and second halves of each circle are switched, using an HMM with an alphabet size of 20 improved the results from chance to 88.8%. Unfortunately, we could not obtain such dramatic improvements with any non-contrived data.

4.4 Number of symbols

Increasing the number of symbols in the HMM's alphabet improves accuracy up to a point, but at the cost of computational performance. As shown in Figure 1, the optimum accuracy is achieved at an alphabet size of around 25 symbols, with gradually decreasing accuracy outside of that range.
4.5 Normalization

In general, our normalization makes only a small difference. With the KHMM algorithm we saw an improvement of less than 1.5% in accuracy, and we saw no detectable difference using Cluster-Matching.

Figure 1: Classification accuracy of Cluster-Matching-HMM for 6 gestures

5 Future Research

We see a variety of promising avenues for the future development of Cluster-Matching. More work needs to be done to assess the scalability of our method, although the outlook is promising. Current literature on accelerometer-based gesture recognition usually reports classification results for 4 to 8 gesture types. We believe that our method has the potential to scale to more gesture types given our current results.

Our technique currently combines centroid-distance and HMM log-likelihood scores heuristically, by simply multiplying them before classifying. It is likely that a more principled method of combining centroid-distance and HMM log-likelihood, possibly one incorporating the data itself, will yield better results. This becomes especially important for gestures on which one component of the classifier performs significantly better than the other.

Lastly, future work should address the possibility of using Cluster-Matching for automated segmentation of continuous accelerometer and gyroscope data. Current proposals use only heuristic methods and do not incorporate the trained gesture classifier; for example, Prekopcsák demonstrates the feasibility of using a speed threshold to detect the onset of a gesture. HMMs do not provide an obvious means of segmentation, as they will readily assign log-likelihoods to sequences of any length [3]. We suspect that Cluster-Matching can be computed over a sliding window on the data stream, providing a means of determining when a sequence of samples is being fit well by a subset of the clusters of one of the gesture cluster-sets.

6 Conclusions

We thus show nontrivial gains in accuracy with respect to our baseline classifier KHMM. These gains are accrued not only through the collection of a new set of features using gyroscope data, but also through our novel algorithms Cluster-Matching-HMM and Cluster-Matching.
Interestingly, it appears that even though we are classifying sequential data, simply assigning cluster fitness scores to all the vectors in a gesture example, regardless of sequential order, yields the most accurate classifier. In fact, the only time this is not the case is when we synthetically manipulate the data files to purposely break our Cluster-Matching classifier. However, we could not reproduce this failure case using actual gesture data. We are thus confident in the ability of Cluster-Matching to accurately classify real human gestures. Regardless, we hope to have pushed forward the field of gesture recognition, not only by presenting classifiers with high accuracy, but also by presenting a new cluster-based paradigm of classifiers.

References

[1] Klingmann, Marco. Accelerometer-Based Gesture Recognition with the iPhone. Master's thesis, Goldsmiths, University of London. 2009.
[2] Wu, Jiahui and Pan, Gang and Zhang, Daqing and Qi, Guande and Li, Shijian. Gesture Recognition with a 3-D Accelerometer. 2009.
[3] Prekopcsák, Zoltán. Accelerometer Based Real-Time Gesture Recognition. 2008.