pp. 124-128
http://dx.doi.org/10.14257/astl.2016.129.25

Feature Based Hybrid Neural Network for Hand Gesture Recognition

HyeYeon Cho 1, Hyo-Rim Choi 1 and Taeyong Kim 1

1 Dept. of Advanced Imaging Science, Chung-Ang University, Heukseok-dong, Dongjak-gu, Seoul, 156-756, Korea
nuage1009@gmail.com, funappear@nate.com, kimty@cau.ac.kr

Abstract. This paper presents a neural network method for classifying hand gestures effectively, for use in home appliances or serious games. Neural learning from imbalanced data has some difficulties, so we present a Feature based Hybrid Neural Network (FHNN), in which a simple calculation extracts data-distribution features and adds them to the input layer for neural learning. As data-distribution features, we use particular points of each gesture, which give an approximate classification, together with the extrema of the gesture trajectory. The experimental results show that FHNN can outperform the compared methods.

Keywords: Neural network, Kinect, Gesture recognition, HCI

1 Introduction

Research in Human-Computer Interaction (HCI) has recently progressed rapidly [1]. Among the several techniques for interaction, camera-based gesture recognition is widely used because it provides a natural interaction method [2]. Previous research used 2D images, but these lack robustness to environmental changes, which has put depth-camera research in the spotlight. The Microsoft Kinect sensor provides joint orientation information for the tracked skeletons, which can easily be used in gesture recognition research [3]. However, previous research showed that as the database grows, calculation becomes slow and complex statistical methods are needed. Gestures are used daily in games, medical systems, interaction between humans and robots, hand signals, sign language, and so on. Such gestures have widely varying data distributions in each class [4].
To make hand gesture recognition widely and effectively usable, we present a method with simple calculation that lets users easily build a feed-forward neural network structure trained with the back-propagation learning algorithm.

1 This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2015R1D1A1A01058394) and by the Chung-Ang University's Cross Functional Team (CFT) Program under Brain Korea 21 PLUS Project in 2015.

ISSN: 2287-1233 ASTL
Copyright 2016 SERSC

2 Hand gesture recognition and neural networks

There have been varied studies on hand gesture recognition, using methods ranging from Hidden Markov Models (HMM) and Dynamic Time Warping (DTW) to support vector machines (SVM) and Neural Networks (NN). HMM has a rich mathematical structure, and its statistical model of sequence data suits application fields such as speech signal and gesture recognition, but it has the drawback that multi-dimensional data must be discretized and converted to one-dimensional data. DTW is an algorithm for measuring the difference between two sequences that vary in time and speed, although it has the weakness that the amount of computation grows as the data grows. SVM is a supervised learning model used for gesture recognition, but its speed and size are limiting in the learning and testing phases.

Neural Networks (NN) are typically organized in layers. An NN trains and updates the network until the output matches the target [5]. Once the network is trained, it can be used for recognition and classification on a test dataset. Its advantages are that it requires less training and can perform non-linear classification [6] and [9]. NNs have problems with overfitting and imbalanced data, but much research has been carried out on these problems [4] and [7]. An NN is a non-parametric model that is easier and faster to adopt than other models. We suggest a hybrid-structure neural network that uses pre-extracted features as input nodes, so that many diverse gestures are recognized effectively and quickly.

3 Extract hand gesture trajectory and calculate extended node

In this chapter, we suggest a method, based on a feed-forward neural network classifier, that recognizes diverse hand gestures effectively. If there are many overlaps between gesture classes, recognition performance declines [4]. Therefore, to balance the data distribution, we extract the distribution of each gesture with a simple calculation and compose it as an extended node.
The extended node is inserted next to the previous input nodes, which contain only the basic gesture trajectory data, as shown in Figure 1. In this experiment, the extended nodes carry the approximate classification result, obtained from the distribution feature of the gesture trajectory, and the extrema of the trajectory.

First, we compose a training dataset and a test dataset of spotted hand gestures for the approximate classification. We compute the distribution feature of each gesture trajectory in the training dataset. Here, we use the average point and the end point of the trajectory as the distribution feature. Using a statistical model, we calculate for each gesture class a representative value, expressible in coordinates, and the covariance of the computed distribution features. The representative value is A_k and the covariance is S_k, where k is the gesture class. These statistics serve as the standard gesture point for approximate classification.

For a hand gesture that consists only of gesture trajectory data, we extract its distribution feature B with the same method that was used to compute the distribution features of the training dataset. For the approximate classification, we calculate the Mahalanobis distance between A_k and B for each class k. The closer a gesture is to a class, the shorter its Mahalanobis distance to that class, so we compose N as an extended node, where N is the index of the nearest class k. Another extended node is composed from the trajectory extrema.
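The approximate classification above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: trajectories are taken to be lists of 2-D (x, y) points, the distribution feature is assumed to be the average point concatenated with the end point, S_k is the sample covariance with a small ridge term added so that it stays invertible, and all function names are hypothetical.

```python
import math

def distribution_feature(trajectory):
    """Average point and end point of a 2-D trajectory (assumed feature layout)."""
    n = len(trajectory)
    mean_x = sum(p[0] for p in trajectory) / n
    mean_y = sum(p[1] for p in trajectory) / n
    end_x, end_y = trajectory[-1]
    return [mean_x, mean_y, end_x, end_y]

def class_statistics(features, ridge=1e-6):
    """Representative value A_k (mean) and covariance S_k of one class.
    The ridge term is an assumption that keeps S_k invertible for small samples."""
    n, d = len(features), len(features[0])
    mean = [sum(f[i] for f in features) / n for i in range(d)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in features) / max(n - 1, 1)
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += ridge
    return mean, cov

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, d):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (aug[i][d] - sum(aug[i][j] * x[j] for j in range(i + 1, d))) / aug[i][i]
    return x

def mahalanobis(b, mean, cov):
    """sqrt((B - A_k)^T S_k^{-1} (B - A_k))."""
    diff = [bi - mi for bi, mi in zip(b, mean)]
    squared = sum(di * si for di, si in zip(diff, solve(cov, diff)))
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative rounding

def nearest_class(b, stats):
    """Index N of the class whose (A_k, S_k) gives the smallest distance to B."""
    return min(range(len(stats)), key=lambda k: mahalanobis(b, *stats[k]))
```

With per-class statistics built from the training features, `nearest_class(distribution_feature(trajectory), stats)` would give the index N that is appended to the input layer as an extended node.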

If a gesture contains a lot of waving, it may intrude on another class, which can lower recognition performance. The number of extrema of the gesture trajectory gives, with a simple calculation, a hint of how much waving the gesture contains. Unnecessary extrema can be detected from fine hand shake, even when there is little waving. To reduce this problem, a Gaussian filter kernel is applied to smooth the hand trajectory extracted from a gesture. We use the first derivative of the smoothed trajectory to count the local extrema, and compose the count as an extended node. If the data distributions of the gesture classes overlap, recognition performance falls off [4]. However, a simple calculation can balance the data by composing extended nodes from the approximate classification result and the number of extrema.

Fig. 1. Extended nodes are added to the input nodes.

4 Experiments

In the following experiment, of the 20 joints provided by Kinect, only the wrist information was used to recognize hand motion in a functional game for bicycle hand signals. The left and right hands were studied separately with Kinect, and 5 hand gestures were made while riding a bike. In total, 500 learning samples were collected from 5 people, with 20 input samples per hand for each gesture, which makes 100 input samples per person. In total, 150 test samples used for recognition were collected from 3 of the people who provided the learning data, with 10 input samples per gesture, which makes 50 test samples per person. The input layer is composed of 40 hand trajectory points and 3 bias nodes, the hidden layer is composed of 26 nodes, and there are 5 targets.

Table 1 presents the performance analysis of FNN, FHNN, and DTW. The test dataset is composed of data from the same experiment as the learning dataset. 1NN-DTW is dynamic time warping combined with one-nearest-neighbor classification. KNN-DTW is dynamic time warping combined with k-nearest-neighbor classification. FNN is the feed-forward neural network, the most popular type of neural network. FHNN is the feature based hybrid neural network suggested in this paper, which composes features extracted from the gesture distribution as bias nodes and adds them to the input layer for learning. In Table 1, the input data S is the extracted hand gesture trajectory data, and E is the extended node extracted from the gesture distribution.

FHNN is more accurate than FNN and DTW. 1NN-DTW is fast but less accurate; KNN-DTW is more accurate than 1NN-DTW but slow. Although FHNN takes 1.503 sec to learn, once trained it classifies the test dataset quickly, so the suggested method is suitable for real-time classification. The recognition rate of FNN is 95.4% with 31 weight updates. The recognition rate of FHNN is 97.6% with 18 weight updates, which shows a higher recognition rate with fewer updates.

Table 1. Accuracy of hand gesture recognition.

Technique   Input data   Accuracy (%)
1NN-DTW     S            86.66
KNN-DTW     S            92.00
FNN         S            87.33
FHNN        S, E         98.00

5 Conclusions

In this paper, adding the extracted extended features to the input layer made FHNN more accurate at classifying various gestures and also faster at recognition, which suits live classification. Not only simple gestures but also waving gestures can be recognized; however, complex dynamic gestures that lean along the z-axis will need more experiments. In future research, we plan to control home appliances or apply the method to serious games with more gestures, to make daily life more convenient.

References

1. Aggarwal, J. K., & Park, S.: Human motion: Modeling and recognition of actions and interactions. In 3D Data Processing, Visualization and Transmission, 2004. 3DPVT 2004. Proceedings. 2nd International Symposium on, pp. 640-647. IEEE (2004)
2. Roomi, S. M. M., Priya, R. J., & Jayalakshmi, H.: Hand gesture recognition for human-computer interaction. Journal of Computer Science, 6(9), 1002-1007 (2010)
3. Xia, L., Chen, C. C., & Aggarwal, J. K.: Human detection using depth information by Kinect. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 15-22. IEEE (2011)
4. Japkowicz, N.: Learning from imbalanced data sets: a comparison of various strategies. In AAAI workshop on learning from imbalanced data sets, Vol. 68, pp. 10-15 (2000)
5. Bishop, C. M.: Neural networks for pattern recognition. Oxford University Press (1995)
6. Tu, J. V.: Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. Journal of Clinical Epidemiology, 49(11), 1225-1231 (1996)
7. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R.: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958 (2014)
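As a supplementary sketch of the extrema node described in Section 3, the following Python code smooths one coordinate of a trajectory with a Gaussian kernel and counts sign changes of its first difference. It is an illustration under stated assumptions rather than the authors' code: the paper does not give the kernel width, the boundary handling, or the coordinate used, so `sigma`, the edge clamping, the `eps` threshold, and all function names here are hypothetical choices.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(values, sigma=2.0):
    """Gaussian smoothing; samples past either end repeat the edge value."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index to the valid range
            acc += w * values[j]
        smoothed.append(acc)
    return smoothed

def count_extrema(values, sigma=2.0, eps=1e-9):
    """Count local extrema as sign changes of the smoothed first difference."""
    smoothed = smooth(values, sigma)
    deriv = [b - a for a, b in zip(smoothed, smoothed[1:])]
    # Ignore near-zero slopes so flat stretches do not register as extrema.
    signs = [1 if d > 0 else -1 for d in deriv if abs(d) > eps]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)
```

For a 40-point trajectory, calling `count_extrema` on one coordinate would yield a single integer that could be appended to the input layer; a larger `sigma` suppresses more of the fine hand shake at the cost of flattening genuine small waves.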