pp.124-128 http://dx.doi.org/10.14257/astl.2016.129.25

Feature Based Hybrid Neural Network for Hand Gesture Recognition

HyeYeon Cho¹, Hyo-Rim Choi¹ and Taeyong Kim¹

¹ Dept. of Advanced Imaging Science, Chung-Ang University, Heukseok-dong, Dongjak-gu, Seoul, 156-756, Korea
nuage1009@gmail.com, funappear@nate.com, kimty@cau.ac.kr

Abstract. This paper presents a neural-network method for classifying hand gestures effectively in home appliances or serious games. Neural learning from imbalanced data is difficult, so we present the Feature based Hybrid Neural Network (FHNN), in which a simple calculation extracts data-distribution features that are added to the input layer for neural learning. As data-distribution features, we use particular points of each gesture to obtain an approximate classification, together with the extrema of the gesture trajectory. The experimental results show that FHNN outperforms the compared methods.

Keywords: Neural network, Kinect, Gesture recognition, HCI

1 Introduction

Human-Computer Interaction (HCI) research has recently advanced rapidly [1]. Among the many interaction techniques, camera-based gesture recognition is widely used because it offers a natural way to interact [2]. Earlier work used 2D images, which are not robust to environmental changes; this has brought research on depth cameras into the spotlight. The Microsoft Kinect sensor provides joint orientation information for tracked skeletons, which makes it easy to use in gesture recognition research [3]. However, previous research showed that as the database grows, computation slows down and complex statistical methods become necessary. Gestures are used daily in games, medical systems, human-robot interaction, hand signals, sign language, and so on, and these gestures produce widely varying data distributions in each class [4].
To make hand gesture recognition effective for such widespread use, we present a method based on simple calculations with which users can easily build a feed-forward neural network trained with the back-propagation learning algorithm.

1 This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2015R1D1A1A01058394) and by Chung-Ang University's Cross Functional Team (CFT) Program under the Brain Korea 21 PLUS Project in 2015.

ISSN: 2287-1233 ASTL Copyright 2016 SERSC
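The kind of feed-forward network with back-propagation learning that the method builds on can be sketched in NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. We chose one sigmoid hidden layer, a squared-error loss, full-batch gradient descent, and the learning rate and initialization ourselves. The class name `FeedForwardNet` is hypothetical.

Following the description in Section 2, `fit` keeps updating the weights until every training output matches its target class. Section 4 describes 40 trajectory inputs, 3 bias nodes, 26 hidden units and 5 targets, which would correspond to `FeedForwardNet(43, 26, 5)` if each of those is one input value. The toy data below is synthetic.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class FeedForwardNet:
    """One-hidden-layer feed-forward network trained with back-propagation (sketch)."""

    def __init__(self, n_in, n_hidden, n_out, lr=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights and zero biases (an assumed initialization).
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)       # hidden-layer activations
        self.y = sigmoid(self.h @ self.W2 + self.b2)  # output-layer activations
        return self.y

    def backprop_step(self, X, T):
        """One full-batch gradient-descent update on the squared error."""
        Y = self.forward(X)
        d_out = (Y - T) * Y * (1.0 - Y)                        # output deltas
        d_hid = (d_out @ self.W2.T) * self.h * (1.0 - self.h)  # deltas sent back through W2
        n = X.shape[0]
        self.W2 -= self.lr * self.h.T @ d_out / n
        self.b2 -= self.lr * d_out.mean(axis=0)
        self.W1 -= self.lr * X.T @ d_hid / n
        self.b1 -= self.lr * d_hid.mean(axis=0)

    def predict(self, X):
        return np.argmax(self.forward(X), axis=1)

    def fit(self, X, labels, max_epochs=5000):
        """Update weights until every training output matches its target; return epochs used."""
        T = np.eye(self.W2.shape[1])[labels]  # one-hot targets
        for epoch in range(1, max_epochs + 1):
            self.backprop_step(X, T)
            if np.array_equal(self.predict(X), labels):
                return epoch
        return max_epochs


# Toy usage: three well-separated synthetic classes in 4-D.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 20)
X = 3.0 * np.eye(3, 4)[labels] + rng.normal(0.0, 0.3, (60, 4))
net = FeedForwardNet(n_in=4, n_hidden=8, n_out=3)
epochs = net.fit(X, labels)
print(epochs, (net.predict(X) == labels).mean())
```

The epoch count returned by `fit` plays the role of the "number of weight updates" that Section 4 compares between FNN and FHNN.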
2 Hand gesture recognition and neural networks

Hand gesture recognition has been studied with a variety of methods, including Hidden Markov Models (HMM), Dynamic Time Warping (DTW), Support Vector Machines (SVM), and Neural Networks (NN). HMM has a rich mathematical structure, and its statistical model of sequence data suits many application fields such as speech and gesture recognition. However, it is inconvenient in that multi-dimensional data must be discretized into one-dimensional data. DTW measures the similarity of two sequences that differ in timing and speed, but its computational cost grows as the amount of data increases. SVM is a supervised learning model used for gesture recognition, but its speed and model size are limiting in both the training and testing phases. Neural networks are typically organized in layers, and an NN is trained by updating the network until its outputs match the targets [5]. Once the network is trained, it can be used to recognize and classify the test dataset. Its advantages are that it requires less training and can perform non-linear classification [6] and [9]. NNs suffer from overfitting and from imbalanced data, but much research has addressed these problems [4] and [7]. An NN is a non-parametric model that is easier and faster to adopt than the other models. We propose a hybrid neural network that uses pre-extracted features as input nodes, so that many diverse gestures can be recognized effectively and quickly.

3 Extracting hand gesture trajectories and calculating extended nodes

In this section, we propose a method, based on a feed-forward neural network classifier, that recognizes diverse hand gestures effectively. If gesture classes overlap heavily, recognition performance declines [4]. Therefore, to balance the data distribution, we extract the distribution of each gesture with a simple calculation and compose it into an extended node.
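Appending extended nodes to the trajectory input, as in Fig. 1, amounts to concatenating two vectors. The sketch below rests on several assumptions that are ours, not the paper's:

- Each wrist trajectory is 2-D.
- Trajectories are resampled by linear interpolation to 20 points, giving 40 input values. The paper does not say how trajectories of different lengths are mapped to a fixed input size, and its "40 hand trajectory points" could also mean 40 points.
- The helper names `resample` and `fhnn_input` are hypothetical.
- The extended-node values in the example are made up.

```python
import numpy as np


def resample(traj, n_points):
    """Linearly resample a (T, D) trajectory to (n_points, D) over normalized time."""
    traj = np.asarray(traj, dtype=float)
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_points)
    return np.column_stack(
        [np.interp(t_new, t_old, traj[:, d]) for d in range(traj.shape[1])]
    )


def fhnn_input(traj, extended_nodes, n_points=20):
    """Basic input nodes (flattened trajectory) followed by the extended nodes."""
    basic = resample(traj, n_points).ravel()  # x0, y0, x1, y1, ...
    extended = np.asarray(extended_nodes, dtype=float)
    return np.concatenate([basic, extended])


# Hypothetical example: a 35-frame wrist trajectory plus two extended nodes
# (nearest-class index N = 2, number of extrema = 4).
t = np.linspace(0.0, 1.0, 35)
traj = np.column_stack([t, np.sin(2 * np.pi * t)])
x = fhnn_input(traj, [2, 4])
print(x.shape)  # (42,)
```

The network then sees the trajectory values and the extended nodes as one input layer, so the only change from a plain FNN is the input width.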
The extended nodes are added to the original input nodes, which contain only the basic gesture trajectory data, as shown in Figure 1. In this experiment, the extended nodes hold an approximate classification result, obtained from the distribution feature of the gesture trajectory, and the number of extrema of the trajectory. First, for the approximate classification, we compose a training dataset and a test dataset of spotted hand gestures. The distribution feature of each gesture trajectory in the training dataset is then computed; here, we use the average point and the end point of the trajectory as the distribution feature. For each gesture class, a statistical model of these distribution features gives a representative value, expressible as coordinates, and a covariance. We denote the representative value by A_k and the covariance by S_k, where k is the gesture class. These statistics serve as the standard gesture points for approximate classification. For a hand gesture composed only of trajectory data, the distribution feature B is extracted in the same way as for the training dataset. In the approximate classification, the Mahalanobis distance between B and A_k is calculated for each class k:

D_k(B) = √((B − A_k)ᵀ S_k⁻¹ (B − A_k)).

The more closely a gesture belongs to a class, the shorter its Mahalanobis distance to that class. We therefore compose N, the index of the nearest class (N = argmin_k D_k(B)), as an extended node. Another extended node is composed from the trajectory extrema.
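The two extended nodes can be sketched in NumPy.

- **Approximate classification.** The distribution feature is the average point concatenated with the end point of the trajectory. A_k and S_k are the per-class mean and covariance, and N is the class with the smallest Mahalanobis distance.
- **Extrema node.** We follow the smoothing-then-first-derivative procedure the paper describes for waving gestures, counting sign changes of the smoothed derivative.

The following assumptions are ours, not the paper's:

- a small ridge is added to S_k so that it stays invertible;
- the Gaussian width is 2 samples, with edge padding;
- extrema are counted per coordinate and summed;
- all function names are hypothetical.

```python
import numpy as np


def distribution_feature(traj):
    """Average point and end point of a (T, D) trajectory, concatenated."""
    traj = np.asarray(traj, dtype=float)
    return np.concatenate([traj.mean(axis=0), traj[-1]])


def class_statistics(features, labels, ridge=1e-6):
    """Map each class k to (A_k, inverse of S_k) computed from training features."""
    stats = {}
    for k in np.unique(labels):
        F = features[labels == k]
        S_k = np.cov(F, rowvar=False) + ridge * np.eye(F.shape[1])  # ridge: our assumption
        stats[int(k)] = (F.mean(axis=0), np.linalg.inv(S_k))
    return stats


def nearest_class(B, stats):
    """Extended node N: the class k minimizing the Mahalanobis distance D_k(B)."""
    def distance(k):
        A_k, S_inv = stats[k]
        diff = B - A_k
        return float(np.sqrt(diff @ S_inv @ diff))
    return min(stats, key=distance)


def gaussian_smooth(signal, sigma=2.0):
    """Smooth a 1-D signal with a normalized Gaussian kernel (edge-padded)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(signal, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")  # same length as signal


def count_extrema(traj, sigma=2.0):
    """Extended node: local extrema of the smoothed trajectory, summed over coordinates."""
    traj = np.asarray(traj, dtype=float)
    total = 0
    for d in range(traj.shape[1]):
        slope = np.sign(np.diff(gaussian_smooth(traj[:, d], sigma)))  # first derivative sign
        slope = slope[slope != 0]  # ignore flat steps
        total += int(np.count_nonzero(slope[1:] != slope[:-1]))
    return total


# Toy usage with three synthetic straight-line gestures (not the paper's data).
rng = np.random.default_rng(0)
ends = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
s = np.linspace(0.0, 1.0, 30)[:, None]


def make_gesture(k):
    return s * ends[k] + rng.normal(0.0, 0.05, (30, 2))


labels = np.repeat(np.arange(3), 20)
features = np.array([distribution_feature(make_gesture(k)) for k in labels])
stats = class_statistics(features, labels)
N = nearest_class(distribution_feature(make_gesture(1)), stats)

# A waving x-coordinate (two full periods) and a monotone y-coordinate.
t = np.linspace(0.0, 4 * np.pi, 200)
wave = np.column_stack([np.sin(t), np.linspace(0.0, 1.0, 200)])
print(N, count_extrema(wave))  # expected: 1 4
```

Smoothing before differentiating is what keeps a fine hand shake from inflating the count. The Gaussian width trades off suppressing jitter against merging genuine waves.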
If a gesture contains a lot of waving, it may intrude into the region of another class, which degrades recognition performance. The number of extrema of the gesture trajectory, obtained with a simple calculation, gives a hint of how much the gesture waves. However, a fine hand shake can produce unnecessary extrema even when there is little waving. To reduce this problem, the hand trajectory extracted from a gesture is smoothed with a Gaussian filter kernel. We then use the first derivative of the smoothed trajectory to count the local extrema and compose the count as an extended node. If the data distributions of the gesture classes overlap, recognition performance falls off [4]. However, composing extended nodes from the approximate classification result and the number of extrema balances the data with only simple calculations.

Fig. 1. Extended nodes are added to the input nodes.

4 Experiments

In the following experiment, a functional game for bicycle hand signals was played. Of the 20 joints tracked by Kinect, only the wrist joint was used to recognize hand motion. The left and right hands were studied separately, and 5 hand gestures used while riding a bicycle were performed. A total of 500 training samples were collected from 5 men, with 20 samples per gesture for a hand, which gives 100 samples per person. A total of 150 test samples were collected from 3 of the people who provided the training data, with 10 samples per gesture, which gives 50 test samples per person. The input layer consists of 40 hand trajectory points and 3 bias nodes, the hidden layer has 26 nodes, and the output layer has 5 target nodes. Table 1 shows the results of the performance analysis of FNN, FHNN, and DTW. The test dataset is composed of data from the same experiment as the training dataset.

1NN-DTW is dynamic time warping combined with a one-nearest-neighbor classifier, and KNN-DTW is dynamic time warping combined with a k-nearest-neighbor classifier. FNN is a feed-forward neural network, the most popular type of neural network. FHNN is the feature based hybrid neural network proposed in this paper; it composes the features extracted from the gesture distribution into bias nodes and adds them to the input layer for learning. In the Input data column of Table 1, S denotes the extracted hand gesture trajectory data and E denotes the extended nodes extracted from the gesture distribution.

FHNN is more accurate than FNN and DTW. 1NN-DTW is fast but less accurate, while KNN-DTW is more accurate than 1NN-DTW but slow. Although FHNN takes 1.503 s to learn, once trained it classifies the test dataset quickly, so the proposed method is suitable for real-time classification. The recognition rate of FNN is 95.4% with 31 weight updates, whereas FHNN reaches 97.6% with 18 weight updates, showing a higher recognition rate with fewer updates.

Table 1. Accuracy of hand gesture recognition.

Technique   Input data   Accuracy (%)
1NN-DTW     S            86.66
KNN-DTW     S            92.00
FNN         S            87.33
FHNN        S, E         98.00

5 Conclusions

In this paper, adding the extracted extended features to the input layer made FHNN more accurate in classifying various gestures and gave it a fast recognition speed suited to live classification. Not only simple gestures but also waving gestures can be recognized; however, complex dynamic gestures that lean along the z-axis need more experiments. In future work, we plan to apply the method to controlling home appliances and to serious games with more gestures, to make daily life more convenient.

References

1. Aggarwal, J. K., & Park, S.: Human motion: Modeling and recognition of actions and interactions. In 3D Data Processing, Visualization and Transmission, 2004. 3DPVT 2004. Proceedings. 2nd International Symposium on, pp. 640--647. IEEE (2004)
2. Roomi, S. M. M., Priya, R. J., & Jayalakshmi, H.: Hand gesture recognition for human-computer interaction. Journal of Computer Science, 6(9), 1002--1007 (2010)
3. Xia, L., Chen, C. C., & Aggarwal, J. K.: Human detection using depth information by Kinect. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 15--22. IEEE (2011)
4. Japkowicz, N.: Learning from imbalanced data sets: a comparison of various strategies. In AAAI Workshop on Learning from Imbalanced Data Sets, Vol. 68, pp. 10--15 (2000)
5. Bishop, C. M.: Neural networks for pattern recognition. Oxford University Press (1995)
6. Tu, J. V.: Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. Journal of Clinical Epidemiology, 49(11), 1225--1231 (1996)
7. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R.: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929--1958 (2014)