Journal of Applied Research and Technology ISSN: Centro de Ciencias Aplicadas y Desarrollo Tecnológico.


Journal of Applied Research and Technology
ISSN: 1665-6423
jart@aleph.cinstrum.unam.mx
Centro de Ciencias Aplicadas y Desarrollo Tecnológico, México

Avilés-Arriaga, H.H.; Sucar-Succar, L.E.; Mendoza-Durán, C.E.; Pineda-Cortés, L.A. A Comparison of Dynamic Naive Bayesian Classifiers and Hidden Markov Models for Gesture Recognition. Journal of Applied Research and Technology, vol. 9, no. 1, April 2011, pp. 81-102. Centro de Ciencias Aplicadas y Desarrollo Tecnológico, Distrito Federal, México.

Available in: http://www.redalyc.org/articulo.oa?id=47419311007

A Comparison of Dynamic Naive Bayesian Classifiers and Hidden Markov Models for Gesture Recognition

H.H. Avilés-Arriaga*1, L.E. Sucar-Succar2, C.E. Mendoza-Durán3, L.A. Pineda-Cortés4

1,4 Department of Computer Science, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Circuito Escolar, Ciudad Universitaria, 04510 Mexico City, Mexico
*haviles@live.com
2 Computer Science Department, Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro 1, 72840 Tonantzintla, Mexico
3 Universidad Anáhuac (México Norte), Av. Universidad Anáhuac, Núm. 46, Col. Lomas Anáhuac, 52786 Huixquilucan, Mexico

ABSTRACT

In this paper we present a study to assess the performance of dynamic naive Bayesian classifiers (DNBCs) versus standard hidden Markov models (HMMs) for gesture recognition. DNBCs incorporate explicit conditional independence among gesture features given states into HMMs. We show that this factorization offers competitive classification rates and error dispersion, requires fewer parameters, and improves training time considerably in the presence of several attributes. We propose a set of qualitative and natural posture and motion attributes to describe gestures. We show that these posture-motion features increase recognition rates significantly in comparison to motion features alone. Additionally, an adaptive skin detection approach to cope with multiple users and different lighting conditions is proposed. We performed one of the most extensive experimental evaluations presented in the literature to date, which considers gestures of a single user, of multiple people, and with variations in distance and rotation, using a gesture database with 9441 examples of 9 different classes performed by 15 people. Results show the effectiveness of the overall approach and the reliability of DNBCs in gesture recognition.

Keywords: Gesture recognition, hidden Markov models, motion analysis, visual tracking.
RESUMEN

This paper compares the performance of dynamic naive Bayesian classifiers (DNBCs) and hidden Markov models (HMMs) in visual gesture recognition. DNBCs extend HMMs by incorporating conditional independence assumptions among the attributes given the state of the model. This factorization offers competitive classification rates and error dispersion, a smaller number of model parameters, and a considerable improvement in training time. To describe the gestures, a set of simple posture and motion attributes is proposed that increases the recognition rate in comparison to models that use motion information only. In addition, an adaptive skin color detection scheme is proposed to handle different users and lighting conditions. One of the most exhaustive sets of experiments presented in the gesture recognition literature to date is described, including gestures of a single user, of different people, and with variations in distance and rotation. A database with 9441 examples of 9 gestures performed by 15 people is also presented. The results show the effectiveness of this approach and the reliability of DNBCs in gesture recognition.

1. Introduction

Hidden Markov models are successful and widely used classifiers in gesture recognition [1,2,3,4]. In the presence of several attributes, however, the observation probability functions of HMMs imply conditional dependence among attributes given the state. This makes it difficult to visualize the independence relationships of the attributes and their statistical behavior. Clarity in knowledge description is essential to a better understanding of gesture execution and recognition processes. Naive Bayesian classifiers (NBCs) strongly relax the assumption of conditional dependence. This factorization improves clarity in the description of the attributes and decreases the number of parameters to be estimated. Moreover, NBCs are competitive with other more complex probabilistic and non-probabilistic classifiers, even when conditional independence does not hold [5,6]. However, in contrast with HMMs, NBCs do not naturally cope with sequential data. For these reasons, a model that combines the advantages of HMMs and NBCs in gesture recognition is desirable.

In this paper, we present an extensive empirical comparison between dynamic naive Bayesian classifiers [7] and HMMs for gesture recognition. DNBCs incorporate the concept of conditional independence among attributes given the state into standard HMMs, to combine the descriptive capabilities of NBCs with the capacity of HMMs to model data streams. These models have received diverse names and have been used in different problem domains in the past. Examples include multidimensional HMMs (MDHMMs) [8], to mix various sources of information in modeling teleoperation tasks for a robot manipulator; hybrid Naive Bayes HMMs (HNBHMMs) [9], to merge individual word information to classify multi-page documents; Output HMMs (OHMMs) [10], to combine experts' opinions in gene classification; and multiobservation HMMs (MOHMMs) [11], for fusing behavioral patterns in the detection of abnormal actions in scenes. More recently, similar ideas have been applied successfully in activity recognition [12,13]. In all these works, factorization is proposed for mixing various sources of information. However, this is implicitly done in common HMM applications in which, as stated above, each observation is defined by the conjunction of each feature value.
By contrast, in our work we emphasize the importance of DNBCs to decrease the number of parameters of the model, to improve training time and clarity in the representation of the attributes, and to allow structural learning and feature selection [14,15]. Additionally, no methodical and systematic experimental evidence on the performance of this extension in comparison to standard HMMs in gesture recognition has been presented in the literature.

We propose to describe gestures in terms of a set of qualitative and fairly simple discrete motion and posture features. We show that posture and motion attributes increase recognition rates in comparison to models with motion features only, with gestures a) taken from a single person, b) taken from multiple people, and c) with variations in distance and rotation. In addition, we describe a monocular visual system with a simple adaptive skin color strategy to cope with different users and lighting conditions. This visual system was used to construct the gesture database used in our experiments, which comprises 9441 gesture samples of 9 gesture classes executed by 15 people. This is one of the most extensive sets of experiments to compare probabilistic graphical models in gesture recognition, with one of the highest numbers of gesture samples documented in the literature to date. Our results demonstrate the competitiveness of DNBCs in comparison to standard HMMs to learn, represent and classify gestures, and the effectiveness of the overall approach in the gesture recognition problems described above.

Early results were presented in [16]. Here, we elaborate new experiments and results along with several improvements and corrections to our methodology. The main contributions of this paper are 1) DNBCs that provide competitive recognition results and efficient learning, 2) a set of simple and natural posture and motion features to effectively describe gestures, 3) the improvement obtained by using posture-motion features to describe gestures, and 4) a more complete set of experiments than previous work presented in the literature to date.
1.1 Outline

This document is organized as follows: Section 2 reviews various extensions to HMMs and different alternatives for the selection of gesture features. In Section 3, we describe DNBCs. The adaptive strategy of our visual system is presented in Section 4. Sections 5 and 6 describe our gesture database and the posture and motion features, respectively. Experiments and results for the validation of DNBCs and their comparison to HMMs, and a brief discussion, are described in Section 7. Finally, Section 8 summarizes our conclusions.

2. Related work

2.1 HMMs Classifiers for gesture recognition

HMMs describe statistical properties of dynamic gestures, with well-known probability estimation algorithms for learning and recognition [17] (see Fig. 1a for a Bayesian network description of these models; shaded nodes denote hidden variables). Several extensions to standard HMMs have been proposed to deal with particular issues in gesture recognition.

Parametric HMMs (PHMMs) [18] represent gestures that involve spatial variations in their execution, e.g., "This length" or "Go there". In PHMMs, observation variables are conditioned on the state variable and on one or more parameters that account for such variations (Fig. 1b). Parameter values are known and constant during training. During testing, the values that maximize the likelihood of the PHMM are recovered via a tailored EM algorithm.

Coupled HMMs (CHMMs) [19] join HMMs by introducing conditional dependencies between state variables (see Fig. 1c). These models are suitable to represent influences between subprocesses that occur in parallel, e.g., two-hand gestures.

Input-Output HMMs (IOHMMs) [20] consider an extra "input" parameter that affects the states of the Markov chain and, optionally, the observation variables (Fig. 1d). The input variable corresponds to the gesture observations. The output signal of IOHMMs is the class of the gesture that is being executed. A single IOHMM can describe a complete set of gesture classes.

Parallel HMMs (PaHMMs) [21] require fewer HMMs than CHMMs for composite processes, by assuming mutual independence between HMMs (Fig. 1e). The idea is to construct independent HMMs for the possible motions of each hand and combine them by multiplying their individual likelihoods. The PaHMMs with the most probable joint likelihood define the desired class.

Hierarchical hidden Markov models (HHMMs) [22] arrange HMMs into layers at different levels of abstraction (Fig. 1f). In a two-layer HHMM, the lower layer is a set of HMMs that represents sub-gesture sequences. The upper layer is a Markov chain that governs the dynamics of these sub-gestures.
Layering allows re-using the basic HMMs simply by changing upper layers.

Mixed-state dynamic Bayesian networks (MSDBNs) [23] combine discrete and continuous state spaces into a two-layer structure. MSDBNs are composed of an HMM in the upper layer and a linear dynamic system (LDS) in the lower layer. The LDS is used to model transitions between real-valued states. Output values of the HMM drive the linear system (Fig. 1g). In MSDBNs, HMMs can describe discrete high-level concepts, such as a gesture grammar, while the LDS describes the motion of the hand in a continuous-state space.

Hidden semi-Markov models (HSMMs) [24] exploit temporal knowledge of the process by defining explicit durations on each state (Fig. 1h). HSMMs are suitable to avoid an exponential decay of the state probabilities when modeling long observation sequences. More recently, derivations of HMMs that incorporate some of the characteristics presented above have been proposed as well [25,26].

Partially observable Markov decision processes (POMDPs) [27] generalize HMMs by including action and reward functions (Fig. 1i). The POMDP framework is usually used to quantify the "convenience" of the states of a system although its real situation is not completely known, and hence, to plan actions to reach a goal state. In [28], POMDPs are focused on actions to infer: i) the reaction to be taken in response to a gestural stimulus, ii) the cause that generates a gesture, or iii) decisions to maximize the return in a cooperative game between two players using gesture communication.

HMM-based architectures have been successfully applied to challenging problems faced by novel applications of gesture recognition. In general, these approaches incorporate new variables to represent specific concepts into the standard HMM framework, or factor the state space into various Markov chains to simplify its representation.
Despite the usefulness of this framework, little attention has been paid to other important aspects of the problem, such as factorization to reduce the number of parameters of gesture features and to improve their description, and the evaluation of this extension in gesture recognition.

Figure 1. Bayesian network representation of (a) standard HMMs with state and observation variables $S$ and $O$, respectively, (b) PHMMs with a single parameter $\theta$, (c) 2-Coupled HMMs, (d) IOHMMs with the input parameter $I$, (e) PaHMMs with two independent HMMs, (f) HHMMs with the Markov chain transitions in the upper layer denoted by $U_t$ and $U_{t+1}$, (g) MSDBNs with an HMM in the upper layer and an LDS in the lower layer indicated by $L_t$ and $L_{t+1}$; in this case, $Y_t$ and $Y_{t+1}$ correspond to observations obtained from the process, (h) HSMMs with duration variables $D_t$ and $D_{t+1}$, (i) POMDPs with an action function $A$ and a reward function $R$. Shaded nodes indicate hidden states. Dashed arrows indicate optional dependencies. Models are unrolled two times only when required.

2.2 Gesture features

The selection of accurate and general gesture features is one of the most pursued goals in gesture recognition [29,30,31,32,33,34]. In practice, features are selected according to the characteristics of the gestures and the application domain. Roughly speaking, alternatives to describe gestures can be divided into a) motion features, b) posture attributes, and c) posture-motion features.

In the early 70s, Johansson [35] showed that isolated visible points over the joints of human actors in motion are enough to infer postures and activities. He named this visual phenomenon biological motion. After his findings, many authors have focused on features that emphasize motion signals as the core information to describe gestures [36,37,38]. One representative example is temporal templates [39]. This technique, inspired by stroboscopic photography, collapses "motion appearance" into a single image, without regarding posture information to classify activities.

Another alternative is to see gestures as sequences of body postures [40] or global image coordinates, e.g., "raw" (x,y) data. In accordance with this scheme, some neurobiological experiments [41] have shown that motion information may be inferred from form stimuli, more than form from motion, as was suggested by Johansson's work. These results have generated a lively research field in feature selection from the neurobiological point of view [42,43,44].

Stokoe [45] suggests that gestures are characterized by motion, posture, orientation and position. Along these lines, some approaches have described gestures with a hybrid set of posture-motion features [46,47,48]. However, most of these proposals focus on the architectural design of the classifiers, without regard to the discriminatory power of the features. This is usual in gesture recognition, where features are evaluated in conjunction with classifiers as a whole.

Only a few tests for comparison of posture and motion features have been reported in the literature. Campbell et al. [49] conducted experiments to test ten different feature sets based on posture (e.g., raw data, or polar coordinates) and motion information (i.e., Cartesian and polar velocities, instantaneous speed and local curvature) using HMMs. Their results showed that velocity-based features obtained better recognition rates than posture data. Vogler and Metaxas [50] presented a comparison of ten feature sets of 2D and 3D posture and motion attributes for the classification of ASL. The attributes are similar to those used by [49] and include (x,y,z) data, polar and spherical coordinates of the hands, and their derivatives. However, in contrast to Campbell's work, their results showed that posture attributes slightly outperformed velocity attributes.

Recently, coincidentally in time with our evaluation of the combination of posture and motion features [16], Ahmad et al. [51] mixed 2D optical flow with a description of the human body shape based on principal component analysis for activity recognition. Their results were similar to our findings. These authors showed that the combination of posture and motion information improves recognition rates in activity recognition in comparison to models that consider posture or motion features only. However, it is difficult to draw strong conclusions from their results due to the small number of gesture classes and examples. Because of this, more extensive and conclusive experiments showing the importance of the combination of posture and motion data on different gesture recognition problems, and how these attributes can be represented, are still required.
The approach presented in this document is a contribution to solve these problems.

3. Dynamic naive Bayesian classifiers

In order to describe dynamic naive Bayesian classifiers, consider first a sequence $S = \{S_t\}_{t=1,\dots,T}$ that is a realization of the states of the process, where $1 \le S_t \le N$, $N$ being the number of possible states; and a sequence $A = \{A_t\}_{t=1,\dots,T}$, where each $A_t = \{A_t^m\}_{1 \le m \le M}$ is a set of $M$ attribute values generated by the process at state $S_t$. Superscripts $m$ identify a specific attribute, in our case, an individual gesture feature. Each attribute $A_t^m$ can be either discrete or continuous, although in this paper we consider the finite discrete case only; let $a_k^m$, $1 \le k \le K^m$, be the possible values of each attribute $m$. A dynamic naive Bayesian classifier has the joint probability function:

$$P(A, S) = P(S_1) \prod_{t=1}^{T-1} P(S_{t+1} \mid S_t) \prod_{t=1}^{T} \prod_{m=1}^{M} P(A_t^m \mid S_t) \qquad (1)$$

where $P(S_1)$ is the prior probability value of being at state $S_1$ at time $t = 1$, $P(S_{t+1} \mid S_t)$ is the transition probability between classes $S_t$ and $S_{t+1}$, and $P(A_t^m \mid S_t)$ is the probability function of the observed feature $m$ at time $t$ given the class $S_t$. DNBCs follow two main assumptions: i) the first-order Markov property, and ii) the process is stationary. A DNBC is denoted as $\lambda = \{P(S_1), P(S_{t+1} \mid S_t), P(A_t^m \mid S_t)\}$.

The main difference between the DNBC and HMM probability functions [17] is the product $\prod_{m=1}^{M} P(A_t^m \mid S_t)$ that stands for the assumption of conditional independence among attributes given the class; HMMs implicitly assume a joint $P(A_t \mid S_t)$. If only a single attribute is considered, or this attribute is the product of the concatenation of several features, $M = 1$ and (1) reduces to the joint probability function of a standard HMM. Figure 2 shows a DNBC unrolled two times with three attributes.
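As a minimal sketch of Eq. (1), the following Python code evaluates $\log P(A, S)$ for a given state path and compares the size of the observation tables under the joint (HMM) and factored (DNBC) forms. The function names, the nested-list layout of the parameters and the example cardinalities are illustrative assumptions, not taken from the paper.

```python
import math


def dnbc_joint_log_prob(states, attrs, prior, trans, emis):
    """Log of Eq. (1) for one state path and one attribute sequence.

    Assumed (hypothetical) parameter layout:
      states[t]     -> state index at time t, 0-based
      attrs[t][m]   -> value index of attribute m at time t
      prior[i]      -> P(S_1 = i)
      trans[i][j]   -> P(S_{t+1} = j | S_t = i)
      emis[m][i][k] -> P(A^m = k | S = i)
    """
    logp = math.log(prior[states[0]])
    # Transition terms: product over t = 1 .. T-1.
    for prev, cur in zip(states, states[1:]):
        logp += math.log(trans[prev][cur])
    # Factored observation terms: product over t and over the M attributes.
    for s, a_t in zip(states, attrs):
        for m, k in enumerate(a_t):
            logp += math.log(emis[m][s][k])
    return logp


def observation_table_sizes(n_states, cardinalities):
    """Entries in the observation tables: joint P(A_t | S_t) vs. factored."""
    joint = n_states * math.prod(cardinalities)   # one row per value combination
    factored = n_states * sum(cardinalities)      # one small table per attribute
    return joint, factored
```

For instance, with 5 states and four attributes taking 3, 3, 3 and 4 values (hypothetical numbers), the joint table has 5 × 108 = 540 entries while the factored tables have 5 × 13 = 65, which illustrates why the factorization reduces the number of parameters to estimate.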

Figure 2. Graphical representation of a DNBC unrolled 2 times with 3 attributes.

3.1 Parameter learning

As usual in HMM applications, the complete data pair $(A, S)$ is not available and only $A$ is accessible. Maximum likelihood estimation (ML) [17] is a common criterion for the selection of the parameters $\lambda$ that best explain the observed and unseen data. This parameter learning process can be performed for DNBCs by means of the Baum-Welch algorithm [52] to iteratively improve $P(A)$ until no relevant difference in consecutive likelihoods of the model is found. Equations to compute new expectations $\lambda'$ can be derived using Baum's auxiliary function as described in [53]. Following this procedure, the re-estimation formulas are

$$P(S_1 \mid \lambda') = \frac{P(A, S_1 \mid \lambda)}{P(A \mid \lambda)}, \quad 1 \le S_1 \le N, \qquad (2)$$

for prior state probabilities. Transition probabilities are calculated as

$$P(S_{t+1} \mid S_t, \lambda') = \frac{\sum_{t=1}^{T-1} P(A, S_{t+1}, S_t \mid \lambda)}{\sum_{t=1}^{T-1} P(A, S_t \mid \lambda)}, \quad 1 \le S_t, S_{t+1} \le N, \qquad (3)$$

and finally, for each attribute,

$$P(A^m = a_k^m \mid S_t, \lambda') = \frac{\sum_{t=1}^{T} P(A, S_t \mid \lambda)\, \delta_{A_t^m,k}}{\sum_{t=1}^{T} P(A, S_t \mid \lambda)}, \quad 1 \le k \le K^m,\ 1 \le S_t \le N, \qquad (4)$$

where $\delta_{A_t^m,k} = 1$ iff $A_t^m = a_k^m$, and 0 otherwise. The parameters $P(S_{t+1} \mid S_t)$ and $P(a_k^m \mid S_t)$ do not depend on time $t$; hence $P(S_2 = j \mid S_1 = i) = P(S_{t+1} = j \mid S_t = i)$ for $t \in [2, T-1]$ and all $i, j$, and $P(a_k^m \mid S_1 = i) = P(a_k^m \mid S_t = i)$ for $t \in [2, T]$ and all $i$.

The estimation of the previous distributions is based on the well-known forward variable $\alpha_{t,i}$, backward variable $\beta_{t,j}$, and $\xi_{t,i,j}$, the joint probability of moving from state $i$ to state $j$ at time $t$. The computation of these variables must be modified to reflect the fact that $P(A_t \mid S_t, \lambda) = \prod_{m=1}^{M} P(A_t^m \mid S_t, \lambda)$. The forward variable is reformulated as

$$\alpha_{t,i} = P(A_1, \dots, A_t, S_t = i \mid \lambda) = \Big[\sum_{j=1}^{N} \alpha_{t-1,j}\, P(S_t = i \mid S_{t-1} = j, \lambda)\Big] P(A_t \mid S_t = i, \lambda) = \Big[\sum_{j=1}^{N} \alpha_{t-1,j}\, P(S_t = i \mid S_{t-1} = j, \lambda)\Big] \prod_{m=1}^{M} P(A_t^m \mid S_t = i, \lambda) \qquad (5)$$

The backward variable is computed as follows:

$$\beta_{t,j} = P(A_{t+1}, A_{t+2}, \dots, A_T \mid S_t = j, \lambda) = \sum_{i=1}^{N} P(S_{t+1} = i \mid S_t = j)\, P(A_{t+1} \mid S_{t+1} = i)\, \beta_{t+1,i} = \sum_{i=1}^{N} P(S_{t+1} = i \mid S_t = j) \Big[\prod_{m=1}^{M} P(A_{t+1}^m \mid S_{t+1} = i)\Big] \beta_{t+1,i} \qquad (6)$$

and finally, $\xi_{t,i,j}$ is

$$\xi_{t,i,j} = P(S_t = i, S_{t+1} = j \mid A, \lambda) = \frac{\alpha_{t,i}\, P(S_{t+1} = j \mid S_t = i)\, P(A_{t+1} \mid S_{t+1} = j)\, \beta_{t+1,j}}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{t,i}\, P(S_{t+1} = j \mid S_t = i)\, P(A_{t+1} \mid S_{t+1} = j)\, \beta_{t+1,j}} = \frac{\alpha_{t,i}\, P(S_{t+1} = j \mid S_t = i) \prod_{m=1}^{M} P(A_{t+1}^m \mid S_{t+1} = j)\, \beta_{t+1,j}}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{t,i}\, P(S_{t+1} = j \mid S_t = i) \prod_{m=1}^{M} P(A_{t+1}^m \mid S_{t+1} = j)\, \beta_{t+1,j}} \qquad (7)$$

Parameter adjustment must be done for each attribute independently; however, factorization does not considerably affect the number of operations needed to compute the intermediate parameters of the Baum-Welch algorithm. For example, the number of multiplications performed to compute forward variables is $N(N + 1)(T - 1)$ for standard HMMs; for DNBCs, this number increases only to $N(N + M)(T - 1)$. Notwithstanding, as we will show below, attribute factorization substantially reduces the training time required by DNBCs in comparison to HMMs. Scaling and multiple observation sequences can be handled by the method proposed in [54].

3.2 Classification

Classification of a sequence of attribute observations $A$ is as usual. Given a set of $L$ DNBCs $\{\lambda_i\}_{i=1,\dots,L}$, each of them trained with samples of a particular gesture class, compute $P(A \mid \lambda_i) = \sum_{j=1}^{N} P(A, S_T = j \mid \lambda_i) = \sum_{j=1}^{N} \alpha_{T,j}$ for each $\lambda_i$ by using the Forward algorithm. It is assumed that the $\lambda_i$ with the highest probability, i.e., $\arg\max_{\lambda_i} P(A \mid \lambda_i)$, corresponds to the gesture class that has been executed.

4. Visual system

A monocular visual system based on an adaptive skin detection scheme to deal with different users was developed. The system is initiated with a person standing in a rest position, at a distance between 1.5 m and 4 m in front of the video camera. Face detection is performed using the face detector algorithm presented in [55]. The image subregions where the right hand and torso of the person are expected to be found are estimated with body proportions based on face dimensions [56]. Figure 3 shows a result of this procedure.

Figure 3. Estimation of the torso and hand positions of the person.
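The forward recursion of Eq. (5) and the classification rule of Section 3.2 can be sketched in a few lines of Python. This is an illustrative sketch only: the `(prior, trans, emis)` tuple layout, the function names and the dictionary of models are assumptions made for the example, and no scaling is applied, so it is suited to short sequences.

```python
from math import prod


def emission(emis, state, obs):
    """Factored observation term: product over attributes of P(A^m | S)."""
    return prod(emis[m][state][k] for m, k in enumerate(obs))


def forward_likelihood(model, seq):
    """P(A | lambda) via the forward variable of Eq. (5), without scaling.

    model = (prior, trans, emis) with trans[j][i] = P(S_t = i | S_{t-1} = j)
    and emis[m][i][k] = P(A^m = k | S = i); seq[t] is a tuple of M value indices.
    """
    prior, trans, emis = model
    n = len(prior)
    # Initialisation: alpha_{1,i} = P(S_1 = i) * prod_m P(A_1^m | S_1 = i).
    alpha = [prior[i] * emission(emis, i, seq[0]) for i in range(n)]
    for obs in seq[1:]:
        alpha = [
            sum(alpha[j] * trans[j][i] for j in range(n)) * emission(emis, i, obs)
            for i in range(n)
        ]
    return sum(alpha)  # sum over j of alpha_{T,j}


def classify(models, seq):
    """Return the name of the model with the highest P(A | lambda_i)."""
    return max(models, key=lambda name: forward_likelihood(models[name], seq))


def forward_multiplications(n_states, seq_len, n_attrs=1):
    """Multiplication count quoted in the text: N(N + M)(T - 1); M = 1 is the HMM case."""
    return n_states * (n_states + n_attrs) * (seq_len - 1)
```

For sequences much longer than the ones in this database, a scaled or log-space variant of the recursion would be needed to avoid numerical underflow, as the reference to [54] suggests.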
Hand segmentation and tracking proceeds as follows. We constructed a Bayes classifier according to [57] to label pixel colors in the rgb space as skin or non-skin. We sampled 1,975,242 skin pixels taken from 30 people and 19,552,655 non-skin pixels under various lighting conditions to build general skin and non-skin probability functions, $P_g(rgb \mid skin)$ and $P_g(rgb \mid \neg skin)$, respectively. Thirty-two class intervals were defined for each rgb color channel. Once the hand is detected, a small skin-color search window is applied for its tracking. A direct likelihood comparison rule $P_g(rgb \mid skin) > P_g(rgb \mid \neg skin)$ was used to speed up the system. This approach worked well over four years in several demonstrations in our Lab [58]. A video that shows the application of this system for telecontrolling a mobile robot can be found at http://www.youtube.com/watch?v=opauo0zjhgy.

However, a new camera and testing environment with intense white lighting and white walls caused the camera to perceive low-saturated colors that did not correspond to the probability distributions initially created. Because of this, the visual system was unable to locate the hand accurately, even with other color models such as Hue-Saturation-Value. To deal with this problem and with different users, an adaptive scheme was developed by combining the general skin and non-skin probability functions $P_g(rgb \mid \cdot)$ with "personal" ones, $P_p(rgb \mid \cdot)$, created on-line by randomly sampling the face and torso of the user. These color functions are combined by the independent likelihood pool [59] rule defined as [16]

$$ILP(rgb \mid \cdot) = P_g(rgb \mid \cdot) \times P_p(rgb \mid \cdot) \qquad (8)$$

This way, one pixel is classified as skin iff:

$$ILP(rgb \mid skin) > ILP(rgb \mid \neg skin) \qquad (9)$$

The CAMSHIFT algorithm [60] is used to track the hand motion over the rest of the image sequence. This strategy allows the visual system to track the hand effectively in our experimental conditions. An example of the tracking system is shown in Figure 4. A video showing this visual system is available at http://www.youtube.com/watch?v=dfff01tjvww.

An intuitive explanation of the positive results with the ILP approach is that it weighs general color distributions with precise information obtained from the images on-line. Other rules, such as a linear combination of probabilities, $w_1 P_g(rgb \mid \cdot) + w_2 P_p(rgb \mid \cdot)$, did not generate the same results, probably because of the need to select accurate weights $w_1$ and $w_2$. However, a deeper analysis of the potential of this rule is beyond the scope of this document, and more experimentation is required to provide conclusive arguments on the application of this scheme for skin detection.

5. Gesture database

We propose 9 dynamic gestures oriented to interacting with a mobile robot (see Fig. 5).
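The independent likelihood pool rule of Eqs. (8) and (9) in Section 4 can be illustrated with a small histogram-based sketch. Several details here are assumptions made for the example rather than facts from the paper: 8-bit color channels, dictionary-based histograms, a small floor value for empty bins, and personal skin and non-skin histograms built from face and torso samples respectively. All function names are hypothetical.

```python
BINS = 32  # class intervals per colour channel, as stated in the text


def bin_index(rgb):
    """Map an 8-bit (r, g, b) pixel to its histogram cell (assumes 0..255 channels)."""
    width = 256 // BINS
    r, g, b = rgb
    return (r // width, g // width, b // width)


def build_histogram(pixels):
    """Normalised colour histogram, used as an estimate of P(rgb | class)."""
    counts = {}
    for p in pixels:
        key = bin_index(p)
        counts[key] = counts.get(key, 0) + 1
    total = len(pixels)
    return {key: c / total for key, c in counts.items()}


def likelihood(hist, rgb, floor=1e-9):
    """Look up P(rgb | class); `floor` (an assumption) avoids zeros for unseen bins."""
    return hist.get(bin_index(rgb), floor)


def is_skin(rgb, general_skin, general_non, personal_skin, personal_non):
    """Eq. (8): ILP = P_g * P_p per class; Eq. (9): skin iff ILP(skin) > ILP(non-skin)."""
    ilp_skin = likelihood(general_skin, rgb) * likelihood(personal_skin, rgb)
    ilp_non = likelihood(general_non, rgb) * likelihood(personal_non, rgb)
    return ilp_skin > ilp_non
```

Because the pool is a product, a color that the personal model considers very unlikely for a class is strongly penalized even if the general model allows it, which matches the intuition given in the text that on-line samples sharpen the general distributions.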
Gestures were performed by 10 men and 5 women with the right arm at 3 m in front of the video camera. To minimize the adverse effects of visual processing errors on the feature extraction step, a blue-screen background was set and each participant was asked to wear long-sleeved clothes of colors different from skin color. A short video was used to instruct people to perform each gesture class before starting its corresponding sampling round, and no special recommendations were given afterwards. Except for one person, named here man10, none of the participants had experience with the visual system or previous training executing gestures. The complete set of examples is composed of 7308 gestures. Every person contributed a different number of samples; however, at least 50 samples of each gesture were recorded per person.

Figure 4. Example of the results of hand tracking through a sequence of 3 images.
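The skin classification rule of Eqs. (8) and (9) can be illustrated with a minimal sketch. It assumes 8-bit RGB pixels and histogram likelihoods with 32 intervals per channel, as described in the text. The function names, the `floor` smoothing constant, and the way sample pixels are supplied are hypothetical choices made for illustration and are not taken from the released system.

```python
from collections import Counter

BINS = 32                 # class intervals per RGB channel, as in the text
BIN_WIDTH = 256 // BINS   # assumes 8-bit channels

def quantize(rgb):
    """Map an 8-bit (r, g, b) triple to its histogram cell."""
    r, g, b = rgb
    return (r // BIN_WIDTH, g // BIN_WIDTH, b // BIN_WIDTH)

def build_likelihood(pixels, floor=1e-6):
    """Estimate P(rgb | class) from a non-empty list of sample pixels.

    `floor` is a hypothetical smoothing constant that keeps unseen
    cells from getting a likelihood of exactly zero.
    """
    counts = Counter(quantize(p) for p in pixels)
    total = sum(counts.values())
    return lambda rgb: max(counts[quantize(rgb)] / total, floor)

def ilp(p_general, p_personal, rgb):
    """Independent likelihood pool, Eq. (8): product of both likelihoods."""
    return p_general(rgb) * p_personal(rgb)

def is_skin(rgb, g_skin, g_non, p_skin, p_non):
    """Eq. (9): skin iff ILP(rgb | skin) > ILP(rgb | non-skin)."""
    return ilp(g_skin, p_skin, rgb) > ilp(g_non, p_non, rgb)
```

In this sketch the general models would be built once from the large offline sample, while the personal models would be rebuilt for each user from pixels sampled on-line.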

Figure 5. Gesture set: (a) come, (b) attention, (c) stop, (d) right, (e) left, (f) turn left, (g) turn right, (h) waving-hand and (i) pointing; (j) initial and final position for each gesture.

Each sample is composed of the length T of the observation sequence, which ranges from 6 to 42 observations, and the gesture data itself. Every observation is composed of i) (x, y)-coordinates of the upper and lower corners of the rectangle that segments the right hand, ii) (x, y)-coordinates of the upper and lower corners of the rectangle that segments the user's torso, and iii) (x, y)-coordinates of the center of the user's face. This coarse posture data enables us to easily transform the information into different feature sets. All coordinates are relative to the usual upper-left corner of the image. Data was recorded in plain text files. A spatial criterion on the position of the hand was used to start and end the capture of each gesture example. Observations were sampled every 4 images at a frame rate of approximately 30 images per second. This database can be downloaded from http://sourceforge.net/projects/visualgestures/.

Additionally, two more sets of gestures were constructed. One person, labeled man10, executed the 9 gestures at distances of 2 m and 4 m from the video camera. The same person performed the 9 gestures again with rotations of ±45° around the vertical axis at a distance of 3 m. In the rotated sampling round we noted that the visual system worked well, although it was not originally designed for that purpose. The total number of gesture samples is 1081 for the database with distance variations, and 1052 for the database with rotation changes. Again, there are at least 50 samples per gesture at each distance and orientation.

6. Gesture attributes

From the coarse posture information described in Section 5, we extracted the following 7 gesture attributes: a) 3 features to describe motion, and b) 4 to describe posture.
Motion features are Δarea (changes in hand area), and Δx and Δy (changes in hand position in the XY-plane of the image). The conjunction of these three attributes lets us estimate hand motion in the Cartesian space XYZ. Each of these features takes only one of three possible values, {+, -, 0}, which indicate increment, decrement or no change, depending on the area and position of the hand in a previous image of the sequence.
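As a rough illustration of how the three motion attributes could be derived from two consecutive hand rectangles, consider the sketch below. The tolerance values `pos_tol` and `area_tol`, the dictionary keys, and the use of the rectangle center as hand position are assumptions made here. The text does not state which thresholds were used. The sign of Δy follows the convention of Figure 6, where downward motion gives Δy = -.

```python
def sign_symbol(delta, tol):
    """Discretize a change into '+', '-' or '0' using a dead band of +/- tol."""
    if delta > tol:
        return "+"
    if delta < -tol:
        return "-"
    return "0"

def hand_box_stats(box):
    """box = (x1, y1, x2, y2): corners of the hand rectangle in image pixels."""
    x1, y1, x2, y2 = box
    center_x = (x1 + x2) / 2.0
    center_y = (y1 + y2) / 2.0
    area = abs(x2 - x1) * abs(y2 - y1)
    return center_x, center_y, area

def motion_features(prev_box, curr_box, pos_tol=2.0, area_tol=50.0):
    """Return the qualitative motion attributes between two observations.

    pos_tol and area_tol are hypothetical thresholds (pixels, pixels^2).
    """
    px, py, pa = hand_box_stats(prev_box)
    cx, cy, ca = hand_box_stats(curr_box)
    return {
        "d_area": sign_symbol(ca - pa, area_tol),
        "d_x": sign_symbol(cx - px, pos_tol),
        # Image y grows downwards, so upward motion is a negative pixel change.
        "d_y": sign_symbol(py - cy, pos_tol),
    }
```

The dead band is one simple way to obtain the "no change" symbol despite small tracking jitter; other choices are equally possible.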

For example, if the hand moves to the right, then Δx = +; if its motion is to the left, Δx = -; and if there is no motion along the x-axis, Δx = 0. An example of how these variables are instantiated according to the user's hand motion is presented in Figure 6.

Figure 6. Example of motion features. In this image, the hand motion is performed to the right of the observer and downwards, so Δx = + and Δy = -; red points indicate the center of the hand. Given that the hand area does not change significantly, Δarea = 0.

Posture features named form, right, above, and torso describe hand orientation and spatial relations between the hand and other body parts, such as the face and torso. Hand orientation is represented by form. This feature is discretized into one of three values: + if the hand is vertical, - if the hand is horizontal, or 0 if the hand leans to the left or right over the XY plane. right indicates whether the hand is to the right of the head, above whether the hand is above the head, and torso whether the hand is in front of the torso. These three latter attributes take binary values, true or false, that represent whether their corresponding condition is satisfied. An example of posture extraction in terms of these variables is depicted in Figure 7. This feature set does not make explicit use of magnitude components, as is usual in other approaches. The intention is rather to represent gestures through qualitative descriptions such as ''The hand is moving to the user's right and upwards'' or ''The gesture is performed above the head''.

Figure 7. Example of the posture features. The image shows that the hand has a vertical position, below the head, to the right of the user and not over the user's torso, so the attribute values are above = false, right = true, torso = false, and form = +.

7. Experiments and results

We conducted three main experiments to compare classification and learning performances of DNBCs and HMMs. In the first experiment, gestures taken from the same person are used for recognition.
In the second experiment, we evaluate the generalization capabilities of the classifiers by training and testing with gestures from different people. Experiment three considers gestures with variations in distance and rotation. First, we describe our experimental setup.

7.1 Experimental setup

Our visual system processes up to 30 f.p.s. The hardware is an IBM PC with an Intel Pentium 1.6 GHz processor and 512 MB of RAM, a Sony EVI-D30 camera and a WinTV frame grabber. The image resolution is 640 × 480 pixels. Sample code of the visual system is available at http://sourceforge.net/projects/visualgestures/.

All the experiments were carried out with DNBCs and HMMs with posture-motion features on the one hand, and motion features only on the other. Figure 8 shows a graphical description of the 4 models. For DNBCs (Figs. 8a and 8b), instead of assuming statistical independence between Δx and Δy given the class variable, these were joined as a single attribute. Doing this, we obtained better classification results for these classifiers. All models were set to follow standard ''linear'' transition topologies without skip transitions, initialized to a uniform discrete probability distribution. The number of parameters needed to specify the state observation distributions of HMMs with posture-motion features is 648 (the product of the attribute cardinalities, 3·3·3·3·2·2·2), and with motion data only it is 27 (3·3·3). With DNBCs there are 21 parameters in the former case (3 + 9 + 3 + 2 + 2 + 2, with Δx and Δy joined) and 12 in the latter (3 + 9). For training, the stopping criterion is met when the absolute difference of $\log P(A \mid \lambda)$, the log-probability of the observations given the model, between two consecutive models in an EM iteration is less than 1.0E-1. The whole gesture sequence was used without preprocessing. We use a modified version of Tapas Kanungo's HMM Toolkit [61] for training and testing HMMs and DNBCs. Recognition rate is calculated as follows:

$$\text{recognition rate} = \frac{\text{Number of gestures correctly classified}}{\text{Number of testing samples}} \times 100 \qquad (10)$$

Figure 8. Graphical representation of DNBCs and HMMs considered in our experiments. (a) DNBCs with posture-motion features, (b) DNBCs with motion features, (c) HMMs with posture-motion information and (d) HMMs with motion attributes. Attributes within parentheses form a single joint probability distribution.

7.2 Individual recognition

We use gesture samples performed at 3 m in front of the video camera in the experiments presented in this section. For a single participant, 50 gestures of each class were selected randomly. From this pool, 20 gesture examples were chosen at random to construct a training data set. The remaining 30 samples compose the test data set. Training and testing examples are the same for all classifiers. The experiment was performed for each one of the 15 participants and repeated 10 times to average results.

Table 1 shows the average error rate, total training time and the number of EM iterations of the four models as a function of the number of states of the model. As shown, the error rate tends to decrease as the number of states increases for all models. This indicates that commonly suggested topologies that range from 3 to 6 states [49,62,63,64] may not be adequate in all situations in gesture recognition. However, performance does not improve substantially beyond 12 states and slightly decreases with 18 states. Except for the experiment with a 3-state transition topology, DNBCs outperform HMMs in recognition performance. The table also shows that DNBCs reduce training time significantly, without compromising recognition rates. In particular, the training time of DNBCs with posture-motion data is consistently around one-tenth of the time required for HMMs. This difference is due to the number of possible observations of the models, which is higher for HMMs. The number of iterations required for HMMs and DNBCs with and without posture does not vary considerably on each trial. This is because the values of $\log P(A \mid \lambda)$ of these models are similar, as shown below. For the rest of the experiments, we selected a 12-state transition topology as a compromise between training time and recognition results.

It is useful to measure how erroneous responses are distributed among classes by the classifiers. We follow the method introduced by R. van Son to calculate error dispersion measures $d_s$ and $d_r$ from confusion matrices [65]. This method relies on the entropy-based measure perplexity [66].
$d_s$ is the mean number of wrong responses per correct class; $d_r$ is the mean number of samples incorrectly classified on each possible class. These indices account for dispersion through the horizontal and vertical dimensions of the confusion matrix, respectively. The higher the dispersion is, the higher the value of these measures should be. To obtain these measures, we calculated cumulative confusion matrices by pooling matrices of DNBC and HMM classifiers generated in the experiment with 12 states, for all the participants. Table 2 shows the values of these measures. For comparison purposes, consider a 9 × 9 confusion matrix with a uniform distribution. For this matrix, the error rate is 88%, and $d_s = d_r = 8$, i.e., 8 is the mean number of entries over which misclassifications are distributed. DNBCs with motion data provide lower error dispersion in comparison to HMMs with the same data. By contrast, DNBCs with posture and motion attributes generate slightly higher values than the corresponding HMMs. Notwithstanding, this latter difference does not seem to be significant in comparison to the dispersion values obtained from the uniform error distribution.

Posture-motion models
Number of   Average error rate (%)   Total training time (s)   Number of iterations
states      DNBCs      HMMs          DNBCs       HMMs          DNBCs      HMMs
3           3.81       3.02          36.77       322.82        13964      25007
6           2.37       2.6           126.7       1047.3        23305      28449
9           1.94       2.3           288.19      2344.5        29296      31676
12          1.78       2.18          516.63      4360.37       33270      34540
15          1.78       2.14          778.02      7805.28       34773      35940
18          1.72       2.2           1135.12     11854.8       36885      38696

Motion models
Number of   Average error rate (%)   Total training time (s)   Number of iterations
states      DNBCs      HMMs          DNBCs       HMMs          DNBCs      HMMs
3           28.43      28.71         28.26       44.28         13238      16548
6           14.28      18.66         99.63       134.03        24719      25126
9           13.28      16.73         217.35      303.96        32306      32707
12          13.03      15.75         380.59      556.04        37183      38472
15          13.42      15.84         599.79      868.54        41116      41258
18          13.82      16.09         897.48      1272.06       44718      44166

Table 1. Average error rate, total training time and number of iterations for DNBCs and HMMs with and without posture data, as a function of the number of states in the transition topology.

Figure 9a shows recognition rates for each person following this setup. Classifiers with posture and motion attributes improve recognition rates significantly in comparison to classifiers with motion attributes in all cases. Figures 9b and 9c depict the average number of iterations and average training time to construct the classifiers, respectively. Figure 9d presents $\log P(A \mid \lambda)$ for each model.

In order to take a closer look at the performance of these models, an independent experiment was performed by varying the number of training samples for man10. In all trials, 30 examples selected at random are used for testing. Figure 10 shows the average recognition rate over 10 runs of the experiment as a function of the number of training examples. The only trial where HMMs clearly outperform DNBCs is the one-training-sample case with motion models, with a difference of 8.74%. However, this difference may not be meaningful, since the HMM rate is hardly above 50% and it is somewhat unrealistic to expect reliable recognition rates with one or two training examples using motion only.
Figure 11a shows the progression of $\log P(A \mid \lambda)$ of each classifier as a function of the number of EM iterations for the same gesture example. It shows that DNBCs converge faster than HMMs. To evaluate how DNBCs reflect the evolution of a gesture, we computed the most probable state path of each model via the Viterbi algorithm (Figure 11b). It can be seen that the paths are quite similar among all models and that observations spread uniformly over the 12 states in all cases.

7.3 Experiments with multiple people

It is common to construct and validate gesture models with samples taken from a single person. We agree with previous discussions that it is difficult to correctly recognize gestures from people not considered in training. However, in various applications, recognition must be performed with gestures from people not previously presented to the classifiers. Little systematic work has been done to test the behavior of the classifiers in this situation. To evaluate this, we use the classifiers constructed in the previous experiment for each person to classify gestures from the remaining 14 people. For testing, we randomly extracted 2 samples per gesture from each personal database, excluding gestures from the person for whom the classifiers were constructed. In this way, a test set of 48 samples per gesture was generated.

                        d_s     d_r
DNBCs motion-posture    1.69    1.69
HMMs motion-posture     1.53    1.54
DNBCs motion            3.06    3.09
HMMs motion             3.23    3.28

Table 2. Error rate and error dispersion indices $d_s$ and $d_r$ for DNBCs and HMMs with motion and posture-motion attributes.
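To make the dispersion indices of Table 2 more concrete, the sketch below computes a perplexity-based dispersion from a confusion matrix. It is only one plausible reading of the description in Section 7.2: the off-diagonal counts are treated as the error distribution, entropies are averaged with weights proportional to the number of errors, and the result is reported as 2 raised to that entropy. The exact formulation in [65] may differ. The row/column convention (rows are true classes, columns are responses) and the function names are assumptions. The sketch does reproduce the uniform 9 × 9 reference case quoted in the text.

```python
import math

def _perplexity(error_rows):
    """2**H, where H is the error-weighted mean entropy (bits) of each row."""
    total = sum(sum(row) for row in error_rows)
    if total == 0:
        return 0.0  # convention assumed here: no errors, no dispersion
    entropy = 0.0
    for row in error_rows:
        row_sum = sum(row)
        for count in row:
            if count > 0:
                p = count / row_sum
                entropy -= (row_sum / total) * p * math.log2(p)
    return 2.0 ** entropy

def error_dispersion(confusion):
    """Return (d_s, d_r) for a square confusion matrix given as a list of rows."""
    n = len(confusion)
    errors = [[confusion[i][j] if i != j else 0 for j in range(n)]
              for i in range(n)]
    d_s = _perplexity(errors)                               # along rows
    d_r = _perplexity([list(col) for col in zip(*errors)])  # along columns
    return d_s, d_r

def error_rate(confusion):
    """Percentage of samples that fall outside the main diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * (total - correct) / total
```

For a 9 × 9 matrix with identical counts in every cell, `error_dispersion` gives values of 8 in both directions and `error_rate` gives about 88.9%, in line with the reference values used for comparison in the text.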

Figure 9. Results of the individual recognition of gestures using DNBCs and HMMs with posture-motion and motion attributes: (a) average recognition rates for each participant, (b) the average number of iterations for training, (c) average training time, and (d) average log probabilities of observations given the model.

Figure 10. Average recognition rates of man10 as a function of the number of training examples.

Figure 12a presents average recognition results of 10 repetitions of this experiment as a function of the personal classifiers used for testing. The average recognition rate for DNBCs with posture and motion features is 73.85%; for HMMs it is 74.80%. Average recognition rates for DNBCs and HMMs with motion data are 52.80% and 51.60%, respectively. Another experimental setting, with models trained with 2 samples per gesture from 14 people and tested with 30 samples per gesture from the fifteenth person, is depicted in Figure 12b. Percentages were obtained by averaging 10 instances of this experiment. The horizontal axis indicates the person to whom the testing set belongs. Average recognition percentages for posture-motion models are 85.79% for DNBCs and 86.45% for HMMs. DNBCs with motion attributes obtained 67.73% and HMMs 64.18%. The recognition performances of DNBCs and their HMM counterparts are closer in these two experiments, evincing the competitiveness of DNBCs for this problem.

7.4 Variations on distance and rotation

In many applications, gestures are always executed at the same distance and orientation from the capture devices. In other application domains, such as human-robot interaction in which both the person and the robot can move, this restriction may not hold all the time. For the experiment on distance variation, 15 samples were randomly extracted for each gesture performed at 2 m and 4 m, giving a test set of 30 samples per gesture. The classifiers constructed in the first experiment for man10 were used. Figure 13a shows average recognition results of 10 runs of the experiment, as a function of the number of training samples. DNBCs provide competitive classification results in comparison to HMMs, with posture-motion and motion features.

The recognition of rotated gestures is a difficult problem in gesture recognition. The selection of accurate invariant features is one of the most elusive goals in this area.
Although it is usually suggested that 3D information [49,67,68] or multiple views [51] are necessary, we decided to evaluate the recognition performance of our models on this problem. The setting of this experiment is similar to the previous one. Fifteen samples of each gesture class executed at each of the ±45° rotations were extracted at random to form 30 testing samples per gesture. Again, we used the models constructed in the first experiment. Figure 13b shows the average recognition results of 10 runs of the experiment, as a function of the number of training samples. HMMs with posture-motion features outperform their DNBC counterpart with an average difference of 4.61%. However, the HMM recognition rate is 72.55% in its best case, showing that this is also a complex problem for HMMs. Motion models performed poorly in all cases.

Figure 11. Examples of a single training and testing trial: (a) convergence graph, and (b) state transition through an observation sequence.
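The state paths in Figure 11b are obtained with the Viterbi algorithm. The following sketch shows how such a path could be decoded for a DNBC, where the observation probability in each state factorizes into a product over attributes (a sum in the log domain). It is an illustrative implementation and not the modified toolkit used in the experiments. The data layout (`attr_tables[name][state][value]`), the attribute name in the usage example, and the helper names are assumptions.

```python
import math

NEG_INF = float("-inf")

def log0(p):
    """Logarithm that maps a zero probability to -infinity."""
    return math.log(p) if p > 0 else NEG_INF

def dnbc_log_emission(obs, state, attr_tables):
    """log P(obs | state) as a sum of per-attribute log-probabilities."""
    return sum(log0(attr_tables[name][state][value])
               for name, value in obs.items())

def viterbi(observations, pi, trans, attr_tables):
    """Most probable state path for a sequence of attribute dictionaries."""
    n = len(pi)
    delta = [log0(pi[s]) + dnbc_log_emission(observations[0], s, attr_tables)
             for s in range(n)]
    back = []
    for obs in observations[1:]:
        step, ptr = [], []
        for s in range(n):
            # Best predecessor of state s at this time step.
            best = max(range(n), key=lambda r: delta[r] + log0(trans[r][s]))
            ptr.append(best)
            step.append(delta[best] + log0(trans[best][s])
                        + dnbc_log_emission(obs, s, attr_tables))
        delta = step
        back.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers to the first observation
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

A toy call with a 2-state linear topology and a single attribute, for instance `viterbi([{"d_x": "+"}, {"d_x": "-"}], [1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], {"d_x": [{"+": 0.9, "-": 0.1}, {"+": 0.1, "-": 0.9}]})`, returns one state index per observation. Replacing `dnbc_log_emission` with a lookup in a single joint table would give the corresponding decoder for a standard discrete HMM.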

Figure 12. Recognition rates of the experiments with multiple people: (a) results with ''personal'' classifiers that are used to recognize gestures from the other 14 people, and (b) with testing examples of each person to evaluate classifiers constructed with gestures from the other participants.

Figure 13. Recognition results of gestures executed at (a) 2 m and 4 m, and (b) ±45°. The classifiers constructed in the first experiment for man10 were used.

7.5 Discussion

Results presented in the previous sections show the competitiveness, in terms of recognition rates, of DNBCs in comparison to standard HMMs on various issues in gesture recognition, using two sets of attributes. Attribute factorization allows an important decrease in training time for discrete motion and posture-motion models, which benefits online learning of gestures. The recognition of rotated gestures with posture and motion information is the only experiment in which HMMs clearly outperform DNBCs. We believe this could be due to the large number of observation symbols required by HMMs, which allows HMMs to handle such strong variations slightly better. We also show that the models with posture and motion data surpass the classifiers with motion features in all the experiments, in particular when considering changes in distance and rotation.

Notwithstanding these positive results, we apply a single DNBC structure to all our gestures, as is usual in gesture recognition. However, besides classification, a complete gesture analysis also requires the development of models that effectively describe attributes and their statistical dependence relationships for each gesture class. We have shown that conditional independence assumptions decrease recognition performance only in complex situations that are difficult even for HMMs, and yet allow us to explore structural learning [69] and feature selection techniques [70]. For example, in [15], an evolutionary learning approach to cope with feature selection is proposed, searching for dependencies between attributes and the number of hidden states for each gesture using DNBC structures with our database and feature set. Their results suggest the possibility of improving recognition rates with different attribute sets, associations and numbers of states for each gesture.
We believe that these findings could lead to a fruitful research field in gesture recognition in the near future that may help us to improve our knowledge of gestures and to develop more accurate models for this purpose.

8. Conclusions

In this paper, an empirical comparison of DNBCs and standard HMMs was presented. DNBCs incorporate conditional independence among gesture features given the state into the HMM framework. DNBCs i) provide competitive error dispersion and recognition rates in various problems in gesture recognition, ii) require fewer parameters, iii) improve training time, and iv) permit structural learning and feature selection techniques to construct such dependences. In addition, we showed that a set of natural and simple posture and motion features allows us to correctly classify gestures. We also showed that the classification performance of recognizers with these posture-motion data surpasses that of motion-based ones. Also, an adaptive skin-color scheme to track the right hand of multiple people with different skin tones under different lighting conditions was described, and its implementation made available for other research groups. An extensive and comprehensive set of experiments was carried out with gestures taken from a single person, from multiple people, and with variations in distance and rotation. An additional product of this work is a freely accessible gesture database with more than 7000 samples of 9 gesture classes performed by 15 people. Our results show the effectiveness of the proposed approach and that DNBCs are a suitable alternative that opens the way to important issues such as feature selection and on-line learning.

References

[1] Starner T., Weaver J. & Pentland A., Real-Time American Sign Language Recognition Using Desk and Wearable Computer-Based Video, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 20, No. 12, Dec 1998, pp. 1371-1375.
[2] Lee H.K. & Kim J.H., An HMM-Based Threshold Model Approach for Gesture Recognition, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 21, No. 10, Oct 1999, pp. 1371-1375.
[3] Inoue M. & Ueda N., Exploitation of Unlabeled Sequences in Hidden Markov Models, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 25, No. 12, Dec 2003, pp. 1570-1581.
[4] Pavlovic V., Sharma R. & Huang T.S., Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, 1997, pp. 677-695.
[5] Domingos P. & Pazzani M., On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, Machine Learning, Vol. 29, No. 2-3, 1997, pp. 103-130.
[6] Friedman N., Geiger D. & Goldszmidt M., Bayesian Network Classifiers, Machine Learning, Vol. 29, No. 2-3, 1997, pp. 131-163.
[7] Avilés H. & Sucar L.E., Dynamic Bayesian networks for visual recognition of dynamic gestures, Journal of Intelligent and Fuzzy Systems, Vol. 12, No. 3-4, 2002, pp. 243-250.
[8] Hannaford B., Multi-dimensional hidden Markov model of Telemanipulation Tasks with Varying Outcomes, Proc. IEEE International Conference on Systems, Man and Cybernetics, 1990, pp. 127-133.
[9] Frasconi P., Soda G. & Vullo A., Text Categorization for Multi-page Documents: A hybrid Naive Bayes HMM Approach, Proc. ACM/IEEE Joint Conference on Digital Libraries, 2001, pp. 11-20.
[10] Pavlovic V., Garg A. & Kasif S., A Bayesian Framework for combining gene predictions, Bioinformatics, Vol. 18, No. 1, 2002, pp. 19-27.
[11] Xiang T. & Gong S., Incremental and adaptive abnormal behaviour detection, Computer Vision and Image Understanding, 2008, pp. 59-73.
[12] Lester J., Choudhury T., Kern N., Borriello G. & Hannaford B., A hybrid discriminative/generative approach for modeling human activities, Proc. Nineteenth International Joint Conference on Artificial Intelligence, 2005, pp. 766-722.
[13] Ahmad M. & Lee S.W., Human action recognition using shape and CLG-motion flow from multi-view image sequences, Pattern Recognition, Vol. 41, No. 7, 2008, pp. 2237-2252.
[14] Palacios M.A., Brizuela C.A. & Sucar L.E., Evolutionary Learning of Dynamic Naive Bayesian Classifiers, Proc. 21st International FLAIRS Conference, 2008, pp. 655-659.
[15] Palacios M.A., Brizuela C.A. & Sucar L.E., Evolutionary Learning of Dynamic Naive Bayesian Classifiers, Journal of Automated Reasoning, Vol. 45, No. 1, 2009, pp. 21-37.
[16] Avilés H., Sucar L.E. & Mendoza C.E., Visual Recognition of Similar Gestures, 18th International Conference on Pattern Recognition, 2006, pp. 1100-1103.
[17] Rabiner L.E., A tutorial on hidden Markov models and selected applications in speech recognition, Readings in speech recognition, Alex Waibel, Kai-Fu Lee, Editors, Morgan Kaufmann, 1990, pp. 267-296.
[18] Wilson A. & Bobick A., Using Hidden Markov Models to Model and Recognize Gesture Under Variation, International Journal on Pattern Recognition and Artificial Intelligence, Special Issue on Hidden Markov Models in Computer Vision, Vol. 15, No. 1, 2000, pp. 123-160.
[19] Brand M., Olivier N. & Pentland A., Coupled hidden Markov models for complex action recognition, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 1999, pp. 994-999.
[20] Marcel S., Bernier O., Viallet J.E. & Collobert D., Hand gesture recognition using input-output hidden Markov models, Proc. Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000, pp. 456-461.
[21] Vogler C. & Metaxas D.N., Parallel Hidden Markov Models for American Sign Language Recognition, Proc. International Conference on Computer Vision, 1999, pp. 116-122.
[22] Chambers G.S., Venkatesh S., West G.A.W. & Bui H.H., Hierarchical recognition of intentional human gestures for sports video annotation, Proc. 16th International Conference on Pattern Recognition, Vol. 2, 2002, pp. 1082-1085.

[23] Pavlovic V., Frey B.J. & Huang T.S., Variational learning in mixed-state dynamic graphical models, Proc. Uncertainty in Artificial Intelligence (UAI), 1999, pp. 522-530.
[24] Natarajan P. & Nevatia R., Hierarchical Multi-channel Hidden Semi Markov Models, Proc. International Joint Conference on Artificial Intelligence (IJCAI'07), 2007, pp. 2562-2567.
[25] Duong T., Bui H.H., Phung D.Q. & Venkatesh S., Activity Recognition and Abnormality Detection with the Switching Hidden semi-Markov Model, Proc. 9th IEEE International Conference on Computer Vision, Vol. 1, 2005, pp. 838-845.
[26] Artieres T., Marukatat S. & Gallinari P., Online Handwritten Shape Recognition Using Segmental Hidden Markov Models, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 29, No. 2, Feb 2007, pp. 205-217.
[27] Cassandra A.R., Kaelbling L.P. & Littman M.L., Acting optimally in partially observable stochastic domains, Proc. Twelfth National Conference on Artificial Intelligence (AAAI), Vol. 2, 1994, pp. 1023-1028.
[28] Hoey J. & Little J.J., Value-Directed Human Behavior Analysis from Video Using Partially Observable Markov Decision Processes, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 29, No. 7, Jul 2007, pp. 1118-1132.
[29] Rubine D., Specifying Gesture by Example, Computer Graphics, Vol. 25, No. 4, July 1991, pp. 329-337.
[30] Mardia K.V., Ghali N.M., Hainsworth T.J., Howes M. & Sheehy N., Techniques for online gesture recognition on workstations, Image and Vision Computing, Vol. 11, No. 5, 1993, pp. 283-294.
[31] Montero J.A. & Sucar L.E., Feature Selection for Visual Gesture Recognition Using Hidden Markov Models, Proc. Fifth Mexican International Conference in Computer Science (ENC'04), 2004, pp. 1-8.
[32] Cui Y., Swets D. & Weng J., Learning-based hand sign recognition using SHOSLIF-16, Proc. 5th Int. Conf. Computer Vision, 1995, pp. 631-636.
[33] Matthews I., Cootes T.F., Bangham J.A., Cox S. & Harvey R., Extraction of Visual Features for Lipreading, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, No. 2, Feb 2002, pp. 198-213.
[34] Shanableh T., Assaleh K. & Al-Rousan M., Spatio-Temporal Feature-Extraction Techniques for Isolated Gesture Recognition in Arabic Sign Language, IEEE Trans. Systems, Man, and Cybernetics-Part B: Cybernetics, Vol. 37, No. 3, June 2007, pp. 641-650.
[35] Johansson G., Visual Perception of Biological Motion and a model for its analysis, Perception and Psychophysics, Vol. 14, No. 2, 1973, pp. 201-211.
[36] Webb J.A. & Aggarwal J.K., Structure from motion from rigid and jointed objects, Artificial Intelligence, Vol. 19, No. 1, 1982, pp. 107-130.
[37] Shah M., Understanding human behavior from motion imagery, Machine Vision and Applications, Vol. 14, No. 1, 2003, pp. 210-214.
[38] Giese M.A. & Poggio T., Morphable Models for the Analysis and Synthesis of Complex Motion Patterns, International Journal of Computer Vision, Vol. 38, No. 1, 2000, pp. 59-73.
[39] Bobick A.F. & Davis J.W., The recognition of human movement using temporal templates, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 23, No. 3, Mar 2001, pp. 257-267.
[40] Waldherr S., Gesture Recognition on a Mobile Robot, Diploma thesis, Carnegie Mellon University, School of Computer Science, 1998.
[41] Beintema J.A. & Lappe M., Perception of Biological motion without local image motion, Proc. of the National Academy of Sciences, Vol. 4, No. 8, 2002, pp. 5661-5663.
[42] Sigala R., Serre T., Poggio T. & Giese M., Learning Features of Intermediate Complexity for the Recognition of Biological Motion, International Conference on Artificial Neural Networks (ICANN), 2005, pp. 241-246.
[43] Casile A. & Giese M., Roles of motion and form in biological motion and recognition, International Conference on Artificial Neural Networks (ICANN), 2003, pp. 854-862.
[44] Casile A. & Giese M., Critical features for the recognition of biological motion, Journal of Vision, Vol. 5, 2005, pp. 348-360.
[45] Stokoe W., Sign Language Structure, University Buffalo Press, 1960.
[46] Just A., Bernier O. & Marcel S., Recognition of Isolated Complex Mono and BiManual 3D Hand Gestures, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 571-577.
[47] Ren H., Xu G. & Kee S.C., Subject-independent Natural Action Recognition, Proc. Sixth IEEE Conference on Automatic Face and Gesture Recognition, 2004, pp. 523-528.

[48] Corradini A. & Gross H.M., Camera-based Gesture Recognition for Robot Control, IEEE-INNS-ENNS International Joint Conference on Neural Networks, Vol. 4, 2000, pp. 133-138.
[49] Campbell L.W., Becker A.D., Azarbayejani A., Bobick A.F. & Pentland A., Invariant features for 3-D Gesture Recognition, Technical report 379, M.I.T. Media Laboratory Perceptual Computing Section, 1996.
[50] Vogler C. & Metaxas D., ASL Recognition based on a Coupling Between HMMs and 3D Motion Analysis, Proc. International Conference on Computer Vision (ICCV'98), 1998, pp. 363-369.
[51] Ahmad M. & Lee S.W., Human Action Recognition Using Multi-View Image Sequences Features, Seventh International Conference on Automatic Face and Gesture Recognition, 2006, pp. 523-528.
[52] Baum L.E., Petrie T., Soules G. & Weiss N., A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Stat., Vol. 41, No. 1, 1970, pp. 164-171.
[53] Bilmes J.A., A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models, U.C. Berkeley, TR-97-021, http://citeseer.ist.psu.edu/1570.html, 1998.
[54] Rabiner L. & Juang B.H., Fundamentals on Speech Recognition, Prentice-Hall Signal Processing Series, New Jersey, 1993.
[55] Viola P.A. & Jones M.J., Robust Real-time Object Detection, International Journal of Computer Vision, Vol. 57, No. 2, May 2004, pp. 137-154.
[56] Azpeitia L.G.G., Con la Vara que Midas, Universidad de Colima, Colima, México, 1987. (In Spanish).
[57] Jones M.J. & Rehg J.M., Statistical Color Models with Application to Skin Detection, Technical report CRL-98/11, Cambridge Research Laboratory, 1996.
[58] Avilés H. & Sucar L.E., Real-Time Visual Recognition of Dynamic Arm Gestures, Video-Based Surveillance Systems: Computer Vision and Distributed Processing, P. Remagnino, G.A. Jones, N. Paragios, C.S. Regazzoni, Editors, Kluwer Academic, 2002, pp. 227-238.
[59] Manyika J. & Durrant-Whyte H., Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach, Ellis Horwood, NY-London, 1994.
[60] Bradski G.R., Real Time Face and Object Tracking as a Component of a Perceptual User Interface, Proc. 4th IEEE Workshop on Applications of Computer Vision (WACV'98), 1998, pp. 214-219.
[61] Kanungo T., Hidden Markov Models Software, Available at: http://www.kanungo.com/. Last retrieved: May 26, 2008.
[62] Kendon A., An agenda for gesture studies, Semiotic Review of Books, Vol. 7, No. 3, 1996, pp. 8-12. Available at: http://www.univie.ac.at/wissenschaftstheorie/srb/srb/gesture.html.
[63] Yang H.D., Park A.Y. & Lee S.W., Gesture Spotting and Recognition for Human-Robot Interaction, IEEE Trans. on Robotics, Vol. 23, No. 2, Apr 2007, pp. 256-279.
[64] Elmezain M., Al Hamadi A., Appenrodt J. & Michaelis B., A Hidden Markov Model-based continuous gesture recognition system for hand motion trajectory, 19th International Conference on Pattern Recognition, 2008, pp. 1-4.
[65] van Son R.J.J.H., The Relation Between the Error Distribution and the Error Rate in Identification Experiments, Proc. European Conference on Speech Communication and Technology, 1995, pp. 2277-2280.
[66] Shannon C.E., A Mathematical Theory of Communication, Bell System Technical Journal, Vol. 27, 1948, pp. 379-423 and 623-656.
[67] Wu Y. & Huang T.S., Vision-Based Gesture Recognition: A Review, Gesture-Based Communication in Human-Computer Interaction, A. Camurri, G. Volpe, Springer Berlin / Heidelberg, Vol. 1739/1999, 1999, pp. 103-115.
[68] Parameswaran V. & Chellappa R., Human action-recognition using mutual invariants, Computer Vision and Image Understanding, Vol. 98, 2005, pp. 295-325.
[69] Friedman N., Murphy K. & Russell S., Learning the Structure of Dynamic Probabilistic Networks, Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI), 1998, pp. 139-147.
[70] Bressan M. & Vitria J., On the Selection and Classification of Independent Features, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 25, No. 10, Oct 2003, pp. 1312-1317.

Authors' Biographies

Héctor AVILÉS-ARRIAGA

Héctor Avilés has a bachelor's degree in computer science from the Instituto Tecnológico de Ciudad Madero in 1997, master's and doctor's degrees in computer science (2000 and 2006, respectively) from the Instituto Tecnológico y de Estudios Superiores de Monterrey, Campus Cuernavaca. He held postdoctoral appointments from the Instituto Nacional de Astrofísica, Óptica y Electrónica (2006-2007) and from the Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas of the Universidad Nacional Autónoma de México (2008-2010). He is currently a member of the academic staff of the Computer Department at the Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas. Dr. Avilés has written more than 25 papers in journals, book chapters, international conferences and workshops. His research interests include visual recognition of gestures, multimodal human-robot interaction combining gestures and speech, and design and implementation of intelligent service robots.

L. Enrique SUCAR-SUCCAR

L. Enrique Sucar has a Ph.D. in computing from Imperial College, London, UK, 1992; an M.Sc. in electrical engineering from Stanford University, California, USA, 1982; and a B.Sc. in electronics and communications engineering from ITESM, Monterrey, Mexico, 1980. He has been a researcher at the Electrical Research Institute and professor at ITESM Cuernavaca and is currently a senior researcher at INAOE, Puebla, Mexico. He has been an invited professor at the University of British Columbia, Canada; Imperial College, London; and INRIA, France. He has more than 100 publications in journals and conference proceedings, and has directed 15 Ph.D. theses. Dr. Sucar is a member of the National System of Researchers, the Mexican Science Academy, AAAI, SMIA and senior member of the IEEE. He has served as president of the Mexican AI Society, has been a member of the Advisory Board of IJCAI, and is associate editor of the journals Computación y Sistemas and Revista Iberoamericana de Inteligencia Artificial. His main research interests are in graphical models and probabilistic reasoning, and their applications in computer vision, robotics and biomedicine.

Carlos Eduardo MENDOZA-DURÁN

Carlos Eduardo Mendoza-Durán holds a bachelor's degree in actuarial science from UNAM (1976), an M.A. in statistics from Princeton (1982), a Ph.D. in statistics from Princeton (1984). He is currently a full-time professor at Universidad Anáhuac México Norte at the School of Engineering. His main interests are data analysis, artificial intelligence and applied statistics.

Luis A. PINEDA-CORTÉS

Luis A. Pineda has a bachelor's degree in electronics from Universidad Anáhuac in Mexico City, an M.Sc. in computer science from ITESM, Campus Morelos and a Ph.D. in cognitive science from the University of Edinburgh (1986-1989). He has been the data center manager of NCR in Mexico City (1981-1983), a researcher at Instituto de Investigaciones Eléctricas (IIE) in Cuernavaca, Mexico (1983-1986 and 1992-1998) and also a research associate at the Human Communication Research Centre (HCRC) at the University of Edinburgh (1989-1992). Since 1998, he has worked as an associate researcher in the Department of Computer Science at Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS) of the Universidad Nacional Autónoma de México (UNAM), where he has been the head twice (1998-2002 and 2005-2010). He has published extensively on computational linguistics and artificial intelligence. Dr. Pineda is a regular member of the Mexican Academy of Science, a member of the National System of Researchers (SNI), level II, and since January 2010 he has been the coordinator of the Mexican Network for Research and Development in Computer Science (REMIDEC).