

Gesture Recognition
Xiaojun Qi

Introduction

Numerous approaches have been applied to the problem of visual interpretation of gestures for Human-Computer Interaction (HCI). Many of those approaches have been chosen and implemented so that they focus on one particular aspect of gestures: hand tracking, pose classification, or hand gesture interpretation.

To effectively study the process of hand gesture interpretation, a global structure of the interpretation system needs to be established.

Figure: Block diagram of a global vision-based gesture interpretation system.

Mathematical Model

The system requires that a mathematical model of gestures be established first. Such a model is pivotal for the successful functioning of the system. Once the model is decided, the system follows a classical path: model parameters are computed in the analysis stage from image features extracted from single or multiple video input streams.

Analysis, Recognition, and Grammar

The analysis stage is followed by the recognition block. Here, the parameters are classified and interpreted in the light of the accepted model and the rules imposed by some adequate grammar. The grammar reflects not only the internal syntax of gestural commands but also the possibility of interaction of gestures with other communication modes such as speech, gaze, or facial expressions.

Gesture Modeling

The quality of a gestural interface for HCI is directly related to the proper modeling of hand gestures. How to model hand gestures depends primarily on the intended application within the HCI context. In some instances a very coarse and simple model may be sufficient. However, if the purpose is natural-like interaction, a model has to be established that allows many if not all natural gestures to be interpreted by the computer.
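The processing path sketched in the block diagram (analysis produces model parameters, recognition classifies them, and a grammar maps gestures to commands) can be illustrated as a small pipeline. The class, stage functions, and dummy stages below are hypothetical placeholders, not code from the slides or the cited papers:

```python
from typing import Any, Callable, Dict, Sequence

class GestureInterpreter:
    """Minimal sketch of the block diagram above: the analysis stage maps each
    frame to model parameters, recognition classifies the resulting trajectory,
    and a grammar maps the gesture label to a command. All names here are
    illustrative placeholders."""

    def __init__(self,
                 analyze: Callable[[Any], Sequence[float]],
                 recognize: Callable[[list], str],
                 grammar: Dict[str, str]):
        self.analyze = analyze
        self.recognize = recognize
        self.grammar = grammar

    def interpret(self, frames: Sequence[Any]) -> str:
        trajectory = [self.analyze(f) for f in frames]   # analysis stage
        gesture = self.recognize(trajectory)             # recognition stage
        return self.grammar.get(gesture, "unknown")      # grammatical interpretation

# Dummy stages: mean frame intensity as a 1-D "model parameter", a toy
# threshold "classifier", and a two-entry command grammar.
demo = GestureInterpreter(
    analyze=lambda frame: [sum(frame) / len(frame)],
    recognize=lambda traj: "wave" if traj[-1][0] > traj[0][0] else "point",
    grammar={"wave": "greet", "point": "select"},
)
print(demo.interpret([[0, 1, 2], [5, 6, 7]]))            # -> greet
```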

-- One Definition of Gesture

Let h(t) ∈ S be a vector that describes the pose of the hands and/or arms and their spatial position within an environment at time t in the parameter space S. A hand gesture is represented by a trajectory in the parameter space S over a suitably defined interval I.

Two challenging issues:
1. How to construct the gestural model over the parameter set?
2. How to define the gesture interval?

-- Gesture Taxonomy

Figure: Gestural taxonomy applicable to HCI.

-- Temporal Modeling of Gestures

In an HCI environment, the following set of rules determines the temporal segmentation of gestures:
1. A gesture interval consists of 3 phases: preparation, stroke, and retraction.
2. The hand pose during the stroke follows a classifiable path in the parameter space.
3. Gestures are confined to a specified spatial volume (workspace).
4. Repetitive hand movements are gestures.
5. Manipulative gestures have longer gesture interval lengths than communicative gestures.

-- Spatial Modeling of Gestures

A complete gesture model for HCI is one whose parameters belong to the parameter space S constructed in the following manner:

S = {x | x = position of all hand and arm segment joints and fingertips in 3D space}

-- Two Spatial Gesture Models

Two classes of spatial gesture models are used: 3D hand/arm models and appearance-based models, both described below. The 3D model relies on the assumption that the human hand and arm can be thought of as an articulated object.
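To make the trajectory definition and the temporal rules above concrete, the sketch below represents a gesture as a sampled trajectory h(t) in S; the class, its field names, and the workspace check are illustrative assumptions, not constructs from the slides:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gesture:
    """A gesture as a sampled trajectory in the parameter space S over an
    interval I: poses[i] is h(times[i])."""
    times: np.ndarray   # shape (T,)
    poses: np.ndarray   # shape (T, dim), samples of h(t) in S

    def duration(self) -> float:
        # Length of the gesture interval I (rule 5 compares such lengths).
        return float(self.times[-1] - self.times[0])

    def inside_workspace(self, lower, upper) -> bool:
        # Rule 3: the whole trajectory stays inside a given spatial volume.
        return bool(np.all((self.poses >= lower) & (self.poses <= upper)))
```

A recognizer would additionally split such a trajectory into preparation, stroke, and retraction (rule 1) before classifying the stroke.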

-- 3D Hand/Arm Model

-- 3D Hand/Arm Model Constraints

Skeleton-based model of the human hand: a reduced set of 13 joint-angle parameters together with segment lengths is used.

-- Appearance-Based Model

The appearance-based model represents gestures by relating the appearance of any gesture to the appearance of a set of predefined template gestures.

Deformable 2D templates: the template sets and their corresponding variability parameters are obtained through PCA of many training sets of data.

Hand image property parameters: contours and edges, image moments, image eigenvectors, and fingertip positions.

The purpose of the analysis stage is to estimate the parameters (the trajectory in parameter space) of the gesture model based on a number of low-level features extracted from images of human operators acting in an HCI environment.

-- Hand/Arm Localization

To lower the burden of the localization and segmentation analysis, a variety of restrictions are usually used:
1. Restrictions on the background: a uniform, distinctive (dark) background greatly simplifies the segmentation task.
2. Restrictions on users: require users to wear long dark sleeves to simplify the localization problem.
3. Restrictions on imaging: require cameras focused on the hand.

Sample methods: thresholding and color-space-based analysis.
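As a sketch of the color-space analysis option, the rule below marks pixels whose RGB values fall inside a commonly used heuristic skin range; the exact thresholds are illustrative and in practice must be tuned for the camera and the lighting:

```python
import numpy as np

def skin_mask(rgb):
    """Crude skin-color thresholding of an RGB image (H x W x 3, uint8).
    The thresholds follow a common heuristic rule and are illustrative only."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    spread = rgb.max(axis=-1).astype(int) - rgb.min(axis=-1).astype(int)
    return ((r > 95) & (g > 40) & (b > 20) &
            (spread > 15) & (np.abs(r - g) > 15) &
            (r > g) & (r > b))
```

Hand and arm regions can then be extracted from the resulting binary mask with connected-component analysis.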

-- Hand/Arm Feature Extraction

The extraction of low-level image features depends on the gesture model in use. Even though different models use different types of parameters, the features employed to compute the parameters are often very similar. Some examples are the hand/arm silhouette, contours, and fingertips.

-- Hand/Arm Model Parameter Computation from Features

Most 3D hand/arm model-based gesture models employ successive approximation methods for their parameter computation. The basic idea is to vary the model parameters until the features extracted from the model match the ones obtained from the data images.

-- Model Parameter Computation (Cont.)

The matching procedure usually begins with the palm and ends with the matching of the fingers.

Initial model parameters are usually selected either as the ones that match a generic hand position (an open hand, for example), or as the ones obtained from prediction analysis of the parameters in the previous images of the sequence.

Figure: 3D hand/arm model parameter computation through successive approximation techniques.

Gesture Recognition

Gesture recognition is the phase in which the trajectory in the parameter space obtained from the analysis stage is classified as a member of some meaningful subset of the parameter space. Two recognition processes are involved:
1) Optimal partitioning of the time-model parameter space: an optimal partitioning should produce a single class in the parameter space for each allowed gesture that minimally intersects with the other gesture classes.
2) Implementation of the recognition procedure.

-- Partitioning Methods

Time partitioning: requires that the global hand/arm motion be known, since that is what distinguishes the three temporal phases.

Model parameter space partitioning: K-means, HMM, NN.
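As a toy illustration of the K-means option listed above, the sketch below partitions fixed-length model parameter vectors (for example, per-frame poses, or trajectories resampled to a fixed length and flattened); the function name, the number of clusters, and the data layout are assumptions made for the example:

```python
import numpy as np

def kmeans_partition(params, k=4, iters=50, seed=0):
    """Toy K-means partitioning of the model parameter space.
    `params` is an (N, dim) array of parameter vectors."""
    rng = np.random.default_rng(seed)
    centers = params[rng.choice(len(params), size=k, replace=False)]
    labels = np.zeros(len(params), dtype=int)
    for _ in range(iters):
        # Assign each parameter vector to its nearest cluster center.
        dists = np.linalg.norm(params[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster went empty.
        new_centers = np.array([params[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```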

-- Partitioning Methods (Cont.)

The HMM is one technique that is particularly appropriate in this case, since the states of the HMM can easily be associated with the temporal gesture phases. Therefore, a gesture HMM should contain at least, and usually more than, 3 hidden states. The HMM training procedure is built on learning-from-examples classification of the time-parameter space, while the recognition procedure uses dynamic time warping for temporally invariant classification (a small DTW sketch appears at the end of these notes).

Applications

References

A. Wilson and A. Bobick, "Parametric Hidden Markov Models for Gesture Recognition," IEEE Transactions on PAMI, Vol. 21, No. 9, 1999.
V. I. Pavlovic, R. Sharma, and T. S. Huang, "Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review," IEEE Transactions on PAMI, Vol. 19, No. 7, July 1997.

Other Websites

http://www.cybernet.com/~ccohen/
http://ls7-www.cs.uni-dortmund.de/research/gesture/vbgrtable.html

Check out: International Conference on Automatic Face and Gesture Recognition.
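As noted in the partitioning discussion, recognition can compare trajectories in a temporally invariant way with dynamic time warping (DTW). The sketch below is a generic DTW distance with nearest-template classification; the function names and the template dictionary are hypothetical, and this is not the HMM-based procedure described in the slides:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two trajectories in the
    parameter space, shapes (Ta, dim) and (Tb, dim)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(a), len(b)]

def classify(trajectory, templates):
    """Nearest-template gesture classification; `templates` maps a
    (hypothetical) gesture label to one example trajectory."""
    return min(templates, key=lambda label: dtw_distance(trajectory, templates[label]))
```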