INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 5, Issue 5, May (2014), pp. 76-81
IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2014): 8.5328 (Calculated by GISI), www.jifactor.com

SPEECH RECOGNITION USING GENETIC ALGORITHM

¹Asst. Prof. Dr. Jane J. Stephan, ²Asst. Lecturer Rasha H. Ali
¹Iraqi Commission for Computers and Informatics, Baghdad, Iraq
²Computer Department, Education College for Women, Baghdad University, Baghdad, Iraq

ABSTRACT

Speech recognition is an important field with many applications, such as banking and transactions over the telephone network, database access services, voice email, investigations, and management. In this paper, an approach for recognizing isolated Arabic words is presented. The Discrete Wavelet Transform (DWT) with the Haar wavelet (at the third and fourth decomposition levels) together with a magnitude measure is used for feature extraction, and a Genetic Algorithm (GA) is used for classification. The results show a recognition rate of 90% at the third level and 87.5% at the fourth level.

Keywords: Speech Recognition, Genetic Algorithm, Wavelet Transform, Magnitude.

1. INTRODUCTION

Speech signals are composed of a sequence of sounds. These sounds and the transitions between them serve as a symbolic representation of information. The arrangement of these sounds (symbols) is governed by the rules of language; the study of these rules and their implications for human communication is the domain of linguistics, while the study and classification of the sounds of speech is called phonetics. Speech can be represented in terms of its message content or information. An alternative way of characterizing speech is in terms of the signal carrying the message information, i.e., the acoustic waveform [1]. Speech is one of the most important tools for communication between humans and their environment, so automatic speech recognition (ASR) systems have long been desired [2]. Speech recognition systems are separated into several classes according to the types of utterances they are able to recognize. These classes reflect one of the difficulties of ASR: determining when a speaker starts and finishes an utterance [3]. The types of speech are isolated words, connected words, and continuous speech. In this paper, isolated Arabic words are treated for recognition.

2. GENETIC ALGORITHM

Genetic algorithms are based on natural genetics and therefore share the same terminology. A genetic algorithm is a stochastic search technique (a stochastic search uses probability to help guide the search) inspired by the mechanics of natural selection and natural genetics [4]. The basic idea behind genetic algorithms is to maintain a population of strings, or chromosomes, each of which encodes a potential solution to the problem being investigated. Each chromosome is tested with a fitness function to assess how good a solution it is. A new population is created by selecting chromosomes from the old population; the new population is re-evaluated, and the process continues until a solution is found [5]. The strings of an artificial genetic system are analogous to chromosomes in a biological system. Chromosomes are composed of features, or detectors, called genes, and each gene may take on a number of values, called alleles. Features may be located at different positions on the string; the position of a gene, its locus, is identified separately from the gene's function. Thus we can speak of a particular gene, for example an animal's eye-colour gene, its locus (say, position 10), and its allele value (blue eyes). The total package of strings (chromosomes) is called a structure, and these structures are decoded to form a particular parameter set [4]. In natural systems, one or more chromosomes combine to form the complete genetic prescription for the construction and operation of an organism. The total genetic package (structure) is called the genotype, and the organism formed by the interaction of the total genetic package with its environment is called the phenotype [5].

3. THE DATABASE OF SPEECH

The system has been applied to eleven Arabic words. These words were recorded with a microphone by independent speakers (5 speakers). Each speaker repeated each word 7 times: three repetitions were stored as references and four were used for testing. The files are stored in wave format. The total number of words in the database is therefore 385 utterances, of which 165 are stored as references and 220 are used for testing in the proposed algorithm. Each word has a different length. The words are: (عمان, رمان, سلطان, بغداد, حسين, يوسف, شبكة, الخير, تمارا, رفيف, ياسين). The words were recorded at a sampling rate of 22 kHz, coded in 8 bits, one channel. The digitization of the signal was performed with the professional software "Sound Forge version 9.0" [6].

4. ARCHITECTURE OF THE PROPOSED APPROACH

The proposed approach for speech recognition consists of three stages: preprocessing, feature extraction, and classification. Figure 1 shows the architecture of the proposed approach.
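To make the evaluate-select-reproduce cycle described above concrete, the following minimal sketch shows a generic genetic-algorithm loop in Python. It illustrates only the general mechanism; the population, fitness function, and operators are placeholders supplied by the caller, not the specific ones used later in this paper.

```python
import random

def genetic_search(init_population, fitness, select, crossover, mutate,
                   max_generations=100, target_fitness=0.99):
    """Generic GA loop: evaluate, select, recombine, mutate, repeat."""
    population = list(init_population)
    best_chrom, best_score = None, float("-inf")
    for generation in range(max_generations):
        # Evaluate every chromosome with the fitness function.
        scored = [(fitness(chrom), chrom) for chrom in population]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        best_score, best_chrom = scored[0]
        if best_score >= target_fitness:      # solution found
            break
        # Build the next generation from selected parents.
        parents = select(scored)
        next_population = []
        while len(next_population) < len(population):
            a, b = random.sample(parents, 2)
            next_population.append(mutate(crossover(a, b)))
        population = next_population
    return best_chrom, best_score
```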

Figure 1: The proposed approach for recognition. [Block diagram: for both the reference and the test paths, the input signal passes through preprocessing and feature extraction; reference features are stored, and the classification stage, using the genetic algorithm, compares test and reference features to produce the recognized word.]

4.1 Preprocessing
This stage represents the signal-processing part, i.e., converting the signal to its parametric representation; the signal must first be converted from analog to digital form [7]. The speech signal is blocked into frames of m samples. Since speech is a non-stationary signal (it varies with time), framing is essential so that processing is applied to individual frames rather than to the whole original signal. After framing, the speech signal consists of many frames, and the number of frames depends on the number of samples in each word. The number of samples in each frame is 256, and the overlap between frames is 128 samples.

4.2 Feature extraction
Because the speech signal carries a great deal of information, the most important features must be extracted; a direct comparison on this kind of signal is impossible because there is too much information [7]. The Discrete Wavelet Transform (DWT) with Haar function coefficients is therefore applied to all the frames of all the words used in this work, with a filter bank of three or four levels. The DWT of an extracted frame (256 samples) produces sub-bands called (d1, d2, d3, a3). The first level yields 128 samples from (high-pass filter + down-sampling), called (d1), and 128 samples from (low-pass filter + down-sampling), called (a1). In the second level, the same filters are applied to the output (a1), giving 64 samples from the (high-pass filter + down-sampling), called (d2), and 64 samples from the (low-pass filter + down-sampling), called (a2). The third level [applying the same filters to (a2)] gives 32 samples from the (high-pass filter + down-sampling), called (d3), and 32 samples from the (low-pass filter + down-sampling), called (a3). Since most of the information is concentrated in the low-frequency components, only the 32 samples produced by the low-pass filter (a3) are taken as the features of the input signal.
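As a concrete illustration of the framing and wavelet steps described in Sections 4.1 and 4.2, the sketch below frames a signal into 256-sample windows with a 128-sample overlap and keeps the third-level Haar approximation coefficients (a3, 32 values per frame). It assumes the PyWavelets package (pywt) as the DWT implementation; the paper itself does not name a particular library.

```python
import numpy as np
import pywt  # PyWavelets, assumed here as the DWT implementation

FRAME_LEN = 256   # samples per frame (Section 4.1)
OVERLAP = 128     # samples shared by consecutive frames

def frame_signal(signal, frame_len=FRAME_LEN, overlap=OVERLAP):
    """Split a 1-D signal into overlapping frames."""
    step = frame_len - overlap
    n_frames = max(0, (len(signal) - frame_len) // step + 1)
    return [signal[i * step: i * step + frame_len] for i in range(n_frames)]

def haar_features(frame, level=3):
    """Return the low-pass approximation coefficients (a3) of one frame.

    pywt.wavedec returns [aL, dL, ..., d1]; for a 256-sample frame and
    level=3 the approximation a3 has 32 coefficients (Section 4.2).
    """
    coeffs = pywt.wavedec(frame, 'haar', level=level)
    return coeffs[0]

# Example: extract features for every frame of a (hypothetical) recorded word.
word = np.random.randn(22050)            # stand-in for one recorded utterance
features = [haar_features(f) for f in frame_signal(word)]
```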

The DWT using the Haar scaling-function coefficients is applied to all the frames of all the words used in this work. The magnitude of each signal is then computed to produce the resulting features, also called feature vectors, which are classified by the GA in the classification stage. Eq. (1) shows the computation of the magnitude in each frame; it is used to reduce the amount of data characterizing the signal to a limited number of parameters or coefficients [2].

    magn[N] = (1/N) * Σ_{M=0}^{N-1} Data[M]        (1)

where N is the number of samples in the frame and Data[M] is the value of the signal at sample M.

4.3 Classification
The recognition process is performed by a genetic algorithm that compares the test and reference files. Before the genetic algorithm is applied, the features produced by the feature-extraction stage are divided into blocks (for example, 5 blocks), which are then used in the matching performed by the genetic algorithm to recognize the spoken word. Figure 2 shows the segmentation of the extracted signal. The following points clarify how the genetic algorithm is used for speech recognition.

Figure 2: Segmentation of the speech signal. [Flowchart: Start → input the signal produced by feature extraction → input the number of blocks → compute the magnitude of each block → apply the GA steps to the blocks → End.]

1. Initialization of the population
The reference files form the population managed by the genetic algorithm; the number of individuals in the population is 165. The initial population is chosen at random for each word to be recognized.
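A minimal sketch of Eq. (1) and the block segmentation of Section 4.3 follows, assuming the feature vector is split into a fixed number of equal blocks (the block count of 5 is the example value given in the text):

```python
import numpy as np

def magnitude(block):
    """Eq. (1): magn = (1/N) * sum of the samples in the block."""
    return float(np.sum(block)) / len(block)

def block_magnitudes(feature_vector, n_blocks=5):
    """Split the extracted features into blocks and apply Eq. (1) per block."""
    blocks = np.array_split(np.asarray(feature_vector), n_blocks)
    return [magnitude(b) for b in blocks]

# Example: 32 a3 coefficients from one frame reduced to 5 magnitude values.
feats = np.random.randn(32)
print(block_magnitudes(feats))
```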

2. Encoding
Binary encoding is used to encode the magnitude of the feature vector of the speech signal, as shown in Table 1.

Table 1: Magnitude computation for each block (as a chromosome)
Test file after processing by Eq. (1):   81        140        38        255        231
In binary, as blocks:                    1010001   10001100   100110    11111111   11100111

3. Fitness Function
To evaluate the population, the fitness of each individual (chromosome) is calculated as the difference between the word to be recognized and each word in the database; whenever the distance value is close to zero, the word is recognized. The distance is derived from the Euclidean distance using the Mean Square Error (MSE), as shown in Eq. (2) [8], and the fitness function in Eq. (3) limits the fitness value to the range (0, 1): a fitness value near one means that the best match has been found between the test and reference files, and vice versa.

    Distance(A, B) = MSE = (1/N) * Σ_{i=1}^{N} (A_i − B_i)²        (2)

    Fitness = 1 / (1 + MSE)        (3)

where A is the test-file vector, B is the reference-file vector, and N is the number of blocks.

4. Selection
After the individuals of the population are evaluated, the elitist selection method is used; this method allows the genetic algorithm to retain a number of the best individuals for the next generation, individuals that might otherwise be lost if they were not selected to reproduce.

5. Crossover
Single-point crossover is applied to the magnitude feature blocks by replacing the values of randomly selected blocks between the test and reference vectors; the replacement affects only the first digit of the block value, so that the block's features are preserved and its value does not change drastically. The crossover probability is 0.7, as shown in Table 2 for the word بغداد.

Table 2: Single-point crossover for magnitude blocks
Magnitude value for each block
Test:                   53        49        53        55        49
In binary:              1101 01   110001    110101    110111    110001
Reference:              54        57        55        54        26
In binary:              1101 10   111001    110111    110110    11010
Test after crossover:   1101 10   110001    110101    110111    110001
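The sketch below illustrates the encoding of Table 1, Eqs. (2)-(3), the elitist selection, and a block-level single-point crossover. The granularity and position of the crossover point are assumptions made for illustration; the paper only states that the exchange is restricted so that a block's value does not change drastically, and Table 2 shows a child whose first block comes from the reference while the remaining blocks come from the test vector.

```python
import random

def encode_blocks(magnitudes):
    """Binary encoding of the block magnitudes (Table 1): one bit-string per block."""
    return [format(int(round(m)), 'b') for m in magnitudes]

def mse(test, reference):
    """Eq. (2): mean squared error between two block-magnitude vectors."""
    n = len(test)
    return sum((a - b) ** 2 for a, b in zip(test, reference)) / n

def fitness(test, reference):
    """Eq. (3): maps the distance into (0, 1]; values near 1 mean a good match."""
    return 1.0 / (1.0 + mse(test, reference))

def elitist_select(scored_population, n_elite=10):
    """Keep the n_elite best (score, chromosome) pairs for the next generation."""
    ranked = sorted(scored_population, key=lambda pair: pair[0], reverse=True)
    return [chrom for _, chrom in ranked[:n_elite]]

def single_point_crossover(test_blocks, ref_blocks, p_crossover=0.7):
    """Block-level single-point crossover with probability 0.7.

    A crossover point is chosen at random; with point = 1 the child takes the
    reference value for the first block and the test values elsewhere, which
    reproduces the example in Table 2. The point itself is an assumption.
    """
    if random.random() >= p_crossover:
        return list(test_blocks)
    point = random.randrange(1, len(test_blocks))
    return list(ref_blocks[:point]) + list(test_blocks[point:])

# Example with the Table 1 magnitudes.
print(encode_blocks([81, 140, 38, 255, 231]))
```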

6. Mutation
The mutation operation in this paper is a simple change, because preserving the extracted features is very important in this work: if the word is not recognized, mutation is performed by replacing the individual. The mutation probability is 0.001.

7. Stopping Criterion
The proposed approach stops either when the word has been recognized, when all sub-populations have been covered, or when the number of generations is exhausted.

CONCLUSION

In this paper we propose a system for the automatic recognition of isolated Arabic words, with a small vocabulary of 11 words, each recorded 7 times, representing characteristics of the Arabic language. The system uses the Discrete Wavelet Transform (DWT) and the magnitude measure (Magn) for feature extraction, and a genetic algorithm with binary encoding and single-point crossover for classification. The recognition rate is 90% when the third level of the DWT is used and 87.5% when the fourth level is used.

REFERENCES

[1] Holmes J. and Holmes W., "Speech Synthesis and Recognition", Second Edition, Taylor and Francis e-Library, London and New York, 2001.
[2] Rabiner L. R. and Schafer R. W., "Digital Processing of Speech Signals", Englewood Cliffs, New Jersey: Prentice Hall, 1978.
[3] Cook S., "Speech Recognition HOWTO", http://www.gear21.com/speech/big_html/speech_big.html, April 19, 2007.
[4] Goldberg D., "Genetic Algorithms in Search, Optimization and Machine Learning", Addison-Wesley Longman, Boston, MA, 1989.
[5] Mitchell M., "An Introduction to Genetic Algorithms", The MIT Press, Cambridge, MA, 1996.
[6] Sound Forge 9.0, http://www.wsystem.com/html/sound_forge.html.
[7] Maouche F. and Benmohamed M., "Automatic Recognition of Arabic Words by Genetic Algorithm and MFCC Modeling", Faculty of Informatics, Mentouri University, Constantine, Algeria, 2009.
[8] P. Mahalakshmi and M. R. Reddy, "Speech Processing Strategies for Cochlear Prostheses - The Past, Present and Future: A Tutorial Review", International Journal of Advanced Research in Engineering & Technology (IJARET), Volume 3, Issue 2, 2012, pp. 197-206, ISSN Print: 0976-6480, ISSN Online: 0976-6499.
[9] Mustafa Dhiaa Al-Hassani and Abdulkareem A. Kadhim, "Design a Text-Prompt Speaker Recognition System using LPC-Derived Features", International Journal of Information Technology and Management Information Systems (IJITMIS), Volume 4, Issue 3, 2013, pp. 68-84, ISSN Print: 0976-6405, ISSN Online: 0976-6413.
[10] Pallavi P. Ingale and S. L. Nalbalwar, "Novel Approach to Text Independent Speaker Identification", International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 3, Issue 2, 2012, pp. 87-93, ISSN Print: 0976-6464, ISSN Online: 0976-6472.