PERFORMANCE COMPARISON OF SPEECH RECOGNITION FOR VOICE ENABLING APPLICATIONS - A STUDY


V. Karthikeyan 1 and V. J. Vijayalakshmi 2
1 Department of ECE, VCEW, Thiruchengode, Tamil Nadu, India
2 Department of EEE, KPRCET, Coimbatore, Tamil Nadu, India

Abstract - A performance comparison of speech recognition techniques for voice-enabling applications is presented. This paper focuses on a speaker-independent system that allows mobile phone applications to be operated without touching the device, which is especially useful for visually challenged users. The proposed system is evaluated with two different classification models, template generation using Dynamic Time Warping (DTW) and Hidden Markov Model (HMM) / Vector Quantization (VQ), both using Mel Frequency Cepstral Coefficient (MFCC) features. The two models are compared on the basis of average recognition accuracy. The HMM/VQ classification model with MFCC features gives a recognition rate of 82.77% for the test utterances, which is higher than that of the conventional method. A TMS320C2x DSP processor can be used to implement the voice-enabled mobile phone.

Index Terms - Template generation, Dynamic Time Warping (DTW), Hidden Markov Model (HMM) / Vector Quantization (VQ), Mel Frequency Cepstral Coefficient (MFCC)

INTRODUCTION

Speech recognition is a field in which a system recognizes spoken words [1]. It enables the system to identify the words that a person speaks into a microphone and to convert them into written text. Automatic Speech Recognition (ASR) is the process of determining the sequence of words spoken by a human using a machine. The goal of ASR is to make speech a medium of interaction between man and machine. ASR is commonly used in voice-enabled mobile phones, where voice is the input. In these systems, the voice input must be recognized so that the appropriate action can be taken. A voice-enabled mobile phone lets a user dial a contact stored in the device memory by saying the digit assigned to that contact in speed dial or the name of the contact [2], send and receive messages, and put the phone into hold or loudspeaker mode.

The first step in voice enabling is speech recognition. To perform it, features must be extracted from the uttered word, and the uttered word is recognized by comparing the trained sample with the test sample. The most commonly used feature extraction front end is the Mel Frequency Cepstral Coefficient. The power spectrum is filtered with a filter bank that resembles the sensitivity curve of the human ear, called the Mel scale filter bank. While the human ear is sensitive to frequency variations at lower frequencies, this sensitivity is reduced for higher-frequency signal components.
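The paper does not state the Mel mapping explicitly; the short sketch below uses the common 2595 log10(1 + f/700) form (an assumption) to illustrate how the linear frequency axis is warped onto the Mel scale before the triangular filters described later are placed.

import numpy as np

def hz_to_mel(f_hz):
    # Common Mel-scale mapping (assumed form; not specified in the paper).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of the mapping above.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Example: 20 filter center frequencies equally spaced on the Mel scale up to 4 kHz,
# i.e. half of the 8 kHz sampling rate used later in the paper.
centers_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(4000.0), 20 + 2)
print(np.round(mel_to_hz(centers_mel), 1))

The printed center frequencies are densely spaced at low frequencies and sparse at high frequencies, mirroring the ear-sensitivity behaviour described above.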

From experimental results it is well known that Mel Frequency Cepstral Coefficients (MFCC) are among the best acoustic features for automatic speech recognition [3]. MFCCs are robust, carry much information about the vocal tract configuration regardless of the excitation source, and can represent all classes of speech sounds.

This paper is organized as follows. Section 2 discusses the general principles of an ASR system. Section 3 describes the MFCC feature extraction technique used in the proposed system, and Section 4 describes the two classification models. Section 5 presents the results and the conclusion, including the accuracy of the two classification models.

AUTOMATIC SPEECH RECOGNITION SYSTEM

In general, an Automatic Speech Recognition (ASR) system [4] operates in two modes: a training mode and a testing mode. The input speech signal is first pre-processed and features are extracted. Reference samples are created from the extracted features, and comparison and recognition are carried out against them. The general block diagram of an ASR system is shown in Fig. 1.

Training mode: In both speaker-dependent and speaker-independent systems, the system has to be trained. Samples are collected from different speakers using a microphone as the input device; recognition accuracy generally improves with the number of training samples. Preprocessing methods extract the acoustic characteristics of the speech signal. The software analyzes the extracted feature vectors, generates patterns, and stores them in matrix form as reference patterns. These reference patterns are matched against the input speech during the recognition mode.

Testing mode: In this mode, the test sample is analyzed for its acoustic characteristics and the important features are extracted from the input speech sample. These feature vectors are used to generate an input pattern, which is stored in matrix form. This unknown pattern is compared against the known reference patterns, element by element. Once the best match is found, the appropriate action is enabled.

Fig. 1. Block Diagram of the Automatic Speech Recognition System

FEATURE EXTRACTION

Feature extraction is the key front-end process in speech and speaker recognition systems. The performance of such a system is highly dependent on the quality of the selected speech features.
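As a minimal illustration of the two modes, the skeleton below stores one reference pattern per word during training and returns the closest reference during testing. The names extract_features and compare stand in for the MFCC front end and the pattern-matching methods described in the following sections; they are illustrative assumptions, not the paper's code.

def train(labelled_signals, extract_features):
    # Training mode: extract features from each labelled utterance and store the
    # resulting pattern matrix as the reference for that word.
    return {label: extract_features(sig) for label, sig in labelled_signals}

def recognize(test_signal, references, extract_features, compare):
    # Testing mode: extract features from the unknown utterance, compare the input
    # pattern against every stored reference and return the closest word.
    test_pattern = extract_features(test_signal)
    scores = {label: compare(test_pattern, ref) for label, ref in references.items()}
    return min(scores, key=scores.get)

With the MFCC and DTW routines sketched later, references could be built as train(data, mfcc) and a test utterance recognized as recognize(x, references, mfcc, dtw_distance).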

The speech signal varies slowly with time and can be treated as quasi-stationary over short intervals, so short-time spectral analysis is the most common way to characterize it. Before the features are extracted, the speech signal is pre-processed in two steps: i) framing and ii) windowing.

The continuous speech signal is divided into short fixed-length frames of N samples, with successive frames overlapping each other [5]. For the proposed system a frame size of N = 256 samples with 50% overlap (a frame shift of 128 samples) has been used. After frame segmentation, windowing is carried out to minimize the spectral distortion caused by the signal discontinuities at the beginning and end of each frame. A Hamming window is used because its spectral leakage is low. It is multiplied with each frame and is given by eqn. (1),

w(n) = 0.54 - 0.46 cos(2πn / (N - 1)),  0 ≤ n ≤ N - 1        (1)

where N is the number of samples in each frame.

Several feature extraction techniques are available, such as Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Prediction (PLP) and RASTA Perceptual Linear Prediction. Among these, MFCC provides good recognition accuracy. MFCC vectors provide an estimate of the vocal tract filter [6]. The background noise energy level is estimated at the beginning and end of the speech signal, and energy thresholds are applied to find the speech beginning and end points. The pre-emphasized speech signal is blocked into frames of N = 256 samples, with adjacent frames separated by 128 samples, and windowing is then applied to minimize the signal discontinuities at the beginning and end of each analysis frame. The general block diagram of MFCC extraction is shown in Fig. 2.

Fig. 2. Block Diagram of the MFCC Technique

After windowing, the Fast Fourier Transform (FFT) is applied and the resulting spectrum is passed through a 20-channel Mel-scale triangular filter bank. The Mel scale is a critical-band frequency scale that takes into account frequency perception in the human auditory system. The Discrete Cosine Transform (DCT) is applied to the log Mel-scale filter outputs, yielding 12 Mel Frequency Cepstral Coefficients. Cepstral liftering is then performed with the sinusoidal lifter of eqn. (2),

c'_n = [1 + (L/2) sin(πn / L)] c_n        (2)

where L is the lifter length and c_n is the n-th cepstral coefficient. The MFCCs themselves are calculated from the log filter bank amplitudes {m_j} using eqn. (3),

c_i = sqrt(2/N) Σ_{j=1}^{N} m_j cos[πi(j - 0.5)/N],  i = 1, 2, ..., 12        (3)

where N is the number of filter bank channels.
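As a concrete illustration of the front end just described (256-sample frames, 50% overlap, Hamming window, 20 Mel filters, 12 coefficients), a minimal NumPy sketch follows. The filter-bank construction and the Hz-to-Mel mapping it uses are common textbook choices and are assumptions, not the paper's exact implementation.

import numpy as np

def frame_and_window(signal, frame_len=256, hop=128):
    # signal: 1-D float NumPy array. Split into overlapping frames and apply the
    # Hamming window of eqn. (1).
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def mel_filterbank(n_filters=20, n_fft=256, fs=8000):
    # Triangular filters with centers equally spaced on the Mel scale (assumed mapping).
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges_hz = mel2hz(np.linspace(0.0, hz2mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(1, n_filters + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        fbank[j - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fbank[j - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fbank

def mfcc(signal, n_coeffs=12, n_filters=20, fs=8000):
    frames = frame_and_window(signal)                       # framing + Hamming window
    power = np.abs(np.fft.rfft(frames, n=256)) ** 2         # FFT power spectrum per frame
    logmel = np.log(power @ mel_filterbank(n_filters, 256, fs).T + 1e-10)
    j = np.arange(1, n_filters + 1)                         # filter index 1..N
    i = np.arange(1, n_coeffs + 1)[:, None]                 # cepstral index 1..12
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * i * (j - 0.5) / n_filters)  # eqn. (3)
    return logmel @ dct.T                                   # (frames, 12) MFCC matrix

For example, mfcc(np.asarray(wave, dtype=float)) returns one 12-dimensional vector per 32 ms frame at the 8 kHz sampling rate used in the paper; the liftering of eqn. (2) can be applied to the returned coefficients as a separate element-wise weighting.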

SPEAKER MODELLING

Because the extracted features require considerable storage memory, they are converted into a compact vector representation by a classification model. The proposed system is evaluated with two such models: a template model and a Hidden Markov Model (HMM) / Vector Quantization (VQ) model. The block diagram of the proposed system is shown in Fig. 3.

Fig. 3. Block Diagram of the Proposed System

The template-generation model uses Dynamic Time Warping (DTW) for speech pattern matching [7] in a speaker-dependent setting; it expands or contracts the time axis non-linearly to match the input speech with the reference template. DTW is chosen here because DTW-based recognition engines have been widely embedded inside Qualcomm MSM (Mobile Station Modem) chips for phone dialing owing to their low computational complexity.

Template Generation using DTW

Templates are reference patterns derived from features recorded over the length of the whole word rather than at particular points. The procedure is as follows: choose one utterance from the training data as the reference; use DTW to align all the training data with this reference; once the training data are aligned, compute the reference pattern as the centroid of the feature vectors (of cepstral coefficients) over all occurrences of the word. DTW is then used both to create the reference templates and to find the best match between a reference template and the input template derived from the test utterance. The matching process must compensate for length differences and take account of the non-linear nature of the length differences within the words.

A DTW grid is used to find the best match between the input sequence and a stored reference sequence by finding a path through the grid that minimizes the total distance between them; the input is stretched or compressed locally in order to match the reference. Once an overall path has been found, the total distance between the input sequence and the reference sequence can be calculated for that particular input template. When sequences of different lengths are compared, DTW effectively repeats or omits frames so that both sequences are brought to the same length; this modification of the sequences is called time warping.
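A minimal sketch of the DTW alignment described above is given below. It uses Euclidean frame distances and the common symmetric step pattern, both assumptions since the paper does not specify the local path constraints, and returns the accumulated distance between two MFCC sequences.

import numpy as np

def dtw_distance(seq_a, seq_b):
    # seq_a: (Ta, d) and seq_b: (Tb, d) MFCC matrices for two utterances.
    ta, tb = len(seq_a), len(seq_b)
    # Local (frame-to-frame) Euclidean distances form the DTW grid.
    local = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=-1)
    acc = np.full((ta + 1, tb + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            # Symmetric step pattern: match, or skip a frame in either sequence.
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j - 1],
                                                  acc[i - 1, j],
                                                  acc[i, j - 1])
    # Normalize by path length so utterances of different duration are comparable.
    return acc[ta, tb] / (ta + tb)

def dtw_recognize(test_mfcc, templates):
    # Recognition: pick the reference template with the smallest warped distance.
    return min(templates, key=lambda word: dtw_distance(test_mfcc, templates[word]))

Here templates is a dictionary mapping each vocabulary word to its reference MFCC matrix (the centroid template built during training).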

Hidden Markov Model with Vector Quantization (HMM/VQ)

In the proposed method, speaker-independent isolated word recognition is also implemented using Vector Quantization (VQ) and a Hidden Markov Model (HMM), which can provide a higher accuracy rate as the number of training samples grows. One of the most popular approaches to speaker-independent speech recognition [8] is the combination of VQ, for encoding segments of speech, with HMMs, for classifying sequences of segments [9], as shown in Fig. 4.

Fig. 4. Block Diagram of the Hidden Markov Model with Vector Quantization

After the features have been extracted, the K-means clustering algorithm is used to iteratively create the vector quantizer codebook until the average distortion falls below a preset threshold. The set of quantized vectors corresponding to multiple utterances of the same word is used to re-estimate the Hidden Markov Model [10] for that word, and this procedure is repeated for each word in the vocabulary. In the testing mode, the set of MFCC vectors corresponding to the unknown word is quantized by the vector quantizer to give a sequence of codebook indices. This sequence is scored against each word HMM to give a probability score for each word model, and the decision rule chooses the word whose model gives the highest probability.

VQ is the process of mapping vectors from a large vector space onto a finite number of regions in that space. Each region is called a cluster and is represented by its center, called a centroid; the collection of all code words is called the codebook [11]. In speech recognition, vector quantization can thus be used to train discrete HMMs.
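The codebook-generation step can be sketched as follows. The codebook size of 64 and the convergence threshold are illustrative assumptions; the paper only reports that a number of clusters are produced over several iterations (Table 2).

import numpy as np

def train_codebook(features, codebook_size=64, threshold=1e-3, rng=np.random.default_rng(0)):
    # features: (n_vectors, 12) float array of pooled MFCC vectors from training utterances.
    codebook = features[rng.choice(len(features), codebook_size, replace=False)]
    prev_distortion = np.inf
    while True:
        # Assign every vector to its nearest code word.
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
        nearest = dists.argmin(axis=1)
        distortion = dists[np.arange(len(features)), nearest].mean()
        # Stop once the average distortion no longer improves noticeably.
        if prev_distortion - distortion < threshold:
            return codebook
        prev_distortion = distortion
        # Move each code word to the centroid of the vectors assigned to it.
        for k in range(codebook_size):
            members = features[nearest == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)

def quantize(mfcc_sequence, codebook):
    # Map each MFCC frame to the index of its nearest code word; the resulting index
    # sequence is the observation sequence scored by each discrete word HMM.
    dists = np.linalg.norm(mfcc_sequence[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)

Scoring the quantized index sequence against each word's discrete HMM (for example with the forward algorithm) and taking the highest-probability model then yields the recognized word.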

RESULTS AND DISCUSSION

The proposed speaker-independent isolated speech recognition system is implemented in MATLAB. A database containing the digits 0-9 and the 8 words Dial, Cut, On, Off, Hold, Read, Write and Loudspeaker is considered for the voice dialing application. The digits are used to dial a contact number assigned to speed dial. The words Dial and Cut are used to control the voice dialing application, On, Off and Hold to control the mobile phone on/off and hold operations, and Read, Write and Loudspeaker to enable the message read and write application and the hands-free mode, respectively. The database is collected from 25 speakers pronouncing each word 10 times, yielding 250 utterances per word and 180 utterances per speaker, i.e. 4500 samples in total. The first 8 utterances of each word from each speaker are used for training and the remaining ones for testing. In the training phase, the uttered words are recorded using 8-bit Pulse Code Modulation (PCM) at a sampling rate of 8 kHz and converted to WAV files using the Total Audio Converter software.

The performance of the speech recognition systems is given in terms of the word recognition rate (%), calculated from eqn. (4),

W = (M / N) x 100%        (4)

where N is the total number of samples taken and M is the total number of correctly recognized samples. In the confusion matrix, each uttered word is compared with all other words, showing how many times each word was correctly recognized, and from this the average recognition accuracy is calculated.

Table 1 shows the results of the speech recognition system using template-generation modeling with MFCC features. As shown in Table 1, digits 0 and 2 have the highest accuracy rate of 60%, and digits 4 and 9 and the word Cut have the lowest accuracy rate of 50%. The word Off has the second highest accuracy rate of 58%. In the HMM/VQ model, the codebook is generated using the VQ method; during codebook generation a number of clusters are produced over different numbers of iterations, as shown in Table 2. As shown in Table 3, digit 0 has the highest accuracy rate of 88% and digit 2 the lowest at 78%. Digit 9 and the word Dial have the next highest accuracy rate of 87%. The words Hold and Off have an accuracy rate of 85%, and Loudspeaker and On of 84%. Digits 5 and 8 and the word Write have an accuracy rate of 83%, digit 3 and the word Cut of 82%, and digit 6 of 81%, followed by digit 4 and the word Write at 80%. The comparison of the recognition rates of the two classification models with MFCC features is shown in Table 4, which implies that HMM/VQ has a higher recognition rate than template-generation modeling.

CONCLUSION

The proposed system designed with HMM/VQ classification modeling and MFCC features gives a recognition rate of 82.77% for the test utterances, which is higher than the 53.88% obtained with template-generation modeling using DTW. The system can further be implemented on a DSP processor so that the voice-enabling applications can be realized in hardware; a TMS320C2x DSP processor can be used to implement the voice-enabled mobile phone.
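For completeness, the sketch below computes per-word accuracy and the average recognition rate of eqn. (4) from a confusion matrix; the small matrix shown is a made-up illustration, not data from Tables 1-4.

import numpy as np

# Hypothetical 3-word confusion matrix: rows = spoken word, columns = recognized word.
words = ["dial", "cut", "hold"]
confusion = np.array([[44,  3,  3],
                      [ 2, 45,  3],
                      [ 4,  2, 44]])

per_word = confusion.diagonal() / confusion.sum(axis=1)   # accuracy per word
overall = confusion.trace() / confusion.sum()             # W = (M / N) x 100%, eqn. (4)

for w, acc in zip(words, per_word):
    print(f"{w}: {100 * acc:.1f}%")
print(f"average recognition rate: {100 * overall:.2f}%")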


References

[1] Rabiner, L.R. and Schafer, R.W., Digital Processing of Speech Signals, Prentice-Hall Inc., 1978; Iso-Sipila, J., Design and Implementation of a Speaker-Independent Voice Dialing System: A Multi-Lingual Approach, Ph.D. thesis.

[2] Schurer, T., "An experimental comparison of different feature extraction for telephone speech", in Proc. 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.

[3] Chong, J. and Togneri, R., "Speaker independent recognition of small vocabulary", M.S. thesis, Centre for Intelligent Information Processing Systems, The University of Western Australia.

[4] Rabiner, L. and Juang, B.H., Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, New Jersey, 1993.

[5] Davis, S.B. and Mermelstein, P., "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, no. 4, pp. 357-366, August 1980.

[6] Itakura, F., "Minimum prediction residual principle applied to speech recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-23, no. 1, 1975.

[7] Schafer, R.W., "Scientific bases of human-machine communication by voice", Proceedings of the National Academy of Sciences, vol. 92, 1995.

[8] Hoshimi, M., Miyata, M. and Hiraoka, S., "Speaker independent speech recognition method using training from a small number of speakers", in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 1992.

[9] Nilsson, M. and Ejnarsson, M., "Speech Recognition Using HMM: Performance Evaluation in Noisy Environments", M.S. thesis, Blekinge Institute of Technology, Department of Telecommunications and Signal Processing, 2002.

[10] Ferrer, M.A., Alonso, I. and Travieso, C., "Influence of initialization and stop criteria on HMM based recognizers", Electronics Letters, vol. 36, June 2000.

[11] Rabiner, L.R., Levinson, S.E., Rosenberg, A.E. and Wilpon, J.G., "Speaker independent recognition of isolated words using clustering techniques", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, 1979.
