A Hybrid Model of MFCC/MSFLA for Speaker Recognition

American Journal of Computer Science and Engineering 2015; 2(5): 32-37
Published online August 30, 2015 (http://www.openscienceonline.com/journal/ajcse)

Majida Ali Abed 1, Hamid Ali Abed Alasadi 2
1 College of Computers Sciences & Mathematics, University of Tikrit, Tikrit, Iraq
2 Computers Sciences Department, Education for Pure Science College, University of Basra, Basra, Iraq

Email address
majida.ali@tu.edu.iq (M. A. Abed), hamid_alasadi@ieee.org (H. A. A. Alasadi)

To cite this article
Majida Ali Abed, Hamid Ali Abed Alasadi. A Hybrid Model of MFCC/MSFLA for Speaker Recognition. American Journal of Computer Science and Engineering. Vol. 2, No. 5, 2015, pp. 32-37.

Abstract
In this paper, a speaker recognition system is optimized using a Swarm Intelligence algorithm, the Modified Shuffled Frog Leaping Algorithm (MSFLA), together with cepstral analysis and the Mel Frequency Cepstral Coefficients (MFCC) feature extraction approach. The MSFLA search is applied to speaker recognition systems and voice. The recognition process is optimized through a fitness function, with voice matching performed only on the optimized features produced by the MSFLA. Using the hybrid MFCC/MSFLA model, the recognition accuracies under different noise conditions (white Gaussian noise, car noise and B-noise) on the same dataset are 94.02%, 96.78% and 84.33%, respectively.

Keywords
Speaker Recognition, Mel Frequency Cepstral Coefficients (MFCCs), Modified Shuffled Frog Leaping Algorithm (MSFLA)

1. Introduction
Speaker recognition systems became a topic of research in the early 1970s [1].
Some of the first studies of speaker recognition were published in 1971; they used feature extraction techniques including pitch contours [2], Linear Prediction (LP), cepstral analysis, linear prediction error energy and autocorrelation coefficients. Current speaker recognition research depends on cepstral analysis, and Mel Frequency Cepstral Coefficients (MFCCs) are the most common short-time features [3]. Speaker recognition comprises speaker identification and speaker verification, both based on the speaker's voice in the form of speech. The speech signal carries information about the spoken message, the speaker and the recording environment. For speaker recognition, speech data from a speaker is collected and used to build a model that captures the speaker-specific information. For text-independent speaker recognition, the speech data is usually about one minute long. Speaker models fall into two groups [4]:
(1) Statistical models, such as the Gaussian Mixture Model, Hidden Markov Model, Support Vector Machines (SVM) and Vector Quantization (VQ).
(2) Neural network models, such as the feed-forward auto-associative network.
These two kinds of model are now used as classification methods in speaker recognition, combined with evolutionary algorithms such as genetic algorithms and genetic programming, and with Swarm Intelligence (SI) algorithms such as Ant Colony Optimization (ACO), Bee Colony Optimization (BCO), Cat Swarm Optimization (CSO), the Shuffled Frog Leaping Algorithm (SFLA) and the Cuckoo Search Algorithm (CSA). The speaker recognition process is optimized by the fitness function of these algorithms, with voice matching performed only on the optimized features produced by the SI algorithm [5, 6]. In this paper we use the Modified Shuffled Frog Leaping Algorithm (MSFLA). The paper is organized as follows. Section 2 discusses the principle of speaker recognition, and Section 3 the feature extraction used in this paper.
In Sections 4 and 5, the principle of MSFLA and the speaker recognition system using the MSFLA are described, respectively. The performance of the recognition systems based on principle of speaker recognition and system features is evaluated, and the

results are discussed in Section 6. Section 7 concludes the paper.

2. Speaker Recognition
The speaker recognition task is often divided into two related applications, each of which can be text-independent or text-dependent [7], as shown in Figure (1):
Speaker Identification.
Speaker Verification.
Speaker identification determines the speaker from a set of registered speakers. When the result is always the best-matching speaker in the set, the task is called closed-set identification; when the result can be either a speaker or a no-match, it is called open-set identification. Speaker verification determines whether a voice matches a particular registered speaker; the result is the probability of a match or a similarity measure [8].
Figure (1). The two essential tasks of speaker recognition.

3. Feature Extraction
The MSFLA works only on the best features, so the features must first be extracted from the voices [9]. Many different speech features have been shown to be indicative of speaker identity. These include:
Linear Prediction Cepstral Coefficients (LPCCs).
Maximum Autocorrelation Value (MACV).
Mel Frequency Cepstral Coefficients (MFCCs).
In our research we used the MFCCs, extracted from the spectrum. The reason for using this feature is that in many applications speaker identification is a precursor to speech recognition, which identifies what is being said. Among the possible features, MFCCs have proved to be the most successful and robust features for speech recognition [10]. The features are extracted from the input voice, which takes the form of a spectrogram showing the frequency content over time.
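The MFCC extraction step can be illustrated with a minimal NumPy sketch. The paper does not give its extraction parameters, so everything numeric here (8 kHz sampling rate, 25 ms frames, 10 ms hop, 20 mel filters, 12 coefficients, 0.97 pre-emphasis) is an assumption chosen for illustration, and the function names are hypothetical. It follows the common textbook pipeline (pre-emphasis, framing, windowing, power spectrum, mel filter bank, logarithm, DCT) rather than the authors' MATLAB code.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centres are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=200, hop=80,
         n_fft=256, n_filters=20, n_ceps=12):
    # All default values are illustrative assumptions, not taken from the paper.
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts the high-frequency part of the spectrum.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft points).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log energy in each mel band.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # DCT-II of the log energies; coefficient 0 is skipped.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filters)
    return log_energies @ basis.T

if __name__ == "__main__":
    t = np.arange(8000) / 8000.0
    features = mfcc(np.sin(2 * np.pi * 440.0 * t))
    print(features.shape)
```

Each row of the returned array is the feature vector of one frame; a per-utterance template could be formed, for example, by averaging the rows, although the paper does not say how its templates are built.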
Fourier-Bessel Cepstral Coefficient (FBCC) based feature extraction shows improved accuracy and efficiency compared with LPCC and MACV feature extraction [11].

4. Modified Shuffled Frog Leaping Algorithm (MSFLA)
The Shuffled Frog Leaping Algorithm (SFLA) and its modified variant, the Modified Shuffled Frog Leaping Algorithm (MSFLA), are recently developed nature-inspired methods [12-16], characterized by strong global search capability and ease of implementation. The MSFLA combines the advantages of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), as shown in Figure (2).

Figure (2). Modified Shuffled Frog Leaping Algorithm.
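Since the flowchart of Figure (2) is not reproduced in this transcription, the sketch below shows only the standard SFLA loop as it is commonly described in the literature, not the authors' modification: frogs are sorted by fitness, dealt into memeplexes, the worst frog of each memeplex leaps toward the memeplex best (then toward the global best, then to a random position if neither leap improves it), and the memeplexes are reshuffled. The function name, population sizes, bounds and the sphere test function are all assumptions made for illustration.

```python
import numpy as np

def sfla_minimize(fitness, dim, bounds=(-5.0, 5.0), n_memeplexes=5,
                  frogs_per_memeplex=6, local_steps=10, n_shuffles=50, seed=0):
    # Standard SFLA sketch; all default values are illustrative assumptions.
    rng = np.random.default_rng(seed)
    low, high = bounds
    n_frogs = n_memeplexes * frogs_per_memeplex
    frogs = rng.uniform(low, high, size=(n_frogs, dim))
    scores = np.array([fitness(f) for f in frogs])
    for _ in range(n_shuffles):
        # Shuffle step: sort all frogs, best first.
        order = np.argsort(scores)
        frogs, scores = frogs[order], scores[order]
        global_best = frogs[0].copy()
        for m in range(n_memeplexes):
            # Frog i of the sorted population goes to memeplex i mod n_memeplexes.
            members = np.arange(m, n_frogs, n_memeplexes)
            for _ in range(local_steps):
                local = members[np.argsort(scores[members])]
                best, worst = local[0], local[-1]
                # Leap toward the memeplex best, then toward the global best.
                for target in (frogs[best], global_best):
                    candidate = frogs[worst] + rng.random() * (target - frogs[worst])
                    candidate = np.clip(candidate, low, high)
                    cand_score = fitness(candidate)
                    if cand_score < scores[worst]:
                        break
                else:
                    # Neither leap improved the worst frog: replace it randomly.
                    candidate = rng.uniform(low, high, size=dim)
                    cand_score = fitness(candidate)
                frogs[worst], scores[worst] = candidate, cand_score
    best = int(np.argmin(scores))
    return frogs[best], float(scores[best])

if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))
    position, value = sfla_minimize(sphere, dim=3)
    print(position, value)
```

In a speaker recognition setting the fitness function would score how well a candidate set of features matches the stored templates; the sphere function is used here only because the paper does not define its fitness function.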

5. Voices Speaker Matching
After the feature extraction stage we have the stored extracted voice features of the speakers, and these must be matched against the features of the input voice. We use the similarity (correlation) between them: the stored features closest to the extracted features are taken as the match. To avoid false voice matches at any stage of the system, especially when the speaker is illegitimate, a small threshold value is used to accept or reject a speaker; it specifies a probability ratio that indicates the degree of match. The voice is then either accepted or rejected. Acceptance means that the speaker is legitimate because the voice matched; otherwise the voice is rejected. A match between the input voice and a database voice is declared when their similarity is high; a value below the threshold is rejected, and the speaker is denied access. In this paper text-dependent speaker recognition is used, in which the enrollment and test pass-phrases are the same [17]. Figure (3) shows the process of the proposed speaker recognition using the Modified Shuffled Frog Leaping Algorithm.
Figure (3). Process of our proposed Speaker Recognition.

6. Simulation and Results
In this section we describe the simulation, carried out in MATLAB, and discuss its results. The database of our system contains different utterances from 40 speakers, both male and female (examples are shown in Figure (4)), and each speaker uttered 5 different sentences.
Figure (4). Speaker signal examples: (a) male, (b) female.
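The accept/reject decision of the voice matching stage can be sketched as follows. This is a minimal illustration, assuming that the similarity measure is the Pearson correlation between one input feature vector and one stored template per speaker; the threshold of 0.8, the function name and the toy vectors are hypothetical, since the paper does not state them.

```python
import numpy as np

def match_speaker(input_features, enrolled, threshold=0.8):
    # Return (speaker, score) for the best match, or (None, score) if rejected.
    # `enrolled` maps a speaker name to a stored feature template.
    best_name, best_score = None, -1.0
    for name, template in enrolled.items():
        # Pearson correlation between the input and the stored template.
        score = float(np.corrcoef(input_features, template)[0, 1])
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score  # accepted: the voice matched
    return None, best_score           # rejected: similarity below threshold

if __name__ == "__main__":
    enrolled = {
        "speaker_a": np.array([1.0, 2.0, 3.0, 4.0]),
        "speaker_b": np.array([4.0, 1.0, 3.0, 2.0]),
    }
    print(match_speaker(np.array([1.1, 2.0, 2.9, 4.2]), enrolled))
    print(match_speaker(np.array([4.0, 3.0, 2.0, 1.0]), enrolled))
```

Returning the best score even on rejection makes it easy to tune the threshold: raising it rejects more impostors at the cost of rejecting more genuine speakers.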

The database must hold extracted features that are representative of each user's different utterances. MFCCs are popular acoustic features used in speech recognition systems for different speech data. The feature database of the utterances is built using MFCCs, both to make a robust speech recognizer for different users and to let the MSFLA work efficiently. The MSFLA accesses the extracted features to search for the best match. Different types of noise (white Gaussian noise, car noise and B-noise) are added to the utterances; the features of the noisy signal are extracted, and the MSFLA finds the best match for these features with respect to the feature database and reports it. Recognition accuracy was found to be best using MFCC features with the MSFLA for various noise conditions on the same dataset, as shown in Figure (5). The recognition accuracies for added white Gaussian noise, car noise and B-noise are 94.02%, 96.78% and 84.33%, respectively.
Figure (5). Simulation results for different types of noises.

7. Conclusion
This paper is based on a Swarm Intelligence algorithm, the Modified Shuffled Frog Leaping Algorithm (MSFLA). The aim of using biometrics is to identify an individual by some distinctive characteristic, such as the voice. The MSFLA search has been applied to speaker recognition systems and voice. By applying this algorithm, the speaker recognition process is optimized through a fitness function, with voice matching performed only on the optimized features produced by the MSFLA. Recognition accuracy was found to be best using a hybrid model of MFCC/MSFLA (MFCC features with MSFLA) for various noise conditions.
This work treats the hybrid MFCC/MSFLA model as a system reliability optimization problem with a multi-criteria approach; further work may provide useful insights into patterns of interaction among articulatory-acoustic feature dimensions.

References
[1] D. Ververidis, C. Kotropoulos, Gaussian mixture modeling by exploiting the Mahalanobis distance, IEEE Transactions on Signal Processing, Vol. 56, No. 7, July 2008.
[2] K. Sri Rama Murty and B. Yegnanarayana, Combining evidence from residual phase and MFCC features for speaker recognition, IEEE Signal Processing Letters, Vol. 13, No. 1, Jan. 2006.
[3] S.R.M. Prasanna, S.G. Cheedella, B. Yegnanarayana, Extraction of speaker-specific excitation information from linear prediction residual of speech, Speech Communication, Vol. 48, Issue 10, October 2006.
[4] S. Chakroborty, A. Roy, S. Majumdar, G. Saha, Capturing Complementary Information via Reversed Filter Bank and Parallel Implementation with MFCC for Improved Text-Independent Speaker Identification, International Conference on Computing: Theory and Applications, March 2007.
[5] Y. Liu, M. Russell, M. Carey, The Role of Dynamic Features in Text-Dependent and Independent Speaker Verification, IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP), Vol. 1, May 2006.
[6] E. Elbeltagi, T. Hegazy, and D. Grierson, Comparison among five evolutionary based optimization algorithms, Advanced Engineering Informatics, Vol. 19, Jan. 2005.
[7] D. A. Reynolds, Speaker identification and verification using Gaussian mixture models, Speech Comm., Vol. 17, Aug. 1995.
[8] Chu, W. C., "Speech Coding Algorithms", John Wiley & Sons, Vol.4, USA. 2003.

[9] S. P. Kishore and B. Yegnanarayana, Speaker verification: Minimizing the channel effects using auto associative neural network models, in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, Istanbul, 2000.
[10] M. Shajith Ikbal, Hemant Misra, and B. Yegnanarayana, Analysis of auto associative mapping neural networks, in Int. Joint Conf. on Neural Networks, Washington, USA, 1999.
[11] B. Wildermoth and K. K. Paliwal, Use of voicing and pitch information for speaker recognition, in Use of Voicing and Pitch Information for Speaker Recognition, 2000.
[12] Eusuff, M.M. and Lansey, K.E., Optimization of water distribution network design using the shuffled frog leaping algorithm, Journal of Water Resources Planning and Management, Vol. 129, No. 3, 2003.
[13] Taher Niknam, Ehsan Azad Farsani, A hybrid self-adaptive particle swarm optimization and modified shuffled frog leaping algorithm for distribution feeder reconfiguration, Engineering Applications of Artificial Intelligence, 2010.
[14] B. Amiri, M. Fathian, A. Maroosi, Application of shuffled frog-leaping algorithm on clustering, Journal of International Advanced Manufacturing Technology, Vol. 45, 2009.
[15] X. H. Luo, Y. Yang, and X. Li, Modified shuffled frog-leaping algorithm to solve traveling salesman problem, Journal of Communications, Vol. 30, Jul. 2009.
[16] A. Khorsandi, A. Alimardani, B. Vahidi, and S.H. Hosseinian, Hybrid shuffled frog leaping algorithm and Nelder-Mead simplex search for optimal reactive power dispatch, IET Generation Transmission & Distribution, Vol. 5, 2, 2011.
[17] H.B. Kekre, Vaishali Kulkarni, Prashant Gaikar and Nishant Gupta, Speaker Identification using Spectrograms of Varying Frame Sizes, International Journal of Computer Applications, Vol. 50, No. 20, July 2012.