A Hybrid Model of MFCC/MSFLA for Speaker Recognition

American Journal of Computer Science and Engineering 2015; 2(5): Published online August 30, 2015

A Hybrid Model of MFCC/MSFLA for Speaker Recognition

Majida Ali Abed 1, Hamid Ali Abed Alasadi 2
1 College of Computers Sciences & Mathematics, University of Tikrit, Tikrit, Iraq
2 Computers Sciences Department, Education for Pure Science College, University of Basra, Basra, Iraq

Email address
majida.ali@tu.edu.iq (M. A. Abed), hamid_alasadi@ieee.org (H. A. A. Alasadi)

To cite this article
Majida Ali Abed, Hamid Ali Abed Alasadi. A Hybrid Model of MFCC/MSFLA for Speaker Recognition. American Journal of Computer Science and Engineering. Vol. 2, No. 5, 2015, pp.

Abstract
In this paper, a speaker recognition system is optimized using a Swarm Intelligence algorithm, the Modified Shuffled Frog Leaping Algorithm (MSFLA), together with cepstral analysis and the Mel Frequency Cepstral Coefficients (MFCC) feature extraction approach. The MSFLA search is applied to speaker recognition: the recognition process is optimized through a fitness function, and voice matching is performed only on the optimized features that the MSFLA selects from the extracted features. Using the hybrid MFCC/MSFLA model on the same dataset, the recognition accuracies under white Gaussian noise, car noise and B-noise are 94.02%, 96.78% and 84.33%, respectively.

Keywords
Speaker Recognition, Mel Frequency Cepstral Coefficients (MFCCs), Modified Shuffled Frog Leaping Algorithm (MSFLA)

1. Introduction
Speaker recognition systems became a topic of research in the early 1970s [1].
Some of the first studies of speaker recognition were published in 1971; the feature extraction techniques they used included pitch contours [2], Linear Prediction (LP), cepstral analysis, linear prediction error energy and autocorrelation coefficients. Current speaker recognition research relies on cepstral analysis, and Mel Frequency Cepstral Coefficients (MFCCs) are the most common short-time feature extraction approach [3]. Speaker recognition covers speaker identification and speaker verification, both based on a person's voice. A speech signal carries information about the spoken message, the speaker and the recording environment. For speaker recognition, speech data from a speaker is collected and used to build a model that captures speaker-specific information. For text-independent speaker recognition, the speech data is usually about one minute long. Speaker models fall into two classes [4]:
(1) Statistical models, such as Gaussian Mixture Models, Hidden Markov Models, Support Vector Machines (SVM) and Vector Quantization (VQ).
(2) Neural network models, such as feed-forward autoassociative networks.
Both classes are now used as classifiers in speaker recognition in combination with evolutionary algorithms, such as genetic algorithms and genetic programming, and with Swarm Intelligence (SI) algorithms, such as Ant Colony Optimization (ACO), Bee Colony Optimization (BCO), Cat Swarm Optimization (CSO), the Shuffled Frog Leaping Algorithm (SFLA) and the Cuckoo Search Algorithm (CSA). In these approaches, the speaker recognition process is optimized through the algorithm's fitness function, and voice matching is performed only on the optimized features produced by the SI algorithm [5, 6]. In this paper we use the Modified Shuffled Frog Leaping Algorithm (MSFLA). The paper is organized as follows. Section 2 discusses the principle of speaker recognition, and Section 3 describes the feature extraction used in this paper.
In Sections 4 and 5, the principle of the MSFLA and the speaker recognition system using the MSFLA are described, respectively. The performance of the recognition system, based on the principles of speaker recognition and the selected features, is evaluated and the results are discussed in Section 6. Section 7 concludes the paper.

2. Speaker Recognition
The speaker recognition task is often divided into two related applications, each of which can be text-independent or text-dependent [7], as shown in Figure (1):
(1) Speaker identification.
(2) Speaker verification.
Speaker identification determines which speaker, out of a set of registered speakers, produced an utterance. When the result is always the best-matching speaker in the set, the task is called closed-set identification; when the result can be either a speaker or a no-match decision, it is called open-set identification. Speaker verification determines whether the voice matches a particular registered speaker; the result is the probability of a match or a similarity measure [8].
Figure (1). The two essential tasks of speaker recognition.

3. Feature Extraction
The Modified Shuffled Frog Leaping Algorithm (MSFLA) works only on the best features, so the features must first be extracted from the voices [9]. Many different speech features have been shown to be indicative of speaker identity, including:
(1) Linear Prediction Cepstral Coefficients (LPCCs).
(2) Maximum Autocorrelation Value (MACV).
(3) Mel Frequency Cepstral Coefficients (MFCCs).
In our research we used MFCCs, which are extracted from the spectrum. We chose this feature because in many applications speaker identification is a precursor to speech recognition, that is, to identifying what is being said, and among the possible features MFCCs have proven to be the most successful and robust for speech recognition [10]. The features are extracted from the input voice, which is represented as a spectrogram showing how its frequency content varies over time.
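The MFCC extraction referred to above can be sketched as follows. This is a minimal NumPy illustration, not the authors' MATLAB implementation: the 8 kHz sampling rate, frame length, hop size, FFT size, 26 mel filters, 13 coefficients and the 0.97 pre-emphasis factor are assumed, commonly used values, and the helper names (`mfcc`, `mel_filterbank`, `dct_ii`) are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Convert mel values back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sr=8000, frame_len=256, hop=128, n_fft=512,
         n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) matrix of MFCCs for a non-empty 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies before analysis.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    if len(signal) < frame_len:  # pad very short inputs to one full frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    # Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Short-time power spectrum: one spectrogram row per frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Mel filterbank energies, log compression, then DCT to decorrelate.
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    return dct_ii(log_energies, n_ceps)
```

Each row of the result is the feature vector of one frame; a single template per utterance could then be formed, for example, by averaging the rows, although the paper does not say how its stored features are summarized.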
Feature extraction based on Fourier-Bessel Cepstral Coefficients (FBCC) has been reported to give improved accuracy and efficiency compared with LPCC and MACV features [11].

4. Modified Shuffled Frog Leaping Algorithm (MSFLA)
The Shuffled Frog Leaping Algorithm (SFLA) and the Modified Shuffled Frog Leaping Algorithm (MSFLA) are recently developed nature-inspired methods [12-16], characterized by strong global search capability and easy implementation. MSFLA combines the advantages of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO); the algorithm is shown in Figure (2).

Figure (2). Modified Shuffled Frog Leaping Algorithm.
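Because the steps of Figure (2) are not spelled out in the text, the following is only a simplified sketch of the basic shuffled frog leaping loop of Eusuff and Lansey [12]: the population is ranked, dealt into memeplexes, and the worst frog of each memeplex leaps toward the memeplex best, then toward the global best, and is otherwise replaced by a random frog. It omits the step-size limit and submemeplex selection of the original SFLA and does not reproduce the authors' MSFLA modifications, which the paper does not describe. The function name `sfla` and all parameter values are assumptions.

```python
import numpy as np


def sfla(fitness, dim, lower, upper, n_memeplexes=5, frogs_per_memeplex=8,
         n_local_steps=10, n_shuffles=50, seed=0):
    """Minimize `fitness` over the box [lower, upper]^dim; return (best, score)."""
    rng = np.random.default_rng(seed)
    n_frogs = n_memeplexes * frogs_per_memeplex
    frogs = rng.uniform(lower, upper, size=(n_frogs, dim))
    scores = np.array([fitness(f) for f in frogs], dtype=float)

    for _ in range(n_shuffles):
        # Shuffle step: rank all frogs best-first (lower score is better).
        order = np.argsort(scores)
        frogs, scores = frogs[order], scores[order]
        global_best = frogs[0].copy()
        for m in range(n_memeplexes):
            # The frog ranked i goes to memeplex i mod n_memeplexes.
            members = np.arange(m, n_frogs, n_memeplexes)
            for _ in range(n_local_steps):
                ranked = members[np.argsort(scores[members])]
                best, worst = ranked[0], ranked[-1]
                # 1) Leap the worst frog toward the memeplex best.
                cand = frogs[worst] + rng.random(dim) * (frogs[best] - frogs[worst])
                cand = np.clip(cand, lower, upper)
                cand_score = fitness(cand)
                if cand_score >= scores[worst]:
                    # 2) No improvement: leap toward the global best instead.
                    cand = frogs[worst] + rng.random(dim) * (global_best - frogs[worst])
                    cand = np.clip(cand, lower, upper)
                    cand_score = fitness(cand)
                if cand_score >= scores[worst]:
                    # 3) Still no improvement: replace it with a random frog.
                    cand = rng.uniform(lower, upper, size=dim)
                    cand_score = fitness(cand)
                frogs[worst], scores[worst] = cand, cand_score
    i = int(np.argmin(scores))
    return frogs[i], float(scores[i])
```

Since only the worst frog of a memeplex is ever replaced, the best score found never gets worse between iterations. To apply such a search to speaker recognition, `fitness` would have to encode the matching objective, for example the recognition error obtained with a candidate feature weighting; that mapping is illustrative only and is not specified in the paper.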

5. Voices Speaker Matching
After the feature extraction stage, we obtain the stored voice features of the speakers, and these must be matched with the features of the input voice. We use the correlation between them: the stored features closest to the features extracted from the input are the ones that are matched. To avoid false voice matches at every stage of our system, especially when the speaker is an impostor, a small threshold value is used to accept a genuine speaker or reject a speaker; this threshold specifies a probability ratio denoting the degree of match in speaker recognition. The voice is then either accepted or rejected. Acceptance means that the speaker is genuine because the voice matched; otherwise the voice is rejected. A match between the input voice and a database voice is declared when the correlation is high; a value below the threshold is rejected, and the speaker is denied access. In our paper, text-dependent speaker recognition is used, in which the enrollment and test phrases are the same [17]. Figure (3) explains the process of text-independent speaker recognition using the Modified Shuffled Frog Leaping Algorithm.
Figure (3). Process of our proposed Speaker Recognition.

6. Simulation and Results
In this section we describe the simulation, which was carried out in MATLAB. The database of our system contains different utterances from 40 speakers, both male and female (examples are shown in Figure (4)), and each speaker uttered 5 different sentences.
Figure (4). Speaker Signal examples (a) Male (b) Female.
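The correlation-and-threshold decision described in Section 5 can be illustrated with a minimal sketch. It assumes each utterance is summarized by one fixed-length feature vector (for instance, a mean MFCC vector), uses the Pearson correlation as the similarity between input and stored features, and picks an arbitrary threshold of 0.9; the function name `match_speaker` and these choices are hypothetical, not taken from the paper.

```python
import numpy as np


def match_speaker(input_features, database, threshold=0.9):
    """Return (speaker_id, correlation), or (None, correlation) if rejected.

    input_features: 1-D feature vector of the test utterance.
    database: dict mapping speaker id -> stored 1-D feature vector (same length).
    """
    best_id, best_corr = None, -np.inf
    for speaker_id, stored in database.items():
        # Pearson correlation between the test vector and the stored template.
        corr = float(np.corrcoef(input_features, stored)[0, 1])
        if corr > best_corr:
            best_id, best_corr = speaker_id, corr
    # A best correlation below the threshold means "no match": reject the speaker.
    if best_corr < threshold:
        return None, best_corr
    return best_id, best_corr


db = {"spk1": np.array([1.0, 2.0, 3.0, 4.0]), "spk2": np.array([4.0, 3.0, 2.0, 1.0])}
print(match_speaker(np.array([1.1, 2.0, 2.9, 4.2]), db))  # accepted as spk1, r about 0.995
print(match_speaker(np.array([1.0, 3.0, 1.0, 3.0]), db))  # rejected, r about 0.447
```

Returning `None` below the threshold corresponds to the open-set "no match" outcome of Section 2; dropping the threshold test would turn this into closed-set identification, which always returns the best-matching speaker.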

The database requires the extracted features of each user to cover different utterances. In our work we use Mel Frequency Cepstral Coefficients (MFCCs), widely used acoustic features in speech recognition systems, for the different speech data. The feature database of the utterances is built with MFCCs to obtain a robust recognizer for different users and to let the MSFLA work efficiently. The MSFLA accesses the extracted features to search for the best match. Different types of noise (white Gaussian noise, car noise and B-noise) are added to the utterances; the features of the noisy signal are extracted, and the MSFLA finds the best match for these features with respect to the feature database and reports the best-matching result. The recognition accuracies obtained using MFCC features with the MSFLA under the various noise conditions, on the same dataset, are shown in Figure (5). The recognition accuracies with added white Gaussian noise, car noise and B-noise are 94.02%, 96.78% and 84.33%, respectively.
Figure (5). Simulation results for different types of noises.

7. Conclusion
Our paper is based on a Swarm Intelligence algorithm, the Modified Shuffled Frog Leaping Algorithm (MSFLA). The aim is to use biometrics to identify an individual by a distinctive characteristic, here the voice. The MSFLA search is applied to speaker recognition: the recognition process is optimized through a fitness function, and voice matching is performed only on the optimized features that the MSFLA selects from the extracted features. The hybrid MFCC/MSFLA model (MFCC features with the MSFLA) gave the best recognition accuracy under the various noise conditions tested.
In further work, treating the hybrid MFCC/MSFLA model as a system reliability optimization problem with a multi-criteria approach could provide useful insights into patterns of interaction among articulatory-acoustic feature dimensions.

References
[1] D. Ververidis, C. Kotropoulos, Gaussian mixture modeling by exploiting the Mahalanobis distance, IEEE Transactions on Signal Processing, Vol. 56, No. 7, July
[2] K. Sri Rama Murty and B. Yegnanarayana, Combining evidence from residual phase and MFCC features for speaker recognition, IEEE Signal Processing Letters, Vol. 13, No. 1, Jan
[3] S. R. M. Prasanna, S. G. Cheedella, B. Yegnanarayana, Extraction of speaker-specific excitation information from linear prediction residual of speech, Speech Communication, Vol. 48, Issue 10, October
[4] S. Chakroborty, A. Roy, S. Majumdar, G. Saha, Capturing Complementary Information via Reversed Filter Bank and Parallel Implementation with MFCC for Improved Text-Independent Speaker Identification, International Conference on Computing Theory and Applications, March
[5] Y. Liu, M. Russell, M. Carey, The Role of Dynamic Features in Text-Dependent and Independent Speaker Verification, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 1, May
[6] E. Elbeltagi, T. Hegazy, and D. Grierson, Comparison among five evolutionary-based optimization algorithms, Advanced Engineering Informatics, Vol. 19, Jan
[7] D. A. Reynolds, Speaker identification and verification using Gaussian mixture models, Speech Communication, Vol. 17, Aug
[8] W. C. Chu, Speech Coding Algorithms, John Wiley & Sons, Vol. 4, USA

[9] S. P. Kishore and B. Yegnanarayana, Speaker verification: Minimizing the channel effects using autoassociative neural network models, in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, Istanbul
[10] M. Shajith Ikbal, Hemant Misra, and B. Yegnanarayana, Analysis of autoassociative mapping neural networks, in Int. Joint Conf. on Neural Networks, Washington, USA
[11] B. Wildermoth and K. K. Paliwal, Use of voicing and pitch information for speaker recognition
[12] M. M. Eusuff and K. E. Lansey, Optimization of water distribution network design using the shuffled frog leaping algorithm, Journal of Water Resources Planning and Management, Vol. 129, No. 3
[13] Taher Niknam, Ehsan Azad Farsani, A hybrid self-adaptive particle swarm optimization and modified shuffled frog leaping algorithm for distribution feeder reconfiguration, Engineering Applications of Artificial Intelligence, 2010.
[14] B. Amiri, M. Fathian, A. Maroosi, Application of shuffled frog-leaping algorithm on clustering, International Journal of Advanced Manufacturing Technology, Vol. 45
[15] X. H. Luo, Y. Yang, and X. Li, Modified shuffled frog-leaping algorithm to solve traveling salesman problem, Journal of Communications, Vol. 30, Jul
[16] A. Khorsandi, A. Alimardani, B. Vahidi, and S. H. Hosseinian, Hybrid shuffled frog leaping algorithm and Nelder-Mead simplex search for optimal reactive power dispatch, IET Generation, Transmission & Distribution, Vol. 5, No. 2
[17] H. B. Kekre, Vaishali Kulkarni, Prashant Gaikar and Nishant Gupta, Speaker Identification using Spectrograms of Varying Frame Sizes, International Journal of Computer Applications, Vol. No. 20, July