Speaker Recognition Using DWT-MFCC with Multi-SVM Classifier

SWATHY M.S / PG Scholar
Dept. of ECE, Thejus Engineering College
Thrissur, India

MAHESH K.R / Assistant Professor
Dept. of ECE, Thejus Engineering College
Thrissur, India

Abstract
This paper describes a hybrid technique for speaker recognition. Speaker recognition is the process of identifying a person based on characteristics of the speech wave such as pitch, tone and cepstral coefficients. Here the DWT and MFCC techniques are employed for feature extraction. A combination of two or more techniques is called a hybrid technique. The DWT divides the speech signal into different frequency bands. A multi-class SVM is used for classification.

Keywords
Feature Extraction; DWT; Mel frequency; MFCC; Multi-class SVM.

I. INTRODUCTION

Speaker recognition is the process of automatically identifying a speaker with the help of a machine, based on feature vectors obtained from the speech signal. Speech is a common phenomenon in human beings. The lungs, vocal cords, tongue, jaw, lips, teeth and vocal tract are the main speech production organs in the human system. Speech is a complicated signal because of its non-stationary nature: the characteristics of the speech signal vary with time. We therefore divide the speech signal into a number of short frames for simpler analysis.

Feature extraction and classification are the two major steps in speaker recognition. Feature extraction is done using DWT-based MFCC, where the DWT is a sub-band coding technique, and classification is done using a multi-class SVM. The main steps involved in this speaker recognition system are:
1) Create a database (a collection of voice samples in wav format).
2) Feature extraction.
3) Training.
4) Testing.
After testing with the multi-class SVM, we can identify the speaker.

II.
LITERATURE SURVEY

Nowadays speech recognition has great importance due to its increasing applications. Many techniques for speaker recognition have been developed in the past decades. Feature extraction and feature matching are the two main stages in speaker recognition.

One of the early methods used for feature extraction is LPC (Linear Predictive Coding) [6]. LPC predicts the present sample value from a linear combination of the m previous values. However, it does not distinguish similar vowel sounds and is affected by signal degradation. LPCC (Linear Predictive Cepstral Coefficients) [6] was suggested as an extension of the LPC method, but it is an all-pole model. Recent studies introduce other techniques called MFCC (Mel Frequency Cepstral Coefficients) and WT (Wavelet Transform) [11][7]. Both techniques are popular nowadays. MFCC is popular because it approximates the human auditory system more accurately. WT provides time-frequency localization and is also used for denoising.

Similarly, many techniques have been developed for feature matching (classification). DTW (Dynamic Time Warping) [6] is one of the earlier methods for classification. VQ (Vector Quantization) [8] is one of the simple methods used for feature matching; however, its encoding is complex. Other methods for feature matching are HMM (Hidden Markov Model), GMM (Gaussian Mixture Model) and SVM (Support Vector Machine) [6]. HMM is computationally more complex. SVM is simple to operate and is a supervised learning algorithm.

ISSN: 2348 8549 www.internationaljournalssrg.org Page 47
III. PROPOSED METHOD

We collect speech samples from a number of speakers; every speaker has their own specific features. Features are extracted from each voice sample in the database using DWT-based MFCC. Applying MFCC to the voice samples yields features in the form of cepstral coefficients. We then train the SVM using these cepstral coefficients. After training comes the testing stage, which is also carried out with the SVM. The code is written in MATLAB.

Fig 1: Proposed system

A. Data Base Builder

The initial step of speaker recognition is to save speech signals from different speakers. Here we collect two or three audio samples from each speaker for better accuracy. The audio is recorded using MATLAB.

Fig 2: Example of speech signal

B. Feature Extraction

The speech signal is non-stationary in nature. It contains plenty of information, but this information cannot be obtained readily. Feature extraction techniques are used for extracting features. Here DWT-based MFCC is used for feature extraction. The DWT (Discrete Wavelet Transform) is an example of a sub-band coding technique. Sub-band coding divides the speech samples into different frequency bands, and it is the basic principle of the wavelet transform.

First, we divide the speech signal into two bands, called the high frequency band and the low frequency band. This division is called decomposition. The low frequency band contains the characteristics of the signal and the high frequency band contains the noise part of the signal. Typically, the low frequency band contains the useful information. The wavelet transform is one of the sub-band coding techniques; here we use the discrete wavelet transform. The number of decomposition levels is chosen as a power of 2. After each decomposition the speech signal becomes finer in nature, so a very high decomposition level cannot be applied; a medium level should be used. Here two-level decomposition is applied. After applying the DWT, we pass this fine speech signal to the MFCC stage.

DWT

The discrete wavelet transform comes under sub-band coding. Wavelets are finite-length waves. The two important operations in the wavelet transform (WT) are scaling and shifting.

Fig 3: Decomposition tree of DWT [19]

Here f is the speech signal, and G and H are the high-pass and low-pass filters respectively. After every decomposition we get the approximation and detail coefficients. The approximation component is the low frequency component and the detail component is the high frequency component. We take the approximation component for further steps. Thus the DWT removes noise at every decomposition. After noise removal, the low frequency component is applied to MFCC.
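The two-level decomposition described above can be sketched in a few lines. This is only an illustration, not the MATLAB code used in this work: the paper does not name the wavelet, so the Haar wavelet is assumed here because its low-pass (H) and high-pass (G) filters reduce to a scaled sum and difference of neighbouring samples. The function names `haar_dwt_step` and `dwt_approximation` are hypothetical.

```python
import math


def haar_dwt_step(signal):
    """One DWT level with the (assumed) Haar wavelet.

    Returns (approximation, detail): the low-pass and high-pass outputs,
    each downsampled by two.
    """
    samples = list(signal)
    if len(samples) % 2:
        samples.append(samples[-1])  # pad odd-length input by repeating the last sample
    scale = 1.0 / math.sqrt(2.0)
    approx = [(samples[i] + samples[i + 1]) * scale for i in range(0, len(samples), 2)]
    detail = [(samples[i] - samples[i + 1]) * scale for i in range(0, len(samples), 2)]
    return approx, detail


def dwt_approximation(signal, levels=2):
    """Repeatedly decompose the approximation band, as in the decomposition tree.

    Returns the final approximation and the list of detail bands (one per level).
    Only the approximation would be passed on to the MFCC stage.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return approx, details


# Toy input: a constant signal has no high-frequency content,
# so every detail coefficient is zero.
toy_signal = [4.0] * 8
toy_approx, toy_details = dwt_approximation(toy_signal, levels=2)
```

Each level halves the number of samples, so after two levels the approximation band holds about one quarter of the original samples.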
Fig 4: DWT decomposition of speech signal

MFCC

MFCC stands for Mel Frequency Cepstral Coefficients. MFCC simplifies the task of feature extraction; after applying MFCC we obtain the features in the form of cepstral coefficients. We usually take 20-40 cepstral coefficients. Human perception of the frequency content of speech is linear up to 1000 Hz and logarithmic above that [21]. Accordingly, the MFCC filter bank uses two types of filter spacing: linear at low frequencies below 1000 Hz and logarithmic above 1000 Hz. In this way MFCC approximates the human auditory response accurately.

The main steps involved in MFCC are:
1) Pre-emphasis
2) Framing
3) Windowing
4) FFT
5) Mel filter bank processing
6) DCT

Fig 5: MFCC features

Pre-emphasis is mainly used to boost the energy of the high frequency components. Speech is a non-stationary signal, that is, it varies with time. We therefore divide the speech signal into small frames, on the assumption that within each such interval the speech signal is stationary. The width of a frame is generally about 30 ms, with an overlap of about 20 ms. Windowing is employed for smoothing the speech signal at the frame edges. After smoothing, the signal is passed to the FFT stage, where each time-domain frame is converted into the frequency domain. In Mel filter bank processing, each FFT magnitude coefficient is multiplied by the corresponding filter gain, the products are accumulated per filter, and the logarithm of each filter bank energy is taken. The Mel frequency equation is given below:

$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \quad (1)$

The final step is to compute the DCT of the log filter bank energies. After applying MFCC we get more information about lower frequencies than about higher frequencies, because of the Mel scale. Finally, we obtain the features in the form of cepstral coefficients.

Fig 5: MFCC flowchart [21]

Fig 6: Mel Frequency Filter Bank [22]
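The early MFCC stages and equation (1) can be illustrated with a short sketch. It is not the MATLAB implementation used in this work. The pre-emphasis coefficient 0.97, the Hamming window, the 8 kHz sampling rate and all function names are assumptions chosen for illustration; only the Mel formula and the 30 ms frame width with 20 ms overlap come from the text above.

```python
import math


def hz_to_mel(f_hz):
    """Equation (1): Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of equation (1)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, f_low_hz, f_high_hz):
    """Centre frequencies (Hz) of filters spaced uniformly on the Mel scale."""
    mel_low, mel_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(num_filters)]


def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed, commonly used value."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_and_window(signal, sample_rate, frame_ms=30, overlap_ms=20):
    """Split the signal into overlapping frames and apply a Hamming window (assumed)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * (frame_ms - overlap_ms) / 1000)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


# One second of a 200 Hz tone at an assumed 8 kHz sampling rate.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(sample_rate)]
frames = frame_and_window(pre_emphasis(tone), sample_rate)
centers = mel_center_frequencies(20, 0.0, sample_rate / 2)
```

Because the filters are equally spaced on the Mel scale, their centre frequencies in Hz are close together at low frequencies and spread out at high frequencies, which is why MFCC carries more information about the lower part of the spectrum.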
C. Classification

The classifier is used for feature matching, that is, classification. After classification we can distinguish authorized from unauthorized persons.

SVM

SVM stands for Support Vector Machine. Here the SVM is used as the classifier, and the training and testing stages both belong to the classifier section. First, we train the SVM classifier using the features obtained from the MFCC stage. Here we choose a multi-class SVM for recognizing a particular speaker from a group of speakers. The goal of the support vector machine is to find the optimal separating hyperplane, which maximizes the margin of the training data. SVM is a supervised learning classifier and generally offers high accuracy.

In a multi-class SVM, we divide the multi-class problem into several binary sub-problems and build a standard SVM for each. Two main algorithms are used:
1) One against all
2) One against one
In one against all, one SVM is built for every class in order to distinguish that class from all remaining classes. In one against one, one SVM is built for every pair of classes. If we have n classes, then n(n-1)/2 SVMs are trained to distinguish the speech samples. In this project the one-against-one approach is used.

For testing, we take a speech sample of the speaker to be recognized and extract its features. These features are applied to the SVM as input and compared with the trained feature set to recognize the speaker.

IV. RESULT

The speaker recognition system is developed using MATLAB. The results of the system are shown in the following screenshots. In this project the SVM works in two phases, a training phase and a testing phase. In the training phase a database is created and kept as a reference. In the testing phase the speaker is recognized.

Fig 7: SVM classification

Fig 8: Recognized a particular speaker

Consider n = 3, that is, three classes (speakers A, B and C). The processing pairs are then A-B, A-C and B-C.
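The one-against-one scheme for three speakers can be sketched as follows. This is an illustration only. The pairwise decision functions are hypothetical stand-ins (a nearest-centre rule on a single made-up feature value), not trained SVMs, and combining the pairwise decisions by majority vote is an assumption, since the text does not state how the decisions are combined.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs; for n classes there are n * (n - 1) / 2 of them."""
    return list(combinations(classes, 2))


def make_nearest_center_classifier(centers, a, b):
    """Hypothetical stand-in for a trained binary SVM separating class a from class b."""
    def classify(sample):
        return a if abs(sample - centers[a]) <= abs(sample - centers[b]) else b
    return classify


def predict_one_vs_one(sample, pairwise_classifiers):
    """Each pairwise classifier casts one vote; the class with the most votes wins.

    Ties are resolved by first-seen order in this sketch.
    """
    votes = Counter(clf(sample) for clf in pairwise_classifiers.values())
    return votes.most_common(1)[0][0]


# Three speakers, each summarised by one made-up feature value.
speakers = ["A", "B", "C"]
centers = {"A": 1.0, "B": 5.0, "C": 9.0}
pairs = one_vs_one_pairs(speakers)
classifiers = {pair: make_nearest_center_classifier(centers, *pair) for pair in pairs}
predicted = predict_one_vs_one(4.6, classifiers)
```

For the sample value 4.6, the A-B and B-C classifiers both vote for B while the A-C classifier votes for A, so B is selected with two of the three votes.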
In the one-against-one algorithm, n(n-1)/2 = 3(3-1)/2 = 3 SVMs are needed.

Fig 9: Recognized an unknown speaker

V. CONCLUSION AND FUTURE SCOPE

This project has proposed a speaker recognition system using DWT-based MFCC with a multi-SVM classifier. The study shows that we are able to recognize a particular speaker from a number of speakers. DWT-based MFCC gives better performance in feature extraction and high noise reduction in the given speech signal. A binary SVM is not sufficient for recognizing a speaker from a large number of speakers, so here we use a multi-SVM for classification. In future, combinations of techniques (MFCC, LPC, LPCC) can be used for feature extraction. Hybrid techniques consistently provide improved results. In future we can also extend this project to recognize the speaker even if one person imitates another.
References

[1] Supriya Tripathi, "Speaker Recognition," IEEE Xplore, 2012.
[2] S. K. Singh, Prof. P. C. Pandey, "Features and Techniques for Speaker Recognition," IIT Bombay.
[3] Harish Chander Mahendru, "Quick Review of Human Speech Production Mechanism," ISSN, Volume 9, January 2014.
[4] Masaaki Honda, "Speech Production Mechanisms," 2013.
[5] Harald Hoge, Siemens AG, "Basic Parameters of Speech Signal Analysis."
[6] Kirandeep Kaur, Neelu Jain, "Feature Extraction and Classification for Automatic Speech Recognition System," ISSN, Volume 5, January 2015.
[7] Rekha Hibare, Anup Vibhute, "Feature Extraction in Speech Processing: A Survey," IJCA, November 2014.
[8] Shubhangi S. Jarande, Prof. Surendra Waghmare, "A Survey on Different Classifier in Speech Recognition Techniques," IJETAE, March 2014.
[9] Kirandeep Kaur, Neelu Jain, "Feature Extraction and Classification for Automatic Speaker Recognition System: A Review," ISSN, January 2015.
[10] Umer Malik, P. K. Mishra, "Automatic Speaker Recognition Using SVM," IJSR, 2013.
[11] Shreya Narang, Ms. Divya Gupta, "Speech Feature Extraction Techniques: A Review," IJCSMC, March 2015.
[12] S. B. Dhonde, S. M. Jagade, "Feature Extraction Techniques in Speaker Recognition: A Review," IJRMEE, May 2015.
[13] Umer Malik, P. K. Mishra, "Automatic Speaker Recognition Using SVM," IJSR, 2013.
[14] Shreya Narang, Ms. Divya Gupta, "Speech Feature Extraction Techniques: A Review."
[15] Alfredo Maesa, Fabio Garzia, "Text Independent Automatic Speaker Recognition System Using Mel-Frequency Cepstrum Coefficient and Gaussian Mixture Models," Journal of jis.2012.34041.
[16] Md. Rashidul Hasan, Mustafa Jamil, "Speaker Identification Using Mel Frequency Cepstral Coefficients," ICECE, 2004.
[17] Roma Bharti, Manav Rachna, "Real Time Speaker Recognition System Using MFCC and Vector Quantization," IJCA, May 2015.
[18] Aamir Khan, Muhammad Farhan, Asar Ali, "Speech Recognition: Increasing Efficiency of Support Vector Machines," IJCA, Volume 35, No. 7, December 2011.
[19] K. Deepak, Rishi, "Speaker Recognition Using Support Vector Machines," ISSN, Issue 2, Feb. 2014.
[20] Shanthini Pandiaraj, K. R. Shankar Kumar, "Speaker Identification Using Discrete Wavelet Transform," Journal of Computer Science, 2014.
[21] Lindasalwa Muda, Mumtaj Begam, I. Elamvazuthi, "Voice Recognition Algorithms Using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques," Journal of Computing, Volume 2, Issue 3, March 2010, ISSN 2151-9617.
[22] Sayf A. Majeed, Hafizah Husain, Salina Abdul Samad, Tariq F. Idbeaa, "Mel Frequency Cepstral Coefficients (MFCC) Feature Extraction Enhancement in the Application of Speech Recognition: A Comparison Study," 2005-2015 JATIT & LLS.