MFCC Based Speaker Recognition using MATLAB

KAVITA YADAV 1, MORESH MUKHEDKAR 2

1 PG Student, Department of Electronics and Telecommunication, Dr. D.Y. Patil College of Engineering, University of Pune, Ambi, Talegaon, Pune, India
2 Assistant Professor, Department of Electronics and Telecommunication, Dr. D.Y. Patil College of Engineering, University of Pune, Ambi, Talegaon, Pune, India

1 kavitasyadav@gmail.com, 2 moresh.mukhedkar@gmail.com

ABSTRACT

Speech is a natural and efficient way to communicate with people as well as with machines, so it plays a vital role in signal processing. This paper describes how a speaker recognition model using MFCC and VQ was designed, built and tested for male and female voices. The cepstral method is used to find the pitch of the speaker and, from the pitch, the gender of the speaker. Voice signals of male and female speakers were recorded at a 16 kHz sampling frequency, and the resulting wav files were processed in MATLAB to compute the pitch of each voice. Because of its high accuracy, the MFCC algorithm is used for feature extraction, VQ is used for feature matching, and the Euclidean distance is used to measure the distance between speakers.

Keywords: MFCC, VQ, pitch, Euclidean distance, cepstral method

1. INTRODUCTION

Speaker recognition is the automatic process of identifying an unknown speaker from an input speech signal. Like speech recognition, speaker recognition plays an important role in signal processing. Speaker recognition systems fall into two categories, speaker identification and speaker verification. In speaker identification, the unknown speaker is identified from a given set of speakers using a best-matching technique. In speaker verification, the identity of the unknown speaker is compared against the set of speakers whose identity is claimed, and the speaker is accepted or rejected accordingly. Based on the dependency on the spoken text, the task is further divided into text-dependent and text-independent recognition. A speaker recognition system uses two main modules, feature extraction and feature matching, which can be selected according to the application. A pitch detection algorithm (PDA) is a set of steps used to detect the pitch of a speech signal; here the cepstral method is used to find the pitch and, from it, the gender of the speaker. This project concentrates on the text-dependent speaker identification task. MATLAB is used for programming because of its built-in frequency-domain analysis and simple programming interface.

2. FEATURE EXTRACTION

This module converts the speech signal into a set of feature vectors, i.e. it reduces the dimensionality of the input speech signal. Different methods are used for feature extraction, such as MFCC, PLP and LPC; in this project MFCC is used because of its high accuracy. Mel-frequency cepstral coefficients are a representation of the short-term power spectrum of a sound, based on the linear cosine transform of the log power spectrum on a nonlinear mel scale of frequency. The block diagram of MFCC is shown in Fig. 1.

Fig. 1. Block diagram of MFCC

The Mel-frequency cepstrum coefficient (MFCC) technique is often used to create a fingerprint of the sound files. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency, with filters spaced linearly at low frequencies and logarithmically at high frequencies to capture the important characteristics of speech.
The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The mel value for a given frequency f (in Hz) is computed as

mel(f) = 2595 * log10(1 + f/700)     (1)

Fig. 1 shows the steps involved in computing the MFCCs. The continuous speech signal coming from the microphone is processed over short periods of time: it is divided into frames, each overlapping the previous one so that transitions between frames are captured cleanly. In the second step a Hamming window is applied to each overlapping frame to reduce the distortion at the frame edges. After windowing, the FFT converts each frame from the time domain to the frequency domain. In the mel-frequency wrapping stage, the spectrum of each frame is passed through a mel-scale band-pass filter bank to mimic the human ear. In the final stage the signal is converted back to the time domain; instead of an inverse FFT, the Discrete Cosine Transform (DCT) is used because it is more appropriate here [5].

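As a concrete illustration of these steps, the minimal MATLAB sketch below computes MFCCs from a recorded wav file. It is a sketch under stated assumptions rather than the paper's implementation (the authors rely on the voicebox function melcepst): the file name, frame length, hop size, number of filters and number of coefficients are illustrative, and the buffer, hamming and dct functions from the Signal Processing Toolbox are assumed to be available.

% Minimal MFCC sketch (illustrative parameters; assumes a mono recording and
% the Signal Processing Toolbox functions buffer, hamming and dct).
[x, fs] = audioread('speaker1.wav');          % e.g. a 16 kHz recording
frameLen = round(0.025*fs);                   % 25 ms frames
hopLen   = round(0.010*fs);                   % 10 ms hop, so frames overlap
numFilt  = 26;                                % mel filterbank channels
numCeps  = 13;                                % cepstral coefficients kept

% Framing and Hamming windowing
frames = buffer(x, frameLen, frameLen - hopLen, 'nodelay');   % one frame per column
frames = frames .* repmat(hamming(frameLen), 1, size(frames, 2));

% FFT: time domain to frequency domain (power spectrum of each frame)
NFFT  = 2^nextpow2(frameLen);
pspec = abs(fft(frames, NFFT)).^2;
pspec = pspec(1:NFFT/2 + 1, :);

% Mel-frequency wrapping: triangular filters equally spaced on the mel scale, Eq. (1)
melMax = 2595*log10(1 + (fs/2)/700);
hzPts  = 700*(10.^(linspace(0, melMax, numFilt + 2)/2595) - 1);
bins   = floor((NFFT + 1)*hzPts/fs) + 1;
H = zeros(numFilt, NFFT/2 + 1);
for m = 2:numFilt + 1
    H(m-1, bins(m-1):bins(m)) = linspace(0, 1, bins(m) - bins(m-1) + 1);
    H(m-1, bins(m):bins(m+1)) = linspace(1, 0, bins(m+1) - bins(m) + 1);
end

% DCT of the log filterbank energies gives the MFCCs (one column per frame)
mfcc = dct(log(H*pspec + eps));
mfcc = mfcc(1:numCeps, :);

The resulting mfcc matrix (one column of coefficients per frame) is the kind of feature set that the matching stage of Section 3 operates on.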
3. FEATURE MATCHING

Once the fingerprint of a speech signal, i.e. its set of feature vectors, has been created, it is stored in a database for that speaker. When the speech file of an unknown speaker is loaded into MATLAB, its fingerprint is created in the same way and compared against the vectors already present in the database using the Euclidean distance, and the best-matching speaker is identified. This process is called feature matching. Various methods can be used to match the extracted features against the stored voices, such as Dynamic Time Warping (DTW), Vector Quantization (VQ) and Gaussian Mixture Modelling (GMM); in this project Vector Quantization is used.

3.1. Vector Quantization

A speaker recognition system must be able to model the distribution of the estimated feature vectors. Since it is impractical to store every feature vector, the vectors are quantized into a small set of template vectors, i.e. vector quantization. VQ is a process that takes a large set of feature vectors and produces a small set of vectors representing the centroids of the distribution. These centroids form a codebook for each speaker. In the recognition phase, the data from the unknown speaker is compared against the codebook of each speaker and the distortion is estimated; the recognition decision is made from this distortion. Algorithms used for codebook generation include the K-means algorithm, the LBG algorithm, SOM and PNN.

Fig. 2. Codewords in 2-dimensional space

3.2. K-means algorithm

The K-means algorithm clusters the training feature vectors into k partitions. Its objective is to minimize the total intra-cluster variance V,

V = Σ_{i=1}^{k} Σ_{x_j ∈ S_i} ||x_j − μ_i||²     (2)

where S_i is the set of vectors assigned to cluster i and μ_i is its centroid. The algorithm uses a least-squares partitioning method: the input vectors are split into k initial sets, the mean point of each set is computed, a new partition is built by associating each vector with its nearest centroid, and the centroids are then recomputed for the new clusters. These steps are repeated until no vector switches clusters.
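The sketch below illustrates codebook training along these lines; it is an assumption-laden sketch, not the paper's code. It assumes kmeans from the Statistics and Machine Learning Toolbox and an illustrative cell array trainMFCC in which trainMFCC{s} holds speaker s's MFCC vectors with one row per frame (i.e. the transpose of the mfcc matrix from the Section 2 sketch).

% Codebook training sketch (assumes kmeans from the Statistics and Machine
% Learning Toolbox; trainMFCC{s} is a numFrames x numCeps matrix for speaker s).
K = 16;                                       % codebook size (illustrative)
numSpeakers = numel(trainMFCC);
codebook = cell(1, numSpeakers);
for s = 1:numSpeakers
    % K-means clusters the feature vectors into K groups and returns the
    % K centroids, which serve as this speaker's codebook.
    [~, codebook{s}] = kmeans(trainMFCC{s}, K, 'Replicates', 3);
end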
3.3. Euclidean Distance

In this system the speech signal of an unknown speaker is represented by a sequence of feature vectors, which is compared with the codebooks of the speakers in the database. The Euclidean distance measures the distance between two feature vectors, and the speaker whose codebook gives the shortest distance is identified as the unknown speaker. The Euclidean distance follows from the Pythagorean theorem: the distance [8] between two points P = (p1, p2, ..., pn) and Q = (q1, q2, ..., qn) is given by

d(P, Q) = √((p1 − q1)² + (p2 − q2)² + ... + (pn − qn)²) = √( Σ_{i=1}^{n} (pi − qi)² )     (3)
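Continuing the same hedged example, feature matching with the Euclidean distance of Eq. (3) can be sketched as follows, where testMFCC (one row per frame) is the unknown speaker's feature matrix and codebook comes from the training sketch above; both names are illustrative assumptions.

% Feature-matching sketch: average nearest-codeword distance per speaker.
numSpeakers = numel(codebook);
avgDist = zeros(1, numSpeakers);
for s = 1:numSpeakers
    cb = codebook{s};                          % K x numCeps codewords
    D = zeros(size(testMFCC, 1), size(cb, 1));
    for t = 1:size(testMFCC, 1)
        for c = 1:size(cb, 1)
            D(t, c) = sqrt(sum((testMFCC(t, :) - cb(c, :)).^2));   % Eq. (3)
        end
    end
    avgDist(s) = mean(min(D, [], 2));          % distortion against this codebook
end
[~, identified] = min(avgDist);                % speaker with the smallest distortion

The speaker index with the smallest average nearest-codeword distance is returned as the identification result.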

4. GENDER RECOGNITION

Fig. 3. Block diagram of gender recognition

The voice signal of the unknown speaker is recorded on a standard computer. The pre-processing block performs three basic tasks: noise removal, silence detection and removal, and pre-emphasis. Pitch detection is the key block for gender recognition; pitch is the fundamental frequency of the voice. The cepstral method is used to detect the pitch of the male and female speech signals, which are plotted using MATLAB. Since the pitch of a female speaker is higher than that of a male speaker, a threshold is set to distinguish them: if the calculated pitch is below the threshold the tested speaker is classified as male, and if it is above the threshold the speaker is classified as female.
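A minimal sketch of this cepstral pitch check is given below. The 40 ms frame, the 50-500 Hz search range and the 170 Hz decision threshold are illustrative assumptions (the paper does not state the threshold it uses), and x and fs are assumed to be a voiced recording and its sampling rate, as in the Section 2 sketch.

% Cepstral pitch detection and gender decision sketch (values illustrative).
frame = x(1:round(0.04*fs));                     % a 40 ms voiced segment (assumption)
c = real(ifft(log(abs(fft(frame)) + eps)));      % real cepstrum of the frame
qMin = round(fs/500);                            % quefrency bounds for a
qMax = round(fs/50);                             % 50-500 Hz pitch search range
[~, q] = max(c(qMin:qMax));                      % peak quefrency within the range
pitch = fs / (q + qMin - 1);                     % fundamental frequency in Hz
if pitch < 170                                   % male/female threshold (assumed)
    gender = 'male';
else
    gender = 'female';
end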
5. SYSTEM ARCHITECTURE

Fig. 4. Proposed system model

5.1 Block Diagram Description

The microphone is used as the input device: it takes the voice command from the speaker, converts the voice signal into an electrical signal and transfers it to the computer as the system input. The MATLAB software takes the input command, compares it with the stored voice commands and performs the assigned task. The PC has a communication port that is used to transfer commands or data to the microcontroller circuit. The connection between the PC and the microcontroller circuit is made with an RS-232 cable, a DB-9 connector and a MAX232 IC. The LPC2138 microcontroller is already programmed to activate the relay driving circuit and the motor driving circuit after the command has been recognized by MATLAB.

5.2 Hardware Section

Table 1. Port connections of the LPC2138

Sr. No.   Ports of LPC2138                                              Hardware attached
1         Port 1.18, Port 0.25, Port 0.23, Port 1.19                    L293D (motor driver)
2         Port 0.3, Port 0.4, Port 0.5, Port 0.6, Port 0.7, Port 1.4    LCD
3         Port 1.20, Port 0.17                                          Relay
4         Port 0.0, Port 0.1                                            MAX232

As Table 1 shows, each device is connected to the corresponding port pins of the LPC2138 microcontroller: Port 1.18, Port 0.25, Port 0.23 and Port 1.19 drive the motor driving circuit (L293D); Port 0.3, Port 0.4, Port 0.5, Port 0.6, Port 0.7 and Port 1.4 drive the LCD; Port 1.20 and Port 0.17 drive the relay circuit; and Port 0.0 and Port 0.1 are connected to the MAX232. These devices operate according to the program stored in the microcontroller: when a voice command spoken by the user through the microphone is identified in MATLAB, the corresponding data is passed to the LPC2138, which performs the operation associated with that keyword.
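As a hedged illustration of how MATLAB might pass the recognized keyword to the LPC2138 over this RS-232 link, the sketch below uses MATLAB's legacy serial interface (superseded by serialport in recent releases); the COM port name, baud rate and command string are assumptions, not values given in the paper.

% Sending a recognized command to the microcontroller over RS-232 (sketch).
% Port name, baud rate and command string are illustrative assumptions.
s = serial('COM1', 'BaudRate', 9600, 'DataBits', 8, 'Parity', 'none', 'StopBits', 1);
fopen(s);                       % open the serial connection to the LPC2138
fprintf(s, 'MOTOR_ON');         % send the keyword matched for the identified speaker
fclose(s);                      % release the port
delete(s);
clear s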
5.3 Software Section

Because of its simple programming interface and built-in frequency-domain analysis, MATLAB is used as the programming language. The following software was used:

Circuit and layout design: Proteus 7.7
Debugging: Keil
Programming the LPC2138: Flash Magic

6. EXPERIMENTAL RESULTS

The following figures show the recorded voice of speaker 1 and the matched voice waveform from the database.

Fig. 5. Input recorded wave
Fig. 6. MFCC of the input recorded wave
Fig. 7. Distances from the centroids
Fig. 8. Matched voice wave

Table 2. Results of gender recognition

Speaker     Frequency (Hz)   Gender   Attempts   False Rejection   False Acceptance
Speaker 1   210.5263         Female   3          0                 0
Speaker 2   122.1374         Male     3          0                 0
Speaker 3   161.6162         Male     3          2                 0
Speaker 4   142.8571         Male     3          0                 0
Speaker 5   216.2162         Female   3          1                 0
Speaker 6   551.7241         Female   3          1                 0
Speaker 7   122.1374         Male     3          1                 0

7. CONCLUSION

The aim of this project is to identify an unknown speaker as well as the speaker's gender. The features of the speech are extracted using MFCC and compared with the stored features of the known speakers; the function melcepst is used to calculate the mel cepstrum of a signal. The speakers are modelled using Vector Quantization (VQ) because of its high accuracy, and the K-means algorithm is used to cluster the training feature vectors of every speaker and store them in the database. In the gender recognition phase a pitch detection algorithm is used, in which the cepstral method determines the gender; the results obtained were satisfactory.

ACKNOWLEDGMENT

I would like to thank all the staff members of the E&TC Department, Dr. D.Y. Patil College of Engineering, Ambi, for their support.

REFERENCES

[1] J. P. Campbell, Jr., "Speaker recognition: a tutorial," Proceedings of the IEEE, Vol. 85, Issue 9, Sept. 1997, pp. 1437-1462.

[2] Revathi, R. Ganapathy and Y. Venkataramani, "Text Independent Speaker Recognition and Speaker Independent Speech Recognition Using Iterative Clustering Approach," IJCSIT, Vol. 1, No. 2, November 2009.
[3] Douglas A. Reynolds and Richard C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 1, January 1995.
[4] Alfredo Maesa and Fabio Garzia, "Text Independent Automatic Recognition Using Mel Frequency Cepstrum Coefficient and Gaussian Mixture Model," IEEE Proceedings, Vol. 3, No. 4, Oct. 2012.
[5] F. Bimbot, J. Bonastre, C. Fredouille, G. Gravier, I. Magrin-Chagnolleau, S. Meignier, T. Merlin, J. Ortega-Garcia, D. Petrovska-Delacretaz and D. Reynolds, "A Tutorial on Text-Independent Speaker Verification," EURASIP Journal on Applied Signal Processing, 2004, pp. 430-451.
[6] Kavitha K. J., "An Automatic Speaker Recognition System Using MATLAB," World Journal of Science and Technology, 2012, 2(10):36-38, ISSN: 2231-2587.
[7] Kashyap Patel and R. K. Prasad, "Speech Recognition and Verification Using MFCC & VQ," International Journal of Emerging Science and Engineering (IJESE), ISSN: 2319-6378, Volume 1, Issue 7, May 2013.
[8] Tejal Chauhan, Hemant Soni and Sameena Zafar, "A Review of Automatic Speaker Recognition System," International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume 3, Issue 4, September 2013.
[9] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani and Md. Saifur Rahman, "Speaker Identification Using Mel Frequency Cepstral Coefficients," 3rd International Conference on Electrical and Computer Engineering (ICECE 2004), 28-30 December 2004, Dhaka, Bangladesh.
[10] Arundhati S. Mehendale and M. R. Dixit, "Speaker Identification," Signal & Image Processing: An International Journal (SIPIJ), Vol. 2, No. 2, June 2011.
[11] L. Rabiner, M. J. Cheng, A. E. Rosenberg and C. A. McGonegal, "A Comparative Performance Study of Several Pitch Detection Algorithms," IEEE Transactions on ASSP, Vol. 24, No. 5, pp. 399-417, October 1976.
[12] Kumar Rakesh, Subhangi Dutta and Kumara Shama, "Gender Recognition Using Speech Processing Techniques in LabVIEW," International Journal of Advances in Engineering & Technology, May 2011, ISSN: 2231-1963.