MFCC Based Speaker Recognition using MATLAB

KAVITA YADAV 1, MORESH MUKHEDKAR 2

1 PG Student, Department of Electronics and Telecommunication, Dr. D.Y. Patil College of Engineering, University of Pune, Ambi, Talegaon, Pune, India
2 Assistant Professor, Department of Electronics and Telecommunication, Dr. D.Y. Patil College of Engineering, University of Pune, Ambi, Talegaon, Pune, India

1 kavitasyadav@gmail.com, 2 moresh.mukhedkar@gmail.com

ABSTRACT

Speech is a natural and efficient way to communicate with people as well as with machines, so it plays a vital role in signal processing. This paper describes how a speaker recognition model using MFCC and VQ was designed, built and tested for male and female voices. The cepstral method is used to find the pitch of the speaker and, from the pitch, the gender of the speaker. Voice signals of male and female speakers were recorded at a 16 kHz sampling frequency, and the resulting wav files were processed in MATLAB to compute the pitch of each voice. Because of its high accuracy, the MFCC algorithm is used for feature extraction, VQ is used for feature matching, and the Euclidean distance is used to measure the distance between speakers.

Keywords: MFCC, VQ, pitch, Euclidean distance, cepstral method

1. INTRODUCTION

Speaker recognition is the automatic process of identifying an unknown speaker from an input speech signal. Like speech recognition, speaker recognition plays an important role in signal processing. Speaker recognition systems fall into two categories, speaker identification and speaker verification. In speaker identification, the unknown speaker is identified from a given set of speakers using a best-matching technique. In speaker verification, the identity of the unknown speaker is compared against the set of speakers whose identity is claimed, and the speaker is accepted or rejected accordingly. Based on the dependency on the spoken text, the task is further divided into text-dependent and text-independent recognition. A speaker recognition system uses two main modules, feature extraction and feature matching, which can be selected according to the application. A pitch detection algorithm (PDA) is a set of steps used to detect the pitch of a speech signal; here the cepstral method is used to find the pitch and, from it, the gender of the speaker. This project concentrates on the text-dependent speaker identification task. MATLAB is used for programming because of its built-in frequency-domain analysis and simple programming interface.

2. FEATURE EXTRACTION

This module converts the speech signal into a set of feature vectors, i.e. it reduces the dimensionality of the input speech signal. Different methods are used for feature extraction, such as MFCC, PLP and LPC; in this project MFCC is used because of its high accuracy. Mel-frequency cepstral coefficients are a representation of the short-term power spectrum of a sound, based on the linear cosine transform of the log power spectrum on a nonlinear mel scale of frequency. The block diagram of MFCC is shown in Fig. 1.

Fig. 1. Block diagram of MFCC

The Mel-frequency cepstrum coefficient (MFCC) technique is often used to create a fingerprint of the sound files. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency, with filters spaced linearly at low frequencies and logarithmically at high frequencies to capture the important characteristics of speech.
The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The mel value for a given frequency f (in Hz) is computed as

mel(f) = 2595 * log10(1 + f/700)     (1)

Fig. 1 shows the steps involved in computing the MFCCs. The continuous speech signal coming from the microphone is processed over short periods of time: it is divided into frames, each overlapping the previous one so that transitions between frames are captured cleanly. In the second step a Hamming window is applied to each overlapping frame to reduce the distortion at the frame edges. After windowing, the FFT converts each frame from the time domain to the frequency domain. In the mel-frequency wrapping stage, the spectrum of each frame is passed through a mel-scale band-pass filter bank to mimic the human ear. In the final stage the signal is converted back to the time domain; instead of an inverse FFT, the Discrete Cosine Transform (DCT) is used because it is more appropriate here [5].

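As a concrete illustration of these steps, the minimal MATLAB sketch below computes MFCCs from a recorded wav file. It is a sketch under stated assumptions rather than the paper's implementation (the authors rely on the voicebox function melcepst): the file name, frame length, hop size, number of filters and number of coefficients are illustrative, and the buffer, hamming and dct functions from the Signal Processing Toolbox are assumed to be available.

% Minimal MFCC sketch (illustrative parameters; assumes a mono recording and
% the Signal Processing Toolbox functions buffer, hamming and dct).
[x, fs] = audioread('speaker1.wav');          % e.g. a 16 kHz recording
frameLen = round(0.025*fs);                   % 25 ms frames
hopLen   = round(0.010*fs);                   % 10 ms hop, so frames overlap
numFilt  = 26;                                % mel filterbank channels
numCeps  = 13;                                % cepstral coefficients kept

% Framing and Hamming windowing
frames = buffer(x, frameLen, frameLen - hopLen, 'nodelay');   % one frame per column
frames = frames .* repmat(hamming(frameLen), 1, size(frames, 2));

% FFT: time domain to frequency domain (power spectrum of each frame)
NFFT  = 2^nextpow2(frameLen);
pspec = abs(fft(frames, NFFT)).^2;
pspec = pspec(1:NFFT/2 + 1, :);

% Mel-frequency wrapping: triangular filters equally spaced on the mel scale, Eq. (1)
melMax = 2595*log10(1 + (fs/2)/700);
hzPts  = 700*(10.^(linspace(0, melMax, numFilt + 2)/2595) - 1);
bins   = floor((NFFT + 1)*hzPts/fs) + 1;
H = zeros(numFilt, NFFT/2 + 1);
for m = 2:numFilt + 1
    H(m-1, bins(m-1):bins(m)) = linspace(0, 1, bins(m) - bins(m-1) + 1);
    H(m-1, bins(m):bins(m+1)) = linspace(1, 0, bins(m+1) - bins(m) + 1);
end

% DCT of the log filterbank energies gives the MFCCs (one column per frame)
mfcc = dct(log(H*pspec + eps));
mfcc = mfcc(1:numCeps, :);

The resulting mfcc matrix (one column of coefficients per frame) is the kind of feature set that the matching stage of Section 3 operates on.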
3. FEATURE MATCHING

Once the fingerprint of a speech signal, i.e. its set of feature vectors, has been created, it is stored in a database for that speaker. When the speech file of an unknown speaker is loaded into MATLAB, its fingerprint is created in the same way and compared against the vectors already present in the database using the Euclidean distance, and the best-matching speaker is identified. This process is called feature matching. Various methods can be used to match the extracted features against the stored voices, such as Dynamic Time Warping (DTW), Vector Quantization (VQ) and Gaussian Mixture Modelling (GMM); in this project Vector Quantization is used.

3.1. Vector Quantization

A speaker recognition system must be able to model the distribution of the estimated feature vectors. Since it is impractical to store every feature vector, the vectors are quantized into a small set of template vectors, i.e. vector quantization. VQ is a process that takes a large set of feature vectors and produces a small set of vectors representing the centroids of the distribution. These centroids form a codebook for each speaker. In the recognition phase, the data from the unknown speaker is compared against the codebook of each speaker and the distortion is estimated; the recognition decision is made from this distortion. Algorithms used for codebook generation include the K-means algorithm, the LBG algorithm, SOM and PNN.

Fig. 2. Codewords in 2-dimensional space

3.2. K-means algorithm

The K-means algorithm clusters the training feature vectors into k partitions. Its objective is to minimize the total intra-cluster variance V,

V = Σ_{i=1}^{k} Σ_{x_j ∈ S_i} ||x_j − μ_i||²     (2)

where S_i is the set of vectors assigned to cluster i and μ_i is its centroid. The algorithm uses a least-squares partitioning method: the input vectors are split into k initial sets, the mean point of each set is computed, a new partition is built by associating each vector with its nearest centroid, and the centroids are then recomputed for the new clusters. These steps are repeated until no vector switches clusters.
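The sketch below illustrates codebook training along these lines; it is an assumption-laden sketch, not the paper's code. It assumes kmeans from the Statistics and Machine Learning Toolbox and an illustrative cell array trainMFCC in which trainMFCC{s} holds speaker s's MFCC vectors with one row per frame (i.e. the transpose of the mfcc matrix from the Section 2 sketch).

% Codebook training sketch (assumes kmeans from the Statistics and Machine
% Learning Toolbox; trainMFCC{s} is a numFrames x numCeps matrix for speaker s).
K = 16;                                       % codebook size (illustrative)
numSpeakers = numel(trainMFCC);
codebook = cell(1, numSpeakers);
for s = 1:numSpeakers
    % K-means clusters the feature vectors into K groups and returns the
    % K centroids, which serve as this speaker's codebook.
    [~, codebook{s}] = kmeans(trainMFCC{s}, K, 'Replicates', 3);
end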
3.3. Euclidean Distance

In this system the speech signal of an unknown speaker is represented by a sequence of feature vectors, which is compared with the codebooks of the speakers in the database. The Euclidean distance measures the distance between two feature vectors, and the speaker whose codebook gives the shortest distance is identified as the unknown speaker. The Euclidean distance follows from the Pythagorean theorem: the distance [8] between two points P = (p1, p2, ..., pn) and Q = (q1, q2, ..., qn) is given by

d(P, Q) = √((p1 − q1)² + (p2 − q2)² + ... + (pn − qn)²) = √( Σ_{i=1}^{n} (pi − qi)² )     (3)
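Continuing the same hedged example, feature matching with the Euclidean distance of Eq. (3) can be sketched as follows, where testMFCC (one row per frame) is the unknown speaker's feature matrix and codebook comes from the training sketch above; both names are illustrative assumptions.

% Feature-matching sketch: average nearest-codeword distance per speaker.
numSpeakers = numel(codebook);
avgDist = zeros(1, numSpeakers);
for s = 1:numSpeakers
    cb = codebook{s};                          % K x numCeps codewords
    D = zeros(size(testMFCC, 1), size(cb, 1));
    for t = 1:size(testMFCC, 1)
        for c = 1:size(cb, 1)
            D(t, c) = sqrt(sum((testMFCC(t, :) - cb(c, :)).^2));   % Eq. (3)
        end
    end
    avgDist(s) = mean(min(D, [], 2));          % distortion against this codebook
end
[~, identified] = min(avgDist);                % speaker with the smallest distortion

The speaker index with the smallest average nearest-codeword distance is returned as the identification result.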

4. GENDER RECOGNITION

Fig. 3. Block diagram of gender recognition

The voice signal of the unknown speaker is recorded on a standard computer. The pre-processing block performs three basic tasks: noise removal, silence detection and removal, and pre-emphasis. Pitch detection is the key block for gender recognition; pitch is the fundamental frequency of the voice. The cepstral method is used to detect the pitch of the male and female speech signals, which are plotted using MATLAB. Since the pitch of a female speaker is higher than that of a male speaker, a threshold is set to distinguish them: if the calculated pitch is below the threshold the tested speaker is classified as male, and if it is above the threshold the speaker is classified as female.
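A minimal sketch of this cepstral pitch check is given below. The 40 ms frame, the 50-500 Hz search range and the 170 Hz decision threshold are illustrative assumptions (the paper does not state the threshold it uses), and x and fs are assumed to be a voiced recording and its sampling rate, as in the Section 2 sketch.

% Cepstral pitch detection and gender decision sketch (values illustrative).
frame = x(1:round(0.04*fs));                     % a 40 ms voiced segment (assumption)
c = real(ifft(log(abs(fft(frame)) + eps)));      % real cepstrum of the frame
qMin = round(fs/500);                            % quefrency bounds for a
qMax = round(fs/50);                             % 50-500 Hz pitch search range
[~, q] = max(c(qMin:qMax));                      % peak quefrency within the range
pitch = fs / (q + qMin - 1);                     % fundamental frequency in Hz
if pitch < 170                                   % male/female threshold (assumed)
    gender = 'male';
else
    gender = 'female';
end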
5. SYSTEM ARCHITECTURE

Fig. 4. Proposed system model

5.1 Block Diagram Description

The microphone is used as the input device: it takes the voice command from the speaker, converts the voice signal into an electrical signal and transfers it to the computer as the system input. The MATLAB software takes the input command, compares it with the stored voice commands and performs the assigned task. The PC has a communication port that is used to transfer commands or data to the microcontroller circuit. The connection between the PC and the microcontroller circuit is made with an RS-232 cable, a DB-9 connector and a MAX232 IC. The LPC2138 microcontroller is already programmed to activate the relay driving circuit and the motor driving circuit after the command has been recognized by MATLAB.

5.2 Hardware Section

Table 1. Port connections of the LPC2138

Sr. No.   Ports of LPC2138                                              Hardware attached
1         Port 1.18, Port 0.25, Port 0.23, Port 1.19                    L293D (motor driver)
2         Port 0.3, Port 0.4, Port 0.5, Port 0.6, Port 0.7, Port 1.4    LCD
3         Port 1.20, Port 0.17                                          Relay
4         Port 0.0, Port 0.1                                            MAX232

As Table 1 shows, each device is connected to the corresponding port pins of the LPC2138 microcontroller: Port 1.18, Port 0.25, Port 0.23 and Port 1.19 drive the motor driving circuit (L293D); Port 0.3, Port 0.4, Port 0.5, Port 0.6, Port 0.7 and Port 1.4 drive the LCD; Port 1.20 and Port 0.17 drive the relay circuit; and Port 0.0 and Port 0.1 are connected to the MAX232. These devices operate according to the program stored in the microcontroller: when a voice command spoken by the user through the microphone is identified in MATLAB, the corresponding data is passed to the LPC2138, which performs the operation associated with that keyword.
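As a hedged illustration of how MATLAB might pass the recognized keyword to the LPC2138 over this RS-232 link, the sketch below uses MATLAB's legacy serial interface (superseded by serialport in recent releases); the COM port name, baud rate and command string are assumptions, not values given in the paper.

% Sending a recognized command to the microcontroller over RS-232 (sketch).
% Port name, baud rate and command string are illustrative assumptions.
s = serial('COM1', 'BaudRate', 9600, 'DataBits', 8, 'Parity', 'none', 'StopBits', 1);
fopen(s);                       % open the serial connection to the LPC2138
fprintf(s, 'MOTOR_ON');         % send the keyword matched for the identified speaker
fclose(s);                      % release the port
delete(s);
clear s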
5.3 Software Section

Because of its simple programming interface and built-in frequency-domain analysis, MATLAB is used as the programming language. The following software was used:

Circuit and layout design: Proteus 7.7
Debugging: Keil
Programming the LPC2138: Flash Magic

6. EXPERIMENTAL RESULTS

The following figures show the recorded voice of speaker 1 and the matched voice waveform from the database.

Fig. 5. Input recorded wave
Fig. 6. MFCC of the input recorded wave
Fig. 7. Distances from the centroids
Fig. 8. Matched voice wave

Table 2. Results of gender recognition

Speaker     Frequency (Hz)   Gender   Attempts   False Rejection   False Acceptance
Speaker 1   210.5263         Female   3          0                 0
Speaker 2   122.1374         Male     3          0                 0
Speaker 3   161.6162         Male     3          2                 0
Speaker 4   142.8571         Male     3          0                 0
Speaker 5   216.2162         Female   3          1                 0
Speaker 6   551.7241         Female   3          1                 0
Speaker 7   122.1374         Male     3          1                 0

7. CONCLUSION

The aim of this project is to identify an unknown speaker as well as the speaker's gender. The features of the speech are extracted using MFCC and compared with the stored features of the known speakers; the function melcepst is used to calculate the mel cepstrum of a signal. The speakers are modelled using Vector Quantization (VQ) because of its high accuracy, and the K-means algorithm is used to cluster the training feature vectors of every speaker and store them in the database. In the gender recognition phase a pitch detection algorithm is used, in which the cepstral method determines the gender; the results obtained were satisfactory.

ACKNOWLEDGMENT

I would like to thank all the staff members of the E&TC Department, Dr. D.Y. Patil College of Engineering, Ambi, for their support.

REFERENCES

[1] J. P. Campbell, Jr., "Speaker recognition: a tutorial," Proceedings of the IEEE, Vol. 85, Issue 9, Sept. 1997, pp. 1437-1462.

[2] Revathi, R. Ganapathy and Y. Venkataramani, "Text Independent Speaker Recognition and Speaker Independent Speech Recognition Using Iterative Clustering Approach," IJCSIT, Vol. 1, No. 2, November 2009.
[3] Douglas A. Reynolds and Richard C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 1, January 1995.
[4] Alfredo Maesa and Fabio Garzia, "Text Independent Automatic Recognition Using Mel Frequency Cepstrum Coefficient and Gaussian Mixture Model," IEEE Proceedings, Vol. 3, No. 4, Oct. 2012.
[5] F. Bimbot, J. Bonastre, C. Fredouille, G. Gravier, I. Magrin-Chagnolleau, S. Meignier, T. Merlin, J. Ortega-Garcia, D. Petrovska-Delacretaz and D. Reynolds, "A Tutorial on Text-Independent Speaker Verification," EURASIP Journal on Applied Signal Processing, 2004, pp. 430-451.
[6] Kavitha K. J., "An Automatic Speaker Recognition System Using MATLAB," World Journal of Science and Technology, 2012, 2(10):36-38, ISSN: 2231-2587.
[7] Kashyap Patel and R. K. Prasad, "Speech Recognition and Verification Using MFCC & VQ," International Journal of Emerging Science and Engineering (IJESE), ISSN: 2319-6378, Volume 1, Issue 7, May 2013.
[8] Tejal Chauhan, Hemant Soni and Sameena Zafar, "A Review of Automatic Speaker Recognition System," International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume 3, Issue 4, September 2013.
[9] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani and Md. Saifur Rahman, "Speaker Identification Using Mel Frequency Cepstral Coefficients," 3rd International Conference on Electrical and Computer Engineering (ICECE 2004), 28-30 December 2004, Dhaka, Bangladesh.
[10] Arundhati S. Mehendale and M. R. Dixit, "Speaker Identification," Signal & Image Processing: An International Journal (SIPIJ), Vol. 2, No. 2, June 2011.
[11] L. Rabiner, M. J. Cheng, A. E. Rosenberg and C. A. McGonegal, "A Comparative Performance Study of Several Pitch Detection Algorithms," IEEE Transactions on ASSP, Vol. 24, No. 5, pp. 399-417, October 1976.
[12] Kumar Rakesh, Subhangi Dutta and Kumara Shama, "Gender Recognition Using Speech Processing Techniques in LabVIEW," International Journal of Advances in Engineering & Technology, May 2011, ISSN: 2231-1963.