Speaker Recognition Using DWT-MFCC with Multi-SVM Classifier


SWATHY M.S, PG Scholar, Dept. of ECE, Thejus Engineering College, Thrissur, India
MAHESH K.R, Assistant Professor, Dept. of ECE, Thejus Engineering College, Thrissur, India

Abstract: This paper describes a hybrid technique for speaker recognition, the process of identifying a person from characteristics of the speech wave such as pitch, tone and cepstral coefficients. A combination of two or more techniques is called a hybrid technique; here DWT and MFCC are employed together for feature extraction. The DWT decomposes the speech signal into different frequency bands. A multi-class SVM is used for classification.

Keywords: Feature Extraction; DWT; Mel Frequency; MFCC; Multi-class SVM.

I. INTRODUCTION

Speaker recognition is the process of automatically identifying a speaker, with the help of a machine, from feature vectors obtained from the speech signal. Speech is a common phenomenon among human beings: the lungs, vocal cords, tongue, jaw, lips, teeth and larynx are the main speech production organs in the human system. Speech is a complicated signal because of its non-stationary nature; its characteristics vary with time, so we divide the signal into a number of short frames for easier analysis. Feature extraction and classification are the two major steps in speaker recognition. Here feature extraction is done using DWT-based MFCC, a sub-band coding technique, and classification is done using a multi-class SVM. The main steps involved in this speaker recognition system are: 1) creating a database (a collection of voice samples in WAV format); 2) feature extraction; 3) training; 4) testing. After testing with the multi-class SVM, we can identify the speaker.
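The four steps listed above (database, feature extraction, training, testing) can be sketched end to end. The paper's implementation is in MATLAB; the following is an illustrative Python sketch in which the feature extractor and a nearest-mean classifier are stand-ins for the DWT-based MFCC features and the multi-class SVM, and all function names are hypothetical.

```python
# Minimal speaker-recognition pipeline sketch: database -> features -> train -> test.
# The toy feature extractor and nearest-mean classifier are illustrative stand-ins
# for the paper's DWT-based MFCC features and multi-class SVM.

def extract_features(samples):
    """Toy 2-D feature vector: mean and mean absolute value of the waveform."""
    n = len(samples)
    return (sum(samples) / n, sum(abs(s) for s in samples) / n)

def train(database):
    """database: {speaker: [utterance, ...]} -> per-speaker mean feature vector."""
    model = {}
    for speaker, utterances in database.items():
        feats = [extract_features(u) for u in utterances]
        model[speaker] = tuple(sum(f[i] for f in feats) / len(feats) for i in range(2))
    return model

def identify(model, utterance):
    """Return the speaker whose trained feature vector is nearest to the test input."""
    f = extract_features(utterance)
    return min(model, key=lambda spk: sum((a - b) ** 2 for a, b in zip(model[spk], f)))

# Toy "database": two or three recordings per speaker, as in the paper.
db = {
    "A": [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2]],
    "B": [[0.8, 0.9, 0.7], [0.9, 0.8, 0.9]],
}
model = train(db)
print(identify(model, [0.85, 0.8, 0.9]))  # test utterance resembling speaker B
```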
II. LITERATURE SURVEY

Nowadays speech recognition has great importance due to its increasing range of applications, and many techniques for speaker recognition have been developed over the past decades. Feature extraction and feature matching are the two main stages. One of the early methods used for feature extraction is LPC (Linear Predictive Coefficients) [6], in which the present sample is predicted from a linear combination of m previous samples; however, it does not discriminate similar vowels well and is affected by signal degradation. LPCC (Linear Predictive Cepstral Coefficients) [6] is an extension of LPC, but it is still an all-pole model. More recent studies introduced MFCC (Mel Frequency Cepstral Coefficients) and the WT (Wavelet Transform) [11][7]; both techniques are popularly used today. MFCC is popular because it approximates the human auditory system more accurately, while the WT provides time-frequency localization and is also used for denoising. Many techniques have likewise been developed for feature matching (classification). DTW (Dynamic Time Warping) [6] is one of the earlier methods, and VQ (Vector Quantization) [8] is one of the simplest, although its encoding is complex. Other methods include HMM (Hidden Markov Models), GMM (Gaussian Mixture Models) and SVM (Support Vector Machines) [6]. HMMs are computationally more complex, whereas the SVM is simple to operate and is a supervised learning algorithm.

ISSN: 2348-8549 www.internationaljournalssrg.org

III. PROPOSED METHOD

We collect speech samples from a variety of speakers and then extract features from each voice sample in the database; DWT-based MFCC is used for feature extraction. Every speaker has their own specific features. After applying MFCC to the voice samples, we obtain the features in the form of cepstral coefficients, with which we train the SVM. After training we move on to the testing section, which is also done with the help of the SVM. MATLAB is used for writing the code.

Fig 1: Proposed system

A. Database Builder

The first step of speaker recognition is to record speech signals from different speakers. Here we collect two or three audio samples from each speaker for better accuracy, recording the audio with MATLAB.

Fig 2: Example of speech signal

B. Feature Extraction

The speech signal is non-stationary in nature and contains plenty of information, but this information cannot be obtained readily from the raw waveform; feature extraction techniques are used to extract it. Here DWT-based MFCC is used for feature extraction.

DWT

DWT stands for Discrete Wavelet Transform, which comes under sub-band coding. Wavelets are finite-length waves, and the two important operations in the wavelet transform are scaling and shifting. Sub-band coding divides the speech samples into different frequency bands: first, the speech signal is divided into two bands, a high-frequency band and a low-frequency band. This division is called decomposition. The low-frequency band contains the characteristic information of the signal, while the high-frequency band mostly contains the noise, so the low band typically carries the useful data. At each level the band is split in two, so the bandwidth is reduced by powers of 2, and the number of decomposition levels is chosen accordingly. After each decomposition the speech signal becomes finer, so a very high decomposition level cannot be applied; it should be a moderate level. Here two-level decomposition is applied, and the resulting refined speech signal is passed to the MFCC section.

Fig 3: Decomposition tree of DWT [19]

Here f is the speech signal, and G and H are the high-pass and low-pass filters respectively. At every decomposition we obtain the approximation and detail coefficients: the approximation component is the low-frequency part and the detail component is the high-frequency part. We take the approximation component for the next level, so the DWT removes noise at every decomposition. After the noise is removed, the low-frequency component is applied to the MFCC stage.
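The decomposition described above can be illustrated with a one-level Haar DWT in pure Python. This is a sketch only: the paper does not specify the wavelet family, and production code would normally use a library such as PyWavelets.

```python
# One level of Haar DWT: low-pass (approximation) and high-pass (detail) outputs,
# each downsampled by 2, as in the sub-band decomposition described in the text.
import math

def haar_dwt(signal):
    """Return (approximation, detail) coefficients for one decomposition level."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def decompose(signal, levels):
    """Repeatedly keep only the approximation band, as the paper does (2 levels)."""
    for _ in range(levels):
        signal, _detail = haar_dwt(signal)
    return signal

x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
a1, d1 = haar_dwt(x)
print(a1)                # half-length, smoothed approximation band
print(decompose(x, 2))   # two-level approximation, quarter length
```

Note that for this slowly varying input the detail (high-frequency) band is all zeros, matching the text's observation that the low band carries the useful information.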

Fig 4: DWT decomposition of speech signal

MFCC

MFCC stands for Mel Frequency Cepstral Coefficients. MFCC simplifies the task of feature extraction: after applying it, the features are obtained in the form of cepstral coefficients, of which we usually take 20-40. Human perception of speech is linear up to 1000 Hz and logarithmic above that [21]. Accordingly, MFCC uses two styles of filter, spaced linearly below 1000 Hz and logarithmically above 1000 Hz; this is why MFCC approximates the human auditory response accurately. The main steps involved in MFCC are:

Pre-emphasis
Framing
Windowing
FFT
Mel filter bank processing
DCT

Fig 5: MFCC flowchart [21]

Pre-emphasis is mainly used to boost the energy of the high-frequency components. Speech is a non-stationary signal, i.e., it varies with time, so we divide it into small frames under the assumption that within each interval the signal is approximately stationary. The width of the frames is generally about 30 ms, with an overlap of about 20 ms. Windowing is employed to smooth the framed signal, which is then passed to the FFT section, where each time-domain frame is converted into the frequency domain. Each FFT magnitude coefficient is multiplied by the corresponding Mel filter value, and the log of the resulting filter bank energies is taken. The Mel frequency equation is given below:

Mel(f) = 2595 * log10(1 + f/700)    (1)

The final step is to compute the DCT of the log filter bank energies. Because of the Mel scale, MFCC retains more information about lower frequencies than higher frequencies. Finally, we obtain the features in the form of cepstral coefficients.

Fig 6: Mel Frequency Filter Bank [22]
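Two of the steps above are simple enough to show directly: pre-emphasis and the Mel mapping of Eq. (1). A Python sketch follows; the pre-emphasis coefficient 0.97 is a common choice and an assumption here, since the paper does not state the value it uses.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1].
    alpha = 0.97 is a conventional value, not one specified in the paper."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hz_to_mel(f):
    """Eq. (1): Mel(f) = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

print(hz_to_mel(1000.0))              # ~1000 mel: roughly linear below 1 kHz
print(pre_emphasis([1.0, 1.0, 1.0]))  # a flat (low-frequency) input is attenuated
```

The formula is designed so that 1000 Hz maps to approximately 1000 mel, with compression of the higher frequencies, which is exactly the behavior the filter bank of Fig 6 exploits.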

C. Classification

The classifier is used for feature matching; after classification we can distinguish authorized from unauthorized speakers.

SVM

SVM stands for Support Vector Machine, which is used here as the classifier; the training and testing sections both belong to the classification stage. First, we train the SVM classifier using the features obtained from the MFCC section. A multi-class SVM is chosen in order to recognize a particular speaker from a group of speakers. The goal of the SVM is to find the optimal separating hyperplane, the one that maximizes the margin of the training data. The SVM is a supervised learning algorithm with simple operation and high accuracy. A multi-class SVM divides the multi-class problem into several binary sub-problems and builds a standard SVM for each. Two algorithms are mainly used:

One against all
One against one

In one-against-all, one SVM is built for every class in order to distinguish that class from all remaining classes. In one-against-one, one SVM is built for every pair of classes: with n classes, n(n-1)/2 SVMs are trained to distinguish the speech samples. In this project the one-against-one approach is used. For testing, we take a speech sample of the speaker to be recognized and apply it to the SVM as input; the features of this test input are extracted, compared with the trained feature set, and the speaker is recognized.

Fig 7: SVM classification

IV. RESULT

The speaker recognition system is developed in MATLAB, and its results are shown in the following screenshots. The SVM consists of two phases: in the training phase a database is created and kept as a reference, and in the testing phase the speaker is recognized. Here the processing pairs are A-B, A-C and B-C, i.e., n = 3 classes.

Fig 8: Recognized a particular speaker
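The one-against-one scheme described above can be sketched without any SVM library: enumerate the n(n-1)/2 class pairs, make one binary decision per pair, and vote. In the sketch below a trivial nearest-centre threshold stands in for each binary SVM, and the class centres are hand-picked hypothetical values; this illustrates the pairing and voting scheme only, not SVM training.

```python
from itertools import combinations

def one_vs_one_pairs(classes):
    """All n*(n-1)/2 class pairs, each of which gets its own binary classifier."""
    return list(combinations(classes, 2))

def predict(pair_classifiers, x):
    """Vote across all pairwise decisions; the class with the most votes wins."""
    votes = {}
    for decide in pair_classifiers.values():
        winner = decide(x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

classes = ["A", "B", "C"]
pairs = one_vs_one_pairs(classes)
print(len(pairs))  # 3 classes -> 3(3-1)/2 = 3 binary classifiers

# Toy 1-D "feature": each stand-in binary classifier picks the class whose
# (hypothetical) centre is nearer to the input.
centres = {"A": 0.0, "B": 1.0, "C": 2.0}
clfs = {(a, b): (lambda x, a=a, b=b:
                 a if abs(x - centres[a]) < abs(x - centres[b]) else b)
        for a, b in pairs}
print(predict(clfs, 1.9))  # nearest centre is C, so C wins the vote
```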
In the one-against-one algorithm, by the formula n(n-1)/2, 3(3-1)/2 = 3 SVMs are therefore needed.

Fig 9: Recognized an unknown speaker

V. CONCLUSION AND FUTURE SCOPE

This paper has proposed a speaker recognition system using DWT-based MFCC with a multi-class SVM classifier. The study shows that a particular speaker can be recognized from among a number of speakers. DWT-based MFCC gives good feature extraction performance and strong noise reduction on the given speech signal. A binary SVM is not sufficient for speaker recognition among a large number of speakers, so a multi-class SVM is used for classification. In future, combinations of techniques (MFCC, LPC, LPCC) can be used for feature extraction, since hybrid techniques consistently provide improved results, and the system can be extended to recognize a speaker even when one speaker imitates another.

References

[1] Supriya Tripathi, "Speaker Recognition," IEEE Xplore, 2012.
[2] S. K. Singh, P. C. Pandey, "Features and Techniques for Speaker Recognition," IIT Bombay.
[3] Harish Chander Mahendru, "Quick Review of Human Speech Production Mechanism," ISSN, Volume 9, January 2014.
[4] Masaaki Honda, "Speech Production Mechanisms," 2013.
[5] Harald Hoge, Siemens AG, "Basic Parameters of Speech Signal Analysis."
[6] Kirandeep Kaur, Neelu Jain, "Feature Extraction and Classification for Automatic Speech Recognition System," ISSN, Volume 5, January 2015.
[7] Rekha Hibare, Anup Vibhute, "Feature Extraction in Speech Processing: A Survey," IJCA, November 2014.
[8] Shubhangi S. Jarande, Surendra Waghmare, "A Survey on Different Classifiers in Speech Recognition Techniques," IJETAE, March 2014.
[9] Kirandeep Kaur, Neelu Jain, "Feature Extraction and Classification for Automatic Speaker Recognition System: A Review," ISSN, January 2015.
[10] Umer Malik, P. K. Mishra, "Automatic Speaker Recognition Using SVM," IJSR, 2013.
[11] Shreya Narang, Divya Gupta, "Speech Feature Extraction Techniques: A Review," IJCSMC, March 2015.
[12] S. B. Dhonde, S. M. Jagade, "Feature Extraction Techniques in Speaker Recognition: A Review," IJRMEE, May 2015.
[13] Umer Malik, P. K. Mishra, "Automatic Speaker Recognition Using SVM," IJSR, 2013.
[14] Shreya Narang, Divya Gupta, "Speech Feature Extraction Techniques: A Review."
[15] Alfredo Maesa, Fabio Garzia, "Text Independent Automatic Speaker Recognition System Using Mel-Frequency Cepstrum Coefficient and Gaussian Mixture Models," Journal of Information Security, jis.2012.34041.
[16] Md. Rashidul Hasan, Mustafa Jamil, "Speaker Identification Using Mel Frequency Cepstral Coefficients," ICECE 2004.
[17] Roma Bharti, Manav Rachna, "Real Time Speaker Recognition System Using MFCC and Vector Quantization," IJCA, May 2015.
[18] Aamir Khan, Muhammad Farhan, Asar Ali, "Speech Recognition: Increasing Efficiency of Support Vector Machines," IJCA, Volume 35, No. 7, December 2011.
[19] K. Deepak, "Speaker Recognition Using Support Vector Machines," ISSN, Issue 2, February 2014.
[20] Shanthini Pandiaraj, K. R. Shankar Kumar, "Speaker Identification Using Discrete Wavelet Transform," Journal of Computer Science, 2014.
[21] Lindasalwa Muda, Mumtaj Begam, I. Elamvazuthi, "Voice Recognition Algorithms Using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques," Journal of Computing, Volume 2, Issue 3, March 2010, ISSN 2151-9617.
[22] Sayf A. Majeed, Hafizah Husain, Salina Abdul Samad, Tariq F. Idbeaa, "Mel Frequency Cepstral Coefficients (MFCC) Feature Extraction Enhancement in the Application of Speech Recognition: A Comparison Study," JATIT & LLS, 2005-2015.