A SURVEY: SPEECH EMOTION IDENTIFICATION


Sejal Patel 1, Salman Bombaywala 2
1 M.E. Student, Department of EC, SNPIT & RC, Umrakh, Gujarat, India
2 Assistant Professor, Department of EC, SNPIT & RC, Umrakh, Gujarat, India

Abstract: In recent years a great deal of research has been done to automatically recognize emotions from human speech. Emotion is a symbol that expresses a person's feelings, and identifying the emotion of the user is important for understanding the feeling behind an utterance. Speech emotion detection has three main goals. The first is to select appropriate features, such as energy, pitch, LPC, LPCC, voice quality, MFCC, and delta MFCC, for emotional speech recognition. The second is to provide a database covering different languages, numbers of speakers, and numbers of emotions. The third is to classify speech into emotional states, using classifiers such as the Hidden Markov Model (HMM), Artificial Neural Network (ANN), Support Vector Machine (SVM), correlation, and Dynamic Time Warping (DTW). This paper surveys different techniques developed with these goals in mind. The emotions considered in this study are anger, fear, happiness, neutral, and sadness. Speech emotion recognition is useful for applications that require man-machine interaction, such as web movies and computer tutorials, where the response of the system to the user depends on the detected emotion. It is also useful for in-car board systems, where information about the mental state of the driver may be provided to the system to ensure his/her safety.

Key Words: ANN, correlation, DTW, End point, MFCC, Start point.

I. INTRODUCTION

Speech is the communication or expression of thoughts in spoken words. Humans also express their emotions via written and spoken language. Emotion is a symbol that expresses a person's feelings.
We are still far from having a natural interaction between man and machine, because the machine does not understand the emotional state of the speaker. There are several methods of feature extraction, such as MFCC, energy, pitch, LPC (Linear Predictive Coding), and voice quality, and several classification techniques, such as ANN, HMM, SVM, correlation, and DTW, which can be effectively utilized for the analysis of a voice signal. The acoustic variability introduced by the existence of different sentences, speakers, speaking styles, and speaking rates adds another obstacle, because these properties directly affect most of the commonly extracted speech features, such as frequency, pitch, and energy. First, the human voice is converted into digital form, producing digital data that represent the level of the signal at every discrete time step. The digitized speech samples are then processed using features such as start point and end point detection to produce voice features.

These voice features are then matched against the patterns in the database: the correlation between each reference database entry and the test input file is computed in order to minimize the resulting error between them, depending on the feature. Speech emotion recognition is useful for in-car board systems, which provide information about the mental state of the driver to the system to ensure his/her safety. It is also useful for natural man-machine interaction, such as computer tutorial applications where the response of the system to the user depends on the detected emotion. It may also be useful in call center applications and mobile communication [8], as well as in storytelling, interactive movies, checking a person's behavior, e-tutoring, and call analysis in emergency services such as ambulance and fire brigade.

II. BLOCK DIAGRAM

The basic block diagram of speech emotion detection is shown in Figure 1, which gives the overall process of the speech emotion recognition system [4].

Figure 1 Architecture of speech emotion detection [4]

A. Input Speech:
Speech contains information about the textual message, speaker identity, and intended emotion. Speech is a complex signal produced by a time-varying vocal tract system excited by a time-varying excitation source. It is the fastest and most efficient method of interaction between humans.

B. Preprocessing:
Preprocessing deals with sampling, framing, windowing, and start point/end point detection. The input signals are adapted to the speaker recognition system by these steps. It can be seen as a data compression step, in which the useful information is selected from the available signal data. An energy threshold is used to remove low-energy frames, which usually correspond to the silence periods of the speech signal; silence frames would otherwise influence the feature values computed in the feature extraction process.
The results can be compared when different energy thresholds are used.

1. Energy of a signal
The energy of a set of samples is the sum of the squares of the samples. It gives information about the time-dependent properties of the speech signal and is very useful for selecting the threshold for start-point and end-point detection. To calculate the short-time energy (STE), the speech signal is sampled using a rectangular window function of width ω samples. Within each window, the energy is computed as follows [6]:

e = Σ xi²   (1)
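As an illustrative sketch (not from the paper), the short-time energy of equation (1) over non-overlapping rectangular windows can be computed as:

```python
def short_time_energy(samples, width):
    """Short-time energy: sum of squared samples in each
    non-overlapping rectangular window of `width` samples."""
    return [sum(x * x for x in samples[i:i + width])
            for i in range(0, len(samples) - width + 1, width)]

# A loud burst between silent stretches yields one high-energy window,
# which is what the start/end-point threshold looks for.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
print(short_time_energy(signal, 4))  # -> [0.0, 1.0, 0.0]
```

Windows whose energy falls below the chosen threshold can then be treated as silence and discarded.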

where e = the energy of a particular window and xi = the i-th sample.

2. Start and End Point Detection
Start and end point detection is beneficial because it removes background noise and yields a cleaner speech signal than before. The start point of a voice signal gives the exact starting location of the voice samples based on the STE values, so that all earlier unwanted samples are removed and a new voice signal is created [6]. The same process is applied to detect the end point of the speech signal [9].

C. Feature Extraction Technique:
Feature extraction converts the speech signal into a sequence of feature vectors. Its important part is to extract characteristics from the signal that are unique to each individual. Different feature extraction techniques, such as pitch, energy, vocal tract cross-section area, speech rate, formants, Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), and Linear Prediction Cepstral Coefficients (LPCC), are used for emotion-based speaker recognition systems.

1. MFCC
MFCC is the most popular feature extraction technique for speech recognition.

Figure 2 Block diagram of MFCC [3]

Step 1: Pre-Emphasis: Pre-emphasis is a technique used in speech processing to enhance the high frequencies of the signal. It is done using an FIR high-pass filter [4]. The speech signal generally contains more speaker-specific information in the high frequencies than in the low frequencies, and pre-emphasis also removes some of the glottal effects from the vocal tract parameters.

Step 2: Framing: The continuous voice signal is divided into frames of N samples, with adjacent frames separated by a shift of M samples (M < N) [3]. Generally the frame size is a power of two in order to facilitate the use of the FFT. Successive frames therefore overlap each other by N − M samples; the overlap maintains continuity between frames and ensures high correlation between the coefficients of successive frames. [6]
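Steps 1 and 2 can be sketched as follows (the function names and the filter coefficient 0.97 are illustrative choices, not values specified in the paper):

```python
def pre_emphasis(x, alpha=0.97):
    """FIR high-pass filter y[n] = x[n] - alpha * x[n-1],
    boosting the high frequencies of the signal."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def frame_signal(x, frame_size, frame_shift):
    """Split the signal into overlapping frames of `frame_size`
    samples, with adjacent frames `frame_shift` samples apart."""
    return [x[i:i + frame_size]
            for i in range(0, len(x) - frame_size + 1, frame_shift)]

emphasized = pre_emphasis([1.0, 1.0, 1.0, 1.0])
frames = frame_signal(list(range(10)), frame_size=4, frame_shift=2)
# Each frame overlaps the next by frame_size - frame_shift samples.
```

Note how pre-emphasis nearly cancels the constant (low-frequency) signal after the first sample, which is exactly the high-pass behavior described above.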
Step 3: Windowing: The window shape is chosen by considering the next block in the feature extraction processing chain, and the window integrates all the closest frequency lines [6]. Windowing each individual frame minimizes the signal discontinuities at the beginning and end of the frame.

The spectral distortion is minimized by using the window to taper the signal to zero at the beginning and end of each frame. If the window is defined as

W(n), 0 ≤ n ≤ N−1   (2)

where N is the number of samples in each frame and W(n) is a Hamming window, then the result of windowing the signal is

Y[n] = X[n] · W(n), 0 ≤ n ≤ N−1   (3)

where the Hamming window is

W(n) = 0.54 − 0.46 cos(2πn / (N−1)), 0 ≤ n ≤ N−1, and 0 otherwise   (4)

Here X[n] is the input signal and Y[n] is the windowed output signal.

Step 4: Fast Fourier Transform (FFT): The FFT converts each frame of N samples from the time domain into the frequency domain. It reduces the computation time required to compute a DFT and improves performance. The Fourier transform converts the convolution of the glottal pulse U[n] and the vocal tract impulse response H[n] in the time domain into a product in the frequency domain, which supports the equation below:

Y(w) = FFT[h(t) * x(t)] = H(w) · X(w)   (5)

where Y(w), H(w), and X(w) are the Fourier transforms of y(t), h(t), and x(t) respectively. [6]

Step 5: Mel Filter Bank: The range of frequencies in the FFT spectrum is very wide, and the voice signal does not follow a linear scale. Thus for each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the mel scale [3]. A set of triangular filters is used to compute a weighted sum of filter spectral components, so that the output of the process approximates the mel scale. The following approximate formula computes the mel value for a given frequency f in Hz [3]:

mel(f) = 2595 · log10(1 + f / 700)   (6)

Figure 3 Mel scale filter bank [3][6]
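Equation (6) and its inverse can be sketched as (hypothetical helpers; the inverse is what is typically used to place the triangular filter centers at equal mel spacing):

```python
import math

def hz_to_mel(f):
    """Map a frequency in Hz to the subjective mel scale, eq. (6)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, useful for computing the edge and center
    frequencies of the triangular mel filters."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

print(round(hz_to_mel(1000), 1))  # -> 1000.0 (about 1000 mel at 1 kHz)
```

The near-equality of 1000 Hz and 1000 mel reflects the scale's design: roughly linear below 1 kHz and logarithmic above it.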
Each filter's magnitude frequency response is triangular in shape, equal to unity at its center frequency and declining linearly to zero at the center frequencies of the two adjacent filters. Each filter output is then the sum of its filtered spectral components.

Step 6: Log and Discrete Cosine Transform (DCT): The output of the mel filter bank is a set of spectral components, which is passed to the log block. This log mel spectrum is then converted back into the time domain using the Discrete Cosine Transform (DCT). The coefficients obtained are called Mel-Frequency Cepstral Coefficients, and the collection of coefficients forms an acoustic vector; each input utterance is thus transformed into a sequence of acoustic vectors. [6]

2. LPC: LPC coefficients closely approximate the current speech sample as a linear combination of past samples, but they cannot totally represent it: because the human speech production system is not linear while the LPC model is, the LPC coefficients capture only the linear properties and lose the nonlinear part. Therefore, the LPCC coefficients were proposed to represent the nonlinear properties.

D. Classifier:
Different types of classifiers are available, such as the polynomial classifier, Hidden Markov Model (HMM), Gaussian Mixture Model (GMM), correlation, Artificial Neural Network (ANN), Support Vector Machine (SVM), and Dynamic Time Warping (DTW). Here, MFCC is used as the basic feature extraction technique with an Artificial Neural Network (ANN) classifier.

1. Correlation: The most familiar measure of dependence between two quantities is the Pearson product-moment correlation coefficient, or Pearson's correlation coefficient, commonly called simply the correlation coefficient. It is obtained by dividing the covariance of the two variables by the product of their standard deviations. The coefficient was developed by Karl Pearson.
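A minimal sketch of this definition, covariance divided by the product of the standard deviations:

```python
import math

def pearson_corr(x, y):
    """Pearson product-moment correlation:
    cov(X, Y) / (std(X) * std(Y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(pearson_corr([1, 2, 3], [2, 4, 6]))  # perfect increasing, approx +1
print(pearson_corr([1, 2, 3], [6, 4, 2]))  # perfect decreasing, approx -1
```

For emotion matching, x and y would be a reference feature vector from the database and the test utterance's feature vector; the reference with the highest coefficient is the best match.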
The Pearson correlation coefficient is symmetric:

corr(X, Y) = corr(Y, X)   (7)

The Pearson correlation is +1 in the case of a perfect direct (increasing) linear relationship (correlation), −1 in the case of a perfect decreasing (inverse) linear relationship (anticorrelation), and some value between −1 and +1 in all other cases, indicating the degree of linear dependence between the variables. As it approaches zero there is less of a relationship, and the closer the coefficient is to either −1 or +1, the stronger the correlation between the variables. [12]

2. ANN: An Artificial Neural Network (ANN) is a system that operates in a manner very similar to the human brain and solves problems by self-learning [1]. A neural network refers to a network of biological neurons that process and transmit information. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process; learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. ANNs possess excellent discriminative power and learning capabilities and represent implicit knowledge, which is why they are used in emotion recognition. Artificial neural networks are the result of academic research that uses mathematical formulations to model nervous system operations, and the resulting techniques are being successfully applied in a variety of everyday applications.

3. Dynamic Time Warping (DTW):

The DTW algorithm is based on measuring the similarity between two time series which may vary in time or speed. The similarity is evaluated in terms of an alignment between the two time series, where one series may be warped nonlinearly by stretching or shrinking it along its time axis [6][9]. This warping can then be used to find analogous regions between the two time series or to determine their similarity. Mathematically, DTW compares two dynamic patterns and evaluates similarity by calculating a minimum distance between them. Consider two time series Q and C, of length n and m respectively:

Q = q1, q2, ..., qi, ..., qn
C = c1, c2, ..., cj, ..., cm

To align the two sequences using DTW, an n-by-m matrix is established, where the (i, j) element contains the distance d(qi, cj) between the two points qi and cj. The distance between the values of the two sequences is calculated using the squared Euclidean distance [4]:

d(qi, cj) = (qi − cj)²   (9)

Each matrix element (i, j) corresponds to an alignment between qi and cj. The accumulated distance is then obtained by:

D(i, j) = min(D(i−1, j−1), D(i−1, j), D(i, j−1)) + d(i, j)

This is computed as follows [6]:
1. Start with the calculation of g(1, 1) = d(1, 1).
2. Calculate the first row: g(i, 1) = g(i−1, 1) + d(i, 1).
3. Calculate the first column: g(1, j) = g(1, j−1) + d(1, j).
4. Move to the second row: g(i, 2) = min(g(i, 1), g(i−1, 1), g(i−1, 2)) + d(i, 2). For each cell, bookkeep the index of the neighboring cell that contributes the minimum score.
5. Carry on from left to right and from bottom to top with the rest of the grid: g(i, j) = min(g(i, j−1), g(i−1, j−1), g(i−1, j)) + d(i, j).
6. Trace back the best path through the grid, starting from g(n, m) and moving towards g(1, 1).
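The accumulated-distance recurrence can be sketched as follows (a minimal implementation using the squared distance of equation (9); the traceback step is omitted for brevity):

```python
def dtw_distance(q, c):
    """Accumulated DTW distance between sequences q and c, filling
    D(i,j) = min(D(i-1,j-1), D(i-1,j), D(i,j-1)) + d(i,j)."""
    n, m = len(q), len(c)
    INF = float("inf")
    # Extra row/column of infinities so the borders need no special cases.
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2               # eq. (9)
            D[i][j] = min(D[i - 1][j - 1],
                          D[i - 1][j],
                          D[i][j - 1]) + d
    return D[n][m]

# Identical series align at zero cost; a time-stretched copy stays cheap
# because DTW warps one axis instead of comparing point-by-point.
print(dtw_distance([1, 2, 3], [1, 2, 3]))     # -> 0.0
print(dtw_distance([1, 2, 3], [1, 1, 2, 3]))  # -> 0.0
```

In the recognition setting, the stored reference whose accumulated distance to the test feature sequence is smallest is taken as the match.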
Hence, the path that gives the minimum distance after testing against the feature vectors stored in the database identifies the speaker's emotion.

COMPARISON TABLE OF SURVEY PAPERS

Sr. No. | Features                         | Classifier        | Remarks
1       | MFCC, DWT, LPC, f0, voice energy | ANN               | ANN is better, but it is language dependent
2       | MFCC, formants                   | BP neural network | Low accuracy
3       | MFCC                             | -                 | MFCC gives better results
4       | MFCC, pitch, energy              | GMM, HMM          | GMM gives better performance for anger and surprise; HMM for disgust, fear and neutral
5       | MFCC, STE, ZCR                   | Correlation, DTW  | DTW and correlation find the best match and prove to be an effective method for speech recognition

III. CONCLUDING REMARKS

Speech emotion recognition systems based on several classifiers have been illustrated. The important issues in a speech emotion recognition system are the signal processing unit, in which appropriate features are extracted from the available speech signal, and the classifier, which recognizes emotions from the speech signal. Different types of features and classification techniques have been studied. From this analysis it can be said that MFCC is the best feature extraction technique for emotion recognition. There are many applications, such as checking a person's behavior, e-tutoring, call analysis in emergency services like ambulance and fire brigade, emotion recognition in call centers, storytelling, in-car board systems, computer games, and robots.

REFERENCES

[01] Firoz Shah. A, Raji Sukumar. A, Babu Anto. P, "Automatic Emotion Recognition from Speech Using Artificial Neural Networks with Gender-Dependent Databases", International Conference on Advances in Computing, Control, and Telecommunication Technologies, IEEE, 2009
[02] Ying Shi and Weihua Song, "Speech Emotion Recognition Based on Data Mining Technology", International Conference on Natural Computation (ICNC 2010), IEEE, 2010
[03] Anurag Jain, Nupur Prakash, "Evaluation of MFCC for Emotion Identification in Hindi Speech", IEEE, 2011
[04] Tsang-Long Pao, Chun-Hsiang Wang, and Yu-Ji Li, "A Study on the Search of the Most Discriminative Speech Features in the Speaker Dependent Speech Emotion Recognition", International Symposium on Parallel Architectures, Algorithms and Programming, IEEE, 2012
[05] Manav Bhaykar, Jainath Yadav, and K. Sreenivasa Rao, "Speaker Dependent, Speaker Independent and Cross Language Emotion Recognition from Speech Using GMM and HMM", IEEE, 2013
[06] Nidhi Desai, Kinnal Dhameliya and Vijayendra Desai, "Recognizing Voice Commands for Robot Using MFCC and DTW", International Journal of Advanced Research in Computer and Communication Engineering, May 2014
[07] Vinay, Shilpi Gupta, Anu Mehra, "Gender Specific Emotion Recognition Through Speech Signals", International Conference on Signal Processing and Integrated Networks (SPIN), 2014
[08] Tatjana Liogienė, Gintautas Tamulevičius, "Minimal Cross-correlation Criterion for Speech Emotion Multi-level Feature Selection", IEEE, 2015
[09] S. Lalitha, Anoop Mudupu, Bala Visali Nandyala, Renuka Munagala, "Speech Emotion Recognition Using DWT", IEEE, 2015
[10] Ritu D. Shah, Dr. Anil C. Suthar, "Speech Emotion Recognition Based on SVM Using MATLAB", International Journal of Innovative Research in Computer and Communication Engineering, 2016
[11] Lawrence R. Rabiner and Ronald W. Schafer, Digital Processing of Speech Signals, 9th Edn, Dorling Kindersley Pvt. Ltd., licensees of Pearson Education in South Asia, 2012
[12] (accessed on: 24 March 2017)


More information

Recognition of Emotions in Speech

Recognition of Emotions in Speech Recognition of Emotions in Speech Enrique M. Albornoz, María B. Crolla and Diego H. Milone Grupo de investigación en señales e inteligencia computacional Facultad de Ingeniería y Ciencias Hídricas, Universidad

More information

Speaker recognition using universal background model on YOHO database

Speaker recognition using universal background model on YOHO database Aalborg University Master Thesis project Speaker recognition using universal background model on YOHO database Author: Alexandre Majetniak Supervisor: Zheng-Hua Tan May 31, 2011 The Faculties of Engineering,

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks Outline Introduction to Neural Network Introduction to Artificial Neural Network Properties of Artificial Neural Network Applications of Artificial Neural Network Demo Neural

More information

Speech Synthesizer for the Pashto Continuous Speech based on Formant

Speech Synthesizer for the Pashto Continuous Speech based on Formant Speech Synthesizer for the Pashto Continuous Speech based on Formant Technique Sahibzada Abdur Rehman Abid 1, Nasir Ahmad 1, Muhammad Akbar Ali Khan 1, Jebran Khan 1, 1 Department of Computer Systems Engineering,

More information

A Low-Complexity Speaker-and-Word Recognition Application for Resource- Constrained Devices

A Low-Complexity Speaker-and-Word Recognition Application for Resource- Constrained Devices A Low-Complexity Speaker-and-Word Application for Resource- Constrained Devices G. R. Dhinesh, G. R. Jagadeesh, T. Srikanthan Centre for High Performance Embedded Systems Nanyang Technological University,

More information

L16: Speaker recognition

L16: Speaker recognition L16: Speaker recognition Introduction Measurement of speaker characteristics Construction of speaker models Decision and performance Applications [This lecture is based on Rosenberg et al., 2008, in Benesty

More information

18-551, Fall 2006 Group 8: Final Report. Say That Again? Interactive Accent Decoder

18-551, Fall 2006 Group 8: Final Report. Say That Again? Interactive Accent Decoder 18-551, Fall 2006 Group 8: Final Report Say That Again? Interactive Accent Decoder Cherlisa Tarpeh Anthony Robinson Candice Lawrence Chantelle Humphreys ctarpeh@cmu.edu aarobins@andrew.cmu.edu clawrenc@andrew.cmu.edu

More information

Convolutional Neural Networks for Speech Recognition

Convolutional Neural Networks for Speech Recognition IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL 22, NO 10, OCTOBER 2014 1533 Convolutional Neural Networks for Speech Recognition Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang,

More information

AUTOMATIC SONG-TYPE CLASSIFICATION AND SPEAKER IDENTIFICATION OF NORWEGIAN ORTOLAN BUNTING (EMBERIZA HORTULANA) VOCALIZATIONS

AUTOMATIC SONG-TYPE CLASSIFICATION AND SPEAKER IDENTIFICATION OF NORWEGIAN ORTOLAN BUNTING (EMBERIZA HORTULANA) VOCALIZATIONS AUTOMATIC SONG-TYPE CLASSIFICATION AND SPEAKER IDENTIFICATION OF NORWEGIAN ORTOLAN BUNTING (EMBERIZA HORTULANA) VOCALIZATIONS Marek B. Trawicki & Michael T. Johnson Marquette University Department of Electrical

More information

VOICE CONVERSION BY PROSODY AND VOCAL TRACT MODIFICATION

VOICE CONVERSION BY PROSODY AND VOCAL TRACT MODIFICATION VOICE CONVERSION BY PROSODY AND VOCAL TRACT MODIFICATION K. Sreenivasa Rao Department of ECE, Indian Institute of Technology Guwahati, Guwahati - 781 39, India. E-mail: ksrao@iitg.ernet.in B. Yegnanarayana

More information

Pass Phrase Based Speaker Recognition for Authentication

Pass Phrase Based Speaker Recognition for Authentication Pass Phrase Based Speaker Recognition for Authentication Heinz Hertlein, Dr. Robert Frischholz, Dr. Elmar Nöth* HumanScan GmbH Wetterkreuz 19a 91058 Erlangen/Tennenlohe, Germany * Chair for Pattern Recognition,

More information

SPEAKER IDENTIFICATION

SPEAKER IDENTIFICATION SPEAKER IDENTIFICATION Ms. Arundhati S. Mehendale and Mrs. M. R. Dixit Department of Electronics K.I.T. s College of Engineering, Kolhapur ABSTRACT Speaker recognition is the computing task of validating

More information

Automatic Speech Recognition System: A Review

Automatic Speech Recognition System: A Review Automatic Speech Recognition System: A Review Neerja Arora Assistant Professor KIIT College of Engineering Gurgaon,India ABSTRACT Speech is the most prominent & primary mode of Communication among human

More information

Session 1: Gesture Recognition & Machine Learning Fundamentals

Session 1: Gesture Recognition & Machine Learning Fundamentals IAP Gesture Recognition Workshop Session 1: Gesture Recognition & Machine Learning Fundamentals Nicholas Gillian Responsive Environments, MIT Media Lab Tuesday 8th January, 2013 My Research My Research

More information

Quranic Verse Recitation Feature Extraction using Mel-Frequency Cepstral Coefficient (MFCC)

Quranic Verse Recitation Feature Extraction using Mel-Frequency Cepstral Coefficient (MFCC) University of Malaya From the SelectedWorks of Noor Jamaliah Ibrahim March, 2008 Quranic Verse Recitation Feature Extraction using Mel-Frequency Cepstral Coefficient (MFCC) Noor Jamaliah Ibrahim, University

More information

Lombard Speech Recognition: A Comparative Study

Lombard Speech Recognition: A Comparative Study Lombard Speech Recognition: A Comparative Study H. Bořil 1, P. Fousek 1, D. Sündermann 2, P. Červa 3, J. Žďánský 3 1 Czech Technical University in Prague, Czech Republic {borilh, p.fousek}@gmail.com 2

More information

ELEC9723 Speech Processing

ELEC9723 Speech Processing ELEC9723 Speech Processing COURSE INTRODUCTION Session 1, 2013 s Course Staff Course conveners: Dr. Vidhyasaharan Sethu, v.sethu@unsw.edu.au (EE304) Laboratory demonstrator: Nicholas Cummins, n.p.cummins@unsw.edu.au

More information

Speaker Identification by Comparison of Smart Methods. Abstract

Speaker Identification by Comparison of Smart Methods. Abstract Journal of mathematics and computer science 10 (2014), 61-71 Speaker Identification by Comparison of Smart Methods Ali Mahdavi Meimand Amin Asadi Majid Mohamadi Department of Electrical Department of Computer

More information

On the Use of Long-Term Average Spectrum in Automatic Speaker Recognition

On the Use of Long-Term Average Spectrum in Automatic Speaker Recognition On the Use of Long-Term Average Spectrum in Automatic Speaker Recognition Tomi Kinnunen 1, Ville Hautamäki 2, and Pasi Fränti 2 1 Speech and Dialogue Processing Lab Institution for Infocomm Research (I

More information

Machine Learning and Applications in Finance

Machine Learning and Applications in Finance Machine Learning and Applications in Finance Christian Hesse 1,2,* 1 Autobahn Equity Europe, Global Markets Equity, Deutsche Bank AG, London, UK christian-a.hesse@db.com 2 Department of Computer Science,

More information

PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK

PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK Divya Bansal 1, Ankita Goel 2, Khushneet Jindal 3 School of Mathematics and Computer Applications, Thapar University, Patiala (Punjab) India 1 divyabansal150@yahoo.com

More information

DEEP LEARNING FOR MONAURAL SPEECH SEPARATION

DEEP LEARNING FOR MONAURAL SPEECH SEPARATION DEEP LEARNING FOR MONAURAL SPEECH SEPARATION Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign,

More information

learn from the accelerometer data? A close look into privacy Member: Devu Manikantan Shila

learn from the accelerometer data? A close look into privacy Member: Devu Manikantan Shila What can we learn from the accelerometer data? A close look into privacy Team Member: Devu Manikantan Shila Abstract: A handful of research efforts nowadays focus on gathering and analyzing the data from

More information

Physical Activity Recognition from Accelerometer Data Using a Multi Scale Ensemble Method

Physical Activity Recognition from Accelerometer Data Using a Multi Scale Ensemble Method Physical Activity Recognition from Accelerometer Data Using a Multi Scale Ensemble Method Yonglei Zheng, Weng Keen Wong, Xinze Guan (Oregon State University) Stewart Trost (University of Queensland) Introduction

More information

EE438 - Laboratory 9: Speech Processing

EE438 - Laboratory 9: Speech Processing Purdue University: EE438 - Digital Signal Processing with Applications 1 EE438 - Laboratory 9: Speech Processing June 11, 2004 1 Introduction Speech is an acoustic waveform that conveys information from

More information

Learning facial expressions from an image

Learning facial expressions from an image Learning facial expressions from an image Bhrugurajsinh Chudasama, Chinmay Duvedi, Jithin Parayil Thomas {bhrugu, cduvedi, jithinpt}@stanford.edu 1. Introduction Facial behavior is one of the most important

More information

Sequence Discriminative Training;Robust Speech Recognition1

Sequence Discriminative Training;Robust Speech Recognition1 Sequence Discriminative Training; Robust Speech Recognition Steve Renals Automatic Speech Recognition 16 March 2017 Sequence Discriminative Training;Robust Speech Recognition1 Recall: Maximum likelihood

More information

Applications of Machine Learning Algorithms. Speaker: Mohamed Elwakdy Date: 16/02/

Applications of Machine Learning Algorithms. Speaker: Mohamed Elwakdy Date: 16/02/ Applications of Machine Learning Algorithms Speaker: Mohamed Elwakdy Date: 16/02/2017 Email: mohamed.elwakdy@statslab-bi.co.nz Sponsors Outline & Content What is Machine Learning? Machine Learning Algorithms

More information

Speech Accent Classification

Speech Accent Classification Speech Accent Classification Corey Shih ctshih@stanford.edu 1. Introduction English is one of the most prevalent languages in the world, and is the one most commonly used for communication between native

More information

A Hybrid System for Audio Segmentation and Speech endpoint Detection of Broadcast News

A Hybrid System for Audio Segmentation and Speech endpoint Detection of Broadcast News A Hybrid System for Audio Segmentation and Speech endpoint Detection of Broadcast News Maria Markaki 1, Alexey Karpov 2, Elias Apostolopoulos 1, Maria Astrinaki 1, Yannis Stylianou 1, Andrey Ronzhin 2

More information

Neural Networks used for Speech Recognition

Neural Networks used for Speech Recognition JOURNAL OF AUTOMATIC CONTROL, UNIVERSITY OF BELGRADE, VOL. 20:1-7, 2010 Neural Networks used for Speech Recognition Wouter Gevaert, Georgi Tsenov, Valeri Mladenov, Senior Member, IEEE Abstract In this

More information

Tamil Speech Recognition Using Hybrid Technique of EWTLBO and HMM

Tamil Speech Recognition Using Hybrid Technique of EWTLBO and HMM Tamil Speech Recognition Using Hybrid Technique of EWTLBO and HMM Dr.E.Chandra M.Sc., M.phil., PhD 1, S.Sujiya M.C.A., MSc(Psyc) 2 1. Director, Department of Computer Science, Dr.SNS Rajalakshmi College

More information

Vowel Pronunciation Accuracy Checking System Based on Phoneme Segmentation and Formants Extraction

Vowel Pronunciation Accuracy Checking System Based on Phoneme Segmentation and Formants Extraction Vowel Pronunciation Accuracy Checking System Based on Phoneme Segmentation and Formants Extraction Chanwoo Kim and Wonyong Sung School of Electrical Engineering Seoul National University Shinlim-Dong,

More information

School of Computer Science and Information System

School of Computer Science and Information System School of Computer Science and Information System Master s Dissertation Assessing the discriminative power of Voice Submitted by Supervised by Pasupathy Naresh Trilok Dr. Sung-Hyuk Cha Dr. Charles Tappert

More information

Low-Audible Speech Detection using Perceptual and Entropy Features

Low-Audible Speech Detection using Perceptual and Entropy Features Low-Audible Speech Detection using Perceptual and Entropy Features Karthika Senan J P and Asha A S Department of Electronics and Communication, TKM Institute of Technology, Karuvelil, Kollam, Kerala, India.

More information

Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge

Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge 218 Bengio, De Mori and Cardin Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge Y oshua Bengio Renato De Mori Dept Computer Science Dept Computer Science McGill University

More information

WHEN THERE IS A mismatch between the acoustic

WHEN THERE IS A mismatch between the acoustic 808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,

More information

Sentiment Analysis of Speech

Sentiment Analysis of Speech Sentiment Analysis of Speech Aishwarya Murarka 1, Kajal Shivarkar 2, Sneha 3, Vani Gupta 4,Prof.Lata Sankpal 5 Student, Department of Computer Engineering, Sinhgad Academy of Engineering, Pune, India 1-4

More information

AUTONOMOUS VEHICLE SPEAKER VERIFICATION SYSTEM, 12 MAY Autonomous Vehicle Speaker Verification System

AUTONOMOUS VEHICLE SPEAKER VERIFICATION SYSTEM, 12 MAY Autonomous Vehicle Speaker Verification System AUTONOMOUS VEHICLE SPEAKER VERIFICATION SYSTEM, 12 MAY 2014 1 Autonomous Vehicle Speaker Verification System Aaron Pfalzgraf, Christopher Sullivan, Dr. Jose R. Sanchez Abstract With the increasing interest

More information

Speaker Identification System using Autoregressive Model

Speaker Identification System using Autoregressive Model Research Journal of Applied Sciences, Engineering and echnology 4(1): 45-5, 212 ISSN: 24-7467 Maxwell Scientific Organization, 212 Submitted: September 7, 211 Accepted: September 3, 211 Published: January

More information

A comparison between human perception and a speaker verification system score of a voice imitation

A comparison between human perception and a speaker verification system score of a voice imitation PAGE 393 A comparison between human perception and a speaker verification system score of a voice imitation Elisabeth Zetterholm, Mats Blomberg 2, Daniel Elenius 2 Department of Philosophy & Linguistics,

More information

HMM-Based Emotional Speech Synthesis Using Average Emotion Model

HMM-Based Emotional Speech Synthesis Using Average Emotion Model HMM-Based Emotional Speech Synthesis Using Average Emotion Model Long Qin, Zhen-Hua Ling, Yi-Jian Wu, Bu-Fan Zhang, and Ren-Hua Wang iflytek Speech Lab, University of Science and Technology of China, Hefei

More information

A LEARNING PROCESS OF MULTILAYER PERCEPTRON FOR SPEECH RECOGNITION

A LEARNING PROCESS OF MULTILAYER PERCEPTRON FOR SPEECH RECOGNITION International Journal of Pure and Applied Mathematics Volume 107 No. 4 2016, 1005-1012 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: 10.12732/ijpam.v107i4.18

More information

Comparison between k-nn and svm method for speech emotion recognition

Comparison between k-nn and svm method for speech emotion recognition Comparison between k-nn and svm method for speech emotion recognition Muzaffar Khan, Tirupati Goskula, Mohmmed Nasiruddin,Ruhina Quazi Anjuman College of Engineering & Technology,Sadar, Nagpur, India Abstract

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Automatic Recognition of Speaker Age in an Inter-cultural Context

Automatic Recognition of Speaker Age in an Inter-cultural Context Automatic Recognition of Speaker Age in an Inter-cultural Context Michael Feld, DFKI in cooperation with Meraka Institute, Pretoria FEAST Speaker Classification Purposes Bootstrapping a User Model based

More information

L18: Speech synthesis (back end)

L18: Speech synthesis (back end) L18: Speech synthesis (back end) Articulatory synthesis Formant synthesis Concatenative synthesis (fixed inventory) Unit-selection synthesis HMM-based synthesis [This lecture is based on Schroeter, 2008,

More information

TOWARDS A ROBUST ARABIC SPEECH RECOGNITION SYSTEM BASED ON RESERVOIR COMPUTING. abdulrahman alalshekmubarak. Doctor of Philosophy

TOWARDS A ROBUST ARABIC SPEECH RECOGNITION SYSTEM BASED ON RESERVOIR COMPUTING. abdulrahman alalshekmubarak. Doctor of Philosophy TOWARDS A ROBUST ARABIC SPEECH RECOGNITION SYSTEM BASED ON RESERVOIR COMPUTING abdulrahman alalshekmubarak Doctor of Philosophy Computing Science and Mathematics University of Stirling November 2014 DECLARATION

More information

A method for recognition of coexisting environmental sound sources based on the Fisher s linear discriminant classifier

A method for recognition of coexisting environmental sound sources based on the Fisher s linear discriminant classifier A method for recognition of coexisting environmental sound sources based on the Fisher s linear discriminant classifier Ester Creixell 1, Karim Haddad 2, Wookeun Song 3, Shashank Chauhan 4 and Xavier Valero.

More information

Advances in Music Information Retrieval using Deep Learning Techniques - Sid Pramod

Advances in Music Information Retrieval using Deep Learning Techniques - Sid Pramod Advances in Music Information Retrieval using Deep Learning Techniques - Sid Pramod Music Information Retrieval (MIR) Science of retrieving information from music. Includes tasks such as Query by Example,

More information

Speech Emotion Recognition Using Deep Neural Network and Extreme. learning machine

Speech Emotion Recognition Using Deep Neural Network and Extreme. learning machine INTERSPEECH 2014 Speech Emotion Recognition Using Deep Neural Network and Extreme Learning Machine Kun Han 1, Dong Yu 2, Ivan Tashev 2 1 Department of Computer Science and Engineering, The Ohio State University,

More information

Modulation frequency features for phoneme recognition in noisy speech

Modulation frequency features for phoneme recognition in noisy speech Modulation frequency features for phoneme recognition in noisy speech Sriram Ganapathy, Samuel Thomas, and Hynek Hermansky Idiap Research Institute, Rue Marconi 19, 1920 Martigny, Switzerland Ecole Polytechnique

More information

ISSN: Page 132

ISSN: Page 132 Voice recognition Using back propagation algorithm in neural networks Abdelmajid Hassan Mansour #1, Gafar Zen Alabdeen Salh *2, Hozayfa Hayder Zeen Alabdeen #3 # 1 Assistant Professor, Faculty of Computers

More information

SPECTRUM ANALYSIS OF SPEECH RECOGNITION VIA DISCRETE TCHEBICHEF TRANSFORM

SPECTRUM ANALYSIS OF SPEECH RECOGNITION VIA DISCRETE TCHEBICHEF TRANSFORM SPECTRUM ANALYSIS OF SPEECH RECOGNITION VIA DISCRETE TCHEBICHEF TRANSFORM Ferda Ernawan 1 and Nur Azman Abu, Nanna Suryana 2 1 Faculty of Information and Communication Technology Universitas Dian Nuswantoro

More information

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Amit Juneja and Carol Espy-Wilson Department of Electrical and Computer Engineering University of Maryland,

More information

Ganesh Sivaraman 1, Vikramjit Mitra 2, Carol Y. Espy-Wilson 1

Ganesh Sivaraman 1, Vikramjit Mitra 2, Carol Y. Espy-Wilson 1 FUSION OF ACOUSTIC, PERCEPTUAL AND PRODUCTION FEATURES FOR ROBUST SPEECH RECOGNITION IN HIGHLY NON-STATIONARY NOISE Ganesh Sivaraman 1, Vikramjit Mitra 2, Carol Y. Espy-Wilson 1 1 University of Maryland

More information