NEURAL NETWORKS FOR HINDI SPEECH RECOGNITION


Poonam Sharma, Department of CSE & IT, The NorthCap University, Gurgaon, Haryana, India
Anjali Garg, Department of CSE, The NorthCap University, Gurgaon, Haryana, India

Abstract
Automatic Speech Recognition has been a challenging and interesting area of research over the last decades, but very few researchers have worked on Hindi and other Indian languages. This paper presents a detailed study and comparison of various neural networks for Hindi speech recognition. In the first phase, MFCC, LPC and PLP features are calculated; in the second phase these features are fed to various neural networks and their performance is measured. Results show that probabilistic neural networks give better performance than the other methods.

Keywords: Mel Frequency Cepstral Coefficients (MFCC); Linear Predictive Coding (LPC); Perceptual Linear Prediction (PLP); Probabilistic Neural Network (PNN)

I. INTRODUCTION
Automatic Speech Recognition (ASR) has made significant progress in technology as well as in application. Yet there remains a vast performance gap between human speech recognition (HSR) and ASR, which has restrained its full acceptance in real-life situations. Over more than 50 years of research and advancement, speech recognition has achieved huge success, but performance is still the major bottleneck for its practicality, especially for the Hindi language.

Although a great deal of experimental work and results have been achieved for English throughout the world, only limited success has been achieved for Hindi speech recognition. Moreover, Hindi is the fourth-most spoken language in the world; there is therefore huge scope to develop such systems for Hindi. Features are extracted from the input speech sample and, together with the target vector, are used to train neural networks under supervised learning; outputs are adjusted in accordance with the targets. In modern ASR systems, researchers combine basic techniques in order to enhance the recognition rate, which is reported in terms of accuracy. In this work, a database of 150 samples (75 by a male speaker, 75 by a female speaker) is created, and features are extracted using a combination of three feature extraction techniques: Mel Frequency Cepstral Coefficients, Linear Predictive Coding and Perceptual Linear Prediction (MFCC-LPC-PLP). The neural networks are trained on these samples, and samples are then tested against the various networks.

Fig 1 shows the basic architecture of a probabilistic neural network. A speech model can be viewed as a Bayesian network: given the language model and the acoustic model, we must find the probability of a particular word being spoken. Probabilistic neural networks are therefore well suited to speech recognition.

Fig 1. Architecture of Probabilistic Neural Networks

Back propagation networks work on the philosophy that it is better to learn from errors. In this network, the errors between the output and target vectors are propagated back through the network during training to increase the accuracy of the system. The basic architecture is shown in Fig 2.
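The probabilistic scoring just described can be sketched in a few lines. This is an illustrative pure-Python version, not the paper's Matlab implementation; the word labels, toy feature vectors and smoothing parameter sigma are invented for the example:

```python
import math

def class_output(v, examples, sigma):
    # Parzen-window density estimate for one class:
    # O = (1/(2*pi*sigma^2)^(P/2)) * (1/Q) * sum_q exp(-||v - x_q||^2 / (2*sigma^2))
    p, q = len(v), len(examples)
    norm = (2 * math.pi * sigma ** 2) ** (p / 2)
    total = sum(
        math.exp(-sum((vi - xi) ** 2 for vi, xi in zip(v, x)) / (2 * sigma ** 2))
        for x in examples
    )
    return total / (q * norm)

def classify(v, training_sets, sigma=1.0):
    # pick the class whose density estimate at v is highest
    return max(training_sets, key=lambda label: class_output(v, training_sets[label], sigma))

# Two invented word classes with toy 2-D feature vectors
words = {"ek": [[0.0, 0.0], [0.2, 0.1]], "do": [[5.0, 5.0], [5.2, 4.9]]}
```

Here classify([0.1, 0.0], words) picks "ek", because the Gaussian kernels centred on that word's training vectors dominate the density estimate at the test point.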

Fig 2. Architecture of Back Propagation Neural Networks

Linear neural networks and perceptrons are the simplest neural networks. They work better when the database is small and the system does not involve much complexity. Their basic architectures are shown in Fig 3 and Fig 4.

Fig 3. Architecture of Linear Neural Networks

Fig 4. Architecture of Perceptron Neural Networks

II. PROPOSED METHOD
In this paper the algorithm for Hindi speech recognition is designed based on the results obtained from different feature extraction techniques, namely MFCC, PLP and LPC.

A. Algorithm for Recognition
The proposed algorithm is implemented in Matlab 2012a. The following steps are followed for recognizing speech:

Step 1: At the input, the speech signal p_i is given.
Step 2: Perform windowing using a Hamming window of 25 ms and apply the Discrete Fourier Transform:
    w(n) = 0.54 - 0.46 cos(2πn/N), 0 ≤ n ≤ N    (1)
    X(k) = Σ_{n=0}^{N-1} x(n) e^{-j2πkn/N}, 0 ≤ k ≤ N-1    (2)
Step 3: Compute features using Mel Frequency Cepstral Coefficients (MFCC); the Mel frequency is given by:
    Mel(f) = 2595 log10(1 + f/700)    (3)
Step 4: Compute features using Linear Predictive Coding (LPC).
Step 5: Compute features using Perceptual Linear Prediction (PLP).
Step 6: The final input vector, obtained by merging the features from Steps 3, 4 and 5, is:
    F = [MFCC LPC PLP]
Step 7: Create the final target vector for training.
Step 8: Import the final input vector and the target vector and create the different neural networks. For the probabilistic neural network the output criterion is:
    O_i = [1/(2πσ²)^(P/2)] (1/Q) Σ_{q=1}^{Q} exp(-‖V - X_q‖²/(2σ²))    (4)
Step 9: Train the network and simulate the results.
Step 10: Test the selected sample against the neural network.
Step 11: If the word is spoken correctly, the speech is recognized and CORRECT is displayed; otherwise the speech is not recognized and INCORRECT is displayed.
Step 12: Plot the performance plot.
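Steps 2 and 3 can be illustrated directly from equations (1)-(3). The following is a minimal pure-Python sketch, not the paper's Matlab code; a real front end would use an FFT and a Mel filter bank rather than this naive DFT:

```python
import math
import cmath

def hamming(N):
    # w(n) = 0.54 - 0.46*cos(2*pi*n/N) for n = 0..N, eq. (1)
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / N) for n in range(N + 1)]

def dft(x):
    # X(k) = sum_{n=0}^{N-1} x(n) * e^{-j*2*pi*k*n/N}, eq. (2)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def hz_to_mel(f):
    # Mel(f) = 2595 * log10(1 + f/700), eq. (3)
    return 2595 * math.log10(1 + f / 700)
```

A frame is multiplied sample-by-sample by the window before the DFT; hz_to_mel then maps spectral frequencies onto the Mel scale, which is roughly linear below 1 kHz (hz_to_mel(1000) comes out close to 1000) and logarithmic above it.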

2.2 Flowchart
The flowchart of the proposed method is shown in Fig 5.

Fig 5: Flowchart

III. EXPERIMENT AND RESULT
The simulated results are given below. In this paper we have compared several neural networks and feature extraction techniques.

3.1 Output of Algorithm
An average accuracy of 82.66% is achieved for the male speaker and 80% for the female speaker, so an overall average accuracy of 81.33% is achieved at the output.

Table 1: Output of algorithm
Speaker   Samples   Recognized   Accuracy
Male      75        62           82.66%
Female    75        60           80%
Overall accuracy: 81.33%

For the male speaker the output gives better results; the best performance of 0.03 is achieved at epoch 20. The performance plot is shown in Fig 6.

Fig 6. Probabilistic Neural Network Training Performance Plot of male speaker

3.2 Simulated Results of various Feature Extraction Techniques
1. Results when only Mel Frequency Cepstral Coefficients (MFCC) are used:

Table 2: Accuracy using only MFCC
Samples   Recognized   Accuracy
75        57           76%
75        60           80%
Accuracy using only MFCC: 78%

2. Results when only Linear Predictive Coding (LPC) is used:

Table 3: Accuracy using only LPC
Samples   Recognized   Accuracy
75        51           68%
75        53           70.66%
Accuracy using only LPC: 69.33%
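The percentages in these tables are plain correct-count ratios, with the overall figure being the average over the two speakers; note that the paper truncates rather than rounds to two decimals (53/75 appears as 70.66%). A small illustrative helper, not part of the paper's code:

```python
def accuracy_pct(recognized, total):
    # truncate to two decimal places, matching the paper's 70.66-style figures
    return int(10000 * recognized / total) / 100

# Table 3 (LPC): per-speaker accuracies and their average
lpc = [accuracy_pct(51, 75), accuracy_pct(53, 75)]
overall = sum(lpc) / len(lpc)
```

This reproduces 68%, 70.66% and the stated overall figure of 69.33%.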

3. Results when only Perceptual Linear Prediction (PLP) is used:

Table 4: Accuracy using only PLP
Samples   Recognized   Accuracy
75        48           64%
75        45           60%
Accuracy using only PLP: 62%

4. Results when the combination of MFCC, LPC and PLP is used:

Table 5: Accuracy using MFCC-LPC-PLP
Samples   Recognized   Accuracy
75        62           82.66%
75        60           80%
Accuracy using MFCC, LPC and PLP: 81.33%

COMPARISON OF VARIOUS FEATURE EXTRACTION TECHNIQUES

Table 6: Comparison of feature extraction techniques
SNo.   Technique                                     Accuracy
1.     Mel-Frequency Cepstral Coefficient (MFCC)     78%
2.     Linear Predictive Coding (LPC)                69.33%
3.     Perceptual Linear Prediction (PLP)            62%
4.     Combined MFCC-LPC-PLP                         81.33%

3.3 Simulated Results of various Neural Networks
1. Probabilistic Neural Network

Table 7: Probabilistic Neural Network
Samples   Recognized   Accuracy
75        62           82.66%
75        60           80%
Probabilistic Neural Network: 81.33%

2. Feed Forward Back Propagation Network

Table 8: Feed Forward Back Propagation Network
Samples   Recognized   Accuracy
75        59           78.66%
Feed Forward Back Propagation Network: 79%

3. Perceptron Neural Network

Table 9: Perceptron Neural Network
Samples   Recognized   Accuracy
75        54           72%
75        56           74.66%
Perceptron Neural Network: 73.33%

4. Linear Neural Network

Table 10: Linear Neural Network
Samples   Recognized   Accuracy
75        53           70.66%

75        51           68%
Linear Neural Network: 69.33%

COMPARISON OF DIFFERENT NEURAL NETWORKS

Table 11: Comparison of different Neural Networks
SNo.   Technique                      Accuracy
1.     Probabilistic Neural Network   81.33%
2.     Feed-Forward BPN               79%
3.     Perceptron Neural Network      73.33%
4.     Linear Neural Network          69.33%

IV. CONCLUSION
In this paper we have compared various neural network techniques for Hindi speech recognition, and it was observed that probabilistic neural networks work better than the other state-of-the-art networks. This work can be extended by incorporating other features to increase the recognition rate. Since the comparison was done only for isolated words, it can also be extended to continuous speech, and the same techniques can be applied to design recognition systems for other Indian languages.

V. REFERENCES
[1] J. M. Baker; L. Deng; J. Glass; S. Khudanpur; C. Lee; N. Morgan; D. O'Shaughnessy. (2009). Research developments and directions in speech recognition and understanding, part 2. IEEE Signal Process. Mag., vol. 26, no. 4, pp. 78-85.
[2] L. Deng; X. Li. (2013). Machine learning paradigms for speech recognition: An overview. IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 1060-1089.
[3] S. Sinha; S. S. Aggarwal; Aruna Jain. (2013). Continuous Density Hidden Markov Models for Context Dependent Hindi Speech Recognition. ICACCI, pp. 1953-1958.
[4] A. Mohamed; T. Sainath; G. Dahl; B. Ramabhadran; G. Hinton; M. Picheny. (2011). Deep belief networks using discriminative features for phone recognition. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 5060-5063.
[5] T. Yoshioka; M. J. F. Gales. (2014). Environmentally robust ASR for deep neural network acoustic models. Computer Speech and Language, pp. 65-86.
[6] O. Abdel-Hamid; A. R. Mohamed; H. Jiang. (2014). Convolutional Neural Networks for speech recognition. IEEE/ACM Trans. Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1533-1545.
[7] L. R. Rabiner; B. H. Juang. (1986). An introduction to Hidden Markov Models. IEEE Signal Process. Mag., pp. 4-16.
[8] Sanghmitra V. Arora. (2013). Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition System. ACEEE International Journal on Signal and Image Processing, vol. 4, no. 3, pp. 50-55.