Formant Analysis of Vowels in Emotional States of Oriya Speech for Speaker across Gender


Sanjaya Kumar Dash (First Author), sanjaya_145@rediff.com, Assistant Professor, Department of Computer Science and Engineering, Orissa Engineering College, Bhubaneswar, Odisha
Prof. (Dr.) Sanghamitra Mohanty (Second Author), sangham1@rediffmail.com, Former Professor, P.G. Department of Computer Science and Application, Utkal University, Odisha

ABSTRACT
This paper concentrates on formant analysis of the fundamental vowels in emotional states of isolated Oriya word recognition across gender. Each vowel formant is analyzed individually across the data sets. Of the eleven types of rasas (emotional states) recognized in Indian languages, we have tested five, owing to the unavailability of a suitable Oriya-language corpus for the rest. These five major emotions are studied and their properties noted across gender.

Key words: vowels, formants, emotions, VOCAs.

1. INTRODUCTION
Recognition of emotional speech is undoubtedly a challenging task. Data collection in real-life scenarios is often difficult to monitor and acquire, so an experienced artist is needed to simulate a specific emotional state. Different types of emotional states are defined as per the Paninian Pratishakhya, namely erotic (love) (shringar), mirth (happiness) (hasya), pathetic (sad) (karuna), wrath (anger) (roudra), heroism (bira), terror (fear) (bhayanaka), disgusting (boredom) (bibhatsa), marvellous (adbhuta), quietus, motherly affection (batsalya) and devotional (bhakti).

Of these eleven emotions, only five are available for analysis, since recorded data do not yet exist for the others owing to the non-availability of professional artists who can utter the marked texts properly. Those five are anger (R), sadness (K), love (Sh), quietus (S) and normal (N). Different sentences corresponding to these emotions (rasas) have been recorded. This is also needed for speech synthesis: by analyzing these parameters and incorporating them during prosody analysis for speech synthesis as well as speech recognition, a more natural-sounding voice can be synthesized and speech recognition made more accurate.

The need for more choices in voice quality is one of the major issues addressed in speech synthesis in recent years [2,3], especially when considering Voice Output Communication Aids (VOCAs) and the increasing needs of the users of such devices. More emphasis has been placed on the research and production of natural-sounding male, female and child voices, made possible by the introduction of more powerful and flexible synthesizers and research tools [4,5]. As the need for synthetic voices incorporating extralinguistic and paralinguistic properties increases, the amount of analysis required also becomes greater. For rule-based synthesizer systems, problems occur when trying to use data extracted via acoustic analysis from different speakers to model different extralinguistic or paralinguistic properties: this strategy may necessitate an overhaul of the rules in general to accommodate the parametric differences (e.g. segment durations, formant values, pitch, vowel turning points, MFCCs) between the speakers used in the modeling process. The present work was carried out using the WaveSurfer package.

2. EMOTIONS IN SPEECH SIGNAL
Speech signals carry different features, which need detailed study across gender in order to build a standard database of linguistic and paralinguistic factors. These features are in turn influenced by factors such as accent and emotion.

For emotion recognition, different features such as pitch, energy, formants and mel frequency cepstral coefficients are the base units. The formant is the most basic of these, since formants are the natural resonances of the vocal tract, represented by the natural frequencies that shape the excitation source into the output [1]. Study of this aspect gives a good differentiation of the emotional states across gender. Emotion recognition proceeds in three stages: feature extraction, feature selection and feature classification. Here the most fundamental features, the formants, are extracted, and their properties in the different emotional states are then analyzed. Section 2 gives a description of the data collection, representation and analysis; Section 3 presents the results and discussion; Section 4 draws the conclusion.

2.1 Data creation and analysis
For the recognition of emotions in isolated words of Oriya speech, five types of emotional states were recorded and their corresponding vowels analyzed. Because trained professional actors were not available, we were unable to record all of the rasas (emotions) specified in the section above; instead we recorded specific words that reflect the required emotions. For the analysis we took voice recordings of three male and three female speakers, and a total of 750 words were tested across the different emotions.

The vowels are the most interesting class of sounds in any language. Most of the Indian languages have their origin in Sanskrit, and as far as the Indian languages are concerned the utterance of vowels is quite modular and significant: the vowels are uttered independently. Of the nine vowels, five are fundamental: /a/, /i/, /u/, /e/ and /o/. A vowel is classified on the basis of nasality, pitch variation and duration. Speech is in general controlled by the vowels, and these vowels control the accents and emotions of any speaker. All of the above vowels are common to the data sets used here.

2.2 Data Representation
For all sets of data, each formant is ordered by its frequency value. This gives a direct comparison of each individual formant in order of its frequency values (Table 1) with respect to male and female speakers; a minimal sketch of how such formant frequencies can be measured is given below.
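The formant values compared here were measured with the WaveSurfer package mentioned above. Purely as an illustrative sketch, and not the authors' procedure, the following Python fragment estimates vowel formants from a single analysis frame by root-solving an LPC polynomial; the file name, LPC order and 90 Hz threshold are assumptions for the example.

```python
import numpy as np
import librosa

def estimate_formants(frame, sr, order=12):
    """Estimate formant frequencies (Hz) of one speech frame via LPC root-solving."""
    # Pre-emphasis boosts high frequencies so LPC fits the formant peaks better.
    emphasized = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    windowed = emphasized * np.hamming(len(emphasized))
    a = librosa.lpc(windowed, order=order)
    # Roots of the LPC polynomial in the upper half-plane correspond to resonances.
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = sorted(np.angle(r) * sr / (2.0 * np.pi) for r in roots)
    # Discard very low resonances unlikely to be true formants (threshold assumed).
    return [f for f in freqs if f > 90.0][:4]

# Hypothetical recording; the study used 22050 Hz sampling and ~30 ms frames.
y, sr = librosa.load("vowel_a_wrath_male.wav", sr=22050)
frame = y[: int(0.030 * sr)]
print(estimate_formants(frame, sr))
```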

Table 1: Each vowel formant listed in ascending order of formant frequency for each gender (M = male, F = female).

F0        | Wrath (Roudra) | Pathetic (Karuna) | Erotic (Shringar) | Normal     | Quietus
          | M      F       | M      F          | M      F          | M      F   | M      F
Lowest    | /i/    /i/     | /i/    /i/        | /i/    /i/        | /i/    /i/ | /i/    /i/
          | /u/    /u/     | /u/    /u/        | /u/    /u/        | /u/    /e/ | /u/    /u/
          | /o/    /o/     | /e/    /e/        | /o/    /o/        | /o/    /o/ | /e/    /o/
          | /e/    /e/     | /o/    /o/        | /e/    /e/        | /e/    /u/ | /o/    /e/
Highest   | /a/    /a/     | /a/    /a/        | /a/    /a/        | /a/    /a/ | /a/    /a/

F1        | Wrath (Roudra) | Pathetic (Karuna) | Erotic (Shringar) | Normal     | Quietus
          | M      F       | M      F          | M      F          | M      F   | M      F
Lowest    | /u/    /u/     | /u/    /u/        | /u/    /u/        | /u/    /u/ | /u/    /u/
          | /o/    /o/     | /o/    /a/        | /o/    /o/        | /a/    /a/ | /a/    /o/
          | /a/    /a/     | /a/    /o/        | /a/    /a/        | /o/    /o/ | /o/    /a/
          | /e/    /e/     | /e/    /e/        | /e/    /e/        | /e/    /e/ | /e/    /e/
Highest   | /i/    /i/     | /i/    /i/        | /i/    /i/        | /i/    /i/ | /i/    /i/

F2        | Wrath (Roudra) | Pathetic (Karuna) | Erotic (Shringar) | Normal     | Quietus
          | M      F       | M      F          | M      F          | M      F   | M      F
Lowest    | /u/    /a/     | /o/    /e/        | /o/    /u/        | /o/    /o/ | /e/    /o/
          | /e/    /u/     | /e/    /a/        | /a/    /e/        | /e/    /a/ | /o/    /a/
          | /o/    /i/     | /a/    /o/        | /u/    /o/        | /a/    /e/ | /u/    /e/
          | /a/    /e/     | /u/    /u/        | /e/    /a/        | /u/    /u/ | /a/    /u/
Highest   | /i/    /o/     | /i/    /i/        | /i/    /i/        | /i/    /i/ | /i/    /i/

F3        | Wrath (Roudra) | Pathetic (Karuna) | Erotic (Shringar) | Normal     | Quietus
          | M      F       | M      F          | M      F          | M      F   | M      F
Lowest    | /i/    /a/     | /a/    /e/        | /a/    /e/        | /o/    /i/ | /o/    /o/
          | /a/    /i/     | /o/    /o/        | /o/    /i/        | /e/    /o/ | /i/    /i/
          | /e/    /u/     | /e/    /u/        | /u/    /o/        | /a/    /e/ | /u/    /a/
          | /o/    /e/     | /u/    /a/        | /u/    /a/        | /i/    /a/ | /e/    /u/
Highest   | /u/    /o/     | /u/    /i/        | /e/    /u/        | /u/    /u/ | /a/    /e/

Listing each vowel formant in order of its frequency value was chosen here purely for its simplicity. The variation in formant frequency for the same vowel sound was overcome by expressing each individual vowel's F0 formant frequency as a proportion of the highest F0 formant frequency, so that the formant in the highest position attained a value of 100%. The same procedure was repeated for the F1, F2 and F3 formants (where possible); a minimal sketch of this normalization follows Section 2.3 below.

2.3 Data Analysis
For the different emotions, the recordings were analyzed in frames of average duration 30 milliseconds at a sampling rate of 22050 Hz, with FFT filtering and a Hamming window of size 128. For each data set, the male and female data were arranged so that the order of the vowels was identical, and the mean was then calculated. This gives a direct comparison between the male and female data sets in terms of formant frequencies.
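A minimal sketch of this ordering and normalization follows; the /a/ value is taken from Table 2 (Wrath, male, F0), while the other vowels' values are invented placeholders purely to illustrate the procedure.

```python
# Hypothetical F0 formant values (Hz) per vowel; only /a/ comes from Table 2.
f0_wrath_male = {"/i/": 280.0, "/u/": 330.0, "/o/": 390.0, "/e/": 420.0, "/a/": 604.3}

ranked = sorted(f0_wrath_male.items(), key=lambda kv: kv[1])  # lowest -> highest
peak = ranked[-1][1]                                          # highest value maps to 100%

for vowel, hz in ranked:
    print(f"{vowel}: {hz:6.1f} Hz -> {100.0 * hz / peak:5.1f}%")
```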

3. RESULTS AND DISCUSSION
3.1 Comparison across Gender and Emotion
For each of the data sets, the following results were obtained for male and female formant frequency positions across all vowels (Table 2). For the comparison across emotions, the male and female data are analysed separately. The results give the mean for each vowel and can also be presented graphically.

Vowels play an important role in speaker identification. With different emotions the pitch of a person varies; however, proper identification of the vowels through their formants helps in identification, since the variations are quite distinct between male and female speakers. Incorporating this aspect into an identification engine can make the speaker-identification process more efficient. The sketch following Table 2 shows how such a table of means might be computed.

Table 2: Mean formant values (Hz) of vowel /a/ for all speakers.

Emotion    Formant   Male        Female
Wrath      F0         604.3333    743.6667
           F1        1100.667    1430.333
           F2        2602.333    2654.37
           F3        3647        3624
Pathetic   F0         604.3333    709
           F1        1202        1400.5
           F2        2602.333    2970.5
           F3        3372.333    4062.5
Erotic     F0         600         729.3333
           F1        1147.6667   1349.667
           F2        2468.666    2836.333
           F3        3424.3333   4053
Normal     F0         581.6667    715
           F1        1182.667    1460
           F2        2673        2946.333
           F3        3844        4056.333
Quietus    F0         677         699.6667
           F1        1233.667    2168
           F2        2671        2900
           F3        3800        4055
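As a hedged illustration, a table of means like Table 2 could be assembled from per-speaker measurements as sketched below; the individual speaker values here are invented (chosen only so their means match the Wrath/F0 row), not the study's raw data.

```python
import pandas as pd

rows = [
    # (emotion, gender, formant, measured value in Hz), one row per speaker;
    # the study used three male and three female speakers.
    ("Wrath", "Male",   "F0", 600.0), ("Wrath", "Male",   "F0", 610.0),
    ("Wrath", "Male",   "F0", 603.0),
    ("Wrath", "Female", "F0", 740.0), ("Wrath", "Female", "F0", 748.0),
    ("Wrath", "Female", "F0", 743.0),
]
df = pd.DataFrame(rows, columns=["emotion", "gender", "formant", "hz"])

# Mean per (emotion, formant, gender), laid out like Table 2.
table2 = df.groupby(["emotion", "formant", "gender"])["hz"].mean().unstack("gender")
print(table2)
```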

4. CONCLUSION
According to the results, the vowel /i/ has the lowest F0 formant value while the vowel /a/ has the highest, i.e. in any of the emotional states a speaker places little stress on the vowel /i/ and more stress on the vowel /a/. Apart from these two vowels, the values are not the same across all of the vowel cases. Similarly, from Table 2 we find that the male F0 value is lower than that of the female speakers: the female formants generally lie at a higher level, near 700 Hz. This can be taken as an important feature for emotional speech recognition across gender.

REFERENCES
[1] Rabiner, L. and Juang, B. H., Fundamentals of Speech Recognition, Prentice Hall, (1993).
[2] Karlsson, I., Female voices in speech synthesis, Journal of Phonetics, Vol. 19, (1991).
[3] Karlsson, I., Modelling voice variations in female speech synthesis, Speech Communication, Vol. 11, (1992).
[4] Carlson, R., Granstrom, B., Karlsson, I., Experiments with voice modelling in speech synthesis, Speech Communication, Vol. 10, (1991).
[5] Maitland, P., Whiteside, S. P., Beet, S. W., Baghai-Ravary, L., Analysis of Ten Vowel Sounds across Gender and Regional/Cultural Accent.
[6] Mohanty, S., Bhattacharya, S., Bose, S., Swain, S., Recognition of Vowels in Indian Language Paradigm for Designing a Speech Recogniser: A Pattern Recognition Approach, ISCA, (2004).
[7] Mohanty, S., Bhattacharya, S., Bose, S., Swain, S., An Approach to Parametric Based Mood Analysis in Oriya Speech Processing, Proceedings of Frontiers of Research in Speech and Music (FRSM), ITC-SRA, Kolkata, India, (2005).
[8] Oh-Wook Kwon et al., Emotion Recognition by Speech Signal, EUROSPEECH, Geneva, (2003).
[9] Miriam et al., Acoustic Analysis of Spectral and Temporal Changes in Emotional Speech.
[10] Toivanen, J. et al., Automatic Recognition of Emotion in Spoken Finnish: Preliminary Results and Applications.