Formant Analysis of Vowels in Emotional States of Oriya Speech for Speaker across Gender

Sanjaya Kumar Dash (First Author), e-mail: sanjaya_145@rediff.com, Assistant Professor, Department of Computer Science and Engineering, Orissa Engineering College, Bhubaneswar, Odisha
Prof. (Dr.) Sanghamitra Mohanty (Second Author), e-mail: sangham1@rediffmail.com, Former Professor, P.G. Department of Computer Science and Application, Utkal University, Odisha

ABSTRACT
This paper concentrates on formant analysis of the fundamental vowels in emotional states of isolated Oriya word recognition across gender. Each vowel formant is analyzed individually across data sets. Of the eleven types of rasas (emotional states) recognized in Indian languages, we have tested five, owing to the unavailability of a suitable corpus in the Oriya language for the rest. Five major emotions are studied and their properties are noted across gender.

Keywords: vowels, formants, emotions, VOCAs.

1. INTRODUCTION
Recognition of emotional speech is no doubt a challenging task. Real-life data is often difficult to monitor and acquire, so experienced artists are needed to simulate a specific emotional state. Different types of emotional states are defined as per the Paninian Pratishakhya. These are erotic (love) (shringar), mirth (happiness) (hasya), pathetic (sad) (karuna), wrath (anger) (roudra), heroism (bira), terror (fear) (bhayanaka), disgusting (boredom) (bibhatsa), marvellous (adbhuta), quietus,
motherly affection (batsalya) and devotional (bhakti). Of these eleven emotions, only five are available for analysis at present, since recorded data do not exist for the rest owing to the non-availability of professional artists who can utter the marked texts properly. These five are anger (R), sadness (K), love (Sh), quietus (S) and normal (N). Different sentences corresponding to these emotions (rasas) have been recorded. This is also needed for speech synthesis: by analyzing these parameters and incorporating them into the prosody-analysis stage of speech synthesis and speech recognition, a more natural-sounding voice can be synthesized and recognition made more accurate.

The need for more choices in voice quality is one of the major issues addressed in speech synthesis in recent years [2,3], especially when considering Voice Output Communication Aids (VOCAs) and the increasing needs of users of such devices. More emphasis has been placed on the research and production of more natural-sounding male, female and child voices, made possible by the introduction of more powerful and flexible synthesizers and research tools [4,5]. As the need for synthetic voices incorporating extralinguistic and paralinguistic properties increases, the amount of analysis required also becomes greater. For rule-based synthesizer systems, problems occur when trying to use data extracted via acoustic analysis from different speakers to model different extralinguistic or paralinguistic properties. This strategy may necessitate an overhaul of the rules in general to accommodate the parametric differences (e.g. segment durations, formant values, pitch, vowel turning points, MFCCs) between the speakers used in the modeling process. The analysis in this work was carried out using the WaveSurfer package.

2. EMOTIONS IN SPEECH SIGNAL
Speech signals carry different features, which need detailed study across gender in order to build a standard database of different linguistic and paralinguistic factors. These features are in turn influenced by factors such as accent and emotion. For emotion
recognition, features such as pitch, energy, formants and mel-frequency cepstral coefficients are the basic units. The formants are the most fundamental aspect, being the natural resonances of the vocal tract, representable as the natural frequencies that relate the excitation source to the output [1]. Studies of this aspect give a good differentiation of emotional states across gender. Emotion recognition proceeds in three stages: feature extraction, feature selection and feature classification. Here the most fundamental features, the formants, are extracted and then analyzed to study their properties in different emotional states. Section 2 describes the data collection, representation and analysis; Section 3 presents the results and discussion; Section 4 gives the conclusion.

2.1 Data creation and analysis
For the recognition of emotions in isolated words in Oriya speech, five types of emotional states were recorded and their corresponding vowels analyzed. Because of the non-availability of trained professional actors, we were unable to record all the rasas (emotions) specified in the section above. We recorded specific words that reflect the required emotions. For the analysis we used voice recordings of three male and three female speakers, and a total of 750 words were tested for the different emotions. The vowels are the most interesting class of sounds in any language. Most Indian languages have their origin in Sanskrit, and as far as the Indian languages are concerned, the utterance of vowels is quite modular and significant: the vowels are uttered independently. Out of nine vowels, five are fundamental: /a/, /i/, /u/, /e/, /o/.
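The formant-extraction stage can be illustrated with a small sketch. The paper's formants were measured with WaveSurfer; the code below instead uses the standard LPC autocorrelation method (a common basis for such tools) on a synthetic two-resonance signal. The signal, LPC order and frequency thresholds are our own illustrative assumptions, not the paper's settings.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """LPC coefficients via the autocorrelation (normal-equations) method."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]  # lags 0..order
    # Toeplitz system R a = r[1:], solved directly for this small sketch
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))  # prediction-error polynomial A(z)

def formant_candidates(frame, fs, order=8):
    """Candidate formant frequencies (Hz) from the angles of the A(z) roots."""
    a = lpc_coefficients(frame * np.hamming(len(frame)), order)
    roots = [rt for rt in np.roots(a) if rt.imag > 0]  # one per conjugate pair
    freqs = sorted(np.angle(rt) * fs / (2.0 * np.pi) for rt in roots)
    return [f for f in freqs if 90.0 < f < fs / 2 - 90.0]  # drop edge roots

# Synthetic frame with resonance-like components near 700 Hz and 1200 Hz
fs = 22050
t = np.arange(512) / fs
rng = np.random.default_rng(0)
frame = (np.sin(2 * np.pi * 700 * t)
         + 0.5 * np.sin(2 * np.pi * 1200 * t)
         + 1e-3 * rng.standard_normal(t.size))
print([round(f) for f in formant_candidates(frame, fs)])
```

The dominant LPC poles land near the two component frequencies; in real vowel analysis the lowest such peaks correspond to F1, F2, and so on.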
A vowel is classified on the basis of nasality, pitch variation and duration. Speech is controlled by the vowels in general, and these vowels govern the accents and emotions of any speaker. All of the above vowels are common to the data sets studied here.

2.2 Data Representation
For all sets of data, each formant is ordered by its frequency value. This gives a direct comparison of each individual formant, in order of frequency (Table 1), between male and female speakers.
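The ordering used for Table 1 can be sketched as follows; the frequency values here are hypothetical placeholders for one speaker, not measurements from this study.

```python
# Hypothetical measured F1 values (Hz) for the five fundamental vowels
f1_hz = {"/a/": 1100, "/i/": 2300, "/u/": 350, "/e/": 1900, "/o/": 500}

# Rank the vowels from lowest to highest formant frequency, as in Table 1
ranked = sorted(f1_hz, key=f1_hz.get)
print(ranked)  # ['/u/', '/o/', '/a/', '/e/', '/i/']
```

Each column of Table 1 is exactly such a ranking, computed per formant, per emotion and per gender.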
Table 1: Vowels listed for each gender in ascending order (lowest to highest) of formant frequency.

F0        Wrath (Roudra)     Pathetic (Karuna)   Erotic (Shringar)   Normal              Quietus
          Male     Female    Male     Female     Male     Female     Male     Female     Male     Female
Lowest    /i/      /i/       /i/      /i/        /i/      /i/        /i/      /i/        /i/      /i/
          /u/      /u/       /u/      /u/        /u/      /u/        /u/      /e/        /u/      /u/
          /o/      /o/       /e/      /e/        /o/      /o/        /o/      /o/        /e/      /o/
          /e/      /e/       /o/      /o/        /e/      /e/        /e/      /u/        /o/      /e/
Highest   /a/      /a/       /a/      /a/        /a/      /a/        /a/      /a/        /a/      /a/

F1        Wrath (Roudra)     Pathetic (Karuna)   Erotic (Shringar)   Normal              Quietus
          Male     Female    Male     Female     Male     Female     Male     Female     Male     Female
Lowest    /u/      /u/       /u/      /u/        /u/      /u/        /u/      /u/        /u/      /u/
          /o/      /o/       /o/      /a/        /o/      /o/        /a/      /a/        /a/      /o/
          /a/      /a/       /a/      /o/        /a/      /a/        /o/      /o/        /o/      /a/
          /e/      /e/       /e/      /e/        /e/      /e/        /e/      /e/        /e/      /e/
Highest   /i/      /i/       /i/      /i/        /i/      /i/        /i/      /i/        /i/      /i/
F2        Wrath (Roudra)     Pathetic (Karuna)   Erotic (Shringar)   Normal              Quietus
          Male     Female    Male     Female     Male     Female     Male     Female     Male     Female
Lowest    /u/      /a/       /o/      /e/        /o/      /u/        /o/      /o/        /e/      /o/
          /e/      /u/       /e/      /a/        /a/      /e/        /e/      /a/        /o/      /a/
          /o/      /i/       /a/      /o/        /u/      /o/        /a/      /e/        /u/      /e/
          /a/      /e/       /u/      /u/        /e/      /a/        /u/      /u/        /a/      /u/
Highest   /i/      /o/       /i/      /i/        /i/      /i/        /i/      /i/        /i/      /i/

F3        Wrath (Roudra)     Pathetic (Karuna)   Erotic (Shringar)   Normal              Quietus
          Male     Female    Male     Female     Male     Female     Male     Female     Male     Female
Lowest    /i/      /a/       /a/      /e/        /a/      /e/        /o/      /i/        /o/      /o/
          /a/      /i/       /o/      /o/        /o/      /i/        /e/      /o/        /i/      /i/
          /e/      /u/       /e/      /u/        /u/      /o/        /a/      /e/        /u/      /a/
          /o/      /e/       /i/      /a/        /i/      /a/        /i/      /a/        /e/      /u/
Highest   /u/      /o/       /u/      /i/        /e/      /u/        /u/      /u/        /a/      /e/

Listing each vowel formant in order of its frequency value was chosen here purely for its simplicity. The variation in formant frequency for the same vowel sound was then accounted for by expressing each individual vowel F0 frequency as a proportion of the highest F0 frequency in the set; thus the formant in the highest position attains a value of 100%. The same procedure was repeated for the F1, F2 and F3 formants (where possible).

2.3 Data Analysis
For the different emotions, recordings were analyzed over segments of about 30 ms average duration at a sampling rate of 22050 Hz, with FFT filtering and a Hamming window of size 128. For each data set, the male and female data were arranged so that the order of the vowels was identical, and the mean was then calculated for comparison. This gives a direct comparison between the male and female data sets in terms of formant frequencies.
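The proportional representation described in Section 2.2, where each formant value is expressed relative to the highest value in its set (which becomes 100%), can be sketched as follows. The frequency values are illustrative placeholders, not the paper's measurements.

```python
# Hypothetical F0 values (Hz) for one speaker and emotion
f0_hz = {"/i/": 280, "/u/": 310, "/o/": 450, "/e/": 480, "/a/": 604}

# Scale every value by the set's maximum so the top vowel becomes 100%
peak = max(f0_hz.values())
relative = {v: round(100 * hz / peak, 1) for v, hz in f0_hz.items()}
print(relative)  # /a/ attains 100.0; the rest are percentages of it
```

This removes absolute-frequency differences between speakers while preserving the relative spacing of the vowels.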
3. RESULTS AND DISCUSSION
3.1 Comparison across Gender and Emotion
For each set of data, the following results were obtained for male and female formant frequency position across all vowels (Table 2). For the comparison across emotion, the male and female data are analysed separately. The results give the mean of each vowel and can be presented in graphical format. During speaker identification, vowels play an important role. With different emotions the pitch of a person varies; however, proper identification of the vowels through their formants helps in identification, as the variations are quite distinct between male and female. Incorporating this aspect into an identification engine can make the speaker-identification process more efficient.

Table 2: Mean formant values (Hz) of vowel /a/ for all speakers

Emotion     Formant    Male         Female
Wrath       F0         604.3333     743.6667
            F1         1100.667     1430.333
            F2         2602.333     2654.37
            F3         3647         3624
Pathetic    F0         604.3333     709
            F1         1202         1400.5
            F2         2602.333     2970.5
            F3         3372.333     4062.5
Erotic      F0         600          729.3333
            F1         1147.6667    1349.667
            F2         2468.666     2836.333
            F3         3424.3333    4053
Normal      F0         581.6667     715
            F1         1182.667     1460
            F2         2673         2946.333
            F3         3844         4056.333
Quietus     F0         677          699.6667
            F1         1233.667     2168
            F2         2671         2900
            F3         3800         4055
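The per-gender means in Table 2 are simple averages over the three speakers of each gender. As a sketch, the per-speaker F0 values below are hypothetical numbers chosen to reproduce the Wrath row of Table 2; the paper reports only the means, not the individual measurements.

```python
# Hypothetical F0 (Hz) of vowel /a/ for three male and three female speakers
f0_vowel_a = {"male": [590, 605, 618], "female": [735, 744, 752]}

# Mean per gender, as reported in Table 2
means = {gender: sum(vals) / len(vals) for gender, vals in f0_vowel_a.items()}
print(means)
```

The same averaging is applied for F1-F3 and for each emotion before the male and female sets are compared.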
4. CONCLUSION
According to the results, the vowel /i/ has the lowest F0 value while the vowel /a/ has the highest, i.e. in any of the emotional states a speaker puts little stress on the vowel /i/ and more stress on the vowel /a/. Apart from these two vowels, the values are not the same across all cases. Similarly, Table 2 shows that the male F0 values are lower than those of the female speakers; the female formants are generally higher, with F0 near 700 Hz. This can be taken as an important feature for emotional speech recognition across gender.

REFERENCES
[1] Rabiner, L. and Juang, B.H., Fundamentals of Speech Recognition, Prentice Hall, (1993).
[2] Karlson, I., Female voices in speech synthesis, Journal of Phonetics, Vol. 19, (1991).
[3] Karlson, I., Modelling voice variations in female speech synthesis, Speech Communication, Vol. 11, (1992).
[4] Carlson, R., Granstrom, B., Karlson, I., Experiments with voice modelling in speech synthesis, Speech Communication, Vol. 10, (1991).
[5] Maitland, P., Whiteside, S. P., Beet, S. W., Baghai-Ravary, L., Analysis of Ten Vowel Sounds across Gender and Regional/Cultural Accent.
[6] Mohanty, S., Bhattacharya, S., Bose, S., Swain, S., Recognition of Vowels in Indian Language Paradigm for Designing a Speech Recogniser: A Pattern Recognition Approach, ISCA, (2004).
[7] Mohanty, S., Bhattacharya, S., Bose, S., Swain, S., An Approach to Parametric Based Mood Analysis in Oriya Speech Processing, Proceedings of Frontiers of Research in Speech and Music (FRSM), ITC SRA, Kolkata, India, (2005).
[8] Oh-Wook Kwon et al., Emotion Recognition by Speech Signal, EUROSPEECH, Geneva, (2003).
[9] Miriam et al., Acoustic Analysis of Spectral and Temporal Changes in Emotional Speech.
[10] Toivanen, J. et al., Automatic Recognition of Emotion in Spoken Finnish: Preliminary Results and Applications.