Formant Analysis of Vowels in Emotional States of Oriya Speech for Speaker across Gender
Sanjaya Kumar Dash (First Author), Assistant Professor, Department of Computer Science and Engineering, Orissa Engineering College, Bhubaneswar, Odisha
Prof. (Dr.) Sanghamitra Mohanty (Second Author), Former Professor, P.G. Department of Computer Sc. and Application, Utkal University, Odisha

ABSTRACT
This paper concentrates on formant analysis of the fundamental vowels in emotional states for isolated Oriya word recognition across gender. Each formant of each vowel is analyzed individually across data sets. Of the eleven types of rasas (emotional states) described for Indian languages, we have tested five, because a proper corpus for the others is not available in Oriya. These five major emotions are studied and their properties are noted across gender.

Key words: vowels, formants, emotions, VOCAs.

1. INTRODUCTION
Recognition of emotional speech is a challenging task. Data from real-life scenarios are often difficult to monitor and acquire, so an experienced artist is needed to simulate a specific emotional state. Different types of emotional states are defined as per Paninian Pratishakhya. They are erotic (love) (shringar), mirth (happiness) (hasya), pathetic (sad) (karuna), wrath (anger) (roudra), heroism (bira), terror (fear) (bhayanaka), disgusting (boredom) (bibhatsa), marvellous (adbhuta), quietus, motherly affection (batsalya) and devotional (bhakti). Of these eleven emotions only five are available for analysis, because recorded data do not yet exist for the others owing to the non-availability of professional artists who can utter the marked texts properly. The five are anger (R), sadness (K), love (Sh), quietus (S) and normal (N). Different sentences corresponding to these emotions (rasas) are being recorded. This is also needed for speech synthesis. By analyzing these parameters and incorporating them in the algorithm during prosody analysis, for speech synthesis as well as speech recognition, a more natural voice can be synthesized and speech recognition will be more accurate.

The need for more choices in voice qualities is one of the major issues that has been addressed in speech synthesis in recent years [2,3], especially when considering Voice Output Communication Aids (VOCAs) and the increasing needs of users of such devices. More emphasis has been placed on the research and production of more natural sounding male, female and child voices, made possible by the introduction of more powerful and flexible synthesizers and research tools [4,5]. As the need for more synthetic voices incorporating extralinguistic and paralinguistic properties increases, the amount of analysis required also becomes greater. For rule based synthesizer systems, problems occur when trying to use data extracted via acoustic analysis from different speakers to model different extralinguistic or paralinguistic properties. This strategy may necessitate an overhaul of the rules in general to accommodate the parametric differences (e.g. segment durations, formant values, pitch, vowel turning points, MFCCs) between the speakers utilized in the modeling process. The work is done using the WaveSurfer package.

2. EMOTIONS IN SPEECH SIGNAL
Speech signals carry different features, which need detailed study across gender in order to build a standard database of linguistic and paralinguistic factors. These features are in turn influenced by factors such as accent and emotion. For emotion recognition, features like pitch, energy, formants and mel frequency cepstral coefficients are the base units. Formants are the most basic of these: they are the natural resonances of the vocal tract, and can be represented through the natural frequencies that relate the excitation source to the output [1]. Studies of this aspect give a good differentiation of emotional states across gender. Emotion recognition occurs in three stages: feature extraction, feature selection and feature classification. The most fundamental features, the formants, are extracted first, and their properties in different emotional states are then analyzed. Section 2 describes the data collection, representation and analysis. Section 3 presents the results and discussion, and Section 4 gives the conclusion.

2.1 Data creation and analysis
For the recognition of emotions in isolated words in Oriya speech, five types of emotional states are recorded and their corresponding vowels are analyzed. Because trained professional actors are not available, we are unable to record all of the rasas (emotions) listed in the section above. We have recorded some specified words which reflect the required emotions. For the analysis we have taken voice recordings of three male speakers and three female speakers. A total of 750 words are tested for the different emotions.

The vowels are the most interesting class of sound in any language. Most of the Indian languages have their origin in Sanskrit. As far as the Indian languages are concerned, the utterance of vowels is quite modular and significant. The vowels are uttered independently. Out of nine, there are five fundamental vowels: /a/, /i/, /u/, /e/ and /o/.
A vowel is classified on the basis of nasality, pitch variation and duration. Speech is controlled by the vowels in general, and these vowels control the accent and emotion of any speaker. All of the above vowels are common to the above data sets.

2.2 Data Representation
For all sets of data each formant is ordered in terms of its frequency value. This gives a direct comparison of the individual formants, in order of their frequency values (Table 1), between male and female speakers.
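The ordering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frequency values, the dictionary `EXAMPLE_HZ` and the function names are hypothetical and are not taken from the paper's measurements.

```python
# Hypothetical formant values in Hz, chosen only to illustrate the ordering step.
# They are NOT measurements from the paper.
EXAMPLE_HZ = {
    "male":   {"a": 700.0, "i": 300.0, "u": 320.0, "e": 450.0, "o": 480.0},
    "female": {"a": 850.0, "i": 350.0, "u": 380.0, "e": 520.0, "o": 560.0},
}


def rank_vowels(values_hz):
    """Return the vowel labels sorted from lowest to highest frequency."""
    return sorted(values_hz, key=values_hz.get)


def ranking_by_gender(table):
    """Apply rank_vowels to each gender's values, giving one ordered column each."""
    return {gender: rank_vowels(values) for gender, values in table.items()}


if __name__ == "__main__":
    for gender, order in ranking_by_gender(EXAMPLE_HZ).items():
        print(gender, " < ".join(f"/{v}/" for v in order))
```

Each ordered list corresponds to one Male or Female column of a table laid out from "Lowest" to "Highest".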
Table 1: Vowels listed in ascending order of formant frequency for each emotion and gender (M = male, F = female).

         Wrath       Pathetic    Erotic      Normal      Quietus
         (Roudra)    (Karuna)    (Shringar)
         M     F     M     F     M     F     M     F     M     F

F0
Lowest   /i/   /i/   /i/   /i/   /i/   /i/   /i/   /i/   /i/   /i/
         /u/   /u/   /u/   /u/   /u/   /u/   /u/   /e/   /u/   /u/
         /o/   /o/   /e/   /e/   /o/   /o/   /o/   /o/   /e/   /o/
         /e/   /e/   /o/   /o/   /e/   /e/   /e/   /u/   /o/   /e/
Highest  /a/   /a/   /a/   /a/   /a/   /a/   /a/   /a/   /a/   /a/

F1
Lowest   /a/   /u/   /u/   /u/   /u/   /u/   /u/   /u/   /u/   /u/
         /u/   /o/   /o/   /o/   /a/   /o/   /o/   /a/   /a/   /a/
         /o/   /a/   /a/   /a/   /o/   /a/   /a/   /o/   /o/   /o/
         /e/   /e/   /e/   /e/   /e/   /e/   /e/   /e/   /e/   /e/
Highest  /i/   /i/   /i/   /i/   /i/   /i/   /i/   /i/   /i/   /i/

F2
Lowest   /u/   /a/   /o/   /e/   /o/   /u/   /o/   /o/   /e/   /o/
         /e/   /u/   /e/   /a/   /a/   /e/   /e/   /a/   /o/   /a/
         /o/   /i/   /a/   /o/   /u/   /o/   /a/   /e/   /u/   /e/
         /a/   /e/   /u/   /u/   /e/   /a/   /u/   /u/   /a/   /u/
Highest  /i/   /o/   /i/   /i/   /i/   /i/   /i/   /i/   /i/   /i/

F3
Lowest   /i/   /a/   /a/   /e/   /a/   /e/   /o/   /i/   /o/   /o/
         /a/   /i/   /o/   /o/   /o/   /i/   /e/   /o/   /i/   /i/
         /e/   /u/   /e/   /u/   /u/   /o/   /a/   /e/   /u/   /a/
         /o/   /e/   /u/   /a/   /u/   /a/   /i/   /a/   /e/   /u/
Highest  /u/   /o/   /u/   /i/   /e/   /u/   /u/   /u/   /a/   /e/

Listing each vowel formant in order of its frequency value was chosen here purely for its simplicity. The variation in formant frequency for the same vowel sound was overcome by making each individual vowel's F0 frequency proportional to the highest F0 frequency value. Thus, the formant in the highest position attained a value of 100%. The same procedure is repeated for the F1, F2 and F3 formants (where possible).

2.3 Data Analysis
For the different emotions, recordings were made with an average duration of 30 milliseconds at a sampling rate of 22050 Hz, using FFT filtering and a Hamming window of size 128. For each data set, the male and female data were arranged so that the order of the vowels was identical. The mean was then calculated for comparison. This gives a direct basis for comparing the male and female data sets in terms of formant frequencies.
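The normalization and windowing described above can be illustrated with a short Python sketch. It assumes the standard 0.54/0.46 Hamming definition and an FFT length equal to the window size; neither detail is stated in the paper, and the function names and example frequencies are hypothetical.

```python
import math


def normalize_to_highest(values_hz):
    """Express each value as a percentage of the largest one (largest -> 100.0)."""
    highest = max(values_hz.values())
    if highest <= 0:
        raise ValueError("frequencies must be positive")
    return {vowel: 100.0 * hz / highest for vowel, hz in values_hz.items()}


def hamming(n):
    """Hamming window coefficients of length n (standard 0.54/0.46 form, assumed)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def bin_spacing_hz(sample_rate_hz, fft_size):
    """Spacing between adjacent FFT bins, assuming fft_size equals the window size."""
    return sample_rate_hz / fft_size


if __name__ == "__main__":
    # Hypothetical values in Hz, not measurements from the paper.
    print(normalize_to_highest({"a": 800.0, "i": 200.0, "u": 400.0}))
    print(len(hamming(128)), bin_spacing_hz(22050, 128))
```

Under these assumptions a 128-sample window at 22050 Hz spans about 5.8 ms and gives FFT bins roughly 172 Hz apart, which bounds how finely neighbouring spectral peaks can be separated in a single frame.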
3. RESULT AND DISCUSSION

3.1 Comparison across Gender and Emotion
For each of the data sets, the following results were obtained for the male and female formant frequency positions across all vowels (Table 2). For the comparison across emotion, the male and female data are to be analysed separately. The results give the mean for each vowel and can be presented graphically. Vowels play an important role in speaker identification. The pitch of a person varies with emotion. However, proper identification of the vowels through their formants helps in identification, as the variations are quite distinct between male and female speakers. Incorporating this aspect in the identification engine can make speaker identification more efficient.

Table 2: Mean formant values in Hz of the vowel /a/ for all speakers.
[Columns: Formant, Male mean, Female mean. Rows: four formant rows for each of Wrath, Pathetic, Erotic, Normal and Quietus. The numeric values are missing from this transcription.]
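The per-gender mean and the male/female comparison discussed above might be computed as in the following sketch. The readings are hypothetical placeholders for three speakers per gender (the speaker count used in this study), not values from Table 2, and the names `READINGS_HZ`, `gender_means` and `female_to_male_ratio` are introduced here only for illustration.

```python
from statistics import mean

# Hypothetical per-speaker values in Hz for one vowel and one formant.
# Three speakers per gender; the numbers are NOT from the paper.
READINGS_HZ = {
    "male": [600.0, 620.0, 640.0],
    "female": [700.0, 720.0, 740.0],
}


def gender_means(readings):
    """Average the per-speaker values separately for each gender."""
    return {gender: mean(values) for gender, values in readings.items()}


def female_to_male_ratio(means):
    """Ratio above 1.0 means the female mean is higher than the male mean."""
    return means["female"] / means["male"]


if __name__ == "__main__":
    means = gender_means(READINGS_HZ)
    print(means, round(female_to_male_ratio(means), 3))
```

A ratio is used here rather than a raw difference so that the comparison stays meaningful across formants with very different frequency ranges; that is a design choice of this sketch, not something the paper prescribes.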
4. CONCLUSION
According to the results, the vowel /i/ has the lowest F0 value while the vowel /a/ has the highest F0 value. That is, a speaker places little stress on the vowel /i/ in any emotional state and more stress on the vowel /a/. Apart from these two vowels, the ordering of the remaining vowels is not the same in all cases. Similarly, Table 2 shows that the male F0 value is lower than that of the female speakers. Generally the female formants are at a higher level, at nearly 700 Hz. This can be taken as an important feature during emotional speech recognition across gender.

REFERENCES
[1] Rabiner, L. and Juang, B.H., Fundamentals of Speech Recognition, Prentice Hall, (1993).
[2] Karlsson, I., Female voices in speech synthesis, Journal of Phonetics, Vol. 19, (1991).
[3] Karlsson, I., Modelling voice variations in female speech synthesis, Speech Communication, Vol. 11, (1992).
[4] Carlson, R., Granström, B., Karlsson, I., Experiments with voice modelling in speech synthesis, Speech Communication, Vol. 10, (1991).
[5] Maitland, P., Whiteside, S.P., Beet, S.W., Baghai-Ravary, L., Analysis of Ten Vowel Sounds across Gender and Regional/Cultural Accent.
[6] Mohanty, S., Bhattacharya, S., Bose, S., Swain, S., Recognition of Vowels in Indian Language Paradigm for Designing a Speech Recogniser: A Pattern Recognition Approach, ISCA, (2004).
[7] Mohanty, S., Bhattacharya, S., Bose, S., Swain, S., An Approach to Parametric Base Mood Analysis in Oriya Speech Processing, Proceedings of Frontiers of Research in Speech and Music (FRSM), ITC SRA, Kolkata, India, (2005).
[8] Oh-Wook Kwon et al., Emotion Recognition by Speech Signal, EUROSPEECH, Geneva, (2003).
[9] Miriam et al., Acoustical Analysis of Spectral and Temporal Changes in Emotional Speech.
[10] Toivanen, J. et al., Automatic Recognition of Emotion in Spoken Finnish: Preliminary Results and Application.
More informationA Hybrid Text-To-Speech system for Afrikaans
A Hybrid Text-To-Speech system for Afrikaans Francois Rousseau and Daniel Mashao Department of Electrical Engineering, University of Cape Town, Rondebosch, Cape Town, South Africa, frousseau@crg.ee.uct.ac.za,
More informationRule discovery in Web-based educational systems using Grammar-Based Genetic Programming
Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de
More informationADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION
ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION Mitchell McLaren 1, Yun Lei 1, Luciana Ferrer 2 1 Speech Technology and Research Laboratory, SRI International, California, USA 2 Departamento
More informationGuidelines for blind and partially sighted candidates
Revised August 2006 Guidelines for blind and partially sighted candidates Our policy In addition to the specific provisions described below, we are happy to consider each person individually if their needs
More informationVimala.C Project Fellow, Department of Computer Science Avinashilingam Institute for Home Science and Higher Education and Women Coimbatore, India
World of Computer Science and Information Technology Journal (WCSIT) ISSN: 2221-0741 Vol. 2, No. 1, 1-7, 2012 A Review on Challenges and Approaches Vimala.C Project Fellow, Department of Computer Science
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationQuarterly Progress and Status Report. Voiced-voiceless distinction in alaryngeal speech - acoustic and articula
Dept. for Speech, Music and Hearing Quarterly Progress and Status Report Voiced-voiceless distinction in alaryngeal speech - acoustic and articula Nord, L. and Hammarberg, B. and Lundström, E. journal:
More informationThe pronunciation of /7i/ by male and female speakers of avant-garde Dutch
The pronunciation of /7i/ by male and female speakers of avant-garde Dutch Vincent J. van Heuven, Loulou Edelman and Renée van Bezooijen Leiden University/ ULCL (van Heuven) / University of Nijmegen/ CLS
More informationPREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES
PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,
More informationSARDNET: A Self-Organizing Feature Map for Sequences
SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationInnovation and new technologies
Innovation and new technologies in education Centro Cultural Estación Mapocho, Santiago de Chile, October 23th 2015 Jari Lavonen, Department of Teacher Education, University of Helsinki, Finland Jari.Lavonen@Helsinki.Fi
More informationCOMPUTER INTERFACES FOR TEACHING THE NINTENDO GENERATION
Session 3532 COMPUTER INTERFACES FOR TEACHING THE NINTENDO GENERATION Thad B. Welch, Brian Jenkins Department of Electrical Engineering U.S. Naval Academy, MD Cameron H. G. Wright Department of Electrical
More informationDesigning a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses
Designing a Rubric to Assess the Modelling Phase of Student Design Projects in Upper Year Engineering Courses Thomas F.C. Woodhall Masters Candidate in Civil Engineering Queen s University at Kingston,
More informationInternational Journal of Advanced Networking Applications (IJANA) ISSN No. :
International Journal of Advanced Networking Applications (IJANA) ISSN No. : 0975-0290 34 A Review on Dysarthric Speech Recognition Megha Rughani Department of Electronics and Communication, Marwadi Educational
More informationModern TTS systems. CS 294-5: Statistical Natural Language Processing. Types of Modern Synthesis. TTS Architecture. Text Normalization
CS 294-5: Statistical Natural Language Processing Speech Synthesis Lecture 22: 12/4/05 Modern TTS systems 1960 s first full TTS Umeda et al (1968) 1970 s Joe Olive 1977 concatenation of linearprediction
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationLecture 9: Speech Recognition
EE E6820: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition 1 Recognizing speech 2 Feature calculation Dan Ellis Michael Mandel 3 Sequence
More informationLikelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition Seltzer, M.L.; Raj, B.; Stern, R.M. TR2004-088 December 2004 Abstract
More informationOn Developing Acoustic Models Using HTK. M.A. Spaans BSc.
On Developing Acoustic Models Using HTK M.A. Spaans BSc. On Developing Acoustic Models Using HTK M.A. Spaans BSc. Delft, December 2004 Copyright c 2004 M.A. Spaans BSc. December, 2004. Faculty of Electrical
More informationGetting the Story Right: Making Computer-Generated Stories More Entertaining
Getting the Story Right: Making Computer-Generated Stories More Entertaining K. Oinonen, M. Theune, A. Nijholt, and D. Heylen University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands {k.oinonen
More information