19th INTERNATIONAL CONGRESS ON ACOUSTICS, MADRID, 2-7 SEPTEMBER 2007
THE INFLUENCE OF LINGUISTIC AND EXTRA-LINGUISTIC INFORMATION ON SYNTHETIC SPEECH INTELLIGIBILITY

PACS: Bp

Gardzielewska, Hanna
Institute of Acoustics, Adam Mickiewicz University, Umultowska 85, Poznan, Poland

ABSTRACT

The key objective of the present study was to determine the relationship between the data reduction of Polish speech (the number of tones reproducing the speech signal) and its intelligibility. A more specific aim was to determine how synthetic speech intelligibility depends on the linguistic information the signal carries and on so-called extra-linguistic information. A new sine-wave synthesis method, which yields high intelligibility scores for Polish synthetic speech, was proposed for this analysis. Speech intelligibility was tested on different synthetic speech material, varying in grammatical structure, semantic information content, and the acoustic characteristics of the talker.

INTRODUCTION

Linguistic information is very resistant to distortion and to the reduction of spectral information, as has been confirmed by numerous research results obtained, among others, by Remez et al. [1], [2]. In their studies, speech signals were processed with SineWave Synthesis (SWS). In this synthesis, the changing pattern of vocal resonances (formants) is modeled by a limited number of tones reflecting the spectral dynamics and the structure of the signal. The synthesis rejects all the detailed acoustic information carried by a signal, including fundamental frequency as well as harmonic and noise components. The reproduced sounds lose their naturalness but remain intelligible.

Most of the studies on the intelligibility of SWS-compressed sounds [1], [3], [4] refer only to English, which is a vowel-dominated language. In contrast to English, the Polish language is consonant-dominant. Polish speech synthesized with the original SineWave Synthesis method turned out to be unintelligible [5].

An alternative technique of sinusoidal speech synthesis was therefore proposed. It is based on a number of dominant frequency components present in the original signal. In the proposed method, only the frequency components with the highest amplitude are reproduced, instead of the exact formant frequencies. The amplitudes and frequencies of the dominant frequency components were derived with 20 ms resolution. Because of the large amount of energy in the high frequency range in Polish speech [6], the dominant frequency components were tracked in a band from 200 Hz up to 8 kHz, at a sampling frequency of 16 kHz. Perceptual tests performed on Polish speech show that signals synthesized with the proposed technique were judged as more natural and more intelligible than SWS speech. The modified SWS method, developed at Adam Mickiewicz University in Poznan, gave 2.4 times better results than the original method when three tones were used for Polish speech synthesis.

The new method was used for Polish speech intelligibility analysis. The influence of extra-linguistic information on speech intelligibility was tested on sentences that varied in the acoustic characteristics of the talker (a single talker versus different talkers). The intelligibility results for sentences were compared with the intelligibility results for utterances devoid of grammatical structure and logical coherence (three unrelated words) and for utterances devoid of semantic information (three words without meaning).
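The dominant-component idea described above can be illustrated with a minimal sketch. It is not the author's implementation: the frame length, tracked band, and sampling frequency come from the text, while the Hann window, the plain DFT, the local-peak picking rule, and all function names (`dominant_components`, `analyze`, `synthesize`) are assumptions made for illustration.

```python
import cmath
import math

SAMPLE_RATE = 16_000                   # Hz, from the text
FRAME_LEN = SAMPLE_RATE * 20 // 1000   # 20 ms resolution from the text -> 320 samples
F_MIN, F_MAX = 200.0, 8000.0           # tracked band, from the text


def dominant_components(frame, n_tones=3, sample_rate=SAMPLE_RATE):
    """Return up to n_tones (frequency_hz, amplitude) pairs, sorted by frequency,
    for the strongest spectral peaks of one frame inside the tracked band."""
    n = len(frame)
    # Periodic Hann window: an assumption, the source does not name a window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    gain = sum(window) / 2.0  # converts a peak magnitude into a sine amplitude
    shaped = [x * w for x, w in zip(frame, window)]
    # Plain DFT for clarity; an FFT would normally be used instead.
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    mags = [abs(sum(s * twiddle[(k * i) % n] for i, s in enumerate(shaped)))
            for k in range(n // 2 + 1)]
    peaks = []
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        is_peak = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if is_peak and F_MIN <= freq <= F_MAX:
            peaks.append((mags[k] / gain, freq))
    strongest = sorted(peaks, reverse=True)[:n_tones]  # highest amplitude first
    return sorted((freq, amp) for amp, freq in strongest)


def analyze(signal, n_tones=3, frame_len=FRAME_LEN):
    """Split the signal into non-overlapping frames and track the dominant tones."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [dominant_components(signal[s:s + frame_len], n_tones) for s in starts]


def synthesize(tracks, frame_len=FRAME_LEN, sample_rate=SAMPLE_RATE):
    """Rebuild a signal as a sum of sine tones, one (freq, amp) list per frame."""
    phases = [0.0] * max((len(tones) for tones in tracks), default=0)
    out = []
    for tones in tracks:
        for _ in range(frame_len):
            sample = 0.0
            for j, (freq, amp) in enumerate(tones):
                sample += amp * math.sin(phases[j])
                phases[j] += 2 * math.pi * freq / sample_rate  # keep phase continuous
            out.append(sample)
    return out
```

Calling `synthesize(analyze(signal))` on a list of samples returns a three-tone approximation of the same length, rounded down to whole frames. Amplitudes are held constant within each frame, which is a simplification; smoother interpolation between frames would be a natural refinement.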
EXPERIMENT

Sentences uttered by different talkers and by a single talker were analyzed in order to determine whether changes in the acoustic characteristics of a signal (different talkers) still affect speech intelligibility in such unusual phonetic conditions, where the subjects were asked to perceive very distorted linguistic information. The influence of grammar and logical structure on speech intelligibility was tested on utterances consisting of three unrelated words, uttered by a single talker. The intelligibility of utterances without semantic information was tested on three unrelated logatoms (words without meaning) uttered by a single talker.

Subjects

Forty-four participants (10 women and 34 men), aged 20 to 24, took part in the experiment. The participants were native speakers of Polish who reported no past or present hearing disorders and qualified as having normal hearing (defined as an audiometric threshold of 20 dB HL or better for a frequency range from 250 Hz to 8000 Hz; ANSI, 1996). The participants had previous experience in synthetic speech intelligibility assessment and were paid for their participation in the experiment.

Speech material and equipment

For sentence intelligibility testing, the CORPORA multi-talker database, designed for automated recognition of Polish speech [7], was used. 27 sentences were picked at random from the 114 sentences contained in the database. The average number of words in each sentence was 5, which gave approximately 140 words for the list. The sentences of the list were pronounced either by different talkers or by a single male or female talker, depending on the experimental condition. The duration of each sentence was on average 2 seconds.

For testing utterances without grammatical and logical coherence, words were selected from a recorded, frequency and phonetics balanced wordlist for the Polish language, prepared by W. Jassem [8]. The list of signals was composed of 27 utterances, each comprising three words. The duration of each utterance corresponded to the duration of the sentences, which was about 2 seconds. All the utterances were produced by a single male talker.

The signals presented in the last list were logatoms, nonsense words selected from a structurally and phonetically balanced list of logatoms for the Polish language [9]. In total, 81 logatoms were randomly selected, from which 27 expressions were composed, each consisting of three logatoms. The duration of each expression was on average 2 seconds. All expressions were synthesized with three tones.

Procedure

Five experimental conditions were tested. In the first listening session, participants listened to synthetic sentences. Participants were randomly divided into two groups of 20 and 24 persons. Sentences were generated either by different talkers or by a single talker (2 conditions: DT, ST). The sentences produced by different talkers were presented to the first group, and the other group listened only to the sentences generated by a single talker. The latter group was further divided into two subgroups: twelve participants listened to utterances produced by a female talker and twelve listened to utterances produced by a male talker (2 conditions: STF, STM). This was done to take into account the fact that the difficulty of determining the formant frequencies increases with the fundamental frequency of the voice, which usually leads to the conclusion that a female voice is less intelligible than a male voice [6], [10]. Comparison of the results obtained from both groups made it possible to evaluate the degree to which a change in the acoustic characteristics of a phonetically distorted signal (extra-linguistic information) affects the perception of linguistic information.
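The random selection steps described above (81 logatoms grouped into 27 three-item expressions, and 44 participants split into groups of 20, 12 and 12) can be sketched as follows. This is only an illustration: the pool of 200 placeholder logatoms, the participant labels, the fixed seeds, and the function names are hypothetical and do not come from the study.

```python
import random


def compose_triples(items, n_utterances=27, per_utterance=3, seed=0):
    """Draw items without replacement and group them into fixed-size utterances."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    chosen = rng.sample(items, n_utterances * per_utterance)
    return [chosen[i:i + per_utterance]
            for i in range(0, len(chosen), per_utterance)]


def assign_groups(participants, sizes, seed=0):
    """Shuffle participants and cut the shuffled list into groups of given sizes."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups, start = [], 0
    for size in sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups


# Hypothetical pool: the source does not state how many logatoms the list holds.
logatom_pool = [f"logatom_{i:03d}" for i in range(200)]
expressions = compose_triples(logatom_pool)           # 27 expressions of 3 logatoms

listeners = [f"participant_{i:02d}" for i in range(44)]
dt_group, stf_group, stm_group = assign_groups(listeners, [20, 12, 12])
```

Sampling without replacement ensures that no logatom appears in two expressions, and shuffling once before cutting ensures that every participant lands in exactly one condition.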
In the second listening session, participants assessed the intelligibility of 27 utterances built of three unrelated words (1 condition: N3). In the last listening session, the participants assessed the recognition of utterances built of three logatoms (1 condition: L3). Participants typed the contents of each utterance as they heard it into a special dialogue box. The utterance typed by each participant was then compared with the original utterance. The five experimental conditions were named as follows: DT, STF, STM, N3, L3. On the basis of the collected results, word intelligibility, expressed as a percentage, was assessed.

RESULTS AND GENERAL DISCUSSION

The speech intelligibility results, expressed as the average percentage of words correct for each list, are presented in Figure 1. No statistically significant interaction between gender and responses was obtained (STF and STM). The sentence results for the two single-talker conditions were therefore averaged (ST).

[Figure 1: percent words correct (vertical axis) by speech material: ST, N3, DT, L3 (horizontal axis)]

Figure 1. The averaged percent words correct in 2 s utterances for various speech material conditions (sentences generated by a single talker, ST; three unrelated words generated by a single talker, N3; sentences generated by different talkers, DT; three unrelated logatoms generated by a single talker, L3). Error bars indicate the standard deviation.

Comparing the intelligibility results obtained in the experiment for two-second utterances, it may be concluded that a lower diversity of acoustic attributes and a higher coherence of information content improve the intelligibility of a speech signal. Preservation of grammar and the logical continuity of an utterance significantly facilitate recognition of individual words. However, three-word phrases of random words, devoid of grammatical and contextual cohesion, turned out to be easier to memorize and recognize than logical sentences uttered by different talkers. According to the results, the acoustic attributes of a talker (extra-linguistic information) cannot be neglected in speech perception, even in the case of synthetic speech. The results once more confirm that paying attention to spoken words involves paying attention to the voice, which is reflected in the speech intelligibility scores [11].
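The scoring step described in the Procedure (typed responses compared with the original utterances and expressed as percent words correct) can be sketched as below. The matching rule is an assumption: a word counts as correct if it occurs in the response regardless of position or letter case. The source does not state the exact rule, and the example utterances are invented.

```python
from collections import Counter


def words_correct(reference, response):
    """Return (number of reference words found in the response, reference length)."""
    ref = Counter(reference.lower().split())
    resp = Counter(response.lower().split())
    matched = sum((ref & resp).values())  # multiset intersection handles repeats
    return matched, sum(ref.values())


def percent_words_correct(pairs):
    """Percent words correct over a list of (reference, response) pairs."""
    correct = total = 0
    for reference, response in pairs:
        matched, count = words_correct(reference, response)
        correct += matched
        total += count
    return 100.0 * correct / total if total else 0.0


# Invented example data: two three-word utterances, one word misheard.
example = [("ala ma kota", "ala ma psa"), ("to jest dom", "To jest dom")]
score = percent_words_correct(example)  # 5 of 6 words correct
```

Pooling the word counts over the whole list, rather than averaging per-utterance percentages, keeps longer sentences from being under-weighted; either choice would be defensible given what the text specifies.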
Despite being prepared to receive a semantic message in such unnatural acoustic conditions, listeners showed evidence of integral processing of changes in their acoustic environment, namely talker-specific attributes, along with the processing (recognition) of the linguistic attributes of a signal [12], [13], [14]. Diversity of extra-linguistic information has a great impact on the correct identification of synthetic speech signals. The results show that reducing the linguistic information carried by the signal (sentences versus words without logical or grammatical coherence) has a lower impact on speech intelligibility than variation in the talker's acoustic characteristics. Words in utterances devoid of logical and grammatical coherence (N3), uttered by a single talker, were identified 6% better than words in logical sentences uttered by different talkers (DT).
The intelligibility of words in sentences uttered by different talkers (DT) was 12% lower than the intelligibility of the same words uttered by a single talker (ST). The results indicate differences in the perceptual processing of words, resulting not only from the physical realization of utterances [15], [16], but also from the grammatical and semantic information content of an utterance. Preservation of grammar and the logical continuity of an utterance significantly facilitate recognition of individual words. In the case of logatoms, the lack of any particular meaning made it almost impossible for subjects to reproduce them correctly. The results demonstrate how synthetic speech intelligibility depends on correct perceptual matching of the phonetic characteristics of heard sounds with the phonetic characteristics stored in the listener's long-term memory [17], [18], [19], [20]. When there are no original phonetic characteristics in the listener's long-term memory, synthetic speech perception turned out to be almost impossible on the basis of such limited acoustic information. Perceptual matching is, in principle, facilitated by the invariability of a talker's acoustic characteristics. The way the speech sounds are generated is of secondary importance.

CONCLUSIONS

Talker acoustic characteristics cannot be neglected in analyzing synthetic speech intelligibility. Extra-linguistic information has an even higher impact on synthetic speech intelligibility than the reduction of the linguistic content of a signal.

ACKNOWLEDGMENTS

This research was supported by KBN Grant No. 4T07B

References:

[1] R. E. Remez, P. E. Rubin, D. B. Pisoni, T. D. Carrell: Speech perception without traditional speech cues. Science 212 (1981)
[2] R. E. Remez, P. E. Rubin, S. E. Berns, J. S. Pardo, J. M. Lang: On the perceptual organization of speech. Psychological Review 101 (1994)
[3] R. J. McAulay, T. F. Quatieri: Speech analysis-synthesis based on a sinusoidal representation. IEEE Trans. ASSP 34 (1986)
[4] M. Dorman, P. Loizou, D. Rainey: Speech intelligibility as a function of the number of channels of stimulation for signal processors using sine-wave and noise-band outputs. Journal of the Acoustical Society of America 102 (1997)
[5] H. Wojciechowska: Speech data reduction versus speech intelligibility. Polish-German Structured Conference on Acoustics (2004)
[6] W. Jassem: Podstawy fonetyki akustycznej. PWN, Warszawa
[7] S. Grocholewski: CORPORA - speech database for Polish diphones. Eurospeech'97 (1997)
[8] W. Jassem: Frequency and phonetics balanced Polish wordlists. Speech and language technology, W. Jassem, C. Basztura (Editors), Vol. I, Polish Phonetic Association, Poznan (1977)
[9] S. Brachmański, P. Staroniewicz: Phonetic structure of test material used for subjective speech quality measurements. Speech and language technology, W. Jassem, C. Basztura (Editors), Vol. III, WPN Format, Poznan (1999)
[10] D. H. Klatt, L. C. Klatt: Analysis, synthesis, and perception of voice quality variations among female and male talkers. Journal of the Acoustical Society of America 87, No. 2 (1990)
[11] L. C. Nygaard, D. B. Pisoni: Speech perception: New directions in research and theory. Speech, language, and communication, J. L. Miller, P. D. Eimas (Editors), Academic, San Diego, CA (1995)
[12] J. W. Mullennix, D. B. Pisoni: Stimulus variability and processing dependencies in speech perception. Perception and Psychophysics 47 (1990)
[13] K. P. Green, G. R. Tomiak, P. K. Kuhl: The encoding of rate and talker information during phonetic perception. Perception and Psychophysics 59 (1997)
[14] R. E. Remez, J. M. Fellowes, P. E. Rubin: Voice identification based on phonetic information. Journal of Experimental Psychology: Human Perception and Performance 23 (1997)
[15] D. Reddy: Speech recognition by machine: A review. Proceedings of IEEE 64, No. 4 (1976)
[16] D. B. Pisoni, P. A. Luce: Acoustic-phonetic representations in word recognition. Cognition 25 (1987)
[17] C. S. Martin, J. W. Mullennix, D. B. Pisoni, W. V. Summers: Effects of talker variability on recall of spoken word lists. Journal of Experimental Psychology: Learning, Memory, and Cognition 17 (1989)
[18] P. W. Jusczyk, D. B. Pisoni, J. Mullennix: Some consequences of stimulus variability on speech processing by 2-month-old infants. Cognition 43 (1992)
[19] R. V. Shannon, F. G. Zeng, J. Wygonski, V. Kamath, M. Ekelid: Speech recognition with primarily temporal cues. Science 270 (1995)
[20] J. M. McQueen, A. Cutler, D. Norris: Flow of information in the spoken word recognition system. Speech Communication 41, No. 1 (2003)
The NICT/ATR speech synthesis system for the Blizzard Challenge 2008 Ranniery Maia 1,2, Jinfu Ni 1,2, Shinsuke Sakai 1,2, Tomoki Toda 1,3, Keiichi Tokuda 1,4 Tohru Shimizu 1,2, Satoshi Nakamura 1,2 1 National
More informationA Comparison of DHMM and DTW for Isolated Digits Recognition System of Arabic Language
A Comparison of DHMM and DTW for Isolated Digits Recognition System of Arabic Language Z.HACHKAR 1,3, A. FARCHI 2, B.MOUNIR 1, J. EL ABBADI 3 1 Ecole Supérieure de Technologie, Safi, Morocco. zhachkar2000@yahoo.fr.
More informationOn Human Computer Interaction, HCI. Dr. Saif al Zahir Electrical and Computer Engineering Department UBC
On Human Computer Interaction, HCI Dr. Saif al Zahir Electrical and Computer Engineering Department UBC Human Computer Interaction HCI HCI is the study of people, computer technology, and the ways these
More informationSpeech Recognition using Acoustic Landmarks and Binary Phonetic Feature Classifiers
Speech Recognition using Acoustic Landmarks and Binary Phonetic Feature Classifiers October 31, 2003 Amit Juneja Department of Electrical and Computer Engineering University of Maryland, College Park,
More informationAtypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty
Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty Julie Medero and Mari Ostendorf Electrical Engineering Department University of Washington Seattle, WA 98195 USA {jmedero,ostendor}@uw.edu
More informationModern TTS systems. CS 294-5: Statistical Natural Language Processing. Types of Modern Synthesis. TTS Architecture. Text Normalization
CS 294-5: Statistical Natural Language Processing Speech Synthesis Lecture 22: 12/4/05 Modern TTS systems 1960 s first full TTS Umeda et al (1968) 1970 s Joe Olive 1977 concatenation of linearprediction
More informationBeginning primarily with the investigations of Zimmermann (1980a),
Orofacial Movements Associated With Fluent Speech in Persons Who Stutter Michael D. McClean Walter Reed Army Medical Center, Washington, D.C. Stephen M. Tasko Western Michigan University, Kalamazoo, MI
More informationOn building models of spoken-word recognition: When there is as much to learn from natural oddities as artificial normality
Perception & Psychophysics 2008, 70 (7), 1235-1242 doi: 10.3758/PP.70.7.1235 On building models of spoken-word recognition: When there is as much to learn from natural oddities as artificial normality
More informationELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading
ELA/ELD Correlation Matrix for ELD Materials Grade 1 Reading The English Language Arts (ELA) required for the one hour of English-Language Development (ELD) Materials are listed in Appendix 9-A, Matrix
More informationIn how many ways can one junior and one senior be selected from a group of 8 juniors and 6 seniors?
Counting Principle If one activity can occur in m way and another activity can occur in n ways, then the activities together can occur in mn ways. Permutations arrangements of objects in a specific order
More informationRevisiting the role of prosody in early language acquisition. Megha Sundara UCLA Phonetics Lab
Revisiting the role of prosody in early language acquisition Megha Sundara UCLA Phonetics Lab Outline Part I: Intonation has a role in language discrimination Part II: Do English-learning infants have
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationFOREWORD.. 5 THE PROPER RUSSIAN PRONUNCIATION. 8. УРОК (Unit) УРОК (Unit) УРОК (Unit) УРОК (Unit) 4 80.
CONTENTS FOREWORD.. 5 THE PROPER RUSSIAN PRONUNCIATION. 8 УРОК (Unit) 1 25 1.1. QUESTIONS WITH КТО AND ЧТО 27 1.2. GENDER OF NOUNS 29 1.3. PERSONAL PRONOUNS 31 УРОК (Unit) 2 38 2.1. PRESENT TENSE OF THE
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Speech Communication Session 2aSC: Linking Perception and Production
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationLanguage Acquisition Chart
Language Acquisition Chart This chart was designed to help teachers better understand the process of second language acquisition. Please use this chart as a resource for learning more about the way people
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationFix Your Vowels: Computer-assisted training by Dutch learners of Spanish
Carmen Lie-Lahuerta Fix Your Vowels: Computer-assisted training by Dutch learners of Spanish I t is common knowledge that foreign learners struggle when it comes to producing the sounds of the target language
More informationOn the Formation of Phoneme Categories in DNN Acoustic Models
On the Formation of Phoneme Categories in DNN Acoustic Models Tasha Nagamine Department of Electrical Engineering, Columbia University T. Nagamine Motivation Large performance gap between humans and state-
More informationRachel E. Baker, Ann R. Bradlow. Northwestern University, Evanston, IL, USA
LANGUAGE AND SPEECH, 2009, 52 (4), 391 413 391 Variability in Word Duration as a Function of Probability, Speech Style, and Prosody Rachel E. Baker, Ann R. Bradlow Northwestern University, Evanston, IL,
More informationUNIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK WITH RECURRENT OUTPUT LAYER FOR LOW-LATENCY SPEECH SYNTHESIS. Heiga Zen, Haşim Sak
UNIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK WITH RECURRENT OUTPUT LAYER FOR LOW-LATENCY SPEECH SYNTHESIS Heiga Zen, Haşim Sak Google fheigazen,hasimg@google.com ABSTRACT Long short-term
More information1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature
1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details
More informationUsing GIFT to Support an Empirical Study on the Impact of the Self-Reference Effect on Learning
80 Using GIFT to Support an Empirical Study on the Impact of the Self-Reference Effect on Learning Anne M. Sinatra, Ph.D. Army Research Laboratory/Oak Ridge Associated Universities anne.m.sinatra.ctr@us.army.mil
More informationLecture Notes in Artificial Intelligence 4343
Lecture Notes in Artificial Intelligence 4343 Edited by J. G. Carbonell and J. Siekmann Subseries of Lecture Notes in Computer Science Christian Müller (Ed.) Speaker Classification I Fundamentals, Features,
More informationMastering Team Skills and Interpersonal Communication. Copyright 2012 Pearson Education, Inc. publishing as Prentice Hall.
Chapter 2 Mastering Team Skills and Interpersonal Communication Chapter 2-1 Communicating Effectively in Teams Chapter 2-2 Communicating Effectively in Teams Collaboration involves working together to
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationReview in ICAME Journal, Volume 38, 2014, DOI: /icame
Review in ICAME Journal, Volume 38, 2014, DOI: 10.2478/icame-2014-0012 Gaëtanelle Gilquin and Sylvie De Cock (eds.). Errors and disfluencies in spoken corpora. Amsterdam: John Benjamins. 2013. 172 pp.
More informationPerceptual Auditory Aftereffects on Voice Identity Using Brief Vowel Stimuli
Perceptual Auditory Aftereffects on Voice Identity Using Brief Vowel Stimuli Marianne Latinus 1,3 *, Pascal Belin 1,2 1 Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United
More informationCorpus Linguistics (L615)
(L615) Basics of Markus Dickinson Department of, Indiana University Spring 2013 1 / 23 : the extent to which a sample includes the full range of variability in a population distinguishes corpora from archives
More informationLinguistics. Undergraduate. Departmental Honors. Graduate. Faculty. Linguistics 1
Linguistics 1 Linguistics Matthew Gordon, Chair Interdepartmental Program in the College of Arts and Science 223 Tate Hall (573) 882-6421 gordonmj@missouri.edu Kibby Smith, Advisor Office of Multidisciplinary
More informationSuccess Factors for Creativity Workshops in RE
Success Factors for Creativity s in RE Sebastian Adam, Marcus Trapp Fraunhofer IESE Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany {sebastian.adam, marcus.trapp}@iese.fraunhofer.de Abstract. In today
More informationINTRODUCTION J. Acoust. Soc. Am. 102 (3), September /97/102(3)/1891/7/$ Acoustical Society of America 1891
Perception of synthetic /ba/ /wa/ speech continuum by budgerigars (Melopsittacus undulatus) Micheal L. Dent, Elizabeth F. Brittan-Powell, Robert J. Dooling, and Alisa Pierce Department of Psychology, University
More informationConcept Acquisition Without Representation William Dylan Sabo
Concept Acquisition Without Representation William Dylan Sabo Abstract: Contemporary debates in concept acquisition presuppose that cognizers can only acquire concepts on the basis of concepts they already
More informationLearning Methods for Fuzzy Systems
Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8
More informationLecturing Module
Lecturing: What, why and when www.facultydevelopment.ca Lecturing Module What is lecturing? Lecturing is the most common and established method of teaching at universities around the world. The traditional
More informationPhysics 270: Experimental Physics
2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu
More information