
Yoonsook Mo, Department of Linguistics, University of Illinois at Urbana-Champaign

Speech utterances are composed of hierarchically structured phonological phrases. A prosodic boundary marks the phonological phrase juncture and serves to demarcate chunks of words. Within each utterance, some words or phrases are more prominent than others. Prosodic prominence highlights a word or a phrase and conveys its status as focused or discourse-new. Prominence, in particular, is the focus of this paper.

This talk focuses on the phonetic correlates of prosodic prominence, and is part of my larger study of phonetic correlates of prosodic structure in production and perception.

Phonetic implementation: speakers encode prosodic structure through the modulation of phonetic parameters.
Acoustic correlates of prominence:
- Fundamental frequency (F0)
- Duration (Fry, 1955, 1958; Turk and Sawusch, 1996)
- Intensity (Fry, 1955, 1958; Kochanski, 2005)
- Sub-band intensities (Sluijter and van Heuven, 1996; Heldner, 2001, 2003)
- Formants
- Spectral tilt (Fant et al., 2000; Sluijter and van Heuven, 1996)

I investigate the phonetic encoding of prominence in 14 vowels of American English, in everyday conversational speech from 38 ordinary speakers of American English, as judged by about 100 untrained, ordinary listeners. Prominence is judged by these listeners based only on auditory impression, with no visual inspection of a speech display.

In other work I show duration, intensity, and sub-band intensity measures to be important correlates of prominence (Mo, 2008a, b). What effect, if any, does prominence have on F0 (intonation) and on vowel formants (hyper- vs. hypo-articulation)?

Fundamental frequency (F0): the height and shape of F0 contours have been shown to be major correlates of prominence.
- Stressed vs. unstressed (Lieberman, 1969; Cooper et al., 1985, among others)
- Pitch accents (Gussenhoven et al., 1997; Hermes and Rump, 1994; Pierrehumbert, 1979; Terken, 1991, 1994)
The role of F0 is still controversial:
- The perception of focal status is not changed by the gradual addition of an F0 rise on non-focused words (Heldner and Strangert, 1997).
- F0 plays a minor role in the automatic classification of pitch accent (Kochanski, 2005).

Vowel quality:
- Acoustic studies (Sluijter and van Heuven, 1996; van Bergem, 1993)
- Articulatory studies (Beckman et al., 1992; De Jong, 1995; Erickson, 2002; Cho, 2005)

Sonority expansion (Beckman et al., 1992)
- Under accent, articulators move to increase sonority
- More open vocal tract
Hyperarticulation (De Jong, 1995; Erickson, 2002)
- Under accent, the phonetic space of phonemic contrast expands
- Feature distinctiveness is enhanced
Combination of sonority expansion and hyperarticulation (Cho, 2005)
- Under accent, more open vocal tract
- In the front/back dimension, more front or more back

The goals are (1) to investigate the phonetic properties that cue prominence for ordinary listeners in conversational speech (How does fundamental frequency vary? How are formant structures modified?), and (2) to evaluate which underlying mechanism better describes the phenomenon of prominence as judged by listeners.

A speaker marks a word as prosodically prominent in accordance with its pragmatic value (e.g., focused), position in the phrase, and other factors. A speaker implements a prominent word with an F0 excursion, and with enhanced speech gestures that are longer, larger, or both. These effects are strongest on the lexically stressed syllable. Listeners perceive a word as prominent based on acoustic evidence of the speaker's enhanced speech gestures. Therefore, words perceived as prominent will have stressed syllables that are acoustically enriched:
- Higher F0
- Higher F1 and more peripheral F2

Experimental hypotheses
F0: vowels in words perceived as prominent will have higher F0 peaks.
Vowel quality:
- Hyperarticulation: vowel formants will indicate a more peripheral place of articulation, because prominence enhances phonemic contrast (high vowel: lower F1; low vowel: higher F1; front vowel: higher F2; back vowel: lower F2).
OR
- Sonority expansion: higher F1 regardless of vowel height.

Materials: 54 speech excerpts from 38 speakers in the Buckeye corpus of spontaneous American English speech. Sound files are equalized in loudness level; excerpt lengths range from 11 to 58 seconds. Each sound file is presented together with its corresponding word transcript.
Participants: 97 listeners from undergraduate linguistics courses, naïve with respect to the phonetics and phonology of prosody and to prosody transcription.

Listeners are given simple definitions of prominence and boundary:
- Prominence: highlights a word or a phrase and makes it stand out from other, non-prominent words.
- Boundary: marks a chunk of speech and can help listeners interpret long stretches of continuous speech.
Listeners play each sound file twice at their own pace. While listening, they mark prominent words and words at a juncture on the transcript using the prominence and boundary transcription marks.
[Example transcript line illustrating the prominence and boundary marks.]

Transcriptions are pooled over listeners; each word is assigned a probabilistic P(rominence) score and B(oundary) score ranging from 0 to 1.
[Figure: P-scores and B-scores per word for Speaker 26.]
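As an illustration, here is a minimal sketch (not the original analysis code; the data layout is assumed) of how per-word P- and B-scores can be pooled from binary listener marks:

```python
from typing import List, Tuple

def pool_scores(transcriptions: List[List[Tuple[bool, bool]]]) -> List[Tuple[float, float]]:
    """Return a (P-score, B-score) pair per word: the proportion of listeners
    who marked the word as prominent / as preceding a boundary."""
    n_listeners = len(transcriptions)
    n_words = len(transcriptions[0])
    scores = []
    for w in range(n_words):
        p = sum(t[w][0] for t in transcriptions) / n_listeners
        b = sum(t[w][1] for t in transcriptions) / n_listeners
        scores.append((p, b))
    return scores

# Example: 4 listeners, 3 words; each entry is (prominent, boundary)
marks = [
    [(True, False), (False, True), (True, False)],
    [(True, False), (False, False), (False, False)],
    [(False, False), (False, True), (True, False)],
    [(True, False), (False, True), (True, True)],
]
print(pool_scores(marks))  # [(0.75, 0.0), (0.0, 0.75), (0.75, 0.25)]
```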

Fleiss' kappa inter-transcriber agreement scores and their corresponding z-scores (critical value z = 2.33 at α = 0.01):

                      Exp. 1, Run 1      Exp. 1, Run 2      Exp. 2, Run 1
                      Grp 1    Grp 2     Grp 1    Grp 2     Grp 1    Grp 2
Prominence   kappa    0.373    0.421     0.394    0.407     0.356    0.400
             z        19.43    20.48     18.15    18.31     15.31    19.56
Boundary     kappa    0.612    0.544     0.621    0.575     0.560    0.567
             z        27.62    21.87     25.05    26.22     24.89    22.49

Fleiss' statistic shows that transcriber agreement is significantly above chance level (p < .001): untrained listeners' transcription is reliable.
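A sketch of how Fleiss' kappa could be computed for the binary prominence marks, assuming the statsmodels implementation; the random demo data are purely illustrative:

```python
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

def prominence_kappa(marks: np.ndarray) -> float:
    """marks: (n_listeners, n_words) array of 0/1 prominence judgments.
    Builds the (n_words, 2) count table expected by fleiss_kappa."""
    n_listeners, _ = marks.shape
    marked = marks.sum(axis=0)                        # listeners marking each word
    table = np.column_stack([n_listeners - marked, marked])
    return fleiss_kappa(table)

rng = np.random.default_rng(0)
demo = (rng.random((10, 200)) < 0.3).astype(int)      # 10 listeners, 200 words
print(prominence_kappa(demo))
```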

F0:
- Measured at 1 ms intervals
- Smoothed by median filtering with a 13-point window
- F0 contours interpolated only at CV junctures
Formants:
- Steady-state formants (F1 and F2) measured
- Monophthongs: at the vowel midpoint
- Diphthongs: at 10% and 90% of the vowel
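A sketch of this F0 post-processing (an assumed implementation, not the original script), where unvoiced frames of a 1 ms F0 track are stored as NaN:

```python
import numpy as np
from scipy.signal import medfilt

def smooth_and_interpolate_f0(f0: np.ndarray) -> np.ndarray:
    """Median-filter a 1 ms F0 track with a 13-point window, then linearly
    interpolate across unvoiced gaps (e.g., at CV junctures)."""
    f0 = f0.copy()
    voiced = ~np.isnan(f0)
    # 13-point running median over the voiced frames
    # (simplification: gaps are closed before filtering)
    f0[voiced] = medfilt(f0[voiced], kernel_size=13)
    # linear interpolation over the unvoiced gaps
    idx = np.arange(len(f0))
    f0[~voiced] = np.interp(idx[~voiced], idx[voiced], f0[voiced])
    return f0
```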

F0, F1, and F2 are extracted from the stressed vowel of each word in order to hold stress constant.

Vowel   ɑ    æ    ʌ    ɔ    aʊ   aɪ   ɛ    ɝ    eɪ   ɪ    i    oʊ   ʊ    u    Total
N       173  290  407  121  52   309  463  122  214  475  306  211  72   183  3398

The extracted acoustic measures are then normalized as z-scores, z = (x - x̄) / s, where x̄ is the mean and s the standard deviation:
- F0: normalized within a 400 ms analysis window
- Formants: normalized over the total phone space
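A sketch of this z-score normalization under stated assumptions: formants are normalized per speaker over all measured phones, and F0 max would analogously be normalized against the F0 values in a 400 ms window around the vowel. Column names are illustrative, not from the original data files.

```python
import pandas as pd

def zscore(x: pd.Series) -> pd.Series:
    """z = (x - mean) / standard deviation."""
    return (x - x.mean()) / x.std()

df = pd.DataFrame({
    "speaker": ["s01", "s01", "s02", "s02"],
    "F1": [520.0, 610.0, 480.0, 700.0],
    "F2": [1500.0, 1750.0, 1400.0, 1900.0],
})

# Formants: z-scored within speaker, over the total phone space
z = df.groupby("speaker")[["F1", "F2"]].transform(zscore)
df["F1_z"], df["F2_z"] = z["F1"], z["F2"]
print(df)
```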

Hypothesis: the more prominent a word is, the higher its F0 maximum will be.
Pearson's bivariate correlation analysis between F0 max and P-scores, for all vowels pooled and for each vowel separately (diphthongs measured at 10% and 90% of the vowel).
[Table: correlations of P-scores with F0 max, F1, and F2 per vowel; '+' marks significant positive correlations.]
The results support the hypothesis: P-scores are positively correlated with F0 max for the majority of vowels. Overall, words perceived as prominent have higher F0 max.
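A sketch of this per-vowel correlation analysis, with hypothetical column names:

```python
import pandas as pd
from scipy.stats import pearsonr

def correlate_by_vowel(df: pd.DataFrame, measure: str = "F0max_z") -> dict:
    """df has one row per stressed-vowel token, with columns 'vowel',
    'Pscore', and the normalized acoustic measure of interest.
    Returns {vowel: (Pearson r, p-value)}."""
    results = {}
    for vowel, grp in df.groupby("vowel"):
        r, p = pearsonr(grp[measure], grp["Pscore"])
        results[vowel] = (r, p)
    return results
```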

Pearson's bivariate correlation analysis between the formants (F1, F2) and P-scores.
[Table: the same per-vowel correlation table as above, with the F1 and F2 rows highlighted; '+' marks significant positive correlations.]

[Figure: monophthong vowel space (high/low by front/back) showing formant shifts for stressed vowels in words perceived as prominent.]

[Figure: vowel space plot of a diphthong trajectory (measured at 10% and 90% of the vowel) under perceived prominence.]

[Figure: vowel space plot of another diphthong trajectory (measured at 10% and 90% of the vowel) under perceived prominence.]

Hyperarticulation: the stressed vowels perceived as prominent are peripheral in the vowel space.
Partially supported, in the front/back dimension:
- The front vowel i, the nucleus of aʊ, and the glide of eɪ are more front when perceived as prominent.
- The other vowels are more back when perceived as prominent.

Sonority expansion: regardless of vowel height, the stressed vowel in a prominent word is more open.
Supported: when perceived as prominent, vowels show a more open vocal tract, except the low vowel ɑ and the diphthongs.

The combination of hyperarticulation and sonority expansion best accounts for the relation between formants and prosodic prominence. In the front/back dimension, peripheral vowel formants (F2) suggest that vowels are hyperarticulated under prominence. In the high/low dimension, higher vowel formants (F1) of non-low vowels suggest that sonority expands under prominence.

[Figure: R² (%) from stepwise regressions of P-scores on F0 max, F1, and F2 for each vowel (ɑ, æ, ʌ, ɔ, aɪ, aʊ, ɛ, ɝ, eɪ, ɪ, i, oʊ, ʊ, u).]

According to the results of the stepwise regression analyses, only a small portion of the variation in listeners' prominence responses (ranging from 3.3% for /æ/ to 23.2% for /aɪ/) can be explained on the basis of these measures. No single acoustic measure is included in the regression model for all vowels, and no unified regression pattern accounts for the variation in prominence.
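For illustration, here is a simple forward stepwise regression sketch in the spirit of the analysis above (not the authors' actual procedure; the stopping threshold is an assumption):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_stepwise_r2(X: np.ndarray, y: np.ndarray, names, threshold: float = 0.01):
    """Greedily add the predictor (e.g., F0 max, F1, F2) that most improves R^2;
    stop when the gain falls below `threshold`.
    Returns (selected predictor names, final R^2)."""
    selected, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        gains = []
        for j in remaining:
            cols = selected + [j]
            r2 = LinearRegression().fit(X[:, cols], y).score(X[:, cols], y)
            gains.append((r2, j))
        r2, j = max(gains)
        if r2 - best_r2 < threshold:
            break
        best_r2, selected = r2, selected + [j]
        remaining.remove(j)
    return [names[j] for j in selected], best_r2
```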

In this study, prominence in conversational speech produced by ordinary speakers is judged by untrained, ordinary listeners. This transcription task approximates how listeners hear prosody in everyday conversation. Listeners' perception of prominence is guided by the modulation of the patterns of F0, F1, and F2.

No single acoustic measure, and no single pattern of measures, marks prominence across all vowels. Therefore, other acoustic measures, as well as other factors that affect the acoustic properties of speech, should also be examined:
- Duration and intensities (Mo, 2008a, b)
- Syntactic category information (Cole, Mo & Baek, 2008)
- Word repetition and frequency (Cole, Mo & Hasegawa-Johnson, 2008)

Acknowledgements
This research is supported by NSF grants IIS 07-03624 and IIS 04-14117 to Jennifer Cole and Mark Hasegawa-Johnson.
Jennifer Cole, Linguistics, UIUC
Mark Hasegawa-Johnson, ECE, UIUC
Prosody-ASR group members