Yoonsook Mo, Department of Linguistics, University of Illinois at Urbana-Champaign


Speech utterances are composed of hierarchically structured phonological phrases. A prosodic boundary marks the phonological phrase juncture and serves to demarcate chunks of words. Within each utterance, some words or phrases are more prominent than others. Prosodic prominence highlights a word or a phrase and conveys its status as focused or discourse-new. In this paper, prominence is of particular interest.

This talk focuses on the phonetic correlates of prosodic prominence, and is part of my larger study of phonetic correlates of prosodic structure in production and perception.

Phonetic implementation
Speakers encode prosodic structure through the modulation of phonetic parameters.
Acoustic correlates of prominence:
- Fundamental frequency (F0)
- Duration (Fry, 1955 and 1958; Turk and Sawusch, 1996)
- Intensity (Fry, 1955 and 1958; Kochanski, 2005)
- Sub-band intensities (Sluijter and van Heuven, 1996; Heldner, 2001 and 2003)
- Formants
- Spectral tilt (Fant et al., 2000; Sluijter and van Heuven, 1996)

I investigate the phonetic encoding of prominence:
- in 14 vowels of American English
- in everyday conversational speech from 38 ordinary speakers of American English
- as judged by about 100 untrained, ordinary listeners
Prominence is judged by ordinary listeners based only on auditory impression, with no visual inspection of a speech display.

In other work I show duration, intensity and sub-band intensity measures to be important correlates of prominence (Mo, 2008a and b).
What effect, if any, does prominence have on F0 and on vowel formants?
- Intonation
- Hyper- vs. hypo-articulation

Fundamental frequency (F0)
Height and shape of F0 contours have been shown to be major correlates of prominence:
- Stressed vs. unstressed (Lieberman, 1969; Cooper et al., 1985, among others)
- Pitch accents (Gussenhoven et al., 1997; Hermes and Rump, 1994; Pierrehumbert, 1979; Terken, 1991 and 1994)
Still controversial:
- Perception of focal status did not change with the gradual addition of an F0 rise on non-focused words (Heldner and Strangert, 1997)
- F0 plays a minor role in the automatic classification of pitch accent (Kochanski, 2005)

Vowel quality
- Acoustic studies (Sluijter and van Heuven, 1996; van Bergem, 1993)
- Articulatory studies (Beckman et al., 1992; De Jong, 1995; Erickson, 2002; Cho, 2005)

Sonority expansion (Beckman et al., 1992)
- Under accent, articulators move to increase sonority
- More open vocal tract
Hyperarticulation (De Jong, 1995; Erickson, 2002)
- Under accent, the phonetic space of phonemic contrast expands
- Feature distinctiveness is enhanced
Combination of sonority expansion and hyperarticulation (Cho, 2005)
- Under accent, more open
- In the front/back dimension, more front or more back

To investigate the phonetic properties that cue prominence in conversational speech, as judged by ordinary listeners:
- How does fundamental frequency vary?
- How are formant structures modified?
To evaluate which underlying mechanism better describes the phenomenon of prominence, as judged by listeners.

A speaker marks a word as prosodically prominent in accordance with its pragmatic value (e.g., focused), position in the phrase, and other factors.
A speaker implements a prominent word with an F0 excursion, and with enhanced speech gestures that are longer, larger, or both. These effects are strongest on the lexically stressed syllable.
Listeners perceive a word as prominent based on acoustic evidence of the speaker's enhanced speech gesture.
Therefore, words perceived as prominent will have stressed syllables that are acoustically enriched:
- Higher F0
- Higher F1 and more peripheral F2

Experimental Hypotheses
F0
- Vowels in words perceived as prominent will have higher F0 peaks.
Vowel quality
- Hyper-articulation: vowel formants will indicate a more peripheral place of articulation, because prominence enhances phonemic contrast
  - High vowel: lower F1
  - Low vowel: higher F1
  - Front vowel: higher F2
  - Back vowel: lower F2
OR
- Sonority expansion: higher F1 regardless of vowel height
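The formant predictions of the two vowel-quality hypotheses can be written down as a small lookup. This is only an illustrative sketch: the function name `predicted_shifts` and the string labels are hypothetical, and `None` marks cases for which the slide states no prediction (for example mid or central vowels under hyper-articulation, and F2 under sonority expansion).

```python
def predicted_shifts(height, backness, hypothesis):
    """Predicted direction of formant change under prominence.

    +1 means a higher formant value, -1 a lower one, and None means
    the hypothesis as stated on the slide makes no prediction.
    All names and labels here are illustrative assumptions.
    """
    if hypothesis == "hyperarticulation":
        # Peripheral articulation: high vowels get higher (lower F1),
        # low vowels lower (higher F1), front vowels fronter (higher F2),
        # back vowels backer (lower F2).
        f1 = {"high": -1, "low": +1}.get(height)
        f2 = {"front": +1, "back": -1}.get(backness)
    elif hypothesis == "sonority_expansion":
        # More open vocal tract: higher F1 regardless of vowel height.
        f1 = +1
        f2 = None
    else:
        raise ValueError(f"unknown hypothesis: {hypothesis}")
    return {"F1": f1, "F2": f2}


# The two hypotheses disagree only on F1 for high vowels such as /i/.
hyper_i = predicted_shifts("high", "front", "hyperarticulation")
sonority_i = predicted_shifts("high", "front", "sonority_expansion")
```

High vowels are therefore the diagnostic case: hyper-articulation predicts a lower F1 for them, whereas sonority expansion predicts a higher F1.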

Materials
- 54 speech excerpts from 38 speakers in the Buckeye corpus of spontaneous speech of American English
- Sound files are equalized in their loudness level
- Length: 11 to 58 seconds
- Sound files are presented together with their corresponding word transcripts
Participants
- 97 listeners from undergraduate Linguistics courses
- Naïve in terms of phonetics and phonology of prosody transcription

Listeners were given simple definitions of prominence and boundary:
- Prominence highlights a word or a phrase and makes it stand out from other, non-prominent words
- Boundary marks a chunk of speech and can help listeners interpret long stretches of continuous speech
Listeners played the sound files twice, at their own pace. While listening, they marked prominent words and words at a juncture on the transcript, using one transcription mark for prominence and another for boundary.
[Example: a string of words annotated with the Prominence and Boundary marks]

Transcriptions are pooled over listeners; each word is assigned a probabilistic P(rominence) score and B(oundary) score ranging from 0 to 1.
[Figure: P-scores and B-scores (0.0 to 1.0) for the words of an excerpt from Speaker 26]
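The pooling step can be sketched as follows, assuming that a word's score is simply the proportion of listeners who marked it (the slide says only that scores are pooled over listeners and range from 0 to 1). The function name `pooled_scores` and the toy data are hypothetical and are not taken from the study.

```python
def pooled_scores(marks):
    """Pool binary marks over listeners into one score per word.

    marks: one list per listener, holding a 0/1 flag for every word
    (1 = the listener marked the word). Assumes every listener
    transcribed the same sequence of words.
    """
    n_listeners = len(marks)
    n_words = len(marks[0])
    return [
        sum(listener[w] for listener in marks) / n_listeners
        for w in range(n_words)
    ]


# Toy example: 4 listeners, 4 words (made-up prominence marks).
prominence_marks = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
p_scores = pooled_scores(prominence_marks)
print(p_scores)
```

The same function would give B-scores when fed the boundary marks instead of the prominence marks.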

Fleiss' kappa inter-transcriber agreement scores and their corresponding z-scores (critical value: z = 2.33, α = 0.01)

                     Exp. 1                           Exp. 2
                     Run 1           Run 2            Run 1
                     Grp 1   Grp 2   Grp 1   Grp 2    Grp 1   Grp 2
Prominence   Kappa   0.373   0.421   0.394   0.407    0.356   0.400
             z       19.43   20.48   18.15   18.31    15.31   19.56
Boundary     Kappa   0.612   0.544   0.621   0.575    0.560   0.567
             z       27.62   21.87   25.05   26.22    24.89   22.49

- Fleiss' statistic shows that the transcribers' agreement is significantly above chance levels at p<.001.
- Untrained listeners' transcription is reliable.
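For readers unfamiliar with the statistic, here is a minimal sketch of the standard Fleiss' kappa computation for a word-by-category count table. It is a textbook formulation, not the author's script; the function name and the toy counts are hypothetical, and the accompanying z-score is not computed here.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who assigned item i to category j.
    Assumes every item was rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Overall proportion of assignments that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement per item: share of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy table: 4 words, 4 raters, categories (marked, not marked).
toy_counts = [[3, 1], [0, 4], [1, 3], [4, 0]]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3))
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement at chance level, so the values in the table lie well between the two.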

F0
- Measured at 1 ms intervals
- Smoothed by median filtering with a 13-point window, only at CV junctures
- F0 contours interpolated
Formants
- Steady-state formants (F1 and F2) measured
- Monophthong: at the vowel midpoint
- Diphthong: at 10% and 90% of the vowel
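A 13-point median filter of the kind mentioned for F0 smoothing can be sketched as below. This is a generic running median under stated assumptions: the window shrinks at the edges of the contour (the slide does not say how edges were handled), and the filter is applied to the whole input, whereas the study applied it only at CV junctures. The function name and sample contour are hypothetical.

```python
from statistics import median


def median_filter(values, window=13):
    """Running median with an odd-sized window centred on each sample.

    Near the ends of the sequence the window is truncated rather than
    padded; this edge handling is an assumption of the sketch.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


# Made-up F0 track (Hz) at 1 ms steps with a one-sample tracking error.
f0_track = [120.0] * 20
f0_track[10] = 240.0  # spurious octave jump
f0_smooth = median_filter(f0_track)
```

A median filter removes isolated spikes like the octave jump above without blurring the surrounding contour the way a moving average would.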

F0, F1 and F2 are extracted from the stressed vowel of each word in order to hold stress constant.

Vowel   ɑ     æ     ʌ     ɔ     aʊ    aɪ    ɛ
N       173   290   407   121   52    309   463

Vowel   ɝ     eɪ    ɪ     i     oʊ    ʊ     u     Total
N       122   214   475   306   211   72    183   3398

The extracted acoustic measures are then normalized as z-scores, where $\bar{x}$ is the mean and $s$ the standard deviation:

$z = \frac{x - \bar{x}}{s}$

- F0: within a 400 ms analysis window
- Formants: within the total phone space
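The normalization step is an ordinary z-score, which can be sketched as follows. The use of the sample standard deviation is an assumption (the slide does not say whether the sample or population form was used), and the function name and values are illustrative only.

```python
from statistics import mean, stdev


def z_normalize(values):
    """Convert raw measures to z-scores: (x - mean) / standard deviation.

    Uses the sample standard deviation (n - 1 denominator); that choice
    is an assumption of this sketch.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Made-up F0 values (Hz), standing in for one normalization domain
# (e.g. the samples inside one analysis window).
raw_f0 = [100.0, 110.0, 120.0]
z_f0 = z_normalize(raw_f0)
print(z_f0)
```

Normalizing within a local domain (a 400 ms window for F0, the total phone space for formants) expresses each measure relative to its context, so that values from different speakers become comparable.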

Hypothesis: The more prominent a word is, the higher F0 max will be.
Pearson's bivariate correlation analysis between F0 max and P-scores
[Table: Pearson correlations of P-scores with F1, F2 and F0 max, for all vowels pooled (N = 3398) and for each of the 14 vowels; diphthong formants measured at the 10% and 90% points]
- The results support the hypothesis.
- P-scores are positively correlated with F0 max for the majority of vowels.
- Overall, words perceived as prominent have higher F0 max.


Pearson's bivariate correlation analysis between formants and P-scores
[Table: Pearson correlations of P-scores with F1, F2 and F0 max, for all vowels pooled (N = 3398) and for each of the 14 vowels; diphthong formants measured at the 10% and 90% points]
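For reference, the Pearson correlation used in these analyses can be computed as in the sketch below. The function name and the numbers are made up for illustration and are not the study's data; significance testing is not included.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up example: P-scores of five words and normalized F0 max values.
toy_p_scores = [0.1, 0.2, 0.4, 0.6, 0.9]
toy_f0_max = [-0.8, -0.1, 0.0, 0.7, 1.1]
r = pearson_r(toy_p_scores, toy_f0_max)
print(round(r, 3))
```

A positive r means that words with higher P-scores tend to have higher values of the acoustic measure, and a negative r means the opposite.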

F1
- P-scores are positively correlated with F1, regardless of vowel height, in all the monophthongs except the low back vowel, ɑ.
- P-scores are negatively correlated with F1 of the glide part of two diphthongs, eɪ and aʊ.
F2
- P-scores are positively correlated with F2 of the front high vowel, i.
- P-scores are negatively correlated with F2 of many central and back vowels and of the nucleus part of two diphthongs, aɪ and oʊ.


[Figure: vowel space, front to back and high to low, showing the monophthongs i, ɪ, ɛ, æ, ɝ, ʌ, ɑ, ɔ, ʊ, u]

[Figure: vowel space, front to back and high to low, showing diphthong trajectories; labeled points ɪ, ʊ, o, a]


[Figure: vowel space, front to back and high to low, showing diphthong trajectories; labeled points ɪ, ʊ, e, a]

Hyperarticulation
The stressed vowels perceived as prominent are peripheral in the vowel space.
Partially supported, in the front/back dimension:
- The front vowel i, the nucleus of aʊ, and the glide of eɪ are more front when perceived as prominent.
- The vowels other than those listed above are more back when perceived as prominent.

Sonority Expansion
Regardless of vowel height, the stressed vowel in a prominent word is more open.
Supported:
- When perceived as prominent, vowels have a more open vocal tract, except for the low vowel ɑ and the diphthongs.

The combination of Hypotheses 2 and 3 best accounts for the relation between formants and prosodic prominence.
- In the front/back dimension, peripheral vowel formants (F2) suggest that vowels are hyperarticulated under prominence.
- In the high/low dimension, higher vowel formants (F1) of non-low vowels suggest that sonority expands under prominence.

[Figure: R² (%), on a scale from 0 to 25, for F0 max, F2 and F1, by vowel: ɑ (aa), æ (ae), ʌ (ah), ɔ (ao), aɪ (ay), aʊ (aw), ɛ (eh), ɝ (er), eɪ (ey), ɪ (ih), i (iy), oʊ (ow), ʊ (uh), u (uw)]

Results of the stepwise regression analyses:
- Only a small portion of the variation in listeners' responses to prominence (ranging from 3.3% for /æ/ to 23.2% for /aɪ/) can be explained on the basis of these measures.
- No single acoustic measure is included in the regression models of all vowels.
- No unified regression pattern accounts for the variation in prominence.

In this study, prominence in conversational speech produced by ordinary speakers is judged by untrained, ordinary listeners. This transcription task approximates how listeners hear prosody in everyday conversation. Listeners' perception of prominence is guided by the modulation of the patterns of F0, F1 and F2.

There is no single acoustic measure and no single pattern of prominence marking across vowels. Therefore, other acoustic measures, as well as other factors that affect the acoustic properties of speech, should also be examined:
- Duration and intensities (Mo, 2008a and b)
- Syntactic category information (Cole, Mo & Baek, 2008)
- Word repetition and frequency (Cole, Mo & Hasegawa-Johnson, 2008)

Acknowledgements
This research is supported by NSF grants IIS 07-03624 and IIS 04-14117 to Jennifer Cole and Mark Hasegawa-Johnson.
- Jennifer Cole, Linguistics, UIUC
- Mark Hasegawa-Johnson, ECE, UIUC
- Prosody-ASR group members

The two separate experiments comprise three runs in total. The experiments differ in the length of the speech excerpts.
- Exp. 1: 11-22 sec
- Exp. 2: 31-58 sec

                     Exp. 1                         Exp. 2
                     Run 1          Run 2           Run 1
                     Grp 1   Grp 2  Grp 1   Grp 2   Grp 1   Grp 2
N of transcribers    15      16     20      23      11      12

- P-scores in the two experiments are not statistically different (F=3.028, p=.082).
- P-scores of the 14 vowels are different from one another (F=7.509, p<.001).
[Figure: mean P-scores (0.10 to 0.30) by vowel (aa, ae, ah, ao, aw, ay, eh, er, ey, ih, iy, ow, uh, uw) for Experiments 1 and 2]