Voice Source Correlates of Prosodic Features in American English: A Pilot Study

Markus Iseli*, Yen-Liang Shue*, Melissa A. Epstein**, Patricia Keating**, Jody Kreiman***, and Abeer Alwan*
* Department of Electrical Engineering, UCLA
** Department of Linguistics, UCLA
*** Department of Head and Neck Surgery, UCLA
Work supported in part by the NSF

Goal
To investigate how certain acoustic measures related to the voice source (F0, H1*-H2*, LIN, RK, and Ee) correlate with prosodic events.

Motivation
Prosodic events are conveyed in part by the voice source.
Few studies have analyzed voice source parameters in connected speech (e.g. Fant & Kruckenberg 1994, Sluijter & Van Heuven 1996, Epstein 2002, Kochanski et al. 2005, Choi et al. 2005).
Speech processing applications would benefit from knowledge of how voice source parameters depend on prosody.

Introduction: Prosody
Prosody broadly refers to intonation, phrasing, timing, and lexical stress in speech.
Lexical stress makes a particular syllable in a word more prominent.
Pitch accents signify prominence of a word within a phrase. Here, both low (L*) and high (H*) pitch accents are studied.
Boundaries indicate breaks between groups of words.

Acoustic measures: LF model measures
[Figure: LF-model waveform u(t) over one period T0, marking the time points t_a, t_p, t_e, and t_c, the negative peak -Ee, and the open, return, and closed phases]
F0 = 1/T0
Ee is proportional to intensity
RK = (t_e - t_p)/t_e is related to glottal skew (inversely related to high-frequency energy)
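The two timing-based measures on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' estimation procedure (which relies on inverse filtering and LF-fitting): the function name `lf_timing_measures` and the example timings are hypothetical, and the RK denominator follows the slide as written. Other LF-model descriptions normalize by t_p instead, so the definition in use should be checked.

```python
def lf_timing_measures(t_p, t_e, T0):
    """Return (F0 in Hz, RK) from LF-model timing landmarks given in seconds.

    t_p: time of the flow peak, t_e: time of the main excitation,
    T0: fundamental period. All values here are illustrative assumptions.
    """
    if not (0 < t_p < t_e <= T0):
        raise ValueError("expected 0 < t_p < t_e <= T0")
    f0 = 1.0 / T0                # F0 = 1/T0
    rk = (t_e - t_p) / t_e       # RK as written on the slide
    return f0, rk


# Made-up timings for one 10 ms glottal cycle.
f0, rk = lf_timing_measures(t_p=0.004, t_e=0.006, T0=0.010)
print(f"F0 = {f0:.1f} Hz, RK = {rk:.3f}")
```

A larger RK here means the excitation instant falls later relative to the flow peak, which the slide associates with glottal skew.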

Acoustic measures (cont'd)
[Figure: source spectrum U(f) in dB versus f (Hz), with the harmonic amplitudes H1* at F0 and H2* at 2F0]
H1*-H2* is related to open quotient (Holmberg 1995)
LIN is proportional to high-frequency energy
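As a rough illustration of the harmonic-difference idea, the sketch below measures the uncorrected H1-H2 of a synthetic two-harmonic signal by evaluating the DFT at F0 and 2F0. It assumes F0 is already known and does not apply the formant correction that the asterisks in H1*-H2* denote; the function names and the test signal are hypothetical.

```python
import cmath
import math


def magnitude_db(signal, freq_hz, fs_hz):
    """Magnitude in dB of the signal's DFT evaluated at a single frequency."""
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
        for n, x in enumerate(signal)
    )
    return 20.0 * math.log10(abs(acc) / len(signal))


def h1_minus_h2(signal, f0_hz, fs_hz):
    """Uncorrected H1-H2 in dB: level at F0 minus level at 2*F0."""
    return magnitude_db(signal, f0_hz, fs_hz) - magnitude_db(signal, 2 * f0_hz, fs_hz)


# Synthetic example: second harmonic at half the amplitude of the first,
# sampled at 10 kHz over exactly ten periods of a 100 Hz fundamental.
fs, f0, n_samples = 10_000, 100, 1_000
signal = [
    1.0 * math.sin(2 * math.pi * f0 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
    for n in range(n_samples)
]
print(f"H1-H2 = {h1_minus_h2(signal, f0, fs):.2f} dB")
```

Because the analysis window spans a whole number of periods, the two harmonics do not leak into each other, so the result reflects only the 2:1 amplitude ratio. Real speech would need windowing and the formant correction step.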

Materials: The corpus
The corpus (Epstein, 2002) consists of the following eight-syllable sentences, which were ToBI labeled:
  Dagada gave Bobby doodads.
  Dagada gave Bobby doodads.
  Dagada gave Bobby doodads?
  Dagada gave Bobby doodads?
Focused words (shown in bold on the original slide) define the pitch accent (PA) factor.
Two sentences are declarative and two are interrogative: the sentence type/boundary (BOUND) factor.
Stressed and unstressed syllables are compared to examine the lexical stress (STR) factor.

Speakers and Material
Speakers: 3 adult (25-35 years old) native speakers of American English: 2 females (B and S) and 1 male (L)
Signals were collected in a sound booth with a 1.0 B & K condenser microphone and sampled at 20 kHz (later downsampled to 10 kHz)
Each sentence was recorded 10 times by each speaker; the first and last recordings were discarded in the analysis.
Total number of syllables analyzed: 700

Method: Estimation of source-related measures
F0, Ee, RK, and LIN are estimated by inverse filtering and LF-fitting. Measures are taken over one cycle.
H1*-H2* is obtained as follows:
  SNACK (Sjölander, 2004): formants F1, F2 and bandwidths B1, B2
  STRAIGHT (Kawahara et al., 1998): F0
  Parameter extraction: H1, H2
  Formant correction (Iseli et al., 2004): H1*, H2*

Inter- and intra-correlations
Acoustic features*: F0, Ee, RK, LIN, H1*-H2*
Prosodic features: Stress, Pitch Accent, Boundary
* All measures are z-score normalized for each utterance
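The per-utterance z-score normalization mentioned in the footnote can be sketched as follows. The slides do not say whether a population or sample standard deviation was used, so the population form is an assumption here, and the function name, utterance IDs, and values are made up for illustration.

```python
from statistics import mean, pstdev


def zscore_per_utterance(values_by_utterance):
    """Z-score each measure value against its own utterance's mean and spread.

    values_by_utterance maps an utterance ID to that utterance's list of
    values for one measure (for example, one F0 value per syllable).
    Uses the population standard deviation (an assumption).
    """
    normalized = {}
    for utt_id, values in values_by_utterance.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            raise ValueError(f"utterance {utt_id!r} has zero variance")
        normalized[utt_id] = [(v - mu) / sigma for v in values]
    return normalized


# Made-up F0 values (Hz) for two hypothetical utterances.
example = {"utt1": [100, 120, 140], "utt2": [180, 200, 260]}
for utt_id, z in zscore_per_utterance(example).items():
    print(utt_id, [round(v, 3) for v in z])
```

Normalizing within each utterance removes differences in overall level between utterances and speakers, so only relative movement within an utterance remains.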

Results: Correlation between Ee and F0
[Figure: Ee versus F0, with F0r (140 Hz) marked; compare to the mid-frequency F0r presented in Fant et al. (1996)]
Pearson's correlation coefficients (r) shown on the plot: 0.678 and -0.488

Results: Correlation between LIN and F0
[Figure: LIN versus F0, with F0r (140 Hz) marked]
Pearson's r shown on the plot: 0.537 and -0.294

Results: Correlation between RK and F0
[Figure: RK versus F0, with F0r (140 Hz) marked]
Pearson's r shown on the plot: -0.615 and 0.379

Other statistically significant intra-correlations
For all F0:
  Ee is positively correlated with LIN (r = 0.708)
  RK is negatively correlated with LIN (r = -0.711)
  RK is negatively correlated with Ee (r = -0.593)
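The F0-dependent correlations above suggest computing Pearson's r separately on either side of a reference frequency. The sketch below assumes that each pair of coefficients on the plots corresponds to F0 below and above F0r, and that the split is made on raw F0 in Hz; neither is stated explicitly on the slides. The function names and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_correlations(f0_values, measure_values, f0_threshold_hz=140.0):
    """Return (r for F0 <= threshold, r for F0 > threshold)."""
    pairs = list(zip(f0_values, measure_values))
    low = [(f, m) for f, m in pairs if f <= f0_threshold_hz]
    high = [(f, m) for f, m in pairs if f > f0_threshold_hz]
    return pearson_r(*zip(*low)), pearson_r(*zip(*high))


# Made-up data: the measure rises with F0 up to 140 Hz, then falls.
f0_hz = [100, 110, 120, 130, 140, 150, 160, 170, 180]
measure = [1, 2, 3, 4, 5, 4, 3, 2, 1]
r_low, r_high = split_correlations(f0_hz, measure)
print(f"r (F0 <= 140 Hz) = {r_low:.3f}, r (F0 > 140 Hz) = {r_high:.3f}")
```

A sign change between the two coefficients, as in this toy data, is the pattern that a single correlation over the whole F0 range would hide.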

Results: Intercorrelations
[Table: correlations of F0, Ee, LIN, RK, and H1*-H2* (rows) with STR (no vs. yes), PA (no vs. yes), PA (L* vs. H*), and BOUND (dec vs. int) (columns); entries are color-coded as MALE, FEMALES, or BOTH]
Correlations shown are statistically significant at p < .01

Differences from our published Interspeech '06 paper
In the published paper, measures were not z-score normalized, and we did not separate the results for female and male speakers.
As a result of the normalization, H1*-H2* is no longer a correlate of stress or of pitch accent, and Ee is no longer a correlate of sentence type. Instead, F0 is shown to be a correlate of lexical stress.
In addition, there was a gender-related (or perhaps F0-related) dependency for RK relative to stress and sentence type.

Summary and Conclusions
For our data set:
Lexical stress: results in lower F0 and in lower/higher RK for the male/female talkers.
Pitch accent: it is important to distinguish between low and high tones. For all talkers, F0, intensity, and high-frequency energy (as measured by LIN and RK) are higher for H* than for L*.
Boundaries: interrogative sentences have higher F0 and LIN, and lower open quotient (as measured by H1*-H2*), than declarative sentences. RK was speaker specific.

Comparison with other work
Choi et al., 2005: H1-H2 and spectral tilt measures are not useful for identifying accents. Amplitude is larger for accented syllables.
We agree that H1*-H2* measures are not correlated with stress or pitch accent, and that Ee is correlated with pitch accent. However, we find that spectral tilt and glottal skew are correlated with pitch accent (they did not distinguish between L* and H*).

Comparison with other work (cont'd)
Sluijter & Van Heuven, 1996: Stressed syllables have more high-frequency energy, and accented syllables have higher intensity.
Here, only the female speakers showed smaller glottal skew for stressed syllables. Moreover, Ee is higher for H* than for L*.
Fant & Kruckenberg, 1996: In Swedish, F0 is a stress correlate. F0, intensity, and high-frequency emphasis are correlated with pitch accent.
Here, we also find that F0 is a correlate of stress; in addition, female speech shows high-frequency emphasis. For pitch accent, when distinguishing between H* and L*, we find similar results.

Summary and Conclusions (cont'd)
The absolute value of F0 affects how Ee, LIN, and RK are correlated with F0.
Among the five parameters studied, RK was the most speaker dependent.
In the future, we will examine whether these results generalize to a larger database.

Thank you