Rhythm-typology revisited.

DFG Project BA 737/1: "Cross-language and individual differences in the production and perception of syllabic prominence. Rhythm-typology revisited."
B. Andreeva & W. Barry, Jacques Koreman

Outline
- Research questions
- Recordings
- Measurements
- Statistical analysis
- Results
- Discussion
- Conclusions and Outlook

Research questions
- How do different languages exploit the universal, psycho-acoustically determined means of modifying the prominence of words in an utterance?
  - duration
  - fundamental frequency
  - energy
  - spectral properties
- Do the different word-phonological requirements of a language affect the degree to which the properties are exploited?
  - duration (length opposition; word stress)
  - fundamental frequency (tonal word-accent)
  - spectral properties (phonologized vowel reduction)
- Do speakers of a language vary in the strategies they adopt (for production and for perception)?

For further clarification
We have NOT investigated "word stress / word accent", but rather the change in a given word as a result of making it more or less informationally prominent in the utterance. That is, the loss of length distinction in the [o] in German Philosophie vs. Philosoph, or the vowel quality alternation between [ɒ] and [ə] in English philosopher and philosophical, is not the focus of our investigation (though it may have a bearing on our interpretation of results).

Phrasal (de-)accentuation
Accentuation (phonological) can make a word prominent (phonetic):
- by lengthening,
- by increasing loudness,
- by changing the pitch,
- and combinations thereof.
De-accentuation can reduce prominence:
- by shortening (including segment elision),
- by decreasing loudness,
- by avoiding pitch changes,
- by reducing spectral distinctiveness.
These properties determine the rhythm type.

The link to rhythm?
- Speech rhythm (as a regular syllable-based or foot-based "beat") is an appealing myth.
- Though we do have a very fine sense of the appropriate temporal patterning of any particular utterance (in any particular situation), in fact we decode it in terms of information weight.
- Structural differences between languages are important because they determine the temporal patterns, and they may constrain how words are made prominent.
- 'Rhythm' = utterance-dependent prominence pattern (not only determined by duration)

Principle of our approach
Comparable production task across languages: different degrees of accentuation on the same words, obtained by eliciting different focus conditions for the same sentence.

Material and elicitation
- Short sentences were constructed containing two one- or two-syllable "critical words" (CWs), one early (but not initial) and one late (but not final) in the sentence.
- Iterative versions (dada) were added to support comparisons across languages.

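German example (comparable in BG, F, N, RUS)
Carrier sentence: Der Mann fuhr den Wagen vor. ("The man drove the car up.")

- Broad focus
  Question: Was sagst du? ("What are you saying?")
  Response: Der Mann fuhr den Wagen vor.
- Narrow focus, early
  Question: Wer fuhr den Wagen vor? ("Who drove the car up?")
  Response: Der MANN fuhr den Wagen vor.
- Narrow focus, late
  Question: Was fuhr der Mann vor? ("What did the man drive up?")
  Response: Der Mann fuhr den WAGEN vor.
- Narrow contrastive focus, early
  Question: Die DAME fuhr den Wagen vor? ("The LADY drove the car up?")
  Response: Der MANN fuhr den Wagen vor.
- Narrow contrastive focus, late
  Question: Der Mann fuhr die KLAGEN vor? ("The man drove the COMPLAINTS up?")
  Response: Der Mann fuhr den WAGEN vor.

The questions were pre-recorded to accompany a PowerPoint presentation of the responses, which were given in a text version and a dada version.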

Levels of prominence
[Figure: spectrograms (frequency in kHz) and pitch tracks (Hz) over time (s) of "Der Mann fuhr den Wagen vor" in three panels (early focus, broad focus, late focus), with CW1 and CW2 marked.]

Feature specification of the critical words per panel:
- Early focus: CW1 = + stress, + acc., + nucl., + narrow; CW2 = + stress, - acc., - nucl., + narrow
- Broad focus: CW1 = + stress, + acc., - nucl., - narrow; CW2 = + stress, + acc., + nucl., - narrow
- Late focus: CW1 = + stress, - acc., - nucl., + narrow; CW2 = + stress, + acc., + nucl., + narrow

Breakdown of analysis
- Material: 6 sentences, 6 repetitions, 3 focus conditions (broad, narrow, narrow contrastive), 2 sentence positions (early, late), 2 realisational variants (lexical, delexicalised iterative)
- Languages: Bulgarian, French, German, Norwegian, Russian
- Speakers: 6 regionally homogeneous speakers (3 m, 3 f) per language (Sofia, northern standard French, Saarland, south-east Norway, Moscow area)
- Analysis total per language: 216 utterances

Measurements
- Duration: duration (ms) of stressed vowels, stressed syllables, CWs, feet
- F0:
  - mean F0 across the stressed vowel of the CW
  - F0 change (comparison of the stressed vowel in the CW with preceding/following vowels)
- Energy:
  - intensity (dB) of the stressed vowel in the CW
  - spectral balance = difference between the 7-1 Hz band and the 12-5 Hz band in the stressed vowel of the CW
- Spectral definition: F1 to F3 at the middle of the stressed nucleus of the CW
Measures were normalized relative to the mean across corresponding units in the sentence.
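The normalization step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that "normalized relative to the mean" means dividing the critical unit's value by the mean over the corresponding units of the same sentence. The function name and the duration values are hypothetical.

```python
from statistics import mean


def normalise_to_sentence_mean(values, target_index):
    """Return the target unit's value divided by the mean over all units.

    Assumption: normalization is a ratio to the sentence mean. For measures
    already on a log scale (e.g. dB), a difference might be used instead.
    """
    sentence_mean = mean(values)
    if sentence_mean == 0:
        raise ValueError("sentence mean is zero; ratio undefined")
    return values[target_index] / sentence_mean


# Hypothetical stressed-vowel durations (ms) for one sentence; index 1 is the CW.
vowel_durations_ms = [80.0, 120.0, 100.0]
relative_duration = normalise_to_sentence_mean(vowel_durations_ms, 1)
print(round(relative_duration, 2))  # 1.2
```

A value above 1 means the critical word's vowel is longer than the sentence average, which makes utterances spoken at different rates comparable.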

Statistical analysis
One-way repeated-measures ANOVA per parameter, for CW1 and CW2 separately
- dependent variables: duration (syllable, onset, vowel); F0 mean, F0 change; intensity, spectral tilt; F1, F2, F3
- within-subject variable: prominence (broad, early narrow, late narrow, contr. early narrow, contr. late narrow)
- between-subject variable: language (BG, D, F, N, RUS)
Purpose: to see whether the prominence categories are realised differently across languages.

Statistical analysis (cont.)
Multivariate ANOVAs per language, for CW1 and CW2 separately
- dependent variables: duration (syllable, onset, vowel); F0 mean, F0 change; intensity, spectral tilt; F1, F2, F3
- independent variable: prominence (broad, early narrow, late narrow, contr. early narrow, contr. late narrow)
Purpose: to evaluate which parameters are used to distinguish prominence categories in the five languages.

Results: ANOVA with repeated measures
[Table: main effects for language and language x prominence interactions, for CW1 and CW2, by parameter (syllable, onset and vowel duration; F0 mean, F0 change; intensity, spectral tilt; F1, F2, F3). Only scattered "n.s." entries remain of the cell values, so individual cells cannot be assigned.]

Languages use the acoustic carriers of prominence to different degrees: η² values for prominence
η² is the ratio of the variance due to the conditions (prominence) to the total variance, and thus indicates the part of the total variance explained by the focus conditions.
[Figure: bar charts of η² values (scale 0 to 1), one panel per language (BG, D, F, N, RUS), for the parameters syllable, onset and vowel duration, F1, F2, F3, intensity, spectral tilt, F0 change and F0 mean.]
* Results are given here for CW1; patterns for CW2 are similar.
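The η² measure described above can be sketched for a simple one-way layout as below. This is an illustration only: it computes classical η² (between-condition sum of squares over total sum of squares) and ignores the repeated-measures structure of the actual design. The function name and the data are hypothetical.

```python
def eta_squared(groups):
    """Classical eta squared for a one-way layout: SS_between / SS_total.

    `groups` is a list of lists, one list of observations per condition.
    """
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        raise ValueError("no variance in the data; eta squared undefined")
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


# Hypothetical normalized durations under two prominence conditions.
deaccented = [1.0, 2.0, 3.0]
accented = [4.0, 5.0, 6.0]
print(round(eta_squared([deaccented, accented]), 3))  # 0.771
```

A value near 1 means that almost all variation in the parameter is tied to the prominence condition; a value near 0 means the parameter hardly separates the conditions.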

Results: Duration
Syllable duration range from accented to deaccented (from [dada] recordings):
- CS1: N > F > D ~ RUS > BG (N 49%, F 30%, D 25%, RUS 24%, BG 15%)
- CS2: N > F > RUS > D ~ BG (N 55%, F 37%, RUS 26%, D 19%, BG 16%)
Note: No apparent connection between vowel-length opposition and use of duration for accentuation (compare N and D vs. F, RUS and BG).
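A percentage range of this kind can be computed as in the sketch below. The slides do not state the formula, so the reference value is an assumption: here the range is the difference between the accented and deaccented value, expressed as a percentage of the accented value. The function name and the example durations are hypothetical.

```python
def percent_range(accented, deaccented):
    """Range from accented to deaccented as a percentage of the accented value.

    Assumption: the accented realisation is the reference (100%).
    """
    if accented == 0:
        raise ValueError("accented value is zero; percentage undefined")
    return (accented - deaccented) / accented * 100.0


# Hypothetical syllable durations (ms): accented vs. deaccented [da].
print(round(percent_range(200.0, 102.0), 1))  # 49.0
```

Taking the deaccented value as the reference instead would give larger percentages for the same pair of durations, so the choice of reference matters when comparing figures across studies.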

Results: Duration
[Figure: duration per condition (c. late, nc. late, broad, c. early, nc. early) for CW1 and CW2.]
CW1:
- BG: nc_late < c_early
- D: c_late < c_early
- F: late, br < br, early
- N: late, br < early
- RUS: c_late < nc_early
CW2:
- BG: early < c_late
- D: early, br < late
- F: early, br < late
- N: early < br < late
- RUS: early, br < late

Results: F0 range
F0 range in % from accented to deaccented (from [dada] recordings):
- CS1: F > D > BG ~ RUS ~ N (F 29%, D 23%, BG 18%, RUS 14%, N 13%)
- CS2: F ~ D > BG > RUS > N (F 28%, D 27%, BG 23%, RUS 16%, N 7%)
These values do not have any systematic link to pitch accent categories, but note Norwegian (lexical tones).

Results: F0 change
[Figure: F0 change per condition (c. late, nc. late, broad, c. early, nc. early) for CW1 and CW2.]
CW1:
- BG: -
- D: late, br < early
- F: late, br < early
- N: late < br < early
- RUS: late < c_early
CW2:
- BG: early, br < c_early, c_late < br, late
- D: early < br < late
- F: early, br < late
- N: -
- RUS: early, br < br, late

Results: Intensity
Intensity range in dB from deaccented to accented (from [dada] recordings):
- CS1: BG > F > D = RUS > N (BG 5.6, F 3.4, D 2.9, RUS 2.9, N 1.5)
- CS2: BG > F ~ D > RUS > N (BG 6.1, F 5.7, D 5.3, RUS 3.9, N 2.7)
Note: Larger intensity range for CS2 than CS1 due to greater post-nuclear than pre-nuclear de-accenting.

Results: Intensity
[Figure: intensity per condition (c. late, nc. late, broad, c. early, nc. early) for CW1 and CW2.]
CW1:
- BG: late, br < early
- D: late < br < early
- F: late < br < early
- N: late, br < early
- RUS: late, br < br, nc_early < early
CW2:
- BG: early, br < late
- D: early < br < late
- F: early < br < late
- N: early, br < br, late
- RUS: early, br < late

Perception tests
- Different values in production analysis imply differential perceptual judgements, therefore pairwise presentation of different conditions (broad, contrastive early, contrastive late, non-contrastive early, non-contrastive late).
- Continuous prominence values are preferable for statistical treatment, therefore non-categorical judgements (using a graphic interface).

Perception test interface
- A mouse click plays the two versions in sequence.
- The sequence may be played as often as required.
- Both sequences are offered during the course of the experiment.
[Figure: response interfaces for the 1st and the 2nd critical word. Each shows the first sentence ("Erster Satz: Der Mann fuhr den Wagen vor.") and the second sentence ("Zweiter Satz: Der Mann fuhr den Wagen vor."), with a response scale running from "1. stärker" (first stronger) through "beide gleich stark" (both equally strong) to "2. stärker" (second stronger).]

Perception tests (cont.)
- Signal manipulation: change one parameter at a time to the value of the opposite prominence status (accented to unaccented and vice versa).
- Problem: the parameters are not totally independent; a durational change affects the F0 contours.
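The one-parameter-at-a-time design can be sketched as below. This only illustrates how the stimulus specifications might be enumerated; it does not perform any resynthesis, and it does not model the interdependence between duration and F0 noted above. The function name, parameter names and values are hypothetical.

```python
def single_parameter_swaps(base, opposite, parameters=("duration", "f0", "intensity")):
    """Build one stimulus specification per parameter.

    Each specification copies `base` and replaces exactly one parameter
    with the value from `opposite` (the other prominence status).
    """
    stimuli = {}
    for name in parameters:
        manipulated = dict(base)  # copy, so the base specification stays intact
        manipulated[name] = opposite[name]
        stimuli[name] = manipulated
    return stimuli


# Hypothetical parameter values for the accented and unaccented realisation.
accented = {"duration": 250.0, "f0": 220.0, "intensity": 72.0}
unaccented = {"duration": 180.0, "f0": 180.0, "intensity": 68.0}

stimuli = single_parameter_swaps(accented, unaccented)
print(stimuli["f0"])  # {'duration': 250.0, 'f0': 180.0, 'intensity': 72.0}
```

Calling the function with the arguments reversed gives the opposite direction (an unaccented base with one accented parameter), which covers the "vice versa" case.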

Results
- Pairwise comparison of natural stimuli: the subjects are well able to distinguish the different levels of prominence.
- Perception with parameter-manipulated stimuli: F0 > Duration > Intensity (Russian subjects are slightly more sensitive to intensity).

Discussion
- Isačenko & Schädlich 1966 and Fry 1958 found the same hierarchy in their perception experiments.
- But Kochanski et al. 2005 and Tamburini & Wagner 2007 found loudness/intensity to be the main predictor of prominence in their production analyses.
- N.B. Fry and Isačenko & Schädlich worked exclusively with lexical stress; Kochanski et al. and Tamburini & Wagner combine lexical stress and phrasal prominence and worked only on production.
- Our results (η² values) show a similar importance of intensity in production, but the perception work supports Fry's and Isačenko & Schädlich's conclusions!

Conclusions and Outlook
- The languages differ in the degree to which they exploit duration, F0 and intensity in production, and to some extent in perception.
- The differences (in production and perception) are not directly linked to structural differences between the languages.
- None of the results support the mythological rhythm typology: stress-timed vs. syllable-timed.
- The complex picture of language differences in production contrasts with an apparently universal perceptual hierarchy (F0 > Duration > Intensity).
- All previous rhythm typology work has concentrated solely on duration.
- Natural communication combines intonation and segmental structure within an information-structural framework. Languages will therefore differ rhythmically as a product of duration AND F0, and rhythm measures need to reflect this.