Topics in Linguistic Theory: Laboratory Phonology Spring 2007


MIT OpenCourseWare http://ocw.mit.edu 24.910 Topics in Linguistic Theory: Laboratory Phonology Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

24.910 Laboratory Phonology The Theory of Adaptive Dispersion Image by MIT OpenCourseWare. Adapted from Liljencrants, Johan, and Bjorn Lindblom. "Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast." Language 48, no. 4 (December 1972): 839-862.

Readings for next week: Steriade (1999), pp. 1-21; Wright (2004). Assignment: waveform editing.

Lindblom's Theory of Adaptive Dispersion

Common vowel inventories:
- 3 vowels: i, a, u (Arabic, Nyangumata, Aleut, etc.)
- 5 vowels: i, e, a, o, u (Spanish, Swahili, Cherokee, etc.)
- 7 vowels: i, e, ɛ, a, ɔ, o, u (Italian, Yoruba, Tunica, etc.)

Unattested vowel inventories: [vowel charts in original slide]

Lindblom's Theory of Adaptive Dispersion. Tries to explain why vowel systems are the way they are. Observation: vowels in an inventory tend to be evenly dispersed through the vowel space (cf. Disner 1984). Hypothesis: this facilitates efficient communication by minimizing the likelihood of confusing vowels. [Figure: vowel inventories plotted in the F1 x F2 (mel) vowel space. Figure by MIT OpenCourseWare. Adapted from Liljencrants, Johan, and Bjorn Lindblom. "Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast." Language 48, no. 4 (December 1972): 839-862.]

Lindblom's Theory of Adaptive Dispersion. Tries to explain why vowel systems are the way they are. Observation: vowels in an inventory tend to be evenly dispersed through the vowel space (cf. Disner 1984). Hypothesis: this facilitates efficient communication by minimizing the likelihood of confusing vowels. Vowels that are closer in the perceptual space are more easily confused, and confusions between contrasting sounds impair communication. So contrasting vowels should be as far apart as possible (dispersion).

Liljencrants & Lindblom (1972). Approach to exploring the dispersion hypothesis: modeling, simulation, and comparison of the simulation results to impressionistic descriptions of a large sample of vowel inventories.

Liljencrants and Lindblom (1972). The role of perceptual contrast in predicting vowel inventories. The perceptual space of articulatorily possible vowels: [Figures: the vowel space plotted as F1 x F2 and F2 x F3, in kHz and mel. Figures by MIT OpenCourseWare. Adapted from Liljencrants, Johan, and Bjorn Lindblom. "Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast." Language 48, no. 4 (December 1972): 839-862.]

The vowel space. Why does the vowel space look like this? Why do the dimensions correspond to formant frequencies? Why just the first 2-3 formant frequencies? Why does the F1-F2 space have this shape? [Figures: the vowel space plotted as F1 x F2 and F2 x F3, in kHz and mel. Figures by MIT OpenCourseWare. Adapted from Liljencrants, Johan, and Bjorn Lindblom. "Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast." Language 48, no. 4 (December 1972): 839-862.]

Why do the perceptual dimensions of vowel quality correspond to formant frequencies? (cf. Pierrehumbert 2000) Production: we can control formant frequencies. Given that vowels are produced with a relatively open vocal tract, the properties of these sounds that we can manipulate most easily are: f0 (pitch), a source property and the basis for tone contrasts; and the formants, a filter property: the resonant frequencies of the vocal tract. Bandwidths and formant intensities generally covary with formant frequencies (Fant 1956). Varying bandwidths independently would involve changing the stiffness of the vocal tract walls, or the mode of vocal fold vibration. (NB: nasalization affects formant bandwidths.)

Why do the perceptual dimensions of vowel quality correspond to formant frequencies? Perception: we can perceive formant peaks. f0 is (usually) much lower than the formant frequencies, so the resonant frequencies are well represented as peaks in the output spectrum. Exception: soprano singing. Formant peaks are more robustly perceptible than valleys because they can rise above background noise.

Why do the perceptual dimensions of vowel quality correspond to F1, F2 (& F3)? Higher formants are not important in vowel quality because they are insufficiently perceptible (especially in noise). There is less energy in the voice source at higher frequencies, and our ears are less sensitive to higher frequencies. [Figures: glottal source waveform and spectrum (dB vs. Hz), and the audibility curve from the lower limit of audibility to the upper limit of hearing, 20 Hz-20 kHz. Figures by MIT OpenCourseWare.]

The vowel space. Why does the range of possible F2 values taper as F1 increases? How do you achieve maximum and minimum F1? How do you achieve maximum and minimum F2? [Figures: the vowel space plotted as F1 x F2 and F2 x F3, in kHz and mel. Figures by MIT OpenCourseWare. Adapted from Liljencrants, Johan, and Bjorn Lindblom. "Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast." Language 48, no. 4 (December 1972): 839-862.]

Liljencrants and Lindblom (1972). Perceptual distinctiveness of the contrast between V_i and V_j: the distance between the vowels in the perceptual vowel space,

r_ij = sqrt((x_i - x_j)^2 + (y_i - y_j)^2)

where x_n is F2 of V_n in mel, and y_n is F1 of V_n in mel.

Maximize distinctiveness: select N vowels so as to minimize E, where

E = sum_{i=1}^{n-1} sum_{j=0}^{i-1} 1 / r_ij^2

Liljencrants and Lindblom (1972). Prediction: vowel inventories with a given number of vowels should arrange those vowels so as to minimize E. What are those predicted vowel arrangements? Optimization problem: for N vowels, find the F1, F2 values that minimize the objective function

E = sum_{i=1}^{n-1} sum_{j=0}^{i-1} 1 / r_ij^2

Large search space, many local minima.
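As a concrete illustration of the objective function, here is a minimal sketch in Python. The coordinates are made up for illustration and are not L&L's actual mel values:

```python
def distinctiveness_E(vowels):
    """Sum of 1/r_ij^2 over all vowel pairs, where r_ij is Euclidean
    distance in the (F2, F1) mel plane. Lower E = better dispersed."""
    E = 0.0
    for i in range(1, len(vowels)):
        for j in range(i):
            (x1, y1), (x2, y2) = vowels[i], vowels[j]
            r_sq = (x1 - x2) ** 2 + (y1 - y2) ** 2
            E += 1.0 / r_sq
    return E

# Illustrative (F2, F1) mel coordinates -- roughly i, u, a.
spread = [(1500.0, 250.0), (500.0, 250.0), (1000.0, 750.0)]
# Same three vowels, but with two of them crowded together.
crowded = [(1500.0, 250.0), (1400.0, 300.0), (1000.0, 750.0)]

print(distinctiveness_E(spread) < distinctiveness_E(crowded))  # True
```

Because each pair contributes 1/r^2, one crowded pair dominates the total, which is exactly why E penalizes poorly dispersed inventories.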

Minimizing E - stochastic search:
- Start with the vowels arranged in a circle near the center of the vowel space. (A random arrangement might be better?)
- Pick a vowel at random.
- Try small movements of that vowel in 6 directions (within the vowel space).
- Select the direction that results in the greatest reduction in E.
- Move the vowel in that direction until E stops decreasing, or a boundary is reached.
- Repeat for all vowels.
- Cycle through the vowels until no further reduction in E can be achieved.
The search should be repeated multiple times, preferably with different starting configurations. More sophisticated search strategies are possible, e.g. simulated annealing, or better procedures for identifying the best change at each stage.
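The steps above can be sketched as follows. This is a simplified reconstruction, not L&L's actual program: the vowel space is approximated by a rectangle in mel coordinates, and the vowels are visited in order rather than picked at random:

```python
import math

def E(vowels):
    """Dispersion cost: sum over all pairs of 1/r_ij^2,
    with r_ij measured in the (F2, F1) mel plane."""
    return sum(
        1.0 / ((vowels[i][0] - vowels[j][0]) ** 2 + (vowels[i][1] - vowels[j][1]) ** 2)
        for i in range(1, len(vowels)) for j in range(i)
    )

def in_space(x, y):
    # Crude rectangular stand-in for the true tapering vowel space (mel units).
    return 400.0 <= x <= 1600.0 and 200.0 <= y <= 800.0

# Six unit directions, 60 degrees apart.
DIRS = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]

def try_move(vowels, v, d, step):
    """E after tentatively moving v by step*d (inf if the move leaves the space)."""
    nx, ny = v[0] + step * d[0], v[1] + step * d[1]
    if not in_space(nx, ny):
        return math.inf
    old = v[0], v[1]
    v[0], v[1] = nx, ny
    e = E(vowels)
    v[0], v[1] = old
    return e

def minimize_E(vowels, step=5.0, max_cycles=500):
    vowels = [list(v) for v in vowels]
    for _ in range(max_cycles):
        improved = False
        for v in vowels:                # cycle through the vowels
            # try small movements in 6 directions; pick the best one
            best = min(DIRS, key=lambda d: try_move(vowels, v, d, step))
            if try_move(vowels, v, best, step) < E(vowels):
                # keep moving that way until E stops decreasing or we hit a wall
                while try_move(vowels, v, best, step) < E(vowels):
                    v[0] += step * best[0]
                    v[1] += step * best[1]
                improved = True
        if not improved:
            break                       # no further reduction in E possible
    return [tuple(v) for v in vowels]
```

Run from several starting configurations, this kind of coordinate search finds local minima of E; it illustrates why the slide recommends multiple restarts or simulated annealing to escape them.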

Predicted optimal inventories. Reasonable approximations to typical 3- and 5-vowel inventories are derived, and the preference for [i, a, u] is derived. Problem: too many high non-peripheral vowels, and not enough mid non-peripheral vowels. [Figure: predicted optimal inventories of 3-12 vowels in the F1 x F2 (kHz) plane. Figure by MIT OpenCourseWare. Adapted from Liljencrants, Johan, and Bjorn Lindblom. "Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast." Language 48, no. 4 (December 1972): 839-862.]

Too many high non-peripheral vowels. All inventories larger than 5 are predicted to contain one or more high vowels between [i] and [u], e.g. [y, ɨ, ɯ]. [Figure: the predicted 7-vowel inventory (unattested), with a high vowel between [i] and [u]. Figure by MIT OpenCourseWare.] Common 7-vowel inventories: e.g. i, e, ɛ, a, ɔ, o, u (the Italian type), and a type with front rounded [y]. [Figure by MIT OpenCourseWare.]

Too many high non-peripheral vowels. The excess of central vowels arises because measuring distinctiveness in terms of distance in formant space gives too much weight to differences in F2. In general, languages have more F1 contrasts than F2 contrasts. Why are F1 differences more distinct than F2 differences? One factor: auditory sensitivity to frequency (next slide). But L&L already took this into account: they mel-scaled the formant frequencies.
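For reference, mel scaling compresses high frequencies relative to low ones, so an equal Hz difference counts for less in the F2 range than in the F1 range. A commonly used formula is O'Shaughnessy's (it is an assumption that L&L's mel conversion matches this formula exactly):

```python
import math

def hz_to_mel(f_hz):
    """O'Shaughnessy's mel formula: equal mel steps approximate equal
    perceptual steps; high frequencies are compressed."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# A 500 Hz difference spans more mels in the F1 range than in the F2 range:
low = hz_to_mel(800) - hz_to_mel(300)     # F1-range difference
high = hz_to_mel(2500) - hz_to_mel(2000)  # F2-range difference
print(low > high)  # True: the same Hz gap is perceptually smaller up high
```

This is why mel scaling alone narrows but, per the slide, does not eliminate the overweighting of F2 differences.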

Italian vowels. [Figure: the Italian vowels i, e, a, o, u plotted by F1 and F2 in Hz, with auditory Bark and ERB scales. Figure by MIT OpenCourseWare. Adapted from Johnson, Keith. Acoustic and Auditory Phonetics. Malden, MA: Blackwell Publishers, 1997. ISBN: 9780631188483.]

Too many high non-peripheral vowels. Recent work by Diehl, Lindblom and Creeger (2003) suggests that the greater perceptual significance of F1 probably follows from the higher intensity of F1 relative to F2: F1 should be more salient auditorily and more robust to noise. [Figures: sound pressure level (dB/Hz) vs. frequency (0-3500 Hz) for two vowel spectra.]

Too many high non-peripheral vowels. New simulations of 7-vowel systems by Diehl, Lindblom and Creeger (2003) incorporate background noise; perceptual distance is calculated as the difference between auditory spectra. [Figure: simulated 7-vowel inventories in the F1 x F2 (kHz) plane. Figure by MIT OpenCourseWare. Adapted from Diehl, R. L., B. Lindblom, and C. P. Creeger. "Increasing Realism of Auditory Representations Yields Further Insights Into Vowel Phonetics." Proceedings of the 15th International Congress of Phonetic Sciences. Vol. 2. Adelaide, Australia: Causal Publications, 2003, pp. 1381-1384.]

The corner vowels [i, a, u]. Considerations of formant intensity might also help to account for some exceptions to the generalization that every language includes the corner vowels [i, a, u]. L&L predict that this should be the case, and most languages do include all three, but a number of languages lack [u]:
- [i, a, o], e.g. Piraha, Axininca Campa
- [i, e, a, o], e.g. Navajo, Klamath
- [i, e, a, o, ɯ], e.g. Tokyo Japanese
In general F1 is more intense when it is higher in frequency, and a higher F1 also raises the intensity of all the higher formants. In [u], both F1 and F2 are low, resulting in a low-intensity vowel with a low-intensity F2.

Too few interior vowels. When an inventory has the mid vowels [e, o] and the front rounded vowel [y], it often has mid front rounded [ø] as well (Finnish, German, French, etc.), yet L&L predict that interior vowels only appear in inventories of 10 or more vowels. The predicted absence of interior vowels [ə, ø] is a result of the way in which overall distinctiveness is calculated: each vowel contributes to E based on its distance from every other vowel, and interior vowels have a high cost because they are relatively close to all of the peripheral vowels. Perhaps the measure of distinctiveness, E, can be improved on. [Figure: vowel charts. Figure by MIT OpenCourseWare.]

Alternative measures of distinctiveness. L&L's measure E is based on an analogy to the dispersion of charged particles; it is not derived from anything based on vowel perception. It has the important property that distinctiveness cost increases more rapidly as two vowels become closer (1/r_ij^2). I.e. vowels are only likely to be confused if they are quite similar; the likelihood of confusion drops off quickly as distance increases. But perhaps 1/r_ij^2 doesn't drop off quickly enough: the lack of interior vowels results from giving too much weight to vowel pairs that are not very close. An alternative (Flemming 2005): only consider the closest pair of vowels in the inventory. A compromise (to be explored): 1/r_ij^n, n > 2.
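These options can be compared directly. The sketch below (with made-up mel coordinates) shows that as the exponent n grows, the cost is increasingly dominated by the closest pair, approaching Flemming's minimum-distance criterion in the limit:

```python
import math

def pair_distances(vowels):
    """All pairwise Euclidean distances in the (F2, F1) plane."""
    return [math.dist(vowels[i], vowels[j])
            for i in range(1, len(vowels)) for j in range(i)]

def E_power(vowels, n=2):
    """Generalized L&L cost: sum of 1/r^n. Larger n discounts distant
    pairs more steeply, so close pairs dominate the total."""
    return sum(1.0 / r ** n for r in pair_distances(vowels))

def min_distance(vowels):
    """Flemming-style distinctiveness: only the closest pair counts."""
    return min(pair_distances(vowels))

# Hypothetical (F2, F1) mel points: three peripheral vowels plus one
# interior vowel (illustrative coordinates, not from L&L).
vowels = [(400, 250), (1600, 250), (1000, 750), (1000, 450)]

for n in (2, 4, 8):
    share = (1.0 / min_distance(vowels) ** n) / E_power(vowels, n)
    print(n, round(share, 3))  # closest pair's share of the cost grows with n
```

At n = 2 the interior vowel is penalized by every peripheral neighbor; at large n only its nearest neighbor matters, which is the compromise the slide suggests exploring.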

Alternative measures of distinctiveness. Maximize the minimum distance (Flemming 2005). [Figure: stressed and unstressed vowels plotted in the F1 x F2 (Bark) plane.]

Problems with Adaptive Dispersion. Specific instantiations of the model have made specific incorrect predictions (but some of the broad predictions are correct, and the models are improving). The model answers a non-obvious question (given N vowels, what should they be?) but leaves open what determines the size of inventories. TAD predicts a single best inventory for each inventory size, so why would languages have sub-optimal inventories? The unattested inventories shown earlier are obviously very poorly dispersed, but there are a variety of attested inventory patterns for any given number of vowels.

Extending Adaptive Dispersion. If perceptual distinctiveness is important in shaping vowel inventories, then it should play a similar role in shaping consonant inventories. It is harder to develop quantitative models in this area because it is less clear what the perceptual dimensions are, especially since many consonants, e.g. stops, cannot be treated as static. Note that this is an issue for vowels too: how do diphthongs and vowel duration contrasts fit into the model?