David Weenink. First semester 2007

Institute of Phonetic Sciences University of Amsterdam First semester 2007

The Speech Organs

Speech
Speech production organs determine the acoustic characteristics of speech sounds.
Motivation: find explanations for the acoustical attributes of sounds
- Relation between vocal tract shape and formants
- Why female formants are higher than male formants
- Characteristics of nasal and oral sounds
- Fricative sounds
- Spectral slope of vowels



Production of Speech
Main processes in the production of speech:
- Sound source (glottis and/or turbulent airstream)
- Shape of the vocal tract
- Radiation from the mouth
- Energy losses
Model: these processes are independent.


Source-Filter Theory
The Source-Filter theory models the production apparatus as two independent units:
- The source (the glottal source or noise generated at a constriction)
- The filter (resonances in the cavities of the vocal tract)
A speech sound is the result of a source signal being filtered.
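The source-filter idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the lecture: the source is reduced to a bare impulse train, the filter is a cascade of standard two-pole digital resonators, and the function names, formant frequencies, bandwidths and sampling rate are all hypothetical choices.

```python
import math


def pulse_train(f0, duration, fs):
    """Source: one unit impulse per pitch period (a crude stand-in for glottal pulses)."""
    n = int(round(duration * fs))
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(x, freq, bandwidth, fs):
    """Filter: two-pole resonator, y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / fs)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    c = -r * r
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    y = []
    y1 = y2 = 0.0
    for sample in x:
        out = a * sample + b * y1 + c * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def synthesize(f0=100.0,
               formants=((500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)),
               duration=0.1, fs=16000):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = pulse_train(f0, duration, fs)
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, fs)
    return signal


vowel = synthesize()
print(len(vowel), "samples")
```

Because source and filter are independent here, changing `f0` alters only the pulse spacing, while changing `formants` alters only the resonances.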

From: (Rosenberg, 1971)
Excitation of the vocal tract
- By volume velocity at the glottis
- Is pulse-like (open and closed phase), primarily because of rapid closure of the glottis
- Slope at closure increases with increasing vocal effort
- Pitch ↑ or intensity ↓: then t_open ↑, waveform more sinusoidal
- Pitch ↓ or intensity ↑: then t_open ↓ and slope-at-closure ↑
- Damping of formants is higher in the open phase


[Figure: glottal flow and glottal flow derivative against normalized time, with the open phase and closed phase marked.]
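To make the pulse-like shape concrete, here is a minimal Python sketch of one period of glottal flow and its derivative. It is an illustration only: the piecewise polynomial pulse, the open-phase fraction `t_open` and the `rise_fraction` value are assumptions chosen for the example, not parameters taken from the slides or from Rosenberg (1971).

```python
def glottal_flow(t, t_open=0.6, rise_fraction=0.67):
    """Flow at normalized time t (0 <= t < 1) within a single pitch period."""
    opening = t_open * rise_fraction   # duration of the gradual opening
    closing = t_open - opening         # duration of the faster closing
    if t < opening:
        x = t / opening
        return 3.0 * x * x - 2.0 * x ** 3   # smooth rise from 0 to 1
    if t < t_open:
        x = (t - opening) / closing
        return 1.0 - x * x                  # fall that is steepest at closure
    return 0.0                              # closed phase: no flow


def flow_and_derivative(n=1000, **pulse_args):
    """Sample one period and return times, flow, and a finite-difference derivative."""
    dt = 1.0 / n
    times = [i / n for i in range(n)]
    flow = [glottal_flow(t, **pulse_args) for t in times]
    derivative = [(flow[i + 1] - flow[i]) / dt for i in range(n - 1)]
    return times, flow, derivative


times, flow, derivative = flow_and_derivative()
i_min = derivative.index(min(derivative))
print("steepest negative slope near t =", times[i_min])
```

In this sketch the largest slope (in magnitude) occurs at the instant of closure, which is why the closure dominates the excitation of the vocal tract.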

Creating a Source from Pitch Targets

    Create PitchTier... source 0 0.15
    Add point... 0 150
    Add point... 0.15 100
    To PointProcess
    To Sound (phonation)... 44100 0.9 0.05 0.7 0.03 3 4

    x1=1000
    y1=40
    x2=8000
    y2=40-3*12
    b = 1/(x2-x1)*ln(y1/y2)
    Draw function... x1 x2 1000 40*exp(-b*(x-x1))

[Figure: the resulting source waveform from 0 to 0.15 s, and its spectrum as sound pressure level (dB/Hz) against frequency from 0 to 8000 Hz.]
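The constant b in the script can be checked with a short Python calculation. This sketch assumes that `3*12` stands for a drop of 12 dB per octave over the three octaves from 1000 Hz to 8000 Hz; that reading, and the function name `reference_level`, are assumptions made for the example.

```python
import math

x1, y1 = 1000.0, 40.0             # start of the reference curve: 40 dB at 1000 Hz
x2 = 8000.0
octaves = math.log2(x2 / x1)      # 1000 Hz to 8000 Hz spans three octaves
y2 = y1 - 12.0 * octaves          # assumed drop of 12 dB per octave
b = math.log(y1 / y2) / (x2 - x1)  # same expression as in the script


def reference_level(x):
    """The curve passed to 'Draw function...': 40*exp(-b*(x-x1))."""
    return y1 * math.exp(-b * (x - x1))


print("b =", b)
print("level at x2 =", reference_level(x2))
```

The curve is an exponential in linear frequency that is pinned to the two end points, so between them it differs slightly from a straight line of -12 dB per octave.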

Creating a Noise Source
For fricatives we need a noise source:

    Create Sound from formula... noise Mono 0 0.015 22050
    ... randomGauss(0,0.2)

[Figure: the noise waveform from 0 to 0.015 s, and its spectrum as sound pressure level (dB/Hz) against frequency from 0 to 8000 Hz.]
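For readers without Praat at hand, a roughly equivalent noise source can be sketched in Python with the standard library. The function name, the `seed` parameter and the way the sample count is truncated are assumptions of this sketch, so the sample count need not match what Praat produces.

```python
import math
import random


def gaussian_noise(duration=0.015, fs=22050, sigma=0.2, seed=None):
    """Return Gaussian white noise samples with mean 0 and standard deviation sigma."""
    rng = random.Random(seed)
    n = int(duration * fs)          # truncated sample count (an assumption)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def standard_deviation(samples):
    """Population standard deviation, used only to check the noise level."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))


noise = gaussian_noise(seed=1)
print(len(noise), "samples")
```

White noise has a flat long-term spectrum, which makes it a reasonable starting point for a fricative source before any filtering is applied.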

Tube Models for Vowels
- Curvature of the tract can be neglected!
- Only cross-sectional area
- Diameters equal over large lengths of the vocal tract
- Lossless
- Number of segments 1...


The 1-tube
Closed-open tube. Closed end: anti-node; open end: node.
For length $l$: $l = (2n-1)\,\dfrac{\lambda}{4}$
We use $\lambda f = c$ and obtain $F_n = \dfrac{(2n-1)\,c}{4l}$
Shorter length, higher formants! ($c = 340$ m/s)
- Male, $l = 0.17$ m: $F_n = 500, 1500, 2500, 3500, \ldots$
- Female, $l = 0.145$ m: $F_n = 586, 1759, 2931, 4103, \ldots$
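The formant values for the single tube follow directly from the quarter-wavelength formula, as this small Python check illustrates. The function name `tube_formants` is a hypothetical helper; the speed of sound and the two tract lengths are the ones given on the slide.

```python
def tube_formants(length, count=4, c=340.0):
    """Resonances of a tube closed at one end and open at the other: (2n-1)c/(4l)."""
    return [(2 * n - 1) * c / (4.0 * length) for n in range(1, count + 1)]


male = tube_formants(0.17)      # tract length in metres
female = tube_formants(0.145)

print([round(f) for f in male])
print([round(f) for f in female])
```

The ratio between the female and male values equals the inverse ratio of the tube lengths, 0.17/0.145, so every formant shifts upward by the same factor.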

Deductions from a Straight Tube
- A constriction at a node decreases the resonance frequency; a constriction at an anti-node increases it.
- At the lip end there is always a node, so rounding causes lowering of all formants.
- A velar constriction is at a node of $F_2$: lowering, as in [u].

The 2-tube
Equal or unequal section lengths: not all vowels can be simulated, only some peripheral ones.
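A two-tube model can be explored numerically. The sketch below rests on the standard lossless result for a back tube (length l1, area A1) closed at the glottis and joined to a front tube (length l2, area A2) open at the lips, whose resonances satisfy tan(k l1) tan(k l2) = A2/A1 with k = 2πf/c. The function name, the scan-and-bisect root search and the example dimensions are assumptions of this sketch and do not come from the slides.

```python
import math


def two_tube_formants(l1, a1, l2, a2, c=340.0, f_max=4000.0, step=1.0):
    """Find resonances below f_max by scanning for sign changes and bisecting."""

    def g(f):
        # Zero exactly where tan(k*l1) * tan(k*l2) = a2 / a1, written without
        # tangents so the function stays continuous.
        k = 2.0 * math.pi * f / c
        return (a1 * math.sin(k * l1) * math.sin(k * l2)
                - a2 * math.cos(k * l1) * math.cos(k * l2))

    roots = []
    lo, g_lo = step, g(step)
    while lo < f_max:
        hi = lo + step
        g_hi = g(hi)
        if g_hi == 0.0:
            roots.append(hi)              # root falls exactly on the scan grid
        elif g_lo * g_hi < 0.0:
            a, b = lo, hi
            for _ in range(50):           # bisection inside the bracketing interval
                mid = 0.5 * (a + b)
                if g(a) * g(mid) <= 0.0:
                    b = mid
                else:
                    a = mid
            roots.append(0.5 * (a + b))
        lo, g_lo = hi, g_hi
    return roots


uniform = two_tube_formants(0.085, 1.0, 0.085, 1.0)       # equal areas
narrow_back = two_tube_formants(0.085, 1.0, 0.085, 8.0)   # hypothetical areas
print([round(f) for f in uniform])
print([round(f) for f in narrow_back[:2]])
```

With equal areas the model collapses to the single tube of total length 0.17 m. With a narrow back section and a wide front section, the first resonance rises and the second falls relative to the uniform tube.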

The 4-tube of Fant (1960)
- The constriction area: 1 segment (fixed length: 0.05 m)
- Before and after the constriction: 2 segments
- The lips: 1 segment (fixed length: 0.01 m)
Three-parameter model of Fant (1960) [1]:
1. distance of the constriction from the glottis
2. constriction area
3. lip opening
Effect of lip-rounding was a lowering of $F_2$: [i] vs [y]
[1] G. Fant (1960), Acoustic Theory of Speech Production, Mouton: The Hague.

Fricatives
- The relation between vocal tract shape and speech waveform is obscure.
- The noise source location varies in the vocal tract.
- Limited resources of X-ray data for model validation
- Depend less on tract shape than vowels

Nasals
- Opening of an extra cavity: difficult to model
- Nasal formants: fixed tube, somewhat longer than the vocal tract
- Uvular + post-velar: nasal formants at $300/400 + k \cdot 800$ Hz
- Palatal to labial: anti-formants at $\dfrac{c}{4\,l_{\text{mouth}}}$
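The anti-formant expression can be evaluated for a few mouth-cavity lengths. In this Python sketch the lengths are purely hypothetical illustration values and not measurements; only the formula c/(4 l_mouth) and c = 340 m/s come from the slides.

```python
def antiformant(mouth_length, c=340.0):
    """Lowest anti-formant of a closed mouth cavity acting as a side branch."""
    return c / (4.0 * mouth_length)


# Hypothetical cavity lengths in metres, from a long to a short closed mouth cavity.
for length in (0.085, 0.06, 0.04):
    print(length, "m ->", round(antiformant(length)), "Hz")
```

The shorter the closed mouth cavity, the higher the anti-formant, which is the quarter-wavelength relation of the single tube applied to the side branch.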

[Rosenberg, 1971] A.E. Rosenberg (1971), Effect of glottal pulse shape on the quality of natural vowels, J. Acoust. Soc. Am. 49, 583-590.