Vocal Tract Acoustics


Vocal Tract Acoustics
R. D. Kent, Journal of Voice, 1993
Presented by Daniel Felps

Motivation
This is an excellent paper with which to kick off speech recognition.
It gives a high-level overview of source-filter theory.
It introduces many common terms in speech processing (pitch, formant, LPC, spectrograms).

Time domain
y(t) = sin(4t) + sin(12t)

Frequency domain
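The move from the time domain to the frequency domain can be sketched with a direct discrete Fourier transform of y(t) = sin(4t) + sin(12t). This is only an illustration: the sample count N = 64, the 2π-second analysis window, and the helper name dft_magnitudes are assumptions chosen so that DFT bin m lines up with an angular frequency of m rad/s.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[m]| for m = 0 .. N/2 using a direct (slow) DFT."""
    n = len(samples)
    mags = []
    for m in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * m * k / n)
            for k, x in enumerate(samples)
        )
        mags.append(abs(acc))
    return mags


# Assumed sampling setup: 64 samples over one window of 2*pi seconds,
# so bin m corresponds to an angular frequency of m rad/s.
N = 64
T = 2 * math.pi
t = [T * k / N for k in range(N)]
y = [math.sin(4 * tk) + math.sin(12 * tk) for tk in t]

mags = dft_magnitudes(y)

# Indices of the two strongest bins, in ascending order.
peaks = sorted(sorted(range(len(mags)), key=lambda m: mags[m], reverse=True)[:2])
print(peaks)  # expected: [4, 12]
```

The two sinusoids that overlap in the time-domain waveform show up as two separate peaks, at 4 and 12 rad/s, in the magnitude spectrum.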

Laboratory instruments for speech analysis

Waterfall spectrogram

Wideband and Narrowband

Acoustic theory of speech production
Source-filter theory was proposed by Gunnar Fant in 1960.
It breaks speech into 2 parts:
1. Source
   Laryngeal voicing
   Turbulent noise
   Transient
2. Filter

Source-filter theory for vowels

Source
All vowels are voiced.
The source is periodic.

Filter
The filter is defined by the resonances of the vocal tract.
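As a rough sketch of the source-filter idea for vowels, the code below passes a periodic pulse train (the source) through a single two-pole resonator (a crude stand-in for one vocal tract resonance). Everything specific here is an assumption made for illustration, not something taken from the paper: the 8 kHz sampling rate, the 100 Hz pitch, the 500 Hz resonance with 80 Hz bandwidth, and the function names.

```python
import math


def impulse_train(f0, fs, n):
    """Periodic source: one unit pulse per pitch period."""
    period = round(fs / f0)
    return [1.0 if k % period == 0 else 0.0 for k in range(n)]


def resonator(signal, fc, bw, fs):
    """Two-pole resonator with centre frequency fc and bandwidth bw (Hz)."""
    r = math.exp(-math.pi * bw / fs)           # pole radius from bandwidth
    a1 = 2 * r * math.cos(2 * math.pi * fc / fs)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def magnitude_at(signal, freq, fs):
    """Magnitude of the signal's projection onto a sinusoid at freq (Hz)."""
    re = sum(x * math.cos(2 * math.pi * freq * k / fs) for k, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * k / fs) for k, x in enumerate(signal))
    return math.hypot(re, im)


FS = 8000   # assumed sampling rate (Hz)
F0 = 100    # assumed pitch of the source (Hz)

source = impulse_train(F0, FS, FS)                 # one second of pulses
output = resonator(source, fc=500, bw=80, fs=FS)   # one assumed resonance

for harmonic in (100, 500, 1500):
    print(harmonic,
          round(magnitude_at(source, harmonic, FS), 1),
          round(magnitude_at(output, harmonic, FS), 1))
```

The source has harmonics of equal strength; after filtering, the harmonic nearest the resonance dominates. A real vowel would need several resonances and a more realistic glottal source.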

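Single tube resonances
$$F_n = \frac{(2n - 1)\,c}{4l}$$
where c is the speed of sound, l is the length of the tube, and n = 1, 2, 3, ...
The average male vocal tract is 17 cm long.
This makes speech recognition tough, since the resonances depend on l and vocal tract length differs between speakers.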
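A quick worked example of the single-tube formula, assuming a speed of sound of 350 m/s (the slide does not give a value) and a hypothetical shorter tract of 14.5 cm for comparison:

```python
def tube_resonances(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances F_n = (2n - 1) * c / (4 * l) of a uniform tube
    closed at one end and open at the other."""
    return [
        (2 * n - 1) * speed_of_sound / (4 * length_m)
        for n in range(1, n_formants + 1)
    ]


# 17 cm tract from the slide; 350 m/s is an assumed speed of sound.
male = tube_resonances(0.17)

# Hypothetical shorter tract, only to show how the resonances shift.
shorter = tube_resonances(0.145)

print([round(f) for f in male])
print([round(f) for f in shorter])
```

For the 17 cm tube the first three resonances come out at roughly 515, 1544 and 2574 Hz. Shortening the tube raises every resonance by the same ratio.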

Duck call
How do they work?
AH, EE

Vowel formant patterns
F1 frequency generally varies with up-and-down tongue movement.
F2 frequency generally varies with front-to-back tongue movement.

Relating vocal tract shape for vowels to acoustic output
Constriction parameterization
1, 2. Size and location of the constriction
3. Ratio of mouth opening to length
A nomogram is a graphical computation device (like a slide rule).

Statistical relationship
1, 2. Tongue (2)
3. Lip
4. Jaw
I would guess these would be the first 4 principal components.

Articulatory relationship
Understand the way the tongue, lips, or jaw affect the acoustic signal.
Quantal nature of articulation: nonlinearities exist between vocal tract configuration and the acoustic signal.

Source-filter theory for consonants
Each category of consonants must be looked at individually.
Consonants have lower sound levels than vowels, but contribute significantly to intelligibility.

Nasals /n/
Nasals involve blocking the mouth completely and letting the air come out of the nose.
Their spectra contain antiformants.

Fricatives /f/
Fricatives involve letting the air slide through a narrow opening in the mouth.
This generates turbulence noise.

Stops /p/
Stops must be described with cues:
1. Stop gap
2. Release burst
3. Formant transitions

Affricates /tʃ/
Affricates begin as stops and slide into fricatives, and hence are represented as a stop followed by a fricative.

Liquids /l/
The liquid /l/ is called a "lateral" because the air escapes around the sides of the tongue.
It resembles the nasals and has antiformants.

Glides /w/
Also known as semi-vowels.
Formant patterns change gradually.

Acoustic measures of speech and voice
Numerous features can be extracted from a speech signal.
Table 2 compares the abilities of techniques to extract certain measurements.

Measurements
Voice onset time is the length of time that passes between when a consonant is released and when voicing begins.
Voicing energy is the ratio of the maximum amplitude value of a glottal cycle at the center of the fricative to the maximum amplitude value of a glottal cycle at the center of the following vowel.
Amplitude rise time is the time the amplitude takes to rise from 10% to 90% of its peak value.
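Two of these measurements are simple enough to sketch directly. The code below assumes that the release and voicing-onset times have already been marked, and that an amplitude envelope has already been extracted; the function names, the 1 kHz envelope rate, and the ramp-shaped envelope are hypothetical and for illustration only.

```python
def voice_onset_time(release_s, voicing_onset_s):
    """VOT: time from the consonant release to the start of voicing (seconds)."""
    return voicing_onset_s - release_s


def amplitude_rise_time(envelope, fs, low=0.1, high=0.9):
    """Time (seconds) for the envelope to go from 10% to 90% of its peak."""
    peak = max(envelope)
    rising = envelope[: envelope.index(peak) + 1]   # part up to the first peak
    i_low = next(i for i, a in enumerate(rising) if a >= low * peak)
    i_high = next(i for i, a in enumerate(rising) if a >= high * peak)
    return (i_high - i_low) / fs


# Hypothetical envelope: a linear ramp to the peak over 100 ms, then a plateau,
# sampled at an assumed envelope rate of 1 kHz.
envelope = [k / 100 for k in range(101)] + [1.0] * 50
rise = amplitude_rise_time(envelope, fs=1000)

# Hypothetical hand-marked times, in seconds.
vot = voice_onset_time(release_s=0.120, voicing_onset_s=0.185)

print(round(rise, 3), round(vot, 3))
```

In practice the hard part is locating the release, the voicing onset, and a clean envelope in the first place; this sketch only covers the arithmetic once those are known.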

Jitter is the average absolute difference between consecutive periods, divided by the average period.
Shimmer is the average absolute difference between the amplitudes of consecutive periods, divided by the average amplitude.
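Jitter and shimmer as defined above are the same calculation applied to two different sequences, so one helper covers both. The period and amplitude values below are made up for illustration, and the helper name relative_perturbation is an assumption.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values,
    divided by the mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical glottal cycle measurements (not from the paper).
periods_ms = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9]
amplitudes = [1.0, 0.9, 1.0, 1.1, 1.0]

jitter = relative_perturbation(periods_ms)    # applied to period lengths
shimmer = relative_perturbation(amplitudes)   # applied to peak amplitudes

print(f"jitter = {jitter:.1%}, shimmer = {shimmer:.1%}")
```

A perfectly periodic voice would give zero for both measures. Extracting the individual periods and amplitudes from a recording is a separate problem that this sketch does not address.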

Prospects for automated, multidimensional analysis
The paper gives the example of the difference in dysarthric speech.
We will see many more applications this semester.

Still a mystery?

What can we tell?
We know it is voiced, since pitch harmonics are present.
The speaker is probably female, since the pitch (the spacing of the harmonics) looks to be around 200 Hz.
Using Table 1 and the F1 and F2 values, we can guess the vowel and therefore the position of the tongue.
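Reading the pitch off the harmonics amounts to measuring their spacing. A minimal sketch, assuming the harmonic peak frequencies have already been picked from the spectrum (the values and the function name below are hypothetical):

```python
def f0_from_harmonics(peak_freqs_hz):
    """Estimate the fundamental frequency as the mean spacing
    between neighbouring harmonic peaks."""
    peaks = sorted(peak_freqs_hz)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(spacings) / len(spacings)


# Hypothetical peak frequencies read off a narrowband spectrum (Hz).
harmonics = [205, 398, 602, 797, 1001]
f0 = f0_from_harmonics(harmonics)

print(f0)  # 199.0
```

Averaging the spacing over several harmonics is less sensitive to reading errors than relying on the lowest peak alone.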

Last slide
Hopefully we better understand vocal tract acoustics from 3 perspectives:
1. Acoustic theory of speech production: source-filter
2. Methods for acoustic analysis: LPC, spectrogram
3. Acoustic measures: formants, pitch
Any questions?