Speech Synthesis by Articulatory Models


Speech Synthesis by Articulatory Models
Advanced Signal Processing Seminar
Helmuth Ploner-Bernard hamlet@sbox.tugraz.at
Speech Communication and Signal Processing Laboratory, Graz University of Technology
November 12, 2003

Overview
- Introduction
- Articulators and (Co-)Articulation
- Sound Wave Propagation in the Vocal Tract
- The Acoustic Tube Model
- Articulatory Models
- The Inverse Problem of Parameter Estimation


Introduction: Articulatory Models
Fields of application:
- (Most natural-sounding) speech synthesis
- Low bit-rate coding
- Speech recognition
- Understanding of human speech production
Attempt to describe the actual speech production mechanisms with a set of slowly time-varying physiological parameters.

Introduction
Draws on knowledge of acoustics, mechanics, physiology, linguistics, signal processing, and phonetics.

Introduction
How does speech synthesis with articulatory models work?
Articulatory parameters -> articulatory model -> area functions -> articulatory synthesizer -> time-domain speech signal
Source-tract interaction can be accounted for quite easily.


Articulators (Speech Organs)
Oral cavity, nasal cavity, pharynx, velum, palate, tongue, jaw, lips, glottis (figure by Prof. W. Hess).
Source-filter model: the excitation is the source; the vocal tract does the filtering.
Acoustic differences between sounds result from different manners and places of articulation.

(Co-)Articulation
Articulation of an (isolated) phoneme involves:
- Critical articulators, essential for correct production
- Non-critical articulators, with place and manner unspecified
Co-articulation in fluent speech:
- Target positions of articulators are strongly affected by each other
- Dependent on phonetic context

(Co-)Articulation
Associate priorities with the parameters of the articulatory model and let the controller exploit them. Incorporating realistic physiological and dynamic constraints (cf. functional models) yields more natural-sounding speech.


Wave Propagation
Acoustic theory of speech production (FANT): the vocal tract is modeled as an acoustic tube with rigid walls of infinitely high sound impedance. Lossless planar wave propagation is governed by WEBSTER's horn equation:
∂²v/∂x² + (1/A)(dA/dx)(∂v/∂x) = (1/c²)·∂²v/∂t²
where x is the direction of the traveling wave, v the sound particle velocity, t time, c the velocity of wave propagation, and A the area function (next slide).

Wave Propagation
Area function: the cross-sectional area as a function of position between glottis and lips. Its shape is time-varying, depending on the specific positions of the articulators (figure by Prof. W. Hess).

Wave Propagation
Neutral vowel /ə/: assume A(x,t) = const for all x, t, i.e. a cylindrical acoustic tube. Resonance frequencies:
f_k = (2k - 1)·c / (4l), k = 1, 2, ...
where l is the total length of the vocal tract. For a male speaker, f_k ≈ 500, 1500, ... Hz. Bent pipes have comparable f_k.
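The resonance formula can be checked with a few lines of Python; c = 343 m/s and l = 0.1715 m are illustrative values for a male vocal tract, not figures from the talk:

```python
# Resonance frequencies of a uniform (cylindrical) acoustic tube,
# closed at the glottis and open at the lips: f_k = (2k - 1) * c / (4 * l).
# Assumed values: c = 343 m/s (speed of sound), l = 0.1715 m (vocal tract length).

def tube_resonances(l, c=343.0, n=3):
    """Return the first n resonance frequencies in Hz."""
    return [(2 * k - 1) * c / (4 * l) for k in range(1, n + 1)]

print(tube_resonances(0.1715))  # roughly [500, 1500, 2500] Hz
```

With these assumed values the quarter-wavelength resonances land at the 500/1500/2500 Hz pattern the slide quotes for a male speaker.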


Wave Propagation
The horn equation cannot be solved for an arbitrary area function; changes in the vocal tract shape lead to changes in the eigenfrequencies.
Around f = 3.5 kHz the first cross-modes appear in the vocal tract; most of the energy in speech signals is concentrated in the region below this frequency.
The nasal cavity forms a separate tube of fixed length parallel to the vocal tract.



The Acoustic Tube Model
Starting point: a short acoustic tube of constant cross-sectional area. The horn equation
∂²v/∂x² + (1/A)(dA/dx)(∂v/∂x) = (1/c²)·∂²v/∂t²
then simplifies (dA/dx = 0) to
∂²v/∂x² = (1/c²)·∂²v/∂t²

The Acoustic Tube Model
The simplified equation has a general solution of the form
u(x,t) = u_f(t - x/c) - u_b(t + x/c)
where u = v·A is the volume velocity: a combination of two waves traveling in opposite directions (forward and backward).

The Acoustic Tube Model
FANT chooses 2-4 sections of variable length to approximate the continuous area function A by a concatenation of homogeneous acoustic tubes. At the junctions, part of the traveling wave is reflected, with reflection coefficient
r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k)
(figure by Prof. W. Hess)
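As a quick illustration of the junction formula, here is a minimal Python sketch; the area values are an arbitrary made-up area function, not measured data:

```python
# Reflection coefficients r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) at the
# junctions of concatenated uniform tube sections. The areas below are an
# arbitrary illustrative area function (glottis to lips), not measured data.

def reflection_coefficients(areas):
    """areas: cross-sectional areas per section, ordered from glottis to lips."""
    return [(a1 - a0) / (a1 + a0) for a0, a1 in zip(areas, areas[1:])]

r = reflection_coefficients([2.0, 4.0, 4.0, 1.0])
print(r)  # second junction has equal areas on both sides, so r[1] is 0.0
```

Equal neighboring areas produce no reflection; the larger the area jump, the closer |r_k| gets to 1.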

The Acoustic Tube Model
Toward a digital implementation, it is convenient to take equidistant samples of A(x). The delay through each segment of length Δx is τ = Δx/c (figure by Prof. W. Hess).

The Acoustic Tube Model
KELLY-LOCHBAUM structure: about 20 segments; an idealized, lossless model (figure by Prof. W. Hess).
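The structure can be sketched as a small time-domain simulation; a minimal lossless lattice, assuming one sample of delay per section and illustrative glottis/lip reflection values (0.9 and -0.9) that are not from the talk:

```python
# A minimal KELLY-LOCHBAUM lattice: forward/backward volume-velocity waves,
# one sample of delay per tube section, and scattering with coefficient r_k
# at each junction. The glottis/lip reflection values are assumptions.

def kelly_lochbaum(excitation, areas, r_glottis=0.9, r_lips=-0.9):
    """Feed a glottal excitation sequence through the tube; return lip output."""
    n = len(areas)
    r = [(a1 - a0) / (a1 + a0) for a0, a1 in zip(areas, areas[1:])]
    f = [0.0] * n                    # forward wave entering each section
    b = [0.0] * n                    # backward wave entering each section
    out = []
    for u in excitation:
        f_end, b_start = f[:], b[:]  # waves after one section of delay
        new_f, new_b = [0.0] * n, [0.0] * n
        new_f[0] = u + r_glottis * b_start[0]         # glottal termination
        for k in range(n - 1):                        # interior scattering
            new_f[k + 1] = (1 + r[k]) * f_end[k] + r[k] * b_start[k + 1]
            new_b[k] = -r[k] * f_end[k] + (1 - r[k]) * b_start[k + 1]
        new_b[n - 1] = r_lips * f_end[n - 1]          # lip termination
        out.append((1 + r_lips) * f_end[n - 1])       # radiated part
        f, b = new_f, new_b
    return out

# In a uniform 4-section tube, a unit impulse emerges after 4 samples
# with amplitude (1 + r_lips); the rest is reflected back toward the glottis.
y = kelly_lochbaum([1.0] + [0.0] * 11, [1.0] * 4)
```

The scattering equations follow from continuity of pressure and volume velocity at each junction, using the r_k definition from the previous slide.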

The Acoustic Tube Model: Losses
In reality, losses occur due to:
- Resonances of the yielding walls
- Viscous and thermal losses along the path of propagation -> add multipliers
- Radiation at the lips -> insert an additional segment in front of the lips
To freeze the delay τ to any given sampling interval, use wave digital filters.


Articulatory Models: Static
The vocal tract is described in terms of area functions; the example shows a nine-parameter model. Motion is a succession of stationary shapes.

Articulatory Models: Dynamic
COKER's model sets up an equation of motion for every articulator. Articulators are elastic, have masses and inertia, and are subject to constraints on positions, velocities, and accelerations.


Parameter Estimation (1)
Inverse problem: acquire the model parameters directly or indirectly from the speech signal. This is the most difficult part: the mapping is non-unique, i.e. more than one vocal tract shape can produce a signal with an identical spectrum.

Parameter Estimation (2)
Required:
- Good acoustic matching
- Smooth evolution of the area functions or articulatory parameters
- Anatomical feasibility
Most methods are unable to determine the vocal tract length.

Parameter Estimation: MRI (1)
The most intuitive way: measure the vocal tract shape directly. Several scans are necessary for a 3D model (how can we represent /l/ with mid-sagittal area functions?), and much signal processing remains to be done. Costly, time-consuming, and noisy.


Parameter Estimation: LPC
A simple, cheap method: evaluate the reflection coefficients obtained from the LEVINSON-DURBIN algorithm for linear predictive coding. They characterize an idealized acoustic tube model but are obtained from real-world lossy signals, which leads to inaccurate results.
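The LPC route can be sketched as follows; a plain-Python LEVINSON-DURBIN recursion, with a synthetic AR(1) autocorrelation standing in for a real speech frame:

```python
# LEVINSON-DURBIN recursion: solves the normal equations of linear prediction
# and yields the reflection (PARCOR) coefficients as a by-product. The
# autocorrelation below comes from a synthetic AR(1) model, not real speech.

def levinson_durbin(r, order):
    """r: autocorrelation lags r[0..order]; returns (lpc_coeffs, reflection_coeffs)."""
    a = [1.0] + [0.0] * order      # prediction polynomial, a[0] = 1
    e = r[0]                       # prediction error energy
    ks = []                        # reflection coefficients
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / e
        ks.append(k)
        a_prev = a[:]
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        e *= 1.0 - k * k
    return a, ks

# AR(1) process with pole at 0.9: the first reflection coefficient is -0.9,
# higher-order ones vanish.
r = [0.9 ** m for m in range(3)]
lpc, ks = levinson_durbin(r, 2)
```

In practice the lags r would be estimated from a windowed speech frame, and the resulting k's interpreted as the tube-junction reflection coefficients of the idealized model.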

Parameter Estimation: Impedance
Acoustic impedance measurement: a special acoustic volume-velocity impulse is sent toward the lips, shaped in the vocal tract, and reflected at the closed glottis. Cheap and fast, and works for many shapes. Open questions: what about the nasal cavity, and how to account for losses?

Parameter Estimation: ABS
ABS (analysis by synthesis) is a method for automated parameter identification from natural utterances.
Algorithm:
1. Extract descriptive parameters from the signal
2. Look up the best-matching articulatory parameters in a codebook
3. Re-synthesize with that articulatory parameter set
4. Compare the re-synthesized signal to the target (original) speech signal
5. Iteratively optimize the parameters

Parameter Estimation: ABS
Segmentation:
- Phoneme basis, variable length
- Fixed frame lengths
- Time alignment and pitch-synchronous analyses to avoid the influence of the glottal excitation
Descriptive parameters:
- LPC coefficients
- Mel-frequency cepstral coefficients
- Coefficients of any spectral transformation

Parameter Estimation: ABS
Remember: the mapping is non-unique, so candidate vocal tract shapes are ranked according to a cost function.
Components of the cost function:
- Distance between spectra
- Smoothness of the area function
- Smooth evolution of parameters between adjacent frames
- Signal energy
Improvement: multi-frame optimization
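A single frame of such a cost function might look like this in Python; the squared-error distances and the weights are illustrative assumptions, not values from the talk:

```python
# Sketch of a per-frame ABS cost: a weighted sum of a spectral distance and a
# smoothness penalty on the articulatory parameters relative to the previous
# frame. Squared-error distances and the weights are illustrative choices.

def abs_cost(target_spec, synth_spec, params, prev_params,
             w_spec=1.0, w_smooth=0.1):
    """Lower is better; the optimizer varies params to minimize this."""
    d_spec = sum((t - s) ** 2 for t, s in zip(target_spec, synth_spec))
    d_smooth = sum((p - q) ** 2 for p, q in zip(params, prev_params))
    return w_spec * d_spec + w_smooth * d_smooth
```

Extra terms (area-function smoothness, signal energy) would be added the same way, and multi-frame optimization sums this cost over a window of adjacent frames.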

Optional: Generation of the Codebook
Random sampling:
- Iterate through various configurations of articulatory parameters
- Store them along with their corresponding descriptive parameters
- Yields a huge number of items, including unnecessary data not used in the language or by a speaker
Inching approach:
- Start out at extreme articulatory parameters
- Interpolate along trajectories in articulatory space
- Pay attention to sparsely populated areas

Summary
- Wave propagation in the vocal tract; the area function is responsible for the different sounds
- Co-articulation with priority parameters
- Non-unique acoustic-to-articulatory mapping
- Tube model, KELLY-LOCHBAUM structure, wave digital filters
- Static and dynamic models
- Parameter estimation: MRI, LPC, impedance measurement, ABS

References
- http://www.ikp.uni-bonn.de/dt/lehre/materialien/aap/aap_1f.pdf
- http://www.radiologyinfo.org/
- J. W. Devaney and C. C. Goodyear. A comparison of acoustic and magnetic resonance imaging techniques in the estimation of vocal tract area functions. International Symposium on Speech, Image Processing and Neural Networks, pages 575-578, April 1994.
- A. R. Greenwood and C. C. Goodyear. Articulatory speech synthesis using a parametric model and a polynomial mapping technique. International Symposium on Speech, Image Processing and Neural Networks, pages 595-598, April 1994.
- S. Parthasarathy and C. H. Coker. Phoneme-level parametrization of speech using an articulatory model. International Conference on Acoustics, Speech and Signal Processing, pages 337-340, April 1990.
- Peter Vary, Ulrich Heute, and Wolfgang Hess. Digitale Sprachsignalverarbeitung. B. G. Teubner, Stuttgart, 1998.

Thank you for your attention! Have a look at the accompanying paper on the web!