CONTACT Review Meeting 1 Genova, November 14-15, 2006

From movements to sound
Contributions to building the BB speech production system

Lisa Gustavsson, Björn Lindblom, Francisco Lacerda & Elisabet Eir Cortes

Summary

In terms of anatomical geometry the infant vocal tract (VT) undergoes significant change during development. This research note reports an attempt to reconstruct an infant VT from adult data. Comparable landmarks were identified on the fixed structures of adult articulatory lateral profiles (obtained from X-ray images) and matching infant profiles (obtained from published data in the literature: Sobotta [Putz & Pabst 2001] and personal communication from author Prof. Dr. med. R. Pabst). The x-coordinates of the infant landmarks could be accurately derived by a linear scaling of the adult data, whereas the y-values required information on both the x- and the y-coordinates of the adult. These scaling rules were applied to about 400 adult articulatory profiles to derive a set of corresponding infant articulations. A Principal Components Analysis (PCA) was performed on these shapes to compare the infant and adult articulatory spaces. As expected from the scaling results, the infant space is significantly compressed in relation to the adult space, suggesting that the main articulatory degree of freedom for the child is jaw opening. This finding is in perfect agreement with published descriptions of the phonetics of early vocalizations.

Figure 1 Example of X-ray image with traced contours indicated. Points on the teeth, hard palate, posterior pharyngeal wall and laryngeal structures were selected for comparison with the corresponding points for the infant vocal tract.

Analyses of adult X-ray data

Our X-ray data come from a 20-second film (Figure 1) of an adult Swedish male speaker [Branderud et al 1998]. The speech sample consists of about 20 test words containing consonants such as [b], [p], [d], [g], [l], [k], [r], [j], [h], [n], [s], [t] and a representative sample of the Swedish long and short vowels. The images portray a midsagittal articulatory profile. They were sampled at 50 frames/sec. A total of about 400 frames were analyzed.

Tracings of all acoustically relevant structures were made using the OSIRIS software package (University of Geneva). Using specially written software we converted the contours defined in OSIRIS into tables of x- and y-coordinates, calibrated in mm and corrected for head movements. The tongue contours were further processed by redefining them in a jaw-based coordinate system and by resampling them at 25 equidistant fleshpoints. This resampling was motivated by our choice of quantification method, Principal Components Analysis.
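The resampling step can be illustrated with a minimal Python sketch. It assumes linear interpolation along the arc length of the traced contour; the note does not describe how the authors' own software did this, and the function name `resample_equidistant` and the quarter-circle test contour are hypothetical.

```python
import numpy as np

def resample_equidistant(x, y, n_points=25):
    """Resample a 2-D contour at n_points equally spaced along its arc length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cumulative arc length along the traced polyline.
    seg = np.hypot(np.diff(x), np.diff(y))
    s = np.concatenate(([0.0], np.cumsum(seg)))
    # Target positions, evenly spaced from the first to the last traced point.
    s_new = np.linspace(0.0, s[-1], n_points)
    # Linear interpolation of each coordinate as a function of arc length.
    return np.interp(s_new, s, x), np.interp(s_new, s, y)

# Hypothetical traced contour (mm): a quarter-circle of radius 30, unevenly sampled.
theta = (np.linspace(0.0, 1.0, 60) ** 2) * np.pi / 2
xr, yr = resample_equidistant(30 * np.cos(theta), 30 * np.sin(theta))
```

Because every contour ends up with the same number of points, each point index can be treated as one column of the data matrix used later in the PCA.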

Figure 2 Image used to determine the location of landmark points in infant anatomy. Scales indicated by rulers. With permission from Putz / Pabst: Sobotta, Atlas der Anatomie des Menschen, 2005 Elsevier GmbH, Urban & Fischer Verlag München.

The adult and infant vocal tracts: A scaling experiment

A scaling function of the vocal tract (adult-infant, infant-adult) was derived from tracings of midsagittal images of the infant VT (Figure 2) and the X-ray images of the adult VT (see Analyses of adult X-ray data). Since the relationship between infant and adult VTs is non-linear, we needed to make accurate estimates of infant anatomical structures rather than apply a simple linear reduction of an adult model. Using empirically determined scaling functions we could transform a set of adult articulations (see Analyses of adult X-ray data) into BabyBot articulatory settings and define its articulatory space (see Articulatory parameters: Principal components analyses).
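The two scaling rules reported in this note, x_BB = 0.765·x_adult and y_BB = −0.43 + 0.32·y_adult − 0.15·x_adult, can be applied to a landmark as in the following minimal Python sketch. The function name and the example landmark coordinates are hypothetical. The note does not state the coordinate origin or the unit of the intercept, so the inputs are simply assumed to be in the same units as the calibrated X-ray data.

```python
def adult_to_babybot(x_adult, y_adult):
    """Map one adult landmark to the infant-like BabyBot vocal tract."""
    # Horizontal dimension: a single linear scaling of the adult x-coordinate.
    x_bb = 0.765 * x_adult
    # Vertical dimension: depends on both the adult y- and x-coordinates.
    y_bb = -0.43 + 0.32 * y_adult - 0.15 * x_adult
    return x_bb, y_bb

# Hypothetical adult landmark coordinates (not taken from the X-ray data).
landmarks = [(0.0, 0.0), (40.0, 10.0), (80.0, -60.0)]
scaled = [adult_to_babybot(x, y) for x, y in landmarks]
```

The x-term in the vertical rule means that two adult points at the same height but different front-back positions end up at different heights in the infant-like tract.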

The x- and y-coordinates for the BabyBot VT were derived using:

$$x_{BB} = 0.765\,x_{adult}$$
$$y_{BB} = -0.43 + 0.32\,y_{adult} - 0.15\,x_{adult}$$

Figure 3 Black lines pertain to the fixed structures of an adult articulatory profile. The red contours represent the corresponding structures in an infant-like vocal tract obtained by applying the two scaling equations to the adult landmarks.

The relationship between the infant VT and the adult VT in the front-back dimension was more or less linear, which allowed us to use only one variable (the adult x-coordinates) to derive BabyBot's x-coordinates. In the vertical dimension, however, two independent variables were needed (the adult x- and y-coordinates) to derive BabyBot's y-coordinates. This is probably because one of the most critical aspects of the growing VT is the height of the larynx: the pharyngeal dimensions are close to zero in an infant, whereas the pharynx is one of the major areas in the adult VT.

Articulatory parameters: Principal components analyses

The input to the PCA consisted of a matrix with columns corresponding to the 25 fleshpoints and rows containing information related to the individual tongue contours. Since the specification of each fleshpoint requires two numbers (x and y), there were twice as many rows as contours. Accordingly, the data fed into the PCA was an 822-by-25 matrix. This format had the convenience of automatically sorting the PCA output into one set of horizontal weights (for the x coordinates) and one set of vertical weights (for the y coordinates).

As earlier shown by several other phoneticians [Maeda 1990], the PCA provides considerable data reduction by quantifying input data in terms of a small set of building blocks, the PCs. Accordingly, the 25 fleshpoints of an observed shape, s(x), can be recovered by calculating

$$s(i,x,v) = w_1(i,v)\,PC_1(x) + w_2(i,v)\,PC_2(x) + \dots \qquad (1)$$

where x is the fleshpoint number, i identifies the contour/image, and v chooses between x or y coordinates. The PC(x) terms are underlying, numerically derived tongue shapes which, weighted by the w coefficients and summed, generate the contour under examination. The formula expresses the idea that any observed contour is a linear combination of a set of basic shapes. The accuracy of this quantitative description depends on how many PCs are used. Any degree of accuracy is possible in principle. For the present data, PC1 was found to account for 85.7% of the variance. Two components achieved 96.3%.
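Equation (1) can be illustrated with a small NumPy sketch on synthetic data. Everything here is an assumption for illustration: the 822-by-25 matrix is filled with made-up smooth shapes, not the tongue data, and the decomposition is left uncentred so that rows equal weights times PCs exactly as equation (1) is printed. Whether the authors subtracted a mean shape is not stated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_contours, n_points = 411, 25

# Synthetic stand-in for the tongue data: two smooth basis shapes plus noise.
t = np.linspace(0.0, 1.0, n_points)
basis = np.vstack([np.sin(np.pi * t), np.cos(np.pi * t)])
rows = rng.normal(size=(2 * n_contours, 2)) @ basis
rows += 0.01 * rng.normal(size=rows.shape)        # shape (822, 25)

# Uncentred decomposition, so that rows = weights @ pcs as in equation (1).
u, sv, vt = np.linalg.svd(rows, full_matrices=False)
pcs = vt                  # pcs[k] is PC(k+1), defined over the 25 fleshpoints
weights = u * sv          # weights[r, k] is w(k+1) for data row r

def reconstruct(row_index, n_components):
    """Approximate one row (x or y values of one contour) from the first PCs."""
    return weights[row_index, :n_components] @ pcs[:n_components]

# Cumulative share of the total sum of squares captured by the first k PCs.
share = np.cumsum(sv**2) / np.sum(sv**2)
```

With all 25 components, `reconstruct` returns the original row; with fewer, it returns the truncated sum that equation (1) describes, and `share` plays the role of the cumulative percentages quoted in the text.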

Figure 4 Results of a PC analysis of adult and infant tongue shapes. Along the ordinate: the vertical component of PC1 (value of weight for y coordinate). Along the abscissa: the horizontal component of PC1 (value of weight for x coordinate). For the adult data the locations of [d] and [g] tokens and vowels are indicated. For comparison the infant-scaled version of the entire database is shown with red dots. As can be seen, there is considerable compression, particularly along the vertical dimension.

A short-cut method of formant frequency derivation

The first step in deriving the acoustic consequences of articulatory movements is to quantify the vocal tract shape in terms of a cross-sectional area function. This is done by measuring the cross-distances along the VT profile and then converting those distances into cross-sectional areas using empirically based rules (see Figure 5). From the area function the formant frequencies of the articulation are then calculated. This requires a certain amount of computation.
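The distance-to-area conversion can be sketched in a few lines of Python, using the power-function form A = α·d^β that this note gives for the standard method. The values of `alpha` and `beta` and the list of cross-distances below are hypothetical placeholders. In practice such coefficients are set empirically and may differ from one vocal tract region to another.

```python
def cross_distance_to_area(d_cm, alpha=1.5, beta=1.4):
    """Convert a midsagittal cross-distance (cm) to a cross-sectional area (cm^2)."""
    # Power-law rule A = alpha * d**beta; alpha and beta are assumed values.
    return alpha * d_cm ** beta

# Hypothetical cross-distances sampled along the tract from glottis to lips, in cm.
distances = [0.4, 1.0, 1.8, 1.2, 0.6]
area_function = [cross_distance_to_area(d) for d in distances]
```

The resulting list is the area function; computing formant frequencies from it requires an acoustic tube model, which is the more expensive step that the short-cut method tries to avoid.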

Figure 5 The standard method of calculating the formant frequencies for an arbitrary articulatory configuration. The articulatory profile (top left) is first analyzed with respect to the cross-distances along the vocal tract. A cross-sectional area (A) corresponding to a given distance (d) can be described in terms of power functions of the form $A = \alpha d^{\beta}$. The variation in area along the vocal tract, the so-called area function, is then used to compute the formant frequencies of the articulation.

This fairly cumbersome procedure is the standard way of going from movement to sound. Using our X-ray data we have done some pilot work exploring the possibility of using empirical mapping rules to speed up and simplify the step from articulation to sound. Below we exemplify this method with the test utterance "Johan", spoken in a loud voice. Pulse-by-pulse measurements were made of the formant frequencies. From the X-ray analyses, synchronized data are available on the time variations of the first two PCs of the tongue contour, the degree of jaw opening, larynx height, and the vertical separation and protrusion of the lips. Using a multiple regression technique we derived predictions of each formant individually from linear combinations of the above articulatory parameters. Very high numerical accuracy was obtained, with absolute error scores below 2%.

Figure 6 A short-cut method. Deriving formants from articulatory data using empirically established equations describing the relationship between formant frequencies and articulatory parametric tracks.
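A multiple regression of one formant on the six articulatory tracks named in the text could look like the following NumPy sketch. The data are synthetic: the coefficients, noise level and number of frames are made up, and ordinary least squares with an intercept is an assumption, since the note does not specify the exact regression set-up.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames = 200

# Synthetic stand-ins for the six tracks: tongue PC1, tongue PC2, jaw opening,
# larynx height, vertical lip separation, lip protrusion.
X = rng.normal(size=(n_frames, 6))
true_coef = np.array([60.0, -40.0, 30.0, -10.0, 15.0, -20.0])      # made-up values
f1 = 500.0 + X @ true_coef + rng.normal(scale=5.0, size=n_frames)  # Hz

# Multiple linear regression: formant = intercept + linear combination of tracks.
design = np.column_stack([np.ones(n_frames), X])
coef, *_ = np.linalg.lstsq(design, f1, rcond=None)
f1_pred = design @ coef

# Mean absolute error relative to the measured formant, in percent.
rel_err = 100.0 * np.mean(np.abs(f1_pred - f1) / f1)
```

One such fit would be made per formant. Once the coefficients are known, predicting a formant track from new articulatory data is a single matrix product, which is what makes the approach a short-cut compared with the area-function route.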

References

Branderud P, Lundberg H-J, Lander J, Djamshidpey H, Wäneland I, Krull D & Lindblom B (1998): "X-ray analyses of speech: methodological aspects", in Fonetik 98: Papers presented at the Swedish Phonetics Conference, Stockholm University, 1998.

Lindblom B (2003): "A numerical model of coarticulation based on a Principal Components analysis of tongue shapes", XVth International Congress of Phonetic Sciences, Barcelona, Spain.

Maeda S (1990): "Compensatory articulation during speech: Evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model", in Speech Production and Speech Modeling, W Hardcastle & A Marchal (eds), pp 131-149, Dordrecht: Kluwer.

Putz R & Pabst R (2001): Sobotta Atlas of Human Anatomy: Head, Neck, Upper Limb, Putz R & Pabst R (eds), Volume 1, 13th edition, Urban & Fischer.