Sound and Music Science. Speech Production

Learning Objectives
This lecture covers how the human vocal organs make speech sounds; how speech sounds are the product of the source, the filter, and the radiation efficiency; how speech is articulated by different parts of the vocal tract; formants as resonances of the vocal tract; and how the glottis and the vocal tract are studied.

The Vocal Organs
The vocal organs span the oral and nasal cavities and extend down to the lungs and the diaphragm. The lungs serve as a reservoir of air and a source of energy. In speaking, air is forced from the lungs through the larynx into the three main cavities: the pharynx, the nasal cavity, and the oral cavity.

[Figure: The vocal organs (continued)]

The Vocal Organs (continued)
Air exits through the nose and mouth. Air can be inhaled and exhaled without producing much sound; to produce speech sounds, the airflow is interrupted by the vocal cords or by constrictions in the vocal tract made by the tongue or lips.

The Larynx and the Vocal Folds
The most important sound source is the larynx, which contains the vocal folds (vocal cords). The larynx is built mainly of cartilages; one of them, the thyroid cartilage, forms the Adam's apple.

[Figures: The larynx]

Larynx and Vocal Folds (continued)
The vocal folds are folds of ligament extending from the thyroid cartilage at the front to the arytenoid cartilages at the back. The arytenoid cartilages are movable and control the size of the V-shaped opening between the vocal cords, called the glottis. The glottis is open for breathing and closed for voiced sound production.

[Figure: Control of the glottal opening]

Glottal Openings
A sudden opening of the closed vocal folds produces a light cough or a glottal stop (a harsh "h"). The folds are fully open for unvoiced consonants such as s, sh, and f, and an intermediate opening produces a regular h sound.

Glottal Openings (continued)
When the folds open and close rapidly, they modulate the airflow, and this rapid vibration produces the buzzing sound from which vowels and voiced consonants are created. The folds act analogously to the lips in the production of p and f sounds: a sudden release gives a stop, and a narrow constriction gives turbulent noise.

Vibration of the Folds
The rate of vibration is determined mainly by the mass and tension of the folds; the pressure and velocity of the air contribute to a smaller degree. The folds are typically longer and heavier in adult males than in females, so they vibrate at a lower frequency. The pitch range of ordinary speech typically spans about one octave, whereas the singing range spans about two octaves.
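The dependence on mass and tension can be illustrated with an ideal-string approximation, a common simplification that is an assumption here rather than part of these notes: F0 = sqrt(σ/ρ) / (2L), where L is the vibrating length, σ the longitudinal stress (tension per unit area), and ρ the tissue density. The sketch below uses hypothetical lengths, stress, and density, plus a helper that expresses a pitch range in octaves (one octave is a doubling of frequency).

```python
import math

# Assumed tissue density (kg/m^3), roughly that of soft tissue; illustrative only.
TISSUE_DENSITY = 1040.0

def string_model_f0(length_m, stress_pa, density=TISSUE_DENSITY):
    """Ideal-string estimate of vocal-fold frequency: F0 = sqrt(stress/density) / (2 * length)."""
    wave_speed = math.sqrt(stress_pa / density)  # transverse wave speed along the fold (m/s)
    return wave_speed / (2.0 * length_m)

def range_in_octaves(f_low, f_high):
    """Number of octaves between two frequencies (an octave doubles the frequency)."""
    return math.log2(f_high / f_low)

# Hypothetical values: same stress, longer folds versus shorter folds.
longer = string_model_f0(length_m=0.016, stress_pa=20_000.0)
shorter = string_model_f0(length_m=0.010, stress_pa=20_000.0)
print(f"longer folds:  {longer:.0f} Hz")
print(f"shorter folds: {shorter:.0f} Hz")
print(range_in_octaves(100.0, 200.0))  # a one-octave speaking range
```

In this model, longer folds lower F0 while greater stress (tension) raises it, which matches the qualitative description above; real folds are layered tissue, so the numbers are only indicative.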

[Figure: Phases of a vocal fold vibration]

Vibration Modes of the Folds
In the normal mode, the folds open and close completely during each cycle, generating puffs of air that are roughly triangular in shape. In the open-phase mode, the folds do not close completely over their entire length, so the airflow never drops to zero; this produces a breathy voice.
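A minimal sketch of these two modes, assuming an idealized triangular puff in arbitrary flow units: the normal mode returns to zero flow in every cycle, while the open-phase (breathy) mode is modeled, as an assumption, by adding a constant leak flow. The function and parameter names are hypothetical.

```python
def glottal_period(period, rise, fall, leak=0.0):
    """One cycle of glottal airflow (arbitrary units).

    A triangular puff rises over `rise` samples, falls over `fall` samples,
    and stays at zero for the rest of the period. `leak` is a constant flow
    added when the folds never fully close (open-phase / breathy mode).
    """
    if rise + fall > period:
        raise ValueError("puff is longer than the period")
    flow = []
    for n in range(period):
        if n < rise:
            puff = n / rise                      # folds opening
        elif n < rise + fall:
            puff = 1.0 - (n - rise) / fall       # folds closing
        else:
            puff = 0.0                           # folds closed
        flow.append(leak + puff)
    return flow

def pulse_train(cycles, period, rise, fall, leak=0.0):
    """Repeat one glottal cycle `cycles` times."""
    return glottal_period(period, rise, fall, leak) * cycles

normal = pulse_train(3, period=100, rise=30, fall=15)
breathy = pulse_train(3, period=100, rise=30, fall=15, leak=0.2)
print(min(normal), min(breathy))  # breathy flow never reaches zero
```

Under the same assumptions, a creaky voice could be imitated with short puffs in a long period (for example `rise=5, fall=3`), so that little air passes per cycle.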

Vibration Modes of the Folds (continued)
In a third mode, very little air passes, in short puffs, giving rise to a creaky voice. A fourth mode (head voice or falsetto) is normally not used in speech.

Opening of the Vocal Folds
The vocal folds are opened by the air pressure in the trachea, which pushes them upward and outward. As the air velocity through the glottis increases, the pressure between the folds decreases, and the resulting Bernoulli force, together with the elasticity of the folds, pulls them back together.
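The pressure drop behind the Bernoulli force can be estimated from Bernoulli's equation for steady, frictionless flow, Δp = ½ρv². This is a simplification assumed for illustration (real glottal flow is unsteady and viscous), and the air density and velocities below are illustrative values, not measurements from these notes.

```python
AIR_DENSITY = 1.2  # kg/m^3, approximate density of air at room temperature (assumed)

def bernoulli_pressure_drop(velocity_m_s, density=AIR_DENSITY):
    """Static-pressure drop (Pa) for air moving at `velocity_m_s`: dp = 0.5 * rho * v**2."""
    return 0.5 * density * velocity_m_s ** 2

for v in (10.0, 20.0, 30.0):  # hypothetical air velocities through the glottis, m/s
    print(f"{v:4.0f} m/s -> {bernoulli_pressure_drop(v):6.1f} Pa")
```

Because the drop grows with the square of the velocity, doubling the flow speed quadruples the suction pulling the folds together.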

Miscellaneous Facts about the Folds
The folds also shape a whisper: they do not vibrate, but are held partly open so that the air rushing through produces turbulent noise. Speaking louder depends mostly on the rate of glottal closure: a faster closure puts more energy into the higher harmonics of the glottal airflow spectrum, and these harmonics excite resonances of the vocal tract, raising the sound level.
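The link between a fast closure and strong high harmonics can be checked numerically. The sketch below, a simplified model assumed for illustration, builds two triangular glottal pulses with the same peak and the same open time (50 of 100 samples), one closing in 5 samples and one in 25, and compares the energy in harmonics 10 and above using a direct discrete Fourier transform (DFT) of one period. The helper names and pulse shapes are hypothetical.

```python
import cmath
import math

def glottal_cycle(period, rise, fall):
    """Triangular puff: open over `rise` samples, close over `fall` samples, then stay shut."""
    cycle = []
    for n in range(period):
        if n < rise:
            cycle.append(n / rise)
        elif n < rise + fall:
            cycle.append(1.0 - (n - rise) / fall)
        else:
            cycle.append(0.0)
    return cycle

def harmonic_magnitudes(cycle):
    """|DFT| of one period; bin k corresponds to the k-th harmonic of the fundamental."""
    n = len(cycle)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(cycle)))
            for k in range(n // 2)]

def high_harmonic_energy(cycle, first_harmonic=10):
    """Sum of squared magnitudes from `first_harmonic` up to just below Nyquist."""
    mags = harmonic_magnitudes(cycle)
    return sum(m * m for m in mags[first_harmonic:])

abrupt = glottal_cycle(100, rise=45, fall=5)    # fast glottal closure
gradual = glottal_cycle(100, rise=25, fall=25)  # slow glottal closure
ratio = high_harmonic_energy(abrupt) / high_harmonic_energy(gradual)
print(f"high-harmonic energy, abrupt vs gradual closure: {ratio:.1f}x")
```

The steeper closing slope is a sharper corner in the waveform, and sharper corners carry more energy at high frequencies.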

The Vocal Tract
The vocal tract transforms the buzzes of the vocal folds and the whooshes of other sources into the intricate, subtle sounds of speech. It can be thought of as a tube extending from the vocal folds to the lips, with a side branch leading to the nasal cavity. Its typical length is about 17 cm.
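A common first approximation, assumed here rather than stated in these notes, treats the vocal tract as a uniform tube closed at the glottis and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength, F_n = (2n - 1)c / (4L). With L = 17 cm and an assumed speed of sound of 343 m/s, this gives resonances near 500, 1500, and 2500 Hz, roughly the formants of a neutral vowel.

```python
def quarter_wave_formants(length_m=0.17, count=3, speed_of_sound=343.0):
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m) for n in range(1, count + 1)]

print([round(f) for f in quarter_wave_formants()])  # [504, 1513, 2522]
```

A shorter tube raises all of these resonances in proportion, and changing the tube's shape (tongue, lips, jaw) shifts them individually, which is what produces different vowels.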

[Figure: The vocal tract (continued)]

The Pharynx
The pharynx connects the larynx with the oral cavity. Its shape is not easily varied, though its length can be adjusted slightly by raising or lowering the larynx at one end and the soft palate at the other. The soft palate acts as a valve that either isolates the nasal cavity from the pharynx or connects the two.

The Epiglottis
Since food also passes through the pharynx on its way to the esophagus, the epiglottis serves as a valve to prevent food from going into the trachea. It serves to acoustically isolate the esophagus from the larynx. The epiglottis and the false vocal cords appear to play no significant role in speech production.

Nasal Cavity
Because its dimensions are fixed, the nasal cavity is virtually untunable. The soft palate controls the airflow from the pharynx into the nasal cavity: if the soft palate is lowered, air and sound waves flow into the nasal cavity, and resonance within it produces a nasal quality.

Oral Cavity
Because its size and shape can be varied, the oral cavity is probably the most important single part of the vocal tract. The flexible tongue, along with movements of the lips, cheeks, and teeth, changes the size, shape, and acoustics of the oral cavity.

The Oral Cavity (continued)
The lips control the size and shape of the mouth opening through which sound is radiated. The mouth radiates more efficiently at higher frequencies, where the wavelength approaches the size of the opening; this appears as a rise in radiation efficiency of about 6 dB per octave.
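Assuming, as a simple model, that the radiated sound pressure grows in proportion to frequency, the gain relative to a reference frequency is 20·log10(f/f_ref) dB, which rises by 20·log10(2) ≈ 6.02 dB for every doubling of frequency (one octave). The reference frequency below is a hypothetical choice.

```python
import math

def radiation_gain_db(freq_hz, ref_hz=100.0):
    """Radiation gain (dB) relative to ref_hz, assuming radiated pressure is proportional to frequency."""
    return 20.0 * math.log10(freq_hz / ref_hz)

for f in (100.0, 200.0, 400.0, 800.0, 1600.0):  # successive octaves
    print(f"{f:6.0f} Hz: {radiation_gain_db(f):5.1f} dB")
```

Each line of output is about 6 dB above the previous one, which is the slope quoted above.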

Articulation of Speech
Each syllable is made of one or more phonemes, and each phoneme is either a vowel or a consonant. Vowels are always voiced (produced with vibration of the vocal folds), whereas consonants are either voiced or unvoiced.

Articulation of Speech (continued)
Counts of English vowel sounds range from 12 to 21, depending on which speech scientist you talk to, partly because opinions differ on whether a given sound is a pure vowel or a diphthong (two or more vowel sounds combined into one phoneme).

[Figure: Vowels of American English]

Articulation of Speech (continued)
Consonants are classified according to their manner of articulation. Plosive or stop consonants (p, b, t, etc.) are produced by blocking the airflow somewhere in the vocal tract (usually in the mouth) and releasing the pressure rather suddenly. Fricatives (f, s, sh, etc.) are made by constricting the airflow enough to produce turbulence.

Articulation of Speech (continued)
Nasals (m, n, ng) are made by lowering the soft palate to connect the nasal cavity to the pharynx and then blocking the mouth cavity at some point along its length. Liquids (r, l) are produced by raising the tip of the tongue while the oral cavity is somewhat constricted. Semivowel or glide consonants (w, y) are produced by holding the vocal tract briefly in a vowel position and then changing it rapidly to the vowel sound that follows.

Articulation of Speech (continued)
Consonants are further classified according to their place of articulation, primarily the lips (labial), lips and teeth (labiodental), teeth (dental), gums (alveolar), hard palate (palatal), soft palate (velar), and glottis (glottal). There are 24 consonant sounds in English.
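The classification by voicing, place, and manner maps naturally onto a small lookup table. The sketch below covers only a subset of the 24 English consonants, uses ordinary spellings (th as in "thin", ng as in "sing", y as in "yes") rather than phonetic symbols, and includes the velar place (soft palate) for k, g, and ng; the table layout and the helper `find` are illustrative assumptions.

```python
from typing import NamedTuple

class Consonant(NamedTuple):
    voiced: bool
    place: str   # place of articulation
    manner: str  # manner of articulation

# Partial table of English consonants (illustrative subset).
CONSONANTS = {
    "p": Consonant(False, "labial", "plosive"),
    "b": Consonant(True, "labial", "plosive"),
    "t": Consonant(False, "alveolar", "plosive"),
    "d": Consonant(True, "alveolar", "plosive"),
    "k": Consonant(False, "velar", "plosive"),
    "g": Consonant(True, "velar", "plosive"),
    "f": Consonant(False, "labiodental", "fricative"),
    "v": Consonant(True, "labiodental", "fricative"),
    "th": Consonant(False, "dental", "fricative"),
    "s": Consonant(False, "alveolar", "fricative"),
    "z": Consonant(True, "alveolar", "fricative"),
    "h": Consonant(False, "glottal", "fricative"),
    "m": Consonant(True, "labial", "nasal"),
    "n": Consonant(True, "alveolar", "nasal"),
    "ng": Consonant(True, "velar", "nasal"),
    "l": Consonant(True, "alveolar", "liquid"),
    "y": Consonant(True, "palatal", "glide"),
}

def find(**features):
    """Sorted consonants whose features all match, e.g. find(manner="nasal")."""
    return sorted(sym for sym, c in CONSONANTS.items()
                  if all(getattr(c, key) == value for key, value in features.items()))

print(find(manner="nasal"))
print(find(place="alveolar", voiced=True))
```

Pairs such as p and b differ only in the `voiced` field, which mirrors the voiced/unvoiced distinction described earlier.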

[Figure: Consonants]

Formants: Resonances of the Vocal Tract
Formants are the peaks in the sound spectra of vowels whose frequencies are independent of the pitch. They act as an envelope that modifies the amplitudes of the various harmonics of the source sound. Each formant corresponds to one or more resonances of the vocal tract.

Formants (continued)
The formant frequencies are virtually independent of the source spectrum.
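The source-filter idea behind formants can be sketched numerically: each harmonic of the source is scaled by the vocal-tract gain at its frequency, so the spectral peak stays near a formant whatever the pitch. The model is an assumption for illustration: a source whose nth harmonic has amplitude 1/n, and a vocal tract built as a product of simple resonances at hypothetical formant frequencies of 500, 1500, and 2500 Hz with a quality factor of 5.

```python
import math

def resonance_gain(freq, formant, q):
    """Magnitude of a second-order resonance centred on `formant` (Hz) with quality factor q."""
    r = freq / formant
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)

def vocal_tract_gain(freq, formants=(500.0, 1500.0, 2500.0), q=5.0):
    """Cascade of resonances: multiply one factor per formant."""
    gain = 1.0
    for f in formants:
        gain *= resonance_gain(freq, f, q)
    return gain

def output_spectrum(f0, n_harmonics=30, formants=(500.0, 1500.0, 2500.0), q=5.0):
    """(frequency, amplitude) per harmonic: assumed source amplitude 1/n times tract gain."""
    return [(n * f0, (1.0 / n) * vocal_tract_gain(n * f0, formants, q))
            for n in range(1, n_harmonics + 1)]

for f0 in (100.0, 250.0):  # two different pitches, same vocal tract
    freq, amp = max(output_spectrum(f0), key=lambda pair: pair[1])
    print(f"f0 = {f0:.0f} Hz: strongest harmonic at {freq:.0f} Hz")
```

Changing f0 moves the harmonics, but the envelope set by the resonances stays put, which is why formant frequencies are largely independent of pitch.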

[Figure: Effect of formants on sound]

[Figure: Formant frequencies F1, F2, F3]

Prosodic Features of Speech
Prosodic features are characteristics that convey meaning, emphasis, and emotion without changing the phonemes themselves. They include pitch, rhythm, and accent. In English, prosodic features play a secondary role to the phonemes; in Chinese, however, prosodic features (lexical tones) can change the meaning of a syllable.

Prosodic Features of Speech (continued)
Prosodic features tend to indicate the emotional state of the speaker. There have been attempts to use them in lie detection, by analyzing recorded speech for evidence of stress.

Speech Analysis
Speech analysis requires examining frequency and sound level as functions of time, and three-dimensional representations are used to do this effectively. A real-time spectrum analyzer rapidly computes the spectrum of a sound using the fast Fourier transform (FFT).
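The FFT step of a spectrum analyzer can be sketched with a textbook recursive radix-2 Cooley-Tukey FFT. This pure-Python version is for illustration only (a real analyzer would use an optimized library and a window function), and the test tone, sample rate, and frame size are hypothetical. A real-time analyzer repeats this on successive short frames to build up the frequency, level, and time picture.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])   # FFT of even-indexed samples
    odd = fft(x[1::2])    # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def peak_frequency(frame, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin in one frame."""
    spectrum = fft(frame)
    half = len(frame) // 2
    k = max(range(1, half), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / len(frame)

FS = 8000   # samples per second (hypothetical)
N = 256     # frame length, a power of two
frame = [math.sin(2 * math.pi * 500 * t / FS) for t in range(N)]
print(peak_frequency(frame, FS))  # 500.0
```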

Speech Analysis (continued)
The sound spectrograph was developed at Bell Labs in 1945 specifically to analyze speech. It records a plot of sound level against frequency and time for a brief sample of speech, with sound level represented by the degree of blackness on a two-dimensional time-frequency graph.

Speech Analysis (continued)
The modern digital version uses filters to divide the incoming speech signal into many frequency bands and measures the power passing through each filter as a function of time. The resulting spectrogram is printed in grayscale.
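The filter-bank idea can be sketched by measuring, frame by frame, the power near a set of band center frequencies. Here each "filter" is a Goertzel recurrence, a standard single-frequency DFT filter chosen for brevity (an assumption, not necessarily how any particular spectrograph works), and power in decibels is mapped to characters from light to dark to mimic a grayscale display. The test signal, band spacing, and frame sizes are hypothetical.

```python
import math

def goertzel_power(frame, freq, sample_rate):
    """Power of `frame` at `freq` via the Goertzel recurrence (a one-bin DFT filter)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def spectrogram(signal, sample_rate, bands, frame_len, hop):
    """Rows are time frames, columns are frequency bands, values are power."""
    return [[goertzel_power(signal[start:start + frame_len], f, sample_rate) for f in bands]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def to_ascii(rows, floor_db=-60.0, shades=" .:-=+*#%@"):
    """Grayscale text image: highest band on top, darker characters mean more power."""
    peak = max(max(r) for r in rows)
    lines = []
    for b in reversed(range(len(rows[0]))):
        chars = []
        for r in rows:
            level = 10.0 * math.log10(max(r[b], 1e-20) / peak) if peak > 0 else floor_db
            level = max(level, floor_db)
            chars.append(shades[round((level - floor_db) / -floor_db * (len(shades) - 1))])
        lines.append("".join(chars))
    return "\n".join(lines)

FS = 8000
BANDS = [250.0 * k for k in range(1, 9)]  # 250 Hz ... 2000 Hz
# Test signal: 0.1 s of a 500 Hz tone followed by 0.1 s of a 1500 Hz tone.
signal = [math.sin(2 * math.pi * (500 if t < 800 else 1500) * t / FS) for t in range(1600)]
rows = spectrogram(signal, FS, BANDS, frame_len=200, hop=100)
print(to_ascii(rows))
```

The dark trace sits in the 500 Hz row for the first frames and moves to the 1500 Hz row later, the same time-frequency-level picture a spectrograph draws.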

[Figure: Schematic of a sound spectrograph]