The Control of Airflow during Singing


Paper presented at the Second International Conference on the Physiology and Acoustics of Singing, October 6-9, 2004, Denver, Colorado.

A discussion of each of the figures presented in the oral paper.

Martin Rothenberg
Professor Emeritus, Syracuse University, and President, Glottal Enterprises

Note: The references below that are authored all or in part by the present author can be found on the website www.rothenberg.org.

Figure 1. Typical subglottal air pressures in speech and singing.

In singing, subglottal (lung or tracheal) air pressures as high as 40 or 50 cm H2O have been reported at high volume levels. Such pressures are approximately 3 to 4 times the pressures used for speech. It might be hypothesized that, if not controlled by appropriate compensatory mechanisms, the high airflows that such pressures could cause might be detrimental to the mucosa of the vocal folds and would also deplete the lung volume more quickly between breath pauses.

Figure 2. Idealized representations of three ways for reducing glottal airflow during sung vowels.

Previous publications have described at least three mechanisms for reducing glottal airflow in the presence of a high subglottal pressure; they are illustrated in this figure. For each method, a hypothetical airflow trace that roughly follows the variation of projected glottal area is also shown, as a comparison that indicates the effect of each mechanism on the airflow waveform.

In the first mechanism, sometimes referred to as "pressed" voice, the vocal folds are adducted more than is required to sustain voicing. The open quotient and peak airflow then both decrease.

In the second mechanism, an augmented inertive component of the supraglottal vocal tract impedance suppresses and delays the buildup of airflow during the open phase of the glottal cycle.
Since the closing of the glottis at the end of the glottal open phase forces the flow to zero, there is a net reduction in airflow. Because this inertive loading of the glottal source terminates the airflow more abruptly, there is also a stronger excitation of the higher glottal harmonics at the instant of vocal fold closure. A good vocal fold closure is required for this mechanism to be effective. (The effect on breathy voice is discussed in reference 3 below.)

In the third mechanism, presumably used by sopranos in the higher pitch ranges, airflow can be suppressed by tuning the first (lowest) formant of the vocal tract, F1, to a frequency at or near the voice fundamental frequency. If the vocal fold closure is good (the voice is not breathy), the open quotient is sufficiently small (the voice is not in falsetto), and the vocal tract resonance is sufficiently sharp (the production is not nasalized, for example), the peaks of supraglottal pressure immediately above the glottis caused by the F1 resonance will occur during the glottal open phase, where they oppose the subglottal pressure and thus suppress the glottal flow. (This may be a reason that sopranos do not tune F1 to higher harmonics, as do male singers, at least at high volume levels. [See the comment of John Nix in the proceedings of this conference.] Tuning to the second harmonic, for example, though increasing the radiated energy at that harmonic, would not be expected to reduce the airflow.)

References:
1. M. Rothenberg, A new inverse-filtering technique for deriving the glottal volume velocity waveform during voicing, J. Acoustical Soc. Amer. 53, 1632-1645 (1973).
2. M. Rothenberg, Acoustic interaction between the glottal source and the vocal tract, in Vocal Fold Physiology, K.N. Stevens and M. Hirano, eds., University of Tokyo Press, 305-328 (1980).
3. M. Rothenberg, Source-tract acoustic interaction in breathy voice, in Vocal Fold Physiology: Laryngeal Function in Phonation and Respiration, T. Baer, C. Sasaki, and K.S. Harris, eds., College Hill Press, San Diego, 254-263 (1984).
4. M. Rothenberg, Cosi fan tutte and what it means or nonlinear source-tract interaction in the soprano voice and some implications for the definition of vocal efficiency, in Vocal Fold Physiology: Laryngeal Function in Phonation and Respiration, T. Baer, C. Sasaki, and K.S. Harris, eds., College Hill Press, San Diego, 254-263 (1986).

Figure 3. Glottal airflow during three types of unvoiced intervocalic consonants.

Fig. 3 illustrates diagrammatically the general patterns of glottal airflow found during the predominant types of intervocalic unvoiced consonants. The three classes of consonants illustrated are glottal fricatives (shown as /h/), non-glottal fricatives (shown as /s/), and stop consonants (shown as /t/). During consonants that do not have a strong vocal tract constriction (such as /h/), the airflow would be expected to roughly follow the variation in glottal area during the abductory movement, though at high degrees of vocal fold abduction the airflow is most often reduced somewhat by the back pressure created by turbulence at the glottis or in the supraglottal vocal tract.
In a non-glottal fricative (as in the /s/), the airflow is reduced by the constriction at the point of articulation of the consonant. In a stop consonant (the /t/, shown here aspirated), the airflow is reduced to zero by the articulatory occlusion for the stop. Since the /t/ is shown aspirated, there is a burst of airflow after the instant of release, marked A; this is the aspiration.

Among the ways that airflow might be reduced during a consonant when the preceding and following vowels are produced with a high subglottal pressure, there is the theoretical possibility that the subglottal pressure is reduced momentarily during the consonant. We explore that possibility next.

1. M. Rothenberg, The glottal volume velocity waveform during loose and tight glottal adjustments, Proceedings of the VII International Congress of Phonetic Sciences, 380-388 (1971).

Figure 4. Subglottal pressure traces illustrating the maximum speed of change for unidirectional and cyclic volitional respiratory gestures.

These traces were collected for reference 1 in order to estimate the dynamic restrictions on changes of subglottal (lung or tracheal) air pressure. They indicate that a cyclic change requires about 300 ms, while a unidirectional change (an increase or a decrease) requires at least about 150 ms if it is to be smooth and not oscillatory.

1. M. Rothenberg, The Breath-Stream Dynamics of Simple-Released-Plosive Production, Bibliotheca Phonetica Vol. 6, Karger, Basel (1968).

Figure 5. Four potentially valid mechanisms for reducing airflow during the production of unvoiced consonants.

Reducing airflow by using the respiratory muscles to reduce the subglottal pressure has been eliminated from the list because of the severe dynamic limitations illustrated in Figure 4. The remaining, potentially valid, possibilities are listed in the figure.

Figure 6. Measurements of the variation of airflow during an intervocalic unvoiced consonant with no articulatory obstruction.

This set of airflow traces was recorded to determine the maximum speed at which the vocal folds could be abducted and then adducted. In the reference cited, this was referred to as a cyclic glottal opening gesture. A gesture of this type is normally part of the production mechanism of an unvoiced consonant produced intervocalically. The waveforms are of oral airflow during the cyclic abductory gesture of four intervocalic unvoiced consonants, as recorded by a CV mask. The traces during the /p/ consonants (traces A and B) were obtained by bypassing the lip closure with a short length of tubing, to show the airflow that would occur if there were no articulatory closure. Traces C and D were of an intervocalic /h/. Thus all traces at least roughly reflect the changing state of the glottis during the abductory gesture. They were selected from a larger number of productions as typical for the adult male speaker tested.
The traces illustrate that the minimum duration of a cyclic abductory gesture for this speaker (not a trained singer) was about 125 ms if the abduction was to reach the point at which voicing essentially stopped, and about 100 ms if the abduction was to result only in a breathy voice (vocal folds vibrating with no vocal fold contact).

1. M. Rothenberg, The Glottal Volume Velocity Waveform During Loose and Tight Voiced Glottal Adjustments, Proceedings of the Seventh International Congress of Phonetic Sciences, Mouton, The Hague (1972).

Figure 7. Some potential intervocalic timing patterns for unvoiced stops.

These patterns are extracted from a list in the reference below, which attempts to define a physiologically based phonetic model describing all the phonemically distinct simple-released-plosives that are producible by the vocal tract. In this model it is hypothesized that the categories are determined by dynamic limitations in rapid speech and by discriminability considerations.
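
The durations quoted in the discussions of Figures 4 and 6 can be placed side by side to show why a respiratory gesture was ruled out while a glottal gesture remains plausible. The following is only a minimal sketch: the four durations are the approximate values given in the text, but the consonant interval `ASSUMED_CONSONANT_MS` and the function name `feasible_gestures` are hypothetical, chosen for illustration and not taken from the paper.

```python
# Approximate minimum gesture durations quoted in the text, in milliseconds.
GESTURE_MIN_MS = {
    "respiratory, cyclic (Figure 4)": 300,
    "respiratory, unidirectional (Figure 4)": 150,
    "glottal abduction to voiceless, cyclic (Figure 6)": 125,
    "glottal abduction to breathy, cyclic (Figure 6)": 100,
}


def feasible_gestures(window_ms):
    """Return the gestures whose minimum duration fits within window_ms."""
    return [name for name, ms in GESTURE_MIN_MS.items() if ms <= window_ms]


# Hypothetical consonant interval; this value is an assumption, not a
# measurement reported in the paper.
ASSUMED_CONSONANT_MS = 130

for name in feasible_gestures(ASSUMED_CONSONANT_MS):
    print("fits:", name)

# How much slower the cyclic respiratory gesture is than the cyclic
# glottal gesture that fully interrupts voicing.
ratio = (GESTURE_MIN_MS["respiratory, cyclic (Figure 4)"]
         / GESTURE_MIN_MS["glottal abduction to voiceless, cyclic (Figure 6)"])
print("respiratory / glottal cyclic duration ratio:", ratio)
```

Under the assumed 130 ms interval only the two glottal gestures fit, which is consistent with the removal of the respiratory option from the list in Figure 5. A different assumed interval would change which gestures fit, so the sketch is a way to explore the numbers rather than a result.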

1. M. Rothenberg, The Breath-Stream Dynamics of Simple-Released-Plosive Production, Bibliotheca Phonetica Vol. 6, Karger, Basel (1968).

Figure 8. Oral airflow patterns comparing an aspirated and unaspirated English /t/ in speech.

The oral airflow patterns in the figure are of an aspirated released stop in English (above) and the unaspirated released stop (below) produced when joining two English stop phonemes (to produce a single geminated stop articulation). The traces were recorded with a Glottal Enterprises CV mask system, including the new MS-110 electronics and software. Some low-pass filtering (available in the software) is used to clarify the traces by reducing acoustic energy.

Figure 9. The upper trace in Figure 8 contrasted with two examples of the aspirated and unaspirated geminated stop consonants in the spoken phrases "What time" and "What dime".

The traces in the figure were positioned so as to align the instants of articulatory release at the vertical line. The geminated sequence /tt/ across a word boundary produces an aspirated release similar to that of a single aspirated /t/. The sequence /td/ across a word boundary showed an articulatory release closely synchronized with the adduction of the vocal folds for the succeeding vowel, which produces an unaspirated release and thus reduces the volume of air expended. The closure in both instances was held longer than for the single consonant, presumably in order to signal the presence of two consonants in the underlying phoneme sequence.

Figure 10. Three airflow traces in which a phoneme /s/ is followed by an unvoiced consonant /p/.

The traces were recorded with the same Glottal Enterprises mask system as used in Figures 8 and 9. They show how the duration of the vocal fold abductory movement is used to signal juncture and control aspiration.
In the non-word-initial /p/ in a production of the English word "spot", there is no aspiration, as dictated by English phonology; the glottal adduction is closely synchronized to the articulatory release. The lower trace shows how this characteristic of English can be used by an English speaker who has not mastered the purposeful control of aspiration in singing (or in the pronunciation of a foreign language) to reduce the aspiration in the release of the /p/ in the phrase "This pot". By visualizing the phrase "Thi spot" (similar to the real phrase "The spot"), the speaker produces a potentially perceptually acceptable pronunciation of "This pot" in which the /p/ is not aspirated and the expired airflow is greatly reduced.

Figure 11. Comparison of airflow in spoken and sung unvoiced consonants. (Adapted from the reference below.)

Figure 11 shows a comparison of CV mask airflow traces from the same nonsense sentence, when spoken at a moderate volume level and when sung loudly near the top of the range of this singer. The pressure trace at the top of the figure was recorded from a slightly inflated balloon placed in the esophagus via a small-diameter flexible tube introduced at the nares. Such a procedure can yield a rough representation of the subglottal (tracheal) pressure if there are no esophageal contractions present. The airflow trace was low-pass filtered to remove the blur that would be caused by voicing. Both the flow and the esophageal pressure traces during singing show vibrato-related oscillations, which can be neglected in the analysis.

This singer appears to have employed some of the mechanisms discussed above to reduce the air expended during the unvoiced consonants. For example, the duration and extent of the abductory gesture for both instances of /h/ appear to have been reduced to the extent that the air expended in each was less than in the spoken versions. Also, the word-initial /p/ in "pat", while correctly aspirated in speech, was only slightly aspirated in the sung version.

1. M. Rothenberg, D. Miller, R. Molitor and D. Leffingwell, The Control of Airflow in Loud Soprano Singing, J. of Voice, Vol. 1, No. 3, 262-268 (1987).
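
The discussions of Figures 8 and 11 mention low-pass filtering of the airflow traces and compare the air expended in different productions. As a rough illustration of those two operations, the sketch below smooths a sampled flow trace with a centered moving average and integrates flow over time to estimate the volume expended. It is an assumption-laden example: the filter actually used by the MS-110 software is not described in the text, and the function names, sample rate, and synthetic trace are all hypothetical.

```python
import math


def moving_average(samples, window):
    """Low-pass filter a trace with a centered moving average.

    window must be a positive odd number of samples. Near the ends of the
    trace the average is taken over the samples that are available.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


def expended_volume(flow_l_per_s, sample_rate_hz):
    """Integrate flow (litres per second) with the trapezoidal rule.

    Returns the volume in litres over the duration of the trace.
    """
    dt = 1.0 / sample_rate_hz
    pairs = zip(flow_l_per_s, flow_l_per_s[1:])
    return sum((a + b) / 2.0 * dt for a, b in pairs)


# Synthetic example (not data from the paper): a steady 0.2 L/s flow with
# a 200 Hz ripple standing in for voicing, sampled at an assumed 1000 Hz.
SAMPLE_RATE_HZ = 1000
trace = [0.2 + 0.1 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE_HZ)
         for n in range(1001)]

# A 5-sample window spans exactly one period of the 200 Hz ripple.
smoothed = moving_average(trace, 5)
print("volume over 1 s (litres):", round(expended_volume(smoothed, SAMPLE_RATE_HZ), 3))
```

A moving average is used here only because it needs no external libraries; any low-pass filter whose cutoff lies below the voice fundamental frequency would serve the same purpose of removing the voicing component while keeping the slower consonant-related flow changes.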