Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
Voiced-voiceless distinction in alaryngeal speech - acoustic and articulatory observations
Nord, L. and Hammarberg, B. and Lundström, E.
journal: STL-QPSR, volume: 33, number: 2-3, year: 1992, pages: 019-022
http://www.speech.kth.se/qpsr
STL-QPSR 2-3/1992

VOICED-VOICELESS DISTINCTION IN ALARYNGEAL SPEECH - ACOUSTIC AND ARTICULATORY OBSERVATIONS*

Lennart Nord, Britta Hammarberg** & Elisabet Lundström

Abstract

In an investigation of the speech of laryngectomized speakers, we are currently focusing on their ability to distinguish between voiced and voiceless speech sounds. Some of them produce this distinction clearly in certain contexts, which is remarkable considering the structure of the pharyngo-esophageal entrance that serves as the post-operative voice source. However, a number of acoustic cues signal the distinction between voiced and voiceless phonemes, and an acoustic analysis can show how this is possible.

INTRODUCTION

The voiced-voiceless distinction is a typical and common characteristic of most of the world's languages, and there are numerous pairs of speech sounds that differ in this respect, such as the plosives /p, b/, /t, d/, /k, g/ and the fricatives /f, v/, /s, z/. The glottis, controlled by the laryngeal muscles, is well adapted to rapid devoicing gestures. From a theoretical point of view, it is much harder to understand how this articulatory gesture is made by the structures around and within the pharyngo-esophageal segment ("PE-segment"). This segment is usually described as less mobile and not easily adjusted at will. However, there are indications that good alaryngeal speakers signal voiced/voiceless distinctions quite accurately, and both articulatory and acoustic analyses support this.

As part of our investigation of alaryngeal speech, we are interested in the mechanism of the alaryngeal voice source. So far, we have analyzed the speech of seven esophageal and seven tracheo-esophageal speakers, two of whom also master esophageal speech. Their voices vary widely in quality, pitch range, phonation stability, etc.
We believe that part of this variation stems from the fact that these speakers differ considerably in the anatomy of the PE-segment that serves as the voice source; this variation is probably much larger than that found in glottal structures.

BACKGROUND

A number of studies have dealt with this research area. Hirose, Sawashima, & Yoshioka (1983) investigated the voicing distinction among esophageal speakers using perceptual, fiberoptic and acoustic techniques. They found that most confusions between voiced and voiceless cognates went in one direction: the voiceless stops /p, t, k/ were more often mistaken for /b, d, g/ than the reverse. They also report that confusions were more frequent in word-initial than in word-medial position. Based on their fiberoptic analyses, they state (p. 197): "There seems to be some sort of tension control mechanism around the region of the neoglottis but further study is needed to explore its nature."

*Revised version of a paper presented at the Sixth Swedish Phonetic Conf., Gothenburg, May 20-22, 1992.
**Also Dept. of Logopedics and Phoniatrics, KI, Huddinge Hospital.

METHODS

We have carried out a number of intelligibility tests with alaryngeal speakers, using lists of one-syllable CVC words. Acoustic analyses of the word pairs have also been performed. Recently, we have begun fiberoptic observation of alaryngeal phonation in a few of the speakers, to examine the movement patterns of the PE-segment during the production of stops, with particular attention to the voiced/voiceless distinction.

RESULTS OF INTELLIGIBILITY TESTS

In earlier reports (Hammarberg, Lundström, & Nord, 1990a, b) we presented results of intelligibility tests using material produced by esophageal ("E") and tracheo-esophageal ("TE") speakers. These tests have since been extended with more speakers and listeners, and the findings generally agree with results reported by other researchers. Despite individual variation among the speakers, some results regarding the voiced-voiceless distinction can be summarized here.

Initial phonemes were more difficult to perceive correctly than non-initial ones. For both E-speakers and TE-speakers, initial /p/ was often perceived as /b/; /t/ and /d/ were confused in both directions, while /k/ was correctly perceived in most cases. For the TE-speakers, /g/ was often perceived as /k/. These results seem to indicate two things. First, both groups of speakers have difficulty differentiating between voiced and voiceless segments. Second, the two groups appear to differ in voicing: the E-speakers tend to prefer voiced segments, while the TE-speakers tend to devoice them.

ACOUSTIC ANALYSIS

Two spectrographic examples illustrate these findings. Figs. 1a and 1b show the word pair "bank" and "pank" uttered by an E-speaker.
In this speaker's productions, many initial /p/ phonemes were confused with /b/, which is not surprising given the acoustic traces. A typical Swedish initial /p/ has a clear burst interval and an F1 cut-back. In this production of /p/, it is difficult to see any trace of a burst, and F1 starts quite suddenly. The word "pank" is thus perceived as "bank" with a reduced voicing cue, but still not as a word with an initial /p/.

Fig. 2 shows a word produced by a TE-speaker that was intended as "gal" but perceived as "kal". The spectrogram shows no initial voicing, and a very irregular voiced segment at the onset is heard as a burst interval, thus signalling /k/.

In non-initial positions confusions were less frequent, one obvious reason being the speakers' use of timing characteristics. In Swedish, the duration of the occlusion differs clearly between voiced and voiceless stops, and so does the duration of the preceding vowel, the longer vowel occurring before the voiced stop.
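The timing cue can be illustrated with a minimal sketch. Given hand-measured durations (in ms) of the preceding vowel and the stop occlusion for each member of a minimal pair, the function picks the member that the timing cue marks as voiced. The names `Token`, `vowel_occlusion_ratio` and `voiced_member`, the use of a vowel-to-occlusion ratio, and the example durations are all hypothetical and are not measurements or methods from this study. The sketch also assumes, as is commonly reported, that the voiced stop has the shorter occlusion.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One recorded word with hand-measured durations in milliseconds."""
    word: str
    vowel_ms: float      # duration of the vowel preceding the stop
    occlusion_ms: float  # duration of the stop occlusion (closure)


def vowel_occlusion_ratio(token: Token) -> float:
    """Preceding-vowel duration divided by occlusion duration.

    Assumption: a longer vowel and a shorter occlusion both point toward a
    voiced stop, so a larger ratio is taken as evidence of voicing.
    """
    if token.occlusion_ms <= 0:
        raise ValueError("occlusion duration must be positive")
    return token.vowel_ms / token.occlusion_ms


def voiced_member(a: Token, b: Token) -> Token:
    """Return the member of a minimal pair that the timing cue marks as voiced.

    Ties are resolved in favor of the second token; a real analysis would
    need a more careful decision rule.
    """
    return a if vowel_occlusion_ratio(a) > vowel_occlusion_ratio(b) else b


if __name__ == "__main__":
    # Hypothetical durations, not data from this study.
    a = Token("token A", vowel_ms=180.0, occlusion_ms=90.0)
    b = Token("token B", vowel_ms=140.0, occlusion_ms=150.0)
    print(voiced_member(a, b).word)  # token A
```

Combining the two durations in a ratio, rather than comparing absolute values, is one way to reduce the influence of overall speaking rate; whether listeners to alaryngeal speech actually weigh the cues this way is not addressed here.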
Fig. 1a. Spectrogram of the word "bank", uttered by an E-speaker.
Fig. 1b. Spectrogram of the word "pank", perceived as "bank", uttered by an E-speaker.
Fig. 2. Spectrogram of the word "gal", perceived as "kal", uttered by a TE-speaker.
FIBEROPTIC REGISTRATIONS

For a few of the speakers, fiberoptic registrations were made while they read word lists. These registrations have not yet been assessed quantitatively, but a number of interesting details can be observed. For one male TE-speaker, whose phonation was periodic and stable, a stroboscopic analysis was possible; it revealed the contour of his "neoglottis" during phonation, with indications of a "glottal wave" in this structure. This speaker also made a clear prolongation gesture during the voiced stops. For one of the E-speakers, a clear opening gesture was observed during the voiceless stop occlusion. This prephonatory gesture might also be part of the speaker's inhalation technique, which he combines with an injection technique to introduce air into the esophagus (Diedrich, 1980). Most of the voiceless stops, however, were realized with a minimal aspiration phase.

CONCLUSIONS

This group of proficient alaryngeal speakers followed the generally observed pattern, viz. that the voiced-voiceless distinction is difficult to realize. Our fiberoptic observations seem to indicate that, at least for some speakers, the mechanism for making the distinction is present and the pharyngo-esophageal entrance is controlled quite accurately. A follow-up question, then, is: how can a speaker learn to use it?

ACKNOWLEDGEMENTS

We are grateful to Per-Åke Lindestad at the Dept. of Logopedics and Phoniatrics, Huddinge Hospital, for making the fiberoptic registrations. This work was financed by research grants from the Swedish Cancer Society and the Swedish Council for Social Sciences.

REFERENCES

Diedrich, W.M. (1980): "The mechanism of esophageal speech," pp. 45-59 in B. Weinberg (ed.), Readings in Speech Following Total Laryngectomy, University Park Press, Baltimore.

Hammarberg, B., Lundström, E., & Nord, L. (1990a): "Consonant intelligibility in esophageal and tracheo-esophageal speech. A progress report," Phoniatric & Logopedic Progress Report No. 7 (Huddinge Univ. Hospital), pp. 49-57.

Hammarberg, B., Lundström, E., & Nord, L. (1990b): "Intelligibility and acceptance of laryngectomee speech," PHONUM No. 1 (Reports from the Dept. of Phonetics, Univ. of Umeå), pp. 88-91.

Hirose, H., Sawashima, M., & Yoshioka, H. (1983): "Voicing distinction in esophageal speech - perceptual, fiberoptic and acoustic studies," Ann. Bull. RILP No. 17, pp. 187-199.