Mandarin Lexical Tone Recognition: The Gating Paradigm

Kansas Working Papers in Linguistics, Vol. 30 (2008), p. 183

Yuwen Lai and Jie Zhang
University of Kansas

Abstract

Research on spoken word recognition in Indo-European languages often does not incorporate prosody. In Mandarin Chinese, however, lexical prosody is used extensively and has been shown to affect word processing in previous studies. The present study uses the gating paradigm to investigate the processing of the four Mandarin tones as well as the role of the initial segment in processing. Duration-blocked gates generated from eight monosyllabic quadruplets with matching frequencies of occurrence were used as stimuli. To evaluate the effect of the initial segment, the initial consonant of each syllable always formed the first gate, with later gates formed by 0ms increments. Results showed that Tone 1 has a significantly earlier Isolation Point (IP) than Tone 4, which has an earlier IP than Tones 2 and 3. Sonorant-initial syllables have an earlier IP than obstruent-initial syllables, but further analyses of covariance indicated that the IP covaries with the duration of the initial consonant. The tone responses proposed by the participants before reaching the IP were compared against the acoustic features of the four tones. The results indicated that high register cues are more prominent than low register cues, as high tones were never misidentified as low tones. Moreover, contour information outweighs low register cues, as low-onset tones were sometimes misidentified as high-onset tones with which they share similar contours. These results provide more detailed temporal information about tone processing for Mandarin.

1. Introduction

As a phonemic feature, tone plays an important role in lexical processing in Mandarin. When listeners hear a word, they need to process both the segmental composition and the tone in order to perceive the word correctly.
Tones therefore play an important role in isolating the target token from possible segmental homophones. Previous research has shown that the most important acoustic cues for Mandarin tones are F0 height, F0 shape, and the F0 difference between the onset and the turning point of the tone (especially for Tone 2 and Tone 3). Duration cues such as the overall tone duration and the timing of the turning point have also been shown to affect the perception of tones. Given the acoustic differences between the four Mandarin tones, it is necessary to determine how much and what kind of acoustic information listeners require to perceive the tones correctly.

1.1. The gating paradigm

In gating experiments, participants are presented with a spoken language stimulus (phone, syllable, word, phrase, sentence, etc.) in segments of increasing duration, and are asked to propose the word presented and give a confidence rating (Grosjean, 1996). The increment size is consistent across the stimuli (usually between 0-00ms, or a fixed percentage of an individual word). Three sets of data are usually collected in this type of study:

1) Isolation point (IP): the size of the segment needed to correctly identify the stimulus without further changes.
2) Confidence rating: the rating at each segment.
3) Proposed responses: subjects' responses at each gate before the isolation point.

This paradigm allows for precise control of the acoustic-phonetic information presented to the subjects. As a result, it can examine the moment-to-moment recognition process and evaluate the amount of acoustic-phonetic information required to identify the stimulus.

1.2. Acoustic correlates of Mandarin tones

The F0 contours of the four Mandarin tones produced in isolation are provided in Figure 1. Tones are transcribed as moving within a pitch range from low, numerically denoted as 1, to high, denoted as 5. Tone 1 is transcribed as 55 (high-level) and Tone 4 as 51 (high-falling); Tone 2 is a low-rising tone and Tone 3 a low-dipping tone. Research by Gandour (1983) has shown that five aspects are relevant for tone perception: 1) average F0/F0 height; 2) F0 contour; 3) F0 slope; 4) extreme endpoints; and 5) tone duration. Previous research has shown that the primary acoustic parameters of Mandarin tones are F0 height and contour shape (Howie, 1976). Duration also differs among the four tones: Tone 2 and Tone 3 are the longest, while Tone 4 is the shortest (Nordenhake and Svantesson, 1983). Moore and Jongman (1997) have also shown that Tone 2 has an earlier turning point and a smaller F0 change between the onset and the turning point than Tone 3.

Figure 1. Four Mandarin lexical tones on the syllable ma produced by a male native speaker (F0 in Hz plotted against time).

1.3. Previous gating studies on Mandarin tones

Lee (2000) used the gating paradigm to explore the lexical competition between different types of syllables by comparing: 1) syllables with tonal minimal pairs to those without; 2) syllables with different numbers of tonal minimal pairs; 3) syllables with similar tones to those with dissimilar tones; and 4) syllables with sonorant onsets to those with obstruent onsets. Three sets of data were collected: the tone isolation point, the point at which the target tone was correctly identified without further changes; the word isolation point, the point at which the target syllable was correctly identified without further changes; and the word recognition point, the point at which the target syllable was correctly identified and the confidence rating reached at least 8 on a 10-point scale. Twenty native speakers participated in the experiments. Based on the segment presented to them, they were asked to write down the word they thought they had heard and to give a confidence rating for their answer.

Lee's (2000) results indicated that the tone isolation, word isolation and word recognition points were all earlier for words without tonal minimal pairs. However, no differences were found across stimuli with one, two, and three tonal minimal pairs. In terms of tone similarity, the tone isolation point was consistently earlier for words without tonally similar minimal pairs, but no differences were found in the word isolation point or the word recognition point. The accuracy rate for tone identification in the initial gate, formed by the onset consonant, was higher for sonorant onsets than for obstruent onsets.

Wu and Shu (2003) also adopted the gating paradigm in their work on Mandarin tone processing. 0 Mandarin monosyllables were tested on 7 subjects. The gates were constructed with 0ms increments and were presented in a duration-blocked format. The subjects had to write down the character and give a confidence rating for their judgment on a piece of paper. The authors analyzed the isolation point (IP) of all stimuli, the IP of each tone, and the errors generated by onset, rime and tone. Their results showed that the IP was the longest for one of the tones, and no difference was found among the other three.
They also classified errors as coming from the onset, the rime, or the tone. After the fifth gate (00ms), participants could correctly identify the entire target syllable. They also showed that one pair of tones was most likely to be mistaken for each other. However, Tone 2 and Tone 3, which have been shown to be acoustically similar (Moore and Jongman, 1997), were not easily mistaken for each other.

There are two methodological problems in Wu and Shu's (2003) study. First, they did not control for the frequencies of occurrence of the target syllables across different tones in their stimuli. Second, they did not use tone quadruplets to control the segmental composition of the stimuli. Their results on the processing of different tones may therefore have been confounded with frequency effects as well as effects of the segmental composition of the stimuli.

1.4. The current study

The present study proposes a revised methodology that provides better control of frequency of occurrence and segmental composition. Our goal is to investigate the amount of tonal information needed to correctly identify the target tone. This includes the tone duration required from the onset of the token as well as the acoustic cues the listeners adopt during processing. The design also allows us to systematically investigate the effect of the sonorancy of the initial consonant on tonal identification.

Figure 2 illustrates the hypothesized process of Mandarin tone identification. As shown in the figure, we hypothesize that the four tones will first be distinguished as two groups based on onset tone height. Between the two tones that start with a high register, Tone 1 can be identified earlier than Tone 4, as contour shapes require a longer duration to be perceived (Black, 1970; Greenberg and Zee, 1979). For tones that start with a low register, Tone 2, which has an earlier turning point and a smaller F0 change between the tone onset and the turning point, is predicted to require a shorter duration to identify than Tone 3.

Figure 2. Hypothesized process for Mandarin tone recognition.

In addition, we hypothesize that a sonorant initial can provide acoustic information about the tone and consequently trigger an earlier isolation point than an obstruent initial.

2. Methodology

2.1. Stimuli

The stimuli consist of 8 tone quadruplets, each containing the same segmental composition but different tones. The stimulus list is given in Figure 3. Four quadruplets have a CV structure, while the other four have a CVN structure. The frequencies of occurrence were matched across the four different tones using Da's (2007) corpus. The design includes two within-subject factors: Tone (1, 2, 3 and 4) and Initial consonant (sonorant and obstruent).

Figure 3. Stimuli: two initial consonant types (sonorant and obstruent).

Constructing the stimuli for Mandarin tone gating is not an easy task. Ideally, the frequencies of occurrence of the syllables in each quadruplet need to be matched. In addition, each syllable in Chinese usually has a number of homophones, each written with a unique character. For example, 郁 [y ], the first syllable of the first author's given name, has 07 homophones. Consequently, syllable frequency rather than character frequency was used for frequency control. When the stimuli are presented auditorily, it is not apparent which character will be activated, so we assume that all homophones of a given pronunciation are activated when the subjects hear it.

The stimuli were recorded in an anechoic chamber at the University of Kansas by a male native Mandarin speaker. The recording was then transferred to PRAAT for editing in the Phonetics and Psycholinguistics Laboratory at the University of Kansas. The initial consonant of each syllable always formed the first gate. The following gates were formed in 0ms increments starting from the onset of the rime in each syllable. Figure 4 illustrates the complete gating sequence for the Mandarin word [fu ] 'husband'.

Figure 4. The gating sequence (gates 1 to 12, plotted against time in seconds), illustrated by the word [fu ] 'husband'. The first gate includes only the onset consonant. Following gates were constructed in 0ms increments. The last gate always contains the entire syllable.

2.2. Participants

Twenty-eight adult native Mandarin speakers from Beijing were tested at Peking University. They were paid for their participation.

2.3. Experimental procedure

The subjects were tested individually in a quiet room using the SuperLab program (Cedrus). The experiment began with a set of instructions (recorded by the same speaker who recorded the stimuli), which explained to the subjects that their task was to identify the tone of each gated stimulus and to provide a confidence rating for their response on a scale of 1 to 7 by pressing the corresponding buttons on a keyboard. The stimuli were presented in a duration-blocked format, in which subjects heard the first gates of all stimuli, then the second gates, and so on.

2.4. Data processing

The following data were collected from the subjects' responses: 1) the isolation point (IP), the point at which the target tone was correctly identified without further changes; 2) the proposed responses before reaching the IP for each tone.
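The gate construction and the isolation point definition described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' PRAAT or SuperLab procedure: the function names `gate_end_times` and `isolation_gate`, the durations, and the 40 ms step in the usage note are all hypothetical values chosen for the example.

```python
def gate_end_times(consonant_ms, syllable_ms, step_ms):
    """Return the end time (ms from syllable onset) of every gate.

    Gate 1 is the initial consonant alone, each later gate adds step_ms
    of the rime, and the last gate always contains the whole syllable.
    """
    if not (0 < consonant_ms < syllable_ms) or step_ms <= 0:
        raise ValueError("need 0 < consonant_ms < syllable_ms and step_ms > 0")
    ends = [consonant_ms]            # gate 1: onset consonant only
    t = consonant_ms + step_ms
    while t < syllable_ms:           # fixed increments through the rime
        ends.append(t)
        t += step_ms
    ends.append(syllable_ms)         # final gate: entire syllable
    return ends


def isolation_gate(responses, target):
    """Return the 1-based gate from which the response stays on target.

    responses[i] is the tone a listener proposed at gate i + 1.
    Returns None if the listener's final response is not the target.
    """
    ip = None
    for gate, response in enumerate(responses, start=1):
        if response == target:
            if ip is None:
                ip = gate            # start of a run of correct responses
        else:
            ip = None                # a later change cancels the earlier run
    return ip
```

With these hypothetical inputs, `gate_end_times(100, 300, 40)` gives six gates ending at 100, 140, 180, 220, 260 and 300 ms, and `isolation_gate([1, 4, 1, 4, 4, 4], 4)` gives gate 4, because the correct response at gate 2 was changed at gate 3. The isolation point in milliseconds is then the end time of that gate.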

3. Results

3.1. Isolation Point (IP)

The IP results are given in Figure 5. A 4 (Tone) × 2 (Initial consonant) analysis of variance (ANOVA) of the IP showed a main effect of Tone [F(, 86)=.0, p<.00]. A post hoc analysis indicated that the IP for Tone 1 was earlier than that for Tone 4, which in turn was earlier than the IPs for Tone 2 and Tone 3. There was no difference between the IPs for Tone 2 and Tone 3. The main effect of Initial consonant was also significant [F(, 86)=07.8, p<.00]; sonorant-initial syllables had an earlier IP than obstruent-initial syllables. The interaction between the two factors was not significant [F(, 86)=.9, p=.087].

Figure 5. Isolation points across four tones and two types of initial segments.

To examine whether the earlier IP for sonorant-initial syllables was caused by the sonorancy of the initial segment or by a shorter first gate, we measured the duration of the initial consonant (gate 1). Results from a 4 (Tone) × 2 (Initial consonant) ANOVA showed that the main effect of Tone was not significant [F(, )=., p=.78], but there was a main effect of Initial consonant [F(, )= 9.5, p<.00]; sonorant-initial syllables had a shorter gate 1 duration than obstruent-initial syllables. The interaction was not significant [F(, )=.8, p=.697]. The gate 1 duration results are given in Figure 6.

Figure 6. Gate 1 duration (ms) across four tones and two types of initial segments (sonorant and obstruent).

To evaluate the effect of gate 1 duration on the IP, we further conducted a 4 (Tone) × 2 (Initial consonant) analysis of covariance (ANCOVA) with gate 1 duration as a covariate. The main effect of Tone was still significant [F(, 86)=99.65, p<.00], and post hoc analyses indicated that Tone 1 had an earlier IP than Tone 4, which had an earlier IP than Tone 2 and Tone 3, with no difference between Tone 2 and Tone 3. The main effect of Initial consonant was not significant [F(, 86)=.006, p=.99]. The interaction was also not significant [F(,86)=.9, p=.]. This analysis showed that once gate 1 duration was controlled for, there was no significant difference in IP between sonorant-initial and obstruent-initial syllables. The IP results with gate 1 duration excluded are given in Figure 7.

Figure 7. Isolation points (ms) across four tones and two types of initial segments (with gate 1 duration excluded).
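The reasoning behind the covariate analysis above can be illustrated with a small regression sketch. It is not the authors' ANCOVA (which also included Tone and was run on real listener data). It is a simplified least-squares version with one group factor and one covariate, and the function name `group_effect` and all numbers in the usage note are invented for illustration. It shows how a raw difference between groups can vanish once a covariate that differs between the groups is included.

```python
def _mean(values):
    return sum(values) / len(values)


def group_effect(ip, is_sonorant, gate1_ms=None):
    """Least-squares estimate of the IP difference (sonorant minus obstruent).

    Without gate1_ms this equals the raw difference between group means.
    With gate1_ms it is the difference adjusted for gate 1 duration,
    i.e. the group coefficient in a model ip ~ group + gate1_ms.
    """
    g = [1.0 if s else 0.0 for s in is_sonorant]
    gc = [x - _mean(g) for x in g]              # centered group dummy
    yc = [y - _mean(ip) for y in ip]            # centered IP
    s_gg = sum(a * a for a in gc)
    s_gy = sum(a * b for a, b in zip(gc, yc))
    if gate1_ms is None:
        return s_gy / s_gg
    dc = [d - _mean(gate1_ms) for d in gate1_ms]  # centered covariate
    s_dd = sum(a * a for a in dc)
    s_gd = sum(a * b for a, b in zip(gc, dc))
    s_dy = sum(a * b for a, b in zip(dc, yc))
    det = s_gg * s_dd - s_gd ** 2
    if det == 0:
        raise ValueError("group and covariate are perfectly collinear")
    # Solve the 2x2 normal equations for the group coefficient.
    return (s_dd * s_gy - s_gd * s_dy) / det
```

As a usage example with made-up data, suppose four sonorant-initial items have gate 1 durations of 60, 70, 80 and 90 ms, four obstruent-initial items have 120, 130, 140 and 150 ms, and every IP is exactly 150 ms after the end of gate 1. The raw group difference is then 60 ms in favor of sonorants, but the adjusted difference is zero, which mirrors the pattern reported above.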

3.2. Accuracy rate at gate 1

To further investigate the effect of sonorancy on tonal identification, we calculated the accuracy rates for tonal identification at gate 1 for all stimuli. These accuracy rates are given in Figure 8. Results from a 4 (Tone) × 2 (Initial consonant) ANOVA showed that, unsurprisingly, there was a main effect of Tone [F(, 888)=.78, p<.00], and post hoc analyses indicated that the accuracy rate was the highest for Tone, followed by Tone and then Tone and Tone. The main effect of Initial consonant was also significant [F(, 888)=9.5, p<.00]; the accuracy rate was higher for sonorant-initial syllables than for obstruent-initial syllables.

Figure 8. Gate 1 accuracy rate across four tones and two types of initial segments (sonorant and obstruent).

To further illustrate this point, sample F0 contours of the sonorant-initial syllable meng and the obstruent-initial syllable fang are plotted in Figure 9. Although the IP for meng is earlier than that for fang, this difference is largely due to the difference in the duration of the initial consonant (gate 1).
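The gate 1 accuracy rates discussed in this section amount to a simple tally of correct responses per condition. A minimal sketch follows, assuming each gate 1 trial is stored as a (target tone, initial type, response tone) tuple. This data layout and the function name `gate1_accuracy` are hypothetical, not taken from the study.

```python
from collections import defaultdict


def gate1_accuracy(trials):
    """Return the proportion of correct tone responses per condition.

    trials: iterable of (target_tone, initial_type, response_tone) tuples,
    one per listener response to a gate 1 stimulus.
    The result maps (target_tone, initial_type) to an accuracy in [0, 1].
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for target, initial, response in trials:
        key = (target, initial)
        total[key] += 1
        if response == target:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}
```

For instance, four invented trials for Tone 1 with a sonorant initial, three of them answered as Tone 1, would yield an accuracy of 0.75 for the key `(1, "sonorant")`.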

Figure 9. Sample F0 contours for a sonorant-initial syllable meng and an obstruent-initial syllable fang. The dotted line indicates the gate 1 boundary and the solid line shows the IP.

Our ANCOVA results for the IP and the ANOVA results for the accuracy rate indicate that although the sonorancy of the initial consonant does not necessarily trigger an earlier IP, it does contribute to tonal identification by boosting the accuracy rate at gate 1. The fact that it does not trigger an earlier IP indicates that, for the listener to confidently identify a tone, a certain amount of the vowel is necessary, as the vowel provides clearer acoustic cues for F0.

3.3. Analysis of tonal confusion before IP

We further examined the responses listeners provided before the IP to investigate the cues they may have used in making their judgments. Figure 10 shows the histograms of tone responses at gates 1-9 for Tone 1 and Tone 4. The y-axis represents the number of times a particular tone was given as the response. At later gates, subjects identified the target tone with close to 100% accuracy, so we do not report these gates in the histograms.

For Tone 1 tokens (Figure 10a), subjects reached a high accuracy rate at an early gate. Errors made before that gate were mostly misidentifications as Tone 4. This may be due to the similarity between the initial portions of Tone 1 and Tone 4 in both register and tone shape. Correspondingly, Tone 4 (Figure 10b), before reaching a high accuracy rate at gate 5, was often misidentified as Tone 1. It may take longer for Tone 4 to be correctly identified than Tone 1 because at earlier gates the tone duration is not long enough for the subjects to perceive the falling contour, which leads to a level tone percept.

Figure 10. Histograms of tone responses (response frequency by identification) at gates 1-9 for Tone 1 (a) and Tone 4 (b).

The histograms of responses to Tone 2 and Tone 3 at gates 1-9 are given in Figure 11. Similar to the results found in Wu and Shu (2003), there was a high percentage of Tone 1 responses at earlier gates for Tone 2 and Tone 3. We propose that this is again due to the short duration of the presented segment, which does not warrant a contour tone perception. Tone 1, being the only level tone in the language, then becomes the most common response at earlier gates for all four target tones. Interestingly, instead of being confused with Tone, Tone tokens at earlier gates received significant Tone responses. Closer examination of Tone and Tone showed that at early gates, these two tones, although different in register, share very similar tonal contours (cf. Fig. 1). The first gate of Tone was also often misidentified as Tone, presumably due to the short duration. Starting from gate, subjects had significantly more Tone responses. Interestingly, at some gates up to gate 7, Tone 3 tokens were sometimes misidentified as Tone 4. We surmise that since Tone 3 is the only tone in the low register region, listeners may have taken advantage of the low register and identified these tokens as Tone 3 at the very beginning; but once enough duration had been heard to warrant a falling tone perception, a Tone 4 percept was triggered.
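The per-gate response counts behind a confusion analysis of this kind can be sketched as follows. This is a minimal illustration under assumptions: each listener's responses to one target are stored as a list with one proposed tone per gate, only gates before that listener's isolation point are counted, and the function name `pre_ip_histogram` and the example data are hypothetical. The histograms in the paper may have been compiled differently.

```python
from collections import Counter


def pre_ip_histogram(all_responses, target):
    """Count proposed tones at each gate before the isolation point.

    all_responses: list of per-listener response lists, where
    responses[i] is the tone proposed at gate i + 1.
    Returns {gate: Counter({tone: count})} over all listeners.
    """
    histogram = {}
    for responses in all_responses:
        # The IP is the gate right after the last incorrect response,
        # so every gate up to that last error lies before the IP.
        last_wrong = max(
            (gate for gate, r in enumerate(responses, start=1) if r != target),
            default=0,
        )
        for gate in range(1, last_wrong + 1):
            histogram.setdefault(gate, Counter())[responses[gate - 1]] += 1
    return histogram
```

With three invented listeners responding to a Tone 4 target as `[1, 1, 4, 4]`, `[1, 4, 1, 4]` and `[4, 4, 4, 4]`, gate 1 collects two Tone 1 responses, gate 2 collects one Tone 1 and one Tone 4 response, and gate 3 collects one Tone 1 response. The third listener contributes nothing because the target was isolated at gate 1.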

Figure 11. Histograms of tone responses (response frequency by identification) at gates 1-9 for Tone 2 (a) and Tone 3 (b).

The confusion analysis suggests that among the acoustic cues for Mandarin tones, listeners are the most sensitive to a high register pitch at the beginning of a tonal stimulus, as evidenced by the earlier IPs for Tone 1 and Tone 4, which start with a high pitch. Contour perception requires a significant duration of the vowel that carries the contour, as shown by the common misidentification of Tone 4 as Tone 1 at early gates. An acoustic low pitch seems to carry less perceptual weight than an acoustic high pitch at tonal onset, as evidenced by the common misidentification of Tone (low pitch onset) as Tone (high pitch onset) at early gates. The low pitch also carries less perceptual weight than a contour at tonal onset, as listeners sometimes identified the low falling pitch at the beginning of Tone 3 as the high falling pitch of Tone 4.

Our tonal confusion data are inconsistent with the general understanding that among Mandarin tones, Tone is more likely confused with Tone, and Tone is more likely confused with Tone. This may be because the cues listeners focus on during the initial unfolding of the tone are different from the ones they use once the entire tone has been presented.

4. Discussion and conclusion

The current study establishes that a timing difference exists in the processing of different Mandarin tones. The isolation point is the earliest for Tone 1, followed by Tone 4, which is then followed by Tone 2 and Tone 3. The IP for sonorant-initial syllables is earlier than that for obstruent-initial syllables, but this difference is likely due to the shorter duration of initial sonorants compared with initial obstruents, not to their difference in sonorancy per se. Despite its lack of temporal effects, the sonorancy of the initial consonant does contribute to the identification of tone, in that it boosts the accuracy rate of identification at gate 1, which is composed solely of the initial consonant.

Based on the confusion analysis before the isolation point, a hierarchy of cues at the onset of tonal identification was also found: high > contour > low. High-onset tones, regardless of contour, were not misidentified as low-onset tones; but low-onset tones were sometimes misidentified as high-onset tones due to their contour shapes.

In sum, our study provides more detailed temporal information about tone processing for Mandarin. A better understanding of the timing of identification and of the acoustic cues that listeners rely on at the onset of identification will assist in further refining the temporal precision of future processing studies of Mandarin tones.

References

Black, J. W. (1970). The magnitude of pitch inflection. Proceedings of the 6th International Congress of Phonetic Sciences. Prague.
Da, J. (2007). Chinese text computing corpus.
Gandour, J. (1983). Tone perception in Far Eastern languages. Journal of Phonetics 11.
Greenberg, S. and E. Zee (1979). On the perception of contour tones. UCLA Working Papers in Phonetics 45.
Grosjean, F. (1980). Spoken word recognition processes and the gating paradigm. Perception and Psychophysics 28 (4).
Howie, J. M. (1976). Acoustical studies of Mandarin vowels and tones. Cambridge: Cambridge University Press.
Lee, C-Y. (2000). Lexical tone in spoken word recognition: a view from Mandarin Chinese. Doctoral dissertation, Brown University.
Moore, C. B. and A. Jongman (1997). Speaker normalization in the perception of Mandarin Chinese tones. Journal of the Acoustical Society of America 102.
Nordenhake, M. and J. O. Svantesson (1983). Duration of standard Chinese word tones in different sentence environments. Lund University Working Papers in Linguistics 25: 105-111.
Wu, N. and H. Shu (2003). The gating paradigm and spoken word recognition of Chinese. Acta Psychologica 35 (5).

Author contact information:
Yuwen Lai: yuwen@ku.edu
Jie Zhang: zhang@ku.edu
