CATEGORICAL SPEECH PERCEPTION REVISITED

Klaus Kohler
Institut für Phonetik und digitale Sprachverarbeitung (IPDS), Kiel, Germany
kjk@ipds.uni-kiel.de

ABSTRACT

Categorical speech perception (CSP) postulates perceptual grouping of an acoustic continuum into sharply delimited phonological categories, with discrimination maxima across the identification boundaries. The experimental procedure was applied to F0 contours in a peak-shift and semantic contextualization paradigm in German and showed a categorical change from early to medial position in relation to the accented syllable. In a comparable valley shift from early to late, however, no discrimination maximum was found, although there was clear category formation in the identification task. In F0-peak perception, a syntagmatic pitch contrast of high-low or low-high, respectively, across the syntagmatic articulatory landmark of consonant-vowel transition, preceding a final fall, is characteristic of early vs medial. In the valley shift, the decisive pitch difference between early and late final rises is confined to the vowel and thus lacks a tight link with a syntagmatic articulatory contrast. This leads to the conclusion that perceptual categorization of a physical continuum is not tied to a discrimination maximum unless there is an additional association with contrastive vocal tract sequencing. This can also explain differences found in the categorization of consonants vs vowels, and it stresses the relevance of syntagmatic auditory enhancement beside paradigmatic phonemic opposition in speech perception.

THE HASKINS PARADIGM OF CATEGORICAL SPEECH PERCEPTION

Structural linguistics established the concept of contrastive sound units, phonemes, differentiated by distinctive features, as against the redundant features of contextually determined allophones. The psychologists at Haskins took over this segment-oriented view of language and its bipartition into contrastive invariance and conditioned variability, and projected it onto speech perception.
Decoding phonemes became the task of the listener, who had to extract the distinctive features of phonemic contrasts from speech variability. This is the theoretical basis that led to the paradigm of categorical speech perception and subsequently to the development of the Motor Theory: listeners would attune to the speech parameters that distinguish phonemes, and thus categorize an acoustic continuum sharply in an identification task; at the same time they would differentiate acutely across the category boundaries, but only poorly inside them (Liberman et al. 1957, 1962). The classic identification and discrimination experiments on acoustic continua referring to place of articulation and VOT in plosives were considered to support the notion of a special Speech Code in perception, closely linked to categorical separation in production as against gradual acoustic manifestation (Liberman et al. 1967). The theory was critically reviewed by Lane (1965). The experimental results were less clear for vowels than for consonants, and seemed to disfavour categorical tonal perception.

From Sound to Sense: June 11 - June 13, 2004 at MIT, C-157

CATEGORICAL PITCH PERCEPTION

The Categorization of Peak and Valley Alignment

Kohler (1987) applied CSP to the perception of F0 contours in German in a peak-shift and semantic contextualization paradigm, and showed categorical changes in the identification of early vs medial peaks, with a discrimination maximum across the category boundary, i.e. support for the classic Haskins paradigm. As the early peak was found to be associated with finality ("knowing", "coming to the end of an argument") and the medial peak with openness ("observing", "starting a new argument"), appropriate contexts could be constructed for identification of test stimuli such that their intonation either fitted or did not. Discrimination was tested with 1-step and 2-step pairings. These results have been replicated repeatedly.

The discrimination pattern across vs inside the early/medial categories even holds for speakers of diverse languages (tone and intonation languages) who have no knowledge of German and therefore cannot provide a semantic classification, which would differ from language to language anyway. So categorical discrimination is possible in human language without being tied to semantically determined categorical identification. This points to a wide-spread, or even universal, psychophonetic principle of pitch perception in speech. It needs to be kept separate from linguistic and other functional uses of F0 peak synchronization, which vary from language to language to differentiate word tones, sentence modalities and attitudinal/expressive patterns in communication.

The shift and semantic contextualization paradigm was also applied to a continuum of valley contours from an early to a late synchronization with articulation. Several different experimental designs were used, but CSP was not confirmed by any of them.
In the latest experimental series (Niebuhr & Kohler, 2004), peak and valley patterns were constructed in such a way that they were exact mirror images in semitone steps up and down with reference to an initial F0 of 8Hz. The peak shift shows the usual CSP in identification and discrimination; the valley shift also points to a clear categorization at the opposite ends of the shift scale, but discrimination of 2-step pairings is not significantly different from the discrimination of identical stimuli in peak and valley shifts. Figure 1 presents the results.

Figure 1. Identification functions of the peak and valley stimuli (left; curves: peak, valley; y-axis: % matching) and discrimination functions of their (un)equal pairings (right; curves: peak_unequal, valley_unequal, peak_equal, valley_equal; y-axis: % different; the number on the x-axis refers to the serial rank of the first stimulus in the pair). 18 subjects, 5 repetitions.
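
The CSP expectation that the valley data fail to meet can be made concrete with a small sketch. It assumes, purely for illustration, that listeners judge a pair "different" only when the two category labels they assign independently happen to differ; the logistic identification function, its boundary and slope, and the function names are hypothetical and are not taken from the experiments reported here.

```python
import math


def p_category_a(step, boundary, slope):
    """Hypothetical logistic identification function: P(label A) at a continuum step."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))


def predicted_different(p_first, p_second):
    """P('different') if only two independently assigned category labels are compared."""
    return p_first * (1.0 - p_second) + (1.0 - p_first) * p_second


# Hypothetical nine-step continuum with a sharp category boundary at step 5.
steps = range(1, 10)
ident = {s: p_category_a(s, boundary=5.0, slope=2.0) for s in steps}

# Predicted 'different' rate for every 2-step pairing (s, s + 2).
two_step = {s: predicted_different(ident[s], ident[s + 2]) for s in range(1, 8)}

# Serial rank of the first stimulus in the best-discriminated pair.
peak_at = max(two_step, key=two_step.get)

for s in sorted(two_step):
    print(f"pair {s}-{s + 2}: predicted 'different' = {two_step[s]:.2f}")
print("discrimination maximum at pair starting with step", peak_at)
```

Under this label-only assumption the predicted discrimination function peaks for the pair that straddles the identification boundary and stays near zero inside the categories. A sharp identification function without such a peak, as in the valley shift, therefore indicates that identification and discrimination are not driven by the same mechanism.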

So in the case of the valley shift, the psychophonetic principle does not apply, although the functional categorization is clearly there. This means that the psychophonetic and the functional principle can be independent, with either discrimination or functional identification being categorical, but they may also be linked, as in the classic Haskins paradigm and in the peak shift data. The Haskins group generalised one particular constellation, with far-reaching consequences for the theory of speech perception.

The Categorization of a Phrase-final Falling-to-Rising Continuum

The absence of categorical discrimination in spite of functional category formation is also shown by results from discrimination and identification experiments with a phrase-final falling-to-rising continuum, carried out by the author with students in a course on prosody at IPDS. The naturally produced sentence "Alle Jungen spielen Fußball." ("All boys play football.") was used as the basis for generation. It contained two accents, realised as peak contours on "alle" and "Fußball" with an F0 dip between them, a maximum of the second peak (on the vowel [u:] of "Fuß") of 110Hz and a phrase-final value of 70Hz. The F0 curve was stylised by 5 significant points (start, first peak maximum, minimum between peaks, second peak maximum, end) with linear interpolation between them. For generation, the first 4 points were kept constant across the series; the last one was changed in steps of one semitone, starting from 70Hz and going up to 264Hz, which resulted in 24 stimuli forming a continuum from falling via level to rising pitch on the last word (see Table 1). The generation was done in Praat.

Table 1. Phrase-final F0 (Hz) in the 24 stimuli resynthesized from the natural utterance "Alle Jungen spielen Fußball."
Stimulus  F0 (Hz)   Stimulus  F0 (Hz)   Stimulus  F0 (Hz)   Stimulus  F0 (Hz)
Sti1      70        Sti7      99        Sti13     140       Sti19     198
Sti2      74        Sti8      105       Sti14     148       Sti20     209
Sti3      78        Sti9      111       Sti15     157       Sti21     222
Sti4      83        Sti10     117       Sti16     166       Sti22     235
Sti5      88        Sti11     124       Sti17     176       Sti23     249
Sti6      93        Sti12     132       Sti18     186       Sti24     264

With reference to the peak maximum of 110Hz, stimuli 1-8 represent a series of decreasing falls, stimulus 9 level pitch, and stimuli 10-24 a series of increasing rises.

For the identification test, the stimuli were repeated 10 times and randomized. Subjects were asked to classify each of the 240 test stimuli as either "final statement", "non-final statement" or "question" by pressing one of three response buttons in a computerized reaction-measuring set-up. For the discrimination test, stimuli were paired with a step size of 2 in ascending order, and in addition every third stimulus, starting with Sti2, was paired with itself. This gave 30 paired test stimuli, which were repeated 5 times and randomized. Using the same equipment, subjects recorded their perception of each of the 150 test stimuli as either "same" or "different". Eight phonetically naive subjects took part in the identification test, 7 in the discrimination test. Figure 2 shows the results of the two tests.
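
The stimulus design described above can be sketched in a few lines. This is an illustration only, not the Praat procedure actually used: the function names are hypothetical, and the 70 Hz starting value is an assumption inferred from the legible entries of Table 1 (74, 78, 83, ... 264 Hz), which are one semitone apart.

```python
def semitone_continuum(start_hz, n_stimuli):
    """Return n_stimuli frequencies rising from start_hz in one-semitone steps."""
    return [start_hz * 2 ** (k / 12) for k in range(n_stimuli)]


def discrimination_pairs(n_stimuli, step=2, same_start=2, same_every=3):
    """Build (first, second) stimulus-number pairs for a same/different test.

    Unequal pairs combine stimulus i with stimulus i + step in ascending order;
    equal pairs repeat every same_every-th stimulus, starting at same_start.
    """
    unequal = [(i, i + step) for i in range(1, n_stimuli - step + 1)]
    equal = [(i, i) for i in range(same_start, n_stimuli + 1, same_every)]
    return unequal, equal


# Assumed design parameters: 24 stimuli starting at 70 Hz, 5 repetitions per pair.
continuum = semitone_continuum(70.0, 24)
unequal, equal = discrimination_pairs(24)
n_pairs = len(unequal) + len(equal)
n_trials = n_pairs * 5

for number, f0 in enumerate(continuum, start=1):
    print(f"Sti{number}: {f0:.1f} Hz")
print(len(unequal), "unequal pairs,", len(equal), "equal pairs,", n_trials, "trials")
```

Under these assumptions the sketch gives 22 unequal and 8 equal pairs, i.e. 30 pairings and 150 trials at 5 repetitions, and a final value within 1 Hz of the 264 Hz listed for Sti24. The computed frequencies are not rounded the same way as the table entries, so small deviations of less than 1 Hz are expected.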

Figure 2. Identification functions for the stimuli of the phrase-final falling-to-rising series (left; curves: final statement, non-final statement, question; y-axis: % identified as; 8 subjects, 10 repetitions) and the discrimination function of their (un)equal pairings (right; curves: unequal, equal; y-axis: % different; the number on the x-axis refers to the serial rank of the first stimulus in the pair; 7 subjects, 5 repetitions).

The continuum is clearly partitioned into a statement and a question section, with a further subdivision of the former into final and non-final (continuation), which is less sharply marked but obviously a perceived category. The first 3 stimuli of the series had enough of a phrase-final pitch fall from the accent peak to trigger an overwhelming finality judgement; thereafter the assessment as continuation increases steadily up to stimulus 15, with question being quite insignificant up to this point but then dominating the response pattern. The middle range of the responses subsumes less clearly falling, level and moderately rising pitch stimuli. For a clear judgement of question, a rise of at least 8 semitones seems to be necessary.

This clear categorization of the continuum via function in the identification task is not matched by discrimination maxima at the category boundaries. Non-identical stimuli are more often judged different than identical ones throughout the whole series, with the exception of the first non-identical pair, but the discrimination function oscillates around the % mark from sti7-9 onwards. From this point on, pairings are first of all around level pitch (slightly falling + level, very slightly falling + very slightly rising, level + slightly rising) and then continue to be both rising from the anchor point of 110Hz, with a constant 2-semitone difference within the pair. The response pattern is different, showing a peak, when both stimuli are falling, i.e. sti3-5 to sti6-8.
The different extents of the fall are no doubt processed by the listener as sounding more and less terminal, respectively, and for this reason they may be well discriminated. The pairings round level pitch all sound non-final, signaling continuation, and the 2-step pairings of rises cannot span the functional categories of continuation and question, and consequently all sound equally different. So again the psychophonetic principle does not seem to be operative in this series in spite of clear functional categorization.

EXPLAINING THE DATA

The difference in F0-peak and F0-valley categorization may be explained with reference to the specific link of syntagmatic pitch and articulation contrasts across the landmark of consonant-vowel transition. Both early and medial peak contours are defined by a terminal F0 fall, but the former is characterised by a high-low, the latter by a low-high, F0 trajectory across the articulatory landmark, where, in addition, an increase in acoustic intensity heightens the pitch contrast. This link of a reversal of a syntagmatic pitch contrast with the acoustic output of a syntagmatic articulatory contrast would thus determine discriminatory distinctivity. In the valley shift, on the other hand, the decisive pitch difference between early and late final rises is confined to the vowel and thus lacks a tight link with a syntagmatic articulatory contrast.

So it can be concluded that a discrimination peak is not an inherent feature in the perceptual categorization of a physical continuum but constitutes a separate psychophonetic principle, based on two features:
(1) Syntagmatic contrasts in addition to paradigmatic oppositions of pitch and vocal tract shapes lead to auditory enhancement (Diehl, 1991).
(2) Prosodic patterns are perceived in relation to the acoustic patterns of vocal tract sequencing.
These two features define a Speech Code with a different perspective from the one developed at Haskins: it transcends the segmental-phonemic orientation and the very specific CSP paradigm.

The psychophonetic principle makes it possible for listeners of diverse languages to perceive categorical changes in F0 peak contour synchronization, even without a knowledge of the respective language. For example, a Chinese listener partitions the early to late peak sequence in the German sentence "Sie hat ja gelogen." ("She's been lying.") at the same places as German listeners by referring to changes from tone 3 to tone 4 and finally to tone 2+4 (Kohler 1991, p. 156), without understanding the meaning of the sentences.
This psychophonetic principle in pitch perception may be expected to be put to wide-spread use in the languages of the world for a spectrum of functions: word tone, tonal accent (e.g. Swedish), pragmatics in German, English and other languages. Moreover, as the early peak focuses on low pitch in the high-low transition into the accented vowel, whereas the medial peak focuses on high pitch across the articulatory landmark, the association of this low vs high pitch with finality vs openness in German may be seen as another aspect of the Frequency Code (Ohala, 1984), related to dominance vs subordination, and thus to a very general principle of human behaviour.

In all cases where pitch patterns are not defined as syntagmatic pitch contrasts in relation to syntagmatic articulatory transitions but as pitch characteristics of syllable nuclei or phrasal positions, the psychophonetic principle does not seem to operate in pitch perception. Hence the negative results of discrimination tasks related to valley alignment and phrase-final rising pitch, in spite of functional categorization established in identification tasks. To this list may be added the discrimination of peak height for emphasis (Ladd & Morton, 1997).

Since the psychophonetic principle is conceived of as being based on syntagmatic contrast, sound perception, too, would only be expected to show discrimination peaks if the definition of the sound category relies on essential syntagmatic features. This would explain the differences found in the categorization of consonants and vowels. The relevance of syntagmatic contrast is most obvious in place and voiced/voiceless oppositions of plosives, which formed the basis for the classic Haskins CSP and for the Motor Theory of Speech Perception.
However, the fixation on the segmental phoneme made it impossible to give the syntagmatic domain a central role in speech perception theory, although the coining of the term "encoding" and the reinvention of coarticulation in intensive studies a quarter of a century after it was first proposed by Menzerath and de Lacerda (1933) were attempts to blur the segmental boundaries post hoc. The time has now come to take a more radical approach to the limitations of the segment and the phoneme and to give perception research a new direction for a better understanding of speech communication (Hawkins & Smith 2001).

ACKNOWLEDGEMENTS

The author would like to thank the students who participated in his prosody class at IPDS in the summer semester of 2001, in particular Oliver Niebuhr, who supervised the data collection and report compilation of falling-to-rising pitch categorization. A special vote of thanks goes to research assistant Benno Peters, who had the idea for this experiment in the first place and who generated the stimuli.

REFERENCES

Diehl, R. (1991) The role of phonetics within the study of language, Phonetica, 48, 1-133.
Hawkins, S. & Smith, R. (2001) Polysp: a polysystemic, phonetically rich approach to speech understanding, Italian Journal of Linguistics, 13, 99-188.
Kohler, K. J. (1987) Categorical pitch perception, Proc. XIth International Congress of Phonetic Sciences, Tallinn, 5, 331-333.
Kohler, K. J. (1991) Terminal intonation patterns in single-accent utterances of German: phonetics, phonology and semantics, Arbeitsberichte des Instituts für Phonetik und digitale Sprachverarbeitung der Universität Kiel (AIPUK), 25, 115-185.
Ladd, D. R. & Morton, R. (1997) The perception of intonational emphasis: continuous or categorical?, Journal of Phonetics, 25, 313-342.
Lane, H. (1965) The motor theory of speech perception. A critical review, Psychological Review, 72, 275-309.
Liberman, A. M., Harris, K. S., Hoffman, H. S., Griffith, B. C. (1957) The discrimination of speech sounds within and across phoneme boundaries, Journal of Experimental Psychology, 54, 358-368.
Liberman, A. M., Cooper, F. S., Harris, K. S., MacNeilage, P. F. (1962) A motor theory of speech perception, Proceedings of the Speech Communication Seminar, Stockholm 1962.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., Studdert-Kennedy, M. (1967) Perception of the speech code, Psychological Review, 74, 431-461.
Menzerath, P. & de Lacerda, A. (1933) Koartikulation, Steuerung und Lautabgrenzung. Berlin, Bonn: Ferd. Dümmlers Verlag.
Niebuhr, O. & Kohler, K. J. (2004) Perception and cognitive processing of tonal alignment in German, Proceedings of the International Symposium on Tonal Aspects of Languages (TAL 2004), Beijing.
Ohala, J. J. (1984) An ethological perspective on common cross-language utilization of F0 of voice, Phonetica, 41, 1-16.