Language and Perception. Theories of Speech Perception



Theories of Speech Perception
Theories specify the objects of perception and the mapping from sound to object. They must provide for robustness and graceful degradation; a key element of graceful degradation is the principle of least commitment. Theories must also be specific enough to be falsified (perhaps by being implemented as a model of perception).

Speech Oddities
- Perceptual constancy, but lack of invariants
- Categorical perception
- Segmentation
- Audio-visual integration
- Duplex perception
- Rate of speech sounds

Where Is the Invariant?
Three types of theories:
1. In the signal, but we haven't been looking in the right place (e.g., Stevens & Blumstein)
2. In the production of the signal: Motor Theory (Liberman, Mattingly, et al.)
3. In the mind of the perceiver: TRACE (McClelland & Elman)

Categories of Theories
- Active vs. Passive
- Bottom-up vs. Top-Down
- Autonomous vs. Interactive

Active vs. Passive Theories
Active theories: the process of speech perception involves some aspect of speech production, and the listener plays an active part in the process. Speech sounds are sensed, analyzed for their phonetic properties by reference to how such sounds are produced, and thereby recognized.
Passive theories: the process of speech perception is primarily sensory, and the listener is relatively passive. The listener applies a filtering mechanism; knowledge of speech production and vocal tract characteristics plays only a minor role, and only in difficult listening situations.
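The contrast can be sketched as a toy in Python. This is an illustration only, assuming a single hypothetical acoustic cue (voice onset time, VOT, in milliseconds), a hypothetical 25 ms boundary, and a made-up "production model"; none of these values or names come from the slides. The passive route filters the cue directly; the active route recognizes the sound by asking which articulatory setting would have produced it.

```python
# Toy contrast between a passive and an active route to a /b/-/p/ decision.
# The cue, the boundary, and the "production model" are illustrative assumptions.

def passive_classify(vot_ms, boundary_ms=25.0):
    """Passive route: a fixed acoustic filter applied to the cue itself."""
    return "p" if vot_ms >= boundary_ms else "b"

# Hypothetical articulatory settings: the delay (ms) between closure release
# and the start of voicing that each category's gesture aims for.
GESTURE_TARGETS = {"b": 5.0, "p": 60.0}

def synthesize_vot(voicing_delay_ms):
    """Hypothetical production model: in this toy, VOT equals the planned delay."""
    return voicing_delay_ms

def active_classify(vot_ms, targets=GESTURE_TARGETS):
    """Active route: choose the gesture whose synthesized output best matches the input."""
    return min(targets, key=lambda g: abs(synthesize_vot(targets[g]) - vot_ms))
```

For clear tokens the two routes agree (both return "b" for 10 ms and "p" for 50 ms). Near the boundary they can diverge: at 25 ms `passive_classify` returns "p", while `active_classify` returns "b", because 25 ms is closer to the /b/ target (5 ms) than to the /p/ target (60 ms).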

Bottom-up vs. Top-Down Theories
Bottom-up: all the information necessary for the recognition of sounds is contained within the acoustic signal. The first stages convert the incoming auditory information into a neural signal. Some sort of neural spectrogram reveals the time-varying formant frequencies of speech, and from this neural code the perceptual system has to derive the critical phonetic features. The listener doesn't need to involve linguistic and cognitive processes in decoding sounds.
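A minimal sketch of the bottom-up route, assuming the "neural spectrogram" has already been reduced to a track of first and second formant values (F1, F2) per frame. The function names and the Hz thresholds are illustrative assumptions, not published category boundaries; the only grounded relations are that a higher F1 goes with a more open (lower) vowel and a higher F2 with a more front vowel.

```python
# Bottom-up sketch: phonetic features computed from the signal alone.
# Threshold values are illustrative assumptions, not published boundaries.

def vowel_features(f1_hz, f2_hz):
    """Map one frame's first two formants to coarse height/backness features.

    A higher F1 corresponds to a lower (more open) vowel; a higher F2
    corresponds to a more front vowel.
    """
    if f1_hz < 400:
        height = "high"
    elif f1_hz > 600:
        height = "low"
    else:
        height = "mid"
    backness = "front" if f2_hz > 1600 else "back"
    return {"height": height, "backness": backness}

def feature_track(frames):
    """Apply the mapping frame by frame to a time-varying formant track."""
    return [vowel_features(f1, f2) for f1, f2 in frames]
```

Nothing in these functions consults a lexicon or the surrounding context, which is what makes the route bottom-up. For example, `feature_track([(270, 2290), (730, 1090)])` yields a high front frame followed by a low back frame.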

Bottom-up vs. Top-Down Theories
Top-down: higher-level linguistic and cognitive operations play a crucial role in the identification and analysis of sounds. The listener makes use of stored knowledge that constrains the number of plausible alternative messages.
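How stored knowledge "constrains the number of plausible alternative messages" can be sketched as a filter: the acoustics leave several options open at each position, and the lexicon prunes the combinations. The toy lexicon, the use of letters for phonemes, and the function name `plausible_words` are all hypothetical.

```python
from itertools import product

# Hypothetical toy lexicon; letters stand in for phonemes.
LEXICON = {"bat", "pat", "bad", "pad", "cat"}

def plausible_words(segment_options, lexicon=LEXICON):
    """Enumerate every sequence the acoustic evidence allows, then keep only
    those that stored lexical knowledge licenses."""
    candidates = ("".join(seq) for seq in product(*segment_options))
    return sorted(w for w in candidates if w in lexicon)
```

With `plausible_words([["b", "p", "d"], ["a"], ["t", "k"]])` the acoustics alone allow six sequences (bat, bak, pat, pak, dat, dak), and the lexicon narrows them to `["bat", "pat"]`.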

Phonemic Restoration
If a sound in a known word is removed and replaced by a noise (a cough or a buzz), listeners report hearing the speech sound anyway (Warren, 1970). Supposedly, they cannot tell exactly where in the utterance the noise occurred. Consider:
It was found that the *eel was on the shoe.
It was found that the *eel was on the table.
It was found that the *eel was on the orange.
It was found that the *eel was on the axle.
Listeners typically report hearing heel, meal, peel, and wheel, respectively, even though the disambiguating word comes after the noise.
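The top-down character of restoration can be sketched as choosing among candidates for "*eel" by their fit to a later context word. Assumed for illustration: the candidates heel, meal, peel, and wheel, and the made-up association strengths below. The sketch also treats the whole sentence as available at once, glossing over how the decision is deferred until the context word arrives.

```python
# Hypothetical candidates compatible with the fragment "*eel", each with
# made-up association strengths to context words.
ASSOCIATIONS = {
    "heel": {"shoe": 0.9},
    "meal": {"table": 0.8},
    "peel": {"orange": 0.9},
    "wheel": {"axle": 0.9, "table": 0.1},
}

def restore(context_word, associations=ASSOCIATIONS):
    """Pick the candidate best supported by a context word heard later.

    Returns None when no candidate is associated with the context word.
    """
    best = max(associations, key=lambda w: associations[w].get(context_word, 0.0))
    return best if associations[best].get(context_word, 0.0) > 0 else None
```

Here `restore("axle")` returns "wheel" and `restore("table")` returns "meal" (0.8 outweighs wheel's 0.1), while an unrelated word such as "car" returns None.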

Autonomous vs. Interactive Theories
Autonomous: the signal is processed serially, from the phonetic stage to the lexical stage, then to syntactic stages, and so on. The listener's perceptual decisions can be made in a closed, autonomous system that contains all the operations necessary for such decisions, with no need for other sources of information (e.g., information provided by context). The output of one stage of processing provides the input to the next.
Interactive: information and knowledge from many sources are available to the listener and are involved at any or all stages of processing as the signal makes its way through the speech perception system.
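The two architectures can be contrasted on a single ambiguous segment. In this hypothetical sketch (letters stand in for phonemes; the evidence values, the one-word lexicon, and the `feedback` weight are all assumptions), the autonomous pipeline lets the phoneme stage commit before the lexical stage sees anything, while the interactive version lets lexical knowledge bias the phoneme decision itself.

```python
# Hypothetical acoustic evidence for an ambiguous first segment of "?ift";
# letters stand in for phonemes, and all values are illustrative.
ACOUSTIC = {"g": 0.48, "k": 0.52}
LEXICON = {"gift"}

def autonomous(acoustic, rest="ift", lexicon=LEXICON):
    """Serial pipeline: the phoneme stage commits on acoustics alone, and the
    lexical stage can only accept or reject what it is handed."""
    phoneme = max(acoustic, key=acoustic.get)
    word = phoneme + rest
    return phoneme, (word if word in lexicon else None)

def interactive(acoustic, rest="ift", lexicon=LEXICON, feedback=0.2):
    """Lexical knowledge feeds back and boosts phonemes that complete a known
    word before the phoneme decision is made."""
    support = {
        p: a + (feedback if p + rest in lexicon else 0.0)
        for p, a in acoustic.items()
    }
    phoneme = max(support, key=support.get)
    word = phoneme + rest
    return phoneme, (word if word in lexicon else None)
```

With these numbers `autonomous(ACOUSTIC)` commits to "k" and then fails lexical lookup, returning `("k", None)`, whereas `interactive(ACOUSTIC)` returns `("g", "gift")`. When the acoustics are unambiguous enough, feedback cannot override them.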

Stevens & Blumstein: Acoustic Landmarks
1) Landmark detection: points of maximal and minimal change.
2) Measure acoustic correlates in the vicinity of the landmarks.
3) Estimate distinctive features and syllable structure.
4) Match to the lexicon: use lexical information to synthesize a set of landmarks and cues, and compare them to the results of step 2.
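Steps 1 and 4 above can be sketched very loosely in Python. The simplifications are assumptions, not part of the theory as stated: a single frame-energy contour stands in for the spectral analysis, only abrupt changes ("rise"/"fall") are detected, steps 2 and 3 are skipped, and the threshold and the candidate landmark patterns are hypothetical.

```python
def detect_landmarks(energy, threshold):
    """Step 1 sketch: mark frames where the energy contour changes most abruptly.

    A landmark is placed at frame i when |energy[i] - energy[i-1]| exceeds
    `threshold` and is a local peak of the absolute change. The strict
    comparison on the left marks only the first frame of a steady ramp.
    """
    diffs = [energy[i] - energy[i - 1] for i in range(1, len(energy))]
    marks = []
    for j, d in enumerate(diffs):
        mag = abs(d)
        left = abs(diffs[j - 1]) if j > 0 else 0.0
        right = abs(diffs[j + 1]) if j + 1 < len(diffs) else 0.0
        if mag > threshold and mag > left and mag >= right:
            marks.append((j + 1, "rise" if d > 0 else "fall"))
    return marks

def best_match(observed, expected_by_word):
    """Step 4 sketch: compare detected landmark types with the sequence each
    candidate predicts; return the candidate with the fewest mismatches."""
    kinds = [kind for _, kind in observed]

    def cost(pattern):
        mismatches = sum(a != b for a, b in zip(kinds, pattern))
        return mismatches + abs(len(kinds) - len(pattern))

    return min(expected_by_word, key=lambda w: cost(expected_by_word[w]))
```

For an energy contour `[0, 0, 10, 10, 10, 0, 0]` with threshold 5, `detect_landmarks` returns `[(2, "rise"), (5, "fall")]`; a hypothetical candidate predicting `["rise", "fall"]` then beats one predicting only `["rise"]`.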

Landmarks
The landmarks and cues are derived from considerations of the articulators; that is, the representation consists of distinctive features that are useful in speech production. The analysis of the signal is based on a process of segmentation and landmark identification, and again, the landmarks are motivated by articulatory considerations. Only one underlying representation is present for each lexical item.

Landmark Theory - Critique
- The mapping from acoustic correlates to features is not yet sufficiently specified, which makes testing difficult.
- There is no psychological evidence for landmarks.
- If an iterative component is present, see the earlier critique of analysis-by-synthesis.
- Does prosodic information influence early processing?

Landmark Theory - Classification
- Active
- Bottom-up
- Autonomous

TRACE
Elman and McClelland proposed TRACE as a multi-stage model consisting of an auditory (ear) front end, auditory feature extraction, a phonetic level, and a lexical level. TRACE is implemented in a connectionist architecture and has both ascending and descending (feedback) connections as well as connections within each level. TRACE is both a theory and a model of perception.

Connectionist Models
Also known as PDP (parallel distributed processing) models or neural networks: a class of neurally inspired information-processing models that attempt to model information processing the way it actually takes place in the brain. Neural connections appear to be distributed in parallel arrays in addition to serial pathways, and different types of mental processing are considered to be distributed throughout a highly complex neural network. Information processing takes place through the interactions of large numbers of simple processing elements called units, each sending excitatory and inhibitory signals to other units.
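A single unit of the kind described above can be sketched as a weighted sum of its inputs passed through a squashing function. The logistic function is assumed here as one common textbook choice (TRACE itself uses a different activation update); the names `unit_output` and `layer_output` are illustrative.

```python
import math

def unit_output(inputs, weights, bias=0.0):
    """One processing unit: a weighted sum of incoming signals, squashed into (0, 1).

    Positive weights act as excitatory connections, negative ones as inhibitory.
    The logistic squashing function is an assumption, one common choice.
    """
    net = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-net))

def layer_output(inputs, weight_rows, biases):
    """Many units working in parallel on the same input vector."""
    return [unit_output(inputs, row, b) for row, b in zip(weight_rows, biases)]
```

With equal excitatory and inhibitory input the net input is zero and the unit sits at 0.5; shifting the balance toward the excitatory connection pushes the output above 0.5, and toward the inhibitory one below it.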


TRACE
TRACE has multiple levels of representation as well as feed-forward and feedback connections between processing units (nodes). Nodes are arranged on three levels that together form a network: phonetic feature, phoneme, and word. Activation on one level increases the activity of all connected nodes on adjacent levels (bottom-up or top-down). Within every level, nodes are connected by inhibitory links, forcing rapid resolution of any ambiguity in the signal (i.e., suppressing competing nodes).
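The interplay of between-level excitation, feedback, and within-level inhibition can be sketched as a small simulation in the spirit of TRACE. The update follows the general interactive-activation scheme (excitation scaled by distance to a ceiling, inhibition by distance to a floor, decay toward rest), but the three-word lexicon, the parameter values, the omission of the feature level, and the lack of TRACE's time-aligned copies of each node are simplifying assumptions; this is not the published model.

```python
# Hypothetical three-word lexicon; letters stand in for phonemes.
WORDS = ["cat", "cap", "cut"]

def update(a, net, decay, a_max=1.0, a_min=-0.2, rest=0.0):
    """Interactive-activation style update: excitation pushes toward a_max,
    inhibition toward a_min, and decay pulls back toward rest."""
    if net > 0:
        delta = net * (a_max - a)
    else:
        delta = net * (a - a_min)
    return min(a_max, max(a_min, a + delta - decay * (a - rest)))

def simulate(evidence, steps=30, excite=0.1, inhibit=0.2, feedback=0.05, decay=0.1):
    """evidence maps (position, phoneme) -> external input from the signal."""
    phon = {node: 0.0 for node in evidence}
    word = {w: 0.0 for w in WORDS}
    for _ in range(steps):
        w_net = {}
        for w in WORDS:
            # Bottom-up: phoneme nodes in the matching positions excite the word.
            up = sum(max(phon.get((i, ch), 0.0), 0.0) for i, ch in enumerate(w))
            # Lateral inhibition from every other active word node.
            rivals = sum(max(word[v], 0.0) for v in WORDS if v != w)
            w_net[w] = excite * up - inhibit * rivals
        p_net = {}
        for pos, ch in phon:
            # Top-down: words containing this phoneme in this position feed back.
            down = sum(max(word[w], 0.0) for w in WORDS if pos < len(w) and w[pos] == ch)
            # Lateral inhibition from competing phonemes in the same position.
            rivals = sum(max(phon[o], 0.0) for o in phon if o != (pos, ch) and o[0] == pos)
            p_net[(pos, ch)] = evidence[(pos, ch)] + feedback * down - inhibit * rivals
        # Synchronous update: all nets above were computed from the old state.
        phon = {n: update(phon[n], p_net[n], decay) for n in phon}
        word = {w: update(word[w], w_net[w], decay) for w in WORDS}
    return phon, word
```

Given evidence `{(0, "c"): 0.3, (1, "a"): 0.3, (2, "t"): 0.2, (2, "p"): 0.15}`, the word node for "cat" ends ahead of both "cap" and "cut", and feedback from the word level adds further support to the final "t" node over "p".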

TRACE - Key Elements
- Invariant cues are not required.
- Perception results from a cascade of stages involving one-to-many and many-to-one mappings (it behaves like a prototype system).
- Feedback and competition among nodes at the same level are used to stabilize perception.
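The "prototype system" behaviour can be illustrated with a generic prototype classifier (not TRACE's own mechanism): each input activates every stored prototype to a graded degree (one-to-many), and many different inputs settle on the same winner (many-to-one). The prototype coordinates (F1, F2 in Hz), the Gaussian similarity, and its width are assumptions chosen for illustration.

```python
import math

# Hypothetical phoneme prototypes in a two-dimensional cue space (F1, F2 in Hz).
PROTOTYPES = {"i": (270, 2290), "u": (300, 870), "a": (730, 1090)}

def activations(cue, prototypes=PROTOTYPES, width=300.0):
    """One-to-many: a single input activates every prototype, graded by distance."""
    return {
        label: math.exp(-math.dist(cue, proto) ** 2 / (2 * width ** 2))
        for label, proto in prototypes.items()
    }

def categorize(cue, prototypes=PROTOTYPES):
    """Many-to-one: many different inputs settle on the same best-matching prototype."""
    acts = activations(cue, prototypes)
    return max(acts, key=acts.get)
```

Both `(280, 2200)` and `(320, 2400)` are categorized as "i", even though every prototype receives some nonzero activation from each of them.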

TRACE - Critique
- Some aspects of the connectionist architecture are very implausible.
- It implements only a limited set of features, phonemes, and words.
- It is unclear whether it can be scaled to the full range of voices, speaking rates, phonemes, and words of spoken language (is it robust?).
- There is no separate justification for the mapping of cues to phonemes other than that it can be learned by the model (using back-propagation learning).

TRACE - Classification
- Passive
- Top-down
- Interactive
