

L105/205 Phonetics, Scarborough
Handout 15, Nov. 17, 2005
reading: Borden, et al., Ch. 6 (today); Keating (1990): The window model of coarticulation (Tues.)

Theories of Speech Perception

1. Theories of speech perception must be able to account for certain facts about the acoustic speech signal, e.g.:
- There is inter-speaker and intra-speaker variability among signals that convey information about equivalent phonetic events.
- The acoustic speech signal is continuous, even though it is perceived as (and represents) a series of discrete units.
- Speech signals contain cues that are transmitted very quickly (20 to 25 sounds per second) and simultaneously.

They must also be able to account for various perceptual phenomena, e.g.:
- categorical perception
- phonemic restoration
- episodic memory
- plus various word recognition effects (e.g., frequency effects, priming, etc.)

2. Theories of speech perception differ with respect to their views of what is perceived and how:
- Auditory: listeners identify acoustic patterns or features by matching them to stored acoustic representations
  vs. Motor: listeners extract information about articulations from the acoustic signal
- Bottom-up: perception is built up from information in the physical signal
  vs. Top-down: listeners use higher-level sources of information to supplement the acoustic signal
- Active: cognitive/intellectual work is involved in perception
  vs. Passive: perception relies on passive responses (e.g., thresholds)

Auditory theories

3. Auditory Model (Fant, 1960; also Stevens & Blumstein, 1978)
The assumption of this model is that invariance can always be found in the speech signal by means of extraction into distinctive features. Listeners, through experience with language, are sensitive to the distinctive patterns of the speech wave.
- We have feature detectors (that may be more or less specialized).

o template matching: When we listen to speech, we match the incoming auditory patterns to stored templates (of phonemes or syllables) to identify the sounds. Templates may be more abstract than the patterns or features found in spectrograms (especially to represent place of articulation).
o After being decoded, the perceptual units have to be recombined to access lexical items.

Auditory Enhancement Theory (Diehl & Kluender, 1989)
- Various acoustic properties may work together to increase the auditory salience of phonological contrasts.
- Contrasts between sounds are robust because phonological systems have evolved to enhance the perceptual distinctiveness of the contrasts.

Motor theories

4. Motor Theory (Liberman, et al., 1967; Liberman & Mattingly, 1985)
Given the lack of acoustic invariance, we can look for invariance in the articulatory domain (i.e., maybe the representational units are defined in articulatory terms).
- Motor theory postulates that speech is perceived by reference to how it is produced; that is, when perceiving speech, listeners access their own knowledge of how phonemes are articulated.
- Articulatory gestures, such as rounding or pressing the lips together, are the units of perception that directly provide the listener with phonetic information.
- Biological specialization for phonetic gestures prevents listeners from hearing the signal as ordinary sound, but enables them to use the systematic, special relation between signal and sound to perceive the gestures.
- Originally, the motor commands that control articulation were considered to be the invariant phonetic features. The revised theory says that it is intended gestures that are the invariant objects of perception.

(from Fougeron web tutorial)
- We perceive sounds discretely (categorically) because sounds are produced with discrete articulators/gestures.
- The McGurk effect suggests that we represent at least some features as articulatory.

5. Analysis by Synthesis (Stevens & Halle, 1960)
In this model, speech perception is based on auditory matching mediated through speech production.
- When a listener hears a speech signal, he or she analyzes it by mentally modeling the articulation (in other words, the listener tries to synthesize the speech himself or herself).
- If the auditory result of the mental synthesis matches the incoming acoustic signal, the hypothesized perception is interpreted as correct.

6. Direct Realist Theory (Fowler, 1986)
Direct realism postulates that speech perception is direct (i.e., happens through the perception of articulatory gestures), but it is not special.
- All perception involves direct recovery of the distal source of the event being perceived (Gibson). In vision, you perceive objects (e.g., trees, cars, etc.). Likewise with smell: you perceive, e.g., cookies, roses, etc. Why not in the auditory perception of speech? So, listeners perceive tongues and lips.
- The articulatory gestures that are the objects of speech perception are not intended gestures (as in Motor Theory). Rather, they are the actual gestures.

Word recognition

7. TRACE (McClelland & Elman, 1986)
TRACE is a connectionist network model of speech perception / lexical perception.
- Different levels of speech units (e.g., features, phonemes, words) are represented on different levels of the network.

o Influences across levels are excitatory; i.e., activated features lead to the activation of the related phonemes, and activated phonemes activate units on the word level.
o Influences within a level (between units that are inconsistent with each other) are inhibitory; i.e., the activation of one phoneme-level unit inhibits the activation of other, competing phonemes.

8. Cohort Theory (Marslen-Wilson, 1980)
Cohort theory models spoken word recognition.
- Based on the beginning of an input word, all words in memory with the same word-initial acoustic information (the cohort) are activated.
- As the signal unfolds in time, members of the cohort that are no longer consistent with the input drop out of the cohort.

    input: cap-  (e.g., of captivate)
        cohort: cap, captain, capsize, captive, caption, capital, captivate, etc.
    input: capt- (of captivate)
        cohort: captain, captive, caption, captivate, etc.
        (cap, capsize, and capital are no longer consistent and drop out)

- Cohort elimination continues until a single word remains (i.e., is identified). The point (left to right) at which a word diverges from all other members of the cohort is called the uniqueness point.

9. Neighborhood Activation Model (Luce, 1986; Luce & Pisoni, 1998)
The Neighborhood Activation Model (NAM) models spoken word recognition as the identification of a target from among a set of activated candidates (competitors).
- All words phonologically similar to a given word are in that word's neighborhood.
- Recognition of a word is based on the probability that the stimulus word was presented compared to the probability that other words in the neighborhood were in fact presented. Probability is also influenced by lexical frequency.
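The neighborhood choice rule just described can be sketched in a few lines of Python. This is only an illustration, not Luce & Pisoni's actual formulation: the toy lexicon and frequency counts are invented, letters stand in for phonemes, and "neighbor" is operationalized as a one-symbol substitution, deletion, or addition.

```python
import string

def neighbors(word, lexicon):
    """Words differing from `word` by one substitution, deletion, or
    addition (a common operationalization of a phonological neighborhood,
    here over letters rather than phonemes)."""
    def edit1(w):
        splits = [(w[:i], w[i:]) for i in range(len(w) + 1)]
        subs = {a + c + b[1:] for a, b in splits if b for c in string.ascii_lowercase}
        dels = {a + b[1:] for a, b in splits if b}
        adds = {a + c + b for a, b in splits for c in string.ascii_lowercase}
        return subs | dels | adds
    return {w for w in lexicon if w != word and w in edit1(word)}

def recognition_probability(word, lexicon, freq):
    """A Luce-style choice rule: the target's frequency-weighted activation
    relative to the total activation of the target plus its neighborhood."""
    competitors = neighbors(word, lexicon)
    total = freq[word] + sum(freq[w] for w in competitors)
    return freq[word] / total

# Invented toy lexicon with made-up frequency counts:
lexicon = {"cat", "bat", "hat", "cap", "cot", "cast"}
freq = {"cat": 100, "bat": 20, "hat": 30, "cap": 25, "cot": 5, "cast": 10}
p = recognition_probability("cat", lexicon, freq)
```

On this toy lexicon, "cat" has five neighbors, but because its (invented) frequency dominates the neighborhood, its recognition probability stays high, illustrating the frequency effect summarized below.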

    High relative frequency: high recognition probability
    Low relative frequency: low recognition probability

10. Exemplar Models (non-analytic approaches) (e.g., Johnson, 1997; Goldinger, 1997; Pierrehumbert, 2002)
- In most models of speech perception, the objects of perception (the representational units) are highly abstract. In fact, information about specific instances of a particular word is abstracted away from and discarded in the process of speech perception. So information about a particular speaker, speech style, or environmental context can play no role in the representation of words in memory.
- Exemplar models postulate that information about particular instances (episodic information) is stored. Mental representations do not have to be highly abstract, and they do not necessarily lack redundancy.
- Categorization of an input is accomplished by comparison with all remembered instances of each category (rather than by comparison with an abstract, prototypical rep'n). Often, exemplars are modeled as categorizations of words, but they might also be categorizations of segments or syllables or whatever.
- Stored exemplars are activated to a greater or lesser extent according to their degree of similarity to an incoming stimulus; activation levels determine categorization.

    (diagram: an input is compared against the stored rep'ns of every remembered instance of each category)
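Exemplar-based categorization can be sketched as follows. This is a minimal illustration, not any published model: the two "acoustic" dimensions (F1/F2-like values), the exponentially decaying similarity function, the sensitivity constant, and the toy exemplar clouds are all assumptions.

```python
import math

def similarity(x, exemplar, sensitivity=0.01):
    """Exemplar models commonly use exponentially decaying similarity,
    e.g. exp(-c * distance); here distance is Euclidean over two
    made-up acoustic dimensions."""
    return math.exp(-sensitivity * math.dist(x, exemplar))

def categorize(x, memory):
    """Sum the input's similarity to every stored trace of each category;
    the category with the highest total activation wins. No abstract
    prototype is ever computed."""
    activation = {cat: sum(similarity(x, ex) for ex in traces)
                  for cat, traces in memory.items()}
    return max(activation, key=activation.get), activation

# Toy episodic memory: each category is just a cloud of remembered
# instances (invented F1/F2-like values), speaker variation and all.
memory = {
    "i": [(280, 2250), (300, 2300), (320, 2200)],
    "a": [(700, 1200), (750, 1100), (720, 1250)],
}
label, act = categorize((310, 2280), memory)  # an /i/-like input
```

Because every trace contributes, an input near a dense cloud of remembered /i/ tokens activates "i" far more than "a", even though no prototype or abstract template is stored anywhere.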

11. Generalized model of speech perception (adapted from Kent, 1997)

    (flow diagram:) speech → acoustic analysis → initial product → comparator or selector → decision
    - The comparator/selector consults a stored reference (templates, motor acts, a cohort, a similarity network, or other) and is subject to grammar constraints.

12. Machine speech recognition (adapted from Keating notes)

    (flow diagram:) speech → front end (A/D, windowing, DSP) → some set of acoustic measures for each window → VQ code-book (spectral classification) → sequence of output units → most likely sequence of words
    - Decoding uses statistical models of the windows and/or output units (HMMs; e.g., of phones, diphones), plus a lexicon and grammar constraints.
    - The statistical models are built from training data (labeled data); linguists find data to describe the possible inputs in order to build the statistical models, and these must be constrained somehow.
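The HMM decoding step in the diagram above can be sketched with a toy Viterbi search. Everything here is invented for the example: the two phone models ("AA", "S"), their start, transition, and emission probabilities, and the two VQ code-book symbols ("lo", "hi"); a real recognizer works in log probabilities over thousands of states.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Find the most likely hidden state (e.g., phone) sequence for a
    sequence of acoustic observations (e.g., VQ code-book symbols)."""
    # V[t][s] = (probability of the best path ending in state s at time t,
    #            that path itself)
    V = [{s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}]
    for obs in observations[1:]:
        V.append({})
        for s in states:
            # Extend the best predecessor path into state s.
            prob, path = max(
                (V[-2][prev][0] * trans_p[prev][s] * emit_p[s][obs],
                 V[-2][prev][1] + [s])
                for prev in states)
            V[-1][s] = (prob, path)
    return max(V[-1].values())

# Two invented phone models emitting two VQ symbols, "lo" and "hi":
states = ["AA", "S"]
start_p = {"AA": 0.6, "S": 0.4}
trans_p = {"AA": {"AA": 0.7, "S": 0.3}, "S": {"AA": 0.4, "S": 0.6}}
emit_p = {"AA": {"lo": 0.9, "hi": 0.1}, "S": {"lo": 0.2, "hi": 0.8}}

prob, phones = viterbi(["lo", "lo", "hi"], states, start_p, trans_p, emit_p)
```

The search keeps, for each state at each time step, only the single best path reaching it, which is what makes decoding tractable; the lexicon and grammar constraints of the diagram would enter as restrictions on which state sequences are allowed.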