Speech Perception. Phonemes. Source-Filter Model. Speech Articulation.

Transcription:

Speech Perception
9.35, Josh McDermott

Question: Why are we studying speech in a perception class? I mean, isn't language some high-level cognitive thing?

Answer: Speech is received by the brain as a sound signal. Perceptual processes must transform the sound signal into a form that semantic and syntactic processes can handle. This requires solving some very difficult problems.

Phonemes
Phonemes are the smallest units of sound that can make a difference in the meaning of speech, e.g. "pot" vs. "dot".
*** Phonemes are not letters ***
We can think of the problem of speech perception as that of extracting a string of phonemes from the speech signal. This is hard to do.

Speech Articulation
The vocal tract is specialized for speech.
  Creates a choking hazard.
  Breathing is affected.

Source-Filter Model
Larynx: buzzy sound source. (Show movie.)
Changeable resonators filter the sound produced: pharynx (throat), mouth, lips, nose.
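The source-filter idea can be sketched in a few lines of Python. This is only an illustration, not the lecture's own model: the "larynx" is approximated by an impulse train, each "resonator" by a simple two-pole filter, and the function names, the 8000 Hz sample rate, and the formant values (700 Hz and 1100 Hz, each 100 Hz wide) are all assumptions chosen for the example.

```python
import math


def glottal_source(f0_hz, duration_s, sample_rate):
    """Buzzy source: an impulse train with one pulse per glottal cycle."""
    n_samples = round(duration_s * sample_rate)
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole filter that boosts energy near center_hz (one 'cavity')."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0_hz, formants, duration_s=0.2, sample_rate=8000):
    """Pass the source through one resonator per (center, bandwidth) pair."""
    signal = glottal_source(f0_hz, duration_s, sample_rate)
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal


def energy_at(signal, freq_hz, sample_rate):
    """Magnitude of one DFT component, used to inspect the spectrum."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    return math.hypot(re, im)


# Hypothetical vowel: 100 Hz source, resonances near 700 Hz and 1100 Hz.
vowel = synthesize_vowel(100, [(700, 100), (1100, 100)])
```

The source has equal energy at every harmonic of 100 Hz. After filtering, harmonics near the resonances (e.g. 700 Hz) are much stronger than those far away (e.g. 2500 Hz). Changing the resonator settings changes the vowel while the source stays the same.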

Key Properties of Speech
Fundamental frequency (F0), determined by the length and thickness of the vocal cords:
  Men: 80-240 Hz
  Women: 140-500 Hz
  Children: 170-600 Hz
Harmonics
Resonators: cavities amplify certain frequencies and dampen others.
  Bigger cavities = low sounds
  Smaller cavities = high sounds
Formants (F1, F2, etc.): the strongest frequencies, which result from the size and shape of the resonating cavities.

Sound is modulated by manipulating the articulators. This changes the resonance properties (the frequencies of the formants) and changes the airflow.

Phonemes of the world
40 phonemes in English
Range: 11 in Polynesian to 141 in Khoisan ("Bushman")
Total inventory across languages: thousands
However, some are very common across languages (e.g., /m/, /n/, /t/, /d/, /k/, /g/, /s/, /z/).

Phonemes
Consonants: restricted vocal tract. They are described by:
1. place of articulation (dental vs. velar etc.)
2. manner of articulation (stop vs. nasal vs. fricative etc.)
3. voicing (voiced, unvoiced)

Examples: stops
  /b/: voiced, labial, stop
  /p/: unvoiced, labial, stop
  /d/: voiced, dental, stop
  /t/: unvoiced, dental, stop
  /g/: voiced, velar, stop
  /k/: unvoiced, velar, stop
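The three-way description of consonants (place, manner, voicing) maps naturally onto a small data structure. The sketch below encodes only the six stops listed above. The names `Consonant`, `STOPS`, `voicing_pairs`, and `differing_features` are invented for this illustration.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    place: str
    manner: str
    voiced: bool


# The six stops from the examples above, using the slide's own labels.
STOPS = {
    "b": Consonant("labial", "stop", True),
    "p": Consonant("labial", "stop", False),
    "d": Consonant("dental", "stop", True),
    "t": Consonant("dental", "stop", False),
    "g": Consonant("velar", "stop", True),
    "k": Consonant("velar", "stop", False),
}


def voicing_pairs(inventory):
    """Return (voiced, unvoiced) pairs that differ only in voicing."""
    pairs = []
    for a, fa in inventory.items():
        for b, fb in inventory.items():
            same_place_manner = (fa.place, fa.manner) == (fb.place, fb.manner)
            if fa.voiced and not fb.voiced and same_place_manner:
                pairs.append((a, b))
    return pairs


def differing_features(a, b, inventory):
    """List the feature names on which two phonemes disagree."""
    fa, fb = inventory[a], inventory[b]
    return [name for name in Consonant._fields
            if getattr(fa, name) != getattr(fb, name)]


print(voicing_pairs(STOPS))                 # [('b', 'p'), ('d', 't'), ('g', 'k')]
print(differing_features("p", "d", STOPS))  # ['place', 'voiced']
```

The second call uses the "pot" vs. "dot" pair from earlier: /p/ and /d/ share their manner (both stops) but differ in place and voicing.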

Examples: fricatives and nasals
  /z/: voiced, dental, fricative
  /s/: unvoiced, dental, fricative
  /m/: voiced, labial, nasal
  /n/: voiced, dental, nasal

Phonemes, continued
Vowels: unrestricted vocal tract.
Different vowels are distinguished by how the sound produced by the vocal cords is filtered. Resonances are altered in two ways:
1. part of tongue (front vs. back): "bet" vs. "butt"
2. position of tongue (high, middle, low): "beet" vs. "bat"
Changing the resonances alters the formants of the vowel.

Speech spectrogram
Darkness indicates the intensity of sound at a given moment at a given frequency.
Example utterance: "I can see you"
Note the different types of segments and what they look like:
  Stops vs. vowels
  Fricatives: white noise
Generally it is not clear where one word begins and another ends.
Formant transitions characterize consonants; formant positions characterize vowels.
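A spectrogram is a sequence of short-time spectra: the signal is cut into frames, and the energy at each frequency is measured within each frame. The sketch below is a deliberately naive version (a plain DFT per frame, no windowing or overlap) meant only to show the idea. The helper names, the 8000 Hz sample rate, and the test tones are assumptions for the example.

```python
import cmath
import math


def spectrogram(signal, frame_len, hop):
    """Return one list of DFT magnitudes (bins 0..frame_len/2) per frame."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):
            total = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            mags.append(abs(total))
        frames.append(mags)
    return frames


def peak_frequencies(spec, frame_len, sample_rate):
    """Frequency (Hz) of the strongest bin in each frame."""
    peaks = []
    for mags in spec:
        k = max(range(len(mags)), key=mags.__getitem__)
        peaks.append(k * sample_rate / frame_len)
    return peaks


def tone(freq_hz, n_samples, sample_rate):
    """A pure sine tone, standing in for a single steady resonance."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


# Hypothetical signal: 500 Hz for 50 ms, then 1500 Hz for 50 ms.
sample_rate = 8000
signal = tone(500, 400, sample_rate) + tone(1500, 400, sample_rate)
spec = spectrogram(signal, frame_len=80, hop=80)
peaks = peak_frequencies(spec, 80, sample_rate)
```

Here `peaks` holds 500.0 for the first five frames and 1500.0 for the last five. In a plotted spectrogram this would appear as a dark band that jumps from one frequency to another, a crude stand-in for a formant changing position.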

Question: What happens when you whisper?
Vowels can be perceived just fine, but the vocal cords do not vibrate.

So, all we have to do is build detectors for each phoneme and we're set, right?
Nope. Phonemes are produced differently depending on many factors. This results in a constancy problem much like those in vision (e.g. object recognition across viewpoints).

Coarticulation
Phonemes are produced differently depending on what comes before and after them. But they sound the same! Somehow they are recognized as the same despite producing different patterns of sound energy.

Other factors affecting sound
Prosody
  Stress: prominence within words
    perMIT as a verb
    PERmit as a noun
  Intonation: variations in pitch across a phrase
    Dad wants me to mow the lawn.
    Dad wants me to mow the lawn?
Emotional state
  Smiling
  Frowning
  Stressed

Other factors affecting sound
Different speakers sound different:
  Accents
  Gender
  Age
So the same phoneme may be realized in many, many different ways.

Solutions to speech perception
There are some invariants:
  Stops: bursts
  Fricatives: turbulence, broad-spectrum energy
  Vowels: steady-state formants, relations between formants
  Nasals: low-frequency band of energy along with absence of high-frequency noise; /m/ and /n/ differ in formant transitions

Solutions: Categorical Perception
We impose categories on physically continuous stimuli, which aids their detection.

In-class demonstration: the /ka/ - /ga/ continuum
Voicing: differences in Voice Onset Time (VOT)
Small VOT: voiced; large VOT: unvoiced

/ga/ - /ka/ in-class demonstration
1. 0 msec (/ga/)
2. 70 msec (/ka/)
3. 60 msec (/ka/)
4. 30 msec (usually /ga/)
5. 10 msec (/ga/)
6. 20 msec (/ga/)
7. 40 msec (usually /ka/)
8. 50 msec (/ka/)

(Figure: % labeled /ga/ in the /ga/-/ka/ continuum.)

Categorical perception
Discrimination is best at a category boundary.

What Good is Categorical Perception?
Helps to:
  Ignore irrelevant information
  Quickly classify transient events
Consonants versus vowels
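The labeling pattern in the /ga/-/ka/ demonstration can be mimicked with a steep labeling function over VOT. The sketch below is a toy model, not a fit to data: the logistic shape, the 35 msec boundary (picked because the demo's responses flip between 30 and 40 msec), and the slope of 0.3 per msec are all assumptions.

```python
import math


def prob_ga(vot_ms, boundary_ms=35.0, slope_per_ms=0.3):
    """Toy probability of labeling a stimulus /ga/ (voiced) given its VOT."""
    return 1.0 / (1.0 + math.exp(slope_per_ms * (vot_ms - boundary_ms)))


def label(vot_ms):
    """Most likely label under the toy model."""
    return "/ga/" if prob_ga(vot_ms) >= 0.5 else "/ka/"


def discriminability(vot_a, vot_b):
    """How differently two stimuli are labeled. If listeners mostly compare
    labels, this predicts how easily the two can be told apart."""
    return abs(prob_ga(vot_a) - prob_ga(vot_b))


for vot in (0, 10, 20, 30, 40, 50, 60, 70):
    print(vot, label(vot), round(prob_ga(vot), 3))
```

Under these assumptions, the 30 vs. 40 msec pair straddles the boundary and is labeled very differently, while 0 vs. 10 msec or 60 vs. 70 msec fall within one category and are labeled almost identically. The physical difference is 10 msec in every case. This matches the claim that discrimination is best at a category boundary.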

Solutions: Knowledge of Words
People are biased to hear phonemes that would result in a known word.
(beef/peace demo)

Solutions: Visual Input
McGurk effect: visual input (lipreading) is integrated with auditory input to determine the phoneme that is perceived. (Show demo.)
McGurk effect: vision and speech interact.

Summary: Problems in Phoneme Recognition
Problem: lack of invariance
Solutions:
  Acoustic features
  Categorical perception
  Visual input
  Context

Segmenting words is also hard
No physical boundaries between words.
Here the words are artificially separated. In real speech everything runs together, unlike written language:
  Mares eat oats and does eat oats and little lambs eat ivy. A kid'll eat ivy too. Wouldn't you?

Oronyms
  It's a doggy-dog world.
  I don't really think it's a parent.
  The girl with kaleidoscope eyes

Top-down information
  We need a decanter.
  We needed a cantor.
Context influences the lexical processor.
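The decanter/cantor example shows that a single unbroken sound stream can be split into words in more than one way. The sketch below enumerates every split that uses only known words. The sound spellings are a rough ASCII stand-in invented for this example (with "@" for the unstressed vowel), not a proper phonetic transcription, and the tiny lexicon is likewise an assumption.

```python
# Hypothetical lexicon: rough sound spelling -> written word.
LEXICON = {
    "wi": "we",
    "nid": "need",
    "nid@d": "needed",
    "@": "a",
    "d@kant@r": "decanter",
    "kant@r": "cantor",
}


def segmentations(sounds, lexicon):
    """Return every way to split `sounds` into a sequence of known words."""
    if not sounds:
        return [[]]  # one way to segment nothing: no words
    parses = []
    for end in range(1, len(sounds) + 1):
        word = lexicon.get(sounds[:end])
        if word is not None:
            for rest in segmentations(sounds[end:], lexicon):
                parses.append([word] + rest)
    return parses


stream = "winid@d@kant@r"  # no boundaries, as in real speech
for parse in segmentations(stream, LEXICON):
    print(" ".join(parse))
# we need a decanter
# we needed a cantor
```

Both parses are fully consistent with the sound stream, so the signal alone cannot decide between them. Choosing one requires top-down information such as the surrounding context.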