Speech and speech processing / April 7, 2005 Ted Gibson


Speech and speech processing 9.59 / 24.905 April 7, 2005 Ted Gibson

The structure of language
Sound structure: phonetics and phonology
  cat = /k/ + /æ/ + /t/
  eat = /i/ + /t/
  rough = /r/ + /ʌ/ + /f/

Language sounds
win vs. wing
writer vs. rider
Sounds, not the spelling: rough = /rʌf/

Summary
Articulatory properties of speech
  Distinctive / articulatory features
  English consonants and vowels
  Information is smeared between segments: co-articulation
Speech perception
  Problems: Lack of invariance, smearing
  Solutions: Acoustic features; Categorical perception; Motor theory of perception; Use of context
What aspects of speech are learned / innate?

Phones vs. Phonemes vs. Allophones
Phones: acoustically different speech sounds
Phonemes: sounds that make a difference in meaning (pot vs. dot)
Allophones: different phones corresponding to the same phoneme
  spin vs. pin: s[p]in vs. [pʰ]in
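The spin/pin contrast can be sketched as a rule that maps phonemes to phones. The fragment below is a minimal illustration rather than a full account of English aspiration: it assumes, beyond what the slide states, that a voiceless stop is aspirated only when it is word-initial, and the name `realize` is hypothetical.

```python
# Simplifying assumption (not from the slides): voiceless stops are
# aspirated only in word-initial position.
VOICELESS_STOPS = {"p", "t", "k"}


def realize(phonemes):
    """Map a list of phoneme symbols to a list of phones (hypothetical helper)."""
    phones = []
    for position, phoneme in enumerate(phonemes):
        if phoneme in VOICELESS_STOPS and position == 0:
            phones.append(phoneme + "ʰ")  # aspirated allophone, as in "pin"
        else:
            phones.append(phoneme)        # plain allophone, as in "spin"
    return phones


print(realize(["p", "ɪ", "n"]))       # "pin"
print(realize(["s", "p", "ɪ", "n"]))  # "spin"
```

Both outputs contain the same phoneme /p/; only its surface phone differs, which is what makes [p] and [pʰ] allophones rather than separate phonemes.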

Source-Filter Model
Larynx: buzzy sound source
Changeable resonators: pharynx (throat), mouth, lips, nose

Schematic of the vocal tract (figure by MIT OCW). Labeled parts: velum (soft palate), hard palate, uvula, mouth, nasal passage, nose, alveolar ridge, tongue (back and apex), lips, epiglottis, food passage, larynx, vocal folds, windpipe (trachea).

Key Properties of Speech
Formants of voiced sounds (F1, F2, etc.)
  Harmonics: strongest frequencies (result from the size and shape of the resonating cavities)
Range of human hearing: 20 Hz to 20,000 Hz
Sound is modulated by manipulating the articulators
  Changes resonance properties (frequencies of formants)
  Changes airflow
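The source-filter idea can be sketched numerically: a buzzy source supplies harmonics of a fundamental frequency, and the resonating cavities boost the harmonics that fall near their resonances. Everything specific in the sketch below is an assumption for illustration: the fundamental (`F0`), the two resonance frequencies (`FORMANTS`, roughly /a/-like), the sharpness `Q`, the flat source spectrum, and the simple second-order resonator formula.

```python
import math

F0 = 100                # assumed fundamental frequency of the source, in Hz
FORMANTS = [700, 1200]  # assumed resonance frequencies, in Hz
Q = 10                  # assumed sharpness of each resonance


def resonator_gain(freq, center):
    """Gain of a simple second-order resonator centered at `center` Hz."""
    r = freq / center
    return 1.0 / math.sqrt((1 - r * r) ** 2 + (r / Q) ** 2)


def filtered_spectrum(max_freq=4000):
    """Return (harmonic frequency, amplitude) pairs after filtering.

    The source is assumed flat: every harmonic starts with amplitude 1.0.
    """
    spectrum = []
    for harmonic in range(F0, max_freq + 1, F0):
        gain = 1.0
        for center in FORMANTS:
            gain *= resonator_gain(harmonic, center)
        spectrum.append((harmonic, gain))
    return spectrum


def spectral_peaks(spectrum):
    """Return harmonics that are stronger than both of their neighbors."""
    peaks = []
    for i in range(1, len(spectrum) - 1):
        if spectrum[i - 1][1] < spectrum[i][1] > spectrum[i + 1][1]:
            peaks.append(spectrum[i][0])
    return peaks


print(spectral_peaks(filtered_spectrum()))
```

Changing the values in `FORMANTS` stands in for moving the articulators: the source stays the same, but different harmonics come out strongest.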

Table removed for copyright reasons. The International Phonetic Alphabet (Phonemes of English).

Phonemes of the world
40 phonemes in English
Range: 11 in Polynesian to 141 in Khoisan ("Bushman")
Total inventory across languages: thousands
However, some are very common across all languages (e.g., /m/, /n/, /t/, /d/, /k/, /g/, /s/, /z/): easy to produce, easy to distinguish

Speech sounds: Distinctive/Articulatory features
Consonants: Restricted vocal tract
  1. place of articulation (dental vs. velar etc.)
  2. manner of articulation (stop vs. nasal vs. fricative etc.)
  3. voicing (voiced, unvoiced)

English Stop Consonants
  /b/: voiced, labial, stop
  /p/: unvoiced, labial, stop
  /d/: voiced, dental, stop
  /t/: unvoiced, dental, stop
  /g/: voiced, velar, stop
  /k/: unvoiced, velar, stop

English Fricatives
  /f/: unvoiced, labio-dental, fricative
  /v/: voiced, labio-dental, fricative
  /s/: unvoiced, dental, fricative
  /z/: voiced, dental, fricative
  /sh/: unvoiced, alveolar, fricative
  /zh/: voiced, alveolar, fricative

English Nasals
  /m/: voiced, labial, nasal
  /n/: voiced, dental, nasal
  /ng/: voiced, velar, nasal
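The three consonant lists above amount to a small feature table, which can be written down directly. The sketch below copies the voicing, place, and manner labels exactly as the slides give them; the helper names `features_differing` and `one_feature_neighbors` are hypothetical.

```python
# Feature bundles as listed on the slides: (voicing, place, manner).
CONSONANTS = {
    "b": ("voiced", "labial", "stop"),
    "p": ("unvoiced", "labial", "stop"),
    "d": ("voiced", "dental", "stop"),
    "t": ("unvoiced", "dental", "stop"),
    "g": ("voiced", "velar", "stop"),
    "k": ("unvoiced", "velar", "stop"),
    "f": ("unvoiced", "labio-dental", "fricative"),
    "v": ("voiced", "labio-dental", "fricative"),
    "s": ("unvoiced", "dental", "fricative"),
    "z": ("voiced", "dental", "fricative"),
    "sh": ("unvoiced", "alveolar", "fricative"),
    "zh": ("voiced", "alveolar", "fricative"),
    "m": ("voiced", "labial", "nasal"),
    "n": ("voiced", "dental", "nasal"),
    "ng": ("voiced", "velar", "nasal"),
}
FEATURE_NAMES = ("voicing", "place", "manner")


def features_differing(a, b):
    """Return the names of the features on which consonants a and b differ."""
    return [
        name
        for name, fa, fb in zip(FEATURE_NAMES, CONSONANTS[a], CONSONANTS[b])
        if fa != fb
    ]


def one_feature_neighbors(symbol):
    """Return consonants that differ from `symbol` in exactly one feature."""
    return sorted(
        other
        for other in CONSONANTS
        if other != symbol and len(features_differing(symbol, other)) == 1
    )


print(features_differing("b", "p"))
print(one_feature_neighbors("b"))
```

Treating each consonant as a bundle of features rather than an indivisible unit makes it easy to state which sounds are "close" to each other, for example pairs that differ only in voicing.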

Speech sounds: Distinctive features
Vowels: Unrestricted vocal tract
  1. part of tongue (front vs. back): beet vs. boot; bet vs. butt
  2. position of tongue (high, middle, low): beet vs. bat; boot vs. bought

Table removed for copyright reasons. The International Phonetic Alphabet (Phonemes of English).

"The dog snapped"
The different types of segments and what they look like:
  Stops vs. vowels
  Fricatives: white noise
Generally it is not clear where one segment begins and another stops.
Information is smeared.

Graphs of frequency vs. time removed for copyright reasons.

Voicing in a Spectrogram: The /ka/ - /ga/ continuum
Voicing: differences in Voice Onset Time (VOT)
  Small VOT: voiced; Large VOT: unvoiced
Plosion spike (stop) followed by formants (vowel)
Graphs of frequency vs. time removed for copyright reasons.

Phonemes are not produced serially
Sounds are not produced serially
  "cat" is not just /k/ + /æ/ + /t/
  "eat" is not just /i/ + /t/
  "rough" is not just /r/ + /ʌ/ + /f/
Synthesized speech often sounds unnatural
Parallel transmission
Context-conditioned variation

Continuous speech
Coarticulate: adjust pronunciation of current sound to take into account preceding and following sounds
  kill vs. cool
  bog
Information for segments overlaps, so we can get out more in a shorter amount of time
Fast (~15 sounds/sec): articulators are not always in the ideal position, so we need to cheat

/da/, /dee/, /doo/: graphs of frequency vs. time removed for copyright reasons.

Not independent segments, but Features
Speech is a trajectory through a sequence of articulatory targets
Rules are conditioned on distinctive features
Plural -s:
  bib /z/     dog /z/     dad /z/
  tip /s/     tick /s/    cat /s/
  kiss /iz/   wish /iz/   pinch /iz/
  hen /z/     till /z/    bay /z/
Example of assimilation: a feature spreads from one segment to an adjacent segment
  Makes things easier to pronounce
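The plural pattern above can be stated as a rule over features of the word's final sound rather than a list of individual segments. The following sketch is one way to write it down. The sibilant class is an assumption added here to cover the /iz/ cases (the slide only gives the examples), the ASCII sound labels such as "sh" and "ch" are informal, and `plural_suffix` is a hypothetical name.

```python
# Assumption: the /iz/ cases are grouped as sibilants (hissing sounds).
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}
# Unvoiced non-sibilant sounds; every other final sound counts as voiced.
UNVOICED = {"p", "t", "k", "f", "th"}


def plural_suffix(final_sound):
    """Choose the plural ending from features of the word's final sound."""
    if final_sound in SIBILANTS:
        return "/iz/"  # a vowel separates two similar hissing sounds
    if final_sound in UNVOICED:
        return "/s/"   # unvoiced spreads to the suffix (assimilation)
    return "/z/"       # voiced consonants and vowels take /z/


# Final sounds of the slide's example words.
EXAMPLES = {
    "bib": "b", "dog": "g", "dad": "d",
    "tip": "p", "tick": "k", "cat": "t",
    "kiss": "s", "wish": "sh", "pinch": "ch",
    "hen": "n", "till": "l", "bay": "ay",
}

for word, final in EXAMPLES.items():
    print(word, plural_suffix(final))
```

The rule never mentions individual words: three feature-based cases cover all twelve examples, which is the sense in which rules are conditioned on distinctive features.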

Speech Perception

Problems for Speech Perception
Fast: 15 sounds/sec, up to 30 sounds/sec in fast speech
Parallel transmission: sounds blend into each other
  Each chunk of signal contains evidence of multiple phonemes
  Coarticulation

Problems for Speech Perception
Prosody (suprasegmentals)
  Stress: prominence within words
    perMIT as a verb, PERmit as a noun
  Rate
    Changes formant transitions
    Same sound can be produced for two different phonemes: /ba/ vs. /wa/
  Intonation: variations in pitch across a phrase
    Dad wants me to mow the lawn.
    Dad wants me to mow the lawn?

Problems for Speech Perception
Emotional state
  Smiling
  Frowning
  Stressed
Different speakers

Problems for Speech Perception
Context-conditioned variation
  One-to-many variation: same phoneme may be superficially realized in different ways
  Many-to-one variation: different phonemes can have the same sound in different contexts

Summary: Problems in Speech Perception
Problems: Lack of invariance, smearing
Solutions:
  Acoustic features
  Categorical perception
  Motor theory of perception
  Context
    Same level: phonemic context, prosodic context
    High level: syntactic, semantic, lexical knowledge

Solutions to speech perception
There are some acoustic invariants:
  Stops: bursts (aperiodic burst of energy in some frequencies)
  Fricatives: turbulence (broad-spectrum energy)
  Vowels: steady-state formants; relations between formants
  Nasals: low-frequency band of energy along with absence of high-frequency noise; voicing
    /m/ and /n/ differ in formant transitions

Solutions: Categorical Perception
For consonants, much of the difficulty of telling sounds apart is at the boundaries among sounds
We impose categories on physically continuous stimuli

In-class demonstration: the /ka/ - /ga/ continuum
Voicing: differences in Voice Onset Time (VOT)
  Small VOT: voiced; Large VOT: unvoiced
Graphs of frequency vs. time removed for copyright reasons.

/ga/ - /ka/ in-class demonstration
  1. 0 msec (/ga/)
  2. 70 msec (/ka/)
  3. 60 msec (/ka/)
  4. 30 msec (usually /ga/)
  5. 10 msec (/ga/)
  6. 20 msec (/ga/)
  7. 40 msec (usually /ka/)
  8. 50 msec (/ka/)
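The eight demonstration stimuli can be used to locate the category boundary on the VOT continuum. The sketch below takes the VOT values and labels from the list above, treating "usually /ga/" and "usually /ka/" as plain /ga/ and /ka/ (a simplification). Placing the boundary at the midpoint between the two nearest stimuli is an assumption, and `category_boundary` is a hypothetical name.

```python
# (VOT in msec, majority label) in the order the stimuli were presented.
DEMO_TRIALS = [
    (0, "ga"), (70, "ka"), (60, "ka"), (30, "ga"),
    (10, "ga"), (20, "ga"), (40, "ka"), (50, "ka"),
]


def category_boundary(trials):
    """Estimate the VOT boundary as the midpoint between the two categories."""
    longest_ga = max(vot for vot, label in trials if label == "ga")
    shortest_ka = min(vot for vot, label in trials if label == "ka")
    if longest_ga >= shortest_ka:
        raise ValueError("labels overlap; no single boundary separates them")
    return (longest_ga + shortest_ka) / 2


print(category_boundary(DEMO_TRIALS))
```

The presentation order is scrambled in the demonstration, but the labels still sort cleanly by VOT, which is why a single boundary value describes them.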

% labeled /ga/ in /ga/-/ka/ continuum

Results of discrimination task: 10 msec intervals of VOT

Categorical Perception: Can't discriminate stimuli any better than you can identify them.
  Discriminate: tell two things apart
  Identify: classify a sound
Perceptual phenomenon; not a response strategy
What Good is Categorical Perception? It helps to:
  Ignore irrelevant information
  Quickly classify transient events (consonants versus vowels)
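The claim that discrimination is no better than identification can be made concrete with a small model. The sketch below is illustrative only and rests on assumptions that are not in the slides: identification is modeled as a logistic function of VOT with an assumed boundary (`BOUNDARY_MS`) and steepness (`SLOPE`), and discrimination is predicted from the two identification probabilities with a classic labeling-based formula for ABX tasks, 0.5 + 0.5 (p1 - p2)^2.

```python
import math

BOUNDARY_MS = 35.0  # assumed category boundary on the VOT continuum
SLOPE = 0.3         # assumed steepness of the identification curve, per msec


def p_ga(vot_ms):
    """Assumed probability of labeling a stimulus as /ga/."""
    return 1.0 / (1.0 + math.exp(SLOPE * (vot_ms - BOUNDARY_MS)))


def predicted_discrimination(vot_a, vot_b):
    """Predicted proportion correct if listeners rely only on labels.

    0.5 is chance; the value rises only when the two stimuli tend to
    receive different labels.
    """
    return 0.5 + 0.5 * (p_ga(vot_a) - p_ga(vot_b)) ** 2


# Every pair is 10 msec apart, but only the pair that straddles the
# boundary is predicted to be discriminated well above chance.
for vot in range(0, 70, 10):
    print(vot, vot + 10, round(predicted_discrimination(vot, vot + 10), 3))
```

Under these assumptions, equal physical steps in VOT are not equally discriminable: within a category the prediction stays near chance, and it peaks for the pair that crosses the boundary.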

Motor Theory of Perception
McGurk Effect
  Visual information automatically integrated into speech percept
  Place of articulation cued by visual input
  Manner cued by ear

Solutions: Phonemic Context
Use knowledge of how surrounding segments are articulated to interpret ambiguous segments
  /s/ is higher frequency than /sh/
  The white noise (frication) is higher in frequency preceding /a/ than /u/
  A sound halfway between /s/ and /sh/ is interpreted differently depending on whether it is pronounced before a /u/ or an /a/

Graph removed for copyright reasons.

Solutions: Prosodic Context
Rate normalization: we correct for speaking rate
  VOT discrimination: categorical boundary shifts for /ga/-/ka/ if previous syllable is pronounced faster (e.g., short /da/ versus long /da/)
  Formants: /ba/ vs. /wa/. If succeeding syllable is faster, then percept can change.

Solutions: Higher-Level Context
Noisy perception (Miller, Heise, & Lichten, 1951)
  Grammatical: Accidents kill motorists on the highways.
  Anomalous: Accidents carry honey between the house.
  Scrambled: Around accidents country honey the shoot.
Shadowing: echo speech you hear (Marslen-Wilson & Welsh, 1978)
  Intentional mispronunciations: when corrected, they go completely unnoticed and do not delay shadowing
Use syntax and semantics to perceive the input

Context can Affect Perception
/pi/ vs. /bi/ demo: lexical knowledge affects categorical boundary
Not just high-level percept, but perceptual discrimination is affected.

Summary: Problems in Speech Perception
Problems: Lack of invariance, smearing
Solutions:
  Acoustic features
  Categorical perception
  Motor theory of perception
  Context
    Same level: phonemic context, prosodic context
    High level: syntactic, semantic, lexical knowledge