Speech production and phonetics

Slides for this lecture are partly based on those created by Katariina Mahkonen for the TUT course Puheenkäsittelyn menetelmät ("Speech processing methods") in Spring 2013.
Books: Speech Communications, Douglas O'Shaughnessy
Speech production anatomy
» Overview, source-filter model of speech production
» Vocal tract
» Larynx, glottis
Articulatory phonetics
» Vowels
» Consonants
» International Phonetic Alphabet

What is phonetics?
» Phonetics studies speech:
  - Production -> articulatory phonetics
  - Acoustic realization -> acoustic phonetics
  - Perception -> auditory phonetics

Vocal organs
» Vocal organs can be subdivided into:
  - central (Broca's area, Wernicke's area)
  - peripheral (e.g. larynx, glottis)

Source-filter model of speech production
» Speech production can be viewed as an acoustic filtering operation
» Larynx (vocal folds) and lungs provide the source excitation
» Vocal tract acts as a filter that shapes the spectrum of the speech signal
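The source-filter view can be sketched in a few lines of code. This is a minimal illustration, not a production synthesizer: the source is an idealized impulse train, the filter is a cascade of two-pole resonators, and the formant values (700 Hz and 1200 Hz), the 100 Hz bandwidth, and all function names are assumptions chosen for the example.

```python
import numpy as np

def impulse_train(f0_hz, duration_s, fs_hz):
    """Idealized glottal source: one unit pulse per pitch period."""
    n = int(duration_s * fs_hz)
    period = int(round(fs_hz / f0_hz))
    x = np.zeros(n)
    x[::period] = 1.0
    return x

def resonator(x, centre_hz, bandwidth_hz, fs_hz):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = np.exp(-np.pi * bandwidth_hz / fs_hz)    # pole radius from bandwidth
    theta = 2.0 * np.pi * centre_hz / fs_hz      # pole angle from centre frequency
    a1, a2 = 2.0 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

def synthesize_vowel(f0_hz=100.0, formants_hz=(700.0, 1200.0),
                     fs_hz=8000, duration_s=0.5):
    """Source (impulse train) passed through a cascade of resonators (filter)."""
    source = impulse_train(f0_hz, duration_s, fs_hz)
    out = source
    for f in formants_hz:                        # assumed formant values
        out = resonator(out, f, 100.0, fs_hz)    # assumed 100 Hz bandwidth
    return source, out

def dominant_frequency(signal, fs_hz):
    """Frequency (Hz) of the largest magnitude-spectrum bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs_hz)
    return float(freqs[int(np.argmax(spectrum))])

source, vowel = synthesize_vowel()
print(dominant_frequency(vowel, 8000))
```

The source alone has a flat harmonic spectrum. After filtering, the harmonics near the resonator frequencies dominate, which is the spectral shaping that the slide attributes to the vocal tract.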

Vocal tract
» Vocal tract refers to the vocal organs after the larynx
» Divided into the following sections:
  - Pharynx cavity
  - Nasal cavity
  - Oral cavity
» Organs of the vocal tract that move to produce various speech sounds:
  - Tongue
  - Soft palate (velum) -> opens/closes the path to the nasal cavity
  - Lower jaw
  - Lips

Vocal tract and formants
» Vocal tract acts like an adjustable filter: the resonant frequencies are determined by the vocal tract shape

[Figure: anatomy of the vocal tract. Labels: soft palate (opens the nasal cavity for m, n, ng [ŋ]); epiglottis (closes off the larynx while eating); oesophagus (= gullet) -> to stomach; trachea (= windpipe) -> to lungs]

MRI (magnetic resonance imaging) images of the vocal tract for /aa/ and /ii/
http://personal.ee.surrey.ac.uk/personal/p.jackson/nephthys/jaleel.html

Glottis (in larynx)
» Glottis is the space between the vocal folds
» From the speech production viewpoint, the role of the larynx is to turn the silent flow of air from the lungs into audible sound
» The arytenoid cartilages are a pair of small three-sided pyramids which form part of the larynx, to which the vocal folds (vocal cords) are attached
[Figure labels: muscle that controls the tightness and position of the vocal folds; space between vocal folds; interarytenoid space; arytenoids]
http://www.youtube.com/watch?v=wjrsa77u6ou

Function of the vocal folds
» A: vocal folds and arytenoids closed -> glottal closure (no airflow)
» B: vocal folds vibrating, arytenoids closed -> phonation, F0; voicing
» C: vocal folds closed, arytenoids open -> whisper
» D: glottal constriction -> weak unvoiced noise, glottal fricative [h]
» E: rest/breathing position -> unvoiced consonants
» F: deep-breath position (sigh / breathlessness) -> not used for speech

Sources of sound energy
» Vocal fold vibration
  - Is caused by pressurized air passing through the membranous portion of the narrowed glottis
  - Causes repeated opening and closing of the glottis
  - Formation of voiced sounds in this way is called phonation
  - Frequency of vibration: the fundamental frequency F0 can be altered with muscles, from 80-400 Hz for males, 120-800 Hz for females, 300 Hz for children
» Turbulence
  - Air moving quickly through a small hole
  - Fricative or unvoiced sounds
  - E.g. tongue/teeth ("ss" in "hiss")
» Explosion
  - Release of pressure build-up
  - E.g. behind lips ("p" in "peak") or tongue ("t" in "tell")
  - Plosive sounds
  - Compare "b" in "bat" (voiced plosive) with "p" in "pat" (unvoiced plosive)
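As a rough illustration of what "frequency of vibration" means for a signal, the sketch below estimates F0 from a synthetic voiced-like waveform. The autocorrelation method is a common textbook technique swapped in for the example, not something prescribed by these slides. The search range of 80-800 Hz is taken from the ranges above, and the function name and test signal are assumptions.

```python
import numpy as np

def estimate_f0(signal, fs_hz, f0_min_hz=80.0, f0_max_hz=800.0):
    """Estimate F0 as fs / (lag of the autocorrelation peak in the allowed range)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # remove any DC offset
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep lags 0..N-1
    lag_min = int(fs_hz / f0_max_hz)                   # shortest period considered
    lag_max = int(fs_hz / f0_min_hz)                   # longest period considered
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs_hz / best_lag

# Synthetic "voiced" signal: a 100 Hz fundamental plus two weaker harmonics.
fs = 8000
t = np.arange(int(0.1 * fs)) / fs
voiced = (np.sin(2 * np.pi * 100 * t)
          + 0.5 * np.sin(2 * np.pi * 200 * t)
          + 0.25 * np.sin(2 * np.pi * 300 * t))
print(estimate_f0(voiced, fs))
```

Turbulent (fricative) excitation is noise-like and has no such periodicity, so an estimate of this kind is meaningful only for voiced segments.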

Articulatory phonetics and International Phonetic Alphabet

Articulatory phonetics
» One goal of phonetics is to classify the phonemes of different languages
  - Phonetic alphabets:
    + International Phonetic Alphabet (IPA) (chart)
    + Represents sounds with symbols
  - For notational reasons (ASCII-based), other alphabets are used too, e.g. Arpabet
» Phonetics describes phonemes as accurately as possible based on their articulation
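The relation between an ASCII-based alphabet and IPA can be sketched as a simple lookup table. The mapping below covers only a small subset of Arpabet, and the function name and the handling of stress digits (as used in some pronunciation dictionaries) are assumptions made for this example.

```python
# Partial Arpabet -> IPA table (subset only, for illustration).
ARPABET_TO_IPA = {
    "P": "p", "B": "b", "T": "t", "D": "d", "K": "k", "G": "ɡ",
    "F": "f", "V": "v", "TH": "θ", "DH": "ð", "S": "s", "Z": "z",
    "SH": "ʃ", "ZH": "ʒ", "HH": "h", "CH": "tʃ", "JH": "dʒ",
    "M": "m", "N": "n", "NG": "ŋ", "L": "l", "R": "ɹ", "W": "w", "Y": "j",
    "IY": "i", "IH": "ɪ", "UW": "u", "AA": "ɑ",
}

def arpabet_to_ipa(symbols):
    """Convert a list of Arpabet symbols to an IPA string.

    Trailing stress digits (0, 1, 2) on vowels are dropped.
    Raises KeyError for symbols outside the partial table.
    """
    return "".join(ARPABET_TO_IPA[s.rstrip("012")] for s in symbols)

print(arpabet_to_ipa(["SH", "IY1"]))           # "she"
print(arpabet_to_ipa(["TH", "IH1", "NG", "K"]))  # "think"
```

The two-letter ASCII codes (TH, DH, SH, ZH, NG) are the same style of label that the later slides use for places of articulation.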

Classification of speech sounds
» Consonant vs. vowel: consonants involve an obstruction in the air stream above the glottis
» Voiced vs. voiceless: voiced if the vocal folds vibrate
» Nasal vs. oral: nasal if the air travels through the nasal cavity and the oral cavity is closed
» Lateral vs. non-lateral: in lateral phonemes, the air stream passes along the sides of the oral cavity ("ball", "lateral") and not through the middle

Vowels
» Vowels are voiced phonemes where the vocal tract is open
» Vowels are characterized by articulatory features:
  - Open-close dimension refers to how close the tongue is to the roof of the mouth. The closer the tongue is to the palate, the more closed the vowel is.
  - Front-back dimension refers to the place of articulation in terms of tongue position: the narrowest point of the vocal tract is essential.
  - Lip rounding (binary value). In the IPA vowel chart, the symbol to the right of a bullet is rounded and the symbol to the left is unrounded.
  - Nasalization: when the velum is open, airflow gets to the nasal cavity and a nasal phoneme is produced. When the velum is closed, an oral phoneme is produced.
www.internationalphoneticalphabet.org/ipa-sounds/ipa-chart-with-sounds/sound
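The articulatory features above can be read as coordinates of a vowel. The sketch below stores a handful of IPA vowels with their height (open-close), backness (front-back) and rounding. The selection of vowels, the class and the function names are assumptions for illustration; nasalization is left out for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vowel:
    height: str     # open-close dimension, e.g. "close", "open"
    backness: str   # front-back dimension, e.g. "front", "back"
    rounded: bool   # lip rounding (binary value)

# A few cardinal vowels from the IPA chart (illustrative subset).
VOWELS = {
    "i": Vowel("close", "front", False),
    "y": Vowel("close", "front", True),
    "ɯ": Vowel("close", "back", False),
    "u": Vowel("close", "back", True),
    "a": Vowel("open", "front", False),
    "ɑ": Vowel("open", "back", False),
    "ɒ": Vowel("open", "back", True),
}

def describe(symbol):
    """Return a textual description such as 'close back rounded vowel'."""
    v = VOWELS[symbol]
    rounding = "rounded" if v.rounded else "unrounded"
    return f"{v.height} {v.backness} {rounding} vowel"

def rounding_pair(symbol):
    """Return the vowel that differs only in lip rounding, or None."""
    v = VOWELS[symbol]
    for other, w in VOWELS.items():
        if (w.height, w.backness) == (v.height, v.backness) and w.rounded != v.rounded:
            return other
    return None

print(describe("u"))        # close back rounded vowel
print(rounding_pair("i"))   # y
```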

Consonants
» In most consonants, the airflow is obstructed at some point
» Consonants are characterized by:
  1. Voicing: voiced or unvoiced
  2. Place of articulation
  3. Manner of articulation
(Video: IPA consonants in 5 minutes)

Voicing of consonants
» Voicing is determined by the vibration of the vocal folds
» A consonant can be voiced or unvoiced
» In English, voiced consonants include [v] (van), [z] (zip), [ʒ] (confusion), [b], [d], [g], [dʒ] (gin)
» Unvoiced consonants include [f], [s], [ʃ], [p], [t], [k], [h], [tʃ]

Consonants: places of articulation
» Place of articulation tells where the primary constriction is along the vocal tract
» Places of articulation of consonants:
  - bilabial (1): made with the two lips (P, B, M)
  - labio-dental (2): lower lip & upper front teeth (F, V)
  - dental (4): tongue tip/blade & upper front teeth (TH, DH)
  - alveolar (5): tongue tip/blade & alveolar ridge (T, D, N)
  - retroflex: tongue tip & back of the alveolar ridge (R)
  - palato-alveolar: tongue tip & back of the alveolar ridge (SH)
  - palatal (6): front of the tongue & hard palate (Y, ZH)
  - velar (7): back of the tongue & soft palate (K, G, NG)
  - uvular (8): back of the tongue against or near the uvula
  - pharyngeal (9): in the pharynx
  - glottal (10): in the glottis
(You do not have to remember the Latin words above.)

Consonants: manners of articulation
» The main variation in the manner of articulation concerns how freely the air stream flows when the consonant is produced
» Sonorants: continuous, non-turbulent airflow in the vocal tract
» Obstruents: airflow is partly or completely obstructed

Sonorants
» Sonorants: sounds where the air stream passes unobstructed through the vocal tract (includes vowels and consonants)
» Semivowels (aka glides): vowel-like sounds with greater constriction than the corresponding vowels (/y/, /w/: "yes", "well")
» Liquids have spectra similar to vowels, but a few decibels weaker
  - Lateral ("led"): obstruction of the air stream at a point along the center of the oral tract, with incomplete closure between one or both sides of the tongue and the roof of the mouth (/l/)
  - Retroflex ("red"): tip of the tongue is curled back slightly (/r/)
» Nasal: soft palate down, airflow is through the nasal tract (/m/, /n/)
» Approximants are similar to fricatives, but the articulators do not come close enough to generate turbulent airflow

Obstruents
» Obstruents are consonants where the airflow is partly or completely obstructed at some point
» Plosive: complete obstruction with sudden (explosive) release (/p/, /b/, /t/, /d/, /k/, /g/)
» Fricative: articulators close together, turbulent airflow produced. Aperiodic, usually with most of the energy at high frequencies (/f/, /v/, /th/, /dh/, /s/, /z/, /sh/, /zh/, /h/)
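The three-way description of a consonant (voicing, place, manner) can be sketched as a small feature table. The table below uses the ASCII labels from the place-of-articulation slide and covers only some English consonants; the table layout and the function names are assumptions for illustration.

```python
# (voiced, place, manner) for a subset of English consonants.
CONSONANTS = {
    "P":  (False, "bilabial", "plosive"),
    "B":  (True,  "bilabial", "plosive"),
    "M":  (True,  "bilabial", "nasal"),
    "F":  (False, "labio-dental", "fricative"),
    "V":  (True,  "labio-dental", "fricative"),
    "TH": (False, "dental", "fricative"),
    "DH": (True,  "dental", "fricative"),
    "T":  (False, "alveolar", "plosive"),
    "D":  (True,  "alveolar", "plosive"),
    "N":  (True,  "alveolar", "nasal"),
    "S":  (False, "alveolar", "fricative"),
    "Z":  (True,  "alveolar", "fricative"),
    "K":  (False, "velar", "plosive"),
    "G":  (True,  "velar", "plosive"),
    "NG": (True,  "velar", "nasal"),
}

def find(voiced, place, manner):
    """List all consonants in the table with the given three features."""
    return [c for c, feats in CONSONANTS.items()
            if feats == (voiced, place, manner)]

def voicing_pair(consonant):
    """Return the consonant that differs only in voicing, or None."""
    voiced, place, manner = CONSONANTS[consonant]
    matches = find(not voiced, place, manner)
    return matches[0] if matches else None

print(find(False, "bilabial", "plosive"))  # ['P']
print(voicing_pair("P"))                   # B
print(voicing_pair("M"))                   # None (no voiceless nasal in the table)
```

The pair P/B found this way is the same contrast as "pat" vs. "bat": same place and manner, different voicing.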

Flaps and trills
» In trills, the articulator vibrates rapidly against the place of articulation, with a frequency of 20-25 Hz. The only English trill is /r/ as in "roar", where the tongue touches the alveolar ridge for two to three vibrations.
» In flaps, the articulators touch only once, through a single contraction of the muscles involved.

IPA: International Phonetic Alphabet
» Pronunciation of IPA consonants
» In a left/right pair, the voiceless consonant is on the left
» A consonant that appears alone is voiced

Other phonetics terms
» Phoneme: the smallest linguistic unit which may bring about a change of meaning (kill vs. kiss)
  - Phonemes are combined to form larger entities such as words
  - Noted in text with slashes, e.g. /i/
» Phone: individual spoken realization of a phoneme
  - In principle all phones are different
  - Different speech sounds that are realizations of the same phoneme are known as allophones
  - Noted in text with brackets, e.g. [i]
» Coarticulation: vocal organs move in a continuous manner, and therefore a (conceptually isolated) speech sound is influenced by, and becomes more like, a preceding or following speech sound
» Diphone: the time span from the middle of a phone to the middle of the following phone. Includes the phone transition.
» Triphone: a temporal unit that covers two diphones
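At the symbol level, the diphone and triphone definitions above can be sketched as sliding windows over a phone sequence. This ignores timing (the actual units run from mid-phone to mid-phone), and the function names are assumptions for the example.

```python
def diphones(phones):
    """Adjacent phone pairs; each pair spans one phone-to-phone transition."""
    return list(zip(phones, phones[1:]))

def triphones(phones):
    """Adjacent phone triples; each triple covers two consecutive diphones."""
    return list(zip(phones, phones[1:], phones[2:]))

phones = ["k", "ɪ", "l"]          # "kill"
print(diphones(phones))           # [('k', 'ɪ'), ('ɪ', 'l')]
print(triphones(phones))          # [('k', 'ɪ', 'l')]
```

Units like these keep the transition between neighbouring sounds inside the unit, which is where coarticulation effects are strongest.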

Prosody
» Prosody refers to longer-term properties of speech
  - Rhythm: varying the temporal length of syllables (or some other units)
  - Stress: relative emphasis of syllables in a word or of certain words in a sentence, manifested in higher/lower pitch or dynamics (loudness)
  - Intonation: variation of pitch over a segment of multiple words (e.g. a sentence) that may
    + indicate the attitudes and emotions of the speaker
    + signal the difference between statement and question
    + focus attention on the important words

Acoustic phonetics
» Acoustically, a speech signal, like any sound, can be viewed as variation in air pressure level
» Acoustic phonetics studies the acoustic characteristics of speech and their relationship to speech production
Longitudinal waves: http://www.kettering.edu/physics/drussell/demos/waves/wavemotion.html

Formants F1, F2 for vowels
» The vocal tract can be treated as an acoustic tube with resonance frequencies called formants, F_i, where i is the formant order and i=1 is the lowest frequency
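A common first approximation of the acoustic-tube view, sketched below, is a uniform tube closed at the glottis and open at the lips, whose resonances are F_i = (2i - 1) c / (4L). The tube length of 17 cm and the speed of sound of 340 m/s are assumed typical values, not figures from these slides, and the function name is hypothetical.

```python
def uniform_tube_formants(length_m=0.17, speed_of_sound_m_s=340.0, count=3):
    """Resonances of a uniform tube closed at one end (quarter-wavelength resonator).

    F_i = (2i - 1) * c / (4 * L), for i = 1..count.
    """
    return [(2 * i - 1) * speed_of_sound_m_s / (4.0 * length_m)
            for i in range(1, count + 1)]

# Roughly 500, 1500 and 2500 Hz for the assumed 17 cm tube.
print(uniform_tube_formants())
```

A real vocal tract is not uniform: moving the tongue, jaw and lips changes the tube shape and shifts these resonances, which is why different vowels have different F1 and F2 values.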

Speech production and modeling
Quatieri: Discrete-Time Speech Signal Processing: Principles and Practice
http://www.phys.unsw.edu.au/jw/glottis-vocal-tract-voice.html