Table 1: ARPAbet Examples


Scribe for Monday 1/10/05

A new set of slides was handed out (slides for Lecture 3, "Lecture 3/4"), but they were not used much today. The derivation of the one-dimensional wave equation, which describes the propagation of a wave with speed v [1] and is listed in the Lecture 3 outline, won't be covered. We finished the previous lecture (from the Lecture 2 slides) and looked at and listened to an acoustic tube example. This demonstrated that the human vocal tract can be modeled as an acoustic tube [2], which is the focus of the lecture following this one.

Transcription in Speech Recognition

Here we note the difference between phonemic and phonetic transcriptions. According to Wikipedia [3], "In spoken language, a phoneme is a basic, theoretical unit of sound that can distinguish words ... a succinct way to describe the idea of a phoneme is 'the smallest difference that makes a difference.'" Phonetic refers to how a vocal system forms the sounds: "Phonetics (from the Greek word phone = sound/voice) is the study of speech sounds (voice). It is concerned with the actual nature of the sounds and their production." [3]

This also relates to the baseforms and surface forms of speech: the baseform is the ideal text to be spoken, while the surface form is the actual expression of the speech. Speech recognition is typically phonetic. IPA [4] and ARPAbet [5] are two phonetic alphabets commonly used in speech recognition. Here are a few examples from ARPAbet's phonetic alphabet:

Symbol  Description                      Example Word  Transcription
p       voiceless bilabial stop          put           [ p uh t ]
ng      voiced velar nasal               sing          [ s ih ng ]
n       voiced alveolar nasal            night         [ n ay t ]
f       voiceless labiodental fricative  find          [ f ay n d ]

Table 1: ARPAbet Examples

There are numerous properties of speech used to categorize phones. Place of articulation describes where in the vocal system a sound is generated (where the tongue is, where the teeth are, etc.); a few examples are listed in Table 2.
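The descriptions in Table 1 already encode several of these properties (voicing, place, and manner). As a minimal sketch, the Table 1 entries could be stored and queried as below. The `Phone` record and the `describe` helper are hypothetical names introduced only for illustration; they are not part of ARPAbet or of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phone:
    """Hypothetical record for one ARPAbet symbol (illustration only)."""
    symbol: str
    voiced: bool
    place: str
    manner: str


# The four symbols described in Table 1.
TABLE_1 = {
    "p": Phone("p", False, "bilabial", "stop"),
    "ng": Phone("ng", True, "velar", "nasal"),
    "n": Phone("n", True, "alveolar", "nasal"),
    "f": Phone("f", False, "labiodental", "fricative"),
}

# The example transcriptions from Table 1.
TRANSCRIPTIONS = {
    "put": ["p", "uh", "t"],
    "sing": ["s", "ih", "ng"],
    "night": ["n", "ay", "t"],
    "find": ["f", "ay", "n", "d"],
}


def describe(word):
    """Describe the phones of `word` that have an entry in TABLE_1."""
    lines = []
    for symbol in TRANSCRIPTIONS[word]:
        phone = TABLE_1.get(symbol)
        if phone is None:
            continue  # symbol is not described in Table 1
        voicing = "voiced" if phone.voiced else "voiceless"
        lines.append(f"{symbol}: {voicing} {phone.place} {phone.manner}")
    return lines
```

For example, `describe("find")` returns `["f: voiceless labiodental fricative", "n: voiced alveolar nasal"]`; the vowel `ay` and the stop `d` are skipped because Table 1 does not describe them.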
There is a good interactive webpage demonstrating place and manner of articulation at http://www.chass.utoronto.ca/ danhall/phonetics/sammy.html [6]. The applet lets you select numerous articulation options, then shows the corresponding modification to a graphical cross section of the vocal tract.

Location     Description                             Sample Word
Bilabial     Lips closed together                    mmm
Labiodental  Lower lip and upper teeth               fit
Alveolar     Tongue behind upper teeth               dune
Palatal      Tongue behind ridge behind upper teeth  nog

Table 2: Some Places of Articulation

Manner of articulation is another of these properties. Continuant (steady-state) and non-continuant (transient) sounds differ in whether an articulator moves during sound production. With continuant sounds, the passage of air is restricted but not completely stopped, and the articulators do not move. Continuants are sometimes called fricatives [7]. Non-continuant sounds are those in which a change in the vocal tract configuration is required during the production of the sound [8].

More than one sound can be made with the same articulator placement! These different sounds are determined by whether the sound is voiced or unvoiced. For example, "foo" and "voo" sound different, but if you whisper the two, the "v" sound is no longer voiced and sounds just like the "f".

Vowels are distinguished by several characteristics:
- Large amplitude
- Long duration (40-400 ms)
- Distinguished by tongue hump and degree of constriction

Every language has a schwa vowel sound. It's just that popular. Some cultures have vowel sounds that others don't. Pronunciation is (intuitively) based on the speaker. Vowel degrees of constriction range from high to low, depending on (you guessed it) constriction of the airway. Tongue hump position ranges from front to back. As a general rule, going from low to high constriction (e.g., "a" to "ee") tends to lower the F1 formant.

Formants and Frequency

Singers and musicians sometimes use software that shows them the formants of their voice in real time to help them keep their sounds steadier [9]. Formants are the resonance frequencies of the vocal tract. Formants are not harmonics of the fundamental frequency.

If you have a system G(S) as shown in Figure 1, the frequency content of the output signal Y(S) is based on the frequency content of the input signal U(S). A system designed to model the frequency response of the vocal tract cannot be treated as a linear, time-invariant (LTI) system except over brief intervals of time in which the formants are not changing. The frequency response of the vocal tract is determined by the formants, not by the frequency of the output. As a reminder, the output of an LTI system is the convolution of the input with the system's impulse response, which becomes a product in the transform domain:

$$Y(S) = U(S)\,G(S)$$
$$y(t) = \int u(\tau)\, g(t - \tau)\, d\tau$$
$$y(n) = \sum_{k} u(k)\, g(n - k)$$

Figure 1: System Block Diagram

If several vowels are plotted by their formants (F1 vs. F2), they form a rough triangle [10] (see Figure 2). This plot usually doesn't work very well for determining vowels. On top of that difficulty, it's pretty tough to find formants automatically. To see this more clearly, we could record 5-10 seconds of a vowel without changing the articulators, view the spectrogram and FFT, and try to guess the formants from these analyses. Formants with a higher center frequency occupy a greater bandwidth.

Figure 2: Vowel Triangle for some Common Vowel Sounds

The sound of a diphthong is defined by movement of the articulators. Therefore diphthongs must be non-continuants.
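To make the discrete form of the LTI relation, y(n) = sum over k of u(k) g(n - k), concrete, here is a minimal pure-Python sketch. The function name `convolve` and the toy sequences `u` and `g` are assumptions chosen for illustration; they are not signals from the lecture.

```python
def convolve(u, g):
    """Discrete convolution y(n) = sum_k u(k) * g(n - k) of two finite sequences."""
    y = [0.0] * (len(u) + len(g) - 1)
    for k, u_k in enumerate(u):
        for m, g_m in enumerate(g):
            # The term u(k) * g(m) contributes to output sample n = k + m.
            y[k + m] += u_k * g_m
    return y


# Hypothetical toy signals (not taken from the lecture):
u = [1.0, 0.0, 0.0, 1.0]  # input: two unit pulses, three samples apart
g = [1.0, 0.5, 0.25]      # impulse response: a short decaying sequence

y = convolve(u, g)
print(y)  # [1.0, 0.5, 0.25, 1.0, 0.5, 0.25]
```

Each input pulse produces its own copy of the impulse response in the output. This is one way to see why the shape of the response is set by the system g while the spacing of the copies is set by the input u, as long as g stays fixed over the interval considered.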

Stops/plosives involve the buildup and release of air pressure behind an articulator.

The McGurk effect is a phenomenon where a person's perception of what is being said may depend on what the person is seeing. We watched a video clip of a person's lips moving while listening to a repetitive audio clip. The results of this experiment are in Table 3.

Class  Visual  Audio
ba     ba      ba
va     va      ba
za     tha     ba
ah     ga      ba

Table 3: McGurk Effect

Prosody is like the musical part of speech. It refers to the rhythmic and pitch changes in a word that can affect its meaning. Sarcasm is one way to use prosody to alter the meaning of a word or phrase. Musical notation isn't really precise enough to properly represent the pitch contours of prosody. In some cultures, pitch affects meaning. Usually there isn't much differentiation between absolute pitches as carriers of information, because humans with perfect pitch (people who can listen to a tone and accurately identify its absolute pitch) are rare.

As a preview for next lecture, we listened to a duck whistle with a variety of acoustic tube attachments. The duck whistle mimicked the glottal sounds, while the tubes altered the duck call to sound like certain vowel sounds by mimicking the constrictions of the vocal tract.

REFERENCES

[1] Wave Equation Description. http://mathworld.wolfram.com/waveequation.html
[2] Human Vocal Tract as an Acoustic Tube. http://ccrma.stanford.edu/ bilbao/master/node5.html
[3] Wikipedia. http://en.wikipedia.org/wiki/
[4] IPA Alphabet Chart. http://www.arts.gla.ac.uk/ipa/ipachart.html
[5] ARPAbet Alphabet Chart. http://www.billnet.org/phon/arpabet.html
[6] Articulation Applet. http://www.chass.utoronto.ca/ danhall/phonetics/sammy.html
[7] Continuant Sounds. http://www.inthebeginning.org/ntgreek/phonics/continuant.htm
[8] Speech Terms. http://www.research.ibm.com/people/l/lvsubram/teaching/speech/speechterms.htm
[9] Video Voice Software. http://www.videovoice.com/
[10] Vowel Triangle Image. http://isl.ira.uka.de/speechcourse/slides/nature/acoustics/formants/formants.gif