Specialization Module. Speech Technology. Timo Baumann

Specialization Module Speech Technology Timo Baumann baumann@informatik.uni-hamburg.de Universität Hamburg, Department of Informatics Natural Language Systems Group

A bit of Phonetics

Speech Production: Source-Filter Model
- glottal folds produce the primary signal
- the vocal tract acts as a filter (slightly different for voiceless sounds)
- figure derived from Wikimedia Commons; CC-BY-SA-2.5

Speech Production: Vowels
- glottal folds produce the primary signal
- the vocal tract acts as a filter
- the field of movement for the tongue in the oral cavity is idealized as a trapezoid
- the resonance of the cavity determines the vowel

Vocalic sounds: Diphthongs
- of course, the tongue may move during the vowel, resulting in a changing sound
- [aɪ]: nice, right
- [aʊ]: loud, ...

Speech Production: Consonants
- two types of phones:
  - vowels: air is exhaled freely
  - consonants: an obstruction perturbs the airflow
  - although there is no clear definition of what is still an [iː] or already a [j]
- the vocal tract is not just a filter but also a source of additional sound
  - voiceless consonants: glottal folds are open, sound comes only from the perturbation
- further classification criteria:
  - means of articulation: voicing, mouth opening, tongue position, lip rounding, nasality, secondary obstructions, length, ...
- classification by the International Phonetic Association

Consonants
- manner of articulation (plosives, nasals, fricatives, ...)
- place of constriction (lips, teeth, glottis)

The International Phonetic Alphabet
- more symbols: other sounds (clicks, ...), tones, stress marks, lengthening
- more details are used for narrow transcription, e.g. in dialectology
- languages often do not distinguish between all possible sounds

Exercise (in small groups):
1. transcribe your name in the phonetic alphabet
2. transcribe some words (ideally neither English nor German) without speaking them aloud
3. exchange notes and listen carefully to whether your partners read out your transcript correctly; check for errors

The Phonemic System of a Language
- only a small subset of the symbols in the IPA
- contextual rules determine the phonetic realization
  - e.g. German [ç/x] ("ich" / "ach") is a single phoneme /ç/
- context limitations (phonotactics), often in combination with syllable structure
  - syllable = onset + nucleus + coda
  - e.g. German: the nucleus must be a vowel; complex codas with up to 5 consonants (rules for consonant sequences)
  - e.g. Japanese: restrictions on codas and consonant clusters: "Arbeit" → "arubaito"; "baumukūhen", "ryukkusakku" → ?
  - e.g. English: no /ŋ/ in onset, no /h/ in coda, ...
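
The two English constraints named on this slide can be checked mechanically. The following is only a minimal sketch, assuming a syllable has already been split into onset, nucleus and coda; the `Syllable` type, the `english_violations` function and the two example syllables are hypothetical names chosen for illustration, and real English phonotactics has many more rules than these two.

```python
from typing import List, NamedTuple, Tuple


class Syllable(NamedTuple):
    """A syllable split into its three parts (each a tuple of phoneme symbols)."""
    onset: Tuple[str, ...]
    nucleus: Tuple[str, ...]
    coda: Tuple[str, ...]


def english_violations(syllable: Syllable) -> List[str]:
    """Check only the two English constraints from the slide (toy rule set)."""
    problems = []
    if "ŋ" in syllable.onset:
        problems.append("/ŋ/ in onset")
    if "h" in syllable.coda:
        problems.append("/h/ in coda")
    return problems


# "sing" /sɪŋ/ is fine: /ŋ/ appears in the coda, not in the onset.
sing = Syllable(onset=("s",), nucleus=("ɪ",), coda=("ŋ",))
# The mirrored form /ŋɪs/ starts with /ŋ/ and is therefore rejected.
ngis = Syllable(onset=("ŋ",), nucleus=("ɪ",), coda=("s",))

print(english_violations(sing))  # []
print(english_violations(ngis))  # ['/ŋ/ in onset']
```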

N-American English Phoneme Set

German Phoneme Set
- more vowels (/y/, /ʏ/, /œ/) and fewer diphthongs than English
- similar consonants (but their realization differs, e.g. aspiration)

Units of Speech: Phones vs. Phonemes
- phones: speech sounds (→ Phonetics); distinguishable units; language independent; signifiant
- phonemes: linguistic symbols (→ Phonology); distinctive units; every language has its own phoneme system; signifié
- notational convention: "examples" in quotes, /phonemes/ in slashes, [phones] in brackets
- minimal pairs: "bat", "rat", "cat" → /b/, /r/, /k/ are phonemes in English, thus different phones
- one's articulatory and perceptual capacities are shaped by the mother tongue(s): different sounds may sound identical or be hard to pronounce
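
The minimal-pair test can be written down as a small search over a lexicon: two words form a minimal pair if their phoneme sequences have the same length and differ in exactly one position. This is a sketch under assumptions: the `minimal_pairs` function is a hypothetical helper, the toy lexicon and its broad transcriptions (with /æ/ as the vowel, plus the extra word "rats") are added here for illustration and are not from the slides.

```python
from itertools import combinations


def minimal_pairs(lexicon):
    """Return (word1, word2, (phoneme1, phoneme2)) for all pairs of words
    whose transcriptions differ in exactly one position."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        if len(p1) != len(p2):
            continue  # different length: not a minimal pair in this simple sense
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0]))
    return pairs


# Toy lexicon (illustrative transcriptions, assumed for this example).
lexicon = {
    "bat": ("b", "æ", "t"),
    "rat": ("r", "æ", "t"),
    "cat": ("k", "æ", "t"),
    "rats": ("r", "æ", "t", "s"),
}

for w1, w2, (a, b) in minimal_pairs(lexicon):
    print(f'"{w1}" vs. "{w2}": /{a}/ contrasts with /{b}/')
```

Each contrast found this way is evidence that the two sounds are separate phonemes of the language; "rats" is skipped because no word of the same length differs from it in one position.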

Phonotactics
- words have a phonemic representation in the mental lexicon; phonotactics determines the realization: "probably" → /'prabəbli/ → [prɑːbəbli]
- material is often left out in faster speech (elision): "probably" → [prɑːwliː]
- this is also (partly) determined by phonotactics and highly context-dependent (speed, setting, ...)

Speech: the continuous signal of a symbolic system (language).

Acoustic (and other 1-dimensional) Signals
- x(t): pressure differential in air over time
- non-stationary: the signal changes over time
- when voiced, the signal is a quasi-periodic oscillation
- complex signal consisting of multiple harmonics
- [figure: waveform; x-axis: time]

Complex Periodic Signals
- simplest signal: sine wave, described by frequency (= 1/period), amplitude, and phase
- every periodic signal can be composed from a (possibly infinite) number of sine waves
- e.g. the sawtooth signal
- [figure: sawtooth waveform; x-axis: time, y-axis: amplitude]

Fourier Synthesis
- sawtooth signal: $x(t) = \sum_{k=1}^{\infty} \frac{\sin(2 \pi k f t)}{k}$
- approximate with fewer (than infinitely many) sine waves
- [figure: four waveform panels over time: 220 Hz; 440 Hz; 220+440 Hz; 220+440+...+3300 Hz]
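
The approximation with a finite number of sine waves can be tried out directly. The sketch below sums the first harmonics of the sawtooth series, with harmonic k weighted by 1/k, for a fundamental of 220 Hz as in the slide's panels. The function names, the sample rate of 16000 Hz and the duration of 0.05 s are assumptions made for this example.

```python
import math


def sawtooth_partial_sum(t, f, n_harmonics):
    """First n_harmonics terms of x(t) = sum over k of sin(2*pi*k*f*t) / k."""
    return sum(math.sin(2 * math.pi * k * f * t) / k
               for k in range(1, n_harmonics + 1))


def synthesize(f, n_harmonics, sample_rate=16000, duration=0.05):
    """Sample the partial sum on a regular time grid.

    sample_rate and duration are illustrative defaults, not values from the slides.
    """
    n_samples = round(sample_rate * duration)
    return [sawtooth_partial_sum(n / sample_rate, f, n_harmonics)
            for n in range(n_samples)]


one = synthesize(220.0, 1)       # a single 220 Hz sine wave
two = synthesize(220.0, 2)       # 220 Hz + 440 Hz
fifteen = synthesize(220.0, 15)  # 220 Hz + 440 Hz + ... + 3300 Hz

print(len(fifteen), max(fifteen))
```

Plotting `one`, `two` and `fifteen` should show the waveform getting closer to a sawtooth shape as harmonics are added.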

Fourier Analysis
- every complex signal can be analysed into its constituent sine waves (frequency, phase, amplitude): Fourier's theorem
- speech signal: x-axis: time, y-axis: amplitude
- FFT spectrum: x-axis: frequency, y-axis: amplitude
- phase is often ignored
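
As an illustration of the analysis direction, the sketch below builds a signal from a 220 Hz sine (amplitude 1) and a 440 Hz sine (amplitude 0.5) and recovers both components from the samples. It uses a naive discrete Fourier transform rather than an FFT, which gives the same result but more slowly. The function names, the sample rate of 8000 Hz and the signal length are assumptions chosen so that both frequencies fall exactly on analysis bins.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, O(N^2); returns bins 0 .. N/2."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))
            for k in range(n // 2 + 1)]


def amplitude_spectrum(samples, sample_rate):
    """List of (frequency in Hz, amplitude) pairs; the phase is discarded."""
    n = len(samples)
    return [(k * sample_rate / n, 2 * abs(c) / n)
            for k, c in enumerate(dft(samples))]


sample_rate = 8000  # assumed for this example
n = 400             # 0.05 s of signal; bin spacing is 8000 / 400 = 20 Hz
signal = [math.sin(2 * math.pi * 220 * i / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 440 * i / sample_rate)
          for i in range(n)]

spectrum = amplitude_spectrum(signal, sample_rate)
peaks = [(round(freq), round(amp, 3)) for freq, amp in spectrum if amp > 0.1]
print(peaks)  # [(220, 1.0), (440, 0.5)]
```

Taking `abs(c)` is what "phase is often ignored" means in practice: only the magnitude of each complex coefficient is kept.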

The human ear performs frequency analysis.

Auditory Processing
- large spikes from harmonics of the fundamental frequency
- the signal envelope is registered by the auditory organ
- speech sounds result in characteristic peaks in the signal envelope → formants
- exception: non-harmonic sounds, such as plosives

Formants
- the auditory organ performs frequency analysis
- peaks mask close-by but smaller peaks
- only the largest peaks are tracked and amplified → formants
- schwa sound (mid-central vowel): peaks at ~500 Hz, 1500 Hz, 2500 Hz (depends on the length of the vocal tract)
- vowel triangle: positions of vowels relative to the 1st and 2nd formant
- figure derived from Wikimedia Commons; CC-BY-SA-2.5
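
The dependence of the schwa peaks on vocal tract length can be made concrete with the standard uniform-tube approximation: a tube closed at one end (glottis) and open at the other (lips) resonates at F_n = (2n - 1) * c / (4 * L). This model is not stated on the slide; it is added here as an assumption, together with a speed of sound of 350 m/s and a tract length of 17.5 cm, which are illustrative values. The function name `tube_formants` is hypothetical.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L); a rough model of the schwa, not of other vowels.
    """
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, n_formants + 1)]


# Assumed vocal tract length of 17.5 cm.
print([round(f) for f in tube_formants(0.175)])  # [500, 1500, 2500]

# A shorter vocal tract shifts all resonances upwards.
print([round(f) for f in tube_formants(0.145)])
```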

Speech varies over time.

Spectrogram
- display the changing spectrum over time
- slice the signal into (overlapping) windows
- analyze the windows individually (using Fourier analysis)
- use colors to draw the spectrum strength
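
The slicing and per-window analysis can be sketched as follows. The test signal switches from 500 Hz to 1500 Hz halfway through, so the strongest frequency differs between the first and the last window. The Hann window, the window size of 160 samples (20 ms), the hop of 80 samples (10 ms) and the sample rate of 8000 Hz are assumptions for this example, and a naive DFT stands in for the FFT. Drawing the result in colors is left out.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (tapers both ends of a frame towards zero)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0 .. N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, window_size, hop_size):
    """One magnitude spectrum per (overlapping) window of the signal."""
    window = hann(window_size)
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop_size):
        chunk = samples[start:start + window_size]
        frames.append(magnitude_spectrum([s * w for s, w in zip(chunk, window)]))
    return frames


sample_rate, window_size, hop_size = 8000, 160, 80  # assumed values
signal = [math.sin(2 * math.pi * (500 if i < 400 else 1500) * i / sample_rate)
          for i in range(800)]

spec = spectrogram(signal, window_size, hop_size)


def peak_hz(frame):
    """Frequency of the strongest bin in one frame."""
    return frame.index(max(frame)) * sample_rate / window_size


print(len(spec), peak_hz(spec[0]), peak_hz(spec[-1]))  # 9 500.0 1500.0
```

Each row of `spec` is one time slice and each column one frequency bin, which is the grid a spectrogram display colors in.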

Thank you. baumann@informatik.uni-hamburg.de https://nats-www.informatik.uni-hamburg.de/slp16 Universität Hamburg, Department of Informatics Natural Language Systems Group

Further Reading
Speech Signal Representation:
- P. Taylor (2009): Text-to-Speech Synthesis. Cambridge Univ Press. ISBN: 9780521899277. InfBib: A TAY 43070
- D. Jurafsky & J. Martin (2009): Speech and Language Processing. Pearson International. InfBib: A JUR 4204x
Phonetics:
- M. Pétursson & J. Neppert (1996): Elementarbuch der Phonetik. Buske.
- J. Neppert (1999): Elemente einer akustischen Phonetik. Buske.
Phonology/Phonotactics/Phonological Systems:
- E. Ternes (1999): Einführung in die Phonologie. Wiss. Buchgesellschaft. ISBN: 978-3534138708.

Desired Learning Outcomes
- understand the basics of phonetics: voiced/unvoiced sounds, place and manner of articulation, ...
- formants explain vowel perception
- phonetics vs. phonology: (ir)relevance of variability
- understand Fourier synthesis: all waveforms can be synthesized from sine waves
- correspondingly, all waveforms can be analyzed into their constituent sine waves: frequency, phase, amplitude
- speech varies over time, hence we use sliding windows