Acoustic Phonetics Part 2

Lecturer: Dr Anna Sfakianaki
HY578 Digital Speech Signal Processing, Spring Term 2016-17
CSD, University of Crete

INTERPRETING SPECTROGRAMS (I)
In connected speech, many of the sounds are more difficult to distinguish.
Transcribe the segments in the following phrase: "She came back and started again." (American English)
[Figure: spectrogram of the phrase with a segment-by-segment transcription]

INTERPRETING SPECTROGRAMS (II)
"I should have thought spectrograms were unreadable." (British English)
- Find the obvious things first, i.e. [s, ʃ], which stand out.
- Start at the beginning and find the vowel [aɪ] in the first word.
- Then find the vowel in "thought" before the [s], and then the [t] in "thought".
- The whole of the phrase "should have" seems to have been pronounced without any voicing: [aɪʃtfθɔt].
[Figure: spectrogram with segment labels up to the [s] of "spectrograms"]

INTERPRETING SPECTROGRAMS (II)
"I should have thought spectrograms were unreadable."
Try to transcribe "spectrograms were unreadable", remembering that some of the sounds you might expect to be voiced may be voiceless.
- No aspiration after [p].
- [ɛ] is very short, but you can see F2 and F3 coming together for the [k].
- [t] is highly aspirated, so the following [r] becomes voiceless. Same with [].
[Figure: spectrogram with segment labels up to the [t] of "spectrograms"]

INTERPRETING SPECTROGRAMS (II)
"I should have thought spectrograms were unreadable."
- The velar stop [g] is released into an [ɹ], located by the lowering of F3 and F4.
- The fricative after the [m] appears to be voiceless and of less intensity than an [s].
- The [w] is distinguishable by the low F2 of the following vowel.
- The lowering of F3 marks the [r] in "were".
[Figure: spectrogram with segment labels up to the [w] of "were"]

INTERPRETING SPECTROGRAMS (II)
"I should have thought spectrograms were unreadable."
- The lowering of F3 marks the [r] in "unreadable".
- [d] and [ə] are very short.
- The final syllabic /l/ looks like a back vowel.
[Figure: spectrogram with segment labels for the whole sentence]

INTERPRETING SPECTROGRAMS (III)
An English sentence spoken by a British English speaker. Try to identify the segments in the sentence.
- What do you observe at (14-15)?
- What must be there when the third formant is below 2000 Hz?
- Can you discern a distinctive pattern of F2 and F3 at (26) and (24-25)?

INTERPRETING SPECTROGRAMS (III)
(1): Small fricative noise near 3000 Hz.
(2): A vowel that may be [i] or [ɪ].
(3-4): A sharp break in the pattern and faint formants at about 250, 1300, and 2400 Hz: nasal or lateral.
(5): This vowel looks like [æ] or [ɛ].
(6): Fricative of low energy, [f] or [θ].

INTERPRETING SPECTROGRAMS (III)
(7): Voiceless stop: [p], [t] or [k].
(8): Aspiration strong at high frequency, most likely a [t].
(9): The vowel has a low F1 and a high F2, so it's either [i] or [ɪ].
(10): F2 falls slightly: diphthongization.

INTERPRETING SPECTROGRAMS (III)
Candidate segments for (1)-(10): i l æ t i (alternatives: n, m, f, k, p)
Possible words: he laugh/left here

INTERPRETING SPECTROGRAMS (III)
(13-14): Fricative, [f] or [θ].
(15): Low F3, indicating [ɹ].
(16-17): Vowel with low F1 and high F2: [i].
(17-18): Voicing near the baseline and an intense, high-frequency burst: [d].
(20-21): Long, high, front vowel (diphthong): [eɪ].
(23): Fricative like [s], but given the lack of intensity, [z] with faint voicing.
(24): Very short vowel, probably [ə].
(25-26): Velar pinch: velar stop.
(27-29): Long vowel (diphthong) ending in a back low vowel [].
Result: "he left here three days ago"

TYPES OF SPECTROGRAMS
- Wide-band spectrograms
- Narrow-band spectrograms
[Figure: wide-band and narrow-band spectrograms of "Is Pat sad or mad?"]

TYPES OF SPECTROGRAMS
Wide-band spectrograms
- Very accurate in the time dimension:
  - They show each vibration of the vocal folds as a separate vertical line.
  - They indicate the precise moment of a stop burst with a vertical spike.
- Less accurate in the frequency dimension:
  - There are usually several component frequencies present in a single formant, all of them lumped together in one wide band on the spectrogram.
Narrow-band spectrograms
- More accurate in the frequency dimension (at the expense of accuracy in the time dimension).
- The spikes of stop releases are smeared in the time dimension.
- The frequencies that compose each formant are visible.
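The trade-off between the two spectrogram types follows from the length of the analysis window: a short window localizes events in time but blurs frequency, and a long window does the opposite. The sketch below is only a rough illustration. The 16 kHz sample rate, the 5 ms and 30 ms window lengths, and the rule of thumb that frequency resolution is about the sample rate divided by the window length are assumptions, not values taken from the slides, and the function names are hypothetical.

```python
# Rough illustration of the time/frequency trade-off between wide-band and
# narrow-band spectrograms. All numeric settings here are assumptions.
SAMPLE_RATE_HZ = 16000


def analysis_resolution(window_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (window length in samples, approximate frequency resolution in Hz)."""
    n_samples = round(sample_rate_hz * window_ms / 1000)
    # Rule of thumb: resolution is about 1 / window duration; the exact value
    # depends on the window shape, which is ignored here.
    freq_res_hz = sample_rate_hz / n_samples
    return n_samples, freq_res_hz


def describe(window_ms, f0_hz):
    """Say what a spectrogram with this window can separate for a voice at f0_hz."""
    _, freq_res_hz = analysis_resolution(window_ms)
    pitch_period_ms = 1000 / f0_hz
    return {
        # Window shorter than one glottal cycle: each vocal fold vibration
        # appears as a separate vertical line.
        "shows_glottal_pulses": window_ms < pitch_period_ms,
        # Resolution finer than the harmonic spacing (= F0): the individual
        # frequencies that compose each formant are visible.
        "shows_harmonics": freq_res_hz < f0_hz,
    }


wide = describe(window_ms=5, f0_hz=100)     # short window: wide-band
narrow = describe(window_ms=30, f0_hz=100)  # long window: narrow-band
print("wide-band:", wide)
print("narrow-band:", narrow)
```

With these assumed settings, the short window resolves glottal pulses but not harmonics, and the long window does the reverse, which matches the description of the two spectrogram types above.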

FEMALE VOICE
- Women's voices usually have a higher pitch.
- The higher the F0, the more difficult it is to locate formants, because the harmonics interfere with the display of formants.
[Figure: spectrograms of the Greek phrase Λέγε «παππού» πάλι ("Say 'grandfather' again") uttered by a male and a female Greek adult]
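One way to see why a higher F0 makes formants harder to locate: voiced speech only has energy at multiples of F0, so the higher the F0, the more widely spaced the harmonics and the more likely a formant peak falls between two of them. The sketch below measures the distance from a formant peak to the nearest harmonic. The formant frequency and both F0 values are made-up illustrative numbers, and the function name is hypothetical.

```python
def harmonic_sampling_error(f0_hz, formant_hz):
    """Distance in Hz from a formant peak to the nearest harmonic of f0_hz."""
    nearest_k = max(1, round(formant_hz / f0_hz))
    return abs(nearest_k * f0_hz - formant_hz)


FORMANT_HZ = 700  # illustrative formant peak, not a measured value

low_pitch_error = harmonic_sampling_error(120, FORMANT_HZ)   # lower-pitched voice
high_pitch_error = harmonic_sampling_error(250, FORMANT_HZ)  # higher-pitched voice

print(low_pitch_error, high_pitch_error)
```

In the worst case the nearest harmonic is half of F0 away from the peak, so the uncertainty grows in proportion to F0.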

7. INDIVIDUAL DIFFERENCES
It is important to know what sort of differences exist between different speakers:
1. When trying to measure features that are linguistically significant, one must know how to discount purely individual features.
2. When trying to find out whether a speaker has speech problems.
3. For valid speaker identification in forensic situations.
Individual variation is readily apparent when studying spectrograms (relative quality).

7. INDIVIDUAL DIFFERENCES
Vowels pronounced by 2 speakers of Californian English:
- Same phonetic quality
- Similar relative positions
- Different absolute values

7. INDIVIDUAL DIFFERENCES
- There is no simple technique to average out individual characteristics so that a formant plot shows only the phonetic qualities of vowels.
- F4 is an indicator of an individual's head size:
  - Express the values of the other formants as percentages of the mean F4.
  - F4 values are not usually reported.
- Phoneticians do not really know how to compare acoustic data on the sounds of one individual with those of another.
- We cannot write a computer program that will accept any individual's vowels as input and then output a narrow phonetic transcription.
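The idea of expressing the other formants as percentages of the mean F4 can be sketched in a few lines. All formant values below are invented for two hypothetical speakers and are chosen so that the effect is easy to see; real data would not line up this neatly, which is the point made in the slide above. The function name is hypothetical.

```python
def normalize_by_f4(formants_hz, mean_f4_hz):
    """Express each formant as a percentage of the speaker's mean F4."""
    return {name: 100 * value / mean_f4_hz for name, value in formants_hz.items()}


# Made-up values: two hypothetical speakers producing the same vowel,
# with different absolute formant frequencies.
speaker_a = normalize_by_f4({"F1": 300, "F2": 2200, "F3": 3000}, mean_f4_hz=4000)
speaker_b = normalize_by_f4({"F1": 270, "F2": 1980, "F3": 2700}, mean_f4_hz=3600)

print(speaker_a)
print(speaker_b)
```

In this constructed example the absolute values differ between the speakers, but the values relative to F4 coincide.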

8. SPEECH SYNTHESIS & PROSODY
- A large part of applied phonetics work is concerned with computer speech technology directed towards improving speech synthesis systems.
- The greatest challenges in the field of speech synthesis concern intonation and rhythm.
- Stereotyped intonation results in unnatural speech.
- Getting pitch changes and rhythm right depends on:
  - The speaker's attitude towards the world and the specific topic
  - Emphasis
  - Syntax of the utterance
  - Higher-level pragmatic considerations
  - Segmental influences

9. SPEECH RECOGNITION
Systems can recognize:
- Single words
- Limited sets of words in task-specific situations with structured dialogue (a limited set of possible answers)
Yet to achieve:
- An accurate written transcript of ordinary speech as spoken by people with a wide range of accents and different personal characteristics

10. FORENSIC PHONETICS
- Speaker identification in legal proceedings.
- Voice-prints: spectrograms of a person's voice
  - Said to be as individual as fingerprints: a greatly exaggerated claim.
  - Some individual characteristics are recorded on spectrograms.
Individual characteristics on spectrograms
- Position of F4 and higher formants: speaker's voice quality
- Locations of higher formants in nasals: individual physiological characteristics
- Speaker's speech habits:
  - Length and type of aspiration after initial voiceless stops
  - Rate of formant transition after voiced stops
  - Mean pitch
  - Range of F0
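Two of the characteristics listed above, mean pitch and range of F0, are simple summary statistics over an F0 track. The sketch below assumes the track is a list of per-frame F0 estimates in Hz, with unvoiced frames marked as 0 or None. That representation, the sample values, and the function name are assumptions for illustration only.

```python
from statistics import mean


def f0_summary(f0_track_hz):
    """Mean pitch and F0 range over the voiced frames of an F0 track.

    Unvoiced frames are assumed to be marked as 0 or None and are skipped.
    """
    voiced = [f for f in f0_track_hz if f]
    return {"mean_hz": mean(voiced), "range_hz": max(voiced) - min(voiced)}


# Invented F0 track for illustration.
track = [0, 110, 120, 130, None, 100, 0]
summary = f0_summary(track)
print(summary)
```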

10. FORENSIC PHONETICS
- Nobody knows how many individuals share similar characteristics.
- An expert's opinion on the probability of two voices being the same has evidential value.
- No two cases (recordings) are ever the same:
  - Recording quality
  - Recording duration
  - Word content
  - Speech style (natural, emotional etc.)
- Elaborate prior testing is needed.
- Likelihood ratio = (likelihood that the voices are the same) / (likelihood that the voices are different)
Visit: Forensic Speech Science, University of York: https://sites.google.com/site/yorkfss/home
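The likelihood ratio above can be written as a one-line computation. The sketch below reads the two likelihoods as the probability of the observed evidence under the same-speaker hypothesis and under the different-speaker hypothesis, which is one common interpretation and is an assumption here. The probabilities are invented, and in practice estimating them is the hard part that the prior testing is meant to address.

```python
def likelihood_ratio(p_evidence_if_same, p_evidence_if_different):
    """Ratio of the two likelihoods; values above 1 favour 'same speaker'."""
    if p_evidence_if_different <= 0:
        raise ValueError("likelihood under the different-speaker hypothesis must be positive")
    return p_evidence_if_same / p_evidence_if_different


# Invented probabilities for illustration only.
lr = likelihood_ratio(p_evidence_if_same=0.5, p_evidence_if_different=0.125)
print(lr)
```

A ratio above 1 means the evidence is more probable if the voices are the same, a ratio below 1 means it is more probable if they are different, and a ratio near 1 means the evidence does little to separate the two hypotheses.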

READ & VISIT
Visit the websites:
- https://corpus.linguistics.berkeley.edu/acip/
  Material for chapter 8 from UC Berkeley Linguistics, A Course in Phonetics, including online exercises
- http://home.cc.umanitoba.ca/~robh/howto.html
  Monthly Mystery Spectrogram Webzone, Rob Hagiwara's professional webspace
- http://www.youtube.com/watch?v=gg4ihbiitd0
  Introduction to spectrogram analysis (FloridaLinguistics.com)
- http://www.linguistics.ucla.edu/people/hayes/103/spectrogramreading/index.htm
  Spectrogram reading practice (by Bruce Hayes, UCLA)
- http://www.oddcast.com/home/demos/tts/tts_example.php
  Text-to-Speech synthesis avatars

EXERCISE A (p. 215)
Put a transcription of the segments in the phrase "Please pass me my book" above the waveform. Draw lines showing the boundaries between the segments.

EXERCISE B (p. 215)
The spectrogram shows the phrase "Show me a spotted hyena." Put a transcription above it, and show the segment boundaries. In places where there are no clear boundaries (as in the first part of "hyena"), draw dashed lines.