Effects of vowel types on perception of speaker characteristics of unknown speakers

Effects of vowel types on perception of speaker characteristics of unknown speakers. ATR Human Information Science Laboratories, Tatsuya Kitamura and Parham Mokhtari. This research was supported by the Ministry of Internal Affairs and Communications under their Strategic Information and Communications R&D Programme. 1

Introduction. The speech signal conveys linguistic info, but also identity, emotional state, health condition, etc. These speaker individualities make up an "auditory face" (Belin et al. 2004). 2

Speaker individualities. Humans have individual voice characteristics, just as they have individual faces. The acoustic characteristics of speech sounds vary with phoneme, pitch frequency, speaking style, emotional state, health condition, communication channel, etc. (intra-speaker variation). Yet humans can robustly identify speakers of familiar voices. Why? How? 3

Aim. To investigate human abilities to identify speakers despite intra-speaker variations. Hypotheses: (1) humans perceive speaker individualities that persist across intra-speaker variations; (2) humans have models of intra-speaker variation in their minds. Three psychoacoustic experiments focused on intra-speaker variations due to vowel differences. 4

Experiments 1 & 2: speaker identification tests. To confirm whether there are speaker individualities common to sustained vowels; to investigate effects of dynamic features on identifying the speakers; and to show effects of pitch frequency (F0) on identifying the speakers of vowels and sentences. 5

Experiments 1 & 2: speech data & participants. Experiment 1: 4 sustained Japanese vowels (/a/, /e/, /i/, and /o/), approx. 0.6 sec each, uttered by 4 male native Japanese speakers. Experiment 2: 3 Japanese sentences uttered by 4 male speakers: /aɾajɯɾɯ genʒitsɯ o sɯbete ʒibun no ho:e neʒimagetanoda/, /iʃʃɯ:kan bakaɾi njɯ:jo:kɯ o ʃɯzai ʃita/, and /teɾebi ge:mɯ ja pasokon de ge:mɯ o ʃite asobɯ/. Sampling rate (Fs): 16 kHz. 2 tokens for each vowel and sentence. 9 listeners (2 males and 7 females). 6

Experiments 1 & 2: stimuli. Experiment 1: V1 = speech waves with normalized amplitude; V2 = speech waves with normalized amplitude and pitch frequency. Experiment 2 (pitch contours were retained): S1 = speech waves with normalized amplitude; S2 = speech waves with normalized amplitude and pitch frequency. Pitch frequencies of V2 and S2 were tuned to a mean value using the STRAIGHT analysis-synthesis system (Kawahara et al. 1999). 7
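The slides name STRAIGHT for this step; below is a minimal sketch of the same kind of manipulation using the freely available WORLD vocoder (pyworld) as a stand-in. The file names, the peak-normalization method, and flattening each utterance to its own mean F0 are assumptions, not details taken from the study.

    # Sketch: amplitude + F0 normalization of a sustained vowel (V2-style),
    # using pyworld as a stand-in for the STRAIGHT analysis-synthesis system.
    import numpy as np
    import soundfile as sf
    import pyworld as pw

    x, fs = sf.read("vowel_a_speakerA.wav")   # hypothetical mono input, Fs = 16 kHz
    x = np.ascontiguousarray(x, dtype=np.float64)

    # Analysis: F0 contour, spectral envelope, aperiodicity.
    f0, t = pw.harvest(x, fs)
    sp = pw.cheaptrick(x, f0, t, fs)
    ap = pw.d4c(x, f0, t, fs)

    # Tune voiced frames to a single mean F0 value (here the utterance's own
    # mean; the study's exact target value is not stated on the slides).
    voiced = f0 > 0
    f0_flat = np.where(voiced, f0[voiced].mean(), 0.0)

    # Resynthesis, then simple peak amplitude normalization (assumed method).
    y = pw.synthesize(f0_flat, sp, ap, fs)
    y *= 0.9 / np.max(np.abs(y))
    sf.write("stimulus_V2.wav", y, fs)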

Experiments 1 & 2: procedure. ABX test: participants were asked to select which of the first two speakers produced the third stimulus. Example: A = /a/ of speaker B; B = /a/ of speaker A; X = /a/, /i/, or /o/ of speaker A or B. [Fig.: ABX sequence for Exp. 1; the reference vowels A and B and the test stimulus X are presented along the time axis with 1-sec intervals.] 8
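For concreteness, here is a minimal sketch of how ABX trials of this kind can be constructed and scored; the token-list structure and function names are hypothetical, not the authors' software.

    # Sketch: constructing and scoring ABX trials (Experiment-1 style).
    import random

    def make_abx_trial(tokens_spkA, tokens_spkB):
        """A and B are reference /a/ tokens from two speakers; X is a vowel
        token (/a/, /i/, or /o/) from one of them, chosen at random."""
        a = random.choice(tokens_spkA["a"])
        b = random.choice(tokens_spkB["a"])
        answer = random.choice(["A", "B"])
        source = tokens_spkA if answer == "A" else tokens_spkB
        x = random.choice(source[random.choice(["a", "i", "o"])])
        return a, b, x, answer

    def percent_correct(responses, answers):
        correct = sum(r == t for r, t in zip(responses, answers))
        return 100.0 * correct / len(answers)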

Experiments 1 & 2: results. [Figure: percent correct per condition, chance level marked in each panel; ** = statistically significant difference (p < .01). Panels: Stimulus V1, amplitude-normalized vowels (/a/, /i/, /o/); Stimulus V2, F0-normalized vowels (/a/, /i/, /o/); Stimulus S1, amplitude-normalized sentences (1-3); Stimulus S2, F0-normalized sentences (1-3).] 9
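Chance level in this two-alternative ABX task is 50%, so per-condition scores can be tested against chance with a binomial test. Whether the authors used exactly this test is not stated on the slides, and the trial counts below are placeholders.

    # Sketch: binomial test of ABX percent-correct against the 50% chance level.
    from scipy.stats import binomtest

    n_trials = 72     # placeholder trial count, not from the study
    n_correct = 58    # placeholder number of correct responses
    result = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
    print(f"{100 * n_correct / n_trials:.1f}% correct, p = {result.pvalue:.4f}")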

Experiments 1 & 2: discussion. Possible explanations for the difference between the results of Exp. 1 and Exp. 2: 1. The participants could obtain dynamic features of each speaker from the different sentences, which were absent in the sustained-vowel stimuli. 2. They needed speech sounds with dynamic variations in order to obtain invariant static features as cues to speaker identification. 3. They identified the speakers using speaker characteristics obtained from phonemes common to the sentences. 10

Experiment 3. To investigate possible speaker characteristics common to sustained vowels, focusing on the higher frequency region of speech spectra and the glottal source pattern. 11

Experiment 3: method. Speech data: 3 sustained Japanese vowels (/a/, /e/, and /o/), approx. 1 sec each, uttered by 6 male speakers; 3 tokens for each vowel; Fs = 16 kHz. ABX test procedure; the reference vowel is /a/. 12 female participants. Stimuli (4 types, derived from the original speech wave): stimulus 1 = amplitude normalized; stimulus 2 = F0 normalized (fixed to a constant value); stimulus 3 = down-sampled (Fs = 5 kHz); stimulus 4 = frame sequence randomized. 12
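A minimal sketch of the two less-common manipulations, down-sampling and frame-sequence randomization, follows; the 20-ms frame length, file names, and use of scipy are assumptions (the slides do not give these details).

    # Sketch: Experiment-3-style down-sampling and frame randomization.
    import numpy as np
    import soundfile as sf
    from scipy.signal import resample_poly

    x, fs = sf.read("vowel_a.wav")        # hypothetical mono input, Fs = 16 kHz

    # Stimulus-3 style: 16 kHz -> 5 kHz, discarding energy above 2.5 kHz.
    y_down = resample_poly(x, up=5, down=16)

    # Stimulus-4 style: cut the wave into short frames and shuffle their order,
    # destroying temporal dynamics while keeping short-time spectra.
    frame_len = int(0.02 * fs)            # assumed 20-ms frames
    n = (len(x) // frame_len) * frame_len
    frames = x[:n].reshape(-1, frame_len)
    rng = np.random.default_rng(0)
    y_rand = frames[rng.permutation(len(frames))].ravel()

    sf.write("stimulus3.wav", y_down, 5000)
    sf.write("stimulus4.wav", y_rand, fs)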

Experiment 3: results. [Figure: percent correct for /a/, /e/, /o/ under each stimulus type, chance level marked in each panel; ** = statistically significant difference (p < .01). Panels: Stimulus 1, amplitude normalized; Stimulus 2, F0 normalized; Stimulus 3, down-sampled; Stimulus 4, frame randomized.] 13

Conclusions. 1. There are speaker individualities common to the sustained vowels. 2. The possible cues common to the sustained vowels are: (1) the mean pitch frequency, (2) the higher frequency region of speech spectra, and (3) glottal source patterns. 3. The mean pitch frequency is not a significant cue for identifying the unknown speaker of sentences. 14


Speech spectra. [Figure: speech spectra of /a/, /e/, and /o/ for Speaker A and Speaker B, frequency range 0-8 kHz.] 16

Speech production model proposed by Honda et al. (2004). Source-filter model of speech production (Fant 1970): glottal source → vocal tract filter → radiation → speech. Honda's model: glottal source → hypopharynx filter → vocal tract proper filter → radiation → speech. 17
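In frequency-domain terms this cascade is often written as a product of transfer functions; a standard formulation (notation assumed, not from the slides) is

    S(f) = G(f) · H_hp(f) · H_vt(f) · R(f)

where G is the glottal source spectrum, H_hp the hypopharyngeal filter, H_vt the vocal tract proper, and R the radiation characteristic; Fant's original model merges H_hp and H_vt into a single vocal tract filter H(f).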

Experiment 1: participants. 9 listeners (2 males and 7 females). They had never met the speakers nor heard the speakers' voices. No hearing impairments. 18