emotional speech
Advanced Signal Processing, Winter Term 2003
Franz Zotter


contents
- emotion psychology
- articulation of emotion: physical, facial, speech
- acoustic measures: features, recognition
- affect bursts
- emotional speech detection / synthesis: applications
- synthetic feature generation methods
- HMM (hidden Markov models)
- neural network models
- available systems

emotion psychology
- C. Darwin: archetypes of emotion, with biological survival reasons (anger, disgust, fear, sadness, surprise, happiness)
- W. James: biological reasons, we feel bodily changes: "we are afraid because we tremble"
- cognitive (M. Arnold): emotion determined by appraisal (considering novelty, pleasantness, responsibility, effort)
- social constructivist (J. Averill): culturally based emotional behaviour, social rules and moral values

physical features of emotion
autonomic nervous system (sympathetic / parasympathetic) [Janet E. Cahn]
- fear/anger: short respiration cycles, irregular respiration rhythm, high subglottal pressure, dry mouth, muscle tremor, high blood pressure and heart rate
- relaxation/grief: smooth respiration cycles and rhythm, low subglottal pressure, increased salivation, low blood pressure and heart rate
[pic: Akemi Iida]

facial features of emotion
- happiness, anger, surprise, fear, sadness, disgust
- facial expression is recognized very accurately and carries conscious (controlled) as well as unconscious information about emotion
gestures / postures
- gestures: mostly with the hands and in motion
- posture: e.g. turning one's back on somebody, crossing the arms on the chest, etc.

emotion in speech (2)
impacts on speech [Janet E. Cahn]:
- fear/anger: increased speed and loudness, higher pitch, expanded pitch range, disturbed speech rhythm, precise articulation, increased high-frequency energy
- relaxation/grief: low speed and loudness, low pitch, smaller pitch range, smooth speech rhythm and fluent speech, imprecise articulation (formants move towards schwa), decreased high-frequency energy

emotion in speech (1)
[Klaus Scherer, Brunswikian lens model]

acoustic measures [Janet E. Cahn]
- pitch (F0): pitch range, pitch average, contour slope (up/down), accent shape and range
- harmonicity: breathiness (amount of respiration noise); laryngealisation due to small subglottal pressure (narrow pulse shape, irregular period); tremor / jitter (irregular pitch period T)
- brilliance (energy ratio between high and low frequencies)
- loudness (psycho-acoustic weighting)
- timing: intensity contour (pauses, hesitation), word duration, vowel / consonant duration, intensity of plosive bursts
- spectral information: formant positions and bandwidths, articulation precision

acoustic measures (1/4) [Janet E. Cahn]
pitch (F0):
- pitch range
- pitch average
- contour slope (up/down)
- accent shape and range
[Akemi Iida: pitch contour, histogram]
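
The pitch statistics above can be sketched in a few lines of plain Python. This is an illustration only; the function name, the frame rate, and the example contour are my own assumptions, not from the slides, and "contour slope" is approximated by a least-squares fit over the voiced frames.

```python
# Illustrative sketch: summary statistics over a frame-wise F0 track (Hz).
# Unvoiced frames are marked with F0 = 0 and skipped.

def f0_statistics(f0, frame_rate=100.0):
    """Return pitch average (Hz), pitch range (Hz), and contour slope (Hz/s)."""
    voiced = [f for f in f0 if f > 0]
    avg = sum(voiced) / len(voiced)
    rng = max(voiced) - min(voiced)
    # least-squares slope over time as a crude "contour slope (up/down)"
    t = [i / frame_rate for i, f in enumerate(f0) if f > 0]
    mt = sum(t) / len(t)
    slope = sum((ti - mt) * (fi - avg) for ti, fi in zip(t, voiced)) \
            / sum((ti - mt) ** 2 for ti in t)
    return avg, rng, slope

# Example: a contour rising linearly from 120 Hz to 160 Hz over 11 frames
contour = [120 + 4 * i for i in range(11)]
avg, rng, slope = f0_statistics(contour)
# average ~140 Hz, range 40 Hz, slope ~ +400 Hz/s (rising)
```

A positive slope marks a rising contour, a negative one a falling contour; accent shape would need a per-accent analysis rather than one global fit.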

acoustic measures (2/4)
harmonicity:
- breathiness: amount of respiration noise
- laryngealisation: due to small subglottal pressure (narrow pulse shape, irregular period)
- tremor / jitter: irregular pitch period T
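
A minimal sketch of the jitter measure, assuming the common "local jitter" definition (mean absolute difference between consecutive pitch periods, relative to the mean period); the function name and test values are illustrative, not from the slides.

```python
def jitter_percent(periods):
    """Local jitter in percent from a list of pitch periods (ms)."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period

# A perfectly regular 10 ms period train has zero jitter.
regular = jitter_percent([10.0, 10.0, 10.0, 10.0, 10.0])
# Alternating 9.8 / 10.2 ms periods give about 4 % local jitter.
irregular = jitter_percent([9.8, 10.2, 9.8, 10.2])
```

Higher jitter values correspond to the irregular pitch periods associated here with tremor and laryngealisation.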

acoustic measures (3/4)
- brilliance (energy ratio between high and low frequencies)
- loudness (psycho-acoustic weighting)
- timing: intensity contour (pauses, hesitation), word duration, vowel / consonant duration, intensity of plosive bursts
[Akemi Iida: phone duration]
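
Brilliance as a high-to-low energy ratio can be sketched with a plain DFT over one frame. This is an illustration under assumptions of my own (1 kHz cutoff, the test tones, all names); a real implementation would window the frame and use an FFT.

```python
import math

def brilliance(signal, fs, cutoff=1000.0):
    """Ratio of spectral energy above vs. below a cutoff frequency."""
    n = len(signal)
    hi = lo = 0.0
    for k in range(1, n // 2):                      # skip DC, stop below Nyquist
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        energy = re * re + im * im
        if k * fs / n >= cutoff:
            hi += energy
        else:
            lo += energy
    return hi / lo

fs, n = 8000, 400
# low tone (240 Hz, amplitude 1.0) plus weaker high tone (2000 Hz, amplitude 0.5)
x = [math.sin(2 * math.pi * 240 * t / fs)
     + 0.5 * math.sin(2 * math.pi * 2000 * t / fs) for t in range(n)]
ratio = brilliance(x, fs)   # energy ratio = 0.5**2 = 0.25
```

Aroused emotions (fear, anger) would push this ratio up via the increased high-frequency energy noted on the earlier slide.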

acoustic measures (4/4)
spectral information: formant positions and bandwidths
- differences due to small lip opening or articulation precision
- differences due to the speaker's arousal
[Akemi Iida: vowel position]

features of emotional speech (1) [Klaus Scherer]

features of emotional speech (2) [Klaus Scherer]

affect bursts [Marc Schröder]
- definition: short emotional non-speech expressions
- the assigned emotions are often recognized easily

emotion detection and synthesis
applications:
- automatic dialog systems (trouble recognition)
- emotion analysis: pathological purposes (schizophrenia, Parkinson's, ...), forensic purposes (lie detection)
- speech-driven facial animations
- TTS (text-to-speech) synthesis with emotion (context, XML)
- speech manipulation (conversion)

synthetic feature generation
- affect burst insertion
- residual excitation manipulations (source-filter models: LPC, ...):
  - pitch manipulation (MBROLA, PSOLA, RP-PSOLA, ...): timing, accents, pitch slope, F0 interpolation, pitch shift, jitter processing
  - additive noise (breathiness)
  - ring modulation (spectral shift -> harmonicity)
  - linear filtering (emphasis)
  - wave-shaping (exciter: higher harmonics, non-linear)
- envelope modulation (pauses, hesitation, plosive bursts, stressed words)
- spectral modification:
  - formant position and bandwidth rearrangement
  - emphasis (brilliance)
  - frame rearrangement (timing, diphone transitions)
  - reflection coefficient interpolation (LPC, articulation precision)
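
Two of the excitation manipulations listed above, additive breathiness noise and ring modulation, can be sketched directly on a sample buffer. The function names and parameter values are illustrative, not from the slides; real systems would apply these to the LPC residual, not the raw waveform.

```python
import math
import random

def add_breathiness(frame, amount=0.1, seed=0):
    """Additive noise: mix scaled white noise into the frame (breathiness)."""
    rng = random.Random(seed)                      # seeded for reproducibility
    return [s + amount * rng.uniform(-1.0, 1.0) for s in frame]

def ring_modulate(frame, fs, f_mod):
    """Ring modulation: multiply by a carrier, moving each partial to
    sum and difference frequencies (reduces harmonicity)."""
    return [s * math.cos(2 * math.pi * f_mod * t / fs) for t, s in enumerate(frame)]

# A constant frame ring-modulated at a quarter of the sample rate
# becomes the carrier itself: 1, 0, -1, 0, ...
carrier = ring_modulate([1.0, 1.0, 1.0, 1.0], fs=4, f_mod=1)
noisy = add_breathiness([0.0] * 100, amount=0.1)
```

Both effects are cheap per-sample operations, which is why they appear alongside filtering and wave-shaping in the list above.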

synthetic feature generation
neutral-to-emotional speech synthesizer
[Jun Sato: residual-signal-driven emotional speech synthesizer]

HMM (hidden Markov model) [Hansen, Bou-Ghazale]
- lambda: model parameters (probabilities for observed states with respect to their history)
- O: observed prosodic parameter sequence
- Q: emotional state of the speaker
- HMM training estimates all model parameters lambda
- with the trained model one can: detect emotions via probability measures; create prosodic features with the Viterbi algorithm
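
The detection step, scoring P(O | lambda) for each emotion model and picking the best, can be sketched with the standard forward algorithm over discrete observations. The toy models and symbol meanings below are my own assumptions, not from the cited work.

```python
def forward_likelihood(obs, pi, A, B):
    """Forward algorithm: P(O | lambda) for a discrete-observation HMM.
    pi[i]: initial state probability, A[i][j]: transition, B[i][o]: emission."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Toy 2-state models over quantized prosodic symbols 0/1, where symbol 1
# might stand for a high-F0, loud frame.  The "anger" model prefers 1,
# the "grief" model prefers 0.
anger = ([0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]], [[0.2, 0.8], [0.2, 0.8]])
grief = ([0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]], [[0.8, 0.2], [0.8, 0.2]])

obs = [1, 1, 0, 1]                       # mostly "high-arousal" frames
p_anger = forward_likelihood(obs, *anger)
p_grief = forward_likelihood(obs, *grief)
detected = "anger" if p_anger > p_grief else "grief"
```

Training (Baum-Welch) would estimate pi, A, and B from labelled emotional speech; generation would instead read prosodic features off the most likely state path (Viterbi).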

neural network models [Jun Sato]
- task of each node: arousal of the output nodes (next layer) caused by the input arousal (previous layer)
- emotion space (3rd-layer output): 2 nodes spanning a 2-dimensional emotion space: emotion intensity and emotion type
- the nodes' input/output behaviour has to be estimated in a training process
- tasks: emotion detection from prosodic parameters; given emotion -> prosodic parameter generation
[picture: example for one sentence]
- problem: context dependency
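
The layer-to-layer "arousal" propagation can be sketched as an ordinary forward pass through fully connected tanh layers ending in the two emotion-space nodes. The weights below are untrained, illustrative values of my own; Sato's actual architecture and activation are not specified here.

```python
import math

def forward(x, layers):
    """Forward pass: each node's output is tanh of its weighted input sum."""
    for W, b in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x

# Illustrative shapes: 3 prosodic inputs -> 4 hidden nodes -> 2 emotion-space
# outputs (emotion intensity, emotion type).
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.4, 0.2, 0.6], [0.1, 0.1, 0.1]],
     [0.0, 0.0, 0.0, 0.0]),
    ([[0.7, -0.3, 0.5, 0.2], [-0.6, 0.4, 0.1, 0.3]],
     [0.0, 0.0]),
]
prosody = [0.8, -0.2, 0.5]        # e.g. normalized pitch mean, slope, loudness
emotion_point = forward(prosody, layers)   # one point in the 2-D emotion space
```

Training would fit the weights on labelled sentences; the context-dependency problem noted above arises because one global mapping ignores the surrounding sentence context.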

available systems
- HAMLET (DECtalk, formant synthesis, Iain Murray)
- LAERTES (BT Laureate, concatenative synthesis, Iain Murray)
- CHATAKO (CHATR, unit selection, Akemi Iida)
- Affect Editor (DECtalk, formant synthesis, Janet E. Cahn, MIT)
- VieCtoS (OFAI, concatenative, Erhard Rank)
- emoSyn (MBROLA, TU Berlin, Felix Burkhardt)
- neural networks: Jun Sato

screenshots, examples (1)
CHATAKO (unit selection): anger, happiness, sadness

screenshots, examples (2)
Affect Editor (formant synthesis): anger, happiness, sadness

examples
Emofilt (rule-based prosody): anger, happiness, sadness
VieCtoS: anger, happiness, sadness

papers
- [Akemi Iida, ...]
- [Janet E. Cahn]
- [Iain Murray]
- [Klaus Scherer, Speech Communication 40, 2003]
- [Marc Schröder, Speech Communication 40, 2003]
- [Jun Sato, IEEE Robot and Human Communication, 1996]
- [Randolph Cornelius, Speech Communication 40, 2003]
- [Sahar Bou-Ghazale, John Hansen, IEEE Transactions on Speech and Audio Processing, 1998]