SPEECH EMOTION RECOGNITION USING TRANSFER NON-NEGATIVE MATRIX FACTORIZATION

ICASSP 2016, Shanghai, China
SPEECH EMOTION RECOGNITION USING TRANSFER NON-NEGATIVE MATRIX FACTORIZATION
Peng Song
School of Computer and Control Engineering, Yantai University
pengsongseu@gmail.com
2016.3.25

Outline
PART1: Introduction
PART2: Baseline NMF algorithm
PART3: Our proposed method
PART4: Experimental Results and Discussions
PART5: Conclusion and Future Work

PART1: Introduction

Speech Emotion Recognition
Definition: Speech emotion recognition is an active research topic in affective computing and speech signal processing. Its goal is to automatically recognize emotions such as anger, happiness, sadness, and surprise from speech.
Applications:
- Intelligent transportation systems
- Healthcare
- Call centers
- Many other HCI fields

Framework of Speech Emotion Recognition
[Figure: framework of speech emotion recognition, with one part marked "Our focus in this paper"]

Recognition Methods
Many classification methods from pattern recognition and machine learning are used for emotion label classification or prediction, including:
- support vector machine (SVM)
- hidden Markov model (HMM)
- Gaussian mixture model (GMM)
- neural network (NN)
- some regression methods
- deep neural network (DNN)
- extreme learning machine (ELM)
Weakness: These methods are trained and evaluated on a single corpus. In practice, it is hard to collect a large emotional speech dataset, and the training and testing data are often recorded with different devices and in different environments. This discrepancy degrades recognition performance.

Recognition Methods (Cont.)
Several efforts toward cross-corpus speech emotion recognition have been made in recent years:
- Schuller et al. conduct preliminary cross-corpus experiments on six different datasets (2011)
- Deng et al. present an autoencoder-based unsupervised domain adaptation method (2014)
- We introduce a dimension reduction based transfer learning approach (2014)
Weakness: Most of these methods do not take into account the different distributions of the corpora, and this difference is often large. Our previous dimension reduction based transfer learning algorithm performs transfer learning and dimension reduction separately rather than jointly.

PART2: Baseline NMF algorithm

NMF
NMF (non-negative matrix factorization) is a well-known algorithm that obtains a low-dimensional representation of non-negative data (Lee & Seung, 1999). It seeks two non-negative matrices whose product approximates the original data matrix:
[equation not preserved in the transcript]
Optimizing $U$ and $V$ together is a non-convex problem. It can be solved with an iterative algorithm (Lee & Seung, 2001):
[update rules not preserved in the transcript]
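The slide's equations are not reproduced in this transcript. As a minimal sketch, assuming the common formulation $\min_{U,V \ge 0} \|X - UV^{T}\|_F^2$ with the Lee & Seung multiplicative updates $U \leftarrow U \odot (XV) / (UV^{T}V)$ and $V \leftarrow V \odot (X^{T}U) / (VU^{T}U)$, the iteration could look like the following. The $X \approx UV^{T}$ convention, the function names, and the small constant `eps` are assumptions made for illustration, not taken from the slides.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frob_error(X, U, V):
    """Squared Frobenius norm of X - U V^T."""
    R = matmul(U, transpose(V))
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))


def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (m x n) by U (m x k) times V^T (k x n), all non-negative.

    eps is a hypothetical safeguard against division by zero.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    U = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    V = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    Xt = transpose(X)
    for _ in range(n_iter):
        # U <- U * (X V) / (U V^T V), element-wise
        XV = matmul(X, V)
        UVtV = matmul(U, matmul(transpose(V), V))
        U = [[u * a / (b + eps) for u, a, b in zip(ur, ar, br)]
             for ur, ar, br in zip(U, XV, UVtV)]
        # V <- V * (X^T U) / (V U^T U), element-wise
        XtU = matmul(Xt, U)
        VUtU = matmul(V, matmul(transpose(U), U))
        V = [[v * a / (b + eps) for v, a, b in zip(vr, ar, br)]
             for vr, ar, br in zip(V, XtU, VUtU)]
    return U, V
```

Plain Python lists are used only to keep the sketch dependency-free; an array library would be the usual choice for real feature matrices. Because every factor in the updates is non-negative, $U$ and $V$ stay non-negative without any explicit projection.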

Graph NMF
Many previous studies have shown that naturally occurring data often lie on or close to a low-dimensional submanifold embedded in a high-dimensional space. Cai et al. (2011) therefore present a graph NMF (GNMF) algorithm, written as
[equation not preserved in the transcript]
where $L = D - W$ is the graph Laplacian, in which $D = [d_{jj}] \in \mathbb{R}^{N \times N}$, $d_{jj} = \sum_{l} w_{jl}$, and [remainder not preserved in the transcript]
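The GNMF objective of Cai et al. is usually written as $\|X - UV^{T}\|_F^2 + \lambda\,\mathrm{Tr}(V^{T} L V)$. The sketch below builds $W$, $D$ and $L = D - W$ for a small point set and evaluates the graph regularizer. It assumes a symmetric p-nearest-neighbor graph with 0/1 weights, which is only one of several possible weighting schemes; the slides do not say which one was used, and the function names are hypothetical.

```python
def knn_laplacian(points, p=2):
    """Build W, D and L = D - W for a symmetric p-nearest-neighbor graph.

    Assumption: 0/1 edge weights and squared Euclidean distance.
    """
    n = len(points)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    W = [[0.0] * n for _ in range(n)]
    for j in range(n):
        others = sorted((i for i in range(n) if i != j),
                        key=lambda i: dist2(points[j], points[i]))
        for i in others[:p]:
            W[j][i] = W[i][j] = 1.0  # symmetrize the neighborhood relation
    # D is diagonal with d_jj = sum_l w_jl
    D = [[sum(W[j]) if i == j else 0.0 for i in range(n)] for j in range(n)]
    L = [[D[j][i] - W[j][i] for i in range(n)] for j in range(n)]
    return W, D, L


def smoothness(L, V):
    """Tr(V^T L V), which equals 0.5 * sum_jl w_jl * ||v_j - v_l||^2."""
    n = len(V)
    return sum(L[j][l] * sum(a * b for a, b in zip(V[j], V[l]))
               for j in range(n) for l in range(n))
```

The regularizer is small when samples that are neighbors in the original feature space receive similar coding vectors, which is the manifold assumption stated on the slide.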

PART3: Our proposed method

Minimizing the distribution divergence
With the GNMF algorithm, latent coding vectors can be obtained for the two corpora. However, the difference between the distributions of these coding vectors is still large, so the empirical maximum mean discrepancy (MMD) is employed to measure it:
[equation not preserved in the transcript]
where [definition not preserved in the transcript]
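The MMD formula itself is not reproduced in this transcript. A common empirical form, assumed here, is the squared distance between the mean coding vectors of the source and target corpora, which can also be written as $\mathrm{Tr}(V^{T} M V)$ with $M_{ij} = 1/n_s^2$ for two source samples, $1/n_t^2$ for two target samples, and $-1/(n_s n_t)$ otherwise. The sketch below computes both forms so they can be compared; the function names and the convention that source rows come first in $V$ are assumptions.

```python
def mmd_matrix(n_src, n_tgt):
    """MMD coefficient matrix M (assumed form; source samples come first)."""
    n = n_src + n_tgt
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            i_src, j_src = i < n_src, j < n_src
            if i_src and j_src:
                M[i][j] = 1.0 / n_src ** 2
            elif not i_src and not j_src:
                M[i][j] = 1.0 / n_tgt ** 2
            else:
                M[i][j] = -1.0 / (n_src * n_tgt)
    return M


def mmd_direct(V, n_src):
    """Squared distance between the source and target mean coding vectors."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    mu_s, mu_t = mean(V[:n_src]), mean(V[n_src:])
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def mmd_trace(V, M):
    """Tr(V^T M V), written as sum_ij M_ij * <v_i, v_j>."""
    n = len(V)
    return sum(M[i][j] * sum(a * b for a, b in zip(V[i], V[j]))
               for i in range(n) for j in range(n))
```

The trace form is the useful one for optimization, because it has the same shape as the graph regularizer $\mathrm{Tr}(V^{T} L V)$ and can be added to the same objective.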

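The transfer NMF method
By integrating the GNMF objective with the MMD term, the objective function of transfer NMF can be written as
[equation not preserved in the transcript]
Let [definition not preserved in the transcript]; the above equation can then be rewritten as
[equation not preserved in the transcript]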

The transfer NMF method (Cont.)
As with NMF, the above equation is not convex when $U$ and $V$ are optimized together, so an iterative algorithm is again employed, and the update rules can be written as
[update rules not preserved in the transcript]
where $T^{+}$ and $T^{-}$ are the positive and negative parts of $T$.
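The exact objective and update rules are not reproduced in this transcript. One plausible reading, assumed here, is the objective $\|X - UV^{T}\|_F^2 + \mathrm{Tr}(V^{T} T V)$ with $T = \lambda L + \mu M$ combining the graph Laplacian and the MMD matrix, $T^{+} = (|T| + T)/2$ and $T^{-} = (|T| - T)/2$. Setting the gradient split into positive and negative parts gives $V \leftarrow V \odot (X^{T}U + T^{-}V) / (VU^{T}U + T^{+}V)$, with the $U$ update unchanged from plain NMF. The sketch below implements that reading; the function names, the weights inside $T$, and `eps` are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def objective(X, U, V, T):
    """||X - U V^T||_F^2 + Tr(V^T T V) (assumed form of the objective)."""
    R = matmul(U, transpose(V))
    fit = sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))
    TV = matmul(T, V)
    reg = sum(v * t for vr, tr in zip(V, TV) for v, t in zip(vr, tr))
    return fit + reg


def tnmf(X, T, k, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates for X (m x n) ~ U V^T with a penalty Tr(V^T T V).

    T is a symmetric n x n matrix, assumed to be lambda * L + mu * M.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    U = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    V = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    Xt = transpose(X)
    # Split T into non-negative parts so that T = T_pos - T_neg.
    T_pos = [[(abs(t) + t) / 2 for t in row] for row in T]
    T_neg = [[(abs(t) - t) / 2 for t in row] for row in T]
    for _ in range(n_iter):
        # U <- U * (X V) / (U V^T V)
        XV = matmul(X, V)
        UVtV = matmul(U, matmul(transpose(V), V))
        U = [[u * a / (b + eps) for u, a, b in zip(ur, ar, br)]
             for ur, ar, br in zip(U, XV, UVtV)]
        # V <- V * (X^T U + T_neg V) / (V U^T U + T_pos V)
        XtU, TnV = matmul(Xt, U), matmul(T_neg, V)
        VUtU, TpV = matmul(V, matmul(transpose(U), U)), matmul(T_pos, V)
        V = [[v * (a + c) / (b + d + eps)
              for v, a, c, b, d in zip(vr, ar, cr, br, dr)]
             for vr, ar, cr, br, dr in zip(V, XtU, TnV, VUtU, TpV)]
    return U, V
```

Splitting $T$ keeps both the numerator and the denominator non-negative, so the coding vectors in $V$ remain non-negative even though the MMD matrix contains negative entries.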

PART4: Experimental Results and Discussions

Experimental setup
Datasets: Berlin (EMO-DB) dataset, eNTERFACE dataset
Strategies:
- The 1st case: the labeled Berlin dataset is chosen for training, and the unlabeled eNTERFACE dataset is used for testing.
- The 2nd case: the labeled eNTERFACE dataset is chosen for training, and the unlabeled Berlin dataset is used for testing.
Emotion categories: anger, disgust, fear, happiness, sadness, and surprise
Features:
- Extracted with the openSMILE toolkit
- The 1582-dimensional feature set of the INTERSPEECH 2010 Paralinguistic Challenge is adopted

Experimental setup (Cont.)
[Slide content not preserved in the transcript]

Experimental results
Recognition results in case1 (eNTERFACE dataset for training, Berlin dataset for testing)
[Figure: recognition results of the compared methods]
Compared methods:
- the dimension reduction based transfer learning method (DR)
- the transfer component analysis method (TCA)
- the NMF method (NMF)
- the graph NMF method (GNMF)
- the proposed TNMF method (Ours)

Experimental results (Cont.)
Recognition results in case2 (Berlin dataset for training, eNTERFACE dataset for testing)
[Figure: recognition results of the compared methods]
Compared methods:
- the dimension reduction based transfer learning method (DR)
- the transfer component analysis method (TCA)
- the NMF method (NMF)
- the graph NMF method (GNMF)
- the proposed TNMF method (Ours)

PART5: Conclusion and Future Work

Conclusions
In this paper, a new cross-corpus speech emotion recognition method using transfer NMF is presented.
- The NMF approach is used for dimension reduction and feature representation
- The MMD is employed to measure the similarity between the two corpora
- The NMF and MMD terms are jointly optimized

Discussions
Some problems remain in the current method:
- The classifier is trained using only the labeled features of the source dataset, without considering the unlabeled information from the target dataset
- Learning common feature representations may reduce the class discrimination within each corpus
- More datasets should be used to evaluate the performance of the proposed method

Thank You!