Affective computing. Emotion recognition from speech. Fall 2018

1 Affective computing Emotion recognition from speech Fall 2018 Henglin Shi

2 Outline Introduction to speech features Why speech in emotion analysis Speech features Speech and speech production Types of speech sounds Fundamental and resonant frequencies Speech production model Speech feature extraction Short-time analysis Acoustic feature extraction Prosodic feature extraction

3 Why speech in emotion analysis Speech is a main modality for people to express messages Speech is more REAL than other modalities

4 Speech Emotion Features Lexical features Acoustic features Prosodic features

5 Lexical Features Explicit affective messages. Affective words, stress, etc.

6 Acoustic Features Traditional acoustic features (MFCC, LPC, PLP, etc.) Many filter bank and pre-filtering options Simple signal measures (e.g. zero-crossings, HNR) Other spectral measures (e.g. formants, long-term spectrum)

7 Prosodic Features Pitch/F0: pitch tracker, F0 contour and derivative distributions Rhythm/Duration: voiced/unvoiced/silence segmentation, distributions of segments and segment ratios, phoneme segmentation, speech rate Loudness/Intensity: FFT and short-segment energy, energy contours and spectral parameters Quality: inverse filtering, vocal source parameters

8 Process of Speech Production

10 The mechanism of Speech Production The vocal tract begins at the opening of the vocal cords and ends at the lips The vocal tract is a non-uniform acoustic tube whose diameter varies along its length The cross-sectional area of the vocal tract is determined by the positions of the tongue, lips, jaw and velum

11 The mechanism of Speech Production The nasal tract begins at the velum and ends at the nostrils (its diameter also varies). When the velum is lowered, the nasal tract is acoustically coupled to the vocal tract to produce the nasal sounds of speech.

12 Classification of Speech Sounds Voiced Sounds The vocal cords vibrate as air passes through E.g. vowels like /a/, /e/, /i/ Unvoiced Sounds The vocal cords do not vibrate E.g. /f/, /s/, /k/ Other Sounds Nasal sounds Plosive sounds

13 Voiced Sound The vocal cords vibrate at a particular frequency, which is called the fundamental frequency (F0) of the sound Different speakers have different fundamental frequencies: 50 to 200 Hz for male speakers, 150 to 300 Hz for female speakers, 200 to 400 Hz for child speakers The inverse of the fundamental frequency is an estimate of the pitch period

15 Unvoiced Sound Characterised by high-frequency components, similar to random noise For unvoiced sounds the vocal cords are held open; air from the lungs rushes through the vocal tract, is shaped by it, and exits at the lips

16 Other Sound Classes Nasal Sounds The vocal cords may vibrate Coupled with the nasal cavity Sound radiates from the nostrils and lips E.g. the /ng/ in "sing" Plosive Sounds Generated by pressure built up behind a closure and released suddenly E.g. /k/, /t/

17 Resonant Frequencies of the Vocal Tract The vocal tract is a non-uniform acoustic tube, i.e. its diameter varies along its length Different vocal tract shapes generate different resonant frequencies, which are called formants Typically three to four formants are present below 4 kHz in speech

18 Formants

22 Formants: vowels

23 Speech production Source-filter model: P(f) = U(f) T(f) R(f), where U(f) is the glottal source spectrum, T(f) the vocal tract transfer function, R(f) the lip-radiation characteristic, and P(f) the resulting speech spectrum

24 Linear model of speech production Prosodic parameters: pitch and intonation, quality (vocal source and tract), intensity, duration

25 Speech feature extraction Local features vs. global features Local feature: describes a single frame Global feature: describes a whole utterance Short-time analysis

26 Framing/Windowing Use window functions to segment the speech signal into short frames, typically 10 to 20 ms each Examples: rectangular window, Hamming window
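
The framing step can be sketched in numpy as follows (frame and hop sizes below are illustrative choices for 16 kHz audio, not values from the slide):

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Segment signal x into overlapping frames and apply a Hamming window.

    frame_len and hop are in samples: a 20 ms frame with a 10 ms hop
    at 16 kHz gives frame_len=320, hop=160.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    w = np.hamming(frame_len)
    return np.stack([x[i * hop : i * hop + frame_len] * w
                     for i in range(n_frames)])

# 1 s of a 100 Hz tone sampled at 16 kHz, 20 ms frames, 50% overlap
fs = 16000
x = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)
frames = frame_signal(x, frame_len=320, hop=160)  # shape (99, 320)
```

Each row of `frames` is one windowed frame, ready for per-frame (short-time) analysis.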

27 Acoustic Feature: Zero Crossing Count The ZCC reflects the frequency of the signal and is calculated as ZCC_i = 0.5 * sum_{k=1}^{N-1} |sign(s(k)) - sign(s(k-1))| Any DC offset should be removed before computing it
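
The formula translates directly to numpy (a sketch; how `sign` treats samples that are exactly zero is a detail a real implementation must pin down):

```python
import numpy as np

def zero_crossing_count(frame):
    """ZCC = 0.5 * sum over k of |sign(s[k]) - sign(s[k-1])|."""
    s = np.sign(frame)
    return 0.5 * np.sum(np.abs(s[1:] - s[:-1]))

# 0.1 s of a 100 Hz tone: 10 cycles, hence 20 sign changes
fs = 16000
x = np.sin(2 * np.pi * 100 * np.arange(1600) / fs + 0.1)
zcc = zero_crossing_count(x)  # 20.0
```

The small phase offset keeps samples away from exact zeros, so each of the 20 crossings contributes exactly 1 to the count.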

28 Acoustic Feature: Mel-cepstrum Mel-Frequency Cepstral Coefficients (MFCC) Mel-scale spaced filter bank corresponds to the human auditory system (equal perceived pitch increments) Usually ~12-24 coefficients, computed with 50% overlapping windows Pre-emphasis often used for loudness equalization Cepstral mean subtraction for relative features; delta and delta-delta features possible for sequences Alternatively, critical-band energy features, i.e. logarithms of the filter-bank outputs, without the DCT
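
A numpy-only sketch of the MFCC pipeline for one windowed frame: power spectrum, mel-spaced triangular filter bank, log, then DCT-II. Filter counts, FFT size, and triangle shapes are illustrative assumptions, not the parameterization of any particular toolkit:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_filters=24, n_coeffs=13, n_fft=512):
    """One frame: power spectrum -> mel filter bank -> log -> DCT-II."""
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    # filter edge frequencies are equally spaced on the mel scale
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    energies = np.empty(n_filters)
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = np.clip((freqs - lo) / (mid - lo), 0.0, 1.0)
        falling = np.clip((hi - freqs) / (hi - mid), 0.0, 1.0)
        energies[i] = np.dot(spec, np.minimum(rising, falling))
    log_e = np.log(energies + 1e-10)
    # DCT-II decorrelates the log filter-bank energies
    m = np.arange(n_filters)
    return np.array([np.dot(log_e, np.cos(np.pi * k * (m + 0.5) / n_filters))
                     for k in range(n_coeffs)])

fs = 16000
frame = np.hamming(400) * np.sin(2 * np.pi * 440 * np.arange(400) / fs)
coeffs = mfcc_frame(frame, fs)  # 13 cepstral coefficients for this frame
```

Dropping the final DCT and keeping `log_e` gives the critical-band (log filter-bank) energy features mentioned above.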

29 Pre-emphasis Filtering S(z)/E(z) = A_v (1 - z^-1) / [(1 - sum_{k=1}^{P} a_k z^-k)(1 - z^-1)^2] The recorded speech signal is not the output of the vocal tract alone If we want to focus on the vocal tract, we apply a high-pass filter to cancel the factors that do not belong to the vocal tract (glottal source and lip radiation)
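
The pre-emphasis filter itself is a one-line first-difference; a sketch using the commonly chosen coefficient 0.97 (an assumption, the slide does not fix a value):

```python
import numpy as np

def pre_emphasize(x, alpha=0.97):
    """High-pass filter y[n] = x[n] - alpha * x[n-1]; boosts high
    frequencies to compensate for the spectral tilt of the source."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

y = pre_emphasize(np.array([1.0, 1.0, 1.0]))  # approx [1.0, 0.03, 0.03]
```

A constant (low-frequency) input is almost cancelled, while rapid sample-to-sample changes pass through, which is exactly the high-pass behaviour described above.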

30 Prosodic Feature: Short Time Energy Sum of the squares of all samples within a frame: E_m = sum_n [s(n) w(m - n)]^2 Used to distinguish voiced and unvoiced sounds Larger STE: voiced sound Smaller STE: unvoiced sound
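
Given frames that are already windowed, the STE per frame is a single numpy reduction (a sketch):

```python
import numpy as np

def short_time_energy(frames):
    """E_m = sum over n of frame[m, n]^2, one value per windowed frame."""
    return np.sum(frames ** 2, axis=1)

ste = short_time_energy(np.array([[1.0, 2.0], [0.0, 3.0]]))  # [5.0, 9.0]
```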

31 Utilizing the Features High STE and low ZCC: voiced speech Low STE and high ZCC: unvoiced speech
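
The decision rule can be sketched as a simple threshold test; the threshold values below are hypothetical and would need tuning per recording:

```python
def classify_frame(ste, zcc, ste_thresh=1.0, zcc_thresh=50):
    """Heuristic voiced/unvoiced decision from the short-time energy and
    zero-crossing count of one frame."""
    if ste >= ste_thresh and zcc < zcc_thresh:
        return "voiced"
    if ste < ste_thresh and zcc >= zcc_thresh:
        return "unvoiced"
    return "silence/other"
```

Frames matching neither pattern (e.g. both values low) fall into a silence/other class, which is what the voiced/unvoiced/silence segmentation on the prosodic-features slide needs.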

32 Prosodic Feature: Fundamental Frequency Pitch period = 1/F0, where F0 is the fundamental frequency of vocal cord vibration Methods for estimating the pitch period: time-domain methods and frequency-domain methods

33 Extract Pitch Period in Time Domain Short-time autocorrelation function Assumption: the signal can be considered a delayed version of itself phi(k) = (1/N) sum_{n=0}^{N-1} s(n) s(n+k) Find the k that maximizes phi(k) Average magnitude difference function D(k) = (1/N) sum_{n=0}^{N-1} |s(n) - s(n+k)| Find the k that minimizes D(k)
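
The autocorrelation method can be sketched as follows; the lag search range is restricted to plausible pitch periods (the 50-400 Hz bounds are assumptions for adult speech):

```python
import numpy as np

def pitch_autocorr(frame, fs, f0_min=50.0, f0_max=400.0):
    """Estimate F0 by maximizing phi(k) = (1/N) sum s[n] s[n+k]
    over lags k that correspond to plausible pitch periods."""
    n = len(frame)
    k_lo = int(fs / f0_max)                     # shortest candidate lag
    k_hi = int(fs / f0_min)                     # longest candidate lag
    phi = np.array([np.dot(frame[:n - k], frame[k:]) / n
                    for k in range(k_hi + 1)])
    k_best = k_lo + int(np.argmax(phi[k_lo:k_hi + 1]))
    return fs / k_best                          # F0 = 1 / pitch period

fs = 16000
frame = np.sin(2 * np.pi * 200 * np.arange(640) / fs)  # 40 ms of a 200 Hz tone
f0 = pitch_autocorr(frame, fs)  # 200.0
```

The AMDF variant is the same search with `np.sum(np.abs(frame[:n - k] - frame[k:]))` minimized instead of the correlation maximized.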

34 Features within an Utterance

35 Empirical Results on Emotional Effects in Speech

38 Emotional F0 contour examples (neutral, bored, angry, happy): same text and speakers with different emotions; same text with different speakers and different emotions

39 Features: Prosodic features

40 Emotion recognition from speech Traditional machine learning tools are used frequently Feature selection and transformations: sequential floating forward search (SFFS), principal component analysis (PCA), nonlinear manifold modeling, etc. Classifiers: linear discriminant analysis (LDA), k-nearest neighbors (k-NN), support vector machines (SVM), hidden Markov models (HMM), neural networks (NN) Validation and regularization: cross-validation, cost/penalty functions, Bayesian Information Criterion (BIC), structural risk minimization (SRM), etc.
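
As a toy illustration of the classification stage, a numpy-only k-nearest-neighbors classifier over utterance-level feature vectors; the features, labels, and values below are invented for the example:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Label x by majority vote among its k nearest training vectors
    (Euclidean distance)."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Invented 2-D utterance features: [mean F0 in Hz, mean energy]
X = np.array([[120.0, 0.2], [125.0, 0.25], [220.0, 0.8], [230.0, 0.9]])
y = np.array(["neutral", "neutral", "angry", "angry"])
pred = knn_predict(X, y, np.array([224.0, 0.85]))  # "angry"
```

In a real system X would hold global features per utterance (e.g. F0 and energy statistics from the earlier slides) and the classifier would be validated with cross-validation as listed above.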

41 State-of-the-art methods Pitch tracker: autocorrelation is probably the best short-term method; a better estimate of glottal closures, e.g. waveform matching (time domain), is still needed Classifier: SVM or neural network; any classifier accepting nonlinear data will do; deep neural architectures are a current trend Feature selection: genetic algorithms, floating search; PCA transformation of traditional features seems to help very little, nonlinear methods (e.g. Isomap) are better

42 Deep Learning Current paradigm in ASR State-of-the-art approach used in all major speech recognition solutions (Apple, Google, Facebook, Microsoft, ...) Alternative to feature engineering Can use e.g. raw spectrograms and/or (large) sets of traditional acoustic features as inputs Hidden layers learn nonlinear features or filter banks Fusion of multimodal sources is straightforward Computational costs and overlearning are problems, but, correctly applied, it offers very promising performance

43 State-of-the-art performance Theoretical performance according to the literature: % in an automatic speaker-independent limited-emotion discrimination case (neutral, sad, happy, angry); 55-70% for a human reference in non-limited recognition of basic emotions in a multicultural context In practice (neutral, sad, happy, angry, disgusted, surprised, fearful): % depending on the scenario constraints, sample size, quality, number of emotions, and available features

44 Linear Predictive Coding (LPC) Very useful for estimating pitch, formants, spectra, and vocal tract parameters Assumption: a speech signal sample can be estimated as a linear combination of past samples.

45 LPC (Cont.) Inverse z-transform of the vocal tract model: s(n) = sum_{k=1}^{p} a_k s(n-k) + G u(n) If we can find another set of coefficients a_k such that sum_{k=1}^{p} a_k s(n-k) = s(n) - G u(n), then the prediction error is e(n) = s(n) - sum_{k=1}^{p} a_k s(n-k) = G u(n)

46 Usage of LPC Measure the pitch period more precisely using the prediction error e(n) = s(n) - sum_{k=1}^{p} a_k s(n-k) = G u(n) Question: what happens if we apply this model to unvoiced speech? Note: the coefficients should be computed short-time, within each frame
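
A per-frame sketch of solving for the a_k with the autocorrelation method and the Levinson-Durbin recursion; the sign convention matches s(n) = sum a_k s(n-k) + G u(n) used above:

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Return a_1..a_p minimizing e(n) = s(n) - sum_k a_k s(n-k),
    via Levinson-Durbin recursion on the frame autocorrelation."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0                       # polynomial A(z) = 1 + a1 z^-1 + ...
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err               # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return -a[1:]                    # predictor coefficients a_k

# An exponential decay s[n] = 0.9 s[n-1] is perfectly predicted at order 1
frame = 0.9 ** np.arange(200)
a = lpc_coefficients(frame, order=1)  # approx [0.9]
```

With the a_k in hand, the residual e(n) approximates G u(n): for voiced frames it shows sharp peaks at glottal pulses (good for pitch-period measurement), while for unvoiced frames it stays noise-like, which answers the question above.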


More information

AUTOMATED ALIGNMENT OF SONG LYRICS FOR PORTABLE AUDIO DEVICE DISPLAY

AUTOMATED ALIGNMENT OF SONG LYRICS FOR PORTABLE AUDIO DEVICE DISPLAY AUTOMATED ALIGNMENT OF SONG LYRICS FOR PORTABLE AUDIO DEVICE DISPLAY BY BRIAN MAGUIRE A thesis submitted to the Graduate School - New Brunswick Rutgers, The State University of New Jersey in partial fulfillment

More information

Lombard Speech Recognition: A Comparative Study

Lombard Speech Recognition: A Comparative Study Lombard Speech Recognition: A Comparative Study H. Bořil 1, P. Fousek 1, D. Sündermann 2, P. Červa 3, J. Žďánský 3 1 Czech Technical University in Prague, Czech Republic {borilh, p.fousek}@gmail.com 2

More information

Comparison of Speech Normalization Techniques

Comparison of Speech Normalization Techniques Comparison of Speech Normalization Techniques 1. Goals of the project 2. Reasons for speech normalization 3. Speech normalization techniques 4. Spectral warping 5. Test setup with SPHINX-4 speech recognition

More information

Ian S. Howard 1 & Peter Birkholz 2. UK

Ian S. Howard 1 & Peter Birkholz 2. UK USING STATE FEEDBACK TO CONTROL AN ARTICULATORY SYNTHESIZER Ian S. Howard 1 & Peter Birkholz 2 1 Centre for Robotics and Neural Systems, University of Plymouth, Plymouth, PL4 8AA, UK. UK Email: ian.howard@plymouth.ac.uk

More information

A SURVEY: SPEECH EMOTION IDENTIFICATION

A SURVEY: SPEECH EMOTION IDENTIFICATION A SURVEY: SPEECH EMOTION IDENTIFICATION Sejal Patel 1, Salman Bombaywala 2 M.E. Students, Department Of EC, SNPIT & RC, Umrakh, Gujarat, India 1 Assistant Professor, Department Of EC, SNPIT & RC, Umrakh,

More information

Lecture 1-7: Source-Filter Model

Lecture 1-7: Source-Filter Model Lecture 1-7: Source-Filter Model Overview 1. Properties of vowel sounds: we can observe a number of properties of vowel sounds which tell us a great deal about how they must be generated: (i) they have

More information

Introduction to Speech Technology

Introduction to Speech Technology 13/Nov/2008 Introduction to Speech Technology Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of 30 Outline Introduction & Applications Analysis of Speech Speech Recognition

More information

Speaker Recognition Using MFCC and GMM with EM

Speaker Recognition Using MFCC and GMM with EM RESEARCH ARTICLE OPEN ACCESS Speaker Recognition Using MFCC and GMM with EM Apurva Adikane, Minal Moon, Pooja Dehankar, Shraddha Borkar, Sandip Desai Department of Electronics and Telecommunications, Yeshwantrao

More information

This lecture. Speech production and articulatory phonetics. Mel-frequency cepstral coefficients (i.e., the input to ASR systems) Next week: 3 lectures

This lecture. Speech production and articulatory phonetics. Mel-frequency cepstral coefficients (i.e., the input to ASR systems) Next week: 3 lectures This lecture Speech production and articulatory phonetics. Mel-frequency cepstral coefficients (i.e., the input to ASR systems) Next week: 3 lectures Some images from Jim Glass course 6.345 (MIT), the

More information

FILTER BANK FEATURE EXTRACTION FOR GAUSSIAN MIXTURE MODEL SPEAKER RECOGNITION

FILTER BANK FEATURE EXTRACTION FOR GAUSSIAN MIXTURE MODEL SPEAKER RECOGNITION FILTER BANK FEATURE EXTRACTION FOR GAUSSIAN MIXTURE MODEL SPEAKER RECOGNITION James H. Nealand, Alan B. Bradley, & Margaret Lech School of Electrical and Computer Systems Engineering, RMIT University,

More information

An Emotion Recognition System based on Right Truncated Gaussian Mixture Model

An Emotion Recognition System based on Right Truncated Gaussian Mixture Model An Emotion Recognition System based on Right Truncated Gaussian Mixture Model N. Murali Krishna 1 Y. Srinivas 2 P.V. Lakshmi 3 Asst Professor Professor Professor Dept of CSE, GITAM University Dept of IT,

More information

emotional speech Advanced Signal Processing Winter Term 2003 franz zotter

emotional speech Advanced Signal Processing Winter Term 2003 franz zotter emotional speech Advanced Signal Processing Winter Term 2003 franz zotter contents emotion psychology articulation of emotion physical, facial speech acoustic measures features, recognition affect bursts

More information

Evaluation of Different Feature Extraction Techniques for Continuous Speech Recognition

Evaluation of Different Feature Extraction Techniques for Continuous Speech Recognition Evaluation of Different Feature Extraction Techniques for Continuous Speech Recognition Hamdy K. Elminir, Mohamed Abu ElSoud, L. M. Abou El-Maged Misr Academy for Engineering & Technology Computer Science

More information

Keywords: Spoken Hindi word & numerals, Fourier descriptors, Correlation, Mel Frequency Cepstral Coefficient (MFCC) and Feature extraction.

Keywords: Spoken Hindi word & numerals, Fourier descriptors, Correlation, Mel Frequency Cepstral Coefficient (MFCC) and Feature extraction. Volume 3, Issue 5, May 213 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Frequency Analisys

More information

Myanmar Language Speech Recognition with Hybrid Artificial Neural Network and Hidden Markov Model

Myanmar Language Speech Recognition with Hybrid Artificial Neural Network and Hidden Markov Model ISBN 978-93-84468-20-0 Proceedings of 2015 International Conference on Future Computational Technologies (ICFCT'2015) Singapore, March 29-30, 2015, pp. 116-122 Myanmar Language Speech Recognition with

More information

Speaker Identification for Biometric Access Control Using Hybrid Features

Speaker Identification for Biometric Access Control Using Hybrid Features Speaker Identification for Biometric Access Control Using Hybrid Features Avnish Bora Associate Prof. Department of ECE, JIET Jodhpur, India Dr.Jayashri Vajpai Prof. Department of EE,M.B.M.M Engg. College

More information

SPECTRAL CORRELATES OF BREATHINESS AND ROUGHNESS FOR DIFFERENT TYPES OF VOWEL FRAGMENTS. Guus de Krom

SPECTRAL CORRELATES OF BREATHINESS AND ROUGHNESS FOR DIFFERENT TYPES OF VOWEL FRAGMENTS. Guus de Krom SPECTRAL CORRELATES OF BREATHINESS AND ROUGHNESS FOR DIFFERENT TYPES OF VOWEL FRAGMENTS Guus de Krom Research Institute for Language and Speech, University of Utrecht Trans 10, 3512 JK Utrecht, the Netherlands

More information

AUTOMATIC CLASSIFICATION OF ANIMAL VOCALIZATIONS. Patrick J. Clemins, B.S., M.S.

AUTOMATIC CLASSIFICATION OF ANIMAL VOCALIZATIONS. Patrick J. Clemins, B.S., M.S. AUTOMATIC CLASSIFICATION OF ANIMAL VOCALIZATIONS by Patrick J. Clemins, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements

More information

HMM-Based Stressed Speech Modeling with Application to Improved Synthesis and Recognition of Isolated Speech Under Stress

HMM-Based Stressed Speech Modeling with Application to Improved Synthesis and Recognition of Isolated Speech Under Stress IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 6, NO. 3, MAY 1998 201 HMM-Based Stressed Speech Modeling with Application to Improved Synthesis and Recognition of Isolated Speech Under Stress Sahar

More information

Low-Delay Singing Voice Alignment to Text

Low-Delay Singing Voice Alignment to Text Low-Delay Singing Voice Alignment to Text Alex Loscos, Pedro Cano, Jordi Bonada Audiovisual Institute, Pompeu Fabra University Rambla 31, 08002 Barcelona, Spain {aloscos, pcano, jboni }@iua.upf.es http://www.iua.upf.es

More information

Tone Recognition of Isolated Mandarin Syllables

Tone Recognition of Isolated Mandarin Syllables Tone Recognition of Isolated Mandarin Syllables Zhaoqiang Xie and Zhenjiang Miao Institute of Information Science, Beijing Jiao Tong University, Beijing 100044, P.R. China {08120470,zjmiao}@bjtu.edu.cn

More information

Pitch-based Gender Identification with Two-stage Classification

Pitch-based Gender Identification with Two-stage Classification Pitch-based Gender Identification with Two-stage Classification Yakun Hu, Dapeng Wu, and Antonio Nucci 1 Abstract In this paper, we address the speech-based gender identification problem Mel-Frequency

More information

Table 1: Classification accuracy percent using SVMs and HMMs

Table 1: Classification accuracy percent using SVMs and HMMs Feature Sets for the Automatic Detection of Prosodic Prominence Tim Mahrt, Jui-Ting Huang, Yoonsook Mo, Jennifer Cole, Mark Hasegawa-Johnson, and Margaret Fleck This work presents a series of experiments

More information

The Use of Dynamic Vocal Tract Model for constructing the Formant Structure of the Vowels

The Use of Dynamic Vocal Tract Model for constructing the Formant Structure of the Vowels The Use of Dynamic Vocal Tract Model for constructing the Formant tructure of the Vowels Vera V. Evdoimova Department of Phonetics, aint-petersburg tate University, aint-petersburg, Russia postmaster@phonetics.pu.ru

More information

Survey on Feature Extraction and Matching Techniques for Speaker Recognition Systems

Survey on Feature Extraction and Matching Techniques for Speaker Recognition Systems Survey on Feature Extraction and Matching Techniques for Speaker Recognition Systems Nisha.V.S, M.Jayasheela Abstract Speaker recognition is the process of automatically recognizing a person on the basis

More information

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Amit Juneja and Carol Espy-Wilson Department of Electrical and Computer Engineering University of Maryland,

More information

21-23 September 2009, Beijing, China. Evaluation of Automatic Speaker Recognition Approaches

21-23 September 2009, Beijing, China. Evaluation of Automatic Speaker Recognition Approaches 21-23 September 2009, Beijing, China Evaluation of Automatic Speaker Recognition Approaches Pavel Kral, Kamil Jezek, Petr Jedlicka a University of West Bohemia, Dept. of Computer Science and Engineering,

More information

COMPARATIVE STUDY OF MFCC AND LPC FOR MARATHI ISOLATED WORD RECOGNITION SYSTEM

COMPARATIVE STUDY OF MFCC AND LPC FOR MARATHI ISOLATED WORD RECOGNITION SYSTEM COMPARATIVE STUDY OF MFCC AND LPC FOR MARATHI ISOLATED WORD RECOGNITION SYSTEM Leena R Mehta 1, S.P.Mahajan 2, Amol S Dabhade 3 Lecturer, Dept. of ECE, Cusrow Wadia Institute of Technology, Pune, Maharashtra,

More information

PERFORMANCE COMPARISON OF SPEECH RECOGNITION FOR VOICE ENABLING APPLICATIONS - A STUDY

PERFORMANCE COMPARISON OF SPEECH RECOGNITION FOR VOICE ENABLING APPLICATIONS - A STUDY PERFORMANCE COMPARISON OF SPEECH RECOGNITION FOR VOICE ENABLING APPLICATIONS - A STUDY V. Karthikeyan 1 and V. J. Vijayalakshmi 2 1 Department of ECE, VCEW, Thiruchengode, Tamilnadu, India, Karthick77keyan@gmail.com

More information

Combining Finite State Machines and LDA for Voice Activity Detection

Combining Finite State Machines and LDA for Voice Activity Detection Combining Finite State Machines and LDA for Voice Activity Detection Elias Rentzeperis, Christos Boukis, Aristodemos Pnevmatikakis, and Lazaros C. Polymenakos Athens Information Technology, 19.5 Km Markopoulo

More information