Determining Emotion in Speech
- Frederick Caldwell
1 Determining Emotion in Speech
Charles Van Winkle, University of Washington
2 Reviewed Literature
- "Toward Detecting Emotions in Spoken Dialogs" (2005), Chul Min Lee and Shrikanth S. Narayanan
- "Detecting emotional state of a child in a conversational computer game" (2010), Serdar Yildirim, Shrikanth Narayanan, and Alexandros Potamianos
3 Toward Detecting Emotions in Spoken Dialogs: Observations & Claims
- The role of spoken language interfaces in human-computer interaction applications has increased, so automatically recognizing emotions from human speech has grown in importance.
- Research in understanding and modeling human emotions is increasingly attracting attention from the engineering community.
- There is an increasing need to know not only what information a user conveys but how it is conveyed.
- Emotions are important in human communication and decision-making; an intelligent human-machine interface should be able to accommodate human emotions in an appropriate way.
4 Toward Detecting Emotions in Spoken Dialogs: Challenges & Claims
- It is difficult to define precisely what "emotion" means, and there is disagreement on the number of emotion categories.
- It may not be necessary or practical to recognize a large variety of emotions when developing algorithms for conversational interfaces.
- Long-term properties such as moods must be reconciled with short-term emotional states.
- Previous studies show promise in using higher-level linguistic information for emotion recognition.
5 Toward Detecting Emotions in Spoken Dialogs: The Old, Acoustic Signal Pattern Recognition
- Maximum-likelihood Bayes classification
- Kernel regression
- K-nearest neighbor methods
- Fisher linear discriminant methods
- Ensembles of neural networks
6 Toward Detecting Emotions in Spoken Dialogs: The Old, Acoustic Features
- Pitch-related features: fundamental frequency (aka pitch, aka F0) and other formant frequencies; pitch contour
- Energy
- Timing features: speech rate; boundaries of phrases/words/phonemes
- Spectral information: voiced and unvoiced portions
7 Toward Detecting Emotions in Spoken Dialogs: The Old, Discourse Information
- Has been used in conjunction with acoustic correlates: topic and/or sub-dialog, repetition, correction information, use of swear words, negation.
- How to combine the different information sources (e.g. acoustic & discourse)?
- Fusion at the feature level suffers from potential dimensionality issues in classification as feature sizes increase.
8 Toward Detecting Emotions in Spoken Dialogs: The New
- Favor the notion of application-dependent emotions: examine a reduced space of negative emotions (anger and frustration in human speech) vs. non-negative emotions (the complement).
- Data set: speech signals derived from a commercially deployed automatic call center dialog system.
- Combine various aspects of spoken language information: acoustic, lexical, and discourse.
- Intended use: detection of negative emotions as a strategy to improve the quality of service in automated call center applications.
9 Toward Detecting Emotions in Spoken Dialogs: The Plan, Acoustic
- Leverage previously published results and use a number of acoustic correlates.
- Systematically reconcile them through feature selection and feature reduction.
10 Toward Detecting Emotions in Spoken Dialogs: The Plan, Discourse
- Separate users' responses into 5 categories:
  - Rejection (found more often in negative-emotion utterances)
  - Repetition
  - Rephrase
  - Ask-Start Over
  - None of the Above (mostly factual responses to voice prompts, e.g. giving the name of a person or place in the corpus)
11 Toward Detecting Emotions in Spoken Dialogs: The Plan, Language
- Introduce a new method for estimating the emotion information conveyed by words (and by sequences of words): automatically calculate the emotional salience of words in the specific (constrained) data corpus.
- Emotional salience is a measure of how much information a word provides about a given emotion category.
- How to combine the various information sources? Fusion at the decision level, using linear discriminant classifiers with Gaussian class-conditional probabilities and k-nearest neighbor classifiers.
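The definition of emotional salience above ("how much information a word provides about an emotion category") reads naturally as the mutual information between a word and the emotion classes. A minimal sketch of that computation, which is my reading rather than the authors' code:

```python
import math
from collections import Counter, defaultdict

def emotional_salience(utterances):
    """Estimate each word's salience as the mutual information (in bits)
    between the word's presence and the emotion class.
    `utterances` is a list of (list_of_words, emotion_label) pairs."""
    class_counts = Counter(emotion for _, emotion in utterances)
    total = sum(class_counts.values())
    prior = {e: n / total for e, n in class_counts.items()}

    # Count, for each word, how often it appears in each emotion class.
    word_class = defaultdict(Counter)
    for words, emotion in utterances:
        for w in set(words):
            word_class[w][emotion] += 1

    salience = {}
    for w, counts in word_class.items():
        n_w = sum(counts.values())
        s = 0.0
        for e, n in counts.items():
            p_e_given_w = n / n_w
            s += p_e_given_w * math.log2(p_e_given_w / prior[e])
        salience[w] = s
    return salience
```

On a toy corpus, a word that occurs only in negative utterances ("damn") gets high salience, while a word split evenly across classes ("no" here) gets zero; thresholding these scores is what builds the salient-word dictionary described later.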
12 Toward Detecting Emotions in Spoken Dialogs: The Data, Observations & Claims
- Most studies of emotion recognition in speech have used actors' voices: single utterances of archetypal emotions in non-dialog settings. Results from these may not generalize to human-machine interaction scenarios.
- Real data suffers from coverage problems: vast amounts of data are needed to characterize various emotions in various contexts.
- A limited-domain approach allows in-depth focus on a finite set of emotions, using significant amounts of data obtained from realistic human-machine interactions.
13 Toward Detecting Emotions in Spoken Dialogs: The Data
- Speech data: 8 kHz, 8-bit, µ-law compression.
- Obtained from real users engaged in spoken dialog with a machine agent in a commercially deployed call center application.
- 1187 calls, each with an average of 6 utterances; about 7200 total utterances.
- The database was whittled down from thousands of calls to include only the fraction with potentially negative emotions; the authors used some automatic pre-processing and subjective tagging by 4 different human listeners.
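The corpus audio is 8 kHz, 8-bit µ-law, i.e. standard telephone speech. As a reminder of what that companding does to the signal, here is the continuous µ-law curve (µ = 255, as in G.711) in a minimal sketch:

```python
import math

MU = 255  # standard value for 8-bit mu-law telephony audio

def mulaw_encode(x):
    """Compress a sample in [-1, 1] with the mu-law companding curve:
    small amplitudes are boosted before 8-bit quantization."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_decode(y):
    """Invert the companding curve."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

The round trip is lossless in this continuous form; the actual 8-bit codec additionally quantizes the encoded value, which is where the compression loss comes from.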
14 Toward Detecting Emotions in Spoken Dialogs: Acoustic
- Fundamental frequency (F0): mean, median, standard deviation, maximum, minimum, range, and linear regression coefficient
- Energy: mean, median, standard deviation, maximum, minimum, range, and linear regression coefficient
- Duration: speech rate, ratio of duration of voiced and unvoiced regions, and duration of the longest voiced speech
- Formants: first and second formant frequencies (F1, F2) and their bandwidths (BW1, BW2); also the mean of each feature
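Each of the F0 and energy statistics above is an utterance-level functional of a frame-wise contour. A small sketch, assuming the "linear regression coefficient" is the least-squares slope over the frame index:

```python
import numpy as np

def contour_stats(contour):
    """The seven utterance-level statistics listed above, computed over
    a frame-wise contour (e.g. F0 or energy values per frame)."""
    x = np.asarray(contour, dtype=float)
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]  # first-order fit; [0] is the slope
    return {
        "mean": x.mean(),
        "median": float(np.median(x)),
        "std": x.std(),
        "max": x.max(),
        "min": x.min(),
        "range": x.max() - x.min(),
        "lin_reg_coeff": slope,
    }
```

In practice F0 statistics are computed over voiced frames only, since F0 is undefined in unvoiced regions.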
15 Toward Detecting Emotions in Spoken Dialogs: Acoustic
- Forward selection to reduce dimensionality, producing two sets of rank-ordered selected features: the 10 best and the 15 best.
- Principal component analysis to possibly reduce dimensionality further.
- Male 15-best: ratio of duration of voiced and unvoiced regions, energy std. dev., energy median, F0 regression coeff., F0 median, energy regression coeff., energy max, energy min, energy range, duration of the longest voiced speech, F0 mean, BW1, F0 max, BW2
- Female 15-best: ratio of duration of voiced and unvoiced regions, energy median, F0 regression coeff., speech rate, energy min, duration of the longest voiced speech, energy regression coeff., F0 median, F0 mean, F1, energy mean, energy max, F0 max, energy range, energy std. dev.
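Forward selection is a greedy wrapper method: starting from an empty set, repeatedly add the feature whose inclusion most improves a classification criterion. The sketch below uses leave-one-out 1-nearest-neighbor accuracy as an illustrative criterion; the paper's actual classifier and scoring may differ:

```python
import numpy as np

def loo_nn_accuracy(X, y):
    """Leave-one-out 1-nearest-neighbor accuracy, used here as the
    selection criterion (a stand-in for the paper's classifier)."""
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        correct += y[np.argmin(d)] == y[i]
    return correct / len(X)

def forward_select(X, y, k_best):
    """Greedy sequential forward selection: at each step, add the
    feature whose inclusion maximizes the criterion. Returns the
    rank-ordered indices of the selected features."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k_best and remaining:
        best = max(remaining,
                   key=lambda j: loo_nn_accuracy(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the procedure is greedy and rank-ordered, the "10-best" set is simply a prefix of the "15-best" set, which matches the two rank-ordered lists on the slide.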
16 Toward Detecting Emotions in Spoken Dialogs: Lexical

  Word      Salience  Emotion
  Wrong     0.72      Negative
  Computer  0.72      Negative
  Damn      0.72      Negative
  No        0.45      Negative
  Arrival   0.33      Non-Negative
  Phoenix   0.33      Non-Negative
  Delayed   0.21      Non-Negative
  Baggage   0.20      Non-Negative

- After salience calculation, a salient-word-pair dictionary was constructed by retaining only word pairs whose salience values exceed a pre-chosen threshold, optimized on held-out data. Gender-independent.
17 Toward Detecting Emotions in Spoken Dialogs: Lexical [figure]
18 Toward Detecting Emotions in Spoken Dialogs: Discourse
[Table: counts of Negative vs. Non-Negative utterances per discourse tag (Rejection, Repeat, Rephrase, Ask-Startover, None), broken out by Male, Female, and Total; values not preserved in the transcription]
- Labeling was performed by one person, based on utterance transcriptions.
- A rephrase is an imperfect repeat, and eventually becomes the same category.
19 Toward Detecting Emotions in Spoken Dialogs: Error
- k = 8 for male; k = 4 for female
20 Toward Detecting Emotions in Spoken Dialogs: Error, M/F [figure]
21 Toward Detecting Emotions in Spoken Dialogs: Error, Male [figure]
22 Toward Detecting Emotions in Spoken Dialogs: Error, Female [figure]
24 computer game: Observations & Claims
- Over the last few years, attention to automatic recognition of users' communicative styles within spoken dialog system frameworks has increased.
- It is important to know not only what was said but also how it was communicated to a dialog system.
- Enabling automatic emotion recognition within a multimodal dialog system is an emerging trend; detecting the user's emotion can help make such systems more natural and responsive.
25 computer game: Observations & Claims
- Currently deployed spoken dialog interfaces are limited in handling the rich information contained in speech, so their scope in supporting natural human-machine interaction is limited as well.
- Much of the work on emotion analysis focuses on databases of acted speech. This provides certain useful knowledge, but it is preferable to work on data that is directly representative of, and suitable for, the target domain application.
26 computer game: Challenges & Claims
- Most research on emotion recognition is targeted primarily toward adult users, yet greater variability exists in the acoustic and linguistic characteristics of children's speech, and these parameters change with age and gender.
- Automatic recognition from speech is itself a difficult problem, and it may be difficult to elicit acted speech from children.
- Children are among the potential beneficiaries of computers with spoken interfaces, e.g. for educational applications and games.
- It is important to identify emotionally salient features as a function of gender and age group; it is not necessary to recognize a large set of emotions.
27 computer game: Survey of the Corpora
- Databases of children's speech are mostly used for acoustic analysis and modeling; some are read-speech corpora.
- Recent databases of child-machine spontaneous speech interaction:
  - Open-ended spoken dialog interaction between children and animated characters in a game setting
  - Data from children spontaneously communicating with the AIBO robot (emotional labeling for this corpus is available)
  - A corpus of child-machine spoken dialog interaction in a game setting (used in this paper)
28 computer game: Previous Acoustic Techniques
- Acoustic signal pattern recognition has been used to separate the emotional coloring present in (children's) speech.
- Popular features: phoneme-, syllable-, and word-level statistics corresponding to F0, energy, duration, spectral parameters, and voice quality parameters.
29 computer game: Previous Aggregate Techniques
- Previous studies show that younger children use fewer overt politeness markers and express more frustration than older children.
- Using speech and language features to predict student emotions in human-computer tutoring dialogs has been shown to improve accuracy.
- There are promising results from the combined use of acoustic, spectral, and language information for detecting confidence, puzzlement, and hesitation in child-machine dialog tasks.
- Language model features might be poor predictors of frustration.
- Emotion recognition performance can be improved by using contextual information in addition to acoustic features.
30 computer game: Proposal
- Focus on two attitudinal states, polite and frustrated; the authors believe this is well suited to the domain of child-computer interfaces.
- Data set: the Children's Interactive Multimedia Project (ChIMP) database.
- Combine various aspects of spoken language information (acoustic and language) and extend the notion of emotional salience.
- Intended use: detection of polite and frustrated states in children of different age groups and genders.
31 computer game: The Data
- Spontaneous child-machine spoken dialog interaction in a game setting: the task was to play "Where in the USA is Carmen Sandiego?", whose goal is to identify and arrest a cartoon criminal.
- Children had to interact with several animated characters to obtain clues; most children played the game twice.
- Contains speech data collected from 160 boys and girls (ages 6-14) using a Wizard-of-Oz technique; over 50,000 utterances.
32 computer game: The Data
- Researchers tagged speech from 103 of the 160 players as Neutral, Polite, or Frustrated.
- Results are presented as a function of age group and gender.
[Table: number of subjects per age group and gender; values not preserved in the transcription]
33 computer game: The Data
- Goals: identify age and gender trends in emotional state; identify lexical, semantic, and pragmatic markers of emotional state.
- Only utterances on which both labelers agree are used.
[Table: number of instances (speaker turns) per emotional class (Neutral, Polite, Frustrated) for each gender and age group; values not preserved in the transcription]
34 computer game: The Data
- Goals: identify age and gender trends in emotional state; identify lexical, semantic, and pragmatic markers of emotional state.
- Only utterances on which both labelers agree are used.

  Group        Neutral  Polite  Frustrated  Total
  [age group]  -        17%     14%         37%
  [age group]  -        20%     6%          35%
  [age group]  -        15.8%   16%         28%
  Male         72%      15%     13%         53%
  Female       69%      20%     11%         47%
  Total        70%      18%     12%         100%

(Percent of instances, i.e. speaker turns, per emotional class for each gender and age group; the age-group labels and some values were not preserved in the transcription.)
35 computer game: Lexical and Pragmatic Markers
- Polite: explicit markers ("please", "thank you", "excuse me") and implicit markers ("may I", "could you", "would you"); usage of explicit vs. implicit varies with age.
- Frustrated: typical lexical markers ("shut up", "oh man", "hurry", "oops", "heck").
- Pragmatic markers: repetition, or getting stuck in the same dialog state for multiple turns, often indicated that a child was experiencing difficulty with the task and getting frustrated.
36 computer game: Feature Extraction, Acoustic
- 384 features were extracted with the openSMILE feature-extraction toolkit.
- The features comprise utterance-level statistics of low-level descriptors (LLDs): pitch frequency, RMS energy, zero-crossing rate, harmonics-to-noise ratio, and MFCCs 1-12. Delta coefficients were also computed for each LLD.
- Twelve statistics from each LLD and its delta coefficients: mean, standard deviation, skewness, kurtosis, maximum and minimum value, relative position, range, and two linear regression coefficients with their mean square error.
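The twelve statistics can be sketched as functionals applied to each LLD (and delta) contour. One plausible reading of the list above, counting the relative positions of the maximum and minimum as two of the twelve, is the following; the exact definitions in the openSMILE configuration may differ slightly:

```python
import numpy as np

def functionals(lld):
    """Twelve utterance-level statistics of one LLD (or delta) contour.
    Skewness/kurtosis are the standardized 3rd/4th moments; the
    regression terms come from a least-squares line over the
    normalized frame position."""
    x = np.asarray(lld, dtype=float)
    n = len(x)
    t = np.arange(n) / max(n - 1, 1)          # normalized frame position
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd if sd > 0 else np.zeros(n)
    slope, offset = np.polyfit(t, x, 1)       # the two regression coefficients
    mse = np.mean((offset + slope * t - x) ** 2)
    return {
        "mean": mu, "std": sd,
        "skewness": np.mean(z ** 3), "kurtosis": np.mean(z ** 4),
        "max": x.max(), "min": x.min(), "range": x.max() - x.min(),
        "maxpos": np.argmax(x) / max(n - 1, 1),   # relative positions
        "minpos": np.argmin(x) / max(n - 1, 1),
        "slope": slope, "offset": offset, "mse": mse,
    }
```

With 16 contour streams (the LLDs plus their deltas) and 12 functionals each, this yields 16 × 12 × 2 = 384 features, matching the count on the slide.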
37 computer game: Feature Extraction, Lexical
- Certain words are associated with specific emotions and attitudes. Two modeling approaches are proposed, both used widely in the field:
  - Information-theoretic analysis for lexical feature selection, in conjunction with Bayesian classifiers: calculate emotional salience, then create a Bayesian classifier.
  - Latent semantic analysis (LSA) to transform the feature space, with cosine distance metrics to compute the emotional distance between utterances.
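The LSA branch can be sketched with a plain truncated SVD of a term-by-utterance count matrix, followed by cosine similarity in the latent space (a generic illustration, not the authors' pipeline):

```python
import numpy as np

def lsa_transform(term_doc, k):
    """Project a term-by-document count matrix into a k-dimensional
    latent space via truncated SVD; each document becomes one row
    of S_k @ V_k^T transposed (a k-vector per document)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T          # shape: (docs, k)

def cosine(a, b):
    """Cosine similarity between two latent-space vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

Utterances that share salient vocabulary land close together in the latent space, so an unseen utterance can be scored by its cosine distance to class centroids.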
38 computer game: Feature Extraction
- Examples of salient word pairs by class (the Male/Female column split was lost in the transcription):
  - Frustrated: drop it, hey you, do it, no thank, find the, get me, you there, stop miss, not that, pick that, shut up, someone talk, need this, my pad, go talk, stop this, you repeat, I don't, you pick, to issue
  - Polite: stop please, you mind, hello there, doing mister, hello I'd, you good, suspect can, please show, you have, thanks can, please tell, person please, very much, please take, would you, the phone, you can, you get, look that, where'd she
- After salience calculation, a salient-word-pair dictionary was constructed by retaining only word pairs whose salience values exceed a pre-chosen threshold, optimized on held-out data.
39 computer game: Feature Extraction, Discourse and Contextual Information Modeling
- Model the relationship between emotional state and dialog state with a simple Bayesian model.
- Assume the emotional state depends directly on the dialog-state history (the past three states); context matters because emotions are persistent.
- Use the derivative of the acoustic features as an extra parameter.
- Examples for 5 of the 9 possible dialog states:

  User utterance                      Dialog state
  Can I talk to him please?           Talk2Him
  Tell me about the suspect           TellmeAbout
  Can I see my choices for height?    EnterFeature
  Tall for height                     EnterFeature
  Thank you                           CloseBook
  Tell me where did the suspect go    WhereDid
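A toy version of the "simple Bayesian model" over dialog-state history might look like the following. The add-one smoothing and the naive independence assumption across the past states are my simplifications, not necessarily the authors' formulation:

```python
from collections import Counter, defaultdict

class ContextModel:
    """Minimal sketch of a contextual model: estimate
    P(emotion | recent dialog-state history) by counting, assuming
    naive independence across history positions, with add-one smoothing."""

    def __init__(self, history_len=3):
        self.h = history_len
        self.emo_counts = Counter()
        self.state_counts = defaultdict(Counter)  # (offset, state) -> emotions

    def fit(self, dialogs):
        # dialogs: iterable of (dialog_states, emotions), aligned per turn
        for states, emotions in dialogs:
            for i, emo in enumerate(emotions):
                self.emo_counts[emo] += 1
                window = states[max(0, i - self.h + 1): i + 1]
                for offset, s in enumerate(reversed(window)):  # 0 = current
                    self.state_counts[(offset, s)][emo] += 1

    def predict(self, history):
        emos = list(self.emo_counts)
        total = sum(self.emo_counts.values())
        best, best_p = None, -1.0
        for e in emos:
            p = self.emo_counts[e] / total
            for offset, s in enumerate(reversed(history[-self.h:])):
                c = self.state_counts[(offset, s)]
                p *= (c[e] + 1) / (self.emo_counts[e] + len(emos))
            if p > best_p:
                best, best_p = e, p
        return best
```

Trained on turns where repeated dialog states co-occur with frustration, the model learns to raise P(frustrated) whenever the recent history contains those states, which is exactly the "stuck in the same dialog state" cue from the pragmatic-markers slide.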
40 computer game: Fusion of Classifiers
- Decision-level fusion of the acoustic, lexical, and contextual information sources.
- If the classifiers are statistical and calculate posterior probabilities, fusion can use the average of decisions or the product of decisions (assuming independence).
- The acoustic classifier does not fit that description: use a distance metric instead of a decision, then apply a sigmoidal transform.
- Two-way classification (politeness is more of a speaking style than an emotional state) and three-way classification.
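The two fusion rules and the sigmoidal transform for the distance-based acoustic classifier can be sketched directly (the scaling constant in the sigmoid is an illustrative knob, in practice tuned on held-out data):

```python
import math

def sigmoid_posterior(distance, alpha=1.0):
    """Map a classifier's distance score to a pseudo-posterior in (0, 1)
    so it can be fused with true posteriors; smaller distance -> higher
    pseudo-posterior."""
    return 1.0 / (1.0 + math.exp(alpha * distance))

def fuse_average(posteriors):
    """Average-of-decisions fusion over per-classifier posterior dicts."""
    classes = posteriors[0]
    return {c: sum(p[c] for p in posteriors) / len(posteriors) for c in classes}

def fuse_product(posteriors):
    """Product-of-decisions fusion (assumes classifier independence)."""
    classes = posteriors[0]
    return {c: math.prod(p[c] for p in posteriors) for c in classes}
```

The fused class is then the argmax of the combined scores; the product rule punishes any single confident dissenter harder than the average rule does.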
41 computer game: Acoustic Evaluation

  UAR (%)      MFCC   F0     RMS energy  Voicing  ZCR
  Male         67.9   45.4   42.9        40.3     41.1
  Female       70.4   51.3   44.9        45.8     46.3
  [age group]  -      51.7   44.2        45.7     44.2
  [age group]  -      47.9   45.2        44.0     43.6
  [age group]  -      49.3   44.0        43.0     44.9

- A k-nearest neighbor classifier (k-NN) with k = 3 was used; classification results for the three categories (neutral, polite, frustrated) are computed using 10-fold cross-validation.
- (The labels and MFCC values of the last three rows were not preserved in the transcription.)
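The metric in these tables, unweighted average recall (UAR), is simply the mean of the per-class recalls, so every class counts equally despite the heavy neutral-class imbalance noted earlier. A sketch:

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: each class contributes equally,
    regardless of how many samples it has."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))
```

On imbalanced data, a classifier that always predicts the majority class gets high plain accuracy but poor UAR, which is why UAR is the standard choice for emotion recognition evaluations.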
42 computer game: Two-Way Classification, Polite vs. Others
[Table: unweighted average recall (%) for Male and Female per single information source: Acoustic, Lex1, Lex2, LSA, Context; values not preserved in the transcription]

43 computer game: Two-Way Classification, Polite vs. Others
[Table: UAR (%) for Male and Female per pairwise fusion: Acou + Lex1, Acou + Lex2, Acou + LSA, Acou + Ctxt; values not preserved]

44 computer game: Two-Way Classification, Polite vs. Others
[Table: UAR (%) for Male and Female per three-source fusion: Acou + Lex1 + C, Acou + Lex2 + C, Acou + LSA + C; values not preserved]

45 computer game: Two-Way Classification, Frustrated vs. Others
[Table: UAR (%) for Male and Female per single information source: Acoustic, Lex1, Lex2, LSA, Context; values not preserved]

46 computer game: Two-Way Classification, Frustrated vs. Others
[Table: UAR (%) for Male and Female per pairwise fusion: Acou + Lex1, Acou + Lex2, Acou + LSA, Acou + Ctxt; values not preserved]

47 computer game: Two-Way Classification, Frustrated vs. Others
[Table: UAR (%) for Male and Female per three-source fusion: Acou + Lex1 + C, Acou + Lex2 + C, Acou + LSA + C; values not preserved]

48 computer game: Three-Way Classification
[Table: three-way classification results in UAR (%) for Male and Female per single information source: Acoustic, Lex1, Lex2, LSA, Context; values not preserved]

49 computer game: Three-Way Classification
[Table: three-way UAR (%) for Male and Female per pairwise fusion: Acou + Lex1, Acou + Lex2, Acou + LSA, Acou + Ctxt; values not preserved]

50 computer game: Three-Way Classification
[Table: three-way UAR (%) for Male and Female per three-source fusion: Acou + Lex1 + C, Acou + Lex2 + C, Acou + LSA + C; values not preserved]
More informationSARDNET: A Self-Organizing Feature Map for Sequences
SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu
More informationA Comparison of DHMM and DTW for Isolated Digits Recognition System of Arabic Language
A Comparison of DHMM and DTW for Isolated Digits Recognition System of Arabic Language Z.HACHKAR 1,3, A. FARCHI 2, B.MOUNIR 1, J. EL ABBADI 3 1 Ecole Supérieure de Technologie, Safi, Morocco. zhachkar2000@yahoo.fr.
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationAutomatic Pronunciation Checker
Institut für Technische Informatik und Kommunikationsnetze Eidgenössische Technische Hochschule Zürich Swiss Federal Institute of Technology Zurich Ecole polytechnique fédérale de Zurich Politecnico federale
More informationAcoustic correlates of stress and their use in diagnosing syllable fusion in Tongan. James White & Marc Garellek UCLA
Acoustic correlates of stress and their use in diagnosing syllable fusion in Tongan James White & Marc Garellek UCLA 1 Introduction Goals: To determine the acoustic correlates of primary and secondary
More informationEnglish Language and Applied Linguistics. Module Descriptions 2017/18
English Language and Applied Linguistics Module Descriptions 2017/18 Level I (i.e. 2 nd Yr.) Modules Please be aware that all modules are subject to availability. If you have any questions about the modules,
More informationMachine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler
Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina
More informationWE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT
WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working
More information*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN
From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,
More informationProbability and Statistics Curriculum Pacing Guide
Unit 1 Terms PS.SPMJ.3 PS.SPMJ.5 Plan and conduct a survey to answer a statistical question. Recognize how the plan addresses sampling technique, randomization, measurement of experimental error and methods
More informationAN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282)
B. PALTRIDGE, DISCOURSE ANALYSIS: AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC. 2012. PP. VI, 282) Review by Glenda Shopen _ This book is a revised edition of the author s 2006 introductory
More informationRobust Speech Recognition using DNN-HMM Acoustic Model Combining Noise-aware training with Spectral Subtraction
INTERSPEECH 2015 Robust Speech Recognition using DNN-HMM Acoustic Model Combining Noise-aware training with Spectral Subtraction Akihiro Abe, Kazumasa Yamamoto, Seiichi Nakagawa Department of Computer
More informationThe Effect of Discourse Markers on the Speaking Production of EFL Students. Iman Moradimanesh
The Effect of Discourse Markers on the Speaking Production of EFL Students Iman Moradimanesh Abstract The research aimed at investigating the relationship between discourse markers (DMs) and a special
More informationDialog Act Classification Using N-Gram Algorithms
Dialog Act Classification Using N-Gram Algorithms Max Louwerse and Scott Crossley Institute for Intelligent Systems University of Memphis {max, scrossley } @ mail.psyc.memphis.edu Abstract Speech act classification
More informationThink A F R I C A when assessing speaking. C.E.F.R. Oral Assessment Criteria. Think A F R I C A - 1 -
C.E.F.R. Oral Assessment Criteria Think A F R I C A - 1 - 1. The extracts in the left hand column are taken from the official descriptors of the CEFR levels. How would you grade them on a scale of low,
More informationIndividual Component Checklist L I S T E N I N G. for use with ONE task ENGLISH VERSION
L I S T E N I N G Individual Component Checklist for use with ONE task ENGLISH VERSION INTRODUCTION This checklist has been designed for use as a practical tool for describing ONE TASK in a test of listening.
More informationMETHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS
METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationLip reading: Japanese vowel recognition by tracking temporal changes of lip shape
Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,
More informationLongest Common Subsequence: A Method for Automatic Evaluation of Handwritten Essays
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 6, Ver. IV (Nov Dec. 2015), PP 01-07 www.iosrjournals.org Longest Common Subsequence: A Method for
More informationMultilingual Sentiment and Subjectivity Analysis
Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department
More informationAn Empirical Analysis of the Effects of Mexican American Studies Participation on Student Achievement within Tucson Unified School District
An Empirical Analysis of the Effects of Mexican American Studies Participation on Student Achievement within Tucson Unified School District Report Submitted June 20, 2012, to Willis D. Hawley, Ph.D., Special
More informationGenerative models and adversarial training
Day 4 Lecture 1 Generative models and adversarial training Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University What is a generative model?
More informationGuru: A Computer Tutor that Models Expert Human Tutors
Guru: A Computer Tutor that Models Expert Human Tutors Andrew Olney 1, Sidney D'Mello 2, Natalie Person 3, Whitney Cade 1, Patrick Hays 1, Claire Williams 1, Blair Lehman 1, and Art Graesser 1 1 University
More informationMath-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade
Math-U-See Correlation with the Common Core State Standards for Mathematical Content for Third Grade The third grade standards primarily address multiplication and division, which are covered in Math-U-See
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH 2009 423 Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition George
More informationInternational Journal of Computational Intelligence and Informatics, Vol. 1 : No. 4, January - March 2012
Text-independent Mono and Cross-lingual Speaker Identification with the Constraint of Limited Data Nagaraja B G and H S Jayanna Department of Information Science and Engineering Siddaganga Institute of
More informationSEGMENTAL FEATURES IN SPONTANEOUS AND READ-ALOUD FINNISH
SEGMENTAL FEATURES IN SPONTANEOUS AND READ-ALOUD FINNISH Mietta Lennes Most of the phonetic knowledge that is currently available on spoken Finnish is based on clearly pronounced speech: either readaloud
More informationCourse Law Enforcement II. Unit I Careers in Law Enforcement
Course Law Enforcement II Unit I Careers in Law Enforcement Essential Question How does communication affect the role of the public safety professional? TEKS 130.294(c) (1)(A)(B)(C) Prior Student Learning
More informationDesign Of An Automatic Speaker Recognition System Using MFCC, Vector Quantization And LBG Algorithm
Design Of An Automatic Speaker Recognition System Using MFCC, Vector Quantization And LBG Algorithm Prof. Ch.Srinivasa Kumar Prof. and Head of department. Electronics and communication Nalanda Institute
More informationSpeech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence
INTERSPEECH September,, San Francisco, USA Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence Bidisha Sharma and S. R. Mahadeva Prasanna Department of Electronics
More informationSouth Carolina English Language Arts
South Carolina English Language Arts A S O F J U N E 2 0, 2 0 1 0, T H I S S TAT E H A D A D O P T E D T H E CO M M O N CO R E S TAT E S TA N DA R D S. DOCUMENTS REVIEWED South Carolina Academic Content
More informationADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY DOWNLOAD EBOOK : ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY PDF
Read Online and Download Ebook ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY DOWNLOAD EBOOK : ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY PDF Click link bellow and free register to download
More informationProbability estimates in a scenario tree
101 Chapter 11 Probability estimates in a scenario tree An expert is a person who has made all the mistakes that can be made in a very narrow field. Niels Bohr (1885 1962) Scenario trees require many numbers.
More informationSpeaker recognition using universal background model on YOHO database
Aalborg University Master Thesis project Speaker recognition using universal background model on YOHO database Author: Alexandre Majetniak Supervisor: Zheng-Hua Tan May 31, 2011 The Faculties of Engineering,
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationListening and Speaking Skills of English Language of Adolescents of Government and Private Schools
Listening and Speaking Skills of English Language of Adolescents of Government and Private Schools Dr. Amardeep Kaur Professor, Babe Ke College of Education, Mudki, Ferozepur, Punjab Abstract The present
More informationAxiom 2013 Team Description Paper
Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association
More informationIntroduction to the Common European Framework (CEF)
Introduction to the Common European Framework (CEF) The Common European Framework is a common reference for describing language learning, teaching, and assessment. In order to facilitate both teaching
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationAGENDA LEARNING THEORIES LEARNING THEORIES. Advanced Learning Theories 2/22/2016
AGENDA Advanced Learning Theories Alejandra J. Magana, Ph.D. admagana@purdue.edu Introduction to Learning Theories Role of Learning Theories and Frameworks Learning Design Research Design Dual Coding Theory
More informationSpeech Translation for Triage of Emergency Phonecalls in Minority Languages
Speech Translation for Triage of Emergency Phonecalls in Minority Languages Udhyakumar Nallasamy, Alan W Black, Tanja Schultz, Robert Frederking Language Technologies Institute Carnegie Mellon University
More informationThe NICT/ATR speech synthesis system for the Blizzard Challenge 2008
The NICT/ATR speech synthesis system for the Blizzard Challenge 2008 Ranniery Maia 1,2, Jinfu Ni 1,2, Shinsuke Sakai 1,2, Tomoki Toda 1,3, Keiichi Tokuda 1,4 Tohru Shimizu 1,2, Satoshi Nakamura 1,2 1 National
More informationAGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS
AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic
More information