Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks


Bing Liu, Ian Lane
Carnegie Mellon University
liubing@cmu.edu, lane@cmu.edu

Outline
Background & Motivation
Proposed Methods
Experiments & Results
Conclusions

Background
Spoken Language Understanding (SLU) is an important component in spoken dialog systems. The main tasks in SLU are intent detection and slot filling.

Background: Intent detection
Intent detection is a sequence classification task; common approaches include SVM, CNN [1], and Recursive NN [2].
Fig 1. CNN intent model [1]. Fig 2. Recursive NN intent model [2].
[1] Xu, Puyang, and Ruhi Sarikaya. "Convolutional neural network based triangular CRF for joint intent detection and slot filling." ASRU, 2013.
[2] Guo, Daniel, et al. "Joint semantic utterance classification and slot filling with recursive neural networks." SLT, 2014.

Background: Slot filling
Slot filling is a sequence labeling task; common approaches include MEMM, CRF, and RNN [1, 2].
Fig. RNN slot filling model.
[1] Mesnil, Grégoire, et al. "Using recurrent neural networks for slot filling in spoken language understanding." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.
[2] Yao, Kaisheng, et al. "Spoken language understanding using long short-term memory neural networks." SLT, 2014.
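As a concrete illustration of slot filling as sequence labeling, here is an IOB-style tagging of a flight query. The specific label names are typical ATIS slot labels, chosen here for illustration:

```python
# IOB slot labels for a flight query (labels chosen for illustration;
# the actual ATIS label set has 127 entries).
words = ["first", "class", "flights", "from", "phoenix", "to", "seattle"]
slots = ["B-class_type", "I-class_type", "O", "O",
         "B-fromloc.city_name", "O", "B-toloc.city_name"]

# Slot filling assigns exactly one tag per token.
pairs = list(zip(words, slots))
```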

Background: Joint intent detection & slot filling
Benefits: it simplifies the SLU system, and each task's generalization performance can improve by leveraging the other, related task. Joint models include CNN [1] and Recursive NN [2].
[1] Xu, Puyang, and Ruhi Sarikaya. "Convolutional neural network based triangular CRF for joint intent detection and slot filling." ASRU, 2013.
[2] Guo, Daniel, et al. "Joint semantic utterance classification and slot filling with recursive neural networks." SLT, 2014.

Background
Limitations of previous joint SLU models: they are conditioned on the entire word sequence, so they are not suitable for online tasks.

Motivation
Develop a model that performs online (incremental) SLU as each new word arrives. The SLU results then provide additional context for next-word prediction in online ASR decoding. → Joint online (incremental) SLU + LM.

Query: "First class flights from Phoenix to Seattle"
First → class → flights → from → Phoenix → to → Seattle
As words arrive, intent confidence scores accumulate, and the LM's next-word probabilities shift accordingly:

Next word    Prob      Prob      Prob
pittsburgh   1.1e-3    2.1e-3    2.6e-3
phone        0.7e-3    0.7e-3    0.7e-3
phoenix      1.4e-3    2.4e-3    2.4e-3
price        3.0e-3    1.8e-3    1.2e-3
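The effect in the slide's table can be mimicked with a toy rescoring sketch. Note this is only illustrative: the boost factors, vocabulary, and multiplicative renormalization below are assumptions; the proposed model instead feeds the intent context into the RNN state directly.

```python
# Toy illustration: intent evidence (a flight-related intent) boosting
# city names among LM next-word candidates. All numbers and the
# multiplicative-boost mechanism are illustrative assumptions.
def rescore(lm_probs, intent_boost):
    scores = {w: p * intent_boost.get(w, 1.0) for w, p in lm_probs.items()}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

lm_probs = {"pittsburgh": 1.1e-3, "phone": 0.7e-3,
            "phoenix": 1.4e-3, "price": 3.0e-3}
boost = {"pittsburgh": 3.0, "phoenix": 3.0}  # flight intent favors cities
adjusted = rescore(lm_probs, boost)
```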

Proposed Methods

Independent task models
RNN language model, RNN intent detection model, RNN slot filling model.

Joint model
The intent model, slot filling model, and language model share a single recurrent state; each task adds its own output layer on top of it.

Next step prediction
The intent and slot label outputs at the current step are fed back, together with the current word, as context for the next step's predictions.
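A minimal sketch of the shared-state idea: one recurrent state drives three softmax heads (intent, slot, next word). This uses a plain tanh recurrence with tiny random weights purely for illustration; the actual model uses LSTM cells and also feeds the task outputs back as context, which is omitted here.

```python
import math, random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

class JointSLULM:
    """Illustrative joint model: shared recurrent state, three heads."""
    def __init__(self, vocab, n_intents, n_slots, hidden=8, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden = hidden
        mat = lambda r, c: [[rng.uniform(-0.1, 0.1) for _ in range(c)]
                            for _ in range(r)]
        self.emb = mat(len(vocab), hidden)      # word embeddings
        self.W_h = mat(hidden, hidden)          # recurrent weights
        self.W_intent = mat(hidden, n_intents)  # intent head
        self.W_slot = mat(hidden, n_slots)      # slot filling head
        self.W_lm = mat(hidden, len(vocab))     # language model head

    def step(self, h, word):
        x = self.emb[self.vocab.index(word)]
        # toy tanh recurrence (the paper uses LSTM cells)
        h = [math.tanh(xi + sum(hj * w for hj, w in zip(h, col)))
             for xi, col in zip(x, zip(*self.W_h))]
        head = lambda W: softmax([sum(hi * wij for hi, wij in zip(h, col))
                                  for col in zip(*W)])
        # all three per-word distributions come from the same state
        return h, head(self.W_intent), head(self.W_slot), head(self.W_lm)

vocab = ["first", "class", "flights", "from", "phoenix", "to", "seattle"]
model = JointSLULM(vocab, n_intents=2, n_slots=3)
h = [0.0] * model.hidden
for w in ["first", "class", "flights"]:
    h, p_intent, p_slot, p_word = model.step(h, w)
```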

Joint model training
Training minimizes a linear interpolation of the cost for each task: intent detection, slot filling, and language modeling.
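The interpolated objective can be sketched as follows; the weight names (alpha, beta, gamma) and their defaults are assumptions, since the slide only states "linear interpolation of the cost for each task":

```python
# Interpolated training cost over the three tasks (weight names and
# defaults are illustrative assumptions).
def joint_cost(intent_loss, slot_loss, lm_loss,
               alpha=1.0, beta=1.0, gamma=1.0):
    return alpha * intent_loss + beta * slot_loss + gamma * lm_loss
```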

Query: "First class flights from Phoenix to Seattle"
First → class → flights → from → Phoenix → to → Seattle
Intent estimation might be unstable at the beginning of the sequence, so the intent context is adjusted/scaled.
Fig. Schedule of increasing the intent contribution to the context vector as the input word sequence grows.
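One plausible realization of the schedule in the figure is to ramp the intent contribution toward 1 as more words arrive; the linear ramp and warm-up length below are illustrative assumptions, not the paper's exact schedule:

```python
# Ramp the intent-context weight up as the word sequence grows
# (linear ramp and warm-up length are illustrative assumptions).
def intent_weight(t, warmup=5):
    """Scale factor for the intent context at word position t (0-indexed)."""
    return min(1.0, (t + 1) / warmup)

weights = [intent_weight(t) for t in range(7)]
```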

Joint model variations

Experiments & Results

Data set
ATIS (Airline Travel Information System).
Intent detection: 18 intent classes, evaluated on classification error rate.
Slot filling: 127 slot labels, evaluated on F1 score.

Experiments
RNN model settings: LSTM cells, mini-batch training, Adam optimizer, dropout & L2 regularization.
ASR model settings: AM trained on LibriSpeech; LM trained on the ATIS corpus.

Experiments
Inputs: true text input; speech input with simulated noise.
Models: independent training model; basic joint model; joint model with intent context; joint model with slot label context; joint model with intent & slot label context.
Tasks: intent detection, slot filling, language modeling.

Experiment Results: True text input, intent detection
0.56% absolute (26.3% relative) error reduction over the independently trained intent model.

Experiment Results: True text input, slot filling
Slight degradation in slot filling F1 score compared to the independently trained slot filling model.

Experiment Results: True text input, language modeling
11.8% relative reduction in perplexity compared to the independently trained language model.
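For reference, the perplexity formula behind this metric and the relative-reduction arithmetic can be sketched as follows; the token probabilities and the baseline value are made up, and only the 11.8% figure comes from the slide:

```python
import math

# Per-token perplexity of a language model over a held-out sequence.
def perplexity(token_probs):
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

ppl_independent = 100.0                    # hypothetical baseline value
ppl_joint = ppl_independent * (1 - 0.118)  # 11.8% relative reduction
```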

Experiment Results: Noisy speech input & ASR output

ASR Settings                                                            WER    Intent Error  F1 Score
Decoding: LibriSpeech AM & 2-gram LM                                    14.51  4.63          84.46
Decoding: LibriSpeech AM & 2-gram LM; Rescoring: 5-gram LM              13.66  5.02          85.08
Decoding: LibriSpeech AM & 2-gram LM; Rescoring: independent RNNLM      12.95  4.63          85.43
Decoding: LibriSpeech AM & 2-gram LM; Rescoring: joint training RNNLM   12.59  4.44          86.87


Conclusions
We proposed an RNN model for joint online (incremental) SLU and LM. It improves performance on intent detection and language modeling, with slight degradation on slot filling, and shows consistent gains over the independently trained models on noisy speech input.

Thanks & Questions