YUN-NUNG (VIVIAN) CHEN


End-to-End Memory Networks for Multi-Turn Spoken Language Understanding
Yun-Nung (Vivian) Chen (http://vivianchen.idv.tw), with Hakkani-Tur, Tur, Gao, Deng

Outline
- Introduction: Spoken Dialogue System; Spoken/Natural Language Understanding (SLU/NLU)
- Contextual Spoken Language Understanding: Model Architecture; End-to-End Training
- Experiments
- Conclusion & Future Work


Spoken Dialogue System (SDS)
Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions. They are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.). Fictional examples: JARVIS, Iron Man's personal assistant, and Baymax, a personal healthcare companion. Good intelligent assistants help users organize and access information conveniently.

Dialogue System Pipeline
[Figure: pipeline diagram] Speech signal → ASR hypothesis ("are there any action movies to see this weekend") → text input ("Are there any action movies to see this weekend?") → Language Understanding (LU: user intent detection, slot filling) → semantic frame of intents and slots (request_movie, genre=action, date=this weekend) → Dialogue Management (DM: dialogue state tracking, policy decision) → system action (request_location) → Output Generation → text response ("Where are you located?") and screen display ("location?").
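The semantic frame the LU component emits is just an intent plus slot-value pairs; a minimal sketch of that structure in Python (the class and field names are illustrative, not from the talk):

```python
from dataclasses import dataclass, field

@dataclass
class SemanticFrame:
    """Output of the LU step: a user intent plus the filled slots."""
    intent: str
    slots: dict = field(default_factory=dict)

# The example flowing through the pipeline above:
frame = SemanticFrame(intent="request_movie",
                      slots={"genre": "action", "date": "this weekend"})
print(frame)
```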

LU Importance
[Figure: learning curve of system performance; success rate vs. simulation epoch (0 to 495). Legend: Upper Bound, DQN - 0.00, DQN - 0.05, Rule - 0.00, Rule - 0.05. This build shows the RL agent w/o LU errors and the rule agent w/o LU errors.]

LU Importance
[Figure: same learning curve, now also showing the RL agent w/ 5% LU errors and the rule agent w/ 5% LU errors; the 5% LU error rate causes a >5% performance drop.]
The system performance is sensitive to LU errors, for both rule-based and reinforcement learning agents.

Dialogue System Pipeline
[Figure: same pipeline diagram, with LU marked as the current bottleneck and the source of error propagation to the downstream components.]
SLU usually focuses on understanding single-turn utterances. The understanding result is usually influenced by 1) local observations and 2) global knowledge.

Spoken Language Understanding
SLU consists of domain identification (D), intent prediction (I), and slot filling (S).
Single-turn example:
  Utterance: "just sent email to bob about fishing this weekend"
  D: communication; I: send_email
  Slot tags: just/O sent/O email/O to/O bob/B-contact_name about/O fishing/B-subject this/I-subject weekend/I-subject
  → send_email(contact_name="bob", subject="fishing this weekend")
Multi-turn example:
  U1: "send email to bob" (bob/B-contact_name) → S1: send_email(contact_name="bob")
  U2: "are we going to fish this weekend" (are/B-message, remaining words I-message) → S2: send_email(message="are we going to fish this weekend")
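Those B-/I-/O tags decode mechanically into slot fills; a minimal sketch of that decoding (the helper is mine, not part of the presented system):

```python
def iob_to_slots(tokens, tags):
    """Collect B-xxx/I-xxx token spans into {slot: value} pairs."""
    slots, name, span = {}, None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):              # a new slot span starts here
            if name:
                slots[name] = " ".join(span)
            name, span = tag[2:], [token]
        elif tag.startswith("I-") and name:   # the current span continues
            span.append(token)
        else:                                 # "O" closes any open span
            if name:
                slots[name] = " ".join(span)
            name, span = None, []
    if name:                                  # flush a span ending the sentence
        slots[name] = " ".join(span)
    return slots

tokens = "just sent email to bob about fishing this weekend".split()
tags = ["O", "O", "O", "O", "B-contact_name", "O",
        "B-subject", "I-subject", "I-subject"]
print(iob_to_slots(tokens, tags))
# -> {'contact_name': 'bob', 'subject': 'fishing this weekend'}
```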

Outline
- Introduction: Spoken Dialogue System; Spoken/Natural Language Understanding (SLU/NLU)
- Contextual Spoken Language Understanding: Model Architecture; End-to-End Training
- Experiments
- Conclusion & Future Work

MODEL ARCHITECTURE
1. Sentence Encoding: history utterances {x_i} are encoded by a contextual sentence encoder (RNN_mem) into memory representations m_i; the current utterance c is encoded by a sentence encoder (RNN_in) into a vector u.
2. Knowledge Attention: the inner product between u and each m_i, normalized (softmax), gives the knowledge attention distribution p_i over the history.
3. Knowledge Encoding: the weighted sum h of the memories is mapped (via W_kg) into a knowledge encoding representation that conditions an RNN tagger, which outputs the slot tagging sequence y.
Idea: additionally incorporate contextual knowledge during slot tagging.
Chen et al., "End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding", Interspeech, 2016.
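The knowledge attention step (inner product between u and each memory m_i, normalized into p_i, then a weighted sum h) fits in a few lines; a minimal numpy sketch following the slide's notation, not the authors' released code (the softmax normalization is an assumption from the memory-network literature):

```python
import numpy as np

def knowledge_attention(u, memory):
    """u: (d,) current-utterance encoding; memory: (n, d) history encodings m_i.
    Returns the attention distribution p over history turns and the summary h."""
    scores = memory @ u                    # inner product <u, m_i> per turn
    p = np.exp(scores - scores.max())      # softmax (shifted for stability)
    p /= p.sum()
    h = p @ memory                         # weighted sum of memory vectors
    return p, h

rng = np.random.default_rng(0)
memory = rng.normal(size=(3, 150))         # 3 history turns, embedding dim 150
u = rng.normal(size=150)
p, h = knowledge_attention(u, memory)
print(p.round(3), h.shape)                 # p sums to 1; h has shape (150,)
```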

MODEL ARCHITECTURE (CNN variant)
The same three steps, but the contextual sentence encoder and the current-utterance encoder can be CNNs instead of RNNs (RNN_mem/CNN and RNN_in/CNN in the figure); knowledge attention, knowledge encoding, and the RNN tagger are unchanged.
Idea: additionally incorporate contextual knowledge during slot tagging.
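For the CNN option, a standard choice is a 1-D convolution over word embeddings followed by max-over-time pooling; a minimal PyTorch sketch under that assumption (the slide does not give the CNN configuration, so filter width and pooling here are mine):

```python
import torch
import torch.nn as nn

class CNNSentenceEncoder(nn.Module):
    """Encode a word-embedding sequence into one fixed-size vector via
    1-D convolution + ReLU + max-over-time pooling."""
    def __init__(self, emb_dim=150, hidden=100, width=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=width, padding=1)

    def forward(self, emb):                 # emb: (batch, seq_len, emb_dim)
        x = self.conv(emb.transpose(1, 2))  # -> (batch, hidden, seq_len)
        return torch.relu(x).max(dim=2).values

enc = CNNSentenceEncoder()
print(enc(torch.randn(2, 9, 150)).shape)    # torch.Size([2, 100])
```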

END-TO-END TRAINING
Tagging objective: the model is trained to produce the slot tag sequence given the contextual utterances and the current utterance. [Figure: RNN tagger unrolled over words w_{t-1}, w_t, w_{t+1}, with hidden states h_{t-1}, h_t, h_{t+1}, output tags y_{t-1}, y_t, y_{t+1}, and weight matrices U, V, W, M.]
The network automatically figures out the attention distribution without explicit supervision.
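"Without explicit supervision" means the attention receives no labels of its own: a single tagging loss is backpropagated through the tagger, knowledge encoding, attention, and encoders alike. A minimal sketch of that objective (cross-entropy over slot tags is my assumption; the slide only names a tagging objective):

```python
import torch
import torch.nn.functional as F

def tagging_loss(tag_logits, gold_tags):
    """tag_logits: (seq_len, n_tags) scores from the RNN tagger;
    gold_tags: (seq_len,) gold slot-tag ids. Backpropagating this single
    loss trains encoders, attention, and tagger jointly, so the attention
    distribution is learned without any direct supervision."""
    return F.cross_entropy(tag_logits, gold_tags)

logits = torch.randn(9, 5, requires_grad=True)    # 9 words, 5 tag types
gold = torch.tensor([0, 0, 0, 0, 1, 0, 2, 3, 3])  # illustrative tag ids
loss = tagging_loss(logits, gold)
loss.backward()                                   # gradients flow end to end
print(float(loss))
```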

Outline
- Introduction: Spoken Dialogue System; Spoken/Natural Language Understanding (SLU/NLU)
- Contextual Spoken Language Understanding: Model Architecture; End-to-End Training
- Experiments
- Conclusion & Future Work

EXPERIMENTS
Dataset: Cortana communication session data. Setup: GRU for all RNNs, Adam optimizer, embedding dim = 150, hidden units = 100, dropout = 0.5.

Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5
(x = component not used)

The model trained on single-turn data performs worse on non-first turns due to mismatched training data.

EXPERIMENTS (cont.)
Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5
RNN Tagger | multi-turn | x | x | 55.9 | 45.7 | 47.4

Treating multi-turn data as single-turn for training performs reasonably.

EXPERIMENTS (cont.)
Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5
RNN Tagger | multi-turn | x | x | 55.9 | 45.7 | 47.4
Encoder-Tagger | multi-turn | current utt (c) | RNN | 57.6 | 56.0 | 56.3
Encoder-Tagger | multi-turn | history + current (x, c) | RNN | 69.9 | 60.8 | 62.5

Encoding current and history utterances improves the performance but increases the training time.

EXPERIMENTS (cont.)
Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5
RNN Tagger | multi-turn | x | x | 55.9 | 45.7 | 47.4
Encoder-Tagger | multi-turn | current utt (c) | RNN | 57.6 | 56.0 | 56.3
Encoder-Tagger | multi-turn | history + current (x, c) | RNN | 69.9 | 60.8 | 62.5
Proposed | multi-turn | history + current (x, c) | RNN | 73.2 | 65.7 | 67.1

Applying memory networks significantly outperforms all approaches with much less training time.

EXPERIMENTS (cont.)
Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5
RNN Tagger | multi-turn | x | x | 55.9 | 45.7 | 47.4
Encoder-Tagger | multi-turn | current utt (c) | RNN | 57.6 | 56.0 | 56.3
Encoder-Tagger | multi-turn | history + current (x, c) | RNN | 69.9 | 60.8 | 62.5
Proposed | multi-turn | history + current (x, c) | RNN | 73.2 | 65.7 | 67.1
Proposed (NEW, not in the paper) | multi-turn | history + current (x, c) | CNN | 73.8 | 66.5 | 68.0

CNN produces comparable results for sentence encoding with shorter training time.

Outline
- Introduction: Spoken Dialogue System; Spoken/Natural Language Understanding (SLU/NLU)
- Contextual Spoken Language Understanding: Model Architecture; End-to-End Training
- Experiments
- Conclusion & Future Work

Conclusion
- The proposed end-to-end memory network stores contextual knowledge, which can be exploited dynamically through an attention model that manages knowledge carryover for multi-turn understanding.
- The end-to-end model performs the tagging task instead of classification.
- The experiments show the feasibility and robustness of modeling knowledge carryover through memory networks.

Future Work
- Leverage not only local observations but also global knowledge for better language understanding; syntax or semantics can serve as global knowledge to guide the understanding model.
- "Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks", arXiv preprint arXiv:1609.03286.

Q & A
Thanks for your attention!
The code will be available at https://github.com/yvchen/contextualslu