Open Domain Statistical Spoken Dialogue Systems


Open Domain Statistical Spoken Dialogue Systems
Steve Young
Dialogue Systems Group, Machine Intelligence Laboratory
Cambridge University Engineering Department, Cambridge, UK

Contents
Building an End-to-End Statistical Dialogue System for a Single Domain:
- Spoken Language Understanding
- Belief Tracking
- Policies and Dialogue Management
- Natural Language Generation
Towards Open-Domain Dialogue Systems:
- A multi-domain architecture
- Distributed Dialogue Management
- Incremental Domain Learning
- On-line Adaptation
Conclusions

Statistical Spoken Dialogue
To enable fully automatic on-line learning, all components must be trainable from data: Deploy, Collect Data, Improve.

Statistical Spoken Dialog System
User: "I'd like a cheap Italian on the east side of town"
Understanding (turn level): ASR -> Semantic Decoder -> inform(price=cheap, food=italian, area=east) [0.7]
Belief Tracker (dialogue level): maintains distributions over slots -- Type (Restaurant, Hotel, Bar), Food (Italian, Indian, Dontcare), Area (North, East, South), Price (Expensive, Cheap, Dontcare)
Dialog Manager: policy-based decision logic over the belief state, backed by a Database/Application -> confirm-request(food)
Generation: Response Planner -> confirm-request(price=cheap, area=east, food=?) -> Message Generator -> TTS: "You'd like a cheap restaurant on the east side of town? What kind of food would you like?"

Spoken Language Understanding (SLU)
Various decoding strategies:
a) Semantic parsing: "I'd like a cheap Italian on the east side of town" -> Grammar Rules -> Phoenix Parser -> Frame: inform; Type: restaurant; Food: italian; Price: cheap; Area: east
b) Semantic tagging: Ŷ = argmax_Y P(Y|X), e.g. HMM, CRF.
   X = "I'd like a cheap Italian on the east side of town"
   Y = B-inform I-inform o B-price B-food o o B-area I-area I-area I-area
   -> inform(price=cheap, food=italian, area=east)
c) Semantic tuple classifier: N-gram features over "I'd like a <p-value> <f-value> on the <a-value> side of town" feed one SVM per slot -- SVM-food: food=italian [p=0.8]; SVM-price: price=cheap [p=0.5]; SVM-area: area=east [p=0.7]; etc.
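The tagging strategy in (b) still needs a small decoding step from BIO tags back to a flat dialog act. A minimal sketch of that step, using a simplified tag sequence in which only "east" carries the area tag; the function name and decoder logic are illustrative assumptions, not the system's actual code:

```python
def bio_to_dialog_act(tokens, tags):
    """Collapse BIO slot tags into act_type(slot=value, ...) form."""
    act_type, slots, current = None, [], None
    for tok, tag in zip(tokens, tags):
        if tag == "o":                      # token carries no semantics
            current = None
            continue
        prefix, label = tag.split("-", 1)   # e.g. "B-price" -> ("B", "price")
        if label == "inform":               # act-type tokens, not slot fillers
            act_type = label
            continue
        if prefix == "B":                   # start a new slot value
            current = [label, [tok]]
            slots.append(current)
        elif prefix == "I" and current and current[0] == label:
            current[1].append(tok)          # continue the current value
    args = ", ".join(f"{name}={' '.join(val)}" for name, val in slots)
    return f"{act_type}({args})"

tokens = "I'd like a cheap Italian on the east side of town".split()
tags = ["B-inform", "I-inform", "o", "B-price", "B-food",
        "o", "o", "B-area", "o", "o", "o"]
print(bio_to_dialog_act(tokens, tags))  # inform(price=cheap, food=Italian, area=east)
```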

SLU Performance
Semantic tuple classifier, Cambridge Restaurant System: noisy in-car data, various conditions, 37% average word error rate (WER); 10571 training utterances, 4882 test utterances.

Features                    Trained On         F-Score  Item Cross Entropy
Phoenix                     -                  0.69     2.78
CRF                         ASR 1-best         0.67     2.75
N-grams                     ASR 1-best         0.69     1.79
N-grams                     ASR 2-best         0.70     1.72
Weighted N-grams            ASR 10-best        0.71     1.76
Weighted N-grams            Confusion Network  0.73     1.68
Weighted N-grams + Context  Confusion Network  0.77     1.43

Observations: the choice of classifier is not so important, but the 1-best hypothesis incurs significant information loss.
M. Henderson, et al (2012). "Discriminative Spoken Language Understanding Using Word Confusion Networks." IEEE SLT 2012, Miami, FL.

Belief Tracking
Input: inform(price=cheap, food=italian, area=east) [0.7] -> Belief Tracker -> confirm-request(food)
Aim: to maintain a distribution over all dialogue state variables (Type, Food, Area, Price, ...) using the SLU output at each turn as evidence.
3 principal approaches:
- rule-based
- dynamic Bayesian network
- discriminative model (e.g. RNN)

Dynamic Bayesian Networks (DBNs)
Goal nodes (g_type, g_food) model user behaviour against the ontology (type = bar, restaurant, hotel; food = french, chinese, italian, ...).
User act nodes (u_type, u_food) and memory/history nodes (h_type, h_food) link the goals to the observations at time t (o_type, o_food), which are subject to recognition errors, e.g. "I'm looking for an Indian restaurant".
All nodes are conditioned on the previous action and the previous time-slice (t+1 follows t).
B. Thomson and S. Young (2010). "Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems." Computer Speech and Language, 24(4):562-588. [CSL 2015 Best Paper Award]

Recurrent Neural Net Belief Tracking
Inputs at each turn: the last system action plus SLU (ASR) N-grams; a recurrent neural network with a memory updates the belief state.
M. Henderson, B. Thomson and S. Young (2014). "Word-Based Dialog State Tracking with Recurrent Neural Networks." SigDial 2014, Philadelphia, PA.

Belief Tracking Performance
Cambridge Restaurant System (Dialog State Tracking Challenge 2): telephone data, various conditions, 20% to 40% average word error rate (WER); 1612 training dialogs, 1117 test dialogs.
Metrics: Joint Slot Accuracy (fraction of turns in which all goal labels are correct); Joint L2 (L2 norm between the tracker output distribution and the reference).

System     Features  Accuracy  L2
Baseline   SLU       61.6%     0.74
Bayes Net  SLU       67.5%     0.55
Delex RNN  SLU       73.7%     0.41
Full RNN   SLU       74.2%     0.39
Delex RNN  ASR       74.6%     0.38
Full RNN   ASR       76.8%     0.35

Observations: the discriminative trackers are significantly better than the generative (Bayes Net) tracker, and the intermediate semantic representation incurs more information loss.
M. Henderson, B. Thomson and J. Williams (2014). "The Second Dialog State Tracking Challenge." SigDial 2014, Philadelphia, PA.
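The two metrics above can be sketched in a few lines; the toy turns and the tracker distribution below are invented purely for illustration:

```python
import math

def joint_accuracy(turns):
    """Fraction of turns where every goal label is predicted correctly."""
    correct = sum(1 for pred, ref in turns if pred == ref)
    return correct / len(turns)

def l2_score(dist, ref_label, labels):
    """L2 norm between the tracker distribution and the 1-hot reference."""
    return math.sqrt(sum((dist.get(l, 0.0) - (1.0 if l == ref_label else 0.0)) ** 2
                         for l in labels))

# two toy turns: (predicted goal labels, reference goal labels)
turns = [({"food": "italian", "area": "east"}, {"food": "italian", "area": "east"}),
         ({"food": "indian", "area": "east"}, {"food": "italian", "area": "east"})]
print(joint_accuracy(turns))  # 0.5

dist = {"italian": 0.7, "indian": 0.2, "dontcare": 0.1}
print(round(l2_score(dist, "italian", dist), 3))  # 0.374
```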

Dialog Management
The belief state b covers slots such as Type (Restaurant, Hotel, Bar), Food (Italian, Indian, Dontcare), Area (North, East, South) and Price (Expensive, Cheap, Dontcare). A policy π maps beliefs to actions, e.g. confirm-request(food): a ~ π(a|b).
Partially Observable Markov Decision Process:
- the action at each turn is a function of the belief state b
- the policy is optimised by maximising the expected cumulative reward R = Σ_τ γ^(τ-1) r(a_τ, b_τ), defined by a reward function
- trained on corpora, a user simulator, or on-line
Exact solutions are intractable, but there is a wide range of approximations:
- gradient ascent directly on the policy π (NAC)
- maximise a GP approximation of the Q-function (GP-SARSA)
S. Young, M. Gasic, B. Thomson and J. Williams (2013). "POMDP-based Statistical Spoken Dialogue Systems: a Review." Proc IEEE, 101(5):1160-1179.
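The cumulative reward R = Σ_τ γ^(τ-1) r(a_τ, b_τ) that the policy maximises is just a discounted sum over per-turn rewards. A minimal sketch, using the reward scheme quoted later in the talk (+20 on success, -1 per turn) and an invented six-turn dialogue:

```python
def discounted_return(rewards, gamma=0.99):
    """R = sum over turns tau of gamma^(tau-1) * r_tau."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# six turns at -1 each; the final turn also earns +20 for success
rewards = [-1, -1, -1, -1, -1, 19]
print(round(discounted_return(rewards), 2))  # 13.17
```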

Natural Actor-Critic
The policy is defined directly via a softmax over action-specific features φ_a(b) defined on b:
  π(a|b,θ) = exp(θ·φ_a(b)) / Σ_a' exp(θ·φ_a'(b))
The cost function is the expected sum of observed per-turn rewards:
  J(θ) = E_π[ Σ_{t=1}^T r(b_t, a_t) ]
Optimise using natural gradient ascent, i.e. follow F_θ^{-1} ∇J(θ). The gradient is estimated by sampling dialogues, so the Fisher Information Matrix F_θ does not need to be explicitly computed.
F. Jurcicek, B. Thomson and S. Young (2011). "Natural Actor and Belief Critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs." ACM Transactions on Speech and Language Processing, 7(3).
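The softmax policy and the score-function term its sampled gradient estimate relies on (∇log π(a|b) = φ_a(b) - E_π[φ(b,·)]) can be sketched directly; the two-action toy feature map below is an invented assumption:

```python
import math

def softmax_policy(theta, features, b, actions):
    """pi(a|b,theta) = exp(theta . phi_a(b)) / sum_a' exp(theta . phi_a'(b))."""
    scores = {a: sum(t * f for t, f in zip(theta, features(b, a))) for a in actions}
    m = max(scores.values())                 # subtract max for numerical stability
    exps = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def log_policy_gradient(theta, features, b, a, actions):
    """grad log pi(a|b) = phi_a(b) - E_pi[phi(b, .)]  (score function)."""
    pi = softmax_policy(theta, features, b, actions)
    expected = [sum(pi[ap] * features(b, ap)[i] for ap in actions)
                for i in range(len(theta))]
    return [fa - e for fa, e in zip(features(b, a), expected)]

# toy example: two actions with 2-dim action-specific features phi_a(b)
features = lambda b, a: [b[0], 1.0] if a == "confirm" else [1.0 - b[0], 0.0]
pi = softmax_policy([0.5, -0.2], features, [0.8], ["confirm", "request"])
print(round(pi["confirm"], 3))  # 0.525
```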

GP-SARSA
Q^π(b,a) = E_π(R) is the expected total reward R following policy π from point (b,a).
Prior: Q_0^π(b,a) ~ GP(0, k((b,a),(b,a)))
Given a trajectory B_t = (b_1,a_1),...,(b_t,a_t) and rewards r_t = r_1,...,r_t, the posterior is
  Q_t^π(b,a) | r_t, B_t ~ N(Q̄(b,a), cov((b,a),(b,a)))
GP-SARSA reinforcement learning loop: choose a_{t+1} using Q_t^π(b_t, a_t); observe b_{t+1} and r_{t+1}; update Q_t^π -> Q_{t+1}^π.
M. Gasic and S. Young (2014). "Gaussian processes for POMDP-based dialogue manager optimization." IEEE Trans. Audio, Speech and Language Processing, 22(1):28-40.
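To make the GP posterior over Q concrete, here is plain GP regression on observed returns at visited (b,a) points. This is a simplification of GP-SARSA, which instead propagates temporal-difference structure through the covariance; the RBF kernel, noise level, and data points are illustrative assumptions:

```python
import math

def rbf(x, y, ell=1.0):
    """Kernel k((b,a),(b',a')) on joint belief-action feature vectors."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(x, y)) / (2 * ell ** 2))

def solve(K, b):
    """Solve K x = b by Gauss-Jordan elimination (fine for tiny systems)."""
    n = len(K)
    A = [row[:] + [bi] for row, bi in zip(K, b)]
    for i in range(n):
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        for j in range(n):
            if j != i:
                f = A[j][i]
                A[j] = [vj - f * vi for vj, vi in zip(A[j], A[i])]
    return [A[i][n] for i in range(n)]

def gp_posterior(X, y, x_star, noise=0.1):
    """Posterior mean and variance of Q at x_star given returns y at points X."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_star) for x in X]
    mean = sum(ks * a for ks, a in zip(k_star, solve(K, y)))
    var = rbf(x_star, x_star) - sum(ks * v
                                    for ks, v in zip(k_star, solve(K, k_star)))
    return mean, var

# visited (belief-feature, action-feature) points with observed returns
X = [(0.9, 1.0), (0.2, 0.0), (0.5, 1.0)]
y = [13.0, 4.0, 9.0]
mean, var = gp_posterior(X, y, (0.85, 1.0))
print(round(mean, 1), round(var, 3))  # near the high-return neighbour, low variance
```

The variance attached to each Q estimate is exactly what the Bayesian Committee Machine later in the talk exploits as a confidence.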

Dialog Manager Performance
Cambridge Restaurant System: reward = +20 for success, -1 per turn. User simulator-based training: 100k dialogs; telephone-based on-line training: 1200 dialogs; telephone-based real-user testing: 500 dialogs; telephone speech recognition, 20% average word error rate (WER).

Method    Training   Reward  Success Rate  #Turns
NAC       Simulator  11.9    91.8%         6.5
GP-Sarsa  Simulator  11.6    91.2%         6.6
GP-Sarsa  On-line    13.4    96.8%         6.0

Observations: NAC and GP-Sarsa reach similar performance, but GP is much faster; learning from real interactions makes a significant difference.
S. Young, et al (2014). "Evaluation of Statistical POMDP-based Dialogue Systems in Noisy Environments." International Workshop on Spoken Dialogue Systems (IWSDS 2014), Napa, CA.

Natural Language Generation
confirm-request(food) -> Response Planner -> confirm-request(price=cheap, area=east, food=?) -> Message Generator -> "You'd like a cheap restaurant on the east side of town? What kind of food would you like?"
3 principal approaches:
- hand-crafting with parameterised templates
- generative linguistic rules
- data driven, using an over-generate and filter approach

Constrained RNN Generation
Input dialog act: inform(name=Seven_Days, food=chinese), encoded as a 1-hot representation, e.g. 0,0,1,0,0,...,1,0,0,...,1,0,0,0,0,0
The RNN generates a delexicalised utterance: </s> SLOT_NAME serves SLOT_FOOD . </s>
which is relexicalised to: </s> Seven Days serves chinese . </s>
The RNN is trained on data pairs consisting of a) the 1-hot representation of the system dialog act and b) the corresponding delexicalised output utterance.
T-H. Wen et al (2015). "Stochastic Language Generation in Dialogue using Recurrent Neural Networks with Convolutional Sentence Reranking." SigDial 2015, Prague, CZ. [Best Paper Award]
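The delexicalisation step this generator relies on is simple string substitution: slot values in training utterances are replaced by placeholder tokens, and the RNN's delexicalised output is filled back in from the dialog act at run time. A minimal sketch; the placeholder naming mirrors the slide, but the helper functions themselves are illustrative assumptions:

```python
def delexicalise(utterance, act_slots):
    """Replace each slot value with its SLOT_<NAME> placeholder."""
    for slot, value in act_slots.items():
        utterance = utterance.replace(value, f"SLOT_{slot.upper()}")
    return utterance

def relexicalise(template, act_slots):
    """Fill placeholders back in from the dialog act."""
    for slot, value in act_slots.items():
        template = template.replace(f"SLOT_{slot.upper()}", value)
    return template

act = {"name": "Seven Days", "food": "chinese"}
template = delexicalise("Seven Days serves chinese .", act)
print(template)                      # SLOT_NAME serves SLOT_FOOD .
print(relexicalise(template, act))   # Seven Days serves chinese .
```

Training on delexicalised pairs is what lets one generator generalise across all values of a slot rather than memorising each restaurant name.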

Generation Performance
[Chart: BLEU score (0 to 1) and slot error rate (0% to 10%) compared across Rules, Class-LM, LSTM and SC-LSTM generators.]
T-H. Wen et al (2015). "Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems." EMNLP 2015, Lisbon, Portugal. [Best Paper Award]

Open-Domain Statistical Dialogue

Objectives
To develop spoken dialogue systems which:
1. allow users to reference multiple domains within a single conversation
2. support natural conversation even in rarely visited domains
3. can learn automatically on-line through interaction with the user (Deploy, Collect Data, Improve)
Note that the user in the loop enables on-line reinforcement learning.

Example Dialogue (active topics: General, Calendar, Train, Taxi)
S: Hello, how can I help you?
U: What appointments do I have tomorrow?
S: You have a meeting at 10am with John and a teleconf at noon with Bill.
U: I need to go to London first thing, can you reschedule the meeting with John?
S: John is free tomorrow at 3pm, is that ok?
U: Yes, that's fine. I also need a taxi to the station.
S: Meeting with John at 15.00 is confirmed. What time do you need the taxi?
U: When does the train depart to London?
S: The 9.15am gets in at 10.06.
U: When is the one before that?
S: The train before that leaves at 8.45am and arrives at 9.40.
U: Ok I will take that, book the taxi for 8.15am from my house.
S: Ok, I will book the taxi for 8.15am, is that correct?
U: Yes that's right.
S: Ok. Do you need anything else?
U: Not for now thanks.

Run-time Architecture
Speech input (e.g. "What appointments...", "...go to London...", "...need a taxi...") is passed to every domain dialog manager (Topic DM, Cal DM, Train DM, Taxi DM), each drawing on a shared Belief State Manager. Each DM proposes actions with Q-values Q_i(b,a); a Committee Manager selects the best action a*, which is rendered by the NLG as speech output. A Domain Factory, driven by the ontology, instantiates domain DMs.

Distributed Dialog Management
Each DM operates independently: it receives the speech, tracks its own beliefs, and proposes system actions. The DMs operate as a Bayesian Committee Machine, in which each machine's Q-value estimate Q̄_i(b,a) has a confidence (variance) Σ_Q^i(b,a) attached to it:
  Q̄(b,a) = Σ_Q(b,a) Σ_{i=1}^M Σ_Q^i(b,a)^{-1} Q̄_i(b,a)
  Σ_Q(b,a)^{-1} = -(M-1) k((b,a),(b,a))^{-1} + Σ_{i=1}^M Σ_Q^i(b,a)^{-1}
Reinforcement learning operates on the group, distributing rewards at each turn according to the previous action selection.
Modular, flexible, incremental, trainable on-line.
M. Gasic et al (2015). "Policy Committee for Adaptation in Multi-Domain Spoken Dialogue Systems." IEEE ASRU 2015, Scottsdale, AZ.
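The committee combination above is precision weighting against the GP prior variance k((b,a),(b,a)). A minimal sketch for scalar Q estimates; the prior variance and the two committee members' values are invented for illustration:

```python
def bcm_combine(means, variances, prior_var=1.0):
    """Bayesian Committee Machine: combine M estimates (mean_i, var_i) of Q(b,a).

    Combined precision = -(M-1)/prior_var + sum_i 1/var_i
    Combined mean      = combined_var * sum_i mean_i/var_i
    """
    M = len(means)
    inv_var = -(M - 1) / prior_var + sum(1.0 / s for s in variances)
    var = 1.0 / inv_var
    mean = var * sum(q / s for q, s in zip(means, variances))
    return mean, var

# two domain experts: a confident in-domain policy and an uncertain one
mean, var = bcm_combine([12.0, 6.0], [0.2, 2.0], prior_var=10.0)
print(round(mean, 2), round(var, 3))  # 11.67 0.185
```

Note how the combined estimate sits close to the confident member's value, and the combined variance is smaller than either member's: the committee trusts whichever DM is most certain at this (b,a).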

Incremental Domain Learning
Initially pool all available data (D_H + D_R) and learn generic models: a generic venue model M_V covering both the hotel (M_H) and restaurant (M_R) domains.

Incremental Domain Learning (continued)
Refine with more data, using the generic models as priors: M_V is now a prior for M_H and M_R, which are trained on the domain-specific data D_H and D_R.
M. Gasic et al (2015). "Distributed Dialogue Policies for Multi-Domain Statistical Dialogue Management." IEEE ICASSP 2015, Brisbane, Australia.

Performance of Generic Policies

Strategy   #Dialogs              Restaurant  Hotel
in-domain  250                   62.5%       64.3%
in-domain  500                   67.5%       70.1%
generic    500 (250 per domain)  73.0%       76.2%
in-domain  2500                  83.9%       85.9%
in-domain  5000                  86.4%       86.9%
generic    5000                  86.5%       87.1%

Success rates averaged over 10 policies and 1000 dialogues per condition.
M. Gasic, D. Kim, P. Tsiakoulis and S. Young (2015). "Distributed Dialogue Policies for Multi-Domain Statistical Dialogue Management." Proc ICASSP, Brisbane.

On-line Adaptation with Real Users
San Francisco Restaurant domain, compared a) with a generic prior and b) with no prior. Performance is acceptable after only 50 dialogues in the new domain.

Conclusions
- End-to-end statistical dialogue is feasible, and can match or exceed hand-crafted systems in limited domains.
- A user in the loop makes on-line learning feasible, even for previously unseen domains.
- Distributed hierarchical models, with generic parameters and committees of experts, enable systems to learn to expand coverage whilst avoiding an unacceptable user experience.
- The focus today has been on expanding dialogue management; current work suggests that similar ideas extend to SLU and NLG.

CUED Dialogue Systems Group
Current: Steve Young, Milica Gasic, David Vandyke, Lina Rojas-Barahona, Nikola Mrksic, Eddy Su, Shawn Wen, Stefan Ultes (starting Jan 2016)
Past: Blaise Thomson (Apple), Dongho Kim (Apple), Matt Henderson (Google), Prof Kai Yu (SJTU), Jason Williams (Microsoft), Pirros Tsiakoulis (Innoetics Ltd), Francois Mairesse (Amazon), Catherine Breslin (Amazon), Prof Filip Jurcicek (Charles U.)

Deep Learning - Seq2Seq Models
An encoder reads the input sequence A B C </s> into a thought vector; a decoder then emits W X Y Z </s>, feeding each output back in as the next input.
Key strengths:
- automatic feature extraction
- ability to compactly encode sequence information
But it is hard to build a practical system without pulling out an explicit action set and without individually trainable modules.