Yun-Nung (Vivian) Chen

1 Yun-Nung (Vivian) Chen (http://vivianchen.idv.tw); Hakkani-Tur, Tur, Gao, Deng

2 Outline
  Introduction
    Spoken Dialogue System
    Spoken/Natural Language Understanding (SLU/NLU)
  Contextual Spoken Language Understanding
    Model Architecture
    End-to-End Training
  Experiments
  Conclusion & Future Work

4 Spoken Dialogue System (SDS)
Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions.
Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
Examples: JARVIS, Iron Man's personal assistant; Baymax, a personal healthcare companion.
Good intelligent assistants help users organize and access information conveniently.

5 Dialogue System Pipeline
Speech signal -> ASR -> hypothesis: "are there any action movies to see this weekend"
  (or text input: "Are there any action movies to see this weekend?")
-> Language Understanding (LU): user intent detection, slot filling
-> semantic frame (intents, slots): request_movie, genre=action, date=this weekend
-> Dialogue Management (DM): dialogue state tracking, policy decision
-> system action: request_location
-> Output Generation -> text response: "Where are you located?" (screen display: "location?")
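The hand-off from LU to DM on this slide can be sketched in a few lines of Python. This is only an illustration: the `SemanticFrame` class, the `toy_policy` function, the list of required slots, and the `inform_result` action are all hypothetical names, and the rule "ask for the first missing slot" is an assumption about why the system answers with request_location, not something the slide states.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    """Output of language understanding: one intent plus slot-value pairs."""
    intent: str
    slots: dict = field(default_factory=dict)


def toy_policy(frame, required=("genre", "date", "location")):
    """Hypothetical rule-based policy: request the first required slot that is missing."""
    for name in required:
        if name not in frame.slots:
            return f"request_{name}"
    return "inform_result"  # hypothetical action once every slot is filled


# Semantic frame from the slide: request_movie, genre=action, date=this weekend
frame = SemanticFrame("request_movie", {"genre": "action", "date": "this weekend"})
action = toy_policy(frame)
print(action)
```

With the frame from the slide, the location slot is the only one missing, so the toy policy asks for it.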

6-7 LU Importance
(Slide footer: End-to-End Memory Networks for Multi-Turn Spoken Language Understanding, Yun-Nung (Vivian) Chen)
[Figure: learning curve of system performance. x-axis: simulation epoch; y-axis: success rate. Curves: upper bound; RL agent (labeled DQN) w/o LU errors; RL agent w/ 5% LU errors; rule agent (labeled Rule) w/o LU errors; rule agent w/ 5% LU errors. Annotation: >5% performance drop.]
System performance is sensitive to LU errors for both rule-based and reinforcement learning agents.

8 Dialogue System Pipeline
Same pipeline as slide 5 (speech signal or text input, LU, DM, output generation), with the example frame request_movie (genre=action, date=this weekend) and the system action request_location.
Annotations on the diagram: "current bottleneck", "error propagation".
SLU usually focuses on understanding single-turn utterances.
The understanding result is usually influenced by 1) local observations and 2) global knowledge.

9 Spoken Language Understanding
Domain Identification -> Intent Prediction -> Slot Filling

Single-turn example:
  D (domain): communication
  I (intent): send_email
  U (utterance): just sent email to bob about fishing this weekend
  S (slots): just/O sent/O email/O to/O bob/B-contact_name about/O fishing/B-subject this/I-subject weekend/I-subject
  Frame: send_email(contact_name="bob", subject="fishing this weekend")

Multi-turn example:
  U1: send email to bob
  S1: bob/B-contact_name
  Frame: send_email(contact_name="bob")
  U2: are we going to fish this weekend
  S2: are/B-message we/I-message going/I-message to/I-message fish/I-message this/I-message weekend/I-message
  Frame: send_email(message="are we going to fish this weekend")
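The frame on this slide (contact_name = bob, subject = fishing this weekend) can be read off an IOB tag sequence mechanically. The sketch below shows one way to do that in Python; the function name `iob_to_slots` is hypothetical, and the handling of a stray I- tag (it is dropped) is an assumption rather than anything the slide specifies.

```python
def iob_to_slots(tokens, tags):
    """Collect slot values from parallel token and IOB tag sequences."""
    slots = {}
    name, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; store the one being built, if any.
            if name is not None:
                slots[name] = " ".join(words)
            name, words = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            # Continuation of the slot that is currently open.
            words.append(token)
        else:
            # "O" or an I- tag that does not match: close the open slot.
            if name is not None:
                slots[name] = " ".join(words)
            name, words = None, []
    if name is not None:
        slots[name] = " ".join(words)
    return slots


tokens = "just sent email to bob about fishing this weekend".split()
tags = ["O", "O", "O", "O", "B-contact_name", "O",
        "B-subject", "I-subject", "I-subject"]
slots = iob_to_slots(tokens, tags)
print(slots)
```

Applied to the second turn of the multi-turn example, the same function returns a single message slot that spans the whole utterance, which is why the tagger needs context to know that the utterance is an email body at all.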

11-12 Model Architecture
Idea: additionally incorporate contextual knowledge during slot tagging.
1. Sentence Encoding: the contextual sentence encoder RNN_mem encodes each history utterance x_i into a memory representation m_i; the sentence encoder RNN_in encodes the current utterance c into u. (Slide 12 shows a CNN as an alternative to each RNN encoder.)
2. Knowledge Attention: the inner product of u with each m_i gives the knowledge attention distribution p_i.
3. Knowledge Encoding: the weighted sum of the m_i under p_i gives h; W_kg produces the knowledge encoding representation o, which is fed to the RNN tagger.
RNN Tagger: takes o and the words w_t of the current utterance and outputs the slot tagging sequence y (hidden states h_t, outputs y_t; weights U, W, M, V).
Chen, et al., End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in Interspeech,
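The attention and encoding steps named on the slide (inner product, weighted sum, W_kg) can be illustrated with a small pure-Python sketch. Several details are assumptions, not taken from the slide: the scores are normalized with a softmax, the knowledge encoding is computed as o = W_kg (h + u), and the sentence encoders are replaced by fixed toy vectors. The function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(memory, u):
    """memory: list of vectors m_i; u: vector for the current utterance."""
    # Attention distribution p_i from the inner products u . m_i (softmax assumed).
    p = softmax([dot(m_i, u) for m_i in memory])
    # Weighted sum h = sum_i p_i * m_i.
    h = [sum(p_i * m_i[k] for p_i, m_i in zip(p, memory)) for k in range(len(u))]
    return p, h


def knowledge_encoding(W_kg, h, u):
    """Assumed form o = W_kg (h + u); W_kg is a list of rows."""
    combined = [a + b for a, b in zip(h, u)]
    return [dot(row, combined) for row in W_kg]


# Toy stand-ins for encoder outputs: two history utterances, one current utterance.
memory = [[1.0, 0.0], [0.0, 1.0]]
u = [2.0, 0.0]
W_kg = [[1.0, 0.0], [0.0, 1.0]]  # identity, purely for illustration

p, h = knowledge_attention(memory, u)
o = knowledge_encoding(W_kg, h, u)
print(p, h, o)
```

In this toy case the first history vector is more similar to u, so it receives most of the attention mass and dominates h.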

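13 End-to-End Training
Tagging objective: the RNN tagger takes the contextual utterances and the current utterance as input and is trained on the slot tag sequence.
  Diagram: inputs w_{t-1}, w_t, w_{t+1} and the knowledge encoding o; hidden states h_{t-1}, h_t, h_{t+1}; outputs y_{t-1}, y_t, y_{t+1}; weights U, W, M, V.
The model automatically figures out the attention distribution without explicit supervision.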
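A tagging objective of this kind can be sketched as a sum of per-token negative log-likelihoods. The recurrence below, h_t = tanh(M o + W h_{t-1} + U w_t) and y_t = softmax(V h_t), is an assumed reading of the diagram (which only shows that U, W, M and V connect w_t, h_{t-1}, o and y_t); the tanh activation, the zero initial state and the function names are illustrative choices. Because the loss is defined only on the tags, any gradient reaching the attention weights comes through o, which is how the attention can be learned without its own labels.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def softmax(z):
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def tagger_nll(words, gold, o, U, W, M, V):
    """Negative log-likelihood of the gold tag sequence for one utterance.

    words: list of input vectors w_t; gold: list of gold tag indices;
    o: knowledge encoding vector, shared by every time step.
    """
    h = [0.0] * len(W)   # initial hidden state h_0 (assumed zero)
    Mo = matvec(M, o)    # knowledge term, identical at each step
    loss = 0.0
    for w_t, g in zip(words, gold):
        pre = [a + b + c for a, b, c in zip(Mo, matvec(W, h), matvec(U, w_t))]
        h = [math.tanh(v) for v in pre]
        y_t = softmax(matvec(V, h))  # distribution over slot tags
        loss -= math.log(y_t[g])
    return loss


# Toy dimensions: 2-d inputs, 2-d hidden state, 3 slot tags.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
words = [[1.0, 0.0], [0.0, 1.0]]
gold = [0, 2]
o = [0.5, -0.5]

# With an all-zero output matrix every tag is equally likely.
uniform_loss = tagger_nll(words, gold, o, identity, zeros, identity,
                          [[0.0, 0.0]] * 3)
print(uniform_loss)
```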

15-19 Experiments
Dataset: Cortana communication session data
Setup: GRU for all RNNs; Adam optimizer; embedding dim = 150; hidden units = 100; dropout = 0.5

Model           Training Set  Knowledge Encoding        Sentence Encoder  First Turn  Other  Overall
RNN Tagger      single-turn   x                         x
RNN Tagger      multi-turn    x                         x
Encoder-Tagger  multi-turn    current utt (c)           RNN
Encoder-Tagger  multi-turn    history + current (x, c)  RNN
Proposed        multi-turn    history + current (x, c)  RNN
Proposed        multi-turn    history + current (x, c)  CNN (new, not in the paper)

Findings, in the order the rows are introduced:
  The model trained on single-turn data performs worse for non-first turns due to mismatched training data.
  Treating multi-turn data as single-turn data for training performs reasonably well.
  Encoding current and history utterances improves performance but increases training time.
  Applying memory networks significantly outperforms all other approaches with much less training time.
  CNN produces comparable results for sentence encoding with shorter training time.
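The difference between the training conditions compared here (multi-turn data treated as single-turn, versus history + current) comes down to how examples are built from a session. The sketch below is a hypothetical illustration: the session format (a list of utterance strings) and both function names are assumptions, and the example session reuses the utterances from slide 9 rather than any real Cortana data.

```python
def single_turn_examples(sessions):
    """Treat every utterance as its own example with no history."""
    return [([], utt) for session in sessions for utt in session]


def contextual_examples(sessions):
    """Pair each current utterance c with the preceding utterances x of its session."""
    return [(session[:i], utt)
            for session in sessions
            for i, utt in enumerate(session)]


sessions = [["send email to bob", "are we going to fish this weekend"]]
flat = single_turn_examples(sessions)
ctx = contextual_examples(sessions)
print(ctx)
```

The first turn looks the same under both schemes (empty history); only later turns gain context, which matches the table's split into First Turn and Other.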

21 Conclusion
The proposed end-to-end memory networks store contextual knowledge, which an attention model exploits dynamically to carry knowledge over for multi-turn understanding.
The end-to-end model performs the tagging task instead of classification.
The experiments show the feasibility and robustness of modeling knowledge carryover through memory networks.

22 Future Work
Leverage not only local observations but also global knowledge for better language understanding.
Syntax or semantics can serve as global knowledge to guide the understanding model.
Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks, arXiv preprint arXiv:

23 Q & A
Thanks for your attention!
The code will be available at
