Deep Learning
Lecture 17: Neural Text Generation
Mohammad Ali Keyvanrad


OUTLINE
- Introduction
- Machine Translation
- Bidirectional LSTM
- Attention Mechanism
- Google's Multilingual NMT


Introduction
- The predominant techniques for text generation have been template- or rule-based systems, which require infeasible amounts of hand-engineering.
- Deep learning has recently achieved great empirical success on some text generation tasks, using end-to-end neural network models: an encoder model produces a hidden representation of the source text, followed by a decoder model that generates the target.

Introduction
- Modeling discrete sequences of text tokens: given a sequence U = (u_1, u_2, ..., u_S).
- General form of the model: an input sequence X and an output sequence Y.
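The transcript drops the slide's equations; the standard autoregressive factorization this setup refers to is presumably the following (a reconstruction, not copied from the slides):

```latex
P(U) = \prod_{s=1}^{S} P(u_s \mid u_1, \dots, u_{s-1}),
\qquad
P(Y \mid X) = \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, X)
```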

Introduction
- For example, in machine translation, X might be a sentence in English and Y the translated sentence in Chinese.

Introduction: other examples

Task                  | X (example)                                  | Y (example)
----------------------|----------------------------------------------|--------------------------
language modeling     | none (empty sequence)                        | tokens from news corpus
machine translation   | source sequence in English                   | target sequence in French
grammar correction    | noisy, ungrammatical sentence                | corrected sentence
summarization         | body of news article                         | headline of article
dialogue              | conversation history                         | next response in turn
speech transcription  | audio / speech features                      | text transcript
image captioning      | image                                        | caption describing image
question answering    | supporting text + knowledge base + question  | answer


Machine Translation
- The classic test of language understanding: both language analysis and generation.
- Translation is a US$40 billion a year industry with huge commercial use: Google translates over 100 billion words a day; Facebook and eBay also rely on it.

Machine Translation
- A naive word-based system would completely fail (word order: the location of the subject, verb, ...).
- Historical approaches were based on probabilistic models:
  - Translation model: tells us what a sentence/phrase in a source language most likely translates into.
  - Language model: tells us how likely a given sentence/phrase is overall.
- LSTMs can generate arbitrary output sequences after seeing the entire input; they can even focus in on specific parts of the input automatically.
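The slide does not reproduce a formula, but the standard noisy-channel decomposition behind these two components (a textbook fact, not taken from the slides) is:

```latex
\hat{y} = \arg\max_{y} \; P(x \mid y)\, P(y)
```

where P(x | y) is the translation model and P(y) is the language model.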

Progress in Machine Translation

Neural Machine Translation
- The approach of modeling the entire MT process via one big artificial neural network.
- Sometimes we compromise this goal a little.

Neural MT: The Bronze Age
- En-Es translator constructed on 31 English and 40 Spanish words, with a maximum 10-word sentence.
- Binary encoding of words: 50 inputs, 66 outputs; 1 or 3 hidden 150-unit layers.
- Average WER: 1.3 words. [Allen 1987, IEEE 1st ICNN]

Neural Machine Translation
- Sequence-to-sequence (Seq2Seq) model: an end-to-end model made up of two recurrent neural networks (or LSTMs).
  - Encoder: takes the model's input sequence as input and encodes it into a fixed-size "context vector".
  - Decoder: uses the context vector from above as a "seed" from which to generate an output sequence.
- Seq2Seq models are often referred to as "encoder-decoder models".

Neural Machine Translation: Seq2Seq architecture (encoder)
- Reads the input sequence to the Seq2Seq model and generates a fixed-dimensional context vector C.
- The encoder uses a recurrent neural network cell, usually an LSTM, to read the input tokens.

Neural Machine Translation
- It is so difficult to compress an arbitrary-length sequence into a single fixed-size vector that the encoder will usually consist of stacked LSTMs.
- The final layer's LSTM hidden state is used as C. [Sutskever et al. 2014]
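A minimal sketch of such a stacked-LSTM encoder, assuming PyTorch; the class name, vocabulary size, and layer dimensions are illustrative choices, not values from the lecture:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # num_layers > 1 gives the "stacked LSTM" encoder described on the slide.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)

    def forward(self, src_tokens):
        # src_tokens: (batch, src_len) integer token ids
        embedded = self.embedding(src_tokens)
        outputs, (h_n, c_n) = self.lstm(embedded)
        # h_n[-1] is the final layer's last hidden state: the context vector C.
        context = h_n[-1]                      # (batch, hidden_dim)
        return outputs, (h_n, c_n), context
```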

Neural Machine Translation
- A deep recurrent neural network. [Sutskever et al. 2014]

Neural Machine Translation
- Process the input sequence in reverse: the last thing the encoder sees then (roughly) corresponds to the first thing the model outputs.
- This makes it easier for the decoder to "get started" on the output; once it has the first few words translated correctly, it is much easier to go on to construct a correct sentence.

Neural Machine Translation: Seq2Seq architecture (decoder)
- The decoder is also an LSTM network.
- We run all LSTM layers, one after the other, following up with a softmax on the final layer to produce a distribution over output words.
- The previously generated output word is passed back into the first layer at each step.
- Both the encoder and decoder are trained at the same time.
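A matching decoder sketch under the same assumptions (PyTorch, illustrative names and sizes); it reuses the imports from the encoder sketch above and is seeded with the encoder's final states:

```python
class Decoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_token, state):
        # prev_token: (batch, 1) id of the previously generated word
        # state: (h, c) taken from the encoder on the first step
        embedded = self.embedding(prev_token)
        output, state = self.lstm(embedded, state)
        logits = self.out(output.squeeze(1))            # (batch, vocab_size)
        probs = torch.softmax(logits, dim=-1)           # softmax on the final layer
        return probs, state
```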

Four big wins of Neural MT
- End-to-end training: all parameters are simultaneously optimized to minimize a loss function on the network's output.
- Distributed representations share strength: better exploitation of word and phrase similarities.
- Better exploitation of context: NMT can use a much bigger context (both source and partial target text) to translate more accurately.
- More fluent text generation: deep learning text generation is much higher quality.

Neural Machine Translation
- NMT aggressively rolled out by industry!
  - 2016/02: Microsoft launches deep neural network MT running offline on Android/iOS.
  - 2016/08: Systran launches a purely NMT model; Systran is one of the oldest machine translation companies and has done extensive work for the United States Department of Defense.
  - 2016/09: Google launches NMT.


Bidirectional LSTM
- A word can have a dependency on another word before or after it.
- A bidirectional LSTM fixes this problem by traversing the sequence in both directions.
- The forward and backward hidden states are concatenated to get the final context vector.
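A small sketch of the idea, again assuming PyTorch; the batch size and dimensions are illustrative:

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=256, hidden_size=512, num_layers=1,
                 batch_first=True, bidirectional=True)

x = torch.randn(8, 20, 256)            # (batch, seq_len, emb_dim)
outputs, (h_n, c_n) = bilstm(x)
print(outputs.shape)                   # torch.Size([8, 20, 1024]): forward + backward concatenated
# h_n has shape (num_layers * 2, batch, hidden_size); concatenating the last
# forward and backward states gives the final context vector.
context = torch.cat([h_n[-2], h_n[-1]], dim=-1)
print(context.shape)                   # torch.Size([8, 1024])
```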


Attention Mechanism
- Vanilla seq2seq and long sentences.
- Problem: the whole source must pass through a single fixed-dimensional representation.

Attention Mechanism
- Solution: a pool of source states.

Attention Mechanism
- Word alignments: phrase-based SMT aligned words in a preprocessing step, usually using EM.

Attention Mechanism
- Learning both translation and alignment.

Attention Mechanism
- Different parts of an input have different levels of significance.
  - Example: in "the ball is on the field", the words "ball", "on", and "field" are the most important.
- Different parts of the output may even consider different parts of the input "important".
  - The first word of output is usually based on the first few words of the input; the last word is likely based on the last few words of the input.
- Attention mechanisms make use of this observation.

Attention Mechanism
- The decoder network looks at the entire input sequence at every decoding step.
- The decoder can then decide what input words are important at any point in time.

Attention Mechanism
- Our input is a sequence of words x_1, ..., x_n that we want to translate; our target sentence is a sequence of words y_1, ..., y_m.
- Encoder: captures a contextual representation of each word in the sentence.
  - h_1, ..., h_n are the hidden vectors representing the input sentence; these vectors are the output of a bi-LSTM, for instance.

Attention Mechanism
- Decoder: we want to compute the hidden states s_i of the decoder, where
  - s_{i-1} is the previous hidden vector,
  - y_{i-1} is the word generated at the previous step,
  - c_i is a context vector that captures the context from the original sentence.
- The context vector captures relevant information for the i-th decoding time step, unlike the standard Seq2Seq in which there is only one context vector.

Attention Mechanism
- For each hidden vector h_j from the original sentence, compute a score e_{i,j}.
  - Alignment model: a is any function with values in R, for instance a single-layer fully-connected neural network.
- Computing the context vector c_i: a weighted average of the hidden vectors from the original sentence.
- The vector α_i of weights is called the attention vector.
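The transcript omits the slide's equations; following Bahdanau et al. (2014), cited in the references, the quantities above are presumably:

```latex
e_{i,j} = a(s_{i-1}, h_j), \qquad
\alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_{k=1}^{n} \exp(e_{i,k})}, \qquad
c_i = \sum_{j=1}^{n} \alpha_{i,j}\, h_j, \qquad
s_i = f(s_{i-1}, y_{i-1}, c_i)
```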

Attention Mechanism
- Graphical illustration of the proposed model: generating the t-th target word y_t given a source sentence (x_1, x_2, ..., x_T).

Attention Mechanism
- Attention vectors for English-to-French machine translation: each pixel shows the weight α_{ij} of the annotation of the j-th source word for the i-th target word.

Attention Mechanism
- The alignment model needs to be evaluated T_x × T_y times for each sentence.
- In order to reduce computation, a single-layer multilayer perceptron is used.
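A sketch of that single-layer MLP alignment model (additive, Bahdanau-style attention), assuming PyTorch; the class name and dimensions are illustrative, not taken from the lecture:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, dec_dim=512, enc_dim=1024, attn_dim=256):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)   # projects s_{i-1}
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)   # projects each h_j
        self.v = nn.Linear(attn_dim, 1, bias=False)           # scalar score e_{i,j}

    def forward(self, s_prev, enc_outputs):
        # s_prev: (batch, dec_dim); enc_outputs: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W_s(s_prev).unsqueeze(1) + self.W_h(enc_outputs)))
        alpha = torch.softmax(scores, dim=1)                   # attention vector over source positions
        context = (alpha * enc_outputs).sum(dim=1)             # weighted average: c_i
        return context, alpha.squeeze(-1)
```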

Attention Mechanism
- Global vs. local attention: avoid focusing on everything at each time step.

Attention Mechanism
- The major advantage of attention-based models is their ability to efficiently translate long sentences. [Minh-Thang Luong, 2015]


Google's Multilingual NMT
- State of the art in Neural Machine Translation (NMT): bilingual systems.

Google's Multilingual NMT
- State of the art in Neural Machine Translation (NMT): multilingual systems.

Google's Multilingual NMT
- Google's Multilingual NMT System:
  - Simplicity: a single model.
  - Low-resource language improvements.
  - Zero-shot translation: translating between language pairs it has never seen in this combination.
    - Train: Portuguese→English + English→Spanish; Test: Portuguese→Spanish.

Google's Multilingual NMT
- Architecture

Google's Multilingual NMT
- A token at the beginning of the input sentence indicates the target language.
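A tiny sketch of this target-language tagging in the spirit of Johnson et al. (2016); the exact token string and helper name are assumptions for illustration:

```python
def add_target_token(source_sentence, target_lang):
    # Prepend an artificial token telling the model which language to produce.
    return f"<2{target_lang}> {source_sentence}"

print(add_target_token("How are you?", "es"))
# "<2es> How are you?"  -> the model is asked to translate into Spanish
```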

Dealing with the large output vocabulary
- NMT systems have a hard time dealing with a large vocabulary size: the softmax can be quite expensive to compute.
- Scaling the softmax: hierarchical softmax.
- Reducing the vocabulary: simply limit the vocabulary size to a small number and replace words outside the vocabulary with a tag <UNK>.
- Handling unknown words.
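A minimal sketch of the vocabulary-reduction idea (illustrative helper names, not from the lecture):

```python
from collections import Counter

def build_vocab(corpus_tokens, max_size=50000):
    # Keep only the max_size most frequent words.
    counts = Counter(corpus_tokens)
    return {w for w, _ in counts.most_common(max_size)}

def replace_unknown(tokens, vocab):
    # Replace out-of-vocabulary words with the <UNK> tag.
    return [t if t in vocab else "<UNK>" for t in tokens]

vocab = build_vocab(["the", "cat", "the", "cat", "sat"], max_size=2)
print(replace_unknown(["the", "dog", "sat"], vocab))  # ['the', '<UNK>', '<UNK>']
```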

References
- Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
- Luong, Thang, Hieu Pham, and Christopher Manning. "Effective approaches to attention-based neural machine translation." EMNLP 2015.
- Johnson, Melvin, et al. "Google's multilingual neural machine translation system: Enabling zero-shot translation." arXiv preprint arXiv:1611.04558 (2016).
- Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).
