Deep Learning for Natural Language Processing

Dorthy Dawson
5 years ago

2 Topics
- Word embeddings
- Recurrent neural networks
- Long short-term memory networks
- Neural machine translation
- Automatically generating image captions

3 Word meaning in NLP
How do we capture the meaning and context of words?
- Synonyms: "I loved the movie." / "I adored the movie."
- Synecdoche: "Today, Washington affirmed its opposition to the trade pact." ("Washington" stands for the U.S. government.)
- Homonyms: "I deposited the money in the bank." / "I buried the money in the bank." (a financial institution vs. the side of a river)
- Polysemy: "I read a book today." / "I wasn't able to book the hotel room."

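4 Word Embeddings
One of the most successful ideas of modern NLP: represent each word as a dense vector, so that words used in similar contexts get similar vectors. One example: Google's Word2Vec algorithm.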

5 Word2Vec algorithm

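9 Word2Vec algorithm (network architecture)
- Input: one-hot representation of the input word over a vocabulary of 10,000 units.
- Hidden layer: 300 units with a linear activation function (10,000 × 300 = 3,000,000 input-to-hidden weights).
- Output: 10,000 units giving, for each word w_i in the vocabulary, the probability that w_i is near the input word in a sentence (300 × 10,000 = 3,000,000 hidden-to-output weights).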
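The forward pass of this network can be sketched in a few lines of NumPy. This is a minimal sketch, not the original Word2Vec code: it assumes a plain softmax output layer and random (untrained) weights, and the function name `skipgram_forward` is hypothetical. It also shows why the hidden layer is a lookup: multiplying a one-hot vector by the input-to-hidden matrix just selects one row.

```python
import numpy as np

def skipgram_forward(word_index, W_in, W_out):
    """Forward pass of a Word2Vec-style network for one input word.

    W_in:  (V, D) input-to-hidden weights (one row per vocabulary word)
    W_out: (D, V) hidden-to-output weights
    Returns a length-V softmax distribution over "nearby" words.
    """
    V = W_in.shape[0]
    one_hot = np.zeros(V)
    one_hot[word_index] = 1.0
    hidden = one_hot @ W_in        # linear activation: same as W_in[word_index]
    scores = hidden @ W_out        # one score per vocabulary word
    scores -= scores.max()         # subtract the max for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

V, D = 10_000, 300                 # vocabulary and hidden-layer sizes from the slide
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(V, D))   # untrained, illustrative weights
W_out = rng.normal(scale=0.01, size=(D, V))
p = skipgram_forward(42, W_in, W_out)
print(p.shape, round(float(p.sum()), 6))  # (10000,) 1.0
```

With 10,000 words the full softmax is still cheap for a single example, but it is the main cost during training over a large corpus.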

10 Word2Vec training
Training corpus of documents: collect pairs of nearby words.
Example document: "Every morning she drinks Starbucks coffee."
Training pairs (window size = 3, i.e., both words fit in a span of three consecutive words, so they are at most two positions apart):
(every, morning) (every, she)
(morning, she) (morning, drinks)
(she, drinks) (she, Starbucks)
(drinks, Starbucks) (drinks, coffee)
(Starbucks, coffee)
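The pair-collection step can be sketched as a short Python function. It assumes the slide's reading of "window size = 3" (every pair of words inside a span of three consecutive words) and an already-tokenized sentence; `nearby_pairs` is a hypothetical helper name. Run on the example document, it reproduces the nine pairs listed above.

```python
def nearby_pairs(tokens, window=3):
    """Return every (earlier, later) word pair that fits in a span of `window` tokens."""
    pairs = []
    for i in range(len(tokens)):
        # partners are the next window - 1 tokens (fewer near the end of the sentence)
        for j in range(i + 1, min(i + window, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs

tokens = ["every", "morning", "she", "drinks", "Starbucks", "coffee"]
pairs = nearby_pairs(tokens)
print(len(pairs))  # 9
```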

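11 Word2Vec training via backpropagation
Training pair (drinks, Starbucks): the input is the one-hot vector for "drinks", and the target is the probability that "Starbucks" is near "drinks". The network (input-to-hidden weights, a hidden layer with a linear activation function, hidden-to-output weights) is trained by backpropagating the error of this prediction.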

12 Word2Vec training via backpropagation
Training pair (drinks, coffee): the same input, "drinks", now with the target being the probability that "coffee" is near "drinks". Again, the error is backpropagated through both layers of weights.
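One backpropagation step for a single training pair can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not give the loss or optimizer, so it assumes cross-entropy under a full softmax and plain stochastic gradient descent (the published Word2Vec work instead uses approximations such as negative sampling or hierarchical softmax). The function name `sgd_step` and the 8-dimensional toy hidden layer are hypothetical.

```python
import numpy as np

def sgd_step(W_in, W_out, center, context, lr=0.5):
    """One SGD step on the pair (center, context), updating W_in and W_out in place.

    Loss: -log p(context | center) under a full softmax.
    Returns the loss measured before the update.
    """
    h = W_in[center].copy()               # hidden layer = row of W_in (linear activation)
    scores = h @ W_out
    scores -= scores.max()
    p = np.exp(scores) / np.exp(scores).sum()
    loss = -np.log(p[context])

    d_scores = p.copy()
    d_scores[context] -= 1.0              # gradient of the loss w.r.t. the output scores
    d_h = W_out @ d_scores                # backpropagate into the hidden layer (old W_out)
    W_out -= lr * np.outer(h, d_scores)   # hidden-to-output gradient
    W_in[center] -= lr * d_h              # only the input word's row receives a gradient
    return loss

vocab = ["every", "morning", "she", "drinks", "Starbucks", "coffee"]
idx = {w: i for i, w in enumerate(vocab)}
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(len(vocab), 8))
W_out = rng.normal(scale=0.1, size=(8, len(vocab)))

first = sgd_step(W_in, W_out, idx["drinks"], idx["Starbucks"])
for _ in range(50):
    last = sgd_step(W_in, W_out, idx["drinks"], idx["Starbucks"])
print(first, last)  # the loss for this pair shrinks as training proceeds
```

Note that only the row of `W_in` belonging to the input word changes, which is why a one-hot input makes each update cheap on the input side.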

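13 Learned word vectors
After training, the input-to-hidden weights are the word vectors: the 300 weights from a word's one-hot input unit (e.g., the unit for "drinks") to the hidden layer form that word's learned vector.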

14 Some surprising results of word2vec

Source: http://papers.nips.cc/paper/5021-distributed-representa...
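One well-known kind of result is vector arithmetic on analogies ("Spain is to Madrid as France is to ?"): compute vec(Madrid) - vec(Spain) + vec(France) and return the nearest word by cosine similarity. The sketch below shows only the arithmetic; the 3-dimensional vectors are hand-made toy values chosen so the analogy works, not learned embeddings, and `analogy` is a hypothetical helper name.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """'a is to b as c is to ?': nearest word to vec(b) - vec(a) + vec(c),
    excluding the three query words."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hand-made toy vectors (NOT learned); axes loosely mean [place, capital, Spain-vs-France]
toy = {
    "Spain":  np.array([1.0, 0.0, 1.0]),
    "Madrid": np.array([1.0, 1.0, 1.0]),
    "France": np.array([1.0, 0.0, -1.0]),
    "Paris":  np.array([1.0, 1.0, -1.0]),
    "coffee": np.array([0.0, 0.2, 0.1]),
}
print(analogy("Spain", "Madrid", "France", toy))  # Paris
```

With real embeddings the result vector rarely matches any word exactly, which is why the nearest neighbor (excluding the query words) is used.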

18 Word embeddings demo

19 Recurrent Neural Network (RNN)

20 Recurrent Neural Network unfolded in time
Training algorithm: backpropagation through time.
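Unfolding in time means applying the same weights at every step of the sequence. A minimal sketch, assuming a simple RNN with a tanh activation, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the weight names and sizes are illustrative, not taken from the slide's figure.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Run a simple RNN over a sequence and return every hidden state.

    The same W_xh, W_hh and b_h are reused at each time step; backpropagation
    through time differentiates through this whole loop.
    """
    h = h0
    hs = []
    for x in xs:                                  # one iteration per time step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return hs

rng = np.random.default_rng(0)
X, H, T = 4, 3, 5                                 # input size, hidden size, sequence length
W_xh = rng.normal(scale=0.5, size=(H, X))
W_hh = rng.normal(scale=0.5, size=(H, H))
b_h = np.zeros(H)
xs = rng.normal(size=(T, X))
hs = rnn_forward(xs, W_xh, W_hh, b_h, np.zeros(H))
print(len(hs), hs[-1].shape)  # 5 (3,)
```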

21 Encoder-decoder (or "sequence-to-sequence") networks for translation
http://book.paddlepaddle.org/08.machine_translation/image/encoder_decoder_en.png
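The encoder-decoder data flow can be sketched with two tiny RNNs: the encoder reads the source tokens into a final hidden state (the context), and the decoder starts from that state and emits one target token at a time, feeding each choice back in. This is a sketch under several assumptions: weights are random and untrained (so the output tokens are meaningless), plain tanh RNN cells stand in for LSTMs, decoding is greedy, and token ids 0 and 1 are hypothetical start/end-of-sentence markers.

```python
import numpy as np

def rnn_step(x, h, W_x, W_h):
    """One tanh RNN step (no bias, for brevity)."""
    return np.tanh(W_x @ x + W_h @ h)

def encode(src_ids, E_src, W_x, W_h):
    """Encoder: read the source tokens and return the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for i in src_ids:
        h = rnn_step(E_src[i], h, W_x, W_h)
    return h

def greedy_decode(context, E_tgt, W_x, W_h, W_out, bos=0, eos=1, max_len=10):
    """Decoder: start from the context, pick the most probable token at each step,
    and feed it back in until <eos> or max_len tokens."""
    h, token, out = context, bos, []
    for _ in range(max_len):
        h = rnn_step(E_tgt[token], h, W_x, W_h)
        token = int(np.argmax(W_out @ h))   # argmax of the softmax equals argmax of the logits
        if token == eos:
            break
        out.append(token)
    return out

rng = np.random.default_rng(0)
V_src, V_tgt, D, H = 6, 7, 4, 5             # toy vocabulary, embedding and hidden sizes
E_src, E_tgt = rng.normal(size=(V_src, D)), rng.normal(size=(V_tgt, D))
enc_Wx, enc_Wh = rng.normal(size=(H, D)), rng.normal(size=(H, H))
dec_Wx, dec_Wh = rng.normal(size=(H, D)), rng.normal(size=(H, H))
W_out = rng.normal(size=(V_tgt, H))

context = encode([2, 3, 4], E_src, enc_Wx, enc_Wh)
translation = greedy_decode(context, E_tgt, dec_Wx, dec_Wh, W_out)
print(translation)  # a list of target token ids (untrained weights)
```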

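22 Problem for RNNs: learning long-term dependencies
"The cat that my mother's sister took to Hawaii the year before last when you were in high school is now living with my cousin." The network must connect the subject "cat" to "is now living" across many intervening words.
Backpropagation through time: problem of vanishing gradients. The gradient that reaches early time steps is a product of one factor per step; when these factors are smaller than 1, it shrinks exponentially with the number of steps, so distant dependencies are hard to learn.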
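The shrinking product can be seen numerically in a toy case. This sketch assumes a scalar RNN with no inputs, h_t = tanh(w · h_{t-1}), and a hypothetical weight w = 0.9; by the chain rule, dh_T/dh_0 is the product of the per-step factors w · (1 - h_t²), each of which is at most 0.9 here.

```python
import math

def gradient_through_time(w, T, h0=0.5):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(T):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)   # derivative of one step, chained onto the product
    return grad

for T in (5, 20, 50):
    print(T, gradient_through_time(0.9, T))
```

Since every factor is at most 0.9, the gradient after 50 steps is below 0.9**50 (about 0.005); a dependency 50 words back therefore contributes almost nothing to the weight updates.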

23 Long Short-Term Memory (LSTM)
- A neuron with a complex memory and gating structure.
- Replaces ordinary hidden neurons in RNNs.
- Designed to avoid the long-term dependency problem.

24 Long Short-Term Memory (LSTM) unit
[Figure: a simple RNN (hidden) unit compared with an LSTM (hidden) unit.]

25 Comments on LSTMs
- The LSTM unit replaces the simple RNN unit.
- The LSTM's internal weights are still trained with backpropagation.
- The cell value has a feedback loop, so it can remember a value indefinitely.
- The function of the gates (input, forget, output) is learned by minimizing the loss.
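A single LSTM step can be sketched in NumPy to make the gates and the cell's feedback loop concrete. This follows the common formulation without peephole connections; the packing of all four gate blocks into one matrix `W` (rows in the order input, forget, output, candidate) is an assumption of this sketch, not something the slides specify.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4H, X + H), b: (4H,); rows are [input, forget, output, candidate]."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])           # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])       # forget gate: how much of the old cell value to keep
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])   # candidate value
    c = f * c_prev + i * g        # cell update: the feedback loop on c
    h = o * np.tanh(c)
    return h, c

# Forget gate held open (bias +20) and input gate held shut (bias -20):
# the cell value survives 100 steps almost unchanged.
X, H = 3, 2
W = np.zeros((4 * H, X + H))
b = np.concatenate([np.full(H, -20.0), np.full(H, 20.0), np.zeros(H), np.zeros(H)])
h, c = np.zeros(H), np.array([0.7, -1.3])
for _ in range(100):
    h, c = lstm_step(np.ones(X), h, c, W, b)
print(c)  # still approximately [0.7, -1.3]
```

In a trained network these biases and weights are learned, so the gates decide from the data when to keep, overwrite, or expose the cell value.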

26 Google Neural Machine Translation (unfolded in time)

27 Neural Machine Translation
Training: maximum likelihood, using gradient descent on the weights θ:
$$\theta^* = \arg\max_{\theta} \sum_{(X,Y)} \log P(Y \mid X, \theta)$$
Trained on a very large corpus of parallel texts in the source (X) and target (Y) languages.
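The objective can be computed term by term if the decoder produces one probability distribution per target position, conditioned on the source X and the previous target words; then log P(Y | X, θ) is the sum over positions t of log P(y_t | y_<t, X, θ). A minimal sketch under that assumption: the function names and the tiny hand-written distributions are hypothetical stand-ins for a real model's outputs.

```python
import math

def sequence_log_likelihood(step_probs, target):
    """log P(Y | X, θ) for one sentence pair: sum over t of log P(y_t | y_<t, X, θ).

    step_probs[t] maps each target-vocabulary token to the model's probability
    at position t; target is the reference translation Y (same length).
    """
    if len(step_probs) != len(target):
        raise ValueError("need exactly one distribution per target token")
    return sum(math.log(p[y]) for p, y in zip(step_probs, target))

def corpus_objective(examples):
    """Sum of sentence log-likelihoods over (step_probs, target) pairs:
    the quantity maximized over θ in maximum-likelihood training."""
    return sum(sequence_log_likelihood(sp, y) for sp, y in examples)

steps = [{"le": 0.7, "la": 0.3}, {"chat": 0.9, "chien": 0.1}]
print(sequence_log_likelihood(steps, ["le", "chat"]))  # log(0.7) + log(0.9) ≈ -0.462
```

Gradient descent minimizes the negative of this sum, which is the usual cross-entropy loss of a sequence model.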

28 How to evaluate automated translations?
Human raters make side-by-side comparisons on a scale of 0 to 6:
- 0: completely nonsensical translation
- 2: the sentence preserves some of the meaning of the source sentence but misses significant parts
- 4: the sentence retains most of the meaning of the source sentence, but may have some grammar mistakes
- 6: perfect translation: the meaning of the translation is completely consistent with the source, and the grammar is correct

29 Results from Human Raters

31 Automating Image Captioning

32 Automating Image Captioning
Training: large dataset of image/caption pairs from Flickr and other sources.
[Figure: CNN features of the image and word embeddings of the words in the caption feed an LSTM, which outputs a softmax probability distribution over the vocabulary at each step.]
Vinyals et al., "Show and Tell: A Neural Image Caption Generator," CVPR 2015.

33 NeuralTalk sample results

38 Microsoft CaptionBot
