Deep Learning for Natural Language Processing


Topics
- Word embeddings
- Recurrent neural networks
- Long short-term memory networks
- Neural machine translation
- Automatically generating image captions

Word meaning in NLP: How do we capture the meaning and context of words?
- Synonyms: I loved the movie. / I adored the movie.
- Synecdoche: Today, Washington affirmed its opposition to the trade pact.
- Homonyms: I deposited the money in the bank. / I buried the money in the bank.
- Polysemy: I read a book today. / I wasn't able to book the hotel room.

Word Embeddings: one of the most successful ideas of modern NLP. One example: Google's Word2Vec algorithm.

Word2Vec algorithm

Network architecture:
- Input: one-hot representation of the input word over the vocabulary (10,000 units)
- Hidden layer: linear activation function (300 units)
- Output: for each word w_i in the vocabulary, the probability that w_i occurs near the input word in a sentence (10,000 units)
- Weights: 10,000 × 300 from input to hidden, and 300 × 10,000 from hidden to output

Word2Vec training
- Take a training corpus of documents and collect pairs of nearby words.
- Example document: "Every morning she drinks Starbucks coffee."
- Training pairs (window size = 3): (every, morning), (every, she), (morning, she), (morning, drinks), (she, drinks), (she, Starbucks), (drinks, Starbucks), (drinks, coffee), (Starbucks, coffee)
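A minimal sketch of this pair-collection step in Python (my own illustration, not the lecture's code): the window slides over the tokenized sentence and every pair of words inside it becomes a training pair.

```python
# Collect pairs of nearby words using a sliding window of consecutive words.
# Illustrative sketch; "window size = 3" follows the slide's example.

def nearby_pairs(tokens, window_size=3):
    pairs = set()
    for start in range(len(tokens) - window_size + 1):
        window = tokens[start:start + window_size]
        for i in range(len(window)):
            for j in range(i + 1, len(window)):
                pairs.add((window[i], window[j]))
    return sorted(pairs)

sentence = "every morning she drinks Starbucks coffee".split()
for pair in nearby_pairs(sentence):
    print(pair)
# includes ('drinks', 'Starbucks'), ('drinks', 'coffee'), ('she', 'drinks'), ...
```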

Word2Vec training via backpropagation. For each training pair, the input word is fed through the network (10,000 × 300 weights, linear activation function, 300 × 10,000 weights), and the target output is the probability that the paired word is nearby. For example: input "drinks" with target "Starbucks"; input "drinks" with target "coffee".
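A toy numpy sketch of one such backpropagation step for the skip-gram setup above (vocabulary and embedding sizes are shrunk for readability, and all names are illustrative rather than the lecture's actual implementation):

```python
import numpy as np

vocab = ["every", "morning", "she", "drinks", "starbucks", "coffee"]
V, H = len(vocab), 4                        # vocab size, embedding size (slides: 10,000 and 300)
idx = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, H))   # input-to-hidden weights (the word vectors)
W_out = rng.normal(scale=0.1, size=(H, V))  # hidden-to-output weights

def train_pair(W_in, W_out, center, context, lr=0.1):
    """One backpropagation update for a (center, context) training pair."""
    h = W_in[idx[center]]                   # hidden layer: linear activation = row lookup
    scores = h @ W_out                      # scores over the whole vocabulary
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                    # softmax: P(word is nearby center)
    target = np.zeros(V)
    target[idx[context]] = 1.0              # the observed nearby word
    d_scores = probs - target               # gradient of the cross-entropy loss
    grad_h = W_out @ d_scores               # backpropagate into the hidden layer
    W_out -= lr * np.outer(h, d_scores)     # update hidden-to-output weights (in place)
    W_in[idx[center]] -= lr * grad_h        # update the center word's embedding

for center, context in [("drinks", "starbucks"), ("drinks", "coffee"), ("she", "drinks")]:
    train_pair(W_in, W_out, center, context)

print(W_in[idx["drinks"]])                  # the (toy) learned word vector for "drinks"
```

After training, each row of W_in is the embedding of the corresponding word, which is what the next slide calls the learned word vectors.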

Learned word vectors: the rows of the 10,000 × 300 input-to-hidden weight matrix are the word vectors; the row corresponding to "drinks" is the learned vector for "drinks".

Some surprising results of word2vec: http://www.aclweb.org/anthology/n13-1#page=784

(Further examples from http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
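The "surprising result" those papers report is that simple vector arithmetic on the learned embeddings captures analogies, e.g. vector("king") - vector("man") + vector("woman") lands closest to vector("queen"). A sketch of that lookup using cosine similarity, assuming a hypothetical `vectors` dict mapping words to their learned embeddings:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(vectors, a, b, c):
    """Word whose vector is closest to vectors[b] - vectors[a] + vectors[c].
    E.g. analogy(vectors, "man", "king", "woman") should return "queen"
    if the embeddings capture the analogy (`vectors` is a hypothetical dict)."""
    query = vectors[b] - vectors[a] + vectors[c]
    candidates = ((w, cosine(query, v)) for w, v in vectors.items()
                  if w not in (a, b, c))
    return max(candidates, key=lambda wc: wc[1])[0]
```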

Word embeddings demo http://bionlp-www.utu.fi/wv_demo/

Recurrent Neural Network (RNN). (Figure from http://axon.cs.byu.edu/~martinez/classes/678/slides/recurrent.pptx)

Recurrent Neural Network unfolded in time. (Figure from http://eric-yuan.me/rnn2-lstm/) Training algorithm: backpropagation through time.
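A minimal numpy sketch of the recurrence that the unfolded diagram depicts: the same weight matrices are reused at every time step, and the hidden state carries information forward (sizes and names are illustrative):

```python
import numpy as np

I, H = 8, 5                                  # input size, hidden size (illustrative)
rng = np.random.default_rng(1)
W_xh = rng.normal(scale=0.3, size=(H, I))    # input-to-hidden weights
W_hh = rng.normal(scale=0.3, size=(H, H))    # hidden-to-hidden (recurrent) weights
b_h = np.zeros(H)

def rnn_forward(inputs):
    """Run a simple RNN over a list of input vectors x_1..x_T."""
    h = np.zeros(H)                          # initial hidden state
    states = []
    for x in inputs:                         # the same weights are applied at every step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

hidden_states = rnn_forward([rng.normal(size=I) for _ in range(4)])
```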

Encoder-decoder (or "sequence-to-sequence") networks for translation. (Figure from http://book.paddlepaddle.org/08.machine_translation/image/encoder_decoder_en.png)

Problem for RNNs: learning long-term dependencies. Example: "The cat that my mother's sister took to Hawaii the year before last when you were in high school is now living with my cousin." Backpropagation through time suffers from the problem of vanishing gradients.
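A small numerical illustration (my own, not from the lecture) of why backpropagation through time struggles here: the gradient flowing back through the recurrent connections is multiplied by the recurrent weight matrix at every step, so if its largest singular value is below 1 the gradient shrinks exponentially with the number of steps.

```python
import numpy as np

rng = np.random.default_rng(2)
W_hh = rng.normal(size=(50, 50))
W_hh *= 0.9 / np.linalg.norm(W_hh, 2)       # scale so the largest singular value is 0.9

grad = rng.normal(size=50)                  # gradient arriving at the final time step
for t in range(1, 101):
    grad = W_hh.T @ grad                    # one step backward through time
    if t % 20 == 0:
        print(f"after {t} steps back: gradient norm = {np.linalg.norm(grad):.2e}")
# The norm shrinks by at least a factor of 0.9 per step, so information about
# words far back in the sentence contributes almost nothing to the weight update.
```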

Long Short-Term Memory (LSTM): a neuron with a more complicated memory-gating structure. It replaces ordinary hidden neurons in RNNs and is designed to avoid the long-term dependency problem.

Long Short-Term Memory (LSTM) unit: comparison of a simple RNN (hidden) unit with an LSTM (hidden) unit. (Figure from https://deeplearning4j.org/lstm.html)

Comments on LSTMs
- The LSTM unit replaces the simple RNN unit.
- LSTM internal weights are still trained with backpropagation.
- The cell value has a feedback loop: it can remember a value indefinitely.
- The function of the gates (input, forget, output) is learned by minimizing the loss.
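A compact numpy sketch of one step of the standard LSTM update, showing how the learned gates act on the cell value (names and sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. params holds a weight matrix and bias per gate."""
    z = np.concatenate([x, h_prev])                  # gate inputs: current x and previous h
    i = sigmoid(params["W_i"] @ z + params["b_i"])   # input gate
    f = sigmoid(params["W_f"] @ z + params["b_f"])   # forget gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])   # output gate
    g = np.tanh(params["W_g"] @ z + params["b_g"])   # candidate cell value
    c = f * c_prev + i * g                           # cell value feedback loop: can
                                                     # preserve information indefinitely
    h = o * np.tanh(c)                               # hidden state passed onward
    return h, c

# Illustrative sizes: 8-dimensional input, 5-dimensional hidden/cell state.
I, H = 8, 5
rng = np.random.default_rng(3)
params = {}
for gate in ("i", "f", "o", "g"):
    params[f"W_{gate}"] = rng.normal(scale=0.3, size=(H, I + H))
    params[f"b_{gate}"] = np.zeros(H)

h = c = np.zeros(H)
for x in [rng.normal(size=I) for _ in range(4)]:
    h, c = lstm_step(x, h, c, params)
```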

Google Neural Machine Translation (unfolded in time). From https://arxiv.org/pdf/1609.08144.pdf

Neural Machine Translation training: maximum likelihood, using gradient descent on the weights θ:
θ* = argmax_θ Σ_(X,Y) log P(Y | X, θ)
Trained on a very large corpus of parallel texts in the source (X) and target (Y) languages.
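In practice the log likelihood is computed token by token from the decoder's softmax output. A sketch under stated assumptions: `decoder_probs(source, target_prefix)` is a hypothetical model call returning a dict from each vocabulary word to the probability of it being the next target word.

```python
import numpy as np

def sentence_log_likelihood(source_tokens, target_tokens, decoder_probs):
    """log P(Y | X, theta) for one sentence pair, summed over target positions."""
    total = 0.0
    for t, word in enumerate(target_tokens):
        probs = decoder_probs(source_tokens, target_tokens[:t])  # hypothetical model call
        total += np.log(probs[word])                             # log-prob of the correct word
    return total

# Maximum-likelihood training = gradient ascent (in theta) on the sum of
# sentence_log_likelihood over all (X, Y) pairs in the parallel corpus.
```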

How to evaluate automated translations? Human raters make side-by-side comparisons on a scale of 0 to 6:
- 0: completely nonsensical translation
- 2: the sentence preserves some of the meaning of the source sentence but misses significant parts
- 4: the sentence retains most of the meaning of the source sentence, but may have some grammar mistakes
- 6: perfect translation; the meaning of the translation is completely consistent with the source, and the grammar is correct

Results from Human Raters

Automating Image Captioning

Automating Image Captioning. Training uses a large dataset of image/caption pairs from Flickr and other sources. (Architecture figure: CNN features and word embeddings of the words in the caption feed the network, which outputs a softmax probability distribution over the vocabulary.) Vinyals et al., "Show and Tell: A Neural Image Caption Generator," CVPR 2015.
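A schematic sketch of the resulting decoding loop: the CNN's image features seed a recurrent decoder, which emits one caption word at a time from its softmax over the vocabulary. All function names here are hypothetical placeholders, not the paper's code.

```python
def generate_caption(image, cnn_features, init_decoder, decoder_step, embed,
                     max_len=20, end_token="<end>"):
    """Greedy captioning: at each step, take the most probable next word."""
    features = cnn_features(image)                   # CNN encoder output
    state = init_decoder(features)                   # image features seed the RNN state
    word, caption = "<start>", []
    for _ in range(max_len):
        state, probs = decoder_step(state, embed(word))  # probs: softmax over vocabulary
        word = max(probs, key=probs.get)                 # greedy choice of next word
        if word == end_token:
            break
        caption.append(word)
    return " ".join(caption)
```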

NeuralTalk sample results. From http://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/

Microsoft CaptionBot: https://www.captionbot.ai/