Deep Learning Introduction and Natural Language Processing Applications


GMU CSI 899
Jim Simpson, PhD (Jim.Simpson@Cynnovative.com)
9/18/2017

Agenda
- Fundamentals
  - Linear and Logistic Regression
  - Logistic Regression to Neural Networks
  - Neural Networks to Deep Learning
  - Representation Learning with Deep Neural Networks
- Natural Language Processing Applications
  - Word Embeddings/Vectors: Word2Vec
  - Language Models: Long Short-Term Memory Recurrent Neural Networks
- Additional Reading

Definitions
- Deep Learning Models: neural networks with more than one hidden layer.
- Neural Networks: two-dimensional arrays of logistic regressors (layers of units), loosely inspired by how neurons are connected in the mammalian brain.
- Deep Learning vs. Traditional Machine Learning: deep learning can learn complex non-linear relationships in the data, can do so without explicit manual feature engineering, and adapts to all types of data (even unstructured images and natural language).

Regression Analysis Overview
- Linear Regression
  - Dependent variable (prediction): continuous, e.g. home prices from square footage
  - Simple case: equation of a line, y = β0 + β1x
  - Multiple linear regression: Y = β0 + β1X1 + β2X2
- Logistic Regression
  - Dependent variable (prediction): categorical, e.g. benign/malignant from tumor size
  - Simple case: sigmoid function, y = 1 / (1 + e^-(β0 + β1x))
  - Multiple logistic regression: logit(Y) = ln(Y / (1 - Y)) = β0 + β1X1 + β2X2
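
A minimal sketch of the logistic-regression case in code (scikit-learn; the tumor-size data points below are invented for illustration):

```python
# Logistic regression on a 1-d toy problem: predict benign (0) vs.
# malignant (1) from tumor size. The data points are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [5.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)
b0, b1 = model.intercept_[0], model.coef_[0, 0]

# The prediction is the sigmoid of the linear combination b0 + b1*x:
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(sigmoid(b0 + b1 * 2.5))               # manual sigmoid
print(model.predict_proba([[2.5]])[0, 1])   # same value from the model
```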

Visual Representation of the Linear Model
Two input dimensions are combined linearly to form a single output dimension: Y = β0 + β1X1 + β2X2.
[Diagram: inputs X1 and X2, weighted by β1 and β2, are summed to produce the output Y.]

Logistic Regression to Neural Networks
Add extra steps between the input and the output.
[Diagram: inputs X1 and X2 feed a single hidden unit H1 through weights β1 and β2; H1 feeds the output Y through weight W1.]
With multiple hidden dimensions:
[Diagram: inputs X1 and X2 feed hidden units H1 and H2 through weights β1,1, β1,2, β2,1, β2,2; H1 and H2 feed the output Y through weights W1 and W2.]

Neural Networks with Hidden Units
Add non-linearity by layering activation functions, e.g. f(x) = tanh(x).
[Diagram: the two-hidden-unit network above, with the activation f(x) applied at each hidden unit.]
Advantages:
- Adding hidden units lets the model capture complex interactions between the variables, which were previously treated as linearly independent.
- The non-linearity on the hidden units warps the feature space in ways that are hard to visualize but highly beneficial.
- Choosing the number of hidden units changes the dimensionality of the problem, potentially making classification far easier in a higher-dimensional space.
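
A numpy sketch of the forward pass just described (the weights here are random placeholders; training would fit them):

```python
# Forward pass of a 2-input, 2-hidden-unit network with tanh activation
# and a sigmoid output. Weights are random placeholders, not trained values.
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(2, 2))   # input-to-hidden weights (the betas)
W = rng.normal(size=(2, 1))   # hidden-to-output weights

def forward(x):
    h = np.tanh(x @ B)               # hidden units: H = f(X @ beta)
    return 1 / (1 + np.exp(-h @ W))  # sigmoid output for classification

print(forward(np.array([[0.5, -1.0]])))
```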

Neural Networks to Deep Learning
Deep learning uses neural networks with multiple hidden layers.
[Diagram: inputs X1 and X2 pass through a first and a second hidden layer before reaching the output Y.]
The number of neurons per layer and the number of layers become hyper-parameters. The input dimension is fixed by the data (e.g. the number of pixels in an image) and the output dimension by the task (e.g. the classes 0-9 in digit recognition).
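
A sketch of such a stack in Keras (used later in this deck); the layer sizes are arbitrary illustrative choices of those hyper-parameters:

```python
# A small deep network: two hidden layers, ten output classes (digits 0-9).
# Layer sizes are arbitrary illustrative choices (they are hyper-parameters).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation="tanh", input_shape=(784,)),  # 784 = 28x28 pixels
    Dense(32, activation="tanh"),                      # second hidden layer
    Dense(10, activation="softmax"),                   # output classes 0-9
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```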

Learning Non-linear Decision Boundaries: Logistic Regression without Feature Engineering
Logistic regression without manual feature engineering is NOT able to separate the blue dots from the orange dots.
http://playground.tensorflow.org/#activation=relu&regularization=l2&batchsize=20&dataset=circle&regdataset=regplane&learningrate=0.1&regularizationrate=0.001&noise=0&networkshape=&seed=0.27923&showtestdata=false&discretize=false&perctraindata=80&x=true&y=true&xtimesy=false&xsquared=false&ysquared=false&cosx=false&sinx=false&cosy=false&siny=false&collectstats=false&problem=classification&initzero=false&hidetext=false

Learning Non-linear Decision Boundaries: Logistic Regression with Manual Feature Engineering
Adding additional hand-derived features (here x² and y²) allows logistic regression to separate the blue dots from the orange dots.
http://playground.tensorflow.org/#activation=relu&regularization=l2&batchsize=20&dataset=circle&regdataset=regplane&learningrate=0.1&regularizationrate=0.001&noise=0&networkshape=&seed=0.27923&showtestdata=false&discretize=false&perctraindata=80&x=true&y=true&xtimesy=false&xsquared=true&ysquared=true&cosx=false&sinx=false&cosy=false&siny=false&collectstats=false&problem=classification&initzero=false&hidetext=false

Learning Non-linear Decision Boundaries: Neural Network without Manual Feature Engineering
A very simple neural network (a single hidden layer of three units) can separate the two without any manual feature engineering.
http://playground.tensorflow.org/#activation=relu&regularization=l2&batchsize=20&dataset=circle&regdataset=regplane&learningrate=0.1&regularizationrate=0.001&noise=0&networkshape=3&seed=0.27923&showtestdata=false&discretize=false&perctraindata=80&x=true&y=true&xtimesy=false&xsquared=false&ysquared=false&cosx=false&sinx=false&cosy=false&siny=false&collectstats=false&problem=classification&initzero=false&hidetext=false

Learning Non-linear Decision Boundaries: Deep Neural Network
A deeper network (hidden layers of 8, 8, and 6 units) can learn the spiral dataset.
http://playground.tensorflow.org/#activation=relu&regularization=l2&batchsize=20&dataset=spiral&regdataset=regplane&learningrate=0.03&regularizationrate=0.001&noise=0&networkshape=8,8,6&seed=0.99514&showtestdata=false&discretize=false&perctraindata=80&x=true&y=true&xtimesy=false&xsquared=false&ysquared=false&cosx=false&sinx=false&cosy=false&siny=false&collectstats=false&problem=classification&initzero=false&hidetext=false

Deep Learning Frameworks
[Figure-only slide.]

Natural Language Processing (NLP) Tasks and Recurrent Neural Networks
NLP applications:
- Sentiment Analysis
- Machine Translation
- Question Answering
- Dialogue Agents
- Language Generation
Common across all applications:
- Recurrent Neural Networks (RNNs): http://colah.github.io/posts/2015-08-understanding-lstms/
- Word Embeddings/Vectors: http://cs224d.stanford.edu/lectures/cs224d-lecture8.pdf
Recommended resource: Stanford CS224d/n, Natural Language Processing with Deep Learning: http://web.stanford.edu/class/cs224n/

Word Embeddings
Problem: consider the ambiguous sentence "I made her duck."
Approach: the distributional hypothesis: "You shall know a word by the company it keeps." (J. R. Firth)
Solution: word embeddings/vectors.
https://www.tensorflow.org/tutorials/word2vec

Word Vectors from Singular Value Decomposition of a Co-Occurrence Matrix
Given a corpus with these three sentences:
- I like deep learning.
- I like NLP.
- I enjoy flying.
Build a word-word co-occurrence matrix, then apply singular value decomposition and keep the top singular dimensions as word vectors.
Problems:
- Computation scales quadratically for an n x m matrix: O(mn²)
- Hard to add new words or documents
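
A minimal numpy sketch of this pipeline on the three-sentence corpus (a window of one word on each side; tokenization is simplified):

```python
# Co-occurrence matrix + truncated SVD on the toy corpus (window size 1).
import numpy as np

corpus = ["I like deep learning .", "I like NLP .", "I enjoy flying ."]
tokens = sorted({w for s in corpus for w in s.split()})
index = {w: i for i, w in enumerate(tokens)}

X = np.zeros((len(tokens), len(tokens)))
for s in corpus:
    ws = s.split()
    for i, w in enumerate(ws):          # count immediate left/right neighbors
        for j in (i - 1, i + 1):
            if 0 <= j < len(ws):
                X[index[w], index[ws[j]]] += 1

U, S, Vt = np.linalg.svd(X)             # SVD of the co-occurrence matrix
vectors = U[:, :2] * S[:2]              # top-2 components as 2-d word vectors
print(dict(zip(tokens, np.round(vectors, 2))))
```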

Word Vectors: Main Idea of Word2Vec
Instead of capturing co-occurrence counts directly, predict the surrounding words of every word, within a window of length c around each word.
Objective function: maximize the log probability of any context word given the current center word.
The simplest first formulation for the conditional probability is a softmax over word-vector inner products (see below).
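
The equations on this slide were images and did not survive extraction. As a reconstruction, the standard skip-gram objective and softmax from Mikolov et al. (2013), which the text above describes, are:

$$ J(\theta) = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \neq 0}} \log p(w_{t+j} \mid w_t), \qquad p(o \mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w=1}^{V} \exp(u_w^{\top} v_c)} $$

where v_c and u_o are the "center" and "context" vector representations of the words and V is the vocabulary size.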

Word2Vec: Skip-Gram with Negative Sampling
Word2Vec embeds each word into a low-dimensional vector space using:
- Skip-Gram: train on the center word w_I at each position t, using a local context window of length c.
- Negative Sampling: a clever way to frame the problem as a supervised classification problem:
  - Maximize the probability of a true pair (the center word and a word in its context window).
  - Minimize the probability of a few random pairs (the center word and a random word from outside the context window).
This simple logistic regression problem moves the vectors of a true pair closer together and the vectors of random pairs further apart.
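
A sketch of training such a model with gensim (parameter names follow gensim 4.x; the toy corpus is far too small for meaningful vectors):

```python
# Skip-gram with negative sampling via gensim (v4+ parameter names).
from gensim.models import Word2Vec

sentences = [["i", "like", "deep", "learning"],
             ["i", "like", "nlp"],
             ["i", "enjoy", "flying"]]  # toy corpus for illustration only

model = Word2Vec(
    sentences,
    vector_size=50,  # dimensionality of the word vectors
    window=2,        # context window length c
    sg=1,            # 1 = skip-gram (0 = CBOW)
    negative=5,      # number of random negative pairs per true pair
    min_count=1,     # keep every word in this tiny corpus
)
print(model.wv["nlp"][:5])           # first few vector components
print(model.wv.most_similar("like", topn=2))
```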

Reduced-Dimensional (300-d to 2-d) Word Vectors Trained on English Wikipedia
[Figures show 2-d projections of word vectors grouped by relationships, superlatives, and named entities.]
Images use GloVe vectors from Richard Socher, available at http://cs224d.stanford.edu
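
One way to produce such a plot, as a sketch: PCA via scikit-learn (rather than whatever projection the slide's figures used) on pretrained GloVe vectors fetched through gensim's downloader; the word list is an arbitrary illustration:

```python
# Project pretrained 300-d GloVe vectors down to 2-d and plot a few words.
# gensim's downloader fetches "glove-wiki-gigaword-300" (a large download).
import gensim.downloader as api
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

glove = api.load("glove-wiki-gigaword-300")   # returns KeyedVectors
words = ["king", "queen", "man", "woman", "paris", "france", "rome", "italy"]
coords = PCA(n_components=2).fit_transform(glove[words])

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y))
plt.show()
```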

Language Model using Word Vectors
A language model assigns probabilities to sentences (sequences of words) by predicting the next word in a sentence given the history of previous words, e.g. "coffee with cream and ___" → "sugar".
It is a classification problem where the target class at each step is the next word in the sequence. The model is trained to predict a probability distribution over the vocabulary, and the loss (error) is the distance between the prediction and the target.
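
In practice the "distance" between the predicted distribution and the one-hot target is usually cross-entropy (standard practice; the slide does not name the loss):

$$ \mathcal{L} = -\sum_{w \in V} y_w \log \hat{p}(w) = -\log \hat{p}(w_{\text{next}}) $$

where y is the one-hot target over the vocabulary V and \hat{p} is the predicted distribution, so the loss reduces to the negative log probability the model assigns to the correct next word.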

Language Model using Neural Networks
Trained using recurrent neural networks (RNNs):
- Neural networks with feedback loops, allowing information to persist
- A natural architecture for working with sequences
With long short-term memory (LSTM) units:
- RNNs with more complex units
- Capture both long-term and short-term dependencies
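
A minimal sketch of such a word-level LSTM language model in Keras (the vocabulary size, dimensions, and sequence length are illustrative placeholders, not values from the slides):

```python
# Word-level LSTM language model: embed words, run an LSTM over the
# sequence, and predict a distribution over the vocabulary at each step.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, embed_dim, seq_len = 10000, 128, 20  # illustrative placeholders

model = Sequential([
    Embedding(vocab_size, embed_dim, input_length=seq_len),  # word vectors
    LSTM(256, return_sequences=True),          # one hidden state per position
    Dense(vocab_size, activation="softmax"),   # distribution over vocabulary
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```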

Single-Cell Visualization of a Language Model Trained on Linux Source Code
[Two figure slides from Karpathy's post, visualizing the activation of individual RNN cells over sampled text.]
http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Additional Reading
Papers:
- Mikolov, Tomas, et al. "Distributed Representations of Words and Phrases and their Compositionality." Advances in Neural Information Processing Systems, 2013. https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality
- Lin, Henry W., and Max Tegmark. "Critical Behavior from Deep Dynamics: A Hidden Dimension in Natural Language." arXiv preprint arXiv:1606.06737 (2016). https://arxiv.org/abs/1606.06737v2
Blog Posts:
- Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- Chris Olah: Understanding LSTM Networks. http://colah.github.io/posts/2015-08-understanding-lstms/
- Chris Olah: Attention and Augmented Recurrent Neural Networks. https://distill.pub/2016/augmented-rnns/
Code:
- Keras: https://github.com/fchollet/keras-resources
- TensorFlow: https://www.tensorflow.org/tutorials/

Research Ideas
Uncertainty of predictions in recurrent neural networks:
- Gal, Yarin. "Uncertainty in Deep Learning." PhD thesis, University of Cambridge, 2016. http://mlg.eng.cam.ac.uk/yarin/blog_2248.html
- Tom Wiecki: Bayesian Deep Learning. http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/
- Uber Engineering: application motivation. https://eng.uber.com/neural-networks-uncertainty-estimation/
Distributed deep learning of recurrent neural networks (scaling out using Spark and scaling up using TensorFlow/Keras):
- https://github.com/databricks/tensorframes
- https://github.com/yahoo/tensorflowonspark
- https://github.com/cerndb/dist-keras

BACKUP

Computer Vision Tasks and Convolutional Neural Networks
Computer vision applications:
- Image Classification
- Object Detection
- Semantic Segmentation
- Image Captioning
- Style Transfer
- Image Generation
Common across all applications: Convolutional Neural Networks.
http://timdettmers.com/2015/03/26/convolution-deep-learning/
https://www.researchgate.net/publication/281607765_hierarchical_deep_learning_architecture_For_10K_Objects_Classification
Recommended resource: Stanford CS231n, Convolutional Neural Networks for Visual Recognition: http://cs231n.stanford.edu

Convolutional Neural Networks
[Two figure-only slides.]
