Slides credited to Richard Socher

Sequence Modeling
Idea: aggregate the meaning of all words into one vector (compositionality)
Methods:
Basic combination: average, sum
Neural combination: recursive neural network (RvNN), recurrent neural network (RNN), convolutional neural network (CNN)
Example: how to compute an N-dim vector for 這 (this) 規格 (specification) 有 (have) 誠意 (sincerity)?

Recursive Neural Network: From Words to Phrases

Recursive Neural Network
Idea: leverage linguistic knowledge (syntax) to combine multiple words into phrases
Assumption: language is described recursively

Related Work for RvNN
Pollack (1990): recursive auto-associative memories
Goller & Küchler (1996), Costa et al. (2003): earlier recursive neural network work that assumed a fixed tree structure and used one-hot vectors
Hinton (1990) and Bottou (2011): related ideas about recursive models and recursive operators as smooth versions of logic operations

Outline
Property: syntactic compositionality; recursion assumption
Network architecture and definition: standard recursive neural network (weight-tied, weight-untied); matrix-vector recursive neural network; recursive neural tensor network
Applications: parsing; paraphrase detection; sentiment analysis

Phrase Mapping: Principle of Compositionality
The meaning (vector) of a sentence is determined by 1) the meanings of its words and 2) the rules that combine them
[Figure: vector-space illustration mapping "the country of my birth" and "the place where I was born" to nearby points]
Idea: jointly learn parse trees and compositional vector representations

Sentence Syntactic Parsing
Parsing is the process of analyzing a string of symbols. A parse tree conveys:
1) The part-of-speech of each word
2) Phrases: noun phrases (NP): "the cat", "the mat"; prepositional phrase (PP): "on the mat"; verb phrase (VP): "sat on the mat"; sentence (S): "the cat sat on the mat"
3) Relationships: "the cat" is the subject of "sat"; "on the mat" is the place modifier of "sat"
Example: The cat sat on the mat.
(S (NP (DT The) (NN cat)) (VP (VB sat) (PP (IN on) (NP (DT the) (NN mat)))))
(NN = noun, VB = verb, DT = determiner, IN = preposition)

Learning Structure & Representation
Vector representations incorporate the meaning of words and their compositional structures
[Figure: parse tree of "The cat sat on the mat." with a vector representation at each node]

Recursion Assumption
Are languages recursive? Debatable, but recursion helps describe natural language
Ex. "the church which has nice windows": a noun phrase containing a relative clause that itself contains a noun phrase

Recursion Assumption
Characteristics of recursion:
1. Helpful in disambiguation
2. Helpful for tasks that refer to specific phrases: "John and Jane went to a big festival. They enjoyed the trip and the music there." ("they" = John and Jane; "the trip" = went to a big festival; "there" = the big festival)
3. Grammatical tree structure works better for some tasks
Whether language is recursive is still up for debate

Recursive Neural Network Architecture
The network predicts vectors along with the structure
Input: the vector representations of two candidate children
Output: 1) the vector representation of the merged node; 2) a score of how plausible the new node would be

Recursive Neural Network Definition
1) The vector representation of the merged node, computed from the two children's vectors
2) The score of how plausible the new node would be
The same W parameters are used at all nodes of the tree (weight-tied)
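In code, the definition is a one-liner. Below is a minimal numpy sketch, assuming the standard formulation p = tanh(W[c1; c2] + b) with a linear scoring vector u; the variable names are illustrative, not from the slides.

```python
import numpy as np

def compose(c1, c2, W, b, u):
    """Merge two d-dim child vectors into a parent vector and score the merge.

    Weight-tied RvNN: the same W (d x 2d), b (d,), and scoring vector u (d,)
    are reused at every node of the tree.
    """
    parent = np.tanh(W @ np.concatenate([c1, c2]) + b)  # merged-node vector
    score = float(u @ parent)                           # plausibility of this merge
    return parent, score
```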

Sentence Parsing via RvNN
[A sequence of slides walks through greedy parsing: at each step, every pair of adjacent nodes is scored by the network, the highest-scoring pair is merged into a new node, and the process repeats until a single node remains. The root yields the sentence vector embedding, and the merge scores accumulate into the sentence parsing score.]
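As a sketch of that walk-through, here is a purely greedy parser built on the compose function above (practical systems prune candidates with a faster model instead, as the Compositional Vector Grammar slide below describes):

```python
def greedy_parse(word_vectors, W, b, u):
    """Greedily build a binary tree by always merging the best adjacent pair."""
    nodes = list(word_vectors)        # current top-level constituents
    total_score = 0.0
    while len(nodes) > 1:
        # score every adjacent pair with the recursive network
        candidates = [compose(nodes[i], nodes[i + 1], W, b, u)
                      for i in range(len(nodes) - 1)]
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, score = candidates[best]
        nodes[best:best + 2] = [parent]   # replace the pair with its parent node
        total_score += score              # merge scores sum to the parse score
    return nodes[0], total_score          # sentence embedding and parse score
```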

Backpropagation through Structure
Principally the same as general backpropagation (Goller & Küchler, 1996), with three differences:
1. Sum the derivatives of W from all nodes
2. Split the derivatives at each node
3. Add the error messages from the parent and the node itself

1) Sum derivatives of W from all nodes
Since the same W is applied at every node of the tree, its total gradient is the sum of the gradients computed at each node.

2) Split derivatives at each node
During forward propagation, the parent node is computed from its two children; during backward propagation, the error must therefore be split and propagated to each of them.

3) Add error messages
For each node, the error message is composed of the error propagated from the parent and the error from the current node itself.
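A sketch of how the three differences look in code, assuming each tree node carries its forward-pass vector, left/right children, and an own-error term delta_own (e.g. from a per-node classifier); all names are illustrative:

```python
import numpy as np

def backprop_node(node, delta_from_parent, W, grads):
    """Backpropagation through structure at one node (illustrative sketch)."""
    delta = delta_from_parent + node.delta_own        # 3) parent error + own error
    if node.is_leaf:
        grads['words'][node.word] += delta            # gradient w.r.t. the word vector
        return
    delta = delta * (1.0 - node.vector ** 2)          # tanh'(z) = 1 - tanh(z)^2
    children = np.concatenate([node.left.vector, node.right.vector])
    grads['W'] += np.outer(delta, children)           # 1) sum dW over all nodes
    grads['b'] += delta
    delta_children = W.T @ delta                      # error message for the children
    d = len(node.left.vector)
    backprop_node(node.left,  delta_children[:d], W, grads)   # 2) split derivatives
    backprop_node(node.right, delta_children[d:], W, grads)
```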

Composition Matrix W
Issue: the same matrix W is used for all compositions, regardless of what is being combined

Syntactically Untied RvNN
Idea: condition the composition function on the syntactic categories of the children
Benefit: composition functions are syntax-dependent, allowing different composition functions for different pairs, e.g. Adv + AdjP, VP + NP
Issue: speed, because many candidate combinations must be scored
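In a sketch, untying simply means indexing the composition weights by the children's syntactic categories (the dict-of-matrices layout is illustrative):

```python
import numpy as np

def compose_untied(c1, cat1, c2, cat2, Ws, bs):
    """Syntactically untied composition: weights chosen per category pair."""
    key = (cat1, cat2)                    # e.g. ('Adv', 'AdjP') or ('VP', 'NP')
    return np.tanh(Ws[key] @ np.concatenate([c1, c2]) + bs[key])
```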

Compositional Vector Grammar
Compute the score only for a subset of trees coming from a simpler, faster model (Socher et al., 2013):
The simpler model prunes very unlikely candidates for speed and provides coarse syntactic categories of the children for each beam candidate
A probabilistic context-free grammar (PCFG) thus helps decrease the search space
Socher et al., Parsing with Compositional Vector Grammars, in ACL, 2013.

Labels for RvNN
Each node's vector can be passed through a softmax layer to compute the probability of each category (e.g. NP), with the cross-entropy error used as the loss for optimization
Socher et al., Parsing Natural Scenes and Natural Language with Recursive Neural Networks, in ICML, 2011.
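A minimal sketch of this per-node classifier, with an assumed (num_classes x d) label matrix Ws:

```python
import numpy as np

def node_label_loss(node_vec, true_label, Ws):
    """Softmax over categories at one node, with cross-entropy loss."""
    logits = Ws @ node_vec
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    probs /= probs.sum()
    return probs, -np.log(probs[true_label])  # predicted distribution and loss
```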

Recursive Neural Network
Issue: some words act mostly as an operator rather than carrying meaning themselves, e.g. "very" in "very good"

Matrix-Vector Recursive Neural Network
Idea: each word can additionally serve as an operator, so every constituent gets both a vector and a matrix
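Concretely, a sketch following the matrix-vector formulation of Socher et al. (2012): every constituent carries a meaning vector and an operator matrix, and each matrix transforms the sibling's vector before the usual combination.

```python
import numpy as np

def mv_compose(a, A, b, B, W, WM):
    """Matrix-Vector RvNN composition for children (a, A) and (b, B).

    a, b: d-dim meaning vectors; A, B: (d x d) operator matrices;
    W: (d x 2d) combines the transformed vectors; WM: (d x 2d) maps the
    stacked child matrices to the (d x d) parent matrix.
    """
    p = np.tanh(W @ np.concatenate([B @ a, A @ b]))  # parent vector
    P = WM @ np.concatenate([A, B], axis=0)          # parent matrix
    return p, P
```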

Recursive Neural Tensor Network
Idea: allow more interactions between the two children's vectors than a single linear layer provides
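A sketch of the tensor composition (following Socher et al., 2013): a third-order tensor V adds bilinear interactions between the concatenated children on top of the standard linear term.

```python
import numpy as np

def rntn_compose(c1, c2, V, W, b):
    """RNTN composition: tanh(c^T V c + W c + b), with c = [c1; c2].

    V has shape (d, 2d, 2d); each slice V[k] contributes one bilinear output.
    """
    c = np.concatenate([c1, c2])
    bilinear = np.array([c @ V[k] @ c for k in range(V.shape[0])])
    return np.tanh(bilinear + W @ c + b)
```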

Language Compositionality

Image Compositionality
Idea: an image can be composed from visual segments, in the same way a sentence is composed in natural language parsing

Paraphrases for Learning Sentence Vectors
A pairwise comparison between the nodes of two sentences' parse trees can be used for learning sentence embeddings

Sentiment Analysis
Sentiment analysis for sentences with negation words can benefit from RvNN

Sentiment Analysis
The Sentiment Treebank provides richer annotations; its phrase-level sentiment labels indeed improve performance
Socher et al., Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, in EMNLP, 2013.

Sentiment Tree Illustration
Stanford live demo: http://nlp.stanford.edu/sentiment/
Phrase-level annotations let the model learn sentiment-specific compositional functions
Socher et al., Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, in EMNLP, 2013.

Concluding Remarks
Recursive Neural Network idea: syntactic compositionality & language recursion
Network variants:
Standard recursive neural network (weight-tied, weight-untied)
Matrix-vector recursive neural network
Recursive neural tensor network