Discriminative Neural Sentence Modeling by Tree-Based Convolution


Discriminative Neural Sentence Modeling by Tree-Based Convolution
Lili Mou, Hao Peng, Ge Li, Yan Xu, Lu Zhang, Zhi Jin
Software Institute, Peking University, P. R. China
EMNLP, Lisbon, Portugal, September 2015

Outline
1. Introduction
2. TBCNN: c-TBCNN and d-TBCNN
3. Experiments: Experiment I: Sentiment Analysis; Experiment II: Question Classification; Model Analysis
4. Conclusion

Sentence Modeling
Sentence modeling: capturing the meaning of a sentence. Related to various tasks in NLP [Kalchbrenner et al., 2014]:
- Sentiment analysis
- Paraphrase detection
- Language-image matching
Our focus: discriminative sentence modeling, i.e., classifying a sentence according to a certain criterion.

An Example
Sentiment analysis of a movie review: "An idealistic love story that brings out the latent 15-year-old romantic in everyone."
The sentiment? Positive / Neutral / Negative

Feature Engineering
- Bag-of-words, n-grams, and more dedicated features, e.g., [Silva et al., 2011]
- Problem: sentence modeling is usually NON-TRIVIAL. Example [Socher et al., 2011]: "white blood cells destroying an infection" vs. "an infection destroying white blood cells" (see the sketch below)
Kernel machines, e.g., SVM:
+ Circumvent explicit feature representation
− Crucial to design the kernel function, which summarizes all data information
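
To make the word-order problem concrete, here is a minimal Python sketch (our illustration, not from the talk) showing that a bag-of-words representation assigns identical features to the two sentences above:

```python
from collections import Counter

# Socher et al.'s (2011) example: opposite meanings, same words.
s1 = "white blood cells destroying an infection"
s2 = "an infection destroying white blood cells"

# A bag-of-words feature map keeps only word counts and discards order,
# so the two sentences become indistinguishable.
print(Counter(s1.split()) == Counter(s2.split()))  # True
```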

Neural Networks
Automatic feature learning: word embeddings [Mikolov et al., 2013], paragraph vectors [Le and Mikolov, 2014]
Prevailing neural sentence models:
- Convolutional neural networks (CNNs) [Collobert and Weston, 2008]
- Recursive neural networks (RNNs) [Socher et al., 2011], and a variant: recurrent neural networks

Convolutional Neural Networks (CNNs)
+ Effective feature learning
− Unable to capture tree-structural information

Are tree structures necessary for deep learning of representations?
Example [Pinker, 1994]:
- The dog the stick the fire burned beat bit the cat.
- If if if it rains it pours I get depressed I should get help.
- That that that he left is apparent is clear is obvious.

CNNs versus Sentence Structures

Recursive Neural Networks (RNNs)
+ Structure-sensitive
− Long propagation path

Long Propagation Path
- Burying illuminating information under complicated structures
- Gradient blow-up or vanishing

Our Intuition
Can we combine the merits of CNNs and RNNs?
- Short propagation paths, like CNNs
- Capturing structural information, like RNNs
Our solution: the Tree-Based Convolutional Neural Network (TBCNN)

Outline: Section 2, TBCNN (c-TBCNN and d-TBCNN)

Architecture of TBCNN

Technical Points
- How to represent nodes as vectors in constituency trees?
- How to handle nodes with different numbers of children in dependency trees?
- How to pool over structures of varying size and shape?

c-TBCNN
- Pretrain an RNN and fix it, to obtain vector representations of nodes
- Perform convolution over the constituency tree
E.g., a convolutional window of depth 2, i.e., a parent p with children l and r:
  y = f(W_p^{(c)} p + W_l^{(c)} c_l + W_r^{(c)} c_r + b^{(c)})
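
A minimal numpy sketch of this depth-2 convolutional window (our illustration, not the authors' code); the toy dimensionality, tanh nonlinearity, and random initialization are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy dimensionality of node vectors (an assumption)

# One weight matrix per position in the depth-2 window: parent, left, right.
W_p, W_l, W_r = (rng.standard_normal((d, d)) for _ in range(3))
b = rng.standard_normal(d)

def conv_window(p, c_l, c_r):
    """y = f(W_p p + W_l c_l + W_r c_r + b); f assumed to be tanh."""
    return np.tanh(W_p @ p + W_l @ c_l + W_r @ c_r + b)

# In c-TBCNN the node vectors p, c_l, c_r come from the pretrained, fixed RNN.
p, c_l, c_r = (rng.standard_normal(d) for _ in range(3))
y = conv_window(p, c_l, c_r)
print(y.shape)  # (4,)
```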

Remark on Complexity
- Exponential in the window depth; linear in the number of nodes
- Tree-based convolution does not add to complexity, but is less flexible than flat CNNs

d-TBCNN
Associate weights with dependency types (e.g., nsubj, dobj) rather than with positions:
  y = f(W_p^{(d)} p + \sum_{i=1}^{n} W_{r[c_i]}^{(d)} c_i + b^{(d)})
where r[c_i] is the dependency relation between p and child c_i.
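
Again a hedged numpy sketch (ours, not the paper's implementation): one weight matrix per dependency relation, so the window handles any number of children. The relation inventory shown is an illustrative subset:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # toy dimensionality (an assumption)

relations = ["nsubj", "dobj", "amod"]  # illustrative subset of dependency types
W_p = rng.standard_normal((d, d))
W_rel = {r: rng.standard_normal((d, d)) for r in relations}  # weights tied by relation
b = rng.standard_normal(d)

def d_conv_window(p, children):
    """y = f(W_p p + sum_i W_{r[c_i]} c_i + b), for a variable number of children."""
    acc = W_p @ p + b
    for rel, c in children:  # each child carries its dependency relation r[c_i]
        acc = acc + W_rel[rel] @ c
    return np.tanh(acc)

p = rng.standard_normal(d)
children = [("nsubj", rng.standard_normal(d)), ("dobj", rng.standard_normal(d))]
y = d_conv_window(p, children)
print(y.shape)  # (4,)
```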

Pooling Heuristics
- Global pooling
- 3-slot pooling for c-TBCNN
- k-slot pooling for d-TBCNN
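
Pooling must map trees of arbitrary size and shape to a fixed-length vector. Below is a sketch of the global variant as an element-wise max over all nodes' feature vectors (max pooling is our assumption here); slot pooling would apply the same operation separately within each slot of nodes:

```python
import numpy as np

def global_pool(node_features):
    """Element-wise max over all nodes, giving a fixed-size vector
    regardless of how many nodes the tree has or how it is shaped."""
    return np.stack(node_features).max(axis=0)

# Seven nodes from some tree, each with a 4-dimensional feature vector.
feats = [np.random.randn(4) for _ in range(7)]
print(global_pool(feats).shape)  # (4,) -- independent of the number of nodes
```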

Outline: Section 3, Experiments (Experiment I: Sentiment Analysis; Experiment II: Question Classification; Model Analysis)

Experiment I: Sentiment Analysis
Dataset: the Stanford Sentiment Treebank
- 5 labels: ++ / + / 0 / − / −−
- 8544/1101/2210 sentences (train/dev/test); 150k phrases
Our settings:
- 5-way classification + binary classification
- Training on sentences + phrases; testing on sentences only
Data samples (label in parentheses):
- "Offers that rare combination of entertainment and education." (++)
- "An idealistic love story that brings out the latent 15-year-old romantic in everyone." (+)
- "Its mysteries are transparently obvious, and it's too slowly paced to be a thriller." (−)

Results on the Stanford Sentiment Treebank (accuracy, %):

Group      Method               5-class  2-class
Baseline   SVM                  40.7     79.4
           Naïve Bayes          41.0     81.8
CNNs       1-layer convolution  37.4     77.1
           Deep CNN             48.5     86.8
           Non-static           48.0     87.2
           Multichannel         47.4     88.1
RNNs       Basic                43.2     82.4
           Matrix-vector        44.4     82.9
           Tensor               45.7     85.4
           Tree LSTM            51.0     88.0
           Deep RNN             49.8     86.6
Recurrent  LSTM                 45.8     86.7
           bi-LSTM              49.1     86.8
Vector     Word vector avg.     32.7     80.1
           Paragraph vector     48.7     87.8
TBCNNs     c-TBCNN              50.4     86.8
           d-TBCNN              51.4     87.9

Experiment II: Question Classification
Dataset: 5452 training + 500 test questions
Labels: abbreviation, entity, description, human, location, numeric
Data samples (label in parentheses):
- "What is the temperature at the center of the earth?" (numeric)
- "What state did the Battle of Bighorn take place in?" (location)

Results

Method                        Acc. (%)  Reported in
SVM, 10k features + 60 rules  95.0      [Silva et al., 2011]
CNN-non-static                93.6      [Kim, 2014]
CNN-multichannel              92.2      [Kim, 2014]
RNN                           90.2      [Zhao et al., 2015]
Deep-CNN                      93.0      [Kalchbrenner et al., 2014]
Ada-CNN                       92.4      [Zhao et al., 2015]
c-TBCNN                       94.8      Our implementation
d-TBCNN                       96.0      Our implementation

Model Analysis: Pooling Methods

Model    Pooling method  5-class accuracy (%)
c-TBCNN  Global          48.48 ± 0.54
         3-slot          48.69 ± 0.40
d-TBCNN  Global          49.39 ± 0.24
         2-slot          49.94 ± 0.63

Remarks:
- Averaged over 5 random initializations
- Hyperparameters were predefined rather than tuned, hence less than optimal

Model Analysis: Sentence Length
[Figure: 5-class accuracy (0–50%) versus sentence length (binned: ≤9, 10–14, 15–19, 20–24, 25–29, 30–34, ≥35) for RNN, c-TBCNN, and d-TBCNN.]
Reimplemented RNN: 42.7% accuracy, slightly lower than the 43.2% reported in [Socher et al., 2011].

Visualization
"The stunning dreamlike visuals will impress even those who have little patience for Euro-film pretension."

Outline: Section 4, Conclusion

Way of Information Propagation

Structure  Iterative  Sliding
Flat       Recurrent  Convolution
Tree       Recursive  Tree-based convolution

Thank you for listening! Q & A

References

Collobert, R. and Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning.

Kalchbrenner, N., Grefenstette, E., and Blunsom, P. (2014). A convolutional neural network for modelling sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.

Kim, Y. (2014). Convolutional neural networks for sentence classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Le, Q. and Mikolov, T. (2014). Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.

Pinker, S. (1994). The Language Instinct: The New Science of Language and Mind. Penguin Press.

Silva, J., Coheur, L., Mendes, A., and Wichert, A. (2011). From symbolic to sub-symbolic information in question classification. Artificial Intelligence Review, 35(2):137–154.

Socher, R., Pennington, J., Huang, E., Ng, A., and Manning, C. (2011). Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Zhao, H., Lu, Z., and Poupart, P. (2015). Self-adaptive hierarchical sentence model. arXiv preprint arXiv:1504.05070; to appear in Proceedings of the International Joint Conference on Artificial Intelligence.