Relation Classification with Gated Recursive Convolutional Networks

Karl-Heinz Krachenfels
CIS, LMU-Munich, Germany
February 21, 2017

Abstract

In this work we investigate variants of recursive convolutional networks (rCNNs) for a relation classification task. We give a short intuition as to why rCNNs are well suited for tasks that depend on structural patterns. We then show that the performance can be improved substantially by adding a gating logic to the convolutional architecture. The task is taken from exercise 7 of the Deep Learning master course at CIS, LMU Munich, WS 2016/17.

1 Task Description and Encoding

The task is to classify a given sentence that contains a relation. The sentence is already preprocessed and consists of a left context (5 words), a middle context (10 words) and a right context (5 words), separated by a query tag and an arg tag (see figure 1). The input has to be classified into one of 5 given relation classes. Because the convolutional network does not receive the input sequentially, and because the weight matrix is shared throughout the network (in contrast to the convolutional architecture used in the lecture, which had 3 segments with different weight matrices), we add 12 distinct padding symbols to our encoding. In this way the convolutional operators can determine, as early as possible in the information flow through the network layers, in which context a word occurs.

Figure 1: Encoding with 12 different padding symbols
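Since figure 1 is not reproduced here, the following Python sketch only illustrates the general idea of context-specific padding. The concrete assignment of the 12 padding symbols, the tag names and the fill-up to the input length of 32 (used later in section 3.2) are assumptions, not taken from the paper.

```python
# Minimal sketch of building the padded input sequence (layout is assumed).
LEFT_LEN, MID_LEN, RIGHT_LEN = 5, 10, 5
TOTAL_LEN = 32

def encode(left, middle, right, pad_symbols):
    """left/middle/right: lists of word tokens; pad_symbols: 12 distinct padding tokens."""
    def pad(tokens, width, symbol):
        # right-pad (or truncate) a context to a fixed width with a context-specific symbol
        return tokens[:width] + [symbol] * (width - len(tokens))

    seq = (pad(left, LEFT_LEN, pad_symbols[0])
           + ["<QUERY>"]
           + pad(middle, MID_LEN, pad_symbols[1])
           + ["<ARG>"]
           + pad(right, RIGHT_LEN, pad_symbols[2]))
    # fill up with further padding symbols until the fixed input length is reached
    i = 3
    while len(seq) < TOTAL_LEN:
        seq.append(pad_symbols[i % len(pad_symbols)])
        i += 1
    return seq

example = encode(["the", "ceo", "of"], ["acme", "corp", "announced"], ["yesterday"],
                 [f"<PAD{i}>" for i in range(12)])
assert len(example) == TOTAL_LEN
```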

2 Motivation and Intuition for the Recursive Approach

2.1 Intuition for the Recursive Network

Figure 2 shows our intuition why a recursive architecture is a good representation for structural dependencies. On the left side we see a production of a grammar; it is mapped to a convolutional unit (in the middle of the figure), and multiple such units are combined into a recursive network (on the right side). The intuition is that the production C → AB can be reduced multiple times if the representation occurs on multiple levels. This intuition is based on the assumption that the representation of the symbols stays constant across the different layers. We believe that building neural architectures which enforce such stability of feature representations in deep networks is one of the main challenges, and a precondition for this type of recursive network to work.

Figure 2: Production, convolutional unit, multiple layers of recursive convolution

2.2 Motivation for Gating

Cho et al. (2014) investigate gated recursive CNNs (grCNNs) as an alternative encoder in encoder-decoder based neural translation systems. The idea of the gating is to either pass through the left or the right input of the convolutional cell, or to apply the convolution followed by a sigmoid function, denoted by h in the drawing (figure 3, left side). One of the intuitions is that the convolutional network with gating logic automatically learns the structure (figure 4).

Figure 3: Gating unit, from Cho et al. (2014)

Figure 4: Learning structure, from Cho et al. (2014)
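Since figure 3 is not reproduced here, the following numpy sketch shows one plausible reading of the gated convolutional cell of Cho et al. (2014): a scalar softmax gate softly chooses between copying the left child, copying the right child, or emitting a newly composed representation. All weight names and initialization details are illustrative assumptions, not taken from the paper or from our Theano implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class GatedConvCell:
    """One binary gated convolution cell in the spirit of Cho et al. (2014).

    Output = w_new * composed + w_left * left_child + w_right * right_child,
    where (w_new, w_left, w_right) is a softmax gate computed from both children.
    """
    def __init__(self, dim, rng=np.random):
        scale = 1.0 / np.sqrt(dim)
        self.W_l = rng.uniform(-scale, scale, (dim, dim))   # left half of the convolution filter
        self.W_r = rng.uniform(-scale, scale, (dim, dim))   # right half of the convolution filter
        self.b = np.zeros(dim)
        self.G = rng.uniform(-scale, scale, (3, 2 * dim))   # produces the three gate scores
        self.g_b = np.zeros(3)

    def __call__(self, h_left, h_right):
        # candidate: convolution over the two children followed by a squashing non-linearity
        h_new = np.tanh(self.W_l @ h_left + self.W_r @ h_right + self.b)
        # scalar gates: softly choose between the new candidate and the two children
        w_new, w_left, w_right = softmax(self.G @ np.concatenate([h_left, h_right]) + self.g_b)
        return w_new * h_new + w_left * h_left + w_right * h_right
```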

3 Experiments

3.1 Experiments from the Lecture

We mapped all words outside the 1000 most frequent words to an <UNK> symbol. The reason for doing so was to avoid overfitting and to reduce training times. Since the focus of this work was the comparison of architectures, and since our hardware was quite limited for training deep networks (27 layers for our deepest topology, see below), we did not invest heavily in hyperparameter optimization and trained only for one epoch. We trained on the 366,565 training samples from exercise 7 of the Deep Learning course and tested on 746 samples. We achieved 65.2% accuracy for the convolutional architecture with 3 convolutional segments followed by a max-pooling layer and a softmax layer (figure 5), and 73.5% accuracy for the LSTM variant consisting of an LSTM layer followed by a fully connected layer and a softmax layer (figure 6).

Figure 5: Topology with 3-segment CNN

Figure 6: Topology with LSTM (unfolded)

3.2 Pyramidal Recursive CNN Topology

The pyramidal recursive CNN is a multi-layer CNN in which all convolutional units on all layers share the same weights (figure 7). For simplicity we use the same input encoding for the pyramidal topology and for the binary topology of the next section. The encoding is therefore not optimal, because we fill it up with additional padding symbols until the input length of 32 is reached. The topology has 24 recursive convolutional layers, and the top-level convolutional layer has 8 convolutional units. The input and output dimension of the convolutional cell and the embedding dimension are 50 in all our experiments with recursive CNNs. On top we add a fully connected layer with 20 hidden units followed by a softmax layer with 5 output units for the predicted classes.

Figure 7: Pyramidal recursive CNN

3.3 Experiments with Pyramidal Recursive CNNs

We implemented the logic of the recursive CNN in Theano. The implementation contains a software switch to toggle between the gated and the non-gated variant. We measured 54.3% accuracy for the variant without gating and 73.9% accuracy for the variant with gating.
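As a sketch of the pyramidal topology described above (reusing the GatedConvCell and softmax helpers from the sketch in section 2.2), the following code shows how a width-2 convolution with fully shared weights shrinks the padded input of length 32 by one position per layer, so that 8 nodes remain after 24 layers. The classifier-head parameter names (W_h, b_h, W_o, b_o) are illustrative assumptions, not the original Theano code.

```python
import numpy as np

def pyramidal_forward(embeddings, cell, num_layers=24):
    """Forward pass through the pyramidal recursive CNN.

    embeddings: list of 32 vectors (dim 50) for the padded input tokens.
    cell:       a single GatedConvCell; the same weights are reused by every
                unit on every layer (the defining property of this topology).
    Each layer applies the width-2 convolution to all adjacent pairs, so the
    sequence shrinks by one position per layer: 32 -> 8 after 24 layers.
    """
    layer = list(embeddings)
    for _ in range(num_layers):
        layer = [cell(layer[i], layer[i + 1]) for i in range(len(layer) - 1)]
    return np.concatenate(layer)  # 8 * 50 values fed into the fully connected layer

def classify(features, W_h, b_h, W_o, b_o):
    """Hypothetical head: 20 hidden units, 5-way softmax (shapes: W_h 400x20, W_o 20x5)."""
    hidden = np.tanh(features @ W_h + b_h)
    return softmax(hidden @ W_o + b_o)

# usage with random embeddings, just to show the shapes involved
cell = GatedConvCell(dim=50)
tokens = [np.random.randn(50) for _ in range(32)]
features = pyramidal_forward(tokens, cell)   # shape (400,)
```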

3.4 Binary Recursive CNN Topology

As a further variant we evaluated a binary recursive CNN topology. This architecture is presumably not able to represent an arbitrary structure, but it is well capable of representing order-dependent features. The topology is shown in figure 8. We performed the experiments again with a gated and a non-gated variant of our recursive convolution architecture. We achieved 56.4% accuracy without gates and 76.3% accuracy with the gated variant.

Figure 8: Binary recursive CNN
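A minimal sketch of the binary variant, again reusing the GatedConvCell from section 2.2 and assuming a balanced binary tree over the 32 padded input positions (figure 8 is not reproduced here, so the exact tree shape is an assumption): each layer combines disjoint adjacent pairs and halves the sequence length until a single vector remains.

```python
def binary_tree_forward(embeddings, cell):
    """Forward pass through the (assumed balanced) binary recursive CNN.

    Each layer combines disjoint adjacent pairs, halving the sequence length:
    32 -> 16 -> 8 -> 4 -> 2 -> 1. Assumes the padded length is a power of two.
    """
    layer = list(embeddings)
    while len(layer) > 1:
        layer = [cell(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]  # a single 50-dimensional vector for the classifier head
```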

4 Discussion

4.1 Conclusion

This work shows that recursive convolutional networks without max-pooling can solve classification tasks that depend on structure with state-of-the-art performance when the convolutional operators are combined with a gating logic. In our experiments we outperformed the LSTM variant by around 3% (76.3% vs. 73.5% accuracy). Even the deep network with 24 convolutional layers worked at a state-of-the-art level, but the intuition that this network represents the structure well and is thus superior did not hold. One reason might be that it suffers from the depth of the network, possibly due to the vanishing/exploding gradient problem, although the gating helps a bit. A research direction might be to combine grCNNs with techniques for building very deep networks; specifically, Highway Networks (Srivastava et al., 2015) and Residual Networks (He et al., 2016) are candidates for such architectures.

4.2 Criticism and Future Work

A criticism might be that the feature size does not grow, so the network is not able to model richer, higher-level features. A remedy could be a recursive convolution operation that reuses the weight matrix from the previous layer but adds new entries per layer, steadily increasing the dimensionality of the features; this could be seen as a compromise between feature stability on the one side and inventing new features on the other. Another research direction is to apply the gating element-wise for each feature, as in the work of Dauphin et al. (2016). In our experiments all convolutions were binary convolutions; it would also be an option to investigate broader convolutions, or to combine the convolution with broader inputs for the gating logic as shown in figure 9.

Figure 9: Improved gating variant

References

K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.

Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier. Language modeling with gated convolutional networks. arXiv preprint arXiv:1612.08083, 2016.

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.

R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv preprint arXiv:1505.00387, 2015.

5 Appendix

5.1 Parameters

The following parameters were used for all experiments with recursive CNNs:

learning rate = 0.1
l1 reg = 0.00001
emb size = 50
hidden layer size = 20
conv fan in = conv fan out = 50
learning mode: SGD
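For illustration, here is a minimal sketch of how these settings could be wired into a plain SGD update with L1 regularization; the wiring and all names are assumptions and not the original Theano training code.

```python
import numpy as np

# Hyperparameters from section 5.1
LEARNING_RATE = 0.1
L1_REG = 0.00001
EMB_SIZE = 50
HIDDEN_LAYER_SIZE = 20
CONV_FAN_IN = CONV_FAN_OUT = 50

def sgd_step(params, grads, lr=LEARNING_RATE, l1=L1_REG):
    """One plain SGD update; the L1 penalty is added to each gradient via its sign."""
    return [p - lr * (g + l1 * np.sign(p)) for p, g in zip(params, grads)]
```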