LEARNING REPRESENTATIONS FOR TEXT-LEVEL DISCOURSE PARSING


THESIS PROPOSAL: LEARNING REPRESENTATIONS FOR TEXT-LEVEL DISCOURSE PARSING
Copyright 2015 gw0 [ http://gw.tnode.com/ ] <gw.2015@tnode.com>

OVERVIEW
- motivation
- discourse parsing (PDTB style)
- deep learning architectures
- sequence processing
- word embeddings
- our approach
- key ideas
- guided layer-wise multi-task learning
- progress

MOTIVATION
- natural language processing (NLP)
  - large pipelines of independently constructed components or subtasks
  - traditionally hand-engineered sparse features based on language/domain/task-specific knowledge
  - still room for improvement on challenging NLP tasks
- deep learning architectures
  - backpropagation could be the one learning algorithm to unify learning of all components
  - latent features/representations are automatically learned as distributed dense vectors
  - surprising results for a number of NLP tasks

DISCOURSE PARSING
- discourse: a piece of text meant to communicate specific information (clauses, sentences, or even paragraphs)
- understood only in relation to other discourse units; their joint meaning is larger than each individual unit's meaning alone

Examples:
- [Index arbitrage doesn't work,] arg1 and [it scares natural buyers of stock.] arg2
  (PDTB style, id: 14883, type: explicit, sense: Expansion.Conjunction)
- [But] arg2 if [this prompts others to consider the same thing,] arg2 then [it may become much more important.] arg1
  (PDTB style, id: 14905, type: explicit, sense: Contingency.Condition)

PDTB-STYLE EXAMPLES
- He added [that "having just one firm do this isn't going to mean a hill of beans]. arg1 But [if this prompts others to consider the same thing, then it may become much more important]." arg2
  (PDTB style, id: 14904, type: explicit, sense: Comparison.Concession)
- In addition, Black & Decker had said it would sell two other undisclosed Emhart operations if it received the right price. [Bostic is one of the previously unnamed units, and the first of the five to be sold.] arg1 [The company is still negotiating the sales of the other four units and expects to announce agreements by the end of the year.] arg1 [The five units generated sales of about $1.3 billion in 1988, almost half of Emhart's $2.3 billion revenue.] Bostic posted 1988 sales of $255 million. arg2
  (PDTB style, id: 12886, type: entrel, sense: EntRel)

PDTB-STYLE DISCOURSE PARSING
- Penn Discourse Treebank
  - adopts the predicate-argument view and independence of discourse relations
  - 2159 articles from the Wall Street Journal
  - 4 discourse sense classes, 16 types, 23 subtypes
- also called shallow discourse parsing
  - discourse relations are not connected to one another to form a connected structure (tree or graph)
  - adjacent/non-adjacent units in same/different sentences
- primary goals
  - locate the explicit or implicit discourse connective
  - locate the text spans for argument 1 and argument 2
  - predict the sense that characterizes the nature of the relation
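For illustration, one extracted shallow discourse relation can be thought of as a small record holding exactly these pieces. The field names below are a hypothetical minimal sketch, not the exact CoNLL 2015 shared task output schema:

from dataclasses import dataclass, field
from typing import List

@dataclass
class DiscourseRelation:
    """Hypothetical minimal record for one shallow (PDTB-style) discourse relation."""
    doc_id: str                                            # source document identifier
    rel_type: str                                          # "explicit", "implicit", "entrel", ...
    connective: List[int] = field(default_factory=list)    # token offsets of the connective (empty if implicit)
    arg1: List[int] = field(default_factory=list)          # token offsets of argument 1
    arg2: List[int] = field(default_factory=list)          # token offsets of argument 2
    sense: str = ""                                        # e.g. "Contingency.Condition"

# example instance in the spirit of the explicit relation shown earlier (offsets are made up)
rel = DiscourseRelation(doc_id="wsj_1489", rel_type="explicit",
                        connective=[5], arg1=[0, 1, 2, 3, 4], arg2=[6, 7, 8, 9, 10],
                        sense="Expansion.Conjunction")
print(rel)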

DEEP LEARNING ARCHITECTURES
- multiple layers of learning blocks stacked on top of each other
- beginning with the raw data, the representation is transformed into increasingly higher-level and more abstract forms in each layer, until the final low-dimensional features for a given task
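As a toy illustration of this stacking (plain numpy; the layer sizes and the tanh nonlinearity are arbitrary choices for the sketch, not taken from the proposal):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=300)              # "raw" input representation, e.g. a word embedding
sizes = [300, 200, 100, 20]           # each layer maps into a smaller, more abstract space
h = x
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    W = rng.normal(scale=0.1, size=(n_out, n_in))
    b = np.zeros(n_out)
    h = np.tanh(W @ h + b)            # one "learning block": affine map + nonlinearity
print(h.shape)                        # (20,) final low-dimensional features for the task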

SEQUENCE PROCESSING
Text documents of different lengths are usually treated as a sequence of words:
- transition-based processing mechanisms
- recurrent neural networks (RNNs)
- applying the same set of weights over the sequence (temporal dimension) or structure (tree-based)
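A minimal sketch of the weight-sharing idea behind an RNN (plain numpy; the dimensions and initialization are illustrative assumptions, not the proposal's actual model):

import numpy as np

rng = np.random.default_rng(1)
T, d_in, d_h = 6, 50, 32                      # sequence length, input size, hidden size
xs = rng.normal(size=(T, d_in))               # one word vector per time step
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

h = np.zeros(d_h)
for x_t in xs:                                # the SAME W_xh, W_hh, b_h are reused at every step
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
print(h.shape)                                # (32,) hidden state after reading the whole sequence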

WORD EMBEDDINGS
Represent text as numeric vectors of fixed size:
- word embeddings: SGNS (word2vec), GloVe, ...
- feature/phrase/document embeddings
- character-level convolutional networks

Unsupervised pre-training helps develop natural abstractions. Sharing word embeddings across tasks in multi-task learning improves their performance in the absence of hand-engineered features.
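For example, the pre-trained Google News word2vec vectors mentioned later can be loaded into a lookup table roughly like this (a sketch using gensim; the file name is the usual distribution name and may differ locally):

from gensim.models import KeyedVectors

# load pre-trained 300-dimensional word2vec vectors (file name assumed; large download)
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

vec = w2v["stock"]                         # fixed-size numeric vector for one word
print(vec.shape)                           # (300,)
print(w2v.most_similar("stock", topn=3))   # nearest neighbours in the embedding space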

OUR APPROACH
PDTB-style end-to-end discourse parser:
- one deep learning architecture instead of multiple independently constructed components
- almost without any hand-engineered NLP knowledge
- Input: tokenized text documents (from the CoNLL 2015 shared task)
- Output: extracted PDTB-style discourse relations
  - connectives
  - arguments 1 and 2
  - discourse senses

KEY IDEAS
- unified end-to-end architecture
  - backpropagation as the one learning algorithm for all discourse parsing subtasks and related NLP tasks
- automatic learning of representations in hidden layers of deep learning architectures (bidirectional deep RNN/LSTM)
- shared intermediate representations
  - partially stacked on top of each other to benefit from each other's representations (see the sketch below)
- guided layer-wise multi-task learning
  - jointly learning all discourse parsing subtasks and related NLP tasks, including unsupervised pre-training
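A toy sketch of what partially stacked, shared representations might look like (Keras functional API on a current TensorFlow install; the layer sizes, the POS tagging auxiliary task, and the label set sizes are illustrative assumptions, not the proposal's exact architecture):

from tensorflow.keras import layers, models

vocab_size, emb_dim, seq_len = 10000, 100, 40
n_pos_tags, n_senses = 45, 21          # sizes of the two (hypothetical) per-token label sets

tokens = layers.Input(shape=(seq_len,), dtype="int32")
emb = layers.Embedding(vocab_size, emb_dim)(tokens)          # shared word-embedding layer

# lower shared layer: a "simpler" task (e.g. POS tagging) reads its labels from here
h1 = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(emb)
pos_out = layers.TimeDistributed(layers.Dense(n_pos_tags, activation="softmax"), name="pos")(h1)

# higher layer stacked on top of the shared one: discourse sense tagging reads from here
h2 = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(h1)
sense_out = layers.TimeDistributed(layers.Dense(n_senses, activation="softmax"), name="sense")(h2)

model = models.Model(inputs=tokens, outputs=[pos_out, sense_out])
model.compile(optimizer="adam",
              loss={"pos": "sparse_categorical_crossentropy",
                    "sense": "sparse_categorical_crossentropy"})
model.summary()

Training both heads jointly lets the lower task guide the initialization of the layers that the higher task builds on, which is the intuition behind the guided layer-wise setup.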

GUIDED LAYER-WISE MULTI-TASK LEARNING

PROGRESS
- technology
  - Python
  - Theano: fast tensor manipulation library
  - Keras: modular neural network library
- resources and inputs
  - pre-trained word2vec lookup table (on Google News)
  - tokenized text documents as input
  - POS tags of input tokens
- evaluation (from the CoNLL 2015 shared task)
  - performance in terms of precision/recall/F1 score (see the sketch below)
  - explicit connectives, argument 1, argument 2, and combined extraction, sense classification, overall
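For reference, precision/recall/F1 over sets of predicted vs. gold relations can be computed along these lines (a generic exact-match sketch; the official CoNLL 2015 scorer applies more detailed matching rules):

def precision_recall_f1(predicted, gold):
    """Exact-match scoring of two collections of items, e.g. (connective, arg1, arg2, sense) tuples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                      # correctly extracted relations
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# toy usage with relation ids standing in for full relations
print(precision_recall_f1(predicted={1, 2, 3, 5}, gold={1, 2, 4, 5, 6}))  # (0.75, 0.6, ≈0.667)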

COMPLICATION OR USEFUL?
Experiments with single-task learning using a bidirectional deep RNN for discourse sense tagging.
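A minimal single-task version of such a sense tagger might look roughly like this (Keras sketch; the vocabulary size, layer sizes, plain RNN cells, and per-token sense tag set are illustrative assumptions, not the reported experimental setup):

import numpy as np
from tensorflow.keras import layers, models

vocab_size, emb_dim, seq_len, n_sense_tags = 10000, 100, 40, 21

tokens = layers.Input(shape=(seq_len,), dtype="int32")
x = layers.Embedding(vocab_size, emb_dim)(tokens)                          # word-embedding lookup
x = layers.Bidirectional(layers.SimpleRNN(64, return_sequences=True))(x)   # lower bidirectional RNN layer
x = layers.Bidirectional(layers.SimpleRNN(64, return_sequences=True))(x)   # second layer -> "deep" RNN
tags = layers.TimeDistributed(layers.Dense(n_sense_tags, activation="softmax"))(x)  # one sense tag per token

model = models.Model(tokens, tags)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# toy training call on random data, just to show the expected input/output shapes
X = np.random.randint(0, vocab_size, size=(8, seq_len))
y = np.random.randint(0, n_sense_tags, size=(8, seq_len))
model.fit(X, y, epochs=1, verbose=0)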

SINGLE-TASK RESULTS
- long training time for randomly initialized weights
  - lower tasks improve initialization
- overfitting the training data
  - more tasks improve generalization

FUTURE EXPERIMENTS
- various discourse parsing subtasks
- various related NLP tasks (chunking, POS, NER, SRL, ...)
- different representation structures
- different activations, optimization methods, architectures
  - long short-term memory (LSTM)
  - neural Turing machines (NTM)

DOES IT MAKE SENSE?
I would like to hear your feedback and ideas for my thesis proposal.

THANK YOU
http://gw.tnode.com/deep-learning/acl2015-presentation/
Copyright 2015 gw0 [ http://gw.tnode.com/ ] <gw.2015@tnode.com>