Improving Neural Abstractive Text Summarization with Prior Knowledge

Improving Neural Abstractive Text Summarization with Prior Knowledge
Gaetano Rossiello, Pierpaolo Basile, Giovanni Semeraro, Marco Di Ciano and Gaetano Grasso
gaetano.rossiello@uniba.it
Department of Computer Science, University of Bari Aldo Moro, Italy
URANIA 16 - 1st Italian Workshop on Deep Understanding and Reasoning: A Challenge for Next-Generation Intelligent Agents
28 November 2016, AI*IA 16, Genoa, Italy

Text Summarization The goal of summarization is to produce a shorter version of a source text while preserving the meaning and the key contents of the original. A well-written summary can significantly reduce the amount of cognitive work needed to digest large amounts of text.

Information Overload Information overload is a problem in modern digital society caused by the explosion in the amount of information produced both on the World Wide Web and in enterprise environments.

Text Summarization - Approaches
Input: single-document or multi-document. Output: extractive summary, abstract, or headline.
Extractive Summarization: the generated summary is a selection of relevant sentences from the source text, in a copy-paste fashion.
Abstractive Summarization: the generated summary is a new cohesive text not necessarily present in the original source.

Extractive Summarization - Methods
Statistical methods: feature-based, machine learning, fuzzy logic, graph-based.
Distributional semantic methods: LSA (Latent Semantic Analysis), NMF (Non-Negative Matrix Factorization), Word2Vec.

Abstractive Summarization: a Challenging Task Abstractive summarization requires deep understanding and reasoning over the text: determining the explicit or implicit meaning of each element, such as words, phrases, sentences and paragraphs, and making inferences about their properties in order to generate new sentences which compose the summary (Norvig, P.: Inference in text understanding. AAAI, 1987).
Abstractive Example
Original: Russian defense minister Ivanov called Sunday for the creation of a joint front for combating global terrorism.
Summary: Russia calls for joint front against terrorism.

Deep Learning for Abstractive Text Summarization
Idea: cast the summarization task as a neural machine translation problem, where the models, trained on a large amount of data, learn the alignments between the input text and the target summary through an attention-based encoder-decoder paradigm.
Rush, A., et al.: A neural attention model for abstractive sentence summarization. EMNLP 2015.
Nallapati, R., et al.: Sequence-to-sequence RNNs for text summarization and beyond. CoNLL 2016.
Chopra, S., et al.: Abstractive sentence summarization with attentive recurrent neural networks. NAACL 2016.

Deep Learning for Abstractive Text Summarization
[Figure: the neural attention model of Rush, A., et al.: A neural attention model for abstractive sentence summarization. EMNLP 2015.]

Abstractive Summarization - Problem Formulation
Let us consider an original text x = {x_1, x_2, ..., x_n} and a summary y = {y_1, y_2, ..., y_m}, where n >> m and x_i, y_j ∈ V (V is the vocabulary).
A probabilistic perspective on the goal: the summarization problem consists in finding an output sequence y that maximizes the conditional probability of y given an input sequence x:
arg max_y P(y | x)
P(y | x) = P(y | x; θ) = ∏_{t=1}^{m} P(y_t | {y_1, ..., y_{t-1}}, x; θ)
where θ denotes a set of parameters learnt from a training set of source text and target summary pairs.
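To make the factorized objective concrete, here is a toy numeric illustration: the probability of a whole summary is the product of the per-step conditional probabilities, and training maximizes the corresponding log-likelihood. The per-token probabilities below are invented purely for the example.

```python
import numpy as np

# Hypothetical probabilities P(y_t | y_1..y_{t-1}, x; theta) that a model might
# assign to the gold tokens of a 4-word summary (values invented for illustration).
step_probs = np.array([0.60, 0.45, 0.80, 0.70])

seq_prob = np.prod(step_probs)               # P(y | x; theta), the factorized sequence probability
log_likelihood = np.sum(np.log(step_probs))  # the quantity maximized over the training pairs
print(seq_prob, log_likelihood)
```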

Recurrent Neural Networks A recurrent neural network (RNN) is a neural network model proposed in the 1980s for modelling time series. The structure of the network is similar to that of a feedforward neural network, with the distinction that it allows a recurrent hidden state whose activation at each time step depends on that of the previous time step (a cycle).
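A minimal numpy sketch of this recurrent hidden-state update; the dimensions, parameter names and tanh nonlinearity are illustrative assumptions, not taken from the presentation.

```python
import numpy as np

def elman_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman RNN step: the new hidden state depends on the current input
    and on the hidden state of the previous time step (the recurrent cycle)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy dimensions: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x_t in rng.normal(size=(5, 4)):   # a toy sequence of 5 time steps
    h = elman_step(x_t, h, W_xh, W_hh, b_h)
```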

Sequence to Sequence Learning The sequence-to-sequence learning problem can be modeled by RNNs using an encoder-decoder paradigm. The encoder is an RNN that reads one token at a time from the input source and returns a fixed-size vector representing the input text. The decoder is another RNN that generates words for the summary and is conditioned on the vector representation returned by the first network.
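A compact PyTorch sketch of this encoder-decoder setup, assuming GRU cells, greedy one-step decoding and illustrative dimensions; it is not the exact architecture of the cited papers.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len) token ids
        _, h_n = self.rnn(self.embed(src))
        return h_n                               # fixed-size representation of the input text

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_tokens, hidden):      # prev_tokens: (batch, 1); hidden: encoder state
        out, hidden = self.rnn(self.embed(prev_tokens), hidden)
        return self.out(out), hidden             # scores over the vocabulary for the next word

# Toy usage: encode a 6-token source and take one decoding step (token id 0 stands
# for a hypothetical <start> symbol).
enc, dec = Encoder(vocab_size=1000), Decoder(vocab_size=1000)
src = torch.randint(0, 1000, (1, 6))
state = enc(src)
scores, state = dec(torch.zeros(1, 1, dtype=torch.long), state)
```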

Abstractive Summarization and Sequence to Sequence
P(y_t | {y_1, ..., y_{t-1}}, x; θ) = g_θ(h_t, c)
h_t = g_θ(y_{t-1}, h_{t-1}, c)
where the context vector c is the output of the encoder and encodes the representation of the whole input source. g_θ is an RNN and can be modeled using an Elman RNN, an LSTM (Long Short-Term Memory) or a GRU (Gated Recurrent Unit). At time t the decoder RNN computes the probability of the word y_t given the last hidden state h_t and the context input c.
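The two equations can be read directly as one decoding step. The numpy sketch below (all weights, sizes and the tanh cell are assumptions made for illustration; an LSTM or GRU could take the place of the simple cell) computes h_t from y_{t-1}, h_{t-1} and the context c, and then a distribution over the vocabulary for y_t.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decoder_step(y_prev_emb, h_prev, c, W_y, W_h, W_c, b, W_o):
    """One decoder step: h_t = g(y_{t-1}, h_{t-1}, c), then P(y_t | y_<t, x) = softmax(W_o h_t)."""
    h_t = np.tanh(W_y @ y_prev_emb + W_h @ h_prev + W_c @ c + b)
    p_t = softmax(W_o @ h_t)      # distribution over the vocabulary at time t
    return h_t, p_t

# Toy sizes: 8-dim word embeddings, 16-dim hidden state and context, vocabulary of 100 words.
rng = np.random.default_rng(0)
W_y, W_h, W_c = rng.normal(size=(16, 8)), rng.normal(size=(16, 16)), rng.normal(size=(16, 16))
b, W_o = np.zeros(16), rng.normal(size=(100, 16))
h_t, p_t = decoder_step(rng.normal(size=8), np.zeros(16), rng.normal(size=16), W_y, W_h, W_c, b, W_o)
```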

Limits of the State-of-the-Art Neural Models The proposed neural attention-based models for abstractive summarization are still at an early stage, so they show some limitations: problems in distinguishing rare and unknown words, and grammatical errors in the generated summaries.
Example Suppose that neither of the two tokens 10 and Genoa belongs to the vocabulary; then the model cannot distinguish the probabilities of the two sentences:
The airport is about 10 kilometers.
The airport is about Genoa kilometers.

Infuse Prior Knowledge into Neural Networks
Our idea: infuse prior knowledge, such as linguistic features, into RNNs in order to overcome these limits.
Motivation:
The/DT airport/NN is/VBZ about/IN ?/CD kilometers/NNS
where CD is the Part-of-Speech (POS) tag that identifies a cardinal number. Thus, 10 is the unknown token with the highest probability, because it is tagged as CD. By introducing information about the syntactic role of each word, the neural network can learn the right collocation of words belonging to a certain part-of-speech class.
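For illustration, an off-the-shelf POS tagger assigns CD to the numeric token in the example sentence. The snippet below uses NLTK, assuming its tokenizer and tagger resources have been downloaded; it is only one possible choice of tagger, not necessarily the one used by the authors.

```python
import nltk

# Requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
sentence = "The airport is about 10 kilometers"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# Expected output (exact tags may vary slightly by tagger version):
# [('The', 'DT'), ('airport', 'NN'), ('is', 'VBZ'), ('about', 'IN'), ('10', 'CD'), ('kilometers', 'NNS')]
```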

Infuse Prior Knowledge into Neural Networks Preliminary approach: combine hand-crafted linguistic features and embeddings as input vectors into RNNs, and substitute the softmax layer of the neural network with a log-linear model.
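A minimal PyTorch sketch of the first point, assuming a GRU encoder: word embeddings are concatenated with an embedding of the hand-crafted POS feature before entering the recurrent layer. Module names and dimensions are illustrative, this is not the authors' implementation, and the log-linear output layer is not shown.

```python
import torch
import torch.nn as nn

class WordPosEncoder(nn.Module):
    def __init__(self, vocab_size, num_pos_tags, word_dim=64, pos_dim=16, hid_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb = nn.Embedding(num_pos_tags, pos_dim)   # embedding of the hand-crafted POS feature
        self.rnn = nn.GRU(word_dim + pos_dim, hid_dim, batch_first=True)

    def forward(self, word_ids, pos_ids):                    # both: (batch, seq_len) ids
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], dim=-1)
        _, h_n = self.rnn(x)
        return h_n                                           # representation of the enriched input

# Toy usage: a 6-token sentence together with its POS tag ids.
enc = WordPosEncoder(vocab_size=1000, num_pos_tags=45)
words = torch.randint(0, 1000, (1, 6))
tags = torch.randint(0, 45, (1, 6))
state = enc(words, tags)
```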

Evaluation Plan - Dataset We plan to evaluate our models on gold-standard datasets for the summarization task:
- DUC (Document Understanding Conference) 2002-2007: http://duc.nist.gov/
- TAC (Text Analysis Conference) 2008-2011: http://tac.nist.gov/data/index.html
- Gigaword: https://catalog.ldc.upenn.edu/ldc2012t21
- CNN/DailyMail: https://github.com/deepmind/rc-data
- Cornell University Library: https://arxiv.org/
- Local government documents: made available by InnovaPuglia S.p.A.

Evaluation Plan - Metric ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics compare an automatically produced summary against a reference summary or a set of human-produced reference summaries (Lin, Chin-Yew: ROUGE: a Package for Automatic Evaluation of Summaries. WAS 2004).
ROUGE-N: N-gram based co-occurrence statistics.
ROUGE-L: Longest Common Subsequence (LCS) based statistics.
ROUGE-N(X) = ( Σ_{S ∈ RefSummaries} Σ_{gram_n ∈ S} count_match(gram_n, X) ) / ( Σ_{S ∈ RefSummaries} Σ_{gram_n ∈ S} count(gram_n) )
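A simplified Python sketch of the ROUGE-N recall formula above; the official ROUGE package additionally supports stemming, stopword removal and several other variants, so this is only an illustration of the counting scheme.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, references, n=1):
    """ROUGE-N recall: matched n-grams over the total number of n-grams
    in the reference summaries."""
    match, total = 0, 0
    cand_counts = Counter(ngrams(candidate, n))
    for ref in references:
        ref_counts = Counter(ngrams(ref, n))
        match += sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return match / total if total else 0.0

# Example usage with pre-tokenized summaries:
cand = "russia calls for joint front against terrorism".split()
refs = ["russia urges joint front against global terrorism".split()]
print(rouge_n(cand, refs, n=1))
```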

Future Works
Evaluate the proposed approach by comparing it with the state-of-the-art models.
Integrate relational semantic knowledge into RNNs in order to jointly learn word and knowledge embeddings by exploiting knowledge bases and lexical thesauri.
Generate abstractive summaries from whole documents or multiple documents.