Natural Language Processing SoSe Summarization (based on the book by Jurafsky and Martin 2009)

Natural Language Processing SoSe 2015 Summarization Dr. Mariana Neves July 6th, 2014 (based on the book by Jurafsky and Martin 2009)

Outline
- Task
- Single-document summarization
- Multi-document summarization
- Query-focused summarization
- Evaluation

Summarization
- Half-way between:
  - Information retrieval (entire documents)
  - Question answering (factoid answers)
- It is the process of distilling the most important information from a text to produce an abridged version for a particular task and user (Jurafsky and Martin 2009)

Summarization
- Kinds of summaries:
  - Outlines of a document
  - Abstracts of a scientific article
  - Headlines of news articles
  - Snippets summarizing a Web page on a search engine results page
  - Action items or other summaries of a (spoken) business meeting
  - Summaries of email threads
  - Answers to complex questions (multi-document)

Summarization dimensions
- Single-document: headlines of news articles, abstracts of scientific publications
- Multiple-document: series of news stories about the same event, emails from a thread

Summarization dimensions
- Generic: focuses on the important information in the document(s)
- Query-focused: question answering

Abstract vs. Extract
- Extract: combination of phrases and sentences from the document(s)
- Abstract: uses different words to describe the content of the document(s)
- Most current summarizers are extractive (easier)

Abstract vs. Extract (figures taken from Mani 2001)

Architecture of summarization systems
- Content selection: usually sentences and clauses
- Information ordering: order and structure the extracted units
- Sentence realization: clean up to assure fluency

Single-document summarization
- Content selection
  - Choose sentences
  - Binary classification task: important (extract-worthy) vs. unimportant (not extract-worthy)
- Information ordering
  - Sentences are kept in their original order in the document
- Sentence realization
  - Remove non-essential phrases from the sentences
  - Fuse sentences into a single one

Unsupervised content selection
- Select sentences with more salient or informative words
- Saliency: weighting schemes instead of raw word frequencies
  - Tf-idf
  - Topic signature: set of salient or signature terms with saliency scores greater than a threshold θ
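The selection idea above can be sketched as follows. This is a minimal illustration, not the method from the lecture: it assumes tf-idf as the saliency score, treats each sentence as a "document" when computing idf (a background corpus would normally be used), and scores a sentence by the share of its words that fall in the topic signature. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter

def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()").lower() for w in sentence.split())
    return [w for w in words if w]

def tfidf_weights(sentences):
    """Return one {term: tf-idf} dict per sentence (assumption: sentences act as documents)."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]

def topic_signature(sentences, theta):
    """Terms whose saliency score exceeds the threshold theta in some sentence."""
    return {t for weights in tfidf_weights(sentences)
            for t, w in weights.items() if w > theta}

def score_sentences(sentences, theta):
    """Score each sentence by the fraction of its words that are signature terms."""
    signature = topic_signature(sentences, theta)
    scores = []
    for s in sentences:
        tokens = tokenize(s)
        hits = sum(1 for t in tokens if t in signature)
        scores.append(hits / len(tokens) if tokens else 0.0)
    return scores

sentences = ["The cat sat.", "The dog sat.", "The cat purred loudly."]
signature = topic_signature(sentences, theta=1.0)
scores = score_sentences(sentences, theta=1.0)
```

Words that occur in every sentence (here "the") get an idf of zero and never enter the signature, which is the intended effect of weighting instead of counting.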

Centroid-based summarization
- The set of signature terms is treated as a pseudo-sentence that is the centroid of all sentences in the document
- We look for sentences that are close to this centroid sentence
- Compute the distance between each candidate sentence x and each other sentence y
- Choose the sentences that are on average closest to the other sentences:

$$\text{centrality}(x) = \frac{1}{K} \sum_{y} \text{tf-idf-cosine}(x, y)$$
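A small sketch of the centrality score, under stated assumptions: K is taken to be the number of sentences in the document, idf is computed over the sentences themselves, and tokenization is plain whitespace splitting. The helper names are illustrative only.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """One sparse tf-idf vector per sentence (assumption: idf over the sentences)."""
    token_lists = [s.lower().split() for s in sentences]
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def centrality(sentences):
    """centrality(x) = (1/K) * sum over the other sentences y of cosine(x, y)."""
    vectors = tfidf_vectors(sentences)
    k = len(vectors)  # assumption: K is the number of sentences
    return [
        sum(cosine(x, y) for j, y in enumerate(vectors) if j != i) / k
        for i, x in enumerate(vectors)
    ]

scores = centrality([
    "cats chase mice",
    "cats chase birds",
    "stocks fell sharply today",
])
```

The two sentences about cats share terms and receive equal, positive centrality; the unrelated third sentence scores zero and would not be extracted.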

Rhetorical parsing
- Introduces more sophisticated discourse knowledge
- Applies a discourse parser to compute coherence relations between the discourse units
(figure taken from Marcu 2000)

Supervised content selection
- Effectively combines various features from the text
- Training data: documents and their respective summaries (extracts of sentences)
- Classification task: 1 (sentence present in the extract); 0 (not present)

Supervised content selection: features
- Position of the sentence in the text: title, first sentence of paragraph 2, first sentence of paragraph 3, final sentence
- Cue phrases: "In summary...", "In conclusion...", "This paper..."
- Word informativeness: topic signature

Supervised content selection: features
- Sentence length
  - Long sentences rather than short ones
  - Binary feature based on a cutoff (e.g., 5 words)
- Cohesion
  - Sentences that contain more terms from a lexical chain (series of related words) are extract-worthy
  - Can also be computed using graph-based methods (e.g., PageRank)
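Some of the features listed above can be sketched as a feature extractor that a binary classifier could consume. This is an illustrative sketch only: the input layout (a list of non-empty paragraphs, each a list of sentences), the feature names, and the substring matching of cue phrases are assumptions; the cue phrases and the 5-word cutoff come from the slides.

```python
CUE_PHRASES = ("in summary", "in conclusion", "this paper")

def sentence_features(paragraphs, length_cutoff=5):
    """Return one feature dict per sentence.

    paragraphs: list of non-empty paragraphs, each a list of sentence strings
    (hypothetical input layout chosen for this sketch).
    """
    last = (len(paragraphs) - 1, len(paragraphs[-1]) - 1)
    features = []
    for p, paragraph in enumerate(paragraphs):
        for s, sentence in enumerate(paragraph):
            lowered = sentence.lower()
            features.append({
                "first_in_paragraph": s == 0,
                "final_sentence": (p, s) == last,
                "has_cue_phrase": any(cue in lowered for cue in CUE_PHRASES),
                # Binary length feature: longer than the cutoff or not.
                "long_enough": len(sentence.split()) > length_cutoff,
            })
    return features

features = sentence_features([
    ["This paper studies summarization.", "Short one."],
    ["In conclusion, extraction works well in practice."],
])
```

Informativeness and cohesion features would be added to the same dictionaries before training; they are left out here to keep the sketch short.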

Supervised content selection
- Using abstracts of documents as training data
- Need to align sentences in the abstracts to the document text:
  - Longest common subsequence of non-stopwords
  - Minimum edit distance
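One way the longest-common-subsequence alignment might look is sketched below. The stopword list is a tiny illustrative stand-in, and the labeling rule (mark the single best-matching document sentence for each abstract sentence as extract-worthy) is an assumption made for this example.

```python
# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are", "was", "for", "on"}

def content_words(sentence):
    """Lowercased tokens with punctuation stripped and stopwords removed."""
    words = (t.strip(".,;:!?").lower() for t in sentence.split())
    return [w for w in words if w and w not in STOPWORDS]

def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def align(abstract_sentences, document_sentences):
    """Return a 0/1 label per document sentence (1 = aligned to the abstract)."""
    labels = [0] * len(document_sentences)
    doc_words = [content_words(d) for d in document_sentences]
    for sentence in abstract_sentences:
        words = content_words(sentence)
        lengths = [lcs_length(words, d) for d in doc_words]
        best = max(range(len(lengths)), key=lengths.__getitem__)
        if lengths[best] > 0:
            labels[best] = 1
    return labels

document = [
    "The parser was trained on news text.",
    "Results are reported in the appendix.",
    "Accuracy improved by two points.",
]
labels = align(["A parser trained on news improved accuracy."], document)
```

The resulting labels are the 1/0 targets for the binary classification task.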

Sentence realization
- Sentence compression or sentence simplification
- Running a syntactic parser and pruning some phrases
- Examples of phrases to prune:
  - Apposition: Barry Goldwater, the junior senator from Arizona, received the Republican nomination in 1964
  - Attribution clauses: Rebels agreed to talks with governments, international observers said Tuesday
  - Prepositional phrases without named entities
  - Initial adverbials: For example, On the other hand, At this point, etc.
- Also supervised machine learning
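Most of these pruning rules need a syntactic parse, but the initial-adverbial rule can be approximated with a pattern match. The sketch below covers only that one rule, with a hand-written phrase list taken from the slide; it is a simplification, not a parser-based compressor.

```python
import re

INITIAL_ADVERBIALS = ("for example", "on the other hand", "at this point")

# Match one of the adverbials at the start of the sentence, plus an optional comma.
_PATTERN = re.compile(
    r"^(?:%s),?\s+" % "|".join(re.escape(a) for a in INITIAL_ADVERBIALS),
    flags=re.IGNORECASE,
)

def strip_initial_adverbial(sentence):
    """Drop a leading adverbial phrase and re-capitalize the remainder."""
    stripped = _PATTERN.sub("", sentence)
    if stripped and stripped != sentence:
        stripped = stripped[0].upper() + stripped[1:]
    return stripped

compressed = strip_initial_adverbial("For example, the model overfits.")
unchanged = strip_initial_adverbial("The model overfits.")
```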

Multi-document summarization
- Applications
  - Summarizing Web pages for a particular event in the news
  - Finding answers to complex questions
- Architecture
  - Content selection
  - Information ordering
  - Sentence realization
- Unsupervised methods are preferred over supervised ones
  - Not much training data is available

Content selection (Multi-doc)
- Redundancy of information
  - Summaries should not consist of identical or similar sentences
  - Calculate the redundancy factor between newly extracted sentences and the currently selected sentences

Content selection (Multi-doc)
- Maximal Marginal Relevance (MMR)

$$\text{MMR penalization factor}(s) = \lambda \max_{s_i \in \text{Summary}} \text{Similarity}(s, s_i)$$

- λ is a weight that can be tuned
- Similarity is some similarity function
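A greedy selection loop using this penalization factor could look like the sketch below. Several pieces are assumptions made for illustration: Jaccard word overlap stands in for "some similarity function", the relevance scores are supplied by the caller, and the penalty is subtracted from the relevance score when ranking candidates.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (stand-in similarity function)."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0

def mmr_penalty(sentence, summary, lam, similarity=jaccard):
    """lambda * max similarity to any sentence already in the summary."""
    return lam * max((similarity(sentence, s) for s in summary), default=0.0)

def select(candidates, relevance, k, lam=0.7, similarity=jaccard):
    """Greedily pick k sentences, penalizing redundancy with the summary so far.

    candidates: list of sentences; relevance: parallel list of scores
    (hypothetical inputs, e.g. from a centrality or tf-idf scorer).
    """
    summary = []
    remaining = list(range(len(candidates)))
    while remaining and len(summary) < k:
        best = max(
            remaining,
            key=lambda i: relevance[i] - mmr_penalty(candidates[i], summary, lam, similarity),
        )
        summary.append(candidates[best])
        remaining.remove(best)
    return summary

candidates = [
    "the quake hit the city",
    "the quake hit the city today",
    "rescue teams arrived",
]
relevance = [1.0, 0.9, 0.5]
with_penalty = select(candidates, relevance, k=2, lam=1.0)
without_penalty = select(candidates, relevance, k=2, lam=0.0)
```

With λ = 0 the two near-identical sentences are both selected; with λ = 1 the second pick switches to the less relevant but non-redundant sentence.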

Content selection (Multi-doc)
- Clustering algorithm
  - Groups sentences into clusters of related sentences
  - Select a single (centroid) sentence from each cluster
- Sentence simplification or compression in this step
  - Produce many variations of the original sentence
  - Let the clustering or MMR select the best one

Information ordering (Multi-doc)
- Concatenate the extracted sentences in a coherent way
- Chronological ordering
  - If the date of the original document/article is available (e.g., news)
  - But the result usually lacks cohesion
- Coherence
  - Coherence relations between sentences
  - Cohesion and lexical chains (local cohesion)

Information ordering (Multi-doc)
- Lexical cohesion
  - Order sentences next to sentences that contain similar words
  - tf-idf cosine similarity between pairs of sentences
  - Minimize the distance between neighboring sentences
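A greedy nearest-neighbor pass is one simple way to place similar sentences next to each other. The sketch below makes two simplifications that are not from the slides: it uses raw word counts instead of tf-idf weights in the cosine, and it fixes the first input sentence as the starting point.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity over raw word counts (simplification: no idf weighting)."""
    u, v = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(c * v[t] for t, c in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def order_by_cohesion(sentences):
    """Start from the first sentence; repeatedly append the most similar remaining one."""
    if not sentences:
        return []
    ordered = [sentences[0]]
    remaining = list(sentences[1:])
    while remaining:
        nxt = max(remaining, key=lambda s: cosine(ordered[-1], s))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

ordered = order_by_cohesion([
    "storm hits coast",
    "markets fall sharply",
    "coast towns flooded",
    "flooded towns evacuated",
])
```

The greedy pass is not guaranteed to minimize the total distance between neighbors; it only makes each local transition as cohesive as possible.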

Information ordering (Multi-doc)
- Centering
  - Salient entities
  - Syntactic realization of the focus (i.e., subject or object)
  - Transitions between realizations

Information ordering (Multi-doc) (figure taken from Barzilay and Lapata 2005)

Information ordering (Multi-doc)
- Given a coherence score for pairs or sequences of sentences
- Problem: find the optimal ordering of sentences
  - NP-complete
  - But there are good approximation methods: Althaus et al. 2004, Knight 1999, Cohen et al. 1999, Brew 1992
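To make the cost of the exact problem concrete, the sketch below finds the optimal ordering by scoring every permutation. It is only feasible for a handful of items, since the number of permutations grows factorially, which is why approximation methods are used in practice. The `pair_score` function is a hypothetical placeholder for a pairwise coherence score.

```python
from itertools import permutations

def best_ordering(items, pair_score):
    """Exhaustively search all orderings; score = sum of adjacent pair scores."""
    return max(
        permutations(items),
        key=lambda order: sum(pair_score(a, b) for a, b in zip(order, order[1:])),
    )

# Toy coherence score (assumption): a transition is coherent when b directly follows a.
def toy_score(a, b):
    return 1.0 if b == a + 1 else 0.0

order = best_ordering([3, 1, 2], toy_score)
```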

Sentence realization (Multi-doc)
- Checking further for coherence
- Longer or more descriptive phrases should come before short, reduced or abbreviated forms
  - Example: "U.S. President George W. Bush" before "Bush"
- Co-reference resolution algorithm
- Rewrite and cleanup rules

Sentence realization (Multi-doc) (figure taken from Nenkova and McKeown 2003)

Sentence realization (Multi-doc)
- Sentence fusion
  - Parse each sentence
  - Align the parses to find common information
  - Build a fusion structure with the overlapping information
  - Create a new fused sentence

Sentence realization (Multi-doc) (figure taken from Barzilay and McKeown 2005)

Query-focused summarization
- Question answering
  - Longer, descriptive, more informative answers

Query-focused summarization
- Example (BioASQ training data):
  - Question: "What is the function of the mammalian gene Irg1?"
  - Answer: "Human IRG1 and mouse Irg1 mediates antiviral and antimicrobial immune responses, without its exact role having been elucidated. Irg1 has been suggested to have a role in apoptosis and to play a significant role in embryonic implantation. Irg1 is reported as the mammalian ortholog of methylcitrate dehydratase."

Query-focused summarization
- Content selection
  - Adapt multi-doc content selection to rank sentences based on relevance to the query
  - Overlapping words between query and sentences
  - Cosine similarity between query and sentence
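Both relevance signals can be sketched in a few lines. The example assumes whitespace tokenization and raw word counts (no tf-idf weighting, no stopword removal), and the sentences are made up for illustration.

```python
import math
from collections import Counter

def word_overlap(query, sentence):
    """Number of distinct words shared by query and sentence."""
    return len(set(query.lower().split()) & set(sentence.lower().split()))

def cosine(query, sentence):
    """Cosine similarity over raw word counts."""
    u, v = Counter(query.lower().split()), Counter(sentence.lower().split())
    dot = sum(c * v[t] for t, c in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def rank_by_query(query, sentences):
    """Sentences sorted from most to least similar to the query."""
    return sorted(sentences, key=lambda s: cosine(query, s), reverse=True)

query = "function of gene Irg1"
sentences = [
    "Irg1 has a role in apoptosis",
    "The weather was mild",
    "The gene Irg1 function is unclear",
]
ranked = rank_by_query(query, sentences)
```

In a full system this query relevance would be combined with the redundancy handling used for multi-document content selection.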

Query-focused summarization
- Content selection
  - Build top-down expectations for each topic
  - Biography: dates, nationalities, education, etc.
  - Drug efficacy: population, problem/disease, intervention, outcome, side-effects, etc.

Query-focused summarization
- Content selection
  - Use of templates
  - Example (Biography): <NAME> is <WHY_FAMOUS>. She/He was born on <BIRTH_DATE> in <BIRTH_LOCATION>. She/He <EDUCATION>. <DESCRIPTIVE_SENTENCE> <DESCRIPTIVE_SENTENCE>...
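Filling such a template is plain string substitution once the slots have been extracted. The sketch below mirrors the biography template with Python format fields; the `PRONOUN` slot is an addition made here to replace "She/He", and the slot values describe a fictional person used only for illustration.

```python
BIOGRAPHY_TEMPLATE = (
    "{NAME} is {WHY_FAMOUS}. {PRONOUN} was born on {BIRTH_DATE} "
    "in {BIRTH_LOCATION}. {PRONOUN} {EDUCATION}."
)

def fill_biography(slots, descriptive_sentences=()):
    """Fill the template; raises KeyError if a required slot is missing."""
    text = BIOGRAPHY_TEMPLATE.format(**slots)
    return " ".join([text, *descriptive_sentences])

# Fictional slot values, for illustration only.
slots = {
    "NAME": "Jane Doe",
    "WHY_FAMOUS": "a fictional chemist",
    "PRONOUN": "She",
    "BIRTH_DATE": "1 January 1900",
    "BIRTH_LOCATION": "Springfield",
    "EDUCATION": "studied chemistry",
}
summary = fill_biography(slots, ["Her work is used here as an example."])
```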

Evaluation
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
  - Measures the amount of overlapping N-grams between automatic and human-generated summaries
  - ROUGE-1 (unigram), ROUGE-2 (bigram), etc.

$$\text{ROUGE-2} = \frac{\sum_{S \in \text{Summaries}} \sum_{\text{bigram} \in S} \text{Count}_{\text{match}}(\text{bigram})}{\sum_{S \in \text{Summaries}} \sum_{\text{bigram} \in S} \text{Count}(\text{bigram})}$$
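The ROUGE-N ratio can be computed directly from n-gram counts. The sketch below assumes whitespace tokenization and lowercasing, treats the summaries in the sums as the human reference summaries, and clips each matched count to the number of times the n-gram occurs in the automatic summary. It is a simplified illustration, not the official ROUGE toolkit.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n=2):
    """Matched reference n-grams divided by total reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for reference in references:
        ref = ngrams(reference.lower().split(), n)
        total += sum(ref.values())
        # Clip: an n-gram matches at most as often as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / total if total else 0.0

candidate = "the cat sat on the mat"
references = ["the cat lay on the mat"]
rouge_2 = rouge_n(candidate, references, n=2)
rouge_1 = rouge_n(candidate, references, n=1)
```

Here three of the five reference bigrams and five of the six reference unigrams are found in the candidate.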

Evaluation
- ROUGE
  - Recall-oriented measure
  - ROUGE-L: longest common subsequence
  - ROUGE-S, ROUGE-SU: skip bigrams (pairs of words in a certain order, allowing any number of words between them)
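The two variants can be sketched as recall-only scores against a single reference. This is a simplification: the function names are illustrative, skip bigrams are treated as a set (repeated pairs are not counted), and the precision and F-measure combinations are left out.

```python
from itertools import combinations

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    c, r = candidate.lower().split(), reference.lower().split()
    return lcs_length(c, r) / len(r) if r else 0.0

def skip_bigrams(tokens):
    """All in-order word pairs, with any number of words in between."""
    return set(combinations(tokens, 2))

def rouge_s_recall(candidate, reference):
    """Shared skip bigrams divided by the reference skip bigrams."""
    c = skip_bigrams(candidate.lower().split())
    r = skip_bigrams(reference.lower().split())
    return len(c & r) / len(r) if r else 0.0

reference = "police killed the gunman"
good_l = rouge_l_recall("police kill the gunman", reference)
bad_l = rouge_l_recall("the gunman kill police", reference)
good_s = rouge_s_recall("police kill the gunman", reference)
bad_s = rouge_s_recall("the gunman kill police", reference)
```

Both candidates contain the same words, so ROUGE-1 cannot tell them apart; the order-sensitive variants score the first one higher.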

Further reading
- Speech and Language Processing, Chapters 23.3-23.8