Summarization Machine Translation

Summarization
Text summarization is the process of distilling the most important information from a text to produce an abridged version for a particular task and user (definition adapted from Mani and Maybury 1999).
Types of summaries in current research:
- Outlines or abstracts of any document, article, etc.
- Snippets summarizing a Web page or a search engine results page
- Action items or other summaries of a business meeting
- Summaries of email threads
- Simplified text produced by compressing sentences

Single vs. Multiple Documents
- Single-document summarization: given a single document, produce an abstract, outline, or headline
- Multiple-document summarization: given a group of documents, produce a gist of the content and create a cohesive answer that combines information from each document, for example:
  - a series of news stories on the same event
  - a set of web pages about some topic or question

Extractive vs. Abstractive
- Extractive summarization: create the summary from phrases or sentences in the source document(s)
- Abstractive summarization: express the ideas in the source documents using (at least in part) different words

Typical approaches to the general problem
Current systems mostly perform extraction rather than true rephrasing. The task is usually split into three stages:
- Content selection: identify the sentences or clauses to extract
- Information ordering: decide how to order the selected units
- Sentence realization: clean up the extracted units so that they are fluent in their new context, e.g. by replacing pronouns or other references left dangling
[Figure: summarization pipeline diagram with the labels Document, Sentence Segmentation, All sentences from documents, Sentence Extraction, Extracted sentences, Information Ordering, Sentence Realization, Sentence Simplification, Summary, and Content Selection]

Content Selection
- A simple approach selects sentences that contain more informative words, with saliency defined from a topic signature of the document
- Centroid-based summarization uses log-likelihood ratios for words, comparing the probability of observing a word in the input with the probability of observing it in a background corpus
- Other centrality methods rank the sentences according to a centrality score
- Methods based on rhetorical parsing use coherence relations to identify satellite and nucleus sentences
- Machine learning methods use features based on position, cue phrases, word informativeness, sentence length, and cohesion (computing lexical chains of the document)
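The word-informativeness idea above can be sketched in a few lines. This is a minimal illustration, not the method of any cited system: it replaces the log-likelihood ratio test with a simpler add-one-smoothed log-ratio of input versus background relative frequency, and the function names (`tokenize`, `word_weights`, `rank_sentences`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_weights(input_tokens, background_tokens):
    """Smoothed log-ratio of input vs. background relative frequency.

    Assumption: a simplified stand-in for the log-likelihood ratio test.
    """
    inp, bg = Counter(input_tokens), Counter(background_tokens)
    vocab = set(inp) | set(bg)
    n_inp = len(input_tokens) + len(vocab)  # add-one smoothing denominators
    n_bg = len(background_tokens) + len(vocab)
    return {
        w: math.log(((inp[w] + 1) / n_inp) / ((bg[w] + 1) / n_bg))
        for w in inp
    }


def rank_sentences(sentences, background_text):
    """Return the sentences sorted by mean word weight, most salient first."""
    input_tokens = [t for s in sentences for t in tokenize(s)]
    weights = word_weights(input_tokens, tokenize(background_text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(weights[t] for t in tokens) / len(tokens) if tokens else 0.0

    return sorted(sentences, key=score, reverse=True)
```

Averaging over the sentence length keeps long sentences from winning purely by containing more words; a real system would also drop stop words and combine this score with position and cue-phrase features.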

Information Ordering
- Simplest: keep the document ordering
- Chronological ordering: order sentences by the date of the document (for summarizing news) (Barzilay, Elhadad, and McKeown 2002)
- Coherence:
  - Choose orderings that make neighboring sentences similar (by cosine)
  - Choose orderings in which neighboring sentences discuss the same entity (Barzilay and Lapata 2007)
- Topical ordering: learn the ordering of topics in the source documents
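The cosine-based coherence criterion can be illustrated with a greedy ordering. This is a sketch under stated assumptions, not the procedure from the cited papers: sentences are represented as raw bag-of-words counts, the first sentence is fixed as the starting point, and each next sentence is the remaining one most similar to the sentence just placed. The names `bow`, `cosine`, and `greedy_order` are hypothetical.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Bag-of-words count vector for one sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_order(sentences):
    """Order sentences so that each one is similar to its predecessor.

    Assumption: the first sentence stays first; ties go to the sentence
    that appeared earlier in the input.
    """
    if not sentences:
        return []
    vectors = [bow(s) for s in sentences]
    order = [0]
    remaining = set(range(1, len(sentences)))
    while remaining:
        last = vectors[order[-1]]
        nxt = max(sorted(remaining), key=lambda i: cosine(last, vectors[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return [sentences[i] for i in order]
```

A greedy pass only optimizes each adjacent pair locally; searching over all orderings for the best total similarity is more faithful to the criterion but grows factorially with the number of sentences.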

Simplifying Sentences
Zajic et al. (2007), Conroy et al. (2006), Vanderwende et al. (2007)
Simplest method: parse the sentences and use rules to decide which modifiers to prune (more recently, a wide variety of machine-learning methods). Candidates for pruning:
- Appositives: Rajam, 28, an artist who was living at the time in Philadelphia, found the inspiration in the back of city magazines.
- Attribution clauses: Rebels agreed to talks with government officials, international observers said Tuesday.
- PPs without named entities: The commercial fishing restrictions in Washington will not be lifted unless the salmon population increases [PP to a sustainable number]
- Initial adverbials: For example, On the other hand, As a matter of fact, At this point
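As a rough illustration of rule-based pruning, the sketch below handles only the last category, initial adverbials, and does so with a fixed phrase list and a regular expression rather than a parse. The phrase list comes from the slide; the function name `prune_initial_adverbial` and the requirement that the adverbial be followed by a comma are assumptions made for this example.

```python
import re

# Initial adverbials listed on the slide. A real system would work from a
# parse tree and a much larger inventory.
INITIAL_ADVERBIALS = (
    "for example",
    "on the other hand",
    "as a matter of fact",
    "at this point",
)

# Match one listed adverbial at the start of the sentence, followed by a comma.
_PATTERN = re.compile(
    r"^\s*(?:" + "|".join(re.escape(a) for a in INITIAL_ADVERBIALS) + r")\s*,\s*",
    re.IGNORECASE,
)


def prune_initial_adverbial(sentence):
    """Remove a leading adverbial and re-capitalize what remains."""
    pruned = _PATTERN.sub("", sentence, count=1)
    if pruned != sentence and pruned:
        pruned = pruned[0].upper() + pruned[1:]
    return pruned
```

For instance, `prune_initial_adverbial("For example, the salmon population increased.")` returns `"The salmon population increased."`, while a sentence without a listed adverbial is returned unchanged.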

Summarization Evaluation
- Extrinsic (task-based) evaluation: humans rate the summaries according to how well the summaries enable them to perform a specific task
- Intrinsic (task-independent) evaluation:
  - Human judgments to rate the summaries
  - ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
    - Humans generate summaries for a document collection
    - System-generated summaries are rated according to how close they come to the human-generated summaries
    - Measures have included unigram overlap, bigram overlap, and longest common subsequence
  - Pyramid method: humans identify units of meaning, and then an overlap measure is computed
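The unigram and bigram overlap measures can be made concrete with a small ROUGE-N recall function. This is a minimal sketch, assuming whitespace tokenization and lowercasing; the names `ngrams` and `rouge_n_recall` are hypothetical and this is not the official ROUGE toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Fraction of reference n-grams that also occur in the candidate.

    Matches are clipped: an n-gram counts at most as often as it
    appears in the candidate summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
    return matched / total if total else 0.0
```

The measure is recall-oriented because the denominator is the number of n-grams in the human summaries, so a system is rewarded for covering what the humans wrote rather than for being concise.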

Summarization for Question-Answering: Snippets
- Create snippets summarizing a web page for a query
- Google: 156 characters (about 26 words) plus title and link

Machine Translation (MT)
Translating text from one language to another.

Machine Translation
- Translating text from one language to another is challenging even for humans, who must try to fully capture the style and nuanced meaning of the original
- While research focuses on producing fully automatic, high-quality translation, there are many tasks for which a rough translation is sufficient
- The differences between languages include systematic differences that can be modeled in some way, and idiosyncratic and lexical differences that must be dealt with one by one

Why MT is hard
Given the Japanese phrase "fukaku hansei shite orimasu":
- If it is translated into English as "we apologize", the translation is not faithful to the original meaning
- If it is translated as "we are deeply reflecting (on our past behavior, and what we did wrong, and how to avoid the problem next time)", the translation is not fluent
Example from the Jurafsky and Martin text.

Differences between languages: morphological differences
- Number of morphemes per word
  - Isolating languages (Vietnamese, Cantonese): each word has one morpheme
  - Polysynthetic languages (Eskimo): a single word has many morphemes, corresponding to a complete sentence
- Degree to which morphemes are segmentable
  - Agglutinative languages (Turkish): morphemes have clean boundaries
  - Fusional languages (Russian): a single affix may conflate multiple morphemes

Differences between languages: syntactic differences
- Basic word order of verbs, subjects, and objects
  - SVO: English, Mandarin, French, German
  - SOV: Hindi, Japanese
  - VSO: Classical Arabic and Biblical Hebrew
- Head-marking and dependent-marking languages: the relation between a dependent and its head is marked either on the head or on the dependent
  - English marks the possessive on the dependent: the man's house
  - Hungarian marks the possessive on the head noun: (Hungarian equivalent of:) the man house-his
- Direction of motion with respect to the verb
  - English marks direction on the particle: the bottle floated out
  - Spanish marks direction on the verb: la botella salió flotando
- Grammatical constraints on matching gender-marked words
- Many others...

Differences between languages: semantic differences
- Lexical gap: one language doesn't have a word for a concept in another
- Differences in the way that conceptual space is divided up for different words
[Figure: the complex overlap between English leg, foot, etc. and various French translations (Jurafsky & Martin, Figure 21.2)]
  French     English senses covered
  étape      journey leg
  jambe      human leg
  patte      animal leg, animal paw, bird foot
  pied       chair leg, human foot

Classical Machine Translation
In this line of MT research, approaches can be classified according to the unit of translation:
- Direct translation works word by word
- Syntactic transfer approaches use syntactic phrases as the unit of translation
- Semantic transfer approaches use semantic units as the unit of translation

Statistical Approaches
- Build probabilistic models of faithfulness and fluency, and combine the models to get the most probable translation
- Modeled as a noisy channel: pretend that the foreign input F is a corrupted version of the target-language output E; the task is to discover the hidden sentence E that generated the observed sentence F
  - Informally, we refer to translating from French (F) to English (E)
- Requires two models:
  - Language model to compute P(E), the probability that a sequence E of English words is a sentence
  - Translation model to compute P(F | E), the conditional probability that the French sentence F is a translation of the English sentence E
- Given a French sentence f, its translation is the English sentence e that maximizes the product of the two models:
  $\hat{e} = \arg\max_{e} P(e)\, P(f \mid e)$
- The translation model P(F | E) appears to run backwards, from English to French, even though we are translating French into English. This follows from Bayes' theorem: $\arg\max_{e} P(e \mid f) = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)} = \arg\max_{e} P(f \mid e)\, P(e)$, because P(f) is the same for every candidate e. The search for the maximizing e is called the decoder.
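A toy decoder makes the noisy-channel argmax concrete. Everything numeric below is invented for illustration: the candidate list, the language-model table, and the translation-model table are hypothetical stand-ins for models that would normally be learned from corpora, and a real decoder searches a huge space rather than scoring a fixed list.

```python
import math

F = "le programme a été mis en application"

# Hypothetical P(e): fluency of each English candidate (made-up numbers).
LANGUAGE_MODEL = {
    "the program has been implemented": 0.020,
    "the program has been put in application": 0.001,
    "program the implemented been has": 0.00001,
}

# Hypothetical P(f | e): faithfulness of f to each candidate (made-up numbers).
TRANSLATION_MODEL = {
    (F, "the program has been implemented"): 0.30,
    (F, "the program has been put in application"): 0.40,
    (F, "program the implemented been has"): 0.30,
}


def decode(f, candidates):
    """Return the candidate e that maximizes P(e) * P(f | e)."""

    def log_score(e):
        p_e = LANGUAGE_MODEL.get(e, 0.0)
        p_f_given_e = TRANSLATION_MODEL.get((f, e), 0.0)
        if p_e == 0.0 or p_f_given_e == 0.0:
            return float("-inf")  # unseen candidates can never win
        # Work in log space to avoid underflow with long sentences.
        return math.log(p_e) + math.log(p_f_given_e)

    return max(candidates, key=log_score)


best = decode(F, list(LANGUAGE_MODEL))
```

With these numbers the word-for-word rendering has the highest translation-model score, but the language model penalizes it enough that `best` is "the program has been implemented", which is the interplay of faithfulness and fluency the model is meant to capture.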

Statistical Language Models
- Language model to compute P(E)
  - In practice, learn probabilities of bigrams in the target language (English in the running example) instead of probabilities of entire sentences
  - Translation has improved greatly due to large corpora (see Google Translate)
- Translation model to compute P(F | E)
  - Learn probabilities from parallel corpora
  - Model the translation as word translation combined with an alignment probability
Example:
  E: And the program has been implemented.
  F: Le programme a été mis en application.
  Alignment variables: (2, 3, 4, 5, 6, 6, 6), one per French word, each giving the position of the aligned English word:
  Le -> the
  programme -> program
  a -> has
  été -> been
  mis -> implemented
  en -> implemented
  application -> implemented
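The alignment vector can be read mechanically: entry j gives the 1-based position of the English word aligned to the j-th French word. The helper below is a hypothetical illustration of that reading (`aligned_pairs` is not from any library), using the sentence pair from the slide with the final periods removed.

```python
def aligned_pairs(english, foreign, alignment):
    """Pair each foreign word with the English word its alignment points to.

    `alignment` holds one 1-based English position per foreign word.
    """
    e_words = english.split()
    f_words = foreign.split()
    if len(alignment) != len(f_words):
        raise ValueError("need exactly one alignment entry per foreign word")
    pairs = []
    for f_word, position in zip(f_words, alignment):
        if not 1 <= position <= len(e_words):
            raise ValueError(f"alignment position {position} is out of range")
        pairs.append((f_word, e_words[position - 1]))
    return pairs


pairs = aligned_pairs(
    "And the program has been implemented",
    "Le programme a été mis en application",
    (2, 3, 4, 5, 6, 6, 6),
)
```

Note that several French words may point at the same English word ("mis", "en", and "application" all align to "implemented"), while an English word such as "And" may receive no alignment at all.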

Alignment and Parallel Corpora
- The translation model uses probabilities of word alignment
- Word alignment models are automatically trained from parallel corpora:
  - Hansard Corpus: Canadian parliament documents for French, English, and a variety of Native American languages
  - United Nations proceedings documents
  - LDC has corpora in several language pairs
- Literary parallel corpora are not as suitable because of the stronger presence of literary devices, such as metaphor

MT Evaluation
- Human raters can evaluate along the two dimensions of fluency and fidelity (there are several individual metrics for each of these dimensions)
- BLEU automatic evaluation system:
  - The evaluation corpus contains human-generated translations
  - Metrics evaluate how closely the system-generated translations correspond to the human ones
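To show what "how closely" means for BLEU, here is a minimal sentence-level sketch of its two ingredients: clipped (modified) n-gram precision and a brevity penalty. It assumes whitespace tokenization, lowercasing, uniform weights, and n-grams up to length 2 by default; standard BLEU is computed over a whole corpus with n up to 4, so treat this as an illustration rather than a reference implementation. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a tokenized candidate."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Each candidate n-gram is credited at most as often as it occurs
    # in the single reference where it is most frequent.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU sketch: brevity penalty times geometric mean."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(cand)
    # Reference length closest to the candidate length (shorter one on ties).
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Clipping stops a system from scoring well by repeating a common word, and the brevity penalty stops it from scoring well by outputting only a few safe words; without those two corrections plain n-gram precision is easy to game.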