Problems in Current Text Simplification Research


Wei Xu (UPenn), Chris Callison-Burch (UPenn), Courtney Napoles (JHU)
TACL paper @ EMNLP, Sep-20-2015

What is Text Simplification

INPUT: Applesauce is a puree made of apples.
OUT-1: Applesauce is a soft paste.
OUT-2: Applesauce is a paste. It is made of apples.

- paraphrasing
- deletion
- splitting

- for children, disabled, non-native speakers
- for other NLP tasks (MT, summarization)

Goal of Text Simplification

- grammaticality
- meaning preservation
- simplicity

INPUT: Applesauce is a puree made of apples.

Human Evaluation (grammaticality / meaning preservation / simplicity)
OUT-1: Applesauce is a soft paste. 5 / 4 / 5
OUT-2: Applesauce is a paste. It is made of apples. 5 / 5 / 4

(no reliable automatic evaluation yet)

Brief History of Sentence Simplification

Labels on the timeline: rule-based, machine translation, Parallel Wikipedia Corpus

1997 Chandrasekar & Srinivas
1999 Dras (PhD thesis)
2000 Carroll, Minnen, Pearce, Canning, Devlin
2002 Canning (PhD thesis)
2004 Siddharthan (PhD thesis)
2010 Zhu, Bernhard, Gurevych
2011 Woodsend & Lapata
2011 Coster & Kauchak
2012 Wubben, van den Bosch, Krahmer
2014 Narayan & Gardent
2014 Siddharthan (Survey)
2014 Angrosh, Nomoto, Siddharthan
2014 Narayan (PhD thesis)
Now Xu, Callison-Burch, Napoles (Opinion)

Problems in Simplification Research

- State-of-the-art evaluation is suboptimal. But we have been doing this in the past 5 years*.
- Simple Wikipedia data dominated in the past 5 years. But its quality was taken for granted. It limits the scope of research.

* Angrosh et al. (2014) tried a comprehension quiz

Why is this important?

(Figure: sailing analogy. "Breakthrough on Sea": relative to the wind direction, a straight path upwind versus a zigzag path upwind. "Simplification Breakthrough": better understanding, better review, more diverse research, better data and evaluation, better model.)

"Recently, there have been several attempts at addressing the text simplification task as a monolingual translation problem [...] However, they did not try to seek reasons for the success or the failure of their systems."
Štajner, Béchara, Saggion (2015)

WHY DID THIS HAPPEN?
1. state-of-the-art competition
2. not easy to do

Opinion #1 Current evaluation doesn't tell us what's going on.

System Comparability

sub-systems: paraphrasing, deletion, splitting
evaluation criteria: grammaticality, meaning preservation, simplicity
not easy to measure

We need more controlled evaluation:
- evaluate sub-tasks separately
- target specific audience (e.g. 10-12 year olds)

Opinion #2 Simple Wikipedia is not that simple

"Specific questions that need addressing are: we need to better understand the quality of Simple English Wikipedia, a resource that has been used to train many SMT based simplification systems"
Advaith Siddharthan (2014 Survey)

WHAT'S NEW? We quantitatively and systematically answer this question.

Quality of Parallel Wikipedia Corpus*

(Pie chart: alignment error 17%, not simpler 33%, real simplification 50%)

Inaccuracy in Parallel Wikipedia Corpus*
- alignment error (two sentences have different meaning): 17%
- Best automatic sentence alignment gets about 0.7 F1 score (Hwang et al. 2015)

Inadequacy in Parallel Wikipedia Corpus*
- not simpler: 33%. Sentences can have similar meaning without one being a simplification of the other.
- real simplification (aligned and simpler): 50%? Some sentences are simpler by only one word while the rest of the sentence is still complex.
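
The 0.7 F1 figure for automatic sentence alignment combines precision and recall over aligned sentence pairs. As a rough sketch of how such a score might be computed, assuming alignments are stored as sets of (normal_index, simple_index) pairs (a representation chosen here for illustration, not taken from Hwang et al.):

```python
def alignment_f1(predicted, gold):
    """Precision, recall and F1 of predicted sentence alignments.

    Both arguments are collections of (normal_idx, simple_idx) pairs.
    This pair-set representation is an assumption made for illustration.
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with made-up alignments (not corpus data).
gold = {(0, 0), (1, 1), (2, 3), (4, 4)}
predicted = {(0, 0), (1, 1), (2, 2), (4, 4), (5, 5)}
p, r, f1 = alignment_f1(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

An aligner at this level still gets a substantial share of pairs wrong, which is one way alignment errors can end up in the training data.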

Issues with Parallel Wikipedia Corpus

- suboptimal for estimating translation probabilities
- suboptimal for developing automatic metrics
- suboptimal for tuning MT system
- unsuitable for document-level simplification

Opinion #3 New data can help

Newsela Dataset

- every article at 5 levels of simplification
- written by trained editors
- comes with comprehension quizzes

Wei Xu, Chris Callison-Burch, Courtney Napoles. Problems in Current Text Simplification Research: New Data Can Help. TACL (2015)

Wikipedia* vs. Newsela (manual inspection of aligned sentence pairs)

Wikipedia*: alignment error 17%, not simpler 33%, real simplification 50%
Newsela: real simplification 92%; alignment error and not simpler account for the remaining 2% and 6%

Wikipedia* vs. Newsela: Good simplification needs more paraphrasing.

(Figure: degree of paraphrasing, split into deletion only, paraphrase only, and deletion + paraphrase)
Wikipedia*: deletion only 42%, paraphrase only 34%, deletion + paraphrase 24%
Newsela: shares of 7%, 20% and 74% across the same three categories

Wikipedia* vs. Newsela: Good simplification could be much shorter.

(Figure: sentence length (#words), Normal vs. Simple, for each corpus; see syntax analysis in the paper)

Wikipedia* (total 2.6 million tokens) vs. Newsela (total 1.3 million tokens): Good simplification uses a much smaller vocabulary.

(Figure: vocabulary size (#unique words), Normal vs. Simple. Wikipedia*: 18% reduction. Newsela: 48% reduction. Counts shown, in slide order: 23,771; 71,340; 6,669; 19,849; 19,197; 583. Example: chimpanzee, chimp)
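
To make the "vocabulary size (#unique words)" comparison concrete, here is a minimal sketch of one way to compute a vocabulary reduction between a normal and a simple text. The lowercasing and the regular-expression tokenizer are assumptions made here; the slides do not say how the corpora were tokenized, so this will not reproduce the 18% and 48% figures.

```python
import re


def vocabulary(sentences):
    """Set of unique lowercased word types in a list of sentences."""
    vocab = set()
    for sentence in sentences:
        # Assumed tokenization: runs of letters and apostrophes.
        vocab.update(re.findall(r"[a-z']+", sentence.lower()))
    return vocab


def vocab_reduction(normal_sentences, simple_sentences):
    """Fraction by which the simple vocabulary is smaller than the normal one."""
    normal_vocab = vocabulary(normal_sentences)
    simple_vocab = vocabulary(simple_sentences)
    return 1.0 - len(simple_vocab) / len(normal_vocab)


# Toy input: the applesauce example from the opening slides.
normal = ["Applesauce is a puree made of apples."]
simple = ["Applesauce is a soft paste."]
reduction = vocab_reduction(normal, simple)
print(f"{reduction:.1%} vocabulary reduction")
```

On a real corpus the same functions would be applied to all Normal and all Simple sentences at once rather than to a single pair.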

Wikipedia* vs. Newsela: Good simplification reduces certain function word usage.

(Figure: most significantly reduced words for each corpus, from a weighted log-odds-ratio analysis w/ informative Dirichlet prior. Words shown, in slide order: commune, as and northern northwestern film ; southwestern footballer, and " of which as percent including director)
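
The slide names the method but gives no formula. As a hedged sketch, the version below follows the z-scored log-odds ratio with an informative Dirichlet prior described by Monroe, Colaresi and Quinn (2008). Using the combined counts of both texts as the prior, and the toy sentences themselves, are assumptions made here for illustration and may differ from the setup in the paper.

```python
import math
from collections import Counter


def weighted_log_odds(counts_a, counts_b, prior):
    """z-scored log-odds ratio of each word between text A and text B.

    Positive scores mean the word is more typical of A, negative of B.
    `prior` maps each word to its Dirichlet pseudo-count (alpha_w > 0).
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    alpha_0 = sum(prior.values())
    scores = {}
    for word, alpha_w in prior.items():
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        log_odds_a = math.log((y_a + alpha_w) / (n_a + alpha_0 - y_a - alpha_w))
        log_odds_b = math.log((y_b + alpha_w) / (n_b + alpha_0 - y_b - alpha_w))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the log-odds difference.
        variance = 1.0 / (y_a + alpha_w) + 1.0 / (y_b + alpha_w)
        scores[word] = delta / math.sqrt(variance)
    return scores


# Toy sentences (made up), pre-tokenized by whitespace.
normal = Counter("the plan , which could save money , was approved".split())
simple = Counter("the plan was approved . that could save money".split())
prior = normal + simple  # assumed prior: combined counts of both texts
scores = weighted_log_odds(normal, simple, prior)
for word in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(word, round(scores[word], 2))
```

Words with the highest scores are the ones most strongly associated with the normal text, i.e. the candidates for "most significantly reduced" in the simple text.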

Wikipedia* vs. Newsela: most significantly reduced words

Normal: Postal officials recently tried to, which could.
Simple: Postal officials recently tried to. That could.

(Figure: bar charts of how often "which", "where" and "approximately" occur in Normal vs. Simple text for each corpus; see syntax analysis in the paper)

Wikipedia* vs. Newsela: Wikipedia is not suitable for full-document simplification.

(Figure: document compression ratio (simple/normal) for each corpus, axis from 0.00 to 1.25; values shown: 3.19% and 57.28%; see discourse analysis in the paper)
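
A document compression ratio (simple/normal) can be computed per article pair. The sketch below measures length in whitespace-separated tokens, which is an assumption: the slide does not state whether the ratio is taken over words, sentences or characters.

```python
def compression_ratio(normal_doc, simple_doc):
    """Length of the simple document divided by the normal one.

    Length is counted in whitespace-separated tokens (an assumed unit).
    """
    normal_len = len(normal_doc.split())
    simple_len = len(simple_doc.split())
    if normal_len == 0:
        raise ValueError("normal document is empty")
    return simple_len / normal_len


# Toy input: one-sentence "documents" taken from the Newsela example
# in the backup slides (Original vs. Simple-4).
original = (
    "Slightly more fourth-graders nationwide are reading proficiently "
    "compared with a decade ago, but only a third of them are now "
    "reading well, according to a new report."
)
simple_4 = (
    "Fourth-graders are better readers than 10 years ago. "
    "But few of them read well."
)
ratio = compression_ratio(original, simple_4)
print(f"compression ratio: {ratio:.2f}")
```

A ratio near or above 1.0 means the simple version is about as long as the normal one; values well below 1.0 indicate that content was shortened or removed.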

Opinion #1 Current evaluation doesn't tell us what's going on.
Opinion #2 Simple Wikipedia is not that simple.
Opinion #3 New data can help.

My Suggestions to reviewers:
- be open-minded to papers that may not follow the previous evaluation setup or may not outperform the state-of-the-art on Wikipedia
- be sympathetic towards papers specifically on data construction*, data analysis* and automatic evaluation metrics
- read our paper

* (Pellow & Maxine, 2014 HCOMP; Marcelo & Specia, 2014 PITR)

My Suggestions to researchers:
- consider working on text simplification ("pre-BLEU age")
- improve evaluation
- make your system replicable
- read our paper

Thank you

Questions? Opinions?

Sponsor: NSF
Newsela data are available at https://newsela.com/data/

Back Up

Wikipedia* vs. Newsela: change of discourse connectives (odds-ratio)

(Figure: simple cue words vs. complex conjunctions, for each corpus)

Reasons for Quality Issues in Parallel Wikipedia Corpus

- The Simple Wikipedia was created by volunteers with no specific objective;
- Articles in Simple Wikipedia do not necessarily map to Normal Wikipedia;
- As an encyclopedia, Wikipedia contains extremely difficult words and sentences.

Newsela Dataset

Original: Slightly more fourth-graders nationwide are reading proficiently compared with a decade ago, but only a third of them are now reading well, according to a new report.
Simple-1: Fourth-graders in most states are better readers than they were a decade ago. But only a third of them actually are able to read well, according to a new report.
Simple-2: Fourth-graders in most states are better readers than they were a decade ago. But only a third of them actually are able to read well, according to a new report.
Simple-3: Most fourth-graders are better readers than they were 10 years ago. But few of them can actually read well.
Simple-4: Fourth-graders are better readers than 10 years ago. But few of them read well.

Newsela Dataset

- 1,130 news articles
- Time: January 2013 to March 2015
- Source: Chicago Tribune, Seattle Times, LA Times, The Baltimore Sun
- Original: 56k sentences
- Simple: 64k sentences