Ameeta Agrawal, Nikolay Yakovets. 01 Dec 2011


In complex sentences, facts can be presented with varied and complex linguistic constructions.

Prime Minister Vladimir V. Putin, the country's paramount leader, cut short a trip to Siberia, returning to Moscow to oversee the federal response. Mr. Putin built his reputation in part on his success at suppressing terrorism, so the attacks could be considered a challenge to his stature.

Constructions labeled in the example:
  main clause: "Prime Minister Vladimir V. Putin ... cut short a trip to Siberia"
  appositive: "the country's paramount leader"
  participial phrase: "returning to Moscow to oversee the federal response"
  conjunction of clauses: "Mr. Putin built his reputation ..., so the attacks could be considered a challenge to his stature"

Output: Prime Minister Vladimir V. Putin cut short a trip to Siberia. He is the country's top leader. He returned to Moscow to oversee the federal response. Mr. Putin built his reputation in part on his success at suppressing terrorism.

Source: complex sentence
Target: set of simple declarative sentences
  Easier to read
  Simpler vocabulary and syntactic structure

Development of reading aids for:
  People with aphasia (Carroll et al., 1999)
  Non-native speakers (Siddharthan, 2003)
  Individuals with low literacy (Watanabe et al., 2009)
Improve performance of:
  Parsers (Chandrasekar et al., 1996)
  Summarizers (Klebanov et al., 2004)
  Semantic role labelers (Vickrey and Koller, 2008)

Baseline: substitution of difficult words with more common words
  Input: Prime Minister Putin, the country's paramount leader
  Output: Prime Minister Putin, the country's top leader
Sentence structural rules, e.g. rewrite operations: deletion, substitution, insertion, reordering, etc.
  Input: Prime Minister Putin, the country's paramount leader, cut short a trip to Siberia.
  Output: Prime Minister Putin is the country's top leader. He cut short a trip to Siberia.
Challenge: learning the rules
  How do you do this automatically?
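The word-substitution baseline can be sketched in a few lines. This is only an illustration under stated assumptions: the `SIMPLER` table and the `simplify_words` function are hypothetical names, and the single dictionary entry is hand-written here, whereas the challenge named on the slide is to learn such rules automatically.

```python
import re

# Hypothetical lookup table: difficult word -> more common word.
# Hand-written for illustration; a real system would learn these pairs.
SIMPLER = {"paramount": "top"}

def simplify_words(sentence):
    """Replace each word found in SIMPLER and leave everything else untouched."""
    def swap(match):
        word = match.group(0)
        replacement = SIMPLER.get(word.lower(), word)
        if replacement != word and word[0].isupper():
            # Keep the capitalization of a sentence-initial word.
            return replacement.capitalize()
        return replacement
    return re.sub(r"[A-Za-z]+", swap, sentence)

print(simplify_words("Prime Minister Putin, the country's paramount leader"))
```

A lookup like this handles only lexical simplification; the structural rewrite in the second example (splitting the appositive into its own sentence) needs rules over syntax rather than single words.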

Woodsend and Lapata, 2011
  Use Wikipedia and Simple Wikipedia to induce a quasi-synchronous grammar
  Align phrases and learn syntactic and lexical simplification rules
  Use Integer Linear Programming to put the phrases together
Coster and Kauchak, 2011
  Generate a parallel corpus using Wikipedia and Simple Wikipedia
  Align phrases, using dynamic programming to find the best global sentence alignment
Biran, Brody and Elhadad, 2011
  Use Wikipedia and Simple Wikipedia revision histories to learn lexical simplification rules

Alignment
Most alignment has previously been done at the global level
  Useful when input and output sequences are similar and of roughly equal size
But consider the example sentences:

  Prime Minister Vladimir V. Putin, the country's paramount leader, cut short a trip to Siberia, returning to Moscow to oversee the federal response. Mr. Putin built his reputation in part on his success at suppressing terrorism, so the attacks could be considered a challenge to his stature.

  Prime Minister Vladimir V. Putin cut short a trip to Siberia. He is the country's top leader. He returned to Moscow to oversee the federal response. Mr. Putin built his reputation in part on his success at suppressing terrorism.

Sequences are not similar and not of equal size!
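To make the size problem concrete, here is a minimal sketch of a global alignment score over word tokens, using the standard Needleman-Wunsch recurrence. The slides do not say which global algorithm or scoring scheme earlier work used; the function name and the `match`, `mismatch`, and `gap` values are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for aligning token lists a and b end to end."""
    # prev[j] is the best score for aligning the tokens of a read so far with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

# Whitespace-tokenized fragment of the example (punctuation split by hand).
complex_s = "Putin , the country's paramount leader , cut short a trip to Siberia".split()
simple_s = "Putin cut short a trip to Siberia".split()
print(global_alignment_score(complex_s, simple_s))
```

Because a global alignment must account for every token, each of the six extra tokens in the longer sentence costs a gap penalty, even though all seven tokens of the shorter sentence match exactly.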

Alignment
Propose to test local alignment
  More useful for dissimilar sequences that are suspected to contain regions of similarity
Other sequence alignment algorithms used in bioinformatics
Perhaps conceptual graphs, as they provide an intuitive and easily understandable means to represent knowledge (??)
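The slides do not name a specific local alignment algorithm. As one possibility, the sketch below uses Smith-Waterman, a standard local alignment method from bioinformatics, applied to word tokens. The function name, the token-level granularity, and the scoring values are all assumptions for illustration.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman: score and matched tokens of the best local region of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never go below zero, so a bad prefix cannot hurt a good region.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    region = []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            if a[i - 1] == b[j - 1]:
                region.append(a[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, region[::-1]

complex_s = ("Putin , the country's paramount leader , cut short a trip "
             "to Siberia , returning to Moscow").split()
simple_s = "He cut short a trip to Siberia .".split()
print(local_alignment(complex_s, simple_s))
```

Unlike a global alignment, the tokens outside the shared region ("the country's paramount leader", "returning to Moscow") carry no penalty, which is the property that makes local alignment attractive for sentence pairs of very different length.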

Two ways:
  Human judges
  Automatic evaluation (readability measures)
For Wikipedia, Simple Wikipedia and our output:
  Flesch-Kincaid Grade Level index
  BLEU: scores the target output by counting n-gram matches with the reference
  TERp: similar to word error rate, allows shifts
Closest to Simple Wikipedia = good!
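As an illustration of the first automatic measure, the sketch below computes the Flesch-Kincaid Grade Level, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The formula is the standard one; the sentence splitter and the vowel-group syllable counter are rough assumptions made for this sketch (abbreviations such as "V." would be miscounted as sentence ends), and the function names are hypothetical.

```python
import re

def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level of a non-empty English text."""
    # Naive sentence split on ., ! and ?; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

complex_text = ("Prime Minister Putin, the country's paramount leader, "
                "cut short a trip to Siberia.")
simple_text = ("Prime Minister Putin is the country's top leader. "
               "He cut short a trip to Siberia.")
print(flesch_kincaid_grade(complex_text), flesch_kincaid_grade(simple_text))
```

A lower grade level for the simplified text is the expected direction; under the evaluation plan above, the system output would be compared against the scores for Wikipedia and Simple Wikipedia.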