Generating Disambiguating Paraphrases for Structurally Ambiguous Sentences


Generating Disambiguating Paraphrases for Structurally Ambiguous Sentences
Manjuan Duan, Ethan Hill, Michael White
August 11-12, 2016, LAW-X
The Ohio State University, Department of Linguistics

Joint work with Manjuan Duan and Ethan Hill

Introduction

How can we crowd-source data for adapting parsers to new domains? To some extent, MTurk workers can perform meaning- and form-oriented tasks such as annotating PP-attachment points, given some training (Snow et al., 2008; Jha et al., 2010). Gerdes (2013) and Zeldes (2016) also found that it was possible to obtain fairly high quality class-sourced annotations, where students received only a modest amount of training. In the current study, rather than annotating syntax, we use natural language clarification questions, simply asking MTurk workers to select the right paraphrase of a structurally ambiguous sentence.

Big picture: Just ask people what ambiguous sentences mean. [Pipeline diagram: the input sentence goes to the parser, which yields Interpretation 1 and Interpretation 2; the realizer turns these into Paraphrase 1 and Paraphrase 2; AMT workers judge which paraphrase is closer in meaning, and their choices become silver data.]

Difference from previous studies: We aim (ultimately) at all structural ambiguities identifiable by an automatic parser, not confined to specific constructions (Jha et al., 2010). AMT workers make choices among paraphrases, not annotations, so no specific tutorial is needed.

Methods

Generating disambiguating paraphrases: An illustration. [Diagram, built up over several slides, for the input sentence "He stopped Godzilla with the laser!": the top parse attaches "with the laser" to "stopped" (instrumental reading), while the next parse attaches it to "Godzilla". Realizing the top parse yields the reversal "With the laser, he stopped Godzilla!" and, via logical form rewriting, the passive rewrite "Godzilla was stopped by him with the laser!". Realizing the next parse yields only the reversal "He stopped Godzilla with the laser!" (identical to the input), but the rewrite "Godzilla with the laser was stopped by him!".]

Obtaining meaningfully distinct parses

1. Parse the input sentence with the OpenCCG parser to obtain its top 25 parses.
2. Find a parse from the n-best parse list which is meaningfully distinct from the top parse (see the sketch below):
- Only the unlabeled and unordered dependencies of the two parses are compared.
- The symmetric difference must be non-empty, with neither set of dependencies a superset of the other.
- Ambiguities involving only POS, named entity, or word sense differences are disregarded.
3. If successful, this phase yields a top and a next parse, the ones reflecting the greatest uncertainty.
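A minimal sketch of the distinctness check in step 2, assuming dependencies are represented as (head, dependent) word pairs (the representation and function name are ours, not OpenCCG's API):

def meaningfully_distinct(top_deps, next_deps):
    """True if two sets of unlabeled dependencies reflect a genuine attachment ambiguity."""
    # Treat each dependency as an unordered pair of words.
    top = {frozenset(d) for d in top_deps}
    nxt = {frozenset(d) for d in next_deps}
    if not top ^ nxt:
        return False  # identical dependency sets: no ambiguity
    if top <= nxt or nxt <= top:
        return False  # one parse merely subsumes the other
    return True

# "He stopped Godzilla with the laser": instrumental vs. NP attachment of "with"
top_parse = {("stopped", "he"), ("stopped", "Godzilla"), ("stopped", "with"), ("with", "laser")}
next_parse = {("stopped", "he"), ("stopped", "Godzilla"), ("Godzilla", "with"), ("with", "laser")}
assert meaningfully_distinct(top_parse, next_parse)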

Two ways to obtain paraphrases: Paraphrases obtained from reverse realization (reversals) can be generated for ambiguities involving any construction identifiable by an automatic parser. Paraphrases obtained from logical form rewriting (rewrites) are triggered by specific syntactic constructions, such as PP-attachment ambiguity and modifier scope ambiguity in coordination.

Validating reverse realizations: We need to ensure that paraphrases actually disambiguate the intended meanings (see the sketch below).

1. Realize the top and next parse into an n-best realization list (n=25), using OpenCCG.
2. Traverse the list to find a qualifying paraphrase, which must be different from the original sentence and must differ from it in the relative distances among the words involved in the ambiguity.
3. Parse each candidate paraphrase to make sure its most likely interpretation includes the dependencies from which it was generated.
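A schematic version of this validation loop; realize_nbest and parse_deps are injected stand-ins for OpenCCG's realizer and parser, and the dependency representation is ours:

def find_valid_paraphrase(parse, target_deps, original, ambig_words,
                          realize_nbest, parse_deps, n=25):
    """Search the n-best realizations of a parse for a qualifying paraphrase."""
    for cand in realize_nbest(parse, n):
        if cand == original:
            continue  # must not simply echo the input sentence
        if word_distances(cand, ambig_words) == word_distances(original, ambig_words):
            continue  # the ambiguity-relevant words must actually be rearranged
        if target_deps <= parse_deps(cand):
            return cand  # round trip: the candidate's top parse keeps the intended dependencies
    return None

def word_distances(sentence, words):
    """Relative distances among the ambiguity-relevant words."""
    tokens = sentence.split()
    pos = [tokens.index(w) for w in words if w in tokens]
    return [b - a for a, b in zip(pos, pos[1:])]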

Two-sided and one-sided paraphrases: With two-sided paraphrases, two paraphrases are obtained for the original sentence, one generated from the top parse and one from the next. With one-sided paraphrases, only one paraphrase is obtained for the original sentence.

Logical form rewriting: Rewritten logical forms are realized to obtain paraphrases which highlight the ambiguous part. Passive and cleft rewrites handle PP-attachment ambiguities; coordination rewrites handle ambiguities in the scope of modifiers with coordinated phrases.

Passive rewrites: An example. "I saw the girl with the telescope." Rewrite: "The girl with the telescope was seen by me." (A toy sketch of this rewrite follows.)
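A toy sketch of a passive rewrite over a schematic logical-form dict of our own devising (not OpenCCG's actual LF format); it mirrors the PASS / by-Arg0 structure from the illustration slide and shows how an NP-attached PP travels with the promoted argument:

def passive_rewrite(lf):
    """Wrap the clause in a passive node and route the agent through 'by'."""
    return {
        "pred": "PASS", "tense": lf.get("tense", "past"), "mood": "dcl",
        "Arg1": lf["Arg1"],                        # promoted to surface subject
        "verb": {"pred": lf["pred"], "partic": "pass"},
        "by": {"Arg0": lf["Arg0"]},                # demoted agent
        "mods": lf.get("mods", []),                # clause-level modifiers stay on the clause
    }

# NP-attachment reading: the PP sits inside Arg1, so the realizer would front
# "the girl with the telescope"; on the VP-attachment reading the PP would
# instead appear in the clause-level "mods" and surface after "by me".
active = {"pred": "see", "tense": "past", "Arg0": "me",
          "Arg1": {"head": "girl", "mods": [{"pred": "with", "obj": "telescope"}]}}
print(passive_rewrite(active))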

Cleft rewrites: An example. "I saw the girl with the telescope." Rewrite: "The girl with the telescope was what I saw."

Coordination rewrites: An example (1). "The old men and women are becoming senile." Rewrite: "The old women and the old men are becoming senile."

Coordination rewrites: An example (2). "The old men and women are becoming senile." Rewrite: "The women and the old men are becoming senile."

Experiment

Validation experiment. Aim: examine the quality of crowd-sourced annotations obtained through disambiguating paraphrases. We used AMT workers as our naive annotators. For comparison, we hand-annotated 1,030 sentences as the optimal ("gold") annotations, against which to measure the accuracy of the crowd-sourced annotations.

Data preparation. [Pipeline: Parsing and Filtering → Paraphrasing → Selection → AMT Surveys. 14,114 sentences from Big 10 football and prehistoric reptiles → 5,063 with top and next parses → 3,605 valid paraphrases → 1,030 items.] Working assumption: unannotated data is available in large quantities, so we can focus on the most informative ambiguities.

Gold annotations. We selected the correct parse of the sentence by examining the dependency graphs of the input sentence: we annotated "top" if the top parse was correct, "next" if the next parse was correct, and "neither" if neither of them was more correct than the other.

Distribution of test data. [figure]

Collecting human judgments. Five judgments for each sentence were collected from AMT workers, and the judgments for identical sentences were collapsed. "Neither" cases were excluded from the analysis. Comprehension questions were asked to prevent random choosing. Agreement levels among the AMT workers: majority (> 50% agreement), strong majority (> 75%), unanimity (> 90%). (A sketch of this bucketing follows.)
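A minimal sketch of the agreement bucketing, assuming five "top"/"next" judgments per sentence (the function is illustrative):

from collections import Counter

def agreement_level(judgments):
    """Bucket a sentence's judgments by the study's agreement thresholds."""
    label, votes = Counter(judgments).most_common(1)[0]
    share = votes / len(judgments)
    if share > 0.90:
        return label, "unanimity"        # e.g. 5/5 judgments agree
    if share > 0.75:
        return label, "strong majority"  # e.g. 4/5
    if share > 0.50:
        return label, "majority"         # e.g. 3/5
    return None, "no majority"

print(agreement_level(["top", "top", "top", "next", "top"]))  # ('top', 'strong majority')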

Coverage vs. accuracy: higher accuracy (but lower coverage) with greater agreement. [figure]

One-sided vs. two-sided: two-sided much more reliable. [figure]

Reversals vs. rewrites: reversals at least as accurate. [figure]

Potential correction to the current parser. [figure]

Manual analysis. We examined 43 sentences where unanimous AMT worker judgments did not agree with the gold annotations and identified the following sources of error: incompetent or broken realizations (29/43), bad parses (11/43), and lack of context (3/43).

Preliminary parser retraining experiment. We trained the OpenCCG parser with majority AMT worker annotations (along with the original CCGbank data), trained the parser separately in the two domains, and evaluated the parser with 10-fold cross-validation.

Evaluation of the retrained parser: an example. Parses were considered correct if the top and next dependencies occur in the same order as in the gold annotation. E.g., for the sentence "I saw the girl with the telescope", if (saw, with) is annotated as the correct dependency, an n-best list counts as correct when (saw, with) outranks (girl, with):

n-best   Correct        Incorrect
1        ...            ...
2        (saw, with)    ...
3        ...            ...
4        ...            (girl, with)
5        (girl, with)   ...
...      ...            ...
...      ...            (saw, with)
25       ...            ...

(A sketch of this ranking check follows.)
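A small sketch of the correctness check over an n-best list, with dependencies as word pairs (representation is ours):

def nbest_correct(nbest_deps, gold_dep, competing_dep):
    """True if the gold dependency appears at a better rank than the competing one."""
    def first_rank(dep):
        for rank, deps in enumerate(nbest_deps, start=1):
            if dep in deps:
                return rank
        return float("inf")  # the dependency never appears
    return first_rank(gold_dep) < first_rank(competing_dep)

# Toy n-best list for "I saw the girl with the telescope", ranks 1-5:
nbest = [set(), {("saw", "with")}, set(), {("girl", "with")}, set()]
print(nbest_correct(nbest, ("saw", "with"), ("girl", "with")))  # True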

Parser retraining results.

                  Dinosaur   Football
Train size        471        356
Eval size         291        226
Original acc.     0.701      0.668
Retrained acc.    0.749      0.717
Correction rate   0.243      0.32

McNemar's chi-square test shows a significant improvement in the dinosaur domain (p = 0.02); the improvement on the football data does not reach significance, likely due to the smaller data size. The retrained parsers do not differ significantly from the original parser (p > 0.05 for both) on the CCGbank development set.
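For reference, McNemar's test compares the two parsers' paired correct/incorrect outcomes on the same evaluation sentences. A sketch with statsmodels, using made-up counts (the slide reports only accuracies and p-values, not the underlying 2x2 table):

from statsmodels.stats.contingency_tables import mcnemar

# Rows: original parser correct/incorrect; columns: retrained parser correct/incorrect.
# The counts below are illustrative only.
table = [[190, 7],
         [21, 73]]
result = mcnemar(table, exact=False, correction=True)  # chi-square variant of the test
print(result.statistic, result.pvalue)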

Conclusions

Conclusions and future work. It is possible to obtain accurate crowd-sourced judgments from naive annotators with no instruction, pointing the way towards collecting parser training data on a massive scale. The preliminary parsing experiment already suggests that automatic parsers can be retrained to achieve better parsing accuracy. In the future, we plan to experiment with parser adaptation using multiple parsers and larger data sets. We also plan to experiment with generating paraphrases via sentence splitting and simplification (Siddharthan, 2006; Siddharthan, 2011).

Acknowledgments. We thank James Curran, Eric Fosler-Lussier, the OSU Clippers Group, and the anonymous reviewers for helpful comments and discussion. This work was supported in part by NSF grant 1319318.

Thank you!

Incompetent realizations. The realization is fine as a sentence but fails to reliably capture the meaning difference between the parses; usually this involved just adding or deleting punctuation.

Incompetent realizations: An example. "The teeth were adapted to crush bivalves, gastropods and other animals with a shell or exoskeleton." (animals, with): same as the original sentence. (crush, with): "The teeth were adapted to crush bivalves, gastropods and other animals, with a shell or exoskeleton."

Broken realizations:
- Inappropriate heavy NP shift
- Long adverbials moved between verbs and their (other) complements
- Wrong modifier-modificand word order
- Wrong position of the particle for phrasal verbs
- Wrong preposition-complement position

Broken realizations: An example. "They are thought to have gone extinct during the Triassic-Jurassic extinction event." (gone, during): "They are thought to have gone during the Triassic-Jurassic extinction event extinct." (thought, during): "They are thought during the Triassic-Jurassic extinction event to have gone extinct."

Bad parses. Although one parse is better than the other for the disputed dependency, the rest of both parses is so broken that the realization cannot reliably capture the meaning difference. Examples include parsing "in" as a conjunction, or a generally bad parse.

Bad parses: An example. "Coming off a disappointing 2-10 season in 2009 Maryland returns to a bowl game to face East Carolina." (returns, to): "Coming off a disappointing 2-10 season in 2009 returns to a bowl game to face East Carolina Maryland." (Coming, to): "Coming off a disappointing 2-10 season to a bowl game to face East Carolina in 2009 Maryland returns."

Bad parses: top parse and next meaningfully distinct parse. [Dependency graphs for "Coming off a disappointing 2-10 season in 2009 Maryland returns to a bowl game to face East Carolina." In both graphs "coming" is the root and "returns" is misanalyzed as a plural noun (return<NUM>pl); the parses differ in whether "to a bowl game" attaches to "returns" or to "coming", but neither supports a reliable realization.]

Lack of context. Turkers fail to choose the correct parse because of a lack of context.

Lack of context: An example. "Michigan's backup center, Gerald Ford, expressed a desire to attend the fair while in Chicago." (attend, while): "Michigan's backup center, Gerald Ford, expressed a desire to attend while in Chicago the fair." (expressed, while): "Michigan's backup center, Gerald Ford, expressed while in Chicago a desire to attend the fair."

Regression analysis. A regression analysis to determine the factors affecting AMT workers' choices:

           One-sided           Two-sided
           Maj      S. Maj     Maj      S. Maj
parse      -0.03    -0.05      0.01     0.01
bleu       3.05*    4.38**     1.68*    3.07**
rlz.glb    0.01     0.01       0.07**   0.103***

AMT workers tend to choose the paraphrases similar to the original sentence and the paraphrases with higher fluency scores. (A sketch of such a regression follows.)
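A sketch of a logistic regression in this spirit with statsmodels, on synthetic data (the slide does not specify the model family; the real predictors are the parse score, BLEU against the original sentence, and the realizer's global fluency score, with "rlz.glb" renamed rlz_glb for the formula):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "parse": rng.normal(size=n),        # parse score of the interpretation
    "bleu": rng.uniform(0, 1, size=n),  # similarity of the paraphrase to the original
    "rlz_glb": rng.normal(size=n),      # realizer's global fluency score
})
# Simulate choices driven mainly by BLEU, as in the reported coefficients.
logit_p = -0.5 + 3.0 * df["bleu"] + 0.1 * df["rlz_glb"]
df["chose_top"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))
model = smf.logit("chose_top ~ parse + bleu + rlz_glb", data=df).fit()
print(model.params)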

Regression analysis for the coverage and accuracy trade-off. [Plot: accuracy (0.6-1.0) against data size (0-400) for Majority.Baseline, Majority.Pred, Strong.Majority.Baseline, and Strong.Majority.Pred.]


Data preparation

1. We collected 6,335 sentences from Prehistoric Reptiles and 7,779 from Big 10 Conference Football (14,114 total).
2. After parsing the sentences and filtering out those too short or too long, 5,063 sentences were found to be ambiguous.
3. Valid paraphrases were generated for 3,605 sentences.
4. 515 sentences from each domain were selected for the validation experiment (1,030 total).