Lexical Cohesion and Coherence

Regina Barzilay
regina@csail.mit.edu
February 17, 2004

Leftovers from Last Time

Input Type        C_seg for ABC
ASR               0.1723
Closed Captions   0.1515
Transcripts       0.1356

Note the impact for ASR!

Lack of Coherence

Hobbs Example (1982):
When Teddy Kennedy paid a courtesy call on Ronald Reagan recently, he made only one Cabinet suggestion. Western surveillance satellites confirmed huge Soviet troop concentrations virtually encircling Poland.

Coherence in Automatically Generated Text

- DUC results: most automatic summaries exhibit a lack of coherence
- Is it possible to automatically compute text coherence?
  - text representation
  - inference procedure

Text Representation

[Figure: distribution of selected terms across sentences 5-95 of a sample text. Each row gives a term's total count followed by a mark at every sentence where it occurs; the column positions are not preserved in this transcript. Terms and totals: form 14, scientist 8, space 5, star 25, binary 5, trinary 4, astronomer 8, orbit 7, pull 6, planet 16, galaxy 7, lunar 4, life 19, moon 27, move 3, continent 7, shoreline 3, time 6, water 3, say 6, species 3.]

Today's Topics

- Two linguistic theories of text connectivity
  - Text Cohesion (Halliday & Hasan 76)
  - Centering Theory (Grosz & Joshi & Weinstein 83)
- Application to automatic essay scoring

Text Cohesion

Hobbs Example (1982)
The concept of cohesion refers to relations of meaning that exist within the text, and that define it as a text. Cohesion occurs where the interpretation of some element in the discourse is dependent on that of another.

Text Cohesion

Cohesion captures devices that link sentences into a text:
- Lexical cohesion
- References
- Ellipsis
- Conjunctions
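The term-distribution layout on the Text Representation slide can be approximated with a short script. This is only a sketch: the function names `term_distribution` and `render_row`, the whitespace tokenization, and the sample sentences are assumptions made for illustration, not the procedure used to produce the original figure.

```python
from collections import Counter, defaultdict

def term_distribution(sentences):
    """Map each word to {sentence index: count in that sentence}."""
    dist = defaultdict(dict)
    for i, sentence in enumerate(sentences):
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for word, count in Counter(sentence.lower().split()).items():
            dist[word][i] = count
    return dist

def render_row(word, positions, n_sentences):
    """Render one row: total count, the word, then one cell per sentence.

    Assumes per-sentence counts are single digits, as in the slide.
    """
    total = sum(positions.values())
    cells = "".join(str(positions.get(i, " ")) for i in range(n_sentences))
    return f"{total:>3} {word:<12}{cells}"

# Hypothetical sample text, not taken from the lecture.
sentences = ["the moon pulls the water", "stars form", "the moon and the moon"]
dist = term_distribution(sentences)
print(render_row("moon", dist["moon"], len(sentences)))
```

Words whose marks cluster in one stretch of sentences and vanish elsewhere are the visual cue that the figure is meant to convey.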

Example

Halliday & Hasan (1982)
- Time flies.
- You can't; they fly too quickly.

Find three cohesive ties!

Lexical Chains: Example

1. There was once a little girl and a little boy and a dog
2. And the sailor was their daddy
3. And the little doggy was white
4. And they like the little doggy
5. And they stroke it
6. And they fed it
7. And they ran away
8. And then daddy had to go on a ship
9. And the children missed 'em
10. And they began to cry

Lexical Chains: Applications

- Summarization
- Segmentation
- Malapropism Detection
- Information Retrieval

Lexical Chains: Computation

Associationist text models
- Define word similarity function
- Define insertion conflict strategy (greedy vs. dynamic strategy)
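The two ingredients named on the Computation slide, a word similarity function and an insertion strategy, can be illustrated with a minimal greedy chainer. Everything here is a sketch under stated assumptions: `build_chains`, `toy_similar`, and the hand-listed `RELATED` pairs are hypothetical stand-ins for a real lexical resource, and the noun list is picked by hand from the story on the Example slide.

```python
from typing import Callable, List

def build_chains(words: List[str],
                 similar: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedy strategy: add each word to the first chain that already
    contains a similar word; otherwise start a new chain."""
    chains: List[List[str]] = []
    for word in words:
        for chain in chains:
            if any(similar(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])
    return chains

# Hypothetical similarity relation (assumption): identity plus a few
# hand-listed related pairs standing in for a thesaurus lookup.
RELATED = {frozenset(pair) for pair in [
    ("dog", "doggy"), ("daddy", "sailor"),
    ("girl", "boy"), ("girl", "children"), ("boy", "children"),
]}

def toy_similar(a: str, b: str) -> bool:
    return a == b or frozenset((a, b)) in RELATED

nouns = ["girl", "boy", "dog", "sailor", "daddy",
         "doggy", "doggy", "daddy", "ship", "children"]
chains = build_chains(nouns, toy_similar)
for chain in chains:
    print(chain)
```

A greedy strategy commits to the first matching chain and never revisits the decision; a dynamic strategy would keep alternative placements open until more context is seen, which matters when a word has several senses.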

Lexical Chains: Example

Lexical Chains: Accuracy

Example: Entertainment-service 1, auto-maker 1, enterprise 1, massachusetts-institute 1, technology-microsoft 1, microsoft 10, concern 1, company 6

- Accuracy is bounded by the quality of the lexical resource
- The need for disambiguation makes the task harder
- Disambiguation accuracy is around 60%

For more examples see: http://www.cs.columbia.edu/nlp/summarization-test/index.html

Automatic Measurement of Text Coherence

Cohesive ties reflect the degree of text coherence

First attempts to (semi-)automate cohesion judgments rely on:
- propositional modeling of text structure (Kintsch & van Dijk 78): time consuming and requires training
- readability measures (Flesch 48): weak correlation with comprehension measures

Vector-Based Coherence Assessment

- Each sentence is represented as a weighted vector of its terms
  SENTENCE 1 : 1 0 0 0 1 1 0
  SENTENCE 2 : 1 1 1 1 0 0 1
- Distance between two adjacent sentences is measured using cosine:

  $$\mathrm{sim}(b_1, b_2) = \frac{\sum_{t=1}^{n} w_{t,b_1}\, w_{t,b_2}}{\sqrt{\sum_{t=1}^{n} w_{t,b_1}^2 \, \sum_{t=1}^{n} w_{t,b_2}^2}}$$

- Lexical continuity is measured as the average distance between adjacent sentences in a paragraph
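The cosine measure from the Vector-Based Coherence Assessment slide can be sketched in a few lines, using the two sentence vectors shown there. The function names `cosine` and `lexical_continuity` are illustrative, and returning 0.0 for an all-zero vector is an assumption the slides do not address.

```python
import math
from typing import List, Sequence

def cosine(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine of two term-weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1) * sum(b * b for b in v2))
    # Assumption: an all-zero vector is treated as having similarity 0.
    return dot / norm if norm else 0.0

def lexical_continuity(sentences: List[Sequence[float]]) -> float:
    """Average cosine over all pairs of adjacent sentences."""
    sims = [cosine(a, b) for a, b in zip(sentences, sentences[1:])]
    return sum(sims) / len(sims) if sims else 0.0

sentence_1 = [1, 0, 0, 0, 1, 1, 0]
sentence_2 = [1, 1, 1, 1, 0, 0, 1]
print(round(cosine(sentence_1, sentence_2), 4))
```

The two vectors share a single term, so the dot product is 1, and the squared norms are 3 and 5, giving 1/sqrt(15), roughly 0.26.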

Term similarity

Latent Semantic Analysis (Deerwester 90)
- Goal: identification of semantically similar words (birth, born, baby)
- Assumption: the context surrounding a given word provides important information about its meaning
- Method: Singular Value Decomposition

Experimental Set-Up

- Data from (Britton & Gulgoz 88)
- Source: text on the air war in Vietnam from an Air Force training textbook
- Various revision methods to improve text readability:
  - Principled (based on propositional model)
  - Heuristic (based on reader's intuition)
  - Readability (based on readability index)

Experimental Set-Up

- Data from (Britton & Gulgoz 88)
- Evaluation: based on recall, efficiency of recall, and scores on a multiple-choice test
- Assessment: Principled and Heuristic are better than Readability and Original

Results

Text               LSA        Weighted      No. props  Efficiency   Inference
                   coherence  word overlap  recalled   (props/min)  mult. choice
Original           0.192      0.047         35.5       3.44         37.11
Readability rev.   0.193      0.073         32.8       3.57         29.74
Principled rev.    0.347      0.204         58.6       5.24         46.44
Heuristic rev.     0.403      0.225         56.2       6.01         48.23
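The idea behind the Term similarity slide, that words become comparable through shared context after a truncated Singular Value Decomposition, can be seen on a toy term-by-context matrix. This is a sketch assuming NumPy is available; the matrix, the choice of k = 2, and the helper names `lsa_term_vectors` and `cosine` are invented for illustration and are not the setup used in the reported experiments.

```python
import numpy as np

def lsa_term_vectors(term_context: np.ndarray, k: int) -> np.ndarray:
    """Project terms into a k-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(term_context, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical counts: rows are terms, columns are four contexts.
terms = ["birth", "born", "baby", "piano", "store"]
X = np.array([
    [1, 0, 0, 0],   # birth
    [0, 1, 0, 0],   # born
    [1, 1, 0, 0],   # baby
    [0, 0, 1, 1],   # piano
    [0, 0, 1, 1],   # store
], dtype=float)

vectors = lsa_term_vectors(X, k=2)
birth, born, piano = vectors[0], vectors[1], vectors[3]
print("raw cosine birth/born:", cosine(X[0], X[1]))
print("LSA cosine birth/born:", round(cosine(birth, born), 3))
print("LSA cosine birth/piano:", round(cosine(birth, piano), 3))
```

In the raw matrix "birth" and "born" never share a context, so their cosine is 0; after the rank-2 projection they coincide because both co-occur with "baby", while both stay unrelated to "piano".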

Understanding the Results

- No significant difference between LSA and the baseline model in this experiment
- Other experiments showed that LSA may perform better, but note the need for parameter estimation
- Neither model is used for prediction

Centering Theory

(Grosz & Joshi & Weinstein 95)
- Goal: to account for differences in perceived discourse
- Focus: local coherence
  global vs immediate focusing in discourse (Grosz 77)
- Method: analysis of reference structure

Phenomena to be Explained

John went to his favorite music store to buy a piano. He had frequented the store for many years. He was excited that he could finally buy a piano. He arrived just as the store was closing for the day.

John went to his favorite music store to buy a piano. It was a store John had frequented for many years. He was excited that he could finally buy a piano. It was closing just as John arrived.

Analysis

- The same content, different realization
- Variation in coherence arises from the choice of syntactic expressions and syntactic forms

Another Example

John really goofs sometimes. Yesterday was a beautiful day and he was excited about trying out his new sailboat. He wanted Tony to join him on a sailing trip. He called him at 6am. He was sick and furious at being woken up so early.

Centering Theory: Basics

- Unit of analysis: centers
- Affiliation of a center: utterance (U) and discourse segment (DS)
- Function of a center: to link between a given utterance and other utterances in discourse

Center Typology

- Types:
  - Forward-looking Centers $C_f(U, DS)$
  - Backward-looking Centers $C_b(U, DS)$
- Connection: $C_b(U_n)$ connects with one of $C_f(U_{n-1})$

Example

John went to his favorite music store to buy a piano. It was a store John had frequented for many years. He was excited that he could finally buy a piano. It was closing just as John arrived.

Constraints on Distribution of Centers

- $C_f$ is determined only by U; $C_f$ are partially ordered in terms of salience
- The most highly ranked element of $C_f(U_{n-1})$ is realized as $C_b(U_n)$
- Syntax plays a role in ambiguity resolution: subj > ind obj > obj > others
- Types of transitions: center continuation, center retaining, center shifting

Center Continuation

- Continuation of the center from one utterance not only to the next, but also to subsequent utterances
- $C_b(U_{n+1}) = C_b(U_n)$
- $C_b(U_{n+1})$ is the most highly ranked element of $C_f(U_{n+1})$ (thus, likely to be $C_b(U_{n+2})$)

Center Retaining

- Retention of the center from one utterance to the next
- $C_b(U_{n+1}) = C_b(U_n)$
- $C_b(U_{n+1})$ is not the most highly ranked element of $C_f(U_{n+1})$ (thus, unlikely to be $C_b(U_{n+2})$)

Center Shifting

- Shifting the center, if it is neither retained nor continued
- $C_b(U_{n+1}) \neq C_b(U_n)$
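The three transition types can be made concrete with a small classifier run over the two piano-store discourses from the earlier slides. This is a sketch, not the procedure from the cited work: the grammatical-role annotations are hand-assigned guesses, $C_b(U_n)$ is taken as the highest-ranked element of $C_f(U_{n-1})$ that is realized in $U_n$, and an utterance whose predecessor has no backward-looking center is assumed to satisfy the equality condition. All function names are illustrative.

```python
from typing import List, Optional, Tuple

# Salience ranking from the constraints slide: subj > ind obj > obj > others.
ROLE_RANK = {"subj": 0, "ind_obj": 1, "obj": 2, "other": 3}
Utterance = List[Tuple[str, str]]  # (entity, grammatical role)

def forward_centers(utt: Utterance) -> List[str]:
    """Cf(U): the entities of U ordered by salience of grammatical role."""
    return [e for e, _ in sorted(utt, key=lambda pair: ROLE_RANK[pair[1]])]

def backward_center(prev: Utterance, cur: Utterance) -> Optional[str]:
    """Cb(U_n): highest-ranked element of Cf(U_{n-1}) realized in U_n."""
    realized = {e for e, _ in cur}
    for entity in forward_centers(prev):
        if entity in realized:
            return entity
    return None

def transition(prev_cb: Optional[str], cur_cb: Optional[str],
               cur_cf: List[str]) -> str:
    # Assumption: a missing Cb counts as a shift, and an undefined
    # previous Cb does not by itself force a shift.
    if cur_cb is None or (prev_cb is not None and cur_cb != prev_cb):
        return "shift"
    return "continue" if cur_cf and cur_cf[0] == cur_cb else "retain"

def transitions(discourse: List[Utterance]) -> List[str]:
    labels, prev_cb = [], None
    for prev, cur in zip(discourse, discourse[1:]):
        cur_cb = backward_center(prev, cur)
        labels.append(transition(prev_cb, cur_cb, forward_centers(cur)))
        prev_cb = cur_cb
    return labels

first = [("John", "subj"), ("store", "other"), ("piano", "obj")]
# Hand-annotated roles (assumption) for the two versions of the discourse.
version_a = [first,
             [("John", "subj"), ("store", "obj")],
             [("John", "subj"), ("piano", "obj")],
             [("John", "subj"), ("store", "other")]]
version_b = [first,
             [("store", "subj"), ("John", "other")],
             [("John", "subj"), ("piano", "obj")],
             [("store", "subj"), ("John", "other")]]
print(transitions(version_a))
print(transitions(version_b))
```

Under these annotations the first version yields only continuations, while the second alternates between retaining and continuing, which matches the intuition that the first version reads more smoothly.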

Coherent Discourse

Coherence is established via center continuation

John went to his favorite music store to buy a piano. He had frequented the store for many years. He was excited that he could finally buy a piano. He arrived just as the store was closing for the day.

John went to his favorite music store to buy a piano. It was a store John had frequented for many years. He was excited that he could finally buy a piano. It was closing just as John arrived.

Application to Essay Grading

(Miltsakaki & Kukich 00)
- Framework: GMAT e-rater
- Implementation: manual annotation of coreference information
- Grading: based on ratio of shifts
- Data: GMAT essays

Study results

- Correlation between shifts and low grades (established using t-test)
- Improvement of score prediction in 57%
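The "ratio of shifts" used for grading can be computed directly once each utterance transition in an essay has been labeled. The sketch below shows only that ratio; how the value is combined with other e-rater features is not described in the slides, and the function name `shift_ratio` and the sample labels are hypothetical.

```python
from typing import List

def shift_ratio(labels: List[str]) -> float:
    """Fraction of transitions labeled 'shift' (0.0 for an empty essay)."""
    return labels.count("shift") / len(labels) if labels else 0.0

# Hypothetical transition labels for one essay.
essay_transitions = ["continue", "shift", "retain", "shift"]
print(shift_ratio(essay_transitions))
```

A higher ratio means the essay changes its center more often, which the study results associate with lower grades.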