Identifying Topic and Focus by an Automatic Procedure


Eva Hajičová & Petr Sgall
Institute of Formal and Applied Linguistics, Charles University
Malostranské nám. 25, 118 00 Praha 1, Czech Republic
(hajicova@cspguk11.bitnet, sgall@cspguk11.bitnet)

Hana Skoumalová
Institute of Theoretical and Computational Linguistics, Charles University
Celetná 13, 110 00 Praha 1, Czech Republic
(skoumal@praha1.ff.cuni.cs)

Abstract

An algorithm for the automatic identification of the topic and focus of a sentence is presented. It is based on dependency syntax and uses written input, which is much more ambiguous than a spoken utterance.

1.
The dichotomy of topic and focus, which in the Praguean Functional Generative Description is based on the scale of communicative dynamism (underlying word order), is relevant not only for a possible placement of the sentence in a context, but also for its semantic interpretation. The underlying word order differs from the surface one especially in that the verb stands more to the right than all its complementations belonging to the topic of the sentence (or to the local topic of the clause headed by the verb), and more to the left than those belonging to the focus. Using a dependency grammar (or, more or less equivalently, a flat structure in a constituency-based grammar), we can illustrate this by the following example, where (1') is a simplified underlying representation of (1) on a reading answering e.g. the question Where has Charles found my pen?:

(1) Charles has found your pen in a box lying on the table.
(1') (Charles)Act ((you)Appurt pen)Obj find.Perf (box.Indef ((Rel)Act lie (table)Loc.on)Gener)Loc.in

In (1') every pair of parentheses encompasses a dependent item (i.e. corresponds to an edge of the linearized dependency tree). The indices of the parentheses denote kinds of dependency (valency slots, or theta roles and adjuncts): Act stands for Actor (underlying Subject), Appurt for Appurtenance (Possessivity in a broader sense), Obj for Objective (underlying Object), Loc for Locative, Gener for the General Relationship (of an adjunct to its head). The other indices denote values of morphological categories (Perfect, Indefiniteness) and of adverbial prepositions (in, on); Rel denotes a relative pronoun (here deleted on the surface). For more details of the descriptive framework used, see Sgall et al. (1986, Chapters 2 and 3).

An automatic identification of topic and focus may use the input information on surface word order, on the dependency relations between autosemantic lexical occurrences, on the systemic ordering of kinds of complementations (reflected by the underlying order of the items included in the focus), on definiteness, on lexical semantic properties of words and (if spoken input is used) on the position of the intonation center (sentence stress). The primary position of the intonation center is at the end of the sentence (where it need not be phonetically realized by a specific stress), but in another (secondary) position the intonation center also marks the most dynamic part of the sentence (focus proper), cf. (2), where the underlying order is as indicated by (2'):

(2) Charles has found your PEN in a box lying on the table.
(2') (Charles) (box ((Rel) (table) lie)) find ((you) pen)

After several years of research in this domain, which has included psycholinguistic experiments with Czech and German sentences, as well as investigations with native speakers of English, we are convinced that in the individual languages there exists a basic ordering of the kinds of complementations of every verb (noun, adjective). We assume that this ordering, called systemic ordering, directly determines the underlying word order in the focus, so that if a sentence part A precedes another one, B, under systemic ordering, then B is less dynamic than A (i.e. B precedes A in the underlying word order) only if B belongs to the topic. In the topic part of the sentence the underlying word order often differs from systemic ordering. The systemic ordering of some of the main kinds of complementations in English has the following shape:

Time - Actor - Addressee - Objective - Origin - Effect - Manner - Directional(from) - Means - Directional(to) - Locative

2.
An automatic identification of topic, focus and the degrees of communicative dynamism, discussed in a preliminary way by Hajičová and Sgall (1985), can be based on the following considerations.

In languages with a high degree of "free" word order (as in most Slavonic languages), a secondary position of the intonation center is frequent only in spoken dialogues. In technical texts (spoken or written) there is a strong tendency to arrange the words so that the intonation center falls on the last word of the sentence (where it need not be phonetically manifested), with the exception, of course, of enclitic words. A general procedure for determining the topic-focus articulation in such languages can then be formulated as follows:

(i) All complementations (participants and adverbials, or arguments and adjuncts) preceding the verb are contextually bound. As for the complementations following the verb, a "main rule" may be stated: the boundary between topic (to the left) and focus (to the right) can be drawn between any two elements, provided that those belonging to the focus are arranged in the surface word order in accordance with the systemic ordering of the kinds of complementations.

(ii) The verb is ambiguous as for its position in the topic or in the focus.

(iii) If a spoken utterance (with its intonation center identified) is analyzed, then similar regularities hold for sentences with normal intonation (intonation center at the end). However, if a non-final element carries the intonation center, then all the complementations standing after this element are contextually bound; for the rest of the sentence, (i) and (ii) hold; the bearer of the intonation center belongs to the focus.

In English the surface word order is determined by grammatical rules to a large extent, so that intonation plays a more decisive role than in the Slavonic languages. The written shape of the sentence does not suffice here to determine the topic-focus articulation to such a degree as e.g. in Czech. The "main rule" also applies, but otherwise only certain important regularities can be stated here on the basis of word order and grammatical values (especially the articles and other determiners).

In order to reduce the ambiguity of the written shape of the English sentence as much as possible, it is also necessary to take into account certain semantic clues. Especially with Locative and Temporal modifications, it is important to distinguish between specific information (e.g. on a nice September day, on October 22, 1991, seven months ago) and items containing just a general setting (e.g. always) or being directly (as indexicals) determined from the utterance (here, today, this year). The latter examples usually belong to the topic, while the former typically occur in the focus.

As for the verb, it is important to have access to the verb of the preceding utterance: if the main verb of sentence n has the same meaning as (or a meaning included in) that of sentence n-1, then it belongs to the topic; verbs with very general lexical meanings (such as be, have, happen, carry out, become) may also be handled as belonging to the topic. Otherwise (i.e. in the unmarked case), the verb generally belongs to the focus.

3.
In the output of the algorithmic procedure completing the parsing of a written English sentence, many ambiguities remain. However, it is known that sentences (even in their spoken shape) are often ambiguous as for their topic-focus articulation, so that it should be understood as a good result if the procedure identifies such an ambiguity.
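The "main rule" of Section 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SYSTEMIC_ORDER`, `follows_systemic_ordering` and `possible_boundaries` are hypothetical, the list of complementation kinds is taken from the systemic ordering given in Section 1 (reading Effect and Manner as separate items), and the input is assumed to be the sequence of post-verbal complementation kinds in surface order.

```python
# Hypothetical sketch of the "main rule": a topic/focus boundary may be drawn
# at any point after which the remaining (focus) complementations appear in
# accordance with systemic ordering.

# Systemic ordering for English as listed in Section 1 (assumed complete here).
SYSTEMIC_ORDER = [
    "Time", "Actor", "Addressee", "Objective", "Origin", "Effect",
    "Manner", "Directional(from)", "Means", "Directional(to)", "Locative",
]
RANK = {kind: position for position, kind in enumerate(SYSTEMIC_ORDER)}


def follows_systemic_ordering(kinds):
    """True if the kinds appear in non-decreasing systemic-ordering rank."""
    ranks = [RANK[kind] for kind in kinds]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def possible_boundaries(postverbal_kinds):
    """Indices i such that postverbal_kinds[i:] is an admissible focus.

    Everything before index i would then belong to the topic. The focus is
    assumed to contain at least the rightmost complementation.
    """
    return [
        i
        for i in range(len(postverbal_kinds))
        if follows_systemic_ordering(postverbal_kinds[i:])
    ]


if __name__ == "__main__":
    # "... arrived to Nice from Grenoble": Directional(to) before
    # Directional(from) violates systemic ordering, so Nice must be topic.
    print(possible_boundaries(["Directional(to)", "Directional(from)"]))
    # "... found the pen in a box": both boundary positions are admissible.
    print(possible_boundaries(["Objective", "Locative"]))
```

The sketch covers only the ordering constraint; definiteness, specificity and the status of the verb, which the procedure also consults, are left out.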
The algorithm has been formulated as follows:

(a) The input to our part of the parser is assumed to have passed through the preceding parts, by which the dependency structure of the sentence has been identified, so that the underlying dependency relations (valency positions) of the complementations (to the governing verb) are also known.

(b) If the verb occupies the rightmost position in the sentence and its subject is
  (ba) definite (including noun groups with this, one of the, etc.), then the verb belongs to the focus and gets the index f, and its subject belongs to the topic, which we denote by the index t;
  (bb) indefinite, then the subject is (indexed by) f and the verb is t.
In either case, the other complementations are handled according to (cb) below.

(c) If the verb does not occupy the rightmost position, then:
  (ca) the verb itself is understood as t if it has a very general lexical meaning (see above), or as f if its meaning is very specific; otherwise the verb is characterized as intermediate, i.e. ambiguous, abbreviated as t/f;
  (cb) the complementations preceding the verb are denoted as t, with the exception of an indefinite subject and of a specific (i.e. neither general nor highly indexical, see above) Temporal complementation; either of the latter two is characterized as t/f;
  (cc) to the right of the verb,
    (i) if there is a single complementation, and this is a personal pronoun or another definite noun group, then it is t or t/f, respectively;
    (ii) if the rightmost complementation is Temp or Loc, then if it is specific, it is f, and otherwise it is t; if it is another kind of complementation, then if it is indefinite, it is f, and if definite, it is t/f;
    (iii) if there is such an ordered pair A, B to the right of the verb that fails to follow systemic ordering (see Section 2 and the "main rule" above), and B has not been assigned the index t according to (ii), then, for the rightmost such pair, A belongs to the topic (t), and so do all the complementations between A and the verb; the rightmost complementation of the whole sentence is f (only a personal pronoun following another one is t/f in this position), and all those standing between A and the rightmost one are t/f;
    (iv) if (iii) does not apply, then all remaining complementations to the right of the verb are t/f.

(d) If all the complementations have been determined as t, then
  (da) if the verb was t/f after point (ca) and the rightmost complementation is a definite noun group, an indexical word or a pronoun, then this rightmost element gets f (this result is abbreviated as t(f));
  (db) if (da) does not apply, then both the rightmost element of the sentence and its verb get t/f.

(e) The remaining representations containing no f are discarded.

(f) The complementations with the index t are shifted to the left of the verb, those with f to the right of it.

Let us add that our algorithm only determines the appurtenance of an element to the topic or to the focus, but does not specify the underlying word order within the topic.

When implemented (together with a simplified parser), the algorithm was checked with a set of sentences, and it yielded the expected results, cf. the following examples (the notation of which is simplified in that the indices characterizing the underlying structure, cf. (1') above, are left out).

NOTE: Our examples concern written English sentences. In its present form, the algorithm handles only the verb and the parts of the sentence immediately depending on it; more deeply embedded items (esp. adjuncts of nouns) are left aside for the time being.

Examples:

(A) Charles found the pen in a box.
The steps of the analysis (mostly in a simplified notation, without the grammatical indices):
after the application of (a): (Charles)Act find.Pret (pen.indef)Obj (box)Loc
(ca): Charles find.t/f pen box
(cb): Charles.t find.t/f pen box
(cc)(ii): Charles.t find.t/f pen box.f
(iv): Charles.t find.t/f pen.t/f box.f
(f) and resolution of the abbreviation t/f:
Charles.t find.f pen.f box.f (e.g. answering: Why are the children so happy?)
Charles.t pen.t find.f box.f (e.g. answering: How did Charles get the pen?)
Charles.t find.t pen.f box.f (e.g. answering: What did Charles find where?)
Charles.t pen.t find.t box.f (e.g. answering: Where did Charles find the pen?)

(B) A Frenchman proved the theorem.
(a) (Frenchman.Indef)Act prove (theorem)Obj
(ca) Frenchman prove.t/f theorem
(cb) Frenchman.t/f prove.t/f theorem
(cc)(i) Frenchman.t/f prove.t/f theorem.t/f
(e),(f)
prove.f Frenchman.f theorem.f (without topic)
Frenchman.t prove.f theorem.f (e.g. answering: What did Frenchmen achieve in this field?)
prove.t Frenchman.f theorem.f
Frenchman.t prove.t theorem.f
theorem.t prove.f Frenchman.f (i.e. pronounced A Frenchman PROVED the theorem)
Frenchman.t theorem.t prove.f (ditto)
theorem.t prove.t Frenchman.f (e.g. answering: Who proved the theorem?)

(C) At noon Mike awoke.
(a) (noon)Temp (Mike)Act awake
(ba) noon Mike.t awake.f
(cb) noon.t/f Mike.t awake.f
(e),(f)
Mike.t awake.f noon.f
Mike.t noon.t awake.f

(D) Yesterday we arrived to Nice from Grenoble.
(a) (yesterday)Temp (we)Act arrive (Nice)Dir.to (Grenoble)Dir.from
(ca) yesterday we arrive.t/f Nice Grenoble
(cb) yesterday.t we.t arrive.t/f Nice Grenoble
(cc)(ii) yesterday.t we.t arrive.t/f Nice Grenoble.t/f
(cc)(iii) yesterday.t we.t arrive.t/f Nice.t Grenoble.t/f
(e),(f)
yesterday.t we.t Nice.t arrive.f Grenoble.f
yesterday.t we.t Nice.t arrive.t Grenoble.f
yesterday.t we.t Nice.t Grenoble.t arrive.f

(E) Yesterday Bob met her.
(a) (yesterday)Temp (Bob)Act meet (she)Obj
(ca) yesterday Bob meet.t/f she
(cb) yesterday.t Bob.t meet.t/f she
(cc)(i) yesterday.t Bob.t meet.t/f she.t
(d) yesterday.t Bob.t meet.t/f she.t(f)
(e),(f)
yesterday.t Bob.t she.t meet.f (i.e. Yesterday Bob MET her)
yesterday.t Bob.t meet.t she.f (i.e. Yesterday Bob met HER (rather than HIM) or similarly)

References

[Hajičová and Sgall, 1985] Eva Hajičová and Petr Sgall. Towards an automatic identification of topic and focus. Proceedings of the 2nd Conference of the European Chapter of the Association for Computational Linguistics, Geneva, 263-267, 1985.

[Sgall et al., 1986] Petr Sgall, Eva Hajičová and Jarmila Panevová. The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Ed. by J. Mey. Dordrecht: Reidel - Prague: Academia, 1986.
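As a supplementary illustration, steps (e) and (f) of the procedure in Section 3 can be sketched in Python. This is a sketch under stated assumptions, not the authors' implementation: the function name `readings` and the input format (a surface-ordered list of word/index pairs plus the position of the verb) are hypothetical, and complementations are kept in their surface order within topic and within focus, since the algorithm itself does not specify the order within the topic.

```python
from itertools import product


def readings(items, verb_index):
    """Expand t/f indices into the admissible topic-focus readings.

    items: surface-ordered list of (word, index), index in {"t", "f", "t/f"}.
    verb_index: position of the governing verb in items.
    Returns one string per reading, with t-complementations placed to the
    left of the verb and f-complementations to its right (step (f)).
    """
    # Each ambiguous item contributes both alternatives.
    options = [("t", "f") if index == "t/f" else (index,) for _, index in items]
    result = []
    for choice in product(*options):
        if "f" not in choice:
            # Step (e): representations containing no f are discarded.
            continue
        labelled = [(word, c) for (word, _), c in zip(items, choice)]
        verb = labelled[verb_index]
        complementations = labelled[:verb_index] + labelled[verb_index + 1:]
        topic = [x for x in complementations if x[1] == "t"]
        focus = [x for x in complementations if x[1] == "f"]
        ordered = topic + [verb] + focus
        result.append(" ".join(f"{word}.{c}" for word, c in ordered))
    return result


if __name__ == "__main__":
    # Example (A) after step (iv): Charles.t find.t/f pen.t/f box.f
    example_a = [("Charles", "t"), ("find", "t/f"), ("pen", "t/f"), ("box", "f")]
    for reading in readings(example_a, verb_index=1):
        print(reading)
```

Applied to the intermediate result of example (A), the sketch yields the same four readings as listed there; the assignment of the t, f and t/f indices in steps (a) to (d) is assumed to have been done beforehand.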