A Computational Implementation of Internally Headed Relative Clause Constructions

Jong-Bok Kim (1), Peter Sells (2), and Jaehyung Yang (3)
(1) School of English, Kyung Hee University, Seoul, Korea 10-701, jongbok@khu.ac.kr
(2) Dept. of Linguistics, Stanford University, USA, sells@stanford.edu
(3) School of Computer Engineering, Kangnam University, Kyunggi, 446-70, Korea, jhyang@kangnam.ac.kr

Abstract. The so-called Internally Headed Relative Clause (IHRC) construction, found in the head-final languages Korean and Japanese, has received little attention from a computational perspective even though it occurs frequently in both text and speech. This is partly because there have been no grammars precise enough to allow deep processing of the construction's syntactic and semantic properties. This paper shows that the typed feature structure grammar HPSG, together with the semantic representations of Minimal Recursion Semantics, offers a computationally feasible and useful way of deep-parsing the construction in question.

1 Introduction

In terms of truth conditions, there is no clear difference between a (Korean) IHRC (internally headed relative clause) like (1)a and an EHRC (externally headed relative clause) like (1)b:¹

(1) a. Tom-un [sakwa-ka cayngpan-wi-ey iss-nun kes]-ul mekessta
       Tom-top apple-nom tray-top-loc exist-pne kes-acc ate
       'Tom ate an apple, which was on the tray.'
    b. Tom-un [__ cayngpan-wi-ey iss-nun sakwa]-ul mekessta.
       Tom-top tray-top-loc exist-pne apple-acc ate
       'Tom ate an apple that was on the tray.'

Both describe an event in which an apple is on the tray and Tom eats it. Yet there are several intriguing differences between the two constructions. One crucial difference between the IHRC and the EHRC comes from the fact that

¹ We thank anonymous reviewers for their helpful comments and suggestions. This work was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-005-04-A00056).
² The following abbreviations are used for glosses and feature attributes in this paper: acc (accusative), comp (complementizer), loc (locative), nom (nominative), pne (prenominal), top (topic), etc.

T. Salakoski et al. (Eds.): FinTAL 2006, LNAI 4139, pp. 4 1, 2006. © Springer-Verlag Berlin Heidelberg 2006

the semantic object of mekessta 'ate' in the IHRC example (1)a is the NP sakwa 'apple', buried inside the embedded clause. It is thus the subject of the embedded clause that serves as the semantic argument of the main predicate ([1], [2]).

In the analysis of such IHRCs, the central questions therefore involve (a) the key syntactic properties, (b) the association of the internal head of the IHRC clause with the matrix predicate, so that the head can function as its semantic argument, and (c) the differences between the IHRC and the EHRC. This paper provides a constraint-based analysis within the framework of HPSG (Head-driven Phrase Structure Grammar) and implements it in the existing HPSG grammar for Korean, using the LKB (Linguistic Knowledge Building) system³ to check the computational feasibility of the proposed analysis.

2 Implementing an Analysis

2.1 Syntactic Aspects of the IHRC

One main morphological property of the IHRC construction is shown in (2)b: the embedded clausal predicate must be in the present adnominal form -(n)un, followed by the so-called bound noun kes. This clearly contrasts with the EHRC example (2)a, in which the predicate can carry any of the three prenominal markers of tense information:⁴

(2) a. Tom-i __i ilk-nun/un/ul chayk_i
       Tom-nom read-pres.pne/pst.pne/fut.pne book
       'the book that Tom reads/read/will read'
    b. Tom-un [sakwa-ka cayngpan-wi-ey iss-nun/*ul kes]-ul mekessta
       Tom-top apple-nom tray-top-loc exist-pne kes-acc ate
       'Tom ate an apple, which was (lit. "is") on the tray.'
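The marker restriction in (2) can be stated as a simple check. The following is a minimal Python sketch, assuming that the three markers map to the tense values given in the glosses of (2)a for verbal predicates; the names `PNE_TENSE` and `licensed` are hypothetical and do not come from the authors' grammar.

```python
# Hypothetical encoding of the prenominal (pne) markers, following the
# glosses in (2)a for verbal predicates.
PNE_TENSE = {"nun": "pres", "un": "pst", "ul": "fut"}


def licensed(marker: str, construction: str) -> bool:
    """Return True if `marker` may close the adnominal clause.

    EHRC: any of the three prenominal markers in (2)a.
    IHRC (clause + kes): only the present adnominal form, as in (2)b.
    """
    tense = PNE_TENSE.get(marker)
    if tense is None:
        return False  # not a prenominal marker at all
    if construction == "EHRC":
        return True
    if construction == "IHRC":
        return tense == "pres"
    raise ValueError(f"unknown construction: {construction!r}")


print(licensed("nun", "IHRC"))  # iss-nun kes is allowed: True
print(licensed("ul", "IHRC"))   # *iss-ul kes is ruled out: False
```

In a typed feature structure grammar the same effect would come from kes selecting a complement S with a particular FORM value rather than from a procedural check; the function only makes the contrast between (2)a and (2)b explicit.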
In traditional Korean grammar, kes in the IHRC is called a dependent noun, in that it always requires either a modifying determiner or a modifying clause, even in non-IHRC uses:

(3) a. *(i/ku/ce) kes
       '*(this/that) thing'
    b. *(nay-ka mek-un) kes
       'the thing *(that I ate)'

This close syntactic relation between the clause and the noun kes can also be seen in the fact that, unlike canonical nouns, kes must combine with a preceding adnominal clause:

(4) Na-nun *(kangto-ka unhayng-eyse nao-nun) kes-ul capassta
    I-top robber-nom bank-from come.out-pne kes-acc caught
    'I arrested the robber who was coming out of the bank.'

³ The LKB, freely available as open source (http://lingo.stanford.edu), is a grammar and lexicon development environment for use with constraint-based linguistic formalisms such as HPSG; cf. [3].

⁴ When combined with preceding tense suffixes, these three prenominal markers in the EHRC extend their meanings to denote aspect.

These examples show that the pronoun kes selects an adnominal clause as its complement, and that the IHRC requires a specific inflected form of its predicate. What, then, is the relationship between the whole IHRC clause, including kes, and the matrix verb? To relate the matrix verb to this construction with an internal semantic head, transformational analyses assumed that an empty category such as pro had to be introduced to the right of the adnominal clause, on the assumption that the IHRC is an adjunct clause (Jhang 1991). However, there is ample evidence that the clause is a direct syntactic nominal complement of the matrix predicate.

One strong argument against an adjunct treatment centers on passivization: as shown in (5), an object IHRC clause can be promoted to the subject of the sentence.

(5) [Tom-i talli-nun kes]-i Mary-eyeuyhayse caphiessta
    Tom-nom run-pne kes-nom Mary-by be.caught
    'Tom, who was running, was caught by Mary.'

Another fact bearing on the status of the IHRC comes from stacking: whereas more than one EHRC clause can be stacked, only one IHRC clause is possible:

(6) a. *kyongchal-i [kangto-ka unhayng-eyse nao-nun] [ton-ul hwumchi-n] kes-ul chephohayssta
       police-nom [robber-nom bank-from come.out-pne] [money-acc steal-pne] kes-acc arrested
       (int.) 'The police arrested a thief coming out of the bank, stealing money.'
    b. kyongchal-i [__ unhayng-eyse nao-nun] [ton-ul hwumchi-n] kangto-lul chephohayssta
       police-nom [bank-from come.out-pne] [money-acc steal-pne] robber-acc arrested
       'The police arrested a thief coming out of the bank, stealing money.'

This contrast implies that the adnominal clause in an IHRC has the canonical properties of a complement clause.
Based on these observations, we assume the structure in (7) for the internal and external structure of the IHRC in (1)a:

(7)
VP [hd-comp-ph, SUBJ ⟨NP⟩]
   NP [hd-comp-ph, COMPS ⟨ ⟩]
      S [1]: apple-nom tray-on.top.of-loc exist-pne
      N [COMPS ⟨[1]⟩]: kes-ul
   V [COMPS ⟨NP⟩]: ate
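The head-complement combination in (7), and the stacking contrast in (6), can be illustrated with a toy valence-based rule. This is a minimal Python sketch under simplifying assumptions: a sign is reduced to a category, a COMPS list of category names, and a phonology string; a saturated N stands in for NP; SUBJ is not modelled. `Sign` and `hd_comp` are hypothetical names, not LKB types or rules.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Sign:
    """Simplified sign: category, remaining COMPS, and a phonology string.

    A saturated N (empty COMPS) plays the role of NP here.
    """
    cat: str
    comps: Tuple[str, ...] = ()
    phon: str = ""


def hd_comp(comp: Sign, head: Sign) -> Optional[Sign]:
    """Head-final head-complement phrase: the complement precedes the head.

    The head's first COMPS element must match the complement's category,
    and the complement must itself be saturated.
    """
    if not head.comps or comp.comps or head.comps[0] != comp.cat:
        return None
    return Sign(head.cat, head.comps[1:], f"{comp.phon} {head.phon}")


# Hypothetical signs for (1)a.
clause = Sign("S", (), "sakwa-ka cayngpan-wi-ey iss-nun")
kes = Sign("N", ("S",), "kes-ul")
ate = Sign("V", ("N",), "mekessta")

obj_np = hd_comp(clause, kes)  # kes + its S complement -> NP
vp = hd_comp(obj_np, ate)      # the NP as complement of the matrix verb
print(vp.phon)

# Stacking as in (6)a: once kes has taken one S, its COMPS list is empty,
# so a second adnominal S has nothing to attach to.
inner = hd_comp(Sign("S", (), "ton-ul hwumchi-n"), kes)
stacked = hd_comp(Sign("S", (), "kangto-ka unhayng-eyse nao-nun"), inner)
print(stacked)  # None
```

The design choice mirrors the text: because the IHRC clause is a complement that kes selects exactly once, a second clause cannot be licensed, whereas modifiers (as in EHRC stacking) would not consume a valence slot.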

As represented in the tree, kes combines with its complement clause, forming a hd-comp-ph (head-complement-phrase). The resulting NP in turn functions as the complement of the matrix verb 'ate'.

2.2 Semantic Aspects of the IHRC and Related Constructions

One thing to note is that IHRCs are syntactically very similar to DPCs (direct perception constructions): both function as the syntactic argument of a matrix predicate. However, in the IHRC (8)a, the internal argument John within the embedded clause functions as the semantic argument of 'caught', whereas in (8)b it is the whole embedded clausal complement that functions as the semantic argument of 'saw':

(8) a. Mary-nun [John-i talli-nun kes]-ul capassta.
       Mary-top John-nom run-pne kes-acc caught
       'Mary caught John, who was running.'
    b. Mary-nun [John-i talli-nun kes]-ul poassta.
       Mary-top John-nom run-pne kes-acc saw
       'Mary saw John running.'

The only difference between (8)a and (8)b is the matrix predicate, and it correlates with the difference in meaning. When the matrix predicate is an action verb such as capta 'catch', chepohata 'arrest', or mekta 'eat', as in (8)a, we obtain an entity reading for the clausal complement. But when the matrix predicate is a perception-type verb such as po-ta 'see', al-ta 'know', or kiekhata 'remember', as in (8)b, only an event reading is available. The key point of our analysis is thus that the interpretation of kes depends on the type of the matrix predicate. Hence the lexical entries in our grammar encode not only syntactic but also semantic information. For example, the verb cap-ta 'catch' in (9)a lexically requires its object to be a ref-ind (referential index), whereas the verb po-ta 'see' in (9)b selects an object complement whose index is of type indiv-ind (individual index), whose subtypes include ref-ind and event-ind; that is, its object can be either a referential individual or an event.⁵
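The index-type constraints just described rely on a small type hierarchy. As a minimal Python sketch, assuming the hierarchy contains only indiv-ind with the two subtypes ref-ind and event-ind (the dictionary and function names are hypothetical), type unification can be computed as the greatest lower bound of two types:

```python
from typing import List, Optional

# Hypothetical fragment of the index-type hierarchy described above:
# indiv-ind is the supertype of ref-ind and event-ind.
PARENT = {"indiv-ind": None, "ref-ind": "indiv-ind", "event-ind": "indiv-ind"}


def ancestors(t: str) -> List[str]:
    """Return t followed by all of its supertypes, most specific first."""
    chain = []
    current: Optional[str] = t
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def unify_types(a: str, b: str) -> Optional[str]:
    """Greatest lower bound of two types in a tree-shaped hierarchy.

    In a tree, two types are compatible only if one subsumes the other,
    so the result is the more specific type, or None on failure.
    """
    if a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None


print(unify_types("indiv-ind", "event-ind"))  # po-ta's object can be an event
print(unify_types("ref-ind", "event-ind"))    # cap-ta's object cannot: None
```

The shortcut of comparing ancestor chains is only valid because this fragment is a tree; a hierarchy with multiple inheritance, as real LKB grammars allow, needs a proper greatest-lower-bound computation.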
⁵ The meaning representations adopted here use Minimal Recursion Semantics (MRS), developed by [4]. MRS is a framework of computational semantics designed to enable semantic composition using only the unification of typed feature structures. The value of the attribute SEM(ANTICS) used here is a simplified MRS; a full MRS includes HOOK, RELS, and HCONS. The feature HOOK represents externally visible attributes of the atomic predications in RELS (RELATIONS). The value of LTOP is the local top handle, i.e., the handle of the relation with the widest scope within the constituent. The value of XARG is linked to the external argument of the predicate. See [4] and [5] for the exact functions of each attribute. We suppress irrelevant features.
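The MRS attributes described in this note can be mirrored in a small data structure. This is a minimal Python sketch, not an MRS implementation: it assumes that in a headed phrase the mother takes over the head daughter's HOOK and concatenates the daughters' RELS and HCONS, and it omits phrase-level contributions and handle constraints. The class names, the handle labels, and the relation name run_rel are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EP:
    """Elementary predication: relation name plus role-to-variable map."""
    pred: str
    roles: Dict[str, str]


@dataclass
class Hook:
    ltop: str                   # local top handle
    index: str                  # distinguished variable of the constituent
    xarg: Optional[str] = None  # external argument, if any


@dataclass
class MRS:
    hook: Hook
    rels: List[EP] = field(default_factory=list)
    hcons: List[Tuple[str, str]] = field(default_factory=list)  # unused here


def compose(head: MRS, non_head: MRS) -> MRS:
    """Simplified headed composition: keep the head daughter's HOOK and
    concatenate both daughters' RELS and HCONS."""
    return MRS(head.hook, head.rels + non_head.rels, head.hcons + non_head.hcons)


# "John-i talli-nun": John's index i and the running event e1.
john = MRS(Hook("h1", "i"), [EP("named_rel", {"ARG0": "i", "CARG": "John"})])
run = MRS(Hook("h2", "e1", xarg="i"), [EP("run_rel", {"ARG0": "e1", "ARG1": "i"})])
clause = compose(run, john)
print(clause.hook.index, clause.hook.xarg)  # e1 i
```

The point of the sketch is that the embedded S exposes two distinct variables through its HOOK, INDEX (the event) and XARG (the external argument), which is exactly what the two entries for kes can pick between.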

(9) a. cap-ta 'catch':
       SYN|VAL [SUBJ ⟨NP_i⟩, COMPS ⟨NP_j⟩]
       SEM|RELS ⟨[PRED catch_v_rel, ARG0 e1, ARG1 i[ref-ind], ARG2 j[ref-ind]]⟩
    b. po-ta 'see':
       SYN|VAL [SUBJ ⟨NP_i⟩, COMPS ⟨NP_j⟩]
       SEM|RELS ⟨[PRED see_v_rel, ARG0 e1, ARG1 i[ref-ind], ARG2 j[indiv-ind]]⟩

These lexical entries then project an identical syntactic structure for (8)a and (8)b, represented together in (10):

(10)
VP
   NP [INDEX [1]]
      S [INDEX e1, XARG i]: John_i-nom run-pne
      N [COMPS ⟨S⟩, INDEX [1]]: kes-ul
   V [COMPS ⟨NP[INDEX [1]]⟩]: caught/saw

As represented in the structure, in both constructions kes selects an adnominal S as its complement and forms a hd-comp-ph with it. The resulting NP serves as the complement of the main verb caught or saw. Semantically, however, due to the lexical entries in (9), the object of caught is linked to the external argument (XARG) John, whereas that of saw in (9)b is linked to the event denoted by the S.⁶ The type of predicate thus determines whether the INDEX value of kes is identified with that of the S or with that of its XARG, as encoded in the two lexical entries for kes:

(11) a. kes:
        SYN [HEAD|POS noun, VAL|COMPS ⟨S[INDEX e1]⟩]
        SEM|HOOK|INDEX e1
     b. kes:
        SYN [HEAD|POS noun, VAL|COMPS ⟨S[XARG i]⟩]
        SEM|HOOK|INDEX i

⁶ The feature XARG refers to the external argument, as in control constructions like John tries to run, where the XARG of run is identified with the matrix subject John. See [5] for details.
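The interaction between the two kes entries in (11) and the object constraints in (9) can be simulated directly. In this minimal Python sketch, structure sharing is modelled by having each hypothetical entry function (`kes_11a`, `kes_11b`) return the very same (variable, type) pair that the embedded S exposes, and the verb's ARG2 type is checked by type unification over the same small hierarchy as before; all names are illustrative assumptions, not the grammar's actual identifiers.

```python
from typing import Optional

PARENT = {"indiv-ind": None, "ref-ind": "indiv-ind", "event-ind": "indiv-ind"}


def unify_types(a: str, b: str) -> Optional[str]:
    """GLB in the tree-shaped index hierarchy (None if incompatible)."""
    def chain(t):
        out = []
        while t is not None:
            out.append(t)
            t = PARENT[t]
        return out
    if a in chain(b):
        return b
    if b in chain(a):
        return a
    return None


# ARG2 (object) type required by each verb, following (9).
ARG2_TYPE = {"cap-ta": "ref-ind", "po-ta": "indiv-ind"}

# The embedded S "John-i talli-nun": INDEX is the running event e1,
# XARG is John's index i. Each value is a (variable, type) pair.
S = {"INDEX": ("e1", "event-ind"), "XARG": ("i", "ref-ind")}


def kes_11a(s):
    """(11)a: kes shares its INDEX with the INDEX of its S complement."""
    return s["INDEX"]


def kes_11b(s):
    """(11)b: kes shares its INDEX with the XARG of its S complement."""
    return s["XARG"]


def object_of(verb: str, kes_entry, s) -> Optional[str]:
    """Variable the verb's ARG2 is identified with, or None if the object
    NP built from this kes entry violates the verb's type constraint."""
    var, typ = kes_entry(s)
    return var if unify_types(ARG2_TYPE[verb], typ) is not None else None


print(object_of("cap-ta", kes_11b, S))  # i: Mary caught John
print(object_of("cap-ta", kes_11a, S))  # None: an event cannot be caught
print(object_of("po-ta", kes_11a, S))   # e1: Mary saw the running event
```

In the grammar itself this selection happens by unification during parsing rather than by trying entries one at a time; the loop-free function simply makes visible why cap-ta fails with (11)a: event-ind and ref-ind have no common subtype.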

This grammar, in which lexical information interacts with the other syntactic components, ensures that the perception verb saw combines with an NP projected from (11)a, whereas the action verb caught combines with an NP projected from (11)b; otherwise, the resulting structure would not satisfy the selectional restrictions of the predicates. Incorporating this analysis into our Korean grammar,⁷ we implemented it in the LKB and obtained the following two parse trees and MRSs for the two examples:

⁷ The current Korean Resource Grammar has 94 type definitions, 6 grammar rules, 77 inflectional rules, 1100 lexical entries, and 100 test-suite sentences, and aims to expand its coverage of real-life data.

Leaving aside the irrelevant parts, we can see that the two have identical syntactic structures but different semantics. In the former, the ARG0 value of kes is identified with that of named_rel (for John), whereas in the latter it is identified with that of run_rel. The analysis thus provides a clean account of the complementary distribution of the IHRC and the DPC: we obtain an entity reading when the index value of kes is identified with that of the external argument, and an event reading when the index value is structure-shared with that of the adnominal S. It therefore correctly predicts that there are no cases where both readings are available simultaneously.

One welcome prediction of this analysis is that the canonical antecedent of the pronoun kes is the external argument:

(12) [haksayng-i aktang-ul cha-nun kes-ul] capassta
     student-nom rascal-acc kick-pne kes-acc caught
     '(I) caught a student, who was then kicking a rascal.'

Even though one can catch either a student or a rascal, the semantic object of the verb catch is not the embedded object aktang 'rascal' but the external argument haksayng 'student' (this is attested by our implementation, but the output is not included here for reasons of space).

3 Discussion and Conclusion

The analysis we have presented, part of a typed feature structure HPSG grammar for Korean aimed at working with real-world data, has been implemented in the LKB (Linguistic Knowledge Building) system to test its performance and feasibility. We first inspected the Sejong Treebank Corpus (,95 sentences) and identified 4,610 sentences with [S[FORM nun] + kes]. Of these, we inspected the 518 ACC-marked examples, but found only IHRC examples. Another 154 examples used kes in a cleft construction, and 61 were direct perception examples. Among these, we selected canonical types of the IHRC construction to check whether the grammar can parse them both in terms of syntax and semantics.
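A search of the kind described above, for an adnominal -nun form followed by kes, can be approximated on plain romanized text. This Python sketch is an assumption throughout: it uses hyphen-separated romanized morphemes as in the glossed examples, not the Sejong Treebank's own annotation format, and `ADN_KES` and `kes_case_counts` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical input: romanized sentences with morphemes separated by "-",
# as in the glossed examples; the Sejong Treebank format is different.
ADN_KES = re.compile(r"\S+-nun\s+kes(?:-(\w+))?\b")


def kes_case_counts(sentences):
    """Count '<word>-nun kes(-CASE)' sequences, keyed by the marker on kes.

    Surface -nun is also the topic marker, so every hit still needs
    manual inspection before it can be classified as IHRC, cleft, or DPC.
    """
    counts = Counter()
    for sentence in sentences:
        for match in ADN_KES.finditer(sentence):
            counts[match.group(1) or "bare"] += 1
    return counts


sample = [
    "Tom-un sakwa-ka cayngpan-wi-ey iss-nun kes-ul mekessta",  # (1)a
    "Tom-i talli-nun kes-i Mary-eyeuyhayse caphiessta",        # (5)
]
print(kes_case_counts(sample))
```

Grouping hits by the case marker on kes corresponds to the first filtering step described in the text (for example, isolating the ACC-marked cases); separating IHRCs from clefts and direct perception examples still requires inspecting the matrix predicate, which a surface pattern cannot do.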
As we have shown in section 2.2, the grammar is quite successful in picking up the appropriate semantic head of the IHRC. Issues of course remain: extending the coverage of our grammar to parse more real-life data, and identifying other constructional types of kes, such as cleft usages. Any grammar aiming at real-world applications needs to provide a correct syntax from which semantic representations can be built compositionally. In addition, these semantic representations must be rich enough to capture compositional as well as constructional meanings. In this respect, the analysis we have sketched here seems promising: it provides appropriate semantic representations for the IHRC and the DPC in a compositional way, suitable for applications requiring deep natural language understanding.

References

1. Kim, Y.B.: Relevance in internally headed relative clauses in Korean. Lingua 112 (2002) 541–559
2. Chung, C., Kim, J.B.: Differences between externally and internally headed relative clause constructions. In: Proceedings of HPSG 00, CSLI Publications (00) 5
3. Copestake, A.: Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford (2002)
4. Copestake, A., Flickinger, D., Sag, I., Pollard, C.: Minimal recursion semantics: An introduction. Manuscript (00)
5. Bender, E.M., Flickinger, D.P., Oepen, S.: The grammar matrix: An open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In: Carroll, J., Oostdijk, N., Sutcliffe, R. (eds.): Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, Taipei, Taiwan (2002) 8–14