A Lexical Functional Mapping Algorithm

Tamer S. Mahdi¹ and Robert E. Mercer²

¹ IBM Canada, Toronto, Ontario, Canada
tamer@ca.ibm.com
² Cognitive Engineering Laboratory, Department of Computer Science, The University of Western Ontario, London, Ontario, Canada
mercer@csd.uwo.ca

Abstract. Semantic interpretation is the process of mapping a syntactic structure to a context-independent meaning representation. An algorithm is presented which maps the f-structure produced by a Lexical-Functional Grammar parser to thematic roles. The mapping algorithm integrates Lexical Mapping Theory, Grimshaw's theory of prominence, and other semantic theories. The algorithm uses selectional restrictions from WordNet to enforce semantic compatibility among constituents.

1 Linguistic Background

Lexical Mapping Theory (LMT) [2] is the backbone of our algorithm. To it are added other elements linking syntax and semantics which are found in other works in semantic representation: [1,4,5,10,11] and others. In order to combine these theories, we have developed a consistent representation and an algorithm that uses this representation. Due to the size of this paper, we'll only describe the basic elements of LMT. For a comprehensive description of the main linguistic theories involved, see [8].

LMT assumes three levels of structure: a structure representing the underlying organization of argument roles, called argument structure or a-structure, in addition to the f-structure and c-structure of Lexical-Functional Grammar (LFG) [6]. The theory consists of the following four components [2]:

(1) Thematic Structure: The theory postulates a universal hierarchy of thematic roles ordered: agent > beneficiary > recipient/experiencer > instrument > patient/theme > location.
(2) Classification of Syntactic Functions: The theory classifies syntactic functions according to the features [±r] and [±o], summarized by:

                        thematically          thematically
                        unrestricted [-r]     restricted [+r]
  non-objective [-o]    SUBJ                  OBL_θ
  objective [+o]        OBJ                   OBJ_θ

This research was performed while the first author was at The University of Western Ontario and was funded by NSERC Research Grant 0036853.

R. Cohen and B. Spencer (Eds.): AI 2002, LNAI 2338, pp. 303-309, 2002. © Springer-Verlag Berlin Heidelberg 2002
OBL_θ and OBJ_θ abbreviate multiple oblique functions and restricted objects, respectively, one for each semantic role θ: OBL_ben, OBJ_recip, etc. The thematically restricted functions have fixed thematic roles. Subjects and objects, by contrast, may correspond to any thematic role and may even be nonthematic, and hence are thematically unrestricted [-r]. Objects, in turn, have the primitive property of complementing transitive predicators (verbs and prepositions) and not complementing intransitive predicators (nouns and adjectives). This property is called objective [+o]. Obliques are nonobjectlike [-o].

(3) Lexical Mapping Principles: Thematic roles are associated with partial specifications of syntactic functions by means of lexical mapping principles. A constraint on these principles is the preservation of syntactic information: mapping principles can only add syntactic features, not delete or change them. This monotonicity is made possible by underspecification. There are three types of mapping principles [2]:

(a) Intrinsic Role Classifications: These principles associate syntactic functions with the intrinsic meanings of the roles. They include the Agent Encoding Principle (the agent role is encoded as a nonobjective function [-o], and alternates between subject and oblique), the Theme Encoding Principle (a theme or patient role is an unrestricted function [-r] that can alternate between subject and object), and the Locative Encoding Principle (a locative role is encoded as a nonobjective function [-o]). These principles are extended to account for applicative and dative constructions [3]. It is further observed that in some languages (called asymmetrical object languages, e.g. English) at most one role can be intrinsically classified unrestricted [-r]. This constraint is called the Asymmetrical Object Parameter (AOP).
(b) Morpholexical Operations: Morpholexical operations affect lexical argument structures by adding and suppressing thematic roles. For example, the passive suppresses the highest thematic role in the lexical argument structure. The applicative is a morpholexical operation that adds a thematic role. It adds a new thematic role to the argument structure of a verb below its highest role, as in "Mary cooked the children dinner".

(c) Default Role Classifications: The default classifications apply after the entire argument structure has been morpholexically built up. The defaults capture the generalization that the highest thematic role of a verb will be the subject, and lower roles will be nonsubjects. Thus, the general subject default classifies the highest thematic role as unrestricted [-r], and lower roles as restricted [+r].

(4) Well-formedness Conditions:

(a) The Subject Condition: Every lexical form must have a subject.

(b) Function-Argument Biuniqueness: In every lexical form, every expressed lexical role must have a unique syntactic function, and every syntactic function must have a unique lexical role.
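To make components (1) to (3) concrete, the following is a minimal Python sketch, not the authors' implementation. It encodes the thematic hierarchy as ranks, the classification of syntactic functions as feature bundles, and monotonic feature addition over underspecified roles. The names RANK, FUNCTIONS, sort_by_prominence, add_feature and compatible_functions are assumptions introduced only for illustration.

```python
# Component (1): rank of each role on the thematic hierarchy (0 = most prominent).
# Roles separated by "/" in the hierarchy share a rank.
RANK = {
    "agent": 0, "beneficiary": 1, "recipient": 2, "experiencer": 2,
    "instrument": 3, "patient": 4, "theme": 4, "location": 5,
}

# Component (2): every syntactic function is a full assignment of r and o.
FUNCTIONS = {
    "SUBJ": {"r": "-", "o": "-"},
    "OBJ": {"r": "-", "o": "+"},
    "OBL_theta": {"r": "+", "o": "-"},
    "OBJ_theta": {"r": "+", "o": "+"},
}


def sort_by_prominence(roles):
    """Order roles from most to least prominent on the thematic hierarchy."""
    return sorted(roles, key=RANK.__getitem__)


def add_feature(partial, feature, value):
    """Monotonic update: a feature may be added but never changed."""
    if partial.get(feature, value) != value:
        raise ValueError(f"cannot change [{partial[feature]}{feature}] to [{value}{feature}]")
    return {**partial, feature: value}


def compatible_functions(partial):
    """Return the functions consistent with a partial feature specification."""
    return [name for name, feats in FUNCTIONS.items()
            if all(feats[f] == v for f, v in partial.items())]


agent = add_feature({}, "o", "-")                           # Agent Encoding Principle
print(compatible_functions(agent))                          # ['SUBJ', 'OBL_theta']
print(compatible_functions(add_feature(agent, "r", "-")))   # ['SUBJ']
```

An agent carrying only [-o] is still compatible with both subject and oblique, which is the alternation described in (3a). Adding the default [-r] narrows it to the subject.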
2 Computing the Semantic Representation

We have classified verbs, and we have defined the appropriate thematic roles for each verb class. We assume an LFG-based parser, like GWWB [7], has already produced the f-structure for the sentence. Our algorithm performs the mapping between the syntactic functions produced by the parser and the corresponding thematic roles of the sentence's verb. If a match is found, the a-structure for the sentence is produced; otherwise the sentence is rejected. Our system consists of four main parts.

Defining and Interpreting the Lexicon

Defining a lexicon is not an easy task. There are many words to be defined, with the majority of them having multiple senses, and even for a single verb sense, there are different complement structures with varying thematic roles corresponding to each structure. Obviously, defining each of these forms as a separate entry in the lexicon is a tedious task. We need to be able to define a single entry in the lexicon for each verb sense. That entry must capture the various thematic role combinations a verb may take. To do this, we assign optionality codes to the verb's arguments. Together, these codes define the relationships between the arguments. We have designed an algorithm that uses this specification to produce the valid role combinations for each verb.

Verb Mapping

The verb mapping algorithm is an operationalization of LMT. LMT considers only the thematic dimension. To integrate Grimshaw's views [5] on the interaction of the aspectual and thematic dimensions with LMT's rules of mapping, we define an aspectual prominence indicator for each verb class. The thematic roles defined for a verb class are sorted in descending order according to their prominence on the thematic hierarchy.
When the prominence of the roles differs between the thematic and the aspectual dimension, the aspectual prominence indicator is set to the ranking of the aspectually most prominent argument. Before the mapping is started, that argument is swapped with the first argument on the argument list, and the list is re-sorted starting with its second argument to ensure that the arguments are listed in the correct order of prominence. Since we know that the absolutely most prominent argument should be mapped to the subject, we exclude it from LMT's mapping rules by changing its intrinsic classification to [-o], regardless of its current value and regardless of the thematic role of that argument. Once this is done, the mapping algorithm can start.

Sentence Mapping

The mapping described in the previous paragraph is concerned with mapping a verb's thematic roles to the corresponding syntactic functions according to LMT's mapping principles. But mapping a sentence involves more than that. The actual arguments of the sentence being mapped are matched against each of the valid role combinations for the sentence's verb. First, the algorithm checks whether the actual number of arguments in the sentence matches the number of arguments in the particular role combination under consideration. If they don't match, an error is issued; otherwise the
verb mapping as discussed above is performed. The syntactic functions that result from the mapping are then compared to the syntactic functions from the sentence's f-structure (obtained from the parser). If they all match, the mapping is successful and the sentence's a-structure is produced. If one or more syntactic functions don't match, the next valid combination of roles is mapped, until all valid combinations have been considered. If none of the mapped role combinations match the sentence's syntactic functions, the sentence is said to be syntactically ill-formed.

Another consideration when mapping sentences is whether the sentence is in the passive or the active. If the parser indicates that the sentence is in the passive, the number of arguments in the sentence is expected to be one less than in the active. If it is, we proceed with the mapping; otherwise there is an error. After sorting the valid role combinations to ensure that the most prominent argument is at the top of the list, that argument is removed from the top. The mapping then proceeds exactly as with active sentences. Finally, the algorithm performs adjunct mapping, but because of space limitations, we do not show the treatment of adjuncts in this paper.

Selectional Restrictions

Selectional restrictions give us a means of checking and enforcing compatibility between the verb and the conceptual types of its arguments. To accomplish this, we use WordNet [9]. Specifically, we utilize the part of WordNet that organizes nouns in a hierarchical semantic organization going from many specific terms at the lower levels to a few generic terms at the top. Each noun is linked to a superordinate term/concept, and inherits the properties of that concept in addition to having its own distinguishing properties. WordNet is searched for the conceptual types of each of the sentence's arguments.
The conceptual types found are compared with the predefined conceptual types of the verb's arguments. If the former match or are subordinate terms of the latter, the selectional restrictions are met and the sentence is accepted; otherwise the sentence is rejected. By allowing verb arguments to take more than one conceptual type, we overcome the major drawback of selectional restrictions, i.e. being too general or too restrictive. Selectional restrictions are applied in exactly the same way to adjuncts.
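The check described above can be sketched as a walk up a hypernym chain. This is a minimal illustration under stated assumptions: the small HYPERNYM table is hypothetical toy data standing in for WordNet's noun hierarchy, and the function names are invented for the example. The conceptual types given for serve (person for agent and recipient, food and drink for theme) are those stated in this paper.

```python
# Toy hypernym links standing in for WordNet's noun hierarchy (hypothetical data).
HYPERNYM = {
    "waiter": "person", "guest": "person", "child": "person",
    "person": "organism", "organism": "entity",
    "soup": "food", "food": "substance",
    "wine": "drink", "drink": "substance", "substance": "entity",
}


def concepts(noun):
    """Return the noun followed by each of its superordinate concepts."""
    chain = []
    while noun is not None:
        chain.append(noun)
        noun = HYPERNYM.get(noun)
    return chain


def satisfies(noun, allowed_types):
    """True if the noun matches, or is subordinate to, an allowed type."""
    return any(concept in allowed_types for concept in concepts(noun))


def violations(role_types, arguments):
    """List the roles whose argument fails the verb's selectional restrictions."""
    return [role for role, noun in arguments.items()
            if not satisfies(noun, role_types[role])]


# Each role may take more than one conceptual type, as described above.
SERVE = {"agent": {"person"}, "recipient": {"person"}, "theme": {"food", "drink"}}
print(violations(SERVE, {"agent": "waiter", "recipient": "guest"}))  # []
print(violations(SERVE, {"agent": "waiter", "theme": "guest"}))      # ['theme']
```

Representing the allowed types of a role as a set is one simple way to let an argument take more than one conceptual type.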
3 Sentence Mapping Examples

(1) The teacher gave the boy a book

  Verb             Arg1          Arg2        Arg3
  give (active)    the teacher   the boy     a book
  f-structure:     subj          obj         objth
  a-structure:     AG            REC         TH

  GIVE       AG      REC         TH
  Intr.      -o      -r          +o
  Default    -r                  +r
  -----------------------------------
             SUBJ    SUBJ/OBJ    OBJTH
  W.F.       SUBJ    OBJ         OBJTH

To map the sentence, the algorithm uses the sentence's constituents, each constituent's syntactic function (produced by the parser), and the voice (active in this case). The mapping is performed according to LMT. First, the intrinsic classifications apply, assigning [-o] to the agent. Since the recipient role is intrinsically classified [-r], and by AOP (see Section 1) there can only be one role intrinsically classified [-r], the theme takes the alternative [+o] classification. The general default applies, assigning [-r] to the agent, making it the subject, and [+r] to the theme, making it a restricted object, and leaving the recipient underspecified. By function-argument biuniqueness, the recipient becomes the object, since the agent is realized as the subject. Since the syntactic functions produced by the mapping match the syntactic functions obtained from the parser, the sentence is accepted and the a-structure is produced as shown above. Depending on how the optionality codes for give are defined, other role combinations valid for give are also mapped, but they are rejected since the resulting syntactic functions don't match those of the sentence. Hence, only the a-structure shown is produced.

(2) The boy was given a book

  Verb             Arg1       Arg2
  give (passive)   the boy    a book
  f-structure:     subj       objth
  a-structure:     REC        TH

  GIVE       REC         TH
  Intr.      -r          +o
  Default                +r
  --------------------
             SUBJ/OBJ    OBJTH
  W.F.       SUBJ        OBJTH

This is the same sentence as in (1), but in the passive. The agent (the teacher) is suppressed, and the recipient (the boy) is now the subject.
The treatment of the passive is done according to LMT, where the verb's most prominent argument is suppressed. We remove the most prominent argument from the list of the verb's
arguments, and we shift each of the other arguments one position up. Then the mapping is performed and the resulting syntactic functions are compared with the syntactic functions obtained from the parser. Since they match, the sentence is accepted. The mapping details are as follows: the recipient is assigned [-r], and the theme [+o] by AOP. The general default assigns [+r] to the theme, making it a restricted object. The subject condition makes the recipient the subject. In the passive, the number of arguments in the sentence is expected to be one less than in the active. Since this is the case in the example above, the sentence is accepted.

(3) The waiter served the guests

  Verb             Arg1          Arg2
  serve (active)   the waiter    the guests
  f-structure:     subj          obj
  a-structure:     AG            REC
                   PERSON        PERSON

  SERVE      AG      REC
  Intr.      -o      -r
  Default    -r
  -----------------------
             SUBJ    SUBJ/OBJ
  W.F.       SUBJ    OBJ

In the previous examples, we didn't show the use of selectional restrictions, in order to focus on the mapping. Here we show how applying selectional restrictions enhances the mapping. Without applying selectional restrictions, there would be two possible mappings for the sentence above: object to recipient, and object to theme. Serve has many senses; the sense used here is to help someone with food or drink. The conceptual types for the arguments of serve are: person for agent and recipient, food and drink for theme. Obviously, the guests will fit the conceptual type for recipient, but not for theme. Hence, the mapping of object to theme is ruled out, and only the mapping of object to recipient is plausible.

(4) Mary cooked the children

  Verb             Arg1    Arg2
  cook (active)    Mary    the children
  f-structure:     subj    obj

  Error: The theme of "cook" should be a kind of FOOD. "The children" is not a kind of FOOD!

The optionality codes defined for cook disallow it from having a beneficiary without a theme, hence the children cannot be mapped to beneficiary in this example.
Without selectional restrictions, the children would be mapped to theme, and the sentence would be accepted. Applying selectional restrictions compares the conceptual type of the children (i.e. person) with that required of the theme (i.e. food). Since they don't match, the sentence is rejected, and the error message shown above is issued.
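The derivations in examples (1) to (3) can be reproduced with a short Python sketch. It is an illustration under simplifying assumptions, not the authors' implementation: roles are given most prominent first, the first role that is neither agent nor locative receives the intrinsic [-r] and later ones the alternative [+o], underspecified roles are resolved greedily, and the aspectual prominence adjustment, adjuncts and selectional restrictions are left out. A combination with the wrong number of arguments is simply skipped rather than reported as an error. All function names and the GIVE role combinations are hypothetical.

```python
# Feature bundles for the four function classes, listed in the order in
# which this sketch tries them when a role is still underspecified.
FUNCTIONS = {
    "SUBJ": {"r": "-", "o": "-"},
    "OBJ": {"r": "-", "o": "+"},
    "OBJTH": {"r": "+", "o": "+"},
    "OBL": {"r": "+", "o": "-"},
}


def intrinsic(roles):
    """Intrinsic classifications; under the AOP only one role gets [-r]."""
    feats, minus_r_used = [], False
    for role in roles:
        if role in ("AG", "LOC"):
            feats.append({"o": "-"})
        elif not minus_r_used:
            feats.append({"r": "-"})
            minus_r_used = True
        else:
            feats.append({"o": "+"})  # alternative classification
    return feats


def default(feats):
    """Highest role gets [-r], lower roles [+r]; existing r values are kept."""
    return [f if "r" in f else {**f, "r": "-" if i == 0 else "+"}
            for i, f in enumerate(feats)]


def map_verb(roles, passive=False):
    """Map roles to syntactic functions, or return None if ill-formed."""
    if passive:
        roles = roles[1:]  # the passive suppresses the most prominent role
    used, result = set(), []
    for feats in default(intrinsic(roles)):
        free = [fn for fn, full in FUNCTIONS.items()
                if fn not in used and all(full[k] == v for k, v in feats.items())]
        if not free:
            return None  # function-argument biuniqueness cannot be met
        used.add(free[0])
        result.append(free[0])
    return result if "SUBJ" in result else None  # Subject Condition


def map_sentence(role_combinations, parsed, passive=False):
    """Return the first role combination whose mapping matches the parse."""
    for roles in role_combinations:
        if len(roles) - int(passive) != len(parsed):
            continue  # wrong number of arguments for this combination
        if map_verb(roles, passive) == parsed:
            return roles[1:] if passive else roles
    return None  # syntactically ill-formed


GIVE = [["AG", "TH"], ["AG", "REC", "TH"]]  # hypothetical role combinations
print(map_sentence(GIVE, ["SUBJ", "OBJ", "OBJTH"]))         # ['AG', 'REC', 'TH']
print(map_sentence(GIVE, ["SUBJ", "OBJTH"], passive=True))  # ['REC', 'TH']
```

The two calls correspond to examples (1) and (2): in the active the recipient is forced to OBJ because SUBJ is already taken by the agent, while in the passive the same underspecified [-r] role surfaces as SUBJ.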
References

1. A. Alsina. The Role of Argument Structure in Grammar: Evidence from Romance. CSLI Publications, Stanford, CA, 1996.
2. J. Bresnan and J. M. Kanerva. Locative inversion in Chichewa: A case study of factorization in grammar. Linguistic Inquiry, 20:1-50, 1989.
3. J. Bresnan and L. Moshi. Object asymmetries in comparative Bantu syntax. Linguistic Inquiry, 21:147-185, 1990.
4. D. Dowty. Word Meaning and Montague Grammar. Reidel, Dordrecht, 1979.
5. J. Grimshaw. Argument Structure. The MIT Press, Cambridge, MA, 1990.
6. R. Kaplan and J. Bresnan. Lexical-Functional Grammar: A formal system for grammatical representation. In The Mental Representation of Grammatical Relations, pages 173-281. The MIT Press, Cambridge, MA, 1982.
7. R. M. Kaplan and J. T. Maxwell. LFG Grammar-Writer's Workbench. Technical Report, Xerox PARC, 1996.
8. T. Mahdi. A Computational Approach to Lexical Functional Mapping. MSc Thesis, The University of Western Ontario, May 2001.
9. G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3:235-244, 1990.
10. K. P. Mohanan and T. Mohanan. Semantic representation in the architecture of LFG. Paper presented at the LFG Workshop, Grenoble, France, August 1996.
11. Z. Vendler. Linguistics in Philosophy. Cornell University Press, Ithaca, NY, 1967.