Towards a Syntax-Semantics Interface for Topological Dependency Grammar

Abstract

We present the first step towards a constraint-based syntax-semantics interface for Topological Dependency Grammar (TDG) (Duchier and Debusmann, 2001). We extend TDG with a new level of representation called the semantic dependency dag to capture deep semantic dependencies, and clearly separate this level from the syntactic dependency tree. We stipulate an emancipation mechanism between these levels that relates semantic arguments to their syntactic realizations, and demonstrate its application with an account of raising and control constructions.

1 Introduction

Duchier and Debusmann (2001) introduced Topological Dependency Grammar (TDG), a lexicalized grammatical formalism for dependency grammar, to account for the challenging word order phenomena in free word order languages such as German. In this article, we describe the first step towards a syntax-semantics interface for TDG, namely recovering the deep semantic dependencies. We elaborate on ideas outlined in (Korthals and Debusmann, 2002). TDG as in (Duchier and Debusmann, 2001) explains linearization phenomena through the interaction of two structures, similar to (Gerdes and Kahane, 2001): a non-ordered tree of syntactic dependencies, where edges are labeled by grammatical functions, and an ordered and projective tree of topological dependencies, where edges are labeled by topological fields. TDG stipulates that they must be related through an emancipation mechanism which allows a word to climb up and land in the topological domain of a syntactic ancestor. We propose to follow the same methodology in the recovery of deep semantic dependencies: in addition to the syntactic tree and the topological tree, we now introduce the semantic dependency dag, a directed acyclic graph whose edges are labeled by semantic roles.
Again we stipulate that the semantic dependency dag must be related to the syntactic dependency tree through an emancipation mechanism, which allows a semantic argument to be realized higher up in the syntax tree. Finally, we demonstrate how this simple mechanism suffices to model control and raising constructions.

2 Phenomena

The level of syntactic dependency is often regarded as being close to semantics, and indeed there are many examples where syntactic and semantic argument structure match:

Mary loves him (1)

Assuming the Praguian set of thematic roles, the subject Mary of loves corresponds to the deep subject or actor, and the direct object him to the deep object or patient. Here is a corresponding logical form for (1):

love(mary, he)
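The match between the two structures can be made concrete by encoding each analysis as a set of labeled edges and reading the logical form off the semantic structure. The following Python sketch is our own illustration, not part of the TDG formalism; the triple encoding and the name `logical_form` are assumptions:

```python
# Analyses as sets of (head, label, dependent) edges.  The ID tree of (1)
# uses grammatical functions (subj, obj); the semantic structure uses the
# Praguian roles act (actor) and pat (patient).  Illustrative encoding only.
ID_EDGES = {("loves", "subj", "Mary"), ("loves", "obj", "him")}
TH_EDGES = {("loves", "act", "Mary"), ("loves", "pat", "him")}

def logical_form(head, edges):
    """Read a predicate-argument term off the semantic edges below `head`."""
    role_order = {"act": 0, "pat": 1}  # actor argument first, then patient
    args = sorted(((lbl, dep) for h, lbl, dep in edges if h == head),
                  key=lambda arg: role_order[arg[0]])
    if not args:
        return head  # leaf node: the word itself
    return f"{head}({', '.join(logical_form(dep, edges) for _, dep in args)})"

print(logical_form("loves", TH_EDGES))  # loves(Mary, him)
```

Because syntactic and semantic argument structure coincide here, the same traversal over the ID edges (with subj ordered before obj) would produce the same term; the raising and control constructions below break exactly this symmetry.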
2.1 Raising

When we consider raising constructions, we observe a mismatch between syntactic and semantic argument structure, e.g.:

Mary seems to laugh (2)

The corresponding logical form is:

seem(laugh(mary))

On the syntactic level, seems has a subject but laugh has none, while in the semantic argument structure, laugh has an actor (Mary) but seems does not. We say that the subject of the embedded verb laugh is realized as the subject of the raising verb seems. This phenomenon is called subject-to-subject raising. If the embedded verb has no actor, the subject of a raising verb must be an expletive:

It(expl) seems to rain (3)
*Seems to rain (4)
*Mary seems to rain (5)

The subjects of embedded verbs can also be realized by direct objects. This phenomenon is called subject-to-object raising. For example:

Mary believes him to laugh (6)

Here, the subject of laugh climbs up to believes and is realized as its object. The corresponding logical form is:

believe(mary, laugh(he))

2.2 Control

Control verbs are similar to raising verbs: they also realize the subject of an embedded verb as one of their complements. Contrary to raising verbs though, control verbs additionally assign this subject a semantic role:

Mary tries to laugh. (7)

The subject Mary of tries is not only the actor of laugh, but also of the control verb tries itself:

try(mary, laugh(mary))

This phenomenon is called subject-to-subject control. Control verbs cannot embed verbs without an actor, such as rain:

*It(expl) tries to rain (8)
*Tries to rain (9)
*Mary tries to rain (10)

The subject of an embedded verb can also be realized by other syntactic functions than subject. For example, here is a case of subject-to-object control:

Mary persuades him to laugh (11)

where the subject of laugh is realized as the object of the control verb persuades. Here is the corresponding logical form:

persuade(mary, he, laugh(he))

3 TDG Framework

In this section, we provide an informal introduction to the TDG framework and describe our proposed extension.
For a more theoretical presentation of TDG's formal foundations, we refer the reader to (Duchier, 2001). A TDG analysis consists of a lexical assignment and three structures: the syntactic dependency tree (ID for immediate dominance), the topological dependency tree (LP for linear precedence) and the semantic dependency dag (TH for thematic), which are formed from the same set of nodes (one for each word of the input) but different sets of edges. In this article, we ignore the LP tree and focus solely on the core of our proposal, namely the new TH dag and its relation to the ID tree.

3.1 Syntactic dependency tree

The ID tree is a non-ordered tree of syntactic dependencies where edges are labeled with grammatical functions such as subj for subject or obj for object. The ID tree level closely corresponds to the analytical layer in FGD (Sgall et al., 1986), to the f-structure in LFG (Bresnan and Kaplan, 1982) and to the DEPS level in new versions of HPSG as
e.g. in (Malouf, 2000). Below, we show an example ID tree analysis of (12):

Mary tries to laugh. (12)

[ID tree (13): tries -subj-> Mary, -vinf-> laugh; laugh -part-> to]

For this paper, we assume the following set G of grammatical functions:

G = {subj, obj, vinf, part}

corresponding respectively to subject, object, to-infinitival complement and to-particle.

3.2 Semantic dependency dag

The TH dag is a directed acyclic graph of semantic dependencies. Like the ID tree, it is non-ordered, and its edges are labeled by semantic roles such as act for actor (deep subject) and pat for patient (deep object). The TH dag level closely corresponds to the tectogrammatical layer in FGD, to the a-structure in LFG, and to the ARG-ST in new versions of HPSG. Here is an example TH dag of sentence (12):

[TH dag (14): tries -act-> Mary, -th-> laugh; laugh -act-> Mary]

act denotes the actor (deep subject), pat the patient (deep object), and th a general relationship. We assign the latter to the predicates embedded under control and raising verbs, as we could not find a suitable dependency relation for those predicates in the FGD literature.

3.3 Lexical constraints

A TDG analysis is constrained by an assignment of lexical entries to nodes. A lexical entry has the signature:

inID : 2^G
outID : 2^(Π G)
inTH : 2^S
outTH : 2^(Π S)
linkTH : S → 2^G
raisedTH : 2^G

For example, the lexical entry for the infinitive laugh is:

laugh = [ inID : {vinf}
          outID : {part}
          inTH : {th}
          outTH : {act}
          linkTH : {act ↦ {subj}}
          raisedTH : {} ]

In the remainder of the section, we explain these features and state the principles according to which a lexical assignment simultaneously constrains the ID tree, the TH dag, and the emancipation relationship between them, and thus restricts the admissible analyses. We do not commit ourselves to a particular set of semantic roles. For this article, we adopt a subset of the Praguian dependency relations developed for the dependency grammar formalism of FGD (Functional Generative Description) (Sgall et al., 1986).
These dependency relations are also used for the annotation of the tectogrammatical layer of the Prague Dependency Treebank (PDT) (Böhmová et al., 2001), and in the dependency grammar formalism of Dependency Grammar Logic (DGL) (Kruijff, 2001):

S = {act, pat, th}

Incoming edge principle. In every structure of the analysis, the incoming edge (if any) of a node must be licensed by the corresponding in feature. The inID and inTH features license the incoming edge in the ID tree and in the TH dag, respectively. Thus, in the ID tree laugh only accepts an incoming edge labeled vinf; we say that laugh has grammatical function vinf. In the TH dag, laugh only accepts an incoming edge labeled th; we say that laugh fills the semantic role th.

Outgoing edges principle. In every structure of the analysis, the outgoing edges of a node must
satisfy in label and number the stipulation of the corresponding out feature. The outID and outTH features provide this stipulation for the ID tree and the TH dag, respectively. Thus, in the ID tree, laugh requires precisely one outgoing edge labeled part for the to-particle and admits no other. In the TH dag, laugh requires one outgoing edge labeled act for the actor and no other. Notice that outID corresponds to the notion of subcategorization, and outTH to valency. The stipulation of an out feature is expressed as a set of label patterns. Given a set L of labels, we write Π L for the set of label patterns π that can be formed according to the following abstract syntax:

π ::= l | l? | l*   (l ∈ L)

These patterns are used to distinguish obligatory and optional complements: l means precisely one edge labeled l (obligatory), l? means at most one (optional), and l* means zero or more. We now turn to a principle that relates the TH dag to the ID tree.

Linking principle. The semantic arguments of a node, i.e. its dependents in the TH dag, must be syntactically realized in the ID tree with grammatical functions stipulated in the linkTH feature. linkTH describes a mapping between semantic roles and the sets of grammatical functions which may realize them. In the example lexical entry above, the actor of laugh (act) must be realized as a subject (subj). Implicitly, the other semantic roles are mapped to the empty set, i.e. they cannot be realized by any grammatical function. The linkTH feature and the linking principle are more thoroughly discussed in (Korthals and Debusmann, 2002). The remaining features concern the emancipation principle, which covers both raising and control constructions and explains how a semantic argument of an embedded verb can be realized as a syntactic dependent of a dominating raising or control verb. We illustrate it with the lexical entry for the control verb tries:

tries = [ inID : {}
          outID : {subj?, vinf}
          inTH : {}
          outTH : {act, th}
          linkTH : {act ↦ {subj}, th ↦ {vinf}}
          raisedTH : {subj} ]

Emancipation principle.
(a) Only subjects may emancipate. (b) An emancipated subject must be realized in a raising/control position. (c) A raising/control position must realize the emancipated subject of at least one embedded verb.

More precisely, stipulation (a) states that a semantic argument of a word w must also be realized as a syntactic dependent of w, except if it is linkTH-mapped to subj in w's lexical entry, in which case it may emancipate and be realized higher up in the ID tree. For (b), the feature raisedTH indicates available raising/control positions: thus subj is a control position for tries. Note that for simplicity, we drop the feature blocksTH and the corresponding barriers principle. The barriers principle prohibits nodes from climbing too far up, e.g. through finite verbs.

3.4 Lexical inheritance

TDG is a highly lexicalized grammar formalism, and in order to express linguistic generalizations, we make use of a mechanism of lexical inheritance. This mechanism is thoroughly described in (Debusmann, 2001) and allows us to compose lexical entries from a number of lexical types (prefixed with t) using lattice operations. For instance, we obtain the lexical entry for tries (as given above) as follows:

tries = t finite ∧ t th vinf ∧ t subj c to subj

where t finite is the lexical type for finite verbs, t th vinf the type for verbs whose infinitival complement realizes a general relationship (th), and t subj c to subj the type for subject-to-subject control verbs. Inheritance amounts to set intersection for the features inID and inTH, and to set union for outID, outTH, linkTH and raisedTH. Omitted features are assigned
a default value (lattice top): the full set of labels for inID and inTH, and the empty set for all other features.

4 Grammar fragment

In this section, we present a grammar fragment covering the phenomena outlined in section 2. The grammar fragment mainly consists of a number of lexical types from which we obtain the individual lexical entries. An overview showing how the lexicon is obtained from the lexical types is given in Table 1, and the lexical entries themselves are displayed in Table 2.

4.1 Nouns

We begin with the lexical type for nouns (t noun):

t noun = [ inID : {subj, obj}
           inTH : {act, pat} ]

That is: on the syntactic level, nouns may have either the grammatical function subject (subj) or object (obj), while on the semantic level, they may fill either the actor (act) or patient (pat) role. The following type describes nouns which can only be objects:

t obj = t noun ∧ [ inID : {obj} ]

An expletive is modeled as a noun that fills no semantic role (the inTH set is empty):

t expl = t noun ∧ [ inID : {subj, obj}
                    inTH : {} ]

4.2 To-particles

The lexical type for to-particles is defined below:

t part = [ inID : {part}
           inTH : {} ]

4.3 Finite verbs

In this fragment, finite verbs are matrix verbs, and thus have no incoming edges and an optional subject:

t finite = [ inID : {}
             outID : {subj?}
             inTH : {} ]

Notice that this lexical type does not lead to overgeneration, due to the interplay of the ID tree and the TH dag: the subject becomes obligatory if it realizes an obligatory semantic role. We provide an example of this idea below in section 5.

4.4 Infinite verbs

For the small grammar fragment described in this article, we only consider to-infinitives requiring a to-particle (part). Their incoming edge label must be vinf in the ID tree, and th in the TH dag:

t infinite = [ inID : {vinf}
               outID : {part}
               inTH : {th} ]

4.5 Linking types

Linking types describe how semantic roles can be realized by grammatical functions (Korthals and Debusmann, 2002). For this fragment, we only define three linking types.
The first (t act subj) realizes the actor by a subject:

t act subj = [ outTH : {act}
               linkTH : {act ↦ {subj}} ]

The second (t pat obj) realizes the patient by an object:

t pat obj = [ outID : {obj}
              outTH : {pat}
              linkTH : {pat ↦ {obj}} ]

The third (t th vinf) states that the embedded predicate of a verb is assigned the semantic role th and must be realized by an infinitive:

t th vinf = [ outID : {vinf}
              outTH : {th}
              linkTH : {th ↦ {vinf}} ]

4.6 Raising

The raising position for subject-to-subject raising verbs is subj:

t subj r to subj = [ raisedTH : {subj} ]

The actor of a subject-to-object raising verb is realized by a subject, and its raising position is obj:

t subj r to obj = t act subj ∧ [ outID : {obj}
                                 raisedTH : {obj} ]
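Before turning to control, the inheritance mechanism of section 3.4 can be sketched over the types defined so far. This is a minimal Python illustration under assumed encodings; the dictionary representation, the function names, and the label spellings used here are ours, not prescribed by TDG:

```python
# Lexical inheritance as lattice operations: set intersection for the
# in-features, set union for the out-, link- and raised-features.
G = {"subj", "obj", "vinf", "part"}   # grammatical functions (assumed names)
S = {"act", "pat", "th"}              # semantic roles (assumed names)

def top():
    """Default entry: full label sets for in-features, empty otherwise."""
    return {"inID": set(G), "inTH": set(S), "outID": set(),
            "outTH": set(), "linkTH": {}, "raisedTH": set()}

def inherit(*types):
    """Compose lexical types into a lexical entry."""
    entry = top()
    for t in types:
        full = {**top(), **t}  # omitted features take their default value
        entry["inID"] &= full["inID"]
        entry["inTH"] &= full["inTH"]
        for feat in ("outID", "outTH", "raisedTH"):
            entry[feat] |= full[feat]
        for role, gfs in full["linkTH"].items():
            entry["linkTH"].setdefault(role, set()).update(gfs)
    return entry

# Types from sections 4.3-4.6 (out-patterns kept as plain strings).
t_finite = {"inID": set(), "outID": {"subj?"}, "inTH": set()}
t_th_vinf = {"outID": {"vinf"}, "outTH": {"th"}, "linkTH": {"th": {"vinf"}}}
t_subj_r_to_subj = {"raisedTH": {"subj"}}

seems = inherit(t_finite, t_th_vinf, t_subj_r_to_subj)
# seems: outID {'subj?', 'vinf'}, outTH {'th'}, raisedTH {'subj'}
```

The intersection for in-features narrows where a word may attach, while the union for out-features accumulates its complements, so each type contributes one independent slice of the final entry.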
4.7 Control

We model control as a special case of raising. Hence, the lexical type for subject-to-subject control verbs inherits from the lexical type for subject-to-subject raising verbs. Contrary to a raising verb, a control verb in addition realizes its actor as a subject:

t subj c to subj = t subj r to subj ∧ t act subj

We model subject-to-object control verbs as subject-to-object raising verbs which require a patient realized as their object:

t subj c to obj = t subj r to obj ∧ t pat obj

5 Application

In this section, we apply the TDG framework and the grammar fragment outlined above to the phenomena laid out in section 2. We begin with the simple sentence where syntactic and semantic argument structure still match:

Mary loves him. (15)

The ID tree and TH dag analyses of (15) are:

[ID tree: loves -subj-> Mary, -obj-> him]
[TH dag: loves -act-> Mary, -pat-> him]

Mary is the actor of loves and him the patient. (16) is an ungrammatical example:

*Loves him. (16)

Although the subject of loves is optional on the ID level, it becomes obligatory because it must realize an obligatory actor on the TH level. Hence (16) is excluded by the outgoing edges principle, because the required actor is missing.

5.1 Raising

We show a subject-to-subject raising example and the corresponding ID tree and TH dag analyses:

Mary seems to laugh. (17)

[ID tree: seems -subj-> Mary, -vinf-> laugh; laugh -part-> to]
[TH dag: seems -th-> laugh; laugh -act-> Mary]

Here, the actor Mary of laugh has climbed up (or emancipated) and is syntactically realized as the subject of the raising verb seems. The latter assigns no semantic role to its subject (i.e. there is no edge from seems to Mary in the TH dag). The to-particle fills no semantic role; it is isolated in the TH dag. Now consider the sentence:

It(expl) seems to rain. (18)

[ID tree: seems -subj-> it, -vinf-> rain; rain -part-> to]
[TH dag: seems -th-> rain]

Here, no emancipation takes place, since the raising verb seems embeds a verb without an actor (rain). The expletive it fills no semantic role and is thus isolated in the TH dag.
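The valency check that rules out (16) can also be sketched operationally. Below is a small checker for the outgoing-edges principle using the pattern syntax of section 3.3 (a bare label l means exactly one edge, l? at most one, l* any number); the function name and encoding are our own assumptions, not part of the TDG implementation:

```python
import re

# Sketch of the outgoing-edges principle: a node's outgoing edge labels
# must satisfy the label patterns of its out-feature (illustrative only).
def satisfies_out(patterns, edge_labels):
    """Check edge labels against a set of patterns like {"subj?", "vinf"}.

    A bare label requires exactly one such edge, "l?" allows at most one,
    "l*" allows any number; unlisted labels are forbidden.
    """
    counts = {}
    for lbl in edge_labels:
        counts[lbl] = counts.get(lbl, 0) + 1
    allowed = {}
    for p in patterns:
        m = re.fullmatch(r"(\w+)([?*]?)", p)
        allowed[m.group(1)] = m.group(2)
    for lbl, n in counts.items():
        mode = allowed.get(lbl)
        if mode is None:                 # label not licensed at all
            return False
        if mode == "" and n != 1:        # obligatory: exactly one
            return False
        if mode == "?" and n > 1:        # optional: at most one
            return False
    # every obligatory (bare) label must actually occur
    return all(counts.get(lbl, 0) == 1
               for lbl, mode in allowed.items() if mode == "")

# "Loves him": on the ID level alone the missing subject is fine (subj?):
print(satisfies_out({"subj?", "obj"}, ["obj"]))   # True
# But on the TH level the obligatory actor of loves is unfilled:
print(satisfies_out({"act", "pat"}, ["pat"]))     # False
```

This is exactly the interplay described above: the ID-level out-feature alone admits the subjectless sentence, and only the TH-level valency, together with the linking principle, forces the subject to appear.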
Since the expletive it does not fill a semantic role, our grammar would also license the following ungrammatical sentence:

*Seems to rain (19)

We can exclude this sentence by various means, for instance by appeal to the topological level, but this goes beyond the scope of this article. We turn to the following ungrammatical sentence:

*Mary seems to rain (20)

Here, the emancipation principle (c) is violated, which requires a raising position (here Mary) to realize the emancipated subject of at least one embedded verb. But rain has no actor, and thus no subject that could emancipate. We turn to an example of subject-to-object raising:

Mary believes him to laugh. (21)
[ID tree (21): believes -subj-> Mary, -obj-> him, -vinf-> laugh; laugh -part-> to]
[TH dag (21): believes -act-> Mary, -th-> laugh; laugh -act-> him]

This time, the subject of the embedded verb laugh climbs up and is realized as the object him of the raising verb believes.

5.2 Control

Next, we discuss an example of subject-to-subject control:

Mary tries to laugh. (22)

The ID tree and the TH dag analyses of this sentence are provided in (13) and (14) above. There, the subject Mary of the embedded verb laugh emancipates and is realized as the subject of the control verb tries. Contrary to a raising verb though, the control verb tries also assigns a semantic role (actor) to the emancipated subject, i.e. Mary fills a semantic role on two verbs at the same time: it is the actor of tries and the actor of laugh. In the TH dag, this is reflected by two edges entering Mary. Control verbs cannot embed verbs without an actor:

*It(expl) tries to rain (23)
*Tries to rain (24)

In our framework, these sentences are not licensed because tries requires an actor on the TH level but does not get one. Thus, the outgoing edges principle is violated. (25) is another ungrammatical sentence:

*Mary tries to rain (25)

Here, the incoming and outgoing edges principles are not violated: tries has an actor (Mary) and rain has no actor, as required in the lexicon. However, the emancipation principle (c) is again violated, since the controlling position Mary does not realize the emancipated subject of an embedded verb. Finally, an example of subject-to-object control:

Mary persuades him to laugh (26)

[ID tree (26): persuades -subj-> Mary, -obj-> him, -vinf-> laugh; laugh -part-> to]
[TH dag (26): persuades -act-> Mary, -pat-> him, -th-> laugh; laugh -act-> him]

Here, the subject of the embedded verb laugh is realized as the object of the control verb persuades.

6 Conclusion

We presented the first steps towards a syntax-semantics interface for the TDG grammar formalism. We extended TDG with a new level of semantic dependencies (the TH dag) which is clearly separated from the purely syntactic dependencies captured in the ID tree. The two levels interact through lexicalized constraints and principles, such as the linking and the emancipation principles.
We illustrated how fairly complex phenomena, such as control and raising, can be modeled as emerging from the interactions of simple constraints. We have implemented a prototype constraint-based parser including the new semantic dependency dag level, which performs very well (Duchier and Debusmann, 2002). In the next step, we plan to augment TDG further in order to build a concurrent syntax-semantics interface to an underspecified semantics using CLLS (Constraint Language for Lambda Structures) (Egg et al., 2001).

Mary = t noun
him = t obj
it = t expl
to = t part
loves = t finite ∧ t act subj ∧ t pat obj
seems = t finite ∧ t th vinf ∧ t subj r to subj
believes = t finite ∧ t th vinf ∧ t subj r to obj
tries = t finite ∧ t th vinf ∧ t subj c to subj
persuades = t finite ∧ t th vinf ∧ t subj c to obj
laugh = t infinite ∧ t act subj
rain = t infinite

Table 1: Obtaining the lexical entries from the lexical types

           inID         outID               inTH   outTH           linkTH                                    raisedTH
Mary       {subj, obj}  {}                  {act, pat}  {}         {}                                        {}
him        {obj}        {}                  {act, pat}  {}         {}                                        {}
it         {subj, obj}  {}                  {}     {}              {}                                        {}
to         {part}       {}                  {}     {}              {}                                        {}
loves      {}           {subj?, obj}        {}     {act, pat}      {act ↦ {subj}, pat ↦ {obj}}               {}
seems      {}           {subj?, vinf}       {}     {th}            {th ↦ {vinf}}                             {subj}
believes   {}           {subj?, obj, vinf}  {}     {act, th}       {act ↦ {subj}, th ↦ {vinf}}               {obj}
tries      {}           {subj?, vinf}       {}     {act, th}       {act ↦ {subj}, th ↦ {vinf}}               {subj}
persuades  {}           {subj?, obj, vinf}  {}     {act, pat, th}  {act ↦ {subj}, pat ↦ {obj}, th ↦ {vinf}}  {obj}
laugh      {vinf}       {part}              {th}   {act}           {act ↦ {subj}}                            {}
rain       {vinf}       {part}              {th}   {}              {}                                        {}

Table 2: The lexical entries

References

Alena Böhmová, Jan Hajič, Eva Hajičová, and Barbora Hladká. 2001. The Prague Dependency Treebank: Three-level annotation scenario. In Treebanks: Building and Using Syntactically Annotated Corpora.

Joan Bresnan and Ronald Kaplan. 1982. Lexical-functional grammar: A formal system for grammatical representation. In The Mental Representation of Grammatical Relations. MIT Press.

Ralph Debusmann. 2001. A declarative grammar formalism for dependency grammar. Master's thesis, University of Saarland.

Denys Duchier and Ralph Debusmann. 2001. Topological dependency trees: A constraint-based account of linear precedence. In ACL 2001 Proceedings.

Denys Duchier and Ralph Debusmann. 2002. Topological Dependency Grammar 1.2. http://www.mozartoz.org/mogul/info/duchier/coli/dg.html.

Denys Duchier. 2001. Lexicalized syntax and topology for non-projective dependency grammar. In MOL 8 Proceedings.

Markus Egg, Alexander Koller, and Joachim Niehren. 2001. The constraint language for lambda structures. Journal of Logic, Language, and Information.

Kim Gerdes and Sylvain Kahane. 2001. Word order in German: A formal dependency grammar using a topological hierarchy. In ACL 2001 Proceedings.

Christian Korthals and Ralph Debusmann. 2002. Linking syntactic and semantic arguments in a dependency-based formalism. In COLING 2002 Proceedings.

Geert-Jan M. Kruijff. 2001. A Categorial-Modal Architecture of Informativity. Ph.D. thesis, Charles University.

Robert Malouf. 2000. A head-driven account of long-distance case assignment. In Grammatical Interfaces in HPSG. CSLI Publications.

Petr Sgall, Eva Hajičová, and Jarmila Panevová. 1986. The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. D. Reidel.