Hindi-Urdu Phrase Structure Annotation
Rajesh Bhatt and Owen Rambow
January 12, 2009

1 Design Principle: Minimal Commitments

- Binary branching representations.
- Mostly lexical projections (NP, VP, AP, AdvP) and one functional projection, CP.
- Full clauses are treated as VPs; there is no node labelled S or IP.
- Noun phrases are treated as NPs; there are no DPs or PPs.
- Auxiliaries are handled as verbs that combine with VPs.
- CPs are postulated only when there is evidence for a complementizer.
- No structural representation of agreement or case.
- No primitive notion of subject or object.
- We attempt to keep analyses uniform: there is always an X and an XP, even if the XP contains only one word.
- The two arguments of the verb have a canonical word order with respect to each other and to the verb. Deviations from this order are indicated via traces and coindexation.
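Several of these principles are purely structural, so they can be checked mechanically. The sketch below is hypothetical and not part of any released treebank tooling: it assumes trees are nested Python tuples `(label, child, ...)` with words as plain strings, and it uses the label `V'` for the intermediate verbal projection, which is an assumption of the sketch. It checks three principles: no S, IP, DP or PP labels; binary branching; and unary branching only as XP over X, so that a one-word phrase still projects both X and XP.

```python
from typing import Union

# Hypothetical encoding: a tree is a word (str) or a tuple (label, child, ...).
Tree = Union[str, tuple]

# Labels the guidelines avoid: clauses are VPs and noun phrases are NPs.
FORBIDDEN_LABELS = {"S", "IP", "DP", "PP"}


def violations(tree: Tree) -> list[str]:
    """List violations of the minimal-commitment principles in a tree."""
    problems: list[str] = []

    def visit(node: Tree) -> None:
        if isinstance(node, str):  # a word or trace: nothing to check
            return
        label, *children = node
        if label in FORBIDDEN_LABELS:
            problems.append(f"forbidden label {label}")
        if len(children) > 2:
            problems.append(f"{label} has {len(children)} daughters (not binary)")
        if len(children) == 1 and not isinstance(children[0], str):
            # Non-lexical unary branching is allowed only as XP -> X,
            # e.g. a one-word NP is still NP over N.
            child_label = children[0][0]
            if label != child_label + "P":
                problems.append(f"unary {label} -> {child_label}")
        for child in children:
            visit(child)

    visit(tree)
    return problems


# Example (4a), Ram ne kitaab parhii. "V'" is an assumed label.
vp = ("VP",
      ("NP", ("NP", ("N", "Ram")), ("K", "ne")),
      ("V'", ("NP", ("N", "kitaab")), ("V", "parhii")))
print(violations(vp))    # []

# Example (5a) as a flat S, which the guidelines rule out.
flat = ("S", ("NP", ("N", "Ram")), ("NP", ("N", "Dilli")), ("V", "jaaegaa"))
print(violations(flat))  # ['forbidden label S', 'S has 3 daughters (not binary)']
```

The check is deliberately shallow: it says nothing about where arguments sit, only whether the tree uses the restricted label inventory and shape.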
2 Central Assumption

(1) The syntax of Hindi-Urdu clauses structurally distinguishes at most two positions: the [Specifier, VP] position and the [Complement, V] position.

Two distinct arguments: simple transitive

(2) ne kitaab parhii
    Erg book.f read.pfv.f
    'read the book.'

[Tree diagram for (2): a VP whose specifier is the ergative phrase (an NP combined with the case marker K ne) and whose lower verbal projection contains the complement NP kitaab and the verb parhii.]

(3) Diagnostics for the high argument:
a. When unmarked, it controls agreement.
   akhbaar parhegii
   newspaper.m read.fut.fsg
   'will read the newspaper.'
b. In non-finite embedded clauses, it becomes null.
   ne [ Dilli jaanaa] caahaa
   Erg Delhi go.inf want.pfv
   'wanted to go to Delhi.'
c. In the obligational construction, it bears dative case.
   ko akhbaar parhnaa hai
   Dat newspaper read.inf be.prs.sg
   'has to read the newspaper.'
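Under assumption (1), reading off the two argument positions is a purely positional lookup on a binary VP. The helper below is a hypothetical sketch: it assumes the tuple encoding `(label, child, ...)`, a VP of the shape `(VP, specifier, verbal daughter)`, and the label `V'` for an intermediate projection that holds the complement followed by V (Hindi-Urdu being verb-final). It uses example (4a), where the subject is overt.

```python
from typing import Optional, Union

# Hypothetical encoding: a word (str) or a tuple (label, child, ...).
Tree = Union[str, tuple]


def words(node: Tree) -> list[str]:
    """Terminal words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]


def argument_positions(vp: tuple) -> dict[str, Optional[str]]:
    """Read off the high ([Specifier, VP]) and low ([Complement, V]) positions.

    The verbal daughter is either a bare V, in which case no low position is
    filled, or an intermediate node (assumed label "V'") holding the
    complement followed by V.
    """
    label, spec, verbal = vp
    if label != "VP":
        raise ValueError(f"expected a VP, got {label}")
    high = " ".join(words(spec))
    low = None
    if verbal[0] != "V":  # (V', complement, V)
        low = " ".join(words(verbal[1]))
    return {"high": high, "low": low}


# Example (4a), Ram ne kitaab parhii.
transitive = ("VP",
              ("NP", ("NP", ("N", "Ram")), ("K", "ne")),
              ("V'", ("NP", ("N", "kitaab")), ("V", "parhii")))
print(argument_positions(transitive))  # {'high': 'Ram ne', 'low': 'kitaab'}
```

Because case and agreement are not represented structurally, the function reports whatever occupies each position, marked or unmarked.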
(4) Diagnostics for the low argument:
a. When the high argument is marked, the low argument controls agreement.
   Ram ne kitaab parhii
   Ram.m Erg book.f read.pfv.f
   'Ram read the book.'
b. In non-finite embedded clauses, it remains overt and can control agreement.
   ne [ akhbaar parhnaa] caahaa
   Erg newspaper read.inf want.pfv
   'wanted to read the newspaper.'
c. Depending on its specificity and animacy, it can be marked with ko.
   ne akhbaar ko parhaa
   Erg newspaper Acc read.pfv
   'read the newspaper.'

Using these diagnostics, we can distinguish true transitives from pseudotransitives.

(5) a. Ram Dilli jaaegaa
       Ram.m Delhi.f go.fut.msg
       'Ram will go to Delhi.'
    b. Obligational construction: Dilli does not control agreement.
       Ram ko Dilli jaanaa/*jaanii hai
       Ram Dat Delhi go.inf/*go.inf.f be.prs
       'Ram has to go to Delhi.'

In (5b) the high argument Ram ko is dative, so a true low argument should control agreement. Dilli is feminine, yet the verb cannot take the feminine form *jaanii; Dilli is therefore not a low argument, and jaanaa 'go' is a pseudotransitive.
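The agreement diagnostics in (3a) and (4a) amount to a simple rule: the verb agrees with the first unmarked argument, checking the high position before the low one. The sketch below encodes that rule and the obligational-construction test from (5b). The names `Arg`, `agreement` and `passes_low_argument_test` are hypothetical, and the masculine singular fallback when both arguments are marked is an assumption, based on parhaa in (4c).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Arg:
    form: str
    gender: str                   # "m" or "f"
    number: str                   # "sg" or "pl"
    marker: Optional[str] = None  # case marker such as "ne" or "ko"; None if unmarked


# Assumption: when no argument is unmarked, the verb shows masculine singular,
# as with parhaa in (4c).
DEFAULT_AGREEMENT = ("m", "sg")


def agreement(high: Optional[Arg], low: Optional[Arg]) -> tuple[str, str]:
    """Features on the verb: the first unmarked argument, high before low (3a, 4a)."""
    for arg in (high, low):
        if arg is not None and arg.marker is None:
            return (arg.gender, arg.number)
    return DEFAULT_AGREEMENT


def passes_low_argument_test(np: Arg, observed: tuple[str, str]) -> bool:
    """Obligational-construction test (5b).

    The high argument is dative there, so a true unmarked low argument should
    control agreement. The test is only informative when the NP's features
    differ from the default.
    """
    dative_high = Arg("(dative high argument)", "m", "sg", marker="ko")
    return agreement(dative_high, np) == observed


ram, kitaab = Arg("Ram", "m", "sg"), Arg("kitaab", "f", "sg")
print(agreement(ram, kitaab))                          # ('m', 'sg'): high argument controls
print(agreement(Arg("Ram", "m", "sg", "ne"), kitaab))  # ('f', 'sg'): as in (4a)
# (5b): Dilli is feminine but the verb surfaces as jaanaa, not *jaanii.
print(passes_low_argument_test(Arg("Dilli", "f", "sg"), ("m", "sg")))  # False
```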
2.1 Intransitives

When there is only one distinct argument, as in unaccusatives (e.g. TUTnaa 'break') and unergatives (e.g. hãs-naa 'laugh'), the question arises whether there are still two structurally distinguished positions or just one. One possible treatment:

[Tree diagrams: one structure for unergatives and one for unaccusatives; in the unaccusative structure the argument NP_i is coindexed with a trace t_i.]

In principle, a number of diagnostics can help distinguish unaccusatives from unergatives, but in practice the distinction can be tricky. One possibility is to make the distinction at the level of the lexicon/PropBank and to assign both unaccusatives and unergatives the following simplex structure:

[Tree diagram: a simplex VP.]

2.2 Passives

The single argument in the passive of a transitive occupies both the high and the low position.

(6) kek kal banaayaa jaaegaa
    cake tomorrow make.pfv go.fut.msg
    'The cake will be made tomorrow.'
[Tree diagram for (6): the NP kek, indexed i, in the high position; the AdvP kal adjoined; a trace t_i in the low position; the verb complex banaayaa jaaegaa.]

2.3 Ditransitives and Others

The assumption that there are only two structurally distinguished positions means that we do not distinguish between clear cases of adjuncts (temporal, locative, manner, reason, and other adverbials) and cases of putative arguments. These include:

(7) a. Dative arguments in ditransitives
    b. Dative subjects in dative subject verbs
    c. Verbs with quirky objects (instrumentals, locatives)

(8) Ram ne Mina ko kal ek kitaab dii
    Ram Erg Mina Dat yesterday a book give.pfv.f
    'Ram gave a book to Mina yesterday.'
[Tree diagram for (8): the NP Ram with K ne in the high position; the NP Mina with K ko and the AdvP kal adjoined within the verbal projection; the NP ek kitaab (with D ek) as the complement of the verb dii.]

- Note that the dative argument Mina ko and the temporal adverb kal are both represented as adjuncts on the verbal projection.
- The argumenthood of Mina ko could be identified at the lexical level and indicated as a diacritic on the phrase structure. It could also be inherited from the PDG annotation.
- While the distinction between temporal adverbials and dative arguments is a clear one, other cases are harder to adjudicate, for example locative arguments of motion verbs and benefactive arguments of creation verbs.

3 Scrambling and Traces

Traces are used to indicate scrambling.

(9) The basic structure is as discussed earlier.
a. Negation is adjoined to the verbal projection.
b. Adverbs are adjoined to the verbal projections, including VP.
c. If arguments appear in an order other than the order in the basic structure, the reordering is indicated via traces.
d. The reordering of adjuncts does not necessarily involve traces.
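The trace rule in (9c-d) can be sketched for the simplest case, a displaced low argument. This is a hypothetical simplification, not the annotation procedure itself: it assumes the low argument's canonical position is the complement of V, immediately before the verb, so that anything intervening (an adjunct or the high argument) marks it as scrambled. The class `Constituent` and the function `annotate` are illustrative names, and other extractions, such as the possessor in (10d), are not handled.

```python
from dataclasses import dataclass


@dataclass
class Constituent:
    form: str
    role: str  # "high", "low", or "adjunct"; datives count as adjuncts (2.3)


def annotate(constituents: list[Constituent], verb: str) -> str:
    """Linearize a verb-final clause, adding a trace for a scrambled low argument.

    If anything intervenes between the low argument and the verb, the low
    argument is coindexed with a trace t_1 in the complement position (9c).
    Adjuncts get no traces in this sketch, in line with (9d).
    """
    out = [c.form for c in constituents]
    lows = [i for i, c in enumerate(constituents) if c.role == "low"]
    if lows and lows[0] != len(constituents) - 1:
        out[lows[0]] += "_1"  # coindex the displaced argument
        out.append("t_1")     # trace in the canonical complement position
    return " ".join(out + [verb])


# (10c): the dative is an adjunct, so fronting it leaves no trace.
print(annotate([Constituent("Sita ko", "adjunct"),
                Constituent("Ram ne", "high"),
                Constituent("kitaab", "low")], "dii"))
# Sita ko Ram ne kitaab dii

# (4a) with the object fronted, analogous to (10a).
print(annotate([Constituent("kitaab", "low"),
                Constituent("Ram ne", "high")], "parhii"))
# kitaab_1 Ram ne t_1 parhii
```

The first call shows why treating datives as adjuncts matters: a dative-before-ergative order counts as reordering of an adjunct, not of an argument.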
(10) a. [VP akhbaar_i [VP [ t_i parhegii]]]
        newspaper.m read.fut.fsg
        'The newspaper, will read.'
     b. [VP [ akhbaar_i [ kal [ t_i parhegii]]]]
        newspaper tomorrow read.fut.f
        'will read the newspaper tomorrow.'
     c. Putatively non-canonical word order, but no traces:
        [VP Sita ko [VP Ram ne [ kitaab dii]]]
        Sita Dat Ram Erg book.f give.pfv.f
        'Ram gave Sita a book.'
     d. Extraction of the possessor, with traces:
        [VP [Sita kii]_i [VP [VP Ram ne [ [NP t_i kitaab] parhii]] hai]]
        Sita Gen.f Ram Erg book.f read.pfv.f be.prs
        'Ram read Sita's book.' (literally: 'Sita's, Ram read book.')

4 Representation of Null Elements

If there is a low argument and the high argument is missing, the high argument is realized as pro, a silent pronoun. If we have evidence that there is a low argument but it is not overt, then the missing argument is also realized as pro. Other missing elements (adjuncts, datives, etc.) are not explicitly represented.
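The two null-element rules interact, and applying them in the right order matters: once a missing low argument has been realized as pro, the clause has a low argument, so a missing high argument must be pro as well. The sketch below is hypothetical; `fill_null_arguments` and its `low_evidenced` flag (standing in for whatever evidence, such as agreement or the lexicon, shows that a low argument exists) are illustrative names. The guidelines state the high-argument rule only for clauses with a low argument, so the sketch leaves other cases untouched, which is an assumption.

```python
from typing import Optional

PRO = "pro"  # the silent pronoun


def fill_null_arguments(high: Optional[str], low: Optional[str],
                        low_evidenced: bool) -> tuple[Optional[str], Optional[str]]:
    """Apply the null-element rules of section 4 to one clause.

    low_evidenced is a hypothetical flag saying whether there is evidence
    for a low argument.
    """
    # A low argument that is evidenced but not overt is realized as pro.
    if low is None and low_evidenced:
        low = PRO
    # If there is a low argument (overt or pro), a missing high argument is pro.
    if high is None and low is not None:
        high = PRO
    # Missing adjuncts, datives, etc. are not represented, so nothing else is added.
    return high, low


print(fill_null_arguments(None, "kitaab", True))   # ('pro', 'kitaab')
print(fill_null_arguments(None, None, True))       # ('pro', 'pro')
print(fill_null_arguments("Ram ne", None, False))  # ('Ram ne', None)
```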