Improving coverage and parsing quality of a large-scale LFG for German


Christian Rohrer, Martin Forst
Institute for Natural Language Processing (IMS)
University of Stuttgart
Azenbergstr Stuttgart, Germany
{rohrer,forst}@ims.uni-stuttgart.de

Abstract

We describe experiments in parsing the German TIGER Treebank. In parsing the complete treebank, 86.44% of the sentences receive full parses; 13.56% receive fragment parses. We discuss the methods used to enhance coverage and parsing quality and we present an evaluation on a gold standard, to our knowledge the first one for a deep grammar of German. Considering the selection performed by our current version of a stochastic disambiguation component, we achieve an f-score of 84.2%, the upper and lower bounds being 87.4% and 82.3% respectively.

1. Introduction

For realistic applications we need grammars with broad coverage. The broader the coverage, however, the greater the number of possible readings per sentence and the lower the performance. When increasing coverage, we tried to include the most frequent constructions (based on a corpus study) and at the same time to restrict the grammar rules in order to avoid overgeneration. The restrictions are sometimes too heavy, and we lose certain sentences, but the gain in performance clearly justifies the restrictions.

Besides quantity of analyses, one also wants quality. Quality can only be measured by evaluating against a gold standard. Once substantial coverage with high quality has been reached, the problem is to choose the intended reading. Disambiguation of competing syntactic analyses is one of the greatest challenges for computational linguistics. We present first results of experiments with a stochastic disambiguation model.

2. A Broad-Coverage LFG for German

The grammar was developed in the ParGram project (Butt et al., 2002). Besides achieving 50% coverage (Dipper, 2003), the grammar writers concentrated on phenomena discussed in theoretical syntax.
With the advent of treebanks and successful attempts to induce grammars from treebanks, we shifted our focus. In a new project (DLFG [1]), we are concentrating on coverage. The grammar now has 274 LFG style rules, which compile into an automaton with 6,584 states and 22,241 arcs. The grammar uses several lexicons and a guessing mechanism for default lexical entries. The lexicons record mainly subcategorization information. As a form of preprocessing, the grammar uses a cascade of finite-state transducers (Kaplan et al., 2004), mainly for tokenization and morphological analysis. The input sentences are thus processed by a tokenizer, a multi-word transducer, a morphology and a guesser before they are actually parsed. Later we will also include a named entity recognizer (NER). In the current experiments with the gold standard we simulate the NER by manual marking.

[1] Disambiguierung einer Lexikalisch-Funktionalen Grammatik für das Deutsche ('Disambiguation of a Lexical Functional Grammar for German'), research project financed by the DFG (Deutsche Forschungsgemeinschaft, German Research Foundation), grant Ro 245/18-1.

3. Enhancing grammar coverage

3.1. Corpus-based enlargement of grammar coverage

In order to increase the coverage of the grammar, we first had to find out where the grammar was incomplete. We systematically created testsuites extracted from the TIGER Treebank. For instance, we extracted all NPs up to the head, or all NPs which are modified by a (subcategorized) subordinate clause or a verb phrase. We also extracted the trees associated with the corresponding strings in order to determine the frequency of a construction. Most of the examples where our grammar failed involved constructions with very limited frequency. Hence, once a grammar has achieved broad coverage, progress is slow.
There were, however, a few areas where adding new rules really helped to increase coverage:

Coordination

Coordination was one phenomenon of which only the basic instances were covered by the original grammar. We thus introduced new rules for several subtypes of asymmetric or otherwise special coordination.

Coordination of adverbs with PPs

In analogy to predicative constituents like in "he is a Republican and proud of it", which can be handled with a special coordination rule for predicative constituents that allows, e.g., DPs and APs to be coordinated, we account for the coordination of ADVPs and PPs that function as modifiers with a special coordination rule [2], namely

ADVP → ADVP: ; CONJco PP:.

(1) hier und in Berlin
    here and in Berlin
    'here and in Berlin'

[2] For simplicity of presentation, we only present simplified versions of the newly introduced grammar rules.

Subject gap in finite constructions (SGF)

(2) Hierhin kam Hans und hielt seinen Vortrag.
    Here came Hans and gave his talk.
    'Hans came here and gave his talk.'

In these constructions, which have received a lot of linguistic attention since Höhle (1983), the shared subject is in the Mittelfeld of the first conjunct instead of being in the Vorfeld. This means that it is not distributed automatically into the second conjunct. We have implemented an analysis following Frank (2002), who treats SGF coordination as a marked case of CP coordination that can only occur given a very particular information structure. Unlike Frank (2002), we formulate the rule as a coordination of a CP and a Cbar, but this is a detail motivated by efficiency considerations:

CP → CP: ( SUBJ) = ( SUBJ); CONJco Cbar:.

Adverbs and PPs between conjunction and conjunct

Coordinated structures where an ADVP or a PP occurs to the left of the last conjunct, as illustrated in (3), have received much less attention in theoretical linguistics, nor are they, to our knowledge, accounted for in most deep grammars.

(3) Von Monat zu Monat wächst das Angebot und mit ihm auch die Nachfrage.
    From month to month grows the offer and with it also the demand.
    'The offer grows from month to month and so does the demand.'

However, they are relatively frequent in text corpora, so that coverage can be noticeably improved by the introduction of a rule for these constructions. We therefore formulated coordination rules of the following type, where, in the f-annotations, refers to the f-structure of the right sister:

DP → DP: ; CONJco ( { ADVP: ( ADJUNCT); PP: ( ADJUNCT) } ) DP:.

Parentheticals

4-5% of the sentences in the TIGER Corpus contain constituents marked as parentheticals. We introduce parenthetical constructions via a metarule macro. It allows insertion of a parenthetical between any two constituents on the right-hand side of a phrase-structure rule.
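The effect of such a metarule can be pictured with a small Python sketch. This is only an illustration under assumptions: the category name PARENTH, the toy rules and the function names are hypothetical, the sketch inserts at most one parenthetical per rule expansion, and it does not reproduce the actual XLE macro used in the grammar.

```python
from typing import Dict, List

# Hypothetical category name for an optional parenthetical constituent.
PARENTH = "PARENTH"


def insert_parentheticals(rhs: List[str]) -> List[List[str]]:
    """Return the original right-hand side plus one variant per gap
    between two adjacent constituents, with PARENTH inserted in that gap."""
    variants = [list(rhs)]
    for gap in range(1, len(rhs)):
        variants.append(rhs[:gap] + [PARENTH] + rhs[gap:])
    return variants


def apply_metarule(grammar: Dict[str, List[List[str]]]) -> Dict[str, List[List[str]]]:
    """Apply the insertion to every right-hand side of every rule."""
    return {
        lhs: [variant for rhs in alternatives for variant in insert_parentheticals(rhs)]
        for lhs, alternatives in grammar.items()
    }


# Toy grammar (assumed for illustration, not taken from the German LFG).
toy_grammar = {"S": [["DP", "VP"]], "VP": [["V", "DP", "PP"]]}
expanded = apply_metarule(toy_grammar)

for lhs, alternatives in expanded.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs))
```

Writing the insertion once as a rule transformation, rather than adding a parenthetical slot by hand to each rule, keeps the base rules unchanged; that is the design point the metarule macro serves.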
Reported speech without real verbum dicendum

In German newspaper text, sentences like the following occur relatively frequently:

(4) Die Fans waren zunächst irritiert, bewertet Hans die Veränderung der Band.
    The fans were at first irritated, evaluates Hans the change of the band.
    'The fans were confused at first, says Hans, evaluating the change of the band.'

The first clause, Die Fans waren zunächst irritiert, represents reported speech, but the clause which introduces the reported speech does not contain a verb of saying. Bewerten does not subcategorize for a sentential complement. In our example, it takes a subject (Hans) and an object (die Veränderung der Band). The distribution of this kind of construction is the same as the distribution of reportive parentheticals headed by verbs that subcategorize for a COMP. Hence, in addition to COMP, we allow the reported speech before or around a reportive parenthetical to be projected to the semantic function REPORTEDSPEECH. The f-structure associated with (4) is illustrated in figure 1.

[Figure 1: f-structure of (4)]

3.2. Corpus-based restriction of grammar rules

Rule specialization

In the original version of our grammar we tried to write rules as general as possible. For instance, a VP can function as an AP if the head verb is transformed into a participle. Instead of an unrestricted rule AP[+infl] → VP[+infl], with very negative effects on efficiency, we wrote a special rule VP-as-AP, where we limit the number and function of possible constituents and where we exclude recursion in the verbal complex.
This is motivated by the fact that, in the TIGER Corpus, there is not a single occurrence of an AP with a participle head dominating a VP. The exclusion of recursion in deverbal attributive APs has a very positive impact on the efficiency of the grammar because there are numerous forms that can be both an inflected past participle and a past tense form. Consider the following subordinate clause:

(5) Weil er die Frau die Aktien zu verkaufen überredete, ...
    Because he the woman the shares to sell convinced, ...
    'Because he convinced the woman to sell the shares...'

The form überredete can be both a past tense form and a past participle. As the original grammar allows infinitival VPs to be embedded in attributive deverbal APs, it can analyze the string die Aktien zu verkaufen überredete as an inflected AP, and this inflected AP can then be analyzed as a headless DP. This means that a large number of undesired c-structures are built, which are only ruled out when the f-structure constraints are solved. With respect to efficiency, it is a very attractive feature of the revised grammar that these erroneous c-structures are not built in the first place.

Restricting long distance dependencies

Solving the equations which account for long distance dependencies can be very time-consuming. We therefore simplified these equations based on a corpus study, e.g. for extraposed relative clauses.

Restricting rules by number of tokens

We restrict certain rules by limiting the number of tokens covered by the rule. E.g., subjectless insertions like wie früher berichtet ('as previously reported') have only very few words between 'as' and 'reported'.

Generality of the steps taken to enhance grammar coverage

Our section on corpus-based improvement of grammar coverage may create the impression that we tailored the grammar too closely to the TIGER Corpus. We therefore parsed the 20,614 sentences of the NEGRA Corpus. 81.5% of the sentences obtained a full parse and 18.5% a partial parse. These results on the NEGRA Corpus are clearly not as good as the results on the TIGER Corpus, but with a grammar coverage of more than 80%, they show that coverage does not drop dramatically on unseen corpora and that at least most of the measures taken to improve coverage carry over to the unseen data.

4. Robustness

We augmented the standard grammar with a FRAGMENT grammar to collect as much information as possible in cases where a sentence does not get a full parse. The parser returns well-formed chunks like NPs, PPs, VPs, Ss, etc. The grammar has a fewest-chunk method for determining the least fragmented parse. It turned out that the quality of fragment parses can be improved by restricting complex rules (e.g. the S-rule) in the fragment grammar with respect to the standard grammar.

In order to cope with timeouts and memory problems, we use the SKIMMING technique (Riezler et al., 2002). When the amount of time or memory spent on a sentence exceeds a given threshold, XLE skims the constituents whose processing has not yet been completed, i.e. XLE does only a bounded amount of work per subtree. When skimming, we use a restricted version of our grammar.
This is achieved with the help of special OT marks (Frank et al., 2001), so-called SKIMMING NOGOOD marks, which turn off expensive rules like headless NPs, free datives, etc. during skimming.

5. Testing

5.1. Gold standard

We evaluated parse quality on manually validated dependency annotations for 1602 sentences from the TiGer Dependency Bank (Forst et al., 2004). The annotations from the TIGER Treebank were semi-automatically transformed into dependency triples, which were then corrected and extended by human annotators. The TiGer DB encodes the same type of dependency triples as the PARC 700 Dependency Bank (King et al., 2003). The grammatical relations and morphosyntactic features are the ones annotated in the TIGER Treebank, except for systematic changes meant to make the TiGer DB more suitable for parser evaluation.

Parsing quality

In tables 1 and 2, we give the results of two types of parse selection: (1) lower bound: a parse is chosen randomly from the set of parses; (2) upper bound: the parse with the best F-score according to the annotation schema is chosen. F-score is defined as the harmonic mean of precision and recall (f = 2pr/(p+r)). We use the triple encoding and evaluation software of Crouch et al. (2002). Table 1 shows that full parses achieve a noticeably higher f-score than partial parses; this shows that it is crucial to improve coverage to, say, at least 80% in order to parse free text with a reasonable quality. Table 2 gives the upper bound and the lower bound figures for the 1602 gold standard sentences broken down according to the grammatical relations and morphosyntactic features encoded.

Disambiguation

Table 3, finally, gives preliminary results for our stochastic disambiguation component. Two versions of the component are compared with each other and with the upper and lower bound. Both versions are based on maximum entropy models that are trained in a supervised manner on partially labelled data.
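The upper and lower bounds against which the disambiguation component is compared, and the F-score that underlies them, can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the evaluation software of Crouch et al. (2002): the triple format (relation, head, dependent), the function names and the toy parses are hypothetical, and the oracle is applied to a single sentence for simplicity. Only the counts in the last lines (35 matching triples out of 52 proposed and 59 gold triples for op loc under the upper bound) are taken from Table 2.

```python
import random
from typing import FrozenSet, List, Tuple

# Hypothetical triple format: (relation, head, dependent).
Triple = Tuple[str, str, str]


def prf(matched: int, proposed: int, gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F-score (harmonic mean) from raw triple counts."""
    p = matched / proposed if proposed else 0.0
    r = matched / gold if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def f_score(parse: FrozenSet[Triple], gold: FrozenSet[Triple]) -> float:
    """F-score of one candidate parse against the gold triples."""
    return prf(len(parse & gold), len(parse), len(gold))[2]


def upper_bound(parses: List[FrozenSet[Triple]], gold: FrozenSet[Triple]) -> FrozenSet[Triple]:
    """Oracle selection: the candidate with the best F-score."""
    return max(parses, key=lambda parse: f_score(parse, gold))


def lower_bound(parses: List[FrozenSet[Triple]], rng: random.Random) -> FrozenSet[Triple]:
    """Random selection among the candidates."""
    return rng.choice(parses)


# Toy gold standard and two competing analyses (assumed for illustration).
gold = frozenset({("sb", "halten", "Hans"), ("oa", "halten", "Vortrag"), ("det", "Vortrag", "sein")})
good = gold
bad = frozenset({("oa", "halten", "Hans"), ("sb", "halten", "Vortrag"), ("det", "Vortrag", "sein")})
parses = [bad, good]

best = upper_bound(parses, gold)
picked = lower_bound(parses, random.Random(0))

# Counts for op loc (upper bound) as printed in Table 2.
p, r, f = prf(35, 52, 59)
print(f"op loc: precision={p:.3f} recall={r:.3f} f-score={f:.3f}")
```

A disambiguation component is useful to the extent that its selections score above the random choice and approach the oracle choice.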
The training material for both models was the parses of 3,817 sentences from the TIGER Corpus (excluding sentences 8,001 through 10,000). The "all properties" version uses both the kind of property described in Riezler et al. (2002) and a series of new properties that mainly encode information on the linear order of grammatical functions. The "only original properties" version only makes use of the former.

[Table 3: F-scores for selected grammatical relations in the 1602 TiGer DB examples broken down according to parse selection method. Columns: relation; upper bound; all properties for disamb.; only original properties; lower bound. Rows: all, preds only, da, gr, oa, op, op loc, quant, sb, sbp.]

6. Discussion

6.1. Coverage

In order to get a full parse, the input sentence has to be well-formed. At least 1% of the sentences in the testsuite contain spelling mistakes, punctuation errors or grammatical errors. Furthermore, the TIGER annotators sometimes assign full structures to elliptical sentences that lack a clear syntactic head. In order to match the analyses annotated for them, our parser would have to do a lot of structure building, which would lead to overgeneration and inefficiency.

[Table 1: Upper bound and lower bound f-scores for grammatical relations and morphosyntactic features in the 1602 TiGer DB examples broken down according to parse quality. Columns cover all parses, full parses and non-skimmed fragments, full parses, non-skimmed fragments, and skimmed fragments. Rows: % of test set, upper bound, lower bound, avg. sentence length, avg. parse time in sec.]

Among the well-formed sentences which receive a partial parse we have to distinguish three types: (1) constructions for which our grammar contains rules, which, however, are turned off for efficiency reasons (e.g. coordination without an explicit conjunction), (2) constructions for which we do not have rules (e.g., special types of non-constituent coordination, certain parenthetical constructions, heavy ellipsis), (3) sentences which contain lexical material that is not in the lexicon and which our guesser cannot handle (e.g., problems of subcategorization, idioms and collocations). Subcategorization poses problems especially if a multi-word expression (MWE) as a whole subcategorizes for a sentential function like COMP despite the fact that none of its parts subcategorizes for a COMP. This is the case with the MWE zu Protokoll geben ('to put on record'), which subcategorizes for a COMP although neither geben nor Protokoll subcategorizes for a COMP.

Parsing quality

As Table 1 shows, the results for the complete testsuite are quite good. Breaking them down according to parse quality shows that our upper bound for full parses is roughly identical to that of Riezler et al. (2002). Our values for the complete test set are better (87.4% vs. 84.1%) because more sentences of our testsuite receive a full parse. If we subtract the 55 sentences with an average length of 41.7 words that get a partial parse after skimming, we obtain for 96.6% of our testsuite an upper bound of 88.0% and a lower bound of 82.9%. The F-score of our non-skimmed fragment parses is surprisingly high. Only highly elliptical sentences get really bad values.
One explanation for our good values is our detailed subcategorization lexicons.

The figures in table 2 are more informative than the overall F-score. They illustrate that the f-scores for grammatical relations are not as good as those for morphosyntactic features. The lower values for case are due to syntactic ambiguity and are therefore not a purely morphological problem; to a limited extent this is also true for the feature num (number).

In the preds-only evaluation the values for the arguments sb (subject) and oa (accusative object) are better than those for da (dative object) and og (genitive object). So-called free datives are quite frequent in German and, as the name indicates, difficult to predict and to specify in the subcategorization lexicon. We guess free datives and, apparently, we go wrong sometimes. For genitive objects we get bad values because, for efficiency reasons, we require that the genitive be morphologically marked. Furthermore, genitive NPs may be attached to preceding NPs. The figures for sbp (logical subject in passives) are worse than those for grammatical subjects because the PP denoting the logical subject is introduced by von, which has many different functions.

Subcategorized PPs (and ADVPs) are annotated as op (oblique), op dir (directional argument), op loc (locative argument) and op manner (modal argument). The low f-score for subcategorized PPs indicates gaps in the subcategorization lexicon. In addition, this low score has a negative effect on the f-score of mo (modifiers or adjuncts). pds (predicative complements) with the copula sein can be confused with stative passives. E.g., Er ist ihm übergeordnet ('he is superior to him') is analyzed as a stative passive by our grammar and as pd by the annotators. The values for the subcategorized functions oc fin (finite complement clauses) and oc inf (non-finite argument VPs) differ.
The figures for clauses with the function oc fin are lower because clauses introduced by interrogative or relative pronouns in adverbial function can be interpreted as oc fins if the embedding clause contains a word which subcategorizes for such a clause. Furthermore, there is interference with rs (reported speech) and app cl (appositive clauses).

gl (genitive left) denotes possessives and gr (genitive right) denotes genitive adjuncts and von PPs with genitive function. gl constructions are easy to identify because they always precede their head, whereas the analysis of gr ultimately is a semantic problem, at least when it is realized by a von PP. Comparative complements (cc) and relative clauses (rc), which are often extraposed, are difficult to attach to the corresponding head. Coordination (cj) is also notoriously difficult and achieves fairly low values.

Disambiguation

The figures in table 3 show that a selection performed by one of the versions of the stochastic disambiguation component clearly performs better than a random selection (lower bound). We also observe that the "all properties" version of the disambiguation component performs noticeably better than the "only original properties" version. In terms of overall f-score, the gain with respect to the lower bound doubles with the help of the additional properties; for the core grammatical functions, such as oa, sb etc., which are particularly important for the potential construction of a semantic representation on the basis of f-structures, this gain is even far greater. For many of the grammatical functions, the additional properties allow the "all properties" f-score to be closer to the upper bound f-score than to the lower bound f-score. As this is not the case for the "only original properties" f-scores, we believe that property design will be particularly important for the further improvement of the stochastic disambiguation component. A further step that we plan to take and that, as we hope, will improve the results of the stochastic disambiguation, regardless of the properties that are used for it, is the acquisition of more training data.

relation or feature; upper bound (precision, recall, f-score); lower bound (precision, recall, f-score)
all 61213/ / / /70482 = 88.1 = = 82.8 =
preds only 22050/ / / /27328 = = = 76.2 =
ams 0/2 = 0 0 0/2 = 0 0
app 185/268 = /337 = /282 = /336 =
app cl 23/27 = 85 23/77 = /26 = 85 22/77 =
cc 17/23 = 74 17/46 = /20 = 70 14/45 =
cj 1183/1412 = /1806 = /1412 = /1806 =
da 118/190 = /162 = /226 = /162 =
det 3655/3816 = /3938 = /3822 = /3930 =
gl 292/316 = /317 = /305 = /316 =
gr 804/928 = /902 = /897 = /899 =
measured 9/20 = 45 9/24 = /20 = 45 9/24 =
mo 4997/6878 = /6610 = /6946 = /6601 =
mod 2087/2219 = /2228 = /2226 = /2227 =
name mod 336/420 = /385 = /424 = /385 =
number 370/469 = /424 = /456 = /423 =
oa 923/1098 = /1191 = /1104 = /1189 =
oa2 0/1 = 0 0
obj 2916/3213 = /3180 = /3227 = /3174 =
oc fin 151/212 = /226 = /211 = /226 =
oc inf 340/379 = /411 = /387 = /411 =
og 5/5 = 100 5/9 = /5 = 60 3/9 =
op 267/389 = /526 = /377 = /526 =
op dir 29/38 = 76 29/140 = /38 = 53 20/140 =
op loc 35/52 = 67 35/59 = /44 = 52 23/59 =
op manner 6/8 = 75 6/16 = /4 = 50 2/16 =
pd 258/358 = /403 = /358 = /403 =
pred restr 110/121 = /122 = /123 = /122 =
quant 172/195 = /234 = /184 = /234 =
rc 175/212 = /250 = /209 = /250 =
rs 2/19 = 11 2/4 = /19 = 11 2/4 =
sb 2549/3128 = /3274 = /3140 = /3272 =
sbp 35/46 = 76 35/57 = /43 = 65 28/57 =
topic disloc 1/16 = 6 1/3 = /18 = 0 0/3 = 0 0
case 7941/9004 = /9098 = /8991 = /9085 =
circ form 5/8 = 62 5/6 = /8 = 62 5/6 =
comp form 99/115 = 86 99/160 = /111 = 86 96/160 =
coord form 557/613 = /648 = /615 = /648 =
degree 2346/2640 = /2488 = /2668 = /2486 =
det type 3628/3780 = /3779 = /3772 = /3771 =
fut 61/63 = 97 61/71 = /65 = 94 61/71 =
gend 7207/7829 = /7875 = /7850 = /7864 =
mood 2129/2254 = /2366 = /2253 = /2364 =
num 8739/9495 = /9333 = /9510 = /9319 =
pass asp 258/287 = /324 = /287 = /324 =
perf 296/301 = /355 = /299 = /355 =
pers 2392/2621 = /2800 = /2617 = /2796 =
precoord form 7/8 = 88 7/9 = /7 = 86 6/9 =
pron form 71/74 = 96 71/72 = /74 = 96 71/72 =
pron type 1282/1689 = /1482 = /1700 = /1482 =
tense 2145/2240 = /2360 = /2239 = /2358 =

Table 2: Upper bound and lower bound precisions, recalls and F-scores for grammatical relations and morphosyntactic features in the 1602 TiGer DB examples

Comparison with previous work

Our results are comparable to those reported by Riezler et al. (2002) and Cahill et al. (2005) for English. Our score is improved by the fact that we check some morphological information like gender, number or tense, which a good chunker could also identify correctly. In a preds-only evaluation, the figures are lower, but the same tendency is observed with other parsers that are evaluated on dependency-based gold standards.

Dubey and Keller (2003) induce a grammar from the NEGRA Treebank, a predecessor of TIGER. They report a labelled precision and recall of up to 74%. The results for induced grammars seem to be worse for German with its free word order than for English. This also holds for the German LFG induced from the TIGER Corpus (Cahill et al., 2005). The authors report an f-score of 71%. The evaluation is equivalent to ours, i.e. based on dependency triples obtained via conversion from TIGER graphs. The testsuite which functions as a gold standard, however, is fairly small. One of the reasons for the low f-score seems to be the lack of morphological information and the very flat structure of the TIGER graphs. Integrating morphological information would certainly improve the score. The flat structure of the NEGRA and TIGER Treebanks may also have a negative influence on the quality of the induced grammars.

Foth et al. (2005) describe a parsing system for unrestricted German text. Total coverage is achieved by means of defeasible, graded constraints. The authors report an f-score of 87% in an evaluation with the NEGRA Corpus. These are clearly the best results for German so far.
They are also better than those reported by Schiehlen (2003), who achieves an f-score of 81.7% on the NEGRA data. In support of our approach, we would like to mention that our grammar is fully reversible and comes with a full-fledged generator.

7. Conclusion

We have shown that a hand-crafted deep grammar can achieve good results on free text. The next step will be to refine our stochastic disambiguation component. Our grammar can also be used in generation, unlike other large-scale grammars of German.

8. References

Miriam Butt, Helge Dyvik, Tracy H. King, Hiroshi Masuichi, and Christian Rohrer. 2002. The Parallel Grammar Project. In Proceedings of COLING-2002 Workshop on Grammar Engineering and Evaluation, pages 1-7.

Aoife Cahill, Michael Burke, Martin Forst, Ruth O'Donovan, Christian Rohrer, Josef van Genabith, and Andy Way. 2005. Treebank-Based Multilingual Unification-Grammar Resources. Research in Language and Computation.

Richard Crouch, Ronald M. Kaplan, Tracy H. King, and Stefan Riezler. 2002. A comparison of evaluation metrics for a broad-coverage parser. In Proceedings of the LREC Workshop Beyond PARSEVAL - Towards improved evaluation measures for parsing systems, pages 67-74, Las Palmas, Spain.

Stefanie Dipper. 2003. Implementing and Documenting Large-scale Grammars - German LFG. Ph.D. thesis, IMS, University of Stuttgart. Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (AIMS), Volume 9, Number 1.

Amit Dubey and Frank Keller. 2003. Probabilistic Parsing for German using Sister-Head Dependencies. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan.

Martin Forst, Núria Bertomeu, Berthold Crysmann, Frederik Fouvry, Silvia Hansen-Schirra, and Valia Kordoni. 2004. Towards a dependency-based gold standard for German parsers - The TiGer Dependency Bank. In Proceedings of the COLING Workshop on Linguistically Interpreted Corpora (LINC 04), Geneva.
Kilian Foth, Wolfgang Menzel, and Ingo Schröder. 2005. Robust parsing with weighted constraints. Natural Language Engineering, 11(1):1-25.

Anette Frank, Tracy Holloway King, Jonas Kuhn, and John T. Maxwell III. 2001. Optimality Theory Style Constraint Ranking in Large-Scale LFG Grammars. In Peter Sells, editor, Formal and Empirical Issues in Optimality Theoretic Syntax.

Anette Frank. 2002. A (Discourse) Functional Analysis of Asymmetric Coordination. In Proceedings of the 7th International LFG Conference (LFG 05), Athens, Greece. CSLI Publications.

Tilmann Höhle. 1983. Topologische Felder. Ph.D. thesis, University of Cologne.

Ronald M. Kaplan, John T. Maxwell, Tracy H. King, and Richard Crouch. 2004. Integrating Finite-state Technology with Deep LFG Grammars. In Proceedings of the ESSLLI 2004 Workshop on Combining Shallow and Deep Processing for NLP, Nancy, France.

Tracy Holloway King, Richard Crouch, Stefan Riezler, Mary Dalrymple, and Ronald M. Kaplan. 2003. The PARC 700 Dependency Bank. In Proceedings of the EACL Workshop on Linguistically Interpreted Corpora (LINC 03), Budapest.

Stefan Riezler, Tracy Holloway King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell III, and Mark Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics 2002, Philadelphia.

Michael Schiehlen. 2003. Combining Deep and Shallow Approaches in Parsing German. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan.
