Improving coverage and parsing quality of a large-scale LFG for German


Christian Rohrer, Martin Forst
Institute for Natural Language Processing (IMS)
University of Stuttgart
Azenbergstr Stuttgart, Germany
{rohrer,forst}@ims.uni-stuttgart.de

Abstract

We describe experiments in parsing the German TIGER Treebank. In parsing the complete treebank, 86.44% of the sentences receive full parses; 13.56% receive fragment parses. We discuss the methods used to enhance coverage and parsing quality and we present an evaluation on a gold standard, to our knowledge the first one for a deep grammar of German. Considering the selection performed by our current version of a stochastic disambiguation component, we achieve an f-score of 84.2%, the upper and lower bounds being 87.4% and 82.3% respectively.

1. Introduction

For realistic applications we need grammars with broad coverage. The broader the coverage, however, the greater the number of possible readings per sentence and the lower the performance. When increasing coverage, we tried to include the most frequent constructions (based on a corpus study) and at the same time to restrict the grammar rules in order to avoid overgeneration. The restrictions are sometimes too heavy, and we lose certain sentences, but the gain in performance clearly justifies the restrictions.

Besides quantity of analyses, one also wants quality. Quality can only be measured by evaluating against a gold standard. Once substantial coverage with high quality has been reached, the problem is to choose the intended reading. Disambiguation of competing syntactic analyses is one of the greatest challenges for computational linguistics. We present first results of experiments with a stochastic disambiguation model.

2. A Broad-Coverage LFG for German

The grammar was developed in the ParGram project (Butt et al., 2002). In addition to achieving 50% coverage (Dipper, 2003), the grammar writers concentrated on phenomena discussed in theoretical syntax.
With the advent of treebanks and successful attempts to induce grammars from treebanks, we shifted our focus. In a new project (DLFG¹), we are concentrating on coverage. The grammar now has 274 LFG-style rules, which compile into an automaton with 6,584 states and 22,241 arcs. The grammar uses several lexicons and a guessing mechanism for default lexical entries. The lexicons record mainly subcategorization information. As a form of preprocessing, the grammar uses a cascade of finite-state transducers (Kaplan et al., 2004), mainly for tokenization and morphological analysis. The input sentences are thus processed by a tokenizer, a multi-word transducer, a morphological analyzer and a guesser before they are actually parsed. Later we will also include a named entity recognizer (NER). In the current experiments with the gold standard we simulate the NER by manual marking.

¹ Disambiguierung einer Lexikalisch-Funktionalen Grammatik für das Deutsche ("Disambiguation of a Lexical Functional Grammar for German"), a research project financed by the DFG (Deutsche Forschungsgemeinschaft, "German Research Foundation"), grant Ro 245/18-1.

3. Enhancing grammar coverage

3.1. Corpus-based enlargement of grammar coverage

In order to increase the coverage of the grammar we first had to find out where the grammar was incomplete. We systematically created testsuites extracted from the TIGER Treebank. For instance, we extracted all NPs up to the head, or all NPs which are modified by a (subcategorized) subordinate clause or a verb phrase. We also extracted the trees associated with the corresponding strings in order to determine the frequency of a construction. Most of the examples where our grammar failed involved constructions with very limited frequency. Hence, once a grammar has achieved broad coverage, progress is slow.
There were, however, a few areas where adding new rules really helped to increase coverage.

Coordination

Coordination was one phenomenon of which only the basic instances were covered by the original grammar. We thus introduced new rules for several subtypes of asymmetric or otherwise special coordination.

Coordination of adverbs with PPs

Predicative constituents as in "he is a Republican and proud of it" can be handled with a special coordination rule for predicative constituents that allows, e.g., DPs and APs to be coordinated. In analogy to this, we account for the coordination of ADVPs and PPs that function as modifiers with a special coordination rule², namely

ADVP → ADVP: ; CONJco PP:.

(1) hier und in Berlin
    here and in Berlin
    'here and in Berlin'

² For simplicity of presentation, we only present simplified versions of the newly introduced grammar rules.

Subject gap in finite constructions (SGF)

(2) Hierhin kam Hans und hielt seinen Vortrag.
    here came Hans and gave his talk
    'Hans came here and gave his talk.'

In these constructions, which have received a lot of linguistic attention since Höhle (1983), the shared subject is in the Mittelfeld of the first conjunct instead of being in the Vorfeld. This means that it is not distributed automatically into the second conjunct. We have implemented an analysis following Frank (2002), who treats SGF coordination as a marked case of CP coordination that can only occur given a very particular information structure. Unlike Frank (2002), we formulate the rule as a coordination of a CP and a Cbar, but this is a detail motivated by efficiency considerations:

CP → CP: ( SUBJ) = ( SUBJ); CONJco Cbar:.

Adverbs and PPs between conjunction and conjunct

Coordinated structures where an ADVP or a PP occurs to the left of the last conjunct, as illustrated in (3), have received much less attention in theoretical linguistics, and, to our knowledge, they are not accounted for in most deep grammars.

(3) Von Monat zu Monat wächst das Angebot und mit ihm auch die Nachfrage.
    from month to month grows the offer and with it also the demand
    'The offer grows from month to month and so does the demand.'

However, they are relatively frequent in text corpora, so that coverage can be noticeably improved by the introduction of a rule for these constructions. We therefore formulated coordination rules of the following type, where the f-annotations on the optional ADVP or PP include a symbol that refers to the f-structure of the right sister:

DP → DP: ; CONJco ( { ADVP: ( ADJUNCT); PP: ( ADJUNCT) } ) DP:.

Parentheticals

4-5% of the sentences in the TIGER Corpus contain constituents marked as parentheticals. We introduce parenthetical constructions via a metarule macro. It allows the insertion of a parenthetical between any two constituents on the right-hand side of a phrase-structure rule.
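The effect of such a metarule can be sketched in a few lines of Python. This is only an illustration of the idea of inserting an optional parenthetical slot between adjacent right-hand-side constituents; it is not the XLE macro itself. The category name `PARENTH`, both function names, and the assumption that every slot is independently optional are hypothetical and do not come from the grammar.

```python
from itertools import product


def insert_parenthetical_slots(rhs, paren="PARENTH"):
    """Interleave an optional parenthetical slot between adjacent constituents.

    Returns a list of (category, is_optional) pairs.
    """
    slotted = []
    for index, category in enumerate(rhs):
        if index > 0:
            slotted.append((paren, True))    # optional slot between two sisters
        slotted.append((category, False))    # original constituent, obligatory
    return slotted


def expansions(rhs, paren="PARENTH"):
    """List every concrete right-hand side licensed by the slotted rule."""
    slotted = insert_parenthetical_slots(rhs, paren)
    # An optional slot is either realized or left empty.
    choices = [((cat,), ()) if optional else ((cat,),)
               for cat, optional in slotted]
    return [sum(combo, ()) for combo in product(*choices)]


if __name__ == "__main__":
    for variant in expansions(["DP", "VP"]):
        print(" ".join(variant))
```

A right-hand side with n constituents has n - 1 slots and therefore 2^(n-1) concrete variants under this assumption, which is one reason to express the insertion once as a metarule rather than spelling it out in every rule.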
Reported speech without real verbum dicendum

In German newspaper text, sentences like the following occur relatively frequently:

(4) Die Fans waren zunächst irritiert, bewertet Hans die Veränderung der Band.
    the fans were at first irritated evaluates Hans the change of the band
    'The fans were confused at first, says Hans, evaluating the change of the band.'

The first clause, Die Fans waren zunächst irritiert, represents reported speech, but the clause which introduces the reported speech does not contain a verb of saying. Bewerten does not subcategorize for a sentential complement. In our example, it takes a subject (Hans) and an object (die Veränderung der Band). The distribution of this kind of construction is the same as the distribution of reportive parentheticals headed by verbs that subcategorize for a COMP. Hence, in addition to COMP, we allow the reported speech before or around a reportive parenthetical to be projected to the semantic function REPORTEDSPEECH. The f-structure associated with (4) is illustrated in Figure 1.

[Figure 1: f-structure of (4)]

3.2. Corpus-based restriction of grammar rules

Rule specialization

In the original version of our grammar we tried to write rules that were as general as possible. For instance, a VP can function as an AP if the head verb is transformed into a participle. Instead of an unrestricted rule AP[+infl] → VP[+infl], with very negative effects on efficiency, we wrote a special rule VP-as-AP, where we limit the number and function of possible constituents and where we exclude recursion in the verbal complex.
This is motivated by the fact that, in the TIGER Corpus, there is not a single occurrence of an AP with a participle head dominating a VP. The exclusion of recursion in deverbal attributive APs has a very positive impact on the efficiency of the grammar because there are numerous forms that can be both an inflected past participle and a past tense form. Consider the following subordinate clause:

(5) Weil er die Frau die Aktien zu verkaufen überredete, ...
    because he the woman the shares to sell convinced
    'Because he convinced the woman to sell the shares ...'

The form überredete can be both a past tense form and a past participle. As the original grammar allows infinitival VPs to be embedded in attributive deverbal APs, it can analyze the string die Aktien zu verkaufen überredete as an inflected AP, and this inflected AP can then be analyzed as a headless DP. This means that a large number of undesired c-structures are built, which are only ruled out when the f-structure constraints are solved. With respect to efficiency, it is a very attractive feature of the revised grammar that these erroneous c-structures are not built in the first place.
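The effect described for example (5) can be reproduced with a toy chart recognizer. The sketch below is not the XLE parser and does not use the grammar's actual rules: the category names (`Vinf`, `Vfin`, `Part`, `VPinf`, `VPfin`), the rule sets, and the treatment of zu verkaufen as a single token are all simplifying assumptions made for illustration. It only shows that one extra general rule is enough to put an AP and a headless DP into the chart for a string that should be a finite VP.

```python
from collections import defaultdict

# Toy lexicon: "überredete" is ambiguous between finite verb and participle.
LEXICON = {
    "die": ["D"],
    "Aktien": ["N"],
    "zu_verkaufen": ["Vinf"],
    "überredete": ["Vfin", "Part"],
}

# Binary rules as ((left, right), parent); all names are hypothetical.
RESTRICTED_BINARY = [
    (("D", "N"), "DP"),
    (("DP", "Vinf"), "VPinf"),
    (("VPinf", "Vfin"), "VPfin"),
]
# The general grammar additionally embeds an infinitival VP in a participial AP.
GENERAL_BINARY = RESTRICTED_BINARY + [(("VPinf", "Part"), "AP")]

# Unary rule as (child, parent): an inflected AP can form a headless DP.
UNARY = [("AP", "DP")]


def chart_categories(tokens, lexicon, binary, unary):
    """CYK-style recognizer: map each span (i, j) to the categories built over it."""
    chart = defaultdict(set)

    def close_under_unary(cell):
        changed = True
        while changed:
            changed = False
            for child, parent in unary:
                if child in cell and parent not in cell:
                    cell.add(parent)
                    changed = True

    n = len(tokens)
    for i, token in enumerate(tokens):
        chart[i, i + 1].update(lexicon[token])
        close_under_unary(chart[i, i + 1])
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (left, right), parent in binary:
                    if left in chart[i, k] and right in chart[k, j]:
                        chart[i, j].add(parent)
            close_under_unary(chart[i, j])
    return chart


TOKENS = ["die", "Aktien", "zu_verkaufen", "überredete"]

if __name__ == "__main__":
    general = chart_categories(TOKENS, LEXICON, GENERAL_BINARY, UNARY)
    restricted = chart_categories(TOKENS, LEXICON, RESTRICTED_BINARY, UNARY)
    print("general grammar, full span:   ", sorted(general[0, 4]))
    print("restricted grammar, full span:", sorted(restricted[0, 4]))
```

In a real grammar every spurious constituent of this kind can combine further with its context, so the saving from the restricted rule grows with sentence length.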

Restricting long distance dependencies

Solving the equations which account for long distance dependencies can be very time-consuming. We therefore simplified these equations based on a corpus study, e.g. for extraposed relative clauses.

Restricting rules by number of tokens

We restrict certain rules by limiting the number of tokens covered by the rule. For example, subjectless insertions like wie früher berichtet ("as previously reported") have only very few words between "as" and "reported".

Generality of the steps taken to enhance grammar coverage

Our section on corpus-based improvement of grammar coverage may create the impression that we tailored the grammar too closely to the TIGER Corpus. We therefore parsed the 20,614 sentences of the NEGRA Corpus. 81.5% of the sentences obtained a full parse and 18.5% a partial parse. These results on the NEGRA Corpus are clearly not as good as the results on the TIGER Corpus, but with a grammar coverage of more than 80%, they show that coverage does not drop dramatically on unseen corpora and that at least most of the measures taken to improve coverage carry over to unseen data.

4. Robustness

We augmented the standard grammar with a FRAGMENT grammar to collect as much information as possible in cases where a sentence does not get a full parse. The parser returns well-formed chunks like NPs, PPs, VPs, Ss, etc. The grammar has a fewest-chunk method for determining the least fragmented parse. It turned out that the quality of fragment parses can be improved by restricting complex rules (e.g. the S-rule) in the fragment grammar relative to the standard grammar.

In order to cope with timeouts and memory problems, we use the SKIMMING technique (Riezler et al., 2002). When the amount of time or memory spent on a sentence exceeds a given threshold, XLE skims the constituents whose processing has not yet been completed, i.e. XLE does only a bounded amount of work per subtree. When skimming, we use a restricted version of our grammar.
This is achieved with the help of special OT marks (Frank et al., 2001), so-called SKIMMING NOGOOD marks, which turn off expensive rules such as those for headless NPs and free datives during skimming.

5. Testing

5.1. Gold standard

We evaluated parse quality on manually validated dependency annotations for 1602 sentences from the TiGer Dependency Bank (Forst et al., 2004). The annotations from the TIGER Treebank were semi-automatically transformed into dependency triples, which were then corrected and extended by human annotators. The TiGer DB encodes the same type of dependency triples as the PARC 700 Dependency Bank (King et al., 2003). The grammatical relations and morphosyntactic features are the ones annotated in the TIGER Treebank, except for systematic changes meant to make the TiGer DB more suitable for parser evaluation.

5.2. Parsing quality

In Tables 1 and 2, we give the results of two types of parse selection: (1) lower bound: a parse is chosen randomly from the set of parses; (2) upper bound: the parse with the best f-score with respect to the gold annotation is chosen. The f-score is defined as the harmonic mean of precision and recall, $f = 2pr/(p+r)$. We use the triple encoding and evaluation software of Crouch et al. (2002).

Table 1 shows that full parses achieve a noticeably higher f-score than partial parses; this shows that it is crucial to improve coverage to, say, at least 80% in order to parse free text with reasonable quality. Table 2 gives the upper bound and lower bound figures for the 1602 gold standard sentences broken down according to the grammatical relations and morphosyntactic features encoded.

5.3. Disambiguation

Table 3, finally, gives preliminary results for our stochastic disambiguation component. Two versions of the component are compared with each other and with the upper and lower bound. Both versions are based on maximum entropy models that are trained in a supervised manner on partially labelled data.
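The f-score and the two selection bounds used in this evaluation can be made concrete with a small Python sketch. It is not the evaluation software of Crouch et al. (2002). The triple format `(relation, head, dependent)`, the function names, and the example triples (loosely based on sentence (2)) are assumptions made for illustration, and the sketch scores a single sentence rather than pooling counts over a test set.

```python
import random


def precision_recall_f(predicted, gold):
    """Score a set of predicted dependency triples against a gold set."""
    matched = len(predicted & gold)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall: f = 2pr / (p + r).
    return precision, recall, 2 * precision * recall / (precision + recall)


def upper_bound_parse(parses, gold):
    """Upper bound: pick the parse with the best f-score against the gold triples."""
    return max(parses, key=lambda triples: precision_recall_f(triples, gold)[2])


def lower_bound_parse(parses, rng=random):
    """Lower bound: pick a parse at random from the set of parses."""
    return rng.choice(parses)


# Hypothetical gold triples and two competing analyses of one sentence.
GOLD = {
    ("sb", "kommen", "Hans"),
    ("mo", "kommen", "hierhin"),
    ("sb", "halten", "Hans"),
    ("oa", "halten", "Vortrag"),
}
PARSE_A = set(GOLD)
PARSE_B = (GOLD - {("sb", "halten", "Hans")}) | {("da", "halten", "Hans")}
PARSES = [PARSE_B, PARSE_A]

if __name__ == "__main__":
    print("parse B:", precision_recall_f(PARSE_B, GOLD))
    print("upper bound picks the gold-identical parse:",
          upper_bound_parse(PARSES, GOLD) == GOLD)
    print("lower bound pick:", sorted(lower_bound_parse(PARSES, random.Random(0))))
```

A disambiguation component is useful to the extent that its selection scores above the random lower bound and approaches the upper bound, which is what the comparison of the two model versions measures.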
The training material for both models was the parses of 3,817 sentences from the TIGER Corpus (excluding sentences 8,001 through 10,000). The "all properties" version uses both the kind of property described in Riezler et al. (2002) and a series of new properties that mainly encode information on the linear order of grammatical functions. The "only original properties" version makes use of the former only.

Table 3: F-scores for selected grammatical relations in the 1602 TiGer DB examples broken down according to parse selection method. (Columns: relation; upper bound; all properties for disambiguation; only original properties; lower bound. Rows: all; preds only; da; gr; oa; op; op_loc; quant; sb; sbp.)

6. Discussion

6.1. Coverage

In order to get a full parse, the input sentence has to be well-formed. At least 1% of the sentences in the testsuite contain spelling mistakes, punctuation errors or grammatical errors. Furthermore, the TIGER annotators sometimes assign full structures to elliptical sentences that lack a clear syntactic head. In order to match the analyses annotated for them, our parser would have to do a lot of structure building, which would lead to overgeneration and inefficiency.

Table 1: Upper bound and lower bound f-scores for grammatical relations and morphosyntactic features in the 1602 TiGer DB examples broken down according to parse quality. (Columns: all; full; full and non-skimmed fragments; non-skimmed fragments; skimmed fragments. Rows: % of test set; upper bound; lower bound; avg. sentence length; avg. parse time in sec.)

Among the well-formed sentences which receive a partial parse we have to distinguish three types: (1) constructions for which our grammar contains rules which, however, are turned off for efficiency reasons (e.g. coordination without an explicit conjunction); (2) constructions for which we do not have rules (e.g. special types of non-constituent coordination, certain parenthetical constructions, heavy ellipsis); (3) sentences which contain lexical material that is not in the lexicon and which our guesser cannot handle (e.g. problems of subcategorization, idioms and collocations). Subcategorization poses problems especially if a multi-word expression (MWE) as a whole subcategorizes for a sentential function like COMP although none of its parts subcategorizes for a COMP. This is the case with the MWE zu Protokoll geben, which subcategorizes for a COMP, whereas neither geben nor Protokoll does.

6.2. Parsing quality

As Table 1 shows, the results for the complete testsuite are quite good. Breaking them down according to parse quality shows that our upper bound for full parses is roughly identical to that of Riezler et al. (2002). Our values for the complete test set are better (87.4% vs. 84.1%) because more sentences of our testsuite receive a full parse. If we subtract the 55 sentences with an average length of 41.7 words that get a partial parse after skimming, we obtain for 96.6% of our testsuite an upper bound of 88.0% and a lower bound of 82.9%. The f-score of our non-skimmed fragment parses is surprisingly high. Only highly elliptical sentences get really bad values.
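The share of 96.6% quoted above can be checked directly from the two counts given in the text, the 1602 gold standard sentences and the 55 sentences that receive a partial parse after skimming:

```latex
\[
\frac{1602 - 55}{1602} \;=\; \frac{1547}{1602} \;\approx\; 0.9657 \;\approx\; 96.6\%
\]
```

Conversely, the skimmed fragment parses make up roughly 3.4% of the test set.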
One explanation for our good values is our detailed subcategorization lexicons.

The figures in Table 2 are more informative than the overall f-score. They illustrate that the f-scores for grammatical relations are not as good as those for morphosyntactic features. The lower values for case are due to syntactic ambiguity and are therefore not a purely morphological problem; to a limited extent this is also true for the feature num (number).

In the preds-only evaluation the values for the arguments sb (subject) and oa (accusative object) are better than those for da (dative object) and og (genitive object). So-called free datives are quite frequent in German and, as the name indicates, difficult to predict and to specify in the subcategorization lexicon. We guess free datives and, apparently, we sometimes go wrong. For genitive objects we get bad values because, for efficiency reasons, we require that the genitive be morphologically marked. Furthermore, genitive NPs may be attached to preceding NPs. The figures for sbp (logical subject in passives) are worse than those for grammatical subjects because the PP denoting the logical subject is introduced by von, which has many different functions.

Subcategorized PPs (and ADVPs) are annotated as op (oblique), op_dir (directional argument), op_loc (locative argument) and op_manner (modal argument). The low f-score for subcategorized PPs indicates gaps in the subcategorization lexicon. In addition, this low score has a negative effect on the f-score of mo (modifiers or adjuncts). Predicative complements (pd) with the copula sein can be confused with stative passives. For example, Er ist ihm übergeordnet is analyzed as a stative passive by our grammar and as pd by the annotators. The values for the subcategorized functions oc_fin (finite complement clauses) and oc_inf (non-finite argument VPs) differ.
The figures for clauses with the function oc_fin are lower because clauses introduced by interrogative or relative pronouns in adverbial function can be interpreted as oc_fin if the embedding clause contains a word which subcategorizes for such a clause. Furthermore, there is interference with rs (reported speech) and app_cl (appositive clauses).

gl (genitive left) denotes possessives, and gr (genitive right) denotes genitive adjuncts and von PPs with genitive function. gl constructions are easy to identify because they always precede their head, whereas the analysis of gr is ultimately a semantic problem, at least when it is realized by a von PP. Comparative complements (cc) and relative clauses (rc), which are often extraposed, are difficult to attach to the corresponding head. Coordination (cj) is also notoriously difficult and achieves fairly low values.

6.3. Disambiguation

The figures in Table 3 show that a selection performed by either version of the stochastic disambiguation component clearly performs better than a random selection (lower bound). We also observe that the "all properties" version of the disambiguation component performs noticeably better than the "only original properties" version. In terms of overall f-score, the gain with respect to the lower bound doubles with the help of the additional properties; for the core grammatical functions, such as oa, sb etc., which are particularly important for the potential construction of a semantic representation on the basis of f-structures, this gain is even far greater. For many of the grammatical functions, the additional properties allow the "all properties" f-score to be closer to the upper bound f-score than to the lower bound f-score. As this is not the case for the "only original properties" f-scores, we believe that property design will be particularly important for the further improvement of the stochastic disambiguation component. A further step that we plan to take and that, as we hope, will improve the results of the stochastic disambiguation, regardless of the properties that are used for it, is the acquisition of more training data.

Table 2: Upper bound and lower bound precisions, recalls and F-scores for grammatical relations and morphosyntactic features in the 1602 TiGer DB examples

relation or feature | upper bound: precision, recall, f-score | lower bound: precision, recall, f-score
all 61213/ / / /70482 = 88.1 = = 82.8 =
preds only 22050/ / / /27328 = = = 76.2 =
ams 0/2 = 0 0 0/2 = 0 0
app 185/268 = /337 = /282 = /336 =
app_cl 23/27 = 85 23/77 = /26 = 85 22/77 =
cc 17/23 = 74 17/46 = /20 = 70 14/45 =
cj 1183/1412 = /1806 = /1412 = /1806 =
da 118/190 = /162 = /226 = /162 =
det 3655/3816 = /3938 = /3822 = /3930 =
gl 292/316 = /317 = /305 = /316 =
gr 804/928 = /902 = /897 = /899 =
measured 9/20 = 45 9/24 = /20 = 45 9/24 =
mo 4997/6878 = /6610 = /6946 = /6601 =
mod 2087/2219 = /2228 = /2226 = /2227 =
name_mod 336/420 = /385 = /424 = /385 =
number 370/469 = /424 = /456 = /423 =
oa 923/1098 = /1191 = /1104 = /1189 =
oa2 0/1 = 0 0
obj 2916/3213 = /3180 = /3227 = /3174 =
oc_fin 151/212 = /226 = /211 = /226 =
oc_inf 340/379 = /411 = /387 = /411 =
og 5/5 = 100 5/9 = /5 = 60 3/9 =
op 267/389 = /526 = /377 = /526 =
op_dir 29/38 = 76 29/140 = /38 = 53 20/140 =
op_loc 35/52 = 67 35/59 = /44 = 52 23/59 =
op_manner 6/8 = 75 6/16 = /4 = 50 2/16 =
pd 258/358 = /403 = /358 = /403 =
pred_restr 110/121 = /122 = /123 = /122 =
quant 172/195 = /234 = /184 = /234 =
rc 175/212 = /250 = /209 = /250 =
rs 2/19 = 11 2/4 = /19 = 11 2/4 =
sb 2549/3128 = /3274 = /3140 = /3272 =
sbp 35/46 = 76 35/57 = /43 = 65 28/57 =
topic_disloc 1/16 = 6 1/3 = /18 = 0 0/3 = 0 0
case 7941/9004 = /9098 = /8991 = /9085 =
circ_form 5/8 = 62 5/6 = /8 = 62 5/6 =
comp_form 99/115 = 86 99/160 = /111 = 86 96/160 =
coord_form 557/613 = /648 = /615 = /648 =
degree 2346/2640 = /2488 = /2668 = /2486 =
det_type 3628/3780 = /3779 = /3772 = /3771 =
fut 61/63 = 97 61/71 = /65 = 94 61/71 =
gend 7207/7829 = /7875 = /7850 = /7864 =
mood 2129/2254 = /2366 = /2253 = /2364 =
num 8739/9495 = /9333 = /9510 = /9319 =
pass_asp 258/287 = /324 = /287 = /324 =
perf 296/301 = /355 = /299 = /355 =
pers 2392/2621 = /2800 = /2617 = /2796 =
precoord_form 7/8 = 88 7/9 = /7 = 86 6/9 =
pron_form 71/74 = 96 71/72 = /74 = 96 71/72 =
pron_type 1282/1689 = /1482 = /1700 = /1482 =
tense 2145/2240 = /2360 = /2239 = /2358 =

6.4. Comparison with previous work

Our results are comparable to those reported by Riezler et al. (2002) and Cahill et al. (2005) for English. Our score is improved by the fact that we check some morphological information like gender, number or tense, which a good chunker could also identify correctly. In a preds-only evaluation, the figures are lower, but the same tendency is observed with other parsers that are evaluated on dependency-based gold standards.

Dubey and Keller (2003) induce a grammar from the NEGRA Treebank, a predecessor of TIGER. They report a labelled precision and recall of up to 74%. The results for induced grammars seem to be worse for German with its free word order than for English. This also holds for the German LFG induced from the TIGER Corpus (Cahill et al., 2005). The authors report an f-score of 71%. The evaluation is equivalent to ours, i.e. based on dependency triples obtained via conversion from TIGER graphs. The testsuite which functions as a gold standard, however, is fairly small. One of the reasons for the low f-score seems to be the lack of morphological information and the very flat structure of the TIGER graphs. Integrating morphological information would certainly improve the score. The flat structure of the NEGRA and TIGER Treebanks may also have a negative influence on the quality of the induced grammars.

Foth et al. (2005) describe a parsing system for unrestricted German text. Total coverage is achieved by means of defeasible, graded constraints. The authors report an f-score of 87% in an evaluation with the NEGRA Corpus. These are clearly the best results for German so far.
They are also better than those reported by Schiehlen (2003), who achieves an f-score of 81.7% on the NEGRA data. In support of our approach, we would like to mention that our grammar is fully reversible and comes with a full-fledged generator.

7. Conclusion

We have shown that a hand-crafted deep grammar can achieve good results on free text. The next step will be to refine our stochastic disambiguation component. Our grammar can also be used in generation, unlike other large-scale grammars of German.

8. References

Miriam Butt, Helge Dyvik, Tracy H. King, Hiroshi Masuichi, and Christian Rohrer. 2002. The Parallel Grammar Project. In Proceedings of COLING-2002 Workshop on Grammar Engineering and Evaluation, pages 1-7.

Aoife Cahill, Michael Burke, Martin Forst, Ruth O'Donovan, Christian Rohrer, Josef van Genabith, and Andy Way. 2005. Treebank-Based Multilingual Unification-Grammar Resources. Research in Language and Computation.

Richard Crouch, Ronald M. Kaplan, Tracy H. King, and Stefan Riezler. 2002. A comparison of evaluation metrics for a broad-coverage parser. In Proceedings of the LREC Workshop Beyond PARSEVAL - Towards improved evaluation measures for parsing systems, pages 67-74, Las Palmas, Spain.

Stefanie Dipper. 2003. Implementing and Documenting Large-scale Grammars - German LFG. Ph.D. thesis, IMS, University of Stuttgart. Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (AIMS), Volume 9, Number 1.

Amit Dubey and Frank Keller. 2003. Probabilistic Parsing for German using Sister-Head Dependencies. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan.

Martin Forst, Núria Bertomeu, Berthold Crysmann, Frederik Fouvry, Silvia Hansen-Schirra, and Valia Kordoni. 2004. Towards a dependency-based gold standard for German parsers - The TiGer Dependency Bank. In Proceedings of the COLING Workshop on Linguistically Interpreted Corpora (LINC 04), Geneva.
Kilian Foth, Wolfgang Menzel, and Ingo Schröder. 2005. Robust parsing with weighted constraints. Natural Language Engineering, 11(1):1-25.

Anette Frank, Tracy Holloway King, Jonas Kuhn, and John T. Maxwell III. 2001. Optimality Theory Style Constraint Ranking in Large-Scale LFG Grammars. In Peter Sells, editor, Formal and Empirical Issues in Optimality Theoretic Syntax.

Anette Frank. 2002. A (Discourse) Functional Analysis of Asymmetric Coordination. In Proceedings of the 7th International LFG Conference (LFG 05), Athens, Greece. CSLI Publications.

Tilmann Höhle. 1983. Topologische Felder. Ph.D. thesis, University of Cologne.

Ronald M. Kaplan, John T. Maxwell, Tracy H. King, and Richard Crouch. 2004. Integrating Finite-state Technology with Deep LFG Grammars. In Proceedings of the ESSLLI 2004 Workshop on Combining Shallow and Deep Processing for NLP, Nancy, France.

Tracy Holloway King, Richard Crouch, Stefan Riezler, Mary Dalrymple, and Ronald M. Kaplan. 2003. The PARC 700 Dependency Bank. In Proceedings of the EACL Workshop on Linguistically Interpreted Corpora (LINC 03), Budapest.

Stefan Riezler, Tracy Holloway King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell III, and Mark Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics 2002, Philadelphia.

Michael Schiehlen. 2003. Combining Deep and Shallow Approaches in Parsing German. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan.


The presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing. Lecture 4: OT Syntax Sources: Kager 1999, Section 8; Legendre et al. 1998; Grimshaw 1997; Barbosa et al. 1998, Introduction; Bresnan 1998; Fanselow et al. 1999; Gibson & Broihier 1998. OT is not a theory

More information

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and

More information

Specifying a shallow grammatical for parsing purposes

Specifying a shallow grammatical for parsing purposes Specifying a shallow grammatical for parsing purposes representation Atro Voutilainen and Timo J~irvinen Research Unit for Multilingual Language Technology P.O. Box 4 FIN-0004 University of Helsinki Finland

More information

Feature-Based Grammar

Feature-Based Grammar 8 Feature-Based Grammar James P. Blevins 8.1 Introduction This chapter considers some of the basic ideas about language and linguistic analysis that define the family of feature-based grammars. Underlying

More information

Construction Grammar. University of Jena.

Construction Grammar. University of Jena. Construction Grammar Holger Diessel University of Jena holger.diessel@uni-jena.de http://www.holger-diessel.de/ Words seem to have a prototype structure; but language does not only consist of words. What

More information

LFG Semantics via Constraints

LFG Semantics via Constraints LFG Semantics via Constraints Mary Dalrymple John Lamping Vijay Saraswat fdalrymple, lamping, saraswatg@parc.xerox.com Xerox PARC 3333 Coyote Hill Road Palo Alto, CA 94304 USA Abstract Semantic theories

More information

THE INTERNATIONAL JOURNAL OF HUMANITIES & SOCIAL STUDIES

THE INTERNATIONAL JOURNAL OF HUMANITIES & SOCIAL STUDIES THE INTERNATIONAL JOURNAL OF HUMANITIES & SOCIAL STUDIES PRO and Control in Lexical Functional Grammar: Lexical or Theory Motivated? Evidence from Kikuyu Njuguna Githitu Bernard Ph.D. Student, University

More information

Control and Boundedness

Control and Boundedness Control and Boundedness Having eliminated rules, we would expect constructions to follow from the lexical categories (of heads and specifiers of syntactic constructions) alone. Combinatory syntax simply

More information

The College Board Redesigned SAT Grade 12

The College Board Redesigned SAT Grade 12 A Correlation of, 2017 To the Redesigned SAT Introduction This document demonstrates how myperspectives English Language Arts meets the Reading, Writing and Language and Essay Domains of Redesigned SAT.

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.

More information

Applications of memory-based natural language processing

Applications of memory-based natural language processing Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal

More information

A relational approach to translation

A relational approach to translation A relational approach to translation Rémi Zajac Project POLYGLOSS* University of Stuttgart IMS-CL /IfI-AIS, KeplerstraBe 17 7000 Stuttgart 1, West-Germany zajac@is.informatik.uni-stuttgart.dbp.de Abstract.

More information

Indeterminacy by Underspecification Mary Dalrymple (Oxford), Tracy Holloway King (PARC) and Louisa Sadler (Essex) (9) was: ( case) = nom ( case) = acc

Indeterminacy by Underspecification Mary Dalrymple (Oxford), Tracy Holloway King (PARC) and Louisa Sadler (Essex) (9) was: ( case) = nom ( case) = acc Indeterminacy by Underspecification Mary Dalrymple (Oxford), Tracy Holloway King (PARC) and Louisa Sadler (Essex) 1 Ambiguity vs Indeterminacy The simple view is that agreement features have atomic values,

More information

Switched Control and other 'uncontrolled' cases of obligatory control

Switched Control and other 'uncontrolled' cases of obligatory control Switched Control and other 'uncontrolled' cases of obligatory control Dorothee Beermann and Lars Hellan Norwegian University of Science and Technology, Trondheim, Norway dorothee.beermann@ntnu.no, lars.hellan@ntnu.no

More information

"f TOPIC =T COMP COMP... OBJ

f TOPIC =T COMP COMP... OBJ TREATMENT OF LONG DISTANCE DEPENDENCIES IN LFG AND TAG: FUNCTIONAL UNCERTAINTY IN LFG IS A COROLLARY IN TAG" Aravind K. Joshi Dept. of Computer & Information Science University of Pennsylvania Philadelphia,

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

Theoretical Syntax Winter Answers to practice problems

Theoretical Syntax Winter Answers to practice problems Linguistics 325 Sturman Theoretical Syntax Winter 2017 Answers to practice problems 1. Draw trees for the following English sentences. a. I have not been running in the mornings. 1 b. Joel frequently sings

More information

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up

More information

Interfacing Phonology with LFG

Interfacing Phonology with LFG Interfacing Phonology with LFG Miriam Butt and Tracy Holloway King University of Konstanz and Xerox PARC Proceedings of the LFG98 Conference The University of Queensland, Brisbane Miriam Butt and Tracy

More information

The Discourse Anaphoric Properties of Connectives

The Discourse Anaphoric Properties of Connectives The Discourse Anaphoric Properties of Connectives Cassandre Creswell, Kate Forbes, Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi Λ, Bonnie Webber y Λ University of Pennsylvania 3401 Walnut Street Philadelphia,

More information

Loughton School s curriculum evening. 28 th February 2017

Loughton School s curriculum evening. 28 th February 2017 Loughton School s curriculum evening 28 th February 2017 Aims of this session Share our approach to teaching writing, reading, SPaG and maths. Share resources, ideas and strategies to support children's

More information

An Interactive Intelligent Language Tutor Over The Internet

An Interactive Intelligent Language Tutor Over The Internet An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This

More information

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight.

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight. Final Exam (120 points) Click on the yellow balloons below to see the answers I. Short Answer (32pts) 1. (6) The sentence The kinder teachers made sure that the students comprehended the testable material

More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together

More information

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence. NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and

More information

Building an HPSG-based Indonesian Resource Grammar (INDRA)

Building an HPSG-based Indonesian Resource Grammar (INDRA) Building an HPSG-based Indonesian Resource Grammar (INDRA) David Moeljadi, Francis Bond, Sanghoun Song {D001,fcbond,sanghoun}@ntu.edu.sg Division of Linguistics and Multilingual Studies, Nanyang Technological

More information

Constraining X-Bar: Theta Theory

Constraining X-Bar: Theta Theory Constraining X-Bar: Theta Theory Carnie, 2013, chapter 8 Kofi K. Saah 1 Learning objectives Distinguish between thematic relation and theta role. Identify the thematic relations agent, theme, goal, source,

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

Parsing of part-of-speech tagged Assamese Texts

Parsing of part-of-speech tagged Assamese Texts IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal

More information

Accurate Unlexicalized Parsing for Modern Hebrew

Accurate Unlexicalized Parsing for Modern Hebrew Accurate Unlexicalized Parsing for Modern Hebrew Reut Tsarfaty and Khalil Sima an Institute for Logic, Language and Computation, University of Amsterdam Plantage Muidergracht 24, 1018TV Amsterdam, The

More information

National University of Singapore Faculty of Arts and Social Sciences Centre for Language Studies Academic Year 2014/2015 Semester 2

National University of Singapore Faculty of Arts and Social Sciences Centre for Language Studies Academic Year 2014/2015 Semester 2 National University of Singapore Faculty of Arts and Social Sciences Centre for Language Studies Academic Year 2014/2015 Semester 2 LAG2201 German 2 Course Outline Course coordinators and lecturers A/P

More information

A Computational Evaluation of Case-Assignment Algorithms

A Computational Evaluation of Case-Assignment Algorithms A Computational Evaluation of Case-Assignment Algorithms Miles Calabresi Advisors: Bob Frank and Jim Wood Submitted to the faculty of the Department of Linguistics in partial fulfillment of the requirements

More information

The Strong Minimalist Thesis and Bounded Optimality

The Strong Minimalist Thesis and Bounded Optimality The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this

More information

Chapter 4: Valence & Agreement CSLI Publications

Chapter 4: Valence & Agreement CSLI Publications Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).

More information

cmp-lg/ Jul 1995

cmp-lg/ Jul 1995 A CONSTRAINT-BASED CASE FRAME LEXICON ARCHITECTURE 1 Introduction Kemal Oazer and Okan Ylmaz Department of Computer Engineering and Information Science Bilkent University Bilkent, Ankara 0, Turkey fko,okang@cs.bilkent.edu.tr

More information

Project in the framework of the AIM-WEST project Annotation of MWEs for translation

Project in the framework of the AIM-WEST project Annotation of MWEs for translation Project in the framework of the AIM-WEST project Annotation of MWEs for translation 1 Agnès Tutin LIDILEM/LIG Université Grenoble Alpes 30 october 2014 Outline 2 Why annotate MWEs in corpora? A first experiment

More information

Annotation Projection for Discourse Connectives

Annotation Projection for Discourse Connectives SFB 833 / Univ. Tübingen Penn Discourse Treebank Workshop Annotation projection Basic idea: Given a bitext E/F and annotation for F, how would the annotation look for E? Examples: Word Sense Disambiguation

More information

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each

More information

The Role of the Head in the Interpretation of English Deverbal Compounds

The Role of the Head in the Interpretation of English Deverbal Compounds The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt

More information

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Development of the First LRs for Macedonian: Current Projects

Development of the First LRs for Macedonian: Current Projects Development of the First LRs for Macedonian: Current Projects Ruska Ivanovska-Naskova Faculty of Philology- University St. Cyril and Methodius Bul. Krste Petkov Misirkov bb, 1000 Skopje, Macedonia rivanovska@flf.ukim.edu.mk

More information

Information Status in Generation Ranking

Information Status in Generation Ranking Aoife Cahill nformation Status in Generation Ranking 1 / 57 nformation Status in Generation Ranking Aoife Cahill joint work with Arndt Riester Heidelberg Computational Linguistics Colloquium December 9,

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

Universal Grammar 2. Universal Grammar 1. Forms and functions 1. Universal Grammar 3. Conceptual and surface structure of complex clauses

Universal Grammar 2. Universal Grammar 1. Forms and functions 1. Universal Grammar 3. Conceptual and surface structure of complex clauses Universal Grammar 1 evidence : 1. crosslinguistic investigation of properties of languages 2. evidence from language acquisition 3. general cognitive abilities 1. Properties can be reflected in a.) structural

More information

Freitag 7. Januar = QUIZ = REFLEXIVE VERBEN = IM KLASSENZIMMER = JUDD 115

Freitag 7. Januar = QUIZ = REFLEXIVE VERBEN = IM KLASSENZIMMER = JUDD 115 DEUTSCH 3 DIE DEBATTE: GEFÄHRLICHE HAUSTIERE Debatte: Freitag 14. JANUAR, 2011 Bewertung: zwei kleine Prüfungen. Bewertungssystem: (see attached) Thema:Wir haben schon die Geschichte Gefährliche Haustiere

More information

COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR

COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The

More information

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general

More information

Underlying and Surface Grammatical Relations in Greek consider

Underlying and Surface Grammatical Relations in Greek consider 0 Underlying and Surface Grammatical Relations in Greek consider Sentences Brian D. Joseph The Ohio State University Abbreviated Title Grammatical Relations in Greek consider Sentences Brian D. Joseph

More information

Writing a composition

Writing a composition A good composition has three elements: Writing a composition an introduction: A topic sentence which contains the main idea of the paragraph. a body : Supporting sentences that develop the main idea. a

More information

Type-driven semantic interpretation and feature dependencies in R-LFG

Type-driven semantic interpretation and feature dependencies in R-LFG Type-driven semantic interpretation and feature dependencies in R-LFG Mark Johnson Revision of 23rd August, 1997 1 Introduction This paper describes a new formalization of Lexical-Functional Grammar called

More information

Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand

Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand 1 Introduction Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand heidi.quinn@canterbury.ac.nz NWAV 33, Ann Arbor 1 October 24 This paper looks at

More information

ELD CELDT 5 EDGE Level C Curriculum Guide LANGUAGE DEVELOPMENT VOCABULARY COMMON WRITING PROJECT. ToolKit

ELD CELDT 5 EDGE Level C Curriculum Guide LANGUAGE DEVELOPMENT VOCABULARY COMMON WRITING PROJECT. ToolKit Unit 1 Language Development Express Ideas and Opinions Ask for and Give Information Engage in Discussion ELD CELDT 5 EDGE Level C Curriculum Guide 20132014 Sentences Reflective Essay August 12 th September

More information

Update on Soar-based language processing

Update on Soar-based language processing Update on Soar-based language processing Deryle Lonsdale (and the rest of the BYU NL-Soar Research Group) BYU Linguistics lonz@byu.edu Soar 2006 1 NL-Soar Soar 2006 2 NL-Soar developments Discourse/robotic

More information

LTAG-spinal and the Treebank

LTAG-spinal and the Treebank LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)

More information

Hindi Aspectual Verb Complexes

Hindi Aspectual Verb Complexes Hindi Aspectual Verb Complexes HPSG-09 1 Introduction One of the goals of syntax is to termine how much languages do vary, in the hope to be able to make hypothesis about how much natural languages can

More information

Argument structure and theta roles

Argument structure and theta roles Argument structure and theta roles Introduction to Syntax, EGG Summer School 2017 András Bárány ab155@soas.ac.uk 26 July 2017 Overview Where we left off Arguments and theta roles Some consequences of theta

More information

Grammars & Parsing, Part 1:

Grammars & Parsing, Part 1: Grammars & Parsing, Part 1: Rules, representations, and transformations- oh my! Sentence VP The teacher Verb gave the lecture 2015-02-12 CS 562/662: Natural Language Processing Game plan for today: Review

More information

Parallel Evaluation in Stratal OT * Adam Baker University of Arizona

Parallel Evaluation in Stratal OT * Adam Baker University of Arizona Parallel Evaluation in Stratal OT * Adam Baker University of Arizona tabaker@u.arizona.edu 1.0. Introduction The model of Stratal OT presented by Kiparsky (forthcoming), has not and will not prove uncontroversial

More information

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class If we cancel class 1/20 idea We ll spend an extra hour on 1/21 I ll give you a brief writing problem for 1/21 based on assigned readings Jot down your thoughts based on your reading so you ll be ready

More information

The Interface between Phrasal and Functional Constraints

The Interface between Phrasal and Functional Constraints The Interface between Phrasal and Functional Constraints John T. Maxwell III* Xerox Palo Alto Research Center Ronald M. Kaplan t Xerox Palo Alto Research Center Many modern grammatical formalisms divide

More information

! XLE: A First Walkthrough! Robustness techniques! Generation! Disambiguation! Applications: ! Provide detailed syntactic/semantic analyses

! XLE: A First Walkthrough! Robustness techniques! Generation! Disambiguation! Applications: ! Provide detailed syntactic/semantic analyses XLE: Grammar Development Platform Parser/Generator/Rewrite System ICON 2007 Miriam Butt (Universit( Universität Konstanz) Tracy Holloway King (PARC) Outline! What is a deep grammar and why would you want

More information

Som and Optimality Theory

Som and Optimality Theory Som and Optimality Theory This article argues that the difference between English and Norwegian with respect to the presence of a complementizer in embedded subject questions is attributable to a larger

More information

Advanced Grammar in Use

Advanced Grammar in Use Advanced Grammar in Use A self-study reference and practice book for advanced learners of English Third Edition with answers and CD-ROM cambridge university press cambridge, new york, melbourne, madrid,

More information

Guidelines for Writing an Internship Report

Guidelines for Writing an Internship Report Guidelines for Writing an Internship Report Master of Commerce (MCOM) Program Bahauddin Zakariya University, Multan Table of Contents Table of Contents... 2 1. Introduction.... 3 2. The Required Components

More information

Inleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3

Inleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3 Inleiding Taalkunde Docent: Paola Monachesi Blok 4, 2001/2002 Contents 1 Syntax 2 2 Phrases and constituent structure 2 3 A minigrammar of Italian 3 4 Trees 3 5 Developing an Italian lexicon 4 6 S(emantic)-selection

More information

Physics 270: Experimental Physics

Physics 270: Experimental Physics 2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu

More information

Welcome to the Purdue OWL. Where do I begin? General Strategies. Personalizing Proofreading

Welcome to the Purdue OWL. Where do I begin? General Strategies. Personalizing Proofreading Welcome to the Purdue OWL This page is brought to you by the OWL at Purdue (http://owl.english.purdue.edu/). When printing this page, you must include the entire legal notice at bottom. Where do I begin?

More information

LNGT0101 Introduction to Linguistics

LNGT0101 Introduction to Linguistics LNGT0101 Introduction to Linguistics Lecture #11 Oct 15 th, 2014 Announcements HW3 is now posted. It s due Wed Oct 22 by 5pm. Today is a sociolinguistics talk by Toni Cook at 4:30 at Hillcrest 103. Extra

More information

THE VERB ARGUMENT BROWSER

THE VERB ARGUMENT BROWSER THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW

More information

Participate in expanded conversations and respond appropriately to a variety of conversational prompts

Participate in expanded conversations and respond appropriately to a variety of conversational prompts Students continue their study of German by further expanding their knowledge of key vocabulary topics and grammar concepts. Students not only begin to comprehend listening and reading passages more fully,

More information

National Literacy and Numeracy Framework for years 3/4

National Literacy and Numeracy Framework for years 3/4 1. Oracy National Literacy and Numeracy Framework for years 3/4 Speaking Listening Collaboration and discussion Year 3 - Explain information and ideas using relevant vocabulary - Organise what they say

More information

Proof Theory for Syntacticians

Proof Theory for Syntacticians Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax

More information

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s))

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s)) Ohio Academic Content Standards Grade Level Indicators (Grade 11) A. ACQUISITION OF VOCABULARY Students acquire vocabulary through exposure to language-rich situations, such as reading books and other

More information

Derivational and Inflectional Morphemes in Pak-Pak Language

Derivational and Inflectional Morphemes in Pak-Pak Language Derivational and Inflectional Morphemes in Pak-Pak Language Agustina Situmorang and Tima Mariany Arifin ABSTRACT The objectives of this study are to find out the derivational and inflectional morphemes

More information

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer

More information

Today we examine the distribution of infinitival clauses, which can be

Today we examine the distribution of infinitival clauses, which can be Infinitival Clauses Today we examine the distribution of infinitival clauses, which can be a) the subject of a main clause (1) [to vote for oneself] is objectionable (2) It is objectionable to vote for

More information

Chapter 9 Banked gap-filling

Chapter 9 Banked gap-filling Chapter 9 Banked gap-filling This testing technique is known as banked gap-filling, because you have to choose the appropriate word from a bank of alternatives. In a banked gap-filling task, similarly

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

Intension, Attitude, and Tense Annotation in a High-Fidelity Semantic Representation

Intension, Attitude, and Tense Annotation in a High-Fidelity Semantic Representation Intension, Attitude, and Tense Annotation in a High-Fidelity Semantic Representation Gene Kim and Lenhart Schubert Presented by: Gene Kim April 2017 Project Overview Project: Annotate a large, topically

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition

Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition Roy Bar-Haim,Ido Dagan, Iddo Greental, Idan Szpektor and Moshe Friedman Computer Science Department, Bar-Ilan University,

More information

Multiple case assignment and the English pseudo-passive *

Multiple case assignment and the English pseudo-passive * Multiple case assignment and the English pseudo-passive * Norvin Richards Massachusetts Institute of Technology Previous literature on pseudo-passives (see van Riemsdijk 1978, Chomsky 1981, Hornstein &

More information

Hindi-Urdu Phrase Structure Annotation

Hindi-Urdu Phrase Structure Annotation Hindi-Urdu Phrase Structure Annotation Rajesh Bhatt and Owen Rambow January 12, 2009 1 Design Principle: Minimal Commitments Binary Branching Representations. Mostly lexical projections (P,, AP, AdvP)

More information