Extracting and Using Trace-Free Functional Dependencies from the Penn Treebank to Reduce Parsing Complexity


Extracting and Using Trace-Free Functional Dependencies from the Penn Treebank to Reduce Parsing Complexity. Gerold Schneider, Institute of Computational Linguistics, University of Zurich / Department of Linguistics, University of Geneva. gerold.schneider@lettres.unige.ch. November 14, 2003

Contents
1. Motivation
2. Probability Model
3. Extraction of Dependencies
4. Frequency Analysis of Empty Nodes
5. Evaluation
6. Conclusions

1 Motivation

1. Most formal grammars need parsers with high parsing complexity: O(n^5) and worse.
2. Most statistical parsers allow using O(n^3) complexity algorithms (Eisner, 2000), (Nivre, 2003), such as the CYK algorithm used here, but they do not express long-distance dependencies (LDD) and empty nodes (EN).
3. Most successful deep-linguistic dependency parsers (Lin, 1998), (Tapanainen and Järvinen, 1997) do not have a statistical base.
4. Reconstruction of LDD and EN from statistical parser output is not successful (Johnson, 2002).

2 Lexicalized Dependency Probability Model

In a binary CFG, any two constituents A and B which are adjacent during parsing are candidates for the RHS of a rewrite rule. Terminal types are the word tags.

  X → A B,  e.g. NP → DT NN   (1)

In DG and Bare Phrase Structure, one of these is isomorphic to the LHS, i.e. it is the head.

  B → A B,  e.g. NN → DT NN   (2)
  A → A B,  e.g. VB → VB PP   (3)

DG rules additionally use a syntactic relation label R. A non-lexicalized model would be:

  p(R | A → AB) ≈ #(R, A → AB) / #(A → AB)   (4)

Research on PCFG and PP-attachment has shown the importance of probabilizing on lexical heads (a and b):

  p(R | A → AB, a, b) ≈ #(R, A → AB, a, b) / #(A → AB, a, b)   (5)

All that A → AB expresses is that the dependency relation points towards the right:

  p(R | right, a, b) ≈ #(R, right, a, b) / #(right, a, b)   (6)

e.g. for the verb-PP attachment relation pobj (following (Collins and Brooks, 1995), including the desc. noun = the noun inside the PP):

  p(pobj | right, verb, prep, desc.noun) ≈ #(pobj, right, verb, prep, desc.noun) / #(right, verb, prep, desc.noun)   (7)
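The lexicalized estimate in equations (6)/(7) can be sketched directly from co-occurrence counts. The snippet below is an illustration, not the paper's code: all relation/word tuples are invented toy data.

```python
from collections import Counter

# Each observation: (relation, head word, dependent word) for a
# rightward dependency, e.g. verb -> preposition for pobj.
observations = [
    ("pobj", "eat", "with"),
    ("pobj", "eat", "with"),
    ("adj",  "eat", "with"),
    ("pobj", "sleep", "in"),
]

rel_counts = Counter(observations)                         # #(R, right, a, b)
pair_counts = Counter((a, b) for _, a, b in observations)  # #(right, a, b)

def p_rel(r, a, b):
    """MLE: p(R | right, a, b) = #(R, right, a, b) / #(right, a, b)."""
    denom = pair_counts[(a, b)]
    return rel_counts[(r, a, b)] / denom if denom else 0.0

print(p_rel("pobj", "eat", "with"))  # 2 of the 3 (eat, with) pairs are pobj
```

Note the denominator counts all relations competing for the same word pair, which is what turns the counts into an attachment decision.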

(Collins, 1996) MLE estimation:

  P(R | ⟨a, atag⟩, ⟨b, btag⟩, dist) ≈ #(R, ⟨a, atag⟩, ⟨b, btag⟩, dist) / #(⟨a, atag⟩, ⟨b, btag⟩, dist)   (8)

(Schneider, 2003) MLE estimation:

  P(R, dist | a, b) ≈ p(R | a, b) · p(dist | R) ≈ (#(R, a, b) / #(a, b)) · (#(R, dist) / #R)   (9)

- licensing: rule-based, hand-written grammar over Penn tags
- back-off to semantic classes (WordNet)
- real distance, measured in chunks
- the co-occurrence count in the denominator is not sentence context, but the sum over competing relations (e.g. object/adjunct or subject/modpart) → decision probabilities

Relations (R) have a Functional Dependency Grammar definition (overleaf).
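The factored estimate in equation (9) can be sketched as follows. This is a toy illustration with invented counts; the real system additionally uses the WordNet back-off and the licensing grammar listed above, which are omitted here.

```python
from collections import Counter

# (relation, head, dependent, distance in chunks) -- invented examples
data = [
    ("obj",  "eat", "banana", 1),
    ("obj",  "eat", "banana", 1),
    ("obj",  "eat", "banana", 2),
    ("subj", "eat", "man",    1),
]

c_rab = Counter((r, a, b) for r, a, b, _ in data)  # #(R, a, b)
c_ab  = Counter((a, b) for _, a, b, _ in data)     # #(a, b), competing relations
c_rd  = Counter((r, d) for r, _, _, d in data)     # #(R, dist)
c_r   = Counter(r for r, _, _, _ in data)          # #R

def p_rel_dist(r, dist, a, b):
    """P(R, dist | a, b) ~= p(R | a, b) * p(dist | R), eq. (9)."""
    p_r = c_rab[(r, a, b)] / c_ab[(a, b)]
    p_d = c_rd[(r, dist)] / c_r[r]
    return p_r * p_d

print(p_rel_dist("obj", 1, "eat", "banana"))  # 1.0 * 2/3
```

Factoring distance out of the lexicalized term keeps the lexical counts dense enough to estimate, which is the point of equation (9) over equation (8).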

TLT 2003, Växjö. Gerold Schneider

[Figure: reduced, chunked tree for "This man eats bananas with a fork": S dominates NP (man) and VP; VP dominates VB (eat), NP (banana) and PP; PP dominates IN (with) and NP (fork)]

The reduced, chunked tree representation for the sentence "This man eats bananas with a fork" leads to the following dependency relations over the heads "man eat banana with fork":

(Collins, 1996): ⟨NP, S, VP⟩, ⟨VB, VP, NP⟩, ⟨VB, VP, PP⟩, ⟨IN, PP, NP⟩
(Schneider, 2003): ⟨subject⟩, ⟨object⟩, ⟨verb PP⟩, ⟨noun prep⟩

3 Extraction of Dependencies

The active subject relation has the head of an arbitrarily nested NP with the functional tag SBJ as dependent, and the head of an arbitrarily nested VP as head.

Passive subject and control subject:

[Tree patterns: (S (NP-SBJ-X noun) (VP passive-verb (NP *-X))) and (S (NP-SBJ-X noun) (VP control-verb (S (NP-SBJ -NONE- *-X) ...)))]

99% identity of X → local dependencies across several subtrees → simply reduce to a really local dependency.
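The active-subject pattern above can be sketched as a pattern match over a bracketed tree. This is a minimal illustration with a toy tree encoding and a naive head finder, both invented here; the actual extraction handles arbitrarily nested NPs and VPs.

```python
# Tree encoding: (label, [children]) for internal nodes, (tag, word) for leaves.
def tree(label, *children):
    return (label, list(children))

def leaf(tag, word):
    return (tag, word)

def head_word(node):
    """Naive head finder: descend to the rightmost lexical child."""
    _, content = node
    if isinstance(content, str):
        return content
    return head_word(content[-1])

def extract_subj(s_node):
    """Find (subject head, verb head) for an S with an NP-SBJ child."""
    _, children = s_node
    subj = next((c for c in children if c[0].startswith("NP-SBJ")), None)
    vp = next((c for c in children if c[0] == "VP"), None)
    if subj is None or vp is None:
        return None
    _, vkids = vp
    verb = next((c for c in vkids if c[0].startswith("V")), vkids[0])
    return (head_word(subj), head_word(verb))

s = tree("S",
         tree("NP-SBJ", leaf("DT", "the"), leaf("NN", "man")),
         tree("VP", leaf("VBZ", "sleeps")))
print(extract_subj(s))  # ('man', 'sleeps')
```

The reduction the slide describes amounts to emitting this as a single local subj relation even when the NP-SBJ or VP head sits several subtree levels down.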

A large subset of syntactic relations is modeled: the ones which are considered most relevant for argument structure and which are most ambiguous. Some use functional labels, several levels of subtrees and empty nodes as integral parts.

RELATION              | LABEL   | EXAMPLE
verb subject          | subj    | he sleeps
verb direct object    | obj     | sees it
verb second object    | obj2    | gave (her) kisses
verb adjunct          | adj     | ate yesterday
verb subord. clause   | sentobj | saw (they) came
verb pred. adjective  | predadj | is ready
verb prep. phrase     | pobj    | slept in bed
noun prep. phrase     | modpp   | draft of paper
noun participle       | modpart | report written
verb complementizer   | compl   | to eat apples
noun preposition      | prep    | to the house

Verb subject has a different probability model for active and passive.

4 Frequency Analysis of Empty Nodes

Distribution of the 10 most frequent types of empty nodes and their antecedents in the Penn Treebank (adapted from (Johnson, 2002)):

#  | Antecedent | POS    | Label | Count  | Description             | Example
1  | NP         | NP     | *     | 22,734 | NP trace                | Sam was seen *
2  |            | NP     | *     | 12,172 | NP PRO                  | * to sleep is nice
3  | WHNP       | NP     | *T*   | 10,659 | WH trace                | the woman who you saw *T*
4  |            |        | *U*   |  9,202 | Empty units             | $25 *U*
5  |            |        | 0     |  7,057 | Empty complementizers   | Sam said 0 Sasha snores
6  | S          | S      | *T*   |  5,035 | Moved clauses           | Sam had to go, Sasha said *T*
7  | WHADVP     | ADVP   | *T*   |  3,181 | WH-trace                | Sam explained how to leave *T*
8  |            | SBAR   |       |  2,513 | Empty clauses           | Sam had to go, said Sasha (SBAR)
9  |            | WHNP   | 0     |  2,139 | Empty relative pronouns | the woman 0 we saw
10 |            | WHADVP | 0     |    726 | Empty relative pronouns | the reason 0 to leave

Empty elements [rows 4, 5, 9, 10] → non-nucleus material. Moved clauses [row 6] and subject/utterance-verb inversion [row 8] → change of canonical direction.

4.1 NP Traces

Coverage of the patterns for the most frequent NP traces [row 1]:

Type                     | Count  | prob-modeled | Treatment
passive subject          |  6,803 | YES          | local relation
indexed gerund           |  4,430 | NO           | Tesnière translation
control, raise, semi-aux |  6,020 | YES          | post-parsing processing (see below)
others / not covered     |  5,481 |              |
TOTAL                    | 22,734 |              |

Example parser output (Prolog-style relations):

sentobj(ask, elaborate, _g101293, ->, 36).
modpart(salinger, ask, elaborate, <-, 36).
appos(salinger, secretary, _g101568, ->, 36).
subj(reply, salinger, ask, <-, 36).
subj(say, i, _g101843, <-, 36).
subj(get, it, _g102032, <-, 36).
subj(go, it, subj_control, <-, 36).  % subj-control
prep(draft, thru, _g102286, <-, 36).
pobj(go, draft, thru, ->, 36).
sentobj(get, go, draft, ->, 36).
sentobj(say, get, it, ->, 36).
sentobj(reply, say, i, ->, 36).
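The post-parsing step for control verbs can be sketched as a rule that copies the matrix verb's surface subject onto the embedded verb, as in the subj(go, it, subj_control, ...) line above. This is a hedged illustration: the relation format, the verb list and the examples are invented, not the paper's actual rules.

```python
# Relations are simplified to (label, head, dependent) triples.
SUBJ_CONTROL_VERBS = {"go", "try", "promise"}  # illustrative list only

def add_control_subjects(relations):
    """Propagate the matrix subject to embedded subject-control verbs."""
    subj_of = {head: dep for rel, head, dep in relations if rel == "subj"}
    added = []
    for rel, head, dep in relations:
        # sentobj(matrix_verb, embedded_verb): the matrix verb's subject
        # also fills the embedded verb's empty subject slot.
        if rel == "sentobj" and dep in SUBJ_CONTROL_VERBS and head in subj_of:
            added.append(("subj", dep, subj_of[head]))
    return relations + added

rels = [("subj", "get", "it"), ("sentobj", "get", "go")]
print(add_control_subjects(rels))  # adds ('subj', 'go', 'it')
```

Because the rule runs after parsing, it never enlarges the parser's search space, which is why control traces can stay out of the probability model.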

4.2 NP PRO

There are 12,172 NP PRO [row 2] in the Treebank: 5,656 are modpart, 3,095 non-indexed gerunds, 1,598 adverbial phrases of verbs, and 268 adverbial phrases of nouns.

4.3 WH Trace

113 of the 10,659 WHNP antecedents [row 3] are question pronouns. Over 9,000 are relative pronouns → change of direction if a subject or infinitive [example of row 7] is present.

But non-subject WH-question pronouns and support verbs need to be treated as real non-local dependencies. Before main parsing is started, the support verb is attached to any lonely participle chunk in the sentence, and the WH-pronoun pre-parses with any verb.

5 Evaluation

Subject:
- Precision: subj OR modpart → ncsubj_C OR cmod_C (with rel. pronoun)
- Recall: ncsubj_C → subj OR modpart
- ncsubj_C = non-clausal subject; cmod_C = clausal modification, used for relative clauses (but not all cmod_C are relative pronouns)

Object:
- Precision: obj OR obj2 → dobj_C OR obj2_C
- Recall: dobj_C OR obj2_C → obj OR obj2
- dobj_C = first object; obj2_C = second object

noun-pp:
- Precision: modpp → ncmod_C (with prep) OR xmod_C (with prep)
- Recall: ncmod_C (with prep) OR xmod_C (with prep) → modpp
- ncmod_C = non-clausal modification; xmod_C = clausal modification, for verb-to-noun translations

General Evaluation and Comparison

Percentage values:

          | Subject | Object | noun-pp | verb-pp
Precision |   91    |   89   |   73    |   74
Recall    |   81    |   83   |   67    |   83

Comparison to Lin (on the whole Susanne corpus):

          | Subject | Object | PP-attachment
Precision |   89    |   88   |   78
Recall    |   78    |   72   |   72

Comparison to Buchholz (Buchholz, 2002) and to Charniak (Charniak, 2000), according to Preiss:

          | Subject (ncsubj) | Object (dobj)
Precision |      86; 82      |     88; 84
Recall    |      73; 70      |     77; 76
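Where a single comparison figure is wanted, F1 can be derived from the precision/recall percentages reported in the first table above. A small helper, added here for clarity:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)

# Numbers from the slide's first table.
for rel, p, r in [("Subject", 91, 81), ("Object", 89, 83),
                  ("noun-pp", 73, 67), ("verb-pp", 74, 83)]:
    print(f"{rel}: F1 = {f1(p, r):.1f}")
```

For instance, Subject comes out at roughly F1 = 85.7, making the gap to the noun-pp relation (around 70) easy to see at a glance.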

Selective LDD evaluation (as far as the annotations permit):

WH-Subject Precision                           | 57/62   | 92%
WH-Subject Recall                              | 45/50   | 90%
WH-Object Precision                            | 6/10    | 60%
WH-Object Recall                               | 6/7     | 86%
Anaphora of the rel. clause subject, Precision | 41/46   | 89%
Anaphora of the rel. clause subject, Recall    | 40/63   | 63%
Passive subject Recall                         | 132/160 | 83%
Precision for subject-control subjects         | 40/50   | 80%
Precision for object-control subjects          | 5/5     | 100%
Precision of the modpart relation              | 34/46   | 74%
Precision for topicalized verb-attached PPs    | 25/35   | 71%

6 Conclusions

- fast (~300,000 words/h), lexicalized broad-coverage parser with grammatical relation (GR) output
- GR are closer to predicate-argument structures than pure constituency structures, and more informative if non-local dependencies are involved
- the parser's performance is state-of-the-art
- for English, most non-local dependencies can be treated as local dependencies (1) by using and modeling dedicated patterns across several levels of constituency subtrees, (2) by lexicalized post-processing rules, and (3) because some non-local dependencies are artifacts of the grammatical representation

References

Buchholz, Sabine. 2002. Memory-Based Grammatical Relation Finding. Ph.D. thesis, University of Tilburg, Tilburg, Netherlands.
Charniak, Eugene. 2000. A maximum-entropy-inspired parser. In Proceedings of the North American Chapter of the ACL, pages 132-139.
Collins, Michael. 1996. A new statistical parser based on bigram lexical dependencies. In Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, pages 184-191, Philadelphia.
Collins, Michael and James Brooks. 1995. Prepositional phrase attachment through a backed-off model. In Proceedings of the Third Workshop on Very Large Corpora, Cambridge, MA.
Eisner, Jason. 2000. Bilexical grammars and their cubic-time parsing algorithms. In Harry Bunt and Anton Nijholt, editors, Advances in Probabilistic and Other Parsing Technologies. Kluwer Academic Publishers.
Johnson, Mark. 2002. A simple pattern-matching algorithm for recovering empty nodes and their antecedents. In Proceedings of the 40th Meeting of the ACL, University of Pennsylvania, Philadelphia.
Lin, Dekang. 1998. Dependency-based evaluation of MINIPAR. In Workshop on the Evaluation of Parsing Systems, Granada, Spain.
Nivre, Joakim. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT 03), Nancy.
Schneider, Gerold. 2003. Extracting and using trace-free Functional Dependencies from the Penn Treebank to reduce parsing complexity. In Proceedings of Treebanks and Linguistic Theories (TLT) 2003, Växjö, Sweden.
Tapanainen, Pasi and Timo Järvinen. 1997. A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 64-71. Association for Computational Linguistics.

Parsing Efficiency I

DG is binary & in Chomsky Normal Form → CYK

CYK parsing: bottom-up parallel processing, passive chart

for j = 2 to N               # length of span
  for i = 1 to N - j + 1     # begin of span
    for k = i + 1 to i + j - 1   # separator position
      if X ∈ [i, k], Y ∈ [k, j], ∃ Z → XY and Z ∉ [i, j]
      then insert Z at [i, j]

[Figure: triangular chart illustration over the spans A B, B C, C D, D E, E F and their combinations up to A...F]
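The loops above can be sketched as a runnable CYK recognizer. The toy binary grammar and lexicon are invented for illustration; chart[(i, j)] holds the labels derivable over the span starting at word i with length j, mirroring the span-length / begin / separator loops on the slide.

```python
def cyk(words, lexicon, rules):
    """CYK recognizer: rules is a list of (Z, (X, Y)) binary rules."""
    n = len(words)
    chart = {}
    for i, w in enumerate(words):           # spans of length 1
        chart[(i, 1)] = set(lexicon[w])
    for j in range(2, n + 1):               # length of span
        for i in range(0, n - j + 1):       # begin of span
            cell = set()
            for k in range(1, j):           # separator position
                for x in chart.get((i, k), ()):
                    for y in chart.get((i + k, j - k), ()):
                        for z, rhs in rules:
                            if rhs == (x, y):
                                cell.add(z)  # insert Z at [i, j]
            chart[(i, j)] = cell
    return chart

lexicon = {"the": ["DT"], "man": ["NN"], "sleeps": ["VBZ"]}
rules = [("NP", ("DT", "NN")), ("S", ("NP", "VBZ"))]
chart = cyk("the man sleeps".split(), lexicon, rules)
print("S" in chart[(0, 3)])  # True: the sentence is recognized
```

The three nested loops over j, i and k give the O(n^3) span enumeration; the grammar-constant factor comes from the rule lookup inside the innermost loop.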

DG is binary & in Chomsky Normal Form → CYK → O(n^3)

CYK parsing: bottom-up parallel processing. My chart-data-driven CYK implementation:

1. Add all terminals to the chart.
2. Loop: for each chart entry X ∈ [i, k], for each adjacent chart entry Y ∈ [k, j]: if not tried(X, Y), then for each rule Z → XY assert Z to the chart (for the next Loop), and assert tried(X, Y).
3. If any rule was successful, prune and then Loop again, else terminate.

Pruning: if in a Loop more than m chart entries are created, then for every span with more than n readings in the chart, only keep the n/2 most probable entries.

Auxiliary charts: remember all tried chart pairs; remember all computed probabilities.
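The pruning rule above can be sketched as follows. This is a hedged illustration: the thresholds, the chart representation and the entries are invented, and the real implementation interleaves this with the tried-pairs bookkeeping.

```python
def prune_spans(chart, created, m=1000, n=4):
    """chart: span -> list of (label, prob) readings.
    Prune only when more than m entries were created this Loop; then
    every span with more than n readings keeps its n/2 most probable."""
    if created <= m:
        return chart
    pruned = {}
    for span, entries in chart.items():
        if len(entries) > n:
            entries = sorted(entries, key=lambda e: e[1], reverse=True)[: n // 2]
        pruned[span] = entries
    return pruned

chart = {(0, 2): [("NP", 0.5), ("X1", 0.1), ("X2", 0.05),
                  ("X3", 0.02), ("X4", 0.01)]}
out = prune_spans(chart, created=2000, m=1000, n=4)
print(out[(0, 2)])  # keeps the 2 most probable readings: NP and X1
```

Making pruning conditional on the per-Loop creation count keeps easy sentences exact and only approximates the search on the ambiguous ones, which is what makes the ~300,000 words/h figure plausible.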