Developing a TT-MCTAG for German with an RCG-based Parser

Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert
University of Tübingen, Germany; CNRS-LORIA, France
LREC 2008, 28.05.2008

Aims and scope
Presentation of an implementation framework for a German TAG-based grammar:
- How to design and maintain a grammatical resource (i.e., a German TT-MCTAG)?
- How to connect this with a (2-layered) lexical resource?
- How to parse German using these resources?
Outline:
1. The formalism: TAG and TT-MCTAG
2. The implementation framework: XMG and TuLiPA
3. The grammar: GerTT

Tree-Adjoining Grammar - Basics
A Tree Adjoining Grammar (TAG) is a set of elementary trees:
- a finite set of initial trees
- a finite set of auxiliary trees
E.g.: an auxiliary tree anchored by the adverb "easily" (ADV, with a foot node marked *) and an initial tree anchored by the verb "repaired" (V) with two NP substitution nodes.
Combinatorial operations:
- substitution: replacing a non-terminal leaf with an initial tree
- adjunction: replacing an internal node with an auxiliary tree
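
The two operations can be illustrated with a minimal Python sketch. It is not taken from the slides or from any TAG toolkit: the `Node` class, the helper names and the Gorn-style addresses are hypothetical, and the node labels S and VP are assumptions, since the tree labels did not survive in the transcript.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                               # category ("NP") or a word for lexical leaves
    children: List["Node"] = field(default_factory=list)
    subst: bool = False                      # substitution site (non-terminal leaf)
    foot: bool = False                       # foot node of an auxiliary tree


def node_at(tree: Node, address: Tuple[int, ...]) -> Node:
    """Follow an address given as a sequence of 1-based child indices."""
    node = tree
    for i in address:
        node = node.children[i - 1]
    return node


def substitute(tree: Node, address: Tuple[int, ...], initial: Node) -> None:
    """Substitution: replace the non-terminal leaf at `address` with an initial tree."""
    target = node_at(tree, address)
    assert target.subst and target.label == initial.label
    target.children = copy.deepcopy(initial.children)
    target.subst = False


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, if any."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(tree: Node, address: Tuple[int, ...], auxiliary: Node) -> None:
    """Adjunction: splice an auxiliary tree in at the internal node at `address`."""
    target = node_at(tree, address)
    aux = copy.deepcopy(auxiliary)
    foot = find_foot(aux)
    assert foot is not None and target.label == aux.label == foot.label
    # The material below the target node moves under the foot node ...
    foot.children, foot.foot = target.children, False
    # ... and the target node takes over the children of the auxiliary root.
    target.children = aux.children


def words(node: Node) -> List[str]:
    """Left-to-right yield of a tree; open substitution and foot nodes yield nothing."""
    if not node.children:
        return [] if (node.subst or node.foot) else [node.label]
    return [w for child in node.children for w in words(child)]


def derive_example() -> List[str]:
    # Assumed shapes: S(NP!, VP(V(repaired), NP!)) and VP(ADV(easily), VP*).
    repaired = Node("S", [Node("NP", subst=True),
                          Node("VP", [Node("V", [Node("repaired")]),
                                      Node("NP", subst=True)])])
    easily = Node("VP", [Node("ADV", [Node("easily")]), Node("VP", foot=True)])
    peter = Node("NP", [Node("Peter")])
    fridge = Node("NP", [Node("the fridge")])
    substitute(repaired, (1,), peter)       # address 1
    substitute(repaired, (2, 2), fridge)    # address 22
    adjoin(repaired, (2,), easily)          # address 2
    return words(repaired)


print(" ".join(derive_example()))
```

Addresses refer to nodes of the elementary tree, so the sketch applies both substitutions before the adjunction changes the shape of the tree below address 2.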

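Tree-Adjoining Grammar - Example
[Figure: the elementary trees for "Peter" (NP), "repaired" (V, with two NP substitution nodes), "the fridge" (NP) and "easily" (ADV, with a foot node *) are combined into the derived tree for "Peter easily repaired the fridge". In the derivation tree, "Peter", "easily" and "the fridge" attach to "repaired" at node addresses 1, 2 and 22.]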

Tree-Adjoining Grammar - Basics
TAGs are mildly context-sensitive:
1. Polynomial time parsing complexity
2. Generation of limited crossing dependencies
3. Constant growth property (semilinearity)
Large TAG grammars:
- English and Korean (XTAG, UPenn)
- French TAG (Benoit Crabbé's PhD thesis)
- ...

Why not TAG for German?
The order of complements (and adjuncts) of a verb is flexible.
(1) Peter liebt Susi.
    Reading 1: "Peter loves Susi"
    Reading 2: "Susi loves Peter"
(2) dass Peter heute den Kühlschrank repariert hat
    dass den Kühlschrank heute Peter repariert hat
    ...
    ("that Peter has repaired the fridge today")
TAG is inappropriate for German, because it is:
- not powerful enough for some constructions (i.e., coherent constructions)
- not descriptively adequate (i.e., one elementary tree for each permutation)
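
To make the "one elementary tree for each permutation" point concrete, the following Python sketch simply enumerates the verb-final orders of the dependents in example (2). It is a purely combinatorial illustration with hypothetical names; it makes no claim about which of these orders are equally natural in German.

```python
from itertools import permutations


def linearizations(verb, dependents):
    """All orders of the dependents followed by the clause-final verb complex."""
    return [" ".join(order + (verb,)) for order in permutations(dependents)]


orders = linearizations("repariert hat", ["Peter", "heute", "den Kühlschrank"])
for order in orders:
    print("dass", order)
print(len(orders), "orders for 3 dependents")
```

With n dependents there are n! orders, so a plain TAG that hard-codes one elementary tree per order grows factorially with the number of complements and adjuncts.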

TT-MCTAG: a TAG-extension for German
Multi-Component TAG (MCTAG) with shared-nodes locality.
Elementary structures are tuples ⟨γ, {β1, ..., βn}⟩:
- a lexicalized elementary tree γ (the head tree)
- a tree set {β1, ..., βn} (the complement trees)
Meaning of tree tuples: during derivation, the β-trees have to attach to the γ-tree (via node sharing).
Node sharing: in the derivation tree,
1. a β-tree must either be the immediate daughter of its γ-tree,
2. or the β-tree must be connected to a daughter of the γ-tree via a chain of root adjunctions.
Example: ⟨head tree anchored by the verb (V) "repariert", {complement tree for NP_nom (with foot node *), complement tree for NP_acc (with foot node *)}⟩

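TT-MCTAG example
(3) dass den Kühlschrank heute Peter repariert
    ("that Peter repairs the fridge today")
[Figure: the elementary structures used for (3), namely the tree tuple for "repariert" with its NP_nom and NP_acc complement trees, the NP trees for "Peter" and "den Kühlschrank", and the auxiliary tree for the adverb "heute", together with the resulting derivation tree, whose nodes are "repariert", NP_nom, "Peter", "heute", NP_acc and "den Kühlschrank" and whose edges carry the node addresses 0 and 1.]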
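
The node-sharing condition on tree tuples can be sketched as a check on a derivation tree. The Python below is an illustration only, not TuLiPA code: the `Attachment` record and `shares_node` are hypothetical names, and the derivation used in the example is an assumed reading of the figure for (3), in which NP_nom adjoins at the root of the "repariert" tree, "heute" at the root of the NP_nom tree, and NP_acc at the root of the "heute" tree.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Attachment:
    parent: Optional[str]   # tree this tree attaches to (None for the derivation root)
    at_root: bool           # True if it adjoins at the root node of the parent tree


def shares_node(deriv: Dict[str, Attachment], head: str, comp: str) -> bool:
    """Check the node-sharing condition for complement tree `comp` and head tree `head`."""
    current = comp
    while True:
        att = deriv[current]
        if att.parent is None:
            return False        # reached the derivation root without meeting the head
        if att.parent == head:
            return True         # immediate daughter, or end of a root-adjunction chain
        if not att.at_root:
            return False        # the chain is broken by a non-root attachment
        current = att.parent


# Assumed derivation for "dass den Kühlschrank heute Peter repariert".
derivation = {
    "repariert": Attachment(None, False),
    "NP_nom": Attachment("repariert", True),
    "Peter": Attachment("NP_nom", False),
    "heute": Attachment("NP_nom", True),
    "NP_acc": Attachment("heute", True),
    "den Kühlschrank": Attachment("NP_acc", False),
}

print(shares_node(derivation, "repariert", "NP_nom"))
print(shares_node(derivation, "repariert", "NP_acc"))
```

Under this reading NP_nom is an immediate daughter of the head tree, while NP_acc reaches it through the chain NP_acc, "heute", NP_nom of root adjunctions, which is what lets the accusative NP surface to the left of the adverb and the subject.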

The implementation framework
[Figure: the metagrammar is processed by the XMG compiler; the parser (TuLiPA) takes the lexicon and a sentence as further input and produces the parsing results.]
XMG: eXtensible MetaGrammar (Duchier et al., 2004)
TuLiPA: Tübingen Linguistic Parsing Architecture (Parmentier et al., 2008)

eXtensible MetaGrammar (XMG) (Duchier et al., 2004)
XMG lets one construct a grammar semi-automatically by describing tree fragments and their combination. The output structures are unlexicalized trees (tree schemata).
Essential for: consistency, design and maintenance efforts
Components:
1. a description language
2. a compiler
3. a viewer
4. output format: XML
XMG has been extended to describe tree sets.

XMG: An example
[Figure: two combinations of tree fragments. An NP substitution-node fragment plus a projection fragment yields a complement tree; an AP adverbial-anchor fragment plus a projection fragment yields an adverbial tree.]

A 2-layered lexicon
Morphological lexicon: maps an (inflected) token to some lemma form, while preserving morphological information in a feature structure.
    vergisst → vergessen [pos=v; num=sg; per=3;]
Lemma lexicon: maps a lemma onto tree tuple families, while also containing selectional restrictions (e.g., case assignment).
    *ENTRY: vergessen
    *CAT: v
    *SEM: BinaryRel[pred=vergessen]
    *ACC: 1
    *FAM: Vnp2
    *FILTERS: []
    *EX:
    *EQUATIONS:
    NParg1 cas = nom
    NParg2 cas = acc
    *COANCHORS:
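
The two lookup steps can be sketched in Python as follows. The dictionary layout and the `lookup` function are hypothetical and only mirror the information shown in the entries above; they do not reproduce the actual lexicon file format or any TuLiPA API.

```python
# Layer 1: inflected token -> (lemma, morphological feature structure).
MORPH_LEXICON = {
    "vergisst": [("vergessen", {"pos": "v", "num": "sg", "per": "3"})],
}

# Layer 2: lemma -> tree tuple family plus selectional restrictions.
LEMMA_LEXICON = {
    "vergessen": [{
        "cat": "v",
        "sem": "BinaryRel[pred=vergessen]",
        "fam": "Vnp2",
        "equations": {"NParg1": {"cas": "nom"}, "NParg2": {"cas": "acc"}},
    }],
}


def lookup(token):
    """Return (lemma, morphological features, lemma entry) triples for a token."""
    results = []
    for lemma, feats in MORPH_LEXICON.get(token, []):
        for entry in LEMMA_LEXICON.get(lemma, []):
            results.append((lemma, feats, entry))
    return results


for lemma, feats, entry in lookup("vergisst"):
    print(lemma, feats, entry["fam"], entry["equations"])
```

Keeping inflection in one layer and tree families in the other means that a new inflected form only needs a morphological entry, while a change in the valency of a verb only touches its lemma entry.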

Tübingen Linguistic Parsing Architecture (TuLiPA) (Parmentier et al., 2008)
Components:
1. TT-MCTAG-to-RCG converter (on-line)
2. RCG parser: RCG derivation forest → TT-MCTAG derivation forest
3. Parse viewer (derived tree, derivation tree, dependency view, semantic representation)
Availability of TuLiPA: written in Java and released under the GNU GPL (http://sourcesup.cru.fr/tulipa/)

TuLiPA: Why RCG?
RCG (Range Concatenation Grammar) is useful, because:
- it has attractive formal properties (polynomially parsable, full expressive power of MCS languages);
- there exist parsing algorithms.
The parser can be reused for other mildly context-sensitive (MCS) formalisms!
NB: RCG properly includes MCS. We use a restricted RCG, called simple RCG, that is included in MCS.
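
As a small illustration of what simple RCG clauses look like, the Python sketch below hand-codes a three-clause grammar for the non-context-free language a^n b^n c^n. The grammar is a textbook-style example chosen for this sketch, not one produced by the TT-MCTAG-to-RCG converter, and the recursive evaluation over substrings is an assumption made for brevity rather than the parsing algorithm used in TuLiPA.

```python
# Clauses of the example grammar:
#   S(XYZ)         -> A(X, Y, Z)
#   A(aX, bY, cZ)  -> A(X, Y, Z)
#   A(eps, eps, eps) -> eps


def A(x: str, y: str, z: str) -> bool:
    # Clause A(eps, eps, eps) -> eps
    if x == "" and y == "" and z == "":
        return True
    # Clause A(aX, bY, cZ) -> A(X, Y, Z)
    if x.startswith("a") and y.startswith("b") and z.startswith("c"):
        return A(x[1:], y[1:], z[1:])
    return False


def S(w: str) -> bool:
    # Clause S(XYZ) -> A(X, Y, Z): try every way of cutting w into three ranges.
    n = len(w)
    return any(A(w[:i], w[i:j], w[j:])
               for i in range(n + 1)
               for j in range(i, n + 1))


for word in ["", "abc", "aabbcc", "aabbc", "abcabc"]:
    print(repr(word), S(word))
```

Each predicate argument covers a contiguous range of the input, so there are only quadratically many ways to instantiate the three arguments of S; a real parser would work on index pairs with tabulation instead of copying substrings.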

TuLiPA: The graphical frontend
[Figure: screenshot of the TuLiPA graphical frontend]

Ongoing grammar development
GerTT (German TT-MCTAG): large-coverage TT-MCTAG for German, including semantics.
Linguistic principles:
- no empty elements such as traces and PRO
- no control and raising in the syntax
State of implementation:
- free word order phenomena: scrambling, coherent constructions, verbal clustering
- extraction phenomena: relative clauses, wh-questions, bridging constructions
- ca. 70 XMG classes
Coverage testing based on the TSNLP test suite is currently being prepared.

Summary
TT-MCTAG: more natural support of flexible word order languages, but still mildly context-sensitive (in fact, only k-TT-MCTAG).
The implementation framework, XMG + TuLiPA: immediate control over implementational (consistency) and linguistic (coverage) aspects of the grammar.
XMG: effortless means for making systematic changes in the grammar.
TuLiPA: easily adaptable to other MCS formalisms (given an RCG conversion algorithm).
And GerTT is on its way...

References
Denys Duchier, Joseph Le Roux, Yannick Parmentier (2004): The Metagrammar Compiler: An NLP Application with a Multi-paradigm Architecture. Second International Mozart/Oz Conference (MOZ 2004).
Yannick Parmentier, Laura Kallmeyer, Wolfgang Maier, Timm Lichte, Johannes Dellert (2008): TuLiPA: A syntax-semantics parsing environment for mildly context-sensitive formalisms. Proceedings of the Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+9).