Syntactic Theory. Tree-Adjoining Grammar (TAG) Yi Zhang. November 5th, Department of Computational Linguistics Saarland University


Syntactic Theory Tree-Adjoining Grammar (TAG) Yi Zhang Department of Computational Linguistics Saarland University November 5th, 2009

What you should have known so far...
Phrase structure grammars
Context-free grammar (CFG)
Dependency grammar

Outline
Overview
Tree-Substitution Grammar (TSG)

Tree-Adjoining Grammar
Describing natural language syntax with a CFG is not always effective, or even possible
Compared with CFG, TAG is an extended formalism
The basic elements of TAG are trees instead of atomic symbols
TAG is a tree-rewriting (rather than string-rewriting) system
TAG is mildly context-sensitive
TAG is a lexically oriented formalism, especially in its lexicalized variant, the lexicalized tree-adjoining grammar (LTAG)

A Brief Review of the History and Variants of TAG
Originally developed by Aravind Joshi (1975)
Lexicalized Tree-Adjoining Grammar (LTAG)
Synchronous TAG (STAG)
Multi-component TAG (MCTAG)


Phrase Structure Tree & CFG
1. S → NP VP
2. VP → really VP
3. VP → V NP
4. V → likes
5. NP → John
6. NP → Lyn
Phrase structure tree: [S [NP John] [VP really [VP [V likes] [NP Lyn]]]]
The locality of each rule is limited to one level of branching in the tree
The PS tree directly reflects the derivation steps of the CFG
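To make the last point concrete, here is a minimal Python sketch that replays a leftmost derivation of "John really likes Lyn" and builds the phrase structure tree as it goes. The rule numbering follows the slide; the function names (`build`, `bracket`, `leaves`) and the nested-tuple tree encoding are assumptions made for illustration only.

```python
# Rules of the toy CFG, numbered as on the slide.
RULES = {
    1: ("S", ["NP", "VP"]),
    2: ("VP", ["really", "VP"]),
    3: ("VP", ["V", "NP"]),
    4: ("V", ["likes"]),
    5: ("NP", ["John"]),
    6: ("NP", ["Lyn"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def build(rule_sequence, start="S"):
    """Replay a leftmost derivation (a list of rule numbers) as a tree."""
    rules = iter(rule_sequence)

    def expand(symbol):
        if symbol not in NONTERMINALS:
            return symbol  # terminal: a leaf of the tree
        lhs, rhs = RULES[next(rules)]
        if lhs != symbol:
            raise ValueError(f"rule for {lhs} applied to {symbol}")
        # One rule application creates exactly one level of branching.
        return (lhs, [expand(child) for child in rhs])

    return expand(start)


def bracket(tree):
    """Render a tree in labelled bracket notation."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"


def leaves(tree):
    """Return the terminal yield of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


tree = build([1, 5, 2, 3, 4, 6])
print(bracket(tree))
print(" ".join(leaves(tree)))
```

Each call to `expand` consumes one rule and adds one local subtree, which is the sense in which the tree mirrors the derivation steps.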

Limitations of CFG as Linguistic Formalism
Limited locality makes it difficult to describe (even slightly) non-local linguistic phenomena
Although it is possible to extend the CFG with complex categories (e.g. via lexicalization), the grammar quickly becomes unwieldy

Tree-Substitution Grammar
Elementary structures are phrase structure trees
A downward arrow (↓) indicates where a substitution takes place
α1: [NP John]
α2: [S NP↓ [VP [V likes] NP↓]]
α3: [NP Lyn]

Substitution Operation
The substitution operation allows one to insert elementary trees into other elementary trees
Where there is a node marked for substitution (↓) on the frontier, an elementary tree rooted in the same category can be substituted there
[Figure: a tree rooted in A is substituted at a frontier node A↓ of a tree rooted in S]

Substitutions & Derived Tree
Start: [S NP↓ [VP [V likes] NP↓]]
Substitute [NP John] at the subject NP↓: [S [NP John] [VP [V likes] NP↓]]
Substitute [NP Lyn] at the object NP↓: [S [NP John] [VP [V likes] [NP Lyn]]]
A (completely) derived tree has no more substitution nodes on the frontier
The order of substitutions is irrelevant
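The substitution steps above can be sketched in Python as follows. This is an illustrative sketch, not an implementation from the lecture: the `Node` class, the `substitute` function, and the use of child-index tuples as node addresses are all assumptions. The sketch performs the two substitutions in both orders and compares the results.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    subst: bool = False  # True for a frontier node marked with ↓


def clone(node):
    """Deep-copy a tree so that elementary trees are never mutated."""
    return Node(node.label, [clone(c) for c in node.children], node.subst)


def substitute(tree, address, initial):
    """Return a copy of `tree` in which the substitution node at `address`
    (a tuple of child indices from the root) is replaced by `initial`."""
    if not address:
        if not tree.subst:
            raise ValueError("target is not marked for substitution")
        if tree.label != initial.label:
            raise ValueError("root category does not match the target node")
        return clone(initial)
    i, rest = address[0], address[1:]
    if i >= len(tree.children):
        raise IndexError("address does not exist in this tree")
    children = [
        substitute(c, rest, initial) if k == i else clone(c)
        for k, c in enumerate(tree.children)
    ]
    return Node(tree.label, children, tree.subst)


def bracket(node):
    """Render a tree in labelled bracket notation."""
    if node.subst:
        return node.label + "↓"
    if not node.children:
        return node.label
    return "[" + node.label + " " + " ".join(bracket(c) for c in node.children) + "]"


# The three elementary trees of the example.
john = Node("NP", [Node("John")])
lyn = Node("NP", [Node("Lyn")])
likes = Node("S", [
    Node("NP", subst=True),
    Node("VP", [Node("V", [Node("likes")]), Node("NP", subst=True)]),
])

SUBJECT, OBJECT = (0,), (1, 1)
subject_first = substitute(substitute(likes, SUBJECT, john), OBJECT, lyn)
object_first = substitute(substitute(likes, OBJECT, lyn), SUBJECT, john)
print(bracket(subject_first))
print(subject_first == object_first)
```

Because a substitution only replaces one frontier node and leaves every other address untouched, the two orders reach the same derived tree.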

Elementary Trees
Elementary trees are the building blocks of TSG and TAG
For TSG, all the elementary trees are so-called initial trees, which are characterized as follows:
  interior nodes are labeled by non-terminal symbols
  frontier nodes are labeled by terminal or non-terminal symbols
  non-terminal nodes on the frontier of the initial tree are marked for substitution (conventionally written with ↓)

Tree-Substitution Grammar: Formal Definition
A Tree-Substitution Grammar (TSG) is a quadruple (Σ, NT, I, S), where
1. Σ is a finite set of terminal symbols
2. NT is a finite set of non-terminal symbols: Σ ∩ NT = ∅
3. S is a distinguished non-terminal symbol: S ∈ NT
4. I is a finite set of initial trees
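As a rough illustration of the definition, the quadruple can be written down as a small Python data structure with a well-formedness check. The names `TSG`, `is_initial_tree` and `is_well_formed` are hypothetical, and the encoding of trees as `(label, children)` pairs, where a leaf has an empty child tuple and a non-terminal leaf counts as a substitution node, is an assumption of this sketch.

```python
from typing import NamedTuple


class TSG(NamedTuple):
    terminals: frozenset      # Σ
    nonterminals: frozenset   # NT
    initial_trees: tuple      # I
    start: str                # S


def is_initial_tree(tree, g):
    """Interior nodes must be non-terminals; frontier nodes may be terminals
    or non-terminals (the latter are the substitution nodes)."""
    label, children = tree
    if children:
        return label in g.nonterminals and all(is_initial_tree(c, g) for c in children)
    return label in g.terminals or label in g.nonterminals


def is_well_formed(g):
    """Check the three side conditions of the definition."""
    return (
        not (g.terminals & g.nonterminals)       # Σ ∩ NT = ∅
        and g.start in g.nonterminals            # S ∈ NT
        and all(is_initial_tree(t, g) for t in g.initial_trees)
    )


def leaf(label):
    return (label, ())


example = TSG(
    terminals=frozenset({"John", "Lyn", "likes"}),
    nonterminals=frozenset({"S", "NP", "VP", "V"}),
    initial_trees=(
        ("NP", (leaf("John"),)),
        ("S", (leaf("NP"), ("VP", (("V", (leaf("likes"),)), leaf("NP"))))),
        ("NP", (leaf("Lyn"),)),
    ),
    start="S",
)
print(is_well_formed(example))
```

The check covers only what the definition states; it does not say anything about which strings or trees the grammar generates.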

Lexicalization
A grammar is lexicalized if it consists of:
  a finite set of structures, each associated with a lexical item; each lexical item will be called the anchor of the corresponding structure
  an operation or operations for composing the structures
Theorem: Lexicalized grammars are finitely ambiguous
We say a formalism F can be lexicalized by another formalism F′ if, for any finitely ambiguous grammar G in F, there is a grammar G′ in F′ such that G′ is a lexicalized grammar and such that G and G′ generate the same tree set (and hence the same language).

Problem with Lexicalization in TSG
Consider this CFG
1. S → NP VP
2. VP → adv VP
3. VP → v
4. NP → n
It can be lexicalized in a TSG
(α1) [S NP↓ [VP v]]
(α2) [S NP↓ [VP adv VP↓]]
(α3) [VP adv VP↓]
(α4) [VP v]
(α5) [NP n]
Linguistically motivated???
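A small Python check of the "lexicalized" property for this example, assuming the five initial trees read as α1 = [S NP↓ [VP v]], α2 = [S NP↓ [VP adv VP↓]], α3 = [VP adv VP↓], α4 = [VP v] and α5 = [NP n]. The helper names and the `(label, children)` encoding are hypothetical; the check only verifies that every elementary tree carries at least one terminal anchor on its frontier.

```python
TERMINALS = {"adv", "v", "n"}


def leaf(label):
    """A frontier node; non-terminal leaves are substitution nodes."""
    return (label, ())


ALPHA = {
    "alpha1": ("S", (leaf("NP"), ("VP", (leaf("v"),)))),
    "alpha2": ("S", (leaf("NP"), ("VP", (leaf("adv"), leaf("VP"))))),
    "alpha3": ("VP", (leaf("adv"), leaf("VP"))),
    "alpha4": ("VP", (leaf("v"),)),
    "alpha5": ("NP", (leaf("n"),)),
}


def frontier(tree):
    """Labels of the frontier nodes, left to right."""
    label, children = tree
    if not children:
        return [label]
    return [x for child in children for x in frontier(child)]


def anchors(tree):
    """Terminal symbols on the frontier of an elementary tree."""
    return [symbol for symbol in frontier(tree) if symbol in TERMINALS]


def is_lexicalized(trees):
    """True if every elementary tree has at least one anchor."""
    return all(anchors(t) for t in trees)


for name, tree in ALPHA.items():
    print(name, frontier(tree), anchors(tree))
print(is_lexicalized(ALPHA.values()))
```

By contrast, the CFG rule S → NP VP read as a one-level tree has the frontier NP, VP and no anchor, which is why the TSG has to merge rules into larger trees such as α1 and α2.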

Is TSG Good Enough?
Theorem: In general, finitely ambiguous context-free grammars cannot be lexicalized with a tree-substitution grammar
Proof. Consider the following CFG as a counterexample:
1. S → S S
2. S → a
(Try to prove that there is no lexicalized TSG that generates the same tree language)
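One possible line of argument for the exercise (a sketch, not necessarily the proof intended in the lecture): substitution only rewrites frontier non-terminals, so the path from the root of the initial tree that starts a derivation down to its anchor survives unchanged in every tree derived from it. A lexicalized TSG has finitely many initial trees, so there is a bound d such that every derived tree has some leaf a at depth at most d. The CFG above, however, licenses trees in which every leaf is arbitrarily deep. The Python snippet below illustrates that second half; `full_tree`, `licensed` and `min_leaf_depth` are hypothetical helper names.

```python
def full_tree(k):
    """A tree licensed by S -> S S and S -> a whose leaves all sit at depth k + 1."""
    if k == 0:
        return ("S", ["a"])
    return ("S", [full_tree(k - 1), full_tree(k - 1)])


def licensed(tree):
    """True if every local tree is an instance of S -> S S or S -> a."""
    label, children = tree
    if label != "S":
        return False
    if children == ["a"]:
        return True
    return len(children) == 2 and all(
        not isinstance(child, str) and licensed(child) for child in children
    )


def min_leaf_depth(tree):
    """Depth of the shallowest leaf, counting the root as depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + min(min_leaf_depth(child) for child in tree[1])


for k in range(5):
    t = full_tree(k)
    print(k, licensed(t), min_leaf_depth(t))
```

For any candidate bound d, `full_tree(d)` is licensed by the CFG but has no leaf at depth d or less, so under the assumptions above no lexicalized TSG can generate the full tree set.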

References
Joshi, A. and Schabes, Y. (1997). Tree-adjoining grammars.