Tree-Adjoining Grammars


Department of Computer Science, University of Helsinki

Outline
Introduction: formalisms for linguistic purposes.
Basics of TAGs: elementary structures and operations, derivation.
Formal properties of grammars and TAGs.
TAG variants: Multicomponent TAGs (MC-TAG), Synchronous TAGs (STAG).
TAG parsing.

Formal systems for linguistic theories
Basis of any formal system: elementary structures and combining operations.
Context-free grammars (CFG): terminal and nonterminal symbols, and rewrite rules.
CFG example, with rules as elementary structures:
1. S → NP VP
2. VP → really VP
3. VP → V NP
4. V → likes
5. NP → John
6. NP → Lyn

Derivation in CFGs
[Figure: the phrase structure tree for "John really likes Lyn"]
For each nonterminal node, the daughters record which rule was used to rewrite it.
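The derivation behind such a phrase structure tree can be sketched in a few lines of Python. This is only an illustration: the rule numbering follows the example grammar, but the NP and VP labels and the helper name leftmost_derivation are assumptions made here, not part of the slides.

```python
# Toy CFG. The NP/VP labels are an assumed reading of the slide's example rules.
RULES = {
    1: ("S", ["NP", "VP"]),
    2: ("VP", ["really", "VP"]),
    3: ("VP", ["V", "NP"]),
    4: ("V", ["likes"]),
    5: ("NP", ["John"]),
    6: ("NP", ["Lyn"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def leftmost_derivation(rule_sequence, start="S"):
    """Rewrite the leftmost nonterminal with each rule in turn.

    Returns the list of sentential forms, starting with [start].
    """
    form = [start]
    steps = [list(form)]
    for number in rule_sequence:
        lhs, rhs = RULES[number]
        # Position of the leftmost nonterminal in the current form.
        index = next(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[index] != lhs:
            raise ValueError(f"rule {number} cannot rewrite {form[index]}")
        form = form[:index] + rhs + form[index + 1:]
        steps.append(list(form))
    return steps


steps = leftmost_derivation([1, 5, 2, 3, 4, 6])
for form in steps:
    print(" ".join(form))
```

Each printed line is one sentential form; the sequence of rule numbers carries the same information as the daughters in the phrase structure tree.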

Tree Substitution Grammars (TSG)
Both elementary objects and derivations are trees.
[Figure: TSG example. Elementary trees for John, Lyn and likes, and the derived tree for "John likes Lyn"]
Elementary structures are combined by substitution.
Condition: the nonterminal node must have the same label as the root node of the substituted tree.
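A minimal sketch of substitution, including the label condition, might look as follows. The Node class, the subst flag and the NP/VP labels are assumptions introduced for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    subst: bool = False  # True marks an open substitution site


def substitute(tree, initial):
    """Fill the leftmost open site whose label equals the root label of `initial`.

    Returns (new_tree, done); `done` is False when no site matches.
    """
    if tree.subst:
        # Label condition: site label must equal the root label.
        return (initial, True) if tree.label == initial.label else (tree, False)
    new_children, done = [], False
    for child in tree.children:
        if not done:
            child, done = substitute(child, initial)
        new_children.append(child)
    return Node(tree.label, new_children), done


def words(tree):
    """Left-to-right frontier; open sites are skipped, childless nodes are words."""
    if tree.subst:
        return []
    if not tree.children:
        return [tree.label]
    return [w for child in tree.children for w in words(child)]


likes = Node("S", [Node("NP", subst=True),
                   Node("VP", [Node("V", [Node("likes")]),
                               Node("NP", subst=True)])])
john = Node("NP", [Node("John")])
lyn = Node("NP", [Node("Lyn")])

tree, _ = substitute(likes, john)
tree, _ = substitute(tree, lyn)
print(" ".join(words(tree)))  # John likes Lyn
```

A tree whose root label matches no open site, for example one rooted in VP here, is rejected: substitute returns done = False and leaves the tree unchanged.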

Domain of locality
CFGs and TSGs are weakly equivalent: they generate the same string languages, but the derived structures have a different domain of locality.
Local restrictions are valid in the domain of locality: a CFG rule or a tree grammar tree. Examples: V agreement, subcategorisation.
TSGs (and other tree grammars) have an extended domain of locality.

Lexicalisation
A grammar is lexicalised if every elementary structure is associated with exactly one lexical item, and every lexical item of the language is associated with a finite set of elementary structures in the grammar.
CFGs cannot be lexicalised in a linguistically meaningful manner, but let's try.
[Figure: a merged rule for S anchored by likes. No place for really?]
Instead of merging two rules into one, we can combine them into a tree structure: a TSG. Still no place for really.
Solution: the adjunction operation.
A formalism in which the elementary structures of a grammar are trees and in which the combining operations are adjunction and substitution is called a Tree Adjoining Grammar (TAG). When lexicalised, we have a Lexicalised Tree Adjoining Grammar (LTAG).
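The first half of the definition can be checked mechanically. The sketch below is an assumption-laden illustration: it treats every terminal on a rule's right-hand side as a lexical item, and the rule sets plain_cfg and merged are toy grammars made up for the example.

```python
NONTERMINALS = {"S", "NP", "VP", "V"}


def anchors(rhs):
    """Lexical items (terminals) on the right-hand side of a rule."""
    return [symbol for symbol in rhs if symbol not in NONTERMINALS]


def is_lexicalised(rules):
    """Check only the first condition: exactly one lexical item per rule."""
    return all(len(anchors(rhs)) == 1 for _, rhs in rules)


# An ordinary CFG: S -> NP VP and VP -> V NP carry no lexical item.
plain_cfg = [("S", ["NP", "VP"]), ("VP", ["V", "NP"]), ("V", ["likes"]),
             ("NP", ["John"]), ("NP", ["Lyn"])]

# Merging the verb into the sentence rule lexicalises the toy grammar,
# but leaves no VP node at which "really" could attach.
merged = [("S", ["NP", "likes", "NP"]), ("NP", ["John"]), ("NP", ["Lyn"])]

print(is_lexicalised(plain_cfg))  # False
print(is_lexicalised(merged))     # True
```

The merged grammar passes the check precisely because the intermediate structure has been flattened away, which is the problem that adjunction addresses.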

Elementary structures
Elementary trees are maximal syntactic projections of lexical items.
Initial trees (alpha trees): recursion is not allowed.
Auxiliary trees (beta trees): recursion is allowed. Root and foot node must have the same label.
Lexicalised trees have anchors on the frontier of the tree.
[Figure: schematic initial tree rooted in X; schematic auxiliary tree with root X and foot node X]

Operations
Substitution: only for initial trees or lexical items.
Adjunction: only for auxiliary trees.
[Figure: schematic substitution and schematic adjunction into a tree rooted in X]

Adjunction example
Adjunction of really into the initial tree:
[Figure: the auxiliary tree for really, with its foot node marked *, adjoins into the initial tree for "John likes Lyn", giving the derived tree for "John really likes Lyn"]
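The adjunction in this example can be sketched as a tree operation: the subtree at the adjunction site is cut out, the auxiliary tree takes its place, and the cut subtree is re-attached at the foot node. The Node class, the foot flag and the NP/VP labels below are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    foot: bool = False  # True marks the foot node of an auxiliary tree


def plug_foot(aux, subtree):
    """Copy `aux`, replacing its foot node by `subtree`."""
    if aux.foot:
        return subtree
    return Node(aux.label, [plug_foot(c, subtree) for c in aux.children])


def adjoin(tree, aux, target_label):
    """Adjoin `aux` at the topmost, leftmost internal node labelled `target_label`.

    Returns (new_tree, done); `done` is False when no such node exists.
    """
    if aux.label != target_label:
        raise ValueError("root of the auxiliary tree must match the target label")
    if tree.label == target_label and tree.children:
        return plug_foot(aux, tree), True
    new_children, done = [], False
    for child in tree.children:
        if not done:
            child, done = adjoin(child, aux, target_label)
        new_children.append(child)
    return Node(tree.label, new_children), done


def words(tree):
    """Left-to-right frontier; childless nodes are treated as words."""
    if not tree.children:
        return [tree.label]
    return [w for child in tree.children for w in words(child)]


likes = Node("S", [Node("NP", [Node("John")]),
                   Node("VP", [Node("V", [Node("likes")]),
                               Node("NP", [Node("Lyn")])])])
really = Node("VP", [Node("really"), Node("VP", foot=True)])

derived, done = adjoin(likes, really, "VP")
print(" ".join(words(derived)))  # John really likes Lyn
```

Because root and foot carry the same label, the derived tree again contains a VP node dominating a VP node, so the operation can be repeated.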

Derived trees and derivation trees
A string-rewriting formalism, e.g. a CFG, derives a set of strings.
A tree-rewriting formalism, e.g. a TAG, derives a tree: the derived tree. Linguistic TAGs derive phrase structure trees.
A derivation tree records how the derived string (CFG) or derived tree (TAG) was assembled from elementary rules (CFG) or elementary trees (TAG).
Derivation tree for "John really likes Lyn": the root is (like), with daughters (John), (Lyn) and (really).
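As a data structure, a derivation tree only needs the name of each elementary tree and the operation that attached it to its parent. The following sketch uses a hypothetical DerivationNode class and plain tree names; neither comes from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DerivationNode:
    tree: str                # name of an elementary tree
    operation: str = "root"  # "substitution" or "adjunction" into the parent
    children: list = field(default_factory=list)


def steps(node, parent=None):
    """List the combination steps recorded in a derivation tree, top-down."""
    result = []
    if parent is not None:
        result.append(f"{node.tree} --{node.operation}--> {parent.tree}")
    for child in node.children:
        result.extend(steps(child, node))
    return result


derivation = DerivationNode("like", children=[
    DerivationNode("John", "substitution"),
    DerivationNode("Lyn", "substitution"),
    DerivationNode("really", "adjunction"),
])

for step in steps(derivation):
    print(step)
```

The derivation tree is flat even though the derived tree is not: all three elementary trees attach directly to the tree anchored by the verb.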

Derivation tree examples
When derived trees are ambiguous, derivation trees might show the difference.
Elementary tree for an idiomatic expression and two derivation trees for "Mary pulls John's leg":
[Figure: elementary tree for the idiom "to pull ...'s leg"; derivation tree for the literal reading, rooted in pull-n0vn1; derivation tree for the idiomatic reading, rooted in pull-leg-n0vdn1n]

Adjunction constraints and features
Elementary tree nodes can be annotated with adjunction constraints.
Selective adjoining constraint (SA): list of accepted trees.
Null adjoining constraint (NA): empty list.
Obligatory adjunction constraint (OA): boolean value.
Nonterminal and terminal nodes?
NA nodes are nonterminal nodes that are not rewritten.
OA nodes are nonterminal nodes that must be rewritten.
SA nodes are either terminal or nonterminal nodes for tree rewriting.
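One way to encode the three constraints is a list of accepted trees plus a boolean flag, as the descriptions above suggest. The sketch below is a hedged illustration: the Constraint class, the function names and the tree names "really" and "often" are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Constraint:
    # None: any auxiliary tree may adjoin. A set: SA list. Empty set: NA.
    allowed: Optional[FrozenSet[str]] = None
    obligatory: bool = False  # OA flag


NA = Constraint(allowed=frozenset())


def may_adjoin(constraint, aux_name):
    """True if the named auxiliary tree is accepted at this node."""
    return constraint.allowed is None or aux_name in constraint.allowed


def is_complete(constraint, adjoined_count):
    """An OA node must have received at least one adjunction."""
    return adjoined_count > 0 or not constraint.obligatory


sa = Constraint(allowed=frozenset({"really"}))
oa = Constraint(obligatory=True)

print(may_adjoin(sa, "really"), may_adjoin(sa, "often"))  # True False
print(may_adjoin(NA, "really"))                           # False
print(is_complete(oa, 0), is_complete(oa, 1))             # False True
```

Treating NA as an SA constraint with an empty list keeps the check for permitted adjunctions to a single membership test.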

Comparison of formal grammars
Chomsky hierarchy for string rewriting systems:

Grammar   Languages                Automaton                                         Production rules
Type-0    Recursively enumerable   Turing machine                                    No restrictions
Type-1    Context-sensitive        Linear-bounded non-deterministic Turing machine   αAβ → αγβ
Type-2    Context-free             Nondeterministic pushdown automaton               A → γ
Type-3    Regular                  Finite state automaton                            A → aB, A → a

Tree Adjoining Grammars are stronger than CFGs, but weaker than context-sensitive grammars.

Formal properties of TAGs
The set of languages generated by TAGs, $L(\mathrm{TAG})$, includes the set of languages generated by context-free grammars, $L(\mathrm{CFG})$.
Inclusion is proper, e.g. COUNT-4 $= \{a^n b^n c^n d^n \mid n \ge 0\} \notin L(\mathrm{CFG})$.
Moreover, $L(\mathrm{TAG}) \subset L(\mathrm{CSG})$, e.g. COUNT-5 $\in L(\mathrm{CSG}) \setminus L(\mathrm{TAG})$.
Automaton: Embedded Pushdown Automaton, with a stack of stacks of stack symbols as the pushdown store.
Tree-Adjoining Languages (TAL) are polynomially parsable, time complexity $O(n^6)$.
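To make the COUNT languages concrete, here is a small membership test. It assumes that COUNT-k denotes the strings made of k distinct symbols in a fixed order, each repeated n times, with n ≥ 0; the function name in_count and the choice of alphabet are illustrative. It is a direct string check, not a TAG parser.

```python
def in_count(k, s, alphabet="abcdefgh"):
    """True if s consists of the first k letters of `alphabet`,
    in order, each repeated the same number of times (n >= 0)."""
    if not 1 <= k <= len(alphabet):
        raise ValueError("k must be between 1 and len(alphabet)")
    if len(s) % k != 0:
        return False
    n = len(s) // k
    return s == "".join(letter * n for letter in alphabet[:k])


print(in_count(4, "aabbccdd"))  # True
print(in_count(4, "aabbccd"))   # False
print(in_count(5, "abcde"))     # True
```

Membership is trivial to decide for every k; the slide's point concerns which grammar formalisms can generate these sets, with TAGs reaching k = 4 but not k = 5.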

Extending the Power of TAG
TAG cannot always provide a satisfactory analysis for linguistic constructions, e.g. "This building, John bought a picture of."
"This building" is the complement of the noun picture and should be substituted into an NP node in the same elementary tree as the head noun picture.
[Figure: illegal adjunction using an illegal auxiliary tree (buy, picture of)]

Multicomponent TAGs (MC-TAG)
Elementary sets are sets of trees rather than single trees.
In a tree-local multicomponent TAG, all members of an elementary set must adjoin simultaneously into a single elementary tree.
In a set-local multicomponent TAG, all members of a derived set of trees must adjoin simultaneously into trees from a single elementary set.
[Figure: multicomponent tree set for buy and picture of]

Synchronous TAGs (STAG)
A Synchronous TAG relates the tree-adjoining grammars of two different languages.
Definitions for node-to-node correspondence, lexical entries, feature transfer.
Application areas include machine translation, language generation, semantic analysis, etc.
A typical transfer algorithm for machine translation:
1. Parse the source sentence according to the source grammar.
2. Map each elementary tree in the source derivation tree to a tree in the target derivation tree according to the transfer lexicon.
3. Read the target sentence off the target derivation tree.
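The mapping step of the transfer algorithm can be sketched as a recursive relabelling of the derivation tree. This is a strong simplification: it assumes that source and target derivation trees have the same shape, and the lexicon entries (English to French) and the name transfer are invented for the example. Parsing and reading off the target sentence are left out.

```python
# Hypothetical transfer lexicon: source elementary tree -> target elementary tree.
TRANSFER_LEXICON = {
    "like": "aimer",
    "John": "John",
    "Lyn": "Lyn",
    "really": "vraiment",
}


def transfer(node, lexicon):
    """Map a source derivation tree (name, children) to a target derivation tree.

    Assumes the two derivation trees are isomorphic.
    """
    name, children = node
    if name not in lexicon:
        raise KeyError(f"no transfer entry for elementary tree {name!r}")
    return (lexicon[name], [transfer(child, lexicon) for child in children])


source = ("like", [("John", []), ("Lyn", []), ("really", [])])
target = transfer(source, TRANSFER_LEXICON)
print(target)
```

Working on derivation trees rather than derived trees keeps the mapping local: each step looks at one elementary tree at a time, whatever the word order of the two languages.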

TAG recognition and parsing
A bottom-up chart parser proceeds bottom-up in recognising the elementary trees used in a derivation and assembling the elementary trees into a derivation. Time complexity is the same polynomial in the input length n in the worst and the best case.
Earley-style algorithms combine bottom-up parsing with top-down prediction on derived trees. Worst-case time complexity is polynomial in n; they are faster in the average case.
Head-driven algorithms extend parses along the path from the anchor of an elementary tree to its root by performing adjunctions. Worst-case time complexity is polynomial in n.
Algorithms based on kernel grammars (a CFG) parse the input twice. In the second step, TAG-incompatible derivations are eliminated from the context-free parse forest. Worst-case time complexity is polynomial in n.
Several other parsing algorithms exist.

Today...
Project work topics: introduction and selection.
Presentation schedule.
Delivery of exercises for next week.