Lecture 5: Parsing with constraint-based grammars

Providing a more adequate treatment of syntax than simple CFGs: replacing the atomic categories by more complex data structures.

1. Problems with simple CFG encoding: agreement, subcategorisation, long-distance dependencies
2. Feature structures (informally)
3. Encoding agreement
4. Parsing with feature structures
5. Feature structures more formally
6. Encoding subcategorisation
7. Interface to morphology

Deficiencies in atomic category CFGs

Overgeneration with the lecture 4 grammar:

agreement, e.g. subject-verb agreement:
    they fish, it fishes, *it fish, *they fishes
case on pronouns (and maybe who/whom):
    they like them, *they like they

Expanding symbols:

S -> NP-sg-subj VP-sg
S -> NP-pl-subj VP-pl
VP-sg -> V-sg NP-sg-obj
VP-sg -> V-sg NP-pl-obj
VP-pl -> V-pl NP-sg-obj
VP-pl -> V-pl NP-pl-obj
NP-sg-subj -> he
NP-sg-obj -> him
NP-sg-subj -> fish
NP-pl-subj -> fish
NP-sg-obj -> fish
NP-pl-obj -> fish
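To get a feel for how the expanded-symbol encoding grows, the phrase-structure rules above can be generated mechanically from the feature values. This is only an illustrative sketch: the function name expand_rules and the rule representation (a left-hand side plus a list of right-hand-side symbols) are assumptions, not part of the lecture.

```python
from itertools import product

NUMBERS = ["sg", "pl"]

def expand_rules():
    """Enumerate the atomic-category rules with number and case folded into the symbols."""
    rules = []
    # S -> NP-<n>-subj VP-<n>: subject and verb phrase must share number.
    for n in NUMBERS:
        rules.append(("S", [f"NP-{n}-subj", f"VP-{n}"]))
    # VP-<n> -> V-<n> NP-<m>-obj: the object's number is free,
    # so every combination of n and m needs its own rule.
    for n, m in product(NUMBERS, NUMBERS):
        rules.append((f"VP-{n}", [f"V-{n}", f"NP-{m}-obj"]))
    return rules

for lhs, rhs in expand_rules():
    print(lhs, "->", " ".join(rhs))
```

Two original rules (S -> NP VP and VP -> V NP) become six, and every further feature folded into the symbols multiplies the count again.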

Intuitive solution for case and agreement

BUT the expanded-symbol encoding gives a very large grammar, misses generalizations, and has no way of saying when we don't care about agreement.

Have separate slots (features) for case (CASE) and agreement (AGR).
Allow slot values for CASE to be subj, obj or unspecified.
Allow slot values for AGR to be sg, pl or unspecified.
Subjects must have the same value for AGR as their verbs.
Subjects have CASE subj, objects have CASE obj.

can (noun):   [ CASE [ ],  AGR sg ]
fish (noun):  [ CASE [ ],  AGR [ ] ]
she:          [ CASE subj, AGR sg ]
them:         [ CASE obj,  AGR pl ]

Subcategorization

Intransitive vs transitive etc.: verbs have different numbers and types of syntactic arguments, e.g.:

*Kim adored
*Kim gave Sandy
*Kim adored to sleep
Kim liked to sleep
*Kim devoured
Kim ate

Subcategorization is correlated with semantics, but not determined by it.

Overgeneration:

they fish fish it
(S (NP they) (VP (V fish) (VP (V fish) (NP it))))

Informally: need slots on the verbs for their syntactic arguments.

Long-distance dependencies

1. which problem did you say you don't understand?
2. who do you think Kim asked Sandy to hit?
3. which kids did you say were making all that noise?

Gaps (marked with underscores):

1. which problem did you say you don't understand _?
2. who do you think Kim asked Sandy to hit _?
3. which kids did you say _ were making all that noise?

In 3, the verb were shows plural agreement:

* what kid did you say _ were making all that noise?

The gap filler has to be plural.

Informally: need a gap slot which is to be filled by something that itself has features.

Feature structures

1. Features like AGR with simple values (sg, pl): atomic-valued.
2. Unspecified values possible on features: compatible with any value.
3. Values for features for subcat and gap themselves have features: complex-valued.
   Path: a sequence of features.
4. Method of specifying that two paths are the same: reentrancy.
5. Unification: combining two feature structures, retaining all information from each, or failing if the information is incompatible.

Feature structures are singly-rooted directed acyclic graphs, with arcs labelled by features and terminal nodes associated with values.

Rules relate FSs, i.e. lexical entries and phrases are represented as FSs.
Rule application is by unification.
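As a rough illustration of unification, feature structures without reentrancy can be modelled as nested Python dictionaries. The encoding is an assumption of this sketch, not the lecture's: atomic values are strings, an unspecified value is the empty dict, and failure is signalled by returning None.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on failure.

    Assumed encoding: atomic values are strings, complex values are dicts,
    and an unspecified value is the empty dict {}.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # keep all information from a
        for feat, b_val in b.items():
            if feat in result:
                sub = unify(result[feat], b_val)
                if sub is None:
                    return None  # incompatible information on this path
                result[feat] = sub
            else:
                result[feat] = b_val  # information only present in b
        return result
    if a == {}:
        return b  # unspecified is compatible with any value
    if b == {}:
        return a
    return a if a == b else None  # two atoms, or an atom against a complex value

fish = {"CAT": "NP", "AGR": {}}
print(unify(fish, {"AGR": "sg"}))
print(unify({"AGR": "sg"}, {"AGR": "pl"}))
```

The first call adds sg to the unspecified AGR of fish; the second fails because sg and pl are incompatible atomic values. Reentrancy needs a graph representation and is not covered by this dictionary encoding.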

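Graphs and AVMs

Example 1 (graph): a root node with an arc CAT leading to a node with value NP and an arc AGR leading to a node with value sg.
Here, CAT and AGR are atomic-valued features. NP and sg are values.

Example 2 (graph): a root node with an arc HEAD leading to a node which itself has arcs CAT (value NP) and AGR (no value).
HEAD is complex-valued, AGR is unspecified.

AVM notation:

Example 1: [ CAT NP, AGR sg ]
Example 2: [ HEAD [ CAT NP, AGR [ ] ] ]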

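Reentrancy

Non-reentrant: F and G lead to two distinct nodes, each with value a.

[ F a
  G a ]

Reentrant: F and G lead to one and the same node, with value a. In AVM notation the shared node is marked with a tag (here #0):

[ F #0 a
  G #0 ]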
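The difference between the two structures can be made concrete with object identity. The sketch below is illustrative only: the Node class is a hypothetical helper, and the nodes start out unspecified so that the effect of adding the value a along path F is visible.

```python
class Node:
    """A feature-structure node: an optional atomic value plus labelled arcs."""
    def __init__(self, value=None):
        self.value = value
        self.arcs = {}

# Non-reentrant: F and G lead to two distinct (unspecified) nodes.
separate = Node()
separate.arcs["F"] = Node()
separate.arcs["G"] = Node()

# Reentrant: F and G lead to one and the same node.
shared = Node()
shared.arcs["F"] = shared.arcs["G"] = Node()

# Add the value a along path F in both structures.
separate.arcs["F"].value = "a"
shared.arcs["F"].value = "a"

print(separate.arcs["G"].value)  # None: G is a different node
print(shared.arcs["G"].value)    # a: G is the same node as F
```

With reentrancy, information added along one path is automatically available along the other. This is what lets a rule state that two positions must agree.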

CFG with agreement

S -> NP-sg VP-sg
S -> NP-pl VP-pl
VP-sg -> V-sg NP-sg
VP-sg -> V-sg NP-pl
VP-pl -> V-pl NP-sg
VP-pl -> V-pl NP-pl
V-pl -> like
V-sg -> likes
NP-sg -> it
NP-pl -> they
NP-sg -> fish
NP-pl -> fish

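FS grammar fragment encoding agreement

Grammar rules (a tag such as #1 marks a shared, i.e. reentrant, value):

Rule1: [ CAT S,  AGR #1 ] -> [ CAT NP, AGR #1 ], [ CAT VP, AGR #1 ]
Rule2: [ CAT VP, AGR #1 ] -> [ CAT V,  AGR #1 ], [ CAT NP, AGR [ ] ]

Lexicon:

;;; noun phrases
they   [ CAT NP, AGR pl ]
fish   [ CAT NP, AGR [ ] ]
it     [ CAT NP, AGR sg ]

;;; verbs
like   [ CAT V, AGR pl ]
likes  [ CAT V, AGR sg ]

Root structure: [ CAT S ]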

Parsing (informally)

they like it

The lexical structures for like and it are unified with the corresponding structures on the right-hand side of rule 2 (both unifications succeed).

The structure corresponding to the mother of the rule is now:

[ CAT VP, AGR pl ]

This unifies with the rightmost daughter position of rule 1.
The structure for they is unified with the leftmost daughter.
The result unifies with the root structure.

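Rules as FSs

Rules have features MOTHER, DTR1, DTR2 ... DTRN.

Rule2 (informally, with the tag #1 marking a shared value):

[ CAT VP, AGR #1 ] -> [ CAT V, AGR #1 ], [ CAT NP, AGR [ ] ]

actually:

[ MOTHER [ CAT VP, AGR #1 ]
  DTR1   [ CAT V,  AGR #1 ]
  DTR2   [ CAT NP, AGR [ ] ] ]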

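Rule 2 application

like is unified with the value of DTR1 in rule 2. Because AGR is shared between MOTHER and DTR1, the mother also becomes pl:

[ MOTHER [ CAT VP, AGR #1 pl ]
  DTR1   [ CAT V,  AGR #1 ]
  DTR2   [ CAT NP, AGR [ ] ] ]

it is unified with the value of DTR2:

[ MOTHER [ CAT VP, AGR #1 pl ]
  DTR1   [ CAT V,  AGR #1 ]
  DTR2   [ CAT NP, AGR sg ] ]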

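Rule 1 application

The MOTHER value of the instantiated rule 2 acts as the DTR2 of rule 1.

[ CAT VP, AGR pl ]

is unified with the DTR2 value of:

[ MOTHER [ CAT S,  AGR #1 ]
  DTR1   [ CAT NP, AGR #1 ]
  DTR2   [ CAT VP, AGR #1 ] ]

This gives:

[ MOTHER [ CAT S,  AGR #1 pl ]
  DTR1   [ CAT NP, AGR #1 ]
  DTR2   [ CAT VP, AGR #1 ] ]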

Rule 1 application continued

The FS for they is:

[ CAT NP, AGR pl ]

The unification of this with the value of DTR1 succeeds but adds no new information, because the shared AGR value is already pl:

[ MOTHER [ CAT S,  AGR #1 pl ]
  DTR1   [ CAT NP, AGR #1 ]
  DTR2   [ CAT VP, AGR #1 ] ]

Properties of FSs

Connectedness and unique root: A FS must have a unique root node: apart from the root node, all nodes have one or more parent nodes.

Unique features: Any node may have zero or more arcs leading out of it, but the label on each (that is, the feature) must be unique.

No cycles: No node may have an arc that points back to the root node or to a node that intervenes between it and the root node.

Values: A node which does not have any arcs leading out of it may have an associated atomic value.

Finiteness: A FS must have a finite number of nodes.
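Some of these properties come for free in a concrete encoding, while others have to be checked. In the sketch below (an assumed encoding, not the lecture's), a node is a Python dict from features to child nodes, atomic values are strings, and reentrancy is a dict object reached by more than one path. Dict keys make features unique per node, and an in-memory structure is finite, so the property that needs an explicit test is the absence of cycles. The function name is_acyclic is hypothetical.

```python
def is_acyclic(node, on_path=frozenset()):
    """Return True if no arc leads back to a node on the path from the root."""
    if not isinstance(node, dict):
        return True  # atomic values are leaves
    if id(node) in on_path:
        return False  # this node is its own ancestor: a cycle
    on_path = on_path | {id(node)}
    return all(is_acyclic(child, on_path) for child in node.values())

# Reentrancy is fine: the same node is reached by two paths, but there is no cycle.
agr = {"NUM": "pl"}
reentrant = {"F": agr, "G": agr}
print(is_acyclic(reentrant))  # True

# A node with an arc back to itself violates the no-cycles condition.
cyclic = {}
cyclic["F"] = cyclic
print(is_acyclic(cyclic))  # False
```

Tracking only the nodes on the current path, rather than every node visited so far, is what lets the check accept reentrant structures.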

Subsumption

Feature structures are ordered by information content: FS1 subsumes FS2 if FS2 carries extra information.

FS1 subsumes FS2 if and only if the following conditions hold:

Path values: For every path P in FS1 there is a path P in FS2. If P has a value t in FS1, then P also has value t in FS2.

Path equivalences: Every pair of paths P and Q which are reentrant in FS1 (i.e., which lead to the same node in the graph) are also reentrant in FS2.

Unification

The unification of two FSs FS1 and FS2 is the most general FS which is subsumed by both FS1 and FS2, if it exists.
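The path-values condition can be sketched for feature structures encoded as nested dictionaries. The encoding is an assumption of this example: atomic values are strings and an unspecified value is the empty dict. Reentrancy cannot be expressed in this encoding, so the path-equivalences condition is deliberately not checked here.

```python
def subsumes(general, specific):
    """True if every path and value in `general` is also present in `specific`.

    Only the path-values condition is checked; this encoding has no reentrancy.
    """
    if general == {}:
        return True  # an unspecified value subsumes anything
    if isinstance(general, dict):
        if not isinstance(specific, dict):
            return False  # a complex value cannot subsume an atomic one
        return all(
            feat in specific and subsumes(val, specific[feat])
            for feat, val in general.items()
        )
    return general == specific  # atomic values must match exactly

fish = {"CAT": "NP", "AGR": {}}
it = {"CAT": "NP", "AGR": "sg"}
print(subsumes(fish, it))  # True: it carries extra information
print(subsumes(it, fish))  # False: fish lacks the value sg
```

Here the structure for it is also the unification of the two: it is the most general structure subsumed by both.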

Grammar with subcategorisation

HEAD: information shared between a lexical entry and the dominating phrases of the same category (agreement and category).

Schematically: [tree diagrams over S, NP, VP, V, PP and P; circles indicate heads]

COMP: subcategorization for arguments that come after the lexical entry in English (e.g., verbs' objects).
Rule 1 unifies the second dtr with the COMP value of the first.

SPR: arguments that come before the lexical entry in English (e.g., verbs' subjects).
Rule 2 unifies the first daughter with the SPR value of the second.

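Example rule application: they fish

Tags such as #1 mark shared (reentrant) values.

Lexical entry for fish:

[ HEAD [ CAT verb, AGR pl ]
  SPR  [ HEAD [ CAT noun ] ] ]

Rule 2:

[ HEAD #1, SPR filled ] -> #2 [ HEAD [ AGR #3 ], SPR filled ], [ HEAD #1 [ AGR #3 ], SPR #2 ]

Unification with the second dtr position gives:

[ HEAD #1 [ CAT verb, AGR #3 pl ], SPR filled ] -> #2 [ HEAD [ CAT noun, AGR #3 ], SPR filled ], [ HEAD #1, SPR #2 ]

Lexical entry for they:

[ HEAD [ CAT noun, AGR pl ]
  SPR  filled ]

Unifying this with the first dtr position succeeds and leaves the structure unchanged:

[ HEAD #1 [ CAT verb, AGR #3 pl ], SPR filled ] -> #2 [ HEAD [ CAT noun, AGR #3 ], SPR filled ], [ HEAD #1, SPR #2 ]

Root is:

[ HEAD [ CAT verb ]
  SPR  filled ]

The mother structure unifies with the root, so the parse is valid.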

Parsing with feature structure grammars

Naive algorithm: standard chart parser with modified rule application.

Rule application:

1. copy rule
2. copy daughters (lexical entries or FSs associated with edges)
3. unify rule and daughters
4. if successful, add new edge to chart with rule FS as category

Efficient algorithms reduce copying.
Packing involves subsumption.
Probabilistic FS grammars are complex.
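The four steps of naive rule application can be sketched as follows, using the agreement fragment and the sentence they like it. Everything here is an illustrative assumption rather than the lecture's implementation: the Node class, the forwarding-pointer style of destructive unification, and the helper names apply_rule, make_rule and value_at. Chart bookkeeping is left out; apply_rule returns the structure that would label the new edge.

```python
import copy

class Node:
    """A feature-structure node: an optional atomic value plus labelled arcs."""
    def __init__(self, value=None, **arcs):
        self.value = value
        self.arcs = dict(arcs)
        self.forward = None  # set once this node has been merged into another

def deref(node):
    """Follow forwarding pointers to the node that currently represents `node`."""
    while node.forward is not None:
        node = node.forward
    return node

def unify(a, b):
    """Destructively merge b into a. Returns False if the information clashes."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if a.value is not None and b.value is not None and a.value != b.value:
        return False  # two different atomic values
    if (a.value is not None and b.arcs) or (b.value is not None and a.arcs):
        return False  # atomic value against complex value
    b.forward = a
    if a.value is None:
        a.value = b.value
    for feat, child in b.arcs.items():
        if feat in a.arcs:
            if not unify(a.arcs[feat], child):
                return False
        else:
            a.arcs[feat] = child
    return True

def apply_rule(rule, daughters):
    """Naive rule application: returns the instantiated rule FS, or None on failure."""
    rule = copy.deepcopy(rule)                         # 1. copy rule
    daughters = [copy.deepcopy(d) for d in daughters]  # 2. copy daughters
    for i, dtr in enumerate(daughters, start=1):       # 3. unify rule and daughters
        if not unify(rule.arcs[f"DTR{i}"], dtr):
            return None
    return rule  # 4. on success, this FS would label the new chart edge

def value_at(node, *path):
    """Atomic value found at the end of a path of features."""
    node = deref(node)
    for feat in path:
        node = deref(node.arcs[feat])
    return node.value

def make_rule(mother, dtr1, dtr2, share_dtr2_agr):
    agr = Node()  # one node reached by several paths: reentrancy
    return Node(
        MOTHER=Node(CAT=Node(mother), AGR=agr),
        DTR1=Node(CAT=Node(dtr1), AGR=agr),
        DTR2=Node(CAT=Node(dtr2), AGR=agr if share_dtr2_agr else Node()),
    )

RULE1 = make_rule("S", "NP", "VP", share_dtr2_agr=True)
RULE2 = make_rule("VP", "V", "NP", share_dtr2_agr=False)
LEXICON = {
    "they": Node(CAT=Node("NP"), AGR=Node("pl")),
    "it": Node(CAT=Node("NP"), AGR=Node("sg")),
    "like": Node(CAT=Node("V"), AGR=Node("pl")),
}

vp = apply_rule(RULE2, [LEXICON["like"], LEXICON["it"]])
s = apply_rule(RULE1, [LEXICON["they"], vp.arcs["MOTHER"]])
print(value_at(s, "MOTHER", "CAT"), value_at(s, "MOTHER", "AGR"))
bad = apply_rule(RULE1, [LEXICON["it"], vp.arcs["MOTHER"]])  # *it like it
print(bad)
```

Copying before unifying is what makes the destructive unifier safe: a failed application such as *it like it leaves the stored rules and lexical entries untouched. The cost of all that copying is what the more efficient algorithms try to reduce.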

Templates

Capture generalizations in the lexicon:

fish   INTRANS_VERB
sleep  INTRANS_VERB
snore  INTRANS_VERB

INTRANS_VERB:

[ HEAD [ CAT verb, AGR pl ]
  SPR  [ HEAD [ CAT noun ] ] ]
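A template lexicon of this kind amounts to one level of indirection: words point to template names, and the feature structure is stated once. The sketch below assumes a nested-dictionary encoding in which category and agreement are grouped under a HEAD feature. The names TEMPLATES, LEXICON and lexical_entry are hypothetical.

```python
import copy

# One shared definition for all intransitive verbs.
# Assumption: CAT and AGR are grouped under HEAD.
TEMPLATES = {
    "INTRANS_VERB": {
        "HEAD": {"CAT": "verb", "AGR": "pl"},
        "SPR": {"HEAD": {"CAT": "noun"}},
    },
}

# The lexicon only records which template each word uses.
LEXICON = {
    "fish": "INTRANS_VERB",
    "sleep": "INTRANS_VERB",
    "snore": "INTRANS_VERB",
}

def lexical_entry(word):
    """Expand a word's template into a fresh feature structure."""
    return copy.deepcopy(TEMPLATES[LEXICON[word]])

print(lexical_entry("snore"))
```

Returning a copy means a parser can add information to one word's entry without altering the template or the entries of other words.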

Interface to morphology

Associate inflectional affixes with templates:

s  PLURAL_NOUN

PLURAL_NOUN:

[ HEAD [ CAT noun, AGR pl ] ]

The stem is:

[ HEAD [ CAT noun, AGR [ ] ]
  SPR  filled ]

Unify the stem with the affix template:

[ HEAD [ CAT noun, AGR pl ]
  SPR  filled ]

Unification fails with verbs etc.
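The affix-template idea can be tried out with a small dictionary-based unifier. The encoding is an assumption of this sketch: atomic values are strings, an unspecified value is the empty dict, and category and agreement are grouped under a HEAD feature. The noun stem and verb stem below are hypothetical stand-ins for real lexical entries.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, b_val in b.items():
            if feat in result:
                sub = unify(result[feat], b_val)
                if sub is None:
                    return None
                result[feat] = sub
            else:
                result[feat] = b_val
        return result
    if a == {}:
        return b  # unspecified is compatible with any value
    if b == {}:
        return a
    return a if a == b else None

# Template associated with the affix s (HEAD grouping is an assumption).
PLURAL_NOUN = {"HEAD": {"CAT": "noun", "AGR": "pl"}}

# Hypothetical stems: a noun with unspecified agreement, and an intransitive verb.
noun_stem = {"HEAD": {"CAT": "noun", "AGR": {}}, "SPR": "filled"}
verb_stem = {"HEAD": {"CAT": "verb", "AGR": "pl"}, "SPR": {"HEAD": {"CAT": "noun"}}}

print(unify(noun_stem, PLURAL_NOUN))  # AGR is now pl
print(unify(verb_stem, PLURAL_NOUN))  # None: CAT verb clashes with CAT noun
```

The affix contributes only the information it knows about (noun, plural). Everything else, such as the SPR value, comes from the stem.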