Syntax Parsing

1. Grammars and parsing
2. Top-down and bottom-up parsing
3. Chart parsers
4. Bottom-up chart parsing
5. The Earley Algorithm

syntax: from the Greek syntaxis, meaning "setting out together" or "arrangement." Refers to the way words are arranged together.

Why worry about syntax?

    The boy ate the frog.
    The frog was eaten by the boy.
    The frog that the boy ate died.
    The boy whom the frog was eaten by died.

Slide CS474 1

Syntactic Analysis

Key ideas:

constituency: groups of words may behave as a single unit or phrase
grammatical relations: refer to the subject, object, indirect object, etc.
subcategorization and dependencies: refer to certain kinds of relations between words and phrases, e.g. "want" can be followed by an infinitive, but "find" and "work" cannot.

All can be modeled by various kinds of grammars that are based on context-free grammars.

Grammars and Parsing

Need a grammar: a formal specification of the structures allowable in the language.
Need a parser: algorithm for assigning syntactic structure to an input sentence.

Sentence: Beavis ate the cat.

Parse tree: (S (NP (NAME Beavis)) (VP (V ate) (NP (ART the) (N cat))))
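The parse tree for "Beavis ate the cat" can be held in a small data structure. The sketch below is one possible encoding, not something the slides prescribe: the nested-tuple layout and the helper names `leaves` and `bracketed` are assumptions made for illustration.

```python
# Parse tree for "Beavis ate the cat" as nested tuples: (label, child, ...).
# Leaves are plain strings (the words). This encoding is an assumption made
# for illustration; the slides only draw the tree.
TREE = ("S",
        ("NP", ("NAME", "Beavis")),
        ("VP", ("V", "ate"),
               ("NP", ("ART", "the"), ("N", "cat"))))


def leaves(tree):
    """Return the words at the leaves of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def bracketed(tree):
    """Render the tree in bracketed notation, e.g. (NP (NAME Beavis))."""
    if isinstance(tree, str):
        return tree
    return "(" + tree[0] + " " + " ".join(bracketed(c) for c in tree[1:]) + ")"
```

Reading the leaves left to right gives back the input sentence, which is a quick sanity check that a tree really is a parse of that sentence.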

CFG example

CFGs are also called phrase-structure grammars. Equivalent to Backus-Naur Form (BNF).

1. S -> NP VP
2. VP -> V NP
3. NP -> NAME
4. NP -> ART N
5. NAME -> Beavis
6. V -> ate
7. ART -> the
8. N -> cat

CFGs are powerful enough to describe most of the structure in natural languages.
CFGs are restricted enough so that efficient parsers can be built.

CFGs

A context-free grammar consists of:

1. a set of non-terminal symbols $N$
2. a set of terminal symbols $\Sigma$ (disjoint from $N$)
3. a set of productions, $P$, each of the form $A \to \alpha$, where $A$ is a non-terminal and $\alpha$ is a string of symbols from the infinite set of strings $(\Sigma \cup N)^*$
4. a designated start symbol $S$

Derivations

If the rule $A \to \beta \in P$, and $\alpha$ and $\gamma$ are strings in the set $(\Sigma \cup N)^*$, then we say that $\alpha A \gamma$ directly derives $\alpha \beta \gamma$, or $\alpha A \gamma \Rightarrow \alpha \beta \gamma$.

Let $\alpha_1, \alpha_2, \ldots, \alpha_m$ be strings in $(\Sigma \cup N)^*$, $m > 1$, such that $\alpha_1 \Rightarrow \alpha_2$, $\alpha_2 \Rightarrow \alpha_3$, ..., $\alpha_{m-1} \Rightarrow \alpha_m$; then we say that $\alpha_1$ derives $\alpha_m$, or $\alpha_1 \Rightarrow^* \alpha_m$.

L_G

The language $L_G$ generated by a grammar $G$ is the set of strings composed of terminal symbols that can be derived from the designated start symbol $S$.

$L_G = \{ w \mid w \in \Sigma^*, S \Rightarrow^* w \}$

Parsing: the problem of mapping from a string of words to its parse tree according to a grammar G.
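The definitions of "directly derives" and of the language generated by a grammar can be tried out on the eight-rule example. The following is a minimal sketch, assuming a dictionary encoding of the productions; the names `GRAMMAR`, `directly_derives`, and `language` are hypothetical, and the step cap is only there so the enumeration also stops on grammars whose language is infinite.

```python
from collections import deque

# The eight-rule example grammar: each non-terminal maps to its right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "NAME": [["Beavis"]],
    "V": [["ate"]],
    "ART": [["the"]],
    "N": [["cat"]],
}


def directly_derives(form):
    """Yield each form obtained by rewriting the leftmost non-terminal once."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            for rhs in GRAMMAR[sym]:
                yield form[:i] + tuple(rhs) + form[i + 1:]
            return  # leftmost rewriting still reaches every terminal string


def language(start="S", limit=1000):
    """Enumerate terminal strings derivable from `start` (breadth-first, capped)."""
    seen, queue, result = {(start,)}, deque([(start,)]), set()
    while queue and limit > 0:
        limit -= 1
        form = queue.popleft()
        if not any(sym in GRAMMAR for sym in form):
            result.add(" ".join(form))  # only terminals left: a member of the language
            continue
        for nxt in directly_derives(form):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return result
```

For this grammar the language is finite: `language()` returns four sentences, including "Beavis ate the cat" and "the cat ate Beavis".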

General Parsing Strategies

Grammar             Top-Down                  Bottom-Up
1. S -> NP VP       S => NP VP                => NAME ate the cat
2. VP -> V NP       => NAME VP                => NAME V the cat
3. NP -> NAME       => Beav VP                => NAME V ART cat
4. NP -> ART N      => Beav V NP              => NAME V ART N
5. NAME -> Beavis   => Beav ate NP            => NP V ART N
6. V -> ate         => Beav ate ART N         => NP V NP
7. ART -> the       => Beav ate the N         => NP VP
8. N -> cat         => Beav ate the cat       => S

A Top-Down Parser

Input: CFG grammar, lexicon, sentence to parse
Output: yes/no

State of the parse: (symbol list, position)
start state: ((S) 1)

1 The 2 old 3 man 4 cried 5

Grammar and Lexicon

Grammar:
1. S -> NP VP
2. NP -> art n
3. NP -> art adj n
4. VP -> v
5. VP -> v NP

Lexicon:
the: art
old: adj, n
man: n, v
cried: v

1 The 2 old 3 man 4 cried 5

Algorithm for a Top-Down Parser

PSL <- (((S) 1))

1. Check for failure. If PSL is empty, return NO.
2. Select the current state, C. C <- pop(PSL).
3. Check for success. If C = (() <final-position>), YES.
4. Otherwise, generate the next possible states.
   (a) s1 <- first-symbol(C)
   (b) If s1 is a lexical symbol and next word can be in that class, create new state by removing s1, updating the word position, and adding it to PSL. (I'll add to front.)
   (c) If s1 is a non-terminal, generate a new state for each rule in the grammar that can rewrite s1. Add all to PSL. (Add to front.)
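The four steps above can be sketched in a few lines of Python. This is an illustrative reading of the algorithm, not the course's own code: the function name `top_down_parse`, the dictionary encoding of the grammar and lexicon, and the use of a Python list for the possibilities list are all assumptions.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["art", "n"], ["art", "adj", "n"]],
    "VP": [["v"], ["v", "NP"]],
}
LEXICON = {"the": {"art"}, "old": {"adj", "n"}, "man": {"n", "v"}, "cried": {"v"}}


def top_down_parse(words, grammar=GRAMMAR, lexicon=LEXICON):
    """Return True if `words` can be derived from S, following steps 1-4."""
    final = len(words) + 1
    psl = [(("S",), 1)]  # possibilities list; index 0 is the front
    while psl:  # step 1: an empty PSL means failure
        symbols, pos = psl.pop(0)  # step 2: take the current state from the front
        if not symbols:
            if pos == final:  # step 3: no symbols left and all words consumed
                return True
            continue  # symbols used up but words remain: dead end, back up
        first, rest = symbols[0], symbols[1:]
        if first in grammar:  # step 4(c): rewrite the non-terminal with every rule
            new_states = [(tuple(rhs) + rest, pos) for rhs in grammar[first]]
            psl = new_states + psl  # add to front, first rule on top
        elif pos < final and first in lexicon.get(words[pos - 1].lower(), set()):
            psl.insert(0, (rest, pos + 1))  # step 4(b): consume one word
    return False
```

Called on `"the old man cried".split()` it answers yes after first trying, and abandoning, the reading in which "old" is the noun and "man" the verb. Because new states go on the front of the list, the search is depth-first.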

Example

Step  Current state             Backup states
1.    ((S) 1)
2.    ((NP VP) 1)
3.    ((art n VP) 1)            ((art adj n VP) 1)
4.    ((n VP) 2)                ((art adj n VP) 1)
5.    ((VP) 3)                  ((art adj n VP) 1)
6.    ((v) 3)                   ((v NP) 3) ((art adj n VP) 1)
7.    (() 4)                    ((v NP) 3) ((art adj n VP) 1)     Backtrack
8.    ((v NP) 3)                ((art adj n VP) 1)                leads to backtracking...
9.    ((art adj n VP) 1)
10.   ((adj n VP) 2)
11.   ((n VP) 3)
12.   ((VP) 4)
13.   ((v) 4)                   ((v NP) 4)
14.   (() 5)                    ((v NP) 4)                        YES. DONE!

Problems with the Top-Down Parser

1. Only judges grammaticality.
2. Stops when it finds a single derivation.
3. No semantic knowledge employed.
4. No way to rank the derivations.
5. Problems with left-recursive rules.
6. Problems with ungrammatical sentences.

Efficient Parsing

The top-down parser is terribly inefficient.

    Have the first year PhD students in the computer science department take the Q-exam.
    Have the first year PhD students in the computer science department taken the Q-exam?
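Problem 5 in the list above can be made concrete with a small experiment. The sketch below reuses the depth-first top-down strategy with a step budget so that a runaway search can be observed rather than suffered. The rule NP -> NP PP, the category "p", the three-word lexicon, and the names `top_down_steps`, `LEFT_RECURSIVE`, and `REORDERED` are all hypothetical additions for illustration.

```python
def top_down_steps(words, grammar, lexicon, max_steps=1000):
    """Depth-first top-down recognizer; returns None if `max_steps` runs out."""
    final = len(words) + 1
    psl = [(("S",), 1)]
    for _ in range(max_steps):
        if not psl:
            return False
        symbols, pos = psl.pop(0)
        if not symbols:
            if pos == final:
                return True
            continue
        first, rest = symbols[0], symbols[1:]
        if first in grammar:
            # Expanding NP -> NP PP puts another NP at the front without
            # consuming any input, so this branch can repeat indefinitely.
            psl = [(tuple(rhs) + rest, pos) for rhs in grammar[first]] + psl
        elif pos < final and first in lexicon.get(words[pos - 1], set()):
            psl.insert(0, (rest, pos + 1))
    return None  # step budget exhausted: the search never settled


LEXICON = {"the": {"art"}, "man": {"n"}, "cried": {"v"}}
# Hypothetical left-recursive grammar: the first NP rule rewrites NP as NP PP.
LEFT_RECURSIVE = {"S": [["NP", "VP"]], "NP": [["NP", "PP"], ["art", "n"]],
                  "PP": [["p", "NP"]], "VP": [["v"]]}
# Same rules, but the non-recursive NP rule is tried first.
REORDERED = {**LEFT_RECURSIVE, "NP": [["art", "n"], ["NP", "PP"]]}
```

With `LEFT_RECURSIVE`, "the man cried" exhausts the budget even though it is grammatical. With `REORDERED` the same sentence is accepted, but an input with no parse such as "the man" still runs out of steps, so reordering rules hides the problem without removing it.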

Chart Parsers

chart: data structure that stores partial results of the parsing process in such a way that they can be reused.

The chart for an n-word sentence consists of:

n + 1 vertices
a number of edges that connect vertices

[Figure: chart for "Judge Ito scolded the defense." with vertices 0 to 5 and edges labeled S -> NP • VP, VP -> V NP •, and S -> NP VP •]

Chart Parsing: The General Idea

The process of parsing an n-word sentence consists of forming a chart with n + 1 vertices and adding edges to the chart one at a time.

Goal: To produce a complete edge that spans from vertex 0 to n and is of category S.
There is no backtracking.
Everything that is put in the chart stays there.
Chart contains all information needed to create parse tree.

Bottom-Up Chart Parsing Algorithm

Do until there is no input left:

1. If the agenda is empty, get next word from the input, look up word categories, add to agenda (as constituent spanning two positions).
2. Select a constituent from the agenda: constituent C from p1 to p2.
3. Insert C into the chart from position p1 to p2.
4. For each rule in the grammar of form X -> C X1 ... Xn, add an active edge of form X -> C • X1 ... Xn from p1 to p2.
5. Extend existing edges that are looking for a C.
   (a) For any active edge of form X -> X1 ... • C Xn from p0 to p1, add a new active edge X -> X1 ... C • Xn from p0 to p2.
   (b) For any active edge of form X -> X1 ... Xn • C from p0 to p1, add a new (completed) constituent of type X from p0 to p2 to the agenda.
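The five steps can be put together as a short agenda-driven loop. The following is a sketch under stated assumptions rather than a reference implementation: the `Edge` tuple, the function name `bottom_up_chart_parse`, and the choice to number positions from 1 (as the "old man the boat" example that follows does) are illustrative, and the grammar and lexicon are copied from that example.

```python
from collections import namedtuple

# Active edge: rule lhs -> rhs with the dot before rhs[dot], spanning start..end.
Edge = namedtuple("Edge", "lhs rhs dot start end")

GRAMMAR = [("S", ("NP", "VP")), ("NP", ("ART", "N")),
           ("NP", ("ART", "ADJ", "N")), ("VP", ("V", "NP"))]
LEXICON = {"the": ["ART"], "old": ["ADJ", "N"], "man": ["N", "V"], "boat": ["N"]}


def bottom_up_chart_parse(words, grammar=GRAMMAR, lexicon=LEXICON):
    """Return the set of completed constituents (category, start, end)."""
    chart, edges, agenda = set(), set(), []
    position = 1
    while agenda or position <= len(words):
        if not agenda:  # step 1: read the next word and queue its categories
            word = words[position - 1].lower()
            agenda.extend((cat, position, position + 1)
                          for cat in lexicon.get(word, []))
            position += 1
            continue
        constituent = agenda.pop()  # step 2
        if constituent in chart:
            continue  # already in the chart, so nothing new to derive
        chart.add(constituent)  # step 3
        cat, p1, p2 = constituent
        new_edges = []
        for lhs, rhs in grammar:  # step 4: rules whose right side starts with cat
            if rhs[0] == cat:
                new_edges.append(Edge(lhs, rhs, 1, p1, p2))
        for e in list(edges):  # step 5: edges ending at p1 that need cat next
            if e.end == p1 and e.rhs[e.dot] == cat:
                new_edges.append(Edge(e.lhs, e.rhs, e.dot + 1, e.start, p2))
        for e in new_edges:
            if e.dot == len(e.rhs):  # dot at the end: a completed constituent
                agenda.append((e.lhs, e.start, e.end))
            else:
                edges.add(e)  # still active: keep it for later extension
    return chart
```

Running it on `"The old man the boat".split()` leaves an S from 1 to 6 in the chart, alongside both noun-phrase readings of the opening words: NP from 1 to 3 ("the old") and NP from 1 to 4 ("the old man"). Nothing is ever removed, which is the "no backtracking" property described above.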

Grammar and Lexicon

Grammar:
1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V NP

Lexicon:
the: ART
old: ADJ, N
man: N, V
boat: N

Sentence: 1 The 2 old 3 man 4 the 5 boat 6

Example

[See .ppt slides]

[Figure: chart for "The old man the boat." over positions 1 to 6. Labels recoverable from the figure: constituents ART1, ADJ1, N2, V1, ART2, N3, NP1 (rule 2), NP2 (rule 3), VP1, VP2 (rule 4), S (rule 1); active edges NP -> ART • N, NP -> ART • ADJ N, NP -> ART ADJ • N, S -> NP • VP, VP -> V • NP]

Bottom-up Chart Parser

Is it any less naive than the top-down parser?

1. Only judges grammaticality. [fixed]
2. Stops when it finds a single derivation. [fixed]
3. No semantic knowledge employed.
4. No way to rank the derivations.
5. Problems with ungrammatical sentences. [better]
6. Terribly inefficient.

Efficient Parsing

n = sentence length
Time complexity for naive algorithm: exponential in n
Time complexity for bottom-up chart parser: $O(n^3)$

Options for improving efficiency:

1. Don't do twice what you can do once.
2. Don't represent distinctions that you don't need.
   Fall leaves fall and spring leaves spring.
3. Don't do once what you can avoid altogether.
   The can holds the water. ("can": AUX, V, N)

Earley Algorithm: Top-Down Chart Parser

For all S rules of the form S -> X1 ... Xk, add a (top-down) edge from 1 to 1 labeled: S -> • X1 ... Xk.

Do until there is no input left:

1. If the agenda is empty, look up word categories for next word, add to agenda.
2. Select a constituent from the agenda: constituent C from p1 to p2.
3. Using the (bottom-up) edge extension algorithm, combine C with every active edge on the chart (adding C to chart as well). Add any new constituents to the agenda.
4. For any active edges created in Step 3, add them to the chart using the top-down edge introduction algorithm.

Top-down edge introduction.

To add an edge S -> C1 ... • Ci ... Cn ending at position j:
For each rule in the grammar of form Ci -> X1 ... Xk, recursively add the new edge Ci -> • X1 ... Xk from j to j.

Grammar and Lexicon

Grammar:
1. S -> NP VP
2. NP -> ART ADJ N
3. NP -> ART N
4. NP -> ADJ N
5. VP -> AUX VP
6. VP -> V NP

Lexicon:
the: ART
large: ADJ
can: N, AUX, V
hold: N, V
water: N, V

Sentence: 1 The 2 large 3 can 4 can 5 hold 6 water 7
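For comparison, here is a compact Earley recognizer over the six-rule grammar above. It uses the common textbook formulation with predict, scan, and complete steps and 0-based vertices, not the agenda-based wording of the slides; the predict step plays the role of top-down edge introduction. The function name `earley_recognize` and the data layout are assumptions. The demo sentence in the note below adds "the" before "water", also an assumption, because the rules as listed have no way to build an NP from a bare N.

```python
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("ART", "ADJ", "N"), ("ART", "N"), ("ADJ", "N")],
    "VP": [("AUX", "VP"), ("V", "NP")],
}
LEXICON = {"the": {"ART"}, "large": {"ADJ"}, "can": {"N", "AUX", "V"},
           "hold": {"N", "V"}, "water": {"N", "V"}}


def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    """Return True if `words` is derivable from `start` (no empty rules assumed)."""
    n = len(words)
    # chart[j] holds items (lhs, rhs, dot, origin) whose dot sits at vertex j
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for j in range(n + 1):
        worklist = list(chart[j])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:  # predict: introduce edges for nxt at j
                    new = [(nxt, r, 0, j) for r in grammar[nxt]]
                elif j < n and nxt in lexicon.get(words[j].lower(), set()):
                    chart[j + 1].add((lhs, rhs, dot + 1, origin))  # scan
                    continue
                else:
                    continue  # next word cannot be of category nxt
            else:  # complete: advance every item at `origin` waiting for lhs
                new = [(l, r, d + 1, o) for (l, r, d, o) in chart[origin]
                       if d < len(r) and r[d] == lhs]
            for item in new:
                if item not in chart[j]:
                    chart[j].add(item)
                    worklist.append(item)
    return any((start, rhs, len(rhs), 0) in chart[n] for rhs in grammar[start])
```

`earley_recognize("the large can can hold the water".split())` returns True. Because edges are only predicted where an existing edge needs them, the verb reading of the first "can" is never considered after "the large", which is option 3 above: work that is avoided altogether.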