October 16, 2003 Chapter 10.3-10.6 Parsing

Outline of TOP-DOWN-PARSE
Initialize the agenda with (S, first word).
Pop that state off the agenda.
Loop:
  Check whether we're finished; if so, return the tree.
  Check whether the node we're trying to expand is a POS.
  If so, check whether the current word of the input has the current node as a possible POS.
  If so, apply the lexical rules of the grammar to that node to build more trees, and add the results to the agenda. (NB: APPLY-LEXICAL-RULES will have to return (tree, word) pairs, where the word is the next word in the string.)

Outline of TOP-DOWN-PARSE
Loop (continued):
  If the current node wasn't a part of speech, apply the non-lexical rules of the grammar to that node, and add the resulting search states to the agenda.
  If, after doing all that, the agenda is empty, reject the sentence.
  Otherwise, take the next search state off the top of the agenda, and go through the loop again.

Corrected version of TOP-DOWN-PARSE
(css = the current search state)

function TOP-DOWN-PARSE(input, grammar) returns a parse tree
  agenda ← (Initial S tree, Beginning of input)
  css ← POP(agenda)
  loop
    if SUCCESSFUL-PARSE?(css) then
      return TREE(css)
    else if CAT(NODE-TO-EXPAND(css)) is a POS then
      if CAT(NODE-TO-EXPAND(css)) ∈ POS(CURRENT-INPUT(css)) then
        PUSH(APPLY-LEXICAL-RULES(css, grammar), agenda)
    else
      PUSH(APPLY-RULES(css, grammar), agenda)
    if agenda is empty then
      return reject
    else
      css ← POP(agenda)
  end
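The corrected pseudocode can be turned into a small runnable Python sketch. Everything concrete in it is an assumption for illustration: the toy grammar and lexicon (GRAMMAR, LEXICON, POS_TAGS), the helper build_tree, and the choice to represent a search state as (categories still to expand, index of the current word, rules applied so far) instead of as a partial tree. The agenda is a Python list used as a stack, so the search is depth-first, and the tree is rebuilt from the leftmost derivation only once a parse succeeds.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy grammar and lexicon, for illustration only.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("Name",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb", "NP"), ("Verb",)],
}
LEXICON: Dict[str, Set[str]] = {
    "book": {"Noun", "Verb"},
    "that": {"Det"},
    "flight": {"Noun"},
    "john": {"Name"},
}
POS_TAGS = {"Det", "Noun", "Verb", "Name"}


def build_tree(derivation):
    """Rebuild a nested-tuple tree from a leftmost derivation (rules listed in preorder)."""
    steps = iter(derivation)

    def expand():
        lhs, rhs = next(steps)
        if lhs in POS_TAGS:  # lexical rule: POS -> word
            return (lhs, rhs[0])
        return (lhs,) + tuple(expand() for _ in rhs)

    return expand()


def top_down_parse(words: List[str]) -> Optional[tuple]:
    # Search state: (categories still to expand, index of current word, rules applied so far)
    agenda = [(("S",), 0, ())]
    while agenda:  # an empty agenda means reject
        frontier, pos, derivation = agenda.pop()  # POP from the top: depth-first
        if not frontier:
            if pos == len(words):  # SUCCESSFUL-PARSE?
                return build_tree(derivation)
            continue  # input left over: dead end, so backtrack
        node, rest = frontier[0], frontier[1:]  # NODE-TO-EXPAND is the leftmost category
        if node in POS_TAGS:
            # Lexical step, taken only if the current word can have this POS
            if pos < len(words) and node in LEXICON.get(words[pos], set()):
                agenda.append((rest, pos + 1, derivation + ((node, (words[pos],)),)))
        else:
            # Non-lexical step: one new search state per rule for this category,
            # pushed in reverse so that the first-listed rule is popped (tried) first.
            for rhs in reversed(GRAMMAR[node]):
                agenda.append((rhs + rest, pos, derivation + ((node, rhs),)))
    return None
```

For example, `top_down_parse(["book", "that", "flight"])` backtracks out of S → NP VP and returns the S → VP analysis. Because NODE-TO-EXPAND is always the leftmost category, adding a left-recursive rule such as Nominal → Nominal Noun would make this loop run forever.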

Problems with TOP-DOWN-PARSE
- Infinite loops with left-recursive grammars
- Ambiguity
- Inefficient reparsing of subtrees
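The first problem shows up before any input is read. The sketch below (the rule set LEFT_RECURSIVE and the helper leftmost_expansions are hypothetical) mimics a depth-first top-down parser that always applies the first rule to the leftmost category. With the left-recursive rule NP → NP PP listed first, the node to expand is NP again after every step, so the list of pending categories grows while no word is ever consumed.

```python
from typing import Dict, List, Tuple

# Hypothetical left-recursive rules: NP -> NP PP is listed (and so tried) first.
LEFT_RECURSIVE: Dict[str, List[Tuple[str, ...]]] = {
    "NP": [("NP", "PP"), ("Det", "Noun")],
    "PP": [("Prep", "NP")],
}


def leftmost_expansions(grammar, start: str, steps: int) -> List[Tuple[str, ...]]:
    """Repeatedly expand the leftmost category with its first rule, recording each frontier."""
    frontier: Tuple[str, ...] = (start,)
    history = []
    for _ in range(steps):
        head = frontier[0]
        if head not in grammar:  # a POS: only now would the parser look at the input
            break
        frontier = grammar[head][0] + frontier[1:]
        history.append(frontier)
    return history


for frontier in leftmost_expansions(LEFT_RECURSIVE, "NP", 3):
    print(" ".join(frontier))
# NP PP
# NP PP PP
# NP PP PP PP
```

The loop stops here only because of the `steps` limit; an actual parser has no such limit. One standard remedy is to rewrite the grammar so that no category can be its own leftmost descendant.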

Dynamic Programming
First introduced by Bellman (1957).
A table-driven method of solving problems by combining solutions to sub-problems.
In this context, what is the problem? What is a sub-problem? How are sub-problems combined?
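One way to make these questions concrete, offered as a sketch and not as the chapter's algorithm, is to treat the sub-problem as "at which positions j can category X, starting at position i, end?". Each answer is stored in a table (here `functools.lru_cache`), and sub-solutions are combined left to right across each rule's right-hand side. The grammar, lexicon and function names are hypothetical. In "the man saw the dog in the park", the NP-attached and VP-attached readings of the PP both reuse the same cached answer for the PP instead of reparsing it. This memoized top-down recognizer would still fail with infinite recursion on a left-recursive rule.

```python
from functools import lru_cache
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar (no left recursion) and lexicon.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP")],
    "NP": [("Det", "Noun"), ("Det", "Noun", "PP")],
    "VP": [("Verb", "NP"), ("Verb", "NP", "PP")],
    "PP": [("Prep", "NP")],
}
LEXICON: Dict[str, Set[str]] = {
    "the": {"Det"}, "man": {"Noun"}, "dog": {"Noun"}, "park": {"Noun"},
    "saw": {"Verb"}, "in": {"Prep"},
}


def recognize(words: List[str]) -> bool:
    n = len(words)

    @lru_cache(maxsize=None)  # the table: one entry per (category, start position)
    def ends(cat: str, i: int) -> frozenset:
        """Sub-problem: all positions j such that `cat` can span words[i:j]."""
        if cat not in GRAMMAR:  # a POS category covers exactly one word
            ok = i < n and cat in LEXICON.get(words[i], set())
            return frozenset({i + 1}) if ok else frozenset()
        result = set()
        for rhs in GRAMMAR[cat]:
            positions = {i}
            for sym in rhs:  # combine sub-solutions left to right
                positions = {j for k in positions for j in ends(sym, k)}
            result |= positions
        return frozenset(result)

    return n in ends("S", 0)
```

A sentence is accepted exactly when the table says S can start at position 0 and end at position N.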

The Earley Chart Parser: Chart and Dotted Rules
For a sentence of N words, the chart contains N+1 cells. Each cell contains a list of states. A state consists of:
- a local subtree,
- information about the degree of completion of that subtree, and
- information about how much of the string corresponds to the subtree.
In dotted-rule notation, the dot marks how much of the rule's right-hand side has been recognized, and [i, j] gives the positions in the input where the subtree begins and where it currently ends. For example:
S → • VP, [0, 0]
NP → Det • Nominal, [1, 2]
VP → V NP •, [0, 3]
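A dotted-rule state maps naturally onto a small immutable record. The class and field names below are assumptions for illustration. The three states built at the end are the slide's own examples, and `frozen=True` makes states hashable, so they can go into a set when checking for redundant states.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    lhs: str               # mother category of the local subtree
    rhs: Tuple[str, ...]   # daughters, in order
    dot: int               # number of daughters recognized so far (degree of completion)
    start: int             # chart position where the subtree begins
    end: int               # chart position reached so far

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_category(self) -> Optional[str]:
        """Category just to the right of the dot, or None if the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start}, {self.end}]"


examples = [
    State("S", ("VP",), 0, 0, 0),
    State("NP", ("Det", "Nominal"), 1, 1, 2),
    State("VP", ("V", "NP"), 2, 0, 3),
]
for s in examples:
    print(s)
# S → • VP, [0, 0]
# NP → Det • Nominal, [1, 2]
# VP → V NP •, [0, 3]
```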

The Earley Chart Parser, Outline
Add the initial state γ → • S, [0, 0] to the chart at position 0.
Loop, for each cell of the chart in turn, starting at position 0, and for each state in that cell:
If the state is incomplete, and the category to the right of the dot is not a POS, add (if not redundant) new states to the chart in the current position, one for each rule that expands that category. These new states all have the dot at the beginning of the rule, and they begin and end where the original state ends: an original state spanning [i, j] yields new states spanning [j, j]. (PREDICTOR)

The Earley Chart Parser, Outline
Loop (continued):
If the state is incomplete, the category to the right of the dot (B) is a POS, and B is a possible POS for the next word in the string, add the state B → word •, spanning that word, to the next cell in the chart (if not redundant). (SCANNER)
If the state is complete (dot all the way to the right) and spans [i, j], look in cell i, where this state began, for states that are seeking a daughter of the same category as the mother of this state. For each one of those, add a state (if not redundant) to the current cell j, with the dot moved over one and the span extended to j, the end of the completed constituent. (COMPLETER)
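The three operations fit together into a compact recognizer. This is a sketch under stated assumptions: states are plain tuples (lhs, rhs, dot, start) whose end position is the index of the chart cell they sit in; the grammar has no empty (ε) rules, so a completed state always began in an earlier cell; the dummy start category is written "γ"; and the names `earley_recognize`, `GRAMMAR`, `LEXICON` and `POS_TAGS` are hypothetical. The grammar deliberately includes the left-recursive rule Nominal → Nominal Noun. When PREDICTOR meets Nominal a second time at the same position, the states it would add are already in the cell, so no loop arises.

```python
from typing import Dict, List, Set, Tuple

State = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot, start); end = cell index

# Hypothetical toy grammar (note the left-recursive Nominal rule) and lexicon.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",), ("Nominal", "Noun")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON: Dict[str, Set[str]] = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}
POS_TAGS = {"Det", "Noun", "Verb"}


def earley_recognize(words: List[str]) -> bool:
    n = len(words)
    chart: List[List[State]] = [[] for _ in range(n + 1)]  # N+1 cells
    seen: List[Set[State]] = [set() for _ in range(n + 1)]

    def enqueue(state: State, j: int) -> None:
        if state not in seen[j]:  # "if not redundant"
            seen[j].add(state)
            chart[j].append(state)

    enqueue(("γ", ("S",), 0, 0), 0)
    for j in range(n + 1):
        k = 0
        while k < len(chart[j]):  # the cell can grow while it is being processed
            lhs, rhs, dot, start = chart[j][k]
            k += 1
            if dot < len(rhs):
                b = rhs[dot]
                if b not in POS_TAGS:  # PREDICTOR: new states span [j, j]
                    for expansion in GRAMMAR[b]:
                        enqueue((b, expansion, 0, j), j)
                elif j < n and b in LEXICON.get(words[j], set()):  # SCANNER
                    enqueue((b, (words[j],), 1, j), j + 1)
            else:  # COMPLETER: look back in the cell where this state began
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        enqueue((l2, r2, d2 + 1, s2), j)
    return ("γ", ("S",), 1, 0) in seen[n]  # complete γ state spanning [0, N]
```

As written, the sketch only recognizes: it keeps no record of which completed states built each state, so it cannot yet hand back trees.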

The Earley Chart Parser
In which cell of the chart does one find the spanning edge(s)?
Is the Earley algorithm top-down or bottom-up? Best-first or exhaustive? Uni-directional or bi-directional? Breadth-first or depth-first?

How does the Earley algorithm solve these problems?
- Ambiguity
- Inefficient reparsing of subtrees
- Infinite loops with left-recursive grammars

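Expanding the Chart
Parsing appears to be an exponential-time problem, since naive backtracking search can take exponential time. The Earley algorithm does recognition in polynomial time: O(N^3) for a sentence of N words. Returning the trees is still potentially exponential, because the number of trees can itself grow exponentially. How would this algorithm return the trees?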
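The gap between polynomial recognition and potentially exponential tree output can be seen with a deliberately ambiguous toy grammar, X → X X | word (a hypothetical example, not from the chapter). The sketch fills a CKY-style span table instead of the Earley chart, but the point carries over. The table counts the trees with O(N^3) additions and multiplications, while listing them explicitly produces Catalan-many trees (1, 2, 5, 14, ... for 2, 3, 4, 5 words), a number that grows exponentially.

```python
from itertools import product
from typing import List


def count_trees(n: int) -> int:
    """Number of parse trees for n words under X -> X X | word, filled span by span."""
    count = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        count[i][i + 1] = 1  # X -> word
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            # X -> X X: each split point k combines the sub-counts for [i, k] and [k, j]
            count[i][j] = sum(count[i][k] * count[k][j] for k in range(i + 1, j))
    return count[0][n]


def all_trees(i: int, j: int) -> List:
    """Explicitly build every tree over words i..j-1 (leaves are word indices)."""
    if j - i == 1:
        return [i]
    return [
        (left, right)
        for k in range(i + 1, j)
        for left, right in product(all_trees(i, k), all_trees(k, j))
    ]


for n in range(2, 8):
    print(n, count_trees(n), len(all_trees(0, n)))
```

A parser that must return every tree therefore cannot avoid exponential time in the worst case, however efficient its chart.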

Finite-State or Chunk Parsing
A parser such as the Earley chart parser is only as good as the grammar it interprets. When robustness is more important than precision, shallow parsing techniques are used instead. Finite-state or chunk parsers recognize patterns within sentences, such as noun groups and verb groups. They can return partial results even when the whole string can't be analyzed.
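A minimal finite-state chunker can be written as a single regular expression over part-of-speech tags. The tag names (Penn Treebank style DT, JJ, NN, NNS), the noun-group pattern and the function name are assumptions for illustration; a practical chunker would use a richer set of patterns. Words outside any match are simply skipped, so the function always returns whatever groups it found, which is the partial-result behavior described above.

```python
import re
from typing import List, Tuple

# Hypothetical noun-group pattern: optional determiner, any adjectives, one or more nouns.
# (?<!\S) forces each match to start at a tag boundary, so "DT" cannot match inside "WDT".
NOUN_GROUP = re.compile(r"(?<!\S)(?:DT )?(?:JJ )*(?:NNS? )+")


def chunk_noun_groups(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the noun groups found in a list of (word, tag) pairs."""
    tags = "".join(tag + " " for _, tag in tagged)  # e.g. "DT JJ NN VBD "
    groups = []
    for m in NOUN_GROUP.finditer(tags):
        first = tags[: m.start()].count(" ")  # index of the first matched word
        length = m.group().count(" ")         # number of tags matched
        groups.append(" ".join(word for word, _ in tagged[first:first + length]))
    return groups


sentence = [("the", "DT"), ("old", "JJ"), ("man", "NN"), ("saw", "VBD"), ("two", "CD"), ("dogs", "NNS")]
print(chunk_noun_groups(sentence))  # ['the old man', 'dogs']
```

Here "two" is left out of the second group only because the toy pattern has no slot for numbers (CD); extending the pattern is how such a chunker grows.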

Finite-State or Chunk Parsing
More modern robust, shallow processing techniques use machine learning to learn the many different patterns. One example: RASP (Robust Accurate Statistical Parsing), http://www.cogs.susx.ac.uk/lab/nlp/rasp
The European project Deep Thought is currently investigating methods for combining deep and shallow parsing for knowledge-intensive information extraction: http://www.project-deepthought.net/

Coming up next...
- Adding unification to enable more interesting grammars.
- Another parsing algorithm, with a different kind of chart.
- Probabilistic parsing, for meaningful best-first parsing.