A context-free grammar (CFG) is a 4-tuple

Syntax analysis: context-free grammars

Previously we used context-free grammars for specifying an input language, and for predictive parsing. Now we consider this topic systematically, in much greater detail. After we have developed the theory of predictive parsing, we will turn to the study of a more powerful (that is, more widely applicable), but decidedly less intuitive, approach to parsing. Eventually, we'll also experiment with the use of the parser generator Yacc.

A context-free grammar (CFG) is a 4-tuple G = (V, Σ, S, P) where
  V is a finite set of variables, or nonterminal symbols
  Σ is a finite set of terminal symbols, or tokens
  S ∈ V is the start symbol
  P is a finite set of productions of the form A → α, where A ∈ V and α is a string over V ∪ Σ

We begin by looking more closely at the basics of the underlying theory of context-free grammars. Let us first define the notion of a derivation step in a CFG G: given a variable A and strings α, β, γ over V ∪ Σ, we can write αAγ ⇒_G αβγ if there is a production A → β. In this case, we say that αAγ derives αβγ in one step. (We usually suppress the subscript G.)
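
As a concrete aside (not part of the original notes), here is a minimal Python sketch of this definition: a grammar represented as a 4-tuple and a function that performs a single derivation step. The representation and the name derive_one_step are illustrative choices only.

# A CFG as a 4-tuple: variables V, terminals Sigma, start symbol S, and
# productions P, each given as a pair (head, body) meaning head -> body.
V = {"S"}
Sigma = {"a", "b"}
S = "S"
P = [("S", ["a", "S", "b"]),   # S -> a S b
     ("S", [])]                # S -> ǫ

def derive_one_step(form, i, head, body):
    # Apply head -> body to the variable at position i of the sentential
    # form `form` (a list of symbols): alpha A gamma  =>  alpha beta gamma.
    assert form[i] == head and (head, body) in P
    return form[:i] + body + form[i + 1:]

form = ["S"]
form = derive_one_step(form, 0, "S", ["a", "S", "b"])   # S => a S b
form = derive_one_step(form, 1, "S", ["a", "S", "b"])   # a S b => a a S b b
print(form)                                             # ['a', 'a', 'S', 'b', 'b']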

Example

Take G = ({E, A}, {(, ), +, *, num}, E, P) where P consists of
  E → E A E
  E → ( E )
  E → num
  A → +
  A → *
Then we have, for instance,
  E ⇒ num
and
  E ⇒ (E) ⇒ (E A E) ⇒ (num A E) ⇒ (num + E) ⇒ (num + num)

In general, we can abbreviate n A-productions A → α1, A → α2, ..., A → αn by writing A → α1 | α2 | ... | αn, so the CFG above can be written
  E → E A E | ( E ) | num
  A → + | *

Derivable strings, sentential forms, and sentences

We write α ⇒* β to say that α derives β in zero or more steps. More precisely, α ⇒* α for any string α over V ∪ Σ, and α ⇒* γ if there is a β such that α ⇒* β and β ⇒ γ. In particular, for every variable A there is a set of strings derivable from A: the strings α over V ∪ Σ such that A ⇒* α.

A sentential form of G is a string derivable from the start symbol. A sentence of G is a sentential form of G in which no variables occur. The language generated by G, denoted L(G), is the set of sentences of G.

Example

num and (num + num) are sentences of the CFG G with productions
  E → E A E | ( E ) | num
  A → + | *
while E and (num A E) are sentential forms. Is ((num + E)) a sentence of G? A sentential form?
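
The closing question can be checked mechanically. Below is a small brute-force sketch (my own illustration, not from the notes) that enumerates the sentential forms of the example grammar up to a given length; it relies on the fact that no production of this grammar shrinks a sentential form.

from collections import deque

# Expression grammar from the example: E -> E A E | ( E ) | num,  A -> + | *
P = [("E", ("E", "A", "E")), ("E", ("(", "E", ")")), ("E", ("num",)),
     ("A", ("+",)), ("A", ("*",))]
VARIABLES = {"E", "A"}
START = ("E",)

def sentential_forms(max_len):
    # Breadth-first search over one-step derivations, pruning forms longer
    # than max_len.  This is safe because every production here has |body| >= 1,
    # so no sentential form of length <= max_len has a longer ancestor.
    seen, queue = {START}, deque([START])
    while queue:
        form = queue.popleft()
        for i, sym in enumerate(form):
            if sym not in VARIABLES:
                continue
            for head, body in P:
                if head == sym:
                    new = form[:i] + body + form[i + 1:]
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        queue.append(new)
    return seen

target = ("(", "(", "num", "+", "E", ")", ")")      # the string ((num + E))
forms = sentential_forms(len(target))
print(target in forms)                               # True: it is a sentential form
print(not any(s in VARIABLES for s in target))       # False: not a sentence (E occurs)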

Leftmost and rightmost derivations

At each step in a derivation, two choices are made: which variable to replace, and which production to apply. We will see that the behavior of top-down parsers, such as the predictive parsers we played around with, corresponds to leftmost derivations, in which the production at each step is applied to the leftmost variable in the sentential form.

Example

We previously saw two leftmost derivations,
  E ⇒ num
and
  E ⇒ (E) ⇒ (E A E) ⇒ (num A E) ⇒ (num + E) ⇒ (num + num),
in the grammar with productions
  E → E A E | ( E ) | num
  A → + | *
Another derivation of (num + num) is
  E ⇒ (E) ⇒ (E A E) ⇒ (E A num) ⇒ (E + num) ⇒ (num + num),
which is not leftmost.

Rightmost derivations, parse trees

In fact, the derivation
  E ⇒ (E) ⇒ (E A E) ⇒ (E A num) ⇒ (E + num) ⇒ (num + num)
is an example of a rightmost derivation: the production at each step is applied to the rightmost variable in the sentential form. We'll see later that some powerful parsing algorithms are best understood in terms of rightmost derivations.

A parse tree can be understood as a graphical representation of a derivation which suppresses as much information as possible about the order of production applications. For example, the parse tree corresponding to the above rightmost derivation also corresponds to the previous leftmost derivation, and to additional derivations that are neither leftmost nor rightmost. But every parse tree corresponds to a unique leftmost derivation, and a unique rightmost derivation.
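
To make the correspondence concrete, here is a small sketch (my own illustration, using a nested-tuple encoding of parse trees) that reads off the leftmost or rightmost derivation encoded by a parse tree, for the tree of (num + num) in the example grammar.

# A parse tree node is (label, children) for a variable, or (label,) for a
# terminal leaf.  This is the parse tree of (num + num) in the example grammar.
tree = ("E", [("(",),
              ("E", [("E", [("num",)]), ("A", [("+",)]), ("E", [("num",)])]),
              (")",)])

def derivation(tree, leftmost=True):
    # Read the derivation off the tree: keep a sentential form of subtrees and
    # repeatedly expand the leftmost (or rightmost) unexpanded variable node.
    forms = [[tree]]
    while True:
        form = forms[-1]
        variable_positions = [i for i, node in enumerate(form) if len(node) == 2]
        if not variable_positions:
            break
        i = variable_positions[0] if leftmost else variable_positions[-1]
        _label, children = form[i]
        forms.append(form[:i] + children + form[i + 1:])
    return [" ".join(node[0] for node in form) for form in forms]

# The same tree yields both the leftmost and the rightmost derivation above.
print("  =>  ".join(derivation(tree, leftmost=True)))
print("  =>  ".join(derivation(tree, leftmost=False)))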

Ambiguous grammars

A CFG G is ambiguous if there is more than one leftmost derivation for some sentence of G. Equivalently, G is ambiguous if there is more than one rightmost derivation for some sentence of G. Equivalently, G is ambiguous if there is more than one parse tree for some sentence of G.

Claim: The CFG with productions
  E → E A E | ( E ) | num
  A → + | *
is ambiguous. Consider, for example, the sentence num + num * num.

We previously discussed the kind of problem that such ambiguity causes for translation or evaluation of such an expression. For example, the ambiguity of
  E → E A E | num
can be easily eliminated by transforming the grammar as follows:
  E → E A num | num
In some cases, eliminating ambiguity may make the grammar harder to understand. Sometimes it is better to use other means for disambiguating a grammar. In fact, it is often nice to write ambiguous grammars with commonsensical disambiguating rules, such as operator precedence for expressions. We will eventually see this idea applied in Yacc. More generally, for some approaches to parsing, we want unambiguous grammars.
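
The claim can be verified by brute force. The sketch below (illustrative only; it assumes, correctly for this grammar, that no variable derives the empty string) enumerates all parse trees of num + num * num and finds two of them.

from itertools import combinations

# Expression grammar: E -> E A E | ( E ) | num,  A -> + | *
P = [("E", ["E", "A", "E"]), ("E", ["(", "E", ")"]), ("E", ["num"]),
     ("A", ["+"]), ("A", ["*"])]
VARIABLES = {"E", "A"}

def splits(toks, k):
    # All ways to cut the token list into k nonempty contiguous pieces.
    for cuts in combinations(range(1, len(toks)), k - 1):
        bounds = [0] + list(cuts) + [len(toks)]
        yield [toks[bounds[j]:bounds[j + 1]] for j in range(k)]

def parses(sym, toks):
    # All parse trees deriving exactly `toks` from `sym`.  Requiring nonempty
    # pieces is safe because no variable of this grammar derives ǫ.
    if sym not in VARIABLES:
        return [(sym,)] if toks == [sym] else []
    trees = []
    for head, body in P:
        if head != sym:
            continue
        for pieces in splits(toks, len(body)):
            child_lists = [[]]
            for b, piece in zip(body, pieces):
                options = parses(b, piece)
                child_lists = [cl + [t] for cl in child_lists for t in options]
            trees.extend((sym, cl) for cl in child_lists)
    return trees

sentence = ["num", "+", "num", "*", "num"]
print(len(parses("E", sentence)))   # 2 distinct parse trees, so the grammar is ambiguous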

Ambiguity cannot always be eliminated

Some context-free languages are inherently ambiguous. For example, the language
  L = { a^m b^m a^n b^n | m, n ∈ N } ∪ { a^m b^n a^n b^m | m, n ∈ N }
is context-free, but cannot be generated by any unambiguous CFG. Consider the following grammar for this language:
  S → A B | C
  A → a A b | ǫ
  B → a B b | ǫ
  C → a C b | D
  D → b D a | ǫ
Intuitively, any grammar for L will have two parse trees for a^n b^n a^n b^n, for some n ∈ N.

There is no general method for identifying ambiguity, or for eliminating it when possible. In fact, even the question of whether an arbitrary CFG is ambiguous is unsolvable.

Regular languages and context-free languages

Recall: some context-free languages are not regular. For example, L = { a^n b^n | n ∈ N } is context-free, generated by the grammar
  S → a S b | ǫ
but it is not regular.

Every regular language is context-free. Consider any NFA M = (States, Σ, move, S0, Final). A CFG that generates the language accepted by M is G = (States, Σ, S0, P), where P is constructed by taking A → a B for each B ∈ move(A, a), for all A ∈ States and a ∈ Σ ∪ {ǫ}, and also taking A → ǫ for every A ∈ Final.
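
As a concrete illustration of this construction (using my own small example NFA, not one from the notes), the sketch below builds the productions A → a B and A → ǫ from an NFA's move function.

# A small NFA M = (States, Sigma, move, S0, Final) accepting strings over
# {a, b} that end in "ab".  move maps (state, symbol) to a set of states.
States = {"q0", "q1", "q2"}
Sigma = {"a", "b"}
move = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
S0 = "q0"
Final = {"q2"}

def nfa_to_cfg(States, Sigma, move, S0, Final):
    # The construction from the notes: one variable per state, a production
    # A -> a B for each B in move(A, a) (with a in Sigma or a = ǫ), and
    # A -> ǫ for each final state A.
    P = []
    for A in sorted(States):
        for a in sorted(Sigma) + [""]:              # "" plays the role of ǫ
            for B in sorted(move.get((A, a), set())):
                P.append((A, [a, B] if a else [B]))
        if A in Final:
            P.append((A, []))                       # A -> ǫ
    return (States, Sigma, S0, P)

for head, body in nfa_to_cfg(States, Sigma, move, S0, Final)[3]:
    print(head, "->", " ".join(body) if body else "ǫ")
# q0 -> a q0,  q0 -> a q1,  q0 -> b q0,  q1 -> b q2,  q2 -> ǫ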

Some languages are not context-free

A simple example is { ww | w ∈ (a | b)* }. In practice, such limitations are one reason that the grammar for a typical programming language will allow syntactically illegal programs, for example, programs in which an identifier is used without being declared. Recall: this sort of thing might be handled via a symbol table. (One attribute of an identifier could reflect whether or not the identifier is declared.) The book gives some additional examples, trying to relate specific examples of languages that are not context-free to associated problematic structures of programming languages.

Top-down parsing and left-recursion

The behavior of a top-down parser corresponds to a leftmost derivation: at each step, apply a production to the leftmost variable in the current sentential form. The immediate goal is to generate a sentential form whose first character is the current input symbol.

For instance, for input bbb and the grammar
  A → A b | b
we could begin by constructing the parse tree for the derivation A ⇒ A b ⇒ b b, so that the sentential form (in this case, sentence) generated is bb. While this parse is wrong, in the sense that it can't generate the whole input string, it does generate the first (and second) input character.

Of course, a successful top-down parse of bbb using the grammar
  A → A b | b
would instead generate the parse tree for the derivation A ⇒ A b ⇒ A b b ⇒ b b b, so as to match the first b. After doing this, you're also ready to match the second and third b. So it appears that we may succeed with this grammar, at least if we allow backtracking, so that we can go back and try something else if we happen to guess wrong at some point.

But there is a problem. The problem is that when attempting a top-down parse of bb (or bbb, or ...) with this grammar, we could instead find ourselves generating a parse tree that only ever applies A → A b, expanding the leftmost variable again and again. Notice that this never leads to a match with the first character of the input: we never generate an initial terminal character. The crucial difficulty here is that the production A → A b is left-recursive. But even a grammar with no immediate left-recursion can exhibit this behaviour.
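
The sketch below (illustrative only) is a naive backtracking top-down parser for A → A b | b. Trying the productions in the order written, it keeps expanding the leftmost A without ever consuming input; the depth guard is there only so that the demonstration terminates.

# Left-recursive grammar from the notes: A -> A b | b
P = {"A": [["A", "b"], ["b"]]}

def parse(sym, toks, pos, depth=0):
    # Return the set of input positions reachable after deriving `sym`
    # starting at `pos`, trying productions in order (with backtracking).
    if depth > 25:
        raise RecursionError("kept expanding the leftmost A without consuming input")
    if sym not in P:                                # terminal symbol
        return {pos + 1} if pos < len(toks) and toks[pos] == sym else set()
    reachable = set()
    for body in P[sym]:
        ends = {pos}
        for b in body:
            ends = {e for start in ends for e in parse(b, toks, start, depth + 1)}
        reachable |= ends
    return reachable

try:
    parse("A", ["b", "b", "b"], 0)
except RecursionError as err:
    print("top-down parse of bbb got stuck:", err)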

For example, we could have a similar difficulty with the grammar
  A → B b
  B → A
which is also left-recursive, although there is no single production that is left-recursive: a top-down parse can keep expanding the leftmost variable, A, B, A, B, ..., without ever generating an initial terminal character.

So we want a definition of left-recursive grammars, and a method for systematically eliminating left recursion. We begin with a preliminary definition. We write α ⇒+ β to say that α derives β in one or more steps. More precisely, α ⇒+ β if α ⇒ β, and α ⇒+ γ if there is a β such that α ⇒+ β and β ⇒ γ.

A CFG G = (V, Σ, S, P) is left-recursive if there is a variable A such that A ⇒+ Aα for some string α over V ∪ Σ. So, for example, the grammar
  A → B b
  B → A
is left-recursive because, for instance, A ⇒+ A b.
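
Given this definition, left recursion can be detected mechanically. Here is a small sketch (my own illustration; it assumes the grammar has no ǫ-productions, in which case A ⇒+ Aα holds exactly when A can reach itself through first symbols of production bodies).

# Detect left-recursive grammars (assuming no ǫ-productions): G is
# left-recursive iff some variable can reach itself along the relation
# "B is the first symbol of one of A's production bodies".
def is_left_recursive(productions, variables):
    first = {A: set() for A in variables}
    for head, body in productions:
        if body and body[0] in variables:
            first[head].add(body[0])
    for A in variables:
        stack, seen = list(first[A]), set()
        while stack:
            B = stack.pop()
            if B == A:
                return True
            if B not in seen:
                seen.add(B)
                stack.extend(first[B])
    return False

# The indirectly left-recursive grammar from the notes: A -> B b, B -> A
print(is_left_recursive([("A", ["B", "b"]), ("B", ["A"])], {"A", "B"}))   # True
# A right-recursive grammar A -> b A | b is not left-recursive.
print(is_left_recursive([("A", ["b", "A"]), ("A", ["b"])], {"A"}))        # False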

For next time

Next time we'll (at least begin to) learn how to systematically eliminate left-recursion. Read 4.1–4.3 if you haven't already.