Quick Grammar Type Recognition: Concepts and Techniques


Amin Milani Fard+, Arash Deldari*, and Hossein Deldari+
+ Department of Computer Engineering, Ferdowsi University, Mashhad, Iran
* Department of Computer Engineering, Sadjad University, Mashhad, Iran
milanifard@stu-mail.um.ac.ir

Abstract. This paper gives an overview of grammar classification in terms of language specification and parsing methods, a long-standing topic in computer science, compilers, and language processing. It is known that when a conflict arises in constructing the parsing table, the grammar is not acceptable by that parsing method; we are interested, however, in quick ways to determine the type of a given grammar. Although many papers and books contain useful information on this subject, none of them covers all aspects of grammar recognition, especially quick methods. We conclude with a quick grammar recognizer algorithm for detecting the grammar type.

1 Introduction

In computer science, parsing is the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given formal grammar. The parsing process, formally known as syntax analysis, transforms input text into a data structure, usually a tree, suitable for later processing. Generally, parsers operate in two stages: first identifying the meaningful tokens in the input, and then building a parse tree from those tokens. The task of the parser is essentially to determine whether and how the input can be derived from the start symbol under the rules of the formal grammar.

2 Language Specifications

The concepts and terminology for describing the syntax of languages are taken from Noam Chomsky's work on linguistic structure [1], [2].
His classification of grammars and the related theory formed the basis of further work on formal language theory, the theory of computation, and efficient parsing methods in compiler design [3], [4], [5], [6]. Various restrictions on the productions define the different grammar types and the corresponding languages of the Chomsky hierarchy:

Type-0 grammars (unrestricted grammars) generate the recursively enumerable languages; they include all formal grammars and have no restrictions. They generate exactly the languages that can be recognized by a Turing machine.

Type-1 grammars (context-sensitive grammars) generate the context-sensitive languages. Every rule L → R satisfies |L| ≤ |R|; as an exception, S → ε is allowed if S never occurs on a right-hand side. In normal form the rules have the form αAβ → αγβ, with A a non-terminal and α, β, and γ strings of terminals and non-terminals; α and β may be empty, but γ must be non-empty. The rule S → ε may also be included. All these languages can be recognized by linear-bounded automata.

Type-2 grammars (context-free grammars) generate the context-free languages. The left-hand side is a single non-terminal (L ∈ N), so the rules have the form A → γ with A a non-terminal and γ a string of terminals and non-terminals. These languages can be recognized by a pushdown automaton. Context-free languages are the theoretical basis for the syntax of most programming languages.

Type-3 grammars (regular grammars) generate the regular languages. Here L ∈ N and R = a or R = aX, where a ∈ A (a terminal) and X ∈ N. Such a grammar restricts its rules to a single non-terminal on the left-hand side and a right-hand side consisting of a single terminal, possibly followed by a single non-terminal. The rule S → ε is also allowed if S does not appear on the right-hand side of any rule. These languages can be decided by a finite-state automaton and described by regular expressions. Regular languages are commonly used to define search patterns and the lexical structure of programming languages.

The Chomsky hierarchy, depicted in Fig. 1, indicates that every regular language is context-free, every context-free language is context-sensitive, and every context-sensitive language is recursively enumerable.

Fig. 1. The Chomsky hierarchy: the finite languages sit inside Type-3, and Type-3 ⊂ Type-2 ⊂ Type-1 ⊂ Type-0.

From a practical point of view, grammars may be used to solve the membership problem: given a string over A, does it belong to the language L(G)? Another problem is the so-called parsing problem: finding a sequence of rewriting steps from the grammar's start symbol to the given sentence.
Parsing can thus be seen as structuring the input according to the given grammar; the algorithm that performs this structuring is called a parser [7].
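The rule-shape restrictions of Section 2 suggest a direct way to classify a grammar by inspecting its productions. The sketch below (our own illustration; the encoding and function names are not from the paper, and the ε-rule side conditions are simplified) represents non-terminals as uppercase letters, terminals as lowercase letters, and ε as the empty string:

```python
def chomsky_type(rules):
    """rules: list of (lhs, rhs) string pairs. Returns 3, 2, 1, or 0,
    the most restrictive Chomsky type whose rule shapes fit."""
    def is_nt(s):
        return s.isupper()

    # Type 3: A -> a, A -> aX, or A -> epsilon (side condition on S omitted).
    regular = all(
        len(l) == 1 and is_nt(l) and (
            r == ""
            or (len(r) == 1 and not is_nt(r))
            or (len(r) == 2 and not is_nt(r[0]) and is_nt(r[1]))
        )
        for l, r in rules
    )
    if regular:
        return 3
    # Type 2: a single non-terminal on every left-hand side.
    if all(len(l) == 1 and is_nt(l) for l, r in rules):
        return 2
    # Type 1: right side never shorter than the left (S -> epsilon excepted).
    if all((r == "" and l == "S") or len(r) >= len(l) for l, r in rules):
        return 1
    return 0

print(chomsky_type([("S", "aS"), ("S", "b")]))   # 3 (right-linear)
print(chomsky_type([("S", "aSb"), ("S", "")]))   # 2 (context-free)
```

A grammar such as AB → AAB, whose left-hand side is longer than one symbol but never longer than its right-hand side, would classify as Type 1.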

3 Parsing algorithms

The most common context-free parsing strategies are top-down and bottom-up parsing. In top-down parsing, the parser begins with the start symbol of the grammar and attempts to generate the same sentence it is trying to parse; the best-known top-down algorithms are the LL parsers [3]. In bottom-up parsing, the parser matches the input against the right-hand sides of the productions and builds the derivation tree in reverse, traditionally using one symbol of lookahead to guide the choice of action; the best-known bottom-up algorithms are LR, SLR, and LALR [3], [4], [8], [9]. These parsing algorithms are commonly limited to subclasses of the context-free grammars [9]; hierarchies of such subclasses [4] are shown in Fig. 2.

A grammar is said to be LL(k) if a parser can be written for it that decides which production to apply at any stage by looking at most at the next k symbols of the input. LL(1) grammars are a simple but important category, where one symbol of lookahead suffices for a top-down predictive parser [7]. A grammar is said to be LR(k) if a parser can be written for it that makes a single left-to-right pass over the input with a lookahead of at most k symbols; these grammars can be parsed bottom-up without backtracking. LL(k) and LR(k) parsers do not backtrack, and they operate efficiently. Available parser generator tools commonly support only some subclasses: Yacc [10], SableCC [11], CUP [12], and most others support LALR(1), while ANTLR [13], PCCTS [14], and some others support LL(k) parsing.

Fig. 2. Hierarchy of context-free grammar classes: within the unambiguous grammars, LR(k) ⊇ LR(1) ⊇ LALR(1) ⊇ SLR ⊇ LR(0) and LL(k) ⊇ LL(1) ⊇ LL(0).

Whether top-down or bottom-up methods are used, an ambiguous grammar cannot be parsed by any of these parsers.
This is due to the ambiguity that arises in constructing the derivation tree. Detecting whether a grammar is ambiguous does not always reduce to a simple rule-based check; however, one simple strategy is the following:

Ambiguous grammars typically contain productions of the form A → AαA | β, in which left recursion and right recursion occur simultaneously, either directly or indirectly after some non-terminal replacement.

3.1 Unger parsing

An Unger parser [15] is the simplest known method for parsing an arbitrary context-free grammar, but its exponential time complexity makes it inapplicable in practice unless the grammar is ambiguous and the deterministic methods fail. In the Unger algorithm, for each right-hand side of a production we must first generate all possible partitions of the input sentence. Generating partitions is not difficult: if the right-hand side has m members, numbered 1 to m, and the input has length n, with positions numbered 1 to n, we must find all partitions such that the characters assigned to each member are consecutive, and no member contains lower-numbered characters than any character of a lower-numbered member. A partition fails if a terminal symbol in the right-hand side does not match the corresponding part of the partition. Each partition that does not fail leads to similar split-ups as sub-problems, and all of these sub-problems must be answered in the affirmative, or the partition is not the right one. For an ambiguous grammar that contains loops, there are infinitely many derivations to be found, so the process must cut off the search in these cases. This can be done by maintaining a list of the partitions currently under investigation: if a new partitioning already appears in the list, we do not investigate it and proceed as if it had been answered negatively. Fortunately, if the grammar does not contain such a loop, the cut-off does no harm either, because that branch of the search is doomed to fail anyway [7].

3.2 Top-down parsing

Although it is possible to program a backtracking top-down parser, the resulting parser will be complex and slow.
Predictive parsers (sometimes called recursive descent parsers) do no backtracking: they can always determine which production to use. Clearly, predictive parsers can be written for grammars in which all alternatives of a production start with different terminal symbols. A production of the form A → Aα | β | γ is called left recursive. When one of the productions in a grammar is left recursive, a predictive parser may loop forever. To overcome this problem, the left-recursive rule can be replaced with the following:

A → βA' | γA'
A' → αA' | ε

Left factoring addresses another problem in top-down parsing. When a non-terminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing. For productions A → αβ1 | αβ2 | ... | αβn | γ, which contain the left factor α, the following replacement solves the problem:

A → αA' | γ
A' → β1 | β2 | ... | βn
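The two transformations just described, left-recursion elimination and left factoring, can be sketched on a toy grammar representation (our own illustration, not the paper's code: productions[A] is a list of alternatives, each a list of symbols, with [] standing for ε):

```python
from collections import defaultdict

def eliminate_left_recursion(prods, a):
    """Replace A -> A alpha | beta with A -> beta A', A' -> alpha A' | eps
    (direct left recursion only)."""
    rec = [alt[1:] for alt in prods[a] if alt and alt[0] == a]
    nonrec = [alt for alt in prods[a] if not alt or alt[0] != a]
    if not rec:
        return prods
    new = a + "'"
    prods[a] = [alt + [new] for alt in nonrec]
    prods[new] = [alt + [new] for alt in rec] + [[]]  # [] is epsilon
    return prods

def left_factor(prods, a):
    """Replace A -> alpha b1 | alpha b2 with A -> alpha A', A' -> b1 | b2
    (simplest case only: a single shared leading symbol)."""
    groups = defaultdict(list)
    for alt in prods[a]:
        groups[alt[0] if alt else None].append(alt)
    for head, alts in groups.items():
        if head is not None and len(alts) > 1:
            new = a + "'"
            prods[a] = [x for x in prods[a] if x not in alts] + [[head, new]]
            prods[new] = [alt[1:] for alt in alts]
    return prods

g = {"E": [["E", "+", "T"], ["T"]]}
eliminate_left_recursion(g, "E")
# Result: E -> T E' ;  E' -> + T E' | epsilon
```

Running left_factor on {"A": [["a","b"], ["a","c"], ["d"]]} likewise yields A → aA' | d with A' → b | c.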

A CFG is LL(1) if, for each collection of productions A → α1 | α2 | ... | αn, the following hold:
1. The grammar has no left recursion.
2. First(αi) ∩ First(αj) = ∅ for all i ≠ j (no left factoring is needed).
3. If αi ⇒* ε, then (a) αj ⇒* ε for no j ≠ i, and (b) First(αj) ∩ Follow(A) = ∅ for all j ≠ i.

A CFG is LL(k) if, whenever there are two leftmost derivations
1. S ⇒* ωAα ⇒ ωβα ⇒* ωx
2. S ⇒* ωAα ⇒ ωγα ⇒* ωy
such that First_k(x) = First_k(y), it follows that β = γ [20].

3.3 Bottom-up parsing

Bottom-up parsers start with the tokens of the input string rather than with the start symbol of the grammar; a bottom-up parser produces the rightmost derivation in reverse. Shift-reduce parsers are based on two operations: the shift operation reads and stores an input symbol, and the reduce operation matches a group of adjacent stored symbols against the right-hand side of a production and replaces them by the corresponding left-hand side.

3.3.1 Precedence parsing

There is a certain class of grammars, the precedence grammars, for which it is possible to write relatively simple parsers: precedence relationships between adjacent symbols determine the actions of the parser. Details of the technique are given in the standard compiler textbooks [3]. At first sight precedence parsing looks like a good technique, being simple and admitting very efficient implementations; however, it is now rarely used in practice because it is difficult, if not impossible, to transform an average programming-language grammar into precedence form. A CFG is a precedence grammar if the following conditions are met:
1. No two non-terminals occur next to each other.
2. No epsilon (empty) productions occur.

3.3.2 LR parsing

LR parsers are efficient bottom-up parsers that can be constructed for a large class of context-free grammars. An LR(k) grammar is one that generates strings each of which can be parsed in a single deterministic scan from left to right without looking ahead more than k symbols.
These parsers are generally very efficient and good at error reporting, but unfortunately they are very difficult to write without the help of special parser-generating programs. Even top-down parsers have their problems: left recursion has to be removed, and further restrictions have to be imposed to ensure a deterministic and efficient parser. The parsing technique for LR(k) grammars was first described by Knuth [17] and has since been widely used and much developed. A convenient way of implementing an LR(1) parser is via a parsing table [16]: each entry, indexed by the current input symbol and the state number at the top of the stack, describes the next action the parser should perform. The possible actions are shift, reduce, accept, and error. It is known that when a conflict occurs in constructing the parsing table, the grammar is not acceptable by that parsing method. For example, a grammar is not LR(1) if some state s has either a shift-reduce conflict (for some item [A → α.xβ, t] in s with x a terminal, there is an item [B → γ., x] in s) or a reduce-reduce conflict (there are two items [A → α., t] and [B → β., t] in s). Our concern, however, is with whether quick ways exist to determine the type of a given grammar.

Three methods, in order of increasing power, are simple LR (SLR), lookahead LR (LALR), and canonical LR (CLR). The SLR and LALR approaches reduce the size of the parsing table, but they cannot handle all the grammars that the canonical LR method can parse. The SLR(1) parser is based on an LR(0) parsing table, with one-symbol lookahead added after the table has been built [7]. Informally, a grammar is LR(0) if you can take a valid token sequence, chop it in two, and still make sense of the left part. The LR grammar hierarchy is: LR(0) ⊆ SLR(1) ⊆ LR(1) ⊆ LR(k). A CFG is LR(0) if it is LL(1) and has no epsilon productions. Almost every LL grammar is LR(0), and thus LALR; the exceptions are grammars with empty rules, some of which may be LL without being LR(0) [18].
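The LR items used in these conflict checks are built from item-set closures. The sketch below (our own illustration; the item encoding and names are assumptions, not the paper's) computes the closure of an LR(0) item set, where an item (A, rhs, dot) stands for A → rhs with the dot at position dot:

```python
def closure(items, prods):
    """Closure of a set of LR(0) items.
    prods: dict mapping each nonterminal to a list of right-hand sides
    (tuples of symbols); symbols absent from prods are terminals."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):
            # If the dot stands before a nonterminal B, add B -> .alt
            # for every alternative of B.
            if dot < len(rhs) and rhs[dot] in prods:
                for alt in prods[rhs[dot]]:
                    item = (rhs[dot], alt, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return result

prods = {"S": [("E",)], "E": [("E", "+", "n"), ("n",)]}
items = closure({("S", ("E",), 0)}, prods)
# The closure of [S -> .E] adds [E -> .E+n] and [E -> .n]
```

State construction for an LR(0) machine then repeatedly applies this closure to the "goto" of each state on each symbol.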
A "null" non-terminal is defined as a non-terminal that derives only the null string (epsilon). A "p-reduced" grammar is a reduced grammar in which no non-terminal is null; that is, A is null exactly when First(A) = {ε}. A CFG is LALR(1) if it is LL(1) and p-reduced [18]. A CFG is SLR(1) if:
1. For any item [A → α.xβ] with x ∈ T, there is no complete item [B → γ.] with x ∈ Follow(B).
2. For any two complete items [A → α.] and [B → β.], Follow(A) ∩ Follow(B) = ∅.
A CFG is SLR(k) if and only if the following two statements hold for every state q of the LR(0) machine for the S-augmented grammar [19], [21]:
1. Whenever q contains a pair of distinct complete items [A1 → ω1.] and [A2 → ω2.], then Follow_k(A1) ∩ Follow_k(A2) = ∅.
2. Whenever q contains a pair of items [A → α.aβ] and [B → ω.], where a is a terminal, then First_k(aβ Follow_k(A)) ∩ Follow_k(B) = ∅.
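The First sets that these conditions rely on can be computed by a standard fixed-point iteration. The sketch below is our own illustration (the grammar encoding is an assumption): a grammar maps each non-terminal to a list of alternatives, each a tuple of symbols, with symbols absent from the map treated as terminals and "" standing for ε:

```python
def first_sets(g):
    """Fixed-point computation of First(A) for every nonterminal A."""
    first = {a: set() for a in g}
    changed = True
    while changed:
        changed = False
        for a, alts in g.items():
            for alt in alts:
                nullable = True  # does every symbol so far derive epsilon?
                for sym in alt:
                    add = first[sym] - {""} if sym in g else {sym}
                    if not add <= first[a]:
                        first[a] |= add
                        changed = True
                    if sym not in g or "" not in first[sym]:
                        nullable = False
                        break
                if nullable and "" not in first[a]:
                    first[a].add("")  # the whole alternative derives epsilon
                    changed = True
    return first

g = {"S": [("A", "b")], "A": [("a",), ()]}
# First(A) = {a, eps}; since A is nullable, First(S) = {a, b}
```

Follow sets are computed by an analogous iteration that propagates First of the suffix after each non-terminal occurrence, and together they mechanize the LL(1), SLR(1), and LALR(1) intersection tests above.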

A CFG is LR(1) if:
1. For any item [A → α.xβ, a] with x ∈ T, there is no item [B → γ., x].
2. For any two complete items [A → γ., a] and [B → β., b], a ≠ b.
Let G = (N, Σ, P, S) be a CFG and G' = (N', Σ, P', S') its augmented grammar. A CFG is LR(k), k ≥ 0, if the three conditions below imply that αAω = γBx (that is, α = γ, A = B, and x = y) [19], [20]:
1. S' ⇒* αAω ⇒ αβω (rightmost)
2. S' ⇒* γBx ⇒ αβy (rightmost)
3. First_k(ω) = First_k(y)
A grammar is LR if it is LR(k) for some k.

Main theorem for LR detection. A CFG is in first normal form (1NF), the Chomsky normal form, if and only if all production rules are of the form A → BC, A → a, or S → ε, where A, B, and C are non-terminal symbols, a is a terminal symbol, S is the start symbol, and ε is the empty string; moreover, neither B nor C may be the start symbol. Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be efficiently transformed into an equivalent one in Chomsky normal form. With the exception of the optional rule S → ε (included when the grammar may generate the empty string), all rules of a grammar in Chomsky normal form are expansive; thus, throughout the derivation of a string, each string of terminals and non-terminals is either the same length as or one element longer than the previous such string [22].

A CFG is in second normal form (2NF), the Greibach normal form, if all production rules are of the form A → aX or S → ε, where A is a non-terminal, a is a terminal, X is a (possibly empty) sequence of non-terminals not including the start symbol, S is the start symbol, and ε is the null string. Observe that such a grammar must be without left recursion [22]. Let G be a grammar in 1NF. Then do the following as often as possible. Pick some non-terminal A:
1. If A is left-recursive, apply full left-recursion elimination.
2. Unfold all occurrences of A in the grammar.
3.
Eliminate the productions for A from the grammar (as they become unreachable).

A grammar G is said to be in third normal form (3NF) if it is in 2NF and no two productions have right-hand sides starting with the same symbol, as in Z → xu | xv. Except for the aforementioned rare termination problem, this normal form can obviously be obtained by applying left factoring wherever possible. However, we can improve the efficiency by delaying the left factorings as long as possible; this may be called "lazy left factoring" [23].

Main theorem: Pepper [23] proved that if G is a grammar and G' its transformed 3NF version, then the original grammar G is LR(k) if and only if the transformed grammar G' is LL(k).

4 Proposed mechanism

Given the detection methods of the previous section, a procedural approach is needed to determine the type of a given grammar. To this end, we propose the recognition steps shown in Fig. 3, which find the type of a context-free grammar in the most efficient order. In this approach, if a sign of ambiguity is detected, parsing is possible only with the backtracking Unger method; otherwise the detection framework continues with the LL tests. The TryX(n) functions return false when the grammar cannot be parsed with the corresponding method and true when it can, so whenever parsing cannot be handled, a more powerful approach is evaluated: when the LL tests reject, the LR evaluation starts, and when LR rejects as well, backtracking is used. Fig. 4 shows a simple method for detecting whether a grammar is ambiguous by checking for simultaneous left and right recursion; the other detection algorithms follow the conditions discussed earlier.

if (!IsAmbiguous())
    if (!TryLL(0))
        if (!TryLL(1))
            if (!TryLL(k))
                if (!TryLR(0))
                    if (!TrySLR(1))
                        if (!TryLALR(1))
                            if (!TryLR(1))
                                if (!TryLR(k))
                                    TryBackTrack();
else
    TryBackTrack();

Fig. 3. The proposed quick grammar recognizer algorithm

if (CanReplaceNonTerminals())
    if (IsLeftRecursive() && IsRightRecursive())
        return true;
return false;

Fig. 4. The IsAmbiguous algorithm
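The cascade of Fig. 3 can be phrased as a runnable program that walks an ordered list of probes, weakest class first, and reports the first one that accepts. This is our own sketch: the probes below are stubs standing in for the paper's TryX tests, and the grammar representation (a set of tags) is purely illustrative:

```python
def recognize(grammar, probes):
    """probes: ordered (name, predicate) pairs, most restrictive class
    first. Returns the name of the first class whose test accepts."""
    for name, probe in probes:
        if probe(grammar):
            return name
    return "backtracking required"

# Stub tests in the paper's order; real ones would check left recursion,
# First/Follow intersections, LR item conflicts, and so on.
PROBES = [
    ("LL(0)",   lambda g: False),
    ("LL(1)",   lambda g: "ll1" in g),
    ("LR(0)",   lambda g: False),
    ("SLR(1)",  lambda g: "slr1" in g),
    ("LALR(1)", lambda g: False),
    ("LR(1)",   lambda g: False),
]

print(recognize({"ll1"}, PROBES))  # LL(1)
print(recognize(set(), PROBES))    # backtracking required
```

An ambiguity pre-check, as in Fig. 4, would simply short-circuit this loop and return the backtracking answer immediately.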

if (IsLeftRecursive() || HasLeftFactoring())
    return false;
if (TwoProductsReachEpsilon())
    return false;
if (First_FollowIntersect() != 0)
    return false;
return true;

Fig. 5. The TryLL(1) algorithm

if (HasTwoLMD() && EqualFirstK())
    if (!EqualLM())
        return false;
return true;

Fig. 6. The TryLL(k) algorithm

if (HasEpsilonProduct())
    return false;
if (IsLeftRecursive() || HasLeftFactoring())
    return false;
if (TwoProductsReachEpsilon())
    return false;
if (First_FollowIntersect() != 0)
    return false;
return true;

Fig. 7. The TryLR(0) algorithm

if (ExistSameProduct())
    if (FollowSetsIntersect() != 0)
        return false;
return true;

Fig. 8. The TrySLR(1) algorithm
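The two cheap, purely syntactic conditions of the TryLL(1) test in Fig. 5 can be sketched directly (our own illustration with an assumed grammar encoding: non-terminal → list of alternatives, each a tuple of symbols). The full test would additionally intersect First and Follow sets:

```python
def has_direct_left_recursion(g):
    """True if some production A -> A ... exists."""
    return any(alt and alt[0] == a
               for a, alts in g.items() for alt in alts)

def has_left_factor(g):
    """True if two alternatives of one nonterminal share a first symbol."""
    for alts in g.values():
        heads = [alt[0] for alt in alts if alt]
        if len(heads) != len(set(heads)):
            return True
    return False

def quick_ll1_reject(g):
    """True when the grammar certainly is not LL(1)."""
    return has_direct_left_recursion(g) or has_left_factor(g)

print(quick_ll1_reject({"A": [("A", "a"), ("b",)]}))  # True (left recursive)
print(quick_ll1_reject({"A": [("b",)]}))              # False
```

TryLR(0) of Fig. 7 would add a check that no alternative is the empty tuple, and TryLALR(1) of Fig. 9 a check for null non-terminals.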

if (ExistNullNonTerminal())
    return false;
if (IsLeftRecursive() || HasLeftFactoring())
    return false;
if (TwoProductsReachEpsilon())
    return false;
if (First_FollowIntersect() != 0)
    return false;
return true;

Fig. 9. The TryLALR(1) algorithm

if (ExistsUnchecked_LR1Item()) {
    if (HasTheSameLookahead())
        return false;
}
return true;

Fig. 10. The TryLR(1) algorithm

if (HasTwoRMD() && EqualFirstK())
    if (!EqualRM())
        return false;
return true;

Fig. 11. The TryLR(k) algorithm

5 Conclusion and future work

In this paper we investigated grammar classification techniques in terms of language specification and parsing method within a systematic framework. The work concerns grammar specification and parsing, a central topic in computer science, compilers, and language processing. It is known that when a conflict occurs in constructing the parsing table, the grammar is not acceptable by that parsing method; we therefore built a framework to quickly determine the type of a given grammar, and presented a quick grammar recognizer algorithm for detecting grammar types. Our future work will pursue a mathematical approach that formalizes grammars, applies an interpolation curve-fitting method, and compares the result with the approach proposed here.

References

1. Chomsky, N.: Three Models for the Description of Language. IRE Transactions on Information Theory 2(3), 113-124 (1956)
2. Chomsky, N.: On Certain Formal Properties of Grammars. Information and Control 2(2), 137-167 (1959)
3. Aho, A. V., Sethi, R., Ullman, J. D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley (1986)
4. Appel, A. W.: Modern Compiler Implementation in Java. Cambridge University Press (1998)
5. Parsons, T. W.: Introduction to Compiler Construction. Computer Science Press, New York (1992)
6. Slonneger, K., Kurtz, B. L.: Formal Syntax and Semantics of Programming Languages: A Laboratory Based Approach. Addison-Wesley (1995). Available at: http://www.cs.uiowa.edu/~slonnegr/plf/book/
7. Jokipii, A.: Grammar-based Data Extraction Language (GDEL). Master of Science Thesis in Information Technology, University of Jyväskylä, Department of Mathematical Information Technology (10 October 2003)
8. Aho, A. V., Ullman, J. D.: The Theory of Parsing, Translation, and Compiling, Volume 1: Parsing. Prentice-Hall (1972)
9. Grune, D., Jacobs, C. J. H.: Parsing Techniques: A Practical Guide. Ellis Horwood (1990)
10. Johnson, S. C.: YACC - Yet Another Compiler-Compiler. Technical Report Computer Science 32, Bell Laboratories, Murray Hill, New Jersey (1975). Available at: http://epaperpress.com/lexandyacc/download/yacc.pdf
11. Gagnon, E.: SableCC, an Object-Oriented Compiler Framework. PhD thesis, School of Computer Science, McGill University, Montreal (March 1998). Available at: http://www.sablecc.org/thesis.pdf
12. Hudson, S. E.: CUP Parser Generator for Java (1997). Available at: http://www.cs.princeton.edu/~appel/modern/java/cup/
13. Parr, T. J., Quong, R. W.: ANTLR: A Predicated-LL(k) Parser Generator. Software: Practice and Experience 25(7), 789-810 (July 1995). Available at: http://www.antlr.org/papers/antlr.ps
14. Parr, T. J.: Language Translation Using PCCTS & C++. Automata Publishing Company (1997). ISBN 0962748854
15. Unger, S. H.: A Global Parser for Context-Free Phrase Structure Grammars. Communications of the ACM 11(4), 240-247 (April 1968)
16. Watson, D.: High-Level Languages and their Compilers. International Computer Science Series, Addison-Wesley, Wokingham, England (1989)
17. Knuth, D. E.: On the Translation of Languages from Left to Right. Information and Control 8(6), 607-639 (1965)
18. Beatty, J. C.: On the Relationship Between the LL(1) and LR(1) Grammars. Journal of the ACM 29 (1982)
19. Žemlička, M.: Principles of Kind Parsing - An Introduction. Technical Report KSI MFF UK No. 2002/1, MFF UK, Praha (December 2002)
20. Aho, A. V., Ullman, J. D.: The Theory of Parsing, Translation, and Compiling, Vol. I: Parsing. Prentice Hall (1972). ISBN 0-13-914556-7
21. Sippu, S., Soisalon-Soininen, E.: Parsing Theory, Volume II: LR(k) and LL(k) Parsing. Springer-Verlag, EATCS Monographs 20. ISBN 3-540-51732-4
22. Martin, J.: Introduction to Languages and the Theory of Computation. McGraw-Hill (2003). ISBN 0-07-232200-4. Pages 237-240, Section 6.6: Simplified Forms and Normal Forms
23. Pepper, P.: LR Parsing = Grammar Transformation + LL Parsing - Making LR Parsing More Understandable and More Efficient. Technical Report No. 99-5 (April 1999). http://citeseer.ist.psu.edu/pepper99lr.html