Syntax Analysis: Context-Free Grammar


CMPSC 470, Lecture 05
Topics: overview of the parser; context-free grammar (CFG); eliminating ambiguity in a CFG.

A. Overview of Parser

Source program → Lexical analyzer → (token / get next token) → Parser, with both phases consulting the symbol table.

a) Role of the parser
1. Report any syntax errors.
2. Recover from commonly occurring errors so that processing of the remainder of the program can continue.
3. Create the parse tree.

b) Parsers use grammars
A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language. For certain classes of grammars, a parser can be constructed automatically.

c) Types of parsers and grammars
Commonly used parsing methods: top-down and bottom-up.
Common grammar classes: LL and LR; left-recursive and non-left-recursive grammars.
A grammar is ambiguous if it permits more than one parse tree for some sentence.

d) Programming errors
Lexical errors: misspelled identifiers, keywords, or operators.
Syntactic errors: misplaced semicolons; extra or missing braces or parentheses ( {, }, (, ) ); a case statement without an enclosing switch.
Semantic errors: type mismatches between operators and operands.
Logical errors: anything from incorrect reasoning on the part of the programmer to incorrect use of a language construct; the program produces unintended or undesired output or behavior.

B. Context-Free Grammar (CFG)

A context-free grammar (CFG) is a type of formal grammar: a set of production rules that describes all possible strings in a given formal language. A CFG is used to specify the syntax of a language.

a) Format
A CFG production has the following form:
    stmt → if ( expr ) stmt else stmt
Terminals: the components of the tokens output by the lexical analyzer; they become the leaf nodes of the parse tree (here, the keywords if and else and the parentheses).
Nonterminals: syntactic variables that denote sets of strings (here, stmt and expr).
Productions (of a grammar): they specify how terminals and nonterminals can be combined to form strings.
Start symbol: one of the nonterminals. Conveniently, the productions for the start symbol are listed first.

Example) Suppose we have the program:
    while ( i > 0 )
        if ( i % 2 == 0 ) i = i / 2;
        else i = i - 1;
The grammar that supports the above program can be:

b) Notational conventions
1. Terminals:
   (a) lowercase letters early in the alphabet, such as a, b, c;
   (b) operator symbols such as +, *, /;
   (c) punctuation symbols such as parentheses and commas;
   (d) the digits 0, 1, ..., 9;
   (e) boldface strings such as id or if.
2. Nonterminals:
   (a) uppercase letters early in the alphabet, such as A, B, C;
   (b) the letter S, which is usually the start symbol;
   (c) lowercase, italic names such as expr or stmt.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either nonterminals or terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of terminals.
5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar symbols. Thus a generic production can be written as A → α, where A is the head and α the body.
6. A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written A → α1 | α2 | ... | αk. Call α1, α2, ..., αk the alternatives for A.
7. Unless stated otherwise, the head of the first production is the start symbol.

c) Derivations
Suppose we have the following grammar G:
    E → E + E | ( E ) | id
Starting from E, we can obtain the sentence ( id ) by sequentially replacing E:
    E ⇒ ( E ) ⇒ ( id )
We call such a sequence of replacements a derivation of ( id ) from E, and we say E derives ( id ).
Symbols: ⇒ means "derives in one step"; ⇒* means "derives in zero or more steps"; ⇒+ means "derives in one or more steps".
Rules:
1. α ⇒* α, for any string α;
2. if α ⇒* β and β ⇒ γ, then α ⇒* γ.
Sentential forms and sentences: if S is the start symbol and S ⇒* α, then α is a sentential form of grammar G. A sentential form may contain both terminals and nonterminals, and may be empty. If α is a sentence of G, then α contains only terminals.
The language generated by a grammar is its set of sentences. A string of terminals w is in L(G) if and only if w is a sentence of G (that is, S ⇒* w):
    L(G) = { w | S ⇒* w }
A context-free language is a language that can be generated by a CFG. Two grammars are equivalent if they generate the same language.
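A derivation can be mechanized as repeated string replacement. The following Python sketch (not part of the lecture; the helper name replace_leftmost is ours) performs the derivation E ⇒ ( E ) ⇒ ( id ) for the grammar above:

```python
def replace_leftmost(sentential_form, nonterminal, body):
    """One derivation step: rewrite the leftmost occurrence of nonterminal."""
    return sentential_form.replace(nonterminal, body, 1)

step0 = "E"
step1 = replace_leftmost(step0, "E", "( E )")   # E => ( E )
step2 = replace_leftmost(step1, "E", "id")      # ( E ) => ( id )
print(" => ".join([step0, step1, step2]))       # E => ( E ) => ( id )
```

Because each step rewrites only the leftmost nonterminal, this is in fact a leftmost derivation.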

Leftmost derivation: the leftmost nonterminal is always the one replaced; written ⇒lm.
Rightmost derivation: the rightmost nonterminal is always the one replaced; written ⇒rm.

d) Parse Tree
A parse tree is a graphical representation of a derivation. Each interior node of a parse tree represents the application of a production: interior nodes are nonterminals, and leaves are terminals. A parse tree filters out the order in which productions are applied to replace nonterminals.

e) Ambiguity
A grammar is ambiguous if it produces more than one parse tree for some sentence or, equivalently, if it produces more than one leftmost (or rightmost) derivation for the same sentence. For most parsers, it is desirable that the grammar be unambiguous.
Example) The following grammar is ambiguous:
    E → E + E | E * E | id
because it permits two distinct leftmost derivations for a sentence such as id + id * id.
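The ambiguity can be counted mechanically. This Python sketch (our own illustration; count_trees is a hypothetical helper) counts the distinct parse trees that the ambiguous grammar E → E + E | E * E | id assigns to a token string:

```python
from functools import lru_cache

def count_trees(tokens):
    """Count the parse trees for a token list under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # One tree if the span is exactly "id"; otherwise try every operator split.
        n = 1 if tokens[i:j] == ("id",) else 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))

print(count_trees(["id", "+", "id", "*", "id"]))  # 2: (id+id)*id and id+(id*id)
```

With more operators in the sentence, the number of parse trees grows further, which is why ambiguity must be engineered out of the grammar rather than ignored.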

f) CFG is a more powerful notation than regular expressions
Every construct described by a regular expression can be described by a CFG: every regular language is a context-free language, but not vice versa.
Converting an NFA to a CFG. Example) Determine a CFG accepting the regular expression (a|b)*ab:
1. Determine an NFA accepting the regular expression.
2. For each state i of the NFA, create a nonterminal Ai.
3. If state i has a transition to state j on input a, add the production Ai → aAj. If i has an ε-transition to j, add the production Ai → Aj.
4. If i is an accepting state, add Ai → ε.
5. If i is the start state, make Ai the start symbol of the grammar.
The language L = { a^n b^n | n ≥ 1 }, with an equal number of a's and b's, can be described by a CFG but not by a regular expression; we say that finite automata cannot count. There is no DFA accepting L. A CFG accepting L = { a^n b^n | n ≥ 1 } is:
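The claim that a CFG can describe L = { a^n b^n | n ≥ 1 } can be checked with a small recursive-descent recognizer for the grammar S → a S b | a b (a standard grammar for this language; the code is our sketch, not from the lecture):

```python
def matches(s):
    """Recognize L = { a^n b^n | n >= 1 } with grammar S -> a S b | a b."""
    def S(i):
        # Returns the index just past a match starting at i, or None.
        if i < len(s) and s[i] == "a":
            j = S(i + 1)                                 # try S -> a S b
            if j is not None and j < len(s) and s[j] == "b":
                return j + 1
            if i + 1 < len(s) and s[i + 1] == "b":       # try S -> a b
                return i + 2
        return None

    return S(0) == len(s)

print(matches("aaabbb"), matches("aab"))  # True False
```

The recursion depth tracks the nesting of a...b pairs, which is exactly the "counting" a finite automaton cannot do.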

g) Non-context-free languages
Some checks cannot be expressed by a CFG and must be deferred to semantic analysis.
Example 1) Consider the abstract language L1 = { wcw | w is in (a|b)* }. In a programming language, the first w represents the declaration of an identifier, the second w represents a use of that identifier, and c represents the program text in between. In a programming language like C/C++/Java, L1 abstracts the problem of checking that identifiers are declared before they are used. A CFG cannot describe a non-context-free language like L1, so this check must be done in the semantic analysis phase.
Example 2) In L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }, a^n and b^m represent the formal parameter lists of two function declarations, and c^n and d^m represent the corresponding argument lists in calls to those functions. A CFG cannot describe this language either, so the semantic analysis phase should check that the number of arguments in each call matches the number of formal parameters.
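The wcw check (every identifier declared before use) is exactly what the semantic-analysis phase does with a symbol table. A minimal sketch (our illustration; check_uses is a hypothetical helper):

```python
def check_uses(declared, used):
    """Return the used names that were never declared: the semantic check
    abstracted by the non-context-free language L1 = { wcw }."""
    table = set(declared)          # symbol table built from declarations
    return [name for name in used if name not in table]

print(check_uses(["x", "y"], ["x", "z"]))  # ['z'] is used but not declared
```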

C. Eliminate Ambiguity in CFG

An ambiguous grammar can have more than one parse tree for a given string of terminals. Since a string with more than one parse tree has more than one meaning, we want unambiguous grammars. Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

a) Associativity of operators
Suppose we have a grammar G for assignments such as a = b = c:
    A → A = A | a | b | c | ... | z
Given the sentence a = b = c, there are two parse trees:
Case 1: the parse tree built with a left-associative grammar, where b belongs to the left operator =.
Case 2: the parse tree built with a right-associative grammar, where b belongs to the right operator =.
A new grammar that eliminates the ambiguity by making operator = left-associative:
A new grammar that eliminates the ambiguity by making operator = right-associative:
The new parse tree of a = b = c in each case is:
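How associativity follows from the shape of the grammar can be seen in code: right recursion in the grammar A → id = A | id produces a right-leaning tree, which is the usual choice for = (as in C). A Python sketch (ours, not the lecture's):

```python
def parse_assign(tokens):
    """Parse a = b = c with grammar A -> id = A | id (right-associative)."""
    head, *rest = tokens
    if rest[:1] == ["="]:
        # A -> id = A : the right recursion nests the remainder under this node.
        return (head, "=", parse_assign(rest[1:]))
    return head                    # A -> id

print(parse_assign(["a", "=", "b", "=", "c"]))  # ('a', '=', ('b', '=', 'c'))
```

Note that b ends up grouped with c, i.e. a = (b = c); a left-recursive grammar would group it the other way.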

An ambiguous grammar for the expression 9 + 5 - 2 is given as follows:
    E → E + E | E - E | num
The operators + and - are, in general, left-associative. How can the ambiguity of this grammar be removed?

b) Precedence of operators
Consider the expression 9 + 5 * 2. There are two ways of interpreting it: (9 + 5) * 2 and 9 + (5 * 2). The associativity rule cannot be applied here, because * has higher precedence than +. Given the following grammar for + and *, which are left-associative:
    E → E + F | E * F | F
    F → num
this grammar handles associativity but gives + and * the same precedence. It can be rewritten so that * has higher precedence than +:
The parse trees of 9 + 5 * 2 and 9 * 5 + 2 are:

In a similar manner, the grammar can be rewritten to support parentheses, which have the highest precedence. Draw the parse tree of (1 + 2 * 3) * 4:
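The precedence-and-parenthesis grammar can be exercised with a small recursive-descent evaluator. This is our sketch (function names are ours) of the usual layered grammar E → E + T | E - T | T, T → T * F | F, F → ( E ) | num, with the left recursion turned into loops so + - * keep their left associativity:

```python
import re

def evaluate(text):
    """Evaluate text under E -> E+T | E-T | T,  T -> T*F | F,  F -> (E) | num."""
    tokens = re.findall(r"\d+|[-+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def F():                       # F -> ( E ) | num
        if peek() == "(":
            eat(); value = E(); eat()      # consume '(' E ')'
            return value
        return int(eat())

    def T():                       # T -> T * F | F, written as a loop
        value = F()
        while peek() == "*":
            eat(); value *= F()
        return value

    def E():                       # E -> E + T | E - T | T, written as a loop
        value = T()
        while peek() in ("+", "-"):
            value = value + T() if eat() == "+" else value - T()
        return value

    return E()

print(evaluate("9 + 5 * 2"), evaluate("(1 + 2 * 3) * 4"))  # 19 28
```

Because T sits below E, every * binds tighter than + and -, and parentheses in F override both.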

c) Eliminate dangling else
Consider the following grammar for if-statements:
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other
The grammar is ambiguous, since the sentence
    if E1 then if E2 then S1 else S2
has two parse trees. This situation is called the dangling else. In almost all programming languages, the first parse tree, in which each else matches the closest unmatched then, is preferred. Rewrite the grammar to eliminate the dangling else.

d) Eliminate immediate left recursion
A grammar has immediate left recursion if there is a derivation A ⇒ Aα for some string α of terminals and/or nonterminals.
Example) Given the following grammar:
    E → E + T | T
the body of the first production begins with E, so a top-down procedure for E calls itself recursively without consuming input. This is immediate left recursion; a top-down parser cannot handle an immediately left-recursive grammar.
The immediate left recursion can be eliminated by rewriting the grammar as follows:
    E → T R
    R → + T R | ε
Generalize: suppose we have a grammar
    A → Aα | β
A is immediately left-recursive, and the left recursion can be removed by rewriting the grammar using a new nonterminal R:
    A → β R
    R → α R | ε
Now R is right-recursive, and the grammar has no left recursion. Note that the two grammars generate the same sentences, of the form βα...α (β followed by zero or more α's).

Generalized elimination of immediate left recursion: consider the immediately left-recursive A-productions
    A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
Rewrite the grammar to eliminate the immediate left recursion:
    A → β1 A' | β2 A' | ... | βn A'
    A' → α1 A' | α2 A' | ... | αm A' | ε
Example) Eliminate the immediate left recursion of the following grammar:
    E → E + T | E - T | T

e) Eliminate left recursion
A grammar is left recursive if there is a derivation A ⇒+ Aα for some string α.
Example) Consider the following grammar:
    S → Aa | Bb | c
    A → Ac | Bd | Se | ε
    B → Af | Sg | ε
A is immediately left-recursive. S and B are not immediately left-recursive, but they are left recursive (for example, S ⇒ Aa ⇒ Sea).
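The generalized rewrite can be expressed as a short function. A sketch (the function name and the representation of productions as lists of symbol lists are ours):

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
    A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | epsilon."""
    recursive = [b[1:] for b in bodies if b[:1] == [head]]   # the alpha_i
    others = [b for b in bodies if b[:1] != [head]]          # the beta_j
    if not recursive:
        return {head: bodies}        # no immediate left recursion: nothing to do
    new = head + "'"
    return {
        head: [b + [new] for b in others],
        new: [a + [new] for a in recursive] + [[]],          # [] denotes epsilon
    }

# E -> E + T | T   becomes   E -> T E'  and  E' -> + T E' | epsilon
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```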

Eliminate the left recursion:
1a. For the S-productions: apply S → Aa to the A-productions and S → Bb to the B-productions (substituting for S wherever it appears in their bodies); repeat.
1b. Remove the immediate left recursion among the S-productions.
2a. For the A-productions: apply A → Bd to the B-productions (substituting for A in their bodies).
2b. Remove the immediate left recursion among the A-productions.
3a. For the B-productions: substitute in the same way.
3b. Remove the immediate left recursion among the B-productions.
4. Stop, because there are no other productions from which to eliminate left recursion.
Finally, the resulting grammar has no left recursion:

f) Left factoring
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing.
Example) Consider the following if-statement productions:
    stmt → if expr then stmt
         | if expr then stmt else stmt
On seeing the input token if, or even the prefix if expr then stmt, we cannot immediately choose which alternative production to apply. By rewriting the grammar, the if-statement can be left-factored to defer the decision until enough input has been seen:
    stmt → if expr then stmt rest
    rest → else stmt | ε
This new grammar is still ambiguous, because it still has the dangling-else problem; that problem is resolved separately.
Generalize: when the choice between two or more alternative productions for a nonterminal A is not clear, find the longest prefix α common to two or more of its alternatives. Let the A-productions have the form
    A → αβ1 | αβ2 | ... | αβn | γ
where α, β1, ..., βn, γ are strings of terminals and/or nonterminals. The grammar can be rewritten to defer the decision until the input is clear, using a new nonterminal A':
    A → αA' | γ
    A' → β1 | β2 | ... | βn
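The transformation above can be sketched in code (left_factor and the list-of-symbol-lists representation are our assumptions, not the lecture's):

```python
from os.path import commonprefix   # works element-wise on any sequences

def left_factor(head, bodies):
    """For A -> alpha b1 | alpha b2 | ... | gamma, pull out the longest common
    prefix alpha:  A -> alpha A' | gamma   and   A' -> b1 | b2 | ..."""
    first = bodies[0][:1]
    prefixed = [b for b in bodies if b[:1] == first]   # alternatives sharing a prefix
    rest = [b for b in bodies if b[:1] != first]       # the gamma alternatives
    alpha = list(commonprefix(prefixed))               # the longest common prefix
    if len(prefixed) < 2 or not alpha:
        return {head: bodies}                          # nothing to factor
    new = head + "'"
    return {
        head: [alpha + [new]] + rest,
        new: [b[len(alpha):] for b in prefixed],       # [] denotes epsilon
    }

# stmt -> if expr then stmt | if expr then stmt else stmt
print(left_factor("stmt", [["if", "expr", "then", "stmt"],
                           ["if", "expr", "then", "stmt", "else", "stmt"]]))
```

The result matches the rewritten if-statement grammar: the common prefix moves into the head production, and the new nonterminal carries the else branch or ε.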