Context-Free Grammar (CFG)

Dr. Nadeem Akhtar
Assistant Professor, Department of Computer Science & IT, The Islamia University of Bahawalpur
PhD, Formal Methods in Software Engineering, IRISA, University of South Brittany, France

Context Free Grammars (CFG)
Context-free grammars are a more powerful method of describing languages than finite automata and regular expressions. CFGs have a recursive structure, which makes them useful in a variety of applications.
- Study of human languages. Relating terms such as noun, verb, and preposition to their respective phrases leads to a natural recursion, because noun phrases may appear inside verb phrases and vice versa. Context-free grammars can capture important aspects of these relationships.

Context Free Grammars (CFG)
An important application of context-free grammars occurs in the specification and compilation of programming languages. A grammar for a programming language is a reference for people trying to learn the language syntax. Designers of compilers and interpreters for programming languages start by obtaining a grammar for the language.

Context Free Grammars (CFG)
The parser of a compiler or interpreter extracts the meaning of a program before the compiled code is generated or the interpreted execution is performed. A number of methodologies facilitate the construction of a parser once a context-free grammar is available. Some tools even generate the parser automatically from the grammar.

Context Free Grammars (CFG)
The collection of languages associated with context-free grammars is called the context-free languages. They include all the regular languages and many additional languages.
Topics:
- A formal definition of context-free grammars and the properties of context-free languages.
- Pushdown automata, a class of machines recognizing the context-free languages. Pushdown automata give additional insight into the power of context-free grammars.

Context Free Grammar (CFG)
A CFG has four components:
1) A set of terminal symbols, sometimes referred to as tokens. Terminals are the elementary symbols of the language defined by the grammar.
2) A set of non-terminals, sometimes called syntactic variables. Each non-terminal represents a set of strings of terminals. In
   stmt → if ( expr ) stmt else stmt
   stmt and expr are non-terminals.
3) A set of productions. Each production consists of a non-terminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or non-terminals, called the body or right side of the production.
4) A designation of one of the non-terminals as the start symbol.

Regular Languages
- Closed under union, concatenation, and closure (*)
- Recognizable by finite-state automata
- Denoted by regular expressions
- Generated by regular grammars

Context Free Grammars
More general productions than regular grammars:
S → w, where w is any string of terminals and nonterminals
What languages do these grammars generate?
S → ( A )
A → ε | aA | ASA

S → ε | aSb

Context-free languages are more general than regular languages
{a^n b^n | n ≥ 0} is not regular, but it is context-free.
Why are they called context-free? Context-sensitive grammars allow more than one symbol on the left-hand side of productions. For example, xAy → x(S)y can only be applied to the nonterminal A when it is in the context of x and y.

Example derivation in a Grammar
Grammar (start symbol is A):
A → aAa
A → B
B → bB
B → ε
Sample derivation:
A ⇒ aAa ⇒ aaAaa ⇒ aaaAaaa ⇒ aaaBaaa ⇒ aaabBaaa ⇒ aaabbBaaa ⇒ aaabbaaa
Language?

Derivations in Tree Form

Example CFG
An example of a context-free grammar, which we call G1:
A → 0A1
A → B
B → #
A grammar consists of a collection of substitution rules, also called productions. Each rule appears as a line in the grammar, comprising a symbol and a string separated by an arrow. The symbol is called a variable. The string consists of variables and terminals. The variable symbols are often represented by capital letters. The terminals are analogous to the input alphabet and are often represented by lowercase letters, numbers, or special symbols. One variable is designated as the start variable. It usually occurs on the left-hand side of the topmost rule.

Example CFG
An example of a context-free grammar, which we call G1:
A → 0A1
A → B
B → #
Grammar G1 contains three rules. G1's variables are A and B, where A is the start variable. Its terminals are 0, 1, and #.

Example CFG
A grammar describes a language by generating each string of that language in the following manner:
1. Write down the start variable. It is the variable on the left-hand side of the top rule, unless specified otherwise.
2. Find a variable that is written down and a rule that starts with that variable. Replace the written-down variable with the right-hand side of that rule.
3. Repeat step 2 until no variables remain.

Example-1 CFG
Context-free grammar G1:
A → 0A1
A → B
B → #
Grammar G1 generates the string 000#111. The sequence of substitutions used to obtain a string is called a derivation. A derivation of the string 000#111 in grammar G1 is
A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
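The three-step generation procedure can be tried out mechanically. The following is a minimal Python sketch, not part of the original slides: the names RULES and derive are illustrative, and each entry of choices is assumed to select which alternative to apply to the leftmost variable.

```python
# Grammar G1 from the slides: A -> 0A1 | B, B -> #
# (RULES and derive are hypothetical names chosen for this sketch.)
RULES = {"A": ["0A1", "B"], "B": ["#"]}


def derive(start, choices):
    """Rewrite the leftmost variable once per entry in `choices`.

    choices[i] is the index of the alternative applied at step i.
    Returns the list of sentential forms, starting with `start`.
    """
    form = start
    steps = [form]
    for choice in choices:
        # Position of the leftmost variable in the current form.
        pos = next(i for i, ch in enumerate(form) if ch in RULES)
        body = RULES[form[pos]][choice]
        form = form[:pos] + body + form[pos + 1:]
        steps.append(form)
    return steps


# Apply A -> 0A1 three times, then A -> B, then B -> #.
steps = derive("A", [0, 0, 0, 1, 0])
print(" => ".join(steps))
```

Running it prints the same sequence of sentential forms as the derivation of 000#111 on the slide.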

Parse tree for 000#111 in G1

Example CFG
All strings generated in this way constitute the language of the grammar. We write L(G1) for the language of grammar G1. Inspecting grammar G1 shows that L(G1) is {0^n # 1^n | n ≥ 0}; the case n = 0 is the string # itself, obtained by A ⇒ B ⇒ #. Any language that can be generated by some context-free grammar is called a Context-Free Language (CFL).

FORMAL DEFINITION OF A CONTEXT-FREE GRAMMAR
A context-free grammar is a 4-tuple (V, Σ, R, S), where
1. V is a finite set called the variables,
2. Σ is a finite set, disjoint from V, called the terminals,
3. R is a finite set of rules, with each rule being a variable and a string of variables and terminals, and
4. S ∈ V is the start variable.

Example-1 CFG
Context-free grammar G1:
A → 0A1
A → B
B → #
In grammar G1, V = {A, B}, Σ = {0, 1, #}, S = A, and R is the collection of the three rules above.
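The 4-tuple view maps directly onto a small data structure. The sketch below is only one possible encoding, assuming single-string symbols; the class name CFG, its field names, and the check method are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar as a 4-tuple (V, Sigma, R, S)."""
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    rules: tuple           # R: pairs (head, body), body is a tuple of symbols
    start: str             # S

    def check(self):
        """Return True if the conditions of the formal definition hold."""
        if self.variables & self.terminals:      # Sigma must be disjoint from V
            return False
        if self.start not in self.variables:     # S must be a variable
            return False
        symbols = self.variables | self.terminals
        # Every head is a variable; every body uses only known symbols.
        return all(
            head in self.variables and all(s in symbols for s in body)
            for head, body in self.rules
        )


G1 = CFG(
    variables=frozenset({"A", "B"}),
    terminals=frozenset({"0", "1", "#"}),
    rules=(("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))),
    start="A",
)
print(G1.check())
```

An empty body tuple would stand for an ε-rule in this encoding.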

Example-2 CFG
Consider grammar G2 = ({S}, {a, b}, R, S). The set of rules, R, is
S → aSb | SS | ε
This grammar generates strings such as abab, aaabbb, and aababb. You can see more easily what this language is if you think of a as a left parenthesis "(" and b as a right parenthesis ")". Viewed in this way, L(G2) is the language of all strings of properly nested parentheses.
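The claim about L(G2) can be checked by brute force for short strings. The sketch below is an illustration under stated assumptions, not the slides' method: g2_language builds every string of length at most max_len obtainable from the three rules by iterating to a fixed point, and is_balanced is an independent nesting check with a as "(" and b as ")". Both function names are hypothetical.

```python
from itertools import product


def g2_language(max_len):
    """All strings of length <= max_len generated by S -> aSb | SS | ε."""
    lang = {""}                                  # S -> ε
    while True:
        new = set(lang)
        # S -> aSb
        new |= {"a" + u + "b" for u in lang if len(u) + 2 <= max_len}
        # S -> SS
        new |= {u + v for u in lang for v in lang if len(u) + len(v) <= max_len}
        if new == lang:                          # nothing new: fixed point
            return lang
        lang = new


def is_balanced(w):
    """True if w is properly nested, reading a as '(' and b as ')'."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "a" else -1
        if depth < 0:
            return False
    return depth == 0


generated = g2_language(6)
expected = {
    "".join(p)
    for n in range(7)
    for p in product("ab", repeat=n)
    if is_balanced("".join(p))
}
print(generated == expected, sorted(generated, key=len))
```

The comparison only covers strings up to the chosen length, so it supports the claim rather than proving it.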

Example-3 CFG
Consider grammar G3 = (V, Σ, R, <EXPR>). V is {<EXPR>, <TERM>, <FACTOR>} and Σ is {a, +, ×, (, )}. The rules are
<EXPR> → <EXPR> + <TERM> | <TERM>
<TERM> → <TERM> × <FACTOR> | <FACTOR>
<FACTOR> → ( <EXPR> ) | a

The two strings a+a×a and (a+a)×a can be generated with grammar G3.
[Figure: parse trees for a+a×a and (a+a)×a]

Example: Designing CFGs
Grammar for the language {0^n 1^n | n ≥ 0} ∪ {1^n 0^n | n ≥ 0}.
First construct the grammar
S1 → 0 S1 1 | ε
for the language {0^n 1^n | n ≥ 0} and the grammar
S2 → 1 S2 0 | ε
for the language {1^n 0^n | n ≥ 0}, and then add the rule S → S1 | S2.
Grammar:
S → S1 | S2
S1 → 0 S1 1 | ε
S2 → 1 S2 0 | ε

Ambiguous Grammar
Sometimes a grammar can generate the same string in several different ways. Such a string will have several different parse trees and thus several different meanings. In a programming language, it is important that a given program has a unique interpretation.

Ambiguous Grammar
If a grammar generates the same string in several different ways, then that string is derived ambiguously in that grammar. If a grammar generates some string ambiguously, we say that the grammar is ambiguous.

For example, consider the following grammar:
<EXPR> → <EXPR> + <EXPR> | <EXPR> × <EXPR> | ( <EXPR> ) | a
This grammar generates the string a+a×a ambiguously.
[Figure: the two different parse trees for a+a×a]
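One way to see the ambiguity concretely is to count parse trees. The following Python sketch is an illustration only: count_trees is a hypothetical helper, the multiplication sign is written as the plain character x, and the counting is specialised to this one grammar (it tries each operator position as the root of the tree).

```python
from functools import lru_cache


def count_trees(s):
    """Number of parse trees for s under E -> E+E | ExE | (E) | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving the substring s[i:j] from E.
        n = 0
        if j - i == 1 and s[i] == "a":                       # E -> a
            n += 1
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":   # E -> ( E )
            n += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                        # E -> E op E
            if s[k] in "+x":
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(s))


for text in ["a", "a+a", "a+axa", "(a+a)xa"]:
    print(text, count_trees(text))
```

A count of 2 for a+axa corresponds to the two groupings a+(axa) and (a+a)xa; the explicitly parenthesised string has only one tree.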

Ambiguous Grammar
A string w is derived ambiguously in context-free grammar G if it has two or more different leftmost derivations. Grammar G is ambiguous if it generates some string ambiguously.

Ambiguous Grammar
A grammar can have more than one parse tree generating a given string of terminals. Such a grammar is said to be ambiguous. To show that a grammar is ambiguous, find a terminal string that is the yield of more than one parse tree.

Ambiguity
E → E + E | E * E | ( E ) | id
The string id * id + id has two parse trees: one enforces the precedence of * over +, the other does not.
[Figure: the two parse trees for id * id + id]

Dealing with Ambiguity
The most direct way is to rewrite the grammar unambiguously.
Ambiguous grammar:
E → E + E | E * E | ( E ) | id
Rewritten grammar:
E → E' + E | E'
E' → id * E' | id | ( E ) * E' | ( E )
The rewritten grammar enforces the precedence of * over +.

Example
E → E' + E | E'
E' → id * E' | id | ( E ) * E' | ( E )
[Figure: parse trees for id + id * id and id * id + id under this grammar]

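Example: Another Ambiguous Grammar
S → x | y | z | S + S | S - S | S * S | S / S | ( S )
This grammar generates two parse trees for x + y * z.
Rewrite it as:
T → x | y | z | ( S )
S → S + T | S - T | S * T | S / T | T
The rewritten grammar is unambiguous and makes every operator left-associative. Note that it places all four operators at the same precedence level, so x + y * z is parsed as (x + y) * z; enforcing the precedence of * over + requires a separate nonterminal for each precedence level.
TRY DIFFERENT INPUTS AT HOME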

Context Free Grammar (CFG)
Example: Java if-else statement
if ( expression ) statement else statement
stmt → if ( expr ) stmt else stmt
The arrow may be read as "can have the form". Such a rule is called a production.

Context Free Grammar (CFG)
Example
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The three list productions can be grouped as
list → list + digit | list - digit | digit
The terminals are + - 0 1 2 3 4 5 6 7 8 9

Context Free Grammar (CFG)
Example: Function call
call → id ( optparams )
optparams → params | ε
params → params , param | param

Context Free Grammar (CFG)
Example
Operators on the same line have the same associativity and precedence:
left-associative: + -
left-associative: * /
Two non-terminals, expr and term, are used for the two levels of precedence, and a non-terminal factor generates the basic units in expressions.

Context Free Grammar (CFG)
factor → digit | ( expr )
The binary operators * and / have the highest precedence:
term → term * factor | term / factor | factor
Similarly,
expr → expr + term | expr - term | term
The resulting grammar is therefore
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
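To see how this layered grammar fixes precedence and associativity, here is a small recursive-descent evaluator in Python. It is a sketch under assumptions, not the slides' own parser: the function name evaluate is hypothetical, tokens are single characters, and the left-recursive rules are replaced by loops (a standard equivalent form, since a recursive-descent parser cannot call a left-recursive rule directly).

```python
def evaluate(text):
    """Evaluate text according to the expr / term / factor grammar."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():                     # factor -> digit | ( expr )
        tok = take()
        if tok in "0123456789":
            return int(tok)
        if tok == "(":
            value = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected {tok!r}")

    def term():                       # term -> term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):   # loop folds operands left to right
            if take() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def expr():                       # expr -> expr + term | expr - term | term
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result


for text in ["9-5-2", "9+5*2", "9*5+2", "(9+5)*2"]:
    print(text, "=", evaluate(text))
```

Because term is parsed inside expr, * and / bind their operands before + and - do, and the loops group repeated operators from the left, so 9-5-2 is read as (9-5)-2.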

Context Free Grammar (CFG)
Example: A grammar for a subset of Java statements
stmt → id = expression ;
     | if ( expression ) stmt
     | if ( expression ) stmt else stmt
     | while ( expression ) stmt
     | do stmt while ( expression ) ;
     | { stmts }
stmts → stmts stmt | ε

Context Free Grammar (CFG)
Example: Grammar for statement blocks and conditional statements
stmt → if expr then stmt else stmt
     | if stmt then stmt
     | begin stmtlist end
stmtlist → stmt ; stmtlist | stmt

Context Free Grammar (CFG)
Exercise
Consider the context-free grammar
S → S S + | S S * | a
(a) Show how the string aa+a* can be generated by this grammar.
(b) Construct a parse tree for this string.
(c) What language does this grammar generate? Justify your answer.

CFG vs. Regex
- CFGs are a more powerful notation than regexes.
- Every construct that can be described by a regex can also be described by a CFG, but not vice versa.
- Every regular language is a context-free language, but not vice versa.

CFG vs. Regex
Regex: (a|b)*abb
Grammar:
A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
Both describe the same language: the set of strings of a's and b's ending with abb.

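CFG vs. Regex
The language L = {a^n b^n | n ≥ 1} can be described by a grammar but not by a regex.
Suppose L were defined by some regex. Then we could construct a DFA with a finite number of states, say k, to accept L.
- For an input beginning with more than k a's, the DFA must enter some state s_i twice: once after reading a^i and again after reading a^j, with i < j. The path labeled a^(j-i) leads from s_i back to s_i.
- Since a^i b^i is in the language, there is a path labeled b^i from s_i to an accepting state f.
- But then the path labeled a^j b^i from the start state s_0 through s_i to f is also possible.
- This DFA accepts both a^i b^i and a^j b^i, and a^j b^i is not in L: a contradiction.
A DFA cannot count, i.e., it cannot keep track of the number of a's before it sees the b's.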

Context-free grammars are widely used for programming languages
From the definition of Algol-60:
procedure_identifier ::= identifier.
actual_parameter ::= string_literal | expression | array_identifier | switch_identifier | procedure_identifier.
letter_string ::= letter | letter_string letter.
parameter_delimiter ::= "," | ")" letter_string ":" "(".
actual_parameter_list ::= actual_parameter | actual_parameter_list parameter_delimiter actual_parameter.
actual_parameter_part ::= empty | "(" actual_parameter_list ")".
function_designator ::= procedure_identifier actual_parameter_part.

Example
adding_operator ::= "+" | "−".
multiplying_operator ::= "×" | "/" | "÷".
primary ::= unsigned_number | variable | function_designator | "(" arithmetic_expression ")".
factor ::= primary | factor | factor power primary.
term ::= factor | term multiplying_operator factor.
simple_arithmetic_expression ::= term | adding_operator term | simple_arithmetic_expression adding_operator term.
if_clause ::= if Boolean_expression then.
arithmetic_expression ::= simple_arithmetic_expression | if_clause simple_arithmetic_expression else arithmetic_expression.

if a < 0 then U+V else if a * b < 17 then U/V else if k <> y then V/U else 0

NFA To CFG Conversion
We can mechanically construct a CFG from an NFA, for example converting the NFA for (a|b)*abb into a CFG:
- For each state i of the NFA, create a non-terminal A_i.
- If i has a transition to j on input a, add the production A_i → a A_j.
- If i has a transition to j on input ε, add the production A_i → A_j.
- If i is an accepting state, add the production A_i → ε.
- If i is the start state, make A_i the start symbol of the grammar.
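The construction is short enough to write out directly. The Python sketch below follows the rules listed above; the function name nfa_to_cfg, the triple-based transition format, and the use of None for ε are assumptions made for this illustration, and the state numbering of the (a|b)*abb NFA is the usual four-state one (0 to 3, with 3 accepting).

```python
def nfa_to_cfg(transitions, start, accepting):
    """Build a grammar from an NFA.

    transitions: iterable of (i, symbol, j); symbol is None for an ε-move.
    Returns (start_symbol, productions), where each production is a pair
    (head, body) and body is a list of symbols ([] stands for ε).
    """
    productions = []
    for i, symbol, j in transitions:
        if symbol is None:
            productions.append((f"A{i}", [f"A{j}"]))          # A_i -> A_j
        else:
            productions.append((f"A{i}", [symbol, f"A{j}"]))  # A_i -> a A_j
    for i in sorted(accepting):
        productions.append((f"A{i}", []))                     # A_i -> ε
    return f"A{start}", productions


# NFA for (a|b)*abb: state 0 loops on a and b, then reads a, b, b.
nfa = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
start_symbol, productions = nfa_to_cfg(nfa, start=0, accepting={3})

print("start:", start_symbol)
for head, body in productions:
    print(head, "->", " ".join(body) if body else "ε")
```

The printed productions are A0 → aA0 | bA0 | aA1, A1 → bA2, A2 → bA3, A3 → ε, written one alternative per line.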

BNF: Meta-Syntax for CFGs
<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL> | <personal-part> <name-part>
<personal-part> ::= <first-name> | <initial> "."
<street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>
<zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
<opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""

Left-Most Derivation Parse Tree

Right-Most Derivation Parse Tree

Ambiguity
- There are no general techniques for handling ambiguity.
- In general, no algorithm can automatically convert an ambiguous grammar into an unambiguous one.
- If used sensibly, ambiguity can simplify the grammar.
- Disambiguation rules: instead of rewriting the grammar, we can use the ambiguous grammar along with disambiguation rules.

Associativity of Operators
The operator + associates to the left: an operand with + signs on both sides of it belongs to the operator to its left. In most programming languages the four arithmetic operators (addition, subtraction, multiplication, and division) are left-associative.

Right Associative Operator
The operator = associates to the right:
right → letter = right | letter
letter → a | b | ... | z
The parse tree for 9-5-2 grows down towards the left, whereas the parse tree for a=b=c grows down towards the right.

Precedence of Operators
Associativity rules for + and * apply to occurrences of the same operator.
Rule: * has higher precedence than + if * takes its operands before + does.
Multiplication and division have higher precedence than addition and subtraction: 9+5*2 and 9*5+2 are equivalent to 9+(5*2) and (9*5)+2.

Questions
CFG representing the regular expression a+:
A → aA | a
CFG representing the regular expression b*:
B → ε | bB

Questions
CFG representing the regular expression a*b+ (i.e., any number of a's followed by a non-zero number of b's):
S → R | aS
R → b | bR
CFG representing the regular expression ab+a (i.e., an a, followed by a non-zero number of b's, ending with an a):
S → aRa
R → b | bR

Questions
Every construct described by a regular expression can also be described by a CFG. Consider the following regular expression:
(a|b)*abb, where Σ = {a, b}
Create an equivalent CFG for the above regular expression:
A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
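The equivalence of this grammar and the regular expression can be spot-checked on all short strings. The Python sketch below is an illustration, not a proof: GRAMMAR and derives are hypothetical names, the encoding assumes every rule has the form "one terminal followed by one variable" or ε, and the comparison uses re.fullmatch from the standard library on strings of length up to 8.

```python
import re
from itertools import product

# Each alternative is (terminal, next_variable); ("", None) encodes an ε-rule.
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [("", None)],
}


def derives(var, w):
    """True if the string w can be derived from the variable var."""
    for terminal, nxt in GRAMMAR[var]:
        if nxt is None:
            if w == terminal:        # ε-rule: only the empty string matches
                return True
        elif w.startswith(terminal) and derives(nxt, w[len(terminal):]):
            return True
    return False


pattern = re.compile(r"(a|b)*abb")
mismatches = [
    w
    for n in range(9)
    for w in ("".join(p) for p in product("ab", repeat=n))
    if derives("A0", w) != bool(pattern.fullmatch(w))
]
print("mismatches:", mismatches)
```

An empty mismatch list means the grammar and the regex agree on every string over {a, b} up to the tested length.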