CSCI-GA Compiler Construction Lecture 6: Syntax Analysis. Mohamed Zahran (aka Z)


CSCI-GA.2130-001 Compiler Construction Lecture 6: Syntax Analysis Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Context-Free Grammars
Give a precise syntactic specification of a programming language.
For some classes of grammars, an efficient parser can be constructed automatically.
Allow a language to evolve.

The Parser
Three general types of parsers:
Universal parsing methods: can parse any grammar, but are too inefficient to use in production compilers.
Top-down methods: parse trees are built from the root to the leaves. The input to the parser is scanned from left to right, one symbol at a time.
Bottom-up methods: start from the leaves and work their way up to the root. The input to the parser is scanned from left to right, one symbol at a time.

Dealing With Errors
If a compiler had to process only correct programs, its design and implementation would be greatly simplified!
Few languages have been designed with error handling in mind, so error handling is left to the compiler designer.
Bugs account for about 50% of the total cost, the same share as 50 years ago!

Common Programming Errors
Lexical errors: misspellings of identifiers, keywords, or operators.
Syntactic errors: misplaced semicolons, extra or missing braces, a case without an enclosing switch, ...
Semantic errors: type mismatches between operators and operands.
Logical errors: anything else!

Wish List
Report the presence of errors clearly and accurately.
Recover from each error quickly enough to detect subsequent errors.
Add minimal overhead to the processing of correct programs.
Easier said than done!

Error-Recovery Strategies
Simplest: quit with an informative error message upon detecting the first error.
Panic-mode recovery: discard input symbols one at a time until one of a designated set of synchronizing tokens is found.
Phrase-level recovery: perform a local correction on the remaining input. The choice of local correction is left to the compiler designer.
Error productions: add production rules for common errors.

Context-Free Grammar Terminals (token name) Example: Nonterminals Start Symbol Productions

Derivations
Start with the start symbol.
At each step, a nonterminal is replaced with the body of one of its productions.
Example: deriving -(id + id)
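The derivation steps can be replayed mechanically. The sketch below assumes the usual textbook expression grammar E -> E + E | E * E | - E | ( E ) | id, which the transcript does not show; the grammar and the helper name derive_leftmost are assumptions made for illustration only.

```python
# Assumed grammar (not shown in the transcript):
#   E -> E + E | E * E | - E | ( E ) | id
def derive_leftmost(start, bodies, nonterminals):
    """Apply each production body, in order, to the leftmost nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for body in bodies:
        # Locate the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in nonterminals)
        form[i:i + 1] = body
        steps.append(" ".join(form))
    return steps

steps = derive_leftmost(
    "E",
    [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]],
    {"E"},
)
print(" => ".join(steps))
```

Each list passed in is the body chosen at that step, so the same helper can replay any leftmost derivation once the production choices are known.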

More on Derivations
⇒ means "derives in one step".
⇒* means "derives in zero or more steps".
⇒+ means "derives in one or more steps".
Leftmost derivations: the leftmost nonterminal in each sentential form is always chosen.
Rightmost derivations: the rightmost nonterminal in each sentential form is always chosen.

For the context-free grammar: Example

Parse Trees
What is the relationship between a parse tree and derivations?
A parse tree is a graphical representation of a derivation.
It filters out the order in which nonterminals are replaced.
There is a many-to-one relationship between derivations and parse trees.

Context-Free Grammars vs. Regular Expressions
Grammars are a more powerful notation than regular expressions.
Every construct that can be described by a regular expression can be described by a grammar, but not vice versa.
Regular expression -> NFA, then:

(a|b)*abb
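One standard way to turn an NFA into a grammar is to create a nonterminal A_i for each state i, a production A_i -> a A_j for each transition from i to j on a, and A_i -> ε for each accepting state. The sketch below applies that construction to a four-state NFA for (a|b)*abb; the state numbering and the function names are assumptions, since the transcript does not show the slide's NFA.

```python
# NFA for (a|b)*abb: state 0 is the start state, state 3 is accepting.
nfa = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
accepting = {3}

def nfa_to_grammar(nfa, accepting):
    """Return productions as {nonterminal: [body, ...]} with bodies as tuples."""
    grammar = {}
    for (state, symbol), targets in nfa.items():
        for target in sorted(targets):
            # Transition i --a--> j becomes A_i -> a A_j.
            grammar.setdefault(f"A{state}", []).append((symbol, f"A{target}"))
    for state in accepting:
        # An accepting state i contributes A_i -> epsilon (empty body).
        grammar.setdefault(f"A{state}", []).append(())
    return grammar

def generates(grammar, nonterminal, text):
    """Check whether `nonterminal` derives `text` in a right-linear grammar."""
    for body in grammar.get(nonterminal, []):
        if not body:
            if text == "":
                return True
        elif text[:1] == body[0] and generates(grammar, body[1], text[1:]):
            return True
    return False

grammar = nfa_to_grammar(nfa, accepting)
for head, bodies in grammar.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

The resulting grammar is right-linear, which is why the simple recursive check in generates is enough; a general context-free grammar would need a real parser.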

Question Worth Asking
If grammars are so much more powerful than regular expressions, why not use them in lexical analysis too?
Lexical rules are quite simple and do not need a notation as powerful as grammars.
Regular expressions are more concise and easier to understand for tokens.
More efficient lexical analyzers can be generated from regular expressions than from grammars.

How Can We Enhance Our Grammar? Eliminating ambiguity Eliminating left-recursion Left factoring

Eliminating Ambiguity
Sometimes we can rewrite the grammar to eliminate ambiguity.

Eliminating Left-Recursion How about something like:
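As a minimal sketch of the usual transformation for immediate left recursion, productions A -> A α | β can be rewritten as A -> β A' and A' -> α A' | ε. The function name, the tuple representation of bodies, and the sample grammar E -> E + T | T are assumptions chosen for illustration; the sketch does not handle indirect left recursion or the degenerate production A -> A.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | epsilon.

    Bodies are tuples of symbols; the empty tuple stands for epsilon.
    """
    # Split alternatives into left-recursive ones (keeping only alpha) and the rest.
    recursive = [body[1:] for body in bodies if body[:1] == (head,)]
    others = [body for body in bodies if body[:1] != (head,)]
    if not recursive:
        return {head: list(bodies)}
    new_head = head + "'"  # assumed not to clash with an existing nonterminal
    return {
        head: [body + (new_head,) for body in others],
        new_head: [alpha + (new_head,) for alpha in recursive] + [()],
    }

result = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
print(result)
```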

Left-Factoring
A way of delaying the decision until more information is available.
Example:
stmt -> EXP else stmt | EXP
EXP -> if expr then stmt
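One round of left factoring can be sketched as follows: alternatives that start with the same symbol are replaced by their common prefix followed by a fresh nonterminal, and the differing suffixes move to that new nonterminal. The function names and the primed naming scheme are assumptions for illustration.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared at the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)

def left_factor(head, bodies):
    """One round of left factoring for the alternatives of `head`."""
    groups = {}
    for body in bodies:
        # Group alternatives by their first symbol (empty for epsilon bodies).
        groups.setdefault(body[:1], []).append(body)
    result = {head: []}
    for first, group in groups.items():
        if len(group) == 1 or not first:
            result[head].extend(group)
            continue
        prefix = common_prefix(group)
        new_head = head + "'" * len(result)  # hypothetical fresh name
        result[head].append(prefix + (new_head,))
        # Suffixes after the shared prefix; the empty tuple stands for epsilon.
        result[new_head] = [body[len(prefix):] for body in group]
    return result

factored = left_factor("stmt", [("EXP", "else", "stmt"), ("EXP",)])
print(factored)
```

For the example above this yields stmt -> EXP stmt' and stmt' -> else stmt | ε, so the parser only has to decide about the else part after EXP has been read.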

Top-Down Parsing
Constructs a parse tree for an input string starting from the root.
The parse tree is built in preorder (depth-first).
Equivalent to finding a leftmost derivation.
At each step of a top-down parse: determine the production to be applied, then match the terminal symbols in the production body with the input string.

Given: and:

Recursive-Descent Parsing How?

Example of Backtracking and input
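A recursive-descent parser with backtracking can be sketched with generators: each nonterminal tries its alternatives in order, and a later failure resumes the search at the next alternative. The grammar S -> c A d, A -> a b | a and the input cad used below are assumptions for illustration, since the transcript does not show the slide's grammar or input.

```python
# Assumed grammar: S -> c A d,  A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def parse(symbol, text, pos):
    """Yield every input position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:
        # Terminal: it must match the next input character.
        if text[pos:pos + 1] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:
        # Try the alternatives of the nonterminal in order.
        yield from match_body(body, text, pos)

def match_body(body, text, pos):
    """Match the symbols of one production body from left to right."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], text, pos):
        # If the rest of the body fails, the loop backtracks to the next `mid`.
        yield from match_body(body[1:], text, mid)

def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))

print(accepts("cad"))
```

On cad the parser first tries A -> a b, fails on b, and backtracks to A -> a. A left-recursive grammar would make this sketch recurse forever, which is one reason left recursion is eliminated before top-down parsing.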

Important Concepts: FIRST and FOLLOW

Example
FIRST      FOLLOW
( id       ) $
+ ε        ) $
( id       + ) $
* ε        + ) $
( id       * + ) $
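The five rows of FIRST and FOLLOW sets above are consistent with the standard textbook LL(1) expression grammar E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id. That grammar is an assumption here, because the transcript does not show it. Under that assumption, the sets can be computed by iterating to a fixed point, as in this sketch:

```python
EPS = "ε"
# Assumed grammar; an empty list is an epsilon body.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST sets of nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish, or the string is empty
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")  # the end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # whatever follows head also follows sym
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```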

LL(1) Grammars
For recursive-descent parsers with no backtracking.
First L = scan the input from left to right.
Second L = produce a leftmost derivation.
1 = one symbol of lookahead.
An LL(1) grammar cannot be left-recursive or ambiguous.
If A -> F | T are two distinct productions:
FIRST(F) and FIRST(T) are disjoint;
if ε is in FIRST(T), then FIRST(F) and FOLLOW(A) are disjoint, and likewise when ε is in FIRST(F).

Parsing Table

Parsing Table
A two-dimensional array.
Rows: nonterminals.
Columns: input symbols.
M[A,a], where A is a nonterminal and a is a terminal or $, gives the production rule to use.
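A table-driven predictive parser looks up M[A,a] with the nonterminal on top of the stack and the current input symbol. The sketch below hard-codes a table for an assumed grammar (E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id), which the transcript does not show; the table contents and function name are illustrative assumptions.

```python
# M[A, a] -> production body (tuple); the empty tuple is an epsilon body.
TABLE = {
    ("E", "id"): ("T", "E'"),      ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"),      ("T", "("): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): (), ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",),          ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens):
    """Return the productions used (a leftmost derivation) or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]  # start symbol on top of the end marker
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {look}]")
            output.append((top, body))
            stack.extend(reversed(body))  # leftmost body symbol ends up on top
        elif top == look:
            pos += 1  # matched a terminal (or the end marker)
        else:
            raise SyntaxError(f"expected {top}, got {look}")
    return output

productions = predictive_parse(["id", "+", "id", "*", "id"])
for head, body in productions:
    print(head, "->", " ".join(body) or "ε")
```

Because every table cell holds at most one production, the parser never backtracks; an empty cell is reported as a syntax error.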

FIRST      FOLLOW
( id       ) $
+ ε        ) $
( id       + ) $
* ε        + ) $
( id       * + ) $

Exercise
For the following productions: S -> +SS | *SS | a
Write a predictive parser.
Write the parsing table.
Show how to parse: +*aaa

Bottom-Up Parsing
Given a string of terminals, build the parse tree starting from the leaves and working up toward the root.
This traces a rightmost derivation in reverse.
Used for a class of grammars called LR.
LR parsers are difficult to build by hand, so we use automatic parser generators for LR grammars.

Given: and the string:

Shift-Reduce Parsing
A form of bottom-up parsing.
Consists of:
Stack: holds grammar symbols.
Input buffer: holds the rest of the string to be parsed.
The handle always appears on the top of the stack.
Initial position: Final position (success)
Actions: shift, reduce, accept, error
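The stack and input buffer can be simulated in a few lines. The sketch below assumes the small grammar S -> 0 S 1 | 0 1 and uses a greedy rule (reduce whenever a production body sits on top of the stack, otherwise shift). That rule happens to work for this grammar but is an assumption of the sketch; a real LR parser consults a parsing table to choose between shifting and reducing.

```python
# Assumed grammar: S -> 0 S 1 | 0 1
PRODUCTIONS = [("S", ("0", "S", "1")), ("S", ("0", "1"))]

def shift_reduce(tokens):
    """Return the list of (action, stack, remaining input) configurations."""
    stack, rest, trace = [], list(tokens), []
    while True:
        for head, body in PRODUCTIONS:
            # Reduce if the top of the stack spells out a production body.
            if tuple(stack[-len(body):]) == body:
                del stack[-len(body):]
                stack.append(head)
                action = f"reduce {head} -> {' '.join(body)}"
                trace.append((action, list(stack), list(rest)))
                break
        else:
            if rest:
                # Nothing to reduce: shift the next input symbol.
                stack.append(rest.pop(0))
                trace.append(("shift", list(stack), list(rest)))
            elif stack == ["S"]:
                trace.append(("accept", list(stack), []))
                return trace
            else:
                raise SyntaxError(f"cannot reduce stack {stack}")

trace = shift_reduce("000111")
for action, stack, rest in trace:
    print(f"{' '.join(stack):<10} {''.join(rest):<8} {action}")
```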

Exercise
Let's apply shift-reduce parsing to the following input: 00S11
and the following productions: S -> 0S1 | 01

So
Skim: 4.2.6, 4.3.5, 4.4.4, 4.4.5
Read: the rest of 4.1 to 4.5