CSCI 1010 Models of Computation. Lecture 16: The Chomsky Language Hierarchy


Overview Definitions of phrase-structure, context-free and regular languages. Proof that the languages defined by regular grammars and the languages recognized by finite-state machines are the same. Parse trees for CFLs. Converting CFGs to Chomsky normal form.

Chomsky Hierarchy Four language types, each less expressive than the next, each with its own grammar rules: Regular, Context-Free, Context-Sensitive, Phrase Structure.

The Chomsky Hierarchy Phrase-structure languages are the most expressive and are recognized by Turing machines. Context-sensitive languages are recognized by linear-bounded automata, TMs whose space is bounded by O(|input|). Context-free languages are recognized by pushdown automata. Regular languages are recognized by FSMs.

Phrase Structure Languages Defined by grammars G = (N, T, R, S):
1. N = non-terminals, T = terminals, V = N ∪ T, and start symbol S ∈ N.
2. Rules R ⊆ V⁺ × V*, R finite. For each (a, b) ∈ R, a contains at least one non-terminal. Also, (a, a) ∈ R for all a ∈ N⁺.
3. If (a, b) ∈ R, we write a → b and say b is derived from a.
4. Let u ∈ V⁺ and let a be a substring of u with a → b. If v is obtained by replacing a by b in u, we write u ⇒_G v (immediate derivation of v from u).
5. If u ⇒_G x_1 ⇒_G x_2 ⇒_G ... ⇒_G x_n ⇒_G v, we write u ⇒*_G v. Here ⇒*_G is the transitive closure of ⇒_G.
6. The language defined by G is the set of terminal strings derived from S using the rules R: L(G) = {v ∈ T* | S ⇒*_G v}.

Context-Sensitive Languages Context-sensitive grammars are phrase-structure grammars in which each rule (a, b) ∈ R satisfies |a| ≤ |b|. Context-sensitive languages are generated by context-sensitive grammars.

Example G1 = (N1, T1, R1, S) is context-sensitive, where N1 = {S, B, C}, T1 = {a, b, c} and R1 consists of the rules (a) S → aSBC, (b) S → aBC, (c) CB → BC, (d) aB → ab, (e) bB → bb, (f) bC → bc, (g) cC → cc. Context is important! L(G1) contains aabbcc, which follows from rules (a), (b), (c), (d), (e), (f), and (g) applied in turn to produce a terminal string: S ⇒ aSBC ⇒ aaBCBC ⇒ aaBBCC ⇒ aabBCC ⇒ aabbCC ⇒ aabbcC ⇒ aabbcc. S ⇒* a^n (BC)^n is possible using (a) n−1 times and then (b). If (c) is not used to produce S ⇒* a^n B^n C^n, a substring CB occurs for which there is no rule. Thus, L(G1) = {a^n b^n c^n | n ≥ 1}.
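The seven-step derivation of aabbcc can be replayed mechanically. A sketch assuming the standard rules S → aSBC, S → aBC, CB → BC, aB → ab, bB → bb, bC → bc, cC → cc for {a^n b^n c^n}; `apply_first` is a hypothetical helper:

```python
def apply_first(u, lhs, rhs):
    """Apply a rule at the leftmost occurrence of lhs in u."""
    i = u.index(lhs)          # raises ValueError if the rule cannot fire
    return u[:i] + rhs + u[i + len(lhs):]

w = "S"
for lhs, rhs in [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
                 ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]:
    w = apply_first(w, lhs, rhs)
print(w)  # -> aabbcc
```

Each rule fires exactly once, in the order (a) through (g), reproducing the derivation above.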

Context-Free Languages A context-free grammar is a phrase-structure grammar G = (N, T, R, S) in which each rule has a single non-terminal on the left. Context-free grammars generate context-free languages. Example: N2 = {S}, T2 = {a, b}, R2 = {S → aSb, S → ε}. Then G2 = (N2, T2, R2, S) is context-free and L(G2) = {a^n b^n | n ≥ 0}. To see this, apply S → aSb n times to give S ⇒* a^n S b^n, after which apply S → ε.
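The two-phase derivation just described can be spelled out step by step; a small sketch (`derivation` is a hypothetical name):

```python
def derivation(n):
    """Sentential forms of S =>* a^n b^n: apply S -> aSb n times, then S -> epsilon."""
    forms, w = ["S"], "S"
    for _ in range(n):
        w = w.replace("S", "aSb", 1)
        forms.append(w)
    forms.append(w.replace("S", "", 1))
    return forms

print(derivation(2))  # -> ['S', 'aSb', 'aaSbb', 'aabb']
```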

Context-Free Languages Context-free grammars are widely used to parse a large portion of programming languages. They need to be augmented with semantic analysis because such languages are not context-free. For example, in the statement name1 = name2; name2 could be either a function or a variable depending on context. A parse tree would be augmented with this type of information.

Regular Languages A regular grammar is a context-free grammar G = (N, T, R, S) in which the right-hand side of each rule is either a terminal or a terminal followed by a non-terminal. That is, rules are of the form A → bC or A → a. Regular languages are generated by regular grammars.

Regular Languages Example: Let N4 = {S, A, B}, T4 = {0, 1}, with rules R4 equivalent to S → 0, S → 01B, B → 01B, B → 0. Thus, the original and new grammars both generate the language L(G4) = (01)*0. We now give an FSM that recognizes L(G4).
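One way to convince yourself that these rules generate (01)*0 is to enumerate all short strings the grammar derives and check each against the regular expression. A sketch (`generate` is a hypothetical name) using the rules S → 0, S → 01B, B → 01B, B → 0:

```python
import re

def generate(max_len):
    """All strings of L(G4) of length <= max_len, by expanding non-terminals."""
    words, frontier = set(), ["S"]
    rules = {"S": ["0", "01B"], "B": ["0", "01B"]}
    while frontier:
        u = frontier.pop()
        nts = [s for s in u if s in rules]
        if not nts:
            if len(u) <= max_len:
                words.add(u)
            continue
        for rhs in rules[nts[0]]:
            v = u.replace(nts[0], rhs, 1)
            if len(v) <= max_len + 1:   # one char of slack for the non-terminal
                frontier.append(v)
    return words

assert all(re.fullmatch(r"(01)*0", w) for w in generate(7))
print(sorted(generate(7)))  # -> ['0', '010', '01010', '0101010']
```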

Recognizing Regular Languages Theorem: The regular languages and the languages recognized by FSMs are the same. Proof: If G is regular, L(G) is recognized by an FSM. Replace each rule A → a by the two rules A → aF and F → ε, where F is a new non-terminal. Construct a state for each non-terminal. Insert an edge from state A to state B with label a for each rule A → aB. Make state A final if A → ε. This FSM accepts exactly the strings w such that S ⇒* wB where B → ε; that is, it recognizes L(G). The FSM may be nondeterministic.
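The construction in this direction of the proof can be sketched directly. Below is a hedged sketch using single-character symbols; for the test grammar it uses S → 0, S → 0A, A → 1S, an equivalent strictly regular grammar for (01)*0 chosen here because the two-terminal rule S → 01B is not of the form A → bC:

```python
def grammar_to_nfa(rules, start="S"):
    """Rules are (A, rhs) with rhs == '' (epsilon), 'a' (terminal), or 'aB'."""
    edges, final = {}, set()
    for A, rhs in rules:
        if rhs == "":                          # A -> epsilon: A is a final state
            final.add(A)
        elif len(rhs) == 1:                    # A -> a: route to a fresh final state
            edges.setdefault((A, rhs), set()).add("FINAL")
            final.add("FINAL")
        else:                                  # A -> aB: edge A --a--> B
            edges.setdefault((A, rhs[0]), set()).add(rhs[1])
    return edges, start, final

def accepts(nfa, w):
    """Simulate the NFA by tracking the set of reachable states."""
    edges, start, final = nfa
    states = {start}
    for ch in w:
        states = set().union(*[edges.get((q, ch), set()) for q in states])
    return bool(states & final)

nfa = grammar_to_nfa([("S", "0"), ("S", "0A"), ("A", "1S")])
print([accepts(nfa, w) for w in ["0", "010", "01", ""]])  # -> [True, True, False, False]
```

Tracking a set of states sidesteps the nondeterminism noted in the proof: the two S-rules both consume a 0, so state S has two outgoing 0-edges.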

Recognizing Regular Languages

Recognizing Regular Languages Proof (cont.) Given an FSM M, there is a regular grammar G generating the language recognized by M. Let G have one non-terminal q_i for each state of M and one rule q_i → a q_j for each edge labeled a from q_i to q_j. Add the rule q_i → ε if q_i is a final state. The initial state q_0 is associated with the start symbol S of G. The set of strings w that take M from the initial state to a final state is the same as the set of strings generated by G, namely those with S ⇒* wB where B → ε.

Parse Trees for CFLs Example: G3 = (N3, T3, R3, S) with rules (a) S → cMNc, (b) M → aMa, (c) M → c, (d) N → bNb, (e) N → c. A derivation of caacaabcbc and its parse tree: S ⇒ cMNc ⇒ caMaNc ⇒ caaMaaNc ⇒ caacaaNc ⇒ caacaabNbc ⇒ caacaabcbc.

Parse Trees The yield of a tree is the string of characters at its leaves. The height of a parse tree is the length of its longest path. In a leftmost derivation, rules are invoked in depth-first, left-to-right order. A rightmost derivation is similar, using right-to-left order.

Context-Free Languages (CFLs) Recall: A context-free grammar (CFG) is a phrase-structure grammar G = (N, T, R, S) in which each rule has a single non-terminal on the left. CFLs are generated by context-free grammars. Example: Let N2 = {S}, T2 = {a, b}, R2 = {S → aSb, S → ε}. Then G2 = (N2, T2, R2, S) is context-free.

Chomsky Normal Form A CFG G = (N, T, R, S) is in Chomsky normal form if every rule is of the form A → BC or A → b with b ∈ T, except that if ε ∈ L(G), then S → ε is also a rule. Theorem: Every CFL L can be generated by a CFG in Chomsky normal form.
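The definition translates into a quick sanity check. A sketch under the convention that uppercase characters are non-terminals and lowercase are terminals (`is_cnf` is a hypothetical name):

```python
def is_cnf(rules, start="S"):
    """True if every rule is A -> BC, A -> b, or (start symbol only) S -> epsilon."""
    def ok(A, rhs):
        return ((len(rhs) == 1 and rhs.islower()) or
                (len(rhs) == 2 and rhs.isupper()) or
                (rhs == "" and A == start))
    return all(ok(A, rhs) for A, rhs in rules)

print(is_cnf([("S", "AB"), ("A", "a"), ("B", "b"), ("S", "")]))  # -> True
print(is_cnf([("S", "aSb"), ("S", "")]))                         # -> False
```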

Chomsky Normal Form Example Example: G3 = (N3, T3, R3, S). A Chomsky normal form grammar generating this language keeps rules (c) and (e) and replaces the others by: (a) S → CD, C → c, D → ME, E → NC; (b) M → AF, A → a, F → MA; (d) N → BG, B → b, G → NB.

Converting to Chomsky Normal Form Theorem: Every CFL L can be generated by a CFG in Chomsky normal form. Proof: If ε ∈ L, add S → ε. Let L be generated by G. Convert G to G′ in Chomsky normal form in stages. a) Eliminate from G the ε-rules of the form B → ε (except for S → ε) as follows: for each rule with at least one B on the right-hand side, e.g. A → αBβBγ (α, β, γ are strings), add all rules formed by replacing occurrences of B by ε in all possible ways, e.g. A → αβBγ, A → αBβγ, A → αβγ, giving four rules in place of the one original rule.
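Step a) can be sketched as a single pass over the rules (a hedged sketch; fully eliminating ε-rules in general requires iterating until no new nullable non-terminals appear). Here x, y, z are hypothetical placeholder symbols standing for the strings α, β, γ:

```python
from itertools import combinations

def eliminate_epsilon_once(rules, start="S"):
    """One pass of step a): for each nullable B, add every variant of each rule
    with some occurrences of B erased, then drop B -> epsilon (keep S -> epsilon)."""
    nullable = {A for A, rhs in rules if rhs == ""}
    new = set()
    for A, rhs in rules:
        spots = [i for i, s in enumerate(rhs) if s in nullable]
        for r in range(len(spots) + 1):
            for drop in combinations(spots, r):
                new.add((A, "".join(s for i, s in enumerate(rhs) if i not in drop)))
    return {(A, rhs) for A, rhs in new if rhs != "" or A == start}

# The slide's example: A -> xByBz with B nullable yields four rules.
out = eliminate_epsilon_once([("B", ""), ("A", "xByBz")])
print(sorted(out))  # -> [('A', 'xByBz'), ('A', 'xByz'), ('A', 'xyBz'), ('A', 'xyz')]
```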

Converting to Chomsky Normal Form Proof (cont.) b) For each rule A → α w_i β (α, β are strings) with w_i ∈ T, replace it by A → α Z_i β and add the rule Z_i → w_i, where Z_i is a new non-terminal. Continue until every rule has either a single terminal or a string of non-terminals on the right. This new grammar also generates L.

Converting to Chomsky Normal Form Proof (cont.) The rules are now of the form: a) A → b for b ∈ T, b) S → ε, c) A → Z_1 Z_2 ... Z_k for Z_i ∈ N. Consider rules of type c) with k = 1. Cascading such rules gives derivations A ⇒* B; delete all rules of type c) with k = 1, replacing them with A → B whenever A ⇒* B by such a cascade. The same language is generated.

Converting to Chomsky Normal Form Proof (cont.) If C ⇒* D and D → b, add C → b; then delete all rules of the form A → B. This generates the same language; all remaining rules are of the form S → ε, A → b, or A → Z_1 Z_2 ... Z_k with k ≥ 2, Z_i ∈ N. Now replace each rule of the form A → Z_1 Z_2 ... Z_k by the rules A → Z_1 N_1, N_1 → Z_2 N_2, ..., N_{k-3} → Z_{k-2} N_{k-2}, N_{k-2} → Z_{k-1} Z_k, where each N_i is a new non-terminal. The new grammar is in the correct form and generates L. Q.E.D.
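The last step, splitting a long rule into a chain of binary rules, can be sketched as follows (`binarize` and the fresh-name generator are hypothetical names):

```python
from itertools import count

def binarize(A, body, fresh):
    """Replace A -> Z1 Z2 ... Zk (k >= 3) by a chain of binary rules."""
    if len(body) <= 2:
        return [(A, tuple(body))]
    out, head = [], A
    for z in body[:-2]:
        n = next(fresh)                 # new non-terminal N_i
        out.append((head, (z, n)))
        head = n
    out.append((head, (body[-2], body[-1])))
    return out

fresh = (f"N{i}" for i in count(1))
print(binarize("A", ["Z1", "Z2", "Z3", "Z4"], fresh))
# -> [('A', ('Z1', 'N1')), ('N1', ('Z2', 'N2')), ('N2', ('Z3', 'Z4'))]
```

For k = 4 this yields exactly the chain A → Z_1 N_1, N_1 → Z_2 N_2, N_2 → Z_3 Z_4 from the proof.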

Example Let G = (N, T, R, E) be the grammar with N = {E, T, F}, T = {a, b, +, *, (, )}, where E, T, F denote expressions, terms and factors, and R has the rules E → E+T, E → T, T → T*F, T → F, F → (E), F → a, F → b. It is easy to show that E ⇒* (a*b+a)*(a+b) and E ⇒* a*b+a. This grammar has no ε-rules. Following step b), introduce new non-terminals standing for the terminals *, (, ), and +.
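The claim E ⇒* a*b+a can be checked by replaying a leftmost derivation, assuming the classic rules E → E+T | T, T → T*F | F, F → (E) | a | b (`apply_leftmost` is a hypothetical helper):

```python
def apply_leftmost(u, lhs, rhs):
    """Rewrite the leftmost occurrence of the single-character non-terminal lhs."""
    i = u.index(lhs)
    return u[:i] + rhs + u[i + 1:]

w = "E"
for lhs, rhs in [("E", "E+T"), ("E", "T"), ("T", "T*F"), ("T", "F"),
                 ("F", "a"), ("F", "b"), ("T", "F"), ("F", "a")]:
    w = apply_leftmost(w, lhs, rhs)
print(w)  # -> a*b+a
```

At every step the non-terminal rewritten is the leftmost one in the sentential form, so this is a genuine leftmost derivation.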

Example Transform the rules as indicated until only non-terminals appear on the right-hand sides. Then reduce the number of non-terminals on the right to two.

Example The grammar is now in Chomsky normal form.

Summary Definitions of phrase-structure, context-free and regular languages. Proof that the languages defined by regular grammars and the languages recognized by finite-state machines are the same. Parse trees for CFLs. Converting CFGs to Chomsky normal form.