Context-sensitive languages


Informatics 2A: Lecture 28
Alex Simpson
School of Informatics, University of Edinburgh
als@inf.ed.ac.uk
22 November 2012

Outline

1. Showing a language isn't context-free
2. Context-sensitive grammars and languages
3. Noncontracting grammars
4. Context-sensitivity in natural and programming languages

Non-context-free languages

We saw in Lecture 8 that the pumping lemma can be used to show a language isn't regular. There's also a context-free version of this lemma, which can be used to show that a language isn't even context-free:

Pumping Lemma for context-free languages. Suppose L is a context-free language. Then L has the following property.

(P) There exists k ≥ 0 such that every z ∈ L with |z| ≥ k can be broken up into five substrings, z = uvwxy, such that |vx| ≥ 1, |vwx| ≤ k and uv^i w x^i y ∈ L for all i ≥ 0.

Context-free pumping lemma: the idea

In the regular case, the key point is that any sufficiently long string will visit the same state twice. In the context-free case, we note that any sufficiently large syntax tree will have a downward path that visits the same nonterminal twice. We can then pump in extra copies of the relevant subtree and remain within the language.

(Diagram: a syntax tree in which a nonterminal P occurs twice on one downward path, alongside the pumped tree obtained by duplicating the subtree between the two occurrences of P.)

Context-free pumping lemma: continued

More precisely, suppose L has a CFG with m nonterminals. Then take k so large that the syntax tree for any string of length ≥ k must contain a path of length > m. Such a path is guaranteed to visit the same nonterminal twice.

To show that a language L is not context-free, we just need to prove that it satisfies the negation (¬P) of the property (P):

(¬P) For every k ≥ 0, there exists z ∈ L with |z| ≥ k such that, for every decomposition z = uvwxy with |vx| ≥ 1 and |vwx| ≤ k, there exists i ≥ 0 such that uv^i w x^i y ∉ L.

Standard example 1

The language L = {a^n b^n c^n | n ≥ 0} isn't context-free! We prove that (¬P) holds for L:

Suppose k ≥ 0. We choose z = a^k b^k c^k. Then indeed z ∈ L and |z| ≥ k. Suppose we have a decomposition z = uvwxy with |vx| ≥ 1 and |vwx| ≤ k. Since |vwx| ≤ k, the string vwx contains at most two different letters. So there must be some letter d ∈ {a, b, c} that does not occur in vwx. But then uwy ∉ L, because at least one letter different from d now occurs fewer than k times, whereas d still occurs exactly k times. We have shown that (¬P) holds with i = 0.
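The argument above can be checked mechanically for a fixed k. The sketch below (not part of the lecture; all names are my own) enumerates every legal decomposition of z = a^k b^k c^k and confirms that pumping down (the i = 0 case) always leaves L:

```python
from itertools import combinations_with_replacement

def in_L(s):
    """Membership test for L = {a^n b^n c^n | n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def pumping_down_always_leaves_L(k):
    """For z = a^k b^k c^k, check that every decomposition z = uvwxy
    with |vx| >= 1 and |vwx| <= k gives uwy outside L (the i = 0 case)."""
    z = "a" * k + "b" * k + "c" * k
    # Cut points i1 <= i2 <= i3 <= i4 give u = z[:i1], v = z[i1:i2],
    # w = z[i2:i3], x = z[i3:i4], y = z[i4:].
    for i1, i2, i3, i4 in combinations_with_replacement(range(len(z) + 1), 4):
        if (i2 - i1) + (i4 - i3) < 1 or i4 - i1 > k:
            continue  # violates |vx| >= 1 or |vwx| <= k
        if in_L(z[:i1] + z[i2:i3] + z[i4:]):
            return False  # some uwy stayed in L -- the proof would fail
    return True

print(pumping_down_always_leaves_L(4))  # True
```

Of course, checking one k is only an illustration; the proof above handles all k at once.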

Standard example 2

The language L = {ss | s ∈ {a, b}*} isn't context-free! We prove that (¬P) holds for L:

Suppose k ≥ 0. We choose z = a^k b a^k b a^k b a^k b. Then indeed z ∈ L and |z| ≥ k. Suppose we have a decomposition z = uvwxy with |vx| ≥ 1 and |vwx| ≤ k. Since |vwx| ≤ k, the string vwx contains at most one b. There are two main cases:

- vx contains a b, in which case uwy contains exactly 3 b's.
- Otherwise uwy has the form a^g b a^h b a^i b a^j b, where either exactly two adjacent numbers among g, h, i, j are < k (this happens if w contains a b and both v and x are nonempty), or exactly one of g, h, i, j is < k (this happens if w contains a b and one of v, x is empty, or if vwx does not contain a b).

In each case, we have uwy ∉ L. So (¬P) holds with i = 0.
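The same brute-force check works here. The sketch below (again my own code, not from the lecture) generalizes the decomposition search over an arbitrary membership test and applies it to {ss | s ∈ {a, b}*}:

```python
from itertools import combinations_with_replacement

def in_ss(s):
    """Membership test for {ss | s in {a, b}*}: even length, equal halves."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]

def pumping_down_always_leaves(z, k, member):
    """Check that every z = uvwxy with |vx| >= 1 and |vwx| <= k gives
    uwy outside the language (the i = 0 case of (not-P))."""
    for i1, i2, i3, i4 in combinations_with_replacement(range(len(z) + 1), 4):
        if (i2 - i1) + (i4 - i3) < 1 or i4 - i1 > k:
            continue  # not a legal decomposition
        if member(z[:i1] + z[i2:i3] + z[i4:]):
            return False
    return True

k = 3
z = ("a" * k + "b") * 4  # a^k b a^k b a^k b a^k b
print(pumping_down_always_leaves(z, k, in_ss))  # True
```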

Complementation

Consider the language L defined by:

L = {a, b}* \ {ss | s ∈ {a, b}*}

This is context-free. (Exercise!) The complement of L is

{a, b}* \ L = {a, b}* \ ({a, b}* \ {ss | s ∈ {a, b}*}) = {ss | s ∈ {a, b}*}

Thus the complement of a context-free language is not necessarily context-free: context-free languages are not closed under complement.

Clicker question

What method would you use to show that the language {a, b}* \ {ss | s ∈ {a, b}*} is context-free?

1. Construct an NFA for it.
2. Find a regular expression for it.
3. Build a CFG for it.
4. Construct a PDA for it.
5. Apply the context-free pumping lemma.

Context-sensitive grammars

A context-sensitive grammar has productions of the form

αXγ → αβγ

where X is a nonterminal and α, β, γ are sequences of terminals and nonterminals (i.e., α, β, γ ∈ (N ∪ Σ)*), with the requirement that β is nonempty. So the rules for expanding X can be sensitive to the context in which the X occurs (in contrast with context-free rules).

Minor wrinkle: the nonemptiness restriction on β disallows rules with right-hand side ε. To remedy this, we also permit the special rule

S → ε

where S is the start symbol, with the restriction that this rule is only allowed to occur if the nonterminal S does not appear on the right-hand side of any production.

Context-sensitive languages

A language is context-sensitive if it can be generated by a context-sensitive grammar. The non-context-free languages {a^n b^n c^n | n ≥ 0} and {ss | s ∈ {a, b}*} are both context-sensitive.

In practice, it can be quite an effort to produce context-sensitive grammars according to the definition above. It is often more convenient to work with a more liberal notion of grammar for generating context-sensitive languages.

General and noncontracting grammars

In a general or unrestricted grammar, we allow productions of the form

α → β

where α, β are sequences of terminals and nonterminals (i.e., α, β ∈ (N ∪ Σ)*), with α containing at least one nonterminal.

In a noncontracting grammar, we restrict productions to the form α → β with α, β as above, subject to the additional requirement that |α| ≤ |β| (i.e., the sequence β is at least as long as α). In a noncontracting grammar we also permit the special production S → ε, where S is the start symbol, as long as S does not appear on the right-hand side of any production.

Example noncontracting grammar

Consider the noncontracting grammar with start symbol S:

S → abc
S → aSBc
cB → Bc
bB → bb

Example derivation (at each step, one occurrence of a left-hand side is rewritten):

S ⇒ aSBc ⇒ aabcBc ⇒ aabBcc ⇒ aabbcc

Exercise: convince yourself that this grammar generates exactly the strings a^n b^n c^n where n > 0. (N.B. With noncontracting grammars and CSGs, we need to think in terms of derivations, not syntax trees.)
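Derivations in this grammar can be simulated directly. The sketch below (my own code, not from the lecture) treats the productions as string-rewriting rules and finds all terminal strings up to a length bound by breadth-first rewriting of sentential forms; since the grammar is noncontracting, forms never shrink, so the length bound makes the search finite:

```python
from collections import deque

# Productions of the grammar above; nonterminals are the uppercase letters.
RULES = [("S", "abc"), ("S", "aSBc"), ("cB", "Bc"), ("bB", "bb")]

def generate(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    seen, results = {"S"}, set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if form.islower():  # no nonterminals left: a terminal string
            results.add(form)
            continue
        for lhs, rhs in RULES:
            start = 0
            while (i := form.find(lhs, start)) != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = i + 1
    return results

print(sorted(generate(9), key=len))  # ['abc', 'aabbcc', 'aaabbbccc']
```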

Noncontracting = context-sensitive

Theorem. A language is context-sensitive if and only if it can be generated by a noncontracting grammar.

That every context-sensitive language can be generated by a noncontracting grammar is immediate, since context-sensitive grammars are, by definition, noncontracting. The proof that every noncontracting grammar can be turned into a context-sensitive one is intricate, and beyond the scope of the course.

Sometimes (e.g., in Kozen) noncontracting grammars are themselves called context-sensitive grammars; but this is not faithful to Chomsky's original definition.

The Chomsky Hierarchy

At this point, we have a fairly complete understanding of the machinery associated with the different levels of the Chomsky hierarchy.

- Regular languages: DFAs, NFAs, regular expressions, regular grammars.
- Context-free languages: context-free grammars, nondeterministic pushdown automata.
- Context-sensitive languages: context-sensitive grammars, noncontracting grammars.
- Recursively enumerable languages: unrestricted grammars.

Context-sensitivity in natural language

Examples of context-sensitivity in natural language were presented in Lecture 25:

- Agreement phenomena in many languages (e.g., verb-subject agreement).
- Crossing dependencies in Swiss German (and Dutch).

There are other similar phenomena. It is believed that natural languages live (comfortably) within the context-sensitive level of the Chomsky hierarchy.

Context-sensitivity in programming languages

Some aspects of typical programming languages can't be captured by context-free grammars, e.g.:

- Typing rules.
- Scoping rules (e.g., variables can only be used in contexts where they have been declared).
- Access constraints (e.g., use of public vs. private methods in Java).

The usual approach is to give a CFG that's a bit too generous, and then separately describe these additional rules (e.g., typechecking is done as a separate stage after parsing). In principle, though, all the above features fall within what can be captured by context-sensitive grammars. In fact, no programming language known to humankind contains anything that can't.

Scoping constraints aren't context-free

Consider the simple language L1 given by

S → ε | declare v; S | use v; S

where v stands for a lexical class of variables. Let L2 be the language consisting of strings of L1 in which variables must be declared before use.

Assuming there are infinitely many possible variables, it's a little exercise to show that L2 is not context-free, but is context-sensitive. (If there are just n possible variables, we could in theory give a CFG for L2 with around 2^n nonterminals, but that's obviously silly...)
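Procedurally, the declare-before-use constraint is just a linear scan with a set of declared names. The sketch below (my own names, not from the lecture) checks membership in L2 for programs written as strings of statements; the point is that no CFG can do this bookkeeping when the set of variable names is unbounded:

```python
def in_L2(program):
    """True if program is a sequence of 'declare v;' / 'use v;' statements
    in which every used variable was declared earlier."""
    declared = set()
    for stmt in program.strip().split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue  # trailing empty fragment after the last ';'
        parts = stmt.split()
        if len(parts) != 2:
            return False  # not even a string of L1
        action, var = parts
        if action == "declare":
            declared.add(var)
        elif action == "use":
            if var not in declared:
                return False  # used before declaration
        else:
            return False
    return True

print(in_L2("declare x; use x; declare y; use y;"))  # True
print(in_L2("use x; declare x;"))                    # False
```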

Summary

Context-sensitive languages are a big step up from context-free languages in terms of their power and generality.

Natural languages have features that can't be captured conveniently (or at all) by context-free grammars. However, it appears that NLs are only mildly context-sensitive: they only exploit the low end of the power offered by CSGs.

Programming languages contain non-context-free features (typing, scoping, etc.), but all these fall comfortably within the realm of context-sensitive languages.

Next time: what kinds of machines are needed to recognize context-sensitive languages?