
Ling 566 Oct 3, 2017 Context-Free Grammar

Overview Two insufficient theories Formal definition of CFG Constituency, ambiguity, constituency tests Central claims of CFG Weaknesses of CFG Reading questions

Insufficient Theory #1 A grammar is simply a list of sentences. What's wrong with this?

Insufficient Theory #2: FSMs
the noisy dogs left: D A N V
the noisy dogs chased the innocent cats: D A N V D A N
a* = {ε, a, aa, aaa, aaaa, ...}
a+ = {a, aa, aaa, aaaa, ...}
(D) A* N V ((D) A* N)

A Finite State Machine [diagram: a finite-state network whose arcs are labeled D, A, N, and V]
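To make the pattern concrete, it can be run as an ordinary regular expression over strings of part-of-speech tags. A minimal sketch in Python; encoding the tags as a space-separated string is my own assumption, not something from the slides:

import re

# The finite-state pattern (D) A* N V ((D) A* N), as a regular
# expression over space-separated part-of-speech tags.
PATTERN = re.compile(r"^(D )?(A )*N V( (D )?(A )*N)?$")

for tags in ("D A N V",           # the noisy dogs left
             "D A N V D A N",     # the noisy dogs chased the innocent cats
             "D A A A N V"):      # a* allows any number of adjectives
    print(tags, "->", bool(PATTERN.match(tags)))

The machine accepts and rejects tag strings, but it assigns them no internal structure; that limitation is what the following slides turn on.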

What does a theory do? Monolingual: model grammaticality/acceptability; model relationships between sentences (internal structure). Multilingual: model relationships between languages; capture generalizations about possible languages.

Summary Grammars as lists of sentences: runs afoul of the creativity of language. Grammars as finite-state machines: no representation of structural ambiguity; misses generalizations about structure; (not formally powerful enough). Next attempt: context-free grammar.

Chomsky Hierarchy Type 0 Languages ⊃ Context-Sensitive Languages ⊃ Context-Free Languages ⊃ Regular Languages

Context-Free Grammar A quadruple ⟨C, Σ, P, S⟩:
C: set of categories
Σ: set of terminals (vocabulary)
P: set of rewrite rules α → β₁ β₂ ... βₙ
S ∈ C: start symbol
For each rule α → β₁ β₂ ... βₙ ∈ P: α ∈ C; βᵢ ∈ C ∪ Σ; 1 ≤ i ≤ n
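A direct way to see the definition is to write the quadruple down as data. A minimal sketch in Python, using the trivial grammar that appears later in the deck; the variable names are illustrative:

# The quadruple <C, Σ, P, S> as plain data structures.
C = {"S", "NP", "VP", "D", "N", "V"}            # categories
SIGMA = {"the", "dog", "cat", "chased"}         # terminals (vocabulary)
P = [                                           # rules (alpha, (beta_1, ..., beta_n))
    ("S", ("NP", "VP")),
    ("NP", ("D", "N")),
    ("VP", ("V", "NP")),
    ("D", ("the",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("chased",)),
]
S = "S"

# The well-formedness conditions from the definition:
# alpha ∈ C, and each beta_i ∈ C ∪ Σ.
assert S in C
assert all(lhs in C and all(b in C | SIGMA for b in rhs) for lhs, rhs in P)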

A Toy Grammar
RULES
S → NP VP
NP → (D) A* N PP*
VP → V (NP) (PP)
PP → P NP
LEXICON
D: the, some
A: big, brown, old
N: birds, fleas, dog, hunter, I
V: attack, ate, watched
P: for, beside, with

Structural Ambiguity I saw the astronomer with the telescope.

Structure 1: PP under VP
[S [NP [N I]] [VP [V saw] [NP [D the] [N astronomer]] [PP [P with] [NP [D the] [N telescope]]]]]

Structure 2: PP under NP
[S [NP [N I]] [VP [V saw] [NP [D the] [N astronomer] [PP [P with] [NP [D the] [N telescope]]]]]]
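Both structures fall out of the toy grammar mechanically. A sketch using NLTK's chart parser; since NLTK's CFG format has no Kleene star or optionality, the starred and parenthesized elements are expanded by hand, and the lexicon is extended with the words of this example (my assumption, not part of the slide's lexicon):

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> D N | D N PP | N
VP -> V NP | V NP PP
PP -> P NP
D -> 'the'
N -> 'I' | 'astronomer' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "I saw the astronomer with the telescope".split()
for tree in parser.parse(sentence):
    tree.pretty_print()

The parser returns exactly two trees, one with the PP under VP and one with the PP under NP.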

Constituents How do constituents help us? (What's the point?) What aspect of the grammar determines which words will be modeled as a constituent? How do we tell which words to group together into a constituent? What does the model claim or predict by grouping words together into a constituent?

Constituency Tests
Recurrent patterns: The quick brown fox with the bushy tail jumped over the lazy brown dog with one ear.
Coordination: The quick brown fox with the bushy tail and the lazy brown dog with one ear are friends.
Sentence-initial position: The election of 2000, everyone will remember for a long time.
Cleft sentences: It was a book about syntax they were reading.

General Types of Constituency Tests Distributional Intonational Semantic Psycholinguistic... but they don't always agree.

Central claims implicit in CFG formalism: 1. Parts of sentences (larger than single words) are linguistically significant units, i.e. phrases play a role in determining meaning, pronunciation, and/or the acceptability of sentences. 2. Phrases are contiguous portions of a sentence (no discontinuous constituents). 3. Two phrases are either disjoint or one fully contains the other (no partially overlapping constituents). 4. What a phrase can consist of depends only on what kind of a phrase it is (that is, the label on its top node), not on what appears around it.

Claims 1-3 characterize what is called phrase structure grammar. Claim 4 (that the internal structure of a phrase depends only on what type of phrase it is, not on where it appears) is what makes it context-free. There is another kind of phrase structure grammar called context-sensitive grammar (CSG) that gives up 4. That is, it allows the applicability of a grammar rule to depend on what is in the neighboring environment. So rules can have the form A → X / Y__Z (rewrite A as X in the context Y__Z).

Possible Counterexamples To Claim 2 (no discontinuous constituents): A technician arrived who could solve the problem. To Claim 3 (no overlapping constituents): I read what was written about me. To Claim 4 (context independence): - He arrives this morning. - *He arrive this morning. - *They arrives this morning.

A Trivial CFG
S → NP VP
NP → D N
VP → V NP
D: the
V: chased
N: dog, cat

Trees and Rules A local tree with mother C₀ and daughters C₁ ... Cₙ is a well-formed nonlexical tree if (and only if) C₁, ..., Cₙ are well-formed trees, and C₀ → C₁ ... Cₙ is a grammar rule.
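The definition is recursive, so it translates directly into a recursive checker. A minimal sketch in Python over the trivial CFG above; the tuple encoding of trees is my own convention:

# Trees are tuples: (label, daughter, ...); a lexical tree is (label, word).
RULES = {("S", ("NP", "VP")), ("NP", ("D", "N")), ("VP", ("V", "NP"))}
LEXICON = {("D", "the"), ("V", "chased"), ("N", "dog"), ("N", "cat")}

def well_formed(tree):
    label, *daughters = tree
    if len(daughters) == 1 and isinstance(daughters[0], str):
        return (label, daughters[0]) in LEXICON               # lexical tree
    return ((label, tuple(d[0] for d in daughters)) in RULES  # C0 -> C1 ... Cn
            and all(well_formed(d) for d in daughters))       # daughters well formed

tree = ("S",
        ("NP", ("D", "the"), ("N", "dog")),
        ("VP", ("V", "chased"), ("NP", ("D", "the"), ("N", "cat"))))
print(well_formed(tree))   # True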

Bottom-up Tree Construction The lexical entries D: the, V: chased, N: dog, cat license the lexical trees [D the], [V chased], [N dog], [N cat].

The rules NP → D N and VP → V NP then license [NP [D the] [N dog]], [NP [D the] [N cat]], and [VP [V chased] [NP [D the] [N cat]]].

Finally, S → NP VP licenses the complete tree [S [NP [D the] [N dog]] [VP [V chased] [NP [D the] [N cat]]]].

Top-down Tree Construction Rules: S → NP VP, NP → D N, VP → V NP. Starting from S, expand S → NP VP, then apply NP → D N (twice) and VP → V NP.

This yields the unlexicalized skeleton [S [NP D N] [VP V [NP D N]]].

Lexical insertion then supplies the lexical trees [D the], [V chased], [N dog], [N cat].

The result is the same tree as before: [S [NP [D the] [N dog]] [VP [V chased] [NP [D the] [N cat]]]].
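The top-down procedure amounts to repeatedly rewriting the leftmost category until only preterminals remain. A minimal sketch in Python; with this trivial grammar each category has a single expansion, so the derivation is deterministic, which would not hold of a realistic grammar:

RULES = {"S": ["NP", "VP"], "NP": ["D", "N"], "VP": ["V", "NP"]}

def expand_leftmost(symbols):
    print(" ".join(symbols))
    for i, s in enumerate(symbols):
        if s in RULES:                       # rewrite the leftmost category
            return expand_leftmost(symbols[:i] + RULES[s] + symbols[i + 1:])
    return symbols                           # only preterminals remain

expand_leftmost(["S"])
# S / NP VP / D N VP / D N V NP / D N V D N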

Weaknesses of CFG (w/ atomic node labels) It doesn't tell us what constitutes a linguistically natural rule: nothing rules out rules like VP → P NP or NP → VP S. Rules get very cumbersome once we try to deal with things like agreement and transitivity. It has been argued that certain languages (notably Swiss German and Bambara) contain constructions that are provably beyond the descriptive capacity of CFG.

Agreement & Transitivity
S → NP-SG VP-SG
S → NP-PL VP-PL
NP-SG → (D) NOM-SG
NP-PL → (D) NOM-PL
NOM-SG → NOM-SG PP
NOM-PL → NOM-PL PP
NOM-SG → N-SG
NOM-PL → N-PL
NP → NP-SG
NP → NP-PL
VP-SG → IV-SG
VP-PL → IV-PL
VP-SG → TV-SG NP
VP-PL → TV-PL NP
VP-SG → DTV-SG NP NP
VP-PL → DTV-PL NP NP
VP-SG → CCV-SG S
VP-PL → CCV-PL S
VP-SG → VP-SG PP
VP-PL → VP-PL PP
...
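The blow-up is mechanical: every VP pattern must be restated once per value of NUMBER, and the rule set would multiply again for each further feature (PERSON, VFORM, ...). A small sketch that generates the VP rules above; the pattern list and the MARKED set are my own encoding:

from itertools import product

NUMBER = ["SG", "PL"]
VP_PATTERNS = [("IV",), ("TV", "NP"), ("DTV", "NP", "NP"),
               ("CCV", "S"), ("VP", "PP")]
MARKED = {"IV", "TV", "DTV", "CCV", "VP"}    # categories that carry NUMBER

for n, pattern in product(NUMBER, VP_PATTERNS):
    rhs = " ".join(d + "-" + n if d in MARKED else d for d in pattern)
    print("VP-" + n, "->", rhs)              # prints all 10 number-marked rules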

Shieber 1985 Swiss German example:
... mer d chind em Hans es huus lönd hälfe aastriiche
... we the children-ACC Hans-DAT the house-ACC let help paint
'... we let the children help Hans paint the house'
Cross-serial dependency: let governs case on children; help governs case on Hans; paint governs case on house.

Shieber 1985 Define a new language f(SG):
f(d chind) = a, f(em Hans) = b, f(lönd) = c, f(hälfe) = d
f(Jan säit das mer) = w, f(es huus) = x, f(aastriiche) = y, f([other]) = z
Let r be the regular language w a* b* x c* d* y.
f(SG) ∩ r = { w aᵐ bⁿ x cᵐ dⁿ y }
{ w aᵐ bⁿ x cᵐ dⁿ y } is not context-free. But homomorphic images of context-free languages are context-free, and context-free languages are closed under intersection with regular languages. So f(SG), and by extension Swiss German, must not be context-free.
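The construction is easy to trace on the example sentence itself. A sketch in Python; treating the glossed chunks as atomic units is my own simplification:

import re

# The homomorphism f from Swiss German chunks to letters.
f = {"Jan säit das mer": "w", "d chind": "a", "em Hans": "b", "es huus": "x",
     "lönd": "c", "hälfe": "d", "aastriiche": "y"}

# The example sentence has one ACC NP, one DAT NP, and one governing verb
# each (m = n = 1); stacking more NP-verb pairs raises m and n in lockstep.
sentence = ["Jan säit das mer", "d chind", "em Hans", "es huus",
            "lönd", "hälfe", "aastriiche"]
image = "".join(f[chunk] for chunk in sentence)
print(image)                                    # wabxcdy
print(bool(re.match(r"^wa*b*xc*d*y$", image)))  # True: the image lies in f(SG) ∩ r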

Strongly/weakly CF A language is weakly context-free if the set of strings in the language can be generated by a CFG. A language is strongly context-free if the CFG furthermore assigns the correct structures to the strings. Shieber's argument is that Swiss German is not even weakly context-free, and a fortiori not strongly context-free. Bresnan et al. (1983) had already argued that Dutch is not strongly context-free, but the string set of Dutch can arguably still be generated by a CFG.

On the other hand... It's a simple formalism that can generate infinite languages and assign linguistically plausible structures to them. Linguistic constructions that are beyond the descriptive power of CFG are rare. It's computationally tractable, and techniques for processing CFGs are well understood.

So... CFG has been the starting point for most types of generative grammar. The theory we develop in this course is an extension of CFG.

Overview Two insufficient theories Formal definition of CFG Constituency, ambiguity, constituency tests Central claims of CFG Weaknesses of CFG Reading questions

Reading Questions The chapter stated that 'ambiguous sentences are often ones which have multiple possible valid divisions of constituents'. This is a very neat and tidy way to think of ambiguity, and it made me wonder if ambiguity ratings have ever been used in NLP, perhaps along the lines of "if the ambiguity rating exceeds a threshold, look farther than usual for additional context". When we apply the CFG to build a tree for a sentence, are we supposed to build all the possible tree structures based on the CFG rules, rather than using our own intuition to build the only "correct" tree?

Reading Questions A lexical structure is well-formed if a particular word is listed under its corresponding grammatical category. Via the concept of well-formedness, one can deduce the well-formedness of non-lexical trees by the theorem given on this page. 1. Is there a rigorous proof of this theorem? Or perhaps a clarified version of the definition of well-formedness? 2. The lexical tree [V like] is given to exemplify well-formedness of a lexical structure. Suppose there exists another lexical structure [P like], using one of the other senses of the word like in a different grammatical category, within the same S as the former structure. Will well-formedness be violated?

Reading Questions What "context" is "context-free grammar" free of? Why is headedness a problem for CFG? On page 44, one of the suggested further readings is a work by Chomsky arguing against the use of context-free grammars. I am a bit confused about how Chomsky's approach differs from a CFG, and I was wondering if we could break down the differences between a CFG and the Chomskyan proposal.

Reading Questions Although the rules in (36), (39), (40) and (41) are redundant for human beings, they are not a problem for a computer. So, does CFG play a more important role in computational linguistics? I was wondering why it is that the CFG treats sentence parsing and sentence generation equally. How are the "top-down" and "bottom-up" processes equally efficient? Parsing vs. generating: what drives generation?

Reading Questions What's the point of NOM? I am also curious about the introduction of NOM on page 31. What is the process for coming up with such nonlexical categories and their corresponding rules? Why aren't we using the more general X' notation? Why is VP -> VP PP better than a version with PP*? Why have an explicit CONJ node in the tree on p. 21?

Reading Questions Are some languages more head-driven than others? English is order-sensitive with regard to the subject and object of the sentence, so is that inherent in the CFG? And for other languages that are not order-sensitive, how is the CFG applied to them? After reading about CFGs and generative grammars on p. 37, I am wondering whether there have been attempts to make cross-language grammars, and what they look like.

Reading Questions Is there an effort to explicitly define Chomsky's Universal Grammar in any way, or is UG simply an abstract, loosely defined principle? Is it not a useful problem, since Universal Grammar itself isn't a natural language and might not be constrained by the same rules as the languages we seek to describe?

Reading Questions If I have a lexicon and two context-free grammars in hand, what are the criteria by which I should decide which of the two is better? Am I correct in assuming that the better CFG: (1) should be less ambiguous, (2) should provide better modeling of grammatically correct sentences of the language in question, and (3) should not accept grammatically incorrect sentences? Are there any other criteria to be considered?

Reading Questions The statement on p. 40 that "there are verbs that only appear in other environments; for example, some verbs require following PPs or Ss" makes me wonder whether it is possible to generalize thorough rules to represent natural language. It appears to me that there are too many possible combinations of constituents to generalize over. There are many cases in which certain verbs with similar meanings cannot be interchanged. Second language learners like myself often come up with grammatically correct expressions that sound weird to native speakers, and native speakers often find it difficult to explain why. I think this is because language is often used in chunks, and it is far more complicated than what CFG or Transformational Grammar can generalize.

Reading Questions p.47: The textbook states that subject-verb agreement is handled by assuming that number is an "intrinsic property of nouns, but not of verbs." What is the motivation for nouns having this intrinsic property instead of verbs? Is it just arbitrary?

Reading Questions In Section 2.4.1, the text mentions the below rule: X -> X+ CONJ X The interpretation for this is that elements of a category can be conjoined in the same way. What can we conclude about the conjunction of elements that do not belong to the same category? Is it correct to conclude that conjunction of elements of different categories is ungrammatical? "Coordinate conjunction is used as a test for constituency." I would like to discuss this in class so that I can comprehend it better. Specifically, how does this test work? Is this a sufficient and complete condition for constituency?

Reading Questions Are there theories of grammar that revolve around chains of sentences that do not sound correct together? As a general notion, we understand that there are well-formed paragraphs in a similar way that we have well-formed sentences. Are there methods for defining what makes a paragraph sound "grammatically correct"? Are commas and punctuation ever considered part of a grammar? Are they not also determining whether or not a sequence of strings is grammatically correct?