English Syntax and Context Free Grammars. COMP-550 Oct 10, 2017

Outline
What is Syntax
English Syntax
Context Free Grammars
Parsing

Syntax
How words can be arranged together to form a grammatical sentence.
This is a valid sentence.
*A sentence this valid is.
An asterisk is used to indicate ungrammaticality.
One view of syntax: generate all and exactly those sentences of a language which are grammatical.

The First Grammarian
Panini (Pāṇini), from the 4th century B.C., developed a grammar for Sanskrit.
Source: https://archive.org/details/ashtadhyayitrans06paniuoft

What We Don't Mean by Grammar
Rules or guides for how to write properly, e.g., style guides.
These style guides are prescriptive.
We are concerned with descriptive grammars of naturally occurring language.

Basic Definitions
Terms:
grammaticality
prescriptivism vs descriptivism
constituency
grammatical relations
subcategorization

Constituency
A group of words that behave as a unit.
Noun phrases: computational linguistics; it; Justin Trudeau; three people on the bus; Jean-Claude Van Damme, the Muscles from Brussels
Adjective phrases: blue; purple; very good; ridiculously annoying and tame

Tests for Constituency
1. They can appear in similar syntactic environments.
I saw ...
  it
  Jean-Claude Van Damme, the Muscles from Brussels
  three people on the bus
  *Van
  *on the

Tests for Constituency
2. They can be placed in different positions or replaced in a sentence as a unit.
[Jean-Claude Van Damme, the Muscles from Brussels] beat me up.
It was [Jean-Claude Van Damme, the Muscles from Brussels] who beat me up.
I was beaten up by [Jean-Claude Van Damme, the Muscles from Brussels].
He beat me up. (i.e., J-C V D, the M from B)

Tests for Constituency
3. It can be used to answer a question.
Who beat you up?
[Jean-Claude Van Damme, the Muscles from Brussels]
*[the Muscles from]

Grammatical Relations
Relationships between different constituents.
Subject: Jean-Claude Van Damme relaxed. The wallet was stolen by a thief.
(Direct) object: The boy kicked the ball.
Indirect object: She gave him a good beating.
There are many other grammatical relations.

Subcategorization
Notice that different verbs seem to require a different number of arguments:
verb    args  frame
relax   1     subj
steal*  2     subj, dobj
kick    2     subj, dobj
give    3     subj, iobj, dobj
* The passive changes the subcategorization of the verb.

More Subcategorization
Some other possibilities:
want       2  subj, inf. clause
  I want to learn about computational linguistics.
apprise    3  subj, obj, pobj with "of"
  The minister apprised him of the new developments.
different  2  subj, pobj with "from/than/to"
  This course is different [from/than/to] what I expected.

Short Exercise
Identify the prepositional phrase in the following sentence. Give arguments for why it is a constituent.
The next assignment is due on Friday, October 20th.

Formal Grammars
Since we are computational linguists, we will use a formal computational model of grammar to account for these and other syntactic concerns.
Formal grammar: rules that generate a set of strings that make up a language. (In this context, "language" simply refers to a set of strings.)
Why?
Formal understanding lets us develop appropriate algorithms for dealing with syntax.
Implications for cognitive science/language learning.

FSAs and Regular Grammars
We've already seen examples of languages defined by formal grammars before this class!
FSAs to describe aspects of English morphology
An FSA generates a regular language
FSAs correspond to a class of formal grammars called regular grammars
To describe the syntax of natural languages (with multiple constituents, subcategorization, etc.), we need a more powerful class of formal grammars: context free grammars (CFGs).

Context Free Grammars (CFGs)
Rules that describe what the possible sentences are:
S -> NP VP
NP -> this
VP -> V
V -> is | kicks | jumps | rocks
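To see what "rules that describe the possible sentences" means in practice, the toy grammar above can be expanded mechanically. The following is a minimal sketch, not part of the original slides: the dictionary encoding and the helper name expand are illustrative choices, and the exhaustive expansion only terminates because this grammar has no recursion.

```python
from itertools import product

# Toy grammar from the slide: each non-terminal maps to its right-hand sides.
# (The dict encoding is an assumption made for this sketch.)
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["this"]],
    "VP": [["V"]],
    "V": [["is"], ["kicks"], ["jumps"], ["rocks"]],
}

def expand(symbol):
    """Return every word sequence derivable from symbol.

    Only safe for grammars without recursion, such as this one."""
    if symbol not in GRAMMAR:              # a terminal expands to itself
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(sentences)
```

The grammar generates exactly four strings, one per choice of V.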

Constituent Tree
Trees (and sentences) generated by the previous rules:
S -> NP VP
NP -> this
VP -> V
V -> is | rules | jumps | rocks
[S [NP this] [VP [V rules]]]
[S [NP this] [VP [V rocks]]]
Non-terminals: S, NP, VP, V
Terminals: this, rules, rocks

Formal Definition of a CFG
A 4-tuple (N, Σ, R, S):
N: set of non-terminal symbols
Σ: set of terminal symbols
R: set of rules or productions of the form A -> (Σ ∪ N)*, with A ∈ N
S: a designated start symbol, S ∈ N
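The 4-tuple maps directly onto a small data structure. The sketch below is one possible encoding, not something from the slides: the class name CFG, its field names, and the method is_well_formed are hypothetical, and the check that N and Σ are disjoint is the usual textbook convention rather than something the slide states.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset   # N
    terminals: frozenset      # Σ
    rules: tuple              # R: pairs (A, rhs), where rhs is a tuple of symbols
    start: str                # S

    def is_well_formed(self):
        """Check the side conditions of the definition."""
        if self.nonterminals & self.terminals:
            return False      # assumed convention: N and Σ are disjoint
        if self.start not in self.nonterminals:
            return False      # S must be in N
        symbols = self.nonterminals | self.terminals
        # Every rule has a left-hand side in N and a right-hand side over Σ ∪ N.
        return all(
            lhs in self.nonterminals and all(s in symbols for s in rhs)
            for lhs, rhs in self.rules
        )

# The toy "this kicks" grammar written as a 4-tuple.
toy = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "V"}),
    terminals=frozenset({"this", "is", "kicks", "jumps", "rocks"}),
    rules=(
        ("S", ("NP", "VP")),
        ("NP", ("this",)),
        ("VP", ("V",)),
        ("V", ("is",)), ("V", ("kicks",)), ("V", ("jumps",)), ("V", ("rocks",)),
    ),
    start="S",
)
print(toy.is_well_formed())
```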

Extended Example
Let's develop a CFG that can account for verbs with different subcategorization frames:
intransitive verbs  relax        1  subj
transitive verbs    steal, kick  2  subj, dobj
ditransitive verbs  give         3  subj, iobj, dobj

Undergeneration and Overgeneration
Problems with the above grammar:
Undergeneration: misses valid English sentences
  The boy kicked the ball softly.
  The thief stole the wallet with ease.
Overgeneration: generates ungrammatical sentences
  *The boy kick the ball.
  *The thieves steals the wallets.

Extension 1
Let's add adverbs and prepositional phrases to our grammar.

Recursion
Consider the following sentences:
The dog barked.
I know that the dog barked.
You know that I know that the dog barked.
He knows that you know that I know that the dog barked.
In general:
S -> NP VP
VP -> Vthat Sthat
VP -> Vintr
Vthat -> know
Vintr -> barked
Sthat -> that S
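The recursive step is that Sthat rewrites to "that S", which re-enters S. A minimal sketch of that derivation follows; it is an illustration rather than the slide's own material. The function name derive_s and the SUBJECTS list are assumptions (the rules above leave NP unspecified), and only "I" and "you" are used as embedding subjects so that the single verb form "know" stays grammatical.

```python
# Assumed NP expansions for the embedding clauses; the slide does not list NP rules.
SUBJECTS = ["I", "you"]

def derive_s(depth):
    """Derive a sentence with `depth` levels of 'that'-embedding.

    depth 0 uses VP -> Vintr; each further level uses VP -> Vthat Sthat
    and Sthat -> that S, which re-enters S."""
    if depth == 0:
        return ["the", "dog", "barked"]
    subject = SUBJECTS[(depth - 1) % len(SUBJECTS)]
    return [subject, "know", "that"] + derive_s(depth - 1)

for depth in range(3):
    print(" ".join(derive_s(depth)))
```

Each extra level adds exactly three words, so the grammar places no upper bound on sentence length.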

Recursion
This recursion in the syntax of English means that sentences can be arbitrarily long (theoretically): there is no longest sentence.
For a given sentence S, you can always make it longer by adding [I/you/he know(s) that S].
In practice, the length is limited because we have limited attention span/memory/processing power.

Exercise
Let's try to fix the subject-verb agreement issue.
Present tense:
Singular third-person subject -> verb has affix -s or -es
Otherwise -> base form of verb
("to be" is an exception, along with other irregular verbs)

Dependency Grammar
Grammatical relations induce a dependency relation between the words that are involved.
The student studied for the exam.
Each phrase has a head word.
the student studied for the exam
the student
for the exam
the exam

Dependency Grammar
We can represent the grammatical relations between phrases as directed edges between their heads.
The student studied for the exam.
Edge labels, left to right: det, subject, pp arg, prep. obj, det
This lets us get at the relationships between words and phrases in the sentence more easily.
Who/what are involved in the studying event? student, for the exam
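A dependency analysis is just a set of labelled head-to-dependent edges, so it can be stored as a list of triples. The sketch below is illustrative only: the figure's arcs are not recoverable from the transcript, so the head/dependent assignment in EDGES is an assumed reading of the five labels, and the names dependents and phrase are hypothetical.

```python
WORDS = ["The", "student", "studied", "for", "the", "exam"]

# (head index, label, dependent index). Assumed reading of the slide's arcs.
EDGES = [
    (1, "det", 0),        # student -> The
    (2, "subject", 1),    # studied -> student
    (2, "pp arg", 3),     # studied -> for
    (3, "prep. obj", 5),  # for -> exam
    (5, "det", 4),        # exam -> the
]

def dependents(head):
    """Labelled direct dependents of the word at index head."""
    return [(label, WORDS[d]) for h, label, d in EDGES if h == head]

def phrase(head):
    """The phrase headed by WORDS[head]: the head plus everything below it."""
    indices = {head}
    frontier = [head]
    while frontier:
        current = frontier.pop()
        for h, _, d in EDGES:
            if h == current:
                indices.add(d)
                frontier.append(d)
    return " ".join(WORDS[i] for i in sorted(indices))

ROOT = WORDS.index("studied")
print(dependents(ROOT))
print([phrase(d) for h, _, d in EDGES if h == ROOT])
```

Reading off the dependents of "studied" answers the slide's question directly: the participants in the studying event are the phrases headed by "student" and "for".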

Converting between Formalisms
Dependency trees can be converted into a standard constituent tree deterministically (if the dependency edges don't cross each other).
Constituent trees can be converted into a dependency tree, if you know what the head of each constituent is.
Let's convert some of our previous examples.
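The constituent-to-dependency direction can be sketched as a recursive walk that picks a head child at each node and attaches the heads of the other children to it. Everything named here is an assumption made for illustration: the HEAD_CHILD table, the nested-tuple tree encoding, the part-of-speech labels, and the function to_dependencies are not from the slides.

```python
# Hypothetical head table: for each phrase label, the child label that supplies the head.
HEAD_CHILD = {"S": "VP", "VP": "V", "NP": "N", "PP": "P"}

# Assumed constituent tree for "the student studied for the exam".
TREE = ("S",
        ("NP", ("Det", "the"), ("N", "student")),
        ("VP", ("V", "studied"),
               ("PP", ("P", "for"),
                      ("NP", ("Det", "the"), ("N", "exam")))))

def to_dependencies(node, edges):
    """Return the head word of node, appending (head, dependent) pairs to edges."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]                  # pre-terminal: the word is its own head
    heads = [(child[0], to_dependencies(child, edges)) for child in children]
    head_pos = next(i for i, (lab, _) in enumerate(heads) if lab == HEAD_CHILD[label])
    head_word = heads[head_pos][1]
    for i, (_, word) in enumerate(heads):
        if i != head_pos:
            edges.append((head_word, word))  # non-head children depend on the head
    return head_word

edges = []
root = to_dependencies(TREE, edges)
print(root, edges)
```

Changing the head table changes the resulting dependency tree, which is why the conversion requires knowing the head of each constituent.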

Crossing Dependencies
Dependencies can cross, especially if the language has freer word order:
Er hat mich versucht zu erreichen.
Er hat versucht mich zu erreichen.
He tried to reach me.
These have the same literal meaning.

Crossing Dependencies Example
What would the dependency edges be in these cases?
Er hat versucht, mich zu erreichen.
HE HAS TRIED ME TO REACH
Er hat mich versucht zu erreichen.
HE HAS ME TRIED TO REACH
Notice the discontinuous constituent that results in the second case.
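Whether two dependency edges cross can be checked from word positions alone: two arcs cross when exactly one endpoint of one lies strictly inside the span of the other. The sketch below applies that test to both word orders. The arc lists are one plausible analysis assumed for illustration (hat heads Er and versucht, versucht heads erreichen, erreichen heads mich and zu); the slides pose this as a question and do not give the edges. The function name crossing_pairs is also hypothetical.

```python
from itertools import combinations

def crossing_pairs(arcs):
    """Return pairs of arcs whose spans overlap without one nesting in the other."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    return [
        (a, b)
        for a, b in combinations(spans, 2)
        if a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]
    ]

# Er(0) hat(1) versucht(2) mich(3) zu(4) erreichen(5); arcs are (head, dependent).
# Assumed analysis, not taken from the slides.
first_order = [(1, 0), (1, 2), (2, 5), (5, 3), (5, 4)]

# Er(0) hat(1) mich(2) versucht(3) zu(4) erreichen(5); same assumed relations.
second_order = [(1, 0), (1, 3), (3, 5), (5, 2), (5, 4)]

print(crossing_pairs(first_order))
print(crossing_pairs(second_order))
```

Under this analysis only the second word order produces a crossing: the hat-versucht arc crosses the erreichen-mich arc, which corresponds to the discontinuous constituent "mich ... zu erreichen".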

Are Natural Languages CFGs?
Recall that a formal language is defined to be a set of strings constructed over a specified vocabulary.
Are natural languages CFGs? I.e., can we define each natural language (e.g., English, French, German, etc.) as a CFG?
Other possibilities: Chomsky hierarchy
https://en.wikipedia.org/wiki/chomsky_hierarchy

Cross-serial Dependencies
Swiss German (Shieber, 1985) and Bambara (Culy, 1985) have structures that generate strings which cannot be captured by CFGs (cross-serial dependencies):
a^m b^n c^m d^n
Relies on the following assumptions:
m and n can be arbitrarily large values
strings are either in a language or not (grammatical or ungrammatical)
May not be the most useful question to ask after all.
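The pattern a^m b^n c^m d^n is easy to state procedurally even though no CFG generates it: the a's must match the c's and the b's must match the d's, so the two dependencies interleave instead of nesting. The following membership test is a minimal sketch; the function name in_cross_serial is hypothetical, and allowing m = 0 or n = 0 is an assumption the slide does not settle.

```python
import re

def in_cross_serial(s):
    """True if s has the form a^m b^n c^m d^n (m, n >= 0 assumed)."""
    # The regular expression only checks the order of the four blocks.
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if match is None:
        return False
    a, b, c, d = (len(group) for group in match.groups())
    # The cross-serial part: the counts must agree across interleaved blocks.
    return a == c and b == d

for candidate in ["abcd", "aabccd", "aabbcd", "abdc"]:
    print(candidate, in_cross_serial(candidate))
```

The block-order check is regular; it is the two interleaved count comparisons that place the language outside the context-free class.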

Parsing
Input: sentence, grammar
Output: parse tree
Parsing into a CFG: constituent parsing
Parsing into a dependency representation: dependency parsing
Difficulty: need an efficient way to search through plausible parse trees for the input sentence

Parsing into a CFG
Given:
1. CFG
2. A sentence made up of words that are in the terminal vocabulary of the CFG
Task: Recover all possible parses of the sentence.
Why all possible parses?

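Syntactic Ambiguity
I shot the elephant in my pyjamas.
Parse 1 (the PP attaches to the VP):
[S [NP I] [VP [VP [V shot] [NP the elephant]] [PP in [NP my pyjamas]]]]
Parse 2 (the PP attaches to the NP):
[S [NP I] [VP [V shot] [NP [NP the elephant] [PP in [NP my pyjamas]]]]]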

Types of Parsing Algorithms
Top-down: start at the top of the tree, and expand downwards by using rewrite rules of the CFG to match the tokens in the input string.
  e.g., Earley parser
Bottom-up: start from the input words, and build ever-bigger subtrees, until a tree that spans the whole sentence is found.
  e.g., CYK algorithm, shift-reduce parser
Key to efficiency is to have an efficient search strategy that avoids redundant computation.

CYK Algorithm
Cocke-Younger-Kasami algorithm (also known as the CKY algorithm)
A dynamic programming algorithm: partial solutions are stored and efficiently reused to find all possible parses for the entire sentence.
Steps:
1. Convert CFG to an appropriate form
2. Set up a table of possible constituents
3. Fill in table
4. Read table to recover all possible parses
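The four steps can be sketched for the ambiguous sentence "I shot the elephant in my pyjamas". This is an illustrative sketch under stated assumptions, not the course's implementation. The grammar below is a hand-written one in binary (Chomsky normal form style) rules, so step 1 is assumed already done; the categories Det, N and P and the names BINARY, LEXICON and cyk_count are hypothetical. For brevity, step 4 counts parses instead of building the trees.

```python
from collections import defaultdict

# Step 1 (assumed done by hand): binary rules (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "shot": ["V"], "the": ["Det"], "elephant": ["N"],
    "in": ["P"], "my": ["Det"], "pyjamas": ["N"],
}

def cyk_count(words, start="S"):
    """Count the parses of words rooted in start."""
    n = len(words)
    # Step 2: table[i][j][A] = number of parses of words[i:j] rooted in A.
    table = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            table[i][i + 1][tag] += 1
    # Step 3: fill in wider spans from narrower ones, reusing stored counts.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):            # split point
                for parent, left, right in BINARY:
                    table[i][j][parent] += table[i][k][left] * table[k][j][right]
    # Step 4 (simplified): read the count for the full span.
    return table[0][n][start]

print(cyk_count("I shot the elephant in my pyjamas".split()))
```

Under this grammar the full sentence has two parses, one per attachment site of the prepositional phrase. Storing backpointers in each cell instead of counts would allow the trees themselves to be recovered.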