Constituency, Trees, Context-free Grammar

Weiwei Sun
Institute of Computer Science and Technology, Peking University
March 18, 2015

Administration
  Grading
    Regular attendance of the lectures is required
    3-4 assignments
    Mid-term project
    Take-home exam
  Bibliography
    Andrew Carnie. Syntax: A Generative Introduction
    Mary Dalrymple. Lexical Functional Grammar
  Course website: http://www.icst.pku.edu.cn/lcwm/course/fs/
  Email: wsun106@163.com

Outline
  Introduction
  Constituency, Trees
  Context-free Grammar

What is this course about?
  What aspects of language should be the focus of our linguistic study?
    Theory of Language Structure: structural properties of natural languages
    Theory of Language Acquisition: how children acquire their native language(s)
    Theory of Language Use: how linguistic and nonlinguistic knowledge interact in speech comprehension and production
  Developing a theory of language structure is prior to the other two.
  phonetics, phonology, morphology, syntax, semantics, pragmatics, ...
  Syntax: how words are organized into phrases and sentences.

Syntax: What does it mean?
  We can view syntax/syntactic theory in a number of ways, two of which are the following:
    Psychological model: syntactic structures correspond to what is in the heads of speakers and hearers
    Computational model: syntactic structures are formal objects which can be mathematically treated/manipulated
  Syntax attempts to capture the nature of
    rules with which we generate strings of words (weak generative power)
    structures which license strings of words (strong generative power)

The Generative Revolution: Some History
  Writings on grammar go back at least 3000 years
  Ferdinand de Saussure
  Towards modern syntax
    Structuralism (1920s-30s): Bloomfield
    Distributionalism (1950s): Hockett, Harris
    Categorial grammar (1930s): Ajdukiewicz
    Dependency grammar (1930s): Tesnière
  Noam Chomsky's work in the 1950s radically changed linguistics, making syntax central.
  The theory we will study is in the tradition started by Chomsky, but diverges from his work in many ways.

The Generative Revolution: Main Tenets of Generative Grammar
  Grammars should be formulated precisely and explicitly.
  Grammars must be tested against invented data, not just attested examples.
  The theory of grammar is a theory of human linguistic abilities.
  Chomsky (Syntactic Structures):
    "By pushing a precise but inadequate formulation to an unacceptable conclusion, we can often expose the exact source of this inadequacy and, consequently, gain a deeper understanding of the linguistic data. [...] Obscure and intuition-bound notions can neither lead to absurd conclusions nor provide new and correct ones, [...]"

Generative Grammar: Aspects of the Theory of Syntax
  "A grammar of a language purports to be a description of the ideal speaker-hearer's intrinsic competence. If the grammar is, furthermore, perfectly explicit - in other words, if it does not rely on the intelligence of the understanding reader but rather provides an explicit analysis of his contribution - we may (somewhat redundantly) call it a generative grammar."

Generative Grammar: Chomsky's Syntactic Structures
  Main task for the linguist: separate grammatical from ungrammatical strings
  Two issues:
    How to define grammatical strings?
      Corpus-based or statistical methods fail because of the creative nature of language
      Grammaticality cannot be determined by meaningfulness
      His proposed method: native speaker judgments
    What kind of system can describe all grammatical strings of a language? It must
      consist of a finite set of rules
      be descriptively adequate
      be explanatory

Descriptive Adequacy
  Some researchers try to explain the underlying mechanisms, but we are most concerned with being able to describe linguistic phenomena, ideally:
    Providing accurate structural descriptions for well-formed sentences
    Giving an explicit encoding of a language
    Approaching broad coverage, i.e., aiming to describe all of the well-formed sentences possible in a language

Adequacy of a Linguistic Theory
  How to test whether a linguistic theory is adequate?
    Can it account for all of the data?
    Can it account for the data in an elegant, straightforward way, or does it lead to extreme complexity?
    Can the same system be used to construct grammars for all languages?

Precise Encoding
  Mathematical formalism
    Formal ways to generate sets of strings or structures
  Precisely define:
    elementary structures
    ways of combining those structures

Family Tree of Syntactic Theories
  Early Transformational Grammar (1955-1964)
  Standard Theory TG (1964-1967)
  Extended ST (1967-1977)
  Generative Semantics (1966-1975)
  Revised EST (1977-1981)
  GB (1981-1993)
  Minimalist Program (1993-present)
  GPSG (1979-1985)
  HPSG (1986-present)
  Realistic TG (1978-1980)
  LFG (1980-present)

Course schedule
  Context-free Grammar
  Government and Binding
    Structural relations
    X-bar theory
    Constraining X-bar theory: Lexicon
    Movement
  Lexical Functional Grammar
    Functional structure
    Constituent structure
    Syntactic correspondences
    Long-distance dependencies
    Coordination
  Tree-Adjoining Grammar
  Head-driven Phrase-Structure Grammar
  Combinatory Categorial Grammar

Quote
  Edward Sapir: "All grammars leak!"

Immediate Constituent Analysis
  L. Bloomfield, N. Chomsky
  A constituent is a word or a group of words that functions as a single unit within a hierarchical structure.
  Immediate Constituent Analysis divides up a sentence into major parts or immediate constituents, and these constituents are in turn divided into further immediate constituents.

Constituency Test
  A group of words is likely to be a constituent if it passes tests such as:
    Replacement: it can be replaced with a single word
    Stand Alone: it can stand alone in response to a question
    Movement: it can be moved around in the sentence
    Coordination: it can be coordinated with a similar group of words
  Sometimes, constituency tests fail!

Syntactic Category
  We would like some way to say that two groups of words are of the same type. For this, we will talk about different categories.
    Lexical category (part of speech): how a word is going to function in a sentence
    Phrasal category
  How to determine part of speech? Distributional criteria:
    Morphological distribution
    Syntactic distribution
  How to determine phrasal category?

Phrase-structure Tree
  The result of IC analysis is often presented as a phrase-structure tree that reveals the hierarchical immediate constituent structure of the sentence.
  Example (labelled bracket notation): The very small boy kissed the platypus
    [TP [NP [D The] [AdjP [AdvP [Adv very]] [Adj small]] [N boy]]
        [VP [V kissed] [NP [D the] [N platypus]]]]
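
A phrase-structure tree can be written down directly as nested data. The sketch below is one possible encoding, assuming the example sentence "The very small boy kissed the platypus"; the function names (words, constituents, immediate_constituents) are illustrative and not part of the lecture.

```python
# A tree is a tuple (label, child, child, ...); a leaf is a plain string (a word).
tree = (
    "TP",
    ("NP",
        ("D", "The"),
        ("AdjP", ("AdvP", ("Adv", "very")), ("Adj", "small")),
        ("N", "boy")),
    ("VP",
        ("V", "kissed"),
        ("NP", ("D", "the"), ("N", "platypus"))),
)


def words(node):
    """Return the words dominated by a node, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:
        result.extend(words(child))
    return result


def constituents(node):
    """List (category, word string) for every labelled node in the tree."""
    if isinstance(node, str):
        return []
    found = [(node[0], " ".join(words(node)))]
    for child in node[1:]:
        found.extend(constituents(child))
    return found


def immediate_constituents(node):
    """Labels of the direct children only (one step of IC analysis)."""
    return [child if isinstance(child, str) else child[0] for child in node[1:]]


if __name__ == "__main__":
    print(immediate_constituents(tree))
    for category, text in constituents(tree):
        print(category, "->", text)
```

Every labelled node corresponds to one constituent, so "very small" (AdjP) is a constituent in this encoding while "boy kissed" is not, because no single node dominates exactly those two words.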

How to Draw a Tree
  Bottom-up:
    1. Identify the parts of speech.
    2. Identify what modifies what.
    3. Start linking together items that modify one another.
    4. Determine the phrasal categories.
    5. Keep applying the rules until you have attached all the modifiers to the modified constituents.
  How to perform a top-down procedure?

Context-free Phrase-structure Grammar
  A context-free phrase-structure grammar provides a simple and mathematically precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks.
    The block structure of sentences is captured in a natural way.
    The basic recursive structure of sentences is described exactly.

Phrase Structure Grammar
  The formalism of context-free grammars was developed in the mid-1950s by Noam Chomsky.
  Phrase structure grammars are also known as constituency grammars.
  There are probably languages that cannot be described by a context-free grammar (CFG).
    This was shown in the 1980s to be correct, at least for Swiss German.
    English may be within the descriptive power of a CFG.
    But there may be other reasons beyond formal power to reject CFGs for representing natural languages...
  CFGs account for the tree-like structure that sentences have.

Definition of Context-free Grammars
  Four components in a grammatical description of a language:
  1. A finite set of symbols that form the strings of a language. We call this alphabet the terminals or terminal symbols. In terms of syntactic analysis, this alphabet is the lexicon.
  2. A finite set of variables, also called nonterminals or syntactic categories. Each variable represents a class of strings, i.e., a set of strings.
  3. START symbol: one of the variables represents the language being defined. Other variables represent auxiliary classes of strings that are used to help define the language.
  4. A finite set of productions or rules that represent the recursive definition of a language. Each production consists of:
     4.1 A variable h (the head of the production)
     4.2 The production symbol →
     4.3 A string of zero or more terminals and variables. This string represents one way to form strings in the class of h: leave terminals unchanged and substitute for each variable any string in the class that variable represents.

Definition of Context-free Grammars
  The four components form a context-free grammar. We represent a CFG G by its four components, G = (V, T, P, S).
  1. V: variables
  2. T: terminals
  3. P: productions
  4. S: START
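
The four-tuple G = (V, T, P, S) maps naturally onto a small record type. The following is a minimal sketch, not a standard library API: the class name CFG, its field names and the problems method are assumptions made for illustration, and the toy grammar (S → a S b, S → empty string) is a textbook example rather than one from these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V: nonterminals / syntactic categories
    terminals: frozenset   # T: the alphabet (lexicon)
    productions: tuple     # P: pairs (head, body), body is a tuple of symbols
    start: str             # S: the START symbol

    def problems(self):
        """Return a list of violations of the definition (empty if well formed)."""
        issues = []
        if self.variables & self.terminals:
            issues.append("V and T overlap")
        if self.start not in self.variables:
            issues.append("start symbol is not a variable")
        for head, body in self.productions:
            if head not in self.variables:
                issues.append(f"head {head!r} is not a variable")
            for symbol in body:
                if symbol not in self.variables and symbol not in self.terminals:
                    issues.append(f"unknown symbol {symbol!r} in body of {head!r}")
        return issues


# Hypothetical toy grammar: S -> a S b | (empty body)
toy = CFG(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=(("S", ("a", "S", "b")), ("S", ())),
    start="S",
)

if __name__ == "__main__":
    print(toy.problems())
```

Allowing an empty tuple as a body reflects the "zero or more terminals and variables" clause of the definition.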

An Example
  1. V = {S, NP, VP, ADVP} ∪ {NN, AD, VV}
  2. T = {警察 (police), 正在 (progressive marker), 详细 (in detail), 调查 (investigate), 事故 (accident), 原因 (cause)}
  3. P:
       S → NP, VP
       VP → ADVP, VP
       VP → VV, NP
       NP → NN, NN
       NP → NN
       NN → 警察
       NN → 原因
       AD → 详细
       NN → 事故
       AD → 正在
       VV → 调查
  4. S
  Derivations
    We can infer the structure of a string.
    We can define the language of a grammar by applying the productions.
    S ⇒ NP, VP ⇒ NN, VP ⇒ 警察, VP ⇒ ...
    Tree for 警察 调查 原因 (labelled bracket notation): [S [NP [NN 警察]] [VP [VV 调查] [NN 原因]]]
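
The derivation on this slide can be replayed mechanically. The sketch below rewrites the leftmost variable at each step, using the productions as listed. Two things are assumptions rather than slide content: the rule ADVP → AD (it does not appear in the listed productions but is implied by the trees on the next slide) and the helper names rewrite_leftmost and derive.

```python
# Productions from the slide, keyed by head; each body is a tuple of symbols.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("ADVP", "VP"), ("VV", "NP")],
    "NP": [("NN", "NN"), ("NN",)],
    "ADVP": [("AD",)],  # assumption: not listed on the slide, implied by its trees
    "NN": [("警察",), ("原因",), ("事故",)],
    "AD": [("详细",), ("正在",)],
    "VV": [("调查",)],
}


def rewrite_leftmost(form, body):
    """Replace the leftmost variable in `form` by `body`, if a production allows it."""
    for i, symbol in enumerate(form):
        if symbol in RULES:
            if body not in RULES[symbol]:
                raise ValueError(f"no production {symbol} -> {' '.join(body)}")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no variable left to rewrite")


def derive(start, bodies):
    """Return every sentential form of a leftmost derivation."""
    forms = [(start,)]
    for body in bodies:
        forms.append(rewrite_leftmost(forms[-1], body))
    return forms


steps = [("NP", "VP"), ("NN",), ("警察",), ("VV", "NP"), ("调查",), ("NN",), ("原因",)]
forms = derive("S", steps)

if __name__ == "__main__":
    for form in forms:
        print(" ".join(form))
```

The first four printed forms are the ones shown on the slide (S; NP VP; NN VP; 警察 VP); the remaining steps complete the derivation of 警察 调查 原因 ("the police investigate the cause").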

An Example
  Example trees (labelled bracket notation):
    [S [NP [NN 警察]] [VP [ADVP [AD 正在]] [VP [VV 调查] [NN 原因]]]]
    [S [NP [NN 警察]] [VP [ADVP [AD 详细]] [VP [VV 调查] [NN 原因]]]]
    [S [NP [NN 警察]] [VP [VV 调查] [NN 原因]]]
    [S [NP [NN 警察]] [VP [ADVP [AD 正在]] [VP [ADVP [AD 详细]] [VP [VV 调查] [NP [NN 事故] [NN 原因]]]]]]
    [S [NP [NN 警察]] [VP [ADVP [AD 正在]] [VP [VV 调查] [NP [NN 事故] [NN 原因]]]]]
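
Going in the other direction, from a string to its trees, can be sketched with a small memoized span parser. This is one simple way to enumerate parses and is not the method of the lecture. The rule ADVP → AD is again an assumption inferred from the trees, the function names parses and bracket are illustrative, and the code assumes no production has an empty body.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP")],
    "VP": [("ADVP", "VP"), ("VV", "NP")],
    "NP": [("NN", "NN"), ("NN",)],
    "ADVP": [("AD",)],  # assumption: implied by the trees, not in the listed rules
    "NN": [("警察",), ("原因",), ("事故",)],
    "AD": [("详细",), ("正在",)],
    "VV": [("调查",)],
}


def parses(words, start="S"):
    """Return all trees rooted in `start` that cover exactly `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def span(symbol, i, j):
        # All trees rooted in `symbol` that cover words[i:j].
        if symbol not in RULES:  # terminal: must match a single word
            return (symbol,) if j == i + 1 and words[i] == symbol else ()
        trees = []
        for body in RULES[symbol]:
            for children in expand(body, i, j):
                trees.append((symbol,) + children)
        return tuple(trees)

    def expand(body, i, j):
        # All ways to distribute words[i:j] over the symbols of `body`.
        if not body:
            return [()] if i == j else []
        first, rest = body[0], body[1:]
        results = []
        # Every remaining symbol needs at least one word (no empty bodies).
        for k in range(i + 1, j - len(rest) + 1):
            for left in span(first, i, k):
                for right in expand(rest, k, j):
                    results.append((left,) + right)
        return results

    return list(span(start, 0, len(words)))


def bracket(tree):
    """Render a tree in labelled bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + tree[0] + " " + " ".join(bracket(c) for c in tree[1:]) + "]"


if __name__ == "__main__":
    for tree in parses("警察 正在 详细 调查 事故 原因".split()):
        print(bracket(tree))
```

Under the rules as listed, a single object noun is wrapped in an NP (via NP → NN), so the parser's output for the shorter strings has one more node than a drawing that attaches NN directly under VP.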

Reading
  Chap. 3. Syntax: A Generative Introduction.
  * Chap. 1. Aspects of the Theory of Syntax.