Constituency, Trees, Context-free Grammar Weiwei Sun Institute of Computer Science and Technology Peking University March 18, 2015

Administration
Grading:
- Regular attendance of the lectures is required
- 3-4 assignments
- Mid-term project
- Take-home exam
Bibliography:
- Andrew Carnie. Syntax: A Generative Introduction
- Mary Dalrymple. Lexical Functional Grammar
Course website: http://www.icst.pku.edu.cn/lcwm/course/fs/
Email: wsun106@163.com

Outline
- Introduction
- Constituency, Trees
- Context-free Grammar

What is this course about?
What aspects of language should be the focus of our linguistic study?
- Theory of Language Structure: structural properties of natural languages
- Theory of Language Acquisition: how children acquire their native language(s)
- Theory of Language Use: how linguistic and nonlinguistic knowledge interact in speech comprehension and production
Developing a theory of language structure is prior to the other two.
phonetics, phonology, morphology, syntax, semantics, pragmatics, ...
Syntax: how words are organized into phrases and sentences.

Syntax: What does it mean?
We can view syntax/syntactic theory in a number of ways, two of which are the following:
- Psychological model: syntactic structures correspond to what is in the heads of speakers and hearers
- Computational model: syntactic structures are formal objects which can be mathematically treated/manipulated
Syntax attempts to capture the nature of
- rules with which we generate strings of words (weak generative power)
- structures which license strings of words (strong generative power)

The Generative Revolution: Some History
- Writings on grammar go back at least 3000 years
- Ferdinand de Saussure
- Towards modern syntax
  - Structuralism (1920s-30s): Bloomfield
  - Distributionalism (1950s): Hockett, Harris
  - Categorial grammar (1930s): Ajdukiewicz
  - Dependency grammar (1930s): Tesnière
- Noam Chomsky's work in the 1950s radically changed linguistics, making syntax central.
- The theory we will study is in the tradition started by Chomsky, but diverges from his work in many ways.

The Generative Revolution: Main Tenets of Generative Grammar
- Grammars should be formulated precisely and explicitly.
- Grammars must be tested against invented data, not just attested examples.
- The theory of grammar is a theory of human linguistic abilities.
Chomsky (Syntactic Structures):
"By pushing a precise but inadequate formulation to an unacceptable conclusion, we can often expose the exact source of this inadequacy and, consequently, gain a deeper understanding of the linguistic data. [...] Obscure and intuition-bound notions can neither lead to absurd conclusions nor provide new and correct ones, [...]"

Generative Grammar
Aspects of the Theory of Syntax:
"A grammar of a language purports to be a description of the ideal speaker-hearer's intrinsic competence. If the grammar is, furthermore, perfectly explicit -- in other words, if it does not rely on the intelligence of the understanding reader but rather provides an explicit analysis of his contribution -- we may (somewhat redundantly) call it a generative grammar."

Generative Grammar: Chomsky's Syntactic Structures
Main task for the linguist: separate grammatical from ungrammatical strings.
Two issues:
1. How to define grammatical strings?
   - Corpus-based or statistical methods fail because of the creative nature of language
   - Grammaticality cannot be determined by meaningfulness
   - His proposed method: native speaker judgments
2. What kind of system can describe all grammatical strings of a language? It must
   - consist of a finite set of rules
   - be descriptively adequate
   - be explanatory

Descriptive Adequacy
Some researchers try to explain the underlying mechanisms, but we are most concerned with being able to describe linguistic phenomena, ideally:
- Providing accurate structural descriptions for well-formed sentences
- Giving an explicit encoding of a language
- Approaching broad coverage, i.e., aiming to describe all of the well-formed sentences possible in a language

Adequacy of a Linguistic Theory
How to test whether a linguistic theory is adequate?
- Can it account for all of the data?
- Can it account for the data in an elegant, straightforward way, or does it lead to extreme complexity?
- Can the same system be used to construct grammars for all languages?

Precise Encoding
Mathematical formalism
- Formal ways to generate sets of strings or structures
- Precisely define:
  - elementary structures
  - ways of combining those structures

Family Tree of Syntactic Theories
- Early Transformational Grammar (1955-1964)
- Standard Theory TG (1964-1967)
- Extended ST (1967-1977)
- Generative Semantics (1966-1975)
- Revised EST (1977-1981)
- GB (1981-1993)
- Minimalist Program (1993-present)
- GPSG (1979-1985)
- HPSG (1986-present)
- Realistic TG (1978-1980)
- LFG (1980-present)

Course schedule
- Context-free Grammar
- Government and Binding
  - Structural relations
  - X-bar theory
  - Constraining X-bar theory: Lexicon
  - Movement
- Lexical Functional Grammar
  - Functional structure
  - Constituent structure
  - Syntactic correspondences
  - Long-distance dependencies
  - Coordination
- Tree-Adjoining Grammar
- Head-driven Phrase-Structure Grammar
- Combinatory Categorial Grammar

Quote
Edward Sapir: "All grammars leak!"

Immediate Constituent Analysis (L. Bloomfield, N. Chomsky)
- A constituent is a word or a group of words that functions as a single unit within a hierarchical structure.
- Immediate Constituent Analysis divides up a sentence into major parts or immediate constituents, and these constituents are in turn divided into further immediate constituents.

Constituency Tests
A group of words is likely a constituent if it passes one of the following tests:
- Replacement: the group of words can be replaced with a single word.
- Stand alone: the group of words can stand alone in response to a question.
- Movement: the group of words can be moved around in the sentence.
- Coordination: the group of words can be coordinated with a similar group of words.
Sometimes, constituency tests fail!

Syntactic Category
We would like some way to say that two groups of words are of the same type. For this, we will talk about different categories.
- Lexical category (part of speech): how a word is going to function in a sentence
- Phrasal category
How to determine part of speech? Distributional criteria:
- Morphological distribution
- Syntactic distribution
How to determine phrasal category?

Phrase-structure Tree
The result of IC-analysis is often presented as a phrase-structure tree that reveals the hierarchical immediate constituent structure of the sentence.
Example (in labelled bracket notation):
[TP
  [NP [D The]
      [AdjP [AdvP [Adv very]] [Adj small]]
      [N boy]]
  [VP [V kissed]
      [NP [D the] [N platypus]]]]
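
To make the idea of hierarchical immediate constituents concrete, here is a minimal Python sketch of the example tree. The nested-tuple encoding and the helper names (`leaves`, `constituents`, `immediate_constituents`) are illustrative assumptions, not part of the lecture.

```python
# The example tree as nested tuples: (label, child, child, ...).
# A word is a plain string; this encoding is an assumption made for illustration.
TREE = (
    "TP",
    ("NP",
        ("D", "The"),
        ("AdjP", ("AdvP", ("Adv", "very")), ("Adj", "small")),
        ("N", "boy")),
    ("VP",
        ("V", "kissed"),
        ("NP", ("D", "the"), ("N", "platypus"))),
)


def leaves(node):
    """Return the words dominated by a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def immediate_constituents(node):
    """Return the word string of each daughter of a node."""
    return [" ".join(leaves(child)) for child in node[1:]]


def constituents(node):
    """Return (label, word string) for every labelled node in the tree."""
    if isinstance(node, str):
        return []
    found = [(node[0], " ".join(leaves(node)))]
    for child in node[1:]:
        found.extend(constituents(child))
    return found


if __name__ == "__main__":
    print(immediate_constituents(TREE))
    for label, words in constituents(TREE):
        print(label, "->", words)
```

Calling `immediate_constituents` on the root gives the first cut of the IC-analysis (subject NP and VP); applying it again to each daughter repeats the division one level down.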

How to Draw a Tree
Bottom-up:
1. Identify the parts of speech.
2. Identify what modifies what.
3. Start linking together items that modify one another.
4. Determine the phrasal categories.
5. Keep applying the rules until you have attached all the modifiers to the modified constituents.
How to perform a top-down procedure?

Context-free Phrase-structure Grammar
A context-free phrase-structure grammar provides a simple and mathematically precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks.
- The block structure of sentences is captured in a natural way.
- The basic recursive structure of sentences is described exactly.

Phrase Structure Grammar
- The formalism of context-free grammars was developed in the mid-1950s by Noam Chomsky.
- Phrase structure grammars are also known as constituency grammars.
- There are probably languages that cannot be described by a context-free grammar (CFG).
  - Shown in the 1980s to be correct, at least for Swiss German
  - English may be within the descriptive power of a CFG
  - But there may be other reasons beyond formal power to reject CFGs for representing natural languages...
- CFGs account for the tree-like structure that sentences have.

Definition of Context-free Grammars
Four components in a grammatical description of a language:
1. A finite set of symbols that form the strings of a language. We call this alphabet the terminals or terminal symbols. In terms of syntactic analysis, this alphabet is the lexicon.
2. A finite set of variables, also called nonterminals or syntactic categories. Each variable represents a class of strings, i.e., a set of strings.
3. START symbol: one of the variables represents the language being defined. Other variables represent auxiliary classes of strings that are used to help define the language.
4. A finite set of productions or rules that represent the recursive definition of a language. Each production consists of:
   4.1 A variable h
   4.2 The production symbol →
   4.3 A string of zero or more terminals and variables. This string represents one way to form strings in the class of h:
       - Leave terminals unchanged
       - Substitute for each variable any string in its class

Definition of Context-free Grammars
The four components form a context-free grammar. We represent a CFG G by its four components, G = (V, T, P, S).
1. V: variables
2. T: terminals
3. P: productions
4. S: START
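
The four-tuple G = (V, T, P, S) maps directly onto a small data structure. The following is a minimal Python sketch; the class name `CFG`, its field names, the method `is_well_formed`, and the toy grammar `TOY` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar G = (V, T, P, S)."""
    variables: frozenset   # V
    terminals: frozenset   # T
    productions: tuple     # P: pairs (head, body), where body is a tuple of symbols
    start: str             # S

    def is_well_formed(self):
        """Check that the four components fit together."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            # V and T are kept disjoint (standard requirement, not spelled out on the slide)
            and not (self.variables & self.terminals)
            # every head is a variable; every body is a string over V and T
            and all(
                head in self.variables and all(sym in symbols for sym in body)
                for head, body in self.productions
            )
        )


# Toy grammar (an assumption for illustration): X -> a X b | empty string
TOY = CFG(
    variables=frozenset({"X"}),
    terminals=frozenset({"a", "b"}),
    productions=(("X", ("a", "X", "b")), ("X", ())),
    start="X",
)

if __name__ == "__main__":
    print(TOY.is_well_formed())
```

The empty tuple as a body covers the "zero or more terminals and variables" case from the definition.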

An Example
1. V = {S, NP, VP, ADVP} ∪ {NN, AD, VV}
2. T = {警察, 正在, 详细, 调查, 事故, 原因}
   Glosses: 警察 'police', 正在 (progressive marker), 详细 'in detail', 调查 'investigate', 事故 'accident', 原因 'cause'
3. P:
   S → NP, VP
   VP → ADVP, VP
   VP → VV, NP
   NP → NN, NN
   NP → NN
   NN → 警察    NN → 原因    NN → 事故
   AD → 详细    AD → 正在
   VV → 调查
4. S
Derivations
- We can infer the structure of a string.
- We can define the language of a grammar by applying the productions.
  S ⇒ NP, VP ⇒ NN, VP ⇒ 警察, VP ⇒ ...
Resulting tree: [S [NP [NN 警察]] [VP [VV 调查] [NN 原因]]]
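
The derivation idea can be sketched in a few lines of Python: start from S and repeatedly rewrite the leftmost variable using one of its productions. The dictionary `RULES` encodes only the productions listed on the slide; the function name `leftmost_derivation` and the encoding of a derivation as a list of rule indices are assumptions made for illustration.

```python
# Productions from the slide, as head -> list of alternative bodies.
# ADVP has no production listed on the slide, so it has no entry here.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("ADVP", "VP"), ("VV", "NP")],
    "NP": [("NN", "NN"), ("NN",)],
    "NN": [("警察",), ("原因",), ("事故",)],
    "AD": [("详细",), ("正在",)],
    "VV": [("调查",)],
}


def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost variable once per choice.

    `choices` holds, for each step, the index of the production to use for
    the leftmost variable. Returns the list of all sentential forms.
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        # position of the leftmost symbol that has productions
        pos = next(i for i, sym in enumerate(form) if sym in RULES)
        form[pos:pos + 1] = RULES[form[pos]][choice]
        steps.append(list(form))
    return steps


if __name__ == "__main__":
    # S => NP VP => NN VP => 警察 VP => 警察 VV NP => ... => 警察 调查 原因
    for form in leftmost_derivation([0, 1, 0, 1, 0, 1, 1]):
        print(" ".join(form))
```

Each printed line is one sentential form; the last line contains only terminals, so it is a string in the language of the grammar.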

An Example
Trees for five sentences (in labelled bracket notation):
1. [S [NP [NN 警察]] [VP [ADVP [AD 正在]] [VP [VV 调查] [NN 原因]]]]
2. [S [NP [NN 警察]] [VP [ADVP [AD 详细]] [VP [VV 调查] [NN 原因]]]]
3. [S [NP [NN 警察]] [VP [VV 调查] [NN 原因]]]
4. [S [NP [NN 警察]] [VP [ADVP [AD 正在]] [VP [ADVP [AD 详细]] [VP [VV 调查] [NP [NN 事故] [NN 原因]]]]]]
5. [S [NP [NN 警察]] [VP [ADVP [AD 正在]] [VP [VV 调查] [NP [NN 事故] [NN 原因]]]]]
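
Going from a string back to its tree can be sketched as a small exhaustive parser over the example grammar. This is a minimal sketch, not an efficient parsing algorithm: it tries every split point for binary rules and caches results per span. The rule ADVP → AD is an assumption added here because the trees use it although it is not listed among the productions; the function name `parses` and the tuple encoding of trees are likewise illustrative.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP")],
    "VP": [("ADVP", "VP"), ("VV", "NP")],
    "NP": [("NN", "NN"), ("NN",)],
    "ADVP": [("AD",)],  # assumption: used in the trees, not listed in P
    "NN": [("警察",), ("原因",), ("事故",)],
    "AD": [("详细",), ("正在",)],
    "VV": [("调查",)],
}


def parses(words, start="S"):
    """Return every tree rooted in `start` that covers the whole word list."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def span(symbol, i, j):
        """All trees rooted in `symbol` that cover words[i:j]."""
        if symbol not in RULES:  # terminal: must match exactly one word
            return (symbol,) if j - i == 1 and words[i] == symbol else ()
        trees = []
        for body in RULES[symbol]:
            if len(body) == 1:  # unary rule: same span, one daughter
                trees += [(symbol, t) for t in span(body[0], i, j)]
            else:               # binary rule: try every split point
                left, right = body
                for k in range(i + 1, j):
                    trees += [(symbol, l, r)
                              for l in span(left, i, k)
                              for r in span(right, k, j)]
        return tuple(trees)

    return span(start, 0, len(words))


if __name__ == "__main__":
    for tree in parses("警察 正在 详细 调查 事故 原因".split()):
        print(tree)
```

Because the parser follows the listed productions strictly, the object of 调查 is always wrapped in an NP node (via NP → NN), whereas the first three trees above attach NN directly under VP.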

Reading
- Chap. 3. Syntax: A Generative Introduction.
- * Chap. 1. Aspects of the Theory of Syntax.