An Introduction to Natural Language Syntax Rajat Mohanty rkm@cse.iitb.ac.in CS-460/IT-632 Department of Computer Science and Engineering Indian Institute of Technology, Bombay
Outline Grammatical Analysis Finite State Grammar Phrase Structure Grammar Transformational Grammar Natural Language Phenomena
A Ubiquitous Task for NLP Sequence labeling can be applied at different levels of written text: words, phrases, sentences, and paragraphs.
Names for Labeling Tasks Words: part-of-speech tagging; Phrases: chunking; Sentences: parsing; Paragraphs: co-reference annotation
Example (Words: POS Tagging) <s> The dispute shows clearly the global power of Japan's financial titans. </s> <s> [ The/DT dispute/NN ] shows/VBZ clearly/RB [ the/DT global/JJ power/NN ] of/IN [ Japan/NNP 's/POS financial/JJ titans/NNS ] ./. </s>
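The tagged line above can be read mechanically. The following is a minimal sketch, assuming the `word/TAG` notation with `[` and `]` marking chunk boundaries; the function name `parse_tagged` is hypothetical, and the uppercase tags (including `NNP` for Japan) follow Penn Treebank conventions rather than the exact slide text.

```python
# Tagged sentence in word/TAG form; "[" and "]" delimit chunks.
TAGGED = ("[ The/DT dispute/NN ] shows/VBZ clearly/RB "
          "[ the/DT global/JJ power/NN ] of/IN "
          "[ Japan/NNP 's/POS financial/JJ titans/NNS ] ./.")


def parse_tagged(text):
    """Return (list of (word, tag) pairs, list of chunks as word lists)."""
    tokens, chunks, current = [], [], None
    for item in text.split():
        if item == "[":                # a chunk opens
            current = []
        elif item == "]":              # the chunk closes
            chunks.append(current)
            current = None
        else:
            # Split on the last "/" so that "./." yields (".", ".").
            word, _, tag = item.rpartition("/")
            tokens.append((word, tag))
            if current is not None:
                current.append(word)
    return tokens, chunks


tokens, chunks = parse_tagged(TAGGED)
for chunk in chunks:
    print(" ".join(chunk))
```

The same pass yields both labeling levels: the word-level tags and the bracketed chunks.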
Example (Phrases: Chunking) The dispute shows clearly the global power of Japan's financial titans
Example (Sentences: Parsing) ( (S (NP-SBJ The dispute) (VP shows (ADVP-MNR clearly) (NP (NP the global power) (PP of (NP (NP Japan 's) financial titans)))) .))
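A bracketed parse like the one above is a nested structure and can be read into a tree with a short recursive reader. This is a sketch under assumptions: the `NP` labels are restored on the assumption that they were lost from the slide, and `parse_brackets` and `leaves` are hypothetical helper names, not part of any treebank toolkit.

```python
def parse_brackets(text):
    """Parse a Treebank-style bracketed string into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):   # the outer wrapper may have no label
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())     # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                            # consume ")"
        return (label, children)

    return read()


def leaves(node):
    """Collect the words of a tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


tree = parse_brackets(
    "(S (NP-SBJ The dispute) (VP shows (ADVP-MNR clearly) "
    "(NP (NP the global power) (PP of (NP (NP Japan 's) financial titans)))) .)"
)
print(tree[0], " ".join(leaves(tree)))
```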
Parse Tree [S [NP [Det The] [N dispute]] [VP [V shows] [NP [NP [Det the] [JJ global] [N power]] [PP of Japan's financial titans]]]]
Example (Sentences: Co-referencing) ( (S (NP-SBJ-1 The banks) (VP (ADVP-MNR badly) want (S (NP-SBJ *-1) (VP to (VP break (PP into (NP (NP all aspects) (PP of (NP the securities business))))))))))
What is Grammar? A theory of language. A theory of the competence of a native speaker (in the context of a natural language). A finite set of rules that generates all and only the sentences of a language and assigns an appropriate structural description to each one. An explicit model of competence.
What are the requirements? A grammar, as an explicit model of competence, should: generate an infinite set of grammatical sentences of the language; not generate any ungrammatical ones; account for ambiguities (i.e., if a sentence is understood to have two meanings, the grammar should give two different structural descriptions); give the same structure at some level to two sentences that are understood to have the same meaning; assign different structural descriptions to two sentences that are understood to have different internal relationships.
What is Syntax? Syntax is the study of the combination of words into phrases, clauses and sentences Syntax describes how sentences and their constituents are structured
Grammatical Analysis Techniques Two main devices: (1) breaking up a string (sequential, hierarchical, transformational) and (2) labeling the constituents (morphological, categorial, functional). A grammar may combine any of these devices for grammatical analysis.
Breaking up and Labeling Sequential Breaking up Sequential Breaking up and Morphological Labeling Sequential Breaking up and Categorial Labeling Sequential Breaking up and Functional Labeling Hierarchical Breaking up Hierarchical Breaking up and Categorial Labeling Hierarchical Breaking up and Functional Labeling
Sequential Breaking up That student solved the problems. that + student + solve + ed + the + problem + s
Sequential Breaking up and Morphological Labeling That student solved the problems. that (word) + student (word) + solve (stem) + ed (affix) + the (word) + problem (stem) + s (affix)
Sequential Breaking up and Categorial Labeling This boy can solve the problem. this/Det boy/N can/Aux solve/V the/Det problem/N. They called her a taxi. They/Pron call/V ed/Affix her/Pron a/Det taxi/N
Sequential Breaking up and Functional Labeling They called her a taxi. Reading 1: They (Subject) called (Verbal) her (Direct Object) a taxi (Indirect Object). Reading 2: They (Subject) called (Verbal) her (Indirect Object) a taxi (Direct Object).
Hierarchical Breaking up Old men and women. Two ways of breaking up: [Old [men and women]] and [[Old men] and women]
Hierarchical Breaking up and Categorial Labeling Poor John ran away. [S [NP [A Poor] [N John]] [VP [V ran] [Adv away]]]
Hierarchical Breaking up and Functional Labeling Immediate Constituent (IC) Analysis Construction types in terms of the function of the constituents: Predication (subject + predicate) Modification (modifier + head) Complementation (verbal + complement) Subordination (subordinator + dependent unit) Coordination (independent unit + coordinator)
Predication [Birds] subject [fly] predicate. [S [Subject Birds] [Predicate fly]]
Modification [A] modifier [flower] head. John [slept] head [in the room] modifier. [S [Subject John] [Predicate [Head slept] [Modifier in the room]]]
Complementation He [saw] verbal [a lake] complement. [S [Subject He] [Predicate [Verbal saw] [Complement a lake]]]
Subordination John slept [in] subordinator [the room] dependent unit. [S [Subject John] [Predicate [Head slept] [Modifier [Subordinator in] [Dependent Unit the room]]]]
Coordination [John came in time] independent unit [but] coordinator [Mary was not ready] independent unit. [S [Independent Unit John came in time] [Coordinator but] [Independent Unit Mary was not ready]]
An Example In the morning, the sky looked much brighter. [S [Modifier [Subordinator In] [DU [Modifier the] [Head morning]]] [Head [Subject [Modifier the] [Head sky]] [Predicate [Verbal looked] [Complement [Modifier much] [Head brighter]]]]]
Hierarchical Breaking up and Categorial / Functional Labeling Hierarchical breaking up coupled with categorial/functional labeling is a very powerful device. But there are ambiguities which demand something more powerful. E.g., "love of God" can mean either "someone loves God" or "God loves someone".
Hierarchical Breaking up Love of God. Categorial labeling: [Noun Phrase love [Prepositional Phrase of God]]. Functional labeling: [Head love] [Modifier [Sub of] [DU God]]. Both readings receive the same analysis.
Types of Generative Grammar Finite State Model (sequential) Phrase Structure Model (sequential + hierarchical) + (categorial) Transformational Model (sequential + hierarchical + transformational) + (categorial + functional)
Finite State Model The machine begins in the initial state, runs through a sequence of states (producing a word with each transition), and ends in the final state (producing a sentence). [State diagram with transitions labeled THE, OLD, MAN, MEN, COMES, and COME, generating sentences such as THE MAN COMES and THE MEN COME]
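The machine described above can be sketched as a transition table. The state names (`q0` to `q3`, `qF`) and the treatment of OLD as a loop on the state reached after THE are assumptions, since the original diagram did not survive; the words are the ones on the slide.

```python
# Each state maps to a list of (word produced, next state) transitions.
TRANSITIONS = {
    "q0": [("the", "q1")],
    "q1": [("old", "q1"), ("man", "q2"), ("men", "q3")],  # "old" loops (assumed)
    "q2": [("comes", "qF")],
    "q3": [("come", "qF")],
}
FINAL = "qF"


def generate(max_words):
    """Return all sentences of at most max_words words, sorted."""
    results = []
    stack = [("q0", [])]
    while stack:
        state, words = stack.pop()
        if state == FINAL:
            results.append(" ".join(words))
            continue
        if len(words) >= max_words:
            continue                      # too long to finish; drop this path
        for word, nxt in TRANSITIONS.get(state, []):
            stack.append((nxt, words + [word]))
    return sorted(results)


def accepts(sentence):
    """Run the machine over a sentence, one transition per word."""
    state = "q0"
    for word in sentence.split():
        for w, nxt in TRANSITIONS.get(state, []):
            if w == word:
                state = nxt
                break
        else:
            return False                  # no transition produces this word
    return state == FINAL


print(generate(4))
print(accepts("the old old men come"), accepts("the man come"))
```

Because of the loop, the machine generates an infinite set of sentences from a finite number of states, while number agreement (man/comes, men/come) has to be encoded by duplicating states.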
Phrase Structure Model
Phrase Structure Grammar (PSG) A phrase-structure grammar G is a four-tuple (V, T, S, P), where: V is a finite alphabet (or vocabulary), e.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP, student, sing, etc.; T is a finite set of terminal symbols, T ⊆ V, e.g., student, sing, etc.; S is a distinguished non-terminal symbol, also called the start symbol, S ∈ V; P is a set of production rules.
Noun Phrases John: [NP [N John]]; the student: [NP [Det the] [N student]]; the intelligent student: [NP [Det the] [AdjP intelligent] [N student]]
Noun Phrase his first five PhD students: [NP [Det his] [Ord first] [Quant five] [N PhD] [N students]]
Noun Phrase the five best students of my class: [NP [Det the] [Quant five] [AP best] [N students] [PP of my class]]
Verb Phrases can sing: [VP [Aux can] [V sing]]; can hit the ball: [VP [Aux can] [V hit] [NP the ball]]
Verb Phrase can give a flower to Mary: [VP [Aux can] [V give] [NP a flower] [PP to Mary]]
Verb Phrase may make John the chairman: [VP [Aux may] [V make] [NP John] [NP the chairman]]
Verb Phrase may find the book very interesting: [VP [Aux may] [V find] [NP the book] [AP very interesting]]
Prepositional Phrases in the classroom: [PP [P in] [NP the classroom]]; near the river: [PP [P near] [NP the river]]
Adjective Phrases intelligent: [AP [A intelligent]]; very honest: [AP [Degree very] [A honest]]; fond of sweets: [AP [A fond] [PP of sweets]]
Adjective Phrase very worried that she might have done badly in the assignment: [AP [Degree very] [A worried] [S that she might have done badly in the assignment]]
Phrase Structure Rules The boy hit the ball. Rewrite Rules: 1. S → NP VP 2. NP → Det N 3. VP → V NP 4. Det → the 5. N → boy, ball 6. V → hit. We interpret each rule X → Y as the instruction "rewrite X as Y".
Derivation The boy hit the ball.
Sentence
NP + VP (1) S → NP VP
Det + N + VP (2) NP → Det N
Det + N + V + NP (3) VP → V NP
The + N + V + NP (4) Det → the
The + boy + V + NP (5) N → boy
The + boy + hit + NP (6) V → hit
The + boy + hit + Det + N (2) NP → Det N
The + boy + hit + the + N (4) Det → the
The + boy + hit + the + ball (5) N → ball
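The rewrite rules behind this derivation can be applied mechanically. The sketch below, with the hypothetical helper `expand`, enumerates every terminal string the six rules derive; it assumes the rules are S → NP VP, NP → Det N, VP → V NP, Det → the, N → boy | ball, V → hit, and it relies on the grammar being non-recursive so that the enumeration terminates.

```python
from itertools import product

# Left-hand side -> list of alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["boy"], ["ball"]],
    "V": [["hit"]],
}


def expand(symbol):
    """Return every word list derivable from symbol by rewriting."""
    if symbol not in RULES:               # terminal: nothing left to rewrite
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        parts = [expand(s) for s in rhs]  # expansions of each RHS symbol
        for combo in product(*parts):     # combine one expansion per symbol
            results.append([w for part in combo for w in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

The output contains "the boy hit the ball" together with three other strings, including "the ball hit the ball": the rules license every combination of the listed nouns.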
PSG Parse Tree The boy hit the ball. [S [NP [Det the] [N boy]] [VP [V hit] [NP [Det the] [N ball]]]]
PSG Parse Tree John wrote those words in the Book of Proverbs. [S [NP [PropN John]] [VP [V wrote] [NP those words] [PP [P in] [NP the book [PP of proverbs]]]]]
Transformational Model
Transformational Grammar A generative grammar that makes use of all three kinds of breaking up (sequential, hierarchical, and transformational) and both kinds of labeling (categorial and functional) is called a transformational grammar (Universal Grammar).
Other Grammar Formalisms Lexical Functional Grammar (LFG) Generalised Phrase Structure Grammar (GPSG) Tree Adjoining Grammar (TAG) Categorial Grammar (CG) Head-driven Phrase Structure Grammar (HPSG) Systemic Functional Grammar (SFG)
Levels of Representation in Universal Grammar (UG) Lexicon → D(eep)-Structure → (Move-α) → S(urface)-Structure → PF (phonetic form) and LF (logical form)
Interacting subsystems UG consists of interacting subsystems Various subcomponents of the rule system of grammar Subsystems of Principles
Subcomponents Subcomponents of the rule system Lexicon Syntax Categorial component Transformational component PF-component LF-component
Principles Subsystem of Principles X-bar Theory Theta-theory Government Binding Principles Case Theory Control Theory
Issues in Phrase Structure Grammar Limitation Overgeneration Solutions Subcategorization Restrictions Selectional Restriction
Overgeneration Ungrammaticality: The boy relied on the girl. *The boy relied the girl. *The boy relied. Grammatically sound but semantically odd: *The boy frightens sincerity. *Sincerity kicked the boy.
Ungrammaticality Given sentences: The boy relied on the girl. *The boy relied the girl. *The boy relied. PS Rules: VP → V (NP) (PP); NP → Det N; V → rely; Det → the; N → boy, girl
Subcategorization Frame Specify the categorial class of the lexical item. Specify the environment. Examples: kick: [V; _ NP] cry: [V; _ ] rely: [V; _ PP] put: [V; _ NP PP] think: [V; _ S']
Subcategorization Frame forward: V, _ NP PP, e.g., We will be forwarding our new catalogue to you. invitation: N, _ PP, e.g., an invitation to the party. accessible: A, _ PP, e.g., a program making science more accessible to young people.
Subcategorization Rules Subcategorization Rule: V → y / { _ NP], _ PP], _ NP PP], _ ], _ S'] }
Applying Subcategorization Rules The boy relied on the girl. 1. S → NP VP 2. VP → V (NP) (PP) (S') 3. NP → Det N 4. V → rely / _ PP] 5. P → on / _ NP] 6. Det → the 7. N → boy, girl. Under rule 4, rely can be inserted only before a PP, which excludes: *The boy relied the girl. *The boy relied.
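The effect of subcategorization frames can be sketched as a lookup that compares the complements found after a verb with the frame stored in its lexical entry. The dictionary `FRAMES` and the function `licensed` are hypothetical names, and the NP slots in the frames for kick and put are an assumption about the original slide.

```python
# Verb -> the complement categories it requires, in order.
FRAMES = {
    "kick": ["NP"],         # assumed: kick takes an NP object
    "cry": [],              # no complement
    "rely": ["PP"],
    "put": ["NP", "PP"],    # assumed: put takes NP and PP
    "think": ["S'"],
}


def licensed(verb, complements):
    """True if the complement categories match the verb's frame exactly."""
    return FRAMES.get(verb) == list(complements)


print(licensed("rely", ["PP"]))   # The boy relied on the girl.
print(licensed("rely", ["NP"]))   # *The boy relied the girl.
print(licensed("rely", []))       # *The boy relied.
```

Exact matching is a deliberate simplification: it treats every complement in a frame as obligatory and does not model optional elements.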
Semantically Odd Constructions Can we exclude these two ill-formed structures? *The boy frightened sincerity. *Sincerity kicked the boy. Necessity of a mechanism
Selectional Restrictions Inherent Properties of Nouns: E.g., [+/- ABSTRACT], [+/- ANIMATE] Sincerity [+ ABSTRACT] Boy [+ANIMATE] Lexical information of this type can be used to set up a context sensitive rewrite rule.
Selectional Rules A selectional rule specifies certain selectional restrictions associated with a verb. V → y / [+/-ABSTRACT] _ [+/-ANIMATE]; V → frighten / [+/-ABSTRACT] _ [+ANIMATE]. *The boy frightened sincerity. *Sincerity kicked the boy.
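A selectional rule of this kind can be sketched as a feature check on the subject and object of a verb. The feature values below go slightly beyond the slides and are assumptions: the slides give only sincerity [+ABSTRACT], boy [+ANIMATE], and the rule for frighten; the negative values and the requirement that kick takes an animate subject are added for illustration.

```python
# Inherent noun features (negative values are assumed).
NOUN_FEATURES = {
    "boy": {"ABSTRACT": False, "ANIMATE": True},
    "sincerity": {"ABSTRACT": True, "ANIMATE": False},
}

# Verb -> (features required of the subject, features required of the object).
SELECTION = {
    "frighten": ({}, {"ANIMATE": True}),   # any subject, animate object
    "kick": ({"ANIMATE": True}, {}),       # assumed: animate subject
}


def satisfies(noun, required):
    """True if the noun carries every required feature value."""
    features = NOUN_FEATURES[noun]
    return all(features[name] == value for name, value in required.items())


def well_formed(subject, verb, obj):
    """Check the verb's selectional restrictions on both arguments."""
    subject_req, object_req = SELECTION[verb]
    return satisfies(subject, subject_req) and satisfies(obj, object_req)


print(well_formed("boy", "frighten", "sincerity"))   # *The boy frightened sincerity.
print(well_formed("sincerity", "frighten", "boy"))   # Sincerity frightened the boy.
print(well_formed("sincerity", "kick", "boy"))       # *Sincerity kicked the boy.
```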
Nature of Transformation Topicalization: topicalized NP, topicalized PP. Movement: Wh-movement, relative pronoun movement.
Topicalization I can solve this problem. This problem, I can solve. I can solve *(this problem). [S [NP [Pron I]] [VP [Aux can] [V solve] [NP [Det this] [N problem]]]]
Topicalization This problem, I can solve. [S [NP_i [Det this] [N problem]] [NP [Pron I]] [VP [Aux can] [V solve] t(race)_i]]
Topicalization To John, Mary gave the book. [S [PP_i [P to] [NP [N John]]] [NP [N Mary]] [VP [V gave] [NP [Det the] [N book]] [PP t(race)_i]]]
Wh-movement John can solve this problem. Which problem can John solve? [S [NP [N John]] [VP [Aux can] [V solve] [NP [Det this] [N problem]]]]
Wh-movement [Which problem_i can John solve t_i?] [S' [Comp [NP_i [Wh-Det which] [N problem]]] [S [Aux can] [NP [N John]] [VP [V solve] t(race)_i]]]
Relative Pronoun Movement John heard the claim which Bill made. [S [NP [N John]] [VP [V heard] [NP [Det the] [N claim_i] [S' which Bill made]]]]
Relative Pronoun Movement [the claim which_i Bill made t_i]. [NP [Det the] [N claim_i] [S' [Comp [Rel-Pron which_i]] [S [NP [N Bill]] [VP [V made] t(race)_i]]]]
Relative Pronoun Movement [The problem_i that_i he solved t_i was easy]. [S [NP [Det the] [N problem_i] [S' [Comp [Rel-Pron that_i]] [S [NP [Pron he]] [VP [V solved] t(race)_i]]]] [VP [V was] [AP [A easy]]]]
Parser Output The problem that he solved was easy. (S (NP (NP (DT the) (NN problem)) (SBAR (IN that) (S (NP (PRP he)) (VP (VBD solved))))) (VP (AUX was) (ADJP (JJ easy))))
X-bar Theory It tells us how words are combined to make phrases and sentences. It captures the commonality between different types of phrases, which PS rules cannot.
X-bar Projection [XP YP [X' X ZP]], where XP is the maximal projection, X' the intermediate projection, and X the zero projection
X-bar Projection [XP (X-phrase) YP (Specifier) [X' X (Head) ZP (Complement)]]
X-bar Projection [XP YP (Specifier) [X' [X' X (Head) ZP (Complement)] ZP (Adjunct)]]
X-bar Projection [NP [NP John's] [N' [N solution] [PP to the problem]]]
X-bar Projection [NP [Det the] [N' [N' [N discussion] [PP of the cricket match]] [PP in the cabinet meeting]]]
X-bar Theory [Specifier-Head-Complement] SHC [Specifier-Complement-Head] SCH [Head-Complement-Specifier] HCS Every phrase is endocentric. There is a specific relation between the specifier and the head, i.e., Spec-Head configuration.
C(onstituent)-command C-command is a structural relation among the terminal and non-terminal nodes in a syntactic tree. α c-commands β iff: (i) the first branching node dominating α also dominates β; (ii) α does not dominate β. Example tree: [A [B C D] [E F G]]
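The two conditions of the definition translate directly into code. The sketch below assumes the example tree has A over B and E, B over C and D, and E over F and G (the diagram itself did not survive); `dominates`, `first_branching_ancestor`, and `c_commands` are hypothetical helper names.

```python
# Node -> list of its daughters (assumed shape of the example tree).
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "G"]}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}
NODES = ["A", "B", "C", "D", "E", "F", "G"]


def dominates(a, b):
    """True if a properly dominates b (a is an ancestor of b)."""
    node = PARENT.get(b)
    while node is not None:
        if node == a:
            return True
        node = PARENT.get(node)
    return False


def first_branching_ancestor(a):
    """Lowest node above a that has at least two daughters."""
    node = PARENT.get(a)
    while node is not None and len(CHILDREN.get(node, [])) < 2:
        node = PARENT.get(node)
    return node


def c_commands(a, b):
    """a c-commands b iff a does not dominate b and the first branching
    node dominating a also dominates b."""
    if a == b or dominates(a, b):
        return False
    ancestor = first_branching_ancestor(a)
    return ancestor is not None and dominates(ancestor, b)


print([n for n in NODES if c_commands("B", n)])
print([n for n in NODES if c_commands("C", n)])
```

Under this definition B c-commands its sister E and everything E dominates, while C c-commands only its sister D.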
C-command [NP [Det the] [N' [N' [N discussion] [PP [P of] [NP the cricket match]]] [PP in the meeting]]]
Government α governs β iff: (i) α is a lexical head (or tensed I); (ii) α c-commands β; (iii) no barrier (VP, NP, PP, AP, or tensed IP) intervenes between α and β.
Theta-Theory Hit: <1,2> (argument structure) <Agent, Patient> (thematic structure) Smile: <1> (argument structure) <Agent> (thematic structure) Forward: <1,2,3> (argument structure) <Agent, Theme, Goal> (thematic structure) Theta-Criterion Each argument must be assigned a theta-role Each theta-role must be assigned to an argument
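The Theta-Criterion asks for a one-to-one pairing between arguments and theta-roles. A minimal sketch, assuming that the arguments of a clause are given in the same order as the roles in the verb's thematic structure; `THETA_GRIDS` and `assign_roles` are hypothetical names, and the check only counts arguments rather than inspecting their structural positions.

```python
# Verb -> ordered thematic structure, as listed on the slide.
THETA_GRIDS = {
    "hit": ["Agent", "Patient"],
    "smile": ["Agent"],
    "forward": ["Agent", "Theme", "Goal"],
}


def assign_roles(verb, arguments):
    """Pair each argument with one theta-role, or return None if the
    Theta-Criterion fails (a role or an argument would be left over)."""
    roles = THETA_GRIDS[verb]
    if len(arguments) != len(roles):
        return None
    return dict(zip(roles, arguments))


print(assign_roles("forward", ["the man", "the mail", "to the minister"]))
print(assign_roles("smile", ["John", "the ball"]))   # extra argument, no role
print(assign_roles("hit", ["John"]))                 # Patient role unassigned
```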
Thematic Roles The man forwarded the mail to the minister. forward: V, _ NP PP (Event FORWARD [Agent THE MAN], [Theme THE MAIL], [Goal TO THE MINISTER])
Binding Principles A relation called binding: α binds β iff (i) α c-commands β and (ii) α and β are co-indexed. Rajiv_i likes himself_i.
Binding [IP [NP [N' [N Rajiv_i]]] [I' [I Tense AGR] [VP t [V' [V like] [NP [N' [N himself_i]]]]]]]
Binding [IP [NP Rajiv's brother] [I' [I Tense AGR] [VP t [V' [V like] [NP [N' [N himself_i]]]]]]]
Binding Rajiv_i's brother_j likes himself_*i/j. [Rajiv's brother] is the antecedent of [himself]. [Rajiv] cannot be the antecedent of [himself]. That is, the sentence cannot mean that Rajiv_i's brother likes Rajiv_i. A particular kind of structural relation holds between [Rajiv's brother] and [himself], but not between [Rajiv] and [himself]. This structural relation is called C(onstituent)-command.
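The contrast between the two candidate antecedents can be checked with the definition of binding (c-command plus co-indexing). The tree below is a simplified, assumed structure for "Rajiv's brother likes himself" (the subject-internal trace and the N' levels are omitted), and all node names are hypothetical labels.

```python
# Simplified tree: node -> daughters.
CHILDREN = {
    "IP": ["NP_subject", "I'"],
    "NP_subject": ["NP_Rajiv", "N_brother"],
    "I'": ["I", "VP"],
    "VP": ["V_like", "NP_himself"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def ancestors(node):
    """Nodes dominating node, nearest first."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def c_commands(a, b):
    """The first branching node dominating a dominates b; a does not dominate b."""
    if a == b or a in ancestors(b):
        return False
    branching = [n for n in ancestors(a) if len(CHILDREN[n]) > 1]
    return bool(branching) and branching[0] in ancestors(b)


def binds(a, b, index):
    """a binds b iff a c-commands b and the two carry the same index."""
    return index[a] == index[b] and c_commands(a, b)


# Reading 1: himself = the brother (index j). Reading 2: himself = Rajiv (index i).
reading_j = {"NP_subject": "j", "NP_Rajiv": "i", "NP_himself": "j"}
reading_i = {"NP_subject": "j", "NP_Rajiv": "i", "NP_himself": "i"}
print(binds("NP_subject", "NP_himself", reading_j))  # brother binds himself
print(binds("NP_Rajiv", "NP_himself", reading_i))    # Rajiv cannot bind himself
```

In the second reading the indices match, but the first branching node above [Rajiv] is the subject NP, which does not dominate [himself], so no binding relation holds.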
Binding For the purpose of interpretation, noun phrases have been conveniently divided into three groups: Anaphors (Reflexives and Reciprocals) e.g., myself, yourself, each other, one another, etc Pronouns e.g. he, she, it, we, etc R-Expressions e.g., John, Mumbai
Binding Principles Principle A: An anaphor is bound in its governing category: Rajiv_i likes himself_i. Principle B: A pronominal is free in its governing category: Rajiv_i likes him_*i/j. Principle C: An R-expression is always free: John likes Mary. Examples: We think that nobody likes us. *We think that nobody likes ourselves.
Natural Language Phenomena Agreement: subject-verb agreement; agreement in relative pronouns (English): The man who/*which I saw. The book which/*who I saw. Ambiguity: The mayor asked the police to stop drinking after midnight. Yesterday I saw a crane in the campus. Negation scope: John did not deliberately break the glass. John deliberately did not break the glass. Quantifier scope: Every student likes a teacher in the class. Gapping: John bought a story book and Mary a pen. Meena was crying because her mother was.
Natural Language Phenomena Scrambling effect. Slifting: John has robbed the bank, I believe. Sluicing: John bought something but I don't know what [John bought t]. Question: auxiliary inversion, wh-fronting, intonation, wh-in situ. Control structures: I compelled John to read this article. I promised John to read this article.
Suggested Readings Chomsky, N. 1957. Syntactic Structures. Mouton, The Hague. Chomsky, N. 1981. Lectures on Government and Binding. MIT, Mass. Radford, A. 1988. Transformational Grammar. CUP. Jurafsky, D. and J. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, New Jersey. Allen, J. 1995. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc. UK.