An Introduction to Natural Language Syntax

Rajat Mohanty (rkm@cse.iitb.ac.in)
CS-460/IT-632
Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Outline
  Grammatical Analysis
  Finite State Grammar
  Phrase Structure Grammar
  Transformational Grammar
  Natural Language Phenomena

A Ubiquitous Task for NLP
Sequence labeling can be done at different levels. In written text:
  Words
  Phrases
  Sentences
  Paragraphs

Names for Labeling Tasks
  Words: part-of-speech tagging
  Phrases: chunking
  Sentences: parsing
  Paragraphs: co-reference annotation

Example (Words: POS Tagging)
<s> The dispute shows clearly the global power of Japan's financial titans. </s>
<s> [ The/DT dispute/NN ] shows/VBZ clearly/RB [ the/DT global/JJ power/NN ] of/IN [ Japan/NNP 's/POS financial/JJ titans/NNS ] ./. </s>
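The word/TAG notation in the example above is easy to read back into a program. The following is a minimal sketch, assuming tokens are separated by spaces, square brackets mark phrase groups, and tags are normalized to upper case; the function name `parse_tagged` is hypothetical.

```python
def parse_tagged(text):
    """Return ((word, tag) pairs, bracketed word groups) from word/TAG text."""
    pairs, chunks, current = [], [], None
    for token in text.split():
        if token in ("<s>", "</s>"):      # sentence boundary markers
            continue
        if token == "[":                  # a bracketed group opens
            current = []
        elif token == "]":                # the group closes
            chunks.append(" ".join(current))
            current = None
        else:
            # Split on the LAST slash so that "./." becomes (".", ".").
            word, _, tag = token.rpartition("/")
            pairs.append((word, tag.upper()))
            if current is not None:
                current.append(word)
    return pairs, chunks


tagged = ("<s> [ The/DT dispute/NN ] shows/VBZ clearly/RB "
          "[ the/DT global/JJ power/NN ] of/IN "
          "[ Japan/NNP 's/POS financial/JJ titans/NNS ] ./. </s>")
pairs, chunks = parse_tagged(tagged)
```

Here `pairs` holds the word-level labels and `chunks` holds the bracketed groups, which correspond to the phrase-level (chunking) view of the same sentence.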

Example (Phrases: Chunking)
The dispute shows clearly the global power of Japan's financial titans.

Example (Sentences: Parsing)
( (S (NP-SBJ The dispute) (VP shows (ADVP-MNR clearly) (NP (NP the global power) (PP of (NP (NP Japan 's) financial titans)))) .))

Parse Tree
[Figure: parse tree of "The dispute shows the global power of Japan's financial titans", with node labels S, VP, Det, N, V, JJ, and PP.]

Example (Sentences: Co-referencing)
( (S (NP-SBJ-1 The banks) (VP (ADVP-MNR badly) want (S (NP-SBJ *-1) (VP to (VP break (PP into (NP (NP all aspects) (PP of (NP the securities business))))))))))

What is Grammar?
  A theory of language
  A theory of the competence of a native speaker (in the context of a natural language)
  A finite set of rules that generates all and only the sentences of a language, and that assigns an appropriate structural description to each one
  An explicit model of competence

What are the requirements?
An explicit model of competence:
  Should be able to generate an infinite set of grammatical sentences of the language
  Should not generate any ungrammatical ones
  Should be able to account for ambiguities (i.e., if a sentence is understood to have two meanings, the grammar should give two different structural descriptions)
  If two sentences are understood to have the same meaning, the grammar should give the same structure for both at some level
  If two sentences are understood to have different internal relationships, the grammar should assign different structural descriptions

What is Syntax?
  Syntax is the study of the combination of words into phrases, clauses and sentences.
  Syntax describes how sentences and their constituents are structured.

Grammatical Analysis Techniques
Two main devices:
  Breaking up a string
    Sequential
    Hierarchical
    Transformational
  Labeling the constituents
    Morphological
    Categorial
    Functional
A grammar may combine any of these devices for grammatical analysis.

Breaking up and Labeling
  Sequential Breaking up
    Sequential Breaking up and Morphological Labeling
    Sequential Breaking up and Categorial Labeling
    Sequential Breaking up and Functional Labeling
  Hierarchical Breaking up
    Hierarchical Breaking up and Categorial Labeling
    Hierarchical Breaking up and Functional Labeling

Sequential Breaking up
That student solved the problems.
that + student + solve + ed + the + problem + s

Sequential Breaking up and Morphological Labeling
That student solved the problems.
that  student  solve  ed     the   problem  s
word  word     stem   affix  word  stem     affix

Sequential Breaking up and Categorial Labeling
This boy can solve the problem.
this  boy  can  solve  the  problem
Det   N    Aux  V      Det  N
They called her a taxi.
They  call  ed     her   a    taxi
Pron  V     Affix  Pron  Det  N

Sequential Breaking up and Functional Labeling
Reading 1:
They     called  her            a taxi
Subject  Verbal  Direct Object  Indirect Object
Reading 2:
They     called  her              a taxi
Subject  Verbal  Indirect Object  Direct Object

Hierarchical Breaking up
Old men and women
  Analysis 1: [Old [men and women]]
  Analysis 2: [[Old men] and women]

Hierarchical Breaking up and Categorial Labeling
Poor John ran away.
[S [NP [A Poor] [N John]] [VP [V ran] [Adv away]]]

Hierarchical Breaking up and Functional Labeling
Immediate Constituent (IC) Analysis
Construction types in terms of the function of the constituents:
  Predication (subject + predicate)
  Modification (modifier + head)
  Complementation (verbal + complement)
  Subordination (subordinator + dependent unit)
  Coordination (independent unit + coordinator)

Predication
[Birds]subject [fly]predicate
[S [Subject Birds] [Predicate fly]]

Modification
[A]modifier [flower]head
John [slept]head [in the room]modifier
[S [Subject John] [Predicate [Head slept] [Modifier in the room]]]

Complementation
He [saw]verbal [a lake]complement
[S [Subject He] [Predicate [Verbal saw] [Complement a lake]]]

Subordination
John slept [in]subordinator [the room]dependent unit
[S [Subject John] [Predicate [Head slept] [Modifier [Subordinator in] [Dependent Unit the room]]]]

Coordination
[John came in time]independent unit [but]coordinator [Mary was not ready]independent unit
[S [Independent Unit John came in time] [Coordinator but] [Independent Unit Mary was not ready]]

An Example
In the morning, the sky looked much brighter.
[S [Modifier [Subordinator In] [DU [Modifier the] [Head morning]]] [Head [Subject [Modifier the] [Head sky]] [Predicate [Verbal looked] [Complement [Modifier much] [Head brighter]]]]]

Hierarchical Breaking up and Categorial / Functional Labeling
Hierarchical breaking up coupled with categorial/functional labeling is a very powerful device, but some ambiguities demand something more powerful.
E.g., "Love of God" can mean:
  Someone loves God
  God loves someone

Hierarchical Breaking up
Categorial Labeling: [Noun Phrase love [Prepositional Phrase of God]]
Functional Labeling: [[Head love] [Modifier [Sub of] [DU God]]]

Types of Generative Grammar
  Finite State Model: (sequential)
  Phrase Structure Model: (sequential + hierarchical) + (categorial)
  Transformational Model: (sequential + hierarchical + transformational) + (categorial + functional)

Finite State Model
[Figure: state diagrams whose transitions are labeled THE, OLD, MAN, MEN, COMES, COME.]
The machine begins in the initial state, runs through a sequence of states (producing a word with each transition), and ends in the final state (producing a sentence).
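The finite-state idea can be sketched as a transition table. The state names (`q0` to `q4`) and the exact layout, including the loop on OLD, are assumptions made for illustration; the slide's diagram did not survive extraction, so this is only one plausible machine over the same six words.

```python
# Hypothetical state layout: (current state, word) -> next state.
TRANSITIONS = {
    ("q0", "the"): "q1",
    ("q1", "old"): "q1",    # loop: any number of "old"
    ("q1", "man"): "q2",
    ("q1", "men"): "q3",
    ("q2", "comes"): "q4",
    ("q3", "come"): "q4",
}
START, FINAL = "q0", "q4"


def accepts(sentence):
    """True if the machine can read the sentence and end in the final state."""
    state = START
    for word in sentence.lower().split():
        state = TRANSITIONS.get((state, word))
        if state is None:
            return False
    return state == FINAL


def generate(max_words):
    """List every sentence of at most max_words words that the machine produces."""
    results, frontier = [], [(START, [])]
    while frontier:
        state, words = frontier.pop()
        if state == FINAL:
            results.append(" ".join(words))
            continue
        if len(words) == max_words:
            continue
        for (source, word), target in TRANSITIONS.items():
            if source == state:
                frontier.append((target, words + [word]))
    return sorted(results)
```

Because of the loop on "old", the machine produces an unbounded set of sentences from a finite table, yet it has no way to represent hierarchical structure.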

Phrase Structure Model

Phrase Structure Grammar (PSG)
A phrase-structure grammar G is a four-tuple (V, T, S, P), where
  V is a finite alphabet (or vocabulary). E.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP, student, sing, etc.
  T is a finite set of terminal symbols: T ⊆ V. E.g., student, sing, etc.
  S is a distinguished non-terminal symbol, also called the start symbol: S ∈ V
  P is a set of production rules
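As a rough illustration, the four-tuple can be written down as a small data structure with the membership conditions checked explicitly. The class name `PSG`, its field names, and the toy grammar are hypothetical; restricting each rule's left-hand side to a single non-terminal is an extra assumption (the context-free case), not something the definition above requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSG:
    vocabulary: frozenset   # V
    terminals: frozenset    # T
    start: str              # S
    productions: tuple      # P, as (lhs, rhs) pairs with rhs a tuple of symbols

    def nonterminals(self):
        return self.vocabulary - self.terminals

    def is_well_formed(self):
        if not self.terminals <= self.vocabulary:        # T must be a subset of V
            return False
        if self.start not in self.nonterminals():        # S is a non-terminal in V
            return False
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals():           # assumption: context-free rules
                return False
            if any(symbol not in self.vocabulary for symbol in rhs):
                return False
        return True


toy = PSG(
    vocabulary=frozenset({"S", "NP", "VP", "Det", "N", "V", "the", "student", "sing"}),
    terminals=frozenset({"the", "student", "sing"}),
    start="S",
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V",)),
        ("Det", ("the",)),
        ("N", ("student",)),
        ("V", ("sing",)),
    ),
)
```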

Noun Phrases
  John: [NP [N John]]
  the student: [NP [Det the] [N student]]
  the intelligent student: [NP [Det the] [AdjP intelligent] [N student]]

Noun Phrase
  his first five PhD students: [NP [Det his] [Ord first] [Quant five] [N PhD] [N students]]

Noun Phrase
  the five best students of my class: [NP [Det the] [Quant five] [AP best] [N students] [PP of my class]]

Verb Phrases
  can sing: [VP [Aux can] [V sing]]
  can hit the ball: [VP [Aux can] [V hit] [NP the ball]]

Verb Phrase
  can give a flower to Mary: [VP [Aux can] [V give] [NP a flower] [PP to Mary]]

Verb Phrase
  may make John the chairman: [VP [Aux may] [V make] [NP John] [NP the chairman]]

Verb Phrase
  may find the book very interesting: [VP [Aux may] [V find] [NP the book] [AP very interesting]]

Prepositional Phrases
  in the classroom: [PP [P in] [NP the classroom]]
  near the river: [PP [P near] [NP the river]]

Adjective Phrases
  intelligent: [AP [A intelligent]]
  very honest: [AP [Degree very] [A honest]]
  fond of sweets: [AP [A fond] [PP of sweets]]

Adjective Phrase
  very worried that she might have done badly in the assignment: [AP [Degree very] [A worried] [S that she might have done badly in the assignment]]

Phrase Structure Rules
The boy hit the ball.
Rewrite rules:
  1. S → NP VP
  2. NP → Det N
  3. VP → V NP
  4. Det → the
  5. N → boy, ball
  6. V → hit
We interpret each rule X → Y as the instruction "rewrite X as Y".

Derivation
The boy hit the ball.
  Sentence
  NP + VP                        (1) S → NP VP
  Det + N + VP                   (2) NP → Det N
  Det + N + V + NP               (3) VP → V NP
  The + N + V + NP               (4) Det → the
  The + boy + V + NP             (5) N → boy
  The + boy + hit + NP           (6) V → hit
  The + boy + hit + Det + N      (2) NP → Det N
  The + boy + hit + the + N      (4) Det → the
  The + boy + hit + the + ball   (5) N → ball
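The derivation can be replayed mechanically. This is a minimal sketch, assuming the rewrite rules are S → NP VP, NP → Det N, VP → V NP, Det → the, N → boy | ball, V → hit, and that each step rewrites the leftmost occurrence of the rule's left-hand symbol; the names `RULES`, `rewrite`, and `derive` are hypothetical.

```python
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["boy"], ["ball"]],
    "V": [["hit"]],
}


def rewrite(form, lhs, rhs):
    """Rewrite the leftmost occurrence of lhs in the sentential form as rhs."""
    if rhs not in RULES[lhs]:
        raise ValueError(f"no rule {lhs} -> {' '.join(rhs)}")
    position = form.index(lhs)
    return form[:position] + rhs + form[position + 1:]


def derive(steps, start="S"):
    """Apply the steps in order and return every intermediate sentential form."""
    form = [start]
    history = [form]
    for lhs, rhs in steps:
        form = rewrite(form, lhs, rhs)
        history.append(form)
    return history


# Same rule order as the derivation above: (1) (2) (3) (4) (5) (6) (2) (4) (5).
steps = [
    ("S", ["NP", "VP"]), ("NP", ["Det", "N"]), ("VP", ["V", "NP"]),
    ("Det", ["the"]), ("N", ["boy"]), ("V", ["hit"]),
    ("NP", ["Det", "N"]), ("Det", ["the"]), ("N", ["ball"]),
]
history = derive(steps)
for form in history:
    print(" + ".join(form))
```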

PSG Parse Tree
The boy hit the ball.
[S [NP [Det the] [N boy]] [VP [V hit] [NP [Det the] [N ball]]]]
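Going the other way, from a sentence to its tree, can be sketched with a small top-down parser. The grammar below assumes the same toy rules (S → NP VP, NP → Det N, VP → V NP, and the three lexical rules); the function names are hypothetical, and this naive strategy only terminates because the grammar has no left recursion.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["boy"], ["ball"]],
    "V": [["hit"]],
}


def parse(symbol, words, start):
    """Yield (tree, next_position) for each way symbol can cover words[start:...]."""
    if symbol not in GRAMMAR:                      # terminal: must match the word
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in expand(rhs, words, start):
            yield (symbol, *children), end


def expand(rhs, words, start):
    """Yield (list of subtrees, next_position) for a sequence of symbols."""
    if not rhs:
        yield [], start
        return
    for tree, middle in parse(rhs[0], words, start):
        for rest, end in expand(rhs[1:], words, middle):
            yield [tree] + rest, end


def parse_sentence(sentence):
    """Return all complete parse trees, as nested tuples, for the sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


trees = parse_sentence("The boy hit the ball")
```

Each tree is a nested tuple whose first element is the node label, so the single result mirrors the bracketed tree for "The boy hit the ball".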

PSG Parse Tree
John wrote those words in the Book of Proverbs.
[S [NP [PropN John]] [VP [V wrote] [NP those words] [PP [P in] [NP the Book [PP of Proverbs]]]]]

Transformational Model

Transformational Grammar
A generative grammar that makes use of all three kinds of breaking up (sequential, hierarchical, transformational) and of the two kinds of labeling (categorial, functional) is called a Transformational Grammar (Universal Grammar).

Other Grammar Formalisms
  Lexical Functional Grammar (LFG)
  Generalised Phrase Structure Grammar (GPSG)
  Tree Adjoining Grammar (TAG)
  Categorial Grammar (CG)
  Head-driven Phrase Structure Grammar (HPSG)
  Systemic Functional Grammar (SFG)

Levels of Representation in Universal Grammar (UG)
  Lexicon
  D(eep)-Structure
  Move-alpha
  S(urface)-Structure
  PF (phonetic form) and LF (logical form)

Interacting subsystems
UG consists of interacting subsystems:
  Various subcomponents of the rule system of grammar
  Subsystems of principles

Subcomponents
Subcomponents of the rule system:
  Lexicon
  Syntax
    Categorial component
    Transformational component
  PF-component
  LF-component

Principles
Subsystems of principles:
  X-bar Theory
  Theta-theory
  Government
  Binding Principles
  Case Theory
  Control Theory

Issues in Phrase Structure Grammar
  Limitation
    Overgeneration
  Solutions
    Subcategorization restrictions
    Selectional restrictions

Overgeneration
Ungrammaticality:
  The boy relied on the girl.
  *The boy relied the girl.
  *The boy relied.
Grammatically sound but semantically odd:
  *The boy frightens sincerity.
  *Sincerity kicked the boy.

Ungrammaticality
Given sentences:
  The boy relied on the girl.
  *The boy relied the girl.
  *The boy relied.
PS rules:
  VP → V (NP) (PP)
  NP → Det N
  V → rely
  Det → the
  N → boy, girl

Subcategorization Frame
  Specify the categorial class of the lexical item.
  Specify the environment.
Examples:
  kick: [V; _ NP]
  cry: [V; _ ]
  rely: [V; _ PP]
  put: [V; _ NP PP]
  think: [V; _ S`]

Subcategorization Frame
  forward: V, _ NP PP
    e.g., We will be forwarding our new catalogue to you
  invitation: N, _ PP
    e.g., An invitation to the party
  accessible: A, _ PP
    e.g., A program making science more accessible to young people

Subcategorization Rules
Subcategorization rule:
  V → y / { _ NP], _ PP], _ NP PP], _ ], _ S`] }

Applying Subcategorization Rules
The boy relied on the girl.
  1. S → NP VP
  2. VP → V (NP) (PP) (S`)
  3. NP → Det N
  4. V → rely / _ PP]
  5. P → on / _ NP]
  6. Det → the
  7. N → boy, girl
Excluded:
  *The boy relied the girl.
  *The boy relied.
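A subcategorization check can be sketched as a lookup of each verb's frame. The frames below follow the examples for kick, cry, rely, put, and think, with NP restored in the frames for kick and put (an assumption, since the extracted slide text appears to have lost that symbol); `FRAMES` and `licensed` are hypothetical names.

```python
# Each verb maps to the sequence of complement categories it requires.
FRAMES = {
    "kick": ["NP"],
    "cry": [],
    "rely": ["PP"],
    "put": ["NP", "PP"],
    "think": ["S`"],
}


def licensed(verb, complements):
    """True if the complement categories match the verb's frame exactly."""
    return FRAMES.get(verb) == list(complements)


examples = {
    "The boy relied on the girl": licensed("rely", ["PP"]),
    "*The boy relied the girl": licensed("rely", ["NP"]),
    "*The boy relied": licensed("rely", []),
}
for sentence, ok in examples.items():
    print(f"{sentence}: {'licensed' if ok else 'blocked'}")
```

An exact match is a simplification: it treats every complement in the frame as obligatory and allows no optional ones.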

Semantically Odd Constructions
Can we exclude these two ill-formed structures?
  *The boy frightened sincerity.
  *Sincerity kicked the boy.
Necessity of a mechanism

Selectional Restrictions
Inherent properties of nouns, e.g., [+/- ABSTRACT], [+/- ANIMATE]:
  Sincerity [+ABSTRACT]
  Boy [+ANIMATE]
Lexical information of this type can be used to set up a context-sensitive rewrite rule.

Selectional Rules
A selectional rule specifies certain selectional restrictions associated with a verb.
  V → y / [+/-ABSTRACT] __ [+/-ANIMATE]
  V → frighten / [+/-ABSTRACT] __ [+ANIMATE]
Excluded:
  *The boy frightened sincerity.
  *Sincerity kicked the boy.
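Selectional restrictions can be sketched as feature checks on a verb's subject and object. The feature values for "boy" and "sincerity" and the restriction on "frighten" (object must be [+ANIMATE]) follow the slides; the entry for "girl" and the restriction that "kick" needs a [+ANIMATE] subject are assumptions added to cover the second starred sentence.

```python
NOUNS = {
    "boy": {"ANIMATE": True, "ABSTRACT": False},
    "girl": {"ANIMATE": True, "ABSTRACT": False},       # assumed entry
    "sincerity": {"ANIMATE": False, "ABSTRACT": True},
}

# Required feature values per verb; an empty dict means "no restriction".
VERBS = {
    "frighten": {"subject": {}, "object": {"ANIMATE": True}},
    "kick": {"subject": {"ANIMATE": True}, "object": {}},  # assumed restriction
}


def satisfies(noun, required):
    """True if the noun carries every required feature value."""
    features = NOUNS[noun]
    return all(features.get(name) == value for name, value in required.items())


def well_formed(subject, verb, obj):
    """True if subject and object meet the verb's selectional restrictions."""
    restrictions = VERBS[verb]
    return (satisfies(subject, restrictions["subject"])
            and satisfies(obj, restrictions["object"]))
```

With these entries, "The boy frightened sincerity" and "Sincerity kicked the boy" are rejected, while "Sincerity frightened the boy" passes because the subject of "frighten" is unrestricted.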

Nature of Transformation
  Topicalization
    Topicalized NP
    Topicalized PP
  Movement
    Wh-movement
    Relative pronoun movement

Topicalization
I can solve this problem.
This problem, I can solve.
I can solve *(this problem).
[S [NP [Pron I]] [VP [Aux can] [V solve] [NP [Det the] [N problem]]]]

Topicalization
This problem, I can solve.
[S [NP_i [Det this] [N problem]] [NP [Pron I]] [VP [Aux can] [V solve] [NP t(race)_i]]]

Topicalization
To John, Mary gave the book.
[S [PP_i [P to] [NP [N John]]] [NP [N Mary]] [VP [V gave] [NP [Det the] [N book]] [PP t(race)_i]]]

Wh-movement
John can solve this problem.
Which problem can John solve?
[S [NP [N John]] [VP [Aux can] [V solve] [NP [Det this] [N problem]]]]

Wh-movement
[Which problem_i can John solve t_i?]
[S` [Comp [NP_i [Wh-Det which] [N problem]]] [S [Aux can] [NP [N John]] [VP [V solve] [NP t(race)_i]]]]

Relative Pronoun Movement
John heard the claim which Bill made.
[S [NP [N John]] [VP [V heard] [NP [Det the] [N claim_i] [S` ...]]]]

Relative Pronoun Movement
[the claim which_i Bill made t_i]
[NP [Det the] [N claim_i] [S` [Comp [Rel-Pron which_i]] [S [NP [N Bill]] [VP [V made] [NP t(race)_i]]]]]

Relative Pronoun Movement
[The problem_i that_i he solved t_i was easy]
[S [NP [Det the] [N problem_i] [S` [Comp [Rel-Pron that_i]] [S [NP [Pron he]] [VP [V solved] [NP t(race)_i]]]]] [VP [V was] [AP [A easy]]]]

Parser Output
The problem that he solved was easy.
(S (NP (DT the) (NN problem) (SBAR (IN that) (S (NP (PRP he)) (VP (VBD solved))))) (VP (AUX was) (ADJP (JJ easy))))

X-bar Theory
It tells us how words are combined to make phrases and sentences.
It captures the commonality between different types of phrases, which PS rules cannot.

X-bar Projection
[XP YP [X` X ZP]]
  XP: maximal projection
  X`: intermediate projection
  X: zero projection

X-bar Projection
[XP YP [X` X ZP]]
  XP: X-phrase
  YP: specifier
  X: head
  ZP: complement

X-bar Projection
[XP YP [X` [X` X ZP] ZP]]
  YP: specifier
  X: head
  inner ZP (sister of X): complement
  outer ZP (sister of X`): adjunct

X-bar Projection
John's solution to the problem
[NP [NP John's] [N` [N solution] [PP to the problem]]]

X-bar Projection
the discussion of the cricket match in the cabinet meeting
[NP [Det the] [N` [N` [N discussion] [PP of the cricket match]] [PP in the cabinet meeting]]]
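The projection schema is the same for every category, which is what makes it easy to state once as a function. The sketch below builds a labeled bracketing from a head, an optional specifier, an optional complement, and any number of adjuncts; the function name `project` and its string-based output are illustrative assumptions.

```python
def project(head, category, specifier=None, complement=None, adjuncts=()):
    """Build the X-bar bracketing for a head of the given category."""
    bar = f"{category}`"
    inner = f"[{category} {head}]"
    if complement:
        inner += f" {complement}"          # the complement is a sister of the head
    node = f"[{bar} {inner}]"
    for adjunct in adjuncts:
        node = f"[{bar} {node} {adjunct}]"  # each adjunct is a sister of X`
    parts = [specifier, node] if specifier else [node]
    return f"[{category}P " + " ".join(parts) + "]"


solution = project("solution", "N",
                   specifier="[NP John's]",
                   complement="[PP to the problem]")
discussion = project("discussion", "N",
                     specifier="[Det the]",
                     complement="[PP of the cricket match]",
                     adjuncts=["[PP in the cabinet meeting]"])
```

The same call with `category="V"` or `category="A"` yields a VP or AP of identical shape, which is the commonality that separate phrase structure rules fail to express.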

X-bar Theory
Possible orders:
  [Specifier-Head-Complement] SHC
  [Specifier-Complement-Head] SCH
  [Head-Complement-Specifier] HCS
Every phrase is endocentric.
There is a specific relation between the specifier and the head, i.e., the Spec-Head configuration.

C(onstituent)-command
C-command is a structural relation among the terminal and non-terminal nodes in a syntactic tree.
α c-commands β iff:
  the first branching node dominating α also dominates β
  α does not dominate β
[Figure: example tree with nodes A, B, E, C, D, F, G.]
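The definition translates directly into a check on a tree. The tree shape below (A over B and E, E over C and D, D over F and G) is an assumption, since only the node names survive from the slide's figure; the sketch also adds the usual proviso that a node does not c-command itself.

```python
# Assumed tree shape: A -> B E, E -> C D, D -> F G.
CHILDREN = {"A": ["B", "E"], "E": ["C", "D"], "D": ["F", "G"]}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def dominates(a, b):
    """True if a is a proper ancestor of b."""
    node = PARENT.get(b)
    while node is not None:
        if node == a:
            return True
        node = PARENT.get(node)
    return False


def first_branching_ancestor(a):
    """Return the nearest ancestor of a that has at least two children."""
    node = PARENT.get(a)
    while node is not None and len(CHILDREN[node]) < 2:
        node = PARENT.get(node)
    return node


def c_commands(a, b):
    """a c-commands b per the two clauses above (plus a != b)."""
    if a == b or dominates(a, b):
        return False
    ancestor = first_branching_ancestor(a)
    return ancestor is not None and dominates(ancestor, b)
```

In this tree B c-commands everything under E, but F c-commands only its sister G, because the first branching node above F is D.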

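C-command
[Figure: X-bar tree of a noun phrase headed by "discussion", with node labels Det, N`, N, PP, and P and the words "the", "discussion", "of the cricket match", and "the meeting", used to illustrate c-command.]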

Government
α governs β iff:
  α is a lexical head (or tensed I)
  α c-commands β
  no barrier (VP, NP, PP, AP, or tensed IP) intervenes between α and β

Theta-Theory
  Hit: <1,2> (argument structure), <Agent, Patient> (thematic structure)
  Smile: <1> (argument structure), <Agent> (thematic structure)
  Forward: <1,2,3> (argument structure), <Agent, Theme, Goal> (thematic structure)
Theta-Criterion:
  Each argument must be assigned a theta-role
  Each theta-role must be assigned to an argument
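The Theta-Criterion amounts to a one-to-one pairing of arguments and theta-roles. The sketch below approximates that by pairing arguments with roles in order and rejecting any mismatch in number; the theta grids come from the slide, while `assign_roles` and the positional pairing are simplifying assumptions.

```python
THETA_GRID = {
    "hit": ["Agent", "Patient"],
    "smile": ["Agent"],
    "forward": ["Agent", "Theme", "Goal"],
}


def assign_roles(verb, arguments):
    """Pair arguments with theta-roles one-to-one, or return None on failure.

    A leftover argument has no role and a leftover role has no argument;
    either case violates the Theta-Criterion.
    """
    roles = THETA_GRID[verb]
    if len(arguments) != len(roles):
        return None
    return dict(zip(roles, arguments))


ok = assign_roles("forward", ["the man", "the mail", "to the minister"])
too_many = assign_roles("smile", ["John", "Mary"])   # an argument without a role
too_few = assign_roles("hit", ["John"])              # a role without an argument
```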

Thematic Roles
The man forwarded the mail to the minister.
forward: V, _ NP PP
(Event FORWARD [Agent THE MAN], [Theme THE MAIL], [Goal TO THE MINISTER])

Binding Principles
A relation, called binding:
α binds β iff
  α c-commands β
  α and β are co-indexed
Rajiv_i likes himself_i.

Binding
[Figure: IP tree for "Rajiv likes himself", with node labels IP, I`, I (Tense, AGR), VP, V`, V, N`, and N; the anaphor "himself" carries the index i.]

Binding
[Figure: IP tree for "Rajiv's brother likes himself", with node labels IP, I`, I (Tense, AGR), VP, V`, V, N`, and N; the anaphor "himself" carries the index i.]

Binding
[Rajiv_i's brother]_j likes himself_*i/j
[Rajiv's brother] is the antecedent of [himself]. [Rajiv] cannot be the antecedent of [himself]. That is, the sentence cannot mean that Rajiv_i's brother likes Rajiv_i.
A particular kind of structural relation is maintained between [Rajiv's brother] and [himself], but not between [Rajiv] and [himself]. This structural relation is called C(onstituent)-command.
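The contrast can be checked by combining c-command with co-indexing. The tree below is a deliberately simplified assumption (IP over the subject NP and VP, with no I` or V` levels), and the node names are hypothetical; it is only meant to show why [Rajiv] fails to bind [himself] while [Rajiv's brother] succeeds.

```python
# Simplified, assumed structure for "Rajiv's brother likes himself".
CHILDREN = {
    "IP": ["NP_subject", "VP"],
    "NP_subject": ["NP_Rajiv", "N_brother"],
    "VP": ["V_likes", "NP_himself"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def dominates(a, b):
    """True if a is a proper ancestor of b."""
    node = PARENT.get(b)
    while node is not None:
        if node == a:
            return True
        node = PARENT.get(node)
    return False


def c_commands(a, b):
    """First branching node above a dominates b, and a does not dominate b."""
    if a == b or dominates(a, b):
        return False
    node = PARENT.get(a)
    while node is not None and len(CHILDREN[node]) < 2:
        node = PARENT.get(node)
    return node is not None and dominates(node, b)


def binds(a, b, index):
    """a binds b iff a c-commands b and the two carry the same index."""
    return index.get(a) is not None and index.get(a) == index.get(b) and c_commands(a, b)


# Reading j: himself = Rajiv's brother.  Reading i: himself = Rajiv.
reading_j = {"NP_subject": "j", "NP_Rajiv": "i", "NP_himself": "j"}
reading_i = {"NP_subject": "j", "NP_Rajiv": "i", "NP_himself": "i"}
```

Under `reading_j` the subject binds the anaphor; under `reading_i` nothing binds it, because the first branching node above [Rajiv] is the subject NP, which does not dominate [himself].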

Binding
For the purpose of interpretation, noun phrases have been conveniently divided into three groups:
  Anaphors (reflexives and reciprocals), e.g., myself, yourself, each other, one another, etc.
  Pronouns, e.g., he, she, it, we, etc.
  R-expressions, e.g., John, Mumbai

Binding Principles
  Principle A: An anaphor is bound in its governing category.
    Rajiv_i likes himself_i
  Principle B: A pronominal is free in its governing category.
    Rajiv_i likes him_*i/j
  Principle C: An R-expression is always free.
    John likes Mary
Examples:
  We think that nobody likes us.
  *We think that nobody likes ourselves.

Natural Language Phenomena
  Agreement
    Subject-verb agreement
    Agreement in relative pronouns (English):
      The man who/*which I saw
      The book which/*who I saw
  Ambiguity
    The mayor asked the police to stop drinking after midnight.
    Yesterday I saw a crane in the campus.
  Negation scope
    John did not deliberately break the glass.
    John deliberately did not break the glass.
  Quantifier scope
    Every student likes a teacher in the class.
  Gapping
    John bought a story book and Mary a pen.
    Meena was crying because her mother was.

Natural Language Phenomena
  Scrambling effect
  Slifting
    John has robbed the bank, I believe.
  Sluicing
    John bought something but I don't know what [John bought t].
  Question
    Auxiliary inversion
    Wh-fronting
    Intonation
    Wh-in situ
  Control structures
    I compelled John to read this article.
    I promised John to read this article.

Suggested Readings
  Chomsky, N. 1957. Syntactic Structures. Mouton, The Hague.
  Chomsky, N. 1981. Lectures on Government and Binding. Foris, Dordrecht.
  Radford, A. 1988. Transformational Grammar. CUP.
  Jurafsky, D. and J. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, New Jersey.
  Allen, James. 1995. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc.

Thank You