CMPT-825 Natural Language Processing


CMPT-825 Natural Language Processing
Anoop Sarkar
http://www.cs.sfu.ca/~anoop

Natural Language and Complexity
Formal language theory in computer science is a way to quantify computation. From regular expressions to Turing machines, we obtain a hierarchy of recursion. We can similarly use formal languages to describe the set of human languages. Usually we abstract away from the individual words in the language and concentrate on general aspects of the language.

Natural Language and Complexity
We ask the question: does a particular formal language describe some aspect of human language? Then we find out whether that language is in a particular language class. For example, if we abstract some aspect of human language to the formal language {ww^R | w ∈ {a, b}*}, where w^R is the reverse of w, we can then ask if it is possible to write a regular expression for this language. If we can, then we can say that this particular example from human language does not go beyond regular languages. If not, then we have to go higher in the hierarchy (say, up to context-free languages).

The Chomsky Hierarchy
- unrestricted or type-0 grammars: generate the recursively enumerable languages; automata: Turing Machines
- context-sensitive grammars: generate the context-sensitive languages; automata: Linear Bounded Automata
- context-free grammars: generate the context-free languages; automata: Pushdown Automata
- regular grammars: generate the regular languages; automata: Finite-State Automata

The Chomsky Hierarchy
For a grammar G = (V, T, P, S), where α, β, γ ∈ (V ∪ T)*:
- unrestricted or type-0 grammars: α → β, such that α ≠ ɛ
- context-sensitive grammars: αAβ → αγβ, such that γ ≠ ɛ
- context-free grammars: A → γ
- regular grammars: A → a B or A → a
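To make these rule-form constraints concrete, here is a small sketch (my own illustration, not from the notes) that checks which level of the hierarchy a set of productions satisfies. The encoding of productions as pairs of symbol tuples, and the use of uppercase strings for nonterminals, are assumptions made for the example; the context-sensitive check uses the non-contracting characterization, which is weakly equivalent to the αAβ → αγβ form above.

```python
# Sketch: classify a grammar by the most restrictive level of the Chomsky
# hierarchy whose rule-form constraints all of its productions satisfy.
# Assumed encoding: a production is a pair (lhs, rhs) of symbol tuples,
# and uppercase strings are nonterminals.

def is_nonterminal(sym):
    return sym.isupper()

def rule_tests():
    def regular(lhs, rhs):
        # A -> a B  or  A -> a   (right-linear)
        return (len(lhs) == 1 and is_nonterminal(lhs[0])
                and 1 <= len(rhs) <= 2 and not is_nonterminal(rhs[0])
                and (len(rhs) == 1 or is_nonterminal(rhs[1])))

    def context_free(lhs, rhs):
        # A -> gamma
        return len(lhs) == 1 and is_nonterminal(lhs[0])

    def context_sensitive(lhs, rhs):
        # non-contracting characterization, weakly equivalent to the
        # alpha A beta -> alpha gamma beta form given above
        return 0 < len(lhs) <= len(rhs)

    def unrestricted(lhs, rhs):
        # alpha -> beta, with alpha non-empty
        return len(lhs) >= 1

    return [("regular", regular), ("context-free", context_free),
            ("context-sensitive", context_sensitive),
            ("unrestricted", unrestricted)]

def classify(productions):
    for name, test in rule_tests():
        if all(test(lhs, rhs) for lhs, rhs in productions):
            return name
    return "not a grammar"

# S -> a S b | a b  is context-free but not right-linear:
anbn = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
print(classify(anbn))    # context-free

# A -> a A | a  is right-linear, hence regular:
astar = [(("A",), ("a", "A")), (("A",), ("a",))]
print(classify(astar))   # regular
```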

Regular grammars: right-linear CFG
L(G) = {a^m b^n | m, n ≥ 0}
A → a A
A → ɛ
A → b B
B → b B
B → ɛ

Context-free grammars
L(G) = {a^n b^n | n ≥ 0}
S → a S b
S → ɛ
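The difference between the last two grammars shows up in the machines they need. The sketch below (mine, not from the notes) contrasts a two-state recognizer, which suffices for the regular language a^m b^n, with a counter-based recognizer for a^n b^n, where the counter stands in for the pushdown stack that a finite automaton lacks.

```python
# Finite-state recognizer for L = { a^m b^n | m, n >= 0 }:
# two states are enough, no counting is required.
def accepts_regular(s):
    state = "A"                    # state A: still reading a's
    for ch in s:
        if state == "A" and ch == "a":
            state = "A"
        elif ch == "b":            # switch to (or stay in) the b-reading state
            state = "B"
        else:
            return False           # an 'a' after a 'b', or a foreign symbol
    return True

# Recognizer for L = { a^n b^n | n >= 0 }: the counter plays the role of a
# pushdown stack holding one kind of symbol; finite memory is not enough.
def accepts_anbn(s):
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a" and not seen_b:
            count += 1             # push
        elif ch == "b" and count > 0:
            seen_b = True
            count -= 1             # pop
        else:
            return False
    return count == 0

print(accepts_regular("aabbb"), accepts_anbn("aabbb"))  # True False
print(accepts_anbn("aabb"))                             # True
```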

Dependency Grammar
[Figure: two dependency diagrams for "Calvin imagined monsters in school", with arcs labeled SBJ, OBJ, MOD and OBJ.]

Dependency Grammar: (Tesnière, 1959), (Panini)

1  Calvin    2      SBJ
2  imagined  -      TOP
3  monsters  2      OBJ
4  in        {2,3}  MOD
5  school    4      OBJ

If the dependencies are nested then DGs are (formally) equivalent to CFGs:
1. TOP(imagined) → SBJ(Calvin) imagined OBJ(monsters) MOD(in)
2. MOD(in) → in OBJ(school)
However, each rule is lexicalized (has a terminal symbol).
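The equivalence claim can be illustrated with a short sketch (my own; the table encoding is an assumption): given (position, word, head, label) entries for a nested analysis, each head contributes one lexicalized CFG rule listing its dependents in surface order, and the output reproduces the two rules above.

```python
# Sketch: turn a (projective) dependency analysis into lexicalized CFG rules.
# Each entry: (position, word, head position or None for the root, label).
deps = [
    (1, "Calvin",   2,    "SBJ"),
    (2, "imagined", None, "TOP"),
    (3, "monsters", 2,    "OBJ"),
    (4, "in",       2,    "MOD"),   # attaching "in" to "imagined" (one reading)
    (5, "school",   4,    "OBJ"),
]

def to_cfg_rules(deps):
    by_pos = {p: (w, h, l) for p, w, h, l in deps}
    rules = []
    for pos, word, head, label in deps:
        kids = [p for p, w, h, l in deps if h == pos]   # dependents of this word
        if not kids:
            continue                       # leaves need no rule of their own
        rhs = []
        for p in sorted([pos] + kids):     # surface order
            if p == pos:
                rhs.append(word)           # the lexical head itself
            else:
                w, _, l = by_pos[p]
                rhs.append(f"{l}({w})")    # a nonterminal for the dependent
        rules.append(f"{label}({word}) -> " + " ".join(rhs))
    return rules

for r in to_cfg_rules(deps):
    print(r)
# TOP(imagined) -> SBJ(Calvin) imagined OBJ(monsters) MOD(in)
# MOD(in) -> in OBJ(school)
```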

Categorial Grammar (Ajdukiewicz, 1935)
Calvin: NP    hates: (S\NP)/NP    mangoes: NP
hates mangoes: S\NP (forward application)
Calvin hates mangoes: S (backward application)
Also equivalent to CFGs. Similar to DGs, each rule in CG is lexicalized.
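A tiny sketch (mine, not from the notes) of how forward application (X/Y Y ⇒ X) and backward application (Y X\Y ⇒ X) derive S for this sentence; the string encoding of categories and the greedy combination strategy are assumptions made for illustration, and only atomic arguments are handled.

```python
# Sketch: categorial grammar with forward (X/Y Y => X) and backward
# (Y X\Y => X) application; arguments are assumed atomic (NP, S, ...).

def strip_parens(cat):
    # drop one redundant outer pair of parentheses, e.g. "(S\NP)" -> "S\NP"
    if cat.startswith("(") and cat.endswith(")"):
        depth = 0
        for i, ch in enumerate(cat):
            depth += (ch == "(") - (ch == ")")
            if depth == 0 and i < len(cat) - 1:
                return cat          # outer parens are not one matching pair
        return cat[1:-1]
    return cat

def apply_pair(left, right):
    if left.endswith("/" + right):            # forward: X/Y  Y  =>  X
        return strip_parens(left[: -len("/" + right)])
    if right.endswith("\\" + left):           # backward: Y  X\Y  =>  X
        return strip_parens(right[: -len("\\" + left)])
    return None

lexicon = {"Calvin": "NP", "hates": "(S\\NP)/NP", "mangoes": "NP"}
cats = [lexicon[w] for w in "Calvin hates mangoes".split()]

changed = True
while changed and len(cats) > 1:              # combine adjacent categories
    changed = False
    for i in range(len(cats) - 1):
        result = apply_pair(cats[i], cats[i + 1])
        if result is not None:
            cats[i:i + 2] = [result]
            changed = True
            break

print(cats)   # ['S']
```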

Context-sensitive grammars: L(G) = {a^n b^n | n ≥ 1}
S → S B C
S → a C
a B → a a
C B → B C
B a → a a
C → b

Context-sensitive grammars: L(G) = {a^n b^n | n ≥ 1}
A derivation (the subscripts record which a each b is paired with):
S_1
⇒ S_2 B_1 C_1
⇒ S_3 B_2 C_2 B_1 C_1
⇒ a_3 C_3 B_2 C_2 B_1 C_1
⇒ a_3 B_2 C_3 C_2 B_1 C_1
⇒ a_3 a_2 C_3 C_2 B_1 C_1
⇒ a_3 a_2 C_3 B_1 C_2 C_1
⇒ a_3 a_2 B_1 C_3 C_2 C_1
⇒ a_3 a_2 a_1 C_3 C_2 C_1
⇒ a_3 a_2 a_1 b_3 b_2 b_1
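Because every rule above is non-contracting, a brute-force search over sentential forms no longer than the target string is guaranteed to terminate. The following sketch (my own, not from the notes) uses that fact to check which strings the grammar derives.

```python
# Sketch: brute-force derivation search for the context-sensitive grammar
# above.  Every rule is non-contracting, so any sentential form longer than
# the target string can be pruned and the search terminates.

RULES = [
    ("S",  "SBC"),
    ("S",  "aC"),
    ("aB", "aa"),
    ("CB", "BC"),
    ("Ba", "aa"),
    ("C",  "b"),
]

def derivable(target, start="S"):
    seen = set()
    stack = [start]
    while stack:
        form = stack.pop()
        if form == target:
            return True
        if form in seen or len(form) > len(target):
            continue
        seen.add(form)
        for lhs, rhs in RULES:
            i = form.find(lhs)
            while i != -1:                       # try the rule at every position
                stack.append(form[:i] + rhs + form[i + len(lhs):])
                i = form.find(lhs, i + 1)
    return False

for s in ["ab", "aabb", "aaabbb", "aabbb", "ba"]:
    print(s, derivable(s))   # ab True, aabb True, aaabbb True, aabbb False, ba False
```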

Unrestricted grammars: L(G) = {a^(2^i) | i ≥ 1}
S → A C a B
C a → a a C
C B → D B
C B → E
a D → D a
A D → A C
a E → E a
A E → ɛ

Unrestricted grammars: L(G) = {a^(2^i) | i ≥ 1}
A derivation of aa:
S
⇒ A C a B
⇒ A a a C B
⇒ A a a E
⇒ A a E a
⇒ A E a a
⇒ a a

Unrestricted grammars: L(G) = {a^(2^i) | i ≥ 1}
A and B serve as left and right end-markers for sentential forms (the derivation of each string). C is a marker that moves through the string of a's between A and B, doubling their number using C a → a a C. When C hits the right end-marker B, it becomes a D or an E by C B → D B or C B → E. If a D is chosen, that D migrates left using a D → D a until the left end-marker A is reached.

At that point D becomes C using A D → A C and the process starts over. Finally, E migrates left until it hits the left end-marker A using a E → E a. Note that L(G) = {a^(2^i) | i ≥ 1} can also be written as a context-sensitive grammar, but consider G′, where L(G′) = L(G) ∪ {ɛ}: this language can only be generated by an unrestricted grammar (note that a^0 = ɛ). Why is this true?
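As a sanity check on this description, here is a small sketch (mine, not from the notes) that enumerates the terminal strings the doubling grammar can derive. Since the rule A E → ɛ is contracting, the length bound is placed on intermediate sentential forms rather than derived from a target, so the enumeration is only complete up to that bound.

```python
# Sketch: bounded enumeration of L(G) for the doubling grammar.
# Sentential forms are capped in length so the search terminates;
# every purely terminal string found should have length 2, 4, 8, ...

DOUBLING_RULES = [
    ("S",  "ACaB"),
    ("Ca", "aaC"),
    ("CB", "DB"),
    ("CB", "E"),
    ("aD", "Da"),
    ("AD", "AC"),
    ("aE", "Ea"),
    ("AE", ""),        # the only contracting rule: A E -> epsilon
]

def generated_strings(max_len=12):
    seen, frontier, results = set(), ["S"], set()
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        if form and all(c == "a" for c in form):
            results.add(form)                  # a purely terminal string
        for lhs, rhs in DOUBLING_RULES:
            i = form.find(lhs)
            while i != -1:
                frontier.append(form[:i] + rhs + form[i + len(lhs):])
                i = form.find(lhs, i + 1)
    return sorted(results, key=len)

print([len(s) for s in generated_strings()])   # [2, 4, 8]
```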

Strong vs. Weak Generative Capacity
Weak generative capacity of a grammar is the set of strings it generates, i.e. the language, e.g. {0^n 1^n | n ≥ 0}. Strong generative capacity is the set of structures (usually the set of trees) provided by the grammar. Let's ask the question: is the set of human languages contained in the set of regular languages?

Strong vs. Weak Generative Capacity
If we consider strong generative capacity then the answer is somewhat easier to obtain. For example, do we need to combine two non-terminals to provide the semantics? Or do we need nested dependencies?

Strong vs. Weak Generative Capacity
[Figure: two parse trees for the VP "a program to promote safety in trucks and minivans", one attaching "in trucks and minivans" to "promote safety" and one coordinating "safety in trucks" with "minivans", illustrating a structural attachment ambiguity.]

Strong vs. Weak Generative Capacity
However, strong generative capacity requires a particular grammar and a particular linguistic theory of semantics, i.e. of how meaning is assigned (in steps, or compositionally). So the stronger claim will be that some aspect of human language is not regular even when you consider only weak generative capacity. This is quite tricky: L_1 = {a^n b^n} is context-free but L_2 = {a*b*} is regular and L_1 ⊂ L_2, so you could cheat and pick some subset of the language, which won't prove anything. Furthermore, the language should be infinite.

Strong vs. Weak Generative Capacity
Also, if we consider the size of a grammar then the answer is easier to obtain (enjoyable, enrichment). The CFG is more elegant and smaller than the equivalent regular grammar:
V → X A
X → en- NA
A → -able
A → -ment
NA → joy
NA → rich
This is an engineering argument. However, it is related to the problem of describing the human learning process. Certain aspects of language are learned all at once, not individually for each case, e.g. learning enjoyment automatically if enrichment was learnt.
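A brief sketch (my own) that expands this little CFG exhaustively; the point is that all four of en-joy-able, en-joy-ment, en-rich-able and en-rich-ment fall out of six rules, so learning one suffix generalizes to every stem.

```python
from itertools import product

# The small morphology CFG from above, written out as alternatives per symbol.
grammar = {
    "V":  [["X", "A"]],
    "X":  [["en-", "NA"]],
    "A":  [["-able"], ["-ment"]],
    "NA": [["joy"], ["rich"]],
}

def expand(symbol):
    """Return all terminal strings derivable from symbol."""
    if symbol not in grammar:                     # a terminal morpheme
        return [symbol]
    results = []
    for rhs in grammar[symbol]:
        # cartesian product of the expansions of each right-hand-side symbol
        for parts in product(*(expand(s) for s in rhs)):
            results.append("".join(parts))
    return results

print(expand("V"))
# ['en-joy-able', 'en-joy-ment', 'en-rich-able', 'en-rich-ment']
```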

Is Human Language a Regular Language?
Consider the following set of English sentences (strings):
S = If S_1 then S_2
S = Either S_3, or S_4
S = The man who said S_5 is arriving today
Map "if", "then" to a and "either", "or" to b. This results in strings like abba or abaaba or abbaabba:
L = {ww^R | w ∈ {a, b}*}, where w^R is the reverse of w

Human Language is not a Regular Language
Is L = {ww^R} a regular language? To show something is not a regular language, we use the pumping lemma: for any infinite set of strings accepted by an FSA, if you consider a long enough string from this set, there has to be a loop which visits the same state at least twice. Thus, for an infinite regular language L, there are strings x, y, z such that xy^n z ∈ L for all n ≥ 0, where y ≠ ɛ.
Let L′ be the intersection of L with aa*bbaa*. Recall that RLs are closed under intersection, so L′ must also be an RL. L′ = {a^n b b a^n | n ≥ 1}. For any choice of y (consider a^i, b^i, a^i b, or b a^i) the pumping lemma leads to a contradiction, so L′, and hence L, is not regular.
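The case analysis over y can be checked mechanically. The sketch below (mine, not from the notes) takes a long string from L′, tries every decomposition xyz with y ≠ ɛ, and verifies that repeating y always produces a string outside L′, which is exactly the contradiction the pumping lemma needs.

```python
import re

# Membership test for L' = { a^n b b a^n | n >= 1 }
def in_L_prime(s):
    m = re.fullmatch(r"(a+)bb(a+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

# For a long string in L', check that no decomposition x y z with y != epsilon
# survives pumping: x y y z must leave L' for every choice of y.
def no_decomposition_pumps(n):
    s = "a" * n + "bb" + "a" * n
    assert in_L_prime(s)
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            x, y, z = s[:i], s[i:j], s[j:]     # y ranges over a^i, b^i, a^i b, b a^i, ...
            if in_L_prime(x + y * 2 + z):
                return False                   # this y could be pumped without leaving L'
    return True

print(no_decomposition_pumps(10))   # True: pumping always leaves L', so L' is not regular
```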

Human Language is not a Regular Language
Another example, also from English, is the set of center-embedded structures. Think of S → a S b and the nested dependencies a_1 a_2 a_3 b_3 b_2 b_1.
Center embedding in English:
the shares that the broker recommended were bought
N_1 N_2 V_2 V_1
the moment when the shares that the broker recommended were bought has passed
N_1 N_2 N_3 V_3 V_2 V_1
Can you come up with an example that has four verbs and a corresponding number of nouns? (cf. The Embedding by Ian Watson)

Human Competence vs. Human Performance
What if no more than 3 or 4 center-embedded structures are possible? Then the language is finite, and hence regular, so it is no longer strictly context-free. The common assumption made is that human competence is represented by the context-free grammar, but human performance suffers from memory limitations, which can be simulated by a simpler mechanism. The arguments about elegance, size and the learning process in humans also apply in this case.

Human Language is not a Context-Free Language
Two approaches as before: consider strong and weak generative capacity. For strong generative capacity, if we can show crossing dependencies in a language then no CFG can be written for such a language. Why? Quite a few major languages spoken by humans have crossed dependencies: Dutch (Bresnan et al., 1982), Swiss German, Tagalog, among others.

Human Language is not a Context-Free Language
Swiss German:
... mer em Hans es huus hälfed aastriiche
... we Hans-DAT the house-ACC helped paint  (order: N_1 N_2 V_1 V_2)
'... we helped Hans paint the house'
Hans (N_1) is the argument of hälfed/helped (V_1) and es huus (N_2) of aastriiche/paint (V_2), so the dependencies cross.
Analogous structures in English (PRO is an empty pronoun subject):
Eng:   S_1 = we [_V1 helped] [_N1 Hans] (to do) [_S2 ...]
SwGer: S_1 = we [_N1 Hans] [_S2 ... [_V1 helped] ...]
Eng:   S_2 = PRO(ɛ) [_V2 paint] [_N2 the house]
SwGer: S_2 = PRO(ɛ) [_N2 the house] [_V2 paint]
Eng:   S_1 + S_2 = we helped_1 Hans_1 PRO(ɛ) paint_2 the house_2
SwGer: S_1 + S_2 = we Hans_1 PRO(ɛ) the house_2 helped_1 paint_2

Human Language is not a Context-Free Language
Weak generative capacity of human language being greater than context-free was much harder to show. (Pullum, 1982) is a compendium of all the failed efforts up to that point. (Shieber, 1985) and (Huybregts, 1984) showed this using examples from Swiss German:
mer  d chind           em Hans   es huus        lönd  hälfed  aastriiche
we   the children-ACC  Hans-DAT  the house-ACC  let   helped  paint
w    a                 b         x              c     d       y
     N_1               N_2       N_3            V_1   V_2     V_3
'... we let the children help Hans paint the house'

Let this set of sentences be represented by a language L (mapped to the symbols w, a, b, x, c, d, y). Do the usual intersection with a regular language, wa*b*xc*d*y, to obtain L′ = {w a^m b^n x c^m d^n y}. The pumping lemma for CFLs [Bar-Hillel] states that if a string from the CFL can be written as wuxvy, where u and v are not both ɛ and wuxvy is long enough, then wu^n xv^n y for n ≥ 0 is also in that CFL. The pumping lemma shows that L′ is not context-free; since CFLs are closed under intersection with regular languages, L is not context-free either, and hence human language is not even weakly context-free.

Transformational (Movement) Grammars
Note: not related to Transformation-Based Learning. As we saw, showing that strong generative capacity goes beyond context-free was quite easy: all we needed was crossed dependencies to link verbs with their arguments. Linguists care about strong generative capacity since it provides the means to compute meanings using grammars. Linguists also want to express generalizations (cf. the morphology example: enjoyment, enrichment).

Transformational (Movement) Grammars
Calvin admires Hobbes.
Hobbes is admired by Calvin.
Who does Calvin admire?
Who admires Hobbes?
Who does Calvin believe admires Hobbes?
The stuffed animal who admires Hobbes is a genius.
The stuffed animal who Calvin admires is imaginative.
Who is admired by Calvin?
The stuffed animal who is admired by Calvin is a genius.
Who is Hobbes admired by?
The stuffed animal who Hobbes is admired by is imaginative.
Calvin seems to admire Hobbes.
Calvin is likely to seem to admire Hobbes.
Who does Calvin think I believe Hobbes admires?

[Figure: two phrase-structure trees, one for "Calvin admires who" and one for the wh-question "who Calvin admires ɛ", where ɛ marks the trace left by movement of "who".]

[Figure: phrase-structure tree for "Who is admired by Calvin?", with ɛ marking the traces left by wh-movement and passivization.]

- context-sensitive grammars: {0^i | i is not a prime number and i > 0}
- indexed grammars: {0^n 1^n 2^n ... m^n | n ≥ 0}, for any fixed m
- tree-adjoining grammars (TAG), linear-indexed grammars (LIG), combinatory categorial grammars (CCG): {0^n 1^n 2^n 3^n | n ≥ 0}
- context-free grammars: {0^n 1^n | n ≥ 0}
- deterministic context-free grammars: S′ → S c, S → S A | A, A → a S b | a b: the language of balanced parentheses (with end-marker c)
- regular grammars: (0|1)* 00 (0|1)*


Given grammar G and input x, provide an algorithm for: is x ∈ L(G)?
- unrestricted: undecidable (movement grammars, feature structure unification)
- context-sensitive: NSPACE[n], linear non-deterministic space
- indexed grammars: NP-Complete (restricted feature structure unification)
- tree-adjoining grammars (TAG), linear-indexed grammars (LIG), combinatory categorial grammars (CCG), head grammars: O(n^6)
- context-free: O(n^3)
- deterministic context-free: O(n)
- regular grammars: O(n)
Which class corresponds to human language?
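As an illustration of the O(n^3) bound for context-free recognition, here is a compact sketch (mine, not from the notes) of the CKY algorithm; it assumes the grammar has been converted to Chomsky Normal Form, shown here for S → a S b | a b.

```python
from collections import defaultdict

# CKY recognition for a CFG in Chomsky Normal Form: O(n^3) in the input length.
# UNARY maps a terminal to nonterminals, BINARY maps a pair of nonterminals
# to nonterminals.  Example grammar: S -> a S b | a b, converted to CNF.
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"},        # S  -> A B        (covers "ab")
          ("A", "SB"): {"S"},       # S  -> A SB
          ("S", "B"): {"SB"}}       # SB -> S B        (so S -> A SB covers "a S b")

def cky(tokens, start="S"):
    n = len(tokens)
    chart = defaultdict(set)                 # chart[i, j]: nonterminals spanning tokens[i:j]
    for i, tok in enumerate(tokens):
        chart[i, i + 1] = set(UNARY.get(tok, ()))
    for span in range(2, n + 1):             # O(n) span lengths
        for i in range(n - span + 1):        # O(n) start positions
            for k in range(i + 1, i + span): # O(n) split points
                for left in chart[i, k]:
                    for right in chart[k, i + span]:
                        chart[i, i + span] |= BINARY.get((left, right), set())
    return start in chart[0, n]

print(cky(list("aabb")))    # True
print(cky(list("aab")))     # False
```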