Foundations for Natural Language Processing Lecture 11 Syntax and parsing


Foundations for Natural Language Processing Lecture 11 Syntax and parsing Alex Lascarides (slides from Alex Lascarides, Sharon Goldwater, Mark Steedman and Philipp Koehn) 27 February 2018

Modelling word behaviour We've seen various ways to model word behaviour. Bag-of-words models: ignore word order entirely. N-gram models: capture a fixed-length history to predict word sequences. HMMs: also capture a fixed-length history, using latent variables. Useful for various tasks, but a really accurate model of language needs more than a fixed-length history!

Long-range dependencies The form of one word often depends on (agrees with) another, even when arbitrarily long material intervenes. Sam/Dogs sleeps/sleep soundly. Sam, who is my cousin, sleeps soundly. Dogs often stay at my house and sleep soundly. Sam, the man with red hair who is my cousin, sleeps soundly. We want models that can capture these dependencies.

Phrasal categories We may also want to capture substitutability at the phrasal level. POS categories indicate which words are substitutable. For example, substituting adjectives: I saw a red cat / I saw a former cat / I saw a billowy cat. Phrasal categories indicate which phrases are substitutable. For example, substituting noun phrases: Dogs sleep soundly / My next-door neighbours sleep soundly / Green ideas sleep soundly.

Theories of syntax A theory of syntax should explain which sentences are well-formed (grammatical) and which are not. Note that well-formed is distinct from meaningful. Famous example from Chomsky: Colorless green ideas sleep furiously. However, we'll see shortly that the reason we care about syntax is mainly for interpreting meaning.

Theories of syntax We'll look at two theories of syntax to handle one or both of the phenomena above (long-range dependencies, phrasal substitutability): context-free grammar (and variants): today and next class; dependency grammar: the following class. These can be viewed as different models of language behaviour. As with other models, we will look at what each model can capture and what it cannot, and at algorithms that provide syntactic analyses for sentences using these models (i.e., syntactic parsers).

Reminder: Context-free grammar Two types of grammar symbols: terminals (t): words; non-terminals (NT): phrasal categories like S, NP, VP, PP, with S being the start symbol. In practice, we sometimes distinguish pre-terminals (POS tags), a type of NT. Rules have the form NT → β, where β is any string of NTs and ts. Strictly speaking, that's a notation for a rule. There's also an abbreviated notation for sets of rules with the same LHS: NT → β1 | β2 | β3 ... A CFG in Chomsky Normal Form only has rules of the form NTi → NTj NTk or NTi → tj.
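To make the CNF restriction concrete, here is a minimal sketch (not from the lecture) of a check for whether a single rule is in Chomsky Normal Form. The set of non-terminals and the list-of-symbols rule encoding are illustrative assumptions, used again in the later sketches in these notes.

```python
# Illustrative non-terminal set; in a real grammar this would be read off the rules.
NONTERMINALS = {"S", "NP", "VP", "V", "PP"}

def is_cnf_rule(lhs, rhs):
    """A rule lhs -> rhs is in CNF iff rhs is two non-terminals or a single terminal."""
    if len(rhs) == 2:
        return all(symbol in NONTERMINALS for symbol in rhs)
    return len(rhs) == 1 and rhs[0] not in NONTERMINALS

print(is_cnf_rule("S", ["NP", "VP"]))        # True:  NT -> NT NT
print(is_cnf_rule("NP", ["the"]))            # True:  NT -> terminal
print(is_cnf_rule("VP", ["V", "NP", "PP"]))  # False: three symbols on the RHS
```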

CFG example
S → NP VP  (Sentences)
NP → D N | Pro | PropN  (Noun phrases)
D → PosPro | Art | NP 's  (Determiners)
VP → Vi | Vt NP | Vp NP VP  (Verb phrases)
Pro → i | we | you | he | she | him | her  (Pronouns)
PosPro → my | our | your | his | her  (Possessive pronouns)
PropN → Robin | Jo  (Proper nouns)
Art → a | an | the  (Articles)
N → man | duck | saw | park | telescope  (Nouns)
Vi → sleep | run | duck  (Intransitive verbs)
Vt → eat | break | see | saw  (Transitive verbs)
Vp → see | saw | heard  (Verbs with NP VP args)
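For the parsing sketches later in these notes it helps to have this grammar in machine-readable form. The following is one possible encoding, assuming nothing beyond the rules on this slide: each left-hand side maps to a list of alternative right-hand sides, and a symbol counts as a terminal if it never appears as a left-hand side.

```python
# The toy CFG above as a Python dictionary (an illustrative encoding, not part of the lecture).
GRAMMAR = {
    "S":      [["NP", "VP"]],
    "NP":     [["D", "N"], ["Pro"], ["PropN"]],
    "D":      [["PosPro"], ["Art"], ["NP", "'s"]],
    "VP":     [["Vi"], ["Vt", "NP"], ["Vp", "NP", "VP"]],
    "Pro":    [["i"], ["we"], ["you"], ["he"], ["she"], ["him"], ["her"]],
    "PosPro": [["my"], ["our"], ["your"], ["his"], ["her"]],
    "PropN":  [["Robin"], ["Jo"]],
    "Art":    [["a"], ["an"], ["the"]],
    "N":      [["man"], ["duck"], ["saw"], ["park"], ["telescope"]],
    "Vi":     [["sleep"], ["run"], ["duck"]],
    "Vt":     [["eat"], ["break"], ["see"], ["saw"]],
    "Vp":     [["see"], ["saw"], ["heard"]],
}

def is_terminal(symbol):
    """A symbol is a terminal here iff it has no expansion of its own."""
    return symbol not in GRAMMAR
```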

Example syntactic analysis To show that a sentence is well-formed under this CFG, we must provide a parse. One way to do this is by drawing a tree; written as a labelled bracketing, the tree for i saw the man is: (S (NP (Pro i)) (VP (Vt saw) (NP (D (Art the)) (N man)))). You can think of a tree like this as proving that its leaves are in the language generated by the grammar.
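The bracketed string above is also a convenient machine-readable format. The snippet below is a small sketch that assumes the NLTK library is installed (NLTK is not part of the slides); its Tree class can read and display such strings.

```python
from nltk import Tree  # assumes the nltk package is available

# Bracketed encoding of the tree on this slide, for the sentence "i saw the man".
parse = Tree.fromstring("(S (NP (Pro i)) (VP (Vt saw) (NP (D (Art the)) (N man))))")
parse.pretty_print()    # draws the tree as ASCII art
print(parse.leaves())   # ['i', 'saw', 'the', 'man'] -- the leaves the tree proves well-formed
```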

Structural Ambiguity Some sentences have more than one parse: structural ambiguity. For he saw her duck, our grammar gives two parses: (S (NP (Pro he)) (VP (Vt saw) (NP (PosPro her) (N duck)))) and (S (NP (Pro he)) (VP (Vp saw) (NP (Pro her)) (VP (Vi duck)))). Here, the structural ambiguity is caused by POS ambiguity in several of the words. (Both are types of syntactic ambiguity.)

Attachment ambiguity Some sentences have structural ambiguity even without part-of-speech ambiguity. This is called attachment ambiguity: it depends on where different phrases attach in the tree. Different attachments have different meanings: I saw the man with the telescope. She ate the pizza on the floor. Good boys and girls get presents from Santa. The next slides show trees for the first example: prepositional phrase (PP) attachment ambiguity. (Trees slightly abbreviated...)

Attachment ambiguity Parse 1, with the PP attached inside the object NP: (S (NP (Pro i)) (VP (Vt saw) (NP (NP the man) (PP (P with) (NP the telescope)))))

Attachment ambiguity Parse 2, with the PP attached to the VP: (S (NP (Pro i)) (VP (Vt saw) (NP the man) (PP (P with) (NP the telescope))))

Parsing algorithms Goal: compute the structure(s) for an input string given a grammar. Ultimately, we want to use the structure to interpret meaning. As usual, ambiguity is a huge problem. For correctness: we need to find the right structure to get the right meaning. For efficiency: searching all possible structures can be very slow; we want to use parsing for large-scale language tasks (e.g., parsing was used to create Google's infoboxes).

Global and local ambiguity We've already seen examples of global ambiguity: multiple analyses for a full sentence, such as I saw the man with the telescope. But local ambiguity is also a big problem: multiple analyses for parts of the sentence. In the dog bit the child, the first three words could be an NP (but aren't). Building useless partial structures wastes time. Avoiding useless computation is a major issue in parsing. Syntactic ambiguity is rampant; humans usually don't even notice because we are good at using context/semantics to disambiguate.

Parser properties All parsers have two fundamental properties. Directionality: the sequence in which the structures are constructed. Top-down: start with the root category (S), choose expansions, build down to words. Bottom-up: build subtrees over words, build up to S. Mixed strategies are also possible (e.g., left-corner parsers). Search strategy: the order in which the search space of possible analyses is explored.

Example: search space for top-down parser Start with the S node. Choose one of many possible expansions, e.g. S → NP VP, S → Aux NP VP, S → NP, ... Each of these has children with many possible expansions of their own, and so on. (Figure: a rapidly branching tree of candidate expansions, abbreviated here.)

Search strategies Depth-first search: explore one branch of the search space at a time, as far as possible; if this branch is a dead end, the parser needs to backtrack. Breadth-first search: expand all possible branches in parallel (or simulated parallel); requires storing many incomplete parses in memory at once. Best-first search: score each partial parse and pursue the highest-scoring options first. (We will get back to this when discussing statistical parsing.)
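The choice of search strategy can be isolated from the rest of the parser. The sketch below is an illustration rather than the course's implementation: a generic agenda-based loop in which the only difference between depth-first and breadth-first search is whether the agenda is used as a stack or as a queue. The expand and is_goal callbacks are hypothetical and would be supplied by a particular parser.

```python
from collections import deque

def agenda_search(initial_state, expand, is_goal, strategy="depth"):
    """Generic search over parser states.

    expand(state)  -> list of successor states (e.g. all ways to grow a partial parse)
    is_goal(state) -> True for a complete analysis
    strategy       -> "depth" pops the newest state (stack), "breadth" the oldest (queue)
    """
    agenda = deque([initial_state])
    while agenda:
        state = agenda.pop() if strategy == "depth" else agenda.popleft()
        if is_goal(state):
            return state
        agenda.extend(expand(state))   # dead ends simply contribute no successors
    return None                        # search space exhausted: no parse
```

Best-first search would replace the deque with a priority queue (e.g. Python's heapq), ordered by the score of each partial parse.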

Recursive Descent Parsing A recursive descent parser treats a grammar as a specification of how to break down a top-level goal (find S) into subgoals (find NP VP). It is a top-down, depth-first parser: blindly expand non-terminals until reaching a terminal (word). If multiple options are available, choose one but store the current state as a backtrack point (on a stack, to ensure depth-first search). If the terminal matches the next input word, continue; else, backtrack.

RD Parsing algorithm Start with subgoal = S, then repeat until input/subgoals are empty: If the first subgoal in the list is a non-terminal A, then pick an expansion A → B C from the grammar and replace A in the subgoal list with B C. If the first subgoal in the list is a terminal w: if the input is empty, backtrack; if the next input word is different from w, backtrack; if the next input word is w, match! I.e., consume the input word w and the subgoal w and move to the next subgoal. If we run out of backtrack points but not input, no parse is possible.

Recursive descent example Consider a very simple example. The grammar contains only these rules:
S → NP VP
VP → V
NN → bit
V → bit
NP → DT NN
DT → the
NN → dog
V → dog
The input sequence is: the dog bit

Recursive descent example Operations: Expand (E), Match (M), Backtrack to step n (Bn).

Step  Op   Subgoals     Input
0          S            the dog bit
1     E    NP VP        the dog bit
2     E    DT NN VP     the dog bit
3     E    the NN VP    the dog bit
4     M    NN VP        dog bit
5     E    bit VP       dog bit
6     B4   NN VP        dog bit
7     E    dog VP       dog bit
8     M    VP           bit
9     E    V            bit
10    E    bit          bit
11    M                 (success: subgoals and input both empty)
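A minimal recursive-descent recognizer for the toy grammar above, just to make the Expand/Match/Backtrack trace concrete. This is a sketch rather than the course's implementation: backtracking is handled implicitly by recursion and Python's call stack instead of an explicit list of backtrack points.

```python
# Toy grammar from the previous slide.
RD_RULES = {
    "S":  [["NP", "VP"]],
    "VP": [["V"]],
    "NP": [["DT", "NN"]],
    "DT": [["the"]],
    "NN": [["bit"], ["dog"]],
    "V":  [["bit"], ["dog"]],
}

def recognise(subgoals, words):
    """True iff the subgoal list can be expanded to exactly the word sequence."""
    if not subgoals:                       # no subgoals left:
        return not words                   # succeed only if the input is used up too
    first, rest = subgoals[0], subgoals[1:]
    if first in RD_RULES:                  # non-terminal: try each expansion in turn (Expand)
        return any(recognise(expansion + rest, words) for expansion in RD_RULES[first])
    # terminal: it must match the next input word (Match); otherwise this branch fails (Backtrack)
    return bool(words) and words[0] == first and recognise(rest, words[1:])

print(recognise(["S"], ["the", "dog", "bit"]))   # True, following the trace above
print(recognise(["S"], ["the", "bit", "dog"]))   # also True ("bit" as NN, "dog" as V)
print(recognise(["S"], ["dog", "the", "bit"]))   # False
```

As the next slide notes, this is only a recognizer; a parser would additionally record which expansion succeeded at each step.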

Further notes The above sketch is actually a recognizer: it tells us whether the sentence has a valid parse, but not what the parse is. For a parser, we'd need more details to store the structure as it is built. We only had one backtrack, but in general things can be much worse! See Inf2a Lecture 17 for a much longer example showing the inefficiency. If we have left-recursive rules like NP → NP PP, we get an infinite loop!

Shift-Reduce Parsing Search strategy and directionality are orthogonal properties. Shift-reduce parsing is depth-first (like RD) but bottom-up (unlike RD). A basic shift-reduce recognizer repeatedly: whenever possible, reduces one or more items from the top of the stack that match the RHS of a rule, replacing them with the LHS of that rule; when that's not possible, shifts an input symbol onto the stack. Like the RD parser, it needs to maintain backtrack points.

Shift-reduce example Same example grammar and sentence. Operations: Reduce (R), Shift (S), Backtrack to step n (Bn).

Step  Op   Stack        Input
0                       the dog bit
1     S    the          dog bit
2     R    DT           dog bit
3     S    DT dog       bit
4     R    DT V         bit
5     R    DT VP        bit
6     S    DT VP bit
7     R    DT VP V
8     R    DT VP VP
9     B6   DT VP bit
10    R    DT VP NN
11    B4   DT V         bit
12    S    DT V bit
13    R    DT V V
14    R    DT V VP
15    B3   DT dog       bit
16    R    DT NN        bit
17    R    NP           bit
...

Note that at steps 9 and 11 we skipped over backtracking to 7 and 5 respectively, as there were actually no choices to be made at those points.
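For comparison, a minimal bottom-up shift-reduce recognizer over the same toy grammar. Again this is a hedged sketch, not the course's implementation: instead of keeping an explicit stack of backtrack points it simply explores, by recursion, every reduce that matches the top of the stack and then the shift.

```python
SR_RULES = [
    ("S",  ["NP", "VP"]),
    ("VP", ["V"]),
    ("NP", ["DT", "NN"]),
    ("DT", ["the"]),
    ("NN", ["bit"]), ("NN", ["dog"]),
    ("V",  ["bit"]), ("V",  ["dog"]),
]

def sr_recognise(stack, words):
    """True iff some sequence of shifts and reduces turns (stack, words) into (['S'], [])."""
    if stack == ["S"] and not words:
        return True
    # Reduce: any rule whose right-hand side matches the items on top of the stack.
    for lhs, rhs in SR_RULES:
        if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
            if sr_recognise(stack[:-len(rhs)] + [lhs], words):
                return True
    # Shift: move the next input word onto the stack.
    if words and sr_recognise(stack + [words[0]], words[1:]):
        return True
    return False

print(sr_recognise([], ["the", "dog", "bit"]))   # True, after backtracking much as in the trace
```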

Depth-first parsing in practice Depth-first parsers are very efficient for unambiguous structures. They are widely used to parse/compile programming languages, which are constructed to be unambiguous. But they can be massively inefficient (exponential in sentence length) if faced with local ambiguity. Blind backtracking may require re-building the same structure over and over, so simple depth-first parsers are not used in NLP. But: if we use a probabilistic model to learn which choices to make, we can do very well in practice (coming next week...).

Breadth-first search using dynamic programming With a CFG, you should be able to avoid re-analysing any substring, because its analysis is independent of the rest of the parse: [he]NP [saw her duck]VP. Chart parsing algorithms exploit this fact. They use dynamic programming to store and reuse sub-parses, composing them into a full solution. So multiple potential parses are explored at once: a breadth-first strategy.

Parsing as dynamic programming For parsing, the subproblems are analyses of substrings, memoized in a chart (aka well-formed substring table, WFST). Chart entries are indexed by start and end positions in the sentence, and correspond to: either a complete constituent (sub-tree) spanning those positions (if working bottom-up), or a prediction about what complete constituent might be found (if working top-down).

What's in the chart? We assume indices between each word in the sentence: 0 he 1 saw 2 her 3 duck 4. The chart is a matrix where cell [i, j] holds information about the word span from position i to position j: the root node of any constituent(s) spanning those words; pointers to its sub-constituents; and (depending on the parsing method) predictions about what constituents might follow the substring.
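To make the indexing concrete, here is one very simple way the lexical (span-one) cells of a bottom-up chart for he saw her duck could be represented, using the CFG example from earlier in the lecture. The dictionary layout and the variable names are illustrative assumptions, not a prescribed data structure.

```python
words = ["he", "saw", "her", "duck"]      # indices: 0 he 1 saw 2 her 3 duck 4

# chart[(i, j)] = set of categories that can span words[i:j]
chart = {}

# POS ambiguity shows up immediately in the span-one cells:
chart[(0, 1)] = {"Pro"}              # he
chart[(1, 2)] = {"Vt", "Vp", "N"}    # saw
chart[(2, 3)] = {"Pro", "PosPro"}    # her
chart[(3, 4)] = {"N", "Vi"}          # duck

# A bottom-up chart parser then fills wider spans from narrower ones:
# e.g. cell (2, 4) would come to contain NP ("her duck"), and cell (0, 4)
# would contain one S entry per complete analysis of the whole sentence.
```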

Algorithms for Chart Parsing There are many different chart parsing algorithms, including: the CKY algorithm, which memoizes only complete constituents; and various algorithms that also memoize predictions/partial constituents, often using mixed bottom-up and top-down approaches, e.g., the Earley algorithm described in J&M, and left-corner parsing. We'll look at CKY parsing and statistical parsing next time...