Basic Parsing with Context Free Grammars

Basic Parsing with Context Free Grammars
Lecture #5
SNU 4th Industrial Revolution Academy: Artificial Intelligence Agent

Analyzing Linguistic Units
- Morphological parsing: analyze words into morphemes and affixes (rule-based, FSAs, FSTs)
- Phonological parsing: analyze sounds into words and phrases
- POS tagging
- Syntactic parsing: identify component parts and how they are related
  - to see if a sentence is grammatical
  - to assign an abstract representation of meaning

Syntactic Parsing
- Declarative formalisms like CFGs define the legal strings of a language but don't specify how to recognize or assign structure to them
- Parsing algorithms specify how to recognize the strings of a language and assign each string one or more syntactic structures
- Parse trees are useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition, and almost every task in NLP

Parsing is a Form of Search
- Searching FSAs
  - Finding the right path through the automaton
  - Search space defined by the structure of the FSA
- Searching CFGs
  - Finding the right parse tree among all possible parse trees
  - Search space defined by the grammar
- Constraints provided by the input sentence and the automaton or grammar

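CFG for Fragment of English
S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nom
NP -> PropN
Nom -> N Nom
Nom -> N
Nom -> Nom PP
VP -> V NP
VP -> V
Det -> that | this | a
N -> book | flight | meal | money
V -> book | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA
Parsing strategies: top-down (TopD), bottom-up (BotUp), and combinations, e.g. left corners (LCs)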
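The grammar fragment above can be written down as plain data, which is how the later parsing strategies would consume it. This is only a sketch: the names GRAMMAR, LEXICON and pos_tags are illustrative, the grouping of the Nom rules is one reading of the slide, and the lexicon is lowercased for convenience.

```python
# Phrase-structure rules: left-hand side -> list of right-hand sides.
# Assumption: the Nom rules are read as Nom -> N Nom | N | Nom PP.
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
}

# Lexical rules: part-of-speech category -> words (lowercased).
LEXICON = {
    "Det":   {"that", "this", "a"},
    "N":     {"book", "flight", "meal", "money"},
    "V":     {"book", "include", "prefer"},
    "Aux":   {"does"},
    "Prep":  {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}


def pos_tags(word):
    """Return every part-of-speech category the lexicon allows for a word."""
    return sorted(cat for cat, words in LEXICON.items() if word.lower() in words)


print(pos_tags("book"))     # ['N', 'V']  (the ambiguity the slides rely on)
print(pos_tags("Houston"))  # ['PropN']
```

Note that the slide lists Prep words and uses PP in Nom -> Nom PP but gives no rule that expands PP, so PP is left undefined here as well.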

Parse Tree for "Book that flight" for Prior CFG
[S [VP [Verb Book] [NP [Det that] [Nom [Noun flight]]]]]

Top-Down Parser
- Builds from the root S node to the leaves
- Find a rule to apply by matching the left-hand side (LHS) of a rule
- Build a tree by replacing the LHS with the right-hand side
- Assuming we build all trees in parallel:
  - Find all trees with root S (or all rules with LHS S)
  - Next expand all constituents in these trees/rules
  - Continue until the leaves are POS categories
  - Candidate trees failing to match the POS of the input string are rejected (e.g. "Book that flight" can only match subtree 5)

Top Down Space
[Figure: top-down search space]

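CFG for Fragment of English (numbers mark the order in which a top-down parse of "Book that flight" applies the rules)
S -> NP VP
S -> Aux NP VP
S -> VP (1)
NP -> Det Nom (4)
NP -> PropN
Nom -> N Nom
Nom -> N (6)
Nom -> Nom PP
VP -> V NP (2)
VP -> V
Det -> that (5) | this | a
N -> book | flight (7) | meal | money
V -> book (3) | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA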


Bottom-Up Parsing
- Parser begins with the words of the input and builds up trees, applying grammar rules whose right-hand sides match
- "Book that flight" can be tagged N Det N or V Det N, because "Book" is ambiguous
- Parse continues until an S root node is reached or no further node expansion is possible

Bottom-Up Space
[Figure: bottom-up search space for "Book that flight"]

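CFG for Fragment of English (numbers mark the order in which a bottom-up parse of "Book that flight" applies the rules)
S -> NP VP
S -> Aux NP VP
S -> VP (7)
NP -> Det Nom (5)
NP -> PropN
Nom -> N Nom
Nom -> N (4)
Nom -> Nom PP
VP -> V NP (6)
VP -> V
Det -> that (2) | this | a
N -> book | flight (3) | meal | money
V -> book (1) | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA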


Control
- Of course, we left out how to keep track of the search spaces and how to make choices
  - Which node to try to expand next
  - Which grammar rule to use to expand a node

A Top-Down Parsing Strategy
- Depth-first search:
  - Agenda of search states: expand the search space incrementally, exploring the most recently generated state (tree) each time
  - When you reach a state (tree) inconsistent with the input, backtrack to the most recent unexplored state (tree)
- Which node to expand? Leftmost or rightmost
- Which grammar rule to use? Order in the grammar?

Top-Down, Depth-First, Left-to-Right Strategy
- Initialize the agenda with the S tree and a pointer to the first word, and make this the current search state (cur)
- Loop until successful parse or empty agenda
  - Apply all applicable grammar rules to the leftmost unexpanded node of cur
    - If this node is a POS category and matches that of the current input, push this onto the agenda
    - Otherwise, push the new trees onto the agenda
  - Pop new cur from the agenda
- Example: Does this flight include a meal?
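The agenda loop above can be sketched as a small recognizer. This is an illustration under assumptions, not the lecture's implementation: a state is just the tuple of symbols still to expand plus an input position (no tree is built), the function name is hypothetical, and a length check is added so that the left-recursive rule Nom -> Nom PP cannot expand forever. That check is valid only because the grammar has no empty rules. PP has no expansion on the slide, so states that reach PP simply die.

```python
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}


def top_down_recognize(sentence):
    """Depth-first, left-to-right, top-down recognition with an agenda (stack)."""
    words = sentence.lower().split()
    agenda = [(("S",), 0)]  # (symbols still to expand, position in the input)
    while agenda:
        symbols, pos = agenda.pop()  # most recently generated state first
        if not symbols:
            if pos == len(words):
                return True  # everything expanded and all input consumed
            continue
        first, rest = symbols[0], symbols[1:]
        if first in LEXICON:
            # POS category: keep the state only if it matches the current word.
            if pos < len(words) and words[pos] in LEXICON[first]:
                agenda.append((rest, pos + 1))
        else:
            # Push in reverse so the first rule in the grammar is tried first.
            for rhs in reversed(GRAMMAR.get(first, [])):
                new_symbols = tuple(rhs) + rest
                # Added pruning: each symbol needs at least one word.
                if len(new_symbols) <= len(words) - pos:
                    agenda.append((new_symbols, pos))
    return False


print(top_down_recognize("Does this flight include a meal"))  # True
print(top_down_recognize("book flight that"))                 # False
```

Using a list as a stack gives the "explore the most recently generated state" behaviour; swapping it for a queue would turn the same loop into breadth-first search.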

Top-Down, Depth-First, Left-to-Right Search
[Figures: successive search steps, each showing the current state (Curr) next to either the grammar's phrase-structure rules or the agenda]
- Continue putting NP rules on agenda

Parsing Overview: Does this flight include a meal?
[Figures: parsing steps for the sentence, continued over four slides]

A Bottom-Up Parsing Strategy
- Depth-first search:
  - The state of the parse is initialized to the input words
  - At each step, look for the right-hand side of a rule in the state, replace the matched right-hand side with the left-hand side of the rule, and continue
  - Agenda of search states: expand the search space incrementally, exploring the most recently generated state each time
  - When you reach a state that contains only the start symbol, you have successfully parsed

Bottom Up: Book that flight

Grammar:
S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nom
NP -> PropN
Nom -> N Nom
Nom -> N
Nom -> Nom PP
VP -> V NP
VP -> V

Curr            Agenda
N det N         V det N
Nom det N       N det Nom, V det N
Nom det Nom     N det Nom, V det N
Nom NP          N det Nom, V det N
N det Nom       V det N
V det N         (empty)
VP det N        V det Nom
VP NP           V det Nom
S NP            V det Nom
V det Nom       (empty)
V NP            (empty)
VP              (empty)
S               (empty)
SUCCESS!
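The bottom-up trace above can be reproduced in spirit with a short search over symbol sequences. This is a sketch under assumptions: the function name is hypothetical, every possible POS tagging of the input is placed on the agenda at the start, and a set of already-seen states is added to avoid revisiting work, which the slides do not mention. The order in which states are explored may differ from the trace.

```python
from itertools import product

# (LHS, RHS) pairs; assumption: Nom rules read as Nom -> N Nom | N | Nom PP.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)),
    ("Nom", ("N", "Nom")), ("Nom", ("N",)), ("Nom", ("Nom", "PP")),
    ("VP", ("V", "NP")), ("VP", ("V",)),
]
LEXICON = {
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}


def bottom_up_recognize(sentence):
    """Depth-first bottom-up recognition: rewrite RHS matches to their LHS."""
    words = sentence.lower().split()
    # One list of candidate tags per word, e.g. book -> ['N', 'V'].
    tag_options = [[c for c, ws in LEXICON.items() if w in ws] for w in words]
    agenda = [tuple(tags) for tags in product(*tag_options)]
    seen = set(agenda)
    while agenda:
        state = agenda.pop()  # most recently generated state first
        if state == ("S",):
            return True  # only the start symbol is left
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(state) - n + 1):
                if state[i:i + n] == rhs:
                    new_state = state[:i] + (lhs,) + state[i + n:]
                    if new_state not in seen:
                        seen.add(new_state)
                        agenda.append(new_state)
    return False


print(bottom_up_recognize("book that flight"))  # True
print(bottom_up_recognize("that book flight"))  # False
```

Dead ends such as "S NP" in the trace show up here as states from which no further rewrite reaches a lone S, so the search backtracks to whatever is left on the agenda.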

What's wrong with...
- Top-Down parsers: they never explore illegal parses (e.g. ones that can't form an S), but waste time on trees that can never match the input
- Bottom-Up parsers: they never explore trees inconsistent with the input, but waste time exploring illegal parses (no S root)
- For both: control strategy. How to explore the search space?
  - Pursue all parses in parallel, or backtrack, or ...?
  - Which rule to apply next?
  - Which node to expand next?

Left Corners: Top-Down Parsing with Bottom-Up Filtering
- We saw: Top-Down, depth-first, L2R parsing
  - Expands non-terminals along the tree's left edge down to the leftmost leaf of the tree
  - Moves on to expand down to the next leftmost leaf
- Note: In a successful parse, the current input word will be the first word in the derivation of the node the parser is currently processing
- So: look ahead to the left corner of the tree
  - B is a left corner of A if A =*=> Bα
  - Build a table with the left corners of all non-terminals in the grammar and consult it before applying a rule

Left Corners
[Figure: left corners of a parse tree]

Calculating Left Corners
- For each constituent on the LHS of a rule, follow the leftmost symbol of its RHS down through the grammar until you find a preterminal (lexical category). That's the left corner.
- Consider S one rule at a time: Det, PropN, Aux, V
- Same procedure for other constituents

Grammar:
S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nom
NP -> PropN
Nom -> N Nom
Nom -> N
Nom -> Nom PP
VP -> V NP
VP -> V

Left-Corner Table for CFG

Category    Left Corners
S           Det, PropN, Aux, V
NP          Det, PropN
Nom         N
VP          V
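The table above can be derived mechanically from the grammar. The sketch below follows the leftmost right-hand-side symbol of each rule until it reaches a preterminal; the function name and the set of visited categories (needed because Nom -> Nom PP is left-recursive) are illustrative choices, not something the slides specify.

```python
# Assumption: Nom rules read as Nom -> N Nom | N | Nom PP.
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
}


def left_corners(category, grammar, seen=None):
    """Collect the preterminal left corners of a category."""
    seen = set() if seen is None else seen
    corners = set()
    for rhs in grammar.get(category, []):
        first = rhs[0]
        if first in grammar:
            # Non-terminal: keep following its leftmost symbols, once only.
            if first not in seen:
                seen.add(first)
                corners |= left_corners(first, grammar, seen)
        else:
            # Not expanded by any phrase rule: treat it as a preterminal.
            corners.add(first)
    return corners


for cat in GRAMMAR:
    print(cat, sorted(left_corners(cat, GRAMMAR)))
# S ['Aux', 'Det', 'PropN', 'V']
# NP ['Det', 'PropN']
# Nom ['N']
# VP ['V']
```

A top-down parser would consult this table before expanding a category and skip any expansion whose left corners do not include the POS of the current input word.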

Left-Corner Example
- Assume that we again have the following grammar: [grammar not preserved in the transcript]
- Now, let's look at how a left-corner recognizer would proceed to recognize "vincent died".

Left-Corner Example (cont.)
[Figures: left-corner recognition steps, continued over two slides]

Ambiguity
- Structural ambiguity occurs when the grammar assigns more than one possible parse to a sentence
- Attachment ambiguity: a constituent can be attached to the parse tree in more than one place ("We saw the Eiffel Tower flying to Paris")
- Coordination ambiguity: "old men and women"
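To make the coordination example concrete, the sketch below counts the parse trees a grammar assigns to a phrase. The tiny grammar is hypothetical and invented only for this illustration (it is not from the slides): NP -> Adj NP, NP -> NP CP, CP -> Conj NP, with "men" and "women" as NPs. The counting uses a chart over spans, the same idea as the dynamic programming methods discussed next.

```python
from collections import defaultdict

# Hypothetical toy grammar, binary rules as (LHS, left child, right child).
BINARY = [("NP", "Adj", "NP"), ("NP", "NP", "CP"), ("CP", "Conj", "NP")]
LEXICAL = {"old": ["Adj"], "men": ["NP"], "women": ["NP"], "and": ["Conj"]}


def count_parses(phrase, start="NP"):
    """Count distinct parse trees for the phrase rooted in `start`."""
    words = phrase.lower().split()
    n = len(words)
    # counts[(i, j, A)] = number of trees with root A covering words[i:j]
    counts = defaultdict(int)
    for i, word in enumerate(words):
        for cat in LEXICAL.get(word, []):
            counts[(i, i + 1, cat)] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for lhs, left, right in BINARY:
                    counts[(i, j, lhs)] += counts[(i, k, left)] * counts[(k, j, right)]
    return counts[(0, n, start)]


print(count_parses("old men"))            # 1
print(count_parses("old men and women"))  # 2
```

The two trees correspond to [old [men and women]] and [[old men] and women], which is exactly the coordination ambiguity named above.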

Dynamic Programming Parsing Methods: CKY Parsing
- Bottom-up
- Requires Chomsky Normal Form (CNF): A -> B C or A -> w
- Conversion to CNF
  - Rules that mix terminals and non-terminals: introduce a new dummy non-terminal. INF-VP -> to VP becomes INF-VP -> TO VP, TO -> to
  - Unit productions (a single non-terminal on the right): rewrite the right-hand side of the original rules with the right-hand side of all the non-unit production rules that they ultimately lead to. If A =*=> B by a chain of unit productions and B -> γ is a non-unit production, then add A -> γ
  - Right-hand sides longer than 2: introduce new non-terminals. S -> Aux NP VP becomes S -> X1 VP, X1 -> Aux NP
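A minimal CKY recognizer might look like the sketch below. The CNF grammar is a hand conversion of the earlier English fragment following the three steps above, so it is an assumption rather than the L1 grammar of the slides: S -> Aux NP VP is split with X1, the unit productions S -> VP, VP -> V, NP -> PropN and Nom -> N are folded into the other rules and the lexicon, and Nom -> Nom PP is left out because the fragment gives no rule for PP. All names are illustrative.

```python
from collections import defaultdict

# Binary rules A -> B C as (A, B, C).
BINARY = [
    ("S", "NP", "VP"), ("S", "X1", "VP"), ("X1", "Aux", "NP"),
    ("S", "V", "NP"),    # from S -> VP and VP -> V NP
    ("VP", "V", "NP"), ("NP", "Det", "Nom"), ("Nom", "N", "Nom"),
]
# Lexical rules A -> w, stored as word -> set of categories.
LEXICAL = {
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"},
    "book": {"N", "Nom", "V", "VP", "S"},
    "flight": {"N", "Nom"}, "meal": {"N", "Nom"}, "money": {"N", "Nom"},
    "include": {"V", "VP", "S"}, "prefer": {"V", "VP", "S"},
    "does": {"Aux"},
    "houston": {"NP"}, "twa": {"NP"},
}


def cky_recognize(sentence, start="S"):
    """Fill the CKY table left to right; table[(i, j)] covers words[i:j]."""
    words = sentence.lower().split()
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):
        # Lexical step: categories for the single word ending at j.
        table[(j - 1, j)] |= LEXICAL.get(words[j - 1], set())
        # Combine shorter spans that end at j, widest span last.
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for lhs, left, right in BINARY:
                    if left in table[(i, k)] and right in table[(k, j)]:
                        table[(i, j)].add(lhs)
    return start in table[(0, n)]


print(cky_recognize("book that flight"))                 # True
print(cky_recognize("does this flight include a meal"))  # True
print(cky_recognize("flight that book"))                 # False
```

Because each cell is filled once from smaller cells, subtrees shared by several analyses are computed only once, which is what separates this from the backtracking searches earlier in the lecture.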

L1 for CKY example
[Figure: grammar L1]

CNF of L1
[Figure: L1 converted to Chomsky Normal Form]

Dynamic Programming Parsing Methods: The Earley Algorithm
- Top-down search
- Single left-to-right pass that fills an array (chart) that has N+1 entries
- The chart contains three kinds of information:
  - A subtree corresponding to a single grammar rule
  - Information about the progress made in completing this subtree
  - The position of the subtree with respect to the input
- Dotted rule: S -> • VP, [0,0]. The two numbers give where the state begins and where its dot lies.

Dynamic Programming Parsing Methods: The Earley Algorithm, Three Operators
- Predictor: creates new states representing top-down expectations generated during the parsing process. Predictor is applied to any state that has a non-terminal immediately to the right of its dot that is not a part-of-speech category.
- Scanner: when a state has a part-of-speech category to the right of the dot, Scanner is called to examine the input and incorporate into the chart a state corresponding to the prediction of a word with a particular part of speech.
- Completer: applied to a state when its dot has reached the right end of the rule.
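The three operators can be put together in a compact recognizer. This is a sketch under assumptions: a state is a tuple (LHS, RHS, dot position, start position), the dummy start rule GAMMA -> S and the helper names are illustrative, and the earlier English fragment (with the Nom rules read as Nom -> N Nom | N | Nom PP) stands in for the grammar.

```python
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}


def earley_recognize(sentence):
    """Single left-to-right pass over a chart with N+1 entries."""
    words = sentence.lower().split()
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def enqueue(state, position):
        if state not in chart[position]:
            chart[position].append(state)

    enqueue(("GAMMA", ("S",), 0, 0), 0)  # dummy start state: GAMMA -> • S, [0,0]
    for i in range(n + 1):
        for state in chart[i]:  # the list may grow while we iterate over it
            lhs, rhs, dot, start = state
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in LEXICON:
                    # Scanner: POS category right of the dot, check the input word.
                    if i < n and words[i] in LEXICON[nxt]:
                        enqueue((nxt, (words[i],), 1, i), i + 1)
                else:
                    # Predictor: expand the non-terminal right of the dot.
                    for production in GRAMMAR.get(nxt, []):
                        enqueue((nxt, tuple(production), 0, i), i)
            else:
                # Completer: advance every state that was waiting for this LHS.
                for p_lhs, p_rhs, p_dot, p_start in chart[start]:
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        enqueue((p_lhs, p_rhs, p_dot + 1, p_start), i)
    return ("GAMMA", ("S",), 1, 0) in chart[n]


print(earley_recognize("book that flight"))                 # True
print(earley_recognize("does this flight include a meal"))  # True
print(earley_recognize("book flight that"))                 # False
```

Unlike the depth-first top-down search, the left-recursive rule Nom -> Nom PP needs no special handling here, because a predicted state that is already in the chart is not added a second time.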

Dynamic Programming Parsing Methods: The Earley Algorithm
[Figure not preserved in the transcript]