
Dependency Parsing
Computational Linguistics: Jordan Boyd-Graber, University of Maryland
INTRO / CHART PARSING
Adapted from slides by Neelamadhav Gantayat and Ryan McDonald

Motivation: Dependency Syntax
Turns a sentence into syntactic structure; essential for information extraction and other NLP tasks.
Lucien Tesnière, 1959: "The sentence is an organized whole, the constituent elements of which are words. Every word that belongs to a sentence ceases by itself to be isolated as in the dictionary. Between the word and its neighbors, the mind perceives connections, the totality of which forms the structure of the sentence. The structural connections establish dependency relations between the words."

Motivation: Dependency Grammar
Basic assumption: syntactic structure essentially consists of lexical items linked by binary asymmetrical relations called dependencies.

Motivation: Example of dependency parser output
Figure: Output of the Stanford dependency parser.
The verb attaches to an artificial root. There is still a notion of phrases: "by" and its children. So how do we choose these edges?

Motivation: Criteria for dependency
D is likely a dependent of head H in construction C if:
- H determines the syntactic category of C and can often replace C
- H gives the semantic specification of C; D specifies H
- H is obligatory; D may be optional
- H selects D and determines whether D is obligatory
- The form of D depends on H (agreement or government)
- The linear position of D is specified with reference to H

Motivation: Which direction? Some clear cases...
Modifiers: nmod and vmod. Verb slots: subject and object.
[Dependency tree for "Economic news suddenly affected financial markets": ROOT -> affected (root); affected -> news (subj), suddenly (vmod), markets (obj); news -> Economic (nmod); markets -> financial (nmod)]

Motivation: Which direction? Some tricky cases...
Complex verb groups, subordinate clauses, coordination, prepositions, punctuation.
Example: "I can see that they rely on this and that."


Motivation: Dependency Parsing
Input: sentence $x = w_0, w_1, \ldots, w_n$
Output: dependency graph $G = (V, A)$ for $x$, where $V = \{0, 1, \ldots, n\}$ is the vertex set and $A$ is the arc set; $(i, j, k) \in A$ represents a dependency from $w_i$ to $w_j$ with label $l_k \in L$.
Notational conventions:
- $i \xrightarrow{k} j$ : $(i, j, k) \in A$; $i \rightarrow j$ : $\exists k : i \xrightarrow{k} j$ (unlabeled dependency)
- $i \leftrightarrow j$ : $i \rightarrow j$ or $j \rightarrow i$ (undirected dependency)
- $i \rightarrow^* j$ : $i = j$, or $\exists i' : i \rightarrow i', i' \rightarrow^* j$ (unlabeled closure)
- $i \leftrightarrow^* j$ : $i = j$, or $\exists i' : i \leftrightarrow i', i' \leftrightarrow^* j$ (undirected closure)
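To make the notation concrete, here is a minimal Python sketch of the $G = (V, A)$ representation, using the "Economic news suddenly affected financial markets" example from the earlier slide; the arcs and labels are illustrative, not treebank data.

```python
# A minimal sketch of the G = (V, A) representation; the arcs and
# labels below are illustrative assumptions, not treebank data.

words = ["ROOT", "Economic", "news", "suddenly", "affected",
         "financial", "markets"]  # w_0 is the artificial root

arcs = {             # A: (head i, dependent j, label k) triples
    (0, 4, "root"),  # ROOT -> affected
    (4, 2, "subj"),  # affected -> news
    (2, 1, "nmod"),  # news -> Economic
    (4, 3, "vmod"),  # affected -> suddenly
    (4, 6, "obj"),   # affected -> markets
    (6, 5, "nmod"),  # markets -> financial
}

head = {j: i for (i, j, _) in arcs}  # single-head view: head[j] = i
```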

Motivation: Conditions, Intuitions
- Syntactic structure is complete (connectedness)
- Syntactic structure is hierarchical (acyclicity)
- Every word has at most one syntactic head (single-head)
Connectedness is enforced by adding a special root node.

Motivation: Conditions
- Connected: $\forall i, j \in V : i \leftrightarrow^* j$
- Acyclic: if $i \rightarrow j$, then not $j \rightarrow^* i$
- Single-head: if $i \rightarrow j$, then not $i' \rightarrow j$ for any $i' \neq i$
- Projective: if $i \rightarrow j$, then $i \rightarrow^* i'$ for any $i'$ such that $i < i' < j$ or $j < i' < i$
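A small sketch of how these conditions can be checked on a head-array representation (head[j] is the head of word j; head[0] is None for the artificial root); this is an illustration, not code from the slides.

```python
def ancestors(head, j):
    """Yield the chain of heads above j, stopping at the root or a cycle."""
    seen = {j}
    while head[j] is not None:
        j = head[j]
        if j in seen:
            break
        seen.add(j)
        yield j

def is_well_formed(head):
    """Connected, acyclic, single-head. Single-head holds because head
    is a function of j; the rest holds iff every word reaches the root."""
    return all(0 in set(ancestors(head, j)) for j in range(1, len(head)))

def is_projective(head):
    """For each arc i -> j, every word strictly between i and j must be
    a descendant of i."""
    for j in range(1, len(head)):
        i = head[j]
        for k in range(min(i, j) + 1, max(i, j)):
            if i not in set(ancestors(head, k)):
                return False
    return True

head = [None, 2, 4, 4, 0, 6, 4]  # the example tree from above
print(is_well_formed(head), is_projective(head))  # True True
```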

Motivation: Projectivity
- Equivalent to a planar embedding
- Most theoretical frameworks do not assume projectivity
- Non-projective structures are needed for free word order and long-distance dependencies
[Figure: non-projective example]
The algorithm we'll discuss later assumes projectivity.

Many algorithms exist (a good overview is in Kübler et al.). We will focus on an arc-factored projective model:
- arc-factored: the score factorizes over edges
- projective: no crossing arcs (planar embedding)
This is a common, but not universal, assumption.

How good is a given tree?
1. $\text{score}(G) = \text{score}(V, A)$
2. Arc-factored assumption: $\text{score}(G) = \sum_{(w_i, r, w_j) \in A} \psi_{w_i, r, w_j}$ (1)
3. Further simplification for class: $\text{score}(G) = \sum_{(w_i, w_j) \in A} \psi_{w_i, w_j}$ (2)
4. You can think about this probabilistically when $\psi_{w_i, w_j} = \log p((w_i, w_j) \in A)$ (3)
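A sketch of the simplified arc-factored score in Eq. (2); the $\psi$ table below holds toy assumed log-probabilities, in the spirit of Eq. (3).

```python
import math

# Toy psi table of assumed (head word, dependent word) log-probabilities.
psi = {
    ("ROOT", "affected"): math.log(0.9),
    ("affected", "news"): math.log(0.7),
    ("affected", "markets"): math.log(0.6),
    # ... one entry per candidate (head word, dependent word) pair
}

def score_tree(words, head, psi, unk=math.log(1e-6)):
    """Sum psi over the tree's arcs head[j] -> j (Eq. 2)."""
    return sum(psi.get((words[head[j]], words[j]), unk)
               for j in range(1, len(words)))
```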

Dynamic Programming
A parser should avoid re-analyzing substrings, because the analysis of a substring is independent of the rest of the parse. The parser's exploration of its search space can exploit this independence: dynamic programming (CS) / chart parsing (linguistics). Once solutions to sub-problems have been accumulated, solve the overall problem by composing them. Sub-trees are stored in a chart, which records all substructures:
- re-parsing: sub-trees are looked up, not reparsed
- ambiguity: the chart implicitly stores all parses

Central Idea: Spans
As in the Viterbi algorithm, we'll solve sub-problems to find the overall optimum. Our overall goal is to find the best parse for the entire sentence. Spans are either LEFT or RIGHT (direction) and COMPLETE or INCOMPLETE.


Central Idea: Spans
To do this, we'll find the best parse for contiguous spans of the sentence, characterized by:
- start $0 \ldots n$
- stop $0 \ldots n$
- direction
- completeness
Each span gets an entry in a 4D chart $C[s][t][d][c]$, with direction $d \in \{\leftarrow, \rightarrow\}$ and completeness $c \in \{I, C\}$ (analogous to the 2D chart for POS tagging). Find the overall tree that gives the highest score.

Right Complete Spans
We write the total score of these spans $C[s][t][\rightarrow][C]$. The root of this subtree is at word $s$. It can have arbitrary substructure up to word $t$, but cannot take additional right children.

Left Complete Spans
We write the total score of these spans $C[s][t][\leftarrow][C]$. The root of this subtree is at word $t$. It can have arbitrary substructure down to word $s$, but cannot take additional left children.

Right Incomplete Spans
We write the total score of these spans $C[s][t][\rightarrow][I]$. The root of this subtree is at word $s$. It can have arbitrary substructure up to word $t$ and can still accept additional right children.

Left Incomplete Spans
We write the total score of these spans $C[s][t][\leftarrow][I]$. The root of this subtree is at word $t$. It can have arbitrary substructure down to word $s$ and can still accept additional left children.

Dynamic Programming Intuition
$C[0][n][\rightarrow][C]$ contains the best score for the overall tree. Where's the main verb? What are the right children of the root? What are the left children of the main verb? What are the right children of the verb?

Building Incomplete Spans
Left incomplete spans are built by joining a right complete span to a left complete span and adding the new arc:
$C[s][t][\leftarrow][I] = \max_{s \le q < t} \left( C[s][q][\rightarrow][C] + C[q+1][t][\leftarrow][C] + \lambda(w_t, w_s) \right)$ (4)

Building Incomplete Spans
Right incomplete spans are built the same way, with the new arc pointing right:
$C[s][t][\rightarrow][I] = \max_{s \le q < t} \left( C[s][q][\rightarrow][C] + C[q+1][t][\leftarrow][C] + \lambda(w_s, w_t) \right)$ (5)
Dynamic programming: when we compute the score for any span, we consider all possible ways that the span could have been built.

Completing Spans
Right complete spans are built by taking a right incomplete span and completing it with a right complete span:
$C[s][t][\rightarrow][C] = \max_{s < q \le t} \left( C[s][q][\rightarrow][I] + C[q][t][\rightarrow][C] \right)$

Completing Spans
Left complete spans are built by taking a left complete span and completing it with a left incomplete span:
$C[s][t][\leftarrow][C] = \max_{s \le q < t} \left( C[s][q][\leftarrow][C] + C[q][t][\leftarrow][I] \right)$
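Putting the four recurrences together gives Eisner's algorithm. The sketch below is one possible implementation, assuming an arc-scoring function score(h, d) = $\lambda(w_h, w_d)$; it also records backpointers, the "breadcrumbs" used later to recover the tree.

```python
# A sketch of the arc-factored projective parser the slides build up
# (Eisner's algorithm). `score(h, d)` is an assumed arc-scoring
# function; n is the index of the last word (w_0 is the root).

NEG = float("-inf")
L, R = 0, 1  # span direction: left (<-) or right (->)
I, C = 0, 1  # span completeness: incomplete or complete

def eisner(n, score):
    # 4D chart, as on the slides: chart[s][t][direction][completeness]
    chart = [[[[NEG, NEG], [NEG, NEG]] for _ in range(n + 1)]
             for _ in range(n + 1)]
    bp = {}  # breadcrumbs: (s, t, d, c) -> best split point q
    for s in range(n + 1):
        chart[s][s][L][C] = chart[s][s][R][C] = 0.0  # width-0 spans
    for width in range(1, n + 1):
        for s in range(n + 1 - width):
            t = s + width
            # Incomplete spans: join right-complete [s,q] with
            # left-complete [q+1,t], then add the new arc (Eqs. 4-5).
            for q in range(s, t):
                both = chart[s][q][R][C] + chart[q + 1][t][L][C]
                if both + score(t, s) > chart[s][t][L][I]:  # arc t -> s
                    chart[s][t][L][I] = both + score(t, s)
                    bp[s, t, L, I] = q
                if both + score(s, t) > chart[s][t][R][I]:  # arc s -> t
                    chart[s][t][R][I] = both + score(s, t)
                    bp[s, t, R, I] = q
            # Complete spans: extend an incomplete span with a complete
            # span pointing in the same direction.
            for q in range(s, t):
                cand = chart[s][q][L][C] + chart[q][t][L][I]
                if cand > chart[s][t][L][C]:
                    chart[s][t][L][C] = cand
                    bp[s, t, L, C] = q
            for q in range(s + 1, t + 1):
                cand = chart[s][q][R][I] + chart[q][t][R][C]
                if cand > chart[s][t][R][C]:
                    chart[s][t][R][C] = cand
                    bp[s, t, R, C] = q
    return chart[0][n][R][C], bp  # best total score and breadcrumbs
```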

Example Sentence
[Figures: step-by-step filling of the chart for an example sentence.]

Example Sentence
Final step: look at the cell corresponding to the span from 0 to the length of the sentence, complete, and directed to the right: $C[0][n][\rightarrow][C]$. That is the best parse.

What's Left: Breadcrumbs and Complexity
As you build the chart, you must keep track of which subtrees were best for constructing each cell; call this $b$. Then look at $b[0][n][\rightarrow][C]$ and recursively build the tree. Complexity is $O(n^3)$: the table has size $O(n^2)$, and each cell requires considering at most $n$ possible split points.
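A sketch of the recursive tree recovery from these breadcrumbs, assuming the eisner() function and the L/R/I/C constants from the earlier sketch.

```python
def backtrack(bp, head, s, t, d, c):
    """Fill head[j] by recursing through the breadcrumbs bp."""
    if s == t:
        return  # width-0 spans contain no arcs
    q = bp[s, t, d, c]
    if c == I:
        if d == L:
            head[s] = t  # the span's arc points t -> s
        else:
            head[t] = s  # the span's arc points s -> t
        backtrack(bp, head, s, q, R, C)
        backtrack(bp, head, q + 1, t, L, C)
    elif d == L:
        backtrack(bp, head, s, q, L, C)
        backtrack(bp, head, q, t, L, I)
    else:
        backtrack(bp, head, s, q, R, I)
        backtrack(bp, head, q, t, R, C)

# Usage: recover every word's head from the top cell b[0][n][->][C].
# best, bp = eisner(n, score)
# head = [None] * (n + 1)
# backtrack(bp, head, 0, n, R, C)
```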

Extensions to Dependency Parsing
- Horizontal and vertical Markovization (a node depends on its siblings and grandparents in the tree, which is logical!): "saw with telescope" is more likely than "bridge with telescope" (grandparent); "fast sports car" is more likely than "fast slow car" (sibling)
- Graph algorithms: allow non-projectivity
- Sequential processing (next!)

Evaluation and Estimation
Where does the attachment score come from?
- Language model: vertical rather than horizontal. How likely is the noun "bagel" to be the child of the verb "eat"? Back off to a noun being the child of the verb "eat"... back off to a noun being the child of a verb.
- Discriminative models: minimize errors
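One way to realize this back-off idea is as an interpolated estimate; the probability tables and weights below are toy assumptions, not values from the slides.

```python
import math

# Toy conditional probabilities at three levels of specificity.
p_word_word = {("eat", "bagel"): 0.05}  # p(dep word | head word)
p_word_pos = {("eat", "NOUN"): 0.40}    # p(dep POS  | head word)
p_pos_pos = {("VERB", "NOUN"): 0.60}    # p(dep POS  | head POS)

def arc_score(head_word, head_pos, dep_word, dep_pos,
              lams=(0.6, 0.3, 0.1)):
    """Interpolate the three levels, most to least specific; the result
    can serve as the psi / lambda arc score in the parser above."""
    p = (lams[0] * p_word_word.get((head_word, dep_word), 0.0)
         + lams[1] * p_word_pos.get((head_word, dep_pos), 0.0)
         + lams[2] * p_pos_pos.get((head_pos, dep_pos), 0.0))
    return math.log(p) if p > 0 else float("-inf")

print(arc_score("eat", "VERB", "bagel", "NOUN"))  # log(0.21) = -1.56
```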

Evaluation and Estimation: Evaluation Methodology
- How many sentences are exactly correct (exact match)
- Edge accuracy:
  1. Labeled attachment score (LAS): tokens with correct head and label
  2. Unlabeled attachment score (UAS): tokens with correct head
  3. Label accuracy (LA): tokens with correct label
- Performance on a downstream task (e.g., information extraction)
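A sketch computing the three edge-accuracy metrics from gold and predicted (head, label) pairs, one per token (root excluded).

```python
def attachment_scores(gold, pred):
    """gold, pred: lists of (head, label) pairs, one per token."""
    n = len(gold)
    return {
        "UAS": sum(g[0] == p[0] for g, p in zip(gold, pred)) / n,
        "LAS": sum(g == p for g, p in zip(gold, pred)) / n,
        "LA": sum(g[1] == p[1] for g, p in zip(gold, pred)) / n,
    }

# Example: three tokens; the third has the wrong head but right label.
gold = [(2, "nmod"), (0, "root"), (2, "obj")]
pred = [(2, "nmod"), (0, "root"), (1, "obj")]
print(attachment_scores(gold, pred))  # UAS 0.67, LAS 0.67, LA 1.0
```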