Dependency Parsing. Prashanth Mannem


Dependency Parsing. Prashanth Mannem (mannemp@eecs.oregonstate.edu)

Outline. Introduction. Dependency Parsing: formal definition. Parsing Algorithms: introduction, dynamic programming, deterministic search.

Syntax. The study of the way sentences are constructed from smaller units. Formal systems that capture this: Phrase Structure Grammar, Dependency Grammar, and more: Tree Adjoining Grammar (TAG), Categorial Grammar.

Phrase Structure Grammar. Constituents as building blocks; phrase structure rules form the constituents. The rules are recursive and can be lexicalized. Example parse of 'Sue walked into the store': [S [NP Sue/NNP] [VP walked/VBD [PP into/P [NP the/DT store/NN]]]]

Dependency Grammar. The idea of dependency structure goes back a long way, to Pāṇini's grammar (c. 5th century BCE); constituency is a newer invention of the 20th century. Modern work is often linked to the work of L. Tesnière (1959). It has been the dominant approach in the East (Eastern bloc/East Asia). Dependency parsers were among the earliest kinds of parsers in NLP, even in the US: David Hays, one of the founders of computational linguistics, built an early (first?) dependency parser (Hays 1962).

Dependency Grammar. Example, built up word by word across several slides: the dependency tree for 'the huge lovable dog with a very loud bark'. The head noun 'dog' governs 'the', 'huge', 'lovable', and 'with'; 'with' governs 'bark'; 'bark' governs 'a' and 'loud'; 'loud' governs 'very'.

Dependency Grammar. Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies. We are interested in grammatical relations between individual words (governing and dependent words). It does not propose a recursive structure, but rather a network of relations. These relations can also have labels.

Example sentences (shown in the slides with dependency arcs drawn from an artificial _ROOT_ node): 'Red figures on the screen indicated falling stocks'; 'John booked me a flight from Houston to Portland to attend the seminar'.

Dependency Tree with Labels. Phrasal nodes are missing in the dependency structure when compared to a constituency structure.

Comparison. Dependency structures explicitly represent: head-dependent relations (directed arcs), functional categories (arc labels), and possibly some structural categories (parts of speech). Phrase structures explicitly represent: phrases (non-terminal nodes), structural categories (non-terminal labels), and possibly some functional categories (grammatical functions).

Parsing: DG over PSG. Dependency parsing is more straightforward: parsing can be reduced to labeling each token w_i with its head w_j. Direct encoding of predicate-argument structure; fragments are directly interpretable. Dependency structure is independent of word order, making it suitable for free-word-order languages (like Indian languages).

Outline. Introduction. Dependency Parsing: formal definition. Parsing Algorithms: introduction, dynamic programming, deterministic search.

Dependency Tree: formal definition. Given an input word sequence w_1 ... w_n, a dependency graph is D = (W, E), where W is the set of nodes, i.e. the word tokens in the input sequence, and E is the set of unlabeled tree edges (w_i, w_j), with w_i, w_j ∈ W. An edge (w_i, w_j) goes from w_i (parent) to w_j (child). The task of mapping an input string to a dependency graph satisfying certain conditions is dependency parsing.

Well-formedness. A dependency graph is well-formed iff: Single head: each word has only one head. Acyclic: the graph contains no cycles. Connected: the graph is a single tree covering all the words in the sentence. Projective: if word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B).
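These conditions are straightforward to check in code. Below is a minimal Python sketch, assuming a parse is encoded as a dict heads that maps each token index 1..n to its head index, with 0 standing for an artificial root; the encoding and function name are illustrative, not from the slides:

    def well_formed(heads, n):
        # Single head: heads maps each of 1..n to exactly one head.
        if set(heads) != set(range(1, n + 1)):
            return False
        # Acyclic and connected: every token must reach the root (0).
        for m in range(1, n + 1):
            seen, a = set(), m
            while a != 0:
                if a in seen:
                    return False              # cycle detected
                seen.add(a)
                a = heads[a]
        # Projective: every word between a head h and its dependent m
        # must itself be dominated by h.
        def dominates(h, w):
            while True:
                if w == h:
                    return True
                if w == 0:
                    return False
                w = heads[w]
        return all(dominates(h, w)
                   for m, h in heads.items()
                   for w in range(min(h, m) + 1, max(h, m)))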

Non-projective dependency tree. 'John saw a dog yesterday which was a Yorkshire Terrier': drawn above the sentence, the dependency arcs cross ('crossing lines'). English has very few non-projective cases.

Outline. Introduction: Phrase Structure Grammar, Dependency Grammar, comparison and conversion. Dependency Parsing: formal definition. Parsing Algorithms: introduction, dynamic programming, deterministic search.

Dependency Parsing. Dependency-based parsers can be broadly categorized into: grammar-driven approaches (parsing done using grammars) and data-driven approaches (parsing by training on annotated/unannotated data). These approaches are not mutually exclusive.

Covington's Incremental Algorithm. Incremental parsing in O(n^2) time by trying to link each new word to each preceding one [Covington 2001]:

PARSE(x = (w_1, ..., w_n))
1. for i = 1 up to n
2.    for j = i-1 down to 1
3.       LINK(w_i, w_j)

Constraints such as Single-Head and Projectivity can be incorporated into the LINK operation.
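As a concrete illustration, here is a minimal Python sketch of this double loop; link is a stand-in for the LINK operation (e.g. one that consults a grammar or a classifier and enforces Single-Head and Projectivity before attaching), and the names are illustrative, not from the slides:

    def covington_parse(words, link):
        # Try to link each new word to every preceding word: O(n^2) pairs.
        heads = {}                                # dependent -> head
        for i in range(1, len(words)):
            for j in range(i - 1, -1, -1):
                # LINK may attach w_i under w_j or w_j under w_i,
                # subject to the well-formedness constraints.
                link(i, j, words, heads)
        return heads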

Parsing Methods. Main traditions: Dynamic programming: CYK, Eisner, McDonald (MST). Deterministic search: Covington, Yamada and Matsumoto, Nivre.

Dynamic Programming. Basic idea: treat dependencies as constituents and use, e.g., a CYK parser (with minor modifications).

Dependency Chart Parsing. The grammar is regarded as context-free, with each node lexicalized. Chart entries are subtrees, i.e., words with all their left and right dependents. Problem: different entries are needed for subtrees spanning the same sequence of words with different heads, giving O(n^5) complexity.

Generic Chart Parsing (slide from [Eisner, 1997]). For each of the O(n^2) substrings, for each of O(n) ways of splitting it, for each of S analyses of the first half, for each of S analyses of the second half, for each of c ways of combining them: combine, and add the result to the chart if best. Example: [cap spending] + [at $300 million] = [[cap spending] [at $300 million]]; S analyses on each side give cS^2 combined analyses, of which we keep S.

Headed constituents... have too many signatures (slide from [Eisner, 1997]). How bad is Θ(n^3 S^2 c)? For unheaded constituents, S is constant: NP, VP, ... (similarly for dotted trees), so Θ(n^3). But when different heads mean different signatures, the average substring has Θ(n) possible heads and S = Θ(n) possible signatures, so Θ(n^5).

Dynamic Programming Approaches. Original version [Hays 1964] (grammar-driven). Link grammar [Sleator and Temperley 1991] (grammar-driven). Bilexical grammar [Eisner 1996] (data-driven). Maximum spanning tree [McDonald 2006] (data-driven).

Eisner 1996. Two novel aspects: a modified parsing algorithm and probabilistic dependency parsing. Complexity: O(n^3). Modification: instead of storing subtrees, store spans. Span: a substring such that no interior word links to any word outside the span. Idea: in a span, only the boundary words are active, i.e. still need a head or a child; one or both of the boundary words can be active.

Example. _ROOT_ Red figures on the screen indicated falling stocks. Spans: 'Red figures', 'indicated falling stocks'.

Assembly of the correct parse (_ROOT_ Red figures on the screen indicated falling stocks). Start by combining adjacent words into minimal spans: 'Red figures', 'figures on', 'on the', ...

Assembly of the correct parse. Combine spans which overlap in one word; this word must be governed by a word in the left or right span: 'on the' + 'the screen' = 'on the screen'.

Assembly of the correct parse. Combine spans which overlap in one word; this word must be governed by a word in the left or right span: 'figures on' + 'on the screen' = 'figures on the screen'.

Assembly of the correct parse. Combine spans which overlap in one word; this word must be governed by a word in the left or right span. 'Red figures on the screen' is an invalid span: the interior word 'figures' links to 'indicated', outside the span.

Assembly of the correct parse. Combine spans which overlap in one word; this word must be governed by a word in the left or right span: 'indicated falling' + 'falling stocks' = 'indicated falling stocks'.
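Putting the span idea together, here is a compact Python sketch of first-order projective parsing in the Eisner style, an O(n^3) dynamic program over spans. It assumes an edge-scoring table scores[h][m] (the score of making word m a dependent of word h, with index 0 as _ROOT_), which the slides do not specify, and it returns only the best tree score; a full parser would also keep backpointers to recover the tree:

    def eisner_best_score(scores):
        n = len(scores)                  # number of tokens, incl. _ROOT_ at 0
        NEG = float("-inf")
        # complete[s][t][d]: best span s..t whose head is the left end (d=1)
        # or the right end (d=0) and whose interior is fully attached.
        # incomplete[s][t][d]: span s..t with a pending arc between the ends.
        complete = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
        incomplete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
        for length in range(1, n):
            for s in range(n - length):
                t = s + length
                # Create an arc between the endpoints s and t.
                best = max(complete[s][q][1] + complete[q + 1][t][0]
                           for q in range(s, t))
                incomplete[s][t][0] = best + scores[t][s]   # arc t -> s
                incomplete[s][t][1] = best + scores[s][t]   # arc s -> t
                # Absorb a finished half to complete the span.
                complete[s][t][0] = max(complete[s][q][0] + incomplete[q][t][0]
                                        for q in range(s, t))
                complete[s][t][1] = max(incomplete[s][q][1] + complete[q][t][1]
                                        for q in range(s + 1, t + 1))
        return complete[0][n - 1][1]     # _ROOT_ heads the whole sentence

For example, eisner_best_score([[0, 2, 1], [0, 0, 3], [0, 1, 0]]) considers all projective trees over two words and returns 5, corresponding to _ROOT_ -> w_1 -> w_2.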

McDonald's Maximum Spanning Trees. Score of a dependency tree = sum of the scores of its dependencies; scores are independent of the other dependencies. If scores are available, parsing can be formulated as a maximum spanning tree problem. Two cases: projective: use Eisner's parsing algorithm; non-projective: use the Chu-Liu-Edmonds algorithm [Chu and Liu 1965, Edmonds 1967]. Uses an online structured perceptron for determining the weight vector w.
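The edge-factored assumption is easy to state in code. A minimal sketch, where feats is a hypothetical feature extractor returning feature names for an arc and w is a learned weight dict (both stand-ins, not from the slides):

    def tree_score(heads, w, feats):
        # heads[m] = head of token m (0 = _ROOT_), for tokens 1..n.
        # The tree score is the sum of per-arc scores w . f(h, m),
        # each independent of all other arcs in the tree.
        return sum(w.get(f, 0.0)
                   for m, h in enumerate(heads[1:], start=1)
                   for f in feats(h, m))

With such scores in hand, the parser searches for the highest-scoring spanning tree, projective (Eisner) or not (Chu-Liu-Edmonds).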

Parsing Methods. Main traditions: Dynamic programming: CYK, Eisner, McDonald. Deterministic parsing: Covington, Yamada and Matsumoto, Nivre.

Deterministic Parsing. Basic idea: derive a single syntactic representation (dependency graph) through a deterministic sequence of elementary parsing actions, sometimes combined with backtracking or repair. Motivation: psycholinguistic modeling, efficiency, simplicity.

Yamada and Matsumoto. Parsing in several rounds: deterministic, bottom-up, O(n^2). Looks at pairs of words. Three actions: shift, left, right. Shift: shifts focus to the next word pair.

Yamada and Matsumoto. Left: decides that the left word depends on the right one. Right: decides that the right word depends on the left one.

Parsing Algorithm. Go through each pair of words and decide which action to take. If a relation was detected in a pass, do another pass. E.g. 'the little girl': first pass: relation between 'little' and 'girl'; second pass: relation between 'the' and 'girl'. The decision on the action depends on the word pair and its context (see the sketch below).
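A rough Python sketch of this multi-pass loop (an illustration of the idea only, not Yamada and Matsumoto's actual SVM-based parser; decide is a stand-in for the trained action classifier):

    def parse_in_rounds(words, decide):
        heads = {}
        nodes = list(range(len(words)))   # indices of still-unattached words
        changed = True
        while changed and len(nodes) > 1:
            changed = False
            k = 0
            while k < len(nodes) - 1:
                action = decide(words, nodes, k)   # looks at pair + context
                i, j = nodes[k], nodes[k + 1]
                if action == "right":     # right word depends on the left
                    heads[j] = i
                    nodes.pop(k + 1)
                    changed = True
                elif action == "left":    # left word depends on the right
                    heads[i] = j
                    nodes.pop(k)
                    changed = True
                else:                     # shift focus to the next pair
                    k += 1
        return heads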

Parsing. Data-driven deterministic parsing: deterministic parsing requires an oracle; an oracle can be approximated by a classifier; a classifier can be trained using treebank data. Learning algorithms: support vector machines (SVM) [Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Isozaki et al. 2004, Cheng et al. 2004, Nivre et al. 2006]; maximum entropy modeling (MaxEnt) [Cheng et al. 2005]; structured perceptron [McDonald et al. 2006].

Evaluation of Dependency Parsing: simply use (labeled) dependency accuracy.

GOLD                     PARSED
1 2 We       SUBJ        1 2 We       SUBJ
2 0 eat      ROOT        2 0 eat      ROOT
3 5 the      DET         3 4 the      DET
4 5 cheese   MOD         4 2 cheese   OBJ
5 2 sandwich SUBJ        5 2 sandwich PRED

Accuracy = number of correct dependencies / total number of dependencies = 2 / 5 = 0.40 = 40%
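A small helper makes this computation concrete; gold and parsed are assumed here to be lists of (head, label) pairs aligned by token, an encoding chosen just for illustration:

    def attachment_scores(gold, parsed):
        # Unlabeled score counts correct heads; labeled score also
        # requires the dependency label to match.
        total = len(gold)
        uas = sum(g[0] == p[0] for g, p in zip(gold, parsed)) / total
        las = sum(g == p for g, p in zip(gold, parsed)) / total
        return uas, las

    gold   = [(2, "SUBJ"), (0, "ROOT"), (5, "DET"), (5, "MOD"), (2, "SUBJ")]
    parsed = [(2, "SUBJ"), (0, "ROOT"), (4, "DET"), (2, "OBJ"), (2, "PRED")]
    print(attachment_scores(gold, parsed))   # (0.6, 0.4): labeled accuracy 40%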

Feature Models. Learning problem: approximate a function from parser states, represented by feature vectors, to parser actions, given a training set of gold-standard trees. Typical features: tokens and POS tags of the target words, the linear context (neighbors in S and Q), and the structural context (parents, children, siblings in G). Such structural features cannot be used in dynamic programming algorithms.

Summary. Provided an intro to dependency parsing and various dependency parsing algorithms. Read up on Nivre's and McDonald's tutorial on dependency parsing at ESSLLI '07.

References
Nivre's and McDonald's tutorial on dependency parsing at ESSLLI '07: Dependency Grammar and Dependency Parsing. http://stp.lingfil.uu.se/~nivre/docs/05133.pdf
Online Large-Margin Training of Dependency Parsers. R. McDonald, K. Crammer and F. Pereira. ACL, 2005.
Pseudo-Projective Dependency Parsing. J. Nivre and J. Nilsson. ACL, 2005.

Phrase Structure Grammar. Phrases (non-terminal nodes); structural categories (non-terminal labels). CFG rules are recursive and can be lexicalized (Sue/NNP walked/VBD into/P the/DT store/NN):

[S Sue walked into the store]
[S [NP Sue] [VP walked into the store]]                                 (S → NP VP)
[S [NP Sue] [VP [VBD walked] [PP into the store]]]                      (VP → VBD PP)
[S [NP Sue] [VP [VBD walked] [PP [P into] [NP the store]]]]             (PP → P NP)
[S [NP Sue] [VP [VBD walked] [PP [P into] [NP [DT the] [NN store]]]]]   (NP → DT NN)

Eisner's Model. Recursive generation: each word generates its actual dependents, as two Markov chains: left dependents and right dependents.

Eisner's Model. The probability of a tree decomposes over words and their child sequences:

P(tree) = ∏_{i=1..n} P(lc(i) | tw(i)) · P(rc(i) | tw(i))

where tw(i) is the i-th tagged word, and lc(i) & rc(i) are the left and right children of the i-th word. Each child sequence is itself a Markov chain:

P(lc(i) | tw(i)) = ∏_{j=1..|lc(i)|} P(tw(lc_j(i)) | t(lc_{j-1}(i)), tw(i))

where lc_j(i) is the j-th left child of the i-th word and t(lc_{j-1}(i)) is the tag of the preceding left child.

Nivre's Algorithm. Four parsing actions, over a stack S and a queue Q (w_k → w_i denotes an already-added arc from head w_k to dependent w_i):

Shift:     [...]S [w_i, ...]Q  ⇒  [..., w_i]S [...]Q
Reduce:    [..., w_i]S [...]Q  ⇒  [...]S [...]Q  (only if ∃ w_k: w_k → w_i)
Left-Arc:  [..., w_i]S [w_j, ...]Q  ⇒  [...]S [w_j, ...]Q, adding the arc w_j → w_i  (only if ¬∃ w_k: w_k → w_i)
Right-Arc: [..., w_i]S [w_j, ...]Q  ⇒  [..., w_i, w_j]S [...]Q, adding the arc w_i → w_j  (only if ¬∃ w_k: w_k → w_j)

Nivre's Algorithm. Characteristics: arc-eager processing of right dependents; a single pass over the input gives worst-case time complexity O(2n), i.e. linear time.
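For concreteness, a minimal Python sketch of the arc-eager system; choose_action stands in for the oracle/classifier discussed earlier, and the guards mirror each action's precondition:

    def arc_eager_parse(words, choose_action):
        stack, queue = [0], list(range(1, len(words)))   # token 0 is _ROOT_
        heads = {}                                       # dependent -> head
        while queue:
            action = choose_action(stack, queue, heads)
            i, j = stack[-1], queue[0]
            if action == "LEFT_ARC" and i != 0 and i not in heads:
                heads[i] = j                             # add arc w_j -> w_i
                stack.pop()
            elif action == "RIGHT_ARC" and j not in heads:
                heads[j] = i                             # add arc w_i -> w_j
                stack.append(queue.pop(0))
            elif action == "REDUCE" and i in heads:
                stack.pop()
            else:                                        # SHIFT
                stack.append(queue.pop(0))
        return heads

Every step either consumes a queue token or pops the stack, so at most 2n transitions are performed, matching the O(2n) bound above.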

Example. Parsing '_ROOT_ Red figures on the screen indicated falling stocks' with Nivre's algorithm. Starting from the initial configuration, with _ROOT_ on the stack S and the whole sentence in the queue Q, the slides step through the transition sequence: Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce.