Syntactic analysis. Marco Kuhlmann Department of Computer and Information Science. Language Technology (2019)

This work is licensed under a Creative Commons Attribution 4.0 International License.

Syntactic analysis Syntactic analysis or syntactic parsing is the task of mapping a sentence to a formal representation of its syntactic structure. The syntactic structure of a sentence provides important clues about the meaning of the sentence. Example application: information extraction

Different syntactic representations For the sentence I booked a flight from L.A.: Phrase structure tree (Noam Chomsky): [S [NP [Pro I]] [VP [Verb booked] [NP [Det a] [Nom [Nom [Noun flight]] [PP from L.A.]]]]] Dependency tree (Lucien Tesnière): booked → I, booked → flight, flight → a, flight → from L.A. Source: Wikimedia Commons

Information extraction Information extraction (IE) is the task of extracting structured information from running text. More specifically, the term structured information refers to named entities (persons, organisations, companies) and semantic relations between those entities (X is-leader-of Y, X bought Y).

This Stanford University alumnus co-founded educational technology company Coursera. (Source: MacArthur Foundation.) SPARQL query against DBpedia:
SELECT DISTINCT ?x WHERE {
  ?x dbo:almaMater dbr:Stanford_University .
  dbr:Coursera dbo:foundedBy ?x .
}

Syntactic structure, semantic relations Syntax: Koller (subject) co-founded Coursera (object). Semantics: dbr:Coursera dbo:foundedBy dbr:Daphne_Koller

Algorithmic approaches to syntactic analysis Exhaustive search Cast parsing as a combinatorial optimisation problem over the set of target representations (trees). CKY algorithm Greedy search Cast parsing as a sequence of classification problems: at each point in time, predict one of several parser actions. transition-based dependency parsing

This lecture Introduction to syntactic analysis Parsing to phrase structure trees Context-free grammars Parsing with probabilistic context-free grammars Parsing to dependency trees Transition-based dependency parsing

Context-free grammars

Phrases and syntactic heads Words within sentences form groupings called phrases. Kim read [a book]. Kim read [a very interesting book about grammar]. Each phrase is projected by a syntactic head, which determines its internal structure and external distribution. [The war on drugs] is controversial. / *[The battle on drugs] is controversial. [The war on drugs] is controversial. / *[The war on drugs] are controversial.

Context-free grammars Phrases can be combined to form larger phrases. This gives rise to a hierarchical structure. The phrase structure of a sentence can be described using context-free grammars. The main ingredient of a context-free grammar is a set of rules that describe how phrases are structured.

A context-free grammar Rules and example phrases:
S → NP VP : I + want a morning flight
NP → Pronoun : I
NP → Proper-Noun : Los Angeles
NP → Det Nominal : a flight
Nominal → Nominal Noun : morning flight
Nominal → Noun : flights
VP → Verb : do
VP → Verb NP : want + a flight
VP → Verb NP PP : leave + Boston + in the morning
VP → Verb PP : leaving + on Thursday
PP → Preposition NP : from + Los Angeles

Context-free grammars, formal definition A context-free grammar consists of: N, a set of nonterminals (phrase labels); T, a set of terminals (words); P, a finite set of rules or productions; and S, a distinguished nonterminal symbol called the start symbol.
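This definition translates directly into code. A minimal sketch in Python; the tuple-based rule representation and the helper function are my own choices, not part of the lecture:

```python
# A context-free grammar as a 4-tuple (N, T, P, S).
# Rules are (lhs, rhs) pairs, where rhs is a tuple of symbols.
N = {"S", "NP", "VP", "Pro", "Verb", "Det", "Nom", "Noun"}
T = {"I", "booked", "a", "flight"}
P = [
    ("S", ("NP", "VP")),
    ("NP", ("Pro",)),
    ("NP", ("Det", "Nom")),
    ("Nom", ("Noun",)),
    ("VP", ("Verb", "NP")),
    ("Pro", ("I",)),
    ("Verb", ("booked",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
]
S = "S"

def is_well_formed(N, T, P, S):
    """Check the formal definition: the start symbol is a nonterminal,
    every left-hand side is a nonterminal, and every right-hand-side
    symbol is either a nonterminal or a terminal."""
    return S in N and all(
        lhs in N and all(sym in N or sym in T for sym in rhs)
        for lhs, rhs in P
    )
```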

Notation for rules In a rule such as S → NP VP, the symbol to the left of the arrow (S) is the left-hand side and the symbols to the right (NP VP) are the right-hand side. This rule says: a sentence (S) consists of a noun phrase (NP) and a verb phrase (VP).

Phrase structure tree For I prefer a morning flight: [S [NP [Pro I]] [VP [Verb prefer] [NP [Det a] [Nom [Nom [Noun morning]] [Noun flight]]]]]

Limitations of context-free grammars Context-free grammars can model many important aspects of natural language syntax. linguistic creativity, nested structures But there are other aspects that they do not model adequately, or are unable to model at all. agreement, crossing dependencies

Subject-verb agreement In English, a verb and its grammatical subject need to agree with respect to number. *[A flight] [leave Boston in the morning] The rules of our example grammar do not capture this regularity: the grammar overgenerates.

Subject-verb agreement One way to solve the problem with overgeneration is to specialise the rules of the grammar with respect to number:
S → NP[sg] VP[sg] : this flight + leaves on Monday
NP[sg] → Det[sg] Nom[sg] : this + flight
VP[sg] → Verb[sg] PP : leaves + on Monday
NP[pl] → Det[pl] Nom[pl] : these + flights
However, this makes the size of the grammar explode.

Chomsky hierarchy recursively enumerable (type 0) ⊃ context-sensitive (type 1) ⊃ context-free (type 2) ⊃ regular (type 3)

This lecture Introduction to syntactic analysis Parsing to phrase structure trees Context-free grammars Parsing with probabilistic context-free grammars Parsing to dependency trees Transition-based dependency parsing

Parsing with probabilistic context-free grammars

Syntactic ambiguity Two parse trees for I booked a flight from LA: Parse 1: [S [NP [Pro I]] [VP [Verb booked] [NP [Det a] [Nom [Nom [Noun flight]] [PP from LA]]]]] (the PP modifies flight). Parse 2: [S [NP [Pro I]] [VP [Verb booked] [NP [Det a] [Nom [Noun flight]]] [PP from LA]]] (the PP modifies booked).

Combinatorial explosion [Figure: number of parse trees as a function of sentence length; exponential growth compared with cubic and linear curves.]

Probabilistic grammars The number of possible parse trees grows exponentially with the length of the sentence. But not all parse trees are equally relevant, and in many applications, we just want to find the most probable parse tree.

Probabilistic context-free grammar A probabilistic context-free grammar (PCFG) is a context-free grammar with the following additional properties: Every rule r has been assigned a probability P(r). The total probability of all rules with the same left-hand side is 1.

Probabilistic context-free grammar Rules and probabilities:
S → NP VP : 1/1
NP → Pronoun : 1/3
NP → Proper-Noun : 1/3
NP → Det Nominal : 1/3
Nominal → Nominal PP : 1/3
Nominal → Noun : 2/3
VP → Verb NP : 8/9
VP → Verb NP PP : 1/9
PP → Preposition NP : 1/1
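The second property, that the probabilities of all rules with the same left-hand side sum to 1, can be checked mechanically. A small sketch in Python using exact fractions; the rule encoding and the function name are my own:

```python
from collections import defaultdict
from fractions import Fraction

# PCFG rules from the slide, as (lhs, rhs, probability) triples.
pcfg = [
    ("S", ("NP", "VP"), Fraction(1, 1)),
    ("NP", ("Pronoun",), Fraction(1, 3)),
    ("NP", ("Proper-Noun",), Fraction(1, 3)),
    ("NP", ("Det", "Nominal"), Fraction(1, 3)),
    ("Nominal", ("Nominal", "PP"), Fraction(1, 3)),
    ("Nominal", ("Noun",), Fraction(2, 3)),
    ("VP", ("Verb", "NP"), Fraction(8, 9)),
    ("VP", ("Verb", "NP", "PP"), Fraction(1, 9)),
    ("PP", ("Preposition", "NP"), Fraction(1, 1)),
]

def is_proper(rules):
    """Check that the rule probabilities for each left-hand side sum to 1."""
    totals = defaultdict(Fraction)
    for lhs, _, p in rules:
        totals[lhs] += p
    return all(total == 1 for total in totals.values())
```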

The probability of a parse tree The probability of a parse tree t is defined as the product of the probabilities of the rules that appear in t: P(t) = ∏_{r in t} P(r)

Probability of a parse tree Parse with the PP attached to the Nominal (the PP modifies flight). Rules used: S → NP VP (1/1), NP → Pronoun (1/3), VP → Verb NP (8/9), NP → Det Nominal (1/3), Nominal → Nominal PP (1/3), Nominal → Noun (2/3). Probability of this tree: 1 · 1/3 · 8/9 · 1/3 · 1/3 · 2/3 ≈ 0.0219

Probability of a parse tree Parse with the PP attached to the VP (the PP modifies booked). Rules used: S → NP VP (1/1), NP → Pronoun (1/3), VP → Verb NP PP (1/9), NP → Det Nominal (1/3), Nominal → Noun (2/3). Probability of this tree: 1 · 1/3 · 1/9 · 1/3 · 2/3 ≈ 0.0082
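The two probabilities can be verified by multiplying out the rule probabilities (as on the slides, lexical rules are ignored). A quick check in Python with exact fractions:

```python
from fractions import Fraction
from math import prod

# Parse 1: the PP attaches to the Nominal (modifies "flight").
tree1_rules = [
    Fraction(1, 1),  # S -> NP VP
    Fraction(1, 3),  # NP -> Pronoun
    Fraction(8, 9),  # VP -> Verb NP
    Fraction(1, 3),  # NP -> Det Nominal
    Fraction(1, 3),  # Nominal -> Nominal PP
    Fraction(2, 3),  # Nominal -> Noun
    Fraction(1, 1),  # PP -> Preposition NP
]
# Parse 2: the PP attaches to the VP (modifies "booked").
tree2_rules = [
    Fraction(1, 1),  # S -> NP VP
    Fraction(1, 3),  # NP -> Pronoun
    Fraction(1, 9),  # VP -> Verb NP PP
    Fraction(1, 3),  # NP -> Det Nominal
    Fraction(2, 3),  # Nominal -> Noun
    Fraction(1, 1),  # PP -> Preposition NP
]

p1 = prod(tree1_rules, start=Fraction(1))  # 16/729 ≈ 0.0219
p2 = prod(tree2_rules, start=Fraction(1))  # 2/243  ≈ 0.0082
```

Since p1 > p2, the grammar prefers the reading in which the PP modifies flight.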

The CKY algorithm We need an efficient algorithm that can find the most probable parse tree, much like the Viterbi algorithm for POS tagging. efficient = runtime is at most polynomial in the length of the sentence One such algorithm is (the probabilistic extension of) the Cocke-Kasami-Younger (CKY) algorithm. (advanced material)
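The details are advanced material, but the core dynamic-programming idea can be sketched. A minimal sketch of probabilistic CKY in Python, assuming the grammar is already in Chomsky normal form; the function name, the grammar encoding, and the toy grammar below are illustrative, not from the lecture:

```python
from collections import defaultdict

def pcky(words, lexical, binary):
    """Probabilistic CKY for a grammar in Chomsky normal form.

    lexical: dict mapping a word to a list of (nonterminal, prob)
    binary:  list of (lhs, rhs1, rhs2, prob) rules
    Returns (best, back), where best[(i, j, A)] is the probability of
    the best A spanning words[i:j], and back stores the split points.
    """
    n = len(words)
    best = defaultdict(float)
    back = {}
    # Base case: one-word spans, covered by lexical rules.
    for i, w in enumerate(words):
        for nt, p in lexical.get(w, []):
            best[i, i + 1, nt] = p
    # Recursive case: combine two smaller spans with a binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, r1, r2, p in binary:
                    cand = p * best[i, k, r1] * best[k, j, r2]
                    if cand > best[i, j, lhs]:
                        best[i, j, lhs] = cand
                        back[i, j, lhs] = (k, r1, r2)
    return best, back

# Toy grammar (probabilities invented for illustration).
lexical = {"I": [("NP", 0.2)], "booked": [("Verb", 1.0)],
           "a": [("Det", 1.0)], "flight": [("Noun", 0.3)]}
binary = [("S", "NP", "VP", 1.0),
          ("VP", "Verb", "NP", 1.0),
          ("NP", "Det", "Noun", 0.8)]
best, back = pcky(["I", "booked", "a", "flight"], lexical, binary)
# best[(0, 4, "S")] is the probability of the most probable S over
# the whole sentence: 0.2 * 1.0 * 0.8 * 0.3 = 0.048
```

The three nested loops over spans and split points give the cubic runtime mentioned in the lecture.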

Combinatorial explosion [Figure, repeated: number of parse trees as a function of sentence length; exponential growth compared with cubic and linear curves.]

Treebanks Until the mid-1990s, syntactic parsers used large, hand-written grammars created by linguistic experts. Modern parsers are learned from corpora of syntactic analyses called treebanks. Penn Treebank, Swedish Treebank, Universal Dependencies Project

Penn Treebank Annotation of the Pierre Vinken sentence:
( (S
    (NP-SBJ
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,)
      (ADJP (NP (CD 61) (NNS years) ) (JJ old) )
      (, ,) )
    (VP (MD will)
      (VP (VB join)
        (NP (DT the) (NN board) )
        (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) ))
        (NP-TMP (NNP Nov.) (CD 29) )))
    (. .) ))
Grammar rules and the phrases they cover:
S → NP-SBJ VP . : Pierre Vinken … Nov. 29 .
NP-SBJ → NP , ADJP , : Pierre Vinken, 61 years old,
VP → MD VP : will join the board …
NP → DT NN : the board

Estimation of rule probabilities Given a phrase structure treebank, the rule probabilities of a PCFG can be obtained using maximum likelihood estimation. To do this, we divide the count for a certain rule by the count for all rules that share the same left-hand side.
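This estimation procedure can be sketched in a few lines of Python, here over a two-tree toy treebank encoded as nested tuples (the encoding and all names are my own, not Penn Treebank format):

```python
from collections import Counter

# A toy "treebank": trees as nested tuples (label, child, ...),
# with plain strings as leaves (words).
treebank = [
    ("S", ("NP", ("Pro", "I")),
          ("VP", ("Verb", "booked"),
                 ("NP", ("Det", "a"), ("Noun", "flight")))),
    ("S", ("NP", ("Pro", "I")),
          ("VP", ("Verb", "left"))),
]

def count_rules(tree, counts):
    """Count one (lhs, rhs) rule instance per internal node."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[label, rhs] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c, counts)

counts = Counter()
for tree in treebank:
    count_rules(tree, counts)

lhs_totals = Counter()
for (lhs, _), c in counts.items():
    lhs_totals[lhs] += c

# Maximum likelihood estimate: count(rule) / count(lhs).
prob = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
```

In this toy treebank, NP → Pro occurs twice and NP → Det Noun once, so their estimated probabilities are 2/3 and 1/3.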

Sample exam question: Estimate rule probabilities

This lecture Introduction to syntactic analysis Parsing to phrase structure trees Context-free grammars Parsing with probabilistic context-free grammars Parsing to dependency trees Transition-based dependency parsing

Transition-based dependency parsing

Algorithmic approaches Exhaustive search Cast parsing as a combinatorial optimisation problem over the set of target representations (trees). CKY algorithm Greedy search Cast parsing as a sequence of classification problems: at each point in time, predict one of several parser actions. transition-based dependency parsing

Dependency parsing as classification In Section 3 we have seen how part-of-speech tagging can be broken down into a sequence of classification problems. part-of-speech tagging with the multi-class perceptron In this section we will see how the same idea can be applied to dependency parsing. Instead of POS tags, the classifier will predict transitions (also called moves) that take the parser from one configuration (state) to the next.

Transition-based dependency parsing The parser starts in the initial configuration. It then calls the classifier, which predicts the transition that the parser should make to move to the next configuration. This process is repeated until the parser reaches a terminal configuration.

Configurations A parser configuration consists of three parts: A buffer, which contains those words in the sentence that still need to be processed. Initially, the buffer contains all words. A stack, which contains those words in the sentence that are currently being processed. Initially, the stack is empty. A partial dependency tree. Initially, this tree contains all the words of the sentence, but no dependency arcs.

Transitions The shift transition (SH) removes the frontmost word from the buffer and pushes it to the top of the stack. The left-arc transition (LA) creates a dependency from the topmost word on the stack to the second-topmost word, and removes the second-topmost word. The right-arc transition (RA) creates a dependency from the second-topmost word on the stack to the topmost word, and removes the topmost word.

Transition-based dependency parsing, example Parsing I booked a flight from L.A. At each step, the classifier predicts the next transition:
Step  Transition  Stack                     Buffer
0     (initial)                             I booked a flight from L.A.
1     SH          I                         booked a flight from L.A.
2     SH          I booked                  a flight from L.A.
3     LA          booked                    a flight from L.A.
4     SH          booked a                  flight from L.A.
5     SH          booked a flight           from L.A.
6     LA          booked flight             from L.A.
7     SH          booked flight from        L.A.
8     SH          booked flight from L.A.
9     RA          booked flight from
10    RA          booked flight
11    RA          booked                    (terminal configuration)
Arcs created: booked → I (step 3), flight → a (step 6), from → L.A. (step 9), flight → from (step 10), booked → flight (step 11).
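The configurations and transitions can be implemented directly. A minimal sketch in Python that replays the transition sequence from the example; the function names and the (head, dependent) arc representation are my own choices:

```python
def shift(stack, buffer, arcs):
    """SH: move the frontmost word of the buffer onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

def left_arc(stack, buffer, arcs):
    """LA: the topmost word becomes the head of the second-topmost
    word, which is removed from the stack."""
    head, dep = stack[-1], stack[-2]
    return stack[:-2] + [head], buffer, arcs + [(head, dep)]

def right_arc(stack, buffer, arcs):
    """RA: the second-topmost word becomes the head of the topmost
    word, which is removed from the stack."""
    head, dep = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs + [(head, dep)]

TRANSITIONS = {"SH": shift, "LA": left_arc, "RA": right_arc}

def parse(words, transitions):
    """Replay a transition sequence from the initial configuration."""
    stack, buffer, arcs = [], list(words), []
    for t in transitions:
        stack, buffer, arcs = TRANSITIONS[t](stack, buffer, arcs)
    return stack, buffer, arcs

words = ["I", "booked", "a", "flight", "from", "L.A."]
seq = ["SH", "SH", "LA", "SH", "SH", "LA", "SH", "SH", "RA", "RA", "RA"]
stack, buffer, arcs = parse(words, seq)
# Terminal configuration: empty buffer, only "booked" on the stack.
```

In a real parser, the sequence seq would of course not be given in advance; each transition would be predicted by the classifier from the current configuration.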

Features in transition-based dependency parsing Features can be defined over the next words in the buffer the topmost words in the stack the partial dependency tree

Features in transition-based dependency parsing Example configuration for I booked a flight from L.A.: stack I booked, buffer a flight from L.A. Useful questions: Is booked a verb? Can I be a subject? Does booked already have a subject?
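Such questions can be turned into a feature function over configurations. A small illustrative sketch in Python; the feature names and the pos dictionary are invented for the example:

```python
def features(stack, buffer, arcs, pos):
    """Extract a few illustrative features from a configuration.

    pos maps each word to its part-of-speech tag; arcs is a list of
    (head, dependent) pairs in the partial dependency tree.
    """
    return {
        "stack.top.tag": pos[stack[-1]] if stack else "NONE",
        "stack.second.tag": pos[stack[-2]] if len(stack) > 1 else "NONE",
        "buffer.front.tag": pos[buffer[0]] if buffer else "NONE",
        "stack.top.has-dependent":
            any(h == stack[-1] for h, _ in arcs) if stack else False,
    }

pos = {"I": "PRON", "booked": "VERB", "a": "DET",
       "flight": "NOUN", "from": "ADP", "L.A.": "PROPN"}
feats = features(["I", "booked"], ["a", "flight", "from", "L.A."], [], pos)
```

The classifier then predicts the next transition from such a feature dictionary, for example with the multi-class perceptron from the POS-tagging section.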

Training transition-based dependency parsers To train a transition-based dependency parser, we need a treebank with dependency trees. In addition to that, we need an algorithm, called an oracle, that tells us the gold-standard transition sequence for each tree in that treebank.

This lecture Introduction to syntactic analysis Parsing to phrase structure trees Context-free grammars Parsing with probabilistic context-free grammars Parsing to dependency trees Transition-based dependency parsing