Introduction to Computational Linguistics


Syntax and Parsing
Introduction to Computational Linguistics
Olga Zamaraeva (2018), based on Bender (previous years)
University of Washington
April 24, 2018

Assignment 3 (N-grams): due Friday
Midterm: May 1st
Optional discussion board available instead of RQ
Project Milestone 1: grading was based on how easy it was to decide whether it was a viable project
  Data!
Project Milestone 2: due next Friday
  Description expanded
  Data!
  Submit your group members' names... (as per instructions: http://courses.washington.edu/ling472/final project.html)

Parsing: constituent and dependency
Leonard Bernstein's musical syntax: https://www.youtube.com/watch?v=r fxb6yrdvo
Target representations
Evaluating parsing

Parsing
Recognizing a string as input and assigning structure to it
Syntactic parsing: assigning syntactic structure
Semantic parsing: assigning semantic structure

Syntactic parsing
Parsing: making explicit the structure that is inherent (implicit) in natural language strings
What is that structure?
Why would we need it?
(S (NP I) (VP saw (NP the astronomer (PP with the telescope))))
(S (NP I) (VP saw (NP the astronomer) (PP with the telescope)))
Figure credits: Carnie, A. Mixed categories in Irish (2011); http://www.dobnik.net/simon/teaching/shared/lt2112-formling/pics/?ma

Implicit structure
What do these sentences have in common?
  Kim gave the book to Sandy.
  Kim gave Sandy the book.
  The book was given to Sandy by Kim.
  This is the book that Kim gave to Sandy.
  Which book did Kim give to Sandy?
  Kim will be expected to continue to try to give the book to Sandy.
  This book everyone agrees Pat thinks Kim gave to Sandy.
  This book is difficult for Kim to give to Sandy.

Implicit structure: constituent structure and dependency structure
Kim gave the book to Sandy.
Constituent structure: (S (NP Kim) (VP (V gave) (NP (D the) (N book)) (PP (P to) (NP Sandy))))
Dependency structure: subj(gave, Kim); dobj(gave, book); iobj(gave, to); dobj(to, Sandy); spec(book, the)
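
The bracket notation on this slide can be read mechanically into a tree. The following is a minimal sketch, assuming labels and words are separated by whitespace; the helper names `tokenize`, `parse_brackets`, and `leaves` are hypothetical and not part of the course materials.

```python
def tokenize(text):
    """Split a bracketed string into "(", ")" and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_brackets(text):
    """Read one bracketed tree into nested (label, children) tuples.

    Words (leaves) are kept as plain strings.
    """
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            word = tokens[pos]
            pos += 1
            return word
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1                      # skip ")"
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(tree):
    """Return the words of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]


tree = parse_brackets(
    "(S (NP Kim) (VP (V gave) (NP (D the) (N book)) (PP (P to) (NP Sandy))))"
)
print(" ".join(leaves(tree)))
```

Reading the leaves back off the tree recovers the original sentence, which is a quick sanity check that a bracketing is well formed.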

Dependency parsing
Instead of constituents, look at grammatical relations between heads of constituents
Why?
  flexible word order
  semantics
  relations between active and passive

Dependency structure

Exercise: constituent structure and dependency structure
How much wood would a woodchuck chuck if a woodchuck could chuck wood?

English Resource Grammar

Stanford Parser

When do we need structure?
When do we need constituent structure?
  Structured language models (ASR, MT)
  Translation models (MT)
  Generation
  TTS: assigning intonation information
When do we need dependency structure?
  Information extraction (... QA, machine reading)
  Dialogue systems
  Sentiment analysis
  Transfer-based MT

Ambiguity
Parsing: making explicit the structure that is inherent (implicit) in natural language strings
How does it relate to the ambiguity issue?
Suppose you have a parser. Does it help you with ambiguity?
No! But it can do parse ranking...
(S (NP I) (VP saw (NP the astronomer (PP with the telescope))))
(S (NP I) (VP saw (NP the astronomer) (PP with the telescope)))
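
To see why a parser alone does not resolve ambiguity, it can help to count the analyses a grammar licenses. The sketch below uses a CKY-style chart over a toy binary grammar; the grammar, the lexicon, and the function name `count_parses` are assumptions made for illustration, not the grammar used in the course.

```python
from collections import defaultdict

# Toy grammar (an assumption for this sketch): (left child, right child) -> parents
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "astronomer": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(sentence, start="S"):
    """Count the distinct trees rooted in `start` that cover the sentence."""
    words = sentence.split()
    n = len(words)
    chart = defaultdict(int)          # (i, j, category) -> number of trees
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[i, i + 1, cat] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (left, right), parents in BINARY.items():
                    combos = chart[i, k, left] * chart[k, j, right]
                    if combos:
                        for parent in parents:
                            chart[i, j, parent] += combos
    return chart[0, n, start]


print(count_parses("I saw the astronomer with the telescope"))
```

The chart finds both attachments and has no basis for preferring one, which is where parse ranking comes in.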

Context-Free Grammars generate Context-Free Languages
CF languages fit into the Chomsky hierarchy between regular languages and context-sensitive languages
All regular languages are also context-free languages
All sets of strings describable by FSAs can be described by a CFG
But not vice versa
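
A standard illustration of "not vice versa" (not taken from the slides) is the language a^n b^n: the CFG S → a S b | ε generates it, but no FSA can, because matching the counts needs unbounded memory. A minimal sketch of a recognizer that follows that grammar directly, with the hypothetical name `in_anbn`:

```python
def in_anbn(s):
    """Recognize a^n b^n by following S -> a S b | (empty string)."""
    if s == "":
        return True                   # S -> empty
    # S -> a S b: strip one "a" from the front and one "b" from the back
    return s[0] == "a" and s[-1] == "b" and in_anbn(s[1:-1])


for candidate in ["", "ab", "aabb", "aab", "abab"]:
    print(repr(candidate), in_anbn(candidate))
```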

NP → Det N
(NP Det N)
Represent constituent structure
Encode a sharp notion of grammaticality
Compare to N-gram models

Grammaticality
What is a grammatical sentence?
What is an ungrammatical sentence?
Which sentences are grammatical?
  I want to book a flight to Boston
  I want to booked a flight to Boston
  Colorless green ideas sleep furiously
  'Twas brillig, and the slithy toves did gyre and gimble in the wabe
...from a CFG's point of view?
...from a probabilistic model point of view?
...from a human point of view?

CFGs, informally
Consist of rules, or productions
  each expresses the ways that symbols of the language can be grouped and ordered
...and a lexicon of words and symbols.
NP → Det Nominal
NP → ProperNoun
Nominal → Noun | Nominal Noun
Det → a
Det → the
Noun → flight

CFGs, formally
A CFG is a 4-tuple <C, Σ, P, S>:
  C is the set of categories (aka non-terminals, e.g., {S, NP, VP, V, ...})
  Σ is the vocabulary (aka terminals, e.g., {Kim, snow, adores, ...})
  P is the set of rewrite rules, of the form α → β1, β2, ..., βn
  S (in C) is the start symbol
For each rule α → β1, β2, ..., βn in P, α is drawn from C and each β is drawn from C or Σ
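
The 4-tuple definition maps directly onto a small data structure. The sketch below is one possible encoding, with hypothetical names `CFG` and `check_cfg`; the rules for the Kim/snow/adores vocabulary are invented here for illustration.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    categories: frozenset   # C: non-terminals
    vocabulary: frozenset   # Sigma: terminals
    rules: tuple            # P: pairs (alpha, (beta_1, ..., beta_n))
    start: str              # S: the start symbol


def check_cfg(grammar):
    """Return a list of violations of the definition (empty if none)."""
    problems = []
    if grammar.start not in grammar.categories:
        problems.append(f"start symbol {grammar.start!r} is not in C")
    for lhs, rhs in grammar.rules:
        if lhs not in grammar.categories:
            problems.append(f"left-hand side {lhs!r} is not in C")
        for symbol in rhs:
            if symbol not in grammar.categories and symbol not in grammar.vocabulary:
                problems.append(f"symbol {symbol!r} is in neither C nor Sigma")
    return problems


toy = CFG(
    categories=frozenset({"S", "NP", "VP", "V"}),
    vocabulary=frozenset({"Kim", "snow", "adores"}),
    rules=(
        ("S", ("NP", "VP")),
        ("VP", ("V", "NP")),
        ("NP", ("Kim",)),
        ("NP", ("snow",)),
        ("V", ("adores",)),
    ),
    start="S",
)
print(check_cfg(toy))
```

The check encodes only the last line of the slide: every left-hand side comes from C, and every right-hand-side symbol comes from C or Σ.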

Parsing and Generation
...familiar dualism (recall FSTs)
Parsing: assigning a structure to a string
Generation: using rules to write strings
Derivation: getting from a string to a structure (or vice versa) by applying a series of rules
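
A derivation in the generation direction can be traced step by step. The sketch below rewrites the leftmost non-terminal at each step, using the small NP grammar from the "informally" slide; the function name `leftmost_derivation` and the encoding of rule choices as indices are assumptions of this sketch.

```python
# Rules from the informal-definition slide. ProperNoun has no lexical
# entries there, so it is not a key here and is never rewritten.
RULES = {
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"]],
    "Det": [["a"], ["the"]],
    "Noun": [["flight"]],
}


def leftmost_derivation(start, choices):
    """Apply one rule per entry in `choices` to the leftmost non-terminal.

    Each choice is the index of the alternative to use for that symbol.
    Returns every intermediate form, starting with [start].
    """
    form = [start]
    steps = [form]
    for choice in choices:
        index = next(i for i, symbol in enumerate(form) if symbol in RULES)
        form = form[:index] + RULES[form[index]][choice] + form[index + 1:]
        steps.append(form)
    return steps


for step in leftmost_derivation("NP", [0, 1, 0, 0]):
    print(" ".join(step))
```

Read top to bottom, the printed forms are generation; read bottom to top, they are the steps a parser has to recover.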

The Start symbol
Is needed for us to know where to start (or finish), to get a well-formed structure
Would we want to start deriving from VP? Maybe! It depends on the situation
When denoted S, it is easy to think of it as referring to "sentence", but that need not be the case
It really refers to Start, not sentence.

Example
Book my flight.
Do you know the number?
He gave me the number.
He served my dinner.
Using the following lexicon, write rules that will generate (at least) these sentences, and assign them plausible structures.
Aux = {do, does}
V = {book, know, gave, serve, served}
N = {flight, number, dinner, you, me, he}
Det = {my, the}

Example, candidate grammar
Book my flight.
Do you know the number?
He gave me the number.
He served my dinner.
Aux = {do, does}
V = {book, know, gave, serve, served}
N = {flight, number, dinner, you, me, he}
Det = {my, the}
S → NP VP | VP | Aux S
NP → Det N
VP → V NP (NP)
What is missing? How to fix it?

Example, candidate grammar
Book my flight.
Do you know the number?
He gave me the number.
He served my dinner.
Aux = {do, does}
V = {book, know, gave, serve, served}
N = {flight, number, dinner}
PRO = {me, you, he}
Det = {my, the}
S → NP VP | VP | Aux S
NP → Det N | PRO
VP → V NP (NP)
Better?
Does the flight serve dinner? (Problem?)
*He serve my dinner (Problem?)
With serves added to V: *Does this flight serves you dinner (Problem?)
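
The candidate grammar with PRO can be tested mechanically. The following is a minimal top-down recognizer sketch over that grammar, with VP → V NP (NP) spelled out as two rules; the function name `recognize` and the strategy (tracking the positions where each symbol can end) are choices made for this sketch. It relies on the grammar having no left recursion.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP"), ("VP",), ("Aux", "S")],
    "NP": [("Det", "N"), ("PRO",)],
    "VP": [("V", "NP"), ("V", "NP", "NP")],   # VP -> V NP (NP)
}
LEXICON = {
    "Aux": {"do", "does"},
    "V": {"book", "know", "gave", "serve", "served"},
    "N": {"flight", "number", "dinner"},
    "PRO": {"me", "you", "he"},
    "Det": {"my", "the"},
}


def recognize(sentence):
    """Return True if the grammar derives the sentence from S."""
    words = tuple(sentence.lower().rstrip(".?!").split())

    @lru_cache(maxsize=None)
    def ends(symbol, i):
        """Positions j such that `symbol` can cover words[i:j]."""
        if symbol in LEXICON:
            if i < len(words) and words[i] in LEXICON[symbol]:
                return frozenset({i + 1})
            return frozenset()
        reachable = set()
        for rhs in RULES[symbol]:
            positions = {i}
            for child in rhs:
                positions = {j for p in positions for j in ends(child, p)}
            reachable |= positions
        return frozenset(reachable)

    return len(words) in ends("S", 0)


for s in ["Book my flight.", "Do you know the number?",
          "He gave me the number.", "He served my dinner.",
          "Does the flight serve dinner?", "He serve my dinner"]:
    print(s, recognize(s))
```

Under these rules the four target sentences are accepted, "Does the flight serve dinner?" is rejected because a bare noun is not an NP, and "*He serve my dinner" is accepted because nothing enforces agreement.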

Example
Book my flight.
Do you know the number?
He gave me the number.
He served my dinner.
AuxSG = {does}
AuxPL = {do}
V-PL = {book, know, gave, serve, served}
V-SG = {books, knows, gave, serves, served}
N-SG = {flight, number, dinner}
N-PL = {flights, numbers, dinners}
PRO-SG = {me, you, he}
PRO-PL = {you}
Det = {my, the}
S → NP VP | VP | Aux S
NP-SG → Det N-SG | PRO
NP-SG → Det N-PL | PRO
VP → V NP (NP)
Problem?

Limitations of CFGs
The cat chases the mouse vs. *The cat chase the mouse
Can we model agreement?
Sure. But the grammar will quickly become rather huge and inelegant!
  ...will need duplicate rules whenever 3rd person singular and plural are involved
  ...what about languages with lots of various inflections?
What about relating passive and interrogative sentences to their declarative counterparts?
How many subtypes of verbs will we need?
  For each subcategorization frame, we will need to duplicate all the appropriate rules

What about heads?
A head in syntactic theory is the item that is the most important in the phrase
Essential for dependency parsing and probabilistic parsing
Can we augment CFGs with heads?

Parsing algorithms and grammars
A grammar is typically input to a parser (e.g., we've been parsing by hand, using CFGs)
Grammars can be engineered or learned statistically from corpora
Both approaches have pros and cons
In particular, engineered grammars have higher precision while statistically learned grammars have higher recall. Why?

Evaluating parsing
How would you do extrinsic evaluation of a parsing system?
How would you do intrinsic evaluation?
  Gold standard data?
  Metrics?

Gold standard
What would a gold standard look like?
A corpus of string-to-structure mappings
Is this different from a corpus of handwritten-digit to actual-digit mappings? From a corpus of string-to-POS-sequence mappings?
But: there's no ground truth in trees!
Semantic dependencies might be easier to get cross-framework agreement on, but even there it's non-trivial
The Penn Treebank (Marcus et al., 1993) was originally conceived of as a target for cross-framework parser evaluation

Metrics: Parseval
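
Parseval-style evaluation compares the labeled constituents of a parser's tree with those of the gold tree and reports precision, recall, and F1. The sketch below is a simplified version under stated assumptions: trees are nested (label, children) tuples, part-of-speech nodes over a single word are not counted, and punctuation handling is ignored. The names `constituents` and `parseval` and the two example trees are hypothetical.

```python
from collections import Counter


def constituents(tree, start=0):
    """Return (Counter of (label, start, end) spans, end position)."""
    label, children = tree
    spans = Counter()
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1                              # a word advances the position
        else:
            child_spans, pos = constituents(child, pos)
            spans += child_spans
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    if not is_preterminal:
        spans[(label, start, pos)] += 1
    return spans, pos


def parseval(gold, hypothesis):
    """Labeled bracket precision, recall, and F1 for one sentence."""
    gold_spans, _ = constituents(gold)
    hyp_spans, _ = constituents(hypothesis)
    matched = sum((gold_spans & hyp_spans).values())
    precision = matched / sum(hyp_spans.values())
    recall = matched / sum(gold_spans.values())
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


SUBJ = ("NP", [("PRO", ["I"])])
OBJ = ("NP", [("Det", ["the"]), ("N", ["astronomer"])])
PP = ("PP", [("P", ["with"]), ("NP", [("Det", ["the"]), ("N", ["telescope"])])])

# Gold: PP attached to the VP. Hypothesis: PP attached inside the object NP.
GOLD = ("S", [SUBJ, ("VP", [("V", ["saw"]), OBJ, PP])])
HYP = ("S", [SUBJ, ("VP", [("V", ["saw"]), ("NP", [OBJ, PP])])])

print(parseval(GOLD, HYP))
```

The hypothesis contains every gold bracket plus one extra NP, so recall is perfect while precision drops.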

A treebank is a syntactically annotated corpus

Tree visualization
(S (NP-SBJ (DT the) (NN flight))
   (VP (MD should)
       (VP (VB arrive)
           (PP-TMP (IN at) (NP (CD eleven) (RB a.m.)))
           (NP-TMP (NN tomorrow)))))

Treebanks as grammars
How to turn a treebank into a grammar?
Extract the rewrite rules
Exercise: extract the rules from the sample treebank sentences in the previous slide
We could also count how many times we saw which production
Anything useful we could do with that?
Statistical parsing! (next week)
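
Rule extraction and counting can be sketched in a few lines. The example below assumes the "Tree visualization" sentence has the bracketing encoded in `TREE` as nested (label, children) tuples, and it turns counts into relative frequencies, which is one common way to estimate rule probabilities; the names `productions`, `counts`, and `probs` are hypothetical.

```python
from collections import Counter

# Assumed bracketing of "the flight should arrive at eleven a.m. tomorrow"
TREE = ("S", [
    ("NP-SBJ", [("DT", ["the"]), ("NN", ["flight"])]),
    ("VP", [
        ("MD", ["should"]),
        ("VP", [
            ("VB", ["arrive"]),
            ("PP-TMP", [("IN", ["at"]),
                        ("NP", [("CD", ["eleven"]), ("RB", ["a.m."])])]),
            ("NP-TMP", [("NN", ["tomorrow"])]),
        ]),
    ]),
])


def productions(tree):
    """Yield one (lhs, rhs) rewrite rule per node of the tree."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


counts = Counter(productions(TREE))

# Relative frequency: count(lhs -> rhs) / count(lhs)
lhs_totals = Counter()
for (lhs, _), n in counts.items():
    lhs_totals[lhs] += n
probs = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

With a single tree the estimates are trivial, but the same loop over a whole treebank yields the rule counts that statistical parsing builds on.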

Dependency Formalisms and Treebanks
Universal Dependencies (Nivre et al., 2016)
The Penn Treebank (Marcus et al., 1993)

Dependency Treebank

What you need to know
Parsing, grammar, grammaticality definitions
Bracket and tree notation
CFGs: informal definition, production, derivation
Treebanks (what they are)