LING/C SC/PSYC 438/538. Lecture 23 Sandiway Fong

Today's Topics
  Natural language parsing: syntactic analysis
  Homeworks 11 and 12

Natural Language Parsing
  Syntax trees are a big deal in NLP
  Reminder: reading homework (JM):
    chapter 5, sections 1 and 2
    chapter 12
  Stanford Parser / Berkeley Parser (context-free grammars: type-2)
    http://nlp.stanford.edu:8080/parser/index.jsp
    http://tomato.banatao.berkeley.edu:8080/parser/parser.html
    Both use probabilistic rules learnt from a Treebank corpus
    Output: syntax tree diagrams (also a dependency graph: Stanford)
  We do a lot with Treebanks in the follow-on course to this one (LING 581, Spring)

Natural Language Parsing
  A new generation of "deep learning" parsers (last two years):
    Google Cloud Natural Language (aka SyntaxNet)
    UDPipe
  Output: dependency parses (only)
  https://cloud.google.com/natural-language/

Training Data
  Penn Treebank: parsed by human annotators
  Example sentence (WSJ section):
    Efforts by the Hong Kong Futures Exchange to introduce a new interest-rate futures contract continue to hit snags despite the support the proposed instrument enjoys in the colony's financial community.

Natural Language Parsing
  Comparison between human parse and machine parse: empty categories are not recovered by parsing; otherwise a good match!

Part of Speech (POS)
  JM Chapter 5: Parts of speech
  Classic eight parts of speech (e.g. englishclub.com)
    traced back to Latin scholars, and further back to ancient Greek (Thrax)
  Not everyone agrees on what they are. The textbook lists:
    open class: 4 (nouns, verbs, adjectives, adverbs)
    closed class: 7 (prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals)
  Nor on what the subclasses are
    e.g. what is a Proper Noun? Saturday, April
    Textbook answer below

Part of Speech (POS)
  Getting POS information about a word:
    1. dictionary
    2. pronunciation: e.g. are you content with the CONtent of the slide?
    3. possible n-gram sequences
       e.g. *pronoun << common noun
       e.g. the << common noun
    4. structure of the sentence/phrase (Syntax)
    5. possible inflectional endings:
       e.g. V-s/-ed/-en/-ing
       e.g. N-s
  Task: POS tagging
    In computational linguistics, the Penn Treebank tagset is the most commonly used tagset (reprinted inside the front cover of your textbook)
    45 tags listed in textbook
    36 POS + 10 punctuation

Part of Speech (POS)
  http://faculty.washington.edu/dillon/gramresources/penntable.html
  NNP (proper noun, singular), NNPS (proper noun, plural)

Part of Speech (POS)
  PRP (personal pronoun), PRP$ (possessive pronoun)

Part of Speech (POS)
  Stanford parser: walk noun/verb
  Disambiguation:
    1. Syntax
    2. Bigram sequence:
       *PRP << NN
       DT << NN
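
The bigram idea for disambiguating walk can be sketched in a few lines of Prolog. This is only an illustration: the toy lexicon, the predicate names (tag/2, bad_bigram/2, tag_seq/2) and the second excluded bigram (DT followed by VBP) are assumptions made for the sketch, not part of the lecture or of any tagger.

```prolog
% Hypothetical toy lexicon: word -> possible Penn Treebank tag.
tag(the,  dt).
tag(we,   prp).
tag(walk, nn).
tag(walk, vbp).

% Hypothetical filter: tag pairs that may not be adjacent.
bad_bigram(prp, nn).    % *PRP << NN (from the slide)
bad_bigram(dt,  vbp).   % assumption added for this sketch

% tag_seq(+Words, -Tags): choose one tag per word so that
% no adjacent pair of tags is a bad bigram.
tag_seq([], []).
tag_seq([W|Ws], [T|Ts]) :-
    tag(W, T),
    tag_rest(Ws, T, Ts).

% tag_rest(+Words, +PrevTag, -Tags): tag the remaining words,
% checking each chosen tag against the previous one.
tag_rest([], _, []).
tag_rest([W|Ws], Prev, [T|Ts]) :-
    tag(W, T),
    \+ bad_bigram(Prev, T),
    tag_rest(Ws, T, Ts).
```

With these facts, ?- tag_seq([the, walk], Tags). gives Tags = [dt, nn] and ?- tag_seq([we, walk], Tags). gives Tags = [prp, vbp], each as the only answer. A real tagger would use bigram probabilities estimated from a corpus rather than a hard filter.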

Part of Speech (POS)
  Word sense disambiguation (WSD) is more than POS tagging: different senses of the noun bank

Syntax
  Words combine recursively with one another into phrases (aka constituents)
  Usually when two words combine, one word will head the phrase
    e.g. [VB/VBP eat] [NN chocolate]
    e.g. [VB/VBP eat] [DT some] [NN chocolate]
  (Diagram labels: projects, object, projects)
  Warning: terminology and parses in computational linguistics are not necessarily the same as those used in theoretical linguistics

Syntax
  Words combine recursively with one another into phrases (aka constituents)
    e.g. [PRP we] [VB/VBP eat] [NN chocolate]
    e.g. [TO to] [VB/VBP eat] [NN chocolate]
  (Diagram label: subject)

Syntax
  Words combine recursively with one another into phrases (aka constituents)
    e.g. [NNP John] [VBD noticed] [IN/DT/WDT that] [PRP we] [VB/VBP eat] [NN chocolate]
  (Diagram labels: selects/subcategorizes for, preposition, projects, CP, projects, complementizer (C))

Syntax
  Words combine recursively with one another into phrases (aka constituents)
  How about an SBAR node?
  PRO
  cf. John wanted me to eat chocolate

Syntax
  Words combine recursively with one another into phrases (aka constituents)
    1. John noticed that we eat chocolate
    2. John noticed we eat chocolate
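
One way to picture the two sentences above is a small Prolog DCG in which the embedded clause (SBAR) may or may not start with a complementizer. This is a sketch under assumptions: the rule set, the tree shapes, and the nonterminal name c (for the complementizer C, which the Penn Treebank would tag IN) are chosen for illustration and are not the analysis given in class.

```prolog
% Each nonterminal carries one argument that builds the parse tree.
s(s(NP, VP)) --> np(NP), vp(VP).

np(np(N)) --> nnp(N).
np(np(P)) --> prp(P).
np(np(N)) --> nn(N).

vp(vp(V, NP))   --> vbp(V), np(NP).      % eat + object
vp(vp(V, SBAR)) --> vbd(V), sbar(SBAR).  % noticed + clause

% The embedded clause with and without an overt complementizer.
sbar(sbar(C, S)) --> c(C), s(S).
sbar(sbar(S))    --> s(S).

% Lexicon.
nnp(nnp(john))     --> [john].
prp(prp(we))       --> [we].
nn(nn(chocolate))  --> [chocolate].
vbp(vbp(eat))      --> [eat].
vbd(vbd(noticed))  --> [noticed].
c(c(that))         --> [that].
```

Both ?- phrase(s(T), [john, noticed, that, we, eat, chocolate]). and ?- phrase(s(T), [john, noticed, we, eat, chocolate]). succeed with a single tree each; the two trees differ only in whether the sbar node contains c(that).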

Homework 11
  Question 1: write a Prolog CFG for the following sentences:
    1. John ate (sensibly) (intransitive eat)
    2. I fish (intransitive fish)
    3. I ate fish (transitive eat)
    4. Bill ate rice
    5. Harry ate roast beef
  Note: you can use lowercase names (or quotes, e.g. 'John')
  Note: use the Penn Treebank tagset for words (see inside the cover of your textbook, or the Stanford Parser)
    prp(prp(i)) --> [i].
    nnp(nnp(john)) --> [john].
    vbd(vbd(ate)) --> [ate].
  Your grammar should produce one parse tree per example
  Your grammar should not contain infinite loops
  Use ; (for more answers) to show your code obeys the aforementioned constraints
  Submit your grammar and examples of runs
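
As a rough illustration of the "one parse tree per example" requirement, the sketch below uses a toy grammar with made-up words (mary, slept), not the homework sentences. The helper parses/2 is a hypothetical name introduced here; it simply collects every tree that phrase/2 can find, which is the same information you get by typing ; at the prompt.

```prolog
% Toy grammar (hypothetical words, not the homework sentences).
s(s(NP, VP)) --> np(NP), vp(VP).
np(np(N))    --> nnp(N).
vp(vp(V))    --> vbd(V).

nnp(nnp(mary))  --> [mary].
vbd(vbd(slept)) --> [slept].

% parses(+Words, -Trees): Trees is the list of all parse trees for Words.
% A grammar meeting the constraint yields a one-element list
% for each grammatical sentence and [] for an ungrammatical one.
parses(Words, Trees) :-
    findall(T, phrase(s(T), Words), Trees).
```

Here ?- phrase(s(T), [mary, slept]). returns T = s(np(nnp(mary)), vp(vbd(slept))), and asking for more answers with ; fails instead of looping. Equivalently, ?- parses([mary, slept], Ts). binds Ts to a list containing just that tree.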

Homework 11
  Question 2: expand your grammar to handle these sentences:
    6. I ate fish, and Bill ate rice
    7. *I ate fish, Bill ate rice
    8. I ate fish, Bill ate rice, and Harry ate roast beef
  Note: the comma can be a quoted terminal, e.g. [',']
    comma(comma(',')) --> [','].
    ','(','(',')) --> [','].
  Note: be careful of left recursion on S (Stanford Parser)
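
The left-recursion warning can be illustrated on a smaller case than S. The sketch below uses coordination of toy noun phrases (cats and dogs); the words, tree shapes and the cc label for the conjunction are assumptions for illustration only. The left-recursive rule is left as a comment because Prolog's top-down, left-to-right search can call np again before consuming any input and so never terminate.

```prolog
% Left-recursive formulation (kept as a comment; it can loop in Prolog):
%   np(np(L, cc(and), R)) --> np(L), [and], np(R).

% Right-recursive rewrite: always consume a word first, then recurse.
np(np(N)) --> nn(N).
np(np(np(N), cc(and), Rest)) --> nn(N), [and], np(Rest).

% Toy lexicon.
nn(nn(cats))  --> [cats].
nn(nn(dogs))  --> [dogs].
nn(nn(birds)) --> [birds].
```

With this version, ?- phrase(np(T), [cats, and, dogs]). gives T = np(np(nn(cats)), cc(and), np(nn(dogs))) and then fails cleanly on ;. The price of the rewrite is that longer coordinations come out right-branching, which may or may not match the tree shape you want.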

Homework 12
  Mandatory for 538; Extra Credit for 438.
  From Ross (1970), English exhibits (forward) gapping:
    8. I ate fish, Bill rice, and Harry roast beef
       cf. I ate fish, Bill ate rice, and Harry ate roast beef
  Forwards only (cf. Japanese: backwards):
    9. I ate fish, Bill ate rice, and Harry roast beef
    10. *I fish, Bill rice, and Harry ate roast beef
    11. *I fish, Bill ate rice, and Harry ate roast beef
    12. *I fish, Bill ate rice, and Harry roast beef
  Parallelism requirement:
    13. *I ate fish, Bill, and Harry roast beef
    14. *I ate fish, Bill rice, and Harry

Homework 12
  Gapping:
    8. I ate fish, Bill rice, and Harry roast beef
  (not as gapping)
    I ate fish, Bill rice, and roast beef
    I ate fish, rice, and Harry roast beef
    (you don't have to handle these two)
  Update your grammar in Homework 11 to handle gapping
  Hint 1: use an extra argument to represent and spread the elided verb
  Hint 2: you can insert Prolog code into rules, e.g. {nonvar(V)}, {var(V)}, or {A=B}
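
The two hints can be seen at work in a toy grammar that has nothing to do with gapping. In the sketch below, an extra argument Num carries number information from the subject to the verb, and goals in curly braces test whether that argument has been bound. All names (n, v, np_status) and words are hypothetical and chosen only for illustration.

```prolog
% Extra argument: Num is shared by the subject and the verb, so
% information found in one part of the sentence spreads to another.
s(s(NP, VP)) --> np(Num, NP), vp(Num, VP).
np(Num, np(N)) --> n(Num, N).
vp(Num, vp(V)) --> v(Num, V).

n(sg, nn(dog))   --> [dog].
n(pl, nns(dogs)) --> [dogs].
n(_,  nn(sheep)) --> [sheep].   % number left unbound by the word itself

v(sg, vbz(barks)) --> [barks].
v(pl, vbp(bark))  --> [bark].

% Goals inside {} are ordinary Prolog calls; they consume no input.
% Here they report whether the noun fixed the number.
np_status(known(Num)) --> n(Num, _), { nonvar(Num) }.
np_status(unknown)    --> n(Num, _), { var(Num) }.
```

For example, ?- phrase(s(T), [dogs, bark]). succeeds while ?- phrase(s(T), [dogs, barks]). fails, because the shared Num cannot be both pl and sg. ?- phrase(np_status(S), [sheep]). gives S = unknown, and with [dogs] it gives S = known(pl). Note that variables must start with an uppercase letter: nonvar(v) with a lowercase v tests an atom and is always true.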

Homeworks 11 and 12
  Homework 11 due next Monday
  Homework 12 due next Wednesday
  Submit two files with each homework:
    1. PDF writeup
    2. Your .pl file (code)