Computational Linguistics

Similar documents
Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm

CS 598 Natural Language Processing

Grammars & Parsing, Part 1:

Basic Syntax. Doug Arnold We review some basic grammatical ideas and terminology, and look at some common constructions in English.

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class

Inleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3

Parsing of part-of-speech tagged Assamese Texts

Chapter 4: Valence & Agreement CSLI Publications

Proof Theory for Syntacticians

Natural Language Processing. George Konidaris

Context Free Grammars. Many slides from Michael Collins

Introduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions.

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

BULATS A2 WORDLIST 2

Argument structure and theta roles

Informatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy

Developing a TT-MCTAG for German with an RCG-based Parser

Constraining X-Bar: Theta Theory

Construction Grammar. University of Jena.

Words come in categories

Some Principles of Automated Natural Language Information Extraction

Adjectives tell you more about a noun (for example: the red dress ).

Ch VI- SENTENCE PATTERNS.

Prediction of Maximal Projection for Semantic Role Labeling

Universal Grammar 2. Universal Grammar 1. Forms and functions 1. Universal Grammar 3. Conceptual and surface structure of complex clauses

AQUA: An Ontology-Driven Question Answering System

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight.

Pseudo-Passives as Adjectival Passives

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections

Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque

LTAG-spinal and the Treebank

Sight Word Assessment

UNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen

Compositional Semantics

Character Stream Parsing of Mixed-lingual Text

Control and Boundedness

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many

Analysis of Probabilistic Parsing in NLP

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Intension, Attitude, and Tense Annotation in a High-Fidelity Semantic Representation

Dear Teacher: Welcome to Reading Rods! Reading Rods offer many outstanding features! Read on to discover how to put Reading Rods to work today!

"f TOPIC =T COMP COMP... OBJ

Case government vs Case agreement: modelling Modern Greek case attraction phenomena in LFG

Ensemble Technique Utilization for Indonesian Dependency Parser

Chapter 9 Banked gap-filling

The Interface between Phrasal and Functional Constraints

Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures

EAGLE: an Error-Annotated Corpus of Beginning Learner German

Hindi-Urdu Phrase Structure Annotation

SAMPLE. Chapter 1: Background. A. Basic Introduction. B. Why It s Important to Teach/Learn Grammar in the First Place

Language acquisition: acquiring some aspects of syntax.

The presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing.

RANKING AND UNRANKING LEFT SZILARD LANGUAGES. Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A ER E P S I M S

Formulaic Language and Fluency: ESL Teaching Applications

Loughton School s curriculum evening. 28 th February 2017

Parsing natural language

Underlying and Surface Grammatical Relations in Greek consider

BASIC ENGLISH. Book GRAMMAR

An Interactive Intelligent Language Tutor Over The Internet

THE INTERNATIONAL JOURNAL OF HUMANITIES & SOCIAL STUDIES

A Version Space Approach to Learning Context-free Grammars

Advanced Grammar in Use

A Computational Evaluation of Case-Assignment Algorithms

Using dialogue context to improve parsing performance in dialogue systems

Dependency, licensing and the nature of grammatical relations *

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics

Accurate Unlexicalized Parsing for Modern Hebrew

Som and Optimality Theory

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

Interfacing Phonology with LFG

Reading Grammar Section and Lesson Writing Chapter and Lesson Identify a purpose for reading W1-LO; W2- LO; W3- LO; W4- LO; W5-

Concept Acquisition Without Representation William Dylan Sabo

Hindi Aspectual Verb Complexes

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Language and Computers. Writers Aids. Introduction. Non-word error detection. Dictionaries. N-gram analysis. Isolated-word error correction

Language Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus

The Discourse Anaphoric Properties of Connectives

Developing Grammar in Context

Writing a composition

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS

Specifying Logic Programs in Controlled Natural Language

Chapter 3: Semi-lexical categories. nor truly functional. As Corver and van Riemsdijk rightly point out, There is more

Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la

Emmaus Lutheran School English Language Arts Curriculum

COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR

A Usage-Based Approach to Recursion in Sentence Processing

LING 329 : MORPHOLOGY

Word Stress and Intonation: Introduction

Adapting Stochastic Output for Rule-Based Semantics

Campus Academic Resource Program An Object of a Preposition: A Prepositional Phrase: noun adjective

cambridge occasional papers in linguistics Volume 8, Article 3: 41 55, 2015 ISSN

Procedia - Social and Behavioral Sciences 154 ( 2014 )

A relational approach to translation

Objectives. Chapter 2: The Representation of Knowledge. Expert Systems: Principles and Programming, Fourth Edition

Update on Soar-based language processing

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

Achim Stein: Diachronic Corpora Aston Corpus Summer School 2011

Transcription:

Computational Linguistics CSC 2501 / 485 Fall 2018 2 2. Introduction to syntax and parsing Gerald Penn Department of Computer Science, University of Toronto Reading: Jurafsky & Martin: 5.0 1, 12.0 12.3.3, 12.3.7, [13.1 2]. Bird et al: 8.0 4. Copyright 2017 Suzanne Stevenson, Graeme Hirst and Gerald Penn. All rights reserved.

Syntactic structure 1 Syntax: The combinatorial structure of words. How words can be linearly organized: left/right precedence, and contiguity. How words can be hierarchically organized into phrases and sentences. 2

Syntactic structure 2 The cat hunted the squirrel living in the tree with persistence. [ [The cat] [hunted [the squirrel [living [in [the tree] ] ] ] [with [persistence] ] ] ] 3

Syntactic structure 2 The cat hunted the squirrel living in the tree with persistence. The cat hunted the squirrel living in with persistence the tree 4

Syntactic structure 2 The cat hunted the squirrel living in the tree with persistence. NP S VP DET N V NP PP The cat hunted NP S P NP DET N V PP with the squirrel living P NP N in DET N persistence the tree 5

Syntactic structure 3 Goal: meaning, interpretation, semantics. So why do we care about syntax? 6

Grammars and parsing Grammar: Formal specification of allowable structures. Knowledge Representation Parsing: Analysis of string of words to determine the structure assigned by grammar. Algorithm Process 7

Using grammar to capture structure Main issues: Which words are grouped together into phrases. How words within a phrase project the properties of a single, common word (the head of the phrase). How different phrases relate to each other. Grammar encodes these relations. Some grammars interpret these relations with respect to meaning. 8

Good and bad grammars There are many possible grammars for any natural language. Some are better than others. Desiderata: Faithfulness to (vastly divergent) details about language. Economy of description. Fidelity to some prevailing linguistic intuition. Efficiency of parsing. 9

Elements of grammar Primitives: lexical categories or parts of speech. Each word-type is a member of one or more. Each word-token is an instance of exactly one. e.g. The cat in the hat sat. Categories are open or closed to new words. X Eight main categories, many subcategories. Nine X Seven X Twenty-three 10

Elements of grammar Primitives: lexical categories or parts of speech. Each word-type is a member of one or more. Each word-token is an instance of exactly one. Categories are open or closed to new words. X Eight main categories, many subcategories. Nine X Seven X Twenty-three The categories might possibly be languagespecific as well. 11

Parts of speech 1 Nouns: denote an object, a concept, a place, Count nouns: dog, spleen, Band-Aid, Mass nouns: water, wheat, Proper nouns: Fred, New York City, Pronouns: he, she, you, I, they, Adjectives: denote an attribute of the denotation of a noun. Intersective: pink, furry, Measure: big, Intensional: former, alleged, 12

Parts of speech 2 Determiners, articles: specify certain attributes of the denotation of a noun that are grammatically relevant. the, a, some, Verbs: predicates, denote an action or a state. Numerous distinctions, e.g. transitivity: Intransitive: sleep, die, Transitive: eat, kiss, Ditransitive: give, sell, Copula: be, feel, become, 13

Parts of speech 3 Adverbs: denote an attribute of the denotation of a predicate. Time and place: today, there, now, Manner: happily, furtively, Degree: much, very, Prepositions: relate two phrases with a location, direction, manner, etc. up, at, with, in front of, before, 14

Parts of speech 4 Conjunctions: combine two clauses or phrases: Coordinating conjunctions: and, or, but Subordinating conjunctions: because, while, Interjections: stand-alone emotive expressions: um, wow, oh dear, balderdash, crikey, 15

Elements of grammar Combinations: Phrase: a hierarchical grouping of words and/or phrases. Clause: a phrase consisting of a verb and (almost) all of its dependents. Sentence: a clause that is syntactically independent of other clauses. Can be represented by tree (or a labelled bracketing). Terminology: A constituent is a well-formed phrase with overtones of semantic and/or psychological significance. 16

Types of phrase 1 Noun phrase (NP): a mouse mice Mickey the handsome marmot the handsome marmot on the roof the handsome marmot whom I adore Verb phrase (VP): laughed loudly quickly gave the book to Mary 17

Types of phrase 2 Adjective phrase (AP): green proud of Kyle very happy that you went Prepositional phrase (PP): in the sink without feathers astride the donkey 18

Clauses and sentences 1 Clauses: Ross remarked upon Nadia s dexterity to become a millionaire by the age of 30 that her mother had lent her for the banquet Sentences: Ross remarked upon Nadia s dexterity. Nathan wants to become a millionaire by the age of 30. Nadia rode the donkey that her mother had lent her for the banquet. The handsome marmot on the roof [in dialogue]. 19

Clauses and sentences 2 Clauses may act as noun phrases: To become a millionaire by the age of 30 is what Ross wants. Nadia riding her donkey is a spectacular sight. Ross discovered that Nadia had been feeding his truffles to the donkey. 20

The structure of an idealized phrase XP ZP X YP XP subject or pre-modifier head word ZP X xxxx YP object, complement or post-modifier, adjunct 21

Example phrases AP VP ADV A S ADV V PP PP very happy that you went quickly go to the store with Maya S NP AUX VP Kim will go 22

Formal definition of a CFG A context-free grammar is a quadruple G = (Vt,Vn, P, S), where Vt is a finite set of terminal symbols. Vn is a finite set of non-terminal symbols. P is a finite set of production rules of the form A α where A Vn and α is a sequence of symbols in (Vn Vt)*. S Vn is the start symbol. 23

A very simple grammar S = S, P = { S NP NP VP Det N NP NP VP Det Adj N NP PP V Vt and Vn can be inferred from the production rules. Lexical categories: NT s that rewrite as a single T. VP PP Det Adj N V V NP P NP the a an old red happy The lexicon: In practice, a separate data structure dog park statue contumely run saw ate run disdained P in to on under with } 24

Terminology Non-terminal (NT): A symbol that occurs on the left-hand side (LHS) of some rule. Pre-terminal: a kind of non-terminal located on the LHS of a lexical entry. Terminal (T): A symbol that never occurs on the LHS of a rule. Start symbol: A specially designated NT that must be the root of any tree derived from the grammar. In our grammars, it is usually S for sentence. 25

Parsing 1 Parsing: Determining the structure of a sequence of words, given a grammar. Which grammar rules should be used? To which symbols (words / terminals and nodes / non-terminals) should each rule apply? 27

Parsing 2 Input: A context-free grammar. A sequence of words Time flies like an arrow or, more precisely, of sets of parts of speech. {noun,verb} {noun,verb} {verb,prep} {det} {noun} Process: Working from left to right, guess how each word fits in. 28

Parsing 3 If a guess leads to failure (parse is stymied), back up to a choice point and try a different guess. Backtracking, non-determinism. At each guess, must save state of parse on a stack. (Or, explore in parallel.) Want to guess right: Order of preference for rules. 29

Top-down parsing 1 Top-down or rule-directed parsing: Can I take these rules and match them to this input? Initial goal is an S. Repeatedly look for rules that decompose /expand current goals and give new goals. E.g., goal of S may decompose to goals NP and VP. Eventually get to goals that look at input. E.g., goal of NP may decompose to det noun. Succeed iff entire input stream is accounted for as S. 31

Top-down parsing 2 Example: A recursive descent parser. >>> nltk.app.rdparser() Operations on leftmost frontier node: Expand it. Match it to the next input word. 32

33

Top-down parsing 3 Choice of next operation (in NLTK demo): If it s a terminal, try matching it to input. If it s a non-terminal, try expanding with first-listed untried rule for that non-terminal. 34

Bottom-up parsing 1 Bottom-up or data-directed parsing: Can I take this input and match it to these rules? Try to find rules that match a possible PoS of the input words and then rules that match the constituents so formed. Succeed iff the entire input is eventually matched to an S. 35

Bottom-up parsing 2 Example: A shift reduce parser. >>> nltk.app.srparser() Operations: Shift next input word onto stack. Match the top n elements of stack to RHS of rule, reduce them to LHS. 36

37

Bottom-up parsing 3 Choice of next operation (in NLTK demo): Always prefer reduction to shifting. Choose the first-listed reduction that applies. Choice of next operation (in real life): Always prefer reduction to shifting for words, but not necessarily for larger constituents. 38

Problems Neither top-down nor bottom-up search exploits useful idiosyncrasies that CFG rules, alone or together, often have. Problems: Recomputation of constituents. Recomputation of common prefixes. Solution: Keep track of: Completed constituents. Partial matches of rules. 39