Artificial Intelligence 2004: Natural Language Processing - Syntax and Parsing


Course 74.419

Language, Syntax, Parsing

Natural Language - General
"Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs." [Russell & Norvig, p. 651]
(Natural) language is characterized by a sign system:
- a common or shared set of signs
- a systematic procedure to produce combinations of signs
- a shared meaning of signs and combinations of signs

Natural Language Processing
Areas in Natural Language Processing:
- Morphology (word stem + ending)
- Syntax, Grammar & Parsing (syntactic description & analysis)
- Semantics & Pragmatics (meaning; constructive; context-dependent; references; ambiguity)
- Intentions; Pragmatic Theory of Language (Communication as Action)
- Discourse / Dialogue / Text
- Spoken Language Understanding
- Language Learning

Natural Language - Parsing
Natural language is syntactically described by a formal language, usually a (context-free) grammar:
- the start symbol S = sentence
- non-terminals = syntactic constituents
- terminals = lexical entries / words
- rules = grammar rules
Parsing:
- derive the syntactic structure of a sentence based on a language model (grammar)
- construct a parse tree, i.e. the derivation of the sentence based on the grammar (rewrite system)
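To make "parse tree = derivation in a rewrite system" concrete, here is a minimal sketch assuming a hypothetical toy grammar (the rule names r1 to r7 and the function `leftmost_derive` are illustrative, not from the slides). Each step rewrites the leftmost non-terminal with the right-hand side of the chosen rule until only words remain.

```python
# Hypothetical toy grammar: rule name -> (left-hand side, right-hand side).
RULES = {
    "r1": ("S", ["VP"]),
    "r2": ("VP", ["Verb", "NP"]),
    "r3": ("NP", ["Det", "Nominal"]),
    "r4": ("Nominal", ["Noun"]),
    "r5": ("Verb", ["book"]),
    "r6": ("Det", ["this"]),
    "r7": ("Noun", ["flight"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}

def leftmost_derive(start, rule_names):
    """Apply each rule to the leftmost non-terminal; return all sentential forms."""
    form = [start]
    steps = [list(form)]
    for name in rule_names:
        lhs, rhs = RULES[name]
        # Position of the leftmost non-terminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"rule {name} rewrites {lhs}, but leftmost symbol is {form[i]}")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(list(form))
    return steps

steps = leftmost_derive("S", ["r1", "r2", "r5", "r3", "r6", "r4", "r7"])
for form in steps:
    print(" ".join(form))
```

The sequence of rules applied is exactly the information a parse tree records; the tree just drops the order in which independent subtrees were expanded.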

Sample Grammar
Grammar (S, NT, T, P):
- sentence symbol S ∈ NT
- parts of speech ⊂ NT
- syntactic constituents ⊂ NT
- grammar rules P ⊆ NT × (NT ∪ T)*

S → NP VP (statement)
S → Aux NP VP (question)
S → VP (command)
NP → Det Nominal | Proper-Noun
Nominal → Noun | Noun Nominal | Nominal PP
VP → Verb | Verb NP | Verb PP | Verb NP PP
PP → Prep NP
Det → that | this | a
Noun → book | flight | meal | money
Proper-Noun → Houston | American Airlines | TWA
Verb → book | include | prefer
Aux → does
Prep → from | to | on

Task: Parse "Does this flight include a meal?"

Sample Parse Tree
(S (Aux does)
   (NP (Det this) (Nominal (Noun flight)))
   (VP (Verb include)
       (NP (Det a) (Nominal (Noun meal)))))
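The sample grammar and parse tree can be written down as data. This is a minimal sketch assuming a nested-tuple tree encoding (label first, then children) and lowercase input words; `licensed` and `words` are hypothetical helper names. It checks that every node is allowed by a rule or lexical entry and that the leaves spell the sentence.

```python
# The sample grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "PP"], ["Verb", "NP", "PP"]],
    "PP": [["Prep", "NP"]],
}
# Lexicon: part of speech -> words.
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Proper-Noun": {"Houston", "American Airlines", "TWA"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
}

# Parse tree of "Does this flight include a meal?": (label, child, child, ...).
TREE = ("S",
        ("Aux", "does"),
        ("NP", ("Det", "this"), ("Nominal", ("Noun", "flight"))),
        ("VP", ("Verb", "include"),
               ("NP", ("Det", "a"), ("Nominal", ("Noun", "meal")))))

def licensed(tree):
    """True if every node is allowed by a grammar rule or a lexicon entry."""
    label, *children = tree
    if label in LEXICON:
        # Part-of-speech node: must dominate exactly one word of that category.
        return len(children) == 1 and children[0] in LEXICON[label]
    rhs = [child[0] for child in children]
    return rhs in GRAMMAR.get(label, []) and all(licensed(c) for c in children)

def words(tree):
    """Read the leaves from left to right (the sentence the tree derives)."""
    label, *children = tree
    if label in LEXICON:
        return [children[0]]
    return [w for child in children for w in words(child)]

print(licensed(TREE), " ".join(words(TREE)))
```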

Bottom-up and Top-down Parsing
- Bottom-up parsing: from the word nodes up to the sentence symbol.
- Top-down parsing: from the sentence symbol down to the words.
Both directions build the same parse tree, e.g. the sample parse tree for "Does this flight include a meal?".

Problems with Bottom-up and Top-down Parsing
- Left-recursive rules like NP → NP PP: the parser does not know how many times the recursion is needed; a naive top-down parser keeps expanding NP and never terminates.
- Pure bottom-up or top-down parsing is inefficient because it generates and explores too many structures that in the end turn out to be invalid (several grammar rules are applicable: interim ambiguity).
- Combine the top-down and bottom-up approaches: start with the sentence symbol; use rules top-down (look-ahead); read the input; try to find the shortest path from the input to the highest unparsed constituent (from left to right).
This leads to chart parsing and the Earley parser.
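A pure top-down parser can be sketched as recursive descent with backtracking. This minimal sketch assumes a hypothetical small grammar without left-recursive rules; `parse`, `recognize` and `left_recursive` are illustrative names. With a rule such as NP → NP PP, `parse("NP", ...)` would call itself at the same position forever (a RecursionError in Python), which is the left-recursion problem above.

```python
# Hypothetical grammar WITHOUT left recursion (NP -> NP PP is deliberately absent).
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Noun"], ["Det", "Noun", "PP"]],
    "VP": [["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {"Det": {"the", "a"}, "Noun": {"flight", "meal", "book"},
           "Verb": {"book", "include"}, "Prep": {"on", "to"}}

def parse(symbol, words, i):
    """Yield every position j such that `symbol` can cover words[i:j]."""
    if symbol in LEXICON:
        # Part of speech: compare with the next input word.
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Expand top-down, trying each rule in turn (backtracking).
        yield from parse_seq(rhs, words, i)

def parse_seq(symbols, words, i):
    """Yield end positions for a sequence of symbols starting at position i."""
    if not symbols:
        yield i
        return
    for j in parse(symbols[0], words, i):
        yield from parse_seq(symbols[1:], words, j)

def recognize(sentence):
    words = sentence.lower().split()
    return any(j == len(words) for j in parse("S", words, 0))

def left_recursive(grammar):
    """List directly left-recursive rules A -> A ..., which break this parser."""
    return [(lhs, rhs) for lhs, rhss in grammar.items()
            for rhs in rhss if rhs[0] == lhs]

print(recognize("book a flight"), recognize("flight the book"))
```

Note that the same sub-phrase (e.g. an NP) may be re-parsed once per alternative rule that reaches it; this wasted work is what a chart avoids.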

Problems in Parsing - Ambiguity
"One morning, I shot an elephant in my pajamas. How he got into my pajamas, I don't know." (Groucho Marx)
- Syntactic/structural ambiguity: several parse trees are possible, e.g. the sentence above.
- Semantic/lexical ambiguity: several word meanings, e.g. bank (where you get money) and (river) bank.
- Even different word categories are possible (interim ambiguity), e.g. "He books the flight." vs. "The books are here.", or "Fruit flies from the balcony." vs. "Fruit flies are on the balcony."

Problems in Parsing - Attachment
Attachment, in particular PP (prepositional phrase) attachment, is often referred to as the binding problem.
"One morning, I shot an elephant in my pajamas."
PP attached to the verb phrase (rule VP → Verb NP PP):
(S ... (NP (PNoun I)) (VP (Verb shot) (NP (Det an) (Nominal (Noun elephant))) (PP in my pajamas)) ...)
PP attached to the noun (rules VP → Verb NP, NP → Det Nominal, Nominal → Nominal PP and Nominal → Noun):
(S ... (NP (PNoun I)) (VP (Verb shot) (NP (Det an) (Nominal (Nominal (Noun elephant)) (PP in my pajamas)))) ...)
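The two attachments can be counted mechanically. This minimal sketch assumes a hypothetical mini-grammar containing both attachment rules (VP → Verb NP PP and Nominal → Nominal PP); `count_parses` is an illustrative name. It counts the parse trees for a span by trying every way to split it among a rule's right-hand side, memoizing results per (symbol, start, end). This reuse of sub-results is the idea a chart builds on. It assumes no empty rules, so every constituent covers at least one word and the left-recursive Nominal rule always recurses on a shorter span.

```python
from functools import lru_cache

# Hypothetical mini-grammar for the Groucho Marx sentence.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Pronoun",), ("Det", "Nominal")],
    "Nominal": [("Noun",), ("Nominal", "PP")],
    "VP": [("Verb", "NP"), ("Verb", "NP", "PP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {"Pronoun": {"i"}, "Verb": {"shot"}, "Det": {"an", "my"},
           "Noun": {"elephant", "pajamas"}, "Prep": {"in"}}

def count_parses(sentence, start="S"):
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        """Number of distinct subtrees in which `symbol` covers words[i:j]."""
        if symbol in LEXICON:
            return 1 if j == i + 1 and words[i] in LEXICON[symbol] else 0
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(symbols, i, j):
        """Ways to split words[i:j] among `symbols`, each covering >= 1 word."""
        if not symbols:
            return 1 if i == j else 0
        first, rest = symbols[0], symbols[1:]
        return sum(count(first, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(start, 0, len(words))

print(count_parses("I shot an elephant in my pajamas"))  # the two attachments
```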

Chart Parsing / Earley Algorithm
The Earley parser is based on chart parsing.
Essence: integrate top-down and bottom-up parsing; keep recognized sub-structures (sub-trees) for shared use during parsing.
- Top-down: start with the S symbol. Generate all applicable rules for S. Go further down with the leftmost constituent in the rules and add rules for these constituents until you encounter a leftmost node on the RHS which is a word category (POS).
- Bottom-up: read the input word and compare. If the word matches, mark it as recognized and move parsing on to the next category in the rule(s).

Chart
- Sequence of n input words; n+1 nodes marked 0 to n.
- Arcs indicate the recognized part of the RHS of a rule.
- The dot • indicates the recognized constituents in rules: everything left of • has been recognized.
(Jurafsky & Martin, Figure 10.15, p. 380)
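The top-down phase described above can be sketched on the sample grammar. This is a minimal sketch assuming the grammar is stored as a dictionary and the POS categories are listed separately; `predict_closure` is a hypothetical name. Starting from S, it keeps adding the rules of each leftmost constituent and stops at word categories, which are left for the bottom-up comparison with the input.

```python
from collections import deque

# The sample grammar; part-of-speech categories (POS) have no rules here.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "PP"], ["Verb", "NP", "PP"]],
    "PP": [["Prep", "NP"]],
}
POS = {"Det", "Noun", "Proper-Noun", "Verb", "Aux", "Prep"}

def predict_closure(start="S"):
    """Collect every rule reachable through leftmost constituents,
    stopping at symbols that are word categories (POS)."""
    predicted, seen = [], set()
    agenda = deque([start])
    while agenda:
        symbol = agenda.popleft()
        if symbol in seen or symbol in POS:
            continue  # POS: wait for the bottom-up comparison with the input
        seen.add(symbol)
        for rhs in GRAMMAR[symbol]:
            predicted.append((symbol, rhs))
            agenda.append(rhs[0])  # go further down with the leftmost constituent
    return predicted

for lhs, rhs in predict_closure():
    print(lhs, "->", " ".join(rhs))
```

Nominal and PP rules are not predicted at the start, because neither is a leftmost constituent reachable from S; they are only predicted once the parser has moved past a Det or a Verb.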

Chart Parsing / Earley Parser 1
Chart: sequence of n input words; n+1 nodes marked 0 to n. States in the chart represent possible rules and recognized constituents, with arcs.
Interim state S → • VP, [0,0]
- top-down look at rule S → VP
- nothing of the RHS of the rule has been recognized yet (• is far left)
- arc at the beginning, no coverage (covers no input word; the arc begins at node 0 and ends at node 0)

Chart Parsing / Earley Parser 2
Interim states
NP → Det • Nominal, [1,2]
- top-down look with rule NP → Det Nominal
- Det recognized (• after Det)
- the arc covers one input word, which lies between node 1 and node 2
- look next for Nominal
NP → Det Nominal •, [1,3]
- Nominal was recognized: move • after Nominal
- move the end of the arc to cover Nominal (change 2 to 3)
- the structure is completely recognized; the arc is inactive; mark NP as recognized in other rules (move • over NP)
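A state in the chart can be represented as a dotted rule plus the span of its arc. This minimal sketch assumes a frozen dataclass with the dot stored as an index into the right-hand side; the class name `State` and its methods are hypothetical. It reproduces the three interim states from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """Dotted rule lhs -> rhs, plus the span [start, end] covered by its arc."""
    lhs: str
    rhs: tuple
    dot: int    # number of RHS symbols recognized so far (position of the dot)
    start: int  # node where the arc begins
    end: int    # node where the arc currently ends

    def next_symbol(self):
        """Constituent right of the dot, or None if the rule is fully recognized."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        return self.dot == len(self.rhs)

    def advance(self, new_end):
        """Move the dot over the next constituent and stretch the arc to new_end."""
        return State(self.lhs, self.rhs, self.dot + 1, self.start, new_end)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"

s = State("S", ("VP",), 0, 0, 0)               # S → • VP, [0,0]
np_ = State("NP", ("Det", "Nominal"), 1, 1, 2)  # NP → Det • Nominal, [1,2]
done = np_.advance(3)                           # NP → Det Nominal •, [1,3]
```

Making the class frozen keeps states hashable and comparable, so a chart can avoid adding the same state twice.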

Chart Example: "Book this flight" (V Det Noun; nodes 0 to 3)

Chart - 0
S → • VP, [0,0]
VP → • V NP, [0,0]

Chart - 1 (Book = V)
S → • VP, [0,0]
VP → • V NP, [0,0]
VP → V • NP, [0,1]
NP → • Det Nom, [1,1]

Chart - 2 (this = Det)
S → • VP, [0,0]
VP → V • NP, [0,1]
NP → Det • Nom, [1,2]
Nom → • Noun, [2,2]

Chart - 3a (flight = Noun)
S → • VP, [0,0]
VP → V • NP, [0,1]
NP → Det • Nom, [1,2]
Nom → Noun •, [2,3]

Chart - 3b
S → • VP, [0,0]
VP → V • NP, [0,1]
NP → Det Nom •, [1,3]
Nom → Noun •, [2,3]

Chart - 3c
S → • VP, [0,0]
VP → V NP •, [0,3]
NP → Det Nom •, [1,3]
Nom → Noun •, [2,3]

Chart - 3d
S → VP •, [0,3]
VP → V NP •, [0,3]
NP → Det Nom •, [1,3]
Nom → Noun •, [2,3]

Chart - All States
S → • VP, [0,0]
VP → • V NP, [0,0]
VP → V • NP, [0,1]
NP → • Det Nom, [1,1]
NP → Det • Nom, [1,2]
Nom → • Noun, [2,2]
Nom → Noun •, [2,3]
NP → Det Nom •, [1,3]
VP → V NP •, [0,3]
S → VP •, [0,3]

Chart - Final States
S → VP •, [0,3]
VP → V NP •, [0,3]
NP → Det Nom •, [1,3]
Nom → Noun •, [2,3]

Chart - 0 with two S-Rules
S → • VP, [0,0]
S → • VP NP, [0,0]
VP → • V NP, [0,0]

Chart - 3 with two S-Rules
S → • VP, [0,0]
S → • VP NP, [0,0]
VP → V NP •, [0,3]
NP → Det Nom •, [1,3]
Nom → Noun •, [2,3]

Final Chart - with two S-Rules
S → VP •, [0,3]
S → VP • NP, [0,3]
VP → V NP •, [0,3]
NP → Det Nom •, [1,3]
Nom → Noun •, [2,3]
(S → VP • NP remains active: it would need an NP after node 3.)
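The step from Chart 3a to Chart 3d is a cascade of completions. This minimal sketch assumes states stored as plain tuples (lhs, rhs, dot, start, end) in one flat list, starting from the relevant states of Chart 3a with both S-rules; `complete` and `accepted` are hypothetical names, and a real Earley parser keeps one state set per node instead of a flat list. Completing Nom → Noun • advances NP, which advances VP, which advances both S-rules.

```python
# States: (lhs, rhs, dot, start, end); input "Book this flight" = V Det Noun.
chart = [
    ("S", ("VP",), 0, 0, 0),
    ("S", ("VP", "NP"), 0, 0, 0),     # second S-rule
    ("VP", ("V", "NP"), 1, 0, 1),
    ("NP", ("Det", "Nom"), 1, 1, 2),
    ("Nom", ("Noun",), 1, 2, 3),      # just completed by the scanner
]

def complete(chart, finished):
    """Move the dot over finished[0] in every state that waits for it at the
    node where `finished` starts; repeat for states that become complete."""
    agenda = [finished]
    while agenda:
        lhs, _, _, start, end = agenda.pop()
        for (l2, rhs2, dot2, s2, e2) in list(chart):
            if dot2 < len(rhs2) and rhs2[dot2] == lhs and e2 == start:
                new = (l2, rhs2, dot2 + 1, s2, end)
                if new not in chart:
                    chart.append(new)
                    if new[2] == len(rhs2):  # newly completed: cascade further
                        agenda.append(new)
    return chart

def accepted(chart, n):
    """Recognized if some S-rule is completely recognized over [0, n]."""
    return any(l == "S" and d == len(r) and s == 0 and e == n
               for l, r, d, s, e in chart)

complete(chart, chart[-1])
print(accepted(chart, 3))
```

After the cascade, the list holds both S → VP • [0,3] and the still active S → VP • NP [0,3], as in the final chart with two S-rules.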

Earley Algorithm - Functions
- Predictor: for a partly recognized RHS, generates new rules for the constituent right of • (top-down generation).
- Scanner: if a word category (POS) is found right of •, the scanner reads the next input word and adds a rule for it to the chart (bottom-up mode).
- Completer: if a rule is completely recognized (• is far right), the recognition state of earlier rules in the chart advances: • is moved over the recognized constituent (bottom-up recognition).
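The three functions can be combined into a recognizer. This is a minimal sketch, not a transcription of Jurafsky & Martin's Figure 10.16. It assumes the sample grammar (lowercased, with "American Airlines" left out because input is split on whitespace), no empty rules, a dummy start state named GAMMA (a hypothetical name), and recognition only: no parse trees are built. chart[k] holds the states whose arc ends at node k.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """Dotted rule lhs -> rhs with the dot before rhs[dot]; arc starts at `start`."""
    lhs: str
    rhs: tuple
    dot: int
    start: int

    def next_symbol(self):
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("Proper-Noun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal"), ("Nominal", "PP")],
    "VP": [("Verb",), ("Verb", "NP"), ("Verb", "PP"), ("Verb", "NP", "PP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {"Det": {"that", "this", "a"}, "Noun": {"book", "flight", "meal", "money"},
           "Verb": {"book", "include", "prefer"}, "Aux": {"does"},
           "Prep": {"from", "to", "on"}, "Proper-Noun": {"houston", "twa"}}

def earley_recognize(words, start="S"):
    """Return True if `words` can be derived from `start`."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # chart[k]: states whose arc ends at node k

    def add(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    add(State("GAMMA", (start,), 0, 0), 0)  # dummy start state GAMMA -> . S
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow while it is processed
            state = chart[k][i]
            i += 1
            sym = state.next_symbol()
            if sym is None:
                # COMPLETER: advance earlier states waiting for state.lhs
                for waiting in list(chart[state.start]):
                    if waiting.next_symbol() == state.lhs:
                        add(State(waiting.lhs, waiting.rhs, waiting.dot + 1,
                                  waiting.start), k)
            elif sym in LEXICON:
                # SCANNER: POS right of the dot; compare with the next input word
                if k < n and words[k] in LEXICON[sym]:
                    add(State(sym, (words[k],), 1, k), k + 1)
            else:
                # PREDICTOR: add all rules for the constituent right of the dot
                for rhs in GRAMMAR[sym]:
                    add(State(sym, rhs, 0, k), k)
    return State("GAMMA", (start,), 1, 0) in chart[n]

print(earley_recognize("does this flight include a meal".split()))
```

The left-recursive rule Nominal → Nominal PP causes no loop here: the predictor adds each rule at most once per node, because `add` skips states already in the chart.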

Additional References
- Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000. (Chapters 9 and 10)
- Earley Algorithm: Jurafsky & Martin, Figure 10.16, p. 384
- Earley Algorithm - Examples: Jurafsky & Martin, Figures 10.17 and 10.18