Intelligent Systems (AI-2)

Intelligent Systems (AI-2). Computer Science CPSC 422, Lecture 25, Nov 8, 2017. CPSC 422, Lecture 25 Slide 1

NLP: Knowledge-Formalisms Map (including probabilistic formalisms)
Linguistic knowledge: Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue
Formalisms:
- State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
- Neural Models, Neural Sequence Modeling
- Rule systems (and prob. versions), e.g., (Prob.) Context-Free Grammars
- Logical formalisms: First-Order Logics, Prob. Logics
- AI planners: MDP (Markov Decision Processes)
- Machine Learning
CPSC 422, Lecture 25 2

NLP Practical Goal for FOL: the ultimate Web question-answering system?
Map NL queries and the Web into FOL so that answers can be effectively computed.
- What African countries are not on the Mediterranean Sea?
  $\exists c\; \mathrm{Country}(c) \land \lnot \mathrm{Borders}(c, \mathrm{Med.Sea}) \land \mathrm{In}(c, \mathrm{Africa})$
- Was 2007 the first El Nino year after 2001?
  $\mathrm{ElNino}(2007) \land \lnot \exists y\; (\mathrm{Year}(y) \land \mathrm{After}(y, 2001) \land \mathrm{Before}(y, 2007) \land \mathrm{ElNino}(y))$
CPSC 422, Lecture 25 3
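
As a rough illustration of "answers can be effectively computed", the first query can be evaluated against a tiny knowledge base. This is only a sketch: the sets `COUNTRIES`, `IN_AFRICA` and `BORDERS_MED` are hypothetical names holding a small hand-picked subset of geography, not anything from the lecture.

```python
# Hypothetical toy knowledge base: a hand-picked subset of real geography.
COUNTRIES = {"Egypt", "Libya", "Kenya", "Nigeria", "Italy"}
IN_AFRICA = {"Egypt", "Libya", "Kenya", "Nigeria"}
BORDERS_MED = {"Egypt", "Libya", "Italy"}


def african_not_on_med():
    """Return every c with Country(c), In(c, Africa) and not Borders(c, Med. Sea)."""
    return sorted(
        c for c in COUNTRIES
        if c in IN_AFRICA and c not in BORDERS_MED
    )


print(african_not_on_med())  # ['Kenya', 'Nigeria']
```

The hard part of the practical goal is mapping the natural-language question and the Web's content into such predicates; once that is done, answering reduces to checking the formula against the stored facts.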

Today Nov 8
- English Syntax
- Context Free Grammars
- Parsing
CPSC 422, Lecture 25 4

Syntax of Natural Languages
Def. The study of how sentences are formed by grouping and ordering words.
Part of speech: Noun, Verb. Frames: "It is so ___", "The ___ is"
Example:
  Ming and Sue prefer morning flights
  * Ming Sue flights morning and prefer
Groups behave as a single unit with respect to:
- Substitution: they, it, do so
- Movement: passive, question
- Coordination: ... and ...
CPSC 422, Lecture 25 5

Syntax: Useful tasks. Why should you care?
- Grammar checkers
- Basis for semantic interpretation
- Question answering
- Information extraction
- Summarization
- Discourse Parsing
- Machine translation
CPSC 422, Lecture 25 6

Key Constituents: Examples
Constituent             Pattern           Example
Noun phrases            (Det) N (PP)      the cat on the table
Verb phrases            (Qual) V (NP)     never eat a cat
Prepositional phrases   (Deg) P (NP)      almost in the net
Adjective phrases       (Deg) A (PP)      very happy about it
Sentences               (NP) (-) (VP)     a mouse -- ate it
CPSC 422, Lecture 25 8

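Context Free Grammar (Example)
S -> NP VP            (S is the start symbol)
NP -> Det NOMINAL
NOMINAL -> Noun
VP -> Verb
Det -> a
Noun -> flight
Verb -> left
Symbols on the left-hand side (S, NP, NOMINAL, VP, Det, Noun, Verb) are non-terminals; the words (a, flight, left) are terminals.
- Backbone of many models of syntax
- Parsing is tractable
CPSC 422, Lecture 25 9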
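
The example grammar above can be written down as a small data structure. The sketch below is one possible encoding, not the course's own code: the dictionary layout and the helper names `is_terminal` and `generate` are assumptions made for illustration.

```python
# Toy grammar from the slide: each non-terminal maps to a list of
# right-hand sides, and each right-hand side is a list of symbols.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}
START = "S"


def is_terminal(symbol):
    """A symbol is a terminal if no rule rewrites it."""
    return symbol not in GRAMMAR


def generate(symbol=START):
    """Expand `symbol` into words, always using the first rule listed."""
    if is_terminal(symbol):
        return [symbol]
    words = []
    for child in GRAMMAR[symbol][0]:
        words.extend(generate(child))
    return words


print(" ".join(generate()))  # a flight left
```

Because every non-terminal here has exactly one rule, this grammar generates a single sentence; adding alternative right-hand sides is what makes the language larger.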

CFG: a more complex example [Figure: grammar with example phrases, and lexicon] CPSC 422, Lecture 25 10

CFGs Define a Formal Language (un/grammatical sentences)
Generative Formalism:
- Generate strings in the language
- Reject strings not in the language
- Impose structures (trees) on strings in the language
CPSC 422, Lecture 25 11

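CFG: Formal Definitions
A CFG is a 4-tuple (non-terminals, terminals, productions, start symbol): $(N, \Sigma, P, S)$.
P is a set of rules $A \rightarrow \alpha$, with $A \in N$ and $\alpha \in (\Sigma \cup N)^*$.
A derivation is the process of rewriting $\alpha_1$ into $\alpha_m$ (both strings in $(\Sigma \cup N)^*$) by applying a sequence of rules: $\alpha_1 \Rightarrow^* \alpha_m$.
$L_G = \{ w \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \}$
CPSC 422, Lecture 25 12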
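
As a worked instance of a derivation, here is one way to rewrite the start symbol into the sentence "a flight left" using the seven-rule example grammar from slide 9. It is a leftmost derivation (the leftmost non-terminal is rewritten at each step); other rewriting orders give the same tree. The block assumes the amsmath package.

```latex
\begin{align*}
S &\Rightarrow \text{NP VP} \\
  &\Rightarrow \text{Det NOMINAL VP} \\
  &\Rightarrow \text{a NOMINAL VP} \\
  &\Rightarrow \text{a Noun VP} \\
  &\Rightarrow \text{a flight VP} \\
  &\Rightarrow \text{a flight Verb} \\
  &\Rightarrow \text{a flight left}
\end{align*}
```

Since the final string contains only terminals and was reached from the start symbol, it belongs to the language defined by the grammar.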

Derivations as Trees [Figure: parse tree; surviving node labels: Nominal, Nominal, flight] Context Free? CPSC 422, Lecture 25 13

Common Sentence-Types
- Declaratives: A plane left. S -> NP VP
- Imperatives: Leave! S -> VP
- Yes-No Questions: Did the plane leave? S -> Aux NP VP
- WH Questions: Which flights serve breakfast? S -> WH NP VP
- WH Questions: When did the plane leave? S -> WH Aux NP VP
CPSC 422, Lecture 25 14

Conjunctive Constructions
- S -> S and S: John went to NY and Mary followed him
- NP -> NP and NP: John went to NY and Boston
- VP -> VP and VP: John went to NY and visited MOMA
In fact, the right rule for English is X -> X and X.
CPSC 422, Lecture 25 16

CFG for NLP: summary CFGs cover most syntactic structure in English. Many practical computational grammars simply rely on CFG CPSC 422, Lecture 25 17

Today Nov 8 Context Free Grammars / English Syntax Parsing CPSC 422, Lecture 25 19

Parsing with CFGs [Diagram: a sequence of words ("I prefer a morning flight") and a CFG go into the Parser, which outputs valid parse trees] Assign valid trees: a valid tree covers all and only the elements of the input and has an S at the top. CPSC 422, Lecture 25 20

Parsing as Search
CFG:
S -> NP VP
S -> Aux NP VP
NP -> Det Noun
VP -> Verb
Det -> a
Noun -> flight
Verb -> left, arrive
Aux -> do, does
The CFG defines the search space of possible parse trees.
Parsing: find all trees that cover all and only the words in the input.
CPSC 422, Lecture 25 21

Constraints on Search [Diagram: a sequence of words ("I prefer a morning flight") and a CFG (search space) go into the Parser, which outputs valid parse trees]
Search Strategies:
- Top-down or goal-directed
- Bottom-up or data-directed
CPSC 422, Lecture 25 22

Context Free Grammar (Used in parsing Example) CPSC 422, Lecture 25 23

Top-Down Parsing. Since we're trying to find trees rooted with S (Sentences), start with the rules that rewrite S. Then work your way down from there to the words. CPSC 422, Lecture 25 24

Next step: Top Down Space [Figure: partial trees expanded top-down] When POS categories are reached, reject trees whose leaves fail to match all words in the input. CPSC 422, Lecture 25 25
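
The top-down idea can be sketched as a small depth-first recognizer over the grammar from the "Parsing as Search" slide. This is an illustrative sketch, not the lecture's algorithm verbatim: the names `GRAMMAR`, `LEXICON` and `top_down` are assumptions, and the code only answers yes/no instead of building trees. It relies on the grammar having no left-recursive rules; with left recursion this naive search would not terminate.

```python
# Phrase-structure rules and lexicon from the "Parsing as Search" slide.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"]],
}
LEXICON = {
    "Det": {"a"},
    "Noun": {"flight"},
    "Verb": {"left", "arrive"},
    "Aux": {"do", "does"},
}


def top_down(symbols, words, pos=0):
    """Can the symbol sequence derive exactly words[pos:]? Depth-first search."""
    if not symbols:
        # All goals expanded: succeed only if every input word was consumed.
        return pos == len(words)
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        # A POS category was reached: it must match the next input word.
        return (
            pos < len(words)
            and words[pos] in LEXICON[first]
            and top_down(rest, words, pos + 1)
        )
    # Try each rule for the leftmost non-terminal, in textual order.
    return any(top_down(rule + rest, words, pos) for rule in GRAMMAR[first])


print(top_down(["S"], "does a flight arrive".split()))  # True
print(top_down(["S"], "flight a left".split()))         # False
```

For "does a flight arrive" the search first tries S -> NP VP, which is rejected only when Det fails to match "does"; that wasted expansion is the kind of work the slides later address with bottom-up filtering.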

Bottom-Up Parsing. Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way. Then work your way up from there. CPSC 422, Lecture 25 26

Two more steps: Bottom-Up Space [Figure: partial trees built up from the input words] CPSC 422, Lecture 25 27
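
A bottom-up search can be sketched in the same spirit: start from the words and repeatedly replace any substring that matches a rule's right-hand side by its left-hand side, until only S remains. The code below is a minimal sketch under assumptions: `RULES` re-encodes the grammar from the "Parsing as Search" slide as (left-hand side, right-hand side) pairs, and `bottom_up` is a hypothetical helper that only recognizes sentences and does not return trees.

```python
# Grammar from the "Parsing as Search" slide, lexical rules included.
RULES = [
    ("S", ("NP", "VP")),
    ("S", ("Aux", "NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
    ("Verb", ("left",)),
    ("Verb", ("arrive",)),
    ("Aux", ("do",)),
    ("Aux", ("does",)),
]


def bottom_up(words):
    """Search over sequences of symbols reachable by reducing the input."""
    start = tuple(words)
    frontier, seen = [start], {start}
    while frontier:
        form = frontier.pop()
        if form == ("S",):
            return True  # a single S covers all and only the input words
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:
                    # Reduce: replace the matched right-hand side by lhs.
                    reduced = form[:i] + (lhs,) + form[i + n:]
                    if reduced not in seen:
                        seen.add(reduced)
                        frontier.append(reduced)
    return False


print(bottom_up("does a flight arrive".split()))  # True
print(bottom_up("left a flight".split()))         # False
```

Every intermediate form is consistent with the words, but some lead nowhere: for "does a flight arrive" the search can reduce "a flight arrive" to S and reach the form Aux S, which no rule can complete.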

Top-Down vs. Bottom-Up
Top-down:
- Only searches for trees that can be answers
- But suggests trees that are not consistent with the words
Bottom-up:
- Only forms trees consistent with the words
- But suggests trees that make no sense globally
CPSC 422, Lecture 25 28

So Combine Them (from here to slide 35: not required for 422, just for your interest)
- Top-down: control strategy to generate trees
- Bottom-up: to filter out inappropriate parses
Top-down control strategy:
- Depth-first vs. breadth-first
- Which node to try to expand next (left-most)
- Which grammar rule to use to expand a node (textual order)
CPSC 422, Lecture 25 29

Top-Down, Depth-First, Left-to-Right Search. Sample sentence: Does this flight include a meal? CPSC 422, Lecture 25 30

Example Does this flight include a meal? CPSC 422, Lecture 25 31

Example Does this flight include a meal? CPSC 422, Lecture 25 32

Example Does this flight include a meal? CPSC 422, Lecture 25 33

Adding Bottom-up Filtering. The preceding sequence of expansions was a waste of time, because an NP cannot generate a parse tree that starts with an Aux. CPSC 422, Lecture 25 34

Bottom-Up Filtering
Category    Left Corners
S           Det, Proper-Noun, Aux, Verb
NP          Det, Proper-Noun
Nominal     Noun
VP          Verb
CPSC 422, Lecture 25 35
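
A left-corner table like the one above can be computed from the grammar. The grammar used in the lecture's example is not preserved in this transcript, so the `GRAMMAR` below is an assumption: a small textbook-style grammar chosen because it reproduces the table. The function name `left_corners` is also hypothetical.

```python
# ASSUMED grammar (the slide's own grammar is not in the transcript);
# it was chosen so that the computed table matches the slide.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}


def left_corners(category, seen=None):
    """POS categories that can begin some phrase of type `category`."""
    seen = {category} if seen is None else seen
    corners = set()
    for rhs in GRAMMAR[category]:
        first = rhs[0]
        if first not in GRAMMAR:
            corners.add(first)  # a POS category: it is a left corner
        elif first not in seen:
            seen.add(first)     # guard against left-recursive rules
            corners |= left_corners(first, seen)
    return corners


for cat in GRAMMAR:
    print(cat, sorted(left_corners(cat)))

# Filtering: do not expand NP when the next word is an Aux.
print("Aux" in left_corners("NP"))  # False
```

A top-down parser can consult this table before expanding a node: if the part of speech of the next input word is not a left corner of the category being expanded, the expansion is skipped.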

Problems with TD-BU-filtering
- Ambiguity
- Repeated Parsing
SOLUTION: Earley Algorithm (once again dynamic programming!)
CPSC 422, Lecture 25 36

Effective Parsing
Top-down and bottom-up can be effectively combined, but still cannot deal with ambiguity and repeated parsing.
PARTIAL SOLUTION: Dynamic Programming approaches (you'll see one applied to Prob. CFG)
- Fill tables with solutions to sub-problems
- Parsing: sub-trees consistent with the input, once discovered, are stored and can be reused
  1. Stores ambiguous parses compactly (but cannot select the best one)
  2. Does not do (avoidable) repeated work
CPSC 422, Lecture 25 37
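
The "store sub-results and reuse them" idea can be sketched with memoization over input spans. This is not the Earley algorithm named on the previous slide; it is a simpler table-filling sketch. The grammar, lexicon and the function `count_parses` are assumptions made for illustration (the words are borrowed from the constituent examples on slide 8), and the code assumes a grammar with no empty right-hand sides and no cycles of unit rules.

```python
from functools import lru_cache

# ASSUMED toy grammar with a PP-attachment ambiguity.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "Noun"), ("NP", "PP")],
    "VP": [("Verb", "NP"), ("VP", "PP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "Det": {"the", "a"},
    "Noun": {"cat", "mouse", "table"},
    "Verb": {"ate"},
    "Prep": {"on"},
}


def count_parses(words, start="S"):
    """Count the parse trees for `words`, solving each span only once."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of trees rooted in `symbol` that cover words[i:j].
        if symbol in LEXICON:
            return int(j == i + 1 and words[i] in LEXICON[symbol])
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(symbols, i, j):
        # Number of ways the symbol sequence can cover words[i:j].
        if len(symbols) == 1:
            return count(symbols[0], i, j)
        # The first symbol takes words[i:k]; each remaining symbol
        # needs at least one word, which bounds k from above.
        return sum(
            count(symbols[0], i, k) * count_seq(symbols[1:], k, j)
            for k in range(i + 1, j - len(symbols) + 2)
        )

    return count(start, 0, len(words))


print(count_parses("the cat ate a mouse on the table".split()))  # 2
```

The two parses differ in whether "on the table" attaches to the noun phrase "a mouse" or to the verb phrase "ate a mouse". The table records that both exist and shares the sub-result for "on the table" between them, but nothing in it says which reading is better, which is the gap probabilistic CFGs are meant to fill.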

Example of a relatively complex parse tree. Source: Journal of the American Medical Informatics Association, 2005, Improved Identification of Noun Phrases in Clinical Radiology Reports. CPSC 422, Lecture 25 38

Check out demos on course web page - Berkeley Parser with demo - Stanford Parser with demo CPSC 422, Lecture 25 39

Learning Goals for today's class. You can:
- Explain what the syntax of a Natural Language is
- Formally define a Context Free Grammar
- Justify why a CFG is a reasonable model for English syntax
- Apply a CFG as a Generative Formalism to:
  - Impose structures (trees) on strings in the language (i.e., trace Top-down and Bottom-up parsing on a sentence given a grammar)
  - Reject strings not in the language (also part of parsing)
  - Generate strings in the language given a CFG
CPSC 422, Lecture 25 Slide 41

Next class (Fri): Probabilistic CFG
- Assignment-3 is out, due Nov 20 (8-18 hours; working in pairs on the programming parts is strongly advised)
- Still have midterms: pick them up!
CPSC 422, Lecture 25 42