Introduction to Deep Processing Techniques for NLP. Deep Processing Techniques for NLP Ling 571 January 5, 2015 Gina-Anne Levow



Roadmap. Motivation: Applications. Language and Thought. Knowledge of Language. Cross-cutting themes: Ambiguity, Evaluation, & Multi-linguality. Course Overview.

Motivation: Applications. Applications of Speech and Language Processing: call routing, information retrieval, question-answering, machine translation, dialog systems, spell- and grammar-checking, sentiment analysis, information extraction.

Building on Many Fields. Linguistics: morphology, phonology, syntax, semantics, ... Psychology: reasoning, mental representations. Formal logic. Philosophy (of language). Theory of Computation: automata, ... Artificial Intelligence: search, reasoning, knowledge representation, machine learning, pattern matching. Probability.

Language & Intelligence. Turing Test (1950): operationalize intelligence. Two contestants: human, computer. Judge: human. Test: interact via text questions. Question: Can you tell which contestant is human? Crucially requires language use and understanding.

Limitations of Turing Test. ELIZA (Weizenbaum 1966): simulates a Rogerian therapist. User: You are like my father in some ways. ELIZA: WHAT RESEMBLANCE DO YOU SEE. User: You are not very aggressive. ELIZA: WHAT MAKES YOU THINK I AM NOT AGGRESSIVE ... Passes the Turing Test!! (sort of) "You can fool some of the people..." Simple pattern matching technique. True understanding requires deeper analysis & processing.
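
The "simple pattern matching technique" behind an ELIZA-style exchange can be sketched in a few lines of Python. This is only an illustration: the rules, the fallback reply, and the function name respond are assumptions made here, not Weizenbaum's original script, and the second reply echoes the user's words verbatim rather than reproducing the slide's wording exactly.

```python
import re

# Hypothetical rules in the spirit of ELIZA (not the original 1966 script).
# Each rule pairs a regular expression with a reply template; {0} is filled
# with the text captured by the pattern's group.
RULES = [
    (re.compile(r"you are like (.+)", re.IGNORECASE), "WHAT RESEMBLANCE DO YOU SEE"),
    (re.compile(r"you are (.+)", re.IGNORECASE), "WHAT MAKES YOU THINK I AM {0}"),
]
DEFAULT_REPLY = "PLEASE GO ON"  # assumed fallback when no rule matches


def respond(utterance: str) -> str:
    """Return the reply of the first rule whose pattern matches the utterance."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo the captured words back in upper case; no analysis of meaning.
            return template.format(*(group.upper() for group in match.groups()))
    return DEFAULT_REPLY


print(respond("You are like my father in some ways"))
print(respond("You are not very aggressive"))
```

Nothing in the sketch represents what the user meant; it only rearranges surface strings, which is the slide's point about why deeper analysis is needed.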

Turing Test Revived. "On the web, no one knows you're a ..." Problem: bots; automated agents swamp services. Challenge: prove you're human. Test: something a human can do that a bot can't. Solution: CAPTCHAs. Distorted images: trivial for human; hard for bot*. Key: perception, not reasoning.

Knowledge of Language What does HAL (of 2001, A Space Odyssey) need to know to converse? Dave: Open the pod bay doors, HAL. HAL: I'm sorry, Dave. I'm afraid I can't do that.

Knowledge of Language: Phonetics & Phonology (Ling 450/550). Sounds of a language, acoustics. Legal sound sequences in words.

Knowledge of Language: Morphology (Ling 570). Recognize and produce variation in word forms. Singular vs. plural: door + sg -> door; door + plural -> doors. Verb inflection: be + 1st person, sg, present -> am.

Knowledge of Language: Part-of-speech tagging (Ling 570). Identify word use in sentence. Bay (noun), not verb or adjective.

Knowledge of Language: Syntax (Ling 566: analysis; Ling 570: chunking; Ling 571: parsing). Order and group words in sentence. Scrambled order: *I'm I do, sorry that afraid Dave I can't.

Knowledge of Language: Semantics (Ling 571). Word meaning: individual (lexical), combined (compositional). "Open": AGENT cause THEME to become open; "pod bay doors": (pod bay) doors.

Knowledge of Language: Pragmatics/Discourse/Dialogue (Ling 571). Dave: Open the pod bay doors, HAL. (request) HAL: I'm sorry, Dave. I'm afraid I can't do that. (statement) Interpret utterances in context. Speech act (request, statement). Reference resolution: I = HAL; that = open doors. Politeness: I'm sorry, I'm afraid I can't.

Language Processing Pipeline Deep Processing Shallow Processing

Shallow vs Deep Processing. Shallow processing (Ling 570): usually relies on surface forms (e.g., words); less elaborate linguistic representations; e.g., HMM POS-tagging, FST morphology. Deep processing (Ling 571): relies on more elaborate linguistic representations; deep syntactic analysis (parsing); rich spoken language understanding (NLU).

Cross-cutting Themes. Ambiguity: How can we select among alternative analyses? Evaluation: How well does this approach perform on a standard data set? When incorporated into a full system? Multi-linguality: Can we apply this approach to other languages? How much do we have to modify it to do so?

Ambiguity. I made her duck. Means... I caused her to duck down; I made the (carved) duck she has; I cooked duck for her; I cooked the duck she owned; I magically turned her into a duck.

Ambiguity: POS. I made her duck. "duck" may be a verb (V) or a noun (N); "her" may be a possessive (Poss) or a pronoun (Pron). Means... I caused her to duck down; I made the (carved) duck she has; I cooked duck for her; I cooked the duck she owned; I magically turned her into a duck.

Ambiguity: Syntax. I made her duck. Means... I made the (carved) duck she has: (VP (V made) (NP (POSS her) (N duck))). I cooked duck for her: (VP (V made) (NP (PRON her)) (NP (N duck))).
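
To make the two analyses concrete, here is a hedged sketch that counts the parse trees a toy grammar assigns to "I made her duck". The grammar is an assumption built only to cover the two bracketings on the slide (one NP object with possessive "her" versus two NP objects with pronoun "her"); the names GRAMMAR and count_parses are hypothetical. The counter assumes no empty productions and no unary cycles.

```python
from functools import lru_cache

# Toy grammar (an assumption for illustration): keys are non-terminals,
# values list the alternative right-hand sides. Anything that is not a key
# is treated as a terminal word.
GRAMMAR = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("V", "NP", "NP")],
    "NP": [("Pron",), ("Poss", "N"), ("N",)],
    "Pron": [("I",), ("her",)],
    "Poss": [("her",)],
    "N": [("duck",)],
    "V": [("made",)],
}


def count_parses(sentence: str, start: str = "S") -> int:
    """Count distinct parse trees for the sentence under GRAMMAR."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways symbol can derive words[i:j].
        if symbol not in GRAMMAR:  # terminal: must match exactly one word
            return int(j - i == 1 and words[i] == symbol)
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence rhs can derive words[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # The first symbol takes words[i:k]; leave one word per remaining symbol.
        return sum(
            count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return count(start, 0, len(words))


print(count_parses("I made her duck"))  # 2: one tree per bracketing above
```

A parser that returns only one tree has to choose between these two, which is why the ambiguity question recurs throughout the course.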

Ambiguity: Semantics. I made her duck. Means... I caused her to duck down (Make: AG cause TH to do sth); I cooked duck for her (Make: AG cook TH for REC); I cooked the duck she owned (Make: AG cook TH); I magically turned her into a duck (Duck: animal); I made the (carved) duck she has (Duck: duck-shaped figurine).

Ambiguity: pervasive, pernicious, particularly challenging for computational systems. A problem we will return to again and again in class.

Course Information http://courses.washington.edu/ling571

Syntax Ling 571 Deep Processing Techniques for Natural Language Processing January 5, 2015

Roadmap. Sentence Structure. Motivation: more than a bag of words. Constituency. Representation: context-free grammars; formal definition of context-free grammars. Chomsky hierarchy: why not finite state? Aside: context-sensitivity.

More than a Bag of Words. Sentences are structured. Structure impacts meaning: "Dog bites man" vs. "man bites dog". Structure impacts acceptability: *Dog man bites.
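
A short sketch of why a bag of words is not enough: the three word strings on this slide have the same bag even though they differ in meaning or acceptability. The helper name bag_of_words is hypothetical, and lower-casing plus whitespace splitting is an assumed, deliberately naive tokenization.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Map a sentence to its multiset of lower-cased, whitespace-separated words."""
    return Counter(sentence.lower().split())


a = bag_of_words("Dog bites man")
b = bag_of_words("man bites dog")
c = bag_of_words("Dog man bites")

# All three bags are identical, so a model that sees only the bag cannot
# distinguish who bit whom, nor rule out the ungrammatical ordering.
print(a == b == c)  # True
```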

Constituency. Constituents: basic units of sentences; a word or group of words that acts as a single unit. Phrases: noun phrase (NP), verb phrase (VP), prepositional phrase (PP), etc. Single unit: type determined by head (e.g., N -> NP).

Constituency. How can we tell what units are constituents? On September seventeenth, I'd like to fly from Sea-Tac Airport to Denver. Candidate units: September seventeenth; On September seventeenth; Sea-Tac Airport; from Sea-Tac Airport.

Constituency Testing. Constituents appear in similar contexts: PPs, NPs. Preposed or postposed constructions: On September seventeenth, I'd like to fly from Sea-Tac Airport to Denver. I'd like to fly from Sea-Tac Airport to Denver on September seventeenth. A constituent must move as a unit: *On I'd like to fly September seventeenth from Sea-Tac Airport to Denver. *I'd like to fly on September from Sea-Tac Airport to Denver seventeenth.

Representing Sentence Structure. Captures constituent structure: basic units, phrases. Subcategorization: argument structure, components expected by verbs. Hierarchical.

Representation: Context-free Grammars. A CFG is a 4-tuple: a set of terminal symbols Σ; a set of non-terminal symbols N; a set of productions P, each of the form A -> α, where A is a non-terminal and α is in (Σ ∪ N)*; and a designated start symbol S. The language is L = { w | w in Σ* and S =>* w }, where S =>* w means S derives w by some sequence of rule applications.

Representation: Context-free Grammars. Partial example. Σ: the, cat, dog, bit, bites, man. N: NP, VP, AdjP, Nom, Det, V, N, Adj, ... P: S -> NP VP; NP -> Det Nom; Nom -> N | Nom N; VP -> V NP; N -> cat; N -> dog; N -> man; Det -> the; V -> bit; V -> bites. Start symbol: S. Parse tree for "The dog bit the man": (S (NP (Det The) (Nom (N dog))) (VP (V bit) (NP (Det the) (Nom (N man))))).
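
The partial grammar above can be written down directly as the four components of a CFG and used to test membership in its language. This is a minimal sketch under stated assumptions: the garbled Nom rule is read as Nom -> N | Nom N, S is added to the non-terminal set, input is lower-cased before matching, and the names TERMINALS, NONTERMINALS, PRODUCTIONS, START, and in_language are hypothetical. The recognizer assumes no empty productions and no unary cycles, which holds for this grammar.

```python
from functools import lru_cache

# The four components of the CFG (Σ, N, P, S) from the partial example.
TERMINALS = {"the", "cat", "dog", "bit", "bites", "man"}
NONTERMINALS = {"S", "NP", "VP", "AdjP", "Nom", "Det", "V", "N", "Adj"}
PRODUCTIONS = {
    "S": [("NP", "VP")],
    "NP": [("Det", "Nom")],
    "Nom": [("N",), ("Nom", "N")],  # assumed reading: Nom -> N | Nom N
    "VP": [("V", "NP")],
    "N": [("cat",), ("dog",), ("man",)],
    "Det": [("the",)],
    "V": [("bit",), ("bites",)],
}
START = "S"


def in_language(sentence: str) -> bool:
    """True if START =>* the sentence, i.e. the word string is in L."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Can symbol derive exactly words[i:j]?
        if symbol in TERMINALS:
            return j - i == 1 and words[i] == symbol
        return any(seq_derives(rhs, i, j) for rhs in PRODUCTIONS.get(symbol, []))

    @lru_cache(maxsize=None)
    def seq_derives(rhs, i, j):
        # Can the symbol sequence rhs derive exactly words[i:j]?
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        # Every symbol covers at least one word, so sub-spans always shrink;
        # this is what keeps the left-recursive Nom rule from looping.
        return any(
            derives(rhs[0], i, k) and seq_derives(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return bool(words) and derives(START, 0, len(words))


print(in_language("The dog bit the man"))  # True
print(in_language("Dog man bites"))        # False
```

The recognizer only answers yes or no; producing the tree shown on the slide requires additionally recording which production and split point succeeded for each span.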

Sentence-level Knowledge: Syntax. Different models of language specify the expressive power of a formal language. Chomsky Hierarchy (from most to least expressive): Recursively enumerable: any rule form. Context-sensitive: αAβ -> αγβ; example language a^n b^n c^n. Context-free: A -> γ; example language a^n b^n. Regular: S -> aB; example regular expression a*b*.
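
The example languages for the hierarchy levels can be contrasted with small membership tests. This sketch is illustrative only and the function names are hypothetical: a*b* is checked with a regular expression, while a^n b^n and a^n b^n c^n are checked by comparing counts, which is exactly the bookkeeping that a finite-state device cannot do for unbounded n.

```python
import re

A_STAR_B_STAR = re.compile(r"a*b*")


def is_a_star_b_star(s: str) -> bool:
    """Regular: any number of a's followed by any number of b's."""
    return A_STAR_B_STAR.fullmatch(s) is not None


def is_an_bn(s: str) -> bool:
    """Context-free: n a's followed by exactly n b's."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


def is_an_bn_cn(s: str) -> bool:
    """Context-sensitive: n a's, then n b's, then n c's."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n


print(is_a_star_b_star("aabbb"), is_an_bn("aabbb"))  # True False
print(is_an_bn("aabb"), is_an_bn_cn("aabbcc"))       # True True
```

Every string of a^n b^n is also in a*b*, but not the reverse, mirroring how each level of the hierarchy contains the one below it.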

Representing Sentence Structure. Why not just finite-state models? They cannot describe some grammatical phenomena and lack the expressiveness to capture generalizations such as center embedding. Finite state: A -> w*; A -> w* B. Context-free: A -> αAβ, which allows recursion. Examples: The luggage arrived. The luggage that the passengers checked arrived. The luggage that the passengers that the storm delayed checked arrived.

Parsing Goals. Accepting: Is the string legal in the language? Formally: rigid. Practically: degrees of acceptability. Analysis: What structure produced the string? Produce one (or all) parse trees for the string. Will develop techniques to produce analyses of sentences: rigidly accept (with analysis) or reject; produce varying degrees of acceptability.