Natural Language Processing. George Konidaris

Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017

Natural Language Processing
Understanding spoken or written sentences in a natural language. A major area of research in AI. Why?
Humans use language to communicate, so it is the most natural interface.
Huge amounts of knowledge are available in natural-language form, e.g., books.
Generative power: key to intelligence? It hints at the underlying mechanism and is a key indicator of intelligence.

Natural Language Processing
It is also incredibly hard. Why?
I saw a bat.
Lucy owns a parrot that is larger than a cat.
John kissed his wife, and so did Sam.
Mary invited Sue for a visit, but she told her she had to go to work.
I went to the hospital, and they told me to go home and rest.
The price of tomatoes in Des Moines has gone through the roof.
Mozart was born in Salzburg and Beethoven, in Bonn.
(examples via Ernest Davis, NYU)

Natural Language Processing
"If you are a fan of the justices who fought throughout the Rehnquist years to pull the Supreme Court to the right, Alito is a home run - a strong and consistent conservative with the skill to craft opinions that make radical results appear inevitable and the ability to build trusting professional relationships across ideological lines." (TNR, Nov. 2005)
"The juiciest prize is to become the face of a luxury brand such as Dior or Burberry. To have any chance, a model must first have magazine shoots under her designer belt. This fact allows fashion magazines to pay peanuts, even for a covershoot." (Economist, Feb. 2012)
(examples via Ernest Davis, NYU)

Component Problems
[Figure: processing pipeline for "the cat sat on the mat". Perception recovers the word sequence; syntactic analysis builds a parse tree (S, NP, VP, PP, Article, Noun, Verb, Prep); semantic analysis produces SatOn(x = Cat, y = Mat); disambiguation resolves which Cat and which Mat; incorporation yields SatOn(cat3, mat16).]

Perception The cat sat on the mat.

Major Challenges
Speaker accent, volume, tone.
No pauses: where are the word boundaries?
Noise.
Variation.

Speech Recognition
[Figure: phoneme labels th, ah, ca, t]

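Speech Recognition Using HMMs
Hidden states S_t → S_{t+1} are linked by the transition model; each state S_t emits an observation O_t through the observation model.
Must store:
P(O | S): probability of observed audio given phoneme.
P(S_{t+1} | S_t): probability of one phoneme following another.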
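The slide does not say how the two stored tables are used to recover phonemes. One standard option is Viterbi decoding, sketched below. The two phoneme states, the quantized audio symbols "f1" and "f2", and every probability are made-up values for illustration, not numbers from the lecture.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            # Transition model: pick the best predecessor r for state s.
            p, arg = max((prev[r][0] * trans_p[r][s], r) for r in states)
            # Observation model: weight by how well s explains this observation.
            layer[s] = (p * emit_p[s][obs], arg)
        best.append(layer)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for layer in reversed(best[1:]):
        state = layer[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model (all values are assumptions).
STATES = ["th", "ah"]
START_P = {"th": 0.6, "ah": 0.4}
TRANS_P = {"th": {"th": 0.3, "ah": 0.7},   # P(S_{t+1} | S_t)
           "ah": {"th": 0.4, "ah": 0.6}}
EMIT_P = {"th": {"f1": 0.8, "f2": 0.2},    # P(O | S)
          "ah": {"f1": 0.1, "f2": 0.9}}

path = viterbi(["f1", "f2", "f2"], STATES, START_P, TRANS_P, EMIT_P)
print(path)
```

A real recognizer would work with continuous acoustic features and log-probabilities to avoid underflow; the sketch keeps plain products so the link to the two tables stays visible.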

Issues
Phoneme sequence is not Markov. Must introduce memory for context: k-th order Markov models.
People speak faster or slower, so the window does not have a fixed length: Dynamic Time Warping.
Quite a simplistic model for a complex phenomenon. Nevertheless, speech recognition technology based on HMMs was commercially viable in the mid-1990s.
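Dynamic Time Warping is only named on the slide. As a rough sketch of the idea, the function below aligns two sequences of different lengths by letting one element match several elements of the other. The one-dimensional numeric "features" and the absolute-difference cost are assumptions chosen for brevity.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Either sequence may stay on the same element (stretching),
            # or both advance together.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b holds
                                 cost[i][j - 1],      # b advances, a holds
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The same "utterance" spoken at half speed aligns with zero cost.
fast = [1, 2, 3]
slow = [1, 1, 2, 2, 3, 3]
print(dtw_distance(fast, slow))
```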

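Speech Recognition with Deep Nets
Mid-to-late 2000s: replace HMM with Deep Net.
[Figure: feed-forward network with inputs x1, x2, hidden layers h11, h12, h13 through hn1, hn2, hn3, and outputs o1, o2; phoneme labels ah, ca, th with example scores 0.1, 0.3, 0.1]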

Speech Recognition with Deep Nets
How to deal with dependency on prior states and observations?
Recurrent nets: a form of memory.
[Figure: recurrent network with inputs x1, x2, hidden units h1, h2, h3, and outputs o1, o2]
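To make "form of memory" concrete, here is a minimal sketch of a recurrent update with a single scalar hidden unit. The scalar weights, the tanh nonlinearity, and the function names are illustrative assumptions; the network on the slide is not specified in this detail.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, w_ho):
    """One recurrent step: the new hidden state mixes the input with the old state."""
    h = math.tanh(w_xh * x + w_hh * h_prev)
    o = w_ho * h
    return h, o


def run(xs, w_xh=1.0, w_hh=0.5, w_ho=1.0):
    """Feed a sequence through the recurrent unit and collect the outputs."""
    h = 0.0  # initial hidden state
    outputs = []
    for x in xs:
        h, o = rnn_step(x, h, w_xh, w_hh, w_ho)
        outputs.append(o)
    return outputs


# An input pulse at t = 0 still influences the outputs at later steps,
# even though the later inputs are all zero.
print(run([1.0, 0.0, 0.0]))
# With the recurrent weight removed, the memory disappears.
print(run([1.0, 0.0, 0.0], w_hh=0.0))
```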

Syntactic Analysis
Syntax: characteristic of language. Structure. Composition. But observed in a linear sequence.
Parse tree for "the cat sat on the mat":
[S [NP [Article the] [Noun cat]] [VP [VP [Verb sat]] [PP [Prep on] [NP [Article the] [Noun mat]]]]]

Syntactic Analysis
How to describe this structure? With a formal grammar: a set of rules for generating sentences.
Grammars vary in power, from most to least expressive:
Recursively enumerable (equivalent to Turing machines)
Context-sensitive
Context-free
Regular
Each uses a set of rewrite rules to generate syntactically correct sentences.
Example: "Colorless green ideas sleep furiously." (Chomsky's example of a sentence that is syntactically correct yet meaningless.)

Formal Grammars
Two types of symbols:
Terminals (stop and output this)
Non-terminals (one is a start symbol)
Production (rewrite) rules modify a string of symbols by matching the expression on the left and replacing it with the one on the right.
Example rules:
S → AB
A → AA
A → a
B → BBB
B → b
Example strings generated: ab, aaaaaab, abbb, aabbbbb
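The rewrite process can be sketched in a few lines of code. The snippet below encodes the five example rules and enumerates every terminal string up to a chosen length by repeatedly rewriting the leftmost non-terminal. The dictionary encoding, the function name, and the length bound are choices made for this illustration.

```python
from collections import deque

# The example grammar: S -> AB, A -> AA | a, B -> BBB | b
RULES = {
    "S": [("A", "B")],
    "A": [("A", "A"), ("a",)],
    "B": [("B", "B", "B"), ("b",)],
}


def generate(rules, start="S", max_len=7):
    """Enumerate all terminal strings of length <= max_len derivable from start."""
    seen = {(start,)}
    queue = deque([(start,)])
    sentences = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if any.
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            sentences.add("".join(form))  # only terminals left: output it
            continue
        for rhs in rules[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # No rule here shrinks a string, so overlong forms can be pruned.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sentences


language = generate(RULES)
print(sorted(language, key=lambda s: (len(s), s)))
```

The length-based pruning is only valid because every right-hand side in this grammar has at least one symbol; a grammar with empty productions would need a different termination condition.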

Context-Free Grammars
Rules must be of the form A → B, where A is a single non-terminal and B is any sequence of terminals and non-terminals.
Why is this called context-free? Because A can be rewritten wherever it appears, regardless of the symbols around it.

Probabilistic CFGs
Attach a probability to each rewrite rule. Probabilities for the same left symbol sum to 1.
A → B [0.3]
A → AA [0.6]
A → a [0.1]
Why do this? More vs. less likely sentences. Probability distribution over valid sentences.
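As a small worked sketch, the probability of one derivation is the product of the probabilities of the rules it uses. The code below uses the three rules for A from the slide. The rule B → b [1.0] is an assumption added so that B can terminate, and the nested-tuple tree encoding is likewise just one convenient choice.

```python
import math

PCFG = {
    "A": {("B",): 0.3, ("A", "A"): 0.6, ("a",): 0.1},
    "B": {("b",): 1.0},  # assumed rule, not on the slide
}


def check_normalized(pcfg):
    """Probabilities for the same left symbol must sum to 1."""
    return all(math.isclose(sum(rules.values()), 1.0) for rules in pcfg.values())


def tree_probability(tree, pcfg):
    """Multiply the probabilities of every rule used in a derivation tree.

    A tree is either a terminal string or a pair (lhs, [children]).
    """
    if isinstance(tree, str):
        return 1.0  # terminals contribute no rule probability
    lhs, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg[lhs][rhs]
    for child in children:
        p *= tree_probability(child, pcfg)
    return p


# Derivation of "aa": A -> AA, then each A -> a.
tree_aa = ("A", [("A", ["a"]), ("A", ["a"])])
# Derivation of "b": A -> B, then B -> b.
tree_b = ("A", [("B", ["b"])])

print(check_normalized(PCFG))
print(tree_probability(tree_aa, PCFG), tree_probability(tree_b, PCFG))
```

When a sentence has several derivations, its total probability is the sum over all of them; the sketch scores a single tree only.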

E0 Lexicon (R&N)

E0 Grammar (R&N)

Parse tree for "the cat sat on the mat":
[S [NP [Article the] [Noun cat]] [VP [VP [Verb sat]] [PP [Prep on] [NP [Article the] [Noun mat]]]]]

Semantic Analysis
Semantics: what the sentence actually means, eventually in terms of symbols available to the agent (e.g., a KB).
"the cat sat on the mat" → SatOn(x = Cat, y = Mat) → SatOn(cat3, mat16)

Semantic Analysis
Key idea: compositional semantics. The semantics of sentences are built out of the semantics of their constituent parts.
Example: The cat sat on the mat.
Therefore there is a clear relationship between syntactic analysis and semantic analysis.

Semantic Analysis
Useful step: the probability of a parse depends on the words.
Lexicalized PCFGs:
VP(v) → Verb(v) NP(n) [P1(v, n)]
Here v and n are variables, and the probability depends on the variable bindings: "ate bandanna" vs. "ate banana".

Semantic Analysis
Sentence: John loves Mary
Desired output: Loves(John, Mary)
Semantic parsing: exploit the compositionality of parsing to build semantics. (R&N)

Semantic Analysis
Parse tree for "John loves Mary" with semantic annotations:
S(Loves(John, Mary))    [sentence to add to KB]
  NP(John)
    Name(John): John    [symbol in KB]
  VP(λx Loves(x, Mary))
    Verb(λy λx Loves(x, y)): loves    [λ-expression]
    NP(Mary)
      Name(Mary): Mary    [symbol in KB]
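The composition in this tree can be mimicked directly with nested functions. The sketch below is only an illustration: representing KB sentences as plain strings and the names loves, vp and sentence are assumptions, not part of the lecture.

```python
def loves(y):
    """Verb semantics: λy λx Loves(x, y), written as nested functions."""
    return lambda x: f"Loves({x}, {y})"


# Names map to symbols in the KB.
john = "John"
mary = "Mary"

# VP(λx Loves(x, Mary)): apply the verb to its object NP.
vp = loves(mary)

# S(Loves(John, Mary)): apply the VP to the subject NP.
sentence = vp(john)
print(sentence)
```

Each application consumes one constituent, so the order of function application mirrors the structure of the parse tree.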

Machine Translation
Major goal of NLP research for decades.
Document in Russian → Document in English

Competing Approaches
Document in Russian → Formal Language → Document in English

Competing Approaches
Document in Russian → Document in English (direct, with no intermediate formal language)

Google Translate
100 languages, 200 million people daily.