Natural Language Processing


Natural Language Processing
Lecture 1, 1/13/2015
CSCI 5832
Susan W. Brown

Natural Language Processing
We're going to study what goes into getting computers to perform useful and interesting tasks involving human language.

1/14/15 Speech and Language Processing - Jurafsky and Martin

Natural Language Processing
More specifically, it's about the structure of human languages, the algorithms that exploit that structure to process language, and the formal basis for those algorithms.

Why Should You Care?
Three trends:
1. An enormous amount of information is now available in machine-readable form as natural language text (newspapers, web pages, medical records, financial filings, etc.).
2. Conversational agents are becoming an important form of human-computer communication.
3. Much of human-human interaction is now mediated by computers via social media.

Applications
Let's take a quick look at three important application areas:
- Text analytics
- Question answering
- Machine translation

Text Analytics
Data mining of weblogs, microblogs, discussion forums, message boards, user groups, and other forms of user-generated media:
- Product marketing information
- Political opinion tracking
- Social network analysis
- Buzz analysis (what's hot, what topics people are talking about right now)

Text Analytics [two figure slides]

Question Answering
Traditional information retrieval returns documents or resources that contain what users need to satisfy their information needs. Question answering, on the other hand, directly provides an answer to information needs posed as questions.

Web Q/A [figure slide]

Watson [figure slide]

Machine Translation
The automatic translation of texts between languages is one of the oldest non-numerical applications in computer science. In the past 10 years or so, MT has gone from a niche academic curiosity to a robust commercial industry.

Google Translate [two figure slides]

How?
All of these applications operate by exploiting underlying regularities inherent in human languages, sometimes in complex ways, sometimes in pretty trivial ways.
- Language structure
- Formal models
- Practical applications

Major Class Topics
1. Words
2. Syntax
3. Meaning
4. Texts
5. Applications exploiting each

Applications
First, what makes an application a language processing application (as opposed to any other piece of software)?
- An application that requires the use of knowledge about the structure of human language
  - Example: Is Unix wc (word count) an example of a language processing application?

Applications
Word count?
- When it counts words: Yes
  - To count words you need to know what a word is. That's knowledge of language. Note that the definition of word embodied in wc doesn't work for Chinese or other languages that don't delimit words with spaces.
- When it counts lines and bytes: No
  - Lines and bytes are computer artifacts, not linguistic entities.
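The point about wc can be made concrete with a few lines of Python. This is a minimal sketch, not the wc source: it assumes that a "word" is a maximal run of non-whitespace characters, which is how wc counts words, and the Chinese sentence is an illustrative example chosen here, not one from the slides.

```python
def wc_words(text: str) -> int:
    """Count words as maximal runs of non-whitespace characters."""
    # str.split() with no arguments splits on any run of whitespace.
    return len(text.split())


english = "I made her duck"
# Illustrative sentence (roughly "I made her lower her head"), written
# without spaces between words, as is normal in Chinese.
chinese = "我让她低下头"

print(wc_words(english))  # 4
print(wc_words(chinese))  # 1
```

The whitespace definition gives a sensible count for the English sentence and counts the whole Chinese sentence as a single word, so the counter embeds an assumption about language even though it looks purely mechanical.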

Caveat
NLP has a distinct AI aspect to it:
- We're often dealing with ill-defined problems.
- We don't often come up with exact solutions/algorithms.
  - That is, we're dealing with algorithms that don't work.
- To make progress we need to have concrete metrics that tell us how well we're doing, or at least whether our systems are improving or not.

Administrative Stuff
- Waitlist
- Web page
  - verbs.colorado.edu/~mpalmer/csci5832/
- Reasonable preparation
- Requirements

Web Page
The course web page can be found at:
verbs.colorado.edu/~mpalmer/csci5832/
It will have the syllabus, lecture notes, assignments, announcements, etc. You should check the News tab periodically for new stuff. I'll be using this in preference to email.

Mailing List
There is an automatically generated mailing list. Mail goes to your colorado.edu email address.
- I can't alter it, so don't ask me to send your mail to gmail/yahoo/work or whatever.
- You can set up a forward yourself.

Preparation
- Some exposure to logic
- Exposure to basic concepts in probability
- Familiarity with linguistics
- Ability to write well in English
- Ability to program
- Basic algorithm and data structure analysis

Requirements
- Readings:
  - Speech and Language Processing by Jurafsky and Martin, 2nd ed., Prentice-Hall, 2009
  - A few conference or journal papers
- 3 programming assignments
- Problem sets (about 10)
- 2 midterms
- Final report and presentation

Programming
Most of the programming will be done in Python.
- It's free and works on Windows, Macs, and Linux
- It's easy to install
- Easy to learn

Programming
Go to www.python.org to get started. The default installation comes with an editor called IDLE. It's a serviceable development environment. Python mode in Emacs is pretty good. It's what I use, but I'm a dinosaur. If you like Eclipse, use that.

Grading
- Programming assignments: 30%
- Problem sets: 18%
- Midterms: 28%
- Final report: 14%
- Participation: 10%

Questions?

Course Material
We'll be intermingling discussions of:
- Linguistic topics
  - Morphology, syntax, semantics, discourse
- Formal systems
  - Regular languages, context-free grammars, probabilistic models
- Applications
  - Question answering, machine translation, information extraction

Course Material
We won't be doing speech recognition or synthesis.

Topics: Linguistics
- Word-level processing
- Syntactic processing
- Lexical and compositional semantics

Topics: Techniques
- Finite-state methods
- Context-free methods
- Probabilistic models
- Supervised machine learning methods

Categories of Knowledge
- Phonology
- Morphology
- Syntax
- Semantics
- Pragmatics
- Discourse
Each kind of knowledge has associated with it an encapsulated set of processes that make use of it. Interfaces are defined that allow the various levels to communicate. This often leads to a pipeline architecture:
Morphological Processing -> Syntactic Analysis -> Semantic Interpretation -> Context

Ambiguity
Ambiguity is a fundamental problem in computational linguistics. Hence, resolving, or managing, ambiguity is a recurrent theme.
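The pipeline architecture described under Categories of Knowledge can be sketched as a chain of functions, each consuming the output of the one before it. Everything below is a toy stand-in invented for illustration: the stage bodies do no real linguistic analysis, the Context stage is left out, and the function names are assumptions, not part of any library.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Stage = Callable[[Any], Any]


def morphological_processing(sentence: str) -> List[str]:
    # Toy stand-in: split on whitespace and lowercase each token.
    return [token.lower() for token in sentence.split()]


def syntactic_analysis(tokens: List[str]) -> Dict[str, Any]:
    # Toy stand-in: first token is the subject, the rest is the predicate.
    return {"subject": tokens[0], "predicate": tokens[1:]}


def semantic_interpretation(parse: Dict[str, Any]) -> Dict[str, str]:
    # Toy stand-in: relabel the parse as an agent/event record.
    return {"agent": parse["subject"], "event": " ".join(parse["predicate"])}


def run_pipeline(stages: List[Stage], data: Any) -> Any:
    # Each stage sees only the single output of the previous stage.
    return reduce(lambda value, stage: stage(value), stages, data)


PIPELINE = [morphological_processing, syntactic_analysis, semantic_interpretation]
result = run_pipeline(PIPELINE, "I made her duck")
print(result)  # {'agent': 'i', 'event': 'made her duck'}
```

Because every stage hands exactly one result to the next, a strict pipeline like this has to commit to a single analysis at each level, which is where ambiguity becomes a problem.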

Ambiguity
Find at least 5 meanings of this sentence:
- I made her duck

1. I cooked waterfowl for her benefit (to eat)
2. I cooked waterfowl belonging to her
3. I created the (ceramic?) duck she owns
4. I caused her to quickly lower her upper body
5. I waved my magic wand and turned her into undifferentiated waterfowl

Ambiguity is Pervasive
- I caused her to quickly lower her head or body
  - Lexical category: "duck" can be a noun or verb
- I cooked waterfowl belonging to her.
  - Lexical category: "her" can be a possessive ("of her") or dative ("for her") pronoun
- I made the (ceramic) duck statue she owns
  - Lexical semantics: "make" can mean "create" or "cook", and about 100 other things as well

Ambiguity is Pervasive
Grammar: "make" can be:
- Transitive (verb has a noun direct object)
  - I cooked [waterfowl belonging to her]
- Ditransitive (verb has 2 noun objects)
  - I made [her] (into) [undifferentiated waterfowl]
- Action-transitive (verb has a direct object and another verb)
  - I caused [her] [to move her body]
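The lexical-category ambiguity in "I made her duck" can be enumerated mechanically. This is a minimal sketch under a strong assumption: the lexicon below is a hypothetical toy that lists only the categories mentioned on the slides, with invented tag names, and it ignores the sense and grammar ambiguities of "make" entirely.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to its possible categories.
LEXICON = {
    "I": ["PRONOUN"],
    "made": ["VERB"],
    "her": ["POSSESSIVE", "DATIVE"],
    "duck": ["NOUN", "VERB"],
}


def tag_sequences(words):
    """Return every way of assigning one category to each word."""
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


readings = tag_sequences("I made her duck".split())
for reading in readings:
    print(reading)
print(len(readings))  # 4
```

Two ambiguous words with two categories each already give 2 x 2 = 4 candidate taggings; the number of combinations grows multiplicatively with each additional ambiguous word.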

Ambiguity is Pervasive
Phonetics!
- I mate or duck
- I'm eight or duck
- Eye maid; her duck
- Aye mate, her duck
- I maid her duck
- I'm aid her duck
- I mate her duck
- I'm ate her duck
- I'm ate or duck
- I mate or duck

Problem
Remember our pipeline...
Morphological Processing -> Syntactic Analysis -> Semantic Interpretation -> Context

Really it's this
[figure: a single Morphological Processing box followed by many overlapping Syntactic Analysis and Semantic Interpretation boxes]

Dealing with Ambiguity
Four possible approaches:
1. Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
2. Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.

Dealing with Ambiguity
3. Probabilistic approaches based on making the most likely choices
   1. Or passing along n-best choices
4. Don't do anything, maybe it won't matter
   1. We'll leave when the duck is ready to eat.
   2. The duck is ready to eat now.
   Does the duck ambiguity matter with respect to whether we can leave?

Models and Algorithms
By models we mean the formalisms that are used to capture the various kinds of linguistic knowledge we need. Algorithms are then used to manipulate the knowledge representations needed to tackle the task at hand.
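The difference between making the most likely choice and passing along n-best choices (approach 3 under Dealing with Ambiguity) can be shown in a few lines. This is a sketch only: the candidate readings are paraphrases of the "I made her duck" interpretations, and the probabilities are invented for illustration, not estimated from any data.

```python
import heapq

# Candidate readings with made-up probabilities (illustration only).
CANDIDATES = [
    ("cook waterfowl for her", 0.40),
    ("cook her waterfowl", 0.30),
    ("cause her to duck", 0.25),
    ("turn her into a duck", 0.05),
]


def most_likely(candidates):
    """Commit to the single highest-scoring reading."""
    return max(candidates, key=lambda pair: pair[1])


def n_best(candidates, n):
    """Keep the n highest-scoring readings, best first, for later stages."""
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


print(most_likely(CANDIDATES))  # ('cook waterfowl for her', 0.4)
print(n_best(CANDIDATES, 2))
```

Committing to the single best reading is cheap but unrecoverable if it is wrong; passing the n-best list forward lets a later stage, with more context, pick a lower-ranked reading.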

Models
- State machines
- Rule-based approaches
- Logical formalisms
- Probabilistic models

Algorithms
Many of the algorithms that we'll study will turn out to be transducers: algorithms that take one kind of structure as input and output another. Unfortunately, ambiguity makes this process difficult. This leads us to employ algorithms that are designed to handle ambiguity of various kinds.

Paradigms
In particular:
- State-space search
  - To manage the problem of making choices during processing when we lack the information needed to make the right choice
- Dynamic programming
  - To avoid having to redo work during the course of a state-space search
  - CKY, Earley, Minimum Edit Distance, Viterbi, Baum-Welch
- Classifiers
  - Machine learning based classifiers that are trained to make decisions based on features extracted from the local context

Next Time
Read Chapters 1 and 2 of the textbook.
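Of the dynamic programming algorithms listed under Paradigms, minimum edit distance is the shortest to write down. The sketch below assumes unit costs for insertion, deletion, and substitution (the Levenshtein variant); other cost schemes, such as charging 2 for a substitution, give different distances.

```python
def min_edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit insert, delete, and substitute costs."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))  # 5
```

Each table cell is computed once from three already-filled neighbours, so the work is proportional to the product of the two string lengths; this is the sense in which dynamic programming avoids redoing work that a naive search over edit sequences would repeat.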