Statistical Approaches to Natural Language Processing CS 4390/5319 Spring Semester, 2003 Syllabus


http://www.cs.utep.edu/nigel/nlp.html

Time and Location
15:00-16:25, Tuesdays and Thursdays
Computer Science 322

Instructor
Nigel WARD
nigel@cs.utep.edu
Computer Science, Room 206
(915) 747-6827

The Topic
The field of Natural Language Processing (NLP) and spoken language processing (SLP) has applications such as:
A. Machine Translation
B. Information Retrieval and Search
C. Information Filtering and Text Categorization
D. Information Extraction
E. Input Methods
F. Spell Checking
G. Dictation
H. Command Interfaces
I. Question-Answering Systems
J. Tutorial Systems
K. Other Dialog Systems

Course Goals:
- to learn some useful concepts, models, algorithms, and techniques
- to practice some of the techniques used in building natural language systems
- to introduce or reinforce basic knowledge of:
  - probability
  - English grammar
  - formal language and automata theory
  - human-computer interaction
  - machine learning and AI
  - simple data structures
  - basic programming skills
  - the engineering issues involved in building systems
- to appreciate the complexities of language

Coverage
This class will cover the basics of NLP, including:
- representations of syntactic structure: PSG, bracketing, dependency, deep case
- parsing: FSM, CFG, PCFG; chart, unification, Viterbi search
- models of meaning: logic-based, case frames, semantic networks, connectionist
- knowledge representation: semantic networks, vector spaces, database semantics
- techniques for modeling spelling and morphology:
- architectures for integration: pipeline, integrated, blackboard, Bayesian
- learning methods: unsupervised, clustering, perceptron, decision trees, EM
- performance evaluation: objective measures, usability metrics
- human language vs. computer language: properties, uses
- user needs: embedded NLP, rival interface technologies

Textbooks and Readings:
This class will use two textbooks.

SLP   Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin, Prentice-Hall, 2000. (http://www.cs.colorado.edu/~martin/slp.html)
MMML  The Motivations behind Modern Models of Language, Nigel Ward (in preparation).

SLP should be available in the bookstore. MMML will be xeroxed off and distributed somehow. It is important that you read the assigned chapters before each class.

There will also be articles chosen to present classic issues, to illustrate NL systems, or to present recent research results.

For more background, you may want to refer to:
- Natural Language Understanding, 2nd edition, by James Allen, Benjamin-Cummings, 1995
- Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999

Assignments:
The assignments are also important. There will be several types of assignments:
- thought assignments
- observation assignments
- computer assignments

Graduate students will do two additional assignments:
- leading in-class discussion
- writing a research proposal

Most assignments may be done either individually or in pairs. Some assignments will be done partly in class. Late assignments will be penalized.

Tests:
There will probably be two tests, tentatively February 11 and March 13. There will be a final examination, tentatively 13:00-15:45, Thursday, May 8.

Grading:
The weighting will be approximately: Final Exam 35%, Assignments 30%, Tests 25%, Quizzes 5%, and Class Participation 5%.

Office Hours:
Fridays 13:15-14:15 in my office, or by appointment, or whenever the door is open. Come with any questions, or just to chat.

Tentative Schedule of Readings and Assignments

a. Introduction (class 1)
a1. Overview of NLP Applications
    Read SLP1: Introduction
a2. Overview of the Course

b. Words (classes 2-7)
b1. Review of Simple Finite State Models
    Read SLP2: Regular Expressions and Automata
b2. Finite State Transducers
    Read SLP3: Morphology and Finite-State Transducers
b3. Pronunciation
    Read SLP4 (except 4.4, 4.5): Computational Phonology and Text-to-Speech
b4. Basic Recognition Algorithms
    Read SLP5: Probabilistic Models of Pronunciation and Spelling
b5. Language Modeling
    Read SLP6: N-gram Models of Syntax
b6. Input Methods
b7. Hidden Markov Models
    Read SLP7: HMMs and Speech Recognition
    Assignment: transcribe one minute of a conversation

c. Syntax (classes 8-13)
c1. Motivation
    Read MMML Why We Ascribe Structures to Sentences (Ch.7+6.7)
c2. Some Complexities of English
    Read SLP8: Word Classes
c3. Part-of-Speech Tagging
c4. English Grammar
    Read SLP9: Context Free Grammars
    Assignment: train a part-of-speech tagger for Spanish
c5. Context-Free Parsing
    Read SLP10: Parsing with Context-Free Grammars
c6. Probabilistic Parsing
    Read SLP12: Lexicalized and Probabilistic Parsing
    Assignment: parse by hand and introspect on how
    Assignment: improve and test a grammar

d. Systems and Semantics (classes 14-20)
d1. Classic NLP
    Read MMML Five or Six Classic NLP Systems (Ch.7+6.7)
    Read Experience with the Evaluation of Natural Language Question Answerers (Tennant 1979)
d2. Disambiguation
    Read Parsing, How to (Charniak 1983)
    Assignment: identify some sources of ambiguity
    Assignment: define a word
    Read Introduction to... Word Sense Disambiguation (Ide and Veronis 1998)
d3. Information Retrieval, Web Search
    Read SLP17: Word Sense Disambiguation and Information Retrieval
    Read Topics in Information Retrieval (Manning and Schuetze 1999), pp. 529-543, 554-556
    Assignment: index creation with perl
d4. Text Categorization
    Read Learning to Classify Text (Mitchell 1997), pp. 180-184
    Assignment: message classification
d5. Information Extraction
    Read discussion article Fastus: A Cascaded Finite-state Transducer for Extracting Information from Natural-Language Text (Hobbs, Appelt et al 1997)
d6. Template-Filling; Database Interfaces
d7. The Dream of General-Purpose Meaning Understanding
    Read SLP14: Representing Meaning
    Read MMML AI and Connectionist Models of Meaning and Knowledge [ch8,9,13]
    Read discussion article KBMT...

e. Spoken Language Systems (classes 21-25)
e1. Speech Recognition and Understanding
    Read discussion article Hidden Understanding Models of Natural Language (Miller, R. Bobrow et al 1994)
e2. Applications for Spoken Language Systems
e3. Dialog Management
    Read SLP19: Dialogue and Conversational Agents
    Assignment: dialog design using VoiceXML
e4. Natural Language Generation
e5. Usability Issues in Spoken Language Interfaces
e6. Real-Time Interaction in Dialog Systems
    Read A Simple Rule for the Cooperative Timing of Utterances in Spoken Dialog (N. Ward 1997)
e7. Non-Verbal Communication and Multi-Modal Systems

f. Machine Translation (classes 26-27)
    Read SLP21: Machine Translation
    Assignment: translate by hand and introspect on the process
    Read discussion article Integrating Knowledge Bases and Statistics in MT (Knight et al. 1994)
    Read discussion article Automatic Acquisition of Hierarchical Transduction Models (Alshawi et al. 1998)

g. Computational Linguistics (classes 28-30)
g1. Psycholinguistics
    Assignment L: gather a speech error
    Read MMML Psycholinguistic Issues and Methods [ch11]
    Read discussion article A Probabilistic Model of Lexical and Syntactic Access and Disambiguation (Jurafsky 1996)
g2. Formal Linguistics
    Read MMML Modeling Modern Linguistic Theories [ch12]
g3. Cognitive Linguistics
    Read discussion article Metaphors We Live By, Chapters 1-4 (Lakoff & Johnson 1980)
    Assignment: find a metaphor
    Read Women, Fire and Dangerous Things selection (Lakoff 1987)

h. Review