Elective course in Computer Science
University of Macau, Faculty of Science and Technology, Department of Computer and Information Science
SFTW462 Introduction to Natural Language Processing
Syllabus, 1st Semester 2013/2014

Part A: Course Outline

Course description: (2-2) 3 credits. This course introduces the fundamental concepts and skills associated with the design and implementation of natural language processing systems, covering morphology, syntax and semantics. The main topics include regular expressions, (weighted) minimum edit distance, language modeling, Naive Bayes (generative model), maximum entropy (discriminative model), text classification, sequence labeling, POS tagging, syntactic parsing and computational lexical semantics. The course also includes an overview of practical natural language processing applications.

Course type: Theoretical with substantial laboratory/practice content

Prerequisites: MATH111

Textbook(s) and other required material:
Dan Jurafsky and James H. Martin. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Pearson International Edition.
Reference: Steven Bird, Ewan Klein, and Edward Loper. (2009). Natural Language Processing with Python. O'Reilly.

Major prerequisites by topic:
- Programming algorithms and formal structures.
- Basic knowledge of artificial intelligence.
- Basic familiarity with logic, linear algebra and probability theory.
- Mathematical principles for analysis and problem modeling.

Course objectives:
- Learn the fundamental concepts, models, algorithms, and techniques. [a, e, k]
- Review basic knowledge of probability, formal languages, computational linguistics, and programming skills. [a, e]
- Introduce engineering issues involved in the analysis and design of natural language processing systems. [a, c, e]
- Practice the techniques used in building natural language systems. [a, c, e, k]
- Appreciate the complexities of natural language. [a, c, e]

Topics covered:

Basic Concepts (2 hours): Introduce fundamental knowledge of natural language processing (NLP) and the different analytical tasks at the levels of morphology, part-of-speech (POS), syntactic structure and word sense. Discuss the problem of language ambiguity, and review the models and algorithms used in processing natural language.

Text Processing (4 hours): Introduce the fundamental techniques of text processing and string similarity measurement, including regular expressions, sentence segmentation, word tokenization, normalization and (weighted) minimum edit distance for string alignment. These are the basic techniques used in the first step of text preprocessing.
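As a brief illustration of the (weighted) minimum edit distance technique named above, here is a minimal Python sketch via dynamic programming. It is not part of the official course material: the function name and the default unit costs are illustrative assumptions, and the cost parameters can be changed for the weighted variant.

# Minimal sketch of (weighted) minimum edit distance via dynamic programming.
# Illustrative only; the function name and default costs are not from the course handout.

def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Return the minimum cost of transforming `source` into `target`."""
    n, m = len(source), len(target)
    # d[i][j] = cost of editing source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,                       # delete from source
                d[i][j - 1] + ins_cost,                       # insert into source
                d[i - 1][j - 1] + (0 if same else sub_cost),  # substitute (or match)
            )
    return d[n][m]

if __name__ == "__main__":
    # With unit costs this is the standard Levenshtein distance (here: 5).
    print(min_edit_distance("intention", "execution"))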

Probabilistic Models (8 hours): Introduce N-grams, Naive Bayes, and maximum entropy models, which are commonly used in language processing. Probabilistic models are crucial for capturing many kinds of linguistic knowledge, and can be used to augment state machines and formal rule systems to resolve many kinds of ambiguity.

Morphological Analysis (4 hours): Introduce the tasks of morphological analysis and part-of-speech tagging. Study the relevant algorithms and problem-solving techniques in morphological analysis.

Syntactic Parsing (6 hours): Study the fundamental concepts of syntax through the use of declarative formalisms: context-free grammars and dependency grammars. Learn parsing algorithms that employ grammars to automatically assign a syntactic structure to an input sentence.

Lexical Semantics (4 hours): Study the representation of meaning. Address the issues of meaning associated with the lexicon, and introduce the computational problem of word sense disambiguation.

Applications (2 hours): Show how language-related algorithms and techniques can be applied to important real-world problems, including spelling checking and correction, text classification, named entity recognition, sentiment analysis, POS tagging and syntactic parsing.
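To make the probabilistic modeling material concrete, here is a minimal, illustrative Python sketch (not course material) of a bigram language model with add-one (Laplace) smoothing and perplexity evaluation, two of the ideas listed under Probabilistic Models and the Language Modeling weeks. The toy corpus, whitespace tokenization and function names are assumptions for illustration only.

# Minimal sketch: bigram language model with add-one (Laplace) smoothing
# and perplexity evaluation. The toy corpus and names are illustrative only.
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over <s> ... </s> padded sentences."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, vocab

def bigram_prob(w_prev, w, unigrams, bigrams, vocab):
    """P(w | w_prev) with add-one smoothing: (c(w_prev, w) + 1) / (c(w_prev) + |V|)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))

def perplexity(sentence, unigrams, bigrams, vocab):
    """Perplexity = exp of the negative average log-probability of the bigrams."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = sum(
        math.log(bigram_prob(prev, w, unigrams, bigrams, vocab))
        for prev, w in zip(tokens, tokens[1:])
    )
    return math.exp(-log_prob / (len(tokens) - 1))

if __name__ == "__main__":
    corpus = ["I like natural language processing", "I like parsing sentences"]
    uni, bi, vocab = train_bigram(corpus)
    # Lower perplexity means the model finds the sentence less surprising.
    print(perplexity("I like parsing", uni, bi, vocab))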

Class/laboratory schedule (timetabled work in hours per week):
Lecture: 2 hours; Tutorial: 2 hours; Practice: nil
Number of teaching weeks: 14; Total hours: 56; Total credits: 3
Number/duration of exam papers: 1 / 3 hours

Student study effort required:
Class contact: Lecture 28 hours; Tutorial 28 hours
Other study effort: Self-study 24 hours; Homework assignments 8 hours; Project/case study 15 hours
Total student study effort: 103 hours

Student assessment: Final assessment will be determined on the basis of:
Homework 10%
Project 20%
Midterm 30%
Final exam 40%

Course assessment: The assessment of course objectives will be determined on the basis of:
- Homework, project and exams
- Course evaluation

Course outline:
Week 1: Introduction. Concepts of natural language processing (NLP), layers of language processing, morphology, part-of-speech, phrase structure and syntax tree, lexical semantics, linguistic and computational issues.
Weeks 2-3: Text Processing. Regular expressions, sentence segmentation, word tokenization and normalization, string matching, alignments, minimum edit distance, weighted minimum edit distance.
Weeks 4-5: Language Modeling. Probability foundations, noisy channel, maximum likelihood estimation, model evaluation (perplexity), smoothing techniques, spelling checking and correction.
Course work for weeks 1-5: Assignment #1, Project Task #1.
Weeks 6-7: Classification Models. Generative and discriminative models, Naive Bayes, feature-based models, maximum entropy model, sequence labeling model.
Week 8: Text Classification. Classification algorithms, information extraction, named entity recognition and classification, sentiment analysis, feature selection, learning and evaluation.
Week 9: Part-of-Speech (POS) Tagging. Word classes, POS disambiguation, maximum entropy Markov model.
Weeks 10-12: Syntax Parsing. Context-free grammar, dependency grammar, parsing strategy, statistical CYK parsing.
Week 13: Lexical Semantics. Representation of meaning, word sense relations, word sense disambiguation.
Week 14: Project Demonstration.
Course work for weeks 6-14: Assignment #2, Project Task #2, Assignment #3, Midterm exam, Project Task #3, Assignment #4.

Contribution of course to meet the professional component: This course prepares students to work professionally in the area of human language processing.

Relationship to CS program objectives and outcomes: This course primarily contributes to the Computer Science program outcomes that develop students':
(a) ability to apply knowledge of mathematics, science, and engineering;
(c) ability to design a system, component, or process to meet desired needs within realistic constraints such as economic, environmental, social, political, ethical, health and safety, manufacturability, and sustainability;
(e) ability to identify, formulate, and solve engineering problems;
(k) ability to use the techniques, skills, and modern engineering tools necessary for engineering practice.

Relationship to CS program criteria (scale: 1 = highest to 4 = lowest):
Criteria: DS, PF, AL, AR, OS, NC, PL, HC, GV, IS, IM, SP, SE, CN
Ratings assigned: 4, 2, 1, 3, 2
Key: Discrete Structures (DS), Programming Fundamentals (PF), Algorithms and Complexity (AL), Architecture and Organization (AR), Operating Systems (OS), Net-Centric Computing (NC), Programming Languages (PL), Human-Computer Interaction (HC), Graphics and Visual Computing (GV), Intelligent Systems (IS), Information Management (IM), Social and Professional Issues (SP), Software Engineering (SE), Computational Science (CN).

Course content distribution:
Mathematics: 10%; Science and engineering subjects: 80%; Complementary electives: 10%; Total: 100%

Persons who prepared this description: Dr. Fai Wong, Dr. Sam Chao

Part B: General Course Information and Policies, 1st Semester 2013/2014

Instructor: Dr. Fai Wong
Office: R108
Office hours: Mon - Fri 15:00-18:00, or by appointment
Phone: 8397 8051
Email: derekfw@umac.mo
Time/Venue: Mon 11:00-13:00, WLG113 (lecture); Wed 14:00-16:00, RLG302 (tutorial)

Grading distribution (percentage grade / final grade):
100-93: A; 92-88: A-; 87-83: B+; 82-78: B; 77-73: B-; 72-68: C+; 67-63: C; 62-58: C-; 57-53: D+; 52-50: D; below 50: F

Comment: The objectives of the lectures are to explain and to supplement the text material. Students are responsible for the assigned material whether or not it is covered in lecture. Students who wish to succeed in this course should read the textbook prior to the lecture and should work all homework and project assignments. You are encouraged to look at other sources (other texts, etc.) to complement the lectures and text.

Homework policy: The completion and correction of homework is a powerful learning experience; therefore:
- There will be approximately 4 homework assignments.
- Homework is due one week after assignment unless otherwise noted; no late homework is accepted.
- The homework component of the course grade will be based on the average of the homework grades.

Course project: The project is probably the most exciting part of this course and provides students with meaningful experience in designing and implementing an NLP system:
- The application domain will be discussed further in class.
- The project will be presented at the end of the semester.

Exams: One midterm exam will be held during the semester. Both the midterm and final exams are closed-book, 2-hour examinations. There will also be occasional in-class assignments.

Note:
- Check UMMoodle (https://ummoodle.umac.mo/) for announcements, homework and lectures.
- Report any mistake in your grades within one week after posting.
- No make-up exam is given except with clear medical proof.
- Cheating is absolutely prohibited by the university.

Appendix: Rubric for Program Outcomes

Rubric for (a) - Background:
5 (Excellent): Understand the theoretic background and the limitations of the respective applications.
3 (Average): Students have some confusion about some of the theoretic background, or do not understand the theoretic background completely.
1 (Poor): Students do not understand the background or do not study at all.

Rubric for (c) - Design capability and design constraints:
5 (Excellent): Student understands very clearly what needs to be designed and the realistic design constraints such as economic, environmental, social, political, ethical, health and safety, manufacturability, and sustainability.
3 (Average): Student understands what needs to be designed and the design constraints, but may not fully understand the limitations of the design constraints.
1 (Poor): Student does not understand what needs to be designed and the design constraints.

Rubric for (e) - Identify applications in fundamental engineering systems:
5 (Excellent): Students can identify the problem and can identify correct terms for engineering formulation.
3 (Average): Students can identify the problem but cannot apply correct terms for formulation.
1 (Poor): Students cannot identify the formulation, or cannot understand the problem.

Rubric for (k) - Use modern principles and tools in engineering practice:
5 (Excellent): Student applies the principles, skills and tools to correctly model and analyze engineering problems, and understands the limitations.
3 (Average): Student applies the principles, skills, and tools correctly to analyze and implement engineering problems.
1 (Poor): Student does not apply the principles, skills and tools correctly and/or does not correctly interpret the results.