Natural Language Processing. COMP-599 Sept 5, 2017

Preliminaries
- Instructor: Jackie Chi Kit Cheung
- Time and location: TR 16:05-17:25 in MAASS 217
- Office hours: T 14:30-15:45 or by appointment in MC108N
- TAs: Ali Emami, Jad Kabbara, Kian Kenyon-Dean, Krtin Kumar
- Evaluation: 4 assignments (40%), 1 midterm (20%), 1 group project (40%)

The Course Is Full
If you've registered for more courses than you plan to take, please decide soon! Many students are trying to get into this course. Due to resource and classroom size limits, I cannot increase the class size any further.

General Policies
- Lateness policy for assignments:
  - < 15 minutes: no penalty
  - 15 minutes to 24 hours: 10% absolute penalty
  - > 24 hours: not accepted
- Plagiarism: just don't do it.
- Language policy: In accordance with McGill policy, you have the right to write essays and examinations in English or in French.
- Course website: http://cs.mcgill.ca/~jcheung/teaching/fall-2017/comp550/index.html
- Important announcements will be given in class or on the course website, not on MyCourses.

Assignments
- Four assignments (10% each)
- Each involves readings, problem sets, and a programming component.
- Programming component: hand in online through MyCourses. Programming is to be done in Python 2.7.
- Non-programming components: hand in on paper in class.

Midterm
- Worth 20% of your final grade
- Currently scheduled for Thu, November 9, 2017
- Will be conducted in class (80 minutes long)
- More details as we approach the midterm date

Final Project
- Worth 40%
- Experiment on some language data set
- Summarize and review relevant papers
- Report on experiments
- Must be done in teams of two
- Coming up with a project idea:
  - Extend a model we see in class
  - Work on a relevant topic of interest
  - Consult a list of suggested projects, to be posted

Project Steps
- Paper or project proposal
- Progress update
- Final submission
- Due dates to be announced

Computational Linguistics and Natural Language Processing

Language is Everywhere

Languages Are Diverse
- 6000+ languages in the world
- language, langue, ਭਾਸ਼ਾ, 語言, idioma, Sprache, lingua
- The Great Language Game: http://greatlanguagegame.com/ (My high score is 1300)

Computational Linguistics (CL)
Modelling natural language with computational models and techniques
- Domains of natural language:
  - Acoustic signals, phonemes, words, syntax, semantics, ...
  - Speech vs. text
  - Natural language understanding (or comprehension) vs. natural language generation (or production)

Computational Linguistics (CL)
Modelling natural language with computational models and techniques
- Goals:
  - Language technology applications
  - Scientific understanding of how language works

Computational Linguistics (CL)
Modelling natural language with computational models and techniques
- Methodology and techniques:
  - Gathering data: language resources
  - Evaluation
  - Statistical methods and machine learning
  - Rule-based methods

Natural Language Processing
Sometimes, computational linguistics and natural language processing (NLP) are used interchangeably. There is a slight difference in emphasis:
- NLP: the goal is practical technologies (engineering)
- CL: the goal is understanding how language actually works (science)

Understanding and Generation
- Natural language understanding (NLU): language to a form usable by machines or humans
- Natural language generation (NLG): traditionally, semantic formalism to text; more recently, also text to text
- Most work in NLP is in NLU (cf. linguistics, where most theories deal primarily with production)

Personal Assistant App
- Understanding:
  - Call a taxi to take me to the airport in 30 minutes.
  - What is the weather forecast for tomorrow?
- Generation

Machine Translation
- Understanding: I like natural language processing.
- Generation: Automatische Sprachverarbeitung gefällt mir.

Recommendation System
A system chats with you to discover what you like, and recommends an event to check out this weekend.
- Understanding
- Generation

Computational Linguistics
Besides new language technologies, there are other reasons to study CL and NLP as well.

The Nature of Language
First language acquisition
- Chomsky proposed a universal grammar.
- Is language an instinct?
- Do children have enough linguistic input to learn their mother tongue?
- Train a model to find out!

The Nature of Language
Language processing
- Some sentences are supposed to be grammatically correct, but are difficult to process.
- Formal mathematical models to account for this.
- Examples (?? marks questionable acceptability):
  - The rat escaped.
  - The rat the cat caught escaped.
  - ?? The rat the cat the dog chased caught escaped.
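The rat/cat/dog sentences follow a regular pattern: all subjects come first, then their verbs in reverse order. A minimal sketch of that pattern is shown here, written for Python 3; the function name `center_embed` and its argument layout are hypothetical and not part of the course material.

```python
def center_embed(nouns, verbs):
    """Build a center-embedded sentence.

    nouns[0] is the main subject and verbs[0] is its verb; each further
    noun/verb pair adds one level of embedding inside the previous one.
    """
    if len(nouns) != len(verbs):
        raise ValueError("need exactly one verb per noun")
    # Subjects appear in order; their verbs appear in reverse order.
    subject_part = " ".join("the " + noun for noun in nouns)
    verb_part = " ".join(reversed(verbs))
    sentence = subject_part + " " + verb_part + "."
    return sentence[0].upper() + sentence[1:]


nouns = ["rat", "cat", "dog"]
verbs = ["escaped", "caught", "chased"]
for depth in range(1, 4):
    print(center_embed(nouns[:depth], verbs[:depth]))
```

Each extra level keeps the sentence within the same grammatical pattern, yet the third sentence is already hard for people to process, which is the mismatch the slide points to.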

Mathematical Foundations of CL
We describe language with various formal systems.

Mathematical Foundations of CL
Mathematical properties of formal systems and algorithms
- Can they be efficiently learned from data?
- Can they be efficiently recovered from a sentence?
- Complexity analysis
- Implications for algorithm design

Types of Language
- Text
  - Much of traditional NLP work has been on news text.
  - Clean, formal, standard English, but very limited!
  - More recent work on diversifying into multiple domains: political texts, text messages, Twitter
- Speech
  - Messier: disfluencies, non-standard language
  - Automatic speech recognition (ASR)
  - Text-to-speech generation

Domains of Language
The grammar of a language has traditionally been divided into multiple levels:
- Phonetics
- Phonology
- Morphology
- Syntax
- Semantics
- Pragmatics
- Discourse

Phonetics
- Study of the speech sounds that make up language
- Articulation, transmission, perception
- Example: peach [pʰiːtʃ]
  - Involves closing of the lips, building up of pressure in the oral cavity, release with aspiration, ...
  - The vowel can be described by its formants, ...

Phonology
- Study of the rules that govern sound patterns and how they are organized
- peach [pʰiːtʃ], speech [spiːtʃ], beach [biːtʃ]
- The p in peach and the p in speech are the same phoneme, but they are phonetically distinct!
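The peach/speech contrast can be written as a rule that maps phonemes to surface sounds. The sketch below (Python 3) assumes a deliberately simplified rule, that a voiceless stop is aspirated only when it is the first sound of the word; the actual conditioning environment in English is more detailed. The name `to_phonetic` and the phoneme lists are illustrative assumptions.

```python
VOICELESS_STOPS = {"p", "t", "k"}


def to_phonetic(phonemes):
    """Map a list of phoneme symbols to surface phones.

    Simplified rule (an assumption for illustration): a voiceless stop
    is aspirated only when it is the first sound of the word.
    """
    phones = []
    for position, phoneme in enumerate(phonemes):
        if phoneme in VOICELESS_STOPS and position == 0:
            phones.append(phoneme + "ʰ")  # aspirated variant
        else:
            phones.append(phoneme)        # surface form unchanged
    return phones


peach = ["p", "iː", "tʃ"]
speech = ["s", "p", "iː", "tʃ"]
beach = ["b", "iː", "tʃ"]
for word in (peach, speech, beach):
    print("".join(to_phonetic(word)))
```

Both peach and speech contain the same phoneme /p/ in the input, and only the rule decides that it surfaces as an aspirated sound in one and an unaspirated sound in the other.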

Morphology
- Word formation and meaning
- antidisestablishmentarianism: anti- dis- establish -ment -arian -ism
- establish -> establishment -> establishmentarian -> establishmentarianism -> disestablishmentarianism -> antidisestablishmentarianism
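The segmentation above can be approximated by repeatedly peeling known affixes off a word. This is a naive sketch in Python 3, not a real morphological analyzer: the affix lists contain only the affixes from the example, and the function name `segment` is hypothetical. Plain string matching like this over-segments many words (for instance, it would split "prism" into "pr" and "-ism").

```python
PREFIXES = ["anti", "dis"]
SUFFIXES = ["ment", "arian", "ism"]


def segment(word, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Peel known prefixes off the front and suffixes off the back."""
    front, back = [], []
    changed = True
    while changed:
        changed = False
        for prefix in prefixes:
            if word.startswith(prefix) and len(word) > len(prefix):
                front.append(prefix + "-")
                word = word[len(prefix):]
                changed = True
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                back.insert(0, "-" + suffix)
                word = word[:-len(suffix)]
                changed = True
    # Whatever remains is treated as the stem.
    return front + [word] + back


print(" ".join(segment("antidisestablishmentarianism")))
# anti- dis- establish -ment -arian -ism
```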

Syntax
- Study of the structure of language
- *I a woman saw park in the.
- I saw a woman in the park.
- There are two meanings for the sentence above! What are they? This is called ambiguity.
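One way to make the ambiguity concrete is to count how many parse trees a grammar assigns to each sentence. The sketch below (Python 3) uses a small hand-written grammar and a CKY-style chart; both the grammar and the function name `count_parses` are assumptions made for illustration and do not come from the slides.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "in the park" modifies the seeing
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # "in the park" modifies the woman
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": ["NP"],
    "saw": ["V"],
    "a": ["Det"],
    "the": ["Det"],
    "woman": ["N"],
    "park": ["N"],
    "in": ["P"],
}


def count_parses(words, start="S"):
    """Count the parse trees of `words` rooted in `start`."""
    n = len(words)
    # chart[(i, j)][X] = number of ways X can derive words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word.lower(), []):
            chart[(i, i + 1)][tag] += 1
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][start]


print(count_parses("I saw a woman in the park".split()))   # 2
print(count_parses("I a woman saw park in the".split()))   # 0
```

Under this grammar the starred word order gets no parse at all, while the grammatical sentence gets two, one for each attachment of "in the park".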

Semantics
- Study of the meaning of language
- bank: ambiguity in the sense of the word

Semantics
Ross wants to marry a Swedish woman.

Pragmatics
- Study of the meaning of language in context
- Literal meaning (semantics) vs. meaning in context: http://www.smbc-comics.com/index.php?id=3730

Pragmatics
Deixis
- Interpretation of expressions can depend on extralinguistic context, e.g., pronouns
- Example: I think cilantro tastes great!
- The entity referred to (the antecedent) by "I" depends on who is saying this sentence.

Discourse
- Study of the structure of larger spans of language (i.e., beyond individual clauses or sentences)
- I am angry at her. She lost my cell phone.
- I am angry at her. The rabbit jumped and ate two carrots.

Questions
1. What is the difference between phonetics and phonology?
2. What are two possible readings of the phrase below? At what level does the ambiguity act (i.e., lexical, syntactic, semantic, discourse)?
   old men and women

Topics in COMP-550
- Progress through the subfields, roughly organized by the level of linguistic analysis: Morphology -> Syntax -> Semantics -> Discourse
- NLP problems: language modelling, part-of-speech tagging, parsing, word sense disambiguation, semantic parsing, coreference resolution, discourse coherence modelling
- Focus on:
  - Basic linguistics needed to understand NLP issues
  - Algorithms and problem setups

Machine Learning in COMP-550
- Interspersed throughout the course, and introduced as necessary
- Machine learning topics we will cover:
  - Feature extraction
  - Sequence and structure prediction algorithms
  - Probabilistic graphical models
  - Linear discriminative models
  - Neural networks and deep learning

Applications in COMP-550
The last three weeks of the course focus on language technology applications and advanced topics:
- Automatic summarization
- Machine translation
- Evaluation issues in NLP

Course Objectives
- Understand the broad topics, applications and common terminology in the field
- Prepare you for research or employment in CL/NLP
- Learn some basic linguistics
- Learn the basic algorithms
- Be able to read an NLP paper
- Understand the challenges in CL/NLP
- Answer questions like "Is it easy or hard to ..."

Plan for the Next Week
- I will be away at a conference for the next week.
- Thursday's class: lecture by TA Krtin Kumar on finite state machines for morphology
- Tuesday's class: Python tutorial, plus a presentation of an NLP research project by TA Jad Kabbara
- This means no office hours next Tuesday. E-mail me if you need to discuss anything.