Algorithms for NLP (11-711) Fall 2015 Introductory Lecture

Motivating the Course

What is NLP?
Automating language analysis, generation, and acquisition.
Analysis (or "understanding" or "processing"): input is language, output is some representation that supports useful action.
Generation: input is that representation, output is language.
Acquisition: obtaining the representation and the necessary algorithms from knowledge and data.
Representation?

Note
Some people use "NLP" to mean all language technologies. Others use it only to refer to analysis.

Note 2: NLP vs. Computational Linguistics
NLP is focused on the technology of processing language.
CL is focused on using technology to support/implement linguistics.
(Like AI vs. cognitive science.)

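[Figure: Levels of Linguistic Representation. From the bottom: speech (via phonetics and phonology) or text (via orthography), then morphology, lexemes, syntax, semantics, pragmatics, and discourse at the top. Analysis proceeds upward and generation downward; one bracket is labeled "most of this class".]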

Why It's Hard
1. The mappings between levels are extremely complex.
2. Appropriateness of a representation depends on the application.

Complexity of Linguistic Representations
Input is likely to be noisy.
Linguistic representations are theorized constructs; we cannot observe them directly.
Ambiguity: each string may have many possible interpretations at every level.
The correct resolution of the ambiguity will depend on the intended meaning, which is often inferable from context.
People are good at linguistic ambiguity resolution.
Computers are not so good at it.
How do we represent sets of possible alternatives?
How do we represent context?

Complexity of Linguistic Representations (continued)
Richness: there are many ways to express the same meaning, and immeasurably many meanings to express.
Each level interacts with the others.
There is tremendous diversity in human languages.
Languages express the same kind of meaning in different ways.
Some languages express some meanings more readily/often.

Let's Examine Some of the Levels

Morphology
Analysis of words into meaningful components.
Spectrum of complexity across languages:
Analytic or isolating languages (e.g., English, Chinese)
Synthetic languages (e.g., Finnish, Turkish, Hebrew)
Examples:
TIFGOSH ET HAYELED BAGAN (Hebrew): "you will meet the boy in the park"
Puedes dármelo (Spanish): "You can give it to me"
uygarlaştıramadıklarımızdanmışsınızcasına (Turkish): "(behaving) as if you are among those whom we could not civilize"
unfriend, Obamacare, Bill's
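As a toy illustration of analyzing words into meaningful components, the sketch below segments a word into prefixes, one stem, and suffixes by recursive affix stripping against a tiny lexicon. The lexicon, the affix lists, and the function name `segment` are hypothetical; this is a sketch for English-like words only, not a model of Turkish or Hebrew morphology.

```python
from typing import List, Optional

# Hypothetical toy lexicon and affix inventory, for illustration only.
STEMS = {"friend", "Bill", "walk", "do"}
PREFIXES = ["un", "re"]
SUFFIXES = ["'s", "ing", "ed", "s"]


def segment(word: str) -> Optional[List[str]]:
    """Return one segmentation of `word` as prefix* STEM suffix*, or None."""
    if word in STEMS:
        return [word]
    # Try stripping a known prefix and analyzing the remainder.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = segment(word[len(prefix):])
            if rest is not None:
                return [prefix] + rest
    # Otherwise try stripping a known suffix.
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            rest = segment(word[:-len(suffix)])
            if rest is not None:
                return rest + [suffix]
    return None  # no analysis with this lexicon


for w in ["unfriend", "unfriended", "Bill's", "redoing"]:
    print(w, "->", segment(w))
```

The function returns only the first analysis it finds; a realistic analyzer would return every candidate segmentation, since segmentation is itself ambiguous.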

Lexical Analysis
Normalize and disambiguate words.
Words with multiple meanings: bank, mean
Extra challenge: domain-specific meanings
Multi-word expressions: make ... decision, take out, make up, ...
For English, part-of-speech tagging is one very common kind of lexical analysis.
Others: supersense tagging, various forms of word sense disambiguation, syntactic supertags, ...
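To make part-of-speech tagging concrete, here is a minimal most-frequent-tag baseline: each word type receives the tag it carried most often in training. The three-sentence training corpus, the Penn Treebank-style tag names, and the `NN` fallback for unknown words are assumptions chosen for illustration. Because the baseline ignores context, it cannot resolve an ambiguous word such as bank.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical hand-tagged sentences (Penn Treebank-style tags), for illustration only.
TRAIN: List[List[Tuple[str, str]]] = [
    [("I", "PRP"), ("went", "VBD"), ("to", "TO"), ("the", "DT"), ("bank", "NN")],
    [("the", "DT"), ("bank", "NN"), ("closed", "VBD")],
    [("they", "PRP"), ("bank", "VBP"), ("on", "IN"), ("it", "PRP")],
]


def train_baseline(sentences: List[List[Tuple[str, str]]]) -> Dict[str, str]:
    """Map each lowercased word to the tag it was seen with most often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(words: List[str], model: Dict[str, str], default: str = "NN") -> List[Tuple[str, str]]:
    """Tag each word independently of its neighbors; unknown words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]


model = train_baseline(TRAIN)
print(tag(["they", "bank", "on", "it"], model))
```

Here bank is tagged NN even though it is a verb in this sentence, which is exactly the kind of error that taggers modeling neighboring tags are meant to fix.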

Syntax
Transform a sequence of symbols into a hierarchical or compositional structure.
Closely related to linguistic theories about what makes some sentences well-formed and others not. For example:
I want a flight to Tokyo.
I want to fly to Tokyo.
I found a flight to Tokyo.
*I found to fly to Tokyo.
Ambiguities explode combinatorially. Simple examples:
Students hate annoying professors.
John saw the woman with the telescope.
John saw the woman with the telescope wrapped in paper.
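One way to see how quickly ambiguity can grow: the number of distinct binary bracketings of an n-word string is the Catalan number C_{n-1} = (2(n-1))! / (n! (n-1)!). A grammar licenses only some of these trees, so this is an upper bound on unlabeled binary structure, not a parse count for any particular grammar. The function name below is our own.

```python
from math import comb


def num_binary_bracketings(n_words: int) -> int:
    """Number of distinct binary trees with n_words leaves: Catalan number C_{n_words - 1}."""
    if n_words < 1:
        raise ValueError("need at least one word")
    m = n_words - 1
    return comb(2 * m, m) // (m + 1)


for n in (3, 5, 10, 20):
    print(n, num_binary_bracketings(n))
```

For the seven-word sentence "John saw the woman with the telescope." the bound is already C_6 = 132.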

Some of the Possible Syntactic Analyses
[Figure: four alternative parse trees for "John saw the woman with the telescope wrapped in paper."]
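The attachment ambiguity in these examples can be counted mechanically with a CKY-style chart, that is, dynamic programming over spans. The sketch assumes a hypothetical toy grammar in Chomsky normal form. It has no rules for participles, so the "wrapped in paper" continuation is left out, and each chart cell stores the number of trees per category rather than the trees themselves.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar in Chomsky normal form, for illustration only.
BINARY_RULES: List[Tuple[str, str, str]] = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attached to the verb phrase
    ("NP", "NP", "PP"),  # PP attached to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON: Dict[str, Set[str]] = {
    "John": {"NP"}, "saw": {"V"}, "the": {"Det"}, "woman": {"N"},
    "telescope": {"N"}, "park": {"N"}, "with": {"P"}, "in": {"P"},
}


def count_parses(words: List[str], start: str = "S") -> int:
    """Count distinct parse trees for `words` with a CKY-style chart."""
    n = len(words)
    # chart[i][j][A] = number of trees with root category A spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON.get(word, set()):
            chart[i][i + 1][category] += 1
    # Fill longer spans from shorter ones (dynamic programming over span length).
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("John saw the woman with the telescope".split()))
print(count_parses("John saw the woman in the park with the telescope".split()))
```

With this grammar the first sentence has 2 parses (the PP attaches to the verb phrase or to the woman), and adding one more prepositional phrase raises the count to 5, since each new PP can attach to several earlier constituents.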

Semantics
Mapping of natural language sentences into domain representations, e.g., a robot command language, a database query, or an expression in a formal logic.
Scope ambiguities: "In this country a woman gives birth every fifteen minutes." (Groucho Marx)
Going beyond specific domains is a goal of Artificial Intelligence.
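The two scope readings of the Groucho Marx example can be written as first-order formulas. The predicate names are hypothetical, and the formulas simplify the sentence (for instance, "in this country" is folded into the interval predicate); they sketch the ambiguity rather than give a full semantic analysis.

```latex
\begin{align*}
\text{(a) } & \forall t\,\big(\mathit{interval}(t) \rightarrow \exists w\,(\mathit{woman}(w) \land \mathit{givesBirthDuring}(w, t))\big) \\
\text{(b) } & \exists w\,\big(\mathit{woman}(w) \land \forall t\,(\mathit{interval}(t) \rightarrow \mathit{givesBirthDuring}(w, t))\big)
\end{align*}
```

Here interval(t) holds of each fifteen-minute interval t. Reading (a), with the universal quantifier taking wide scope, is the intended one; reading (b), in which a single woman gives birth in every interval, is the one the joke exploits.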

Pragmatics, Discourse
Pragmatics: any non-local meaning phenomena
"Can you pass the salt?"
"Is he 21?" "Yes, he's 25."
Discourse: structures and effects in related sequences of sentences
Texts, dialogues, multi-party conversations
"I said the black shoes." "Oh, black." (Is that a sentence?)

Applications: Challenges
Application tasks evolve and are often hard to define formally.
Objective evaluations of system performance are always up for debate. This holds for NL analysis as well as application tasks.
Different applications may require different kinds of representations at different levels.

Key Applications in 2015
Computational linguistics (i.e., modeling the human capacity for language computationally)
Information extraction, especially open IE
Question answering (e.g., Watson, Siri)
Machine translation
Summarization
Opinion and sentiment analysis
Social media analysis

Course Scope
This course is meant to introduce some formal tools that will help you navigate the field of NLP.
We focus on formalisms and algorithms.
This is not a comprehensive overview; it's a deep introduction to some key topics.
We'll focus mainly on analysis and mainly on English.
The skills you develop will apply to any subfield of NLP.

Course Objectives
Algorithms for NLP is an introductory graduate-level course on the computational properties of natural languages and the fundamental algorithms for processing natural languages.
Objectives:
1. Develop a thorough understanding of the principles and formal methods used in the design and analysis of language processing algorithms.
2. Provide an in-depth presentation of the major algorithms used in NLP, including lexical, morphological, syntactic, and semantic analysis, with the primary focus on parsing algorithms and their analysis.

Introductions

Chris Dyer

Administrivia

Basic Information
Instructors: Chris Dyer (5707), Bob Frederking (6515), Miguel Ballesteros (5413)
Office hours: by appointment
TAs: TBA1, TBA2; office hours: TBA
Lecture: Tuesday and Thursday 1:30-2:50, GHC4307
Recitation: Friday 1:30-2:20, DH2302 (not this week!)
Course web page: http://demo.clab.cs.cmu.edu/fa2015-11711

What We're Going to Cover
1. Finite-state NLP
   Formal (regular) language theory (5)
   Finite-state methods in NLP (5)
2. Context-free NLP
   Formal (context-free) language theory (2)
   Parsing algorithms (4)
   Dynamic programming and search (3)
3. Context-sensitive NLP and Semantics
   Context-sensitive formalisms (2)
   Semantic problems and representations (3)
4. Current NLP challenges and research (2)

Course Philosophy
NLP is a very large field!
We aim to strike a balance between theory and practice, and between classic foundations and current applications.
But mind the gap.

Prerequisites and Corequisites
Exposure to the syntax and structure of natural language (or at least English)
College-level course on algorithms
College-level programming skills
The NLP Lab (11-712, offered in the spring) complements this course with further programming exercises.

Format
Most material will come in the lectures.
Readings associated with each lecture will be found on the web page.
About five assignments (35% of the grade), each taking about two weeks.
Two exams: midterm (25%) and final (40%).

Books and Readings
John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman, Introduction to Automata Theory, Languages and Computation. 2000, 2nd edition, chapters 1-7.
Daniel Jurafsky and James H. Martin, Speech and Language Processing. 2008, 2nd edition, selected chapters.
Noah A. Smith, Linguistic Structure Prediction. 2011, chapter 2. (Electronic version is available free through the CMU library.)
Others as needed.

Electronic Communication
http://demo.clab.cs.cmu.edu/fa2015-11711: schedule, assignments, readings, lecture slides, additional handouts
Email the instructors: 11711-fall15-instructors@lists.andrew.cmu.edu
Subscribe to the course email list! http://lists.andrew.cmu.edu/11711-fall15

Electronic Communication (continued)
Piazza? Work it out with the TAs.

Academic Integrity
Please read the cheating policy carefully. Sign the second page and turn it in.
Key things to remember:
By default, all work must be done individually.
Don't copy anyone else's work. This includes:
previous years' solutions
materials from other courses (at CMU or elsewhere)
publicly available materials
Cite sources!
Not sure? Ask the instructors!

Academic Integrity (continued)
Severe actions will be taken against students who violate the policy, possibly resulting in course failure or dismissal from the program.