Lecture 24 Wrapping Up

Nathan Schneider, ENLP, 30 April 2018

In a nutshell
We have seen representations, datasets, models, and algorithms for computationally reasoning about textual language.
Persistent challenges: Zipf's Law, ambiguity & flexibility, variation, context.
Core NLP tasks (judgments about the language itself): tokenization, POS tagging, syntactic parsing (constituency, dependency), word sense disambiguation, word similarity, semantic role labeling, coreference resolution.
NLP applications (solving a practical problem involving/using language): spam classification, language/author identification, sentiment analysis, named entity recognition, question answering, machine translation.
Which of these are generally easy, and which are hard?
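As a tiny illustration of the first core task above, here is a crude rule-based tokenizer (a sketch of our own, not code from the lecture): it splits off punctuation but keeps word-internal apostrophes and hyphens. Real tokenizers handle many more cases (URLs, abbreviations, multiword units).

```python
import re

# One token is either a word (possibly with internal - or ')
# or a single non-space, non-word character (punctuation).
TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def tokenize(text):
    """Return the list of tokens in `text`."""
    return TOKEN.findall(text)
```

Even this simple example shows why tokenization is a real task: deciding that "Don't" is one token but "fine." is two requires linguistic choices, not just whitespace splitting.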

Language complexity and diversity
Ambiguity and flexibility of expression are often best addressed with corpora & statistics: treebanks and statistical parsing.
Grammatical forms help convey meaning, but the relationship is complicated, motivating semantic representations proposed by linguists or induced from data.
Typological variation: languages vary extensively in phonology, morphology, and syntax.

Methods useful for more than one task
- annotation, crowdsourcing
- rule-based/finite-state methods, e.g. regular expressions
- classification (naïve Bayes, perceptron)
- language modeling (n-gram or neural)
- grammars & parsing
- sequence modeling (HMMs, structured perceptron)
- structured prediction
- dynamic programming (Viterbi, CKY)
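Several of these methods come together in the Viterbi algorithm: a first-order HMM (sequence modeling) decoded with dynamic programming. A minimal sketch, assuming log-probabilities are supplied as dictionaries (the function names and this parameterization are our own, for illustration):

```python
def viterbi(words, tags, log_init, log_trans, log_emit):
    """Most probable tag sequence under a first-order HMM.

    log_init[t], log_trans[(t_prev, t)], log_emit[(t, w)] are log-probabilities.
    Runs in O(N * L^2) for N words and L tags.
    """
    # best[i][t] = score of the best path for words[:i+1] ending in tag t
    best = [{t: log_init[t] + log_emit[(t, words[0])] for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # reuse the best partial scores from position i-1 (dynamic programming)
            prev = max(tags, key=lambda tp: best[i - 1][tp] + log_trans[(tp, t)])
            best[i][t] = best[i - 1][prev] + log_trans[(prev, t)] + log_emit[(t, words[i])]
            back[i][t] = prev
    # follow backpointers from the best final tag
    t = max(tags, key=lambda tp: best[-1][tp])
    path = [t]
    for i in range(len(words) - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    return path[::-1]
```

Each cell stores only the single best way to reach a tag at a position, which is what keeps the search polynomial instead of exponential in sentence length.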

Models & Learning
Because language is so complex, most NLP tasks benefit from statistical learning. In this course, mostly supervised learning with labeled data. Exceptions:
- unsupervised learning: the EM algorithm (e.g. for word alignment, topic models)
- language models, distributional similarity/embeddings: supervised learning, but no extra labels necessary; the context is the supervision
In NLP research, there is a tension between building a lot of linguistic insight into models vs. learning almost purely from the data. Current research on neural networks tries to bypass hand-designed features and intermediate representations as much as possible. We still don't quite know how to capture deep understanding.
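The slogan "the context is the supervision" can be sketched with plain co-occurrence counts: no labels are needed, yet words used in similar contexts end up with similar vectors. A toy illustration (our own example, not from the lecture; real embedding methods use much larger corpora and dimensionality reduction):

```python
from collections import Counter
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of words appearing within `window` tokens."""
    vecs = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = vecs.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx[sent[j]] += 1  # the surrounding words are the "labels"
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda x: sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0
```

On a corpus where "cat" and "dog" occur in interchangeable contexts, their vectors come out nearly identical, which is the distributional hypothesis in miniature.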

Generative and discriminative models
Assign probability to the language AND the hidden variable? Or just score the hidden variable GIVEN the language?
Independence assumptions: how useful/harmful are they? "All models are wrong, but some are useful": bag-of-words; Markov models.
Combining statistics from different sources, e.g. the Noisy Channel Model.
Avoiding overfitting (smoothing, regularization).
Evaluation: gold standard? Sometimes difficult.

Dynamic Programming Algorithms
Allow us to search a combinatorial (exponential) space efficiently by reusing partial results. In a sentence of length N, what is the asymptotic runtime complexity of:
- IBM Model 2 word alignment, where the other sentence has length M?
- Word edit distance, where the other sentence has length M? O(MN)
- Viterbi (in a first-order HMM), with L possible labels? O(NL²)
- CKY, with a grammar of size G? O(N³G)
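The O(MN) edit-distance bound comes directly from filling an (M+1)×(N+1) table of partial results, each cell in constant time. A minimal sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b.

    dist[i][j] = edit distance between a[:i] and b[:j]; each cell reuses
    three previously computed cells, so the runtime is O(M*N).
    """
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        dist[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution or match
    return dist[m][n]
```

Without the table, the naive recursion explores an exponential number of alignments; memoizing partial results is exactly the reuse that all of the dynamic programs above share.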

Applications
Sentiment analysis, machine translation, your projects!
Now that you know the tools in the toolbox, you can…

The Final Exam
Thursday 5/10, 4:00-6:00. Largely similar in style to the midterm & quizzes, but with content covering the entire course, and more short-answer questions.
For each major concept or technique, be prepared to define it, explain its relevance to NLP, discuss its strengths and weaknesses, and compare it to alternatives. E.g.: Why is smoothing used? For a model covered in class, describe two methods for smoothing and their pros/cons.
A study guide will be posted. Review session: Wednesday 1:00-2:00, ICC 462.

Other Administrivia
Projects are due midnight tomorrow!
Peer evaluations for the final project: watch for an announcement after tomorrow; we need these to determine your grade.
No more office hours (unless you contact us).
Related courses next semester include Advanced Semantic Representation (COSC/LING-672) and Dialogue Systems (COSC-483/LING-463).
TA & course evaluations: https://eval.georgetown.edu/