FNLP Lecture 23b Wrapping Up

Nathan Schneider, 7 December 2016

In a nutshell

We have seen representations, datasets, models, and algorithms for computationally reasoning about textual language in a data-driven fashion.

Persistent challenges: Zipf's Law, ambiguity & flexibility, variation, context.

Core NLP tasks (judgments about the language itself): tokenization, POS tagging, syntactic parsing (constituency, dependency), word sense disambiguation, word similarity, semantic role labeling, coreference resolution.

NLP applications (solving some practical problem involving/using language): spam classification, language/author identification, sentiment analysis, spelling correction, named entity recognition, question answering, machine translation.

Which of these are generally easy, and which are hard?
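Zipf's Law, listed among the persistent challenges above, is easy to observe directly: a word's frequency falls off roughly as the inverse of its frequency rank. A minimal sketch on a made-up toy corpus (any real corpus shows the pattern more convincingly):

```python
from collections import Counter

# Made-up toy corpus; any real corpus shows the pattern more convincingly.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "the cat and the dog ran").split()

ranked = Counter(corpus).most_common()  # [(word, freq), ...] by falling frequency

# Under Zipf's Law, freq is roughly proportional to 1/rank,
# so the product rank * freq stays in the same ballpark down the list.
for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, word, freq, rank * freq)
```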

Language complexity and diversity

Ambiguity and flexibility of expression are often best addressed with corpora & statistics: treebanks and statistical parsing.

Grammatical forms help convey meaning, but the relationship is complicated, motivating semantic representations proposed by linguists or induced from data.

Typological variation: languages vary extensively in phonology, morphology, and syntax.

Methods useful for more than one task

annotation, crowdsourcing
rule-based algorithms, e.g. regular expressions
classification (naïve Bayes, perceptron, SVM, MaxEnt)
n-gram language modeling
grammars & parsing
sequence modeling (HMMs, structured perceptron)
structured prediction
decoding as search: greedy vs. exact; dynamic programming (Viterbi, CKY)
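Several of the methods above fit in a few lines of code. For instance, a mistake-driven binary perceptron over bag-of-words features; the spam/ham data and feature choice here are hypothetical, purely for illustration:

```python
from collections import defaultdict

def train_perceptron(examples, epochs=5):
    """Mistake-driven binary perceptron over bag-of-words features.
    examples: list of (tokens, label) pairs with label in {+1, -1}."""
    w = defaultdict(float)
    for _ in range(epochs):
        for tokens, y in examples:
            score = sum(w[t] for t in tokens)
            y_hat = 1 if score > 0 else -1
            if y_hat != y:  # update weights only on a mistake
                for t in tokens:
                    w[t] += y
    return w

# Hypothetical toy spam data (illustrative only)
data = [("win cash now".split(), 1),
        ("meeting at noon".split(), -1),
        ("cash prize win".split(), 1),
        ("lunch at noon".split(), -1)]
w = train_perceptron(data)
score = sum(w[t] for t in "free cash".split())
print("spam" if score > 0 else "ham")  # prints "spam"
```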

Models & Learning

Because language is so complex, most NLP tasks benefit from statistical learning. In this course, mostly supervised learning with labeled data. Exceptions:

unsupervised learning: the EM algorithm (e.g. for word alignment, topic models)
n-gram models: supervised learning, but no extra labels necessary

In NLP research, there is a tension between building a lot of linguistic insight into models vs. learning almost purely from the data. Current research on neural networks tries to bypass hand-designed features and intermediate representations as much as possible. We still don't quite know how to capture deep understanding.
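The point that n-gram models are supervised yet need no extra labels can be made concrete: the text itself supplies the training signal, since each word acts as the "label" for the context preceding it. A minimal maximum-likelihood bigram model, on a toy corpus assumed for illustration:

```python
from collections import Counter

# Toy corpus (assumed for illustration); <s> and </s> mark sentence boundaries.
tokens = "<s> the cat sat </s> <s> the dog sat </s>".split()

bigrams = Counter(zip(tokens, tokens[1:]))  # note: also counts the </s> <s> pair
unigrams = Counter(tokens)

def p_mle(w2, w1):
    """Maximum-likelihood bigram estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_mle("cat", "the"))  # 0.5: "the" occurs twice, once followed by "cat"
```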

Generative and discriminative models

Assign probability to language AND the hidden variable? Or just score the hidden variable GIVEN the language?

Independence assumptions: how useful/harmful are they? "All models are wrong, but some are useful": bag-of-words; Markov models.

Combining statistics from different sources, e.g. the Noisy Channel Model.

Avoiding overfitting (smoothing, regularization).

Evaluation: gold standard? Sometimes difficult.
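The Noisy Channel Model mentioned above combines a language-model prior with a channel model. A toy sketch for spelling correction, with all probabilities made up for illustration:

```python
# Toy noisy-channel spelling correction: pick the candidate c maximizing
# P(c) * P(observed | c). All probabilities here are made up for illustration.
prior = {"the": 0.6, "they": 0.3, "thew": 0.1}   # language model P(c)
channel = {("teh", "the"): 0.8,                  # channel model P(obs | c)
           ("teh", "they"): 0.1,
           ("teh", "thew"): 0.1}

def correct(obs):
    return max(prior, key=lambda c: prior[c] * channel.get((obs, c), 0.0))

print(correct("teh"))  # "the": 0.6 * 0.8 beats the alternatives
```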

Dynamic Programming Algorithms

Allow us to search a combinatorial (exponential) space efficiently by reusing partial results. In a sentence of length N, what is the asymptotic runtime complexity of:

IBM Model 2 word alignment, where the other sentence has length M?
Word edit distance, where the other sentence has length M? O(M·N)
Viterbi (in a first-order HMM), with L possible labels? O(N·L²)
CKY, with a grammar of size G? O(N³·G)
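As one concrete instance of these complexity results, word-level edit distance fills an (M+1)×(N+1) table once per cell, giving the O(M·N) bound:

```python
# Word-level edit distance by dynamic programming: O(M*N) time,
# filling each cell of an (M+1) x (N+1) table exactly once.
def edit_distance(a, b):
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[m][n]

print(edit_distance("the cat sat".split(), "the cat sat down".split()))  # 1
```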

Applications

Question answering, information retrieval, machine translation. Your projects! Now that you know the tools in the toolbox, you can build all kinds of cool things!

The Final Exam

Tuesday, 4:00-6:00. Largely similar in style to the midterm & quizzes, but with content covering the entire course, and more short-answer questions.

For each major concept or technique, be prepared to define it, explain its relevance to NLP, discuss its strengths and weaknesses, and compare it to alternatives. E.g.: Why is smoothing used? For a model covered in class, describe two methods for smoothing and their pros/cons.

A study guide will be posted.

Other Administrivia

Grading is ongoing.
Peer evaluations for the final project.
Course evaluation: https://eval.georgetown.edu/
James will hold his usual office hours on Friday. Office hour tomorrow?