Introduction to Computational Linguistics

Olga Zamaraeva (2018), based on Guestrin (2013)
University of Washington
April 10, 2018

- This and last lecture: bird's-eye view
- Next lecture: understand precision & recall in detail
- Coming next week: N-grams, then CFG; will also look in more detail
- Some more bird's-eye topics later in the course

What is ML?
- Study of algorithms that improve their performance at some task with experience
- Data → ML → Understanding
- Note: understanding is more general than e.g. linguistics or speech. (This is where the distinction between […] and NLP comes in, and that's why NLP is more closely associated with ML.)

ML tasks: Classification
From data to discrete labels
- Spam filtering
- Text classification
- Object detection
- Weather prediction (e.g. rain, snow...)
- Sentiment analysis
- etc.

ML tasks: Regression
Predict a numeric value
- Stock market
- Weather prediction (temperature)
- Predict final scores given comments in the code :)
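As a rough illustration of predicting a numeric value, the sketch below fits a straight line by ordinary least squares. The temperature readings and the name `fit_line` are invented for this example; they are not from the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: yesterday's temperature -> today's temperature.
yesterday = [10.0, 12.0, 14.0, 16.0]
today = [11.0, 13.0, 15.0, 17.0]

slope, intercept = fit_line(yesterday, today)

# Predict a numeric value for an unseen input.
predicted = slope * 18.0 + intercept
print(slope, intercept, predicted)
```

The output is a number on a continuous scale, which is what separates regression from classification into discrete labels.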

ML tasks: Similarity
Finding data
- Given image, find similar ones
- Similar products, songs...
- Similar texts
- Similar words...
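One common way to find similar texts is to compare word-count vectors with cosine similarity. The sketch below assumes whitespace tokenization and three made-up documents; `bag_of_words` and `cosine_similarity` are illustrative names, not part of the lecture.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as word counts (whitespace tokenization, lowercased)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mini-collection of documents.
docs = {
    "d1": "the cat chased the mouse",
    "d2": "the cat caught the mouse",
    "d3": "stock prices fell sharply",
}

query = bag_of_words("a cat and a mouse")
scores = {name: cosine_similarity(query, bag_of_words(text))
          for name, text in docs.items()}

# Rank documents from most to least similar to the query.
ranked = sorted(docs, key=lambda name: scores[name], reverse=True)
print(ranked, scores)
```

Documents sharing no words with the query score exactly 0, so the unrelated text ends up last in the ranking.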

Clustering (Unsupervised learning)
- Group similar things together

Embedding
- Representing data (e.g. images)

Embedding
- Representing data (e.g. words)

Reinforcement Learning
Training by feedback
- Have an agent:
  - make sensor observations
  - select action
  - receive rewards
- Compute a strategy to maximize expected rewards
- Balance immediate reward and exploration
pic from: http://www.todayifoundout.com/index.php/2013/08/the-history-of-pac-man/
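The balance between immediate reward and exploration can be sketched with an epsilon-greedy agent on a two-armed bandit. This is only one simple strategy, chosen here for illustration; the reward probabilities, the step count, and the function name are assumptions.

```python
import random


def epsilon_greedy_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Play a multi-armed bandit: explore with probability epsilon, else exploit."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)       # how often each action was selected
    estimates = [0.0] * len(reward_probs)  # running mean reward per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            arm = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best estimated reward so far.
            arm = max(range(len(reward_probs)), key=lambda a: estimates[a])
        # Receive a reward of 1 or 0 from the (hypothetical) environment.
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward


# Hypothetical environment: action 1 pays off far more often than action 0.
counts, estimates, total_reward = epsilon_greedy_bandit([0.1, 0.9])
print(counts, estimates, total_reward)
```

With `epsilon=0` the agent would never try the second action unless the first one failed to look good, which is the risk of caring only about immediate reward.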

Neural Nets and Deep Learning
picture from: https://mapr.com/blog/demystifying-ai-ml-dl/

Data in ML
- ML is all about finding patterns in the data; typically, big data is required
- Training data:
  - find patterns in the data
  - train a function to minimize mistakes (learn)
- Development data:
  - tune the parameters
  - perform error analysis
- Test data:
  - never ever learn on the test data (why?)
  - but even if you never look at your test data, what if you keep using the same test data? The case of Wall Street Journal, section 23
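A minimal sketch of splitting a dataset into training, development, and test portions follows. The 80/10/10 proportions, the fixed seed, and the name `split_data` are assumptions made for illustration.

```python
import random


def split_data(items, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle once, then cut into disjoint train / dev / test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains is held out
    return train, dev, test


# Hypothetical dataset of 100 examples, identified by index.
examples = list(range(100))
train, dev, test = split_data(examples)
print(len(train), len(dev), len(test))
```

The three portions never overlap, so nothing learned or tuned on the training and development data can leak from the test data.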

Decision function: separates data points
- Train the function on the training data
- Given a new ("test") point, know which side of the DF it is on
- Parameters, e.g. feature weights, are optimized:
  - which θ = P(heads) makes the data HHHTT most probable
  - or: find such a vector w that the decision function φ(w1 f1, w2 f2, w3 f3) makes the fewest mistakes
  - (f1 is the value for feature 1, e.g. yesterday's t)
pic from: Koprowski et al. (2012)
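The coin question can be checked numerically: try many candidate values of θ and keep the one under which HHHTT is most probable. The grid of candidates in steps of 0.01 is an assumption made for this sketch.

```python
def likelihood(theta, flips):
    """Probability of an observed flip sequence if P(heads) = theta."""
    p = 1.0
    for flip in flips:
        p *= theta if flip == "H" else 1.0 - theta
    return p


flips = "HHHTT"

# Candidate parameter values 0.01, 0.02, ..., 0.99.
candidates = [i / 100 for i in range(1, 100)]

# The maximum likelihood estimate is the candidate that makes the data most probable.
best_theta = max(candidates, key=lambda t: likelihood(t, flips))
print(best_theta, likelihood(best_theta, flips))
```

The search lands on θ = 0.6, which matches the proportion of heads in the data (3 out of 5).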

Features
- Extract informative features (e.g. whiskers length, ear size...)
- Turn them into numbers somehow
- How important is each feature? (weights)
- Given the below training dataset:
  - how important is whiskers length?
  - what about color?
  - what about tail position?
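To make feature weights concrete, the sketch below learns them with a perceptron, one simple learning rule picked for illustration. The toy dataset (whiskers length and ear size for cats versus dogs) is invented and is not the dataset shown on the slide.

```python
def decide(weights, bias, features):
    """Linear decision function: +1 (cat) or -1 (dog) by the sign of the weighted sum."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return 1 if score > 0 else -1


def train_perceptron(data, epochs=100):
    """Adjust the weights after every mistake until the training data is separated."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            if decide(weights, bias, features) != label:
                weights = [w + label * f for w, f in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


# Hypothetical training data: (whiskers length, ear size) -> +1 cat / -1 dog.
data = [
    ((7.0, 4.0), 1), ((6.5, 3.5), 1), ((8.0, 4.5), 1),
    ((3.0, 9.0), -1), ((2.5, 8.0), -1), ((4.0, 10.0), -1),
]

weights, bias = train_perceptron(data)
print(weights, bias)
print(decide(weights, bias, (7.5, 4.0)))  # classify a new point
```

On this toy data the learned weight for whiskers length is positive and the weight for ear size is negative, so long whiskers push the decision toward "cat" and large ears toward "dog".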

Loss function
- On the training data, observe the true label (value) and penalize the mistakes
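Two common ways of penalizing mistakes are sketched below: zero-one loss for labels and mean squared error for numeric values. The example labels and temperatures are made up.

```python
def zero_one_loss(true_labels, predicted_labels):
    """Fraction of examples whose predicted label differs from the true label."""
    mistakes = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return mistakes / len(true_labels)


def mean_squared_error(true_values, predicted_values):
    """Average squared difference between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return sum(squared) / len(true_values)


# Classification: one of four hypothetical predictions is wrong.
labels_true = ["spam", "ham", "ham", "spam"]
labels_pred = ["spam", "ham", "spam", "spam"]
classification_loss = zero_one_loss(labels_true, labels_pred)

# Regression: larger misses are penalized more heavily than small ones.
temps_true = [20.0, 22.0, 19.0]
temps_pred = [21.0, 22.0, 17.0]
regression_loss = mean_squared_error(temps_true, temps_pred)

print(classification_loss, regression_loss)
```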

Bias-Variance Tradeoff

Fundamental questions in ML (according to Mitchell, 2017)
- How can computers improve performance through experience?
- Which theoretical laws govern learning systems?
- Think again about what NLP's fundamental questions are
- What about Linguistics' fundamental questions?
  - Acquisition (the Holy Grail, for some linguists?)

ML perspectives
- ML as optimization
  - E.g. optimize a loss function to get better predictions
- ML as probabilistic inference
  - E.g. derive a function that makes the data most probable (recall MLE, maximum likelihood estimation)
- ML as parametric programming
  - E.g. Deep Learning networks instantiate a specific program out of a set of possible programs
- ML as evolutionary search :)
  - Is evolution an ML phenomenon?
- Think again about the research question of NLP... we want to understand something about the world through language...

ML: Key results
- No free lunch: "...no system has any basis to reliably classify new examples that go beyond those it has already seen..."
- Three sources of error: bias, variance, and unavoidable error
  - unavoidable error: some probability of us being wrong
- Overfitting: when True error > Train error
- What is the relationship between True error and Test error?

Overfitting
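Overfitting can be seen by comparing training error with error on unseen data. The sketch below uses invented one-dimensional data in which two training labels are noisy: a model that memorizes the training set (1-nearest-neighbor) is compared with a simple threshold rule. Both models and the data are assumptions chosen for illustration.

```python
def error_rate(predict, data):
    """Fraction of (x, label) pairs that the predictor gets wrong."""
    return sum(1 for x, y in data if predict(x) != y) / len(data)


# Hypothetical data: the underlying pattern is "label 1 when x > 5",
# but the training labels at x = 3.0 and x = 8.0 are noise.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 0), (9.0, 1)]
test = [(1.2, 0), (2.9, 0), (6.4, 1), (7.8, 1)]


def nearest_neighbor(x):
    """Memorize the training set: copy the label of the closest training point."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def threshold_rule(x):
    """A simpler model that ignores the individual noisy points."""
    return 1 if x > 5.0 else 0


for name, model in [("1-NN", nearest_neighbor), ("threshold", threshold_rule)]:
    print(name, error_rate(model, train), error_rate(model, test))
```

The memorizing model reaches zero training error but is wrong on half of the unseen points, while the threshold rule makes some training mistakes and none on the unseen points.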

Bayesian Networks and Graphical Models
- Discover some structure in and analyze complex data distributions
https://stats.stackexchange.com/questions/249392/how-to-calculate-causal-inference-in-bayesian-networks

Discriminative and Generative models
- Generative:
  - Learn joint distribution P(x,y) (from which the conditional can be inferred)
  - Need to make more assumptions
  - Can generate data (x,y)
  - Based on my generation assumptions, which category is most likely to generate this observation?
  - Example: Naive Bayes classifier, HMM
- Discriminative:
  - Learn conditional probability P(y|x) directly
  - Need fewer constraints/assumptions
  - Which class to predict given observation?
  - Does not care about how the data was generated
  - Example: Logistic Regression (Maximum Entropy classifier)
- While generative models sound more generally useful, discriminative models often perform better

Discriminative and Generative models
- x=1: cat goes outside; x=0: cat stays indoors
- y=1: cat catches mouse; y=0: cat does not catch mouse
- Observe the cat for 10 days and get the following data (x,y):
  (0,1), (0,0), (0,0), (1,0), (1,0), (1,0), (1,1), (1,1), (1,1), (1,1)

Joint probability of both events happening, P(x,y):
        y=0     y=1
x=0     0.2     0.1
x=1     0.3     0.4

Conditional P(y|x), where the choice of x value is fixed (each row sums to 1):
        y=0     y=1
x=0     0.67    0.33
x=1     0.43    0.57

- Now suppose you want to artificially create more observations (e.g. for a computer game about a cat). If you generate N more observations using P(x,y), will you end up with the same probabilities of events?
- Can you use P(y|x)? But how to determine how many times x was equal to 0 and to 1?
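The two tables can be recomputed directly from the ten observations, and the joint distribution can then be used to generate new (x,y) pairs. The sketch below is a plain counting exercise; the sample size of 1000 and the fixed seed are arbitrary choices.

```python
import random
from collections import Counter

# The ten observed days: x = cat goes outside, y = cat catches mouse.
observations = [(0, 1), (0, 0), (0, 0), (1, 0), (1, 0),
                (1, 0), (1, 1), (1, 1), (1, 1), (1, 1)]

counts = Counter(observations)
n = len(observations)

# Joint P(x, y): share of all days on which both values occurred together.
joint = {xy: c / n for xy, c in counts.items()}

# Conditional P(y | x): share within the days that have that value of x.
x_counts = Counter(x for x, _ in observations)
conditional = {(x, y): c / x_counts[x] for (x, y), c in counts.items()}


def sample_from_joint(k, seed=0):
    """Generate k new (x, y) pairs; the joint says how often each pair occurs."""
    rng = random.Random(seed)
    pairs = list(joint)
    return rng.choices(pairs, weights=[joint[p] for p in pairs], k=k)


generated = sample_from_joint(1000)
print(joint)
print(conditional)
print(Counter(generated))
```

Sampling works from `joint` alone. `conditional` says nothing about how often x is 0 or 1, so generating from it would also require the distribution of x.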

Deep Neural Networks
- A family of ML algorithms where simple units are combined to perform a larger computation
- Simultaneously train millions of parameters (for all simple units)
- Development in types of units used
  - LSTM (Long Short-Term Memory) units
- Structural questions are asked about the data
  - But how well can we tell what is going on in the end?
- Specific architectures for specific problems
  - Good performance... in domain
  - How to generalize knowledge from here?
- Representation learning
  - Learn new representation of data in hidden layers
  - E.g. progress in relating text to images

Other issues
- PAC learning theory (upper error bounds)
- Ensemble learning
- Semi-supervised learning and Active learning
- Kernel methods (changing dimensionality of data)
- Reinforcement learning

Where is ML headed next?
- Will ML change the way we think about human learning?
- Human-machine (learning) interaction
- ML by reading
- Note that both directions involve natural language understanding

ML & NLP
- Natural Language Understanding (NLU) is in demand in the industry
- At the same time, NLP is behind e.g. machine vision (in using deep learning)
- Researchers are after performance improvement on classic tasks, as well as defining new interesting tasks, as well as understanding how learning works through language input
- ML is the dominant paradigm in today's NLP

ML & […]
- ML is often employed for automatically tagging data, e.g. to get access to larger annotated corpora (bootstrap from a smaller sample)
- Is it likely to discover something linguistically valid about language via statistics? Definitely!
- But can we learn everything we want this way?
- The question of how learning happens is equally interesting to most linguists... however, most related questions go beyond well-defined linguistic theories
- Note that e.g. NLU as defined in NLP is not, strictly speaking, a linguistic task
  - E.g. semantic theory is not trying to model world knowledge
- ML is not the dominant paradigm in Linguistics

Python libraries for ML
- http://scikit-learn.org/stable/install.html
  - comes with good documentation and (usually small) examples!
- http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
  - sample datasets and sample code are available

What you need to know
- ML fundamental questions
- Training, Dev, Test data
- Decision vs. Loss function (what are they for)
- Overfitting, Sources of error
- No free lunch
- Regression vs. Classification
- Optimization (why is it an important perspective)
- Features and feature weights (what is their role)
- Role of conditional probability (why is it used)
- Difference between conditional and joint probability in terms of generating data