Machine Learning for NLP


Natural Language Processing, SoSe 2014
Machine Learning for NLP
Dr. Mariana Neves
April 30th, 2014
(based on the slides of Dr. Saeedeh Momtazi)

Introduction

"Field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959)

Learning methods:
- Supervised learning
- Active learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning

Outline

- Supervised learning
- Semi-supervised learning
- Unsupervised learning

Supervised Learning

Example: mortgage credit decision
(Figure: applicants plotted by age and income, followed by a new applicant marked "?". Source: http://nationalmortgageprofessional.com/news24271/regulatory-compliance-outlook-new-risk-based-pricing-rules)

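Classification

- Training: items T1, T2, ..., Tn with categories C1, C2, ..., Cn are represented by features F1, F2, ..., Fn, which yield Model(F, C)
- Testing: a new item Tn+1 is represented by features Fn+1, and the model predicts its unknown category Cn+1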

Applications

Problem                      Item       Category
POS tagging                  Word       POS
Named entity recognition     Word       Named entity
Word sense disambiguation    Word       Word's sense
Spam mail detection          Document   Spam/Not spam
Language identification      Document   Language
Text categorization          Document   Topic
Information retrieval        Document   Relevant/Not relevant

Part-of-speech tagging
(Example: http://weaver.nlplab.org/~brat/demo/latest/#/not-editable/conll-00-chunking/train.txt-doc-1)

Named entity recognition
(Example: http://corpora.informatik.hu-berlin.de/index.xhtml#/cellfinder/version1_sections/16316465_03_results)

Word sense disambiguation

Spam mail detection

Language identification

Text categorization


Classification algorithms

- K Nearest Neighbor
- Support Vector Machines
- Naïve Bayes
- Maximum Entropy
- Linear Regression
- Logistic Regression
- Neural Networks
- Decision Trees
- Boosting
- ...

K Nearest Neighbor

(Figures: an unlabeled item "?" among labeled items; its classification with 1-nearest neighbor; its classification with 3-nearest neighbors)
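The figures are not recoverable from the transcript, so the following is only a minimal sketch of the idea they illustrate: an unlabeled item takes the majority label of its k nearest labeled neighbors. The data points, the labels "accept"/"reject", the Euclidean distance, and the function name `knn_classify` are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Return the majority label among the k training items nearest to query.

    train: list of (point, label) pairs; distance is Euclidean (an assumption).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (age, income) points; the labels are invented for this sketch.
train = [((25, 30), "reject"), ((30, 35), "reject"), ((28, 80), "accept"),
         ((45, 90), "accept"), ((50, 85), "accept")]
query = (29, 60)

print(knn_classify(train, query, k=1))  # label of the single nearest neighbor
print(knn_classify(train, query, k=3))  # majority label among three neighbors
```

On this toy data the 1-nearest-neighbor and 3-nearest-neighbor decisions differ, which is the effect the slide sequence appears to demonstrate.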


Support vector machines

- Find a hyperplane in the vector space that separates the items of the two categories
- There might be more than one possible separating hyperplane
- Find the hyperplane with maximum margin
- Vectors at the margins are called support vectors
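To make the margin criterion concrete, here is a small sketch that compares two candidate separating hyperplanes by their margin. It does not train an SVM; it only evaluates hand-picked candidates. The points, the candidate hyperplanes "A" and "B", and the function name `margin` are assumptions for illustration.

```python
import math

def margin(w, b, points):
    """Smallest signed distance from any labeled point to the hyperplane w.x + b = 0.

    The value is positive only if the hyperplane separates the two categories.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in points)

# Hypothetical 2-D items with labels -1 and +1.
points = [((1, 1), -1), ((2, 1), -1), ((4, 4), +1), ((5, 4), +1)]

# Two hand-picked hyperplanes (w, b); both separate the data.
candidates = {"A": ((1.0, 0.0), -3.0), "B": ((1.0, 1.0), -5.5)}

margins = {name: margin(w, b, points) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)  # the candidate with the larger margin
print(margins, best)
```

An actual SVM searches over all hyperplanes for the maximum margin rather than choosing between two fixed candidates.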


Naïve Bayes

- Selecting the class with highest probability
- Minimizing the number of items with wrong labels

$$\hat{c} = \arg\max_{c_i} P(c_i)$$

- Probability should depend on the data to be classified (d): $P(c_i \mid d)$

$$\hat{c} = \arg\max_{c_i} P(c_i \mid d) = \arg\max_{c_i} \frac{P(d \mid c_i)\, P(c_i)}{P(d)} = \arg\max_{c_i} P(d \mid c_i)\, P(c_i)$$

- $P(c_i)$: prior probability
- $P(d \mid c_i)$: likelihood probability
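The decision rule above can be sketched in a few lines. The slides do not say how the likelihood is estimated, so this sketch assumes the common word-independence factorization with add-one smoothing, computed in log space; the toy documents, the categories "spam"/"ham", and the function names are likewise assumptions.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (words, category). Returns priors, per-category word counts, vocabulary."""
    class_counts = Counter(c for _, c in docs)
    word_counts = defaultdict(Counter)
    for words, c in docs:
        word_counts[c].update(words)
    vocab = {w for words, _ in docs for w in words}
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def classify_nb(words, priors, word_counts, vocab):
    """Return argmax_c [log P(c) + sum over words of log P(w | c)]."""
    def score(c):
        total = sum(word_counts[c].values())
        s = math.log(priors[c])                      # prior probability
        for w in words:
            if w in vocab:                           # skip words unseen in training
                # likelihood with add-one smoothing (an assumption of this sketch)
                s += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        return s
    return max(priors, key=score)

# Invented toy training set.
docs = [("win money now".split(), "spam"), ("cheap money offer".split(), "spam"),
        ("meeting on monday".split(), "ham"), ("project meeting notes".split(), "ham")]
model = train_nb(docs)
print(classify_nb("money offer now".split(), *model))
print(classify_nb("monday project meeting".split(), *model))
```

Dropping $P(d)$ is safe here for the same reason as in the derivation: it is identical for every category and does not change the argmax.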


Spam mail detection

Features:
- words
- sender's email
- contains links
- contains attachments
- contains money amounts
- ...

Feature selection

- Bag-of-words: each document can be represented by the set of words that appear in the document
- Result is a high dimensional feature space
- The process is computationally expensive
- Solution: using a feature selection method to select informative words

Feature selection methods

- Information gain
- Mutual information
- χ-square

Information gain

- Measuring the number of bits required for category prediction w.r.t. the presence or absence of a term in the document
- Removing words whose information gain is less than a predefined threshold

$$IG(w) = -\sum_{i=1}^{K} P(c_i) \log P(c_i) + P(w) \sum_{i=1}^{K} P(c_i \mid w) \log P(c_i \mid w) + P(\bar{w}) \sum_{i=1}^{K} P(c_i \mid \bar{w}) \log P(c_i \mid \bar{w})$$

Information gain

- $N$ = # docs
- $N_i$ = # docs in category $c_i$
- $N_w$ = # docs containing $w$
- $N_{\bar{w}}$ = # docs not containing $w$
- $N_{iw}$ = # docs in category $c_i$ containing $w$
- $N_{i\bar{w}}$ = # docs in category $c_i$ not containing $w$

$$P(c_i) = \frac{N_i}{N} \qquad P(w) = \frac{N_w}{N} \qquad P(\bar{w}) = \frac{N_{\bar{w}}}{N} \qquad P(c_i \mid w) = \frac{N_{iw}}{N_w} \qquad P(c_i \mid \bar{w}) = \frac{N_{i\bar{w}}}{N_{\bar{w}}}$$
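As a rough illustration, information gain can be computed directly from the per-category document counts. This sketch assumes base-2 logarithms (so the result is in bits), the convention 0 log 0 = 0, and conditional probabilities estimated as the share of documents with (or without) the word that fall in each category; the function names and the toy counts are hypothetical.

```python
import math

def plogp(p):
    """p * log2(p), with the usual convention 0 * log 0 = 0."""
    return p * math.log2(p) if p > 0 else 0.0

def information_gain(n_iw, n_i_not_w):
    """n_iw[i]: docs in category i containing w; n_i_not_w[i]: docs in category i without w."""
    n_w, n_not_w = sum(n_iw), sum(n_i_not_w)
    n = n_w + n_not_w
    # First term: entropy of the category distribution, -sum P(c_i) log P(c_i).
    ig = -sum(plogp((a + b) / n) for a, b in zip(n_iw, n_i_not_w))
    if n_w:       # P(w) * sum P(c_i | w) log P(c_i | w)
        ig += (n_w / n) * sum(plogp(a / n_w) for a in n_iw)
    if n_not_w:   # P(not w) * sum P(c_i | not w) log P(c_i | not w)
        ig += (n_not_w / n) * sum(plogp(b / n_not_w) for b in n_i_not_w)
    return ig

# A word that occurs only in category 0 versus a word spread evenly over both categories.
print(information_gain([5, 0], [0, 5]))
print(information_gain([5, 5], [5, 5]))
```

A word that perfectly predicts the category of a balanced two-class collection gains one full bit, while an evenly spread word gains nothing and would fall below any positive threshold.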

Mutual information

- Measuring the effect of each word in predicting the category
- How much does its presence or absence in a document contribute to category prediction?

$$MI(w, c_i) = \log \frac{P(w, c_i)}{P(w)\, P(c_i)}$$

- Removing words whose mutual information is less than a predefined threshold
- Combining the per-category scores, either by maximum or by weighted average:

$$MI(w) = \max_i MI(w, c_i) \qquad MI(w) = \sum_i P(c_i)\, MI(w, c_i)$$
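A small worked example may help here. It assumes the joint probability is estimated from document counts as the number of documents in the category containing the word divided by the total number of documents, uses natural logarithms, and treats a zero joint count as negative infinity; the counts and the function name are invented for illustration.

```python
import math

def mutual_information(n_iw, n_i, n_w, n):
    """MI(w, c_i) = log( P(w, c_i) / (P(w) * P(c_i)) ), estimated from document counts."""
    if n_iw == 0:
        return float("-inf")  # the word never occurs in this category
    return math.log((n_iw / n) / ((n_w / n) * (n_i / n)))

# Hypothetical collection: 20 docs, two categories of 10 docs each;
# the word occurs in 8 docs of category 0 and 2 docs of category 1.
n, n_i, n_iw = 20, [10, 10], [8, 2]
n_w = sum(n_iw)

per_category = [mutual_information(a, b, n_w, n) for a, b in zip(n_iw, n_i)]
mi_max = max(per_category)                                        # MI(w) as maximum
mi_avg = sum((b / n) * mi for b, mi in zip(n_i, per_category))    # MI(w) as weighted average
print(per_category, mi_max, mi_avg)
```

The two combination rules can rank words differently: the maximum rewards a word that is strongly tied to a single category, while the weighted average also reflects the categories it is negatively associated with.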

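χ-square

- Measuring the dependencies between words and categories

$$\chi^2(w, c_i) = \frac{N \left( N_{iw} N_{\bar{i}\bar{w}} - N_{i\bar{w}} N_{\bar{i}w} \right)^2}{(N_{iw} + N_{i\bar{w}})\,(N_{\bar{i}w} + N_{\bar{i}\bar{w}})\,(N_{iw} + N_{\bar{i}w})\,(N_{i\bar{w}} + N_{\bar{i}\bar{w}})}$$

where $N_{\bar{i}w}$ and $N_{\bar{i}\bar{w}}$ are the numbers of docs outside category $c_i$ that do and do not contain $w$.

- Ranking words based on their χ-square measure

$$\chi^2(w) = \sum_{i=1}^{K} P(c_i)\, \chi^2(w, c_i)$$

- Selecting the top words as features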
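The per-category statistic is a function of the four cells of a 2x2 contingency table (in or out of the category, with or without the word). The sketch below computes it from those four counts; the argument names, the zero-denominator fallback, and the toy counts are assumptions for illustration.

```python
def chi_square(n_iw, n_i_not_w, n_not_i_w, n_not_i_not_w):
    """Chi-square statistic for word w and category c_i from a 2x2 table of document counts.

    n_iw:          docs in c_i containing w
    n_i_not_w:     docs in c_i not containing w
    n_not_i_w:     docs outside c_i containing w
    n_not_i_not_w: docs outside c_i not containing w
    """
    n = n_iw + n_i_not_w + n_not_i_w + n_not_i_not_w
    numerator = n * (n_iw * n_not_i_not_w - n_i_not_w * n_not_i_w) ** 2
    denominator = ((n_iw + n_i_not_w) * (n_not_i_w + n_not_i_not_w)
                   * (n_iw + n_not_i_w) * (n_i_not_w + n_not_i_not_w))
    # An empty row or column gives no evidence of dependence (assumption of this sketch).
    return numerator / denominator if denominator else 0.0

print(chi_square(8, 2, 2, 8))  # word concentrated in the category
print(chi_square(5, 5, 5, 5))  # word independent of the category
```

Words would then be ranked by the weighted sum of these per-category values, as in the formula for the overall χ-square score.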

Feature selection

- These models perform well for document-level classification
  - Spam mail detection
  - Language identification
  - Text categorization
- Word-level classification might need other types of features
  - Part-of-speech tagging
  - Named entity recognition

Supervised learning

- Shortcoming: relies heavily on annotated data; annotation is a time-consuming and expensive task
- Solution: active learning
  - Using a minimum amount of annotated data
  - Having humans annotate further data only if they are very informative

Active learning

- Annotating a small amount of data
- Calculating the confidence score of the classifier on unlabeled data (figure: unlabeled items marked H, L, M, L)
- Finding the informative unlabeled data (data with lowest confidence)
- Manually annotating the informative data
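The selection step of this loop can be sketched as follows. It assumes a binary classifier that returns the probability of the positive class, so the confidence of the predicted label is the larger of p and 1 - p; the document names, the scores standing in for a trained classifier, and the function name are hypothetical.

```python
def least_confident(unlabeled, predict_proba, n=1):
    """Return the n unlabeled items whose predicted label has the lowest confidence."""
    def confidence(x):
        p = predict_proba(x)       # probability of the positive class
        return max(p, 1.0 - p)     # confidence of whichever label is predicted
    return sorted(unlabeled, key=confidence)[:n]

# Invented classifier scores standing in for a model trained on the small labeled set.
scores = {"doc_a": 0.95, "doc_b": 0.52, "doc_c": 0.70, "doc_d": 0.10}

to_annotate = least_confident(list(scores), scores.get, n=2)
print(to_annotate)  # items to hand to a human annotator
```

After the selected items are annotated, they would be added to the labeled set and the classifier retrained before the next round.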


Semi-supervised learning

- Annotating data is a time-consuming and expensive task
- Solution
  - Using a minimum amount of annotated data
  - Annotating further data automatically

Semi-supervised learning

- A small amount of labeled data
- A large amount of unlabeled data
- Finding the similarity between the labeled and unlabeled data
- Predicting the labels of the unlabeled data
- Training the classifier using labeled data and predicted labels of unlabeled data
- Risk: introducing a lot of noisy data to the system
- Remedy: adding unlabeled data to the training set only if the predicted label has a high confidence
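One way to picture this procedure is a self-training loop. The sketch below is only an illustration under several assumptions: items are single numbers, the classifier is a nearest-neighbor rule, and confidence is derived from the ratio of distances to the nearest items of the best and second-best labels. None of these choices, nor the names `predict` and `self_train`, come from the slides.

```python
def predict(labeled, x):
    """Nearest-neighbor label for x plus a confidence in [0.5, 1] (assumed scoring rule)."""
    nearest = {}
    for xi, label in labeled:
        d = abs(x - xi)
        nearest[label] = min(d, nearest.get(label, d))
    ranked = sorted(nearest.items(), key=lambda kv: kv[1])
    (best_label, d_best), (_, d_other) = ranked[0], ranked[1]  # needs at least two labels
    total = d_best + d_other
    return best_label, (d_other / total if total else 0.5)

def self_train(labeled, unlabeled, threshold=0.8):
    """Repeatedly add confidently predicted items to the training set."""
    labeled, remaining = list(labeled), list(unlabeled)
    changed = True
    while changed and remaining:
        changed = False
        for x in list(remaining):
            label, conf = predict(labeled, x)
            if conf >= threshold:          # keep only high-confidence predictions
                labeled.append((x, label))
                remaining.remove(x)
                changed = True
    return labeled, remaining

labeled, remaining = self_train([(1.0, "A"), (9.0, "B")], [2.0, 8.5, 5.0, 3.5])
print(labeled)    # original items plus the confidently labeled ones
print(remaining)  # items left unlabeled because confidence stayed below the threshold
```

The threshold is what limits the noise: items near the boundary between the two groups are never added, at the cost of leaving some data unused.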


Supervised Learning

(Figure: items plotted by age and income, with a new item marked "?". Source: http://nationalmortgageprofessional.com/news24271/regulatory-compliance-outlook-new-risk-based-pricing-rules)

Unsupervised Learning

(Figures: items plotted by age and income. Same source.)

Clustering

- Calculating similarities between the data items
- Assigning similar data items to the same cluster

Applications

- Word clustering
  - Speech recognition
  - Machine translation
  - Named entity recognition
  - Information retrieval
  - ...
- Document clustering
  - Text classification
  - Information retrieval
  - ...

Speech recognition

- Computers can recognize a speech.
- Computers can wreck a nice peach.

Word clusters (figure):
- recognition, speech, named-entity, hand-writing
- wreck, ball, ship

Machine translation

- The cat eats...
- Die Katze frisst...
- Die Katze isst...

Word clusters (figure):
- Katze, fressen, Hund, laufen
- essen, Jung, Mann

Language modelling

- I have a meeting on Monday evening.
- You should work on Wednesday afternoon.
- The next session is on Thursday morning.

- The talk is on Monday morning.
- The talk is on Monday molding.

Word clusters (figure):
- Monday, Thursday, Friday, Sunday, Saturday, Tuesday
- morning, afternoon, evening, night

Clustering algorithms

- Flat
  - K-means
- Hierarchical
  - Top-Down (Divisive)
  - Bottom-Up (Agglomerative)
    - Single-link
    - Complete-link
    - Average-link

K-means

- The best known clustering algorithm
- Works well for many cases
- Used as default/baseline for clustering documents
- Defining each cluster center as the mean or centroid of the items in the cluster:

$$\mu = \frac{1}{|c|} \sum_{x \in c} x$$

- Minimizing the average squared Euclidean distance of the items from their cluster centers

K-means

Initialization: Randomly choose k items as initial centroids
while stopping criterion has not been met do
    for each item do
        Find the nearest centroid
        Assign the item to the cluster associated with the nearest centroid
    end for
    for each cluster do
        Update the centroid of the cluster based on the average of all items in the cluster
    end for
end while

Iterating two steps:
- Re-assignment: assigning each vector to its closest centroid
- Re-computation: computing each centroid as the average of the vectors that were assigned to it in re-assignment
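The pseudocode above translates fairly directly into Python. In this sketch the stopping criterion is assumed to be "no item changed cluster" (with an iteration cap as a safeguard), an empty cluster keeps its previous centroid, and the seed, the toy points, and the helper name `sq_dist` are illustrative choices rather than part of the slides.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(items, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(items, k)      # randomly choose k items as initial centroids
    assignment = None
    for _ in range(iterations):
        # Re-assignment: each item goes to the cluster of its nearest centroid.
        new_assignment = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                          for x in items]
        if new_assignment == assignment:  # assumed stopping criterion: nothing changed
            break
        assignment = new_assignment
        # Re-computation: each centroid becomes the mean of its assigned items.
        for j in range(k):
            members = [x for x, a in zip(items, assignment) if a == j]
            if members:                   # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

items = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = kmeans(items, k=2)
print(centroids, assignment)
```

K-means only finds a local optimum, so on less clearly separated data the result can depend on the random initialization; running it with several seeds is a common precaution.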

K-means demo: http://home.deib.polimi.it/matteucc/clustering/tutorial_html/appletkm.html

Hierarchical Agglomerative Clustering (HAC)

- Creating a hierarchy in the form of a binary tree
(Figure source: http://home.deib.polimi.it/matteucc/clustering/tutorial_html/hierarchical.html)

Hierarchical Agglomerative Clustering (HAC)

Initial mapping: Put a single item in each cluster
while the predefined number of clusters has not been reached do
    for each pair of clusters do
        Measure the similarity of the two clusters
    end for
    Merge the two clusters that are most similar
end while

Measuring the similarity in three ways:
- Single-link / single-linkage clustering: based on the similarity of the most similar members
- Complete-link / complete-linkage clustering: based on the similarity of the most dissimilar members
- Average-link / average-linkage clustering: based on the average of all similarities between the members
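A compact sketch of the agglomerative procedure with the three linkage criteria is given below. It assumes one-dimensional numeric items and uses the negative absolute difference as the similarity, which is an arbitrary choice for illustration; the function names and the toy data are also assumptions.

```python
def similarity(x, y):
    """Assumed similarity for this sketch: closer numbers are more similar."""
    return -abs(x - y)

LINKAGE = {
    "single": max,                                # most similar pair of members
    "complete": min,                              # most dissimilar pair of members
    "average": lambda sims: sum(sims) / len(sims),
}

def hac(items, n_clusters, linkage="single"):
    clusters = [[x] for x in items]               # initial mapping: one item per cluster
    link = LINKAGE[linkage]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [similarity(x, y) for x in clusters[a] for y in clusters[b]]
                score = link(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        merged = clusters.pop(b)                  # b > a, so index a stays valid
        clusters[a].extend(merged)                # merge the two most similar clusters
    return clusters

items = [0, 1, 2.5, 4.5]
print(hac(items, 2, "single"))
print(hac(items, 2, "complete"))
```

On this toy data single-link chains the third item onto the first cluster, while complete-link prefers two compact pairs, which illustrates how the choice of linkage changes the resulting hierarchy.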

Hierarchical Agglomerative Clustering (HAC) demo: http://home.deib.polimi.it/matteucc/clustering/tutorial_html/appleth.html

This is not clustering, just word frequencies: http://www.wordle.net/display/wrdl/1059224/english_notebook_cover
