Natural Language Processing SoSe 2014 Machine Learning for NLP Dr. Mariana Neves April 30th, 2014 (based on the slides of Dr. Saeedeh Momtazi)
Introduction
- "Field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959)
Learning Methods
- Supervised learning
- Active learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning
Natural Language Processing Machine Learning for NLP

Outline
- Supervised learning
- Semi-supervised learning
- Unsupervised learning

Supervised Learning
Example: mortgage credit decision
[Figure: items plotted by age and income, with a new item marked "?". Source: http://nationalmortgageprofessional.com/news24271/regulatory-compliance-outlook-new-risk-based-pricing-rules]

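Classification
- Training: the items T1, T2, ..., Tn with known categories C1, C2, ..., Cn are represented by features F1, F2, ..., Fn, from which Model(F,C) is learned
- Testing: a new item Tn+1 with unknown category (?) is represented by features Fn+1, and the model predicts its category Cn+1
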
Applications
Problems                    Items      Categories
POS tagging                 Word       POS
Named entity recognition    Word       Named entity
Word sense disambiguation   Word       Word's sense
Spam mail detection         Document   Spam/Not Spam
Language identification     Document   Language
Text categorization         Document   Topic
Information retrieval       Document   Relevant/Not relevant

Part-of-speech tagging
http://weaver.nlplab.org/~brat/demo/latest/#/not-editable/conll-00-chunking/train.txt-doc-1

Named entity recognition
http://corpora.informatik.hu-berlin.de/index.xhtml#/cellfinder/version1_sections/16316465_03_results

Word sense disambiguation

Spam mail detection

Language identification

Text categorization

Classification algorithms
- K Nearest Neighbor
- Support Vector Machines
- Naïve Bayes
- Maximum Entropy
- Linear Regression
- Logistic Regression
- Neural Networks
- Decision Trees
- Boosting
- ...

K Nearest Neighbor
- A new item (?) is classified according to the labeled items closest to it
- 1-nearest neighbor: the new item receives the category of its single closest labeled item
- 3-nearest neighbors: the new item receives the majority category among its three closest labeled items

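The 1-nearest and 3-nearest neighbor decisions can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it assumes Euclidean distance and majority voting, and the (age, income) values and the labels "accept"/"reject" are made up to echo the mortgage example.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: a feature vector."""
    # Sort the labeled items by Euclidean distance to the query item.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Majority vote among the labels of the k nearest items.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (age, income in thousands) items.
train = [((25, 20), "reject"), ((30, 25), "reject"), ((28, 30), "accept"),
         ((45, 60), "accept"), ((50, 80), "accept")]

print(knn_classify(train, (27, 28), k=1))
print(knn_classify(train, (27, 28), k=3))
```

With this toy data the two calls disagree, which shows how the choice of k can change the predicted category.
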
Support vector machines
- Find a hyperplane in the vector space that separates the items of the two categories
- There might be more than one possible separating hyperplane
- Find the hyperplane with maximum margin
- Vectors at the margins are called support vectors

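The maximum-margin idea is commonly written as the following optimization problem. The notation is an assumption and does not appear on the slides: $\mathbf{x}_j$ are the item vectors, $y_j \in \{-1, +1\}$ their categories, and $\mathbf{w}$, $b$ define the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$. The sketch also assumes the two categories are linearly separable (the hard-margin case).

```latex
\min_{\mathbf{w},\, b} \; \frac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_j \, (\mathbf{w} \cdot \mathbf{x}_j + b) \ge 1 \quad \text{for all } j
```

Under this scaling the margin has width $2 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin. The support vectors are the items for which the constraint holds with equality.
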
Naïve Bayes
- Selecting the class with the highest probability
- Minimizing the number of items with wrong labels
\[ \hat{c} = \arg\max_{c_i} P(c_i) \]
- The probability should depend on the data to be classified ($d$): $P(c_i \mid d)$

Naïve Bayes
\[ \hat{c} = \arg\max_{c_i} P(c_i \mid d) \]
\[ \hat{c} = \arg\max_{c_i} \frac{P(d \mid c_i) \, P(c_i)}{P(d)} \]
\[ \hat{c} = \arg\max_{c_i} P(d \mid c_i) \, P(c_i) \]
- $P(d)$ is the same for every class, so it does not affect the argmax and can be dropped
- $P(c_i)$: prior probability
- $P(d \mid c_i)$: likelihood probability

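A minimal sketch of the decision rule $\arg\max_{c_i} P(d \mid c_i) P(c_i)$ for text. Several choices here are assumptions that the slides do not state: the likelihood is factored into independent per-word probabilities (the "naïve" assumption), add-one smoothing is used, scores are summed in log space, and the tiny spam/ham corpus is invented.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (list_of_words, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return class_counts, word_counts, vocab

def classify_nb(model, words):
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_docs)          # log prior, log P(c_i)
        total = sum(word_counts[label].values())
        for w in words:
            if w in vocab:                        # skip words unseen in training
                # log likelihood of one word, with add-one smoothing
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy corpus.
docs = [("win money now".split(), "spam"),
        ("cheap money offer".split(), "spam"),
        ("meeting on monday".split(), "ham"),
        ("project meeting notes".split(), "ham")]
model = train_nb(docs)
print(classify_nb(model, "money offer now".split()))
print(classify_nb(model, "monday meeting".split()))
```
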
Spam mail detection
Features:
- words
- sender's email
- contains links
- contains attachments
- contains money amounts
- ...

Feature selection
- Bag-of-words: each document can be represented by the set of words that appear in the document
- The result is a high-dimensional feature space
- The process is computationally expensive
- Solution: using a feature selection method to select informative words

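The bag-of-words representation can be sketched as follows. The example is illustrative only: it uses binary presence/absence features, matching "the set of words", and two invented documents. Even here the vector length equals the vocabulary size, which is why real collections lead to a high-dimensional feature space.

```python
def bag_of_words(docs):
    """Map tokenized documents to binary vectors over a shared vocabulary."""
    vocab = sorted({w for doc in docs for w in doc})
    vectors = []
    for doc in docs:
        present = set(doc)
        # 1 if the vocabulary word occurs in the document, else 0.
        vectors.append([1 if w in present else 0 for w in vocab])
    return vocab, vectors

docs = ["the cat eats".split(), "the dog eats the bone".split()]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
```
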
Feature selection methods
- Information gain
- Mutual information
- χ-square

Information gain
- Measuring the number of bits required for category prediction w.r.t. the presence or absence of a term in the document
- Removing words whose information gain is less than a predefined threshold
\[ IG(w) = -\sum_{i=1}^{K} P(c_i) \log P(c_i) + P(w) \sum_{i=1}^{K} P(c_i \mid w) \log P(c_i \mid w) + P(\bar{w}) \sum_{i=1}^{K} P(c_i \mid \bar{w}) \log P(c_i \mid \bar{w}) \]

Information gain
- $N$ = # docs
- $N_i$ = # docs in category $c_i$
- $N_w$ = # docs containing $w$
- $N_{\bar{w}}$ = # docs not containing $w$
- $N_{iw}$ = # docs in category $c_i$ containing $w$
- $N_{i\bar{w}}$ = # docs in category $c_i$ not containing $w$
\[ P(c_i) = \frac{N_i}{N} \qquad P(w) = \frac{N_w}{N} \qquad P(\bar{w}) = \frac{N_{\bar{w}}}{N} \]
\[ P(c_i \mid w) = \frac{N_{iw}}{N_w} \qquad P(c_i \mid \bar{w}) = \frac{N_{i\bar{w}}}{N_{\bar{w}}} \]

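Information gain can be computed directly from the document counts. The sketch below is an illustration under stated assumptions: logarithms are base 2 (bits), $0 \log 0$ is treated as 0, and the conditional probabilities are estimated as $N_{iw} / N_w$ and $N_{i\bar{w}} / N_{\bar{w}}$. The function name and the counts in the example are made up.

```python
import math

def information_gain(n_iw, n_i):
    """n_iw[i]: # docs in category i containing w; n_i[i]: # docs in category i."""
    n = sum(n_i)
    n_w = sum(n_iw)          # docs containing w
    n_not_w = n - n_w        # docs not containing w

    def plogp(p):
        return p * math.log2(p) if p > 0 else 0.0   # convention: 0 log 0 = 0

    ig = -sum(plogp(c / n) for c in n_i)
    if n_w:
        ig += (n_w / n) * sum(plogp(a / n_w) for a in n_iw)
    if n_not_w:
        ig += (n_not_w / n) * sum(plogp((c - a) / n_not_w)
                                  for a, c in zip(n_iw, n_i))
    return ig

# Two categories with 10 docs each.
print(information_gain([10, 0], [10, 10]))   # w occurs only in category 1
print(information_gain([5, 5], [10, 10]))    # w is spread evenly
```

A word that occurs in exactly one of two equally sized categories yields one full bit; a word spread evenly over both yields zero and would be removed by any positive threshold.
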
Mutual information
- Measuring the effect of each word in predicting the category
- How much does its presence or absence in a document contribute to category prediction?
\[ MI(w, c_i) = \log \frac{P(w, c_i)}{P(w) \, P(c_i)} \]
- Removing words whose mutual information is less than a predefined threshold
- The score of a word is either the maximum or the weighted average over the categories:
\[ MI(w) = \max_i MI(w, c_i) \]
\[ MI(w) = \sum_i P(c_i) \, MI(w, c_i) \]

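A small sketch of both word-level scores, computed from document counts. Assumptions not taken from the slides: probabilities are estimated as relative document frequencies ($P(w, c_i) = N_{iw} / N$), logarithms are base 2, and a word that never occurs in a category gets a score of negative infinity for that category. Function names and counts are invented.

```python
import math

def mi_per_category(n_iw, n_i):
    """MI(w, c_i) for every category, from document counts."""
    n = sum(n_i)
    n_w = sum(n_iw)
    scores = []
    for a, c in zip(n_iw, n_i):
        if a == 0:
            scores.append(-math.inf)   # log 0: w never occurs in this category
        else:
            # P(w, c_i) / (P(w) P(c_i)) = (a / n) / ((n_w / n) * (c / n))
            scores.append(math.log2(a * n / (n_w * c)))
    return scores

def mi_max(n_iw, n_i):
    return max(mi_per_category(n_iw, n_i))

def mi_avg(n_iw, n_i):
    n = sum(n_i)
    return sum((c / n) * s for c, s in zip(n_i, mi_per_category(n_iw, n_i)))

# w occurs in 8 of 10 docs of category 1 and in 2 of 10 docs of category 2.
print(mi_max([8, 2], [10, 10]))
print(mi_avg([8, 2], [10, 10]))
```
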
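χ-square
- Measuring the dependencies between words and categories
\[ \chi^2(w, c_i) = \frac{N \, (N_{iw} N_{\bar{i}\bar{w}} - N_{i\bar{w}} N_{\bar{i}w})^2}{(N_{iw} + N_{i\bar{w}}) \, (N_{\bar{i}w} + N_{\bar{i}\bar{w}}) \, (N_{iw} + N_{\bar{i}w}) \, (N_{i\bar{w}} + N_{\bar{i}\bar{w}})} \]
- $N_{\bar{i}w}$ = # docs not in category $c_i$ containing $w$
- $N_{\bar{i}\bar{w}}$ = # docs not in category $c_i$ not containing $w$
- Ranking words based on their χ-square measure
\[ \chi^2(w) = \sum_{i=1}^{K} P(c_i) \, \chi^2(w, c_i) \]
- Selecting the top words as features
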
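The χ-square score can be sketched from the four cells of the word/category contingency table. The function names are hypothetical, the counts are invented, and returning 0 when a row or column of the table is empty is a convention chosen for this sketch.

```python
def chi_square(n_iw, n_i, i):
    """Chi-square between word w and category i, from document counts."""
    n = sum(n_i)
    n_w = sum(n_iw)
    a = n_iw[i]            # in c_i, contains w
    b = n_w - a            # not in c_i, contains w
    c = n_i[i] - a         # in c_i, does not contain w
    d = n - n_w - c        # not in c_i, does not contain w
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def chi_square_avg(n_iw, n_i):
    """Category-weighted average, used to rank words."""
    n = sum(n_i)
    return sum((n_i[i] / n) * chi_square(n_iw, n_i, i) for i in range(len(n_i)))

print(chi_square([10, 0], [10, 10], 0))    # w occurs only in category 1
print(chi_square([5, 5], [10, 10], 0))     # w is independent of the category
print(chi_square_avg([10, 0], [10, 10]))
```
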
Feature selection
- These models perform well for document-level classification
  - Spam mail detection
  - Language identification
  - Text categorization
- Word-level classification might need other types of features
  - Part-of-speech tagging
  - Named entity recognition

Supervised learning
- Shortcoming
  - Relies heavily on annotated data
  - Annotation is a time-consuming and expensive task
- Solution: active learning
  - Using a minimum amount of annotated data
  - Having a human annotate further data only if it is very informative

Active learning
- Annotating a small amount of data
- Calculating the confidence score of the classifier on unlabeled data
- [Figure: unlabeled items marked with confidence levels H, L, M, L]
- Finding the informative unlabeled data (data with lowest confidence)
- Manually annotating the informative data

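The steps above can be sketched as a loop that repeatedly sends the least confident items to an annotator. Everything concrete here is an assumption for illustration: the function signature, the callables `train`, `confidence` and `annotate`, and the toy one-dimensional classifier whose model is a threshold halfway between the class means.

```python
def active_learning(labeled, unlabeled, train, confidence, annotate,
                    rounds=1, batch_size=1):
    """Uncertainty-sampling loop; all callables are supplied by the caller."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        # Rank unlabeled items from lowest to highest classifier confidence.
        pool.sort(key=lambda x: confidence(model, x))
        batch, pool = pool[:batch_size], pool[batch_size:]
        # The least confident items go to the (human) annotator.
        labeled.extend((x, annotate(x)) for x in batch)
    return train(labeled), labeled

# Toy 1-D setup (hypothetical): threshold halfway between the class means.
def train(labeled):
    neg = [x for x, y in labeled if not y]
    pos = [x for x, y in labeled if y]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2

def confidence(model, x):
    return abs(x - model)      # far from the threshold means confident

model, labeled = active_learning(
    labeled=[(1.0, False), (9.0, True)],
    unlabeled=[2.0, 4.8, 7.0, 8.5],
    train=train,
    confidence=confidence,
    annotate=lambda x: x > 5,  # stands in for the human annotator
)
print(labeled)
```

In this toy run the item 4.8 lies closest to the initial threshold of 5.0, so it is the one selected for manual annotation.
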
Semi-supervised learning
- Annotating data is a time-consuming and expensive task
- Solution
  - Using a minimum amount of annotated data
  - Annotating further data automatically

Semi-supervised learning
- Starting from a small amount of labeled data and a large amount of unlabeled data
- Finding the similarity between the labeled and unlabeled data
- Predicting the labels of the unlabeled data
- Training the classifier using the labeled data and the predicted labels of the unlabeled data
- Problem: this introduces a lot of noisy data to the system
- Solution: adding unlabeled data to the training set only if the predicted label has a high confidence

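One common way to realize this procedure is self-training, sketched below. The slides do not name a specific algorithm, so this is an assumed instance: the function signature, the confidence threshold of 0.9, and the toy one-dimensional classifier are all hypothetical.

```python
def self_training(labeled, unlabeled, train, predict, threshold=0.9, max_rounds=10):
    """Add automatically labeled items only when the prediction is confident."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, rest = [], []
        for x in pool:
            label, conf = predict(model, x)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:          # nothing new is confident enough: stop
            break
        labeled.extend(confident)  # predicted labels join the training set
        pool = [x for x, _ in rest]
    return train(labeled), labeled

# Toy 1-D setup (hypothetical): threshold halfway between the class means.
def train(labeled):
    neg = [x for x, y in labeled if not y]
    pos = [x for x, y in labeled if y]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2

def predict(model, x):
    # Confidence grows with the distance from the threshold, capped at 1.
    return x > model, min(1.0, abs(x - model) / 3.0)

model, labeled = self_training(
    labeled=[(1.0, False), (9.0, True)],
    unlabeled=[2.0, 4.8, 8.5],
    train=train,
    predict=predict,
)
print(labeled)
```

The item 4.8 stays unlabeled because its predicted label never reaches the confidence threshold, which is how the noisy additions are kept out of the training set.
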
Supervised Learning
[Figure: items plotted by age and income, with a new item marked "?". Source: http://nationalmortgageprofessional.com/news24271/regulatory-compliance-outlook-new-risk-based-pricing-rules]

Unsupervised Learning
[Figure: items plotted by age and income. Source: http://nationalmortgageprofessional.com/news24271/regulatory-compliance-outlook-new-risk-based-pricing-rules]

Clustering
- Calculating similarities between the data items
- Assigning similar data items to the same cluster

Applications
- Word clustering
  - Speech recognition
  - Machine translation
  - Named entity recognition
  - Information retrieval
  - ...
- Document clustering
  - Text classification
  - Information retrieval
  - ...

Speech recognition
- Computers can recognize a speech.
- Computers can wreck a nice peach.
- [Figure: word groups "recognition, speech, named-entity, hand-writing" and "wreck, ball, ship"]

Machine translation
- The cat eats...
- Die Katze frisst...
- Die Katze isst...
- [Figure: word groups "Katze, fressen, Hund, laufen" and "essen, Jung, Mann"]

Language modelling
- I have a meeting on Monday evening.
- You should work on Wednesday afternoon.
- The next session is on Thursday morning.
- The talk is on Monday morning.
- The talk is on Monday molding.
- [Figure: word groups "Monday, Thursday, Friday, Sunday, Saturday, Tuesday" and "morning, afternoon, evening, night"]

Clustering algorithms
- Flat
  - K-means
- Hierarchical
  - Top-Down (Divisive)
  - Bottom-Up (Agglomerative)
    - Single-link
    - Complete-link
    - Average-link

K-means
- The best known clustering algorithm
- Works well for many cases
- Used as default/baseline for clustering documents
- Defining each cluster center as the mean or centroid of the items in the cluster
\[ \vec{\mu} = \frac{1}{|c|} \sum_{\vec{x} \in c} \vec{x} \]
- Minimizing the average squared Euclidean distance of the items from their cluster centers

K-means
Initialization: randomly choose k items as initial centroids
while stopping criterion has not been met do
    for each item do
        Find the nearest centroid
        Assign the item to the cluster associated with the nearest centroid
    end for
    for each cluster do
        Update the centroid of the cluster based on the average of all items in the cluster
    end for
end while
Iterating two steps:
- Re-assignment: assigning each vector to its closest centroid
- Re-computation: computing each centroid as the average of the vectors that were assigned to it in re-assignment

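The pseudocode above translates almost line by line into Python. This is a sketch under assumptions the slide leaves open: the stopping criterion is "assignments no longer change" (with a cap of `max_iter` iterations), an empty cluster keeps its previous centroid, and the six 2-D points are invented.

```python
import random

def kmeans(items, k, max_iter=100, seed=0):
    """items: list of equal-length numeric tuples. Returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(items, k)   # randomly choose k items as initial centroids
    assignment = None
    for _ in range(max_iter):
        # Re-assignment: each item goes to its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))
            for x in items
        ]
        if new_assignment == assignment:   # stopping criterion: nothing changed
            break
        assignment = new_assignment
        # Re-computation: each centroid becomes the mean of its assigned items.
        for j in range(k):
            members = [x for x, c in zip(items, assignment) if c == j]
            if members:                    # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, assignment = kmeans(points, k=2)
print(centroids)
print(assignment)
```
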
K-means
http://home.deib.polimi.it/matteucc/clustering/tutorial_html/appletkm.html

Hierarchical Agglomerative Clustering (HAC)
- Creating a hierarchy in the form of a binary tree
http://home.deib.polimi.it/matteucc/clustering/tutorial_html/hierarchical.html

Hierarchical Agglomerative Clustering (HAC)
Initial mapping: put a single item in each cluster
while the predefined number of clusters has not been reached do
    for each pair of clusters do
        Measure the similarity of the two clusters
    end for
    Merge the two clusters that are most similar
end while
Measuring the similarity in three ways:
- Single-link
- Complete-link
- Average-link

Hierarchical Agglomerative Clustering (HAC)
- Single-link / single-linkage clustering: based on the similarity of the most similar members
- Complete-link / complete-linkage clustering: based on the similarity of the most dissimilar members
- Average-link / average-linkage clustering: based on the average of all similarities between the members

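The three linkage criteria differ only in how the pairwise scores between two clusters are combined. The sketch below is an illustrative implementation, not the lecture's own: it works with Euclidean distance instead of similarity (so "most similar" becomes "smallest distance"), stops at a predefined number of clusters, and uses invented 2-D points.

```python
import math
from itertools import combinations

def hac(items, n_clusters, linkage="single"):
    """Bottom-up clustering of numeric tuples; returns a list of clusters."""
    clusters = [[x] for x in items]    # initial mapping: one item per cluster

    def cluster_distance(c1, c2):
        d = [math.dist(a, b) for a in c1 for b in c2]
        if linkage == "single":        # most similar (closest) members
            return min(d)
        if linkage == "complete":      # most dissimilar (farthest) members
            return max(d)
        if linkage == "average":       # average over all member pairs
            return sum(d) / len(d)
        raise ValueError("unknown linkage: " + linkage)

    while len(clusters) > max(n_clusters, 1):
        # Find the pair of clusters with the smallest distance and merge it.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]
        del clusters[j]                # safe because i < j
    return clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
for linkage in ("single", "complete", "average"):
    print(linkage, hac(points, 2, linkage))
```

Recording the order of the merges instead of stopping early would give the binary tree (dendrogram) mentioned earlier.
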
Hierarchical Agglomerative Clustering (HAC)
http://home.deib.polimi.it/matteucc/clustering/tutorial_html/appleth.html

This is not clustering, just word frequencies
http://www.wordle.net/display/wrdl/1059224/english_notebook_cover
