Text Classification & Naïve Bayes


Text Classification & Naïve Bayes CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Some slides by Dan Jurafsky & James Martin, Jacob Eisenstein

Today Text classification problems and their evaluation Linear classifiers Features & Weights Bag of words Naïve Bayes Machine Learning, Probability Linguistics

TEXT CLASSIFICATION

Is this spam?
From: "Fabian Starr" <Patrick_Freeman@pamietaniepeerelu.pl>
Subject: Hey! Sofware for the funny prices!
Get the great discounts on popular software today for PC and Macintosh http://iiled.org/cj4lmx
70-90% Discounts from retail price!!!
All sofware is instantly available to download - No Need Wait!

Who wrote which Federalist papers? 1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution. Authors: Jay, Madison, Hamilton. Authorship of 12 of the letters is in dispute. 1963: solved by Mosteller and Wallace using Bayesian methods. (Pictured: James Madison, Alexander Hamilton)

Positive or negative movie review?
"unbelievably disappointing"
"Full of zany characters and richly applied satire, and some great plot twists"
"this is the greatest screwball comedy ever filmed"
"It was pathetic. The worst part about it was the boxing scenes."

What is the subject of this article? MEDLINE Article → MeSH Subject Category Hierarchy? Antagonists and Inhibitors, Blood Supply, Chemistry, Drug Therapy, Embryology, Epidemiology

Text Classification: assigning subject categories, topics, or genres; spam detection; authorship identification; age/gender identification; language identification; sentiment analysis

Text Classification: definition
Input: a document w; a fixed set of classes Y = {y_1, y_2, ..., y_J}
Output: a predicted class y ∈ Y

Classification Methods: Hand-coded rules
Rules based on combinations of words or other features, e.g. spam: black-list-address OR ("dollars" AND "have been selected")
Accuracy can be high if rules are carefully refined by an expert
But building and maintaining these rules is expensive
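The rule on this slide can be written down directly. The following is a minimal sketch, assuming a hypothetical blacklist and plain substring matching; the names `BLACKLISTED_SENDERS` and `is_spam` are illustrative and not from the lecture.

```python
# Hypothetical blacklist; a real system would maintain this by hand.
BLACKLISTED_SENDERS = {"spammer@example.com"}


def is_spam(sender: str, body: str) -> bool:
    """Hand-coded rule: black-list-address OR ("dollars" AND "have been selected")."""
    text = body.lower()
    on_blacklist = sender.lower() in BLACKLISTED_SENDERS
    has_both_phrases = "dollars" in text and "have been selected" in text
    return on_blacklist or has_both_phrases
```

Every new spam pattern needs another hand-written clause, which is the maintenance cost the slide points to.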

Classification Methods: Supervised Machine Learning
Input: a document w; a fixed set of classes Y = {y_1, y_2, ..., y_J}; a training set of m hand-labeled documents (w_1, y_1), ..., (w_m, y_m)
Output: a learned classifier w → y

Aside: getting examples for supervised learning Human annotation By experts or non-experts (crowdsourcing) Found data Truth vs. gold standard How do we know how good a classifier is? Accuracy on held out data

Aside: evaluating classifiers How do we know how good a classifier is? Compare classifier predictions with human annotation On held out test examples Evaluation metrics: accuracy, precision, recall

The 2-by-2 contingency table

               correct   not correct
selected       tp        fp
not selected   fn        tn

Precision and recall
Precision: % of selected items that are correct, tp / (tp + fp)
Recall: % of correct items that are selected, tp / (tp + fn)
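As a small worked example, precision and recall can be computed from the four cells of the contingency table. The counts below are hypothetical, and returning 0.0 for an empty denominator is a convention chosen for this sketch.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of selected items that are correct."""
    selected = tp + fp
    return tp / selected if selected else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of correct items that are selected."""
    correct = tp + fn
    return tp / correct if correct else 0.0


# Hypothetical counts for illustration only.
tp, fp, fn, tn = 8, 2, 4, 86
p = precision(tp, fp)                        # 8 / 10
r = recall(tp, fn)                           # 8 / 12
accuracy = (tp + tn) / (tp + fp + fn + tn)   # 94 / 100
```

With these counts accuracy is high mostly because of the large tn cell, while recall shows that a third of the correct items were missed.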

A combined measure: F
A combined measure that assesses the P/R tradeoff is the F measure (weighted harmonic mean):
$$F = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}} = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}$$
People usually use the balanced F1 measure, i.e., with β = 1 (that is, α = ½): F = 2PR/(P+R)
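The F measure is straightforward to compute once P and R are known. This sketch uses the β form of the formula; the function name `f_measure` and the zero-denominator convention are assumptions for illustration.

```python
def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision p and recall r."""
    b2 = beta * beta
    denominator = b2 * p + r
    if denominator == 0.0:
        return 0.0  # convention for this sketch when the measure is undefined
    return (b2 + 1.0) * p * r / denominator


# Balanced F1 (beta = 1) for hypothetical values P = 0.8, R = 2/3.
f1 = f_measure(0.8, 2 / 3)
```

With β = 1 the expression reduces to 2PR/(P+R); larger β weights recall more heavily.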

LINEAR CLASSIFIERS

Bag of words
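The slide body did not survive transcription. As a minimal sketch of the idea, a bag of words keeps only how often each word occurs and discards word order. Lowercasing and whitespace tokenization are simplifying assumptions here.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a document to word counts, ignoring word order."""
    # Assumption: lowercase and split on whitespace; real tokenizers do more.
    return Counter(text.lower().split())


bow = bag_of_words("great plot great cast")
```

Two documents with the same words in a different order get the same representation.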

Defining features

Linear classification

Linear Models for Classification Feature function representation Weights
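A linear model scores each candidate class as a weighted sum of features and predicts the highest-scoring class. The sketch below assumes features are (word, label) pairs counted from the document; the weights are hand-set and hypothetical.

```python
from collections import Counter


def features(text: str, label: str) -> Counter:
    """Feature function f(w, y): word counts paired with the candidate label."""
    return Counter((word, label) for word in text.lower().split())


def score(weights: dict, text: str, label: str) -> float:
    """Dot product between the weight vector and f(w, y)."""
    return sum(weights.get(feat, 0.0) * count
               for feat, count in features(text, label).items())


def predict(weights: dict, text: str, labels: list) -> str:
    """Return the label with the highest linear score."""
    return max(labels, key=lambda label: score(weights, text, label))


# Hypothetical hand-set weights for illustration.
weights = {("great", "pos"): 1.0, ("pathetic", "neg"): 1.5, ("worst", "neg"): 1.0}
```

For example, `predict(weights, "it was pathetic", ["pos", "neg"])` picks `"neg"` because only the negative class has a nonzero score.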

How can we learn weights? By hand. Probability (today: Naïve Bayes). Discriminative training (e.g., perceptron, support vector machines).

Generative Story for Multinomial Naïve Bayes A hypothetical stochastic process describing how training examples are generated
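The generative story can be simulated: first draw a label from the class prior, then draw each word independently from that label's word distribution. This is a sketch under assumed, hypothetical parameters; the document length is passed in rather than modeled.

```python
import random


def generate_document(prior: dict, word_probs: dict, length: int, rng: random.Random):
    """Sample (label, words) following the multinomial Naive Bayes story."""
    labels = list(prior)
    # Step 1: draw the label from the prior P(y).
    label = rng.choices(labels, weights=[prior[y] for y in labels], k=1)[0]
    # Step 2: draw each word independently from P(word | label).
    vocab = list(word_probs[label])
    words = rng.choices(vocab, weights=[word_probs[label][w] for w in vocab], k=length)
    return label, words


# Hypothetical parameters for illustration only.
prior = {"pos": 0.5, "neg": 0.5}
word_probs = {
    "pos": {"great": 0.6, "boring": 0.1, "film": 0.3},
    "neg": {"great": 0.1, "boring": 0.6, "film": 0.3},
}
label, words = generate_document(prior, word_probs, 5, random.Random(0))
```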

Prediction with Naïve Bayes
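Prediction picks the class that maximizes the joint probability of the label and the words. A minimal sketch, working in log space to avoid underflow; the parameters are hypothetical, and skipping words unseen for a class is a simplification made here.

```python
import math


def predict_nb(words: list, log_prior: dict, log_likelihood: dict) -> str:
    """Return argmax over y of log P(y) + sum_i log P(w_i | y)."""
    def log_joint(label: str) -> float:
        # Assumption: words with no estimate for this label are skipped.
        return log_prior[label] + sum(
            log_likelihood[label][w] for w in words if w in log_likelihood[label]
        )
    return max(log_prior, key=log_joint)


# Hypothetical parameters for illustration only.
log_prior = {"pos": math.log(0.5), "neg": math.log(0.5)}
log_likelihood = {
    "pos": {"great": math.log(0.6), "boring": math.log(0.1), "film": math.log(0.3)},
    "neg": {"great": math.log(0.1), "boring": math.log(0.6), "film": math.log(0.3)},
}
```

Summing logs is equivalent to multiplying probabilities, so the argmax is unchanged.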

Parameter Estimation: "count and normalize". Parameters of a multinomial distribution. Relative frequency estimator. Formally: this is the maximum likelihood estimate. See CIML for derivation.
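"Count and normalize" can be sketched as follows: count labels and per-label word occurrences, then divide by the totals. The tiny training set and the function name `estimate` are hypothetical.

```python
from collections import Counter, defaultdict


def estimate(training_set):
    """Relative frequency estimates of P(y) and P(word | y)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for words, label in training_set:
        label_counts[label] += 1
        word_counts[label].update(words)
    n_docs = sum(label_counts.values())
    prior = {y: c / n_docs for y, c in label_counts.items()}
    likelihood = {}
    for y, counts in word_counts.items():
        total = sum(counts.values())
        likelihood[y] = {w: c / total for w, c in counts.items()}
    return prior, likelihood


# Hypothetical training set of (tokens, label) pairs.
training_set = [
    (["great", "film"], "pos"),
    (["great", "great", "cast"], "pos"),
    (["boring", "film"], "neg"),
]
prior, likelihood = estimate(training_set)
```

Any word never seen with a class gets no estimate at all under this scheme, i.e. probability zero.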

Smoothing
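The slide body did not survive transcription, so the specific scheme used in the lecture is unknown. One common choice, assumed here, is add-alpha (Laplace) smoothing: add a pseudo-count to every vocabulary word before normalizing, so unseen words get a small nonzero probability.

```python
from collections import Counter


def smoothed_likelihood(word_counts: Counter, vocabulary: list, alpha: float = 1.0) -> dict:
    """Add-alpha estimate of P(word | y) over a fixed vocabulary."""
    total = sum(word_counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (word_counts[w] + alpha) / total for w in vocabulary}


# Hypothetical counts for one class; "boring" was never seen with it.
counts = Counter({"great": 3, "film": 1, "cast": 1})
vocabulary = ["great", "film", "cast", "boring"]
probs = smoothed_likelihood(counts, vocabulary)
```

Here "boring" receives probability 1/9 instead of zero, and the distribution still sums to one.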

Naïve Bayes recap

Today Text classification problems and their evaluation Linear classifiers Features & Weights Bag of words Naïve Bayes Machine Learning, Probability Linguistics