Sentiment Analysis. wine_sentiment.r

Dictionary Methods
Count the usage of words from specified lists.
Example: LIWC, described in Tausczik and Pennebaker (2010), "The Psychological Meaning of Words," Journal of Language and Social Psychology. Its categories include positive and negative emotions.
Sources: we will essentially make our own lists later. LIWC has been developed for various languages.
Methods also run in the other direction: read a summary and write the article (WSJ).
Software: search Google for current locations and languages.

LIWC Words
Linguistic Inquiry and Word Count (LIWC) is a commercial collection of words.
[Table: LIWC word categories and the number of words in each category]

Sentiment Analysis
Basic version: identify words that associate with different concepts, such as positive vs. negative, cruel vs. kind, or red vs. white wine.
Over a corpus of documents, count the prevalence of the different types of words.
Use differences in these counts as a measure of the sentiment of each document.
Application: the words used by a judge hearing a case.
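
The counting step can be sketched in Python. The course's own code is in R and is not shown, so this is an illustration rather than a transcription. The word lists `POSITIVE` and `NEGATIVE` are hypothetical toy lists, and the tokenizer is a simple regular expression. The score is the difference between positive and negative counts, as the slide describes.

```python
import re

# Hypothetical toy word lists for illustration only; a real analysis would
# use an established list (e.g. Bing Liu's lists or LIWC categories).
POSITIVE = {"rich", "elegant", "delicious", "fresh"}
NEGATIVE = {"thin", "harsh", "bitter", "flat"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_counts(text):
    """Return (positive count, negative count, total words) for one document."""
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos, neg, len(words)


def sentiment_score(text):
    """Net sentiment: positive count minus negative count."""
    pos, neg, _ = sentiment_counts(text)
    return pos - neg


reviews = [
    "Rich and elegant, with fresh fruit.",
    "Thin, harsh and a little bitter.",
]
scores = [sentiment_score(r) for r in reviews]
```

Dividing each count by the total word count returned by `sentiment_counts` gives proportions instead of raw counts.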

Word Lists
Established word lists: Bing Liu's negative/positive words from an early paper, and the commercial LIWC list (see the LIWC Words slide).
Grow your own: start with seed words, then expand using WordNet to find synonyms and antonyms.
Issues:
Counting only: counting "funny" also counts "not funny", and parsing to catch such cases complicates the analysis.
Words that are negative may not be negative in every context.
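
The "funny" vs. "not funny" problem is easy to reproduce. The sketch below contrasts naive counting with a crude one-word negation window. The negation rule and the `NEGATORS` set are assumptions added for illustration and are not part of the slide's method. Real negation scope is longer and depends on parsing, which is the complication the slide mentions.

```python
import re

# Hypothetical set of negating words; real negation handling is more involved.
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_naive(text, lexicon):
    """Count every occurrence of a lexicon word, ignoring context."""
    return sum(w in lexicon for w in tokenize(text))


def count_with_negation(text, lexicon):
    """Skip a lexicon word when the immediately preceding token is a negator."""
    words = tokenize(text)
    total = 0
    for i, w in enumerate(words):
        negated = i > 0 and words[i - 1] in NEGATORS
        if w in lexicon and not negated:
            total += 1
    return total


positive = {"funny"}
text = "The movie was not funny, but the ending was funny."
naive = count_naive(text, positive)          # counts both occurrences
aware = count_with_negation(text, positive)  # drops the negated one
```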

Example with Wines
Relate the counts of words to the points assigned to wines.
Some words that are clearly not negative are counted as such; for example, "lemon".
Use either counts or proportions.
The difference in counts (score = positive - negative) is linearly related to points:
$\widehat{\text{points}} = 85.5 + 0.6\,\text{score}$, with RMSE $\approx 3$ and $R^2 = 14\%$.
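
A fit of the form $\widehat{\text{points}} = a + b\,\text{score}$ can be computed in closed form. The sketch below is written in plain Python. It assumes the slide's "RMSE" is the residual standard error with $n - 2$ degrees of freedom, as reported by R's `lm` summary. The numbers at the bottom are made up and are not the wine data.

```python
import math


def simple_ols(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b, rmse, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in resid)    # residual sum of squares
    sst = sum((yi - my) ** 2 for yi in y)
    rmse = math.sqrt(sse / (n - 2))    # residual SE: two fitted parameters
    return a, b, rmse, 1 - sse / sst


# Toy illustration (made-up numbers, not the wine data).
score = [-1, 0, 2, 3, 5]
points = [84, 86, 86, 88, 88]
intercept, slope, rmse, r2 = simple_ols(score, points)
```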

Negative Words Less Useful
Role of positive vs. negative words: the association is asymmetric, and positive words add more than negative words do.
Multiple regression, however, gives a different impression.

Combination
Multiple regression with both positive and negative counts: this model basically repeats the two simple regressions, because these counts are not highly correlated (r = -0.09).
Adding the total word count tells a different story. Why is it so different from the prior models?
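
The two models on this slide can be fit with an ordinary least-squares routine. The sketch uses NumPy's `lstsq` and is not the course's R code. The six "reviews" are made up, and their points are constructed from a known formula so that the recovered coefficients can be checked. The sketch shows how to fit the models; it does not show the slide's finding.

```python
import numpy as np


def fit_ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Made-up counts for six hypothetical reviews (not the wine data).
pos = [3, 1, 4, 0, 2, 5]
neg = [0, 2, 1, 3, 1, 0]
total = [40, 25, 60, 30, 35, 80]
points = [80 + 1.0 * p - 0.5 * q + 0.1 * t for p, q, t in zip(pos, neg, total)]

r_pos_neg = np.corrcoef(pos, neg)[0, 1]          # correlation of the two counts
two_counts = fit_ols([pos, neg], points)         # intercept, pos, neg
with_total = fit_ols([pos, neg, total], points)  # intercept, pos, neg, total
```

Comparing `two_counts` with `with_total` on real data shows how the positive and negative coefficients shift once review length is held fixed.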

Regression Methods & Examples. wine_regr.r

Regression Analysis
Objective: find the weighted combination of variables that best predicts a response.
Application to text: what weighted combination of word counts best predicts the rating points of a wine?
Perspective: sentiment analysis assigns a fixed weight to selected words, whereas regression assigns the weights that are most predictive in the context of the observed corpus.
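
To regress on word counts, the documents first have to become a document-term matrix. The sketch below is a Python illustration under stated assumptions; the course's own R code may differ. It keeps only the `k` most common word types and then estimates one least-squares weight per word plus an intercept. The names `build_dtm` and `fit_word_regression` are hypothetical helpers.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_dtm(docs, k):
    """Document-term count matrix restricted to the k most common word types."""
    token_lists = [tokenize(d) for d in docs]
    freq = Counter(w for toks in token_lists for w in toks)
    vocab = [w for w, _ in freq.most_common(k)]
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(token_lists):
        for w in toks:
            j = index.get(w)
            if j is not None:          # ignore words outside the vocabulary
                X[i, j] += 1
    return X, vocab


def fit_word_regression(X, y):
    """Least-squares weights (intercept first) predicting y from word counts."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef
```

With thousands of word types, this unpenalized fit overfits quickly. That is why the following slides turn to variable selection and shrinkage.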

Regression vs. Sentiment
Previous sentiment analysis: a common positive weight for positive words and a common negative weight for negative words.
Advantage: no modeling needed, and it can be done unsupervised.
Disadvantage: generic, not adapted to the problem.
Regression model: customize the weights to the observed data.
Advantage: customized, so a better fit and more predictive.
Disadvantage: must be supervised. And which words should it use?

Which Words?
How do we pick the word features to use? This is variable selection for regression.
Theory: very much like sentiment analysis, but with custom weights.
External sorting: limit the analysis to the most common word types.
Stepwise selection methods: need a criterion such as Bonferroni to avoid overfitting.
Lasso-type penalized methods: a popular, fast alternative to stepwise methods. The convex algorithm is faster than stepwise search (albeit a different search).

Shrinkage Methods
An alternative to subset selection, because it is difficult to identify and fit all subsets. Consider how many such models are possible: with p candidate variables there are 2^p subsets.
Instead, solve a simpler problem that shrinks the estimates. Careful: the estimates need to be on a common scale (standardized variables) before they are combined in a penalty.
Why shrink? Trade bias to reduce variance. Shrinkage also allows fitting all the variables even when there are more variables than cases.
Penalized likelihoods penalize by a measure of the size of the coefficients. The fit has to improve by enough (a decrease in RSS) to compensate for the size of the coefficients.
Ridge regression: $\min_b \; \mathrm{RSS}(b) + \lambda_2\, b^\top b$
Lasso regression: $\min_b \; \mathrm{RSS}(b) + \lambda_1 \sum_j |b_j|$
λ is a tuning parameter that must be chosen by some method, usually cross-validation.
These penalties also have a Bayesian interpretation (see ISL).
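
The two penalized problems can be sketched directly in NumPy. Both functions standardize the columns first, as the slide cautions, and center the response so that no intercept is penalized. Ridge uses its closed form $(Z^\top Z + \lambda I)^{-1} Z^\top y$. The lasso is solved by cyclic coordinate descent with soft thresholding. This is a common solver choice, assumed here for illustration; it is not necessarily what the course software uses. Production packages such as R's glmnet add convergence checks and a path over λ.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns


def ridge(X, y, lam):
    """Minimize RSS(b) + lam * b'b on standardized X and centered y."""
    Z = standardize(X)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ yc)


def soft_threshold(z, t):
    """Shrink z toward 0 by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso(X, y, lam, n_iter=500):
    """Minimize RSS(b) + lam * sum|b_j| by cyclic coordinate descent."""
    Z = standardize(X)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    p = Z.shape[1]
    b = np.zeros(p)
    col_ss = (Z ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: remove every fitted term except coefficient j
            r_j = yc - Z @ b + Z[:, j] * b[j]
            b[j] = soft_threshold(Z[:, j] @ r_j, lam / 2) / col_ss[j]
    return b
```

Setting `lam=0` recovers least squares for both functions. Increasing `lam` shrinks the ridge coefficients smoothly and drives the lasso coefficients to exactly zero one at a time.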

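L1 vs L2 Penalty
L1 (lasso): $\min_b \mathrm{RSS}(b)$ subject to $\sum_j |b_j| < c$
L2 (ridge): $\min_b \mathrm{RSS}(b)$ subject to $\sum_j b_j^2 < c$
[Figure: L1 and L2 constraint regions in the (b_1, b_2) plane; ISL Figure 6.7, p. 222]
The corners of the L1 region produce selection.
Interpret λ as a Lagrange multiplier.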
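
A standard special case shows why the corners produce selection. This case is not in the slides. When the design is orthonormal ($X^\top X = I$), both penalized problems separate coordinate by coordinate. Ridge rescales each least-squares coefficient by $1/(1+\lambda)$. The lasso soft-thresholds it at $\lambda/2$, which sets small coefficients exactly to zero. The example values are arbitrary.

```python
def ridge_orthonormal(b_ols, lam):
    """Minimizer of RSS + lam * sum b_j^2 when X'X = I: proportional shrinkage."""
    return [b / (1 + lam) for b in b_ols]


def lasso_orthonormal(b_ols, lam):
    """Minimizer of RSS + lam * sum |b_j| when X'X = I: soft thresholding at lam/2."""
    out = []
    for b in b_ols:
        shrunk = max(abs(b) - lam / 2, 0.0)  # exactly 0 if |b| <= lam/2
        out.append(shrunk if b >= 0 else -shrunk)
    return out


b_ols = [3.0, 0.4, -1.0]
ridge_fit = ridge_orthonormal(b_ols, 1.0)  # every coefficient shrinks, none vanish
lasso_fit = lasso_orthonormal(b_ols, 1.0)  # the small coefficient is set to 0
```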

Cross-Validation
Fundamental and commonly used: use part of the data to build a model, and a separate, hidden part to test it. This happens often in practice in consulting.
Question: how should the data be partitioned?
Remedy: repeat the division between the two groups. K-fold cross-validation partitions the data into K parts, fits to K-1 folds, and validates on the remaining fold, rotating until every fold has been held out once (typically K = 5 or 10).
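
K-fold cross-validation fits in a few lines of plain Python. In this sketch, `fit` and `predict` are placeholder callables for any model. The example model, which always predicts the training mean, is only a stand-in. The indices are shuffled with a fixed seed so that the folds are reproducible.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mse(x, y, fit, predict, k=5, seed=0):
    """Average squared prediction error over the held-out folds."""
    sq_errors = []
    for hold in kfold_indices(len(y), k, seed):
        held = set(hold)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        sq_errors += [(y[i] - predict(model, x[i])) ** 2 for i in hold]
    return sum(sq_errors) / len(sq_errors)


# Placeholder model: predict the mean response of the training folds.
def mean_fit(xs, ys):
    return sum(ys) / len(ys)


def mean_predict(model, xi):
    return model
```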

Missing Data
Always present: in a medical example, only 170 out of 1,200 cases were complete.
Often informative: in a bankruptcy model, half of the predictors indicate the presence of missing data.
Is data ever missing at random?
Handle it as part of the modeling process? Offer a simple patch that requires few assumptions.
Main idea, done as a data preparation step: add an indicator column for missing values, then fill the missing value.

Handle Missing by Adding Vars
Add another variable: an indicator column for missing values, then fill the missing entries with the average of the observed values.
A simple approach with fewer assumptions that expands the domain of the feature search.
ONLY applies to explanatory variables, never to the response.
Allows the missing cases to behave differently, and gives a conservative evaluation of the variable.
Part of the modeling process: the missing subsets are distinguished only if they are predictive.
Missing values in a categorical variable are not a problem: missing defines another category.
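
The patch on this slide is a simple data preparation step, sketched here in plain Python with `None` marking a missing value. This is not the course's missing_data.r. A numeric column becomes two columns: the mean-filled values and a 0/1 indicator. A categorical column gets its own "missing" level.

```python
def add_missing_indicator(values):
    """Return (filled values, 0/1 missing indicator) for one explanatory column.

    Missing entries (None) are replaced by the mean of the observed values.
    Apply this to explanatory variables only, never to the response.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


def fill_categorical(values, label="missing"):
    """For a categorical column, treat missing as one more category."""
    return [label if v is None else v for v in values]
```

Both columns then enter the regression. If the indicator's coefficient is not predictive, variable selection can drop it.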

Example
[Figure: a data frame with missing values, and the filled-in data with added indicator columns; missing_data.r]

Validation Regression for Points
Set aside 5,000 cases for checking models.
Initial model, without words: note the significant role of the missing-value indicators.

Regression for Points
Initial model with only words (as proportions) and lengths. Just 15 words to get the idea; adding the lengths really helps.

Regression for Points
Combined: the R files build larger models with more words.
Dilemma: the fit gets better and better as we keep adding more words.

Calibration Plot
Check that the out-of-sample fit is correct on average: does the out-of-sample fit match the claimed fit of the model?
Check that the predictions are honest: $E(Y \mid \hat{Y}) = \hat{Y}$.
Common problem: a limited-range response. Are any wines rated more than 100 points? Less than 80 points?
[Figure: true points vs. estimated points, both axes from 80 to 95]
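
A numeric version of the calibration plot checks $E(Y \mid \hat{Y}) = \hat{Y}$ by binning. It sorts the cases by prediction, splits them into bins of near-equal size, and compares the mean prediction with the mean outcome in each bin. Equal-count bins are an assumed design choice; a plot with a smooth curve is another option. For a calibrated model, the two columns agree up to noise.

```python
def calibration_table(y_true, y_pred, n_bins=5):
    """Per bin of predictions, return (mean predicted, mean actual)."""
    pairs = sorted(zip(y_pred, y_true))  # order cases by predicted value
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            mean_true = sum(t for _, t in chunk) / len(chunk)
            rows.append((mean_pred, mean_true))
    return rows
```

If the actual means are above the predictions in the low bins and below them in the high bins, the predictions are too spread out. This is the pattern the slide's limited-range question probes.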

Checking Claimed Precision
Does the model meet its claims of precision? Are its predictions for the test data as good as its predictions for the training data, which was used to build the model?
Overfitting occurs when the model capitalizes on random variation in the training data, so it predicts the training data better than the test data. For example:
average squared prediction error in test > in training;
squared correlation of (predicted, actual), i.e. $R^2$, in test < in training.
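
Both overfitting symptoms listed on the slide are easy to compute. The sketch below reports the mean squared error and the squared correlation between predicted and actual values, separately for the training and test sets. It flags the case where the test set looks worse on either measure. Some gap is expected from noise alone; a large gap is the warning sign. The report keys are hypothetical names.

```python
def mse(y, yhat):
    """Average squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def squared_corr(y, yhat):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(y)
    my, mh = sum(y) / n, sum(yhat) / n
    sxy = sum((a - my) * (b - mh) for a, b in zip(y, yhat))
    syy = sum((a - my) ** 2 for a in y)
    shh = sum((b - mh) ** 2 for b in yhat)
    return sxy * sxy / (syy * shh)


def precision_check(train_y, train_pred, test_y, test_pred):
    """Compare in-sample and out-of-sample error and R^2."""
    report = {
        "mse_train": mse(train_y, train_pred),
        "mse_test": mse(test_y, test_pred),
        "r2_train": squared_corr(train_y, train_pred),
        "r2_test": squared_corr(test_y, test_pred),
    }
    report["test_worse"] = (report["mse_test"] > report["mse_train"]
                            or report["r2_test"] < report["r2_train"])
    return report
```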

Lasso Fit
Which model do you want to keep?
[Figure: fishbone (coefficient path) plot for the model with the other variables and words; coefficients vs. L1 norm, with the number of nonzero coefficients along the top axis]

Cross-Validation Picks
10-fold cross-validation chooses the best value for the tuning parameter. Big model: it really wants to use them all!
The 1 SE heuristic picks a simpler model.
[Figure: mean-squared error vs. log(lambda), with the number of nonzero coefficients along the top axis]
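
The 1 SE heuristic needs only the cross-validation summary for each λ: the mean error across folds and its standard error. One choice takes the λ with the smallest mean error. The 1 SE rule instead takes the largest λ, which gives the simplest model, whose mean error is within one standard error of that minimum. R's `cv.glmnet` reports these two values as `lambda.min` and `lambda.1se`. The plain-Python function below is an independent sketch, and the example numbers are made up.

```python
def pick_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from per-lambda CV mean errors and SEs."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Larger lambda means heavier penalty, so fewer nonzero coefficients.
    lam_1se = max(lam for lam, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambdas[i_min], lam_1se


lam_min, lam_1se = pick_lambdas(
    [0.01, 0.1, 1.0, 10.0],   # candidate penalties
    [5.0, 4.8, 5.1, 7.0],     # mean CV error for each
    [0.4, 0.4, 0.4, 0.5],     # standard error of each mean
)
```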

Comparisons
Scatterplot matrix of the predictions and the actual values, all in the test sample.

Eye Candy
Word cloud: which words have large coefficients in the lasso model?